# Pandas

Pandas has two main data structures: the Series and the DataFrame.

Pandas is a Python library focused on data analysis tasks such as data manipulation, data preparation, and data cleaning. It provides high-level data structures and functions that make working with structured (tabular) data faster, easier, and more expressive.

Import the required libraries:

In [4]:
import pandas as pd
import numpy as np

## Series Object

A Pandas Series is like a column in a table: a one-dimensional labeled array that can hold data of any type.
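To illustrate "any type", here is a minimal sketch: pandas infers a specific dtype when the values are uniform, and falls back to the generic object dtype when types are mixed. The variable names `floats` and `mixed` are illustrative only.

```python
import pandas as pd

# floats only: pandas infers a float dtype
floats = pd.Series([0.25, 0.50, 0.75, 1])
print(floats.dtype)  # float64

# an int, a string and a float together fall back to the generic object dtype
mixed = pd.Series([1, "a", 2.5])
print(mixed.dtype)  # object
```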

### Converting a list into a Series

In [1]:
data = [0.25, 0.50, 0.75, 1]

In [2]:
print(data)

[0.25, 0.5, 0.75, 1]


In [5]:
data = pd.Series(data)

In [6]:
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

### Converting a Series into an array

In [7]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])
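`.values` still works, but the pandas documentation recommends `Series.to_numpy()` when you want a NumPy array. A minimal sketch of the equivalent call:

```python
import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.50, 0.75, 1])

# to_numpy() returns the Series values as a NumPy array
arr = data.to_numpy()
print(type(arr))      # <class 'numpy.ndarray'>
print(arr.tolist())   # [0.25, 0.5, 0.75, 1.0]
```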

### Showing the index

In [8]:
data.index

RangeIndex(start=0, stop=4, step=1)

### Selecting data

In [9]:
data[2]

0.75

By default, a Series uses an implicit integer index (0, 1, 2, ...). We can also define our own labels, which is known as an explicit index. When defining an explicit index, the number of labels must match the number of data points.
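A minimal sketch of the length rule: when the number of labels differs from the number of values, pandas raises a `ValueError` (the exact message text may vary between pandas versions).

```python
import pandas as pd

# four values, four labels: accepted
ok = pd.Series([0.25, 0.50, 0.75, 1], index=["a", "b", "c", "d"])

# four values, three labels: pandas refuses to build the Series
try:
    pd.Series([0.25, 0.50, 0.75, 1], index=["a", "b", "c"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```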

In [10]:
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [11]:
data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])

In [12]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [13]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [14]:
data.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [15]:
# access by explicit index (label)

data['a']

0.25

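In [16]:
# access by implicit index (position); plain data[3] on a label index is
# deprecated in recent pandas versions, so .iloc is used instead
data.iloc[3]

1.0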

In [18]:
# odd numbers ("bilangan ganjil") with an integer explicit index
bilganjil_2 = pd.Series([1,3,5,7,9], index=[2,4,6,8,10])

In [19]:
# with an integer index, [] looks up labels: label 8, not position 8
bilganjil_2[8]

7

In [20]:
# there is no label 1, so this raises a KeyError
bilganjil_2[1]

KeyError: 1
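Because `[]` on an integer index is label-based, the unambiguous way to choose between labels and positions is `.loc` (labels) or `.iloc` (positions). A minimal sketch on the same data:

```python
import pandas as pd

bilganjil_2 = pd.Series([1, 3, 5, 7, 9], index=[2, 4, 6, 8, 10])

# .loc always uses labels: label 8 holds the value 7
by_label = bilganjil_2.loc[8]
# .iloc always uses positions: position 1 holds the value 3
by_position = bilganjil_2.iloc[1]
print(by_label, by_position)  # 7 3
```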

In [21]:
bilganjil = pd.Series([1,3,5,7,9], index=['a', 'b', 'c', 'd', 'e'])

In [22]:
bilganjil

a    1
b    3
c    5
d    7
e    9
dtype: int64

In [23]:
# explicit index (label slice, end label included)
bilganjil['b':'c']

b    3
c    5
dtype: int64

In [24]:
# implicit index (position slice, end position excluded)
bilganjil[1:2]

b    3
dtype: int64
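The two slices above treat the end point differently: a label slice includes the stop label, while a position slice excludes the stop position. A minimal sketch comparing them on the same Series, written with `.loc` and `.iloc` so the intent is explicit:

```python
import pandas as pd

bilganjil = pd.Series([1, 3, 5, 7, 9], index=["a", "b", "c", "d", "e"])

# label slice: 'b' through 'd', end label included
labels = bilganjil.loc["b":"d"].tolist()
# position slice: positions 1 and 2, position 3 excluded
positions = bilganjil.iloc[1:3].tolist()
print(labels)     # [3, 5, 7]
print(positions)  # [3, 5]
```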

## loc and iloc

### loc (location)

loc selects data by explicit index label. Unlike position-based slicing, a label slice with loc includes both the start and the end label.

In [25]:
# explicit index
bilganjil.loc['a']

1

In [26]:
bilganjil.loc['b':'d']

b    3
c    5
d    7
dtype: int64

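### iloc (integer location)

iloc gets or sets values by integer position (the implicit index), regardless of the labels. On a DataFrame, it accepts both a row position and a column position. As with Python list slicing, the end of an iloc slice is excluded.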

In [27]:
# implicit index
bilganjil.iloc[0]

1

In [28]:
# implicit index
bilganjil.iloc[1]

3

In [29]:
# implicit index (end position excluded)
bilganjil.iloc[1:3]

b    3
c    5
dtype: int64
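On a DataFrame, iloc takes a row position and a column position, separated by a comma. A minimal sketch with a small hypothetical DataFrame (`df`, its columns `x` and `y`, and its values are illustrative only):

```python
import pandas as pd

# hypothetical example data
df = pd.DataFrame({"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]}, index=["p", "q", "r"])

# row position 1, column position 0 -> row 'q', column 'x'
cell = df.iloc[1, 0]
# first two rows (end position 2 excluded), all columns
top = df.iloc[0:2, :]
print(cell)                # 20
print(top.index.tolist())  # ['p', 'q']
```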

## Creating Series and DataFrames from dictionaries

In [30]:
dict_Pop = {'Indonesia':273.523e6, 
            'United States':332.905e6, 
            'Japan':125.833e6
}

In [31]:
dict_Pop

{'Indonesia': 273523000.0, 'United States': 332905000.0, 'Japan': 125833000.0}

In [32]:
# land area in square kilometres
dict_Area = {'Indonesia': 1.9047e6,
             'United States': 9.8335e6,
             'Japan': 377.975e3
}

In [33]:
dict_Area

{'Indonesia': 1904700.0, 'United States': 9833500.0, 'Japan': 377975.0}

In [34]:
# convert the dict to a Series (keys become the index)
Population = pd.Series(dict_Pop)

In [35]:
Population

Indonesia        273523000.0
United States    332905000.0
Japan            125833000.0
dtype: float64

In [36]:
# convert the dict to a Series (keys become the index)
Area = pd.Series(dict_Area)

In [37]:
Area 

Indonesia        1904700.0
United States    9833500.0
Japan             377975.0
dtype: float64

In [38]:
Country = pd.DataFrame({'Population':Population, 'Area':Area})

In [39]:
Country

| | Population | Area |
|---|---|---|
| Indonesia | 273523000.0 | 1904700.0 |
| United States | 332905000.0 | 9833500.0 |
| Japan | 125833000.0 | 377975.0 |
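When Series are combined into a DataFrame, pandas aligns them on their index labels rather than on their order. A minimal sketch, assuming a hypothetical area Series listed in a different order and missing Japan, shows that the gap is filled with `NaN`:

```python
import pandas as pd

population = pd.Series({"Indonesia": 273.523e6, "United States": 332.905e6, "Japan": 125.833e6})
# hypothetical: different key order, and no entry for Japan
area = pd.Series({"United States": 9.8335e6, "Indonesia": 1.9047e6})

# rows are matched by label; Japan gets NaN in the Area column
country = pd.DataFrame({"Population": population, "Area": area})
print(country)
```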


In [40]:
Country['Population']

Indonesia        273523000.0
United States    332905000.0
Japan            125833000.0
Name: Population, dtype: float64

In [41]:
Country['Population']['Indonesia']

273523000.0
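Selecting the column first and then the row is chained indexing. A single `.loc` call with a row label and a column label performs the same lookup in one step, and is the form the pandas documentation recommends when assigning values. A minimal sketch rebuilding the same DataFrame:

```python
import pandas as pd

Country = pd.DataFrame(
    {"Population": [273.523e6, 332.905e6, 125.833e6],
     "Area": [1.9047e6, 9.8335e6, 377.975e3]},
    index=["Indonesia", "United States", "Japan"],
)

# row label first, then column label
value = Country.loc["Indonesia", "Population"]
print(value)  # 273523000.0
```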

In [42]:
# access a column as an attribute
Country.Population

Indonesia        273523000.0
United States    332905000.0
Japan            125833000.0
Name: Population, dtype: float64

In [43]:
# access a column with brackets
Country['Population']

Indonesia        273523000.0
United States    332905000.0
Japan            125833000.0
Name: Population, dtype: float64
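Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; brackets work for any column name. A minimal sketch with a hypothetical one-row DataFrame whose column name contains a space:

```python
import pandas as pd

df = pd.DataFrame({"Total Population": [273.523e6], "Area": [1.9047e6]}, index=["Indonesia"])

# brackets work for any column name, including ones with spaces
pop = df["Total Population"]
# attribute access works here because "Area" is a valid identifier
area = df.Area
# df.Total Population would be a syntax error, so brackets are required
print(pop.iloc[0], area.iloc[0])
```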

In [44]:
# recreate the DataFrame, naming the population column 'Total Population'
Country = pd.DataFrame({'Total Population':Population, 'Area':Area})

In [45]:
Country 

| | Total Population | Area |
|---|---|---|
| Indonesia | 273523000.0 | 1904700.0 |
| United States | 332905000.0 | 9833500.0 |
| Japan | 125833000.0 | 377975.0 |
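Instead of rebuilding the DataFrame, an existing column can be renamed with `DataFrame.rename`, passing a mapping from old to new column names. A minimal sketch:

```python
import pandas as pd

Country = pd.DataFrame(
    {"Population": [273.523e6, 332.905e6, 125.833e6],
     "Area": [1.9047e6, 9.8335e6, 377.975e3]},
    index=["Indonesia", "United States", "Japan"],
)

# rename returns a new DataFrame; Country itself is left unchanged
renamed = Country.rename(columns={"Population": "Total Population"})
print(renamed.columns.tolist())  # ['Total Population', 'Area']
```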


In [46]:
Country['Total Population']

Indonesia        273523000.0
United States    332905000.0
Japan            125833000.0
Name: Total Population, dtype: float64

In [47]:
# explicit index (label slice, end label included)
Country['Total Population']['Indonesia':'United States']

Indonesia        273523000.0
United States    332905000.0
Name: Total Population, dtype: float64

In [48]:
# implicit index (position slice, end position excluded)
Country['Total Population'].iloc[0:3]

Indonesia        273523000.0
United States    332905000.0
Japan            125833000.0
Name: Total Population, dtype: float64