## Series as dictionary

In [6]:
import pandas as pd
obj_series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
obj_series

a    1
b    2
c    3
d    4
dtype: int64

In [2]:
# the index labels act as the Series' keys
obj_series.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')

In [3]:
obj_series.values

array([1, 2, 3, 4], dtype=int64)

In [4]:
# `in` checks the index labels (keys), not the values
'b' in obj_series

True
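Because a Series behaves like a dictionary, other dict-style operations work on it too. The sketch below assumes a Series built directly from a plain dict (an alternative to the list-plus-index constructor used above) and shows that `in` tests labels rather than values.

```python
import pandas as pd

# A Series built from a dict: keys become the index, values become the data
obj_series = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4})

# `in` looks at the index labels (the keys)
has_key_b = 'b' in obj_series                  # True
has_value_2 = 2 in obj_series                  # False: 2 is a value, not a label
has_value_2_explicit = 2 in obj_series.values  # True: search the values instead

# .get() returns a default instead of raising KeyError for a missing label
missing = obj_series.get('z', 0)

# .items() yields (label, value) pairs, like dict.items()
pairs = list(obj_series.items())
print(pairs)
```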

## Series as Array

In [5]:
# slicing with the explicit index (labels); the end label is included
obj_series['a':'c']

a    1
b    2
c    3
dtype: int64

In [7]:
# slicing with the implicit index (positions); the end position is excluded
obj_series[:3]

a    1
b    2
c    3
dtype: int64
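The two slices above return the same three rows, but for different reasons: a label slice includes its end label, while a positional slice excludes its end position. A minimal sketch that makes the difference visible by stopping both slices at `'c'` / position 2:

```python
import pandas as pd

obj_series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Label slice: the end label 'c' is included -> a, b, c
by_label = obj_series['a':'c']

# Positional slice: the end position 2 (which holds 'c') is excluded -> a, b
by_position = obj_series[0:2]

print(list(by_label.index), list(by_position.index))
```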

## Indexing a Series with .loc and .iloc
- .loc indexing: selects by the labels of the Series (a label slice includes its end point)
- .iloc indexing: selects by the integer positions of the Series (a position slice excludes its end point)

In [8]:
# .loc indexing (by label)
obj_series.loc['a':'c']

a    1
b    2
c    3
dtype: int64

In [9]:
# .iloc indexing (by position)
obj_series.iloc[:3]

a    1
b    2
c    3
dtype: int64
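The difference between `.loc` and `.iloc` matters most when the index itself consists of integers that do not match the positions. A sketch with a hypothetical Series `s` whose labels are 1, 3 and 5:

```python
import pandas as pd

# Hypothetical data: integer labels that differ from the positions 0, 1, 2
s = pd.Series(['x', 'y', 'z'], index=[1, 3, 5])

# .loc uses labels: label 1 is 'x', and the slice 1:3 includes label 3
loc_one = s.loc[1]
loc_slice = s.loc[1:3]      # labels 1 and 3 -> 'x', 'y'

# .iloc uses positions: position 1 is 'y', and 1:3 stops before position 3
iloc_one = s.iloc[1]
iloc_slice = s.iloc[1:3]    # positions 1 and 2 -> 'y', 'z'

print(loc_one, iloc_one)
```

Being explicit with `.loc` or `.iloc` avoids having to remember which rule plain `[]` applies to a given index.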

## Filtering a Series

In [10]:
# keep only the values greater than 2
obj_series[obj_series > 2]

c    3
d    4
dtype: int64

In [11]:
# select several labels at once by passing a list
obj_series[['c', 'd']]

c    3
d    4
dtype: int64
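Boolean filters can also be combined, and a list of labels can be matched with `Index.isin`. A sketch on the same `obj_series`:

```python
import pandas as pd

obj_series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# The comparison alone is a Boolean Series aligned on the same index
mask = obj_series > 2

# Combine conditions with & (and) or | (or); each comparison needs its own
# parentheses because & and | bind more tightly than > and <
middle = obj_series[(obj_series > 1) & (obj_series < 4)]

# Keep only the labels that appear in a list
picked = obj_series[obj_series.index.isin(['a', 'd'])]

print(middle)
```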

## Indexing DataFrame
- Accessing a DataFrame is also known as indexing or subset selection. Put simply, it means choosing data from specific rows and specific columns: we can select some rows and all columns, all rows and some columns, or some rows and some columns.
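The three kinds of subset described above can be sketched with `.loc` on a small stand-in DataFrame. Its values are copied from the first rows of `ramen-ratings.csv` displayed in this notebook and trimmed to three columns, so the example runs without the file.

```python
import pandas as pd

# Stand-in for the first five rows of ramen-ratings.csv (three columns only)
df = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin', 'Wei Lih', "Ching's Secret"],
    'Country': ['Japan', 'Taiwan', 'USA', 'Taiwan', 'India'],
    'Stars': [3.75, 1.0, 2.25, 2.75, 3.75],
})

some_rows_all_cols = df.loc[0:1, :]                        # rows 0-1, every column
all_rows_some_cols = df.loc[:, ['Country']]                # every row, one column
some_rows_some_cols = df.loc[[2, 4], ['Brand', 'Stars']]   # rows 2 and 4, two columns

print(some_rows_some_cols)
```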


In [12]:
df_ramen = pd.read_csv('ramen-ratings.csv')
df_ramen.head()

|   | Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |  |
| 1 | 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.0 |  |
| 2 | 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |  |
| 3 | 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |  |
| 4 | 2576 | Ching's Secret | Singapore Curry | Pack | India | 3.75 |  |


In [16]:
# the indexing operator with a single column label returns a Series
df_ramen['Brand']

0            New Touch
1             Just Way
2               Nissin
3              Wei Lih
4       Ching's Secret
             ...      
2575             Vifon
2576           Wai Wai
2577           Wai Wai
2578           Wai Wai
2579          Westbrae
Name: Brand, Length: 2580, dtype: object

In [17]:
# a list of column labels returns a DataFrame, even with one column
df_ramen[['Brand']].head()

|   | Brand |
|---|---|
| 0 | New Touch |
| 1 | Just Way |
| 2 | Nissin |
| 3 | Wei Lih |
| 4 | Ching's Secret |


In [18]:
# attribute access; works only when the column name is a valid Python identifier
df_ramen.Brand

0            New Touch
1             Just Way
2               Nissin
3              Wei Lih
4       Ching's Secret
             ...      
2575             Vifon
2576           Wai Wai
2577           Wai Wai
2578           Wai Wai
2579          Westbrae
Name: Brand, Length: 2580, dtype: object
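The three ways of selecting a column shown above differ in what they return and where they work. A sketch on a hypothetical two-column frame that borrows two column names from `ramen-ratings.csv`:

```python
import pandas as pd

# Hypothetical two-row frame reusing the 'Brand' and 'Top Ten' column names
df = pd.DataFrame({'Brand': ['New Touch', 'Nissin'], 'Top Ten': [None, None]})

as_series = df['Brand']     # single label -> Series
as_frame = df[['Brand']]    # list of labels -> one-column DataFrame
as_attr = df.Brand          # attribute access -> the same Series as df['Brand']

# 'Top Ten' contains a space, so only bracket access can reach it
top_ten = df['Top Ten']

print(type(as_series).__name__, type(as_frame).__name__)
```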

In [20]:
# indexing using .loc
df_ramen.loc[:, ['Country', 'Stars']].head()

|   | Country | Stars |
|---|---|---|
| 0 | Japan | 3.75 |
| 1 | Taiwan | 1.0 |
| 2 | USA | 2.25 |
| 3 | Taiwan | 2.75 |
| 4 | India | 3.75 |


In [21]:
# indexing using .iloc (positions 4 and 5 are the Country and Stars columns)
df_ramen.iloc[:, [4,5]].head()

|   | Country | Stars |
|---|---|---|
| 0 | Japan | 3.75 |
| 1 | Taiwan | 1.0 |
| 2 | USA | 2.25 |
| 3 | Taiwan | 2.75 |
| 4 | India | 3.75 |
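Hard-coding positions such as `[4, 5]` breaks silently if the column order changes. One option, sketched here on a two-row stand-in built from rows 0 and 2 of `ramen-ratings.csv` as displayed above, is to look the positions up with `Index.get_loc`:

```python
import pandas as pd

# Stand-in with the same first six columns as ramen-ratings.csv
df = pd.DataFrame({
    'Review #': [2580, 2578],
    'Brand': ['New Touch', 'Nissin'],
    'Variety': ["T's Restaurant Tantanmen", 'Cup Noodles Chicken Vegetable'],
    'Style': ['Cup', 'Cup'],
    'Country': ['Japan', 'USA'],
    'Stars': [3.75, 2.25],
})

by_label = df.loc[:, ['Country', 'Stars']]

# Translate column names into positions instead of writing [4, 5] by hand
positions = [df.columns.get_loc(name) for name in ['Country', 'Stars']]
by_position = df.iloc[:, positions]

print(positions)
```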


## Filtering a DataFrame with Booleans
- Filtering a DataFrame with Booleans is arguably the most human way to select data, because we pick rows through logic that we can phrase as a statement, for example "Get the rows whose area is greater than 50." Turning that into Python only takes a comparison on a column, such as `df['area'] > 50` for a hypothetical `area` column, which produces a Boolean Series holding Yes (True) or No (False) for every row. Comparing the whole DataFrame, as in `df > 50`, gives a Boolean DataFrame instead.
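The "area greater than 50" statement can be sketched with a hypothetical one-column DataFrame; the column name `area` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with an 'area' column
df = pd.DataFrame({'area': [30, 55, 80, 45]})

mask = df['area'] > 50   # Boolean Series: one True/False per row
large = df[mask]         # keep only the rows where mask is True

# Comparing the whole frame yields a Boolean DataFrame, not a Series
whole = df > 50

print(large)
```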

In [23]:
df = pd.read_csv('Titanic.csv')

df.head()

|   | Name | PClass | Age | Sex | Survived |
|---|---|---|---|---|---|
| 0 | Allen, Miss Elisabeth Walton | 1st | 29.0 | female | 1 |
| 1 | Allison, Miss Helen Loraine | 1st | 2.0 | female | 0 |
| 2 | Allison, Mr Hudson Joshua Creighton | 1st | 30.0 | male | 0 |
| 3 | Allison, Mrs Hudson JC (Bessie Waldo Daniels) | 1st | 25.0 | female | 0 |
| 4 | Allison, Master Hudson Trevor | 1st | 0.92 | male | 1 |


In [27]:
# Boolean Series: passengers who survived
survived = df['Survived'] == 1
print(survived)

0        True
1       False
2       False
3       False
4        True
        ...  
1308    False
1309    False
1310    False
1311    False
1312    False
Name: Survived, Length: 1313, dtype: bool


In [28]:
df[survived]

|   | Name | PClass | Age | Sex | Survived |
|---|---|---|---|---|---|
| 0 | Allen, Miss Elisabeth Walton | 1st | 29.00 | female | 1 |
| 4 | Allison, Master Hudson Trevor | 1st | 0.92 | male | 1 |
| 5 | Anderson, Mr Harry | 1st | 47.00 | male | 1 |
| 6 | Andrews, Miss Kornelia Theodosia | 1st | 63.00 | female | 1 |
| 8 | Appleton, Mrs Edward Dale (Charlotte Lamson) | 1st | 58.00 | female | 1 |
| ... | ... | ... | ... | ... | ... |
| 1279 | Vartunian, Mr David | 3rd | 22.00 | male | 1 |
| 1289 | Wennerstrom, Mr August Edvard | 3rd | NaN | male | 1 |
| 1293 | Wilkes, Mrs Ellen | 3rd | 45.00 | female | 1 |
| 1302 | Yalsevac, Mr Ivan | 3rd | NaN | male | 1 |
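Because True counts as 1 and False as 0, a Boolean Series can also be summed or averaged, and it can be passed as the row selector of `.loc` together with a column label. A sketch on the five Titanic rows displayed above, so it runs without `Titanic.csv`:

```python
import pandas as pd

# The first five rows of Titanic.csv as displayed in this notebook
df = pd.DataFrame({
    'Name': ['Allen, Miss Elisabeth Walton', 'Allison, Miss Helen Loraine',
             'Allison, Mr Hudson Joshua Creighton',
             'Allison, Mrs Hudson JC (Bessie Waldo Daniels)',
             'Allison, Master Hudson Trevor'],
    'PClass': ['1st'] * 5,
    'Age': [29.0, 2.0, 30.0, 25.0, 0.92],
    'Sex': ['female', 'female', 'male', 'female', 'male'],
    'Survived': [1, 0, 0, 0, 1],
})

survived = df['Survived'] == 1
n_survived = survived.sum()        # number of True values
rate = survived.mean()             # share of True values
names = df.loc[survived, 'Name']   # Boolean mask for rows, label for the column

print(n_survived, rate)
```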


### 2. Male passengers who survived
- As before, we first build the Boolean Series; this time we need two of them: one for passengers who survived and one for passengers who are male. Here is how:


In [33]:
# Boolean Series: passengers who survived
survived = df['Survived'] == 1
# Boolean Series: male passengers
laki2 = df['Sex'] == 'male'
print(laki2)

0       False
1       False
2        True
3       False
4        True
        ...  
1308     True
1309     True
1310     True
1311     True
1312     True
Name: Sex, Length: 1313, dtype: bool


In [34]:
print(df[(survived) & (laki2)])

                               Name PClass    Age   Sex  Survived
4     Allison, Master Hudson Trevor    1st   0.92  male         1
5                Anderson, Mr Harry    1st  47.00  male         1
13         Barkworth, Mr Algernon H    1st    NaN  male         1
18       Beckwith, Mr Richard Leord    1st  37.00  male         1
20             Behr, Mr Karl Howell    1st  26.00  male         1
...                             ...    ...    ...   ...       ...
1242        Tenglin, Mr Gunr Isidor    3rd    NaN  male         1
1258    Tornquist, Mr William Henry    3rd  25.00  male         1
1279            Vartunian, Mr David    3rd  22.00  male         1
1289  Wennerstrom, Mr August Edvard    3rd    NaN  male         1
1302              Yalsevac, Mr Ivan    3rd    NaN  male         1

[142 rows x 5 columns]
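The masks must be joined with `&`, not Python's `and`, because `and` asks for a single True/False from a whole Series, which pandas refuses. The same filter can also be written with `DataFrame.query`. A sketch on a hypothetical four-row table that reuses the Titanic column names and the mask names from above:

```python
import pandas as pd

# Hypothetical mini passenger table using the Titanic.csv column names
df = pd.DataFrame({
    'Sex': ['female', 'male', 'male', 'female'],
    'Survived': [1, 1, 0, 0],
})
survived = df['Survived'] == 1
laki2 = df['Sex'] == 'male'

# & combines the two masks element by element
with_mask = df[survived & laki2]

# The same filter as a query string; columns are referenced by name
with_query = df.query("Survived == 1 and Sex == 'male'")

# `and` tries to turn a whole Series into one bool and raises ValueError
try:
    df[survived and laki2]
    and_failed = False
except ValueError:
    and_failed = True

print(with_mask)
```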


### 3. Female passengers who did not survive and were older than 40 or younger than 20
- Notice how many Boolean Series we need. Here there are four: passengers who did not survive, passengers who are female, passengers older than 40, and passengers younger than 20. The two age conditions are joined with `|` (or), and that result is joined to the other two with `&` (and). Let's create the Boolean Series.

In [36]:
# passengers who did not survive
not_survived = df['Survived'] == 0

# female passengers
perempuan = df['Sex'] == 'female'

# passengers older than 40
lebih_40 = df['Age'] > 40

# passengers younger than 20
kurang_20 = df['Age'] < 20

In [37]:
df[(not_survived) & (perempuan) & (lebih_40 | kurang_20)]

|   | Name | PClass | Age | Sex | Survived |
|---|---|---|---|---|---|
| 1 | Allison, Miss Helen Loraine | 1st | 2.0 | female | 0 |
| 149 | Isham, Miss Anne Elizabeth | 1st | 50.0 | female | 0 |
| 253 | Straus, Mrs Isidor (Ida Blun) | 1st | 63.0 | female | 0 |
| 363 | Carter, Mrs Ernest Courtey (Lillian Hughes) | 2nd | 44.0 | female | 0 |
| 442 | Hiltunen, Miss Marta | 2nd | 18.0 | female | 0 |
| 487 | Mack, Mrs Mary | 2nd | 57.0 | female | 0 |
| 626 | Andersson, Miss Ebba Iris | 3rd | 6.0 | female | 0 |
| 627 | Andersson, Miss Ellis An Maria | 3rd | 2.0 | female | 0 |
| 630 | Andersson, Miss Ingeborg Constancia | 3rd | 9.0 | female | 0 |
| 632 | Andersson, Miss Sigrid Elizabeth | 3rd | 11.0 | female | 0 |
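The `Age` column contains missing values (NaN), and every comparison with NaN is False, so passengers with an unknown age match neither `lebih_40` nor `kurang_20` and are left out. A sketch on hypothetical rows shows this, and also why the tempting rewrite `~df['Age'].between(20, 40)` is not quite equivalent:

```python
import numpy as np
import pandas as pd

# Hypothetical passengers; the third one has a missing age
df = pd.DataFrame({
    'Sex': ['female', 'female', 'female', 'female', 'male'],
    'Age': [2.0, 30.0, np.nan, 50.0, 45.0],
    'Survived': [0, 0, 0, 0, 0],
})
not_survived = df['Survived'] == 0
perempuan = df['Sex'] == 'female'
lebih_40 = df['Age'] > 40
kurang_20 = df['Age'] < 20

# NaN > 40 and NaN < 20 are both False, so the missing age is excluded
result = df[not_survived & perempuan & (lebih_40 | kurang_20)]

# between() is False for NaN, so its negation is True and the missing age slips in
negated = ~df['Age'].between(20, 40)
result_negated = df[not_survived & perempuan & negated]

# Excluding missing ages explicitly restores the original result
result_fixed = df[not_survived & perempuan & negated & df['Age'].notna()]

print(list(result.index), list(result_negated.index))
```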


## Data Transformation
- Data transformation means converting data into the format we want; in mathematics this is called a mapping. To transform data, we first need a function that maps the original form of the data to the desired final form.
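For a single column, `Series.map` applies such a mapping value by value, and the mapping can be either a dict or a function. A sketch with hypothetical values shaped like the Titanic `Sex` and `PClass` columns:

```python
import pandas as pd

# Hypothetical Sex values mapped to numeric codes through a dict
sex = pd.Series(['female', 'male', 'male', 'female'])
codes = sex.map({'female': 0, 'male': 1})

# Any one-argument function also works as the mapping
pclass = pd.Series(['1st', '2nd', '3rd'])
pclass_num = pclass.map(lambda s: int(s[0]))   # '1st' -> 1, '2nd' -> 2, '3rd' -> 3

print(list(codes), list(pclass_num))
```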



In [40]:
def minus_minimum(col):
    # subtract the column's minimum so that its smallest value becomes 0
    return col - col.min()

In [42]:
print(df.Age)

0       29.00
1        2.00
2       30.00
3       25.00
4        0.92
        ...  
1308    27.00
1309    26.00
1310    22.00
1311    24.00
1312    29.00
Name: Age, Length: 1313, dtype: float64


In [44]:
print(df[['Age']].apply(minus_minimum))

        Age
0     28.83
1      1.83
2     29.83
3     24.83
4      0.75
...     ...
1308  26.83
1309  25.83
1310  21.83
1311  23.83
1312  28.83

[1313 rows x 1 columns]
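`DataFrame.apply` calls the function once per column. For simple arithmetic like this, plain vectorized operations give the same result. The sketch restates the function so the block is self-contained and uses only the first five ages from the output above, so here the minimum is 0.92 rather than the 0.17 implied by the full data.

```python
import pandas as pd

def minus_minimum(col):
    # subtract the column's minimum so that its smallest value becomes 0
    return col - col.min()

# First five ages from Titanic.csv, as displayed above
ages = pd.DataFrame({'Age': [29.0, 2.0, 30.0, 25.0, 0.92]})

via_apply = ages.apply(minus_minimum)   # one call per column
via_vectorized = ages - ages.min()      # column-wise minimum, aligned by column name

print(via_apply)
```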
