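## Learning objectives

- Chapter 02 of the Pandas notes covers the most frequently used features of the Pandas DataFrame: viewing, sorting, and conditional filtering.

- Viewing, sorting, and conditional filtering are also the most heavily used features in Excel. Organizing and analyzing data properly without them is nearly impossible.

- Pandas makes viewing, sorting, and conditional filtering very convenient. As you use these features, you will find them intuitive and easy to work with.

- In particular, loc and iloc are used very often, so this chapter aims to make you thoroughly familiar with them.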

Install the package used to download datasets

In [1]:
!pip install opendata-kr -q

Import modules

In [2]:
from IPython.display import Image
import numpy as np
import pandas as pd
import seaborn as sns

- Dataset used in the exercises
- Titanic: analysis of passenger deaths and survivors

In [4]:
df = sns.load_dataset("titanic")
df.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


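#### Column descriptions

- survived: survival status (1: survived, 0: died)
- pclass: ticket class (1st, 2nd, 3rd)
- sex: sex
- age: age

- sibsp: number of siblings + spouses aboard
- parch: number of parents + children aboard
- fare: ticket fare

- embarked: port of embarkation (S, C, Q)
- class: same as pclass, stored as a label (First, Second, Third)
- who: man, woman, or child
- adult_male: whether the passenger is an adult male

- deck: deck (a letter such as A, B, C in this dataset)
- embark_town: name of the port of embarkation
- alive: survival status (yes, no)
- alone: whether the passenger traveled alone

#### Main goals

- Use Pandas to analyze the data on Titanic survivors and deaths.

- Based on the data, determine which passengers had a high survival rate and which had a low one.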

#### head() for the first rows / tail() for the last rows
- By default, 5 rows are returned.

- Pass a number in the parentheses to specify explicitly how many rows to return.

In [5]:
df.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [6]:
df.tail()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
886,0,2,male,27.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.45,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0,C,First,man,True,C,Cherbourg,yes,True
890,0,3,male,32.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
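
The row-count argument mentioned for head() and tail() can be sketched as follows. This is a minimal example on a hypothetical toy DataFrame (`toy` and its columns are made up for illustration, not part of the Titanic data).

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic DataFrame
toy = pd.DataFrame({"passenger": range(10), "fare": [7.25 + i for i in range(10)]})

first_three = toy.head(3)         # first 3 rows
last_two = toy.tail(2)            # last 2 rows
all_but_last_two = toy.head(-2)   # negative n: every row except the last 2

print(first_three)
print(last_two)
print(len(all_but_last_two))
```

A negative count is handy when you want "everything except the last few rows" without computing the length yourself.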


#### info()
- Shows information about each column.
- Use it to check the non-null count and the data type (dtype) of each column.

In [7]:
# The object dtype can simply be thought of as a string.
# There is also a category dtype: it holds strings too, but marks columns whose values fall into categories, such as 'male' / 'female'.
# The category dtype is covered later.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB


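#### value_counts()
- Use it to check the distribution of values in a column.
- To see the distribution of male and female passengers, run the following. (Use the who column instead to split by man, woman, and child.)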

In [8]:
df['sex'].value_counts()

male      577
female    314
Name: sex, dtype: int64
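
Beyond raw counts, value_counts() can also return proportions or include missing values. The sketch below uses a hypothetical toy Series (`who` here is invented data, not the Titanic column) to show the `normalize` and `dropna` parameters.

```python
import pandas as pd

# Hypothetical toy column; None marks a missing value
who = pd.Series(["man", "woman", "man", "child", None, "man"], name="who")

counts = who.value_counts()                     # counts, missing values excluded
shares = who.value_counts(normalize=True)       # proportions instead of counts
with_missing = who.value_counts(dropna=False)   # also counts the missing value

print(counts)
print(shares)
print(with_missing)
```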

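#### Attributes
##### Attributes are accessed without parentheses; they are not method calls.
##### The most frequently used DataFrame attributes are as follows.

- ndim: the number of dimensions. A DataFrame returns 2.

- shape: the number of rows and columns, as a tuple

- index: the row index. By default this is a RangeIndex.

- columns: the column labels

- values: all values, returned as a NumPy array

- T: transpose, which swaps the index and column axes.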

In [9]:
df.ndim

2

In [10]:
df.shape

(891, 15)

In [11]:
df.index

RangeIndex(start=0, stop=891, step=1)

In [12]:
df.columns

Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')

In [13]:
df.values

array([[0, 3, 'male', ..., 'Southampton', 'no', False],
       [1, 1, 'female', ..., 'Cherbourg', 'yes', False],
       [1, 3, 'female', ..., 'Southampton', 'yes', True],
       ...,
       [0, 3, 'female', ..., 'Southampton', 'no', False],
       [1, 1, 'male', ..., 'Cherbourg', 'yes', True],
       [0, 3, 'male', ..., 'Queenstown', 'no', True]], dtype=object)
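
Because these are attributes, their results can be used directly, for example by unpacking shape. The sketch below uses a hypothetical toy DataFrame; `to_numpy()` is shown as the method-style counterpart of `values`.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({"age": [22.0, 38.0, 26.0], "fare": [7.25, 71.2833, 7.925]})

n_rows, n_cols = toy.shape   # shape is a (rows, columns) tuple
arr = toy.to_numpy()         # the same data as toy.values, as a NumPy array
transposed = toy.T           # column labels become the index and vice versa

print(n_rows, n_cols)
print(arr.shape, transposed.shape)
```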

In [14]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,881,882,883,884,885,886,887,888,889,890
survived,0,1,1,1,0,0,0,0,1,1,...,0,0,0,0,0,0,1,0,1,0
pclass,3,1,3,1,3,3,1,3,3,2,...,3,3,2,3,3,2,1,3,1,3
sex,male,female,female,female,male,male,male,male,female,female,...,male,female,male,male,female,male,female,female,male,male
age,22.0,38.0,26.0,35.0,35.0,,54.0,2.0,27.0,14.0,...,33.0,22.0,28.0,25.0,39.0,27.0,19.0,,26.0,32.0
sibsp,1,1,0,1,0,0,0,3,0,1,...,0,0,0,0,0,0,0,1,0,0
parch,0,0,0,0,0,0,0,1,2,0,...,0,0,0,0,5,0,0,2,0,0
fare,7.25,71.2833,7.925,53.1,8.05,8.4583,51.8625,21.075,11.1333,30.0708,...,7.8958,10.5167,10.5,7.05,29.125,13.0,30.0,23.45,30.0,7.75
embarked,S,C,S,S,S,Q,S,S,S,C,...,S,S,S,S,Q,S,S,S,C,Q
class,Third,First,Third,First,Third,Third,First,Third,Third,Second,...,Third,Third,Second,Third,Third,Second,First,Third,First,Third
who,man,woman,woman,woman,man,man,man,child,woman,child,...,man,woman,man,man,woman,man,woman,woman,man,man


In [15]:
df

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


Type conversion (astype)

In [16]:
df['pclass'].astype('int32').head()

0    3
1    1
2    3
3    1
4    3
Name: pclass, dtype: int32

In [17]:
df['pclass'].astype('float32').head()

0    3.0
1    1.0
2    3.0
3    1.0
4    3.0
Name: pclass, dtype: float32

In [18]:
df['pclass'].astype('str').head()

0    3
1    1
2    3
3    1
4    3
Name: pclass, dtype: object

In [19]:
df['pclass'].astype('category').head()

0    3
1    1
2    3
3    1
4    3
Name: pclass, dtype: category
Categories (3, int64): [1, 2, 3]
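
Note that astype returns a new object and leaves the original column untouched. A minimal sketch on a hypothetical toy DataFrame shows that the result has to be assigned back for the change to persist.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({"pclass": [3, 1, 3, 2]})

converted = toy["pclass"].astype("category")   # returns a new Series
dtype_before = toy["pclass"].dtype             # the original is still an integer dtype

toy["pclass"] = toy["pclass"].astype("category")   # assign back to keep the change
categories = list(toy["pclass"].cat.categories)

print(dtype_before, toy["pclass"].dtype, categories)
```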

Sorting (sort)  
sort_index: sorting by index

- Sorts by index. (Ascending order is the default.)
- To sort in descending order, set the option ascending=False.

In [20]:
df.sort_index().head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [21]:
df.sort_index(ascending=False).head(5)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
890,0,3,male,32.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
889,1,1,male,26.0,0,0,30.0,C,First,man,True,C,Cherbourg,yes,True
888,0,3,female,,1,2,23.45,S,Third,woman,False,,Southampton,no,False
887,1,1,female,19.0,0,0,30.0,S,First,woman,False,B,Southampton,yes,True
886,0,2,male,27.0,0,0,13.0,S,Second,man,True,,Southampton,no,True


sort_values: sorting by value
- Sorts rows by value.

- Set by to the column to sort on.

- by can take two or more columns.

- Ascending or descending order can be specified per column.

In [22]:
df.sort_values(by='age').head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
803,1,3,male,0.42,0,1,8.5167,C,Third,child,False,,Cherbourg,yes,False
755,1,2,male,0.67,1,1,14.5,S,Second,child,False,,Southampton,yes,False
644,1,3,female,0.75,2,1,19.2583,C,Third,child,False,,Cherbourg,yes,False
469,1,3,female,0.75,2,1,19.2583,C,Third,child,False,,Cherbourg,yes,False
78,1,2,male,0.83,0,2,29.0,S,Second,child,False,,Southampton,yes,False


Descending sort: ascending=False

In [23]:
df.sort_values(by='age', ascending=False).head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
630,1,1,male,80.0,0,0,30.0,S,First,man,True,A,Southampton,yes,True
851,0,3,male,74.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
493,0,1,male,71.0,0,0,49.5042,C,First,man,True,,Cherbourg,no,True
96,0,1,male,71.0,0,0,34.6542,C,First,man,True,A,Cherbourg,no,True
116,0,3,male,70.5,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
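
The age column contains missing values, yet they do not appear at the top of either sort. The sketch below, on a hypothetical toy DataFrame, illustrates why: sort_values places NaN last by default in both directions, and the `na_position` parameter changes that.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing age
toy = pd.DataFrame({"name": ["a", "b", "c", "d"], "age": [35.0, np.nan, 2.0, 80.0]})

asc_sorted = toy.sort_values(by="age")                      # NaN goes last
desc_sorted = toy.sort_values(by="age", ascending=False)    # NaN still goes last
nan_first = toy.sort_values(by="age", na_position="first")  # NaN goes first

print(asc_sorted["name"].tolist())
print(desc_sorted["name"].tolist())
print(nan_first["name"].tolist())
```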


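String columns can also be sorted in ascending or descending order, and they sort alphabetically. Note that class is a category column, so it sorts by its category order (First, Second, Third), which here happens to match alphabetical order.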

In [24]:
df.sort_values(by='class', ascending=False).head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
511,0,3,male,,0,0,8.05,S,Third,man,True,,Southampton,no,True
500,0,3,male,17.0,0,0,8.6625,S,Third,man,True,,Southampton,no,True
501,0,3,female,21.0,0,0,7.75,Q,Third,woman,False,,Queenstown,no,True
502,0,3,female,,0,0,7.6292,Q,Third,woman,False,,Queenstown,no,True
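
The difference between sorting strings and sorting a category column can be sketched with hypothetical labels whose category order differs from alphabetical order (the `sizes` data is invented for illustration).

```python
import pandas as pd

# Hypothetical labels: the logical order is low < mid < high
sizes = ["mid", "low", "high", "low"]

as_text = pd.Series(sizes)
as_category = pd.Series(
    pd.Categorical(sizes, categories=["low", "mid", "high"], ordered=True)
)

text_order = as_text.sort_values().tolist()          # alphabetical order
category_order = as_category.sort_values().tolist()  # follows the category order

print(text_order)
print(category_order)
```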


You can sort by two or more columns.

In [25]:
df.sort_values(by=['fare', 'age']).head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
302,0,3,male,19.0,0,0,0.0,S,Third,man,True,,Southampton,no,True
271,1,3,male,25.0,0,0,0.0,S,Third,man,True,,Southampton,yes,True
179,0,3,male,36.0,0,0,0.0,S,Third,man,True,,Southampton,no,True
822,0,1,male,38.0,0,0,0.0,S,First,man,True,,Southampton,no,True
806,0,1,male,39.0,0,0,0.0,S,First,man,True,A,Southampton,no,True


Ascending or descending order can also be set for each column individually.

In [26]:
df.sort_values(by=['fare', 'age'], ascending=[False, True]).head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
258,1,1,female,35.0,0,0,512.3292,C,First,woman,False,,Cherbourg,yes,True
737,1,1,male,35.0,0,0,512.3292,C,First,man,True,B,Cherbourg,yes,True
679,1,1,male,36.0,0,1,512.3292,C,First,man,True,B,Cherbourg,yes,False
27,0,1,male,19.0,3,2,263.0,S,First,man,True,C,Southampton,no,False
88,1,1,female,23.0,3,2,263.0,S,First,woman,False,C,Southampton,yes,False


Indexing, slicing, and conditional filtering

#### loc - indexing / slicing
- loc supports both indexing and slicing by label.
- Note the slicing rule [start (inclusive): end (inclusive)]. Both ends are included.

In [27]:
df.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


indexing

In [28]:
df.loc[1, 'class']

'First'

In [32]:
df.loc[2:5,['age', 'fare', 'who']]

Unnamed: 0,age,fare,who
2,26.0,7.925,woman
3,35.0,53.1,woman
4,35.0,8.05,man
5,,8.4583,man


slicing

In [33]:
df.loc[2:5, 'class':'deck'].head()

Unnamed: 0,class,who,adult_male,deck
2,Third,woman,False,
3,First,woman,False,C
4,Third,man,True,
5,Third,man,True,


In [34]:
df.loc[:6, 'class':'deck']

Unnamed: 0,class,who,adult_male,deck
0,Third,man,True,
1,First,woman,False,C
2,Third,woman,False,
3,First,woman,False,C
4,Third,man,True,
5,Third,man,True,
6,First,man,True,E


### loc - conditional filter
Build a boolean index to extract only the rows that satisfy a condition.

In [35]:
cond = (df['age'] >= 70)
cond

0      False
1      False
2      False
3      False
4      False
       ...  
886    False
887    False
888    False
889    False
890    False
Name: age, Length: 891, dtype: bool

In [36]:
df.loc[cond]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
96,0,1,male,71.0,0,0,34.6542,C,First,man,True,A,Cherbourg,no,True
116,0,3,male,70.5,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
493,0,1,male,71.0,0,0,49.5042,C,First,man,True,,Cherbourg,no,True
630,1,1,male,80.0,0,0,30.0,S,First,man,True,A,Southampton,yes,True
672,0,2,male,70.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
745,0,1,male,70.0,1,1,71.0,S,First,man,True,B,Southampton,no,False
851,0,3,male,74.0,0,0,7.775,S,Third,man,True,,Southampton,no,True


### loc - multiple conditions
For multiple conditions, define each condition first, then combine them with the & (and) and | (or) operators.

In [37]:
# Define condition 1
cond1 = (df['fare'] > 30)

# Define condition 2
cond2 = (df['who'] == 'woman')

In [38]:
df.loc[cond1 & cond2]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
25,1,3,female,38.0,1,5,31.3875,S,Third,woman,False,,Southampton,yes,False
31,1,1,female,,1,0,146.5208,C,First,woman,False,B,Cherbourg,yes,False
52,1,1,female,49.0,1,0,76.7292,C,First,woman,False,D,Cherbourg,yes,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
853,1,1,female,16.0,0,1,39.4000,S,First,woman,False,D,Southampton,yes,False
856,1,1,female,45.0,1,1,164.8667,S,First,woman,False,,Southampton,yes,False
863,0,3,female,,8,2,69.5500,S,Third,woman,False,,Southampton,no,False
871,1,1,female,47.0,1,1,52.5542,S,First,woman,False,D,Southampton,yes,False


In [39]:
df.loc[cond1 | cond2]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
6,0,1,male,54.0,0,0,51.8625,S,First,man,True,E,Southampton,no,True
8,1,3,female,27.0,0,2,11.1333,S,Third,woman,False,,Southampton,yes,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
880,1,2,female,25.0,0,1,26.0000,S,Second,woman,False,,Southampton,yes,False
882,0,3,female,22.0,0,0,10.5167,S,Third,woman,False,,Southampton,no,True
885,0,3,female,39.0,0,5,29.1250,Q,Third,woman,False,,Queenstown,no,False
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
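
Conditions can also be written inline instead of being named first. In that case each comparison needs its own parentheses, because & and | bind more tightly than > and ==. The sketch below uses a hypothetical toy DataFrame and additionally shows ~ for negating a mask.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "fare": [7.25, 71.28, 7.93, 51.86, 31.39],
    "who": ["man", "woman", "woman", "man", "woman"],
})

# Each comparison is wrapped in parentheses before combining
both = toy.loc[(toy["fare"] > 30) & (toy["who"] == "woman")]
either = toy.loc[(toy["fare"] > 30) | (toy["who"] == "woman")]
not_woman = toy.loc[~(toy["who"] == "woman")]   # ~ inverts a boolean mask

print(both.index.tolist())
print(either.index.tolist())
print(not_woman.index.tolist())
```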


In [40]:
# Assigning data after a conditional filter
cond = (df['age'] >= 70)
cond

0      False
1      False
2      False
3      False
4      False
       ...  
886    False
887    False
888    False
889    False
890    False
Name: age, Length: 891, dtype: bool

In [41]:
# Select only the age column.

df.loc[cond, 'age']

96     71.0
116    70.5
493    71.0
630    80.0
672    70.0
745    70.0
851    74.0
Name: age, dtype: float64

In [42]:
# After filtering, you can assign a new value. (Be sure to select a single column.)
# Note: this modifies df in place.

df.loc[cond, 'age'] = -1

In [43]:
df.loc[cond]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
96,0,1,male,-1.0,0,0,34.6542,C,First,man,True,A,Cherbourg,no,True
116,0,3,male,-1.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
493,0,1,male,-1.0,0,0,49.5042,C,First,man,True,,Cherbourg,no,True
630,1,1,male,-1.0,0,0,30.0,S,First,man,True,A,Southampton,yes,True
672,0,2,male,-1.0,0,0,10.5,S,Second,man,True,,Southampton,no,True
745,0,1,male,-1.0,1,1,71.0,S,First,man,True,B,Southampton,no,False
851,0,3,male,-1.0,0,0,7.775,S,Third,man,True,,Southampton,no,True
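
Since assignment through loc changes the DataFrame in place, one option is to work on a copy when the original values are still needed. This is a minimal sketch on a hypothetical toy DataFrame, not a required step of the workflow shown here.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({"age": [22.0, 71.0, 35.0, 80.0], "who": ["man", "man", "woman", "man"]})

edited = toy.copy()              # work on a copy so toy keeps its original ages
cond = edited["age"] >= 70       # boolean mask computed before the assignment
edited.loc[cond, "age"] = -1     # row mask + column label in a single loc call

print(edited["age"].tolist())
print(toy["age"].tolist())
```

The mask is computed once before the assignment, so it still selects the same rows afterwards even though their ages no longer satisfy the condition.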


### iloc
- Similar to loc, but accepts only integer positions.

- Like loc, it supports both indexing and slicing. Unlike loc, an iloc slice excludes the end position.

In [44]:
df.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [45]:
# indexing
df.iloc[1, 3]

38.0

In [46]:
# Fancy Indexing
df.iloc[[0, 3, 4], [0, 1, 5, 6]]

Unnamed: 0,survived,pclass,parch,fare
0,0,3,0,7.25
3,1,1,0,53.1
4,0,3,0,8.05


In [47]:
# Slicing
df.iloc[:3, :5]

Unnamed: 0,survived,pclass,sex,age,sibsp
0,0,3,male,22.0,1
1,1,1,female,38.0,1
2,1,3,female,26.0,0
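
With the default RangeIndex, labels and positions coincide, which can hide the difference between loc and iloc. The sketch below uses a hypothetical toy DataFrame whose index labels deliberately differ from the row positions.

```python
import pandas as pd

# Hypothetical toy data; the labels 10..40 differ from the positions 0..3
toy = pd.DataFrame(
    {"age": [22.0, 38.0, 26.0, 35.0], "fare": [7.25, 71.28, 7.93, 53.10]},
    index=[10, 20, 30, 40],
)

by_label = toy.loc[10:30]      # label slice: 10, 20 and 30 (end included)
by_position = toy.iloc[0:2]    # position slice: rows 0 and 1 (end excluded)
single = toy.iloc[1, 0]        # second row, first column

print(by_label.index.tolist())
print(by_position.index.tolist())
print(single)
```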


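### isin
- Use the isin method to check whether values are contained in a given list. (Python's in keyword cannot be used for this: on a Series it checks the index labels, not the values.)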

In [48]:
sample = pd.DataFrame({'name': ['kim', 'lee', 'park', 'choi'], 
                        'age': [24, 27, 34, 19]
                      })
sample

Unnamed: 0,name,age
0,kim,24
1,lee,27
2,park,34
3,choi,19


In [49]:
sample['name'].isin(['kim', 'lee'])

0     True
1     True
2    False
3    False
Name: name, dtype: bool

In [50]:
sample.isin(['kim', 'lee'])

Unnamed: 0,name,age
0,True,False
1,True,False
2,False,False
3,False,False


In [51]:
# isin also pairs very well with conditional filtering through loc.

condition = sample['name'].isin(['kim', 'lee'])

In [52]:
sample.loc[condition]

Unnamed: 0,name,age
0,kim,24
1,lee,27
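
The remark about the in keyword can be checked directly. The sketch below, on a hypothetical Series of names, contrasts in with isin and shows how ~ turns the mask into an exclusion filter.

```python
import pandas as pd

# Hypothetical names for illustration
names = pd.Series(["kim", "lee", "park", "choi"])

in_values = "kim" in names          # False: `in` looks at the index labels 0..3
in_index = 0 in names               # True: 0 is an index label
mask = names.isin(["kim", "lee"])   # element-wise membership test
excluded = names[~mask]             # everything except kim and lee

print(in_values, in_index)
print(excluded.tolist())
```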


In [53]:
df.head(1)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False


In [54]:
df['embark_town'].value_counts()

Southampton    644
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64