# Pandas
```index```
+ [Python for Data Science, Part 10: From Installing Pandas to Using Series](https://datawokagaku.com/pandas_series/)
+ [Python for Data Science, Part 11: Creating a Pandas DataFrame and Reading CSV Files](https://datawokagaku.com/dataframe/)
+ [Python for Data Science, Part 12: DataFrame Basics (head, describe, Getting a Series, etc.)](https://datawokagaku.com/dataframe_howto_1/)
+ [Python for Data Science, Part 13: The Basics of Filtering a DataFrame (Very Important)](https://datawokagaku.com/dataframe_filter/)
+ [Python for Data Science, Part 14: Handling Missing Values (NaN) in a DataFrame](https://datawokagaku.com/dataframe_nan/)
+ [Python for Data Science, Part 15: Mastering DataFrame groupby](https://datawokagaku.com/dataframe_groupby/)
+ [Python for Data Science, Part 16: A Complete Guide to Joining DataFrame Tables (merge, join, concat)](https://datawokagaku.com/dataframe_merge/)
+ [Python for Data Science, Part 17: Key DataFrame Functions (apply, unique, value_counts) Explained Clearly](https://datawokagaku.com/dataframe_apply/)
+ [Python for Data Science, Part 18: Other Frequently Used DataFrame Functions (to_csv, iterrows, sort_values)](https://datawokagaku.com/dataframe_func1/)
+ [Python for Data Science, Part 19: Other Frequently Used DataFrame Functions (pivot_table, xs)](https://datawokagaku.com/dataframe_func2/)


---
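## 10: From Installation to Using Series
    Pandas is a Python library built for data manipulation and analysis; it uses NumPy internally.
    It is especially good at processing tabular data, so the kind of work you would do in Excel can be done in Python.

### | Importing Pandas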

In [None]:
import pandas as pd

In [None]:
# Check where Pandas is installed
pd.__file__

### | Using Series
    We use the Series class provided by Pandas.
    Series: a data structure that represents a single row or a single column cut out of tabular data

<img src='https://datawokagaku.com/wp-content/uploads/2020/01/dataframe_yougo.png' width=65%>

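+ Table: data in tabular form
+ header: the row of names lined up at the very top of the table
+ column: each individual item that makes up the header
+ row: each line of the table, where the data is laid out in the order of the header's columns
+ record: another word for a row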

<img src='https://datawokagaku.com/wp-content/uploads/2020/01/dataframe_series.png' width=65%>

+ DataFrame: Pandas handles this table with a data structure called DataFrame.
+ Series: each row is handled with a data structure called Series. (A column cut out vertically is also a Series.)
(In other words, picture a DataFrame as a collection of Series.)
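
The relationship between the two structures can be checked directly. The following is a minimal sketch using a hypothetical three-record table (the names `table`, `row`, and `column` are made up for illustration): both a row and a column taken from a DataFrame come back as a Series.

```python
import pandas as pd

# Hypothetical three-record table, used only for illustration.
table = pd.DataFrame({
    'name': ['John', 'Zack', 'Emily'],
    'age': [22, 30, 32],
})

row = table.iloc[0]     # one record (row), cut out horizontally
column = table['age']   # one column, cut out vertically

print(type(row).__name__)
print(type(column).__name__)
```

Both `print` calls report `Series`, which matches the picture of a DataFrame as a collection of Series.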


In [None]:
# Creating one is easy: build a dictionary and pass it to pd.Series()
data = {
    'name':'John',
    'sex':'male',
    'age':22
}

john_s = pd.Series(data)
print(john_s)
print("\nJohn's age is {}.".format(john_s['age']))

In [None]:
# A Series can also be created from a NumPy array
import numpy as np
array = np.array([100, 200, 300])
array = pd.Series(array)
print(array)

In [None]:
array = array.rename(index={0:'a',1:'b',2:'c'})
print(array)

## 11: Creating a DataFrame and Reading CSV Files

### | How to create a DataFrame

#### - ndarray → df

In [None]:
import pandas as pd 
import numpy as np


ndarray = np.random.randint(5, size=(5, 4))
pd.DataFrame(data=ndarray)

In [None]:
columns = ['a', 'b', 'c', 'd']
index = np.arange(0, 50, 10)
pd.DataFrame(data=ndarray, index=index, columns=columns)

#### - dictionary → df

In [None]:
data1 = {
    'name':'John',
    'sex':'male',
    'age':22
}
data2 = {
    'name':'Zack',
    'sex':'male',
    'age':30
}
data3 = {
    'name':'Emily',
    'sex':'female',
    'age':32
}
pd.DataFrame([data1, data2, data3])

#### - file → df

In [None]:
pd.read_csv('url')

## 12: Basic DataFrame Operations

In [None]:
import pandas as pd
df = pd.read_csv('train.csv')

#### - .head()

In [None]:
df.head()

#### - .describe()

In [None]:
# Check the summary statistics
df.describe()

#### - .columns

In [None]:
df.columns

### | [] (brackets): Extracting specific columns (Series)

In [None]:
age_df = df['Age']
age_df

In [None]:
type(age_df)

In [None]:
df[['Age', 'Parch', 'Fare']].head()

#### - .iloc[int]
(index location)

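>```memo```
>
>     As a rule, column names are strings and the index is numeric.
>     Occasionally an ID (PassengerId in the Titanic data) is set as the index,
>    but in most cases rows are retrieved with ```.iloc[]```. I rarely use ```.loc[]```.
>
>     Give columns meaningful string names whenever possible. Meaningless numbers are confusing, so avoid them.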
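
To make the difference between the two accessors concrete, here is a minimal sketch with a hypothetical table (`people`) whose index labels 10, 20, 30 were deliberately chosen to differ from the row positions. `.iloc[]` selects by position, while `.loc[]` selects by index label.

```python
import pandas as pd

# Hypothetical table; the index labels differ from the positions on purpose.
people = pd.DataFrame(
    {'name': ['John', 'Zack', 'Emily'], 'age': [22, 30, 32]},
    index=[10, 20, 30],
)

by_position = people.iloc[0]   # first row, whatever its label is
by_label = people.loc[10]      # row whose index label is 10

print(by_position['name'], by_label['name'])
```

Both lookups return the same record here. With this index, `people.loc[0]` would raise a `KeyError` because no row carries the label 0.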

In [None]:
df.iloc[888]

In [None]:
df.iloc[888]['Age']

In [None]:
# Note that NaN is np.nan, not None!
import numpy as np
np.isnan(df.iloc[888]['Age'])

In [None]:
df.iloc[888]['Age'] is None

In [None]:
df2 = pd.read_csv('train.csv').iloc[5:10] # Slicing
df2.head()

#### - .drop()
>     To drop several columns, pass them as a list.
>     Note that drop() does not modify the original df.

In [None]:
df2.drop(5)

In [None]:
df2.drop('Age', axis=1)

In [None]:
df2

In [None]:
df2 = df2.drop(['Age', 'Parch'], axis=1) # overwrite the original df

In [None]:
df2

```Python
# Overwrite pattern 1: inplace=True
df = pd.read_csv('train.csv')
df.drop(['Age', 'Cabin'], axis=1, inplace=True)


# Overwrite pattern 2: reassign to the same variable name
df = pd.read_csv('train.csv')
df = df.drop(['Age', 'Cabin'], axis=1)
```


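>```memo```
>
>      I use the latter. It is obvious at a glance, and it is a common idiom in data science.
>      To people who have studied other programming languages, the latter may look odd.
>      There, you would usually store the result in a different variable and give it a meaningful new name,
>      for example df_drop.
>      In data science, however, a single variable often occupies a huge amount of memory.
>      The Titanic data is small because it is for practice, but in real work 10,000 records or more is perfectly normal.
>      If you keep copies of large data under several variables you soon run out of memory, so the standard practice is to reuse the same variable.
>      (Here the name df is being reused.)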

## 13: The Basics of Filtering a DataFrame (Very Important)

In [1]:
import pandas as pd

df = pd.read_csv('train.csv')

### | (Very important) Filtering by a condition (filter)

In [2]:
# Get a boolean Series from a condition
df['Survived']==1

0      False
1       True
2       True
3       True
4      False
       ...  
886    False
887     True
888    False
889     True
890    False
Name: Survived, Length: 891, dtype: bool

In [3]:
# df[filter condition] returns only the records that match the condition (like a WHERE clause in SQL).
filter = df['Survived']==1
df[filter]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C
...,...,...,...,...,...,...,...,...,...,...,...,...
875,876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15.0,0,0,2667,7.2250,,C
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
880,881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25.0,0,1,230433,26.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S


In [5]:
df[filter].describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,342.0,342.0,342.0,290.0,342.0,342.0,342.0
mean,444.368421,1.0,1.950292,28.34369,0.473684,0.464912,48.395408
std,252.35884,0.0,0.863321,14.950952,0.708688,0.771712,66.596998
min,2.0,1.0,1.0,0.42,0.0,0.0,0.0
25%,250.75,1.0,1.0,19.0,0.0,0.0,12.475
50%,439.5,1.0,2.0,28.0,0.0,0.0,26.0
75%,651.5,1.0,3.0,36.0,1.0,1.0,57.0
max,890.0,1.0,3.0,80.0,4.0,5.0,512.3292


In [6]:
# Records aged 60 or older
fill_age60 = df['Age']>=60
# Records in 1st class
fill_pclass = df['Pclass']==1
# Records of female passengers
fill_sex = df['Sex'] == 'female'

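#### - ()&() , ()|()
> Filtering with multiple conditions

>     Plain Python builds conditions with and / or,
>     but note that DataFrame filters use & and | instead. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators.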
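
The following sketch illustrates why `and` cannot be used here. It relies on a hypothetical three-row table (`people`) rather than the Titanic data: `and` asks each Series for a single truth value and fails, whereas `&` and `|` combine the conditions element by element.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
people = pd.DataFrame({
    'age': [65, 70, 30],
    'sex': ['female', 'male', 'female'],
})

is_old = people['age'] >= 60
is_female = people['sex'] == 'female'

try:
    people[is_old and is_female]   # `and` needs one True/False per operand
except ValueError as err:
    print('ValueError:', err)

both = people[is_old & is_female]     # element-wise AND
either = people[is_old | is_female]   # element-wise OR
print(len(both), len(either))
```

Storing each condition in its own variable, as done above, also avoids the parentheses that are needed when the comparisons are written inline.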

In [7]:
# DataFrame of women aged 60 or older
df[(df['Age']>=60) & (df['Sex']=='female')]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
275,276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63.0,1,0,13502,77.9583,D7,S
366,367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60.0,1,0,110813,75.25,D37,C
483,484,1,3,"Turkula, Mrs. (Hedwig)",female,63.0,0,0,4134,9.5875,,S
829,830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62.0,0,0,113572,80.0,B28,


In [8]:
# DataFrame of 1st-class passengers or children under 10
df[(df['Pclass']==1) | (df['Age']<10)]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
...,...,...,...,...,...,...,...,...,...,...,...,...
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S


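#### - ~ (tilde, also called squiggle)
> NOT filter

>     Inverts the truth value of the condition (the NOT operation, also called inversion).
>     It is especially handy when filtering on a column whose values are boolean.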

In [9]:
# e.g. when Survived holds boolean values
data = [
    {'Name':'John', 'Survived':True},
    {'Name':'Emily', 'Survived':False},
    {'Name':'Ben', 'Survived':True},
]
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Survived
0,John,True
1,Emily,False
2,Ben,True


In [10]:
df[df['Survived']==True] # the values are already bool, so ==True can be omitted

Unnamed: 0,Name,Survived
0,John,True
2,Ben,True


In [11]:
f = df[df['Survived']==False]
s = df[~df['Survived']]
print(f, '\n', s)

    Name  Survived
1  Emily     False 
     Name  Survived
1  Emily     False


### | Changing the index

#### - .reset_index()
> Reassign the index from scratch

In [12]:
df2 = pd.read_csv('train.csv')
df2 = df2[df2['Sex']=='male']
df2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S


In [13]:
df2 = df2.reset_index()
df2.head()

Unnamed: 0,index,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
2,5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
3,6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
4,7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S
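
As the output above shows, `reset_index()` keeps the old index as a new `index` column. If that column is not needed, the `drop=True` argument discards it. A minimal sketch with a hypothetical three-row table (`people`):

```python
import pandas as pd

# Hypothetical data, used only for illustration.
people = pd.DataFrame({
    'name': ['John', 'Emily', 'Ben'],
    'sex': ['male', 'female', 'male'],
})
males = people[people['sex'] == 'male']   # the index is now 0, 2

kept = males.reset_index()               # old index saved in an 'index' column
dropped = males.reset_index(drop=True)   # old index discarded

print(list(kept.columns))
print(list(dropped.columns))
```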


#### - .set_index()
> Make a specific column the index

In [14]:
# Make Name the index
df2 = df2.set_index('Name')
df2

Unnamed: 0_level_0,index,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
"Braund, Mr. Owen Harris",0,1,0,3,male,22.0,1,0,A/5 21171,7.2500,,S
"Allen, Mr. William Henry",4,5,0,3,male,35.0,0,0,373450,8.0500,,S
"Moran, Mr. James",5,6,0,3,male,,0,0,330877,8.4583,,Q
"McCarthy, Mr. Timothy J",6,7,0,1,male,54.0,0,0,17463,51.8625,E46,S
"Palsson, Master. Gosta Leonard",7,8,0,3,male,2.0,3,1,349909,21.0750,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
"Banfield, Mr. Frederick James",883,884,0,2,male,28.0,0,0,C.A./SOTON 34068,10.5000,,S
"Sutehall, Mr. Henry Jr",884,885,0,3,male,25.0,0,0,SOTON/OQ 392076,7.0500,,S
"Montvila, Rev. Juozas",886,887,0,2,male,27.0,0,0,211536,13.0000,,S
"Behr, Mr. Karl Howell",889,890,1,1,male,26.0,0,0,111369,30.0000,C148,C


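## 14: Handling Missing Values (NaN) in a DataFrame

### | About NaN in a DataFrame

>```Review```
>
> + NaN stands for Not a Number
> + It is the same as np.nan
> + Use np.isnan() to test for NaN (pd.isna(), covered later, also works)
> + It is not the same thing as None
> + DataFrames generally use NaN for missing values.

> Incidentally, blank values in a CSV or Excel file become NaN when loaded. Note that they do not become None!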
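
The points in the review can be verified on a single value. This is a minimal sketch; the variable name `value` is arbitrary.

```python
import numpy as np
import pandas as pd

value = np.nan

print(value is None)     # NaN is a float, so it is not None
print(value == value)    # NaN compares unequal to everything, itself included
print(np.isnan(value))   # the dedicated check is required
print(pd.isna(value), pd.isna(None))   # pd.isna treats both as missing
```

Because `NaN == NaN` is false, a filter such as `df['Age'] == np.nan` never matches anything. That is why a dedicated function is used for the check.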


In [16]:
import numpy as np
import pandas as pd
df = pd.read_csv('train.csv')
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


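### | DataFrame functions for NaN

#### - .dropna()
> Default axis=0: drops the records (rows) that contain NaN
>> Passing axis=1 drops the columns that contain NaN

```memo```

    axis=1 is for the strategy of reducing the explanatory variables (features) of a model without reducing the number of records,
    but in practice you almost never drop a feature just because it contains a single NaN.
    Which features matter for building a model is a very important question that deserves careful thought.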
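
Before dropping anything, it usually helps to see how many NaN each column contains. The sketch below uses a hypothetical three-row table (`passengers`) to compare the two `axis` settings; `isna().sum()` is one common way to count missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
passengers = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0],
    'Cabin': [np.nan, 'C85', np.nan],
    'Fare': [7.25, 71.28, 7.92],
})

nan_counts = passengers.isna().sum()    # number of NaN per column
rows_kept = passengers.dropna()         # axis=0: drop rows containing NaN
cols_kept = passengers.dropna(axis=1)   # axis=1: drop columns containing NaN

print(nan_counts.to_dict())
print(len(rows_kept), list(cols_kept.columns))
```

In this toy table every row has a NaN somewhere, so `dropna()` leaves no rows at all, while `axis=1` keeps only `Fare`. Both are blunt tools, which is the point of the memo above.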

In [19]:
df.dropna().tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C


In [22]:
# Drop only the records where a specific column is NaN (very common in real work)
# Note that the columns are passed as a list; wrap even a single column in a list
df.dropna(subset=['Age'])

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39.0,0,5,382652,29.1250,,Q
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


#### - .fillna(value)
> Replace NaN with the given value

In [23]:
df.fillna('This is it!').tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13.0,This is it!,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,This is it!,1,2,W./C. 6607,23.45,This is it!,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,This is it!,Q


In [24]:
df['Age'].mean()

29.69911764705882

In [28]:
# Fill only the NaN in the Age column with the mean of Age
df['Age'].fillna(df['Age'].mean())

0      22.000000
1      38.000000
2      26.000000
3      35.000000
4      35.000000
         ...    
886    27.000000
887    19.000000
888    29.699118
889    26.000000
890    32.000000
Name: Age, Length: 891, dtype: float64

In [29]:
df['Age'] = df['Age'].fillna(df['Age'].mean())
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,29.699118,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


#### - pd.isna()

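> ```Review```
>
>     np.isnan() can test the values of a DataFrame for NaN.
>     However, np.isnan() is awkward to use: you have to loop over the values one by one, and it raises an error when given strings.
>
>     To test the values in a DataFrame for NaN, use pd.isna() instead.
>     (pd.isnull() is the same function. pd.isna() is the newer name, so use it unless you have a specific reason not to. Note that, unlike np.isnan, there is no trailing n.)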
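
A minimal sketch of the difference, using a hypothetical string column (`cabins`): `np.isnan()` rejects a string, while `pd.isna()` checks the whole Series in one call without a loop.

```python
import numpy as np
import pandas as pd

# Hypothetical string column with one missing value.
cabins = pd.Series(['C85', np.nan, 'B42'])

try:
    np.isnan('C85')   # strings are not supported by np.isnan
except TypeError as err:
    print('TypeError:', err)

missing = pd.isna(cabins)   # works on the whole Series, whatever it holds
print(missing.tolist())
```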

In [31]:
df2 = pd.read_csv('train.csv')
pd.isna(df2).tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,False,False,False,False,False,False,False,False,False,False,True,False
887,False,False,False,False,False,False,False,False,False,False,False,False
888,False,False,False,False,False,True,False,False,False,False,True,False
889,False,False,False,False,False,False,False,False,False,False,False,False
890,False,False,False,False,False,False,False,False,False,False,True,False


It is often used on a Series, for example when you want to keep the NaN check result in a separate column:

In [35]:
# Add a Cabin_nan column and store the NaN check result for Cabin in it
df2['Cabin_nan'] = pd.isna(df2['Cabin'])
df2.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Cabin_nan
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S,True
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S,False
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,True
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,False
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,True


## 15: DataFrame groupby

### | groupby

#### - .groupby().function()

In [36]:
df = pd.read_csv('train.csv')

In [38]:
# Group by Pclass (numeric_only=True skips string columns such as Name)
df.groupby('Pclass').mean(numeric_only=True) # .sum(), .count(), .describe(), etc. also work

Unnamed: 0_level_0,PassengerId,Survived,Age,SibSp,Parch,Fare
Pclass,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,461.597222,0.62963,38.233441,0.416667,0.356481,84.154687
2,445.956522,0.472826,29.87763,0.402174,0.380435,20.662183
3,439.154786,0.242363,25.14062,0.615071,0.393075,13.67555


In [43]:
# As a check, filter to class 1 and compute the summary statistics
df[df['Pclass'] == 1].describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,216.0,216.0,216.0,186.0,216.0,216.0,216.0
mean,461.597222,0.62963,1.0,38.233441,0.416667,0.356481,84.154687
std,246.737616,0.484026,0.0,14.802856,0.611898,0.693997,78.380373
min,2.0,0.0,1.0,0.92,0.0,0.0,0.0
25%,270.75,0.0,1.0,27.0,0.0,0.0,30.92395
50%,472.0,1.0,1.0,37.0,0.0,0.0,60.2875
75%,670.5,1.0,1.0,49.0,1.0,0.0,93.5
max,890.0,1.0,1.0,80.0,3.0,4.0,512.3292


In [44]:
# To focus on the mean only:
df[df['Pclass'] == 1].describe().loc['mean']

PassengerId    461.597222
Survived         0.629630
Pclass           1.000000
Age             38.233441
SibSp            0.416667
Parch            0.356481
Fare            84.154687
Name: mean, dtype: float64

In [54]:
# The result is of course a DataFrame too, so a column can be selected from it, as in the next cell.
df.groupby('Pclass').describe()

Unnamed: 0_level_0,PassengerId,PassengerId,PassengerId,PassengerId,PassengerId,PassengerId,PassengerId,PassengerId,Survived,Survived,...,Parch,Parch,Fare,Fare,Fare,Fare,Fare,Fare,Fare,Fare
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Pclass,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
1,216.0,461.597222,246.737616,2.0,270.75,472.0,670.5,890.0,216.0,0.62963,...,0.0,4.0,216.0,84.154687,78.380373,0.0,30.92395,60.2875,93.5,512.3292
2,184.0,445.956522,250.852161,10.0,234.5,435.5,668.0,887.0,184.0,0.472826,...,1.0,3.0,184.0,20.662183,13.417399,0.0,13.0,14.25,26.0,73.5
3,491.0,439.154786,264.441453,1.0,200.0,432.0,666.5,891.0,491.0,0.242363,...,0.0,6.0,491.0,13.67555,11.778142,0.0,7.75,8.05,15.5,69.55


In [55]:
df.groupby('Pclass').describe()['Age']

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
Pclass,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,186.0,38.233441,14.802856,0.92,27.0,37.0,49.0,80.0
2,173.0,29.87763,14.001077,0.67,23.0,29.0,36.0,70.0
3,355.0,25.14062,12.495398,0.42,18.0,24.0,32.0,74.0


#### - set_option()

```Python
# Display all columns without truncation
pd.set_option('display.max_columns', None)
# Display all rows without truncation
pd.set_option('display.max_rows', None)
```

To restore the defaults, call pd.reset_option() with the same option name.
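
The original note does not show how to undo these settings. One way, sketched here, is `pd.reset_option()`, which puts a display option back to the library default.

```python
import pandas as pd

pd.set_option('display.max_columns', None)   # show every column
pd.set_option('display.max_rows', None)      # show every row

# Put both options back to the library defaults.
pd.reset_option('display.max_columns')
pd.reset_option('display.max_rows')

print(pd.get_option('display.max_rows'))
```

`pd.get_option()` is used only to confirm that the value is no longer `None`.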


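### | (Advanced) Looping over groupby results with a for loop

    The result of .groupby() can be iterated with a for loop, which yields tuples of (group key, DataFrame for that group).
    
    See the example below.
    i takes the values 1, 2, 3, and group_df holds the DataFrame you would get by filtering on Pclass==1, ==2, ==3 respectively.
    (As an example, len() is used to show the number of records in each DataFrame.)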

In [56]:
for i, group_df in df.groupby('Pclass'):
    print("{}: group_df's type is {} and has {}".format(i, type(group_df), len(group_df)))

1: group_df's type is <class 'pandas.core.frame.DataFrame'> and has 216
2: group_df's type is <class 'pandas.core.frame.DataFrame'> and has 184
3: group_df's type is <class 'pandas.core.frame.DataFrame'> and has 491


In [59]:
# Within each Pclass group, number the records in ascending order of Fare (0 = lowest fare)
import numpy as np
df = pd.read_csv('train.csv')
results = []
for i, group_df in df.groupby('Pclass'):
    sorted_group_df = group_df.sort_values('Fare')
    sorted_group_df['RankInPClass'] = np.arange(len(sorted_group_df))
    results.append(sorted_group_df)
result_df = pd.concat(results)
result_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,RankInPClass
633,634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0.00,,S,0
822,823,0,1,"Reuchlin, Jonkheer. John George",male,38.0,0,0,19972,0.00,,S,1
815,816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0.00,B102,S,2
806,807,0,1,"Andrews, Mr. Thomas Jr",male,39.0,0,0,112050,0.00,A36,S,3
263,264,0,1,"Harrison, Mr. William",male,40.0,0,0,112059,0.00,B94,S,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
180,181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S,486
863,864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S,487
846,847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S,488
201,202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S,489
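
The loop above can also be written without an explicit `for`. The sketch below is an alternative, not the method used in the original notes. It uses `groupby().rank()` on a hypothetical five-row table (`fares`). With `method='first'`, ties are numbered in order of appearance, so tied fares may get different numbers than in the loop version, and the rows stay in their original order instead of being sorted.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
fares = pd.DataFrame({
    'Pclass': [1, 1, 2, 2, 2],
    'Fare': [80.0, 30.0, 13.0, 26.0, 10.5],
})

# rank() starts at 1, so subtract 1 to match the 0-based numbering of the loop.
fares['RankInPClass'] = (
    fares.groupby('Pclass')['Fare'].rank(method='first').astype(int) - 1
)
print(fares)
```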


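## 16: Joining DataFrame Tables

### | What is joining tables?

> + Join using a specific column or the index as the key
> + Simply stick DataFrames together side by side (or one on top of the other)

    Of merge and concat, merge comes up far more often.
    Neither merge nor concat updates the original DataFrame, so assign the result to a new variable or back to the original one.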


In [60]:
import pandas as pd
df1 = pd.DataFrame({ 'Key': ['k0', 'k1', 'k2'],
        'A': ['a0', 'a1', 'a2'],
        'B': ['b0', 'b1', 'b2']})

df2 = pd.DataFrame({ 'Key': ['k0', 'k1', 'k2'],
        'C': ['c0', 'c1', 'c2'],
        'D': ['d0', 'd1', 'd2']})

#### - df.merge()

In [61]:
# Join using a specific column or the index as the key
df1.merge(df2)

Unnamed: 0,Key,A,B,C,D
0,k0,a0,b0,c0,d0
1,k1,a1,b1,c1,d1
2,k2,a2,b2,c2,d2


#### - pd.concat()
> Short for concatenate

In [62]:
# Simply stick DataFrames together side by side (or one on top of the other)
# Vertically
pd.concat([df1, df2], axis=0)

Unnamed: 0,Key,A,B,C,D
0,k0,a0,b0,,
1,k1,a1,b1,,
2,k2,a2,b2,,
0,k0,,,c0,d0
1,k1,,,c1,d1
2,k2,,,c2,d2


In [63]:
# Horizontally
pd.concat([df1, df2], axis=1)

Unnamed: 0,Key,A,B,Key.1,C,D
0,k0,a0,b0,k0,c0,d0
1,k1,a1,b1,k1,c1,d1
2,k2,a2,b2,k2,c2,d2


### | Mastering .merge()

Important arguments:

<img src='https://datawokagaku.com/wp-content/uploads/2020/02/params_merge-300x241.png' width=35%>

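> + how: how to join → {'left', 'right', 'outer', 'inner'}; the default is 'inner'
> + on: the column to use as the key (it must exist in both DataFrames). If omitted, the join uses the columns the two DataFrames have in common
> + left_on: the key column of the left DataFrame
> + right_on: the key column of the right DataFrame
> + left_index: set to True to use the index of the left DataFrame as the key
> + right_index: set to True to use the index of the right DataFrame as the key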
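
To see what `how` does to each record, the `indicator=True` argument of `merge()` can help. It is not covered in the list above: it adds a `_merge` column that says whether a row came from the left table, the right table, or both. A minimal sketch with two hypothetical tables (`left`, `right`):

```python
import pandas as pd

# Hypothetical tables, used only for illustration.
left = pd.DataFrame({'Key': ['k0', 'k1', 'k2'], 'A': ['a0', 'a1', 'a2']})
right = pd.DataFrame({'Key': ['k0', 'k1', 'k3'], 'C': ['c0', 'c1', 'c3']})

# An outer join keeps every key; _merge records where each row came from.
merged = left.merge(right, how='outer', on='Key', indicator=True)
print(merged)
```

Rows marked `both` are exactly the ones an inner join would keep, and rows marked `left_only` are the extra ones a left join would add.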


#### - merge(how='') left right

In [64]:
df1 = pd.DataFrame({ 'Key': ['k0', 'k1', 'k2'],
                    'A': ['a0', 'a1', 'a2'],
                    'B': ['b0', 'b1', 'b2']})

df2 = pd.DataFrame({ 'Key': ['k0', 'k1', 'k3'],
                    'C': ['c0', 'c1', 'c3'],
                    'D': ['d0', 'd1', 'd3']})

In [69]:
df1

Unnamed: 0,Key,A,B
0,k0,a0,b0
1,k1,a1,b1
2,k2,a2,b2


In [70]:
df2

Unnamed: 0,Key,C,D
0,k0,c0,d0
1,k1,c1,d1
2,k3,c3,d3


In [65]:
# df1: left, df2: right
# The right table has no 'k2', so C and D for 'k2' become NaN. Think of it as never losing records from the left table.
df1.merge(df2, how='left')

Unnamed: 0,Key,A,B,C,D
0,k0,a0,b0,c0,d0
1,k1,a1,b1,c1,d1
2,k2,a2,b2,,


#### - merge(how='') outer

In [66]:
# Neither left (df1) nor right (df2) loses any records; records that can be joined are joined (the rest is filled with NaN)
df1.merge(df2, how='outer')

Unnamed: 0,Key,A,B,C,D
0,k0,a0,b0,c0,d0
1,k1,a1,b1,c1,d1
2,k2,a2,b2,,
3,k3,,,c3,d3


#### - merge(how='') inner

In [67]:
# Only the records present in both left (df1) and right (df2)
df1.merge(df2, how='inner')

Unnamed: 0,Key,A,B,C,D
0,k0,a0,b0,c0,d0
1,k1,a1,b1,c1,d1


#### - merge(on='') 

In [71]:
df1 = pd.DataFrame({ 'Key': ['k0', 'k1', 'k2'],
                    'ID': ['aa', 'bb', 'cc'],
                    'A': ['a0', 'a1', 'a2'],
                    'B': ['b0', 'b1', 'b2']})

df2 = pd.DataFrame({ 'Key': ['k0', 'k1', 'k3'],
                    'ID': ['aa', 'bb', 'cc'],
                    'C': ['c0', 'c1', 'c3'],
                    'D': ['d0', 'd1', 'd3']})

In [72]:
df1

Unnamed: 0,Key,ID,A,B
0,k0,aa,a0,b0
1,k1,bb,a1,b1
2,k2,cc,a2,b2


In [73]:
df2

Unnamed: 0,Key,ID,C,D
0,k0,aa,c0,d0
1,k1,bb,c1,d1
2,k3,cc,c3,d3


In [74]:
# Join using the Key column as the key
df1.merge(df2, on='Key')

Unnamed: 0,Key,ID_x,A,B,ID_y,C,D
0,k0,aa,a0,b0,aa,c0,d0
1,k1,bb,a1,b1,bb,c1,d1


In [75]:
# Join using the ID column as the key
df1.merge(df2, on='ID')

Unnamed: 0,Key_x,ID,A,B,Key_y,C,D
0,k0,aa,a0,b0,k0,c0,d0
1,k1,bb,a1,b1,k1,c1,d1
2,k2,cc,a2,b2,k3,c3,d3


In [76]:
# Specify suffixes for the overlapping column names
df1.merge(df2, on='ID', suffixes=('_left', '_right'))

Unnamed: 0,Key_left,ID,A,B,Key_right,C,D
0,k0,aa,a0,b0,k0,c0,d0
1,k1,bb,a1,b1,k1,c1,d1
2,k2,cc,a2,b2,k3,c3,d3


#### - merge(left_on='', right_on='') 

In [77]:
df1 = pd.DataFrame({ 'Key1': ['k0', 'k1', 'k2'],
                    'A': ['a0', 'a1', 'a2'],
                    'B': ['b0', 'b1', 'b2']})

df2 = pd.DataFrame({ 'Key2': ['k0', 'k1', 'k3'],
                    'C': ['c0', 'c1', 'c3'],
                    'D': ['d0', 'd1', 'd3']})

In [79]:
df1

Unnamed: 0,Key1,A,B
0,k0,a0,b0
1,k1,a1,b1
2,k2,a2,b2


In [80]:
df2

Unnamed: 0,Key2,C,D
0,k0,c0,d0
1,k1,c1,d1
2,k3,c3,d3


In [78]:
# Join using Key1 (left) and Key2 (right) as the keys
df1.merge(df2, left_on='Key1', right_on='Key2')

Unnamed: 0,Key1,A,B,Key2,C,D
0,k0,a0,b0,k0,c0,d0
1,k1,a1,b1,k1,c1,d1


#### - merge(left_index='', right_index='') 

In [81]:
df1.merge(df2, left_index=True, right_index=True)

Unnamed: 0,Key1,A,B,Key2,C,D
0,k0,a0,b0,k0,c0,d0
1,k1,a1,b1,k1,c1,d1
2,k2,a2,b2,k3,c3,d3


#### - .join()

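> ```memo```
>
>     .join() can join several DataFrames in one go.
>     However, I do not recommend it: it easily leads to bugs and makes the code harder to read.
>     Joining them one at a time is preferable when possible.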

In [68]:
df1 = pd.DataFrame({ 'Key1': ['k0', 'k1', 'k2'],
                    'A': ['a0', 'a1', 'a2'],
                    'B': ['b0', 'b1', 'b2']})

df2 = pd.DataFrame({ 'Key2': ['k0', 'k1', 'k3'],
                    'C': ['c0', 'c1', 'c3'],
                    'D': ['d0', 'd1', 'd3']})
df3 = pd.DataFrame({ 'Key3': ['k0', 'k1', 'k4'],
                    'E': ['c0', 'c1', 'c3'],
                    'F': ['d0', 'd1', 'd3']})

df1.join([df2, df3])

Unnamed: 0,Key1,A,B,Key2,C,D,Key3,E,F
0,k0,a0,b0,k0,c0,d0,k0,c0,d0
1,k1,a1,b1,k1,c1,d1,k1,c1,d1
2,k2,a2,b2,k3,c3,d3,k4,c3,d3
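
Note that the `.join()` call above matched rows by index, which is why `k2`, `k3`, and `k4` ended up in the same row. The sketch below shows the one-at-a-time alternative recommended in the memo, joining on the key values instead. The tables are hypothetical and slightly simplified versions of the ones above, and `step1` and `step2` are illustrative names.

```python
import pandas as pd

# Hypothetical tables, used only for illustration.
df1 = pd.DataFrame({'Key1': ['k0', 'k1', 'k2'], 'A': ['a0', 'a1', 'a2']})
df2 = pd.DataFrame({'Key2': ['k0', 'k1', 'k3'], 'C': ['c0', 'c1', 'c3']})
df3 = pd.DataFrame({'Key3': ['k0', 'k1', 'k4'], 'E': ['e0', 'e1', 'e4']})

# Each step is an inner join on the key values, so it can be inspected on its own.
step1 = df1.merge(df2, left_on='Key1', right_on='Key2')
step2 = step1.merge(df3, left_on='Key1', right_on='Key3')
print(step2)
```

Only `k0` and `k1` survive, because they are the only keys present in all three tables.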


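## 17: Key DataFrame Functions (apply, unique, value_counts)

#### - .unique() .nunique()
> These are Series functions and are used often.

>     You will often wonder, "What values does this column hold?"
>     For example, "Does Pclass really only contain 1, 2, and 3?" These functions answer that kind of question.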


In [82]:
import pandas as pd
df = pd.read_csv('train.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [84]:
df['Pclass'].unique()

array([3, 1, 2])

In [86]:
df['Pclass'].nunique()

3

#### - .value_counts()
> Returns, as a Series, how many records there are for each value.

In [87]:
df['Pclass'].value_counts()

3    491
1    216
2    184
Name: Pclass, dtype: int64

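### | .apply()

#### - (Very important) .apply()
> apply: to apply a function
>
>     It is fair to say that DataFrame processing is basically done with the apply function.
>     With apply() you can run a function on every record of a DataFrame and store the result in another column.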

<img src='https://datawokagaku.com/wp-content/uploads/2020/02/apply_overview-1.png' width=45%>


In [90]:
def get_age_group(age):
    return str(age)[0] + '0s'

# Try it
get_age_group(45)

'40s'

In [91]:
df = pd.DataFrame({ 'name': ['john', 'Mike', 'Emily'],
                    'age': ['23', '36', '42']})
df

Unnamed: 0,name,age
0,john,23
1,Mike,36
2,Emily,42


In [92]:
df['age'].apply(get_age_group)

0    20s
1    30s
2    40s
Name: age, dtype: object

In [93]:
# Create a new 'age_group' column and assign the result to it
df['age_group'] = df['age'].apply(get_age_group)
df

Unnamed: 0,name,age,age_group
0,john,23,20s
1,Mike,36,30s
2,Emily,42,40s
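
One caveat: taking the first character only works for two-digit ages. `get_age_group(5)` returns `'50s'` and `get_age_group(102)` returns `'10s'`. The following is a hypothetical variant, not part of the original notes, that buckets by integer arithmetic instead. The function name `get_age_group_numeric` and the sample table are made up for illustration.

```python
import pandas as pd

def get_age_group_numeric(age):
    """Bucket an age into decades using integer division."""
    decade = int(age) // 10 * 10
    return '{}s'.format(decade)

# Hypothetical data including a one-digit and a three-digit age.
people = pd.DataFrame({'name': ['Ann', 'Bob', 'Cat'], 'age': ['5', '36', '102']})
people['age_group'] = people['age'].apply(get_age_group_numeric)
print(people['age_group'].tolist())
```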


#### - .apply() with a lambda function

In [94]:
# Assign a lambda function to the variable f
f = lambda x: str(x)[0] + '0s'
# Try it with 43
f(43)

'40s'

In [95]:
df['age_group'] = df['age'].apply(lambda x: str(x)[0] + '0s')

#### - .apply() on whole records
> To apply a function to each row, you need to specify axis=1.

In [96]:
df = pd.DataFrame({ 'name': ['john', 'Mike', 'Emily'],
                    'age': ['23', '36', '42']})
df['description'] = df.apply(lambda row: '{} is {} years old'.format(row['name'], row['age']), axis=1)
df

Unnamed: 0,name,age,description
0,john,23,john is 23 years old
1,Mike,36,Mike is 36 years old
2,Emily,42,Emily is 42 years old


## 18: Other Frequently Used DataFrame Functions (to_csv, iterrows, sort_values)

#### - df.to_csv()

In [1]:
import pandas as pd
df = pd.read_csv('train.csv')

In [2]:
df.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S


In [3]:
df['Adult'] = df['Age'].apply(lambda x : x > 20)
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Adult
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,True
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [4]:
df.to_csv('train_w_adult.csv')

In [5]:
df.to_csv('../../../../train_w_adult.csv')

In [6]:
df_new = pd.read_csv('train_w_adult.csv')
display(df_new)

Unnamed: 0.1,Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Adult
0,0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,True
1,1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,True
3,3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,True
4,4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,True
887,887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False
888,888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S,False
889,889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,True


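>    A mysterious ```Unnamed: 0``` column is created.
>     This is the index, which was saved to the file as is. Having it regenerated every time the file is read with pandas is unwanted,
>     so pass the following option when saving.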
>

> ```Python
> df.to_csv('pass.csv', index=False)
> ```
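
The effect of `index=False` can be checked without touching the disk. This sketch writes a hypothetical two-row table (`df_small`) to in-memory buffers with `io.StringIO` and reads it back, once with the index and once without.

```python
import io
import pandas as pd

# Hypothetical data, used only for illustration.
df_small = pd.DataFrame({'Name': ['John', 'Emily'], 'Adult': [True, False]})

with_index = io.StringIO()
df_small.to_csv(with_index)                   # index written as an unnamed first column
without_index = io.StringIO()
df_small.to_csv(without_index, index=False)   # index left out

with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)
print(cols_with)
print(cols_without)
```

Only the first round trip picks up the extra `Unnamed: 0` column.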

#### - .iterrows()
> Used to iterate over a DataFrame with a for loop. It is hard to remember, but since you "iterate" over "rows", think of it as iter + row + s.

In [8]:
for idx, row in df.iterrows():
    if row['Age'] > 40 and row['Pclass'] == 3 and row['Sex'] == 'male' and row["Survived"] == 1:
        print("{}: {} is very luck guy...!".format(idx, row['Name']))

338: Dahl, Mr. Karl Edwart is very luck guy...!
414: Sundman, Mr. Johan Julian is very luck guy...!
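
When the loop only selects rows, the same result can usually be obtained with the boolean filters from section 13, which avoids visiting rows one by one. This is a sketch of that alternative on a hypothetical three-row table (`passengers`), not a replacement for `.iterrows()` in general.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
passengers = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [45.0, 50.0, 30.0],
    'Pclass': [3, 1, 3],
    'Sex': ['male', 'male', 'male'],
    'Survived': [1, 1, 1],
})

# Same four conditions as in the loop, combined element-wise with &.
mask = (
    (passengers['Age'] > 40)
    & (passengers['Pclass'] == 3)
    & (passengers['Sex'] == 'male')
    & (passengers['Survived'] == 1)
)
lucky = passengers[mask]
print(lucky['Name'].tolist())
```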


#### - .sort_values()
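>     On a DataFrame, pass the name of the column to sort by. (A Series has its own sort_values(), which takes no column name.)
>     The default is ascending order (small → large). For descending order (large → small), specify ascending=False.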


In [14]:
df.sort_values('Age', ascending=False)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Adult
630,631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0000,A23,S,True
851,852,0,3,"Svensson, Mr. Johan",male,74.0,0,0,347060,7.7750,,S,True
493,494,0,1,"Artagaveytia, Mr. Ramon",male,71.0,0,0,PC 17609,49.5042,,C,True
96,97,0,1,"Goldschmidt, Mr. George B",male,71.0,0,0,PC 17754,34.6542,A5,C,True
116,117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.7500,,Q,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
859,860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C,False
863,864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.5500,,S,False
868,869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5000,,S,False
878,879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S,False
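
In the output above, the NaN ages end up at the bottom even though the sort is descending. The sketch below uses a hypothetical four-row table (`passengers`) to show two related options of `sort_values()`: sorting by several columns with a different direction for each, and `na_position` for controlling where NaN goes.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
passengers = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Pclass': [3, 1, 1, 3],
    'Age': [np.nan, 40.0, 58.0, 22.0],
})

# Pclass ascending, then Age descending within each class.
by_class_then_age = passengers.sort_values(['Pclass', 'Age'], ascending=[True, False])
# NaN goes last by default; na_position='first' moves it to the top.
nan_first = passengers.sort_values('Age', na_position='first')

print(by_class_then_age['Name'].tolist())
print(nan_first['Name'].tolist())
```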


## 19: Other Frequently Used DataFrame Functions (pivot_table, xs)