## About pandas
***
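pandas is defined as "a Python data analysis library that provides features to make data analysis easier".<br>
A key feature of pandas is that it provides its own data structures, such as the DataFrame, which support a wide range of operations.<br>
In particular, tabular data can be manipulated much as in SQL or R, and processing is fast.<br>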

## What pandas can do
***

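Read and write data in formats such as CSV and Excel, and from relational databases (RDB).<br>
Extract data obtained by scraping, for example tables in HTML pages.<br>
Preprocess data, including handling missing values (NaN, Not a Number).<br>
Join data, extract subsets, and pivot.<br>
Aggregate data and perform group operations.<br>
Apply statistical processing and regression to data.<br>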
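
Several of the capabilities listed above can be combined in a few lines. The following is a minimal sketch, not a recommended pipeline: the CSV text, the column names (`name`, `group`, `score`) and the choice to fill the gap with the column mean are all assumptions made for illustration, and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,group,score\nA,x,10\nB,x,\nC,y,30\n"
df_in = pd.read_csv(io.StringIO(csv_text))

# The empty field is read as NaN; fill it with the column mean (an arbitrary choice).
df_in["score"] = df_in["score"].fillna(df_in["score"].mean())

# Aggregate per group, then write the result back out as CSV text.
totals = df_in.groupby("group")["score"].sum()
csv_out = totals.to_csv()
print(csv_out)
```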

## Why learn pandas
***

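In machine learning, data preprocessing takes up a large share of the time,<br>
and pandas makes this preprocessing step more efficient.<br>
It is mainly used when data of several types must be handled in a single DataFrame, or when many data manipulation and analysis functions are needed.<br>
All elements of a NumPy array (np.array) must have the same type.<br>
This makes NumPy inconvenient for tasks such as reading and writing CSV files, where columns often have different types.<br>
A pandas DataFrame, by contrast, can hold columns of different types.<br>
Storing data in a DataFrame therefore makes preprocessing straightforward.<br>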
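
The difference in type handling can be seen in a small sketch. The values and the column names (`id`, `name`, `score`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixed values are coerced (here to strings).
arr = np.array([1, "a", 2.5])
print(arr.dtype.kind)  # 'U' means a Unicode string dtype

# A DataFrame keeps one dtype per column (hypothetical columns for illustration).
df_mixed = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [2.5, 3.0]})
print(df_mixed.dtypes)
```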




In [306]:
import numpy as np  # used later for np.random.rand
import pandas as pd

In [309]:
# Series type
# Handles a one-dimensional vector
sr = pd.Series([1, 2, 3, 4, 5])
sr

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [305]:
sr1 = pd.Series(data=[1, 2, 3, 4, 5],  index=['A', 'B', 'C', 'D', 'E'])
sr1

A    1
B    2
C    3
D    4
E    5
dtype: int64

In [154]:
sr1['E']

5

In [310]:
sr.loc[0 :2]

0    1
1    2
2    3
dtype: int64

In [157]:
# loc[label : label] selects a range by index label (both ends are included)

sr1.loc['A' : 'D']

A    1
B    2
C    3
D    4
dtype: int64

In [313]:
sr.iloc[3:]

3    4
4    5
dtype: int64

In [314]:
sr.iloc[2]

3

In [164]:
# iloc[start:] selects by integer position, from position start to the end

sr1.iloc[1:]

B    2
C    3
D    4
E    5
dtype: int64

In [165]:
# Get everything except the last row

sr1.iloc[:-1]

A    1
B    2
C    3
D    4
dtype: int64
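
The slices above differ in one detail that is easy to miss: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position. A minimal sketch with a made-up Series `s`:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["A", "B", "C", "D"])

by_label = s.loc["A":"C"]    # label slice: includes the end label "C"
by_position = s.iloc[0:2]    # position slice: excludes position 2

print(list(by_label))     # [10, 20, 30]
print(list(by_position))  # [10, 20]
```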

In [25]:
# Confirm that the object is a Series

type(sr1)

pandas.core.series.Series

In [319]:
# DataFrame type
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
df

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15
5,16,17,18


In [318]:
df_1 = pd.DataFrame(range(10))
df_1

Unnamed: 0,0
0,0
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9


In [43]:
df_2 = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23, 24, 25],[26, 27, 28, 29, 30]])
df_2

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20
4,21,22,23,24,25
5,26,27,28,29,30


In [None]:
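# CSV files can be read
# An error is raised if the file is not found (a bare file name is looked up in the current directory)
pd.read_csv('')
# Scraping: read the tables in an HTML page
pd.read_html('')
# Excel files can be read as well
pd.read_excel('')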

In [320]:
# Check just the first 5 rows
df.head()

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15


In [322]:
df.tail()

Unnamed: 0,0,1,2
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15
5,16,17,18


In [41]:
df_2.head()

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20
4,21,22,23,24,25


In [21]:
# Check just the last 5 rows
df.tail()

Unnamed: 0,0,1,2
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15
5,16,17,18


In [22]:
df_1.tail()

Unnamed: 0,0
5,5
6,6
7,7
8,8
9,9


In [40]:
df_2.tail()

Unnamed: 0,0,1,2,3,4
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20
4,21,22,23,24,25
5,26,27,28,29,30


In [49]:
# Selecting a subset
# df[['column name']] returns the specified columns
# Here the column labels are integers, so df_2[[0, 1, 2]] is used

df_2[[0, 1, 2]]

Unnamed: 0,0,1,2
0,1,2,3
1,6,7,8
2,11,12,13
3,16,17,18
4,21,22,23
5,26,27,28


In [324]:
df[[0, 1]]

Unnamed: 0,0,1
0,1,2
1,4,5
2,7,8
3,10,11
4,13,14
5,16,17


In [50]:
# The first 5 rows of the selection can be taken as well

df_2[[0, 1, 2]].head()

Unnamed: 0,0,1,2
0,1,2,3
1,6,7,8
2,11,12,13
3,16,17,18
4,21,22,23


In [51]:
# The last 5 rows of the selection can be taken as well

df_2[[0, 1, 2]].tail()

Unnamed: 0,0,1,2
1,6,7,8
2,11,12,13
3,16,17,18
4,21,22,23
5,26,27,28


In [333]:
df.loc[0:2]

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
2,7,8,9


In [334]:
df.iloc[:, 0:2]

Unnamed: 0,0,1
0,1,2
1,4,5
2,7,8
3,10,11
4,13,14
5,16,17


In [57]:
# df.iloc[rows, columns] selects by integer position
# [:] means everything
# Here 0:3 is a range of column positions; the end is excluded, so positions 0, 1 and 2 are returned

df_2.iloc[:, 0:3]

Unnamed: 0,0,1,2
0,1,2,3
1,6,7,8
2,11,12,13
3,16,17,18
4,21,22,23
5,26,27,28


In [58]:
# To take everything except the last column
# :-1 selects up to, but not including, the last position

df_2.iloc[:, :-1]

Unnamed: 0,0,1,2,3
0,1,2,3,4
1,6,7,8,9
2,11,12,13,14
3,16,17,18,19
4,21,22,23,24
5,26,27,28,29
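
Rows and columns can be restricted in the same call. The sketch below uses a made-up frame `df_demo` with string column labels so that the positional (`iloc`) and label-based (`loc`) forms are easy to tell apart.

```python
import pandas as pd

df_demo = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

block = df_demo.iloc[0:2, 1:3]          # rows 0-1, columns at positions 1-2
same_block = df_demo.loc[0:1, "b":"c"]  # label slices include both ends
cell = df_demo.iloc[2, 0]               # single value at row 2, column 0

print(block.values.tolist())     # [[2, 3], [5, 6]]
print(block.equals(same_block))  # True
print(cell)                      # 7
```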


In [340]:
df[2] > 7

0    False
1    False
2     True
3     True
4     True
5     True
Name: 2, dtype: bool

In [59]:
# Conditional selection
# Values that satisfy the condition become True
# Values that do not satisfy the condition become False

df_2[2] > 5

0    False
1     True
2     True
3     True
4     True
5     True
Name: 2, dtype: bool

In [350]:
df[df[2] > 7]

Unnamed: 0,0,1,2
2,7,8,9
3,10,11,12
4,13,14,15
5,16,17,18


In [61]:
# Filtering with a boolean condition
# Only the rows where the condition is True are returned

df_2[df_2[2] > 10]

Unnamed: 0,0,1,2,3,4
2,11,12,13,14,15
3,16,17,18,19,20
4,21,22,23,24,25
5,26,27,28,29,30
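
Several conditions can be combined in one filter. A minimal sketch on a made-up frame `df_demo`; each condition needs its own parentheses because `&` and `|` bind more tightly than the comparisons.

```python
import pandas as pd

df_demo = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Use & (and), | (or), ~ (not) instead of the Python keywords.
both = df_demo[(df_demo[2] > 3) & (df_demo[0] < 10)]
either = df_demo[(df_demo[2] <= 3) | (df_demo[0] >= 10)]

print(both.index.tolist())    # [1, 2]
print(either.index.tolist())  # [0, 3]
```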


In [363]:
df.sort_values(by=0)

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15
5,16,17,18


In [355]:
# sort_values sorts rows by a column (ascending by default; pass ascending=False for descending)

df_2.sort_values(by=2).head()

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20
4,21,22,23,24,25
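
Because the example data is already in ascending order, the effect of sorting is not visible in the output above. A small sketch with unsorted, made-up data (`key` and `value` are hypothetical column names):

```python
import pandas as pd

df_demo = pd.DataFrame({"key": [2, 3, 1], "value": ["b", "c", "a"]})

ascending = df_demo.sort_values(by="key")                    # default: ascending
descending = df_demo.sort_values(by="key", ascending=False)  # largest first

print(ascending["value"].tolist())   # ['a', 'b', 'c']
print(descending["value"].tolist())  # ['c', 'b', 'a']
```

`sort_values` returns a new DataFrame, so `df_demo` itself keeps its original row order.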


In [364]:
df.shape

(6, 3)

In [69]:
# As with NumPy, shape gives the dimensions as (rows, columns)

df_2.shape

(6, 5)

In [366]:
df.mean()

0     8.5
1     9.5
2    10.5
dtype: float64

In [70]:
# As with NumPy, mean gives the average, here computed for each column

df_2.mean()

0    13.5
1    14.5
2    15.5
3    16.5
4    17.5
dtype: float64

In [367]:
df.std()

0    5.612486
1    5.612486
2    5.612486
dtype: float64

In [71]:
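# std gives the standard deviation of each column
# Note: pandas uses the sample standard deviation (ddof=1) by default, whereas np.std uses ddof=0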

df_2.std()

0    9.354143
1    9.354143
2    9.354143
3    9.354143
4    9.354143
dtype: float64
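
The values above are sample standard deviations. The sketch below reproduces the figure for column 0 of `df` and compares it with NumPy's default; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

col = pd.Series([1, 4, 7, 10, 13, 16])  # same values as column 0 of df

sample_std = col.std()                   # pandas default: ddof=1
population_std = np.std(col.to_numpy())  # NumPy default: ddof=0

print(round(sample_std, 6))      # 5.612486
print(round(population_std, 6))  # 5.123475
```

Passing `ddof=0` to `col.std()` gives the same result as the NumPy call.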

In [368]:
df.describe()

Unnamed: 0,0,1,2
count,6.0,6.0,6.0
mean,8.5,9.5,10.5
std,5.612486,5.612486,5.612486
min,1.0,2.0,3.0
25%,4.75,5.75,6.75
50%,8.5,9.5,10.5
75%,12.25,13.25,14.25
max,16.0,17.0,18.0


In [72]:
# describe shows the mean, standard deviation, minimum, maximum and other statistics at once

df_2.describe()

Unnamed: 0,0,1,2,3,4
count,6.0,6.0,6.0,6.0,6.0
mean,13.5,14.5,15.5,16.5,17.5
std,9.354143,9.354143,9.354143,9.354143,9.354143
min,1.0,2.0,3.0,4.0,5.0
25%,7.25,8.25,9.25,10.25,11.25
50%,13.5,14.5,15.5,16.5,17.5
75%,19.75,20.75,21.75,22.75,23.75
max,26.0,27.0,28.0,29.0,30.0


In [369]:
df.values

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [73]:
# Convert to a NumPy array (df_2.to_numpy() is the currently recommended equivalent)

df_2.values

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30]])

In [370]:
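# Note: info is a method; without parentheses only the bound method is displayed
# Call df.info() to print the row count, column dtypes and non-null counts
df.info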

<bound method DataFrame.info of     0   1   2
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
4  13  14  15
5  16  17  18>

In [166]:
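# Check the number of rows and the dtypes
# Note: written without parentheses, this shows only the bound method; call df_2.info() for the summary

df_2.info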

<bound method DataFrame.info of     0   1   2   3   4
0   1   2   3   4   5
1   6   7   8   9  10
2  11  12  13  14  15
3  16  17  18  19  20
4  21  22  23  24  25
5  26  27  28  29  30>

In [372]:
df.nunique()

0    6
1    6
2    6
dtype: int64

In [167]:
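# Check the number of unique values in each column
# Useful for checking whether any values are duplicated
# Here each column is confirmed to have 6 distinct values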

df_2.nunique()

0    6
1    6
2    6
3    6
4    6
dtype: int64

In [373]:
df.isnull()

Unnamed: 0,0,1,2
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False


In [182]:
# Check for missing values
# False means the value is present
# True means the value is missing (NaN)

df_2.isnull()

Unnamed: 0,0,1,2,3,4
0,False,False,False,False,False
1,False,False,False,False,False
2,False,False,False,False,False
3,False,False,False,False,False
4,False,False,False,False,False
5,False,False,False,False,False
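
The frames above contain no missing values, so every cell is False. The following sketch uses a made-up frame `df_nan` that does contain NaN, to show how missing values are typically counted, filled and dropped.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

missing_per_column = df_nan.isnull().sum()  # count of True values per column
filled = df_nan.fillna(0)                   # replace NaN with a constant
dropped = df_nan.dropna()                   # keep only rows with no NaN

print(int(missing_per_column["a"]), int(missing_per_column["b"]))  # 1 2
print(len(dropped))                                                # 1
```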


In [184]:
df_2.index

RangeIndex(start=0, stop=6, step=1)

In [375]:
df['new_column'] = df[1] + df[2]
df

Unnamed: 0,0,1,2,new_column
0,1,2,3,5
1,4,5,6,11
2,7,8,9,17
3,10,11,12,23
4,13,14,15,29
5,16,17,18,35


In [376]:
# Subtraction
df['new_column'] = df[1] - df[2]
df

Unnamed: 0,0,1,2,new_column
0,1,2,3,-1
1,4,5,6,-1
2,7,8,9,-1
3,10,11,12,-1
4,13,14,15,-1
5,16,17,18,-1


In [377]:
# Inner (dot) product of two columns; the scalar result is broadcast to every row
df['new_column'] = df[1] @ df[2]
df

Unnamed: 0,0,1,2,new_column
0,1,2,3,756
1,4,5,6,756
2,7,8,9,756
3,10,11,12,756
4,13,14,15,756
5,16,17,18,756


In [378]:
df['new_column'] = df[1].dot(df[2])
df

Unnamed: 0,0,1,2,new_column
0,1,2,3,756
1,4,5,6,756
2,7,8,9,756
3,10,11,12,756
4,13,14,15,756
5,16,17,18,756


In [379]:
# Element-wise multiplication
df['new_column'] = df[1] * df[2]
df

Unnamed: 0,0,1,2,new_column
0,1,2,3,6
1,4,5,6,30
2,7,8,9,72
3,10,11,12,132
4,13,14,15,210
5,16,17,18,306
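
The difference between `@` (or `dot`) and `*` is easiest to see on two short Series. A minimal sketch with made-up values:

```python
import pandas as pd

a = pd.Series([2, 5, 8])
b = pd.Series([3, 6, 9])

elementwise = a * b  # Series: one product per row
inner = a @ b        # scalar: sum of the row-wise products, same as a.dot(b)

print(elementwise.tolist())  # [6, 30, 72]
print(inner)                 # 108
```

Assigning the scalar `inner` to a column repeats it in every row, which is why the `@` and `dot` cells above show the same number throughout `new_column`.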


In [233]:
# Adding a column
# Arithmetic can be done in the same statement

df_2['new_column'] = df_2[0] * df_2[3]
df_2

Unnamed: 0,0,1,2,3,4,new_column
0,1,2,3,4,5,4
1,6,7,8,9,10,54
2,11,12,13,14,15,154
3,16,17,18,19,20,304
4,21,22,23,24,25,504
5,26,27,28,29,30,754


In [388]:
df.drop(columns=[2])

Unnamed: 0,0,1,new_column
0,1,2,6
1,4,5,30
2,7,8,72
3,10,11,132
4,13,14,210
5,16,17,306


In [256]:
# Deleting a column (drop returns a new DataFrame; df_2 itself is unchanged)

df_2.drop(columns=[4])

Unnamed: 0,0,1,2,3,new_column
0,1,2,3,4,4
1,6,7,8,9,54
2,11,12,13,14,154
3,16,17,18,19,304
4,21,22,23,24,504
5,26,27,28,29,754
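
Since `drop` returns a new DataFrame, the result has to be assigned if the removal should persist. A minimal sketch on a made-up frame `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

without_z = df_demo.drop(columns=["z"])  # new DataFrame; df_demo is unchanged
print(df_demo.columns.tolist())    # ['x', 'y', 'z']
print(without_z.columns.tolist())  # ['x', 'y']

# Reassign to make the removal stick; rows are dropped by index label.
df_demo = df_demo.drop(columns=["z"]).drop(index=[0])
print(df_demo.shape)  # (1, 2)
```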


In [259]:
# Working with categorical data
# A DataFrame can also be created from a dict

df_3 = pd.DataFrame({'C1': ['A', 'S', 'D', 'F', 'G', 'H', 'J'],
                     'C2': [1, 2, 3, 4, 5, 6, 7],
                     'C3': [100, 200, 300, 400, 500, 600, 700]})
df_3

Unnamed: 0,C1,C2,C3
0,A,1,100
1,S,2,200
2,D,3,300
3,F,4,400
4,G,5,500
5,H,6,600
6,J,7,700


In [389]:
df_3.value_counts()

C1  C2  C3 
A   1   100    1
D   3   300    1
F   4   400    1
G   5   500    1
H   6   600    1
J   7   700    1
S   2   200    1
dtype: int64

In [262]:
# Count the number of rows in each category

df_3['C1'].value_counts()

A    1
F    1
G    1
S    1
H    1
D    1
J    1
Name: C1, dtype: int64

In [394]:
# Extract the rows of a specific category

df_3[df_3['C1'] == 'D']

Unnamed: 0,C1,C2,C3
2,D,3,300


In [396]:
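# Fill missing values with the most frequent value of each column (mode must be called with parentheses)
df_3.fillna(df_3.mode().iloc[0])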

Unnamed: 0,C1,C2,C3
0,A,1,100
1,S,2,200
2,D,3,300
3,F,4,400
4,G,5,500
5,H,6,600
6,J,7,700


In [265]:
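# Fill missing values in the category column with its most frequent value
# There are none this time, so the output is unchanged

df_3['C1'].fillna(df_3['C1'].mode()[0])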

0    A
1    S
2    D
3    F
4    G
5    H
6    J
Name: C1, dtype: object

In [397]:
round(df_3.value_counts() / len(df_3), 2)

C1  C2  C3 
A   1   100    0.14
D   3   300    0.14
F   4   400    0.14
G   5   500    0.14
H   6   600    0.14
J   7   700    0.14
S   2   200    0.14
dtype: float64

In [269]:
# Compute the proportion of each category (rounded to 2 decimal places)

round(df_3['C1'].value_counts() / len(df_3), 2)

A    0.14
F    0.14
G    0.14
S    0.14
H    0.14
D    0.14
J    0.14
Name: C1, dtype: float64
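
The division by `len(...)` can also be left to pandas. A minimal sketch using `value_counts(normalize=True)` on a made-up Series `labels`:

```python
import pandas as pd

labels = pd.Series(["A", "A", "B", "C"])  # hypothetical category column

# normalize=True returns each category's share of the total.
shares = labels.value_counts(normalize=True).round(2)

print(shares["A"])  # 0.5
print(shares["B"])  # 0.25
```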

In [409]:
df.groupby(1).sum()

Unnamed: 0_level_0,0,2,new_column
1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2,1,3,6
5,4,6,30
8,7,9,72
11,10,12,132
14,13,15,210
17,16,18,306


In [405]:
df.groupby('new_column')
df.groupby('new_column').sum()

Unnamed: 0_level_0,0,1,2
new_column,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
6,1,2,3
30,4,5,6
72,7,8,9
132,10,11,12
210,13,14,15
306,16,17,18


In [279]:
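# Group rows by a column and compute statistics per group
# groupby alone returns a GroupBy object; an aggregation such as sum produces the result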

df_2.groupby(2)
df_2.groupby(2).sum()

Unnamed: 0_level_0,0,1,3,4,new_column
2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,1,2,4,5,4
8,6,7,9,10,54
13,11,12,14,15,154
18,16,17,19,20,304
23,21,22,24,25,504
28,26,27,29,30,754


In [280]:
df_3.groupby('C3')

df_3.groupby('C3').sum()

Unnamed: 0_level_0,C2
C3,Unnamed: 1_level_1
100,1
200,2
300,3
400,4
500,5
600,6
700,7
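
In the examples above every key value occurs only once, so each group holds a single row and the sums equal the original values. The sketch below uses made-up data with repeated keys (`shop` and `amount` are hypothetical column names) to show an actual aggregation.

```python
import pandas as pd

sales = pd.DataFrame({
    "shop": ["east", "west", "east", "west"],
    "amount": [100, 200, 300, 400],
})

# agg accepts a list of aggregation names and returns one column per name.
per_shop = sales.groupby("shop")["amount"].agg(["sum", "mean"])

print(per_shop.loc["east", "sum"])   # 400
print(per_shop.loc["west", "mean"])  # 300.0
```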


In [283]:
df_1 = pd.DataFrame(np.random.rand(6, 6))
df_1

Unnamed: 0,0,1,2,3,4,5
0,0.446957,0.268766,0.005685,0.672181,0.417989,0.569
1,0.831613,0.122677,0.840118,0.290256,0.621709,0.919664
2,0.811465,0.228989,0.470499,0.268017,0.475353,0.911672
3,0.061135,0.477258,0.597527,0.972203,0.591893,0.62088
4,0.900872,0.73065,0.513886,0.46794,0.326156,0.255073
5,0.174281,0.419319,0.547891,0.218665,0.565112,0.67501


In [284]:
df_2

Unnamed: 0,0,1,2,3,4,new_column
0,1,2,3,4,5,4
1,6,7,8,9,10,54
2,11,12,13,14,15,154
3,16,17,18,19,20,304
4,21,22,23,24,25,504
5,26,27,28,29,30,754


In [414]:
df_4 = pd.concat([df, df_3])
df_4

Unnamed: 0,0,1,2,new_column,C1,C2,C3
0,1.0,2.0,3.0,6.0,,,
1,4.0,5.0,6.0,30.0,,,
2,7.0,8.0,9.0,72.0,,,
3,10.0,11.0,12.0,132.0,,,
4,13.0,14.0,15.0,210.0,,,
5,16.0,17.0,18.0,306.0,,,
0,,,,,A,1.0,100.0
1,,,,,S,2.0,200.0
2,,,,,D,3.0,300.0
3,,,,,F,4.0,400.0
4,,,,,G,5.0,500.0
5,,,,,H,6.0,600.0
6,,,,,J,7.0,700.0


In [415]:
df_4.isnull()

Unnamed: 0,0,1,2,new_column,C1,C2,C3
0,False,False,False,False,True,True,True
1,False,False,False,False,True,True,True
2,False,False,False,False,True,True,True
3,False,False,False,False,True,True,True
4,False,False,False,False,True,True,True
5,False,False,False,False,True,True,True
0,True,True,True,True,False,False,False
1,True,True,True,True,False,False,False
2,True,True,True,True,False,False,False
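3,True,True,True,True,False,False,False
4,True,True,True,True,False,False,False
5,True,True,True,True,False,False,False
6,True,True,True,True,False,False,False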


In [288]:
# Concatenating data (columns missing from one side are filled with NaN)
data = pd.concat([df_1, df_2])
data

Unnamed: 0,0,1,2,3,4,5,new_column
0,0.446957,0.268766,0.005685,0.672181,0.417989,0.569,
1,0.831613,0.122677,0.840118,0.290256,0.621709,0.919664,
2,0.811465,0.228989,0.470499,0.268017,0.475353,0.911672,
3,0.061135,0.477258,0.597527,0.972203,0.591893,0.62088,
4,0.900872,0.73065,0.513886,0.46794,0.326156,0.255073,
5,0.174281,0.419319,0.547891,0.218665,0.565112,0.67501,
0,1.0,2.0,3.0,4.0,5.0,,4.0
1,6.0,7.0,8.0,9.0,10.0,,54.0
2,11.0,12.0,13.0,14.0,15.0,,154.0
3,16.0,17.0,18.0,19.0,20.0,,304.0
4,21.0,22.0,23.0,24.0,25.0,,504.0
5,26.0,27.0,28.0,29.0,30.0,,754.0
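
As the outputs above show, `pd.concat` keeps the original index labels, so they repeat after stacking. A minimal sketch of two common variations, using made-up frames `top` and `bottom`:

```python
import pandas as pd

top = pd.DataFrame({"a": [1, 2]})
bottom = pd.DataFrame({"a": [3, 4]})

stacked = pd.concat([top, bottom])                        # index labels repeat
renumbered = pd.concat([top, bottom], ignore_index=True)  # fresh 0..n-1 index
side_by_side = pd.concat([top, bottom], axis=1)           # join as columns

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(side_by_side.shape)         # (2, 2)
```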


In [457]:
df_1

Unnamed: 0,0
0,0
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9


In [458]:
df_2

Unnamed: 0,0,1,2,3,4,new_column
0,1,2,3,4,5,4
1,6,7,8,9,10,54
2,11,12,13,14,15,154
3,16,17,18,19,20,304
4,21,22,23,24,25,504
5,26,27,28,29,30,754


In [416]:
# Applying a function
# Define the function

def add(x, y):
    z = x + y
    return z

In [422]:
add(df_4[1], df_4['new_column'])

0      8.0
1     35.0
2     80.0
3    143.0
4    224.0
5    323.0
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
dtype: float64

In [298]:
# Pass the columns to be combined (selected by column label) as the function's arguments

add(data[2], data[4])

0     0.423673
1     1.461827
2     0.945852
3     1.189420
4     0.840042
5     1.113003
0     8.000000
1    18.000000
2    28.000000
3    38.000000
4    48.000000
5    58.000000
dtype: float64

In [297]:
add(data[3], data['new_column'])

0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
0      8.0
1     63.0
2    168.0
3    323.0
4    528.0
5    783.0
dtype: float64

In [428]:
# A function can return multiple values
# The results can be added as new columns

def square_and_cube(x):
    return pd.Series([x**2, x**3])

In [432]:
df_4[['new', 'column']] = df_4[1].apply(square_and_cube)
df_4

Unnamed: 0,0,1,2,new_column,C1,C2,C3,new,column
0,1.0,2.0,3.0,6.0,,,,4.0,8.0
1,4.0,5.0,6.0,30.0,,,,25.0,125.0
2,7.0,8.0,9.0,72.0,,,,64.0,512.0
3,10.0,11.0,12.0,132.0,,,,121.0,1331.0
4,13.0,14.0,15.0,210.0,,,,196.0,2744.0
5,16.0,17.0,18.0,306.0,,,,289.0,4913.0
0,,,,,A,1.0,100.0,,
1,,,,,S,2.0,200.0,,
2,,,,,D,3.0,300.0,,
3,,,,,F,4.0,400.0,,
4,,,,,G,5.0,500.0,,
5,,,,,H,6.0,600.0,,
6,,,,,J,7.0,700.0,,


In [430]:
# data[1] raises KeyError when data has no column labelled 1, as happened in this run
data[['new', 'column']] = data[1].apply(square_and_cube)
data

KeyError: 1
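
The pattern of returning a Series from the applied function can be checked on a small, self-contained frame. This is a sketch under assumptions: `df_demo` and the column names `n`, `square`, `cube` and `square_vec` are made up for illustration. For simple arithmetic like this, operating on the whole column is usually simpler than `apply`.

```python
import pandas as pd

def square_and_cube(x):
    # Returning a Series makes apply() produce one column per element.
    return pd.Series([x**2, x**3])

df_demo = pd.DataFrame({"n": [2, 3]})
df_demo[["square", "cube"]] = df_demo["n"].apply(square_and_cube)

# Vectorized alternative: operate on the whole column at once.
df_demo["square_vec"] = df_demo["n"] ** 2

print(df_demo["square"].tolist())  # [4, 9]
print(df_demo["cube"].tolist())    # [8, 27]
```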

In [465]:
import numpy as np
import pandas as pd

df_5 = pd.DataFrame(np.random.rand(5, 5), index=['A', 'B', 'C', 'D', 'E'], columns=['A1', 'B2', 'C3', 'D4', 'E5'])
df_6 = pd.DataFrame(np.random.rand(5, 5), index=['K', 'L', 'M', 'N', 'O'], columns=['K1', 'L2', 'M3', 'N4', 'O5'])

df_7 = pd.concat([df_5, df_6])

In [453]:
df_5

Unnamed: 0,A1,B2,C3,D4,E5
A,0.712928,0.15481,0.185212,0.435413,0.353931
B,0.339131,0.724045,0.747087,0.051563,0.121195
C,0.722849,0.244364,0.577053,0.162271,0.369202
D,0.16287,0.043941,0.441604,0.068215,0.957659
E,0.580659,0.937651,0.727212,0.646666,0.101874


In [460]:
df_6

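Unnamed: 0,K1,L2,M3,N4,O5
K,0.580036,0.365306,0.176739,0.440281,0.271406
L,0.499078,0.26052,0.512586,0.076355,0.972312
M,0.609845,0.978429,0.579128,0.477498,0.559838
N,0.623352,0.290648,0.980757,0.52493,0.686045
O,0.489404,0.80495,0.499854,0.73664,0.281975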


In [467]:
df_7

Unnamed: 0,A1,B2,C3,D4,E5,K1,L2,M3,N4,O5
A,0.712928,0.15481,0.185212,0.435413,0.353931,,,,,
B,0.339131,0.724045,0.747087,0.051563,0.121195,,,,,
C,0.722849,0.244364,0.577053,0.162271,0.369202,,,,,
D,0.16287,0.043941,0.441604,0.068215,0.957659,,,,,
E,0.580659,0.937651,0.727212,0.646666,0.101874,,,,,
K,,,,,,0.580036,0.365306,0.176739,0.440281,0.271406
L,,,,,,0.499078,0.26052,0.512586,0.076355,0.972312
M,,,,,,0.609845,0.978429,0.579128,0.477498,0.559838
N,,,,,,0.623352,0.290648,0.980757,0.52493,0.686045
O,,,,,,0.489404,0.80495,0.499854,0.73664,0.281975


In [235]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5), index=['A', 'B', 'C', 'D', 'E'], columns=['A1', 'B2', 'C3', 'D4', 'E5'])
df

Unnamed: 0,A1,B2,C3,D4,E5
A,0.259183,0.251469,0.608292,0.736487,0.756833
B,0.274558,0.881293,0.283352,0.349801,0.470844
C,0.301105,0.239927,0.040401,0.605154,0.754023
D,0.119669,0.766018,0.384817,0.727653,0.194563
E,0.197149,0.785478,0.117229,0.945975,0.013604


In [200]:
df.max()

A1    0.301105
B2    0.881293
C3    0.608292
D4    0.945975
E5    0.756833
dtype: float64

In [202]:
df.min()

A1    0.119669
B2    0.239927
C3    0.040401
D4    0.349801
E5    0.013604
dtype: float64

In [203]:
df.mean()

A1    0.230333
B2    0.584837
C3    0.286818
D4    0.673014
E5    0.437973
dtype: float64

In [204]:
df.std()

A1    0.072706
B2    0.312676
C3    0.224968
D4    0.218285
E5    0.332407
dtype: float64

In [237]:
df.describe()

Unnamed: 0,A1,B2,C3,D4,E5
count,5.0,5.0,5.0,5.0,5.0
mean,0.230333,0.584837,0.286818,0.673014,0.437973
std,0.072706,0.312676,0.224968,0.218285,0.332407
min,0.119669,0.239927,0.040401,0.349801,0.013604
25%,0.197149,0.251469,0.117229,0.605154,0.194563
50%,0.259183,0.766018,0.283352,0.727653,0.470844
75%,0.274558,0.785478,0.384817,0.736487,0.754023
max,0.301105,0.881293,0.608292,0.945975,0.756833


In [250]:
# Adding a column with arithmetic
# Addition

df['F6'] = df['C3'] + df['B2']
df

Unnamed: 0,A1,B2,C3,D4,E5,F6
A,0.259183,0.251469,0.608292,0.736487,0.756833,0.859761
B,0.274558,0.881293,0.283352,0.349801,0.470844,1.164645
C,0.301105,0.239927,0.040401,0.605154,0.754023,0.280329
D,0.119669,0.766018,0.384817,0.727653,0.194563,1.150834
E,0.197149,0.785478,0.117229,0.945975,0.013604,0.902707


In [251]:
# Subtraction

df['F6'] = df['C3'] - df['B2']
df

Unnamed: 0,A1,B2,C3,D4,E5,F6
A,0.259183,0.251469,0.608292,0.736487,0.756833,0.356822
B,0.274558,0.881293,0.283352,0.349801,0.470844,-0.597941
C,0.301105,0.239927,0.040401,0.605154,0.754023,-0.199526
D,0.119669,0.766018,0.384817,0.727653,0.194563,-0.381201
E,0.197149,0.785478,0.117229,0.945975,0.013604,-0.668248


In [252]:
# Inner (dot) product of two columns; the scalar result is broadcast to every row

df['F6'] = df['C3'] @ df['B2']
df

Unnamed: 0,A1,B2,C3,D4,E5,F6
A,0.259183,0.251469,0.608292,0.736487,0.756833,0.799234
B,0.274558,0.881293,0.283352,0.349801,0.470844,0.799234
C,0.301105,0.239927,0.040401,0.605154,0.754023,0.799234
D,0.119669,0.766018,0.384817,0.727653,0.194563,0.799234
E,0.197149,0.785478,0.117229,0.945975,0.013604,0.799234


In [254]:
# Inner (dot) product using the dot method
df['F6'] = df['C3'].dot(df['B2'])
df

Unnamed: 0,A1,B2,C3,D4,E5,F6
A,0.259183,0.251469,0.608292,0.736487,0.756833,0.799234
B,0.274558,0.881293,0.283352,0.349801,0.470844,0.799234
C,0.301105,0.239927,0.040401,0.605154,0.754023,0.799234
D,0.119669,0.766018,0.384817,0.727653,0.194563,0.799234
E,0.197149,0.785478,0.117229,0.945975,0.013604,0.799234


In [253]:
# Element-wise multiplication

df['F6'] = df['C3'] * df['B2']
df

Unnamed: 0,A1,B2,C3,D4,E5,F6
A,0.259183,0.251469,0.608292,0.736487,0.756833,0.152967
B,0.274558,0.881293,0.283352,0.349801,0.470844,0.249716
C,0.301105,0.239927,0.040401,0.605154,0.754023,0.009693
D,0.119669,0.766018,0.384817,0.727653,0.194563,0.294776
E,0.197149,0.785478,0.117229,0.945975,0.013604,0.092081


In [196]:
df.nunique()

A1    5
B2    5
C3    5
D4    5
E5    5
F6    5
dtype: int64

In [197]:
df.isnull()

Unnamed: 0,A1,B2,C3,D4,E5,F6
A,False,False,False,False,False,False
B,False,False,False,False,False,False
C,False,False,False,False,False,False
D,False,False,False,False,False,False
E,False,False,False,False,False,False
