# pandas in 10 minutes
## Introduction
This article is a hands-on transcription of, and commentary on, the official pandas tutorial "10 minutes to pandas".

It is based on the following URL:
https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html


## Environment
- Python3.8
- Jupyter Lab


## Importing the libraries

In [89]:
import numpy as np
import pandas as pd

In [90]:
np

<module 'numpy' from 'C:\\Users\\user\\AppData\\Local\\Programs\\Python\\Python37\\lib\\site-packages\\numpy\\__init__.py'>

In [91]:
pd

<module 'pandas' from 'C:\\Users\\user\\AppData\\Roaming\\Python\\Python37\\site-packages\\pandas\\__init__.py'>

## [1. Object creation](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#object-creation)

### The Series class
Passing a list to the [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) class is an easy way to create data.


In [92]:
# Create a single column of data
s = pd.Series(data=[1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
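
The `float64` dtype in the output above comes from the `np.nan` entry: a missing value cannot be stored in a plain integer column. As a minimal sketch with made-up values (the names `ints` and `with_nan` are only for illustration), the two cases can be compared like this:

```python
import numpy as np
import pandas as pd

# Integers only: pandas keeps an integer dtype.
ints = pd.Series([1, 3, 5])

# A single missing value (np.nan) turns the whole Series into float64.
with_nan = pd.Series([1, 3, np.nan])

print(ints.dtype)      # int64
print(with_nan.dtype)  # float64
```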

### The date_range function
The [date_range()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html) function creates a DatetimeIndex covering a given period.

In [93]:
# Six consecutive days starting from January 1, 2020
dates = pd.date_range("20200101", periods=6)
dates

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06'],
              dtype='datetime64[ns]', freq='D')
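
The same range can presumably be written with an explicit end date instead of `periods`. The sketch below assumes the default daily frequency; the variable names are illustrative only.

```python
import pandas as pd

# Six days given as a start date plus a number of periods.
by_periods = pd.date_range("2020-01-01", periods=6)

# The same six days given as a start date and an end date (both inclusive).
by_end = pd.date_range(start="2020-01-01", end="2020-01-06")

print(by_periods.equals(by_end))  # True
```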

### The DataFrame class
Specifying the **index argument** of the pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas-dataframe) class sets the row index.

In [94]:
# Use the dates starting from January 1, 2020 as the row index
# Fill every cell with a random number
df = pd.DataFrame(np.random.randn(6, 4), index=dates)
df

Unnamed: 0,0,1,2,3
2020-01-01,-0.378659,-0.919296,0.247877,1.571683
2020-01-02,0.593324,1.400992,-0.640611,-1.036085
2020-01-03,0.293012,-1.503893,-0.265546,-1.154599
2020-01-04,0.174863,-0.216113,-0.385356,0.317343
2020-01-05,-1.026046,0.778907,-0.26569,-0.301124
2020-01-06,0.635393,1.288923,0.905714,0.209032


Likewise, specifying the **columns argument** of the DataFrame class sets the column names.

In [95]:
# Set the column names to A, B, C, D
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


When you pass a dict to the DataFrame class, the dict keys become the column names.

In [96]:
df2 = pd.DataFrame(
    {
        "A": 1.,
        "B": pd.Timestamp("20200101"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

Unnamed: 0,A,B,C,D,E,F
0,1.0,2020-01-01,1.0,3,test,foo
1,1.0,2020-01-01,1.0,3,train,foo
2,1.0,2020-01-01,1.0,3,test,foo
3,1.0,2020-01-01,1.0,3,train,foo


### The DataFrame.dtypes property
Accessing the **dtypes property** shows the data type of each column.

In [97]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
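
Once the dtypes are known, they can be used to pick columns. The following is a minimal sketch, assuming a small made-up frame (`mixed`) rather than the `df2` above; it uses `select_dtypes` to keep only the numeric columns.

```python
import numpy as np
import pandas as pd

# A small frame with mixed column types (illustrative values).
mixed = pd.DataFrame(
    {
        "A": [1.0, 2.0],
        "D": np.array([3, 3], dtype="int32"),
        "F": ["foo", "bar"],
    }
)

# Keep only the numeric columns; the string column F is dropped.
numeric = mixed.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'D']
```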

## [2. Viewing data](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#viewing-data)


### The DataFrame.head method
The [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html#pandas.DataFrame.head) method of the DataFrame class displays the first rows of the data.

In [98]:
df.head(2)

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085


### The DataFrame.tail method
Likewise, the [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html#pandas-dataframe-tail) method of the DataFrame class displays the last rows of the data.

In [99]:
df.tail(2)

Unnamed: 0,A,B,C,D
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


### The DataFrame.index property
Accessing the [index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.index.html#pandas-dataframe-index) property of the DataFrame class shows the row index of the data.


In [100]:
df.index

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06'],
              dtype='datetime64[ns]', freq='D')

In [101]:
df2.index

Int64Index([0, 1, 2, 3], dtype='int64')

### The DataFrame.to_numpy method
The [to_numpy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas-dataframe-to-numpy) method of the DataFrame class converts the data into a NumPy array, which is easier to manipulate with NumPy.


In [102]:
df.to_numpy()

array([[-2.28240557, -0.27795345,  0.03647782,  0.86262606],
       [-0.25038302,  0.49319626,  0.02672862, -1.62408484],
       [ 0.6731027 , -0.98998077, -0.92515717,  0.06075122],
       [ 1.35048915, -0.71702368, -0.80938195, -0.15809403],
       [-0.05187568, -0.62931839, -0.67295941, -0.17658501],
       [-0.68684761, -1.82711825, -1.51347755,  0.20420576]])

In [103]:
df2.to_numpy()

array([[1.0, Timestamp('2020-01-01 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2020-01-01 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2020-01-01 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2020-01-01 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
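
The two outputs above differ in dtype: an all-float frame converts to a `float64` array, while a frame with mixed column types falls back to `object`. A minimal sketch with made-up frames (`floats` and `mixed` are illustrative names) shows the same effect:

```python
import pandas as pd

# All columns are floats, so the array keeps a numeric dtype.
floats = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

# Mixing floats and strings forces a generic object array.
mixed = pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]})

print(floats.to_numpy().dtype)  # float64
print(mixed.to_numpy().dtype)   # object
```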

In [104]:
df

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


### The DataFrame.describe method
The [describe()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html#pandas-dataframe-describe) method of the DataFrame class produces quick summary statistics for each column of the data.


In [105]:
df.describe()

Unnamed: 0,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.207987,-0.658033,-0.642962,-0.13853
std,1.246102,0.76825,0.596012,0.820756
min,-2.282406,-1.827118,-1.513478,-1.624085
25%,-0.577731,-0.921741,-0.896213,-0.171962
50%,-0.151129,-0.673171,-0.741171,-0.048671
75%,0.491858,-0.365795,-0.148193,0.168342
max,1.350489,0.493196,0.036478,0.862626
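
The result of `describe()` is itself a DataFrame, so individual statistics can be read with `loc`. The sketch below uses a small made-up frame (`df_small`, not the random `df` above) so that the values are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hand-checkable values (illustrative only).
df_small = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]}
)

summary = df_small.describe()

# Rows of the summary are labeled by statistic name.
print(summary.loc["mean", "A"])  # 2.5
print(summary.loc["50%", "B"])   # 25.0

# The "std" row matches Series.std(), i.e. the sample standard deviation.
print(np.isclose(summary.loc["std", "A"], df_small["A"].std()))  # True
```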


### The DataFrame.T attribute
Accessing the [T](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.T.html#pandas-dataframe-t) attribute of the DataFrame class gives the data with rows and columns swapped.

In [106]:
df.T

Unnamed: 0,2020-01-01,2020-01-02,2020-01-03,2020-01-04,2020-01-05,2020-01-06
A,-2.282406,-0.250383,0.673103,1.350489,-0.051876,-0.686848
B,-0.277953,0.493196,-0.989981,-0.717024,-0.629318,-1.827118
C,0.036478,0.026729,-0.925157,-0.809382,-0.672959,-1.513478
D,0.862626,-1.624085,0.060751,-0.158094,-0.176585,0.204206


### The DataFrame.transpose method
The [transpose()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html#pandas-dataframe-transpose) method of the DataFrame class likewise returns the data with rows and columns swapped.

In [107]:
df.transpose()

Unnamed: 0,2020-01-01,2020-01-02,2020-01-03,2020-01-04,2020-01-05,2020-01-06
A,-2.282406,-0.250383,0.673103,1.350489,-0.051876,-0.686848
B,-0.277953,0.493196,-0.989981,-0.717024,-0.629318,-1.827118
C,0.036478,0.026729,-0.925157,-0.809382,-0.672959,-1.513478
D,0.862626,-1.624085,0.060751,-0.158094,-0.176585,0.204206
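
As a quick check that the two spellings agree, the sketch below transposes a small made-up frame (`wide` is an illustrative name) both ways and compares shapes and contents.

```python
import pandas as pd

# Three rows, two columns (illustrative values).
wide = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

print(wide.shape)    # (3, 2)
print(wide.T.shape)  # (2, 3)

# .T and .transpose() give the same result.
print(wide.T.equals(wide.transpose()))  # True
```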


### The DataFrame.sort_index method

The [sort_index()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html#pandas-dataframe-sort-index) method of the DataFrame class sorts the data by its row labels or by its column labels.

In [108]:
df.sort_index()

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


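Setting the **axis argument** to 0 or "index" sorts by the row labels, and setting it to 1 or "columns" sorts by the column labels (the default is 0).  
Setting the **ascending argument** to False sorts in descending order (the default is True).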

In [109]:
df.sort_index(axis="columns", ascending=False)

Unnamed: 0,D,C,B,A
2020-01-01,0.862626,0.036478,-0.277953,-2.282406
2020-01-02,-1.624085,0.026729,0.493196,-0.250383
2020-01-03,0.060751,-0.925157,-0.989981,0.673103
2020-01-04,-0.158094,-0.809382,-0.717024,1.350489
2020-01-05,-0.176585,-0.672959,-0.629318,-0.051876
2020-01-06,0.204206,-1.513478,-1.827118,-0.686848


In [110]:
df.sort_index(axis=0, ascending=False)

Unnamed: 0,A,B,C,D
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
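
Because `df` above is already in index order, the default `sort_index()` call leaves it visually unchanged. A minimal sketch with a deliberately shuffled, made-up frame (`shuffled`) makes the effect easier to see:

```python
import pandas as pd

# Row labels and column labels are both out of order (illustrative values).
shuffled = pd.DataFrame({"B": [1, 2, 3], "A": [4, 5, 6]}, index=[2, 0, 1])

# Sort by row labels (axis=0 is the default).
by_rows = shuffled.sort_index()

# Sort by column labels.
by_cols = shuffled.sort_index(axis="columns")

print(list(by_rows.index))    # [0, 1, 2]
print(list(by_cols.columns))  # ['A', 'B']

# sort_index returns a new object; the original is left as it was.
print(list(shuffled.index))   # [2, 0, 1]
```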


### The DataFrame.sort_values method
The [sort_values()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html#pandas-dataframe-sort-values) method of the DataFrame class sorts the data by the values of a given column (or, with axis=1, by the values of a given row).


In [111]:
df.sort_values(by="B")

Unnamed: 0,A,B,C,D
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085


In [112]:
df.sort_values(by="2020-01-01", axis=1)

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206
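
`sort_values` also accepts several sort keys and a per-key sort direction. The sketch below is an illustration with made-up data (`scores`), not part of the original tutorial run.

```python
import pandas as pd

# Illustrative values only.
scores = pd.DataFrame(
    {"team": ["x", "y", "x", "y"], "points": [3, 1, 2, 4]}
)

# Sort by team ascending, then by points descending within each team.
ordered = scores.sort_values(by=["team", "points"], ascending=[True, False])

print(ordered["points"].tolist())  # [3, 2, 4, 1]
print(ordered.index.tolist())      # [0, 2, 3, 1]
```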


## [3. Selection](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#selection)
### [3.1 Getting](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#getting)

Writing **df["A"]** or **df.A** selects the specified single column.

In [113]:
df["A"]

2020-01-01   -2.282406
2020-01-02   -0.250383
2020-01-03    0.673103
2020-01-04    1.350489
2020-01-05   -0.051876
2020-01-06   -0.686848
Freq: D, Name: A, dtype: float64

In [114]:
df.A

2020-01-01   -2.282406
2020-01-02   -0.250383
2020-01-03    0.673103
2020-01-04    1.350489
2020-01-05   -0.051876
2020-01-06   -0.686848
Freq: D, Name: A, dtype: float64

When **[]** is given a slice, Python slice syntax selects rows.

You can also slice by a range of index labels.

In [115]:
print("Show the first 3 rows")
df[0:3]

Show the first 3 rows


Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751


In [116]:
# Show January 2, 2020 through January 4, 2020
df['20200102':'20200104']

Unnamed: 0,A,B,C,D
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094


### [3.2 Selection by label](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#selection-by-label)

Passing index labels (here, dates) to the [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas-dataframe-loc) indexer of the DataFrame class selects rows by label. Passing a single label returns that row as a Series.

In [117]:
df.loc[dates]

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


In [118]:
df.loc[dates[0]]

A   -2.282406
B   -0.277953
C    0.036478
D    0.862626
Name: 2020-01-01 00:00:00, dtype: float64

With [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas-dataframe-loc), you can select multiple columns.

In [119]:
df.loc[:, ["A", "B"]]

Unnamed: 0,A,B
2020-01-01,-2.282406,-0.277953
2020-01-02,-0.250383,0.493196
2020-01-03,0.673103,-0.989981
2020-01-04,1.350489,-0.717024
2020-01-05,-0.051876,-0.629318
2020-01-06,-0.686848,-1.827118


Combining [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas-dataframe-loc) with slicing selects multiple rows and multiple columns. A label slice includes both endpoints.

In [120]:
df.loc['20200102':'20200104', ['A', 'B']]

Unnamed: 0,A,B
2020-01-02,-0.250383,0.493196
2020-01-03,0.673103,-0.989981
2020-01-04,1.350489,-0.717024


Giving [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas-dataframe-loc) both a row label and a column label retrieves a single value.

In [121]:
df.loc[dates[0], 'A']

-2.2824055707722506

Using [at](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html) retrieves a single value faster.

In [122]:
df.at[dates[0], 'A']

-2.2824055707722506

### [3.3 Selection by position](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#selection-by-position)

The [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) indexer of the DataFrame class selects data by integer position.


In [123]:
df

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


In [124]:
df.iloc[3] # Select the 4th row as a Series

A    1.350489
B   -0.717024
C   -0.809382
D   -0.158094
Name: 2020-01-04 00:00:00, dtype: float64

In [125]:
df.iloc[3:5, 0:2] # Select rows 4 through 5 and columns 1 through 2

Unnamed: 0,A,B
2020-01-04,1.350489,-0.717024
2020-01-05,-0.051876,-0.629318


In [126]:
df.iloc[[1, 2, 4], [0, 2]] # Select rows 2, 3, and 5 and columns 1 and 3

Unnamed: 0,A,C
2020-01-02,-0.250383,0.026729
2020-01-03,0.673103,-0.925157
2020-01-05,-0.051876,-0.672959


Passing a slice with the start and end omitted (just :) to [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) selects all rows or all columns.

In [127]:
df.iloc[1:3, :] # Select rows 2 through 3, all columns


Unnamed: 0,A,B,C,D
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751


In [128]:
df.iloc[:, 1:3] # Select columns 2 through 3, all rows

Unnamed: 0,B,C
2020-01-01,-0.277953,0.036478
2020-01-02,0.493196,0.026729
2020-01-03,-0.989981,-0.925157
2020-01-04,-0.717024,-0.809382
2020-01-05,-0.629318,-0.672959
2020-01-06,-1.827118,-1.513478


Passing only integers to [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) selects a single value.

In [129]:
df.iloc[1, 1]

0.4931962571517133

As with [at](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html), using [iat](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iat.html) retrieves a single value faster.

In [130]:
df.iat[1, 1]

0.4931962571517133
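
One difference between label-based and position-based selection is easy to miss: a `loc` label slice includes its end label, while an `iloc` position slice excludes its end position. The following sketch, with a made-up frame (`frame`) on a six-day index, selects the same three rows both ways.

```python
import numpy as np
import pandas as pd

# Six days with values 0..5 (illustrative data).
idx = pd.date_range("2020-01-01", periods=6)
frame = pd.DataFrame({"A": np.arange(6)}, index=idx)

# Label slice: both endpoints are included (Jan 2, 3, 4).
by_label = frame.loc["2020-01-02":"2020-01-04"]

# Position slice: the end position is excluded (positions 1, 2, 3).
by_position = frame.iloc[1:4]

print(by_label["A"].tolist())         # [1, 2, 3]
print(by_label.equals(by_position))   # True
```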

### [3.4 Boolean indexing](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#boolean-indexing)

In [131]:
df

Unnamed: 0,A,B,C,D
2020-01-01,-2.282406,-0.277953,0.036478,0.862626
2020-01-02,-0.250383,0.493196,0.026729,-1.624085
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206


To select the rows where the value in column A is greater than 0, do the following.

In [132]:
df[df["A"] > 0] 

Unnamed: 0,A,B,C,D
2020-01-03,0.673103,-0.989981,-0.925157,0.060751
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094


Applying a condition to the whole DataFrame keeps only the values that satisfy it; all other values become NaN.

In [133]:
df[df > 0]

Unnamed: 0,A,B,C,D
2020-01-01,,,0.036478,0.862626
2020-01-02,,0.493196,0.026729,
2020-01-03,0.673103,,,0.060751
2020-01-04,1.350489,,,
2020-01-05,,,,
2020-01-06,,,,0.204206


The [isin()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html#pandas-series-isin) method filters rows by whether a value is in a given list.

In [134]:
df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
df2

Unnamed: 0,A,B,C,D,E
2020-01-01,-2.282406,-0.277953,0.036478,0.862626,one
2020-01-02,-0.250383,0.493196,0.026729,-1.624085,one
2020-01-03,0.673103,-0.989981,-0.925157,0.060751,two
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094,three
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585,four
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206,three


In [135]:
df2[df2['E'].isin(['two', 'four'])]

Unnamed: 0,A,B,C,D,E
2020-01-03,0.673103,-0.989981,-0.925157,0.060751,two
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585,four
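
Conditions can also be combined. The sketch below, on a made-up frame (`data`), uses `&` for "and" and `~` for "not". Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Illustrative values only.
data = pd.DataFrame(
    {
        "A": [-1.0, 0.5, 2.0, -0.3],
        "B": [0.2, -0.7, -1.5, 0.9],
        "E": ["one", "two", "three", "two"],
    }
)

# Rows where A is positive AND B is negative.
both = data[(data["A"] > 0) & (data["B"] < 0)]

# Rows where E is NOT "two" (negating an isin mask).
not_two = data[~data["E"].isin(["two"])]

print(both.index.tolist())     # [1, 2]
print(not_two.index.tolist())  # [0, 2]
```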


### [3.5 Setting](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#setting)

When you set a new column, the data is automatically aligned by index.

In [144]:
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20200102', periods=6)) # Create the data for the new column
s1

2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
2020-01-07    6
Freq: D, dtype: int64

In [145]:
df['F'] = s1 # Add s1 to the DataFrame as column F
df

Unnamed: 0,A,B,C,D,F
2020-01-01,-2.282406,-0.277953,0.036478,0.862626,
2020-01-02,-0.250383,0.493196,0.026729,-1.624085,1.0
2020-01-03,0.673103,-0.989981,-0.925157,0.060751,2.0
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094,3.0
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585,4.0
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206,5.0
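
The alignment rule is what produces the NaN in the first row above: `s1` has no entry for 2020-01-01, and its entry for 2020-01-07 has no matching row in `df`. A minimal sketch with made-up string labels (`base` and `extra` are illustrative names) shows both effects:

```python
import pandas as pd

# The frame has labels a, b, c; the Series has labels b, c, d.
base = pd.DataFrame({"A": [10, 20, 30]}, index=["a", "b", "c"])
extra = pd.Series([1, 2, 3], index=["b", "c", "d"])

# Values are matched by label, not by position.
base["F"] = extra

print(base["F"].isna().tolist())  # [True, False, False]
print(base.loc["b", "F"])         # 1.0
print("d" in base.index)          # False: the unmatched label is dropped
```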


You can also set values by label.

In [146]:
print(dates)
df.at[dates[0], 'A'] = 0 # Set row 1, column A to 0
df

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06'],
              dtype='datetime64[ns]', freq='D')


Unnamed: 0,A,B,C,D,F
2020-01-01,0.0,-0.277953,0.036478,0.862626,
2020-01-02,-0.250383,0.493196,0.026729,-1.624085,1.0
2020-01-03,0.673103,-0.989981,-0.925157,0.060751,2.0
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094,3.0
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585,4.0
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206,5.0


You can also set values by position.

In [151]:
df.iat[0, 1] = 0 # Set row 1, column B to 0
df

Unnamed: 0,A,B,C,D,F
2020-01-01,0.0,0.0,0.036478,0.862626,
2020-01-02,-0.250383,0.493196,0.026729,-1.624085,1.0
2020-01-03,0.673103,-0.989981,-0.925157,0.060751,2.0
2020-01-04,1.350489,-0.717024,-0.809382,-0.158094,3.0
2020-01-05,-0.051876,-0.629318,-0.672959,-0.176585,4.0
2020-01-06,-0.686848,-1.827118,-1.513478,0.204206,5.0


You can also set values with a NumPy array.

In [155]:
df.loc[:, 'D'] = np.array([5] * len(df)) # Assign a NumPy array to column D
df

Unnamed: 0,A,B,C,D,F
2020-01-01,0.0,0.0,0.036478,5,
2020-01-02,-0.250383,0.493196,0.026729,5,1.0
2020-01-03,0.673103,-0.989981,-0.925157,5,2.0
2020-01-04,1.350489,-0.717024,-0.809382,5,3.0
2020-01-05,-0.051876,-0.629318,-0.672959,5,4.0
2020-01-06,-0.686848,-1.827118,-1.513478,5,5.0


You can also set values on data selected by a condition.

In [157]:
df2 = df.copy()
df2[df2 > 0] = 9999 # Set every value greater than 0 to 9999
df2

Unnamed: 0,A,B,C,D,F
2020-01-01,0.0,0.0,9999.0,9999,
2020-01-02,-0.250383,9999.0,9999.0,9999,9999.0
2020-01-03,9999.0,-0.989981,-0.925157,9999,9999.0
2020-01-04,9999.0,-0.717024,-0.809382,9999,9999.0
2020-01-05,-0.051876,-0.629318,-0.672959,9999,9999.0
2020-01-06,-0.686848,-1.827118,-1.513478,9999,9999.0
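
The assignment above modifies `df2` in place, which is why a copy is taken first. As a possible alternative (not used in the original notebook), `DataFrame.mask` returns a new frame and leaves the original untouched. The sketch below uses made-up values.

```python
import pandas as pd

# Illustrative values only.
original = pd.DataFrame({"A": [-1.0, 2.0], "B": [3.0, -4.0]})

# Replace every value greater than 0 with 9999 in a new frame.
replaced = original.mask(original > 0, 9999)

print(replaced["A"].tolist())  # [-1.0, 9999.0]
print(replaced["B"].tolist())  # [9999.0, -4.0]
print(original["A"].tolist())  # [-1.0, 2.0]
```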


## [4. Missing data](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#missing-data)

pandas primarily uses [np.nan](https://docs.scipy.org/doc/numpy/reference/constants.html#numpy.nan) to represent missing data.  
The [reindex()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html) method of the DataFrame class returns a new DataFrame in which the specified rows and columns have been changed, added, or removed.

In [160]:
df

Unnamed: 0,A,B,C,D,F
2020-01-01,0.0,0.0,0.036478,5,
2020-01-02,-0.250383,0.493196,0.026729,5,1.0
2020-01-03,0.673103,-0.989981,-0.925157,5,2.0
2020-01-04,1.350489,-0.717024,-0.809382,5,3.0
2020-01-05,-0.051876,-0.629318,-0.672959,5,4.0
2020-01-06,-0.686848,-1.827118,-1.513478,5,5.0


In [172]:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])

In [173]:
df1.loc[dates[0]:dates[1], 'E'] = 1
df1

Unnamed: 0,A,B,C,D,F,E
2020-01-01,0.0,0.0,0.036478,5,,1.0
2020-01-02,-0.250383,0.493196,0.026729,5,1.0,1.0
2020-01-03,0.673103,-0.989981,-0.925157,5,2.0,
2020-01-04,1.350489,-0.717024,-0.809382,5,3.0,


The [dropna()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) method of the DataFrame class drops rows that contain missing data (NaN).

In [175]:
df1.dropna(how='any') # Drop rows that contain NaN

Unnamed: 0,A,B,C,D,F,E
2020-01-02,-0.250383,0.493196,0.026729,5,1.0,1.0


The pandas [isna()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html) function tests each value for whether it is missing.

In [177]:
pd.isna(df1)

Unnamed: 0,A,B,C,D,F,E
2020-01-01,False,False,False,False,True,False
2020-01-02,False,False,False,False,False,False
2020-01-03,False,False,False,False,False,True
2020-01-04,False,False,False,False,False,True
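
Besides dropping or detecting missing values, they can be filled in with `fillna`. The sketch below uses a small made-up frame (`gaps`) and an arbitrary fill value of 5; the right fill value depends on the data.

```python
import numpy as np
import pandas as pd

# Two columns with gaps (illustrative values).
gaps = pd.DataFrame(
    {"F": [np.nan, 1.0, 2.0], "E": [1.0, np.nan, np.nan]}
)

# Count the missing values per column.
print(gaps.isna().sum().tolist())  # [1, 2]

# Fill every NaN with 5; the original frame is not modified.
filled = gaps.fillna(value=5)
print(filled["F"].tolist())  # [5.0, 1.0, 2.0]
print(filled["E"].tolist())  # [1.0, 5.0, 5.0]
```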


## [5. Operations](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#operations)

## [6. Merge](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#merge)

## [7. Grouping](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#grouping)

## [8. Reshaping](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#reshaping)

## [9. Time series](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#time-series)

## [10. Categoricals](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#categoricals)

## [11. Plotting](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#plotting)

## [12. Getting data in/out](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#getting-data-in-out)

## [13. Gotchas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#gotchas)