# Chapter 1: Ten Knocks for Analyzing Web Order Counts

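In this chapter we analyze how the number of product orders on a company's e-commerce site changes over time.  
We first get to know the attributes of the data and reshape it for analysis,  
then visualize it, learning along the way how visualization helps uncover problems.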

### Knock 1: Load the data

In [2]:
import pandas as pd

# Load the customer master and preview the first five rows.
customer_master = pd.read_csv('customer_master.csv')
customer_master.head()

Unnamed: 0,customer_id,customer_name,registration_date,customer_name_kana,email,gender,age,birth,pref
0,IK152942,平田 裕次郎,2019-01-01 00:25:33,ひらた ゆうじろう,hirata_yuujirou@example.com,M,29,1990/6/10,石川県
1,TS808488,田村 詩織,2019-01-01 01:13:45,たむら しおり,tamura_shiori@example.com,F,33,1986/5/20,東京都
2,AS834628,久野 由樹,2019-01-01 02:00:14,ひさの ゆき,hisano_yuki@example.com,F,63,1956/1/2,茨城県
3,AS345469,鶴岡 薫,2019-01-01 04:48:22,つるおか かおる,tsuruoka_kaoru@example.com,M,74,1945/3/25,東京都
4,GD892565,大内 高史,2019-01-01 04:54:51,おおうち たかし,oouchi_takashi@example.com,M,54,1965/8/5,千葉県


In [3]:
item_master = pd.read_csv('item_master.csv')
item_master.head()

Unnamed: 0,item_id,item_name,item_price
0,S001,PC-A,50000
1,S002,PC-B,85000
2,S003,PC-C,120000
3,S004,PC-D,180000
4,S005,PC-E,210000


In [4]:
transaction1 = pd.read_csv('transaction_1.csv')
transaction1.head()

Unnamed: 0,transaction_id,price,payment_date,customer_id
0,T0000000113,210000,2019-02-01 01:36:57,PL563502
1,T0000000114,50000,2019-02-01 01:37:23,HD678019
2,T0000000115,120000,2019-02-01 02:34:19,HD298120
3,T0000000116,210000,2019-02-01 02:47:23,IK452215
4,T0000000117,170000,2019-02-01 04:33:46,PL542865


In [6]:
transaction_detail1 = pd.read_csv('transaction_detail_1.csv')
transaction_detail1.head()

Unnamed: 0,detail_id,transaction_id,item_id,quantity
0,0,T0000000113,S005,1
1,1,T0000000114,S001,1
2,2,T0000000115,S003,1
3,3,T0000000116,S005,1
4,4,T0000000117,S002,2


### Knock 2: Combine the data (union)

In [11]:
transaction2 = pd.read_csv('transaction_2.csv')
transaction = pd.concat([transaction1, transaction2], ignore_index=True)
transaction.head()

Unnamed: 0,transaction_id,price,payment_date,customer_id
0,T0000000113,210000,2019-02-01 01:36:57,PL563502
1,T0000000114,50000,2019-02-01 01:37:23,HD678019
2,T0000000115,120000,2019-02-01 02:34:19,HD298120
3,T0000000116,210000,2019-02-01 02:47:23,IK452215
4,T0000000117,170000,2019-02-01 04:33:46,PL542865


In [13]:
transaction_detail2 = pd.read_csv('transaction_detail_2.csv')
transaction_detail = pd.concat([transaction_detail1, transaction_detail2], ignore_index=True)

In [20]:
len(transaction_detail)

7144
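
The union step above can be checked on a small scale. The following is a minimal sketch on made-up toy frames (the names `part1`, `part2`, and `combined` are hypothetical stand-ins, not the chapter's CSV data): it stacks two frames with `pd.concat` and confirms that no rows were lost.

```python
import pandas as pd

# Toy stand-ins for the two detail files (values are made up for illustration).
part1 = pd.DataFrame({"detail_id": [0, 1, 2], "item_id": ["S005", "S001", "S003"]})
part2 = pd.DataFrame({"detail_id": [3, 4], "item_id": ["S005", "S002"]})

# Stack the frames vertically; ignore_index=True renumbers the rows from 0.
combined = pd.concat([part1, part2], ignore_index=True)

# A union should keep every row from both inputs.
rows_match = len(combined) == len(part1) + len(part2)
print(len(combined), rows_match)
```

Without `ignore_index=True`, the original row labels of each input would be kept, so the combined frame would contain duplicate index values.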

### Knock 3: Join the sales data

In [27]:
# Left-join the transaction header columns onto each detail row.
joined_transaction = pd.merge(
    transaction_detail,
    transaction[['transaction_id', 'payment_date', 'customer_id']],
    on='transaction_id',
    how='left',
)
print(joined_transaction.head())

   detail_id transaction_id item_id  quantity         payment_date customer_id
0          0    T0000000113    S005         1  2019-02-01 01:36:57    PL563502
1          1    T0000000114    S001         1  2019-02-01 01:37:23    HD678019
2          2    T0000000115    S003         1  2019-02-01 02:34:19    HD298120
3          3    T0000000116    S005         1  2019-02-01 02:47:23    IK452215
4          4    T0000000117    S002         2  2019-02-01 04:33:46    PL542865


In [28]:
print(len(transaction_detail))
print(len(transaction))
print(len(joined_transaction))

7144
6786
7144
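
The joined frame has the same number of rows as `transaction_detail` (7144) rather than `transaction` (6786), because a left join keeps one row per detail line. The sketch below illustrates this on made-up toy data; the frame names `detail` and `header` are hypothetical, and the `validate` argument is an optional safeguard that is not part of the original cell.

```python
import pandas as pd

# Toy detail lines: transaction T1 has two lines, T2 has one.
detail = pd.DataFrame({
    "detail_id": [0, 1, 2],
    "transaction_id": ["T1", "T1", "T2"],
    "item_id": ["S001", "S002", "S001"],
    "quantity": [1, 2, 1],
})
# Toy transaction headers: one row per transaction.
header = pd.DataFrame({
    "transaction_id": ["T1", "T2"],
    "payment_date": ["2019-02-01 01:36:57", "2019-02-01 01:37:23"],
    "customer_id": ["C1", "C2"],
})

# validate="many_to_one" raises an error if transaction_id is duplicated
# on the right side, which would otherwise silently multiply rows.
joined = pd.merge(detail, header, on="transaction_id", how="left",
                  validate="many_to_one")

print(len(detail), len(header), len(joined))
```

If the row count grows after a left join, the usual cause is a duplicated key in the right-hand table.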


### Knock 4: Join the master data

In [33]:
joined_data = pd.merge(joined_transaction, item_master, on='item_id', how='left')
print(joined_data.head())
joined_data = pd.merge(joined_data, customer_master, on='customer_id', how='left')
print(joined_data.head())

   detail_id transaction_id item_id  quantity         payment_date  \
0          0    T0000000113    S005         1  2019-02-01 01:36:57   
1          1    T0000000114    S001         1  2019-02-01 01:37:23   
2          2    T0000000115    S003         1  2019-02-01 02:34:19   
3          3    T0000000116    S005         1  2019-02-01 02:47:23   
4          4    T0000000117    S002         2  2019-02-01 04:33:46   

  customer_id item_name  item_price  
0    PL563502      PC-E      210000  
1    HD678019      PC-A       50000  
2    HD298120      PC-C      120000  
3    IK452215      PC-E      210000  
4    PL542865      PC-B       85000  
   detail_id transaction_id item_id  quantity         payment_date  \
0          0    T0000000113    S005         1  2019-02-01 01:36:57   
1          1    T0000000114    S001         1  2019-02-01 01:37:23   
2          2    T0000000115    S003         1  2019-02-01 02:34:19   
3          3    T0000000116    S005         1  2019-02-01 02:47:23   
4

### Knock 5: Create the required data column

In [42]:
joined_data['price'] = joined_data['quantity'] * joined_data['item_price']
joined_data[['price', 'quantity', 'item_price']].head()

    price  quantity  item_price
0  210000         1      210000
1   50000         1       50000
2  120000         1      120000
3  210000         1      210000
4  170000         2       85000

### Knock 6: Verify the data
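
One way to approach this knock is to compare the total of the recomputed `price` column with the total recorded in the transaction data. The following is only a sketch under that assumption, using made-up toy frames in place of the chapter's `joined_data` and `transaction`.

```python
import pandas as pd

# Toy transaction headers with the recorded total per transaction.
transaction = pd.DataFrame({
    "transaction_id": ["T1", "T2"],
    "price": [220000, 50000],
})
# Toy joined detail rows; price is recomputed as quantity * item_price.
joined_data = pd.DataFrame({
    "transaction_id": ["T1", "T1", "T2"],
    "quantity": [1, 2, 1],
    "item_price": [50000, 85000, 50000],
})
joined_data["price"] = joined_data["quantity"] * joined_data["item_price"]

# The two totals should agree if the joins neither lost nor duplicated rows.
total_from_details = joined_data["price"].sum()
total_from_headers = transaction["price"].sum()
totals_match = total_from_details == total_from_headers
print(total_from_details, total_from_headers, totals_match)
```

A mismatch here would point back to the union or join steps, for example to rows dropped or duplicated during a merge.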

### Knock 7: Get an overview of the summary statistics
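
A typical first pass at summary statistics is to count missing values, describe the numeric columns, and check the date range. The sketch below assumes that approach and runs on a made-up toy frame rather than the chapter's `joined_data`.

```python
import pandas as pd

# Toy stand-in for joined_data (values are made up for illustration).
joined_data = pd.DataFrame({
    "payment_date": ["2019-02-01 01:36:57", "2019-02-15 10:00:00",
                     "2019-03-02 09:30:00"],
    "quantity": [1, 2, 1],
    "price": [210000, 170000, 50000],
})

# Number of missing values in each column.
missing_per_column = joined_data.isnull().sum()

# count, mean, std, min, quartiles, and max for the numeric columns.
summary = joined_data.describe()

# ISO-formatted timestamp strings sort chronologically, so min/max give the range.
first_date = joined_data["payment_date"].min()
last_date = joined_data["payment_date"].max()

print(missing_per_column)
print(summary)
print(first_date, last_date)
```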

### Knock 8: Aggregate the data by month
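
Monthly aggregation can be sketched by converting `payment_date` to a datetime, deriving a year-month key, and grouping on it. The column name `payment_month` and the toy values below are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for joined_data (values are made up for illustration).
joined_data = pd.DataFrame({
    "payment_date": ["2019-02-01 01:36:57", "2019-02-15 10:00:00",
                     "2019-03-02 09:30:00"],
    "price": [210000, 170000, 50000],
})

# Parse the strings so that the .dt accessor becomes available.
joined_data["payment_date"] = pd.to_datetime(joined_data["payment_date"])

# Build a year-month key such as "201902" (hypothetical column name).
joined_data["payment_month"] = joined_data["payment_date"].dt.strftime("%Y%m")

# Sum sales within each month.
monthly_sales = joined_data.groupby("payment_month")["price"].sum()
print(monthly_sales)
```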

### Knock 9: Aggregate the data by month and by item
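
Aggregating by month and item can be done either with a two-key `groupby` or with `pd.pivot_table`. The sketch below shows both on made-up toy data; the `payment_month` column is assumed to have been derived from `payment_date` beforehand.

```python
import pandas as pd

# Toy stand-in for joined_data with a precomputed month key.
joined_data = pd.DataFrame({
    "payment_month": ["201902", "201902", "201902", "201903"],
    "item_name": ["PC-A", "PC-A", "PC-B", "PC-A"],
    "price": [50000, 100000, 85000, 50000],
    "quantity": [1, 2, 1, 1],
})

# Long format: one row per (month, item) pair.
by_month_item = joined_data.groupby(
    ["payment_month", "item_name"])[["price", "quantity"]].sum()

# Wide format: items as rows, months as columns; absent pairs become 0.
pivot = pd.pivot_table(
    joined_data,
    index="item_name",
    columns="payment_month",
    values=["price", "quantity"],
    aggfunc="sum",
    fill_value=0,
)

print(by_month_item)
print(pivot)
```

The wide layout is often easier to scan by eye, while the long layout is convenient for further filtering.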

### Knock 10: Visualize sales trends by item