## Time Series Analysis with Pandas

- DatetimeIndex
- Time Resampling: aggregate by year, month, week, hour, minute, or second
- Time Shifting: lag (shift) the values in time
- Rolling and Expanding: use a rolling mean to extract the general trend from noisy data
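
The three operations listed above can be sketched on a small synthetic series. This is a minimal illustration, not the notebook's own data: the series `s`, its date range, and the 7-day window are assumptions chosen so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering 2020-01-01 through 2020-02-29 (illustrative data)
idx = pd.date_range("2020-01-01", periods=60, freq="D")
s = pd.Series(np.arange(60, dtype=float), index=idx)

# Time resampling: aggregate daily values into month-start buckets
monthly_mean = s.resample("MS").mean()

# Time shifting: lag the values by one period; the first row becomes NaN
lagged = s.shift(1)

# Rolling: mean over a sliding window of 7 observations
rolling_mean = s.rolling(window=7).mean()

# Expanding: mean over all observations seen so far
expanding_mean = s.expanding().mean()

print(monthly_mean)
print(lagged.head(3))
print(rolling_mean.iloc[5:8])
print(expanding_mean.tail(1))
```

The rolling mean stays NaN until the window is full, which is why its first six entries are missing, while the expanding mean is defined from the first observation onward.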

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

In [6]:
# Create a datetime value
today=datetime(2020,10,24)
today

datetime.datetime(2020, 10, 24, 0, 0)

In [21]:
# Build a DatetimeIndex from a list of dates
dates=[datetime(2020,10,24),datetime(2020,10,25),datetime(2019,10,25)]
print("dates : ",dates)
dt_idx=pd.DatetimeIndex(dates)
print("dt_idx : ",dt_idx)

dates :  [datetime.datetime(2020, 10, 24, 0, 0), datetime.datetime(2020, 10, 25, 0, 0), datetime.datetime(2019, 10, 25, 0, 0)]
dt_idx :  DatetimeIndex(['2020-10-24', '2020-10-25', '2019-10-25'], dtype='datetime64[ns]', freq=None)
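
A `DatetimeIndex` does not have to start from `datetime` objects. As a hedged sketch, the same kind of index can be built from date strings with `pd.to_datetime`, or generated at a fixed frequency with `pd.date_range`. The variable names here are illustrative only.

```python
import pandas as pd

# Parse a list of ISO date strings; a list input yields a DatetimeIndex
idx_from_str = pd.to_datetime(["2020-10-24", "2020-10-25", "2019-10-25"])

# Generate three consecutive days starting at 2020-10-24
idx_range = pd.date_range(start="2020-10-24", periods=3, freq="D")

print(idx_from_str)
print(idx_range)
```

Unlike the hand-built list, `pd.date_range` produces an index that carries a frequency, since its values are evenly spaced by construction.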


In [27]:
# Build a DataFrame that uses the dates as its index
data=np.random.randn(3,2)
cols=["A","B"]
df=pd.DataFrame(data=data,index=dt_idx,columns=cols)
print(df)

                   A         B
2020-10-24  0.709739 -0.374505
2020-10-25  1.448306 -0.671488
2019-10-25 -0.280037 -1.144889


In [33]:
print(df.index.min()) # earliest date
print(df.index.max()) # latest date
print(df.index.day_name()) # day of the week
print(df.index.argsort()) # positions that would sort the index in ascending order

2019-10-25 00:00:00
2020-10-25 00:00:00
Index(['Saturday', 'Sunday', 'Friday'], dtype='object')
[2 0 1]
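
The positions returned by `argsort()` can be used to reorder the rows chronologically. The sketch below assumes a small frame with the same three dates (the integer values are placeholders, not the random data above) and shows that `iloc` with those positions gives the same result as `sort_index()`. Once the index is sorted, a partial date string such as `"2020"` selects all rows from that year.

```python
import numpy as np
import pandas as pd

# Same three dates as above; the values are placeholder integers
dt_idx = pd.DatetimeIndex(["2020-10-24", "2020-10-25", "2019-10-25"])
df = pd.DataFrame(np.arange(6).reshape(3, 2), index=dt_idx, columns=["A", "B"])

# Reorder rows using the positions from argsort
order = df.index.argsort()
by_position = df.iloc[order]

# Equivalent and more direct: sort by the index itself
by_sort = df.sort_index()

# Partial string indexing: all rows whose date falls in 2020
rows_2020 = by_sort.loc["2020"]

print(by_sort)
print(rows_2020)
```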


### Resampling

In [43]:
df=pd.read_csv('../data/AAPL.csv')
print(df.head())
print(df.info())
print(df.describe())

         Date        Open        High         Low       Close   Adj Close  \
0  2014-09-29  100.589996  100.690002   98.040001   99.620003   93.514290   
1  2014-10-06   99.949997  102.379997   98.309998  100.730003   94.556244   
2  2014-10-13  101.330002  101.779999   95.180000   97.669998   91.683792   
3  2014-10-20   98.320000  105.489998   98.220001  105.220001   98.771042   
4  2014-10-27  104.849998  108.040001  104.699997  108.000000  101.380676   

      Volume  
0  142718700  
1  280258200  
2  358539800  
3  358532900  
4  220230600  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184 entries, 0 to 183
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       184 non-null    object 
 1   Open       184 non-null    float64
 2   High       184 non-null    float64
 3   Low        184 non-null    float64
 4   Close      184 non-null    float64
 5   Adj Close  184 non-null    float64
 6   Volume     184 non

In [58]:
# Convert the Date column to datetime and use it as the index
df['Date']=pd.to_datetime(df['Date'])
df.set_index("Date",inplace=True)
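
The conversion and the index assignment can also be done while reading the file, using the `parse_dates` and `index_col` parameters of `pd.read_csv`. The sketch below is an alternative, not the notebook's method. To stay self-contained it reads from an in-memory string (`csv_text`, an assumption) that holds only three rows and two price columns copied from the `head()` output above.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file; rows copied from the head() output
csv_text = """Date,Open,Close
2014-09-29,100.589996,99.620003
2014-10-06,99.949997,100.730003
2014-10-13,101.330002,97.669998
"""

# Parse the Date column as datetime and make it the index in one step
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(df.index)
```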

In [66]:
# Extract month, year, and day from the index
df['month']=df.index.month
print(df['month'])
df['year']=df.index.year
print(df['year'])
df['day']=df.index.day
print(df['day'])
df

Date
2014-09-29     9
2014-10-06    10
2014-10-13    10
2014-10-20    10
2014-10-27    10
              ..
2018-03-05     3
2018-03-12     3
2018-03-19     3
2018-03-26     3
2018-03-29     3
Name: month, Length: 184, dtype: int64
Date
2014-09-29    2014
2014-10-06    2014
2014-10-13    2014
2014-10-20    2014
2014-10-27    2014
              ... 
2018-03-05    2018
2018-03-12    2018
2018-03-19    2018
2018-03-26    2018
2018-03-29    2018
Name: year, Length: 184, dtype: int64
Date
2014-09-29    29
2014-10-06     6
2014-10-13    13
2014-10-20    20
2014-10-27    27
              ..
2018-03-05     5
2018-03-12    12
2018-03-19    19
2018-03-26    26
2018-03-29    29
Name: day, Length: 184, dtype: int64


                  Open        High         Low       Close   Adj Close     Volume  month  year  day
Date
2014-09-29  100.589996  100.690002   98.040001   99.620003   93.514290  142718700      9  2014   29
2014-10-06   99.949997  102.379997   98.309998  100.730003   94.556244  280258200     10  2014    6
2014-10-13  101.330002  101.779999   95.180000   97.669998   91.683792  358539800     10  2014   13
2014-10-20   98.320000  105.489998   98.220001  105.220001   98.771042  358532900     10  2014   20
2014-10-27  104.849998  108.040001  104.699997  108.000000  101.380676  220230600     10  2014   27
...                ...         ...         ...         ...         ...        ...    ...   ...  ...
2018-03-05  175.210007  180.000000  174.270004  179.979996  179.979996  139852700      3  2018    5
2018-03-12  180.289993  183.500000  177.619995  178.020004  178.020004  155417500      3  2018   12
2018-03-19  177.320007  177.470001  164.940002  164.940002  164.940002  172670700      3  2018   19
2018-03-26  168.070007  175.149994  165.190002  167.779999  167.779999  157898400      3  2018   26
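
With a `DatetimeIndex` in place, the weekly rows can be resampled to a coarser frequency. The sketch below is a minimal example under stated assumptions: it uses only the five `Close` values shown at the top of the table, and picks the month-start alias `"MS"` and the mean as the aggregation. It also shows that grouping on the extracted `year` and `month` columns gives the same monthly means.

```python
import numpy as np
import pandas as pd

# First five weekly Close values from the table above
dates = pd.to_datetime(
    ["2014-09-29", "2014-10-06", "2014-10-13", "2014-10-20", "2014-10-27"]
)
close = [99.620003, 100.730003, 97.669998, 105.220001, 108.000000]
df = pd.DataFrame({"Close": close}, index=dates)
df.index.name = "Date"

# Resample weekly rows into month-start buckets and average them
monthly = df["Close"].resample("MS").mean()

# The same aggregation via the extracted year and month columns
df["year"] = df.index.year
df["month"] = df.index.month
grouped = df.groupby(["year", "month"])["Close"].mean()

print(monthly)
print(grouped)
```

The difference between the two is mostly in the result's index: `resample` keeps a `DatetimeIndex` (and would emit empty buckets for months with no rows), while `groupby` returns a `(year, month)` MultiIndex containing only the groups that exist.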
