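# Time Series Processing
* Datetime data types
* Datetime arithmetic
#### When working with a DataFrame you will often run into datetime columns. This lesson covers the common functions for processing time series data.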

In [1]:
import pandas as pd
import numpy as np


In [2]:
df=pd.read_csv('./data/stock.csv')
df.head()

Unnamed: 0,close,date,ticker
0,776.6,2015-12-29,GOOG
1,771.0,2015-12-30,GOOG
2,758.88,2015-12-31,GOOG
3,741.84,2016-01-04,GOOG
4,108.74,2015-12-29,AAPL


## Datetime data types

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   close   8 non-null      float64
 1   date    8 non-null      object 
 2   ticker  8 non-null      object 
dtypes: float64(1), object(2)
memory usage: 320.0+ bytes


In [4]:
pd.to_datetime(df['date'])

0   2015-12-29
1   2015-12-30
2   2015-12-31
3   2016-01-04
4   2015-12-29
5   2015-12-30
6   2015-12-31
7   2016-01-04
Name: date, dtype: datetime64[ns]
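
Note that `pd.to_datetime` returns a new Series and leaves `df` unchanged, so the result has to be assigned back to keep it. The sketch below uses a small made-up Series (not the stock data) to show two optional parameters, `format` and `errors`, that are often useful when the input is not perfectly clean.

```python
import pandas as pd

# Hypothetical sample values; the last entry is deliberately malformed.
raw = pd.Series(['2015-12-29', '2015-12-30', 'not a date'])

# format= states the expected layout explicitly.
# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed)
print(parsed.isna().sum())  # number of entries that failed to parse
```

For the DataFrame in this lesson, the equivalent in-place conversion would be `df['date'] = pd.to_datetime(df['date'])`.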

In [5]:
df=pd.read_csv('./data/stock.csv',parse_dates=['date'])


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   close   8 non-null      float64       
 1   date    8 non-null      datetime64[ns]
 2   ticker  8 non-null      object        
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 320.0+ bytes
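
The two routes shown so far, parsing at load time with `parse_dates` and converting afterwards with `pd.to_datetime`, should give the same column. A minimal sketch, assuming an in-memory CSV with the same column names in place of `./data/stock.csv`:

```python
import io
import pandas as pd

# Stand-in for the CSV file; only two rows are included for brevity.
csv_text = """close,date,ticker
776.6,2015-12-29,GOOG
771.0,2015-12-30,GOOG
"""

# Route 1: parse the column while reading.
parsed_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# Route 2: read as plain strings, then convert.
converted_df = pd.read_csv(io.StringIO(csv_text))
converted_df['date'] = pd.to_datetime(converted_df['date'])

same_values = (parsed_df['date'] == converted_df['date']).all()
print(same_values)
```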


In [8]:
df['date'].dt.year

0    2015
1    2015
2    2015
3    2016
4    2015
5    2015
6    2015
7    2016
Name: date, dtype: int64

In [9]:
df['date'].dt.month

0    12
1    12
2    12
3     1
4    12
5    12
6    12
7     1
Name: date, dtype: int64

In [11]:
df['date'].dt.day

0    29
1    30
2    31
3     4
4    29
5    30
6    31
7     4
Name: date, dtype: int64

In [12]:
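df['date'].dt.week  # deprecated in pandas 1.1 and removed in 2.0; newer versions need df['date'].dt.isocalendar().week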

0    53
1    53
2    53
3     1
4    53
5    53
6    53
7     1
Name: date, dtype: int64
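
The `.dt.week` accessor was deprecated in pandas 1.1 and removed in pandas 2.0. A minimal sketch of the replacement, `.dt.isocalendar()`, on two of the dates from this lesson (the variable names are illustrative):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-12-29', '2016-01-04']))

# isocalendar() returns a DataFrame with the ISO 'year', 'week' and 'day' columns.
iso = dates.dt.isocalendar()

# The columns use a nullable unsigned integer dtype; cast to int64 if a
# plain integer column is preferred (safe here because there are no missing dates).
iso_week = iso['week'].astype('int64')

print(iso)
print(iso_week)
```

The week numbers follow ISO 8601, which is why 2015-12-29 falls in week 53 of 2015 while 2016-01-04 falls in week 1 of 2016.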

In [13]:
df['date'].dt.weekday

0    1
1    2
2    3
3    0
4    1
5    2
6    3
7    0
Name: date, dtype: int64
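
The integers returned by `.dt.weekday` count from Monday = 0 to Sunday = 6. As a small illustration, `.dt.day_name()` can be placed next to them to make the mapping visible (sample dates taken from the data above, variable names illustrative):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-12-29', '2016-01-04']))

weekday_numbers = dates.dt.weekday   # Monday=0 ... Sunday=6
weekday_names = dates.dt.day_name()  # English names by default

summary = pd.DataFrame({
    'date': dates,
    'weekday': weekday_numbers,
    'name': weekday_names,
})
print(summary)
```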

In [15]:
df['date'].dt.hour

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
Name: date, dtype: int64

In [16]:
df['date'].dt.minute

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
Name: date, dtype: int64

In [17]:
df['date'].dt.second

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
Name: date, dtype: int64

In [19]:
df['date'].dt.date

0    2015-12-29
1    2015-12-30
2    2015-12-31
3    2016-01-04
4    2015-12-29
5    2015-12-30
6    2015-12-31
7    2016-01-04
Name: date, dtype: object

In [20]:
df['date'].dt.dayofyear

0    363
1    364
2    365
3      4
4    363
5    364
6    365
7      4
Name: date, dtype: int64

In [21]:
df['date'].dt.days_in_month

0    31
1    31
2    31
3    31
4    31
5    31
6    31
7    31
Name: date, dtype: int64

In [25]:
df['date'].dt.quarter

0    4
1    4
2    4
3    1
4    4
5    4
6    4
7    1
Name: date, dtype: int64

In [26]:
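df['date'].dt.weekofyear  # deprecated in pandas 1.1 and removed in 2.0; newer versions need df['date'].dt.isocalendar().week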

0    53
1    53
2    53
3     1
4    53
5    53
6    53
7     1
Name: date, dtype: int64

## Datetime arithmetic

In [27]:
import datetime


In [28]:
df['date']

0   2015-12-29
1   2015-12-30
2   2015-12-31
3   2016-01-04
4   2015-12-29
5   2015-12-30
6   2015-12-31
7   2016-01-04
Name: date, dtype: datetime64[ns]

In [39]:
df['days']=df['date'].apply(lambda x:x+datetime.timedelta(days=5))

In [30]:
df['date'].apply(lambda x:x+datetime.timedelta(days=-5))

0   2015-12-24
1   2015-12-25
2   2015-12-26
3   2015-12-30
4   2015-12-24
5   2015-12-25
6   2015-12-26
7   2015-12-30
Name: date, dtype: datetime64[ns]
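
The `apply` plus `datetime.timedelta` pattern works row by row. For fixed offsets such as days or seconds, the same result can usually be obtained by adding a `pd.Timedelta` to the whole column at once, which is shorter and avoids the Python-level loop. A minimal sketch on sample dates (variable names are illustrative):

```python
import datetime
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-12-29', '2016-01-04']))

# Row-by-row version, as used in this lesson.
via_apply = dates.apply(lambda x: x + datetime.timedelta(days=5))

# Vectorized version: the Timedelta is broadcast over the column.
vectorized = dates + pd.Timedelta(days=5)

# Subtracting works the same way.
earlier = dates - pd.Timedelta(days=5)

print(vectorized)
print(earlier)
```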

In [40]:
df['seconds']=df['date'].apply(lambda x:x+datetime.timedelta(seconds=5))

In [41]:
from dateutil.relativedelta import relativedelta
df['month']=df['date'].apply(lambda x:x+relativedelta(months=5))

In [42]:
df['year']=df['date'].apply(lambda x:x+relativedelta(years=5))
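
Months and years have no fixed length, which is why `datetime.timedelta` does not accept them and `relativedelta` is used here. pandas also ships its own calendar-aware offset, `pd.DateOffset`, which can be added to a whole column without `apply`. The sketch below is an alternative, not the method used in this lesson, and the sample dates are chosen to show how a day that does not exist in the target month is handled:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-12-29', '2015-12-31', '2016-01-31']))

# Calendar-aware shifts applied to the whole column.
plus_5_months = dates + pd.DateOffset(months=5)
plus_1_month = dates + pd.DateOffset(months=1)
plus_5_years = dates + pd.DateOffset(years=5)

print(plus_5_months)
print(plus_1_month)   # 2016-01-31 is moved to the last day of February 2016
print(plus_5_years)
```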

In [43]:
df

Unnamed: 0,close,date,ticker,days,seconds,month,year
0,776.6,2015-12-29,GOOG,2016-01-03,2015-12-29 00:00:05,2016-05-29,2020-12-29
1,771.0,2015-12-30,GOOG,2016-01-04,2015-12-30 00:00:05,2016-05-30,2020-12-30
2,758.88,2015-12-31,GOOG,2016-01-05,2015-12-31 00:00:05,2016-05-31,2020-12-31
3,741.84,2016-01-04,GOOG,2016-01-09,2016-01-04 00:00:05,2016-06-04,2021-01-04
4,108.74,2015-12-29,AAPL,2016-01-03,2015-12-29 00:00:05,2016-05-29,2020-12-29
5,107.32,2015-12-30,AAPL,2016-01-04,2015-12-30 00:00:05,2016-05-30,2020-12-30
6,105.26,2015-12-31,AAPL,2016-01-05,2015-12-31 00:00:05,2016-05-31,2020-12-31
7,105.35,2016-01-04,AAPL,2016-01-09,2016-01-04 00:00:05,2016-06-04,2021-01-04


In [44]:
df['days']-df['date']

0   5 days
1   5 days
2   5 days
3   5 days
4   5 days
5   5 days
6   5 days
7   5 days
dtype: timedelta64[ns]

In [48]:
(df['days']-df['date']).dt.total_seconds()

0    432000.0
1    432000.0
2    432000.0
3    432000.0
4    432000.0
5    432000.0
6    432000.0
7    432000.0
dtype: float64

In [47]:
(df['seconds']-df['date']).dt.total_seconds()

0    5.0
1    5.0
2    5.0
3    5.0
4    5.0
5    5.0
6    5.0
7    5.0
dtype: float64
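
Besides `.dt.total_seconds()`, a timedelta column can be expressed in any fixed unit by dividing it by a `pd.Timedelta` of that unit. A short sketch on sample dates (variable names are illustrative):

```python
import pandas as pd

start = pd.Series(pd.to_datetime(['2015-12-29', '2016-01-04']))
end = start + pd.Timedelta(days=5)

delta = end - start                        # timedelta column

in_seconds = delta.dt.total_seconds()      # float seconds
in_hours = delta / pd.Timedelta(hours=1)   # float hours
in_days = delta / pd.Timedelta(days=1)     # float days

print(in_seconds)
print(in_hours)
print(in_days)
```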

In [56]:
((df['month']-df['date']).dt.days/30).astype(int)

0    5
1    5
2    5
3    5
4    5
5    5
6    5
7    5
dtype: int32

In [63]:
((df['year']-df['date']).dt.days/365).astype(int)

0    5
1    5
2    5
3    5
4    5
5    5
6    5
7    5
dtype: int32
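
Dividing the day count by 30 or 365 is an approximation, because months and years vary in length. It happens to give 5 for this data, but it can drift over longer spans. One possible way to count whole calendar months exactly is to work from the year, month and day components, as sketched below on made-up dates (the variable names are illustrative):

```python
import pandas as pd

start = pd.Series(pd.to_datetime(['2015-12-29', '2016-01-01']))
end = pd.Series(pd.to_datetime(['2016-05-29', '2016-12-31']))

# Approximation used above: assume every month has 30 days.
approx_months = ((end - start).dt.days / 30).astype(int)

# Difference in calendar month numbers.
calendar_months = (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)

# Count only completed months: drop one if the end day-of-month
# has not yet reached the start day-of-month.
whole_months = calendar_months - (end.dt.day < start.dt.day).astype(int)

print(approx_months)
print(whole_months)
```

For the second pair (2016-01-01 to 2016-12-31) the 30-day approximation reports 12 months, while only 11 whole months have elapsed. The same component-based idea can be adapted for years.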

### Exercises

In [104]:
sp5_df = pd.read_csv(
    'sp500.csv', usecols=['date', 'adj_close'])

Convert the `date` column to a datetime type.

Using the `date` column, create year, month, and day columns.

Add 5 minutes to `date`.