## Python + pandas: working with dates, times, and time series

## 1. datetime

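- datetime format codes
    * `%Y`: four-digit year
    * `%y`: two-digit year
    * `%m`: two-digit month (01 to 12)
    * `%d`: two-digit day of the month (01 to 31)
    * `%H`: hour, 24-hour clock (00 to 23)
    * `%I`: hour, 12-hour clock (01 to 12)
    * `%M`: two-digit minute (00 to 59)
    * `%S`: second (00 to 59; the `datetime` module does not support leap seconds, although the `time` module accepts values up to 61)
    * `%w`: weekday as an integer, from 0 (Sunday) to 6 (Saturday)
    * `%F`: shorthand for `%Y-%m-%d`, e.g. 2017-06-27 (provided by the platform C library, so not available everywhere)
    * `%D`: shorthand for `%m/%d/%y` (also platform-dependent)
- `timedelta(days=1, hours=1, minutes=1, seconds=1)`: a duration that can be added to or subtracted from a datetime;
- `datetime_stamp.strftime('%Y-%m-%d')`: formats a datetime as a string;
- `datetime.strptime(str, '%Y-%m-%d')`: parses a str into a datetime;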
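As a quick illustration of the format codes above, here is a minimal sketch that formats one timestamp and parses it back. The value of `stamp` is made up for this example and does not come from the notebook.

```python
from datetime import datetime

# Hypothetical timestamp chosen for illustration: Tuesday, 27 June 2017, 15:04:05
stamp = datetime(2017, 6, 27, 15, 4, 5)

iso_like = stamp.strftime('%Y-%m-%d %H:%M:%S')  # 24-hour clock
hour_12 = stamp.strftime('%I')                  # 12-hour clock, zero-padded
weekday = stamp.strftime('%w')                  # 0 is Sunday, so Tuesday is 2

print(iso_like)  # 2017-06-27 15:04:05
print(hour_12)   # 03
print(weekday)   # 2

# strptime reverses strftime when given the same format string
round_trip = datetime.strptime(iso_like, '%Y-%m-%d %H:%M:%S')
print(round_trip == stamp)  # True
```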

In [1]:
from datetime import datetime,timedelta

In [2]:
# Get the current time
now = datetime.now()
now

datetime.datetime(2018, 7, 14, 11, 10, 21, 874278)

In [7]:
# Subtracting two datetimes gives a timedelta
delta = now - datetime(2017,6,27,10,10,10,10)

print(f"delta \t\t\t {delta}")
print(f"delta.seconds \t\t {delta.seconds}")
print(f"delta.microseconds \t {delta.microseconds}")

display(
    delta,
    delta.seconds,
    delta.microseconds,
)

delta 			 382 days, 1:00:11.874268
delta.seconds 		 3611
delta.microseconds 	 874268


datetime.timedelta(382, 3611, 874268)

3611

874268
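Note that `delta.seconds` holds only the seconds component left over after whole days are removed, not the full duration. The sketch below reuses the two timestamps from the cell above (with `now` hard-coded to the value shown in the output) to contrast it with `total_seconds()`.

```python
from datetime import datetime

# Same two timestamps as in the cell above, with "now" fixed for reproducibility
now = datetime(2018, 7, 14, 11, 10, 21, 874278)
delta = now - datetime(2017, 6, 27, 10, 10, 10, 10)

print(delta.days)             # 382
print(delta.seconds)          # 3611  (1 h 0 min 11 s, excluding the 382 days)
print(delta.total_seconds())  # 33008411.874268  (days included)
```

When a full duration in seconds is needed, `total_seconds()` is usually the value to use.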

In [10]:
# Convert a datetime to a string
stamp = datetime(2017,6,27)

display(
    stamp,
    str(stamp),
)

datetime.datetime(2017, 6, 27, 0, 0)

'2017-06-27 00:00:00'

In [5]:
# Format the datetime as a string
stamp.strftime('%Y-%m-%d')  # %Y is the four-digit year, %y the two-digit year

'2017-06-27'

In [6]:
# Parse several date strings into datetime objects
date = ['2017-6-26','2017-6-27']
datetime2 = [datetime.strptime(x,'%Y-%m-%d') for x in date]
datetime2

[datetime.datetime(2017, 6, 26, 0, 0), datetime.datetime(2017, 6, 27, 0, 0)]

In [16]:
datetime2[0]

datetime.datetime(2017, 6, 26, 0, 0)

In [18]:
# Using timedelta
aDay = timedelta(days=1,hours=1,minutes=1,seconds=1)
date1 = datetime2[0] + aDay
date1

datetime.datetime(2017, 6, 27, 1, 1, 1)

## 2. dateutil.parser [third-party library]: date and time parsing

In [12]:
from dateutil.parser import parse

In [13]:
parse('2017-6-27')   # str => datetime

datetime.datetime(2017, 6, 27, 0, 0)

In [14]:
parse('27/6/2017',dayfirst =True)

datetime.datetime(2017, 6, 27, 0, 0)
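The `dayfirst` flag matters most when the string is ambiguous. A small sketch, using a made-up date string in which both numbers could be a month:

```python
from datetime import datetime
from dateutil.parser import parse

# '6/7/2017' is ambiguous: both 6 and 7 are valid months
month_first = parse('6/7/2017')                # default: month comes first
day_first = parse('6/7/2017', dayfirst=True)   # day comes first

print(month_first)  # 2017-06-07 00:00:00
print(day_first)    # 2017-07-06 00:00:00
```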

## 3. pandas

In [10]:
import pandas as pd

### 3.1 Converting to datetime

In [11]:
# Convert a list of date strings to a DatetimeIndex
date = ['2017-6-26', '2017-6-27']
date = pd.to_datetime(date)
date[0], date[1]

(Timestamp('2017-06-26 00:00:00'), Timestamp('2017-06-27 00:00:00'))
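Real data often contains values that cannot be parsed. As a hedged sketch (the input list is invented for this example), `pd.to_datetime` can be given an explicit `format` and `errors='coerce'` so that unparseable entries become `NaT` instead of raising:

```python
import pandas as pd

# Hypothetical input with one invalid entry
raw = ['2017-06-26', '2017-06-27', 'not a date']

# errors='coerce' turns values that do not match the format into NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed[0])           # 2017-06-26 00:00:00
print(pd.isna(parsed[2]))  # True
```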

### 3.2 Time series basics and date/time handling

In [18]:
import numpy as np
import pandas as pd

#### 3.2.1 A simple time series

In [19]:
dates = ['2017-06-20','2017-06-21',
         '2017-06-22','2017-06-23',
         '2017-06-24','2017-06-25',
         '2017-06-26','2017-06-27']

In [20]:
ts = pd.Series(np.random.randn(8),
               index = pd.to_datetime(dates))

In [21]:
ts

2017-06-20    0.095857
2017-06-21    0.832284
2017-06-22   -1.566338
2017-06-23    1.335867
2017-06-24    1.765087
2017-06-25    1.010140
2017-06-26   -1.215724
2017-06-27   -0.106063
dtype: float64

In [16]:
ts.index

DatetimeIndex(['2017-06-20', '2017-06-21', '2017-06-22', '2017-06-23',
               '2017-06-24', '2017-06-25', '2017-06-26', '2017-06-27'],
              dtype='datetime64[ns]', freq=None)

#### 3.2.2 Arithmetic between time series with different indexes

In [22]:
ts[::2]

2017-06-20    0.095857
2017-06-22   -1.566338
2017-06-24    1.765087
2017-06-26   -1.215724
dtype: float64

In [23]:
ts[::-2]

2017-06-27   -0.106063
2017-06-25    1.010140
2017-06-23    1.335867
2017-06-21    0.832284
dtype: float64

In [25]:
ts + ts[::2]  # aligned on the union of both indexes; dates missing from either side become NaN

2017-06-20    0.191713
2017-06-21         NaN
2017-06-22   -3.132675
2017-06-23         NaN
2017-06-24    3.530175
2017-06-25         NaN
2017-06-26   -2.431449
2017-06-27         NaN
dtype: float64
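If the `NaN` values produced by index alignment are not wanted, `Series.add` accepts a `fill_value`. The sketch below uses deterministic values (0.0 to 7.0) rather than the random data above, so the results can be checked by hand.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random series used above
ts = pd.Series(np.arange(8.0), index=pd.date_range('2017-06-20', periods=8))
every_other = ts.iloc[::2]

plain = ts + every_other                     # unmatched dates become NaN
filled = ts.add(every_other, fill_value=0)   # missing side is treated as 0

print(plain.isna().sum())  # 4
print(filled.tolist())     # [0.0, 1.0, 4.0, 3.0, 8.0, 5.0, 12.0, 7.0]
```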

#### 3.2.3 Indexing, selecting, and subsetting date-indexed Series and DataFrames

In [26]:
ts

2017-06-20    0.095857
2017-06-21    0.832284
2017-06-22   -1.566338
2017-06-23    1.335867
2017-06-24    1.765087
2017-06-25    1.010140
2017-06-26   -1.215724
2017-06-27   -0.106063
dtype: float64

In [29]:
# Ways to select a value by index (use .iloc for positional access)
ts[ts.index[2]], ts.iloc[2], ts['2017-06-22'], ts['20170622']

(-1.566337541054539,
 -1.566337541054539,
 -1.566337541054539,
 -1.566337541054539)

In [30]:
ts['2017-06']  # selects all of June 2017; ts['201706'] raises an error

2017-06-20    0.095857
2017-06-21    0.832284
2017-06-22   -1.566338
2017-06-23    1.335867
2017-06-24    1.765087
2017-06-25    1.010140
2017-06-26   -1.215724
2017-06-27   -0.106063
dtype: float64

In [31]:
ts['2017-06-20':'2017-06-23']  # both endpoints are included

2017-06-20    0.095857
2017-06-21    0.832284
2017-06-22   -1.566338
2017-06-23    1.335867
dtype: float64
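The same selections can be written with `.loc`, which also works for the rows of a DataFrame. A minimal sketch with made-up integer values; `truncate` is shown as an alternative way to cut a series at a date:

```python
import pandas as pd

# Hypothetical data: integers 0..7 on the same eight dates as above
ts = pd.Series(range(8), index=pd.date_range('2017-06-20', periods=8))
df = pd.DataFrame({'value': range(8)}, index=ts.index)

window = ts.loc['2017-06-21':'2017-06-23']     # inclusive on both ends
tail = ts.truncate(before='2017-06-26')        # drop everything before this date
df_window = df.loc['2017-06-21':'2017-06-23']  # row selection on a DataFrame

print(window.tolist())  # [1, 2, 3]
print(tail.tolist())    # [6, 7]
print(len(df_window))   # 3
```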

#### 3.2.4 Time series with duplicate index values

In [32]:
dates = pd.DatetimeIndex(['2017/06/01','2017/06/02','2017/06/02','2017/06/02','2017/06/03'])
dates

DatetimeIndex(['2017-06-01', '2017-06-02', '2017-06-02', '2017-06-02',
               '2017-06-03'],
              dtype='datetime64[ns]', freq=None)

In [34]:
dup_ts = pd.Series(np.arange(5),index = dates)
dup_ts

2017-06-01    0
2017-06-02    1
2017-06-02    2
2017-06-02    3
2017-06-03    4
dtype: int32

In [36]:
# Check whether the index values are unique
dup_ts.index.is_unique

False

In [37]:
dup_ts['2017-06-02']

2017-06-02    1
2017-06-02    2
2017-06-02    3
dtype: int32

In [38]:
# Aggregate the duplicated timestamps
grouped = dup_ts.groupby(level=0).mean()
grouped

2017-06-01    0
2017-06-02    2
2017-06-03    4
dtype: int32

In [39]:
# DataFrame
dup_df = pd.DataFrame(np.arange(10).reshape((5,2)),index = dates )
dup_df

| | 0 | 1 |
|---|---|---|
| 2017-06-01 | 0 | 1 |
| 2017-06-02 | 2 | 3 |
| 2017-06-02 | 4 | 5 |
| 2017-06-02 | 6 | 7 |
| 2017-06-03 | 8 | 9 |


In [40]:
# Aggregate
grouped = dup_df.groupby(level=0).mean()
grouped

| | 0 | 1 |
|---|---|---|
| 2017-06-01 | 0 | 1 |
| 2017-06-02 | 4 | 5 |
| 2017-06-03 | 8 | 9 |
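Averaging is only one way to resolve duplicate timestamps. As a sketch under the same setup, `Index.duplicated` can be used to keep just the first or last observation per timestamp, and `groupby(level=0).size()` shows how many rows each timestamp has:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['2017/06/01', '2017/06/02', '2017/06/02',
                          '2017/06/02', '2017/06/03'])
dup_ts = pd.Series(np.arange(5), index=dates)

# Boolean mask: True for every repeat after the one being kept
first = dup_ts[~dup_ts.index.duplicated(keep='first')]
last = dup_ts[~dup_ts.index.duplicated(keep='last')]
counts = dup_ts.groupby(level=0).size()

print(first.tolist())   # [0, 1, 4]
print(last.tolist())    # [0, 3, 4]
print(counts.tolist())  # [1, 3, 1]
```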


In [41]:
pd.Timestamp('2017-06-01 00:00:00')

Timestamp('2017-06-01 00:00:00')

### 3.3 Date ranges, frequencies, resampling, and frequency conversion

#### 3.3.1 Generating a date range

In [42]:
pd.date_range('2017-6-20','2017-6-27')

DatetimeIndex(['2017-06-20', '2017-06-21', '2017-06-22', '2017-06-23',
               '2017-06-24', '2017-06-25', '2017-06-26', '2017-06-27'],
              dtype='datetime64[ns]', freq='D')

In [43]:
pd.date_range('2017-6-20 12:59:30','2017-6-27')

DatetimeIndex(['2017-06-20 12:59:30', '2017-06-21 12:59:30',
               '2017-06-22 12:59:30', '2017-06-23 12:59:30',
               '2017-06-24 12:59:30', '2017-06-25 12:59:30',
               '2017-06-26 12:59:30'],
              dtype='datetime64[ns]', freq='D')

In [44]:
pd.date_range('2017-6-20 12:59:30',periods = 8)

DatetimeIndex(['2017-06-20 12:59:30', '2017-06-21 12:59:30',
               '2017-06-22 12:59:30', '2017-06-23 12:59:30',
               '2017-06-24 12:59:30', '2017-06-25 12:59:30',
               '2017-06-26 12:59:30', '2017-06-27 12:59:30'],
              dtype='datetime64[ns]', freq='D')

In [45]:
pd.date_range('2017-6-20 12:59:30',periods = 8, normalize = True)

DatetimeIndex(['2017-06-20', '2017-06-21', '2017-06-22', '2017-06-23',
               '2017-06-24', '2017-06-25', '2017-06-26', '2017-06-27'],
              dtype='datetime64[ns]', freq='D')
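`date_range` can also count backwards from an end date, or skip weekends with the business-day frequency `'B'`. A short sketch; the specific dates are chosen for illustration (2017-06-23 is a Friday):

```python
import pandas as pd

# Three daily timestamps ending on 2017-06-27
last_three = pd.date_range(end='2017-06-27', periods=3)

# Three business days starting on a Friday: the weekend is skipped
business = pd.date_range('2017-06-23', periods=3, freq='B')

print(last_three.strftime('%Y-%m-%d').tolist())
# ['2017-06-25', '2017-06-26', '2017-06-27']
print(business.strftime('%Y-%m-%d').tolist())
# ['2017-06-23', '2017-06-26', '2017-06-27']
```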

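#### 3.3.2 Frequencies and date offsets
A pandas frequency consists of a base frequency and an optional multiplier. The base frequency can be given as a string alias (such as `M`, `H`, `h`, or `min`) or as an offset object (such as `Hour` or `Minute`). Recent pandas releases deprecate the uppercase `H` and the month-end alias `M` in favor of `h` and `ME`.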

In [46]:
pd.date_range('2017-6-27',periods = 7,freq = '1h30min')

DatetimeIndex(['2017-06-27 00:00:00', '2017-06-27 01:30:00',
               '2017-06-27 03:00:00', '2017-06-27 04:30:00',
               '2017-06-27 06:00:00', '2017-06-27 07:30:00',
               '2017-06-27 09:00:00'],
              dtype='datetime64[ns]', freq='90T')

In [47]:
pd.date_range('2017-6-27',periods = 7,freq = 'd')

DatetimeIndex(['2017-06-27', '2017-06-28', '2017-06-29', '2017-06-30',
               '2017-07-01', '2017-07-02', '2017-07-03'],
              dtype='datetime64[ns]', freq='D')

In [48]:
pd.date_range('2017-6-27',periods = 7,freq = 'H')

DatetimeIndex(['2017-06-27 00:00:00', '2017-06-27 01:00:00',
               '2017-06-27 02:00:00', '2017-06-27 03:00:00',
               '2017-06-27 04:00:00', '2017-06-27 05:00:00',
               '2017-06-27 06:00:00'],
              dtype='datetime64[ns]', freq='H')

In [49]:
# Week of month: the third Friday of each month
pd.date_range('2017-06-01','2017-07-31',freq='WOM-3FRI')

DatetimeIndex(['2017-06-16', '2017-07-21'], dtype='datetime64[ns]', freq='WOM-3FRI')
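The string aliases used above have object equivalents in `pandas.tseries.offsets`. The following sketch, written for illustration rather than taken from the notebook, builds the same 90-minute step from `Hour` and `Minute` objects and adds a `MonthEnd` offset to a timestamp:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute, MonthEnd

# Fixed-length offsets can be combined; this is equivalent to '1h30min'
step = Hour(1) + Minute(30)
idx = pd.date_range('2017-06-27', periods=3, freq=step)

# Offsets can also be added to a Timestamp: roll forward to the month end
month_end = pd.Timestamp('2017-06-27') + MonthEnd()

print(idx[1])     # 2017-06-27 01:30:00
print(idx[2])     # 2017-06-27 03:00:00
print(month_end)  # 2017-06-30 00:00:00
```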

## 4. Resampling and frequency conversion

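Downsampling: converting higher-frequency data to a lower frequency

Upsampling: converting lower-frequency data to a higher frequency

Main function: resample()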

#### Parameters of the resample method

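- `rule` (listed as `freq` in some references): the resampling frequency, e.g. `'M'`, `'5min'`, or `Second(15)`
- `how='mean'`: name of the aggregation function or an array function, e.g. `'mean'`, `'ohlc'`, `np.max`; the default was `'mean'`, and other common values were `'first'`, `'last'`, `'median'`, `'max'`, `'min'`. This argument exists only in old pandas releases; in current pandas, call the aggregation on the resampler instead, e.g. `.resample('5min').mean()`
- `axis=0`: the axis to resample along; the default is the row axis, and `axis=1` selects the columns (deprecated in recent releases)
- `fill_method=None`: how to fill when upsampling, e.g. `'ffill'` or `'bfill'`. Also limited to old releases; in current pandas, call `.ffill()` or `.bfill()` on the resampler
- `closed`: when downsampling, which end of each interval is closed, `'right'` or `'left'`. In current pandas the default is `'left'` for most frequencies and `'right'` for month-end, quarter-end, year-end, and weekly frequencies
- `label`: when downsampling, which bin edge labels the aggregated value; e.g. the 9:30 to 9:35 interval can be labeled 9:30 or 9:35. The default follows the same rule as `closed`, so a `'5min'` bin is labeled 9:30
- `loffset=None`: time adjustment applied to the bin labels, e.g. `'-1s'` or `Second(-1)` to shift the labels one second earlier (removed in pandas 2.0; adjust the result's index instead)
- `limit=None`: maximum number of periods to fill when forward or backward filling; in current pandas it is passed to the fill method, e.g. `.ffill(limit=2)`
- `kind=None`: aggregate to periods (`'period'`) or timestamps (`'timestamp'`); defaults to the type of the time series index (deprecated in recent releases)
- `convention`: when resampling periods, the convention (`'start'` or `'end'`) used to convert low frequency to high frequency; the default is `'start'`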
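To make the `closed` and `label` options concrete, here is a minimal sketch that downsamples twelve one-minute values (invented for this example) into five-minute bins, then upsamples a two-day series with forward fill. It uses the current calling style, where the aggregation is a method on the resampler.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute data: values 0..11 starting at 09:30
idx = pd.date_range('2017-06-27 09:30', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=idx)

# Downsampling: bins closed and labeled on the left, e.g. [09:30, 09:35) -> 09:30
left = ts.resample('5min').sum()

# Bins closed and labeled on the right, e.g. (09:30, 09:35] -> 09:35
right = ts.resample('5min', closed='right', label='right').sum()

print(left.tolist())   # [10, 35, 21]
print(right.tolist())  # [0, 15, 40, 11]

# Upsampling: daily values to 12-hour steps, forward-filling the new slots
daily = pd.Series([1.0, 2.0], index=pd.to_datetime(['2017-06-27', '2017-06-28']))
upsampled = daily.resample('12h').ffill()

print(upsampled.tolist())  # [1.0, 1.0, 2.0]
```

With right-closed bins, the 09:30 observation falls into the interval ending at 09:30, which is why that result has one more bin than the left-closed version.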