# Time Series

### Python's native datetime types
The official Python documentation (https://docs.python.org/3/library/datetime.html) lists what every format code means; search that page for "strftime() and strptime() Format Codes".

In [20]:
from datetime import datetime

In [4]:
datetime(year=2015,month=7,day=4)

datetime.datetime(2015, 7, 4, 0, 0)

In [5]:
dt = datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M")
dt

datetime.datetime(2006, 11, 21, 16, 30)

In [7]:
# Find which day of the week this date falls on
dt.strftime('%A')

'Tuesday'
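
As a small illustration of the format codes mentioned above, the sketch below parses a string and formats it back out with a few other codes. The date is arbitrary. `%A` and `%B` are locale-dependent, so the commented outputs assume the default C/English locale.

```python
from datetime import datetime

# Parse a string using explicit format codes:
# %Y = 4-digit year, %m = zero-padded month, %d = zero-padded day
dt = datetime.strptime("2015-07-04", "%Y-%m-%d")

# Format it back out with other codes (assumes the default C/English locale):
# %A = full weekday name, %B = full month name, %j = day of the year
label = dt.strftime("%A, %B %d, %Y")
day_of_year = dt.strftime("%j")
print(label)        # Saturday, July 04, 2015
print(day_of_year)  # 185
```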

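### NumPy's datetime64
NumPy's `datetime64` dtype encodes dates as 64-bit integers, so date arithmetic can take advantage of NumPy's fast, vectorized operations.  
Syntax: `np.datetime64('YYYY-MM-DD hh:mm:ss.ms', 'unit')`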

In [21]:
import numpy as np
date = np.array('2015-07-09' , dtype=np.datetime64)
date

array('2015-07-09', dtype='datetime64[D]')

In [10]:
# By default, adding integers adds days (the array's unit is D)
date + np.arange(12)

array(['2015-07-09', '2015-07-10', '2015-07-11', '2015-07-12',
       '2015-07-13', '2015-07-14', '2015-07-15', '2015-07-16',
       '2015-07-17', '2015-07-18', '2015-07-19', '2015-07-20'],
      dtype='datetime64[D]')

In [17]:
np.datetime64('2020-11-10 12:59:30.80' , 'ns')

numpy.datetime64('2020-11-10T12:59:30.800000000')
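
To make the "64-bit integer" encoding concrete, here is a minimal sketch with illustrative dates. It shows that a `datetime64` value is a count of units since the Unix epoch, and that subtracting two values with different units yields a `timedelta64` in the finer unit.

```python
import numpy as np

# A datetime64 is stored as an integer count of units since 1970-01-01;
# here the unit is inferred from the string as days ('D').
d = np.datetime64('1970-01-11')
print(d.dtype)            # datetime64[D]
print(d.astype('int64'))  # 10

# Units are inferred from the string's precision.
noon = np.datetime64('2015-07-04 12:00')   # unit: minutes ('m')
midnight = np.datetime64('2015-07-04')     # unit: days ('D')
delta = noon - midnight                    # result uses the finer unit
print(delta)                               # 720 minutes
print(delta == np.timedelta64(12, 'h'))    # True
```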

### Dates and times in Pandas
pandas uses the `Timestamp` object and the `DatetimeIndex` so that times can serve as the index of a DataFrame or Series.  

In [1]:
# to_datetime()
import pandas as pd
date = pd.to_datetime("4th of July , 2015")
date

Timestamp('2015-07-04 00:00:00')

In [22]:
date.strftime('%A')

'Saturday'

In [23]:
# Vectorized operation: add 0 to 11 days
date + pd.to_timedelta( np.arange(12) , 'D' ) 

DatetimeIndex(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
               '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
              dtype='datetime64[ns]', freq=None)

In [27]:
# Vectorized operation: add 0 to 11 weeks
date + pd.to_timedelta( np.arange(12) , 'W' ) 

DatetimeIndex(['2015-07-04', '2015-07-11', '2015-07-18', '2015-07-25',
               '2015-08-01', '2015-08-08', '2015-08-15', '2015-08-22',
               '2015-08-29', '2015-09-05', '2015-09-12', '2015-09-19'],
              dtype='datetime64[ns]', freq=None)

In [31]:
# Build a Series with a time index
index = pd.DatetimeIndex(['2014-07-04','2015-07-04','2016-07-04','2017-07-04'])
data = pd.Series([0,1,2,3] , index=index)
data

2014-07-04    0
2015-07-04    1
2016-07-04    2
2017-07-04    3
dtype: int64

In [33]:
# Slice by time
data['2014-07-04':]

2014-07-04    0
2015-07-04    1
2016-07-04    2
2017-07-04    3
dtype: int64

In [35]:
data['2017']

2017-07-04    3
dtype: int64
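
A few more ways to select by time are sketched below on the same illustrative Series. `.loc` makes label-based slicing explicit, and a partial string such as a bare year matches every timestamp inside that period. The `DatetimeIndex` also exposes date components such as `.year`, which can drive a boolean mask.

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2015-07-04', '2016-07-04', '2017-07-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# Label slice with partial strings: both endpoints are included,
# so '2016' covers every timestamp up to the end of 2016.
mid = data.loc['2015':'2016']
print(mid)

# Boolean mask built from a date component of the index
recent = data[data.index.year >= 2016]
print(recent)
```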

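### Pandas time data structures
Time stamps (built on NumPy's `datetime64`) => index structure: `DatetimeIndex`  
Time periods (fixed-frequency intervals based on NumPy's `datetime64`) => index structure: `PeriodIndex`  
Time deltas (durations based on NumPy's `timedelta64`, a more efficient replacement for Python's native `datetime.timedelta`) => index structure: `TimedeltaIndex`  

`DatetimeIndex` and `PeriodIndex` are the most commonly used of these. The most commonly used function is `pd.to_datetime()`, which accepts a list of dates written in a wide variety of formats.    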


In [5]:
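# Mixed input types and formats are all parsed into one DatetimeIndex.
# '07-10-2017' is read month-first (July 10); pass dayfirst=True for day-first input.
# On pandas >= 2.0, mixing formats like this may warn or fail unless format='mixed' is passed.
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July , 2020', '2020-Jul-6', '07-10-2017', '20220531'])
dates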

DatetimeIndex(['2015-07-03', '2020-07-04', '2020-07-06', '2017-07-10',
               '2022-05-31'],
              dtype='datetime64[ns]', freq=None)
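
When the input format is known, or some entries may be invalid, `pd.to_datetime()` can be told how to parse. The sketch below uses illustrative strings to show three commonly used parameters: `format`, `dayfirst`, and `errors='coerce'`.

```python
import pandas as pd

# An explicit format parses '20150704' without any guessing
a = pd.to_datetime('20150704', format='%Y%m%d')

# dayfirst=True reads '04/07/2015' as 4 July rather than April 7
b = pd.to_datetime('04/07/2015', dayfirst=True)

# errors='coerce' turns unparseable entries into NaT instead of raising
c = pd.to_datetime(['2015-07-04', 'not a date'], errors='coerce')
print(a, b, c, sep='\n')
```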

In [6]:
# Any DatetimeIndex can be converted to a PeriodIndex with to_period(), given a frequency code
dates.to_period('D')

PeriodIndex(['2015-07-03', '2020-07-04', '2020-07-06', '2017-07-10',
             '2022-05-31'],
            dtype='period[D]')

In [7]:
# Subtracting dates produces a TimedeltaIndex
dates - dates[0]

TimedeltaIndex(['0 days', '1828 days', '1830 days', '738 days', '2524 days'], dtype='timedelta64[ns]', freq=None)
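
Each of the three index types above has a scalar counterpart. The minimal sketch below uses arbitrary values to show how they interact: a `Timedelta` shifts a `Timestamp`, and a `Period` covers a whole span and steps in units of its frequency.

```python
import pandas as pd

stamp = pd.Timestamp('2015-07-04')        # a point in time
period = pd.Period('2015-07', freq='M')   # the whole month of July 2015
delta = pd.Timedelta(days=1, hours=6)     # a duration

print(stamp + delta)       # 2015-07-05 06:00:00
print(period.start_time)   # 2015-07-01 00:00:00
print(period + 1)          # 2015-08 (period arithmetic is in units of freq)
```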

### Regular sequences
> pd.date_range(): creates a sequence of dates (time stamps)  
> pd.period_range(): creates periods  
> pd.timedelta_range(): creates time deltas  

>|Concept |Scalar class |Array class |pandas data type |Primary creation method|
>|----|----|----|----|----|
>|Date times |`Timestamp` |`DatetimeIndex` |`datetime64[ns]` or `datetime64[ns, tz]` |`to_datetime` or `date_range`|
>|Time deltas |`Timedelta` |`TimedeltaIndex` |`timedelta64[ns]` |`to_timedelta` or `timedelta_range`|
>|Time spans |`Period` |`PeriodIndex` |`period[freq]` |`Period` or `period_range`|
>|Date offsets |`DateOffset` |None |None |`DateOffset`|
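
The last row of the table, `DateOffset`, differs from `Timedelta` in a way that is easy to miss. A `DateOffset` follows calendar rules, while a `Timedelta` is an absolute duration. The sketch below starts from an arbitrarily chosen month-end date to show the difference.

```python
import pandas as pd

start = pd.Timestamp('2015-01-31')

# Calendar-aware: one month after Jan 31 is clipped to the last day of February
by_offset = start + pd.DateOffset(months=1)
# Absolute: exactly 30 days later
by_delta = start + pd.Timedelta(days=30)

print(by_offset)  # 2015-02-28 00:00:00
print(by_delta)   # 2015-03-02 00:00:00
```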

In [8]:
# Given a start date and an end date
# Note that the end date is included
pd.date_range('2015-07-03' , '2015-07-12')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
               '2015-07-11', '2015-07-12'],
              dtype='datetime64[ns]', freq='D')

In [9]:
# Given a start date and a number of periods
pd.date_range('2015-07-03' , periods=10)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
               '2015-07-11', '2015-07-12'],
              dtype='datetime64[ns]', freq='D')

In [11]:
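# Given a start date, a number of periods, and a frequency (the default is daily)
# 'H' means hourly; pandas >= 2.2 prefers the lowercase 'h'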
pd.date_range('2015-07-03' , periods=10 , freq='H')


DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00',
               '2015-07-03 08:00:00', '2015-07-03 09:00:00'],
              dtype='datetime64[ns]', freq='H')

In [12]:
# Build monthly (year-month) periods
pd.period_range('2015-07' , periods=10 , freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02', '2016-03', '2016-04'],
            dtype='period[M]')
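
Periods can be converted back to timestamps or re-expressed at another frequency. The sketch below reuses monthly periods like the ones above. `to_timestamp()` defaults to the start of each period, and `asfreq(..., how='end')` anchors each month at its last day.

```python
import pandas as pd

months = pd.period_range('2015-07', periods=3, freq='M')

starts = months.to_timestamp()              # start of each month
last_days = months.asfreq('D', how='end')   # each month as its last day

print(starts)
print(last_days)
```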

In [17]:
# A timedelta sequence that grows by 1 hour per step
pd.timedelta_range(0 , periods=10 , freq='H')

TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                '0 days 09:00:00'],
               dtype='timedelta64[ns]', freq='H')

In [18]:
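# A timedelta sequence that grows by 2.5 hours (2 hours 30 minutes) per step
# pandas >= 2.2 prefers the spelling '2h30min'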
pd.timedelta_range(0 , periods=10 , freq='2H30T')

TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00',
                '0 days 22:30:00'],
               dtype='timedelta64[ns]', freq='150T')
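
A `TimedeltaIndex` is handy for building schedules relative to a start time. The sketch below uses an arbitrary start time. It spells the 2.5-hour step as `'150min'`, which avoids the single-letter aliases that newer pandas versions deprecate, then adds the steps to a `Timestamp` to get a `DatetimeIndex`.

```python
import pandas as pd

steps = pd.timedelta_range(0, periods=4, freq='150min')

# Timestamp + TimedeltaIndex -> DatetimeIndex
schedule = pd.Timestamp('2015-07-04 08:00') + steps
print(schedule)

# Each Timedelta can report its length in seconds
print(steps[1].total_seconds())  # 9000.0
```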

### Frequency and offset codes
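| Code | Date offset | Description |
|:---:|---|---|
| D | Day | Calendar day |
| B | BusinessDay | Business day |
| H | Hour | Hour |
| T or min | Minute | Minute |
| S | Second | Second |
| L or ms | Milli | Millisecond |
| U | Micro | Microsecond |
| M | MonthEnd | Last calendar day of each month |
| BM | BusinessMonthEnd | Last business day of each month |
| MS | MonthBegin | First calendar day of each month |
| BMS | BusinessMonthBegin | First business day of each month |
| W-MON, W-TUE, … | Week | Weekly, anchored on the given weekday (MON, TUE, WED, THU, FRI, SAT, SUN) |
| WOM-1MON, WOM-2MON, … | WeekOfMonth | A given weekday in the first, second, third, or fourth week of each month; e.g., WOM-3FRI is the third Friday of each month |
| Q-JAN, Q-FEB, … | QuarterEnd | Last calendar day of each quarter, for a year ending in the indicated month |
| BQ-JAN, BQ-FEB, … | BusinessQuarterEnd | Last business day of each quarter, for a year ending in the indicated month |
| QS-JAN, QS-FEB, … | QuarterBegin | First calendar day of each quarter, with quarters anchored on the indicated month |
| BQS-JAN, BQS-FEB, … | BusinessQuarterBegin | First business day of each quarter, with quarters anchored on the indicated month |
| A-JAN, A-FEB, … | YearEnd | Once a year, on the last calendar day of the indicated month |
| BA-JAN, BA-FEB, … | BusinessYearEnd | Once a year, on the last business day of the indicated month |
| AS-JAN, AS-FEB, … | YearBegin | Once a year, on the first calendar day of the indicated month |
| BAS-JAN, BAS-FEB, … | BusinessYearBegin | Once a year, on the first business day of the indicated month |

Note: pandas 2.2 deprecated several of these aliases in favor of new spellings, for example `H` → `h`, `T` → `min`, `M` → `ME`, `BM` → `BME`, and `Q` → `QE`.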
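
A few of the anchored codes from the table are shown below with an arbitrary start date. When the start date does not fall on the anchor, `date_range` rolls it forward to the next anchor. The codes used here (`WOM-3FRI`, `W-MON`, `MS`) are not among those deprecated in pandas 2.2.

```python
import pandas as pd

# Third Friday of each month
third_fridays = pd.date_range('2015-07-01', periods=3, freq='WOM-3FRI')
# Weekly, anchored on Mondays (2015-07-01 is a Wednesday)
mondays = pd.date_range('2015-07-01', periods=2, freq='W-MON')
# First calendar day of each month; Jul 15 rolls forward to Aug 1
month_starts = pd.date_range('2015-07-15', periods=3, freq='MS')

print(third_fridays, mondays, month_starts, sep='\n')
```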


In [23]:
# All time-related offset classes can be found in pd.tseries.offsets
from pandas.tseries.offsets import BDay
pd.date_range('2015-07-01' , periods=10 , freq=BDay())

DatetimeIndex(['2015-07-01', '2015-07-02', '2015-07-03', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
               '2015-07-13', '2015-07-14'],
              dtype='datetime64[ns]', freq='B')
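
Offset objects from `pandas.tseries.offsets` can also be added directly to individual timestamps, as in this sketch with arbitrary dates. Note that `BDay` skips weekends but not holidays. An anchored offset such as `MonthEnd` moves a date that is already on the anchor to the next anchor.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

friday = pd.Timestamp('2015-07-03')

print(friday + BDay(1))     # 2015-07-06: skips the weekend
print(friday + MonthEnd())  # 2015-07-31: rolls forward to the month end

# A date already on the anchor moves to the next anchor
print(pd.Timestamp('2015-07-31') + MonthEnd())  # 2015-08-31
```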

### Resampling and shifting

In [24]:
ts = pd.Series(np.random.randint(10, 100, 1000), index=pd.date_range('1/1/2018', periods=1000))
print(f'ts :\n{ts}\n\n'
      f"ts['2018'] :\n{ts['2018']}\n\n"
      f"ts['2018-05'][:] :\n{ts['2018-05'][:]}\n\n"
      f"ts.truncate(after='1/9/2018') :\n{ts.truncate(after='1/9/2018')}\n\n"
      f"ts.truncate(before='1/9/2018') :\n{ts.truncate(before='1/9/2018')}")

ts :
2018-01-01    46
2018-01-02    94
2018-01-03    12
2018-01-04    96
2018-01-05    80
              ..
2020-09-22    83
2020-09-23    80
2020-09-24    28
2020-09-25    15
2020-09-26    72
Freq: D, Length: 1000, dtype: int32

ts['2018'] :
2018-01-01    46
2018-01-02    94
2018-01-03    12
2018-01-04    96
2018-01-05    80
              ..
2018-12-27    38
2018-12-28    41
2018-12-29    81
2018-12-30    11
2018-12-31    35
Freq: D, Length: 365, dtype: int32

ts['2018-05'][:] :
2018-05-01    44
2018-05-02    45
2018-05-03    57
2018-05-04    24
2018-05-05    63
2018-05-06    40
2018-05-07    99
2018-05-08    37
2018-05-09    83
2018-05-10    90
2018-05-11    26
2018-05-12    36
2018-05-13    42
2018-05-14    30
2018-05-15    94
2018-05-16    25
2018-05-17    63
2018-05-18    24
2018-05-19    13
2018-05-20    38
2018-05-21    22
2018-05-22    83
2018-05-23    99
2018-05-24    21
2018-05-25    63
2018-05-26    22
2018-05-27    96
2018-05-28    22
2018-05-29    84
2018-05-30    78
... (output truncated)
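
Because the Series above is random, its printed values cannot be checked. The sketch below repeats the same selection methods on deterministic values (0 through 9, chosen for illustration). It shows that `truncate()` bounds and `.loc` label slices both include their endpoints.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(10), index=pd.date_range('2018-01-01', periods=10))

head = ts.truncate(after='2018-01-03')      # up to and including Jan 3
tail = ts.truncate(before='2018-01-09')     # from Jan 9 onward
window = ts.loc['2018-01-05':'2018-01-06']  # label slice, both ends included
print(head, tail, window, sep='\n\n')
```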

In [25]:
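# Sum the daily values per business month (each bin is labelled by its last business day).
# Other rules to try: '2D' (every 2 days) or 'M' (calendar month end); pandas >= 2.2 spells 'BM' as 'BME'.
resampled = ts.resample(rule='BM').sum()
print(f'{ts[:10]}\n\n'
      f'{resampled[:10]}\n')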

2018-01-01    46
2018-01-02    94
2018-01-03    12
2018-01-04    96
2018-01-05    80
2018-01-06    18
2018-01-07    40
2018-01-08    51
2018-01-09    79
2018-01-10    32
Freq: D, dtype: int32

2018-01-31    1618
2018-02-28    1607
2018-03-30    1441
2018-04-30    1448
2018-05-31    1649
2018-06-29    1641
2018-07-31    1689
2018-08-31    1768
2018-09-28    1549
2018-10-31    1807
Freq: BM, dtype: int32
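
The same mechanism works for any bin size and aggregation. The sketch below uses six deterministic daily values (chosen for illustration) and downsamples them into 2-day bins. By default each bin is labelled by its left edge, and the resampler also supports other aggregations such as `mean()`.

```python
import pandas as pd

s = pd.Series(range(6), index=pd.date_range('2018-01-01', periods=6, freq='D'))

two_day = s.resample('2D').sum()
print(two_day)
# 2018-01-01    1
# 2018-01-03    5
# 2018-01-05    9

print(s.resample('2D').mean())  # 0.5, 2.5, 4.5
```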



In [26]:
# shift() moves the data relative to the index
df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
df

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 2020-01-01 | 10 | 13 | 17 |
| 2020-01-02 | 20 | 23 | 27 |
| 2020-01-03 | 15 | 18 | 22 |
| 2020-01-04 | 30 | 33 | 37 |
| 2020-01-05 | 45 | 48 | 52 |


In [28]:
# After shifting, the first rows have no data, so fill_value fills them in
df.shift(periods=3, fill_value=0)

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 2020-01-01 | 0 | 0 | 0 |
| 2020-01-02 | 0 | 0 | 0 |
| 2020-01-03 | 0 | 0 | 0 |
| 2020-01-04 | 10 | 13 | 17 |
| 2020-01-05 | 20 | 23 | 27 |


In [29]:
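# tshift() shifted the index instead of the data; it is deprecated,
# so use shift() with freq instead, which moves the index and leaves the data in place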
df.shift(periods=3, freq="D")

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 2020-01-04 | 10 | 13 | 17 |
| 2020-01-05 | 20 | 23 | 27 |
| 2020-01-06 | 15 | 18 | 22 |
| 2020-01-07 | 30 | 33 | 37 |
| 2020-01-08 | 45 | 48 | 52 |


In [31]:
# Shift along the columns instead of the rows
df.shift(periods=1, axis="columns")

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 2020-01-01 | NaN | 10 | 13 |
| 2020-01-02 | NaN | 20 | 23 |
| 2020-01-03 | NaN | 15 | 18 |
| 2020-01-04 | NaN | 30 | 33 |
| 2020-01-05 | NaN | 45 | 48 |
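
A common practical use of `shift()` is comparing each row with the previous one. The sketch below reuses `Col1` from the example above. Subtracting the shifted column gives the day-over-day change, and `diff()` is the built-in shorthand for the same computation.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))

# Change from the previous day; the first row has no predecessor (NaN)
change = df["Col1"] - df["Col1"].shift(1)
# diff() computes the same thing
assert change.equals(df["Col1"].diff())
print(change)
```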
