### Time-series problems are essentially regression problems
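
One way to read this claim: predicting the next value from past values is a regression whose features are lagged copies of the series. The following is a minimal sketch under that reading, using only NumPy. The helper name `make_lag_matrix`, the simulated series, and the choice of a single lag are illustrative assumptions, not part of the original notes.

```python
import numpy as np


def make_lag_matrix(series, n_lags):
    """Turn a 1-D series into (X, y): each row of X holds the n_lags
    values that precede the corresponding target in y (lag 1 first)."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack(
        [series[n_lags - k - 1 : len(series) - k - 1] for k in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


# Simulated AR(1) series (assumed for illustration):
# y_t = 1 + 0.6 * y_{t-1} + noise
rng = np.random.default_rng(0)
n = 500
y_sim = np.zeros(n)
for t in range(1, n):
    y_sim[t] = 1.0 + 0.6 * y_sim[t - 1] + rng.normal(scale=0.5)

X, y = make_lag_matrix(y_sim, n_lags=1)

# Ordinary least squares with an intercept column
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, phi = coef
print(f"intercept={intercept:.3f}, lag-1 coefficient={phi:.3f}")
```

The fitted lag-1 coefficient should land near the value used in the simulation. Using more lags, or adding other columns to `X`, keeps the same regression framing.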

## Traditional time-series modeling

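The ARIMA model extends the ARMA model. ARMA can only model stationary data, whereas ARIMA first differences the data and fits the model once the differenced series is stationary. Both models can handle only fairly simple problems, mainly for two reasons:

- ARMA/ARIMA are ultimately simple linear models, so the complexity of the problems they can represent is limited;

- ARMA stands for autoregressive moving average. It only supports regression on the history of a single variable and cannot handle the multivariate case.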
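
For reference, the two model classes can be written in standard textbook notation. The symbols below follow the usual conventions and are not taken from the original notes. An ARMA(p, q) process for a single series $X_t$ is

$$
X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},
$$

where $\varepsilon_t$ is white noise. ARIMA(p, d, q) fits the same equation to the series differenced $d$ times,

$$
Y_t = (1 - B)^d X_t, \qquad B X_t = X_{t-1},
$$

so for $d = 1$ the modeled series is $Y_t = X_t - X_{t-1}$. Both equations are linear in the past values and past shocks of one variable, which is where the two limitations listed here come from.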

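### Theory
#### Basic financial time-series concepts
An introduction to basic financial time-series concepts and the ARMA model:
https://zhuanlan.zhihu.com/p/38320827

The more advanced ARCH and GARCH models:
https://zhuanlan.zhihu.com/p/21962996

- ARMA model in practice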

In [2]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import tushare as ts
import datetime
from statsmodels.graphics.api import qqplot
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
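# Note: statsmodels.tsa.arima_model.ARMA was deprecated in statsmodels 0.12 and is no longer
# usable from 0.13 on; there, statsmodels.tsa.arima.model.ARIMA with order=(p, 0, q) fits ARMA(p, q).
from statsmodels.tsa.arima_model import ARMA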

In [16]:
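# pro = ts.pro_api('<YOUR_TUSHARE_TOKEN>')  # keep real API tokens out of shared notebooks
# ALL_DATA=pro.cn_gdp()

# Load the quarterly GDP data from a local GBK-encoded CSV
gdp_data = pd.read_csv('C:/Users/Lenovo/Downloads/gdp.csv', encoding='gbk')
# The '季度' (quarter) column starts with the four-digit year; extract it
gdp_data['year'] = gdp_data['季度'].apply(lambda x: str(x)[:4])
# '实际GDP' (real GDP) is stored as text with thousands separators; convert it to float
gdp_data['实际GDP'] = gdp_data['实际GDP'].apply(lambda x: float(x.replace(',', '')))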

In [24]:
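# Sum the quarterly values within each year; selecting the column before sum()
# avoids aggregating the non-numeric '季度' column
gdp_data_year = gdp_data.groupby('year', as_index=False)['实际GDP'].sum()
gdp_data_year = gdp_data_year.rename(columns={'实际GDP': 'GDP'})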
gdp_data_year

In [30]:
gdp_data_year = gdp_data_year.set_index('year')
gdp_series = gdp_data_year['GDP']
gdp_series

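|     | year | GDP     |
|-----|------|---------|
| 0   | 1948 | 8079.9  |
| 1   | 1949 | 8035.8  |
| 2   | 1950 | 8736.0  |
| 3   | 1951 | 9439.9  |
| 4   | 1952 | 9824.5  |
| ... | ...  | ...     |
| 62  | 2010 | 59135.2 |
| 63  | 2011 | 60082.3 |
| 64  | 2012 | 61476.7 |
| 65  | 2013 | 62841.1 |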


In [34]:
dftest = adfuller(gdp_series, autolag='AIC')
dftest

(1.8404645366781154,
 0.9984259839819806,
 1,
 65,
 {'1%': -3.5352168748293127,
  '5%': -2.9071540828402367,
  '10%': -2.5911025443786984},
 872.4317205460368)

In [None]:
def test_stationarity(timeseries):
    # Rolling mean and standard deviation over a 4-period window
    rolmean = timeseries.rolling(4).mean()
    rolstd = timeseries.rolling(4).std()

    # Plot the rolling statistics against the original series
    plt.figure(figsize=(24, 8))
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')

    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # ADF test: if the test statistic is below the critical value, the unit-root
    # null hypothesis is rejected and the series is stationary (the p-value can also be used)
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used',
                                             'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)