Name : Gourav Verma<br>
Class : DSC530-T302<br>
Week 10: Time Series Analysis<br>
Assignment : 10.2, 12-2<br>

**Write a definition for a class named SerialCorrelationTest that extends HypothesisTest from Section 9.2. It should take a series and a lag as data, compute the serial correlation of the series with the given lag, and then compute the p-value of the observed correlation.
Use this class to test whether the serial correlation in raw price data is statistically significant. Also test the residuals of the linear model and (if you did the previous exercise), the quadratic model.**
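
As a point of reference for the exercise, the serial correlation at a given lag is the Pearson correlation between a series and a copy of itself shifted by that many steps. The sketch below illustrates the idea with plain NumPy. `serial_corr` is a hypothetical helper written for illustration, not the `thinkstats2.SerialCorr` function used in the cells that follow, and the two example series are synthetic.

```python
import numpy as np

def serial_corr(values, lag=1):
    """Pearson correlation between values[t] and values[t - lag].

    Hypothetical helper for illustration; assumes 1 <= lag < len(values).
    """
    values = np.asarray(values, dtype=float)
    current = values[lag:]     # observations from time `lag` onward
    lagged = values[:-lag]     # the same observations, `lag` steps earlier
    return np.corrcoef(current, lagged)[0, 1]

# A steady trend is almost perfectly correlated with its lagged copy,
# while a strictly alternating series is almost perfectly anti-correlated.
trend = np.arange(10)
alternating = np.array([1, -1] * 5)
print(serial_corr(trend, lag=1))
print(serial_corr(alternating, lag=1))
```

Taking the absolute value of this quantity, as the test statistic in this assignment does, makes the test two-sided: strong positive and strong negative serial correlation both count as extreme.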

In [34]:
import numpy as np
import pandas as pd

import statsmodels.formula.api as smf
import random
import regression

import thinkstats2
import thinkplot
import timeseries

In [35]:
class SerialCorrelationTest(thinkstats2.HypothesisTest):
    """Tests serial correlations by permutation."""

    def TestStatistic(self, data):
        """Computes the test statistic.

        data: tuple of (series, lag)
        """
        series, lag = data
        test_stat = abs(thinkstats2.SerialCorr(series, lag))
        return test_stat

    def RunModel(self):
        """Run the model of the null hypothesis.

        returns: simulated data
        """
        series, lag = self.data
        permutation = series.reindex(np.random.permutation(series.index))
        return permutation, lag
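
The permutation logic of the class above can also be sketched without `thinkstats2`, which may help in seeing what the p-value measures. This is a minimal illustration under stated assumptions: `serial_corr` and `permutation_pvalue` are hypothetical names, the default of 1000 iterations and the fixed seed are choices made for the example, and the p-value is taken as the fraction of shuffled series whose absolute serial correlation is at least as large as the observed one.

```python
import numpy as np

def serial_corr(values, lag=1):
    """Pearson correlation between values[t] and values[t - lag]."""
    values = np.asarray(values, dtype=float)
    return np.corrcoef(values[lag:], values[:-lag])[0, 1]

def permutation_pvalue(values, lag=1, iters=1000, seed=0):
    """Estimates a p-value for the observed serial correlation.

    Hypothetical helper: shuffling destroys any ordering in the data,
    so the shuffled series represent the null hypothesis of no serial
    correlation.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    actual = abs(serial_corr(values, lag))
    # Count shuffles whose statistic is at least as extreme as observed.
    hits = sum(
        abs(serial_corr(rng.permutation(values), lag)) >= actual
        for _ in range(iters)
    )
    return actual, hits / iters

# Synthetic example (not the price data): a random walk is strongly
# serially correlated, so few or no shuffles should match it.
walk = np.cumsum(np.random.default_rng(1).normal(size=200))
actual, pvalue = permutation_pvalue(walk)
print(actual, pvalue)
```

With a finite number of iterations, a reported p-value of 0.0 only means that no shuffle reached the observed statistic, so the true p-value is below roughly 1/iters rather than exactly zero.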

In [36]:
def RunQuadraticModel(daily):
    """Runs a quadratic model of prices versus years.

    daily: DataFrame of daily prices (a `years2` column is added in place)

    returns: model, results
    """
    daily['years2'] = daily.years**2
    model = smf.ols('ppg ~ years + years2', data=daily)
    results = model.fit()
    return model, results
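
To see what the residuals of such a model look like, it can help to fit a quadratic trend to synthetic data and inspect what is left over. The sketch below uses `np.polyfit` rather than the `smf.ols` formula above, purely to stay self-contained. The data, coefficients, and variable names are invented for illustration and are not taken from the price data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the price data: a quadratic trend plus noise.
years = np.linspace(0, 4, 200)
noise = rng.normal(scale=0.3, size=years.size)
ppg = 10.0 - 1.5 * years + 0.2 * years**2 + noise

# Least-squares fit of ppg ~ years + years**2 (highest power first).
coeffs = np.polyfit(years, ppg, deg=2)
fitted = np.polyval(coeffs, years)
residuals = ppg - fitted

# Residuals of a least-squares fit with an intercept average to ~0.
print(coeffs)
print(residuals.mean())
```

If the fitted curve captures the trend, the residuals should resemble unstructured noise, which is why testing them for serial correlation is a useful check on the model.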

In [37]:
def TestSerialCorr(daily):
    """Tests serial correlations in daily prices and their residuals.

    daily: DataFrame of daily prices
    """
    # test the correlation between consecutive prices
    series = daily.ppg
    test = SerialCorrelationTest((series, 1))
    pvalue = test.PValue()
    print('correlation between consecutive prices - ','test.actual : ', test.actual, 'pvalue : ', pvalue)

    # test for serial correlation in residuals of the linear model
    _, results = timeseries.RunLinearModel(daily)
    series = results.resid
    test = SerialCorrelationTest((series, 1))
    pvalue = test.PValue()
    print('serial correlation in residuals of the linear model - ', 'test.actual : ', test.actual, 'pvalue : ', pvalue)

    # test for serial correlation in residuals of the quadratic model
    _, results = RunQuadraticModel(daily)
    series = results.resid
    test = SerialCorrelationTest((series, 1))
    pvalue = test.PValue()
    print('serial correlation in residuals of the quadratic model - ', 'test.actual : ', test.actual, 'pvalue : ', pvalue)

In [38]:
# Reads data about cannabis transactions
transactions = timeseries.ReadData()

# map from quality name to DataFrame
dailies = timeseries.GroupByQualityAndDay(transactions)

# test the correlation between consecutive prices
name = 'high'
daily = dailies[name]

series = daily.ppg
test = SerialCorrelationTest((series, 1))
pvalue = test.PValue()
print(test.actual, pvalue)

0.485229376194738 0.0


In [39]:
# Test serial correlation in the prices and in the model residuals
TestSerialCorr(daily)

correlation between consecutive prices -  test.actual :  0.485229376194738 pvalue :  0.0
serial correlation in residuals of the linear model -  test.actual :  0.07570473767506261 pvalue :  0.012
serial correlation in residuals of the quadratic model -  test.actual :  0.05607308161289916 pvalue :  0.048
