# Time Series Data

This notebook covers the basics of working with time series data, using 10 years of historical daily prices for the Campbell Soup Company (ticker CPB). The data was obtained from Yahoo Finance (https://finance.yahoo.com/quote/CPB/history?p=CPB) on December 19, 2019.

In [1]:
import pandas as pd

In [2]:
cpb = pd.read_csv('price_data/CPB.csv')
cpb.head()

,Date,Open,High,Low,Close,Adj Close,Volume
0,2009-12-21,33.119999,33.48,33.029999,33.470001,24.608217,1952800
1,2009-12-22,33.549999,33.799999,33.389999,33.779999,24.836138,1715700
2,2009-12-23,33.68,34.369999,33.630001,34.34,25.247873,2565000
3,2009-12-24,34.220001,34.52,34.119999,34.509998,25.372852,1043700
4,2009-12-28,34.400002,34.400002,34.080002,34.27,25.398802,1247800


In [3]:
cpb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2516 entries, 0 to 2515
Data columns (total 7 columns):
Date         2516 non-null object
Open         2516 non-null float64
High         2516 non-null float64
Low          2516 non-null float64
Close        2516 non-null float64
Adj Close    2516 non-null float64
Volume       2516 non-null int64
dtypes: float64(5), int64(1), object(1)
memory usage: 137.7+ KB


In [4]:
type(cpb.Date[0])

str

In [5]:
# Convert 'Date' column values to datetime data type
cpb.Date = pd.to_datetime(cpb.Date)

In [6]:
type(cpb.Date[0])

pandas._libs.tslibs.timestamps.Timestamp

In [7]:
# Set the Date column as the index
cpb.set_index('Date', inplace=True)

In [8]:
cpb.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2516 entries, 2009-12-21 to 2019-12-18
Data columns (total 6 columns):
Open         2516 non-null float64
High         2516 non-null float64
Low          2516 non-null float64
Close        2516 non-null float64
Adj Close    2516 non-null float64
Volume       2516 non-null int64
dtypes: float64(5), int64(1)
memory usage: 137.6 KB
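
The conversion and indexing steps above can also be done while reading the file. The following is a minimal sketch, assuming the CSV has the columns shown earlier. The inline `sample_csv` string is a stand-in built from the first three rows displayed above, so the example runs without the file, and the names `sample` and `first_two_days` are illustrative.

```python
import io
import pandas as pd

# Stand-in for price_data/CPB.csv: the first three rows shown above.
sample_csv = """Date,Open,High,Low,Close,Adj Close,Volume
2009-12-21,33.119999,33.48,33.029999,33.470001,24.608217,1952800
2009-12-22,33.549999,33.799999,33.389999,33.779999,24.836138,1715700
2009-12-23,33.68,34.369999,33.630001,34.34,25.247873,2565000
"""

# parse_dates converts the column to datetime; index_col makes it the index.
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=['Date'], index_col='Date')

# A DatetimeIndex supports label-based date slicing (both ends inclusive).
first_two_days = sample.loc['2009-12-21':'2009-12-22']
print(first_two_days)
```

Having the dates as the index is also what makes the resampling calls in the next section possible, since `resample` groups rows by their timestamps.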


## Resampling

### Downsampling

Downsampling converts the data to a less frequent interval (here, daily to monthly) by summarizing each month with the mean of its daily values.

In [13]:
# Monthly resampling: 'MS' groups rows by calendar month, labeled by the month's first day
cpb_monthly = cpb.resample('MS')
cpb_mean = cpb_monthly.mean()
cpb_mean.head()

Date,Open,High,Low,Close,Adj Close,Volume
2009-12-01,33.92125,34.195,33.71375,34.05125,25.1362,1545450.0
2010-01-01,33.236316,33.434736,32.89579,33.182105,24.592518,2430147.0
2010-02-01,33.217895,33.528947,32.974736,33.294737,24.675993,2486111.0
2010-03-01,34.599565,34.833479,34.440435,34.705218,25.730214,1924700.0
2010-04-01,35.533334,35.748572,35.363333,35.574286,26.570531,1884467.0
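
The mean is only one way to summarize each month. As a sketch of an alternative (not what the cell above does), each column can get its own aggregation: the first open, the highest high, the lowest low, the last close, and the total volume. The `prices` frame below is made-up illustrative data, not CPB prices.

```python
import pandas as pd

# Made-up daily data spanning two months (illustrative values only).
dates = pd.to_datetime(['2020-01-30', '2020-01-31', '2020-02-03', '2020-02-04'])
prices = pd.DataFrame(
    {
        'Open': [10.0, 11.0, 12.0, 13.0],
        'High': [10.5, 11.5, 12.5, 13.5],
        'Low': [9.5, 10.5, 11.5, 12.5],
        'Close': [10.2, 11.2, 12.2, 13.2],
        'Volume': [100, 200, 300, 400],
    },
    index=dates,
)

# One aggregation per column; 'MS' labels each bin with the month's first day.
monthly = prices.resample('MS').agg(
    {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}
)
print(monthly)
```

Which aggregation is appropriate depends on the question being asked: a mean smooths out daily noise, while a sum is usually the more natural monthly figure for a count such as volume.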


### Upsampling

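Upsampling converts the data to a more frequent interval (here, daily to every 12 hours). The newly created rows have no observations, so they start out as missing values and are then forward filled with the most recent known values.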

In [14]:
# '12h' is the lowercase spelling of '12H', which pandas 2.2+ deprecates
cpb_bidaily = cpb.resample('12h').asfreq()
cpb_bidaily.head()

Date,Open,High,Low,Close,Adj Close,Volume
2009-12-21 00:00:00,33.119999,33.48,33.029999,33.470001,24.608217,1952800.0
2009-12-21 12:00:00,,,,,,
2009-12-22 00:00:00,33.549999,33.799999,33.389999,33.779999,24.836138,1715700.0
2009-12-22 12:00:00,,,,,,
2009-12-23 00:00:00,33.68,34.369999,33.630001,34.34,25.247873,2565000.0
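
Note that `Volume` is displayed as a float in the output above. Missing values are represented as NaN, which an integer column cannot hold, so pandas converts the column to float. A small sketch of that effect on made-up data (the frame `df` and its values are illustrative, and the frequency is written as `'12h'`, the lowercase spelling that pandas 2.2 and later prefer):

```python
import pandas as pd

# Made-up two-day frame with an integer column (illustrative values only).
df = pd.DataFrame(
    {'Volume': [100, 200]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02']),
)

as_freq = df.resample('12h').asfreq()  # the new 12:00 row is NaN
f_filled = df.resample('12h').ffill()  # the new 12:00 row repeats the previous value

print(as_freq['Volume'].dtype)   # float, because of the NaN
print(f_filled['Volume'].dtype)  # still integer, no NaN was introduced
```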


In [15]:
# Forward fill the data ('12h' replaces the deprecated '12H' alias)

cpb_bidaily_ffill = cpb.resample('12h').ffill()
cpb_bidaily_ffill.head()

Date,Open,High,Low,Close,Adj Close,Volume
2009-12-21 00:00:00,33.119999,33.48,33.029999,33.470001,24.608217,1952800
2009-12-21 12:00:00,33.119999,33.48,33.029999,33.470001,24.608217,1952800
2009-12-22 00:00:00,33.549999,33.799999,33.389999,33.779999,24.836138,1715700
2009-12-22 12:00:00,33.549999,33.799999,33.389999,33.779999,24.836138,1715700
2009-12-23 00:00:00,33.68,34.369999,33.630001,34.34,25.247873,2565000
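
Because trading data has no rows for weekends or holidays, resampling to a regular 12-hour grid also creates slots for those days, and forward fill carries the last trading day's values across them. The sketch below shows that effect, along with the `limit` argument that caps how many consecutive empty slots are filled. The `close` series is made-up data for a Friday and the following Monday, and the frequency is written as `'12h'`, the lowercase spelling that pandas 2.2 and later prefer.

```python
import pandas as pd

# Made-up closes for Friday 2020-01-03 and Monday 2020-01-06.
close = pd.Series(
    [10.0, 11.0],
    index=pd.to_datetime(['2020-01-03', '2020-01-06']),
)

# Unlimited forward fill: Friday's value is repeated through the weekend.
filled = close.resample('12h').ffill()

# limit=1 fills only the first empty slot after each observation;
# the remaining weekend slots stay NaN.
capped = close.resample('12h').ffill(limit=1)

print(filled)
print(capped)
```

Whether filled weekend rows are acceptable depends on the analysis; leaving them as NaN or dropping them keeps non-trading periods from looking like real observations.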
