<a href="https://colab.research.google.com/github/TurgutOzkan/Time_Series_Analysis/blob/master/Time_Series_1_Data_Management.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this series of posts, I will walk through time series analysis in Python. This first post covers how to get data ready for time series analysis. Time series matter whenever you analyze financial data, sensor input, or any other signal that extends over time. Each step comes with code and a brief explanation, and all of it is pretty straightforward.

In [0]:
import numpy as np
import pandas as pd
from datetime import datetime 

Timestamp and Period are the basic pandas tools for working with time series. A Timestamp marks a single point in time, while a Period represents a span of time and carries a frequency attribute, which is very helpful for changing the granularity of your data.

In [0]:
time=pd.Timestamp(datetime(2018,12,23))
print(time)
print(time.year)
print(time.day)
print(time.day_name())  # weekday_name was removed in pandas 1.0; day_name() replaces it

2018-12-23 00:00:00
2018
23
Sunday


In [0]:
period = pd.Period("2018-12") # monthly period
print(period)
print(period.asfreq("D")) #daily frequency
print(period.asfreq("H")) # hourly frequency

2018-12
2018-12-31
2018-12-31 23:00


In [0]:
time=period.to_timestamp() # defaults to the start of the period: the first day of the month at 00:00
print(time)
time.to_period("D") # converts the timestamp back to a daily period

2018-12-01 00:00:00


Period('2018-12-01', 'D')
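To make the difference between a point in time and a span of time concrete, here is a small sketch. The variable names (`stamp`, `month`, and so on) are mine and not part of the original notebook. It relies on the `how` argument of `Period.asfreq`, which chooses whether the conversion lands on the start or the end of the span; the cells above used the default, which is the end.

```python
import pandas as pd

# A Timestamp is a single instant; a Period covers a whole span of time.
stamp = pd.Timestamp("2018-12-23")
month = pd.Period("2018-12", freq="M")

# asfreq() picks the end of the span by default; how="start" picks the beginning.
month_start = month.asfreq("D", how="start")
month_end = month.asfreq("D", how="end")

# Adding an integer moves the period forward by its own frequency (one month here).
next_month = month + 1

# start_time and end_time give the boundaries of the span as Timestamps.
inside = month.start_time <= stamp <= month.end_time

print(month_start, month_end, next_month, inside)
```

Because a Period knows its own boundaries, checking whether an observation belongs to a given month needs no manual date arithmetic.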

**Converting a series to datetime format.**

In [0]:
# Let's say we want to convert a column to datetime.
z = pd.date_range(start="12-01-2018", periods=10, freq="D")  # ten consecutive daily dates (not random data)
DF = pd.DataFrame(data=z, columns=["X1"])
DF.X1 = pd.to_datetime(DF.X1)  # X1 is already datetime64 here; the same call also parses a column of date strings
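In practice, the column to convert usually holds strings rather than dates. The following is a minimal sketch under that assumption; the frame `raw`, its column names, and the deliberately invalid entry are all made up for illustration. Passing `errors="coerce"` turns unparseable entries into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw data: dates stored as text, with one invalid entry.
raw = pd.DataFrame({"date_text": ["2018-12-01", "2018-12-02", "not a date"]})

# An explicit format avoids day/month ambiguity; errors="coerce" yields NaT on failure.
raw["date"] = pd.to_datetime(raw["date_text"], format="%Y-%m-%d", errors="coerce")

print(raw.dtypes)
print(raw)
```

Whether to coerce or to let the error surface depends on how much you trust the input; coercing is convenient for exploration but can hide data quality problems.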

**Creating Time Series**

In [0]:
date_time_index=pd.date_range(start="12-01-2018", periods=100, freq="D") # creates a date-time index object
data = np.random.randn(100,2) # Create random data for the time series
DF=pd.DataFrame(data= data, index= date_time_index, columns=["X1", "X2"]) # converts to a dataframe using the date-time index 
DF.head()

                  X1        X2
2018-12-01  0.624325 -0.495021
2018-12-02 -0.099198 -0.339282
2018-12-03 -0.382629  2.048525
2018-12-04 -0.023323  0.563259
2018-12-05 -1.219090  0.528440


**Slicing the data**

In [0]:
DF.X1["2018-12-01":"2018-12-03"] # label-based slice on the date index; both end dates are included

2018-12-01    0.624325
2018-12-02   -0.099198
2018-12-03   -0.382629
Freq: D, Name: X1, dtype: float64
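Date strings also work with `.loc`, and a partial string such as a year and month selects every row inside that span. Here is a short sketch on a made-up frame (`demo` is my own name, filled with a simple counter instead of random numbers so the result is reproducible):

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 100 days starting 2018-12-01, values 0.0 .. 99.0.
idx = pd.date_range(start="2018-12-01", periods=100, freq="D")
demo = pd.DataFrame({"X1": np.arange(100.0)}, index=idx)

# Slice rows by date and pick a column in one step; both end dates are included.
first_three = demo.loc["2018-12-01":"2018-12-03", "X1"]

# A partial date string selects the whole month.
january = demo.loc["2019-01"]

print(len(first_three), len(january))
```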

**Changing the frequency of the series**

In [0]:
DF.asfreq('W').head(10) # keeps the value observed on each Sunday; nothing is aggregated

                  X1        X2
2018-12-02 -0.099198 -0.339282
2018-12-09 -0.823008 -0.240675
2018-12-16 -0.296962 -0.318172
2018-12-23  1.898519 -0.261032
2018-12-30  0.097803 -0.502945
2019-01-06 -0.095242  2.477480
2019-01-13  0.666292 -0.286388
2019-01-20 -0.400630 -0.156985
2019-01-27  1.089333 -0.872086
2019-02-03  1.309500 -3.205892


In [0]:
DF.resample("W") #creates a resampler object

DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]

In [0]:
DF.X1.resample("W").asfreq() # asfreq() on the resampler returns the value at each weekly (Sunday) timestamp, without aggregating

2018-12-02   -0.099198
2018-12-09   -0.823008
2018-12-16   -0.296962
2018-12-23    1.898519
2018-12-30    0.097803
2019-01-06   -0.095242
2019-01-13    0.666292
2019-01-20   -0.400630
2019-01-27    1.089333
2019-02-03    1.309500
2019-02-10   -0.000840
2019-02-17   -1.823047
2019-02-24    0.581872
2019-03-03   -0.353188
2019-03-10    0.689396
Freq: W-SUN, Name: X1, dtype: float64

In [0]:
DF.X1.resample("W").mean() # an aggregation such as mean() returns one value per week, here the weekly mean of X1

2018-12-02    0.262564
2018-12-09   -0.849346
2018-12-16    0.537632
2018-12-23   -0.036444
2018-12-30   -0.046397
2019-01-06   -0.133059
2019-01-13   -0.165135
2019-01-20   -0.639178
2019-01-27    0.138853
2019-02-03   -0.177281
2019-02-10    0.130841
2019-02-17   -0.270166
2019-02-24    0.308541
2019-03-03    0.285273
2019-03-10    0.659949
Freq: W-SUN, Name: X1, dtype: float64
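The difference between picking a value and aggregating is easier to see on data where the answer can be checked by hand. The sketch below uses a made-up series `s` holding the numbers 1 to 14 over two full Monday-to-Sunday weeks; the names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 14 days starting on Monday 2018-12-03, values 1.0 .. 14.0.
idx = pd.date_range(start="2018-12-03", periods=14, freq="D")
s = pd.Series(np.arange(1.0, 15.0), index=idx)

# asfreq keeps only the observation that falls on each Sunday.
picked = s.asfreq("W")

# resample(...).mean() averages all seven days of each week ending on Sunday.
averaged = s.resample("W").mean()

print(picked)
print(averaged)
```

Both results share the same Sunday index, but `picked` holds the Sunday values (7 and 14) while `averaged` holds the weekly means (4 and 11).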

**Creating Lagged Values**

In [0]:
DF["Lagged_X1"]=DF.X1.shift() # default periods is 1: each row gets the previous day's value
DF.head(10)

                  X1        X2 Lagged_X1
2018-12-01  0.624325 -0.495021       NaN
2018-12-02 -0.099198 -0.339282  0.624325
2018-12-03 -0.382629  2.048525 -0.099198
2018-12-04 -0.023323  0.563259 -0.382629
2018-12-05 -1.219090  0.528440 -0.023323
2018-12-06 -0.577899 -1.128838 -1.219090
2018-12-07 -1.938390 -0.932730 -0.577899
2018-12-08 -0.981086 -0.891412 -1.938390
2018-12-09 -0.823008 -0.240675 -0.981086
2018-12-10  0.033392 -1.169605 -0.823008


In [0]:
DF["Future_Shifted_X1"]=DF.X1.shift(periods = -1) # a negative period pulls the next day's value into each row
DF.head(10)

                  X1        X2 Lagged_X1 Future_Shifted_X1
2018-12-01  0.624325 -0.495021       NaN         -0.099198
2018-12-02 -0.099198 -0.339282  0.624325         -0.382629
2018-12-03 -0.382629  2.048525 -0.099198         -0.023323
2018-12-04 -0.023323  0.563259 -0.382629         -1.219090
2018-12-05 -1.219090  0.528440 -0.023323         -0.577899
2018-12-06 -0.577899 -1.128838 -1.219090         -1.938390
2018-12-07 -1.938390 -0.932730 -0.577899         -0.981086
2018-12-08 -0.981086 -0.891412 -1.938390         -0.823008
2018-12-09 -0.823008 -0.240675 -0.981086          0.033392
2018-12-10  0.033392 -1.169605 -0.823008          1.888944
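One detail worth knowing: `shift` moves the values and leaves the index alone, unless you pass `freq`, in which case it moves the index and leaves the values alone. The sketch below shows both on a tiny made-up series (the names `s`, `lagged`, `lead`, and `moved` are mine):

```python
import pandas as pd

# Hypothetical three-day series with easy-to-follow values.
idx = pd.date_range(start="2018-12-01", periods=3, freq="D")
s = pd.Series([10.0, 20.0, 30.0], index=idx)

lagged = s.shift(1)            # values move down one row; the first row becomes NaN
lead = s.shift(-1)             # values move up one row; the last row becomes NaN
moved = s.shift(1, freq="D")   # the index moves forward one day; no NaN is introduced

print(lagged)
print(lead)
print(moved)
```

The first two forms are what you typically want for lagged or lead features aligned to the original dates; the `freq` form is closer to relabeling the observations.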


**Calculating Differences Easily**

In [0]:
DF["X3"] = DF.X1.div(5).mul(10) # div() and mul() are the method forms of / and * for data frame columns
DF["X3"] = DF["X3"] + 5
DF["X3"] = DF["X3"].round(0) # round the column so the later operations are easier to follow
DF.head()

                  X1        X2 Lagged_X1 Future_Shifted_X1   X3
2018-12-01  0.624325 -0.495021       NaN         -0.099198  6.0
2018-12-02 -0.099198 -0.339282  0.624325         -0.382629  5.0
2018-12-03 -0.382629  2.048525 -0.099198         -0.023323  4.0
2018-12-04 -0.023323  0.563259 -0.382629         -1.219090  5.0
2018-12-05 -1.219090  0.528440 -0.023323         -0.577899  3.0


In [0]:
DF["DIFF_X3"]=DF.X3.diff()  # difference between time t and time t-1, computed row by row down the column
DF.head()

                  X1        X2 Lagged_X1 Future_Shifted_X1   X3 DIFF_X3
2018-12-01  0.624325 -0.495021       NaN         -0.099198  6.0     NaN
2018-12-02 -0.099198 -0.339282  0.624325         -0.382629  5.0    -1.0
2018-12-03 -0.382629  2.048525 -0.099198         -0.023323  4.0    -1.0
2018-12-04 -0.023323  0.563259 -0.382629         -1.219090  5.0     1.0
2018-12-05 -1.219090  0.528440 -0.023323         -0.577899  3.0    -2.0


In [0]:
DF["PERCENT_X3"]=DF.X3.pct_change().mul(100)  # similar to diff(), pct_change() gives the percentage change between adjacent rows
# Pass periods=n to compare rows further apart; for example, periods=7 compares each row with the one seven rows earlier.
DF.head()

                  X1        X2 Lagged_X1 Future_Shifted_X1   X3 DIFF_X3 PERCENT_X3
2018-12-01  0.624325 -0.495021       NaN         -0.099198  6.0     NaN        NaN
2018-12-02 -0.099198 -0.339282  0.624325         -0.382629  5.0    -1.0 -16.666667
2018-12-03 -0.382629  2.048525 -0.099198         -0.023323  4.0    -1.0 -20.000000
2018-12-04 -0.023323  0.563259 -0.382629         -1.219090  5.0     1.0  25.000000
2018-12-05 -1.219090  0.528440 -0.023323         -0.577899  3.0    -2.0 -40.000000
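Both `diff` and `pct_change` are shortcuts for arithmetic against a shifted copy of the column. The sketch below checks that on a made-up series that repeats the first five X3 values shown above; the names `by_hand_diff` and `by_hand_pct` are mine.

```python
import pandas as pd

# Hypothetical series reusing the first five X3 values from the table above.
idx = pd.date_range(start="2018-12-01", periods=5, freq="D")
s = pd.Series([6.0, 5.0, 4.0, 5.0, 3.0], index=idx)

diff_1 = s.diff()                        # value at t minus value at t-1
by_hand_diff = s - s.shift(1)            # the same thing written with shift()

pct_1 = s.pct_change().mul(100)          # percentage change from t-1 to t
by_hand_pct = (s / s.shift(1) - 1) * 100

diff_2 = s.diff(periods=2)               # value at t minus value at t-2

print(diff_1)
print(pct_1)
print(diff_2)
```

With `periods=2`, the first two rows have nothing to compare against and come out as NaN, so longer look-backs cost you more rows at the start of the series.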


**Window Functions**

Moving averages are useful for a number of purposes, such as smoothing out short-term fluctuations, dampening the effect of outliers, or emphasizing long-term trends.

**(1) Rolling Averages For Time Series**

In [0]:
DF.rolling(window='2D').mean() # each value is the mean over a two-day window ending on that date (the current day and the day before)

                  X1        X2 Lagged_X1 Future_Shifted_X1   X3 DIFF_X3 PERCENT_X3
2018-12-01  0.624325 -0.495021       NaN         -0.099198  6.0     NaN        NaN
2018-12-02  0.262564 -0.417152  0.624325         -0.240913  5.5    -1.0 -16.666667
2018-12-03 -0.240913  0.854621  0.262564         -0.202976  4.5    -1.0 -18.333333
2018-12-04 -0.202976  1.305892 -0.240913         -0.621207  4.5     0.0   2.500000
2018-12-05 -0.621207  0.545850 -0.202976         -0.898495  4.0    -0.5  -7.500000
2018-12-06 -0.898495 -0.300199 -0.621207         -1.258144  3.5    -0.5  -3.333333
2018-12-07 -1.258144 -1.030784 -0.898495         -1.459738  2.5    -1.0 -20.833333
2018-12-08 -1.459738 -0.912071 -1.258144         -0.902047  2.0    -0.5  62.500000
2018-12-09 -0.902047 -0.566043 -1.459738         -0.394808  3.0     1.0 100.000000
2018-12-10 -0.394808 -0.705140 -0.902047          0.961168  4.0     1.0  33.333333
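The first row above is not NaN because a time-based window such as `'2D'` accepts any window that contains at least one observation. An integer window behaves differently: by default it needs a full window before it produces a value. The sketch below compares the two on a small made-up series (all names are illustrative):

```python
import pandas as pd

# Hypothetical five-day series with values 1.0 .. 5.0.
idx = pd.date_range(start="2018-12-01", periods=5, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

by_offset = s.rolling(window="2D").mean()             # time-based window; min_periods defaults to 1
by_count = s.rolling(window=2).mean()                 # fixed two-row window; the first value is NaN
relaxed = s.rolling(window=2, min_periods=1).mean()   # integer window that tolerates partial windows

print(by_offset)
print(by_count)
print(relaxed)
```

On regularly spaced daily data, the time-based window and the relaxed integer window agree. They diverge once dates are missing, because the time-based window looks at the calendar while the integer window counts rows.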


**(2) Expanding Window Functions**

In [0]:
DF.X1.expanding().sum().head() # cumulative sum

2018-12-01    0.624325
2018-12-02    0.525127
2018-12-03    0.142499
2018-12-04    0.119175
2018-12-05   -1.099915
Freq: D, Name: X1, dtype: float64

In [0]:
DF.X1.cumsum().head() # a different method for cumulative sum

2018-12-01    0.624325
2018-12-02    0.525127
2018-12-03    0.142499
2018-12-04    0.119175
2018-12-05   -1.099915
Freq: D, Name: X1, dtype: float64
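An expanding window is not limited to sums: any aggregation can run over "everything seen so far". The sketch below computes a running mean and a running maximum on a made-up series and checks them against hand-built equivalents (`by_hand_mean` is my own name):

```python
import numpy as np
import pandas as pd

# Hypothetical four-day series with values 2, 4, 6, 8.
idx = pd.date_range(start="2018-12-01", periods=4, freq="D")
s = pd.Series([2.0, 4.0, 6.0, 8.0], index=idx)

running_mean = s.expanding().mean()                     # mean of all values up to each date
by_hand_mean = s.cumsum() / np.arange(1, len(s) + 1)    # cumulative sum divided by the running count

running_max = s.expanding().max()                       # largest value seen so far

print(running_mean)
print(running_max)
```

For the sum and the maximum, the dedicated `cumsum` and `cummax` methods give the same result; `expanding()` is the more general tool when no dedicated cumulative method exists.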

That's it. In a later post, I will discuss various approaches to analyzing time series data using Python. Stay tuned!