## A time series is a collection of data points indexed, listed or graphed in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Thus it is a sequence of discrete-time data.

# Working with time series in Python

### These days, time series make up a large share of the data our economy is built on, whether it's yearly sales figures or the price of a stock since its listing on the stock market. Studying time series helps us find trends, turning points, seasonal trends, weekly trends, etc. Python lets us do this with libraries such as NumPy and pandas.


### We will be studying Microsoft Corporation (MSFT) stock prices over a five-year period, from December 2016 to December 2021.

## Preparing the data

### Whether your data comes from a database, a .csv file, or an API, you can organize it with Python’s pandas library.

In [15]:

import pandas as pd

# Loading the data with the read_csv method from pandas
data = pd.read_csv("../DATASETS_STOCKS_PRICE/MSFT.csv", sep=',')

# Having a look at the column names, non-null counts and dtypes with the info() method
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       1258 non-null   object 
 1   Open       1258 non-null   float64
 2   High       1258 non-null   float64
 3   Low        1258 non-null   float64
 4   Close      1258 non-null   float64
 5   Adj Close  1258 non-null   float64
 6   Volume     1258 non-null   int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 68.9+ KB


### Once your data is loaded into pandas, check the data type of the time-based column. If the values are strings, convert the column to pandas datetime objects. Here the Date column was a string column (dtype `object`) in the format “year-month-day.” I used pandas’ to_datetime function to convert it, then set it as the index of the dataframe:

In [20]:
data.Date = pd.to_datetime(data.Date)

data = data.set_index('Date')

data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1258 entries, 2016-12-13 to 2021-12-10
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       1258 non-null   float64
 1   High       1258 non-null   float64
 2   Low        1258 non-null   float64
 3   Close      1258 non-null   float64
 4   Adj Close  1258 non-null   float64
 5   Volume     1258 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 68.8 KB
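
As a possible shortcut, the load and the conversion can be combined, since `read_csv` accepts `parse_dates` and `index_col` arguments. The sketch below is only an illustration: it inlines three rows of the MSFT data into a string (the `csv_text` and `sample` names are made up for this example) so that it runs without the original file.

```python
import io

import pandas as pd

# Three rows of the MSFT data, inlined so the sketch is self-contained.
# With the real file, pass its path to read_csv instead of the StringIO object.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2016-12-13,62.500000,63.419998,62.240002,62.980000,58.557426,35718868
2016-12-14,63.000000,63.450001,62.529999,62.680000,58.278496,30352654
2016-12-15,62.700001,63.153500,62.299999,62.580002,58.185524,27669868
"""

# parse_dates converts the Date strings to datetimes while loading;
# index_col makes that column the index, giving a DatetimeIndex directly.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

sample.info()
```

Whether to convert at load time or afterwards is mostly a matter of taste; converting afterwards with `to_datetime` makes the step more visible when the date format needs checking first.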


## Resampling the time intervals

### You may have several dataframes recorded at different time intervals, in which case you need to resample the data to a common frequency. Let’s say you want to merge a dataframe of daily data with a dataframe of monthly data. The daily data has to be downsampled, that is, grouped by month so that both dataframes use the same intervals. Each group is then reduced to a single value with an aggregate, usually the mean of the more frequent data. Both steps can be done in one line with the .resample() and .mean() methods:

In [21]:
data['Adj Close'].resample('MS').mean()

Date
2016-12-01     58.600345
2017-01-01     58.754541
2017-02-01     59.802722
2017-03-01     60.653525
2017-04-01     61.897882
                 ...    
2021-08-01    293.475840
2021-09-01    296.332142
2021-10-01    302.988065
2021-11-01    335.521590
2021-12-01    331.787502
Freq: MS, Name: Adj Close, Length: 61, dtype: float64
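
To make the merge scenario described above concrete, here is a minimal sketch. It does not use the MSFT file: the `daily` and `monthly` dataframes and their `price` and `indicator` columns are hypothetical, generated only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data on business days (a stand-in for daily prices)
days = pd.date_range("2021-01-01", "2021-03-31", freq="B")
daily = pd.DataFrame({"price": np.linspace(100.0, 130.0, len(days))}, index=days)

# Hypothetical monthly data, stamped on the first day of each month
months = pd.date_range("2021-01-01", periods=3, freq="MS")
monthly = pd.DataFrame({"indicator": [1.0, 2.0, 3.0]}, index=months)

# Downsample: group the daily rows by month (labeled at month start)
# and reduce each group to its mean
daily_by_month = daily.resample("MS").mean()

# Both dataframes now share the same month-start index and can be joined
merged = daily_by_month.join(monthly, how="inner")
print(merged)
```

The mean is only one possible aggregate. For prices, the last value of the month (`.last()`) or the maximum (`.max()`) may be more appropriate depending on the question being asked.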

### The MS alias stands for "month start": the data is resampled monthly and each group is labeled with the first day of its month. If the original values are more frequent than the target frequency, this is called downsampling. If the original values are less frequent than the target frequency, this is called upsampling, and null values are added wherever new timestamps are created. I upsampled the values from daily prices to twice daily. In our stock price scenario pandas will assign the original price to the midnight timestamp and a null value to the noon timestamp:

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2016-12-13 | 62.500000 | 63.419998 | 62.240002 | 62.980000 | 58.557426 | 35718868 |
| 2016-12-14 | 63.000000 | 63.450001 | 62.529999 | 62.680000 | 58.278496 | 30352654 |
| 2016-12-15 | 62.700001 | 63.153500 | 62.299999 | 62.580002 | 58.185524 | 27669868 |
| 2016-12-16 | 62.950001 | 62.950001 | 62.115002 | 62.299999 | 57.925182 | 42453083 |
| 2016-12-19 | 62.560001 | 63.770000 | 62.419998 | 63.619999 | 59.152485 | 34338219 |
| ... | ... | ... | ... | ... | ... | ... |
| 2017-05-02 | 69.709999 | 69.709999 | 69.129997 | 69.300003 | 64.824242 | 23906119 |
| 2017-05-03 | 69.379997 | 69.379997 | 68.709999 | 69.080002 | 64.618462 | 28927973 |
| 2017-05-04 | 69.029999 | 69.080002 | 68.639999 | 68.809998 | 64.365891 | 21749409 |
| 2017-05-05 | 69.900002 | 69.029999 | 68.485001 | 69.000000 | 64.543625 | 19128782 |
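
The upsampling call itself is not shown in the text, so the following is only a sketch of one way it could be written, assuming a 12-hour target frequency and `.asfreq()` to leave the new timestamps empty. It uses the first three Adj Close values from the data; the `prices`, `upsampled` and `filled` names are illustrative.

```python
import pandas as pd

# First three daily Adj Close values from the MSFT data
idx = pd.to_datetime(["2016-12-13", "2016-12-14", "2016-12-15"])
prices = pd.Series([58.557426, 58.278496, 58.185524], index=idx, name="Adj Close")

# Upsample from daily to every 12 hours. asfreq() keeps the original values
# at their midnight timestamps and leaves the new noon timestamps as NaN.
upsampled = prices.resample("12h").asfreq()
print(upsampled)

# One common way to fill the gaps: carry the last known price forward
filled = upsampled.ffill()
print(filled)
```

Forward filling is only one option; `.interpolate()` would instead estimate the noon values from the neighbouring midnight prices.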
