Introduction to Machine Learning & Time Series Analysis

Machine learning is a programming approach allowing applications to learn from their inputs and make adjustments based on their outputs.

In a nutshell, machine learning, is developing statistical models that can make predictions or decisions on new data automatically.

Machine Learning is used to predict:

Loan eligibility, foreclosure rates, and fraud.
Disease diagnosis and prognosis.
Consumer segmentation and clustering.
Presidential election results.
Natural disaster and planetary climate impacts.

" Intelligent algorithms are ones that use data to modify its behavior. Intelligent algorithms differ in that they can change their behavior as they run, often resulting in a user experience that many would say is intelligent."

They are ones that respond to data such that the algorithm gets better. It effectively “evolves.” The decision is no longer deterministic given the event.

01- Time Series Basics

In Time Series Data, Data needs to be sliced and diced at various time frequencies in order to analyze data points as a time series (E.g. day, week, month, year). Pandas DateTimeIndex index can be used to help to slice & dice data, one of the 3 main used functions:

1- weekly = df['Close'].resample('W').mean(): Pandas.DataFrmae.resample Documentation

2- df.loc[2019] loc function: Pandas.DataFrame.loc Documentation

3- df.iloc[2019] iloc function: Pandas.DataFrame.iloc Documentation

4- resample function rules: resample function rules/stackoverflow

5- Numpy Array Reshaping: Numpy Array Reshaping with Examples

02- Time Series Decomposition

Separation of a time series into useful and less useful components. The useful components can be used to observe patterns and to make predictions. Compnents are:

1- Trend: Is there an overall direction of movement? it is the upward or downward trend

2- Periodicity: Do patterns occur in cycles? it is the seasonality

3- Residual: How much noise exists in the data? it is the leftovers after trend & seasonality are omitted.

To decompse the 3 components, we have to import statsmodels liabrary in python

from statsmodels.tsa.seasonal import seasonal_decompose

you can have full insight of statsmodel liabrary at: https://www.statsmodels.org/stable/gettingstarted.html

The following plot to illustrate our Trend, Seasonality, & Noise in liquor sales example in Folder 01

03- Hodrick-Prescott Filter

A mathematical function that separates a time series into trend and non-trend components. It filters out short-term fluctuations, the following plot shows closing prices before smoothing out noise

Exponentially Weighted Moving Average (EWMA)

EWMA is an approach used to “denoise” or “smooth” out time series data so that trends and predictions can be made and to to see in which direction is the price moving?

it involves calculating the average of the last (n) prices
Weights are added to the averages based on the recency of the data
- Recent data is weighted more heavily
- Weighting decreases exponentially for previous prices/time periods
Requires past average values to be stored in memory

You can see in this plot, how the curve has been smoothed after using EWMA so we start using our filter

Next step is to apply Hodrick-Prescott Filter, so we can differntiate between the stock trend & noise, this can be done by using statsmodels as follows To decompse the 3 components, we have to import statsmodels liabrary in python

import statsmodels.api as sm
ts_noise, ts_trend = sm.tsa.filters.hpfilter(df['close'])

you can read more about Hodrick-Prescott Filter at statsmodels.tsa.filters.hp_filter.hpfilter & Wikipedia

04- Autocorrelation

Statistical correlation summarizes the strength of the relationship between two variables.

We can assume the distribution of each variable fits a Gaussian (bell curve) distribution. If this is the case, we can use the Pearson’s correlation coefficient to summarize the correlation between the variables.

The Pearson’s correlation coefficient is a number between -1 and 1 that describes a negative or positive correlation respectively. A value of zero indicates no correlation.

We can calculate the correlation for time series observations with observations with previous time steps, called lags. Because the correlation of the time series observations is calculated with values of the same series at previous times, this is called a serial correlation, or an autocorrelation.

A plot of the autocorrelation of a time series by lag is called the AutoCorrelation Function, or the acronym ACF. This plot is sometimes called a correlogram or an autocorrelation plot

import statsmodels as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

Running the example creates a 2D plot showing the lag value along the x-axis and the correlation on the y-axis between -1 and 1.

Confidence intervals are drawn as a cone. By default, this is set to a 95% confidence interval, suggesting that correlation values outside of this code are very likely a correlation and not a statistical fluke.

By default, all lag values are printed, which makes the plot noisy.

Partial Autocorrelation Function A partial autocorrelation is a summary of the relationship between an observation in a time series with observations at prior time steps with the relationships of intervening observations removed.

you can dig more in the Correlation, Auto-correlation at : Autocorrelation-Wikipedia

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
01-Time_Series_Basics		01-Time_Series_Basics
02-Decomposition		02-Decomposition
03-Hodrick_Prescott		03-Hodrick_Prescott
04-Auto_Correlation		04-Auto_Correlation
Images		Images
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction to Machine Learning & Time Series Analysis

01- Time Series Basics

02- Time Series Decomposition

1- Trend: Is there an overall direction of movement? it is the upward or downward trend

2- Periodicity: Do patterns occur in cycles? it is the seasonality

3- Residual: How much noise exists in the data? it is the leftovers after trend & seasonality are omitted.

03- Hodrick-Prescott Filter

Exponentially Weighted Moving Average (EWMA)

04- Autocorrelation

About

Releases

Packages

Languages

SashaFlores/Time_Series_Analysis_I

Folders and files

Latest commit

History

Repository files navigation

Introduction to Machine Learning & Time Series Analysis

01- Time Series Basics

02- Time Series Decomposition

1- Trend: Is there an overall direction of movement? it is the upward or downward trend

2- Periodicity: Do patterns occur in cycles? it is the seasonality

3- Residual: How much noise exists in the data? it is the leftovers after trend & seasonality are omitted.

03- Hodrick-Prescott Filter

Exponentially Weighted Moving Average (EWMA)

04- Autocorrelation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages