![image.png](attachment:image.png)

# TIMESERIES AND MACHINE LEARNING PRIMER
**This chapter is an introduction to the basics of machine learning, time series data, and the intersection between the two.**

# Timeseries Kinds & Applications

Welcome to Introduction to Machine Learning for Timeseries Data. This course is focused on the intersection of machine learning and timeseries data, so we expect that you have already taken introductory courses on machine learning and timeseries analysis here on DataCamp.

Put simply, a timeseries is data that changes over time. This can take many different forms, such as atmospheric CO2 over time, the waveform of my voice as I am speaking, the fluctuation of a stock's value over the year, or demographic information about a city.
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

4. What makes a time series?
Timeseries data consists of at least two things: one, an array of numbers that represents the data itself; two, another array that contains a timestamp for each datapoint. Timestamps can have a wide range of resolutions, from months of the year down to nanoseconds.
![image-3.png](attachment:image-3.png)

5. Reading in a time series with Pandas
Here we import timeseries data into a pandas DataFrame. Note that each datapoint has a corresponding time point (in this case, a date), though multiple datapoints may have the same time point.
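A minimal sketch of reading a timeseries into pandas. The CSV contents and column names ("date", "symbol", "close") are made up for illustration; with a real file you would pass its path to read_csv instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV: several rows can share the same date (one per company)
csv_text = """date,symbol,close
2010-01-04,AAA,10.5
2010-01-04,BBB,22.1
2010-01-05,AAA,10.7
2010-01-05,BBB,21.9
"""

# parse_dates converts the "date" column into pandas datetime values
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(data.head())
```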

6. Plotting a pandas timeseries
Here is the code to plot this timeseries data with Matplotlib and Pandas. We first create a figure and axis, then read in the data with Pandas and use the dot-plot method to plot the data on the axis.
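The plotting steps can be sketched as follows, assuming a DataFrame with a "date" column and a "value" column (both hypothetical names) holding synthetic daily data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily data standing in for a file read with pandas
data = pd.DataFrame({
    "date": pd.date_range("2010-01-01", periods=100, freq="D"),
    "value": range(100),
})

# Create the figure and axis first, then let pandas draw onto that axis
fig, ax = plt.subplots(figsize=(10, 4))
data.plot(x="date", y="value", ax=ax)
ax.set(xlabel="Date", ylabel="Value")
# In an interactive session, call plt.show() to display the figure
```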

7. A timeseries plot
The amount of time that passes between timestamps defines the "period" of the timeseries. In this case, it is about one day. This often helps us infer what kind of timeseries we're dealing with.
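One way to estimate this spacing, sketched here on hypothetical business-day timestamps, is to take the median difference between consecutive timestamps (the median is less sensitive to weekends and gaps than the mean).

```python
import pandas as pd

# Hypothetical timestamps: business days, so there is a gap over the weekend
timestamps = pd.Series(pd.to_datetime(
    ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08", "2010-01-11"]
))

# Differences between consecutive timestamps (the first one is NaT, so drop it)
steps = timestamps.diff().dropna()
period = steps.median()
print(period)
```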

8. Why machine learning?
Machine learning has taken the world of data science by storm. In the last few decades, advances in computing power, algorithms, and community practices have made it possible to use computers to ask questions that were never thought possible. Machine learning is about finding patterns in data - often patterns that are not immediately obvious to the human eye. This is often because the data is either too large or too complex to be processed by a human.
![image-4.png](attachment:image-4.png)

9. Why machine learning?
Another crucial part of machine learning is that we can build a model of the world that formalizes our knowledge of the problem at hand. We can use this model to make predictions. Combined with automation, this can be a critical component of an organization's decision making.

10. Why combine these two?
Why should we treat timeseries any differently from other datasets? Machine learning is all about finding patterns in data, and the way timeseries data changes over time is itself a useful pattern to exploit. For example, here is a raw waveform of someone speaking, and here is a collection of timeseries features that were extracted from it. As you can see, timeseries-specific features give us a much richer representation of the raw data.

11. A machine learning pipeline
This course will focus on a simple machine learning pipeline in the context of timeseries data. It boils down to three main steps:
- Feature extraction: what kinds of special features leverage a signal that changes over time?
- Model fitting: what kinds of models are suitable for asking questions with timeseries data?
- Validation: how can we validate a model that uses timeseries data, and what considerations must we make because the data changes over time?

# Machine Learning Basics

Now we'll cover the basics of Machine Learning. This should be a recap of material that you've already covered in previous DataCamp courses. We'll start with the basics of how to fit a model and make predictions with it using scikit-learn.

2. Always begin by looking at your data
Before performing any data analysis, you should always take a look at your raw data. This gives you a quick high-level take on the quality/kind of your data. In Numpy, you can do so by printing out the first few rows of the data.

3. Always begin by looking at your data
In Pandas, this can be done by using the dot-head method, which shows the first five rows and all columns by default.
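Both ways of peeking at raw data can be sketched side by side; the array here is random, hypothetical data with three columns named "a", "b" and "c".

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 samples, 3 features
rng = np.random.default_rng(0)
array = rng.normal(size=(100, 3))

# NumPy: slice out and print the first few rows
print(array[:3])

# pandas: .head() shows the first five rows by default
df = pd.DataFrame(array, columns=["a", "b", "c"])
print(df.head())
```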

4. Always visualize your data
It is also crucial to visualize your data. The proper visualization will depend on the kind of data you've got, though histograms and scatterplots are a good place to start. Look at the distribution of your data. Does it seem reasonable? Are there any outliers? Are you missing data? Each of these questions is important to answer before doing any analysis.
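A minimal sketch of these first checks, on a small hypothetical DataFrame with one missing value and one suspicious spike: count missing values per column, then draw a histogram and a scatterplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one suspicious spike in "x"
df = pd.DataFrame({
    "x": [1.0, 2.0, 2.5, np.nan, 3.0, 2.8, 40.0],
    "y": [0.5, 1.1, 1.3, 1.6, 1.4, 1.7, 2.0],
})

# Are you missing data? Count missing values in each column
missing = df.isna().sum()
print(missing)

# Distribution of one column (histogram) and relationship of two (scatterplot)
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
df["x"].hist(ax=ax_hist, bins=10)       # pandas drops the NaN before binning
ax_scatter.scatter(df["x"], df["y"])
```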

5. Scikit-learn
Once you've gotten to know your data, it's time to start modeling it. The most popular library for machine learning in Python is called "scikit-learn". It has a standardized API so that you can fit many different models with a similar code structure. Here, we import Support Vector Machine to classify datapoints.

6. Preparing data for scikit-learn
scikit-learn expects data to have a particular shape. Before using scikit-learn, your data should be two-dimensional. The first axis should correspond to sample number, and the second should correspond to feature number. This pattern is used in almost all scikit-learn functions. If your data is not in this shape, there are a few options for reshaping it so that you can use it with scikit-learn.

7. If your data is not shaped properly
The most common approach is to "transpose" your data, which reverses the order of its axes. This is most useful when your data is two-dimensional, where transposing simply swaps rows and columns.

8. If your data is not shaped properly
Another option is to use the dot-reshape method, which lets you specify the shape you want.
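Both options can be sketched with NumPy on hypothetical arrays. Passing -1 to reshape tells NumPy to infer that dimension from the array's size.

```python
import numpy as np

# Hypothetical data stored "features by samples": 3 features, 10 samples
data = np.arange(30).reshape(3, 10)
X = data.T                     # transpose: now (10 samples, 3 features)
print(X.shape)

# A single 1-D feature must become a (n_samples, 1) column
feature = np.arange(10)
X_single = feature.reshape(-1, 1)   # -1 lets NumPy infer the number of rows
print(X_single.shape)
```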

9. Fitting a model with scikit-learn
Now that your data has the correct shape, it's time to fit a model. First we must create an instance of the model we've imported (in this case, a support-vector classifier). You can call the method dot-fit on this instance to train the model. Here we show how you can input X (training data) and y (labels for each datapoint) to fit the model.

10. Investigating the model
It is often useful to investigate what kind of pattern the model has found. Most models will store this information in attributes that are created after calling dot-fit. Here we show the coefficients the model has given to each feature.

11. Predicting with a fit model
Once your model is fit, you can call the dot-predict method on the model to determine labels for unseen datapoints.
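The fit, inspect and predict steps can be sketched as below. We assume a linear support vector classifier, LinearSVC, because linear models expose their learned weights in the coef_ attribute; the training data is synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical training data: 2 features, label is 1 when the first feature is positive
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

model = LinearSVC()      # create an instance of the model
model.fit(X, y)          # train on data X and labels y

print(model.coef_)       # one learned weight per feature, shape (1, 2)

# Predict labels for unseen datapoints
X_new = np.array([[2.0, 0.0], [-2.0, 0.0]])
predictions = model.predict(X_new)
print(predictions)
```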

# Machine Learning & Time Series Data

In the final lesson of this chapter, we'll discuss the interaction between machine learning and timeseries data, and introduce why they're worth thinking about in tandem.

2. Getting to know our data
First, let's give a quick overview of the two datasets we'll be using. Both are freely available online and come from Kaggle (kaggle.com).

3. The Heartbeat Acoustic Data
Audio is a very common kind of timeseries data. Audio tends to have a very high sampling frequency (often above 20,000 samples per second!). Our first dataset is audio data recorded from the hearts of medical patients. A subset of these patients have heart abnormalities. Can we use only this heartbeat data to detect which subjects have abnormalities?

4. Loading auditory data
Audio data is often stored in "wav" files. We can list all of these files using the "glob" function. It lists files that match a given pattern. Each of these files contains the auditory data for one heartbeat session, as well as the sampling rate for that data.

5. Reading in auditory data
We'll use a library called "librosa" to read in the audio dataset. Librosa has functions for feature extraction, visualization, and analysis of auditory data. We can import the data using the "load" function. The audio samples are stored in the variable audio, and the sampling frequency in sfreq. Note that the sampling frequency here is 2205, which means 2205 samples are recorded per second.
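So that it runs without librosa or the Kaggle files, the sketch below swaps in Python's built-in wave module, which only reads uncompressed PCM wav files. It first writes two short synthetic recordings with hypothetical names to a temporary folder, lists them with glob, then reads the samples and sampling rate from one of them. (With librosa itself, note that load resamples to 22,050 Hz by default; passing sr=None keeps the file's native rate.)

```python
import glob
import os
import tempfile
import wave

import numpy as np

sfreq = 2205  # sampling rate for the hypothetical files below
tmpdir = tempfile.mkdtemp()

# Write two one-second, 16-bit mono recordings (stand-ins for heartbeat files)
for name in ["normal_01.wav", "murmur_01.wav"]:
    t = np.arange(sfreq) / sfreq
    signal = (0.5 * np.sin(2 * np.pi * 50 * t) * 32767).astype(np.int16)
    with wave.open(os.path.join(tmpdir, name), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)        # 2 bytes per sample = 16-bit audio
        f.setframerate(sfreq)
        f.writeframes(signal.tobytes())

# glob lists every file that matches the pattern
files = sorted(glob.glob(os.path.join(tmpdir, "*.wav")))
print([os.path.basename(p) for p in files])

# Read one file: the raw samples plus the sampling rate stored in its header
with wave.open(files[0], "rb") as f:
    file_sfreq = f.getframerate()
    raw = f.readframes(f.getnframes())
audio = np.frombuffer(raw, dtype=np.int16) / 32768.0   # scale to roughly [-1, 1]
print(file_sfreq, audio.shape)
```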

6. Inferring time from samples
Using only the sampling frequency, we can infer the timepoint of each datapoint in our audio file, relative to the start of the file.

7. Creating a time array (I)
Now we'll create an array of timestamps for our data. To do so, you have two options. The first is to generate a range of indices from zero to the number of datapoints in your audio file, divide each index by the sampling frequency, and you have a timepoint for each data point.

8. Creating a time array (II)
The second option is to calculate the final timepoint of your audio data using a similar method. Then, use the linspace function to generate evenly-spaced numbers between 0 and the final timepoint. In either case, you should have an array of numbers of the same length as your audio data.
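Both options can be sketched with NumPy on hypothetical audio (five seconds of noise at 2205 samples per second). Taking the final timepoint as (number of samples minus one) divided by the sampling frequency makes the two arrays identical.

```python
import numpy as np

# Hypothetical audio: 5 seconds of noise sampled at 2205 Hz
sfreq = 2205
audio = np.random.default_rng(0).normal(size=5 * sfreq)

# Option 1: index of each sample divided by the sampling frequency
indices = np.arange(len(audio))
time = indices / sfreq

# Option 2: compute the final timepoint, then space points evenly up to it
final_time = (len(audio) - 1) / sfreq
time_linspace = np.linspace(0, final_time, len(audio))

print(np.allclose(time, time_linspace))
```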

9. The New York Stock Exchange dataset
Next, we'll explore data from the New York Stock Exchange. It runs over a much longer timespan than our audio data, and has a sampling frequency on the order of one sample per day (compared with 2,205 samples per second with the audio data). Our goal is to predict the stock value of a company using historical data from the market. As we are predicting a continuous output value, this is a regression problem.

10. Looking at the data
Let's take a look at the raw data. Each row is a sample for a given day and company. It seems that the dates go back all the way to 2010.

11. Timeseries with Pandas DataFrames
It is useful to investigate the "type" of data in each column. Numpy or Pandas may treat an array of data in special ways depending on its type. We can print the type of each column by looking at the dot-dtypes attribute. Here we see that the type of each column is "object", which is a generic data type.

12. Converting a column to a time series
Since we know one column is actually a list of dates, let's change the column type to "datetime" using the to_datetime function. This will help us perform visualization and analysis later on.
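A minimal sketch of checking column types and converting a date column, using a small hypothetical DataFrame whose columns are deliberately stored with dtype object (as when pandas could not infer a more specific type).

```python
import pandas as pd

# Hypothetical rows stored as generic Python objects
df = pd.DataFrame({
    "date": ["2010-01-04", "2010-01-05", "2010-01-06"],
    "symbol": ["AAA", "AAA", "AAA"],
    "close": ["10.5", "10.7", "10.6"],
}, dtype=object)
print(df.dtypes)   # every column is "object"

# Convert the date strings to pandas datetime values
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```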

# TIMESERIES AS INPUTS TO A MODEL
**The easiest way to incorporate time series into your machine learning pipeline is to use them as features in a model. This chapter covers common features that are extracted from time series in order to do machine learning.**

# Classifying a Timeseries

We'll now discuss one of the most common categories of machine learning problems: classification. We'll also discuss the concept of feature engineering in the context of time series data.

2. Always visualize raw data before fitting models
Before we begin, let's take a moment to once again visualize the data we're dealing with. There is a lot of complexity in any machine learning step, and visualizing your raw data is important to make sure you know where to begin.

3. Visualize your timeseries data!
To plot raw audio, we need two things: the raw audio waveform, usually stored in a 1- or 2-dimensional array, and the timepoint of each sample. We can calculate the time by dividing the index of each sample by the sampling frequency of the timeseries. This gives us the time for each sample relative to the beginning of the audio.
![image.png](attachment:image.png)

4. What features to use?
As we saw in the introduction, raw data is usually too noisy to be useful as direct input to a classifier. An easy first step is to calculate summary statistics of our data, which removes the "time" dimension and gives us a more traditional classification dataset.

5. Summarizing timeseries with features
Here we see a description of this process. For each timeseries, we calculate several summary statistics. These then can be used as features for a model. We have expanded a single feature (raw audio amplitude) to several features (here, the min, max, and average of each timeseries).
![image-2.png](attachment:image-2.png)

6. Calculating multiple features
Here we show how to calculate multiple features for several timeseries. By using the "axis=-1" keyword, we collapse across the last dimension, which is time. The result is an array of numbers, one per timeseries.
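A minimal sketch of this step, assuming a hypothetical batch of four timeseries stored as rows of a 2-D array, with time along the last axis.

```python
import numpy as np

# Hypothetical batch: 4 timeseries (rows), each 1000 timepoints long (columns)
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 1000))

# axis=-1 collapses the last (time) dimension: one value per timeseries
means = np.mean(audio, axis=-1)
stds = np.std(audio, axis=-1)
maxs = np.max(audio, axis=-1)
mins = np.min(audio, axis=-1)
print(means.shape)
```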
![image-3.png](attachment:image-3.png)

7. Fitting a classifier with scikit-learn
In the last step, we collapsed a two-dimensional array into a one-dimensional array for each feature of interest. We can then combine these as inputs to a model. In the case of classification, we also need a label for each timeseries that allows us to build a classifier.

8. Preparing your features for scikit-learn
In order to prepare your data for scikit-learn, remember to ensure that it has the correct shape, which is samples by features. Here we can use the column_stack function, which stacks one-dimensional arrays as the columns of a two-dimensional array. The labels array is one-dimensional, which is the shape scikit-learn expects for targets (one label per sample), so it can be passed as-is; reshaping it into a single column only makes scikit-learn warn and flatten it again. Finally, we fit our model to these arrays, X and y.
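A sketch of stacking per-timeseries features into a samples-by-features matrix and fitting a classifier. The feature values and labels are synthetic, and LinearSVC is an assumed choice of classifier.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical per-timeseries features: one value per timeseries in each array
rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)                  # one label per timeseries
means = rng.normal(size=40) + labels            # class 1 has a higher mean
stds = rng.normal(loc=1.0, scale=0.1, size=40)
maxs = means + 3 * stds

# column_stack turns each 1-D feature array into a column: (samples, features)
X = np.column_stack([means, stds, maxs])
y = labels                                      # labels stay one-dimensional

model = LinearSVC()
model.fit(X, y)
print(X.shape)
```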
![image-4.png](attachment:image-4.png)

9. Scoring your scikit-learn model
Now that we've fit our model, we'll score the classifier. There are many ways that we can score a classifier with scikit-learn. First, we show how to generate predictions with a model that has been fit to data. If we have separate test data, we can use the "predict" method to generate a predicted list of classes for each sample. We can then calculate a score by dividing the total number of correct predictions by the total number of test samples. Alternatively, we can use the accuracy_score function that's built into scikit-learn by passing the test set labels and the predictions.
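Both ways of scoring can be sketched on synthetic data. We assume train_test_split to create separate test data; the manual score and accuracy_score should agree.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical features and labels (class-1 samples are shifted upward)
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 3)) + y[:, None]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearSVC().fit(X_train, y_train)

predictions = model.predict(X_test)
# Manual score: number of correct predictions divided by number of test samples
manual_score = np.sum(predictions == y_test) / len(y_test)
# The same quantity with scikit-learn (true labels first, predictions second)
score = accuracy_score(y_test, predictions)
print(manual_score, score)
```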
![image-5.png](attachment:image-5.png)

# Improving Features for Classification

What we've just performed is feature engineering of our audio data. Next, we'll cover a few more features that are specific to timeseries data.

2. The auditory envelope
We'll begin by calculating the "envelope" of each heartbeat sound. The envelope throws away information about the fine-grained changes in the signal, focusing on the general shape of the audio waveform. To do this, we'll need to calculate the audio's amplitude, then smooth it over time.
![image-6.png](attachment:image-6.png)

3. Smoothing over time
First, we'll remove noise in timeseries data by smoothing it with a rolling window. This means defining a window around each timepoint, calculating the mean of this window, and then repeating this for each timepoint.

4. Smoothing your data
For example, on the left we have a noisy timeseries as well as an overlay of several small windows. Each timepoint will be replaced by the mean of the window just before it. The result is a smoother signal over time which you can see on the right.
![image-7.png](attachment:image-7.png)

5. Calculating a rolling window statistic
Let's cover how to do this with Pandas. We first use the dot-rolling method of our dataframe, which returns an object that can be used to calculate many different statistics within each window. The window parameter tells us how many timepoints to include in each window. The larger the window, the smoother the result will be.
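A minimal sketch of a rolling mean on a hypothetical noisy sine wave. With pandas' defaults, each output value is the mean of the window ending at that timepoint, so the first window-1 rows are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy signal: a slow sine wave plus random noise
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
df = pd.DataFrame({"signal": np.sin(t) + rng.normal(scale=0.5, size=t.size)})

# Each value becomes the mean of a window of 50 timepoints ending at that point
smoothed = df.rolling(window=50).mean()

# The first 49 rows do not have a full window yet, so they come back as NaN
print(smoothed["signal"].isna().sum())
```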
![image-8.png](attachment:image-8.png)

6. Calculating the auditory envelope
Now that we know how to smooth our data, we can calculate the auditory envelope of our signal. First, we calculate the "absolute value" of each timepoint. This is also called "rectification", because it ensures that all timepoints are non-negative. Next, we calculate a rolling mean to smooth the signal. Let's see what these transformations look like.
![image-9.png](attachment:image-9.png)

7. The raw signal
First, we'll take a look at the raw audio signal.
![image-10.png](attachment:image-10.png)

8. Rectify the signal
Next, we take the absolute value of each timepoint.
![image-11.png](attachment:image-11.png)

9. Smooth the signal
Finally, we smooth the rectified signal. The result is a smooth representation of how the audio energy changes over time.
![image-12.png](attachment:image-12.png)
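The three steps (raw signal, rectify, smooth) can be sketched with pandas. The signal is a synthetic stand-in for a heartbeat (a 60 Hz tone switching on and off), and the 50-sample window is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical "heartbeat": a 60 Hz tone that switches on and off twice per second
sfreq = 2205
t = np.arange(2 * sfreq) / sfreq
tone = np.sin(2 * np.pi * 60 * t) * (np.sin(2 * np.pi * 2 * t) > 0)
audio = pd.DataFrame({"audio": tone})

# 1) Rectify: the absolute value makes every timepoint non-negative
audio_rectified = audio.abs()

# 2) Smooth: a rolling mean keeps the slow shape and drops the fast oscillation
audio_envelope = audio_rectified.rolling(window=50).mean()
```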

10. Feature engineering the envelope
Once we've calculated the acoustic envelope, we can create better features for our classifier. Here we'll calculate several common statistics of each auditory envelope, and combine them in a way that scikit-learn can use.
![image-13.png](attachment:image-13.png)

11. Preparing our features for scikit-learn
We'll then stack these features together with the same function we've used before. Even though we're calculating the same statistics (mean, standard deviation, and max), they are computed on the envelope rather than on the raw audio, and so carry different information about the stimulus.
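A sketch of envelope features for several recordings at once. Here we assume a hypothetical layout with one recording per DataFrame column and time along the rows, so the rolling window runs over time.

```python
import numpy as np
import pandas as pd

# Hypothetical batch: 5 recordings of 2000 samples each, one per column
rng = np.random.default_rng(0)
audio = pd.DataFrame(rng.normal(size=(2000, 5)))

# Envelope of every column: rectify, then smooth over time (rows)
audio_envelope = audio.abs().rolling(window=50).mean()

# One statistic per recording; pandas skips the leading NaNs from the window
envelope_mean = audio_envelope.mean(axis=0)
envelope_std = audio_envelope.std(axis=0)
envelope_max = audio_envelope.max(axis=0)

# Samples (recordings) by features, ready for scikit-learn
X = np.column_stack([envelope_mean, envelope_std, envelope_max])
print(X.shape)
```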
![image-14.png](attachment:image-14.png)

12. Cross validation for classification
Now that our features are defined, let's fit a classifier and see how it performs. We'll use cross-validation in order to train and test the model on different subsets of data. We can use a single function to combine the steps of splitting data into training and validation sets, fitting the model on training data, and scoring predictions on validation data. Using "cross_val_score" will generate a list of scores across different "splits" of our data.
![image-15.png](attachment:image-15.png)

13. Using cross_val_score
To use it, pass an instance of a scikit-learn model as the first parameter, and the X and y data as the second and third parameters. You can configure the strategy that scikit-learn uses to split the data with the "cv" parameter. Passing an integer determines the number of splits that are made (and the number of scores generated).
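A minimal sketch of cross_val_score on synthetic features, assuming LinearSVC as the model. With cv=3 the data is split three ways and each split is held out once for scoring, giving three scores.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Hypothetical features and labels
rng = np.random.default_rng(0)
y = np.array([0, 1] * 30)
X = rng.normal(size=(60, 3)) + y[:, None]

# Model instance first, then X and y; cv=3 gives three splits and three scores
percent_score = cross_val_score(LinearSVC(), X, y, cv=3)
print(percent_score, np.mean(percent_score))
```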
![image-16.png](attachment:image-16.png)

14. Auditory features: The Tempogram
There are several more advanced features that can be calculated with timeseries data. Each attempts to detect particular patterns over time, and summarize them statistically. For example, a tempogram tells us the "tempo" of the sound at each moment. We'll show how to calculate it using a popular tool for audio analysis in Python called librosa.
![image-17.png](attachment:image-17.png)

15. Computing the tempogram
Here we show how librosa can be used to extract the tempogram from an audio array. This tells us the moment-by-moment tempo of the sound. We can then use this to calculate features for our classifier.
![image-18.png](attachment:image-18.png)

# The Spectrogram

In this lesson, we'll discuss a special case of timeseries features: the spectrogram. Spectrograms are common in timeseries analysis, and we'll cover some basics to help you apply it to your machine learning problems.

2. Fourier transforms
To begin, we'll discuss a key part of the spectrogram: the Fourier Transform. This approach summarizes a time series as a collection of fast- and slow-moving waves. The Fourier Transform tells us how these waves can be combined in different amounts to create our time series. In practice it is computed with the Fast Fourier Transform (FFT) algorithm.

3. A Fourier Transform (FFT)
On the left is a raw audio signal, and on the right is its Fourier Transform (computed with an FFT). For a window of time, this describes how strongly fast and slow oscillations are present in the timeseries. The slower oscillations are on the left (closer to 0) and the faster oscillations are on the right. This is a richer representation of our audio signal.
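A minimal sketch with NumPy's FFT on a hypothetical signal made of a slow 5 Hz wave and a weaker, fast 50 Hz wave. The largest peaks of the spectrum land at those two frequencies.

```python
import numpy as np

# Hypothetical signal: slow 5 Hz wave plus a weaker, fast 50 Hz wave
sfreq = 1000
t = np.arange(sfreq) / sfreq                     # 1 second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# rfft: Fourier transform of a real signal (non-negative frequencies only)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sfreq)

# The two largest peaks sit at the frequencies of the two waves
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top_two)
```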
![image-19.png](attachment:image-19.png)

4. Spectrograms: combinations of windowed Fourier transforms
We can calculate multiple Fourier transforms in a sliding window to see how the spectrum changes over time. For each timepoint, we take a window of time around it, calculate a Fourier transform for the window, then slide to the next window (similar to calculating the rolling mean). The result is a description of how the Fourier transform changes throughout the timeseries. This is called a short-time Fourier transform, or STFT.
![image-20.png](attachment:image-20.png)

5. A Spectrogram Visualized
To calculate the spectrogram, we take the squared magnitude of each (complex-valued) entry of the STFT. An example is shown here. Note how the spectral content of the sound changes over time. Because this is speech, we see interesting patterns that correspond to spoken words (e.g. vowels or consonants).
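To make the sliding-window idea concrete, here is a NumPy-only sketch of an STFT and the resulting spectrogram. It is a simplified stand-in, not librosa's implementation: the helper stft_sketch is hypothetical, uses a Hann window, and does no padding at the edges. The test signal jumps from 100 Hz to 400 Hz halfway through, and the spectrogram's peak frequency follows that jump.

```python
import numpy as np

def stft_sketch(signal, n_fft=256, hop_length=64):
    """Hann-windowed FFTs over sliding frames (no edge padding).

    Returns a complex array of shape (n_fft // 2 + 1 frequencies, n_frames).
    """
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop_length)
    frames = [np.fft.rfft(signal[s:s + n_fft] * window) for s in starts]
    return np.array(frames).T   # frequencies on rows, time on columns

# Hypothetical signal whose frequency jumps from 100 Hz to 400 Hz after 1 second
sfreq = 2000
t = np.arange(2 * sfreq) / sfreq
signal = np.where(t < 1, np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 400 * t))

spec = stft_sketch(signal)
power = np.abs(spec) ** 2               # spectrogram: squared magnitude of the STFT
freqs = np.fft.rfftfreq(256, d=1 / sfreq)

# Frequency with the most energy in the first and last frames
peak_first = freqs[np.argmax(power[:, 0])]
peak_last = freqs[np.argmax(power[:, -1])]
print(peak_first, peak_last)
```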
![image-21.png](attachment:image-21.png)

6. Calculating the STFT
We'll use librosa's stft function to calculate a spectrogram. There are many parameters in this process, but we'll focus on the size of the window that is used. We'll calculate the STFT of our audio file, then convert the output to decibels to visualize it more cleanly with specshow (which results in the visualized spectrogram).
![image-22.png](attachment:image-22.png)

7. Calculating the STFT with code
Here's how to compute an STFT with librosa. We first define the size of the window used for the STFT. Next, we calculate the STFT, then convert it to decibels using the amplitude_to_db function, which maps the magnitude of each STFT value onto a real-valued decibel scale. Finally, we use the specshow function, which lets us quickly visualize a spectrogram. This code was used to produce the image shown in the previous slide. Note that we're glossing over some complex details of how spectrograms are calculated, and focusing on the essentials for the purpose of fitting models.
![image-23.png](attachment:image-23.png)
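The decibel conversion itself can be sketched with the standard amplitude-to-decibel formula, 20 * log10(|S| / ref). The helper amplitude_to_db_sketch is hypothetical; librosa's amplitude_to_db additionally limits very quiet values through its amin and top_db parameters. Values quieter than ref come out negative, so the result is real-valued but not necessarily positive.

```python
import numpy as np

def amplitude_to_db_sketch(values, ref=1.0, amin=1e-5):
    """Convert (possibly complex) amplitudes to decibels relative to `ref`."""
    magnitude = np.maximum(np.abs(values), amin)   # abs() drops the phase
    return 20 * np.log10(magnitude / ref)

# Hypothetical complex STFT values with magnitudes 1, 0.1 and 10
spec = np.array([1.0 + 0j, 0.1 + 0j, 0.0 + 10j])
spec_db = amplitude_to_db_sketch(spec)
print(spec_db)
```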

8. Spectral feature engineering
Different kinds of timeseries often have distinct spectral patterns, which means we can use patterns in the spectrogram to distinguish classes from one another. For example, we can calculate the spectral centroid and bandwidth over time: the centroid describes where most of the spectral energy lies at each moment, and the bandwidth describes how widely that energy is spread around the centroid.
![image-24.png](attachment:image-24.png)

9. Calculating spectral features
To calculate the spectral centroid and bandwidth, we again turn to librosa. We'll use the spectral_bandwidth and spectral_centroid functions to calculate these values at each moment in time for the spectrogram we've computed. These functions can also accept a raw audio signal (in which case the STFT is computed first). This visualization code is what produced the figure on the previous slide.
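To show what these two features measure, here is a NumPy sketch of the usual definitions, as we understand them from librosa's documentation: each spectrogram frame is normalized into weights, the centroid is the weighted mean frequency, and the bandwidth is the weighted spread around it (p = 2). The helper name is hypothetical, and this is not librosa's own code.

```python
import numpy as np

def spectral_centroid_bandwidth(magnitude, freqs):
    """Per-frame centroid and bandwidth of a magnitude spectrogram.

    magnitude: array of shape (n_freqs, n_frames); freqs: array of shape (n_freqs,) in Hz.
    """
    # Normalize each frame (column) so its magnitudes act as weights summing to 1
    weights = magnitude / magnitude.sum(axis=0, keepdims=True)
    # Centroid: weighted mean frequency of each frame
    centroid = (freqs[:, None] * weights).sum(axis=0)
    # Bandwidth: weighted spread of frequencies around the centroid
    bandwidth = np.sqrt((weights * (freqs[:, None] - centroid) ** 2).sum(axis=0))
    return centroid, bandwidth

# Hypothetical 2-frame spectrogram: frame 0 has energy only at 100 Hz,
# frame 1 has equal energy at 100 Hz and 300 Hz
freqs = np.array([0.0, 100.0, 200.0, 300.0])
magnitude = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]])
centroid, bandwidth = spectral_centroid_bandwidth(magnitude, freqs)
print(centroid, bandwidth)
```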
![image-25.png](attachment:image-25.png)

10. Combining spectral and temporal features in a classifier
In this chapter, we've calculated many different kinds of auditory features from our heartbeat sounds. As a final step, we can combine each of the features mentioned before into a single input matrix for our classifier. Here we calculate the mean value of the spectral centroid and bandwidth, and stack these into a single classifier input matrix. In general, including more informative features in our model can improve its performance.
![image-26.png](attachment:image-26.png)


# PREDICTING TIMESERIES DATA
**If you want to predict patterns from data over time, there are special considerations to take in how you choose and construct your model. This chapter covers how to gain insights into the data before fitting your model, as well as best-practices in using predictive modeling for time series data.**

# Predicting Data Over Time

In the third chapter we'll shift our focus from classification to regression. Regression with timeseries data has several features and caveats of its own. We'll begin by visualizing and predicting timeseries data. Then, we'll cover the basics of cleaning the data, and finally, we'll begin extracting features that we can use in our models.

2. Classification vs. Regression
The biggest difference between regression and classification is that regression models predict continuous outputs whereas classification models predict categorical outputs. In the context of timeseries, this means we can have more fine-grained predictions over time.
![image.png](attachment:image.png)

3. Correlation and regression
Both regression and correlation reflect the extent to which the values of two variables have a consistent relationship (either they both go up or down together, or they have an inverse relationship). However, regression results in a "model" of the data, while correlation is just a single statistic that describes the data. Regression models carry more information about the data, while correlation is easier to calculate and interpret.
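The difference can be sketched on two hypothetical, linearly related variables: correlation reduces the relationship to one number, while a fitted LinearRegression keeps a slope and intercept that can be used to make predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical variables: y is roughly twice x, plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# Correlation: a single number between -1 and 1
r = np.corrcoef(x, y)[0, 1]

# Regression: a fitted model with a slope and an intercept
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(r, model.coef_, model.intercept_)
```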
![image-2.png](attachment:image-2.png)

4. Correlation between variables often changes over time
When running regression models with timeseries data, it's important to visualize how the data changes over time. You can either do this by plotting the whole timeseries at once, or by directly comparing two segments of time.

5. Visualizing relationships between timeseries
Here we show two ways to compare timeseries data. On the left, we'll make two line plots with the x-axis encoding time. On the right, we'll make a single scatterplot, with color encoding time.
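Both views can be sketched with matplotlib on two hypothetical price series that are unrelated in the first half and move together in the second. The scatterplot colors each point by its position in time, so later points appear brighter with the viridis colormap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical prices: unrelated at first, then moving together
rng = np.random.default_rng(0)
n = 200
a = np.cumsum(rng.normal(size=n))
b = np.where(np.arange(n) < n // 2,
             np.cumsum(rng.normal(size=n)),
             a + rng.normal(scale=0.2, size=n))
data = pd.DataFrame({"A": a, "B": b},
                    index=pd.date_range("2015-01-01", periods=n, freq="D"))

fig, axs = plt.subplots(1, 2, figsize=(10, 4))
# Left: two line plots, with the x-axis encoding time
data.plot(ax=axs[0])
# Right: one scatterplot, with color encoding time
points = axs[1].scatter(data["A"], data["B"], c=np.arange(n), cmap="viridis")
fig.colorbar(points, ax=axs[1], label="Time index")
```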
![image-3.png](attachment:image-3.png)

6. Visualizing two timeseries
Here is the visualization. In this case, it seems like these two timeseries are uncorrelated at first, but then move in sync with one another. We can confirm this by looking at the brighter colors on the right. We see that brighter datapoints fall on a line, meaning that for those moments in time, the two variables had a linear relationship.
![image-4.png](attachment:image-4.png)

7. Regression models with scikit-learn
Fitting regression models with scikit-learn works the same way as fitting classifiers; the consistency of its API is one of scikit-learn's greatest strengths. There is, however, a separate set of models for regression. We'll begin by focusing on LinearRegression, which is the simplest form of regression. Here we see how you can instantiate the model, fit it, and predict on training data.
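A minimal sketch with several hypothetical input series (for example, other companies' prices) used to predict one target series. The calls mirror the classifier API: instantiate, fit, predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical inputs: 3 input series (columns) over 300 timepoints, and one target
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.1, size=300)

model = LinearRegression()
model.fit(X, y)                  # same fit/predict API as the classifiers
predictions = model.predict(X)   # predictions on the training data
print(model.coef_)
```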
![image-5.png](attachment:image-5.png)

8. Visualize predictions with scikit-learn
Here we visualize the predictions from several different models fit on the same data. We'll use Ridge regression, which has a parameter called "alpha" that penalizes large coefficients and shrinks them toward zero; this is useful if you have noisy or correlated variables. We loop through a few values of alpha, initializing a model with each one and fitting it on the training data. We then plot each model's predictions on the test data.
![image-6.png](attachment:image-6.png)

9. Visualize predictions with scikit-learn
This lets us see what each model is getting right and wrong. For more information on Ridge regression, refer to DataCamp's introductory course on scikit-learn.
![image-7.png](attachment:image-7.png)
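The loop over alpha values can be sketched as below, on synthetic data. As an assumption for illustration, the earlier samples are used for training and the later ones for testing. The size of the coefficients is also recorded, so you can see them shrink as alpha grows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data: 5 inputs, noisy linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + rng.normal(scale=0.5, size=200)
X_train, X_test = X[:150], X[150:]   # earlier samples train, later ones test
y_train, y_test = y[:150], y[150:]

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(y_test, color="k", lw=3, label="Actual")
coef_norms = {}
for alpha in [0.1, 1e2, 1e3]:
    model = Ridge(alpha=alpha)       # larger alpha = stronger shrinkage
    model.fit(X_train, y_train)
    ax.plot(model.predict(X_test), label=f"alpha={alpha}")
    coef_norms[alpha] = np.linalg.norm(model.coef_)
ax.legend()
print(coef_norms)
```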

10. Scoring regression models
Visualizing is useful, but not quantifiable. There are several options for scoring a regression model. The simplest is the correlation coefficient, whereas the most common is the coefficient of determination, or R squared.

11. Coefficient of Determination ($R^2$)
The coefficient of determination compares the total squared error of your model (the squared differences between predicted and actual values) with the total squared error of a "dummy" model that simply predicts the output data's mean value at each timepoint. You divide the first by the second and subtract this ratio from 1; the result is the coefficient of determination. It is bounded above by 1 and can be arbitrarily low (since models can be arbitrarily bad).
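Written out, with $y_i$ the actual value at timepoint $i$, $\hat{y}_i$ the model's prediction, and $\bar{y}$ the mean of the actual values:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A perfect model gives $R^2 = 1$, and a model that always predicts $\bar{y}$ gives $R^2 = 0$.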
![image-8.png](attachment:image-8.png)

12. $R^2$ in scikit-learn
In scikit-learn, we can import the r2_score function, which calculates the coefficient of determination. It takes the "true" output values first and the predicted output values second.
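A short check on hypothetical values that r2_score matches the definition computed by hand, and that swapping the argument order gives a different number.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# True values first, predictions second
score = r2_score(y_true, y_pred)

# Same value from the definition: 1 - (model error) / (error of predicting the mean)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual = 1 - ss_res / ss_tot

swapped = r2_score(y_pred, y_true)   # wrong order: a different value
print(score, manual, swapped)
```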
![image-9.png](attachment:image-9.png)

# Advanced Time Series Predictions

Now that we've covered some simple visualizations and model fitting with continuous timeseries, let's see what happens when we look at more real-world data.

2. Data is messy
Real-world data is always messy, and requires preparing and cleaning the data before fitting models. In timeseries, messy data often happens due to failing sensors or human error in logging the data. Let's cover some specific ways to spot and fix messy data with timeseries.

3. What messy data looks like
First, let's look at some messy-looking data. Here, we're showing the value of the company AIG over the last several years. There seem to be two periods of time where no data was produced, as well as some periods of time where the data doesn't fluctuate at all. Both look like they're aberrations, so let's see how we can correct for them. Before moving forward, note that it is not always clear whether patterns in the data are "aberrations" or not. You should always investigate to understand the source of strange patterns in the data.

4. Interpolation: using time to fill in missing data
First, let's fill in the missing data using other datapoints we do have. We'll use a technique called interpolation, which uses the values on either end of a missing window of time to infer what's in-between.

5. Interpolation in Pandas
In this example, we'll first create a boolean mask that we'll use to mark where the missing values are. Next, we call the dot-interpolate method to fill in the missing values. We'll use the first argument to signal we want linear interpolation. Finally, we'll plot the interpolated values.

6. Visualizing the interpolated data
You can see the results of interpolation in red. In this case, we used the "linear" argument so the interpolated values are a line between the start and stop point of the missing window. Other arguments to the dot-interpolate method will result in different behavior.
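A minimal sketch of the mask-and-interpolate steps on hypothetical daily prices with a two-day gap. Linear interpolation fills the gap with a straight line between 11 and 14.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with a two-day gap of missing values
prices = pd.Series(
    [10.0, 11.0, np.nan, np.nan, 14.0, 15.0],
    index=pd.date_range("2015-01-01", periods=6, freq="D"),
)

# Boolean mask marking where values are missing
missing = prices.isna()

# Linear interpolation draws a straight line across each gap
prices_interp = prices.interpolate("linear")
print(prices_interp[missing])   # the filled-in values: 12.0 and 13.0
```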

7. Using a rolling window to transform data
Another common technique to clean data is transforming it so that it is more well-behaved. To do this, we'll use the same rolling window technique covered in Chapter 2.

8. Transforming data to standardize variance
Using a rolling window, we'll calculate each timepoint's percent change over the mean of a window of previous timepoints. This standardizes the variance of our data and reduces long-term drift.

9. Transforming to percent change with Pandas
In this function, we first separate out the final value of the input array. Then, we calculate the mean of all but the last datapoint. Finally, we subtract the mean from the final datapoint, and divide by the mean. The result is the percent change for the final value.

10. Applying this to our data
We can apply this to our data by calling the dot-aggregate method on a rolling window and passing our function as an input. On the right, the data is now roughly centered at zero, and periods of high and low changes are easier to spot.
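A sketch of the percent-change function and its use on a rolling window. It returns the change as a fraction (0.2 means 20%). Here we call the rolling window's apply method with raw=True, which passes each window to the function as a NumPy array; the prices are hypothetical.

```python
import numpy as np
import pandas as pd

def percent_change(values):
    """Change of the last value relative to the mean of the earlier values."""
    values = np.asarray(values)
    previous_values = values[:-1]   # all but the last datapoint
    last_value = values[-1]         # the final datapoint
    return (last_value - np.mean(previous_values)) / np.mean(previous_values)

# Hypothetical prices
prices = pd.Series([10.0, 10.0, 10.0, 10.0, 12.0, 11.0])

# Apply the function to every window of 5 timepoints
prices_perc = prices.rolling(window=5).apply(percent_change, raw=True)
print(prices_perc)
```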

11. Finding outliers in your data
We'll use this transformation to detect outliers. Outliers are datapoints that are statistically different from the dataset as a whole. A common definition is any datapoint that is more than three standard deviations away from the mean of the dataset.

12. Plotting a threshold on our data
Here we'll visualize our definition of an outlier. We calculate the mean and standard deviation of each dataset, then plot outlier "thresholds" (three times the standard deviation from the mean) on the raw and transformed data.

13. Visualizing outlier thresholds
Here is the result. Any datapoint outside these bounds could be an outlier. Note that the datapoints deemed an outlier depend on the transformation of the data. On the right, we see a few outlier datapoints that were *not* outliers in the raw data.

14. Replacing outliers using the threshold
Next, we replace outliers with the median of the remaining values. We first center the data by subtracting its mean, and calculate the standard deviation. We then take the absolute value of each centered datapoint and mark any that lie more than three standard deviations from the mean. Finally, we replace these using the nanmedian function, which calculates the median while ignoring missing values.
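A sketch of this procedure; replace_outliers is a hypothetical helper name. It works on a NumPy copy so the input Series is left unchanged, and it fills outliers with the median of the non-outlier values.

```python
import numpy as np
import pandas as pd

def replace_outliers(series, n_std=3):
    """Replace points more than n_std standard deviations from the mean with the
    median of the remaining points. Returns a new Series."""
    values = series.to_numpy(dtype=float, copy=True)
    # Distance of each point from the mean (missing values are ignored)
    abs_diff_from_mean = np.abs(values - np.nanmean(values))
    # Mark points farther than n_std standard deviations from the mean
    is_outlier = abs_diff_from_mean > n_std * np.nanstd(values)
    # nanmedian ignores missing values when computing the replacement
    values[is_outlier] = np.nanmedian(values[~is_outlier])
    return pd.Series(values, index=series.index)

# Hypothetical data: twenty zeros and one extreme spike
data = pd.Series([0.0] * 20 + [100.0])
cleaned = replace_outliers(data)
print(cleaned.max())
```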

15. Visualize the results
As you can see, once we've replaced the outliers, there don't seem to be as many extreme datapoints. This should help our model find the patterns we want.

# Creating Features Over Time




# VALIDATING AND INSPECTING TIMESERIES MODELS
**Once you've got a model for predicting time series data, you need to decide if it's a good or a bad model. This chapter covers the basics of generating predictions with models in order to validate them against "test" data.**

# Creating Features from the Past
# Cross-Validating Time Series Data
# Stationarity & Stability
