# Machine Learning for Time Series Data in Python

Time series data is ubiquitous. Whether it be stock market fluctuations, sensor data recording climate change, or activity in the brain, any signal that changes over time can be described as a time series. Machine learning has emerged as a powerful method for leveraging complexity in data to generate predictions and insights into the problem one is trying to solve. This course sits at the intersection of machine learning and time series data, and covers feature engineering, spectrograms, and other advanced techniques in order to classify heartbeat sounds and predict stock prices.

## I. Time Series and Machine Learning Primer
* Intro to the basics of machine learning, time series data, and the intersection between the two.

### Timeseries kinds and applications
* Put simply, a **timeseries** means data that changes over time.
* This can take many different forms: atmospheric CO2 over time, the waveform of a spoken word, climate sensor data, the fluctuation of a stock's value over a year, or demographic information about a city.
* **Timeseries data** consists of at least two things: 
    * One: an array of numbers that represents the data itself.
    * Two: another array that contains a timestamp for each datapoint.
* In other words, each datapoint should have a corresponding time point (whether that be a month, year, hour, or any combination of these). Note: multiple datapoints may share the same time point.
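
As a minimal sketch of the two-array idea above, the values and their timestamps can be paired in a pandas `Series`. The numbers, dates, and the name `measurement` are made up for illustration and are not from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: five daily measurements
values = np.array([1.2, 1.5, 1.1, 1.8, 2.0])
timestamps = pd.date_range("2020-01-01", periods=5, freq="D")

# Pair each value with its timestamp by using the timestamps as the index
series = pd.Series(values, index=timestamps, name="measurement")
print(series)
```

Storing the timestamps as the index is one common convention; keeping them in an ordinary column (as in the `date` and `time` columns used in the plots in these notes) works as well.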

#### Plotting a pandas timeseries

```
# Assumes `data` is a DataFrame with 'date' and 'close' columns
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 6))
data.plot(x='date', y='close', ax=ax)
ax.set(title='AAPL daily closing price')
plt.show()
```
* **The amount of time that passes between timestamps defines the *period* of the timeseries.**
    * This often helps us infer what kind of timeseries we're dealing with.
* One crucial part of machine learning is that we can build a model of the world that formalizes our knowledge of the problem at hand. We can use that model to:
    * Predict the future
    * Automate this process
    * Support an organization's decision making, where it can be a critical component
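
To make the notion of a *period* concrete, here is a small sketch that measures the spacing between consecutive timestamps. The timestamps are hypothetical, and taking the most common spacing with `.mode()` is just one reasonable choice for data that may contain gaps.

```python
import pandas as pd

# Hypothetical timestamps, one hour apart
timestamps = pd.Series(pd.to_datetime([
    "2021-03-01 00:00",
    "2021-03-01 01:00",
    "2021-03-01 02:00",
    "2021-03-01 03:00",
]))

# Difference between each timestamp and the previous one
steps = timestamps.diff().dropna()

# Use the most common spacing as the period of the timeseries
period = steps.mode().iloc[0]
print(period)
```

A period of one hour hints at sensor-style data, while a period of one day or one month would suggest something like financial or demographic records.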
    
* We treat timeseries data slightly differently than other types of datasets
    * Timeseries data always change over time, which turns out to be a useful pattern to utilize
    * Using timeseries-specific features lets us see a much richer representation of the raw data.
    
* This course will focus on a simple machine learning pipeline in the context of timeseries data.

* **A machine learning pipeline:**
    * Feature extraction: What kind of special features leverage a signal that changes over time?
    * Model fitting: What kinds of models are suitable for asking questions with timeseries data?
    * Prediction and Validation: How can we validate a model that uses timeseries data? What considerations must we make because it changes in time?
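
The three pipeline stages above can be sketched end to end on synthetic data. Everything here is an assumption made for illustration: the noisy sine wave, the rolling-window features, the `Ridge` model, and the use of scikit-learn's `TimeSeriesSplit` so that each validation fold lies after its training data in time.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

# Synthetic signal (hypothetical): a noisy sine wave
rng = np.random.default_rng(0)
t = np.arange(300)
signal = pd.Series(np.sin(t / 10) + 0.1 * rng.standard_normal(300))

# 1. Feature extraction: rolling statistics over a 10-sample window
window = signal.rolling(10)
features = pd.DataFrame({
    "mean": window.mean(),
    "std": window.std(),
    "max": window.max(),
})
# Shift by one step so each row only uses values from the past
features = features.shift(1).dropna()
target = signal.loc[features.index]

# 2. Model fitting and 3. validation, keeping the time order intact
cv = TimeSeriesSplit(n_splits=5)
scores = []
for train_idx, test_idx in cv.split(features):
    model = Ridge()
    model.fit(features.iloc[train_idx], target.iloc[train_idx])
    scores.append(model.score(features.iloc[test_idx], target.iloc[test_idx]))

print(scores)
```

Shuffled cross-validation would let the model train on samples that come after the ones it is tested on, which is why this sketch uses a splitter that respects time order.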

```
# Plot the first 1000 values of each dataset (x-axis is the DataFrame index)
fig, axs = plt.subplots(2, 1, figsize=(5, 10))
data.iloc[:1000].plot(y='data_values', ax=axs[0])
data2.iloc[:1000].plot(y='data_values', ax=axs[1])
plt.show()
```

```
# Plot the first 1000 values of each dataset, using the 'time' column as the x-axis
fig, axs = plt.subplots(2, 1, figsize=(5, 10))
data.iloc[:1000].plot(x='time', y='data_values', ax=axs[0])
data2.iloc[:1000].plot(x='time', y='data_values', ax=axs[1])
plt.show()
```

#### Machine Learning Basics
* Always begin by looking at your data:
    * `array.shape`
    * `array[:3]`
    * `dataframe.head()`
    * `dataframe.info()`
    * `dataframe.describe()`
* It is also crucial to visualize your data:

```
# Using matplotlib
fig, ax = plt.subplots()
ax.plot(...)

# Using pandas
fig, ax = plt.subplots()
df.plot(..., ax=ax)
```
* The proper visualization will depend on the kind of data you've got, though histograms and scatterplots are a good place to start.
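
The inspection and plotting steps above might look like this on a small synthetic DataFrame. The column names `time` and `data_values` mirror the earlier plots, but the data itself is randomly generated for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 200 random values with an integer time column
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "time": np.arange(200),
    "data_values": rng.standard_normal(200),
})

# Look at the data first
print(df.shape)
print(df.head())
df.info()
print(df.describe())

# Then visualize it: a histogram and a scatterplot side by side
fig, axs = plt.subplots(1, 2, figsize=(10, 4))
df["data_values"].plot(kind="hist", bins=20, ax=axs[0], title="Histogram")
df.plot(x="time", y="data_values", kind="scatter", ax=axs[1], title="Scatterplot")
fig.tight_layout()
# Call plt.show() here when running this as a script
```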

* The most popular library for machine-learning in Python is `scikit-learn`.
    * Standardized API so that you can fit many different models with a similar code structure
    
#### Preparing data for scikit-learn
* `scikit-learn` expects a particular structure of data:
    * **`(samples, features)`**
* Make sure that your data is *at least two-dimensional.*
* Make sure the first dimension is *samples*.
* The first axis should correspond to sample number, and the second axis should correspond to feature number.

* If the axes are swapped: **transpose**
    * `array.transpose().shape`
    * `dataframe.T.shape`
    * Transposing reverses the order of the axes, so a 2-D array has its two axes swapped
   
* Use the **`.reshape()`** method:
    * It lets you specify the shape you want

```
array.shape
array.reshape([-1, 1]).shape
```
* `-1` tells NumPy to infer the length of that axis from the number of remaining elements
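
A short sketch of both fixes on made-up arrays. The names `wrong_way`, `X`, and `X_single` are hypothetical and chosen only for this example.

```python
import numpy as np

# Hypothetical data stored as (features, samples): 2 features, 5 samples
wrong_way = np.arange(10).reshape(2, 5)

# Transpose so that the first axis is samples: (samples, features)
X = wrong_way.transpose()
print(X.shape)  # (5, 2)

# A single feature stored as a 1-D array of 5 samples
y_1d = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Reshape to 2-D: -1 is inferred as 5, giving one column per feature
X_single = y_1d.reshape(-1, 1)
print(X_single.shape)  # (5, 1)
```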
   
* **Investigating the model:**
    * It is often useful to investigate what kind of pattern the model has found.
    * Most models will store this information in attributes that are created after calling `.fit()`
        * `model.coef_`
        * `model.intercept_`
    * Call `.predict()` on the model to determine labels for unseen datapoints
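
As an illustration of these attributes, the sketch below fits a `LinearRegression` to data generated from a known line, so the learned values can be checked by eye. The data and the choice of model are assumptions for the example, not part of the course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 2x + 1 exactly
X = np.arange(10, dtype=float).reshape(-1, 1)  # shape (samples, features)
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)

# Attributes created by .fit() describe the pattern the model found
print(model.coef_)       # slope, close to 2
print(model.intercept_)  # offset, close to 1

# Predict values for datapoints the model has not seen
X_new = np.array([[10.0], [11.0]])
predictions = model.predict(X_new)
print(predictions)
```

Because the data is noise-free, the recovered slope and intercept match the generating line up to floating point error; with real data they would only be estimates.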


