# 3 Time Series as Supervised Learning

---

### 1. What is Supervised Learning?
- **Supervised Learning:** you have `input variables (X)` and an `output variable (y)` and you use an algorithm to learn the mapping function from the input to the output $\rightarrow y = f(X)$
- The foundation for all predictive modeling machine learning (ML) algorithms
- **Goal:** Approximate the real underlying mapping so well that when you have new `input data (X)`, you can predict the `output variables (y)` for that data
- Called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process
- The teacher knows the correct answers; the algorithm (the student) iteratively makes predictions on the training data and is corrected through updates to the model
- Learning stops when the algorithm achieves an acceptable level of performance
- **Sliding window method:** The use of prior time steps to predict the next time step, which turns a time series problem into a supervised learning problem. Also called the window method; in statistics and time series analysis (TSA) it is called the **lag** or lag method. The number of prior time steps used is the window size (or lag); the larger the window, the more prior time steps each sample carries as input features
- **Univariate Time Series:** Datasets where only a single variable is observed at each time, such as temperature each hour
- **Multivariate Time Series:** Datasets where two or more variables are observed at each time
- **One-step Forecast:** Given the observations up to time `t` as input features `(X)`, predict the next time step `(y)`, the value at `t + 1`
- **Multi-step Forecast:** Given the observations up to time `t` as input features `(X)`, predict two or more future time steps `(y)`, the values at `t + 1, ..., t + n` where `n > 1` (for example `t + 1` and `t + 2`)
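
The difference between a one-step and a multi-step forecast can be sketched as a change in how many future values go into `y`. The helper below, `split_sequence_multistep`, and its parameters `n_in` and `n_out` are hypothetical names introduced for illustration; this is a minimal sketch of one possible framing, not a definitive implementation.

```python
import numpy as np

def split_sequence_multistep(sequence, n_in, n_out):
    """Frame a 1D series as (X, y) with n_in lagged inputs and n_out future targets."""
    X, y = [], []
    # the last valid window starts at len(sequence) - n_in - n_out
    for start in range(len(sequence) - n_in - n_out + 1):
        in_end = start + n_in        # one past the last input step
        out_end = in_end + n_out     # one past the last target step
        X.append(sequence[start:in_end])
        y.append(sequence[in_end:out_end])
    return np.array(X), np.array(y)

series = np.arange(1, 11)  # 1, 2, ..., 10

# one-step forecast: 3 prior steps -> 1 future step
X_one, y_one = split_sequence_multistep(series, n_in=3, n_out=1)
# multi-step forecast: 3 prior steps -> 2 future steps
X_multi, y_multi = split_sequence_multistep(series, n_in=3, n_out=2)

print(X_one.shape, y_one.shape)
print(X_multi.shape, y_multi.shape)
print(X_multi[0], y_multi[0])
```

Asking for more future steps leaves fewer complete samples, because each sample needs `n_in + n_out` consecutive observations.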


---

### 2. Why is X important?

- Reframing a time series as a supervised learning problem gives you access to the full suite of standard linear and nonlinear ML algorithms

---

### 3. What are some applications of X? What other concepts can I connect to X? Use FE when...

- Supervised learning problems can be further grouped into regression and classification problems
    - Classification: A classification problem is when the output variable is a category, such as red and blue or disease and no disease
    - Regression: A regression problem is when the output variable is a real value, such as dollars or weight
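
To connect the two problem types back to time series, the same lagged input can be paired with either kind of target. The sketch below uses made-up values and an assumed "did the value go up?" label purely for illustration; the variable names are hypothetical.

```python
import numpy as np

# made-up observations, for illustration only
series = np.array([3.0, 5.0, 4.0, 6.0, 6.0, 2.0])

# input: the value at time t (a single lag feature)
X = series[:-1].reshape(-1, 1)

# regression target: the real-valued observation at t + 1
y_reg = series[1:]

# classification target: 1 if the value rises from t to t + 1, else 0
y_clf = (series[1:] > series[:-1]).astype(int)

for x_row, r, c in zip(X, y_reg, y_clf):
    print(x_row, r, c)
```

Only the target changes between the two framings; the input features stay the same.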

---

### 4. What is the evolution/history of X?

---

### 5. Can I predict the future use of X? How can this current usage of X improve?

---

### 6. What don't I understand about X? Why is this? What's the root of this misunderstanding?

---


## Imports and Load Data

In [1]:
import numpy as np

## Examples - Uni and Multi Variate to Supervised Learning

A contrived example of a supervised learning dataset for a regression problem: each **row is an observation** made up of one `input variable (X)` and one `output variable (y)` to be predicted.
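
The row-per-observation layout can be sketched for both a univariate and a multivariate series using a window of one time step. The values below are invented for illustration, and predicting the second column of the multivariate series is an assumed choice of target, not something the notes prescribe.

```python
import numpy as np

# univariate: one variable per time step (made-up values)
uni = np.array([10, 20, 30, 40, 50])
X_uni = uni[:-1].reshape(-1, 1)  # value at t
y_uni = uni[1:]                  # value at t + 1

# multivariate: two variables per time step (made-up values)
multi = np.array([[1, 10],
                  [2, 20],
                  [3, 30],
                  [4, 40]])
X_multi = multi[:-1]     # both variables at t
y_multi = multi[1:, 1]   # second variable at t + 1 (assumed target)

print(X_uni.shape, y_uni)
print(X_multi.shape, y_multi)
```

In both cases the first observation has no earlier step to use as input and the last has no later step to use as output, so the framed dataset is one row shorter than the series.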

## Transform Time Series Problem to Supervised Learning Problem (My Solution)
- Function: `split_sequence()` splits a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.
- Parameters:
    - `ts_data`: time series data to transform to supervised learning data
    - `input_X_size`: size of each `X` for supervised learning
- Input: `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` (10 x 1)
- Output:
    - X_1 = [1, 2, 3], y_1 = [4]
    - X_2 = [2, 3, 4], y_2 = [5]
    - X_3 = [3, 4, 5], y_3 = [6]
    - ...
- Size: `X` is 7 x 3 and `y` is 7 x 1
- Feature (of output): A column in a dataset, such as a lag observation for a time series dataset. Here there are 3 features
- Sample (of output): A row in a dataset, such as an input and output sequence for a time series dataset. Here there are 7 samples
- Data in this form can be used directly to train a simple neural network, such as a Multilayer Perceptron
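
As a rough illustration of training on data in this form, the sketch below fits an ordinary least-squares linear model with `np.linalg.lstsq`. This is a deliberately simple stand-in, not a Multilayer Perceptron, and the variable names are chosen for illustration only.

```python
import numpy as np

series = np.arange(1, 11, dtype=float)  # 1.0, 2.0, ..., 10.0
n_steps = 3

# build the 7 x 3 window matrix and the 7 matching targets
X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
y = series[n_steps:]

# append a column of ones so the model can learn an intercept
X_b = np.hstack([X, np.ones((len(X), 1))])

# least-squares fit of y ~ X_b @ coef
coef, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# forecast the step after the last observed window [8, 9, 10]
next_window = np.append(series[-n_steps:], 1.0)
prediction = next_window @ coef
print(round(float(prediction), 3))
```

Because this series increases by exactly 1 per step, a linear model fits it without error and the forecast should come out close to 11. Real series will rarely be this clean.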

In [44]:
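def split_sequence(ts_data, input_X_size):
    """
    Split a univariate series into sliding-window samples.

    ts_data -- 1D np array
    input_X_size -- int, number of time steps in each X window

    return -- list of 1D np arrays (X), list of scalars (y)
    """
    X = []
    y = []

    for i in range(len(ts_data)):
        # index of the target value, one past the end of the window
        last_idx = i + input_X_size

        # stop once the target index falls outside the series; using >=
        # (not ==) also covers a window longer than the series
        if last_idx >= len(ts_data):
            break

        # window of input_X_size values ending just before the target
        X.append(ts_data[i : last_idx])
        # the value immediately after the window is the target
        y.append(ts_data[last_idx])

    # show each sample as: index, input window, target
    for i in range(len(X)):
        print(i, X[i], y[i])
    return X, y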

In [45]:
time_series_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
split_sequence(time_series_data, 3)

0 [1 2 3] 4
1 [2 3 4] 5
2 [3 4 5] 6
3 [4 5 6] 7
4 [5 6 7] 8
5 [6 7 8] 9
6 [7 8 9] 10


([array([1, 2, 3]),
  array([2, 3, 4]),
  array([3, 4, 5]),
  array([4, 5, 6]),
  array([5, 6, 7]),
  array([6, 7, 8]),
  array([7, 8, 9])],
 [4, 5, 6, 7, 8, 9, 10])

## Transform Time Series Problem to Supervised Learning Problem (Correct Solution)

In [55]:
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        
        # find the end of this pattern
        end_ix = i + n_steps
        
        # check if we are beyond the sequence 
        if end_ix > len(sequence) - 1 : 
            break
        # gather input and output parts of the pattern 
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix] 
        X.append(seq_x)
        y.append(seq_y)
    
    for i in range(len(X)): 
        print(X[i], y[i])
    
    print(np.shape(X), np.shape(y))
    
    return np.array(X), np.array(y)

In [56]:
series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(np.shape(series))
split_sequence(series, 3)

(10,)
[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10
(7, 3) (7,)


(array([[1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [4, 5, 6],
        [5, 6, 7],
        [6, 7, 8],
        [7, 8, 9]]),
 array([ 4,  5,  6,  7,  8,  9, 10]))
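
The same windows can also be built without an explicit loop. The sketch below assumes NumPy 1.20 or newer, where `numpy.lib.stride_tricks.sliding_window_view` is available; the function name `split_sequence_vectorized` is hypothetical and this is one possible alternative, not a replacement for the solutions above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequence_vectorized(sequence, n_steps):
    """Return (X, y) where each row of X holds n_steps lags and y the next value."""
    sequence = np.asarray(sequence)
    # each window holds n_steps inputs followed by one target
    windows = sliding_window_view(sequence, n_steps + 1)
    # copy so the results are independent arrays, not read-only views
    X = windows[:, :-1].copy()
    y = windows[:, -1].copy()
    return X, y

series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
X_vec, y_vec = split_sequence_vectorized(series, 3)
print(X_vec.shape, y_vec.shape)
```

Unlike the loop versions, `sliding_window_view` raises a `ValueError` when the window is longer than the series, so a caller may want to check the lengths first.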
