# Implementing Trading with Machine Learning Regression - Part - 2

In the previous notebook, we covered how to import data and create indicators. We also defined the dependent and independent variables for linear regression.

In this notebook, you will learn the machine learning regression technique. We will implement a linear regression model on the Gold ETF that predicts the Day's High and Day's Low given its Day's Open, High, Low and other defined indicators. The key steps are:
1. Import the Data
2. Preprocess the Data
3. Grid Search Cross-Validation
4. Split Train and Test Data
5. Predict the High and Low Prices
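
The steps above can be sketched end to end on synthetic data. This is only a minimal illustration, not the notebook's actual model: the feature names `f1` to `f3`, the target, the step names `scaler` and `linear_regression`, the `fit_intercept` grid and the 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Gold ETF data): 300 rows, 3 features
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=['f1', 'f2', 'f3'])
y = 2.0 * X['f1'] - 1.0 * X['f2'] + rng.normal(scale=0.1, size=300)

# Preprocessing and model chained in one Pipeline (step names are assumed)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_regression', LinearRegression()),
])

# Grid search over an assumed parameter grid, using time-ordered folds
param_grid = {'linear_regression__fit_intercept': [True, False]}
search = GridSearchCV(pipeline, param_grid, cv=TimeSeriesSplit(n_splits=5))

# Chronological train/test split without shuffling (80/20 is an assumption)
split = int(0.8 * len(X))
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Fit on the training window and predict on the held-out window
search.fit(X_train, y_train)
predictions = search.predict(X_test)
print(search.best_params_)
print(predictions[:5])
```

`TimeSeriesSplit` keeps every validation fold later in time than its training fold, which is why it is preferred over a shuffled split for price data.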

In [1]:
# Import Machine Learning libraries
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Import the libraries
import numpy as np
import pandas as pd

# For Plotting 
import matplotlib.pyplot as plt 
%matplotlib inline
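# 'seaborn-darkgrid' was renamed to 'seaborn-v0_8-darkgrid' in Matplotlib 3.6
try:
    plt.style.use('seaborn-v0_8-darkgrid')
except OSError:
    plt.style.use('seaborn-darkgrid')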

# To ignore unwanted warnings
import warnings 
warnings.filterwarnings("ignore")

### Import the Data
The input data is stored in `input_parameters.csv`, which we will import here as `gold_prices` to make predictions using a Pipeline.

In [2]:
# Read the data
gold_prices = pd.read_csv('data/input_parameters.csv', index_col='Date')

# Printing the data
gold_prices.head()

| Date | Open | High | Low | Close | S_3 | S_15 | S_60 | Corr | Std_U | Std_D | OD | OL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-04-15 | 136.0 | 136.75 | 130.509995 | 131.309998 | NaN | NaN | NaN | NaN | 0.75 | 5.490005 | NaN | NaN |
| 2013-04-16 | 134.899994 | 135.110001 | 131.759995 | 132.800003 | NaN | NaN | NaN | NaN | 0.210007 | 3.139999 | -1.100006 | 3.589996 |
| 2013-04-17 | 133.809998 | 134.949997 | 132.320007 | 132.869995 | NaN | NaN | NaN | NaN | 1.139999 | 1.489991 | -1.089996 | 1.009995 |
| 2013-04-18 | 134.119995 | 135.309998 | 133.619995 | 134.300003 | 132.326665 | NaN | NaN | NaN | 1.190003 | 0.5 | 0.309997 | 1.25 |
| 2013-04-19 | 136.0 | 136.020004 | 134.600006 | 135.470001 | 133.323334 | NaN | NaN | NaN | 0.020004 | 1.399994 | 1.880005 | 1.699997 |
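
The missing values at the top of the indicator columns come from look-back calculations. As a hedged sketch, the formulas below are inferred from the five printed rows rather than taken from the previous notebook: they reproduce `S_3`, `Std_U`, `Std_D`, `OD` and `OL` to six decimals on this sample, but the definitions themselves are assumptions. `S_15`, `S_60` and `Corr` need more history than is shown, so they are left out.

```python
import pandas as pd

# The five rows printed above (price columns only)
sample = pd.DataFrame(
    {
        'Open': [136.0, 134.899994, 133.809998, 134.119995, 136.0],
        'High': [136.75, 135.110001, 134.949997, 135.309998, 136.020004],
        'Low': [130.509995, 131.759995, 132.320007, 133.619995, 134.600006],
        'Close': [131.309998, 132.800003, 132.869995, 134.300003, 135.470001],
    },
    index=pd.Index(
        ['2013-04-15', '2013-04-16', '2013-04-17', '2013-04-18', '2013-04-19'],
        name='Date',
    ),
)

# Assumed definitions, inferred from the printed values
# 3-day mean of the previous days' closes: first 3 rows have no full window
sample['S_3'] = sample['Close'].shift(1).rolling(window=3).mean()
sample['Std_U'] = sample['High'] - sample['Open']        # no look-back, no NaN
sample['Std_D'] = sample['Open'] - sample['Low']         # no look-back, no NaN
sample['OD'] = sample['Open'] - sample['Open'].shift(1)  # first row is NaN
sample['OL'] = sample['Open'] - sample['Close'].shift(1) # first row is NaN

print(sample.round(6))
```

Under these assumed definitions, any column that uses `shift` or `rolling` has no value until enough earlier rows exist, which matches the pattern of leading NaN values in the table.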


#### Checking for NaN values
Here we will check for NaN values, then drop all the rows containing NaN values using the `dropna` method.

In [7]:
gold_prices.isna().sum(axis=0)

Open      0
High      0
Low       0
Close     0
S_3       3
S_15     15
S_60     60
Corr     13
Std_U     0
Std_D     0
OD        1
OL        1
dtype: int64

We have 60 NaN values in `S_60`, 15 in `S_15`, 13 in `Corr`, 3 in `S_3`, and 1 each in `OD` and `OL`. Now we will simply drop all the rows containing NaN values using `dropna`.

In [8]:
# Dropping all the NaN values
gold_prices.dropna(inplace=True)

# Checking for NaN values
gold_prices.isna().sum()

Open     0
High     0
Low      0
Close    0
S_3      0
S_15     0
S_60     0
Corr     0
Std_U    0
Std_D    0
OD       0
OL       0
dtype: int64

Now our dataframe `gold_prices` is free from NaN values.
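
To see what `dropna` removes, here is a small sketch on a toy frame with hypothetical values (the column `S_5` and the 10-row length are made up for illustration). Because `dropna` discards a row if any column is NaN, and the missing values here all sit at the start of each column, the number of rows removed equals the largest per-column NaN count.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with leading NaNs of different lengths
toy = pd.DataFrame({'Close': np.arange(10, dtype=float)})
toy['S_3'] = toy['Close'].shift(1).rolling(3).mean()  # 3 leading NaNs
toy['S_5'] = toy['Close'].shift(1).rolling(5).mean()  # 5 leading NaNs

rows_before = len(toy)
cleaned = toy.dropna()  # returns a new frame; `toy` itself is unchanged
rows_dropped = rows_before - len(cleaned)

print(toy.isna().sum())
print(rows_dropped)
```

The non-inplace form used here keeps the original frame available for comparison; `dropna(inplace=True)`, as used above, modifies `gold_prices` directly.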

In [None]:
# Independent variables
