# Simple Linear Regression with Python 

## Setting Up the Working Environment
Import the necessary packages for this project with their aliases.

  1. Import pandas
  2. Import numpy
  3. Import matplotlib
  4. Import seaborn
  
Use this magic command to render the plots inline in Jupyter:
```python
%matplotlib inline
```

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Loading the Boston Housing Dataset  

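  - The housing dataset ships in the `datasets` module of sklearn:
     1. Import `datasets` from sklearn
     2. Load the Boston housing dataset
  - `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the code in this section requires an older release.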
     
Syntax:
```python
# import datasets
from sklearn import datasets
# load boston housing dataset and call it boston
boston = datasets.load_boston()
```

In [None]:
# Load the boston dataset 
from sklearn import datasets

In [None]:
# load boston dataset 
boston = datasets.load_boston()

#### Note:

We can shorten the import by pulling in the loader function directly and then calling it:
```python
from sklearn.datasets import load_boston
boston = load_boston()
```
I am doing it in two steps for teaching purposes.

### Check the dataset

Syntax:
```python
# Use dir() function
dir(boston)
# Or get the keys using the keys() method
print(boston.keys())
```
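
To see what `dir()` and `keys()` report without depending on a particular loader, here is a minimal sketch that builds a toy `Bunch` by hand. The variable `toy` and all of its values are invented for illustration; only the `Bunch` container itself comes from scikit-learn.

```python
import numpy as np
from sklearn.utils import Bunch

# Toy stand-in for a loaded dataset (values are invented for illustration)
toy = Bunch(
    data=np.array([[6.5, 4.9], [6.4, 9.1], [7.2, 4.0]]),
    target=np.array([24.0, 21.6, 34.7]),
    feature_names=np.array(["RM", "LSTAT"]),
    DESCR="Toy housing data",
)

# A Bunch is a dict whose keys are also available as attributes
print(list(toy.keys()))
print(toy.data.shape)
print(toy["target"] is toy.target)
```

This is why both `boston.keys()` and attribute access such as `boston.data` work on the loaded object.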

### Dataset Description

Before you start analysing the data, get to know it: learn the variable names and what they mean. If you don't know what you have, how can you use it?

Syntax:
```python
boston.DESCR
# Use print() function to get a nice display
print(boston.DESCR)
```

### Get the feature names

  - Features or attributes are the independent (explanatory) variables.

Syntax:
```python
boston.feature_names
```

### Get the Data, and check the shape

  - `data` holds the observed feature values: one row per observation, one column per feature.
  
Syntax: 
```python
# Checking the data
boston.data
# Check the first few observations
boston.data[:5, : ]
# Checking the shape
boston.data.shape
```

### Check the Target Variable and Its Shape
  - The **target** is the dependent (outcome, response) variable.
  
Syntax:
```python
# target 
boston.target
# Check the first 10 observations
boston.target[:10]
# Check the last 10 observations
boston.target[-10:]
# Shape of the target (the Bunch object itself has no shape attribute)
boston.target.shape
```

### Check the Type of the Target and the Data

Syntax:
```python
# The Target
print(type(boston.target))
# The Data
print(type(boston.data))
```

### Renaming the Data and the Target Variable
 
 - Following common scikit-learn practice, we name the target (response) variable __y__ (lowercase) and the features data __X__ (uppercase).
  
 - After naming the features data, convert the array into a DataFrame object.
 
Syntax:

```python
# Features data
X = boston.data

# Converting X into a DataFrame
bost_df = pd.DataFrame(X, columns = boston.feature_names)
# target variable data
y = boston.target
```

### Prepare the data for estimation

  1. We need a DataFrame that holds the target together with all the features to use for estimation.

We already converted the features data into a DataFrame, so we can add the target variable as a new column named 'Price'.

Syntax:
```python
bost_df['Price'] = boston.target
```
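
The two preparation steps (features into a DataFrame, then the target as an extra column) can be sketched on a tiny made-up array. The names `X_toy`, `y_toy`, and `toy_df` and the numbers are hypothetical and are not taken from the Boston data.

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix (3 houses, 2 features) and target vector
X_toy = np.array([[6.5, 4.9], [6.4, 9.1], [7.2, 4.0]])
y_toy = np.array([24.0, 21.6, 34.7])
names = ["RM", "LSTAT"]

# Features become columns; the target is appended as one more column
toy_df = pd.DataFrame(X_toy, columns=names)
toy_df["Price"] = y_toy

print(toy_df.shape)
print(list(toy_df.columns))
```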

___
We are ready to start our analysis.

## Exploratory Data Analysis

- EDA is a crucial step in any machine learning project.

### Check the Info about the Dataset

Syntax:
```python
bost_df.info()
```

### Print a Few Observations (First and Last)

Syntax:
```python 
# First obs
bost_df.head()
# Last obs
bost_df.tail()
```

### Data Summary Statistics

- Run summary statistics for the data

Syntax:
```python
bost_df.describe()
```

### Visualization of the target variable

Syntax:
```python
# Histogram of prices (this is the target of our dataset)
plt.hist(boston.target,bins=50)

#label
plt.xlabel('Price in $1000s')
plt.ylabel('Number of houses')
```

In [None]:
# Setting the figure size 
sns.set(rc={'figure.figsize':(12,8)})

In [None]:
# Plot the histogram
plt.hist(boston.target,bins=50)

# label the axes 
plt.xlabel('Price in $1000s')
plt.ylabel('Number of houses')

### Feature Selection

  - We compute the pairwise correlations to see which variables are most strongly correlated with the target variable, then plot the correlation matrix with the `heatmap()` function from seaborn.
  
Syntax:
```python
sns.heatmap(bost_df.corr())
```
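
A heatmap is easy to scan but hard to read exact values from. As a complement, the sketch below ranks features by the absolute value of their correlation with the target. It runs on synthetic data (the columns `strong` and `noise` are invented), so treat it as an illustration of the idea rather than a result for the Boston data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in: 'strong' drives the price, 'noise' does not
strong = rng.normal(size=n)
noise = rng.normal(size=n)
price = 3.0 * strong + rng.normal(scale=0.5, size=n)
demo_df = pd.DataFrame({"strong": strong, "noise": noise, "Price": price})

# Correlation of every feature with the target, strongest first
ranking = (
    demo_df.corr()["Price"]
    .drop("Price")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

The same expression applied to `bost_df` would list the features most strongly related to `Price`, whether the relationship is positive or negative.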

In [None]:
# Plotting the heatmap
sns.heatmap(bost_df.corr(), square=True, cmap='RdYlGn')

### Plotting Highly Correlated Features with the Target

  - We see from the previous plot that the __RM (average number of rooms)__ is positively correlated with price. Therefore, we will plot price against RM.
  
```python 
plt.scatter(x = bost_df['RM'], y = bost_df['Price']) 
```

It is better to have a fitted line plotted. We can do that by using __lmplot()__ from seaborn

Syntax:
```python
sns.lmplot(x= 'RM', y = 'Price', data = bost_df)
```

### Use lmplot to plot LSTAT 
```python
sns.lmplot(x= 'LSTAT', y = 'Price', data = bost_df)
```

## Simple Linear Regression with scikit-learn

First, we run a simple linear regression to get familiar with the technique, then we move forward to advanced modelling. 

- Import linear_model from sklearn

- Run a simple linear regression

  $Price=\beta_{0} + \beta_{1} RM + error$
  
- Read the results
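
For a single feature, the least-squares estimates of the equation above have a closed form: $\beta_{1}$ is the covariance of RM and Price divided by the variance of RM, and $\beta_{0}$ is the mean of Price minus $\beta_{1}$ times the mean of RM. The sketch below computes both with NumPy on made-up numbers (the arrays `rm` and `price` are hypothetical); an ordinary least-squares fit gives the same values up to floating-point error.

```python
import numpy as np

# Made-up room counts and prices (in $1000s), for illustration only
rm = np.array([5.0, 6.0, 6.5, 7.0, 8.0])
price = np.array([15.0, 21.0, 24.0, 30.0, 38.0])

# Slope: covariance of x and y divided by the variance of x
beta_1 = np.sum((rm - rm.mean()) * (price - price.mean())) / np.sum((rm - rm.mean()) ** 2)
# Intercept: forces the line through the point of means
beta_0 = price.mean() - beta_1 * rm.mean()

print(beta_0, beta_1)
```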

Syntax 01:
```python
# Import linear_model
from sklearn import linear_model

# Instantiate LinearRegression
simple_reg = linear_model.LinearRegression()

# Fit the model using fit(); the features come first, then the target
simple_reg.fit(feature, target)
```

Syntax 02:
```python
from sklearn.linear_model import LinearRegression
# Create LR object
lm = LinearRegression()
```

In [None]:
# Import linear_model


In [None]:
# Create a LinearRegression Object


### Instantiating the LinearRegression Object

To use the `LinearRegression` class:

  1. Instantiate (create a new object)
  2. Use the object to fit the model
  3. Use the object to predict new data
  4. Use the object to score the prediction; for example, lm.score() returns the coefficient of determination (R^2).
  
Syntax:
```python
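# Create a new object
lm = LinearRegression()
# Fit the linear regression (X: 2D features, y: target)
lm.fit(X, y)
# Predict for new data (X_new is a placeholder for a 2D array)
lm.predict(X_new)
# Score the prediction (returns R^2)
lm.score(X, y)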
```
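
The four steps above can be run end to end on synthetic data. This is a minimal sketch: the names `X_demo`, `y_demo`, and `lm_demo` are invented for the example, and the data follow a known line so the fitted values are easy to sanity-check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic data following y = 2 + 3x plus a little noise
x = rng.uniform(0, 10, size=100)
y_demo = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=100)
X_demo = x.reshape(-1, 1)  # features must be 2D: (n_samples, n_features)

lm_demo = LinearRegression()          # 1. instantiate
lm_demo.fit(X_demo, y_demo)           # 2. fit
y_hat = lm_demo.predict(X_demo)       # 3. predict
r2 = lm_demo.score(X_demo, y_demo)    # 4. score (R^2)

print(lm_demo.intercept_, lm_demo.coef_, r2)
```

The estimated intercept and slope should land close to 2 and 3, the values used to generate the data.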

### Note: 

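Fitting a linear regression with sklearn requires the feature data to be a 2D array of shape (n_samples, n_features). A single column selected from a DataFrame is one-dimensional, so it has to be converted first. The shortcut is the NumPy method __reshape(-1, 1)__, which turns a 1D array into a single-column 2D array. The target can stay 1D, but reshaping it as well does no harm.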

Syntax:
```python
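# one_d_array is a placeholder for any 1D NumPy array
one_d_array.reshape(-1, 1)
# Or convert first, which also works for a pandas Series or a list
np.array(oneD_obj).reshape(-1, 1)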
```
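
To make the effect of the reshape concrete, the sketch below prints the shapes before and after on a small hypothetical Series named `rooms`. It goes through NumPy because current pandas versions do not provide a `reshape()` method on a Series.

```python
import numpy as np
import pandas as pd

# A pandas Series is one-dimensional, like a single DataFrame column
rooms = pd.Series([6.5, 6.4, 7.2, 7.0, 7.1], name="RM")
print(rooms.shape)             # (5,)

# Convert to a NumPy array, then reshape into a single column
rooms_2d = rooms.to_numpy().reshape(-1, 1)
print(rooms_2d.shape)          # (5, 1)

# to_frame() gives a one-column DataFrame, which is already 2D
print(rooms.to_frame().shape)  # (5, 1)
```

The `-1` asks NumPy to infer the number of rows from the length of the array, so the same call works for any number of observations.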

In [None]:
X = bost_df['RM']
y = bost_df['Price']

In [None]:
# Print the dimensions of y and X before reshaping
print("Dimensions of y before reshaping: ", )
print("Dimensions of X before reshaping: ", )

In [None]:
# Reshape X and y
# y_reshaped 
# X_reshaped 

In [None]:
# Print the dimensions of y_reshaped and X_reshaped
print("Dimensions of y after reshaping: ")
print("Dimensions of X after reshaping: ")

### Fitting Linear Regression
 
  - Use the `fit()` method to fit the model

Syntax:
```python
lm.fit(X, y)
```

In [None]:
# Fit linear model 


### Check the lm object

You can use the `dir()` function to see the attributes of the lm object.

Syntax:
```python 
# List the attributes, skipping double-underscore (dunder) names
print([att for att in dir(lm) if '__' not in att])
```

### Check the Coefficients

  1. The intercept can be checked using `lm.intercept_`
  2. The other coefficients are available in `lm.coef_`
  
Syntax:
```python
# Intercept
print('The intercept of simple linear regression is', lm.intercept_)
# Coefficients
print('The coefficient of simple linear regression is', lm.coef_)
```

### Saving the results into a table

In [None]:
coef_df = pd.DataFrame({'Intercept': lm.intercept_,
                       'Coef': lm.coef_.flatten()})
coef_df

### The Goodness-of-Fit

1. One metric for how well the model fits the data is the coefficient of determination, $R^{2}$, which can be retrieved with the `lm.score()` method.

Syntax:
```python
lm.score(X, y)
# Or use print 
print("The R^2 of model is: {:0.3f}".format(lm.score(X_reshaped, y_reshaped)))
```
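
The value returned by `lm.score()` is $R^{2} = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the target from its mean. The sketch below computes it by hand with NumPy on made-up numbers; the arrays `x` and `y_obs` are hypothetical, and `np.polyfit` stands in for the fitted model.

```python
import numpy as np

# Made-up observations and a fitted line, for illustration only
x = np.array([5.0, 6.0, 6.5, 7.0, 8.0])
y_obs = np.array([15.0, 21.0, 24.0, 30.0, 38.0])
slope, intercept = np.polyfit(x, y_obs, 1)   # least-squares line
y_fit = intercept + slope * x

ss_res = np.sum((y_obs - y_fit) ** 2)          # unexplained variation
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)   # total variation
r_squared = 1.0 - ss_res / ss_tot

print(round(r_squared, 3))
```

For a simple linear regression with an intercept, this equals the squared Pearson correlation between the feature and the target.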

### Prediction

Now we can use the model for prediction. However, we don't have new data, so we will predict on the same data used to build the model. We will address this point later in detail. Prediction is done using the `lm.predict()` method:

Syntax:
```python
y_pred = lm.predict(new_data)
```
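
When new observations are available, they must have the same 2D layout as the training features: one row per sample and one column per feature. A minimal sketch, using noise-free toy data so the predictions are easy to check (the names `X_train`, `y_train`, `model`, and `new_data` here are invented for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free toy data on the line y = 1 + 2x
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])
model = LinearRegression().fit(X_train, y_train)

# New observations need the same 2D layout: one row per sample
new_data = np.array([[5.0], [6.5]])
y_new = model.predict(new_data)
print(y_new)
```

Because the training points lie exactly on a line, the predictions fall on that same line, up to floating-point error.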

### Plotting the Results

In [None]:
plt.scatter(X, y, color='blue', alpha=0.5)
plt.plot(X_reshaped, y_pred, color='red', linewidth=3)
plt.show()

# Try with the Same Dataset and Report the Best Model
### 1. Support Vector Regression Model
### 2. Decision Tree Regression Model
### 3. Random Forest Regression Model
