# Predicting Boston Housing Prices

## Introduction

In this project, you will evaluate the performance and predictive power of a model that has been trained and tested on data collected from homes in suburbs of Boston, Massachusetts. A model trained on this data that is seen as a good fit could then be used to make certain predictions about a home — in particular, its monetary value. This model would prove to be invaluable for someone like a real estate agent who could make use of such information on a daily basis.

The dataset for this project originates from the UCI Machine Learning Repository and is loaded here through [scikit-learn's `load_boston`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html).

# Steps Involved:
1. Importing the Library
2. Loading the Dataset
3. Structure of the Dataset
4. Data Exploration
    * Statistics 
    * Heat Map
    * Feature plot
5. Model Development for Prediction
    * Split the Data
    * Regression Model (Linear, Lasso, RandomForest Regressor)
6. Conclusion

In [1]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(filename='C:\\Users\\Abhishek Kumar Singh\\Boston-Regression-1\\Images\\boston-housing.jpg')


# 1. Importing the library

In [2]:
#Import all the required packages, such as numpy and pandas
#Also import the plotting libraries

# 2. Loading the Dataset

In [3]:
'''Load the Boston Housing dataset''';
#Use the built-in dataset loader from sklearn to import the data;
#Store the returned object in a variable such as boston_data;

In [4]:
#Print the target value in the Dataset

In [5]:
#Print the Keys present in the Dataset

In [6]:
#Convert boston_data into a DataFrame for ease of analysis, since pandas provides several built-in functions
#boston_data.feature_names holds the list of all the feature names
#Combine boston_data.data and boston_data.target (for example with numpy) and pass the result to pd.DataFrame,
#then save the resulting DataFrame in a variable such as df.;
'''Print the first five records using .head to check the DataFrame''';
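The conversion described in this cell could look roughly like the sketch below. Because `load_boston` was removed from scikit-learn in version 1.2, the snippet does not call it; instead it builds a tiny stand-in object with the same `data`, `target`, and `feature_names` attributes. The three rows of values are made up for illustration, and the column name `MEDV` is taken from the exercise text further down.

```python
import numpy as np
import pandas as pd
from sklearn.utils import Bunch

# Stand-in for the object a dataset loader returns (values are made up).
boston_data = Bunch(
    data=np.array([[0.1, 6.5, 5.0], [0.2, 6.0, 9.1], [0.3, 7.1, 4.0]]),
    target=np.array([24.0, 21.6, 34.7]),
    feature_names=np.array(["CRIM", "RM", "LSTAT"]),
)

# Each feature becomes a column; the target is appended as 'MEDV'.
df = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
df["MEDV"] = boston_data.target

print(df.head())
```

With the real dataset object in `boston_data`, the last three statements should work unchanged.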

In [7]:
#Save the target values in a separate variable (prices) for further analysis
'''Use the column name for this indexing''';

# 3. Structure of the Dataset

In [8]:
#Print the shape of the dataset

In [9]:
#Call the describe function

## Missing or Null Points

In [10]:
#Check for null value in the features using isnull function

In [11]:
#Check for nan value in the features using isnan function

In [12]:
'''Check whether you found any null or NaN values''';
#If any are found, handle them (for example, drop or fill them); otherwise proceed to the next step
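A minimal sketch of the three checks above, run on a small made-up frame with one deliberately missing value. The frame contents and the choice to drop incomplete rows are assumptions for illustration; filling the gaps may suit other datasets better.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one planted missing value.
df = pd.DataFrame({"RM": [6.5, np.nan, 7.1], "MEDV": [24.0, 21.6, 34.7]})

# Count missing entries per column (isnull treats NaN and None as missing).
null_counts = df.isnull().sum()

# np.isnan operates on the underlying float array.
has_nan = bool(np.isnan(df.to_numpy()).any())

# One way to handle missing data: drop the incomplete rows.
clean_df = df.dropna()

print(null_counts)
print("Any NaN present:", has_nan)
```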

# 4. Data Exploration

For your very first coding implementation, you will calculate descriptive statistics about the Boston housing prices. Since numpy has already been imported for you, use this library to perform the necessary calculations. These statistics will be extremely important later on to analyze various prediction results from the constructed model.

In the code cell below, you will need to implement the following:

* Calculate the minimum, maximum, mean, median, and standard deviation of 'MEDV', which is stored in prices.
* Store each calculation in its respective variable.

In [13]:
# TODO: Minimum price of the data

# Alternative using pandas
# minimum_price = prices.min()

# TODO: Maximum price of the data

# Alternative using pandas
# maximum_price = prices.max()

# TODO: Mean price of the data

# Alternative using pandas
# mean_price = prices.mean()

# TODO: Median price of the data

# Alternative using pandas
# median_price = prices.median()

# TODO: Standard deviation of prices of the data

# Alternative using pandas 
# std_price = prices.std(ddof=0)

# There are other statistics you can calculate too like quartiles

# Show the calculated statistics
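One possible way to fill in the TODOs with numpy is sketched here on four made-up prices, so the results are easy to verify by hand. Note that `np.std` computes the population standard deviation (`ddof=0`) by default, which is why the pandas alternative above passes `ddof=0` explicitly.

```python
import numpy as np

# Made-up prices; replace with the real 'MEDV' values.
prices = np.array([10.0, 20.0, 30.0, 40.0])

minimum_price = np.min(prices)
maximum_price = np.max(prices)
mean_price = np.mean(prices)
median_price = np.median(prices)
std_price = np.std(prices)  # population standard deviation (ddof=0)

# Show the calculated statistics
print(f"Minimum price: {minimum_price:.2f}")
print(f"Maximum price: {maximum_price:.2f}")
print(f"Mean price: {mean_price:.2f}")
print(f"Median price: {median_price:.2f}")
print(f"Standard deviation of prices: {std_price:.2f}")
```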

After the statistical analysis, move on to a graphical representation.

In [14]:
#You'll use displot from seaborn on the target value

In [15]:
'''Note any observations from the above graph and filter the data at this stage if you find anything unusual.''';
#Hint: Check for target value of 50
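The hint likely refers to a commonly noted property of this dataset: `MEDV` appears to be capped at 50.0, so rows at that value may not reflect true prices. A sketch of the filter on made-up rows, assuming the frame is called `df`:

```python
import pandas as pd

# Made-up rows; two of them sit at the apparent cap of 50.0.
df = pd.DataFrame(
    {"RM": [6.5, 6.0, 8.2, 7.9], "MEDV": [24.0, 21.6, 50.0, 50.0]}
)

# Keep only rows whose target is below the cap.
filtered_df = df[df["MEDV"] < 50.0].reset_index(drop=True)

print(f"Rows before: {len(df)}, rows after: {len(filtered_df)}")
```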

## Check for correlation in the Dataset

In [16]:
#Use heat map in the seaborn library to get the correlation graph

A correlation heat map uses a warm-to-cool color spectrum to show how strongly each pair of variables in the dataset is correlated.

The correlation coefficient ranges from -1 to 1. If the value is close to 1, it means that there is a strong positive correlation between the two variables. When it is close to -1, the variables have a strong negative correlation.
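As a rough illustration, the sketch below builds synthetic data in which one feature rises with the target and another falls with it, then draws the correlation matrix with seaborn. The column names are borrowed from the Boston dataset, but the values and their relationships are invented for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: MEDV rises with RM, LSTAT falls with RM.
rng = np.random.default_rng(0)
rm = rng.normal(6.0, 0.7, size=100)
df = pd.DataFrame(
    {
        "RM": rm,
        "LSTAT": 40 - 4 * rm + rng.normal(0, 1, size=100),
        "MEDV": 9 * rm - 30 + rng.normal(0, 2, size=100),
    }
)

# Pairwise Pearson correlation coefficients (the pandas default).
corr = df.corr()

# Fix the color scale to [-1, 1] so colors are comparable across plots.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, add `plt.show()`.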

**Are there any relationships among the features?**

In [17]:
'''Try to read the correlations from the above graph, and try to use only the features
   with a strong positive or strong negative correlation with the target''';

An important point in selecting features for a regression model is to check for multicollinearity.
Avoid keeping features that are highly correlated with each other, since they carry largely the same information.
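One simple way to spot candidate collinear features is to list every pair whose absolute correlation exceeds a cut-off. The sketch below uses synthetic features, and the 0.9 threshold is an arbitrary assumption rather than a recommended value.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
a = rng.normal(size=200)
features = pd.DataFrame(
    {
        "A": a,
        "B": a + rng.normal(scale=0.1, size=200),  # nearly a copy of A
        "C": rng.normal(size=200),  # independent of A and B
    }
)

corr = features.corr().abs()

# Keep only the upper triangle so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9  # hypothetical cut-off
collinear_pairs = [
    (row, col)
    for row in upper.index
    for col in upper.columns
    if pd.notna(upper.loc[row, col]) and upper.loc[row, col] > threshold
]

print(collinear_pairs)
```

For each reported pair, keeping the feature that correlates more strongly with the target is one reasonable choice.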

**Based on the above analysis, filter the dataset now if you found anything relevant.**

In [18]:
#Create the dataframe with the filtering applied to the original

## Plotting the Graph for Each Selected Feature

Here you'll plot each feature against the target value.

In [19]:
#Use the pyplot library
'''Other functions you might use: .xlabel, .ylabel, .plot, .show''';
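A sketch of one feature-versus-target plot per selected feature, using synthetic values. The feature names `RM` and `LSTAT` and the relationship used to generate `MEDV` are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the filtered data.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"RM": rng.normal(6, 0.7, size=50), "LSTAT": rng.uniform(2, 30, size=50)}
)
df["MEDV"] = 9 * df["RM"] - 0.5 * df["LSTAT"] - 20 + rng.normal(0, 2, size=50)

feature_names = ["RM", "LSTAT"]

# One subplot per feature, each plotted against the target.
fig, axes = plt.subplots(1, len(feature_names), figsize=(10, 4))
for ax, name in zip(axes, feature_names):
    ax.plot(df[name], df["MEDV"], "o")
    ax.set_xlabel(name)
    ax.set_ylabel("MEDV")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, add `plt.show()`.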

In [20]:
#Using Seaborn for Better Understanding of the Filter_data Features
'''Use boxplot for the features to get the value range''';

**Of the features you investigated, were there any unusual distributions?**

Answer the question above, and if you find any outliers, try to remove them.

If everything looks fine, proceed with the dataset as is.
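The notebook does not prescribe an outlier rule. One common option, shown here as an assumption, is the 1.5 × IQR rule that boxplot whiskers are usually based on. The values below are made up, with one planted outlier.

```python
import pandas as pd

# Made-up values; 12.0 is a planted outlier.
df = pd.DataFrame({"RM": [5.9, 6.1, 6.3, 6.0, 6.2, 12.0]})

q1 = df["RM"].quantile(0.25)
q3 = df["RM"].quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
no_outliers = df[df["RM"].between(lower, upper)]

print(f"Kept {len(no_outliers)} of {len(df)} rows")
```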

In [21]:
'''If filtering was done based on the above observation''';
#Save the new DataFrame and update every dependent variable accordingly

# 5. Model Development for Prediction

In this section, you will use scikit-learn's built-in regression models to predict house prices.

## Splitting the Dataframe into Training and Testing 

In [22]:
#Use the built-in train_test_split function from model_selection in sklearn

In [23]:
'''Provide a suitable test_size to train_test_split to split the data in that ratio''';
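A minimal sketch of the split on ten made-up samples. The `test_size=0.2` ratio and `random_state=42` seed are illustrative choices, not values given by the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```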

## Regression Model

You will use scikit-learn's LinearRegression, Lasso, and RandomForestRegressor: train each model on the training set and evaluate it on the test set.

Analyze the models step by step: first Linear, then Lasso, and finally RandomForest.

**Importing the library for Regression**

In [24]:
#Import the library for the regression from sklearn

In [25]:
#Call the regression and save the object in some variable

**Training the Dataset**

In [26]:
#Train the regression model on the training data
'''Use the built-in .fit function for this''';

**Prediction**

In [27]:
#Predict the test data and store the result in some variable
'''Use inbuilt .predict function for this''';

**Plotting the result**

In [28]:
#Plot the original y_test against the values you get from the prediction
'''Use matplotlib for this''';

**Model Evaluation**

In [29]:
#Get the R2 score for the model
'''Use inbuilt score function in the regression model''';
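For scikit-learn regressors, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below checks that definition by hand on synthetic data; the data-generating line is an assumption chosen only to give the model something to fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic one-feature data with a linear trend plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(4, 9, size=(50, 1))
y = 9.0 * X[:, 0] - 30.0 + rng.normal(scale=2.0, size=50)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"manual R^2: {r2_manual:.4f}")
print(f"model.score: {model.score(X, y):.4f}")
```

A value near 1 means the model explains most of the variance in the target; a value near 0 means it does little better than always predicting the mean.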

### NOTE:

Repeat the above five steps, from importing the library for regression through model evaluation, for each of the specified regression models.
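The repetition can also be written as a loop over the three estimators. This is a sketch on synthetic data: the data-generating formula, `alpha=0.1`, `n_estimators=100`, and the seeds are all illustrative assumptions, and the scores it prints say nothing about the Boston dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression problem with three features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

predictions = {}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # train on the training set only
    predictions[name] = model.predict(X_test)  # predict the held-out data
    scores[name] = model.score(X_test, y_test)  # R^2 on the test set
    print(f"{name}: R^2 = {scores[name]:.3f}")
```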

# 6. Conclusion

Draw conclusions about the dataset from the analysis, and determine which of the specified regression models performs best on the Boston dataset.