<a href="https://colab.research.google.com/github/Dinesha1999/ML_Project_2/blob/main/HousePrice_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **House price prediction**

In [237]:
#Generic Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Load the dataset: fetch_california_housing returns the feature matrix together with the target values
from sklearn.datasets import fetch_california_housing

In [238]:
#get the dataset
data = fetch_california_housing()

In [240]:
print(data.DESCR)
# The dataset has 20640 rows and 8 features (attributes). Later, the Latitude and Longitude values are used to derive new features, a step known as feature engineering.

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per ce

### Create the dataset

In [241]:
# 1. Create a DataFrame first
df = pd.DataFrame()

In [242]:
# 2. Inspect the feature array that will be added to the DataFrame
data.data

array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
          37.88      , -122.23      ],
       [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
          37.86      , -122.22      ],
       [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
          37.85      , -122.24      ],
       ...,
       [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
          39.43      , -121.22      ],
       [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
          39.43      , -121.32      ],
       [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
          39.37      , -121.24      ]])

In [243]:
# Check the shape of the feature array
data.data.shape
# (number of records, number of features)

(20640, 8)

In [244]:
df = pd.DataFrame(data =data.data)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25


##### importing independent data

In [245]:
# All of these columns are independent variables (features) and are used to train the model. The dependent variable, added in the next step, is the output the model predicts.
# Use the feature names as column labels instead of 0, 1, ..., 7
df = pd.DataFrame(data =data.data, columns = data.feature_names)
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25


##### importing dependent data

In [246]:
# Add the dependent variable (the house price) as a new column called 'Target'
df['Target'] =data.target

In [247]:
#data.target
# data.target is a NumPy array; the cell above adds it to the DataFrame

In [248]:
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,Target
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,4.526
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,3.585
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,3.521
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,3.413
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,3.422


## EDA(Exploratory data analysis)

In [249]:
# 3. Exploratory data analysis: before building a model, we need to understand the data, for example whether any two features are strongly correlated.
# Example: if MedInc and HouseAge were strongly correlated (house age rising whenever median income rises), the two columns would carry almost the same information, and one of them could be dropped before training.
# Here we use a ready-made tool, sweetviz.
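
A correlation check of this kind can also be done directly in pandas. The following is a minimal sketch on synthetic data, not the notebook's own analysis: the frame `demo`, its values and the `threshold` of 0.9 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing frame (values are made up for illustration)
rng = np.random.default_rng(0)
rooms = rng.normal(5.0, 1.0, size=500)
demo = pd.DataFrame({
    "AveRooms": rooms,
    "AveBedrms": rooms * 0.2 + rng.normal(0.0, 0.01, size=500),  # nearly a scaled copy of AveRooms
    "HouseAge": rng.uniform(1.0, 52.0, size=500),                # unrelated to the other two
})

# Pairwise Pearson correlation between all numeric columns
corr = demo.corr()

# List the feature pairs whose absolute correlation exceeds the (hypothetical) threshold
threshold = 0.9
high_pairs = [
    (a, b, round(corr.loc[a, b], 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```

Only the pair built to be redundant is reported; for such a pair, keeping one of the two columns is usually enough.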

In [250]:
!pip install sweetviz



In [251]:
# See what the data looks like in graphical form

In [252]:
import sweetviz as sv
#create a variable called report
report = sv.analyze(df)
# To save this in a form of html
report.show_html('report.html')


Report report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


## Data preprocessing

In [253]:
# The report.html file shows which columns need preprocessing and which can be removed. Raw Latitude and Longitude values are of limited use on their own,
# but the coordinates can be turned into a location such as a city or a street, which adds a couple of features that may help the model.
# The geopy library does this: pass in the coordinates and it returns the city name and other address fields (feature engineering).


#### **Feature engineering**

In [254]:
from geopy.geocoders import Nominatim
import time
geolocator = Nominatim(user_agent='geoapiExercises')

In [255]:
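# Specify a descriptive and unique user-agent (this replaces the geolocator created in the previous cell)
geolocator = Nominatim(user_agent="my-house-price-prediction-app")

# A delay avoids rate limiting only when it runs between requests, i.e. inside the loop that calls reverse();
# a single call here pauses this cell once.
time.sleep(1)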
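
To space out every request, the pause has to wrap the geocoding call itself. The sketch below shows one way to do that with only the standard library. `throttled` and `fake_reverse` are hypothetical names introduced for illustration, and `fake_reverse` stands in for `geolocator.reverse` so that the example runs offline. geopy also ships `geopy.extra.rate_limiter.RateLimiter`, which serves the same purpose.

```python
import time

def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

# Hypothetical stand-in for geolocator.reverse, so the sketch needs no network access
def fake_reverse(query):
    return {"query": query}

# A short delay keeps the demo fast; a public geocoding service would need about 1 second
reverse = throttled(fake_reverse, min_delay_seconds=0.05)

start = time.monotonic()
results = [reverse(f"{lat},-122.23") for lat in (37.88, 37.86, 37.85)]
elapsed = time.monotonic() - start
print(len(results), "lookups in", round(elapsed, 2), "seconds")
```

The first call goes through immediately and each later call waits for the remainder of the delay, so a loop over all rows never exceeds the chosen request rate.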

In [256]:
# Pass the latitude and longitude values of the first row
#geolocator.reverse("37.88"+  " , "  +"-122.23")[0]
# To get the address components instead:
geolocator.reverse("37.88"+  " , "  +"-122.23").raw['address']  # raw returns a dictionary

{'road': 'Convict Trail',
 'city': 'Oakland',
 'county': 'Alameda County',
 'state': 'California',
 'ISO3166-2-lvl4': 'US-CA',
 'postcode': '94720',
 'country': 'United States',
 'country_code': 'us'}

In [257]:
# location() takes a coordinate pair as an array, [latitude, longitude],
# reverse-geocodes it, and appends the road and county to loc_update.

def location(cord):
  Latitude = str(cord[0])
  Longitude = str(cord[1])

  # raw['address'] is a dictionary of address components
  address = geolocator.reverse(Latitude+","+Longitude).raw['address']

  # dict.get returns None for a missing component, so gaps are stored as None.
  # Nominatim uses the lowercase key 'county' (see the sample output above);
  # looking up 'County' always returns None, which is consistent with the
  # all-null County column in the saved results further down.
  loc_update['County'].append(address.get('county'))
  loc_update['road'].append(address.get('road'))

# The function receives the coordinates (latitude and longitude) as an array and reads them as cord[0] and cord[1].
# geolocator.reverse() looks up the address for that pair, and .raw exposes the result as a dictionary.
# The dictionary contains fields such as city and country code; many of these, such as the country code, are the same for every row of this dataset.

In [258]:
# This cell is kept disabled (wrapped in a string). It sends one API request per row, and the public
# Nominatim service asks for at most one request per second, so 20,640 rows take more than five hours.
# loc_update is a dictionary of lists; progress is pickled along the way and reloaded in the next cell.
'''import pickle
import time

loc_update = {"County":[],
              "road":[],
              "Neighbourhood":[]}

# columns 6 and 7 are Latitude and Longitude; -1 leaves out Target
for i,cord in enumerate(df.iloc[:,6:-1].values):
  location(cord)
  time.sleep(1)  # pause between requests

  # save progress every 100 rows so that an interrupted run is not lost
  if i%100==0:
    with open('loc_update.pickle','wb') as f:
      pickle.dump(loc_update, f)
    print(i)

# final save once the loop has finished
with open('loc_update.pickle','wb') as f:
  pickle.dump(loc_update, f)'''
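
Saving progress is most useful when an interrupted run can also pick up where it stopped. The following is a minimal sketch of that idea, using only the standard library. `run`, `fake_lookup`, the `stop_after` parameter and the file name `loc_update_demo.pickle` are all hypothetical, and `fake_lookup` replaces the geocoding call so the example runs offline.

```python
import os
import pickle
import tempfile

def fake_lookup(cord):
    # Hypothetical stand-in for the geocoding call
    return {"road": f"Road at {cord[0]},{cord[1]}"}

coords = [(37.88, -122.23), (37.86, -122.22), (37.85, -122.24), (37.85, -122.25)]
checkpoint = os.path.join(tempfile.mkdtemp(), "loc_update_demo.pickle")

def run(coords, checkpoint, stop_after=None):
    # Resume from the checkpoint if one exists, otherwise start empty
    if os.path.exists(checkpoint):
        with open(checkpoint, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"road": []}

    done = len(state["road"])  # rows already processed in an earlier run
    for i, cord in enumerate(coords[done:], start=done):
        if stop_after is not None and i >= stop_after:
            break  # simulate an interrupted session
        state["road"].append(fake_lookup(cord)["road"])
        with open(checkpoint, "wb") as f:
            pickle.dump(state, f)
    return state

partial = run(coords, checkpoint, stop_after=2)  # first session stops after two rows
full = run(coords, checkpoint)                   # second session continues at row 2
print(len(partial["road"]), len(full["road"]))
```

Because the number of stored results doubles as the resume position, every list in the saved dictionary has to grow by exactly one entry per row.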

In [259]:
# Load the saved geocoding results from the pickle file
import pickle
with open("/content/loc_update.pickle","rb") as f:   # "rb" means read binary
  loc_update = pickle.load(f)
# The path is the copied location of the pickle file; pickle files are binary, so the file is opened in binary mode.


In [260]:
loc_update.keys()
# Neighbourhood is not used; only County and road are considered

dict_keys(['County', 'road', 'Neighbourhood'])

In [261]:
#loc_update['County']
# Uncomment the line above to list all the County values.

In [262]:
# Add the geocoded data back to the DataFrame (Latitude and Longitude are dropped afterwards)
# Fix: pad the shorter lists with None so that all lists in loc_update have the same length
max_len = max(len(loc_update[key]) for key in loc_update)
for key in loc_update:
    loc_update[key] += [None] * (max_len - len(loc_update[key]))
loc = pd.DataFrame(loc_update)


In [263]:
loc.info()
# info() shows how many values each column has and how many are missing

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   County         0 non-null      object
 1   road           19653 non-null  object
 2   Neighbourhood  0 non-null      object
dtypes: object(3)
memory usage: 483.9+ KB


In [264]:
# Add the new features to the dataset

for i in loc_update.keys():
  df[i] = loc_update[i]


df = df.sample(axis= 0,frac=1)  # shuffle the rows
df.head(10) # show 10 rows; Neighbourhood and County have missing data
# Latitude and Longitude still have to be removed

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,Target,County,road,Neighbourhood
14517,3.6111,15.0,5.0,0.917241,747.0,2.575862,32.91,-117.13,1.963,,Caminito Baywood,
13428,2.7827,9.0,4.764444,1.13,3156.0,3.506667,34.1,-117.45,1.058,,Arrow Boulevard,
3862,4.0391,33.0,5.196141,1.03537,580.0,1.864952,34.16,-118.44,3.375,,Hazeltine Avenue,
8200,3.1711,44.0,4.033784,1.045608,1461.0,2.467905,33.79,-118.14,2.154,,East Atherton Street,
15050,3.6445,17.0,5.84,1.062069,2191.0,3.022069,32.83,-116.85,1.976,,Mountain Top Drive,
4035,3.2154,20.0,4.133444,1.060181,7450.0,1.772122,34.17,-118.52,2.596,,Yarmouth Avenue,
17622,4.9236,43.0,5.220844,0.962779,1137.0,2.82134,37.26,-121.94,2.38,,Woodard Road,
6377,3.6,44.0,6.094086,1.145161,980.0,2.634409,34.15,-118.02,3.074,,South 5th Avenue,
5593,3.35,36.0,4.285354,0.994949,1274.0,3.217172,33.8,-118.25,1.631,,East Lomita Boulevard,
12036,3.6583,18.0,5.329201,1.064738,2500.0,3.443526,33.92,-117.47,1.261,,Branigan Way,


In [265]:
## Drop the Latitude, Longitude and Neighbourhood columns
df = df.drop(labels = ["Latitude" ,"Longitude" , "Neighbourhood"], axis = 1) # axis = 1 is needed because columns are being removed
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Target,County,road
14517,3.6111,15.0,5.0,0.917241,747.0,2.575862,1.963,,Caminito Baywood
13428,2.7827,9.0,4.764444,1.13,3156.0,3.506667,1.058,,Arrow Boulevard
3862,4.0391,33.0,5.196141,1.03537,580.0,1.864952,3.375,,Hazeltine Avenue
8200,3.1711,44.0,4.033784,1.045608,1461.0,2.467905,2.154,,East Atherton Street
15050,3.6445,17.0,5.84,1.062069,2191.0,3.022069,1.976,,Mountain Top Drive


In [266]:
df.info()
# Some values are missing here, and there are two ways to fill them:
# 1. replace each missing value with the most frequent value of the column, or
# 2. train a model such as logistic regression: rows where the value is present form the training data, and rows where it is missing form the data to predict.


<class 'pandas.core.frame.DataFrame'>
Index: 20640 entries, 14517 to 7880
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MedInc      20640 non-null  float64
 1   HouseAge    20640 non-null  float64
 2   AveRooms    20640 non-null  float64
 3   AveBedrms   20640 non-null  float64
 4   Population  20640 non-null  float64
 5   AveOccup    20640 non-null  float64
 6   Target      20640 non-null  float64
 7   County      0 non-null      object 
 8   road        19653 non-null  object 
dtypes: float64(7), object(2)
memory usage: 2.1+ MB
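
The first option, filling gaps with the most frequent value, takes two lines in pandas. This is a small sketch on made-up street names; the frame `demo` is an assumption and is not part of the housing data.

```python
import pandas as pd

# Made-up street names with two gaps
demo = pd.DataFrame({"road": ["Main Street", None, "Main Street", "Oak Avenue", None]})

# mode() returns the most frequent value(s) and ignores missing entries; take the first one
most_frequent = demo["road"].mode()[0]
demo["road"] = demo["road"].fillna(most_frequent)

print(demo["road"].tolist())
```

This is quick, but every gap receives the same value, which is why the notebook goes on to predict the missing entries from other columns instead.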


## Using a classification algorithm to fill the missing categorical values

#### 1. Create an array of missing indices: rows where the value is present go into the training set, and rows where it is missing go into the set to predict

In [267]:
## Train a classifier to fill in the missing road values
# County or road can be the column to fill; a few of the numeric columns are used as predictors

missing_idx = []    # index labels of the rows whose road is missing
for i in range(df.shape[0]):
  if df['road'][i] is None:
    missing_idx.append(i)
missing_set = set(missing_idx)  # a set makes the membership tests below fast
# Define the independent and dependent variables
# Independent variables (rows with a known road)
missing_Road_X_train = np.array([[df['MedInc'][i] , df['AveRooms'][i] , df['AveBedrms'][i]] for i in range(df.shape[0]) if i not in missing_set])
# Dependent variable
missing_Road_y_train = np.array([df['road'][i] for i in range(df.shape[0]) if i not in missing_set])
# Rows with a missing road: these are the ones to predict
missing_Road_X_test = np.array([[df['MedInc'][i] , df['AveRooms'][i] , df['AveBedrms'][i]] for i in range(df.shape[0]) if i in missing_set])


In [268]:
missing_Road_X_train
# These are the three independent features (MedInc, AveRooms, AveBedrms) for the rows with a known road; they are used to train the model

array([[8.3252    , 6.98412698, 1.02380952],
       [8.3014    , 6.23813708, 0.97188049],
       [7.2574    , 8.28813559, 1.07344633],
       ...,
       [1.7       , 5.20554273, 1.12009238],
       [1.8672    , 5.32951289, 1.17191977],
       [2.3886    , 5.25471698, 1.16226415]])

In [272]:
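# Fit a simple linear classifier. With default settings SGDClassifier uses the hinge loss (a linear SVM); in recent scikit-learn versions, loss='log_loss' makes it logistic regression.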
from sklearn.linear_model import SGDClassifier

#model initialization
model_1 = SGDClassifier()

#model Training
model_1.fit(missing_Road_X_train , missing_Road_y_train) #output variables

#model prediction
missing_Road_y_pred = model_1.predict(missing_Road_X_test)

In [273]:
# Check the predicted values for the missing road data
missing_Road_y_pred

array(['Firestone Boulevard', 'Bradford Street', 'Bradford Street',
       'Bradford Street', 'East Sierra Avenue', 'Bailey Road',
       'Bradford Street', 'Clement Avenue', 'Bradford Street',
       'Bradford Street', 'Clement Avenue', 'Bradford Street',
       'Clement Avenue', 'Clement Avenue', 'Bradford Street',
       'Clement Avenue', 'Bradford Street', 'Clement Avenue',
       'Clement Avenue', 'Bradford Street', 'Clement Avenue',
       'Clement Avenue', 'East Belmont Avenue', 'Bradford Street',
       'Clement Avenue', 'Clement Avenue', 'Clement Avenue',
       'Bradford Street', 'Clement Avenue', 'Clement Avenue',
       'Clement Avenue', 'Clement Avenue', 'Clement Avenue',
       'Clement Avenue', 'Clement Avenue', 'Clement Avenue',
       'Clement Avenue', 'Clement Avenue', 'Clement Avenue',
       'Clement Avenue', 'Harris Street', 'Clement Avenue',
       'Bradford Street', 'Bradford Street', 'Clement Avenue',
       'Bradford Street', 'Carlton Way', 'Bradford Street',
 

In [274]:
np.unique(missing_Road_y_pred)
#these are the unique values we have

array(['42nd Avenue', '5th Street', 'Bailey Road', 'Bradford Street',
       'Broadway', 'Bullet DH', 'Carlton Way', 'Clement Avenue',
       'East 40th Place', 'East Belmont Avenue', 'East Sierra Avenue',
       'El Patio Drive', 'Farquhar Avenue', 'Firestone Boulevard',
       'Georgia Drive', 'Harris Street', 'McAllister Street',
       'Toyon Trail', 'Trooper Gary Gifford Memorial Highway',
       'West 124th Street'], dtype='<U77')
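
The same fill-by-prediction idea can be written with boolean masks instead of index loops. The sketch below runs on synthetic data and is an illustration only: the frame `demo`, the two street labels and the rule that generates them are assumptions, not values from the housing data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "MedInc": rng.normal(4.0, 1.5, size=n),
    "AveRooms": rng.normal(5.0, 1.0, size=n),
})
# Made-up label that depends on MedInc, with every tenth value missing
demo["road"] = np.where(demo["MedInc"] > 4.0, "High Street", "Low Road")
demo.loc[demo.index[::10], "road"] = None
demo = demo.sample(frac=1, random_state=0)  # shuffle the rows, as the notebook does

features = ["MedInc", "AveRooms"]
missing = demo["road"].isna()  # boolean mask aligned with the shuffled index

clf = SGDClassifier(random_state=0)  # default loss is hinge, i.e. a linear SVM
clf.fit(demo.loc[~missing, features], demo.loc[~missing, "road"])

# Write the predictions back with .loc so that the frame itself is updated
demo.loc[missing, "road"] = clf.predict(demo.loc[missing, features])
print(int(demo["road"].isna().sum()))
```

Because the mask carries the index with it, the training rows, the rows to predict and the write-back stay aligned even after shuffling.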

In [275]:
# Write the predictions back into the DataFrame
# missing_idx holds the index labels in the same order as the predictions, so .loc can assign them in one step.
# (Chained assignment such as df['road'][i] = ... raises SettingWithCopyWarning and will stop working under Copy-on-Write in pandas 3.0.)
df.loc[missing_idx, 'road'] = missing_Road_y_pred

In [276]:
df.info()
# road now has no missing values; County is still entirely empty

<class 'pandas.core.frame.DataFrame'>
Index: 20640 entries, 14517 to 7880
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MedInc      20640 non-null  float64
 1   HouseAge    20640 non-null  float64
 2   AveRooms    20640 non-null  float64
 3   AveBedrms   20640 non-null  float64
 4   Population  20640 non-null  float64
 5   AveOccup    20640 non-null  float64
 6   Target      20640 non-null  float64
 7   County      0 non-null      object 
 8   road        20640 non-null  object 
dtypes: float64(7), object(2)
memory usage: 2.1+ MB


In [277]:
# Label encoding, because the model cannot work with categorical (string) features
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['road'] = le.fit_transform(df['road'])

In [278]:
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Target,County,road
14517,3.6111,15.0,5.0,0.917241,747.0,2.575862,1.963,,1323
13428,2.7827,9.0,4.764444,1.13,3156.0,3.506667,1.058,,472
3862,4.0391,33.0,5.196141,1.03537,580.0,1.864952,3.375,,3666
8200,3.1711,44.0,4.033784,1.045608,1461.0,2.467905,2.154,,2388
15050,3.6445,17.0,5.84,1.062069,2191.0,3.022069,1.976,,5263
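
To see what the integer codes mean, it helps to run the encoder on a handful of values. This is a small sketch with made-up street names; `roads` and `codes` are illustrative names.

```python
from sklearn.preprocessing import LabelEncoder

roads = ["Oak Avenue", "Main Street", "Oak Avenue", "Elm Road"]

le = LabelEncoder()
codes = le.fit_transform(roads)

# classes_ holds the sorted unique labels; a label's code is its position in that list
print(list(le.classes_))                   # ['Elm Road', 'Main Street', 'Oak Avenue']
print(list(codes))                         # [2, 1, 2, 0]
print(list(le.inverse_transform([2, 0])))  # ['Oak Avenue', 'Elm Road']
```

The codes follow alphabetical order, so their numeric size carries no meaning of its own; `inverse_transform` recovers the original names when they are needed again.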


## If other attributes have missing values, predict them in the same way as the road feature. Here there are no others.


In [304]:
## The same procedure applied to the County column
# (the missing_Road_* variable names are reused from the road step)

missing_idx = []    # index labels of the rows whose County is missing
for i in range(df.shape[0]):
  if df['County'][i] is None:
    missing_idx.append(i)
missing_set = set(missing_idx)  # a set makes the membership tests below fast
# Define the independent and dependent variables
# Independent variables (rows with a known County)
missing_Road_X_train = np.array([[df['MedInc'][i] , df['AveRooms'][i] , df['AveBedrms'][i]] for i in range(df.shape[0]) if i not in missing_set])
# Dependent variable
missing_Road_y_train = np.array([df['County'][i] for i in range(df.shape[0]) if i not in missing_set])
# Rows with a missing County: these are the ones to predict
missing_Road_X_test = np.array([[df['MedInc'][i] , df['AveRooms'][i] , df['AveBedrms'][i]] for i in range(df.shape[0]) if i in missing_set])




In [305]:
missing_Road_X_train
# The three independent features for the rows with a known County value

array([[8.3252    , 6.98412698, 1.02380952],
       [8.3014    , 6.23813708, 0.97188049],
       [7.2574    , 8.28813559, 1.07344633],
       ...,
       [1.7       , 5.20554273, 1.12009238],
       [1.8672    , 5.32951289, 1.17191977],
       [2.3886    , 5.25471698, 1.16226415]])

In [307]:
missing_Road_y_train

array(['Unknown', 'Unknown', 'Unknown', ..., 'Unknown', 'Unknown',
       'Unknown'], dtype='<U7')

In [310]:
# Every County label is the placeholder 'Unknown' (see missing_Road_y_train above), so there is nothing to train a classifier on.
# The road write-back must not be reused here: it would overwrite the encoded road column with predicted street names.
# Any row still missing a County gets the same placeholder instead.
df.loc[missing_idx, 'County'] = 'Unknown'

In [320]:
# Convert 'County' column to numerical using Label Encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Fit label encoder and transform the 'County' column
df['County'] = le.fit_transform(df['County'].astype(str))

In [321]:
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,County,road
14517,3.6111,15.0,5.0,0.917241,747.0,2.575862,0,1323
13428,2.7827,9.0,4.764444,1.13,3156.0,3.506667,0,472
3862,4.0391,33.0,5.196141,1.03537,580.0,1.864952,0,3666
8200,3.1711,44.0,4.033784,1.045608,1461.0,2.467905,0,2388
15050,3.6445,17.0,5.84,1.062069,2191.0,3.022069,0,5263


## Understanding  which model to use

In [329]:
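# A regression model (random forest) is used here
# Dependent variable: select Target by name. A positional index such as df.iloc[:,-3] points at Target only while
# Target is still in the frame; once it has been dropped, position -3 is AveOccup.
if 'Target' in df.columns:
  y = df['Target'].values  # y is the target variable
  df = df.drop(labels=['Target'] ,axis= 1)

# Independent features
X = df.iloc[:,:].values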
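
Positional selection such as `df.iloc[:,-3]` depends on the column layout at the moment the cell runs. The short sketch below, on a made-up two-row frame, shows how the same position names a different column once `Target` has been removed. `demo`, `demo_X`, `X_demo` and `y_demo` are illustrative names.

```python
import pandas as pd

demo = pd.DataFrame({
    "MedInc": [8.3, 7.2], "AveOccup": [2.5, 2.8],
    "Target": [4.5, 3.5], "County": [0, 0], "road": [10, 20],
})

first = demo.columns[-3]             # 'Target' while the column is still present
demo_X = demo.drop(columns=["Target"])
second = demo_X.columns[-3]          # now 'AveOccup': same position, different column

y_demo = demo["Target"].to_numpy()   # selecting by name does not depend on column order
X_demo = demo_X.to_numpy()
print(first, second, X_demo.shape)
```

Selecting the target by name, and building `X` only after the target has been removed, keeps the target out of the feature matrix however often the cell is re-run.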

In [331]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 20640 entries, 14517 to 7880
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MedInc      20640 non-null  float64
 1   HouseAge    20640 non-null  float64
 2   AveRooms    20640 non-null  float64
 3   AveBedrms   20640 non-null  float64
 4   Population  20640 non-null  float64
 5   AveOccup    20640 non-null  float64
 6   County      20640 non-null  int64  
 7   road        20640 non-null  int64  
dtypes: float64(6), int64(2)
memory usage: 1.9 MB


In [332]:
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,County,road
14517,3.6111,15.0,5.0,0.917241,747.0,2.575862,0,1323
13428,2.7827,9.0,4.764444,1.13,3156.0,3.506667,0,472
3862,4.0391,33.0,5.196141,1.03537,580.0,1.864952,0,3666
8200,3.1711,44.0,4.033784,1.045608,1461.0,2.467905,0,2388
15050,3.6445,17.0,5.84,1.062069,2191.0,3.022069,0,5263


In [333]:
from sklearn.model_selection import train_test_split

In [334]:
#train_test_split?
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [335]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X_train, y_train)

In [336]:
# model prediction


y_pred = model.predict(X_test)

In [337]:
# Model quality: R² (coefficient of determination), shown as a percentage

from sklearn.metrics import r2_score
r2_score(y_test, y_pred)*100 # y_test holds the actual values

87.30168987495486
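
The score reported here is R², which compares the model's squared error with the squared error of always predicting the mean. The following is a minimal hand-written version for illustration; the function `r2` and the toy values are assumptions, and scikit-learn's `r2_score` remains the function the notebook uses.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # error of the model
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # error of predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])      # every prediction exact
mean_only = r2(y_true, [2.5, 2.5, 2.5, 2.5])    # always predicting the mean
off_by_half = r2(y_true, [1.5, 2.5, 3.5, 4.5])  # every prediction 0.5 too high
print(perfect, mean_only, off_by_half)
```

A score of 1 means a perfect fit and 0 means no better than the mean, so the value is a measure of explained variance, not a share of correct predictions.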

## Add our own data

In [343]:
inp = np.array([3.6111, 15.0, 5.000000, 0.917241, 747.0, 2.575862, 0, 1323]) # features of row 14517 without the target value (its actual Target is 1.963)

In [344]:
inp.shape # 8 features

(8,)

In [345]:
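# predict expects a 2-D array (one row per sample), hence the extra brackets.
# The value returned is close to this row's AveOccup (2.575862) rather than its Target (1.963),
# which is what would happen if y had been taken from the AveOccup column, so the choice of y is worth re-checking.
model.predict([inp])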

array([2.57588301])
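
scikit-learn estimators expect a 2-D array with one row per sample, so a single sample needs an explicit row dimension. The following is a small sketch on synthetic data: `demo_model`, `X_demo`, `y_demo` and the three made-up features are assumptions for illustration, not the notebook's trained model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(100, 3))
y_demo = X_demo[:, 0] * 2.0 + X_demo[:, 1]  # made-up target for the demo

demo_model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

row = np.array([0.5, 0.2, 0.9])   # shape (3,): one sample with three features
batch = row.reshape(1, -1)        # shape (1, 3): the layout predict expects
pred = demo_model.predict(batch)  # one value per input row
print(batch.shape, pred.shape)
```

`row.reshape(1, -1)` and `[row]` give the same 2-D layout; the reshape form makes the intended shape explicit.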