![Immoscout](https://raw.githubusercontent.com/juliandnl/redi_ss20/master/image.png)

# **Immobilien Scout 24 Dataset**
It contains entries for rental flats in Berlin. Let's explore the dataset and find a suitable flat for my friend Josy, who is desperately searching for a new flat in Berlin.

# **Import Packages**

In [292]:
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(style="darkgrid")
import matplotlib.pyplot as plt
import datetime
from plotly.offline import init_notebook_mode, iplot, plot
import plotly as py
init_notebook_mode(connected=True)
import plotly.graph_objs as go
from dateutil.relativedelta import relativedelta
from wordcloud import WordCloud
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# **Load Dataset**

In [293]:
rentals = pd.read_csv("https://raw.githubusercontent.com/juliandnl/redi_ss20/master/berlin_rental.csv")
rentals.head(10)

Unnamed: 0.1,Unnamed: 0,URL,Region,Condition,Rooms,Rent,Year_Construction,Space
0,0,https://www.immobilienscout24.de/expose/116051687,Mitte,first_time_use,4.0,2659.0,2019,117.2
1,1,https://www.immobilienscout24.de/expose/115338103,Kreuzberg,first_time_use,1.0,1200.0,2020,29.33
2,2,https://www.immobilienscout24.de/expose/116458710,Köpenick,well_kept,2.0,979.0,1997,83.61
3,3,https://www.immobilienscout24.de/expose/116573177,Wilmersdorf,well_kept,4.0,1830.22,1900,171.18
4,4,https://www.immobilienscout24.de/expose/115925878,Kreuzberg,first_time_use,2.0,2272.0,2020,88.27
5,6,https://www.immobilienscout24.de/expose/115611847,Köpenick,well_kept,2.0,840.0,1997,73.51
6,7,https://www.immobilienscout24.de/expose/108376992,Mitte,mint_condition,2.0,1509.45,2015,61.61
7,10,https://www.immobilienscout24.de/expose/116573270,Charlottenburg,well_kept,2.0,730.73,1900,72.61
8,13,https://www.immobilienscout24.de/expose/116456427,Friedrichsfelde,well_kept,2.0,561.93,1971,62.4
9,16,https://www.immobilienscout24.de/expose/113934099,Tiergarten,first_time_use,3.0,1789.0,2020,77.66


### Berlin Rental Flat Dataset
We have 7 relevant columns:
- URL: the link to the rental exposé. You can have a look at the flat!
- Region: the Berlin district where the flat is located
- Condition: the condition of the flat
- Rooms: the number of rooms in the flat
- Rent: the monthly rent for the flat
- Year_Construction: the year in which the building was built
- Space: the size of the flat in square meters

---

Exercise:
1. How many rows does the dataset have?
2. How many different Regions are there?
3. What is the maximum rent? What is the minimum rent?
4. What is the smallest flat?

# **How many rows does the dataset have?**

In [294]:
rentals.shape

(764, 8)

In [295]:
rentals.count()

Unnamed: 0           764
URL                  764
Region               764
Condition            764
Rooms                764
Rent                 764
Year_Construction    764
Space                764
dtype: int64

In [296]:
rentals.describe(include='all')

Unnamed: 0.1,Unnamed: 0,URL,Region,Condition,Rooms,Rent,Year_Construction,Space
count,764.0,764,764,764,764.0,764.0,764.0,764.0
unique,,764,10,9,,,,
top,,https://www.immobilienscout24.de/expose/100800164,Tiergarten,first_time_use,,,,
freq,,1,171,270,,,,
mean,777.913613,,,,2.549738,1768.560942,1983.695026,84.65627
std,481.576864,,,,1.010826,1118.263961,48.00606,44.195338
min,0.0,,,,1.0,271.25,1864.0,14.0
25%,321.5,,,,2.0,1039.5,1959.0,55.0975
50%,746.5,,,,2.5,1565.48,2015.0,77.6
75%,1247.5,,,,3.0,2170.0,2019.0,104.45


In [297]:
rentals.isnull().any()

Unnamed: 0           False
URL                  False
Region               False
Condition            False
Rooms                False
Rent                 False
Year_Construction    False
Space                False
dtype: bool

In [298]:
rentals = rentals.drop('Unnamed: 0', axis=1)

# **How many different Regions are there?**

In [299]:
rentals.Region.unique()

array(['Mitte', 'Kreuzberg', 'Köpenick', 'Wilmersdorf', 'Charlottenburg',
       'Friedrichsfelde', 'Tiergarten', 'Prenzlauer', 'Wedding',
       'Neukölln'], dtype=object)

In [300]:
rentals.Region.nunique()

10

# **What is the maximum rent? What is the minimum rent?**


In [301]:
rentals.Rent.max()

14207.0

In [302]:
rentals.Rent.min()

271.25

# **What is the smallest flat?**

In [303]:
rentals.Space.min()

14.0

In [304]:
rentals.Rooms.min()

1.0

In [305]:
min_space = rentals.sort_values("Space", ascending= True)
min_space.iloc[:1]

Unnamed: 0,URL,Region,Condition,Rooms,Rent,Year_Construction,Space
84,https://www.immobilienscout24.de/expose/116238503,Neukölln,well_kept,1.0,450.0,1910,14.0
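
Sorting the whole frame works, but pandas can also pick the smallest row directly. The following is a minimal sketch, assuming a tiny made-up frame named `sample` that only reuses the notebook's column names; it is not the real dataset.

```python
import pandas as pd

# Made-up sample rows that reuse the notebook's column names (not the real data).
sample = pd.DataFrame({
    "Region": ["Mitte", "Neukölln", "Wedding"],
    "Rooms": [2.0, 1.0, 1.0],
    "Rent": [1500.0, 450.0, 600.0],
    "Space": [61.0, 14.0, 30.0],
})

# nsmallest returns the n rows with the lowest values in the given column.
smallest = sample.nsmallest(1, "Space")
print(smallest)

# idxmin gives the index label of the minimum; loc turns that label into a row.
smallest_row = sample.loc[sample["Space"].idxmin()]
print(smallest_row["Region"])
```

Both variants avoid sorting all rows when only the minimum is needed.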


### Groupby

Let's practice groupby!
If you need some help: 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

---
Exercise
1. Groupby Region - What is on average the most expensive region?
2. What is the mean rent for the different amount of rooms?
3. What is the mean rent per condition?

# **What is on average the most expensive region?**

In [306]:
rentals.groupby('Region')[['Rent']].mean()

Unnamed: 0_level_0,Rent
Region,Unnamed: 1_level_1
Charlottenburg,1753.452532
Friedrichsfelde,1043.871731
Kreuzberg,2049.138085
Köpenick,1125.475818
Mitte,2408.725033
Neukölln,1188.114915
Prenzlauer,1996.962264
Tiergarten,1914.824795
Wedding,847.138571
Wilmersdorf,1748.646226
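
To read off the most expensive region without scanning the table, the group means can be sorted. This is a small sketch on made-up rows (the frame `sample` and its values are assumptions for illustration only).

```python
import pandas as pd

# Made-up sample rows; only the column names match the notebook.
sample = pd.DataFrame({
    "Region": ["Mitte", "Mitte", "Wedding", "Wedding", "Kreuzberg"],
    "Rent": [2400.0, 2600.0, 800.0, 900.0, 2000.0],
})

# Mean rent per region, highest first.
mean_rent = sample.groupby("Region")["Rent"].mean().sort_values(ascending=False)
print(mean_rent)

# idxmax returns the region label with the highest mean rent.
most_expensive = mean_rent.idxmax()
print(most_expensive)
```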


# **What is the mean rent for the different amount of rooms?**

In [307]:
rentals.groupby('Rooms')[['Rent']].mean()

Unnamed: 0_level_0,Rent
Rooms,Unnamed: 1_level_1
1.0,833.258318
1.5,882.735556
2.0,1358.017786
2.5,1691.04375
3.0,2048.237042
3.5,1901.92875
4.0,2690.214242
4.5,1370.0
5.0,4359.084211
6.0,3739.46


# **What is the mean rent per condition?**

In [308]:
rentals.groupby('Condition')[['Rent']].mean()

Unnamed: 0_level_0,Rent
Condition,Unnamed: 1_level_1
first_time_use,1866.564222
first_time_use_after_refurbishment,1925.351944
fully_renovated,1778.308571
mint_condition,2122.196154
modernized,1185.705172
need_of_renovation,612.29
no_information,1544.274955
refurbished,2177.2565
well_kept,1217.096979


--- 

Exercise 

1. Which region has the best price per square meter? The cheapest square meter price?
2. Which region has on average the oldest buildings?
3. Which region has the best "in shape" flats to offer?
4. Does the shape have an influence on the price per square meter?


# **Which region has the best price per square meter? The cheapest square meter price?**

In [309]:
rentals["Per_sqr_meter"] = rentals["Rent"]/rentals["Space"]

In [310]:
rentals.groupby('Region').min()[['Per_sqr_meter',"Rent"]].sort_values(by="Per_sqr_meter")

Unnamed: 0_level_0,Per_sqr_meter,Rent
Region,Unnamed: 1_level_1,Unnamed: 2_level_1
Friedrichsfelde,7.829252,280.74
Wedding,7.950653,352.95
Tiergarten,8.696825,271.25
Neukölln,8.837969,379.95
Prenzlauer,10.002556,475.0
Charlottenburg,10.019263,478.52
Köpenick,10.066225,558.06
Wilmersdorf,10.691786,404.39
Kreuzberg,11.346,435.74
Mitte,15.120052,398.0
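
Note that `groupby('Region').min()` takes the minimum of every column separately, so the `Per_sqr_meter` and `Rent` values in one row can come from different flats. If the question is read as "which region is cheapest per square meter on average", a possible sketch looks like this (made-up rows; `sample` is a hypothetical stand-in for `rentals`).

```python
import pandas as pd

# Made-up sample rows; only the column names match the notebook.
sample = pd.DataFrame({
    "Region": ["Mitte", "Mitte", "Wedding", "Wedding"],
    "Rent": [2000.0, 3000.0, 500.0, 900.0],
    "Space": [100.0, 100.0, 50.0, 60.0],
})
sample["Per_sqr_meter"] = sample["Rent"] / sample["Space"]

# Average price per square meter per region, cheapest first.
mean_price = sample.groupby("Region")["Per_sqr_meter"].mean().sort_values()
cheapest_region = mean_price.index[0]
print(mean_price)
```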


# **Which region has on average the oldest buildings?**

In [311]:
rentals.groupby('Region')[['Year_Construction']].min().iloc[:1]

Unnamed: 0_level_0,Year_Construction
Region,Unnamed: 1_level_1
Charlottenburg,1889
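
The cell above takes the minimum construction year per region and then keeps the first row in alphabetical order, so it does not yet answer the "on average" question. A minimal sketch of the average-based reading, on made-up rows (`sample` and its values are assumptions):

```python
import pandas as pd

# Made-up sample rows; only the column names match the notebook.
sample = pd.DataFrame({
    "Region": ["Charlottenburg", "Charlottenburg", "Mitte", "Mitte"],
    "Year_Construction": [1889, 2019, 1900, 1910],
})

# Mean construction year per region, oldest first.
mean_year = sample.groupby("Region")["Year_Construction"].mean().sort_values()
oldest_on_average = mean_year.index[0]
print(mean_year)
```

In this toy data Charlottenburg has the single oldest building, but Mitte is older on average, which shows why `min()` and `mean()` can give different answers.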


# **Which region has the most "in shape" flats to offer?**

In [312]:
rentals.Condition.unique()

array(['first_time_use', 'well_kept', 'mint_condition', 'no_information',
       'fully_renovated', 'first_time_use_after_refurbishment',
       'refurbished', 'modernized', 'need_of_renovation'], dtype=object)

In [313]:
rentals.groupby(["Region","Condition"]).size().unstack(fill_value=0).sort_values(by= "first_time_use",ascending=False)

Condition,first_time_use,first_time_use_after_refurbishment,fully_renovated,mint_condition,modernized,need_of_renovation,no_information,refurbished,well_kept
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Tiergarten,101,1,10,13,2,1,28,2,13
Mitte,43,7,3,49,1,0,29,15,6
Charlottenburg,27,7,5,9,4,0,8,7,12
Friedrichsfelde,25,0,3,5,8,0,4,0,7
Neukölln,20,0,4,3,3,1,10,5,13
Wilmersdorf,19,6,3,6,1,0,4,3,11
Köpenick,15,8,1,6,4,0,0,1,20
Kreuzberg,11,2,2,17,2,0,7,2,4
Wedding,5,2,1,16,1,2,10,0,5
Prenzlauer,4,3,3,19,3,0,11,5,5
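
Raw counts favour regions with many listings. One alternative is to compare the share of flats in good condition per region. The sketch below uses made-up rows, and the choice of which labels count as "in shape" (`good_conditions`) is purely an assumption.

```python
import pandas as pd

# Made-up sample rows; only the column names match the notebook.
sample = pd.DataFrame({
    "Region": ["Mitte", "Mitte", "Mitte", "Mitte", "Wedding", "Wedding"],
    "Condition": ["first_time_use", "mint_condition", "well_kept",
                  "first_time_use", "need_of_renovation", "mint_condition"],
})

# Assumption: these labels are treated as "in shape" for this illustration.
good_conditions = ["first_time_use", "mint_condition"]

# Boolean flag per flat; the mean of booleans per group is the share of True.
is_good = sample["Condition"].isin(good_conditions)
share_good = is_good.groupby(sample["Region"]).mean().sort_values(ascending=False)
print(share_good)
```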


# **Does the shape have an influence on the price per square meter?**

In [314]:
rentals.groupby("Condition")[["Per_sqr_meter"]].mean()

Unnamed: 0_level_0,Per_sqr_meter
Condition,Unnamed: 1_level_1
first_time_use,22.966016
first_time_use_after_refurbishment,19.334931
fully_renovated,19.207692
mint_condition,25.134925
modernized,15.952482
need_of_renovation,11.073803
no_information,19.742599
refurbished,26.445485
well_kept,15.521736


# **Plot**
Plotting with pandas. Need some help?
- https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html

---  
Exercise
1. Explore the distribution of the rent
2. What is the relationship between the construction year and the rent?


## **Plot the distribution of the rent**

In [315]:
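fig = go.Figure()
# A histogram shows how often each rent level occurs
fig.add_trace(go.Histogram(x=rentals['Rent'],
                    name='Rent'))
fig.update_layout(xaxis_title='Rent', yaxis_title='Number of flats')
fig.show()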

## **Plot the relationship between the construction year and the rent?**

In [316]:

Yc_rnt=go.Scatter(
                    x = rentals['Year_Construction'],
                    y = rentals['Rent'],
                    mode = "markers",
                    name = "Year of Construction",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    )

layout = dict(title = 'Relationship between the Year of Construction and the rent',
              xaxis= dict(title= 'Year of Construction',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Rent',ticklen= 5,zeroline= False)
             )
fig = dict(data = [Yc_rnt], layout = layout)
iplot(fig)

## **Plot the relationship between the Space and the rent?**

In [317]:

Sp_rnt=go.Scatter(
                    x = rentals['Space'],
                    y = rentals['Rent'],
                    mode = "markers",
                    name = "Space",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    )

layout = dict(title = 'Relationship between the Space and the rent',
              xaxis= dict(title= 'Space',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Rent',ticklen= 5,zeroline= False)
             )
fig = dict(data = [Sp_rnt], layout = layout)
iplot(fig)

## **Plot the relationship between the Rooms and the rent?**

In [318]:

Rm_rnt=go.Scatter(
                    x = rentals['Rooms'],
                    y = rentals['Rent'],
                    mode = "markers",
                    name = "Rooms",
                    marker = dict(color = 'rgba(0, 0, 0, 0.8)'),
                    )

layout = dict(title = 'Relationship between the number of rooms and the rent',
              xaxis= dict(title= 'Number of Rooms',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Rent',ticklen= 5,zeroline= False)
             )
fig = dict(data = [Rm_rnt], layout = layout)
iplot(fig)

## **Plot the total count per condition**

In [319]:
# A histogram trace counts how often each condition occurs
data = {
  'x': rentals['Condition'],
  'name': 'Condition',
  'type': 'histogram'
}

layout = {
  'xaxis': {'title': 'Condition'},
  'yaxis': {'title': 'Count of Condition'},
  'barmode': 'relative',
  'title': 'Total count of Condition'
}
fig = go.Figure(data = [data], layout = layout)
iplot(fig)

## **Plot the Total count of regions?**

In [320]:
# A histogram trace counts how often each region occurs
data = {
  'x': rentals['Region'],
  'name': 'Region',
  'type': 'histogram'
}

layout = {
  'xaxis': {'title': 'Regions'},
  'yaxis': {'title': 'Count of regions'},
  'barmode': 'relative',
  'title': 'Total count of regions'
}
fig = go.Figure(data = [data], layout = layout)
iplot(fig)

# **Recommend a good flat!**
My friend Josy is looking for a flat. She is still a student and cannot afford an expensive one. Are there flats with a rent below 500€ and more than 25 sqm (Space)? She would like to move to Kreuzberg, Wedding, Prenzlauer Berg or Mitte. Are any flats available for her? Is there a flat with two rooms that meets these constraints?

In [321]:
# Note: the dataset labels this district "Prenzlauer", so the "Prenzlauer Berg" comparison below matches no rows
rentals[(rentals['Rent']< 500) & (rentals['Space'] > 25) & 
        ((rentals['Region'] == "Kreuzberg") | (rentals['Region'] == "Wedding")| 
         (rentals['Region'] == "Prenzlauer Berg")| (rentals['Region'] == "Mitte")) &  
        (rentals['Rooms'] == 2)]

Unnamed: 0,URL,Region,Condition,Rooms,Rent,Year_Construction,Space,Per_sqr_meter
97,https://www.immobilienscout24.de/expose/116697084,Wedding,no_information,2.0,383.46,1890,48.23,7.950653
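
The chain of `==` comparisons can be written more compactly with `isin`. The `Region.unique()` output earlier in the notebook lists the district as "Prenzlauer", so that spelling is used here. This is only a sketch on made-up rows (`sample` and `wanted_regions` are illustrative names), so its result says nothing about the real listings.

```python
import pandas as pd

# Made-up sample rows; only the column names match the notebook.
sample = pd.DataFrame({
    "Region": ["Wedding", "Prenzlauer", "Tiergarten", "Mitte"],
    "Rooms": [2.0, 2.0, 2.0, 1.0],
    "Rent": [390.0, 480.0, 450.0, 400.0],
    "Space": [48.0, 40.0, 45.0, 30.0],
})

# District labels as they appear in the dataset's Region column.
wanted_regions = ["Kreuzberg", "Wedding", "Prenzlauer", "Mitte"]

# Combine all constraints into one boolean mask.
mask = (
    (sample["Rent"] < 500)
    & (sample["Space"] > 25)
    & sample["Region"].isin(wanted_regions)
    & (sample["Rooms"] == 2)
)
matches = sample[mask]
print(matches)
```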


Did you find any suitable flats? Send me the url via Slack!

# **Answer:**
Hi Josy,
I filtered the listings by your most important criteria (rent, space, region, number of rooms), and I can recommend a 48.23 m², two-room apartment in Wedding for 383.46 euros per month. Here is the URL for the ad: https://www.immobilienscout24.de/expose/116697084

# **ML Predictions**

## **Predicting rental price from space and number of rooms**

In [322]:
rentals.head()

Unnamed: 0,URL,Region,Condition,Rooms,Rent,Year_Construction,Space,Per_sqr_meter
0,https://www.immobilienscout24.de/expose/116051687,Mitte,first_time_use,4.0,2659.0,2019,117.2,22.687713
1,https://www.immobilienscout24.de/expose/115338103,Kreuzberg,first_time_use,1.0,1200.0,2020,29.33,40.91374
2,https://www.immobilienscout24.de/expose/116458710,Köpenick,well_kept,2.0,979.0,1997,83.61,11.709126
3,https://www.immobilienscout24.de/expose/116573177,Wilmersdorf,well_kept,4.0,1830.22,1900,171.18,10.691786
4,https://www.immobilienscout24.de/expose/115925878,Kreuzberg,first_time_use,2.0,2272.0,2020,88.27,25.739209


In [323]:
predictors = ['Rooms','Space']

X= rentals[predictors]

Y= rentals['Rent']

In [324]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,random_state=1) 

In [325]:
X_train

Unnamed: 0,Rooms,Space
567,2.0,52.48
576,2.0,66.00
346,1.0,26.90
257,4.0,118.37
13,2.0,186.00
...,...,...
645,2.0,50.18
715,1.0,30.98
72,2.5,125.00
235,3.0,105.00


In [326]:
len(X_train)

611

In [327]:
clf= LinearRegression()

In [328]:
clf.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [329]:
clf.predict(X_test)

array([3274.71840856, 4829.34215318, 2814.14539572, 1782.08490655,
       1727.84168621,  539.89083875, 2662.70269386,  336.1881737 ,
       1533.46091143, 3957.80568507, 1724.48775563, 1655.77146359,
       1463.97537365, 1635.08986114,  580.80927802, 1295.08471439,
       1875.68988641, 1488.3353868 , 1559.7202727 , 2284.67008453,
       1799.67133776,  656.41943754, 2060.93477723, 1105.41035538,
       1499.23214508, 1844.57447957, 2082.40381369, 1483.44296471,
       1736.49642802, 8856.71324986, 4393.06343271, 2628.9005049 ,
       2330.13827752,  976.18774748, 1377.14397574, 1654.65954948,
       1408.27757084, 1078.92861129, 1989.56807958, 2225.29387102,
       1477.21624569, 3552.25217056,  979.52348981,  846.33436762,
       1792.29632812,  899.02090821, 1319.324442  , 2102.53855322,
       1575.16678471, 2573.32298763, 1377.36635857, 2144.68919213,
       4071.92444932,  480.73700807, 1261.70910281, 2350.903789  ,
       1564.16792914, 1995.57241578, 1461.85364271, 1995.57241

In [330]:
Y_test

457    3090.40
148    3739.46
741    3925.00
526    2105.40
400    1700.00
        ...   
491    1600.00
643    2175.00
506    2295.00
331    1570.00
439    2484.62
Name: Rent, Length: 153, dtype: float64

In [331]:
clf.score(X_test,Y_test)

0.7685562731895246
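
For a scikit-learn regressor such as `LinearRegression`, `score` returns the coefficient of determination R². A value of 1 means perfect predictions, 0 means the model is no better than always predicting the mean of the target, and negative values are possible on test data. The sketch below recomputes R² by hand on made-up numbers to show what the score measures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up numbers for illustration: flat size in sqm and monthly rent.
X = np.array([[30.0], [50.0], [70.0], [90.0], [110.0]])
y = np.array([600.0, 1100.0, 1300.0, 1900.0, 2100.0])

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# score() computes the same quantity.
r2_sklearn = model.score(X, y)
print(r2_manual, r2_sklearn)
```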

## **Predicting rental price from year of construction**

In [332]:
predictors = ['Year_Construction']

X= rentals[predictors]

Y= rentals['Rent']

In [333]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,random_state=1) 

In [334]:
clf= LinearRegression()

In [335]:
clf.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [336]:
clf.predict(X_test)

array([1618.36604488, 1643.27492716, 1789.96056725, 1794.11204763,
       1755.36489742, 1791.34439405, 1742.91045628, 1643.27492716,
       1792.72822084, 1629.43665923, 1794.11204763, 1794.11204763,
       1792.72822084, 1759.5163778 , 1785.80908687, 1795.49587443,
       1794.11204763, 1795.49587443, 1643.27492716, 1629.43665923,
       1788.57674046, 1789.96056725, 1794.11204763, 1794.11204763,
       1795.49587443, 1795.49587443, 1794.11204763, 1794.11204763,
       1794.11204763, 1788.57674046, 1776.12229932, 1629.43665923,
       1646.04258075, 1794.11204763, 1720.76922759, 1794.11204763,
       1792.72822084, 1788.57674046, 1794.11204763, 1789.96056725,
       1737.37514911, 1766.43551177, 1654.34554151, 1745.67810987,
       1794.11204763, 1792.72822084, 1727.68836155, 1618.36604488,
       1730.45601514, 1788.57674046, 1795.49587443, 1785.80908687,
       1788.57674046, 1791.34439405, 1629.43665923, 1794.11204763,
       1630.82048602, 1791.34439405, 1792.72822084, 1587.92185

In [337]:
clf.score(X_test,Y_test)

0.002194409435695266

## **Predicting rental price from price per square meter**

In [338]:
predictors = ['Per_sqr_meter']

X= rentals[predictors]

Y= rentals['Rent']

In [339]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,random_state=1) 

In [340]:
clf= LinearRegression()

In [341]:
clf.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [342]:
clf.predict(X_test)

array([1683.70172917, 1508.45941157, 2034.13046941, 1853.44633964,
       1727.18008961, 2027.46773346, 1665.72956066, 2187.75189753,
       1950.45612291, 1734.66451847, 1718.98597513, 1791.62674931,
       1747.42837018, 1493.69797713, 3659.57838913, 1639.25331276,
       1797.65928289, 1729.95574462, 1426.3464472 , 1622.94616342,
       1831.95177869, 1905.22488767, 1756.34525003, 1610.44625091,
       1913.48194495, 1640.04742238, 1659.30754335, 1739.85743724,
       1667.62784565, 2278.2864703 , 1742.2924186 , 1808.23177077,
       1457.20412417, 1891.25179849, 1324.47406919, 1762.56923072,
       1749.09250402, 2301.80587049, 1729.36145702, 1654.81140926,
       1488.39562077, 1569.11420587, 1276.2393626 , 1510.8744381 ,
       1814.45940022, 2284.88124031, 1227.31110077, 1465.8974918 ,
       1338.25648616, 1675.39837626, 1656.75629639, 1520.43145911,
       1756.94523489, 2167.64774447, 1220.36565723, 1828.9867777 ,
       1695.46318249, 1837.75951259, 2149.97859724, 1475.80075

In [343]:
clf.score(X_test,Y_test)

0.05847284076423664

## **Predicting rental price from region**

In [344]:
Region = pd.get_dummies(rentals['Region'])

In [345]:
Region.head()

Unnamed: 0,Charlottenburg,Friedrichsfelde,Kreuzberg,Köpenick,Mitte,Neukölln,Prenzlauer,Tiergarten,Wedding,Wilmersdorf
0,0,0,0,0,1,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0
2,0,0,0,1,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,1
4,0,0,1,0,0,0,0,0,0,0


In [346]:
X= Region
Y= rentals['Rent']

In [347]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,random_state=1)

In [348]:
clf= LinearRegression()

In [349]:
clf.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [350]:
clf.predict(X_test)

array([2336., 2112., 2336., 2112., 1728., 2336., 2016., 1216., 2336.,
       1664., 2016., 2016., 1664., 1216., 2336., 1216., 2336., 2336.,
       1216., 2048., 1664., 1088., 2016., 1088., 2048., 1216., 2016.,
       2336., 2048., 2336., 2336., 2336., 2112., 1216., 2048., 2016.,
       2016., 2016., 2016., 1728., 1728., 2336., 1216., 1216., 2016.,
       2016., 1216., 2016., 1216., 2112., 1664., 2336., 2336.,  736.,
       1216., 2016., 2112., 1728., 2016., 1088., 2112., 2112., 2112.,
       1216., 2336., 2336., 2016., 2016., 2016., 1088., 1088.,  736.,
       1728., 2016., 1216., 2336., 1088., 1664., 2112., 2016.,  736.,
       2016., 2016., 1728., 1216., 2016., 2336., 2336., 2048.,  736.,
       2336., 2016., 2016., 1216., 2336., 2336., 2016., 2016., 2336.,
       1728., 1216., 1728., 2016., 1216., 2336., 2016.,  736., 1728.,
       1728., 1664.,  736., 2336., 2336., 2016., 1728., 2336., 2016.,
       2112., 2336., 2016.,  736., 2016., 1728., 2016., 2112., 1728.,
       2336., 1216.,

In [351]:
clf.score(X_test,Y_test)

0.15015395355965777

## **Predicting rental price from condition**

In [352]:
Condition = pd.get_dummies(rentals['Condition'])

In [353]:
Condition.head()

Unnamed: 0,first_time_use,first_time_use_after_refurbishment,fully_renovated,mint_condition,modernized,need_of_renovation,no_information,refurbished,well_kept
0,1,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,1
4,1,0,0,0,0,0,0,0,0


In [354]:
X= Condition
Y= rentals['Rent']

In [355]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3,random_state=1)

In [356]:
clf= LinearRegression()

In [357]:
clf.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [358]:
clf.predict(X_test)

array([2176., 1888., 1888., 2176., 1344., 1888., 1600., 1376., 1888.,
       1728., 1888., 1888., 1888., 1344., 1888., 1888., 1888., 1600.,
       1600., 1888., 1888., 1376., 1888., 1888., 1888., 1888., 1888.,
       1888., 1888., 1888., 1888., 1888., 1728., 1888., 1376., 1888.,
       1888., 1888., 1888., 1888., 2176., 1888., 1376., 1344., 1888.,
       1888., 1376., 1600., 1376., 1888., 1888., 1888., 1888., 1888.,
       1376., 1888., 1376., 1888., 1888., 1728., 1888., 1600., 2176.,
       1888., 1888., 1888., 1600., 1600., 1376., 1888., 1376., 1888.,
       1344., 1600., 1376., 1600., 1376., 2176., 1376., 1888., 1888.,
       1888., 1376., 1888., 1888., 1600., 2176., 1600., 1888., 1888.,
       1600., 1888., 1600., 1344., 1888., 1728., 1888., 1888., 1888.,
       1376., 1888., 1536., 1888., 1888., 1888., 1600., 1888., 1728.,
       1888., 1376., 1600., 2176., 1888., 2176., 1888., 1728., 1888.,
       1888., 1536., 1728., 1888., 1888., 1728., 1888., 1344., 1888.,
       1888., 1888.,

In [359]:
clf.score(X_test,Y_test)

0.056275031080726845
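
The models above each use one group of features at a time. Numeric columns and dummy columns can also be combined into a single feature matrix. The following is a sketch under stated assumptions: the rows are made up, and `drop_first=True` and `dtype=float` are choices made for this illustration, not something the notebook does.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up sample rows; only the column names match the notebook.
sample = pd.DataFrame({
    "Region": ["Mitte", "Wedding", "Mitte", "Wedding", "Kreuzberg", "Kreuzberg"],
    "Rooms": [2.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "Space": [60.0, 55.0, 90.0, 30.0, 65.0, 85.0],
    "Rent": [1500.0, 800.0, 2300.0, 450.0, 1400.0, 1900.0],
})

# One-hot encode the region; drop_first avoids a redundant column
# when the model also fits an intercept.
region_dummies = pd.get_dummies(sample["Region"], drop_first=True, dtype=float)

# Put numeric and dummy columns side by side in one feature matrix.
X = pd.concat([sample[["Rooms", "Space"]], region_dummies], axis=1)
y = sample["Rent"]

model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_)))
```

With the real data, the same `X` could be passed to `train_test_split` as in the cells above to compare its score against the single-feature models.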

# **My questions**


1. Can we predict with a single feature, or do we always have to use features in pairs?
2. If one of the predictors does not have a clear linear relationship with the target, how do we identify that when predicting with multiple features?
3. Is it enough to use a scatter plot that already shows the linear relationship?
4. How can we convert non-numeric features to numeric ones? Done, I used get_dummies 😀
5. What does it mean when the model's score is 0.05?
