# Problems

🎯 This exercise is made up of two problems that must be solved using what you have learned today. 

👇 Load the `houses.csv` dataset into this notebook as a pandas dataframe, and display its first 5 rows.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
houses = pd.read_csv('../data/houses.csv')
houses.head(5)

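| Unnamed: 0 | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave |  | Reg | Lvl | AllPub | ... | 0 |  |  |  | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave |  | Reg | Lvl | AllPub | ... | 0 |  |  |  | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 12 | 2008 | WD | Normal | 250000 |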


ℹ️ You can download a detailed description of the dataset [here](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_Houses_dataset_description.txt).

## Problem 1

I bought my apartment for $150,000 (`SalePrice`). It has:
- 3 bedrooms (`BedroomAbvGr`)
- 2 kitchens (`KitchenAbvGr`)
- An overall quality of 8 (`OverallQual`)

❓  What is the surface (`GrLivArea`) of my apartment? Save your answer under variable name `surface`.

In [3]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

model = LinearRegression()

# Work on the first 300 rows only
reduced_dataset = houses[:300].copy()
features = ['GrLivArea', 'BedroomAbvGr', 'KitchenAbvGr', 'OverallQual']
target = 'SalePrice'

X = reduced_dataset[features]
y = reduced_dataset[target]

# 5-fold cross-validation; for a regressor the default test score is R²
cv_reduced_dataset = cross_validate(model, X, y, cv=5)
reduced_dataset_score = cv_reduced_dataset['test_score'].mean()
reduced_dataset_score

0.7529260160815782

In [4]:
model.fit(X,y)
model.intercept_, model.coef_

(-46407.23570564232,
 array([    72.51802823, -12521.4613686 , -21514.28774308,  28796.77695901]))

In [5]:
# Coefficients in feature order: GrLivArea, BedroomAbvGr, KitchenAbvGr, OverallQual
a, b, c, d = model.coef_

In [6]:
surface = (b * 3 + c * 2 + d * 8)/a
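Solving the fitted linear equation for `GrLivArea` would also involve the intercept and the sale price. The block below is a minimal sketch of that algebra, using the rounded coefficients printed in the output above; the helper name `solve_for_surface` is hypothetical. It is not the same quantity as the ratio computed in the cell above, and its result was not run against the challenge checker.

```python
# Intercept and coefficients copied from the printed model output above,
# in feature order: GrLivArea, BedroomAbvGr, KitchenAbvGr, OverallQual
intercept = -46407.23570564232
a, b, c, d = 72.51802823, -12521.4613686, -21514.28774308, 28796.77695901


def solve_for_surface(price, bedrooms, kitchens, quality):
    """Solve price = intercept + a*surface + b*bedrooms + c*kitchens + d*quality
    for surface (hypothetical helper, for illustration only)."""
    other_terms = intercept + b * bedrooms + c * kitchens + d * quality
    return (price - other_terms) / a


surface_from_equation = solve_for_surface(150_000, 3, 2, 8)

# Sanity check: plugging the surface back into the equation recovers the price
recovered_price = intercept + a * surface_from_equation + b * 3 + c * 2 + d * 8
print(round(surface_from_equation, 1), round(recovered_price, 2))
```

Inverting a regression of price on surface is generally not equivalent to regressing surface on price, so the two approaches can give noticeably different answers.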

Note: I could also have found `surface` by training a new model with `GrLivArea` as the target and predicting it from the other features.
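A rough sketch of that alternative might look like the following. It uses a small synthetic, noise-free dataset in place of `houses.csv` so that it runs on its own; the generating formula, the variable names, and the resulting number are all made up for illustration and say nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the houses data (NOT the real dataset)
rng = np.random.default_rng(0)
n = 200
bedrooms = rng.integers(1, 6, size=n)    # 1 to 5
kitchens = rng.integers(1, 3, size=n)    # 1 or 2
quality = rng.integers(1, 11, size=n)    # 1 to 10
price = rng.uniform(50_000, 400_000, size=n)

# Made-up, noise-free relation used only to generate a target
surface_true = 500 + 100 * bedrooms + 50 * kitchens + 80 * quality + 0.005 * price

# GrLivArea plays the role of the label; SalePrice becomes a feature
X_alt = np.column_stack([bedrooms, kitchens, quality, price])
surface_model = LinearRegression().fit(X_alt, surface_true)

# Predict the surface of an apartment with 3 bedrooms, 2 kitchens,
# quality 8 and a price of 150,000
surface_pred = surface_model.predict([[3, 2, 8, 150_000]])[0]
print(surface_pred)
```

With the real dataset, `X_alt` would instead be built from the `BedroomAbvGr`, `KitchenAbvGr`, `OverallQual` and `SalePrice` columns, and the target from `GrLivArea`.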

### ☑️ Check your code

In [7]:
from nbresult import ChallengeResult

result = ChallengeResult('problem_1',
                         answer = surface
)

result.write()
print(result.check())

platform darwin -- Python 3.8.6, pytest-6.2.3, py-1.10.0, pluggy-0.13.1 -- /Users/smrack/.pyenv/versions/3.8.6/envs/lewagon/bin/python3.8
cachedir: .pytest_cache
rootdir: /Users/smrack/code/olushO/data-challenges/05-ML/01-Fundamentals-of-Machine-Learning/03-Problems
plugins: anyio-2.2.0, dash-1.20.0
collecting ... collected 1 item

tests/test_problem_1.py::TestProblem_1::test_problem_1 PASSED            [100%]



💯 You can commit your code:

git add tests/problem_1.pickle

git commit -m 'Completed problem_1 step'

git push origin master


## Problem 2

I am looking for a new apartment and have a maximum budget of $200,000. I come across one apartment I really like. It has:

- A surface of 1000 (`GrLivArea`)
- 4 bedrooms (`BedroomAbvGr`)
- 1 kitchen (`KitchenAbvGr`)
- An overall quality of 7 (`OverallQual`)

❓ What is the probability that its price (`SalePrice`) is within my budget? Compute your answer and save it under variable name `probability`.

In [8]:
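max_budget = 200_000

# Feature order: GrLivArea, BedroomAbvGr, KitchenAbvGr, OverallQual
prediction = model.predict([[1000, 4, 1, 7]])

# 1 if the predicted price is within budget, else 0
# (a hard 0/1 indicator rather than a calibrated probability)
probability = (prediction <= max_budget) * 1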
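To make the prediction step concrete, the sketch below reproduces it by hand from the coefficients printed earlier (up to their rounding) and then shows one possible way to turn a point prediction into a probability between 0 and 1. That second step assumes normally distributed residuals, and `residual_std` is a hypothetical value chosen purely for illustration; it does not come from the notebook and would normally be estimated from the training residuals.

```python
import math

# Intercept and coefficients copied from the printed model output,
# in feature order: GrLivArea, BedroomAbvGr, KitchenAbvGr, OverallQual
intercept = -46407.23570564232
coefs = [72.51802823, -12521.4613686, -21514.28774308, 28796.77695901]

apartment = [1000, 4, 1, 7]
max_budget = 200_000

# A linear model predicts the intercept plus the weighted sum of the features
predicted_price = intercept + sum(w * x for w, x in zip(coefs, apartment))

# ASSUMPTION: hypothetical residual standard deviation, not from the notebook
residual_std = 40_000


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# P(price <= budget) under the assumed Gaussian residuals
probability_within_budget = normal_cdf((max_budget - predicted_price) / residual_std)
print(round(predicted_price, 2), round(probability_within_budget, 3))
```

A classifier trained on a binary "within budget" target is another common way to get such a probability; this sketch only illustrates the idea with the regression already fitted.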

### ☑️ Check your code

In [9]:
from nbresult import ChallengeResult

result = ChallengeResult('problem_2',
                         answer = probability
)

result.write()
print(result.check())

platform darwin -- Python 3.8.6, pytest-6.2.3, py-1.10.0, pluggy-0.13.1 -- /Users/smrack/.pyenv/versions/3.8.6/envs/lewagon/bin/python3.8
cachedir: .pytest_cache
rootdir: /Users/smrack/code/olushO/data-challenges/05-ML/01-Fundamentals-of-Machine-Learning/03-Problems
plugins: anyio-2.2.0, dash-1.20.0
collecting ... collected 1 item

tests/test_problem_2.py::TestProblem_2::test_problem_2 PASSED            [100%]



💯 You can commit your code:

git add tests/problem_2.pickle

git commit -m 'Completed problem_2 step'

git push origin master


# 🏁