<font size = 6> <center> House Prices - Advanced Regression Techniques </center> </font>

<img src="https://images.unsplash.com/photo-1516156008625-3a9d6067fab5?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=870&q=80">

# Description

**Goal** </br>
Predict the final price of each residential home in Ames, Iowa  </br>

**Metric**</br>
 Root-Mean-Squared-Error (RMSE) 
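
As a quick illustration of the metric, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below is a minimal NumPy version; the helper name `rmse` and the sample prices are made up for illustration and do not come from the dataset.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical prices, for illustration only
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 330000.0]

print(rmse(actual, predicted))  # each prediction is off by 10,000, so this prints 10000.0
```

Because the errors are squared before averaging, a few large misses weigh more heavily on the score than many small ones.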
 
**Dataset Overview**</br>
A regression problem with 79 explanatory variables. The documentation is available in "\data\data_description.txt".

# Setup

## Google Colab Configuration

In [None]:
# Clone the repository to get access to all the data and files
repository_name = "Machine_Learning Pipeline_-_Complete Overview"
repository_url = 'https://github.com/TKovaks78/Machine-Learning-/tree/main/ML-Practice/' + repository_name

In [None]:
! git clone $repository_url

In [None]:
#Install Requirements
! pip install -Uqqr $repository_name/requirements.txt

⚠️ Restart the kernel after running these cells for the first time

## Essential

In [11]:
# Importing required libraries for the project
import numpy as np # for scientific computing
import pandas as pd # for data analysis
import matplotlib # for visualization
import seaborn as sns # for visualization
import sklearn # ML Library
import os

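# Scikit-Learn ≥0.20 is required
# Compare numeric components: a plain string comparison would wrongly accept e.g. "0.3"
assert tuple(int(part) for part in sklearn.__version__.split(".")[:2]) >= (0, 20)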

# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Increase pandas display limit of columns to 500 
pd.options.display.max_columns = 500 

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Suppress all warnings to keep the output clean
import warnings
warnings.filterwarnings("ignore")

## Save Figures

**Method 1**: a helper function that saves each figure to a dedicated folder, which keeps the images organized

In [12]:
import os

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "01_-_Getting Started"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

To call the function, insert the line below in a plot cell (we will see an example later).

In [3]:
#save_fig("input figure name")
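
As a rough illustration of how the helper might be used, the sketch below repeats the `save_fig` definition so that it runs on its own, draws a throwaway sine curve, and saves it. The data, the figure name `example_sine_wave`, and the `Agg` backend line are assumptions made for this example. Inside the notebook, where `%matplotlib inline` is already active, the backend line can be dropped.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch also runs as a plain script
import matplotlib.pyplot as plt
import numpy as np

# Same folder layout as the helper defined above
IMAGES_PATH = os.path.join(".", "images", "01_-_Getting Started")
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Hypothetical data, for illustration only
x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))
plt.xlabel("x")
plt.ylabel("sin(x)")

# Call save_fig in the same cell as the plotting code, while the figure is still current
save_fig("example_sine_wave")  # hypothetical figure name
```

The image ends up under `images/01_-_Getting Started/`, so figures from different chapters stay in separate folders.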

**Method 2**: matplotlib's built-in `savefig` method (easier, but more limited, especially if you are working on GitHub)

In [None]:
#Insert this code in a plot cell
#fig.savefig('path/to/save/image/to.png')

# Fetch the data

## Download the data and save a local copy

In [13]:
import os
import tarfile
import urllib.request

# Define where the data is downloaded from and where it is stored locally
DOWNLOAD_ROOT = "https://github.com/TKovaks78/Machine_Learning-Pipeline_-_Complete_Overview/blob/main/"
PATH = os.path.join("datasets", "housing")
URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

# Function to download the archive from the URL and extract it into `path`
def fetch_data(url=URL, path=PATH):
    os.makedirs(path, exist_ok=True)
    tgz_path = os.path.join(path, "housing.tgz")
    urllib.request.urlretrieve(url, tgz_path)
    # The context manager closes the archive even if extraction fails
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=path)
    
#Call the function
fetch_data()

#Function to load the data
def load_data(path=PATH):
    csv_path = os.path.join(path, "housing.csv")
    return pd.read_csv(csv_path)

#Call the function
df = load_data()

# Display the first 10 rows
df.head(10)

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
| 5 | -122.25 | 37.85 | 52.0 | 919.0 | 213.0 | 413.0 | 193.0 | 4.0368 | 269700.0 | NEAR BAY |
| 6 | -122.25 | 37.84 | 52.0 | 2535.0 | 489.0 | 1094.0 | 514.0 | 3.6591 | 299200.0 | NEAR BAY |
| 7 | -122.25 | 37.84 | 52.0 | 3104.0 | 687.0 | 1157.0 | 647.0 | 3.12 | 241400.0 | NEAR BAY |
| 8 | -122.26 | 37.84 | 42.0 | 2555.0 | 665.0 | 1206.0 | 595.0 | 2.0804 | 226700.0 | NEAR BAY |
| 9 | -122.25 | 37.84 | 52.0 | 3549.0 | 707.0 | 1551.0 | 714.0 | 3.6912 | 261100.0 | NEAR BAY |
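
The `fetch_data` function above assumes that the URL returns the archive itself. If a URL serves a web page instead (a GitHub `blob` page, for instance, typically returns HTML rather than the raw file), extraction will fail. One possible guard, sketched below with the standard library's `tarfile.is_tarfile`, is to check the downloaded file before extracting it. The helper name `is_valid_archive` and the two temporary files are hypothetical and exist only to demonstrate the check without network access.

```python
import os
import tarfile
import tempfile

def is_valid_archive(path):
    """Return True if `path` exists and can be read as a tar archive."""
    return os.path.isfile(path) and tarfile.is_tarfile(path)

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny, genuine gzip-compressed tar archive
    csv_path = os.path.join(tmp, "sample.csv")
    with open(csv_path, "w") as f:
        f.write("a,b\n1,2\n")
    good = os.path.join(tmp, "good.tgz")
    with tarfile.open(good, "w:gz") as tar:
        tar.add(csv_path, arcname="sample.csv")

    # Simulate an HTML page that was saved under a .tgz name
    bad = os.path.join(tmp, "bad.tgz")
    with open(bad, "w") as f:
        f.write("<!DOCTYPE html><html><body>Not an archive</body></html>")

    good_ok = is_valid_archive(good)
    bad_ok = is_valid_archive(bad)

print(good_ok, bad_ok)  # prints: True False
```

A check like this could run right after `urlretrieve`, so that a wrong URL produces a clear message instead of a `tarfile.ReadError` during extraction.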
