<img src="Images/IMG-Regression.png">

The purpose of this notebook is to use various characteristics to predict the value of real estate.
<img style="float: right; margin:5px 0px 0px 10px" src="Images/IMG-houses.png" alt="Some buildings" width="400">
You will receive a fully labeled data set and a test data set in which the value to be predicted is missing. You can use your best model to predict the values for the test data set and send the results to us. We then calculate the mean deviation from the actual prices, which are known to us, and compile an internal course ranking that we will announce in the course at the end of the lecture period.

**Important:** If you work on this notebook as part of the AIS internship, please send it to your supervisor in a ZIP archive together with the classification notebook (see the introductory slides for the internship).

## Content
It is best to follow the given structure of this notebook in your work. Good luck!
<img style="float: right; margin:5px 0px 0px 10px" src="Images/IMG-Go.png" alt="Go" width="300">
<table style="float:left; width:256; border: 1px solid black; display: inline-block">
  <tr>
    <td  style="text-align:right" width=64px><img src="Images/IMG-csv-in.png" style="float:left"></td>
      <td style="text-align:left" width=128px>
          <a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#import_data'>Importing data</a>
      </td>
  </tr>
  <tr>
    <td style="text-align:right"><img src="Images/IMG-magnifying-glass.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#analyze_data'>Analysing data</a>
      </td>
  </tr>
    <tr>
    <td style="text-align:right"><img src="Images/IMG-broom.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#clean_data'>Preprocessing data</a>
        </td>
    </tr>
    <tr>
    <td style="text-align:right"><img src="Images/IMG-diagram.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#build_model'>Choosing a model</a>
        </td>
  </tr>
        <tr>
    <td style="text-align:right"><img src="Images/IMG-euro.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#make_predictions'>Making predictions</a>
        </td>
  </tr>
        <tr>
    <td style="text-align:right"><img src="Images/IMG-csv-out.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#save_predictions'>Saving predictions</a>
        </td>
  </tr>
</table>

<a id='import_data'></a><div><img src="Images/IMG-csv-in.png" style="float:left"> <h2 style="position: relative; top: 6px; left:10px">1. Importing data</h2>
<p style="position: relative; top: 10px">
Read in the real estate data for training the regression model from the "houses.csv" file in the "Data" folder. The features contained are briefly explained in the following table:
</p>
<table style="width:256; border: 1px solid black; display: inline-block">
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">index</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Index of property</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">crime</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Per capita crime rate in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">zoned</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Share of large residential parcels in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">industrial</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Share of industry in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">river</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Adjacent to the river</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">nox</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Nitric oxide pollution (parts per 10 million)</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">rooms</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Number of rooms</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">age</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Share of pre-war buildings in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">distances</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Weighted distance to five major employers in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">highway</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Assessment of motorway access</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">tax</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Property tax rate per 10,000 dollars</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">p_t_ratio</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Ratio of students to teachers in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">lower_status</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Proportion of people of low socio-economic status in the area</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">value</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Property value in 1,000 dollars (to be predicted)</p>
        </td>
    </tr>
    </table>

In [5]:
import pandas as pd
# Import data
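
A minimal sketch of the import step, assuming the file is a plain comma-separated CSV with a header row. The helper name `load_houses` is hypothetical, and the in-memory sample below only stands in for "Data/houses.csv" so that the cell runs on its own; its values are made up.

```python
import io

import pandas as pd


def load_houses(source):
    """Read housing data from a file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Tiny made-up sample standing in for Data/houses.csv (illustrative values only)
sample_csv = io.StringIO(
    "index,crime,rooms,value\n"
    "0,0.03,6.5,24.0\n"
    "1,0.10,5.9,19.5\n"
)

houses = load_houses(sample_csv)
print(houses.shape)
print(houses.head())
```

In the notebook itself, the call would be `load_houses("Data/houses.csv")`. Whether the `index` column should become the DataFrame index (for example via the `index_col` parameter of `pd.read_csv`) is a choice left to you.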

<a id='analyze_data'></a><div><img src="Images/IMG-magnifying-glass.png" style="float:left"> <h2 style="position: relative; top: 6px; left: 10px">2. Analysing data</h2>
<p style="position: relative; top: 10px">
Browse through the data. Which columns are numeric? How are the values distributed in the individual columns? Are there missing values?
</p>

In [None]:
# Data exploration
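
The three questions above can be approached with a few pandas calls. This is only a sketch on a made-up DataFrame; in particular, the text-valued `river` column and the missing `crime` entry are assumptions for illustration and may not match the real file.

```python
import numpy as np
import pandas as pd

# Made-up sample; replace with the DataFrame read from Data/houses.csv
houses = pd.DataFrame({
    "crime": [0.03, 0.10, np.nan, 0.25],
    "river": ["no", "yes", "no", "no"],
    "rooms": [6.5, 5.9, 7.1, 6.0],
    "value": [24.0, 19.5, 33.1, 21.0],
})

# Which columns are numeric?
numeric_columns = houses.select_dtypes(include="number").columns.tolist()

# How are the values distributed? (count, mean, std, quartiles per numeric column)
summary = houses.describe()

# Are there missing values?
missing_per_column = houses.isna().sum()

print(numeric_columns)
print(summary)
print(missing_per_column)
```

Histograms (`houses.hist()`) or a correlation matrix (`houses.corr(numeric_only=True)`) are common next steps if plotting is available.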

<a id='clean_data'></a><div><img src="Images/IMG-broom.png" style="float:left"> <h2 style="position: relative; top: 6px; left:10px">3. Preprocessing data</h2>

In [None]:
# Preprocess data
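
One possible preprocessing step is filling missing numeric values. The sketch below assumes median imputation is appropriate, which depends on what the exploration reveals; the function names `fit_preprocessing` and `apply_preprocessing` are hypothetical. Splitting the step into "learn from training data" and "apply to any data" makes it easy to reuse the identical transformation on the test set later.

```python
import numpy as np
import pandas as pd


def fit_preprocessing(train):
    """Learn the imputation values (column medians) from the training data only."""
    return train.select_dtypes(include="number").median()


def apply_preprocessing(df, medians):
    """Fill missing numeric values with the medians learned from the training data."""
    out = df.copy()
    cols = [c for c in medians.index if c in out.columns]
    out[cols] = out[cols].fillna(medians[cols])
    return out


# Made-up training sample with gaps
train = pd.DataFrame({
    "crime": [1.0, np.nan, 3.0],
    "rooms": [6.0, 7.0, np.nan],
})

medians = fit_preprocessing(train)
clean = apply_preprocessing(train, medians)
print(clean)
```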

<a id='build_model'></a><div><img src="Images/IMG-diagram.png" style="float:left"> <h2 style="position: relative; top: 6px; left:10px">4. Choosing a model</h2>
<p style="position: relative; top: 10px">
Train a regression model to predict the value of a property. Regardless of the type of model chosen, remember to optimize all hyperparameters.
</p>

In [None]:
# Train the model

# Don't forget cross-validation to optimize the hyperparameters!

# At the end, train the model with optimal hyperparameters on all labeled data!
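
As a sketch of the workflow described in the comments above, the example below tunes one hyperparameter with cross-validation and then refits on all labeled data. Ridge regression, the `alpha` grid, five folds, and mean absolute error as the score are all assumptions chosen for illustration; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the labeled features X and target y
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Scaling is part of the pipeline, so it is re-fitted inside every CV fold
pipeline = make_pipeline(StandardScaler(), Ridge())
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

# With the default refit=True, best_estimator_ is already trained on all of X, y
best_model = search.best_estimator_
print(search.best_params_)
print(-search.best_score_)  # cross-validated mean absolute error
```

The same pattern applies to other estimators; only the pipeline steps and the keys of `param_grid` change.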

<a id='make_predictions'></a><div><img src="Images/IMG-euro.png" style="float:left"> <h2 style="position: relative; top: 6px; left:10px">5. Making predictions</h2>
<p style="position: relative; top: 10px">
Use your trained model to make a prediction of the value for all unlabeled observations from "houses_test.csv"!

Remember to prepare the data beforehand in <strong>exactly the same way</strong> as you did with the training data.</p>

In [None]:
# Pre-processing of test data (exactly like preprocessing of training data!)

# Make predictions
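
One way to guarantee identical preprocessing for training and test data is to bundle the preprocessing and the model in a single scikit-learn pipeline. The sketch below assumes median imputation and a linear model purely for illustration, and the two small DataFrames are made up in place of "houses.csv" and "houses_test.csv".

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up stand-ins for the labeled and unlabeled data
train = pd.DataFrame({
    "rooms": [5.0, 6.0, 7.0, 8.0],
    "crime": [0.4, 0.3, np.nan, 0.1],
    "value": [15.0, 20.0, 25.0, 30.0],
})
test = pd.DataFrame({
    "rooms": [6.5, np.nan],
    "crime": [0.2, 0.3],
})

# Use the same feature columns, in the same order, for both data sets
feature_columns = [c for c in train.columns if c != "value"]

model = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
model.fit(train[feature_columns], train["value"])

# The imputer reuses the medians learned from the training data
predictions = model.predict(test[feature_columns])
print(predictions)
```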

<a id='save_predictions'></a><div><img src="Images/IMG-csv-out.png" style="float:left"> <h2 style="position: relative; top: 6px; left:10px">6. Saving predictions</h2>
<p style="position: relative; top: 10px">
Save the forecasts in a CSV file. The file should only have one column. Each line should contain exactly one number that corresponds to the predicted value. The order of the predictions must match the order of the observations in houses_test.csv.
</p>

In [None]:
# Save predictions
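
A minimal sketch of writing the file in the format described above: one column, one number per line, no header and no index, with the order of the predictions unchanged. The prediction values and the file name `predictions.csv` are made up, and the temporary directory is used only so the example runs anywhere.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up predictions, in the same order as the rows of houses_test.csv
predictions = np.array([21.4, 33.0, 18.7])

# Hypothetical output location; in the notebook a relative path is enough
output_path = os.path.join(tempfile.gettempdir(), "predictions.csv")

# index=False and header=False leave exactly one number per line
pd.Series(predictions).to_csv(output_path, index=False, header=False)

# Read the file back to check its layout
with open(output_path) as f:
    lines = f.read().splitlines()
print(lines)
```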

If you want, you can send the CSV file with your predictions to <a href="mailto:simon.stone@tu-dresden.de?Subject=Vorhersagen%20zu%20Jupyter%20Notebook%20Immobilienpreise" target="_top">Simon Stone</a> and take part in a small competition: We calculate the mean deviation of the predictions of your model on the unseen data and thus establish an internal course ranking that we will announce in the course at the end of the lecture period. Optionally, you can also provide us with a pseudonym under which you will appear on the list if you do not want to appear there by name. Good luck!

<img src="Images/IMG-Monopoly-Tokens.jpg">

---
<div>Housing data in the public domain. Source: Harrison, D. and Rubinfeld, D.L. "Hedonic prices and the demand for clean air", J. Environ. Economics & Management, vol.5, 81-102, 1978.</div>
<div>Icons made by <a href="https://www.flaticon.com/authors/swifticons" title="Swifticons">Swifticons</a> from <a href="https://www.flaticon.com/" title="Flaticon">www.flaticon.com</a></div>
<div>Monopoly tokens image based on <a href="https://monopoly.fandom.com/wiki/Tokens?file=Monopoly_1946-tokens.jpg">Monopoly Wiki</a></div>
<div>Notebook created by <a href="mailto:simon.stone@tu-dresden.de?Subject=Frage%20zu%20Jupyter%20Notebook%20Immobilienpreise" target="_top">Simon Stone</a></div>