# Project: Property Rentals

### 📖 Background
You have been hired by Inn the Neighborhood, an online platform that allows people to rent out their properties for short stays. Currently, the webpage for renters has a conversion rate of 2%. This means that most people leave the platform without signing up.

The product manager would like to increase this conversion rate. They are interested in developing an application to help people estimate the money they could earn renting out their living space. They hope that this would make people more likely to sign up.

The company has provided you with a dataset that includes details about each property rented, as well as the price charged per night. They want to avoid price estimates that are more than 25 dollars off the actual price, as this may discourage people.
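
The 25-dollar tolerance can be turned into a simple evaluation metric: the share of predictions whose absolute error stays within that tolerance. The snippet below is a minimal sketch of that idea, not the evaluation used later in this notebook. The function name `share_within_tolerance` and the predicted values are hypothetical and only serve as illustration.

```python
import numpy as np

def share_within_tolerance(y_true, y_pred, tolerance=25.0):
    """Return the fraction of predictions within `tolerance` dollars of the actual price."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Absolute error per listing, compared against the allowed tolerance
    return float(np.mean(np.abs(y_true - y_pred) <= tolerance))

# Hypothetical actual and predicted nightly prices (illustration only)
actual = [170.0, 99.0, 235.0]
predicted = [180.0, 130.0, 230.0]

share = share_within_tolerance(actual, predicted)
print(f"Share of estimates within $25: {share:.2%}")
```

Reporting this share next to an average error such as MAE can be useful, because an acceptable average may still hide many individual estimates that miss by more than 25 dollars.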

### Table of Contents

1. **[Getting to know the dataset](#dataset)**  <br>
    1.1 [Distributions & Descriptive Statistics](#descr) <br>
    1.2 [Missing Values](#missing) <br>
    1.3 [Quick Fixes](#fix) <br>
    1.4 [Summary](#sum1) <br>
2. **[Exploratory Data Analysis & Cleaning](#explore)**<br>
    2.1 [How we started](#start) <br>
    2.2 [Price](#price) <br>
    2.3 [Minimum nights](#nights) <br>
    2.4 [Bathrooms](#bathrooms) <br>
    2.5 [Bedrooms](#bedroom) <br> 
    2.6 [Summary](#sum2)
3. **[Feature Selection](#select)** 
4. **[Model Selection](#model)** 
5. **[Final Recommendation](#recs)**

In [2]:
# import essential packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
from scipy.stats import skew
import matplotlib.pyplot as plt

import drop
import utils
from utils.visualize import viz
from utils.stats import outliers
from utils.models import quick_test, predict

import ppscore as pps

import xgboost as xgb
from sklearn.svm import SVR
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from pandas.plotting import scatter_matrix
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, LassoCV, LassoLarsCV
from sklearn.metrics import mean_squared_error, explained_variance_score, mean_absolute_error, r2_score

import warnings
warnings.filterwarnings('ignore')

sns.set()
%matplotlib inline
plt.rcParams["figure.figsize"] = (7, 5)


### Getting to know the dataset <a class="anchor" id="dataset"/>

The dataset comprises 8,111 entries and 9 columns, described below:

| Column | Description |
| --- | --- | 
| id | Listing's identifier (numeric) |
| latitude | Latitude of the property |
| longitude | Longitude of the property |
| property_type | Type of the property (e.g., "House", "Villa") |
| room_type | Type of the room (e.g., "Private room", "Entire home/apt") | 
| bathrooms | Number of bathrooms in the property | 
| bedrooms | Number of bedrooms in the property | 
| minimum_nights | Minimum number of nights per reservation | 
| price | Price per night, stored as text with a dollar sign (e.g., "$170.00") |


This section covers a first look at the dataset, followed by data cleaning, feature engineering, and data normalization.



In [7]:
# read the dataset & show first values
rentals = pd.read_csv('data/rentals.csv')
rentals.head(3)

|  | id | latitude | longitude | property_type | room_type | bathrooms | bedrooms | minimum_nights | price |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 958 | 37.76931 | -122.43386 | Apartment | Entire home/apt | 1.0 | 1.0 | 1 | $170.00 |
| 1 | 3850 | 37.75402 | -122.45805 | House | Private room | 1.0 | 1.0 | 1 | $99.00 |
| 2 | 5858 | 37.74511 | -122.42102 | Apartment | Entire home/apt | 1.0 | 2.0 | 30 | $235.00 |
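
The preview shows `price` stored as text with a leading dollar sign, so it has to be converted to a number before any statistics or modeling. The snippet below is a minimal sketch of one way to do that on a small stand-in frame. The frame `sample` is made up for illustration, and the value with a thousands separator is an assumption about how larger prices might be formatted, not something shown in the preview.

```python
import pandas as pd

# Small stand-in for the rentals data; "$1,235.00" is a hypothetical value
sample = pd.DataFrame({"price": ["$170.00", "$99.00", "$1,235.00"]})

# Strip the dollar sign and any thousands separators, then cast to float
sample["price"] = (
    sample["price"]
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)

print(sample["price"].tolist())
```

The same chain could be applied to `rentals["price"]`, provided every entry follows this format.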


In [6]:
# check dataset's shape
print('Dataset shape:', rentals.shape)

# check the number of missing values
missing = rentals[rentals.isna().any(axis=1)]
print(f'Number of rows with missing values: {len(missing)} ({round((len(missing) / len(rentals)) * 100, 2)}%)')

Dataset shape: (8111, 9)
Number of rows with missing values: 16 (0.2%)
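
The row-level count above does not show which columns contain the gaps. A per-column breakdown can be sketched as follows. The frame `toy` and its values are made up for illustration, and on the real data the same calls would be applied to `rentals`.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (illustration only, not the real data)
toy = pd.DataFrame({
    "bathrooms": [1.0, np.nan, 2.0, 1.0],
    "bedrooms": [1.0, 2.0, np.nan, np.nan],
    "price": [170.0, 99.0, 235.0, 65.0],
})

# Count and percentage of missing values per column
missing_per_column = toy.isna().sum()
missing_summary = pd.DataFrame({
    "missing": missing_per_column,
    "percent": (toy.isna().mean() * 100).round(2),
})

print(missing_summary)
```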
