# Predict Lodging Price
New York Accommodation Ltd (NYA) is an online booking service for New York lodging. NYA wants to estimate a fair price for each lodging based on its features. Using machine learning, predict lodging prices and identify how the different features influence the prediction.

#### Files
```
train.csv
test.csv
sample_output.csv
```

### Problem

Analyze the given data to determine how the different features relate to price, then build a machine learning model that predicts the price. For each record in the test set (test.csv), predict the value of the price variable. Submit a CSV file with a header row followed by one line per test entry. The file (submissions.csv) should have exactly 2 columns:

```
id
price
```

### Deliverables

* Well-commented Jupyter notebook
* `submissions.csv`
 
Explore the data, make visualizations, and generate new features if required. Make appropriate plots, annotate the notebook with markdown cells, and explain the inferences you draw.

#### Other Criteria Considered
* Code quality and clarity, such as well-formatted and commented functions.
* Algorithm sophistication.
 

### Evaluation Metric
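The metric used to evaluate performance is the Mean Absolute Percent Error (MAPE).

MAPE is the mean of the absolute percentage errors of the predictions:

$$
\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

where $y_i$ is the actual price of record $i$, $\hat{y}_i$ is the predicted price, and $n$ is the number of records.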
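
The metric can be sketched in a few lines of NumPy. This is a minimal illustration, not the grader's actual implementation: the function name `mape` is hypothetical, and it assumes the score is reported in percent and that no actual price is zero.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percent error, returned in percent (assumed convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The ratio is undefined for a zero actual value, so fail loudly.
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Made-up prices for illustration: errors of 10%, 10%, and 0%.
example = mape([100, 200, 400], [110, 180, 400])
print(round(example, 2))
```

If scikit-learn 0.24 or later is available, `sklearn.metrics.mean_absolute_percentage_error` computes the same quantity but returns it as a fraction rather than a percentage.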


In [0]:
# If you'd like to install packages that aren't installed by default, uncomment the last two lines of this cell and replace <package list> with a list of your packages.
# This will ensure your notebook has all the dependencies and works everywhere

#import sys
#!{sys.executable} -m pip install <package list>

In [0]:
#Libraries
import numpy as np
import pandas as pd

pd.set_option("display.max_columns", 99)

## Data Description

Column | Description
:---|:---
`District` | The name of the district
`Neighborhood` | The name of the neighborhood area
`PropertyType` | The type of the property
`CancellationPolicy` | How easy it is to cancel the reservation
`Accomodates` | The number of guests the lodging can accommodate
`RoomType` | The type of the lodging
`Bathrooms` | The number of bathrooms
`Bedrooms` | The number of bedrooms
`CleaningFee` | The fee charged for cleaning the room after the stay
`Latitude` | Latitude of the property
`Longitude` | Longitude of the property
`ReviewRating` | The average score of reviews
`Price` | The lodging price per night (Target Variable)

## Data Wrangling & Visualization

In [0]:
# Dataset is already loaded below
data = pd.read_csv("train.csv")

In [0]:
data.head()

In [0]:
#Explore columns
data.columns

In [0]:
# Summary statistics for the numeric columns
data.describe()

In [0]:
# Summary statistics for the non-numeric columns
data.describe(exclude=np.number)

## Visualization, Modeling, Machine Learning

Can you build a model that evaluates a lodging and proposes a fair nightly price based on the given features? Please explain your findings effectively to technical and non-technical audiences using comments and visualizations, if appropriate.

- **Build an optimized model that effectively solves the business problem.**
- **Read the test.csv file and prepare features for testing.**

In [0]:
#Loading Test data
test_data=pd.read_csv('test.csv')
test_data.head()
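
One way to prepare the test features is to apply exactly the same transformations that were fitted on the training data. The sketch below is only an illustration under stated assumptions: the tiny frames stand in for `train.csv` and `test.csv`, the room type values and the `prepare` helper are hypothetical, and median imputation plus one-hot encoding is just one reasonable choice.

```python
import pandas as pd

# Stand-in frames with made-up values; the real data comes from the CSV files.
train = pd.DataFrame({
    "RoomType": ["Entire home/apt", "Private room", "Shared room"],
    "Bedrooms": [2.0, 1.0, None],
    "Price": [200, 90, 40],
})
test = pd.DataFrame({
    "RoomType": ["Private room", "Entire home/apt"],
    "Bedrooms": [None, 3.0],
})

feature_cols = ["RoomType", "Bedrooms"]
# Statistics used for imputation are computed on the training data only.
bedrooms_median = train["Bedrooms"].median()


def prepare(df, columns=None):
    """Impute and one-hot encode; align to `columns` when they are given."""
    X = df[feature_cols].copy()
    X["Bedrooms"] = X["Bedrooms"].fillna(bedrooms_median)
    X = pd.get_dummies(X, columns=["RoomType"])
    if columns is not None:
        # Add categories missing from this frame and drop unseen ones.
        X = X.reindex(columns=columns, fill_value=0)
    return X


X_train = prepare(train)
X_test = prepare(test, columns=X_train.columns)
print(X_test)
```

Reindexing the test frame to the training columns keeps the two feature matrices aligned even when a category appears in only one of the files.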



**Management wants to know which features matter most to your model and how well the model performs. Can you tell them?**

> #### Task:
- **Visualize the top 10 features and their feature importance.**
- **Show model performance using mean absolute percent error as your performance metric.**
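
Ranking the features can be sketched as follows. This is an assumption-laden illustration rather than the expected solution: the random forest is only one model that exposes `feature_importances_`, and the synthetic data and the names `f0` to `f11` are hypothetical placeholders for the prepared lodging features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 12 features, of which f0 and f1 drive the target.
rng = np.random.default_rng(0)
feature_names = [f"f{i}" for i in range(12)]
X = pd.DataFrame(rng.normal(size=(300, 12)), columns=feature_names)
y = 10 * X["f0"] + 3 * X["f1"] + rng.normal(scale=0.1, size=300)

# Any fitted estimator with a feature_importances_ attribute would work here.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Pair each importance with its feature name and keep the ten largest.
importances = pd.Series(model.feature_importances_, index=feature_names)
top10 = importances.nlargest(10)
print(top10)
```

With matplotlib installed, `top10.sort_values().plot.barh()` draws the ranking as a horizontal bar chart with the most important feature at the top.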


> #### Task:
- **Submit the predictions on the test dataset using your optimized model** <br/>
    For each record in the test set (`test.csv`), you must predict the value of the `Price` variable. You should submit a CSV file with a header row and one row per test entry.

The file (`submissions.csv`) should have exactly 2 columns:
   - **ID**
   - **Price**
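
A submission frame with that shape might be assembled as in the sketch below. The identifiers and predicted prices are made up for illustration, and it assumes the test set carries an identifier column; the exact header names and casing should follow `sample_output.csv`.

```python
import pandas as pd

# Hypothetical identifiers and model predictions, one per test record.
test_ids = [101, 102, 103]
predictions = [120.0, 85.5, 240.25]

# Exactly two columns, in the order the brief lists them.
submission_df = pd.DataFrame({"ID": test_ids, "Price": predictions})

# Preview the text that to_csv would write, without touching the disk.
csv_text = submission_df.to_csv(index=False)
print(csv_text)
```

Passing `index=False` keeps the pandas row index out of the file, so the output contains only the header row and the two required columns.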

In [0]:
# Write the submission file; submission_df must contain exactly the two columns listed above
submission_df.to_csv('submissions.csv', index=False)

---