The code was written in Python using Jupyter Notebook. The coding environment is managed by Anaconda, and the default Python version is 3.6. The following libraries are used:
- Pandas
- NumPy
- Matplotlib
- sklearn.model_selection.train_test_split
- sklearn.tree.DecisionTreeRegressor
- sklearn.grid_search.GridSearchCV
- sklearn.metrics.r2_score
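The scikit-learn pieces listed above can fit together roughly as follows for the price-prediction question. This is a minimal sketch on synthetic data, not the notebook's code: the columns (`accommodates`, `bedrooms`, `cleaning_fee`), the made-up pricing rule, and the depth grid are all assumptions for illustration. It imports `GridSearchCV` from `sklearn.model_selection`, because the `sklearn.grid_search` module was deprecated in scikit-learn 0.18 and removed in 0.20; on older installs the legacy import should behave similarly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in for a cleaned listings table (hypothetical columns).
rng = np.random.RandomState(0)
n = 500
df = pd.DataFrame({
    "accommodates": rng.randint(1, 9, size=n),
    "bedrooms": rng.randint(0, 5, size=n),
    "cleaning_fee": rng.uniform(0, 150, size=n),
})
# Invented pricing rule plus noise, only so the model has something to learn.
df["price"] = (40 + 20 * df["accommodates"] + 30 * df["bedrooms"]
               + 0.3 * df["cleaning_fee"] + rng.normal(0, 10, size=n))

X = df.drop(columns="price")
y = df["price"]
# Hold out 20% of rows for a final, untouched evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Tune tree depth with 5-fold cross-validation, scored by R^2.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={"max_depth": [2, 4, 6, 8, 10]},
    scoring="r2",
    cv=5,
)
grid.fit(X_train, y_train)

# best_estimator_ is refit on the full training split by default.
best_model = grid.best_estimator_
test_r2 = r2_score(y_test, best_model.predict(X_test))
print("best params:", grid.best_params_)
print("test R^2: %.3f" % test_r2)
```

Keeping the test split out of the grid search means the reported R^2 is not inflated by the tuning step.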
This project follows the CRISP-DM process to address three questions related to business and real-world applications. The dataset comes from Kaggle and was contributed by Airbnb; it contains rental listing data for Seattle. The goal is to process the whole dataset, look for insights that help hosts understand which features could improve a listing's rating, and check whether a model can be trained to make predictions. The three questions are:
- Which features influence a listing's rating?
- When is the most popular time of year in this area?
- Can we build a model to predict the price?
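For the second question, one possible approach, assuming a calendar table with one row per listing per night and an `available` flag of `"t"` or `"f"`, is to treat unavailable nights as a rough proxy for bookings and average that share by month. The rows below are made up for illustration, and an unavailable night can also mean the host blocked the date, so this is only an approximation of demand.

```python
import pandas as pd

# Toy calendar rows (invented, not taken from the dataset).
calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2],
    "date": ["2016-01-04", "2016-02-10", "2016-07-15",
             "2016-01-20", "2016-07-01", "2016-07-02"],
    "available": ["f", "t", "f", "f", "f", "t"],
})
calendar["date"] = pd.to_datetime(calendar["date"])
calendar["month"] = calendar["date"].dt.month

# Assumption: "f" (not available) is treated as a booked night.
calendar["booked"] = (calendar["available"] == "f").astype(int)

# Share of listing-nights booked in each month, busiest month first.
occupancy_by_month = (calendar.groupby("month")["booked"]
                      .mean()
                      .sort_values(ascending=False))
print(occupancy_by_month)
```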
The repository includes a Jupyter Notebook that shows the whole data-processing workflow. The original datasets were downloaded from Kaggle; the link is provided below.
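A common preprocessing step before any analysis or modeling is turning currency strings into numbers. The sketch below assumes money columns such as `price` and `cleaning_fee` are stored as text like `"$1,250.00"`; the helper `dollars_to_float` is hypothetical and the sample rows are invented.

```python
import numpy as np
import pandas as pd

def dollars_to_float(series):
    """Hypothetical helper: convert strings like "$1,250.00" to floats.

    Missing values (NaN) pass through the .str methods unchanged and
    stay NaN after the cast.
    """
    return (series.str.replace("$", "", regex=False)
                  .str.replace(",", "", regex=False)
                  .astype(float))

# Invented sample rows in the assumed text format.
listings = pd.DataFrame({
    "price": ["$85.00", "$1,250.00", "$150.00"],
    "cleaning_fee": [np.nan, "$40.00", "$15.00"],
})
for col in ["price", "cleaning_fee"]:
    listings[col] = dollars_to_float(listings[col])

print(listings.dtypes)
```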
In this project, I analyzed the Airbnb rental dataset from Kaggle. The main motivation was to uncover insights for Airbnb hosts and customers in Seattle.
The analysis shows that a listing's rating is quite important. The top five features influencing the rating, and therefore the ability to attract customers, are the number of reviews, the cleaning fee, the price, the number of available days, and the host. The most popular months for customers are January and July.
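One way to produce a ranking like the top five above is to fit a decision tree on the rating and read its impurity-based `feature_importances_`. This is a sketch of that technique under assumptions, not necessarily what the notebook does: the column names are modeled on typical listing fields but treated as hypothetical, and both the features and the target are randomly generated, so the printed ranking says nothing about the real Seattle data.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Random stand-in features (hypothetical column names).
rng = np.random.RandomState(1)
n = 400
X = pd.DataFrame({
    "number_of_reviews": rng.randint(0, 300, size=n),
    "cleaning_fee": rng.uniform(0, 150, size=n),
    "price": rng.uniform(30, 400, size=n),
    "availability_365": rng.randint(0, 366, size=n),
    "host_is_superhost": rng.randint(0, 2, size=n),
    "bedrooms": rng.randint(0, 5, size=n),
})
# Invented rating target so the example runs; NOT the real relationship.
y = (85 + 0.03 * X["number_of_reviews"] + 5 * X["host_is_superhost"]
     + rng.normal(0, 2, size=n))

# A shallow tree keeps the importances from chasing noise too much.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1.
importances = pd.Series(tree.feature_importances_, index=X.columns)
top5 = importances.sort_values(ascending=False).head(5)
print(top5)
```

Impurity-based importances can favor features with many distinct values, so a permutation-based check is a reasonable complement when the ranking matters.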
Furthermore, data science tools such as data analysis could be used to mine more valuable information.
Blog is here
The dataset comes from Kaggle.
The framework and background knowledge for this coding project were provided by Udacity.