Data modeling using Python in Jupyter Notebooks. Created July 2017.
Uses a Kaggle dataset to build regression, clustering, and classification models.
Expedia has provided a dataset of customer searches. The data includes what customers searched for (number of rooms, number of people, dates, location), where they searched from (site, channel, and the country where they initiated the search), and the result (clicks and whether the search resulted in a booking). Expedia grouped hotels into 100 clusters based on hotel popularity, rating, user review rating, price, distance from city center, and amenities. The goal of the Kaggle competition is to predict which hotel cluster an Expedia user will book, based on their search attributes and hotel information. I used the dataset to demonstrate three algorithms.
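To make the prediction task concrete, here is a minimal sketch of a simple frequency baseline: for each search destination, rank the hotel clusters most often booked there. This is an illustrative technique, not necessarily the approach taken in the notebooks. The column names `srch_destination_id`, `is_booking`, and `hotel_cluster` are assumed to match the Kaggle `train.csv` file, the rows are toy values invented for illustration, and `fit_destination_baseline`, `predict`, and `top_k` are hypothetical names.

```python
from collections import Counter, defaultdict

# Toy rows invented for illustration (not real Expedia data).
# Assumed columns, as in the Kaggle train.csv: srch_destination_id, is_booking, hotel_cluster.
rows = [
    {"srch_destination_id": 8250, "is_booking": 1, "hotel_cluster": 1},
    {"srch_destination_id": 8250, "is_booking": 0, "hotel_cluster": 1},
    {"srch_destination_id": 8250, "is_booking": 1, "hotel_cluster": 1},
    {"srch_destination_id": 8250, "is_booking": 1, "hotel_cluster": 42},
    {"srch_destination_id": 14984, "is_booking": 1, "hotel_cluster": 80},
]


def fit_destination_baseline(rows, top_k=5):
    """Map each destination to its top_k most frequently booked hotel clusters."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["is_booking"] == 1:  # count only searches that ended in a booking
            counts[row["srch_destination_id"]][row["hotel_cluster"]] += 1
    return {dest: [cluster for cluster, _ in ctr.most_common(top_k)]
            for dest, ctr in counts.items()}


def predict(model, destination_id, fallback):
    """Return ranked cluster guesses, or the fallback list for unseen destinations."""
    return model.get(destination_id, fallback)


model = fit_destination_baseline(rows)
print(predict(model, 8250, fallback=[]))     # [1, 42]
print(predict(model, 99999, fallback=[80]))  # [80]
```

A baseline like this gives a reference score that the regression, clustering, and classification models can be compared against; `top_k` is an arbitrary choice here.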
Dataset is available from Kaggle: https://www.kaggle.com/c/expedia-hotel-recommendations.
Part 1 explored the dataset for meaningful patterns using statistical analysis and graphical and numeric summaries.
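As an example of the kind of numeric summary this involves, the sketch below computes the booking rate for each value of one column using only the standard library. It assumes the Kaggle `train.csv` columns `channel` and `is_booking`; the sample rows are invented for illustration, `booking_rate_by` is a hypothetical helper, and this is not code from the Part 1 notebook.

```python
import csv
import io
from collections import defaultdict

# Toy CSV invented for illustration; column names assumed to follow the Kaggle train.csv.
SAMPLE = """channel,is_booking
9,0
9,1
9,0
0,1
0,0
"""


def booking_rate_by(reader, column):
    """Return {value of column: fraction of rows where is_booking == 1}."""
    totals = defaultdict(int)
    bookings = defaultdict(int)
    for row in reader:  # streams rows one at a time, so large files need not fit in memory
        key = row[column]
        totals[key] += 1
        bookings[key] += int(row["is_booking"])
    return {key: bookings[key] / totals[key] for key in totals}


rates = booking_rate_by(csv.DictReader(io.StringIO(SAMPLE)), "channel")
print(rates)  # {'9': 0.3333333333333333, '0': 0.5}
```

Passing a `csv.DictReader` over the real `train.csv` instead of the in-memory sample would stream the full file row by row. With the data loaded into a pandas DataFrame, `df.groupby("channel")["is_booking"].mean()` produces the same summary.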
Link to Part 1 Notebook: https://github.com/BethHilbert/Python-Expedia/blob/master/Expedia%20(part%201)%20Data%20Exploration.ipynb
Part 2 built models using regression, clustering, and classification algorithms. The code is designed so the variables are easy to change, making it simple to experiment with different inputs (such as assigning a different target or number of clusters).
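One way to keep inputs easy to change is to hold them in a few module-level settings that the model code reads. The sketch below does this for k-means clustering (Lloyd's algorithm) in plain Python. It is not the Part 2 notebook's code: `N_CLUSTERS`, `FEATURES`, `SEED`, and `kmeans` are hypothetical names, the feature columns are assumed to be the Kaggle `srch_adults_cnt` and `srch_children_cnt` fields, and the rows are toy values invented for illustration.

```python
import random

# Hypothetical experiment settings: edit these and rerun to try different inputs.
N_CLUSTERS = 2
FEATURES = ["srch_adults_cnt", "srch_children_cnt"]  # assumed Kaggle column names
SEED = 0


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=SEED):
    """Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen rows as starting centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j, p=p: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy search rows invented for illustration (not real Expedia data).
rows = [
    {"srch_adults_cnt": 1, "srch_children_cnt": 0},
    {"srch_adults_cnt": 1, "srch_children_cnt": 1},
    {"srch_adults_cnt": 2, "srch_children_cnt": 0},
    {"srch_adults_cnt": 2, "srch_children_cnt": 3},
    {"srch_adults_cnt": 3, "srch_children_cnt": 3},
    {"srch_adults_cnt": 2, "srch_children_cnt": 4},
]
points = [[float(row[f]) for f in FEATURES] for row in rows]
centroids, labels = kmeans(points, N_CLUSTERS)
print(labels)
```

Changing `N_CLUSTERS` or `FEATURES` reruns the same pipeline on different inputs without touching the algorithm. In practice, scikit-learn's `KMeans(n_clusters=...)` would replace the hand-written loop.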
Link to Part 2 Notebook: https://github.com/BethHilbert/Python-Expedia/blob/master/Expedia%20(part2)%20Data%20Analytics%20Models.ipynb