R, Random Forest, XGBoost, K-Nearest Neighbors (KNN)
Predicted housing prices on the Inside Airbnb Los Angeles listings dataset by training and benchmarking machine learning models in R, including Random Forest, XGBoost, and K-Nearest Neighbors (KNN). I applied a log transformation, stratified cross-validation, and hyperparameter tuning to the models. The best-performing model, which used Bagging, yielded 63% R-squared and 54% RMSE.
- Technologies: RStudio
- R Version: 4.2.2
- Packages Used:
  - Data Manipulation: tidyverse, tidymodels
  - Data Visualization: ggplot2, yardstick, corrplot, rpart.plot
  - Machine Learning: randomForest, xgboost, vip, ranger, kernlab, kknn, baguette
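
As a rough illustration of how these packages might fit together, the sketch below runs a log transformation, stratified cross-validation, and hyperparameter tuning on a bagged tree model with tidymodels and baguette. It is not the project's actual code: the data is synthetic, the three predictors are assumed to match columns in the listings file, and the tuning grid and `times = 10` setting are arbitrary choices made for brevity.

```r
library(tidymodels)
library(baguette)

set.seed(123)

# Synthetic stand-in for the listings data (made-up values; column names are assumptions)
n <- 300
listings <- tibble(
  price = exp(rnorm(n, mean = 5, sd = 0.6)),
  minimum_nights = sample(1:30, n, replace = TRUE),
  number_of_reviews = rpois(n, 20),
  room_type = factor(sample(c("Entire home/apt", "Private room"), n, replace = TRUE))
)

# Log-transform the right-skewed outcome, then drop the raw price
listings <- listings %>%
  mutate(log_price = log(price)) %>%
  select(-price)

# Stratify the train/test split and the CV folds on the outcome
split <- initial_split(listings, prop = 0.8, strata = log_price)
train <- training(split)
folds <- vfold_cv(train, v = 5, strata = log_price)

# Bagged regression trees with one tunable hyperparameter
bag_spec <- bag_tree(min_n = tune()) %>%
  set_engine("rpart", times = 10) %>%
  set_mode("regression")

wf <- workflow() %>%
  add_recipe(recipe(log_price ~ ., data = train)) %>%
  add_model(bag_spec)

# Small illustrative grid; a real search would likely be wider
grid <- tibble(min_n = c(2L, 10L, 20L))
tuned <- tune_grid(wf, resamples = folds, grid = grid,
                   metrics = metric_set(rmse, rsq))

# Refit the best configuration on the training set and score it on the test set
best <- select_best(tuned, metric = "rmse")
final_fit <- finalize_workflow(wf, best) %>%
  last_fit(split, metrics = metric_set(rmse, rsq))
final_metrics <- collect_metrics(final_fit)
```

`final_metrics` holds the test-set RMSE and R-squared on the log scale. Because the data here is random noise around a constant, the values say nothing about the results reported above.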
- http://insideairbnb.com/get-the-data/
  - Inside Airbnb is a mission driven project that provides data and advocacy about Airbnb's impact on residential communities. We work towards a vision where data and information empower communities to understand, decide and control the role of renting residential homes to tourists.
- http://data.insideairbnb.com/united-states/ca/los-angeles/2022-09-09/visualisations/listings.csv
  - Summary information and metrics for listings in Los Angeles (good for visualisations).
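
A minimal sketch of reading the listings file into R might look like the following. The helper `read_listings()` is hypothetical, and the cleaning steps (parsing `price` if it arrives as text, dropping missing or non-positive prices) are assumptions about what the data may need, not the project's documented preprocessing. The demo uses a tiny made-up CSV so the block runs offline.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read a listings CSV and keep rows with a usable price
read_listings <- function(path) {
  read_csv(path, show_col_types = FALSE) %>%
    # Assumption: price may be numeric or a string such as "$1,200.00"
    mutate(price = if (is.character(price)) parse_number(price) else price) %>%
    filter(!is.na(price), price > 0)
}

# Tiny made-up file standing in for the real download
demo_path <- tempfile(fileext = ".csv")
write_csv(tibble(id = 1:3, price = c(100, 0, 250)), demo_path)
demo <- read_listings(demo_path)
```

The same helper should accept the listings URL above in place of `demo_path`, since `read_csv()` can read from a URL, assuming the file is still hosted at that address.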
Our variables of focus:
- id