# Predicting preferred destination based on taste and preference

The goal is to build a machine learning model that predicts hotel ratings from customer reviews, budget, location, and type of residence. The dataset was scraped from TripAdvisor and contains information about various hotels, including their ratings, reviews, amenities, pricing, geographical coordinates, and residence types (e.g., hotel, bed and breakfast, specialty lodging). By analyzing the review text together with these additional factors, the model should accurately predict the ratings of new, unseen hotels.

Approach:

Data Preprocessing: Clean and preprocess the text reviews by removing stopwords and punctuation and tokenizing the text. Convert the text data into a numerical representation suitable for modeling. Handle missing values, if any, in the budget, location, and residence type columns.
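
The preprocessing steps above can be sketched with the standard library alone. This is a minimal illustration, not the pipeline used in this notebook: the stopword list, the vocabulary, and the helper names (`preprocess_review`, `bag_of_words`, `fill_missing`) are assumptions made for the example.

```python
import string
from collections import Counter

# Illustrative stopword list (an assumption); a real pipeline would use a
# fuller list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "were", "in", "of", "to", "it", "very"}


def preprocess_review(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(tokens, vocabulary):
    """Turn a token list into a count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def fill_missing(records, column, default):
    """Replace missing (None or absent) values in `column` with `default`."""
    return [
        {**row, column: default if row.get(column) is None else row[column]}
        for row in records
    ]


tokens = preprocess_review("The room was very clean, and the staff were friendly!")
vector = bag_of_words(tokens, ["clean", "dirty", "friendly", "room"])
print(tokens)  # ['room', 'clean', 'staff', 'friendly']
print(vector)  # [1, 0, 1, 1]
```

A count vector is the simplest numerical representation; TF-IDF or embeddings are common alternatives when the vocabulary is large.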

Feature Engineering: Extract additional features from the dataset, such as review sentiment scores, review length, and any other relevant information. Engineer new features related to budget, location, and residence type, such as price range categories, geographical distance from landmarks, and one-hot encoding of residence types.
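
As a rough sketch of the engineered features listed above, the following computes a price-range category, a great-circle distance to a landmark, a one-hot residence-type vector, and a review length. The price thresholds, the category list, and the coordinates are hypothetical values chosen for illustration; they are not taken from the dataset.

```python
import math

# Hypothetical category list, following the residence types named in the text.
RESIDENCE_TYPES = ["hotel", "bed and breakfast", "specialty lodging"]


def price_range(price, low=50.0, high=150.0):
    """Bucket a nightly price; the thresholds are illustrative assumptions."""
    if price < low:
        return "budget"
    if price < high:
        return "mid-range"
    return "luxury"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def one_hot(value, categories):
    """Encode `value` as a 0/1 vector over `categories`."""
    return [1 if value == category else 0 for category in categories]


# Made-up hotel record and landmark position, for illustration only.
hotel = {"price": 100.0, "lat": 0.0, "lon": 0.0, "type": "hotel",
         "review": "Clean rooms and friendly staff"}
landmark = (0.0, 1.0)

features = {
    "price_range": price_range(hotel["price"]),
    "distance_km": haversine_km(hotel["lat"], hotel["lon"], *landmark),
    "type_one_hot": one_hot(hotel["type"], RESIDENCE_TYPES),
    "review_length": len(hotel["review"].split()),
}
print(features)
```

Sentiment scores are left out here because they depend on the choice of sentiment model or lexicon.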

Model Selection: Experiment with different supervised learning models, such as linear regression, decision trees, random forests, or neural networks, to find the best model for predicting hotel ratings considering customer reviews, budget, location, and residence type. Evaluate the models using appropriate evaluation metrics like mean squared error (MSE) or mean absolute error (MAE).
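
To make the two metrics concrete, here is a small sketch that computes MSE and MAE by hand for two hypothetical sets of predictions. The ratings and predictions are invented for the example. Both candidate models have the same MAE, but MSE penalizes the one large error more heavily, which is why the choice of metric can change which model looks best.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; sensitive to large errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented ratings and predictions, for illustration only.
y_true = [4.0, 3.0, 5.0, 2.0]
model_a = [4.5, 3.5, 4.5, 2.5]  # off by 0.5 on every hotel
model_b = [4.0, 3.0, 5.0, 4.0]  # exact on three hotels, off by 2.0 on one

for name, preds in [("A", model_a), ("B", model_b)]:
    mse = mean_squared_error(y_true, preds)
    mae = mean_absolute_error(y_true, preds)
    print(f"model {name}: MSE={mse:.2f}  MAE={mae:.2f}")
# model A: MSE=0.25  MAE=0.50
# model B: MSE=1.00  MAE=0.50
```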

Model Training and Evaluation: Split the dataset into training and testing sets. Train the selected model on the training set and evaluate its performance on the testing set. Fine-tune the model parameters to improve its accuracy. Perform cross-validation to assess the model's generalization capabilities.
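
One possible shape for this step, using scikit-learn, is sketched below. It runs on synthetic data rather than the TripAdvisor dataset, and the choice of a random forest, the parameter grid, and the 80/20 split are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the engineered features and the 1-5 rating target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.clip(3.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200), 1.0, 5.0)

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Tune hyperparameters with 5-fold cross-validation on the training set only.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test set.
test_mae = mean_absolute_error(y_test, search.best_estimator_.predict(X_test))

# Cross-validated MAE of the tuned configuration on the training data.
cv_scores = cross_val_score(
    search.best_estimator_, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
)
cv_mae = -cv_scores.mean()

print("best params:", search.best_params_)
print(f"cross-validated MAE: {cv_mae:.3f}")
print(f"test MAE: {test_mae:.3f}")
```

Keeping the test set out of the tuning loop means the test MAE remains an honest estimate of performance on unseen hotels.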

In [32]:
import pandas as pd
import json
import glob

In [15]:
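def read_json_files(json_files):
    """Load each JSON file into a DataFrame and concatenate them into one."""
    dfs = []
    for file in json_files:
        # Read as UTF-8 explicitly so the result does not depend on the OS default encoding
        with open(file, encoding="utf-8") as f:
            json_data = json.load(f)
        dfs.append(pd.DataFrame(json_data))

    # ignore_index=True renumbers the rows 0..n-1 across all files
    merged_df = pd.concat(dfs, ignore_index=True)
    return merged_df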



In [30]:
# Raw strings keep the Windows backslashes from being read as escape sequences
json_files = [r'Data\Egypt.json', r'Data\ethiopia.json', r'Data\Kenya.json', r'Data\Rwanda.json', r'Data\DRC.json', r'Data\Nigeria.json']
df = read_json_files(json_files)



In [31]:
df

Unnamed: 0,id,type,category,subcategories,name,locationString,description,image,photoCount,awards,...,hours,menuWebUrl,establishmentTypes,ownersTopReasons,rentalDescriptions,photos,bedroomInfo,bathroomInfo,bathCount,baseDailyRate
0,4022415,ATTRACTION,attraction,[Nightlife],Soho House Sharm El Sheikh,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",Welcome to Soho House Sharm El Sheikh! The bes...,https://media-cdn.tripadvisor.com/media/photo-...,119,[],...,,,,,,,,,,
1,19730066,ATTRACTION,attraction,"[Shopping, Museums]",Nobles Art Gallery,"Luxor, Nile River Valley",Nobles Art Gallery is the best store in Luxor ...,https://media-cdn.tripadvisor.com/media/photo-...,105,[],...,,,,,,,,,,
2,8011182,ATTRACTION,attraction,[Outdoor Activities],YallaHorse Riding,"El Gouna, Hurghada, Red Sea and Sinai",Riding in El Gouna is an unforgettable experie...,https://media-cdn.tripadvisor.com/media/photo-...,362,[],...,,,,,,,,,,
3,7371664,ATTRACTION,attraction,[Spas & Wellness],Mividaspa at Jaz Aquamarine Resort,"Hurghada, Red Sea and Sinai",Mividaspa is fast earning a top reputation due...,https://media-cdn.tripadvisor.com/media/photo-...,67,[],...,,,,,,,,,,
4,17523327,ATTRACTION,attraction,"[Other, Transportation]",Sharm Airport Transfers Karim,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",Airport transfer service safe reliable drivers...,https://media-cdn.tripadvisor.com/media/photo-...,25,[],...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11117,604362,HOTEL,hotel,[Hotel],Angeles Hotel,"Abuja, Federal Capital Territory",Welcome to Angeles Hotel located in a quite re...,https://media-cdn.tripadvisor.com/media/photo-...,3,[],...,,,,,,,,,,
11118,1788744,HOTEL,hotel,[Bed and Breakfast],Fariah Suites,"Bauchi, Bauchi State",,https://media-cdn.tripadvisor.com/media/photo-...,4,[],...,,,,,,,,,,
11119,1580022,HOTEL,hotel,[Bed and Breakfast],Precious Palm Royal Hotel,"Benin City, Edo State",See why so many travelers make Precious Palm R...,https://media-cdn.tripadvisor.com/media/photo-...,44,[],...,,,,,,,,,,
11120,13331370,HOTEL,hotel,[Hotel],Oaklands Hotel,"Enugu, Enugu State",,https://media-cdn.tripadvisor.com/media/photo-...,6,[],...,,,,,,,,,,


In [22]:
# data.to_csv(r"E:\Documents\data science\Capstone\data1")

In [23]:
from pandas_profiling import ProfileReport



In [24]:
import pandas_profiling


In [25]:
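# NOTE: `tanzania` is not defined in the cells shown here; it is presumably a DataFrame loaded in an earlier session
profile_trip = pandas_profiling.ProfileReport(tanzania)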
profile_trip.to_file("tanzania.html")
