# Predicting preferred destination based on taste and preference

The goal is to build a machine learning model that predicts hotel ratings from customer reviews, budget, location, and type of residence. The dataset, scraped from TripAdvisor, contains information about various hotels, including their ratings, reviews, amenities, pricing, geographical coordinates, and residence types (e.g., hotel, bed and breakfast, specialty lodging). By analyzing the review text together with these additional factors, the objective is to accurately predict the ratings of new, unseen hotels.

Approach:

Data Preprocessing: Clean and preprocess the text reviews by removing stopwords, punctuation, and performing tokenization. Convert the text data into a numerical representation suitable for modeling. Handle missing values, if any, in the budget, location, and residence type columns.
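As a minimal sketch of the text-cleaning step described above, the function below lowercases a review, strips punctuation, tokenizes on whitespace, and drops stopwords. The stopword set and the function name `preprocess_review` are illustrative assumptions, not part of the original notebook; a fuller stopword list from an NLP library would likely be used in practice.

```python
import string

# Tiny illustrative stopword list (an assumption); a real project would
# probably rely on a fuller list from an NLP library.
STOPWORDS = {"a", "an", "and", "is", "the", "to", "of", "was", "were", "in"}


def preprocess_review(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = preprocess_review("The animals were great, and the guide is passionate!")
print(tokens)  # ['animals', 'great', 'guide', 'passionate']
```

The resulting token lists could then be fed to whichever numerical representation is chosen (for example bag-of-words counts or TF-IDF).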

Feature Engineering: Extract additional features from the dataset, such as review sentiment scores, review length, and any other relevant information. Engineer new features related to budget, location, and residence type, such as price range categories, geographical distance from landmarks, and one-hot encoding of residence types.
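One of the features mentioned above, geographical distance from a landmark, can be computed from the `latitude` and `longitude` columns with the haversine formula. This is a sketch only: the function name `haversine_km` and the example coordinates are assumptions chosen for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Approximate, illustrative coordinates (not taken from the dataset).
landmark = (29.9792, 31.1342)
place = (30.0444, 31.2357)
distance = haversine_km(place[0], place[1], landmark[0], landmark[1])
print(f"{distance:.1f} km from the landmark")
```

Applied row by row, this would yield one distance column per landmark of interest.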

Model Selection: Experiment with different supervised learning models, such as linear regression, decision trees, random forests, or neural networks, to find the best model for predicting hotel ratings considering customer reviews, budget, location, and residence type. Evaluate the models using appropriate evaluation metrics like mean squared error (MSE) or mean absolute error (MAE).
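For reference, the two evaluation metrics named above can be written in a few lines of plain Python. The sample ratings below are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ratings, for illustration only.
actual = [5, 4, 3, 5]
predicted = [4, 4, 5, 5]

print(mean_squared_error(actual, predicted))   # (1 + 0 + 4 + 0) / 4 = 1.25
print(mean_absolute_error(actual, predicted))  # (1 + 0 + 2 + 0) / 4 = 0.75
```

MSE penalizes large errors more heavily than MAE, which is worth keeping in mind when ratings cluster near the top of the scale.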

Model Training and Evaluation: Split the dataset into training and testing sets. Train the selected model on the training set and evaluate its performance on the testing set. Fine-tune the model parameters to improve its accuracy. Perform cross-validation to assess the model's generalization capabilities.
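The split and cross-validation steps can be sketched with the standard library alone. The helper names, the 20% test fraction, and the seed are assumptions; a library such as scikit-learn offers equivalent utilities and would probably be used in practice.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(10)
for fold_train, fold_val in k_fold_indices(train_idx, k=4):
    print(len(fold_train), len(fold_val))
```

The test indices are set aside once, and cross-validation runs only on the training indices so the final test score stays untouched by tuning.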

In [None]:
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns

import json
import glob
import re

In [None]:
def read_json_files(json_files):
    dfs = []
    for file in json_files:
        with open(file) as f:
            json_data = json.load(f)
            df = pd.DataFrame(json_data)
            dfs.append(df)

    return pd.concat(dfs, ignore_index=True)
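A possible way to call this helper is to collect file paths with `glob` and pass them in. The sketch below restates the function so it runs on its own and uses two throwaway files; the file names and the `"data/*.json"` pattern mentioned in the comment are hypothetical.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def read_json_files(json_files):
    """Load each JSON file (a list of records) and stack them into one DataFrame."""
    frames = []
    for path in json_files:
        with open(path, encoding="utf-8") as f:
            frames.append(pd.DataFrame(json.load(f)))
    return pd.concat(frames, ignore_index=True)


# Demonstration with two temporary files; the real scraped files would be
# matched with a pattern such as "data/*.json" (hypothetical path).
with tempfile.TemporaryDirectory() as tmp_dir:
    samples = [[{"id": 1, "name": "A"}], [{"id": 2, "name": "B"}]]
    for i, records in enumerate(samples):
        with open(os.path.join(tmp_dir, f"part_{i}.json"), "w", encoding="utf-8") as f:
            json.dump(records, f)
    combined = read_json_files(sorted(glob.glob(os.path.join(tmp_dir, "*.json"))))

print(combined)
```

Sorting the paths keeps the row order reproducible between runs.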



In [41]:
df=pd.read_csv(r"C:\Users\User\Desktop\travel-destination-recommendation-sys\compiled_data.csv")
df


Unnamed: 0,id,type,category,subcategories,name,locationString,description,image,photoCount,awards,...,hours,menuWebUrl,establishmentTypes,ownersTopReasons,rentalDescriptions,photos,bedroomInfo,bathroomInfo,bathCount,baseDailyRate
0,4022415,ATTRACTION,attraction,['Nightlife'],Soho House Sharm El Sheikh,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",Welcome to Soho House Sharm El Sheikh! The bes...,https://media-cdn.tripadvisor.com/media/photo-...,119,[],...,,,,,,,,,,
1,19730066,ATTRACTION,attraction,"['Shopping', 'Museums']",Nobles Art Gallery,"Luxor, Nile River Valley",Nobles Art Gallery is the best store in Luxor ...,https://media-cdn.tripadvisor.com/media/photo-...,105,[],...,,,,,,,,,,
2,8011182,ATTRACTION,attraction,['Outdoor Activities'],YallaHorse Riding,"El Gouna, Hurghada, Red Sea and Sinai",Riding in El Gouna is an unforgettable experie...,https://media-cdn.tripadvisor.com/media/photo-...,362,[],...,,,,,,,,,,
3,7371664,ATTRACTION,attraction,['Spas & Wellness'],Mividaspa at Jaz Aquamarine Resort,"Hurghada, Red Sea and Sinai",Mividaspa is fast earning a top reputation due...,https://media-cdn.tripadvisor.com/media/photo-...,67,[],...,,,,,,,,,,
4,17523327,ATTRACTION,attraction,"['Other', 'Transportation']",Sharm Airport Transfers Karim,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",Airport transfer service safe reliable drivers...,https://media-cdn.tripadvisor.com/media/photo-...,25,[],...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35831,12233032,HOTEL,hotel,['Specialty Lodging'],Sandcreek Village,"Joal Fadiouth, La Petite Cote, Thies Region",,https://media-cdn.tripadvisor.com/media/partne...,0,[],...,,,,,,,,,,
35832,10071000,HOTEL,hotel,['Bed and Breakfast'],Chambres d'Hotes,"Nianing, La Petite Cote, Thies Region",,,0,[],...,,,,,,,,,,
35833,23686418,HOTEL,hotel,['Specialty Lodging'],Sessene,"Fatick, Fatick Region",,,0,[],...,,,,,,,,,,
35834,15756049,HOTEL,hotel,['Bed and Breakfast'],Havre de paix aux Almadie,"Ngor, Dakar, Dakar Region",,,0,[],...,,,,,,,,,,


In [42]:
df.isnull().sum()

id                   0
type                 0
category             0
subcategories     1339
name                 1
                 ...  
photos           34497
bedroomInfo      35133
bathroomInfo     34500
bathCount        34497
baseDailyRate    34568
Length: 65, dtype: int64

In [43]:
df.columns

Index(['id', 'type', 'category', 'subcategories', 'name', 'locationString',
       'description', 'image', 'photoCount', 'awards', 'rankingPosition',
       'rating', 'rawRanking', 'phone', 'address', 'addressObj', 'localName',
       'localAddress', 'localLangCode', 'email', 'latitude', 'longitude',
       'webUrl', 'website', 'rankingString', 'rankingDenominator',
       'neighborhoodLocations', 'nearestMetroStations', 'ancestorLocations',
       'ratingHistogram', 'numberOfReviews', 'reviewTags', 'reviews',
       'booking', 'offerGroup', 'subtype', 'hotelClass',
       'hotelClassAttribution', 'amenities', 'numberOfRooms', 'priceLevel',
       'priceRange', 'roomTips', 'checkInDate', 'checkOutDate', 'offers',
       'guideFeaturedInCopy', 'isClosed', 'isLongClosed', 'openNowText',
       'cuisines', 'mealTypes', 'dishes', 'features', 'dietaryRestrictions',
       'hours', 'menuWebUrl', 'establishmentTypes', 'ownersTopReasons',
       'rentalDescriptions', 'photos', 'bedroomInfo', '

In [44]:
df.reviewTags.value_counts()

[]

In [45]:
# Count null values per column and keep the columns with more than 10,000 nulls
null_counts = df.isnull().sum()
columns_above_threshold = null_counts[null_counts > 10000].index

# List the columns with more than 10,000 null values
list(columns_above_threshold)


['description',
 'phone',
 'localName',
 'localAddress',
 'localLangCode',
 'email',
 'website',
 'booking',
 'offerGroup',
 'subtype',
 'hotelClass',
 'hotelClassAttribution',
 'numberOfRooms',
 'priceLevel',
 'priceRange',
 'roomTips',
 'checkInDate',
 'checkOutDate',
 'offers',
 'guideFeaturedInCopy',
 'isClosed',
 'isLongClosed',
 'openNowText',
 'cuisines',
 'mealTypes',
 'dishes',
 'features',
 'dietaryRestrictions',
 'hours',
 'menuWebUrl',
 'establishmentTypes',
 'ownersTopReasons',
 'rentalDescriptions',
 'photos',
 'bedroomInfo',
 'bathroomInfo',
 'bathCount',
 'baseDailyRate']
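An absolute cutoff of 10,000 nulls corresponds to roughly 28% of the 35,836 rows. As a sketch, the same filter can be expressed as a fraction so that it still makes sense if the row count changes. The helper name and the toy frame are assumptions for illustration.

```python
import pandas as pd


def columns_above_null_fraction(frame, max_null_fraction=0.28):
    """Return the columns whose share of null values exceeds the threshold."""
    null_fraction = frame.isnull().mean()
    return list(null_fraction[null_fraction > max_null_fraction].index)


# Toy frame for illustration only.
toy = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "phone": [None, None, None, "123"],
        "rating": [4.5, None, 3.0, 5.0],
    }
)
print(columns_above_null_fraction(toy))  # ['phone']
```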

In [46]:
# Drop the columns flagged above: they contribute little to our objective,
# and they contain too many null values to fill reliably.
cols_to_drop = columns_above_threshold

df.drop(columns=cols_to_drop, inplace=True)

In [47]:
list(df.columns)

['id',
 'type',
 'category',
 'subcategories',
 'name',
 'locationString',
 'image',
 'photoCount',
 'awards',
 'rankingPosition',
 'rating',
 'rawRanking',
 'address',
 'addressObj',
 'latitude',
 'longitude',
 'webUrl',
 'rankingString',
 'rankingDenominator',
 'neighborhoodLocations',
 'nearestMetroStations',
 'ancestorLocations',
 'ratingHistogram',
 'numberOfReviews',
 'reviewTags',
 'reviews',
 'amenities']

In [48]:
df[['locationString','rankingPosition','rawRanking','rankingString','rankingDenominator']]

Unnamed: 0,locationString,rankingPosition,rawRanking,rankingString,rankingDenominator
0,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",2.0,4.349033,#2 of 45 Nightlife in Sharm El Sheikh,45.0
1,"Luxor, Nile River Valley",1.0,4.434324,#1 of 59 Shopping in Luxor,59.0
2,"El Gouna, Hurghada, Red Sea and Sinai",4.0,4.404173,#4 of 86 Outdoor Activities in El Gouna,86.0
3,"Hurghada, Red Sea and Sinai",1.0,4.362678,#1 of 35 Spas & Wellness in Hurghada,35.0
4,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",1.0,4.453663,#1 of 104 Transportation in Sharm El Sheikh,104.0
...,...,...,...,...,...
35831,"Joal Fadiouth, La Petite Cote, Thies Region",,,,
35832,"Nianing, La Petite Cote, Thies Region",,,,
35833,"Fatick, Fatick Region",,,,
35834,"Ngor, Dakar, Dakar Region",,,,


In [49]:
df[['name','rankingString', 'type']]

Unnamed: 0,name,rankingString,type
0,Soho House Sharm El Sheikh,#2 of 45 Nightlife in Sharm El Sheikh,ATTRACTION
1,Nobles Art Gallery,#1 of 59 Shopping in Luxor,ATTRACTION
2,YallaHorse Riding,#4 of 86 Outdoor Activities in El Gouna,ATTRACTION
3,Mividaspa at Jaz Aquamarine Resort,#1 of 35 Spas & Wellness in Hurghada,ATTRACTION
4,Sharm Airport Transfers Karim,#1 of 104 Transportation in Sharm El Sheikh,ATTRACTION
...,...,...,...
35831,Sandcreek Village,,HOTEL
35832,Chambres d'Hotes,,HOTEL
35833,Sessene,,HOTEL
35834,Havre de paix aux Almadie,,HOTEL


In [50]:


# Assuming your data is in a DataFrame called 'df' and the column is named 'rankingString'
# Create new columns
df['RankingType'] = ""
df['Location'] = ""
df['Numerator'] = ""
df['Denominator'] = ""

# Iterate through the rows and extract the information
for index, row in df.iterrows():
    # Check if the value is NaN
    if pd.isnull(row['rankingString']):
        continue

    if match := re.match(
        r'#(\d+)\s+of\s+(\d+)\s+(.*?)\s+in\s+(.*?)$', row['rankingString']
    ):
        numerator = match.group(1)
        denominator = match.group(2)
        ranking_type = match.group(3)
        location = match.group(4)

        # Update the new columns
        df.at[index, 'RankingType'] = ranking_type
        df.at[index, 'Location'] = location
        df.at[index, 'Numerator'] = numerator
        df.at[index, 'Denominator'] = denominator
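The row-by-row loop above works, but on about 36,000 rows a vectorized alternative may be faster and shorter. The sketch below uses `Series.str.extract` with named groups on a toy frame. One difference to be aware of: rows that do not match end up as NaN rather than the empty string used in the loop.

```python
import pandas as pd

# Same pattern as the loop, with named groups that become column names.
RANKING_PATTERN = (
    r"#(?P<Numerator>\d+)\s+of\s+(?P<Denominator>\d+)\s+"
    r"(?P<RankingType>.*?)\s+in\s+(?P<Location>.*)$"
)

# Toy frame for illustration; the strings are taken from the output above.
toy = pd.DataFrame(
    {
        "rankingString": [
            "#2 of 45 Nightlife in Sharm El Sheikh",
            "#1 of 59 Shopping in Luxor",
            None,
        ]
    }
)

parts = toy["rankingString"].str.extract(RANKING_PATTERN)
toy = toy.join(parts)
print(toy[["RankingType", "Location", "Numerator", "Denominator"]])
```

If the empty-string convention is preferred, `parts.fillna("")` before the join would restore it.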



In [51]:
df.RankingType.value_counts()

                             9930
Specialty lodging            7287
B&Bs / Inns                  6045
hotels                       4718
things to do                 3263
Outdoor Activities           1298
Tours                         693
Boat Tours & Water Sports     558
Transportation                532
places to eat                 326
hotel                         243
B&B / Inn                     239
Shopping                      162
Food & Drink                  161
Nightlife                     126
Spas & Wellness               115
Fun & Games                    73
Classes & Workshops            37
Nature & Parks                 12
Museums                         8
Concerts & Shows                7
Water & Amusement Parks         1
Sights & Landmarks              1
Traveler Resources              1
Name: RankingType, dtype: int64

In [52]:
df.columns

Index(['id', 'type', 'category', 'subcategories', 'name', 'locationString',
       'image', 'photoCount', 'awards', 'rankingPosition', 'rating',
       'rawRanking', 'address', 'addressObj', 'latitude', 'longitude',
       'webUrl', 'rankingString', 'rankingDenominator',
       'neighborhoodLocations', 'nearestMetroStations', 'ancestorLocations',
       'ratingHistogram', 'numberOfReviews', 'reviewTags', 'reviews',
       'amenities', 'RankingType', 'Location', 'Numerator', 'Denominator'],
      dtype='object')

After splitting the rankingString column into its component parts, we observe below that the new RankingType column contains values that are similar but labelled differently (for example, 'hotel' and 'hotels').

In [53]:
df.RankingType.value_counts()

                             9930
Specialty lodging            7287
B&Bs / Inns                  6045
hotels                       4718
things to do                 3263
Outdoor Activities           1298
Tours                         693
Boat Tours & Water Sports     558
Transportation                532
places to eat                 326
hotel                         243
B&B / Inn                     239
Shopping                      162
Food & Drink                  161
Nightlife                     126
Spas & Wellness               115
Fun & Games                    73
Classes & Workshops            37
Nature & Parks                 12
Museums                         8
Concerts & Shows                7
Water & Amusement Parks         1
Sights & Landmarks              1
Traveler Resources              1
Name: RankingType, dtype: int64

We then merge similar values to reduce the number of distinct categories in the column.

In [54]:
# Define the mappings to combine similar values
mappings = {
    'hotel': 'hotels',
    'B&B / Inn': 'B&Bs / Inns',
    'Sights & Landmarks': 'Nature & Parks',
    'Fun & Games': 'Outdoor Activities',
    'Boat Tours & Water Sports': 'Water & Amusement Parks',
    'Traveler Resources': 'Shopping',
    'Concerts & Shows': 'Nightlife',
    'Food & Drink': 'places to eat',
    'Nature & Parks': 'things to do',
    'Museums': 'things to do',
    'Tours' : 'things to do',
    'Outdoor Activities': 'things to do',
    'B&Bs / Inns': 'Specialty lodging'
}

# Replace the values in the 'RankingType' column
df['RankingType'] = df['RankingType'].replace(mappings)
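Several targets in the mapping are themselves keys (for example 'Sights & Landmarks' maps to 'Nature & Parks', which in turn maps to 'things to do'). A single lookup pass is not guaranteed to follow such chains, so if the intent is for every value to land on its final category, the mapping can be flattened first. The helper below is a sketch under that assumption; `resolve_chains` is a hypothetical name, and only a subset of the mapping is shown.

```python
def resolve_chains(mappings):
    """Follow each mapping to its final target so one lookup is enough."""
    resolved = {}
    for source, target in mappings.items():
        seen = {source}  # guards against accidental cycles
        while target in mappings and target not in seen:
            seen.add(target)
            target = mappings[target]
        resolved[source] = target
    return resolved


# Subset of the mapping above, for illustration.
mappings = {
    "hotel": "hotels",
    "B&B / Inn": "B&Bs / Inns",
    "Sights & Landmarks": "Nature & Parks",
    "Nature & Parks": "things to do",
    "B&Bs / Inns": "Specialty lodging",
}

flat = resolve_chains(mappings)
print(flat["Sights & Landmarks"])  # things to do
print(flat["B&B / Inn"])           # Specialty lodging
```

The flattened dictionary could then be passed to the same `replace` call in place of the original mapping.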

In [55]:
df

Unnamed: 0,id,type,category,subcategories,name,locationString,image,photoCount,awards,rankingPosition,...,ancestorLocations,ratingHistogram,numberOfReviews,reviewTags,reviews,amenities,RankingType,Location,Numerator,Denominator
0,4022415,ATTRACTION,attraction,['Nightlife'],Soho House Sharm El Sheikh,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",https://media-cdn.tripadvisor.com/media/photo-...,119,[],2.0,...,"[{'id': '297555', 'name': 'Sharm El Sheikh', '...","{'count1': 1, 'count2': 3, 'count3': 4, 'count...",198,"[{'text': 'nice cocktails', 'reviews': 4}, {'t...",[],,Nightlife,Sharm El Sheikh,2,45
1,19730066,ATTRACTION,attraction,"['Shopping', 'Museums']",Nobles Art Gallery,"Luxor, Nile River Valley",https://media-cdn.tripadvisor.com/media/photo-...,105,[],1.0,...,"[{'id': '294205', 'name': 'Luxor', 'abbreviati...","{'count1': 0, 'count2': 1, 'count3': 0, 'count...",211,"[{'text': 'winter palace', 'reviews': 16}, {'t...",[],,Shopping,Luxor,1,59
2,8011182,ATTRACTION,attraction,['Outdoor Activities'],YallaHorse Riding,"El Gouna, Hurghada, Red Sea and Sinai",https://media-cdn.tripadvisor.com/media/photo-...,362,[],4.0,...,"[{'id': '297548', 'name': 'El Gouna', 'abbrevi...","{'count1': 0, 'count2': 1, 'count3': 1, 'count...",269,"[{'text': 'well taken care', 'reviews': 10}, {...",[],,things to do,El Gouna,4,86
3,7371664,ATTRACTION,attraction,['Spas & Wellness'],Mividaspa at Jaz Aquamarine Resort,"Hurghada, Red Sea and Sinai",https://media-cdn.tripadvisor.com/media/photo-...,67,[],1.0,...,"[{'id': '297549', 'name': 'Hurghada', 'abbrevi...","{'count1': 1, 'count2': 1, 'count3': 5, 'count...",372,"[{'text': 'indian head massage', 'reviews': 2}...",[],,Spas & Wellness,Hurghada,1,35
4,17523327,ATTRACTION,attraction,"['Other', 'Transportation']",Sharm Airport Transfers Karim,"Sharm El Sheikh, South Sinai, Red Sea and Sinai",https://media-cdn.tripadvisor.com/media/photo-...,25,[],1.0,...,"[{'id': '297555', 'name': 'Sharm El Sheikh', '...","{'count1': 1, 'count2': 1, 'count3': 1, 'count...",351,"[{'text': 'always on time', 'reviews': 31}, {'...",[],,Transportation,Sharm El Sheikh,1,104
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35831,12233032,HOTEL,hotel,['Specialty Lodging'],Sandcreek Village,"Joal Fadiouth, La Petite Cote, Thies Region",https://media-cdn.tripadvisor.com/media/partne...,0,[],,...,"[{'id': '1019207', 'name': 'Joal Fadiouth', 'a...","{'count1': 0, 'count2': 0, 'count3': 0, 'count...",0,[],[],[],,,,
35832,10071000,HOTEL,hotel,['Bed and Breakfast'],Chambres d'Hotes,"Nianing, La Petite Cote, Thies Region",,0,[],,...,"[{'id': '1858023', 'name': 'Nianing', 'abbrevi...","{'count1': 0, 'count2': 0, 'count3': 0, 'count...",0,[],[],[],,,,
35833,23686418,HOTEL,hotel,['Specialty Lodging'],Sessene,"Fatick, Fatick Region",,0,[],,...,"[{'id': '12552042', 'name': 'Fatick', 'abbrevi...","{'count1': 0, 'count2': 0, 'count3': 0, 'count...",0,[],[],[],,,,
35834,15756049,HOTEL,hotel,['Bed and Breakfast'],Havre de paix aux Almadie,"Ngor, Dakar, Dakar Region",,0,[],,...,"[{'id': '1632916', 'name': 'Ngor', 'abbreviati...","{'count1': 0, 'count2': 0, 'count3': 0, 'count...",0,[],[],[],,,,


In [56]:
df.RankingType

0              Nightlife
1               Shopping
2           things to do
3        Spas & Wellness
4         Transportation
              ...       
35831                   
35832                   
35833                   
35834                   
35835                   
Name: RankingType, Length: 35836, dtype: object

In [57]:
empty_rows = df[df['RankingType'].isnull() | df['RankingType'].eq('')]
empty_rows[['RankingType', 'name', 'type']]


Unnamed: 0,RankingType,name,type
25,,Let's Explore Egypt,ATTRACTION
26,,Egypt Tailor Made Day Tours,ATTRACTION
47,,Emo Tours Egypt,ATTRACTION
88,,Deluxe Tours Egypt,ATTRACTION
100,,Deluxe Travel Egypt,ATTRACTION
...,...,...,...
35831,,Sandcreek Village,HOTEL
35832,,Chambres d'Hotes,HOTEL
35833,,Sessene,HOTEL
35834,,Havre de paix aux Almadie,HOTEL


In [58]:
speciality_lodging_rows = empty_rows[empty_rows['type'] == 'HOTEL'][['RankingType', 'name', 'type']]
speciality_lodging_rows

Unnamed: 0,RankingType,name,type
2111,,Onaty Ka Guest House,HOTEL
3869,,Markan Guest House and Pension,HOTEL
3870,,Sodere Hotel - Au,HOTEL
3872,,Green Garden Guesthouse,HOTEL
3875,,TantosTina Hotel,HOTEL
...,...,...,...
35831,,Sandcreek Village,HOTEL
35832,,Chambres d'Hotes,HOTEL
35833,,Sessene,HOTEL
35834,,Havre de paix aux Almadie,HOTEL


In [59]:
null_values = df[df['RankingType'].isna()]
null_values

Unnamed: 0,id,type,category,subcategories,name,locationString,image,photoCount,awards,rankingPosition,...,ancestorLocations,ratingHistogram,numberOfReviews,reviewTags,reviews,amenities,RankingType,Location,Numerator,Denominator


In [60]:
# Fill missing amenities for restaurants with the placeholder 'restaurant'
df.loc[(df['type'] == 'RESTAURANT') & (df['amenities'].isna()), 'amenities'] = 'restaurant'


In [61]:
# Fill missing amenities for attractions with the placeholder 'bathroom only'
df.loc[(df['type'] == 'ATTRACTION') & (df['amenities'].isna()), 'amenities'] = 'bathroom only'

In [62]:
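# Caution: values read back from CSV appear to be strings, not lists, so every row (including the fills above) becomes ''
df['amenities'] = df['amenities'].apply(lambda x: ', '.join(x) if isinstance(x, list) else '')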


In [63]:
df['amenities'].isnull().value_counts()

False    35836
Name: amenities, dtype: int64

In [64]:
df['amenities'].isna().value_counts()

False    35836
Name: amenities, dtype: int64

In [65]:

restaurant_rows = df[df['type'] == 'RESTAURANT']
restaurant_amenities = restaurant_rows['amenities']
restaurant_amenities

7648      
7649      
7650      
7651      
7653      
        ..
26858     
26860     
26862     
26896     
26898     
Name: amenities, Length: 416, dtype: object

In [66]:
df[['type', 'amenities']]

Unnamed: 0,type,amenities
0,ATTRACTION,
1,ATTRACTION,
2,ATTRACTION,
3,ATTRACTION,
4,ATTRACTION,
...,...,...
35831,HOTEL,
35832,HOTEL,
35833,HOTEL,
35834,HOTEL,
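The amenities column is empty for every row shown above, including the restaurant and attraction rows that were filled with placeholders earlier. A likely cause is that `read_csv` returns list-valued cells as strings such as `"[]"`, so the `isinstance(x, list)` check never succeeds. The sketch below parses such strings with `ast.literal_eval` and keeps plain labels intact. The helper name and the sample amenity values are assumptions for illustration.

```python
import ast


def amenities_to_text(value):
    """Turn a list, or the string form of a list, into 'a, b, c'."""
    if isinstance(value, list):
        return ", ".join(str(item) for item in value)
    if isinstance(value, str):
        stripped = value.strip()
        if stripped.startswith("["):
            try:
                parsed = ast.literal_eval(stripped)
            except (ValueError, SyntaxError):
                return stripped  # leave unparseable text untouched
            return ", ".join(str(item) for item in parsed)
        return stripped  # plain labels such as 'restaurant' are kept
    return ""  # NaN / None


# Hypothetical sample values.
print(amenities_to_text("['Wifi', 'Pool']"))  # Wifi, Pool
print(amenities_to_text("restaurant"))        # restaurant
```

It could be applied with `df['amenities'].apply(amenities_to_text)` in place of the lambda.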


## Reviews dataset

In [70]:
data=pd.read_csv(r"C:\Users\User\Desktop\CAPSTONE!!!\travel-destination-recommendation-sys\reviews_data.csv")
data

Unnamed: 0.1,Unnamed: 0,id,url,title,lang,locationId,publishedDate,publishedPlatform,rating,helpfulVotes,travelDate,text,user,ownerResponse,subratings,machineTranslated,machineTranslatable,photos,placeInfo
0,0,863480416,https://www.tripadvisor.com/ShowUserReviews-g2...,Must See,en,2189822,2022-10-06T20:13:49-04:00,Desktop,5,0,2022-09,Gee is a passionate tour guide. The animals a...,"{'userId': 'A87669AAD9DA05FFBD46F1334B329FFD',...",,[],False,False,[],"{'id': '2189822', 'name': 'CARACAL Biodiversit..."
1,1,856328161,https://www.tripadvisor.com/ShowUserReviews-g2...,Great tour,en,2189822,2022-08-25T06:53:49-04:00,Desktop,5,0,2022-08,Lots to see. Easy to get to from the Safari Lo...,"{'userId': '9FFED7DDC68883BBB8F4024333970E9A',...",,[],False,False,[],"{'id': '2189822', 'name': 'CARACAL Biodiversit..."
2,2,847451595,https://www.tripadvisor.com/ShowUserReviews-g2...,zoo for conserved animals and birds,en,2189822,2022-07-11T23:28:07-04:00,Desktop,4,0,2022-07,a kind of a zoo for injured and saved animals ...,"{'userId': 'B5E56A483B579518DDD82A3DA0E94487',...",,[],False,False,"[{'id': '613423342', 'locations': [{'name': 'C...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
3,3,731343195,https://www.tripadvisor.com/ShowUserReviews-g2...,Great place to see some of the smaller wildlif...,en,2189822,2019-12-08T03:54:09-05:00,Desktop,5,0,2019-11,They do great rehabilitating injured animals. ...,"{'userId': '882D0A6C7152105BB0D83C84F3CB160D',...",,[],False,False,"[{'id': '440235996', 'locations': [{'name': 'C...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
4,4,720927932,https://www.tripadvisor.com/ShowUserReviews-g2...,Worth it just to play with Badgy,en,2189822,2019-10-24T03:50:09-04:00,Mobile,5,1,2019-10,"We took the guided tour from Isaac, who was gr...","{'userId': '203EBC7F3F51AAAA39A87D2E58842C76',...",,[],False,False,"[{'id': '432829268', 'locations': [{'name': 'C...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
78681,78681,157601077,https://www.tripadvisor.com/ShowUserReviews-g7...,Ideal for quiet relaxation.,en,2720312,2013-04-13T07:01:47-04:00,Desktop,5,1,2013-04,Nothing I disliked.\nStopped her for an excell...,"{'userId': 'DE7F7643421284F6E26B31283D2CCB85',...",,[],False,False,[],"{'id': '2720312', 'name': 'Igongo Cultural Cen..."
78682,78682,135682898,https://www.tripadvisor.com/ShowUserReviews-g7...,Eriijukiro - the wonderful Cultural Centre at ...,en,2720312,2012-07-29T05:01:29-04:00,Desktop,5,3,2012-07,"For travellers to the south west of Uganda, th...","{'userId': 'D9205755480636B049F9DAFB8BE6FF12',...",,[],False,False,"[{'id': '45180838', 'locations': [{'name': 'Ig...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."
78683,78683,129461615,https://www.tripadvisor.com/ShowUserReviews-g7...,A very pleasant stop outside Mbarara!,en,2720312,2012-05-06T08:45:16-04:00,Desktop,4,3,2012-04,Located just a few miles outside Mbarara and h...,"{'userId': 'FB3E9894020549D01D0468808AE93A5C',...",,[],False,False,"[{'id': '41254017', 'locations': [{'name': 'Ig...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."
78684,78684,331367989,https://www.tripadvisor.com/ShowUserReviews-g7...,Igongo,tr,2720312,2015-12-08T00:45:50-05:00,Desktop,4,2,2015-11,Çok kaliteli ve temiz bir tesis . Yemekleri ço...,"{'userId': '8C31D999A4FC2AB2DC9824E3FFF82BE6',...",,[],False,False,"[{'id': '162787052', 'locations': [{'name': 'I...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."


In [73]:
cols_to_drop=['machineTranslated','machineTranslatable','photos','ownerResponse','subratings','publishedDate','publishedPlatform','url','helpfulVotes','travelDate']
data=data.drop(columns=cols_to_drop)
data

Unnamed: 0.1,Unnamed: 0,id,title,lang,locationId,rating,text,user,placeInfo
0,0,863480416,Must See,en,2189822,5,Gee is a passionate tour guide. The animals a...,"{'userId': 'A87669AAD9DA05FFBD46F1334B329FFD',...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
1,1,856328161,Great tour,en,2189822,5,Lots to see. Easy to get to from the Safari Lo...,"{'userId': '9FFED7DDC68883BBB8F4024333970E9A',...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
2,2,847451595,zoo for conserved animals and birds,en,2189822,4,a kind of a zoo for injured and saved animals ...,"{'userId': 'B5E56A483B579518DDD82A3DA0E94487',...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
3,3,731343195,Great place to see some of the smaller wildlif...,en,2189822,5,They do great rehabilitating injured animals. ...,"{'userId': '882D0A6C7152105BB0D83C84F3CB160D',...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
4,4,720927932,Worth it just to play with Badgy,en,2189822,5,"We took the guided tour from Isaac, who was gr...","{'userId': '203EBC7F3F51AAAA39A87D2E58842C76',...","{'id': '2189822', 'name': 'CARACAL Biodiversit..."
...,...,...,...,...,...,...,...,...,...
78681,78681,157601077,Ideal for quiet relaxation.,en,2720312,5,Nothing I disliked.\nStopped her for an excell...,"{'userId': 'DE7F7643421284F6E26B31283D2CCB85',...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."
78682,78682,135682898,Eriijukiro - the wonderful Cultural Centre at ...,en,2720312,5,"For travellers to the south west of Uganda, th...","{'userId': 'D9205755480636B049F9DAFB8BE6FF12',...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."
78683,78683,129461615,A very pleasant stop outside Mbarara!,en,2720312,4,Located just a few miles outside Mbarara and h...,"{'userId': 'FB3E9894020549D01D0468808AE93A5C',...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."
78684,78684,331367989,Igongo,tr,2720312,4,Çok kaliteli ve temiz bir tesis . Yemekleri ço...,"{'userId': '8C31D999A4FC2AB2DC9824E3FFF82BE6',...","{'id': '2720312', 'name': 'Igongo Cultural Cen..."


In [75]:
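# Caution: placeInfo is read from CSV as a string, so isinstance(x, dict) is False and placeid stays empty (see output)
data['placeid'] = data['placeInfo'].apply(lambda x: x.get('id') if isinstance(x, dict) else None)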
data.head()

Unnamed: 0.1,Unnamed: 0,id,title,lang,locationId,rating,text,user,placeInfo,placeid
0,0,863480416,Must See,en,2189822,5,Gee is a passionate tour guide. The animals a...,"{'userId': 'A87669AAD9DA05FFBD46F1334B329FFD',...","{'id': '2189822', 'name': 'CARACAL Biodiversit...",
1,1,856328161,Great tour,en,2189822,5,Lots to see. Easy to get to from the Safari Lo...,"{'userId': '9FFED7DDC68883BBB8F4024333970E9A',...","{'id': '2189822', 'name': 'CARACAL Biodiversit...",
2,2,847451595,zoo for conserved animals and birds,en,2189822,4,a kind of a zoo for injured and saved animals ...,"{'userId': 'B5E56A483B579518DDD82A3DA0E94487',...","{'id': '2189822', 'name': 'CARACAL Biodiversit...",
3,3,731343195,Great place to see some of the smaller wildlif...,en,2189822,5,They do great rehabilitating injured animals. ...,"{'userId': '882D0A6C7152105BB0D83C84F3CB160D',...","{'id': '2189822', 'name': 'CARACAL Biodiversit...",
4,4,720927932,Worth it just to play with Badgy,en,2189822,5,"We took the guided tour from Isaac, who was gr...","{'userId': '203EBC7F3F51AAAA39A87D2E58842C76',...","{'id': '2189822', 'name': 'CARACAL Biodiversit...",
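The placeid column above is empty in every row shown, most likely because placeInfo holds the string form of a dictionary after the CSV round trip. A sketch of one fix is to parse the string with `ast.literal_eval` before reading the key. The helper name is hypothetical, and the `name` value in the sample is a placeholder rather than the real (truncated) value.

```python
import ast


def extract_place_id(place_info):
    """Return the 'id' field from a dict or from its string representation."""
    if isinstance(place_info, str):
        try:
            place_info = ast.literal_eval(place_info)
        except (ValueError, SyntaxError):
            return None
    if isinstance(place_info, dict):
        return place_info.get("id")
    return None  # NaN, None, or any other type


# Sample shaped like the placeInfo cells; the name is a placeholder.
sample = "{'id': '2189822', 'name': 'Example Place'}"
print(extract_place_id(sample))  # 2189822
```

It could be applied with `data['placeInfo'].apply(extract_place_id)`. In the rows displayed, the existing locationId column appears to carry the same identifier, so it may serve as a cross-check.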
