# Language_association

This step identifies the language of each review using Python's langdetect library, which supports 55 languages and returns ISO 639-1 codes. The text of the review is passed to langdetect, which returns the code of the language it is written in. The result is an updated reviews table with a new Language column holding the code detected for each review.
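
Detection can fail on empty or purely numeric reviews, and langdetect's results are not deterministic unless a seed is fixed. The following is a minimal sketch of a defensive wrapper, not the code used in this notebook. The names `detect_language` and `make_langdetect_detector` are hypothetical, and the detector is passed in as a parameter so the wrapper can be tried without langdetect installed.

```python
from typing import Callable, Optional


def detect_language(text, detector: Callable[[str], str],
                    default: Optional[str] = None) -> Optional[str]:
    """Return the detector's language code, or `default` if detection is not possible."""
    # Missing values and whitespace-only reviews carry no signal to detect.
    if not isinstance(text, str) or not text.strip():
        return default
    try:
        return detector(text)
    except Exception:
        # langdetect raises LangDetectException when the text has no usable features.
        return default


def make_langdetect_detector(seed: int = 0) -> Callable[[str], str]:
    """Build a reproducible langdetect detector (assumes langdetect is installed)."""
    from langdetect import DetectorFactory, detect
    DetectorFactory.seed = seed  # fixing the seed makes repeated runs agree
    return detect
```

With langdetect available, `detect_language(review_text, make_langdetect_detector())` would return the detected code for a review, or `None` when the text cannot be classified.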

In [13]:
import pandas as pd
import numpy as np
from langdetect import detect
import sys
import codecs
from googletrans import Translator
df_reviews = pd.read_json("data/reviews_with_geo.json")
df_reviews

Unnamed: 0,Name,City,Rating,Review,Hometown,Date_of_stay,Trip_type,geonames_id
0,Delle Vittorie Luxury Rooms & Suites,Palermo,5,Excellent facilities in a spacious luxurious l...,Regno Unito,settembre 2019,,2635167
1,Bed and Breakfast Il Rifugio,Caltanissetta,5,location molto comoda...Ti permette di postegg...,Italia,settembre 2019,Ha viaggiato con amici,3175395
2,Delle Vittorie Luxury Rooms & Suites,Palermo,5,estuvimos alojados 4 noches y no puedo poner n...,Spagna,febbraio 2020,,2510769
3,Garibaldi R&B,Messina,5,Graziosissimo R&B con vista sullo stretto. Ott...,Italia,gennaio 2020,Ha viaggiato per affari,3175395
4,Bed and Breakfast Il Rifugio,Caltanissetta,5,Arrivato in questo B&B quasi X caso e cioè gra...,Italia,giugno 2019,Ha viaggiato da solo,3175395
...,...,...,...,...,...,...,...,...
195,Bed & Breakfast Giulia,Caltanissetta,4,Sono stato con la mia famiglia dal 29 dicembre...,Italia,dicembre 2017,Ha viaggiato con la famiglia,3175395
196,Delle Vittorie Luxury Rooms & Suites,Palermo,5,Ho soggiornato più volte presso la struttura e...,Italia,settembre 2019,Ha viaggiato per affari,3175395
197,Garibaldi R&B,Messina,4,"Ci sono stato più volte, con la famiglia e da ...",Italia,ottobre 2015,Ha viaggiato con la famiglia,3175395
198,Bed & Breakfast Giulia,Caltanissetta,3,"Premetto che sono stato ospite di amici, quind...",Italia,settembre 2017,,3175395


In [14]:
df_reviews.insert(8, 'Language', '')
df_reviews.insert(6, 'TextEn', '')

For the subsequent analysis of the data, including a comparison with the services, all available reviews had to be translated into English. The reviews are stored in a table in which each review has an ID, the reviewer's country of origin, the city where the accommodation is located, the rating, the type of trip and the date of stay. The text of each review is translated into English with the Python googletrans library, an unofficial client for the Google Translate web service. The English text is stored in the same DataFrame, next to the review in its original language, in a new column (TextEn).

In [None]:
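translator = Translator()
for i in df_reviews.index:
    text = df_reviews.loc[i, 'Review']
    # Detect the language code of the original review
    lang = detect(text)
    # Assign through a single .loc call; chained indexing (df.col.loc[i] = ...) may write to a copy
    df_reviews.loc[i, 'Language'] = lang
    # Translate into English, passing the detected language as the source
    translation = translator.translate(text, dest='en', src=lang)
    df_reviews.loc[i, 'TextEn'] = translation.text
df_reviews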
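
The loop above sends one request per review, including reviews that are already in English. As a rough sketch of one way to reduce requests and survive service errors (not the approach used in this notebook), the hypothetical helper `translate_reviews` below skips reviews already in the target language, reuses results for repeated texts, and returns `None` where translation fails. The `translate` callable is an assumption: any function taking `(text, source_lang)` and returning the translated string.

```python
from typing import Callable, Dict, List, Optional, Tuple


def translate_reviews(texts: List[str],
                      langs: List[Optional[str]],
                      translate: Callable[[str, str], str],
                      target: str = "en") -> List[Optional[str]]:
    """Translate each text into `target`; None marks entries that could not be translated."""
    cache: Dict[Tuple[str, str], str] = {}
    result: List[Optional[str]] = []
    for text, lang in zip(texts, langs):
        if lang == target:
            result.append(text)   # already in the target language, no request needed
            continue
        if lang is None:
            result.append(None)   # language unknown, leave for later inspection
            continue
        key = (lang, text)
        if key not in cache:
            try:
                cache[key] = translate(text, lang)
            except Exception:
                result.append(None)  # network or service error, do not cache the failure
                continue
        result.append(cache[key])
    return result

# Possible wiring with a synchronous googletrans Translator (an assumption, not tested here):
# translate = lambda text, lang: translator.translate(text, src=lang, dest="en").text
```

The returned list has one entry per review and could be assigned to the TextEn column in a single step.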

In [None]:
df_reviews.to_json("data/reviews_translated.json")