### Airbnb Listings Reviews English Translation

**Goal**: Generate a new CSV file that contains all Airbnb listing reviews translated into English.

**Note**: Use this notebook together with the 4.3 Sentiment Analysis portion of the Airbnb Price Prediction Model (G2T5 FINAL Submission) Jupyter notebook.

**Install dependencies**

In [1]:
!pip install pandas
!pip install deep-translator
# datetime is part of the Python standard library, so it does not need to be installed with pip
import warnings
warnings.filterwarnings('ignore')

**Load Airbnb Listing dataset**

In [2]:
import pandas as pd
import os

In [3]:
df = pd.read_csv("Reviews for Airbnb listings.csv")
df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,49091,8243238,21/10/2013,8557223,Jared,Fran was absolutely gracious and welcoming. Ma...
1,50646,11909864,18/4/2014,1356099,James,A comfortable room in a smart condo developmen...
2,50646,13823948,5/6/2014,15222393,Welli,Stayed over at Sujatha's house for 3 good nigh...
3,50646,15117222,2/7/2014,5543172,Cyril,It's been a lovely stay at Sujatha's. The room...
4,50646,15426462,8/7/2014,817532,Jake,"We had a great experience. A nice place, an am..."


In [4]:
df.dtypes

listing_id        int64
id                int64
date             object
reviewer_id       int64
reviewer_name    object
comments         object
dtype: object

In [5]:
df.shape

(100919, 6)

**Translation Testing**

Check that the translation package works by translating a simple German sentence into English.

In [6]:
from deep_translator import GoogleTranslator
to_translate = 'Ich möchte diesen Text übersetzen'
translated = GoogleTranslator(source='auto', target='en').translate(to_translate)
translated
# output -> I want to translate this text

'I want to translate this text'

**Make sure that there are no null values in the reviews**

In [7]:
df = df.dropna().reset_index(drop=True)
df['comments'].isnull().sum()

0

In [8]:
df.shape

(100839, 6)

**Manually remove empty strings**

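Some reviews are effectively empty strings, which are not treated as null values and so survive `dropna()`. They still need to be handled manually: the cells below add an empty `translated` column, set the affected reviews to `''` by row index, and truncate two very long reviews to 4,999 characters.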

In [9]:
df['translated'] = ''

In [10]:
df.head(10)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments,translated
0,49091,8243238,21/10/2013,8557223,Jared,Fran was absolutely gracious and welcoming. Ma...,
1,50646,11909864,18/4/2014,1356099,James,A comfortable room in a smart condo developmen...,
2,50646,13823948,5/6/2014,15222393,Welli,Stayed over at Sujatha's house for 3 good nigh...,
3,50646,15117222,2/7/2014,5543172,Cyril,It's been a lovely stay at Sujatha's. The room...,
4,50646,15426462,8/7/2014,817532,Jake,"We had a great experience. A nice place, an am...",
5,50646,15552912,11/7/2014,10942382,Subba,Quiet condo. Comfortable stay and good location.,
6,50646,15884470,17/7/2014,17569265,Claire,Nice room and friendly stay. Kindely and smili...,
7,50646,16123989,22/7/2014,17188672,Hana,"Suja and her husband are really nice, amazing,...",
8,50646,16632638,30/7/2014,18067306,Liz,Sujatha is a wonderful host and gives us a lot...,
9,50646,16729657,1/8/2014,9211315,Derrick,A wonderful experience & highly recommended! S...,


In [11]:
# Use .loc for assignment; chained indexing (df['comments'][i] = ...) may write to a copy
df.loc[54121, 'comments'] = ''

In [12]:
df.loc[77037, 'comments'] = df.loc[77037, 'comments'][:4999]

In [13]:
df.loc[84521, 'comments'] = ''

In [14]:
df.loc[89765, 'comments'] = ''

In [15]:
df.loc[97417, 'comments'] = ''

In [16]:
df.loc[98252, 'comments'] = df.loc[98252, 'comments'][:4999]
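
The row indices above are hard-coded for this particular file. As a hedged alternative, the same cleanup can be sketched as a small helper applied to every review. The function name `prepare_comment` and its `max_chars` default are assumptions for illustration; the 5,000-character limit is inferred from the 4,999-character truncation used above, not taken from the package documentation.

```python
def prepare_comment(text, max_chars=5000):
    """Return a review ready for translation (hypothetical helper).

    - Non-string values (for example NaN) and whitespace-only strings become ''.
    - Longer reviews are cut to max_chars - 1 characters, mirroring the
      [:4999] truncation used in the cells above.
    """
    if not isinstance(text, str):
        return ''
    text = text.strip()
    return text[:max_chars - 1]


# Small demonstration on plain Python values
samples = ['  Great stay!  ', '   ', None, 'x' * 6000]
cleaned = [prepare_comment(s) for s in samples]
print([len(c) for c in cleaned])
```

With the DataFrame from this notebook, the helper could be applied as `df['comments'] = df['comments'].map(prepare_comment)`.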

#### Perform Translation on the Airbnb Listings Reviews Dataset

**NOTE**: This could potentially take 2.5 to 3 hours.

In [17]:
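from datetime import datetime

start = datetime.now()

# Create the translator once and reuse it for every review
translator = GoogleTranslator(source='auto', target='en')

# The start index 98252 appears to resume an earlier interrupted run;
# use 0 to translate every row from the beginning
for i in range(98252, len(df)):
    to_translate = df.loc[i, 'comments']
    # .loc avoids chained assignment, which may write to a copy
    df.loc[i, 'translated'] = translator.translate(to_translate)

end = datetime.now()

time_taken = end - start
print("Time taken:", time_taken)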

ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
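
The `ConnectionError` above stops the loop partway through the dataset. One common way to make a long run more robust is to retry a failed request with an increasing delay. The following is a minimal sketch, not the notebook's original method: `translate_with_retry`, `max_retries`, and `base_delay` are hypothetical names, and `translate_fn` stands for any callable such as `GoogleTranslator(source='auto', target='en').translate`. It catches `OSError`, which is the base class of Python's built-in `ConnectionError` and of the exceptions raised by the `requests` library.

```python
import time


def translate_with_retry(translate_fn, text, max_retries=5, base_delay=1.0,
                         sleep=time.sleep):
    """Call translate_fn(text), retrying on connection-type errors.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last error is re-raised if every attempt fails.
    """
    for attempt in range(max_retries):
        try:
            return translate_fn(text)
        except OSError:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in translator that fails twice, then succeeds
calls = {'n': 0}

def flaky_translate(text):
    calls['n'] += 1
    if calls['n'] <= 2:
        raise ConnectionError('Connection aborted.')
    return text.upper()

# sleep is replaced with a no-op so the demonstration runs instantly
result = translate_with_retry(flaky_translate, 'hallo', sleep=lambda s: None)
print(result, calls['n'])
```

In the translation loop, the direct `.translate(to_translate)` call could be replaced with `translate_with_retry(GoogleTranslator(source='auto', target='en').translate, to_translate)`.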

In [None]:
# Check whether all comments have been translated

count = 0
for i in range(len(df)):
    if df.loc[i, 'translated'] == '':
        print("row not translated: ", i)
    else:
        count += 1

print("number of rows translated = ", count)
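
Because some comments were deliberately set to empty strings earlier, an empty `translated` value does not always mean a row was missed. A hedged sketch of a check that separates the two cases follows; `find_untranslated` is a hypothetical helper that works on plain lists, so it can be tried without the dataset.

```python
def find_untranslated(comments, translations):
    """Return positions whose comment has text but whose translation is empty.

    Rows whose comment is itself blank are skipped, since there is
    nothing to translate for them.
    """
    missing = []
    for i, (comment, translation) in enumerate(zip(comments, translations)):
        has_text = isinstance(comment, str) and comment.strip() != ''
        is_translated = isinstance(translation, str) and translation.strip() != ''
        if has_text and not is_translated:
            missing.append(i)
    return missing


# Illustrative values only, not taken from the dataset
comments = ['Schöne Wohnung', '', 'Très bien', 'Nice place']
translations = ['Nice apartment', '', '', 'Nice place']
print(find_untranslated(comments, translations))
```

With the notebook's DataFrame this could be called as `find_untranslated(df['comments'].tolist(), df['translated'].tolist())`, and the returned positions used to rerun the translation only for the missing rows.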

#### Export translated reviews to CSV

In [None]:
# index=False avoids writing the row index as an extra unnamed column
df.to_csv("Reviews_translated_all.csv", index=False)