In [1]:
import pandas as pd
pd.set_option('display.max_colwidth',None)

In [2]:
data = pd.read_csv('reviews_d.csv')

In [3]:
data.nunique()

listing_id        10378
id               478551
date               4456
reviewer_id      440915
reviewer_name     59538
comments         460615
dtype: int64

In [4]:
data.isna().sum()

listing_id        0
id                0
date              0
reviewer_id       0
reviewer_name     0
comments         25
dtype: int64

In [5]:
df = data.dropna(subset=['comments'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 478526 entries, 0 to 478550
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   listing_id     478526 non-null  int64 
 1   id             478526 non-null  int64 
 2   date           478526 non-null  object
 3   reviewer_id    478526 non-null  int64 
 4   reviewer_name  478526 non-null  object
 5   comments       478526 non-null  object
dtypes: int64(3), object(3)
memory usage: 25.6+ MB


In [6]:
df.sample(10)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
181871,14648612,125521028,2017-01-06,6938302,Alex,A really special city hideaway with the most beautiful instrument to play.
98584,4380854,615051079,2020-03-07,183223420,Matt,"A very friendly host in a great, central spot in Berlin. I’d be happy to stay here again."
339344,32066095,951325206311246524,2023-08-05,25876505,Sophie,Love the place and the neighborhood ! Great place to stay
8415,321378,750908213,2021-04-22,188917018,Lea,"Vielen Dank für die nette Gastfreundschaft, kommen gerne wieder."
256715,22162940,643859093,2020-07-29,56015160,Karen,Es hat uns und den Kindern sehr gut gefallen. Wir können das Hostel nur weiterempfehlen!
461117,677433898395003666,860667292270905086,2023-04-02,342681926,Sebastian,"Sehr schön, gerne wieder!"
323318,30384248,557447106,2019-11-01,100405155,Oscar,"It was a very pleasant time in her apartment. The apartment has everything you need, and very clean. Kristina is always available and respond you very quickly. A very good choose to stay in Berlin. 100% recommended!"
197676,16209988,864325485181106677,2023-04-07,23577912,Andrew,"This is a fantastic, spacious place in the heart of Berlin. very close to the central train station. The apartment is light, spacious, well stocked and very clean. We were met by the host who gave us a walk-thru and made us feel very welcome. I recommend it to anyone looking to stay in Berlin"
199311,16237757,538711720,2019-09-30,118281358,Marilyn,"Amazing place to stay, especially if traveling in bigger groups. Super spacious, clean and accommodating. Kitchen was complete with all the necessities and markets were close by. Check in/out was smooth as ever and would definitely recommend to friends and family. Thank you to Gabby for the wonderful tips! Thank you also to our hosts for having us!"
50381,1111145,319429718,2018-09-06,8704329,Nicholas,Great location and great space!


## Review cleaning and preprocessing

In [7]:
import re
import string

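def remove_emojis(text):
    # Note: several ranges below are very broad (e.g. U+24C2-U+1F251 and
    # U+10000-U+10FFFF) and also strip non-emoji characters such as CJK text.
    emoj = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (regional indicator symbols)
        u"\U00002500-\U00002BEF"  # box drawing through miscellaneous symbols and arrows
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # circled M and everything up to U+1F251
        u"\U0001f926-\U0001f937"  # face palm through shrug
        u"\U00010000-\U0010ffff"  # all supplementary planes
        u"\u2640-\u2642"  # female and male signs
        u"\u2600-\u2B55"  # miscellaneous symbols
        u"\u200d"  # zero-width joiner
        u"\u23cf"  # eject symbol
        u"\u23e9"  # fast-forward symbol
        u"\u231a"  # watch
        u"\ufe0f"  # variation selector-16
        u"\u3030"  # wavy dash
                      "]+", re.UNICODE)
    return emoj.sub('', text)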

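def remove_html_tags(text):
    clean_text = re.sub(r'<.*?>', ' ', text)  # Replace HTML tags with a space
    clean_text = clean_text.replace('\r', ' ')  # Replace '\r' with a space
    return clean_text

def remove_punctuation(text):
    # string.punctuation covers ASCII punctuation only, so typographic
    # characters such as ’ are kept. Characters are deleted rather than
    # replaced by spaces, and the result is lowercased.
    translator = str.maketrans('', '', string.punctuation)
    return text.translate(translator).lower()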

In [8]:
# Sample text
texts = "Vielen Dank 🙏🏼 Übergabe der Wohnung und Hygiene waren Top !! I felt comfortable# and refreshed! in such a pristine(small) environment.<br/>Ashraf's commitment to maintaining order and organization in his home was truly commendable. Her kitchen, living room and garten are great, she is very generous in sharing them. \r<br/>\r<br/>She once drove me to the supermarket for shopping and a few times offered me a breakfast, dinner or fruits."

cleaned_texts = remove_emojis(texts)
cleaned_texts = remove_html_tags(cleaned_texts)
cleaned_texts = remove_punctuation(cleaned_texts)

cleaned_texts

'vielen dank  übergabe der wohnung und hygiene waren top  i felt comfortable and refreshed in such a pristinesmall environment ashrafs commitment to maintaining order and organization in his home was truly commendable her kitchen living room and garten are great she is very generous in sharing them     she once drove me to the supermarket for shopping and a few times offered me a breakfast dinner or fruits'
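In the output above, deleting punctuation outright fuses neighbouring words ("pristine(small)" becomes "pristinesmall") and leaves runs of spaces where tags and emojis were removed. The following is a minimal sketch of one possible alternative, not the method used in this notebook: `normalize_text` is a hypothetical helper that drops apostrophes, replaces other ASCII punctuation with a space, and then collapses whitespace.

```python
import re
import string

# Hypothetical helper, not part of the original notebook.
# Map every ASCII punctuation character to a space instead of deleting it.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})
# Apostrophes (ASCII and typographic) are deleted so contractions stay one token.
_APOSTROPHES = str.maketrans("", "", "'\u2019")

def normalize_text(text):
    text = text.translate(_APOSTROPHES)     # "didn’t" -> "didnt"
    text = text.translate(_PUNCT_TO_SPACE)  # "east/west" -> "east west"
    # Collapse any run of whitespace to a single space, trim, and lowercase.
    return re.sub(r"\s+", " ", text).strip().lower()

print(normalize_text("a pristine(small) room,  east/west border!"))
```

One trade-off of this choice is that hyphenated words such as "check-in" are split into two tokens rather than merged into "checkin".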

In [9]:
# df was derived from data via dropna, so assigning a column in place raises
# SettingWithCopyWarning; assign() returns a new DataFrame and avoids it.
df = df.assign(
    comments_proc=df['comments']
    .apply(remove_emojis)
    .apply(remove_html_tags)
    .apply(remove_punctuation)
)


In [10]:
df[['comments','comments_proc']].sample(10)

Unnamed: 0,comments,comments_proc
377534,"Convenient location, good wifi, very warm and cozy room. Lovely housemate! Unfortunately didn’t get to meet Aneta but she seemed lovely over message :) would recommend for solo traveller on a budget",convenient location good wifi very warm and cozy room lovely housemate unfortunately didn’t get to meet aneta but she seemed lovely over message would recommend for solo traveller on a budget
29471,"A wonderful and conveniently located apartment in Neukölln. The hosts were responsive, and even took the time to offer extensions of the stay when they became available. I would heartily recommend this to anyone.",a wonderful and conveniently located apartment in neukölln the hosts were responsive and even took the time to offer extensions of the stay when they became available i would heartily recommend this to anyone
115843,"Great, clean and well equiped apartment. Big living room and a terrace - both places where you can chill. Also it was a very good starting point for city tours.\r<br/>\r<br/>Jan is a great host, he is very helpful. It was a pleasure to stay at his apartment. I can wholeheartedly recommend Jan and his flat.",great clean and well equiped apartment big living room and a terrace both places where you can chill also it was a very good starting point for city tours jan is a great host he is very helpful it was a pleasure to stay at his apartment i can wholeheartedly recommend jan and his flat
161244,The accommodation was good. The first impression about the location was not good. It looked like a social neighborhood but that could be because it was similar to the ones I have in my home country and because it was the first time in Berlin too. But at night the location was quiet and calm. The room did not have curtains to fully stop light from the window. I had to block the light with the blanket which was not easy to breath. If you have light sensitivity then you might want to bring a eye night patch.<br/>The host were very nice and Ece was very fast to respond and flexible with the check-in time.<br/>It was my first time using airbnb and this bedroom was a good experience.,the accommodation was good the first impression about the location was not good it looked like a social neighborhood but that could be because it was similar to the ones i have in my home country and because it was the first time in berlin too but at night the location was quiet and calm the room did not have curtains to fully stop light from the window i had to block the light with the blanket which was not easy to breath if you have light sensitivity then you might want to bring a eye night patch the host were very nice and ece was very fast to respond and flexible with the checkin time it was my first time using airbnb and this bedroom was a good experience
153255,"Excellent location, quiet and central.",excellent location quiet and central
29485,"Ich fühlte mich von Beginn weg wohl und willkommen bei Samia, das Zimmer und das Bad sind wunderschön, es ist alles sehr sauber und mit viel Gefühl fürs Detail eingerichtet. Das Zimmer liegt zu einem ruhigen Innenhof hin. Was für mich super war.<br/>Das Ankommen bei Samia war sehr angenehm, sie hat sich Zeit genommen mir alles zu zeigen und zu erklären. Samia war sehr herzlich und hilfsbereit. Für mich war es der perfekte Spot für meinen Aufenthalt in Berlin.<br/>Ich kann Samia und ihre Unterkunft von herzen weiterempfehlen!",ich fühlte mich von beginn weg wohl und willkommen bei samia das zimmer und das bad sind wunderschön es ist alles sehr sauber und mit viel gefühl fürs detail eingerichtet das zimmer liegt zu einem ruhigen innenhof hin was für mich super war das ankommen bei samia war sehr angenehm sie hat sich zeit genommen mir alles zu zeigen und zu erklären samia war sehr herzlich und hilfsbereit für mich war es der perfekte spot für meinen aufenthalt in berlin ich kann samia und ihre unterkunft von herzen weiterempfehlen
434150,"It was a great location and also felt safe being there, the other guests were friendly and alway said hello. Check in was very easy but host was a little slow to respond, other than that I enjoyed my stay",it was a great location and also felt safe being there the other guests were friendly and alway said hello check in was very easy but host was a little slow to respond other than that i enjoyed my stay
327323,Very nice apartments! Very nice owner! Very nice everything <3,very nice apartments very nice owner very nice everything 3
148581,great!,great
56144,"Karen was excellent with communication. The apartment was easy to find and very well located, just a few blocks from the ""Kapelle der Versöhnung,"" (on the old east/west border) and a subway station. It was well-equipped, clean, and functional. Walking in the evenings along the nearby memorials / information sites / parks associated with the wall was very pleasant. The center of the city was also reachable by foot in about 30 minutes. Some meter-readers did knock on our window at around 9:30 in the morning on our last day (before we were fully clothed) to gain entrance, which was a bit surprising.",karen was excellent with communication the apartment was easy to find and very well located just a few blocks from the kapelle der versöhnung on the old eastwest border and a subway station it was wellequipped clean and functional walking in the evenings along the nearby memorials information sites parks associated with the wall was very pleasant the center of the city was also reachable by foot in about 30 minutes some meterreaders did knock on our window at around 930 in the morning on our last day before we were fully clothed to gain entrance which was a bit surprising


## bert-base-multilingual-uncased-sentiment

Reference: https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment

The model predicts a star rating from 1 to 5, which is interpreted here as a sentiment rank:
1 - Very Negative
2 - Negative
3 - Neutral
4 - Positive
5 - Very Positive

This model is used to predict a sentiment-based rank for each guest review comment.

The motivation for using this pre-trained model is that it was trained on six languages: English, Dutch, German, French, Spanish, and Italian.
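A minimal sketch of how the ranks could be obtained with the Hugging Face `transformers` pipeline is shown below. It assumes the model returns labels of the form "1 star" to "5 stars"; `label_to_rank`, `predict_ranks`, and the `batch_size` default are hypothetical names and values chosen for illustration, not taken from this notebook.

```python
import re

MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"

def label_to_rank(label):
    """Convert a label such as '4 stars' to the integer rank 4 (assumed label format)."""
    match = re.match(r"(\d) star", label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1))

def predict_ranks(texts, batch_size=32):
    """Return one rank (1-5) per text; downloads the model on first use."""
    # Imported lazily so the helper above works without `transformers` installed.
    from transformers import pipeline
    classifier = pipeline("sentiment-analysis", model=MODEL_NAME)
    # truncation=True cuts reviews that exceed the model's maximum input length.
    results = classifier(list(texts), truncation=True, batch_size=batch_size)
    return [label_to_rank(r["label"]) for r in results]
```

Under these assumptions, the ranks could be attached to the DataFrame with something like `df['rank'] = predict_ranks(df['comments_proc'].tolist())`, where `rank` is a hypothetical column name. Running this over several hundred thousand reviews is slow on a CPU, so trying it on a small sample first is advisable.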