**Pre-processing Airbnb Review Data for NLP**

# Introduction

## Read in libraries, data, and set notebook preferences

**Read in libraries**

In [7]:
#Read in libraries
import pandas as pd
import dask as dd
import swifter
import numpy as np
import nltk
import re

**Read in data**

In [8]:
#Set path to data
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Air BnB - SF\Data\02_Intermediate'

#Read in data
df = pd.read_csv(path + '/2020_0131_Reviews_Cleaned.csv',sep=',',index_col=0,
                 parse_dates=['date'])

**Set preferences for notebook**

In [9]:
#Ignore warnings
import warnings; warnings.simplefilter('ignore')

#Increase number of columns and rows displayed by Pandas
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows',500)
pd.set_option('display.width',1000)

## Preview data

In [10]:
#View shape and dtypes. Preview head
print('Reviews data shape:', df.shape)
print('Data types: \n', df.dtypes)
display(df.head())

Reviews data shape: (425509, 2)
Data types: 
 comments            object
date        datetime64[ns]
dtype: object


Unnamed: 0,comments,date
19330,"Hello Josh Thank you very much for everything. I found myself very comfortable in your home. Quiet, comfortable and very complete and very clean, which I value highly. Next time I'd come with my family. I hope it's possible.",2013-12-01
143113,"Stop and book it now. Rea (Website hidden by Airbnb) this later!!! If your a single person looking for a story book San Francisco experience, look no farther. Staying in Mikes place couldn't be any more wonderful. If your familiar with ""Tales of the City"" Mike is the Olympia Dukakis. The home is warm and inviting with all the nuances of an old Victorian. Mike is an amazing host . He can tell you how walk drive or public transit the city (don't bother with a car). Would love to keep the gem to myself but everyone deserves this unique place to lay your head. Make sure while you're there be introduced to William . Book IT you won't be disappointed .",2017-06-07
1021372,"So I moved to SF in late May from Michigan to intern at Genentech for the summer. I stayed at Anjan’s apartment for 7 days while I was looking for a more permanent housing situation. Anjan was extremely hospitable and welcoming throughout the week. He was also very knowledgeable about the area and always offered to help in any way that he could. The area (SOMA) is very safe and is very “walkable.” There are plenty of restaurants and stores nearby (there’s even a target a few blocks away), so you have everything you need within a couple blocks from the apartment. As for the bedroom, it was spacious and clean. The bathroom was nice and I had to myself for the entirety of my stay. I felt very comfortable living at Anjan’s for a week and I really enjoyed staying there. If you’re a respectful person and are looking for a place to stay in SF for a short time, I highly recommend staying at Anjan’s. He’s a great person and a great host.",2013-06-02
64636,"This was the perfect home from home, our host was amazing like most California's we had a wonderful time.",2014-10-16
174143,We loved our time in beautiful SF! The place is in a fantastic location and near everything. Nadia’s communication and check in process is amazing and probably the best we have used by far. Would definitely recommend!! (Website hidden by Airbnb) Thanks for having us Nadia Jess + Mark x,2018-08-10


# Feature engineering

## Sentiment Analysis with Vader

Reviews data does not contain review scores associated with the review. Assigning  compound semantic score using VADER.

In [11]:
#Import and instantiate sentiment intensity analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

def compound_scores(series):
    #Function to capture compound semantic score 
    score = analyzer.polarity_scores(series)['compound']
    return score

In [12]:
#Apply compound_scores to comments 
df['sentiment_compound']= df['comments'].swifter.apply(compound_scores)

HBox(children=(FloatProgress(value=0.0, description='Pandas Apply', max=425509.0, style=ProgressStyle(descript…




### Preview most positive and negative reviews

In [13]:
#View some of the most positive reviews
df[['comments','sentiment_compound' ]].sort_values(by = 'sentiment_compound', ascending= False).head(3)

Unnamed: 0,comments,sentiment_compound
40972,"This was perhaps the most amazing AirBnB experience I've ever had. I have used AirBnB several times in several countries and it's usually a good experience, but this was outstanding in so many unexpected ways. First off Adam and Alex are incredibly gracious and wonderful hosts. We came in super late and they made the whole place feel like home. The location is amazing. Seriously, amazing (coming from someone who used to live in SF). It's right next to the park, great restaurants around, lots of place to explore, near several great museums, easy Uber access, lots of public transportation, a farmers market, a food truck hangout, a Whole Foods 1 block away... Amazing. The kitchen is really a Kitchen with a capital K. Every spice, pan, knife, cooking implement you could hope for is there. We used that kitchen everyday and it was a godsend since we were visiting with our daughter. Everything worked perfectly, was beautifully clean and shiny, and just made you want to cook more! I only wish we got to use the grill - but we were just so busy enjoying the city. This unit has lots of unexpected surprises - there's actually a garage which Adam and Alex were kind enough to let us use (we arranged to do so beforehand), there were tons of toys and books for our daughter to entertain herself with, there's a beautiful piano we played a few evenings, and this seems to be a part of SF where the weather is always perfect. They have great wifi, a scanner, a printer, a fax machine - which actually ended up being useful for us. We pretty much walked to the park everyday where there was a really mindblowing playground and an awesome carousel that our daughter (and we) enjoyed. There was a phenomenal food truck gathering on Fridays with live music. There is the best ice cream shop in the world around the corner. A very reasonable gym that I got day passes for a short walk away. All-in-all this was an amazing trip. We stayed for several days and I didn't want to leave. I felt very lucky that we found such a fantastic spot with such amazing hosts. I am trying to think of something that I wish had been different - but it was honestly just great through and through. We will be back. I really hope this feedback is helpful as you make your decision. This is a really phenomenal place, and Adam and Alex were lovely hosts. I hope you enjoy their home as much as we did.",0.9997
101333,"This is a perfect place to stay. We were a large party of twelve people across three generations (from 68 to 8months) but the accommodation is so versatile that we managed with ease. The house is beautiful, with pretty bedrooms with comfortable beds and quality linens, three luxurious bathrooms with ample towels and quality toiletries provided, a superb kitchen which is very well equipped and satisfied a family of foodies, and stylish living space. There were lots of books and things for the children including beach toys and a kite and racquets and balls for the “bigger kids” who found free courts nearby and staged knockout competitions. The location is wonderful. The house is in a pretty, quiet street just over the road from Golden Gate park and all that has to offer and a short walk to the beach. There is a large supermarket just a few blocks away and a lively area on Balboa Street with local bakeries and a wealth of eateries and pleasant bars within easy walking distance. The zoo, which is lovely, is also within easy walking distance. You could happily stay in the immediate locality and keep yourself fully amused. That said access to the rest of the city is easy with a bus stop on a main arterial route just over the road. MOMA gave us a great day out and the Mission District some great eating experiences. Haight and Castro were lively and interesting and of course the Golden Gate Bridge! Fisherman’s Wharf was a bit touristy but the museum of animated machines gave us several hours of fun. Suzy was an absolutely fabulous host. She gave us lots of local information which really helped to orient us in the first few days. The house was beautifully clean and well supplied and we had a welcome pack that was more a welcome banquet. The house was full of plants and flowers and it was good to have a pleasant garden to relax in. Suzy even supplied a gift voucher to get a birthday cake for my 65th Birthday from a local bakery. Her communication was excellent and in the lead up to the holiday she came to feel more like a friend rather than a host. If you want a lovely place to stay in San Francisco look no further. I can’t recommend it enough. We will be back!",0.9996
642545,"Leslie is the best host in ""the best city"":) I like every moments we share/explore together lol Our met like a little miracle, but it really happened.We are guest & host, but cooperate very well as a super cool team.We knew time is not long, but just clicked immediately.We hug at first sight instead of introduce ourselves.We both are funny people, and have a good understanding with each other's sense of humor, we laughed a lot and we had a lot of fun time, writing messages as long as a letter full of true heart. She is a SF native, can drive anywhere with no maps.She has a super cute & cool dog (has followed her ins cheffshanti, will see her life everyday with ❤️) Thank you for all the things, the beer, the streets/blocks you showed to me, beautiful Waffle in St Jorge, the hipster hill....the Frida socks gift wearing on the way in NYC. And I remembered warm words you send,""You are a Rockstar/you deserve it/I filled good energy because you are so fun"",the same words for u. Let's talk about the house,If you lives in the apartment, you will have your own private space in SF.not just travel, but live here like local.Drinking coffee and reading a book on the sofa, listening music on the bed, or enjoy peaceful time in a cute & beautiful garden, breathing the fresh air, seeing green leaves sway in the wind. I have wrote many reviews in different city’s Airbnb,but for Leslie’s,I prepared my emotions and words, because I want to give best words for her. I recommend Leslie and her apartment in San Francisco, and I saw the huge potential of her to be a superhost ^_^ I hope she will play very well in Airbnb Community, and meet more interesting & kind & cool guests (like me ), more surprises & charming moments . Come on, my cool lawyer, my best friend in SF:) 推荐这位旧金山的新朋友,我无比可爱的新房东,非常随和,有趣,很逗,自带幽默感的那种人｡是旧金山的本地人,会非常热情地给你很多当地的旅行和生活建议,很好沟通｡没有疏离感,相当朴实认真｡房间是一个独居的公寓,有一个小花园,私密空间和居住便利都很不错,我从这里去了教会区,海特街,和Castro 区,都是优步很近的距离｡希望你在旧金山,住在这里时,能体会到家一样的热情｡",0.9995


In [14]:
#View some of the most negative reviews
df[['comments','sentiment_compound' ]].sort_values(by = 'sentiment_compound', ascending= False).tail(3)

Unnamed: 0,comments,sentiment_compound
1352782,"I did not stay with Hostwell as I didn’t feel comfortable or safe after trying to cancel the accommodation as I thought I’d booked for 2 night but then checked only booked for 1 night which cost me $511 US dollars. Not to mention when I clicked on reserve and clicked on confirm and pay through the Airbnb Website it came up with an error payment didn’t go through. If you don’t want to miss out on the accommodation to click on confirm payment again. But if I hadn’t thought to check my email which confirmed payment then I would of paid double and would still be in this situation oF fighting for my full refund with still no resolution and booking was for Friday, 9th August. I didn’t cancel with the host as the refund would of only been $169 US so at 12.44am in the morning and less than 48hrs contacted Airbnb Team Support. For over 12 hrs I have had to constantly call Team Support with no end to the resolution. Been hung up by them 7+ times by Case Manager Keith the first one simply spat policy at me and would not transfer me to his supervisors. He also proceeded to ignore my calls and emails and was promised he would call me back but only attempted to call apparently when I was yet again called Team Support then I received an email. Every time I asked to speak to a Case Managers supervisor they hung up on me blaming it on computer issues also unable to transfer me to the Case Manager dealing with my matter - every call. That is the reason for so many Case Managers I had gone through. At 4am in the morning spoke to Hazel who mentioned she would reach out to the host on my behalf requesting a refund. However was never emailed updates and where my case is at. And so it seems Airbnb like to push the blame and responsibility on the host as to whether I would receive a refund however the host was only concerned with losing the money that he refused to provide a full refund. At this point having a aleepless night moved around 7+ Case Managers from Manila all not prepared to do anything, Spitting policy at me. And not willing to resolve my issue. Tired and frustrated waiting for call back after call back from Case Managers, Supervisors wanting my money back for accommodation which was only for 1 nights stay instead of the 2 nights I thought I’d booked. I’m still no closer to resolving the matter as now the Supervisor from Resolutions Team is yet to call me back and it’s been 2 days passed. I am frustrated, disappointed, disgusted at the outrageously horrible customer service I have experienced from everyone I’ve dealt with at Airbnb Team Support and the Host who also only worried about losing their money and spat policy at me but not concerned about a overseas traveller trying to enjoy her international experience and save money along the way where I can. I will continue to fight for my refund and will no longer be using Airbnb again because of the disgraceful way they continue to handle my case. Disappointed customer, Zarita",-0.9941
173772,"My host tried to make me feel welcome but I'm afraid that I have to give the experience a bad review. I did not understand from the listing that it was a basement apartment (which meant lots of sounds from the floor above, spiders above the bed & kitchen garbage, a long dark hallway in spite of the one light, that one had to go through to get to the apartment.) The key to the street door was left on a leather tongue attached to the inside of the mailbox. Anyone casing the place with so much traffic going in & out could see the set up. The host told me I could remove the ""public"" key during my stay. But then I discovered that the door to the apartment had no outside lock so I could not lock it when I left. There was a bolt for locking it from the inside. Plus a few feet away was a door to their enclosed back yard garden which was kept open. So I did not feel safe as a single woman in urban America & could not believe anyone would have a set up like that. The apartment itself was tastefully done and fun. The few problems (cleaning people having left no toilet paper & very dead flowers; sink plug that disconnected the first time I used it; no can opener when I went to cook my dinner in a well set up kitchen) were all taken care of immediately. And I was patient & understanding with each of these things. I knew they were new at this and discovering how to work out the kinks. But the first time I had a complaint, that I couldn't leave my suitcase for a couple of hours after check out because I had a late flight out, I received a rude & unprofessional text accusing me of being a problem. We got through that with a couple back & froths, but it was disturbing to my day. And then on leaving there was a very upsetting confrontation. I saw they were letting the next guests put their suitcases in the garage! I confronted them calmly with the question of what was their problem with me. Instead of explaining the reason they were making an exception for this couple, they aggressively tried to get me to leave. I said they were being ""nasty"" and left very upset. So it was not a good experience for my first airbnb.",-0.9956
88818,"I stayed in her apt from Feb 6 to Feb 10, 2016. The experience in her place was so bad during these days that totally ruined my vacation. 1. The house had strong bad smell that made me sick and dizzy,since it was really hard for me to find another place in the evening of Feb 6 due to Super Bowl 50, I kept staying there for 4 nights, but the smell problem bothered me all the time. 2. Julie didn't provide enough toilet paper that really caused inconvenience when I found the toilet paper used out and no any extra one at home. 3. Julie didn't offer any paper towel, napkin at kitchen, dining table and anywhere at home that made me really feel uncomfortable and inconvenience when I cooked and ate. I even couldn't find a steak knife after I cooked my steak. 4. When I came home at night on Feb 7, I found I had no hot water to take a shower in my bathroom. 5. I found several pieces of hair on my pillow when I arrived on Feb 6 that made me doubt if the bedding was cleaned. 6. There was someone walked around whole night above my bedroom that made me wake up whole night on Feb 8. Julie also heard that but she didn't do anything trying to stop that noise made by her neighbors above. And the noise happenned the night of Feb 9th again. 7. Julie came home in the mid-night of Feb 6th with her high heel shoes and the noise woke me up 8. Julie called my name in fount of my door in the mid-night and asked me if I screamed on the night of Feb 8th, cos she heard someone screaming. I was almost sleeping that time but her interruption woke me up. It was really make me confusing why she acted in this way, I guessed she may have some hallucination or felt too lonely. 9. Julie wrote me an email after I moved out. She complained there was a bad smell in house from the trash can in kitchen. She blamed me to put kitchen trash (some fish skin and meat of a salmon, I only cooked once during my stay) in that trash can in kitchen and caused bad smell in house. It was her duty to throw trash away, but she didn't throw even once during my 5 days staying, I really confused where I should put the kitchen trash if I was not allowed to put that trash inside kitchen trash can. She also complained that I put the dirty towels inside the laundry basket. She also complained that I didn't lock the door although I did locked at least one lock of two, my friend Gordon can prove that.",-0.9978


## Assign postive, negative, and neutral  labels to df

In [15]:
#Function that assigns positive, negative, or neutral label depending on vader score
def labeler(vader_score):
    if vader_score > .25:
        return 'positive'
    elif vader_score < -.25:
        return 'negative'
    else:
        return 'neutral'

#Apply to df
df['label']= df.sentiment_compound.apply(labeler)

#check
display(df.head(2))

Unnamed: 0,comments,date,sentiment_compound,label
19330,"Hello Josh Thank you very much for everything. I found myself very comfortable in your home. Quiet, comfortable and very complete and very clean, which I value highly. Next time I'd come with my family. I hope it's possible.",2013-12-01,0.9534,positive
143113,"Stop and book it now. Rea (Website hidden by Airbnb) this later!!! If your a single person looking for a story book San Francisco experience, look no farther. Staying in Mikes place couldn't be any more wonderful. If your familiar with ""Tales of the City"" Mike is the Olympia Dukakis. The home is warm and inviting with all the nuances of an old Victorian. Mike is an amazing host . He can tell you how walk drive or public transit the city (don't bother with a car). Would love to keep the gem to myself but everyone deserves this unique place to lay your head. Make sure while you're there be introduced to William . Book IT you won't be disappointed .",2017-06-07,0.9334,positive


## Word counts

In [16]:
#Capture number of characters used in comments
df['word_count'] = [len(x.split()) for x in df['comments'].tolist()]

#Check
display(df.head())

Unnamed: 0,comments,date,sentiment_compound,label,word_count
19330,"Hello Josh Thank you very much for everything. I found myself very comfortable in your home. Quiet, comfortable and very complete and very clean, which I value highly. Next time I'd come with my family. I hope it's possible.",2013-12-01,0.9534,positive,39
143113,"Stop and book it now. Rea (Website hidden by Airbnb) this later!!! If your a single person looking for a story book San Francisco experience, look no farther. Staying in Mikes place couldn't be any more wonderful. If your familiar with ""Tales of the City"" Mike is the Olympia Dukakis. The home is warm and inviting with all the nuances of an old Victorian. Mike is an amazing host . He can tell you how walk drive or public transit the city (don't bother with a car). Would love to keep the gem to myself but everyone deserves this unique place to lay your head. Make sure while you're there be introduced to William . Book IT you won't be disappointed .",2017-06-07,0.9334,positive,122
1021372,"So I moved to SF in late May from Michigan to intern at Genentech for the summer. I stayed at Anjan’s apartment for 7 days while I was looking for a more permanent housing situation. Anjan was extremely hospitable and welcoming throughout the week. He was also very knowledgeable about the area and always offered to help in any way that he could. The area (SOMA) is very safe and is very “walkable.” There are plenty of restaurants and stores nearby (there’s even a target a few blocks away), so you have everything you need within a couple blocks from the apartment. As for the bedroom, it was spacious and clean. The bathroom was nice and I had to myself for the entirety of my stay. I felt very comfortable living at Anjan’s for a week and I really enjoyed staying there. If you’re a respectful person and are looking for a place to stay in SF for a short time, I highly recommend staying at Anjan’s. He’s a great person and a great host.",2013-06-02,0.986,positive,175
64636,"This was the perfect home from home, our host was amazing like most California's we had a wonderful time.",2014-10-16,0.9287,positive,19
174143,We loved our time in beautiful SF! The place is in a fantastic location and near everything. Nadia’s communication and check in process is amazing and probably the best we have used by far. Would definitely recommend!! (Website hidden by Airbnb) Thanks for having us Nadia Jess + Mark x,2018-08-10,0.9824,positive,50


# Preprocessing comments data

## Language Processing Pipeline

### Clean-up text, stop word removal, and tokenize comments

In [17]:
#Import stopwords
from nltk.corpus import stopwords, wordnet
stop_words = stopwords.words('english')

#Add additional stop words
stop_words.extend(['airbnb','website hidden by airbnb'])

def comment_preprocessor(comments):
    """
    Function that completes the following preprocessing steps:
        -Remove numbers
        -Tokenize columns of text data in pandas. 
        -Additionally remove punctuation and lower-case text    
        -Remove tokens with < 1 character
        -Remove stopwords
    """
    series = re.sub(r'\d+', '', comments) #remove numbers from text
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+') #Instantiat tokenizer
    tokens = tokenizer.tokenize(series) #Tokenize series and remove punctuation
    tokens = [token.lower() for token in tokens] #convert tokens to lowercase
    tokens = [tokens.remove(token) if len(token) <3 else token for token in tokens] #remove tokens with len <3
    tokens = [token for token in tokens if token] #Remove None from tokens
    tokens = [token for token in tokens if token not in stop_words] #Remove stopwords
    tokens = nltk.tag.pos_tag(tokens) #apply POS tags to tokens
    return tokens

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

In [18]:
#Import progress bar to track time to apply comment_preprocessor
from tqdm import tqdm, tqdm_pandas
tqdm_pandas(tqdm)

df['comments_pos_tag'] = df['comments'].progress_apply(comment_preprocessor)


100%|█████████████████████████████████████████████████████████████████████████| 425509/425509 [13:39<00:00, 519.12it/s]


In [19]:
#check
display(df.head(1))

Unnamed: 0,comments,date,sentiment_compound,label,word_count,comments_pos_tag
19330,"Hello Josh Thank you very much for everything. I found myself very comfortable in your home. Quiet, comfortable and very complete and very clean, which I value highly. Next time I'd come with my family. I hope it's possible.",2013-12-01,0.9534,positive,39,"[(hello, NN), (josh, NN), (thank, VBD), (much, JJ), (everything, NN), (comfortable, JJ), (home, NN), (quiet, RBR), (comfortable, JJ), (complete, JJ), (clean, JJ), (highly, RB), (next, JJ), (time, NN), (come, VBN), (possible, JJ)]"


# Write file to csv

In [21]:
#View reviews shape
print('Final reviews shape:',df.shape)

#Set path to write processed data
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Air BnB - SF\Data\03_Processed'

#Write to csv
df.to_csv(path + '/2020_0208_Reviews_Processed_NLP.csv',sep=',', index=False)

Final reviews shape: (425509, 6)
