# 03-Advanced: Segment and rank
### The problem
Your manager (or a sales rep) asks you to: *rank the listings with at least 5 reviews in the AirBnB sample dataset using the sentiment of the available reviews*. The listings shown in this sample are in London.

### The method
You should:

1. load the reviews and calculate a sentiment score for each review,
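2. load the listings and join them to the reviews on the listing ID (`id` in the listings, `listing_id` in the reviews),
3. aggregate the review scores per listing (for example, by taking the mean) and sort the listings by that aggregate.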

## Segment
Segment the listings by `room_type`.

In [None]:
# Load the listings.csv data
import pandas as pd
df_listings = pd.read_csv('../data/listings_sample.csv')
df_listings.head(3)

Unnamed: 0,host_id,id,name,description,neighborhood_overview,neighbourhood,latitude,longitude,room_type,accommodates,...,amenities,price,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
0,43039,11551,Arty and Bright London Apartment in Zone 2,Unlike most rental apartments my flat gives yo...,Not even 10 minutes by metro from Victoria Sta...,"London, United Kingdom",51.46095,-0.11758,Entire home/apt,4,...,"[""Hair dryer"", ""Essentials"", ""Washer"", ""Lockbo...",$110.00,193,91.0,9.0,9.0,10.0,10.0,9.0,9.0
1,54730,13913,Holiday London DB Room Let-on going,My bright double bedroom with a large window h...,Finsbury Park is a friendly melting pot commun...,"Islington, Greater London, United Kingdom",51.56861,-0.1127,Private room,2,...,"[""Kitchen"", ""Host greets you"", ""Bed linens"", ""...",$40.00,21,97.0,10.0,10.0,10.0,10.0,9.0,9.0
2,60302,15400,Bright Chelsea Apartment. Chelsea!,Lots of windows and light. St Luke's Gardens ...,It is Chelsea.,"London, United Kingdom",51.4878,-0.16813,Entire home/apt,2,...,"[""Kitchen"", ""Hangers"", ""Fire extinguisher"", ""L...",$75.00,89,96.0,10.0,10.0,10.0,10.0,10.0,9.0


In [None]:
# Use 'value_counts()' to inspect the room types listed
df_listings['room_type'].value_counts().head()

Entire home/apt    15623
Private room       11579
Hotel room           176
Shared room          148
Name: room_type, dtype: int64

In [None]:
# Convert price strings such as '$1,000.00' to floats and sort from most to least expensive
df_listings['price'].apply(lambda p: float(p.replace('$', '').replace(',', ''))).sort_values(ascending=False)

6334     17803.0
14629    17803.0
15561    17762.0
9822     16023.0
5292     16023.0
          ...   
13565        9.0
16650        9.0
14487        8.0
16373        8.0
7139         8.0
Name: price, Length: 27526, dtype: float64
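The conversion above returns a new Series without storing it. What follows is a minimal sketch, on made-up rows, of keeping the cleaned price as its own column and summarising it per `room_type`. The column name `price_num` and the helper `parse_price` are illustrative assumptions and not part of the dataset.

```python
import pandas as pd

def parse_price(p: str) -> float:
    """Turn a price string such as '$1,000.00' into the float 1000.0."""
    return float(p.replace('$', '').replace(',', ''))

# Toy rows standing in for df_listings (values are made up)
toy_listings = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room', 'Private room'],
    'price': ['$110.00', '$1,000.00', '$40.00', '$9.00'],
})

# Store the numeric price in a new column (hypothetical name)
toy_listings['price_num'] = toy_listings['price'].apply(parse_price)

# Median and maximum price per segment
price_by_segment = toy_listings.groupby('room_type')['price_num'].agg(['median', 'max'])
print(price_by_segment)
```

With a numeric column in place, later cells can sort or filter on the price without repeating the string clean-up.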

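#### Example 03-01: Identify luxury home/apartment listings
The following code is intended to return the 5 most expensive home/apartment listings. Note that `price` is stored as text (for example `$1,000.00`) and `sort_values` is ascending by default, so the rows are ordered alphabetically rather than by amount; that is why every listing in the output is priced at `$1,000.00`. For a true ranking, sort on the numeric price computed in the previous cell with `ascending=False`.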

In [None]:
df_lux_top5 = df_listings[df_listings['room_type']=='Entire home/apt'].sort_values(by=['price']).head(5)
df_lux_top5

Unnamed: 0,host_id,id,name,description,neighborhood_overview,neighbourhood,latitude,longitude,room_type,accommodates,...,amenities,price,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
4395,36196974,6907168,Stunning spacious 3 bdr with roof terrace UNIQ...,"Spacious, bright and beautifully furnished 3-b...","The quintessential London neighbourhood, Parso...","London, United Kingdom",51.47311,-0.20212,Entire home/apt,8,...,"[""Kitchen"", ""Hangers"", ""Fire extinguisher"", ""L...","$1,000.00",39,99.0,10.0,10.0,10.0,10.0,10.0,10.0
6070,50298404,9751604,Fulham family home 6 bedrooms,Stunning newly refurbished lion house in leafy...,Fulham is a lovely leafy neighbourhood which a...,"London, Fulham, United Kingdom",51.47303,-0.19226,Entire home/apt,10,...,"[""Dishwasher"", ""Kitchen"", ""Host greets you"", ""...","$1,000.00",7,100.0,10.0,10.0,10.0,10.0,10.0,9.0
23445,174234692,35571333,Modern flat with courtyard garden Islington zo...,Clean and contemporary flat set over ground an...,The flat is part of a very small parade of sho...,"Greater London, England, United Kingdom",51.55718,-0.1241,Entire home/apt,6,...,"[""Dishwasher"", ""Kitchen"", ""Bed linens"", ""Hange...","$1,000.00",28,93.0,10.0,10.0,10.0,10.0,9.0,9.0
22791,259627797,34391573,studio flat to rent in London Kingsbury nw9,studio flat double bed with 4-drawer small c...,Close to shops and transport And a good quie...,"Greater London, England, United Kingdom",51.59092,-0.25941,Entire home/apt,2,...,"[""TV"", ""Essentials"", ""Kitchen"", ""Hangers"", ""Pr...","$1,000.00",37,93.0,10.0,10.0,10.0,10.0,9.0,9.0
26255,273701605,41251650,Newly-Renovated 2bd in Heart of Exclusive Mayfair,"Situated in central Mayfair, London's most sou...","Kensington is a district of west London, withi...","Greater London, England, United Kingdom",51.51147,-0.1493,Entire home/apt,4,...,"[""Dishwasher"", ""Kitchen"", ""Bed linens"", ""Hange...","$1,000.00",7,94.0,9.0,10.0,10.0,10.0,10.0,10.0
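To see why sorting the raw `price` column puts `$1,000.00` first, here is a small sketch on made-up prices that compares a text sort with a numeric sort. The values are illustrative only.

```python
import pandas as pd

# Made-up prices in the same string format as the 'price' column
prices = pd.Series(['$110.00', '$1,000.00', '$40.00', '$9.00', '$17,803.00'])

# Sorting the strings compares them character by character
as_text = prices.sort_values().tolist()

# Sorting after conversion compares the amounts
as_number = (
    prices.str.replace('$', '', regex=False)
          .str.replace(',', '', regex=False)
          .astype(float)
          .sort_values(ascending=False)
          .tolist()
)

print(as_text)    # ['$1,000.00', '$110.00', '$17,803.00', '$40.00', '$9.00']
print(as_number)  # [17803.0, 1000.0, 110.0, 40.0, 9.0]
```

The comma sorts before any digit, so `$1,000.00` precedes `$110.00` in the text ordering even though it is the larger amount.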


#### Exercise 03-01: Identify budget private room listings
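The following code is intended to return the 5 cheapest private room listings. Because `price` is stored as text (for example `$1,000.00`), the ascending sort is alphabetical, which puts `$1,000.00` first; sort on a numeric price instead to find the cheapest rooms.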

In [None]:
df_bud_top5 = df_listings[df_listings['room_type']=='Private room'].sort_values(by=['price'], ascending=True).head(5)
df_bud_top5

Unnamed: 0,host_id,id,name,description,neighborhood_overview,neighbourhood,latitude,longitude,room_type,accommodates,...,amenities,price,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
24193,277371026,36941802,"OYO Apollo Bayswater, Standard Triple Room","Relax, re-charge your batteries and feel like ...",Smoking is prohibited inside the premises.<br ...,"Greater London, England, United Kingdom",51.51007,-0.18607,Private room,3,...,"[""TV"", ""Essentials"", ""Bed linens"", ""Hangers"", ...","$1,000.00",1,20.0,2.0,2.0,2.0,2.0,2.0,2.0
9186,96270773,15186185,Perfect Room in London,Enjoy a comfortable room in a warm and welcomi...,The neighbourhood is safe and friendly.,"London, England, United Kingdom",51.56383,-0.10304,Private room,2,...,"[""TV"", ""Essentials"", ""Kitchen"", ""Host greets y...","$1,000.00",83,87.0,9.0,9.0,9.0,9.0,9.0,9.0
24242,278107161,37018690,"OYO Abbey Hotel, Standard Single Room","Relax, re-charge your batteries and feel like ...","""The well-equipped deluxe accommodation is wel...","Greater London, England, United Kingdom",51.50718,-0.22609,Private room,1,...,"[""TV"", ""Essentials"", ""First aid kit"", ""Fire ex...","$1,000.00",2,100.0,10.0,10.0,10.0,10.0,10.0,10.0
24197,277374821,36943146,"OYO Bakers Hotel, Standard Single with Shared ...","Relax, re-charge your batteries and feel like ...","The Chelsea Psychic Garden, Apollo Victoria Th...","Greater London, England, United Kingdom",51.49094,-0.14548,Private room,2,...,"[""TV"", ""Essentials"", ""Hangers"", ""First aid kit...","$1,000.00",10,82.0,9.0,10.0,9.0,8.0,9.0,8.0
24196,277374821,36942855,"OYO Bakers Hotel, Double En-Suite Room","Relax, re-charge your batteries and feel like ...","The Chelsea Psychic Garden, Apollo Victoria Th...","Greater London, England, United Kingdom",51.49173,-0.14538,Private room,2,...,"[""TV"", ""Essentials"", ""Hangers"", ""First aid kit...","$1,000.00",10,94.0,10.0,10.0,9.0,10.0,10.0,10.0
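One possible way to rank a segment by price in either direction is sketched below on toy rows. The function `rank_segment_by_price` and the column `price_num` are hypothetical names introduced for illustration, and the data is made up.

```python
import pandas as pd

def rank_segment_by_price(listings, room_type, n=5, cheapest=False):
    """Return the n cheapest or n most expensive listings of one room type."""
    segment = listings[listings['room_type'] == room_type].copy()
    # Convert '$1,000.00' style strings to floats before ranking
    segment['price_num'] = (
        segment['price'].str.replace('$', '', regex=False)
                        .str.replace(',', '', regex=False)
                        .astype(float)
    )
    if cheapest:
        return segment.nsmallest(n, 'price_num')
    return segment.nlargest(n, 'price_num')

# Toy rows standing in for df_listings (values are made up)
toy = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'room_type': ['Private room'] * 3 + ['Entire home/apt'] * 3,
    'price': ['$1,000.00', '$40.00', '$9.00', '$110.00', '$2,500.00', '$75.00'],
})

budget_rooms = rank_segment_by_price(toy, 'Private room', n=2, cheapest=True)
luxury_homes = rank_segment_by_price(toy, 'Entire home/apt', n=2)
print(budget_rooms[['id', 'price']])
print(luxury_homes[['id', 'price']])
```

`nsmallest` and `nlargest` avoid sorting the whole segment when only a handful of rows is needed.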


## Rank
Rank the listings in a particular segment.

### Prepare data
Keep only listings with at least 5 reviews that are less than 3 years old.

In [None]:
# Load reviews data and remove comments that are NaN
import pandas as pd
df = pd.read_csv('../data/reviews_sample.csv')
df = df[~df['comments'].isna()]
print(len(df))

# Filter to reviews less than 3 years old
df = df[df['date']>'2018-10-05']
print(len(df))

# Filter to listings with at least 5 reviews
listing_counts = df['listing_id'].value_counts()
valid_listings = listing_counts[listing_counts>=5].index
# Take a copy so that adding columns later does not trigger SettingWithCopyWarning
df = df[df['listing_id'].isin(valid_listings)].copy()
print(len(df))

99964
47315
17888
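The cutoff above is hard-coded and compared as text, which works because the dates are in ISO format. Below is a sketch of the same filters using parsed dates and a cutoff derived from a reference date. The function name, its parameters and the toy rows are assumptions made for illustration.

```python
import pandas as pd

def recent_reviews_with_min_count(reviews, reference_date, years=3, min_reviews=5):
    """Keep non-empty reviews newer than the cutoff, for listings with enough of them."""
    out = reviews.dropna(subset=['comments']).copy()
    out['date'] = pd.to_datetime(out['date'])
    cutoff = pd.Timestamp(reference_date) - pd.DateOffset(years=years)
    out = out[out['date'] > cutoff]
    # Number of remaining reviews per listing, aligned with each row
    counts = out.groupby('listing_id')['listing_id'].transform('count')
    return out[counts >= min_reviews]

# Toy reviews (made up): listing 10 has 5 recent reviews and 1 old one,
# listing 20 has 5 recent reviews of which one has no comment
toy_reviews = pd.DataFrame({
    'listing_id': [10] * 6 + [20] * 5,
    'date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01', '2019-05-01', '2017-01-01',
             '2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01', '2019-05-01'],
    'comments': ['ok'] * 10 + [None],
})

kept = recent_reviews_with_min_count(toy_reviews, reference_date='2021-10-05')
print(len(kept), sorted(kept['listing_id'].unique()))
```

With `reference_date='2021-10-05'` and `years=3`, the cutoff is 2018-10-05, the same date used in the cell above.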


In [None]:
# Sanity checks: each of the three counts below should be 0
print(len(df[df['comments'].isna()]))
print(len(df[df['date']<='2018-10-05']))
print(sum(df['listing_id'].value_counts()<5))

0
0
0


### Analyse sentiment

In [None]:
# Calculate a sentiment score for each review
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
scorer = SentimentIntensityAnalyzer()
df.loc[:,'sentiment'] = df['comments'].apply(lambda s: scorer.polarity_scores(s)['compound'])
df.head(3)

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments,sentiment
3,27425883,355242699,2018-12-03,7738609,André,Guests can expect a great hospitality. The stu...,0.9594
5,26989771,347977056,2018-11-12,149133279,Guille,Really nice host. He is very kind and sympathi...,0.7774
8,4886845,339317863,2018-10-21,26995276,Alex,"Soo is a great host, would love to stay at the...",0.8519
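The `compound` value returned by `polarity_scores` lies between -1 (most negative) and +1 (most positive). The sketch below turns it into a coarse label. The cut-offs of ±0.05 follow the convention suggested in the VADER documentation, and `label_compound` is a hypothetical helper, not part of NLTK.

```python
def label_compound(score, threshold=0.05):
    """Map a VADER compound score to 'positive', 'neutral' or 'negative'."""
    if score >= threshold:
        return 'positive'
    if score <= -threshold:
        return 'negative'
    return 'neutral'

# Illustrative scores; the first three match the reviews shown above
scores = [0.9594, 0.7774, 0.8519, -0.0398, 0.0]
labels = [label_compound(s) for s in scores]
print(labels)
```

Labels like these can help when reading individual reviews, while the ranking itself can keep using the continuous score.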


In [None]:
# Aggregate by taking the mean sentiment score per listing, then sort in ascending order
listings_scored = df.groupby('listing_id')['sentiment'].agg(['mean', 'count']).reset_index()
listings_scored.sort_values(by=['mean'])

Unnamed: 0,listing_id,mean,count
144,2347198,-0.039800,5
1981,30335074,-0.021220,5
579,12846088,-0.000160,5
2117,32582210,0.038867,6
230,4284972,0.071100,5
...,...,...,...
907,17597932,0.971767,6
1381,23270833,0.973040,5
92,957861,0.976250,6
2129,32838811,0.978717,6
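To rank within a segment, the per-listing scores still need the listing attributes. The following sketch, on made-up stand-ins for `listings_scored` and `df_listings`, joins the two tables and ranks listings by mean sentiment inside each `room_type`. The column name `rank_in_segment` is an assumption.

```python
import pandas as pd

# Toy stand-ins for listings_scored and df_listings (values are made up)
toy_scored = pd.DataFrame({
    'listing_id': [1, 2, 3, 4],
    'mean': [0.95, 0.40, 0.80, 0.10],
    'count': [6, 5, 7, 5],
})
toy_listings = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['A', 'B', 'C', 'D', 'E'],
    'room_type': ['Entire home/apt', 'Entire home/apt',
                  'Private room', 'Private room', 'Private room'],
})

# Attach listing attributes to each scored listing; unscored listings drop out
ranked = toy_scored.merge(toy_listings, left_on='listing_id', right_on='id', how='inner')

# Rank 1 = highest mean sentiment within each room type
ranked['rank_in_segment'] = (
    ranked.groupby('room_type')['mean']
          .rank(ascending=False, method='first')
          .astype(int)
)
ranked = ranked.sort_values(['room_type', 'rank_in_segment'])
print(ranked[['room_type', 'rank_in_segment', 'name', 'mean', 'count']])
```

An inner join keeps only listings that passed the review filters, which matches the "at least 5 reviews" requirement.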


### Inspect results
Look at the reviews of the top- and bottom-ranked listings.
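One way to do this is sketched here on a toy version of the scored reviews table. The comments and sentiment values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the scored reviews DataFrame (values are made up)
toy_reviews = pd.DataFrame({
    'listing_id': [1, 1, 2, 2],
    'comments': ['Lovely flat', 'Great host', 'Dirty room', 'Noisy street'],
    'sentiment': [0.6, 0.7, -0.4, -0.2],
})

# Listing IDs with the highest and lowest mean sentiment
mean_by_listing = toy_reviews.groupby('listing_id')['sentiment'].mean()
top_id = mean_by_listing.idxmax()
bottom_id = mean_by_listing.idxmin()

# Print every review of the two listings for a manual read-through
for label, listing_id in [('top', top_id), ('bottom', bottom_id)]:
    print(f'--- {label} listing {listing_id} ---')
    for comment in toy_reviews.loc[toy_reviews['listing_id'] == listing_id, 'comments']:
        print(comment)
```

Reading the raw comments is a quick check that the scores reflect what guests actually wrote.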
