# 1- What aspects of a listing best correlate with price?

# 2- What aspects of a listing best correlate with availability (lack of bookings), and if such aspects are found, do they necessarily correlate with fully booked listings?

# 3- What aspects of a listing best correlate with a positive review, or a negative one?


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from bs4 import BeautifulSoup
import urllib


In [2]:
Boston_listings = pd.read_csv('Boston_listings.csv')

In [3]:
Boston_listings

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,3168,https://www.airbnb.com/rooms/3168,20220915162158,2022-09-15,city scrape,TudorStudio,"The ""Studio at 14 Weldon"" is located in Newton...","Newton has 13 unique villages, and gives off a...",https://a0.muscache.com/pictures/ff7952dc-ef0b...,3697,...,,,,,f,1,0,1,0,
1,3781,https://www.airbnb.com/rooms/3781,20220915162158,2022-09-15,city scrape,HARBORSIDE-Walk to subway,Fully separate apartment in a two apartment bu...,"Mostly quiet ( no loud music, no crowed sidewa...",https://a0.muscache.com/pictures/24670/b2de044...,4804,...,4.96,4.87,4.91,,f,1,1,0,0,0.26
2,5506,https://www.airbnb.com/rooms/5506,20220915162158,2022-09-15,city scrape,** Fort Hill Inn Private! Minutes to center!**,"Private guest room with private bath, You do n...","Peaceful, Architecturally interesting, histori...",https://a0.muscache.com/pictures/miso/Hosting-...,8229,...,4.89,4.54,4.73,Approved by the government,f,10,10,0,0,0.69
3,6695,https://www.airbnb.com/rooms/6695,20220915162158,2022-09-15,city scrape,"Fort Hill Inn *Sunny* 1 bedroom, condo duplex","Comfortable, Fully Equipped private apartment...","Peaceful, Architecturally interesting, histori...",https://a0.muscache.com/pictures/38ac4797-e7a4...,8229,...,4.95,4.50,4.71,STR446650,f,10,10,0,0,0.75
4,7903,https://www.airbnb.com/rooms/7903,20220915162158,2022-09-15,city scrape,"Colorful, modern 2 BR apt shared with host",I'm a high school teacher and frequent travele...,"The apartment is in Somerville, located direct...",https://a0.muscache.com/pictures/miso/Hosting-...,14169,...,4.95,4.56,4.80,,f,1,0,1,0,1.84
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5180,716081443145047239,https://www.airbnb.com/rooms/716081443145047239,20220915162158,2022-09-15,city scrape,Private Room with Shared Bath in Quiet Street,*Please Note: You are booking a private room i...,South Boston is a very large neighborhood comp...,https://a0.muscache.com/pictures/prohost-api/H...,2356643,...,,,,STR-460218,f,71,25,46,0,
5181,716081469166085329,https://www.airbnb.com/rooms/716081469166085329,20220915162158,2022-09-15,city scrape,Cozy Bedroom in Convenient Downtown Location,*Please Note: You are booking a private room i...,South Boston is a very large neighborhood comp...,https://a0.muscache.com/pictures/prohost-api/H...,2356643,...,,,,STR-460218,f,71,25,46,0,
5182,716081495310456299,https://www.airbnb.com/rooms/716081495310456299,20220915162158,2022-09-15,city scrape,"Peaceful Bedroom w/ Shared Bath - AC, Wifi inc...",*Please Note: You are booking a private room i...,South Boston is a very large neighborhood comp...,https://a0.muscache.com/pictures/prohost-api/H...,2356643,...,,,,STR-460218,f,71,25,46,0,
5183,716235197792512391,https://www.airbnb.com/rooms/716235197792512391,20220915162158,2022-09-15,city scrape,Sunny Room w/ Shared Bath in Modest Brighton Home,"Perfect for Hospital Stays, Medical Students, ...",The apartment is located in a walkable neighbo...,https://a0.muscache.com/pictures/prohost-api/H...,2356643,...,,,,STR-484106,t,71,25,46,0,


### After a quick data exploration, we need to drop several columns:
> 1- Columns that are all nulls <br>
2- Personal info (IDs) <br>
3- Non-relevant info (URLs; `listing_url` is kept because it is needed to validate some of the data, as we will see later in the notebook) <br>
4- Metadata (source) <br>
5- String columns (i.e. those that need TF-IDF), for simplicity
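
A minimal sketch of how the all-null columns in point 1 could be found programmatically rather than read off `info()`. The `toy` frame and its values are hypothetical stand-ins for `Boston_listings`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Boston_listings.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "bathrooms": [np.nan, np.nan, np.nan],          # entirely null
    "calendar_updated": [np.nan, np.nan, np.nan],   # entirely null
    "price": ["$99.00", "$132.00", None],           # partially null
})

# Boolean mask per column: True where every value is missing.
all_null_mask = toy.isna().all()
all_null_cols = all_null_mask[all_null_mask].index.tolist()

# Share of missing values per column, highest first, to spot near-empty columns.
null_share = toy.isna().mean().sort_values(ascending=False)

toy_clean = toy.drop(columns=all_null_cols)
print(all_null_cols)
print(null_share)
```

On the real frame the same mask would be computed on `Boston_listings` before the manual `drop` call.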

In [4]:
Boston_listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5185 entries, 0 to 5184
Data columns (total 75 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            5185 non-null   int64  
 1   listing_url                                   5185 non-null   object 
 2   scrape_id                                     5185 non-null   int64  
 3   last_scraped                                  5185 non-null   object 
 4   source                                        5185 non-null   object 
 5   name                                          5185 non-null   object 
 6   description                                   5140 non-null   object 
 7   neighborhood_overview                         3435 non-null   object 
 8   picture_url                                   5185 non-null   object 
 9   host_id                                       5185 non-null   i

In [5]:
Boston_listings.drop(columns=['id', 'name', 'description', 'last_scraped', 'scrape_id',
                              'host_name', 'host_about', 'host_neighbourhood', 'amenities',
                              'source', 'picture_url', 'first_review', 'last_review', 'review_scores_rating',
                              'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin',
                              'review_scores_communication', 'review_scores_location', 'review_scores_value',
                              'license', 'host_id', 'host_url', 'host_thumbnail_url',
                              'host_picture_url', 'calendar_updated', 'bathrooms', 'neighbourhood_group_cleansed',
                              'neighbourhood', 'neighborhood_overview'],
                     inplace=True)


> The *neighbourhood* column carries little information: 2255 instances are **Boston, Massachusetts, United States**, which does not specify the exact region of a listing. Unfortunately, *neighborhood_overview* depends on *neighbourhood*, and both have missing values, so dropping the two columns seems fine. <br><br> An alternative to the *neighbourhood* column is *neighbourhood_cleansed*.
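
The kind of check behind this observation can be sketched as follows. The `toy` frame and its values are hypothetical; on the real data the same calls would run on `Boston_listings` before the columns are dropped.

```python
import pandas as pd

# Hypothetical stand-in for the two neighbourhood columns.
toy = pd.DataFrame({
    "neighbourhood": [
        "Boston, Massachusetts, United States",
        None,
        "Boston, Massachusetts, United States",
        "Newton, Massachusetts, United States",
    ],
    "neighbourhood_cleansed": ["Roxbury", "East Boston", "Dorchester", "Brighton"],
})

# How often each raw value occurs, counting missing values as their own group.
counts = toy["neighbourhood"].value_counts(dropna=False)

# The cleansed column has no gaps and distinguishes every listing's area.
n_missing_cleansed = toy["neighbourhood_cleansed"].isna().sum()
n_unique_cleansed = toy["neighbourhood_cleansed"].nunique()

print(counts)
print(n_missing_cleansed, n_unique_cleansed)
```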

In [6]:
# Boston_listings.drop(columns=['scrape_id','last_scraped','source','picture_url',# ids & urls & meta
#                              'host_id', 'host_url','host_thumbnail_url','host_picture_url',# id & urls
#                              'calendar_updated','bathrooms','neighbourhood_group_cleansed',# all_nan
#                              'neighbourhood', 'neighborhood_overview'],
#                     inplace=True)


In [7]:

one_hot = pd.get_dummies(Boston_listings['neighbourhood_cleansed'])

Boston_listings = Boston_listings.drop('neighbourhood_cleansed',axis = 1)

Boston_listings = Boston_listings.join(one_hot)

# =================================================================

one_hot = pd.get_dummies(Boston_listings['host_location'], dummy_na=True, prefix='host_location')

Boston_listings = Boston_listings.drop('host_location',axis = 1)

Boston_listings = Boston_listings.join(one_hot)

# =================================================================


one_hot = pd.get_dummies(Boston_listings['host_response_time'], dummy_na=True, prefix='host_response_time')

Boston_listings = Boston_listings.drop('host_response_time',axis = 1)

Boston_listings = Boston_listings.join(one_hot)

# =================================================================


one_hot = pd.get_dummies(Boston_listings['bathrooms_text'], dummy_na=True, prefix='bathrooms_text')

Boston_listings = Boston_listings.drop('bathrooms_text',axis = 1)

Boston_listings = Boston_listings.join(one_hot)



# =================================================================


Boston_listings['host_response_rate'] = Boston_listings['host_response_rate'].str.rstrip('%').astype('float')


# =================================================================



Boston_listings['price'] = Boston_listings['price'].str.strip('$').str.replace(',', '').astype('float')
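
The encoding cell above repeats the same three steps (build dummies, drop the source column, join) for each categorical column. A minimal sketch of the same idea as a loop; the helper name `one_hot_encode` and the `toy` frame are hypothetical.

```python
import pandas as pd

def one_hot_encode(df, column, dummy_na=True):
    """Return a copy of df where `column` is replaced by indicator columns."""
    dummies = pd.get_dummies(df[column], dummy_na=dummy_na, prefix=column)
    return df.drop(columns=column).join(dummies)

# Hypothetical stand-in for Boston_listings.
toy = pd.DataFrame({
    "host_response_time": ["within an hour", None, "within a day"],
    "bathrooms_text": ["1 bath", "1 shared bath", None],
    "price": [99.0, 132.0, 149.0],
})

encoded = toy
for col in ["host_response_time", "bathrooms_text"]:
    encoded = one_hot_encode(encoded, col)

print(encoded.columns.tolist())
```

Prefixing every dummy with its source column name keeps the indicator columns distinguishable once several columns have been encoded.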


> The *listing_url* pages that correspond to the null values of the column `bedrooms` suggest that those nulls mean *zero*, i.e. *no actual bedroom*.

In [8]:
Boston_listings[Boston_listings['bedrooms'].isnull()]['listing_url']

0                     https://www.airbnb.com/rooms/3168
3                     https://www.airbnb.com/rooms/6695
7                    https://www.airbnb.com/rooms/10813
8                    https://www.airbnb.com/rooms/10986
34                  https://www.airbnb.com/rooms/210097
                             ...                       
5137    https://www.airbnb.com/rooms/708066864505175780
5158    https://www.airbnb.com/rooms/711804721312473870
5159    https://www.airbnb.com/rooms/712092718787212242
5174    https://www.airbnb.com/rooms/714906239224334877
5176    https://www.airbnb.com/rooms/715658190467254169
Name: listing_url, Length: 559, dtype: object

In [9]:
Boston_listings['bedrooms'].isnull().mean()

0.10781099324975892

In [10]:
# The nulls stand for listings without a separate bedroom, so fill with 0 rather than the mode.
Boston_listings = Boston_listings.fillna({'bedrooms':0})

In [11]:
Boston_listings['bedrooms'].isnull().mean()

0.0

> For host_response_rate we can fill some of the null values: <br><br>
1- By web scraping <br>
2- By using the host_is_superhost feature, because a host can't be a superhost without satisfying these three requirements: <br>
>>1- Completed at least 10 trips or 3 reservations that total at least 100 nights. <br>
2- Maintained a 90% response rate or higher. <br>
3- Maintained a less than 1% cancellation rate, with exceptions made for those that fall under our Extenuating Circumstances policy. <br>
Thus, where host_response_rate is null and host_is_superhost is 't', we can substitute 90%, the lowest rate consistent with superhost status.

In [12]:
#<span class="ll4r2nl dir dir-ltr">100%</span>
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import re
import requests




# Try to recover missing host_response_rate / host_response_time values from the live listing pages.
missing_rate = Boston_listings[Boston_listings['host_response_rate'].isnull()]
for iteration, (ind, row) in enumerate(missing_rate.iterrows()):

    print('iteration: ', iteration)

    # Skip listings whose page does not answer with HTTP 200.
    url = row['listing_url']
    response = requests.get(url)
    if response.status_code != 200:
        continue

    # Load the page in a browser and wait for it to finish rendering.
    driver = webdriver.Chrome()
    driver.get(url)

    time.sleep(40)

    content = driver.page_source.encode('utf-8').strip()
    driver.quit()
    soup = BeautifulSoup(content, "html.parser")

    # Look for percentages inside the divs carrying this site-generated class name.
    officials = soup.find_all("div", {"class": "_1k8vduze"})
    res_rate = re.findall(r'(?:\d+%)|(?:\d+\.\d+%)', str(officials))
    if len(res_rate) == 0:
        print("Fail\t", url)
        continue

    # Pull the response-time text out of the list markup (brittle: depends on the page layout).
    res_time = str(officials).split('ul>')[0].split('li>')[-2].split('span>')[0].split('>')[2].split('<')[0].strip()

    Boston_listings.at[ind, 'host_response_rate'] = float(res_rate[0][:-1])
    Boston_listings.at[ind, 'host_response_time'] = res_time
    print(iteration, "\t", ind, "\t", url)



iteration:  0
Fail	 https://www.airbnb.com/rooms/3168
iteration:  1
Fail	 https://www.airbnb.com/rooms/154505
iteration:  2
2 	 30 	 https://www.airbnb.com/rooms/190170
iteration:  3
Fail	 https://www.airbnb.com/rooms/219956
iteration:  4
Fail	 https://www.airbnb.com/rooms/222081
iteration:  5
5 	 42 	 https://www.airbnb.com/rooms/352585
iteration:  6
Fail	 https://www.airbnb.com/rooms/496022
iteration:  7
Fail	 https://www.airbnb.com/rooms/507525
iteration:  8
8 	 53 	 https://www.airbnb.com/rooms/570493
iteration:  9
Fail	 https://www.airbnb.com/rooms/583255
iteration:  10
iteration:  11
11 	 77 	 https://www.airbnb.com/rooms/869353
iteration:  12
Fail	 https://www.airbnb.com/rooms/910408
iteration:  13
13 	 88 	 https://www.airbnb.com/rooms/977860
iteration:  14
Fail	 https://www.airbnb.com/rooms/1054635
iteration:  15
Fail	 https://www.airbnb.com/rooms/1055627
iteration:  16
Fail	 https://www.airbnb.com/rooms/1077105
iteration:  17


ConnectionError: HTTPSConnectionPool(host='www.airbnb.com', port=443): Max retries exceeded with url: /rooms/1167804 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f80694963d0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
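
The run above stops at iteration 17 because `requests.get` raises when the connection cannot be established. A minimal sketch of a guard that treats such failures as "skip this listing" instead of aborting the loop; the helper name `page_is_reachable`, its `get` parameter, and the 10-second timeout are assumptions, not part of the original notebook.

```python
import requests

def page_is_reachable(url, get=requests.get, timeout=10):
    """Return True if `url` answers with HTTP 200, False on any request error."""
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # Covers ConnectionError, Timeout and the other requests failures.
        print("Request failed\t", url, "\t", type(exc).__name__)
        return False
    return response.status_code == 200
```

Inside the scraping loop, `if not page_is_reachable(url): continue` would then replace the bare `requests.get` and status check.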

In [None]:
Boston_listings.loc[(Boston_listings['host_response_rate'].isnull()) & (Boston_listings['host_is_superhost']=='t'), 'host_response_rate'] = 90



In [None]:
Boston_listings[Boston_listings['host_response_rate'].isnull()]

> The `listing_url` pages for the null values of `description` & `host_about` indicate that those values are genuinely missing and can't be recovered by web scraping. Thankfully we can handle empty strings, as the algorithm that will be used to deal with strings in the dataset will be .....

In [20]:
Boston_listings = Boston_listings.fillna({'description':'', 'host_about':''})

> For host_acceptance_rate null values, most of the actual listings either <br>
1- have been inactive for a couple of years, or <br>
2- have very few reviews (i.e. they just started listing). <br>
A possible solution is to substitute those null values with zero, given the definition of host_acceptance_rate from the Airbnb website: <br><br>
*Your acceptance rate measures how often you accept or decline reservations. Guest inquiries are not included in the calculation of your acceptance rate. You can see your acceptance rate from the last **365** days by clicking on the Performance tab, then clicking Basic Requirements.*  <br><br>

In [21]:
Boston_listings = Boston_listings.fillna({'host_acceptance_rate':'0%'})
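
After this fill the column still holds strings such as `'74%'`. A minimal sketch of converting it to numbers while remembering which rows were imputed; the `toy` frame and the `was_missing` flag are hypothetical additions.

```python
import pandas as pd

# Hypothetical stand-in for the host_acceptance_rate column.
toy = pd.DataFrame({"host_acceptance_rate": ["74%", None, "100%", "0%"]})

# Record which rows were originally missing before they are overwritten.
was_missing = toy["host_acceptance_rate"].isna()

# Fill, strip the percent sign, and convert to float.
rate = toy["host_acceptance_rate"].fillna("0%").str.rstrip("%").astype(float)

print(rate.tolist())
print(was_missing.tolist())
```

Keeping the flag lets a later model tell a genuine 0% acceptance rate apart from an imputed one.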

> For host_is_superhost, we have three null values, all of which are hotels or motels using Airbnb. I am not sure of Airbnb's policy, but one can conclude that this is the reason. Since there are only three, filling those null values with f (they are not superhosts) should not be a problem.

In [22]:
Boston_listings[Boston_listings['host_is_superhost'].isnull()]

Unnamed: 0,id,listing_url,name,description,host_name,host_since,host_about,host_response_rate,host_acceptance_rate,host_is_superhost,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
2485,41740613,https://www.airbnb.com/rooms/41740613,The Revolution Hotel,,The Revolution Hotel,2018-12-03,,100%,74%,,...,,,,,f,1,0,0,0,
2515,42065556,https://www.airbnb.com/rooms/42065556,Inn @ St. Botolph,,Inn @,2016-12-14,Enjoy contemporary charm and outstanding value...,,0%,,...,4.67,4.56,4.5,,f,1,0,0,0,0.87
2836,46232976,https://www.airbnb.com/rooms/46232976,citizenM Boston North Station,,CitizenM,2020-11-04,You’ll find us right around the corner from th...,100%,74%,,...,,,,,f,1,0,0,0,


In [23]:
Boston_listings = Boston_listings.fillna({'host_is_superhost':'f'})

In [24]:
Boston_listings['host_is_superhost'].unique()

array(['f', 't'], dtype=object)

In [29]:
Boston_listings[Boston_listings['host_neighbourhood'].isnull()][['listing_url', 'host_neighbourhood']]

Unnamed: 0,listing_url,host_neighbourhood
136,https://www.airbnb.com/rooms/1541166,
145,https://www.airbnb.com/rooms/1644031,
202,https://www.airbnb.com/rooms/2672058,
214,https://www.airbnb.com/rooms/2975163,
229,https://www.airbnb.com/rooms/3316049,
...,...,...
5015,https://www.airbnb.com/rooms/697294545202982856,
5025,https://www.airbnb.com/rooms/699168449076485035,
5044,https://www.airbnb.com/rooms/702038283928493258,
5075,https://www.airbnb.com/rooms/706037531970508047,


In [28]:
Boston_listings[['listing_url', 'host_neighbourhood']]

Unnamed: 0,listing_url,host_neighbourhood
0,https://www.airbnb.com/rooms/3168,Newton
1,https://www.airbnb.com/rooms/3781,East Boston
2,https://www.airbnb.com/rooms/5506,Roxbury
3,https://www.airbnb.com/rooms/6695,Roxbury
4,https://www.airbnb.com/rooms/7903,Somerville
...,...,...
5180,https://www.airbnb.com/rooms/716081443145047239,Entertainment District
5181,https://www.airbnb.com/rooms/716081469166085329,Entertainment District
5182,https://www.airbnb.com/rooms/716081495310456299,Entertainment District
5183,https://www.airbnb.com/rooms/716235197792512391,Entertainment District


In [None]:
# pd.pandas.set_option('display.max_rows', None)

# pd.reset_option('all', silent=True)

In [25]:

from selenium import webdriver
import time
from bs4 import BeautifulSoup


driver = webdriver.Chrome()
url= "https://www.airbnb.com/rooms/10813?_set_bev_on_new_domain=1672487471_YmRkYzRjZmE4OTI5&source_impression_id=p3_1673713643_DcJa9orfz3w1CQku&locale=en"
driver.maximize_window()
driver.get(url)

time.sleep(5)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
officials = soup.findAll("span",{"class":"_9xiloll"})

for entry in officials:
    print(str(entry))


driver.quit()

<span aria-hidden="false" class="_9xiloll">Boston, Massachusetts, United States</span>


In [8]:
5185 - 4626

559

In [12]:
print(Boston_listings[Boston_listings['bedrooms'].isnull()]['beds'].isnull().mean())
len(Boston_listings[Boston_listings['bedrooms'].isnull()]['beds']) * Boston_listings[Boston_listings['bedrooms'].isnull()]['beds'].isnull().mean()



0.023255813953488372


13.0

> In the `beds` column there are 71 null values. Those listings don't follow a clear pattern: some of them actually have beds and some don't, yet all the beds are in a non-traditional bedroom, i.e. the bed is in the same room as the kitchen or the living room (without doors or walls separating them). I think a suitable option here is to replace those null values with 1, as there is an actual bed in most of them.

In [43]:
Boston_listings = Boston_listings.fillna({'beds':1})

> There are two missing values in `minimum_minimum_nights` and in three other columns. Since all four are missing in the same two observations, we can drop those two rows.

In [74]:
# Boston_listings[['minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
#                 'maximum_minimum_nights', 'minimum_maximum_nights', 'maximum_maximum_nights']]
# 900 + 139

Boston_listings.dropna(subset=['minimum_minimum_nights'], inplace=True)
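
A minimal sketch of how one might confirm that the nulls co-occur in the same rows before dropping them. It assumes the "three other columns" are the remaining min/max night columns named in the commented-out lines above; the `toy` frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed set of affected columns.
night_cols = ["minimum_minimum_nights", "maximum_minimum_nights",
              "minimum_maximum_nights", "maximum_maximum_nights"]

# Hypothetical stand-in: rows 1 and 3 are missing all four values.
toy = pd.DataFrame({col: [1.0, np.nan, 30.0, np.nan, 2.0] for col in night_cols})

# Number of missing night columns per row.
missing_per_row = toy[night_cols].isna().sum(axis=1)
rows_all_missing = int((missing_per_row == len(night_cols)).sum())
rows_partly_missing = int(((missing_per_row > 0) & (missing_per_row < len(night_cols))).sum())

# Drop only the rows where every night column is missing.
cleaned = toy.dropna(subset=night_cols, how="all")
print(rows_all_missing, rows_partly_missing, len(cleaned))
```

If `rows_partly_missing` is zero, dropping on a single column, as the cell above does, removes exactly the same rows.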

In [75]:
Boston_listings.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5183 entries, 0 to 5184
Data columns (total 59 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            5183 non-null   int64  
 1   listing_url                                   5183 non-null   object 
 2   name                                          5183 non-null   object 
 3   description                                   5183 non-null   object 
 4   host_name                                     5183 non-null   object 
 5   host_since                                    5183 non-null   object 
 6   host_about                                    5183 non-null   object 
 7   host_response_rate                            4586 non-null   object 
 8   host_acceptance_rate                          5183 non-null   object 
 9   host_is_superhost                             5183 non-null   o

In [None]:
# import numpy as np
# from sklearn.feature_extraction.text import TfidfVectorizer
# from sklearn.pipeline import Pipeline, FeatureUnion
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.preprocessing import FunctionTransformer
# from sklearn.model_selection import GridSearchCV, StratifiedKFold

# # host_about, description

# # Create Function Transformer to use Feature Union
# def get_numeric_data(x):
#     return [record[:-2].astype(float) for record in x]

# def get_text_data(x):
#     return [record[-1] for record in x]

# transfomer_numeric = FunctionTransformer(get_numeric_data)
# transformer_text = FunctionTransformer(get_text_data)

# # Create a pipeline to concatenate Tfidf Vector and Numeric data
# # Use RandomForestClassifier as an example
# pipeline = Pipeline([
#     ('features', FeatureUnion([
#             ('numeric_features', Pipeline([
#                 ('selector', transfomer_numeric)
#             ])),
#              ('text_features', Pipeline([
#                 ('selector', transformer_text),
#                 ('vec', TfidfVectorizer(analyzer='word'))
#             ]))
#          ])),
#     ('clf', RandomForestClassifier())
# ])

# # Grid Search Parameters for RandomForest
# param_grid = {'clf__n_estimators': np.linspace(1, 100, 10, dtype=int),
#               'clf__min_samples_split': [3, 10],
#               'clf__min_samples_leaf': [3],
#               'clf__max_features': [7],
#               'clf__max_depth': [None],
#               'clf__criterion': ['gini'],
#               'clf__bootstrap': [False]}

# # Training config
# kfold = StratifiedKFold(n_splits=7)
# scoring = {'Accuracy': 'accuracy', 'F1': 'f1_macro'}
# refit = 'F1'

# # Perform GridSearch
# rf_model = GridSearchCV(pipeline, param_grid=param_grid, cv=kfold, scoring=scoring, 
#                          refit=refit, n_jobs=-1, return_train_score=True, verbose=1)
# rf_model.fit(X_train, Y_train)
# rf_best = rf_model.best_estimator_

In [4]:
Boston_listings['last_scraped'].unique()

array(['2022-09-15', '2022-10-02'], dtype=object)

In [9]:
Boston_listings['name'].unique()

array(['TudorStudio', 'HARBORSIDE-Walk to subway',
       '** Fort Hill Inn Private! Minutes to center!**', ...,
       'Peaceful Bedroom w/ Shared Bath - AC, Wifi included',
       'Sunny Room w/ Shared Bath in Modest Brighton Home',
       'Charming Room in Modern Shared Spacious Apt'], dtype=object)

In [6]:
Boston_listings['price']

0        $99.00
1       $132.00
2       $149.00
3       $179.00
4       $116.00
         ...   
5180     $51.00
5181     $51.00
5182     $51.00
5183     $51.00
5184     $51.00
Name: price, Length: 5185, dtype: object

In [7]:
# '$' is a regex end-of-string anchor, so match it literally with regex=False.
Boston_listings['price'] = Boston_listings['price'].str.replace('$', '', regex=False)
Boston_listings['price'] = Boston_listings['price'].str.replace(',', '', regex=False)

In [8]:
Boston_listings = Boston_listings.astype({'price': 'float64'})

In [9]:
# Count the amenities per listing: after splitting on '"', the odd-indexed pieces are the quoted names.
am = []
for amen in Boston_listings['amenities']:
    temp_am = []
    for i, amenity in enumerate(amen.split('"')):
        if i % 2 == 1:
            temp_am.append(amenity)
    am.append(len(temp_am))
max(am), min(am), sum(am)/len(am)

In [None]:
Boston_listings[Boston_listings['amenities']=='["Long term stays allowed"]']

In [37]:
Boston_listings['amenities']

0       ["Dishes and silverware", "Long term stays all...
1       ["Bed linens", "Dishes and silverware", "Long ...
2       ["Bed linens", "Dishes and silverware", "Long ...
3       ["Dishes and silverware", "Long term stays all...
4       ["Bed linens", "Rice maker", "Dishes and silve...
                              ...                        
5180    ["Long term stays allowed", "Stove", "Wifi", "...
5181    ["Long term stays allowed", "Stove", "Wifi", "...
5182    ["Long term stays allowed", "Stove", "Wifi", "...
5183    ["Bed linens", "Dishes and silverware", "Long ...
5184    ["Bed linens", "Dishes and silverware", "Long ...
Name: amenities, Length: 5185, dtype: object
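
The values shown above look like JSON arrays, so a possible alternative to splitting on quotes is to parse them. This is a sketch under the assumption that every `amenities` string is valid JSON; the `toy` series is hypothetical.

```python
import json
import pandas as pd

# Hypothetical stand-in for Boston_listings['amenities'].
toy = pd.Series([
    '["Dishes and silverware", "Long term stays allowed", "Wifi"]',
    '["Long term stays allowed"]',
    '[]',
], name="amenities")

# Parse each string into a Python list, then count the entries.
amenity_lists = toy.apply(json.loads)
amenity_counts = amenity_lists.apply(len)

summary = (amenity_counts.max(), amenity_counts.min(), amenity_counts.mean())
print(summary)
```

Parsing also copes with amenity names that contain escaped quotes, which the quote-splitting approach would miscount.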