City With Most Amenities

You're given a dataset of searches for properties on Airbnb. For simplicity, let's say that each search result (i.e., each row) represents a unique host. Find the city with the most amenities summed across all of its hosts' properties. Output the name of the city.

In [1]:
import pandas as pd
import numpy as np

In [2]:
airbnb_search_details = pd.read_csv("../CSV/airbnb_search_details.csv")
airbnb_search_details.head()

,id,price,property_type,room_type,amenities,accommodates,bathrooms,bed_type,cancellation_policy,cleaning_fee,city,host_identity_verified,host_response_rate,host_since,neighbourhood,number_of_reviews,review_scores_rating,zipcode,bedrooms,beds
0,12513361,555.68,Apartment,Entire home/apt,"{TV,""Wireless Internet"",""Air conditioning"",""Sm...",2,1,Real Bed,flexible,False,NYC,t,89 %,2015-11-18,East Harlem,3,87.0,10029,0,1
1,7196412,366.36,Cabin,Private room,"{""Wireless Internet"",Kitchen,Washer,Dryer,""Smo...",2,3,Real Bed,moderate,False,LA,f,100 %,2016-09-10,Valley Glen,14,91.0,91606,1,1
2,16333776,482.83,House,Private room,"{TV,""Cable TV"",Internet,""Wireless Internet"",Ki...",2,1,Real Bed,strict,True,SF,t,100 %,2013-12-26,Richmond District,117,96.0,94118,1,1
3,1786412,448.86,Apartment,Private room,"{""Wireless Internet"",""Air conditioning"",Kitche...",2,1,Real Bed,strict,True,NYC,t,93 %,2010-05-11,Williamsburg,8,86.0,11211,1,1
4,14575777,506.89,Villa,Private room,"{TV,Internet,""Wireless Internet"",""Air conditio...",6,2,Real Bed,strict,True,LA,t,70 %,2015-10-22,,2,100.0,90703,3,3


In [3]:
df = airbnb_search_details

In [4]:
df['amenities_count'] = df['amenities'].str.count(',') + 1
df.head()

,id,price,property_type,room_type,amenities,accommodates,bathrooms,bed_type,cancellation_policy,cleaning_fee,...,host_identity_verified,host_response_rate,host_since,neighbourhood,number_of_reviews,review_scores_rating,zipcode,bedrooms,beds,amenities_count
0,12513361,555.68,Apartment,Entire home/apt,"{TV,""Wireless Internet"",""Air conditioning"",""Sm...",2,1,Real Bed,flexible,False,...,t,89 %,2015-11-18,East Harlem,3,87.0,10029,0,1,9
1,7196412,366.36,Cabin,Private room,"{""Wireless Internet"",Kitchen,Washer,Dryer,""Smo...",2,3,Real Bed,moderate,False,...,f,100 %,2016-09-10,Valley Glen,14,91.0,91606,1,1,11
2,16333776,482.83,House,Private room,"{TV,""Cable TV"",Internet,""Wireless Internet"",Ki...",2,1,Real Bed,strict,True,...,t,100 %,2013-12-26,Richmond District,117,96.0,94118,1,1,29
3,1786412,448.86,Apartment,Private room,"{""Wireless Internet"",""Air conditioning"",Kitche...",2,1,Real Bed,strict,True,...,t,93 %,2010-05-11,Williamsburg,8,86.0,11211,1,1,15
4,14575777,506.89,Villa,Private room,"{TV,Internet,""Wireless Internet"",""Air conditio...",6,2,Real Bed,strict,True,...,t,70 %,2015-10-22,,2,100.0,90703,3,3,10


In [5]:
df.columns

Index(['id', 'price', 'property_type', 'room_type', 'amenities',
       'accommodates', 'bathrooms', 'bed_type', 'cancellation_policy',
       'cleaning_fee', 'city', 'host_identity_verified', 'host_response_rate',
       'host_since', 'neighbourhood', 'number_of_reviews',
       'review_scores_rating', 'zipcode', 'bedrooms', 'beds',
       'amenities_count'],
      dtype='object')

In [6]:
grouped_asd = df.groupby('city', as_index=False).agg({'amenities_count': 'sum'})
grouped_asd

,city,amenities_count
0,Boston,63
1,Chicago,88
2,DC,16
3,LA,1146
4,NYC,1415
5,SF,128


In [7]:
max_city = grouped_asd[grouped_asd['amenities_count'] == grouped_asd['amenities_count'].max()]['city'].tolist()
max_city

['NYC']

In [9]:
result = max_city
result

['NYC']

Solution Walkthrough
In this walkthrough, we will analyze a dataset of searches for properties on Airbnb and find the city whose hosts' properties have the most amenities in total.

We will first understand the data, then break down the problem statement, and finally implement the solution code.

Understanding The Data
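The data is loaded into a pandas DataFrame called airbnb_search_details. Each row represents a unique host, and the columns describe that host's property (price, room type, location, reviews, and so on). Only two columns matter for this problem: 'city' and 'amenities'. Each 'amenities' value is a single string in set notation, such as {TV,"Wireless Internet","Air conditioning",...}, with multi-word names wrapped in double quotes.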
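The solution treats 'amenities' as plain text and never splits it into items. As a sketch of what a single value holds, the hypothetical helper parse_amenities below (not part of the original solution) turns one string into a list of names. It assumes every value follows the format seen in the preview: surrounding braces, comma-separated items, and double quotes around multi-word names. The standard csv module respects those quotes, so a quoted name containing a comma stays a single item, something the comma-counting rule used later cannot guarantee.

```python
import csv


def parse_amenities(raw: str) -> list[str]:
    """Split a set-style amenities string such as '{TV,"Wireless Internet"}'.

    Hypothetical helper. Assumes the format seen in the dataset preview:
    surrounding braces, comma-separated items, double quotes around
    multi-word names.
    """
    inner = raw.strip().strip("{}")  # drop the surrounding braces
    if not inner:
        return []  # '{}' means the host listed no amenities
    # csv.reader honours double quotes, so a comma inside quotes stays in the item
    return next(csv.reader([inner]))


print(parse_amenities('{TV,"Wireless Internet","Air conditioning"}'))
# → ['TV', 'Wireless Internet', 'Air conditioning']
print(parse_amenities("{}"))
# → []
```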

The Problem Statement
The task is to find the city with the largest total number of amenities across its hosts' properties. That takes three steps: count the amenities in each row, sum those counts by city, and pick the city with the highest sum.
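The three steps can be tried on a tiny hand-made DataFrame before running them on the real data. In this sketch the rows and the function name city_with_most_amenities are invented for illustration; the counting rule is the same commas-plus-one rule the solution uses.

```python
import pandas as pd


def city_with_most_amenities(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: city (or cities, on a tie) with the largest amenity total."""
    # Step 1: amenities per row = number of commas + 1
    counts = df["amenities"].str.count(",") + 1
    # Step 2: sum the per-row counts within each city
    totals = counts.groupby(df["city"]).sum()
    # Step 3: keep every city whose total equals the maximum
    return totals[totals == totals.max()].index.tolist()


# Invented rows: NYC has 2 + 3 = 5 amenities, LA has 4
toy = pd.DataFrame(
    {
        "city": ["NYC", "NYC", "LA"],
        "amenities": ["{TV,Kitchen}", '{TV,"Wireless Internet",Washer}', "{TV,Kitchen,Washer,Dryer}"],
    }
)
print(city_with_most_amenities(toy))  # ['NYC']
```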

Breaking Down The Code
Let's start by importing pandas and NumPy and assigning the DataFrame to the shorter name df. (NumPy is imported but not actually used by this solution.)

import pandas as pd
import numpy as np

df = airbnb_search_details
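One detail of that assignment: df = airbnb_search_details binds a second name to the same DataFrame rather than copying it, so the 'amenities_count' column added next also appears on airbnb_search_details. A small sketch on made-up rows shows the effect, along with DataFrame.assign as one way to add the column without modifying the source frame, in case the original is needed unchanged.

```python
import pandas as pd

original = pd.DataFrame({"amenities": ["{TV,Kitchen}", "{TV}"]})

alias = original  # plain assignment: both names point to the same object
alias["amenities_count"] = alias["amenities"].str.count(",") + 1
print("amenities_count" in original.columns)  # True: the original was modified

fresh = pd.DataFrame({"amenities": ["{TV,Kitchen}", "{TV}"]})
# assign() returns a new DataFrame and leaves `fresh` untouched
with_counts = fresh.assign(amenities_count=fresh["amenities"].str.count(",") + 1)
print("amenities_count" in fresh.columns)  # False
print(with_counts["amenities_count"].tolist())  # [2, 1]
```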
Next, we add a column called 'amenities_count' that holds the number of amenities in each row. Because n items in a list are separated by n - 1 commas, counting the commas in the 'amenities' string and adding 1 gives the number of items.

df["amenities_count"] = df["amenities"].str.count(",") + 1
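The commas-plus-one rule has edge cases worth knowing about. An empty set, '{}', contains no commas and is therefore counted as one amenity, and a missing value produces NaN, which groupby().sum() later skips. The preview does not show whether the real dataset contains empty sets. The sketch below compares the original rule with a guarded variant; treating '{}' as zero is an assumption about how empty sets should be scored, not part of the original solution.

```python
import pandas as pd

amenities = pd.Series(['{TV,"Wireless Internet",Kitchen}', "{TV}", "{}"])

# Original rule: count separators and add one for the last item
naive = amenities.str.count(",") + 1

# Guarded variant (assumption): an empty set '{}' contributes zero amenities
is_empty = amenities.str.strip() == "{}"
guarded = naive.where(~is_empty, 0)

print(naive.tolist())    # [3, 1, 1]
print(guarded.tolist())  # [3, 1, 0]
```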
We then group the DataFrame by 'city' and sum 'amenities_count' within each city using groupby() and agg(). Passing as_index=False keeps 'city' as a regular column instead of turning it into the index. The result is stored in a new DataFrame called grouped_asd.

grouped_asd = df.groupby("city", as_index=False).agg(
    {"amenities_count": "sum"}
)
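The dict passed to agg() maps a column name to an aggregation. For a single column and a single function, the same table can be built by selecting the column, calling sum(), and moving 'city' back out of the index with reset_index(). A sketch on invented numbers confirms that the two forms match:

```python
import pandas as pd

# Invented per-row counts
df = pd.DataFrame({"city": ["LA", "NYC", "LA", "SF"], "amenities_count": [10, 15, 11, 29]})

# Form used in the solution: dict-based agg, with city kept as a regular column
via_agg = df.groupby("city", as_index=False).agg({"amenities_count": "sum"})

# Equivalent form: select the column, sum, then turn the city index back into a column
via_sum = df.groupby("city")["amenities_count"].sum().reset_index()

print(via_agg)
print(via_agg.equals(via_sum))  # True
```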
Now we need the city with the highest sum. We compare each city's 'amenities_count' with the column's maximum and select the 'city' values of the matching rows. Converting the selection to a list means that if several cities tie for the maximum, all of them are returned.

max_city = grouped_asd[
    grouped_asd["amenities_count"]
    == grouped_asd["amenities_count"].max()
]["city"].tolist()
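Because the comparison keeps every row equal to the maximum, max_city holds all tied cities, which is why it is a list. Series.idxmax(), by contrast, returns only the first label that reaches the maximum. A sketch with an invented tie shows the difference:

```python
import pandas as pd

# Invented totals with a tie between LA and NYC
grouped = pd.DataFrame({"city": ["Boston", "LA", "NYC"], "amenities_count": [63, 120, 120]})

top = grouped["amenities_count"].max()
# Boolean mask: keeps every row that reaches the maximum
all_max = grouped.loc[grouped["amenities_count"] == top, "city"].tolist()
# idxmax: index label of the first maximum only
first_max = grouped.loc[grouped["amenities_count"].idxmax(), "city"]

print(all_max)    # ['LA', 'NYC']
print(first_max)  # LA
```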
Finally, we assign the result, which is a list containing the name of the city with the most amenities, to the variable result.

result = max_city
Bringing It All Together
To summarize, we start by importing the necessary libraries and assigning the DataFrame to the variable df. Then, we create a new column 'amenities_count' to store the count of amenities for each property. Next, we group the DataFrame by 'city', sum up the 'amenities_count' for each city, and store the result in grouped_asd. Finally, we find the city with the highest sum of amenities and assign it to result.

import pandas as pd
import numpy as np

df = airbnb_search_details

df["amenities_count"] = df["amenities"].str.count(",") + 1

grouped_asd = df.groupby("city", as_index=False).agg(
    {"amenities_count": "sum"}
)

max_city = grouped_asd[
    grouped_asd["amenities_count"]
    == grouped_asd["amenities_count"].max()
]["city"].tolist()

result = max_city
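The same pipeline can also be written as one method chain. In this sketch, Series.nlargest(1, keep="all") keeps every city tied for first place, so it returns the same cities as the mask-based comparison. The frame sample is an invented stand-in with a deliberate tie; to run the chain on the real data, replace sample with airbnb_search_details.

```python
import pandas as pd

# Invented stand-in data: NYC (2 + 1) and LA (3) tie with 3 amenities each
sample = pd.DataFrame(
    {
        "city": ["NYC", "LA", "NYC", "SF"],
        "amenities": ["{TV,Kitchen}", "{TV,Washer,Dryer}", "{TV}", "{Kitchen}"],
    }
)

result = (
    sample["amenities"]
    .str.count(",")
    .add(1)                      # commas + 1 = amenities per row
    .groupby(sample["city"])     # bucket the per-row counts by city
    .sum()                       # total amenities per city
    .nlargest(1, keep="all")     # keep every city tied for first place
    .index.tolist()
)
print(sorted(result))  # ['LA', 'NYC']
```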
Conclusion
The code counts the amenities in each host's listing, sums those counts by city, and returns the city with the highest total. On this dataset that is NYC, with 1,415 amenities, ahead of LA with 1,146.