In [1]:
import pandas as pd

# Problem Description

You’re given a table of rental property searches by users. The table consists of search results and outputs host information for searchers. Find the minimum, average, maximum rental prices for each host’s popularity rating. The host’s popularity rating is defined as below:

    0 reviews: New

    1 to 5 reviews: Rising

    6 to 15 reviews: Trending Up

    16 to 40 reviews: Popular

    more than 40 reviews: Hot

Tip: The `id` column in the table refers to the search ID. You'll need to create your own host_id by concatenating price, room_type, host_since, zipcode, and number_of_reviews.

Output host popularity rating and their minimum, average and maximum rental prices.

## First look at Data

In [2]:
airbnb = pd.read_csv('airbnb_host_searches.csv')
airbnb.head(3)

Unnamed: 0,id,price,property_type,room_type,amenities,accommodates,bathrooms,bed_type,cancellation_policy,cleaning_fee,city,host_identity_verified,host_response_rate,host_since,neighbourhood,number_of_reviews,review_scores_rating,zipcode,bedrooms,beds
0,8284881,621.46,House,Entire home/apt,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",8,3,Real Bed,strict,True,LA,f,100%,2016-11-01,Pacific Palisades,1,,90272,4,6
1,8284882,621.46,House,Entire home/apt,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",8,3,Real Bed,strict,True,LA,f,100%,2016-11-01,Pacific Palisades,1,,90272,4,6
2,9479348,598.9,Apartment,Entire home/apt,"{""Wireless Internet"",""Air conditioning"",Kitche...",7,2,Real Bed,strict,False,NYC,f,100%,2017-07-03,Hell's Kitchen,1,60.0,10036,3,4


## First Thoughts
* Instead of creating a new host_id, I will drop all duplicates based on the suggested columns

* Create a new column with the Host's popularity

* Group by host_popularity and aggregate price with min, mean, and max
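
The first point above can be checked on a small example. The sketch below uses made-up toy rows (not the real dataset) and assumes the key columns contain no missing values. It builds an explicit `host_id` by joining the suggested columns as strings, then compares that with deduplicating directly on the same columns. The `|` separator is my own choice to avoid accidental collisions between adjacent values.

```python
import pandas as pd

# Toy rows with made-up values; the first two searches hit the same host.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [621.46, 621.46, 598.90],
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt"],
    "host_since": ["2016-11-01", "2016-11-01", "2017-07-03"],
    "zipcode": [90272, 90272, 10036],
    "number_of_reviews": [1, 1, 1],
})

key_cols = ["price", "room_type", "host_since", "zipcode", "number_of_reviews"]

# Option 1: build an explicit host_id by joining the key columns as strings.
toy["host_id"] = toy[key_cols].astype(str).agg("|".join, axis=1)
by_host_id = toy.drop_duplicates(subset="host_id")

# Option 2: deduplicate directly on the same columns, without a host_id.
by_subset = toy.drop_duplicates(subset=key_cols)

print(len(by_host_id), len(by_subset))
```

On this toy data both options keep the same rows, which is why the notebook skips the explicit `host_id`.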

## Data Analysis

In [4]:
#checking for missing values and format of columns
airbnb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 167 entries, 0 to 166
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      167 non-null    int64  
 1   price                   167 non-null    float64
 2   property_type           167 non-null    object 
 3   room_type               167 non-null    object 
 4   amenities               167 non-null    object 
 5   accommodates            167 non-null    int64  
 6   bathrooms               167 non-null    int64  
 7   bed_type                167 non-null    object 
 8   cancellation_policy     167 non-null    object 
 9   cleaning_fee            167 non-null    bool   
 10  city                    167 non-null    object 
 11  host_identity_verified  167 non-null    object 
 12  host_response_rate      135 non-null    object 
 13  host_since              167 non-null    object 
 14  neighbourhood           152 non-null    ob

There are a few missing values, but none in the columns used for this problem.

Since we are not working with dates, there is no need to convert host_since with to_datetime.
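
One way to confirm that the key columns are complete is to count missing values on just those columns. This is a minimal sketch on made-up toy data; the same two lines could be run against `airbnb` instead.

```python
import pandas as pd

# Toy data with made-up values; only host_response_rate has a gap.
toy = pd.DataFrame({
    "price": [621.46, 598.90, 450.00],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room"],
    "host_since": ["2016-11-01", "2017-07-03", "2015-02-14"],
    "zipcode": [90272, 10036, 94110],
    "number_of_reviews": [1, 1, 12],
    "host_response_rate": ["100%", None, "90%"],
})

key_cols = ["price", "room_type", "host_since", "zipcode", "number_of_reviews"]

# Missing values per column, then the total across the key columns only.
missing_per_column = toy.isna().sum()
missing_in_keys = int(toy[key_cols].isna().sum().sum())

print(missing_per_column)
print("Missing values in key columns:", missing_in_keys)
```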

## Solution

In [7]:
df = airbnb.copy()
# Drop duplicate rows on the suggested columns so each host is counted only once
df = df.drop_duplicates(subset=['price', 'zipcode', 'host_since', 'number_of_reviews', 'room_type'])

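# Map a review count to the host's popularity rating.
# The numeric prefixes keep the groups in order when sorted alphabetically.
def to_popularity(n_reviews):
    if n_reviews == 0:
        return '1- New'
    elif n_reviews <= 5:
        return '2- Rising'
    elif n_reviews <= 15:
        return '3- Trending Up'
    elif n_reviews <= 40:
        return '4- Popular'
    elif n_reviews > 40:
        return '5- Hot'
    else:
        # Reached only when every comparison fails, e.g. for NaN
        return 'Error'
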
df['host_popularity'] = df['number_of_reviews'].apply(to_popularity)
output = df.groupby('host_popularity').price.agg(['min', 'mean', 'max']).reset_index()
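
As an alternative to `apply` with a Python function, the same binning can be sketched with `pd.cut`. This is not what the cell above does. It is a vectorized variant shown on made-up review counts, with bin edges chosen to match the rating rules in the problem description.

```python
import pandas as pd

# Made-up review counts covering every bin boundary.
reviews = pd.Series([0, 1, 5, 6, 15, 16, 40, 41], name="number_of_reviews")

# Right-inclusive bins: (-1, 0], (0, 5], (5, 15], (15, 40], (40, inf]
bins = [-1, 0, 5, 15, 40, float("inf")]
labels = ["1- New", "2- Rising", "3- Trending Up", "4- Popular", "5- Hot"]

popularity = pd.cut(reviews, bins=bins, labels=labels)

print(pd.DataFrame({"number_of_reviews": reviews, "host_popularity": popularity}))
```

Unlike `to_popularity`, `pd.cut` leaves a missing review count as NaN instead of returning `'Error'`, and the result is an ordered categorical rather than plain strings.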

## Final Output

In [9]:
output.round(2)

Unnamed: 0,host_popularity,min,mean,max
0,1- New,313.55,515.92,741.76
1,2- Rising,355.53,503.85,717.01
2,3- Trending Up,361.09,476.28,685.65
3,4- Popular,270.81,472.82,667.83
4,5- Hot,340.12,464.23,633.51
