## <b> Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to experience the world in a more unique, personalized way. Today, Airbnb is a one-of-a-kind service that is used and recognized around the world. Analysis of the millions of listings on Airbnb is crucial for the company. These listings generate a large amount of data that can be analyzed and used for security, business decisions, understanding the behavior and performance of customers and providers (hosts) on the platform, guiding marketing initiatives, implementing innovative additional services, and much more. </b>

## <b>This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. </b>

## <b> Explore and analyze the data to discover key insights, including (but not limited to):
* What can we learn about different hosts and areas?
* What can we learn from predictions? (e.g., locations, prices, reviews)
* Which hosts are the busiest and why?
* Is there any noticeable difference in traffic among different areas, and what could be the reason for it? </b>

In [None]:
import pandas as pd
import numpy as np

In [None]:
from google.colab import drive
drive.mount('/content/drive')

path = '/content/drive/MyDrive/AlmaBetter/Team Capstone Projects/1 Exploratory Data Analysis/Airbnb Bookings Analysis/Airbnb NYC 2019.csv'
df = pd.read_csv(path)
df_air = df.copy()

Mounted at /content/drive


# Peek

In [None]:
df_air.shape

(48895, 16)

In [None]:
df_air.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  object 
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  object 
 4   neighbourhood_group             48895 non-null  object 
 5   neighbourhood                   48895 non-null  object 
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  object 
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     

In [None]:
df_air.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0
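In the preview above, the one listing with 0 reviews also has empty `last_review` and `reviews_per_month` fields. A minimal sketch of how missing values per column could be counted is given below. It runs on a small hypothetical frame (`toy`, with made-up values) rather than on `df_air`, and whether the same pattern holds across the full dataset is an assumption to be checked, not a result.

```python
import pandas as pd

# Hypothetical toy frame that mimics a few columns of the listings data;
# the values are made up for illustration only.
toy = pd.DataFrame({
    "name": ["Clean & quiet apt", None, "Skylit Midtown Castle"],
    "number_of_reviews": [9, 0, 45],
    "last_review": ["2018-10-19", None, "2019-05-21"],
    "reviews_per_month": [0.21, None, 0.38],
})

# Count missing values per column and keep only the columns that have any.
missing = toy.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)

# Check whether every row without a review date also has zero reviews.
no_review_date = toy["last_review"].isna()
only_unreviewed = bool((toy.loc[no_review_date, "number_of_reviews"] == 0).all())
print(only_unreviewed)
```

Replacing `toy` with `df_air` would apply the same check to the real listings.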


In [None]:
df_air.groupby('neighbourhood')['price'].sum().sort_values(ascending=False)

neighbourhood
Williamsburg          563707
Midtown               436801
Upper West Side       415720
Hell's Kitchen        400987
Bedford-Stuyvesant    399917
                       ...  
Westerleigh              143
Silver Lake              140
Richmondtown              78
Rossville                 75
New Dorp                  57
Name: price, Length: 221, dtype: int64
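The totals above mix two effects, because a neighbourhood's price sum is its number of listings multiplied by its mean price. A hedged sketch of separating the two with a single aggregation follows. The frame `toy` and its neighbourhood labels `A` and `B` are hypothetical and only illustrate how the ranking can change between `sum` and `median`.

```python
import pandas as pd

# Hypothetical data: neighbourhood "A" has many cheap listings,
# neighbourhood "B" has few expensive ones.
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "price": [100, 120, 80, 100, 100, 100, 300, 200],
})

# Compute listing count, total, mean and median price per neighbourhood,
# then rank by median so that listing volume does not drive the order.
summary = (
    toy.groupby("neighbourhood")["price"]
       .agg(["count", "sum", "mean", "median"])
       .sort_values("median", ascending=False)
)
print(summary)
```

In this toy example `A` has the larger total but `B` has the higher typical price, which is the distinction the sum alone hides.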

In [None]:
df_air['neighbourhood'].value_counts()

Williamsburg          3920
Bedford-Stuyvesant    3714
Harlem                2658
Bushwick              2465
Upper West Side       1971
                      ... 
Fort Wadsworth           1
Richmondtown             1
New Dorp                 1
Rossville                1
Willowbrook              1
Name: neighbourhood, Length: 221, dtype: int64

In [None]:
df_air.groupby('neighbourhood_group')['room_type'].count()

neighbourhood_group
Bronx             1091
Brooklyn         20104
Manhattan        21661
Queens            5666
Staten Island      373
Name: room_type, dtype: int64

In [None]:
df_air.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')

# 1 What can we learn about different hosts and areas?

## Host Vs Count and Price

In [None]:
host_counts = df_air['host_id'].value_counts()
host_counts.describe()

count    37457.000000
mean         1.305363
std          2.760747
min          1.000000
25%          1.000000
50%          1.000000
75%          1.000000
max        327.000000
Name: host_id, dtype: float64

In [None]:
host_counts

219517861    327
107434423    232
30283594     121
137358866    103
16098958      96
            ... 
23727216       1
89211125       1
19928013       1
1017772        1
68119814       1
Name: host_id, Length: 37457, dtype: int64

In [None]:
# Percentage of hosts with exactly one listing
100*host_counts[host_counts==1].count()/len(host_counts)

86.24022212136583

In [None]:
# Percentage of hosts with exactly two listings
100*host_counts[host_counts==2].count()/len(host_counts)

8.887524361267587

In [None]:
# Percentage of hosts with more than two listings
100*host_counts[(host_counts>2)].count()/len(host_counts)

4.87225351736658

Host:

* Mean number of listings per host = 1.31
* Percentage of hosts with exactly one listing = 86.24
* Percentage of hosts with exactly two listings = 8.89
* Percentage of hosts with more than two listings = 4.87
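The figures above come from counting rows per `host_id`, that is, listings per host. As a sketch, the three separate percentage cells could be folded into one step by capping the count at 3 and normalizing. The `host_ids` series below is hypothetical toy data, not the real column.

```python
import pandas as pd

# Hypothetical host ids, one entry per listing (made-up values).
host_ids = pd.Series([1, 1, 1, 2, 2, 3, 4, 5, 6, 7])

# Number of listings owned by each host.
listings_per_host = host_ids.value_counts()

# Cap counts at 3 so the buckets are: 1 listing, 2 listings, 3 or more.
share = (
    listings_per_host.clip(upper=3)
                     .value_counts(normalize=True)
                     .sort_index() * 100
)
print(share.round(2))
print(round(listings_per_host.mean(), 2))
```

Using `df_air['host_id']` in place of `host_ids` should give the same kind of breakdown for the real data, with bucket `3` covering every host that has more than two listings.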

## Neighbourhood vs Count, Room type and Price

# 2 What can we learn from predictions? (e.g., locations, prices, reviews)

## Num of Reviews vs Host count

## Reviews per month vs Host count

## Neighbourhood vs Host count

## Prices vs Host count

## Minimum nights vs Host count

## Minimum Nights vs Price

# 3 Which hosts are the busiest and why?

# 4 Is there any noticeable difference in traffic among different areas, and what could be the reason for it?

# 5 Others

## Availability vs Host count

## Words in Name vs Count

## Room type count