# **Project Name** - Airbnb Bookings Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member** - RAVI KANT PRASAD


# **Project Summary -**

The analysis of the Airbnb dataset, which includes around 49,000 observations with 16 columns (a mix of categorical and numerical variables), provides key insights into host behavior, customer preferences, pricing trends, and overall platform performance. This exploratory data analysis (EDA) aims to uncover patterns and relationships that are valuable for Airbnb’s strategic decision-making.

This project focused on exploring and preparing a dataset for analysis. The data exploration phase involved examining and understanding the dataset’s characteristics, such as its structure, data types, presence of missing values, and value distributions. The cleaning phase addressed issues like errors, missing entries, and duplicate records, while also identifying outliers.

Through this process, we resolved any inconsistencies, ensuring the data was of high quality and suitable for further analysis. This crucial step minimized potential biases or errors that could influence the outcomes. The refined dataset is now well-prepared to address specific research questions.

After cleaning and preparing the data, we proceeded to explore and summarize it by generating descriptive statistics, visualizations, and identifying patterns and trends. During this phase, we investigated relationships between variables and explored potential underlying causes for observed patterns or trends.

We utilized data visualization techniques to interpret and uncover patterns in Airbnb data. Various charts and graphs were created to represent the data, with observations and insights documented for each visualization. This approach enabled us to gain a deeper understanding of the dataset and identify meaningful insights and patterns.

Through this analysis, we discovered trends and relationships that would have been challenging to discern from raw data alone. For instance, we identified factors such as minimum nights, number of reviews, and host listing counts as significant determinants of pricing. Additionally, we observed notable variations in availability across neighborhoods. These findings provide valuable insights for both travelers and hosts in the city.

The insights and observations derived from this analysis serve as a foundation for future decision-making and analysis related to Airbnb.

# **GitHub Link -**

https://github.com/Ravikant0705/Airbnb-Booking-Analysis-.git

# **Problem Statement**


1. **Popular Neighborhoods and Price Trends**: Identify the most popular neighborhoods for Airbnb rentals in the city. Analyze price and availability variations by neighborhood.

2. **Market Trends Over Time**: Analyze the evolution of the Airbnb market in the city. Explore significant trends in the number of listings, prices, and occupancy rates over time.

3. **Property Type Trends**: Investigate the types of properties listed on Airbnb in the city. Determine which types are more popular or command higher prices.

4. **Price Correlations**: Identify factors correlated with Airbnb rental prices in the city, such as location, amenities, property type, or proximity to landmarks.

5. **Investment Opportunities for Hosts**: Recommend the best neighborhoods for property investment based on affordable property rates and high traveler demand.

6. **Lengths of Stay by Neighborhood**: Explore how the lengths of stay vary across neighborhoods. Identify areas that attract longer versus shorter stays.

7. **Ratings vs. Pricing**: Examine the relationship between Airbnb rental prices and their ratings. Determine if higher-priced rentals typically receive better reviews.

8. **Review Analysis by Neighborhood**: Calculate the total number of reviews and identify the neighborhood group with the maximum reviews.

9. **Most Reviewed Room Types**: Identify the most reviewed room types within each neighborhood group on a monthly basis.

10. **Best Locations for Travelers**: Identify the best properties or locations in the city for travelers, based on reviews, amenities, and proximity to popular attractions.

11. **Best Locations for Hosts**: Determine the best locations for hosts to list their properties, considering high occupancy rates and rental demand.

12. **Price Variations Across Neighborhood Groups**: Analyze price variations across different neighborhood groups in the city to identify significant disparities or trends.


#### **Define Your Business Objective?**

To address the problem statements effectively, we align each analysis with a specific business objective. This ensures that the insights generated are actionable and tailored to the needs of Airbnb hosts, potential investors, and travelers.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')


In [None]:
data=pd.read_csv('/content/drive/MyDrive/Airbnb NYC 2019.csv')
data

### Dataset First View

In [None]:
# Dataset First Look
data.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, columns = data.shape
print(f"Number of Rows: {rows}")
print(f"Number of Columns: {columns}")

In [None]:
data.columns

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data.duplicated().sum()

In [None]:
data2 = data.drop_duplicates()
data2.count()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
data2.isnull().sum()

In [None]:
# Visualizing the missing values
import missingno as msno
msno.bar(data)
plt.show()

In [None]:
sns.heatmap(data.isnull(), cbar=False);

In [None]:
data.loc[:,data.isna().sum()!=0][:5]

In [None]:
def showMissing():
    missing = data.columns[data.isnull().any()].tolist()
    return missing

missingVal = pd.DataFrame()
missingVal['Missing Data Count'] = data[showMissing()].isnull().sum().sort_values(ascending = False)
missingVal['Missing Data Percentage'] = data[showMissing()].isnull().sum().sort_values(ascending = False)/len(data)*100

print(missingVal)

### What did you know about your dataset?

1. The dataset comprises 48,895 records and 16 attributes with a mix of integer, float, and string data types, so it contains both numeric and categorical variables.
2. The last_review attribute is stored as a string although it represents a date, so it needs to be converted to a datetime type.
3. The dataset contains no duplicate rows, so no record is double-counted in later aggregations and summaries.
4. Some attributes (name, host_name, last_review, and reviews_per_month) contain missing values that need to be addressed.
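
As a quick check of points 2 and 4, the sketch below converts a date-like string column to datetime and tests whether missing `reviews_per_month` values line up with listings that have zero reviews. The three rows are made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy rows that mimic the relevant columns of the dataset
toy = pd.DataFrame({
    "number_of_reviews": [9, 0, 45],
    "last_review": ["2018-10-19", None, "2019-05-21"],
    "reviews_per_month": [0.21, None, 0.38],
})

# Convert the string column to datetime; missing entries become NaT
toy["last_review"] = pd.to_datetime(toy["last_review"])

# Rows with no reviews should be exactly the rows with a missing monthly rate
no_reviews = toy["number_of_reviews"] == 0
missing_rate = toy["reviews_per_month"].isnull()
same_rows = bool((no_reviews == missing_rate).all())
print(same_rows)
```

If the same check holds on the full data, filling the missing monthly rate with 0 is a natural choice rather than an arbitrary one.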

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe()

### Variables Description

* **id** : Unique Id generated while storing data
* **name** : Name of the listing
* **host_id** : The host id is assigned to each host by Airbnb and is used to identify and distinguish hosts from one another.
* **host_name** : The host name is typically the name of the person who owns the property or is authorized to list it on Airbnb.
* **neighbourhood_group** : Location of the listing / categorical variable that indicates the general geographic area in which a listing is located.
* **neighbourhood** : area of the listing
* **latitude** : Latitude coordinate of the listing
* **longitude** : Longitude coordinate of the listing
* **room_type** : Type of the listing
* **price** : Price of the listing
* **minimum_nights** : Minimum nights to be paid for
* **number_of_reviews** : Number of reviews for the listing
* **last_review** : Date of the most recent review
* **reviews_per_month** : Average number of reviews that a listing receives per month
* **calculated_host_listings_count** : Total number of listings that a host has on the Airbnb platform
* **availability_365** : The number of days in a year that a listing is available for booking on the Airbnb platform based on the listing's calendar, and reflects the number of days in the future that the listing is marked as available for booking.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
data.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df = data.copy()  #creating copy to keep original dataset safe

In [None]:
#changing data type for last review
#(pandas infers the date format by default; infer_datetime_format is deprecated in pandas 2.x)
df['last_review'] = pd.to_datetime(df['last_review'])

#calculating total count of observation where number of review is equal to 0
len(df[df['number_of_reviews']== 0])

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
#replacing null values with possible data

#categorical values like name and host name are replaced with 'not known'
#(assigning the result back avoids chained in-place modification, which newer pandas versions warn about)
df['name'] = df['name'].fillna('not known')
df['host_name'] = df['host_name'].fillna('not known')

#replacing null reviews per month with 0, as the number of reviews is 0 for those listings
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)

In [None]:
#dropping id (a unique identifier with no analytical value) and last_review
#(it only provides information about the timeliness of the data being analyzed)
df = df.drop(['id', 'last_review'], axis=1)
df.isnull().sum()

In [None]:
# Calculate the median of the feature price
median = df['price'].median()

# Replace 0 with the median (assign back instead of a chained in-place call)
df['price'] = df['price'].replace(0, median)

In [None]:
#verifying that no listing with price equal to 0 remains
len(df[df['price']== 0])

In [None]:
#calculating number of potential booking each can take
df['number_of_bookings'] = (df['availability_365'] / df['minimum_nights']).astype(int)
df['potential_revenue_per_year'] = df['number_of_bookings']* df['price']

In [None]:
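#finding unique room types for each neighbourhood group
for count, group in enumerate(df['neighbourhood_group'].unique(), start=1):
  #restrict to the current group so the room types are specific to it
  room_types = df.loc[df['neighbourhood_group'] == group, 'room_type'].unique()
  print(f'{count}. {group} has room type {room_types}')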

In [None]:
#finding mean and median based on neighbourhood_group
df.groupby('neighbourhood_group')[['price','number_of_reviews', 'reviews_per_month','calculated_host_listings_count','number_of_bookings', 'potential_revenue_per_year']].agg(['mean', 'median','min','max','sum']).T

In [None]:
import plotly.express as px
from scipy.stats import norm

In [None]:
# # Datetime library for manipulating Date columns.
# from datetime import datetime
# import calendar

# The following lines adjust the granularity of reporting.
pd.options.display.float_format = "{:.2f}".format

In [None]:
#finding mean and median based on neighbourhood_group and room_type
df.groupby(['neighbourhood_group','room_type'])[['price','number_of_reviews', 'reviews_per_month','calculated_host_listings_count','number_of_bookings','potential_revenue_per_year']].agg(['mean', 'median','min', 'max','sum']).T

In [None]:
#finding total review when price is maximum
df_max_reviews = df[df['price'] == df['price'].max()].groupby('neighbourhood_group')[['reviews_per_month','number_of_bookings']].sum().reset_index().sort_values('reviews_per_month', ascending = False)
max_price = df['price'].max()
print(f'Maximum price across all listings is {max_price}')
df_max_reviews

In [None]:
#finding total review when price is minimum
df_min_reviews = df[df['price'] == df['price'].min()].groupby('neighbourhood_group')[['reviews_per_month','number_of_bookings']].sum().reset_index().sort_values('reviews_per_month', ascending = False)
Min_price = df['price'].min()
print(f'Minimum price across all listings is {Min_price}')
df_min_reviews

In [None]:
#finding for price above min and below max
df_between_max_min = df[(df['price'] > df['price'].min()) & (df['price'] < df['price'].max())].groupby('neighbourhood_group')[['reviews_per_month','number_of_bookings']].sum().reset_index().sort_values('reviews_per_month', ascending = False)
print(f'Price range for each Group is {Min_price} - {max_price}')
df_between_max_min

In [None]:
#considering only those reviews which are more than average and price range between min and max
df_above_avg_reviews = df[(df['reviews_per_month']> df['reviews_per_month'].mean()) & (df['price'] > df['price'].min()) & (df['price'] < df['price'].max())].groupby('neighbourhood_group')[['reviews_per_month','number_of_bookings']].sum().reset_index().sort_values('reviews_per_month', ascending = False)
avg = df['reviews_per_month'].mean()
print(f'Average Review: {avg}')
df_above_avg_reviews

In [None]:
#based on avg reviews per month creating poor and good engagement
#(the mean is computed once instead of once per row)
avg_reviews_per_month = df['reviews_per_month'].mean()
df['review_quality'] = np.where(df['reviews_per_month'] < avg_reviews_per_month, 'Poor Engagement', 'Good Engagement')

In [None]:
#checking number of booking based on review quality
pd.DataFrame(df.groupby('review_quality')['number_of_bookings'].value_counts().reset_index(name='Count')).head()

In [None]:
#checking number of booking based on review quality
pd.DataFrame(df.groupby('review_quality')['number_of_bookings'].value_counts().reset_index(name='Count')).tail()

In [None]:
#average price for airbnb which has good engagement but actual booking is 0
df_good = df[(df['number_of_bookings']==0) & (df['review_quality']=='Good Engagement')]
df_good['price'].mean()

In [None]:
#average price for airbnb which has poor engagement but more than 200 booking
df_bad = df[(df['number_of_bookings']>200) & (df['review_quality']=='Poor Engagement')]
df_bad['price'].mean()

In [None]:
#function to filter host with number of booking more than 0 and show good engagement
def good_host(group):
    return group[(group['number_of_bookings'] > 0) & (group['review_quality'] =='Good Engagement')]

# Grouping by host id and host name and apply the function
df_good_host = df.groupby(['host_id','host_name']).apply(good_host)

In [None]:
#creating copy and to make changes as it will be have host id and host name as column and row
df_good_host2 =  df_good_host.copy()
df_good_host2.drop(columns=['host_id','host_name'],inplace=True)
df_good_host2 = df_good_host2.reset_index()
df_good_host2.drop(columns=['level_2'],inplace=True)

In [None]:
#finding top 5 host with maximum booking and average price and minimum nights
#for their booking with good engagement
agg_dict = {'minimum_nights':'min','number_of_bookings':'sum','price':'mean'}
df_good_host2.groupby(['host_id','host_name']).agg(agg_dict).reset_index().sort_values(
    'number_of_bookings',ascending = False)[:5]

### What all manipulations have you done and insights you found?

To find insights in the data, it first had to be made consistent:

* **Data type:** The ***last_review*** feature represents a date but had the object data type, so I converted it to the ***datetime*** data type.
* **Null values:** Null values had to be handled before feature engineering, because the scope of this project is to find relationships, trends, and patterns among features, and null values can skew the results. I replaced null values in ***name*** and ***host_name*** with ***'not known'***. The count of listings with zero reviews was equal to the count of null values in ***reviews_per_month***, and a listing with no reviews has no monthly review rate, so I replaced those null values with ***zero***.
* **Zero prices:** The ***minimum value*** of ***price*** was ***zero***, and the data gives no reason for it (for example, whether a discount was given). I replaced the zero values with the ***median***, as the median is less sensitive to outliers than the mean and gives a more accurate representation of the central tendency in such cases.
* **New columns:** I added ***number_of_bookings*** and ***potential_revenue_per_year***, which help estimate the revenue a listing could generate in a year if it were ideally booked based on its availability.
* **Dropped columns:** I dropped ***last_review***. It could show how old the data is and could be used in sentiment analysis, but it is not used in this analysis. I also dropped ***id***, because the feature has ***no variability***: every row has a unique value, so it cannot provide any information about relationships with other variables.
* **Zero availability:** I left ***availability_365*** values of ***zero*** unchanged to understand how they affect the overall business. An Airbnb host should be available for at least one day in the year, so the reason behind these values needs a closer analysis.

The cleaned data was then exported to CSV format for visual analysis in Tableau.

After making the data consistent, I calculated the mean, median, minimum, maximum, and sum for each feature, and then grouped by neighbourhood_group and room_type to check what the maximum and minimum prices show about customer engagement. Visualization allows a better analysis, but these are a few findings from this step:
* The minimum price was 10 and the maximum price was 10,000 across the neighbourhood groups.
* Listings at the maximum price show very little interaction, mainly because they show no bookings.
* Manhattan shows the maximum engagement with respect to reviews per month.
* The host Sonder (NYC) shows the maximum engagement.
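
The cleaning and feature-engineering steps described above can be sketched on a small frame. The rows below are invented for illustration, and floor division stands in for the notebook's divide-then-cast step (the two agree for non-negative values); this is a minimal sketch rather than the exact notebook code.

```python
import pandas as pd

# Hypothetical toy listings; values are invented for illustration
toy = pd.DataFrame({
    "host_name": ["Ana", None, "Lee", "Sam"],
    "price": [100, 0, 250, 80],
    "minimum_nights": [2, 1, 30, 3],
    "availability_365": [365, 10, 90, 0],
    "reviews_per_month": [1.5, None, 0.2, None],
})

# Fill missing text and review-rate values by assigning the result back
toy["host_name"] = toy["host_name"].fillna("not known")
toy["reviews_per_month"] = toy["reviews_per_month"].fillna(0)

# Replace zero prices with the median price of the column
median_price = toy["price"].median()
toy["price"] = toy["price"].replace(0, median_price)

# Upper bound on bookings if every stay lasts exactly the minimum nights
toy["number_of_bookings"] = toy["availability_365"] // toy["minimum_nights"]
toy["potential_revenue_per_year"] = toy["number_of_bookings"] * toy["price"]
print(toy)
```

Note that `number_of_bookings` is an optimistic ceiling, not an observed count: it assumes every available day is booked in stays of exactly `minimum_nights`.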

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
#creating bar plot function
def barPlot(df,X,Y):
  plt.figure(figsize=(15,8));
  ax = sns.barplot(data = df,x= X, y = Y);


  # Add labels to the bars
  for bar in ax.patches:
      ax.annotate(format(bar.get_height(), '.0f'),
                    (bar.get_x() + bar.get_width() / 2,
                      bar.get_height()), ha='center', va='center',
                    size=15, xytext=(0, 8),
                    textcoords='offset points')

  # Set the font size of the tick labels to 18
  ax.tick_params(axis='both', which='major', labelsize=18);
  return ax

In [None]:
fig = px.histogram(df, x="price",);
fig.show()

In [None]:
#price up to 1000 (inclusive)
price_less_than_1000 = df[df['price'] <= 1000]
col = 'price'
#note: sns.distplot is deprecated in recent seaborn releases; it is kept here for the fitted normal curve
sns.distplot(price_less_than_1000[col], color = '#055E85',fit = norm);
feature = price_less_than_1000[col]
plt.axvline(feature.mean(), color='#ff033e', linestyle='dashed', linewidth=3,label= 'mean');  #Rose-red line indicates the mean of the data
plt.axvline(feature.median(), color='#A020F0', linestyle='dashed', linewidth=3,label='median'); #Purple line indicates the median of the data

plt.legend(bbox_to_anchor = (1.0, 1), loc = 'best');
# Add a title to the plot with custom font size and color
plt.title('Distribution of Price', fontsize=20, color='red')

# Show the plot
plt.show()

In [None]:
# Calculate Q1, Q3, and IQR
Q1 = df['price'].quantile(0.25)
Q3 = df['price'].quantile(0.75)
IQR = Q3 - Q1

# Lower and upper fences of the IQR method
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers: prices that fall outside the fences
outliers = df[(df['price'] < lower_bound) | (df['price'] > upper_bound)]

# Remove the outliers from the dataset
df_clean = df[(df['price'] >= lower_bound) & (df['price'] <= upper_bound)]

In [None]:
col = 'price'
sns.distplot(df_clean[col], color = '#055E85',fit = norm);
feature = df_clean[col]
plt.axvline(feature.mean(), color='#ff033e', linestyle='dashed', linewidth=3,label= 'mean');  #Rose-red line indicates the mean of the data
plt.axvline(feature.median(), color='#A020F0', linestyle='dashed', linewidth=3,label='median'); #Purple line indicates the median of the data

plt.legend(bbox_to_anchor = (1.0, 1), loc = 'best')
# Add a title to the plot with custom font size and color
plt.title('Distribution of Price', fontsize=20, color='red')

# Show the plot
plt.show()

In [None]:
#box plot for outlier visualization
sns.boxplot(y='price', data=df_clean).set_title('Price Distribution');

##### 1. Why did you pick the specific chart?

To analyze the distribution of price, the most effective way to visualize continuous data is through a histogram or a KDE plot. Therefore, I utilized a distplot, which combines both representations.

##### 2. What is/are the insight(s) found from the chart?

From the initial plot, I observed that prices ranged from 0 to 10,000. However, visualizing the entire range on the graph was challenging. Most of the values were concentrated below 1,000, with significantly fewer occurrences above this threshold. Additionally, the price distribution was right-skewed, prompting me to attempt outlier removal using the IQR method. Nonetheless, for the purposes of this project, I have decided not to proceed with the cleaned data.
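
To make the IQR method concrete, here is a minimal sketch on a handful of made-up prices with one extreme value. The numbers are hypothetical and chosen so the fences are easy to verify by hand.

```python
import pandas as pd

# Hypothetical prices with one extreme value, for illustration only
prices = pd.Series([50, 60, 70, 80, 90, 100, 110, 120, 10000])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Tukey fences: values outside [lower, upper] are flagged as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (prices < lower) | (prices > upper)

outliers = prices[is_outlier]
kept = prices[~is_outlier]
print(lower, upper, outliers.tolist())
```

Here Q1 is 70 and Q3 is 110, so the fences are 10 and 170, and only the 10,000 entry is flagged. On a right-skewed column such as price, the upper fence does most of the work.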

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding the distribution of prices is crucial for any business, as it directly influences customer demand. If the price is set too high, it might deter potential customers, resulting in decreased sales and revenue. Conversely, pricing too low could lead to insufficient profits or even difficulty covering operational costs, such as maintenance expenses. Therefore, businesses must carefully analyze price distribution to strike a balance that maximizes revenue while remaining appealing to customers.

Only three Airbnb listings are priced at 10,000, which is significantly higher than the majority of listings, where prices are under 1,000. This disparity could potentially have a negative impact on business revenue. However, further analysis is required to determine whether this pricing truly affects revenue by comparing it with other features for more precise insights.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Count the number of listings in each neighborhood group and store the result in a Pandas series
counts = data['neighbourhood_group'].value_counts()

# Reset the index of the series so that the neighborhood groups become columns in the resulting dataframe
Top_Neighborhood_group = counts.reset_index()

# Rename the columns of the dataframe to be more descriptive
Top_Neighborhood_group.columns = ['Neighborhood_Groups', 'Listing_Counts']

# display the resulting DataFrame
Top_Neighborhood_group

In [None]:
plt.figure(figsize=(12, 8),dpi=80)

# Create a countplot of the neighbourhood group data
# (recent seaborn versions expect the column to be passed by keyword)
sns.countplot(data=data, x='neighbourhood_group')


# Set the title of the plot
plt.title('Neighbourhood_group Listing Counts in city', fontsize=15)

# Set the x-axis label
plt.xlabel('Neighbourhood_Group', fontsize=14)

# Set the y-axis label
plt.ylabel('total listings counts', fontsize=14)

##### 1. Why did you pick the specific chart?

Bar charts are ideal for comparing data across categories. Here, the categories are the five neighborhood groups, and the chart clearly shows the differences in listing counts among them.
Bar charts are straightforward to interpret. Each bar represents a neighborhood group, and the bar's height corresponds to the listing count, making it easy to grasp the data at a glance.

##### 2. What is/are the insight(s) found from the chart?

*   Manhattan and Brooklyn have the highest number of listings on Airbnb, with over 19,000 listings each.

*   Queens and the Bronx have significantly fewer listings compared to Manhattan and Brooklyn, with 5,567 and 1,070 listings, respectively

*   Staten Island has the fewest number of listings, with only 365.

*   The distribution of listings across the different neighborhood groups is skewed, with a concentration of listings in Manhattan and Brooklyn.

*   Despite being larger in size, the neighborhoods in Queens, the Bronx, and Staten Island have fewer listings on Airbnb compared to Manhattan, which has a smaller geographical area.

*   This could suggest that the demand for Airbnb rentals is higher in Manhattan compared to the other neighborhoods, leading to a higher concentration of listings in this area.

*   Alternatively, it could be that the supply of listings is higher in Manhattan due to a higher number of homeowners or property owners in this neighborhood who are willing to list their properties on Airbnb.
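
The concentration of listings can also be expressed as shares rather than raw counts. The sketch below uses a made-up ten-row sample; the group labels follow the dataset, the counts do not.

```python
import pandas as pd

# Hypothetical sample of listings; labels follow the dataset, counts are invented
groups = pd.Series(
    ["Manhattan"] * 5 + ["Brooklyn"] * 4 + ["Queens"] * 1,
    name="neighbourhood_group",
)

# Absolute counts and relative shares in percent
counts = groups.value_counts()
shares = groups.value_counts(normalize=True).mul(100).round(1)

summary = pd.DataFrame({"listings": counts, "share_pct": shares})
print(summary)
```

Applied to the real `neighbourhood_group` column, the same two calls would give the percentage of listings in each group, which is easier to compare than five-digit counts.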

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> These insights are useful for understanding the popularity of the different neighborhood groups and for comparing the supply of listings across them.

> The comparatively small number of Airbnb listings in Staten Island will generate less revenue there, so a change in marketing strategy, such as promotional offers, may be needed.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
#grouping and taking mean of price
df_avg_price = df.groupby('neighbourhood_group')['price'].mean().reset_index().sort_values(
    'price',ascending = False)

#line plot
plt.figure(figsize=(12,8));

ax = sns.lineplot(data = df_avg_price,x='neighbourhood_group', y = 'price',
  marker= 'o', color = 'green',linewidth=3);

# Set the font size of the tick labels to 18
ax.tick_params(axis='both', which='major', labelsize=18);
# Set the x-label with a font size of 25
ax.set_xlabel("neighbourhood_group", fontsize=25)

# Set the y-label with a font size of 25
ax.set_ylabel("Price", fontsize=25)
plt.title('Price Point of neighbourhood_group', fontsize=20, color='green');

##### 1. Why did you pick the specific chart?

I wanted to find the trend of price across the neighbourhood groups, as price is continuous data (data that can take any value).

##### 2. What is/are the insight(s) found from the chart?

The average price of Airbnb listings in Manhattan is higher than in the other groups. The earlier analysis showed that Staten Island has fewer listings, yet its price point is relatively high compared to the others.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> Yes, the price point of airbnb refers to the specific price at which it is offered for service. Manhattan offers a competitive avg price of 196 which is a good value to generate profitable revenue and enhance brand perception.

> The price point in Staten Island is relatively high for the number of listings it has, which can decrease demand and cause negative brand perception.
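
To read price level and supply together, the average price and the listing count can be computed in one pass per group. This is a sketch on invented rows; the column names `avg_price` and `listings` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical listings; prices are invented for illustration
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Brooklyn", "Staten Island"],
    "price": [200, 180, 120, 100, 150],
})

# Named aggregation: average price and listing count side by side
by_group = (
    toy.groupby("neighbourhood_group")
       .agg(avg_price=("price", "mean"), listings=("price", "count"))
       .sort_values("avg_price", ascending=False)
)
print(by_group)
```

A group with few listings and a high average price stands out immediately in such a table, which is the pattern described for Staten Island.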

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Create the violin plot for price distribution in each Neighbourhood_groups

ax= sns.violinplot(x='neighbourhood_group',y='price',data= data)

##### 1. Why did you pick the specific chart?

A violin plot shows the price distribution within each neighbourhood group, so the spread and shape of prices can be compared across groups in a single chart.


##### 2. What is/are the insight(s) found from the chart?

*   Prices are spread over a very wide range in Manhattan and Brooklyn, and Manhattan has the most diversity in its price range, as the violin plot shows.

*   Queens and the Bronx have similar price distributions. In Queens, most listings fall between $50 and $100, but the diversity in price is lower than in Manhattan and Brooklyn.
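
The spread that the violin shapes suggest can be backed with numbers by computing quartiles per group. The prices below are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical prices per group, for illustration only
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Queens"] * 5,
    "price": [50, 100, 150, 300, 1000, 40, 60, 70, 80, 100],
})

# Quartiles per group: the numbers behind the violin shapes
stats = toy.groupby("neighbourhood_group")["price"].describe()
spread = stats[["25%", "50%", "75%"]].copy()

# Interquartile range as a simple measure of price diversity
spread["iqr"] = spread["75%"] - spread["25%"]
print(spread)
```

A larger interquartile range corresponds to a wider violin body, so this table gives a compact way to compare price diversity between groups.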

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> Yes, the price point of airbnb refers to the specific price at which it is offered for service. Manhattan offers a competitive avg price of 196 which is a good value to generate profitable revenue and enhance brand perception.

> Queens and the Bronx have similar price distributions, with most Queens listings between $50 and $100, but neither shows the price diversity of Manhattan and Brooklyn.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
#grouping all host id to know top 10 host
df_popular_host = df.groupby(['host_id','host_name'])['calculated_host_listings_count'
                                ].max().reset_index().sort_values(
                            'calculated_host_listings_count',ascending = False)[:10]
#barplot
ax = barPlot(df_popular_host,'host_name','calculated_host_listings_count')
# Set the x-label with a font size of 25
ax.set_xlabel("Host Name", fontsize=25)
plt.xticks(fontsize = 14, rotation = 90);

# Set the y-label with a font size of 25
ax.set_ylabel("Total Airbnb", fontsize=25)
plt.title('Top 10 Host with Most Airbnb', fontsize=20, color='purple');

##### 1. Why did you pick the specific chart?

Since host names are discrete (categorical) data, a bar plot is the best option for plotting this type of data.

##### 2. What is/are the insight(s) found from the chart?

Sonder (NYC) is the most popular host, with the highest number of Airbnb listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> Sonder (NYC) has the maximum number of listings, which means there is a chance this host generates more revenue.

> There are hosts with only 1 listing. They can create a negative brand image if they do not generate profit, so strategies can be designed to help them generate more revenue.
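
How common single-listing hosts are can be measured directly. The sketch below counts listings per host on invented rows; the host ids are made up.

```python
import pandas as pd

# Hypothetical listings: one row per listing, host ids are made up
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3, 4],
    "name": ["A", "B", "C", "D", "E", "F"],
})

# Number of listings per host, then the share of hosts with exactly one
listings_per_host = toy.groupby("host_id").size()
single_listing_share = (listings_per_host == 1).mean()
print(f"{single_listing_share:.0%} of hosts have a single listing")
```

In the real data, `calculated_host_listings_count` already stores this per-host count, so the grouping step is only needed to verify it or to recompute it after filtering.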

#### Chart - 6

In [None]:
# Chart - 6 visualization code
#Customer Engagement
df_engage = df.groupby(['name'])['number_of_reviews'].sum().reset_index().sort_values(
    'number_of_reviews',ascending = False)[:10]
#barplot
plt.figure(figsize=(12,8));
ax = sns.barplot(data = df_engage, y='name', x = 'number_of_reviews');
# ax.bar_label(ax.containers[0]);  will work other than google colab

# Set the font size of the tick labels to 15
ax.tick_params(axis='both', which='major', labelsize=15);
# Set the x-label with a font size of 25
ax.set_xlabel("Total Number of Reviews", fontsize=25)

# Set the y-label with a font size of 25
ax.set_ylabel("Airbnb Name", fontsize=25)
plt.title('Top 10 Customer Engaging Airbnb', fontsize=20, color='purple');

In [None]:
#reviews_per_month
df_engage2 = df.groupby(['name'])['reviews_per_month'].sum().reset_index().sort_values(
    'reviews_per_month', ascending = False)[:10]
#barplot
plt.figure(figsize=(12,8));
ax = sns.barplot(data = df_engage2, y='name', x = 'reviews_per_month');
# ax.bar_label(ax.containers[0]);  will work other than google colab

# Set the font size of the tick labels to 15
ax.tick_params(axis='both', which='major', labelsize=15);
# Set the x-label with a font size of 25
ax.set_xlabel("Reviews Per Month", fontsize=25)

# Set the y-label with a font size of 25
ax.set_ylabel("Airbnb Name", fontsize=25)
plt.title('Top 10 Customer Engaging Airbnb', fontsize=20, color='purple');

In [None]:
#grouping all host id to know top 10 host
df_popular_host = df.groupby(['host_id','host_name'])['reviews_per_month'].sum().reset_index().sort_values(
                            'reviews_per_month',ascending = False)[:10]
#barplot
ax = barPlot(df_popular_host,'host_name', 'reviews_per_month');

# Set the x-label with a font size of 25
ax.set_xlabel("Host Name", fontsize=25)
plt.xticks(fontsize = 14, rotation = 90);

# Set the y-label with a font size of 25
ax.set_ylabel("Reviews Per Month", fontsize=25)
plt.title('Top 10 Customer Engaging Host', fontsize=20, color='purple');

In [None]:
#checking engagement based on room type
df_room = df.groupby('room_type')['reviews_per_month'].sum().reset_index().sort_values(
    'reviews_per_month', ascending = False)
ax = barPlot(df_room,'room_type','reviews_per_month')
# Set the x-label with a font size of 25
ax.set_xlabel("Room Type", fontsize=25)

# Set the y-label with a font size of 25
ax.set_ylabel("Reviews Per Month", fontsize=25)
plt.title('Customer Engagement based on Room Type', fontsize=20, color='purple');

##### 1. Why did you pick the specific chart?

> I used a bar plot because it makes it easy to compare review totals across different categories.

##### 2. What is/are the insight(s) found from the chart?

> The listing **Room near JFK Queen Bed** has the maximum number of reviews. Measured by total reviews per month, however, the listing **Enjoy great views of the City in our Deluxe Room!**, the host **Sonder (NYC)** and the room type **Entire home/apt** show the maximum customer engagement.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> Yes. Customer engagement is the heart and soul of any business, and higher engagement generally means higher profit.

> A negative impact could only be identified through sentiment analysis of the review text. It is hard to infer from review counts alone, and this dataset does not contain the data needed for that analysis.
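
The cells above rank by the sum of `reviews_per_month`, which favours categories that simply contain more listings. The sketch below uses made-up values for the `room_type` and `reviews_per_month` columns to show how a sum and a per-listing mean can point to different winners.

```python
import pandas as pd

# Illustrative values only: four listings of one type, two of another
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 2,
    "reviews_per_month": [1.0, 1.0, 1.0, 1.0, 1.5, 1.5],
})

# Total, per-listing average and listing count for each room type
engagement = toy.groupby("room_type")["reviews_per_month"].agg(["sum", "mean", "count"])
print(engagement)

top_by_sum = engagement["sum"].idxmax()    # driven by the number of listings
top_by_mean = engagement["mean"].idxmax()  # engagement per listing
print("top by sum:", top_by_sum)
print("top by mean:", top_by_mean)
```

Reporting both columns side by side makes it clear whether a category leads because of volume or because each listing is reviewed more often.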

#### Chart - 7

In [None]:
# Chart - 7 visualization code
#violin plot
#note : for this analysis data without outliers is used which was cleaned earlier in the analysis
plt.figure(figsize=(12,6))
sns.violinplot(data=df_clean,x="room_type", y="price").set_title(
    'Price Distribution for Room Type',fontsize=20 ,color='purple')
plt.show()

##### 1. Why did you pick the specific chart?

> I used a violin plot because it shows the distribution of a quantitative variable across the levels of one categorical variable. Here it shows how price is distributed within each room type and lets the price distributions of the different room types be compared.

##### 2. What is/are the insight(s) found from the chart?

> The entire home/apartment room type has a higher price range than private and shared rooms.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> Different price ranges cater to customers with different budgets, which widens the customer base and supports higher profit.

> No negative impact was identified from this chart.
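
The shape a violin plot shows can be backed up with numbers. A minimal sketch, using made-up prices, that computes the quartiles and interquartile range of `price` for each `room_type`:

```python
import pandas as pd

# Illustrative prices only, five listings per room type
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Private room"] * 5 + ["Shared room"] * 5,
    "price": [100, 150, 200, 250, 300, 50, 60, 70, 80, 90, 20, 30, 40, 50, 60],
})

# Quartiles of price within each room type, one row per room type
summary = toy.groupby("room_type")["price"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q25", "median", "q75"]

# Interquartile range: the width of the middle half of the distribution
summary["iqr"] = summary["q75"] - summary["q25"]
print(summary)
```

A larger median and a wider interquartile range correspond to a violin that sits higher and is more stretched.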

#### Chart - 8

In [None]:
# Chart - 8 visualization code Possible Revenue Generation based on Availability
#grouping host id
df_revenue = df.groupby(['host_id','host_name'])[
    'potential_revenue_per_year'].sum().reset_index().sort_values(
        'potential_revenue_per_year', ascending = False)[:10]

# plt.figure(figsize=(15,6));
barPlot(df_revenue, 'host_name', 'potential_revenue_per_year');
plt.title('Possible Revenue Per Year',fontsize=20,color='red');

In [None]:
#max price per host and booking count, then keep hosts with 5 or fewer bookings
df_non_functional = df.groupby(['host_id', 'host_name','number_of_bookings'])[
    'price'].max().reset_index().sort_values('price', ascending = False)
df_non_functional = df_non_functional[df_non_functional['number_of_bookings']<=5]

In [None]:
#barplot
ax = barPlot(df_non_functional.head(10),'host_name', 'price');
# Set the x-label with a font size of 25
ax.set_xlabel("Host Name", fontsize=25)

# Set the y-label with a font size of 25
ax.set_ylabel("Price", fontsize=25)
plt.title('Top 10 Hosts with High Price and 5 or Fewer Bookings per Year', fontsize=20, color='purple');

##### 1. Why did you pick the specific chart?

> A bar plot makes it easy to compare values across categories, here across hosts.

##### 2. What is/are the insight(s) found from the chart?

> Sonder (NYC) could produce the highest revenue if its listings were booked for every available day at the minimum-night booking.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> This analysis forecasts the ideal case, to understand the total revenue each host could generate based on the possible number of bookings.

> There are hosts, such as Jelena and Catherine, who offer 5 or fewer bookings per year at a price well above the average Airbnb price. This can hurt the brand, and the revenue these listings generate may even be zero, so they show no profit for the business.
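
The column `potential_revenue_per_year` is built earlier in the notebook and its formula is not shown in this section. Purely as an illustration, the sketch below assumes an upper bound of `price * availability_365` (every available night booked) and stores it under the hypothetical name `revenue_upper_bound`; the toy values are made up.

```python
import pandas as pd

# Illustrative listings only
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["A", "A", "B", "C"],
    "price": [100, 200, 500, 80],
    "availability_365": [365, 100, 0, 200],
})

# ASSUMPTION: revenue ceiling = nightly price * nights available per year.
# This is a hypothetical definition, not necessarily the notebook's own formula.
toy["revenue_upper_bound"] = toy["price"] * toy["availability_365"]

# Sum the ceiling over all listings of each host and rank the hosts
revenue_by_host = (
    toy.groupby(["host_id", "host_name"])["revenue_upper_bound"]
    .sum()
    .sort_values(ascending=False)
    .reset_index()
)
print(revenue_by_host)
```

Under this assumption a high-priced listing with zero availability contributes nothing, which matches the concern raised about expensive listings that are rarely bookable.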

#### Chart - 9

In [None]:
# Chart - 9 visualization
#grouping
df_avgPrice_roomType = df.groupby(['neighbourhood_group','room_type'])[
    'price'].mean().reset_index().sort_values('price',ascending=False)
# grouped bar plot: one bar per room type within each neighbourhood group
plt.figure(figsize=(12,6));
ax = sns.barplot(data= df_avgPrice_roomType,x='neighbourhood_group',y='price',hue='room_type');
# Add labels to the bars
for bar in ax.patches:
    ax.annotate(format(bar.get_height(), '.0f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')

# Set the font size of the tick labels to 18
ax.tick_params(axis='both', which='major', labelsize=18);
# Set the x-label with a font size of 25
ax.set_xlabel("Neighbourhood Group", fontsize=25)

# Set the y-label with a font size of 25
ax.set_ylabel("Average Price", fontsize=25)
plt.title('Price Point for each Room Type in Neighbourhood Group', fontsize=20, color='purple');

##### 1. Why did you pick the specific chart?

> A grouped bar plot is a convenient way to show three variables (neighbourhood group, room type and average price) in one chart.

##### 2. What is/are the insight(s) found from the chart?

> Manhattan has the highest price point across room types and the Bronx has the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

> The spread of price points keeps listings affordable for customers across budgets, while the higher-priced areas drive profit.

> A lower average price may indicate that an area has many competing listings and may not be generating the required profit, so the marketing strategy there may need to change.
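
The numbers behind a grouped bar chart can also be read as a table. A small sketch with made-up prices that uses `pivot_table` to place neighbourhood groups in rows and room types in columns:

```python
import pandas as pd

# Illustrative prices only
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Bronx"] * 4,
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"] * 2,
    "price": [250, 150, 100, 120, 120, 80, 60, 40],
})

# Mean price for every (neighbourhood group, room type) combination
price_table = toy.pivot_table(
    index="neighbourhood_group", columns="room_type", values="price", aggfunc="mean"
)
print(price_table)

# Neighbourhood group with the highest and lowest mean price per room type
most_expensive = price_table.idxmax()
cheapest = price_table.idxmin()
print(most_expensive)
```

The table form makes it easy to check whether one area is the most expensive for every room type or only for some of them.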

#### Chart - 10

In [None]:
# Chart - 10 visualization code
#Number Of Active Hosts Per Location Using Line Chart
# create a new DataFrame with the number of listings (one row per listing id) in each neighborhood group; this count is used below as a proxy for the number of active hosts
hosts_per_location = data.groupby('neighbourhood_group')['id'].count().reset_index()

# rename the columns of the resulting DataFrame to 'Neighbourhood_Groups' and 'Host_counts'
hosts_per_location.columns = ['Neighbourhood_Groups', 'Host_counts']

# display the resulting DataFrame
hosts_per_location

In [None]:
# Group the data by neighbourhood_group and count the number of listings for each group
hosts_per_location = data.groupby('neighbourhood_group')['id'].count()

# Get the list of neighbourhood_group names
locations = hosts_per_location.index

# Get the list of host counts for each neighbourhood_group
host_counts = hosts_per_location.values

# Set the figure size
plt.figure(figsize=(12, 5))

# Create the line chart with some experiments using marker function
plt.plot(locations, host_counts, marker='o', ms=12, mew=4, mec='r')

# Add a title and labels to the x-axis and y-axis
plt.title('Number of Active Hosts per Location', fontsize=15)
plt.xlabel('Location', fontsize=14)
plt.ylabel('Number of Active Hosts', fontsize=14)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

The data is grouped by neighbourhood_group and the listings in each group are counted. A line chart with markers was chosen to compare these counts across groups and to experiment with the marker options.

##### 2. What is/are the insight(s) found from the chart?


*   Manhattan has the largest number of hosts with 19501, and Brooklyn has the second largest with 19415.

*   Queens follows with 5567 and the Bronx with 1070, while Staten Island has the fewest with 365.

*   Brooklyn and Manhattan each have more than three times the number of hosts in Queens and more than 18 times the number of hosts in the Bronx.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. By leveraging insights into the popularity of Manhattan and Brooklyn, businesses can maximize returns while planning strategic expansions into other areas.

2. Over-reliance on high-performing areas without addressing potential challenges like over-saturation or neglecting low-host areas can hinder balanced growth and resilience.
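
The ratios quoted in the insights follow directly from the reported counts. A short sketch that recomputes each borough's share and the Brooklyn-to-Queens and Brooklyn-to-Bronx ratios from the numbers listed above:

```python
# Counts per neighbourhood group as reported in the insights above
counts = {
    "Manhattan": 19501,
    "Brooklyn": 19415,
    "Queens": 5567,
    "Bronx": 1070,
    "Staten Island": 365,
}

total = sum(counts.values())
shares = {borough: n / total for borough, n in counts.items()}

# Brooklyn is the smaller of the two leaders, so its ratios are the conservative ones
ratio_to_queens = counts["Brooklyn"] / counts["Queens"]
ratio_to_bronx = counts["Brooklyn"] / counts["Bronx"]

for borough, share in shares.items():
    print(f"{borough:<14}{counts[borough]:>7}{share:>8.1%}")
print(f"Brooklyn vs Queens: {ratio_to_queens:.1f}x")
print(f"Brooklyn vs Bronx:  {ratio_to_bronx:.1f}x")
```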





#### Chart - 11

In [None]:
# Chart - 11 visualization using Scatter and Bar chart
# Convert the 'price' column to numeric before grouping
data['price'] = pd.to_numeric(data['price'], errors='coerce')

# create a new DataFrame that displays the average price of Airbnb rentals in each neighborhood
neighbourhood_avg_price = data.groupby("neighbourhood")['price'].mean().reset_index().rename(columns={"price": "avg_price"})[['neighbourhood', 'avg_price']]

# select the top 10 neighborhoods with the lowest average prices
neighbourhood_avg_price = neighbourhood_avg_price.sort_values("avg_price").head(10)

# join the resulting DataFrame with the 'neighbourhood_group' column from the Airbnb NYC dataset, dropping any duplicate entries
neighbourhood_avg_price_sorted_with_group = neighbourhood_avg_price.join(data[['neighbourhood', 'neighbourhood_group']].drop_duplicates().set_index('neighbourhood'),
                                                                         on='neighbourhood')

# Display the resulting data
# Use hide() instead of hide_index()
display(neighbourhood_avg_price_sorted_with_group.style.hide(axis="index")) # Changed hide_index() to hide(axis="index")

In [None]:
# Average price per neighbourhood, sorted from lowest to highest
neighbourhood_avg_price = data.groupby("neighbourhood")['price'].mean().reset_index().rename(columns={"price": "avg_price"})[['neighbourhood', 'avg_price']]
neighbourhood_avg_price = neighbourhood_avg_price.sort_values("avg_price")

# Merge the average price back onto every listing so each point can be coloured by it.
# A separate name is used so the working DataFrame `df` from the earlier charts is not overwritten.
df_price_map = data.merge(neighbourhood_avg_price, on="neighbourhood")

# Create the scatter plot of listing coordinates, coloured by neighbourhood average price
fig = df_price_map.plot.scatter(x="longitude", y="latitude", c="avg_price", title="Average Airbnb Price by Neighborhoods in New York City", figsize=(12,6), cmap="plasma")
fig

In [None]:
# Extract the values from the dataset
neighborhoods = neighbourhood_avg_price_sorted_with_group['neighbourhood']
prices = neighbourhood_avg_price_sorted_with_group['avg_price']

# Create the bar plot
plt.figure(figsize=(15,5))
plt.bar(neighborhoods, prices,width=0.5, color = 'orchid')
plt.xlabel('Neighborhood')
plt.ylabel('Average Price')
plt.title('Average Price by Neighborhood')

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

Bar Chart: Compares the average price across the ten neighborhoods with the lowest average prices, making it easy to rank them.

Scatter Plot: Places every listing at its longitude and latitude and colours it by its neighborhood's average price, which shows where the affordable areas are located.

##### 2. What is/are the insight(s) found from the chart?

*  All of the neighborhoods listed are located in the outer boroughs of New York City (Bronx, Queens, and Staten Island). This suggests that these neighborhoods may have a lower overall cost of living compared to neighborhoods in Manhattan and Brooklyn.

*  Most of these neighborhoods are located in the Bronx and Staten Island. These boroughs tend to have a lower overall cost of living compared to Manhattan and Brooklyn.

*  These neighborhoods may be attractive to renters or buyers looking for more affordable housing options in the New York City area.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights support strategic growth in underserved or underutilized neighborhoods, creating opportunities for market expansion. However, care must be taken to balance investments across high-demand and affordable areas to mitigate risks like over-saturation or limited demand in specific neighborhoods. Businesses can achieve a positive impact by diversifying their focus and tailoring their approach to each borough's unique strengths and challenges.
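
A ranking of the cheapest neighborhoods can be skewed by areas with only one or two listings. The sketch below shows one way to guard against that, using made-up data and a hypothetical `MIN_LISTINGS` threshold that is not part of the original analysis.

```python
import pandas as pd

# Illustrative data only: "Area C" has a single, very cheap listing
toy = pd.DataFrame({
    "neighbourhood_group": ["Bronx", "Bronx", "Bronx", "Queens", "Queens", "Staten Island"],
    "neighbourhood": ["Area A", "Area A", "Area A", "Area B", "Area B", "Area C"],
    "price": [40, 50, 60, 55, 65, 30],
})

MIN_LISTINGS = 2  # hypothetical cut-off for a trustworthy average

# Average price and listing count per neighbourhood, keeping the borough alongside
stats = (
    toy.groupby(["neighbourhood_group", "neighbourhood"])["price"]
    .agg(avg_price="mean", listings="count")
    .reset_index()
)

# Keep only neighbourhoods with enough listings, then take the cheapest ones
reliable = stats[stats["listings"] >= MIN_LISTINGS].sort_values("avg_price").head(10)
print(reliable)
```

Grouping by both `neighbourhood_group` and `neighbourhood` keeps the borough in the result, so no separate join is needed.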

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# create a new DataFrame that displays the number of listings of each room type in the Airbnb NYC dataset
top_room_type = data['room_type'].value_counts().reset_index()

# rename the columns of the resulting DataFrame to 'Room_Type' and 'Total_counts'
top_room_type.columns = ['Room_Type', 'Total_counts']

# display the resulting DataFrame
top_room_type


In [None]:
# Set the figure size
plt.figure(figsize=(10, 6))

# Get the room type counts
room_type_counts = data['room_type'].value_counts()

# Set the labels and sizes for the pie chart
labels = room_type_counts.index
sizes = room_type_counts.values

# Create the pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%')

# Add a legend to the chart
plt.legend(title='Room Type', bbox_to_anchor=(0.8, 0, 0.5, 1), fontsize=12)

# Show the plot
plt.show()

In [None]:

# Grouping the Airbnb data by neighbourhood group and summing the number of reviews for each group
reviews_by_neighbourhood_group = data.groupby("neighbourhood_group")["number_of_reviews"].sum()

# Creating a pie chart to visualize the distribution of total reviews among different neighbourhood groups
plt.pie(reviews_by_neighbourhood_group, labels=reviews_by_neighbourhood_group.index, autopct='%1.1f%%')

# Adding a title to the chart
plt.title("Total Reviews by Neighborhood Group in New York City", fontsize=15)

# Displaying the chart
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart was chosen for the following reasons.

Focus on Shares: Pie charts are excellent for illustrating percentages or shares of a whole, which aligns perfectly with both review share and room type distribution data.

Intuitive Visuals: Pie charts are universally understood and allow viewers to easily compare parts to the whole.

Highlighting Dominance: The chart effectively emphasizes dominant categories, like Brooklyn's review share or the popularity of entire homes.

##### 2. What is/are the insight(s) found from the chart?

Brooklyn (42.8%) and Manhattan (39.9%) dominate the share of Airbnb reviews, indicating these boroughs are the most popular among travelers. Queens (13.8%), the Bronx (2.5%), and Staten Island (1.0%) have significantly fewer reviews, reflecting lower Airbnb activity or traveler engagement in these areas.

Despite having slightly fewer listings, Brooklyn has a higher review share than Manhattan. This suggests that Brooklyn's listings may be more engaging, attract a different traveler demographic, or have a higher tendency for users to leave reviews.

Entire homes/apartments are the most common listing type, accounting for just under half of all listings (22,784).

Private rooms (21,996) are nearly as common, while shared rooms (1,138) are far rarer, suggesting a strong preference for privacy among Airbnb travelers.

Airbnb offers a variety of room types, catering to different traveler needs, from budget-conscious shared rooms to more private entire homes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights suggest strong potential for business impact through focused efforts in Brooklyn and Manhattan while also encouraging diversification into less popular areas. However, balancing growth across all boroughs, monitoring market trends, and maintaining quality standards are essential to avoiding risks like over-saturation and uneven market development.
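
The comparison between review share and listing share can be made explicit with a little arithmetic. The sketch below reuses the rounded percentages from this chart and the counts reported under Chart 10, assuming both were computed from the same `data` frame; a value above 1 means a borough collects more reviews than its share of listings would suggest.

```python
# Counts per borough as reported under Chart 10
listings = {"Manhattan": 19501, "Brooklyn": 19415, "Queens": 5567,
            "Bronx": 1070, "Staten Island": 365}

# Review shares in percent as read from the pie chart (rounded values)
review_share = {"Brooklyn": 42.8, "Manhattan": 39.9, "Queens": 13.8,
                "Bronx": 2.5, "Staten Island": 1.0}

total_listings = sum(listings.values())

# Ratio of review share to listing share for each borough
engagement_index = {
    b: (review_share[b] / 100) / (listings[b] / total_listings) for b in listings
}
for borough, value in engagement_index.items():
    print(f"{borough:<14}{value:.2f}")

# Room type counts as reported in the insights above
room_counts = {"Entire home/apt": 22784, "Private room": 21996, "Shared room": 1138}
room_share = {r: n / sum(room_counts.values()) for r, n in room_counts.items()}
print(f"Entire home/apt share: {room_share['Entire home/apt']:.1%}")
```

Because the percentages are rounded, small differences between boroughs in this index should be read with caution.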

#### Chart - 13

In [None]:
# Chart - 13 visualization
# create a new DataFrame that displays the average price of Airbnb rentals in each neighborhood
neighbourhood_avg_price = data.groupby("neighbourhood")['price'].mean().reset_index().rename(columns={"price": "avg_price"})[['neighbourhood', 'avg_price']]

# select the top 10 neighborhoods with the lowest average prices
neighbourhood_avg_price = neighbourhood_avg_price.sort_values("avg_price").head(10)

# join the resulting DataFrame with the 'neighbourhood_group' column from the Airbnb NYC dataset, dropping any duplicate entries
neighbourhood_avg_price_sorted_with_group = neighbourhood_avg_price.join(data[['neighbourhood', 'neighbourhood_group']].drop_duplicates().set_index('neighbourhood'),
                                                                         on='neighbourhood')

# Display the resulting data
# Instead of .hide_index(), use .hide(axis='index')
display(neighbourhood_avg_price_sorted_with_group.style.hide(axis='index'))

In [None]:
neighbourhood_avg_price = (data.groupby("neighbourhood")['price'].mean().reset_index().rename(columns={"price": "avg_price"}))[['neighbourhood', 'avg_price']]
neighbourhood_avg_price = (neighbourhood_avg_price.sort_values("avg_price"))

# Merge the average price data with the original DataFrame so every listing
# (with its latitude and longitude) carries its neighbourhood's average price.
# A separate name keeps the working DataFrame `df` from the earlier charts intact.
df_price_map = data.merge(neighbourhood_avg_price, on="neighbourhood")

# Use plotly for the scattermapbox plot instead of pandas plotting
import plotly.express as px

fig = px.scatter_mapbox(df_price_map, lat="latitude", lon="longitude", color="avg_price",
                        hover_name="neighbourhood",
                        title="Average Airbnb Price by Neighborhoods in New York City",
                        zoom=9, height=600,
                        color_continuous_scale="plasma")
fig.update_layout(mapbox_style="open-street-map")
fig.show()

In [None]:
# Extract the values from the dataset
neighborhoods = neighbourhood_avg_price_sorted_with_group['neighbourhood']
prices = neighbourhood_avg_price_sorted_with_group['avg_price']

# Create the bar plot
plt.figure(figsize=(15,5))
plt.bar(neighborhoods, prices,width=0.5, color = 'orchid')
plt.xlabel('Neighborhood')
plt.ylabel('Average Price')
plt.title('Average Price by Neighborhood')

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

A scattermapbox chart places every listing on an interactive map and colours it by its neighborhood's average price, so the affordable areas of NYC can be located at a glance. The map can be explored by zooming and hovering, which makes the data both informative and accessible to diverse audiences.

##### 2. What is/are the insight(s) found from the chart?

*  All of the neighborhoods listed are located in the outer boroughs of New York City (Bronx, Queens, and Staten Island). This suggests that these neighborhoods may have a lower overall cost of living compared to neighborhoods in Manhattan and Brooklyn.

*  Most of these neighborhoods are located in the Bronx and Staten Island. These boroughs tend to have a lower overall cost of living compared to Manhattan and Brooklyn.

*  These neighborhoods may be attractive to renters or buyers looking for more affordable housing options in the New York City area.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights are overwhelmingly positive for businesses focused on affordability, middle-income consumers, or local expansion. However, careful planning is needed to mitigate potential risks, such as limited purchasing power or unintended gentrification. Tailoring strategies to the unique characteristics of these neighborhoods will be crucial for sustained growth and community alignment.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
sns.set_context('notebook')
plt.figure(figsize = (14,10))
plt.xticks(fontsize= 14)
plt.yticks(fontsize= 10)
# Include numeric_only=True within the .corr() method.
sns.heatmap(df.corr(numeric_only=True), annot=True,linewidth=.5,cmap="PiYG");

##### 1. Why did you pick the specific chart?

> I chose this chart because a correlation heatmap is the easiest way to identify which variables are correlated and how strong each correlation is. It can help identify multicollinearity, which is useful when deciding which variables to include in a model or analysis.

##### 2. What is/are the insight(s) found from the chart?

Feature pairs such as id and host_id, price and potential_revenue_per_year, number_of_reviews and reviews_per_month, and availability_365 and number_of_bookings show high correlation, which indicates multicollinearity. These variables should not be used together to train a model, so each pair should either be combined through a formula or relation, or one of the two should be dropped from further analysis.
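
Highly correlated pairs can also be listed programmatically instead of being read off the heatmap. A minimal sketch on made-up columns, with a hypothetical cut-off of 0.8 on the absolute correlation:

```python
import numpy as np
import pandas as pd

# Illustrative data: "revenue" is exactly 10 * price, so the two are perfectly correlated
toy = pd.DataFrame({
    "price": [100, 150, 200, 250, 300],
    "revenue": [1000, 1500, 2000, 2500, 3000],
    "minimum_nights": [3, 1, 4, 1, 5],
})

THRESHOLD = 0.8  # hypothetical cut-off for "high" correlation

corr = toy.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is skipped
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Pairs whose absolute correlation exceeds the threshold
high = pairs[pairs.abs() > THRESHOLD]
print(high)
```

Each row of `high` names a pair of columns that is a candidate for combining or dropping before modelling.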

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df);

In [None]:
sns.pairplot(df, hue='room_type');

In [None]:
sns.pairplot(df, hue='neighbourhood_group');

In [None]:
sns.pairplot(df, hue='neighbourhood');

In [None]:
sns.pairplot(df, hue='minimum_nights');

In [None]:
sns.pairplot(df, hue='availability_365');

In [None]:
sns.pairplot(df, hue='number_of_reviews');

##### 1. Why did you pick the specific chart?

A pairplot can be a valuable tool for data analysis when trying to understand and analyze the relationships between different variables and identify patterns and trends in the data.

##### 2. What is/are the insight(s) found from the chart?

It shows that there is very little linear relationship between the features.
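
A weak linear relationship does not rule out a monotonic one. As a quick check, a rank-based (Spearman) correlation can be compared with the ordinary Pearson correlation. The sketch below uses made-up data and computes Spearman as Pearson on ranks.

```python
import pandas as pd

# Illustrative data: y grows with x, but not along a straight line
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
toy["y"] = toy["x"] ** 3

# Pearson measures linear association
pearson = toy["x"].corr(toy["y"])

# Spearman is the Pearson correlation of the ranks, so it measures monotonic association
spearman = toy["x"].rank().corr(toy["y"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

When Spearman is clearly higher than Pearson for a pair of features, the pair is related in a non-linear but consistent way that a pair plot may only hint at.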

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the analysis, it is clear that

1.   Some hosts take 0 bookings for the entire year and are not even available for booking, yet list a high price. These listings should either be removed or undergo physical verification to learn the exact reason.
2.   To increase profit for hosts with few listings, Airbnb should give them special promotional offers and run personalised marketing to promote business in their specific region.
3.   Price points in a few areas are relatively high compared to other areas given the number of listings there, and could be adjusted.
4.   Hosts who deliver the highest customer engagement should be rewarded to keep them motivated, which will help with host retention and profit generation.
5.   Since entire home/apt listings get the maximum reviews per month, the number of such listings could be increased in line with customer demand.
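
The first recommendation can be turned into a simple filter. This is a sketch under stated assumptions: "high price" is taken to mean above the 75th percentile, which is a hypothetical choice, and the toy values are made up.

```python
import pandas as pd

# Illustrative listings only
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "price": [80, 120, 900, 150, 400],
    "availability_365": [200, 0, 0, 365, 30],
})

# ASSUMPTION: a listing counts as high-priced when it is above the 75th percentile
price_cutoff = toy["price"].quantile(0.75)

# Listings that are never available but carry a high price
flagged = toy[(toy["availability_365"] == 0) & (toy["price"] > price_cutoff)]
print(flagged)
```

The flagged rows are the candidates for removal or verification. The cut-off can be tightened or relaxed depending on how many listings the review team can handle.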

# **Conclusion**

From the above analysis, we could conclude that:
*   Airbnb prices mostly fall below 1000, with the **average price** in the **150-200** range.
*   Among all areas, **Manhattan** and Brooklyn have the **maximum** number of Airbnb listings and **Staten Island** has the **least**.
*   **Sonder (NYC)** is the most **popular host**, with the highest number of Airbnb listings.
*   The **price** of Airbnb listings in **Manhattan** is higher than in the other areas.
*   The listing **Room near JFK Queen Bed** has the maximum number of reviews. Measured by total reviews per month, however, the listing **Enjoy great views of the City in our Deluxe Room!**, the host **Sonder (NYC)** and the room type **Entire home/apt** show the **maximum** customer **engagement**.
*   **Sonder (NYC)** could produce the **highest revenue** if its listings were booked for every available day at the minimum-night booking.
*   **Manhattan** has **highest price point** in room type and **Bronx** has **lowest**.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***