# Airbnb - Booking Analysis Project 


This project focuses on performing an exploratory data analysis (EDA) of Airbnb listings within a selected city. The goal is to examine various factors such as price, availability, location, and property type to uncover the underlying trends and patterns that influence the demand for Airbnb listings. Through this analysis, we aim to provide a comprehensive overview of the Airbnb market in the city, helping stakeholders to make informed decisions.


## Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium

## Loading Data

In [None]:
df = pd.read_csv('listings4.csv')
df.head()

In [None]:
df.shape

In [None]:
df.info()

## Data Cleaning

### 1- Locate Missing Data

In [None]:
df.isnull().sum()

#### 1.1 Drop unneeded columns

In [None]:
df.drop(['id','last_review','license'], inplace =True, axis =1)
df

In [None]:
df.isnull().sum()

In [None]:
# Handle missing values in the price column
# Inspect the listings that have no price
missing_price_rows = df[df['price'].isna()]
missing_price_rows

In [None]:
# Count the listings with a missing price in each neighbourhood
missing_price_per_neighborhood = missing_price_rows.groupby('neighbourhood').size().reset_index(name='Missing Price Count')
missing_price_per_neighborhood

In [None]:
# Fill missing prices with the mean price of the same neighbourhood
neighborhood_price_means = df.groupby('neighbourhood')['price'].mean()

for neighborhood, price_mean in neighborhood_price_means.items():
    if pd.isna(price_mean):
        continue
    neighborhood_rows = df['neighbourhood'] == neighborhood
    df.loc[neighborhood_rows & df['price'].isna(), 'price'] = price_mean

df.loc[73:,:]
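
The per-neighbourhood fill above can also be written without an explicit loop. The following is a minimal sketch on made-up toy data, not the notebook's dataset; the `toy` frame and the `price_filled` column are illustrative names only. It uses `groupby(...).transform("mean")` to align each row with its neighbourhood's mean price, then fills only the missing values.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the listings table (values are made up).
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "C"],
    "price": [100.0, np.nan, 200.0, 50.0, np.nan, np.nan],
})

# Mean price of each row's own neighbourhood, aligned to the original index.
group_mean = toy.groupby("neighbourhood")["price"].transform("mean")

# Fill only the missing prices. A neighbourhood with no known price at all
# (here "C") has a NaN mean, so its rows stay missing.
toy["price_filled"] = toy["price"].fillna(group_mean)
print(toy)
```

As in the loop version, neighbourhoods without any known price remain unfilled and still need a later `dropna` or another strategy.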

In [None]:
df.isnull().sum()

#### 1.2 Drop remaining rows with missing data

In [None]:
df.dropna(inplace=True)
df.isnull().sum()

### 2- Check for Duplicates

In [None]:
df.duplicated().sum()

In [None]:
# drop_duplicates() returns a new DataFrame, so assign the result back
df = df.drop_duplicates()

In [None]:
df.shape

### 3- Detect Outliers

In [None]:
describe_data = df[['price', 'minimum_nights', 'number_of_reviews','reviews_per_month','calculated_host_listings_count','number_of_reviews_ltm']].describe()
describe_data 

In [None]:
plt.figure(figsize=(15, 15))

plt.subplot(3, 2, 1)
sns.boxplot(x=df['price'])
plt.title('Price Boxplot')

plt.subplot(3, 2, 2)
sns.boxplot(x=df['minimum_nights'])
plt.title('Minimum Nights Boxplot')

plt.subplot(3, 2, 3)
sns.boxplot(x=df['number_of_reviews'])
plt.title('Number of Reviews Boxplot')

plt.subplot(3, 2, 4)
sns.boxplot(x=df['reviews_per_month'])
plt.title('Reviews Per Month Boxplot')

plt.subplot(3, 2, 5)
sns.boxplot(x=df['calculated_host_listings_count'])
plt.title('Calculated Host Listings Count Boxplot')

plt.subplot(3, 2, 6)
sns.boxplot(x=df['number_of_reviews_ltm'])
plt.title('Number of Reviews LTM Boxplot')

plt.tight_layout()
plt.show()

In [None]:
Q1 = describe_data.loc['25%']
Q3 = describe_data.loc['75%']
IQR = Q3 - Q1

lower_bounds = Q1 - 1.5 * IQR
upper_bounds = Q3 + 1.5 * IQR

for column in ['price', 'minimum_nights', 'number_of_reviews','reviews_per_month','calculated_host_listings_count','number_of_reviews_ltm']:
    df = df[(df[column] >= lower_bounds[column]) & (df[column] <= upper_bounds[column])]
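
The same IQR rule can be packaged as a reusable mask. This is a sketch under stated assumptions: `iqr_mask` is a hypothetical helper, the toy values are made up, and it assumes the columns contain no missing values (as is the case here after `dropna`). Because the fences are computed once before any filtering, it should select the same rows as the loop above.

```python
import pandas as pd

def iqr_mask(frame, columns, k=1.5):
    """Return a boolean Series: True for rows inside the IQR fences of every column."""
    q1 = frame[columns].quantile(0.25)
    q3 = frame[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparing a DataFrame with a Series aligns the Series on the column names.
    inside = (frame[columns] >= lower) & (frame[columns] <= upper)
    return inside.all(axis=1)

# Toy data (made up): the last row has an extreme price.
toy = pd.DataFrame({
    "price": [90, 100, 110, 120, 5000],
    "minimum_nights": [1, 2, 3, 4, 2],
})
mask = iqr_mask(toy, ["price", "minimum_nights"])
filtered = toy[mask]
print(filtered)
```

One caveat: when a column's first and third quartiles coincide, the IQR is zero and every value different from that quartile is flagged as an outlier, so the rule can be very aggressive on heavily repeated values.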


In [None]:
df

In [None]:
plt.figure(figsize=(15, 15))

plt.subplot(3, 2, 1)
sns.boxplot(x=df['price'])
plt.title('Price Boxplot')

plt.subplot(3, 2, 2)
sns.boxplot(x=df['minimum_nights'])
plt.title('Minimum Nights Boxplot')

plt.subplot(3, 2, 3)
sns.boxplot(x=df['number_of_reviews'])
plt.title('Number of Reviews Boxplot')

plt.subplot(3, 2, 4)
sns.boxplot(x=df['reviews_per_month'])
plt.title('Reviews Per Month Boxplot')

plt.subplot(3, 2, 5)
sns.boxplot(x=df['calculated_host_listings_count'])
plt.title('Calculated Host Listings Count Boxplot')

plt.subplot(3, 2, 6)
sns.boxplot(x=df['number_of_reviews_ltm'])
plt.title('Number of Reviews LTM Boxplot')

plt.tight_layout()
plt.show()

# Data analysis & visualizations 

### Top 10 Hosts:
By identifying hosts with the most properties, stakeholders can understand which hosts have a significant presence in the market and potentially explore collaboration opportunities.

In [None]:
top_10_hosts = df.groupby('host_id').size().reset_index(name='Number of Properties').sort_values(by='Number of Properties', ascending=False)
top_10_hosts.head(10)

### Distribution of Room Types:
Understanding the distribution of room types allows stakeholders to gauge the diversity of accommodations available on the platform, catering to different preferences and budgets of travelers.

In [None]:
room_type_counts = df.groupby('room_type').size().reset_index(name='Count')
room_type_counts

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='room_type', discrete=True)
plt.title('Distribution of Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.xticks(rotation=45)  
plt.show()

### Relationship between Room Type and Availability:
This visualization helps stakeholders understand how the availability of different room types varies throughout the year, providing insights into seasonal demand patterns.



In [None]:
# Room type vs. availability over the year
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='room_type', y='availability_365')
plt.title('Relationship between Room Type and Availability Over the Year')
plt.xlabel('Room Type')
plt.ylabel('Availability (365 days)')
plt.show()

##### Entire home/apt:
It has the highest median availability, indicating that entire homes/apartments are generally more available throughout the year. Additionally, the data ranges widely between the minimum and the third quartile, suggesting variable but generally long periods of availability.
##### Private room:
There is variability in the availability of this room type as well, but it falls somewhere between entire homes/apartments and shared rooms in terms of median availability. The maximum and minimum values indicate a diverse range of availability periods.
##### Hotel room:
Hotel rooms exhibit the highest median availability among all room types, suggesting that they are generally continuously available throughout the year. Some outliers indicate hotels with very long availability periods.
##### Shared room:
This room type has the lowest median availability among all types, but it shows significant variability in availability periods. Although generally less available, some outliers suggest availability for long periods at times.
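
Medians and quartiles like those described above can be read off numerically instead of estimated from the boxplot. The following is a minimal sketch on made-up toy data; the `toy` frame and `availability_summary` name are illustrative, and only the `room_type` and `availability_365` column names come from the notebook.

```python
import pandas as pd

# Toy data (made up) with the same two columns the boxplot uses.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt",
                  "Private room", "Private room",
                  "Shared room", "Shared room"],
    "availability_365": [100, 200, 50, 350, 300, 320],
})

# Per-room-type count and quartiles, ordered by median (the "50%" column).
availability_summary = (
    toy.groupby("room_type")["availability_365"]
       .describe()[["count", "25%", "50%", "75%"]]
       .sort_values("50%", ascending=False)
)
print(availability_summary)
```

Running the same expression on `df` gives the exact medians to quote in the written interpretation.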


### Average Price by Room Type:
By comparing the average prices of different room types, stakeholders can identify which types of accommodations command higher prices and adjust their pricing strategies accordingly.



In [None]:
# Average price across all room types (overall mean)
average_price = df['price'].mean()
average_price

In [None]:
# Group by room type and compute the mean price and total number of reviews
room_type_stats = df.groupby('room_type').agg({'price': 'mean', 'number_of_reviews': 'sum'}).reset_index()
room_type_stats

In [None]:
room_type_price = df.groupby('room_type')['price'].mean().reset_index()

plt.figure(figsize=(10, 6))
sns.barplot(data=room_type_price, y='room_type', x='price', palette='Set2')
plt.title('Average Price by Room Type')
plt.xlabel('Average Price')
plt.ylabel('Room Type')
plt.show()

### Room Count by Neighbourhood and Room Type:
This analysis provides a detailed breakdown of the number of properties by room type in each neighborhood, enabling stakeholders to assess the variety of accommodations available in different areas.



In [None]:
# Group by neighbourhood and room type and count the listings in each combination
room_count_by_neighbourhood_and_room_type = df.groupby(['neighbourhood', 'room_type']).size().unstack(fill_value=0).reset_index()
room_count_by_neighbourhood_and_room_type

### Average Minimum Nights:
Knowing the average minimum nights required for booking provides insights into the booking preferences of guests and helps hosts set minimum stay policies accordingly.


In [None]:
# Compute the average minimum number of nights required for a booking
average_minimum_nights = df['minimum_nights'].mean()
average_minimum_nights

In [None]:
night_count = df.groupby('minimum_nights').size().reset_index(name='Number of Properties')
night_count

In [None]:
plt.figure(figsize=(10, 6))
plt.bar(night_count['minimum_nights'], night_count['Number of Properties'], color='skyblue')
plt.title('Number of Properties by Minimum Nights')
plt.xlabel('Minimum Nights')
plt.ylabel('Number of Properties')
plt.show()

### Number of Properties and Average Reviews by Neighbourhood Group:
This scatter plot shows the relationship between the number of properties and the average reviews per month for each neighborhood group. Reviews per month measures how often listings are reviewed, not how well they are rated, so it is a rough proxy for booking activity that helps stakeholders identify neighborhoods with high demand.



In [None]:
# How many properties are in each neighbourhood group, and what is the average number of reviews per month for each group?

# Group by neighbourhood group and count the properties in each group
properties_per_neighborhood_group = df.groupby('neighbourhood_group').size().reset_index(name='Number of Properties')

# Group by neighbourhood group and compute the average reviews per month
reviews_per_neighborhood_group = df.groupby('neighbourhood_group')['reviews_per_month'].mean().reset_index(name='Average Reviews per Month')

# Merge both results into a single table
merged_data = pd.merge(properties_per_neighborhood_group, reviews_per_neighborhood_group, on='neighbourhood_group')
merged_data
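
The count and the mean can also be computed in one pass with pandas named aggregation, which avoids the separate merge. This is a sketch on made-up toy data; `toy`, `summary`, and `top_group` are illustrative names, while the column labels mirror the ones used above.

```python
import pandas as pd

# Toy data (made up) with the two columns the summary needs.
toy = pd.DataFrame({
    "neighbourhood_group": ["North", "North", "South", "South", "South"],
    "reviews_per_month": [1.0, 3.0, 0.5, 1.5, 1.0],
})

# Named aggregation: each output column is (input column, aggregation).
# The dict is unpacked with ** because the output names contain spaces.
summary = (
    toy.groupby("neighbourhood_group")
       .agg(**{
           "Number of Properties": ("reviews_per_month", "size"),
           "Average Reviews per Month": ("reviews_per_month", "mean"),
       })
       .reset_index()
)

# Group with the highest average reviews per month, via idxmax.
top_group = summary.loc[summary["Average Reviews per Month"].idxmax(), "neighbourhood_group"]
print(summary)
print(top_group)
```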


In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(data=merged_data, x='Number of Properties', y='Average Reviews per Month', hue='neighbourhood_group', palette='Set2', s=100)
plt.title('Number of Properties vs. Average Reviews per Month by Neighbourhood Group')
plt.xlabel('Number of Properties')
plt.ylabel('Average Reviews per Month')
plt.legend(title='Neighbourhood Group', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Comparing the number of properties in each neighborhood group with its average reviews per month shows where listings are both plentiful and frequently reviewed. Because reviews per month reflects review volume rather than review scores, it is best read as a proxy for booking activity. Stakeholders can use it to focus resources and investment on neighborhoods with high demand. Areas with low review averages may deserve a closer look at possible causes, such as property maintenance or guest experience, before concluding that demand is weak.







### Most Reviewed Neighbourhood Group:
By identifying the neighborhood group with the highest average reviews per month, stakeholders can pinpoint areas where listings are reviewed most often, a sign of strong booking activity, and potentially target their marketing efforts towards those neighborhoods.



In [None]:
# Neighbourhood group with the highest average reviews per month

# Find the highest average reviews per month
max_rating = reviews_per_neighborhood_group['Average Reviews per Month'].max()
# Find the name of the neighbourhood group with that average
highest_rated_neighborhood_group = reviews_per_neighborhood_group[reviews_per_neighborhood_group['Average Reviews per Month'] == max_rating]['neighbourhood_group']

result = f"Neighbourhood group with the most reviews per month: {highest_rated_neighborhood_group.values[0]} - Average reviews per month: {max_rating}"
print(result)

### Average Price by Neighbourhood Group:
Understanding the average prices across different neighborhood groups allows stakeholders to assess the affordability of accommodations in each area and tailor their offerings accordingly.



In [None]:
price_neighbourhood = df.groupby('neighbourhood_group')['price'].mean().reset_index()

plt.figure(figsize=(12, 12))
sns.barplot(data=price_neighbourhood, x='price', y='neighbourhood_group')
plt.title('Average Price by Neighbourhood Group')
plt.xlabel('Average Price')
plt.ylabel('Neighbourhood Group')
plt.xticks(rotation=45)  
plt.show()

### Average Price by Neighbourhood Group and Room Type:
This analysis provides insights into how prices vary across different neighborhood groups and room types, enabling stakeholders to make data-driven pricing decisions.



In [None]:
# Group by neighbourhood group and room type and compute the average price for each combination

average_price_by_neighbourhood_group_and_room_type = df.groupby(['neighbourhood_group', 'room_type'])['price'].mean().unstack().reset_index()
average_price_by_neighbourhood_group_and_room_type

### Distribution of Listing Prices:
This histogram illustrates the distribution of listing prices, helping stakeholders understand the range of prices offered on the platform and see where most prices are concentrated. It has no time axis, so it does not show how prices change over the year.



In [None]:
# Distribution of listing prices
plt.figure(figsize=(12, 6))
sns.histplot(data=df, x='price', bins=30, kde=True)
plt.title('Distribution of Listing Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

# Note:
This code displays a map showing the distribution of properties across the city. I did not run it because it increases the notebook size to 35 GB.

## To try the code, first install this library:
pip install folium


### Interactive Map of Airbnb Listings:
The interactive map provides a visual representation of Airbnb listings in the city, allowing stakeholders to explore the geographical distribution of properties and assess their proximity to amenities and attractions.



In [None]:
map_airbnb = folium.Map(location=[df['latitude'].mean(), df['longitude'].mean()], zoom_start=10)

# Add one marker per listing, colored by price
for index, row in df.iterrows():
    folium.CircleMarker(location=[row['latitude'], row['longitude']],
                        radius=5,
                        popup=f"Price: {row['price']}$",
                        fill=True,
                        fill_color='blue' if row['price'] < 100 else 'red',  # blue below 100, red otherwise
                        color='grey',
                        fill_opacity=0.7).add_to(map_airbnb)

# Display the map
map_airbnb
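
Since drawing one marker per listing makes the notebook very large, one possible workaround is to plot a random sample, cluster the markers, and save the map to a separate HTML file instead of embedding it. The following is only a sketch: `sample_for_map` and `build_clustered_map` are hypothetical helpers, the output file name and toy coordinates are made up, and the clustered version does not reproduce the price-based coloring used above.

```python
import pandas as pd

def sample_for_map(frame, max_points=2000, seed=0):
    """Return at most max_points rows with valid coordinates, chosen at random."""
    coords = frame.dropna(subset=["latitude", "longitude"])
    if len(coords) <= max_points:
        return coords
    return coords.sample(n=max_points, random_state=seed)

def build_clustered_map(frame, out_path="airbnb_map.html"):
    """Draw clustered markers and write the map to an HTML file."""
    # Imported here so the sampling helper can be used without folium installed.
    import folium
    from folium.plugins import FastMarkerCluster

    center = [frame["latitude"].mean(), frame["longitude"].mean()]
    m = folium.Map(location=center, zoom_start=10)
    # FastMarkerCluster takes a list of [lat, lon] pairs and clusters them in the browser.
    FastMarkerCluster(data=frame[["latitude", "longitude"]].values.tolist()).add_to(m)
    m.save(out_path)
    return m

# Toy data (made up); one row has a missing latitude and is dropped.
toy = pd.DataFrame({
    "latitude": [-36.85, -41.29, None, -45.03],
    "longitude": [174.76, 174.78, 170.50, 168.66],
    "price": [120, 95, 80, 300],
})
subset = sample_for_map(toy, max_points=2)
print(subset)
```

With the real data, `build_clustered_map(sample_for_map(df))` would write the map to disk, and the file can then be opened in a browser without growing the notebook.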

# Recommendation for stakeholders:



Learn from the most established hosts. Study the top 10 hosts by number of properties to understand their operating patterns and to find ways to improve your own offerings and pricing. Use that knowledge to plan your entry into the market.

Based on the data analysis, I recommend the following to stakeholders:

#### Diversify Accommodation Options:
Given that "Entire home/apartment" and "Private room" are the most common room types, it's essential to maintain a diverse range of accommodation options to attract a wider audience of travelers with different preferences.

#### Improve Hotel Room Offerings:
While "Hotel room" listings are relatively fewer in number, there's potential to enhance their appeal by offering unique amenities or special experiences to attract guests seeking a hotel-like experience.

#### Enhance Visibility of Shared Rooms:
Although "Shared room" listings are less common, stakeholders can focus on promoting these listings to budget-conscious travelers or those seeking a communal living experience, thereby increasing their visibility and bookings.


Based on the box plot visualization illustrating the relationship between room type and availability throughout the year, I recommend the following:

#### Diversify Room Offerings:
Stakeholders should consider diversifying their room offerings to accommodate varying levels of availability. The medians read from the box plot are about 170 days for Entire home/apt, 250 days for Private room, 180 days for Hotel room, and 310 days for Shared room, so availability differs considerably between room types. By offering a mix of room types, hosts can attract a wider range of guests and optimize occupancy rates throughout the year.

#### Optimize Pricing Strategies:
Understanding the distribution of availability across different room types can help stakeholders optimize pricing strategies. For room types with consistently high availability (e.g., Entire home/apt), stakeholders may consider implementing dynamic pricing models to adjust rates based on demand and maximize revenue during peak seasons. Conversely, for room types with limited availability (e.g., Hotel room), offering promotional discounts during off-peak periods can help stimulate demand and increase bookings.

#### Enhance Marketing Efforts:
Stakeholders can leverage insights from room availability data to tailor marketing campaigns and promotions effectively. By highlighting the availability of specific room types during peak travel seasons or special events, hosts can attract potential guests seeking accommodation options that align with their preferences and travel plans.



The average price by room type shows that Entire home/apt has the highest average price (257.03), followed by Hotel room (248.98), Private room (130.55), and Shared room (72.99). Stakeholders can therefore price by room type rather than applying one rate policy to all listings, and should review market prices regularly to keep those rates competitive.

By aggregating data based on neighborhood and room type and calculating the number of rooms for each case, stakeholders gain valuable insights into the distribution of accommodation options across different neighborhoods and room types. This information can be leveraged to identify popular room types in specific neighborhoods, allowing stakeholders to tailor their marketing strategies and property investments accordingly. Additionally, understanding the demand for different room types in various neighborhoods can inform decisions related to pricing, property management, and expansion strategies. Therefore, stakeholders are advised to regularly analyze this data to optimize their offerings and maximize profitability in the Airbnb market.



Analyzing the average minimum nights required for booking provides stakeholders with valuable insights into booking preferences and expectations of Airbnb guests. This information can guide hosts and property managers in setting appropriate minimum night requirements for their listings to attract more bookings while meeting guest expectations. Moreover, understanding the average minimum nights can help stakeholders adjust their pricing strategies and promotional efforts to optimize occupancy rates and enhance the overall guest experience. Therefore, stakeholders are encouraged to utilize this data to tailor their listing policies and improve their competitiveness in the Airbnb market.

Understanding the distribution of minimum nights required for booking is crucial for stakeholders to adapt their listing policies and meet the diverse needs of Airbnb guests. With a significant portion of properties requiring either 1 or 2 minimum nights, stakeholders should consider offering flexible booking options to attract short-term guests looking for quick getaways. Additionally, while properties with longer minimum night requirements may appeal to guests seeking extended stays, stakeholders should carefully balance these requirements to avoid potential booking limitations. By analyzing and adjusting minimum night requirements based on guest preferences and market demand, stakeholders can optimize their property occupancy and enhance the overall guest experience.



Identifying the neighborhood group with the highest average reviews per month highlights areas where listings are reviewed most often. Reviews per month counts reviews rather than scoring them, so it signals demand, not guest satisfaction. Given the output indicating that the South Waikato District has the highest average reviews per month, stakeholders should consider investing in this area to capitalize on its booking activity. They can consider expanding their property listings, improving amenities and services, and implementing targeted marketing strategies to attract more guests to this neighborhood group.

In light of the distinct pricing dynamics observed across different neighborhood groups, stakeholders are advised to adopt tailored pricing strategies to optimize their market competitiveness and revenue potential.

For properties situated in Queenstown-Lakes District, characterized by the highest average price, stakeholders may benefit from emphasizing premium amenities and services to align with the upscale nature of the market. This approach enables them to justify higher rates and cater to the expectations of discerning guests seeking luxury accommodations.

Conversely, in Kawerau District, where the average price is notably lower, stakeholders are encouraged to focus on offering cost-effective options and targeted promotions to appeal to budget-conscious travelers. By leveraging competitive pricing strategies in this market segment, stakeholders can attract a broader range of guests and maximize occupancy rates.

Ultimately, by aligning pricing strategies with the prevailing average prices in each neighborhood group, stakeholders can enhance their market position, drive demand, and optimize revenue generation in their respective markets.
