# **Project Name**    - AirBnb Hotel Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**            - Amardeep Pawar


# **Airbnb Tableau Dashboard Link:**

https://public.tableau.com/shared/H8S7FB8HQ?:display_count=n&:origin=viz_share_link

# **GitHub Link -**

https://github.com/amrrr001/Tableau-Airbnd-Transforming-EDAs-to-Dashboards.git

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
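
As a rough illustration of the three UBM levels, the sketch below computes the kind of summary each chart type would draw from. The toy DataFrame and its column names (`room_type`, `neighbourhood_group`, `price`, `number_of_reviews`) are assumptions made up for this example, not columns confirmed by the notebook.

```python
import pandas as pd

# Hypothetical toy listings; values and column names are illustrative only.
toy_df = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Shared room", "Entire home/apt", "Entire home/apt"],
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan",
                            "Queens", "Brooklyn", "Manhattan"],
    "price": [60, 200, 90, 40, 150, 250],
    "number_of_reviews": [12, 3, 45, 7, 20, 1],
})

# U - Univariate: one variable at a time (input for a count plot or histogram).
room_type_counts = toy_df["room_type"].value_counts()
price_summary = toy_df["price"].describe()

# B - Bivariate: numerical vs categorical, and numerical vs numerical.
mean_price_by_room = toy_df.groupby("room_type")["price"].mean()
price_review_corr = toy_df["price"].corr(toy_df["number_of_reviews"])

# M - Multivariate: three variables together (input for a heatmap).
price_pivot = toy_df.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="mean",
)

print(room_type_counts)
print(mean_price_by_room)
print(price_pivot)
```

Each summary maps onto a chart family: counts onto bar charts, grouped means onto bar or box plots, and the pivot table onto a heatmap.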





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading

In [None]:
# Load Dataset
file_path = "/content/drive/MyDrive/Airbnb project/Airbnb.csv"
airbnb_df = pd.read_csv(file_path)

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head()

In [None]:
airbnb_df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = airbnb_df.duplicated().sum()
duplicate_count
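
The cell above only counts duplicate rows. If the count were non-zero, one possible follow-up is to drop the repeats, as in this minimal sketch on a made-up DataFrame (the `id` and `price` values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with one fully duplicated row.
toy_df = pd.DataFrame({"id": [1, 2, 2, 3], "price": [100, 80, 80, 120]})

# duplicated() marks every repeat of an earlier row as True.
duplicate_count = int(toy_df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped_df = toy_df.drop_duplicates().reset_index(drop=True)

print(duplicate_count)
print(deduped_df)
```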

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airbnb_df.isna().sum()

In [None]:
airbnb_df.isna().sum().sort_values(ascending=False)[:4]

In [None]:
# Visualizing the missing values
missing_values_count = airbnb_df.isna().sum()
# Create a bar plot to visualize missing values
plt.figure(figsize=(10, 6))
sns.barplot(x=missing_values_count.index, y=missing_values_count.values, palette='rocket')
plt.xlabel('Columns')
plt.ylabel('Count of Missing Values')
plt.title('Missing Values in Dataset')
plt.xticks(rotation=90)  # Rotate x-axis labels for better visibility
plt.show()
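
Raw counts can be hard to compare without knowing the dataset size, so a percentage view may complement the bar plot. The sketch below is one way to build it; the toy DataFrame is an assumption for illustration, and `missing_summary` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two columns.
toy_df = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Cabin"],
    "price": [100, 80, 120, 90],
    "reviews_per_month": [np.nan, np.nan, 1.5, 0.2],
})

# isna().mean() gives the fraction of missing rows per column.
missing_summary = pd.DataFrame({
    "missing_count": toy_df.isna().sum(),
    "missing_pct": toy_df.isna().mean() * 100,
})

# Keep only columns that actually have gaps, worst first.
missing_summary = (
    missing_summary[missing_summary["missing_count"] > 0]
    .sort_values("missing_count", ascending=False)
)

print(missing_summary)
```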

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns


In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in airbnb_df.columns:
    print(f"Unique values for {column}:")
    print(airbnb_df[column].unique())
    print()
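
Printing every unique value can produce very long output for ID-like or free-text columns. A possible alternative, sketched here on made-up data, is to summarize cardinality with `nunique()` first and print the values only for low-cardinality columns. The threshold `max_unique_to_print` is a hypothetical parameter.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
toy_df = pd.DataFrame({
    "id": [1, 2, 3],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "price": [60, 200, 60],
})

# One row per column: its dtype and number of distinct values.
unique_summary = pd.DataFrame({
    "dtype": toy_df.dtypes.astype(str),
    "n_unique": toy_df.nunique(),
})
print(unique_summary)

# Only list the actual values for columns with few distinct entries.
max_unique_to_print = 2  # assumed threshold, adjust as needed
low_cardinality_columns = unique_summary[
    unique_summary["n_unique"] <= max_unique_to_print
].index

for column in low_cardinality_columns:
    print(f"Unique values for {column}: {toy_df[column].unique().tolist()}")
```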

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Make the dataset analysis-ready by handling missing values.
# Results are assigned back to the column instead of calling
# fillna(inplace=True) on a column selection, which newer pandas versions
# warn about and may not apply to the original DataFrame.

# For columns 'name' and 'host_name', fill missing values with 'Unknown'
airbnb_df['name'] = airbnb_df['name'].fillna('Unknown')
airbnb_df['host_name'] = airbnb_df['host_name'].fillna('Unknown')

# For 'last_review', fill NaN with a placeholder date or drop the column depending on analysis needs
# Here, we fill with a placeholder date
airbnb_df['last_review'] = airbnb_df['last_review'].fillna('1970-01-01')

# For 'reviews_per_month', fill NaN with 0 since no reviews imply zero reviews per month
airbnb_df['reviews_per_month'] = airbnb_df['reviews_per_month'].fillna(0)
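
The placeholder date keeps `last_review` free of NaN, but it would distort any date-based statistic. As an alternative sketch (not what the notebook does), the column could be parsed to datetimes, leaving missing entries as `NaT` and recording review presence in a separate flag. The toy data and the `has_review` column name are assumptions.

```python
import pandas as pd

# Hypothetical toy data; the second listing has never been reviewed.
toy_df = pd.DataFrame({
    "last_review": ["2019-05-21", None, "2018-11-02"],
    "number_of_reviews": [9, 0, 4],
})

# Parse to datetime; missing entries become NaT instead of a fake date.
toy_df["last_review"] = pd.to_datetime(toy_df["last_review"])

# Keep an explicit flag so "never reviewed" stays easy to filter on.
toy_df["has_review"] = toy_df["last_review"].notna()

# Date aggregations skip NaT, so the result reflects real reviews only.
latest_review = toy_df["last_review"].max()

print(toy_df)
print(latest_review)
```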

In [None]:
# Verify there are no more missing values
print(airbnb_df.isnull().sum())

In [None]:
# Display the DataFrame
airbnb_df.head()

In [None]:
airbnb_df.dtypes

In [None]:
# Top 10 most-reviewed listings
# ('name' is the listing name; the host is stored in 'host_name')

review_columns = ['name', 'number_of_reviews']


airbnb_review_df = airbnb_df[review_columns]


airbnb_review_df.sort_values(by='number_of_reviews', ascending=False).head(10).reset_index()
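
The sort-then-head pattern above can also be written with `nlargest` and `nsmallest`, which select the top or bottom rows directly. This is a minimal sketch on made-up data, not a replacement the notebook requires.

```python
import pandas as pd

# Hypothetical toy listings; values are illustrative only.
toy_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "number_of_reviews": [5, 50, 20, 1],
    "price": [100, 80, 0, 300],
})

# Top 2 listings by review count, without sorting the whole frame first.
top_reviewed = (
    toy_df.nlargest(2, "number_of_reviews")[["name", "number_of_reviews"]]
    .reset_index(drop=True)
)

# Bottom 2 listings by price.
cheapest = (
    toy_df.nsmallest(2, "price")[["name", "price"]]
    .reset_index(drop=True)
)

print(top_reviewed)
print(cheapest)
```

Passing `drop=True` to `reset_index` discards the original row labels; omit it, as the notebook does, if the original index should stay visible as a column.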

In [None]:
# Top 10 highest-priced listings

price_columns = ['name', 'price']


airbnb_high_price_df = airbnb_df[price_columns]


airbnb_high_price_df.sort_values(by='price', ascending=False).head(10).reset_index()

In [None]:
# Lowest-priced listings, i.e. hosts who may need to update their price

airbnb_lowest_price = airbnb_df[['name', 'price']]

airbnb_lowest_price.sort_values(by='price', ascending=True).head(11).reset_index()

# Dataframe saved after manipulations as "airbnb_output_df"

In [None]:
working_dir_path = "/content/drive/MyDrive/Colab Notebooks/"
# index=False keeps the row index from being written as an extra unnamed column
airbnb_df.to_csv(working_dir_path + 'airbnb_output_df.csv', index=False)
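
A quick way to confirm an export like this is to read the file back and compare its shape and columns. The sketch below does that in a temporary directory on a made-up DataFrame, so it assumes nothing about the Drive paths used above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for the cleaned DataFrame.
toy_df = pd.DataFrame({"name": ["Loft", "Studio"], "price": [100, 80]})

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "airbnb_output_df.csv")

    # index=False avoids writing the row index as an extra column.
    toy_df.to_csv(output_path, index=False)

    # Read the file back while the temporary directory still exists.
    reloaded_df = pd.read_csv(output_path)

columns_match = list(reloaded_df.columns) == list(toy_df.columns)
print(columns_match, reloaded_df.shape)
```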

### What all manipulations have you done and insights you found?

Data Manipulations:

Handling Missing Values:

Handled missing values in the columns name, host_name, last_review, and reviews_per_month.

Filled missing values in the name and host_name columns with 'Unknown'.

Filled missing values in the last_review column with the placeholder date '1970-01-01'.

Filled missing values in the reviews_per_month column with 0, since a listing with no reviews has zero reviews per month.

Verified afterwards with isnull().sum() that no missing values remain.

Handling Duplicates:

Counted duplicate rows with duplicated().sum(). No rows were dropped in this notebook.

Sorted Views:

Built three sorted views of the dataset: the 10 most-reviewed listings (by number_of_reviews), the 10 highest-priced listings, and the 11 lowest-priced listings (by price).

**Data Exploration**: Produced summary statistics for numeric columns with describe(), including mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum values.

Column Information: Listed the column names and their data types in the dataset.

Basic Data Overview: Checked the shape of the dataset (number of rows and columns), the missing-value count per column, and the number of duplicated rows.

Output: Saved the cleaned DataFrame as airbnb_output_df.csv.

Insights:

Data Types:

Reviewed the data type of each column in the dataset with info() and dtypes.

**Summary Statistics**:

Reviewed count, mean, standard deviation, minimum, quartiles, and maximum for the numeric columns, such as price, number_of_reviews, and reviews_per_month.

**Overview of Columns**:

Listed the column names and printed the unique values of each column to understand its content.

**Sorted Views**:

The sorted views show which listings attract the most reviews, which listings charge the highest prices, and which listings sit at the bottom of the price range and may need a price update.

Overall, data cleaning addressed missing values and checked for duplicates, and some basic insights into the dataset's structure and content were obtained. More in-depth analysis still requires further exploration and visualization of the data, such as exploring correlations between variables.

# **Conclusion**

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***