## IMPORTING ALL DEPENDENCIES I NEED FOR THIS PROJECT

In [None]:
import numpy as np
import pandas as pd
import missingno as msno 
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import statsmodels.graphics.correlation as sgc
from statsmodels.graphics.gofplots import qqplot
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import OLSInfluence
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

## CONNECTION TO MY DATABASE ON POSTGRES

This is the connection to my database on PostgreSQL. The actual connection function is in the file **db_connect.py**.
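The contents of **db_connect.py** are not shown in this notebook, so the following is only a rough sketch of what such a helper might look like. It assumes the `psycopg2` driver and the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`); the helper name `build_db_params` and the default values are hypothetical, and the real file may be organised differently.

```python
import os


def build_db_params(env=None):
    """Collect connection settings from environment variables.

    The defaults here are placeholders for illustration, not the project's real values.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),
    }


def connect_to_db():
    """Open a PostgreSQL connection (assumes psycopg2 is installed)."""
    import psycopg2  # imported lazily so this module loads even without the driver

    return psycopg2.connect(**build_db_params())
```

Keeping credentials in environment variables rather than in the notebook means the notebook can be shared without exposing the password.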

In [None]:
# Import necessary packages
import pandas as pd
from db_connect import connect_to_db

# Step 1: Connect to the database
conn = connect_to_db()

# Step 2: Create a cursor and run a query
cursor = conn.cursor()
query = "SELECT * FROM airbnbs_nairobi.listing_data_yearly;"
cursor.execute(query)

# Step 3: Fetch results and convert to a DataFrame
rows = cursor.fetchall()
df = pd.DataFrame(rows, columns=[desc[0] for desc in cursor.description])

# Step 4: Display the data
print("Connection successful! Previewing data:")
display(df.head())

## Data Exploration: an attempt at understanding my data

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
msno.matrix(df)

## Unpacking Listing Type

To understand the dataset and the types of Airbnbs we are working with, I've decided to display the listing types we have in our dataset.

In [None]:
# listing all unique listing types
df['listing_type'].unique()

In [None]:
# count of each listing type
listing_type_counts = df['listing_type'].value_counts()
listing_type_counts

A visual representation of the listing types we have in our dataset:

In [None]:
# create a bar chart 
plt.figure(figsize=(10, 6))
listing_type_counts.plot(kind='bar', color='steelblue')
plt.title('Count of Listings by Type', fontsize=14, fontweight='bold')
plt.xlabel('Listing Type', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()  # call the function; without parentheses it does nothing
plt.show()

# Display the counts
print(listing_type_counts)

Based on the bar chart, the most common listing type in the dataset is *Entire rental unit*, with 129 listings. A couple of listing types have only one listing each.
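To pull out the listing types that appear only once, the counts can be filtered directly. This is a small sketch on toy data standing in for `df['listing_type']`; the values and the names `toy_types`, `counts` and `singletons` are made up for illustration and are not the real dataset.

```python
import pandas as pd

# Toy stand-in for df['listing_type']; these values are illustrative only
toy_types = pd.Series(
    ["Entire rental unit"] * 3
    + ["Private room in home"] * 2
    + ["Bus", "Treehouse"]
)

# Count each listing type, then keep only the types with exactly one listing
counts = toy_types.value_counts()
singletons = counts[counts == 1]

print("Most common type:", counts.idxmax())
print("Types with a single listing:", sorted(singletons.index))
```

On the real data, the same filter applied to `listing_type_counts` would list every one-off listing type.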


The most interesting listings were a bus, a tiny home and a treehouse. My curiosity got the best of me and I had to check them out. Since the dataset is accompanied by images, let me get them and showcase them here.

In [None]:
interesting_listing_types = ['Bus', 'Treehouse', 'Tiny home']
filtered_df = df[df['listing_type'].isin(interesting_listing_types)]
filtered_df[['listing_type','cover_photo_url']]

Next, let me display the cover photos of these interesting listings.

In [None]:
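# HTML is not in the namespace by default, so import it explicitly
from IPython.display import HTML, display

# Create HTML to display images in a grid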
html_content = "<div style='display: flex; flex-wrap: wrap; gap: 20px;'>"
for idx, row in filtered_df.iterrows():
    html_content += f"""
    <div style='text-align: center;'>
        <h4>{row['listing_type']}</h4>
        <h4>{row['listing_name']}</h4>
        <img src='{row['cover_photo_url']}' style='width: 300px; height: 300px; object-fit: cover;'>
    </div>
    """
html_content += "</div>"
display(HTML(html_content))

### Let's check room type as a category

How many unique **room types** are in our dataset?

In [None]:
df['room_type'].unique()

So our whole dataset has 3 types of rooms:
1. Private Rooms
2. Entire Homes
3. Hotel Rooms

In [None]:
df['room_type'].value_counts()

The most common room type is entire home, meaning the guest gets access to every room in the listing. It is followed by private room, which means the guest has no access to the other rooms within the listing. One listing is a hotel room, which is a little weird since I did not know hotel rooms could be listed on the Airbnb app.



For my analysis, I will group the listings into two categories:

1. Entire Home
2. Private Room

This is because these two categories are subject to different considerations and evaluation criteria.
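One way to set up these two groups is sketched below on toy data. It assumes the `room_type` column uses the labels `'Entire home/apt'`, `'Private room'` and `'Hotel room'`, which may not match the actual values in this dataset, and it assumes the single hotel room is simply left out. The names `toy_df`, `KEEP`, `entire_homes` and `private_rooms` are illustrative.

```python
import pandas as pd

# Toy stand-in for df; the room_type labels are assumed, not taken from the dataset
toy_df = pd.DataFrame(
    {
        "listing_name": ["A", "B", "C", "D"],
        "room_type": ["Entire home/apt", "Private room", "Hotel room", "Entire home/apt"],
    }
)

# Keep only the two categories of interest (the hotel room is excluded here)
KEEP = ["Entire home/apt", "Private room"]
subset = toy_df[toy_df["room_type"].isin(KEEP)]

# Split into one DataFrame per room type
groups = {
    room_type: group.reset_index(drop=True)
    for room_type, group in subset.groupby("room_type")
}
entire_homes = groups["Entire home/apt"]
private_rooms = groups["Private room"]

print(len(entire_homes), "entire homes,", len(private_rooms), "private rooms")
```

Working with two separate DataFrames makes it easy to analyse each category on its own terms later on.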



In [None]:
# Sort by room type so listings of the same type sit together, then renumber the rows
df = df.sort_values(['room_type']).reset_index(drop=True)
df