# **Airbnb Deep Dive** 

**This Python and Pandas deep dive uncovers patterns in the London Listings Dataset, last updated on 19 September 2025.**

**Action points for this Dataset:**
- Data Loading and Initial Exploration
  - Loading Dataset
  - Displaying basic information
  - Checking missing values, duplicates, datatypes
- Data Cleaning
  - Changing dtypes
  - Parsing dates
  - Handling Missing data
  - Removing irrelevant rows
- Data Enrichment
  - New Columns 
- Analysis/Answering business questions


### **Data Loading and Initial Exploration**
- Reading the Dataset
- Brief survey of the dataset using the `.describe()` and `.info()` methods
- Check for missing and duplicated data

In [None]:
#Importing necessary libraries
import pandas as pd
import os

#Reading the dataset using a function
def read_data():
    if os.path.exists("listings.csv"): #Checking that the dataset exists in the working directory
        data = pd.read_csv("listings.csv") #Reading the dataset
        return data #DataFrame returned when the function is called
    else:
        raise FileNotFoundError("This file does not exist.") #Raising an error if the file is not in the working directory
data = read_data() #Calling the function to read the file
data.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
0,13913,Holiday London DB Room Let-on going,54730,Alina,,Islington,51.56861,-0.1127,Private room,70.0,1,55,2025-08-21,0.3,2,331,10,
1,15400,Bright Chelsea Apartment. Chelsea!,60302,Philippa,,Kensington and Chelsea,51.4878,-0.16813,Entire home/apt,149.0,4,97,2025-04-05,0.51,1,199,1,
2,17402,Very Central Modern 3-Bed/2 Bath By Oxford St W1,67564,Liz,,Westminster,51.52195,-0.14094,Entire home/apt,411.0,3,56,2024-02-19,0.32,2,80,0,
3,24328,Battersea live/work artist house,41759,Joe,,Wandsworth,51.47072,-0.16266,Entire home/apt,,7,95,2025-07-05,0.53,1,294,1,
4,36274,Bright 1 bedroom apt off brick lane in Shoreditch,133271,Hendryks,,Tower Hamlets,51.52322,-0.06979,Entire home/apt,210.0,5,15,2025-09-06,0.09,2,323,6,


In [62]:
#Pre data processing dataset summary
data.describe()

Unnamed: 0,id,host_id,neighbourhood_group,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
count,96871.0,96871.0,0.0,96871.0,96871.0,61963.0,96871.0,96871.0,72749.0,96871.0,96871.0,96871.0,0.0
mean,6.894448e+17,214449400.0,,51.50974,-0.127638,229.917,5.440297,21.657627,0.990334,16.685499,144.927429,5.709614,
std,5.941222e+17,219605300.0,,0.049067,0.101112,4437.589,23.686681,50.368644,1.304282,53.13029,141.808279,11.99677,
min,13913.0,2594.0,,51.295937,-0.49676,7.0,1.0,0.0,0.01,1.0,0.0,0.0,
25%,30260580.0,27268140.0,,51.48415,-0.189468,77.0,1.0,1.0,0.15,1.0,0.0,0.0,
50%,8.505248e+17,116432100.0,,51.51372,-0.127505,135.0,2.0,5.0,0.52,2.0,96.0,0.0,
75%,1.254262e+18,419897400.0,,51.539108,-0.068316,221.0,4.0,20.0,1.29,8.0,288.0,6.0,
max,1.508964e+18,718690500.0,,51.68263,0.27896,1085147.0,1125.0,1902.0,36.96,500.0,365.0,390.0,


In [63]:
#Dataset information
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96871 entries, 0 to 96870
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              96871 non-null  int64  
 1   name                            96871 non-null  object 
 2   host_id                         96871 non-null  int64  
 3   host_name                       96828 non-null  object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   96871 non-null  object 
 6   latitude                        96871 non-null  float64
 7   longitude                       96871 non-null  float64
 8   room_type                       96871 non-null  object 
 9   price                           61963 non-null  float64
 10  minimum_nights                  96871 non-null  int64  
 11  number_of_reviews               96871 non-null  int64  
 12  last_review                     

In [65]:
#Function to check for missing data in the dataset

def missing():
    missing_data = data.isnull().sum()    #Count of missing values per column
    percentage_missing_data = (missing_data / len(data)) * 100 #Percentage of missing values per column

    print("% Missing data per column:")   #Print statement for clarity
    return pd.DataFrame({"Total Missing": missing_data, "Percentage Missing": percentage_missing_data})   #Converting output to DataFrame

missing() #Calling the function

% Missing data per column:


Unnamed: 0,Total Missing,Percentage Missing
id,0,0.0
name,0,0.0
host_id,0,0.0
host_name,43,0.044389
neighbourhood_group,96871,100.0
neighbourhood,0,0.0
latitude,0,0.0
longitude,0,0.0
room_type,0,0.0
price,34908,36.035552
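
The same check can be written so that it takes the DataFrame as an argument instead of reading the global `data`. The version below is a minimal sketch under that assumption; the name `missing_summary` and the toy frame are illustrative and not part of the notebook.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages for any DataFrame."""
    total = df.isna().sum()                      # missing values per column
    percent = (total / len(df) * 100).round(2)   # share of rows that are missing
    summary = pd.DataFrame({"Total Missing": total, "Percentage Missing": percent})
    return summary.sort_values("Total Missing", ascending=False)

# Toy frame standing in for the listings data (values are made up)
toy = pd.DataFrame({
    "price": [70.0, None, 411.0, None],
    "host_name": ["Alina", "Philippa", None, "Joe"],
    "room_type": ["Private room"] * 4,
})
summary = missing_summary(toy)
print(summary)
```

Passing the frame in explicitly makes it easy to rerun the check after each cleaning step, for example `missing_summary(data)` before and after dropping rows.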


In [66]:
#Function for checking for duplicates in the dataset

def is_duplicated():
    duplicates = data.duplicated().sum() #Sum of duplicated rows
    return pd.DataFrame({"Total duplicates": [duplicates]}) #DataFrame for the output
is_duplicated()


Unnamed: 0,Total duplicates
0,0


### **Data Cleaning**
- Stripping non-word characters (symbols, punctuation and whitespace) from object columns
- Dropping 100% null columns
- Dropping rows where prices are null
- Changing data types
- Assigning **Unknown** to null host_name values (on the assumption that these hosts chose to keep a low profile or stay anonymous)

In [None]:
#Stripping everything that is not a word character (letter, digit or underscore) from the object columns; this also removes spaces
data[["name","host_name","neighbourhood","room_type"]] = data[["name","host_name","neighbourhood","room_type"]].replace(r"\W", "", regex=True)
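
To make the effect of the `\W` pattern concrete, the sketch below applies it to a one-row toy frame. It also shows one possible alternative that keeps the spaces between words; that alternative is an assumption for illustration and is not what the notebook uses.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Bright Chelsea Apartment. Chelsea!"],
    "room_type": ["Entire home/apt"],
})

# Same pattern as the cell above: every non-word character is removed, spaces included
stripped = toy.replace(r"\W", "", regex=True)

# Alternative (assumption, not the notebook's method): replace punctuation with a space,
# then collapse repeated whitespace so that words stay separated
kept = toy.replace(r"[^\w\s]", " ", regex=True)
kept = kept.apply(lambda col: col.str.replace(r"\s+", " ", regex=True).str.strip())

print(stripped.loc[0, "name"])
print(kept.loc[0, "name"])
```

The first form produces compact labels such as `Entirehomeapt`, which is what the later outputs in this notebook show; the second keeps names readable at the cost of a slightly longer expression.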

#### Dropping the **license** and **neighbourhood_group** columns, as they contain 100% missing values and are not useful for this analysis.
#### Rows with missing **prices** were also dropped from the dataset.


In [None]:
#Function to drop rows and columns that are not useful for the analysis
def dropping(data):
    if "price" in data.columns:
        data = data.dropna(subset=["price"]) #Dropping rows with a missing price
        data = data.drop(columns=["license", "neighbourhood_group"]) #Dropping the fully null columns
    return data
data = dropping(data)
data


Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
0,13913,HolidayLondonDBRoomLetongoing,54730,Alina,Islington,51.568610,-0.112700,Privateroom,70.0,1,55,2025-08-21,0.30,2,331,10
1,15400,BrightChelseaApartmentChelsea,60302,Philippa,KensingtonandChelsea,51.487800,-0.168130,Entirehomeapt,149.0,4,97,2025-04-05,0.51,1,199,1
2,17402,VeryCentralModern3Bed2BathByOxfordStW1,67564,Liz,Westminster,51.521950,-0.140940,Entirehomeapt,411.0,3,56,2024-02-19,0.32,2,80,0
4,36274,Bright1bedroomaptoffbricklaneinShoreditch,133271,Hendryks,TowerHamlets,51.523220,-0.069790,Entirehomeapt,210.0,5,15,2025-09-06,0.09,2,323,6
5,36299,KewGardens3BRhouseinculdesac,155938,Geert,RichmonduponThames,51.481450,-0.281070,Entirehomeapt,280.0,3,116,2025-07-20,0.64,1,324,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96866,1508894090797273412,BluegroundFinsburybalconynrStPauls,314162972,Blueground,Islington,51.526692,-0.097322,Entirehomeapt,298.0,30,0,,,405,351,0
96867,1508900042872179492,SelfContainedStudioinHeartofTootingBroadway,718690455,Ali,Wandsworth,51.429503,-0.165492,Entirehomeapt,66.0,1,2,2025-09-15,2.00,1,354,2
96868,1508926597927944565,OnebedroomapartmentDagenham,389056540,Arnelle,BarkingandDagenham,51.529700,0.148890,Entirehomeapt,350.0,1,0,,,1,365,0
96869,1508962439633147670,ShortStay,683246718,Tayba,TowerHamlets,51.514600,-0.063140,Privateroom,40.0,1,0,,,1,348,0


In [None]:
#Converting the last_review column to datetime
data["last_review"] = pd.to_datetime(data["last_review"], errors= "coerce")
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 61963 entries, 0 to 96870
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   id                              61963 non-null  int64         
 1   name                            61963 non-null  object        
 2   host_id                         61963 non-null  int64         
 3   host_name                       61936 non-null  object        
 4   neighbourhood                   61963 non-null  object        
 5   latitude                        61963 non-null  float64       
 6   longitude                       61963 non-null  float64       
 7   room_type                       61963 non-null  object        
 8   price                           61963 non-null  float64       
 9   minimum_nights                  61963 non-null  int64         
 10  number_of_reviews               61963 non-null  int64         
 11  last_re

In [None]:
#Assigning Unknown to host_name(s) which are null
data.loc[data["host_name"].isnull(), "host_name"] = "Unknown"

### **Data Enrichment**
- Creating a new column named: **price_per_booking**
- Bucketing availability into:
    - Full-time
    - Part-time
    - Rare

In [None]:
#Function to create new column
def price_per_booking(data): #Parameter: data (the listings DataFrame)
    data["price_per_booking"] = data["price"] * data["minimum_nights"] #Multiplying the price column by the minimum_nights column
    return data #Returning the DataFrame
bookings = price_per_booking(data) #Assigning the function's output to a variable
bookings #Displaying bookings


Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,price_per_booking
0,13913,HolidayLondonDBRoomLetongoing,54730,Alina,Islington,51.568610,-0.112700,Privateroom,70.0,1,55,2025-08-21,0.30,2,331,10,70.0
1,15400,BrightChelseaApartmentChelsea,60302,Philippa,KensingtonandChelsea,51.487800,-0.168130,Entirehomeapt,149.0,4,97,2025-04-05,0.51,1,199,1,596.0
2,17402,VeryCentralModern3Bed2BathByOxfordStW1,67564,Liz,Westminster,51.521950,-0.140940,Entirehomeapt,411.0,3,56,2024-02-19,0.32,2,80,0,1233.0
4,36274,Bright1bedroomaptoffbricklaneinShoreditch,133271,Hendryks,TowerHamlets,51.523220,-0.069790,Entirehomeapt,210.0,5,15,2025-09-06,0.09,2,323,6,1050.0
5,36299,KewGardens3BRhouseinculdesac,155938,Geert,RichmonduponThames,51.481450,-0.281070,Entirehomeapt,280.0,3,116,2025-07-20,0.64,1,324,6,840.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96866,1508894090797273412,BluegroundFinsburybalconynrStPauls,314162972,Blueground,Islington,51.526692,-0.097322,Entirehomeapt,298.0,30,0,NaT,,405,351,0,8940.0
96867,1508900042872179492,SelfContainedStudioinHeartofTootingBroadway,718690455,Ali,Wandsworth,51.429503,-0.165492,Entirehomeapt,66.0,1,2,2025-09-15,2.00,1,354,2,66.0
96868,1508926597927944565,OnebedroomapartmentDagenham,389056540,Arnelle,BarkingandDagenham,51.529700,0.148890,Entirehomeapt,350.0,1,0,NaT,,1,365,0,350.0
96869,1508962439633147670,ShortStay,683246718,Tayba,TowerHamlets,51.514600,-0.063140,Privateroom,40.0,1,0,NaT,,1,348,0,40.0


In [None]:
#Function to bucket room availability
def availability(x):
    if x < 100: #Fewer than 100 days: Rare
        return "Rare"
    elif x <= 300: #From 100 to 300 days: Part-time
        return "Part-time"
    else: #More than 300 days: Full-time
        return "Full-time"
#Applying the function to availability_365 to create a new column named availability
data["availability"] = data["availability_365"].apply(availability)
data #Displaying data


Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,price_per_booking,availability
0,13913,HolidayLondonDBRoomLetongoing,54730,Alina,Islington,51.568610,-0.112700,Privateroom,70.0,1,55,2025-08-21,0.30,2,331,10,70.0,Full-time
1,15400,BrightChelseaApartmentChelsea,60302,Philippa,KensingtonandChelsea,51.487800,-0.168130,Entirehomeapt,149.0,4,97,2025-04-05,0.51,1,199,1,596.0,Part-time
2,17402,VeryCentralModern3Bed2BathByOxfordStW1,67564,Liz,Westminster,51.521950,-0.140940,Entirehomeapt,411.0,3,56,2024-02-19,0.32,2,80,0,1233.0,Rare
4,36274,Bright1bedroomaptoffbricklaneinShoreditch,133271,Hendryks,TowerHamlets,51.523220,-0.069790,Entirehomeapt,210.0,5,15,2025-09-06,0.09,2,323,6,1050.0,Full-time
5,36299,KewGardens3BRhouseinculdesac,155938,Geert,RichmonduponThames,51.481450,-0.281070,Entirehomeapt,280.0,3,116,2025-07-20,0.64,1,324,6,840.0,Full-time
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96866,1508894090797273412,BluegroundFinsburybalconynrStPauls,314162972,Blueground,Islington,51.526692,-0.097322,Entirehomeapt,298.0,30,0,NaT,,405,351,0,8940.0,Full-time
96867,1508900042872179492,SelfContainedStudioinHeartofTootingBroadway,718690455,Ali,Wandsworth,51.429503,-0.165492,Entirehomeapt,66.0,1,2,2025-09-15,2.00,1,354,2,66.0,Full-time
96868,1508926597927944565,OnebedroomapartmentDagenham,389056540,Arnelle,BarkingandDagenham,51.529700,0.148890,Entirehomeapt,350.0,1,0,NaT,,1,365,0,350.0,Full-time
96869,1508962439633147670,ShortStay,683246718,Tayba,TowerHamlets,51.514600,-0.063140,Privateroom,40.0,1,0,NaT,,1,348,0,40.0,Full-time
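
The same three buckets can also be produced without a row-wise `apply`, using `pd.cut`. The sketch below assumes `availability_365` holds whole-day counts between 0 and 365, so the bin edges `-1, 99, 300, 365` reproduce the thresholds of the function above; it runs on a small made-up Series rather than the real column.

```python
import pandas as pd

# Boundary values chosen to exercise each threshold (toy data)
days = pd.Series([0, 99, 100, 300, 301, 365], name="availability_365")

# Right-closed bins: (-1, 99] -> Rare, (99, 300] -> Part-time, (300, 365] -> Full-time
buckets = pd.cut(
    days,
    bins=[-1, 99, 300, 365],
    labels=["Rare", "Part-time", "Full-time"],
)
print(buckets.tolist())
```

`pd.cut` returns an ordered categorical, which keeps the buckets in a meaningful order when grouping or plotting; values outside 0 to 365 would become missing rather than falling into a bucket.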


### **Data Analysis Using Pandas**
1. What are the top 10 most expensive neighborhoods by average price?
2. What’s the average availability and price by room type?
3. Which host has the most listings?
4. How does average price vary across different boroughs or districts?
5. How many listings have never been reviewed?
6. Write a summary of 3–5 key insights you found through your analysis

1. What are the top 10 most expensive neighborhoods by average price?

In [None]:
#1
#Function for the most expensive neighbourhoods by average price.
def expensive_neighbourhood(data):
    return data.groupby("neighbourhood")["price"].mean().round(2).sort_values(ascending=False).head(10) #Average price per neighbourhood, sorted in descending order, keeping the top 10
top10 = expensive_neighbourhood(data) #Calling the function
top10 #Output

neighbourhood
TowerHamlets            430.906199
CityofLondon            354.389908
Lambeth                 345.710741
Westminster             342.139405
KensingtonandChelsea    336.072148
Islington               217.546807
Camden                  216.511547
HammersmithandFulham    199.188085
Wandsworth              198.431607
RichmonduponThames      184.270936
Name: price, dtype: float64

2. What’s the average availability and price by room type?

In [None]:
#Function for calculating the average price per room type and availability bucket
def avg_availability_per_room_type(data):
    return pd.DataFrame(data.groupby(["room_type","availability"])["price"].mean()) #Grouping by room_type and availability, then averaging the price
avg_availability_per_room_type(data)

Unnamed: 0_level_0,Unnamed: 1_level_0,price
room_type,availability,Unnamed: 2_level_1
Entirehomeapt,Full-time,359.585339
Entirehomeapt,Part-time,242.623532
Entirehomeapt,Rare,229.761866
Hotelroom,Full-time,333.6
Hotelroom,Part-time,883.227273
Hotelroom,Rare,896.25
Privateroom,Full-time,146.606588
Privateroom,Part-time,121.049567
Privateroom,Rare,89.706029
Sharedroom,Full-time,111.666667


3. Which host has the most listings?

In [None]:
#Function for finding the host with the most listings
def most_listings():
    return data[["host_id","host_name"]].value_counts().head(1)
most_listings()

host_id    host_name        
446820235  LuxurybookingsFZE    470
Name: count, dtype: int64

4. How does average price vary across different boroughs or districts?

In [None]:
#Function for checking price variability across neighbourhoods
def variation():
    #Grouping listings by neighbourhood and selecting their prices
    neighbourhood = data.groupby("neighbourhood")["price"]

    avg_price_per_neighbourhood = neighbourhood.mean() #Mean listing price in each neighbourhood
    price_range = neighbourhood.max() - neighbourhood.min() #Price range (max price - min price)
    price_variation = neighbourhood.std() #Standard deviation of prices
    min_price = neighbourhood.min() #Minimum listing price

    result = pd.DataFrame({
        "Avg_neighbourhood_price": avg_price_per_neighbourhood,
        "Price Range": price_range,
        "Price Variation": price_variation,
        "Min Price": min_price
    }).sort_values(by="Avg_neighbourhood_price", ascending=False) #Neighbourhoods vs Price Variability

    return result
variation()


Unnamed: 0_level_0,Avg_neighbourhood_price,Price Range,Price Variation,Min Price
neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
TowerHamlets,430.906199,1085137.0,16599.043139,10.0
CityofLondon,354.389908,8947.0,568.767835,53.0
Lambeth,345.710741,66179.0,2914.282887,10.0
Westminster,342.139405,15130.0,644.31708,13.0
KensingtonandChelsea,336.072148,53564.0,896.96997,24.0
Islington,217.546807,74076.0,1451.562241,24.0
Camden,216.511547,14980.0,411.282336,20.0
HammersmithandFulham,199.188085,10021.0,291.083862,11.0
Wandsworth,198.431607,11984.0,398.859443,16.0
RichmonduponThames,184.270936,2875.0,198.190813,25.0
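
The Tower Hamlets row shows a price range of 1,085,137 and a standard deviation of about 16,599, which suggests that a few extreme listings pull its mean upwards. One way to check how much a ranking depends on such outliers is to compare the mean with the median. The sketch below does this on made-up data; it is an illustration of the idea, not a result from the listings dataset.

```python
import pandas as pd

# Toy prices: neighbourhood "A" contains one extreme listing
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "B"],
    "price": [100.0, 120.0, 10000.0, 150.0, 160.0, 170.0],
})

# Mean, median and listing count per neighbourhood
stats = toy.groupby("neighbourhood")["price"].agg(["mean", "median", "count"])
print(stats.sort_values("median", ascending=False))
```

In this toy case "A" ranks first by mean but second by median, so the two measures can order neighbourhoods differently when outliers are present.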


5. How many listings have never been reviewed?


In [None]:
#Function to count listings with no reviews
def no_reviews(data):
    no_listings_reviews = int(data["last_review"].isna().sum()) #Listings without a last_review date
    return no_listings_reviews
no_reviews(data)

13908
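
The count above treats a missing `last_review` date as "never reviewed". A cross-check is to count rows where `number_of_reviews` is zero and confirm that both definitions agree. The sketch below shows the comparison on a made-up frame; whether the two counts match on the full dataset is not verified here.

```python
import pandas as pd

# Toy listings (values are made up); None becomes NaT after parsing
toy = pd.DataFrame({
    "number_of_reviews": [55, 0, 2, 0],
    "last_review": pd.to_datetime(["2025-08-21", None, "2025-09-15", None]),
})

by_date = int(toy["last_review"].isna().sum())           # no review date recorded
by_count = int((toy["number_of_reviews"] == 0).sum())    # review counter is zero
print(by_date, by_count)
```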