# Exploratory Data Analysis of Airbnb Listings in Milan, Italy

## Introduction

##### Milan is Italy's fashion and design capital. It is not only known as the centre of art, culture, and commerce but is also a vibrant destination for various travelers. With its rich history, stunning architecture, and bustling city life, Milan attracts millions of visitors each year.

##### In recent years, the hospitality industry has been transformed by the rise of platforms like Airbnb, which offer travelers a diverse range of accommodation options beyond traditional hotels. As one of the world's most popular travel destinations, Milan boasts a vast number of Airbnb listings, giving visitors more opportunities than ever to immerse themselves in the city's local culture and lifestyle.

##### This notebook presents an exploratory data analysis of Airbnb listings in Milan, making use of a dataset to uncover insights into various aspects of Airbnb offerings in the city.

## Objective - Market Opportunity and Pricing Strategy:

##### - Identify neighborhoods with high demand and optimal pricing strategies for Airbnb hosts to maximize revenue. 
##### - This objective would be useful for Airbnb hosts or property managers looking to enter the Milan market or optimize their existing listings. 
##### - I aim to investigate how data-driven pricing can lead to better profitability.

### Importing necessary libraries

In [1]:
import numpy as np
import pandas as pd

#### Loading the Data:

In [3]:
listings = pd.read_csv("C:/Users/STARIZ.PK/Desktop/DA_COURSE/listingssummary.csv")

## Understanding the data structure:

In [4]:
listings

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
0,39185194,The Best Rent - One bedroom apartment in Milan,4417813,The Best Rent,,RONCHETTO SUL NAVIGLIO,45.441440,9.140470,Entire home/apt,175.0,1,4,2022-04-27,0.12,155,0,0,015146-LNI-00331
1,17818971,Studio nel cuore delle cinque vie,20016667,Carlotta,,DUOMO,45.461430,9.184950,Private room,200.0,1,1,2023-04-20,0.09,2,357,1,
2,1077526403195603262,PrimoPiano - Arrivabene,2504885,Simone,,BOVISA,45.500170,9.165999,Entire home/apt,108.0,2,1,2024-02-12,0.70,128,68,1,015146-LIM-01075
3,582962332344164114,Italianway - Venini 16,27693585,Italianway,,BUENOS AIRES - VENEZIA,45.487240,9.212029,Entire home/apt,112.0,1,26,2023-11-07,1.07,489,257,4,015146-CIM-04360
4,626361430623671090,One Suite Meravigli - 5 mins from DUOMO,458997695,Davide,,DUOMO,45.465786,9.181277,Entire home/apt,134.0,2,100,2024-02-18,4.56,1,180,47,CIR: 015146-LNI-00888
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22101,999255609762923112,stanza luminosa e accogliente,5745369,Marco,,TICINESE,45.451855,9.182139,Private room,69.0,2,6,2024-03-18,1.41,1,248,6,
22102,43732904,Villa Aida App Superior,302714051,Aida,,SCALO ROMANA,45.440450,9.210560,Entire home/apt,179.0,6,1,2021-03-01,0.03,3,159,0,
22103,37855859,Sweet Central Suite in Milan,1366623,MATnIV,,DUOMO,45.459570,9.192330,Entire home/apt,138.0,2,164,2024-03-16,2.96,2,30,42,
22104,1035885357495120445,Studio Centro Milan Porta Romana,2452841,Paolo & Yuki,,PORTA ROMANA,45.453052,9.204039,Entire home/apt,53.0,1,0,,,7,186,0,


##### Shape of the dataframe:

##### We have a total of 22106 rows and 18 columns.

In [90]:
listings.shape

(22106, 18)

##### The column names available in this dataframe are listed below:

In [73]:
listings.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood', 'latitude',
       'longitude', 'room_type', 'price', 'minimum_nights',
       'number_of_reviews', 'last_review', 'reviews_per_month',
       'calculated_host_listings_count', 'availability_365',
       'number_of_reviews_ltm', 'license'],
      dtype='object')

##### The `info()` method called below gives a concise summary of the dataframe's structure, including the data type of each column, the number of non-null values, and memory usage.

In [4]:
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22106 entries, 0 to 22105
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              22106 non-null  int64  
 1   name                            22106 non-null  object 
 2   host_id                         22106 non-null  int64  
 3   host_name                       22106 non-null  object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   22106 non-null  object 
 6   latitude                        22106 non-null  float64
 7   longitude                       22106 non-null  float64
 8   room_type                       22106 non-null  object 
 9   price                           21591 non-null  float64
 10  minimum_nights                  22106 non-null  int64  
 11  number_of_reviews               22106 non-null  int64  
 12  last_review                     

##### Now let's check if we have any duplicated rows in this dataframe.

In [7]:
listings.duplicated()

0        False
1        False
2        False
3        False
4        False
         ...  
22101    False
22102    False
22103    False
22104    False
22105    False
Length: 22106, dtype: bool

In [9]:
duplicate_count = listings.duplicated().sum()

print("Number of duplicate rows:", duplicate_count)

Number of duplicate rows: 0


##### In this dataframe, we have 0 duplicate rows.
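##### `duplicated()` with no arguments only flags rows that match in every column. As a minimal sketch on a toy frame (the values and the names `toy`, `full_row_dupes` and `id_dupes` are made up for illustration), the same check can also be keyed on the listing `id`, which would catch a listing that appears twice with slightly different details:

```python
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame({
    "id": [101, 102, 102, 103],
    "name": ["Loft", "Studio", "Studio (relisted)", "Room"],
    "price": [120.0, 80.0, 85.0, 60.0],
})

# Full-row duplicates: every column must match.
full_row_dupes = toy.duplicated().sum()

# Key-based duplicates: rows that share a listing id with an earlier row.
id_dupes = toy.duplicated(subset=["id"]).sum()

print("Full-row duplicates:", full_row_dupes)
print("Duplicate ids:", id_dupes)
```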

## Handling Missing Values in the Dataset:

##### Checking for NaN, null or missing values:

In [10]:
listings.isna().sum() 

id                                    0
name                                  0
host_id                               0
host_name                             0
neighbourhood_group               22106
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                               515
minimum_nights                        0
number_of_reviews                     0
last_review                        4381
reviews_per_month                  4381
calculated_host_listings_count        0
availability_365                      0
number_of_reviews_ltm                 0
license                           14219
dtype: int64
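
##### Raw counts are easier to judge as a share of all rows. A minimal sketch on a toy frame (the values and the names `toy` and `missing_share` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame({
    "price": [175.0, np.nan, 108.0, 112.0],
    "license": ["015146-LNI-00331", np.nan, np.nan, np.nan],
})

# isna() gives booleans; their mean is the fraction of missing values per column.
missing_share = toy.isna().mean().mul(100).round(1)

print(missing_share)
```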

##### All rows in the column "neighbourhood_group" have NaN or missing values, so we will drop this column completely.
##### Keeping this column would not contribute to any meaningful analysis, especially given our focus on market opportunity and pricing strategy.
##### Since the column held no usable information, it was dropped to keep the data clear and to keep our focus on identifying optimal pricing and potential market opportunities.
##### Dropping the column "neighbourhood_group":


In [41]:
try:
    listings = listings.drop(columns=['neighbourhood_group'])
    print("Dropped 'neighbourhood_group' column.")
except KeyError:
    print("The column 'neighbourhood_group' does not exist in the DataFrame.")

Dropped 'neighbourhood_group' column.
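
##### When the column name is not known in advance, entirely empty columns can be detected and removed in one step. A minimal sketch on a toy frame (the values and the names `toy`, `empty_cols` and `cleaned` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "neighbourhood": ["DUOMO", "BOVISA", "TICINESE"],
    "price": [200.0, np.nan, 69.0],
})

# Names of the columns in which every value is missing.
empty_cols = toy.columns[toy.isna().all()].tolist()

# how="all" drops a column only if all of its values are NaN,
# so "price" (partly missing) is kept.
cleaned = toy.dropna(axis=1, how="all")

print("Empty columns:", empty_cols)
print("Remaining columns:", cleaned.columns.tolist())
```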


##### We will now deal with an important column that has NaN values: "price". First we find all the listings with a missing value in the "price" column, then look each one up manually on the Airbnb website and, where a price is available, replace the NaN with the listing's actual price per night.

##### The "price" column is a critical feature for our analysis, especially for investigating possible pricing strategies. Given its importance and the fact that many rows had missing values, it was crucial to fill these NaN values.

##### We used Airbnb's website to retrieve the actual prices and replace the NaN values, so that the analysis remains accurate and reflects real-world data.

##### This manual process was chosen to avoid filling with estimates or averages that could distort the insights related to price vs. minimum nights.
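
##### The first step, finding the rows to look up, can be sketched as follows on a toy frame (the values and the names `toy` and `missing_price` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame({
    "id": [11, 12, 13, 14],
    "name": ["Loft", "Studio", "Suite", "Room"],
    "price": [175.0, np.nan, 108.0, np.nan],
})

# Boolean mask of rows without a price, keeping only the columns
# needed to look the listing up by hand.
missing_price = toy.loc[toy["price"].isna(), ["id", "name"]]

print(missing_price)
print("Index labels to fix:", missing_price.index.tolist())
```

##### The index labels printed at the end are the ones that a label-based assignment such as `listings.loc[label, 'price'] = value` expects.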


#### Adding correct price value to those rows that had NaN values of price: 

In [78]:
listings.loc[20, 'price'] = 200; listings.loc[45, 'price'] = 80; listings.loc[107, 'price'] = 119; listings.loc[263, 'price'] = 122; listings.loc[21620, 'price'] = 402;
listings.loc[21621, 'price'] = 60; listings.loc[21622, 'price'] = 300; listings.loc[21624, 'price'] = 679; listings.loc[264, 'price'] = 350;
listings.loc[265, 'price'] = 297; listings.loc[270, 'price'] = 80; listings.loc[21616, 'price'] = 250; listings.loc[21617, 'price'] = 250;
listings.loc[21618, 'price'] = 142; listings.loc[21613, 'price'] = 250; listings.loc[21614, 'price'] = 80; listings.loc[21615, 'price'] = 300;
listings.loc[21612, 'price'] = 220; listings.loc[935, 'price'] = 185; listings.loc[1158, 'price'] = 80; listings.loc[1159, 'price'] = 80;
listings.loc[1160, 'price'] = 60; listings.loc[1480, 'price'] = 503; listings.loc[1688, 'price'] = 48; listings.loc[1690, 'price'] = 38;
listings.loc[1691, 'price'] = 130; listings.loc[1695, 'price'] = 73; listings.loc[1700, 'price'] = 380; listings.loc[1703, 'price'] = 900;
listings.loc[1705, 'price'] = 67; listings.loc[1723, 'price'] = 145; listings.loc[2113, 'price'] = 32; listings.loc[2118, 'price'] = 200;
listings.loc[2120, 'price'] = 260; listings.loc[2324, 'price'] = 97; listings.loc[2336, 'price'] = 55; listings.loc[2337, 'price'] = 175;
listings.loc[2340, 'price'] = 380; listings.loc[2617, 'price'] = 62; listings.loc[2706, 'price'] = 230; listings.loc[3326, 'price'] = 88;
listings.loc[3327, 'price'] = 55; listings.loc[3684, 'price'] = 99; listings.loc[3808, 'price'] = 126; listings.loc[3849, 'price'] = 500;
listings.loc[4386, 'price'] = 165; listings.loc[4795, 'price'] = 280; listings.loc[4864, 'price'] = 98; listings.loc[5147, 'price'] = 86;
listings.loc[5149, 'price'] = 76; listings.loc[5151, 'price'] = 100; listings.loc[5153, 'price'] = 150; listings.loc[5157, 'price'] = 118;
listings.loc[5361, 'price'] = 100; listings.loc[5363, 'price'] = 80; listings.loc[5368, 'price'] = 45; listings.loc[5369, 'price'] = 95;
listings.loc[5372, 'price'] = 211; listings.loc[5373, 'price'] = 400; listings.loc[5474, 'price'] = 124; listings.loc[5480, 'price'] = 250;
listings.loc[5635, 'price'] = 263; listings.loc[5637, 'price'] = 139; listings.loc[5721, 'price'] = 600; listings.loc[5725, 'price'] = 153;
listings.loc[5726, 'price'] = 153; listings.loc[5734, 'price'] = 91; listings.loc[6332, 'price'] = 50; listings.loc[6333, 'price'] = 150;
listings.loc[6520, 'price'] = 105; listings.loc[6522, 'price'] = 200; listings.loc[6683, 'price'] = 110; listings.loc[6688, 'price'] = 922;
listings.loc[7203, 'price'] = 250; listings.loc[7592, 'price'] = 98; listings.loc[7595, 'price'] = 290; listings.loc[7596, 'price'] = 144;
listings.loc[7602, 'price'] = 295; listings.loc[7997, 'price'] = 120; listings.loc[8192, 'price'] = 267; listings.loc[8519, 'price'] = 106;
listings.loc[8521, 'price'] = 90; listings.loc[8665, 'price'] = 102; listings.loc[8961, 'price'] = 103; listings.loc[9320, 'price'] = 899;
listings.loc[9343, 'price'] = 120; listings.loc[9344, 'price'] = 55; listings.loc[9350, 'price'] = 149; listings.loc[9565, 'price'] = 103;
listings.loc[9566, 'price'] = 180; listings.loc[9569, 'price'] = 350; listings.loc[9571, 'price'] = 120; listings.loc[9576, 'price'] = 300;
listings.loc[9580, 'price'] = 204; listings.loc[9584, 'price'] = 94; listings.loc[9588, 'price'] = 51; listings.loc[9589, 'price'] = 164;
listings.loc[9591, 'price'] = 90; listings.loc[9757, 'price'] = 116; listings.loc[9773, 'price'] = 236; listings.loc[9834, 'price'] = 112;
listings.loc[9864, 'price'] = 135; listings.loc[9968, 'price'] = 208; listings.loc[9969, 'price'] = 200; listings.loc[9970, 'price'] = 76;
listings.loc[9975, 'price'] = 90; listings.loc[10363, 'price'] = 250; listings.loc[10898, 'price'] = 43; listings.loc[10900, 'price'] = 94;
listings.loc[10916, 'price'] = 400; listings.loc[11499, 'price'] = 140; listings.loc[11962, 'price'] = 150; listings.loc[12001, 'price'] = 193;
listings.loc[12009, 'price'] = 450; listings.loc[12362, 'price'] = 138; listings.loc[12363, 'price'] = 120; listings.loc[12364, 'price'] = 99;
listings.loc[12669, 'price'] = 2000; listings.loc[12677, 'price'] = 100; listings.loc[12679, 'price'] = 90; listings.loc[12681, 'price'] = 74;
listings.loc[12682, 'price'] = 100; listings.loc[12683, 'price'] = 46; listings.loc[12686, 'price'] = 80; listings.loc[12688, 'price'] = 101;
listings.loc[12869, 'price'] = 89; listings.loc[12872, 'price'] = 60; listings.loc[12873, 'price'] = 600; listings.loc[12877, 'price'] = 93;
listings.loc[12882, 'price'] = 170; listings.loc[12883, 'price'] = 500; listings.loc[13079, 'price'] = 350; listings.loc[13082, 'price'] = 80;
listings.loc[13113, 'price'] = 120; listings.loc[13114, 'price'] = 211; listings.loc[13117, 'price'] = 85; listings.loc[13119, 'price'] = 50;
listings.loc[13122, 'price'] = 124; listings.loc[13124, 'price'] = 100; listings.loc[13332, 'price'] = 130; listings.loc[13333, 'price'] = 87;
listings.loc[13337, 'price'] = 103; listings.loc[13338, 'price'] = 99; listings.loc[13341, 'price'] = 250; listings.loc[13343, 'price'] = 45;
listings.loc[13604, 'price'] = 120; listings.loc[13713, 'price'] = 86; listings.loc[14128, 'price'] = 149; listings.loc[14274, 'price'] = 86; 
listings.loc[14299, 'price'] = 322; listings.loc[14303, 'price'] = 145; listings.loc[14491, 'price'] = 253; listings.loc[14494, 'price'] = 92;
listings.loc[14657, 'price'] = 100; listings.loc[14664, 'price'] = 109; listings.loc[15128, 'price'] = 69; listings.loc[15244, 'price'] = 120;
listings.loc[15544, 'price'] = 89; listings.loc[15545, 'price'] = 118; listings.loc[15546, 'price'] = 65; listings.loc[15547, 'price'] = 206;
listings.loc[15551, 'price'] = 105; listings.loc[15552, 'price'] = 200; listings.loc[15553, 'price'] = 165; listings.loc[15554, 'price'] = 60;
listings.loc[15555, 'price'] = 100; listings.loc[15558, 'price'] = 300; listings.loc[15559, 'price'] = 130; listings.loc[15560, 'price'] = 80;
listings.loc[15739, 'price'] = 210; listings.loc[15741, 'price'] = 80; listings.loc[15744, 'price'] = 100; listings.loc[15748, 'price'] = 199;
listings.loc[15749, 'price'] = 121; listings.loc[15754, 'price'] = 750; listings.loc[15756, 'price'] = 373; listings.loc[15904, 'price'] = 142; 
listings.loc[15956, 'price'] = 185; listings.loc[15969, 'price'] = 155; listings.loc[16194, 'price'] = 85; listings.loc[16209, 'price'] = 120;
listings.loc[16210, 'price'] = 140; listings.loc[16638, 'price'] = 118; listings.loc[16861, 'price'] = 35; listings.loc[16879, 'price'] = 101;
listings.loc[17181, 'price'] = 200; listings.loc[17187, 'price'] = 125; listings.loc[17188, 'price'] = 143; listings.loc[17191, 'price'] = 250;
listings.loc[17775, 'price'] = 108; listings.loc[18109, 'price'] = 245; listings.loc[18110, 'price'] = 200; listings.loc[18113, 'price'] = 292;
listings.loc[18118, 'price'] = 62; listings.loc[18119, 'price'] = 179; listings.loc[18318, 'price'] = 35; listings.loc[18324, 'price'] = 200;
listings.loc[18325, 'price'] = 60; listings.loc[18328, 'price'] = 475; listings.loc[18329, 'price'] = 124; listings.loc[18330, 'price'] = 108;
listings.loc[18333, 'price'] = 74; listings.loc[18622, 'price'] = 200; listings.loc[18728, 'price'] = 85; listings.loc[18730, 'price'] = 85;
listings.loc[18731, 'price'] = 132; listings.loc[19304, 'price'] = 130; listings.loc[19345, 'price'] = 284; listings.loc[19634, 'price'] = 102;
listings.loc[19636, 'price'] = 69; listings.loc[19637, 'price'] = 90; listings.loc[20446, 'price'] = 180; listings.loc[20475, 'price'] = 70;
listings.loc[20476, 'price'] = 136; listings.loc[20478, 'price'] = 305; listings.loc[20482, 'price'] = 500; listings.loc[20486, 'price'] = 472;
listings.loc[20674, 'price'] = 150; listings.loc[20675, 'price'] = 139; listings.loc[20677, 'price'] = 120; listings.loc[20682, 'price'] = 119;
listings.loc[20685, 'price'] = 180; listings.loc[20686, 'price'] = 98; listings.loc[20687, 'price'] = 100; listings.loc[20688, 'price'] = 120;
listings.loc[21282, 'price'] = 100; listings.loc[21493, 'price'] = 60; listings.loc[21497, 'price'] = 70; listings.loc[21498, 'price'] = 900;
listings.loc[21500, 'price'] = 300; listings.loc[21509, 'price'] = 90; listings.loc[21517, 'price'] = 100; listings.loc[21527, 'price'] = 75;
listings.loc[21531, 'price'] = 200; listings.loc[21532, 'price'] = 160; listings.loc[21537, 'price'] = 75; listings.loc[21540, 'price'] = 180;
listings.loc[21544, 'price'] = 140; listings.loc[21550, 'price'] = 450; listings.loc[21551, 'price'] = 80; listings.loc[21552, 'price'] = 400;
listings.loc[21556, 'price'] = 100; listings.loc[21557, 'price'] = 85; listings.loc[21558, 'price'] = 90; listings.loc[21559, 'price'] = 60;
listings.loc[21560, 'price'] = 145; listings.loc[21561, 'price'] = 35; listings.loc[21562, 'price'] = 333; listings.loc[21564, 'price'] = 160;
listings.loc[21567, 'price'] = 160; listings.loc[21568, 'price'] = 300; listings.loc[21569, 'price'] = 335; listings.loc[21570, 'price'] = 194;
listings.loc[21571, 'price'] = 300; listings.loc[21574, 'price'] = 180; listings.loc[21576, 'price'] = 84; listings.loc[21578, 'price'] = 138;
listings.loc[21581, 'price'] = 250; listings.loc[21584, 'price'] = 1000; listings.loc[21586, 'price'] = 70; listings.loc[21589, 'price'] = 100;
listings.loc[21590, 'price'] = 140; listings.loc[21597, 'price'] = 500; listings.loc[21600, 'price'] = 200; listings.loc[21602, 'price'] = 143;
listings.loc[21606, 'price'] = 399; listings.loc[21607, 'price'] = 290; listings.loc[21609, 'price'] = 200
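
##### The same kind of manual correction can be kept in a single mapping from index label to price, which is easier to review and extend than a long run of assignments. A minimal sketch on a toy frame (the frame and the names `toy`, `manual_prices`, `fixes` and `present` are assumptions for illustration; the two prices mirror the first two assignments in the cell above):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`, with matching index labels.
toy = pd.DataFrame(
    {"price": [175.0, np.nan, 108.0, np.nan]},
    index=[19, 20, 44, 45],
)

# Index label -> manually fetched nightly price.
manual_prices = {20: 200, 45: 80}
fixes = pd.Series(manual_prices, dtype="float64")

# Only touch labels that are still present in the frame.
present = fixes.index.intersection(toy.index)
toy.loc[present, "price"] = fixes.loc[present]

print(toy)
```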

##### Some listings with more than 2 missing (NaN) values were not found on the Airbnb website, meaning they are unavailable for booking or viewing. 

##### Since these listings do not represent active or viewable properties, they were dropped from the dataframe. This ensures that our analysis focuses only on listings that are currently available, providing more accurate insights into key factors such as price and availability, which depend on actual bookable listings.


In [79]:
# Dropping rows for listings that are not visible on the website and had more than 2 columns with NaN values

try:
    listings = listings.drop([1074, 1477, 1697, 1825, 2123, 2333, 2338, 2456, 2709, 3678, 4865, 5364, 5365, 5366])
    print("Dropped listings unavailable on the airbnb website")
except KeyError:
    print("The rows requested do not exist in the DataFrame.")

The rows requested do not exist in the DataFrame.


In [45]:
# Dropping rows for listings that are not visible on the website and had more than 2 columns with NaN values

try:
    listings = listings.drop([5375, 5724, 6521, 6681, 7482, 7589, 7597, 7600, 7618, 9045, 9348, 9574, 9581, 9863, 10542, 11330, 11846])
    print("Dropped listings unavailable on the airbnb website")
except KeyError:
    print("The rows requested do not exist in the DataFrame.")

Dropped listings unavailable on the airbnb website


In [13]:
# Dropping rows for listings that are not visible on the website and had more than 2 columns with NaN values

try:
    listings = listings.drop([12670, 12875, 13109, 13328, 13338, 13340, 14300, 14660, 15245, 15557, 15561, 15752, 18326, 18733, 19465, 19639, 20148, 20484, 20487, 20692, 21554, 21563, 21572, 21601, 21605])
    print("Dropped listings unavailable on the airbnb website")
except KeyError:
    print("The rows requested do not exist in the DataFrame.")

Dropped listings unavailable on the airbnb website


In [46]:
try:
    listings = listings.drop([2125])
    print("Dropped listings unavailable on the airbnb website")
except KeyError:
    print("The rows requested do not exist in the DataFrame.")

Dropped listings unavailable on the airbnb website


In [47]:
try:
    listings = listings.drop([21623, 4358, 9585, 21555, 21573, 21594, 21599])
    print("Dropped listings unavailable to all the guests")
except KeyError:
    print("The rows requested do not exist in the DataFrame.")

Dropped listings unavailable to all the guests
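
##### `DataFrame.drop` raises a `KeyError` when any one of the requested labels is missing, so a `try`/`except` around it skips the whole batch if a single row has already been removed (for example when a cell is re-run). A minimal sketch of a re-runnable alternative on a toy frame (the values and the names `toy`, `to_drop`, `missing` and `result` are made up for illustration; 9999 is a hypothetical label that is not in the frame):

```python
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame(
    {"price": [100.0, 90.0, 80.0, 70.0]},
    index=[1074, 1477, 1697, 2000],
)

# Labels we would like to remove; 9999 does not exist in the frame.
to_drop = [1074, 1477, 9999]

# Report labels that are already gone instead of failing on them.
missing = [label for label in to_drop if label not in toy.index]

# errors="ignore" drops the labels that exist and silently skips the rest.
result = toy.drop(index=to_drop, errors="ignore")

print("Already absent:", missing)
print("Remaining labels:", result.index.tolist())
```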


## Conversion of monthly priced listings to per-night prices:

##### Some listings had prices listed on a per-month basis instead of per-night. To standardize the data and stay consistent with the objective of this analysis, the monthly prices were divided by 30 (the approximate number of days in a month) to convert them into a per-night price.

##### This conversion allows for a fair comparison across all listings and ensures that the analysis reflects accurate pricing information for nightly bookings.


In [48]:
# Manually fetched monthly prices
monthly_prices_by_index = {
    1161: 21350, 1822: 3534, 2319: 593, 4348: 10332, 
    5152: 1800, 5154: 3799, 9047: 2400, 9340: 1875, 11845: 2147, 
    12366: 2070, 12881: 2250, 15243: 1488, 
    16072: 2100, 18116: 3600, 21491: 2135, 
    21592: 2929, 21603: 2320
}

for index, monthly_price in monthly_prices_by_index.items():
    nightly_price = monthly_price / 30
    print(f"Processing row index: {index}, Monthly Price: {monthly_price}, Nightly Price: {nightly_price}")
    listings.loc[index, 'price'] = nightly_price

Processing row index: 1161, Monthly Price: 21350, Nightly Price: 711.6666666666666
Processing row index: 1822, Monthly Price: 3534, Nightly Price: 117.8
Processing row index: 2319, Monthly Price: 593, Nightly Price: 19.766666666666666
Processing row index: 4348, Monthly Price: 10332, Nightly Price: 344.4
Processing row index: 5152, Monthly Price: 1800, Nightly Price: 60.0
Processing row index: 5154, Monthly Price: 3799, Nightly Price: 126.63333333333334
Processing row index: 9047, Monthly Price: 2400, Nightly Price: 80.0
Processing row index: 9340, Monthly Price: 1875, Nightly Price: 62.5
Processing row index: 11845, Monthly Price: 2147, Nightly Price: 71.56666666666666
Processing row index: 12366, Monthly Price: 2070, Nightly Price: 69.0
Processing row index: 12881, Monthly Price: 2250, Nightly Price: 75.0
Processing row index: 15243, Monthly Price: 1488, Nightly Price: 49.6
Processing row index: 16072, Monthly Price: 2100, Nightly Price: 70.0
Processing row index: 18116, Monthly Pric
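
##### The same conversion can be written without an explicit loop. A minimal sketch on a toy frame (the frame and the names `toy`, `monthly` and `nightly` are assumptions for illustration; the two monthly prices are taken from the dictionary above, and the rounding to two decimals is an added choice, not part of the original cell):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`, with matching index labels.
toy = pd.DataFrame(
    {"price": [np.nan, np.nan, 50.0]},
    index=[5152, 9047, 1],
)

# Index label -> manually fetched monthly price.
monthly = pd.Series({5152: 1800, 9047: 2400}, dtype="float64")

# Divide the whole Series at once and round to cents.
nightly = (monthly / 30).round(2)

# Assignment aligns on the index labels.
toy.loc[nightly.index, "price"] = nightly

print(toy)
```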

#### Dealing with NaN values in the columns, "last_review" and "reviews_per_month":

##### The "last_review" and "reviews_per_month" columns had missing values. Since reviews can provide valuable insights into a listing's popularity and quality, imputing missing values helps preserve useful data.

##### By maintaining review information, we can more effectively analyze factors like price and demand with a focus on active listings.

##### For "last_review", missing values were filled with the mode date (most frequent date) to reflect a common review period. This keeps the dataset as complete as possible, although the imputed dates are placeholders rather than actual review dates.


In [61]:
# Convert 'last_review' column to datetime type
listings['last_review'] = pd.to_datetime(listings['last_review'])

# Compute mode of 'last_review' column
mode_last_review = listings['last_review'].mode()[0]

# Impute missing values with mode
listings['last_review'] = listings['last_review'].fillna(mode_last_review)

#### Median Imputation for "reviews_per_month" column:

##### The median of "reviews_per_month" is a typical, representative value for listings and is not pulled up by a few very active ones, which makes it suitable for filling missing values when the exact value is unknown.

In [71]:
median_reviews_per_month = listings['reviews_per_month'].median()
listings['reviews_per_month'] = listings['reviews_per_month'].fillna(median_reviews_per_month)
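
##### Once values are imputed, they can no longer be told apart from observed ones. A minimal sketch on a toy frame that records which rows were filled before applying the median (the values, the `reviews_imputed` flag column and the names `toy` and `median_value` are assumptions for illustration, not part of the original workflow):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame({"reviews_per_month": [0.12, np.nan, 0.70, 1.07, np.nan]})

# Remember which rows had no value before imputation.
toy["reviews_imputed"] = toy["reviews_per_month"].isna()

# median() skips NaN by default, so it is computed on the observed values only.
median_value = toy["reviews_per_month"].median()

# Assign the result back instead of using inplace=True on a column selection.
toy["reviews_per_month"] = toy["reviews_per_month"].fillna(median_value)

print(toy)
```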

#### Replacing NaN values in 'license' column with the mode (most frequent value)

##### The 'license' column was imputed with the mode (most frequent value) because it represents the most common license status across listings. Since the presence of a license is often standard for most hosts, using the mode ensures that the majority case is reflected for the missing values. This decision helps keep the data consistent without impacting the analysis of price vs. minimum nights, as the focus is not heavily reliant on the license column but still requires completeness for overall accuracy.


In [66]:
mode_license = listings['license'].mode()[0]  # Calculate mode license
listings['license'] = listings['license'].fillna(mode_license)  # Fill NaN values with mode

#### Checking again for total null values:

In [82]:
listings.isna().sum() # checking for total null values

id                                  0
name                                0
host_id                             0
host_name                           0
neighbourhood                       0
latitude                            0
longitude                           0
room_type                           0
price                             191
minimum_nights                      0
number_of_reviews                   0
last_review                         0
reviews_per_month                   0
calculated_host_listings_count      0
availability_365                    0
number_of_reviews_ltm               0
license                             0
dtype: int64

##### We are now left with only 191 NaN values in the "price" column. These have been thoroughly checked manually on the Airbnb website: they have no price because they are currently unavailable to book, or unavailable until May 2026.

##### Including these values in the analysis wouldn't provide meaningful insights, as they are not relevant to the current market conditions. Leaving them as NaN ensures that the analysis is based on active and bookable listings, preserving the integrity of the data without impacting the analysis of the other rows.
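
##### For price-based steps, the rows that still lack a price can be excluded at analysis time without deleting them from the dataframe. A minimal sketch on a toy frame (the values and the names `toy`, `priced` and `median_by_area` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `listings`; the values are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood": ["DUOMO", "DUOMO", "DUOMO", "BOVISA"],
    "price": [200.0, np.nan, 134.0, 108.0],
})

# Keep only rows with a known price; `toy` itself is left unchanged.
priced = toy.dropna(subset=["price"])

# Example price summary on the bookable rows only.
median_by_area = priced.groupby("neighbourhood")["price"].median()

print(median_by_area)
```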
