## Comparation between Barcelona and Madrid 

In this section, we perform an Exploratory Data Analysis (EDA) to compare the rental markets of two major Spanish cities: Barcelona and Madrid. Both cities, known for their tourist appeal and vibrant urban life, present unique contexts that influence their respective rental markets.

The goal of this EDA is to uncover the differences and similarities in rental patterns between these two cities, providing a comparative view that helps understand how various factors impact prices and availability of accommodations. The comparison covers several key aspects, including:

1. **Basic Descriptive Statistics:**
   We will explore general metrics for both markets, such as the average rental price, the total number of listings, and their distribution across the city.

2. **Neighborhood Analysis:**
   Differences in prices and the popularity of neighborhoods in Barcelona and Madrid will be analyzed, highlighting the most and least expensive areas and how these differences relate to factors like proximity to tourist centers and amenities.

3. **Price Distribution:**
   Comparing the distribution of prices in both cities will reveal trends in price variation based on location, type of accommodation, and other relevant factors.

4. **Review Analysis:**
   We will examine the distribution and impact of reviews on both markets, including how ratings affect prices and the perception of accommodations in Barcelona and Madrid.

5. **Amenities Analysis:**
   This analysis will detail how the presence of certain amenities in accommodations influences prices in both cities, providing insights into tenant preferences and the added value of amenities.

6. **Room Type vs. Price:**
   We will compare the average prices of different room types in Barcelona and Madrid, highlighting how these differences are reflected in each city's market.

7. **Host Analysis:**
   The impact of host status (superhosts vs. regular hosts) on prices and the number of listings in both cities will be investigated, offering a perspective on how host profiles affect the market.

Through this comparative analysis, we aim to provide a comprehensive understanding of how the rental markets in Barcelona and Madrid differ and align, facilitating informed decision-making based on a deeper insight into the characteristics and trends of the market in these two major urban destinations

In [12]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the datasets
barcelona_df = pd.read_csv('datasets/Final_cleaned_dataset/bcn_final_cleaned_data_csv.csv', on_bad_lines='skip', sep=";")
madrid_df = pd.read_csv('datasets/Final_cleaned_dataset/mad_final_cleaned_data_csv.csv',on_bad_lines='skip', sep=";" ) 

### 1. Basic Descriptive Statistics

In [18]:
def basic_descriptive_statistics(df, city_name):
    print(f"Basic Descriptive Statistics for {city_name}")
    print(df.describe(include='all'))

basic_descriptive_statistics(barcelona_df, 'Barcelona')




Basic Descriptive Statistics for Barcelona
       neighbourhood_group_cleansed            id  \
count                         18857  1.885700e+04   
unique                           10           NaN   
top                        Eixample           NaN   
freq                           6692           NaN   
mean                            NaN  4.262056e+17   
std                             NaN  4.751728e+17   
min                             NaN  1.867400e+04   
25%                             NaN  2.434676e+07   
50%                             NaN  5.199440e+07   
75%                             NaN  9.194646e+17   
max                             NaN  1.178911e+18   

                                listing_url                 name  \
count                                 18857                18857   
unique                                18857                18140   
top     https://www.airbnb.com/rooms/269467  Double or Twin Room   
freq                                      1     

## 2.Neighborhood Analysis


### 2.1.Top Neighborhoods

In [16]:
def top_neighborhoods(df, city_name):
    print(f"Top Neighborhoods for {city_name}")
    top_neigh = df['neighbourhood_group_cleansed'].value_counts()
    print(top_neigh)

top_neighborhoods(barcelona_df, 'Barcelona')
top_neighborhoods(madrid_df, 'Madrid')


Top Neighborhoods for Barcelona
neighbourhood_group_cleansed
Eixample               6692
Ciutat Vella           4390
Sants-Montjuïc         1969
Sant Martí             1745
Gràcia                 1573
Sarrià-Sant Gervasi     988
Horta-Guinardó          557
Les Corts               403
Sant Andreu             316
Nou Barris              224
Name: count, dtype: int64
Top Neighborhoods for Madrid


KeyError: 'neighbourhood_group_cleansed'

### 2.2. Average Price by Neighborhood

In [None]:
def average_price_by_neighborhood(df, city_name):
    print(f"Average Price by Neighborhood for {city_name}")
    avg_price_neigh = df.groupby('neighbourhood_group_cleansed')['price'].mean()
    print(avg_price_neigh)

average_price_by_neighborhood(barcelona_df, 'Barcelona')
average_price_by_neighborhood(madrid_df, 'Madrid')


## 3. Price Distribution

In [None]:
def price_distribution(df, city_name):
    plt.figure(figsize=(10, 6))
    sns.histplot(df['price'], kde=True)
    plt.title(f'Price Distribution for {city_name}')
    plt.xlabel('Price')
    plt.ylabel('Frequency')
    plt.show()

price_distribution(barcelona_df, 'Barcelona')
price_distribution(madrid_df, 'Madrid')


## 4. Review Analysis

### 4.1. Distribution of Scores

In [None]:
def review_scores_distribution(df, city_name):
    plt.figure(figsize=(10, 6))
    sns.histplot(df['review_scores_rating'], kde=True)
    plt.title(f'Review Scores Distribution for {city_name}')
    plt.xlabel('Review Scores Rating')
    plt.ylabel('Frequency')
    plt.show()

review_scores_distribution(barcelona_df, 'Barcelona')
review_scores_distribution(madrid_df, 'Madrid')


## 5. Amenities Analysis
### 5.1. Count of Listings with Specific Amenities

In [None]:
def amenities_count(df, city_name):
    amenities = ['kitchen', 'patio or balcony', 'elevator', 'air conditioning']
    print(f"Count of Listings with Specific Amenities for {city_name}")
    for amenity in amenities:
        count = df[amenity].sum()
        print(f"{amenity}: {count}")

amenities_count(barcelona_df, 'Barcelona')
amenities_count(madrid_df, 'Madrid')


## 6. Room Type vs. Price

### 6.1. Average Price by Room Type

In [None]:
def average_price_by_room_type(df, city_name):
    print(f"Average Price by Room Type for {city_name}")
    avg_price_room_type = df.groupby('room_type_encoded')['price'].mean()
    print(avg_price_room_type)

average_price_by_room_type(barcelona_df, 'Barcelona')
average_price_by_room_type(madrid_df, 'Madrid')


## 7. Host Analysis
### 7.1. Superhost vs. Price

In [None]:
def superhost_vs_price(df, city_name):
    plt.figure(figsize=(10, 6))
    sns.boxplot(x='host_is_superhost', y='price', data=df)
    plt.title(f'Superhost vs Price for {city_name}')
    plt.xlabel('Superhost')
    plt.ylabel('Price')
    plt.show()

superhost_vs_price(barcelona_df, 'Barcelona')
superhost_vs_price(madrid_df, 'Madrid')


### 7.2. Number of Listings per Host

In [None]:
def listings_per_host(df, city_name):
    print(f"Number of Listings per Host for {city_name}")
    listings_count = df['host_id'].value_counts()
    print(listings_count.describe())

listings_per_host(barcelona_df, 'Barcelona')
listings_per_host(madrid_df, 'Madrid')
