# INSTRUCTIONS

**Perform a basic exploratory data analysis of the Inside Airbnb dataset and answer some questions**

To complete the assignment, paste the answers where appropriate, and upload the full notebook at the end.


**Submission format**: One Jupyter notebook (`Airbnb_GroupX.ipynb`).

Please do *not* submit:
* A zip file
* A link to Google CoLab
* A file with the wrong extension
* A Python script


Follow the steps below; the data is attached.


The grading criteria are, in decreasing order of importance and increasing order of subjectivity, as follows:


* Code has no errors: the whole notebook runs from top to bottom without modifications (35 %)
* Code gives correct answers (30 %)
* Code avoids repetition and favours pandas methods where appropriate (loops and conditionals only if strictly necessary) (15 %)
* Code uses meaningful, explanatory variable names (10 %)
* Code is as succinct as possible (when there are two ways of doing something, the simplest, shortest, or easiest to understand is chosen) (5 %)
  * If you discuss several ways of doing something, with their pros and cons (rather than just dumping code without explanation), that also counts positively
* Code is easy to read (i.e. "similar to how the professor codes") (5 %)


Optionally, you can include code comments describing the intent (i.e. code comments should answer "why is this code here?", not "what is this code doing?") and supplementary markdown cells if appropriate.



All the questions can be done independently of each other (after reading the data). They are not sorted in any particular order of difficulty.

# IMPORT LIBRARIES AND DATASET


In [None]:
# PREREQUISITE:
# Every group member must go to their Google Drive "Shared with me",
# right-click the "GROUP 1 " folder, and choose "Add Shortcut to Drive".

from google.colab import drive
import pandas as pd
import os

# 1. Mount Google Drive
# This will ask for permission every time you restart the runtime.
drive.mount('/content/drive', force_remount=True)

# 2. Define the path
# The path is relative to 'My Drive', assuming the 'GROUP 1 ' folder shortcut was added there.
folder_name = 'GROUP 1 /PYTHON'
file_name = 'listings.csv'

# This path assumes everyone added the shortcut to the root of their My Drive
path = f'/content/drive/MyDrive/{folder_name}/{file_name}'

# 3. Load the data, failing loudly so later cells never run without `df`
if not os.path.exists(path):
    raise FileNotFoundError(
        f"File not found at {path}. Make sure you have added a SHORTCUT "
        "of the 'GROUP 1 ' folder to your 'My Drive'."
    )
df = pd.read_csv(path)
print("Success! Data loaded.")

Mounted at /content/drive
Success! Data loaded.


In [None]:
display(df.head())

| | id | listing_url | scrape_id | last_scraped | source | name | description | neighborhood_overview | picture_url | host_id | ... | review_scores_communication | review_scores_location | review_scores_value | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21853 | https://www.airbnb.com/rooms/21853 | 20250914152907 | 2025-09-15 | previous scrape | Bright and airy room | We have a quiet and sunny room with a good vie... | We live in a leafy neighbourhood with plenty o... | https://a0.muscache.com/pictures/68483181/87bc... | 83531 | ... | 4.82 | 4.21 | 4.67 | | f | 2 | 0 | 2 | 0 | 0.25 |
| 1 | 30320 | https://www.airbnb.com/rooms/30320 | 20250914152907 | 2025-09-15 | city scrape | Apartamentos Dana Sol | | | https://a0.muscache.com/pictures/336868/f67409... | 130907 | ... | 4.78 | 4.9 | 4.69 | | t | 17 | 17 | 0 | 0 | 0.93 |
| 2 | 30959 | https://www.airbnb.com/rooms/30959 | 20250914152907 | 2025-09-15 | previous scrape | Beautiful loft in Madrid Center | Beautiful Loft 60m2 size just in the historica... | | https://a0.muscache.com/pictures/78173471/835e... | 132883 | ... | 4.63 | 4.88 | 4.25 | | f | 1 | 1 | 0 | 0 | 0.06 |
| 3 | 40916 | https://www.airbnb.com/rooms/40916 | 20250914152907 | 2025-09-15 | city scrape | Apartasol Apartamentos Dana | | | https://a0.muscache.com/pictures/hosting/Hosti... | 130907 | ... | 4.81 | 4.88 | 4.59 | | t | 17 | 17 | 0 | 0 | 0.29 |
| 4 | 62423 | https://www.airbnb.com/rooms/62423 | 20250914152907 | 2025-09-15 | city scrape | MAGIC ARTISTIC HOUSE IN THE CENTER OF MADRID | INCREDIBLE HOME OF AN ARTIST SURROUNDED BY PAI... | DISTRICT WITH VERY GOOD VIBES IN THE MIDDLE OF... | https://a0.muscache.com/pictures/miso/Hosting-... | 303845 | ... | 4.86 | 4.97 | 4.6 | | f | 3 | 1 | 2 | 0 | 2.78 |


# PART 1: BASIC EXPLORATORY ANALYSIS

**Question 1**

How many rows does the dataset have? (Excluding the header containing the column names)

**Question 2**

How many columns does the dataset have? (Excluding the autogenerated numerical index)
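
Questions 1 and 2 can both be read off `DataFrame.shape`, which already excludes the header row and the index. A minimal sketch on a small hypothetical frame (`toy_listings` is illustrative, not the real data); on the notebook's data, use `df` instead. `len(df)` is an equivalent way to get the row count alone.

```python
import pandas as pd

# Hypothetical toy frame standing in for listings.csv: 3 listings, 3 columns
toy_listings = pd.DataFrame({
    "id": [21853, 30320, 30959],
    "host_id": [83531, 130907, 132883],
    "instant_bookable": ["f", "t", "f"],
})

# shape is (rows, columns); neither the header nor the index is counted
n_rows, n_columns = toy_listings.shape
print(n_rows, n_columns)  # 3 3
```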

**Question 3**

How many unique values are there for host_id?
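
A sketch using `Series.nunique`, on hypothetical toy data where one host owns two listings; swap `toy_listings` for `df` on the real data.

```python
import pandas as pd

# Hypothetical toy data: host 130907 owns two listings
toy_listings = pd.DataFrame({"host_id": [83531, 130907, 130907, 303845]})

# nunique() skips NaN by default, which suits an ID column
n_unique_hosts = toy_listings["host_id"].nunique()
print(n_unique_hosts)  # 3
```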

**Question 4**

Count how many listings there are per host (where 1 row = 1 listing). Find the host with the largest number of listings. How many listings do they have?
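
One way to sketch this is `value_counts` on `host_id`, which counts rows per host and sorts them in descending order. The data below is hypothetical. The `calculated_host_listings_count` column seen in `df.head()` might serve as a cross-check, though it is computed by Inside Airbnb and need not match the row count exactly.

```python
import pandas as pd

# Hypothetical toy data: one row per listing
toy_listings = pd.DataFrame({"host_id": [83531, 130907, 130907, 303845, 130907]})

# Counts per host, largest first
listings_per_host = toy_listings["host_id"].value_counts()
top_host_id = listings_per_host.idxmax()
top_host_count = listings_per_host.max()
print(top_host_id, top_host_count)  # 130907 3
```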

**Question 5**

How many distinct hosts are superhosts?
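
A sketch that filters superhost listings first and then counts distinct hosts, so a superhost with several listings counts once. It assumes a `host_is_superhost` column encoded as `"t"`/`"f"`, by analogy with `instant_bookable` in the `df.head()` output; check `df.columns` and the actual values before relying on it. The data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; host_is_superhost encoding ("t"/"f") is an assumption
toy_listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_is_superhost": ["t", "t", "f", "t", None],
})

# Filter listings, then count distinct hosts so multi-listing hosts count once
is_superhost = toy_listings["host_is_superhost"] == "t"
n_superhosts = toy_listings.loc[is_superhost, "host_id"].nunique()
print(n_superhosts)  # 2
```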

**Question 6**

In the city of Madrid there are 2 administrative levels represented in the dataset: neighbourhood and district. Find the district with the largest number of listings. How many does it have?
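
A sketch with `value_counts` on the district column. It assumes, based on the usual Inside Airbnb schema, that `neighbourhood_cleansed` holds the neighbourhood and `neighbourhood_group_cleansed` holds the district; confirm this against `df.columns`. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names assumed from the Inside Airbnb schema
toy_listings = pd.DataFrame({
    "neighbourhood_cleansed": ["Sol", "Cortes", "Palacio", "Recoletos"],
    "neighbourhood_group_cleansed": ["Centro", "Centro", "Centro", "Salamanca"],
})

listings_per_district = toy_listings["neighbourhood_group_cleansed"].value_counts()
top_district = listings_per_district.idxmax()
top_district_count = listings_per_district.max()
print(top_district, top_district_count)  # Centro 3
```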

**Question 7**

What's the average price of listings? (Error of ±1 USD is accepted)
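
If `price` is stored as text such as `"$1,200.00"` (an assumption; if `df["price"].dtype` is already numeric, skip the cleaning step), it must be converted before averaging. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical toy data; the "$1,200.00" string format is an assumption
toy_listings = pd.DataFrame({"price": ["$45.00", "$90.00", "$1,200.00", None]})

# Strip the currency symbol and thousands separator, then convert to float
price_usd = (
    toy_listings["price"]
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)
average_price = price_usd.mean()  # mean() skips missing prices
print(round(average_price, 2))  # 445.0
```

Whether missing prices should be skipped (as above) or treated otherwise is a choice worth stating in the answer.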

**Question 8**

How many listings have zero reviews?
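
A sketch that sums a boolean mask; it assumes a `number_of_reviews` column, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; number_of_reviews column name is an assumption
toy_listings = pd.DataFrame({"number_of_reviews": [0, 12, 0, 3]})

# True counts as 1 when summed
n_without_reviews = (toy_listings["number_of_reviews"] == 0).sum()
print(n_without_reviews)  # 2
```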

**Question 9**

Fill the gap: "Listings that are instantly bookable have an average number of reviews per month X % higher than those that are not" (Error of ±1 % is accepted)
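
A sketch: average `reviews_per_month` per `instant_bookable` group, then express the ratio as a percentage increase. The values are hypothetical. Note that `reviews_per_month` is typically empty for listings with no reviews; `mean()` skips those rows, whereas `.fillna(0)` would treat them as zero and change the answer, so it is worth stating which convention you use.

```python
import pandas as pd

# Hypothetical toy data using the t/f encoding seen in df.head()
toy_listings = pd.DataFrame({
    "instant_bookable": ["t", "t", "f", "f"],
    "reviews_per_month": [1.5, 2.5, 1.0, 1.0],
})

mean_reviews = toy_listings.groupby("instant_bookable")["reviews_per_month"].mean()

# Percentage by which the "t" group exceeds the "f" group
pct_higher = (mean_reviews["t"] / mean_reviews["f"] - 1) * 100
print(round(pct_higher, 1))  # 100.0
```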

**Question 10**

How many listings have missing (null) license information?
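
A sketch using `isna()`, on hypothetical license values.

```python
import pandas as pd

# Hypothetical toy data
toy_listings = pd.DataFrame({"license": [None, "Exempt", None, "VT-1234"]})

n_missing_license = toy_listings["license"].isna().sum()
print(n_missing_license)  # 2
```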

**Question 11**

Some licenses are very long strings starting with ES, followed by 2 letters (type of listing), then 2 letters (category), then a long sequence of digits and some extra characters.

How many listings have a license containing the string "ESFC"?
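
A sketch with `str.contains`: `regex=False` treats `"ESFC"` as a literal substring, and `na=False` makes missing licenses count as "no match" instead of propagating NaN. The license strings are made up.

```python
import pandas as pd

# Hypothetical, made-up license strings
toy_listings = pd.DataFrame({"license": [
    "ESFCTU0000123456789", None, "Exempt", "ESFCVT0000987654321", "ESHFTU0000111",
]})

has_esfc = toy_listings["license"].str.contains("ESFC", regex=False, na=False)
n_esfc = has_esfc.sum()
print(n_esfc)  # 2
```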

**Question 12**

How many listings have declared "Exempt" or "En proceso" (Spanish for "in progress") in the license field?
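
Two possible sketches: an exact match with `isin`, and a more lenient variant that normalises case and surrounding whitespace first. Which one is appropriate depends on what the real values look like (inspect `df["license"].value_counts()`); the toy values here are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, including a sloppy "exempt " variant
toy_listings = pd.DataFrame({"license": [
    "Exempt", "En proceso", "exempt ", None, "ESFCTU0000123456789",
]})

# Exact match only
n_exempt_or_pending = toy_listings["license"].isin(["Exempt", "En proceso"]).sum()

# Lenient match: ignore case and surrounding whitespace
normalised_license = toy_listings["license"].str.strip().str.lower()
n_lenient = normalised_license.isin(["exempt", "en proceso"]).sum()
print(n_exempt_or_pending, n_lenient)  # 2 3
```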

**Question 13**

How many hosts *cannot* be contacted by email?
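
A sketch that reduces the data to one row per host (the question is about hosts, not listings) and checks whether the verification text mentions `email`. It assumes a `host_verifications` column holding list-like strings such as `"['email', 'phone']"`; that format is an assumption to verify. Here a missing value is treated as "cannot be contacted by email" via `na=False`, which is a modelling choice you may want to discuss.

```python
import pandas as pd

# Hypothetical toy data; host_verifications format is an assumption
toy_listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_verifications": [
        "['email', 'phone']", "['email', 'phone']", "['phone']", "[]", None,
    ],
})

# One row per host
hosts = toy_listings.drop_duplicates("host_id")
has_email = hosts["host_verifications"].str.contains("email", na=False)
n_hosts_without_email = (~has_email).sum()
print(n_hosts_without_email)  # 3 (hosts 2, 3 and 4)
```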

**Question 14**

What's the maximum number of amenities found in any listing?
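
A sketch assuming `amenities` is stored as a JSON-style list string such as `'["Wifi", "Kitchen"]'`; if `json.loads` fails on the real values, `ast.literal_eval` is an alternative. Parsing into real lists is more robust than counting commas, since amenity names can themselves contain commas. The data is hypothetical.

```python
import json

import pandas as pd

# Hypothetical toy data; JSON list format is an assumption
toy_listings = pd.DataFrame({"amenities": [
    '["Wifi", "Kitchen"]',
    '["Wifi", "Kitchen", "Heating", "Washer"]',
    "[]",
]})

# Parse each string into a Python list, then take its length
amenity_lists = toy_listings["amenities"].dropna().apply(json.loads)
n_amenities = amenity_lists.str.len()
max_amenities = n_amenities.max()
print(max_amenities)  # 4
```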

**Question 15**

In which year did the largest number of hosts register?
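
A sketch assuming a `host_since` date column: deduplicate hosts first (otherwise a host with many listings is counted many times), extract the year, and find the most frequent one. The dates are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; host_since column name is an assumption
toy_listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_since": ["2015-03-01", "2015-03-01", "2015-07-20", "2019-01-05", "2015-11-30"],
})

hosts = toy_listings.drop_duplicates("host_id")
registration_year = pd.to_datetime(hosts["host_since"]).dt.year
hosts_per_year = registration_year.value_counts()
record_year = hosts_per_year.idxmax()
record_count = hosts_per_year.max()
print(record_year, record_count)  # 2015 3
```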

# PART 2: OPEN-ENDED ANALYSIS

**Question 16**

Examine the "license" field a bit more closely. There are different structures present, apart from the one described above. Try to identify them and count how many listings have each type of license.
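
One exploratory technique (not the only one) is to reduce every license to a "shape" signature: each run of letters becomes `A` and each run of digits becomes `9`, keeping punctuation and spaces. Counting signatures then surfaces the distinct structures, which can be mapped to named categories afterwards. The license strings below are made up for illustration.

```python
import pandas as pd

# Hypothetical, made-up license strings
toy_listings = pd.DataFrame({"license": [
    "ESFCTU00001234567890", "ESFCVT00009876543210",
    "Exempt", "En proceso", "VT-12345", None,
]})

# Collapse letter runs to "A" and digit runs to "9" to expose each structure
license_shape = (
    toy_listings["license"]
    .str.replace(r"[A-Za-z]+", "A", regex=True)
    .str.replace(r"\d+", "9", regex=True)
)
print(license_shape.value_counts(dropna=False))
```

Accented letters are not matched by `[A-Za-z]`, so the pattern may need widening for the real data.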

**Question 17**

The "host_location" information is somewhat structured: sometimes it contains (city, country), sometimes it doesn't. Explore how many different countries are present, paying attention to typos and special values (such as state names), and identify the most prevalent countries other than Spain.
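
A sketch that takes the last comma-separated part of `host_location` as a candidate country and then applies a hand-built correction table. `country_fixes` is hypothetical: on the real data it would be built by inspecting `value_counts()` for typos, translations, and state names. On the real data you may also want `drop_duplicates("host_id")` first, so that prevalence is measured per host rather than per listing.

```python
import pandas as pd

# Hypothetical toy data
toy_listings = pd.DataFrame({
    "host_id": [1, 2, 3, 4, 5, 6],
    "host_location": [
        "Madrid, Spain", "Paris, France", "Spain",
        "New York, NY", "Lyon, Francia", None,
    ],
})

# Hypothetical correction table: typos, translations, state codes -> country
country_fixes = {"Francia": "France", "NY": "United States"}

country = (
    toy_listings["host_location"]
    .str.split(",")
    .str[-1]          # last part is the country candidate
    .str.strip()
    .replace(country_fixes)
)
n_countries = country.nunique()
countries_other_than_spain = country[country != "Spain"].value_counts()
print(n_countries)
print(countries_other_than_spain)
```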

**Question 18**

A few listings seem to be extremely expensive. Devise a method of extracting price outliers, and inspect those listings. Which ones have been most reviewed? Do they have more amenities than average? Highlight anything else that's interesting about them.
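
One common choice is Tukey's IQR rule: flag prices above Q3 + 1.5 × IQR. A sketch on hypothetical data, assuming `price` has already been converted to float (see Question 7) and `amenities` is a JSON-style list string. Alternatives such as a fixed percentile cut-off or the IQR rule on log prices are worth comparing, since price distributions are usually heavily skewed.

```python
import json

import pandas as pd

# Hypothetical toy data; price assumed already numeric
toy_listings = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "price": [60.0, 75.0, 80.0, 95.0, 110.0, 9000.0],
    "number_of_reviews": [10, 40, 5, 22, 8, 3],
    "amenities": [
        '["Wifi"]', '["Wifi", "Kitchen"]', '["Wifi"]',
        '["Wifi", "Heating"]', '["Wifi"]', '["Wifi", "Kitchen", "Pool", "Gym"]',
    ],
})
n_amenities = toy_listings["amenities"].apply(json.loads).str.len()

# Tukey's rule: anything above Q3 + 1.5 * IQR is flagged as a price outlier
q1, q3 = toy_listings["price"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
is_outlier = toy_listings["price"] > upper_fence

# Most-reviewed outliers first
outliers = toy_listings[is_outlier].sort_values("number_of_reviews", ascending=False)
print(outliers[["id", "price", "number_of_reviews"]])

# Compare amenity counts of outliers against the overall average
print(n_amenities[is_outlier].mean(), n_amenities.mean())
```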