# Section B: Practical questions with applied multiple choice

## General Rules:
- This is an open book examination.
- Students may make use of a calculator.
- This is an online examination where you will access a computer; however you may not communicate with other students in any form.
- Headphones are prohibited.
- The use of AI (ChatGPT etc.) is prohibited.
- All cell phones are to be switched off for the duration of the exam.
- The invigilator will not assist you with the explanation of questions.
- Students are prohibited from conversing in any manner with other students.

## My Name and Surname

Name = 
<br>
Surname =  

### Part 1: SQL Queries  
You are provided with a pre-populated SQLite database named `airbnb.db`. Download it [here](https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata) if you haven't already. Your task is to explore this database and write a series of SQL queries to perform the tasks detailed below. Queries should be optimised to run in 20 seconds or less.

The tables and columns included in the `airbnb.db` are:

- `listings`: `listing_id`, `host_id`, `listing_name`, `neighbourhood`, `room_type`, `price`, `minimum_nights`, `number_of_reviews`, `last_review`, `reviews_per_month`, `calculated_host_listings_count`, `availability_365`  
- `hosts`: `host_id`, `host_name`, `host_since`, `host_location`, `host_response_time`, `host_response_rate`, `host_is_superhost`  
- `reviews`: `review_id`, `listing_id`, `reviewer_id`, `review_date`, `comments`  
- `reviewers`: `reviewer_id`, `reviewer_name`  
- `calendar`: `listing_id`, `date`, `available`, `price`  
- `neighbourhoods`: `neighbourhood`, `borough`  
- `amenities`: `listing_id`, `amenity_name`

In [1]:
import os
import json
import random
import sqlite3
import sqlparse
import pandas as pd
import numpy as np

import seaborn as sns
import mysql.connector

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

import matplotlib.pyplot as plt

In [2]:
# Step 1: Connect without specifying the database
conn = mysql.connector.connect(
    host="127.0.0.1",
    user="root",
    password="Bonakele.1"
)
cursor = conn.cursor()

# Step 2: Create the database if it doesn't exist
cursor.execute("CREATE DATABASE IF NOT EXISTS airbnb_nyc")
cursor.execute("USE airbnb_nyc")


InterfaceError: 2003: Can't connect to MySQL server on '127.0.0.1:3306' (10061 No connection could be made because the target machine actively refused it)

In [8]:
# Neighbourhood Groups
cursor.execute("""
CREATE TABLE IF NOT EXISTS neighbourhood_groups (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
)
""")

# Neighbourhoods
cursor.execute("""
CREATE TABLE IF NOT EXISTS neighbourhoods (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    neighbourhood_group_id INT NOT NULL,
    UNIQUE (name, neighbourhood_group_id),
    FOREIGN KEY (neighbourhood_group_id) REFERENCES neighbourhood_groups(id)
)
""")

# Room Types
cursor.execute("""
CREATE TABLE IF NOT EXISTS room_types (
    id INT AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(50) NOT NULL UNIQUE
)
""")

# Cancellation Policies
cursor.execute("""
CREATE TABLE IF NOT EXISTS cancellation_policies (
    id INT AUTO_INCREMENT PRIMARY KEY,
    policy VARCHAR(50) NOT NULL UNIQUE
)
""")

# Hosts
cursor.execute("""
CREATE TABLE IF NOT EXISTS hosts (
    id BIGINT PRIMARY KEY,
    name VARCHAR(100),
    identity_verified VARCHAR(20)
)
""")

# Listings
cursor.execute("""
CREATE TABLE IF NOT EXISTS listings (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    host_id BIGINT NOT NULL,
    neighbourhood_id INT,
    latitude DECIMAL(10, 8),
    longitude DECIMAL(11, 8),
    country VARCHAR(50),
    country_code VARCHAR(10),
    room_type_id INT,
    construction_year INT,
    price DECIMAL(10, 2),
    service_fee DECIMAL(10, 2),
    minimum_nights INT,
    number_of_reviews INT,
    last_review DATE,
    reviews_per_month DECIMAL(5, 2),
    review_rate_number INT,
    calculated_host_listings_count INT,
    availability_365 INT,
    instant_bookable BOOLEAN,
    cancellation_policy_id INT,
    house_rules TEXT,
    license TEXT,
    FOREIGN KEY (host_id) REFERENCES hosts(id),
    FOREIGN KEY (neighbourhood_id) REFERENCES neighbourhoods(id),
    FOREIGN KEY (room_type_id) REFERENCES room_types(id),
    FOREIGN KEY (cancellation_policy_id) REFERENCES cancellation_policies(id)
)
""")

cursor.execute("INSERT IGNORE INTO neighbourhood_groups (name) VALUES ('Brooklyn'), ('Manhattan'), ('Queens'), ('Bronx'), ('Staten Island')")
cursor.execute("INSERT IGNORE INTO room_types (type) VALUES ('Private room'), ('Entire home/apt'), ('Shared room')")
cursor.execute("INSERT IGNORE INTO cancellation_policies (policy) VALUES ('strict'), ('moderate'), ('flexible')")
conn.commit()

df = pd.read_csv("Airbnb_Open_Data.csv")

# Standardize column names
df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]




In [9]:
# Insert unique neighbourhoods
neighbourhoods = df[['neighbourhood', 'neighbourhood_group']].drop_duplicates()

for _, row in neighbourhoods.iterrows():
    # Get neighbourhood_group_id
    cursor.execute("SELECT id FROM neighbourhood_groups WHERE name = %s", (row['neighbourhood_group'],))
    group_id = cursor.fetchone()
    if group_id:
        cursor.execute("""
            INSERT IGNORE INTO neighbourhoods (name, neighbourhood_group_id)
            VALUES (%s, %s)
        """, (row['neighbourhood'], group_id[0]))

# Insert unique hosts  

hosts = df[['host_id', 'host_name', 'host_identity_verified']].drop_duplicates()

for _, row in hosts.iterrows():
    cursor.execute("""
        INSERT IGNORE INTO hosts (id, name, identity_verified)
        VALUES (%s, %s, %s)
    """, (row['host_id'], row['host_name'], row['host_identity_verified']))



# Insert unique listings
for _, row in df.iterrows():
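    # Get FK IDs (match on name and group, since names are only unique per group)
    cursor.execute("""
        SELECT n.id
        FROM neighbourhoods AS n
        JOIN neighbourhood_groups AS g ON g.id = n.neighbourhood_group_id
        WHERE n.name = %s AND g.name = %s
    """, (row['neighbourhood'], row['neighbourhood_group']))
    neighbourhood_id = cursor.fetchone()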
    
    cursor.execute("SELECT id FROM room_types WHERE type = %s", (row['room_type'],))
    room_type_id = cursor.fetchone()
    
    cursor.execute("SELECT id FROM cancellation_policies WHERE policy = %s", (row['cancellation_policy'],))
    cancellation_policy_id = cursor.fetchone()

    # Insert listing
    cursor.execute("""
        INSERT IGNORE INTO listings (
            id, name, host_id, neighbourhood_id, latitude, longitude, country,
            country_code, room_type_id, construction_year, price, service_fee,
            minimum_nights, number_of_reviews, last_review, reviews_per_month,
            review_rate_number, calculated_host_listings_count, availability_365,
            instant_bookable, cancellation_policy_id, house_rules, license
        )
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    """, (
        row['id'], row['name'], row['host_id'], neighbourhood_id[0] if neighbourhood_id else None,
        row['lat'], row['long'], row['country'], row['country_code'], room_type_id[0] if room_type_id else None,
        row['construction_year'], row['price'], row['service_fee'],
        row['minimum_nights'], row['number_of_reviews'], row['last_review'],
        row['reviews_per_month'], row['review_rate_number'],
        row['calculated_host_listings_count'], row['availability_365'],
        row['instant_bookable'], cancellation_policy_id[0] if cancellation_policy_id else None,
        row['house_rules'], row['license']
    ))

    conn.commit()
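
The loop above issues several statements and a commit for every row, which can be slow on a large CSV, and pandas represents missing values as `NaN`, which a database driver may not translate to `NULL`. A minimal sketch of one alternative: clean the values first, then send the batch with `executemany` and commit once. The helpers `clean_price` and `none_if_nan`, the sample rows, and the assumption that prices arrive as strings such as `"$966 "` are illustrative and not taken from the notebook; `sqlite3` is used as a stand-in so the example runs without a MySQL server.

```python
import math
import sqlite3


def clean_price(value):
    """Turn '$1,200 ' into 1200.0; return None for missing or unparseable values."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    text = str(value).replace("$", "").replace(",", "").strip()
    try:
        return float(text)
    except ValueError:
        return None


def none_if_nan(value):
    """Map float NaN (pandas' missing-value marker) to None so NULL is written."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# Hypothetical raw rows standing in for df.iterrows(): (id, name, price).
raw_rows = [
    (1001, "Clean & quiet apt", "$966 "),
    (1002, float("nan"), "$1,200 "),
    (1003, "Skylit Midtown Castle", float("nan")),
]
params = [(none_if_nan(i), none_if_nan(n), clean_price(p)) for i, n, p in raw_rows]

# sqlite3 stands in for MySQL here; with mysql.connector the placeholders
# would be %s instead of ?, and INSERT IGNORE instead of INSERT OR IGNORE.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
demo.executemany(
    "INSERT OR IGNORE INTO listings (id, name, price) VALUES (?, ?, ?)", params
)
demo.commit()  # one commit for the whole batch instead of one per row
inserted = demo.execute("SELECT id, name, price FROM listings ORDER BY id").fetchall()
print(inserted)
demo.close()
```

The same pattern should carry over to the MySQL cursor used above, since `mysql.connector` cursors also provide `executemany`; whether it is faster in practice depends on the data size and server settings.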

In [None]:
# Run the following in the Anaconda Prompt (a shell command, not Python):
# mysqldump -u root -p airbnb_nyc > airbnb_nyc.sql

In [14]:
# Connect to MySQL
mysql_conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="Bonakele.1",
    database="airbnb_nyc"
)
mysql_cursor = mysql_conn.cursor()

# Connect to SQLite (creates airbnb_nyc.db if it doesn't exist)
sqlite_conn = sqlite3.connect("airbnb_nyc.db")

# List of table names to transfer
tables = [
    "neighbourhood_groups",
    "neighbourhoods",
    "room_types",
    "cancellation_policies",
    "hosts",
    "listings"
]

# Transfer each table from MySQL to SQLite
for table in tables:
    df = pd.read_sql(f"SELECT * FROM {table}", mysql_conn)
    df.to_sql(table, sqlite_conn, if_exists="replace", index=False)
    print(f"Transferred table: {table}")

# Close connections
mysql_cursor.close()
mysql_conn.close()
sqlite_conn.close()



Transferred table: neighbourhood_groups
Transferred table: neighbourhoods
Transferred table: room_types
Transferred table: cancellation_policies
Transferred table: hosts
Transferred table: listings
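
Because `DataFrame.to_sql` builds each SQLite table from the DataFrame's columns, the primary keys, foreign keys and indexes declared in MySQL are not carried over. For the 20-second limit mentioned earlier, it may help to index the columns used in joins. A minimal sketch, assuming the `listings` columns defined above; the index names and the `create_indexes` helper are hypothetical, and an in-memory database stands in for `airbnb_nyc.db`.

```python
import sqlite3

# Hypothetical index names mapped to (table, column); adjust to the joins
# and filters that the actual queries use.
INDEXES = {
    "idx_listings_neighbourhood_id": ("listings", "neighbourhood_id"),
    "idx_listings_host_id": ("listings", "host_id"),
}


def create_indexes(conn):
    """Create each index if it is missing and return the index names now present."""
    for index_name, (table, column) in INDEXES.items():
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} ({column})"
        )
    conn.commit()
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in with only the columns needed for the demonstration.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE listings "
    "(id INTEGER, host_id INTEGER, neighbourhood_id INTEGER, price REAL)"
)
index_names = create_indexes(demo)
print(index_names)
demo.close()
```

Against the real file, the call would be `create_indexes(sqlite3.connect("airbnb_nyc.db"))`, run once after the transfer.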


In [15]:
# Load your database and create a database connection.
# You can connect to the sql database in any way you wish. 
# Use this method if you are unsure how to proceed. 
# Ensure the airbnb_nyc.db file is in the same directory as this notebook.
try:
    with sqlite3.connect("airbnb_nyc.db") as conn:
        print(f"Opened SQLite database with version {sqlite3.sqlite_version} successfully.")

except sqlite3.OperationalError as e:
    print("Failed to open database:", e)

# List all tables in the database.
# Note: sqlite3's `with` block commits or rolls back but does not close the
# connection, so `conn` is still usable here.
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table';", conn)



Opened SQLite database with version 3.49.1 successfully.


| | name |
|---|---|
| 0 | neighbourhood_groups |
| 1 | neighbourhoods |
| 2 | room_types |
| 3 | cancellation_policies |
| 4 | hosts |
| 5 | listings |


In [16]:
conn = sqlite3.connect('airbnb_nyc.db')
cursor = conn.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")    # Same query as the pd.read_sql call above, run through a cursor
print(cursor.fetchall())  # Lists all tables in the database
conn.close()

[('neighbourhood_groups',), ('neighbourhoods',), ('room_types',), ('cancellation_policies',), ('hosts',), ('listings',)]
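
Listing the tables shows what exists, but writing joins also requires the column names. A small sketch of one way to inspect them with `PRAGMA table_info`; `describe_tables` is a hypothetical helper, and the in-memory table below only stands in for `airbnb_nyc.db`.

```python
import sqlite3


def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        # per column. Table names cannot be bound as parameters, so the name
        # is quoted by hand; it comes from sqlite_master, not from user input.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# In-memory stand-in; against the real file use sqlite3.connect("airbnb_nyc.db").
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE room_types (id INTEGER PRIMARY KEY, type TEXT)")
schema = describe_tables(demo)
print(schema)
demo.close()
```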


#### SQL Query 1: Top 10 Neighbourhoods by Average Listing Price

This query identifies the most expensive areas to stay in New York City. It calculates the average listing price per neighbourhood, excluding free or zero-priced listings, and returns the 10 neighbourhoods with the highest average prices.
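
A minimal sketch of how such a query could look, assuming the normalised `listings` and `neighbourhoods` tables built earlier in this notebook (`listings.neighbourhood_id` referencing `neighbourhoods.id`) and a numeric `price` column. The helper name `top_neighbourhoods_by_avg_price` and the in-memory sample rows are hypothetical and exist only so the example runs on its own; it is one possible answer, not a model solution.

```python
import sqlite3

QUERY_1 = """
SELECT n.name                 AS neighbourhood,
       ROUND(AVG(l.price), 2) AS avg_price,
       COUNT(*)               AS listing_count
FROM listings AS l
JOIN neighbourhoods AS n ON n.id = l.neighbourhood_id
WHERE l.price > 0             -- drops zero-priced rows and NULL prices
GROUP BY n.name
ORDER BY avg_price DESC
LIMIT 10;
"""


def top_neighbourhoods_by_avg_price(conn):
    """Return up to ten (neighbourhood, avg_price, listing_count) rows."""
    return conn.execute(QUERY_1).fetchall()


# Tiny in-memory stand-in for airbnb_nyc.db; the sample rows are made up.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE neighbourhoods (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE listings (id INTEGER PRIMARY KEY, neighbourhood_id INTEGER, price REAL);
INSERT INTO neighbourhoods VALUES (1, 'Harlem'), (2, 'Astoria');
INSERT INTO listings VALUES (1, 1, 100), (2, 1, 300), (3, 2, 150), (4, 2, 0);
""")
query_1_result = top_neighbourhoods_by_avg_price(demo)
print(query_1_result)
demo.close()
```

In the notebook itself, the same `QUERY_1` string could be passed to `pd.read_sql(QUERY_1, conn)` with a connection to `airbnb_nyc.db` to get the result as a DataFrame.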


