# SQL EXAM Model Solution

## SECTION A: Theory multiple choice

#### Question 1/10

Brooklyn’s housing authority has launched a crackdown on illegal short-term 
rentals. CityBnB’s internal data reveals clusters of listings in Williamsburg and 
Bushwick suspected to be operated by commercial entities posing as individual 
hosts. Your task is to identify hosts with multiple listings to help investigators 
prioritize enforcement. 

##### Which SQL clause would you use? 

>GROUP BY host_id HAVING COUNT(*) > 1 

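A minimal sketch of how this clause behaves, using Python's built-in `sqlite3` module. The `listings` table and its host IDs are invented for illustration and are not CityBnB data.

```python
import sqlite3

# Hypothetical toy data: (listing id, host_id). Not taken from the exam database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, host_id INTEGER)")
conn.executemany(
    "INSERT INTO listings (id, host_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 30)],
)

# GROUP BY builds one group per host; HAVING filters the groups after counting.
multi_listing_hosts = conn.execute(
    """
    SELECT host_id, COUNT(*) AS listing_count
    FROM listings
    GROUP BY host_id
    HAVING COUNT(*) > 1
    ORDER BY listing_count DESC
    """
).fetchall()

print(multi_listing_hosts)  # [(30, 3), (10, 2)]
conn.close()
```

Host 20 has a single listing, so its group is dropped by `HAVING`; a `WHERE` clause could not do this because it runs before the groups are counted.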
#### Question 2/10 

CityBnB’s leadership is preparing a presentation for New York City Council to 
demonstrate compliance with short-term rental laws. They need a month-by-month 
breakdown of 2024 bookings to prove seasonal demand aligns with housing 
availability regulations. The bookings table uses YYYY-MM-DD formatting. 

##### Which query generates the required report? 

> SELECT MONTH(booking_date), COUNT(*) 
> FROM bookings 
> WHERE YEAR(booking_date) = 2024 
> GROUP BY MONTH(booking_date); 

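The answer above uses `MONTH()` and `YEAR()`, which exist in MySQL and SQL Server but are not built-in SQLite functions. Since Section B runs on SQLite, the sketch below shows one way to build the same report there with `strftime`, driven from Python's `sqlite3` module. The `bookings` rows are invented for illustration.

```python
import sqlite3

# Hypothetical bookings with dates stored as YYYY-MM-DD text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, booking_date TEXT)"
)
conn.executemany(
    "INSERT INTO bookings (booking_date) VALUES (?)",
    [("2024-01-05",), ("2024-01-20",), ("2024-02-14",), ("2023-12-31",)],
)

# strftime('%m', ...) extracts the month and strftime('%Y', ...) the year,
# both as text, so the year is compared with the string '2024'.
monthly_bookings = conn.execute(
    """
    SELECT strftime('%m', booking_date) AS month, COUNT(*) AS bookings
    FROM bookings
    WHERE strftime('%Y', booking_date) = '2024'
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(monthly_bookings)  # [('01', 2), ('02', 1)]
conn.close()
```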
#### Question 3/10

Chinatown’s community board reports a surge in vacant "ghost listings" that sit 
empty year-round, exacerbating housing shortages. CityBnB must calculate the 
percentage of listings marked "available" in the calendar table but with zero 
bookings to address public concerns. 

##### Which method calculates this accurately?

> (COUNT(*) * 100.0) / (SELECT COUNT(*) FROM calendar) 

#### Question 4/10 

CityBnB is partnering with a real estate analytics firm to study housing trends. The 
firm requests a non-duplicated list of property types (e.g., "entire apartment," 
"private room") to analyze market saturation in high-demand areas like Manhattan 
and Queens.

##### Which SQL clause ensures unique property types?

>DISTINCT

#### Question 5/10 

A whistleblower in Sandton Heights—a luxury rental hotspot—claims that 30% of 
high-end listings are fake, using stock photos and fabricated reviews. CityBnB’s 
fraud team needs to identify the top 5 hosts by listing count for further 
investigation. 

##### What does this query detect? 

> Hosts with multiple listings, possible commercial operators

#### Question 6/10 

Users complain that CityBnB’s search filters fail to show affordable options in 
Harlem. The product team adds a "Budget-Friendly" filter for listings under 
$300/night but needs to validate the query before deployment. 

##### Which SQL query works?

>SELECT * FROM listings WHERE price < 300;

#### Question 7/10 

Following a data breach, CityBnB’s security team mandates an audit of hosts with 
unverified identities. The host_identity_verified column uses 'True'/'False' strings. 
Failure to comply could result in fines under New York’s short-term rental laws. 

##### Which SQL expression counts unverified hosts? 

>COUNT(CASE WHEN host_identity_verified = 'False' THEN 1 END)

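The expression works because a `CASE` without an `ELSE` branch yields `NULL` for non-matching rows, and `COUNT` skips `NULL`. A small sketch with an invented `hosts` table (the four rows below are assumptions for illustration):

```python
import sqlite3

# Hypothetical hosts; the last one has no verification value recorded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (host_id INTEGER, host_identity_verified TEXT)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [(1, "True"), (2, "False"), (3, "False"), (4, None)],
)

# COUNT(*) counts every row; COUNT(CASE ...) counts only rows where the
# CASE expression is non-NULL, i.e. where the flag equals 'False'.
total_hosts, unverified_hosts = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(CASE WHEN host_identity_verified = 'False' THEN 1 END)
    FROM hosts
    """
).fetchone()

print(total_hosts, unverified_hosts)  # 4 2
conn.close()
```

The host with a missing value is counted in the total but not as unverified, which may or may not be what an audit wants.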
#### Question 8/10 

A viral TikTok video exposes a "party house" in Astoria with 25+ noise complaints. 
CityBnB’s legal team needs to flag listings with >10 "noise" or "party" reviews AND >90% occupancy 
(booked ≥329 days/year) to avoid liability. 

##### Which query meets both conditions?

> WITH problematic_listings AS (
>     SELECT listing_id, COUNT(*) AS noise_reports
>     FROM reviews
>     WHERE comment ILIKE '%noise%' OR comment ILIKE '%party%'
>     GROUP BY listing_id
>     HAVING COUNT(*) > 10
> )
> SELECT l.id, l.name, p.noise_reports
> FROM listings l
> JOIN problematic_listings p ON l.id = p.listing_id
> WHERE l.availability_365 <= 36;  -- at most 36 available days means at least 365 - 36 = 329 days booked

#### Question 9/10 

Scenario: A junior analyst’s query crashes CityBnB’s dashboard during a live demo 
to investors. The intended goal was to display high-rated properties (>4 stars) in 
Tribeca for a premium marketing campaign. 

##### What’s wrong with this syntax? 
- SELECT * FROM properties WHERE rating > 4 ORDER;

>The keyword ORDER should be ORDER BY

#### Question 10/10 

Scenario: As CityBnB scales to 50+ cities, redundant "host_status" entries (e.g., 
10,000 rows with 'superhost') slow down updates. The engineering team proposes 
normalizing this into a separate host_status table. 

##### What’s the primary advantage?

>To avoid redundant data and simplify updates 

## SECTION B: Practical questions with applied multiple choice
You are provided with a pre-populated SQLite database named `airbnb_nyc.db`. Download it [here](https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata) if you haven't already. Your task is to explore this database and write a series of SQL queries to perform the tasks detailed below. Queries should be optimised to run in 20 seconds or less.

The tables and columns included in `airbnb_nyc.db` are:

- `listings`: `id`, `host_id`, `name`, `neighbourhood_id`, `latitude`, `longitude`, `room_type_id`, `construction_year`, `number_of_reviews`, `last_review`, `reviews_per_month`, `review_rate_number`, `calculated_host_listings_count`, `availability_365`, `instant_bookable`, `cancellation_policy_id`, `house_rules`, `license`
- `hosts`: `id`, `name`, `identity_verified`
- `neighbourhoods`: `id`, `name`, `neighbourhood_group_id`
- `neighbourhood_groups`: `id`, `name`
- `room_types`: `id`, `type`
- `cancellation_policies`: `id`, `policy`

<div style="text-align: center;">
    <img src="ERD_airBnB.png" alt="Airbnb ERD" width="800"/>
</div>

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///airbnb_nyc.db

#### Question 1
You are analyzing the Airbnb listings dataset for New York City. The dataset contains a table named `neighbourhoods` which lists each unique neighborhood by its ID.<br> 
How many <strong>unique</strong> neighborhoods are there in this dataset?

#### Options

- 150<br>
- 200<br>
- 218<br>
- 250<br>



In [33]:
%%sql

SELECT COUNT(neighbourhood_id) 
FROM neighbourhoods;

 * sqlite:///airbnb_nyc.db
Done.


COUNT(neighbourhood_id)
218
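
`COUNT(neighbourhood_id)` gives the number of unique neighbourhoods here only because each ID appears once in `neighbourhoods`. On a column where values repeat or are missing, `COUNT(DISTINCT ...)` is the safer choice. A minimal sketch of the difference on an invented table (the rows are assumptions, not exam data):

```python
import sqlite3

# Hypothetical listings where neighbourhood IDs repeat and one is missing.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, neighbourhood_id INTEGER)"
)
conn.executemany(
    "INSERT INTO listings (neighbourhood_id) VALUES (?)",
    [(5,), (5,), (10,), (None,), (11,)],
)

# COUNT(*) counts rows, COUNT(col) counts non-NULL values,
# COUNT(DISTINCT col) counts unique non-NULL values.
all_rows, non_null_ids, unique_ids = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(neighbourhood_id),
           COUNT(DISTINCT neighbourhood_id)
    FROM listings
    """
).fetchone()

print(all_rows, non_null_ids, unique_ids)  # 5 4 3
conn.close()
```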


#### Question 2 

Airbnb listings are being reviewed to identify <strong>properties</strong> that are listed but have <strong>zero availability</strong> throughout the entire year. These are often called “ghost listings” because they appear on the platform but cannot actually be booked.<br>
Write a SQL query to detect listings that are present on Airbnb but not active.

Based on the dataset, which of the following statements is TRUE?

#### Options

- All listings have some availability throughout the year, so there are no ghost listings.<br>
- There are several listings with zero availability, including “Chill in Alphabet City” and “Modern loft in great neighborhood.”<br>
- Listings with zero availability only exist in Brooklyn.<br>
- Listings with zero availability are always luxury apartments.<br>


In [5]:
%%sql 

SELECT listing
FROM listings
WHERE availability_365 = 0;

 * sqlite:///airbnb_nyc.db
Done.


listing
Chill in Alphabet City
Modern loft in great neighborhood
Upper West Side apartment with balcony
Your home away from home...
Sunny and spacious apartment in Brooklyn
Spacious Room with Character in BK
"Modern 3 BR home, 4 blocks from Public Transport!"
"Fun, Comfy, and Convenient Studio in Midtown West"
beautiful one bedroom apartment
Chic 1 br with huge private garden


#### Question 3 
Airbnb host data is being analyzed to understand how identity verification impacts guest review ratings. Hosts are categorized as either “<strong>verified</strong>” or “<strong>unconfirmed</strong>” (not verified).
Retrieve the <strong>average</strong> review scores for both verified and unverified hosts from the listings data.

Based on the data, what can you conclude about the average review scores for each identity verification type?

#### Options

- Verified hosts have a slightly higher average review score than unconfirmed hosts.<br>
- Unconfirmed hosts have a much higher average review score than verified hosts.<br>
- Both verified and unconfirmed hosts have the exact same average review score.<br>
- Verified hosts have a significantly lower average review score than unconfirmed hosts.<br>



In [6]:
%%sql

SELECT h.identity_verified, ROUND(AVG(l.review_rate_number),0) AS average_review_score
FROM listings l
JOIN hosts h ON l.host_id = h.host_id
GROUP BY h.identity_verified;

 * sqlite:///airbnb_nyc.db
Done.


identity_verified,average_review_score
unconfirmed,3.0
verified,3.0
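
Rounding to zero decimal places can make two different averages look identical, which matters when the options distinguish "slightly higher" from "exact same". The sketch below illustrates the effect with invented ratings; the numbers are assumptions and say nothing about the real averages in the exam database.

```python
import sqlite3

# Hypothetical hosts and ratings, chosen so the two averages differ slightly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (host_id INTEGER, identity_verified TEXT)")
conn.execute("CREATE TABLE listings (host_id INTEGER, review_rate_number INTEGER)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)", [(1, "verified"), (2, "unconfirmed")]
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(1, 3), (1, 4), (1, 3), (1, 3),           # verified: average 3.25
     (2, 3), (2, 3), (2, 3), (2, 3), (2, 4)],  # unconfirmed: average 3.2
)

# Same aggregate, rounded to 0 and to 2 decimal places.
averages = conn.execute(
    """
    SELECT h.identity_verified,
           ROUND(AVG(l.review_rate_number), 0) AS rounded_0,
           ROUND(AVG(l.review_rate_number), 2) AS rounded_2
    FROM listings l
    JOIN hosts h ON l.host_id = h.host_id
    GROUP BY h.identity_verified
    ORDER BY h.identity_verified
    """
).fetchall()

for row in averages:
    print(row)
conn.close()
```

With these toy values both groups show 3.0 in the first column, while the two-decimal column separates them (3.2 versus 3.25).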


#### Question 4 

An analysis is conducted to determine whether the most <strong>expensive</strong> Airbnb listings in New York City are also the <strong>most available</strong> to guests throughout the year.<br>
Your task is to write a SQL query that returns the columns <strong>listing</strong>, <strong>neighbourhood</strong> and <strong>price</strong>.

True or False? The top 10 most expensive listings are also the most available (they have high availability_365 values).

#### Options

- True — All top 10 listings have high availability throughout the year.
- False — Many of the top 10 most expensive listings have low or zero availability.


In [27]:
%%sql

SELECT listing_id, listing, price, availability_365
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
ORDER BY price DESC, availability_365 DESC
LIMIT 10;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,price,availability_365
12022483,Sun filled 1 bedroom in the heart of Crown Heights,1199.0,13
1749150,In_Manhattan+1 Small Block to train,1198.0,275
45242223,Spacious Private Bedroom in Hip Bushwick,1198.0,0
20794111,Amazing Soho Apartment,1195.0,7
45741502,Sunny Spacious Room + Private Bathroom in Bushwick,1195.0,0
19294064,Beautiful 1 Bdrm w/ Large Private Patio,1193.0,0
53039594,Cozy room for easy traveler,1190.0,71
21952837,Private room in two floor apartment w/ back yard,1190.0,66
24658554,Room in Washingtom Heights (Manhattan),1190.0,26
17627223,Big beautiful bedroom in huge Bushwick apartment,1190.0,8


#### Question 5
You are helping a traveler find the <strong>most expensive</strong> Airbnb listing available in New York City. After examining the data, you find one listing with the highest price.<br>
Create a table with the columns: <strong>listing_id, listing, and price</strong>.

Which of the following details correctly describes the most expensive Airbnb listing?

#### Options

- Listing: Sun filled 1 bedroom in the heart of Crown Heights — Neighborhood: Crown Heights — Price: $1199.00<br>
- Listing: Modern loft in great neighborhood — Neighborhood: Williamsburg — Price: $1500.00<br>
- Listing: Cozy Studio in Midtown — Neighborhood: Midtown — Price: $900.00<br>
- Listing: Large & Adorable 2 Bedroom in the Heart of Bklyn — Neighborhood: Brooklyn — Price: $850.00<br>


In [7]:
%%sql

SELECT listing_id, listing, MAX(price) AS price
FROM listings l
JOIN neighbourhoods n 
ON l.neighbourhood_id = n.neighbourhood_id;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,price
12022483,Sun filled 1 bedroom in the heart of Crown Heights,1199.0
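
The query above has no `GROUP BY`, yet it selects `listing_id` and `listing` next to `MAX(price)`. SQLite documents that, in an aggregate query using `MAX()` or `MIN()`, such bare columns take their values from the row that holds the maximum or minimum; some other databases (PostgreSQL, for example) reject the query instead. A hedged sketch with invented rows, comparing it with the more portable `ORDER BY ... LIMIT 1` form:

```python
import sqlite3

# Hypothetical listings; names and prices are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (listing_id INTEGER, listing TEXT, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(1, "Budget room", 100.0), (2, "Penthouse", 300.0), (3, "Loft", 200.0)],
)

# SQLite-specific: bare columns follow the row containing MAX(price).
bare_column_row = conn.execute(
    "SELECT listing_id, listing, MAX(price) AS price FROM listings"
).fetchone()

# Portable alternative: sort by price and keep the first row.
portable_row = conn.execute(
    "SELECT listing_id, listing, price FROM listings ORDER BY price DESC LIMIT 1"
).fetchone()

print(bare_column_row)  # (2, 'Penthouse', 300.0)
print(portable_row)     # (2, 'Penthouse', 300.0)
conn.close()
```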


In [8]:
%%sql

SELECT listing_id, listing, price
FROM listings l
JOIN neighbourhoods n 
ON l.neighbourhood_id = n.neighbourhood_id
ORDER BY price DESC
LIMIT 1;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,price
12022483,Sun filled 1 bedroom in the heart of Crown Heights,1199.0


In [32]:
%%sql

SELECT listing_id, listing, price
FROM (SELECT listing_id, listing, price, availability_365
      FROM listings l
      JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
      ORDER BY price DESC, availability_365 DESC
      LIMIT 10)
LIMIT 1;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,price
12022483,Sun filled 1 bedroom in the heart of Crown Heights,1199.0


#### Question 6
CityBnb is analyzing neighborhoods to find where the <strong>most affordable</strong> Airbnb listings are located. The 7 cheapest listings are retrieved, along with their neighborhood IDs. 
Create a table that shows <strong>listing, neighbourhood_id and price</strong>.

Which neighborhood ID(s) appear most frequently among these affordable listings?

#### Options

- Neighborhood ID 5 appears most frequently.
- Neighborhood ID 14 has the most affordable listings.
- Neighborhood ID 80 has the cheapest listings overall.
- Neighborhood ID 32 has the majority of affordable listings.



In [10]:
%%sql

SELECT listing_id, listing, n.neighbourhood_id, price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
ORDER BY price ASC
LIMIT 7;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood_id,price
46397082,Home Away from Home,5,51.0
1947426,Fantastic East Village Location!,14,52.0
44970491,Bright Minimalist Creative Retreat Bedroom BedStuy,5,52.0
43859816,Sunny Room,5,55.0
6597240,Creative Woodside Studio,80,56.0
12945928,Large Modern 3-bdrm Duplex Apt near Manhattan,32,56.0
51652769,(2R) Cozy and clean bedroom with private bathroom,143,57.0


In [11]:
%%sql

SELECT listing_id, listing, n.neighbourhood_id, price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
WHERE price < 60;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood_id,price
1947426,Fantastic East Village Location!,14,52.0
6597240,Creative Woodside Studio,80,56.0
12945928,Large Modern 3-bdrm Duplex Apt near Manhattan,32,56.0
43859816,Sunny Room,5,55.0
44970491,Bright Minimalist Creative Retreat Bedroom BedStuy,5,52.0
46397082,Home Away from Home,5,51.0
51652769,(2R) Cozy and clean bedroom with private bathroom,143,57.0


#### Question 7 
Which <strong>two</strong> hosts receive the highest number of reviews on the Airbnb platform?
Identify the hosts who consistently receive the most feedback. <br> Create a table with the following columns: <strong>host_id, host, number_of_reviews, review_rate_number, and reviews_per_month</strong>.

#### Options

- Agnes with 439 reviews and Gurpreet Singh with 424 reviews
- Agnes with 300 reviews and Gurpreet Singh with 350 reviews
- John Doe with 500 reviews and Sarah Lee with 450 reviews
- Gurpreet Singh with 439 reviews and Agnes with 424 reviews



In [12]:
%%sql

SELECT l.host_id, h.host, l.reviews_per_month AS monthly_review_perc, review_rate_number, number_of_reviews
FROM listings l
JOIN hosts h ON l.host_id = h.host_id
GROUP BY l.host_id, h.host
ORDER BY number_of_reviews DESC
LIMIT 2;

 * sqlite:///airbnb_nyc.db
Done.


host_id,host,monthly_review_perc,review_rate_number,number_of_reviews
935726341,Agnes,5.12,4,439
2044138632,Gurpreet Singh,8.86,4,424


#### Question 8 
Find the top 5 <strong>unique</strong> hosts who own the most expensive listings.<br>
Identify the operators behind the priciest listings, return a table with columns: <strong>host, listing_id, price</strong>.

#### Options

- Linda, Chris, Maria, Cathal, Tian
- Agnes, Gurpreet Singh, Linda, Chris, Maria
- John, Sarah, Agnes, Gurpreet Singh, Tian
- Linda, Agnes, Tian, Cathal, Maria



In [13]:
%%sql

SELECT l.host_id, host, listing_id, price
FROM listings l
JOIN hosts h 
ON l.host_id = h.host_id
ORDER BY price DESC
LIMIT 5;

 * sqlite:///airbnb_nyc.db
Done.


host_id,host,listing_id,price
598865899,Linda,12022483,1199.0
1799553000,Chris,1749150,1198.0
868180273,Maria,45242223,1198.0
544412318,Cathal,20794111,1195.0
1260133006,Tian,45741502,1195.0


#### Question 9
A real estate analysis is being conducted by CityBnb to determine where Airbnb activity is most concentrated in New York City. To support this, the goal is to identify the neighborhoods with the highest <strong>number</strong> of Airbnb listings.<br>
Your task is to create a table with columns: <strong>neighbourhood</strong> and <strong>total_listings</strong>, and identify the neighbourhoods with the most Airbnb listings.

Which three neighborhoods have the highest number of Airbnb listings?

#### Options

- Bedford-Stuyvesant, Williamsburg, Harlem
- Harlem, Crown Heights, Bushwick
- Williamsburg, East Village, Chelsea
- Crown Heights, Harlem, Bushwick


In [14]:
%%sql
SELECT 
    neighbourhood,
    COUNT(n.neighbourhood_id) AS total_listings
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
GROUP BY neighbourhood
ORDER BY total_listings DESC
LIMIT 3;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood,total_listings
Bedford-Stuyvesant,69
Williamsburg,64
Harlem,56


In [15]:
%%sql

SELECT neighbourhood,
       COUNT(n.neighbourhood_id) AS total_listings
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
GROUP BY neighbourhood
HAVING total_listings > 50
ORDER BY total_listings DESC;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood,total_listings
Bedford-Stuyvesant,69
Williamsburg,64
Harlem,56


#### Question 10
You are working with CityBnb data to identify hosts who might not be fully verified and who are not actively renting out their properties throughout the year. This could help the platform improve trust and availability.<br>
Your task is to create a table with columns: <strong>host_id, host, identity_verified, and availability_365 </strong> to identify unverified hosts who are not active throughout the year.

Which of the following statements correctly describes <strong>unverified</strong> hosts who have <strong>zero</strong> availability throughout the year?

#### Options

- These hosts have identity verified status as "unconfirmed" and availability_365 equals 0, meaning their listings are not available any day of the year.
- These hosts have identity verified status as "confirmed" and availability_365 equals 365, meaning their listings are always available.
- These hosts have identity verified status as "unconfirmed" and availability_365 equals 365, meaning their listings are always available despite being unverified.
- These hosts have identity verified status as "confirmed" and availability_365 equals 0, meaning their listings are never available but are verified.


In [16]:
%%sql

SELECT 
    h.host_id, 
    host,
    identity_verified,
    availability_365
FROM listings l
JOIN hosts h 
ON l.host_id = h.host_id
WHERE h.identity_verified = 'unconfirmed' 
AND l.availability_365 = 0;


 * sqlite:///airbnb_nyc.db
Done.


host_id,host,identity_verified,availability_365
123600518,Meryl,unconfirmed,0
124039648,Kierra,unconfirmed,0
162511322,Hans,unconfirmed,0
176369804,Samet,unconfirmed,0
182476890,Lawrence,unconfirmed,0
191490859,Adam,unconfirmed,0
232268189,Juan Pablo,unconfirmed,0
234136209,Loreley,unconfirmed,0
246371399,Rishab,unconfirmed,0
271758427,Yaakov,unconfirmed,0


#### Question 11 
A marketing team is seeking to focus Airbnb promotional efforts on neighborhoods with the highest levels of guest activity. To support this initiative, guest review data is being analyzed to identify which neighborhoods receive the highest total number of guest reviews.<br>
Your task is to create a table with columns: <strong>neighbourhood, neighbourhood_group, and total_reviews</strong> to identify which areas in the city receive the most guest reviews overall. Use the number_of_reviews column to calculate the total reviews per neighborhood.

Which three neighborhoods have the highest <strong>total</strong> guest reviews according to the data?


#### Options

- Bedford-Stuyvesant, Williamsburg, Harlem
- East Harlem, East Village, Hell's Kitchen
- Crown Heights, Upper East Side, Upper West Side
- Midtown, Richmond Hill, Astoria


In [35]:
%%sql

SELECT 
    l.neighbourhood_id,
    neighbourhood,
    SUM(number_of_reviews) AS total_reviews
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
GROUP BY n.neighbourhood
ORDER BY total_reviews DESC
LIMIT 3;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood,total_reviews
5,Bedford-Stuyvesant,2239
10,Williamsburg,2067
11,Harlem,1642


In [34]:
%%sql

WITH ranked_reviews AS (
    SELECT 
        l.neighbourhood_id,
        n.neighbourhood,
        SUM(l.number_of_reviews) AS total_reviews,
        RANK() OVER (ORDER BY SUM(l.number_of_reviews) DESC) AS review_rank
    FROM listings l
    JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
    GROUP BY l.neighbourhood_id, n.neighbourhood
)
SELECT *
FROM ranked_reviews
WHERE review_rank <= 3;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood,total_reviews,review_rank
5,Bedford-Stuyvesant,2239,1
10,Williamsburg,2067,2
11,Harlem,1642,3


#### Question 12 
Which listings have availability_365=0, review_rate_number = 1 and <strong>number</strong> of reviews <= 1?<br>
Your task is to create a table with columns: <strong>listing_id, listing, availability_365, review_rate_number, and number_of_reviews</strong>.

#### Options


- Bright Minimalist Creative Retreat Bedroom BedStuy,<br> 
  Cozy Room With Big Comfortable Bed!,<br>
  Bright and modern 1 BR in Williamsburg w/balcony,<br>
  Cozy Lodge Chic in the heart of New York<br>

- Bright and modern 1 BR in Williamsburg w/balcony,<br> 
Charming light-filled apartment in Crown Heights,<br> 
Bright Minimalist Creative Retreat Bedroom BedStuy,<br>
Cozy apartment in Upper Manhattan!
  
- Spacious bedroom suite in Brooklyn brownstone,<br> 
Cozy Lodge Chic in the heart of New York,<br> 
Cozy Room With Big Comfortable Bed!,<br> 
Comfortable One bedroom Harlem Aprt<br> 

- Big, Bright 1 Bedroom b/w Columbia, City College,<br>
  Bright and modern 1 BR in Williamsburg w/balcony,<br>
  Comfortable One bedroom Harlem Aprt,<br>
  Spacious bedroom suite in Brooklyn brownstone



In [36]:
%%sql

SELECT listing_id, listing, availability_365, review_rate_number, number_of_reviews
FROM listings
WHERE review_rate_number IS NOT NULL
  AND availability_365 = 0
ORDER BY availability_365 ASC, review_rate_number ASC, number_of_reviews ASC;  


 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,availability_365,review_rate_number,number_of_reviews
11617647,Bright and modern 1 BR in Williamsburg w/balcony,0,1,1
43794645,Charming light-filled apartment in Crown Heights,0,1,1
44970491,Bright Minimalist Creative Retreat Bedroom BedStuy,0,1,1
50810512,Cozy apartment in Upper Manhattan!,0,1,1
13564504,Greenwich Village Apartment,0,1,2
44823027,Beautiful Sunlit Room in Brooklyn,0,1,2
49886514,"Spacious 1BR amazing view, Beach 5 min, airport 20",0,1,2
11841328,Bright Spacious BK Room with Bath,0,1,3
11869495,Beautiful & spacious apartment on Upper East Side,0,1,3
14325021,Private sanctuary close to Central park,0,1,3
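
The query above sorts the zero-availability listings and leaves the reader to pick out the rows with `review_rate_number = 1` and `number_of_reviews <= 1` from the top of the output. An alternative is to put all three conditions in the `WHERE` clause. The sketch below shows that variant on invented rows (listing names and values are assumptions, not exam data):

```python
import sqlite3

# Hypothetical listings covering the edge cases of the three conditions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listings (
        listing_id INTEGER, listing TEXT, availability_365 INTEGER,
        review_rate_number INTEGER, number_of_reviews INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", 0, 1, 1),      # matches all three conditions
        (2, "B", 0, 1, 2),      # too many reviews
        (3, "C", 0, 1, 0),      # matches (zero reviews)
        (4, "D", 10, 1, 1),     # still has availability
        (5, "E", 0, 2, 1),      # rating is not 1
        (6, "F", 0, 1, None),   # NULL review count never satisfies <= 1
    ],
)

matching_listings = conn.execute(
    """
    SELECT listing_id, listing
    FROM listings
    WHERE availability_365 = 0
      AND review_rate_number = 1
      AND number_of_reviews <= 1
    ORDER BY listing_id
    """
).fetchall()

print(matching_listings)  # [(1, 'A'), (3, 'C')]
conn.close()
```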


#### Question 13 
Identify hosts who own more than 100 listings. Which normalization principle is being violated?<br>
Create a table that returns the columns: <strong>host and calculated_host_listings_count</strong>.

#### Options

- Kazuya, 1st Normal Form (1NF)
- Kara, 2nd Normal Form (2NF)
- Sonder (NYC), 2nd Normal Form (2NF)
- Airbnb Official, 3rd Normal Form (3NF)


In [19]:
%%sql

SELECT h.host_id, host, calculated_host_listings_count
FROM listings l
JOIN hosts h ON l.host_id = h.host_id
WHERE calculated_host_listings_count > 100
GROUP BY h.host_id
ORDER BY calculated_host_listings_count DESC;

 * sqlite:///airbnb_nyc.db
Done.


host_id,host,calculated_host_listings_count
1936282448,Sonder (NYC),327
1844778040,Sonder (NYC),327
1370945204,Sonder (NYC),327
1198781658,Sonder (NYC),327
574716087,Sonder (NYC),327
1968430100,Kara,121
750461323,Kara,121
882085888,Kazuya,103


#### Question 14 
Which neighborhood has the highest number of guest reviews among listings that are available most of the year (availability_365 ≥ 300)?<br>
Create a table showing the neighborhoods where listings are available most of the year. The table should return: <strong>neighbourhood, neighbourhood_group, total_high_availability_listings</strong> (count of listings with <strong>availability_365</strong> ≥ 300), and total_reviews.

#### Options

- Hell's Kitchen
- Upper East Side
- Bedford-Stuyvesant
- East Village


In [20]:
%%sql

SELECT 
    l.neighbourhood_id,
    neighbourhood_group_id,
    neighbourhood,
    COUNT(l.neighbourhood_id) AS total_high_availability_listings,
    SUM(number_of_reviews) AS total_reviews
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
WHERE availability_365 >= 300 
GROUP BY n.neighbourhood
HAVING COUNT(l.neighbourhood_id) >= 5
ORDER BY total_reviews DESC, total_high_availability_listings DESC;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood_group_id,neighbourhood,total_high_availability_listings,total_reviews
37,2,Upper East Side,5,626
6,2,Hell's Kitchen,9,424
14,2,East Village,7,322
5,1,Bedford-Stuyvesant,11,252
8,1,Bushwick,12,160
10,1,Williamsburg,5,134
42,1,Prospect-Lefferts Gardens,5,115
7,2,Upper West Side,5,64
11,2,Harlem,5,30
19,2,Financial District,5,29


#### Question 15 
Which listings have the highest estimated revenue? Multiply nightly price by number of available nights.<br> Create a table that returns <strong>listing, neighbourhood_id, host_id, room type, and estimated revenue</strong>. Show the first five.

True or False? "Small room for 1 Person-Best Value" has the highest estimated revenue.

#### Options

- True
- False

In [37]:
%%sql
SELECT 
    listing_id,
    listing,
    neighbourhood_id,
    host_id,
    rt.type,
    ROUND(price * availability_365) AS estimated_revenue
FROM listings l
JOIN room_types rt ON rt.room_type_id = l.room_type_id
WHERE price IS NOT NULL AND
    availability_365 IS NOT NULL
ORDER BY 
    estimated_revenue DESC
LIMIT 5;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood_id,host_id,type,estimated_revenue
55836994,Small room for 1 Person-Best Value,117,1380561739,Private room,423765.0
48129647,"One train ride to Times SQ, Central Park, LOCATION",3,548848159,Entire home/apt,408870.0
50965156,Interfaith Retreat Guest Rooms (Bhakti),28,1631373893,Private room,404234.0
12013646,100$,32,334356607,Private room,402597.0
2790788,"One Bedroom Apt, Suitable for Two",115,132238305,Entire home/apt,393464.0


#### Question 16 
Which five neighborhoods receive the most negative reviews?
Highlight areas where guest satisfaction is low (<strong>review_rate_number</strong> < 2).

#### Options

- Harlem, Bedford-Stuyvesant, Crown Heights, Williamsburg, Bushwick
- East Village, Hell's Kitchen, Bushwick, Upper East Side, Harlem
- Crown Heights, Upper West Side, SoHo, Williamsburg, Bushwick
- Bedford-Stuyvesant, Tribeca, Bushwick, Chelsea, Crown Heights


In [22]:
%%sql

SELECT 
    l.neighbourhood_id,
    n.neighbourhood_group_id,
    neighbourhood,
    COUNT(l.neighbourhood_id) AS negative_review_count
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
WHERE l.review_rate_number < 2
GROUP BY neighbourhood
HAVING negative_review_count >= 5
ORDER BY negative_review_count DESC;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood_group_id,neighbourhood,negative_review_count
11,2,Harlem,10
5,1,Bedford-Stuyvesant,10
36,1,Crown Heights,9
10,1,Williamsburg,5
8,1,Bushwick,5


#### Question 17   
How are different cancellation policies distributed across instant-bookable and non-instant-bookable listings on Airbnb?<br>
Create a table that returns the columns: <strong>cancellation_policy_id, instant_bookable, listing_count</strong> (needs to be calculated).

Which combination of instant_bookable and cancellation_policy_id has the highest <strong>number</strong> of listings?

#### Options

- Instant_bookable = 1, Cancellation_policy_id = 1
- Instant_bookable = 0, Cancellation_policy_id = 2
- Instant_bookable = 1, Cancellation_policy_id = 2
- Instant_bookable = 0, Cancellation_policy_id = 3



In [42]:
%%sql

WITH ranked_policies AS (
    SELECT 
        instant_bookable,
        cancellation_policy_id,
        COUNT(*) AS listing_count
    FROM listings
    GROUP BY instant_bookable, cancellation_policy_id
)
SELECT 
    *,
    RANK() OVER (ORDER BY listing_count DESC) AS rank_overall
FROM ranked_policies
ORDER BY rank_overall;

 * sqlite:///airbnb_nyc.db
Done.


instant_bookable,cancellation_policy_id,listing_count,rank_overall
0,2,149,1
1,1,146,2
0,1,140,3
0,3,140,3
1,3,138,5
1,2,137,6


#### Question 18
Flag listings with unusually high service fees. Create a table with columns: <strong>listing, price, service fee, service fee percentage</strong>.

True or False? There are Airbnb listings where the service fee is more than half (50%) of the nightly price.

#### Options

- True
- False

In [24]:
%%sql

SELECT 
    listing_id, 
    listing, 
    price, 
    service_fee,
    ROUND((service_fee / price) * 100, 2) AS service_fee_percent
FROM listings
WHERE price > 0 AND 
    (service_fee / price) > 0.5
ORDER BY service_fee_percent ASC;


 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,price,service_fee,service_fee_percent
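
An empty result supports the answer "False" as long as the division is carried out in floating point. Prices elsewhere in this notebook print as values like `1199.0`, which suggests they are stored as REAL, but if `service_fee` and `price` were stored as INTEGER, SQLite would truncate `service_fee / price` to 0 and the filter could never match. A small sketch of that pitfall, using an invented `fees` table:

```python
import sqlite3

# Hypothetical row: a fee of 120 on a price of 200 (60%), stored as integers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fees (listing TEXT, price INTEGER, service_fee INTEGER)")
conn.execute("INSERT INTO fees VALUES ('Toy listing', 200, 120)")

# Integer / integer truncates; multiplying by 1.0 forces real division.
integer_ratio, real_ratio = conn.execute(
    "SELECT service_fee / price, service_fee * 1.0 / price FROM fees"
).fetchone()

print(integer_ratio, real_ratio)  # 0 0.6
conn.close()
```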


#### Question 19
What is the average price of Airbnb listings in each neighbourhood by room type, and how many listings exist for each room type?
Assess the affordability of room types in neighbourhoods. Return a table with columns: <strong>neighbourhood, room type, listings_per_room_type, average price</strong>.

In [25]:
%%sql

SELECT
    l.neighbourhood_id,
    neighbourhood,
    rt.type AS room_type,
    COUNT(l.room_type_id) AS listings_per_room_type,
    ROUND(AVG(price)) AS avg_price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
JOIN room_types rt ON l.room_type_id = rt.room_type_id
GROUP BY neighbourhood, rt.type
ORDER BY listings_per_room_type DESC, avg_price DESC;


 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood,room_type,listings_per_room_type,avg_price
8,Bushwick,Private room,38,612.0
5,Bedford-Stuyvesant,Private room,38,555.0
10,Williamsburg,Entire home/apt,35,618.0
5,Bedford-Stuyvesant,Entire home/apt,31,633.0
11,Harlem,Private room,28,625.0
10,Williamsburg,Private room,27,575.0
11,Harlem,Entire home/apt,27,505.0
6,Hell's Kitchen,Entire home/apt,24,569.0
36,Crown Heights,Entire home/apt,21,741.0
14,East Village,Entire home/apt,18,686.0


#### Question 20
What are the estimated costs for short-term and extended stays across different room types?<br>
Based on available data, what would a guest expect to pay for a short stay (fewer than 7 nights) or a long stay (7 nights or more) as <strong>stay_type</strong>, and which <strong>room types</strong> are available at those <strong>price</strong> points?

In [26]:
%%sql

SELECT  
    CASE 
    WHEN minimum_nights < 7 THEN 'Short Stay (<7 nights)'
    ELSE 'Long Stay (≥7 nights)'
    END AS stay_type,
    rt.type,
    ROUND(AVG(price),2) AS price
FROM listings l
JOIN room_types rt ON l.room_type_id = rt.room_type_id
GROUP BY stay_type, rt.type
ORDER BY price DESC;


 * sqlite:///airbnb_nyc.db
Done.


stay_type,type,price
Short Stay (<7 nights),Shared room,791.93
Long Stay (≥7 nights),Entire home/apt,629.66
Short Stay (<7 nights),Entire home/apt,628.4
Short Stay (<7 nights),Private room,615.38
Long Stay (≥7 nights),Private room,568.29
Long Stay (≥7 nights),Shared room,564.33


In [39]:
%%sql

SELECT *
FROM cancellation_policies;

 * sqlite:///airbnb_nyc.db
Done.


cancellation_policy_id,policy
3,flexible
2,moderate
1,strict
