# SQL EXAM Model Solution

## SECTION A: Theory multiple choice

#### Question 1/10

Brooklyn’s housing authority has launched a crackdown on illegal short-term 
rentals. CityBnB’s internal data reveals clusters of listings in Williamsburg and 
Bushwick suspected to be operated by commercial entities posing as individual 
hosts. Your task is to identify hosts with multiple listings to help investigators 
prioritize enforcement. 

##### Which SQL clause would you use? 

>GROUP BY host_id HAVING COUNT(*) > 1 
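
As a minimal sketch of how this clause behaves, the snippet below runs it against a small in-memory SQLite table. The table contents are made up for illustration and are not taken from the exam database.

```python
import sqlite3

# Toy data (hypothetical): hosts 1 and 3 own more than one listing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, host_id INTEGER)")
conn.executemany(
    "INSERT INTO listings (id, host_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 3)],
)

# GROUP BY collapses rows per host; HAVING filters the groups after counting.
multi_listing_hosts = conn.execute(
    """
    SELECT host_id, COUNT(*) AS listing_count
    FROM listings
    GROUP BY host_id
    HAVING COUNT(*) > 1
    ORDER BY listing_count DESC
    """
).fetchall()

print(multi_listing_hosts)  # [(3, 3), (1, 2)]
```

`WHERE` cannot be used here because it filters individual rows before grouping, whereas the condition applies to the per-host count.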

#### Question 2/10 

CityBnB’s leadership is preparing a presentation for New York City Council to 
demonstrate compliance with short-term rental laws. They need a month-by-month 
breakdown of 2024 bookings to prove seasonal demand aligns with housing 
availability regulations. The bookings table uses YYYY-MM-DD formatting. 

##### Which query generates the required report? 

> SELECT MONTH(booking_date), COUNT(*) 
> FROM bookings 
> WHERE YEAR(booking_date) = 2024 
> GROUP BY MONTH(booking_date); 
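
`MONTH()` and `YEAR()` are available in dialects such as MySQL and SQL Server, but SQLite does not provide them. A rough SQLite equivalent uses `strftime`, sketched below on a made-up `bookings` table (the rows are illustrative only).

```python
import sqlite3

# Toy bookings table (hypothetical rows) with YYYY-MM-DD text dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, booking_date TEXT)")
conn.executemany(
    "INSERT INTO bookings (booking_date) VALUES (?)",
    [("2024-01-05",), ("2024-01-20",), ("2024-03-11",), ("2023-12-31",)],
)

# strftime('%m', ...) extracts the month and strftime('%Y', ...) the year,
# both as text, so the year is compared against the string '2024'.
monthly_counts = conn.execute(
    """
    SELECT strftime('%m', booking_date) AS booking_month, COUNT(*) AS bookings
    FROM bookings
    WHERE strftime('%Y', booking_date) = '2024'
    GROUP BY strftime('%m', booking_date)
    ORDER BY booking_month
    """
).fetchall()

print(monthly_counts)  # [('01', 2), ('03', 1)]
```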

#### Question 3/10

Chinatown’s community board reports a surge in vacant "ghost listings" that sit 
empty year-round, exacerbating housing shortages. CityBnB must calculate the 
percentage of listings marked "available" in the calendar table but with zero 
bookings to address public concerns. 

##### Which method calculates this accurately?

> (COUNT(*) * 100.0) / (SELECT COUNT(*) FROM calendar) 
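
The `100.0` in this expression matters because SQLite truncates when both operands of `/` are integers. The sketch below shows the difference on a toy `calendar` table; the `booked` column is a hypothetical stand-in for "has at least one booking" and is not part of the exam schema.

```python
import sqlite3

# Toy calendar table (hypothetical): two of three rows have no bookings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (listing_id INTEGER, booked INTEGER)")
conn.executemany("INSERT INTO calendar VALUES (?, ?)", [(1, 0), (2, 0), (3, 1)])

# The first column uses integer arithmetic, the second floating-point arithmetic.
int_pct, float_pct = conn.execute(
    """
    SELECT (COUNT(*) * 100)   / (SELECT COUNT(*) FROM calendar),
           (COUNT(*) * 100.0) / (SELECT COUNT(*) FROM calendar)
    FROM calendar
    WHERE booked = 0
    """
).fetchone()

print(int_pct, round(float_pct, 2))  # 66 66.67
```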

#### Question 4/10 

CityBnB is partnering with a real estate analytics firm to study housing trends. The 
firm requests a non-duplicated list of property types (e.g., "entire apartment," 
"private room") to analyze market saturation in high-demand areas like Manhattan 
and Queens.

##### Which SQL clause ensures unique property types?

>DISTINCT

#### Question 5/10 

A whistleblower in Sandton Heights—a luxury rental hotspot—claims that 30% of 
high-end listings are fake, using stock photos and fabricated reviews. CityBnB’s 
fraud team needs to identify the top 5 hosts by listing count for further 
investigation. 

##### What does this query detect? 

> Hosts with multiple listings, possible commercial operators

#### Question 6/10 

Users complain that CityBnB’s search filters fail to show affordable options in 
Harlem. The product team adds a "Budget-Friendly" filter for listings under 
$300/night but needs to validate the query before deployment. 

##### Which SQL query works?

>SELECT * FROM listings WHERE price < 300;

#### Question 7/10 

Following a data breach, CityBnB’s security team mandates an audit of hosts with 
unverified identities. The host_identity_verified column uses 'True'/'False' strings. 
Failure to comply could result in fines under New York’s short-term rental laws. 

##### Which SQL expression counts unverified hosts? 

>COUNT(CASE WHEN host_identity_verified = 'False' THEN 1 END)
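
A short sketch of why this works: a `CASE` with no `ELSE` branch yields `NULL` for non-matching rows, and `COUNT(expression)` skips `NULL`s. The `hosts` rows below are invented for illustration.

```python
import sqlite3

# Toy hosts table (hypothetical rows), including one missing value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (id INTEGER PRIMARY KEY, host_identity_verified TEXT)"
)
conn.executemany(
    "INSERT INTO hosts (host_identity_verified) VALUES (?)",
    [("True",), ("False",), ("False",), (None,)],
)

# Only rows where the CASE returns 1 are counted; NULL results are ignored.
unverified, total = conn.execute(
    """
    SELECT COUNT(CASE WHEN host_identity_verified = 'False' THEN 1 END),
           COUNT(*)
    FROM hosts
    """
).fetchone()

print(unverified, total)  # 2 4
```

Note that the row with a missing value is not counted as unverified, since comparing `NULL` with `'False'` is not true.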

#### Question 8/10 

A viral TikTok video exposes a "party house" in Astoria with 25+ noise complaints. 
CityBnB’s legal team needs to flag listings with >10 "noise" or "party" reviews AND >90% occupancy 
(booked ≥329 days/year) to avoid liability. 

##### Which query meets both conditions?

>WITH problematic_listings AS ( 
SELECT listing_id, COUNT(*) AS noise_reports 
FROM reviews 
WHERE comment ILIKE '%noise%' OR comment ILIKE '%party%' 
GROUP BY listing_id 
HAVING COUNT(*) > 10 
) 
SELECT l.id, l.name, p.noise_reports 
FROM listings l 
JOIN problematic_listings p ON l.id = p.listing_id 
WHERE l.availability_365 <= 36;  -- at most 36 available days = at least 329 days booked
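
The sketch below adapts this pattern to SQLite on made-up data. Two assumptions differ from the answer text: `ILIKE` is a PostgreSQL operator, so the sketch uses `LIKE`, which SQLite treats as case-insensitive for ASCII characters by default; and the filter is written as `availability_365 <= 36` so that a listing booked exactly 329 days is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE listings (id INTEGER PRIMARY KEY, name TEXT, availability_365 INTEGER);
    CREATE TABLE reviews (listing_id INTEGER, comment TEXT);
    """
)
# Hypothetical listings: only the first is both heavily booked and heavily reported.
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(1, "Party loft", 20), (2, "Quiet studio", 10), (3, "Loud but empty", 300)],
)
reviews = (
    [(1, "Too much NOISE at night")] * 6
    + [(1, "Constant party next door")] * 5
    + [(2, "party once")]
    + [(3, "noise every night")] * 12
)
conn.executemany("INSERT INTO reviews VALUES (?, ?)", reviews)

# The CTE finds listings with more than 10 matching reviews;
# the outer query keeps those with at most 36 available days.
flagged = conn.execute(
    """
    WITH problematic_listings AS (
        SELECT listing_id, COUNT(*) AS noise_reports
        FROM reviews
        WHERE comment LIKE '%noise%' OR comment LIKE '%party%'
        GROUP BY listing_id
        HAVING COUNT(*) > 10
    )
    SELECT l.id, l.name, p.noise_reports
    FROM listings l
    JOIN problematic_listings p ON l.id = p.listing_id
    WHERE l.availability_365 <= 36
    """
).fetchall()

print(flagged)  # [(1, 'Party loft', 11)]
```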

#### Question 9/10 

Scenario: A junior analyst’s query crashes CityBnB’s dashboard during a live demo 
to investors. The intended goal was to display high-rated properties (>4 stars) in 
Tribeca for a premium marketing campaign. 

##### What’s wrong with this syntax? 
- SELECT * FROM properties WHERE rating > 4 ORDER;

>The keyword ORDER should be ORDER BY, followed by a sort column (for example, ORDER BY rating DESC)

#### Question 10/10 

Scenario: As CityBnB scales to 50+ cities, redundant "host_status" entries (e.g., 
10,000 rows with 'superhost') slow down updates. The engineering team proposes 
normalizing this into a separate host_status table. 

##### What’s the primary advantage?

>To avoid redundant data and simplify updates 

## SECTION B: Practical questions with applied multiple choice
You are provided with a pre-populated SQLite database named `airbnb.db`. Download [here](https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata) if you haven't already. Your task is to explore this database and write a series of SQL queries to perform the tasks detailed below. Queries should be optimised to run within 20 seconds or less.

The tables and columns included in the `airbnb.db` are:

- `listings`: `id`, `host_id`, `name`, `neighbourhood_id`, `latitude`, `longitude`, `room_type_id`, `construction_year`, `number_of_reviews`, `last_review`, `reviews_per_month`, `review_rate_number`, `calculated_host_listings_count`, `availability_365`, `instant_bookable`, `cancellation policy`, `house_rules`, `license`
- `hosts`: `id`, `name`, `identity_verified`
- `neighbourhoods`: `id`, `name`, `neighbourhood_group_id`
- `neighbourhood_groups`: `id`, `name`
- `room_types`: `id`, `type`
- `cancellation_policies`: `id`, `policy`

<div style="text-align: center;">
    <img src="ERD_airBnB.png" alt="Airbnb ERD" width="800"/>
</div>

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///airbnb_nyc.db

### Question 1/20 – Unique Neighbourhoods in the NYC Dataset

You are analyzing the Airbnb listings dataset for New York City. The dataset contains a table named `neighbourhoods`, which lists each unique neighborhood by its ID. **How many unique neighborhoods are there in this dataset?**

**A)** 150  

**B)** 200  

**C)** 218  

**D)** 250  


In [4]:
%%sql

SELECT COUNT(DISTINCT neighbourhood_id) 
FROM neighbourhoods;

 * sqlite:///airbnb_nyc.db
Done.


COUNT(DISTINCT neighbourhood_id)
218


### EXPLANATION

<span style="color:red">

**Correct answer:** C) 218

The query counts the distinct neighborhood IDs so that each unique neighborhood is counted only once. This gives the total number of neighborhoods in the dataset, reflecting the variety of neighborhoods in New York City.

</span>

---
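
To illustrate the difference between counting rows and counting distinct values, here is a minimal sketch on a toy `listings` table (the rows are invented). On a lookup table where every ID appears once the two counts coincide; on a table with repeated IDs they do not.

```python
import sqlite3

# Toy listings table (hypothetical): neighbourhood 5 appears three times.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, neighbourhood_id INTEGER)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(1, 5), (2, 5), (3, 14), (4, 80), (5, 5)],
)

# COUNT(col) counts non-NULL values; COUNT(DISTINCT col) counts unique ones.
row_count, distinct_count = conn.execute(
    "SELECT COUNT(neighbourhood_id), COUNT(DISTINCT neighbourhood_id) FROM listings"
).fetchone()

print(row_count, distinct_count)  # 5 3
```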

#### Incorrect options:

- **A)** is false because 150 is lower than the 218 distinct neighborhood IDs returned by the query.

- **B)** is false because 200 is also lower than the 218 returned by the query.

- **D)** is false because 250 exceeds the 218 returned by the query.


### Question 2/20 - Ghost listings and availability

Airbnb listings are being reviewed to identify properties that are listed but have zero availability throughout the entire year. These are often called “ghost listings” because they appear on the platform but cannot actually be booked.

**Based on the dataset, which of the following statements is TRUE?**

**A)** All listings have some availability throughout the year, so there are no ghost listings. 

**B)** There are several listings with zero availability, including “Chill in Alphabet City” and “Modern loft in great neighborhood.”

**C)** Listings with zero availability only exist in Brooklyn.  

**D)** Listings with zero availability are always luxury apartments.


In [5]:
%%sql 

SELECT listing_id, listing
FROM listings
WHERE availability_365 = 0;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing
10518017,Chill in Alphabet City
10579323,Modern loft in great neighborhood
10627925,Upper West Side apartment with balcony
10740594,Your home away from home...
10798586,Sunny and spacious apartment in Brooklyn
10818469,Spacious Room with Character in BK
10852159,"Modern 3 BR home, 4 blocks from Public Transport!"
10885849,"Fun, Comfy, and Convenient Studio in Midtown West"
10943288,beautiful one bedroom apartment
11179120,Chic 1 br with huge private garden


### EXPLANATION

<span style="color:red">

**Correct answer:** B) There are several listings with zero availability, including “Chill in Alphabet City” and “Modern loft in great neighborhood.”

The query used includes a condition that filters listings where `availability_365 = 0`, ensuring that only listings with zero availability throughout the year are returned. The results clearly show multiple such listings, including “Chill in Alphabet City” and “Modern loft in great neighborhood,” confirming that ghost listings do exist and are not limited to one borough or property type.

</span>

---

#### Incorrect options:

- **A)** is false because the query specifically returns many listings with zero availability, proving that not all listings have some availability.

- **C)** is false because the listing names in the output point to locations outside Brooklyn, such as Alphabet City, the Upper West Side, and Midtown West.

- **D)** is false because the output includes rooms, a studio, and one-bedroom apartments, not exclusively luxury apartments.

  ---


### Question 3/20 – Host Verification and Review Scores

Airbnb host data is being analyzed to understand how **identity verification** impacts **guest review ratings**.  
Hosts are categorized as either **“verified”** or **“unconfirmed”** (not verified).

#### What can you conclude about the average review scores?

Choose the most accurate statement:

 **A)** Verified hosts have a slightly higher average review score than unconfirmed hosts.  
 
 **B)** Unconfirmed hosts have a much higher average review score than verified hosts.  
 
 **C)** Both verified and unconfirmed hosts have the exact same average review score. 
 
 **D)** Verified hosts have a significantly lower average review score than unconfirmed hosts.

 ---

In [6]:
%%sql

SELECT h.identity_verified, ROUND(AVG(l.review_rate_number),2) AS avg_rating
FROM listings l
JOIN hosts h ON l.host_id = h.host_id
GROUP BY h.identity_verified;

 * sqlite:///airbnb_nyc.db
Done.


identity_verified,avg_rating
unconfirmed,3.15
verified,3.22


### EXPLANATION

<span style="color:red">

**Correct answer:** A) Verified hosts have a slightly higher average review score than unconfirmed hosts.

The query joins the `listings` and `hosts` tables on `host_id` and groups the results by the `identity_verified` status, so the average review rating is calculated separately for verified and unconfirmed hosts. The results show that verified hosts have a slightly higher average rating of 3.22 compared to 3.15 for unconfirmed hosts. The gap of 0.07 is an association in this dataset; on its own it does not prove that verification causes better reviews.

</span>

---

#### Incorrect options:

- **B)** is false because unconfirmed hosts have a lower average score (3.15) than verified hosts (3.22), so they do not have a much higher rating.

- **C)** is false because the average ratings are close but not exactly the same between verified and unconfirmed hosts.

- **D)** is false because verified hosts do not have significantly lower average scores; in fact, their average rating is higher.

---

### Question 4/20 – Most Expensive Airbnb Listing in NYC

You are helping a traveler find the **most expensive Airbnb listing** available in New York City.  
After examining the data, you find one listing with the **highest price**.

#### Which of the following details correctly describes the most expensive Airbnb listing?

**A)** Listing: *Sun filled 1 bedroom in the heart of Crown Heights* — Neighborhood: **Crown Heights** — Price: **$1199.00** 

**B)** Listing: *Modern loft in great neighborhood* — Neighborhood: **Williamsburg** — Price: **$1500.00**  

**C)** Listing: *Cozy Studio in Midtown* — Neighborhood: **Midtown** — Price: **$900.00**  
 
**D)** Listing: *Large & Adorable 2 Bedroom in the Heart of Bklyn* — Neighborhood: **Brooklyn** — Price: **$850.00**


In [7]:
%%sql

SELECT listing_id, listing, neighbourhood, MAX(price) AS price
FROM listings l
JOIN neighbourhoods n 
ON l.neighbourhood_id = n.neighbourhood_id;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood,price
12022483,Sun filled 1 bedroom in the heart of Crown Heights,Crown Heights,1199.0


In [69]:
%%sql

SELECT listing_id, listing, neighbourhood, price
FROM listings l
JOIN neighbourhoods n 
ON l.neighbourhood_id = n.neighbourhood_id
ORDER BY price DESC
LIMIT 1;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood,price
12022483,Sun filled 1 bedroom in the heart of Crown Heights,Crown Heights,1199.0


### EXPLANATION

<span style="color:red">

**Correct answer:** A) *Sun filled 1 bedroom in the heart of Crown Heights* — **Neighborhood:** Crown Heights — **Price:** $1199.00

Two queries are shown. The first relies on SQLite-specific behaviour: when a query contains a single `MAX()` aggregate, the other selected columns take their values from the row that holds the maximum. That shortcut is not portable to most other databases. The second joins the `listings` and `neighbourhoods` tables on `neighbourhood_id`, orders the results by price in descending order, and keeps the top row with `LIMIT 1`. Both return the listing titled “Sun filled 1 bedroom in the heart of Crown Heights”, located in Crown Heights, which has the highest price in the dataset: $1199.00.

</span>

---
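
As a hedged sketch of two portable ways to fetch the most expensive listing, the snippet below uses a toy `listings` table with invented rows. The subquery form has the side benefit of returning every listing that ties for the maximum price, whereas `LIMIT 1` returns only one of them.

```python
import sqlite3

# Toy listings table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, listing TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(1, "Loft", 900.0), (2, "Penthouse", 1199.0), (3, "Studio", 450.0)],
)

# Option 1: sort by price and keep the first row.
top_by_order = conn.execute(
    "SELECT listing_id, listing, price FROM listings ORDER BY price DESC LIMIT 1"
).fetchone()

# Option 2: compare each row against the maximum price from a subquery.
top_by_subquery = conn.execute(
    """
    SELECT listing_id, listing, price
    FROM listings
    WHERE price = (SELECT MAX(price) FROM listings)
    """
).fetchall()

print(top_by_order)     # (2, 'Penthouse', 1199.0)
print(top_by_subquery)  # [(2, 'Penthouse', 1199.0)]
```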

#### Incorrect options:

- **B)** is incorrect because the price listed ($1500.00) does not appear in the dataset and exceeds the actual maximum price, making it inaccurate.

- **C)** is incorrect because its price ($900.00) is lower than the highest listing price of $1199.00.

- **D)** is also incorrect because its price ($850.00) falls below the top listing’s price and was not the most expensive.

---

### Question 5/20 – Availability of the Priciest Listings in NYC

An analysis is conducted to determine whether the priciest Airbnb listings in New York City are also the most available to guests throughout the year.

#### Based on the top 10 most expensive listings, is it true that these listings are also the most available (have high `availability_365` values)?

 **A)** True — All top 10 listings have high availability throughout the year.
 
 **B)** False — Many of the top 10 most expensive listings have low or zero availability.


In [68]:
%%sql

SELECT listing_id, listing, price, availability_365
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
ORDER BY price DESC, availability_365 DESC
LIMIT 10;


 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,price,availability_365
12022483,Sun filled 1 bedroom in the heart of Crown Heights,1199.0,13
1749150,In_Manhattan+1 Small Block to train,1198.0,275
45242223,Spacious Private Bedroom in Hip Bushwick,1198.0,0
20794111,Amazing Soho Apartment,1195.0,7
45741502,Sunny Spacious Room + Private Bathroom in Bushwick,1195.0,0
19294064,Beautiful 1 Bdrm w/ Large Private Patio,1193.0,0
53039594,Cozy room for easy traveler,1190.0,71
21952837,Private room in two floor apartment w/ back yard,1190.0,66
24658554,Room in Washingtom Heights (Manhattan),1190.0,26
17627223,Big beautiful bedroom in huge Bushwick apartment,1190.0,8


### EXPLANATION

<span style="color:red">

**Correct Answer:** B) False — Many of the top 10 most expensive listings have low or zero availability.

This conclusion was reached by selecting the top 10 listings with the highest prices and checking their `availability_365` values. This was done using a query that joined the `listings` and `neighbourhoods` tables, ordered the results by price in descending order, and limited the output to the top 10 results.

The results revealed that many of the most expensive listings had low or even zero availability. For example:

- *“Sun filled 1 bedroom in the heart of Crown Heights”* has a price of **$1199.00** but is only available **13 days** a year.
- Three of the top 10 have an `availability_365` value of **0**, and others have only **7, 8**, or **26** days. Only one listing (275 days) is available for more than 71 days a year.

This confirms that high-priced listings do not always have high availability, making option **B** correct.

</span>

---

#### Incorrect Option:

- **A) True — All top 10 listings have high availability throughout the year**  
  This is incorrect. The data clearly shows that several high-priced listings have low or no availability, contradicting the statement.

---

### Question 6/20 – Neighborhoods with Most Affordable Listings

CityBnB is analyzing neighborhoods to find where the most affordable Airbnb listings are located.  
The 7 cheapest listings were retrieved, along with their neighborhood IDs.

#### Which neighborhood ID(s) appear most frequently among these affordable listings?

**A)** Neighborhood ID 5 appears most frequently.  

**B)** Neighborhood ID 14 has the most affordable listings.  

**C)** Neighborhood ID 80 has the cheapest listings overall.

**D)** Neighborhood ID 32 has the majority of affordable listings.


In [10]:
%%sql

SELECT listing_id, listing, n.neighbourhood_id, price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
ORDER BY price ASC
LIMIT 7;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood_id,price
46397082,Home Away from Home,5,51.0
1947426,Fantastic East Village Location!,14,52.0
44970491,Bright Minimalist Creative Retreat Bedroom BedStuy,5,52.0
43859816,Sunny Room,5,55.0
6597240,Creative Woodside Studio,80,56.0
12945928,Large Modern 3-bdrm Duplex Apt near Manhattan,32,56.0
51652769,(2R) Cozy and clean bedroom with private bathroom,143,57.0


In [11]:
%%sql

SELECT listing_id, listing, n.neighbourhood_id, price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
WHERE price < 60;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood_id,price
1947426,Fantastic East Village Location!,14,52.0
6597240,Creative Woodside Studio,80,56.0
12945928,Large Modern 3-bdrm Duplex Apt near Manhattan,32,56.0
43859816,Sunny Room,5,55.0
44970491,Bright Minimalist Creative Retreat Bedroom BedStuy,5,52.0
46397082,Home Away from Home,5,51.0
51652769,(2R) Cozy and clean bedroom with private bathroom,143,57.0


### EXPLANATION

<span style="color:red">

**Correct answer:** A) Neighborhood ID 5 appears most frequently.

The query to get this answer involves joining the `listings` and `neighbourhoods` tables on `neighbourhood_id`, then ordering the listings by price in ascending order and limiting the results to the 7 cheapest listings. This approach ensures that only the most affordable listings are retrieved, along with their neighborhood IDs. The result shows that Neighborhood ID 5 appears three times among these cheapest listings, making it the most frequent.

</span>

---

#### Incorrect options:

- **B)** is incorrect because Neighborhood ID 14 appears only once among the cheapest listings, so it does not have the most affordable listings.

- **C)** is incorrect because Neighborhood ID 80 appears only once, and its listing ($56.00) is not the cheapest; the cheapest listing ($51.00) is in Neighborhood ID 5.

- **D)** is also incorrect because Neighborhood ID 32 appears only once, so it does not have the majority of affordable listings.

---

### Question 7/20 – Hosts with the Highest Number of Reviews

Which two hosts receive the highest number of reviews on the Airbnb platform?

**A)** Agnes with 439 reviews and Gurpreet Singh with 424 reviews 

**B)** Agnes with 300 reviews and Gurpreet Singh with 350 reviews

**C)** John Doe with 500 reviews and Sarah Lee with 450 reviews

**D)** Gurpreet Singh with 439 reviews and Agnes with 424 reviews



In [12]:
%%sql

SELECT l.host_id, h.host, l.reviews_per_month AS monthly_review_perc, review_rate_number, number_of_reviews
FROM listings l
JOIN hosts h ON l.host_id = h.host_id
GROUP BY l.host_id, h.host
ORDER BY number_of_reviews DESC
LIMIT 2;

 * sqlite:///airbnb_nyc.db
Done.


host_id,host,monthly_review_perc,review_rate_number,number_of_reviews
935726341,Agnes,5.12,4,439
2044138632,Gurpreet Singh,8.86,4,424


### EXPLANATION

<span style="color:red">

**Correct answer:** A) Agnes with 439 reviews and Gurpreet Singh with 424 reviews

The query joins the `listings` and `hosts` tables on `host_id`, groups by host, orders by `number_of_reviews` in descending order, and keeps the top two rows. Note that `number_of_reviews` is not wrapped in an aggregate, so SQLite reports the value from one listing in each host group rather than a per-host total; ranking hosts by total reviews across all their listings would require `SUM(number_of_reviews)`. In the output, Agnes and Gurpreet Singh top the list with 439 and 424 reviews respectively.

</span>

---
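
If the goal is the total number of reviews per host across all of that host's listings, one possible formulation sums `number_of_reviews` inside each group. The sketch below uses invented hosts and listings, not the exam data, and shows that the host with the single most-reviewed listing is not necessarily the host with the most reviews overall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE listings (
        listing_id INTEGER PRIMARY KEY,
        host_id INTEGER,
        number_of_reviews INTEGER
    );
    """
)
# Hypothetical data: Ana has the most-reviewed single listing,
# but Ben has more reviews in total across two listings.
conn.executemany("INSERT INTO hosts VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cleo")])
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(10, 1, 300), (11, 2, 200), (12, 2, 250), (13, 3, 50)],
)

# SUM aggregates reviews over every listing belonging to the host.
top_hosts = conn.execute(
    """
    SELECT h.host, SUM(l.number_of_reviews) AS total_reviews
    FROM listings l
    JOIN hosts h ON l.host_id = h.host_id
    GROUP BY h.host_id, h.host
    ORDER BY total_reviews DESC
    LIMIT 2
    """
).fetchall()

print(top_hosts)  # [('Ben', 450), ('Ana', 300)]
```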

#### Incorrect options:

- **B)** is incorrect because it underestimates the actual review counts for Agnes and Gurpreet Singh.

- **C)** is incorrect because John Doe and Sarah Lee do not appear among the most-reviewed hosts in the output.

- **D)** is incorrect because it switches the review counts between Agnes and Gurpreet Singh.

  ---


### Question 8/20 – Top 5 Unique Hosts Owning the Most Expensive Listings

Find the top 5 unique hosts that own the most expensive listings:

**A)** Linda, Chris, Maria, Cathal, Tian  

**B)** Agnes, Gurpreet Singh, Linda, Chris, Maria

**C)** John, Sarah, Agnes, Gurpreet Singh, Tian

**D)** Linda, Agnes, Tian, Cathal, Maria


In [13]:
%%sql

SELECT l.host_id, host, listing_id, price
FROM listings l
JOIN hosts h 
ON l.host_id = h.host_id
ORDER BY price DESC
LIMIT 5;

 * sqlite:///airbnb_nyc.db
Done.


host_id,host,listing_id,price
598865899,Linda,12022483,1199.0
1799553000,Chris,1749150,1198.0
868180273,Maria,45242223,1198.0
544412318,Cathal,20794111,1195.0
1260133006,Tian,45741502,1195.0


### EXPLANATION

<span style="color:red">

**Correct Answer:** A) Linda, Chris, Maria, Cathal, Tian.

The query joins the `listings` and `hosts` tables on `host_id`, orders the rows by price in descending order, and keeps the first five. It does not deduplicate hosts; the five rows happen to belong to five different hosts (Linda, Chris, Maria, Cathal, and Tian), so they are also the top 5 unique hosts, which matches option A. If one host owned several of the most expensive listings, the query would need to group by host.

</span>

---
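
When one host owns several of the priciest listings, a plain `ORDER BY price DESC LIMIT n` repeats that host. One way to guarantee unique hosts is to group by host and rank on each host's highest price, sketched below with invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, host_id INTEGER, price REAL);
    """
)
# Hypothetical data: Ana owns the two most expensive listings.
conn.executemany("INSERT INTO hosts VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cleo")])
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(10, 1, 1199.0), (11, 1, 1198.0), (12, 2, 1195.0), (13, 3, 900.0)],
)

# Row-level ranking: the same host can appear more than once.
top_rows = conn.execute(
    """
    SELECT h.host, l.price
    FROM listings l
    JOIN hosts h ON l.host_id = h.host_id
    ORDER BY l.price DESC
    LIMIT 2
    """
).fetchall()

# Host-level ranking: one row per host, ranked by that host's highest price.
top_unique_hosts = conn.execute(
    """
    SELECT h.host, MAX(l.price) AS top_price
    FROM listings l
    JOIN hosts h ON l.host_id = h.host_id
    GROUP BY h.host_id, h.host
    ORDER BY top_price DESC
    LIMIT 2
    """
).fetchall()

print(top_rows)          # [('Ana', 1199.0), ('Ana', 1198.0)]
print(top_unique_hosts)  # [('Ana', 1199.0), ('Ben', 1195.0)]
```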

#### Incorrect Options:

- **B)** Agnes, Gurpreet Singh, Linda, Chris, Maria  
  This option incorrectly includes Agnes and Gurpreet Singh, who do not own any of the top 5 most expensive listings.

- **C)** John, Sarah, Agnes, Gurpreet Singh, Tian  
  Hosts John, Sarah, Agnes, and Gurpreet Singh are not among the owners of the top 5 priciest listings.

- **D)** Linda, Agnes, Tian, Cathal, Maria  
  This option wrongly includes Agnes, who is not part of the top 5, and excludes Chris, who is.

  ---


### Question 9/20 – Neighborhoods with Highest Number of Airbnb Listings

CityBnB is conducting a real estate analysis to determine where Airbnb activity is most concentrated in New York City.  
To support this, the goal is to identify the neighborhoods with the highest number of Airbnb listings.

#### Which three neighborhoods have the highest number of Airbnb listings?

**A)** Bedford-Stuyvesant, Williamsburg, Harlem 

**B)** Harlem, Crown Heights, Bushwick

**C)** Williamsburg, East Village, Chelsea 

**D)** Crown Heights, Harlem, Bushwick


In [14]:
%%sql
SELECT 
    neighbourhood,
    COUNT(n.neighbourhood_id) AS total_listings
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
GROUP BY neighbourhood
ORDER BY total_listings DESC
LIMIT 3;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood,total_listings
Bedford-Stuyvesant,69
Williamsburg,64
Harlem,56


In [15]:
%%sql

SELECT neighbourhood,
       COUNT(n.neighbourhood_id) AS total_listings
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
GROUP BY neighbourhood
HAVING total_listings > 50
ORDER BY total_listings DESC;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood,total_listings
Bedford-Stuyvesant,69
Williamsburg,64
Harlem,56


### EXPLANATION

<span style="color:red">

**Correct Answer:** A) Bedford-Stuyvesant, Williamsburg, Harlem.

To identify the neighborhoods with the highest Airbnb activity, we perform a count of listings grouped by neighborhood. This involves joining the `listings` table with the `neighbourhoods` table using the `neighbourhood_id` as the key. We then group the data by neighborhood name and order by the total number of listings in descending order, selecting the top three.

The results show that Bedford-Stuyvesant has 69 listings, Williamsburg has 64, and Harlem has 56. These neighborhoods have the highest number of Airbnb listings, indicating the greatest concentration of Airbnb activity, which confirms option **A** as the correct choice.

</span>

---

#### Incorrect Options:

- **B)** Harlem, Crown Heights, Bushwick  
  Though Harlem appears in the top three, Crown Heights and Bushwick have fewer listings and do not appear among the top three neighborhoods by listing count.

- **C)** Williamsburg, East Village, Chelsea  
  Only Williamsburg is among the top three. East Village and Chelsea do not have enough listings to be in the top three.

- **D)** Crown Heights, Harlem, Bushwick  
  Again, while Harlem is in the top three, Crown Heights and Bushwick do not have the highest counts required to make this option correct.

---

### Question 10/20 – Unverified Hosts with Zero Availability

You are working with CityBnB data to identify hosts who might not be fully verified and who are not actively renting out their properties throughout the year.  
This could help the platform improve trust and availability.

#### Which of the following statements correctly describes unverified hosts who have zero availability throughout the year?

**A)** These hosts have identity verified status as "unconfirmed" and `availability_365` equals 0, meaning their listings are not available any day of the year.  

**B)** These hosts have identity verified status as "confirmed" and `availability_365` equals 365, meaning their listings are always available. 

**C)** These hosts have identity verified status as "unconfirmed" and `availability_365` equals 365, meaning their listings are always available despite being unverified.  

**D)** These hosts have identity verified status as "confirmed" and `availability_365` equals 0, meaning their listings are never available but are verified.


In [16]:
%%sql

SELECT 
    h.host_id, 
    host,
    identity_verified,
    availability_365
FROM listings l
JOIN hosts h 
ON l.host_id = h.host_id
WHERE h.identity_verified = 'unconfirmed' 
AND l.availability_365 = 0;


 * sqlite:///airbnb_nyc.db
Done.


host_id,host,identity_verified,availability_365
123600518,Meryl,unconfirmed,0
124039648,Kierra,unconfirmed,0
162511322,Hans,unconfirmed,0
176369804,Samet,unconfirmed,0
182476890,Lawrence,unconfirmed,0
191490859,Adam,unconfirmed,0
232268189,Juan Pablo,unconfirmed,0
234136209,Loreley,unconfirmed,0
246371399,Rishab,unconfirmed,0
271758427,Yaakov,unconfirmed,0


### EXPLANATION

<span style="color:red">

**Correct Answer:** A)

This option accurately describes hosts whose identity verification status is "unconfirmed" and whose listings show `availability_365 = 0`, indicating that their listings are not available to guests any day of the year. This suggests that these hosts are neither fully verified by the platform nor actively renting out their properties, which aligns with the goal of identifying hosts who may affect trust and availability on the platform.

To arrive at this conclusion, one would query the database to filter hosts with an identity verification status marked as "unconfirmed" and cross-reference with listings that have `availability_365 = 0`. This approach helps CityBnB flag inactive or potentially unreliable hosts.

</span>

---

#### Incorrect Options:

- **B)** describes hosts with a "confirmed" verification status and full availability (365 days). This reflects active and verified hosts, which contradicts the premise of identifying unverified, inactive hosts.

- **C)** suggests unconfirmed hosts whose listings are available all year round (365 days). While unverified, these hosts remain fully active, so they do not fit the profile of concern for trust and availability.

- **D)** indicates verified hosts whose listings are never available. Although these hosts are verified, their zero availability means they are inactive, but they do not represent unverified hosts as specified in the question.

  ---


### Question 11/20 – Neighborhoods with Highest Guest Reviews

A marketing team is seeking to focus Airbnb promotional efforts on neighborhoods with the highest levels of guest activity.  
To support this initiative, guest review data is being analyzed to identify which neighborhoods receive the highest total number of guest reviews.

#### Which three neighborhoods have the highest total guest reviews according to the data?

**A)** Bedford-Stuyvesant, Williamsburg, Harlem 

**B)** East Harlem, East Village, Hell's Kitchen

**C)** Crown Heights, Upper East Side, Upper West Side 

**D)** Midtown, Richmond Hill, Astoria



In [17]:
%%sql

SELECT 
    l.neighbourhood_id,
    neighbourhood,
    SUM(number_of_reviews) AS total_reviews
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
GROUP BY n.neighbourhood
ORDER BY total_reviews DESC;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood,total_reviews
5,Bedford-Stuyvesant,2239
10,Williamsburg,2067
11,Harlem,1642
3,East Harlem,1134
14,East Village,1104
6,Hell's Kitchen,1061
36,Crown Heights,1061
37,Upper East Side,956
7,Upper West Side,792
8,Bushwick,765


### EXPLANATION

<span style="color:red">

**Correct Answer:** A) Bedford-Stuyvesant, Williamsburg, Harlem.

To find this, we first join the listings data with the neighborhood information to associate each listing with its neighborhood. Then, we sum the total number of guest reviews (`number_of_reviews`) for each neighborhood to measure guest activity levels.

The query used calculates the total guest reviews per neighborhood and orders them in descending order. From the output, Bedford-Stuyvesant has the highest total reviews (2239), followed by Williamsburg (2067), and then Harlem (1642). These are the neighborhoods with the most guest activity, making option **A** the right choice for targeted Airbnb promotions.

</span>

---

#### Incorrect Options:

- **B)** East Harlem, East Village, Hell's Kitchen: These neighborhoods rank just below the top three, with totals of 1134, 1104, and 1061 respectively, well short of Bedford-Stuyvesant, Williamsburg, and Harlem.

- **C)** Crown Heights, Upper East Side, Upper West Side: These neighborhoods have moderate totals (1061, 956, and 792 respectively) and do not surpass the top three neighborhoods in total guest reviews.

- **D)** Midtown, Richmond Hill, Astoria: These neighborhoods have even fewer total guest reviews (Midtown at 431, Richmond Hill at 424, Astoria at 416), so they are not among the highest in guest activity.

---

### Question 12/20 – Listing with Zero Availability and Lowest Reviews

Each listing below has `availability_365 = 0`.  
Select the listing that also has the lowest `review_rate_number` and the fewest `number_of_reviews`.

 **A.**  
  Listing: Your home away from home...  
  Review Rating: 1  
  Number of Reviews: 63  

  **B.**  
  Listing: Bright and modern 1 BR in Williamsburg w/balcony  
  Review Rating: 1  
  Number of Reviews: 1  

 **C.**  
  Listing: Greenwich Village Apartment  
  Review Rating: 1  
  Number of Reviews: 2  

 **D.**  
  Listing: One bedroom APT in Prospect Park  
  Review Rating: 1  
  Number of Reviews: 5  


In [18]:
%%sql

SELECT listing_id, listing, availability_365, review_rate_number, number_of_reviews
FROM listings
WHERE review_rate_number IS NOT NULL
  AND review_rate_number <= 2
  AND availability_365 = 0
ORDER BY availability_365 ASC, review_rate_number ASC, number_of_reviews ASC;  


 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,availability_365,review_rate_number,number_of_reviews
11617647,Bright and modern 1 BR in Williamsburg w/balcony,0,1,1
43794645,Charming light-filled apartment in Crown Heights,0,1,1
44970491,Bright Minimalist Creative Retreat Bedroom BedStuy,0,1,1
50810512,Cozy apartment in Upper Manhattan!,0,1,1
13564504,Greenwich Village Apartment,0,1,2
44823027,Beautiful Sunlit Room in Brooklyn,0,1,2
49886514,"Spacious 1BR amazing view, Beach 5 min, airport 20",0,1,2
11841328,Bright Spacious BK Room with Bath,0,1,3
11869495,Beautiful & spacious apartment on Upper East Side,0,1,3
14325021,Private sanctuary close to Central park,0,1,3


### Explanation

<span style="color:red">

**Correct Answer:** B) Bright and modern 1 BR in Williamsburg w/balcony.

We are looking for a listing that meets three conditions:
- `availability_365` is 0 (meaning the listing is not available at all during the year),
- The lowest possible `review_rate_number`, and
- The fewest `number_of_reviews`.

The query filters for listings with a non-null review rating of 2 or less and no availability during the year. It then sorts by review rating and number of reviews, both ascending, so the lowest-rated listings with the fewest reviews come first (sorting by `availability_365` has no effect, because every remaining row has the value 0).

From the output, **“Bright and modern 1 BR in Williamsburg w/balcony”** has a review rating of 1 and only 1 review, the lowest values among listings with `availability_365 = 0`. Three other listings in the output tie with it on both values, but none of them is among the answer options, so B is the only option that meets both conditions.

</span>

---

#### Incorrect Options:

- **A)** “Your home away from home...” has the lowest review rating (1) but the most reviews of the four options (63), so it does not have the fewest reviews.

- **C)** “Greenwich Village Apartment” has a review rating of 1, but with 2 reviews, it does not have the fewest number of reviews.

- **D)** “One bedroom APT in Prospect Park” has a review rating of 1 but 5 reviews, which is higher than both B and C.
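
Because several listings can share the same lowest rating and review count, it can help to return every tied row instead of reading the top of a sorted list. The sketch below is one way to do that, assuming a simplified `listings` table with made-up rows (the listing names and values are hypothetical, not taken from `airbnb_nyc.db`):

```python
import sqlite3

# Toy data (hypothetical values, not taken from airbnb_nyc.db).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (listing TEXT, availability_365 INTEGER, "
    "review_rate_number INTEGER, number_of_reviews INTEGER)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?)",
    [
        ("Listing P", 0, 1, 63),
        ("Listing Q", 0, 1, 1),
        ("Listing R", 0, 1, 1),    # ties with Listing Q on both sort keys
        ("Listing S", 0, 2, 1),
        ("Listing T", 120, 1, 1),  # excluded: available for part of the year
    ],
)

# Find the lowest (rating, reviews) pair among zero-availability listings,
# then return every listing that shares that pair.
tied = conn.execute(
    """
    WITH zero_avail AS (
        SELECT * FROM listings
        WHERE availability_365 = 0 AND review_rate_number IS NOT NULL
    ),
    best AS (
        SELECT review_rate_number, number_of_reviews
        FROM zero_avail
        ORDER BY review_rate_number ASC, number_of_reviews ASC
        LIMIT 1
    )
    SELECT z.listing
    FROM zero_avail z
    JOIN best b
      ON z.review_rate_number = b.review_rate_number
     AND z.number_of_reviews = b.number_of_reviews
    ORDER BY z.listing
    """
).fetchall()
print(tied)
```

In this toy data two listings tie, which mirrors the situation in the real output where several listings share a rating of 1 and a single review.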


---


### Question 13/20 – Host with Highest Number of Listings

Which host has the highest number of listings in the dataset?

 **A.** Kazuya 
 
 **B.** Kara 
 
 **C.** Sonder (NYC)
 
 **D.** Airbnb Official  


In [36]:
%%sql

SELECT h.host_id, host, calculated_host_listings_count
FROM listings l
JOIN hosts h ON l.host_id = h.host_id
WHERE calculated_host_listings_count >= 100
GROUP BY h.host_id
ORDER BY calculated_host_listings_count DESC;

 * sqlite:///airbnb_nyc.db
Done.


host_id,host,calculated_host_listings_count
1936282448,Sonder (NYC),327
1844778040,Sonder (NYC),327
1370945204,Sonder (NYC),327
1198781658,Sonder (NYC),327
574716087,Sonder (NYC),327
1968430100,Kara,121
750461323,Kara,121
882085888,Kazuya,103


### Explanation

<span style="color:red">
<strong>Correct Answer: C) Sonder (NYC) — Listings: 327</strong> 

To determine the host with the most listings:
- The `listings` and `hosts` tables were joined on `host_id`.
- Results were filtered to include only hosts with 100 or more listings.
- The data was grouped by `host_id` and ordered by `calculated_host_listings_count` in descending order.

The output lists **Sonder (NYC)** under five different `host_id` values, each with a `calculated_host_listings_count` of **327**, the highest value in the result. This makes it the host with the highest number of listings in the dataset.

</span>

---

#### Incorrect Options:

- **A)** Kazuya — Has only **103 listings**, which is far fewer than Sonder (NYC)’s 327.
- **B)** Kara — Has **121 listings**, also significantly less than Sonder (NYC).
- **D)** Airbnb Official — Does not appear in the result set and does not have the highest number of listings.
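
Since the same host name appears under several `host_id` values, a possible cross-check is to count listings per host name directly instead of relying on the precomputed count column. This is only a sketch on hypothetical toy tables (the host names, ids and rows are invented for illustration):

```python
import sqlite3

# Toy tables (hypothetical rows): one host name registered under two host_ids.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, host_id INTEGER);
    INSERT INTO hosts VALUES (1, 'Host A'), (2, 'Host A'), (3, 'Host B');
    INSERT INTO listings VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 3);
    """
)

# Count listings per host name and report how many host_ids share that name.
per_name = conn.execute(
    """
    SELECT h.host,
           COUNT(*) AS listing_count,
           COUNT(DISTINCT h.host_id) AS host_ids
    FROM listings l
    JOIN hosts h ON l.host_id = h.host_id
    GROUP BY h.host
    ORDER BY listing_count DESC, h.host
    """
).fetchall()
print(per_name)
```

Grouping by name treats all accounts with the same name as one host, which is itself an assumption; two unrelated hosts could share a name.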

---

### Question 14/20 – Neighborhood with Highest Guest Reviews for High Availability Listings

Which neighborhood has the highest number of guest reviews among listings that are available most of the year (`availability_365` ≥ 300)?

**A.** Hell's Kitchen 

**B.** Upper East Side

**C.** Bedford-Stuyvesant

**D.** East Village  


In [88]:
%%sql

SELECT 
    l.neighbourhood_id,
    neighbourhood_group_id,
    neighbourhood,
    COUNT(l.neighbourhood_id) AS total_high_availability_listings,
    SUM(number_of_reviews) AS total_reviews
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
WHERE availability_365 >= 300 
GROUP BY n.neighbourhood
HAVING COUNT(l.neighbourhood_id) >= 5
ORDER BY total_reviews DESC, total_high_availability_listings DESC;



### Explanation

<span style="color:red"> 
<strong>Correct Answer: B) Upper East Side — Total Reviews: 626</strong> 

To find the neighborhood with the highest number of guest reviews among listings with **high availability**:
- Listings were filtered where `availability_365 >= 300` (indicating they are available most of the year).
- These listings were joined with the `neighbourhoods` table.
- Results were grouped by neighborhood, and neighborhoods with at least **5 high-availability listings** were considered.
- The total number of reviews was calculated for each neighborhood, and the results were ordered in **descending** order of total guest reviews.

**Upper East Side** came out on top with **626 total reviews** across 5 high-availability listings.

</span>

---

#### Incorrect Options:

- **A) Hell's Kitchen** — Has **9 listings**, but only **424 reviews**, which is less than Upper East Side.
- **C) Bedford-Stuyvesant** — Has **11 listings**, but only **252 reviews**, which is significantly lower.
- **D) East Village** — Has **7 listings** and **322 reviews**, also below Upper East Side.
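
The query combines a row filter (`WHERE`) with a group filter (`HAVING`), and the two act at different stages. The following sketch, on hypothetical toy tables with invented neighbourhood names, shows how the `HAVING` threshold can remove a group that would otherwise rank first:

```python
import sqlite3

# Toy tables (hypothetical rows, not from airbnb_nyc.db).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE neighbourhoods (neighbourhood_id INTEGER PRIMARY KEY, neighbourhood TEXT);
    CREATE TABLE listings (neighbourhood_id INTEGER, availability_365 INTEGER,
                           number_of_reviews INTEGER);
    INSERT INTO neighbourhoods VALUES (1, 'North'), (2, 'South');
    INSERT INTO listings VALUES
        (1, 350, 40), (1, 320, 10), (1, 100, 500),
        (2, 365, 90);
    """
)

rows = conn.execute(
    """
    SELECT n.neighbourhood,
           COUNT(*) AS high_availability_listings,
           SUM(l.number_of_reviews) AS total_reviews
    FROM listings l
    JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
    WHERE l.availability_365 >= 300      -- row filter, applied before grouping
    GROUP BY n.neighbourhood
    HAVING COUNT(*) >= 2                 -- group filter, applied after grouping
    ORDER BY total_reviews DESC
    """
).fetchall()
print(rows)
```

Here 'South' has more reviews from high-availability listings (90) than 'North' (50), but it is dropped because it has only one qualifying listing. The threshold of 5 listings in the exam query works the same way, so it is part of how the answer is defined.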

---

### Question 15/20 – Listing with Highest Estimated Annual Revenue

Which listing has the highest estimated annual revenue based on nightly price × `availability_365`?

**A.** One Bedroom Apt, Suitable for Two  
**B.** Interfaith Retreat Guest Rooms (Bhakti)  
**C.** 100$  
**D.** Small room for 1 Person-Best Value  


In [None]:
%%sql
SELECT 
    listing_id,
    listing,
    neighbourhood_id,
    host_id,
    rt.type,
    ROUND(price * availability_365) AS estimated_annual_revenue
FROM listings l
JOIN room_types rt ON rt.room_type_id = l.room_type_id
WHERE price IS NOT NULL AND
    availability_365 IS NOT NULL
ORDER BY 
    estimated_annual_revenue DESC
LIMIT 5;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,neighbourhood_id,host_id,type,estimated_annual_revenue
55836994,Small room for 1 Person-Best Value,117,1380561739,Private room,423765.0
48129647,"One train ride to Times SQ, Central Park, LOCATION",3,548848159,Entire home/apt,408870.0
50965156,Interfaith Retreat Guest Rooms (Bhakti),28,1631373893,Private room,404234.0
12013646,100$,32,334356607,Private room,402597.0
2790788,"One Bedroom Apt, Suitable for Two",115,132238305,Entire home/apt,393464.0


### Explanation

<span style="color:red">
<strong>Correct Answer: D) Small room for 1 Person-Best Value</strong>


To determine the listing with the highest estimated annual revenue, we multiply the nightly price by the number of days the listing is available during the year (`availability_365`). The result is an upper bound on potential earnings, because it assumes the listing is booked on every available day.

Using a SQL query that multiplies `price` by `availability_365` and orders the results in descending order, we identify the top listings by this revenue estimate.

The listing **“Small room for 1 Person-Best Value”** ranks first with an estimated annual revenue of **$423,765**. It does so as a private room rather than an entire apartment, which shows that this estimate is driven by price and availability, not by room type.

</span>

---

#### Incorrect Options:

- **A) One Bedroom Apt, Suitable for Two**: Estimated revenue approximately **393,464**, which, although substantial, is less than the top listing.
  
- **B) Interfaith Retreat Guest Rooms (Bhakti)**: Estimated revenue of **404,234**, also less than the top revenue earner.
  
- **C) 100$**: Despite the name, this listing generates **402,597**, still falling short of the leading figure.
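
The revenue estimate is a plain product, so a cheaper listing with more available days can outrank a pricier one. A minimal sketch of the calculation, using hypothetical listings and prices on a simplified table:

```python
import sqlite3

# Toy data (hypothetical listings and prices).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (listing TEXT, price INTEGER, availability_365 INTEGER)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("Room X", 1000, 365),   # 1000 * 365 = 365000
        ("Flat Y", 1200, 200),   # 1200 * 200 = 240000
        ("Loft Z", None, 300),   # skipped: no price recorded
    ],
)

revenue = conn.execute(
    """
    SELECT listing, price * availability_365 AS estimated_annual_revenue
    FROM listings
    WHERE price IS NOT NULL AND availability_365 IS NOT NULL
    ORDER BY estimated_annual_revenue DESC
    """
).fetchall()
print(revenue)
```

The alias used in `ORDER BY` has to match the alias defined in the `SELECT` list, otherwise SQLite reports an unknown column.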


---


### Question 16/20 – Neighborhoods with Most Negative Reviews

Which five neighborhoods receive the most negative reviews (`review_rate_number` < 2)?

 **A.** Harlem, Bedford-Stuyvesant, Crown Heights, Williamsburg, Bushwick 

 **B.** East Village, Hell's Kitchen, Bushwick, Upper East Side, Harlem 
 
 **C.** Crown Heights, Upper West Side, SoHo, Williamsburg, Bushwick  

 **D.** Bedford-Stuyvesant, Tribeca, Bushwick, Chelsea, Crown Heights  


In [43]:
%%sql

SELECT 
    l.neighbourhood_id,
    n.neighbourhood_group_id,
    neighbourhood,
    COUNT(l.neighbourhood_id) AS negative_review_count
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
WHERE l.review_rate_number < 2
GROUP BY neighbourhood
HAVING negative_review_count >= 5
ORDER BY negative_review_count DESC;

 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood_group_id,neighbourhood,negative_review_count
11,2,Harlem,10
5,1,Bedford-Stuyvesant,10
36,1,Crown Heights,9
10,1,Williamsburg,5
8,1,Bushwick,5


### Explanation

<span style="color:red"> 
<strong>Correct Answer: A) Harlem, Bedford-Stuyvesant, Crown Heights, Williamsburg, Bushwick</strong>


To determine which neighborhoods receive the most negative reviews, we keep only listings whose `review_rate_number` is less than 2, which indicates a poor guest rating. Joining `listings` with `neighbourhoods` adds the neighborhood names, and `COUNT` then gives the number of these low-rated listings per neighborhood (each such listing is counted as one negative review). The `HAVING` clause keeps neighborhoods with at least 5 of them, and ordering by the count in descending order puts the neighborhoods with the most negative feedback first.

The analysis reveals:  
- Harlem and Bedford-Stuyvesant lead with 10 negative reviews each.  
- Crown Heights follows closely with 9.  
- Both Williamsburg and Bushwick record 5 negative reviews each.

This data-driven approach validates option A as the accurate selection.
</span>  
    
---

#### Incorrect Options:

- **B)** Neighborhoods like East Village and Upper East Side lack sufficient negative review counts to be included in the top five.
   
- **C)** Upper West Side and SoHo do not meet the threshold of five or more negative reviews.
   
- **D)** Tribeca and Chelsea have comparatively fewer listings with negative reviews and thus are excluded.


### Question 17/20 – Analysis of Low-Rated Listings in Bedford-Stuyvesant and Williamsburg

Based on listings with poor guest ratings (`review_rate_number` < 2), which statement is most accurate about Bedford-Stuyvesant and Williamsburg?

 **A.** Bedford-Stuyvesant listings have lower average prices than Williamsburg for all room types. 
 
 **B.** Williamsburg has a Shared room listed at a higher average price than any other room type in both neighborhoods.
 
 **C.** Both neighborhoods only offer Private rooms among low-rated listings.  
 
 **D.** Bedford-Stuyvesant has a higher average number of reviews for low-rated listings than Williamsburg.  


In [58]:
%%sql

SELECT listing_id, 
       listing,
       rt.type,
       n.neighbourhood,
       ROUND(AVG(price)) AS price,
       ROUND(AVG(number_of_reviews)) AS avg_number_of_reviews,
       ROUND(AVG(review_rate_number)) AS avg_review_rate_number
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
JOIN room_types rt ON l.room_type_id = rt.room_type_id
WHERE  neighbourhood IN ('Bedford-Stuyvesant', 'Williamsburg') 
AND review_rate_number < 2
GROUP BY neighbourhood, rt.type 
ORDER BY neighbourhood, rt.type, l.price DESC;

 * sqlite:///airbnb_nyc.db
Done.


listing_id,listing,type,neighbourhood,price,avg_number_of_reviews,avg_review_rate_number
9695090,"Bright, airy apartment in Brooklyn",Entire home/apt,Bedford-Stuyvesant,772.0,35.0,1.0
2884679,Cozy Room in Family Home..BKLYN!!!,Private room,Bedford-Stuyvesant,477.0,48.0,1.0
11617647,Bright and modern 1 BR in Williamsburg w/balcony,Entire home/apt,Williamsburg,611.0,1.0,1.0
8702055,Big room in East Williamsburg Loft,Private room,Williamsburg,342.0,60.0,1.0
5854396,Wildlife Loft Living room adventure,Shared room,Williamsburg,875.0,25.0,1.0


### Explanation

<span style="color:red">
<strong>Correct Answer: B) Williamsburg has a Shared room listed at a higher average price than any other room type in both neighborhoods.</strong>


To analyze the pricing and review dynamics of low-rated listings (where `review_rate_number < 2`) in Bedford-Stuyvesant and Williamsburg, we segmented the dataset by neighborhood and room type. For each group, we computed average nightly prices and average review counts.

- In Williamsburg, the **Shared room** category stands out with an average price of **875**, surpassing both Entire homes/apartments at 611 and Private rooms at 342.  
- In contrast, Bedford-Stuyvesant’s most expensive low-rated listings are **Entire homes/apartments** at an average price of **772**, followed by Private rooms at 477; no low-rated Shared rooms appear for Bedford-Stuyvesant in the output.

This price distribution reveals a unique characteristic in Williamsburg: Shared rooms command the highest average price among low-rated listings, a scenario not observed in Bedford-Stuyvesant.

</span>

---

#### Incorrect Options:

- **A)** Claims Bedford-Stuyvesant listings have lower average prices for all room types, but the data shows the opposite for both room types the neighborhoods share: Entire homes/apartments (772 vs. 611) and Private rooms (477 vs. 342).
   
- **C)** Suggests only Private rooms exist among low-rated listings in both neighborhoods, but Shared and Entire home types also appear.

   
- **D)** Posits Bedford-Stuyvesant has a higher average number of reviews for low-rated listings, but this does not hold across room types: Williamsburg’s Private rooms average 60 reviews against 48 in Bedford-Stuyvesant, even though its Entire homes/apartments average only 1 against 35.
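
One point worth keeping in mind when reading the output: the query selects `listing_id` and `listing` without aggregating them, and SQLite then fills those columns from a single row it picks from each group, so they do not describe the group as a whole. A sketch of an aggregate-only variant, assuming a simplified, denormalised table named `low_rated` with hypothetical rows:

```python
import sqlite3

# Toy data: a hypothetical, denormalised table of low-rated listings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE low_rated (neighbourhood TEXT, room_type TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO low_rated VALUES (?, ?, ?)",
    [
        ("East", "Shared room", 900),
        ("East", "Shared room", 850),
        ("East", "Private room", 300),
        ("West", "Entire home/apt", 700),
        ("West", "Entire home/apt", 800),
    ],
)

# Every selected column is either a grouping key or an aggregate,
# so each output value describes the whole group.
groups = conn.execute(
    """
    SELECT neighbourhood,
           room_type,
           COUNT(*) AS listings,
           ROUND(AVG(price)) AS avg_price
    FROM low_rated
    GROUP BY neighbourhood, room_type
    ORDER BY avg_price DESC
    """
).fetchall()
print(groups)
```

Adding `COUNT(*)` also shows how many listings stand behind each average, which matters when a group contains only one or two rows.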

---

### Question 18/20 – Service Fee vs Nightly Price

There are Airbnb listings where the service fee is more than half (50%) of the nightly price.

**A.** True

**B.** False  


In [61]:
%%sql

SELECT 
    listing_id, 
    listing, 
    price, 
    service_fee,
    ROUND((service_fee / price) * 100, 2) AS service_fee_percent
FROM listings
WHERE price > 0 AND 
    (service_fee / price) > 0.5
ORDER BY service_fee_percent ASC;


 * sqlite:///airbnb_nyc.db
Done.


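listing_id,listing,price,service_fee,service_fee_percent


### Explanation

<span style="color:red">
<strong>Correct Answer: B) False</strong>

The query keeps only listings with a positive `price` whose `service_fee / price` ratio exceeds 0.5. It returns the column header and no rows, so no listing in the dataset has a service fee above half of its nightly price.

</span>

---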
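
A caveat when testing a ratio like `service_fee / price` in SQLite: if both columns hold integers, `/` performs integer division, so any ratio below 1 becomes 0 and never passes a `> 0.5` test. Whether this applies to `airbnb_nyc.db` depends on how the columns are stored, which is not shown here. The sketch below uses a hypothetical table with integer columns to illustrate the difference:

```python
import sqlite3

# Toy data (hypothetical): integer price and service_fee columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (listing TEXT, price INTEGER, service_fee INTEGER)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [("Listing A", 100, 60), ("Listing B", 100, 20)],
)

# Integer division: 60 / 100 evaluates to 0, so no row passes the filter.
int_div = conn.execute(
    "SELECT listing FROM listings WHERE price > 0 AND (service_fee / price) > 0.5"
).fetchall()

# Multiplying by 1.0 forces floating-point division: 60 * 1.0 / 100 = 0.6.
float_div = conn.execute(
    "SELECT listing FROM listings WHERE price > 0 AND (service_fee * 1.0 / price) > 0.5"
).fetchall()

print(int_div, float_div)
```

Writing the condition as `service_fee * 1.0 / price > 0.5` (or `service_fee * 2 > price`) gives the same result regardless of the column type.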


### Question 19/20 – Analyzing Listing Counts and Average Prices by Neighbourhood and Room Type

You are analyzing Airbnb listing data to understand the distribution and pricing of different room types across neighbourhoods. Which SQL query will provide the number of **listings per room type** in each neighbourhood and the **average price per listing,** ordered by the number of listings and average price in descending order?

**A)**
SELECT
    l.neighbourhood_id,
    neighbourhood,
    rt.type AS room_type,
    COUNT(l.room_type_id) AS listings_per_room_type,
    ROUND(AVG(price)) AS avg_price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
JOIN room_types rt ON l.room_type_id = rt.room_type_id
GROUP BY neighbourhood, rt.type
ORDER BY listings_per_room_type DESC, avg_price DESC;

**B)**
SELECT
    neighbourhood,
    room_type,
    SUM(price) AS total_price,
    COUNT(*) AS listings_count
FROM listings
GROUP BY neighbourhood, room_type
ORDER BY listings_count DESC;

**C)**
SELECT
    neighbourhood,
    room_type,
    AVG(price) AS average_price
FROM listings
GROUP BY room_type
ORDER BY average_price DESC;

**D)**
SELECT
    neighbourhood,
    room_type,
    COUNT(*) AS total_listings,
    AVG(availability_365) AS avg_availability
FROM listings
GROUP BY neighbourhood
ORDER BY total_listings DESC;



In [66]:
%%sql

SELECT
    l.neighbourhood_id,
    neighbourhood,
    rt.type AS room_type,
    COUNT(l.room_type_id) AS listings_per_room_type,
    ROUND(AVG(price)) AS avg_price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.neighbourhood_id
JOIN room_types rt ON l.room_type_id = rt.room_type_id
GROUP BY neighbourhood, rt.type
ORDER BY listings_per_room_type DESC, avg_price DESC;


 * sqlite:///airbnb_nyc.db
Done.


neighbourhood_id,neighbourhood,room_type,listings_per_room_type,avg_price
8,Bushwick,Private room,38,612.0
5,Bedford-Stuyvesant,Private room,38,555.0
10,Williamsburg,Entire home/apt,35,618.0
5,Bedford-Stuyvesant,Entire home/apt,31,633.0
11,Harlem,Private room,28,625.0
10,Williamsburg,Private room,27,575.0
11,Harlem,Entire home/apt,27,505.0
6,Hell's Kitchen,Entire home/apt,24,569.0
36,Crown Heights,Entire home/apt,21,741.0
14,East Village,Entire home/apt,18,686.0


### Explanation

<span style="color:red">
<strong>Correct Answer: A)</strong>


This query correctly joins the `listings` table with `neighbourhoods` and `room_types` to include **neighbourhood names** and **room type descriptions** in the output. It then:

- **Groups by** both `neighbourhood` and `room_type`, enabling analysis at the intersection of these two dimensions.
- **Calculates**:
  - `COUNT(l.room_type_id)`: the **total number of listings** for each room type in each neighbourhood.
  - `ROUND(AVG(price))`: the **average price** per listing, rounded to the nearest whole number.
- **Orders** the results by:
  1. Number of listings (descending),
  2. Then average price (also descending).

This approach meets the goal of identifying neighborhood-room type combinations with the **most listings** and **highest prices**.
 
 </span>

---

#### Incorrect Options:

- **B)** Uses `SUM(price)` instead of `AVG(price)`, which returns the sum of nightly prices rather than the **average price** per listing. It also omits the **joins** that the correct query needs to reach the neighbourhood names and room type descriptions.
- **C)** Groups **only by room type**, so it fails to segment data by **neighbourhood**.
- **D)** Groups **only by neighbourhood**, omitting room type segmentation, and calculates **average availability** rather than **price**.
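
To see why grouping by both columns matters, the sketch below runs a two-key and a one-key grouping on the same hypothetical toy table (a simplified `listings` table with invented rows and no joins):

```python
import sqlite3

# Toy data (hypothetical rows on a simplified, denormalised table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (neighbourhood TEXT, room_type TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("North", "Private room", 100),
        ("North", "Private room", 200),
        ("North", "Entire home/apt", 400),
        ("South", "Private room", 300),
        ("South", "Private room", 500),
    ],
)

# Two grouping keys: one row per (neighbourhood, room type) combination.
by_both = conn.execute(
    """
    SELECT neighbourhood, room_type,
           COUNT(*) AS listings_per_room_type,
           ROUND(AVG(price)) AS avg_price
    FROM listings
    GROUP BY neighbourhood, room_type
    ORDER BY listings_per_room_type DESC, avg_price DESC
    """
).fetchall()

# One grouping key: neighbourhoods are merged, so the per-neighbourhood detail is lost.
by_type = conn.execute(
    "SELECT room_type, COUNT(*) FROM listings GROUP BY room_type ORDER BY room_type"
).fetchall()

print(by_both)
print(by_type)
```

The two private-room groups tie on listing count, so the second sort key (average price, descending) decides their order.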

---

### Question 20/20 – Evaluating Price Trends by Stay Duration and Room Type

You're analyzing pricing trends in the Airbnb NYC dataset to help budget-conscious travelers make informed choices. You run a query to compare average prices for listings based on stay duration and room type. The dataset classifies stays as either **short-term (<7 nights) or long-term (≥7 nights)** using `minimum_nights`, and calculates the average price per room type:

SELECT
    CASE
    WHEN minimum_nights < 7 THEN 'Short Stay (<7 nights)'
    ELSE 'Long Stay (≥7 nights)'
    END AS stay_type,
    rt.type,
    ROUND(AVG(price),2) AS price
FROM Listings l
JOIN room_types rt ON l.room_type_id = rt.room_type_id
GROUP BY stay_type, rt.type
ORDER BY price DESC;

**Based on this analysis, which of the following insights is most accurate?**

**A)** Guests booking shared rooms for short stays should expect the highest average cost compared to other room types and durations.

**B)** Long stays in private rooms are costlier than any type of short stay.

**C)** Entire homes consistently offer the lowest average prices across all durations.

**D)** Private rooms show the largest price increase when switching from long to short stays.



In [60]:
%%sql

SELECT  
    CASE 
    WHEN minimum_nights < 7 THEN 'Short Stay (<7 nights)'
    ELSE 'Long Stay (≥7 nights)'
    END AS stay_type,
    rt.type,
    ROUND(AVG(price),2) AS price
FROM Listings l
JOIN room_types rt ON l.room_type_id = rt.room_type_id
GROUP BY stay_type, rt.type
ORDER BY price DESC;


 * sqlite:///airbnb_nyc.db
Done.


stay_type,type,price
Short Stay (<7 nights),Shared room,791.93
Long Stay (≥7 nights),Entire home/apt,629.66
Short Stay (<7 nights),Entire home/apt,628.4
Short Stay (<7 nights),Private room,615.38
Long Stay (≥7 nights),Private room,568.29
Long Stay (≥7 nights),Shared room,564.33


### Explanation

<span style="color:red">
<strong>Correct Answer: A) Guests booking shared rooms for short stays should expect the highest average cost compared to other room types and durations.</strong>


To assess pricing based on the **length of stay**, listings were categorized using a `CASE` statement:
- **Short Stay**: Fewer than 7 nights
- **Long Stay**: 7 nights or more

Average prices were then calculated for each **room type** within these stay duration categories.

From the analysis: **Shared rooms for short stays** have the **highest average cost** ($791.93), exceeding the average price of entire homes/apartments and private rooms across both short and long stays.

</span>

---

#### Incorrect Options:

- **B)** *Long stays in private rooms are costlier than any type of short stay*  
   Incorrect. Long-stay private rooms ($568.29) are cheaper than **all** short-stay averages.

- **C)** *Entire homes consistently offer the lowest average prices across all durations*  
   Incorrect. Entire homes are among the **most expensive** options: they rank second and third overall ($629.66 for long stays, $628.40 for short stays).

- **D)** *Private rooms show the largest price increase from long to short stays*  
   Incorrect. The increase is **greater** for shared rooms:  
  - Shared room increase = 791.93 − 564.33 = **227.60** 
  - Private room increase = 615.38 − 568.29 = **47.09**
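
The comparison for option D can be checked for all three room types at once. The short sketch below recomputes the short-stay minus long-stay difference from the averages in the query output above (the dictionary layout and variable names are just one convenient way to hold those figures):

```python
# Average prices copied from the query output above.
avg_price = {
    ("short", "Shared room"): 791.93,
    ("long", "Shared room"): 564.33,
    ("short", "Entire home/apt"): 628.40,
    ("long", "Entire home/apt"): 629.66,
    ("short", "Private room"): 615.38,
    ("long", "Private room"): 568.29,
}

room_types = sorted({room_type for _, room_type in avg_price})

# Price change when moving from a long stay to a short stay, per room type.
increase = {
    room_type: round(avg_price[("short", room_type)] - avg_price[("long", room_type)], 2)
    for room_type in room_types
}

largest = max(increase, key=increase.get)
print(increase)
print(largest)
```

Entire homes come out slightly negative (628.40 − 629.66 = −1.26), so shared rooms are the only room type with a large increase.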

---
