## 1. Neighborhood Listing Saturation

To begin mapping the scope of Airbnb in San Francisco, you are tasked with finding out how listings are distributed across neighborhoods. CityBnB wants to understand the saturation level in each area to identify where Airbnb might be displacing long-term residents. Use SQL to find which neighborhoods have the most listings.

In [None]:
SELECT neighbourhood, COUNT(*) AS listing_count
FROM listings
GROUP BY neighbourhood
ORDER BY listing_count DESC;

In [None]:
SELECT neighbourhood, COUNT(id) AS total_listings
FROM listings
GROUP BY neighbourhood
HAVING COUNT(id) > 50
ORDER BY total_listings DESC;
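
Raw counts do not show how concentrated the listings are. As a minimal sketch, the cell below runs a share-of-total variant of the query above against an in-memory SQLite table through Python's `sqlite3` module. The three neighbourhoods and their counts are invented for illustration, and `pct_of_city` is a hypothetical column name. Share of citywide listings is only one possible proxy for saturation, since a fuller measure would also need housing-unit counts per neighbourhood.

```python
import sqlite3

# Toy data: neighbourhood names and counts are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, neighbourhood TEXT)")
rows = [("Mission",)] * 5 + [("Marina",)] * 3 + [("Sunset",)] * 2
conn.executemany("INSERT INTO listings (neighbourhood) VALUES (?)", rows)

# Each neighbourhood's count divided by the citywide total, as a percentage.
share_query = """
SELECT neighbourhood,
       COUNT(*) AS listing_count,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM listings), 1) AS pct_of_city
FROM listings
GROUP BY neighbourhood
ORDER BY listing_count DESC;
"""
shares = conn.execute(share_query).fetchall()
for row in shares:
    print(row)
```

Multiplying by `100.0` rather than `100` forces floating-point division, which matters in engines that truncate integer division.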

## 2. High-Priced Listings by Neighborhood

CityBnB suspects that certain listings in traditionally tourist-heavy areas are priced far beyond what locals can afford. These listings may be driving gentrification or may be managed by commercial operators. Identify the most expensive listings and the neighborhoods they belong to.

In [None]:
SELECT l.id, l.name, n.neighbourhood, l.price
FROM listings l
JOIN neighbourhoods n ON l.neighbourhood_id = n.id
ORDER BY l.price DESC
LIMIT 10;

In [None]:
SELECT neighbourhood, MAX(price) AS highest_price
FROM listings
GROUP BY neighbourhood
ORDER BY highest_price DESC;

## 3. Room Type Distribution by Neighborhood

You’re now investigating whether Airbnb is still about sharing homes or whether entire-home rentals now dominate the platform. By analyzing the prevalence of each room type by neighborhood, CityBnB can judge whether the platform is moving away from its original intent.

In [None]:
SELECT neighbourhood, room_type, COUNT(*) AS total
FROM listings
GROUP BY neighbourhood, room_type
ORDER BY neighbourhood, total DESC;

In [None]:
SELECT room_type, COUNT(*) AS listing_count
FROM listings
WHERE neighbourhood = 'Mission'
GROUP BY room_type
ORDER BY listing_count DESC;
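
To compare neighbourhoods directly, it can help to reduce the room-type breakdown to a single percentage. The sketch below uses conditional aggregation on an in-memory SQLite table. It assumes the `room_type` column uses the labels `'Entire home/apt'` and `'Private room'`, which should be checked against the real data, and the rows themselves are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, neighbourhood TEXT, room_type TEXT)"
)
# Invented rows; the room_type labels are an assumption about the real data.
rows = (
    [("Mission", "Entire home/apt")] * 3 + [("Mission", "Private room")]
    + [("Sunset", "Entire home/apt")] + [("Sunset", "Private room")] * 3
)
conn.executemany("INSERT INTO listings (neighbourhood, room_type) VALUES (?, ?)", rows)

# SUM(CASE ...) counts only entire-home rows; COUNT(*) counts every row in the group.
query = """
SELECT neighbourhood,
       COUNT(*) AS total,
       ROUND(100.0 * SUM(CASE WHEN room_type = 'Entire home/apt' THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS pct_entire_home
FROM listings
GROUP BY neighbourhood
ORDER BY pct_entire_home DESC;
"""
entire_home_share = conn.execute(query).fetchall()
for row in entire_home_share:
    print(row)
```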

## 4. Detecting Commercial Operators

CityBnB believes that Airbnb is being used by commercial operators who run multiple listings, similar to unlicensed hotels. Your task is to identify hosts who own more than five listings, potentially exploiting the platform at scale.

In [None]:
SELECT host_id, COUNT(*) AS listing_count
FROM listings
GROUP BY host_id
HAVING COUNT(*) > 5
ORDER BY listing_count DESC;

In [None]:
SELECT host_id, COUNT(id) AS total_listings
FROM listings
GROUP BY host_id
HAVING COUNT(id) BETWEEN 6 AND 50;
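
A list of multi-listing hosts says little about scale until it is compared with the whole market. As a rough sketch, the cell below computes how many listings are held by hosts above the five-listing threshold and what share of all listings that represents. The host IDs and counts are hypothetical, and `host_counts`, `multi_hosts`, `listings_held`, and `pct_of_listings` are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, host_id INTEGER)")
# Hypothetical hosts: host 1 has 7 listings, host 2 has 2, host 3 has 1.
rows = [(1,)] * 7 + [(2,)] * 2 + [(3,)]
conn.executemany("INSERT INTO listings (host_id) VALUES (?)", rows)

# First count listings per host, then summarise only hosts above the threshold.
query = """
WITH host_counts AS (
    SELECT host_id, COUNT(*) AS listing_count
    FROM listings
    GROUP BY host_id
)
SELECT COUNT(*) AS multi_hosts,
       SUM(listing_count) AS listings_held,
       ROUND(100.0 * SUM(listing_count) / (SELECT COUNT(*) FROM listings), 1)
           AS pct_of_listings
FROM host_counts
WHERE listing_count > 5;
"""
multi_hosts, listings_held, pct_of_listings = conn.execute(query).fetchone()
print(multi_hosts, listings_held, pct_of_listings)
```

Filtering with `WHERE` on the CTE, instead of `HAVING` on an alias, keeps the query portable across engines that do not allow aliases in `HAVING`.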

## 5. Detecting Ghost Listings

One concerning pattern is the presence of ghost listings—properties that appear available year-round but have zero engagement or reviews. These could be fake listings created to inflate Airbnb's supply or mislead consumers. Help CityBnB detect such cases.

In [None]:
SELECT id, name, availability_365, number_of_reviews
FROM listings
WHERE availability_365 = 365 AND number_of_reviews = 0;

In [None]:
SELECT *
FROM listings
WHERE number_of_reviews = 0 AND availability_365 >= 300;

## 6. Guest Review Distribution by Neighborhood

Traveler activity is often reflected in the number of reviews per neighborhood. Higher review counts might signal over-tourism or an attractive tourist district. Analyze which neighborhoods are receiving the most guest interaction based on review counts.

In [None]:
SELECT n.neighbourhood, COUNT(r.id) AS total_reviews
FROM reviews r
JOIN listings l ON r.listing_id = l.id
JOIN neighbourhoods n ON l.neighbourhood_id = n.id
GROUP BY n.neighbourhood
ORDER BY total_reviews DESC;

In [None]:
SELECT neighbourhood, SUM(number_of_reviews) AS review_volume
FROM listings
GROUP BY neighbourhood
ORDER BY review_volume DESC;

## 7. Host Verification vs. Review Score

CityBnB wants to understand if verified hosts offer a better experience than unverified ones. This could influence future policy, platform changes, and trust-building with guests. Use SQL to compare average review scores between verified and unverified hosts.

In [None]:
SELECT host_identity_verified, AVG(review_scores_rating) AS avg_rating
FROM listings
GROUP BY host_identity_verified;

In [None]:
-- Rows where host_identity_verified is NULL also fall into 'Unverified' via the ELSE branch.
SELECT
    CASE WHEN host_identity_verified = 't' THEN 'Verified' ELSE 'Unverified' END AS status,
    AVG(review_scores_rating) AS average_score
FROM listings
GROUP BY status;

## 8. Estimating Annual Revenue Potential

Commercial-style listings often show full-year availability and premium pricing. CityBnB wants to estimate which listings have the highest earning potential. Estimate revenue by multiplying the nightly price by the number of available nights (`availability_365`). Treat the result as an upper bound, since it assumes every available night is booked.

In [None]:
SELECT id, availability_365, price, (price * availability_365) AS estimated_revenue
FROM listings
ORDER BY estimated_revenue DESC
LIMIT 10;

In [None]:
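-- Availability-weighted nightly price: the annual estimate (price * availability_365) divided by 365.
SELECT id, name, neighbourhood, ROUND(price * (availability_365 / 365.0), 2) AS potential_income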
FROM listings
ORDER BY potential_income DESC
LIMIT 10;
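
The two formulas used in this section rank listings identically but report different units. As a small worked example, the cell below computes both on two invented listings in an in-memory SQLite table: `annual_estimate` is price times available nights, and `daily_average` is that same figure spread over 365 days. Both column names are chosen for this sketch, and `price` is assumed to be stored as a number. If it is stored as text such as a currency string, it would need cleaning first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, price REAL, availability_365 INTEGER)"
)
# Invented listings: (nightly price, nights available in the next 365 days).
conn.executemany(
    "INSERT INTO listings (price, availability_365) VALUES (?, ?)",
    [(200, 365), (150, 180)],
)

query = """
SELECT id,
       price * availability_365 AS annual_estimate,
       ROUND(price * (availability_365 / 365.0), 2) AS daily_average
FROM listings
ORDER BY annual_estimate DESC;
"""
estimates = conn.execute(query).fetchall()
for listing_id, annual, daily in estimates:
    print(listing_id, annual, daily)
```

For the second listing, 150 × 180 = 27,000 per year, and 27,000 / 365 ≈ 73.97 per day, so the two columns differ only by the constant factor 365.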

## 9. High Price, Low Engagement Listings

Listings that charge a premium price yet have very few reviews might be underperforming, fraudulent, or exploiting pricing gaps. Help CityBnB surface listings that may be charging too much relative to their engagement.

In [None]:
SELECT id, name, price, number_of_reviews
FROM listings
WHERE number_of_reviews < 5 AND price > 300
ORDER BY price DESC;

In [None]:
SELECT *
FROM listings
WHERE price > 400 AND number_of_reviews = 0;

## 10. Host Revenue Aggregation Analysis

CityBnB believes that a small number of power hosts are profiting disproportionately and potentially skirting local laws. These hosts may own dozens of properties, operate through shell companies, and never live in San Francisco. Using SQL, analyze which hosts manage the most listings and generate the highest total estimated revenue. To do this, aggregate each host's listings, compute the potential annual income (price × availability), and return the top ten. Keep in mind that some hosts may be hiding behind verified profiles or inconsistent naming, so grouping by `host_id` is more reliable than grouping by host name. Consider how you might join this result with external tax datasets (that step is conceptual).

In [None]:
SELECT host_id, COUNT(*) AS total_listings,
       SUM(price * availability_365) AS estimated_revenue
FROM listings
GROUP BY host_id
ORDER BY estimated_revenue DESC
LIMIT 10;

In [None]:
SELECT host_id, COUNT(id) AS properties_owned,
       ROUND(AVG(price), 2) AS avg_price,
       SUM(price * availability_365) AS total_income
FROM listings
GROUP BY host_id
HAVING COUNT(id) > 5
ORDER BY total_income DESC;
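
The conceptual join with an external tax dataset could look something like the sketch below. Everything about the external side is an assumption: `tax_registrations` is a hypothetical table with one row per registered `host_id`, and no such table is part of the source data. The listing rows are invented as well. The query uses a `LEFT JOIN` anti-join to keep only hosts with no matching registration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (
    id INTEGER PRIMARY KEY, host_id INTEGER, price REAL, availability_365 INTEGER
);
-- Hypothetical external table; not part of the source data.
CREATE TABLE tax_registrations (host_id INTEGER PRIMARY KEY);
INSERT INTO listings (host_id, price, availability_365) VALUES
    (1, 200, 300), (1, 150, 200), (2, 100, 100), (3, 400, 365);
INSERT INTO tax_registrations (host_id) VALUES (1);
""")

# Hosts with no row in tax_registrations come back with t.host_id IS NULL.
query = """
SELECT l.host_id,
       COUNT(*) AS total_listings,
       SUM(l.price * l.availability_365) AS estimated_revenue
FROM listings l
LEFT JOIN tax_registrations t ON t.host_id = l.host_id
WHERE t.host_id IS NULL
GROUP BY l.host_id
ORDER BY estimated_revenue DESC
LIMIT 10;
"""
unregistered = conn.execute(query).fetchall()
for row in unregistered:
    print(row)
```

In practice the join key is the hard part: an external dataset is unlikely to carry Airbnb host IDs, so matching would probably rely on names or addresses and need manual review.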