# Analyze Airbnb Listings using SQL

- 🏆 80 points available
- 🤠 Author: Park (ypark32@illinois.edu)
- ✏️ Last updated on 10/26/2021

---

▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder.

In [1]:
# DO NOT MODIFY THE CODE IN THIS CELL
import base64
import unittest
tc = unittest.TestCase()

---

## 💎 Case overview

![Airbnb logo](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/airbnb-logo-black-background.png?raw=true)

[Airbnb](https://airbnb.com/) is a popular platform for hosts to accommodate guests with short-term stays and leisure activities. It was born in 2008 when the two founders welcomed three guests to their San Francisco home. Airbnb has grown to serve 4 million hosts in about 100,000 cities. As of December 2020, Airbnb's valuation exceeded that of the top three hotel chains combined ([source](https://www.businessinsider.com/airbnb-ipo-valuation-tops-three-hotel-chains-combined-opening-day-2020-12)).

### ⚔️ Your goal

You have recently joined a property management firm as a business development manager. Your role is to establish and deepen relationships with the property owners. Your success in building relationships largely depends on the firm's ability to offer value-adding services to the property owners.

A common concern that most property owners face is dealing with vacant properties. The traditional approach to filling vacant properties is to run paid advertisements or to provide sign-up incentives (e.g., free rent for a month if leased for 13+ months, gift cards, referral bonuses, free parking). You have a better idea - why not convert some of the vacant properties into vacation rentals?

In this case study, you'll use SQL to analyze all Airbnb listings in the city your firm is operating in. Below are some of the questions you'll be answering.

- 👉 Are there properties that can accommodate 8 or more guests?
- 👉 Are there properties that target guests staying a month or longer?
- 👉 Are there properties that have at least 6 bedrooms and 4 bathrooms?
- 👉 What are the average price, occupancy (availability), and review score by room type?
- 👉 Can you calculate the metrics of interest (number of listings, average price, average occupancy, average review score) by the number of bedrooms?
- 👉 How about by neighbourhoods? Can you also find the average property dimensions (number of bedrooms & bathrooms)?
- 👉 How do the metrics and dimensions differ between superhost and non-superhost listings?
- 👉 Can you find all listings that were never booked?
- 👉 Can you find the top 15 highest-rated listings?
- 👉 Can you find listings with the keyword "bright"?

Note that looking at *just* the listings data is not sufficient to determine whether converting a vacant property to an Airbnb listing is a good idea. The analysis you perform here is preliminary. If you'd like, you can download additional booking data at [http://insideairbnb.com/get-the-data.html](http://insideairbnb.com/get-the-data.html) to explore further on your own.

---

### 🎯 Part 1: Choose a city 🌄

Guess what - you get to pick your own city! 😺 

Go to [this Google spreadsheet](https://docs.google.com/spreadsheets/d/1Yv3acCW6NYgcwVVo55TcgQ-aXcAsSmoVo_Su79xm9Js/edit?usp=sharing) and mark your NetID in any row where the city has not been taken yet (cities are assigned on a first-come, first-served basis). Copy the URL for your city and store it in a new variable named `file_url`.

#### 🔑 Sample Code
```python
file_url = 'https://github.com/bdi475/datasets/raw/main/case-studies/airbnb-sql/Chicago.csv'
```

In [2]:
### BEGIN SOLUTION
file_url = 'https://github.com/bdi475/datasets/raw/main/case-studies/airbnb-sql/Chicago.csv'
### END SOLUTION

print(f'CSV File URL: {file_url}')

CSV File URL: https://github.com/bdi475/datasets/raw/main/case-studies/airbnb-sql/Chicago.csv


#### 🧭 Check your work

In [3]:
import pandas as pd
df_check = pd.read_csv(file_url)

tc.assertEqual(set(df_check.columns), set(['name', 'neighbourhood', 'room_type', 'bedrooms', 'bathrooms', 
                                           'accommodates', 'minimum_nights', 'price', 'availability_365', 
                                           'number_of_reviews', 'review_score', 'latitude', 'longitude', 'is_superhost']))

tc.assertGreater(df_check.shape[0], 100, 'The city should have at least 100 listings')

---

### 🎯 Part 2: Import packages

#### 👇 Tasks

- ✔️ Import the following Python packages.
    1. `pandas`: Use alias `pd`.
    2. `numpy`: Use alias `np`.
    3. `sqlite3`: No alias

In [4]:
### BEGIN SOLUTION
import pandas as pd
import numpy as np
import sqlite3
### END SOLUTION

#### 🧭 Check your work

In [5]:
import sys

tc.assertIsNotNone(pd.__version__, 'Check whether you have correctly imported Pandas with an alias.')
tc.assertIsNotNone(np.__version__, 'Check whether you have correctly imported NumPy with an alias.')
tc.assertTrue('sqlite3' in globals(), 'Check whether you have correctly imported the sqlite3 package.')

---

### 🔨 Create and populate the `listings` table

▶️ Run the code below to create a new SQLite database. The code will also create and populate the `listings` table.

In [6]:
# DO NOT CHANGE THE CODE BELOW
# Connect to airbnb.db database file
# If the file does not exist, create a new file
conn = sqlite3.connect('airbnb.db')

# Create a cursor
# A cursor enables users of a DBMS to traverse through the result set
# without having to retrieve all results at once
c = conn.cursor()

# Drop (delete) listings table if it already exists
c.execute('DROP TABLE IF EXISTS listings')
conn.commit()

# Create table
# Triple quotes (''') denote multiline strings
create_table_query = '''
CREATE TABLE IF NOT EXISTS listings (
    name TEXT,
    neighbourhood TEXT,
    room_type TEXT,
    bedrooms REAL,
    bathrooms REAL,
    price REAL,
    accommodates INT,
    minimum_nights INT,
    availability_365 INT,
    number_of_reviews INTEGER,
    review_score REAL,
    is_superhost INT,
    latitude REAL,
    longitude REAL
)
'''
c.execute(create_table_query)
conn.commit()

# Populate the listings table
df_listings = pd.read_csv(file_url)
df_listings.to_sql(name='listings', index=False, con=conn, if_exists='append')

conn.close()

#### 🔬 `listings` table column descriptions

| Column Name         | Description                                                              |
|---------------------|--------------------------------------------------------------------------|
| `name`              | Listing title                                                            |
| `neighbourhood`     | Neighbourhood                                                            |
| `room_type`         | Room type (e.g., Shared room, Private room, Entire home/apt, Hotel room) |
| `bedrooms`          | Number of bedrooms                                                       |
| `bathrooms`         | Number of bathrooms                                                      |
| `accommodates`      | Number of maximum guests                                                 |
| `minimum_nights`    | Number of minimum nights required to book                                |
| `price`             | Base nightly price - the actual price will vary by specific dates        |
| `availability_365`  | Number of available days during the next 365 days                        |
| `number_of_reviews` | Total number of reviews                                                  |
| `review_score`      | Guest feedback rating                                                    |
| `latitude`          | Latitude                                                                 |
| `longitude`         | Longitude                                                                |
| `is_superhost`      | Whether the host is a superhost (0 == not a superhost, 1 == superhost)   |
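
The create-and-populate steps above can be tried on a small scale without touching `airbnb.db`. The sketch below is only an illustration under a few assumptions: it uses an in-memory database (`':memory:'`), keeps just four of the columns, and inserts three made-up rows. The `demo_` names are hypothetical and are not used elsewhere in this notebook.

```python
import sqlite3

# Hypothetical in-memory database; nothing is written to disk
demo_conn = sqlite3.connect(':memory:')
demo_c = demo_conn.cursor()

# A reduced version of the listings table (only 4 columns)
demo_c.execute('''
CREATE TABLE listings (
    name TEXT,
    room_type TEXT,
    price REAL,
    accommodates INT
)
''')

# Made-up rows for illustration only
demo_rows = [
    ('Demo loft', 'Entire home/apt', 120.0, 4),
    ('Demo room', 'Private room', 45.0, 2),
    ('Demo house', 'Entire home/apt', 300.0, 10),
]
demo_c.executemany('INSERT INTO listings VALUES (?, ?, ?, ?)', demo_rows)
demo_conn.commit()

# Count the rows to confirm that the table was populated
row_count = demo_c.execute('SELECT COUNT(*) FROM listings').fetchone()[0]
print(row_count)

demo_conn.close()
```

Because the database lives in memory, it disappears as soon as the connection is closed, which makes it a convenient scratch space for testing queries.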

---

### 🎯 Part 3: Select all columns and rows from the `listings` table

#### 👇 Tasks

- ✔️ Write a query that selects all columns and rows from the `listings` table.
- ✔️ Store your query to a new variable named `query_select_all`.

#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [7]:
### BEGIN SOLUTION
query_select_all = '''
SELECT *
FROM listings
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_select_all, con=conn)
display(df_result)
conn.close()

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,price,accommodates,minimum_nights,availability_365,number_of_reviews,review_score,is_superhost,latitude,longitude
0,"Hyde Park - Walk to UChicago, 10 min to McCormick",Hyde Park,Private room,1.0,1.0,65.0,1,2,355,181,100.0,1,41.78790,-87.58780
1,394 Great Reviews. 127 y/o House. 40 yds to tr...,South Lawndale,Entire home/apt,3.0,1.0,117.0,7,2,184,395,96.0,1,41.85495,-87.69696
2,Tiny Studio Apartment 94 Walk Score,West Town,Entire home/apt,3.0,1.0,70.0,2,2,365,389,93.0,1,41.90289,-87.68182
3,Barbara's Hideaway - Old Town,Lincoln Park,Entire home/apt,1.0,1.0,95.0,4,4,282,54,93.0,1,41.91769,-87.63788
4,The Biddle House (#1),Lincoln Park,Entire home/apt,2.0,1.0,125.0,4,4,101,21,90.0,1,41.91183,-87.64000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3212,Kasa Chicago | Pet Friendly 2BD/2BA + Balcony ...,Loop,Entire home/apt,2.0,2.0,127.0,6,4,357,23,97.0,0,41.87098,-87.63046
3213,Kasa Chicago | Family Friendly 3BD/2BA + Free ...,Loop,Entire home/apt,3.0,2.0,200.0,6,4,357,10,86.0,0,41.86941,-87.63214
3214,Penthouse Downtown River North - Sleeps 13,Near North Side,Entire home/apt,3.0,2.0,140.0,13,2,327,13,91.0,0,41.89772,-87.63832
3215,Unique classic private home 🏠 amazing location!,West Town,Entire home/apt,3.0,2.0,117.0,8,2,0,10,84.0,0,41.89413,-87.67549


#### 🧭 Check your work

In [8]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCAqCkZST00gbGlzdGluZ3MK').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
conn.close()

#### 🚀 Hint

The query below selects all rows and columns from a database table named `my_table`.

```sql
SELECT *
FROM my_table;
```
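
If you want to see what `SELECT *` returns without pandas, the sketch below runs the same kind of query with `sqlite3` alone. It assumes a hypothetical two-column `my_table` in an in-memory database with one made-up row. The column names come from the cursor's `description` attribute.

```python
import sqlite3

# Hypothetical in-memory table with a single made-up row
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE my_table (name TEXT, price REAL)')
demo_conn.execute("INSERT INTO my_table VALUES ('Demo loft', 120.0)")

cursor = demo_conn.execute('SELECT * FROM my_table')

# The first element of each description entry is the column name
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)

demo_conn.close()
```

`SELECT *` returns the columns in the order in which they were defined in the table.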

---

### 🎯 Part 4: Listings that can accommodate 8 or more people

#### 👇 Tasks

- ✔️ Using the `listings` table, find all listings that can accommodate 8 or more people (`accommodates >= 8`).
- ✔️ Select only the following 5 columns (in the same order):
    - `name`, `room_type`, `bedrooms`, `bathrooms`, `accommodates`
- ✔️ Store your query to a new variable named `query_eight_or_more`.
- ✔️ Sort the result by `accommodates` in descending order.

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.

|   |                                       name |       room_type | bedrooms | bathrooms | accommodates |
|--:|-------------------------------------------:|----------------:|---------:|----------:|-------------:|
| 0 |   Elegant Rantoul Oasis: 20 mins to campus | Entire home/apt |      5.0 |       4.0 |           14 |
| 1 |         Exquisite 1980's Historic Building | Entire home/apt |      2.0 |       2.0 |           10 |
| 2 | Quiet Area in Urbana - Perfect for Groups! | Entire home/apt |      3.0 |       2.0 |            8 |
| 3 |     Spacious Champaign Apt w/Pool Gym Wifi | Entire home/apt |      1.0 |       1.0 |            8 |

#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [9]:
### BEGIN SOLUTION
query_eight_or_more = '''
SELECT name, room_type, bedrooms, bathrooms, accommodates
FROM listings
WHERE accommodates >= 8
ORDER BY accommodates DESC
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_eight_or_more, con=conn)
display(df_result)
conn.close()

Unnamed: 0,name,room_type,bedrooms,bathrooms,accommodates
0,Family Style Home-walk to Aragon/Riviera,Entire home/apt,5.0,2.5,16
1,Lincoln Park Single Family Home,Entire home/apt,4.0,3.5,16
2,"11 Beds in 5*House TRAIN,CUBS, BEACH.RIVERA AR...",Entire home/apt,5.0,2.5,16
3,LOFT606: Gigantic Sunlit Former Factory Home,Entire home/apt,6.0,2.5,16
4,5 BRRogers ParkEvanston Whole House Rental&Par...,Entire home/apt,5.0,2.5,16
...,...,...,...,...,...
474,Traditional Greystone Chicago Duplex Apartment,Entire home/apt,4.0,2.0,8
475,Michigan | Spacious 3Bedroom/2Bath+Optional PRKG,Entire home/apt,3.0,2.0,8
476,3BD|2BA Corner view loft | Downtown | Opt PRKG,Entire home/apt,3.0,2.0,8
477,Spacious & luxurious corner apt w optional Par...,Entire home/apt,3.0,2.0,8


#### 🧭 Check your work

In [10]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCBuYW1lLCByb29tX3R5cGUsIGJlZHJvb21zLCBiYXRocm9vbXMsIGFjY29tbW9kYXRlcwp\
GUk9NIGxpc3RpbmdzCldIRVJFIGFjY29tbW9kYXRlcyA+PSA4Ck9SREVSIEJZIGFjY29tbW9kYXRlcyBERVNDCg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['accommodates'].reset_index(drop=True),
                               df_check['accommodates'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

conn.close()

#### 🚀 Hint

The query below selects 5 columns from `my_table` where `column5` is greater than or equal to 8. The result will be sorted by `column5` in descending order.

```sql
SELECT column1, column2, column3, column4, column5
FROM my_table
WHERE column5 >= 8
ORDER BY column5 DESC;
```
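
To check how `>=` treats the boundary value, here is a minimal sketch that assumes a hypothetical `my_table` with three made-up rows in an in-memory database. The row with exactly 8 is kept and the row with 7 is filtered out.

```python
import sqlite3

# Hypothetical in-memory table with made-up rows
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE my_table (name TEXT, accommodates INT)')
demo_conn.executemany(
    'INSERT INTO my_table VALUES (?, ?)',
    [('Demo A', 7), ('Demo B', 8), ('Demo C', 12)]
)

# Keep rows with 8 or more guests, largest first
result = demo_conn.execute('''
SELECT name, accommodates
FROM my_table
WHERE accommodates >= 8
ORDER BY accommodates DESC
''').fetchall()

print(result)

demo_conn.close()
```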

---

### 🎯 Part 5: Average price of all listings

#### 👇 Tasks

- ✔️ Using the `listings` table, find the average price of all listings.
- ✔️ Store your query to a new variable named `query_avg_price`.
- ✔️ The result of your query should have only one column and one row.
    - The name of the column does not matter.
    
#### 🔑 Output Format

You are free to rename your `AVG(price)` column.

|   |      AVG(price) |
|--:|----------------:|
| 0 |      145.666667 |

#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [11]:
### BEGIN SOLUTION
query_avg_price = '''
SELECT AVG(price)
FROM listings;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_avg_price, con=conn)
display(df_result)
conn.close()

Unnamed: 0,AVG(price)
0,129.522847


#### 🧭 Check your work

In [12]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
    b'ClNFTEVDVCBBVkcocHJpY2UpCkZST00gbGlzdGluZ3MK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
tc.assertEqual(df_result.iloc[0, 0], df_check.iloc[0, 0], f'Average price should be {df_check.iloc[0, 0]}')
conn.close()

#### 🚀 Hint

The query below calculates the average of `my_column` in `my_table`.

```sql
SELECT AVG(my_column)
FROM my_table;
```
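
One detail worth knowing is that `AVG()` skips `NULL` values instead of treating them as zero. The sketch below illustrates this with a hypothetical `my_table` in an in-memory database. The three rows, one of which is missing a value, are made up.

```python
import sqlite3

# Hypothetical in-memory table; the third row has a missing (NULL) value
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE my_table (my_column REAL)')
demo_conn.executemany(
    'INSERT INTO my_table VALUES (?)',
    [(100.0,), (200.0,), (None,)]
)

# COUNT(my_column) counts non-NULL values, COUNT(*) counts all rows
avg_value, non_null_count, total_rows = demo_conn.execute('''
SELECT AVG(my_column), COUNT(my_column), COUNT(*)
FROM my_table
''').fetchone()

print(avg_value, non_null_count, total_rows)

demo_conn.close()
```

The average is computed over the two non-`NULL` values only, so a column with many missing values may produce an average based on fewer rows than you expect.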

---

### 🎯 Part 6: Long-term stays

#### 👇 Tasks

- ✔️ Using the `listings` table, find all listings with minimum nights of 30 or more (`minimum_nights >= 30`).
- ✔️ Select only the following 7 columns (in the same order):
    - `name`, `room_type`, `bedrooms`, `bathrooms`, `minimum_nights`, `price`, `review_score`
- ✔️ Store your query to a new variable named `query_long_term_stays`.
- ✔️ Sort the result by `price` in ascending order.

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.

|   | name                                      | room_type    | bedrooms | bathrooms | minimum_nights | price | review_score |
|---|-------------------------------------------|--------------|----------|-----------|----------------|-------|--------------|
| 0 | Peace within the city for solo travelers  | Private room | 1.0      | 1.0       | 35             | 25.0  | 95.0         |
| 1 | Room close to Downtown Champaign          | Private room | 1.0      | 1.0       | 30             | 33.0  | 95.0         |
| 2 | Perfect spot to work on your next startup | Private room | 1.0      | 1.0       | 30             | 65.0  | 98.0         |

#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [13]:
### BEGIN SOLUTION
query_long_term_stays = '''
SELECT name, room_type, bedrooms, bathrooms, minimum_nights, price, review_score
FROM listings
WHERE minimum_nights >= 30
ORDER BY price;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_long_term_stays, con=conn)
display(df_result)
conn.close()

Unnamed: 0,name,room_type,bedrooms,bathrooms,minimum_nights,price,review_score
0,MODERN LOFT Downtown by McCormick w/KING BED WiFi,Entire home/apt,1.0,1.0,30,20.0,100.0
1,Luxury City Center Loft on a Traffic-free Street,Private room,1.0,2.0,30,24.0,97.0
2,Sidh Arya,Private room,1.0,1.0,60,30.0,95.0
3,Siddharth Arya,Private room,1.0,1.0,60,30.0,94.0
4,Bedroom,Private room,1.0,1.0,30,32.0,100.0
...,...,...,...,...,...,...,...
103,Modern Duplex-1 mile to downtown-4BR 2BA on 2 ...,Entire home/apt,4.0,2.0,30,445.0,100.0
104,3bedrm 2bath 1600 sqft condo w/ balcony n parking,Entire home/apt,3.0,2.0,30,450.0,96.0
105,Pilsen Gem,Entire home/apt,2.0,1.0,39,499.0,88.0
106,Tropical Five Bed // Back Patio in Wicker Park,Entire home/apt,5.0,2.0,30,803.0,100.0


#### 🧭 Check your work

In [14]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCBuYW1lLCByb29tX3R5cGUsIGJlZHJvb21zLCBiYXRocm9vbXMsIG1pbmltdW1fbmlnaHRz\
LCBwcmljZSwgcmV2aWV3X3Njb3JlCkZST00gbGlzdGluZ3MKV0hFUkUgbWluaW11bV9uaWdodHMgPj0gMzAKT1JERVIgQlkgcHJpY2UK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['price'].reset_index(drop=True),
                               df_check['price'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

conn.close()

#### 🚀 Hint

The query below selects 7 columns from `my_table` where `column5` is greater than or equal to 30. The result will be sorted by `column6` in ascending order.

```sql
SELECT column1, column2, column3, column4, column5, column6, column7
FROM my_table
WHERE column5 >= 30
ORDER BY column6;
```
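
`ORDER BY` sorts in ascending order when no direction is given, so `ORDER BY column6` and `ORDER BY column6 ASC` are equivalent. A small sketch, assuming a hypothetical `my_table` with made-up rows in an in-memory database:

```python
import sqlite3

# Hypothetical in-memory table with made-up rows
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE my_table (name TEXT, minimum_nights INT, price REAL)')
demo_conn.executemany(
    'INSERT INTO my_table VALUES (?, ?, ?)',
    [('Demo A', 30, 65.0), ('Demo B', 35, 25.0), ('Demo C', 2, 10.0), ('Demo D', 30, 33.0)]
)

# No direction given: ascending is the default
implicit_asc = demo_conn.execute('''
SELECT name, price
FROM my_table
WHERE minimum_nights >= 30
ORDER BY price
''').fetchall()

# Same query with the direction spelled out
explicit_asc = demo_conn.execute('''
SELECT name, price
FROM my_table
WHERE minimum_nights >= 30
ORDER BY price ASC
''').fetchall()

print(implicit_asc)
print(implicit_asc == explicit_asc)

demo_conn.close()
```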

---

### 🎯 Part 7: Large listings

#### 👇 Tasks

- ✔️ Using the `listings` table, find all rows where the listing has:
    - 6 or more bedrooms
    - **AND** 4 or more bathrooms
- ✔️ Select only the following 5 columns (in the same order):
    - `name`, `bedrooms`, `bathrooms`, `price`, `review_score`
- ✔️ Store your query to a new variable named `query_large_listings`.

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.


|   |                                   name | bedrooms | bathrooms | price | review_score |
|--:|---------------------------------------:|---------:|----------:|------:|-------------:|
| 0 |     5 min away from Downtown Champaign |      6.0 |       4.0 | 550.4 |         99.0 |
| 1 |   Large Bedroom & Bath close to campus |      6.0 |       5.0 | 483.0 |         98.0 |
| 2 | Big family welcomed, graduation season |      6.0 |       4.5 | 379.9 |         94.0 |
| 3 |                   LARGEST HOME IN TOWN |      8.0 |       7.0 | 600.0 |         93.0 |

#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [15]:
### BEGIN SOLUTION
query_large_listings = '''
SELECT name, bedrooms, bathrooms, price, review_score
FROM listings
WHERE (bedrooms >= 6) AND (bathrooms >= 4);
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_large_listings, con=conn)
display(df_result)
conn.close()

Unnamed: 0,name,bedrooms,bathrooms,price,review_score
0,Luxurious Four-Level Home with Personal Touches,6.0,4.5,1400.0,99.0
1,Wrigleyville Mansion - Sleeps 16+,6.0,5.0,9999.0,100.0
2,Exceptional 8 Bedroom Humboldt Park Duplex,8.0,4.0,403.0,98.0
3,Five Bedroom Luxury in Old Town (Sleeps 12),6.0,4.5,533.0,95.0
4,Large 12 Bedroom Chicago Home- Sleeps 25,12.0,4.0,464.0,98.0
5,Exquisite 6 br Downtown* Private Yard *Patio *...,6.0,6.5,435.0,100.0
6,A Northside Home Near Wrigley & CTA (Sleeps 21),7.0,4.0,232.0,95.0
7,Beautiful 7Br Custom Built Single-Family Home,7.0,4.5,299.0,95.0
8,Lang House Chicago - where Art & Architecture ...,7.0,7.0,529.0,99.0
9,Massive 10 bd/5 ba house in heart of Wrigleyv...,10.0,5.0,786.0,96.0


#### 🧭 Check your work

In [16]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCBuYW1lLCBiZWRyb29tcywgYmF0aHJvb21zLCBwcmljZSwgcmV2aWV3X3Njb3JlCkZST00gb\
GlzdGluZ3MKV0hFUkUgKGJlZHJvb21zID49IDYpIEFORCAoYmF0aHJvb21zID49IDQpCg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

#### 🚀 Hint

The query below selects 5 columns from `my_table` where `column2` is greater than or equal to 6 and `column3` is greater than or equal to 4.

```sql
SELECT column1, column2, column3, column4, column5
FROM my_table
WHERE (column2 >= 6) AND (column3 >= 4);
```
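
To see why the task asks for `AND` rather than `OR`, the sketch below runs both versions against a hypothetical `my_table` with four made-up rows in an in-memory database. `AND` keeps only rows that satisfy both conditions, while `OR` keeps rows that satisfy at least one.

```python
import sqlite3

# Hypothetical in-memory table with made-up rows
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE my_table (name TEXT, bedrooms REAL, bathrooms REAL)')
demo_conn.executemany(
    'INSERT INTO my_table VALUES (?, ?, ?)',
    [('Demo A', 6, 4.0), ('Demo B', 6, 2.0), ('Demo C', 3, 4.5), ('Demo D', 2, 1.0)]
)

# Both conditions must hold
and_rows = demo_conn.execute('''
SELECT name
FROM my_table
WHERE (bedrooms >= 6) AND (bathrooms >= 4)
ORDER BY name
''').fetchall()

# At least one condition must hold
or_rows = demo_conn.execute('''
SELECT name
FROM my_table
WHERE (bedrooms >= 6) OR (bathrooms >= 4)
ORDER BY name
''').fetchall()

print(and_rows)
print(or_rows)

demo_conn.close()
```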

---

### 🎯 Part 8: Average price of large listings

#### 👇 Tasks

- ✔️ Using the `listings` table, find the average price of listings with:
    - 6 or more bedrooms
    - **AND** 4 or more bathrooms
- ✔️ Store your query to a new variable named `query_large_listings_avg_price`.
- ✔️ The result of your query should have only one column and one row.
    - The name of the column does not matter.
    
#### 🔑 Output Format

Your query result may have a different column name.

|   | AVG(price) |
|--:|-----------:|
| 0 |      475.5 |


#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [17]:
### BEGIN SOLUTION
query_large_listings_avg_price = '''
SELECT AVG(price)
FROM listings
WHERE (bedrooms >= 6) AND (bathrooms >= 4)
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_large_listings_avg_price, con=conn)
display(df_result)
conn.close()

Unnamed: 0,AVG(price)
0,1167.6875


#### 🧭 Check your work

In [18]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
    b'ClNFTEVDVCBBVkcocHJpY2UpCkZST00gbGlzdGluZ3MKV0hFUkUgKGJlZHJvb21zID49IDYpIEFORCAoYmF0aHJvb21zID49IDQpCg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
tc.assertEqual(df_result.iloc[0, 0], df_check.iloc[0, 0], f'Average price should be {df_check.iloc[0, 0]}')
conn.close()

#### 🚀 Hint

The query below calculates the average of `column4` where `column2` is greater than or equal to 6 and `column3` is greater than or equal to 4.

```sql
SELECT AVG(column4)
FROM my_table
WHERE (column2 >= 6) AND (column3 >= 4);
```
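
An average over a handful of rows is sensitive to extreme values. The Chicago output in Part 7, for example, includes a listing priced at 9999.0. The sketch below is a rough illustration of that effect. It assumes a hypothetical `my_table` with made-up rows in an in-memory database and is not part of the graded task.

```python
import sqlite3

# Hypothetical in-memory table with made-up rows; 'Demo C' has an extreme price
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE my_table (name TEXT, bedrooms REAL, bathrooms REAL, price REAL)')
demo_conn.executemany(
    'INSERT INTO my_table VALUES (?, ?, ?, ?)',
    [('Demo A', 6, 4.0, 400.0), ('Demo B', 7, 4.5, 500.0),
     ('Demo C', 6, 5.0, 9999.0), ('Demo D', 1, 1.0, 80.0)]
)

# Average price of all large listings
avg_all_large = demo_conn.execute('''
SELECT AVG(price)
FROM my_table
WHERE (bedrooms >= 6) AND (bathrooms >= 4)
''').fetchone()[0]

# Same average with the extreme price filtered out
avg_without_extreme = demo_conn.execute('''
SELECT AVG(price)
FROM my_table
WHERE (bedrooms >= 6) AND (bathrooms >= 4) AND (price < 9999)
''').fetchone()[0]

print(avg_all_large, avg_without_extreme)

demo_conn.close()
```

A single extreme row moves the average from 450.0 to 3633.0 here, so it is worth inspecting the individual rows before drawing conclusions from an average.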

---

### 🎯 Part 9: Number of listings by neighbourhood

#### 👇 Tasks

- ✔️ Using the `listings` table, find the number of listings by `neighbourhood`.
- ✔️ Store your query to a new variable named `query_num_listings_by_neighbourhood`.
- ✔️ The result of your query should have the following two columns:
    - `neighbourhood`
    - `num_listings`
- ✔️ Sort the result by `num_listings` in descending order.

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.

|   | neighbourhood | num_listings |
|--:|--------------:|-------------:|
| 0 |     Champaign |           60 |
| 1 |        Urbana |           50 |
| 2 |         Savoy |           30 |
| 3 |        Fisher |           20 |

#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [19]:
### BEGIN SOLUTION
query_num_listings_by_neighbourhood = '''
SELECT neighbourhood, COUNT(*) AS num_listings
FROM listings
GROUP BY neighbourhood
ORDER BY num_listings DESC;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_num_listings_by_neighbourhood, con=conn)
display(df_result)
conn.close()

Unnamed: 0,neighbourhood,num_listings
0,West Town,430
1,Lake View,336
2,Logan Square,256
3,Near North Side,191
4,Lincoln Park,169
...,...,...
63,Edison Park,2
64,Avalon Park,2
65,West Englewood,1
66,West Elsdon,1


#### 🧭 Check your work

In [20]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCBuZWlnaGJvdXJob29kLCBDT1VOVCgqKSBhcyBudW1fbGlzdGluZ3MKRlJPTSBsaXN0aW\
5ncwpHUk9VUCBCWSBuZWlnaGJvdXJob29kCk9SREVSIEJZIG51bV9saXN0aW5ncyBERVNDOwo='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['num_listings'].reset_index(drop=True),
                               df_check['num_listings'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

#### 🚀 Hint

Assuming that the `listings` table has a `city` column, the query below counts the number of listings for each `city`. It sorts the result by `num_listings` in descending order.

```sql
SELECT city, COUNT(*) AS num_listings
FROM listings
GROUP BY city
ORDER BY num_listings DESC;
```
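
The grouping step can be tested on a few rows. This sketch assumes a hypothetical `listings` table with only a `city` column and six made-up rows in an in-memory database. `GROUP BY` collapses the rows into one row per city, and `COUNT(*)` counts the rows in each group.

```python
import sqlite3

# Hypothetical in-memory table with made-up rows
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE listings (city TEXT)')
demo_conn.executemany(
    'INSERT INTO listings VALUES (?)',
    [('X',), ('X',), ('X',), ('Y',), ('Z',), ('Z',)]
)

# One output row per city, sorted by group size (largest first)
counts = demo_conn.execute('''
SELECT city, COUNT(*) AS num_listings
FROM listings
GROUP BY city
ORDER BY num_listings DESC
''').fetchall()

print(counts)

demo_conn.close()
```

The alias `num_listings` defined in the `SELECT` clause can be reused in `ORDER BY`, which is why the hint query does not need to repeat `COUNT(*)`.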

---

### 🎯 Part 10: Number of superhost/non-superhost listings

#### 👇 Tasks

- ✔️ Using the `listings` table, find the number of listings by superhost status (`is_superhost`).
- ✔️ Store your query to a new variable named `query_num_listings_by_host_status`.
- ✔️ The result of your query should have the following two columns:
    - `is_superhost`: `0` or `1`
    - `num_listings`
- ✔️ Sort the result by `is_superhost` in ascending order.

#### 🔑 Output Format

Your query result will likely have different cell values.


|   | is_superhost | num_listings |
|--:|-------------:|-------------:|
| 0 |            0 |          500 |
| 1 |            1 |          400 |

#### 💣 No hint

Try this one on your own!

In [21]:
### BEGIN SOLUTION
query_num_listings_by_host_status = '''
SELECT is_superhost, COUNT(*) as num_listings
FROM listings
GROUP BY is_superhost
ORDER BY is_superhost;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_num_listings_by_host_status, con=conn)
display(df_result)
conn.close()

Unnamed: 0,is_superhost,num_listings
0,0,1628
1,1,1589


#### 🧭 Check your work

In [22]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
    b'ClNFTEVDVCBpc19zdXBlcmhvc3QsIENPVU5UKCopIGFzIG51bV9saXN0aW5ncwpGUk9NIGxpc3RpbmdzCkdST1VQIEJZIGlzX3N1cGVyaG9zdApPUkRFUiBCWSBpc19zdXBlcmhvc3Q7Cg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['is_superhost'].reset_index(drop=True),
                               df_check['is_superhost'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

## 💵 Metrics of interest

In this part, you will analyze the key metrics of rental properties such as occupancy rate (`availability_365`), rental rates (`price`), and guest feedback (`review_score`).

---

### 🎯 Part 11: Metrics by room types

#### 👇 Tasks

- ✔️ Using the `listings` table, find the **averages** of the following three columns by `room_type`:
    - `price`
    - `availability_365`
    - `review_score`
- ✔️ Store your query to a new variable named `query_metrics_by_room_type`.
- ✔️ The result of your query should have the following 4 columns (in the same order):
    - `room_type`
    - `avg_price`
    - `avg_availability`
    - `avg_review_score`
- ✔️ Sort the result by `avg_availability` in ascending order.

#### 🔑 Output Format

Your query result will likely have different cell values and may have a different number of rows.

|   |       room_type |  avg_price | avg_availability | avg_review_score |
|--:|----------------:|-----------:|-----------------:|-----------------:|
| 0 |     Shared room |  40.100000 |       162.821215 |        95.491510 |
| 1 |    Private room |  60.236737 |       189.997992 |        95.500100 |
| 2 | Entire home/apt | 104.599667 |       225.127593 |        92.105033 |
| 3 |      Hotel room |  66.666667 |       310.444444 |        88.666667 |


#### 🔥 Need help?

Refer to the 🚀 **Hint** section below the 🧭 **Check your work** code cell.

In [23]:
### BEGIN SOLUTION
query_metrics_by_room_type = '''
SELECT 
    room_type, 
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY room_type
ORDER BY avg_availability;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_metrics_by_room_type, con=conn)
display(df_result)
conn.close()

Unnamed: 0,room_type,avg_price,avg_availability,avg_review_score
0,Entire home/apt,150.298467,167.565374,95.996393
1,Private room,64.475162,173.898488,95.384449
2,Shared room,27.595238,213.928571,94.404762
3,Hotel room,724.193548,268.129032,92.709677


#### 🧭 Check your work

In [24]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
    b'ClNFTEVDVCAKICAgIHJvb21fdHlwZSwKICAgIEFWRyhwcmljZSkgQVMgYXZnX3ByaWNlLCAKICAgIEFWRyhhdmFpbGFiaWxpdHlfMzY1KSBBUyBhdmdfYXZhaWxhYmlsaXR5LCAKICAgIEFWRyhyZXZpZXdfc2NvcmUpIEFTIGF2Z19yZXZpZXdfc2NvcmUKRlJPTSBsaXN0aW5ncwpHUk9VUCBCWSByb29tX3R5cGUKT1JERVIgQlkgYXZnX2F2YWlsYWJpbGl0eTsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['avg_availability'].reset_index(drop=True),
                               df_check['avg_availability'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

#### 🚀 Hint

Below is a **fully-functional** solution query. You do not have to change anything.

```sql
SELECT 
    room_type, 
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY room_type
ORDER BY avg_availability;
```

---

### 🎯 Part 12: Metrics by number of bedrooms

#### 👇 Tasks

- ✔️ Using the `listings` table, find the following measures by the number of bedrooms (`bedrooms`):
    - Count of rows (number of listings)
    - Average `price`
    - Average `availability_365`
    - Average `review_score`
- ✔️ Store your query to a new variable named `query_metrics_by_bedrooms`.
- ✔️ The result of your query should have the following 5 columns (in the same order):
    - `bedrooms`
    - `num_listings`
    - `avg_price`
    - `avg_availability`
    - `avg_review_score`
- ✔️ Sort the result by `bedrooms` in ascending order.
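
One detail worth knowing when combining `COUNT(*)` with `AVG()`: `COUNT(*)` counts every row in a group, while `AVG(column)` skips rows where that column is `NULL`. The sketch below shows this on a made-up table. The name `toy_listings`, its values, and the extra `num_scored` column are hypothetical, and nothing here assumes that the real `listings` table contains missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_listings (bedrooms REAL, review_score REAL)")
conn.executemany(
    "INSERT INTO toy_listings VALUES (?, ?)",
    # The second 1-bedroom listing has no review score (None becomes NULL).
    [(1.0, 90.0), (1.0, None), (2.0, 80.0), (2.0, 100.0)],
)

query = """
SELECT
    bedrooms,
    COUNT(*) AS num_listings,            -- counts all rows in the group
    COUNT(review_score) AS num_scored,   -- counts only non-NULL scores
    AVG(review_score) AS avg_review_score
FROM toy_listings
GROUP BY bedrooms
ORDER BY bedrooms;
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
# (1.0, 2, 1, 90.0)
# (2.0, 2, 2, 90.0)
```

The 1-bedroom group has two listings but its average is 90.0, not 45.0, because the missing score is ignored rather than treated as zero.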

#### 🔑 Output Format

Your query result will likely have different cell values and may have a different number of rows.

|   | bedrooms | num_listings |  avg_price | avg_availability | avg_review_score |
|--:|---------:|-------------:|-----------:|-----------------:|-----------------:|
| 0 |      1.0 |          524 |  95.839695 |       218.353053 |        94.398855 |
| 1 |      2.0 |          207 | 143.144928 |       202.579710 |        94.275362 |
| 2 |      3.0 |           98 | 227.326531 |       211.755102 |        96.285714 |
| 3 |      4.0 |           27 | 364.111111 |       205.592593 |        95.481481 |
| 4 |      5.0 |            9 | 491.555556 |       245.333333 |        95.666667 |
| 5 |      6.0 |            4 | 477.250000 |       262.500000 |        96.500000 |


#### 💣 No hint

Try this one on your own!

In [25]:
### BEGIN SOLUTION
query_metrics_by_bedrooms = '''
SELECT 
    bedrooms,
    COUNT(*) AS num_listings,
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY bedrooms
ORDER BY bedrooms;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_metrics_by_bedrooms, con=conn)
display(df_result)
conn.close()

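|   | bedrooms | num_listings | avg_price | avg_availability | avg_review_score |
|--:|---------:|-------------:|----------:|-----------------:|-----------------:|
| 0 | 1.0 | 1621 | 86.137569 | 166.371993 | 95.526835 |
| 1 | 2.0 | 904 | 123.863938 | 160.748894 | 95.881637 |
| 2 | 3.0 | 491 | 168.069246 | 181.06721 | 96.08554 |
| 3 | 4.0 | 131 | 308.244275 | 221.610687 | 96.206107 |
| 4 | 5.0 | 45 | 420.755556 | 235.555556 | 96.755556 |
| 5 | 6.0 | 12 | 1345.916667 | 254.5 | 97.583333 |
| 6 | 7.0 | 6 | 348.833333 | 242.666667 | 94.833333 |
| 7 | 8.0 | 3 | 792.333333 | 258.0 | 97.666667 |
| 8 | 9.0 | 2 | 683.0 | 320.5 | 98.5 |
| 9 | 10.0 | 1 | 786.0 | 259.0 | 96.0 |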


#### 🧭 Check your work

In [26]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCAKICAgIGJlZHJvb21zLAogICAgQ09VTlQoKikgQVMgbnVtX2xpc3RpbmdzLAogICAgQVZHKHByaWN\
lKSBBUyBhdmdfcHJpY2UsIAogICAgQVZHKGF2YWlsYWJpbGl0eV8zNjUpIEFTIGF2Z19hdmFpbGFiaWxpdHksIAog\
ICAgQVZHKHJldmlld19zY29yZSkgQVMgYXZnX3Jldmlld19zY29yZQpGUk9NIGxpc3RpbmdzCkdST1VQIEJZIGJlZHJvb21zCk9SREVSIEJZIGJlZHJvb21zOwo='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['avg_availability'].reset_index(drop=True),
                               df_check['avg_availability'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 13: Metrics by neighbourhood

#### 👇 Tasks

- ✔️ Using the `listings` table, find the following measures by the neighbourhood (`neighbourhood`):
    - Count of rows (number of listings)
    - Average `price`
    - Average `availability_365`
    - Average `review_score`
- ✔️ Store your query to a new variable named `query_metrics_by_neighbourhood`.
- ✔️ The result of your query should have the following 5 columns (in the same order):
    - `neighbourhood`
    - `num_listings`
    - `avg_price`
    - `avg_availability`
    - `avg_review_score`
- ✔️ Sort the result by `num_listings` in descending order.
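
A quick way to sanity-check grouped counts is to compute the same thing in plain Python. The sketch below, using a made-up list of six neighbourhood values and a hypothetical table `toy_listings`, compares `COUNT(*) ... GROUP BY` sorted in descending order against `collections.Counter`.

```python
import sqlite3
from collections import Counter

# Hypothetical sample: one entry per listing
neighbourhoods = [
    "West Town", "Lake View", "West Town",
    "Logan Square", "West Town", "Lake View",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_listings (neighbourhood TEXT)")
conn.executemany(
    "INSERT INTO toy_listings VALUES (?)", [(n,) for n in neighbourhoods]
)
sql_counts = conn.execute("""
SELECT neighbourhood, COUNT(*) AS num_listings
FROM toy_listings
GROUP BY neighbourhood
ORDER BY num_listings DESC;
""").fetchall()
conn.close()

# most_common() also returns (value, count) pairs, largest count first
python_counts = Counter(neighbourhoods).most_common()

print(sql_counts)
# [('West Town', 3), ('Lake View', 2), ('Logan Square', 1)]
print(sql_counts == python_counts)
# True
```

The two results match here because no two neighbourhoods share the same count. With ties, the relative order of tied rows is not guaranteed unless a second sort column is added.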

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.

|   | neighbourhood | num_listings |  avg_price | avg_availability | avg_review_score |
|--:|--------------:|-------------:|-----------:|-----------------:|-----------------:|
| 0 |     Champaign |          107 | 120.299065 |       238.392523 |        93.504673 |
| 1 |        Urbana |           75 |  88.938776 |       197.938776 |        95.306122 |
| 2 |       Mahomet |           41 |  95.347826 |       237.304348 |        93.000000 |
| 3 |       Rantoul |           26 | 287.000000 |       195.166667 |        97.166667 |

#### 💣 No hint

Try this one on your own!

In [27]:
### BEGIN SOLUTION
query_metrics_by_neighbourhood = '''
SELECT 
    neighbourhood,
    COUNT(*) AS num_listings,
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY neighbourhood
ORDER BY num_listings DESC;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_metrics_by_neighbourhood, con=conn)
display(df_result)
conn.close()

|   | neighbourhood | num_listings | avg_price | avg_availability | avg_review_score |
|--:|--------------:|-------------:|----------:|-----------------:|-----------------:|
| 0 | West Town | 430 | 132.213953 | 151.095349 | 96.388372 |
| 1 | Lake View | 336 | 184.476190 | 164.491071 | 96.437500 |
| 2 | Logan Square | 256 | 120.261719 | 162.449219 | 96.640625 |
| 3 | Near North Side | 191 | 158.308901 | 198.963351 | 94.099476 |
| 4 | Lincoln Park | 169 | 154.621302 | 159.272189 | 95.627219 |
| ... | ... | ... | ... | ... | ... |
| 63 | Edison Park | 2 | 119.000000 | 318.000000 | 94.500000 |
| 64 | Avalon Park | 2 | 92.000000 | 0.000000 | 97.500000 |
| 65 | West Englewood | 1 | 28.000000 | 180.000000 | 92.000000 |
| 66 | West Elsdon | 1 | 85.000000 | 352.000000 | 87.000000 |


#### 🧭 Check your work

In [28]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCAKICAgIG5laWdoYm91cmhvb2QsCiAgICBDT1VOVCgqKSBBUyBudW1fbGlzdGluZ3MsCiAg\
ICBBVkcocHJpY2UpIEFTIGF2Z19wcmljZSwgCiAgICBBVkcoYXZhaWxhYmlsaXR5XzM2NSkgQVMgYXZnX2\
F2YWlsYWJpbGl0eSwgCiAgICBBVkcocmV2aWV3X3Njb3JlKSBBUyBhdmdfcmV2aWV3X3Njb3JlCkZST00g\
bGlzdGluZ3MKR1JPVVAgQlkgbmVpZ2hib3VyaG9vZApPUkRFUiBCWSBudW1fbGlzdGluZ3MgREVTQzsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['num_listings'].reset_index(drop=True),
                               df_check['num_listings'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 14: Metrics by neighbourhood, expanded

#### 👇 Tasks
- ✔️ This time, you will include more details on the listing size (average number of bedrooms, bathrooms, and maximum guests).
- ✔️ Using the `listings` table, find the following measures by the neighbourhood (`neighbourhood`):
    - Count of rows (number of listings)
    - Average number of `bedrooms`
    - Average number of `bathrooms`
    - Average number of maximum guests (average of `accommodates`)
    - Average `price`
    - Average `availability_365`
    - Average `review_score`
- ✔️ Store your query to a new variable named `query_details_by_neighbourhood`.
- ✔️ The result of your query should have the following 8 columns (in the same order):
    - `neighbourhood`
    - `num_listings`
    - `avg_bedrooms`
    - `avg_bathrooms`
    - `avg_accommodates`
    - `avg_price`
    - `avg_availability`
    - `avg_review_score`
- ✔️ Sort the result by `num_listings` in descending order.

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.

|   | neighbourhood | num_listings | avg_bedrooms | avg_bathrooms | avg_accommodates |  avg_price | avg_availability | avg_review_score |
|--:|--------------:|-------------:|-------------:|--------------:|-----------------:|-----------:|-----------------:|-----------------:|
| 0 |     Champaign |          107 |     1.736842 |      1.552632 |         4.771930 | 108.192982 |       186.982456 |        95.631579 |
| 1 |        Urbana |           75 |     1.500000 |      1.166667 |         3.666667 | 117.666667 |       212.666667 |        98.333333 |
| 2 |       Mahomet |           41 |     1.333333 |      1.500000 |         2.333333 |  49.000000 |        31.666667 |        98.333333 |
| 3 |       Rantoul |           26 |     1.000000 |      1.000000 |         2.000000 |  29.000000 |       322.500000 |        96.000000 |

#### 🔥 Need help?

The query builds on the previous part and is very similar to it.
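
Because the expanded query only adds more `AVG(...) AS ...` lines to the same skeleton, the pattern can be sketched as a small string-building helper. The function `build_metrics_query` below is hypothetical and not part of the assignment; it is shown against a made-up three-row in-memory `listings` table.

```python
import sqlite3

def build_metrics_query(group_col, avg_cols):
    """Return a grouped-metrics query (hypothetical helper).

    avg_cols is a list of (column, alias) pairs. Names are pasted straight
    into the SQL text, so only pass names you typed yourself.
    """
    select_parts = [group_col, "COUNT(*) AS num_listings"]
    for col, alias in avg_cols:
        select_parts.append(f"AVG({col}) AS {alias}")
    select_list = ",\n    ".join(select_parts)
    return (
        f"SELECT\n    {select_list}\n"
        f"FROM listings\n"
        f"GROUP BY {group_col}\n"
        f"ORDER BY num_listings DESC;"
    )

# Start from the earlier metric, then add a size-related one in front.
base_metrics = [("price", "avg_price")]
expanded_metrics = [("bedrooms", "avg_bedrooms")] + base_metrics
query = build_metrics_query("neighbourhood", expanded_metrics)
print(query)

# Made-up data, just to show that the generated query runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (neighbourhood TEXT, bedrooms REAL, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [("West Town", 1, 100), ("West Town", 3, 200), ("Lake View", 2, 50)],
)
result = conn.execute(query).fetchall()
conn.close()

print(result)
# [('West Town', 2, 2.0, 150.0), ('Lake View', 1, 2.0, 50.0)]
```

Writing the query by hand, as the task asks, is still the clearer option for a one-off analysis; the helper only makes the shared structure visible.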

In [29]:
### BEGIN SOLUTION
query_details_by_neighbourhood = '''
SELECT 
    neighbourhood,
    COUNT(*) AS num_listings,
    AVG(bedrooms) AS avg_bedrooms,
    AVG(bathrooms) AS avg_bathrooms,
    AVG(accommodates) AS avg_accommodates,
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY neighbourhood
ORDER BY num_listings DESC;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_details_by_neighbourhood, con=conn)
display(df_result)
conn.close()

|   | neighbourhood | num_listings | avg_bedrooms | avg_bathrooms | avg_accommodates | avg_price | avg_availability | avg_review_score |
|--:|--------------:|-------------:|-------------:|--------------:|-----------------:|----------:|-----------------:|-----------------:|
| 0 | West Town | 430 | 1.890698 | 1.411628 | 4.874419 | 132.213953 | 151.095349 | 96.388372 |
| 1 | Lake View | 336 | 1.979167 | 1.535714 | 4.967262 | 184.476190 | 164.491071 | 96.437500 |
| 2 | Logan Square | 256 | 1.855469 | 1.263672 | 4.593750 | 120.261719 | 162.449219 | 96.640625 |
| 3 | Near North Side | 191 | 1.717277 | 1.489529 | 4.785340 | 158.308901 | 198.963351 | 94.099476 |
| 4 | Lincoln Park | 169 | 1.905325 | 1.437870 | 5.218935 | 154.621302 | 159.272189 | 95.627219 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63 | Edison Park | 2 | 2.500000 | 2.000000 | 8.000000 | 119.000000 | 318.000000 | 94.500000 |
| 64 | Avalon Park | 2 | 2.000000 | 1.000000 | 5.500000 | 92.000000 | 0.000000 | 97.500000 |
| 65 | West Englewood | 1 | 1.000000 | 1.000000 | 1.000000 | 28.000000 | 180.000000 | 92.000000 |
| 66 | West Elsdon | 1 | 1.000000 | 1.000000 | 3.000000 | 85.000000 | 352.000000 | 87.000000 |


#### 🧭 Check your work

In [30]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCAKICAgIG5laWdoYm91cmhvb2QsCiAgICBDT1VOVCgqKSBBUyBudW1fbGlzdGluZ3MsCiAgICBBVkcoYmVkcm9vbXMpI\
GFzIGF2Z19iZWRyb29tcywKICAgIEFWRyhiYXRocm9vbXMpIGFzIGF2Z19iYXRocm9vbXMsCiAgICBBVkcoYWNjb21tb2RhdGVzKSBh\
cyBhdmdfYWNjb21tb2RhdGVzLAogICAgQVZHKHByaWNlKSBBUyBhdmdfcHJpY2UsIAogICAgQVZHKGF2YWlsYWJpbGl0eV8zNjUpIEF\
TIGF2Z19hdmFpbGFiaWxpdHksIAogICAgQVZHKHJldmlld19zY29yZSkgQVMgYXZnX3Jldmlld19zY29yZQpGUk9NIGxpc3RpbmdzCk\
dST1VQIEJZIG5laWdoYm91cmhvb2QKT1JERVIgQlkgbnVtX2xpc3RpbmdzIERFU0M7Cg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['num_listings'].reset_index(drop=True),
                               df_check['num_listings'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 15: Metrics by host status, expanded

#### 👇 Tasks
- ✔️ This time, you will repeat the previous analysis using `is_superhost` instead of `neighbourhood`.
- ✔️ Using the `listings` table, find the following measures by the host status (`is_superhost`):
    - Count of rows (number of listings)
    - Average number of `bedrooms`
    - Average number of `bathrooms`
    - Average number of maximum guests (average of `accommodates`)
    - Average `price`
    - Average `availability_365`
    - Average `review_score`
- ✔️ Store your query to a new variable named `query_details_by_host_status`.
- ✔️ The result of your query should have the following 8 columns (in the same order):
    - `is_superhost`
    - `num_listings`
    - `avg_bedrooms`
    - `avg_bathrooms`
    - `avg_accommodates`
    - `avg_price`
    - `avg_availability`
    - `avg_review_score`
- ✔️ Sort the result by `is_superhost` in ascending order.

#### 🔑 Output Format

Your query result will likely have a different number of rows and different cell values.

|   | is_superhost | num_listings | avg_bedrooms | avg_bathrooms | avg_accommodates |  avg_price | avg_availability | avg_review_score |
|--:|-------------:|-------------:|-------------:|--------------:|-----------------:|-----------:|-----------------:|-----------------:|
| 0 |            0 |          400 |     1.465113 |      1.630648 |         4.132415 | 132.887151 |       220.066667 |        93.015012 |
| 1 |            1 |          500 |     1.357654 |      1.554667 |         5.142551 | 144.887511 |       193.269997 |        97.425115 |

In [31]:
### BEGIN SOLUTION
query_details_by_host_status = '''
SELECT 
    is_superhost,
    COUNT(*) AS num_listings,
    AVG(bedrooms) AS avg_bedrooms,
    AVG(bathrooms) AS avg_bathrooms,
    AVG(accommodates) AS avg_accommodates,
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY is_superhost
ORDER BY is_superhost;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_details_by_host_status, con=conn)
display(df_result)
conn.close()

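|   | is_superhost | num_listings | avg_bedrooms | avg_bathrooms | avg_accommodates | avg_price | avg_availability | avg_review_score |
|--:|-------------:|-------------:|-------------:|--------------:|-----------------:|----------:|-----------------:|-----------------:|
| 0 | 0 | 1628 | 1.775184 | 1.445946 | 4.547297 | 129.278256 | 172.953931 | 93.92199 |
| 1 | 1 | 1589 | 1.849591 | 1.335431 | 4.595972 | 129.773442 | 168.922593 | 97.658905 |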


#### 🧭 Check your work

In [32]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCAKICAgIGlzX3N1cGVyaG9zdCwKICAgIENPVU5UKCopIEFTIG51bV9saXN0aW5ncywKICAgIEFWRyhiZWRyb29tcy\
kgYXMgYXZnX2JlZHJvb21zLAogICAgQVZHKGJhdGhyb29tcykgYXMgYXZnX2JhdGhyb29tcywKICAgIEFWRyhhY2NvbW1vZGF0ZX\
MpIGFzIGF2Z19hY2NvbW1vZGF0ZXMsCiAgICBBVkcocHJpY2UpIEFTIGF2Z19wcmljZSwgCiAgICBBVkcoYXZhaWxhYmlsaXR5Xz\
M2NSkgQVMgYXZnX2F2YWlsYWJpbGl0eSwgCiAgICBBVkcocmV2aWV3X3Njb3JlKSBBUyBhdmdfcmV2aWV3X3Njb3JlCkZST00gbG\
lzdGluZ3MKR1JPVVAgQlkgaXNfc3VwZXJob3N0Ck9SREVSIEJZIGlzX3N1cGVyaG9zdDsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['is_superhost'].reset_index(drop=True),
                               df_check['is_superhost'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

## 🐳 More queries!

In this part, you will try out queries that calculate metrics of interest by more than one criterion. In pandas terms, this would be grouping by more than one column.

**The fully working query for each part will be given to you.** Your task is to *try* to understand what each query does. 🐬
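
As a warm-up, the sketch below groups a made-up four-row table by two columns at once. The table `toy_listings` and its values are hypothetical. With two columns in `GROUP BY`, there is one output row for each combination that actually occurs in the data.

```python
import sqlite3

# Hypothetical toy rows: (neighbourhood, room_type, price)
rows = [
    ("West Town", "Private room", 60.0),
    ("West Town", "Private room", 80.0),
    ("West Town", "Entire home/apt", 150.0),
    ("Lake View", "Entire home/apt", 200.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_listings (neighbourhood TEXT, room_type TEXT, price REAL)"
)
conn.executemany("INSERT INTO toy_listings VALUES (?, ?, ?)", rows)

query = """
SELECT
    neighbourhood,
    room_type,
    COUNT(*) AS num_listings,
    AVG(price) AS avg_price
FROM toy_listings
GROUP BY neighbourhood, room_type
ORDER BY neighbourhood, room_type;
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
# ('Lake View', 'Entire home/apt', 1, 200.0)
# ('West Town', 'Entire home/apt', 1, 150.0)
# ('West Town', 'Private room', 2, 70.0)
```

There is no row for private rooms in Lake View, because no such listing exists in the toy data; grouping never invents empty combinations.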

![SQL Query Meme](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/my-sql-query-meme-01.jpg?raw=true)

---

### 🎯 Part 16: Metrics by neighbourhood + room type

#### 👇 Tasks
- ✔️ Store your query to a new variable named `query_neighbourhood_room_type`.

![query](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/query-neighbourhood-room-type-popularity-01.png?raw=true)
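
The query for this part uses `HAVING`, which filters whole groups after aggregation. That is different from `WHERE`, which filters individual rows before any grouping happens. The sketch below contrasts the two on a made-up table; `toy_listings`, its values, and the thresholds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_listings (neighbourhood TEXT, price REAL)")
conn.executemany(
    "INSERT INTO toy_listings VALUES (?, ?)",
    [("A", 10), ("A", 20), ("A", 300), ("B", 40), ("C", 50), ("C", 60)],
)

# HAVING: build every group first, then drop groups with too few listings.
having_rows = conn.execute("""
SELECT neighbourhood, COUNT(*) AS num_listings, AVG(price) AS avg_price
FROM toy_listings
GROUP BY neighbourhood
HAVING num_listings > 1
ORDER BY avg_price;
""").fetchall()

# WHERE: drop expensive rows first, then group whatever is left.
where_rows = conn.execute("""
SELECT neighbourhood, COUNT(*) AS num_listings, AVG(price) AS avg_price
FROM toy_listings
WHERE price < 100
GROUP BY neighbourhood
ORDER BY neighbourhood;
""").fetchall()
conn.close()

print(having_rows)
# [('C', 2, 55.0), ('A', 3, 110.0)]
print(where_rows)
# [('A', 2, 15.0), ('B', 1, 40.0), ('C', 2, 55.0)]
```

SQLite accepts the alias `num_listings` inside `HAVING`. Some other database systems do not, and there the condition would be written as `HAVING COUNT(*) > 1`. Adding `LIMIT 1` to the first query would keep only the `C` row.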

In [33]:
### BEGIN SOLUTION
query_neighbourhood_room_type = '''
SELECT 
    neighbourhood,
    room_type,
    COUNT(*) AS num_listings,
    AVG(bedrooms) AS avg_bedrooms,
    AVG(bathrooms) AS avg_bathrooms,
    AVG(accommodates) AS avg_accommodates,
    AVG(price) AS avg_price, 
    AVG(availability_365) AS avg_availability, 
    AVG(review_score) AS avg_review_score
FROM listings
GROUP BY neighbourhood, room_type
HAVING num_listings > 20
ORDER BY avg_availability
LIMIT 10;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_neighbourhood_room_type, con=conn)
display(df_result)
conn.close()

|   | neighbourhood | room_type | num_listings | avg_bedrooms | avg_bathrooms | avg_accommodates | avg_price | avg_availability | avg_review_score |
|--:|--------------:|----------:|-------------:|-------------:|--------------:|-----------------:|----------:|-----------------:|-----------------:|
| 0 | Near West Side | Private room | 40 | 1.0 | 1.4 | 2.4 | 57.55 | 131.175 | 95.475 |
| 1 | Lincoln Park | Private room | 22 | 1.090909 | 1.159091 | 2.727273 | 75.590909 | 132.181818 | 94.363636 |
| 2 | Lower West Side | Entire home/apt | 68 | 1.926471 | 1.161765 | 5.132353 | 104.323529 | 133.691176 | 94.75 |
| 3 | Lower West Side | Private room | 42 | 1.071429 | 1.166667 | 2.190476 | 118.214286 | 134.166667 | 95.261905 |
| 4 | Lincoln Square | Entire home/apt | 44 | 1.772727 | 1.204545 | 4.477273 | 100.0 | 140.909091 | 96.477273 |
| 5 | West Town | Private room | 103 | 1.087379 | 1.160194 | 2.300971 | 67.660194 | 141.932039 | 96.242718 |
| 6 | South Lawndale | Entire home/apt | 21 | 1.952381 | 1.047619 | 5.047619 | 75.714286 | 142.666667 | 93.857143 |
| 7 | Rogers Park | Private room | 34 | 1.058824 | 1.426471 | 2.088235 | 64.029412 | 144.264706 | 96.235294 |
| 8 | Irving Park | Entire home/apt | 57 | 2.368421 | 1.307018 | 5.368421 | 177.982456 | 146.421053 | 97.192982 |
| 9 | Armour Square | Private room | 25 | 1.0 | 1.76 | 1.92 | 44.48 | 146.88 | 92.68 |


#### 🧭 Check your work

In [34]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCAKICAgIG5laWdoYm91cmhvb2QsCiAgICByb29tX3R5cGUsCiAgICBDT1VOVCgqKSBBUyBudW1fbGlzdGluZ3MsCiAg\
ICBBVkcoYmVkcm9vbXMpIGFzIGF2Z19iZWRyb29tcywKICAgIEFWRyhiYXRocm9vbXMpIGFzIGF2Z19iYXRocm9vbXMsCiAgICBBVk\
coYWNjb21tb2RhdGVzKSBhcyBhdmdfYWNjb21tb2RhdGVzLAogICAgQVZHKHByaWNlKSBBUyBhdmdfcHJpY2UsIAogICAgQVZHKGF2\
YWlsYWJpbGl0eV8zNjUpIEFTIGF2Z19hdmFpbGFiaWxpdHksIAogICAgQVZHKHJldmlld19zY29yZSkgQVMgYXZnX3Jldmlld19zY2\
9yZQpGUk9NIGxpc3RpbmdzCkdST1VQIEJZIG5laWdoYm91cmhvb2QsIHJvb21fdHlwZQpIQVZJTkcgbnVtX2xpc3RpbmdzID4gMjAK\
T1JERVIgQlkgYXZnX2F2YWlsYWJpbGl0eQpMSU1JVCAxMDsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['avg_availability'].reset_index(drop=True),
                               df_check['avg_availability'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 17: Rounded metrics by neighbourhood + room type

#### 👇 Tasks
- ✔️ Store your query to a new variable named `query_neighbourhood_room_type_rounded`.
- ✔️ The query below performs the same task as the previous part, but rounds each average to 1 decimal place.

![query](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/query-neighbourhood-room-type-popularity-rounded-01.png?raw=true)
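
`ROUND(value, 1)` rounds to the nearest tenth; it does not always round upward. The sketch below shows this on a made-up three-row table (`toy_listings` and its values are hypothetical) whose average is 1.333..., which becomes 1.3 rather than 1.4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_listings (bedrooms REAL)")
conn.executemany("INSERT INTO toy_listings VALUES (?)", [(1,), (1,), (2,)])

# Fetch the raw average and the rounded average side by side.
raw_avg, rounded_avg = conn.execute("""
SELECT
    AVG(bedrooms) AS avg_bedrooms,
    ROUND(AVG(bedrooms), 1) AS avg_bedrooms_rounded
FROM toy_listings;
""").fetchone()
conn.close()

print(raw_avg)      # 4/3, roughly 1.3333
print(rounded_avg)  # 1.3
```

Note that `ROUND` wraps the whole `AVG(...)` expression, so the average is computed at full precision first and rounded only once at the end.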

In [35]:
### BEGIN SOLUTION
query_neighbourhood_room_type_rounded = '''
SELECT 
    neighbourhood,
    room_type,
    COUNT(*) AS num_listings,
    ROUND(AVG(bedrooms), 1) AS avg_bedrooms,
    ROUND(AVG(bathrooms), 1) AS avg_bathrooms,
    ROUND(AVG(accommodates), 1) AS avg_accommodates,
    ROUND(AVG(price), 1) AS avg_price, 
    ROUND(AVG(availability_365), 1) AS avg_availability, 
    ROUND(AVG(review_score), 1) AS avg_review_score
FROM listings
GROUP BY neighbourhood, room_type
HAVING num_listings > 20
ORDER BY avg_availability
LIMIT 10;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_neighbourhood_room_type_rounded, con=conn)
display(df_result)
conn.close()

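|   | neighbourhood | room_type | num_listings | avg_bedrooms | avg_bathrooms | avg_accommodates | avg_price | avg_availability | avg_review_score |
|--:|--------------:|----------:|-------------:|-------------:|--------------:|-----------------:|----------:|-----------------:|-----------------:|
| 0 | Near West Side | Private room | 40 | 1.0 | 1.4 | 2.4 | 57.6 | 131.2 | 95.5 |
| 1 | Lincoln Park | Private room | 22 | 1.1 | 1.2 | 2.7 | 75.6 | 132.2 | 94.4 |
| 2 | Lower West Side | Entire home/apt | 68 | 1.9 | 1.2 | 5.1 | 104.3 | 133.7 | 94.8 |
| 3 | Lower West Side | Private room | 42 | 1.1 | 1.2 | 2.2 | 118.2 | 134.2 | 95.3 |
| 4 | Lincoln Square | Entire home/apt | 44 | 1.8 | 1.2 | 4.5 | 100.0 | 140.9 | 96.5 |
| 5 | West Town | Private room | 103 | 1.1 | 1.2 | 2.3 | 67.7 | 141.9 | 96.2 |
| 6 | South Lawndale | Entire home/apt | 21 | 2.0 | 1.0 | 5.0 | 75.7 | 142.7 | 93.9 |
| 7 | Rogers Park | Private room | 34 | 1.1 | 1.4 | 2.1 | 64.0 | 144.3 | 96.2 |
| 8 | Irving Park | Entire home/apt | 57 | 2.4 | 1.3 | 5.4 | 178.0 | 146.4 | 97.2 |
| 9 | Armour Square | Private room | 25 | 1.0 | 1.8 | 1.9 | 44.5 | 146.9 | 92.7 |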


#### 🧭 Check your work

In [36]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCAKICAgIG5laWdoYm91cmhvb2QsCiAgICByb29tX3R5cGUsCiAgICBDT1VOVCgqKSBBUyBudW1fbGlzdGluZ3\
MsCiAgICBST1VORChBVkcoYmVkcm9vbXMpLCAxKSBhcyBhdmdfYmVkcm9vbXMsCiAgICBST1VORChBVkcoYmF0aHJvb21zKS\
wgMSkgYXMgYXZnX2JhdGhyb29tcywKICAgIFJPVU5EKEFWRyhhY2NvbW1vZGF0ZXMpLCAxKSBhcyBhdmdfYWNjb21tb2RhdG\
VzLAogICAgUk9VTkQoQVZHKHByaWNlKSwgMSkgQVMgYXZnX3ByaWNlLCAKICAgIFJPVU5EKEFWRyhhdmFpbGFiaWxpdHlfMz\
Y1KSwgMSkgQVMgYXZnX2F2YWlsYWJpbGl0eSwgCiAgICBST1VORChBVkcocmV2aWV3X3Njb3JlKSwgMSkgQVMgYXZnX3Jldm\
lld19zY29yZQpGUk9NIGxpc3RpbmdzCkdST1VQIEJZIG5laWdoYm91cmhvb2QsIHJvb21fdHlwZQpIQVZJTkcgbnVtX2xpc3\
RpbmdzID4gMjAKT1JERVIgQlkgYXZnX2F2YWlsYWJpbGl0eQpMSU1JVCAxMDsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['avg_availability'].reset_index(drop=True),
                               df_check['avg_availability'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 18: Listings that were never booked

#### 👇 Tasks
- ✔️ Store your query to a new variable named `query_never_booked_listings`.

![query](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/query-never-booked-listings-01.png?raw=true)
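
The query for this part sorts by `name`. SQLite compares text byte by byte by default, so names starting with symbols such as `#` or `*` come first, and every uppercase letter sorts before every lowercase letter. The sketch below shows this with three made-up names in a hypothetical table `toy_listings`, and contrasts it with `COLLATE NOCASE`, which is not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_listings (name TEXT)")
conn.executemany(
    "INSERT INTO toy_listings VALUES (?)",
    [("apple loft",), ("Zen studio",), ("#1 spot",)],
)

# Default (binary) ordering: '#' < 'Z' < 'a'
default_order = [
    row[0]
    for row in conn.execute("SELECT name FROM toy_listings ORDER BY name;")
]

# NOCASE treats ASCII upper and lower case as equal when comparing
nocase_order = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM toy_listings ORDER BY name COLLATE NOCASE;"
    )
]
conn.close()

print(default_order)
# ['#1 spot', 'Zen studio', 'apple loft']
print(nocase_order)
# ['#1 spot', 'apple loft', 'Zen studio']
```

This is why, in a plain `ORDER BY name`, a listing whose name starts with a lowercase letter appears after all listings that start with an uppercase letter.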

In [37]:
### BEGIN SOLUTION
query_never_booked_listings = '''
SELECT name, neighbourhood, room_type, bedrooms, bathrooms, accommodates, price, availability_365
FROM listings
WHERE availability_365 = 365
ORDER BY name;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_never_booked_listings, con=conn)
display(df_result)
conn.close()

|   | name | neighbourhood | room_type | bedrooms | bathrooms | accommodates | price | availability_365 |
|--:|-----:|--------------:|----------:|---------:|----------:|-------------:|------:|-----------------:|
| 0 | #54-Home away from home- Safe place | Dunning | Private room | 1.0 | 1.0 | 2 | 33.0 | 365 |
| 1 | *McCormickMidway5 mi downtown43GreenTrain,Free... | Grand Boulevard | Private room | 2.0 | 1.0 | 4 | 85.0 | 365 |
| 2 | *Recent Price Drop! Superb location! | West Town | Private room | 1.0 | 2.0 | 2 | 71.0 | 365 |
| 3 | 1 BD Condo by downtown,park,lake | Near North Side | Entire home/apt | 1.0 | 1.0 | 2 | 75.0 | 365 |
| 4 | 1 Bed/Bath. Near Lake, Zoo, Wrigley | Lincoln Park | Entire home/apt | 1.0 | 1.0 | 2 | 110.0 | 365 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 153 | Your Own Home, 2-Bath with Laundry and Parking! | Logan Square | Entire home/apt | 1.0 | 2.0 | 4 | 204.0 | 365 |
| 154 | a-PublicTransit&FewMinAwayFromMcCormickPlace | Grand Boulevard | Private room | 1.0 | 1.0 | 2 | 44.0 | 365 |
| 155 | private bedroom/shared bathroom by Uni. of Chi... | South Shore | Private room | 1.0 | 1.0 | 2 | 50.0 | 365 |
| 156 | ♥Wicker Park🏠LARGE 1300SF🌞WiFi AC🚝SS DW W/D | Logan Square | Entire home/apt | 1.0 | 1.0 | 7 | 88.0 | 365 |


#### 🧭 Check your work

In [38]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCBuYW1lLCBuZWlnaGJvdXJob29kLCByb29tX3R5cGUsIGJlZHJvb21zLCBiYXRocm9vbXMsI\
GFjY29tbW9kYXRlcywgcHJpY2UsIGF2YWlsYWJpbGl0eV8zNjUKRlJPTSBsaXN0aW5ncwpXSEVSRSBhdmFp\
bGFiaWxpdHlfMzY1ID0gMzY1Ck9SREVSIEJZIG5hbWU7Cg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['name'].reset_index(drop=True),
                               df_check['name'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 19: Top 15 highest-rated listings

#### 👇 Tasks
- ✔️ Store your query to a new variable named `query_highest_rated_listings`.

![query](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/query-highest-rated-listings-01.png?raw=true)
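
The query for this part sorts by two columns. The second column only matters when the first one is tied. The sketch below illustrates this on four made-up listings in a hypothetical table `toy_listings`: three share a perfect score, so `number_of_reviews` decides their order, and `LIMIT` then cuts off the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_listings (name TEXT, review_score REAL, number_of_reviews INTEGER)"
)
conn.executemany(
    "INSERT INTO toy_listings VALUES (?, ?, ?)",
    [("A", 100, 5), ("B", 100, 50), ("C", 99, 500), ("D", 100, 20)],
)

top_names = [
    row[0]
    for row in conn.execute("""
SELECT name
FROM toy_listings
ORDER BY review_score DESC, number_of_reviews DESC
LIMIT 3;
""")
]
conn.close()

print(top_names)
# ['B', 'D', 'A']
```

Listing `C` has by far the most reviews but is left out, because its lower `review_score` is compared first.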

In [39]:
### BEGIN SOLUTION
query_highest_rated_listings = '''
SELECT
    name, neighbourhood, room_type, bedrooms, bathrooms, 
    accommodates, price, review_score, number_of_reviews
FROM listings
ORDER BY review_score DESC, number_of_reviews DESC
LIMIT 15;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_highest_rated_listings, con=conn)
display(df_result)
conn.close()

|   | name | neighbourhood | room_type | bedrooms | bathrooms | accommodates | price | review_score | number_of_reviews |
|--:|-----:|--------------:|----------:|---------:|----------:|-------------:|------:|-------------:|------------------:|
| 0 | Polk Street Coach House Apartment, Little Ital... | Near West Side | Entire home/apt | 2.0 | 1.0 | 6 | 110.0 | 100.0 | 511 |
| 1 | Cozy Wicker Park, Chicago Apartment | West Town | Entire home/apt | 1.0 | 1.0 | 2 | 159.0 | 100.0 | 269 |
| 2 | Andersonville - Perfect location! | Uptown | Private room | 1.0 | 1.5 | 2 | 110.0 | 100.0 | 250 |
| 3 | Urban Comfort in the Heart of Chicago | Lake View | Entire home/apt | 1.0 | 1.0 | 3 | 95.0 | 100.0 | 250 |
| 4 | Private Room/Bath Wicker Park, Free Street Par... | West Town | Private room | 1.0 | 1.0 | 2 | 40.0 | 100.0 | 216 |
| 5 | Andy's Bed & Breakfast Queen Room | North Park | Private room | 1.0 | 1.0 | 2 | 45.0 | 100.0 | 200 |
| 6 | Relax in a Newly Remodeled Apartment in Irving... | Irving Park | Entire home/apt | 2.0 | 1.0 | 6 | 132.0 | 100.0 | 187 |
| 7 | Hyde Park - Walk to UChicago, 10 min to McCormick | Hyde Park | Private room | 1.0 | 1.0 | 1 | 65.0 | 100.0 | 181 |
| 8 | Lincoln Park Avant Artist's Private 1891 Apart... | Lincoln Park | Private room | 1.0 | 1.0 | 3 | 99.0 | 100.0 | 174 |
| 9 | Northside Chicago Favorite: Rehabbed w/parking! | Rogers Park | Entire home/apt | 3.0 | 1.5 | 6 | 176.0 | 100.0 | 168 |


#### 🧭 Check your work

In [40]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVAogICAgbmFtZSwgbmVpZ2hib3VyaG9vZCwgcm9vbV90eXBlLCBiZWRyb29tcywgYmF0aHJvb21zLCAKICAgIGFjY\
29tbW9kYXRlcywgcHJpY2UsIHJldmlld19zY29yZSwgbnVtYmVyX29mX3Jldmlld3MKRlJPTSBsaXN0aW5ncwpPUkRFUiBCWSBy\
ZXZpZXdfc2NvcmUgREVTQywgbnVtYmVyX29mX3Jldmlld3MgZEVTQwpMSU1JVCAxNTsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['review_score'].reset_index(drop=True),
                               df_check['review_score'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Part 20: Search by keyword

#### 👇 Tasks
- ✔️ Store your query to a new variable named `query_bright_listings`.

![query](https://github.com/bdi475/images/blob/main/case-studies/airbnb-sql/query-search-by-keyword-01.png?raw=true)
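
The keyword search in this part relies on `LIKE` with `%` wildcards, where `%` matches any run of characters. The sketch below tries the pattern on four made-up names in a hypothetical table `toy_listings`. In SQLite's default configuration, `LIKE` already ignores case for ASCII letters, so wrapping the column in `LOWER()` mainly makes the intent explicit and keeps the query working on systems where `LIKE` is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_listings (name TEXT)")
conn.executemany(
    "INSERT INTO toy_listings VALUES (?)",
    [
        ("Bright Loft",),
        ("BRIGHT studio",),
        ("Sunny and bright room",),
        ("Dark basement",),
    ],
)

# Lower-case the name first, then look for the keyword anywhere in it.
with_lower = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM toy_listings WHERE LOWER(name) LIKE '%bright%' ORDER BY name;"
    )
]

# Same search without LOWER(); SQLite's default LIKE is case-insensitive for ASCII.
without_lower = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM toy_listings WHERE name LIKE '%bright%' ORDER BY name;"
    )
]
conn.close()

print(with_lower)
# ['BRIGHT studio', 'Bright Loft', 'Sunny and bright room']
print(with_lower == without_lower)
# True
```

The leading `%` means the keyword can appear anywhere in the name, not only at the start.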

In [41]:
### BEGIN SOLUTION
query_bright_listings = '''
SELECT name, price, review_score, number_of_reviews
FROM listings
WHERE LOWER(name) LIKE '%bright%'
ORDER BY review_score DESC, number_of_reviews DESC
LIMIT 10;
'''
### END SOLUTION

conn = sqlite3.connect('airbnb.db')
df_result = pd.read_sql_query(query_bright_listings, con=conn)
display(df_result)
conn.close()

|   | name | price | review_score | number_of_reviews |
|--:|-----:|------:|-------------:|------------------:|
| 0 | Relax in a Bright and Spacious Apartment in Lo... | 128.0 | 100.0 | 144 |
| 1 | Bright family-friendly Lincoln Square 2-bed 1-... | 94.0 | 100.0 | 135 |
| 2 | ARTpartment Retreat \| Bright & Bold in Bridgeport | 115.0 | 100.0 | 77 |
| 3 | Bright & Spacious 3br/2ba Condo near Wrigleyville | 231.0 | 100.0 | 38 |
| 4 | Huge Bright Modern 2BR Home in Heart of Bucktown | 165.0 | 100.0 | 24 |
| 5 | Bright and Cozy Apt in Irving Park / Albany Park! | 100.0 | 100.0 | 20 |
| 6 | Bright Vintage Room Near University of Chicago | 68.0 | 100.0 | 17 |
| 7 | Bright, Clean, and Spacious 3 BR Logan Square Apt | 76.0 | 100.0 | 14 |
| 8 | Bright Chicago Apt. 1BR 1.4 miles fr Wrigley | 89.0 | 99.0 | 237 |
| 9 | Lovely, bright, 3-bedroom Avondale flat. Private. | 95.0 | 99.0 | 221 |


#### 🧭 Check your work

In [42]:
conn = sqlite3.connect('airbnb.db')
decoded_query = base64.b64decode(
b'ClNFTEVDVCBuYW1lLCBwcmljZSwgcmV2aWV3X3Njb3JlLCBudW1iZXJfb2ZfcmV2aWV3cwpGUk9NIG\
xpc3RpbmdzCldIRVJFIExPV0VSKG5hbWUpIExJS0UgJyVicmlnaHQlJwpPUkRFUiBCWSByZXZpZXdfc2\
NvcmUgREVTQywgbnVtYmVyX29mX3Jldmlld3MgREVTQwpMSU1JVCAxMDsK'
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['review_score'].reset_index(drop=True),
                               df_check['review_score'].reset_index(drop=True))
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()