# SQL Queries and Aggregations

---

## ✨ Aggregating Data

Today, we will review basic SQL queries and try out a few aggregations. We will perform every exercise using both Pandas and SQL to see how similar they are.
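
As a quick illustration of that similarity, here is a minimal sketch on made-up data (the `toy_items` table and its values are assumptions for this example, not part of the Airbnb dataset). It filters the same rows once with a pandas boolean mask and once with a SQL `WHERE` clause against an in-memory SQLite database.

```python
import sqlite3

import pandas as pd

# Toy data, made up for illustration (not taken from the Airbnb dataset)
df_toy = pd.DataFrame({
    'item': ['desk', 'lamp', 'sofa', 'rug'],
    'price': [250, 40, 900, 120],
})

# Pandas: keep the rows where the boolean mask is True
df_pandas = df_toy[df_toy['price'] > 100]

# SQL: load the same rows into an in-memory SQLite database, then filter with WHERE
conn = sqlite3.connect(':memory:')
df_toy.to_sql(name='toy_items', con=conn, index=False)
df_sql = pd.read_sql_query('SELECT * FROM toy_items WHERE price > 100', con=conn)
conn.close()

print(df_pandas.reset_index(drop=True))
print(df_sql)
```

Both results contain the same three rows; only the syntax used to express the condition differs.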

▶️ First, run the code cell below to import modules used for **🧭 Check Your Work** sections and the autograder.

In [None]:
import unittest
import base64
tc = unittest.TestCase()

---

### 🎯 Pre-exercise: Import Packages

#### 👇 Tasks

- ✔️ Import the following Python packages.
    1. `pandas`: Use alias `pd`.
    2. `numpy`: Use alias `np`.
    3. `sqlite3`: No alias

In [None]:
# YOUR CODE BEGINS


# YOUR CODE ENDS

#### 🧭 Check your work

In [None]:
import sys
tc.assertTrue('pd' in globals(), 'Check whether you have correctly imported Pandas with an alias.')
tc.assertTrue('np' in globals(), 'Check whether you have correctly imported NumPy with an alias.')
tc.assertTrue('sqlite3' in globals(), 'Check whether you have correctly imported the sqlite3 package.')

---

### 📌 Populate a database table from a CSV file

▶️ Run the code below to create a DataFrame named `df_listings`.

In [None]:
df_listings = pd.read_csv('https://github.com/bdi475/datasets/raw/main/boston-airbnb-listings-small.csv')
df_listings_backup = df_listings.copy()

display(df_listings.head(5))

▶️ Run the code below to populate the `listings` table. All data in `df_listings` will be inserted into the table.

In [None]:
conn = sqlite3.connect('airbnb-boston-medium.db')
c = conn.cursor()

tables = list(pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn)['tbl_name'])

if 'listings' in tables:
    c.execute('DELETE FROM listings')
    conn.commit()
    
df_listings.to_sql(name='listings', index=False, con=conn, if_exists='append')

conn.close()
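
The cell above empties an existing table and then appends the DataFrame. As a minimal sketch of an alternative, assuming it is acceptable to drop and recreate the table on every run, `if_exists='replace'` achieves a similar result in one call. The `toy_listings` table and its values are made up for this example, and an in-memory database is used so that no file is created.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Toy data standing in for df_listings (hypothetical values)
df_toy = pd.DataFrame({'name': ['A', 'B', 'C'], 'price': [100, 200, 300]})

# closing() makes sure conn.close() runs even if an error is raised
with closing(sqlite3.connect(':memory:')) as conn:
    # Write the table twice to show that 'replace' does not duplicate rows
    for _ in range(2):
        df_toy.to_sql(name='toy_listings', con=conn, index=False, if_exists='replace')

    row_count = pd.read_sql_query(
        'SELECT COUNT(*) AS n FROM toy_listings', con=conn
    ).iloc[0, 0]

print(row_count)  # 3
```

One difference to keep in mind: `'replace'` drops the table, so its column types are inferred again from the DataFrame, whereas deleting rows and appending keeps the existing table definition.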

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn_checker = sqlite3.connect('airbnb-boston-medium.db')
table_to_check = 'listings'

# Check if table exists
user_tables = list(pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn_checker)['tbl_name'])
tc.assertTrue(table_to_check in user_tables, f'{table_to_check} does not exist in your airbnb-boston-medium.db file!')

conn_checker.close()

▶️ Run the code below to display `df_listings`.

In [None]:
display(df_listings)

---

### 🎯 Exercise 1: Select all columns and rows from the `listings` table (SQL)

#### 👇 Tasks

- ✔️ Write a query that selects all columns for all rows.
- ✔️ Store your query in a new variable named `query_select_all`.

In [None]:
# YOUR CODE BEGINS



# YOUR CODE ENDS
conn = sqlite3.connect('airbnb-boston-medium.db')
df_result = pd.read_sql_query(query_select_all, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('airbnb-boston-medium.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCAqCkZST00gbGlzdGluZ3MK').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
conn.close()

---

### 🎯 Exercise 2A: Find expensive listings (Pandas)

#### 👇 Tasks

- ✔️ Using `df_listings`, find all rows with a price over $1,000 (`df_listings['price'] > 1000`).
- ✔️ Store the result in a new variable named `df_expensive_listings`.

#### 🔑 Expected Output

Your index column may contain different values.

|     | name                                           | neighbourhood   | room_type       |   bedrooms |   bathrooms |   minimum_nights |   price |   availability_365 |   number_of_reviews |   review_score |   is_superhost |
|----:|:-----------------------------------------------|:----------------|:----------------|-----------:|------------:|-----------------:|--------:|-------------------:|--------------------:|---------------:|---------------:|
| 502 | Large 4BDR near Harvard with Parking           | Allston         | Entire home/apt |          4 |         2.5 |                1 |    1250 |                  0 |                 136 |             98 |              0 |
| 669 | The Historic House in the North End/Waterfront | North End       | Entire home/apt |          3 |         2   |               91 |    1052 |                  0 |                 104 |             98 |              1 |

In [None]:
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_expensive_listings)

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
df_check = df_listings_backup.query(f"{'PrIcE'.lower()} > {25 * 40}")

pd.testing.assert_frame_equal(df_expensive_listings.sort_values(df_expensive_listings.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Exercise 2B: Find expensive listings (SQL)

#### 👇 Tasks

- ✔️ Using the `listings` table, select rows with a price over $1,000 (`price > 1000`).
- ✔️ Select all columns.
- ✔️ Store your query in a new variable named `query_expensive_listings`.

In [None]:
# YOUR CODE BEGINS




# YOUR CODE ENDS
conn = sqlite3.connect('airbnb-boston-medium.db')
df_result = pd.read_sql_query(query_expensive_listings, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('airbnb-boston-medium.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCAqCkZST00gbGlzdGluZ3MKV0hFUkUgcHJpY2UgPiAxMDAwCg==').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Exercise 3A: Find large listings (Pandas)

#### 👇 Tasks

- ✔️ Using `df_listings`, find all rows where the listing has:
    - 5 or more bedrooms
    - **AND** 3 or more bathrooms
- ✔️ Select only the following 5 columns (in the same order):
    - `name`, `bedrooms`, `bathrooms`, `price`, `review_score`
- ✔️ Store the result in a new variable named `df_large_listings`.

#### 🔑 Expected Output

Your index column may contain different values.

|     | name                                               |   bedrooms |   bathrooms |   price |   review_score |
|----:|:---------------------------------------------------|-----------:|------------:|--------:|---------------:|
| 151 | Beautiful Philadelphia house near Harvard U. w/ pk |          5 |           3 |     600 |             82 |
| 250 | Spacious 4 BR \| 2.5 BA Single-Family Home.        |          5 |           3 |     373 |             97 |
| 921 | ★Large Retreat 5BR w/3BA★ Close to Everything      |          5 |           3 |     738 |             95 |

In [None]:
# YOUR CODE BEGINS

# YOUR CODE ENDS
display(df_large_listings)

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
__r = 'oo'.join(['R', 'Ms']).lower()
df_check = df_listings_backup.query(f'bed{__r} >= {2 ** 2 + 1} & bath{__r} >= {2 ** 1 + 1}') \
    [['_'.join(['review', 'score']), 'price', 'bath' + __r, 'bed' + __r, 'name'][::-1]]

pd.testing.assert_frame_equal(df_large_listings.sort_values(df_large_listings.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Exercise 3B: Find large listings (SQL)

#### 👇 Tasks

- ✔️ Using the `listings` table, find all rows where the listing has:
    - 5 or more bedrooms
    - **AND** 3 or more bathrooms
- ✔️ Select only the following 5 columns (in the same order):
    - `name`, `bedrooms`, `bathrooms`, `price`, `review_score`
- ✔️ Store your query in a new variable named `query_large_listings`.

In [None]:
# YOUR CODE BEGINS




# YOUR CODE ENDS
conn = sqlite3.connect('airbnb-boston-medium.db')
df_result = pd.read_sql_query(query_large_listings, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('airbnb-boston-medium.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCBuYW1lLCBiZWRyb29tcywgYmF0aHJvb21zLCBwcmljZSwgcmV2aWV3X3Njb3JlCkZST00gbGlzdGluZ3MKV0hFUkUgKGJlZHJvb21zID49IDUpIEFORCAoYmF0aHJvb21zID49IDMpCg=='
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.columns.tolist(), df_check.columns.tolist(), 'Incorrect set of columns or order')
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
conn.close()

---

### 🎯 Exercise 4A: Number of listings (Pandas)

#### 👇 Tasks

- ✔️ Find the number of rows in `df_listings`.
- ✔️ Store the result in a new variable named `num_listings`.
- ✔️ `num_listings` should be an integer type.

In [None]:
# YOUR CODE BEGINS
# YOUR CODE ENDS
print(f'There are {num_listings} Airbnb listings in the dataset.')

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
tc.assertEqual(num_listings, len(df_listings_backup.index), f'Incorrect number of listings - should be {len(df_listings_backup.index)}')

---

### 🎯 Exercise 4B: Number of listings (SQL)

#### 👇 Tasks

- ✔️ Write a query that counts the number of rows in the `listings` table.
- ✔️ Store your query in a new variable named `query_num_listings`.

In [None]:
# YOUR CODE BEGINS



# YOUR CODE ENDS
conn = sqlite3.connect('airbnb-boston-medium.db')
df_result = pd.read_sql_query(query_num_listings, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('airbnb-boston-medium.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCBDT1VOVCgqKQpGUk9NIGxpc3RpbmdzCg==').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.iloc[0, 0], df_check.iloc[0, 0], 'Incorrect number of listings')
conn.close()

---

### 🎯 Exercise 5A: Get the number of listings by neighbourhood (Pandas)

#### 👇 Tasks

- ✔️ Using `df_listings`, find the number of listings by neighbourhood.
- ✔️ Store the result in a new variable named `df_by_neighbourhood`.
- ✔️ `df_by_neighbourhood` should have the following two columns.
    1. `neighbourhood`: Name of the neighbourhood
    2. `num_listings`: Number of listings in a neighbourhood
- ✔️ `print(df_by_neighbourhood.columns.tolist())` should print `['neighbourhood', 'num_listings']`.
- ✔️ **Sort** `df_by_neighbourhood` by `num_listings` in descending order.

#### 🔑 Expected Output

Your index column may contain different values.

|    | neighbourhood   |   num_listings |
|---:|:----------------|---------------:|
|  7 | Dorchester      |            215 |
| 12 | Jamaica Plain   |            129 |
| 19 | Roxbury         |            107 |
|  9 | East Boston     |             90 |
| 22 | South End       |             86 |

In [None]:
# YOUR CODE BEGINS





# YOUR CODE ENDS
df_by_neighbourhood.head(5)

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
df_check = df_listings_backup.pivot_table(index=['neigh' + 'BoUr'.lower() + 'HooD'.lower()],
                                          values=['NaMe'.lower()],
                                          aggfunc=('CoU' + 'Nt').lower()) \
                                .rename(columns={'NaME'.lower(): '_'.join(['NuM', 'LiSTinGs']).lower()}) \
                                .reset_index().sort_values('_'.join(['NuM', 'LiSTinGs']).lower(), ascending=bool(0))

pd.testing.assert_series_equal(df_by_neighbourhood['num_listings'].reset_index(drop=True),
                               df_check['num_listings'].reset_index(drop=True))

---

### 🎯 Exercise 5B: Get the number of listings by neighbourhood (SQL)

#### 👇 Tasks

- ✔️ Write a query that counts the number of listings by neighbourhood from the `listings` table.
- ✔️ The result of the query should have the following two columns.
    1. `neighbourhood`: Name of the neighbourhood
    2. `num_listings`: Number of listings in a neighbourhood
- ✔️ Store your query in a new variable named `query_num_listings_by_neighbourhood`.
- ✔️ **Sort** your result by `num_listings` in descending order.

In [None]:
# YOUR CODE BEGINS





# YOUR CODE ENDS
conn = sqlite3.connect('airbnb-boston-medium.db')
df_result = pd.read_sql_query(query_num_listings_by_neighbourhood, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('airbnb-boston-medium.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCBuZWlnaGJvdXJob29kLCBDT1VOVCgqKSBBUyBudW1fbGlzdGluZ3MKRlJPTSBsaXN0aW5ncwpHUk9VUCBCWSBuZWlnaGJvdXJob29kCk9SREVSIEJZIG51bV9saXN0aW5ncyBERVNDCg==').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_series_equal(df_result['num_listings'].reset_index(drop=True),
                               df_check['num_listings'].reset_index(drop=True))
conn.close()

#### 🚀 Solution

```sql
SELECT neighbourhood, COUNT(*) AS num_listings
FROM listings
GROUP BY neighbourhood
ORDER BY num_listings DESC
```
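
The same group, count, and sort steps map onto pandas one by one. The sketch below uses made-up data (the `toy_orders` table, `city`, and `num_orders` are assumptions for this example) and is one possible way to write it, not the only one.

```python
import sqlite3

import pandas as pd

# Toy data, made up for illustration
df_toy = pd.DataFrame({
    'city': ['Austin', 'Boston', 'Austin', 'Denver', 'Austin', 'Boston'],
    'order_id': [1, 2, 3, 4, 5, 6],
})

# Pandas: group, count rows per group, name the count column, then sort
df_pandas = (
    df_toy.groupby('city')
    .size()
    .reset_index(name='num_orders')
    .sort_values('num_orders', ascending=False)
    .reset_index(drop=True)
)

# SQL: the same steps written as GROUP BY, COUNT(*), and ORDER BY
conn = sqlite3.connect(':memory:')
df_toy.to_sql(name='toy_orders', con=conn, index=False)
df_sql = pd.read_sql_query(
    '''
    SELECT city, COUNT(*) AS num_orders
    FROM toy_orders
    GROUP BY city
    ORDER BY num_orders DESC
    ''',
    con=conn,
)
conn.close()

print(df_pandas)
print(df_sql)
```

`groupby(...).size()` counts rows per group much like `COUNT(*)`, and `reset_index(name=...)` plays the role of the `AS` alias.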

---

### 🎯 Exercise 6A: Average price by room type (Pandas)

#### 👇 Tasks

- ✔️ Using `df_listings`, find the average price by room type.
- ✔️ Store the result in a new variable named `df_price_by_room_type`.
- ✔️ `df_price_by_room_type` should have the following two columns.
    1. `room_type`: Room type
    2. `price`: Average price of the room type
- ✔️ `print(df_price_by_room_type.columns.tolist())` should print `['room_type', 'price']`.
- ✔️ **Sort** `df_price_by_room_type` by `price` in ascending order.

#### 🔑 Expected Output

Your index column may contain different values.

|    | room_type       |    price |
|---:|:----------------|---------:|
|  3 | Shared room     |  34.75   |
|  2 | Private room    |  82.2278 |
|  0 | Entire home/apt | 201.135  |
|  1 | Hotel room      | 206.273  |

In [None]:
# YOUR CODE BEGINS





# YOUR CODE ENDS
df_price_by_room_type

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
df_check = df_listings_backup.pivot_table(index=['RoOm_'.lower() + 'TyPe'.lower()],
                                          values=['price'],
                                          aggfunc='mean') \
                                .reset_index() \
                                .sort_values('price', ascending=bool(not False))

pd.testing.assert_frame_equal(df_price_by_room_type.sort_values(df_price_by_room_type.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Exercise 6B: Average price by room type (SQL)

#### 👇 Tasks

- ✔️ Write a query that finds the average price by room type from the `listings` table.
- ✔️ Store your query in a new variable named `query_price_by_room_type`.
- ✔️ Your result should have the following two columns.
    1. `room_type`: Room type
    2. `price`: Average price of the room type
- ✔️ **Sort** your result by `price` in ascending order.

In [None]:
# YOUR CODE BEGINS





# YOUR CODE ENDS
conn = sqlite3.connect('airbnb-boston-medium.db')
df_result = pd.read_sql_query(query_price_by_room_type, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('airbnb-boston-medium.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCByb29tX3R5cGUsIEFWRyhwcmljZSkgYXMgcHJpY2UKRlJPTSBsaXN0aW5ncwpHUk9VUCBCWSByb29tX3R5cGUKT1JERVIgQlkgcHJpY2UK').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

conn.close()

---

## 🎬 More SQL Exercises

In the next section, we will work with a [Bollywood Movies Dataset](https://data.mendeley.com/datasets/3c57btcxy9/1) that covers all 1,698 Hindi-language movies released in India between 2005 and 2017, collected from the Box Office India website.

Source: [Mendeley Data](https://data.mendeley.com/datasets/3c57btcxy9/1)


### 📌 Populate a database table from a CSV file

▶️ Run the code below to create a DataFrame named `df_movies`.

In [None]:
df_movies = pd.read_csv('https://github.com/bdi475/datasets/raw/main/bollywood-movies.csv')
df_movies_backup = df_movies.copy()

display(df_movies.head(5))

▶️ Run the code below to populate the `movies` table. All data in `df_movies` will be inserted into the table.

In [None]:
conn = sqlite3.connect('bollywood-movies.db')
c = conn.cursor()

tables = list(pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn)['tbl_name'])

if 'movies' in tables:
    c.execute('DELETE FROM movies')
    conn.commit()
    
df_movies.to_sql(name='movies', index=False, con=conn, if_exists='append')

conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn_checker = sqlite3.connect('bollywood-movies.db')
table_to_check = 'movies'

# Check if table exists
user_tables = list(pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn_checker)['tbl_name'])
tc.assertTrue(table_to_check in user_tables, f'{table_to_check} does not exist in your bollywood-movies.db file!')

conn_checker.close()

▶️ Run the code below to display `df_movies`.

In [None]:
display(df_movies)
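
The movie exercises combine sorting with a row limit. As a minimal sketch on made-up data (the `toy_films` table and its columns are assumptions for this example), `ORDER BY` sorts the rows first and `LIMIT` then keeps only the first few. The pandas counterpart shown here is `sort_values()` followed by `head()`.

```python
import sqlite3

import pandas as pd

# Toy data, made up for illustration (not taken from the movies dataset)
df_toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'tickets': [500, 1200, 300, 800],
})

conn = sqlite3.connect(':memory:')
df_toy.to_sql(name='toy_films', con=conn, index=False)

# ORDER BY sorts first; LIMIT then keeps only the first 2 rows
df_top2 = pd.read_sql_query(
    '''
    SELECT title, tickets
    FROM toy_films
    ORDER BY tickets DESC
    LIMIT 2
    ''',
    con=conn,
)
conn.close()

# Pandas counterpart: sort descending, then take the first 2 rows
df_top2_pandas = (
    df_toy.sort_values('tickets', ascending=False)
    .head(2)
    .reset_index(drop=True)
)

print(df_top2)
print(df_top2_pandas)
```

When several rows tie on the sort column, neither version promises a particular order among them unless a second sort key is added.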

---

### 🎯 Exercise 7: Top 5 movies with the largest number of screens

#### 👇 Tasks

- ✔️ Find the top 5 movies with the largest number of screens.
- ✔️ The result of the query should have the following two columns.
    1. `movie_name`: Name of the movie
    2. `num_screens`: Number of screens
- ✔️ Store your query in a new variable named `query_top5_movies_by_screens`.
- ✔️ **Sort** your result by `num_screens` in descending order.

#### 🔑 Expected Output

|    | movie_name           |   num_screens |
|---:|:---------------------|--------------:|
|  0 | Tiger Zinda Hai      |          4600 |
|  1 | Tubelight            |          4400 |
|  2 | Sultan               |          4350 |
|  3 | Dangal               |          4250 |
|  4 | Prem Ratan Dhan Payo |          4200 |

In [None]:
# YOUR CODE BEGINS





# YOUR CODE ENDS
conn = sqlite3.connect('bollywood-movies.db')
df_result = pd.read_sql_query(query_top5_movies_by_screens, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('bollywood-movies.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCBtb3ZpZV9uYW1lLCBudW1fc2NyZWVuc\
wpGUk9NIG1vdmllcwpPUkRFUiBCWSBudW1fc2NyZWVucyBERVNDCkxJTUlUIDU7Cg==').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.reset_index(drop=True),
                              df_check.reset_index(drop=True))

conn.close()

---

### 🎯 Exercise 8: Get the number of movies by lead star

#### 👇 Tasks

- ✔️ Write a query that counts the number of movies by lead star from the `movies` table.
- ✔️ The result of the query should have the following two columns.
    1. `lead_star`: Name of the lead star
    2. `num_movies`: Number of movies starring the lead star
- ✔️ Store your query in a new variable named `query_num_movies_by_lead_star`.
- ✔️ **Sort** your result by `num_movies` in descending order.
- ✔️ Only select the top 10 rows.

#### 🔑 Expected Output

|    | lead_star        |   num_movies |
|---:|:-----------------|-------------:|
|  0 | Akshay Kumar     |           48 |
|  1 | Ajay Devgn       |           35 |
|  2 | Salman Khan      |           27 |
|  3 | Emraan Hashmi    |           27 |
|  4 | Amitabh Bachchan |           21 |
|  5 | Shahid Kapoor    |           18 |
|  6 | Sanjay Dutt      |           18 |
|  7 | Saif Ali Khan    |           18 |
|  8 | Ranbir Kapoor    |           17 |
|  9 | John Abraham     |           17 |

In [None]:
# YOUR CODE BEGINS






# YOUR CODE ENDS
conn = sqlite3.connect('bollywood-movies.db')
df_result = pd.read_sql_query(query_num_movies_by_lead_star, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('bollywood-movies.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCBsZWFkX3N0YXIsIENPVU5UKCopIEFTIG51bV9tb\
3ZpZXMKRlJPTSBtb3ZpZXMKR1JPVVAgQlkgbGVhZF9zdGFyCk9SREVSIEJZIG51bV9tb3ZpZXMgREVTQwpMSU1JVCAxMDsK').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.reset_index(drop=True),
                              df_check.reset_index(drop=True))

conn.close()

---

### 🎯 Exercise 9: Average revenue and budget by genre

#### 👇 Tasks

- ✔️ Write a query that finds the average revenue and budget by genre.
- ✔️ The result of the query should have the following three columns.
    1. `genre`: Each genre
    2. `average_revenue`: Average revenue
    3. `average_budget`: Average budget
- ✔️ Store your query in a new variable named `query_genre_stats`.
- ✔️ **Sort** your result by `average_revenue` in descending order.
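
Computing more than one aggregate per group only requires listing each aggregate expression with its own alias. Here is a minimal sketch on made-up data (the `toy_products` table and the `category`, `sales`, and `cost` columns are assumptions for this example), together with a pandas counterpart based on named aggregation.

```python
import sqlite3

import pandas as pd

# Toy data, made up for illustration (not taken from the movies dataset)
df_toy = pd.DataFrame({
    'category': ['x', 'x', 'y', 'y'],
    'cost': [10.0, 30.0, 5.0, 15.0],
    'sales': [100.0, 300.0, 40.0, 60.0],
})

conn = sqlite3.connect(':memory:')
df_toy.to_sql(name='toy_products', con=conn, index=False)

# Each aggregate gets its own expression and alias in the SELECT list
df_stats = pd.read_sql_query(
    '''
    SELECT
        category,
        AVG(sales) AS average_sales,
        AVG(cost) AS average_cost
    FROM toy_products
    GROUP BY category
    ORDER BY average_sales DESC
    ''',
    con=conn,
)
conn.close()

# Pandas counterpart: named aggregation, one keyword per output column
df_stats_pandas = (
    df_toy.groupby('category')
    .agg(average_sales=('sales', 'mean'), average_cost=('cost', 'mean'))
    .reset_index()
    .sort_values('average_sales', ascending=False)
    .reset_index(drop=True)
)

print(df_stats)
print(df_stats_pandas)
```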

#### 🔑 Expected Output

|    | genre        |   average_revenue |   average_budget |
|---:|:-------------|------------------:|-----------------:|
|  0 | masala       |     1002500000.00 |    2684070937.50 |
|  1 | documentary  |      390000000.00 |     649515000.00 |
|  2 | action       |      345958661.42 |     612091863.19 |
|  3 | fantasy      |      199615384.62 |      85812019.23 |
|  4 | rom__com     |      199323157.89 |     295074342.11 |
|  5 | thriller     |      155841981.13 |     195139264.39 |
|  6 | love_story   |      154432330.83 |     256592253.29 |
|  7 | comedy       |      152713028.17 |     234431389.08 |
|  8 | drama        |      110292762.13 |     164228786.27 |
|  9 | animation    |       97500000.00 |      13300000.00 |
| 10 | horror       |       71103773.58 |      97006981.13 |
| 11 | suspense     |       38750000.00 |      20628333.33 |
| 12 | mythological |        9214285.71 |        728035.71 |
| 13 | adult        |        3957692.31 |       4591378.21 |

In [None]:
# YOUR CODE BEGINS








# YOUR CODE ENDS
# Show floats with two decimal places instead of scientific notation
pd.set_option('display.float_format', lambda x: '%.2f' % x)

conn = sqlite3.connect('bollywood-movies.db')
df_result = pd.read_sql_query(query_genre_stats, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('bollywood-movies.db')
decoded_query = base64.b64decode(b'ClNFTEVDVCAKICAgIGdlbnJlLAogICAgQVZHKHJldmVudWVfaW5fSU5S\
KSBBUyBhdmVyYWdlX3JldmVudWUsCiAgICBBVkcoYnVkZ2V0X2luX0lOUikgQVMgYXZlcmFnZV9idWRnZXQKRlJPTSB\
tb3ZpZXMKR1JPVVAgQlkgZ2VucmUKT1JERVIgQlkgYXZlcmFnZV9yZXZlbnVlIERFU0M7Cg==').decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
tc.assertEqual(df_result.shape, df_check.shape, 'Number of rows and/or columns is different')
pd.testing.assert_frame_equal(df_result.reset_index(drop=True),
                              df_check.reset_index(drop=True))

conn.close()