# Aggregations in SQLite

Just like in pandas, you can use SQL to perform aggregations on your data. Common aggregation functions include `COUNT()`, `SUM()`, `AVG()`, `MIN()`, and `MAX()`.

The syntax for using these functions is straightforward. Here are some examples:

```sql
-- Count the number of listings
SELECT COUNT(*) AS total_listings
FROM listings;
```

```sql
-- Calculate the average price of listings
SELECT AVG(price) AS average_price
FROM listings;
```
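The remaining functions follow the same pattern. As a minimal sketch, the snippet below runs all five aggregations in one query against a throwaway in-memory database. The three-row `listings` table and the `demo_conn` name are assumptions made up for illustration, not the Airbnb data used later.

```python
import sqlite3

# Hypothetical toy table; the real `listings` table has many more columns.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE listings (name TEXT, price REAL)")
demo_conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("A", 100.0), ("B", 150.0), ("C", 50.0)],
)

# One query can return several aggregates at once.
row = demo_conn.execute(
    """
    SELECT COUNT(*), SUM(price), AVG(price), MIN(price), MAX(price)
    FROM listings;
    """
).fetchone()

print(row)  # (3, 300.0, 100.0, 50.0, 150.0)
demo_conn.close()
```

Each aggregate collapses the whole table into a single value, so the query returns exactly one row.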


▶️ Import `pandas`, `numpy`, and `sqlite3`.


In [2]:
import pandas as pd
import numpy as np
import sqlite3

---

## 🆚 Comparing Pandas and SQL Aggregations using Airbnb Listings

In this section, we will compare how to perform similar tasks using both pandas and SQL. We will use a dataset of Airbnb listings for this comparison.


▶️ Create a DataFrame named `df_listings`.

The DataFrame contains Boston Airbnb listings data.


In [3]:
df_listings = pd.read_csv(
    "https://github.com/bdi475/datasets/raw/main/boston-airbnb-listings-small.csv"
)
df_listings_backup = df_listings.copy()

display(df_listings.head(5))

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,minimum_nights,price,availability_365,number_of_reviews,review_score,is_superhost
0,HARBORSIDE-Walk to subway,East Boston,Entire home/apt,1.0,1.0,28,150.0,123,17,99.0,0
1,**$49 Special ** Private! Minutes to center!,Roxbury,Entire home/apt,1.0,1.0,3,145.0,0,107,95.0,1
2,$99 Special!! Home Away! Condo,Roxbury,Entire home/apt,1.0,1.0,3,169.0,0,115,96.0,1
3,Bright 1bed facing Golden Dome,Downtown,Entire home/apt,1.0,1.0,91,81.0,281,32,96.0,1
4,The Dorset Redline | 3BR 1BA | Walk to Redline...,Dorchester,Entire home/apt,3.0,1.0,32,129.0,103,52,86.0,0


▶️ Populate the `listings` table. This will insert the data from the `df_listings` DataFrame into the `listings` table in the SQLite database.


In [4]:
conn = sqlite3.connect("airbnb-boston-medium.db")
c = conn.cursor()

# List the tables that already exist in the database
tables = list(
    pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn)[
        "tbl_name"
    ]
)

# Empty the table first so that re-running this cell does not duplicate rows
if "listings" in tables:
    c.execute("DELETE FROM listings")
    conn.commit()

df_listings.to_sql(name="listings", index=False, con=conn, if_exists="append")

1339

▶️ Display the first 5 rows of the `df_listings` DataFrame.


In [5]:
display(df_listings.head())

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,minimum_nights,price,availability_365,number_of_reviews,review_score,is_superhost
0,HARBORSIDE-Walk to subway,East Boston,Entire home/apt,1.0,1.0,28,150.0,123,17,99.0,0
1,**$49 Special ** Private! Minutes to center!,Roxbury,Entire home/apt,1.0,1.0,3,145.0,0,107,95.0,1
2,$99 Special!! Home Away! Condo,Roxbury,Entire home/apt,1.0,1.0,3,169.0,0,115,96.0,1
3,Bright 1bed facing Golden Dome,Downtown,Entire home/apt,1.0,1.0,91,81.0,281,32,96.0,1
4,The Dorset Redline | 3BR 1BA | Walk to Redline...,Dorchester,Entire home/apt,3.0,1.0,32,129.0,103,52,86.0,0


---

### 📌 Example 1: Select all rows and columns


▶️ Use pandas.


In [6]:
display(df_listings)

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,minimum_nights,price,availability_365,number_of_reviews,review_score,is_superhost
0,HARBORSIDE-Walk to subway,East Boston,Entire home/apt,1.0,1.0,28,150.0,123,17,99.0,0
1,**$49 Special ** Private! Minutes to center!,Roxbury,Entire home/apt,1.0,1.0,3,145.0,0,107,95.0,1
2,$99 Special!! Home Away! Condo,Roxbury,Entire home/apt,1.0,1.0,3,169.0,0,115,96.0,1
3,Bright 1bed facing Golden Dome,Downtown,Entire home/apt,1.0,1.0,91,81.0,281,32,96.0,1
4,The Dorset Redline | 3BR 1BA | Walk to Redline...,Dorchester,Entire home/apt,3.0,1.0,32,129.0,103,52,86.0,0
...,...,...,...,...,...,...,...,...,...,...,...
1334,Quiet and Sunny Top Floor Studio! South End LOVE!,South End,Entire home/apt,1.0,1.0,1,92.0,328,14,77.0,0
1335,FOUND Boston Common - Standard Full Room,Bay Village,Private room,1.0,1.0,1,100.0,253,53,90.0,0
1336,FOUND Boston Common - Queen Room,Downtown,Private room,1.0,1.0,1,114.0,254,49,89.0,0
1337,FOUND Boston Common - Double Queen Room,Downtown,Private room,1.0,1.0,1,143.0,252,43,93.0,0


▶️ Use SQLite.


In [7]:
query_select_all = """
SELECT *
FROM listings;
"""

df_result = pd.read_sql_query(query_select_all, con=conn)
display(df_result)

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,minimum_nights,price,availability_365,number_of_reviews,review_score,is_superhost
0,HARBORSIDE-Walk to subway,East Boston,Entire home/apt,1.0,1.0,28,150.0,123,17,99.0,0
1,**$49 Special ** Private! Minutes to center!,Roxbury,Entire home/apt,1.0,1.0,3,145.0,0,107,95.0,1
2,$99 Special!! Home Away! Condo,Roxbury,Entire home/apt,1.0,1.0,3,169.0,0,115,96.0,1
3,Bright 1bed facing Golden Dome,Downtown,Entire home/apt,1.0,1.0,91,81.0,281,32,96.0,1
4,The Dorset Redline | 3BR 1BA | Walk to Redline...,Dorchester,Entire home/apt,3.0,1.0,32,129.0,103,52,86.0,0
...,...,...,...,...,...,...,...,...,...,...,...
1334,Quiet and Sunny Top Floor Studio! South End LOVE!,South End,Entire home/apt,1.0,1.0,1,92.0,328,14,77.0,0
1335,FOUND Boston Common - Standard Full Room,Bay Village,Private room,1.0,1.0,1,100.0,253,53,90.0,0
1336,FOUND Boston Common - Queen Room,Downtown,Private room,1.0,1.0,1,114.0,254,49,89.0,0
1337,FOUND Boston Common - Double Queen Room,Downtown,Private room,1.0,1.0,1,143.0,252,43,93.0,0


:::{hint} What does the semicolon (`;`) do in SQL?

The semicolon (`;`) is used to terminate a SQL statement. This is especially important when executing multiple SQL statements in a single execution block, as it helps the SQL interpreter understand where one statement ends and the next begins.

If you're only running a single SQL statement, the semicolon is optional in many SQL environments. But it is a good practice to include it for clarity and to avoid syntax errors in some systems.

```sql
-- Because this is a single statement, the semicolon is optional here
SELECT * FROM listings
```

```sql
-- But if you have multiple statements, you need to use semicolons to separate them
INSERT INTO my_table (column1, column2) VALUES ('value1', 'value2');
UPDATE my_table SET column1 = 'new_value' WHERE column2 = 'value2';
DELETE FROM my_table WHERE column2 = 'value2';
```

:::
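From Python, the same rule shows up in the `sqlite3` module. The sketch below uses an in-memory database and the placeholder table `my_table` from the hint; the `demo_conn` name is an assumption for illustration. `executescript()` runs several semicolon-separated statements, while `execute()` accepts only one statement per call.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# executescript() runs several statements; semicolons mark the boundaries.
demo_conn.executescript(
    """
    CREATE TABLE my_table (column1 TEXT, column2 TEXT);
    INSERT INTO my_table (column1, column2) VALUES ('value1', 'value2');
    UPDATE my_table SET column1 = 'new_value' WHERE column2 = 'value2';
    """
)

# A single statement works with or without the trailing semicolon.
rows = demo_conn.execute("SELECT * FROM my_table").fetchall()
print(rows)  # [('new_value', 'value2')]

# execute() rejects more than one statement per call.
# Older Python versions raise sqlite3.Warning, newer ones sqlite3.ProgrammingError.
try:
    demo_conn.execute("SELECT 1; SELECT 2;")
    rejected = False
except (sqlite3.Warning, sqlite3.ProgrammingError):
    rejected = True

print(rejected)  # True
demo_conn.close()
```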


---

### 📌 Example 2: Find expensive listings

Find listings with a price greater than $1,000.


▶️ Use pandas.


In [8]:
df_listings[df_listings["price"] > 1000]

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,minimum_nights,price,availability_365,number_of_reviews,review_score,is_superhost
502,Large 4BDR near Harvard with Parking,Allston,Entire home/apt,4.0,2.5,1,1250.0,0,136,98.0,0
669,The Historic House in the North End/Waterfront,North End,Entire home/apt,3.0,2.0,91,1052.0,0,104,98.0,1


▶️ Use SQLite.


In [9]:
query_listings_above_1000 = """
SELECT *
FROM listings
WHERE price > 1000
"""

df_result = pd.read_sql_query(query_listings_above_1000, con=conn)
display(df_result)

Unnamed: 0,name,neighbourhood,room_type,bedrooms,bathrooms,minimum_nights,price,availability_365,number_of_reviews,review_score,is_superhost
0,Large 4BDR near Harvard with Parking,Allston,Entire home/apt,4.0,2.5,1,1250.0,0,136,98.0,0
1,The Historic House in the North End/Waterfront,North End,Entire home/apt,3.0,2.0,91,1052.0,0,104,98.0,1


---

### 📌 Example 3: Find large listings

Filter listings with 5 or more bedrooms AND 3 or more bathrooms. Select only the `"name"`, `"bedrooms"`, `"bathrooms"`, `"price"`, `"review_score"` columns.


▶️ Use pandas.


In [10]:
df_large_listings = df_listings[
    (df_listings["bedrooms"] >= 5) & (df_listings["bathrooms"] >= 3)
]
df_large_listings = df_large_listings[
    ["name", "bedrooms", "bathrooms", "price", "review_score"]
]

display(df_large_listings)

Unnamed: 0,name,bedrooms,bathrooms,price,review_score
151,Beautiful Philadelphia house near Harvard U. w...,5.0,3.0,600.0,82.0
250,Spacious 4 BR | 2.5 BA Single-Family Home.,5.0,3.0,373.0,97.0
921,★Large Retreat 5BR w/3BA★ Close to Everything,5.0,3.0,738.0,95.0


▶️ Use SQLite.


In [11]:
query_large_listings = """
SELECT name, bedrooms, bathrooms, price, review_score

FROM listings
WHERE (bedrooms >= 5) AND (bathrooms >= 3)
"""

df_result = pd.read_sql_query(query_large_listings, con=conn)
display(df_result)

Unnamed: 0,name,bedrooms,bathrooms,price,review_score
0,Beautiful Philadelphia house near Harvard U. w...,5.0,3.0,600.0,82.0
1,Spacious 4 BR | 2.5 BA Single-Family Home.,5.0,3.0,373.0,97.0
2,★Large Retreat 5BR w/3BA★ Close to Everything,5.0,3.0,738.0,95.0


---

### 📌 Example 4: Find the number of listings


▶️ Use pandas.


In [12]:
num_listings = df_listings.shape[0]

print(f"There are {num_listings} Airbnb listings in the dataset.")

There are 1339 Airbnb listings in the dataset.


▶️ Use SQLite.


In [13]:
query_num_listings = """
SELECT COUNT(*)
FROM listings
"""

df_result = pd.read_sql_query(query_num_listings, con=conn)
display(df_result)

Unnamed: 0,COUNT(*)
0,1339


:::{tip} What does the asterisk (\*) mean in SQL?

The asterisk (`*`) in SQL is a wildcard character that represents all columns in a table. When you use `SELECT *`, it means you want to retrieve all columns from the specified table. For example, `SELECT * FROM listings;` will return all columns for every row in the `listings` table. When you use it inside `COUNT(*)`, it counts all rows in the table.

If you specify a column name instead of an asterisk, such as `SELECT COUNT(price)`, it will count only the rows where the `price` column is not NULL.

Alternatively, you can use `SELECT COUNT(1)` which counts all rows in the table, similar to `COUNT(*)`. The choice between `COUNT(*)` and `COUNT(1)` is often a matter of preference, as they generally yield the same result. Historically, older database systems had performance differences between the two, but in modern databases, they are optimized to perform similarly.

:::
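The difference between the three forms only shows up when a column contains `NULL` values. Here is a minimal sketch on a made-up three-row table (an assumption for illustration, not the Airbnb data) in which one `price` is missing:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE listings (name TEXT, price REAL)")
demo_conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("A", 100.0), ("B", None), ("C", 50.0)],  # None is stored as NULL
)

# COUNT(*) and COUNT(1) count rows; COUNT(price) skips NULL prices.
counts = demo_conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(price) FROM listings;"
).fetchone()

print(counts)  # (3, 3, 2)
demo_conn.close()
```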


---

### 📌 Example 5: Find the number of listings by neighbourhood

Find the number of listings in each neighbourhood. Sort the results by the number of listings in descending order. Display only the top 5 neighbourhoods.


▶️ Use pandas.


In [14]:
df_by_neighbourhood = df_listings.groupby("neighbourhood", as_index=False).agg(
    {"name": "count"}
)

df_by_neighbourhood.rename(columns={"name": "num_listings"}, inplace=True)
df_by_neighbourhood.sort_values("num_listings", ascending=False, inplace=True)

df_by_neighbourhood.head(5)

Unnamed: 0,neighbourhood,num_listings
7,Dorchester,215
12,Jamaica Plain,129
19,Roxbury,107
9,East Boston,90
22,South End,86


▶️ Use SQLite.


In [15]:
query_num_listings_by_neighbourhood = """
SELECT neighbourhood, COUNT(*) AS num_listings
FROM listings
GROUP BY neighbourhood
ORDER BY num_listings DESC
LIMIT 5;
"""

df_result = pd.read_sql_query(query_num_listings_by_neighbourhood, con=conn)
display(df_result)

Unnamed: 0,neighbourhood,num_listings
0,Dorchester,215
1,Jamaica Plain,129
2,Roxbury,107
3,East Boston,90
4,South End,86


---

### 📌 Example 6: Calculate the average price by room type

Calculate the average price for each room type. Sort the results by price in ascending order.


▶️ Use pandas.


In [16]:
df_price_by_room_type = df_listings.groupby("room_type", as_index=False).agg(
    {"price": "mean"}
)

df_price_by_room_type.sort_values("price", inplace=True)

df_price_by_room_type

Unnamed: 0,room_type,price
3,Shared room,34.75
2,Private room,82.227848
0,Entire home/apt,201.13489
1,Hotel room,206.272727


▶️ Use SQLite.


In [17]:
query_price_by_room_type = """
SELECT room_type, AVG(price) AS price
FROM listings
GROUP BY room_type
ORDER BY price
"""

df_result = pd.read_sql_query(query_price_by_room_type, con=conn)
display(df_result)

Unnamed: 0,room_type,price
0,Shared room,34.75
1,Private room,82.227848
2,Entire home/apt,201.13489
3,Hotel room,206.272727


---

## 🎬 More Aggregation Examples

In this section, we will work with a [Bollywood Movies Dataset](https://data.mendeley.com/datasets/3c57btcxy9/1) that covers all 1698 Hindi-language movies released in India between 2005 and 2017, collected from the Box Office India website.

Source: [Mendeley Data](https://data.mendeley.com/datasets/3c57btcxy9/1)


▶️ Create a DataFrame named `df_movies` from a CSV file.


In [18]:
df_movies = pd.read_csv(
    "https://github.com/bdi475/datasets/raw/main/bollywood-movies.csv"
)
df_movies_backup = df_movies.copy()

display(df_movies.head(5))

Unnamed: 0,movie_name,release_period,is_remake,is_franchise,genre,is_new_actor,is_new_director,is_new_music_director,lead_star,director,music_director,num_screens,revenue_in_INR,budget_in_INR
0,Golden Boys,Normal,No,No,suspense,Yes,No,No,Jeet Goswami,Ravi Varma,Baba Jagirdar,5,5000000,85000
1,Kaccha Limboo,Holiday,No,No,drama,Yes,No,Yes,Karan Bhanushali,Sagar Ballary,Amardeep Nijjer,75,15000000,825000
2,Not A Love Story,Holiday,No,No,thriller,No,No,No,Mahie Gill,Ram Gopal Verma,Sandeep Chowta,525,75000000,56700000
3,Qaidi Band,Holiday,No,No,drama,Yes,No,No,Aadar Jain,Habib Faisal,Amit Trivedi,800,210000000,4500000
4,Chaatwali,Holiday,No,No,adult,Yes,Yes,Yes,Aadil Khan,Aadil Khan,Babloo Ustad,1,1000000,1075000


▶️ Run the code below to populate the `movies` table. This will insert the data from the `df_movies` DataFrame into the `movies` table in the SQLite database.


In [19]:
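conn = sqlite3.connect("bollywood-movies.db")
c = conn.cursor()

# List the tables that already exist in the database
# (SQL string literals use single quotes; double quotes denote identifiers)
tables = list(
    pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn)[
        "tbl_name"
    ]
)

# Empty the table first so that re-running this cell does not duplicate rows
if "movies" in tables:
    c.execute("DELETE FROM movies")
    conn.commit()

df_movies.to_sql(name="movies", index=False, con=conn, if_exists="append")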

1698

▶️ Display all rows and columns of the `movies` table.


In [20]:
pd.read_sql_query("SELECT * FROM movies", con=conn)

Unnamed: 0,movie_name,release_period,is_remake,is_franchise,genre,is_new_actor,is_new_director,is_new_music_director,lead_star,director,music_director,num_screens,revenue_in_INR,budget_in_INR
0,Golden Boys,Normal,No,No,suspense,Yes,No,No,Jeet Goswami,Ravi Varma,Baba Jagirdar,5,5000000,85000
1,Kaccha Limboo,Holiday,No,No,drama,Yes,No,Yes,Karan Bhanushali,Sagar Ballary,Amardeep Nijjer,75,15000000,825000
2,Not A Love Story,Holiday,No,No,thriller,No,No,No,Mahie Gill,Ram Gopal Verma,Sandeep Chowta,525,75000000,56700000
3,Qaidi Band,Holiday,No,No,drama,Yes,No,No,Aadar Jain,Habib Faisal,Amit Trivedi,800,210000000,4500000
4,Chaatwali,Holiday,No,No,adult,Yes,Yes,Yes,Aadil Khan,Aadil Khan,Babloo Ustad,1,1000000,1075000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1693,Fight Club,Holiday,No,No,action,No,Yes,No,Zayed Khan,Vikram Chopra,Pritam,375,82500000,88862500
1694,Strings Of Paasion,Normal,No,No,drama,No,Yes,Yes,Zeenat Aman,Sanghamitra Chaudhuri,Dev Sikdar,10,8000000,70000
1695,Dunno Y Na Jaane Kyun,Normal,No,No,drama,No,No,No,Zeenat Aman,Sanjay Sharma,Nikhil,20,12500000,850000
1696,Taj Mahal - An Eternal Love Story,Normal,No,No,drama,No,Yes,No,Zulfi Sayed,Akbar Khan,Naushad,135,100000000,31065000


---

### 📌 Example 7: Find the top 5 movies with the largest number of screens

Find the top 5 movies with the largest number of screens. Only show the movie name and number of screens. Sort the results by number of screens in descending order.


In [21]:
query_top5_movies_by_screens = """
SELECT movie_name, num_screens
FROM movies
ORDER BY num_screens DESC
LIMIT 5;
"""

df_result = pd.read_sql_query(query_top5_movies_by_screens, con=conn)
display(df_result)

Unnamed: 0,movie_name,num_screens
0,Tiger Zinda Hai,4600
1,Tubelight,4400
2,Sultan,4350
3,Dangal,4250
4,Prem Ratan Dhan Payo,4200


---

### 📌 Example 8: Find the top 10 lead stars by number of movies

Find the number of movies for each lead star. Sort the results by the number of movies in descending order. Only show the top 10 lead stars.


In [22]:
query_num_movies_by_lead_star = """
SELECT lead_star, COUNT(*) AS num_movies
FROM movies
GROUP BY lead_star
ORDER BY num_movies DESC
LIMIT 10;
"""

df_result = pd.read_sql_query(query_num_movies_by_lead_star, con=conn)
display(df_result)

Unnamed: 0,lead_star,num_movies
0,Akshay Kumar,48
1,Ajay Devgn,35
2,Salman Khan,27
3,Emraan Hashmi,27
4,Amitabh Bachchan,21
5,Shahid Kapoor,18
6,Sanjay Dutt,18
7,Saif Ali Khan,18
8,Ranbir Kapoor,17
9,John Abraham,17


---

### 📌 Example 9: Calculate statistics by genre

Find the number of movies, average revenue, and average budget for each genre. Sort the results by average revenue in descending order.


▶️ Use SQLite.


In [None]:
query_genre_stats = """
SELECT 
    genre,
    COUNT(*) AS num_movies,
    AVG(revenue_in_INR) AS average_revenue,
    AVG(budget_in_INR) AS average_budget
FROM movies
GROUP BY genre
ORDER BY average_revenue DESC;
"""

# Avoid scientific notation so that large numbers
# in the output are easier to read
pd.set_option("display.float_format", lambda x: "%.2f" % x)

df_result = pd.read_sql_query(query_genre_stats, con=conn)
display(df_result)

Unnamed: 0,genre,num_movies,average_revenue,average_budget
0,masala,16,1002500000.0,2684070937.5
1,documentary,1,390000000.0,649515000.0
2,action,127,345958661.42,612091863.19
3,fantasy,13,199615384.62,85812019.23
4,rom__com,95,199323157.89,295074342.11
5,thriller,212,155841981.13,195139264.39
6,love_story,133,154432330.83,256592253.29
7,comedy,284,152713028.17,234431389.08
8,drama,639,110292762.13,164228786.27
9,animation,3,97500000.0,13300000.0


---

### 📌 Example 10: Calculate statistics by genre and release period

Find the number of movies, average revenue, and average budget for each genre and release period. Sort the results by average revenue in descending order.

Exclude remakes and only include groups with at least 20 movies.


▶️ Use SQLite.


In [None]:
query_genre_release_period_stats = """
SELECT 
    genre,
    release_period,
    COUNT(*) AS num_movies,
    AVG(revenue_in_INR) AS average_revenue,
    AVG(budget_in_INR) AS average_budget
FROM movies
WHERE is_remake = 'No'
GROUP BY genre, release_period
HAVING num_movies >= 20
ORDER BY average_revenue DESC;
"""

df_result = pd.read_sql_query(query_genre_release_period_stats, con=conn)
display(df_result)

Unnamed: 0,genre,release_period,num_movies,average_revenue,average_budget
0,action,Holiday,51,398083333.33,756808203.43
1,rom__com,Holiday,31,247338709.68,287379153.23
2,action,Normal,57,209552631.58,294643258.77
3,comedy,Holiday,88,197420454.55,322201593.75
4,thriller,Holiday,77,183993506.49,239388675.97
5,love_story,Holiday,45,179222222.22,324451444.44
6,rom__com,Normal,63,174574603.17,297420019.84
7,love_story,Normal,83,134512048.19,207346231.18
8,thriller,Normal,125,125468000.0,153630748.0
9,drama,Holiday,223,123428251.12,192415071.52


In this example, we passed two columns to the `GROUP BY` clause: `genre` and `release_period`. This means that the aggregation functions (`COUNT()`, `AVG()`) will be calculated for each unique combination of `genre` and `release_period`.

:::{seealso} `HAVING` is used to filter groups after aggregation

This example also demonstrates the `HAVING` clause, which filters groups after aggregation, much as the `WHERE` clause filters rows before aggregation. Here, `HAVING num_movies >= 20` keeps only the combinations of `genre` and `release_period` that contain at least 20 movies. See the next section for more details.

:::
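To make the idea of "each unique combination" concrete, here is a small sketch of grouping by two columns. The four-row `movies` table and its `revenue` column are assumptions made up for illustration, not the Bollywood dataset.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE movies (genre TEXT, release_period TEXT, revenue INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("action", "Holiday", 400),
        ("action", "Holiday", 200),
        ("action", "Normal", 100),
        ("drama", "Normal", 50),
    ],
)

# One output row per (genre, release_period) pair that occurs in the data.
rows = demo_conn.execute(
    """
    SELECT genre, release_period, COUNT(*) AS num_movies, AVG(revenue) AS average_revenue
    FROM movies
    GROUP BY genre, release_period
    ORDER BY average_revenue DESC;
    """
).fetchall()

for r in rows:
    print(r)
# ('action', 'Holiday', 2, 300.0)
# ('action', 'Normal', 1, 100.0)
# ('drama', 'Normal', 1, 50.0)
demo_conn.close()
```

Note that "action" appears twice, once for each release period, while the pair ("drama", "Holiday") produces no row because no such movie exists in the toy table.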


---

## 🕵️‍♀️ `WHERE` (row-level filter) vs `HAVING` (group-level filter)

The `WHERE` clause is used to filter rows **before** any `GROUP BY` operations, while the `HAVING` clause is used to filter groups **after** a `GROUP BY` operation has been applied.

You cannot use `WHERE` to filter on aggregated results; for that, you must use `HAVING`. In Example 10, the condition `WHERE is_remake = 'No'` is applied before the aggregation because `is_remake` is a property of each individual row.

Similarly, `HAVING num_movies >= 20` can only be applied after the aggregation. This is because `num_movies` is calculated during the aggregation and does not exist until the grouping has been completed.
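As a rough illustration of this rule, the sketch below tries an aggregate condition in both clauses on a made-up three-row `movies` table (an assumption for illustration). SQLite rejects the aggregate inside `WHERE` and accepts the same condition in `HAVING`. The exact wording of the error message depends on the SQLite version, so it is printed rather than shown here.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (genre TEXT)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?)",
    [("drama",), ("drama",), ("action",)],
)

# An aggregate inside WHERE is an error: rows are filtered before groups exist.
try:
    demo_conn.execute(
        "SELECT genre, COUNT(*) FROM movies WHERE COUNT(*) >= 2 GROUP BY genre;"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(where_error)

# The same condition is valid in HAVING, which runs after grouping.
having_rows = demo_conn.execute(
    """
    SELECT genre, COUNT(*) AS num_movies
    FROM movies
    GROUP BY genre
    HAVING num_movies >= 2;
    """
).fetchall()

print(having_rows)  # [('drama', 2)]
demo_conn.close()
```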


We can see the difference between `WHERE` and `HAVING` clauses by applying them to the same column. In the example below, we use both `WHERE` and `HAVING` to filter the results based on the `budget_in_INR` column.


▶️ First, we use the `WHERE` clause to keep only movies with a budget greater than ₹2 million **before** the aggregation.

This means that only movies with a budget greater than ₹2 million will be considered in the aggregation functions (`COUNT()`, `AVG()`). The filtering is done at the row level before any grouping occurs.


In [30]:
query_using_where = """
SELECT 
    genre,
    AVG(budget_in_INR) AS avg_budget
FROM movies
WHERE budget_in_INR > 2000000
GROUP BY genre;
"""

df_result = pd.read_sql_query(query_using_where, con=conn)
display(df_result)

Unnamed: 0,genre,avg_budget
0,action,777194696.25
1,adult,15446250.0
2,animation,13300000.0
3,comedy,293087828.19
4,documentary,649515000.0
5,drama,280903908.98
6,fantasy,111393375.0
7,horror,131508333.33
8,love_story,366662819.23
9,masala,2684070937.5


▶️ Next, we use the `HAVING` clause to keep only groups where the average budget is greater than ₹2 million **after** the aggregation.

This means that all movies are considered in the aggregation functions, but only those groups where the average budget exceeds ₹2 million are included in the final result. The filtering is done at the group level after the aggregation has been completed.


In [None]:
query_using_HAVING = """
SELECT 
    genre,
    AVG(budget_in_INR) AS avg_budget
FROM movies
GROUP BY genre
HAVING AVG(budget_in_INR) > 2000000;
"""

df_result = pd.read_sql_query(query_using_HAVING, con=conn)
display(df_result)

Unnamed: 0,genre,avg_budget
0,action,612091863.19
1,adult,4591378.21
2,animation,13300000.0
3,comedy,234431389.08
4,documentary,649515000.0
5,drama,164228786.27
6,fantasy,85812019.23
7,horror,97006981.13
8,love_story,256592253.29
9,masala,2684070937.5


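The two queries answer different questions, and the averages show it. With `WHERE`, movies with budgets of ₹2 million or less are dropped before averaging, so the "adult" average is 15446250.0. With `HAVING`, every movie is averaged, which gives 4591378.21 for the same genre.

The set of genres can differ as well. A genre in which no movie has a budget above ₹2 million is missing from the first query, and because its overall average cannot exceed ₹2 million either, it is missing from the second query too. A genre can appear in the first query but not in the second when some of its movies exceed ₹2 million while the average across all of its movies does not.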
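The contrast is easier to check by hand on a tiny table. The sketch below uses made-up genres and budgets (assumptions for illustration, not values from the Bollywood dataset) and runs the same pair of queries.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (genre TEXT, budget INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("action", 6_000_000),
        ("action", 1_000_000),
        ("adult", 3_000_000),
        ("adult", 500_000),
        ("indie", 1_000_000),
    ],
)

# Row-level filter: only movies above 2 million take part in the average.
where_rows = demo_conn.execute(
    """
    SELECT genre, AVG(budget) AS avg_budget
    FROM movies
    WHERE budget > 2000000
    GROUP BY genre
    ORDER BY genre;
    """
).fetchall()

# Group-level filter: every movie is averaged, then groups are filtered.
having_rows = demo_conn.execute(
    """
    SELECT genre, AVG(budget) AS avg_budget
    FROM movies
    GROUP BY genre
    HAVING AVG(budget) > 2000000
    ORDER BY genre;
    """
).fetchall()

print(where_rows)   # [('action', 6000000.0), ('adult', 3000000.0)]
print(having_rows)  # [('action', 3500000.0)]
demo_conn.close()
```

In this toy data, "action" appears in both results with different averages, "adult" survives `WHERE` but is removed by `HAVING` because its overall average is 1750000.0, and "indie" appears in neither.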
