# Ungraded Lab: Group and Filter

## 📋 Overview 
Welcome to this hands-on lab, where you'll expand your knowledge of SQL queries using the BookCycle database. Building on your previous experience with basic queries, you'll learn to group and filter data using the GROUP BY and HAVING clauses. These tools help you summarize information and extract meaningful insights from large datasets.

As a data analyst at BookCycle, your task is to help the management team understand sales patterns across different store locations. By mastering these techniques, you'll be able to provide insights that drive business decisions.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- Use GROUP BY clauses to summarize data based on specific criteria
- Apply aggregate functions like COUNT, SUM, and AVG with GROUP BY
- Implement HAVING clauses to filter groups based on aggregate conditions
- Write complex queries that combine GROUP BY and HAVING to extract meaningful insights

## 📚 Dataset Information
We'll be working with the 'transactions' table from the BookCycle database. This table contains information about book sales across different store locations, including transaction details, customer information, and sales data.


## 🖥️ Activities

### Activity 1: Setting Up and Basic Grouping 

Before we begin complex grouping, let's set up our environment and start with a simple GROUP BY query.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

✅ Database setup complete: Tables created and populated with data!


In [2]:
# Connect to the SQLite database
conn = sqlite3.connect('bookcycle.db')

# Helper that runs a SQL query and returns the result as a pandas DataFrame
def run_query(query):
    return pd.read_sql_query(query, conn)

<b>Step 2:</b> Let's start by counting the number of transactions for each store location:

In [3]:
query = """
SELECT store_location, COUNT(*) as transaction_count
FROM transactions
GROUP BY store_location;
"""

result = run_query(query)
display(result)

|   | store_location | transaction_count |
|---|----------------|-------------------|
| 0 | Downtown       | 24                |
| 1 | Suburban       | 26                |
| 2 | University     | 50                |


<b>💡 Tip:</b> The GROUP BY clause collapses rows that share the same values in the specified columns into one summary row per group.
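To see the grouping on data small enough to check by hand, here is a minimal sketch using an in-memory SQLite database. The `orders` table and its `city` and `amount` columns are hypothetical and are not part of the BookCycle database; the sketch uses only the standard-library `sqlite3` module rather than the lab's `run_query` helper.

```python
import sqlite3

# Hypothetical toy table for illustration; not part of the BookCycle database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (city TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Austin", 5.0), ("Boston", 7.5), ("Austin", 2.5), ("Boston", 1.0), ("Austin", 4.0)],
)

# Five input rows collapse into one summary row per distinct city
grouped = demo.execute(
    """
    SELECT city, COUNT(*) AS order_count
    FROM orders
    GROUP BY city
    ORDER BY city;
    """
).fetchall()

print(grouped)  # [('Austin', 3), ('Boston', 2)]
demo.close()
```

The `ORDER BY` is included only to make the row order predictable; GROUP BY on its own does not guarantee any particular order.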

<b>Step 3: Try it yourself:</b> Write a query to find the total sales (sum of sale_price) for each payment method:

In [None]:
query = """
<YOUR CODE HERE>
"""

result = run_query(query)
display(result)

### Activity 2: Using Multiple Aggregate Functions 

Often, we need to calculate multiple aggregates for each group. Let's explore how to do this.

<b>Step 1:</b> Let's calculate the total sales, average sale price, and number of transactions for each store location:

In [4]:
query = """
SELECT 
    store_location,
    SUM(sale_price) as total_sales,
    AVG(sale_price) as avg_sale_price,
    COUNT(*) as transaction_count
FROM transactions
GROUP BY store_location;
"""

result = run_query(query)
display(result)

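|   | store_location | total_sales | avg_sale_price | transaction_count |
|---|----------------|-------------|----------------|-------------------|
| 0 | Downtown       | 241.76      | 10.073333      | 24                |
| 1 | Suburban       | 286.74      | 11.028462      | 26                |
| 2 | University     | 586.5       | 11.73          | 50                |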


 <b>💡 Tip:</b> You can use multiple aggregate functions in a single GROUP BY query.
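Averages often come back with long decimal tails, as in the avg_sale_price column above. One possible way to tidy them is SQLite's built-in `ROUND` function, sketched here on a hypothetical in-memory table (`orders`, `category`, and `amount` are illustrative names, not BookCycle columns).

```python
import sqlite3

# Hypothetical toy table for illustration; not part of the BookCycle database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (category TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 1.0), ("A", 2.0), ("A", 2.0), ("B", 4.0), ("B", 5.0)],
)

# Each aggregate is computed independently for every group;
# ROUND(..., 2) trims the average to two decimal places
summary = demo.execute(
    """
    SELECT
        category,
        SUM(amount) AS total,
        ROUND(AVG(amount), 2) AS avg_amount,
        COUNT(*) AS n
    FROM orders
    GROUP BY category
    ORDER BY category;
    """
).fetchall()

for row in summary:
    print(row)
demo.close()
```

Rounding inside the query changes the values that are returned, so it is best kept for display; round later in pandas if the unrounded averages are needed for further calculations.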

<b>Step 2: Try it yourself:</b> Write a query to find the minimum, maximum, and average sale price for each payment method:

In [None]:
query = """
<YOUR CODE HERE>
"""

result = run_query(query)
display(result)

### Activity 3: Filtering Groups with HAVING 

Sometimes we need to filter groups based on aggregate values. This is where the HAVING clause comes in handy.

<b>Step 1:</b> Let's find store locations with more than 10 transactions:

In [5]:
query = """
SELECT 
    store_location,
    COUNT(*) as transaction_count
FROM transactions
GROUP BY store_location
-- SQLite accepts the column alias here; some databases require HAVING COUNT(*) > 10
HAVING transaction_count > 10;
"""

result = run_query(query)
display(result)

|   | store_location | transaction_count |
|---|----------------|-------------------|
| 0 | Downtown       | 24                |
| 1 | Suburban       | 26                |
| 2 | University     | 50                |


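<b>💡 Tip:</b> HAVING is used to filter groups, while WHERE filters individual rows before grouping. In the result above, every store location has more than 10 transactions, so all three groups pass the filter.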
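The difference between the two clauses is easiest to see side by side. The following is a small sketch on a hypothetical in-memory table (`orders`, `city`, and `amount` are illustrative names, not BookCycle columns), using only the standard-library `sqlite3` module.

```python
import sqlite3

# Hypothetical toy table for illustration; not part of the BookCycle database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (city TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Austin", 5.0), ("Austin", 2.5), ("Austin", 4.0), ("Boston", 7.5), ("Boston", 1.0)],
)

# WHERE removes individual rows first, so each group's count shrinks
where_rows = demo.execute(
    "SELECT city, COUNT(*) FROM orders WHERE amount > 3 "
    "GROUP BY city ORDER BY city"
).fetchall()

# HAVING removes whole groups after the counts are computed
having_rows = demo.execute(
    "SELECT city, COUNT(*) FROM orders "
    "GROUP BY city HAVING COUNT(*) > 2 ORDER BY city"
).fetchall()

# Both together: filter rows, group, then filter the groups
both_rows = demo.execute(
    "SELECT city, COUNT(*) FROM orders WHERE amount > 3 "
    "GROUP BY city HAVING COUNT(*) > 1 ORDER BY city"
).fetchall()

print(where_rows)   # [('Austin', 2), ('Boston', 1)]
print(having_rows)  # [('Austin', 3)]
print(both_rows)    # [('Austin', 2)]
demo.close()
```

Note that Boston survives the WHERE query with a smaller count but disappears entirely from the HAVING query, because HAVING judges the group as a whole.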

<b>Step 2: Try it yourself:</b> Write a query to find payment methods with an average sale price greater than $10:

In [None]:
query = """
<YOUR CODE HERE>
"""

result = run_query(query)
display(result)

#### Close the Connection
It's good practice to close the database connection when you're done.

In [6]:
# Close the database connection 
conn.close()
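If a query raises an exception before `conn.close()` runs, the connection stays open. One optional pattern, sketched here on a throwaway in-memory database, is to wrap the connection in `contextlib.closing` so it is closed even on error. Using `with sqlite3.connect(...)` on its own manages transactions (commit or rollback) but does not close the connection. The table `t` is hypothetical.

```python
import sqlite3
from contextlib import closing

# closing() guarantees demo.close() is called when the block exits
with closing(sqlite3.connect(":memory:")) as demo:
    demo.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table
    demo.execute("INSERT INTO t VALUES (1)")
    total = demo.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Using the connection after the block fails because it is already closed
try:
    demo.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(total, still_open)  # 1 False
```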

## ✅ Success Checklist
- You can group data using GROUP BY
- You can apply multiple aggregate functions in a single query
- You can filter grouped data using HAVING
- Your queries run without errors


## 🔍 Common Issues & Solutions 

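- Problem: Error or unexpected values when selecting columns alongside GROUP BY
    - Solution: Ensure all non-aggregated columns in the SELECT statement are included in the GROUP BY clause. Many databases reject the query otherwise; SQLite runs it but fills the extra column with a value from an arbitrary row in each group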

- Problem: HAVING clause not working as expected 
    - Solution: Remember that HAVING filters groups after they're formed, while WHERE filters individual rows before grouping
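A note on the first issue when working in SQLite specifically: a non-aggregated column that is missing from GROUP BY does not raise an error there. SQLite returns a value for it taken from an arbitrary row of each group, which can hide the mistake. The sketch below illustrates this on a hypothetical in-memory table (`orders`, `city`, `method`, and `amount` are illustrative names).

```python
import sqlite3

# Hypothetical toy table for illustration; not part of the BookCycle database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (city TEXT, method TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Austin", "cash", 5.0),
        ("Austin", "card", 2.5),
        ("Austin", "cash", 4.0),
        ("Boston", "card", 7.5),
        ("Boston", "card", 1.0),
    ],
)

# 'method' is neither aggregated nor grouped: SQLite still runs the query,
# but the method shown for Austin comes from an arbitrary row of that group
loose = demo.execute(
    "SELECT city, method, COUNT(*) FROM orders GROUP BY city ORDER BY city"
).fetchall()

# Listing every non-aggregated column in GROUP BY gives well-defined rows
strict = demo.execute(
    "SELECT city, method, COUNT(*) FROM orders "
    "GROUP BY city, method ORDER BY city, method"
).fetchall()

print(len(loose))  # 2
print(strict)      # [('Austin', 'card', 1), ('Austin', 'cash', 2), ('Boston', 'card', 2)]
demo.close()
```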

## ➡️ Summary
Great job completing this lab! You've learned how to group and filter data with SQL, two crucial skills for any data analyst. Keep practicing these concepts to become more proficient in SQL data analysis.

### 🔑 Key Points
- GROUP BY groups rows that share the same values in the specified columns
- Aggregate functions like SUM, AVG, COUNT can be used with GROUP BY
- HAVING is used to filter groups based on aggregate conditions