# SQL: Aggregating and Filtering

## Session 2 Overview
This session expands your SQL skills with aggregate functions and grouping. You will learn how to count records, compute averages, group rows, and filter aggregated results using `HAVING`. These tools are essential for summarizing large datasets in analytics.


## Quick Recap: `SELECT` and `WHERE`
```sql
SELECT column1, column2 FROM table_name WHERE condition;
```
- `SELECT` chooses which columns to return
- `WHERE` keeps only the rows that satisfy a condition

```python
import sqlite3
import pandas as pd

# Sample reviews, one tuple per row:
# (review_id, game_name, review_text, rating, review_date, region, platform, language)
data = [
    (1, 'Game A', 'Loved it!', 5, '2023-01-01', 'NA', 'PC', 'english'),
    (2, 'Game B', 'Buggy', 2, '2023-01-15', 'EU', 'PC', 'german'),
    (3, 'Game A', 'Amazing', 5, '2023-02-01', 'AS', 'Console', 'japanese'),
    (4, 'Game C', 'Meh', 3, '2023-03-12', 'EU', 'PC', 'french'),
    (5, 'Game A', 'Great', 4, '2023-04-05', 'NA', 'Console', 'english'),
    (6, 'Game B', 'Okay', 3, '2023-04-12', 'NA', 'PC', 'english'),
    (7, 'Game C', 'Bad', 1, '2023-05-20', 'AS', 'Console', 'korean'),
    (8, 'Game B', 'Solid', 4, '2023-06-18', 'EU', 'PC', 'german')
]

# Create in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE reviews (
        review_id INTEGER,
        game_name TEXT,
        review_text TEXT,
        rating INTEGER,
        review_date TEXT,
        region TEXT,
        platform TEXT,
        language TEXT
    );
''')
cursor.executemany('INSERT INTO reviews VALUES (?, ?, ?, ?, ?, ?, ?, ?);', data)
conn.commit()
```
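
With the table loaded, the recap pattern can be tried right away. The sketch below is self-contained so it runs on its own: it rebuilds a trimmed copy of the sample rows (only `review_id`, `game_name`, and `rating`) in a separate in-memory database. The names `demo_conn` and `high_rated` are illustrative assumptions, not part of the session setup.

```python
import sqlite3

# Trimmed copy of the sample rows: (review_id, game_name, rating).
rows = [
    (1, 'Game A', 5), (2, 'Game B', 2), (3, 'Game A', 5), (4, 'Game C', 3),
    (5, 'Game A', 4), (6, 'Game B', 3), (7, 'Game C', 1), (8, 'Game B', 4),
]

# Separate in-memory database (hypothetical name) so the session's `conn` is untouched.
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE reviews (review_id INTEGER, game_name TEXT, rating INTEGER);')
demo_conn.executemany('INSERT INTO reviews VALUES (?, ?, ?);', rows)

# SELECT picks the columns; WHERE keeps only rows with a rating of 4 or more.
high_rated = demo_conn.execute("""
    SELECT game_name, rating
    FROM reviews
    WHERE rating >= 4
    ORDER BY review_id;
""").fetchall()

print(high_rated)
# [('Game A', 5), ('Game A', 5), ('Game A', 4), ('Game B', 4)]
```

The same query string can be passed to `pd.read_sql_query` with the session's `conn` to get a DataFrame instead of a list of tuples.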

## Aggregation Functions: `COUNT`, `AVG`, `MIN`, `MAX`

Each of these functions reduces a set of rows to a single value:
- `COUNT(*)`: Number of records
- `AVG(column)`: Average value
- `MIN(column)`, `MAX(column)`: Smallest/largest values
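
Used without `GROUP BY`, these functions treat the whole table as one group and return a single row. The following is a minimal, self-contained sketch of that behavior; it assumes a throwaway table holding only the `rating` values from the sample data, and `demo_conn` and `summary` are hypothetical names.

```python
import sqlite3

# Ratings copied from the sample rows, in review_id order.
ratings = [(5,), (2,), (5,), (3,), (4,), (3,), (1,), (4,)]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE reviews (rating INTEGER);')
demo_conn.executemany('INSERT INTO reviews VALUES (?);', ratings)

# No GROUP BY: every aggregate is computed over all eight rows at once.
summary = demo_conn.execute("""
    SELECT COUNT(*)    AS review_count,
           AVG(rating) AS avg_rating,
           MIN(rating) AS min_rating,
           MAX(rating) AS max_rating
    FROM reviews;
""").fetchone()

print(summary)  # (8, 3.375, 1, 5)
```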

### Example: Average Rating by Game
This query uses `GROUP BY`, covered in the next section, to compute one average per game.
```sql
SELECT game_name, AVG(rating) AS avg_rating
FROM reviews
GROUP BY game_name;
```

```python
pd.read_sql_query("""
SELECT game_name, AVG(rating) AS avg_rating
FROM reviews
GROUP BY game_name;
""", conn)
```


## Using `GROUP BY`
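`GROUP BY` splits rows into groups that share the same value in the listed columns. Each aggregate is computed separately for every group, and the query returns one row per group.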

### Example: Count of Reviews by Region
```sql
SELECT region, COUNT(*) AS review_count
FROM reviews
GROUP BY region;
```

```python
pd.read_sql_query("""
SELECT region, COUNT(*) AS review_count
FROM reviews
GROUP BY region;
""", conn)
```

You can group by multiple columns:
```sql
SELECT region, platform, COUNT(*) AS review_count
FROM reviews
GROUP BY region, platform;
```
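
To see what a two-column grouping returns, here is a small standalone sketch. It assumes a reduced copy of the sample data containing only `region` and `platform`; `demo_conn` and `counts` are hypothetical names. An `ORDER BY` is added because SQL does not guarantee the order of grouped rows without one.

```python
import sqlite3

# (region, platform) pairs copied from the sample rows, in review_id order.
pairs = [
    ('NA', 'PC'), ('EU', 'PC'), ('AS', 'Console'), ('EU', 'PC'),
    ('NA', 'Console'), ('NA', 'PC'), ('AS', 'Console'), ('EU', 'PC'),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE reviews (region TEXT, platform TEXT);')
demo_conn.executemany('INSERT INTO reviews VALUES (?, ?);', pairs)

# One output row per distinct (region, platform) combination.
counts = demo_conn.execute("""
    SELECT region, platform, COUNT(*) AS review_count
    FROM reviews
    GROUP BY region, platform
    ORDER BY region, platform;
""").fetchall()

for row in counts:
    print(row)
# ('AS', 'Console', 2)
# ('EU', 'PC', 3)
# ('NA', 'Console', 1)
# ('NA', 'PC', 2)
```

Only combinations that actually occur in the data appear: there is no `('EU', 'Console')` row because no such review exists.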


## Filtering Aggregates: `HAVING`
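`HAVING` filters groups after aggregation, whereas `WHERE` filters individual rows before they are grouped. Use `HAVING` whenever the condition involves an aggregate such as `AVG` or `COUNT`.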

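### Example: Games with an Average Rating Above 4
```sql
SELECT game_name, AVG(rating) AS avg_rating
FROM reviews
GROUP BY game_name
HAVING AVG(rating) > 4;
```

SQLite also accepts the alias here (`HAVING avg_rating > 4`), but repeating the aggregate expression is portable to databases that do not allow aliases in `HAVING`.

```python
pd.read_sql_query("""
SELECT game_name, AVG(rating) AS avg_rating
FROM reviews
GROUP BY game_name
HAVING AVG(rating) > 4;
""", conn)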
```
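
`WHERE` and `HAVING` can appear in the same query, and the order in which they apply matters. The sketch below is one way to illustrate it, assuming a reduced copy of the sample data with only `game_name` and `rating`; `demo_conn` and `result` are hypothetical names. `WHERE` first discards reviews rated below 3, then `HAVING` keeps only games that still have at least two reviews left.

```python
import sqlite3

# (game_name, rating) pairs copied from the sample rows.
rows = [
    ('Game A', 5), ('Game B', 2), ('Game A', 5), ('Game C', 3),
    ('Game A', 4), ('Game B', 3), ('Game C', 1), ('Game B', 4),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE reviews (game_name TEXT, rating INTEGER);')
demo_conn.executemany('INSERT INTO reviews VALUES (?, ?);', rows)

result = demo_conn.execute("""
    SELECT game_name, COUNT(*) AS good_reviews
    FROM reviews
    WHERE rating >= 3          -- row filter: runs before grouping
    GROUP BY game_name
    HAVING COUNT(*) >= 2       -- group filter: runs after aggregation
    ORDER BY game_name;
""").fetchall()

print(result)  # [('Game A', 3), ('Game B', 2)]
```

Game C drops out because only one of its two reviews survives the `WHERE` step, so its group fails the `HAVING` condition.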


## Mini-Challenge (10 min)
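**Question**: Which games receive the highest average rating in each region?

### Sample Solution
This query lists every game within each region from highest to lowest average rating, so each region's top-rated game appears first in its group of rows.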
```sql
SELECT region, game_name, AVG(rating) AS avg_rating
FROM reviews
GROUP BY region, game_name
ORDER BY region, avg_rating DESC;
```

```python
pd.read_sql_query("""
SELECT region, game_name, AVG(rating) AS avg_rating
FROM reviews
GROUP BY region, game_name
ORDER BY region, avg_rating DESC;
""", conn)
```
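
If only the winners are wanted, the sorted result can be reduced to one entry per region in plain Python. This is a sketch under stated assumptions rather than the only approach: it rebuilds a trimmed copy of the sample data (`game_name`, `rating`, `region`), and `demo_conn`, `ranked`, and `top_by_region` are hypothetical names. Ties are kept, so a region can have more than one top game.

```python
import sqlite3

# (game_name, rating, region) triples copied from the sample rows.
rows = [
    ('Game A', 5, 'NA'), ('Game B', 2, 'EU'), ('Game A', 5, 'AS'), ('Game C', 3, 'EU'),
    ('Game A', 4, 'NA'), ('Game B', 3, 'NA'), ('Game C', 1, 'AS'), ('Game B', 4, 'EU'),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE reviews (game_name TEXT, rating INTEGER, region TEXT);')
demo_conn.executemany('INSERT INTO reviews VALUES (?, ?, ?);', rows)

# Same query as the sample solution, with game_name as a tie-breaker for stable output.
ranked = demo_conn.execute("""
    SELECT region, game_name, AVG(rating) AS avg_rating
    FROM reviews
    GROUP BY region, game_name
    ORDER BY region, avg_rating DESC, game_name;
""").fetchall()

top_by_region = {}
for region, game_name, avg_rating in ranked:
    winners = top_by_region.setdefault(region, [])
    # Rows arrive best-first within each region, so keep a row only if it is
    # the first for its region or ties that region's best average.
    if not winners or avg_rating == winners[0][1]:
        winners.append((game_name, avg_rating))

print(top_by_region)
# {'AS': [('Game A', 5.0)], 'EU': [('Game B', 3.0), ('Game C', 3.0)], 'NA': [('Game A', 4.5)]}
```

In this data, EU has a tie: Game B and Game C both average 3.0 there, so both are reported.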


## Exercises
1. Count the number of reviews per platform.
2. Show the minimum and maximum ratings for each game.
3. Get the average rating per language.
4. Which games have an average rating of 4 or more on PC?

```python
# Starter for exercise 1:
pd.read_sql_query("""
SELECT platform, COUNT(*) AS review_count FROM reviews GROUP BY platform;
""", conn)
```


## Summary
Today you learned how to:
- Use aggregation functions like `COUNT`, `AVG`, `MIN`, `MAX`
- Segment data with `GROUP BY`
- Filter grouped results with `HAVING`

Next session: **Sorting, Aliasing & Formatting Output**.
