## Combining Data: JOINs

In real-world datasets, information is often spread across multiple tables. Combining these tables meaningfully is essential for analysis. This session introduces the concept of JOINs in both SQL and Python (pandas), focusing on two essential types: `INNER JOIN` and `LEFT JOIN`.

## Visual Explanation of JOINs

A JOIN operation links rows from two tables based on a common key. The most commonly used types are:

- **INNER JOIN**: Returns only the rows where there's a match in both tables.
- **LEFT JOIN**: Returns all rows from the left table, plus the matching rows from the right table. Where there is no match, the columns coming from the right table are NULL (shown as `NaN` or `None` in pandas).

Several other join types exist, as shown in the image below.

<div style="text-align:center">
<img src="/Users/sergedegossondevarennes/Documents/repositories/data-science-course-UR/Course A/resources_course_A/joins_sql.png" alt="Joins" width="600">
</div>
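
To make the mechanics concrete, here is a minimal plain-Python sketch of what a join does under the hood: build a lookup on the key of the right table, then walk the left table. The rows and the `join` helper are hypothetical illustrations (not the course tables, and not how a database engine is actually implemented), and the sketch assumes the key is unique in the right table.

```python
# Toy rows (hypothetical data, not the course tables)
left_rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
right_rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 3, "city": "Rome"},
]


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; assumes `key` is unique in `right`."""
    # Index the right table by key so each left row needs only one lookup
    lookup = {row[key]: row for row in right}
    # Columns a LEFT JOIN has to fill with None when no match exists
    right_cols = [c for c in right[0] if c != key] if right else []
    result = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            # Match found: both join types keep the combined row
            result.append({**row, **match})
        elif how == "left":
            # No match: only a LEFT JOIN keeps the row, padded with None
            result.append({**row, **{c: None for c in right_cols}})
    return result


inner_result = join(left_rows, right_rows, "id", how="inner")
left_result = join(left_rows, right_rows, "id", how="left")
print(inner_result)
print(left_result)
```

The inner result drops the row with `id` 2 because it has no partner on the right, while the left result keeps it with `city` set to `None`.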


In [4]:
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE reviews (
        review_id INTEGER,
        user_id TEXT,
        game_id INTEGER,
        review_text TEXT
    )
''')
cursor.executemany('INSERT INTO reviews VALUES (?, ?, ?, ?)', [
    (1, 'A01', 101, 'Great game!'),
    (2, 'A02', 102, 'Too buggy'),
    (3, 'A03', 101, 'Loved the visuals'),
    (4, 'A04', 103, 'Needs improvement')
])

cursor.execute('''
    CREATE TABLE users (
        user_id TEXT,
        segment TEXT
    )
''')
cursor.executemany('INSERT INTO users VALUES (?, ?)', [
    ('A01', 'casual'),
    ('A02', 'hardcore'),
    ('A03', 'moderate')
])

cursor.execute('''
    CREATE TABLE games (
        game_id INTEGER,
        game_name TEXT,
        platform TEXT
    )
''')
cursor.executemany('INSERT INTO games VALUES (?, ?, ?)', [
    (101, 'Galaxy Wars', 'PC'),
    (102, 'Medieval Quest', 'Console')
])

conn.commit()


## Example 1: INNER JOIN (Reviews with User Segments)

Join reviews with user segments. Only reviews with a matching row in `users` are kept, so review 4 (user `A04`, who is absent from `users`) is dropped from the result.

In [5]:
query = """
SELECT r.*, u.segment
FROM reviews r
INNER JOIN users u ON r.user_id = u.user_id;
"""
df_inner = pd.read_sql(query, conn)
display(df_inner)

| | review_id | user_id | game_id | review_text | segment |
|---|---|---|---|---|---|
| 0 | 1 | A01 | 101 | Great game! | casual |
| 1 | 2 | A02 | 102 | Too buggy | hardcore |
| 2 | 3 | A03 | 101 | Loved the visuals | moderate |
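
The same INNER JOIN can be expressed directly in pandas with `DataFrame.merge`. The sketch below rebuilds the `reviews` and `users` tables as DataFrames so that it runs on its own; the variable name `inner_pd` is only illustrative.

```python
import pandas as pd

# Same rows as the SQLite tables created earlier, rebuilt as DataFrames
reviews = pd.DataFrame({
    "review_id": [1, 2, 3, 4],
    "user_id": ["A01", "A02", "A03", "A04"],
    "game_id": [101, 102, 101, 103],
    "review_text": ["Great game!", "Too buggy",
                    "Loved the visuals", "Needs improvement"],
})
users = pd.DataFrame({
    "user_id": ["A01", "A02", "A03"],
    "segment": ["casual", "hardcore", "moderate"],
})

# how="inner" keeps only the user_id values present in both frames
inner_pd = reviews.merge(users, on="user_id", how="inner")
print(inner_pd)
```

As in the SQL version, the review written by `A04` does not appear, because `A04` has no row in `users`.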


## Example 2: LEFT JOIN (Reviews with Game Metadata)

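Join reviews with game metadata, but keep all reviews even if some game details are missing. Review 4 refers to `game_id` 103, which has no row in `games`, so its `game_name` and `platform` come back as NULL.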


In [6]:
query = """
SELECT r.*, g.game_name, g.platform
FROM reviews r
LEFT JOIN games g ON r.game_id = g.game_id;
"""
df_left = pd.read_sql(query, conn)
display(df_left)

| | review_id | user_id | game_id | review_text | game_name | platform |
|---|---|---|---|---|---|---|
| 0 | 1 | A01 | 101 | Great game! | Galaxy Wars | PC |
| 1 | 2 | A02 | 102 | Too buggy | Medieval Quest | Console |
| 2 | 3 | A03 | 101 | Loved the visuals | Galaxy Wars | PC |
| 3 | 4 | A04 | 103 | Needs improvement | | |
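
A pandas equivalent of this LEFT JOIN is sketched below. It also passes `indicator=True`, which adds a `_merge` column recording whether each row found a match; this is one convenient way to list the unmatched reviews. The DataFrames are rebuilt here so the block runs on its own, and the names `left_pd` and `unmatched` are only illustrative.

```python
import pandas as pd

# Only the columns needed for this join, same values as the SQLite tables
reviews = pd.DataFrame({
    "review_id": [1, 2, 3, 4],
    "game_id": [101, 102, 101, 103],
    "review_text": ["Great game!", "Too buggy",
                    "Loved the visuals", "Needs improvement"],
})
games = pd.DataFrame({
    "game_id": [101, 102],
    "game_name": ["Galaxy Wars", "Medieval Quest"],
    "platform": ["PC", "Console"],
})

# how="left" keeps every review; indicator=True adds the _merge column
left_pd = reviews.merge(games, on="game_id", how="left", indicator=True)
print(left_pd)

# Rows marked "left_only" had no matching game, so their game columns are NaN
unmatched = left_pd[left_pd["_merge"] == "left_only"]
print(unmatched[["review_id", "game_id"]])
```

Counting missing values after a left join (for example with `left_pd["game_name"].isna().sum()`) is a quick check on how many keys failed to match.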


## Exercise: Combine Review Data

### On sample data

**Task:**
1. Perform a LEFT JOIN from `reviews` to `games`.
2. Perform an INNER JOIN from `reviews` to `users`.
3. Try chaining both JOINs to include user segments and game info.

**SQL Tips:**
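Use the `JOIN table_name ON left_table.column_name = right_table.column_name` syntax. Make sure you know which table provides the base rows: in a LEFT JOIN, it is the table named before the `LEFT JOIN` keyword.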

**Bonus:** Use `pandas.read_sql()` to fetch and display results.
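
As a syntax reference for chaining, the sketch below joins three deliberately different, hypothetical tables (`orders`, `customers`, `products`), so it does not give away the exercise. It assumes an in-memory SQLite database, as in the examples above; all table and column names are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical demo tables, unrelated to the course data
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id TEXT, product_id INTEGER);
    CREATE TABLE customers (customer_id TEXT, country TEXT);
    CREATE TABLE products (product_id INTEGER, product_name TEXT);
    INSERT INTO orders VALUES (1, 'C1', 10), (2, 'C2', 20), (3, 'C3', 10);
    INSERT INTO customers VALUES ('C1', 'SE'), ('C2', 'FR');
    INSERT INTO products VALUES (10, 'Keyboard');
""")

# orders provides the base rows; each JOIN adds columns from one more table
query = """
SELECT o.order_id, c.country, p.product_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
LEFT JOIN products p ON o.product_id = p.product_id
ORDER BY o.order_id;
"""
df_chain = pd.read_sql(query, conn_demo)
print(df_chain)
conn_demo.close()
```

Order 3 disappears because the INNER JOIN finds no customer `C3`, while order 2 survives the LEFT JOIN with a missing `product_name`. The order and type of the joins therefore determine which base rows remain.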

### On Snowflake data

1. The tables ```PDX_EXPERIMENTS.UR_DS_COURSE_MATERIAL.VICTORIA3_EVENTS``` and ```PDX_EXPERIMENTS.UR_DS_COURSE_MATERIAL.VICTORIA3_GAME_RULES``` contain information about players' playthroughs and behaviors. In ```PDX_EXPERIMENTS.UR_DS_COURSE_MATERIAL.VICTORIA3_EVENTS```, the game telemetry is recorded, and all the data necessary to understand the behavior and choices made by the player is known. One of the values in the column ```PAYLOAD_

## Recap

- **JOINs** help combine data across tables using a common key.
- **INNER JOIN** keeps only matching rows.
- **LEFT JOIN** keeps all rows from the left table.
- These concepts are central to both SQL and pandas-based data analysis.
