# Creating JOINs in PostgreSQL with Python

This notebook covers the fundamentals of JOINs using Python and a PostgreSQL database.

**Topics Covered**

1. Simple Left JOIN (1-1)
2. Normal Left JOIN (1-Many)
3. Complex Left JOIN (Many - Many)
4. Left JOIN with Temp Tables (Many - Many)
5. Inner JOIN
6. Outer JOIN
    
Each section includes examples and explanations. At the end, you'll find an exercise to apply the concepts learned.

In [None]:

import psycopg2
import pandas as pd
import matplotlib.pyplot as plt
import itertools

# Connecting to the PostgreSQL Database

# We'll establish a connection to the PostgreSQL database using psycopg2 library.

DB_HOST = 'test-db-sql-class.cnct5qiopjti.us-east-1.rds.amazonaws.com'
DB_PORT = 5432
DB_NAME = 'students'
DB_USER = 'student'
DB_PASSWORD = 'Password123$'
STUDENT_NAME_TABLE = 'default'  # CHANGE THIS!

try:
    conn = psycopg2.connect(host=DB_HOST, port=DB_PORT, database=DB_NAME, user=DB_USER, password=DB_PASSWORD)
    print("Connected to the database.")
except psycopg2.Error as e:
    print("Unable to connect to the database.")
    print(e)
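The cells below build table names by interpolating `STUDENT_NAME_TABLE` into SQL strings, so the suffix has to be a plain identifier. The following is a minimal sketch of one way to check that before running any SQL. The helper `table_name` and its regular expression are hypothetical and not part of the original notebook.

```python
import re

# Hypothetical helper (an assumption, not from the notebook): accept only
# letters, digits and underscores so the suffix is safe to interpolate.
_SUFFIX_RE = re.compile(r"[A-Za-z0-9_]+")


def table_name(base: str, suffix: str) -> str:
    """Return the per-student table name, e.g. drinks_alice."""
    if not _SUFFIX_RE.fullmatch(suffix):
        raise ValueError(f"invalid table suffix: {suffix!r}")
    # PostgreSQL folds unquoted identifiers to lower case, so do the same here.
    return f"{base}_{suffix}".lower()


print(table_name("drinks", "alice"))
```

A suffix such as `'my name'` or one containing a semicolon raises `ValueError` instead of producing broken or unsafe SQL.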


## 0. Setup

In [None]:
cursor = conn.cursor()

# Dropping any existing copies, then creating the tables (drinks, countries, drink_reviews, ingredients, drink_ingredients)
cursor.execute(f"""
DROP TABLE IF EXISTS drinks_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS countries_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_reviews_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS ingredients_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_ingredients_{STUDENT_NAME_TABLE} cascade;
CREATE TABLE IF NOT EXISTS drinks_{STUDENT_NAME_TABLE} (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    alcohol_content FLOAT,
    country_id INTEGER
);

CREATE TABLE IF NOT EXISTS countries_{STUDENT_NAME_TABLE} (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    famous_for VARCHAR(255)
);

CREATE TABLE IF NOT EXISTS drink_reviews_{STUDENT_NAME_TABLE} (
    id SERIAL PRIMARY KEY,
    drink_id INTEGER REFERENCES drinks_{STUDENT_NAME_TABLE}(id),
    review TEXT
);

CREATE TABLE IF NOT EXISTS ingredients_{STUDENT_NAME_TABLE} (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE IF NOT EXISTS drink_ingredients_{STUDENT_NAME_TABLE} (
    drink_id INTEGER REFERENCES drinks_{STUDENT_NAME_TABLE}(id),
    ingredient_id INTEGER REFERENCES ingredients_{STUDENT_NAME_TABLE}(id),
    PRIMARY KEY (drink_id, ingredient_id)
);
""")
conn.commit()

In [None]:
# Inserting data into 'countries'
cursor.execute(f"""
INSERT INTO countries_{STUDENT_NAME_TABLE} (name, famous_for) VALUES
('France', 'Wine'),
('Germany', 'Beer'),
('Scotland', 'Whisky')
ON CONFLICT DO NOTHING;
""")

# Inserting data into 'drinks'
cursor.execute(f"""
INSERT INTO drinks_{STUDENT_NAME_TABLE} (name, alcohol_content, country_id) VALUES
('Bordeaux', 12.5, 1),
('Berlin Beer', 5.5, 2),
('Scotch Whisky', 40.0, 3)
ON CONFLICT DO NOTHING;
""")

# Inserting data into 'drink_reviews'
cursor.execute(f"""
INSERT INTO drink_reviews_{STUDENT_NAME_TABLE} (drink_id, review) VALUES
(1, 'Excellent taste'),
(1, 'Too dry for my liking'),
(2, 'Perfect bitterness'),
(3, 'Smooth and strong')
ON CONFLICT DO NOTHING;
""")

# Inserting data into 'ingredients' and 'drink_ingredients'
cursor.execute(f"""
INSERT INTO ingredients_{STUDENT_NAME_TABLE} (name) VALUES
('Grapes'),
('Barley'),
('Water'),
('Yeast')
ON CONFLICT DO NOTHING;
""")

cursor.execute(f"""
INSERT INTO drink_ingredients_{STUDENT_NAME_TABLE} (drink_id, ingredient_id) VALUES
(1, 1),
(2, 2), (2, 3), (2, 4),
(3, 1), (3, 3)
ON CONFLICT DO NOTHING;
""")
conn.commit()

## 1. Simple Left JOIN (1-1)

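Fetching each drink's name with the name of its country. Each drink references at most one country through `country_id`, so the join returns exactly one row per drink: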

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, countries_{STUDENT_NAME_TABLE}.name AS country
FROM drinks_{STUDENT_NAME_TABLE}
LEFT JOIN countries_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.country_id = countries_{STUDENT_NAME_TABLE}.id;
"""
df = pd.read_sql_query(query, conn)
df


Including what each country is famous for (`famous_for`):

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, countries_{STUDENT_NAME_TABLE}.name AS country, countries_{STUDENT_NAME_TABLE}.famous_for
FROM drinks_{STUDENT_NAME_TABLE}
LEFT JOIN countries_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.country_id = countries_{STUDENT_NAME_TABLE}.id;
"""
df = pd.read_sql_query(query, conn)
df
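Every drink in the sample data has a matching country, so the queries above do not show what makes a LEFT JOIN different. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption, chosen so it runs without the class server) and adds a hypothetical drink, `Mystery Punch`, that has no country.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for portability).
demo = sqlite3.connect(":memory:")
# 'Mystery Punch' is a hypothetical drink with no country_id.
demo.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drinks (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
INSERT INTO countries (id, name) VALUES (1, 'France'), (2, 'Germany');
INSERT INTO drinks (id, name, country_id) VALUES
    (1, 'Bordeaux', 1),
    (2, 'Berlin Beer', 2),
    (3, 'Mystery Punch', NULL);
""")

# LEFT JOIN keeps every row of the left table (drinks), matched or not.
left_rows = demo.execute("""
SELECT d.name, c.name AS country
FROM drinks d
LEFT JOIN countries c ON d.country_id = c.id
ORDER BY d.id;
""").fetchall()
demo.close()

print(left_rows)
# [('Bordeaux', 'France'), ('Berlin Beer', 'Germany'), ('Mystery Punch', None)]
```

The unmatched drink is kept, and its country column comes back as `None` (SQL `NULL`).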

### Mini Exercise: LEFT JOIN (1 to 1 relationship with `drinks` and `drink_details`)

**Objective**: Create a `drink_details` table that has a 1 to 1 relationship with `drinks`, then retrieve all drinks together with their corresponding details.

## 2. Normal Left JOIN (1-Many)

Fetching drink names with their reviews, one review per row (a drink with several reviews appears several times):

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, drink_reviews_{STUDENT_NAME_TABLE}.review
FROM drinks_{STUDENT_NAME_TABLE}
LEFT JOIN drink_reviews_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.id = drink_reviews_{STUDENT_NAME_TABLE}.drink_id;
"""
df = pd.read_sql_query(query, conn)
df


Counting the reviews for each drink:

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, COUNT(drink_reviews_{STUDENT_NAME_TABLE}.id) AS review_count
FROM drinks_{STUDENT_NAME_TABLE}
LEFT JOIN drink_reviews_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.id = drink_reviews_{STUDENT_NAME_TABLE}.drink_id
GROUP BY drinks_{STUDENT_NAME_TABLE}.name;
"""
df = pd.read_sql_query(query, conn)
df
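The query above counts `drink_reviews.id` rather than `*`, and the choice matters with a LEFT JOIN. The sketch below illustrates the difference on an in-memory SQLite database (a stand-in for PostgreSQL, used here as an assumption so the example is self-contained). The column alias `joined_rows` is made up for the illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# 'Berlin Beer' deliberately has no reviews in this toy data.
demo.executescript("""
CREATE TABLE drinks (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drink_reviews (id INTEGER PRIMARY KEY, drink_id INTEGER, review TEXT);
INSERT INTO drinks (id, name) VALUES (1, 'Bordeaux'), (2, 'Berlin Beer');
INSERT INTO drink_reviews (drink_id, review) VALUES
    (1, 'Excellent taste'),
    (1, 'Too dry for my liking');
""")

# COUNT(r.id) skips NULLs; COUNT(*) counts the NULL-padded row as well.
counts = demo.execute("""
SELECT d.name,
       COUNT(r.id) AS review_count,
       COUNT(*)    AS joined_rows
FROM drinks d
LEFT JOIN drink_reviews r ON d.id = r.drink_id
GROUP BY d.id, d.name
ORDER BY d.id;
""").fetchall()
demo.close()

print(counts)
# [('Bordeaux', 2, 2), ('Berlin Beer', 0, 1)]
```

A drink without reviews gets `review_count` 0, while `COUNT(*)` would misleadingly report 1.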


### Mini Exercise: LEFT JOIN (1 to Many relationship with `drinks` and `drink_ingredients`)

**Objective**: Show all drinks along with all their ingredients, representing a 1 to Many relationship.


## 3. Complex Left JOIN (Many - Many)

Listing drinks with their ingredients, one ingredient per row:

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, ingredients_{STUDENT_NAME_TABLE}.name AS ingredient
FROM drinks_{STUDENT_NAME_TABLE}
LEFT JOIN drink_ingredients_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.id = drink_ingredients_{STUDENT_NAME_TABLE}.drink_id
LEFT JOIN ingredients_{STUDENT_NAME_TABLE} ON drink_ingredients_{STUDENT_NAME_TABLE}.ingredient_id = ingredients_{STUDENT_NAME_TABLE}.id;
"""
df = pd.read_sql_query(query, conn)
df

Counting the ingredients of each drink:

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, COUNT(drink_ingredients_{STUDENT_NAME_TABLE}.ingredient_id) AS ingredient_count
FROM drinks_{STUDENT_NAME_TABLE}
LEFT JOIN drink_ingredients_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.id = drink_ingredients_{STUDENT_NAME_TABLE}.drink_id
GROUP BY drinks_{STUDENT_NAME_TABLE}.name;
"""
df = pd.read_sql_query(query, conn)
df
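The row-per-ingredient result can also be regrouped on the Python side. The sketch below uses `itertools.groupby` on a hard-coded list of rows that mirrors the setup data. In the notebook the rows could instead come from `df.itertuples(index=False, name=None)`. The variable names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# (drink, ingredient) rows shaped like the many-to-many query result.
rows = [
    ("Bordeaux", "Grapes"),
    ("Berlin Beer", "Barley"),
    ("Berlin Beer", "Water"),
    ("Berlin Beer", "Yeast"),
    ("Scotch Whisky", "Grapes"),
    ("Scotch Whisky", "Water"),
]

# groupby only merges adjacent rows, so sort by drink name first.
rows.sort(key=itemgetter(0))

ingredients_by_drink = {
    drink: [ingredient for _, ingredient in group]
    for drink, group in groupby(rows, key=itemgetter(0))
}

for drink, ingredients in ingredients_by_drink.items():
    print(drink, "->", ", ".join(ingredients))
```

Each key holds the full ingredient list for one drink, and `len()` of each list matches the `ingredient_count` column for the same data.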

### Mini Exercise: LEFT JOIN (Many to Many relationship with `drinks` and `ingredients` through `drink_ingredients`)

**Objective**: Demonstrate a LEFT JOIN in a Many to Many relationship, showing all drinks and their ingredients, including drinks that have no ingredients.


## 4. Left JOIN with Temp Tables (Many - Many)

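Finding the most common ingredients across all drinks. The examples in this section use common table expressions (`WITH` clauses) as temporary, query-scoped result sets rather than tables created with `CREATE TEMP TABLE`: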

In [None]:
query = f"""
WITH drink_ingredient_count AS (
    SELECT ingredient_id, COUNT(*) AS count
    FROM drink_ingredients_{STUDENT_NAME_TABLE}
    GROUP BY ingredient_id
)
SELECT ingredients_{STUDENT_NAME_TABLE}.name, drink_ingredient_count.count
FROM ingredients_{STUDENT_NAME_TABLE}
LEFT JOIN drink_ingredient_count ON ingredients_{STUDENT_NAME_TABLE}.id = drink_ingredient_count.ingredient_id
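-- PostgreSQL sorts NULLs first under DESC; NULLS LAST keeps unused ingredients at the bottom
ORDER BY drink_ingredient_count.count DESC NULLS LAST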
LIMIT 5;
"""
df = pd.read_sql_query(query, conn)
df
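If an ingredient were used by no drink, the LEFT JOIN against the CTE would leave its count as `NULL`. One possible way to report such ingredients with a count of 0 is `COALESCE`, sketched below on an in-memory SQLite database (a stand-in for PostgreSQL, assumed here for portability). The ingredient `Hops` and the names `ingredient_usage` and `use_count` are hypothetical.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# 'Hops' is a hypothetical ingredient that no drink uses.
demo.executescript("""
CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drink_ingredients (drink_id INTEGER, ingredient_id INTEGER);
INSERT INTO ingredients (id, name) VALUES (1, 'Grapes'), (2, 'Water'), (3, 'Hops');
INSERT INTO drink_ingredients (drink_id, ingredient_id) VALUES (1, 1), (2, 2), (3, 2);
""")

# The CTE counts uses per ingredient; COALESCE turns a missing count into 0.
usage = demo.execute("""
WITH ingredient_usage AS (
    SELECT ingredient_id, COUNT(*) AS n
    FROM drink_ingredients
    GROUP BY ingredient_id
)
SELECT i.name, COALESCE(u.n, 0) AS use_count
FROM ingredients i
LEFT JOIN ingredient_usage u ON i.id = u.ingredient_id
ORDER BY use_count DESC, i.name;
""").fetchall()
demo.close()

print(usage)
# [('Water', 2), ('Grapes', 1), ('Hops', 0)]
```

Because the sort key is never `NULL` after `COALESCE`, the ordering no longer depends on how the database places NULLs.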


Calculating the average number of ingredients per drink category:

In [None]:
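conn.rollback()  # clear any aborted transaction left over from an earlier cell
# Add a 'category' column to the drinks table (IF NOT EXISTS keeps this cell safe to re-run)
cursor.execute(f"""
ALTER TABLE drinks_{STUDENT_NAME_TABLE}
ADD COLUMN IF NOT EXISTS category VARCHAR(255);
""")
conn.commit()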
cursor.execute(f"""
UPDATE drinks_{STUDENT_NAME_TABLE}
SET category = CASE
    WHEN alcohol_content > 20 THEN 'Whisky'
    WHEN alcohol_content BETWEEN 10 AND 20 THEN 'Wine'
    ELSE 'Beer'
END;
""")
conn.commit()


In [None]:
query = f"""
WITH avg_ingredients AS (
    SELECT sub.category, AVG(sub.ingredient_count) AS avg_count
    FROM (
        SELECT d.category, COUNT(di.ingredient_id) AS ingredient_count
        FROM drinks_{STUDENT_NAME_TABLE} d
        LEFT JOIN drink_ingredients_{STUDENT_NAME_TABLE} di ON d.id = di.drink_id
        GROUP BY d.id, d.category
    ) AS sub
    GROUP BY sub.category
)
SELECT category, avg_count
FROM avg_ingredients;
"""

df = pd.read_sql_query(query, conn)
df

### Mini Exercise: LEFT JOIN (Many to Many relationship with a Temp table)

**Objective**: Utilize a temporary table in a LEFT JOIN operation in a Many to Many relationship scenario, using `drinks`, `ingredients`, and a temporary junction table.


## 5. Inner JOIN

Fetching only the drinks that have at least one ingredient recorded in the ingredients table (drinks without a match are dropped):

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.name, ingredients_{STUDENT_NAME_TABLE}.name AS ingredient
FROM drinks_{STUDENT_NAME_TABLE}
JOIN drink_ingredients_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.id = drink_ingredients_{STUDENT_NAME_TABLE}.drink_id
JOIN ingredients_{STUDENT_NAME_TABLE} ON drink_ingredients_{STUDENT_NAME_TABLE}.ingredient_id = ingredients_{STUDENT_NAME_TABLE}.id;
"""
df = pd.read_sql_query(query, conn)
df
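With the sample data, the INNER JOIN above returns the same rows as the LEFT JOIN in section 3, because every drink has ingredients. The sketch below makes the difference visible with a hypothetical drink, `Plain Soda`, that has none. It runs on an in-memory SQLite database (a stand-in for PostgreSQL, assumed so the example is self-contained) and swaps only the join keyword.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# 'Plain Soda' is a hypothetical drink with no rows in the junction table.
demo.executescript("""
CREATE TABLE drinks (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drink_ingredients (drink_id INTEGER, ingredient_id INTEGER);
INSERT INTO drinks (id, name) VALUES (1, 'Bordeaux'), (2, 'Plain Soda');
INSERT INTO ingredients (id, name) VALUES (1, 'Grapes');
INSERT INTO drink_ingredients (drink_id, ingredient_id) VALUES (1, 1);
""")

# Same query twice; only the join keyword changes.
template = """
SELECT d.name, i.name AS ingredient
FROM drinks d
{join} drink_ingredients di ON d.id = di.drink_id
{join} ingredients i ON di.ingredient_id = i.id
ORDER BY d.id;
"""
inner_rows = demo.execute(template.format(join="JOIN")).fetchall()
left_rows = demo.execute(template.format(join="LEFT JOIN")).fetchall()
demo.close()

print(inner_rows)  # [('Bordeaux', 'Grapes')]
print(left_rows)   # [('Bordeaux', 'Grapes'), ('Plain Soda', None)]
```

The INNER JOIN silently drops the drink without ingredients, while the LEFT JOIN keeps it with a `None` ingredient.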

### Mini Exercise: INNER JOIN (Find common ingredients in `drinks` and `ingredients`)

**Objective**: Identify common ingredients used in drinks, represented in both `drinks` and `ingredients` tables.


## 6. Outer JOIN

Fetching all drinks and countries, whether matched or not:

In [None]:
query = f"""
SELECT drinks_{STUDENT_NAME_TABLE}.*, countries_{STUDENT_NAME_TABLE}.*
FROM drinks_{STUDENT_NAME_TABLE}
FULL OUTER JOIN countries_{STUDENT_NAME_TABLE} ON drinks_{STUDENT_NAME_TABLE}.country_id = countries_{STUDENT_NAME_TABLE}.id
"""
df = pd.read_sql_query(query, conn)
df
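In the sample data every drink matches a country and vice versa, so the FULL OUTER JOIN above shows no unmatched rows. The sketch below illustrates the semantics with a hypothetical drink that has no country and a hypothetical country that has no drink. It uses an in-memory SQLite database as a stand-in for PostgreSQL. Because older SQLite builds lack `FULL OUTER JOIN`, the sketch emulates it with a LEFT JOIN plus the unmatched rows from the other side, which is a substitute technique and not what the notebook's query does.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# 'Japan' has no drink and 'Mystery Punch' has no country (both hypothetical).
demo.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drinks (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
INSERT INTO countries (id, name) VALUES (1, 'France'), (2, 'Japan');
INSERT INTO drinks (id, name, country_id) VALUES
    (1, 'Bordeaux', 1),
    (2, 'Mystery Punch', NULL);
""")

# Part 1: all drinks with their country, if any.
# Part 2: countries that matched no drink, padded with NULL on the drink side.
full_rows = demo.execute("""
SELECT d.name AS drink, c.name AS country
FROM drinks d
LEFT JOIN countries c ON d.country_id = c.id
UNION ALL
SELECT NULL AS drink, c.name AS country
FROM countries c
LEFT JOIN drinks d ON d.country_id = c.id
WHERE d.id IS NULL;
""").fetchall()
demo.close()

for row in full_rows:
    print(row)
```

The result contains three rows: the matched pair, the drink without a country, and the country without a drink. A PostgreSQL `FULL OUTER JOIN` on the same data would return the same three combinations.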

## Final Integrating Exercise

**Objective**: This exercise is designed to test your ability to create and manipulate tables using JOIN operations in PostgreSQL, along with integrating the results with Python for analysis. You will work with a fictional scenario involving drinks, their origins, ratings, and flavors.

**Setup:**

1. Create a new table drink_origin with columns id, drink_id (foreign key to drinks table), and origin. Establish a 1-1 relationship with the drinks table.
1. Create another table drink_flavors with columns id, drink_id (foreign key to drinks table), and flavor. This table will have a 1-many relationship with the drinks table.
1. Populate these tables with relevant dummy data.

**Tasks:**

1. INNER JOIN: Write a query using INNER JOIN to find all drinks that have a specific flavor (e.g., "Fruity"). Load the results into a Pandas DataFrame.
1. LEFT JOIN: Fetch all drinks, along with their origin and flavors, using LEFT JOIN. Include drinks that might not have an associated flavor or origin. Load the results into a Pandas DataFrame.
1. FULL OUTER JOIN: Perform a FULL OUTER JOIN on the drinks table and drink_flavors to show all combinations, including drinks without flavors and flavors not associated with any drink. Load the results into a Pandas DataFrame.
1. Complex LEFT JOIN: Calculate the average number of flavors per country of origin. This will involve a LEFT JOIN between drinks, drink_origin, and drink_flavors, along with aggregation (COUNT, AVG) of the number of flavors. Load the results into a Pandas DataFrame.
1. Data Visualization: Utilize Pandas and a plotting library (like matplotlib) to visualize the distribution of flavors across different drink origins.

## Teardown

In [None]:
# Dropping every table created in this notebook, including the tables from the exercises
cursor.execute(f"""
DROP TABLE IF EXISTS drinks_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS countries_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_reviews_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS ingredients_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_ingredients_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_details_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_origin_{STUDENT_NAME_TABLE} cascade;
DROP TABLE IF EXISTS drink_flavors_{STUDENT_NAME_TABLE} cascade;

""")
conn.commit()
cursor.close()
conn.close()