
# Creating Simple Queries to PostgreSQL in Python

This notebook covers the fundamentals of using Python to interact with a PostgreSQL database. We'll explore how to run queries, handle data, and ensure secure database interactions.

## Topics Covered

0. Uploading a CSV file as a new table in the database.
1. Basic SQL queries in Python: creating a table, inserting rows, and selecting them.
2. Iterating over data using a `SELECT *` statement.
3. Basic `GROUP BY` queries.
4. Using templating to prevent SQL injection.
5. Selecting specific columns and rows, including limiting and offsetting results.
6. Implementing `ORDER BY` in queries.
7. Adding pagination to results using itertools.
8. Loading data into a Pandas DataFrame for analysis and charting.

Each section includes examples and explanations. At the end, you'll find an exercise to apply the concepts learned.


In [None]:
!pip install psycopg2

In [None]:

import psycopg2
import pandas as pd
import matplotlib.pyplot as plt
import itertools

# Connecting to the PostgreSQL Database

# We'll establish a connection to the PostgreSQL database using the psycopg2 library.

DB_HOST = 'test-db-sql-class.cnct5qiopjti.us-east-1.rds.amazonaws.com'
DB_PORT = 5432
DB_NAME = 'students'
DB_USER = 'student'
DB_PASSWORD = 'Password123$'
STUDENT_NAME_TABLE = 'default'  # CHANGE THIS!

try:
    conn = psycopg2.connect(host=DB_HOST, port=DB_PORT, database=DB_NAME, user=DB_USER, password=DB_PASSWORD)
    print("Connected to the database.")
except psycopg2.Error as e:
    print("Unable to connect to the database.")
    print(e)
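
Hard-coding credentials is convenient for a class database, but outside the classroom they are usually read from the environment. The following is a minimal sketch of that idea, not part of the original notebook: `load_db_settings` is a hypothetical helper, the `PG*` variable names follow the libpq convention, and the fallback values (`localhost`, an empty password) are assumptions.

```python
import os


def load_db_settings(env=os.environ):
    """Collect connection settings, preferring environment variables.

    Hypothetical helper: the fallback values below are assumptions.
    """
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "database": env.get("PGDATABASE", "students"),
        "user": env.get("PGUSER", "student"),
        "password": env.get("PGPASSWORD", ""),
    }


# A plain dict stands in for the real environment so the example is repeatable.
settings = load_db_settings({"PGHOST": "db.example.com", "PGPORT": "5433"})
print(settings["host"], settings["port"])
```

The resulting dictionary uses the same keyword names as the `psycopg2.connect(...)` call above, so it could be passed as `psycopg2.connect(**settings)`.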


## 0. Load the Database and Table

In [None]:
%%writefile get_data.sh

mkdir -p data
if [ ! -f ./data/drinks.csv ]; then
    wget -O data/drinks.csv "https://www.dropbox.com/scl/fi/tkfdy0mq30g2t424hmn5o/drinks.csv?rlkey=jl8r4aw1o7y7b5au8icub20pn&dl=0"
fi

In [None]:
!bash get_data.sh

In [None]:
# Creating a new table with specified data types
cursor = conn.cursor()
cursor.execute(f"""
CREATE TABLE IF NOT EXISTS drinks_{STUDENT_NAME_TABLE} (
    id SERIAL PRIMARY KEY,
    country VARCHAR(255),
    beer_servings FLOAT,
    spirit_servings FLOAT,
    wine_servings FLOAT,
    total_litres_of_pure_alcohol FLOAT,
    continent VARCHAR(255)
);
""")
# Inserting data from the CSV file
with open('data/drinks.csv', 'r') as file:
    next(file)  # Skip the header
    cursor.copy_from(file, f'drinks_{STUDENT_NAME_TABLE}', sep=',', columns=(
        'country', 'beer_servings', 'spirit_servings', 'wine_servings',
        'total_litres_of_pure_alcohol', 'continent'
    ))

conn.commit()
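
One caveat about the cell above: `copy_from` with `sep=','` splits each line on every comma and does not interpret CSV quoting, so a value such as `"Sample, Republic of"` would be read as two fields. On the PostgreSQL side, psycopg2's `copy_expert` with a `COPY ... WITH (FORMAT csv)` statement is one option. The sketch below shows the underlying idea with the standard library only: Python's `csv` module parses the quoted field, and the rows are inserted with placeholders. It uses an in-memory SQLite database as a stand-in, and the table name, the `demo_` names, and the sample rows are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample data; the second country name contains a comma inside quotes.
sample_csv = io.StringIO(
    "country,beer_servings,continent\n"
    "Exampleland,10,EU\n"
    '"Sample, Republic of",20,AS\n'
)

demo_conn = sqlite3.connect(":memory:")  # local stand-in, not the class database
demo_conn.execute(
    "CREATE TABLE drinks_demo (country TEXT, beer_servings REAL, continent TEXT)"
)

reader = csv.reader(sample_csv)
next(reader)  # skip the header row
# csv.reader yields one list per row, with quoted commas kept inside the field
demo_conn.executemany(
    "INSERT INTO drinks_demo (country, beer_servings, continent) VALUES (?, ?, ?)",
    reader,
)
demo_conn.commit()

rows = demo_conn.execute(
    "SELECT country, beer_servings FROM drinks_demo"
).fetchall()
print(rows)
```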


## 1. Basic SQL Queries
In this section, we'll run basic SQL operations from Python: creating a table, inserting rows, and selecting them back. The cells below use the PostgreSQL connection opened above.
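
The same three steps can also be tried without a server by using Python's built-in `sqlite3` module as a local stand-in. The following is a minimal sketch under that assumption; the in-memory database, the `demo_` names, and the `students_demo` table are hypothetical and not part of the class database. Two differences from the PostgreSQL cells are worth noting: `sqlite3` uses `?` placeholders where psycopg2 uses `%s`, and `INTEGER PRIMARY KEY` plays the role of `SERIAL PRIMARY KEY`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")  # throwaway in-memory database
demo_cur = demo_conn.cursor()

demo_cur.execute("""
CREATE TABLE IF NOT EXISTS students_demo (
    id INTEGER PRIMARY KEY,  -- SQLite assigns ids automatically
    name TEXT,
    age INTEGER
);
""")

# sqlite3 uses ? placeholders where psycopg2 uses %s
demo_cur.executemany(
    "INSERT INTO students_demo (name, age) VALUES (?, ?);",
    [("Alice", 21), ("Bob", 22)],
)
demo_conn.commit()

demo_cur.execute("SELECT id, name, age FROM students_demo ORDER BY id;")
students = demo_cur.fetchall()
print(students)  # [(1, 'Alice', 21), (2, 'Bob', 22)]
```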


In [None]:

# Example 1: Creating a table and inserting data
create_table_query = f'''
CREATE TABLE IF NOT EXISTS students_{STUDENT_NAME_TABLE} (
    id SERIAL PRIMARY KEY,
    name TEXT,
    age INTEGER
);
'''

# Execute the query
cur = conn.cursor()
cur.execute(create_table_query)

# Inserting data
insert_query = f"INSERT INTO students_{STUDENT_NAME_TABLE} (name, age) VALUES (%s, %s);"
cur.execute(insert_query, ('Alice', 21))
cur.execute(insert_query, ('Bob', 22))
conn.commit()


In [None]:

# Example 2: Selecting data from the table
select_query = f"SELECT * FROM students_{STUDENT_NAME_TABLE};"
pd.read_sql_query(select_query, conn)


### Mini Exercise - Connection and Basic Queries
Connect to the database and perform a simple query to count the number of rows in a specified table (e.g., 'students').

More complex example:

In [None]:
cursor.execute(f"""
SELECT country, continent, total_litres_of_pure_alcohol
FROM drinks_{STUDENT_NAME_TABLE}
ORDER BY total_litres_of_pure_alcohol DESC
LIMIT 3;
""")
# Each row is a country, so the names below say so
top_countries = cursor.fetchall()
for country_row in top_countries:
    print(country_row)

## 2. Iterating Data from a Table

In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE};")
for row in cursor:
    print(row)


In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} WHERE beer_servings > 100;")
for row in cursor:
    print(row)

In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} WHERE beer_servings > 100;")
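# cursor.description has one entry per result column; item 0 is the column name
pd.DataFrame(cursor.fetchall(), columns=[col[0] for col in cursor.description])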

### Mini Exercise - Data Manipulation with Pandas
Load the results of a query (e.g., select * from students) into a Pandas DataFrame and display the first 5 rows.

## 3. Basic GROUP BY Queries

In [None]:
cursor.execute(f"SELECT continent, COUNT(*) FROM drinks_{STUDENT_NAME_TABLE} GROUP BY continent;")
count_per_category = cursor.fetchall()


In [None]:
for continent, count in count_per_category:
    print(f"Continent: {continent}, Count: {count}")

In [None]:
cursor.execute(f"""
SELECT continent, AVG(spirit_servings) as average_spirit
FROM drinks_{STUDENT_NAME_TABLE}
GROUP BY continent;
""")
avg_alcohol_by_continent = cursor.fetchall()
pd.DataFrame(avg_alcohol_by_continent, columns=['Continent', 'Average Spirit Servings'])
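
To make clear what `GROUP BY` computes, the sketch below runs a count and an average per continent and then repeats the same aggregation by hand in Python. It is an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, and the `drinks_demo` table and its three rows are made up.

```python
import sqlite3
from collections import defaultdict

# Made-up rows for illustration: (country, continent, spirit_servings)
sample_rows = [
    ("A-land", "EU", 100.0),
    ("B-land", "EU", 200.0),
    ("C-land", "AS", 50.0),
]

demo_conn = sqlite3.connect(":memory:")  # local stand-in, not the class database
demo_conn.execute(
    "CREATE TABLE drinks_demo (country TEXT, continent TEXT, spirit_servings REAL)"
)
demo_conn.executemany("INSERT INTO drinks_demo VALUES (?, ?, ?)", sample_rows)

# One output row per distinct continent
sql_result = demo_conn.execute("""
    SELECT continent, COUNT(*), AVG(spirit_servings)
    FROM drinks_demo
    GROUP BY continent
    ORDER BY continent;
""").fetchall()

# The same aggregation by hand: bucket the values, then summarize each bucket
groups = defaultdict(list)
for _country, continent, spirit in sample_rows:
    groups[continent].append(spirit)
manual_result = [
    (continent, len(values), sum(values) / len(values))
    for continent, values in sorted(groups.items())
]

print(sql_result)     # [('AS', 1, 50.0), ('EU', 2, 150.0)]
print(manual_result)  # [('AS', 1, 50.0), ('EU', 2, 150.0)]
```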

### Mini Exercise - Advanced Querying Techniques
Write a query to find the names of students who have a certain attribute (e.g., age greater than 20) and display the results.

## 4. Preventing SQL Injection using Templating

In [None]:
category_input = "EU"
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} WHERE continent = %s;", (category_input,))
europe = cursor.fetchall()
europe

In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} WHERE country LIKE %s AND beer_servings > %s;", ('A%', 50))
strong_beers = cursor.fetchall()
strong_beers
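
To see why the placeholders matter, the sketch below runs the same filter twice with a hostile input: once pasted into the SQL text, and once passed as a parameter. It assumes an in-memory SQLite database as a stand-in (so the placeholder is `?` rather than `%s`), and the table and rows are made up. Note that placeholders only cover values; the table name in this notebook is still inserted with an f-string, which is acceptable here only because `STUDENT_NAME_TABLE` is set by the person running the notebook. For untrusted identifiers, psycopg2 provides `psycopg2.sql.Identifier`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")  # local stand-in, not the class database
demo_conn.execute("CREATE TABLE drinks_demo (country TEXT, continent TEXT)")
demo_conn.executemany(
    "INSERT INTO drinks_demo VALUES (?, ?)",
    [("A-land", "EU"), ("B-land", "AS")],  # made-up rows
)

malicious_input = "EU' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so its quote ends the string
# literal and the trailing OR clause matches every row.
unsafe_sql = f"SELECT country FROM drinks_demo WHERE continent = '{malicious_input}'"
unsafe_rows = demo_conn.execute(unsafe_sql).fetchall()

# Safe: the value is bound as data and is only ever compared as a plain string.
safe_rows = demo_conn.execute(
    "SELECT country FROM drinks_demo WHERE continent = ?", (malicious_input,)
).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
```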

## 5. SELECT Queries with Limiting and Skipping Rows

In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} LIMIT 5;")
# Without ORDER BY, which five rows come back is not guaranteed
first_five_rows = cursor.fetchall()
first_five_rows


In [None]:
cursor.execute(f"""
SELECT * FROM drinks_{STUDENT_NAME_TABLE}
WHERE beer_servings BETWEEN 50 AND 100
ORDER BY beer_servings DESC
LIMIT 10;
""")
moderate_alcohol_drinks = cursor.fetchall()
moderate_alcohol_drinks
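
Skipping rows is done with `OFFSET`, which pairs with `LIMIT` to select one "page" of an ordered result. The sketch below assumes an in-memory SQLite database with seven made-up rows; `page_size` and `page_number` are hypothetical names. With psycopg2 the same query would use `%s` placeholders in place of `?`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")  # local stand-in, not the class database
demo_conn.execute("CREATE TABLE drinks_demo (country TEXT, beer_servings REAL)")
demo_conn.executemany(
    "INSERT INTO drinks_demo VALUES (?, ?)",
    [(f"country_{i}", float(i * 10)) for i in range(1, 8)],  # 7 made-up rows
)

page_size = 3
page_number = 1  # zero-based, so this is the second page

# ORDER BY makes the paging stable; OFFSET skips the rows of earlier pages
second_page = demo_conn.execute(
    """
    SELECT country, beer_servings
    FROM drinks_demo
    ORDER BY beer_servings DESC
    LIMIT ? OFFSET ?;
    """,
    (page_size, page_number * page_size),
).fetchall()

print(second_page)
# [('country_4', 40.0), ('country_3', 30.0), ('country_2', 20.0)]
```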

## 6. ORDER BY queries

In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} ORDER BY continent;")
drinks_ordered_by_continent = cursor.fetchall()
drinks_ordered_by_continent

In [None]:
cursor.execute(f"""
SELECT * FROM drinks_{STUDENT_NAME_TABLE}
ORDER BY spirit_servings DESC, country;
""")
sorted_drinks = cursor.fetchall()
sorted_drinks

## 7. Implementing Pagination with itertools

In [None]:
from itertools import islice

def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE};")
all_drinks = cursor.fetchall()

# Pagination logic
pages = batched(all_drinks, 5)
for page, data in enumerate(pages):
    print(f'page: {page}')
    print(data)
    print('-'*20)


In [None]:
cursor.execute(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE} WHERE spirit_servings > 10;")
spirit_drinks = cursor.fetchall()

# Pagination setup
pages = batched(spirit_drinks, 5)
for page, batch in enumerate(pages):
    print(f'page: {page}')
    print(batch)
    print('-'*20)
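
The cells above call `fetchall()` first and then split the list into pages. An alternative is to ask the cursor for one page at a time with `fetchmany`, which is part of the Python DB-API. The sketch below assumes an in-memory SQLite database with made-up rows, and `fetch_pages` is a hypothetical helper. One caveat: with psycopg2's default cursor the full result is still transferred to the client when `execute` runs, so this changes how rows are consumed rather than how many are sent.

```python
import sqlite3


def fetch_pages(cursor, page_size):
    """Yield lists of rows from an already executed cursor, page_size at a time."""
    while True:
        page = cursor.fetchmany(page_size)
        if not page:  # an empty list means the result set is exhausted
            return
        yield page


demo_conn = sqlite3.connect(":memory:")  # local stand-in, not the class database
demo_conn.execute("CREATE TABLE drinks_demo (country TEXT)")
demo_conn.executemany(
    "INSERT INTO drinks_demo VALUES (?)",
    [(f"country_{i}",) for i in range(7)],  # 7 made-up rows
)

demo_cur = demo_conn.execute("SELECT country FROM drinks_demo ORDER BY country;")
page_sizes = [len(page) for page in fetch_pages(demo_cur, 3)]
print(page_sizes)  # [3, 3, 1]
```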


### Mini Exercise - Using itertools for Data Processing
Use itertools to group a list of tuples from a query based on a specific attribute (e.g., student's department).

## 8. Loading Data into a Pandas DataFrame

In [None]:
df = pd.read_sql(f"SELECT * FROM drinks_{STUDENT_NAME_TABLE};", conn)
print(df.head())


In [None]:
# Select the numeric column before averaging; the text column 'country' cannot be averaged
df.groupby('continent')['total_litres_of_pure_alcohol'].mean().plot(kind='bar')

### Mini Exercise - Visualization and Analysis
Create a simple plot (e.g., bar chart) using matplotlib to visualize the count of students in each department.

### Mini Exercise - Database Management
Write a script to add a new column to an existing table (e.g., adding a 'gender' column to the 'students' table).


## Final Exercise
Using the concepts learned in this notebook, perform the following tasks:
- Create a new table 'courses' with columns 'course_id', 'course_name', and 'student_id'.
- Insert sample data into the 'courses' table.
- Write a query to select all students who are taking more than one course.
- Use a GROUP BY clause to find the average age of students in each course.
- Implement pagination to display results in batches of 5.
- Finally, load the results into a Pandas DataFrame and create a simple plot.



# Teardown

In [None]:
# Close the earlier connection first: its open transaction can block the DROP statements
conn.close()
conn = psycopg2.connect(host=DB_HOST, port=DB_PORT, database=DB_NAME, user=DB_USER, password=DB_PASSWORD)
cursor = conn.cursor()

# Dropping the tables created in this notebook (drinks, students, courses)
cursor.execute(f"""
DROP TABLE IF EXISTS drinks_{STUDENT_NAME_TABLE} CASCADE;
DROP TABLE IF EXISTS students_{STUDENT_NAME_TABLE} CASCADE;
DROP TABLE IF EXISTS courses_{STUDENT_NAME_TABLE} CASCADE;
""")
conn.commit()
cursor.close()
conn.close()