# Preventing SQL Injection and Handling Data

This notebook covers two essential topics for any developer working with databases:

1.  **SQL Injection**: We will learn what it is, see a live demonstration of an attack, and learn the **only correct way** to prevent it using **parameterized queries**.
2.  **Effective Data Handling**: We will explore more convenient ways to access retrieved data, moving from standard tuples to dictionary-like objects and finally loading data directly into a **Pandas DataFrame** for analysis.

--- 
## Setup

We'll import `psycopg2` and define our connection details. We also need `pandas` for the second half of the notebook.

In [2]:
!pip install pandas
import psycopg2
import psycopg2.extras # Needed for DictCursor
import pandas as pd

DB_HOST = "localhost"
DB_NAME = "people"
DB_USER = "fahad"
DB_PASS = "secret"



--- 
## SQL Injection: A Live Attack Demonstration

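SQL injection occurs when user input is inserted directly into the text of an SQL query, for example with string formatting or concatenation. The database cannot tell the developer's SQL apart from the attacker's, so a malicious user can break out of the intended query and change its logic or run their own statements.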

Let's set up a `users` table with passwords and a **vulnerable** login function.

In [3]:
sql_setup = """
DROP TABLE IF EXISTS users CASCADE;
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(255) UNIQUE NOT NULL,
    password VARCHAR(255) NOT NULL
);
INSERT INTO users (username, password) VALUES ('admin', 'password123'), ('fahad', 'secret99');
"""

try:
    with psycopg2.connect(host=DB_HOST, dbname=DB_NAME, user=DB_USER, password=DB_PASS) as conn:
        with conn.cursor() as cur:
            cur.execute(sql_setup)
    print("Table 'users' created and populated.")
except psycopg2.Error as e:
    print(f"Database error: {e}")

Table 'users' created and populated.


### The WRONG Way: Vulnerable Code

This function uses an f-string to build the query. **NEVER DO THIS.**

In [4]:
def vulnerable_login(username, password):
    sql = f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}';"
    print(f"Executing SQL: {sql}") # For demonstration
    
    with psycopg2.connect(host=DB_HOST, dbname=DB_NAME, user=DB_USER, password=DB_PASS) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            user = cur.fetchone()
            if user:
                print(f"Login successful! Welcome, {user[1]}.")
            else:
                print("Login failed.")

# A malicious user doesn't know the password. They input this:
malicious_username = "admin' OR 1=1; -- "
malicious_password = "anything"

vulnerable_login(malicious_username, malicious_password)

Executing SQL: SELECT * FROM users WHERE username = 'admin' OR 1=1; -- ' AND password = 'anything';
Login successful! Welcome, admin.


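The malicious username `admin' OR 1=1; -- ` breaks out of the string literal. The quote after `admin` closes the username value, so the `WHERE` clause becomes `WHERE username = 'admin' OR 1=1`, which is true for every row. The `;` ends the statement and the `--` comments out the rest of the line, so the password check never runs. The query returns all users, `fetchone()` picks the first row (here `admin`), and the attacker is logged in as admin without knowing the password.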
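The effect of the injected condition can be reproduced without a PostgreSQL server. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for `psycopg2` (an assumption made only so the example runs anywhere). The table contents mirror the ones above, and the semicolon is dropped from the payload because the attack does not need it.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL `users` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
)
conn.executemany(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    [("admin", "password123"), ("fahad", "secret99")],
)

username = "admin' OR 1=1 -- "  # attacker-controlled input
password = "anything"

# Vulnerable: the input becomes part of the SQL text itself.
injected_sql = (
    "SELECT id, username FROM users "
    f"WHERE username = '{username}' AND password = '{password}'"
)
injected_rows = conn.execute(injected_sql).fetchall()

print(injected_sql)
print(injected_rows)  # every row in the table matches, not just one user
conn.close()
```

Because `OR 1=1` holds for every row, the result contains all users rather than the single row a login check expects.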

### The RIGHT Way: Parameterized Queries

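To prevent this, we put placeholders (`%s`) in the SQL text and pass the values separately as the second argument to `execute()`, as a tuple or list. `psycopg2` then quotes and escapes each value so that it is always treated as data, never as SQL. Do not wrap `%s` in quotes yourself, and use `%s` for every data type, including numbers. A single value still needs a trailing comma to form a tuple: `(username,)`.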

In [5]:
def secure_login(username, password):
    # Note the %s placeholders and the tuple of variables
    sql = "SELECT * FROM users WHERE username = %s AND password = %s;"
    
    with psycopg2.connect(host=DB_HOST, dbname=DB_NAME, user=DB_USER, password=DB_PASS) as conn:
        with conn.cursor() as cur:
            cur.execute(sql, (username, password))
            user = cur.fetchone()
            if user:
                print(f"Login successful! Welcome, {user[1]}.")
            else:
                print("Login failed.")

print("--- Trying the attack again on the secure function ---")
secure_login(malicious_username, malicious_password)

print("\n--- Trying a legitimate login on the secure function ---")
secure_login('fahad', 'secret99')

--- Trying the attack again on the secure function ---
Login failed.

--- Trying a legitimate login on the secure function ---
Login successful! Welcome, fahad.
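The same comparison can be sketched in a self-contained form. This example again uses the built-in `sqlite3` module as a stand-in (an assumption for portability); note that `sqlite3` uses `?` as its placeholder where `psycopg2` uses `%s`. The helper name `find_user` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
)
conn.executemany(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    [("admin", "password123"), ("fahad", "secret99")],
)


def find_user(conn, username, password):
    """Return the matching (id, username) row, or None if nothing matches."""
    # Values travel separately from the SQL text, so they are never parsed as SQL.
    sql = "SELECT id, username FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone()


attack_result = find_user(conn, "admin' OR 1=1 -- ", "anything")
legit_result = find_user(conn, "fahad", "secret99")

print(attack_result)  # None
print(legit_result)   # (2, 'fahad')
conn.close()
```

The malicious string is compared literally against the `username` column, and no user has that name, so the lookup returns nothing.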


--- 
## Effective Data Handling

Fetching data as tuples (`row[0]`, `row[1]`) works, but it is hard to read and depends on the column order of the query. Let's explore better options.

### Option 1: Dictionary Cursor

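By specifying `cursor_factory=psycopg2.extras.DictCursor`, we can fetch rows that behave like dictionaries, allowing access by column name while still supporting access by index. If you need real `dict` objects, `psycopg2.extras.RealDictCursor` is the alternative.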

In [6]:
with psycopg2.connect(host=DB_HOST, dbname=DB_NAME, user=DB_USER, password=DB_PASS) as conn:
    # Use the special DictCursor
    with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
        cur.execute("SELECT * FROM users;")
        for row in cur.fetchall():
            print(f"ID: {row['id']}, Username: {row['username']}, Password: {row['password']}")

ID: 1, Username: admin, Password: password123
ID: 2, Username: fahad, Password: secret99
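The idea of name-based row access is not specific to `psycopg2`. As a rough analogy that runs without a database server, the sketch below uses the standard library's `sqlite3.Row` row factory; it is a stand-in chosen for illustration, not the `DictCursor` API itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)", [("admin",), ("fahad",)]
)

rows = conn.execute("SELECT id, username FROM users ORDER BY id").fetchall()
first = rows[0]

by_name = first["username"]  # access by column name
by_index = first[1]          # positional access still works
as_dict = dict(first)        # convert to a plain dict when needed

print(by_name, by_index, as_dict)
conn.close()
```

Accessing columns by name keeps the code working even if the `SELECT` list is later reordered.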


### Option 2: Load Directly into Pandas DataFrame

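For data analysis, a convenient option is to load the query result directly into a Pandas DataFrame. `pandas.read_sql_query` runs the query on a connection you supply and returns the result as a DataFrame in one step. Note that pandas officially supports only SQLAlchemy connectables and `sqlite3` connections; passing a raw `psycopg2` connection works for a simple query like this one but emits a `UserWarning`.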

In [7]:
sql = "SELECT id, username FROM users;"
df = None
with psycopg2.connect(host=DB_HOST, dbname=DB_NAME, user=DB_USER, password=DB_PASS) as conn:
    df = pd.read_sql_query(sql, conn)

print("Data loaded into Pandas DataFrame:")
display(df)
print("\nNow you can use all Pandas functions, e.g., df.describe():")
display(df.describe())

Data loaded into Pandas DataFrame:




|   | id | username |
|---|----|----------|
| 0 | 1  | admin    |
| 1 | 2  | fahad    |



Now you can use all Pandas functions, e.g., df.describe():


|       | id       |
|-------|----------|
| count | 2.0      |
| mean  | 1.5      |
| std   | 0.707107 |
| min   | 1.0      |
| 25%   | 1.25     |
| 50%   | 1.5      |
| 75%   | 1.75     |
| max   | 2.0      |
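Queries passed to pandas can be parameterized too, through the `params` argument of `read_sql_query`. The sketch below assumes an in-memory `sqlite3` database as a stand-in (hence the `?` placeholder); with a PostgreSQL driver the placeholder style follows that driver's convention instead.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)", [("admin",), ("fahad",)]
)

sql = "SELECT id, username FROM users WHERE username = ?"

# The value is passed separately from the SQL text, exactly as with execute().
df_admin = pd.read_sql_query(sql, conn, params=("admin",))

# The injection payload is treated as a literal username and matches nothing.
df_attack = pd.read_sql_query(sql, conn, params=("admin' OR 1=1 -- ",))

print(df_admin)
print(len(df_attack))
conn.close()
```

This keeps the rule from the first half of the notebook intact when the query is run by pandas instead of a cursor.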


--- 
## Conclusion

This notebook covered two vital concepts:

1.  **Always use parameterized queries** (`%s` placeholders) to prevent SQL injection. Never use f-strings or other string formatting to build queries with user input.
2.  **Handle data effectively**: Use `DictCursor` for readable row access in standard Python, and use `pandas.read_sql_query` to seamlessly integrate your database with powerful data analysis tools.