# Sakila DVD Rental – SQL & EDA

This notebook is part of a lab where we migrate the legacy Sakila SQLite database into DuckDB and perform exploratory data analysis (EDA).  

**Goals:**

- Connect to the Sakila SQLite database via DuckDB  
- (Optionally) create a persistent DuckDB database file  
- Explore the data using a combination of SQL (DuckDB) and pandas  
- Prepare insights that can later be used for a BI dashboard


In [1]:
from pathlib import Path

import duckdb
import pandas as pd

# Paths: resolve the project root whether the notebook runs from the repo root or from notebooks/
CWD = Path.cwd().resolve()
PROJECT_ROOT = CWD.parent if CWD.name == "notebooks" else CWD
DATA_RAW = PROJECT_ROOT / "data" / "raw"
DATA_PROCESSED = PROJECT_ROOT / "data" / "processed"

SQLITE_PATH = DATA_RAW / "sqlite-sakila.db"
DUCKDB_PATH = DATA_PROCESSED / "sakila.duckdb"

PROJECT_ROOT, DATA_RAW, DATA_PROCESSED, SQLITE_PATH, DUCKDB_PATH


(WindowsPath('C:/Users/jonas/Documents/Nackademin MLOps/Databashantering/sakila-lab'),
 WindowsPath('C:/Users/jonas/Documents/Nackademin MLOps/Databashantering/sakila-lab/data/raw'),
 WindowsPath('C:/Users/jonas/Documents/Nackademin MLOps/Databashantering/sakila-lab/data/processed'),
 WindowsPath('C:/Users/jonas/Documents/Nackademin MLOps/Databashantering/sakila-lab/data/raw/sqlite-sakila.db'),
 WindowsPath('C:/Users/jonas/Documents/Nackademin MLOps/Databashantering/sakila-lab/data/processed/sakila.duckdb'))
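
Before handing the SQLite path to DuckDB, it can help to confirm that the file is actually present and looks like a SQLite database. The snippet below is a minimal sketch and not part of the original lab: `looks_like_sqlite` is a hypothetical helper that only checks the 16-byte header that SQLite 3 files start with, so it does not validate the schema or the data.

```python
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte magic string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path) -> bool:
    """Return True if `path` is an existing file that starts with the SQLite 3 header."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

In this notebook it could be called as `looks_like_sqlite(SQLITE_PATH)` right after the paths are defined. A `False` result would typically mean the file is missing from `data/raw` or is not a SQLite database.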

## Task 0 – Data ingestion (SQLite → DuckDB)

In this section we create a DuckDB database file and attach the legacy Sakila SQLite database using DuckDB's SQLite extension.


In [2]:
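# Make sure the target folder exists: duckdb.connect() creates the file, but not missing parent directories
DATA_PROCESSED.mkdir(parents=True, exist_ok=True)

# Connect (this will create the file if it does not exist)
con = duckdb.connect(DUCKDB_PATH.as_posix())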

# Install & load the SQLite extension, then attach the SQLite database
con.execute("INSTALL sqlite;")
con.execute("LOAD sqlite;")
con.execute(f"CALL sqlite_attach('{SQLITE_PATH.as_posix()}');")


<duckdb.duckdb.DuckDBPyConnection at 0x26f75bdd5b0>
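
One of the goals is to optionally create a persistent DuckDB database file. As far as we understand, `sqlite_attach` exposes the SQLite tables as views, so the data itself still lives in the SQLite file. The sketch below shows one way the copy statements could be generated, under that assumption. The helpers `quote_ident` and `materialize_statements` and the target schema name `staging` are hypothetical and not part of the original lab.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def materialize_statements(table_names, source_schema="main", target_schema="staging"):
    """Build SQL that copies each source object into a native table in target_schema."""
    statements = [f"CREATE SCHEMA IF NOT EXISTS {quote_ident(target_schema)};"]
    for name in table_names:
        src = f"{quote_ident(source_schema)}.{quote_ident(name)}"
        dst = f"{quote_ident(target_schema)}.{quote_ident(name)}"
        statements.append(f"CREATE OR REPLACE TABLE {dst} AS SELECT * FROM {src};")
    return statements


# Print the generated SQL for two example tables.
for stmt in materialize_statements(["actor", "film"]):
    print(stmt)
```

Each generated statement could then be passed to `con.execute(stmt)`. Writing to a separate schema avoids a name clash with the attached objects in `main`. The list of names could come from an `information_schema.tables` query on the connection.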

In [3]:
con.sql("""
    SELECT table_schema, table_name
    FROM information_schema.tables
    ORDER BY table_schema, table_name
    LIMIT 50
""").df()


,table_schema,table_name
0,main,actor
1,main,address
2,main,category
3,main,city
4,main,country
5,main,customer
6,main,customer_list
7,main,film
8,main,film_actor
9,main,film_category
