## Imports and logging configuration

This cell imports the Python libraries used in the notebook and configures **file-based logging**.

**What happens here**
- Imports:
  - **pandas** for reading CSV files into DataFrames.
  - **os** for scanning files in a folder.
  - **sqlalchemy.create_engine** for creating a database connection/engine.
  - **logging** to record ingestion progress and timings.
  - **time** to measure runtime.
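- Calls `logging.basicConfig(...)` so that messages at **DEBUG** level and above are appended (`filemode="a"`) to `logs/ingestion_db.log`, each prefixed with a timestamp and the level name. The `logs/` folder has to exist before the log file is opened; otherwise `FileNotFoundError` is raised. `basicConfig` also does nothing if the root logger already has handlers, so re-running this cell in the same kernel does not change the configuration.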


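```python
import logging
import os
import time

import pandas as pd
from sqlalchemy import create_engine

# FileHandler cannot create missing folders, so make sure logs/ exists first
os.makedirs("logs", exist_ok=True)

logging.basicConfig(
    filename="logs/ingestion_db.log",
    level=logging.DEBUG,  # record DEBUG and every level above it
    format="%(asctime)s - %(levelname)s - %(message)s",
    filemode="a",  # append across runs instead of truncating the file
)
```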
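To see what this configuration actually writes, here is a minimal sketch that logs two messages and reads the file back. It assumes a throwaway temporary folder in place of `logs/` and passes `force=True` (Python 3.8+), which closes and removes any existing root handlers; without it, `basicConfig` is a no-op once the root logger is configured. Because `force=True` would also replace the notebook's own handler, run it in a scratch session.

```python
import logging
import os
import tempfile

# Hypothetical scratch folder standing in for the notebook's logs/ directory
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "ingestion_db.log")

logging.basicConfig(
    filename=log_path,
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s",
    filemode="a",
    force=True,  # Python 3.8+: replace any handlers left from earlier runs
)

logging.debug("kept, because the level is DEBUG")
logging.info("Ingesting example.csv in db")

# FileHandler flushes after every record, so the file can be read right away
with open(log_path) as fh:
    lines = fh.read().splitlines()

for line in lines:
    print(line)  # "<timestamp> - <LEVEL> - <message>"
```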

## Create the database engine

This cell creates a **SQLAlchemy engine** connected to a local **SQLite** database file.

**What happens here**
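- `create_engine("sqlite:///inventory.db")` points to a SQLite database file `inventory.db`; the three slashes mean the path is relative to the current working directory. The engine connects lazily, so the file is created on the first connection (here, the first write) if it does not already exist.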
- The `engine` object is then used by `pandas.DataFrame.to_sql(...)` to write tables into the database.

```python
# Relative path: inventory.db lives in the current working directory
engine = create_engine("sqlite:///inventory.db")
```
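A quick way to confirm that an engine works is to write a small table and list what the database contains. This sketch is a standalone check, not part of the pipeline: it assumes an in-memory engine (`sqlite://`, no path) so nothing is written to `inventory.db`, and the `sample` table and its rows are made up for illustration.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory SQLite database; disappears when the process exits
demo_engine = create_engine("sqlite://")

# Hypothetical rows, only to prove the engine accepts writes
sample = pd.DataFrame({"item": ["bolt", "nut"], "qty": [10, 25]})
sample.to_sql("sample", con=demo_engine, index=False)

print(inspect(demo_engine).get_table_names())  # ['sample']
print(pd.read_sql("SELECT * FROM sample", demo_engine))
```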

## Helper function: ingest a DataFrame into a SQL table

This cell defines a reusable helper function that writes a pandas DataFrame into a database table.

**What `ingest_db` does**
- Takes:
  - `df`: the DataFrame to store
  - `table_name`: the SQL table name to write to
  - `engine`: the SQLAlchemy engine/connection
- Calls `df.to_sql(...)` with:
  - `if_exists="replace"` → an existing table with that name is dropped and recreated, so each run is a **fresh load**
  - `index=False` → DataFrame index is not stored as a column


```python
def ingest_db(df, table_name, engine):
    """Write `df` to `table_name`, replacing the table if it already exists."""
    df.to_sql(table_name, con=engine, if_exists="replace", index=False)
```
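The effect of `if_exists="replace"` shows up when the same table is loaded twice. The sketch below repeats `ingest_db` so it runs on its own, writes to an in-memory SQLite engine named `demo_engine` (leaving the notebook's `engine` untouched), and uses a hypothetical `purchases` table.

```python
import pandas as pd
from sqlalchemy import create_engine


def ingest_db(df, table_name, engine):
    """Write df to table_name, replacing the table if it already exists."""
    df.to_sql(table_name, con=engine, if_exists="replace", index=False)


demo_engine = create_engine("sqlite://")  # in-memory stand-in for inventory.db

# First load: three hypothetical rows
ingest_db(pd.DataFrame({"sku": [101, 102, 103]}), "purchases", demo_engine)
# Second load into the same table: the earlier rows are dropped, not kept
ingest_db(pd.DataFrame({"sku": [104, 105]}), "purchases", demo_engine)

skus = pd.read_sql("SELECT sku FROM purchases ORDER BY sku", demo_engine)["sku"].tolist()
print(skus)  # [104, 105]
```

With `if_exists="append"` instead, the second call would have left five rows in the table.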

## Load raw CSV files and ingest them into the database

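This cell defines the end-to-end ingestion routine and then runs it. The routine:
1) scans the `data/` folder for CSV files,  
2) loads each CSV into a DataFrame,  
3) writes each DataFrame to a SQL table named after the file (`<name>.csv` → table `<name>`), and  
4) logs each file as it is ingested, followed by the total elapsed time in minutes.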


```python
def load_raw_data():
    """Load every CSV in data/ as a DataFrame and ingest it into the database."""
    start = time.time()
    for file in os.listdir("data"):
        # endswith() avoids matching names such as "prices.csv.bak"
        if file.endswith(".csv"):
            df = pd.read_csv(os.path.join("data", file))
            logging.info(f"Ingesting {file} in db")
            # Table name = file name without its ".csv" extension
            ingest_db(df, os.path.splitext(file)[0], engine)
    end = time.time()

    total_time = (end - start) / 60  # seconds -> minutes

    logging.info("--------------Ingestion Complete------------")
    logging.info(f"Total time taken: {total_time:.2f} minutes")


load_raw_data()
```
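For CSVs too large to load in one go, the same loop can read each file in chunks. The sketch below is a hypothetical variant, not the notebook's routine: the name `load_csv_folder` and its `chunksize` default are illustrative, it takes the folder and engine as parameters, and it finds files with `pathlib` (`Path.glob("*.csv")`, using `Path.stem` as the table name). The first chunk of each file replaces the table and later chunks are appended; the usage lines build a temporary folder holding one CSV and one non-CSV file.

```python
import logging
import tempfile
import time
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine, inspect


def load_csv_folder(folder, engine, chunksize=100_000):
    """Hypothetical chunked variant of load_raw_data.

    Ingests every *.csv in `folder` into a table named after the file stem,
    reading `chunksize` rows at a time so no file is loaded all at once.
    """
    start = time.perf_counter()
    tables = []
    for path in sorted(Path(folder).glob("*.csv")):
        logging.info(f"Ingesting {path.name} in db")
        with pd.read_csv(path, chunksize=chunksize) as reader:
            for i, chunk in enumerate(reader):
                # First chunk recreates the table; later chunks are appended
                mode = "replace" if i == 0 else "append"
                chunk.to_sql(path.stem, con=engine, if_exists=mode, index=False)
        tables.append(path.stem)
    minutes = (time.perf_counter() - start) / 60
    logging.info(f"Total time taken: {minutes:.2f} minutes")
    return tables


# Usage: a temporary folder with one CSV and one file that should be skipped
data_dir = Path(tempfile.mkdtemp())
(data_dir / "sales.csv").write_text("id,amount\n1,9.5\n2,3.0\n3,7.25\n")
(data_dir / "notes.txt").write_text("not a csv\n")

demo_engine = create_engine("sqlite://")  # in-memory database
print(load_csv_folder(data_dir, demo_engine, chunksize=2))  # ['sales']
print(inspect(demo_engine).get_table_names())  # ['sales']
```

Passing the folder and engine as arguments also makes the routine easy to exercise against a temporary directory and an in-memory database, as the usage lines do.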