# Phase 2: Create and Seed the Database

**Objective:** Bring our database to life by creating the physical `mining_knowledge.db` file and populating it with structured, validated sample data.

**Workflow:**
1.  **Define and test** the CRUD helper functions here in the notebook.
2.  **Define** sample data for our models.
3.  **Use** the helpers and data to seed the database.
4.  **Finally, copy** the finalized CRUD functions into a new `helper_crud.py` file.

---
## ✅ Part 1: Database and Session Setup

First, we'll set up the connection to our SQLite database and create the tables from our models.

### TODO 1.1: Imports


In [1]:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os
from dotenv import load_dotenv
from models import (    
    Base,
    SD_Objective,
    SDG_Goal,
    SDG_Target,
    SDG_Indicator,
    Practice,
    Stakeholder_Group,
    Stakeholder,
    Concern,
    PracticeToTargetLink,
    StakeholderToConcernLink,
    ConcernToTargetLink,
    SDObjectiveToSDGLink)


### TODO 1.2: Define Database URL and Create Engine

**Your Task:** Set the database URL and create the SQLAlchemy engine.

In [2]:
# This function will load variables from a .env file into your environment
load_dotenv()

# Retrieve the database URL from the environment.
# os.getenv() returns None if the variable is not found.
DATABASE_URL = os.getenv("DATABASE_URL")

# Check if the DATABASE_URL was loaded correctly
if not DATABASE_URL:
    raise ValueError("No DATABASE_URL found in the .env file or environment variables.")

# The 'echo=True' argument is very useful for development.
# It makes SQLAlchemy log all the SQL statements it generates.
engine = create_engine(DATABASE_URL, echo=True)

print("Engine created successfully using the URL from the .env file.")

Engine created successfully using the URL from the .env file.


The advantages of using the `.env` file to define `DATABASE_URL`:
- **Security**: You can add your `.env` file to `.gitignore`. This ensures that your database credentials or file paths are never accidentally committed to your public GitHub repository.

- **Flexibility**: You can have different `.env` files for different environments (development, testing, production) without ever changing your Python code.

- **Collaboration**: Your teammates can use their own local database configurations by simply creating their own `.env` file, making the project easier to set up for everyone.
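
As a rough illustration of what the `.env` entry might hold: the notebook never shows the actual value, so the SQLite URL below is an assumption based on the `mining_knowledge.db` filename, and `example_env` is a hypothetical stand-in for the loaded environment. The sketch uses SQLAlchemy's `make_url` to check that such a string parses as expected before it is handed to `create_engine`.

```python
from sqlalchemy.engine import make_url

# Hypothetical stand-in for the variables that load_dotenv() would load.
# Assumed .env line: DATABASE_URL=sqlite:///mining_knowledge.db
example_env = {"DATABASE_URL": "sqlite:///mining_knowledge.db"}

database_url = example_env.get("DATABASE_URL")
if not database_url:
    raise ValueError("No DATABASE_URL found.")

# make_url parses the string without opening a connection.
parsed_url = make_url(database_url)
print(parsed_url.drivername)  # dialect name
print(parsed_url.database)    # SQLite file path, relative to the working directory
```

With three slashes the file path is relative, so the database file would be created in whatever directory the notebook runs from.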

### TODO 1.3: Create Database Tables

**Your Task:** Execute this cell to create the `mining_knowledge.db` file and all the tables within it. `create_all` only creates tables that do not exist yet, so running the cell again after seeding will not delete any data. It also will not alter an existing table if you later change a model, so treat this as a one-time setup step.

In [3]:
# This command connects to the database and creates all tables
# that inherit from our 'Base' object in models.py
Base.metadata.create_all(bind=engine)

print("Database 'mining_knowledge.db' and its tables have been created or verified successfully.")

2025-07-10 17:00:39,973 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-07-10 17:00:39,973 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("sd_objectives")
2025-07-10 17:00:39,974 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-07-10 17:00:39,975 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("sdg_goal")
2025-07-10 17:00:39,975 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-07-10 17:00:39,976 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("practice")
2025-07-10 17:00:39,976 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-07-10 17:00:39,977 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("sdg_target")
2025-07-10 17:00:39,977 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-07-10 17:00:39,978 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("sdg_indicator")
2025-07-10 17:00:39,979 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-07-10 17:00:39,979 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("stakeholder_group")
2025-07-10 17:00:39,980 INFO sqlalchemy
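
The `PRAGMA main.table_info(...)` lines in the log are SQLAlchemy checking whether each table already exists before creating it. A minimal sketch of that behavior, using an in-memory SQLite database and a hypothetical `ExampleNode` model rather than the real classes from `models.py`:

```python
from sqlalchemy import Column, Integer, String, create_engine, func, inspect, select
from sqlalchemy.orm import declarative_base

# Hypothetical stand-in for the Base and models defined in models.py.
ExampleBase = declarative_base()


class ExampleNode(ExampleBase):
    __tablename__ = "example_node"
    id = Column(Integer, primary_key=True)
    name = Column(String)


# In-memory SQLite database, so nothing is written to disk.
example_engine = create_engine("sqlite:///:memory:")

# First call: the table does not exist yet, so it is created.
ExampleBase.metadata.create_all(bind=example_engine)

with example_engine.begin() as conn:
    conn.execute(ExampleNode.__table__.insert().values(id=1, name="kept"))

# Second call: the table already exists, so it is left untouched.
ExampleBase.metadata.create_all(bind=example_engine)

with example_engine.connect() as conn:
    row_count = conn.execute(
        select(func.count()).select_from(ExampleNode.__table__)
    ).scalar_one()

table_names = inspect(example_engine).get_table_names()
print(table_names, row_count)
```

The row inserted between the two calls is still there afterwards, which is why rerunning the setup cell does not wipe seeded data.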

---
## 🛠️ Part 2: CRUD Helper Functions (for `helper_crud.py`)

Now, we will write the helper functions that will simplify our interactions with the database. Develop and test them in the cell below. Once they are working correctly, you can copy the entire content of the cell into a new `helper_crud.py` file.

### TODO 2.1 - 2.3: Create CRUD Helpers

**Your Task:** Write the Python code for the following functions:
1.  **Imports**: Import `Session`, all ORM models from `models`, and all `Create` schemas from `valid_schemas`.
2.  **`get_or_create(db: Session, model, **kwargs)`**: A generic function to prevent creating duplicate entries.
3.  **Specific `create_` functions**: For each model, write a function that takes a Pydantic `Create` schema and uses `get_or_create` to add it to the DB. For example: `create_practice(db: Session, practice: PracticeCreate)`.

In [4]:

# 1. Imports
from sqlalchemy.orm import Session
# It's good practice to import the modules themselves to maintain namespace clarity.
import models
import valid_schemas

# 2. Generic get_or_create function
def get_or_create(db: Session, model, **kwargs):
    """
    Checks if an instance of a model exists in the database.
    If it exists, it returns the instance and False.
    If not, it creates a new instance, adds it to the session, and returns it and True.
    """
    # Look for an existing row whose columns match every keyword argument.
    # For link tables with composite keys, that includes all key columns.
    instance = db.query(model).filter_by(**kwargs).first()
    if instance:
        return instance, False # Return instance and a flag indicating it was not created
    else:
        instance = model(**kwargs)
        db.add(instance)
        # Note: We don't commit here. The calling function is responsible for the commit.
        return instance, True # Return instance and a flag indicating it was created

# 3. Specific create functions for each model

# --- Node Helpers ---

def create_sd_objective(db: Session, objective: valid_schemas.SD_ObjectiveCreate):
    # Pass every schema field so a newly created row is not left with only its id.
    db_obj, _ = get_or_create(db, models.SD_Objective, **objective.model_dump())
    return db_obj

def create_sdg_goal(db: Session, goal: valid_schemas.SDG_GoalCreate):
    db_obj, _ = get_or_create(db, models.SDG_Goal, **goal.model_dump())
    return db_obj

def create_sdg_target(db: Session, target: valid_schemas.SDG_TargetCreate):
    db_obj, _ = get_or_create(db, models.SDG_Target, **target.model_dump())
    return db_obj

def create_sdg_indicator(db: Session, indicator: valid_schemas.SDG_IndicatorCreate):
    db_obj, _ = get_or_create(db, models.SDG_Indicator, **indicator.model_dump())
    return db_obj

def create_practice(db: Session, practice: valid_schemas.PracticeCreate):
    db_obj, _ = get_or_create(db, models.Practice, **practice.model_dump())
    return db_obj

def create_stakeholder_group(db: Session, group: valid_schemas.Stakeholder_GroupCreate):
    db_obj, _ = get_or_create(db, models.Stakeholder_Group, **group.model_dump())
    return db_obj

def create_stakeholder(db: Session, stakeholder: valid_schemas.StakeholderCreate):
    db_obj, _ = get_or_create(db, models.Stakeholder, **stakeholder.model_dump())
    return db_obj

def create_concern(db: Session, concern: valid_schemas.ConcernCreate):
    db_obj, _ = get_or_create(db, models.Concern, **concern.model_dump())
    return db_obj

# --- Link Helpers ---

# model_dump() already contains the composite-key fields. Passing them again as
# explicit keywords would raise "TypeError: got multiple values for keyword
# argument", so each helper passes the dumped fields only once. The existence
# check therefore matches on every field of the link.

def create_practice_to_target_link(db: Session, link: valid_schemas.PracticeToTargetLinkCreate):
    db_obj, _ = get_or_create(db, models.PracticeToTargetLink, **link.model_dump())
    return db_obj

def create_stakeholder_to_concern_link(db: Session, link: valid_schemas.StakeholderToConcernLinkCreate):
    db_obj, _ = get_or_create(db, models.StakeholderToConcernLink, **link.model_dump())
    return db_obj

def create_concern_to_target_link(db: Session, link: valid_schemas.ConcernToTargetLinkCreate):
    db_obj, _ = get_or_create(db, models.ConcernToTargetLink, **link.model_dump())
    return db_obj

def create_sd_objective_to_sdg_link(db: Session, link: valid_schemas.SDObjectiveToSDGLinkCreate):
    db_obj, _ = get_or_create(db, models.SDObjectiveToSDGLink, **link.model_dump())
    return db_obj


print("CRUD Helper functions defined and ready for use.")

CRUD Helper functions defined and ready for use.
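
For link tables it can be useful to look a row up by its key columns only and apply the remaining attributes just when the row is created. The following is one possible variant, not the notebook's helper: `get_or_create_with_defaults`, its `defaults` parameter, and the `DemoLink` model with its `note` column are all hypothetical names introduced for illustration.

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

DemoBase = declarative_base()


class DemoLink(DemoBase):
    """Hypothetical link table with a composite primary key."""
    __tablename__ = "demo_link"
    practice_id = Column(String, primary_key=True)
    target_id = Column(String, primary_key=True)
    note = Column(String)


def get_or_create_with_defaults(db, model, defaults=None, **lookup):
    """Look up by `lookup` only; use `defaults` just for a newly created row."""
    instance = db.query(model).filter_by(**lookup).first()
    if instance:
        return instance, False
    instance = model(**lookup, **(defaults or {}))
    db.add(instance)
    db.flush()  # send the INSERT now so later lookups can see the row
    return instance, True


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as db:
    first, created_first = get_or_create_with_defaults(
        db, DemoLink, defaults={"note": "original"},
        practice_id="P1", target_id="T1",
    )
    # Same key, different note: the existing row is returned unchanged.
    second, created_second = get_or_create_with_defaults(
        db, DemoLink, defaults={"note": "changed"},
        practice_id="P1", target_id="T1",
    )
    same_row = first is second
    kept_note = second.note
    db.commit()
```

The trade-off is that an existing row is never updated with new attribute values. Whether that is the desired behavior depends on how the sample data is maintained.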


---
## 📝 Part 3: Prepare Sample Data

Here, we'll define the sample data we want to insert into our database. The data should be structured as a list of dictionaries, where each dictionary matches a Pydantic `Create` schema.

### TODO 3.1 & 3.2: Define Sample Data for Nodes and Links

**Your Task:** Create lists of dictionaries containing your sample data, one list per model. The seeding script in Part 4 expects names such as `sd_objectives_data` and `practices_data`.

In [None]:
# TODO: Finish this part in Excel.
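
Until the real data is ready, the expected shape can be sketched as follows. Everything here is a placeholder: `ExamplePracticeCreate` and its `id` and `name` fields are assumptions standing in for the actual `Create` schemas in `valid_schemas`, and the rows are not real project data.

```python
from pydantic import BaseModel, ValidationError


class ExamplePracticeCreate(BaseModel):
    """Hypothetical schema; the real fields live in valid_schemas."""
    id: str
    name: str


# One dictionary per row; keys must match the schema's field names.
example_practices_data = [
    {"id": "P-001", "name": "Placeholder practice A"},
    {"id": "P-002", "name": "Placeholder practice B"},
]

# Validation happens when the schema is instantiated.
validated = [ExamplePracticeCreate(**row) for row in example_practices_data]

# A row with a missing field is rejected before it reaches the database.
try:
    ExamplePracticeCreate(**{"id": "P-003"})
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True

print(len(validated), missing_field_rejected)
```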

---
## 🌱 Part 4: Seed the Database

This is the final step where we put everything together. We'll create a database session, loop through our sample data, validate it with Pydantic, use our CRUD helpers to add it to the session, and finally commit all changes to the database.

### TODO 4.1 - 4.4: Seeding Script

**Your Task:** Write the script to seed the database. Remember the correct order: **seed nodes first, then seed links** to ensure relational integrity.

In [None]:
# Create a Session class to interact with the database
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

# Create a new session
db = SessionLocal()

try:
    print("--- Seeding Node Data ---")
    # Seed SD Objectives
    for objective_data in sd_objectives_data:
        pydantic_obj = valid_schemas.SD_ObjectiveCreate(**objective_data)
        create_sd_objective(db=db, objective=pydantic_obj)
    print("SD Objectives seeded.")
    
    # Seed Practices
    for practice_data in practices_data:
        pydantic_obj = valid_schemas.PracticeCreate(**practice_data)
        create_practice(db=db, practice=pydantic_obj)
    print("Practices seeded.")
    
    # ... Add seeding loops for all other Node types here ...

    # Commit the nodes first to ensure they have IDs before creating links
    db.commit()
    print("\n--- Node data committed ---")
    
    print("\n--- Seeding Link Data ---")
    # Seed PracticeToTarget Links
    # (Uses the create_practice_to_target_link helper defined in Part 2)
    # for link_data in practice_to_target_links_data:
    #     pydantic_obj = valid_schemas.PracticeToTargetLinkCreate(**link_data)
    #     create_practice_to_target_link(db=db, link=pydantic_obj)
    # print("Practice-to-Target links seeded.")

    # ... Add seeding loops for all other Link types here ...

    # Final commit for the links
    db.commit()
    print("\n--- Link data committed ---")
    
    print("\nDatabase seeding completed successfully!")

except Exception as e:
    print(f"An error occurred: {e}")
    db.rollback() # Roll back the transaction on error

finally:
    db.close() # Always close the session
    print("\nDatabase session closed.")
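
One detail to watch when a session is created with `autoflush=False`: a lookup that runs before the next flush or commit cannot see rows that were only added to the session, so a duplicate entry in the sample data would be added twice and then conflict at commit time. The sketch below illustrates this with a hypothetical `SketchNode` model and a copy of the lookup-then-add logic that takes an optional `flush` flag. Both names are assumptions for illustration, not part of the notebook.

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

SketchBase = declarative_base()


class SketchNode(SketchBase):
    __tablename__ = "sketch_node"
    id = Column(String, primary_key=True)


def get_or_create_sketch(db, model, flush=False, **kwargs):
    """Same lookup-then-add logic as get_or_create, with an optional flush."""
    instance = db.query(model).filter_by(**kwargs).first()
    if instance:
        return instance, False
    instance = model(**kwargs)
    db.add(instance)
    if flush:
        db.flush()  # send the INSERT now so later queries can see the row
    return instance, True


sketch_engine = create_engine("sqlite:///:memory:")
SketchBase.metadata.create_all(sketch_engine)

# Without a flush, the second lookup misses the pending row.
with Session(sketch_engine, autoflush=False) as db:
    _, created_first = get_or_create_sketch(db, SketchNode, id="N1")
    _, created_again = get_or_create_sketch(db, SketchNode, id="N1")
    db.rollback()  # discard both pending objects instead of committing a conflict

# With a flush after each add, the second lookup finds the row.
with Session(sketch_engine, autoflush=False) as db:
    _, flushed_first = get_or_create_sketch(db, SketchNode, flush=True, id="N1")
    _, flushed_again = get_or_create_sketch(db, SketchNode, flush=True, id="N1")
    db.rollback()

print(created_first, created_again, flushed_first, flushed_again)
```

Flushing after each add, or leaving `autoflush` at its default of `True`, are two ways to keep duplicate detection working within a single batch.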