# <a id='toc1_'></a>[Phase 1: Define the SQLAlchemy & Pydantic Schemas](#toc0_)

**Objective:** Translate our project's data schema from the Excel file into functional Python code. This notebook will contain all the class definitions for our database models (`models.py`) and data validation schemas (`valid_schemas.py`).

Follow the `TODO` instructions in each step to complete the exercise.

--- 
## <a id='toc1_1_'></a>[✅ Part 1: Setup and Base Declarations](#toc0_)
This first part handles the necessary imports and foundational components for SQLAlchemy.

### <a id='toc1_1_1_'></a>[TODO 1.1: Imports](#toc0_)

**Your Task:** Import all the necessary modules.
- From `sqlalchemy`: `create_engine`, `Column`, `Integer`, `String`, `Float`, `Boolean`, `Text`, `ForeignKey`, `DateTime`, and `Enum` (aliased as `SQLAlchemyEnum` so it does not clash with Python's `Enum`).
- From `sqlalchemy.orm`: `relationship`, `declarative_base`.
- From Python's `enum` module: `Enum`.

In [1]:
# sqlalchemy imports for database schema definition
from sqlalchemy import (
    create_engine,
    Column,
    Integer,
    String,
    Float,
    Boolean,
    Text,
    ForeignKey,
    DateTime,
    Enum as SQLAlchemyEnum # Renamed to avoid conflict with Python's Enum
)

# sqlalchemy.orm imports for object-relational mapping
from sqlalchemy.orm import relationship, declarative_base

# Python's built-in enum for creating controlled vocabularies
from enum import Enum

### <a id='toc1_1_2_'></a>[TODO 1.2: Create Enums](#toc0_)

**Your Task:** Define a Python `Enum` class to represent controlled sets of values.
- Create a class `LevelEnum(Enum)` for attributes like `capital_intensity`. It should have five members: `HIGH = 'H'`, `HIGH_MEDIUM = 'H/M'`, `MEDIUM = 'M'`, `MEDIUM_LOW = 'M/L'`, and `LOW = 'L'`.

In [2]:
# This Enum provides controlled vocabulary for certain model fields.
# Using a standard Python Enum is good practice and can be used by other parts
# of the application (like Pydantic models or frontend components).
class LevelEnum(Enum):
    HIGH = 'H'
    HIGH_MEDIUM = 'H/M'
    MEDIUM = 'M'
    MEDIUM_LOW = 'M/L'
    LOW = 'L'
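
To see how the short codes behave, here is a minimal standalone sketch. It repeats the enum under the hypothetical name `DemoLevel` so that running it does not rebind `LevelEnum`, and `parse_level` is an illustrative helper that is not part of the exercise. It assumes raw spreadsheet cells hold the codes, possibly with inconsistent case or padding.

```python
from enum import Enum
from typing import Optional


class DemoLevel(Enum):
    HIGH = 'H'
    HIGH_MEDIUM = 'H/M'
    MEDIUM = 'M'
    MEDIUM_LOW = 'M/L'
    LOW = 'L'


def parse_level(raw: Optional[str]) -> Optional[DemoLevel]:
    """Hypothetical helper: turn a raw cell value into a DemoLevel member."""
    if raw is None or not raw.strip():
        return None  # treat empty cells as "not assessed"
    # Lookup by value; raises ValueError for codes outside the vocabulary.
    return DemoLevel(raw.strip().upper())


print(parse_level(' h/m '))             # lookup by value, after clean-up
print(DemoLevel['LOW'].value)           # lookup by member name
print([member.value for member in DemoLevel])
```

Lookup by value (`DemoLevel('H/M')`) is what makes the short codes convenient when loading data, while lookup by name (`DemoLevel['LOW']`) matches the spelling used in Python code.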


### <a id='toc1_1_3_'></a>[TODO 1.3: Declarative Base](#toc0_)

**Your Task:** Create the `Base` object. All of our ORM model classes will inherit from this, allowing SQLAlchemy to map them to database tables.

In [3]:
# The Declarative Base is a factory for creating base classes for your ORM models.
# All of our model classes will inherit from this 'Base' object.
# SQLAlchemy's machinery will then map these classes to tables in the database.
Base = declarative_base()
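
As a rough illustration of what inheriting from the base does, the standalone sketch below registers one throwaway model and creates its table in an in-memory SQLite database. `DemoBase`, `DemoNode`, and `demo_node` are hypothetical names chosen so the snippet does not touch the notebook's own `Base`; it assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

DemoBase = declarative_base()  # hypothetical, separate from the notebook's Base


class DemoNode(DemoBase):
    __tablename__ = 'demo_node'  # hypothetical table, for illustration only
    id = Column(String, primary_key=True)


# Every class that inherits from DemoBase is registered in its metadata.
print(list(DemoBase.metadata.tables))

# create_all() emits CREATE TABLE for each registered table.
engine = create_engine('sqlite:///:memory:')
DemoBase.metadata.create_all(engine)

# Ask the database itself which tables now exist.
table_names = inspect(engine).get_table_names()
print(table_names)
```

The same `metadata.create_all(engine)` call works against a file-based SQLite URL when a persistent database is wanted.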

---
## <a id='toc1_2_'></a>[🏛️ Part 2: Node Table ORM Classes (The Entities)](#toc0_)

Now, let's define the classes for our core entities. Each class corresponds to a table in the database.

### <a id='toc1_2_1_'></a>[TODO 2.1: `SD_Objective` Class](#toc0_)
**Your Task:** Define the class for the highest-level Sustainable Development objectives (Environmental, Social, Economic).
- **Table Name:** `sd_objectives`
- **Columns:** `id` (String, Primary Key)
- **Relationships:** `sdg_goals` (one-to-many relationship with `SDG_Goal`)

In [4]:
class SD_Objective(Base):
    __tablename__ = 'sd_objectives'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "Env", "Soc", "Econ"
    
    # Relationships
    # This defines the one-to-many relationship from SD_Objective to SDG_Goal.
    # The 'back_populates' argument creates a two-way linkage, connecting to the 'objective' attribute in the SDG_Goal class.
    sdg_goals = relationship("SDG_Goal", back_populates="objective")

    def __repr__(self):
        return f"<SD_Objective(id='{self.id}')>"
    


### <a id='toc1_2_2_'></a>[TODO 2.2: `SDG_Goal` Class](#toc0_)
**Your Task:** Define the class for the 17 main SDG Goals.
- **Table Name:** `sdg_goal`
- **Columns:** `id` (String, Primary Key), `name` (String), `parent_objective_id` (ForeignKey to `sd_objectives.id`).
- **Relationships:** `objective` (many-to-one to `SD_Objective`), `targets` (one-to-many to `SDG_Target`).

In [5]:
class SDG_Goal(Base):
    __tablename__ = 'sdg_goal'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "SDG1"
    name = Column(String, nullable=False)
    parent_objective_id = Column(String, ForeignKey('sd_objectives.id'))
    
    # Relationships
    # Many-to-one relationship back to SD_Objective
    objective = relationship("SD_Objective", back_populates="sdg_goals")
    # One-to-many relationship to SDG_Target
    targets = relationship("SDG_Target", back_populates="goal")

    def __repr__(self):
        return f"<SDG_Goal(id='{self.id}', name='{self.name}')>"
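
The two-way linkage created by `back_populates` can be checked with a small standalone sketch. The `Demo*` classes and tables below are hypothetical, cut-down stand-ins for `SD_Objective` and `SDG_Goal`, the sample row is illustrative, and the snippet assumes SQLAlchemy 1.4 or later with an in-memory SQLite database.

```python
from sqlalchemy import Column, ForeignKey, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

DemoBase = declarative_base()  # hypothetical, separate from the notebook's Base


class DemoObjective(DemoBase):
    __tablename__ = 'demo_objective'
    id = Column(String, primary_key=True)
    goals = relationship('DemoGoal', back_populates='objective')


class DemoGoal(DemoBase):
    __tablename__ = 'demo_goal'
    id = Column(String, primary_key=True)
    name = Column(String, nullable=False)
    parent_objective_id = Column(String, ForeignKey('demo_objective.id'))
    objective = relationship('DemoObjective', back_populates='goals')


engine = create_engine('sqlite:///:memory:')
DemoBase.metadata.create_all(engine)

with Session(engine) as session:
    env = DemoObjective(id='Env')
    goal = DemoGoal(id='SDG13', name='Climate Action')

    env.goals.append(goal)                    # set only the "one" side
    linked_in_memory = goal.objective is env  # the "many" side follows at once

    session.add(env)   # the goal is saved too, via the default save-update cascade
    session.commit()

    # The foreign key column was filled in from the relationship at flush time.
    stored_fk = session.get(DemoGoal, 'SDG13').parent_objective_id

print(linked_in_memory, stored_fk)
```

Note that the foreign key was never assigned by hand: populating either side of the relationship is enough.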

### <a id='toc1_2_3_'></a>[TODO 2.3: `Practice` Class](#toc0_)
**Your Task:** Define the class for Mining Practices.
- **Table Name:** `practice`
- **Columns:** Define all columns as specified in the schema, including `id`, `name`, `category`, `description`, etc.
- **Enums:** Use the `LevelEnum` for `capital_intensity`, `technical_complexity`, and `operational_disruption`.
- **Boolean:** Use `Boolean` for `long_term_liability`.
- **Relationships:** `target_links` (one-to-many relationship to the `PracticeToTargetLink` association object).

In [6]:
class Practice(Base):
    __tablename__ = 'practice'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "p1"
    name = Column(String, nullable=False)
    category = Column(String)
    description = Column(Text)
    major_actions_involved = Column(Text)
    remark = Column(Text)
    evidence_source = Column(String)
    
    # Using the 5-level Enum for qualitative assessment
    capital_intensity = Column(SQLAlchemyEnum(LevelEnum))
    technical_complexity = Column(SQLAlchemyEnum(LevelEnum))
    operational_disruption = Column(SQLAlchemyEnum(LevelEnum))
    
    # Boolean for true/false values
    long_term_liability = Column(Boolean)
    
    # Relationships
    # This will link to the PracticeToTargetLink association object.
    target_links = relationship("PracticeToTargetLink", back_populates="practice")

    def __repr__(self):
        return f"<Practice(id='{self.id}', name='{self.name}')>"
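
One detail worth knowing about `SQLAlchemyEnum(LevelEnum)`: by default SQLAlchemy persists the member *names* (for example `HIGH_MEDIUM`), not the member values (`H/M`). The standalone sketch below shows both behaviours side by side. The `Demo*` names are hypothetical, and the `values_callable` column is shown only as an option, not as something the exercise requires; it assumes SQLAlchemy 1.4 or later.

```python
from enum import Enum

from sqlalchemy import Column, String, create_engine, text
from sqlalchemy import Enum as SQLAlchemyEnum
from sqlalchemy.orm import Session, declarative_base

DemoBase = declarative_base()  # hypothetical, separate from the notebook's Base


class DemoLevel(Enum):
    HIGH = 'H'
    HIGH_MEDIUM = 'H/M'


class DemoPractice(DemoBase):
    __tablename__ = 'demo_practice'
    id = Column(String, primary_key=True)
    # Default behaviour: the member name is written to the database.
    by_name = Column(SQLAlchemyEnum(DemoLevel))
    # values_callable switches persistence to the member values instead.
    by_value = Column(SQLAlchemyEnum(
        DemoLevel,
        values_callable=lambda enum_cls: [member.value for member in enum_cls],
    ))


engine = create_engine('sqlite:///:memory:')
DemoBase.metadata.create_all(engine)

with Session(engine) as session:
    session.add(DemoPractice(id='p1',
                             by_name=DemoLevel.HIGH_MEDIUM,
                             by_value=DemoLevel.HIGH_MEDIUM))
    session.commit()

    # Raw SQL bypasses the ORM type and shows what is physically stored.
    raw_row = session.execute(
        text('SELECT by_name, by_value FROM demo_practice')).one()

    # Through the ORM, both columns come back as enum members.
    loaded = session.get(DemoPractice, 'p1')
    round_trip = (loaded.by_name, loaded.by_value)

print(tuple(raw_row))
print(round_trip)
```

This matters mainly when data is loaded or inspected outside the ORM, for instance with raw SQL or a bulk import that writes the spreadsheet codes directly.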

### <a id='toc1_2_4_'></a>[TODO 2.4: `Stakeholder_Group` Class](#toc0_)
**Your Task:** Define the class for stakeholder categories.
- **Table Name:** `stakeholder_group`
- **Columns:** `id` (String, Primary Key), `name` (String).
- **Relationships:** `stakeholders` (one-to-many relationship with `Stakeholder`).

In [7]:
class Stakeholder_Group(Base):
    __tablename__ = 'stakeholder_group'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "shg1_cv_sct"
    name = Column(String, nullable=False)
    
    # Relationships
    # One-to-many relationship to the Stakeholder class
    stakeholders = relationship("Stakeholder", back_populates="group")
    
    def __repr__(self):
        return f"<Stakeholder_Group(id='{self.id}', name='{self.name}')>"

### <a id='toc1_2_5_'></a>[TODO 2.5: `Concern` Class](#toc0_)
**Your Task:** Define the class for stakeholder concerns.
- **Table Name:** `concern`
- **Columns:** `id` (String, Primary Key), `name` (String), `description` (Text).
- **Relationships:** `sh_links` (to `StakeholderToConcernLink`), `target_links` (to `ConcernToTargetLink`).

In [8]:
class Concern(Base):
    __tablename__ = 'concern'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "concern_jobs"
    name = Column(String, nullable=False)
    description = Column(Text)
    
    # Relationships
    # These will link to our association objects, which we'll define in Part 3.
    sh_links = relationship("StakeholderToConcernLink", back_populates="concern")
    target_links = relationship("ConcernToTargetLink", back_populates="concern")
    
    def __repr__(self):
        return f"<Concern(id='{self.id}', name='{self.name}')>"

### <a id='toc1_2_6_'></a>[TODO 2.6: `Stakeholder` Class](#toc0_)
**Your Task:** Define the class for specific stakeholders.
- **Table Name:** `stakeholder`
- **Columns:** `id`, `name`, `category_id` (ForeignKey to `stakeholder_group.id`), and other attributes from the schema.
- **Relationships:** `group` (many-to-one to `Stakeholder_Group`), `concern_links` (to `StakeholderToConcernLink`).

In [9]:
class Stakeholder(Base):
    __tablename__ = 'stakeholder'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "sh1"
    name = Column(String, nullable=False)
    category_id = Column(String, ForeignKey('stakeholder_group.id'))
    definition = Column(Text)
    description = Column(Text)
    
    # Relationships
    # Many-to-one relationship back to its group
    group = relationship("Stakeholder_Group", back_populates="stakeholders")
    # One-to-many relationship to the association object
    concern_links = relationship("StakeholderToConcernLink", back_populates="stakeholder")

    def __repr__(self):
        return f"<Stakeholder(id='{self.id}', name='{self.name}')>"

### <a id='toc1_2_7_'></a>[TODO 2.7: `SDG_Target` Class](#toc0_)
**Your Task:** Define the class for specific SDG Targets.
- **Table Name:** `sdg_target`
- **Columns:** `id`, `description` (Text), `parent_goal_id` (ForeignKey to `sdg_goal.id`).
- **Relationships:** `goal` (many-to-one to `SDG_Goal`), `indicators` (one-to-many to `SDG_Indicator`), `practice_links` (to `PracticeToTargetLink`), `concern_links` (to `ConcernToTargetLink`).

In [10]:
class SDG_Target(Base):
    __tablename__ = 'sdg_target'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "1.1"
    description = Column(Text, nullable=False)
    parent_goal_id = Column(String, ForeignKey('sdg_goal.id'))
    
    # Relationships
    # Many-to-one relationship back to its parent goal
    goal = relationship("SDG_Goal", back_populates="targets")
    # One-to-many relationship to its indicators
    indicators = relationship("SDG_Indicator", back_populates="target")
    # Link to association objects for many-to-many relationships
    practice_links = relationship("PracticeToTargetLink", back_populates="target")
    concern_links = relationship("ConcernToTargetLink", back_populates="target")

    def __repr__(self):
        return f"<SDG_Target(id='{self.id}')>"

### <a id='toc1_2_8_'></a>[TODO 2.8: `SDG_Indicator` Class](#toc0_)
**Your Task:** Define the class for SDG Indicators.
- **Table Name:** `sdg_indicator`
- **Columns:** `id`, `description` (Text), `code` (String), `parent_target_id` (ForeignKey to `sdg_target.id`).
- **Relationships:** `target` (many-to-one to `SDG_Target`).

In [11]:
class SDG_Indicator(Base):
    __tablename__ = 'sdg_indicator'
    
    # Columns
    id = Column(String, primary_key=True) # E.g., "1.1.1"
    description = Column(Text, nullable=False)
    code = Column(String)
    parent_target_id = Column(String, ForeignKey('sdg_target.id'))
    
    # Relationships
    # Many-to-one relationship back to its parent target
    target = relationship("SDG_Target", back_populates="indicators")

    def __repr__(self):
        return f"<SDG_Indicator(id='{self.id}')>"

---
## <a id='toc1_3_'></a>[🔗 Part 3: Link Table ORM Classes (The Relationships)](#toc0_)

This part defines the "association objects" that manage the many-to-many relationships in our schema. They hold the foreign keys and any extra data about the link itself.

### <a id='toc1_3_1_'></a>[TODO 3.1: `PracticeToTargetLink` Association Object](#toc0_)
**Your Task:** Define the link between Practices and SDG Targets.
- **Table Name:** `practice_to_target_link`
- **Columns:** `practice_id` (ForeignKey to `practice.id`, primary_key=True), `target_id` (ForeignKey to `sdg_target.id`, primary_key=True), `relevance_weight` (`LevelEnum`), `is_direct` (Boolean), and other attributes from the schema (`evidence`, `math_model`, `last_updated`).
- **Relationships:** `practice` (back-populates `Practice.target_links`), `target` (back-populates `SDG_Target.practice_links`).

In [12]:
# Import the current time function for the last_updated default
from datetime import datetime, timezone

class PracticeToTargetLink(Base):
    __tablename__ = 'practice_to_target_link'
    
    # Composite primary key made of two foreign keys
    practice_id = Column(String, ForeignKey('practice.id'), primary_key=True)
    target_id = Column(String, ForeignKey('sdg_target.id'), primary_key=True)
    
    # Attributes specific to the link
    relevance_weight = Column(SQLAlchemyEnum(LevelEnum), nullable=False)
    is_direct = Column(Boolean, nullable=False)
    evidence = Column(Text)
    math_model = Column(Text)
    last_updated = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    
    # Relationships to navigate back to the parent objects
    # These connect to the 'target_links' attribute in the Practice class
    # and the 'practice_links' attribute in the SDG_Target class.
    practice = relationship("Practice", back_populates="target_links")
    target = relationship("SDG_Target", back_populates="practice_links")

    def __repr__(self):
        return f"<PracticeToTargetLink practice='{self.practice_id}' target='{self.target_id}'>"
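
To show how an association object is used in practice, here is a standalone sketch with hypothetical, cut-down `Demo*` stand-ins for `Practice`, `SDG_Target`, and `PracticeToTargetLink`. The sample ids are illustrative only, and the snippet assumes SQLAlchemy 1.4 or later with an in-memory SQLite database.

```python
from sqlalchemy import Boolean, Column, ForeignKey, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

DemoBase = declarative_base()  # hypothetical, separate from the notebook's Base


class DemoPractice(DemoBase):
    __tablename__ = 'demo_practice'
    id = Column(String, primary_key=True)
    target_links = relationship('DemoLink', back_populates='practice')


class DemoTarget(DemoBase):
    __tablename__ = 'demo_target'
    id = Column(String, primary_key=True)
    practice_links = relationship('DemoLink', back_populates='target')


class DemoLink(DemoBase):
    __tablename__ = 'demo_link'
    practice_id = Column(String, ForeignKey('demo_practice.id'), primary_key=True)
    target_id = Column(String, ForeignKey('demo_target.id'), primary_key=True)
    is_direct = Column(Boolean, nullable=False)  # data that lives on the link
    practice = relationship('DemoPractice', back_populates='target_links')
    target = relationship('DemoTarget', back_populates='practice_links')


engine = create_engine('sqlite:///:memory:')
DemoBase.metadata.create_all(engine)

with Session(engine) as session:
    practice = DemoPractice(id='p1')
    target = DemoTarget(id='6.3')

    # The link is created explicitly so it can carry its own attributes.
    # Adding it also saves both endpoints through the default cascade.
    session.add(DemoLink(practice=practice, target=target, is_direct=True))
    session.commit()

    # Many-to-many traversal takes two hops: practice -> link -> target.
    reached = [(link.target.id, link.is_direct) for link in practice.target_links]

print(reached)
```

The extra hop is the price of keeping attributes such as `is_direct` on the link; a plain `secondary` table would hide the hop but could not hold that data.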

### <a id='toc1_3_2_'></a>[TODO 3.2: `StakeholderToConcernLink` Association Object](#toc0_)
**Your Task:** Define the link between Stakeholders and their Concerns.
- **Table Name:** `stakeholder_to_concern_link`
- **Columns:** `stakeholder_id` (ForeignKey, primary_key=True), `concern_id` (ForeignKey, primary_key=True), `priority_weight` (`LevelEnum`), and other attributes.
- **Relationships:** `stakeholder` (back-populates `Stakeholder.concern_links`), `concern` (back-populates `Concern.sh_links`).

In [13]:
class StakeholderToConcernLink(Base):
    __tablename__ = 'stakeholder_to_concern_link'
    
    # Composite primary key
    stakeholder_id = Column(String, ForeignKey('stakeholder.id'), primary_key=True)
    concern_id = Column(String, ForeignKey('concern.id'), primary_key=True)
    
    # Attributes specific to the link
    priority_weight = Column(SQLAlchemyEnum(LevelEnum), nullable=False)
    evidence = Column(Text)
    
    # Relationships for back-population
    stakeholder = relationship("Stakeholder", back_populates="concern_links")
    concern = relationship("Concern", back_populates="sh_links")

    def __repr__(self):
        return f"<StakeholderToConcernLink stakeholder='{self.stakeholder_id}' concern='{self.concern_id}'>"


### <a id='toc1_3_3_'></a>[TODO 3.3: `ConcernToTargetLink` Association Object](#toc0_)
**Your Task:** Define the link between stakeholder Concerns and SDG Targets.
- **Table Name:** `concern_to_target_link`
- **Columns:** `concern_id` (ForeignKey, primary_key=True), `target_id` (ForeignKey, primary_key=True).
- **Relationships:** `concern` (back-populates `Concern.target_links`), `target` (back-populates `SDG_Target.concern_links`).

In [14]:
class ConcernToTargetLink(Base):
    __tablename__ = 'concern_to_target_link'
    
    # Composite primary key
    concern_id = Column(String, ForeignKey('concern.id'), primary_key=True)
    target_id = Column(String, ForeignKey('sdg_target.id'), primary_key=True)
    
    # Relationships for back-population
    concern = relationship("Concern", back_populates="target_links")
    target = relationship("SDG_Target", back_populates="concern_links")

    def __repr__(self):
        return f"<ConcernToTargetLink concern='{self.concern_id}' target='{self.target_id}'>"

### <a id='toc1_3_4_'></a>[TODO 3.4: `SDObjectiveToSDGLink` Association Object](#toc0_)
**Your Task:** Define the link between the high-level SD Objectives and SDG Goals.
- **Table Name:** `sd_objective_to_sdg_link`
- **Columns:** `sd_objective_id` (ForeignKey, primary_key=True), `sdg_goal_id` (ForeignKey, primary_key=True), `weight` (`LevelEnum`), `comment` (Text).
- **Note:** This table requires careful relationship setup. Since `SD_Objective` and `SDG_Goal` are already linked directly, we need to decide if this table is for additional properties on that link. For now, just define the table.

In [15]:
class SDObjectiveToSDGLink(Base):
    __tablename__ = 'sd_objective_to_sdg_link'

    # Composite primary key
    sd_objective_id = Column(String, ForeignKey('sd_objectives.id'), primary_key=True)
    sdg_goal_id = Column(String, ForeignKey('sdg_goal.id'), primary_key=True)

    # Attributes specific to the link
    # We use our 5-level enum here for consistency
    weight = Column(SQLAlchemyEnum(LevelEnum), nullable=False)
    comment = Column(Text)
    
    # Note: we are defining this table to store properties about the link itself. 
    # We are omitting the direct relationship() attributes here to avoid conflicting 
    # with the simpler one-to-many relationship already defined between
    # SD_Objective and SDG_Goal. This table acts as an "enrichment" layer.

    def __repr__(self):
        return f"<SDObjectiveToSDGLink objective='{self.sd_objective_id}' goal='{self.sdg_goal_id}'>"

---
## <a id='toc1_4_'></a>[Part 4: Pydantic Schemas for Data Validation](#toc0_)

This part is for defining Pydantic models. These will be used later (in our API and seeding scripts) to validate that the data we are handling matches the expected structure and types. It's a crucial step for ensuring data quality.

**First, import Pydantic and other necessary types.**

In [17]:
from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime

# Import the LevelEnum from your models file so Pydantic can use it for validation
from models import LevelEnum

### <a id='toc1_4'></a>[TODO 4: Base, Create, Read Schemas](#toc0_)
**Your Tasks:** 
- Create base Pydantic models for the main entities (e.g., `PracticeBase`, `StakeholderBase`). These should contain the fields that are common to both creating and reading data (e.g., `name`, `description`).
- Create schemas for data creation (e.g., `PracticeCreate`). These models should inherit from your base schemas and include any other fields required when creating a new record. They are used to validate incoming data.
- Create schemas for reading data (e.g., `PracticeRead`). These also inherit from the base schemas and add fields that only exist once a record is stored, such as the database-generated `last_updated` timestamp or nested link lists. **Crucially, they must include an inner class `Config` with `from_attributes = True`.** This allows Pydantic to read data directly from your SQLAlchemy ORM objects.
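
As a sketch of the first two roles above (base and create schemas), the standalone snippet below validates incoming data against a hypothetical, cut-down `DemoPracticeCreate`. The `Demo*` names and the sample rows are illustrative and not part of the exercise.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError


class DemoLevel(Enum):
    HIGH = 'H'
    LOW = 'L'


class DemoPracticeBase(BaseModel):
    id: str
    name: str
    capital_intensity: Optional[DemoLevel] = None


class DemoPracticeCreate(DemoPracticeBase):
    """Inherits every base field; a place to add create-only fields later."""


# Incoming data (e.g. a parsed spreadsheet row) is validated and coerced:
# the enum field accepts the member value 'H' and stores the member itself.
created = DemoPracticeCreate(id='p1', name='Water recycling', capital_intensity='H')

# Invalid input raises ValidationError before anything reaches the database.
try:
    DemoPracticeCreate(id='p2', name='Tailings reuse', capital_intensity='Extreme')
    rejected = False
except ValidationError as exc:
    rejected = True
    bad_fields = [error['loc'][0] for error in exc.errors()]

print(created.capital_intensity, rejected)
```

Pydantic matches enum fields by member value, so the spreadsheet codes can be passed in directly.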

In [None]:
from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime

# Assuming your models.py file is in the same directory
# and contains the LevelEnum definition.
from models import LevelEnum

# ===================================================================
# BASE & CREATE SCHEMAS
# These are used for data input and validation.
# ===================================================================

# --- Node Schemas ---

class SD_ObjectiveBase(BaseModel):
    id: str

class SDG_GoalBase(BaseModel):
    id: str
    name: str
    parent_objective_id: str

class SDG_TargetBase(BaseModel):
    id: str
    description: str
    parent_goal_id: str

class SDG_IndicatorBase(BaseModel):
    id: str
    description: str
    code: Optional[str] = None
    parent_target_id: str

class PracticeBase(BaseModel):
    id: str
    name: str
    category: Optional[str] = None
    description: Optional[str] = None
    major_actions_involved: Optional[str] = None
    remark: Optional[str] = None
    evidence_source: Optional[str] = None
    capital_intensity: Optional[LevelEnum] = None
    technical_complexity: Optional[LevelEnum] = None
    operational_disruption: Optional[LevelEnum] = None
    long_term_liability: Optional[bool] = None

class Stakeholder_GroupBase(BaseModel):
    id: str
    name: str

class StakeholderBase(BaseModel):
    id: str
    name: str
    category_id: str
    definition: Optional[str] = None
    description: Optional[str] = None

class ConcernBase(BaseModel):
    id: str
    name: str
    description: Optional[str] = None

# --- Link Schemas ---

class PracticeToTargetLinkBase(BaseModel):
    practice_id: str
    target_id: str
    relevance_weight: LevelEnum
    is_direct: bool
    evidence: Optional[str] = None
    math_model: Optional[str] = None

class StakeholderToConcernLinkBase(BaseModel):
    stakeholder_id: str
    concern_id: str
    priority_weight: LevelEnum
    evidence: Optional[str] = None

class ConcernToTargetLinkBase(BaseModel):
    concern_id: str
    target_id: str

class SDObjectiveToSDGLinkBase(BaseModel):
    sd_objective_id: str
    sdg_goal_id: str
    weight: LevelEnum
    comment: Optional[str] = None


# The 'Create' schemas can often be identical to the 'Base' schemas.
# We define them for clarity and future extensibility.
SD_ObjectiveCreate = SD_ObjectiveBase
SDG_GoalCreate = SDG_GoalBase
SDG_TargetCreate = SDG_TargetBase
SDG_IndicatorCreate = SDG_IndicatorBase
PracticeCreate = PracticeBase
Stakeholder_GroupCreate = Stakeholder_GroupBase
StakeholderCreate = StakeholderBase
ConcernCreate = ConcernBase
PracticeToTargetLinkCreate = PracticeToTargetLinkBase
StakeholderToConcernLinkCreate = StakeholderToConcernLinkBase
ConcernToTargetLinkCreate = ConcernToTargetLinkBase
SDObjectiveToSDGLinkCreate = SDObjectiveToSDGLinkBase


# ===================================================================
# READ SCHEMAS
# These are used for data output and include relationships.
# They have from_attributes = True to read from SQLAlchemy models.
# ===================================================================

# --- Link Read Schemas (no nesting needed) ---

class PracticeToTargetLinkRead(PracticeToTargetLinkBase):
    last_updated: datetime
    class Config:
        from_attributes = True

class StakeholderToConcernLinkRead(StakeholderToConcernLinkBase):
    class Config:
        from_attributes = True

class ConcernToTargetLinkRead(ConcernToTargetLinkBase):
    class Config:
        from_attributes = True

class SDObjectiveToSDGLinkRead(SDObjectiveToSDGLinkBase):
    class Config:
        from_attributes = True

# --- Node Read Schemas (with nested relationships) ---

class PracticeRead(PracticeBase):
    target_links: List[PracticeToTargetLinkRead] = []
    class Config:
        from_attributes = True

class ConcernRead(ConcernBase):
    sh_links: List[StakeholderToConcernLinkRead] = []
    target_links: List[ConcernToTargetLinkRead] = []
    class Config:
        from_attributes = True

class StakeholderRead(StakeholderBase):
    concern_links: List[StakeholderToConcernLinkRead] = []
    class Config:
        from_attributes = True

class SDG_TargetRead(SDG_TargetBase):
    practice_links: List[PracticeToTargetLinkRead] = []
    concern_links: List[ConcernToTargetLinkRead] = []
    # indicators: List['SDG_IndicatorRead'] = [] # Example for further nesting
    class Config:
        from_attributes = True

# --- Schemas for remaining nodes (no relationships defined in this example) ---

class SDG_IndicatorRead(SDG_IndicatorBase):
    class Config:
        from_attributes = True

class Stakeholder_GroupRead(Stakeholder_GroupBase):
    # stakeholders: List[StakeholderRead] = [] # Example for further nesting
    class Config:
        from_attributes = True

class SDG_GoalRead(SDG_GoalBase):
    # targets: List[SDG_TargetRead] = [] # Example for further nesting
    class Config:
        from_attributes = True

class SD_ObjectiveRead(SD_ObjectiveBase):
    # sdg_goals: List[SDG_GoalRead] = [] # Example for further nesting
    class Config:
        from_attributes = True
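
To illustrate what `from_attributes = True` enables, the standalone sketch below converts attribute-bearing objects into a nested read schema. It assumes Pydantic v2, where `model_config = ConfigDict(from_attributes=True)` is the current spelling of the inner `Config` class used above. `FakePractice` and `FakeLink` are hypothetical plain-Python stand-ins for ORM rows, so the snippet runs without a database.

```python
from typing import List

from pydantic import BaseModel, ConfigDict


class DemoLinkRead(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    target_id: str
    is_direct: bool


class DemoPracticeRead(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: str
    name: str
    target_links: List[DemoLinkRead] = []


class FakeLink:
    """Hypothetical stand-in for a PracticeToTargetLink ORM row."""
    def __init__(self, target_id, is_direct):
        self.target_id = target_id
        self.is_direct = is_direct


class FakePractice:
    """Hypothetical stand-in for a Practice ORM row."""
    def __init__(self, id, name, target_links):
        self.id = id
        self.name = name
        self.target_links = target_links


orm_like = FakePractice('p1', 'Water recycling', [FakeLink('6.3', True)])

# model_validate reads attributes (not dict keys) because from_attributes=True,
# and applies the same rule to the nested link objects.
payload = DemoPracticeRead.model_validate(orm_like).model_dump()
print(payload)
```

With real SQLAlchemy objects the call is the same: pass the ORM instance to `model_validate` while its relationships are still loadable.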

---
## <a id='toc1_5_'></a>[🎉 Congratulations!](#toc0_)

Once you've filled out all the code blocks, you will have a complete, self-contained script that defines the entire database schema and the validation models for your project.

**Next Step:** In our next session, we will use the classes you've defined here to create the physical `mining_knowledge.db` file and populate it with sample data.