<div align="center">

# Getting Started with Patra Model Card Toolkit

</div>

The **Patra Toolkit** simplifies creating and documenting AI/ML models through a structured schema, encouraging best practices and enhanced transparency. It captures essential metadata (model purpose, development process, performance metrics, fairness, and explainability analyses) and packages it into **Model Cards** that can be integrated into the [Patra Knowledge Base](https://github.com/Data-to-Insight-Center/patra-kg).

## Features
- **Structured Schema** – Helps provide critical model information, including usage, development, and performance.
- **Semi-Automated Descriptive Fields** – Automated scanners capture fairness, explainability, and environment dependencies:
  - *Fairness Scanner* – Evaluates predictions across different groups.  
  - *Explainability Scanner* – Provides interpretability metrics.  
  - *Model Requirements Scanner* – Records Python packages and versions.
- **Validation and JSON Generation** – Ensures completeness and correctness before generating the Model Card as JSON.
- **Backend Storage Support** – Pluggable model store backends enable uploading and retrieving models/artifacts from:
  - *Hugging Face* – Integrates with Hugging Face Hub for model storage.  
  - *GitHub* – Leverages GitHub repositories to store serialized models.  
- **Integration with Patra Knowledge Base** – Model Cards created with the Patra Toolkit are designed to be added to the [Patra Knowledge Base](https://github.com/Data-to-Insight-Center/patra-kg), a graph database that stores and manages these cards.

By automating parts of the documentation process and providing a structured schema, the Patra Toolkit lowers the barrier to producing comprehensive, high-quality Model Cards, which in turn supports transparency and accountability in AI/ML development.

---

This notebook demonstrates:

1. **Loading & Preprocessing** the UCI Adult Dataset  
2. **Training** a simple TensorFlow model  
3. **Creating a Model Card** with optional Fairness and XAI scans  
4. **Submitting** the Model Card (and optionally the model, inference label, and artifacts) to:
   - **Patra server** (for model card storage)  
   - **Backend** (Hugging Face or GitHub) for model storage

---

## 1. Environment Setup

In [20]:
!pip install patra_toolkit


In [None]:
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("absl").setLevel(logging.ERROR)
logging.getLogger("huggingface_hub").setLevel(logging.ERROR)
logging.getLogger("PyGithub").setLevel(logging.ERROR)

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import requests, io

from patra_toolkit import ModelCard, AIModel

## 2. Load and Pre-process the Data

We'll use the **UCI Adult Dataset**, which predicts whether an individual's income is above or below $50K based on demographics. This dataset is a common benchmark for exploring model fairness.

In [2]:
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
resp = requests.get(data_url)
resp.raise_for_status()

cols = [
    "age","workclass","fnlwgt","education","education_num",
    "marital_status","occupation","relationship","race",
    "sex","capital_gain","capital_loss","hours_per_week",
    "native_country","income"
]
df = pd.read_csv(io.StringIO(resp.text), names=cols, header=None)

# Encode target
df["income"] = LabelEncoder().fit_transform(df["income"])  # 1 if >50K, else 0

# One-hot encode the categorical columns; numeric columns (including the
# already-encoded target) pass through unchanged
df = pd.get_dummies(df, drop_first=True, dtype=float)

# Split into features/labels
X = df.drop("income", axis=1).astype("float32").values
y = df["income"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)

Train shape: (26048, 100) Test shape: (6513, 100)
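Because `X` is a plain NumPy array, column names are no longer attached to it, so individual features have to be addressed by position. A minimal sketch of a lookup helper follows; `find_feature_index` and the toy column list are hypothetical names for illustration and are not part of Patra Toolkit.

```python
def find_feature_index(columns, prefix):
    """Return the position of the single column whose name starts with `prefix`."""
    matches = [i for i, name in enumerate(columns) if name.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column starting with {prefix!r}, found {len(matches)}"
        )
    return matches[0]


# Toy stand-in for df.drop("income", axis=1).columns.tolist()
toy_columns = ["age", "hours_per_week", "race_ White", "sex_ Male", "native_country_ Cuba"]
sex_idx = find_feature_index(toy_columns, "sex_")
print(sex_idx)  # position of the one-hot "sex" column in the toy list
```

Applied to the real feature columns of this notebook, the same call could replace a hard-coded column index, which would otherwise silently point at the wrong feature if the encoding changes.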


## 3. Train a Simple TensorFlow Model

Below is a straightforward neural network: two hidden layers plus a final sigmoid for binary classification. We'll train for a few epochs to demonstrate end-to-end usage.

In [3]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=1)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")

Epoch 1/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 1s 963us/step - accuracy: 0.6729 - loss: 903.4367
Epoch 2/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 934us/step - accuracy: 0.6763 - loss: 218.0626
Epoch 3/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6886 - loss: 162.0765
Epoch 4/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.6828 - loss: 194.1631
Epoch 5/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.6830 - loss: 176.9442
Test Loss: 258.3792, Test Accuracy: 0.2412
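The loss values in the log above are very large for binary cross-entropy, which can happen when wide-ranging raw columns (for example `fnlwgt` or `capital_gain`) are fed to the network unscaled. One common remedy, not used in the original notebook, is to standardize features with statistics computed on the training split only. The following is a minimal NumPy sketch on toy data; the `standardize` helper is a hypothetical name.

```python
import numpy as np


def standardize(train, test, eps=1e-8):
    """Scale both splits with the mean and std of the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # avoid division by zero for constant columns
    return (train - mean) / std, (test - mean) / std


rng = np.random.default_rng(0)
# Toy data: one small-range column and one large-range column
toy_train = np.column_stack([rng.normal(40, 10, 200), rng.normal(190_000, 100_000, 200)])
toy_test = np.column_stack([rng.normal(40, 10, 50), rng.normal(190_000, 100_000, 50)])

train_scaled, test_scaled = standardize(toy_train, toy_test)
print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
```

Using training statistics for both splits keeps information from the test set out of preprocessing. Whether this changes the results of this particular notebook would need to be checked by rerunning the training cell.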


## 4. Building a Patra Model Card

### 4.1 Basic Model Card Setup
We start with essential metadata like name, version, short description, and so on.  

In [4]:
mc = ModelCard(
    name="UCI_Adult_Model",
    version="1.0",
    short_description="Predicting whether an individual's income is above $50K using TensorFlow.",
    full_description=(
        "This is a feed-forward neural network trained on the UCI Adult Dataset. "
        "It demonstrates how Patra Toolkit can store model details, fairness scans, "
        "and basic explainability data in a comprehensive Model Card."
    ),
    keywords="uci, adult, patra, fairness, xai, tensorflow",
    author="neelk",
    input_type="Tabular",
    category="classification",
    citation="Becker, B. & Kohavi, R. (1996). Adult [Dataset]. UCI."
)

### 4.2 Attach AI Model Information
Here we describe the model's ownership, license, performance metrics, etc.

In [5]:
ai_model = AIModel(
    name="AdultTFModel",
    version="1.0",
    description="DNN on UCI Adult dataset for income prediction",
    owner="username",
    location="", 
    license="BSD-3-Clause",
    framework="tensorflow",
    model_type="dnn",
    test_accuracy=accuracy
)

# Add additional performance or training metrics
ai_model.add_metric("Epochs", 5)
ai_model.add_metric("BatchSize", 64)
ai_model.add_metric("Optimizer", "Adam")

mc.ai_model = ai_model

## 5. Fairness & Explainability

### 5.1 Bias (Fairness) Analysis
Patra Toolkit has a built-in `populate_bias` method to measure metrics like **demographic parity** or **equalized odds**. We'll focus on the protected attribute "sex" in the data.

**Why check bias?** Real-world models often inadvertently penalize certain groups. By calling `mc.populate_bias(...)`, you get a quick sense of whether the model is systematically advantaging or disadvantaging certain subpopulations.

In [6]:
y_pred = model.predict(X_test)
y_pred = (y_pred >= 0.5).flatten()

mc.populate_bias(
    X_test,
    y_test,
    y_pred,
    "gender",           # Display name for the protected attribute in the report
    X_test[:, 58],      # Column 58 is the one-hot encoded "sex" feature
    model
)

print("Bias Analysis:\n", mc.bias_analysis)


[1m204/204[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 431us/step
Bias Analysis:
 {'demographic_parity_diff': np.float64(0.0), 'equal_odds_difference': 0.0}
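To make the first metric concrete: demographic parity difference is the gap between the highest and lowest rate of positive predictions across groups. The sketch below recomputes it by hand on toy arrays. It illustrates the definition only and is not Patra's or Fairlearn's implementation; `selection_rates` and `demographic_parity_difference` are local helper names.

```python
import numpy as np


def selection_rates(y_pred, groups):
    """Fraction of positive predictions within each group."""
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    return {g.item(): float(y_pred[groups == g].mean()) for g in np.unique(groups)}


def demographic_parity_difference(y_pred, groups):
    """Largest minus smallest group-level selection rate."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Predictions that favour group 1: selection rates are 0.25 and 0.75
skewed = np.array([1, 0, 0, 0, 1, 1, 1, 0])
dpd_skewed = demographic_parity_difference(skewed, groups)
print(dpd_skewed)

# A model that predicts the same class for everyone has equal rates by construction
constant = np.ones(8, dtype=int)
dpd_constant = demographic_parity_difference(constant, groups)
print(dpd_constant)
```

A difference of 0.0 therefore does not by itself show that a model is fair: it is also what a model yields when it predicts one class for every row, so it is worth inspecting the distribution of `y_pred` alongside the metric.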


### 5.2 Explainability (XAI)

If we want to understand model decisions, we can generate interpretability metrics (like feature importance) using Patra’s internal SHAP-based approach.

In [7]:
# Rebuild the list of columns used in training
x_columns = df.columns.tolist()
x_columns.remove('income')  # Remove the target

mc.populate_xai(
    X_test[:10],
    x_columns,
    model
)

print("Explainability Analysis:\n", mc.xai_analysis)

Explainability Analysis:
 {'age': 0.0, 'native_country__Cuba': 0.0, 'native_country__Holand_Netherlands': 0.0, 'native_country__Haiti': 0.0, 'native_country__Guatemala': 0.0, 'native_country__Greece': 0.0, 'native_country__Germany': 0.0, 'native_country__France': 0.0, 'native_country__England': 0.0, 'native_country__El_Salvador': 0.0}
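How Patra aggregates SHAP values internally is not shown in this notebook. A common convention for turning per-sample attributions into one importance score per feature is the mean absolute attribution, sketched below on a made-up attribution matrix. The numbers and the `rank_features` helper are illustrative assumptions.

```python
import numpy as np


def rank_features(attributions, feature_names, top_k=3):
    """Rank features by mean absolute attribution across samples."""
    importance = np.abs(np.asarray(attributions)).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]


# Rows are samples, columns are features; values stand in for SHAP attributions
toy_attributions = np.array([
    [0.20, -0.05, 0.00],
    [-0.40, 0.10, 0.00],
    [0.30, -0.15, 0.00],
])
toy_names = ["age", "hours_per_week", "native_country_ Cuba"]

ranked = rank_features(toy_attributions, toy_names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Under this convention, all-zero scores would be expected when the model's output is effectively constant across the explained rows, so no feature shifts the prediction.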


## 6. Add Requirements and Validate
We let Patra auto-detect Python package dependencies to ensure reproducibility and then validate the card for completeness.

In [8]:
mc.populate_requirements()
if mc.validate():
    print("Model Card is valid and ready to submit!")
else:
    print("Validation failed. See logs for details.")

INFO:root:Model card validated successfully.


Model Card is valid and ready to submit!
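Conceptually, validation checks that the card contains the fields the schema requires before it is serialized to JSON. The sketch below illustrates the idea with the standard library only; the field list and the `missing_fields` helper are assumptions for illustration and do not reproduce Patra's actual schema or validator.

```python
import json

# Hypothetical subset of required fields, borrowed from the ModelCard arguments used earlier
REQUIRED_FIELDS = ("name", "version", "short_description", "author", "input_type", "category")


def missing_fields(card):
    """Return required fields that are absent or empty in a card dictionary."""
    return [field for field in REQUIRED_FIELDS if not card.get(field)]


toy_card = {
    "name": "UCI_Adult_Model",
    "version": "1.0",
    "short_description": "Income prediction on the UCI Adult Dataset.",
    "author": "neelk",
    "input_type": "Tabular",
    "category": "",  # deliberately left empty to trigger the check
}

problems = missing_fields(toy_card)
if problems:
    print("Missing:", ", ".join(problems))
else:
    print(json.dumps(toy_card, indent=2))
```

Checking before serialization means an incomplete card is caught locally rather than after it has been sent to a server.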


## 7. Submission Options

The `mc.submit(...)` method can do one or more of the following:
1. **Submit only the card** (no model, no artifacts).
2. **Include the trained model** (uploading to Hugging Face or GitHub).
3. **Add artifacts** (like data files, inference labels, or any additional resources).

Below, we demonstrate multiple usage patterns.

### 7.1 Submit **Only** the Model Card

This call uploads no model, inference label, or artifacts; only the card is posted to your Patra server for cataloging.

In [9]:
mc.submit(patra_server_url="http://127.0.0.1:5002")

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: neelk-uci_adult_model-1.0
INFO:root:Model Card submitted successfully.


'success'

### 7.2 Submit Model Card and Model

We can specify `"huggingface"` or `"github"` for `model_store`. This will attempt to upload our trained model, while the card is posted to the Patra server.

In [10]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002",  
    model=model,                
    file_format="h5",
    model_store="huggingface"
)

INFO:root:Model card validated successfully.
ERROR:root:Model submission failed during model ID creation: Model ID already exists. Please update your model version or choose a new name.
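The error above means the ID derived from the card already exists on the server. Judging from the log lines in this notebook, IDs appear to follow an `author-name-version` pattern with the name lower-cased; that pattern is inferred from the output, not taken from Patra's documentation. Under that assumption, a small hypothetical helper can bump the version string before resubmitting:

```python
def bump_minor(version):
    """Increment the last numeric component of a dotted version string."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def guess_model_id(author, name, version):
    """ID format inferred from the notebook's log output; an assumption, not a Patra API."""
    return f"{author}-{name.lower()}-{version}"


version = "1.0"
first_id = guess_model_id("neelk", "UCI_Adult_Model", version)
print(first_id)

version = bump_minor(version)
next_id = guess_model_id("neelk", "UCI_Adult_Model", version)
print(next_id)
```

Assigning the bumped value to `mc.version` before calling `mc.submit(...)` again yields a fresh ID.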


In [11]:
mc.version = "1.1"
mc.submit(
    patra_server_url="http://127.0.0.1:5002",  
    model=model,                
    file_format="h5",
    model_store="huggingface"
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: neelk-uci_adult_model-1.1
ERROR:root:Model submission failed during model serialization: module 'torch' has no attribute 'nn'


### 7.3 Submit Artifacts

In [13]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002", 
    model_store="huggingface",
    artifacts=["data/adult/adult.data", 
               "data/adult/adult.names",
               "data/adult/adult.names"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: unknown-id
No files have been modified since last commit. Skipping to prevent empty commit.
INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://huggingface.co/patra-iu/unknown-id/blob/main/adult.data
No files have been modified since last commit. Skipping to prevent empty commit.
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/unknown-id/blob/main/adult.names
No files have been modified since last commit. Skipping to prevent empty commit.
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/unknown-id/blob/main/adult.names
INFO:root:Model Card submitted successfully.


'success'
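The artifact list in the cell above names `adult.names` twice, so the same file is submitted twice. A minimal sketch for de-duplicating a path list while preserving order, and for flagging files that do not exist locally, is shown below; `prepare_artifacts` is a hypothetical helper, not part of Patra Toolkit.

```python
from pathlib import Path


def prepare_artifacts(paths):
    """Drop duplicate paths (keeping first occurrences) and split them by existence."""
    unique = list(dict.fromkeys(paths))  # dicts preserve insertion order
    found = [p for p in unique if Path(p).is_file()]
    missing = [p for p in unique if not Path(p).is_file()]
    return found, missing


requested = [
    "data/adult/adult.data",
    "data/adult/adult.names",
    "data/adult/adult.names",
]
found, missing = prepare_artifacts(requested)
print("to upload:", found)
print("not found:", missing)
```

Passing only the `found` list as `artifacts=` avoids redundant uploads and surfaces path typos before any network call is made.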

### 7.4 Submit Model Card, Model, and Artifacts

This scenario might include a special label file plus multiple dataset artifacts.

In [14]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002", 
    model=model,
    file_format="h5",
    model_store="huggingface",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", 
               "data/adult/adult.names",
               "data/adult/adult.names"]
)

INFO:root:Model card validated successfully.
ERROR:root:Model submission failed during model ID creation: Model ID already exists. Please update your model version or choose a new name.


In [15]:
mc.version = "1.2"
mc.submit(
    patra_server_url="http://127.0.0.1:5002", 
    model=model,
    file_format="h5",
    model_store="huggingface",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", 
               "data/adult/adult.names",
               "data/adult/adult.names"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: neelk-uci_adult_model-1.2
INFO:root:Model serialized successfully.
neelk-uci_adult_model-1.2.h5: 100%|██████████| 130k/130k [00:00<00:00, 1.56MB/s]
INFO:root:Model uploaded at: https://huggingface.co/patra-iu/neelk-uci_adult_model-1.2/blob/main/neelk-uci_adult_model-1.2.h5
INFO:root:Inference label uploaded at: https://huggingface.co/patra-iu/neelk-uci_adult_model-1.2/blob/main/labels.txt
INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://huggingface.co/patra-iu/neelk-uci_adult_model-1.2/blob/main/adult.data
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/neelk-uci_adult_model-1.2/blob/main/adult.names
No files have been modified since last commit. Skipping to prevent empty commit.
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/neelk-uci_adult_model-1.2/blob/main/adult.names
INFO:root:Model Card submitted successfully

'success'

### 7.5 Pushing to GitHub

By switching `"huggingface"` to `"github"`, you can store your model in a GitHub repo.

In [17]:
mc.version = "1.3"
mc.submit(
    patra_server_url="http://127.0.0.1:5002", 
    model=model,
    file_format="h5",
    model_store="github",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", 
               "data/adult/adult.names",
               "data/adult/adult.names"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: neelk-uci_adult_model-1.3
INFO:root:Model serialized successfully.


Repository 'neelk-uci_adult_model-1.3' created successfully.
Initialized empty Git repository in /private/var/folders/d7/zwq9fkgs65xdfbrv7v00g8dc0000gn/T/neelk-uci_adult_model-1.36xwscxda/.git/


To https://github.com/nee1k/neelk-uci_adult_model-1.3.git
 * [new branch]      main -> main
INFO:root:Model uploaded at: https://github.com/nee1k/neelk-uci_adult_model-1.3/blob/main/neelk-uci_adult_model-1.3.h5


Repository 'neelk-uci_adult_model-1.3' already exists. Using existing repository.


INFO:root:Inference label uploaded at: https://github.com/nee1k/neelk-uci_adult_model-1.3/blob/main/labels.txt


Repository 'neelk-uci_adult_model-1.3' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://github.com/nee1k/neelk-uci_adult_model-1.3/blob/main/adult.data


Repository 'neelk-uci_adult_model-1.3' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://github.com/nee1k/neelk-uci_adult_model-1.3/blob/main/adult.names


Repository 'neelk-uci_adult_model-1.3' already exists. Using existing repository.
No changes to commit, skipping commit step.


INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://github.com/nee1k/neelk-uci_adult_model-1.3/blob/main/adult.names
INFO:root:Model Card submitted successfully.


'success'

By following this notebook, you have:

1. Loaded and preprocessed the UCI Adult Dataset
2. Trained a TensorFlow model to predict income
3. Built a Patra Model Card describing the model’s purpose, performance, and environment
4. (Optionally) scanned for fairness and explainability metrics
5. Submitted the card to a Patra server, and the model and artifacts to a chosen store (Hugging Face or GitHub)
