<div align="center">

# Getting Started with Patra Model Card Toolkit

<div align="center">
    
[![Documentation Status](https://img.shields.io/badge/docs-latest-blue.svg)](https://patra-toolkit.readthedocs.io/en/latest/) [![Build Status](https://github.com/Data-to-Insight-Center/patra-toolkit/actions/workflows/ci.yml/badge.svg)](https://github.com/Data-to-Insight-Center/patra-toolkit/actions)  [![PyPI version](https://badge.fury.io/py/patra-toolkit.svg)](https://pypi.org/project/patra-toolkit/)  [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)  [![Example Notebook](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Data-to-Insight-Center/patra-toolkit/blob/main/examples/notebooks/GettingStarted.ipynb)

</div>

</div>

**Patra Toolkit** offers a structured, semi-automated way to create **Model Cards** that document AI/ML models. These cards:

- Capture essential model metadata: purpose, usage, performance.
- Include optional but **highly recommended** *Fairness* (bias) and *Explainability* (XAI) analyses.
- Support environment scanning for reproducibility.
- Can be stored in or retrieved from popular backends (Hugging Face, GitHub).

---

This notebook demonstrates:

1. **Loading & Preprocessing** the UCI Adult Dataset  
2. **Training** a simple TensorFlow model  
3. **Creating a Model Card** with optional Fairness and XAI scans  
4. **Submitting** the Model Card (and optionally the model, inference label, and artifacts) to:
   - **Patra server** (for model card storage)  
   - **Backend** (Hugging Face or GitHub) for model storage

---

In [20]:
# 1. ENVIRONMENT SETUP
!pip install patra_toolkit


In [21]:
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("absl").setLevel(logging.ERROR)
logging.getLogger("huggingface_hub").setLevel(logging.ERROR)

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import requests, io

from patra_toolkit import ModelCard, AIModel

## 2. Load and Pre-process the Data

We'll use the **UCI Adult Dataset**, a standard benchmark for predicting whether an individual's income is above or below $50K from demographic attributes. It is commonly used to explore model fairness.

In [22]:
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
resp = requests.get(data_url)
resp.raise_for_status()

cols = [
    "age","workclass","fnlwgt","education","education_num",
    "marital_status","occupation","relationship","race",
    "sex","capital_gain","capital_loss","hours_per_week",
    "native_country","income"
]
df = pd.read_csv(io.StringIO(resp.text), names=cols, header=None)

# Encode target
df["income"] = LabelEncoder().fit_transform(df["income"])  # 1 if >50K, else 0

# One-hot encode everything except the target
df = pd.get_dummies(df, drop_first=True, dtype=float)

# Split into features/labels
X = df.drop("income", axis=1).astype("float32").values
y = df["income"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)

Train shape: (26048, 100) Test shape: (6513, 100)
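
Two details of the preprocessing above are easy to miss. The raw CSV values carry a leading space, so the one-hot column is named `sex_ Male` rather than `sex_Male`. Both `LabelEncoder` and `get_dummies(drop_first=True)` also order categories by sorting them. The dependency-free sketch below mimics that behavior on a few hand-written values; the helper names are hypothetical, and it illustrates the ordering rules rather than the libraries' actual implementation.

```python
# Hypothetical helpers that imitate the ordering rules used above.

def label_encode(values):
    """Map each distinct value to its index in sorted order."""
    classes = sorted(set(values))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[v] for v in values], classes


def one_hot_columns(prefix, values, drop_first=True):
    """Return the dummy column names produced for one categorical column."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the first sorted level becomes the implicit baseline
    return [f"{prefix}_{level}" for level in levels]


# Values keep the leading space found in adult.data (fields are separated by ", ").
income = [" <=50K", " >50K", " <=50K"]
sex = [" Male", " Female", " Male"]

encoded, classes = label_encode(income)
print(classes)                      # [' <=50K', ' >50K']
print(encoded)                      # [0, 1, 0]
print(one_hot_columns("sex", sex))  # ['sex_ Male']
```

If the leading spaces are unwanted, `pd.read_csv(..., skipinitialspace=True)` strips them, but the column lookup in Section 5.1 would then need `sex_Male` instead of `sex_ Male`.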


## 3. Train a Simple TensorFlow Model

Below is a straightforward neural network: two hidden ReLU layers followed by a single sigmoid output unit for binary classification. We'll train for a few epochs to demonstrate end-to-end usage.

In [23]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=1)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")

Epoch 1/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 1s 465us/step - accuracy: 0.6495 - loss: 1371.9069
Epoch 2/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 583us/step - accuracy: 0.6836 - loss: 188.4090
Epoch 3/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 512us/step - accuracy: 0.6798 - loss: 185.3150
Epoch 4/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 531us/step - accuracy: 0.6836 - loss: 145.5680
Epoch 5/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 475us/step - accuracy: 0.6699 - loss: 185.8244
Test Loss: 142.9663, Test Accuracy: 0.7895
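
The numbers in the training log can be checked by hand. This minimal sketch assumes the shapes printed earlier (26,048 training rows, 100 features) and the layer sizes defined above. It recomputes the steps per epoch shown in the progress bar and the number of trainable parameters in the three `Dense` layers.

```python
import math

n_train, n_features = 26048, 100   # from the "Train shape" output above
batch_size = 64                    # as passed to model.fit

# Each epoch processes ceil(n_train / batch_size) batches.
steps_per_epoch = math.ceil(n_train / batch_size)


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


layer_sizes = [n_features, 64, 32, 1]
params_per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(steps_per_epoch)        # 407
print(params_per_layer)       # [6464, 2080, 33]
print(sum(params_per_layer))  # 8577
```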


## 4. Building a Patra Model Card

### 4.1 Basic Model Card Setup
We start with essential metadata like name, version, short description, and so on.  

In [24]:
mc = ModelCard(
    name="UCI_Adult_Model",
    version="1.0",
    short_description="Predicting whether an individual's income is above $50K using TensorFlow.",
    full_description=(
        "This is a feed-forward neural network trained on the UCI Adult Dataset. "
        "It demonstrates how Patra Toolkit can store model details, fairness scans, "
        "and basic explainability data in a comprehensive Model Card."
    ),
    keywords="uci, adult, patra, fairness, xai, tensorflow",
    author="YourName",
    input_type="Tabular",
    category="classification",
    citation="Becker, B. & Kohavi, R. (1996). Adult [Dataset]. UCI."
)

### 4.2 Attach AI Model Information
Here we describe the model's ownership, license, performance metrics, etc.

In [25]:
ai_model = AIModel(
    name="AdultTFModel",
    version="1.0",
    description="DNN on UCI Adult dataset for income prediction",
    owner="username",
    location="", 
    license="BSD-3-Clause",
    framework="tensorflow",
    model_type="dnn",
    test_accuracy=accuracy
)

# Add additional performance or training metrics
ai_model.add_metric("Epochs", 5)
ai_model.add_metric("BatchSize", 64)
ai_model.add_metric("Optimizer", "Adam")

mc.ai_model = ai_model

## 5. Fairness & Explainability

### 5.1 Bias (Fairness) Analysis
Patra Toolkit has a built-in `populate_bias` method to measure metrics like **demographic parity** or **equalized odds**. We'll focus on the protected attribute "sex" in the data.

**Why check bias?** Real-world models often inadvertently penalize certain groups. By calling `mc.populate_bias(...)`, you get a quick sense of whether the model is systematically advantaging or disadvantaging certain subpopulations.

In [26]:
y_pred = (model.predict(X_test) >= 0.5).astype(int).flatten()

# Locate the one-hot "sex_ Male" column; the leading space comes from the raw CSV values
col_list = df.drop("income", axis=1).columns.tolist()
sex_col_index = col_list.index("sex_ Male")
sex_data = X_test[:, sex_col_index]  # 1.0 for male, 0.0 otherwise

mc.populate_bias(
    dataset=X_test,
    true_labels=y_test,
    predicted_labels=y_pred,
    sensitive_feature_name="sex",
    sensitive_feature_data=sex_data,
    model=model
)

print("Bias Analysis Results:", mc.bias_analysis)

204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 305us/step
Bias Analysis Results: {'demographic_parity_diff': 0.033167888276767435, 'equal_odds_difference': 0.030822379183587073}
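
To make the two reported numbers concrete, here is a small, dependency-free sketch of how a demographic parity difference and an equalized odds difference are commonly defined: the largest gap between groups in selection rate, and the larger of the gaps in true-positive and false-positive rate. This follows the usual textbook definitions. Whether `populate_bias` computes them exactly this way is an assumption, and the helper names and toy data are hypothetical.

```python
def _rate(flags):
    """Fraction of truthy entries; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true-positive rate, and false-positive rate."""
    rates = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, groups) if s == g]
        rates[g] = {
            "selection": _rate([p for _, p in rows]),
            "tpr": _rate([p for t, p in rows if t == 1]),
            "fpr": _rate([p for t, p in rows if t == 0]),
        }
    return rates


def gap(rates, key):
    """Largest minus smallest value of one rate across all groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


def demographic_parity_difference(y_true, y_pred, groups):
    return gap(group_rates(y_true, y_pred, groups), "selection")


def equalized_odds_difference(y_true, y_pred, groups):
    rates = group_rates(y_true, y_pred, groups)
    return max(gap(rates, "tpr"), gap(rates, "fpr"))


# Toy data: group 1 plays the role of sex_ Male == 1, group 0 of sex_ Male == 0.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

print(demographic_parity_difference(y_true, y_pred, groups))  # 0.5
print(equalized_odds_difference(y_true, y_pred, groups))      # 0.5
```

A value of 0 means the groups are treated identically on that measure. Under these definitions, the values of about 0.03 reported above would correspond to gaps of roughly three percentage points between the two groups.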


### 5.2 Explainability (XAI)

If we want to understand model decisions, we can generate interpretability metrics (like feature importance) using Patra’s internal SHAP-based approach.

In [27]:
# Feature names, in the same order as the columns of X
x_columns = df.drop("income", axis=1).columns.tolist()

mc.populate_xai(
    train_dataset=X_test[:10],  # or any sample you want
    column_names=x_columns,
    model=model,
    n_features=10  # top 10 features
)

print("Explainability (SHAP-based) info:", mc.xai_analysis)

Explainability (SHAP-based) info: {'capital_gain': 0.25, 'fnlwgt': 0.09, 'native_country__Cuba': 0.0, 'native_country__Holand_Netherlands': 0.0, 'native_country__Haiti': 0.0, 'native_country__Guatemala': 0.0, 'native_country__Greece': 0.0, 'native_country__Germany': 0.0, 'native_country__France': 0.0, 'native_country__England': 0.0}
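
One common way to turn per-sample SHAP values into a single ranking is to average their absolute values per feature and keep the top `n_features`. The sketch below shows that aggregation on a made-up attribution matrix. It is an assumption about the general approach, not a description of what `populate_xai` does internally, and `top_features` is a hypothetical helper.

```python
def top_features(attributions, column_names, n_features):
    """Rank features by mean absolute attribution and keep the top n."""
    n_rows = len(attributions)
    scores = {
        name: sum(abs(row[j]) for row in attributions) / n_rows
        for j, name in enumerate(column_names)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n_features])


# Made-up attribution values: one row per sample, one column per feature.
attributions = [
    [0.5, -0.25, 0.0],
    [-0.5, 0.25, 0.0],
]
names = ["capital_gain", "fnlwgt", "age"]

print(top_features(attributions, names, n_features=2))
# {'capital_gain': 0.5, 'fnlwgt': 0.25}
```

With only ten rows passed as `train_dataset`, most one-hot columns never vary, which is consistent with the many 0.0 entries in the output above. A larger sample would likely give a more informative ranking.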


## 6. Add Requirements and Validate
We let Patra auto-detect Python package dependencies to ensure reproducibility and then validate the card for completeness.

In [28]:
mc.populate_requirements()
if mc.validate():
    print("Model Card is valid and ready to submit!")
else:
    print("Validation failed. See logs for details.")

INFO:root:Model card validated successfully.


Model Card is valid and ready to submit!
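
As a rough idea of what dependency detection involves, the standard library can list the packages installed in the current environment. This is only a sketch of the general technique using `importlib.metadata`. How `populate_requirements` actually gathers packages is not shown in this notebook, and `installed_packages` is a hypothetical helper.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted (name, version) pairs for installed distributions."""
    found = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:  # skip entries with missing or broken metadata
            found[name.lower()] = version
    return sorted(found.items())


packages = installed_packages()
for name, version in packages[:5]:
    print(f"{name}=={version}")
```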


## 7. Submission Options

The `mc.submit(...)` method can do one or more of the following:
1. **Submit only the card** (no model, no artifacts).
2. **Include the trained model** (uploading to Hugging Face or GitHub).
3. **Add artifacts** (like data files, inference labels, or any additional resources).

Below, we demonstrate multiple usage patterns.

### 7.1 Submit **Only** the Model Card

No model, no inference label, no artifacts. Just the card is posted to your Patra server for cataloging.

In [29]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002", # example
    model=None,
    file_format=None,
    model_store=None,
    inference_label=None,
    artifacts=None
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: yourname-uci_adult_model-1.0
INFO:root:Model Card submitted successfully.


'success'
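
The logged model ID, `yourname-uci_adult_model-1.0`, looks like the card's author, name, and version joined by hyphens and lowercased. The sketch below reproduces that pattern for illustration. The ID is assigned during submission, so treat the rule as an inference from this log rather than a documented contract, and `guess_model_id` as a hypothetical helper.

```python
def guess_model_id(author, name, version):
    """Lowercase and hyphen-join the fields, mirroring the ID seen in the log."""
    return "-".join(part.strip().lower() for part in (author, name, version))


print(guess_model_id("YourName", "UCI_Adult_Model", "1.0"))
# yourname-uci_adult_model-1.0
```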

### 7.2 Submit Model Card and Model

We can specify `"huggingface"` or `"github"` for `model_store`. This will attempt to upload our trained model, while the card is posted to the Patra server.

In [32]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002",  
    model=model,                
    file_format="h5",
    model_store="huggingface",  
    inference_label=None,
    artifacts=None
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: yourname-uci_adult_model-1.0
INFO:root:Model serialized successfully.
yourname-uci_adult_model-1.0.h5: 100%|██████████| 130k/130k [00:00<00:00, 549kB/s]
INFO:root:Model uploaded at: https://huggingface.co/patra-iu/yourname-uci_adult_model-1.0/blob/main/yourname-uci_adult_model-1.0.h5
INFO:root:Model Card submitted successfully.


'success'

### 7.3 Submit Model Card, Model, and Artifacts

This scenario might include a special label file plus multiple dataset artifacts.

In [37]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002", 
    model=model,
    file_format="h5",
    model_store="huggingface",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", 
               "data/adult/adult.names",
               "data/adult/adult.names"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: yourname-uci_adult_model-1.0
INFO:root:Model serialized successfully.
INFO:root:Model uploaded at: https://huggingface.co/patra-iu/yourname-uci_adult_model-1.0/blob/main/yourname-uci_adult_model-1.0.h5
INFO:root:Inference label uploaded at: https://huggingface.co/patra-iu/yourname-uci_adult_model-1.0/blob/main/labels.txt
INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://huggingface.co/patra-iu/yourname-uci_adult_model-1.0/blob/main/adult.data
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/yourname-uci_adult_model-1.0/blob/main/adult.names
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/yourname-uci_adult_model-1.0/blob/main/adult.names
INFO:root:Model Card submitted successfully.


'success'
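
Note that `data/adult/adult.names` appears twice in the `artifacts` list, so it is uploaded twice in the log above. A minimal sketch for cleaning such a list before calling `mc.submit(...)` follows: it drops repeated paths while preserving order and reports paths that do not exist locally. `prepare_artifacts` is a hypothetical helper, not part of Patra Toolkit.

```python
from pathlib import Path


def prepare_artifacts(paths):
    """Deduplicate paths in order and split them into existing and missing."""
    unique = list(dict.fromkeys(paths))
    existing = [p for p in unique if Path(p).is_file()]
    missing = [p for p in unique if not Path(p).is_file()]
    return unique, existing, missing


artifacts = [
    "data/adult/adult.data",
    "data/adult/adult.names",
    "data/adult/adult.names",
]
unique, existing, missing = prepare_artifacts(artifacts)
print(unique)   # ['data/adult/adult.data', 'data/adult/adult.names']
print(missing)  # depends on which files are present locally
```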

### 7.4 Pushing to GitHub

By switching `"huggingface"` to `"github"`, you can store your model in a GitHub repo.

In [38]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002", 
    model=model,
    file_format="h5",
    model_store="github",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", 
               "data/adult/adult.names",
               "data/adult/adult.names"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: yourname-uci_adult_model-1.0
INFO:root:Model serialized successfully.


Repository 'yourname-uci_adult_model-1.0' created successfully.


INFO:root:Model uploaded at: https://github.com/nee1k/yourname-uci_adult_model-1.0/blob/main/yourname-uci_adult_model-1.0.h5


Repository 'yourname-uci_adult_model-1.0' already exists. Using existing repository.


INFO:root:Inference label uploaded at: https://github.com/nee1k/yourname-uci_adult_model-1.0/blob/main/labels.txt


Repository 'yourname-uci_adult_model-1.0' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://github.com/nee1k/yourname-uci_adult_model-1.0/blob/main/adult.data


Repository 'yourname-uci_adult_model-1.0' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://github.com/nee1k/yourname-uci_adult_model-1.0/blob/main/adult.names


Repository 'yourname-uci_adult_model-1.0' already exists. Using existing repository.
No changes to commit, skipping commit step.


INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://github.com/nee1k/yourname-uci_adult_model-1.0/blob/main/adult.names
INFO:root:Model Card submitted successfully.


'success'

By following this notebook, you have:

1. Loaded and preprocessed the UCI Adult Dataset
2. Trained a TensorFlow model to predict income
3. Built a Patra Model Card describing the model’s purpose, performance, and environment
4. (Optionally) scanned for fairness and explainability metrics
5. Submitted the card to a Patra server and, optionally, uploaded the model and artifacts to a chosen store (Hugging Face or GitHub)
