# Getting Started with the Patra Toolkit

This notebook will guide you through:
- Loading and preprocessing the UCI Adult Dataset
- Training a simple TensorFlow model (as one cohesive block)
- Creating and populating a Patra Model Card, with a focus on:
  - Model metadata
  - Potential fairness and explainability scans
  - Submitting the card (and optionally the model or artifacts) to a Patra server and model store

---

In [1]:
# 1. ENVIRONMENT SETUP
!pip install patra_toolkit


In [14]:
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("absl").setLevel(logging.ERROR)
logging.getLogger("huggingface_hub").setLevel(logging.ERROR)

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import requests, io

from patra_toolkit import ModelCard, AIModel

## 2. Load and Pre-process the Data

We’ll download the UCI Adult Dataset from UC Irvine’s repository and convert it into a pandas DataFrame.

In [4]:
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
response = requests.get(data_url)
response.raise_for_status()

columns = [
    "age","workclass","fnlwgt","education","education_num",
    "marital_status","occupation","relationship","race",
    "sex","capital_gain","capital_loss","hours_per_week",
    "native_country","income"
]
df = pd.read_csv(io.StringIO(response.text), names=columns, header=None)

# Encode target
df["income"] = LabelEncoder().fit_transform(df["income"])

# One-hot encode
df = pd.get_dummies(df, drop_first=True, dtype=float)

# Split X and y
X = df.drop("income", axis=1).astype("float32").values
y = df["income"].values

# Train-Test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## 3. Train a Simple TensorFlow Model

In [5]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=64,
    verbose=1
)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")

Epoch 1/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 473us/step - accuracy: 0.6378 - loss: 1281.0596
Epoch 2/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 440us/step - accuracy: 0.6832 - loss: 91.0107
Epoch 3/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - accuracy: 0.6749 - loss: 78.8655
Epoch 4/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 523us/step - accuracy: 0.6700 - loss: 128.7520
Epoch 5/5
407/407 ━━━━━━━━━━━━━━━━━━━━ 0s 621us/step - accuracy: 0.6780 - loss: 97.8933
Test Loss: 246.5828, Test Accuracy: 0.2412
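Loss values in the hundreds, as in the log above, are typical of training on unscaled inputs: columns such as `fnlwgt` and `capital_gain` are several orders of magnitude larger than the 0/1 one-hot columns. The sketch below shows one common remedy, standardizing each column with statistics taken from the training split only. The `standardize` helper and the toy arrays are illustrative assumptions, not part of the notebook or of the Patra Toolkit.

```python
import numpy as np

def standardize(train, test):
    """Scale columns to zero mean and unit variance using training statistics only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (train - mean) / std, (test - mean) / std

# Toy stand-ins for X_train / X_test: one large-valued column, one 0/1 column.
rng = np.random.default_rng(0)
toy_train = np.column_stack([rng.uniform(1e4, 1e6, 200), rng.integers(0, 2, 200)])
toy_test = np.column_stack([rng.uniform(1e4, 1e6, 50), rng.integers(0, 2, 50)])

train_scaled, test_scaled = standardize(toy_train, toy_test)
print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
```

In the notebook, the same call would be applied to `X_train` and `X_test` before `model.fit`. Whether it improves the accuracy reported above has not been verified here.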


## 4. Create a Patra Model Card

A `ModelCard` holds high-level metadata about the model: purpose, data, contact info, etc.

In [6]:
# 4.1 Create the ModelCard
mc = ModelCard(
    name="UCI_Adult_Income_Model",
    version="0.1",
    short_description="Income prediction using UCI Adult Dataset in TensorFlow.",
    full_description=(
        "A basic neural network trained on the UCI Adult data to predict whether an "
        "individual's income exceeds a threshold. Demonstrates usage of the Patra Toolkit."
    ),
    keywords="uci, adult, patra, fairness, explainability, tensorflow",
    author="Data Scientist",
    input_type="Tabular",
    category="classification",
    citation="Becker, B. & Kohavi, R. (1996). Adult [Dataset]. UCI Machine Learning Repository."
)

In [7]:
ai_model = AIModel(
    name="AdultTFModel",
    version="0.1",
    description="Simple DNN for UCI Adult dataset classification",
    owner="username",
    location="",  # Will be filled upon upload
    license="BSD-3-Clause",
    framework="tensorflow",
    model_type="dnn",
    test_accuracy=accuracy
)

# Additional metrics
ai_model.add_metric("Epochs", 5)
ai_model.add_metric("BatchSize", 64)
ai_model.add_metric("Optimizer", "Adam")

# Link the AIModel to the ModelCard
mc.ai_model = ai_model

### 4.2 Bias (Fairness) Analysis

Below, we call the `populate_bias()` method, which takes the test features, the true labels, the predicted labels, and the feature on which you want to measure bias. For demonstration, we assume the “gender” feature (the one-hot encoded `sex` column) is at index 58 in **X_test** (as determined after one-hot encoding).

- `feature_name`: "gender"
- `protected_feature_data`: The column of **X_test** that corresponds to "gender"
- `model`: The trained TensorFlow model (not strictly needed to compute bias, but used in some advanced checks)
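A hard-coded index such as 58 silently breaks if the encoding changes, so it can help to look the column up by name. The sketch below uses a hypothetical helper, `find_feature_index`, and a shortened, made-up column list. In the notebook, the list would come from `df.drop("income", axis=1).columns.tolist()`, and the exact dummy column name may differ from the one shown.

```python
def find_feature_index(feature_columns, prefix):
    """Return the index of the single column whose name starts with `prefix`."""
    matches = [i for i, name in enumerate(feature_columns) if name.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column starting with {prefix!r}, found {len(matches)}"
        )
    return matches[0]

# Hypothetical, shortened column list standing in for the encoded feature columns.
toy_columns = ["age", "hours_per_week", "race_White", "sex_Male", "native_country_Cuba"]
sex_index = find_feature_index(toy_columns, "sex_")
print(sex_index)  # 3
```

The resulting index can then replace the literal in `X_test[:, 58]`.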


In [8]:
y_pred = model.predict(X_test)
y_pred = (y_pred >= 0.5).flatten()

mc.populate_bias(
    X_test,
    y_test,
    y_pred,
    "gender",           # Name you want displayed in the report
    X_test[:, 58],      # The slice of data that corresponds to gender
    model
)

print("Bias Analysis:\n", mc.bias_analysis)

204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 310us/step
Bias Analysis:
 {'demographic_parity_diff': 0.0, 'equal_odds_difference': 0.0}
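To make the two reported metrics concrete, here is a minimal pure-Python sketch of demographic parity difference and equalized odds difference as they are commonly defined (for example in Fairlearn). That `populate_bias()` computes exactly these quantities is an assumption; the function names and toy data below are illustrative only.

```python
def rate(values):
    """Fraction of truthy entries; 0.0 for an empty list."""
    return sum(1 for v in values if v) / len(values) if values else 0.0

def demographic_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rate between groups."""
    rates = [rate([p for p, a in zip(y_pred, group) if a == g]) for g in sorted(set(group))]
    return max(rates) - min(rates)

def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the between-group gaps in true-positive and false-positive rate."""
    gaps = []
    for label in (1, 0):  # label=1 gives TPR, label=0 gives FPR
        rates = [
            rate([p for t, p, a in zip(y_true, y_pred, group) if a == g and t == label])
            for g in sorted(set(group))
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# Toy data: two groups of four rows each.
toy_true = [1, 0, 1, 0, 1, 0, 1, 0]
toy_pred = [1, 0, 1, 1, 0, 0, 1, 0]
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]

print(demographic_parity_difference(toy_pred, toy_group))        # 0.5
print(equalized_odds_difference(toy_true, toy_pred, toy_group))  # 0.5

# A model that predicts the same class for every row scores 0.0 on both.
constant_pred = [1] * 8
print(demographic_parity_difference(constant_pred, toy_group))        # 0.0
print(equalized_odds_difference(toy_true, constant_pred, toy_group))  # 0.0
```

Under these definitions, values of exactly 0.0 do not by themselves indicate a fair model. A classifier that outputs a single class for every input also produces them, so it is worth checking the distribution of `y_pred` alongside the metrics.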


### 4.3 Explainability (XAI) Analysis

Similarly, we can generate basic SHAP-based feature attributions for a sample of inputs.

- `num_samples_to_explain`: 10 in our case
- We pass `X_test[:10]` along with the feature column names from the dataset (all columns except the target).
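SHAP-style methods produce one attribution value per feature per sample. A common way to reduce that matrix to one score per feature is the mean absolute attribution, keeping the top few features. The sketch below illustrates that reduction on made-up numbers. The `top_features` helper is hypothetical, and whether `populate_xai()` summarizes attributions this way is an assumption.

```python
def top_features(attributions, feature_names, k=10):
    """Rank features by mean absolute attribution across samples and keep the top k."""
    n_samples = len(attributions)
    scores = {
        name: sum(abs(row[j]) for row in attributions) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

# Made-up attribution matrix: 2 samples x 3 features.
toy_attributions = [
    [0.25, -0.5, 0.0],
    [-0.75, 1.0, 0.0],
]
toy_names = ["age", "hours_per_week", "capital_loss"]

summary = top_features(toy_attributions, toy_names, k=2)
print(summary)  # {'hours_per_week': 0.75, 'age': 0.5}
```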

In [10]:
# Rebuild the list of columns used in training
x_columns = df.columns.tolist()
x_columns.remove('income')  # Remove the target

mc.populate_xai(
    X_test[:10],
    x_columns,
    model
)

print("Explainability Analysis:\n", mc.xai_analysis)


Explainability Analysis:
 {'age': 0.0, 'native_country__Cuba': 0.0, 'native_country__Holand_Netherlands': 0.0, 'native_country__Haiti': 0.0, 'native_country__Guatemala': 0.0, 'native_country__Greece': 0.0, 'native_country__Germany': 0.0, 'native_country__France': 0.0, 'native_country__England': 0.0, 'native_country__El_Salvador': 0.0}


## 5. Validate and Save the ModelCard

In [11]:
mc.validate()

mc.save("patra_modelcard.json")
print("Model Card validation successful and file saved.")

INFO:root:Model card validated successfully.
INFO:root:Model card saved to patra_modelcard.json.


Model Card validation successful and file saved.


## 6. Submit

`mc.submit()` supports several submission modes:

1. **Only the ModelCard** (no model or artifacts)
2. **Model Only** (upload your model, no label or artifacts)
3. **Full** (model, label, artifacts)
4. **GitHub** as the model store instead of Hugging Face
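The four modes differ only in which arguments of `mc.submit()` are left as `None`. The sketch below collects those combinations in one place. `build_submit_kwargs` is a hypothetical helper written for this notebook, not part of the Patra Toolkit; the argument names and values mirror the calls in the following subsections.

```python
def build_submit_kwargs(mode, model=None, inference_label=None, artifacts=None,
                        patra_server_url="http://127.0.0.1:5002"):
    """Return keyword arguments for mc.submit() for one submission mode.

    Hypothetical helper; modes are "card_only", "model_only", "full", "github".
    """
    modes = ("card_only", "model_only", "full", "github")
    if mode not in modes:
        raise ValueError(f"unknown mode {mode!r}; expected one of {modes}")

    # Card-only submission: everything except the server URL stays None.
    kwargs = dict(patra_server_url=patra_server_url, model=None, file_format=None,
                  model_store=None, inference_label=None, artifacts=None)
    if mode == "card_only":
        return kwargs

    if model is None:
        raise ValueError(f"mode {mode!r} needs a trained model")
    kwargs.update(model=model, file_format="h5",
                  model_store="github" if mode == "github" else "huggingface")

    # "full" and "github" additionally attach the label file and artifacts.
    if mode in ("full", "github"):
        kwargs.update(inference_label=inference_label, artifacts=list(artifacts or []))
    return kwargs

placeholder_model = object()  # stands in for the trained Keras model
for mode in ("card_only", "model_only", "full", "github"):
    kwargs = build_submit_kwargs(mode, model=placeholder_model,
                                 inference_label="data/labels.txt",
                                 artifacts=["data/adult/adult.data"])
    print(mode, kwargs["model_store"], kwargs["artifacts"])
```

With a real card and model, the result would be passed on as `mc.submit(**build_submit_kwargs("model_only", model=model))`.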

### 6.1 Only the Model Card

In [12]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002",
    model=None,
    file_format=None,
    model_store=None,
    inference_label=None,
    artifacts=None
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: data_scientist-uci_adult_income_model-0.1
INFO:root:Model Card submitted successfully.


'success'

### 6.2 Model Only (upload the model, no label or artifacts)

In [15]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002",
    model=model,
    file_format="h5",
    model_store="huggingface",
    inference_label=None,
    artifacts=None
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: data_scientist-uci_adult_income_model-0.1
INFO:root:Model serialized successfully.
No files have been modified since last commit. Skipping to prevent empty commit.
INFO:root:Model uploaded at: https://huggingface.co/patra-iu/data_scientist-uci_adult_income_model-0.1/blob/main/data_scientist-uci_adult_income_model-0.1.h5
INFO:root:Model Card submitted successfully.


'success'

### 6.3 Full Submission (model, label, artifacts)

In [16]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002",
    model=model,
    file_format="h5",
    model_store="huggingface",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", "data/adult/adult.names", "data/adult/adult.test"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: data_scientist-uci_adult_income_model-0.1
INFO:root:Model serialized successfully.
No files have been modified since last commit. Skipping to prevent empty commit.
INFO:root:Model uploaded at: https://huggingface.co/patra-iu/data_scientist-uci_adult_income_model-0.1/blob/main/data_scientist-uci_adult_income_model-0.1.h5
INFO:root:Inference label uploaded at: https://huggingface.co/patra-iu/data_scientist-uci_adult_income_model-0.1/blob/main/labels.txt
INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://huggingface.co/patra-iu/data_scientist-uci_adult_income_model-0.1/blob/main/adult.data
INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://huggingface.co/patra-iu/data_scientist-uci_adult_income_model-0.1/blob/main/adult.names
INFO:root:Artifact 'data/adult/adult.test' uploaded at: https://huggingface.co/patra-iu/data_scientist-uci_adult_income_model-0.1/blob/main/adult.test
INFO:root:Mode

'success'

### 6.4 Submitting to GitHub Instead

In [18]:
mc.submit(
    patra_server_url="http://127.0.0.1:5002",
    model=model,
    file_format="h5",
    model_store="github",
    inference_label="data/labels.txt",
    artifacts=["data/adult/adult.data", "data/adult/adult.names", "data/adult/adult.test"]
)

INFO:root:Model card validated successfully.
INFO:root:Model ID retrieved: data_scientist-uci_adult_income_model-0.1
INFO:root:Model serialized successfully.


Repository 'data_scientist-uci_adult_income_model-0.1' already exists. Using existing repository.
No changes to commit, skipping commit step.


INFO:root:Model uploaded at: https://github.com/nee1k/data_scientist-uci_adult_income_model-0.1/blob/main/data_scientist-uci_adult_income_model-0.1.h5


Repository 'data_scientist-uci_adult_income_model-0.1' already exists. Using existing repository.


INFO:root:Inference label uploaded at: https://github.com/nee1k/data_scientist-uci_adult_income_model-0.1/blob/main/labels.txt


Repository 'data_scientist-uci_adult_income_model-0.1' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.data' uploaded at: https://github.com/nee1k/data_scientist-uci_adult_income_model-0.1/blob/main/adult.data


Repository 'data_scientist-uci_adult_income_model-0.1' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.names' uploaded at: https://github.com/nee1k/data_scientist-uci_adult_income_model-0.1/blob/main/adult.names


Repository 'data_scientist-uci_adult_income_model-0.1' already exists. Using existing repository.


INFO:root:Artifact 'data/adult/adult.test' uploaded at: https://github.com/nee1k/data_scientist-uci_adult_income_model-0.1/blob/main/adult.test
INFO:root:Model Card submitted successfully.


'success'

Next steps:

- Explore advanced fairness or XAI techniques in Patra.
- Save the card with `mc.save("my_model_card.json")`.
- Integrate your card with the [Patra Knowledge Base](https://github.com/Data-to-Insight-Center/patra-kg).

That’s it! You’ve built a minimal end-to-end workflow that shows how to:
1. Train a model,
2. Build a Model Card,
3. Optionally add fairness and explainability analyses,
4. Submit or store the card and your model via the Patra Toolkit.