<h1 style=\"text-align: center; font-size: 50px;\"> Run Workflow </h1>

# Notebook Overview

- Start Execution
- Define User Constants
- Install and Import Libraries
- Configure Settings
- Verify Assets
- Load and Validate Data
- Load and Configure the LLM
- Define Helper Functions
- Run Evaluation
- Display Evaluation Results
- Save Evaluation Results

# Start Execution

In [1]:
import logging
from pathlib import Path
from datetime import datetime
import time

# Configure logger
logger: logging.Logger = logging.getLogger("run_workflow_logger")
logger.setLevel(logging.INFO)
logger.propagate = False  # Prevent duplicate logs from parent loggers

# Set formatter
formatter: logging.Formatter = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

# Configure and attach stream handler
stream_handler: logging.StreamHandler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)

In [2]:
start_time = time.time()  
logger.info("Notebook execution started.")

2025-06-30 20:53:47 - INFO - Notebook execution started.


# Define User Constants

In [3]:
# File configuration
INPUT_FILE_NAME: str = "2025 ISEF Project Abstracts.csv"
INPUT_DIR: Path = Path("../data/inputs")
OUTPUT_DIR: Path = Path("../data/outputs")

# Ensure directories exist
INPUT_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

INPUT_PATH: Path = INPUT_DIR / INPUT_FILE_NAME
TIMESTAMP: str = datetime.now().strftime('%Y-%m-%d %H-%M-%S')
OUTPUT_FILE_NAME: str = f"Evaluated - {INPUT_FILE_NAME} - {TIMESTAMP}"
OUTPUT_PATH: Path = OUTPUT_DIR / OUTPUT_FILE_NAME

# Evaluation configuration
KEY_COLUMN: str = "title"
EVAL_COLUMN: str = "abstract"
CRITERIA: dict[str, int] = {
    "Originality": 3,
    "ScientificRigor": 4,
    "Clarity": 2,
    "Relevance": 1,
    "Feasibility": 3,
    "Brevity": 2,
}

# Percentage of rows to evaluate
PERCENTAGE_ROWS_TO_BE_EVALUATED: float = 0.1

# Install and Import Libraries

In [4]:
%%time

%pip install -r ../requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.
CPU times: user 20.3 ms, sys: 9.18 ms, total: 29.5 ms
Wall time: 1.91 s


In [5]:
import os
import re
import sys
import warnings
import multiprocessing
from typing import Any, Dict, List

import pandas as pd
from tqdm.auto import tqdm
from llama_cpp import Llama

# Configure Settings

In [6]:
warnings.filterwarnings("ignore")

In [7]:
LLAMA_MODEL_PATH = "/home/jovyan/datafabric/meta-llama3.1-8b-Q8/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf"

# Verify Assets

In [8]:
def log_asset_status(asset_path: str, asset_name: str) -> None:
    """
    Logs the status of a given asset based on its existence.

    Parameters:
        asset_path (str): File or directory path to check.
        asset_name (str): Name of the asset for logging context.
    """
    if Path(asset_path).exists():
        logger.info(f"{asset_name} is properly configured.")
    else:
        logger.info(f"{asset_name} is not properly configured. Please ensure the required asset is correctly configured in your AI Studio project according to the README file.")

In [9]:
log_asset_status(
    asset_path=INPUT_PATH,
    asset_name="Input Data",
)

log_asset_status(
    asset_path=LLAMA_MODEL_PATH,
    asset_name="LLaMA Local model",
)

2025-06-30 20:53:50 - INFO - Input Data is properly configured.
2025-06-30 20:53:50 - INFO - LLaMA Local model is properly configured.


# Load and Validate Data

In [10]:
df = pd.read_csv(INPUT_PATH)

df.head(10)

Unnamed: 0,title,category,year,schools,abstract,country,State,Province,awards
0,Dynamic Response of a Human Neck Replica to Ax...,Energy: Physical,2014.0,set(),Purpose: A human neck replica was made to simu...,United States of America,MN,,['nan']
1,The Effect of Nutrient Solution Concentration ...,Physics and Astronomy,2014.0,set(),Studies comparing the mineral nutrition of hyd...,United States of America,UT,,['nan']
2,Do Air Root Pruning Pots Accelerate Success in...,Physics and Astronomy,2014.0,set(),The purpose of my project was to determine whi...,United States of America,LA,,['nan']
3,Insect-repelling Plants & New Organic Pesticide,Environmental Engineering,2014.0,set(),Organochlorine pesticides in agriculture are n...,United States of America,TX,,['nan']
4,How Do Different Factors Affect the Accuracy o...,Earth and Environmental Sciences,2014.0,set(),The purpose of this experiment is to determine...,United States of America,MN,,['nan']
5,Dye Sensitized Solar Cells: New Structures and...,Engineering Mechanics,2014.0,set(),Although fossil fuels have the capacity to pow...,United States of America,TX,,['Fourth Award of $500']
6,A Novel Method for Determination of Camera Pos...,Embedded Systems,2014.0,set(),The method proposed here solves for the pose o...,United States of America,MO,,['nan']
7,Observational Detection of Solar g-mode Oscill...,Microbiology,2014.0,set(),,United States of America,HI,,"['Third Award of $1,000']"
8,Synthesis of Periodic Mesoporous Organosilicas...,Plant Sciences,2014.0,set(),,United States of America,TX,,"['Second Award of $2,000']"
9,A Novel Mathematical Simulation to Study the D...,Materials Science,2014.0,set(),"Human Immunodeficiency Virus (HIV), the virus ...",United States of America,TX,,['nan']


In [11]:
# Validate required columns
missing_columns: list[str] = [
    col for col in [KEY_COLUMN, EVAL_COLUMN] if col not in df.columns
]
if missing_columns:
    raise KeyError(f"Missing required column(s): {', '.join(missing_columns)}")

# Ensure key column is of string type
df[KEY_COLUMN] = df[KEY_COLUMN].astype(str)

In [12]:
# Determine the number of rows to evaluate (at least 1)
num_rows_to_evaluate: int = max(int(len(df) * PERCENTAGE_ROWS_TO_BE_EVALUATED / 100), 1)

# Select the top rows for evaluation
df = df[:num_rows_to_evaluate]

# Load and Configure the LLM

In [13]:
%%time

llm: Llama = Llama(
    model_path=LLAMA_MODEL_PATH,                 # Path to the local GGUF model file
    n_gpu_layers=-1,                             # Load all layers to GPU if available
    n_batch=128,                                 # Number of tokens processed per batch
    n_ctx=8192,                                  # Context window size (max tokens the model can attend to)
    max_tokens=16,                                # Max tokens to generate per response
    f16_kv=True,                                 # Use 16-bit key/value memory for reduced memory usage
    use_mmap=True,                               # Memory-map model to improve loading efficiency
    low_vram=True,                               # Optimize for systems with low GPU memory
    rope_scaling=None,                           # Positional encoding scaling (None = default behavior)
    temperature=0.0,                             # Deterministic output (0 = greedy decoding)
    repeat_penalty=1.0,                          # No penalty for repeating tokens
    streaming=False,                             # Disable token streaming (batch output only)
    stop=None,                                   # No custom stop sequences
    seed=42,                                     # Set random seed for reproducibility
    num_threads=multiprocessing.cpu_count(),     # Use all available CPU cores
    verbose=False,                               # Disable verbose logging
)

CPU times: user 1.69 s, sys: 6.57 s, total: 8.27 s
Wall time: 1min 7s


# Define Helper Functions

In [14]:
# ─── Helper Functions ───────────────────────────────────────────────────────

def scale_score(raw_score: int, max_target: int) -> int:
    """
    Scales a score from a 1–10 range to the given max_target range.
    Clamps the result between 0 and max_target.
    """
    scaled: int = round((raw_score / 10) * max_target)
    return min(max(scaled, 0), max_target)


def extract_single_score(output: str) -> int:
    """
    Extracts a single integer score (1–10) from the LLM output.
    Returns -1 if no valid score is found.
    """
    match = re.search(r"\b(10|[1-9])\b", output)
    return int(match.group(1)) if match else -1


def evaluate_criterion(text: str, criterion: str) -> int:
    """
    Prompts the LLM to score the text based on a specific criterion.
    Returns the extracted integer score from the LLM's response.
    """
    if not isinstance(text, str):
        return -1

    prompt: str = (
        f"You are an expert evaluator. Rate the abstract below based solely on the criterion: '{criterion}'.\n"
        "Provide a single integer from 1 to 10 (inclusive).\n"
        "Output only the number — no words, labels, punctuation, or explanations.\n\n"
        f"Abstract:\n{text.strip()}\n\n"
        "Score:"
    )


    response: str = llm(prompt)["choices"][0]["text"]
    return extract_single_score(response)


def evaluate_row(text: str) -> Dict[str, int]:
    """
    Evaluates a single text against all rubric criteria.
    Returns a dictionary of scaled scores.
    """
    return {
        criterion: scale_score(
            evaluate_criterion(text, criterion),
            CRITERIA[criterion]
        )
        for criterion in CRITERIA
    }

# Run Evaluation

In [15]:
# Run evaluation over the selected DataFrame rows
evaluation_results: list[dict[str, Any]] = []

for _, row in tqdm(df.iterrows(), total=len(df), desc="Evaluating rows"):
    evaluated_row: dict[str, Any] = evaluate_row(row[EVAL_COLUMN])
    evaluated_row[KEY_COLUMN] = row[KEY_COLUMN]
    evaluation_results.append(evaluated_row)

# Convert results to a DataFrame
evaluation_df: pd.DataFrame = pd.DataFrame(evaluation_results)

Evaluating rows:   0%|          | 0/14 [00:00<?, ?it/s]

# Display Evaluation Results

In [16]:
# Merge original data with evaluation results on the key column
final_df: pd.DataFrame = df.merge(evaluation_df, on=KEY_COLUMN)

# Compute total score by summing across all criteria
final_df["TotalScore"] = final_df[list(CRITERIA.keys())].sum(axis=1)

# Sort the DataFrame by total score in descending order
final_df.sort_values(by="TotalScore", ascending=False, inplace=True)

# Preview the top 10 evaluated entries
final_df.head(10)

Unnamed: 0,title,category,year,schools,abstract,country,State,Province,awards,Originality,ScientificRigor,Clarity,Relevance,Feasibility,Brevity,TotalScore
11,Enhanced Third-Generation Biofuel Production f...,Systems Software,2014.0,set(),Algae are one of the most promising sources of...,United States of America,TX,,"['Third Award of $1,000']",3,4,2,1,3,1,14
12,The Use of MnSOD in Combined Modality Therapy ...,Mathematics,2014.0,set(),Lung cancer is the deadliest cancer claiming 1...,United States of America,TX,,"['Second Award of $2,000']",2,4,1,1,3,2,13
9,A Novel Mathematical Simulation to Study the D...,Materials Science,2014.0,set(),"Human Immunodeficiency Virus (HIV), the virus ...",United States of America,TX,,['nan'],1,4,1,1,3,2,12
0,Dynamic Response of a Human Neck Replica to Ax...,Energy: Physical,2014.0,set(),Purpose: A human neck replica was made to simu...,United States of America,MN,,['nan'],1,4,2,1,3,0,11
1,The Effect of Nutrient Solution Concentration ...,Physics and Astronomy,2014.0,set(),Studies comparing the mineral nutrition of hyd...,United States of America,UT,,['nan'],1,4,2,1,3,0,11
10,A Novel Approach to Solar Desalination Using N...,Plant Sciences,2014.0,set(),The purpose of the project was to determine if...,United States of America,FL,,['nan'],2,3,2,1,2,0,10
3,Insect-repelling Plants & New Organic Pesticide,Environmental Engineering,2014.0,set(),Organochlorine pesticides in agriculture are n...,United States of America,TX,,['nan'],1,2,1,1,2,2,9
5,Dye Sensitized Solar Cells: New Structures and...,Engineering Mechanics,2014.0,set(),Although fossil fuels have the capacity to pow...,United States of America,TX,,['Fourth Award of $500'],1,3,1,1,2,1,9
6,A Novel Method for Determination of Camera Pos...,Embedded Systems,2014.0,set(),The method proposed here solves for the pose o...,United States of America,MO,,['nan'],1,2,2,1,2,1,9
2,Do Air Root Pruning Pots Accelerate Success in...,Physics and Astronomy,2014.0,set(),The purpose of my project was to determine whi...,United States of America,LA,,['nan'],1,2,1,0,2,1,7


# Save Evaluation Results

In [17]:
final_df.to_csv(OUTPUT_PATH, index=False)
logger.info(f"✅ Evaluation results successfully saved to: {OUTPUT_PATH}")

2025-06-30 20:55:44 - INFO - ✅ Evaluation results successfully saved to: ../data/outputs/Evaluated - 2025 ISEF Project Abstracts.csv - 2025-06-30 20-53-47


In [18]:
end_time: float = time.time()
elapsed_time: float = end_time - start_time
elapsed_minutes: int = int(elapsed_time // 60)
elapsed_seconds: float = elapsed_time % 60

logger.info(f"⏱️ Total execution time: {elapsed_minutes}m {elapsed_seconds:.2f}s")
logger.info("✅ Notebook execution completed successfully.")

2025-06-30 20:55:44 - INFO - ⏱️ Total execution time: 1m 56.51s
2025-06-30 20:55:44 - INFO - ✅ Notebook execution completed successfully.


Built with ❤️ using [**HP AI Studio**](https://hp.com/ai-studio).