## Basic workflow example for text classification/rubric

To run this notebook, put your OpenAI API token in an environment variable named `OPENAI_KEY`.
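
A minimal sketch of a fail-fast check for that variable, so a missing key is caught before any model call is made. The helper name `require_openai_key` is hypothetical and is not part of this notebook or of any library it uses.

```python
import os


def require_openai_key(env=os.environ):
    """Return the value of OPENAI_KEY, or raise if it is unset or blank."""
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Set the OPENAI_KEY environment variable before running this notebook."
        )
    return key
```

Calling `require_openai_key()` at the top of the first cell would surface a missing token immediately, rather than as an authentication error on the first classification.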

If you have quarto installed, you can both run this and export it to HTML via:

```quarto render Instructions_Example.ipynb --execute```

You can just export the current version without rerunning it via: 

```quarto render Instructions_Example.ipynb```

This notebook shows two ways to create the classes used as grading rubrics and run them on sample data:

1. Hard-coding them in this notebook.
2. Reading them in from .csv files and writing the results for each class to a new .csv file.

In [1]:
from enum import Enum
import marvin
import os
import openai
import pandas as pd
from pydantic import BaseModel

marvin.settings.llm_max_tokens=1500
marvin.settings.llm_max_context_tokens=2500
marvin.settings.llm_temperature=0.0

openai.api_key = os.environ.get("OPENAI_KEY")
marvin.settings.llm_model='openai/gpt-4'
pd.set_option('display.max_colwidth', None)

class GradingPipetteCleaningInstructions(Enum):
    # This defines the grading rubric that will be used. 
    PASS = """Includes instructions for all of the following tasks: 
    using distilled water, use of mild detergent or cleaning solution, 
    rinsing with distilled water, drying, reassembly, wearing gloves and goggles, 
    checking for calibration and wear"""
    FAIL = """Leaves out one or more of the following tasks: using distilled water, 
    use of mild detergent or cleaning solution, 
    rinsing with distilled water, drying, reassembly, 
    wearing gloves and goggles, checking for calibration and wear"""

def compile_classification_data(instructions_with_true_labels, labels):
    """
    Compiles classification data into a DataFrame, assuming Marvin as the classifier.

    Parameters:
    - instructions_with_true_labels: A dictionary with instructions as keys and their true labels as values.
    - labels: The labels to be considered by the classifier during classification.

    Returns:
    - A pandas DataFrame containing instructions, their true labels, and Marvin's labels.
    """
    instructions = list(instructions_with_true_labels.keys())
    true_labels = list(instructions_with_true_labels.values())
    model_labels = [marvin.classify(instruction, labels).name for instruction in instructions]

    df = pd.DataFrame({
        'Instructions': instructions,
        'True Label': true_labels,
        'Model Label': model_labels
    })

    return df




In [3]:
instructions_with_true_labels = {
    """Begin by rinsing the pipette with distilled water to remove any residual chemicals.
    Carefully disassemble the pipette into its component parts.
    Soak and gently scrub the parts with a mild detergent solution to cleanse thoroughly.
    Rinse all parts several times with distilled water to ensure all detergent is removed.
    Allow the parts to air dry completely in an upright position to prevent moisture from being trapped.
    Once dry, reassemble the pipette, ensuring all parts fit together correctly.
    Wear gloves throughout the cleaning process to protect your hands, and goggles if there's a risk of splashing.
    Regularly check the pipette for calibration accuracy and signs of wear or damage.""": "PASS",

    """Rinse the pipette with distilled water to eliminate initial contaminants.
    Disassemble the pipette if the design permits, keeping track of all pieces.
    Clean each part with a solution specifically designed for pipettes or a non-abrasive detergent.
    Perform a thorough rinse of all components with distilled water to remove the cleaning solution.
    Dry the components using a clean, lint-free cloth or let them air dry in an upright position.
    Reassemble the pipette, ensuring it functions smoothly.
    Always use gloves and eye protection while cleaning to avoid direct contact with chemicals.
    Conduct maintenance checks for calibration and inspect for damage regularly.""": "PASS",

    """Start by rinsing the pipette using distilled water to wash away leftover substances.
    Disassemble the pipette carefully to access all internal surfaces.
    Apply a gentle detergent or pipette cleaner to all parts, scrubbing softly to avoid damage.
    Rinse thoroughly with distilled water until all traces of the cleaner are gone.
    Reassemble the pipette after ensuring all parts are clean but without specifying drying.
    Utilize protective gloves to safeguard your hands during the cleaning.
    Regularly perform maintenance checks to ensure the pipette's accuracy and condition.""": "FAIL", 
    
    """Rinse initially with distilled water to remove surface residues.
    Apply a mild detergent to clean the pipette internally, avoiding harsh scrubbing.
    After cleaning, rinse with distilled water to clear out any soap remnants.
    Dry the pipette externally with a soft cloth.""": "FAIL",  

    """Initial rinsing with distilled water is performed to clear away visible contaminants.
    The pipette is disassembled for thorough cleaning.
    All parts are rinsed post-cleaning with distilled water to ensure no detergent is left.
    The components are air dried in an upright position or with a gentle airflow.
    The pipette is reassembled.""": "FAIL"  
}

df = compile_classification_data(instructions_with_true_labels, GradingPipetteCleaningInstructions)

In [17]:
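# Note: this function reuses the name of the Enum defined above and replaces it
# in the notebook namespace once this cell has run.
@marvin.fn
def GradingPipetteCleaningInstructions(text: str) -> float:
    """
    Award ten points for the inclusion of each of the following tasks: using distilled water, use of mild detergent or cleaning solution, 
    rinsing with distilled water, drying, reassembly, wearing gloves and goggles, 
    checking for calibration and wear
    """

def compile_scoring_data(instructions, scoring_function):
    """
    Scores each instruction with scoring_function and returns a DataFrame
    holding the instructions and their scores (in the 'Model Label' column).
    """
    instructions = list(instructions)  # accept any iterable, e.g. dict keys
    model_labels = [scoring_function(instruction) for instruction in instructions]
    df = pd.DataFrame({
        'Instructions': instructions,
        'Model Label': model_labels
    })
    return df

score_df = compile_scoring_data(instructions_with_true_labels.keys(), GradingPipetteCleaningInstructions)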
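
Because the score comes back from the model as a free-form float, it can help to check it against the rubric's own arithmetic. The following is a minimal sketch, assuming the docstring's list is read as seven tasks worth ten points each. The names `RUBRIC_TASKS`, `POINTS_PER_TASK`, `MAX_SCORE`, and `out_of_range_scores` are hypothetical and are not part of this notebook or of Marvin.

```python
# The tasks named in the scoring docstring, one entry per comma-separated item.
RUBRIC_TASKS = [
    "using distilled water",
    "use of mild detergent or cleaning solution",
    "rinsing with distilled water",
    "drying",
    "reassembly",
    "wearing gloves and goggles",
    "checking for calibration and wear",
]
POINTS_PER_TASK = 10  # "Award ten points for the inclusion of each ..."
MAX_SCORE = POINTS_PER_TASK * len(RUBRIC_TASKS)  # 70 under this reading


def out_of_range_scores(scores, max_score=MAX_SCORE):
    """Return the scores that fall outside the interval [0, max_score]."""
    return [s for s in scores if not 0 <= s <= max_score]
```

Under this reading the maximum is 70, so `out_of_range_scores([80.0, 80.0, 70.0, 40.0, 50.0])` flags the two 80.0 values. A score above the maximum may mean the model counted a compound item such as "gloves and goggles" as two tasks, which is worth making explicit in the rubric text.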

Unnamed: 0,Instructions,Model Label
0,"(GradingPipetteCleaningInstructions, Begin by rinsing the pipette with distilled water to remove any residual chemicals.\n Carefully disassemble the pipette into its component parts.\n Soak and gently scrub the parts with a mild detergent solution to cleanse thoroughly.\n Rinse all parts several times with distilled water to ensure all detergent is removed.\n Allow the parts to air dry completely in an upright position to prevent moisture from being trapped.\n Once dry, reassemble the pipette, ensuring all parts fit together correctly.\n Wear gloves throughout the cleaning process to protect your hands, and goggles if there's a risk of splashing.\n Regularly check the pipette for calibration accuracy and signs of wear or damage)",80.0
1,"(GradingPipetteCleaningInstructions, Rinse the pipette with distilled water to eliminate initial contaminants.\n Disassemble the pipette if the design permits, keeping track of all pieces.\n Clean each part with a solution specifically designed for pipettes or a non-abrasive detergent.\n Perform a thorough rinse of all components with distilled water to remove the cleaning solution.\n Dry the components using a clean, lint-free cloth or let them air dry in an upright position.\n Reassemble the pipette, ensuring it functions smoothly.\n Always use gloves and eye protection while cleaning to avoid direct contact with chemicals.\n Conduct maintenance checks for calibration and inspect for damage regularly)",80.0
2,"(GradingPipetteCleaningInstructions, Start by rinsing the pipette using distilled water to wash away leftover substances.\n Disassemble the pipette carefully to access all internal surfaces.\n Apply a gentle detergent or pipette cleaner to all parts, scrubbing softly to avoid damage.\n Rinse thoroughly with distilled water until all traces of the cleaner are gone.\n Reassemble the pipette after ensuring all parts are clean but without specifying drying.\n Utilize protective gloves to safeguard your hands during the cleaning.\n Regularly perform maintenance checks to ensure the pipette's accuracy and condition)",70.0
3,"(GradingPipetteCleaningInstructions, Rinse initially with distilled water to remove surface residues.\n Apply a mild detergent to clean the pipette internally, avoiding harsh scrubbing.\n After cleaning, rinse with distilled water to clear out any soap remnants.\n Dry the pipette externally with a soft cloth.)",40.0
4,"(GradingPipetteCleaningInstructions, Initial rinsing with distilled water is performed to clear away visible contaminants.\n The pipette is disassembled for thorough cleaning.\n All parts are rinsed post-cleaning with distilled water to ensure no detergent is left.\n The components are air dried in an upright position or with a gentle airflow.\n The pipette is reassembled)",50.0


## Reading from .csv files and outputting results to a .csv file
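
The code in this section expects two files: class definitions with the columns `ClassName`, `Label`, and `ClassDocString`, and labeled examples with the columns `ClassName`, `Example`, and `Label`. The sketch below illustrates that layout with hypothetical file contents; only the column names come from the notebook code. It uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies, and `enums_from_rows` is an illustrative helper, not the notebook's own function.

```python
import csv
import io
from enum import Enum

# Hypothetical file contents; the rubric text is shortened for illustration.
CLASS_DEFINITIONS_CSV = """ClassName,Label,ClassDocString
GradingPipetteCleaningInstructions,PASS,Includes instructions for all required tasks
GradingPipetteCleaningInstructions,FAIL,Leaves out one or more required tasks
"""

LABELED_EXAMPLES_CSV = """ClassName,Example,Label
GradingPipetteCleaningInstructions,"Rinse with distilled water, then dry.",FAIL
"""


def enums_from_rows(rows):
    """Group rows by ClassName and build one Enum per class (Label -> ClassDocString)."""
    members = {}
    for row in rows:
        labels = members.setdefault(row["ClassName"], {})
        labels[row["Label"]] = row["ClassDocString"].strip()
    return {name: Enum(name, labels) for name, labels in members.items()}


classes = enums_from_rows(csv.DictReader(io.StringIO(CLASS_DEFINITIONS_CSV)))
examples = list(csv.DictReader(io.StringIO(LABELED_EXAMPLES_CSV)))
```

Each row of the definitions file contributes one member to the Enum named in `ClassName`, so a rubric with more than two outcomes only needs more rows.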

In [5]:
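def create_classes_from_definitions(csv_file_path):
    """
    Builds one Enum per ClassName found in the CSV file. Expects the columns
    ClassName, Label, and ClassDocString; each Label becomes a member name
    and the stripped ClassDocString becomes that member's value.
    """
    df = pd.read_csv(csv_file_path)
    enum_classes = {}
    for class_name, group in df.groupby('ClassName'):
        # Member names are taken from the Label column as written (no case conversion)
        members = {row['Label']: row['ClassDocString'].strip() for _, row in group.iterrows()}
        enum_classes[class_name] = Enum(class_name, members)
    return enum_classes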
    
def compile_classification_data(instructions_with_true_labels, enum_class):
    """
    Compiles classification data into a DataFrame.

    Parameters:
    - instructions_with_true_labels (dict): A dictionary mapping (ClassName, Example) tuples to true labels.
    - enum_class (Enum): The Enum class associated with the classification.

    Returns:
    - DataFrame: A pandas DataFrame containing the instructions, their true labels, and the labels predicted by Marvin.
    """
    # Filter instructions for the current enum_class based on class name
    filtered_instructions = {instr: label for (cls_name, instr), label in instructions_with_true_labels.items() if cls_name == enum_class.__name__}
    # Extract only the instruction text for the DataFrame
    instructions = list(filtered_instructions.keys())  # Now just the instruction text
    true_labels = list(filtered_instructions.values())

    model_labels = [marvin.classify(instruction, enum_class).name for instruction in instructions]

    df = pd.DataFrame({
        'Instructions': instructions,
        'True Label': true_labels,
        'Model Label': model_labels
    })

    return df

def get_labels_for_class(class_name, instructions_with_true_labels):
    """
    Extracts unique labels for a given class based on the instructions_with_true_labels dictionary.

    Parameters:
    - class_name: The name of the class to get labels for.
    - instructions_with_true_labels: A dictionary with (ClassName, Example) as keys and their true labels as values.

    Returns:
    - A list of unique labels associated with the class.
    """
    labels = set()  # Use a set to avoid duplicates
    for (cls_name, _), label in instructions_with_true_labels.items():
        if cls_name == class_name:
            labels.add(label)
    return list(labels)


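def read_labeled_examples(csv_file_path):
    """
    Reads labeled examples from a CSV file with the columns ClassName, Example, and Label.
    Returns a dictionary mapping (ClassName, Example) tuples to their true labels.
    """
    df = pd.read_csv(csv_file_path)
    instructions_with_true_labels = {}
    for _, row in df.iterrows():
        # Constructing the key as a tuple of (ClassName, Example)
        key = (row['ClassName'], row['Example'])
        # The value is the label
        instructions_with_true_labels[key] = row['Label']
    return instructions_with_true_labels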


class_definitions_path = '../data/class_definitions.csv'
labeled_examples_path = '../data/labeled_examples.csv'

dynamic_classes = create_classes_from_definitions(class_definitions_path)
instructions_with_true_labels = read_labeled_examples(labeled_examples_path)

for class_name, enum_class in dynamic_classes.items():
    df = compile_classification_data(instructions_with_true_labels, enum_class)
    df.to_csv(f"../data/{class_name}Results.csv", index=False)


In [6]:
df

Unnamed: 0,Instructions,True Label,Model Label
0,"Begin by rinsing the pipette with distilled water to remove any residual chemicals.\n Carefully disassemble the pipette into its component parts.\n Soak and gently scrub the parts with a mild detergent solution to cleanse thoroughly.\n Rinse all parts several times with distilled water to ensure all detergent is removed.\n Allow the parts to air dry completely in an upright position to prevent moisture from being trapped.\n Once dry, reassemble the pipette, ensuring all parts fit together correctly.\n Wear gloves throughout the cleaning process to protect your hands, and goggles if there's a risk of splashing.\n Regularly check the pipette for calibration accuracy and signs of wear or damage",PASS,PASS
1,"Rinse the pipette with distilled water to eliminate initial contaminants.\n Disassemble the pipette if the design permits, keeping track of all pieces.\n Clean each part with a solution specifically designed for pipettes or a non-abrasive detergent.\n Perform a thorough rinse of all components with distilled water to remove the cleaning solution.\n Dry the components using a clean, lint-free cloth or let them air dry in an upright position.\n Reassemble the pipette, ensuring it functions smoothly.\n Always use gloves and eye protection while cleaning to avoid direct contact with chemicals.\n Conduct maintenance checks for calibration and inspect for damage regularly",PASS,PASS
2,"Start by rinsing the pipette using distilled water to wash away leftover substances.\n Disassemble the pipette carefully to access all internal surfaces.\n Apply a gentle detergent or pipette cleaner to all parts, scrubbing softly to avoid damage.\n Rinse thoroughly with distilled water until all traces of the cleaner are gone.\n Reassemble the pipette after ensuring all parts are clean but without specifying drying.\n Utilize protective gloves to safeguard your hands during the cleaning.\n Regularly perform maintenance checks to ensure the pipette's accuracy and condition",FAIL,FAIL
3,"Rinse initially with distilled water to remove surface residues.\n Apply a mild detergent to clean the pipette internally, avoiding harsh scrubbing.\n After cleaning, rinse with distilled water to clear out any soap remnants.\n Dry the pipette externally with a soft cloth.",FAIL,FAIL
4,Initial rinsing with distilled water is performed to clear away visible contaminants.\n The pipette is disassembled for thorough cleaning.\n All parts are rinsed post-cleaning with distilled water to ensure no detergent is left.\n The components are air dried in an upright position or with a gentle airflow.\n The pipette is reassembled,FAIL,FAIL
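
The table above lines up each true label with the label Marvin assigned. A minimal sketch of summarizing that agreement, using plain Python lists so that it runs without pandas; `label_agreement` is a hypothetical helper name, and the label values are copied from the five rows above.

```python
from collections import Counter


def label_agreement(true_labels, model_labels):
    """Return (accuracy, counts keyed by (true label, model label))."""
    if len(true_labels) != len(model_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no labels to compare")
    pairs = list(zip(true_labels, model_labels))
    accuracy = sum(t == m for t, m in pairs) / len(pairs)
    return accuracy, Counter(pairs)


# Labels copied from the 'True Label' and 'Model Label' columns above.
true_labels = ["PASS", "PASS", "FAIL", "FAIL", "FAIL"]
model_labels = ["PASS", "PASS", "FAIL", "FAIL", "FAIL"]
accuracy, confusion = label_agreement(true_labels, model_labels)
```

With a DataFrame in hand, the same two lists could be taken from `df['True Label']` and `df['Model Label']`. The counts keyed by `(true, model)` pairs make it easy to see which direction any disagreement goes once the sample grows beyond a handful of rows.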
