# Exercise 1: Analyzing thresholds

The Receiver Operating Characteristic (ROC) curve is a graphical representation used to evaluate the performance of binary classification models. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

### Step 1: Generating some random data

We simulate the output of an LLM by generating two sets of random vectors: the actual results and the expected results (the ground truth). In this simplified example, each actual vector sits at the same index as its corresponding expected vector (this will not always be the case!).

In [None]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
from sklearn.metrics import auc


# Step 1: Generate random dataset (100 vectors, each of 100 dimensions)
np.random.seed(42)  # For reproducibility
expected = ""  # TODO: replace with a (100, 100) array of random vectors


# Perturbation factor: controls how far the actual vectors deviate from the expected ones
perturbation_factor = 1

# TODO: define the actual vectors by perturbing `expected`, scaled by perturbation_factor
actual = ""

# Print the expected and actual datasets for comparison
print("Expected Data (First 2 Vectors):")
print(expected[:2])
print("\nActual Data (First 2 Vectors):")
print(actual[:2])

# Compare every actual vector with every expected vector through cosine similarity
cos_sim_matrix = cosine_similarity(actual, expected)

print("Cosine Similarity Matrix:")
print(cos_sim_matrix)
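The two placeholders above can be filled in many ways. The sketch below is one possibility under assumptions of our own: the expected vectors are drawn uniformly from [0, 1), and each actual vector is its expected vector plus Gaussian noise scaled by `perturbation_factor`. The helper name `make_dataset` is hypothetical, and it uses NumPy's `default_rng` generator instead of the global `np.random.seed`.

```python
import numpy as np

def make_dataset(n_vectors=100, dim=100, perturbation_factor=1.0, seed=42):
    """Hypothetical helper: return (expected, actual), both of shape (n_vectors, dim).

    Assumed noise model: expected ~ Uniform[0, 1), and
    actual = expected + perturbation_factor * standard Gaussian noise.
    """
    rng = np.random.default_rng(seed)  # local generator for reproducibility
    expected = rng.random((n_vectors, dim))
    noise = rng.normal(loc=0.0, scale=1.0, size=(n_vectors, dim))
    actual = expected + perturbation_factor * noise
    return expected, actual

expected, actual = make_dataset()
print(expected.shape, actual.shape)  # (100, 100) (100, 100)
```

Setting `perturbation_factor=0` makes `actual` identical to `expected`, which is a quick sanity check for the later steps.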

### Step 2: A function to compute TPR and FPR

True Positive Rate (TPR), also called Sensitivity or Recall, is the proportion of actual positives that are correctly identified by the model. It is given by TPR = TP / (TP + FN), where TP = True Positives and FN = False Negatives.

False Positive Rate (FPR) is the proportion of actual negatives that are incorrectly classified as positives. It is given by FPR = FP / (FP + TN), where FP = False Positives and TN = True Negatives.
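As a quick worked example of the two formulas: when 100 actual vectors are compared pairwise with 100 expected vectors, there are 100 ground-truth positive pairs (same index) and 100 * 100 - 100 = 9,900 negative pairs. The counts below are made up for illustration, and returning 0.0 when a denominator is zero is our own convention; `rates_from_counts` is a hypothetical helper name.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    A rate whose denominator is zero is reported as 0.0 (assumed convention).
    """
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0  # sensitivity / recall
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0  # fraction of negatives flagged positive
    return tpr, fpr

# 100 positive pairs (tp + fn) and 9,900 negative pairs (fp + tn)
print(rates_from_counts(tp=90, fp=99, tn=9801, fn=10))  # (0.9, 0.01)
```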

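A classifier typically outputs a score for each sample, such as the probability that the sample belongs to the positive class. To classify the sample, you apply a threshold to this score: if the score is at or above the threshold, the sample is classified as positive (class 1); otherwise, it is classified as negative (class 0). In this exercise, the score of a pair is the cosine similarity between an actual vector and an expected vector, and a pair is an actual positive (ground truth) when the two vectors share the same index.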
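Applying a threshold is a single vectorized comparison in NumPy. This sketch uses illustrative scores (not real model output) and the "at or above the threshold is positive" convention for ties, which is our assumption; `classify` is a hypothetical helper name.

```python
import numpy as np

def classify(scores, threshold):
    """Label scores at or above the threshold as 1 (positive), the rest as 0."""
    return (scores >= threshold).astype(int)

scores = np.array([0.15, 0.40, 0.55, 0.80, 0.95])  # illustrative scores

for threshold in (0.3, 0.6, 0.9):
    predicted = classify(scores, threshold)
    print(f"threshold={threshold}: {predicted.tolist()} -> {predicted.sum()} positives")
```

Raising the threshold from 0.3 to 0.9 reduces the number of predicted positives from 4 to 1, which is the trade-off the ROC curve captures.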


In [2]:

# A function to compute TPR and FPR given a matrix of actual vectors, a matrix of expected vectors, and a threshold

def compute_tpr_fpr(data, data2, threshold=0.9):

    # Compute the cosine similarity between all pairs of vectors
    cos_sim_matrix = ""
    
    # Initialize counters for TP, FP, TN, FN
    tp = 0  # True Positives
    fp = 0  # False Positives
    tn = 0  # True Negatives
    fn = 0  # False Negatives
    
    # Loop over all the pairs in the matrix
    for i in range(len(data)):
        for j in range(len(data2)):
            # Check ground truth - vectors from the same index in original data
                
            # Apply the threshold to classify cosine similarity
                
            # Update the counts based on comparison of ground truth and prediction
            pass

    #compute tpr and fpr
    tpr = 0
    fpr = 0

    return tpr, fpr

#an example of computation for a threshold equal to 0.95
tpr, fpr = compute_tpr_fpr(actual, expected, threshold=0.95)

# Print the result
print(f"True Positive Rate (TPR): {tpr}")
print(f"False Positive Rate (FPR): {fpr}")



True Positive Rate (TPR): 0
False Positive Rate (FPR): 0
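A vectorized sketch of the same computation is shown below. It assumes, as in Step 1, that the ground-truth pairs are those with the same index (the diagonal of the similarity matrix) and that a similarity at or above the threshold counts as a positive prediction. Cosine similarity is computed with plain NumPy so the block has no scikit-learn dependency; for nonzero rows it matches `sklearn.metrics.pairwise.cosine_similarity`. The names `cosine_sim` and `compute_tpr_fpr_vectorized`, and the synthetic data (uniform vectors plus Gaussian noise), are our own assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def compute_tpr_fpr_vectorized(data, data2, threshold=0.9):
    sim = cosine_sim(data, data2)
    # Ground truth: pair (i, j) is an actual positive only when i == j
    truth = np.eye(sim.shape[0], sim.shape[1], dtype=bool)
    # Prediction: a similarity at or above the threshold is classified as positive
    predicted = sim >= threshold
    tp = np.sum(predicted & truth)
    fn = np.sum(~predicted & truth)
    fp = np.sum(predicted & ~truth)
    tn = np.sum(~predicted & ~truth)
    tpr = tp / (tp + fn) if tp + fn > 0 else 0.0
    fpr = fp / (fp + tn) if fp + tn > 0 else 0.0
    return float(tpr), float(fpr)

# Usage with synthetic data (assumed setup: uniform vectors plus small Gaussian noise)
rng = np.random.default_rng(0)
expected_demo = rng.random((100, 100))
actual_demo = expected_demo + 0.1 * rng.normal(size=(100, 100))
print(compute_tpr_fpr_vectorized(actual_demo, expected_demo, threshold=0.95))
```

Swapping the argument order transposes the similarity matrix but leaves the diagonal, and therefore the four counts, unchanged.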


### Step 3: Plotting the ROC curve

In the ROC curve:

- The x-axis represents the False Positive Rate (FPR)
- The y-axis represents the True Positive Rate (TPR)

Each point on the ROC curve corresponds to a specific threshold value. By adjusting the threshold, you change the trade-off between TPR and FPR.

### Thresholding

By varying the threshold from 0 to 1, you obtain different pairs of TPR and FPR values, which together trace the curve. The threshold controls the trade-off between the sensitivity (TPR) and the specificity (1 - FPR) of the classifier:

- At a high threshold, the model will classify fewer instances as positive, leading to fewer false positives but also fewer true positives (and therefore more false negatives).

- At a low threshold, the model will classify more instances as positive, leading to more true positives but also increasing false positives.

In [None]:
# Compute TPR and FPR at varying thresholds
thresholds = []  # Create 100 thresholds between 0 and 1
tprs = []
fprs = []


# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fprs, tprs, color='blue', label='ROC curve')
plt.plot([0, 1], [0, 1], color='red', linestyle='--')  # Random guess line (diagonal)
plt.title('ROC Curve')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

# Calculate AUC (Area Under Curve)
roc_auc = auc(fprs, tprs)
print(f'Area Under the ROC Curve (AUC): {roc_auc}')
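One way to fill in the threshold sweep is sketched below. It is self-contained, so it recomputes the similarity matrix and the TPR/FPR pairs in one helper; the data-generation assumptions (uniform expected vectors, Gaussian perturbation) and the helper names `roc_points` and `trapezoid_auc` are ours. The AUC uses the trapezoidal rule over the points sorted by FPR, which is what `sklearn.metrics.auc` computes for a monotonic curve. Because cosine similarity lies in [-1, 1] and perturbed vectors can point away from each other, the sketch sweeps that full range rather than [0, 1], so the curve typically reaches both (0, 0) and (1, 1).

```python
import numpy as np

def roc_points(actual, expected, thresholds):
    """Return arrays (fprs, tprs), one ROC point per threshold."""
    a = actual / np.linalg.norm(actual, axis=1, keepdims=True)
    e = expected / np.linalg.norm(expected, axis=1, keepdims=True)
    sim = a @ e.T                              # pairwise cosine similarities
    truth = np.eye(*sim.shape, dtype=bool)     # same index = actual positive
    tprs, fprs = [], []
    for t in thresholds:
        pred = sim >= t
        tprs.append(np.sum(pred & truth) / np.sum(truth))
        fprs.append(np.sum(pred & ~truth) / np.sum(~truth))
    return np.array(fprs), np.array(tprs)

def trapezoid_auc(x, y):
    """Area under y(x) with the trapezoidal rule, after sorting by x, then y."""
    order = np.lexsort((y, x))                 # primary key x, ties broken by y
    x, y = x[order], y[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2))

# Usage with synthetic data (assumed noise model, perturbation_factor = 1)
rng = np.random.default_rng(42)
expected = rng.random((100, 100))
actual = expected + 1.0 * rng.normal(size=(100, 100))
thresholds = np.linspace(-1.0, 1.0, 201)
fprs, tprs = roc_points(actual, expected, thresholds)
print(f"AUC: {trapezoid_auc(fprs, tprs):.3f}")
```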


### Now reason about the following points:

- What happens when you vary the size of the vectors?
- What happens when you vary the perturbation factor?
- How can you cope with cases in which *we don't know* the ground truth (i.e., we don't know whether each actual result corresponds to the expected result at the same position)?

---

# Exercise 2: A simple router architecture

In this architecture, we use a Large Language Model (LLM) to interpret the user's instruction and route it to the appropriate task-specific prompt. This way, software engineering tasks such as generating use cases or class diagrams are each handled by a prompt tailored to the request.

The architecture is split into two main stages:

- Router LLM Stage:
  - The LLM analyzes the user's instruction and determines whether the task is about generating use cases or a class diagram.
  - It outputs a label that selects the prompt for the next stage.

- Task Execution LLM Stage:
  - Based on the router's decision, the LLM executes the required task by producing either:
    - A set of use cases, or
    - A UML class diagram.
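The two-stage flow can be sketched independently of any particular model. Here `llm` stands for any function that maps a prompt string to a completion; in this notebook, `lambda p: ask_llama(p, maxl=2000)` could play that role. The label strings, the prompt templates, and the names `TASK_PROMPTS`, `route_and_execute`, and `fake_llm` are hypothetical placeholders, not the prompts the exercise asks you to write.

```python
from typing import Callable, Dict

# Hypothetical task prompts, keyed by the label the router is expected to output
TASK_PROMPTS: Dict[str, str] = {
    "Use Cases": "List the user-goal use cases in the following requirements:\n{requirements}",
    "Class Diagram": "List the candidate classes in the following requirements:\n{requirements}",
}

def route_and_execute(instruction: str, requirements: str,
                      llm: Callable[[str], str]) -> str:
    """Stage 1 picks a label; stage 2 runs the prompt registered for that label."""
    router_prompt = (
        "Classify the instruction as exactly one of: "
        + ", ".join(TASK_PROMPTS)
        + f".\nInstruction: {instruction}\nLabel:"
    )
    label = llm(router_prompt).strip()
    if label not in TASK_PROMPTS:
        raise ValueError(f"Router returned an unknown label: {label!r}")
    return llm(TASK_PROMPTS[label].format(requirements=requirements))

def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model, used only to exercise the control flow."""
    if prompt.startswith("Classify"):
        instruction = prompt.split("Instruction:", 1)[1]
        return "Class Diagram" if "class" in instruction.lower() else "Use Cases"
    return "OUTPUT FOR: " + prompt.splitlines()[0]

print(route_and_execute("Draw the class diagram", "Visitors can browse trails.", fake_llm))
# OUTPUT FOR: List the candidate classes in the following requirements:
```

The strict `label not in TASK_PROMPTS` check keeps unexpected router output from silently picking a task; with a real model you will usually want to clean the raw completion before that check.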


In [None]:
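# Reference code for prompting Llama
from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login
import torch

# The meta-llama repositories are gated: accept the license on the Hugging Face Hub
# and log in with an access token before downloading the model
# login()

# Load the tokenizer and model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")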

model_id = "meta-llama/Llama-3.2-3B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)




# A helper to send any prompt to Llama
def ask_llama(prompt, maxl=200, temp=0.7):
    """
    Send a prompt to the Llama model and get a response.

    Args:
    - prompt (str): The input question or statement to the model.
    - maxl (int): The maximum total length in tokens (prompt + response).
    - temp (float): Sampling temperature; lower values reduce randomness.

    Returns:
    - str: The model's generated response, without the echoed prompt.
    """
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate the output
    outputs = model.generate(
        inputs["input_ids"],                      # Tokenized input
        attention_mask=inputs["attention_mask"],  # All prompt tokens are real (no padding)
        max_length=maxl,                          # Limit on prompt + generated tokens
        temperature=temp,                         # Sampling temperature
        do_sample=True,                           # Enable sampling (temperature only applies when sampling)
        pad_token_id=tokenizer.eos_token_id,      # The Llama tokenizer has no pad token; reuse EOS
    )

    # Decode only the newly generated tokens (the output also contains the prompt)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example usage
prompt = """
System: You are an expert on world capitals.
Respond with only the capital city of the given country. Do not repeat the question.

Query: What is the capital of France?
Answer:
"""

response = ask_llama(prompt)

print(f"Prompt: {prompt}\nResponse: {response}")



### Step 1: Router LLM - Decide the next step
The first LLM call analyzes the user's instruction and decides which prompt to use next. For simplicity, the router here only decides which type of artifact to create, answering with one of two labels:

- `Use Cases` (a set of use cases)
- `Class Diagram`
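The router prompt itself is left for you to write. As one hedged starting point: `meta-llama/Llama-3.2-3B` is a base model, which continues text rather than following chat-style instructions, so a completion-style few-shot layout that ends with `Label:` is a common way to elicit a single label. The few-shot examples and the name `build_router_prompt` below are hypothetical; expect the model to keep generating after the label, so keep only the first line of its answer.

```python
# Hypothetical few-shot examples; replace them with instructions relevant to your case
FEW_SHOT = [
    ("Identify the actors and the goals they want to achieve", "Use Cases"),
    ("Which entities does the system need to store?", "Class Diagram"),
]

def build_router_prompt(user_instruction: str, requirements: str) -> str:
    """Build a completion-style router prompt that ends right before the label."""
    lines = [
        "You route software engineering requests.",
        'Answer with exactly one label: "Use Cases" or "Class Diagram".',
        "",
        f"Requirements:\n{requirements}",
        "",
    ]
    for example, label in FEW_SHOT:
        lines.append(f"Instruction: {example}\nLabel: {label}\n")
    lines.append(f"Instruction: {user_instruction}\nLabel:")
    return "\n".join(lines)

print(build_router_prompt("Generate the class diagram", "Visitors can browse hiking trails."))
```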

In [None]:
# Requirements document (the space before each trailing backslash keeps sentences separated when the lines are joined)
requirements_text = "The proposed platform is designed to enhance the hiking experience for various user groups, including visitors, local guides, platform managers, and hut workers. The platform provides a centralized repository of hiking routes, hut information, and parking facilities. It also enables interactive features such as real-time hike tracking, personalized recommendations, and group hike planning. By combining these capabilities, the platform seeks to foster safe, informed, and collaborative hiking experiences. \
The platform will be deployed as a cloud-based web and mobile application accessible to all stakeholders. The distribution strategy includes an app available on major mobile operating systems, such as iOS and Android, alongside a responsive web interface. It will require an internet connection for features like real-time tracking, notifications, and user authentication, though some offline capabilities, such as pre-downloaded hike information, will also be available. \
User authentication will be role-based, ensuring that only authorized users, such as verified hut workers and platform managers, can access sensitive or administrative features. \
Visitors are the primary users of the platform. They can browse a comprehensive list of hiking trails, filter them based on specific criteria such as difficulty, length, or starting point, and view detailed descriptions. To access advanced features like personalized recommendations, visitors can create user accounts by registering on the platform. Registered users can record their fitness parameters, enabling the system to suggest trails tailored to their capabilities. \
During a hike, visitors can record their progress by marking reference points and sharing their live location through a broadcasting URL. They can also initiate group activities by planning hikes, adding group members, and confirming group participation. The platform allows visitors to start, terminate, and track their hikes, with notifications for unfinished hikes or late group members to ensure safety and accountability. \
Local guides enrich the platform by contributing essential information. They can add detailed descriptions of hikes, parking facilities, and huts, ensuring hikers have accurate and comprehensive data. Local guides also link parking lots and huts to specific trails as starting or arrival points, enhancing the planning process. \
To aid in the visual representation and accessibility of information, local guides can upload pictures of huts and connect these locations directly to hikes. This integration simplifies route planning and helps visitors visualize their journey. \
Platform managers oversee the operational integrity and safety of the platform. They verify new hut worker registrations, ensuring that only authorized personnel can update hut-related data. Managers can also broadcast weather alerts for specific areas, notifying all hikers in those regions through push notifications. This ensures that users stay informed about potentially hazardous conditions. \
The platform manager's role includes maintaining an organized and secure user system while facilitating collaboration between local guides, hut workers, and visitors. \
Hut workers are critical to the maintenance of up-to-date trail and accommodation information. After registering and being verified, hut workers can log into the platform to add or update information about their assigned huts, including uploading pictures and describing the facilities available. They can also monitor and report on the condition of nearby trails, ensuring hikers receive current information. \
Hut workers play a vital role in providing situational updates for hikers. For instance, if a nearby trail is impacted by severe weather or physical obstructions, they can communicate these conditions through the platform. This enhances the safety and preparedness of all hikers relying on the platform."


user_instruction = f"""     """


prompt_router = f"""    """


#the response here will be only used for guiding the generation of the next prompt.
#it must be one of two alternatives: "Use Cases" or "Class Diagram"
response = ask_llama(prompt_router, maxl = 2000)

print(response)

### Step 2: Task-Specific LLM - Generate Output
The second LLM agent, based on the decision of the router, will either:

- Generate use cases, or
- Generate a class diagram

In use case diagram design, the primary components typically include actors, which are entities interacting with the system, and use cases, which represent the goals or tasks the actors want to achieve. The diagram focuses on the interactions between these actors and the system, illustrating the functional requirements of the system from a user perspective. For this simplified example, we are focusing only on user-goal use cases (i.e., main functions of the system).

In class diagram design, typically, the primary elements extracted are classes, their attributes, methods, and the relationships between them. Classes represent entities within the system, and attributes define their properties or characteristics. Methods outline the actions or operations that can be performed on or by a class. Additionally, relationships like associations, inheritance, and dependencies are represented to show how different classes interact with one another. For this simplified example, we are focusing only on the classes, leaving the recognition of individual attributes to other prompts.
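Both task prompts ask the model for a list (of use cases or of classes). If the prompt requests one item per line, a small parser can turn the raw completion into clean items before formatting it for readability. The sketch assumes bullet (`-`, `*`) or numbered (`1.`, `1)`) lines and ignores everything else; `parse_list_items` is a hypothetical helper name.

```python
import re

# A list line: optional indentation, a bullet or a number followed by "." or ")", then text
ITEM_PATTERN = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+(.*\S)\s*$")

def parse_list_items(text: str) -> list[str]:
    """Extract list items from an LLM completion, dropping case-insensitive duplicates."""
    items = []
    seen = set()
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            item = match.group(1)
            if item.lower() not in seen:  # the model may repeat items
                seen.add(item.lower())
                items.append(item)
    return items

raw = """Here are the classes:
1. Visitor
2. Hike
- Hut
* Hike
Some trailing text"""
print(parse_list_items(raw))  # ['Visitor', 'Hike', 'Hut']
```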

In [None]:

import re
import string

#remove all non textual characters from the response
fixed_response = ""

print(fixed_response)


if fixed_response == "Use Cases":

    prompt_use_cases = f"""    """

    response_uc = ask_llama(prompt_use_cases, maxl=2000)

    # Format the response for readability
    response = ""
    print(response)

elif fixed_response == "Class Diagram":

    prompt_class_diagram = f"""    """

    response_cd = ask_llama(prompt_class_diagram, maxl=2000)

    # Format the response for readability
    response = ""
    print(response)

else:

    print(f"Unrecognized routing decision: {fixed_response!r}")
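The router's raw completion may contain punctuation, extra words, or (with a base model) further generated lines, so an exact comparison with `"Use Cases"` can fail. A minimal sketch of the cleanup step, assuming the label appears on the first non-empty line: keep only letters and spaces, collapse whitespace, and match case-insensitively against the two labels. `normalize_route` is a hypothetical name; it returns `None` when no label is found, so the `else` branch still handles unrecognized output.

```python
import re
from typing import Optional

LABELS = ("Use Cases", "Class Diagram")

def normalize_route(response: str) -> Optional[str]:
    """Map a raw router completion to one of LABELS, or None if no label is found."""
    # Use only the first non-empty line; later lines may be continued generation
    first_line = next((ln for ln in response.splitlines() if ln.strip()), "")
    letters_only = re.sub(r"[^A-Za-z ]+", " ", first_line)  # drop non-letter characters
    cleaned = " ".join(letters_only.split()).lower()         # collapse whitespace
    for label in LABELS:
        if label.lower() in cleaned:
            return label
    return None

print(normalize_route('  "Class Diagram."\nInstruction: ...'))  # Class Diagram
```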


### Step 3: Reasoning

Now reason about the following points:
- How can I evaluate the results?
- How can I extend the prompts to cover other aspects of use case and class diagrams?
- Try running the prompts with ChatGPT. What results do you get?