## Evaluation Metrics

This notebook illustrates how two evaluation metrics, the BLEU score and cosine similarity, were applied to measure the similarity between a reference output and a generated output.

### Installing NLTK Library

In [12]:
pip install nltk

Note: you may need to restart the kernel to use updated packages.


### Installing OpenAI Library

Version 0.27.0 of the OpenAI Python library was pinned because the method used below to retrieve embeddings, `openai.Embedding.create`, is no longer available in versions 1.0 and later of the library.

In [2]:
pip install openai==0.27.0

Note: you may need to restart the kernel to use updated packages.
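
For readers who cannot pin the older release, the same request can be written against the client interface introduced in `openai` 1.0. The following is a minimal sketch, not part of the original notebook: the function name `get_embedding_v1` is hypothetical, and it assumes the API key is supplied through the `OPENAI_API_KEY` environment variable.

```python
def get_embedding_v1(text, model="text-embedding-3-large"):
    """Return the embedding for `text` using the openai>=1.0 client interface.

    Hypothetical helper for illustration; the notebook itself uses
    openai==0.27.0 and openai.Embedding.create.
    """
    # Imported inside the function so the definition loads even where the
    # newer package is not installed.
    from openai import OpenAI

    # With no arguments, the client reads the key from OPENAI_API_KEY.
    client = OpenAI()
    response = client.embeddings.create(model=model, input=[text])

    # The response exposes attributes rather than dictionary keys.
    return response.data[0].embedding  # a plain list of floats
```

The returned list can be wrapped in `np.array(...)` if the rest of the notebook expects a NumPy array.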


### Importing the Necessary Functions

In [13]:
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.tokenize import word_tokenize
import openai
import numpy as np

In [14]:
# Ensure that NLTK tokenizers are downloaded
nltk.download('punkt')

[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

The following sections demonstrate how the BLEU and cosine similarity scores were used to measure the similarity between the two documents.

### Step 1: Creating a reference document

A reference document was created with the information available in the dataset.

In [7]:
# Reference Document
reference_document = """
Activity Site Entrance

Hazard: Unauthorized access 
Initial Risk Level: L4, S3, R12
Identified at Risk Security Personnel, Visitors
Residual Risk: L4, S1, R4

Control Measures Safety induction and visible signage must be present

Hazard: Struck by vehicle
Initial Risk Level: L4, S3, R12
Identified at Risk Security Personnel, Visitors
Residual Risk: L4, S3, R12

Control Measures Safety induction and visible signage must be present
"""

### Step 2: Converting Generated Output to a string

The document generated by the models was stored as a string so that it could be compared with the reference document.

In [20]:
# Generated Document
generated_document = """
Site Entrance Risk Assessment Document
1. Introduction
This document outlines the hazards associated with the site entrance of [Project Name], the level of risks, the individuals or groups identified at risk, the control measures implemented to mitigate these risks, and the level of residual risk after the implementation of control measures.
2. Hazards Identified
Slip, Trip, and Fall Hazards due to uneven or wet surfaces.
Vehicle-Pedestrian Interaction increasing the risk of accidents and injuries.
Unauthorized Access leading to potential safety or security breaches.
Poor Visibility at the entrance can cause accidents.
Dust and Air Pollution creating health issues for workers and visitors.
3. Risk Level
Slip, Trip, and Fall Hazards: High
Vehicle-Pedestrian Interaction: High
Unauthorized Access: Medium
Poor Visibility: Medium
Dust and Air Pollution: Low
4. Identified at Risk
Site Workers
Visitors
Delivery Personnel
Pedestrians passing the site entrance
5. Control Measures
For Slip, Trip, and Fall Hazards
Regular inspection and maintenance of entrance pathways.
Adequate signage indicating uneven surfaces.
Installation of anti-slip floor mats during wet conditions.
For Vehicle-Pedestrian Interaction
Designated pedestrian walkways separated from vehicle lanes.
Speed limits imposed and visibly marked.
Brightly colored pedestrian crossings with appropriate signage.
For Unauthorized Access
Secure fencing around the site perimeter.
Security personnel stationed at the entrance.
Visitor sign-in protocol and provision of visitor badges.
For Poor Visibility
Ample lighting installed at the site entrance.
Reflective signs and markings used to enhance visibility.
Mirrors installed at blind spots.
For Dust and Air Pollution
Water spraying systems to minimize dust.
Mandatory use of personal protective equipment (PPE) such as masks.
Regular monitoring of air quality at the site entrance.
6. Residual Risk
Slip, Trip, and Fall Hazards: Reduced to Low with regular monitoring and maintenance.
Vehicle-Pedestrian Interaction: Reduced to Low with strict enforcement of pedestrian and vehicle separation.
Unauthorized Access: Reduced to Low with effective security measures.
Poor Visibility: Remains at Medium without constant monitoring and maintenance of visibility enhancements.
Dust and Air Pollution: Remains at Low with effective use of control measures but may fluctuate based on site activity and weather conditions.
7. Review and Monitoring
This risk assessment will be reviewed every 6 months or following any significant changes at the site entrance or in regulations. Continuous monitoring of the entrance area is essential to ensure that control measures are appropriately maintained and adapted as necessary.
8. Conclusion
The control measures, when efficiently implemented and maintained, significantly reduce the risk levels at the site entrance. Ongoing vigilance, adherence to safety protocols, and regular review of these measures are paramount to ensuring the safety and health of all individuals entering or exiting the site."""


### Step 3: Calculate the BLEU Score

In [15]:
# Tokenize the reference and generated documents into words after converting them to lowercase.
reference_tokens = word_tokenize(reference_document.lower())
generated_tokens = word_tokenize(generated_document.lower())

# Calculate the BLEU score using a smoothing function
# Smoothing is used to avoid the BLEU score being unduly penalized by zero counts in the n-gram comparisons,
# which often happens with shorter texts or texts with less common lexical items.
smoothie = SmoothingFunction().method4
bleu_score = sentence_bleu([reference_tokens], generated_tokens, smoothing_function=smoothie)

# Print the BLEU score, formatted to four decimal places.
# The BLEU score gives a numerical indication of the generated text's similarity to the reference text,
# with a score of 1 indicating perfect overlap (identical text) and scores approaching zero indicating less similarity.
print(f"BLEU score: {bleu_score:.4f}")

BLEU score: 0.0108
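
To make the role of smoothing concrete, the sketch below computes an unsmoothed BLEU score by hand using only the standard library. It is a simplified illustration and not NLTK's implementation: the helper names (`ngrams`, `clipped_precision`, `simple_bleu`) and the two short example sentences are assumptions made for this example, and NLTK's `method4` smoothing is not reproduced.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(reference, candidate, n):
    """Modified n-gram precision: each candidate n-gram count is clipped
    to the number of times that n-gram appears in the reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def simple_bleu(reference, candidate, max_n=4):
    """Unsmoothed BLEU against a single reference (illustrative only)."""
    precisions = [clipped_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        # One zero precision makes the geometric mean zero without smoothing.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: only candidates shorter than the reference are penalised.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Short illustrative sentences, tokenized by whitespace for simplicity.
reference = "safety induction and visible signage must be present".split()
candidate = "visible signage and a safety induction must be present at the gate".split()

for n in range(1, 5):
    print(f"{n}-gram precision: {clipped_precision(reference, candidate, n):.3f}")
print("Unsmoothed BLEU:", simple_bleu(reference, candidate))
```

In this example the candidate shares most of its words with the reference but no 4-gram, so the unsmoothed score is zero. This is the situation that the smoothing function passed to `sentence_bleu` is meant to soften.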


### Step 4: Calculate the Cosine Similarity Score

In [17]:
# Set your OpenAI API key
openai.api_key = '' # API Key has been removed for privacy concerns

# Function to get embeddings from OpenAI (defaults to the text-embedding-3-large model)
def get_embedding(text, model="text-embedding-3-large"):
    # This line sends a request to the OpenAI API to create an embedding for the provided text.
    # The 'model' parameter specifies which OpenAI embedding model to use.
    response = openai.Embedding.create(model=model, input=[text])
    
    # This line extracts the embedding vector from the API response.
    # The embedding is stored in the 'data' field of the response, at the first index.
    embedding = response['data'][0]['embedding']
    
    # Convert the list of embedding values into a NumPy array for easier manipulation and use in further calculations.
    return np.array(embedding)

In [18]:
# Get embeddings
embedding1 = get_embedding(reference_document)
embedding2 = get_embedding(generated_document)

# Calculate cosine similarity
cosine_sim = np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))
print(f"Cosine Similarity: {cosine_sim:.4f}")

Cosine Similarity: 0.7186
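
Cosine similarity depends only on the angle between two vectors, not on their lengths. The sketch below shows this on small hand-written vectors instead of real embeddings. It uses plain Python so that it runs without NumPy or an API key; the function name `cosine_similarity` and the toy vectors are assumptions for illustration, and the formula is the same one used in the `np.dot` expression above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]

# Same direction, different length: the score is at its maximum.
print(f"{cosine_similarity(v, [2.0, 4.0, 6.0]):.4f}")

# Perpendicular vectors: no similarity.
print(f"{cosine_similarity([1.0, 0.0], [0.0, 1.0]):.4f}")

# Opposite direction: the score is at its minimum.
print(f"{cosine_similarity(v, [-1.0, -2.0, -3.0]):.4f}")
```

Because vector length is ignored, a long generated document and a short reference can still score highly when their embeddings point in a similar direction. This may help explain why the cosine score above is much higher than the BLEU score for the same pair of texts.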


The BLEU and cosine similarity scores were calculated separately for each activity in the Risk Assessment Document and the Job Safety Analysis. For each activity, the reference and generated documents were replaced with the texts for that activity and the scores were recalculated, so that every pair of scores reflects a single activity.
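
The per-activity procedure could also be expressed as a loop instead of manual re-runs. The sketch below is one possible arrangement, not the workflow used in this notebook. The names `score_activities` and `token_overlap`, the shortened texts, and the second activity are all hypothetical. The placeholder metric is there only so the example runs without NLTK or an API key; in practice the metric functions would wrap `sentence_bleu` and the embedding-based cosine similarity.

```python
def score_activities(pairs, metrics):
    """Apply every metric to every (reference, generated) pair.

    pairs:   dict mapping an activity name to a (reference, generated) tuple
    metrics: dict mapping a metric name to a function(reference, generated) -> float
    """
    results = {}
    for activity, (reference, generated) in pairs.items():
        results[activity] = {
            name: metric(reference, generated) for name, metric in metrics.items()
        }
    return results


def token_overlap(reference, generated):
    """Placeholder metric: Jaccard overlap of lowercase whitespace tokens."""
    ref = set(reference.lower().split())
    gen = set(generated.lower().split())
    union = ref | gen
    return len(ref & gen) / len(union) if union else 0.0


# Hypothetical, shortened texts for illustration only.
pairs = {
    "Site Entrance": ("Hazard: Unauthorized access", "Unauthorized access at the entrance"),
    "Hypothetical Activity B": ("Hazard: Struck by vehicle", "Vehicle strikes a pedestrian"),
}

scores = score_activities(pairs, {"token_overlap": token_overlap})
for activity, row in scores.items():
    print(activity, row)
```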