# What

As established in [#144](https://github.com/1jamesthompson1/TAIC-report-summary/issues/144), now that we have the safety issues dataset from #141, it is possible to generate the safety themes from the safety issues alone.

This is supposed to be a quick example of how it could be done and what this dataset could be used for.

## Modules

To make this document easy to move, all the modules it needs are listed here.

In [1]:
# from engine
from engine.OpenAICaller import openAICaller

from engine.Extract_Analyze import ThemeGenerator

# third party
import yaml
import pandas as pd

# built in
import os
import importlib


importlib.reload(ThemeGenerator)


<module 'engine.Extract_Analyze.ThemeGenerator' from '/home/james/code/TAIC-report-summary/engine/Extract_Analyze/ThemeGenerator.py'>

# Get data

I am going to read all of the safety issues from the reports.

In [2]:
output_path = "output"

safety_issues = []

all_reports = [r for r in os.listdir(output_path) if os.path.isdir(os.path.join(output_path, r))] 

for report_id in all_reports:
    
    safety_issue_path = os.path.join(output_path, report_id, f"{report_id}_safety_issues.yaml")

    if not os.path.exists(safety_issue_path):
        continue

    with open(safety_issue_path, "r") as f:
        si = yaml.safe_load(f)

    safety_issues.append({
        'report_id': report_id,
        'si': si
    })

# Use GPT-4 to generate safety themes

I am now going to run the safety issues through an LLM.

## Prompt preparation

In [3]:
safety_issues_str = ""

for si in safety_issues:
    si_str = "\n".join([i['safety_issue'] for i in si['si']])
            
    safety_issues_str += f"Report ID: {si['report_id']}\n{si_str}\n"
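
To make the layout of the prompt text concrete, here is a minimal sketch that runs the same loop over two made-up reports. The report IDs and issue texts are hypothetical placeholders, and `build_prompt_block` is an illustrative helper name, not part of the engine.

```python
# Hypothetical stand-in for the `safety_issues` list built above.
sample_safety_issues = [
    {
        "report_id": "report_A",
        "si": [{"safety_issue": "First issue."}, {"safety_issue": "Second issue."}],
    },
    {
        "report_id": "report_B",
        "si": [{"safety_issue": "Third issue."}],
    },
]


def build_prompt_block(safety_issues):
    """Join every report's safety issues into one prompt string."""
    blocks = []
    for report in safety_issues:
        # One safety issue per line, under a "Report ID" header line.
        issues = "\n".join(item["safety_issue"] for item in report["si"])
        blocks.append(f"Report ID: {report['report_id']}\n{issues}\n")
    return "".join(blocks)


sample_prompt = build_prompt_block(sample_safety_issues)
print(sample_prompt)
```

Collecting the blocks in a list and joining once gives the same string as repeated `+=`, and keeps the formatting in one place if the layout needs to change.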

## Sending to model

In [4]:
importlib.reload(ThemeGenerator)

themeGenerator = ThemeGenerator.ThemeGenerator(output_path, {"folder_name": "{{report_id}}", "themes_file_name": "{{report_id}}_themes.yaml"}, ['aviation', 'rail', 'marine'], False)

## Getting safety themes

In [7]:
safety_theme_example = themeGenerator._get_safety_themes_from_reports(safety_issues_str)

In [None]:
safety_themes = [themeGenerator._get_safety_themes_from_reports(safety_issues_str) for _ in range(3)]


In [None]:
# Format each set of themes as text so the model can average the sets

safety_themes_str = [
    '\n'.join([f"{safety_theme['title'].strip()} for modes {', '.join(safety_theme['modes'])}:\n {safety_theme['description']}" for safety_theme in safety_themes_set])
    for safety_themes_set in safety_themes
]

safety_themes_str = '\n\n'.join([f"Safety themes version\n'''\n{themes}\n'''\n" for themes in safety_themes_str])

print(safety_themes_str)


In [142]:

average_safety_theme = openAICaller.query(
    system="""
You are going to help me summarize the given source text.
        """,

    user=f"""
Here is the text to summarize:
'''
{safety_themes_str}
'''

Here are three versions of the safety themes extracted from a collection of accident investigation reports.

I need you to return one set that is the average of all of the safety theme versions. This means there should be about 15 final safety themes.

Your output needs to be in yaml format. Just output the yaml structure with no extra text (This means no ```yaml and ```) . It will look something like this:
- title: |-
    title of the theme goes here
  description: |
    Multi line description of the theme goes here.
  modes:
    - modes that should be included. One per row (Each mode is just one letter (a,r,m))


=Here are some definitions=

Safety factor - Any (non-trivial) events or conditions, which increases safety risk. If they occurred in the future, these would
increase the likelihood of an occurrence, and/or the
severity of any adverse consequences associated with the
occurrence.

Safety issue - A safety factor that:
• can reasonably be regarded as having the
potential to adversely affect the safety of future
operations, and
• is characteristic of an organization, a system, or an
operational environment at a specific point in time.
Safety Issues are derived from safety factors classified
either as Risk Controls or Organizational Influences.

Safety theme - Indication of recurring circumstances or causes, either across transport modes or over time. A safety theme may
cover a single safety issue, or two or more related safety
issues.    

        """,
    model = 'gpt-4',
    temp=0
)
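try:
    average_safety_theme = yaml.safe_load(average_safety_theme)
except yaml.YAMLError:
    # Keep the raw response so it can be inspected.
    print("Can't parse")
    print(average_safety_theme)
else:
    # Only write the file when the response parsed cleanly.
    with open("safety_themes.yaml", "w") as f:
        yaml.safe_dump(average_safety_theme, f)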
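
The prompt asks the model not to wrap its answer in code fences, but a model can still add them, and the YAML parse then fails. A minimal sketch of a pre-processing step, assuming the only wrapping is one opening fence line and one closing fence line (`strip_code_fences` is a hypothetical helper, not part of the engine):

```python
import re


def strip_code_fences(text):
    """Remove one leading fence line and one trailing fence line, if present."""
    text = text.strip()
    # Drop an opening fence such as "```yaml" together with its newline.
    text = re.sub(r"^```[a-zA-Z]*\s*\n", "", text)
    # Drop a closing fence that sits on its own at the very end.
    text = re.sub(r"\n```\s*$", "", text)
    return text


# Hypothetical model reply that ignored the "no fences" instruction.
fenced_reply = "```yaml\n- title: Example theme\n  modes:\n    - a\n```"
print(strip_code_fences(fenced_reply))
```

Text without fences passes through unchanged apart from surrounding whitespace, so the helper could be applied to every response before it is handed to `yaml.safe_load`.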

In [None]:
with open("safety_themes.yaml", "r") as f:
    safety_themes = yaml.safe_load(f)

print(len(safety_themes))

safety_themes

## Grouping safety themes

In [148]:
safety_theme_groups = themeGenerator._group_safety_themes(safety_themes)

In [11]:
# Strip the trailing newline from each theme title

for theme in safety_theme_example:
    theme['title'] = theme['title'].strip()


safety_theme_example

[{'title': 'Non-compliance with Procedures',
  'description': 'This theme involves instances where individuals or teams do not adhere to established safety procedures, protocols, or regulations. This can include failure to follow standard operating procedures, maintenance protocols, or regulatory requirements. Non-compliance can lead to unsafe conditions, increased risk of accidents, and potential harm. It underscores the importance of strict adherence to safety guidelines and the need for robust oversight and training to ensure compliance.\n',
  'modes': ['a', 'r', 'm']},
 {'title': 'Inadequate Communication',
  'description': 'Inadequate communication refers to failures or deficiencies in the exchange of critical information among team members, between different teams, or with external entities. This can include miscommunication, lack of clarity, or failure to share important safety-related information. Such communication failures can lead to misunderstandings, misaligned actions, an

In [12]:


safety_theme_groups_example = themeGenerator._group_safety_themes(safety_theme_example)

In [149]:
safety_theme_groups

[{'title': 'Human Factors and Training',
  'description': 'This group focuses on the human element in operational safety, emphasizing the importance of training, non-technical skills, and managing human vulnerabilities such as fatigue.\n',
  'themes': ['Inadequate Training and Familiarization',
   'Human Factors and Non-Technical Skills',
   'Fatigue Management']},
 {'title': 'Safety Management and Organizational Practices',
  'description': 'Themes in this group relate to the overarching systems and cultural practices within organizations that support or undermine safety, including the management of safety information and the cultivation of a safety culture.\n',
  'themes': ['Safety Management Systems Deficiencies',
   'Safety Culture and Organizational Influences',
   'Regulatory and Oversight Deficiencies']},
 {'title': 'Operational and Technical Safeguards',
  'description': 'This group encompasses the technical and procedural aspects of safety, focusing on the design, maintenance,

## Add groups to original safety themes

In [14]:
# Add a field to each safety theme that is the group it belongs to

def combine_groups_with_themes(safety_theme_groups, safety_themes):

    safety_themes_with_groups = []

    for theme in safety_themes:
        for group in safety_theme_groups:
            if theme['title'] in group['themes']:
                theme['group'] = group['title']
                safety_themes_with_groups.append(theme)
                break

    return safety_themes_with_groups
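
`combine_groups_with_themes` matches titles exactly, so a theme whose title the model reworded, or left with stray whitespace, matches no group and is silently dropped from the result. A minimal sketch of a check that lists those themes; `find_ungrouped_themes` and the toy data are hypothetical and only illustrate the idea:

```python
def find_ungrouped_themes(safety_theme_groups, safety_themes):
    """Return the titles of themes that no group lists exactly."""
    grouped_titles = {
        title for group in safety_theme_groups for title in group["themes"]
    }
    return [
        theme["title"]
        for theme in safety_themes
        if theme["title"] not in grouped_titles
    ]


# Hypothetical toy data: the second title differs in case and trailing space.
toy_themes = [{"title": "Fatigue Management"}, {"title": "Fatigue management "}]
toy_groups = [
    {"title": "Human Factors and Training", "themes": ["Fatigue Management"]}
]

ungrouped = find_ungrouped_themes(toy_groups, toy_themes)
print(ungrouped)
```

Running a check like this before exporting would show whether any themes are missing from the spreadsheet.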

In [15]:
safety_theme_example_with_groups = combine_groups_with_themes(safety_theme_groups_example, safety_theme_example)

# Prepare xlsx spreadsheet to share with Chris and Ingrid

I need to give them a spreadsheet so that they can appreciate what the results are.

In [13]:
# First page has all the safety themes
# Second page has the groups

def send_to_excel(safety_themes_with_groups, safety_theme_groups, file_name):

    # Write to the requested file; the context manager saves and closes the workbook.
    with pd.ExcelWriter(file_name, engine="openpyxl") as writer:

        pd.DataFrame(safety_themes_with_groups)[['title', 'description', 'group']].to_excel(writer, sheet_name="Safety Themes", index=False)

        pd.DataFrame(safety_theme_groups).to_excel(writer, sheet_name="Safety Theme Groups", index=False)

In [16]:
send_to_excel(safety_theme_example_with_groups, safety_theme_groups_example, "safety_themes.xlsx")

After running this several times, I am noticing that the generated themes vary a bit between runs.

I would like to take an average of the runs.
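
One way to put a number on how much the runs vary is to compare the theme titles between each pair of runs. The sketch below uses Jaccard overlap on normalised titles. This is an assumption about what "varies" should mean here, and it undercounts agreement when the model rewords a title. The `runs` data is made up for illustration.

```python
from itertools import combinations


def title_overlap(themes_a, themes_b):
    """Jaccard overlap of theme titles (1.0 means identical title sets)."""
    titles_a = {theme["title"].strip().lower() for theme in themes_a}
    titles_b = {theme["title"].strip().lower() for theme in themes_b}
    union = titles_a | titles_b
    # Two empty runs are treated as identical.
    return len(titles_a & titles_b) / len(union) if union else 1.0


# Hypothetical output of three runs, reduced to titles only.
runs = [
    [{"title": "Inadequate Communication"}, {"title": "Fatigue Management"}],
    [{"title": "Inadequate Communication"}, {"title": "Training Gaps"}],
    [{"title": "Inadequate Communication"}, {"title": "Fatigue Management"}],
]

# Compare every pair of runs once.
for (i, run_a), (j, run_b) in combinations(enumerate(runs), 2):
    print(f"run {i} vs run {j}: {title_overlap(run_a, run_b):.2f}")
```

A consistently low overlap would suggest that averaging over more than three runs is worthwhile.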