# Summarizing Data for Results Graphics in "R2: Perceived risks and benefits qualitative"

There are two central steps involved:


**Check Szenario Texts:**
- to check if topic was already mentioned for the category specific graph

**Summarizing Data:**
- for A, B graph
- for category specific graph



***
**Coding sources**

Code snippets from my (Julius Fenn) Large Language Model Workshop: https://github.com/FennStatistics/introductory-workshop-in-LLMs/


I extend the code provided and explained in the following documenations:
* Build a PDF ingestion and Question/Answering system: https://python.langchain.com/v0.2/docs/tutorials/pdf_qa/

* https://python.langchain.com/v0.2/docs/tutorials/llm_chain/ (Build a Simple LLM Application with LCEL)
* https://python.langchain.com/v0.2/docs/how_to/structured_output/ (How to return structured data from a model)
* https://python.langchain.com/v0.2/docs/how_to/llm_token_usage_tracking/ (How to track token usage for LLMs)


***
## If you facing issues running your Code:

It could be the case that chroma and langchain-core are not compatible.

In [1]:
## run in your terminal; could be necessary to downgrade package:
# pip uninstall langchain-core
# pip install langchain-core==0.3.10

## or install older version of chromadb:
# pip install --upgrade chromadb==0.5.0

## Get API, local supabase server key(s)

In [2]:
import os
import sys

# Assuming 'src' is one level down (in the current directory or a subdirectory)
path_to_src = os.path.join('src')  # Moves one level down to 'src' folder

# Add the path to sys.path
sys.path.append(path_to_src)

# Now you can import your API_key module
import API_key as key

## include self-written functions

In [3]:
import src.forChromaApproach as di_drg

# Check Szenario Texts

## Load Scenario Texts
This should be the final scenario texts in English of the two robots:

* rescue robot
* socially assistive robot

In [4]:
path_to_PDFs = os.path.join('data/scenario texts')  # Moves one level up to 'PDFs' folder

pdf_pages = di_drg.load_pdfs_by_filename(path_to_PDFs, verbose=False)

# Optional: Print the loaded pages by filename
for filename, pages in pdf_pages.items():
    print(f"\nPDF: {filename}")
    print(f"Total Pages: {len(pages)}")
    # print(pages[0])


PDF: rescue robot.pdf
Total Pages: 6

PDF: socially assistive robot.pdf
Total Pages: 6


## Data Storage: Text chunks are converted into vector embeddings and stored in a vector database (Vector DB) next to their respective text chunks.

In [5]:
pdf_chunks = di_drg.split_pdf_pages_into_chunks(pdf_pages, chunk_size=500, chunk_overlap=150, verbose=False)

# Optional: Print a summary of chunks created per PDF
for filename, chunks in pdf_chunks.items():
    print(f"\nPDF: {filename}")
    print(f"Total Chunks: {len(chunks)}")


PDF: rescue robot.pdf
Total Chunks: 15

PDF: socially assistive robot.pdf
Total Chunks: 16


In [6]:
path_to_Chroma = os.path.join('DB_Chroma')  # Moves one level up to 'PDFs' folder

sources_DB = di_drg.inspect_chrom(CHROMA_PATH=path_to_Chroma, openAI_key=key.openAI_key)
print("Number of sources in DB:", len(sources_DB))
print("\nSources:\n", sources_DB)

# Remove the "PDFs\\" prefix from all entries
cleaned_sources_DB = [pdf.replace('PDFs\\', '').replace('data/scenario texts\\', '') for pdf in sources_DB]

# Print the result
print("\nCleaned sources:\n", cleaned_sources_DB)

  db = Chroma(persist_directory=CHROMA_PATH, embedding_function=OpenAIEmbeddings(api_key=openAI_key))


Number of sources in DB: 2

Sources:
 ['data/scenario texts\\rescue robot.pdf', 'data/scenario texts\\socially assistive robot.pdf']

Cleaned sources:
 ['rescue robot.pdf', 'socially assistive robot.pdf']


In [7]:
# if you want to remove your DB:
## di_drg.remove_chrom(CHROMA_PATH=path_to_Chroma)

# pdf_chunks is a dictionary as such we can run over the keys:
for pdf in pdf_chunks.keys():
    if pdf not in cleaned_sources_DB:
        print(f"The PDF '{pdf}' is not included in the DB, as such:")
        print("create DB for", pdf)
        di_drg.save_to_chrom(chunks=pdf_chunks[pdf], CHROMA_PATH=path_to_Chroma, openAI_key=key.openAI_key)

In [8]:
sources_DB = di_drg.inspect_chrom(CHROMA_PATH=path_to_Chroma, openAI_key=key.openAI_key)
print("Number of sources in DB:", len(sources_DB))
print("\nSources:\n", sources_DB)

Number of sources in DB: 2

Sources:
 ['data/scenario texts\\rescue robot.pdf', 'data/scenario texts\\socially assistive robot.pdf']


## Data Retrieval and Generation

your prompt template (system message):

In [9]:
PROMPT_TEMPLATE = """
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"

---

Answer the question based on the above context: {question}
"""

In [10]:
# Question query 2
question = """
What are the advantages of soft robots?
"""

In [11]:
response, source_page_pairs, filtered_hits, all_hits = di_drg.retrieveGenerate(query_text=question, prompt_template=PROMPT_TEMPLATE, openAI_key=key.openAI_key, chroma_path=path_to_Chroma, 
                                                                            docsReturn=10, thresholdSimilarity=0.8)

Number of requested results 150 is greater than number of elements in index 31, updating n_results = 31


Number of possible relevant text chunks found with a threshold similarity of 0.8: 11
Query: Human: 
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "

"
    "for individuals with autism, assisting them to recognize and understand 
emotions in others, which can enhance interpersonal communication skills  
●  Less likely to cause injury during physic al interactions because of their 
softness  
 
Possible risks of socially assistive soft robots (SASR) might be:  
●  Users may become overly reliant on the robot for emotional support or 
companionship, potentially reducing their engagement with human social 
networks

---

often have natural shapes and can bend, twist, and stretch like living organisms, such 
as snakes or octopi. Designed with inspiration from living ent

  response_text = model.predict(prompt)


In [12]:
print(response)

The advantages of soft robots include their ability to mimic living organisms, their flexibility and adaptability, and their natural shapes that allow them to bend, twist, and stretch. Additionally, soft robots can express and perceive human emotions, communicate through expressions like gaze and gestures, and offer a lifelike experience due to their softness and flexibility. Soft robots also have benefits in search and rescue missions, providing access to unreachable areas, delivering essential supplies, and reducing the risk of injury to victims.


In [13]:
print(source_page_pairs)

[('data/scenario texts\\socially assistive robot.pdf', 4), ('data/scenario texts\\socially assistive robot.pdf', 3), ('data/scenario texts\\rescue robot.pdf', 4), ('data/scenario texts\\rescue robot.pdf', 3), ('data/scenario texts\\rescue robot.pdf', 5), ('data/scenario texts\\rescue robot.pdf', 4), ('data/scenario texts\\rescue robot.pdf', 1), ('data/scenario texts\\socially assistive robot.pdf', 3), ('data/scenario texts\\socially assistive robot.pdf', 3), ('data/scenario texts\\socially assistive robot.pdf', 5)]


In [14]:
print(len(all_hits))
print(all_hits[0])

11
(Document(metadata={'page': 4, 'source': 'data/scenario texts\\rescue robot.pdf'}, page_content='––––––––––––  Second  Page in Experiment (Intervention) ––––––––––––   \nSoft Robots for Search and Rescue Missions  \nBenefits of soft robots for search and rescue missions might be:  \n●  Access to areas unreachable or too dangerous for human rescuers  \n●  Delivery  of essential supplies (water, food, medicine) until victims are safely  \nextracted  \n●  Reduced risk of injury to victims due to their flexibility and adaptability'), 0.8405289303258905)


# Summarizing Data
- for A, B graph
- for category specific graph


## Load .xlsx files (lists of words)

* rescue robot_multipleSheets
* socially assistive robot_multipleSheets
* rescue robot_socially assistive robot_multipleSheets

> there are seperate columns for new (added concepts), deleted (deleted concepts), constant (not changed concepts)


In [15]:
## set working environment
#> Get the current working directory
print(os.getcwd())
directory = os.getcwd()

c:\DATEN\PHD\Article_SoftRobotIntervention\Analyses\main study - LLM


In [16]:
import pandas as pd

## Load the xlsx file of the rescue robot and the socially assistive robot combined
# Path to your Excel file
file_path = directory + "/data/" + "rescue robot_social assistance robot_multipleSheets_new" + ".xlsx"
# Load the Excel file
excel_data = pd.ExcelFile(file_path)
# Print the sheet names
print("Sheet names combined:", excel_data.sheet_names)
# Load all sheets into a dictionary of dataframes
all_sheets_Combined = {sheet_name: excel_data.parse(sheet_name) for sheet_name in excel_data.sheet_names}



## Load the xlsx file of the rescue robot and the socially assistive robot combined
# Path to your Excel file
file_path = directory + "/data/" + "rescue robot_multipleSheets_new" + ".xlsx"
# Load the Excel file
excel_data = pd.ExcelFile(file_path)
# Print the sheet names
print("Sheet names RR:", excel_data.sheet_names)
# Load all sheets into a dictionary of dataframes
all_sheets_RR = {sheet_name: excel_data.parse(sheet_name) for sheet_name in excel_data.sheet_names}

## Load the xlsx file of the rescue robot and the socially assistive robot combined
# Path to your Excel file
file_path = directory + "/data/" + "social assistance robot_multipleSheets_new" + ".xlsx"
# Load the Excel file
excel_data = pd.ExcelFile(file_path)
# Print the sheet names
print("Sheet names SAR:", excel_data.sheet_names)
# Load all sheets into a dictionary of dataframes
all_sheets_SAR = {sheet_name: excel_data.parse(sheet_name) for sheet_name in excel_data.sheet_names}

Sheet names combined: ['RCPP', 'LC', 'T', 'SIP', 'HRIP', 'AN', 'SIN', 'R', 'HC', 'RCN', 'SA', 'TP', 'TL', 'RCPN', 'HRIN', 'MT', 'RCA', 'AP']
Sheet names RR: ['RCPP', 'LC', 'T', 'SIP', 'HRIP', 'AN', 'SIN', 'R', 'HC', 'RCN', 'SA', 'TP', 'TL', 'RCPN', 'HRIN', 'MT', 'RCA', 'AP']
Sheet names SAR: ['RCPP', 'LC', 'T', 'SIP', 'HRIP', 'AN', 'SIN', 'R', 'HC', 'RCN', 'SA', 'TP', 'TL', 'RCPN', 'HRIN', 'MT', 'RCA', 'AP']


## Additional dictonaries to provide LLM context

In [17]:
abbreviations_dict = {
    'RCPP': 'perceived positive usefulness (rest category, refers to a classification of arguments that do not fit into any of the predefined categories)',
    'LC': 'perceived low costs',
    'T': 'perceived trust',
    'SIP': 'perceived positive social impact',
    'HRIP': 'perceived positive Human-Robot-Interaction',
    'AN': 'perceived negative anthropomorphism',
    'SIN': 'perceived positive social impact',
    'R': 'perceived risks',
    'HC': 'perceived high costs',
    'RCN': 'neutral rest category (rest category refers to a classification of arguments that do not fit into any of the predefined categories)',
    'SA': 'perceived safety',
    'TP': 'perceived technological possibilities',
    'TL': 'perceived technological limitations',
    'RCPN': 'perceived negative usefulness (rest category, refers to a classification of arguments that do not fit into any of the predefined categories)',
    'HRIN': 'perceived negative Human-Robot-Interaction',
    'MT': 'perceived mistrust',
    'RCA': 'ambivalent rest category (rest category refers to a classification of arguments that do not fit into any of the predefined categories)',
    'AP': 'perceived positive anthropomorphism'
}

print(abbreviations_dict)
print(abbreviations_dict.keys())
print(abbreviations_dict['AN'])

{'RCPP': 'perceived positive usefulness (rest category, refers to a classification of arguments that do not fit into any of the predefined categories)', 'LC': 'perceived low costs', 'T': 'perceived trust', 'SIP': 'perceived positive social impact', 'HRIP': 'perceived positive Human-Robot-Interaction', 'AN': 'perceived negative anthropomorphism', 'SIN': 'perceived positive social impact', 'R': 'perceived risks', 'HC': 'perceived high costs', 'RCN': 'neutral rest category (rest category refers to a classification of arguments that do not fit into any of the predefined categories)', 'SA': 'perceived safety', 'TP': 'perceived technological possibilities', 'TL': 'perceived technological limitations', 'RCPN': 'perceived negative usefulness (rest category, refers to a classification of arguments that do not fit into any of the predefined categories)', 'HRIN': 'perceived negative Human-Robot-Interaction', 'MT': 'perceived mistrust', 'RCA': 'ambivalent rest category (rest category refers to a c

keep only subset of list (which have meaningfull differences):

In [18]:
# Keys to keep
keys_to_keep = ['TP', 'TL', 'SA', 'R', 'HRIP', 'HRIN', 'AP', 'AN']

# Filter the dictionary
abbreviations_dict = {key: abbreviations_dict[key] for key in keys_to_keep}

# Display the filtered dictionary
print(abbreviations_dict)

{'TP': 'perceived technological possibilities', 'TL': 'perceived technological limitations', 'SA': 'perceived safety', 'R': 'perceived risks', 'HRIP': 'perceived positive Human-Robot-Interaction', 'HRIN': 'perceived negative Human-Robot-Interaction', 'AP': 'perceived positive anthropomorphism', 'AN': 'perceived negative anthropomorphism'}


Use dictionaries to map each word to its comment:

In [19]:
import numpy as np

def create_multivalue_dict(df, key_col, value_col):
    """
    Create a dictionary from a DataFrame where each key maps to a list of values.
    
    Parameters:
    df (pd.DataFrame): The input DataFrame.
    key_col (str): The column name to be used as keys.
    value_col (str): The column name to be used as values.
    
    Returns:
    dict: A dictionary where each key maps to a list of values.
    """
    # Remove rows with NaN in the key columns
    df = df.dropna(subset=[key_col])

    # Create a dictionary to map items to their comments, allowing for multiple comments per key
    multivalue_dict = {}
    for key, value in zip(df[key_col], df[value_col]):
        if key in multivalue_dict:
            multivalue_dict[key].append(value)
        else:
            multivalue_dict[key] = [value]

    return multivalue_dict

# Example usage
data = {
    'constant': ['a', 'b', 'a', 'c', 'b', np.nan],
    'constant_comments': ['comment1', 'comment2', 'comment3', 'comment4', 'comment5', 'comment6']
}
df = pd.DataFrame(data)

df_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
print(df_mapping)


{'a': ['comment1', 'comment3'], 'b': ['comment2', 'comment5'], 'c': ['comment4']}


test create_multivalue_dict() function:

In [20]:
sheet_name = "SA"

In [21]:
# Remove rows with NaN in the key columns as they cannot be used as dictionary keys
#> not sensitive to multiple identical keys: dict(zip(df['constant'], df['constant_comments']))
df = all_sheets_Combined[sheet_name]

constant_comments_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
print("mapping constant x comments:", constant_comments_mapping)
print(len(constant_comments_mapping))

new_comments_mapping = create_multivalue_dict(df, 'new', 'new_comments')
print("mapping new x comments:", new_comments_mapping)
print(len(new_comments_mapping))

deleted_comments_mapping = create_multivalue_dict(df, 'deleted', 'deleted_comments')
print("mapping deleted x comments:", deleted_comments_mapping)
print(len(deleted_comments_mapping))

mapping constant x comments: {'quick hygienic help': [nan], 'efficient': ['Work on statistics can be made more efficient through probabilities.', nan, nan, nan], 'safety': ['Prevents people from having to work under difficult or dangerous conditions to increase safety.', 'Enables deployment in difficult situations where human rescuers would put themselves in danger', nan, nan, nan, 'through reliability', 'No people are being put in danger'], 'remote control': ['Can be operated by experts via remote control or autonomous systems to perform operations from a safe distance.', nan, nan, nan, nan, 'Minimizes risks associated with autonomy'], 'speed and efficiency': ['Robots can often act faster than humans and increase the speed of emergency interventions', 'Improvement of speed and efficiency of rescue operations', nan], 'safe': ['People do not have to put themselves in dangerous situations'], 'reliable': ['Can be precisely controlled', nan, nan, nan, nan], 'health monitoring': [nan, 'Peop

In [22]:
def combine_dicts(dict1, dict2):
    """
    Combine two dictionaries where each key maps to a list of values.
    
    Parameters:
    dict1 (dict): The first dictionary.
    dict2 (dict): The second dictionary.
    
    Returns:
    dict: A combined dictionary where each key maps to a concatenated list of values.
    """
    combined_dict = dict1.copy()
    for key, values in dict2.items():
        if key in combined_dict:
            combined_dict[key].extend(values)
        else:
            combined_dict[key] = values
    return combined_dict

test combine_dicts() function:

In [23]:
constant_new_comments_mapping = combine_dicts(constant_comments_mapping, new_comments_mapping)
print("mapping constant, new x comments:", constant_new_comments_mapping)
print(len(constant_new_comments_mapping))


mapping constant, new x comments: {'quick hygienic help': [nan], 'efficient': ['Work on statistics can be made more efficient through probabilities.', nan, nan, nan], 'safety': ['Prevents people from having to work under difficult or dangerous conditions to increase safety.', 'Enables deployment in difficult situations where human rescuers would put themselves in danger', nan, nan, nan, 'through reliability', 'No people are being put in danger', 'Soft material reduces risk of injury'], 'remote control': ['Can be operated by experts via remote control or autonomous systems to perform operations from a safe distance.', nan, nan, nan, nan, 'Minimizes risks associated with autonomy'], 'speed and efficiency': ['Robots can often act faster than humans and increase the speed of emergency interventions', 'Improvement of speed and efficiency of rescue operations', nan], 'safe': ['People do not have to put themselves in dangerous situations'], 'reliable': ['Can be precisely controlled', nan, nan

## Data for A, B graph (G2)

Prompt to get list of arguments and explenations:

In [24]:
from langchain_core.prompts import ChatPromptTemplate


system_template = """
You are a researcher summarizing two word lists that represent people's assessments of rigid and soft robots, whereby laypersons were informed about the risks and benefits of {robots} through scenario texts.

Participants shared their views on traditional rigid robots in a "rigid" list and on flexible, electronic-free soft robots in a "soft" list after learning about their respective risks and benefits. 
The overall theme is {topicCategory}.

Both "rigid" and "soft" lists are dictionaries with argument keys and comment values. If [nan] appears, it means no comment was provided; repeated entries or [nan] values indicate that the argument was emphasized multiple times.

Your task:
Summarize the main points for each category into a JSON object. Fill the two arrays within the JSON object, "rigid_arguments" for the "rigid" list and "soft_arguments" for the "soft" list. 
Summarize the main points for each category in a JSON object. Populate the two arrays, "rigid_arguments" and "soft_arguments," with up to five key arguments each. 
Include a brief, interrelated explanation (up to two sentences) for each argument, derived from the provided lists.


Output Format:

{{
  "assessments": {{
    "rigid_arguments": [
      {{
        "argument": "argument1",
        "explanation": "explanation of argument1"
      }},
      {{
        "argument": "argument2",
        "explanation": "explanation of argument2"
      }},
      ...
    ],
    "soft_arguments": [
      {{
        "argument": "argument1",
        "explanation": "explanation of argument1"
      }},
      {{
        "argument": "argument2",
        "explanation": "explanation of argument2"
      }},
      ...
    ]
  }}
}}

Please respond with the entire JSON structure as specified, providing up to five arguments for each list, and without any additional commentary or context.
"""


user_template = """List "rigid": 
{rigid}

List "soft": 
{soft}"""

# rescue robots and socially assistive robots
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", user_template)]
)

template_out = prompt_template.invoke({"robots": "rescue robots and socially assistive robots", "topicCategory": abbreviations_dict[sheet_name], "rigid": constant_comments_mapping, "soft": new_comments_mapping})
print(template_out)

print("template_out:", template_out)
print("template_out.to_messages():", template_out.to_messages())

messages=[SystemMessage(content='\nYou are a researcher summarizing two word lists that represent people\'s assessments of rigid and soft robots, whereby laypersons were informed about the risks and benefits of rescue robots and socially assistive robots through scenario texts.\n\nParticipants shared their views on traditional rigid robots in a "rigid" list and on flexible, electronic-free soft robots in a "soft" list after learning about their respective risks and benefits. \nThe overall theme is perceived safety.\n\nBoth "rigid" and "soft" lists are dictionaries with argument keys and comment values. If [nan] appears, it means no comment was provided; repeated entries or [nan] values indicate that the argument was emphasized multiple times.\n\nYour task:\nSummarize the main points for each category into a JSON object. Fill the two arrays within the JSON object, "rigid_arguments" for the "rigid" list and "soft_arguments" for the "soft" list. \nSummarize the main points for each catego

### Single Run

Function to call LLM:

In [25]:
from langchain_openai import ChatOpenAI
from langchain.callbacks import get_openai_callback

def huggingface_API_call(
    prompt,
    
    robots,
    topicCategory,
    dictonaryRigid,
    dictonarySoft,
    
    api_key=key.hugging_api_key,

    model_name="meta-llama/Meta-Llama-3-70B-Instruct",
    json_schema=None,
    max_tokens=1000,
    temperature=0.2,
    verbose=True
):

    # Initialize the ChatOpenAI model for Hugging Face
    model = ChatOpenAI(
        model=model_name,
        openai_api_key=api_key,
        openai_api_base="https://api-inference.huggingface.co/v1/",
        max_tokens=max_tokens,
        temperature=temperature
    )

    # Check if structured output is required and configure it
    if json_schema:
        structured_llm = model.with_structured_output(json_schema, include_raw=True)
        chain = prompt | structured_llm
    else:
        chain = prompt | model

    # Execute the model and output response details
    with get_openai_callback() as cb:
        response = chain.invoke(
            {"robots": robots, "topicCategory": topicCategory, "rigid": dictonaryRigid, "soft": dictonarySoft}
        )
        
        if cb.total_tokens > max_tokens:
            print("Warning: The response may be incomplete due to exceeding the maximum token limit.")
        
        if verbose:
            print(cb)
            print(f"Total Tokens: {cb.total_tokens}")
            print(f"Prompt Tokens: {cb.prompt_tokens}")
            print(f"Completion Tokens: {cb.completion_tokens}")
            print(f"Total Cost (USD): ${cb.total_cost}")

    return response

In [26]:
response = huggingface_API_call(
    prompt=prompt_template,
    
    robots="rescue robots and socially assistive robots",
    topicCategory=abbreviations_dict[sheet_name],
    dictonaryRigid=constant_comments_mapping,
    dictonarySoft=new_comments_mapping,
    
    api_key=key.hugging_api_key,
    model_name="meta-llama/Meta-Llama-3-70B-Instruct",

    json_schema=None,
    max_tokens=4000,
    temperature=0.2,
    verbose=True
)

Tokens Used: 4409
	Prompt Tokens: 3949
	Completion Tokens: 460
Successful Requests: 1
Total Cost (USD): $0.0
Total Tokens: 4409
Prompt Tokens: 3949
Completion Tokens: 460
Total Cost (USD): $0.0


result - structured JSON output:

In [27]:
response

AIMessage(content='Here is the JSON object summarizing the main points for each category:\n\n```\n{\n  "assessments": {\n    "rigid_arguments": [\n      {\n        "argument": "safety",\n        "explanation": "Rigid robots can prevent people from having to work under difficult or dangerous conditions, increasing safety and reducing the risk of injury."\n      },\n      {\n        "argument": "efficiency",\n        "explanation": "Rigid robots can work faster and more efficiently than humans, improving the speed and effectiveness of rescue operations."\n      },\n      {\n        "argument": "reliability",\n        "explanation": "Rigid robots can be precisely controlled and are always ready to perform tasks, making them reliable in emergency situations."\n      },\n      {\n        "argument": "strength",\n        "explanation": "Rigid robots can perform tasks that require greater strength than humans, such as lifting debris or moving heavy objects."\n      },\n      {\n        "argum

In [28]:
response.content

'Here is the JSON object summarizing the main points for each category:\n\n```\n{\n  "assessments": {\n    "rigid_arguments": [\n      {\n        "argument": "safety",\n        "explanation": "Rigid robots can prevent people from having to work under difficult or dangerous conditions, increasing safety and reducing the risk of injury."\n      },\n      {\n        "argument": "efficiency",\n        "explanation": "Rigid robots can work faster and more efficiently than humans, improving the speed and effectiveness of rescue operations."\n      },\n      {\n        "argument": "reliability",\n        "explanation": "Rigid robots can be precisely controlled and are always ready to perform tasks, making them reliable in emergency situations."\n      },\n      {\n        "argument": "strength",\n        "explanation": "Rigid robots can perform tasks that require greater strength than humans, such as lifting debris or moving heavy objects."\n      },\n      {\n        "argument": "accessibili

In [29]:
import json
import re

try:
    # Attempt to parse the response directly as JSON
    data = json.loads(response.content)
    # print("Valid JSON object:", json.dumps(data, indent=2))
except json.JSONDecodeError:
    # If not valid JSON, handle extraction using regex to match JSON block between triple backticks (```)
    json_match = re.search(r'```(.*?)```', response.content, re.DOTALL)
    # If JSON block is found, parse it
    if json_match:
        json_text = json_match.group(1).strip()  # Extract JSON text and strip whitespace
        try:
            data = json.loads(json_text)   # Parse JSON
            print("Valid JSON object after regex:", json.dumps(data, indent=2))
        except json.JSONDecodeError as e:
            print("Failed to parse JSON:", e)
    else:
        print("No JSON object found.")

Valid JSON object after regex: {
  "assessments": {
    "rigid_arguments": [
      {
        "argument": "safety",
        "explanation": "Rigid robots can prevent people from having to work under difficult or dangerous conditions, increasing safety and reducing the risk of injury."
      },
      {
        "argument": "efficiency",
        "explanation": "Rigid robots can work faster and more efficiently than humans, improving the speed and effectiveness of rescue operations."
      },
      {
        "argument": "reliability",
        "explanation": "Rigid robots can be precisely controlled and are always ready to perform tasks, making them reliable in emergency situations."
      },
      {
        "argument": "strength",
        "explanation": "Rigid robots can perform tasks that require greater strength than humans, such as lifting debris or moving heavy objects."
      },
      {
        "argument": "accessibility",
        "explanation": "Rigid robots can access areas that are d

In [30]:
# Extract rigid and soft arguments and format them into a DataFrame
arguments = []
for category, items in data['assessments'].items():
    for item in items:
        arguments.append({
            'type': category.split('_')[0],  # Extracts 'rigid' or 'soft' from 'rigid_arguments'/'soft_arguments'
            'argument': item['argument'],
            'explanation': item['explanation']
        })

# Create DataFrame
df = pd.DataFrame(arguments)

df["category"] = sheet_name

# Display the DataFrame
print(df)

    type                argument  \
0  rigid                  safety   
1  rigid              efficiency   
2  rigid             reliability   
3  rigid                strength   
4  rigid           accessibility   
5   soft    lower risk of injury   
6   soft             flexibility   
7   soft  reduced risk of injury   
8   soft           accessibility   
9   soft         care and supply   

                                         explanation category  
0  Rigid robots can prevent people from having to...       SA  
1  Rigid robots can work faster and more efficien...       SA  
2  Rigid robots can be precisely controlled and a...       SA  
3  Rigid robots can perform tasks that require gr...       SA  
4  Rigid robots can access areas that are difficu...       SA  
5  Soft robots pose a lower risk of injury to hum...       SA  
6  Soft robots are more flexible and adaptable th...       SA  
7  Soft robots can reduce the risk of injury to v...       SA  
8  Soft robots can access a

### Multiple Runs

Higher order function to call LLM for all categories:

In [31]:
import json
import pandas as pd
import re

def process_robot_data(type_robot):
    # Define naming based on type_robot
    if type_robot == "RR":
        all_sheets = all_sheets_RR
        naming_robots = "rescue robots"
    elif type_robot == "SAR":
        all_sheets = all_sheets_SAR
        naming_robots = "socially assistive robots"
    elif type_robot == "Combined":
        all_sheets = all_sheets_Combined
        naming_robots = "rescue robots and socially assistive robots"
    else:
        raise ValueError("Invalid type_robot specified.")

    # Initialize an empty DataFrame for concatenation
    final_df = pd.DataFrame()

    for index, category in enumerate(abbreviations_dict.keys()):
        print(f"index: {index}, category: {category}")

        # Load the specific DataFrame for the current category
        df = all_sheets[category]

        # Generate constant and new comments mappings
        constant_comments_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
        new_comments_mapping = create_multivalue_dict(df, 'new', 'new_comments')

        # Call API with specified parameters
        response = huggingface_API_call(
            prompt=prompt_template,
            robots=naming_robots,
            topicCategory=abbreviations_dict[category],
            dictonaryRigid=constant_comments_mapping,
            dictonarySoft=new_comments_mapping,
            api_key=key.hugging_api_key,
            model_name="meta-llama/Meta-Llama-3-70B-Instruct",
            json_schema=None,
            max_tokens=4200,
            temperature=0.0,
            verbose=False
        )
        
        # print("response.content:\n", response.content)
        #  data = json.loads(response.content)
        # Regular expression to match JSON block between triple backticks (```)
        try:
            # Attempt to parse the response directly as JSON
            data = json.loads(response.content)
            # print("Valid JSON object:", json.dumps(data, indent=2))
        except json.JSONDecodeError:
            # If not valid JSON, handle extraction using regex to match JSON block between triple backticks (```)
            json_match = re.search(r'```(.*?)```', response.content, re.DOTALL)
            # If JSON block is found, parse it
            if json_match:
                json_text = json_match.group(1).strip()  # Extract JSON text and strip whitespace
                try:
                    data = json.loads(json_text)   # Parse JSON
                    # print("Valid JSON object after regex:", json.dumps(data, indent=2))
                except json.JSONDecodeError as e:
                    print("Failed to parse JSON:", e)
                    # break # !!!
            else:
                print("No JSON object found.")
                # break # !!!

        # Extract arguments and format them into a temporary DataFrame
        arguments = []
        for arg_category, items in data['assessments'].items():
            for item in items:
                arguments.append({
                    'type': arg_category.split('_')[0],
                    'argument': item['argument'],
                    'explanation': item['explanation']
                })

        df_tmp = pd.DataFrame(arguments)
        df_tmp["category"] = category
        
        print(f"length of df_tmp: {len(df_tmp)}")

        # Concatenate the current DataFrame to the final DataFrame
        final_df = pd.concat([final_df, df_tmp], ignore_index=True)

    return final_df

logic if process_robot_data() should be run:

In [32]:
run_process_robot_data = False # True

for rescue robots:

In [33]:
# Path to your Excel file
file_path = directory + "/output/G2/" + "rescue robots" + ".xlsx"

if run_process_robot_data:
    df_RR = process_robot_data(type_robot="RR")
    # save the dataframe to an Excel file
    df_RR.to_excel(file_path, index=False)
else:
    df_RR = pd.read_excel(file_path)

for socially assistive robots:

In [34]:
# Path to your Excel file
file_path = directory + "/output/G2/" + "socially assistive robots" + ".xlsx"
    
if run_process_robot_data:
    df_SAR = process_robot_data(type_robot="SAR")
    # save the dataframe to an Excel file
    df_SAR.to_excel(file_path, index=False)
else:
    df_SAR = pd.read_excel(file_path)

for rescue robots and socially assistive robots:

In [35]:
# Path to your Excel file
file_path = directory + "/output/G2/" + "rescue robots AND socially assistive robots" + ".xlsx"

if run_process_robot_data:
    df_Combined = process_robot_data(type_robot="Combined")
    # save the dataframe to an Excel file
    df_Combined.to_excel(file_path, index=False)
else:
    df_Combined = pd.read_excel(file_path)

### Summarize Generated Data for A, B graph (G2)

In [36]:
tmp_rigid = df_RR[(df_RR["category"] == "TP") & (df_RR["type"] == "rigid")]
tmp_soft = df_RR[(df_RR["category"] == "TP") & (df_RR["type"] == "soft")]

tmp_string_rigid = 'Arguments for "rigid" robots:'
for index, row in tmp_rigid.iterrows():
    tmp_string_rigid += " \n " + row["argument"]
    tmp_string_rigid += ": " + row["explanation"]
    
tmp_string_soft = 'Arguments for "soft" robots:'
for index, row in tmp_soft.iterrows():
    tmp_string_soft += " \n " + row["argument"]
    tmp_string_soft += ": " + row["explanation"]

In [37]:
tmp_string_rigid

'Arguments for "rigid" robots: \n new places: Rigid robots can search in places where people cannot reach, such as underwater caves or narrow openings. \n resilience: Rigid robots can withstand adverse conditions, making them more effective in disaster areas. \n special abilities: Rigid robots can perform special tasks like flying, hacking doors, or sending images to the control center with cameras. \n environment-independent: Rigid robots can operate in various environments, including air, water, and ground, and can withstand toxic or narrow environments. \n more power: Rigid robots can have more strength than humans, allowing them to perform tasks that require heavy lifting or drilling.'

In [38]:
tmp_string_soft

'Arguments for "soft" robots: \n deliver supplies: Soft robots can deliver essential goods like food, water, and medicine to victims in hard-to-reach areas. \n accessible: Soft robots can reach inaccessible places due to their small size and high flexibility, allowing them to supply victims with vital resources. \n care for victims: Soft robots can provide care for victims during the rescue operation, such as delivering food and medicine. \n adaptability: Soft robots can adapt to complex problems and changing situations, making them effective in disaster areas. \n temporary supply: Soft robots can provide temporary supply of vital resources to victims until human rescuers arrive.'

Prompt to get summary (focus on overlapping, diverging arguments) for A, B graph regarding single categories:

In [39]:
from langchain_core.prompts import ChatPromptTemplate


system_template = """
You are a researcher summarizing central arguments and their explenations of people's assessments of rigid and soft robots, 
whereby laypersons were informed about the risks and benefits of {robots} through scenario texts.

Participants shared their central arguments and explenations on traditional rigid robots in arguments for "rigid" robots 
and on flexible, electronic-free soft robots in arguments for "soft" robots.

The overall theme of these arguments is {topicCategory}.

Your task:

Write a paragraph highlighting the commonalities of the arguments for rigid and soft robots followed by a brief discussion of the main differences, 
focusing stronger on the arguments for soft robots. 

The paragraph should be limited to four sentences. Provide only the paragraph without any additional commentary or context.
"""


user_template = """arguments for "rigid" robots: 
{rigid}

arguments for "soft" robots: 
{soft}"""

# rescue robots and socially assistive robots
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", user_template)]
)

template_out = prompt_template.invoke({"robots": "rescue robots", "topicCategory": abbreviations_dict["TP"], "rigid": tmp_string_rigid, "soft": tmp_string_soft})
print(template_out)

print("template_out:", template_out)
print("template_out.to_messages():", template_out.to_messages())

messages=[SystemMessage(content='\nYou are a researcher summarizing central arguments and their explenations of people\'s assessments of rigid and soft robots, \nwhereby laypersons were informed about the risks and benefits of rescue robots through scenario texts.\n\nParticipants shared their central arguments and explenations on traditional rigid robots in arguments for "rigid" robots \nand on flexible, electronic-free soft robots in arguments for "soft" robots.\n\nThe overall theme of these arguments is perceived technological possibilities.\n\nYour task:\n\nWrite a paragraph highlighting the commonalities of the arguments for rigid and soft robots followed by a brief discussion of the main differences, \nfocusing stronger on the arguments for soft robots. \n\nThe paragraph should be limited to four sentences. Provide only the paragraph without any additional commentary or context.\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='arguments for "rigid" robots: \n

#### Single Run

Function to call LLM:

In [40]:
response = huggingface_API_call(
    prompt=prompt_template,
    
    robots="rescue robots",
    topicCategory=abbreviations_dict["TP"],
    dictonaryRigid=tmp_string_rigid,
    dictonarySoft=tmp_string_soft,
    
    api_key=key.hugging_api_key,
    model_name="meta-llama/Meta-Llama-3-70B-Instruct",

    json_schema=None,
    max_tokens=4000,
    temperature=0.2,
    verbose=True
)

Tokens Used: 571
	Prompt Tokens: 444
	Completion Tokens: 127
Successful Requests: 1
Total Cost (USD): $0.0
Total Tokens: 571
Prompt Tokens: 444
Completion Tokens: 127
Total Cost (USD): $0.0


In [41]:
response

AIMessage(content='The arguments for both rigid and soft robots highlight their potential to access and operate in challenging environments, emphasizing their ability to reach and assist victims in hard-to-reach areas. Both types of robots are seen as capable of performing tasks that humans cannot, whether due to environmental constraints or physical limitations. However, the arguments for soft robots place a stronger emphasis on their ability to provide care and support to victims, as well as their adaptability in complex and changing situations. Overall, the arguments for soft robots focus more on the humanitarian aspects of rescue operations, highlighting their potential to deliver essential goods and provide temporary support until human rescuers arrive.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 127, 'prompt_tokens': 444, 'total_tokens': 571, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'meta-l

In [42]:
response.content

'The arguments for both rigid and soft robots highlight their potential to access and operate in challenging environments, emphasizing their ability to reach and assist victims in hard-to-reach areas. Both types of robots are seen as capable of performing tasks that humans cannot, whether due to environmental constraints or physical limitations. However, the arguments for soft robots place a stronger emphasis on their ability to provide care and support to victims, as well as their adaptability in complex and changing situations. Overall, the arguments for soft robots focus more on the humanitarian aspects of rescue operations, highlighting their potential to deliver essential goods and provide temporary support until human rescuers arrive.'

### Multiple Runs

Higher order function to call LLM for all categories:

In [43]:
import pandas as pd

def summarize_processed_robot_data(type_robot):
    # Define naming based on type_robot
    if type_robot == "RR":
        tmp_df = df_RR
        naming_robots = "rescue robots"
    elif type_robot == "SAR":
        tmp_df = df_SAR
        naming_robots = "socially assistive robots"
    elif type_robot == "Combined":
        tmp_df = df_Combined
        naming_robots = "rescue robots and socially assistive robots"
    else:
        raise ValueError("Invalid type_robot specified.")


    # Initialize an empty DataFrame for concatenation
    data_array = []
    
    for index, category in enumerate(abbreviations_dict.keys()):
        print(f"index: {index}, category: {category}")

        # Load the specific DataFrame for the current category
        tmp_rigid = tmp_df[(tmp_df["category"] == category) & (tmp_df["type"] == "rigid")]
        tmp_soft = tmp_df[(tmp_df["category"] == category) & (tmp_df["type"] == "soft")]
        
        tmp_string_rigid = 'Arguments for "rigid" robots:'
        for index, row in tmp_rigid.iterrows():
            tmp_string_rigid += " \n " + row["argument"]
            tmp_string_rigid += ": " + row["explanation"]

        tmp_string_soft = 'Arguments for "soft" robots:'
        for index, row in tmp_soft.iterrows():
            tmp_string_soft += " \n " + row["argument"]
            tmp_string_soft += ": " + row["explanation"]
        
        
        # Call API with specified parameters
        response = huggingface_API_call(
            prompt=prompt_template,
            
            robots=naming_robots,
            topicCategory=abbreviations_dict[category],
            dictonaryRigid=tmp_string_rigid,
            dictonarySoft=tmp_string_soft,
            
            api_key=key.hugging_api_key,
            model_name="meta-llama/Meta-Llama-3-70B-Instruct",

            json_schema=None,
            max_tokens=4200,
            temperature=0.0,
            verbose=False
        )
        
        data_array.append({'category': category, 'summary':response.content})

    final_df = pd.DataFrame(data_array)
    return final_df

In [44]:
run_summary_processed_robot_data = False # True

for rescue robots - summary:

In [45]:
# Path to your Excel file
file_path = directory + "/output/G2/" + "rescue robots summary" + ".xlsx"

if run_summary_processed_robot_data:
    df_RR_summary = summarize_processed_robot_data(type_robot="RR")
    # save the dataframe to an Excel file
    df_RR_summary.to_excel(file_path, index=False)
else:
    df_RR_summary = pd.read_excel(file_path)

for socially assistive robots - summary:

In [46]:
# Path to your Excel file
file_path = directory + "/output/G2/" + "socially assistive robots summary" + ".xlsx"
    
if run_summary_processed_robot_data:
    df_SAR_summary = summarize_processed_robot_data(type_robot="SAR")
    # save the dataframe to an Excel file
    df_SAR_summary.to_excel(file_path, index=False)
else:
    df_SAR_summary = pd.read_excel(file_path)

## Data for Category Specific Graph (G3)

All the loaded files are originally from the following GitHub page: https://github.com/PerttuHamalainen/LLMCode

In [47]:
import os
import sys

# Assuming 'src' is one level down (in the current directory or a subdirectory)
path_to_src = os.path.join('src/LLMCode')  # Moves one level down to 'src' folder

# Add the path to sys.path
sys.path.append(path_to_src)

# Now you can import your modules
#import llms as LLMCode_LLMS
#import coding as LLMCode_coding

import src.LLMCode as LLMCode

load openAI key into environment:

In [48]:
import os

os.environ["OPENAI_API_KEY"] = key.openAI_key

## prepare data

In [49]:
print("sheet_name:", sheet_name)

# Remove rows with NaN in the key columns as they cannot be used as dictionary keys
#> not sensitive to multiple identical keys: dict(zip(df['constant'], df['constant_comments']))
df = all_sheets_RR[sheet_name]

constant_comments_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
print("mapping constant x comments:", constant_comments_mapping)
print(len(constant_comments_mapping))

new_comments_mapping = create_multivalue_dict(df, 'new', 'new_comments')
print("mapping new x comments:", new_comments_mapping)
print(len(new_comments_mapping))

deleted_comments_mapping = create_multivalue_dict(df, 'deleted', 'deleted_comments')
print("mapping deleted x comments:", deleted_comments_mapping)
print(len(deleted_comments_mapping))

sheet_name: SA
mapping constant x comments: {'safety': ['Prevents people from having to work under difficult or dangerous conditions to increase safety.', 'Enables deployment in difficult situations where human rescuers would put themselves in danger', nan, nan, nan, 'through reliability', 'No people are being put in danger'], 'remote control': ['Can be operated by experts via remote control or autonomous systems to perform operations from a safe distance.', nan, nan, nan, nan, 'Minimizes risks associated with autonomy'], 'speed and efficiency': ['Robots can often act faster than humans and increase the speed of emergency interventions', 'Improvement of speed and efficiency of rescue operations', nan], 'safe': ['People do not have to put themselves in dangerous situations'], 'reliable': ['Can be precisely controlled', nan, nan, nan, nan], 'no people necessary': [nan], 'stronger than humans': [nan, 'In many situations, e.g. when clearing rubble, a great advantage.', nan, nan], 'faster t

In [50]:
constant_new_comments_mapping = combine_dicts(constant_comments_mapping, new_comments_mapping)
print("mapping constant, new x comments:", constant_new_comments_mapping)
print(len(constant_new_comments_mapping))


mapping constant, new x comments: {'safety': ['Prevents people from having to work under difficult or dangerous conditions to increase safety.', 'Enables deployment in difficult situations where human rescuers would put themselves in danger', nan, nan, nan, 'through reliability', 'No people are being put in danger'], 'remote control': ['Can be operated by experts via remote control or autonomous systems to perform operations from a safe distance.', nan, nan, nan, nan, 'Minimizes risks associated with autonomy'], 'speed and efficiency': ['Robots can often act faster than humans and increase the speed of emergency interventions', 'Improvement of speed and efficiency of rescue operations', nan], 'safe': ['People do not have to put themselves in dangerous situations'], 'reliable': ['Can be precisely controlled', nan, nan, nan, nan], 'no people necessary': [nan], 'stronger than humans': [nan, 'In many situations, e.g. when clearing rubble, a great advantage.', nan, nan], 'faster than humans

In [92]:
# Sample dictionary with NaN values
dictionary = constant_new_comments_mapping

# Initialize the list to store single concepts
single_concepts = []

# Iterate through the dictionary
for key, values in dictionary.items():
    for value in values:
        # Check if the value is not NaN
        if isinstance(value, str):
            single_concepts.append(f"{key}: {value}")
        else:
            single_concepts.append(key)

# Print the result
print(len(single_concepts))


# Filter unique entries with more than one word
single_concepts_unique = list({entry for entry in single_concepts if len(entry.split()) > 1})

# Print the result
print(len(single_concepts_unique))

262
209


In [89]:
print(single_concepts_unique)

['fewer mistakes', 'are mentally superior: No trauma for rescuers', "reduction of injury risk: compared to 'soft' rescue robots; valid for hard rescue robots", 'accuracy of work', 'use in dangerous situations: Rescue robots can be used in dangerous situations where humans are exposed to too high a risk.', 'stronger than human: Specially equipped machines could help trapped victims quicker and more efficiently.', 'no danger: No human doctor etc. needs to enter the operating area and thus it is significantly safer and therefore protects human lives.', 'deployment possibility: Also possible under dangerous conditions', 'delivery of goods: Robots can, for example, deliver goods in war zones, which would be dangerous for humans.', 'in danger zones', 'dexterity: Often higher skill than human rescuers', 'no fatigue', 'saves lives: For example, when human helpers are not available, a rescue robot can still save human lives.', 'speed / agility', 'life-threatening places', 'do not lose strength:

In [93]:
import random

print(len(single_concepts))

# Draw 30 random entries (if the list has less than 30 entries, it will return the entire list)
single_concepts = random.sample(single_concepts_unique, min(30, len(single_concepts_unique)))

print(len(single_concepts))

262
30


In [94]:
single_concepts

['use in dangerous situations: Rescue robots can be used in dangerous situations where humans are exposed to too high a risk.',
 'more/ easier access: Robots can create (easier) access to places where it would otherwise be more difficult.',
 'no exhaustion',
 'perform operations: disease',
 'quantity: more rescue operations or more people saved possible',
 'delivery of goods: Robots can, for example, deliver goods in war zones, which would be dangerous for humans.',
 'reduced risk of injury: the victim due to the organic construction method',
 'speed / agility',
 'better search capabilities: Robots can search better in the dark and in smoke etc because they have numerous sensors',
 'do not lose strength: Robots do not lose stamina while humans could lose strength during missions.',
 'replaceable: In case of loss, no human sacrifice',
 'without basic need: Water, sleep, food etc.',
 'toxic places: Places where people can only breathe with special equipment (gas mask)',
 'safety: Enables

## run LLMCode - a toolkit for AI-assisted qualitative data analysis

In [53]:
LLMCode.init(API="OpenAI")

In [None]:
import nest_asyncio
nest_asyncio.apply()



# Sample inputs for function arguments

texts = single_concepts

research_question = "What are the key perceived benefits and risks regarding safety of the search and rescue robot?"

few_shot_examples = pd.DataFrame({
    "text": [
        "I feel valued and appreciated by my team.",
        "The training programs are really beneficial."
    ],
    "coded_text": [
        "**I feel valued and appreciated**<sup>employee recognition</sup> by my team.",
        "The **training programs**<sup>professional development</sup> are really beneficial."
    ]
})


few_shot_examples = pd.DataFrame({
    "text": [
        "The game and it's graphics, music and story made me feel calm and happy  in a way nothing else could at the time. Playing it felt like a journey to another, better place , and that's art to me.",
        "I played the game Kairo, or Cairo I can't remember, it was an atmospheric puzzle game with big rooms filled with mist and interesting lighting, all the textures were concrete"],
    "coded_text": [
        "The game and it's graphics, music and story **made me feel calm and happy**<sup>emotional response</sup> in a way nothing else could at the time. **Playing it felt like a journey to another, better place**<sup>setting; immersion</sup>, and that's art to me.",
        "I played the game Kairo, or Cairo I can't remember, **it was an atmospheric puzzle game with big rooms filled with mist and interesting lighting, all the textures were concrete**<sup>setting; creativity</sup>"
    ]
})



gpt_model = "gpt-4o"  # or another preferred GPT model
use_cache = True
max_tokens = 150  # Specify maximum tokens for each prompt if necessary
verbose = True
topicCategory = abbreviations_dict[sheet_name]


# Now you can run:
coded_texts, code_descriptions = LLMCode.code_inductively_with_code_consistency_adj(
    texts=texts,
    research_question=research_question,
    topicCategory=topicCategory,
    few_shot_examples=few_shot_examples,
    gpt_model=gpt_model,
    use_cache=use_cache,
    max_tokens=max_tokens,
    verbose=verbose
)


Original text: "fewer mistakes"
LLM output: "The text contains no insights relevant to the research question."
Text reconstruction successful

Had to reconstruct 1 texts due to LLM errors
Original text: "are mentally superior: No trauma for rescuers"
LLM output: "The text contains no insights relevant to the research question, so there is nothing to highlight or code."
Text reconstruction successful

Had to reconstruct 1 texts due to LLM errors
Original text: "use in dangerous situations: Rescue robots can be used in dangerous situations where humans are exposed to too high a risk."
LLM output: "**Rescue robots can be used in dangerous situations**<sup>enhancing safety</sup> where humans are exposed to too high a risk."
Text reconstruction successful

Had to reconstruct 1 texts due to LLM errors
Original text: "stronger than human: Specially equipped machines could help trapped victims quicker and more efficiently."
LLM output: "**Specially equipped machines could help trapped victims 

In [99]:
len(coded_texts)
coded_texts

['fewer mistakes',
 'are mentally superior: No trauma for rescuers',
 "reduction of injury risk: compared to 'soft' rescue robots; valid for hard rescue robots",
 '**accuracy of work**<sup>work quality</sup>',
 'use in dangerous situations: **Rescue robots can be used in dangerous situations**<sup>enhancing safety</sup> where humans are exposed to too high a risk.',
 'stronger than human: **Specially equipped machines could help trapped victims quicker and more efficiently**<sup>enhancing efficiency</sup>.',
 'no danger: **No human doctor etc. needs to enter the operating area**<sup>enhancing safety</sup> and **thus it is significantly safer and therefore protects human lives**<sup>enhancing safety</sup>.',
 '**deployment possibility**<sup>enhancing safety</sup>: **Also possible under dangerous conditions**<sup>enhancing safety</sup>',
 '**delivery of goods**<sup>task execution</sup>: **Robots can, for example, deliver goods in war zones, which would be dangerous for humans**<sup>enhan

In [100]:
print(len(code_descriptions))
code_descriptions

5


{'work quality': 'The "work quality" code encapsulates participants\' perceptions of the precision and reliability of tasks performed by the search and rescue robot, highlighting its ability to execute operations with high accuracy. This code reflects concerns and assurances regarding the robot\'s performance standards, emphasizing the importance of maintaining consistent and dependable outcomes in safety-critical situations.',
 'enhancing safety': 'The code "enhancing safety" captures instances where participants highlight the role of search and rescue robots in mitigating human risk by operating in hazardous environments, thereby reducing the need for human presence in potentially life-threatening situations. This perception underscores the robots\' ability to perform tasks in areas that are unsafe for humans, thus enhancing overall safety during rescue operations.',
 'enhancing efficiency': 'The code "enhancing efficiency" captures discussions where participants highlight the percei

overall:


{'robot endurance': "Captures discussions where participants highlight the robot's ability to operate for extended periods without fatigue, emphasizing its potential to enhance search and rescue missions by maintaining consistent performance over long durations.",
 'area of application': 'Captures discussions about specific environments or situations where the search and rescue robot is deemed particularly beneficial, such as hazardous locations that pose significant risks to human safety.',
 'reduction in human deployment': 'Captures instances where participants highlight the perceived safety benefit of minimizing human exposure to hazardous environments by deploying search and rescue robots, thereby reducing the need for human personnel to enter potentially dangerous situations.',
 'enhancing safety': 'Captures instances where participants highlight the benefit of minimizing human exposure to hazardous environments by utilizing search and rescue robots, thereby enhancing overall safety during operations.',
 'rapid deployment': "Captures instances where participants highlight the advantage of the search and rescue robot's ability to be quickly deployed to disaster sites, significantly reducing response time compared to human teams.",
 'simplification of difficult situations': 'Captures instances where participants highlight how search and rescue robots can streamline complex rescue operations by providing automated solutions that enhance accessibility and intervention in challenging environments.',
 'key benefit': 'Captures instances where participants highlight the primary advantage of the search and rescue robot as its ability to save lives by efficiently locating and assisting individuals in emergency situations.',
 'robot focus': 'Captures instances where participants emphasize the importance of the search and rescue robot maintaining a singular focus on its task to ensure efficiency and safety, avoiding distractions that could compromise its performance in critical situations.',
 'replaceability': 'Captures discussions where participants highlight the perceived benefit of robots in search and rescue operations being easily replaceable, thus reducing concerns about their safety compared to human rescuers.',
 'robot capability': "Captures discussions where participants highlight the robot's ability to perform complex tasks with precision and agility, emphasizing its potential to enhance safety in search and rescue operations by navigating challenging environments effectively."}

In [103]:
import json

my_array = [
    {'robot': 'RR', 'category': 'SA', 'coded_texts': coded_texts, 'code_descriptions': code_descriptions},
    {'robot': 'RR_2', 'category': 'SA_2', 'coded_texts': coded_texts, 'code_descriptions': code_descriptions},
]


file_path = directory + "/output/G3/" # + "rescue robots" + ".xlsx"


# Save to JSON file
with open(file_path + 'output_LLMcode.json', 'w') as file:
    json.dump(my_array, file, indent=4)

# To load the data back
with open(file_path + 'output_LLMcode.json', 'r') as file:
    loaded_array = json.load(file)

# Print loaded data
print(loaded_array)

[{'robot': 'RR', 'category': 'SA', 'coded_texts': ['fewer mistakes', 'are mentally superior: No trauma for rescuers', "reduction of injury risk: compared to 'soft' rescue robots; valid for hard rescue robots", '**accuracy of work**<sup>work quality</sup>', 'use in dangerous situations: **Rescue robots can be used in dangerous situations**<sup>enhancing safety</sup> where humans are exposed to too high a risk.', 'stronger than human: **Specially equipped machines could help trapped victims quicker and more efficiently**<sup>enhancing efficiency</sup>.', 'no danger: **No human doctor etc. needs to enter the operating area**<sup>enhancing safety</sup> and **thus it is significantly safer and therefore protects human lives**<sup>enhancing safety</sup>.', '**deployment possibility**<sup>enhancing safety</sup>: **Also possible under dangerous conditions**<sup>enhancing safety</sup>', '**delivery of goods**<sup>task execution</sup>: **Robots can, for example, deliver goods in war zones, which

In [102]:
my_array

[{'robot': 'RR',
  'category': 'SA',
  'coded_texts': ['fewer mistakes',
   'are mentally superior: No trauma for rescuers',
   "reduction of injury risk: compared to 'soft' rescue robots; valid for hard rescue robots",
   '**accuracy of work**<sup>work quality</sup>',
   'use in dangerous situations: **Rescue robots can be used in dangerous situations**<sup>enhancing safety</sup> where humans are exposed to too high a risk.',
   'stronger than human: **Specially equipped machines could help trapped victims quicker and more efficiently**<sup>enhancing efficiency</sup>.',
   'no danger: **No human doctor etc. needs to enter the operating area**<sup>enhancing safety</sup> and **thus it is significantly safer and therefore protects human lives**<sup>enhancing safety</sup>.',
   '**deployment possibility**<sup>enhancing safety</sup>: **Also possible under dangerous conditions**<sup>enhancing safety</sup>',
   '**delivery of goods**<sup>task execution</sup>: **Robots can, for example, deliv

In [70]:
coded_texts_deductively = LLMCode.code_deductively(texts,
                     research_question=research_question,
                     codebook=code_descriptions,
                     gpt_model=gpt_model,
                     few_shot_examples=None,
                     use_cache=True,
                     verbose=False)

 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 


In [71]:
coded_texts_deductively

['fewer **mistakes**<sup>work quality</sup>',
 'are mentally superior: **No trauma for rescuers**<sup>enhancing safety</sup>',
 "reduction of injury risk: compared to 'soft' rescue robots; **valid for hard rescue robots**<sup>enhancing safety</sup>",
 'accuracy of work',
 'use in dangerous situations: **Rescue robots can be used in dangerous situations where humans are exposed to too high a risk.**<sup>enhancing safety</sup>',
 'stronger than human: **Specially equipped machines could help trapped victims quicker and more efficiently.**<sup>enhancing efficiency; saves lives</sup>',
 'no danger: **No human doctor etc. needs to enter the operating area and thus it is significantly safer**<sup>enhancing safety</sup> **and therefore protects human lives**<sup>saves lives</sup>.',
 'deployment possibility: **Also possible under dangerous conditions**<sup>enhancing safety</sup>',
 'delivery of goods: Robots can, for example, **deliver goods in war zones, which would be dangerous for humans**

In [72]:
#@title Show distribution of LLM-generated deductive codes

# Parse all codes and highlights in LLM output
code_highlights_ded = LLMCode.get_codes_and_highlights(coded_texts_deductively)


In [73]:
code_highlights_ded

defaultdict(list,
            {'work quality': ['mistakes',
              'work',
              'Machines can accomplish their tasks reliably and reproducibly'],
             'enhancing safety': ['No trauma for rescuers',
              'valid for hard rescue robots',
              'Rescue robots can be used in dangerous situations where humans are exposed to too high a risk.',
              'No human doctor etc. needs to enter the operating area and thus it is significantly safer',
              'Also possible under dangerous conditions',
              'deliver goods in war zones, which would be dangerous for humans',
              'risk',
              'rescuer endangered',
              'People do not have to put themselves in dangerous situations',
              'even where it is dangerous for humans, a robot can go there',
              'reduction of injury risk',
              'the loss of a robot is less tragic than the loss of a human rescuer',
              'can be used in time

In [74]:
import plotly.express as px


# @title
def plot_generated_codes(code_highlights, title):
    code_counts = [(code, len(highlights)) for code, highlights in code_highlights.items()]
    df_codes = pd.DataFrame(code_counts, columns=['Code', 'Count'])
    df_codes = df_codes.sort_values(by='Count', ascending=False).reset_index(drop=True)

    # Create a vertical bar plot using Plotly with angled x-axis labels
    fig = px.bar(df_codes, x='Code', y='Count', title=title)

    # Update layout to angle x-axis labels at 45 degrees
    fig.update_layout(xaxis_tickangle=-45)
    fig.show()

# Parse all codes and highlights in LLM output
plot_generated_codes(code_highlights_ded, 'LLM-generated inductive codes')

In [62]:
import plotly.express as px


# @title
def plot_generated_codes(code_highlights, title):
    code_counts = [(code, len(highlights)) for code, highlights in code_highlights.items()]
    df_codes = pd.DataFrame(code_counts, columns=['Code', 'Count'])
    df_codes = df_codes.sort_values(by='Count', ascending=False).reset_index(drop=True)

    # Create a vertical bar plot using Plotly with angled x-axis labels
    fig = px.bar(df_codes, x='Code', y='Count', title=title)

    # Update layout to angle x-axis labels at 45 degrees
    fig.update_layout(xaxis_tickangle=-45)
    fig.show()

# Parse all codes and highlights in LLM output
plot_generated_codes(code_highlights_ded, 'LLM-generated inductive codes')

In [None]:
ERROR

In [None]:
system_template = """
<Context> You are a researcher tasked with summarizing a list of words into generic/superordinate categories. Based on these categories, create a dictionary that assigns the respective subordinate terms (keys from the provided "overallList") to the generic terms. Laypersons were informed about the potential risks and benefits of rigid and soft {robots} through scenario texts. They then listed their perceived risks and benefits of rigid and soft robots in the "overallList" wordlist. The overarching topic of the list is {topicCategory}, whereby the topic involved {topicCategoryDetails}.</Context>

<Data Structure> The list "overallList" is a dictionary where the keys are written arguments, and the corresponding values are one or more comments related to those arguments. The value [nan] indicates that no specific comment was provided for the respective entry. If there are multiple comments or missing entries ([nan]), it signifies that the respective argument was mentioned multiple times, emphasizing its importance. </Data Structure>

<Task> Your task is to create two outputs:
1. A list called "listGeneric" that contains the generic/superordinate categories. You may use no more than six different categories.
2. A dictionary called "dictionary" that contains: Keys (the generic/superordinate categories) and values (the corresponding words - keys - from the "overallList" that have been summarized under each category).
The dictionary must contain all corresponding words (keys) from the "overallList". If it is not possible to assign a specific word, please place it in a category called "rest category".</Task>
"""


user_template = """
List "overallList": 
{overallList}
"""

Prompt to apply coding guidelines:

In [None]:
# aaa

In [None]:
ERROR



###############################################

# aaaaaaaaaaaaaaaaaaaa

### for task to get superordinate categories within single categories

In [None]:
system_template = """
<Context>
You are a researcher tasked with summarizing a list of words into generic/superordinate categories. Based on these categories, create a dictionary that assigns the respective subordinate terms (keys from the provided "overallList") to the generic terms. Laypersons were informed about the potential risks and benefits of rigid and soft {robots} through scenario texts. They then listed their perceived risks and benefits of rigid and soft robots in the "overallList" wordlist. The overarching topic of the list is {topicCategory}.
</Context>

<Data Structure>
The list "overallList" is a dictionary where the keys are written arguments, and the corresponding values are one or more comments related to those arguments. The value [nan] indicates that no specific comment was provided for the respective entry. If there are multiple comments or missing entries ([nan]), it signifies that the respective argument was mentioned multiple times, emphasizing its importance.
</Data Structure>

<Task>
Your task is to create two outputs:
1. A list called "listGeneric" that contains the generic/superordinate categories. You may use no more than six different categories.
2. A dictionary called "dictionary" that contains:
   - Keys: The generic/superordinate categories.
   - Values: The corresponding words (keys) from the "overallList" that have been summarized under each category.
The dictionary must contain all corresponding words (keys) from the "overallList". If it is not possible to assign a specific word, please place it in a category called "rest category".
</Task>
"""

# !!!
system_template = """
<Context> You are a researcher tasked with summarizing a list of words into generic/superordinate categories. Based on these categories, create a dictionary that assigns the respective subordinate terms (keys from the provided "overallList") to the generic terms. Laypersons were informed about the potential risks and benefits of rigid and soft {robots} through scenario texts. They then listed their perceived risks and benefits of rigid and soft robots in the "overallList" wordlist. The overarching topic of the list is {topicCategory}, whereby the topic involved {topicCategoryDetails}.</Context>

<Data Structure> The list "overallList" is a dictionary where the keys are written arguments, and the corresponding values are one or more comments related to those arguments. The value [nan] indicates that no specific comment was provided for the respective entry. If there are multiple comments or missing entries ([nan]), it signifies that the respective argument was mentioned multiple times, emphasizing its importance. </Data Structure>

<Task> Your task is to create two outputs:
1. A list called "listGeneric" that contains the generic/superordinate categories. You may use no more than six different categories.
2. A dictionary called "dictionary" that contains: Keys (the generic/superordinate categories) and values (the corresponding words - keys - from the "overallList" that have been summarized under each category).
The dictionary must contain all corresponding words (keys) from the "overallList". If it is not possible to assign a specific word, please place it in a category called "rest category".</Task>
"""




user_template = """
List "overallList": 
{overallList}
"""

# rescue robots and socially assistive robots
prompt_template_SC = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", user_template)]
)

constant_new_comments_mapping = combine_dicts(constant_comments_mapping, new_comments_mapping)

result = prompt_template_SC.invoke({"robots": "rescue robots and socially assistive robots", "topicCategory": abbreviations_dict[sheet_name], "overallList": constant_new_comments_mapping})
print(result)

print("result:", result)
print("result.to_messages():", result.to_messages())

## Provide schemas for structured outputs

### for task to get main findings, differences and summary

In [None]:
json_schema = {
    "title": "Outputs",
    "description": "Bullet lists detailing the similarities and differences between the rigid and soft lists and a summary paragraph.",
    "type": "object",
    "properties": {
        "mainFindings": {
            "type": "string",
            "description": "Bullet lists highlighting the main findings of the provided rigid and soft lists",
        },
        "differences": {
            "type": "string",
            "description": "Bullet lists detailing the differences between the rigid and soft lists",
        },
          "summary": {
            "type": "string",
            "description": "Summary paragraph that provides a summary of the main findings and the found differences",
        },
    },
    "required": ["similarities", "differences", "summary"],
}

### for task to get superordinate categories within single categories

In [None]:
json_schema_notUsed = {
    "title": "Outputs",
    "description": "List that contains the generic / superordinate categories and a dictionary, which assigns the respective subordinate terms to the generic terms.",
    "type": "object",
    "properties": {
        "listGeneric": {
            "type": "string",
            "description": "List that contains the generic / superordinate categories",
        },
        "dictionary": {
            "type": "string",
            "description": "Dictionary that contains the keys, the generic / superordinate categories and the corresponding words that have been summarised under the respective category",
        },
    },
    "required": ["listGeneric", "dictionary"],
}

In [None]:
json_schema_SC = {
    "title": "Outputs",
    "description": "List that contains the generic/superordinate categories and a dictionary that assigns the respective subordinate terms to the generic terms.",
    "type": "object",
    "properties": {
        "listGeneric": {
            "type": "array",
            "description": "List that contains the generic/superordinate categories.",
            "items": {
                "type": "string"
            }
        },
        "dictionary": {
            "type": "object",
            "description": "Dictionary that contains the generic/superordinate categories as keys and the corresponding words from the 'overallList' as values.",
            "additionalProperties": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        }
    },
    "required": ["listGeneric", "dictionary"]
}

## Define basic API call

### for task to get main findings, differences and summary

In [None]:
def basic_API_call(
    prompt,
    robots,
    topicCategory,
    openai_api_key,
    dictonaryRigid,
    dictonarySoft,
    json_schema,
    model_name="gpt-4o",
    max_tokens=1000,
):

    # prompt = PromptTemplate(template=template)
    seed = 123

    model = ChatOpenAI(model=model_name, openai_api_key=openai_api_key, max_tokens=max_tokens, model_kwargs={"seed": seed}, temperature=0.0)
       
    structured_llm = model.with_structured_output(json_schema, include_raw=True)
    chain = prompt | structured_llm

    with get_openai_callback() as cb:
        response = chain.invoke(
            {"robots": robots, "topicCategory": topicCategory, "rigid": dictonaryRigid, "soft": dictonarySoft}
        )
        print(cb)
    
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
        
    return response

### for task to get superordinate categories within single categories

In [None]:
def basic_API_call_SC(
    prompt,
    robots,
    topicCategory,
    openai_api_key,
    dictonaryCombined,
    json_schema,
    model_name="gpt-4o",
    max_tokens=1000,
):

    # prompt = PromptTemplate(template=template)
    seed = 123

    model = ChatOpenAI(model=model_name, openai_api_key=openai_api_key, max_tokens=max_tokens, model_kwargs={"seed": seed}, temperature=0.0)
       
    structured_llm = model.with_structured_output(json_schema, include_raw=True)
    chain = prompt | structured_llm

    with get_openai_callback() as cb:
        response = chain.invoke(
            {"robots": robots, "topicCategory": topicCategory, "overallList": dictonaryCombined}
        )
        print(cb)
    
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
        
    return response

# Run ChatGPT

## Example (overall)

only for main findings, difference, summary

> Remark: The argument structures between the two types of robots differ significantly. Therefore, the robots are qualitatively summarized separately to ensure a clear and accurate comparison.

In [None]:
print(f"sheet_name: {sheet_name}")
print(f"abbreviations_dict[sheet_name]: {abbreviations_dict[sheet_name]}")


result = basic_API_call(prompt=prompt_template,
    robots="rescue robots and socially assistive robots",
    topicCategory=abbreviations_dict[sheet_name],
    openai_api_key=key.openai_api_key,
    dictonaryRigid=constant_comments_mapping, # overall
    dictonarySoft=new_comments_mapping,
    json_schema=json_schema,
    model_name="gpt-4o",
    max_tokens=1000,
)

In [None]:
# Extract the 'parsed' section from the JSON data
parsed_section = result.get('parsed', {})
#print(parsed_section)
# Extract the translations
mainFindings = parsed_section.get('mainFindings')
differences = parsed_section.get('differences')
summary = parsed_section.get('summary')

print(f"result (raw): {result}")
print(f"mainFindings: {mainFindings}")
print(f"differences: {differences}")
print(f"summary: {summary}")

## Separately for robots (rescue robot and socially assistive robot)

### for rescue robots (main findings, difference, summary)

In [None]:
categories = []
mainFindings = []
differences = []
summary = []
rawResults = []


for category in abbreviations_dict.keys():
    print(f"category: {category}")
    
    # do not process the rest categories
    if category not in ['RCPP', 'RCPN', 'RCA', 'RCN']:
        df = all_sheets_RR[category]
        constant_comments_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
        # print("mapping constant x comments:", constant_comments_mapping)
        # print(len(constant_comments_mapping))
        
        new_comments_mapping = create_multivalue_dict(df, 'new', 'new_comments')
        # print("mapping new x comments:", new_comments_mapping)
        # print(len(new_comments_mapping))
    
        result = basic_API_call(prompt=prompt_template,
            robots="rescue robots",
            topicCategory=abbreviations_dict[category],
            openai_api_key=key.openai_api_key,
            dictonaryRigid=constant_comments_mapping,
            dictonarySoft=new_comments_mapping,
            json_schema=json_schema,
            model_name="gpt-4o",
            max_tokens=1600, # increase limit
        )
        
        # append raw results
        categories.append(category)
        rawResults.append(result)
        
        # append parsed results
        parsed_section = result.get('parsed', {})
        mainFindings.append(parsed_section.get('mainFindings'))
        differences.append(parsed_section.get('differences'))
        summary.append(parsed_section.get('summary'))
        #print("length of mainFindings:", len(parsed_section.get('mainFindings')))
        #print("length of differences:", len(parsed_section.get('differences')))
        #print("length of summary:", len(parsed_section.get('summary')))
        
# save file
df_RR = pd.DataFrame({
    'Category': categories,
    'mainFindings': mainFindings,
    'differences': differences,
    'summary': summary,
    'rawResults' : rawResults
})

# Path to your Excel file
file_path = directory + "/output/" + "rescue robot_ChatGPT" + ".xlsx"
# save the dataframe to an Excel file
df_RR.to_excel(file_path, index=False)

### for rescue robots (get superordinate categories within single categories: listGeneric, dictionary)

In [None]:
categories = []
mainFindings = []
differences = []
summary = []
rawResults = []


for category in abbreviations_dict.keys():
    print(f"category: {category}")
    
    # do not process the rest categories
    # not in ['RCPP', 'RCPN', 'RCA', 'RCN']:
    if category in ['MT']:
        df = all_sheets_RR[category]
        constant_comments_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
        # print("mapping constant x comments:", constant_comments_mapping)
        # print(len(constant_comments_mapping))
        
        new_comments_mapping = create_multivalue_dict(df, 'new', 'new_comments')
        # print("mapping new x comments:", new_comments_mapping)
        # print(len(new_comments_mapping))
        constant_new_comments_mapping = combine_dicts(constant_comments_mapping, new_comments_mapping)


        result = basic_API_call_SC(prompt=prompt_template_SC,
            robots="rescue robots",
            topicCategory=abbreviations_dict[category],
            openai_api_key=key.openai_api_key,
            dictonaryCombined=constant_new_comments_mapping,
            json_schema=json_schema_SC,
            model_name="gpt-4o",
            max_tokens=2000, # increase limit
        )
        
        # append raw results
        categories.append(category)
        rawResults.append(result)
        
        # append parsed results
        #parsed_section = result.get('parsed', {})
        #mainFindings.append(parsed_section.get('mainFindings'))
        #differences.append(parsed_section.get('differences'))
        #summary.append(parsed_section.get('summary'))
        #print("length of mainFindings:", len(parsed_section.get('mainFindings')))
        #print("length of differences:", len(parsed_section.get('differences')))
        #print("length of summary:", len(parsed_section.get('summary')))
        


In [None]:
parsed_section = result.get('parsed', {})
print(parsed_section)

In [None]:
# save file
df_RR = pd.DataFrame({
    'Category': categories,
    'mainFindings': mainFindings,
    'differences': differences,
    'summary': summary,
    'rawResults' : rawResults
})

# Path to your Excel file
file_path = directory + "/output/" + "rescue robot_ChatGPT" + ".xlsx"
# save the dataframe to an Excel file
df_RR.to_excel(file_path, index=False)

### for socially assistive robots (main findings, difference, summary)

In [None]:
categories = []
mainFindings = []
differences = []
summary = []
rawResults = []


for category in abbreviations_dict.keys():
    print(f"category: {category}")
    
    # do not process the rest categories
    if category not in ['RCPP', 'RCPN', 'RCA', 'RCN']:
        df = all_sheets_SAR[category]
        constant_comments_mapping = create_multivalue_dict(df, 'constant', 'constant_comments')
        # print("mapping constant x comments:", constant_comments_mapping)
        # print(len(constant_comments_mapping))
        
        new_comments_mapping = create_multivalue_dict(df, 'new', 'new_comments')
        # print("mapping new x comments:", new_comments_mapping)
        # print(len(new_comments_mapping))
    
        result = basic_API_call(prompt=prompt_template,
            robots="socially assistive robots",
            topicCategory=abbreviations_dict[category],
            openai_api_key=key.openai_api_key,
            dictonaryRigid=constant_comments_mapping,
            dictonarySoft=new_comments_mapping,
            json_schema=json_schema,
            model_name="gpt-4o",
            max_tokens=1000,
        )
        
        # append raw results
        categories.append(category)
        rawResults.append(result)
        
        # append parsed results
        parsed_section = result.get('parsed', {})
        mainFindings.append(parsed_section.get('mainFindings'))
        differences.append(parsed_section.get('differences'))
        summary.append(parsed_section.get('summary'))
        
# save file
df_SAR = pd.DataFrame({
    'Category': categories,
    'mainFindings': mainFindings,
    'differences': differences,
    'summary': summary,
    'rawResults' : rawResults
})

# Path to your Excel file
file_path = directory + "/output/" + "socially assistive robot_ChatGPT" + ".xlsx"
# save the dataframe to an Excel file
df_SAR.to_excel(file_path, index=False)