# Set up the environment
## Install and import packages
The following cells install the required packages into your virtual environment and import them for use in this notebook.
Read the comments (any text that follows "#") for details of each line. Where applicable, the comments also explain what you can change for your own use case.

In [None]:
# Install required packages
! pip install -r ../requirements.txt

In [1]:
# import required packages
import os # this package is used to access information from the operating system
import json # this package is used to read and write JSON files
import base64 # this package is used to encode and decode binary data, such as images
from openai import OpenAI # this package is used to interact with the OpenAI API
from dotenv import load_dotenv # this package is used to load environment variables from a .env file
import pandas as pd # this package is used for working with datasets

## Set variables and define functions
The following cells configure specific variables and define functions used later in this notebook. 
Note that using the OpenAI API requires an OpenAI secret key. OpenAI provides information about [managing your API keys](https://platform.openai.com/docs/api-reference/authentication).

In [None]:
# Load environment variables within the virtual environment
load_dotenv()  # This line brings all environment variables from .env into os.environ 

# Define the client. OpenAI() reads the API key from the OPENAI_API_KEY environment variable loaded above
client = OpenAI()
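
Before creating the client, it can help to confirm that the key was actually loaded. The following is a minimal sketch, assuming the key is stored in `.env` under the name `OPENAI_API_KEY` (the variable the OpenAI Python client reads by default). The helper name `api_key_is_set` is hypothetical and not part of this notebook.

```python
import os

def api_key_is_set(var_name="OPENAI_API_KEY"):
    """Return True if the named environment variable holds a non-empty value."""
    return bool(os.environ.get(var_name, "").strip())

# Warn early instead of waiting for the first API request to fail
if not api_key_is_set():
    print("OPENAI_API_KEY is not set. Check that your .env file exists and contains the key.")
```

If the warning appears, check that the `.env` file sits where `load_dotenv()` can find it and that the variable name matches.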

In [6]:
# Define a function to encode the image for the LLM
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
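
The generation cell later in this notebook builds a data URL with the image type fixed as `image/png`. If a collection mixes PNG and JPEG files, one possible variation is to guess the type from the file extension. This is a sketch only; the function name `image_to_data_url` is hypothetical and it relies on Python's standard `mimetypes` module.

```python
import base64
import mimetypes

def image_to_data_url(image_path):
    """Encode an image file as a base64 data URL, guessing the MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could be used as the `image_url` value in place of the hard-coded `data:image/png;base64,...` string.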

In [None]:
# Define system instructions for the LLM. Using the API, users can set system instructions for all requests sent to the LLM, which guide the model's responses. Users can edit the text below to change the system instructions. Run this cell after making any changes to set the revised system instructions before generating responses.
system_instructions = ("Interpretive text are words used to explain the meaning behind museums, collections, or objects. Museums use collections to convey meaning, and an immediate way to interpret museum collections is through text. Interpretive text can give you a chance to experiment and to think outside the box. Well planned, written and placed text is a useful and effective way of encouraging your visitors to engage with your collections and the meanings behind them. Consider an object in a museum without any interpretive text. Museum visitors will project their own meaning onto this object, drawing upon their experience, interest, and knowledge. The role of interpretive text is to show the object within its wider context whilst still allowing the visitor to make up their own mind. This can be applied to groups of objects, historic buildings, landscapes, events, or indeed anything else you are interpreting. Interpretive text should be clear and succinct. Most visitors will not want to read a label or panel of more than 100 words. The text should be split into short paragraphs. In addition to clarity, labels should also have personality and rhythm, which will be favourable to the visitor's imagination and pique their interest. Before you begin writing, remember: 1. Objects are relevant or interesting because of the people who have used or continue to use them. You should ensure that there is a human presence in your interpretive text. People relate to other people. Utilise a human story from any object you are interpreting; 2. Use text to place objects in their historical and cultural context; 3. The text is adding context and conveying information about the object. The emphasis should be on the object; the text should be supportive; 4. Interpretive text doesn't stop with a physical visit to the museum. Many visitors will have looked at the museum's website before visiting. They may continue to engage with the museum in a digital capacity post-visit. "
    "You can use a digital platform like your organisation's website to go into more detail about objects. When you do this, the overall tone and writing style should stay consistent with interpretive text used elsewhere; 5. A label should make direct reference to its object. Encourage the label reader to look closely at the object and to develop their own conclusions about it, where this is appropriate; 6. Carefully select each word you use so that either a narrative is developed, the reader has learned something, or their interest is stimulated. This will also help you to keep the text succinct. Follow the 25% rule of copywriting. (Write your text, then cut out 50%. Review, then cut out another 50%. The remaining 25% should be clear and concise.). People often talk about the tone of a piece of writing; the impression it might create in its readers, or the associations and memories that this may evoke in the reader. This is especially important when attempting to engage visitors in your collections, buildings, or spaces. You should ensure as far as possible that the tone of your interpretation is appropriate to the content of your exhibition. Are you interpreting something that may be divisive? Do the objects or text that make up your exhibit have the potential to be upsetting to people? Extra care should be taken to ensure that your tone is respectful and appropriate, and that your language is inclusive. Use this guidance along with the metadata and image provided by the user to write interpretive text. The text should not exceed 100 words.")

## Test your connection to the OpenAI API
The following cell allows users to test their connection to the OpenAI API. If there are problems with keys or with the model selected in the chat completion configuration, users can troubleshoot them now on an ad hoc basis before working programmatically with larger datasets.
OpenAI makes available information about the [different models available to access via API](https://platform.openai.com/docs/pricing) along with the pricing per 1M tokens for each model.
The following cell provides an example of a chat completion response along with the number of tokens consumed to produce that response.


In [None]:
# Test the OpenAI API by creating a chat completion
# This is a simple example to check if the API is working correctly
# You can replace the messages with your own input to test different functionalities.

completion = client.chat.completions.create(
  model="gpt-3.5-turbo", # You can change the model to any other available model
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

# The following lines show an example of how the model generates a response and how to access the generated content and associated metadata.
print(completion) # This prints the entire completion object, which contains metadata and the generated response.
print(completion.choices[0].message) # This prints the message part of the first choice in the completion, which is the actual response generated by the model.
print(completion.choices[0].message.content) # This prints just the content of the message, which is the text response from the model.
print("********") # This prints a separator line for better readability in the output.
# The following lines show how to access the token usage statistics of the completion request.
print("Completion tokens: "+str(completion.usage.completion_tokens)) # This prints the number of completion tokens used in the request.
print("Prompt tokens: "+str(completion.usage.prompt_tokens)) # This prints the number of prompt tokens used in the request.
print("Total tokens: "+str(completion.usage.total_tokens)) # This prints the number of total tokens used in the request.

ChatCompletion(id='chatcmpl-C1BdpeI3Zt2Bk9ef7CIss0te0lnRi', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello! How can I assist you today?', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1754399113, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=19, total_tokens=28, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
ChatCompletionMessage(content='Hello! How can I assist you today?', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None)
Hello! How can I assist you today?
********
Completion tokens: 9
Prompt tokens: 19
Total tokens: 28
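
Because pricing is quoted per 1M tokens, the token counts above can be turned into a rough cost estimate. The sketch below shows the arithmetic only. The function name `estimate_cost` is hypothetical, and the prices passed in are placeholders rather than real OpenAI prices, so substitute the current values from the pricing page.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_million, output_price_per_million):
    """Estimate the cost of one request from its token counts and per-1M-token prices."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost

# Token counts from the test request above; prices are PLACEHOLDERS, not real OpenAI prices
example_cost = estimate_cost(19, 9, input_price_per_million=1.00, output_price_per_million=2.00)
print(f"Estimated cost: ${example_cost:.6f}")
```

Multiplying a per-object estimate by the number of rows in a dataset gives an approximate budget before running a large batch.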


# Create your first dataframe
Now that you have set up your virtual environment with all requisite packages and parameters, import the collection metadata to start working with it programmatically. Dataframes are two-dimensional data structures commonly used in data science, machine learning, and other data-intensive fields. The following cell creates a dataframe (*df*) from a JSON file. To import data from a CSV file instead, use `pd.read_csv` in place of `pd.read_json`.

In [3]:
# use pandas to read a JSON file
df = pd.read_json('../data/worldCulturesLivingLands.json')  # replace with your actual file path
# The following line selects specific columns from the data. You can add, remove, or use different names for the columns based on your dataset.
df = df[['museum_reference','title','description','date','style_culture','image','interpretation']]
# display the first few rows of the dataframe
df.head()

| | museum_reference | title | description | date | style_culture | image | interpretation |
|---|---|---|---|---|---|---|---|
| 0 | A.1911.397.243 | Firestick | Firestick, wood with a reed sheath decorated w... | 19th - 20th century | Australian Aboriginal | A.1911.397.243.png | A common method for making fire was to rub two... |
| 1 | A.1898.372.39 | Vessel | Vessel (coolamon), roughly elliptical with rai... | 19th century | Australian Aboriginal | A.1898.372.39.png | Coolamon are wooden dishes moulded over a fire... |
| 2 | A.1911.397.159 | Bag | String bag (dilly-bag), plant fibre: Australas... | 19th - 20th century | Australian Aboriginal | A.1911.397.159.png | Across Australia's Northern Territory, Aborigi... |
| 3 | K.2002.825 | Basket | Basket, bicornual with handle, cane plant: Aus... | 19th century | Australian Aboriginal | K.2002.825.png | Crescent-shaped cane baskets were unique to th... |
| 4 | V.2008.25 | Basket | Basket with black, red and yellow strips runni... | c. 2007 | Pitjantjatjara | V.2008.25.png | Colourful coiled baskets are popular tourist i... |
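
When adapting the import cell to a different dataset, it may be useful to confirm that the expected columns are present before selecting them. The sketch below reads a small in-memory CSV as a stand-in for a real file. The sample row is invented for illustration, and the helper name `missing_columns` is hypothetical.

```python
import io
import pandas as pd

# Columns the generation step in this notebook reads from each row
REQUIRED_COLUMNS = ['museum_reference', 'title', 'description', 'date', 'style_culture', 'image']

def missing_columns(dataframe, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the dataframe."""
    return [column for column in required if column not in dataframe.columns]

# Invented sample data standing in for a real CSV file
sample_csv = io.StringIO(
    "museum_reference,title,description,date,style_culture,image\n"
    "X.0000.1,Example object,Example description,19th century,Example culture,X.0000.1.png\n"
)
sample_df = pd.read_csv(sample_csv)
print(missing_columns(sample_df))
```

An empty list means every required column was found; otherwise the list names the columns to add or rename.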


# Generate interpretive text using the LLM

The following cell uses the system instructions, the metadata, and an image of each object to prompt OpenAI's GPT-4.1 model to generate interpretive text for every object in the dataframe created above.
Model selection depends on a variety of factors including performance, capability, and cost. OpenAI publishes [tools for comparing the different models](https://platform.openai.com/docs/models). If images are included in the prompt, then the selected model must have "vision" capabilities. Users should note that image inputs are [charged in tokens based on their dimensions](https://openai.com/api/pricing/).


In [None]:
# Prompt the OpenAI API to generate interpretive text for each object in the dataframe.

# create a list to store the responses
GenAIinterpretation = []
# create a list to store the number of tokens used
tokens_used = []

# loop through each row in the dataframe 
for index, row in df.iterrows():
    # define object metadata
    image_path = os.path.join('../images', row['image'])  # create the full path to the image file
    title = row['title']  # get the title from the dataframe
    description = row['description']  # get the description from the dataframe
    date = row['date']  # get the date from the dataframe
    style_culture = row['style_culture']  # get the style or culture from the dataframe

    # encode the image to base64
    base64_image = encode_image(image_path)

    # use the OpenAI API to generate object interpretation
    response = client.responses.create(
        model="gpt-4.1", # You can change the model to any other available model.
        input=[
            {
            "role": "system", 
            "content": system_instructions
            },
            {
                "role": "user",
                "content": [
                    { 
                        "type": "input_text", 
                        "text": f"Title: {title}, Description: {description}, Date: {date}, Culture: {style_culture}" 
                    },
                    {
                        "type": "input_image",
                        "image_url": f"data:image/png;base64,{base64_image}",
                    },
                ],
            }
        ],
    )
    GenAIinterpretation.append(response.output_text) # This appends the generated interpretation to the list
    tokens_used.append(response.usage.total_tokens) # This appends the number of tokens used to the list

# add the responses to the dataframe
df['GenAIinterpretation'] = GenAIinterpretation
df['tokens_used'] = tokens_used

# display the first few rows of the dataframe with the interpretations
df.head()

| | museum_reference | title | description | date | style_culture | image | interpretation | GenAIinterpretation | tokens_used |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A.1911.397.243 | Firestick | Firestick, wood with a reed sheath decorated w... | 19th - 20th century | Australian Aboriginal | A.1911.397.243.png | A common method for making fire was to rub two... | Firesticks such as this, with a reed sheath an... | 1258 |
| 1 | A.1898.372.39 | Vessel | Vessel (coolamon), roughly elliptical with rai... | 19th century | Australian Aboriginal | A.1898.372.39.png | Coolamon are wooden dishes moulded over a fire... | This coolamon, an elliptical wooden vessel fro... | 1273 |
| 2 | A.1911.397.159 | Bag | String bag (dilly-bag), plant fibre: Australas... | 19th - 20th century | Australian Aboriginal | A.1911.397.159.png | Across Australia's Northern Territory, Aborigi... | Woven from plant fibres, this Australian Abori... | 1258 |
| 3 | K.2002.825 | Basket | Basket, bicornual with handle, cane plant: Aus... | 19th century | Australian Aboriginal | K.2002.825.png | Crescent-shaped cane baskets were unique to th... | Handwoven from cane plant, this bicornual bask... | 1250 |
| 4 | V.2008.25 | Basket | Basket with black, red and yellow strips runni... | c. 2007 | Pitjantjatjara | V.2008.25.png | Colourful coiled baskets are popular tourist i... | Woven by Alison (Milyika) Carroll in 2007, thi... | 1290 |
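
The loop above stops at the first failed request, which discards the responses collected so far. One possible safeguard, sketched here under the assumption that a short wait and a retry is acceptable, is a small wrapper around the request. The name `call_with_retries` is hypothetical and is not part of the OpenAI library.

```python
import time

def call_with_retries(request_fn, max_attempts=3, wait_seconds=2.0):
    """Call request_fn() until it succeeds or max_attempts is reached.

    Returns the result on success, or None if every attempt raised an exception.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception as error:  # deliberately broad: this is only a sketch
            print(f"Attempt {attempt} of {max_attempts} failed: {error}")
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    return None
```

In the loop, the request could be passed as `lambda: client.responses.create(...)`, with a `None` result recorded as a missing interpretation for that row so the remaining objects are still processed.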


In [41]:
# display the interpretation for any individual object in the dataframe
display(df.GenAIinterpretation[100]) # Change the number in square brackets following 'GenAIinterpretation' to view the GenAIinterpretation for a different object

'Crafted in 19th-century Cape York Peninsula, this necklace features delicate pearl shell oblongs, each hand-perforated and patiently threaded onto cordage. Such adornments held significance for Aboriginal communities in Queensland, symbolising identity, connection to land and sea, and intricate craftsmanship. As you observe the soft iridescence of the shell, consider the careful hands that shaped each piece, and the stories woven into this wearable piece of history.'
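
The system instructions ask for text of no more than 100 words, but the model is not guaranteed to comply. A simple check, sketched below, counts whitespace-separated words. The function names `word_count` and `exceeds_limit` are hypothetical.

```python
def word_count(text):
    """Count whitespace-separated words in a string."""
    return len(str(text).split())

def exceeds_limit(text, limit=100):
    """Return True if the text contains more words than the limit."""
    return word_count(text) > limit

sample_text = "Crafted from wood, this example label is well within the limit."
print(word_count(sample_text), exceeds_limit(sample_text))
```

Applied to the dataframe, `df['GenAIinterpretation'].apply(word_count)` would give a word count for each generated text.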

# Save the dataframe
Dataframes work very well within this virtual environment for a wide variety of computational tasks, but they are not designed for long-term data storage or for sharing and distribution. The following cells take the data from the dataframe and store it in one of two persistent file formats. Users can use the first cell to export data to a CSV file and the second to export data to a JSON file.

In [None]:
# use pandas to write a CSV file
df.to_csv('../outputs/GenAIinterpretation.csv', index=False)  # replace '../outputs/GenAIinterpretation.csv' with the preferred file path

In [None]:
# use pandas to write a JSON file
df.to_json('../outputs/GenAIinterpretation.json', orient='records', lines=True)  # replace '../outputs/GenAIinterpretation.json' with the preferred file path; lines=True writes one record per line (JSON Lines)
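
A file written with `orient='records', lines=True` holds one JSON object per line, so it needs `lines=True` when it is read back with pandas. The sketch below demonstrates the round trip with a small invented dataframe and a temporary folder, so it does not touch the real output files.

```python
import os
import tempfile
import pandas as pd

# Invented example data standing in for the real dataframe
example_df = pd.DataFrame({"title": ["Example object"], "tokens_used": [123]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.json")
    example_df.to_json(path, orient="records", lines=True)  # write JSON Lines
    reloaded_df = pd.read_json(path, lines=True)  # read JSON Lines back

print(reloaded_df)
```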

Go to the Workbooks directory and open the workbook titled "2_analysis.ipynb" to continue with the analysis.