# Lecture NHH 2024 - Using LLMs for financial insight
In this notebook we will introduce some of the ways we use LLMs to expand research and decision making at NBIM.

## What Will You Learn?
The notebook covers:

- Setting up a Python environment for LLM development
- Using GPT models through Azure
- Analyzing and extracting data from different financial texts

## Prerequisites
Before you start, you'll need:

- A Python 3.10+ environment
- Jupyter
- VS Code or another code editor you are comfortable with
- An Azure OpenAI API key and endpoint URL

## Getting Started
Run the first cell to install all required packages.\
The notebook contains step-by-step instructions and code examples to guide you through the process of using LLMs in Python.\
At the end of each section there are some challenges you can try to solve on your own.

We hope you enjoy the notebook. Any feedback is appreciated at andreas.harto@nbim.no.

Let's start coding!

---------------------------------------------------------------------------------------------------------------------------------------------------

## 1 - Installing the required Python packages and setting up the environment


In [None]:
# Installing the required packages
%pip install -U openai pydantic

In [None]:
# Set environment variables and create openai client

import os
from openai import AzureOpenAI
from google.colab import userdata # To get the secret keys

# Deployment name in azure openai studio
gpt_model = "gpt-4o-mini"  # ex. gpt-4o-mini

client = AzureOpenAI(
    api_key=userdata.get('AZURE_OPENAI_API_KEY'),
    api_version="2024-06-01",  # JSON mode and tool calling, used later, need a newer API version than 2023-03-15-preview
    azure_endpoint=userdata.get('AZURE_OPENAI_ENDPOINT'),
)

In [None]:
# Test that everything is working
messages = [{"role": "user", "content": "Hello, ready for some AI analysis?"}]
response = client.chat.completions.create(model=gpt_model, messages=messages, max_tokens=50)

print("Response: \n", response.choices[0].message.content)

# Expected response should be something like:
# Response:
#  Absolutely! What do you need analyzed?

-------------------

## 2 - Using the OpenAI python package

Read more about the OpenAI API [here](https://platform.openai.com/docs/api-reference/introduction)

The main input fields are:
1. "model" sets the model to use. For Azure OpenAI this is the deployment name.
2. "messages" is the main input to the API. Note that each message has a role:
    - A message with role "system" sets the system message, where you describe how you want the LLM to act and behave
    - A message with role "user" is where you write your prompt
    - A message with role "assistant" typically holds answers from the LLM
    - A message with role "tool" is used to return the results of tool calls
3. "temperature" controls the creativity/randomness of the model. A higher temperature gives more creative answers, while a lower temperature gives more predictable output.
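To build some intuition for the temperature setting, here is a minimal sketch of temperature-scaled softmax, a common way to describe how sampling randomness is controlled. The scores are made up, and this is an illustration of the general idea rather than the exact procedure used by the API (the notebook later passes `temperature=0`, which this sketch does not model).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 2.0)

print("T=0.2:", [round(p, 3) for p in low])
print("T=2.0:", [round(p, 3) for p in high])
```

With the low temperature almost all probability sits on the highest-scoring token, while the high temperature spreads it more evenly, which is why higher values tend to give more varied answers.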

A typical message list can look like:
```json
[
    {
        "role": "system",
        "content": "You are a helpful assistant. Your main task is to analyse financial reports."
    },
    {
        "role": "user",
        "content": "What does higher CAPEX indicate in a quarterly report?"
    }
]
```

The API itself has no memory, meaning that you have to send the whole chat history in every request for the LLM to know about earlier questions and answers.
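A minimal sketch of what keeping the chat history yourself could look like. The helper `ask` and the stand-in `fake_send` are hypothetical names made up for this illustration; `fake_send` replaces the real API call so the example runs without credentials.

```python
from typing import Callable

Message = dict[str, str]


def ask(history: list[Message], question: str, send: Callable[[list[Message]], str]) -> str:
    """Append the question, send the full history, and store the answer."""
    history.append({"role": "user", "content": question})
    answer = send(history)  # the whole history goes out on every request
    history.append({"role": "assistant", "content": answer})
    return answer


# Stand-in for the API call (assumption: not part of the openai package)
def fake_send(messages: list[Message]) -> str:
    return f"(reply after seeing {len(messages)} messages)"


history: list[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
first = ask(history, "What is CAPEX?", fake_send)
second = ask(history, "Why might it rise?", fake_send)

print(first)
print(second)
```

With the real client, `send` could wrap `client.chat.completions.create(model=gpt_model, messages=messages)` and return `choices[0].message.content`. Note that the list grows with every turn, so long conversations cost more tokens per request.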


In [None]:
# Using the typed message dicts (TypedDict) from openai.types.chat for improved type checking
from openai.types.chat import ChatCompletionUserMessageParam, ChatCompletionSystemMessageParam

messages = [
    ChatCompletionSystemMessageParam(
        role="system", content="You are a helpful assistant. Your main task is to analyse financial reports."
    ),
    ChatCompletionUserMessageParam(
        role="user", content="What does higher CAPEX indicate in a company's quarterly report?"
    ),
]

completion = client.chat.completions.create(model=gpt_model, messages=messages, temperature=0.5)

print(completion.choices[0].message.content)

### Challenge 1
1. Try to change the system message to tweak the response to be in another language
2. Play around with different user prompts
3. Adjust the temperature and observe the responses to the same prompt

In [None]:
# Try solving the challenges here

-------------------

## 3 - Financial analysis
LLMs are trained on large amounts of data from the internet, which means they have a cut-off date for their training data and can hallucinate when asked about facts. \
To reduce hallucinations or to include data that was not in the training set, a very common approach when working with LLMs is to include data in the prompts.

There are many ways to efficiently include external data when working with LLMs. Here are a few:
1. Using local text files
2. Fetching data directly from the web
3. Data stored in a pre-created vector store
4. Traditional databases

As mentioned earlier, LLMs themselves do not have internal memory, so we have to include all the data the LLM needs to know about in the prompt.

All the data needed in this notebook is located in the `/data` folder:
- Novo Nordisk 2024 Q2 earnings call transcript
- NVIDIA quarterly report Q2 2024
- Tesla Q1 2023 transcript
- Xiaomi 2022 earnings transcript, in Chinese

Sources:
- https://www.sec.gov/edgar/search/#
- https://seekingalpha.com/


In [None]:
# Simple example on how to include data in the prompt

data = """
    Company, Revenue
    Apple, 100
    Microsoft, 200
    Amazon, 50
    Netflix, 150
"""

messages = [
    ChatCompletionUserMessageParam(
        role="user",
        content=f"""
            Using the data below, calculate the average revenue for the year 2021.

            <data>
                {data}
            </data>

            Calculation:
        """,
    ),
]

completion = client.chat.completions.create(model=gpt_model, messages=messages, temperature=0.5)

print(completion.choices[0].message.content)
# Correct answer should be 125
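To check the model's arithmetic, the same average can be computed locally. This is a small sketch that parses the same comma-separated text with the standard library; it assumes the two-column layout shown in the cell above.

```python
import csv
import io
from statistics import mean

data = """
    Company, Revenue
    Apple, 100
    Microsoft, 200
    Amazon, 50
    Netflix, 150
"""

# Strip the indentation so the csv module sees clean rows
lines = [line.strip() for line in data.strip().splitlines()]
reader = csv.DictReader(io.StringIO("\n".join(lines)), skipinitialspace=True)
revenues = [float(row["Revenue"]) for row in reader]

average_revenue = mean(revenues)
print(average_revenue)
```

Comparing the model's answer against a locally computed reference like this is a cheap way to spot calculation mistakes.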

### Earnings call transcript

Let's start by having a look at some earnings calls from different companies. \
All the transcripts should be uploaded to the root folder of the Colab session, `/`.

The transcripts include a live Q&A where reporters and analysts can ask the company \
questions after it has presented its results. \
Since the questions and answers are live, they tend to be more candid than prepared remarks.

There is a lot of interesting data to extract from earnings calls and reports:
- Summary
- Sentiment
- Risk factors
- Financial outlook
- Sector comparison
- Etc.


In [None]:
from openai.types.chat import ChatCompletionUserMessageParam

path_to_file = "novo-nordisk-2024-q2.txt"

# Read the data from the file
with open(path_to_file, "r", encoding="UTF-8") as file:
    data = file.read()

messages = [
    ChatCompletionSystemMessageParam(
        role="system",
        content="""
            You are an expert financial assistant. Your main task is to analyse earnings call transcripts.
            You should only use the data provided to answer the questions.
            Keep your answers short and to the point.
        """,
    ),
    ChatCompletionUserMessageParam(
        role="user",
        content=f"""
            What are the most interesting insights you can find in the data below?
            Give a brief summary of the company's performance in the quarter.

            Provide a concluding statement on the company's performance:
            is the company outperforming or underperforming?

            <data>{data}</data>
        """,
    ),
]

completion = client.chat.completions.create(
    model=gpt_model,
    messages=messages,
    temperature=0.1,
)

print(completion.choices[0].message.content)

### Sentiment analysis
Let's try to use an LLM to perform sentiment analysis. \
Try the different companies and see if you agree with the score based on the summary.

Note that Xiaomi's transcript is in Chinese.

In [None]:
from openai.types.chat import ChatCompletionUserMessageParam

path_to_file = "novo-nordisk-2024-q2.txt"
# path_to_file = "tesla-q1-2023-transcript.txt"
# path_to_file = "intel-q1-2024-earnings-call.txt"
# path_to_file = "xiaomi-2022-earnings-transcript.txt"

# Read the data from the file
with open(path_to_file, "r", encoding="UTF-8") as file:
    data = file.read()

messages = [
    ChatCompletionSystemMessageParam(
        role="system",
        content="""
            You are an expert financial and linguistic analyst.
            Your main task is to analyse earnings call transcripts and score the sentiment based on the Q&A section.
            You should only use the data provided below for the analysis.
            Keep your answers short and to the point. It is very important to provide a clear, concise and honest analysis.
            Vague or unclear answers should impact the score negatively.
            The final score should be between 0 and 10. 0 being negative, 5 neutral/no change and 10 being very positive.
            Provide the final score in a <score> tag.
        """,
    ),
    ChatCompletionUserMessageParam(
        role="user",
        content=f"""
            Analyse the sentiment of the Q&A section in the data below.

            <data>{data}</data>

            Analysis:
        """,
    ),
]

completion = client.chat.completions.create(
    model=gpt_model,
    messages=messages,
    temperature=0.2,
)


assert completion.choices[0].message.content is not None

print(completion.choices[0].message.content)

# Extract the final score from the completion tag
final_score = completion.choices[0].message.content.split("<score>")[1].split("</score>")[0]
print("\n Final score:", final_score)
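The `split` call above raises an `IndexError` when the reply contains no `<score>` tag. A slightly more defensive sketch, using a regular expression and returning `None` when no valid score is found, could look like this. The helper name `extract_score` and the sample replies are made up for illustration.

```python
import re
from typing import Optional


def extract_score(text: str) -> Optional[float]:
    """Return the number inside the first <score>...</score> tag, or None."""
    match = re.search(r"<score>\s*(\d+(?:\.\d+)?)\s*</score>", text)
    if match is None:
        return None
    score = float(match.group(1))
    # The prompt asks for a score between 0 and 10; treat anything else as invalid
    return score if 0 <= score <= 10 else None


# Made-up model replies for illustration
tagged_reply = "Mostly confident answers. <score>7</score>"
untagged_reply = "Score: 7 out of 10"

print(extract_score(tagged_reply))
print(extract_score(untagged_reply))
```

Returning `None` instead of raising makes it easier to run the analysis over several transcripts and decide afterwards how to handle replies that did not follow the format.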

### Challenge 2
Create a dictionary for three of the companies with:
- Name of the company and what quarter the call covers
- The most discussed topics in the call
- The top three financial risk factors
- The current earnings per share

In [None]:
# Solution for the challenge

## 4 - Extracting numbers from financial reports

Using LLMs to extract data such as reported numbers can be powerful, but it is also challenging. \
Let's use a quarterly report from NVIDIA to see how well the LLM performs.

In [None]:
import json
from openai.types.chat import ChatCompletionUserMessageParam, ChatCompletionSystemMessageParam

path_to_file = "nvidia-report-q2-2024.txt"

# Read the data from the file
with open(path_to_file, "r", encoding="UTF-8") as file:
    data = file.read()

messages = [
    ChatCompletionSystemMessageParam(
        role="system",
        content="""
            You are an expert financial analyst. Your main task is to extract insights from financial reports.
            You should only use the data provided below for the analysis.
            The response should be in JSON format.
        """,
    ),
    ChatCompletionUserMessageParam(
        role="user",
        content=f"""
            Extract the following insights from the data below in million USD:
            - Gross profit
            - Total operating expenses
            - Net income
            - Earnings per share, basic shares

            Find the number for both Jul 2024 and Jul 2023.

            The output json should be in the following format:
            {{
                "q2_2024": {{
                    "gross_profit": <number>,
                    "operating_expenses": <number>,
                    "net_income": <number>,
                    "EPS": <number>
                }},
                "q2_2023": {{
                    "gross_profit": <number>,
                    "operating_expenses": <number>,
                    "net_income": <number>,
                    "EPS": <number>
                }}
            }}

            <data>{data}</data>

            Response:
        """,
    ),
]

completion = client.chat.completions.create(
    model=gpt_model,
    messages=messages,
    temperature=0,
    response_format={"type": "json_object"},  # JSON mode: the reply is valid JSON, but the schema is not enforced
)

assert completion.choices[0].message.content is not None

print(completion.choices[0].message.content)

q2_2024_data = json.loads(completion.choices[0].message.content)
print("Earnings per basic share in Q2 2024:", q2_2024_data["q2_2024"]["EPS"])
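JSON mode does not promise that the reply contains the keys we asked for. A minimal sketch of a structure check is shown below. It only verifies the shape of the reply, not whether the numbers are right. The function name is hypothetical, and the key names are arguments, so they should be matched to whatever the prompt actually requests. The same check could also be expressed with a pydantic model.

```python
import json


def find_structure_problems(raw_json: str, periods: tuple[str, ...], fields: tuple[str, ...]) -> list[str]:
    """List structural problems in the reply; an empty list means the shape is as requested."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top level is not an object"]

    problems = []
    for period in periods:
        values = payload.get(period)
        if not isinstance(values, dict):
            problems.append(f"missing period: {period}")
            continue
        for field in fields:
            value = values.get(field)
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{period}.{field} is not a number")
    return problems


# Placeholder values, not figures from the report
sample = json.dumps({
    "q2_2024": {"gross_profit": 1, "operating_expenses": 2, "net_income": 3, "EPS": 0.5},
    "q2_2023": {"gross_profit": 1, "operating_expenses": 2, "net_income": "n/a"},
})
problems = find_structure_problems(
    sample,
    periods=("q2_2024", "q2_2023"),
    fields=("gross_profit", "operating_expenses", "net_income", "EPS"),
)
print(problems)
```

Running a check like this before indexing into the parsed dictionary avoids a `KeyError` when the model leaves out or renames a field.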

### Challenge 3
1. Try to create a breakdown of revenue by product (Gaming, Data Center, etc.) and calculate the change from Q2 2023
2. What can you do to validate that the numbers we extract are correct? \
    Write down some suggestions or discuss with the people next to you.

In [None]:
# Solve the challenges here

## Bonus section - Using LLMs with tools

Tool use, also called function calling, is a powerful technique that allows you to connect LLMs to external tools and systems. \
By doing so, we can combine the flexibility of LLMs with grounding and actions in real systems.

Read more about function calling by OpenAI [here](https://platform.openai.com/docs/guides/function-calling), and about the more recent [structured outputs](https://openai.com/index/introducing-structured-outputs-in-the-api/). \
Similar examples are available from [Mistral](https://docs.mistral.ai/capabilities/function_calling/) and [Anthropic](https://docs.anthropic.com/en/docs/build-with-claude/tool-use).

-----
Here, we will use an example from NBIM to demonstrate the power of function calling.

At NBIM, we receive many emails from companies in which we have invested, asking us for investor meetings. \
We have over 9,000 companies in our portfolio, making it hard to keep track of who is responsible for each \
company and when we last met them. With LLMs and function calling, we can automatically identify which company \
the request is from and provide an overview of who the contact person at NBIM is, along with more details on when we last met, etc.

Let’s see how!



In [None]:
# First we define a function that represents access to our internal systems in NBIM

def get_company_investor_relation_data(company_name: str):
    # This function would normally access our internal systems to get the data
    # For the purpose of this demo, we return static data
    return {
        "company": company_name,
        "contact_person": "John Doe",
        "last_meeting_date": "2023-07-15",
        "last_meeting_notes": "The company is doing well and is planning to expand to new markets.",
        "current_holding": 1000,
    }

In [None]:
# Next we need to describe what the function does and what input it expects in a format that OpenAI can understand
from openai.types.chat import ChatCompletionToolParam
from openai.types.shared_params import FunctionDefinition

investor_relation_tool = ChatCompletionToolParam(
    function=FunctionDefinition(
        name="get_company_investor_relation_data",
        description="Get investor relation data for a company, relevant when asking for investor meetings, holdings, contact person etc.",
        parameters={
            "type": "object",
            "properties": {
                "company_name": {
                    "type": "string",
                    "description": "Name of the company",
                },
            },
            "required": ["company_name"],
            "additionalProperties": False,
        },
        strict=False,
    ),
    type="function",
)

In [None]:
# Let's see if GPT picks up that a function can be called

dummy_request = """
    Hi NBIM,

    The CFO of Spotify will be in London on September 26th.
    We would like to meet with you to discuss our latest earnings report.
    Please let me know if there is any interest in scheduling a meeting.

    Best regards,
    Jane Doe
    Investor Relations
    Spotify
"""

messages = [
    ChatCompletionSystemMessageParam(
        role="system",
        content="You are a helpful employee in NBIM communications. Use the supplied tools to assist with incoming requests.",
    ),
    ChatCompletionUserMessageParam(
        role="user",
        content=f"Use the request below to find relevant information in order to respond to the email.\n\n{dummy_request}",
    ),
]

completion = client.chat.completions.create(
    model=gpt_model, messages=messages, temperature=0.1, tools=[investor_relation_tool], tool_choice="auto"
)

print(completion.choices[0].message)

In [None]:
# Lastly, we want to see if GPT detected a tool use and then use the function and argument to call our internal system.

if not completion.choices[0].message.tool_calls:
    print("No tool call detected")
else:
    tool_call = completion.choices[0].message.tool_calls[0]

    if tool_call.function.name == "get_company_investor_relation_data":
        company_name = json.loads(tool_call.function.arguments).get("company_name", "")
        data = get_company_investor_relation_data(company_name)
        print(data)
    else:
        print("Unknown tool call")
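The cell above handles a single tool and only the first tool call. When more tools are added, one possible approach is a small registry that maps tool names to Python functions. The sketch below assumes this design; `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the demo function is shortened so the example is self-contained.

```python
import json
from typing import Any, Callable


def get_company_investor_relation_data(company_name: str) -> dict[str, Any]:
    # Static stand-in, shortened from the demo function in this notebook
    return {"company": company_name, "contact_person": "John Doe"}


# Registry mapping tool names to Python callables
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "get_company_investor_relation_data": get_company_investor_relation_data,
}


def run_tool_call(name: str, arguments_json: str) -> Any:
    """Look up the tool by name and call it with the JSON-decoded arguments."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        raise KeyError(f"Unknown tool: {name}")
    return tool(**json.loads(arguments_json))


result = run_tool_call("get_company_investor_relation_data", '{"company_name": "Spotify"}')
print(result)
```

With a real completion, you could loop over `completion.choices[0].message.tool_calls` and pass `tool_call.function.name` and `tool_call.function.arguments` to `run_tool_call` for each entry.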

We can now use the resulting data to either draft an answer to the request or further automate internal processes. \
The main advantage of using tools here is the flexibility that LLMs provide. We don’t need to create rules to \
determine which company is sending the request, and we can potentially integrate other tools to use instead of or in parallel with our investor relations tool.
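To let the model draft an answer from the tool result, the result is usually sent back as a message with role "tool" that refers to the id of the tool call. The sketch below builds such a message list with plain dictionaries. The helper name, the id `call_123` and the shortened result are made up for illustration.

```python
import json
from typing import Any


def build_tool_followup(
    messages: list[dict[str, Any]],
    tool_call_id: str,
    function_name: str,
    arguments_json: str,
    result: dict[str, Any],
) -> list[dict[str, Any]]:
    """Return a new message list that includes the tool call and its result."""
    assistant_message = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {"name": function_name, "arguments": arguments_json},
            }
        ],
    }
    tool_message = {
        "role": "tool",
        "tool_call_id": tool_call_id,  # must match the id of the call it answers
        "content": json.dumps(result),
    }
    return [*messages, assistant_message, tool_message]


# Hypothetical id and shortened result, for illustration only
followup = build_tool_followup(
    messages=[{"role": "user", "content": "Spotify asks for a meeting."}],
    tool_call_id="call_123",
    function_name="get_company_investor_relation_data",
    arguments_json='{"company_name": "Spotify"}',
    result={"company": "Spotify", "contact_person": "John Doe"},
)
print(json.dumps(followup, indent=2))
```

Passing a list like `followup` as `messages` in a second call to `client.chat.completions.create` should give the model what it needs to write a reply that uses the contact person and meeting history.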