# Setup

In this notebook, we will explore Large Language Models (LLMs) using the OpenAI API. An **API (Application Programming Interface)** allows different software applications to communicate with one another. By using the OpenAI API, we can connect to OpenAI's language models, such as GPT-4, to generate human-like text and perform tasks including question answering, summarization, and translation.

## API Key Setup

To begin using the OpenAI API, we need to set up an API key. This key connects the code to your OpenAI account and manages the billing. OpenAI charges based on the number of tokens (words or chunks of words) processed. Different models have different pricing structures, based on their size and capabilities.

- **GPT-4o** is, at the time of writing, OpenAI's latest general-purpose model, and it is the model used for the chat examples in this notebook.
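Because billing is per token, a rough cost estimate is simply token counts multiplied by per-token prices. The sketch below illustrates the arithmetic only: the model name and the prices are hypothetical placeholders, not real OpenAI rates, and `estimate_cost` is a helper invented for this example.

```python
# Hypothetical prices in USD per 1 million tokens; NOT real OpenAI rates.
HYPOTHETICAL_PRICES = {
    "example-model": {"input": 2.00, "output": 8.00},
}


def estimate_cost(model, input_tokens, output_tokens, prices=HYPOTHETICAL_PRICES):
    """Return the estimated cost in USD for one request."""
    rate = prices[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


cost = estimate_cost("example-model", input_tokens=1_500, output_tokens=500)
print(f"Estimated cost: ${cost:.4f}")
```

Input and output tokens are priced separately here because providers commonly charge different rates for each; check the current pricing page before relying on any number.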

To set the API key, we configure it as an environment variable. The steps vary depending on your operating system.

For detailed documentation and additional usage instructions, please refer to the [OpenAI API Reference](https://platform.openai.com/docs/api-reference).
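Whichever way the variable is set, it can help to confirm that the key is visible to Python before making any requests. The following is a minimal sketch; `get_api_key` and `mask` are hypothetical helper names, and the key used in the demonstration is fake.

```python
import os


def get_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before creating the client.")
    return key


def mask(key, visible=4):
    """Show only the last few characters so the key is never printed in full."""
    return "*" * max(len(key) - visible, 0) + key[-visible:]


# Demonstration with a fake environment so no real key is needed
fake_env = {"OPENAI_API_KEY": "sk-example-1234"}
print(mask(get_api_key(fake_env)))
```

Masking the key before printing keeps it out of saved notebook output, which matters when notebooks are shared.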

### Windows Users

If you're using a Windows machine, run the code cell below to set the environment variable for the current notebook session:

### Google Classroom

We'll need to share API keys. Please join this classroom to get them: https://classroom.google.com/c/NzE0OTI4MTQxODI5?cjc=gdyiwn6


In [4]:
import os
os.environ['OPENAI_API_KEY'] = "key from Alex"


SUCCESS: Specified value was saved.


### Mac/Linux Users

If you're on macOS or Linux, uncomment and run the cell below.

In [None]:
# `!export` only affects a temporary subshell; the %env magic sets the variable for this kernel.
#%env OPENAI_API_KEY=your-api-key-here

## Installing the OpenAI Python Package

Before interacting with the OpenAI API, we need to ensure that the OpenAI Python package is installed. This package provides a simple interface to the OpenAI API, allowing us to make requests and receive responses.

To install the package, run the following command:


In [1]:
!pip install openai
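To confirm the installation worked, we can look up the installed version using only the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and the version printed depends on your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("openai version:", installed_version("openai"))
```

A result of `None` means the package is not visible to the current kernel, which usually indicates that `pip` installed it into a different Python environment.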



### What is a large language model (LLM)?

* An AI system designed to process and generate human-like language at scale.
* Uses deep learning (neural networks with many layers).
* Trained on massive amounts of text data to learn patterns and relationships in language.
* "Large" because these models often contain billions of parameters.
    * Think of parameters as mathematically similar to regression coefficients:
        * $y = x_1 \cdot \text{Parameter}_1 + x_2 \cdot \text{Parameter}_2 + \cdots$
    * More parameters let the model capture a wider range of linguistic features and nuances:
        * GPT-1: 117 million parameters
        * GPT-2: 1.5 billion parameters
        * GPT-3: 175 billion parameters
        * GPT-4: undisclosed, possibly a trillion or more
* Reasonably complex architecture
* At its essence, just a bunch of (clever) matrix algebra
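To make the regression analogy concrete, the sketch below computes a prediction as a weighted sum and counts the parameters in a small stack of fully connected layers. The layer sizes are made up for illustration and do not correspond to any GPT model; real LLMs also include components (such as attention) that this sketch ignores.

```python
def linear_predict(inputs, parameters):
    """y = x_1 * Parameter_1 + x_2 * Parameter_2 + ... (a dot product)."""
    return sum(x * p for x, p in zip(inputs, parameters))


def count_dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


y = linear_predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(y)

# Illustrative layer sizes only
n_params = count_dense_parameters([512, 2048, 512])
print(n_params)
```

Even this toy network has about two million parameters, which gives a sense of how quickly the counts grow as layers get wider and deeper.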

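### How does an LLM operate?

We'll focus on three request parameters (settings we pass to the API, not the model's learned parameters):
* Max tokens
    * The maximum number of tokens to generate in the completion
* Temperature
    * Low = less random, more predictable output
    * High = more random, more "creative" output
* Logprobs
    * Include the log probabilities of the most likely alternative tokens at each position

Let's consider a GPT-3-era base model (`davinci-002`):
* A low temperature would give nearly the same response every time we run it; the example below uses `temperature=0.9`, so its output varies between runs
* Limit the response to 10 tokens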
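The effect of temperature can be illustrated with the standard softmax-with-temperature formula. This is the textbook formulation, offered as a sketch; it is an assumption that OpenAI's sampling behaves this way internally, and the scores below are made up. Note that this sketch rejects a temperature of 0, even though the API accepts that value.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability lands on the top-scoring token, so sampling rarely picks anything else; at a high temperature the distribution flattens and less likely tokens are chosen more often.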




## Making an API Request to OpenAI

After installing the OpenAI Python package, we can start making API requests. In the following example, we use the **OpenAI client** to generate a text completion with a specified model, prompt, and additional parameters.


In [7]:
import os

# Set the API key for this session (replace the placeholder; never commit a real key)
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

In [10]:
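from openai import OpenAI
import pandas as pd

# The client reads OPENAI_API_KEY from the environment
client = OpenAI()

response = client.completions.create(
    model="davinci-002",
    prompt="The Miller College of Business at Ball State is the best in the world because",
    max_tokens=10,    # generate at most 10 tokens
    temperature=0.9,  # fairly high, so the output varies between runs
    logprobs=3,       # also return log probabilities for the 3 most likely tokens
)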

In [11]:
response.choices[0].text

' of our talented leaders, families, alumni and friends'

In [12]:
logprobs = response.choices[0].logprobs

# Creating a DataFrame from the extracted components
df = pd.DataFrame({
    'text_offset': logprobs.text_offset,
    'token_logprobs': logprobs.token_logprobs,
    'tokens': logprobs.tokens,
    'top_logprobs': logprobs.top_logprobs
})

# Display the DataFrame
df

| | text_offset | token_logprobs | tokens | top_logprobs |
|---|---|---|---|---|
| 0 | 77 | -1.005021 | `of` | `{' of': -1.0050205, ' we': -1.6180624, ' it': ...` |
| 1 | 80 | -1.366456 | `our` | `{' our': -1.3664559, ' the': -1.0654763, ' its...` |
| 2 | 84 | -3.776703 | `talented` | `{' students': -1.6990833, ' people': -1.923612...` |
| 3 | 93 | -8.87825 | `leaders` | `{' and': -1.2491404, ' faculty': -1.3675429, '...` |
| 4 | 101 | -0.909157 | `,` | `{',': -0.90915686, ' and': -1.4239655, '.': -1...` |
| 5 | 102 | -7.531237 | `families` | `{' faculty': -1.9100987, ' students': -2.59267...` |
| 6 | 111 | -0.360292 | `,` | `{',': -0.36029196, ' and': -1.5601678, ' who':...` |
| 7 | 112 | -2.076356 | `alumni` | `{' alumni': -2.0763564, ' and': -1.1199309, ' ...` |
| 8 | 119 | -2.079212 | `and` | `{' and': -2.079212, ',': -0.1461331, ' &': -6....` |
| 9 | 123 | -0.711833 | `friends` | `{' friends': -0.71183294, ' supporters': -2.46...` |
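Log probabilities are easier to interpret once converted back to ordinary probabilities with `exp`. The sketch below does this for the `token_logprobs` column shown above (values copied by hand from the table) and also sums them to get the log probability of the whole completion.

```python
import math

# token_logprobs copied from the table above
token_logprobs = [-1.005021, -1.366456, -3.776703, -8.87825, -0.909157,
                  -7.531237, -0.360292, -2.076356, -2.079212, -0.711833]

# exp(logprob) gives the probability the model assigned to each sampled token
token_probs = [math.exp(lp) for lp in token_logprobs]

# Log probabilities add, so the sum is the log probability of the full sequence
sequence_logprob = sum(token_logprobs)

print([round(p, 4) for p in token_probs])
print("log-probability of the whole completion:", round(sequence_logprob, 4))
```

The very negative values for `leaders` and `families` show that these were unlikely choices, which is what we would expect to see occasionally when sampling at a high temperature.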


Other "useful" parameters:
* Logit Bias
    * Introduces a bias toward or against specific tokens
    * i.e., ban or exclusively select words, phrases, topics, etc.
    * Continuous parameter (from -100 to 100 per token)
* Frequency Penalty
    * Penalizes (or, if negative, rewards) tokens in proportion to how often they have already appeared in the response
    * Can help prevent repeated responses
* Presence Penalty
    * Penalizes (or, if negative, rewards) any token that has already appeared, nudging the model toward novel predictions
    * Doesn't depend on the frequency of past predictions
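These three parameters can all be understood as adjustments to a token's score (logit) before sampling. The sketch below follows the adjustment formula that OpenAI's documentation has described (subtract count times the frequency penalty, subtract the presence penalty once if the token has appeared, add any logit bias). Treat it as an illustration: `adjust_logits` is a hypothetical helper, the scores are made up, and in the real API `logit_bias` is keyed by token ID rather than by word.

```python
from collections import Counter


def adjust_logits(logits, generated_tokens, frequency_penalty=0.0,
                  presence_penalty=0.0, logit_bias=None):
    """Return a new dict of token -> adjusted logit."""
    counts = Counter(generated_tokens)
    logit_bias = logit_bias or {}
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        adjusted[token] = (
            logit
            - count * frequency_penalty                       # grows with each repeat
            - (1.0 if count > 0 else 0.0) * presence_penalty  # flat, applied once seen
            + logit_bias.get(token, 0.0)                      # manual nudge per token
        )
    return adjusted


logits = {"tax": 2.0, "credit": 1.5, "refund": 1.0}  # made-up scores
history = ["tax", "tax", "credit"]                   # tokens generated so far
adjusted = adjust_logits(logits, history, frequency_penalty=0.5,
                         presence_penalty=0.25, logit_bias={"refund": -100})
print(adjusted)
```

Here `tax` is penalized twice by the frequency penalty because it appeared twice, while the bias of -100 effectively bans `refund`.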

If you want to know more...

[Amazing, comprehensive explanation by Stephen Wolfram](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)


## Utilizing GPT-4o

In [13]:
completion_1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Explain 2024 tax law changes relevant for individual returns in the style of a star wars title crawl."
        }
    ]
)

print(completion_1.choices[0].message.content)

A long time ago, in a galaxy not so far away,
the 2024 tax law changes emerged from the cosmic dust
of the fiscal nebula, bringing new hope to taxpayers
across the star systems.

Episode IV: A NEW DEDUCTION

It is a period of updates and amendments.
The galactic council of the IRS,
striving to balance the force of wealth and fairness,
has unveiled new tax provisions for individual returns.

A NEW STANDARD RISES
The enhanced standard deduction now offers greater shelter
from the imperial tax burden. Brave citizens may now claim
higher amounts, resisting the dark side of over-taxation.

THE ADVENT OF SAVINGS Hope emerges in the form of expanded
retirement savings plans. With increased limits for IRA contributions
and 401(k) plans, more rebels can now secure their future, building
their armada against the financial empire.

REBELLION AGAINST INFLATION Inflation adjustments come to the aid
of taxpayers, modifying tax brackets, credits, and deductions.
These adjustments protect the scat

## A more advanced prompting strategy

In [14]:
context = """
Doug and Pam Prospect are retired and ages 79 and 78 respectively. Doug was a former executive and Pam historically stayed at home with their children.  Today, they spend most of their time traveling to see their three adult children and eight grandchildren.  When not traveling, they are very engaged and active with their local church.  
Doug and Pam have approximately $5,000,000 in investable assets at a large investment advisory firm and a home valued at $1,000,000 with no debt. Their assets are invested exclusively in public markets with approximately 70% in equities and 30% in fixed income securities; all of which are either mutual funds or exchange traded funds (ETFs).   A further breakdown of the investable assets indicates that $1,000,000 is held in a Traditional IRA and the balance in non-qualified accounts, which have a net unrealized long-term capital gain of approximately $250,000.
Doug and Pam are leery of any investment that is not liquid and would prefer securities that can be converted into cash quickly due to a “bad experience” in a private REIT.  They consistently talk about cash flow and worry about how to fund their lifestyle expenses as a result of a volatile 2022 market.  They have estimated their annual expenses to be approximately $250,000 (after-tax).  
During the prospecting discussions, Doug repeatedly shared his fear about the current administration’s focus on potential tax increases and the potential of a recession. They were concerned that these potential events could jeopardize their future and the wealth that they’ve worked so hard to accumulate.
Doug and Pam are losing trust in their current financial advisor, although they have glowing reviews about their “Dividend Stock Strategy” and the support team at the brokerage firm.  They felt like their advisor was not addressing their questions or concerns and often received the “canned” company response.   After CLA reviewed the performance of this strategy, the strategy has underperformed compared to corresponding benchmarks.  
The prospect has their taxes prepared by a well-known regional CPA firm and is expecting a tax liability of $300,000 due to a large capital gain event in the current year. Doug and Pam shared this is a one-time event and is not expected to repeat in the future.  Although the prospect believes in paying their fair share of taxes, they would like to understand if they can minimize their tax liabilities. Historically, Doug and Pam are made aware of their tax liabilities when they pick up their tax return in April each year. 
The prospect is charitably inclined.  In the past year, Doug and Pam donated $20,000 in cash donations writing multiple checks to their church and other 501(c)(3) organizations. 
"""

In [15]:
role = """
You are a tax accountant and wealth advisor.
"""

In [16]:
comments = """
Adapt your response to this scenario. 
Compare and contrast different approaches. 
Provide specific strategies.
"""

In [17]:
prompt = """
Doug and Pam would like to reduce their income tax liability in the current year given the large capital gain, how can they do so?
"""

In [18]:
completion_2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": role},
        {
            "role": "user",
            "content": context + comments + prompt
        }
    ]
)

print(completion_2.choices[0].message.content)

Doug and Pam have a complex financial situation with multiple elements at play, including investment management, tax liability management, and charitable inclinations. Here are several strategies and their potential impacts to address Doug and Pam's concerns and objectives.

### Tax Mitigation Strategies

1. **Harvesting Tax Losses**: 
   - **Description**: This involves selling investments that are currently at a loss to offset the capital gains.
   - **Advantages**: Directly reduces the taxable capital gains by matching gains with losses.
   - **Considerations**: They need to identify securities that have declined in value. They must also be aware of the wash-sale rule, which prevents repurchasing the same or substantially identical securities within 30 days before or after the sale.

2. **Charitable Contributions**:
   - **Donor-Advised Fund (DAF)**:
     - **Description**: A DAF allows Doug and Pam to make a large charitable contribution upfront while continuing to advise on how th
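The prompt in this section is assembled from four pieces: a role (sent as the system message) plus context, comments, and the question (concatenated into the user message). A small helper can make that structure explicit and reusable. `build_messages` is a hypothetical function written for this sketch, not part of the OpenAI library, and the context string is a shortened placeholder.

```python
def build_messages(role, context, comments, prompt):
    """Assemble chat messages: the role goes in the system message,
    and the remaining pieces are joined into a single user message."""
    user_content = "\n\n".join(
        part.strip() for part in (context, comments, prompt) if part.strip()
    )
    return [
        {"role": "system", "content": role.strip()},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    role="You are a tax accountant and wealth advisor.",
    context="Doug and Pam are retired...",  # shortened placeholder for the full context
    comments="Provide specific strategies.",
    prompt="How can they reduce their income tax liability this year?",
)
for m in messages:
    print(m["role"], "->", m["content"][:60])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`. Keeping the pieces separate makes it easy to reuse the same context with a different question.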

# Passing Data

## Loading and Displaying Financial Data

In this section, we load a CSV file containing financial data into a **Pandas DataFrame** for analysis. The **Pandas** library is a powerful tool in Python for data manipulation and analysis.


In [19]:
import pandas as pd
df = pd.read_csv("financial_data.csv")
df

| | Quarter | Revenue | Cost of Goods Sold | Operating Expenses | Net Income |
|---|---|---|---|---|---|
| 0 | Q1 2023 | 500000 | 300000 | 120000 | 80000 |
| 1 | Q2 2023 | 550000 | 320000 | 150000 | 80000 |
| 2 | Q3 2023 | 600000 | 330000 | 170000 | 100000 |
| 3 | Q4 2023 | 620000 | 340000 | 200000 | 80000 |


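## Adding the Data to the Prompt

We embed the DataFrame directly in the prompt using an f-string, which inserts the table's printed text form.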

In [20]:
prompt_data = f"""
Analyze the following quarterly financial data for FY2023 and provide actionable recommendations to improve profitability. 

Highlight key trends, issues, and areas for improvement.

{df}
"""

In [21]:
prompt_data

'\nAnalyze the following quarterly financial data for FY2023 and provide actionable recommendations to improve profitability. \n\nHighlight key trends, issues, and areas for improvement.\n\n   Quarter  Revenue  Cost of Goods Sold  Operating Expenses  Net Income\n0  Q1 2023   500000              300000              120000       80000\n1  Q2 2023   550000              320000              150000       80000\n2  Q3 2023   600000              330000              170000      100000\n3  Q4 2023   620000              340000              200000       80000\n'

In [22]:
completion_3 = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": role},
        {
            "role": "user",
            "content": prompt_data
        }
    ]
)

print(completion_3.choices[0].message.content)

To analyze the given quarterly financial data for FY2023 and provide actionable recommendations for improving profitability, we'll first identify key trends and issues, followed by areas for improvement.

### Key Trends
1. **Revenue Growth**:
   - Revenue has grown consistently throughout the year (Q1: $500,000 to Q4: $620,000).

2. **Cost of Goods Sold (COGS)**:
   - COGS has also increased, but at a slower pace compared to revenue growth (Q1: $300,000 to Q4: $340,000).

3. **Operating Expenses**:
   - Operating expenses have increased significantly each quarter (Q1: $120,000 to Q4: $200,000).

4. **Net Income**:
   - Despite increased revenue, net income has remained relatively flat except for Q3 (Q1: $80,000, Q2: $80,000, Q3: $100,000, Q4: $80,000).

### Issues Identified
1. **Operating Expenses**:
   - A significant rise in operating expenses is eroding net income. This rise is disproportionate to the revenue growth.

2. **Flat Net Income**:
   - Net income has not shown the same i
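The trends the model reports can be checked directly against the data, which is a good habit before acting on generated analysis. The sketch below uses plain Python with the four rows copied from the table shown earlier; `add_margins` is a hypothetical helper, and the margin definitions are simple ratios to revenue chosen for this example.

```python
# Rows copied from financial_data.csv as displayed earlier in this section
rows = [
    {"Quarter": "Q1 2023", "Revenue": 500000, "COGS": 300000, "OpEx": 120000, "Net Income": 80000},
    {"Quarter": "Q2 2023", "Revenue": 550000, "COGS": 320000, "OpEx": 150000, "Net Income": 80000},
    {"Quarter": "Q3 2023", "Revenue": 600000, "COGS": 330000, "OpEx": 170000, "Net Income": 100000},
    {"Quarter": "Q4 2023", "Revenue": 620000, "COGS": 340000, "OpEx": 200000, "Net Income": 80000},
]


def add_margins(row):
    """Express each line item as a percentage of revenue."""
    revenue = row["Revenue"]
    return {
        "Quarter": row["Quarter"],
        "gross_margin_pct": round(100 * (revenue - row["COGS"]) / revenue, 1),
        "opex_pct": round(100 * row["OpEx"] / revenue, 1),
        "net_margin_pct": round(100 * row["Net Income"] / revenue, 1),
    }


margins = [add_margins(r) for r in rows]
for m in margins:
    print(m)
```

Operating expenses rise from 24.0% of revenue in Q1 to 32.3% in Q4, which supports the model's claim that they are growing faster than revenue.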

# Use Assistants

## Image Analysis Using an Assistant

In this section, we use an assistant to analyze an image for OSHA (Occupational Safety and Health Administration) compliance. The steps are:

1. **Uploading the Image**  
   The image (`osha.png`) is uploaded to OpenAI with `purpose='vision'`, which marks it for use as image input. This image will later be analyzed for potential OSHA violations.

2. **Creating an Assistant**  
   We create an assistant named "OSHA Inspector" using the GPT-4o model. The assistant is designed to analyze the uploaded image and identify OSHA violations. It is equipped with a code interpreter tool to help it process the image.

3. **Initiating the Analysis Request**  
   A thread is created where the assistant is instructed to identify all OSHA violations in the uploaded image, cite the relevant laws, and suggest better practices.

4. **Running the Analysis**  
   The run is started with `create_and_poll`, which waits until the run reaches a final state. If the run completed successfully, the code prints "Done!"; otherwise, it prints the run's status.

5. **Retrieving the Results**  
   Once the analysis is complete, we retrieve the results from the assistant and print the text-based findings to the console.

This workflow automates the process of identifying OSHA violations from an image, providing detailed feedback and suggestions based on OSHA standards.
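Step 4 relies on polling: repeatedly checking a run's status until it reaches a final state. The library's `create_and_poll` handles this for us, but the idea can be sketched generically. In the sketch below, `poll_until_terminal` is a hypothetical helper, the status-fetching function is injected so that no API call is needed, and the set of terminal statuses is an assumption based on the run statuses the Assistants API documents.

```python
import time

# Assumed terminal run statuses; "requires_action" is not terminal and needs separate handling
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def poll_until_terminal(fetch_status, interval_seconds=1.0, max_attempts=60,
                        sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or attempts run out."""
    status = None
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"run still {status!r} after {max_attempts} attempts")


# Stand-in for an API call: a scripted sequence of statuses
scripted = iter(["queued", "in_progress", "in_progress", "completed"])
final = poll_until_terminal(lambda: next(scripted), sleep=lambda seconds: None)
print(final)
```

Capping the number of attempts avoids waiting forever if a run gets stuck, and checking for every terminal status (not just `completed`) makes failures visible.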


In [23]:
# Upload the image so later cells can reference it by file ID
with open("osha.png", "rb") as image_file:
    osha_image = client.files.create(file=image_file, purpose="vision")

In [24]:
osha_assistant = client.beta.assistants.create(
  name="OSHA Inspector",
  description="You are an OSHA Inspector. Analyze the attached images and identify all OSHA violations.",
  model="gpt-4o",
  tools=[{"type": "code_interpreter"}],
  tool_resources={
    "code_interpreter": {
      "file_ids": [osha_image.id]
    }
  }
)

In [25]:
osha_thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Identify all OSHA violations, cite the law, & describe better practices for each."
        },
        {
          "type": "image_file",
          "image_file": {"file_id": osha_image.id}
        },
      ],
    }
  ]
)

In [26]:
osha_run = client.beta.threads.runs.create_and_poll(
  thread_id=osha_thread.id,
  assistant_id=osha_assistant.id,
  instructions="Please address the user as Jane Doe. The user has a premium account."
)

In [27]:
if osha_run.status == 'completed':
  print("Done!")
else:
  print(osha_run.status)

Done!


In [28]:
osha_messages = client.beta.threads.messages.list(
    thread_id=osha_thread.id
)

In [30]:
# osha_messages

In [31]:
for message in osha_messages.data:
    for content_block in message.content:
        if content_block.type == 'text':
            print(content_block.text.value)

Based on the image you've provided, there are a few clear Occupational Safety and Health Administration (OSHA) violations observed. Let’s identify them, cite relevant OSHA standards, and suggest better safety practices.

### OSHA Violations:

1. **Improper Use of Forklift for Lifting a Person**:
    - **Violation**: The person is being lifted using a forklift's forks, which is not designed or approved for lifting personnel.
    - **Relevant OSHA Standard**: OSHA standard 29 CFR 1910.178(m)(12) specifies that "under no circumstances shall a forklift be used to elevate employees unless proper lifting devices are attached."
    
2. **Lack of Personal Fall Arrest System**:
    - **Violation**: The individual lifted by the forklift is not wearing any personal fall protection equipment.
    - **Relevant OSHA Standard**: OSHA 29 CFR 1926.502(d) requires that personal fall arrest systems be used when workers are exposed to fall hazards over 4 feet in general industry workplaces and over 6 feet

## Data Analysis Using an Assistant

In this section, we use an assistant to analyze a CSV file and create visualizations from it. The steps are:

1. **Uploading the Data**  
   A CSV file containing data is uploaded to the system. This file will be used for analysis and visualization.

2. **Creating an Assistant**  
   We create an assistant named "Data Visualizer" using the GPT-4o model. This assistant is designed to analyze data from CSV files, identify trends, and generate relevant visualizations. Additionally, the assistant provides a brief summary of the trends observed.

3. **Initiating the Data Analysis Request**  
   A thread is created where the assistant is instructed to generate three data visualizations based on the trends found in the CSV file. The file is attached to this request, and the code interpreter tool is utilized to analyze it.

4. **Running the Analysis**  
   The run is started and polled until it finishes. If it completed successfully, the code retrieves the thread's messages and prints "Done!"; otherwise, it prints the run's status.

5. **Retrieving the Results**  
   Once the process is complete, we retrieve the messages generated by the assistant. The assistant's output includes both the data visualizations and a summary of the trends identified from the CSV file.

This workflow automates the process of analyzing data from a CSV file, generating visualizations, and summarizing key trends.
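Step 5 involves sorting the assistant's reply into text and image parts. The sketch below shows that logic on stand-in objects built with `SimpleNamespace`. These only imitate the attributes this notebook reads from the API's message objects (`.data`, `.content`, `.type`, `.text.value`, `.image_file.file_id`); `split_content` is a hypothetical helper and the file ID is made up.

```python
from types import SimpleNamespace as NS


def split_content(message_page):
    """Return (texts, image_file_ids) from a page of message-like objects."""
    texts, image_ids = [], []
    for message in message_page.data:
        for block in message.content:
            if block.type == "text":
                texts.append(block.text.value)
            elif block.type == "image_file":
                image_ids.append(block.image_file.file_id)
    return texts, image_ids


# Stand-in data shaped like an assistant reply; the file ID is made up
fake_page = NS(data=[
    NS(content=[
        NS(type="image_file", image_file=NS(file_id="file-example123")),
        NS(type="text", text=NS(value="Satisfaction drops as workload rises.")),
    ]),
])

texts, image_ids = split_content(fake_page)
print(texts)
print(image_ids)
```

Testing the extraction logic on stand-in objects like this is a cheap way to debug it without spending tokens on another assistant run.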


In [None]:
file = client.files.create(
  file=open("HR_comma_sep.csv", "rb"),
  purpose='assistants'
)

In [None]:
assistant = client.beta.assistants.create(
  name="Data visualizer",
  description="You are great at creating beautiful data visualizations. You analyze data present in .csv files, understand trends, and come up with data visualizations relevant to those trends. You also share a brief text summary of the trends observed.",
  model="gpt-4o",
  tools=[{"type": "code_interpreter"}],
  tool_resources={
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }
)

In [None]:
thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "Create 3 data visualizations based on the trends in this file.",
      "attachments": [
        {
          "file_id": file.id,
          "tools": [{"type": "code_interpreter"}]
        }
      ]
    }
  ]
)

In [None]:
run = client.beta.threads.runs.create_and_poll(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions="Please address the user as Jane Doe. The user has a premium account."
)

In [None]:
if run.status == 'completed': 
  messages = client.beta.threads.messages.list(
    thread_id=thread.id
  )
  print("Done!")
else:
  print(run.status)

In [None]:
# Extract all image file IDs from a SyncCursorPage[Message] object
def extract_file_ids(message_object):
    file_ids = []
    
    # Assuming the object has a 'data' attribute that contains the messages
    messages = message_object.data  # Access the 'data' attribute
    
    # Loop through all messages
    for message in messages:
        # Check if the message contains 'content'
        if hasattr(message, 'content'):
            # Loop through each content block
            for content_block in message.content:
                # Check if the content block is of type 'image_file'
                if content_block.type == 'image_file' and hasattr(content_block, 'image_file'):
                    # Extract the file_id and add it to the list
                    file_ids.append(content_block.image_file.file_id)
    
    return file_ids

# Now extract file IDs from the 'messages' object
file_ids = extract_file_ids(messages)

# Output the file IDs
print("Extracted file IDs:", file_ids)

In [None]:
file_ids

In [None]:
for file_id in file_ids:
    content = client.files.content(file_id)
    content_bytes = content.read()
    # Use a distinct handle name so the uploaded `file` object is not overwritten
    with open(f"{file_id}.png", "wb") as out_file:
        out_file.write(content_bytes)

In [None]:
from IPython.display import display
from PIL import Image

# Download each generated image, save it as a .png, and display it inline
for file_id in file_ids:
    content = client.files.content(file_id)
    content_bytes = content.read()

    # Save the image (a distinct handle name avoids overwriting the uploaded `file` object)
    file_path = f"{file_id}.png"
    with open(file_path, "wb") as out_file:
        out_file.write(content_bytes)

    # Display the saved image
    img = Image.open(file_path)
    display(img)
