## Installing the OpenAI library

The principles are very similar for other AI libraries (Mistral, etc.).

The OpenAI library is not currently included in Google Colab.
I would imagine Google prefers to include libraries for accessing its own products.
Those are Bard and Gemini (as of December 2023).

* Google did most of the original research on Large Language Models over the last 15 years but has not capitalized on it.
* Instead, companies such as OpenAI have overtaken Google in mindshare and usability.
* It is a classic case of the Innovator's Dilemma: Google was making too much money from Search and was afraid to hurt that product; now it has little choice.
(There are plenty of analogies elsewhere. For example, BMW's early electric car looked very different from a standard BMW because the company was afraid of hurting its main product; now BMW has realized that it has to catch up with companies such as Tesla.)

OpenAI is not immune either: startups such as Mistral from France look very promising and may overtake OpenAI.
Competition in this field is intense, and OpenAI could use some competition, because OpenAI is not actually open...

So, pragmatically, for now we will use OpenAI for our LLM tasks.

In [22]:
!pip install openai



In [23]:
import os # standard Python library for OS-related tasks
import openai # this is the openai library we just installed

## Getting OpenAI API key

It used to be that you could get an OpenAI key for free, with some credits that were valid for at least a month. You may still get one month or $10 of free credits.

In [24]:
from getpass import getpass # part of the Python standard library
secret = getpass('Enter the secret API key for OpenAI value: ') # very similar to the input function, but the typed text stays hidden

Enter the secret API key for OpenAI value: ··········


In [25]:

# openai.api_key = "use_your_own_key!"  # never share your private API keys with the world! Read them from the environment or a private text file
# or use getpass and copy-paste (like I did; not very convenient but good for one-time use)
# alternative: store API keys on your Google Drive; personally I do not recommend it
# there is also something called Google Secrets; again, I personally do not trust it, but your mileage may vary
openai.api_key = secret  # never share your private API keys with the world! Read them from the environment or a private text file
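
The comments above recommend reading the key from the environment rather than typing it into the notebook. A minimal sketch of that idea follows; the helper name `load_api_key` and its `ask` parameter are made up for illustration, while `OPENAI_API_KEY` is the variable name that appears in the client example in this notebook.

```python
import os
from getpass import getpass


def load_api_key(env_var="OPENAI_API_KEY", ask=getpass):
    """Return the API key from the environment, or prompt for it as a fallback."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Nothing in the environment: fall back to a hidden interactive prompt.
        key = ask(f"Enter the value for {env_var}: ").strip()
    if not key:
        raise ValueError(f"No API key provided for {env_var}")
    return key
```

With this helper, `secret = load_api_key()` could replace the `getpass` cell, and the key still never appears in the notebook itself.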

## Accessing the API

https://github.com/openai/openai-python documents the changes.

Since this is cutting-edge research, the API changes quite often.

Expect things to stabilize in a few years.
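
Because the interface changes between releases, it can help to check which version of the library is installed before running older notebooks. The sketch below uses only the standard library; the helper names are made up for illustration, and the assumption is that major version 1 and above corresponds to the client-based interface used in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="openai"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def uses_v1_interface(version_string):
    """True when the major version is 1 or higher (the client-based interface)."""
    major = int(version_string.split(".")[0])
    return major >= 1


found = installed_version()
if found is None:
    print("openai is not installed; run: pip install openai")
else:
    style = "client-based (1.x)" if uses_v1_interface(found) else "legacy (0.x)"
    print(f"openai {found} uses the {style} interface")
```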

In [26]:
from openai import OpenAI
client = OpenAI(
    # This is the default and can be omitted
    # api_key=os.environ.get("OPENAI_API_KEY"),
    api_key=secret, # instead of reading my API key from the environment, I use the key I pasted into secret
    # again, secret is just a string, but do not write the key directly here!!
)

In [27]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test but translate into Latvian", # so this is your prompt
        }
    ],
    model="gpt-3.5-turbo", # this is among the cheapest of the models similar to the free version on ChatGPT
    # there are more options but we will stick with basics
)
# so with this example we made a call to OpenAI using the API key

In [29]:
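# we can inspect everything the model returned
# note: without parentheses, .json only shows the bound method together with the object's repr (see the output below)
# to get an actual JSON string, call a method instead, e.g. chat_completion.model_dump_json() in openai 1.x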
chat_completion.json

<bound method BaseModel.json of ChatCompletion(id='chatcmpl-8YbN0rAZoGQbb1kjO6y5crwo04T5x', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Šis ir tests.', role='assistant', function_call=None, tool_calls=None))], created=1703257494, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=6, prompt_tokens=18, total_tokens=24))>

In [28]:
# now to get just the answer in text form
chat_completion.choices[0].message.content

'Šis ir tests.'

In [30]:
len(chat_completion.choices) # how many choices do we have?

1
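
There is only one choice above because one answer per request is the default. The sketch below asks for several alternatives through the `n` parameter of the chat completions call; the helper name `get_all_choices` is made up for illustration, and it expects the `client` object created earlier to be passed in. Keep in mind that every extra choice adds to the token usage.

```python
def get_all_choices(client, prompt, n=3, model="gpt-3.5-turbo"):
    """Ask for n alternative completions and return their texts as a list."""
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
        n=n,  # number of alternative answers to generate for the same prompt
    )
    return [choice.message.content for choice in chat_completion.choices]
```

A call such as `get_all_choices(client, "Say this is a test but translate into Latvian")` should then return a list with three strings, which may or may not differ from each other.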

### Old API call - for historical purposes - changed in November 2023

In [None]:
response = openai.Completion.create(
  engine="davinci",
  prompt="Social media post: \"That new Spider Man movie stinks to high heaven\"\nSentiment (positive, neutral, negative):",
  temperature=0,
  max_tokens=1,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

APIRemovedInV1: ignored

In [None]:
print(response)

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " Negative"
    }
  ],
  "created": 1639577707,
  "id": "cmpl-4FPLfIQvBLsWX6ewmmTmkHarFKO8W",
  "model": "davinci:2020-05-03",
  "object": "text_completion"
}


In [None]:
response = openai.Completion.create(
  engine="davinci",
  prompt="Social media post: \"That new Spider Man movie is decent\"\nSentiment (positive, neutral, negative):",
  temperature=0,
  max_tokens=1,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
print(response["choices"][0]["text"])

 positive


In [None]:
print(response)

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " positive"
    }
  ],
  "created": 1639577843,
  "id": "cmpl-4FPNrGBWvwFN76Eeyb5oIQHufsZ7J",
  "model": "davinci:2020-05-03",
  "object": "text_completion"
}


In [None]:
response = openai.Completion.create(
  engine="davinci",
  prompt="Social media post: \"The first film had a much better balance between story and action. It seemed that this film had tons of unnecessary exposition (story really starts around 40 minutes into the movie), and the action was drawn out with lengthy CGI shots that did nothing to showcase the actors' talents, nothing to advance the story, and at provided little spectacle.\"\nSentiment (positive, neutral, negative):",
  temperature=0,
  max_tokens=1,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
print(response["choices"][0]["text"])

APIRemovedInV1: ignored

In [31]:
prompt="Social media post: \"That new Spider Man movie is decent\"\nSentiment (positive, neutral, negative):"
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt, # so this is your prompt
        }
    ],
    model="gpt-3.5-turbo", # this is among the cheapest of the models similar to the free version on ChatGPT
    # there are more options but we will stick with basics
)
chat_completion.choices[0].message.content

'positive'

## Automating Sentiment Analysis

To avoid writing all the boilerplate code by hand, I will write a function that wraps the required code, so I can simply call it whenever I need sentiment analysis.

In [32]:
def getSentiment(prompt, client=client, sentiments=("positive","neutral","negative"), model="gpt-3.5-turbo", max_prompt=200):
    sentiment_text = ", ".join(sentiments)  # join all the sentiments into one comma-separated string
    prompt=f"Social media post: \"{prompt[:max_prompt]}\"\nSentiment ({sentiment_text}):"  # newline before Sentiment, as in the hand-written prompt above
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content

In [36]:
getSentiment("I really like bread and circuses")
# depending on temperature setting we might get different answers

'positive'

In [37]:
getSentiment("Where do I begin? This is a brand new 4K scan from the original negative of the movie with an added HDR10 & Dolby Vision HDR grading, which looks fantastic!")

'positive'

In [41]:
getSentiment("Man patik alus") # I like beer in Latvian

'neutral'

In [44]:
getSentiment("Man nepatīk slidenas ielas") #I do not like slippery streets

'negative'

In [45]:
# using this custom function you can pass in your own sentiments - for example in Latvian
getSentiment("Man nepatīk slidenas ielas", sentiments=["pozitīvs", "neitrāls", "negatīvs"])

'negatīvs'

In [48]:
getSentiment("Man patik alus", sentiments=["pozitīvs", "neitrāls", "negatīvs"])

'Pozitīvs'
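
Note that the last answer came back capitalized even though the requested label was lowercase. Before counting or comparing answers, it may be worth mapping them back onto the allowed labels. The following is a minimal sketch; the helper name `normalize_label` is made up for illustration and is not part of the OpenAI library.

```python
def normalize_label(raw_answer, allowed=("positive", "neutral", "negative")):
    """Map a raw model answer onto one of the allowed labels, or None if it fits none."""
    cleaned = raw_answer.strip().strip(".!\"'").casefold()
    for label in allowed:
        if cleaned == label.casefold():
            return label
    return None  # unexpected answer: leave it to the caller to retry or flag


print(normalize_label("Pozitīvs", allowed=("pozitīvs", "neitrāls", "negatīvs")))
print(normalize_label(" Positive"))
```

Returning `None` for anything outside the list makes unexpected answers easy to spot instead of silently treating them as a valid sentiment.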

## Prompt creation

So the basic idea is that we provide a sort of answer key when we need sentiment analysis: we end the prompt with the possible answers, and the LLM gives us one of them.

In [49]:
# so this is our own ChatGPT basically, except we can adjust more parameters
def getAnswer(prompt, client=client, model="gpt-3.5-turbo", max_prompt=500):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt[:max_prompt],
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content

In [50]:
getAnswer("What is the capital of Latvia?")

'The capital of Latvia is Riga.'

In [51]:
# of course, LLMs have an infamous tendency to hallucinate, especially when the model does not have the full answer.
# One possible test is to ask a question whose answer lies in between facts the model already knows
getAnswer("Who was president of Latvia in 1926?") # the key point: 1926 was an unremarkable year in which the President of Latvia did nothing special

'In 1926, the president of Latvia was Jānis Čakste.'

In [52]:
getAnswer("Who was president of Latvia in 1926? Write a short paragraph on this president") # again, 1926 was an unremarkable year for the President of Latvia

"In 1926, the President of Latvia was Jānis Čakste. He served as the first President of Latvia from 1922 until his death in 1927. Čakste was a prominent lawyer and politician who played a crucial role in the establishment of an independent Latvia. As a representative of the Latvian Farmers' Union, he was elected as the Speaker of the Constitutional Assembly in 1920, which formulated the first Constitution of Latvia. During his presidency, Čakste emphasized maintaining constitutional governance, promoting social justice, and improving relations with other countries. However, his presidency was cut short by his untimely passing in 1927, leaving behind a legacy of democratic leadership and dedication to the Latvian nation."

### getAnswer as an analogue of the ChatGPT interface

So now, if you had, say, 100 documents to analyse, you could write a loop and call this function 100 times.
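
Such a loop could look roughly like the sketch below. The helper name `analyse_documents` is made up for illustration; it takes any function that accepts a text and returns an answer, so `getAnswer` can be passed in directly. One failed call (for example, a network problem) is recorded as `None` instead of stopping the whole run.

```python
def analyse_documents(documents, analyse):
    """Apply analyse() to every document and keep going if one call fails."""
    results = []
    for index, text in enumerate(documents):
        try:
            answer = analyse(text)
        except Exception as error:  # e.g. a network problem or a rate limit
            print(f"document {index} failed: {error}")
            answer = None
        results.append({"index": index, "text": text, "answer": answer})
    return results
```

For example, `analyse_documents(my_documents, getAnswer)` would return one dictionary per document, where `my_documents` stands for a hypothetical list of strings.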

In [53]:
def getMovieEmoji(movie_title,client=client, model="gpt-3.5-turbo"):
    prompt=f"""Back to Future: 👨👴🚗🕒
    Batman: 🤵🦇
    Transformers: 🚗🤖
    Wonder Woman: 👸🏻👸🏼👸🏽👸🏾👸🏿
    Winnie the Pooh: 🐻🐼🐻
    The Godfather: 👨👩👧🕵🏻‍♂️👲💥
    Game of Thrones: 🏹🗡🗡🏹
    {movie_title}: """
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content
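
The prompt in this function is a few-shot prompt: several solved examples followed by an open line for the model to complete. If the examples live in a dictionary, the prompt can be assembled programmatically. This is only a sketch; the helper name `build_few_shot_prompt` is made up for illustration.

```python
def build_few_shot_prompt(examples, query):
    """Turn {title: answer} examples into one 'title: answer' line each, then add the query."""
    lines = [f"{title}: {answer}" for title, answer in examples.items()]
    lines.append(f"{query}: ")  # left open so the model fills in the answer
    return "\n".join(lines)


examples = {
    "Batman": "🤵🦇",
    "Transformers": "🚗🤖",
}
print(build_few_shot_prompt(examples, "Frozen 2"))
```

Keeping the examples in a data structure makes it easy to add, remove, or swap examples without editing a long string by hand.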

In [54]:
print(getMovieEmoji("The Bourne Conspiracy"))

🕶🔫🏃🏻‍♂️


In [55]:
print(getMovieEmoji("Top Gun: Maverick"))

🛩🚀🔥🎖


In [57]:
print(getMovieEmoji("Frozen 2"))

👸🏻👸🏼❄️🌬️


## Key takeaway - answers are not deterministic

Notice how multiple calls can return different answers. This is due to the temperature setting, which adds slight randomness to how each answer is sampled.

The model stays the same, but there is an element of randomness present.

So when you see some amazing LLM examples in action, you have to ask how many tries it took to generate them. Case in point: the recent Google Gemini demo presentation.
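
To get an intuition for what temperature does, here is a toy sketch of the usual textbook formulation: the model's raw scores are divided by the temperature before being turned into probabilities. The scores below are made up, and how OpenAI implements sampling internally is not something this notebook can confirm.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - highest) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (0.2, 0.7, 1.5):
    probabilities = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

At a low temperature almost all of the probability goes to the top-scoring token, so repeated calls tend to agree; at a higher temperature the alternatives get a real chance of being picked.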

In [64]:
# https://www.esrb.org/
def getGameRating(movie_text,client=client, model="gpt-3.5-turbo"):
    prompt=f"""Provide an ESRB rating for the following text:
    {movie_text}
    ESRB rating:"""
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
        max_tokens=60,
        temperature=0.7 # the OpenAI API accepts temperature values from 0 to 2; higher means more random
    )
    return chat_completion.choices[0].message.content


In [65]:
getGameRating("The game opens with a screen in a dark and stormy night. Five gamblers sit around a fire in a dark forest.")
# so one of the issues with LLMs is that you need to provide enough context

'The ESRB rating for the given text would likely be "Teen" (T) due to the dark and potentially intense atmosphere depicted in the opening scene.'

In [67]:
def getMovieRating(movie_text,client=client, model="gpt-3.5-turbo"):
    prompt=f"""Provide an MPA rating for the movie based on the following description:
    {movie_text}
    MPA rating:"""
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
        max_tokens=60,
        temperature=0.7 # the OpenAI API accepts temperature values from 0 to 2; higher means more random
    )
    return chat_completion.choices[0].message.content

In [69]:
getMovieRating("They say all happy families are alike but all unhappy families are different in their own way")

'PG-13'

In [70]:
def getStudyNotes(subject, client=client, model="gpt-3.5-turbo"):
    prompt=f"What are some key points I should know when studying {subject}? Provide at least five key points\n\n1."
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
        # using default for now, you can play around with some of the parameters
        # temperature=1,
        # max_tokens=64,
        # top_p=1.0,
        # frequency_penalty=0.0,
        # presence_penalty=0.0
    )
    return chat_completion.choices[0].message.content

In [71]:
print(getStudyNotes("Riga"))

Riga is the capital city of Latvia and is located in the Baltic region of Northern Europe.
2. With a population of around 700,000, Riga is the largest city in the Baltic states and the third-largest in the Baltic Sea region.
3. Riga is known for its well-preserved Art Nouveau architecture, which is recognized by UNESCO as a World Heritage Site.
4. The city has a rich history, with influences from various cultures, including German, Russian, and Swedish, due to its strategic location on the Baltic Sea.
5. Riga is a major economic and financial center in the region, with a strong service sector, manufacturing industry, and a thriving startup scene.


In [72]:
# I am using print because this answer includes newlines
study_notes_on_dd = getStudyNotes("Digital Discourse")
print(study_notes_on_dd)
# note how 1. is missing from the answer?
# why?
# because I already provide 1. at the end of my prompt
# remember LLMs are just word prediction machines (with some useful side effects)

Digital Discourse refers to the communication and interaction that takes place online or through digital platforms such as social media, chat rooms, and forums.

2. It encompasses various forms of communication, including written text, images, videos, emojis, and symbols, that are used to convey messages and engage in conversations online.

3. Digital Discourse is often characterized by its informality, speed, and brevity, with users frequently using abbreviations, acronyms, and slang to express themselves quickly and concisely.

4. It is influenced by the context in which it takes place, including the platform being used, the intended audience, and the social norms and conventions of that online community.

5. Digital Discourse is dynamic and constantly evolving, with new words, phrases, and expressions being coined regularly, and cultural references and memes playing a significant role in shaping online conversations.
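
Since the prompt already ends with `1.`, the answer starts right after that marker. If the saved notes should read as a complete numbered list, the marker can be put back. A minimal sketch, with a made-up helper name `restore_primed_start`:

```python
def restore_primed_start(completion, primer="1."):
    """Put back the list marker that was given to the model at the end of the prompt."""
    text = completion.lstrip()
    if text.startswith(primer):
        return text  # the model repeated the marker itself, nothing to add
    return f"{primer} {text}"
```

For example, `restore_primed_start(study_notes_on_dd)` would return the notes with `1. ` in front of the first point.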


In [73]:
# now I can save my notes to a text file
with open("study_notes.txt", mode="w", encoding="utf-8") as f:
    f.write(study_notes_on_dd)

## Conclusion on use of LLMs in Digital Discourse

So LLMs can aid discourse analysis in the following areas:

* Automated Text Analysis
* Sentiment Analysis
* Topic Modeling
* Translation
* Context Analysis
* Cross-cultural Analysis
* Summarization
* Bias detection

In [None]:
notes = [getStudyNotes("Sentiment Analysis") for _ in range(5)]  # I ran the same query 5 times, so the text completion will be different each time
print(notes)

[' Keywords and phrases can be negative even when they don’t seem to be\n2. Popular sentiment analysis is not always the right analysis\n3. Slang and capitals can make sentences seem negative', ' Semantria tracks sentiment globally by applying sentiment analysis models to content expressed in multiple languages.\n2. Semantria sentiment analysis models come in three different methods which are Query Based, Feed Based, and Sentiment Match.\n3. With Query Based sentiment analysis you can input a phrase or sentence for Semantria', " Find out what is interesting about your business\n2. Focus on your buying process\n\n3. Reduce capital inventory\n4. Have the temperament of an artists\n\nWhat key points do you think I should know for this topic?\n\nYou shouldn't study how to do sentiment analysis or focus on this topic if", ' Authors generally use sentiment words to convey approval versus a negative sentiment\n a group holds about a person, company, issue, or life events.\n2. Automated sentim

In [None]:
digital_discourse_notes = [getStudyNotes("Digital Discourse") for _ in range(5)]
for note in digital_discourse_notes:
  print(note)
  print("="*40)

 There is a relationship between the word used and the identity of the individual speaker.

2. Gender can also change depending on the audience.

3. Tone serves as a crucial component for comprehension.

4. Slang and jargon differ from culture to culture and language to language

5.
Doesn't exist in a bubble so we need to pay attention to the framing and the understanding and assumptions around the discourse

2.It can be participatory and evolving

3.We need to study the social and political dimensions

4.Water cooler talk

5. A means of both self
 Digital Discourse includes the many locations of the discussion, such as a blog, a message board, a chatroom, a wiki, a MySpace page, a Facebook page, a YouTube video, etc.
A) Different people have different digital practices when it comes to culture

2. There are "four
 Textual and Visual 2. Readers and Writers 3. Poetic Knowledge and Poetic Practice 4. Technical and Cultural 5. Historical and Scientific 6. Analysis and Interdisciplinarity


In [None]:
def getEssayOutline(subject):
    # legacy call from before the November 2023 API change; with openai 1.x it raises APIRemovedInV1
    response = openai.Completion.create(
        engine="davinci",
        prompt=f"Create an outline for an essay about {subject}:\n\nI: Introduction",
        temperature=0.7,
        max_tokens=60,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response["choices"][0]["text"]

In [None]:
print(getEssayOutline("Julius Cesar"))



II: Julius Cesar

III: Family background

IIII: Early life

IIII: Civil service

V: Cesar and Crassus

VI: Cesar and Pompey

VII: Cesar and the provinces

VIII:


In [None]:
print(getEssayOutline("Tourism in Latvia"))

.

- Tourism in Latvia.

II: What are the main development directions of the tourism business in Latvia?

- Growth of the tourism industry.

- The tourism industry as one of the most dynamic economic sectors in Latvia.

- Importance of the tourism industry


In [None]:
def getHorrorStory(topic):
    # legacy call from before the November 2023 API change; with openai 1.x it raises APIRemovedInV1
    response = openai.Completion.create(
        engine="davinci",
        prompt=f"Topic: Breakfast\nTwo-Sentence Horror Story: He always stops crying when I pour the milk on his cereal. I just have to remember not to let him see his face on the carton.\n###\nTopic: {topic}\nTwo-Sentence Horror Story:",
        temperature=0.5,
        max_tokens=60,
        top_p=1.0,
        frequency_penalty=0.5,
        presence_penalty=0.0,
        stop=["###"]  # stop generating as soon as the separator appears
    )
    return response["choices"][0]["text"]

In [None]:
print(getHorrorStory("snow"))

 I was walking home from work when I realized that I was the only one on the sidewalk. A few minutes later, I saw a snowplow coming down the road. I waved to get its attention, but it just kept on going.
I don't know what happened to everyone else.


In [None]:
print(getHorrorStory("Christmas"))

 The real Santa Claus was too fat to fit down the chimney, so he left a note saying he'd be back next year.
Two-Sentence Horror Story: I think Santa Claus is going to kill me.



In [None]:
# once I have a list of texts to analyze
# I can simply loop over them and call my getSentiment function for each text/document
my_tweets = ["In October main exports partners were Lithuania, Estonia, Germany and United Kingdom. The main import partners were Lithuania, Russian Federation, Poland and Germany."
,"In October 2021 the foreign trade turnover of Latvia amounted to € 3.39 billion, which at current prices was 23.5% larger than a year ago.Exports value of goods was ⬆️17.8% higher, but imports value of goods ⬆️28.8% higher. "]

my_tweet_sentiments = [getSentiment(tweet) for tweet in my_tweets]
my_tweet_sentiments

[' Positive', ' 0']
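
Once every text has a label, the pairs can be saved for later analysis. The sketch below writes them as CSV using only the standard library; the helper name `sentiments_to_csv` is made up for illustration. It also strips the leading space that some of the answers above carry.

```python
import csv
import io


def sentiments_to_csv(texts, sentiments):
    """Return CSV text with one (text, sentiment) row per analysed document."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "sentiment"])
    for text, sentiment in zip(texts, sentiments):
        writer.writerow([text, sentiment.strip()])
    return buffer.getvalue()
```

The result of `sentiments_to_csv(my_tweets, my_tweet_sentiments)` could then be written to a file opened with `encoding="utf-8"` and `newline=""`, much like the study notes earlier.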