## Installing the OpenAI library

The principles are very similar for other AI libraries (Mistral, etc.).

The OpenAI library is not currently included in Google Colab.
I would imagine Google would rather include libraries for accessing its own products.
Those are Bard and Gemini (as of December 2023).

* Google did most of the original research on Large Language Models in the last 15 years - but has not capitalized on it
* Instead, companies such as OpenAI have overtaken Google in mind-share and usability
* It is a classic case of the Innovator's Dilemma - Google was making too much money on Search and was afraid to hurt that product; now it has little choice
(there are lots of analogies elsewhere: for example, BMW's early electric car looked very different from a standard BMW because the company was afraid of hurting its main product; now BMW has realized that it has to catch up with companies such as Tesla)

OpenAI is not immune: there are startups, such as Mistral from France, that are very promising and may overtake OpenAI.
Competition in this field is high - and OpenAI could use some competition, because OpenAI is not actually open...

So, pragmatically, for now we will be using OpenAI for our LLM tasks.

In [7]:
!pip install openai



In [8]:
import os # standard Python library for os related tasks
import openai # so this is openai library we just installed

## Getting an OpenAI API key

It used to be that you could get an OpenAI key for free for at least a month, along with some credits. You may still get one month or $10 of free credits; check the current terms when you sign up.

In [4]:
from google.colab import userdata # API to access user secrets

# for Google Colab users the best way to save Secrets is to use Secrets storage provided by Google Colab
# let's get secret key by name from Secrets storage: OPENAI_DH

secret = userdata.get("OPENAI_DH")
# how long is secret?
print(f"Secret is {len(secret)} characters long")

Secret is 164 characters long


In [5]:
# alternative: enter the secret by hand
if not secret:
  from getpass import getpass # part of Python's standard library
  secret = getpass('Enter the secret API key for OpenAI value: ') # very similar to the input function, but the typed text is hidden
else:
  print("I already have secret key")

I already have secret key


In [9]:

# openai.api_key = "use_your_own_key!"  # never share your private API keys with the world! read them from the environment or a private text file
# or use getpass and copy-paste (like I did - not very convenient but good for one-time use)
# alternatives: store API keys on your Google Drive - personally I do not recommend it
# there is also something called Google Secrets, again I personally do not trust it, but your mileage may vary
openai.api_key = secret  # never share your private API keys with the world! read them from the environment or a private text file
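
The comments above mention reading the key from the environment or from a private text file. Here is a minimal sketch of that idea. The helper name `load_api_key` and the `key_file` parameter are my own illustrative choices, not part of the OpenAI library; `OPENAI_API_KEY` is the environment variable the client reads by default.

```python
import os
from pathlib import Path


def load_api_key(env_var="OPENAI_API_KEY", key_file=None):
    """Return an API key from an environment variable or a private text file.

    load_api_key and key_file are hypothetical names used for illustration only.
    Returns None when no key is found, so the caller can fall back to getpass.
    """
    # 1. try the environment variable first
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # 2. then try a private text file that contains only the key
    if key_file is not None:
        path = Path(key_file).expanduser()
        if path.is_file():
            key = path.read_text(encoding="utf-8").strip()
            if key:
                return key
    return None
```

If this returns `None`, you can still fall back to `getpass` as shown earlier. Keep any key file out of version control and shared folders.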

## Accessing the API

https://github.com/openai/openai-python - documents the changes

Since this is cutting edge research, API changes quite often.

Expect things to stabilize in a few years.
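
Because the API changes often, it helps to know which library version a notebook actually ran with. A small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package as a string, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# record the version next to your results so you can reproduce them later
print("openai version:", installed_version("openai"))
```

If code from an older tutorial fails, comparing this version with the one the tutorial used is a good first check.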

In [10]:
from openai import OpenAI
client = OpenAI(
    # This is the default and can be omitted
    # api_key=os.environ.get("OPENAI_API_KEY"),
    api_key=secret, # instead of reading my API key from the OS environment, I use the key stored in secret
    # again, secret is just a string, but do not write the key directly here!!
)

In [11]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test but translate into Latvian", # so this is your prompt
        }
    ],
    model="gpt-3.5-turbo", # this is among the cheapest of the models similar to the free version on ChatGPT
    # there are more options but we will stick with basics
)
# so with this example we made a call to OpenAI using the API key

In [14]:
# we can get the JSON response of everything the model provided
# (model_dump_json replaces the deprecated json method)
chat_completion.model_dump_json()


'{"id":"chatcmpl-Co7NUSYYdgDa634PpjNMolpfSQeJU","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Šis ir tests.","refusal":null,"role":"assistant","annotations":[],"audio":null,"function_call":null,"tool_calls":null}}],"created":1766060796,"model":"gpt-3.5-turbo-0125","object":"chat.completion","service_tier":"default","system_fingerprint":null,"usage":{"completion_tokens":6,"prompt_tokens":18,"total_tokens":24,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}}}'

In [13]:
# now to get just the answer in text form
chat_completion.choices[0].message.content

'Šis ir tests.'

In [15]:
len(chat_completion.choices) # how many choices do we have?

1

## Automating Sentiment Analysis

To avoid writing all the boilerplate code by hand, I will write a function that combines all the required code, and I can call that function whenever I need sentiment analysis.

In [17]:
def getSentiment(prompt, client=client, sentiments=("positive","neutral","negative"), model="gpt-3.5-turbo", max_prompt=200):
    sentiment_text = ",".join(sentiments)  # I add all the sentiments in a string separated by comma
    prompt=f"Social media post: \"{prompt[:max_prompt]}\"\nSentiment ({sentiment_text}):"  # newline separates the post from the answer key
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content

In [18]:
getSentiment("I really like bread and circuses")
# depending on temperature setting we might get different answers

'Neutral'

In [19]:
getSentiment("Where do I begin? This is a brand new 4K scan from the original negative of the movie with an added HDR10 & Dolby Vision HDR grading, which looks fantastic!")

'Positive'

In [23]:
getSentiment("Man patik alus") # I like beer in Latvian

'Neutral'

In [24]:
getSentiment("Man nepatīk slidenas ielas") #I do not like slippery streets

'negative'

In [25]:
# using this custom function you can pass in your own sentiments - for example in Latvian
getSentiment("Man nepatīk slidenas ielas", sentiments=["pozitīvs", "neitrāls", "negatīvs"])

'Negatīvs'

In [26]:
getSentiment("Man patik alus", sentiments=["pozitīvs", "neitrāls", "negatīvs"])

'pozitīvs'

## Prompt creation

So the basic idea is that we provide a sort of answer key when we need sentiment analysis. We ended our prompt with the possible answers and the LLM gave us one of them.
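
The replies above come back with varying capitalization ('Neutral', 'negative', 'Negatīvs'), so before counting them it can help to map each reply back onto the answer key. A minimal sketch; `normalize_label` is a hypothetical helper of mine, not part of any library.

```python
def normalize_label(reply, labels=("positive", "neutral", "negative")):
    """Map a raw model reply onto one of the allowed labels, or None if it fits none."""
    # remove surrounding whitespace and simple punctuation, ignore case
    cleaned = reply.strip().strip(".!\"'").casefold()
    for label in labels:
        if cleaned == label.casefold():
            return label
    # fall back: accept a longer reply only if it mentions exactly one label
    found = [label for label in labels if label.casefold() in cleaned]
    return found[0] if len(found) == 1 else None


print(normalize_label(" Positive"))
print(normalize_label("Negatīvs", labels=["pozitīvs", "neitrāls", "negatīvs"]))
print(normalize_label(" 0"))  # not in the answer key, so we get None
```

Replies that map to `None` are worth looking at by hand, or re-asking.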

In [27]:
# so this is our own ChatGPT basically, except we can adjust more parameters
def getAnswer(prompt, client=client, model="gpt-3.5-turbo", max_prompt=500):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt[:max_prompt],
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content

In [28]:
getAnswer("What is the capital of Latvia?")

'Riga'

In [29]:
# of course LLMs have an infamous tendency to hallucinate, especially when the model does not have the full answer.
# One possible test is to ask a question whose answer lies between pieces of information it already knows
getAnswer("Who was president of Latvia in 1926?") # key being that 1926 was a year when the President of Latvia did not do anything special
# check the reply against the historical record: Jānis Čakste held the office from 1922 until his death in 1927

'Gustavs Zemgals was the president of Latvia in 1926.'

In [30]:
getAnswer("Who was president of Latvia in 1926? Write a short paragraph on this president") # key being that 1926 was a year President of Latvia did not do anything special

"Gustavs Zemgals was the president of Latvia in 1926. A prominent lawyer and politician, Zemgals was known for his commitment to upholding the principles of democracy and the rule of law. During his presidency, he worked to strengthen Latvia's economy, improve social welfare, and promote diplomatic relations with other countries. Zemgals was a respected leader who strived to ensure stability and prosperity for the people of Latvia during a time of political uncertainty and economic challenges."

### getAnswer as analogous to ChatGPT interface

So now if you had, say, 100 documents to analyse, you could write a loop and call this function 100 times.
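
A rough sketch of such a loop. The helper `process_documents` and its parameters are hypothetical; `ask` stands for any function that takes a text and returns an answer, such as `getAnswer` above. For an offline demonstration I use a stand-in function instead of a real API call.

```python
import time


def process_documents(documents, ask, pause_seconds=0.0):
    """Call ask(text) for every document and return a list of (text, answer) pairs."""
    results = []
    for number, text in enumerate(documents, start=1):
        try:
            answer = ask(text)
        except Exception as error:  # one failed call should not stop the whole batch
            print(f"Document {number} failed: {error}")
            answer = None
        results.append((text, answer))
        if pause_seconds:
            time.sleep(pause_seconds)  # optional pause between API calls
    return results


def fake_ask(text):
    """Offline stand-in for getAnswer, just so the loop can be tried without a key."""
    return text.upper()


for text, answer in process_documents(["first document", "second document"], fake_ask):
    print(text, "->", answer)
```

With a real key you would pass `getAnswer` (or `getSentiment`) instead of `fake_ask`. Remember that every call costs tokens, so try a handful of documents first.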

In [31]:
def getMovieEmoji(movie_title,client=client, model="gpt-3.5-turbo"):
    prompt=f"""Back to Future: 👨👴🚗🕒
    Batman: 🤵🦇
    Transformers: 🚗🤖
    Wonder Woman: 👸🏻👸🏼👸🏽👸🏾👸🏿
    Winnie the Pooh: 🐻🐼🐻
    The Godfather: 👨👩👧🕵🏻‍♂️👲💥
    Game of Thrones: 🏹🗡🗡🏹
    {movie_title}: """
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content

In [32]:
print(getMovieEmoji("The Bourne Conspiracy"))

🕵️‍♂️💼🔫🕵️‍♀️🏃‍♂️🔍


In [33]:
print(getMovieEmoji("Top Gun: Maverick"))

🛩️👨‍✈️🔥


In [34]:
print(getMovieEmoji("Frozen 2"))

👧❄️👸🧊🦌🌬️🏔️


## Key takeaway - answers are not deterministic

Notice how multiple calls can return different answers. This is due to the temperature setting, which adds randomness to how each next token is picked.

The model stays the same, but there is an element of slight randomness present.

So when you see some amazing LLM examples in action, you have to ask how many tries it took to generate them. Case in point: the recent Google Gemini demo presentation.
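
To get a feeling for what temperature does, here is a toy sketch of the usual textbook description: scores for candidate tokens are divided by the temperature before being turned into probabilities, and the next token is then sampled from them. The tokens and scores below are made up for illustration, and how a hosted model samples internally is not something this notebook can confirm.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in scores]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


tokens = ["positive", "neutral", "negative"]
scores = [2.0, 1.0, 0.1]  # made-up scores, not taken from any real model

for temperature in (0.2, 1.0, 2.0):
    probabilities = softmax_with_temperature(scores, temperature)
    picks = random.choices(tokens, weights=probabilities, k=10)
    print(temperature, [round(p, 2) for p in probabilities], picks)
```

At a low temperature almost every pick is the top-scoring token; at a higher temperature the other tokens show up more often, which is why repeated calls can disagree.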

In [None]:
# https://www.esrb.org/
def getGameRating(game_text,client=client, model="gpt-3.5-turbo"):
    prompt=f"""Provide an ESRB rating for the following text:
    {game_text}
    ESRB rating:"""
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
        max_tokens=60,
        temperature=0.7 # the API accepts temperatures from 0 to 2; higher means more random
    )
    return chat_completion.choices[0].message.content


In [None]:
getGameRating("The game opens with a screen in a dark and stormy night. Five gamblers sit around a fire in a dark forest.")
# so one of the issues with LLMs is that you need to provide enough context

'The ESRB rating for the given text would likely be "Teen" (T) due to the dark and potentially intense atmosphere depicted in the opening scene.'

In [41]:
def getMovieRating(movie_text,client=client, model="gpt-3.5-turbo"):
    prompt=f"""Provide an MPA rating for the movie based on the following description:
    {movie_text}
    MPA rating:"""
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
        max_tokens=60,
        temperature=0.7 # the API accepts temperatures from 0 to 2; higher means more random
    )
    return chat_completion.choices[0].message.content

In [42]:
getMovieRating("They say all happy families are alike but all unhappy families are different in their own way")

'PG-13'

In [35]:
def getStudyNotes(subject, client=client, model="gpt-3.5-turbo"):
    prompt=f"What are some key points I should know when studying {subject}? Provide at least five key points.\n\n1."
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
        # using default for now, you can play around with some of the parameters
        # temperature=1,
        # max_tokens=64,
        # top_p=1.0,
        # frequency_penalty=0.0,
        # presence_penalty=0.0
    )
    return chat_completion.choices[0].message.content

In [37]:
print(getStudyNotes("Riga"))

 Riga is the capital city of Latvia, located on the banks of the Daugava River in the Baltic region. It is the largest city in the country and serves as an important economic, political, and cultural center.

2. Riga is known for its well-preserved medieval old town, which is a UNESCO World Heritage site. The old town is home to numerous historic buildings, including Riga Cathedral, the House of the Blackheads, and the Swedish Gate.

3. Riga is a multicultural city with a diverse population, including Latvians, Russians, Belarusians, and Ukrainians. This diversity is reflected in the city's architecture, cuisine, and cultural traditions.

4. Riga has a vibrant arts and cultural scene, with numerous museums, theaters, and art galleries showcasing both traditional and contemporary Latvian art. The city also hosts several major cultural events, such as the Riga Opera Festival and the Latvian Song and Dance Festival.

5. Riga is a popular tourist destination, known for its charming cobbled

In [38]:
# now let's try it with the latest and greatest 5.2
print(getStudyNotes("Riga", model="gpt-5.2"))

1. **Geography & setting**: Riga is the capital of Latvia, located on the **Daugava River** near the **Gulf of Riga** (Baltic Sea). Its position made it a major **trade and transport hub** in Northern Europe.

2. **Historical development**: Founded in **1201** (traditionally associated with Bishop Albert), Riga grew quickly as a regional center. It later became part of major powers over time, including the **Polish–Lithuanian Commonwealth**, **Sweden**, the **Russian Empire**, and the **Soviet Union**, before Latvia restored independence in **1991**.

3. **Hanseatic League & trade**: Riga was an important member of the **Hanseatic League** (a medieval trade network). This shaped its economy, architecture, and multicultural character through merchants and international connections.

4. **Architecture & Old Town (Vecrīga)**: Riga is famous for its **well-preserved medieval Old Town**, a **UNESCO World Heritage** area, and for having one of Europe’s richest collections of **Art Nouve

In [39]:
# i am using print because this answer includes newlines
study_notes_on_dd = getStudyNotes("Digital Discourse")
print(study_notes_on_dd)
# note how 1. is missing from the answer?
# why?
# because I already provide 1. at the end of my prompt
# remember LLMs are just word prediction machines (with some useful side effects)

 Digital discourse refers to communication and interaction that takes place through digital platforms such as social media, online forums, and messaging apps.
   
2. It is important to consider the context and medium in which digital discourse is taking place, as this can greatly influence the tone, style, and content of communication.
   
3. Social media has revolutionized digital discourse by making it easier for people to share their thoughts and opinions with a wide audience, but it has also raised concerns about privacy, misinformation, and online bullying.
   
4. The use of emojis, GIFs, hashtags, and other forms of visual and textual shorthand are common in digital discourse and can convey subtle nuances of meaning and emotion.
   
5. Digital discourse can be both empowering and challenging, as it allows for greater freedom of expression and connection with others, but also presents risks of miscommunication, harassment, and data privacy concerns.


In [40]:
# now i can save my notes to text file
with open("study_notes.txt", mode="w", encoding="utf-8") as f:
    f.write(study_notes_on_dd)

In [None]:
# let's do the same with 5.2

## Conclusion on use of LLMs in Digital Discourse

So LLMs can aid in discourse analysis with:

* Automated Text Analysis
* Sentiment Analysis
* Topic Modeling
* Translation
* Context Analysis
* Cross-cultural Analysis
* Summarization
* Bias detection

In [None]:
notes = [getStudyNotes("Sentiment Analysis") for _ in range(5)]  # I ran the same query 5 times, so the text completion will be different each time
print(notes)

[' Keywords and phrases can be negative even when they don’t seem to be\n2. Popular sentiment analysis is not always the right analysis\n3. Slang and capitals can make sentences seem negative', ' Semantria tracks sentiment globally by applying sentiment analysis models to content expressed in multiple languages.\n2. Semantria sentiment analysis models come in three different methods which are Query Based, Feed Based, and Sentiment Match.\n3. With Query Based sentiment analysis you can input a phrase or sentence for Semantria', " Find out what is interesting about your business\n2. Focus on your buying process\n\n3. Reduce capital inventory\n4. Have the temperament of an artists\n\nWhat key points do you think I should know for this topic?\n\nYou shouldn't study how to do sentiment analysis or focus on this topic if", ' Authors generally use sentiment words to convey approval versus a negative sentiment\n a group holds about a person, company, issue, or life events.\n2. Automated sent

In [None]:
digital_discourse_notes = [getStudyNotes("Digital Discourse") for _ in range(5)]
for note in digital_discourse_notes:
  print(note)
  print("="*40)

 There is a relationship between the word used and the identity of the individual speaker.

2. Gender can also change depending on the audience.

3. Tone serves as a crucial component for comprehension.

4. Slang and jargon differ from culture to culture and language to language

5.
Doesn't exist in a bubble so we need to pay attention to the framing and the understanding and assumptions around the discourse

2.It can be participatory and evolving

3.We need to study the social and political dimensions

4.Water cooler talk

5. A means of both self
 Digital Discourse includes the many locations of the discussion, such as a blog, a message board, a chatroom, a wiki, a MySpace page, a Facebook page, a YouTube video, etc.
A) Different people have different digital practices when it comes to culture

2. There are "four
 Textual and Visual 2. Readers and Writers 3. Poetic Knowledge and Poetic Practice 4. Technical and Cultural 5. Historical and Scientific 6. Analysis and Interdisciplinarity


In [None]:
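def getEssayOutline(subject, client=client, model="gpt-3.5-turbo-instruct"):
    # openai.Completion.create and engine="davinci" belong to the pre-1.0 library and no longer work;
    # this version uses the current client. The model name is an assumption: use any completions model you have access to.
    response = client.completions.create(
        model=model,
        prompt=f"Create an outline for an essay about {subject}:\n\nI: Introduction",
        temperature=0.7,
        max_tokens=60,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0,
    )
    return response.choices[0].text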

In [None]:
print(getEssayOutline("Julius Cesar"))



II: Julius Cesar

III: Family background

IIII: Early life

IIII: Civil service

V: Cesar and Crassus

VI: Cesar and Pompey

VII: Cesar and the provinces

VIII:


In [None]:
print(getEssayOutline("Tourism in Latvia"))

.

- Tourism in Latvia.

II: What are the main development directions of the tourism business in Latvia?

- Growth of the tourism industry.

- The tourism industry as one of the most dynamic economic sectors in Latvia.

- Importance of the tourism industry


In [None]:
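def getHorrorStory(topic, client=client, model="gpt-3.5-turbo-instruct"):
    # openai.Completion.create and engine="davinci" belong to the pre-1.0 library and no longer work;
    # this version uses the current client. The model name is an assumption: use any completions model you have access to.
    response = client.completions.create(
        model=model,
        prompt=f"Topic: Breakfast\nTwo-Sentence Horror Story: He always stops crying when I pour the milk on his cereal. I just have to remember not to let him see his face on the carton.\n###\nTopic: {topic}\nTwo-Sentence Horror Story:",
        temperature=0.5,
        max_tokens=60,
        top_p=1.0,
        frequency_penalty=0.5,
        presence_penalty=0.0,
        stop=["###"],  # stop generating when the model starts the next example
    )
    return response.choices[0].text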

In [None]:
print(getHorrorStory("snow"))

 I was walking home from work when I realized that I was the only one on the sidewalk. A few minutes later, I saw a snowplow coming down the road. I waved to get its attention, but it just kept on going.
I don't know what happened to everyone else.


In [None]:
print(getHorrorStory("Christmas"))

 The real Santa Claus was too fat to fit down the chimney, so he left a note saying he'd be back next year.
Two-Sentence Horror Story: I think Santa Claus is going to kill me.



In [None]:
# once I have a list of some texts to analyze
# I can simply loop over them and call my getSentiment function for each text/document
my_tweets = ["In October main exports partners were Lithuania, Estonia, Germany and United Kingdom. The main import partners were Lithuania, Russian Federation, Poland and Germany."
,"In October 2021 the foreign trade turnover of Latvia amounted to € 3.39 billion, which at current prices was 23.5% larger than a year ago.Exports value of goods was ⬆️17.8% higher, but imports value of goods ⬆️28.8% higher. "]

my_tweet_sentiments = [getSentiment(tweet) for tweet in my_tweets]
my_tweet_sentiments

[' Positive', ' 0']

## Alternatives to OpenAI API

These days you have many alternative LLM providers, such as Gemini from Google, Claude from Anthropic, Mistral, DeepSeek and many more.

You could apply for API keys at each one individually, but my recommendation is to use an aggregator.

I personally use https://openrouter.ai which lets me use the SAME API to access pretty much any Large Language Model in the world - at least 400 at last count.

The downside is that you pay about a 5% premium for this convenience.

Sometimes an aggregator is actually unavoidable, because with some providers it is very hard to get a paid account set up; even Google makes things too difficult at times.

Another advantage of openrouter.ai API keys is that you can set a SPENDING LIMIT on each key.

This is crucial if you are worried about letting some script run wild and rack up huge bills. :)
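
Using the "SAME API" means, in practice, pointing the OpenAI client at a different base URL. A minimal sketch, assuming OpenRouter's OpenAI-compatible endpoint at `https://openrouter.ai/api/v1`; the helper names `make_openrouter_client` and `ask_openrouter` are my own, and the model identifier you pass is whatever OpenRouter currently lists.

```python
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"  # assumed OpenAI-compatible endpoint


def make_openrouter_client(api_key):
    """Create an OpenAI client that talks to OpenRouter instead of OpenAI."""
    from openai import OpenAI  # imported here so the sketch loads even without the library
    return OpenAI(base_url=OPENROUTER_BASE_URL, api_key=api_key)


def ask_openrouter(client, prompt, model):
    """Same call shape as getAnswer above, only the client and model name differ."""
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
    )
    return chat_completion.choices[0].message.content
```

Usage would look like `ask_openrouter(make_openrouter_client(secret), "What is the capital of Latvia?", model="mistralai/mistral-7b-instruct")`, where the model string is only an example of the provider/model naming and may have changed by the time you read this.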



## Premade Jupyter Notebook for Batch processing

If you want to avoid coding your own notebook, you can use a premade one.

While working on various projects at the National Library of Latvia, I've had colleagues ask for a batch processing notebook.

You are also free to use it - just use your own openrouter.ai key:

https://colab.research.google.com/github/LNB-DH/PublicTools/blob/main/notebooks/llm/batch_processing.ipynb

