# Testing OpenRouter models

We will test how well various models available on OpenRouter extract transport-related information from text.

We will use the OpenAI-style API to access the models.

In [7]:
# first we need to import the basic libraries
# date
from datetime import datetime
now = datetime.now()
print(f"Date: {now}")
# python version
import sys
print(f"Python version: {sys.version}")
from pathlib import Path
import json
# import time for delay
import time
import requests
# print version
print(f"Requests version: {requests.__version__}")

from tqdm import tqdm
# import OpenAI
# let's try using OpenAI API
from openai import OpenAI
# import openai version
from openai import __version__ as openai_version
print(f"OpenAI version: {openai_version}")


# load OPENROUTER_API key from system environment
import os
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')
# assert key is not None
assert openrouter_api_key is not None, "OPENROUTER_API_KEY is not set"
print("We have the OPENROUTER_API_KEY should be good to go")

Date: 2025-02-04 08:46:14.479732
Python version: 3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
Requests version: 2.32.3
OpenAI version: 1.59.9
We have the OPENROUTER_API_KEY should be good to go


In [2]:
# let's make a request to the OpenRouter API
response = requests.post(
  url="https://openrouter.ai/api/v1/chat/completions",
  headers={
    "Authorization": f"Bearer {openrouter_api_key}",
    "HTTP-Referer": "VSAU", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "VSAU", # Optional. Site title for rankings on openrouter.ai.
  },
  data=json.dumps({
    "model": "openai/gpt-3.5-turbo", # Optional
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
    
  })
)
# print the response
print(response.json())

{'id': 'gen-1738576635-JSkQP3jMX8qWqBu6nnfN', 'provider': 'OpenAI', 'model': 'openai/gpt-3.5-turbo', 'object': 'chat.completion', 'created': 1738576635, 'choices': [{'logprobs': None, 'finish_reason': 'stop', 'native_finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': 'The meaning of life is a philosophical question that has been debated by humans for centuries. Different religions, cultures, and individuals have their own interpretations of the meaning of life. Some believe that the purpose of life is to seek happiness and fulfillment, others believe it is to serve a higher power or spiritual purpose, while some believe that life has no inherent meaning and it is up to each individual to create their own meaning. Ultimately, the meaning of life is a deeply personal and subjective question that each person must answer for themselves.', 'refusal': None}}], 'system_fingerprint': None, 'usage': {'prompt_tokens': 14, 'completion_tokens': 105, 'total_tokens': 119

In [3]:
# print actual content
print(response.json()['choices'][0]['message']['content'])

The meaning of life is a philosophical question that has been debated by humans for centuries. Different religions, cultures, and individuals have their own interpretations of the meaning of life. Some believe that the purpose of life is to seek happiness and fulfillment, others believe it is to serve a higher power or spiritual purpose, while some believe that life has no inherent meaning and it is up to each individual to create their own meaning. Ultimately, the meaning of life is a deeply personal and subjective question that each person must answer for themselves.
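The cell above indexes straight into `response.json()['choices']`, which raises a `KeyError` if the request failed. A minimal sketch of a more defensive lookup follows. The helper name `extract_content` is hypothetical, the sample payloads are hand-written to mirror the shape printed above, and the `"error"` key is an assumption about how failures are reported rather than something shown in this notebook.

```python
def extract_content(payload):
    """Return the assistant text from a chat-completion payload, or None."""
    # Assumption: a failed request comes back as a dict with an "error" key
    if "error" in payload:
        return None
    choices = payload.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


# Hand-written sample payloads (not real API responses)
ok_payload = {
    "choices": [{"message": {"role": "assistant", "content": "Hello"}}],
    "usage": {"prompt_tokens": 14, "completion_tokens": 105, "total_tokens": 119},
}
error_payload = {"error": {"code": 429, "message": "rate limited"}}

print(extract_content(ok_payload))
print(extract_content(error_payload))
```

With a real request, `extract_content(response.json())` would return `None` instead of raising when no completion is present.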


In [4]:
# let's try using OpenAI API
# from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=openrouter_api_key,
)

completion = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "VSAU", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "VSAU", # Optional. Site title for rankings on openrouter.ai.
  },
  model="openai/gpt-3.5-turbo",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)
print(completion.choices[0].message.content)

The meaning of life is a philosophical question that has been debated by thinkers and individuals for centuries. Different perspectives and beliefs exist regarding the purpose and meaning of life, with some attributing it to religious or spiritual beliefs, others focusing on personal fulfillment and happiness, and still others viewing it as a subjective and individual concept. Ultimately, the meaning of life may vary from person to person and can be shaped by one's values, beliefs, experiences, and goals.


## Function to use the OpenAI API

In [8]:
# let's make a function that calls the OpenAI API; the function takes the following parameters:
# user_prompt, system_prompt, base_url, api_key, model, headers, verbose, max_retries, delay
base_url = "https://openrouter.ai/api/v1"
api_key = openrouter_api_key
# candidate models; each assignment overrides the previous one, so only the last takes effect
model = "google/gemini-flash-1.5"
model = "google/gemini-2.0-flash-exp:free"
model = "google/gemini-flash-1.5-8b"
print(f"Using model {model}")
role = "user"
headers = {
  "HTTP-Referer": "VSAU", # Optional. Site URL for rankings on openrouter.ai.
  "X-Title": "VSAU", # Optional. Site title for rankings on openrouter.ai.
}

system_prompt = """You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: """


def openai_chat(user_prompt, 
                system_prompt=system_prompt, 
                base_url=base_url,
                api_key=api_key, 
                model=model, 
                headers=headers,
                verbose=True,
                max_retries=10,
                delay=0.5):
  client = OpenAI(
    base_url=base_url,
    api_key=api_key,
  )
  messages = [
        {
        "role": "system",
        "content": system_prompt
        },
        {
        "role": "user",
        "content": user_prompt
        }
    ]
  # if verbose let's show model, system_prompt and start time for request
  if verbose:
    print(f"Using model {model}")
    print(f"System prompt: {system_prompt}")
    start_time = datetime.now()
    print(f"Start time: {start_time}")
  completion = None
  tries = 0
  while (completion is None or completion.choices is None) and tries < max_retries:
    try:
      completion = client.chat.completions.create(
        extra_headers=headers,
        model=model,
        messages=messages
      )
    except Exception as e:
      print(f"Error: {e}")
      print(f"Trying again... in {delay} seconds")
      # delay
      time.sleep(delay)
    tries += 1
    
  if verbose:
    print(f"Time taken: {datetime.now() - start_time}")
  if completion is None or completion.choices is None:
    return f"Failed to get completion after {max_retries} retries using model {model}"
  return completion.choices[0].message.content


Using model google/gemini-flash-1.5-8b
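`openai_chat` waits a fixed `delay` after an exception and retries immediately when the completion comes back without choices. One possible alternative, sketched here with a hypothetical helper `call_with_backoff` and a made-up `flaky` stand-in for the API call, is to wait after every failed attempt and double the wait each time. This is an illustration of exponential backoff, not the retry logic used in this notebook.

```python
import time


def call_with_backoff(fn, max_retries=5, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call fn() until it returns a non-None value or the retries run out."""
    delay = base_delay
    for attempt in range(1, max_retries + 1):
        try:
            result = fn()
            if result is not None:
                return result
        except Exception as e:
            print(f"Attempt {attempt} failed: {e}")
        # wait before the next attempt, but not after the last one
        if attempt < max_retries:
            sleep(delay)
            delay *= factor
    return None


# Hypothetical stand-in for an API call: fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


waits = []  # record the requested waits instead of actually sleeping
print(call_with_backoff(flaky, sleep=waits.append))  # ok
print(waits)  # [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies.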


In [9]:
# let's test the function with a prompt
prompt = """Pa ceļu gāja cilvēks ar suni, bet viņš bija ļoti noguris, tāpēc viņš nolēma izmantot sabiedrisko transportu. 
Vēlāk viņš sēdēja autobusā un skatījās uz logu.
Šis cilvēks sapņoja par jūru un pludmali, bet viņam bija jābrauc uz darbu.
Cik labī būtu bijis sēdēt laiva, makšķerēt zivis un baudīt sauli!"""
response = openai_chat(prompt)
print(response)

Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-02-04 08:46:33.967690
Time taken: 0:00:02.453795
* autobusā
* laiva
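The model returns the terms as a bulleted list, so comparing responses across models needs a step that turns the text into a plain list. A minimal sketch follows, assuming responses are bulleted or numbered with one term per line. The helper `parse_terms` is hypothetical, and other models may format their answers differently.

```python
import re


def parse_terms(response):
    """Split a bulleted or line-separated model response into a list of terms."""
    terms = []
    for line in response.splitlines():
        # drop a leading bullet ("*", "-", "•") or a number such as "1." / "1)"
        term = re.sub(r"^\s*(?:[*\-•]|\d+[.)])\s*", "", line).strip()
        if term:
            terms.append(term)
    return terms


sample = "* autobusā\n* laiva\n"
print(parse_terms(sample))  # ['autobusā', 'laiva']
```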



In [10]:
# let's list all text files in data/docs folder
# data_folder = Path("../data/docs")
data_folder = Path("../../lnb_lat_sen_rom_releases/lat_sen_rom_2025_01_28")
# assert folder exists
assert data_folder.exists(), f"Folder {data_folder} does not exist"
                   
# list all files
files = list(data_folder.glob("*.txt"))
# how many files do we have?
print(f"Number of files: {len(files)}")


Number of files: 453


In [11]:
# let's load the files into a dictionary with filename stem as key and text as value
# remember to decode the text as utf-8
texts = {}
for file in tqdm(files):
  with open(file, "r", encoding="utf-8") as f:
    texts[file.stem] = f.read()
# how many texts do we have?
print(f"Number of texts: {len(texts)}")
# how many characters do we have in total?
total_chars = sum([len(text) for text in texts.values()])
print(f"Total characters: {total_chars}")
# what is the smallest text?
min_text = min(texts, key=lambda x: len(texts[x]))
print(f"Key for smallest text: {min_text}")
# how many characters does the smallest text have?
min_chars = len(texts[min_text])


100%|██████████| 453/453 [00:05<00:00, 89.47it/s] 

Number of texts: 453
Total characters: 188367402
Key for smallest text: VentA_DepuT_1293527





In [12]:
# let's create responses subfolder
responses_folder = Path("../data/responses")
# responses_folder = data_folder / "responses"
# create folder (and any missing parent folders) if it does not exist
responses_folder.mkdir(parents=True, exist_ok=True)
# assert folder exists
assert responses_folder.exists(), f"Folder {responses_folder} does not exist"
# print full path
print(f"Responses folder: {responses_folder}")


Responses folder: ..\data\responses


## List of transportation terms

In [16]:
land_transportation = """'zirgs' 'dzelzceļš' 'ormanis' 'pajūgs' 'kariete' 'kamanas'
'velosipēds' 'automašīna' 'mašīna' 'vilciens' 'lokomotīve' 'tramvajs' 'bānītis' 'Mercedes'"""
# get rid of '
land_transportation = land_transportation.replace("'", "")
# split by space
land_transportation = land_transportation.split()
# print land_transportation
print(land_transportation)

# now let's recreate system_prompt_2 with unique values
# prompt_with_terms = f"""You are an expert on Latvian language and transportation.
# 1. Please identify which keywords similar to terms in given list of land transportation vehicles are present in the given text.
# 2. List of keywords: {" ".join(land_transportation)}
# 3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:"""

# prompt_with_terms = f"""You are an expert on Latvian language and transportation.
# Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
# {' '.join(land_transportation)}
# Display only the semantically related terms exactly as found in the text and nothing else. Latvian text follows: """


prompt_with_terms = f"""You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
{' '.join(land_transportation)}
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows: """
print("Prompt with terms:")
print(prompt_with_terms)

['zirgs', 'dzelzceļš', 'ormanis', 'pajūgs', 'kariete', 'kamanas', 'velosipēds', 'automašīna', 'mašīna', 'vilciens', 'lokomotīve', 'tramvajs', 'bānītis', 'Mercedes']
Prompt with terms:
You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis Mercedes
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows:

## Function to call the model multiple times

In [19]:
# let's make a function that will take as parameters the following:
# texts, system_prompt, verbose, delay, max_retries, model, responses_folder
# first we will create a subfolder in responses_folder named YYYY_MM_DD_model
# then we will iterate over all texts and call openai_chat function with text as prompt
# we will save the response in the subfolder with the same name as the text
# we will delay with given delay between requests
# we will retry max_retries times if we get an error

def openai_chat_all(texts, system_prompt, verbose, delay, max_retries, model, responses_folder, responses_folder_suffix="with_terms"):
  # create subfolder in responses_folder
  # let's create model_name from model by replacing / with _
  model_name = model.replace("/", "_")
  model_name = model_name.replace(":", "_")
  subfolder = responses_folder / f"{datetime.now().strftime('%Y_%m_%d')}_{model_name}_{responses_folder_suffix}"
  if verbose:
    print(f"Subfolder: {subfolder}")
  # create subfolder (and any missing parent folders) if it does not exist
  subfolder.mkdir(parents=True, exist_ok=True)
  # assert subfolder exists
  assert subfolder.exists(), f"Subfolder {subfolder} does not exist"
  # iterate over all texts
  for key, value in tqdm(texts.items()):
    if verbose:
      print(f"Processing key: {key}")
    # call openai_chat function with text as prompt
    response = openai_chat(user_prompt=value, 
                           system_prompt=system_prompt, 
                           verbose=verbose, delay=delay, max_retries=max_retries, model=model)
    # save the response in the subfolder with the same name as the text
    with open(subfolder / f"{key}.txt", "w", encoding="utf-8") as f:
      f.write(response)
    # delay with given delay between requests
    time.sleep(delay)

# let's test on first 20 texts
# openai_chat_all(dict(list(texts.items())[:20]), prompt_with_terms, True, 0.2, 10, model, responses_folder)
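`openai_chat_all` calls the model for every text, even if a response file already exists from an interrupted run. A small sketch of how a rerun could skip finished texts follows. The helper `pending_keys` and the demo keys are hypothetical, and the demonstration writes to a temporary folder rather than the real responses folder.

```python
import tempfile
from pathlib import Path


def pending_keys(texts, subfolder):
    """Return the keys of texts that have no non-empty response file in subfolder."""
    pending = []
    for key in texts:
        out_file = Path(subfolder) / f"{key}.txt"
        if not out_file.exists() or out_file.stat().st_size == 0:
            pending.append(key)
    return pending


# Demonstration in a throwaway folder with made-up keys
with tempfile.TemporaryDirectory() as tmp:
    demo_texts = {"doc_a": "text a", "doc_b": "text b", "doc_c": "text c"}
    (Path(tmp) / "doc_a.txt").write_text("* zirgs", encoding="utf-8")
    remaining = pending_keys(demo_texts, tmp)
    print(remaining)  # ['doc_b', 'doc_c']
```

Two caveats apply if this were combined with the function above: the subfolder name includes the current date, so files from a previous day would not be found, and a file containing the "Failed to get completion" message is non-empty and would be treated as done.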

In [None]:
deep_seek_model = "deepseek/deepseek-r1:free"
# let's try 20 texts with deepseek model
openai_chat_all(dict(list(texts.items())[:20]), prompt_with_terms, True, 0.2, 10, deep_seek_model, responses_folder, "with_terms")

In [21]:
# let's try gemini experimental
gemini_exp_model = "google/gemini-2.0-flash-thinking-exp:free"
# let's try 20 texts with gemini experimental model
openai_chat_all(dict(list(texts.items())[:20]), prompt_with_terms, True, 0.2, 10, gemini_exp_model, responses_folder, "with_terms")

Subfolder: ..\data\responses\2025_02_04_google_gemini-2.0-flash-thinking-exp_free_with_terms


  0%|          | 0/20 [00:00<?, ?it/s]

Processing key: AizsV_MilaU_1049452
Using model google/gemini-2.0-flash-thinking-exp:free
System prompt: You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis Mercedes
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows: 
Start time: 2025-02-04 10:39:44.661895
Time taken: 0:00:06.694392


  5%|▌         | 1/20 [00:07<02:27,  7.78s/it]

Processing key: AkurJ_DegoS_771400
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:39:52.431621
Time taken: 0:00:04.333805


 10%|█         | 2/20 [00:13<01:54,  6.38s/it]

Processing key: AkurJ_PeteD_886346
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:39:57.747210
Time taken: 0:00:05.957774


 15%|█▌        | 3/20 [00:20<01:52,  6.64s/it]

Processing key: AkurJ_UgunZ_1049441
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:04.727318
Time taken: 0:00:04.401191


 20%|██        | 4/20 [00:25<01:38,  6.16s/it]

Processing key: Andra_Elita_1053573
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:10.124077
Time taken: 0:00:05.123360


 25%|██▌       | 5/20 [00:31<01:32,  6.15s/it]

Processing key: Anoni_BandK_1333186
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:16.334114
Time taken: 0:00:05.224011


 30%|███       | 6/20 [00:37<01:26,  6.20s/it]

Processing key: Anoni_BandK_419229
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:22.573136
Time taken: 0:00:05.738908


 35%|███▌      | 7/20 [00:44<01:22,  6.38s/it]

Processing key: Anoni_KaptT_419839
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:29.678312
Time taken: 0:00:04.567028


 40%|████      | 8/20 [00:50<01:14,  6.24s/it]

Processing key: Anoni_SarkM_1350350
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:35.462403
Time taken: 0:00:02.908504


 45%|████▌     | 9/20 [00:54<01:01,  5.58s/it]

Processing key: ArdeE_ApLie_1051730
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:39.389609
Time taken:

 50%|█████     | 10/20 [00:58<00:51,  5.15s/it]

Processing key: ArdeE_SvetA_1046832
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:43.640152
Time taken: 0:00:03.196786


 55%|█████▌    | 11/20 [01:03<00:43,  4.88s/it]

Processing key: ArdsL_TrijV_1053572
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:47.878947
Time taken: 0:00:03.094648


 60%|██████    | 12/20 [01:07<00:37,  4.66s/it]

Processing key: Arnis_AndrS_948028
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:52.264780
Time taken: 0:00:03.559856


 65%|██████▌   | 13/20 [01:12<00:32,  4.71s/it]

Processing key: Arnis_MilaL_1047332
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:40:56.826161
Time taken: 0:00:03.310929


 70%|███████   | 14/20 [01:16<00:27,  4.59s/it]

Processing key: Arnis_TaurK_1051711
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:41:01.192105
Time taken: 0:00:04.911653


 75%|███████▌  | 15/20 [01:22<00:25,  5.01s/it]

Processing key: Artis_ArNai_1053600
Using model google/gemini-2.0-flash-thinking-exp:free
Start time: 2025-02-04 10:41:07.116407
Time taken:

 80%|████████  | 16/20 [01:27<00:19,  4.96s/it]

Processing key: AustA_GaraJ_1025406
Using model google/gemini-2.0-flash-thinking-exp:free
System prompt: You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis Mercedes
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows: 
Start time: 2025-02-04 10:41:11.928684
Time taken: 0:00:05.551808


 85%|████████▌ | 17/20 [01:33<00:16,  5.43s/it]

Processing key: AustA_KaspG_948026
Using model google/gemini-2.0-flash-thinking-exp:free
System prompt: You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis Mercedes
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows: 
Start time: 2025-02-04 10:41:18.550514
Time taken: 0:00:03.186671


 90%|█████████ | 18/20 [01:38<00:10,  5.08s/it]

Processing key: AustA_Puisk_1047362
Using model google/gemini-2.0-flash-thinking-exp:free
System prompt: You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis Mercedes
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows: 
Start time: 2025-02-04 10:41:22.743696


 95%|█████████▌| 19/20 [01:42<00:04,  4.72s/it]

Time taken: 0:00:02.894614
Processing key: BaloP_DaugU_1051661
Using model google/gemini-2.0-flash-thinking-exp:free
System prompt: You are an expert on Latvian language and transportation.
Analyze the following Latvian document and identify which terms are semantically similar in meaning to one of the following keywords:
zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis Mercedes
Display only the semantically related noun terms exactly as found in the text and nothing else. 
For example if the text contains the word 'zirgā' which is a form of 'zirgs' you should include 'zirgā' in the response.
For example if the text contains word 'kučieris' which is related to 'zirgs' you should also include 'kučieris' in the response.
For example if the text contains word 'Fords' which is semantically similar to 'Mercedes' you should include 'Fords' in the response.
Latvian text follows: 
Start time: 2025-02-04 10:41:26.682479
Time taken:

100%|██████████| 20/20 [01:45<00:00,  5.30s/it]
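
The verbose log above prints one `Time taken:` line per text, but the progress bar and the `print` output interleave, so some of those lines land after the next bar update and two of them are empty in this capture. Below is a minimal sketch of recovering the durations from such a log, assuming the output has been saved as a string. `parse_durations` is a hypothetical helper and is not part of this notebook.

```python
import re
from datetime import timedelta

# Matches lines such as "Time taken: 0:00:03.310929" (the str() form of a timedelta).
TIME_TAKEN = re.compile(
    r"^Time taken: (\d+):(\d{2}):(\d{2})(?:\.(\d+))?\s*$", re.MULTILINE
)

def parse_durations(log_text: str) -> list[timedelta]:
    """Return every complete 'Time taken' value found in log_text, in order."""
    durations = []
    for hours, minutes, seconds, fraction in TIME_TAKEN.findall(log_text):
        # Pad or trim the fractional part to exactly six digits (microseconds).
        micro = int((fraction or "0").ljust(6, "0")[:6])
        durations.append(
            timedelta(hours=int(hours), minutes=int(minutes),
                      seconds=int(seconds), microseconds=micro)
        )
    return durations

# Small sample in the same shape as the log; the empty line is skipped.
sample_log = """Time taken: 0:00:03.310929
Time taken:
Time taken: 0:00:04.911653
"""
durations = parse_durations(sample_log)
total = sum(durations, timedelta())
print(len(durations), total)
```

Lines with a missing value are simply ignored, so the count of parsed durations can be lower than the number of processed texts.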


In [21]:
prompt_no_terms = system_prompt
print("Prompt without terms:")
print(prompt_no_terms)

Prompt without terms:
You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 


In [22]:
# now let's run first 20 texts without terms
openai_chat_all(dict(list(texts.items())[:20]), prompt_no_terms, True, 0.2, 10, model, responses_folder, "no_terms")

Subfolder: ..\data\responses\2025_01_29_google_gemini-flash-1.5-8b_no_terms


  0%|          | 0/20 [00:00<?, ?it/s]

Processing key: AizsV_MilaU_1049452
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:40:07.210005


  5%|▌         | 1/20 [00:07<02:18,  7.30s/it]

Time taken: 0:00:06.237332
Processing key: AkurJ_DegoS_771400
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:40:14.479043
Time taken: 0:00:03.775122


 10%|█         | 2/20 [00:12<01:45,  5.84s/it]

Processing key: AkurJ_PeteD_886346
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:40:19.259804
Time taken: 0:00:08.182847


 15%|█▌        | 3/20 [00:21<02:05,  7.37s/it]

Processing key: AkurJ_UgunZ_1049441
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:40:28.459678
Time taken: 0:00:08.312861


 20%|██        | 4/20 [00:30<02:10,  8.14s/it]

Processing key: Andra_Elita_1053573
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:40:37.729697
Time taken: 0:00:07.591146


 25%|██▌       | 5/20 [00:39<02:04,  8.29s/it]

Processing key: Anoni_BandK_1333186
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:40:46.315371
Time taken: 0:00:39.861545


 30%|███       | 6/20 [01:20<04:31, 19.36s/it]

Processing key: Anoni_BandK_419229
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:41:27.324114
Time taken: 0:00:33.292759


 35%|███▌      | 7/20 [01:54<05:15, 24.29s/it]

Processing key: Anoni_KaptT_419839
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:42:01.583005
Time taken: 0:00:18.933177


 40%|████      | 8/20 [02:14<04:34, 22.90s/it]

Processing key: Anoni_SarkM_1350350
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:42:21.498193
Time taken: 0:00:03.353871


 45%|████▌     | 9/20 [02:18<03:07, 17.09s/it]

Processing key: ArdeE_ApLie_1051730
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:42:25.818392
Time taken: 0:00:10.564573


 50%|█████     | 10/20 [02:30<02:34, 15.47s/it]

Processing key: ArdeE_SvetA_1046832
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:42:37.730790


 55%|█████▌    | 11/20 [02:39<02:00, 13.34s/it]

Time taken: 0:00:07.483849
Processing key: ArdsL_TrijV_1053572
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:42:46.254123
Time taken: 0:00:04.587100


 60%|██████    | 12/20 [02:44<01:27, 11.00s/it]

Processing key: Arnis_AndrS_948028
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:42:51.838382
Time taken: 0:00:13.029951


 65%|██████▌   | 13/20 [02:58<01:23, 11.91s/it]

Processing key: Arnis_MilaL_1047332
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:43:05.833611
Time taken: 0:00:05.957497


 70%|███████   | 14/20 [03:05<01:02, 10.41s/it]

Processing key: Arnis_TaurK_1051711
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:43:12.757124
Time taken: 0:00:16.058270


 75%|███████▌  | 15/20 [03:22<01:02, 12.40s/it]

Processing key: Artis_ArNai_1053600
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:43:29.786726
Time taken: 0:00:13.617145


 80%|████████  | 16/20 [03:37<00:52, 13.15s/it]

Processing key: AustA_GaraJ_1025406
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:43:44.654682
Time taken: 0:01:03.882531


 85%|████████▌ | 17/20 [04:42<01:26, 28.70s/it]

Processing key: AustA_KaspG_948026
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:44:49.536305
Time taken: 0:00:05.223718


 90%|█████████ | 18/20 [04:48<00:43, 21.94s/it]

Processing key: AustA_Puisk_1047362
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:44:55.741725
Time taken: 0:00:15.183847


 95%|█████████▌| 19/20 [05:04<00:20, 20.21s/it]

Processing key: BaloP_DaugU_1051661
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-29 10:45:11.935732
Time taken: 0:00:15.350159


100%|██████████| 20/20 [05:21<00:00, 16.06s/it]
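
The trial run above takes the first 20 texts with `dict(list(texts.items())[:20])`, and the remainder is the complementary `[20:]` slice. The same split can be sketched as one reusable helper, assuming `texts` is an ordinary dict that maps keys to document text. `split_texts` is a hypothetical name used only for illustration.

```python
from itertools import islice

def split_texts(texts: dict[str, str], n: int) -> tuple[dict[str, str], dict[str, str]]:
    """Split texts into (first n items, remaining items), keeping insertion order."""
    items = iter(texts.items())
    head = dict(islice(items, n))  # consumes the first n items
    tail = dict(items)             # whatever the iterator has not yielded yet
    return head, tail

demo = {"a": "pirmais", "b": "otrais", "c": "trešais"}
trial, rest = split_texts(demo, 2)
print(list(trial), list(rest))
```

Because both halves come from one iterator, no item can end up in both batches or be dropped between them.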


In [23]:
# now let's run the rest in non-verbose mode
openai_chat_all(dict(list(texts.items())[20:]), prompt_no_terms, False, 0.1, 10, model, responses_folder, "no_terms")

100%|██████████| 433/433 [1:46:06<00:00, 14.70s/it]  


In [27]:
print(prompt_with_terms)

You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
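
The prompt printed above embeds the keyword list as a space-separated string. Below is a minimal sketch of assembling such a prompt from a Python list, so that the keywords can be varied between runs. `build_prompt_with_terms` is an assumption for illustration and does not reproduce how `prompt_with_terms` was actually constructed in this notebook; the sketch also spells "transportation" correctly, unlike the printed prompt.

```python
# Keywords copied from the printed prompt.
KEYWORDS = [
    "zirgs", "dzelzceļš", "ormanis", "pajūgs", "kariete", "kamanas",
    "velosipēds", "automašīna", "mašīna", "vilciens", "lokomotīve",
    "tramvajs", "bānītis",
]

def build_prompt_with_terms(keywords: list[str]) -> str:
    """Assemble a numbered system prompt around a space-separated keyword list."""
    lines = [
        "You are an expert on Latvian language and transportation.",
        "1. Please identify which keywords similar to terms in given list of "
        "land transportation vehicles are present in the given text.",
        "2. List of keywords: " + " ".join(keywords),
        "3. Display the similar keywords as a list preserving their original "
        "transcription variants as they appear in the text. Text follows:",
    ]
    return "\n".join(lines)

prompt = build_prompt_with_terms(KEYWORDS)
print(prompt)
```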


In [28]:
# let's try first 20 texts with terms
openai_chat_all(dict(list(texts.items())[:20]), prompt_with_terms, True, 0.2, 10, model, responses_folder, "with_terms")

Subfolder: ..\data\responses\2025_01_29_google_gemini-flash-1.5-8b_with_terms


  0%|          | 0/20 [00:00<?, ?it/s]

Processing key: AizsV_MilaU_1049452
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:18:31.196224
Time taken: 0:00:08.651312


  5%|▌         | 1/20 [00:09<03:04,  9.72s/it]

Processing key: AkurJ_DegoS_771400
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:18:40.872988


 10%|█         | 2/20 [00:16<02:25,  8.07s/it]

Time taken: 0:00:05.900391
Processing key: AkurJ_PeteD_886346
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:18:47.729473
Time taken: 0:00:54.156200


 15%|█▌        | 3/20 [01:11<08:22, 29.55s/it]

Processing key: AkurJ_UgunZ_1049441
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:19:42.872961


 20%|██        | 4/20 [01:22<05:54, 22.14s/it]

Time taken: 0:00:09.789939
Processing key: Andra_Elita_1053573
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:19:53.807475
Time taken: 0:00:07.961298


 25%|██▌       | 5/20 [01:31<04:21, 17.44s/it]

Processing key: Anoni_BandK_1333186
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:20:02.737126
Time taken: 0:00:34.847835


 30%|███       | 6/20 [02:07<05:31, 23.69s/it]

Processing key: Anoni_BandK_419229
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:20:38.568824
Time taken: 0:00:33.867620


 35%|███▌      | 7/20 [02:42<05:55, 27.34s/it]

Processing key: Anoni_KaptT_419839
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:21:13.408749
Time taken: 0:00:22.671692


 40%|████      | 8/20 [03:05<05:14, 26.17s/it]

Processing key: Anoni_SarkM_1350350
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:21:37.049199
Time taken: 0:00:03.992574


 45%|████▌     | 9/20 [03:10<03:34, 19.53s/it]

Processing key: ArdeE_ApLie_1051730
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:21:42.067826
Time taken: 0:00:10.941534


 50%|█████     | 10/20 [03:22<02:51, 17.20s/it]

Processing key: ArdeE_SvetA_1046832
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:21:54.005386
Time taken: 0:00:29.543506


 55%|█████▌    | 11/20 [03:53<03:11, 21.28s/it]

Processing key: ArdsL_TrijV_1053572
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:22:24.532895


 60%|██████    | 12/20 [03:58<02:11, 16.46s/it]

Time taken: 0:00:04.453059
Processing key: Arnis_AndrS_948028
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:22:29.985940
Time taken: 0:00:14.625278


 65%|██████▌   | 13/20 [04:14<01:53, 16.21s/it]

Processing key: Arnis_MilaL_1047332
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:22:45.579619
Time taken: 0:00:05.812588


 70%|███████   | 14/20 [04:21<01:20, 13.36s/it]

Processing key: Arnis_TaurK_1051711
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:22:52.454624
Time taken: 0:00:18.258761


 75%|███████▌  | 15/20 [04:40<01:15, 15.16s/it]

Processing key: Artis_ArNai_1053600
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:23:11.673023


 80%|████████  | 16/20 [04:56<01:01, 15.32s/it]

Time taken: 0:00:14.734658
Processing key: AustA_GaraJ_1025406
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:23:27.407459
Time taken: 0:01:05.062102


 85%|████████▌ | 17/20 [06:02<01:31, 30.58s/it]

Processing key: AustA_KaspG_948026
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:24:33.489968
Time taken: 0:00:06.166774


 90%|█████████ | 18/20 [06:09<00:47, 23.55s/it]

Processing key: AustA_Puisk_1047362
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:24:40.719144
Time taken: 0:00:15.409495


 95%|█████████▌| 19/20 [06:25<00:21, 21.42s/it]

Processing key: BaloP_DaugU_1051661
Using model google/gemini-flash-1.5-8b
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords similar to terms in given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Display the similar keywords as a list preserving their original transcription variants as they appear in the text. Text follows:
Start time: 2025-01-29 18:24:57.140848
Time taken: 0:00:16.578157


100%|██████████| 20/20 [06:43<00:00, 20.18s/it]


In [None]:
# now let's run the rest of the texts in non-verbose mode
openai_chat_all(dict(list(texts.items())[20:]), prompt_with_terms, False, 0.1, 10, model, responses_folder, "with_terms")

In [None]:
# let's call the openai_chat function for each text and save the response to the responses folder, using the key as filename
for key, text in tqdm(texts.items()):
  response = openai_chat(text, system_prompt=system_prompt_3)
  with open(responses_folder / f"{key}.txt", "w", encoding="utf-8") as f:
    f.write(response)
  # pause between requests, after the file has been closed
  time.sleep(0.5)
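
A long batch like the loop above can be interrupted by rate limits or network errors. Below is a minimal sketch of a resumable variant that skips keys whose response file already exists. The function name `save_missing_responses` and the `ask` parameter are hypothetical: `ask` stands in for whatever callable sends one text to the model and returns a string.

```python
import time
from pathlib import Path


def save_missing_responses(texts, responses_folder, ask, delay=0.5):
    """Call ask(text) only for keys that have no saved response yet.

    `ask` is a hypothetical stand-in for the chat function used in this notebook.
    Returns the list of keys that were processed in this run.
    """
    responses_folder = Path(responses_folder)
    responses_folder.mkdir(parents=True, exist_ok=True)
    processed = []
    for key, text in texts.items():
        out_path = responses_folder / f"{key}.txt"
        if out_path.exists():
            # a response was saved in an earlier run, do not pay for it again
            continue
        response = ask(text)
        out_path.write_text(response, encoding="utf-8")
        processed.append(key)
        # small pause between requests
        time.sleep(delay)
    return processed
```

Re-running the same call after a crash would then only request the texts that are still missing.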

In [15]:
# let's open last file and read the content using utf-8 encoding
file = files[-1]
with open(file, encoding="utf-8") as f:
  text = f.read()
# first 300 characters
print(text[:300])

Sieviete


Pirmā daļa

Guberņas rentejas ierēdnis Apse pamodās, kad pulkstens aiz sienas ēdamistabā sāka sist. Pusmiegā viņš skaitīja... un nevarēja saskaitīt. Bet tomēr beigās zināja, ka četri. Četri... viņš vairāk reižu domās atkārtoja šo skaitli. Tad vienreiz izrunāja balsī tā, ka pats varēja dzi


In [17]:
# let's test the function with the text
response = openai_chat(text)
print(response)

Using model google/gemini-2.0-flash-exp:free
System prompt: Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Include specific types (e.g., cars, trains, bicycles) as well as broader categories (e.g., public transportation).
Provide the terms in Latvian, preserving their original transcription variants as they appear in the text. Text:
Start time: 2025-01-23 10:22:22.422112
Time taken: 0:00:11.144166
Pēc dotā teksta, zemes transporta līdzekļi ir:

*  zābaki (nav gan transportlīdzeklis, bet ir saistīts ar pārvietošanos pa zemi; zābaki, un arī kājas, bija bieži pieminēti)
*  kamanas (gan vienjūga, gan divjūga kamanas)
*  tramvaja vagons
*  dzelzceļa vilciens (minēts caur sajūtām - "vai sēd dzelzceļa vilcienā")
*  ekipaža (minēts kā nākotnes sapnis: "brauks sava paša eķipažā")
*  automobiļi (nav tieši minēti, bet netieši pieņemti, ka tie varētu pastāvēt, kad runāts par "pazīstamās automobiļu skaņām")
*  riteņi (saistīti ar velobraucēji

In [20]:
print(system_prompt)

Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Include specific types (e.g., cars, trains, bicycles) as well as broader categories (e.g., public transportation).
Provide the terms in Latvian, preserving their original transcription variants as they appear in the text. Text:


In [29]:
system_prompt = """You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: """

In [27]:
# let's test the function with the text
response = openai_chat(text, system_prompt=system_prompt)
print(response)

Using model google/gemini-2.0-flash-exp:free
System prompt: You are an expert on Latvian language and transporation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Show all specific land transportation types found in the text as a list.
No other information is needed, just the transportation terms as they appear in the text.
Provide the terms only in Latvian, preserving their original transcription variants as they appear in the text. Latvian Text follows: 
Start time: 2025-01-24 14:23:16.215333
Time taken: 0:00:12.244052
* zābakiem
* gājēji
* braucēji
*  klavieres
* krēsla
*  vilciens
* kamanu
*  šoseju
*  tramvaja
* vagons
* eķipažā
*  dzelzceļa
* slidotavu
* pajūgu
*  pulktiens
*  ormani
* autobuss



In [28]:
transport_lemmas = sorted("""zirgs
dzelzceļš
ormanis
pajūgs
rati
kariete
kamanas
laiva
plosts
burinieks
kuģis
tvaikonis
velosipēds
automašīna
mašīna
vilciens
lokomotīve
tramvajs
bānītis
vāģi
stīmeris
zēģele
fords
Mercedes
""".split())
transport_lemmas

['Mercedes',
 'automašīna',
 'burinieks',
 'bānītis',
 'dzelzceļš',
 'fords',
 'kamanas',
 'kariete',
 'kuģis',
 'laiva',
 'lokomotīve',
 'mašīna',
 'ormanis',
 'pajūgs',
 'plosts',
 'rati',
 'stīmeris',
 'tramvajs',
 'tvaikonis',
 'velosipēds',
 'vilciens',
 'vāģi',
 'zirgs',
 'zēģele']
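
The free-form response above mixed inflected vehicle terms (kamanu, tramvaja) with unrelated words (klavieres, krēsla). One way to post-filter such output against the lemma list is a crude shared-prefix check. This is only a heuristic sketch, not a real Latvian lemmatizer: the helper names and the "all but the last two letters" rule are assumptions made for illustration.

```python
def common_prefix_len(a, b):
    """Number of leading characters that two strings share."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


def match_lemma(term, lemmas):
    """Return the first lemma that shares a long enough prefix with term, else None.

    Assumed rule: the term must share all but the last two letters of the lemma
    (and at least three letters), which tolerates most case endings.
    """
    term_cf = term.casefold()
    for lemma in lemmas:
        lemma_cf = lemma.casefold()
        needed = max(3, len(lemma_cf) - 2)
        if common_prefix_len(term_cf, lemma_cf) >= needed:
            return lemma
    return None


# a few lemmas from the list above and a few terms from the model response
sample_lemmas = ["kamanas", "tramvajs", "vilciens", "automašīna", "mašīna"]
for term in ["kamanu", "tramvaja", "klavieres", "autobuss"]:
    print(term, "->", match_lemma(term, sample_lemmas))
```

Terms that map to `None` are not necessarily wrong (autobuss is a vehicle that is simply missing from the list), so they are better reviewed than silently dropped.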

In [30]:
system_prompt_2 = f"""You are an expert on Latvian language and transportation.
1. Please identify which keywords from the given list of land transportation vehicles are present in the given text.
2. List of keywords: {" ".join(transport_lemmas)}
3. Identify all other land transportation vehicles in the given text.
4. Provide terms in Latvian as a list, preserving their original spelling variants as they appear in the text."""

In [38]:
# let's test the function with the text
response = openai_chat(text, system_prompt=system_prompt_2)
print(response)

Using model google/gemini-2.0-flash-exp:free
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords from the given list of land transportaton vehicles are present in the given text.
2. List of keywords: Mercedes automašīna burinieks bānītis dzelzceļš fords kamanas kariete kuģis laiva lokomotīve mašīna ormanis pajūgs plosts rati stīmeris tramvajs tvaikonis velosipēds vilciens vāģi zirgs zēģele
3. Identify all other land transportation vehicles in the given text.
4. Provide terms in Latvian as a list, preserving their original spelling variants as they appear in the text.
Start time: 2025-01-24 14:33:04.955773
Time taken: 0:00:06.156933
Okay, I can help with that! Let's break down the text as you requested.

**1. Keywords from the list found in the text:**

From the provided keyword list, the following terms are present in the text:

*   kamanas
*    mašīna
*   zirgs 
*   tramvajs
*   vilciens
*   vāģi

**2. Additional Land Transportati

In [39]:
files

[WindowsPath('../data/docs/JaunJ_Ziema_417458.txt'),
 WindowsPath('../data/docs/PaulM_ProfS_1051757.txt'),
 WindowsPath('../data/docs/UpitA_Sievi_771386.txt'),
 WindowsPath('../data/docs/UpitA_Siev_part_1.txt')]

In [40]:
# let's load the second text (the profesors variable holds its path)
profesors = files[1]
with open(profesors, encoding="utf-8") as f:
  text = f.read()
# first 300 characters
print(text[:300])

title: Profesora Sūnas brīnišķīgais eleksīrs
isPartOf:
creator: Paulockis, Miķelis
dateIssued: 1938
publisher: Senatne
publicationPlace: Rīga
firstPublished:
firstEdition: 1938
dateModified:
copyright:
uri: http://dom.lndb.lv/data/obj/1051757
normalization:
ocrCorrection:
verified:



Profesora Sūna


In [41]:
# now let's use pandas to load last csv from csv subfolder
import pandas as pd

csv_folder = Path("../csv")
# assert csv_folder exists
assert csv_folder.exists(), "csv folder does not exist"
# list all files
csv_files = list(csv_folder.glob("*.csv"))
# sort by last modified
csv_files = sorted(csv_files, key=lambda x: x.stat().st_mtime)
# get latest file
csv_file = csv_files[-1]
# read csv file
df = pd.read_csv(csv_file)
# print first 5 rows
df.head()

Unnamed: 0,original,term,score
0,zirgs,vezums,0.813284
1,zirgs,rats,0.804218
2,zirgs,kumeļš,0.764619
3,zirgs,bēris,0.740647
4,zirgs,zirdziņš,0.732735


In [42]:
# get unique values from first column
unique_values = df.iloc[:, 0].unique()
# print unique values
print(unique_values)

['zirgs' 'dzelzceļš' 'ormanis' 'pajūgs' 'kariete' 'kamanas' 'laiva'
 'plosts' 'burinieks' 'kuģis' 'tvaikonis' 'velosipēds' 'automašīna'
 'mašīna' 'vilciens' 'lokomotīve' 'tramvajs' 'bānītis' 'stīmeris' 'zēģele']
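
The similarity table above has the columns original, term and score. A possible next step is to collect, for each original keyword, the related terms above a score cutoff. The sketch below uses the standard `csv` module on the five rows shown earlier so that it is self-contained; the function name `related_terms` and the 0.75 cutoff are assumptions, not values used elsewhere in this notebook.

```python
import csv
import io
from collections import defaultdict

# the five rows printed by df.head() above
SAMPLE_CSV = """Unnamed: 0,original,term,score
0,zirgs,vezums,0.813284
1,zirgs,rats,0.804218
2,zirgs,kumeļš,0.764619
3,zirgs,bēris,0.740647
4,zirgs,zirdziņš,0.732735
"""


def related_terms(csv_text, min_score=0.75):
    """Group terms by their original keyword, keeping rows with score >= min_score."""
    related = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["score"]) >= min_score:
            related[row["original"]].append(row["term"])
    return dict(related)


print(related_terms(SAMPLE_CSV))
```

With pandas the same grouping could be done on `df` directly; the plain-Python version is shown only to keep the example free of dependencies.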


In [50]:
# let's load the first file as text
ziema_file = files[0]
with open(ziema_file, encoding="utf-8") as f:
  text = f.read()
# first 300 characters
print(text[:300])

title: Ziema
isPartOf:
creator: Jaunsudrabiņš, Jānis, 1877-1962
dateIssued: 1925
publisher: Valters un Rapa
publicationPlace: Rīga
firstPublished:
firstEdition: 1925
dateModified: 26.06.2024
copyright:
uri: http://dom.lndb.lv/data/obj/417458
normalization:
ocrCorrection: Jā
verified: 26.06.2024, And


In [51]:
# let's test the function with the Ziema text (loaded in the previous cell) and system_prompt_3
response = openai_chat(text, system_prompt=system_prompt_3)
print(response)

Using model google/gemini-2.0-flash-exp:free
System prompt: You are an expert on Latvian language and transportation.
1. Please identify which keywords from the given list of land transportaton vehicles are present in the given text.
2. List of keywords: zirgs dzelzceļš ormanis pajūgs kariete kamanas velosipēds automašīna mašīna vilciens lokomotīve tramvajs bānītis
3. Identify all other land transportation vehicles in the given text.
4. Provide terms in Latvian as a list, preserving their original spelling variants as they appear in the text.
Start time: 2025-01-24 14:57:33.166346
Time taken: 0:00:11.862424
Okay, here's the breakdown of land transportation terms from the provided text, based on your criteria:

**1. Keywords present in the text:**

Based on the provided list of keywords, the following are present in the text:
*   zirgs
*   velosipēds (not directly, but implied when referencing Pakalns and his method for obtaining water)
*   automašīna (not directly, but implied when ref

In [9]:
# now let's try the same prompt but with a different model
# we will use the following model meta-llama/llama-3.3-70b-instruct
model = "meta-llama/llama-3.3-70b-instruct"
response = openai_chat(prompt, model=model)
print(response)

Here's a list of transportation terms in the Latvian language:

1. Auto (Car) - Auto
2. Autoceļš (Highway) - Autoceļš
3. Velosipēds (Bicycle) - Velosipēds
4. Riteņbraucējs (Cyclist) - Riteņbraucējs
5. Būve (Construction) - Būve
6. Dzelzceļš (Railway) - Dzelzceļš
7. Vilciens (Train) - Vilciens
8. Lidmašīna (Airplane) - Lidmašīna
9. Osta (Port) - Osta
10. Kuģis (Ship) - Kuģis
11. Autobuss (Bus) - Autobuss
12. Trolejbuss (Trolleybus) - Trolejbuss
13. Tramvajs (Tram) - Tramvajs
14. Taksi (Taxi) - Taksi
15. Stacija (Station) - Stacija
16. Pietura (Stop) - Pietura
17. Brauciens (Trip) - Brauciens
18. Reiss (Journey) - Ceļojums (more common term for journey)
19. Maršruts (Route) - Maršruts
20. Satiksme (Transportation) - Satiksme

Additionally, here are some common phrases related to transportation:

- "Kā piesāpināt?" (How to get there?) - Kā piesāpināt?
- "Kā braukt?" (How to travel?) - Kā braukt?
- "Cik ilgi?" (How long does it take?) - Cik ilgi?
- "Cik daudz maksā?" (How much does it cost

In [10]:
# let's try the same prompt but with a different model
# we will use the following model meta-llama/llama-3.2-90b-vision-instruct:free 
model = "meta-llama/llama-3.2-90b-vision-instruct:free"
response = openai_chat(prompt, model=model)
print(response)

Here's a list of common transportation terms in Latvian:

1. Transport - transports (transportation)
2. Autobuss - autobus (bus)
3. Viena - vilciens (train)
4. Lidmašīna - lidmašīna (airplane)
5. Kuģis - kuģis (ship)
6. Laiva - laiva (boat)
7. Riteņbraucējs - riteņbraucējs (bicyclist)
8. Velosipēds - velosipēds (bicycle)
9. Motocikls - motocikls (motorcycle)
10. Automobiļi - mašīna (car)
11. Taksi - taksometrs (taxi)
12. Metro - metro (subway)
13. Tramvajs - tramvajs (tram)
14. Elektrovilciens - elektrības vilciens (electric train)
15. Starptautiskā lidosta - starptautiskā lidosta (international airport)
16. Tuksneša mašīna - kutermašīna (jeep)
17. Gājējs - gājējs (pedestrian)
18. Ceļš - ceļš (road)
19. Automaģistrāle - automaģistrāle (highway)
20. Tilts - tilts (bridge).

Note: Latvian language uses its specific characters, but they were not used here for easier search. Example of those would be:
ā (in transportation - transports), 
ē (in ceļš - road),
ī (in lidmašīna - airplane),
ū (

In [11]:
# now let's try with latest GPT-4o model
model = "openai/gpt-4o-2024-11-20"
response = openai_chat(prompt, model=model)
print(response)

Here is a list of transportation-related terms in Latvian:

1. **Automašīna** – Car  
2. **Autobuss** – Bus  
3. **Vilciens** – Train  
4. **Lidmašīna** – Airplane  
5. **Kuģis** – Ship  
6. **Velosipēds** – Bicycle  
7. **Motocikls** – Motorcycle  
8. **Skrejritenis** – Scooter  
9. **Taksometrs (Taksi)** – Taxi  
10. **Kravas automašīna** – Truck  
11. **Laiva** – Boat  
12. **Tramvajs** – Tram  
13. **Trolejbuss** – Trolleybus  
14. **Helikopters** – Helicopter  
15. **Metro** – Subway  
16. **Automaģistrāle** – Highway  
17. **Satiksmes līdzeklis** – Means of transport  
18. **Ceļš** – Road  
19. **Maršruts** – Route  
20. **Stacija** – Station  
21. **Pietura** – Stop  
22. **Autoosta** – Bus station  
23. **Osta** – Port  
24. **Lidosta** – Airport  
25. **Ceļojums** – Journey  
26. **Biļete** – Ticket  
27. **Ātrums** – Speed  
28. **Degviela** – Fuel  
29. **Pasažieris** – Passenger  
30. **Vadītājs** – Driver  
31. **Stūre** – Steering wheel  
32. **Bagāža** – Luggage  
33. **

In [12]:
prompt

'Provide a list of transportation terms in Latvian language'
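
The three cells above repeat the same call with a different `model` value. A small loop can collect the answers side by side. This is a sketch under assumptions: `compare_models` is a hypothetical helper, and `ask` is any callable that accepts `(prompt, model=...)` the way the chat function is called in this notebook.

```python
def compare_models(prompt, models, ask):
    """Send the same prompt to every model and collect the responses.

    `ask` is a hypothetical callable taking (prompt, model=...) and returning text.
    A failure for one model is recorded instead of stopping the whole comparison.
    """
    results = {}
    for model in models:
        try:
            results[model] = ask(prompt, model=model)
        except Exception as exc:
            results[model] = f"ERROR: {exc}"
    return results


# model identifiers used in the cells above
MODELS = [
    "meta-llama/llama-3.3-70b-instruct",
    "meta-llama/llama-3.2-90b-vision-instruct:free",
    "openai/gpt-4o-2024-11-20",
]
```

Calling `compare_models(prompt, MODELS, openai_chat)` would then return one entry per model, assuming `openai_chat` returns the response text as it does above.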

In [21]:
model

'google/gemini-flash-1.5'

In [25]:
# let's get list of all txt files in data docs folder
from pathlib import Path
text_files = list(Path("../data/docs").rglob("*.txt"))
print(f"We have {len(text_files)} text files in the data/docs folder")
# first one
print(f"Name of first file: {text_files[0]}")
# 2nd one
print(f"Name of second file: {text_files[1]}")
# last one 
print(f"Name of last file: {text_files[-1]}")

We have 4 text files in the data/docs folder
Name of first file: ..\data\docs\JaunJ_Ziema_417458.txt
Name of second file: ..\data\docs\PaulM_ProfS_1051757.txt
Name of last file: ..\data\docs\UpitA_Siev_part_1.txt


In [18]:
# let's read text of last file using utf-8 encoding
with open(text_files[-1], "r", encoding="utf-8") as file:
  text = file.read()
# how many characters in the text
print(f"Number of characters in the text: {len(text)}")

Number of characters in the text: 163835


In [23]:
print(system_prompt)

Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Include specific types (e.g., cars, trains, bicycles) as well as broader categories (e.g., public transportation).
Provide the terms in Latvian, preserving their original transcription variants as they appear in the text. Text:


In [24]:
# let's use text as a prompt
response = openai_chat(text)
print(response)

The provided text mentions the following land transportation vehicles in Latvian:

* **kamanas** (sleds)
* **tramvaja vagons** (tram)
* **ormanis** (sleigh, specifically referring to a horse-drawn sleigh in this context)
* **kamanu slieces** (sled runners)
* **dzelzceļa vilciens** (train)


There is no mention of cars or bicycles.  The broader category of "public transportation" is not explicitly mentioned although "tramvajs" falls under this category.



In [29]:
# forward slashes work on Windows as well as on Linux and macOS
elixir_text = Path("../data/docs/PaulM_ProfS_1051757.txt").read_text(encoding="utf-8")
# how many characters in the text
print(f"Number of characters in the text: {len(elixir_text)}")
# first 400 characters
print(elixir_text[:400])

Number of characters in the text: 188657
title: Profesora Sūnas brīnišķīgais eleksīrs
isPartOf:
creator: Paulockis, Miķelis
dateIssued: 1938
publisher: Senatne
publicationPlace: Rīga
firstPublished:
firstEdition: 1938
dateModified:
copyright:
uri: http://dom.lndb.lv/data/obj/1051757
normalization:
ocrCorrection:
verified:



Profesora Sūnas brīnišķīgais eleksīrs

M. Paulockis

Romāns

IZDEVNIECĪBA „SENĀTNE*, RĪGĀ

Sp. „STAR* Rīgā,
Jumara
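
The documents start with a block of `key: value` metadata lines (title, creator, uri and so on) followed by blank lines and then the actual text. It may be worth separating that header before sending the text to a model. The sketch below assumes the header ends at the first blank line and that every header line contains a colon; `split_metadata` is a hypothetical helper name.

```python
def split_metadata(text):
    """Split a document into (metadata dict, body text).

    Assumption: the header is a run of "key: value" lines ending at the first blank line.
    If the first lines do not look like a header, the whole text is returned as the body.
    """
    meta = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            # first blank line: everything after it is the body
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        # partition on the first colon only, so URLs in the value stay intact
        key, sep, value = line.partition(":")
        if not sep:
            return {}, text
        meta[key.strip()] = value.strip()
    return meta, ""


sample = (
    "title: Ziema\n"
    "creator: Jaunsudrabiņš, Jānis, 1877-1962\n"
    "uri: http://dom.lndb.lv/data/obj/417458\n"
    "\n\n"
    "Ziema\n"
)
meta, body = split_metadata(sample)
print(meta["uri"])
print(body)
```

A file without a header, such as the one that starts directly with "Sieviete", comes back unchanged with an empty metadata dict.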


In [30]:
response = openai_chat(elixir_text)
print(response)

The provided text mentions the following land transportation vehicles in Latvian:

* auto-limuzīna
* tramvaji
* taksometrs
* autobuss
* vilcienos (plural - trains)
* lokomotīves (plural - locomotives)
* auto (plural - cars)
* elektrovāģis


Note that "automobīļus" (plural - automobiles) appears in the text within a sentence describing how people adapted to a changing world, not as a specific event in the narrative.  Therefore it's not included above, as a direct mention of a vehicle type in function within the story is required to be included in the list.



In [32]:
print(system_prompt)

Please extract a comprehensive list of all mentioned land transportation vehicles from the given texts.
Include specific types (e.g., cars, trains, bicycles) as well as broader categories (e.g., public transportation).
Provide the terms in Latvian, preserving their original transcription variants as they appear in the text. Text:


In [35]:
# # let's try different model - google/gemini-2.0-flash-exp:free
# model = "google/gemini-2.0-flash-exp:free"
# model = "google/gemini-exp-1206:free"

# response = openai_chat(elixir_text, model=model)
# print(response)

In [37]:
model = "openai/gpt-4o-mini"
response = openai_chat(elixir_text, model=model)
print(response)

Based on the provided text, here is a comprehensive list of mentioned land transportation vehicles, both specific types and broader categories in Latvian:

1. **Automobiļi** (Cars)
2. **Tramvaji** (Trams)
3. **Taksometri** (Taxis)
4. **Dzelzceļš** (Railway/Train)
5. **Vilciens** (Train)
6. **Centrālapkures katls** (Central Heating Boiler) - indirectly related as part of infrastructure
7. **Gaisa dzelzceļš** (Aerial Cableway) - mentioned in the context of transportation infrastructure

Broader categories and terms:
8. **Sabiedriskā transporta** (Public Transportation)
9. **Līdzekļi** (Vehicles/Means of Transportation)

This list includes various forms of land transportation cited directly in the text and captures the essence of the broader transportation context as discussed.


In [38]:
system_prompt = """You are an expert on Latvian language and transportation.
Please extract a comprehensive list of all mentioned land transportation vehicles from the given Latvian document.
Include all specific types (e.g., vilciens, tramvajs, divritenis, ormanis) as well as broader categories (e.g., dzelzceļš, sabiedriskais transports).
Provide the terms solely in Latvian, preserving their original spelling as they appear in the text. Text: 
"""
print(f"Using model {model}")
response = openai_chat(elixir_text, system_prompt=system_prompt, model=model)
print(response)

Using model openai/gpt-4o-mini
Here is a comprehensive list of the land transportation vehicles mentioned in the provided text:

1. **vilciens** (train)
2. **tramvajs** (tram)
3. **taksometrs** (taxi)
4. **divritenis** (bicycle)
5. **auto** (car)
6. **gaisa dzelzceļš** (aerial tramway)
7. **dzelzceļš** (railway)
   
Broader categories:
1. **sabiedriskais transports** (public transportation) 

Please note that the terms are preserved in their original spelling as they appear in the text.


In [39]:
# let's try with Gemini model
model = "google/gemini-2.0-flash-exp:free"
print(f"Using model {model}")
response = openai_chat(elixir_text, system_prompt=system_prompt, model=model)
print(response)

Using model google/gemini-2.0-flash-exp:free
Okay, here's the comprehensive list of land transportation vehicles mentioned in the provided Latvian text, presented solely in Latvian:

*   sabiedriskais transports
*   auto-limuzīnā
*   taksometram
*   taksometra
*   tramvaja
*   tramvaji
*   dzelzceļpārbrauktuvi
*   dzelzceļš
*   vilciens
*   sliedēm
*   auto
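
The responses above arrive as bulleted or numbered lists, sometimes with bold markers and English glosses in parentheses. To compare models, the terms have to be pulled out of that free text. Below is a minimal sketch of such a parser; `parse_terms` is a hypothetical helper, and the rule that only lines starting with `*`, `-` or `1.` count as list items is an assumption based on the outputs seen so far.

```python
import re

# a list item starts with "*", "-" or "<number>." followed by whitespace
_ITEM = re.compile(r"^\s*(?:[*\-]|\d+\.)\s+(.*)$")


def parse_terms(response):
    """Extract the list items from a model response as plain strings."""
    terms = []
    for line in response.splitlines():
        match = _ITEM.match(line)
        if not match:
            # introductory sentences, headings and notes are skipped
            continue
        item = match.group(1)
        # drop parenthetical glosses such as "(train)"
        item = re.sub(r"\(.*?\)", "", item)
        # drop markdown bold markers
        item = item.replace("**", "").strip()
        if item:
            terms.append(item)
    return terms


example = """Okay, here's the list:

*   sabiedriskais transports
*   auto-limuzīnā
1. **vilciens** (train)
* **kamanas** (sleds)
"""
print(parse_terms(example))
```

The extracted terms keep their inflected form, so they would still need to be matched to lemmas before counting.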

