In [24]:
# Demonstration of an End-to-End AI Solution using Gemini API
### Master Thesis: *Subject Matter Language Recognition Using Training*
## This notebook demonstrates a working end-to-end AI system built on the prompt-engineered agent designed in Lab 1.4.
## The system executes domain-specific Lithuanian speech-to-text transcription prompts using the Gemini API.

## Key objectives of this notebook:
## - Load the Gemini API key securely using Colab Secrets
## - Execute zero-shot and few-shot prompt examples
## - Demonstrate the entire pipeline (data understanding → inference → reasoning → output generation)
## - Document the behaviour of the AI system as an end-to-end agent
## - Provide a reflection and link the notebook to the GitHub repository

In [25]:
from google.colab import userdata
import google.generativeai as genai

GEMINI_KEY = userdata.get("GEMINI_KEY")
genai.configure(api_key=GEMINI_KEY)

# Check which models are actually available:
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)


models/gemini-2.5-pro-preview-03-25
models/gemini-2.5-flash
models/gemini-2.5-pro-preview-05-06
models/gemini-2.5-pro-preview-06-05
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-exp-1206
models/gemini-2.0-flash-thinking-exp-01-21
models/gemini-2.0-flash-thinking-exp
models/gemini-2.0-flash-thinking-exp-1219
models/gemini-2.5-flash-preview-tts
models/gemini-2.5-pro-preview-tts
models/learnlm-2.0-flash-experimental
models/gemma-3-1b-it
models/gemma-3-4b-it
models/gemma-3-12b-it
models/gemma-3-27b-it
models/gemma-3n-e4b-it
models/gemma-3n-e2b-it
models/gemini-flash-latest
models/gemini-flash-lite-latest
models/gemini-pro-latest
models/gemini-2.5-flash-lite
models/gemini-2.5-flash-image-preview
models/gemini-2.5-flash-image


In [26]:
import os
from google.colab import userdata

GEMINI_KEY = userdata.get('GEMINI_KEY')

if GEMINI_KEY is None:
    raise ValueError("⛔ Please add GEMINI_KEY in Colab: Tools → Secrets → New secret")

print("Gemini API key loaded successfully!")

Gemini API key loaded successfully!
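The key-loading step above only works inside Colab. A minimal sketch of a loader that also works outside Colab is shown below; the helper name `load_api_key` and the environment-variable fallback are assumptions for illustration, not part of the original notebook.

```python
import os


def load_api_key(name="GEMINI_KEY"):
    """Return the API key from Colab Secrets if possible, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Assumption: any failure here (not running in Colab, secret missing,
        # or secret not shared with this notebook) falls through to the
        # environment variable of the same name.
        key = None

    key = key or os.environ.get(name)
    if not key:
        raise ValueError(
            f"{name} not found in Colab Secrets or in the environment."
        )
    return key
```

Calling `load_api_key()` in place of the direct `userdata.get(...)` call would let the same notebook run locally with `GEMINI_KEY` exported in the shell.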


In [27]:
!pip install -q google-generativeai

import google.generativeai as genai

genai.configure(api_key=GEMINI_KEY)

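# "gemini-pro" does not appear in the model list printed above,
# so use a model name that the list does contain.
model = genai.GenerativeModel("gemini-2.5-flash")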

print("Gemini model initialized.")

Gemini model initialized.


In [28]:
## Load Prompt Examples (Zero-shot and Few-shot)
## We will pull the `.md` files created in Lab 1.4 from the `Prompts_1.4/` directory of the GitHub repository.

In [29]:
import requests

GITHUB_BASE = "https://raw.githubusercontent.com/Eugenijus-ASR/ASR-System/main/Prompts_1.4/"

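def load_prompt(name):
    """Download one prompt file from the repository and return its text."""
    url = GITHUB_BASE + name
    r = requests.get(url, timeout=30)  # avoid hanging on a stalled connection
    if r.status_code != 200:
        raise RuntimeError(f"Error loading {name}: HTTP {r.status_code}")
    return r.text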

zero_shot_prompt = load_prompt("prompt1_zero_shot.md")
few_shot_prompt  = load_prompt("prompt2_few_shot.md")

print("Prompts loaded successfully.")

Prompts loaded successfully.


In [30]:
## Preview of Loaded Prompts
## Below we display the complete prompt contents exactly as stored in GitHub.

In [31]:
print("=== ZERO-SHOT PROMPT ===")
print(zero_shot_prompt)
print("\n=== FEW-SHOT PROMPT ===")
print(few_shot_prompt)

=== ZERO-SHOT PROMPT ===
# Prompt 1: Zero-shot Example

This example shows how a single AI agent receives one new (previously unseen) audio observation **X** and is asked to generate the corresponding transcription **y** without any additional examples.

---

## REQUEST (PROMPT)

You are an AI agent acting as a **domain-specific speech-to-text system** for Lithuanian expert dictation.  
Your task is to process a single audio input **X** through your full end-to-end pipeline:

1. Data understanding (identify language and domain context – e.g., medical or IT).
2. Inference (conceptually interpret acoustic features and map them to phonetic units).
3. Reasoning (apply domain-specific terminology and typical phrase structures).
4. Output generation (produce the final transcription **y**).

You will be given a description of one new Lithuanian audio recording.

### New observation X

A new Lithuanian audio file `new_case.wav` contains the following expert dictation:

> 

In [32]:
## Executing the Zero-Shot Prompt
## The system will now run the zero-shot prompt through the Gemini model and display the AI-generated transcription.

In [41]:
import google.generativeai as genai
genai.configure(api_key=GEMINI_KEY)

In [42]:
model = genai.GenerativeModel("gemini-2.5-flash")

response_zero = model.generate_content(zero_shot_prompt)
print(response_zero.text)


```json
{
  "transcription": "Pacientui planuojama atlikti papildomus tyrimus dėl širdies funkcijos sutrikimo."
}
```


In [35]:
## Executing the Few-Shot Prompt
## This example includes multiple (X,y) pairs to guide the model.
## Gemini will generate a new transcription using few-shot in-context learning.
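The few-shot idea described above can be sketched as a small prompt builder that places several (X, y) pairs before the new observation. The contents of `prompt2_few_shot.md` are not shown in this notebook, so the function name `build_few_shot_prompt`, the layout, and the example pairs below are hypothetical placeholders, not the actual prompt.

```python
import json


def build_few_shot_prompt(instruction, examples, new_observation):
    """Assemble a few-shot prompt from (X, y) pairs and one new observation X."""
    parts = [instruction.strip(), ""]
    for i, (x, y) in enumerate(examples, start=1):
        # Show each target in the same JSON shape the model is asked to return.
        target = json.dumps({"transcription": y}, ensure_ascii=False)
        parts.extend([f"Example {i}", f"X: {x}", f"y: {target}", ""])
    parts.extend(["New observation", f"X: {new_observation}", "y:"])
    return "\n".join(parts)


# Hypothetical example pairs, for illustration only.
examples = [
    ("Medical dictation, recording A", "Pacientui paskirta kraujo analizė."),
    ("IT dictation, recording B", "Serveris paleistas iš naujo."),
]
prompt = build_few_shot_prompt(
    "You are an AI system that transcribes Lithuanian expert dictation. "
    "Return output in JSON.",
    examples,
    "IT dictation, recording C",
)
print(prompt)
```

Keeping the examples in a list rather than in one fixed string makes it easy to vary the number of shots when comparing against the zero-shot prompt.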

In [36]:
response_few = model.generate_content(few_shot_prompt)
print("=== Gemini Few-shot Output ===")
print(response_few.text)

=== Gemini Few-shot Output ===
```json
{
  "transcription": "Atsarginė duomenų kopija sėkmingai sukurta ir patikrinta."
}
```


In [37]:
# End-to-End Pipeline Demonstration

## The following section conceptually shows the internal flow of our AI agent:

## 1. **Data Understanding**
##   - Identify Lithuanian language
##   - Recognize domain context (medical / IT)
##   - Extract intent from prompt

## 2. **Inference**
##   - Map expected acoustic/linguistic features
##   - Recognize patterns from expert-style phrasing
##   - Apply reasoning from previous (X,y) examples (few-shot mode)

## 3. **Reasoning**
##   - Apply domain-specific terminology
##   - Maintain syntax, phrase structure, and correct orthography

## 4. **Output Generation**
##   - AI generates transcription in JSON structure:
##     `{ "transcription": "..." }`
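In the runs recorded in this notebook, the model wraps its JSON reply in a Markdown json code fence, so `response.text` cannot be passed to `json.loads` directly. A minimal sketch of a parser for the output-generation step follows; the helper name `extract_transcription` is an assumption, and it only handles a reply with a single optional fence.

```python
import json
import re

# Build the fence marker programmatically so this block stays readable
# when embedded in Markdown.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_transcription(raw_text):
    """Return the "transcription" value from a reply that may be code-fenced."""
    match = FENCED_BLOCK.search(raw_text)
    # Fall back to the raw text when the model did not add a fence.
    payload = match.group(1) if match else raw_text.strip()
    data = json.loads(payload)
    return data["transcription"]


sample = FENCE + 'json\n{\n  "transcription": "Labas rytas."\n}\n' + FENCE
print(extract_transcription(sample))
```

With this helper, `extract_transcription(response_zero.text)` would yield the plain transcription string for downstream use. A reply that is not valid JSON still raises `json.JSONDecodeError`, which is left unhandled here on purpose.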

In [38]:
custom_prompt = """
You are an AI system that transcribes Lithuanian expert dictation.

New audio description:
"Pacientui paskirta magnetinio rezonanso tomografija dėl galvos skausmų."

Return output in JSON:
{ "transcription": "..." }
"""

response_custom = model.generate_content(custom_prompt)
print(response_custom.text)

```json
{
  "transcription": "Pacientui paskirta magnetinio rezonanso tomografija dėl galvos skausmų."
}
```


In [39]:
## Reflection

## This Google Colab notebook demonstrates a complete end-to-end AI solution based on my master's thesis system.
## Using Gemini and prompt engineering, I recreated two prompt types (zero-shot and few-shot) to process Lithuanian domain-specific speech descriptions.
## The agent performs data understanding, inference, reasoning, and final output generation in a structured pipeline.
## In the runs shown here, Gemini handled domain-specific terminology well and returned JSON in the requested structure, wrapped in a Markdown code fence.
## The few-shot prompt is meant to improve accuracy and consistency for specialized contexts; this notebook runs each prompt once, so that effect is not measured here.
## Future improvements could include adding audio-to-text pre-processing, integrating Kaldi output examples, and benchmarking accuracy against real ASR models.
## The notebook is linked to GitHub for reproducibility.
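For the benchmarking step mentioned in the reflection, a common ASR metric is word error rate (WER): the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below is one possible starting point and is not part of the thesis system; the function name `word_error_rate` and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))
```

Real comparisons would likely need text normalization (case, punctuation) before scoring, which this sketch leaves out.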

In [40]:
# Save & Upload
## After completing the notebook:
## - Go to File → Save
## - Download the .ipynb file
## - Add it to your GitHub repository under /notebooks/