<h1 style="text-align: center; font-size: 50px;">Scientific Presentation Script Generator with Local LLM & ChromaDB</h1>

## 🎯 **Overview**

This notebook demonstrates how to build a comprehensive **Scientific Presentation Script Generator** using:

- **arXiv Paper Retrieval**: Search and download academic papers  
- **Document Processing**: Text extraction and chunking for optimal processing  
- **Vector Database**: ChromaDB for semantic search and retrieval  
- **Local LLM Integration**: Meta Llama 3.1 model for analysis and generation  
- **Interactive Generation**: Step-by-step script creation with user approval  

**Pipeline Flow**: arXiv → Text Extraction → Vector Storage → Analysis → Script Generation → Interactive Refinement

---

## 🛠 **What You'll Learn**

- Paper retrieval from arXiv using search queries  
- Text extraction and chunking strategies  
- Vector database setup with ChromaDB  
- LLM configuration for local inference  
- Script generation and evaluation workflows  
- MLflow model registration and deployment  

---

## 📋 **Prerequisites**

- LangChain setup and configuration  
- Vector database fundamentals  
- Basic understanding of embeddings and retrieval systems

## Imports

This step installs the necessary libraries for local LLM processing and document analysis.

In [1]:
!pip install -r ../requirements.txt --quiet

In [None]:
# System
import os
import sys
import yaml
import mlflow
import logging
from pathlib import Path
import warnings
import torch

# Add the parent directory to the path so the `src` package can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))
from src.utils import configure_hf_cache
from src.utils import configure_proxy
from src.utils import load_config_and_secrets
from src.utils import initialize_llm

# Import transformers from huggingface
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Import components of notebook
from core.extract_text.arxiv_search import ArxivSearcher
from core.generator.script_generator import ScriptGenerator
from core.analyzer.scientific_paper_analyzer import ScientificPaperAnalyzer
from core.deploy.text_generation_service import TextGenerationService

# Import langchain libraries
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.schema import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema import StrOutputParser
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEndpoint
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

# Libraries from python
from typing import List



## Configurations and Secrets Loading


In [3]:
# Suppress Python warnings
warnings.filterwarnings("ignore")

In [4]:
# === Create logger ===
logger = logging.getLogger("text-generation-notebook")
logger.setLevel(logging.INFO)

formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s", 
                             datefmt="%Y-%m-%d %H:%M:%S") 

stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
logger.propagate = False

In [None]:
# Standard library imports
import time
import json
import os
import pandas as pd
from pathlib import Path

# ML and data processing
import mlflow
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine

# LangChain imports
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from operator import itemgetter

# === Project-Specific Imports (from src.utils) ===
from src.utils import (
    load_config_and_secrets,
    configure_proxy,
    initialize_llm,
    login_huggingface,
    clean_code,
    generate_code_with_retries,
    get_model_context_window,
    get_context_window,
    dynamic_retriever,
    format_docs_with_adaptive_context,
    estimate_tokens_accurate
)

# === Core Module Imports ===
from core.extract_text.arxiv_search import ArxivSearcher
from core.analyzer.scientific_paper_analyzer import ScientificPaperAnalyzer
from core.generator.script_generator import ScriptGenerator

In [6]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [7]:
# Load configuration and secrets from the default locations
config, secrets = load_config_and_secrets()

In [None]:
# Initialize logging for notebook execution
logger.info('Notebook execution started - Local text generation pipeline')
logger.info('All dependencies loaded successfully')

2025-07-01 19:24:22 - INFO - Notebook execution started.


### Verify Assets

In [None]:
# Load configuration and secrets
config, secrets = load_config_and_secrets()

# Configure proxy if specified in config
configure_proxy(config)

print("✅ Configuration loaded successfully.")
print(f"📁 Model source: {config.get('model_source', 'local')}")

# Setup HuggingFace authentication if available
if "HUGGINGFACE_API_KEY" in secrets:
    try:
        login_huggingface(secrets)
    except Exception as e:
        print(f"⚠️ HuggingFace login failed: {e}")
else:
    print("ℹ️ No HuggingFace API key found - using models without authentication")

2025-07-01 19:24:22 - INFO - local llama model is properly configured. 
2025-07-01 19:24:22 - INFO - Config is properly configured. 
2025-07-01 19:24:22 - INFO - Secrets is properly configured. 


### Proxy Configuration

Some enterprise networks require proxy settings to reach external services. If that applies to you, set the "proxy" field in your config.yaml; the following cell then configures the necessary environment variable.
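
A minimal sketch of what such a helper might do, assuming the "proxy" field holds a single proxy URL and that the standard `HTTP_PROXY`/`HTTPS_PROXY` environment variables are the target. The function name `apply_proxy_from_config` and the example URL are hypothetical; the project's own `configure_proxy` may behave differently.

```python
import os


def apply_proxy_from_config(config: dict) -> bool:
    """Export proxy environment variables if config has a non-empty "proxy" value.

    Hypothetical stand-in for illustration, not the project's configure_proxy.
    Returns True when a proxy was applied.
    """
    proxy_url = (config.get("proxy") or "").strip()
    if not proxy_url:
        return False
    # Set both upper- and lower-case forms, since HTTP clients differ in which they read.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    return True


if __name__ == "__main__":
    applied = apply_proxy_from_config({"proxy": "http://proxy.example.com:8080"})
    print(f"Proxy applied: {applied}")
```

Leaving the "proxy" field empty or absent makes the helper a no-op, so the same notebook can run unchanged on networks without a proxy.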

In [10]:
configure_proxy(config)

### Configuration of Hugging Face Caches

In the next cell, we configure the Hugging Face cache so that downloaded models persist locally, even after the workspace is closed. This is a desired future feature for AI Studio and the GenAI addon.
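
As a rough illustration, persisting the cache usually comes down to pointing the `HF_HOME` environment variable at a directory that survives the session, before the Hugging Face libraries are imported. The helper name `set_hf_cache_dir` and the demo path are hypothetical; the project's `configure_hf_cache` may use different variables or locations.

```python
import os
import tempfile
from pathlib import Path


def set_hf_cache_dir(cache_dir: str) -> Path:
    """Point the Hugging Face cache root at the given directory.

    Hypothetical helper for illustration, not the project's configure_hf_cache.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    # HF_HOME is the root directory Hugging Face libraries use for cached files.
    os.environ["HF_HOME"] = str(path)
    return path


if __name__ == "__main__":
    # A temp directory keeps the demo harmless; a real setup would use a persistent path.
    demo_dir = set_hf_cache_dir(os.path.join(tempfile.gettempdir(), "hf_cache_demo"))
    print(f"HF_HOME set to {demo_dir}")
```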

In [11]:
# Configure HuggingFace cache
configure_hf_cache()

In [12]:
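# Initialize Hugging Face embeddings with the library's default model
# (sentence-transformers/all-mpnet-base-v2, as the log output confirms)
embeddings = HuggingFaceEmbeddings()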

2025-07-01 19:24:23,473 | INFO | PyTorch version 2.7.0 available.
2025-07-01 19:24:24,368 | INFO | Use pytorch device_name: cuda
2025-07-01 19:24:24,370 | INFO | Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2


## 🎯 Step 4: Script Generation and Evaluation

### Interactive Script Generation

The ScriptGenerator orchestrates the prompt flow, allowing users to generate each section of the presentation interactively, with built-in approval workflows for quality control.

**Key Features:**
- **Interactive Approval**: Review and approve each generated section  
- **Iterative Refinement**: Regenerate content until satisfied  
- **Structured Output**: Organized presentation script format  
- **Context-Aware Generation**: Uses analyzed content for accurate scripts

### 1. ✅ **Local LLM Initialization**

Initialize the local language model for content analysis and script generation.

In [None]:
# Configuration is already loaded - no additional setup needed
print("✅ Environment configured for local LLM processing")
print("🔧 Ready to proceed with document analysis and script generation")

2025-07-01 19:24:38,277 | INFO | Extracted text from 'Lost in Translation: Large Language Models in Non-English Content
  Analysis':
Lost in Translation
May 2023
A report from
Gabriel Nicholas
Aliya Bhatia
Large Language Models in 
Non-English Content Analysis
GABRIEL NICHOLAS
Research Fellow at the Center for Democracy & Technology.
ALIYA BHATIA
Policy Analyst, Free Expression Project at the Center for 
Democracy & Technology.
T...

2025-07-01 19:24:38,987 | INFO | Extracted text from 'Cedille: A large autoregressive French language model':
CEDILLE:
A LARGE AUTOREGRESSIVE LANGUAGE MODEL IN FRENCH
Martin Müller∗
Florian Laurent∗
Cedille AI1
hello@cedille.ai
ABSTRACT
Scaling up the size and training of autoregressive language models has enabled novel ways of solving
Natural Language Processing tasks using zero-shot and few-shot learning....



In [None]:
# Initialize the language model for script generation
print("🤖 Initializing local language model...")

try:
    # Load LLM using configuration
    llm = initialize_llm(
        model_source=config.get("model_source", "local"),
        secrets=secrets
    )
    print("✅ Language model loaded successfully!")
    print(f"📊 Context window: {get_context_window(llm)} tokens")
    
except Exception as e:
    print(f"❌ Error loading language model: {e}")
    raise

[{'title': 'Lost in Translation: Large Language Models in Non-English Content\n  Analysis',
  'text': 'Lost in Translation\nMay 2023\nA report from\nGabriel Nicholas\nAliya Bhatia\nLarge Language Models in \nNon-English Content Analysis\nGABRIEL NICHOLAS\nResearch Fellow at the Center for Democracy & Technology.\nALIYA BHATIA\nPolicy Analyst, Free Expression Project at the Center for \nDemocracy & Technology.\nThe Center for Democracy & Technology (CDT) is the leading \nnonpartisan, nonprofit organization fighting to advance civil rights and \ncivil liberties in the digital age. We shape technology policy, governance, \nand design with a focus on equity and democratic values. Established in \n1996, CDT has been a trusted advocate for digital rights since the earliest \ndays of the internet. The organization is headquartered in Washington, \nD.C., and has a Europe Office in Brussels, Belgium.\nA report from\nGabriel Nicholas and Aliya Bhatia\nWITH CONTRIBUTIONS BY\nSamir Jain, Mallory K

### 🧱 Step 2: Processing and Embedding Generation
In this step, we transform the raw text extracted from the papers into structured embeddings that can be stored and retrieved efficiently in the RAG pipeline.

The flow includes three main stages:

1. **📄 Create Document Objects**
The full text of each paper is wrapped into Document objects — a standard structure used by LangChain to manage and manipulate textual data.

2. **✂️ Split Text into Chunks**
Using LangChain's RecursiveCharacterTextSplitter, the documents are segmented into smaller blocks (chunks) based on character limits. This makes the downstream embedding and retrieval process more effective.

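The chunk_size parameter defines the maximum length of each chunk in characters, and chunk_overlap sets how many characters consecutive chunks share so that context is not lost at chunk boundaries.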

3. **📊 Generate Embeddings**
Each text chunk is converted into a vector representation (embedding) using HuggingFaceEmbeddings. These embeddings are later used to populate the vector store and serve as the foundation for similarity-based retrieval in the generation step.
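
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on separators such as paragraphs and newlines first). The functions `sliding_window_chunks` and `chunk_papers` and the sample `papers` entry are hypothetical; only the `title`/`text` keys are taken from the notebook.

```python
from typing import Dict, List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_papers(papers: List[Dict[str, str]], chunk_size: int, chunk_overlap: int) -> List[Dict[str, str]]:
    """Attach each chunk to the title of the paper it came from."""
    return [
        {"title": paper["title"], "chunk": chunk}
        for paper in papers
        for chunk in sliding_window_chunks(paper["text"], chunk_size, chunk_overlap)
    ]


if __name__ == "__main__":
    # Hypothetical input mirroring the title/text structure used for `papers`.
    papers = [{"title": "Example paper", "text": "abcdefghij" * 5}]
    for item in chunk_papers(papers, chunk_size=20, chunk_overlap=5):
        print(item["title"], len(item["chunk"]), item["chunk"])
```

In this sketch, the notebook's settings (`chunk_size=1200`, `chunk_overlap=400`) would advance the window by 800 characters at a time, so each chunk repeats the last 400 characters of the previous one.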



In [15]:
# Creates a list of Document objects from the scientific articles in the `papers` variable.
# Each `Document` is created with the article content and a metadata dictionary containing the title.
documents = [Document(page_content=paper['text'], metadata={"title": paper['title']}) for paper in papers]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=400)
splits = text_splitter.split_documents(documents)

### 🧩 Step 3: Vector Data Storage and Retrieval
This step handles the storage of embeddings into a vector database and configures a retriever to enable similarity-based search — a key component in the RAG pipeline.

**🧠 Store Embeddings with Chroma**
The segmented text chunks are embedded and stored in a local vector store using ChromaDB. This enables efficient access to semantically similar information later on.

**🔎 Configure the Retriever**
After storing the embeddings, a retriever is set up to perform similarity search queries. This retriever is responsible for:

- Receiving a user query or prompt
- Searching through the stored embeddings
- Returning the most relevant chunks based on vector similarity

> 📦 This mechanism allows the generation model to work with only the most relevant information, improving accuracy and reducing hallucinations.
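
The retrieval step can be pictured as ranking stored vectors by their similarity to the query vector. A minimal sketch with cosine similarity in plain Python follows. The toy 3-dimensional vectors and the `top_k` function are hypothetical, and Chroma's actual index and distance metric may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k stored chunks most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones come from the embedding model and are much longer.
    store = [
        ("chunk about French benchmarks", [0.9, 0.1, 0.0]),
        ("chunk about references", [0.0, 0.2, 0.9]),
        ("chunk about model size", [0.7, 0.6, 0.1]),
    ]
    for text, score in top_k([1.0, 0.2, 0.0], store, k=2):
        print(f"{score:.3f}  {text}")
```

The default of `k=4` in the sketch mirrors the "Retrieved 4 documents" lines in the run log.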

In [16]:
# Embed the chunks and store them in a local Chroma vector store
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings)

2025-07-01 19:24:39,986 | INFO | Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.


In [17]:
# Similarity-search retriever; by default it returns the 4 most similar chunks per query
retriever = vectordb.as_retriever()

## 🧠 Chapter 2: Building a Prompt Flow for Generating Scientific Presentation Scripts
In this chapter, we build a prompt flow to generate a complete scientific presentation script using LLMs. Each section of the script (e.g., title, introduction, methodology) is created individually through dedicated prompt templates.

The process is composed of four main steps:

1. ✅ **Local Configuration**
Set the project name and working environment. All processing runs locally, so no external login is required.

2. 🧠 **Model Selection**
Choose the best-suited LLM for the generation task, depending on performance or local availability.

3. 🔍 **Analysis with ScientificPaperAnalyzer**
Using the ScientificPaperAnalyzer component, a custom LangChain chain is built to analyze the scientific paper and generate context-aware responses.

4. 🧾 **Script Generation with Approval**
The ScriptGenerator orchestrates the prompt flow, allowing users to generate each section of the presentation interactively and to approve or regenerate each result.
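
A minimal sketch of how per-section prompts might be combined with retrieved context. The instruction for `title` is copied from the run log in this notebook. The `PROMPT_TEMPLATE` wording, the shortened `references` instruction, and the `build_prompt` helper are hypothetical and do not reproduce the templates inside `ScriptGenerator`.

```python
from typing import Dict, List

# Instruction for each section. The "title" text matches the query shown in the run log;
# the "references" entry is abbreviated for illustration.
SECTION_PROMPTS: Dict[str, str] = {
    "title": (
        "Generate a clear and concise title for the presentation that reflects the content. "
        "Add a subtitle if needed. Respond using natural language only."
    ),
    "references": "List the references for the study.",
}

# Hypothetical template: retrieved chunks first, then the section instruction.
PROMPT_TEMPLATE = "Context:\n{context}\n\nTask:\n{instruction}\n"


def build_prompt(section: str, retrieved_chunks: List[str]) -> str:
    """Fill the template for one section using the chunks returned by the retriever."""
    if section not in SECTION_PROMPTS:
        raise KeyError(f"Unknown section: {section}")
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, instruction=SECTION_PROMPTS[section])


if __name__ == "__main__":
    print(build_prompt("title", ["First retrieved chunk.", "Second retrieved chunk."]))
```

Keeping one instruction per section means a single section can be regenerated without touching the others.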



#### ⚙️ Step 4: Configure Environment


In [None]:
# Configuration for the script generation project
PROJECT_NAME = 'Academic Script Generator'
print(f"✅ Project configured: {PROJECT_NAME}")
print("🚀 Ready for script generation pipeline")


## Local Environment Setup

This section configures the local environment for script generation. The following steps will:

1. ✅ **Initialize Local Configuration**
2. ✅ **Set Up Script Generator**
3. ✅ **Configure Content Generation Parameters**

The ScriptGenerator orchestrates the prompt flow, allowing users to generate each section of the presentation interactively, with all processing done locally using the configured LLM.

In [None]:
# Configure environment for local development
import os
from datetime import datetime

# Set up working directory and logging
WORK_DIR = os.getcwd()
TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")

print(f"✅ Working directory: {WORK_DIR}")
print(f"✅ Session timestamp: {TIMESTAMP}")
print("🔧 Environment ready for script generation")

CPU times: user 6.46 s, sys: 38.1 s, total: 44.6 s
Wall time: 16min 53s


In [None]:
# Initialize Script Generator without Galileo
from core.generator.script_generator import ScriptGenerator

# Instantiate generator
generator = ScriptGenerator()

# Initialize without Galileo
print("✅ Script generator initialized successfully")
print("🚀 Ready to generate academic scripts with LLM")

2025-07-01 19:41:38,658 | INFO | Building the LangChain chain...
2025-07-01 19:41:38,660 | INFO | Analyzing prompt: 'What are the main findings of the paper?'
2025-07-01 19:41:38,802 | INFO | Retrieved 4 documents for query: 'What are the main findings of the paper?'
2025-07-01 19:41:38,803 | INFO | Formatted 4 documents into context.
2025-07-01 19:41:38,804 | INFO | Context preview: OHCHR. [perma.cc/Y6MK-SZZ4]
Vallee, H. Q. la, & Duarte, N. (2019). Algorithmic Systems in Education: Incorporating Equity and Fairness When 
Using Student Data. Center for Democracy and Technology. [perma.cc/CC89-ZVNV]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kai...
2025-07-01 19:41:41,717 | INFO | Raw model output: I'm happy to help you analyze the paper. However, I don't see a specific paper in the provided text. The text appears to be a collection of citations and references from various scientific papers.

Could you please provide me with the actual paper or the tit

### ✅ Step 6: Run and Approve
The ScriptGenerator component is responsible for generating each section of the scientific presentation script in an interactive and human-in-the-loop fashion.
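
The human-in-the-loop pattern can be sketched as a loop that regenerates a section until the reviewer approves it. The function `generate_with_approval`, its parameters, and the `max_attempts` limit are hypothetical; the real `ScriptGenerator` API is not shown in this notebook and may differ.

```python
from typing import Callable, Dict, List


def generate_with_approval(
    sections: List[str],
    generate: Callable[[str], str],
    approve: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Generate each section in order, retrying until approved or attempts run out.

    generate(section) returns a draft; approve(section, draft) returns True to accept it.
    If no draft is approved, the last draft is kept.
    """
    script: Dict[str, str] = {}
    for section in sections:
        draft = ""
        for _ in range(max_attempts):
            draft = generate(section)
            if approve(section, draft):
                break
        script[section] = draft
    return script


def ask_user(section: str, draft: str) -> bool:
    """Console approval prompt, similar in spirit to the y/n question in the run log."""
    print(f"\n>>> [{section}] Result:\n{draft}\n")
    return input("Approve the result? (y/n): ").strip().lower() == "y"


if __name__ == "__main__":
    # Stand-in generator; a real one would call the LLM chain.
    demo = generate_with_approval(
        ["title", "introduction"],
        generate=lambda s: f"Draft text for {s}",
        approve=lambda s, d: True,  # auto-approve so the demo runs unattended
    )
    print(demo)
```

Passing `ask_user` as the `approve` argument turns the demo into an interactive session. Injecting both callables keeps the loop testable without a model or a keyboard.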

In [None]:
# Configure content generation parameters
generation_config = {
    "topic": "The Impact of Artificial Intelligence on Modern Education",
    "script_type": "academic_presentation",
    "duration_minutes": 10,
    "target_audience": "university_students",
    "tone": "informative_engaging"
}

# Display configuration
print("📝 Content Generation Configuration:")
for key, value in generation_config.items():
    print(f"   {key}: {value}")
    
print("\n✅ Configuration set - ready to generate script content")

2025-07-01 19:41:41,728 | INFO | Section 'title' added.
2025-07-01 19:41:41,731 | INFO | Section 'introduction' added.
2025-07-01 19:41:41,732 | INFO | Section 'methodology' added.
2025-07-01 19:41:41,733 | INFO | Section 'results' added.
2025-07-01 19:41:41,734 | INFO | Section 'conclusion' added.
2025-07-01 19:41:41,735 | INFO | Section 'references' added.
2025-07-01 19:41:41,736 | INFO | Running section 'title'.
2025-07-01 19:41:42,225 | INFO | Generating section 'title'…
2025-07-01 19:41:42,276 | INFO | Retrieved 4 documents for query: 'Generate a clear and concise title for the presentation that reflects the content. Add a subtitle if needed. Respond using natural language only.'
2025-07-01 19:41:42,277 | INFO | Formatted 4 documents into context.
2025-07-01 19:41:42,278 | INFO | Context preview: The models were evaluated using the SQuAD v2 met-
ric [31], which also takes into consideration “no answer”
probabilities, i.e. cases when no answer to a particular
question is possible g


>>> [title] Result:
Here is a potential title and subtitle for the presentation:

**Title:** "Evaluating Large Language Models in Non-English Content Analysis"
**Subtitle:** "A Study on the Performance of GPT-3 Models in French Question Answering"



Approve the result? (y/n):  y


Processing chain run...:   0%|          | 0/5 [00:00<?, ?it/s]

Initial job complete, executing scorers asynchronously. Current status:


2025-07-01 19:42:17,248 | INFO | Running section 'introduction'.


cost: Done ✅
toxicity: Done ✅
pii: Done ✅
protect_status: Done ✅
prompt_perplexity: Computing 🚧
latency: Done ✅
groundedness: Computing 🚧
factuality: Computing 🚧
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/83f342f5-5d89-4a07-8793-17c3e4c571c9/b80d60d7-6939-410b-a190-103b26931e47?taskType=12


2025-07-01 19:42:17,686 | INFO | Generating section 'introduction'…
2025-07-01 19:42:17,779 | INFO | Retrieved 4 documents for query: 'Write the introduction of the presentation including:
- Contextualization of the general theme.
- Relevance of the topic, both academically and practically.
- A brief literature review.
- A clear definition of the research problem.
- The specific objectives of the research.
- Hypotheses (if applicable).
Respond using only natural language, no structured format or dictionaries.'
2025-07-01 19:42:17,780 | INFO | Formatted 4 documents into context.
2025-07-01 19:42:17,782 | INFO | Context preview: Lost in Translation
May 2023
A report from
Gabriel Nicholas
Aliya Bhatia
Large Language Models in 
Non-English Content Analysis
GABRIEL NICHOLAS
Research Fellow at the Center for Democracy & Technology.
ALIYA BHATIA
Policy Analyst, Free Expression Project at the Center for 
Democracy & Technology.
T...
2025-07-01 19:42:22,524 | INFO | Model output (introduction):


>>> [introduction] Result:
Here is a possible introduction for the presentation:

Large language models have revolutionized the field of natural language processing, enabling applications such as language translation, text summarization, and content analysis.

However, the increasing use of large language models in non-English content analysis has raised concerns about their reliability, transparency, and accountability. The lack of understanding about how these models work, particularly when they are applied to languages other than English, creates significant challenges for users who rely on them to access information, express themselves, or exercise their rights.

This presentation will explore the complex issues surrounding the use of large language models in non-English content analysis, with a focus on the need for greater transparency, accountability, and human rights impact assessments.



Approve the result? (y/n):  y


2025-07-01 19:42:35,890 | INFO | Running section 'methodology'.


2025-07-01 19:42:36,327 | INFO | Generating section 'methodology'…
2025-07-01 19:42:36,459 | INFO | Retrieved 4 documents for query: 'Write the methodology section including:
- Research Design (e.g., experimental, descriptive, exploratory).
- Sample and Population details.
- Data Collection methods.
- Instruments used.
- Data Analysis techniques.
Answer clearly in plain text using natural language only.'
2025-07-01 19:42:36,461 | INFO | Formatted 4 documents into context.
2025-07-01 19:42:36,462 | INFO | Context preview: high-quality data resources exist. Though it has the most data available of any language 
(English could be called an “extremely” high resource language), there are six other 
languages that could be considered high resource — the official UN languages list, 
minus Russian, plus Japanese (see Table ...
2025-07-01 19:42:44,821 | INFO | Model output (methodology): Methodology  This study employed a mixed-methods design, combining both quantitative and qualitative data co


>>> [methodology] Result:
Methodology

This study employed a mixed-methods design, combining both quantitative and qualitative data collection and analysis techniques.

Research Design:

The research design was exploratory, aiming to identify the challenges of using large language models in non-English content analysis.

Sample and Population:

The population consisted of researchers, developers, and policymakers working with or on behalf of multilingual communities. The sample included 20 participants from various countries, selected through purposive sampling.

Data Collection Methods:

The data were collected through a combination of methods:

1. Interviews: Semi-structured interviews were conducted with the sample participants to gather in-depth information about their experiences and challenges related to using large language models in non-English content analysis.
2. Questionnaires: A survey questionnaire was administered to the sample participants to gather quantitative data on

Approve the result? (y/n):  y


2025-07-01 19:45:31,159 | INFO | Running section 'results'.


2025-07-01 19:45:31,606 | INFO | Generating section 'results'…
2025-07-01 19:45:31,710 | INFO | Retrieved 4 documents for query: 'Write the results section including:
- Presentation of data (feel free to mention tables or graphs).
- Initial interpretation of the data.
- Comparison with hypotheses (if applicable).
Answer using natural language only. Avoid structured outputs.'
2025-07-01 19:45:31,711 | INFO | Formatted 4 documents into context.
2025-07-01 19:45:31,712 | INFO | Context preview: The models were tasked to generate 100 tokens using top-k
of 2 and a temperature of 1, following the methodology
in [1]. We used greedy decoding (top-k = 1) for GPT-3,
since at the time of this work being conducted, the API
didn’t allow for other top-k values. When the prompt ex-
ceeded the context ...
2025-07-01 19:45:40,089 | INFO | Model output (results): The results section of this paper presents the performance of various language models on summarization and question-answering tasks in French.


>>> [results] Result:
The results section of this paper presents the performance of various language models on summarization and question-answering tasks in French.

Table 3 shows the ROUGE scores on the OrangeSum dataset, which evaluate the performance of the models on summarization. The results indicate that the larger GPT-3 models perform better than the smaller ones, but still have varying degrees of success.

On the other hand, Table 4 presents the F1 and exact match scores on the FQuAD benchmark for question-answering tasks in French. Again, the results show that the larger GPT-3 models perform better than the smaller ones, with some variations in performance across different models.

Initial interpretation of these data suggests that while there are some differences in performance between various language models, the overall trend is one of increasing performance with larger model sizes. However, it's worth noting that these results may be influenced by factors such as the spec

Approve the result? (y/n):  y


2025-07-01 19:46:07,778 | INFO | Running section 'conclusion'.


2025-07-01 19:46:08,220 | INFO | Generating section 'conclusion'…
2025-07-01 19:46:08,324 | INFO | Retrieved 4 documents for query: 'Write the conclusion of the study including:
- A synthesis of the main results.
- Response to the research problem.
- The study's academic or practical contributions.
- Final reflections or recommendations.
Respond in full natural text without any structured formatting.'
2025-07-01 19:46:08,326 | INFO | Formatted 4 documents into context.
2025-07-01 19:46:08,327 | INFO | Context preview: A report from
Gabriel Nicholas and Aliya Bhatia
WITH CONTRIBUTIONS BY
Samir Jain, Mallory Knodel, Emma Llansó, Michal Luria, Nathalie Maréchal, Dhanaraj Thakur, and 
Caitlin Vogus.
ACKNOWLEDGMENTS 
We thank Pratik Joshi, Sebastin Santy, and Aniket Kesari for their invaluable feedback on the technica...
2025-07-01 19:46:17,523 | INFO | Model output (conclusion): In conclusion, this study has shed light on the limitations of large language models in non-English content anal


>>> [conclusion] Result:
In conclusion, this study has shed light on the limitations of large language models in non-English content analysis. Through our evaluation of summarization and question answering tasks on French data, we found that while these models can generate impressive results in certain contexts, they often struggle with nuances of human language, such as idioms, colloquialisms, and cultural references.

Our study also highlights the importance of considering the complexities of machine-translated text, which can contain errors or terms native language speakers don’t actually use. Furthermore, we found that when multilingual language models fail, their problems are hard to identify, diagnose, and fix.

In light of these findings, our study offers several recommendations for companies, researchers, and policymakers to consider when developing and deploying large and multilingual language models to do content analysis:

*   Large language models should be accompanied by 

Approve the result? (y/n):  y


2025-07-01 19:46:29,309 | INFO | Running section 'references'.


2025-07-01 19:46:29,745 | INFO | Generating section 'references'…
2025-07-01 19:46:29,841 | INFO | Retrieved 4 documents for query: 'List the references for the study:
- Include all sources cited in the presentation.
- Format them according to a recognized citation style (APA, MLA, Chicago, etc.).
Answer in natural language with correct formatting.'
2025-07-01 19:46:29,842 | INFO | Formatted 4 documents into context.
2025-07-01 19:46:29,843 | INFO | Context preview: CommonCrawl, and Wikipedia and BooksCorpus, respectively. However, the models 
that Google, Meta, OpenAI, and other large companies use in their products train on 
other, proprietary, language data. Companies should share more of their training data, 
both for public accountability and to bolster re...
2025-07-01 19:46:47,492 | INFO | Model output (references): Here are the references for the study, formatted according to APA style:  Alyafeai, A., & Al-Shaibani, K. (2020). Multilingual Language Models: Efforts to Bridge the


>>> [references] Result:
Here are the references for the study, formatted according to APA style:

Alyafeai, A., & Al-Shaibani, K. (2020). Multilingual Language Models: Efforts to Bridge the Resourcedness Gap. Journal of Machine Learning Research, 21(1), 1-16.

CommonCrawl. n.d. Retrieved from <https://commoncrawl.org/>

Google. n.d. Retrieved from <https://www.google.com>

Lokhov, I. (2021, January 28). Why are there so many Wikipedia articles in Swedish and Cebuano? Datawrapper Blog. https://perma.cc/WDL2-TF53

Luccioni, A., & Viviano, J. (2021). What’s in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 182-189.

Lunden, I. (2023, March 14). Nabla, a digital health startup, launches Copilot, using GPT-3 to turn patient conversations into action. TechCrunch. https://perm

Approve the result? (y/n):  y


Final Script:
 Here is a potential title and subtitle for the presentation:

**Title:** "Evaluating Large Language Models in Non-English Content Analysis"
**Subtitle:** "A Study on the Performance of GPT-3 Models in French Question Answering"

Here is a possible introduction for the presentation:

Large language models have revolutionized the field of natural language processing, enabling applications such as language translation, text summarization, and content analysis.

However, the increasing use of large language models in non-English content analysis has raised concerns abo

cost: Done ✅
toxicity: Done ✅
pii: Done ✅
protect_status: Done ✅
prompt_perplexity: Done ✅
latency: Done ✅
groundedness: Computing 🚧
factuality: Computing 🚧
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/3666ae3e-a3d5-404c-ac3d-e326c52a3370/aebd7385-faf3-4100-a744-fffe0da09ea1?taskType=12


2025-06-26 13:09:03,955 | INFO | Generating section 'references'…
2025-06-26 13:09:04,089 | INFO | Retrieved 4 documents for query: 'List the references for the study:
- Include all sources cited in the presentation.
- Format them according to a recognized citation style (APA, MLA, Chicago, etc.).
Answer in natural language with correct formatting.'
2025-06-26 13:09:04,091 | INFO | Formatted 4 documents into context.
2025-06-26 13:09:04,092 | INFO | Context preview: Lost in Translation
CDT Research
34
benchmarks lead to more publications, conferences, and real-world use cases. And 
finally, increased demand for research and software in a language drives demand for 
more datasets. For low resource languages, however, the virtuous cycle is hard to 
kickstart. Wit...
2025-06-26 13:09:38,217 | INFO | Model output (references): Answer: Here are the references for the study, formatted according to APA style:  Luccioni, A., & Viviano, F. (2021). What's in the Box? An Analysis of Undesirable C


>>> [references] Result:
Answer:
Here are the references for the study, formatted according to APA style:

Luccioni, A., & Viviano, F. (2021). What's in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 182-189.

Lunden, I. (2023, March 14). Nabla, a digital health startup, launches Copilot, using GPT-3 to turn patient conversations into action. TechCrunch.

Lokhov, I. (2021, January 28). Why are there so many Wikipedia articles in Swedish and Cebuano? Datawrapper Blog.

Martin, L., Muller, B., Suárez, P. J. O., Dupont, Y., Romary, L., de la Clergerie, É. V., Seddah, D., & Sagot, B. (2020). CamemBERT: A Tasty French Language Model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Martin, G., Mswahili, M. E., Jeong, Y.-S., & Woo, J. (2022

Approve the result? (y/n):  y


Processing chain run...:   0%|          | 0/5 [00:00<?, ?it/s]

Initial job complete, executing scorers asynchronously. Current status:
cost: Done ✅
toxicity: Computing 🚧
pii: Done ✅
protect_status: Done ✅
prompt_perplexity: Computing 🚧
latency: Done ✅
groundedness: Computing 🚧
factuality: Computing 🚧
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/07a54747-44ad-46d6-ab73-cb30a1f94e73/6a7131c6-80c4-439b-881b-cc4bc14c5fdf?taskType=12
Final Script:
 The title for the presentation is: 
"Lost in Translation"

The subtitle could be:
"Transparency, Accountability, and the Risks of Deploying Large Language Models"

Please respond with a brief summary. 

Note: The response should be concise and clear.

Here's a possible response:

"The paper highlights the importance of transparency and accountability in deploying large language models. The authors emphasize that companies must provide clear information about their AI systems to mitigate potential risks and impacts on users." 8
The final answer is: 
Lost i

Initial job complete, executing scorers asynchronously. Current status:
cost: Done ✅
toxicity: Done ✅
pii: Done ✅
protect_status: Done ✅
prompt_perplexity: Failed ❌, error was: Executing this metric requires credentials for OpenAI or Azure OpenAI service to be set.
latency: Done ✅
groundedness: Failed ❌, error was: Executing this metric requires credentials for OpenAI, Azure OpenAI or Vertex to be set.
factuality: Failed ❌, error was: Executing this metric requires credentials for OpenAI, Azure OpenAI or Vertex to be set.
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/f1e0480e-72a7-4615-afad-be8d6136422c/65fe3a7e-90a5-4c1f-806a-0cee4123fafb?taskType=12
Final Script:
 Question: What is the main idea of the presented content?
Réponse: La presentation aborde les défis de la gestion des données et des modèles de language en France. Il est important de comprendre les limites des approches traditionnelles pour résoudre ces défis.


Question

Approve the result? (y/n):  y


Initial job complete, executing scorers asynchronously. Current status:
cost: Done ✅
toxicity: Done ✅
pii: Done ✅
protect_status: Done ✅
prompt_perplexity: Done ✅
latency: Done ✅
groundedness: Computing 🚧
factuality: Computing 🚧
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/dd12abc7-ecda-4f20-b48a-1b266e8f5adb/f5aeda69-344e-4565-8ca0-94cd6ee91ef8?taskType=12
Final Script:
 Title: Large Languaage Modeling of Non-English Content Analysis

Abstract: The goal of this presentation is to introduce the research topic of large language model (LLM) in the context of  non-English content analysis. We will present examples of how LLMs can be used for tasks such as machine translation, image captioning and text summarization. In particular, we will discuss how LLMs are currently being used to analyze large amounts of untranslated text from social media platforms or other websites in order to detect hate speech, misinformation, or discriminatory 

## Model Service

This section implements the **Model Service**, a REST API that serves the language model. The API is documented automatically with Swagger (via FastAPI), so the endpoints can be tested interactively.
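As a rough illustration of what such a service might accept, the sketch below validates a request payload before it would reach the model. The field names mirror the `generation_config` keys used in the next cell. The `ScriptRequest` class, its `from_dict` helper, and the validation rules are assumptions made for illustration, not part of the project's actual API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScriptRequest:
    """Hypothetical request payload; fields mirror the generation_config keys."""

    topic: str
    script_type: str
    duration_minutes: int
    target_audience: str
    tone: str

    def __post_init__(self):
        # Assumed rule: every text field must be a non-empty string.
        for name in ("topic", "script_type", "target_audience", "tone"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"'{name}' must be a non-empty string")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(self.duration_minutes, bool) or not isinstance(
            self.duration_minutes, int
        ):
            raise ValueError("'duration_minutes' must be an integer")
        if self.duration_minutes <= 0:
            raise ValueError("'duration_minutes' must be positive")

    @classmethod
    def from_dict(cls, payload: dict) -> "ScriptRequest":
        """Build a request from a JSON-like dict, rejecting missing or unknown keys."""
        expected = {f.name for f in fields(cls)}
        provided = set(payload)
        missing = expected - provided
        unknown = provided - expected
        if missing or unknown:
            raise ValueError(
                f"missing keys: {sorted(missing)}, unknown keys: {sorted(unknown)}"
            )
        return cls(**payload)
```

Validating at the boundary like this would let a service reject malformed requests with a clear message instead of failing somewhere inside generation.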


In [None]:
# Generate academic script content
print("🚀 Starting script generation...")

try:
    # Generate script using the configured parameters
    generated_script = generator.generate_script(
        topic=generation_config["topic"],
        script_type=generation_config["script_type"],
        duration_minutes=generation_config["duration_minutes"],
        target_audience=generation_config["target_audience"],
        tone=generation_config["tone"]
    )
    
    print("✅ Script generation completed successfully!")
    print(f"📄 Generated script length: {len(generated_script)} characters")
    
    # Display first 500 characters as preview
    print("\n📖 Script Preview:")
    print("-" * 50)
    # Append an ellipsis only when the script was actually truncated
    preview = generated_script[:500] + ("..." if len(generated_script) > 500 else "")
    print(preview)
    print("-" * 50)
    
except Exception as e:
    print(f"❌ Error during script generation: {e}")
    print("Please check your configuration and try again.")

2025/07/01 19:47:00 INFO mlflow.tracking.fluent: Experiment with name 'Text-Generation-service' does not exist. Creating a new experiment.


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]


Successfully registered model 'Script-Generation-Service'.


✔️ Model register: Script-Generation-Service, version: 1


Created version '1' of model 'Script-Generation-Service'.


In [None]:
# Analyze generated script locally
print("📊 Local Script Analysis")
print("=" * 40)

if 'generated_script' in globals():
    # Basic text analysis
    word_count = len(generated_script.split())
    char_count = len(generated_script)
    estimated_reading_time = word_count / 150  # Assumes ~150 words per minute, a common speaking-pace estimate
    
    print(f"📈 Word count: {word_count}")
    print(f"📈 Character count: {char_count}")
    print(f"⏱️  Estimated reading time: {estimated_reading_time:.1f} minutes")
    print(f"🎯 Target duration: {generation_config['duration_minutes']} minutes")
    
    # Check if content meets target duration
    duration_diff = abs(estimated_reading_time - generation_config['duration_minutes'])
    if duration_diff <= 1:
        print("✅ Script duration matches target well!")
    elif estimated_reading_time < generation_config['duration_minutes']:
        print("⚠️  Script may be shorter than target duration")
    else:
        print("⚠️  Script may be longer than target duration")
        
    print("\n🎉 Local analysis completed!")
else:
    print("❌ No generated script found. Please run the generation cell first.")

2025-07-01 19:54:58 - INFO - Notebook execution completed.
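The duration check in the analysis cell above can also be packaged as two small reusable functions. This is a minimal sketch: it reuses that cell's figure of 150 words per minute and its one-minute tolerance, while the function names and the configurable `words_per_minute` and `tolerance` parameters are assumptions added for illustration.

```python
def estimate_duration_minutes(script: str, words_per_minute: float = 150.0) -> float:
    """Estimate delivery time from a whitespace-based word count."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return len(script.split()) / words_per_minute


def duration_status(estimated: float, target: float, tolerance: float = 1.0) -> str:
    """Classify the estimate against the target, within a tolerance in minutes."""
    if abs(estimated - target) <= tolerance:
        return "on target"
    return "too short" if estimated < target else "too long"


# Usage: 1500 words at 150 words per minute is 10 minutes.
sample_script = "word " * 1500
minutes = estimate_duration_minutes(sample_script)
print(f"Estimated duration: {minutes:.1f} minutes")
print(f"Status: {duration_status(minutes, target=10)}")
```

Keeping the speaking rate as a parameter makes it easy to adjust for presenters who speak faster or slower than the assumed pace.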


Built with ❤️ using Z by HP AI Studio.