# Generate Article from Transcripts

This notebook converts the conversational content of a transcript file into a well-structured, readable article using an AI API (such as OpenAI or Grok). Settings are configured through interactive widgets.

## Step 1: Setup and Configuration

Configure the input transcript file, output directory, output filename, model, and provider using interactive widgets. The API key is not entered in a widget; it is loaded from the `.env` file.

In [1]:
import os
import ipywidgets as widgets
from IPython.display import display
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv(dotenv_path="../.env")

# Default configuration values
DEFAULT_INPUT_FILE = "../data/demo/clean_transcription.txt"
DEFAULT_OUTPUT_DIR = "../data/articles"
DEFAULT_OUTPUT_FILE = "article.txt"
DEFAULT_MODEL = "grok-3-mini-fast-latest"
DEFAULT_PROVIDER = "grok"

# Widgets for configuration
input_file_widget = widgets.Text(
    value=DEFAULT_INPUT_FILE,
    placeholder='Enter input transcript file path',
    description='Input File:',
    layout={'width': '500px'}
)

output_dir_widget = widgets.Text(
    value=DEFAULT_OUTPUT_DIR,
    placeholder='Enter output directory',
    description='Output Dir:',
    layout={'width': '500px'}
)

output_file_widget = widgets.Text(
    value=DEFAULT_OUTPUT_FILE,
    placeholder='Enter output article filename',
    description='Output File:',
    layout={'width': '500px'}
)

model_widget = widgets.Text(
    value=DEFAULT_MODEL,
    placeholder='Enter model name',
    description='Model:',
    layout={'width': '500px'}
)

provider_widget = widgets.Dropdown(
    options=[('OpenAI', 'openai'), ('Grok (via xAI)', 'grok'), ('Other', 'other')],
    value=DEFAULT_PROVIDER,
    description='Provider:',
    layout={'width': '500px'}
)

# Display widgets
display(input_file_widget)
display(output_dir_widget)
display(output_file_widget)
display(model_widget)
display(provider_widget)
print("API key is loaded from environment variables.")

API key is loaded from environment variables.

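The configuration cell prints a confirmation message but does not check that a key was actually found. A minimal sketch of such a check follows, assuming the key is stored under the `API_KEY` variable that Step 4 reads; `load_api_key` and `mask_key` are hypothetical helper names, not part of the notebook.

```python
import os


def load_api_key(env_var="API_KEY"):
    """Return the key stored in `env_var`, or raise if it is missing or blank."""
    api_key = os.getenv(env_var, "").strip()
    if not api_key:
        raise ValueError(f"Set {env_var} in the .env file before running the notebook.")
    return api_key


def mask_key(api_key, visible=4):
    """Hide all but the last `visible` characters so the key can be shown safely."""
    if len(api_key) <= visible:
        return "*" * len(api_key)
    return "*" * (len(api_key) - visible) + api_key[-visible:]
```

Calling `print(mask_key(load_api_key()))` after `load_dotenv` would confirm that a key is present without echoing it in full.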

## Step 3: Define Article Generation Function

Define a function to process the transcript and generate a readable article using the AI API.

In [2]:
from litellm import completion
import os
from pathlib import Path

def generate_article(transcript_text, api_key, model, provider):
    """Generate a readable article from transcript text using an AI API.
    
    Args:
        transcript_text (str): The raw transcript text.
        api_key (str): API key for the AI service.
        model (str): Model name to use for generation.
        provider (str): AI service provider ('openai', 'grok', etc.).
    
    Returns:
        str: The generated article text.
    """
    # Map provider to litellm model prefix if needed
    if provider == 'openai':
        model_name = model
        api_base = None
    elif provider == 'grok':
        model_name = f"xai/{model}" if not model.startswith("xai/") else model
        api_base = None  # litellm handles the API base for xAI
    else:
        model_name = model
        api_base = None
    
    prompt = f"""
    You are an expert writer tasked with converting a conversational transcript into a well-structured, readable article.
    The transcript may contain colloquial language, repetitions, and filler words. Your goal is to:
    1. Remove unnecessary repetitions and filler content.
    2. Convert spoken language into formal, written language.
    3. Preserve the original meaning and key points of the content.
    4. Organize the content into logical paragraphs with a clear flow.
    5. Use headings or subheadings if appropriate to structure the article.
    
    Here is the transcript to process:
    
    {transcript_text}
    
    Output the polished article below:
    """
    
    response = completion(
        model=model_name,
        messages=[
            {"role": "system", "content": "You are a professional writer."},
            {"role": "user", "content": prompt}
        ],
        api_key=api_key,
        api_base=api_base,
        temperature=0.7,
        max_tokens=2000
    )
    
    return response.choices[0].message.content.strip()

def save_article(article_text, output_path):
    """Save the generated article to a file.
    
    Args:
        article_text (str): The article content to save.
        output_path (str): Path to save the article file.
    """
    # Create the parent directory of the output file if it does not exist yet
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(article_text)

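The provider-to-model mapping inside `generate_article` can be pulled out and checked without making an API call. The following is a minimal sketch of that idea; `resolve_model_name` and `PROVIDER_PREFIXES` are hypothetical names, and the `xai/` prefix simply mirrors the one used in the function above.

```python
# Prefix to prepend per provider; providers not listed keep the model name as-is.
PROVIDER_PREFIXES = {"grok": "xai/"}


def resolve_model_name(model, provider):
    """Return the model string with the provider prefix applied at most once."""
    prefix = PROVIDER_PREFIXES.get(provider, "")
    if prefix and not model.startswith(prefix):
        return prefix + model
    return model


print(resolve_model_name("grok-3-mini-fast-latest", "grok"))
```

Keeping the mapping in a dictionary means that supporting another provider is a one-line change, and the prefix logic can be exercised on its own before any request is sent.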
## Step 4: Generate Article from Transcript

Read the transcript file and use the AI API to generate a polished article.

In [3]:
try:
    # Get values from widgets
    input_file = input_file_widget.value
    output_dir = output_dir_widget.value
    output_file = output_file_widget.value
    model = model_widget.value
    provider = provider_widget.value
    
    # Load API key from environment variables
    api_key = os.getenv('API_KEY', '')
    if not api_key:
        raise ValueError("❌ Please provide the API key in the .env file")
    
    output_path = os.path.join(output_dir, output_file)
    
    # Check if input file exists
    if not os.path.exists(input_file):
        raise FileNotFoundError(f"❌ Input transcript file not found: {input_file}")
    
    # Read the transcript file
    print(f"📖 Reading transcript file: {input_file}")
    with open(input_file, 'r', encoding='utf-8') as f:
        transcript_text = f.read()
    
    # Generate the article
    print("✍️ Generating article with AI...")
    article_text = generate_article(transcript_text, api_key, model, provider)
    
    # Save the article
    save_article(article_text, output_path)
    print(f"🎉 Article generated! Result saved to {output_path}")
except Exception as e:
    print(f"❌ An error occurred during processing: {e}")

📖 Reading transcript file: ../data/demo/clean_transcription.txt
✍️ Generating article with AI...

Provider List: https://docs.litellm.ai/docs/providers

❌ An error occurred during processing: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=grok/grok-3-mini-fast-latest
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers

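The error above concerns the provider prefix in the model string. A minimal sketch of a check that could run before `completion` is called, so that an unexpected prefix fails with a short local message instead. `check_model_prefix` and `KNOWN_PREFIXES` are hypothetical names, and the two listed prefixes are an assumption based on the providers this notebook offers; litellm itself recognizes many more.

```python
# Assumption: the only prefixes this notebook is expected to produce.
KNOWN_PREFIXES = {"openai", "xai"}


def check_model_prefix(model_name, known_prefixes=KNOWN_PREFIXES):
    """Return model_name unchanged, or raise ValueError for an unexpected prefix."""
    if "/" not in model_name:
        # Bare model names are passed through for litellm to resolve.
        return model_name
    prefix = model_name.split("/", 1)[0]
    if prefix not in known_prefixes:
        expected = ", ".join(sorted(known_prefixes))
        raise ValueError(
            f"Unexpected provider prefix '{prefix}' in '{model_name}'; expected one of: {expected}"
        )
    return model_name
```

With this in place, `check_model_prefix("grok/grok-3-mini-fast-latest")` raises a `ValueError` that names the offending prefix, while `check_model_prefix("xai/grok-3-mini-fast-latest")` returns the string unchanged.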