<a href="https://colab.research.google.com/github/Zeerroth/transcript-transformer/blob/main/transcrip_transformer_case_study.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install openai==0.27.0 gradio

Collecting openai==0.27.0
  Downloading openai-0.27.0-py3-none-any.whl.metadata (13 kB)
Downloading openai-0.27.0-py3-none-any.whl (70 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.58.1
    Uninstalling openai-1.58.1:
      Successfully uninstalled openai-1.58.1
Successfully installed openai-0.27.0


In [None]:
import os
import openai
import gradio as gr
from getpass import getpass

In [None]:
# Securely input your OpenAI API key
openai.api_key = getpass("Enter your OpenAI API key: ")

Enter your OpenAI API key: ··········


In [None]:
# System prompt for the transformation task
SYSTEM_PROMPT = """
Please transform the following unstructured transcript into a detailed, coherent, and logically structured teaching transcript suitable for a 30-minute lecture (~3900 words).

The lecture should include:
- An engaging Introduction that outlines the lecture objectives.
- Clearly defined Sections with headings and subheadings.
    - Each section should have detailed explanations, relevant examples, and, where appropriate, analogies.
- A concise Conclusion summarizing key points and suggesting next steps for students.

Use professional and academic language suitable for university-level students. Ensure the content flows logically, with smooth transitions between sections. Do not mention that this is based on a transcript or include any meta-commentary.

Assume the role of an expert educator specializing in the topic of the transcript. Your goal is to create a lecture that is informative and engaging for students.
"""

In [None]:
def split_text_to_chunks(text, max_chunk_size=2000):
    """Group sentences into chunks of roughly max_chunk_size characters.

    A single sentence longer than max_chunk_size becomes its own chunk.
    """
    import re
    # Split after sentence-ending punctuation followed by one or more spaces
    sentences = re.split(r'(?<=[.!?]) +', text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        # The +1 accounts for the space that joins two sentences
        if current_chunk and len(current_chunk) + 1 + len(sentence) > max_chunk_size:
            chunks.append(current_chunk.strip())
            current_chunk = sentence
        else:
            current_chunk = f"{current_chunk} {sentence}" if current_chunk else sentence
    # Skip the final chunk if it is empty or whitespace-only
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks
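
The chunker above relies on a lookbehind pattern to find sentence boundaries. The following is a small sketch of how that pattern behaves on its own; the `sample` and `multiline` strings are made up for illustration. It also shows one limitation: a sentence that ends at a line break (with no space after the punctuation) is not split.

```python
import re

# Same pattern as in split_text_to_chunks: split after ., ! or ? followed by spaces
pattern = r'(?<=[.!?]) +'

# Illustrative input, not taken from a real transcript
sample = "Welcome everyone. Today we cover recursion! Any questions? Let's begin."
sentences = re.split(pattern, sample)
for sentence in sentences:
    print(len(sentence), repr(sentence))

# A newline is not a space, so this text stays in one piece
multiline = "Line one.\nLine two."
unsplit = re.split(pattern, multiline)
print(unsplit)
```

If transcripts arrive with one sentence per line, replacing the trailing ` +` with `\s+` would be one way to split on line breaks as well.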

In [None]:
def transform_chunk(chunk):
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": chunk}
            ],
            max_tokens=1500,  # Upper bound on the length of each chunk's response
            temperature=0.7
        )
        return response["choices"][0]["message"]["content"]
    except Exception as e:
        return f"Failed to process chunk: {e}"
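
Because `transform_chunk` converts any exception into an error string, a transient API failure ends up as text in the final transcript. One possible mitigation, sketched below, is to retry the call with exponential backoff before giving up. `call_with_retries`, `attempts`, `base_delay`, and the `flaky` stand-in are hypothetical names for illustration; they are not part of the `openai` package or of this notebook.

```python
import time

def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args); on an exception, wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            # Re-raise once the last attempt has failed
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Demo with a stand-in function that fails twice and then succeeds
calls = {"count": 0}

def flaky(chunk):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return chunk.upper()

# Record the delays instead of actually sleeping
delays = []
result = call_with_retries(flaky, "hello", sleep=delays.append)
print(result, delays)
```

To have any effect in this notebook, such a helper would need to wrap the API request itself, inside the `try` block, since `transform_chunk` as written never raises.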

In [None]:
def process_transcript(file_content):
    try:
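        # Guard against the button being pressed with no file selected
        if file_content is None:
            return "Please upload a transcript file."
        # gr.File(type="binary") passes the uploaded file as bytes
        text = file_content.decode("utf-8")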

        # Split text into chunks
        chunks = split_text_to_chunks(text)

        # Transform chunks with a progress bar
        from tqdm.notebook import tqdm
        transformed_chunks = []
        for chunk in tqdm(chunks, desc="Processing chunks"):
            transformed_chunk = transform_chunk(chunk)
            transformed_chunks.append(transformed_chunk)

        final_transcript = "\n\n".join(transformed_chunks)

        return final_transcript
    except Exception as e:
        return str(e)

In [None]:
iface = gr.Interface(
    fn=process_transcript,
    inputs=gr.File(label="Upload Transcript (.txt)", type="binary"),
    outputs=gr.Textbox(label="Transformed Lecture Transcript"),
    title="Transcript Transformer",
    description="Upload an unstructured transcript to transform it into a structured teaching transcript suitable for a 30-minute lecture (~3900 words).",
)

In [None]:
iface.launch(debug=True)

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://9e523d4eb0149669d5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Processing chunks:   0%|          | 0/19 [00:00<?, ?it/s]

Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://9e523d4eb0149669d5.gradio.live


