## Notebook 2: Transcript Writer

This notebook uses the `Granite 3.2 8B` or `Mistral-Small:24b` model to convert the cleaned-up text from the previous notebook into a podcast transcript.

`SYSTEM_PROMPT` sets the model's context, or profile, for the task. Here we prompt it to be a great podcast transcript writer.

Experimenting with the `SYSTEM_PROMPT` below is encouraged; this version worked best for the few examples the flow was tested with:

In [1]:
!ollama list


NAME                                                         ID              SIZE      MODIFIED     
hf.co/OuteAI/OuteTTS-0.3-500M-GGUF:Q4_K_S                    76c6be93d29c    391 MB    2 days ago      
hf.co/OuteAI/OuteTTS-0.2-500M-GGUF:Q8_0                      0a6c38a67073    536 MB    3 days ago      
mistral-small:24b                                            8039dd90c113    14 GB     7 days ago      
llama3.2-vision:11b                                          085a1fdae525    7.9 GB    7 days ago      
mistral-nemo:latest                                          994f3b8b7801    7.1 GB    7 days ago      
granite3.2:8b                                                9bcb3335083f    4.9 GB    8 days ago      
phi4-mini:latest                                             60f202f815d7    2.8 GB    8 days ago      
granite3.2-vision:latest                                     3be41a661804    2.4 GB    9 days ago      
phi4:latest                                                  ac896e

In [1]:
from dotenv import load_dotenv
import os


In [2]:
!ollama list

NAME                                                         ID              SIZE      MODIFIED     
qwen2.5-coder:32b                                            4bd6cbf2d094    19 GB     8 days ago      
mxbai-embed-large:latest                                     468836162de7    669 MB    2 weeks ago     
hf.co/OuteAI/OuteTTS-0.1-350M-GGUF:Q8_0                      7fb2bd26022d    387 MB    2 weeks ago     
hf.co/leafspark/Llama-3.2-11B-Vision-Instruct-GGUF:Q4_K_M    dc5c3f5d8831    7.9 GB    3 weeks ago     
llama3.2:1b                                                  baf6a787fdff    1.3 GB    3 weeks ago     
minicpm-v:latest                                             1862d7d5fee5    5.5 GB    7 weeks ago     
nemotron-mini:latest                                         ed76ab18784f    2.7 GB    8 weeks ago     
mistral-small:latest                                         d095cd553b04    12 GB     2 months ago    
gemma2:27b                                                   53261b

In [10]:
load_dotenv('creds.env')
token = os.getenv("HF_TOKEN")

In [11]:
SYSTEMP_PROMPT = """
You are a world-class podcast writer, you have worked as a ghost writer for Joe Rogan, Lex Fridman, Ben Shapiro, Tim Ferriss. 

We are in an alternate universe where actually you have been writing every line they say and they just stream it into their brains.

You have won multiple podcast awards for your writing.
 
Your job is to write word by word, even "umm, hmmm, right" interruptions by the second speaker based on the PDF upload. Keep it extremely engaging, the speakers can get derailed now and then but should discuss the topic. 

Remember Speaker 2 is new to the topic and the conversation should always have realistic anecdotes and analogies sprinkled throughout. The questions should have real world example follow ups etc

Speaker 1: Leads the conversation and teaches the speaker 2, gives incredible anecdotes and analogies when explaining. Is a captivating teacher that gives great anecdotes

Speaker 2: Keeps the conversation on track by asking follow up questions. Gets super excited or confused when asking questions. Is a curious mindset that asks very interesting confirmation questions

Make sure the tangents speaker 2 provides are quite wild or interesting. 

Ensure there are interruptions during explanations or there are "hmm" and "umm" injected throughout from the second speaker. 

It should be a real podcast with every fine nuance documented in as much detail as possible. Welcome the listeners with a super fun overview and keep it really catchy and almost borderline click bait

ALWAYS START YOUR RESPONSE DIRECTLY WITH SPEAKER 1: 
DO NOT GIVE EPISODE TITLES SEPARATELY, LET SPEAKER 1 TITLE IT IN HER SPEECH
DO NOT GIVE CHAPTER TITLES
IT SHOULD STRICTLY BE THE DIALOGUES
"""

Due to memory limitations, I am using `mistral-small:24b`. Readers who want to flex their money are welcome to try a much larger model here, such as a 405B one.

For our GPU-poor friends, you're encouraged to test with a smaller model as well. An 8B model such as `granite3.2:8b` should work well out of the box for this example:

In [32]:
ollama_model = "mistral-small:24b"

Import the necessary libraries.

In [13]:
# Import necessary libraries
import torch
import transformers
import pickle

from tqdm.notebook import tqdm
import warnings

warnings.filterwarnings('ignore')

Read in the file generated by the previous notebook.

The encoding fallbacks avoid decoding errors on text extracted from arbitrary PDFs.

In [14]:
def read_file_to_string(filename):
    # Try UTF-8 first (most common encoding for text files)
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            content = file.read()
        return content
    except UnicodeDecodeError:
        # If UTF-8 fails, try with other common encodings
        encodings = ['latin-1', 'cp1252', 'iso-8859-1']
        for encoding in encodings:
            try:
                with open(filename, 'r', encoding=encoding) as file:
                    content = file.read()
                print(f"Successfully read file using {encoding} encoding.")
                return content
            except UnicodeDecodeError:
                continue
        
        print(f"Error: Could not decode file '{filename}' with any common encoding.")
        return None
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return None
    except IOError:
        print(f"Error: Could not read file '{filename}'.")
        return None
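
One detail about the fallback order above: in Python, `latin-1` and `iso-8859-1` name the same codec, and that codec maps every byte to a character, so it never raises `UnicodeDecodeError` and any encoding listed after it is never tried. A minimal sketch of an alternative ordering, which tries the stricter `cp1252` first and keeps `latin-1` as the last resort, might look like this. The helper name `read_text_with_fallback` and its `encodings` parameter are hypothetical and not part of the original notebook.

```python
def read_text_with_fallback(filename, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_used), trying each encoding in order.

    Hypothetical helper for illustration only. The file is read once as
    bytes and decoded in memory, so newlines are not translated the way
    open(..., 'r') would translate them.
    """
    with open(filename, "rb") as file:
        raw = file.read()

    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one
            continue

    # Only reachable if the caller removes latin-1 from the list
    raise ValueError(f"Could not decode '{filename}' with any of {encodings}")
```

Returning the encoding alongside the text makes it easy to log which fallback was used, which the original function reports with a `print` call.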

Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it generate the podcast transcript.

In [31]:
INPUT_PROMPT = read_file_to_string('./resources/clean_extracted_text.txt')

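We will set `temperature` to 1 to encourage creativity and `num_ctx`, Ollama's context window size, to 16252 (8126 × 2) tokens. Note that `num_ctx` covers the prompt and the generated text together, so it is not a direct equivalent of `max_new_tokens`.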
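
As a rough sketch of how these settings map onto Ollama's `options` dictionary: `num_ctx` sets the context window, while the closest counterpart to a `max_new_tokens` cap is `num_predict`. The helper `build_chat_options` below is hypothetical and only assembles the dictionary; passing it to `ollama.chat(..., options=...)` is left to the cell that follows.

```python
def build_chat_options(context_tokens=8126 * 2, temperature=1.0, max_new_tokens=None):
    """Assemble an options dict for ollama.chat (hypothetical helper).

    context_tokens -> num_ctx:     context window (prompt + generated text)
    max_new_tokens -> num_predict: optional cap on generated tokens
    """
    options = {"num_ctx": context_tokens, "temperature": temperature}
    if max_new_tokens is not None:
        # Only set a generation cap when the caller asks for one
        options["num_predict"] = max_new_tokens
    return options


# The settings used in this notebook, plus an optional generation cap
default_options = build_chat_options()
capped_options = build_chat_options(max_new_tokens=8126)
```

Keeping the generation cap optional mirrors the notebook, which sets only the context window and temperature.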

In [None]:
import ollama

# Prepare the conversation using the system and user messages
messages = [
    {"role": "system", "content": SYSTEMP_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

# Generate the response using Ollama
response = ollama.chat(
    model=ollama_model,  # Model tag selected earlier
    messages=messages,   # The system + user messages defined above
    # num_ctx is the context window in tokens (prompt + generated text);
    # a temperature of 1 encourages more varied output
    options={'num_ctx': 8126 * 2, 'temperature': 1},
)

# Extract the response content
outputs = response['message']['content']

# # Print the output
# print(outputs)


This is awesome! We can now save and verify the model's output before moving to the next notebook.

In [27]:
save_string_pkl = outputs
print(outputs)

Speaker 1: Alright everyone, welcome back to another mind-bending episode! Today we’re diving into one of the most exciting areas of artificial intelligence — Knowledge Distillation. It's like teaching your smartest friend everything they know, but with robots and code instead of words and hugs. I'm joined by a super excited speaker who’s curious as hell about this topic. So let’s dive in and see what all the fuss is about!

Speaker 2: Um, yeah! This sounds wild. Knowledge Distillation? Is it like transferring skills from one robot to another?

Speaker 1: Exactly! It's fascinating, right? Imagine taking advanced capabilities from leading proprietary Large Language Models (LLMs) — you know, those super expensive models that cost a fortune — and distilling their knowledge into more accessible open-source counterparts. It’s like making the best of both worlds!

Speaker 2: Hmm, so these LLMs are basically like geniuses, right? But they're really pricey and hard to get?

Speaker 1: Bingo! E
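
Beyond reading the printed text, the output can also be checked programmatically against the format the system prompt asks for (dialogue only, starting directly with Speaker 1). The following is a minimal sketch under the assumption that each turn begins with a `Speaker 1:` or `Speaker 2:` label, as in the output above; `parse_transcript` and `follows_prompt_format` are hypothetical helper names.

```python
import re

# A turn starts with "Speaker 1:" or "Speaker 2:" (case-insensitive)
SPEAKER_LINE = re.compile(r"^(Speaker [12]):\s*(.*)$", re.IGNORECASE)


def parse_transcript(text):
    """Split a transcript into a list of (speaker, utterance) tuples."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append([match.group(1), match.group(2)])
        elif turns:
            # Unlabelled line: treat it as a continuation of the last turn
            turns[-1][1] += " " + line
    return [(speaker, utterance) for speaker, utterance in turns]


def follows_prompt_format(text):
    """True if the text opens with Speaker 1 and contains at least one turn."""
    starts_with_speaker_1 = text.lstrip().lower().startswith("speaker 1:")
    return starts_with_speaker_1 and len(parse_transcript(text)) > 0
```

Calling `follows_prompt_format(outputs)` before saving gives a quick signal that the model did not prepend an episode title or other text.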

Let's save the output as a pickle file and continue to Notebook 3.

In [29]:
with open('./resources/data.pkl', 'wb') as file:
    pickle.dump(save_string_pkl, file)
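
To confirm the file was written correctly before moving on, the pickle can be loaded back and compared with the original string. This is a small sketch; the helper name `save_and_verify` is hypothetical, and creating the parent directory is an added convenience that the cell above does not perform.

```python
import pickle
from pathlib import Path


def save_and_verify(text, path):
    """Pickle `text` to `path`, reload it, and report whether it round-trips."""
    path = Path(path)
    # Create the target directory if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)

    with open(path, "wb") as file:
        pickle.dump(text, file)

    with open(path, "rb") as file:
        restored = pickle.load(file)

    return restored == text
```

For example, `save_and_verify(outputs, './resources/data.pkl')` would write the same file as the cell above and return `True` if the saved transcript matches.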

### Next Notebook: Transcript Re-writer

We now have a working transcript, but we can try making it more dramatic and natural. In the next notebook, we will use the `Llama-3.1-8B-Instruct` model to do so.

In [None]:
#fin