This notebook is the submission for the course: [Building AI products with OpenAI](https://corise.com/go/building-ai-products-with-openai-MWKY3)

# Intro

I decided to use local models because I didn't have any openAI credits lefts. Also this allowed me to have more control over which models are being used.

We used [Ollama](https://ollama.ai/) to download and manage models on the local machine.
We can then use [langchain](https://www.langchain.com/) to run the Ollama models

Bellow are some issues I had:
- Whisper need ffmpeg so to install it run: `brew install ffmpeg`


# We install everything we need


In [1]:
!pip install feedparser
!pip install git+https://github.com/stlukey/whispercpp.py
!pip install langchain
!pip install langchain-community
!pip install fastapi
!pip install nest_asyncio
!pip install uvicorn
!pip install faiss-cpu
!pip install sqlalchemy
!pip install pydantic

Collecting git+https://github.com/stlukey/whispercpp.py
  Cloning https://github.com/stlukey/whispercpp.py to /private/var/folders/_r/gx7qddks347_dfxnml9wdkcr0000gn/T/pip-req-build-i53dx580
  Running command git clone --filter=blob:none --quiet https://github.com/stlukey/whispercpp.py /private/var/folders/_r/gx7qddks347_dfxnml9wdkcr0000gn/T/pip-req-build-i53dx580
  Resolved https://github.com/stlukey/whispercpp.py to commit 7af678159c29edb3bc2a51a72665073d58f2352f
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[31mERROR: Could not find a version that satisfies the requirement sqlite3 (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for sqlite3[0m[31m


# We define the functions that can download a podcast information and episode.

There are 2 functions:
  - get_podcast_information: get the podcast information in a dictionary
  - download_episode: download an episode locally

In [2]:
import feedparser
from pathlib import Path
import logging
import requests
import uuid
import os


logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s', level=logging.DEBUG)
logger = logging.getLogger()


def get_podcast_information(rss_feed):
    """
    Returns the podcast information based on a RSS feed Url

    Parameters:
        rss_feed (str): The url for the podcast RSS feed

    Returns:
        podcast(dict): A dictionary with the podcast info
        podcast["title"] (string): The name of the podcast
        podcast["image"] (string): Url of the podcast thumbnail
        podcast["episodes"] (array): an array of all the episodes
        episode (dict): an dictionary with the information about an episode
        episode["title"] (string): The episode name
        episode["url"] (string): The link of the episode
    """
    feed = feedparser.parse(rss_feed)

    logger.debug('Getting podcast information from {}'.format(rss_feed))
    
    podcast = {
        "title": feed['feed']['title'],
        "image": feed['feed']['image'].href,
        "link": feed['feed']['link'],
        "description": feed['feed']['description'],
        "episodes": []
    }

    logger.debug(podcast)
    
    for episode in feed.entries:
        print(episode)
        episode_dict = {
            "id":episode.id,
            "title": episode.title,
            "description": episode.summary,
            "date": episode.published
        }
        
        if "itunes_episode" in episode:
            episode_dict["number"] = episode["itunes_episode"]
        
        for link in episode.links:
            if link['type'] == 'audio/mpeg' or link['type'] == 'audio/mp3':
                episode_dict["url"] = link.href
        podcast["episodes"].append(episode_dict)
        logger.debug(episode_dict)
    return podcast
    



def download_episode(episode_url, episode_name=str(uuid.uuid4())+".mp3", output_folder="/tmp/podcasts/"):
    """
    Download the audio file

    Parameters:
       episode_url (str): The url for the episode to be downloaded
       episode_name (str, optional): The name of the episode. Will be used as the file name. If not provided a random uuid will be provided. 
       output_folder (str, optional): the folder to save the downloaded episode. Defaults to "/tmp/podcasts/"
    
    Returns:
        episode_path (str): the path of the downloaded episode
    """
    logger.debug('Downloading {url} to folder {folder}'.format(url=episode_url, folder=output_folder))
    folder_path = Path(output_folder)
    folder_path.mkdir(exist_ok=True)

    with requests.get(episode_url, stream=True) as request:
        request.raise_for_status()
        episode_path = folder_path.joinpath(episode_name)
        with open(episode_path, 'wb') as file:
            for chunk in request.iter_content(chunk_size=8192):
                file.write(chunk)

    logger.debug("Podcast Episode downloaded to {path}".format(path=episode_path))
    return str(episode_path)

# podcast = get_podcast_information("https://www.marketplace.org/feed/podcast/make-me-smart")
# if podcast["episodes"][0]:
#     download_episode(podcast["episodes"][0]["url"], output_folder=os.getcwd()+"/episodes", episode_name="make_me_smart.mp3")

# We define the function that transcribe the episode audio

We use Whisper locally to transcribe the episode and then save it locally next to the episode

In [9]:
from whispercpp import Whisper

def local_transcribe_audio(audio_path):
    """
    Transcribe the audio file and save the content locally next to the audio file
    
    Parameters:
        audio_path (str): the path of the local audio file
    
    Returns:
        transcript_path (str): the path of the transcript text file
        transcript (str): the transcript of the audio
    """
    transcript_path = audio_path[:-3]+"txt"
    logger.debug('Transcribing audio from {audio_path}'.format(audio_path=audio_path))
    print(transcript_path)
    whisper = Whisper('tiny')
    result = whisper.transcribe(audio_path)
    transcript = whisper.extract_text(result)
    print(transcript)
    with open(transcript_path, 'w') as file:
        file.write(str(transcript))
    logger.debug("Transcript saved to {transcript_path}".format(transcript_path=transcript_path))
    return transcript_path, '. '.join(transcript)


# audio = os.getcwd()+"/episodes/make_me_smart.mp3"
# path,audio_transcript =local_transcribe_audio(audio)
    

# We create a function that configure the model we want to use

We use Ollama for downloading the model locally, and then use LangChain to use it.

In [4]:
from langchain_community.chat_models import ChatOllama


def setup_local_model(model):
    llm = ChatOllama(model=model)
    return llm

# We Tokenize the transcript to create embeddings

Creating embeddings allow use to be able to have prompts using the context of the transcript

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OllamaEmbeddings

def create_embeddings_from_transcript(transcript):
    """
    Use langchain to create embeddings
    
    Parameters:
        transcript (string): episode transcript to be converted into embeddings
    
    Returns:
        store (vector store): episode embeddings for the LLM
    """
    # Split the text into chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = splitter.split_text(transcript)
        
    # Create embeddings for each chunk
    embeddings = OllamaEmbeddings()
    
    # Create the vector store using FAISS
    store = FAISS.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])
    return store

# We create the prompts to extract different information from the transcript

We want to get the following information:
- Podcast bullet points of the topics
- Podcast hosts and guest
- the picks from the guests if there is any
- keywords from the podcast

In [6]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains.question_answering import load_qa_chain
from langchain_core.messages import SystemMessage

def create_chat_prompt_template(llm):
    prompt = ChatPromptTemplate.from_messages(
        [
            SystemMessage(
                content=(
                    """
                    You are an expert copywriter tasked to read transcripts of podcast episodes and summarise them in a concise way.
                    You answer questions about the contents of a transcribed audio file. 
                    Use only the provided audio file transcription as context to answer the question. 
                    Do not use any additional information.
                    If you don't know the answer, just say that you don't know. Do not use external knowledge. 
                    Use three sentences maximum and keep the answer concise. 
                    """
                )
            ),
            HumanMessagePromptTemplate.from_template("Here is the transcript you need to base your answers on {context}"),
            AIMessagePromptTemplate.from_template("Thank you I will have a look at this transcript to answer your questions"),
            HumanMessagePromptTemplate.from_template("Based on the provided transcript answer the following question: {query}."),
        ]
    )
    # load in qa_chain
    return load_qa_chain(llm, chain_type="stuff", prompt=prompt)

def create_embedding_docs(store, query):
    return store.similarity_search(query)

def get_episode_summary(docs, chain):
    return chain.run(input_documents=docs, query="What is the main topic of this podcast. Start with a short summary then a list of bullet points. And conclude with a key highlight")

def get_episode_speakers(docs, chain):
    return chain.run(input_documents=docs, query="Make the list of the people talking on this podcast. make the difference between guests and hosts. Write your answer in a JSON format without any introduction.")

def get_episode_keywords(docs, chain):
    return chain.run(input_documents=docs, query="Use 3 to 5 keywords to describe the content of this podcast. Write the list as a JSON array without any introduction")

def get_episode_picks(docs, chain):
    return chain.run(input_documents=docs, query="Is there any picks by the people on this podcast. Write who made each picks. If there isn't any picks just reply 'no picks' do not write any introduction just give the answer")

## Playground
to test the functions we have created so far

In [7]:
transcript_path = os.getcwd()+"/episodes/make_me_smart.txt"

with open(transcript_path, 'r') as file:
    transcript = file.read()

llm= setup_local_model('llama2')
store = create_embeddings_from_transcript(transcript)
print('store')
transcript_chain = create_chat_prompt_template(llm)
print(transcript_chain)
transcript_docs = store.similarity_search("podcast topic")


store
llm_chain=LLMChain(prompt=ChatPromptTemplate(input_variables=['context', 'query'], messages=[SystemMessage(content="\n                    You are an expert copywriter tasked to read transcripts of podcast episodes and summarise them in a concise way.\n                    You answer questions about the contents of a transcribed audio file. \n                    Use only the provided audio file transcription as context to answer the question. \n                    Do not use any additional information.\n                    If you don't know the answer, just say that you don't know. Do not use external knowledge. \n                    Use three sentences maximum and keep the answer concise. \n                    "), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template='Here is the transcript you need to base your answers on {context}')), AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Thank you I will have a look at this 

In [75]:
transcript_chain.run(input_documents=transcript_docs, query="What is the main topic of this podcast. Start with a short summary then a list of bullet points. And conclude with a key highlight")

"Summary: The main topic of this podcast is Apple and their products, specifically the new iPhones and other devices.\n\nBullet Points:\n\n* The hosts discuss the latest news and rumors about Apple's upcoming product releases\n* They talk about the design and features of the new iPhones and how they compare to previous models\n* They mention a poll conducted among their listeners about their preferences for Apple devices\n* One of the hosts shares their personal experience with an Apple device and how it has helped them in their work\n\nKey Highlight: The main focus of the podcast is on Apple's latest products and how they are perceived by the hosts and their audience."

In [77]:
transcript_chain.run(input_documents=transcript_docs, query="Is there any picks by the people on this podcast. Write who made each picks. If there isn't any picks just reply 'no picks' do not write any introduction just give the answer")

'Yes, there are picks made by the people on this podcast. Here are the individuals who made each pick:\n\n* Drew picked Apple Vision Pro.\n* Kimberly Adams picked Novice Afo.\n* Willie picked Los Angeles.\n* Ariana Rosas picked a Fume Blanc from via D. Guadalupe in Baja, California, Mexico.'

In [71]:
transcript_chain.run(input_documents=transcript_docs, query="Use 3 to 5 keywords to describe the content of this podcast. Write the list as a JSON array without any introduction")

'[\n"Technology",\n"Apple",\n"Innovation",\n"Products",\n"Reviews"\n]'

In [74]:
transcript_chain.run(input_documents=transcript_docs, query="Make the list of the people talking on this podcast. make the difference between guests and hosts. Write your answer in a JSON format without any introduction.")

'{\n"speakers": [\n{\n"name": "Kimberly Adams",\n"role": "host"\n},\n{\n"name": "Nova Soffinding",\n"role": "host"\n},\n{\n"name": "Willie",\n"role": "guest"\n},\n{\n"name": "Drew",\n"role": "host"\n},\n{\n"name": "Ariana Rosas",\n"role": "producer"\n}\n]\n}'

# Setting up the web server

We want to have a webserver that will serve an interactive website.
We use FastAPI as it is a lightweight python server.

Because for this assignment we need to only submit this notebook we need to create the HTML and JavaScript files from the notebook rather than just using static files. The files will still be available in the gitHub repository

## Database setup

We use a database to store the results so that we don't have to redownload and regenerate everything. We use SQLite with sqlalchemy to make it easier

In [ ]:
static_path = os.getcwd()+"/static/"
os.makedirs("static", exist_ok=True)
def create_index_html():
   
    index_html = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta name="charset" value="UTF-8">
    <meta name="viewport" value="width=device-width, initial-scale=1.0">
    <title>Home Page</title>
    <!-- Include Tailwind CSS via CDN -->
    <script src="https://cdn.tailwindcss.com"></script>
    <script src="https://unpkg.com/vue@3/dist/vue.global.js"></script>
</head>
<body>
<div id="app">
</div>
<!-- Include Vue3 via CDN -->

<script type="module">
    import app from './app.js'
    const {createApp} = Vue;

    createApp(app).mount('#app');
</script>
</body>
</html>
        """
    with open(static_path+"index.html", 'w') as file:
        file.write(index_html)


def create_app_js():
    app_js="""
import HomePage from "./HomePage.js";
import PodcastPage from "./PodcastPage.js";

export default {
  name: 'App',
  components: {
    HomePage,
    PodcastPage
  },

  setup() {
    const {ref, watch} = Vue;

    const podcastFeed = ref('')
    const podcast = ref(null)

    watch(podcastFeed, async (newVal)=>{
      const response = await fetch('/api/rss?feed='+newVal)

      podcast.value = await response.json()

      console.log({podcast: podcast.value})
    })

    function saveFeed(feed){
      console.log({feed})
      podcastFeed.value = feed
    }

    return {podcastFeed, saveFeed, podcast}
  },

  template: `
  <div id="content" class="bg-gradient-to-r from-purple-400 via-pink-500 to-red-500 h-screen text-gray-800 text-lg fixed h-full w-full">
    <HomePage v-if="!podcastFeed" @submit="saveFeed"></HomePage>
    <PodcastPage v-if="podcast" :podcast="podcast"></PodcastPage>
  </div>
    `,
};
    """
    with open(static_path+"app.js", 'w') as file:
        file.write(app_js)
        
def create_home_page():
    homepage = """
export default {
  name: 'HomePage',
  emits: ['submit'],

  setup(props, {emit}) {
    const {ref, defineProps, defineEmits} = Vue

    console.log({props})
    console.log({emit})

    const rssFeed = ref('')

    function saveFeedUrl() {
      console.log({rssFeed: rssFeed.value})
      emit('submit', rssFeed.value)
    }

    return {rssFeed, saveFeedUrl}
  },

  template: `
  <div class="flex items-center justify-center h-screen w-full">
    <div id="home-page" class="bg-white p-10 rounded-xl shadow-lg max-w-md w-full">
          <h1 class="text-4xl font-bold mb-6 text-center">Podcast RSS Feed</h1>
          <p class="mb-6 text-center">Please enter the RSS feed of the podcast:</p>
          <input type="text" placeholder="RSS feed..." 
              class="w-full p-3 rounded-lg shadow-inner focus:outline-none focus:ring-2 focus:ring-purple-600" 
              :class="{ 'ring-2 ring-red-500': rssFeed.length > 0 && rssFeed.length < 3 }" 
              v-model="rssFeed">
          <button class="bg-purple-600 text-white w-full mt-6 p-2 rounded hover:bg-purple-700 transition-colors duration-150" @click="saveFeedUrl">
              Submit
          </button>
      </div>
    </div>
    `,
};    
"""
    with open(static_path+"HomePage.js", 'w') as file:
        file.write(homepage)
        

def create_podcast_page():
    podcast_page = """
import EpisodePage  from "./EpisodePage.js";

export default {
  name: 'PodcastPage',
  components:{
    EpisodePage
  },
  props: ['podcast'],

  setup(props) {

    const {ref} = Vue

    const podcast = props.podcast
    const selectedEpisode = ref(null)


    return {podcast, selectedEpisode}
  },

  template: `
    <div v-if="!selectedEpisode" class="container mx-auto flex flex-col flex-start p-8">
        <div v-if="podcast.title"  class="flex flex-col md:flex-row items-center bg-white p-5 rounded shadow">
            <img class="mb-5 md:mb-0 md:mr-5 rounded-full h-48 w-48 object-cover" :src="podcast.image" alt="Podcast Image">
            <div>
                <h1 class="text-3xl font-bold mb-2">Podcast Name: {{podcast.title}}</h1>
                <p class="text-gray-600">Podcast Description:</p>
                <p class="text-gray-400 font-sm" v-html="podcast.description"></p>
                <a target="_blank" :href="podcast.link">Go to website</a>
            </div>
        </div>
        <div v-if="podcast?.episodes?.length" class="mt-10">
            <h2 class="text-2xl font-bold mb-5">Episodes</h2>
            <ul class="space-y-4 overflow-scroll">
                <li v-for="episode of podcast.episodes" class="bg-white p-4 rounded shadow">
                    <details>
                        <summary class="text-lg font-bold">Episode {{episode.number}}: {{episode.title}}</summary>
                        <p class="text-gray-600 font-sm p-4" v-html="episode.description"></p>
                        <div class="mt-2">
                            <button class="bg-blue-400 color-white py-1 px-2 rounded-md" @click="selectedEpisode=episode">Go to Episode Page -></button>
                        </div>
                    </details>
                </li>
            </ul>
        </div>
    </div>
    <EpisodePage :episode="selectedEpisode" @back="selectedEpisode=null" />
    `,
};
"""
    with open(static_path+"PodcastPage.js", 'w') as file:
        file.write(podcast_page)
        
def create_episode_page():
    episode_page = """
export default {
    name: 'EpisodePage',
    props: ['episode'],
    emits: ['back'],

    setup(props, {emit}) {
        const {ref} = Vue

        function goBack() {
            emit('back')
        }

        const transcript = ref('')
        const transcriptPath = ref('')
        const audioPath = ref('')

        const summary = ref('')
        const keywords = ref('')
        const speakers = ref('')
        const picks = ref('')

        async function downloadPodcast() {
            const response = await fetch("/api/download", {
                method: "POST",
                body: JSON.stringify({
                    url: props.episode.url,
                    name: props.episode.title,
                }),
                headers: {
                    "Content-type": "application/json; charset=UTF-8"
                }
            });

            audioPath.value = await response.text()
        }

        async function generateTranscript() {
            console.log({audioPath: audioPath.value})
            const response = await fetch("/api/transcribe", {
                method: "POST",
                body: JSON.stringify({
                    path: audioPath.value.replaceAll('"','')
                }),
                headers: {
                    "Content-type": "application/json; charset=UTF-8"
                }
            });

            const data = await response.json()

            transcript.value = data.transcript
            transcriptPath.value = data['transcript_path']
        }

        async function generateAiResults(){
            const response = await fetch('/api/info?episode_url='+props.episode.url)

            const data = await response.json()

            summary.value= data.summary
            keywords.value = data.keywords
            speakers.value = data.speakers
            picks.value = data.picks
        }

        return { goBack, downloadPodcast, generateTranscript, generateAiResults, audioPath, transcript, transcriptPath, summary, keywords, speakers, picks}
    },

    template: `
    <div class="container mx-auto flex flex-col flex-start p-8" v-if="episode">
        <div class="mb-4">
            <button class="font-bold" @click="goBack"><- Go Back</button>
        </div>
        <div class="flex flex-col items-center justify-center w-full bg-white rounded-lg p-8">
            <h2 class="text-xl font-bold mb-6">{{episode.title}}</h3>
            <div class="flex flex-row gap-10">
                <button class="px-3 py-1 text-white transition-colors duration-150 bg-blue-600 rounded-lg focus:shadow-outline hover:bg-blue-500" @click="downloadPodcast">Download Audio</button>
                <button class="px-3 py-1 text-white transition-colors duration-150 bg-blue-600 rounded-lg focus:shadow-outline hover:bg-blue-500" @click="generateTranscript">Generate transcript</button>
                <button class="px-3 py-1 text-white transition-colors duration-150 bg-blue-600 rounded-lg focus:shadow-outline hover:bg-blue-500" @click="generateAiResults">Ask AI to generate episode information</button>
            </div>
            <div class="flex flex-col w-full">
                <h3 class="font-bold text-lg">AI Keywords</h3>
                <p>
                <pre class="text-wrap text-sm">
                    {{keywords}}
                </pre>
                </p>
            </div>
            <div class="flex flex-col w-full">
                <h3 class="font-bold text-lg">AI Summary</h3>
                <p>
                    <pre class="text-wrap text-sm">{{summary}}</pre>
                </p>
            </div>
            <div class="flex flex-col w-full">
                <h3 class="font-bold text-lg">AI Guest list</h3>
                <p><pre class="text-wrap text-sm">{{speakers}}</pre></p>
            </div>
            <div class="flex flex-col w-full">
                <h3>AI episode picks</h3>
                <p><pre class="text-wrap text-sm">{{picks}}</pre></p>
            </div>
            <div class="flex flex-col w-full">
                <h3 class="font-bold text-lg">Transcript</h3>
                <textarea>{{transcript}}</textarea>
            </div>
        </div>
        
    </div>
    `,
};
"""
    with open(static_path+"EpisodePage.js", 'w') as file:
        file.write(episode_page)
        
create_app_js()
create_index_html()
create_home_page()
create_podcast_page()
create_episode_page()

In [11]:
from sqlalchemy import create_engine, Table, MetaData, Column, Integer, String, select
from fastapi import FastAPI, Request
from fastapi.responses import RedirectResponse
from fastapi.staticfiles import StaticFiles
import json
import nest_asyncio
import uvicorn
from pydantic import BaseModel

llm= setup_local_model('llama2')

# Create a new SQLite database (or connect to an existing one)
engine = create_engine('sqlite:///example.db', echo=True)

# Create a metadata instance
metadata = MetaData()

# Define a new table with a name, metadata,
# two columns and a primary key column.
episodes = Table('episodes', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('url', String, unique=True),
                 Column('audio_path', String, nullable=True),
                 Column('transcript', String, nullable=True),
                 Column('transcript_path', String, nullable=True),
                 Column('summary', String, nullable=True),
                 Column('keywords', String, nullable=True),
                 Column('picks', String, nullable=True),
                 Column('speakers', String, nullable=True),
                 )

# Create the table
metadata.create_all(engine)

app = FastAPI()
os.makedirs("static", exist_ok=True)
app.mount("/static", StaticFiles(directory="static"), name="static")

class PodcastDownloadRequest(BaseModel):
    url: str
    name: str
    
class PodcastTranscribeRequest(BaseModel):
    path: str

@app.get("/")
def index():
    return RedirectResponse(url="/static/index.html")

@app.get("/api/rss")
async def rss(request: Request):
    rss_feed = request.query_params.get('feed')
    print(rss_feed)
    podcast_from_rss = get_podcast_information(rss_feed)
    return podcast_from_rss

@app.post("/api/download")
async def download_podcast(request: PodcastDownloadRequest):
    podcast_request = request.dict()
    print(podcast_request)
    with engine.connect() as connection:
        result = connection.execute(select(episodes.c.audio_path).where(episodes.c.url == podcast_request["url"])).first()
        print(result)
        if result:
            return result[0]
    audio_path = download_episode(podcast_request["url"], podcast_request["name"]+".mp3", output_folder=os.getcwd()+"/episodes")
    with engine.connect() as connection:
        connection.execute(episodes.insert().values(url=podcast_request["url"], audio_path=audio_path))
        connection.commit()
    return audio_path

@app.post("/api/transcribe")
async def transcribe_audio(request: PodcastTranscribeRequest):
    audio_file = request.dict()
    audio_path = audio_file["path"]
    print(audio_path)
    with engine.connect() as connection:
        result = connection.execute(select(episodes.c.transcript, episodes.c.transcript_path).where(episodes.c.audio_path == audio_path)).first()
        print(result)
        if result[0]:
            return {"transcript_path": result[1], "transcript": result[0]}
    transcript_path, transcript = local_transcribe_audio(audio_path)
    with engine.connect() as connection:
        connection.execute(episodes.update().values(transcript=transcript, transcript_path=transcript_path).where(episodes.c.audio_path==audio_path))
        connection.commit()
    return {"transcript_path": transcript_path, "transcript": transcript}



@app.get("/api/info")
async def get_info_from_transcript(request: Request):
    episode_url = request.query_params.get("episode_url")
    print(episode_url)
    with engine.connect() as connection:
        result = connection.execute(
            select(episodes.c.transcript, episodes.c.summary, episodes.c.speakers, episodes.c.keywords, episodes.c.picks).where(episodes.c.url == episode_url)).first()
        if not result:
            return {"summary": "", "keywords": "", "speakers": "", "picks": ""}
        if result[1]:
            return {"summary": result[0], "keywords": result[2], "speakers": result[1], "picks": result[3]}
    
        episode_store = create_embeddings_from_transcript(result[0])
        print('store')
        episode_chain = create_chat_prompt_template(llm)
        print(transcript_chain)
        episode_docs = create_embedding_docs(episode_store,"podcast topics")
        
        summary = get_episode_summary(episode_docs, episode_chain)
        speakers = get_episode_speakers(episode_docs, episode_chain)
        keywords = get_episode_keywords(episode_docs, episode_chain)
        picks = get_episode_picks(episode_docs, episode_chain)
        
        connection.execute(episodes.update().values(
            summary=summary,
            keywords=keywords,
            picks=picks,
            speakers=speakers
        ))
        connection.commit()
        
        return {"summary": summary, "keywords": keywords, "speakers": speakers, "picks": picks}

if __name__ == "__main__":
    nest_asyncio.apply()
    uvicorn.run(app)



2024-02-05 00:03:55,233 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:03:55,234 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("episodes")
2024-02-05 00:03:55,234 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-02-05 00:03:55,235 INFO sqlalchemy.engine.Engine COMMIT


INFO:     Started server process [59032]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)


None
2024-02-05 00:03:58,877 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:03:58,882 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url IS NULL
2024-02-05 00:03:58,886 INFO sqlalchemy.engine.Engine [generated in 0.00923s] ()
2024-02-05 00:03:58,891 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:61103 - "GET /api/info?feed=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:61304 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:61304 - "GET /static/PodcastPage.js HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:61304 - "GET /static/EpisodePage.js HTTP/1.1" 200 OK
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4850-96a8-30db8dbae8bf


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:04:58,516 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:04:58,521 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:04:58,528 INFO sqlalchemy.engine.Engine [generated in 0.01169s] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
store
llm_chain=LLMChain(prompt=ChatPromptTemplate(input_variables=['context', 'query'], messages=[SystemMessage(content="\n                    You are an expert copywriter tasked to read transcripts of podcast episodes and summarise them in a concise way.\n                    You answer questions about the contents of a transcribed audio file. \n                    Use only the provided audio file transcription as context to answer the question. \n                    Do not use any additional in

  warn_deprecated(


2024-02-05 00:06:06,032 INFO sqlalchemy.engine.Engine UPDATE episodes SET summary=?, keywords=?, picks=?, speakers=?
2024-02-05 00:06:06,040 INFO sqlalchemy.engine.Engine [generated in 0.00879s] ("Summary: The main topic of this podcast is the discussion of various topics related to technology, society, and culture.\n\nBullet Points:\n\n* The h ... (537 characters truncated) ... e of topics related to technology, society, and culture, with a focus on Apple products and services, as well as the impact of technology on society.", '[\n"tech",\n"Apple",\n"innovation",\n"products",\n"reviews"\n]', 'Yes, there are picks made by the people on this podcast. Here are the picks:\n\n* Drew Jostead picked "full" for the category "universal music group  ... (43 characters truncated) ...  picked "half empty" for the category "Apple Vision Pro poll"\n* Kai picked "I don\'t know" for the category "What are you drinking my young friend?"', '{\n"speakers": [\n{\n"name": "Kai",\n"role": "host"\n},\n{\n"n

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:08:20,281 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:08:20,283 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:08:20,285 INFO sqlalchemy.engine.Engine [cached since 201.8s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:08:20,291 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:62261 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:62361 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:62361 - "GET /static/EpisodePage.js HTTP/1.1" 304 Not Modified
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4850-96a8-30db8dbae8bf


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:08:46,034 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:08:46,037 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:08:46,039 INFO sqlalchemy.engine.Engine [cached since 227.5s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:08:46,041 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:62362 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:62609 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:62609 - "GET /static/EpisodePage.js HTTP/1.1" 200 OK
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4850-96a8-30db8dbae8bf


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:09:35,544 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:09:35,546 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:09:35,547 INFO sqlalchemy.engine.Engine [cached since 277s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:09:35,549 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:62610 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:63078 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:63078 - "GET /static/EpisodePage.js HTTP/1.1" 200 OK
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4850-96a8-30db8dbae8bf


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



INFO:     127.0.0.1:63079 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:63079 - "GET /static/EpisodePage.js HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:63079 - "GET /static/index.html HTTP/1.1" 304 Not Modified
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4850-96a8-30db8dbae8bf


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:12:10,897 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:12:10,900 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:12:10,901 INFO sqlalchemy.engine.Engine [cached since 432.4s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:12:10,904 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:63248 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:63564 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:63564 - "GET /static/EpisodePage.js HTTP/1.1" 200 OK
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4850-96a8-30db8dbae8bf


IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)




2024-02-05 00:13:24,385 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:13:24,388 INFO sqlalchemy.engine.Engine SELECT episodes.transcript, episodes.transcript_path 
FROM episodes 
WHERE episodes.audio_path = ?
2024-02-05 00:13:24,392 INFO sqlalchemy.engine.Engine [generated in 0.00773s] ('',)
None
2024-02-05 00:13:24,395 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:63564 - "POST /api/transcribe HTTP/1.1" 500 Internal Server Error


ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/loiclemerlus/miniconda3/envs/newConda/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/Users/loiclemerlus/miniconda3/envs/newConda/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/Users/loiclemerlus/miniconda3/envs/newConda/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/loiclemerlus/miniconda3/envs/newConda/lib/python3.8/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/loiclemerlus/miniconda3/envs/newConda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/Users/loicleme

https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:13:47,295 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:13:47,301 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:13:47,302 INFO sqlalchemy.engine.Engine [cached since 528.8s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:13:47,305 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:63565 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:64159 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:64159 - "GET /static/PodcastPage.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64159 - "GET /static/EpisodePage.js HTTP/1.1" 200 OK
https://feeds.acast.com/public/shows/73fe3ede-5c5c-4

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:15:46,243 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:15:46,244 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:15:46,245 INFO sqlalchemy.engine.Engine [cached since 647.8s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:15:46,248 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:64159 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:64423 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:64423 - "GET /static/PodcastPage.js HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:64423 - "GET /static/EpisodePage.js HTTP/1.1" 200 OK
https://feeds.acast.com/public/shows/73fe3

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3
2024-02-05 00:16:50,616 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-02-05 00:16:50,618 INFO sqlalchemy.engine.Engine SELECT episodes.summary, episodes.speakers, episodes.keywords, episodes.picks 
FROM episodes 
WHERE episodes.url = ?
2024-02-05 00:16:50,619 INFO sqlalchemy.engine.Engine [cached since 712.1s ago] ('https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3',)
2024-02-05 00:16:50,622 INFO sqlalchemy.engine.Engine ROLLBACK
INFO:     127.0.0.1:64423 - "GET /api/info?episode_url=https://sphinx.acast.com/p/acast/s/ftnewsbriefing/e/65bc648eab4ae90016141d38/media.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:49337 - "GET / HTTP/1.1" 307 Temporary Redirect
INFO:     127.0.0.1:49337 - "GET /static/index.html HTTP/1.1" 304 Not Modified
INFO:     127.0.0.1:49337 - "GET /static/app.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:49337 - "GET /static/HomePage.js HTTP/1.1" 30

INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [59032]
