# Transcribing Podcasts Using RSS

### Step 1: Install Required Libraries
We’ll use feedparser to parse the RSS feed and requests to download audio files. For transcription, you can use Google Cloud Speech-to-Text, OpenAI Whisper, or any other service you prefer.

Install these packages if you haven't already:

In [None]:
%pip install feedparser requests
%pip install google-cloud-speech

### Step 2: Parse the RSS Feed
Here’s a Python function that parses the RSS feed, finds the audio enclosure attached to each episode, and downloads it.

In [None]:
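import os
import re

import feedparser
import requests

def download_podcast_episodes(rss_url, download_folder="podcast_episodes"):
    """Download the first enclosure of every feed entry and return the folder path."""
    # Parse RSS feed
    feed = feedparser.parse(rss_url)

    # Create folder if it doesn't exist
    os.makedirs(download_folder, exist_ok=True)

    for entry in feed.entries:
        # Skip entries that have no attached media
        if not entry.get("enclosures"):
            continue
        audio_url = entry.enclosures[0].href

        # Episode titles may contain characters that are invalid in filenames
        safe_title = re.sub(r'[\\/:*?"<>|]+', "_", entry.get("title", "episode")).strip()
        audio_filename = os.path.join(download_folder, f"{safe_title}.mp3")

        # Stream the download so large files are not held in memory
        print(f"Downloading: {audio_url}")
        with requests.get(audio_url, stream=True, timeout=60) as response:
            response.raise_for_status()
            with open(audio_filename, "wb") as file:
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
        print(f"Downloaded to: {audio_filename}")

    print("All episodes downloaded.")
    return download_folder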

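To make the enclosure lookup concrete, here is a minimal sketch of the RSS structure the download function relies on. It uses only the standard library's xml.etree.ElementTree instead of feedparser, and the sample feed, its URL, and the helper name list_enclosures are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS feed: one audio episode and one text-only item.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/ep1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Show notes only</title>
    </item>
  </channel>
</rss>
"""

def list_enclosures(rss_text):
    """Return (title, url, mime_type) for every item that has an enclosure."""
    root = ET.fromstring(rss_text)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # this item carries no attached media
        title = item.findtext("title", default="untitled")
        found.append((title, enclosure.get("url"), enclosure.get("type")))
    return found

for title, url, mime_type in list_enclosures(SAMPLE_RSS):
    print(f"{title}: {url} ({mime_type})")
```

feedparser exposes the same information as entry.enclosures[0].href and entry.enclosures[0].type, which is why items without an enclosure have to be skipped.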

### Step 3: Set Up Transcription (Google Cloud Speech-to-Text)
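If you’d like to use Google Cloud Speech-to-Text, ensure you’ve created a Google Cloud project and enabled the Speech-to-Text API. Download the JSON credentials and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at that file. Run this in the shell that launches your script or notebook: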

In [None]:
export GOOGLE_APPLICATION_CREDENTIALS="path_to_your_credentials.json"

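An export line only affects the shell session it runs in, so a notebook kernel that is already running will not see it. As an alternative sketch, the variable can be set from Python before the client is created. The helper name set_google_credentials is hypothetical, and the path is the same placeholder as in the shell command.

```python
import os

def set_google_credentials(credentials_path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a key file for this process."""
    if not os.path.isfile(credentials_path):
        raise FileNotFoundError(f"Credentials file not found: {credentials_path}")
    absolute_path = os.path.abspath(credentials_path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = absolute_path
    return absolute_path

# Placeholder path; replace it with the location of your own key file.
# set_google_credentials("path_to_your_credentials.json")
```

Failing early on a missing file tends to give a clearer error than the one raised later when the client tries to authenticate.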

Here’s a function to transcribe each audio file:

In [None]:
from google.cloud import speech

def transcribe_audio(file_path):
    """Transcribe an MP3 file and return the text, one result per line."""
    client = speech.SpeechClient()

    # Load audio file
    with open(file_path, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        language_code="en-US",  # Change to the language of the podcast
    )

    # Transcribe the audio (synchronous call, intended for short clips)
    response = client.recognize(config=config, audio=audio)

    # Join the top alternative of each result, one per line
    return "".join(
        result.alternatives[0].transcript + "\n" for result in response.results
    )

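Most podcast episodes run far longer than the roughly one minute of audio that a synchronous recognize request accepts. A hedged sketch of the asynchronous path follows. It assumes the episode has already been uploaded to a Cloud Storage bucket; the gs:// URI, the function names, and the timeout value are illustrative and not part of the original script. Depending on the library version, MP3 input may also need sample_rate_hertz in the config.

```python
def join_transcript(results):
    """Join the top alternative of each recognition result, one per line."""
    return "".join(
        result.alternatives[0].transcript + "\n"
        for result in results
        if result.alternatives  # skip results that carry no alternatives
    )

def transcribe_long_audio(gcs_uri, language_code="en-US", timeout=3600):
    """Transcribe audio stored in Cloud Storage with a long-running operation."""
    # Imported here so join_transcript can be used without the library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    # Assumed URI shape: "gs://your-bucket/episode.mp3"
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        language_code=language_code,
    )

    # Start the job, then block until it finishes or the timeout (seconds) expires.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)
    return join_transcript(response.results)
```

Uploading the downloaded files to a bucket is a separate step that this sketch leaves out.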

### Step 4: Integrate Everything
Here’s a full script that downloads audio episodes from an RSS feed and transcribes each one.

In [None]:
def process_podcast(rss_url):
    download_folder = download_podcast_episodes(rss_url)
    for filename in os.listdir(download_folder):
        if filename.endswith(".mp3"):
            file_path = os.path.join(download_folder, filename)
            print(f"Transcribing: {file_path}")
            transcript = transcribe_audio(file_path)
            
            # Save the transcript
            transcript_file = f"{file_path}.txt"
            with open(transcript_file, "w", encoding="utf-8") as file:
                file.write(transcript)
            print(f"Transcript saved to: {transcript_file}")

rss_url = "YOUR_PODCAST_RSS_FEED_URL"
process_podcast(rss_url)

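Transcription is the slow part of the pipeline and, with a cloud API, usually a metered one, so on a re-run it can help to skip episodes that already have a transcript. A minimal sketch, assuming the .mp3.txt naming that process_podcast uses; the helper name pending_audio_files is hypothetical.

```python
import os

def pending_audio_files(download_folder):
    """Return paths of .mp3 files that have no '<name>.mp3.txt' transcript yet."""
    pending = []
    for filename in sorted(os.listdir(download_folder)):
        if not filename.endswith(".mp3"):
            continue
        file_path = os.path.join(download_folder, filename)
        # process_podcast writes the transcript next to the audio as "<file>.txt"
        if not os.path.exists(f"{file_path}.txt"):
            pending.append(file_path)
    return pending
```

Inside process_podcast, the loop could then iterate over pending_audio_files(download_folder) instead of every entry returned by os.listdir(download_folder).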