# How do podcasts actually work?

To download something, we need some kind of audio file, usually an MP3. Podcast apps get these MP3 files from an RSS feed. If you have a link to the RSS feed, you have all the information about the podcast.
    
<a href = "https://lexfridman.com/feed/podcast/">This is what an RSS feed looks like</a>

RSS stands for Really Simple Syndication.

<b>This XML file holds the information about every episode the podcaster has uploaded, including the link to each MP3 file.</b>
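
To make the structure concrete, here is a minimal sketch that parses a tiny, made-up feed. The feed text, the example.com URLs, and the use of the standard library's xml.etree.ElementTree (instead of BeautifulSoup) are assumptions chosen so the example runs without a network connection; a real feed carries many more fields per item.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS feed: one <item> per episode,
# with the audio file attached through an <enclosure> tag.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 2</title>
      <description>Second episode.</description>
      <enclosure url="https://example.com/audio/ep2.mp3" length="2048" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 1</title>
      <description>First episode.</description>
      <enclosure url="https://example.com/audio/ep1.mp3" length="1024" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_FEED)

# Collect the title and audio link of every episode in the feed.
episodes = []
for item in root.iter("item"):
    enclosure = item.find("enclosure")
    episodes.append({
        "title": item.findtext("title"),
        "url": enclosure.get("url"),
        "type": enclosure.get("type"),
    })

for episode in episodes:
    print(episode["title"], "->", episode["url"])
```

Feeds usually list the newest episode first, which is why the sample puts Episode 2 before Episode 1.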




# So what can we do?
We can programmatically go through the RSS feed, find the MP3/audio file link for each episode, and download it along with all the relevant information.

In [24]:
from bs4 import BeautifulSoup
import requests
import os
import re

In [2]:
rss_url_link = 'https://lexfridman.com/feed/podcast/'
page = requests.get(rss_url_link)
soup = BeautifulSoup(page.content, 'xml')

In [3]:
podcast_items = soup.find_all('item')

In [4]:
podcast_items[0].find('description').text

"Andrew Huberman is a neuroscientist at Stanford and host of the Huberman Lab Podcast. Please support this podcast by checking out our sponsors:\n- InsideTracker: https://insidetracker.com/lex to get 20% off\n- Eight Sleep: https://www.eightsleep.com/lex to get special savings\n- AG1: https://drinkag1.com/lex to get 1 month supply of fish oil\n- Shopify: https://shopify.com/lex to get $1 per month trial\n- NetSuite: http://netsuite.com/lex to get free product tour\n\nTranscript: https://lexfridman.com/andrew-huberman-4-transcript\n\nEPISODE LINKS:\nAndrew's YouTube: https://www.youtube.com/AndrewHubermanLab\nAndrew's Instagram: https://instagram.com/hubermanlab\nAndrew's Website: https://hubermanlab.com\nAndrew's Twitter: https://twitter.com/hubermanlab\n\nPODCAST INFO:\nPodcast website: https://lexfridman.com/podcast\nApple Podcasts: https://apple.co/2lwqZIr\nSpotify: https://spoti.fi/2nEwCF8\nRSS: https://lexfridman.com/feed/podcast/\nYouTube Full Episodes: https://youtube.com/lexfri

# Finding the mp3/audio link within the feed
To do that, we can simply look up the enclosure tag of each item; its url attribute holds the link to the audio file.

In [12]:
print(podcast_items[0].find('title').text)

# We only want the url attribute of the enclosure tag
print(podcast_items[0].find('enclosure')['url'])

#393 – Andrew Huberman: Relationships, Drama, Betrayal, Sex, and Love
https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_andrew_huberman_4.mp3


In [14]:
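def get_audio_links(podcast_items):
    """Return one {'title', 'url'} dict per episode that has an audio file."""
    audio_links = []
    for item in podcast_items:
        enclosure = item.find('enclosure')
        if enclosure is None:
            continue  # skip items that carry no audio attachment
        title = item.find('title').text
        audio_links.append({'title': title, 'url': enclosure['url']})
    return audio_links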

audio_links = get_audio_links(podcast_items)
audio_links

[{'title': '#393 – Andrew Huberman: Relationships, Drama, Betrayal, Sex, and Love',
  'url': 'https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_andrew_huberman_4.mp3'},
 {'title': '#392 – Joscha Bach: Life, Intelligence, Consciousness, AI & the Future of Humans',
  'url': 'https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_joscha_bach_3.mp3'},
 {'title': '#391 – Mohammed El-Kurd: Palestine',
  'url': 'https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_mohammed_el_kurd.mp3'},
 {'title': '#390 – Yuval Noah Harari: Human Nature, Intelligence, Power, and Conspiracies',
  'url': 'https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_yuval_noah_harari.mp3'},
 {'title': '#389 – Benjamin Netanyahu: Israel, Palestine, Power, Corruption, Hate, and Peace',
  'url': 'https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_benjamin_netanyahu.mp3'},
 {'title': '#388 – 

# Downloading an MP3 file

In [9]:
link = 'https://media.blubrry.com/takeituneasy/content.blubrry.com/takeituneasy/lex_ai_andrew_huberman_4.mp3'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

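# stream=True lets us write the file piece by piece instead of loading it all into memory
response = requests.get(link, headers=headers, stream=True)
response.raise_for_status()  # stop early on an HTTP error instead of saving an error page
with open("newfile.mp3", 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)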

# Downloading multiple MP3 files into a folder


In [23]:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Create the target folder if it does not exist yet
os.makedirs("./downloads", exist_ok=True)

# Download only the first 5 episodes
for link in audio_links[:5]:
    response = requests.get(link["url"], headers=headers, stream=True)
    response.raise_for_status()  # stop early on an HTTP error
    title = link["title"]
    with open(f"./downloads/{title}.mp3", 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(title)

#393 – Andrew Huberman: Relationships, Drama, Betrayal, Sex, and Love
#392 – Joscha Bach: Life, Intelligence, Consciousness, AI & the Future of Humans
#391 – Mohammed El-Kurd: Palestine
#390 – Yuval Noah Harari: Human Nature, Intelligence, Power, and Conspiracies
#389 – Benjamin Netanyahu: Israel, Palestine, Power, Corruption, Hate, and Peace
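
Episode titles like the ones printed above contain characters such as `:` that some file systems (Windows in particular) do not allow in file names, and a title containing `/` would be read as a path. A minimal sketch of one way to clean a title before using it as a file name follows; `safe_filename` is a hypothetical helper, not part of the notebook, and the character set it replaces is an assumption you may want to adjust.

```python
import re

def safe_filename(title, replacement="_"):
    """Return a version of title that is safe to use as a file name."""
    # Replace characters that are reserved on Windows or act as path separators,
    # along with ASCII control characters.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, title)
    # Collapse runs of whitespace, then trim spaces and dots from both ends
    # (Windows does not allow trailing dots or spaces).
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"

name = safe_filename("#393 – Andrew Huberman: Relationships, Drama, Betrayal, Sex, and Love")
print(name)
```

The result can then be used in the path, for example `f"./downloads/{safe_filename(title)}.mp3"`.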


# Async requests for multiple requests

Python code is synchronous by default; to write async code we need to do things differently.
Putting the async keyword before a function definition turns it into a coroutine function: calling it does not run the body, it returns a coroutine object that has to be scheduled before anything happens.

You cannot simply write <b>await funcname()</b> at the top level of a regular script; doing so gives the following error
> SyntaxError: 'await' outside function

This is because, to run our coroutine object, we need to put it in an event loop using <b> asyncio.run(funcname()) </b>

The <b>await</b> keyword can only be used within an async function.
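
A minimal sketch of these rules in action; `greet` is a made-up function used only for illustration. It is written as a plain script, where asyncio.run() starts the event loop.

```python
import asyncio

async def greet(name):
    # await is allowed here because we are inside an async function
    await asyncio.sleep(0)
    return f"hello {name}"

# Calling the async function does not run its body; it returns a coroutine object.
coro = greet("listener")
is_coroutine = asyncio.iscoroutine(coro)
print(is_coroutine)

# asyncio.run() starts an event loop, runs the coroutine to completion,
# and returns its result.
result = asyncio.run(coro)
print(result)
```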

## Doing other tasks while your function is idle

We can use <b> asyncio.create_task(funcname()) </b>, which takes the coroutine object returned by calling the async function and schedules it on the event loop, so it can run while the current function is waiting on something else.

### Concurrency is extremely useful for I/O operations, where you can tell your program to do other things while it is waiting to read or write data
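
As a rough illustration, the sketch below starts two tasks and records the order in which they start and finish. `fake_download` is a hypothetical stand-in that only sleeps; the names and durations are assumptions, not real downloads.

```python
import asyncio

events = []

async def fake_download(name, seconds):
    # asyncio.sleep stands in for waiting on the network or disk
    events.append(f"start {name}")
    await asyncio.sleep(seconds)
    events.append(f"done {name}")
    return name

async def main():
    # create_task schedules both coroutines; neither blocks the other
    slow = asyncio.create_task(fake_download("slow", 0.2))
    fast = asyncio.create_task(fake_download("fast", 0.05))
    # gather waits for both and returns results in the order the tasks were passed
    return await asyncio.gather(slow, fast)

results = asyncio.run(main())
print(events)
print(results)
```

Both tasks start before either finishes, and the shorter one completes first even though it was created second.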
    

In [31]:
import asyncio 
import aiofiles
import requests
from bs4 import BeautifulSoup
 
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def get_audio_links(podcast_items):
    audio_links = []
    for i in podcast_items:
        title = i.find('title').text
        url = i.find('enclosure')['url']
        audio_links.append({'title':title,'url':url})
    return audio_links



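async def download_files(fileName, content):
    # Write the already-downloaded bytes to disk without blocking the event loop
    async with aiofiles.open(f'./downloads/{fileName}.mp3','wb') as f:
        await f.write(content)
        print(f'File {fileName} written')
        
async def main():
    audio_links = get_audio_links(podcast_items)
    tasks = []
    # Only handle the first 5 episodes
    for file in audio_links[:5]:
        title = file['title']
        url = file['url']
        # requests.get is a blocking call, so the downloads themselves still happen one after another
        response = requests.get(url, headers=headers)
        # Pass the downloaded bytes (response.content), not the Response object
        task = asyncio.create_task(download_files(title, response.content))
        tasks.append(task)
        
    await asyncio.gather(*tasks)
    
    
# Jupyter already runs an event loop, so inside a notebook this call raises the error shown below
asyncio.run(main())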

RuntimeError: asyncio.run() cannot be called from a running event loop

In [32]:
import asyncio 
import aiofiles
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def get_audio_links(podcast_items):
    audio_links = []
    for i in podcast_items:
        title = i.find('title').text
        url = i.find('enclosure')['url']
        audio_links.append({'title': title, 'url': url})
    return audio_links

async def download_files(fileName, content):
    async with aiofiles.open(f'./downloads/{fileName}.mp3', 'wb') as f:
        await f.write(content)
        print(f'File {fileName} written')

async def main():
    # podcast_items is the list of <item> tags parsed from the feed in an earlier cell
    audio_links = get_audio_links(podcast_items)
    tasks = []
    count = 0
    for file in audio_links:
        if count == 5:
            break
        title = file['title']
        url = file['url']
        response = requests.get(url, headers=headers, stream=True)
        task = asyncio.create_task(download_files(title, response.content))
        tasks.append(task)
        count += 1
        
    await asyncio.gather(*tasks)

# Jupyter's event loop is already running, so run_until_complete raises the error shown below
if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())


RuntimeError: This event loop is already running
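
Both errors above have the same cause: Jupyter already runs an event loop in the kernel, so neither asyncio.run() nor loop.run_until_complete() can be used from a cell. In a notebook, the usual way around this is to await the coroutine directly at the top level of the cell (`await main()`), which IPython supports. The sketch below shows that pattern and also moves the blocking work into worker threads with asyncio.to_thread (Python 3.9+), so several downloads can overlap. `blocking_download` is a hypothetical stand-in that sleeps instead of calling requests.get, so the example runs without a network connection.

```python
import asyncio
import time

def blocking_download(title):
    # Stand-in for a blocking call such as requests.get(url).content
    time.sleep(0.1)
    return f"{title} downloaded"

async def main(titles):
    # Each blocking call runs in its own worker thread,
    # so the event loop stays free while they wait.
    jobs = [asyncio.to_thread(blocking_download, title) for title in titles]
    # gather returns the results in the same order as the jobs
    return await asyncio.gather(*jobs)

# In a Jupyter cell, replace the next line with:
#     results = await main(["ep1", "ep2", "ep3"])
# In a plain script there is no running loop yet, so asyncio.run() is the right call.
results = asyncio.run(main(["ep1", "ep2", "ep3"]))
print(results)
```

With real downloads, `blocking_download` would fetch the URL and return the bytes, which could then be written with aiofiles as in the cells above.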