## Earnings Call ETL
---
[Seeking Alpha](https://seekingalpha.com) has a collection of earnings call
transcripts and audio recordings publicly available [here](https://seekingalpha.com/earnings/earnings-call-transcripts).

This notebook creates an end-to-end pipeline that saves each transcript as a .txt file in a local database.



### Parse earnings call transcript page

In [1]:
# from splinter import Browser
from scraping import config_browser
from bs4 import BeautifulSoup as Soup


# Earnings call transcripts webpage
url = "https://seekingalpha.com/earnings/earnings-call-transcripts"

# Configure browser and visit webpage
browser = config_browser()
browser.visit(url)

# Parse html using BeautifulSoup
html = browser.html
soup = Soup(html, 'html.parser')

### Get list of urls for each earnings call

In [11]:
from scraping import get_transcript_elements
from scraping import get_transcript_urls


# Get HTML elements containing pertinent data
transcript_elements = get_transcript_elements(soup)

print("HTML elements containing links to earnings call\n",
      transcript_elements[:2])

# Parse html for the 'href' attribute and return list of urls for each call
transcript_urls = get_transcript_urls(transcript_elements)

print("\nLinks to pages containing call transcripts",
      transcript_urls[:2])

HTML elements containing links to earnings call
 [<li class="list-group-item article" data-id="4388740" data-published="1605254107">
<h3 class="list-group-item-heading">
<a class="dashboard-article-link" href="/article/4388740-deutsche-telekom-ag-dtegf-ceo-timotheus-hottges-on-q3-2020-results-earnings-call-transcript" sasource="earnings-center-transcripts_article">Deutsche Telekom AG (DTEGF) CEO Timotheus Höttges on Q3 2020 Results - Earnings Call Transcript</a>
</h3>
<div class="article-desc">
<span class="article-symbols"><a href="/symbol/DTEGF" sasource="earnings-center-transcripts_symbol" title="Deutsche Telekom AG">DTEGF</a>, <a href="/symbol/DTEGY" sasource="earnings-center-transcripts_symbol" title="Deutsche Telekom AG">DTEGY</a></span><span class="bullet">•</span>
      Fri, Nov. 13,  2:55 AM

        <span>•</span>
<a href="/author/sa-transcripts" sasource="earnings-center-transcripts_author">SA Transcripts</a>
</div>
</li>, <li class="list-group-item article" data-id="4388737
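
The body of `get_transcript_urls` lives in the local `scraping` module and is not shown in this notebook. As a rough illustration of what such a helper could look like, the sketch below pulls the `href` from each `dashboard-article-link` anchor and joins it with the site root. The function name `extract_transcript_urls`, the `BASE_URL` constant, and the trimmed sample HTML are assumptions made for this example, not the module's actual code.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed site root used to turn relative hrefs into absolute urls
BASE_URL = "https://seekingalpha.com"


def extract_transcript_urls(elements, base_url=BASE_URL):
    """Return an absolute url for each <li> element that holds an article link."""
    urls = []
    for element in elements:
        # The article link is the anchor with class 'dashboard-article-link'
        link = element.find("a", class_="dashboard-article-link")
        if link is not None and link.get("href"):
            urls.append(urljoin(base_url, link["href"]))
    return urls


# Trimmed copy of the first element printed above
sample_html = """
<ul>
  <li class="list-group-item article" data-id="4388740">
    <h3 class="list-group-item-heading">
      <a class="dashboard-article-link" href="/article/4388740-deutsche-telekom-ag-dtegf-ceo-timotheus-hottges-on-q3-2020-results-earnings-call-transcript">Deutsche Telekom AG (DTEGF)</a>
    </h3>
  </li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_elements = sample_soup.find_all("li", class_="article")
sample_urls = extract_transcript_urls(sample_elements)
print(sample_urls)
```

Using `urljoin` rather than string concatenation keeps the result correct whether the page emits relative or absolute links.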

### Get transcript for a single earnings call

In [13]:
# Select one url from list
url = transcript_urls[0]
url

'https://seekingalpha.com/article/4388740-deutsche-telekom-ag-dtegf-ceo-timotheus-hottges-on-q3-2020-results-earnings-call-transcript'

In [17]:
# Visit url
browser.visit(url)

# Parse with BeautifulSoup
soup = Soup(browser.html, 'html.parser')

# Get 'main_content' element
main_content = soup.find('div', {'id':'main_content'})

# HTML containing transcript
main_content.find_all('p', {'class':'p1'})

[<p class="p p1">Deutsche Telekom AG <span class="ticker-hover-wrapper">(<a class="ticker-link" data_retrieved="0" href="https://seekingalpha.com/symbol/DTEGF" symbol="DTEGF">OTCQX:DTEGF</a>)</span> Q3 2020 Earnings Conference Call November 12, 2020  8:00 AM ET</p>,
 <p class="p p1"><strong>Company Participants</strong></p>,
 <p class="p p1">Hannes Wittig - Head, Investor Relations</p>,
 <p class="p p1">Timotheus Höttges - Chief Executive Officer</p>,
 <p class="p p1">Christian Illek - Chief Financial Officer</p>,
 <p class="p p1"><strong>Conference Call Participants</strong></p>,
 <p class="p p1">Polo Tang - UBS</p>,
 <p class="p p1">Frederic Boulan - Bank of America Merrill Lynch</p>,
 <p class="p p1">Ulrich Rathe - Jefferies International</p>,
 <p class="p p1">Joshua Mills - Exane BNP Paribas</p>,
 <p class="p p1">Akhil Dattani - J.P. Morgan</p>,
 <p class="p p1">Christian Fangmann - HSBC Bank</p>,
 <p class="p p1">Robert Grindle - Deutsche Bank</p>,
 <p class="p p1">Jacob Bluestone
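
The cell above stops at the raw `<p>` elements. A minimal sketch of the remaining step, turning those elements into plain text and writing a .txt file, might look like the following. It assumes the "local database" is simply a directory of text files and that the file name comes from the last segment of the url; `paragraphs_to_text` and `save_transcript` are hypothetical helpers, and the sample HTML is a shortened copy of the output above.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def paragraphs_to_text(paragraphs):
    """Join the text of each <p> element, one paragraph per line."""
    # Collapse runs of whitespace inside each paragraph
    lines = (" ".join(p.get_text().split()) for p in paragraphs)
    return "\n".join(line for line in lines if line)


def save_transcript(text, url, out_dir):
    """Write text to <out_dir>/<last url segment>.txt and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    path = out_dir / f"{slug}.txt"
    path.write_text(text, encoding="utf-8")
    return path


# Shortened copy of the transcript HTML shown above
sample_html = """
<div id="main_content">
  <p class="p p1"><strong>Company Participants</strong></p>
  <p class="p p1">Hannes Wittig - Head, Investor Relations</p>
  <p class="p p1">Timotheus Höttges - Chief Executive Officer</p>
</div>
"""
sample_url = "https://seekingalpha.com/article/4388740-deutsche-telekom-ag-dtegf-ceo-timotheus-hottges-on-q3-2020-results-earnings-call-transcript"

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_paragraphs = sample_soup.find_all("p", {"class": "p1"})
sample_text = paragraphs_to_text(sample_paragraphs)

# A temporary directory stands in for the real output folder
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_transcript(sample_text, sample_url, tmp_dir)
    saved_name = saved_path.name
    saved_text = saved_path.read_text(encoding="utf-8")

print(saved_name)
```

Writing with an explicit `encoding="utf-8"` matters here because participant names such as "Höttges" contain non-ASCII characters.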

### Reflections 

The single-transcript steps above can be easily wrapped in a loop over
`transcript_urls` in order to scrape the entire collection.
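
One possible shape for that loop is sketched below. It is not part of the notebook: `scrape_all` is a hypothetical driver that takes the fetch and save steps as callables, pauses between requests, and collects the urls that fail instead of stopping. The two-second default delay is an arbitrary assumption, and the demo uses stand-in functions rather than a live browser.

```python
import time


def scrape_all(urls, fetch_text, save_text, delay_seconds=2.0):
    """Fetch and save every transcript; return the list of urls that failed."""
    failed = []
    for index, url in enumerate(urls):
        try:
            save_text(url, fetch_text(url))
        except Exception as error:
            # Keep going so one bad page does not end the whole run
            print(f"Skipping {url}: {error}")
            failed.append(url)
        # Pause between requests, but not after the last one
        if index < len(urls) - 1:
            time.sleep(delay_seconds)
    return failed


# Stand-ins for the real browser fetch and file write (hypothetical)
demo_store = {}


def fake_fetch(url):
    if url.endswith("broken"):
        raise ValueError("no transcript found")
    return f"transcript for {url}"


demo_failed = scrape_all(
    ["a", "broken", "c"],
    fake_fetch,
    demo_store.__setitem__,
    delay_seconds=0,
)
print(demo_failed)
```

In the notebook, `fetch_text` would wrap the `browser.visit` and BeautifulSoup steps from the previous section, and `save_text` would write the .txt file. Returning the failed urls makes it straightforward to retry them later.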
