# Sweet n Sour Sentiment on the Street
Georgia Tech Data Science Bootcamp - Cohort 6
Final Project
Team Members:
* Joseph Ayala
* Andrew Behrman
* Michael Fox
* Michael Hankinson

### Transcript Scraper

#### This notebook scrapes the most recent earnings call transcript for each of 100 S&P 500 companies, splits each call into four categories (operator dialogue, company presentation, analyst questions, company responses), and produces a dataset of the constituent sentences in each category. A custom-built API does the web scraping and data preparation before each record is inserted into MongoDB.
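
The chunking itself happens inside the custom API, whose code is not shown in this notebook. As a rough illustration of the idea only, the sketch below groups sentences into the four categories. The input shape (a list of `(role, text)` turns), the role labels, the rule that executive turns become answers once the first analyst has spoken, and the names `split_sentences` and `chunk_call` are all assumptions made for this example, not the API's actual behavior.

```python
import re

CATEGORIES = ("operator", "presentation", "question", "answer")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_call(turns):
    """Group the sentences of a call by category.

    `turns` is a hypothetical list of (role, text) pairs in call order, where
    role is "operator", "executive" or "analyst". Executive turns before the
    first analyst turn are treated as presentation, later ones as answers.
    """
    chunks = {category: [] for category in CATEGORIES}
    in_qa = False
    for role, text in turns:
        if role == "operator":
            category = "operator"
        elif role == "analyst":
            in_qa = True
            category = "question"
        else:
            category = "answer" if in_qa else "presentation"
        chunks[category].extend(split_sentences(text))
    return chunks


# Made-up turns, purely for illustration
sample_turns = [
    ("operator", "Welcome to the call. Please hold."),
    ("executive", "Revenue grew this quarter. Margins were stable."),
    ("analyst", "Can you comment on guidance?"),
    ("executive", "We expect steady growth."),
]
sample_chunks = chunk_call(sample_turns)
```

A regex splitter like this will mis-split abbreviations and decimals, so a real pipeline would likely want a proper sentence tokenizer.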

In [1]:
# Dependencies
import requests
import json
import pandas as pd
import random

#### Scrape call list from SeekingAlpha website
Get transcript data for specified tickers

Source: SeekingAlpha.com

In [4]:
# read in list of tickers
list_tickers = pd.read_csv('../db/hundred_tickers.csv')
#list_tickers = list_tickers[77:100]
list_tickers

Unnamed: 0,ticker
0,VRTX
1,D
2,WELL
3,ORCL
4,NTRS
...,...
102,UAA
103,WFC
104,TWTR
105,AMZN


##### Iterate through the list of tickers, call the API for each, and store the resulting record in MongoDB
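
The loop in the next cell gives up on a ticker as soon as anything raises, and its output shows many tickers skipped that way. One possible refinement, sketched here under assumptions, is to retry a failed call a few times with a growing delay before skipping. The helper name `fetch_with_retry` and its parameters are hypothetical and not part of the project; the fetch function is passed in so the sketch stays independent of the API.

```python
import time


def fetch_with_retry(fetch, ticker, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(ticker) up to `attempts` times, doubling the delay each time.

    Returns whatever fetch returns, or None if every attempt raised.
    `sleep` is injectable so the helper can be exercised without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(ticker)
        except Exception as err:
            # Report the actual error instead of hiding it
            print(f'{ticker}: attempt {attempt} failed ({err!r})')
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None
```

In this notebook, `fetch` could be a small function that wraps the existing `requests.get(...)` call and returns `response.json()`. A `None` result would then mean the ticker is skipped, as it is today.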

In [3]:
import pymongo
import time

# Connect to the local MongoDB server with PyMongo
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Define database and collection
db = client.sweet_n_sour
collection = db.NEW_call_transcripts

# process each ticker
for ticker in list_tickers['ticker']:

    try:
        # call the scraper api for this ticker
        response = requests.get(f'http://127.0.0.1:5000/api/get-scripts/{ticker}/1')
        call_dict = response.json()  # despite the name, this is indexed as a list below

        # store the first record in mongo; an empty list means nothing came back
        if len(call_dict) > 0:
            collection.insert_one(call_dict[0])
            print(f'{ticker}: wrote to mongo')
        else:
            print(f'{ticker}: {response.text}')

        # pause a few seconds between calls
        sleeper = random.choice([1,2,3,4])
        print(f'sleeping {sleeper} seconds...')
        time.sleep(sleeper)

    except Exception:
        # any failure (request, JSON decoding, insert) skips this ticker without pausing;
        # unlike a bare except, this still lets Ctrl-C interrupt the run
        print(f'{ticker}: caught error, skipping...')


VRTX: wrote to mongo
sleeping 4 seconds...
D: []

sleeping 1 seconds...
WELL: []

sleeping 4 seconds...
ORCL: []

sleeping 1 seconds...
NTRS: caught error, skipping...
RCL: []

sleeping 3 seconds...
REG: caught error, skipping...
EQT: caught error, skipping...
CFG: []

sleeping 4 seconds...
TSN: []

sleeping 3 seconds...
DVA: []

sleeping 1 seconds...
EVHC: []

sleeping 2 seconds...
CRM: []

sleeping 1 seconds...
MMC: []

sleeping 1 seconds...
BF.B: []

sleeping 3 seconds...
UNM: []

sleeping 2 seconds...
ADM: []

sleeping 2 seconds...
ETR: []

sleeping 2 seconds...
SCG: []

sleeping 4 seconds...
PX: []

sleeping 1 seconds...
GLW: caught error, skipping...
BKNG: caught error, skipping...
ROP: caught error, skipping...
UDR: caught error, skipping...
MDT: caught error, skipping...
MSI: caught error, skipping...
CBRE: caught error, skipping...
NFLX: caught error, skipping...
MLM: caught error, skipping...
ANDV: caught error, skipping...
AIZ: caught error, skipping...
CMI: caught error, sk