# Sweet n Sour Sentiment on the Street
Georgia Tech Data Science Bootcamp - Cohort 6
Final Project
Team Members:
* Joseph Ayala
* Andrew Behrman
* Michael Fox
* Michael Hankinson

### Transcript Scraper

#### This notebook scrapes the most recent earnings call transcript for each of 100 S&P 500 companies, splits each call into four categories (operator dialogue, company presentation, analyst questions, company responses), and produces a dataset of the constituent sentences in each category. It uses a custom-built API to do the web scraping and data preparation before inserting the results into MongoDB.
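The four-way split described above can be illustrated with a minimal sketch. This is not the custom API's implementation, which this notebook does not show. It assumes each transcript has already been parsed into `(role, text)` segments, that the role labels are `"operator"`, `"executive"`, and `"analyst"`, and that the Q&A portion begins with the first analyst turn. The names `split_sentences` and `categorize` are hypothetical.

```python
import re

CATEGORIES = ("operator", "presentation", "question", "response")

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace.
    Abbreviations such as "Inc." will be split incorrectly."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def categorize(segments):
    """Bucket the sentences of (role, text) segments into the four categories.

    Assumption: executive speech before the first analyst turn is the prepared
    presentation, and executive speech after it is a response.
    """
    buckets = {name: [] for name in CATEGORIES}
    in_qa = False
    for role, text in segments:
        if role == "operator":
            category = "operator"
        elif role == "analyst":
            category = "question"
            in_qa = True
        else:
            category = "response" if in_qa else "presentation"
        buckets[category].extend(split_sentences(text))
    return buckets

# Made-up segments, for illustration only
sample_segments = [
    ("operator", "Welcome to the call. Please stand by."),
    ("executive", "Revenue grew this quarter. Margins improved."),
    ("analyst", "What drove the growth?"),
    ("executive", "Mostly new customers."),
]
sample_buckets = categorize(sample_segments)
for name in CATEGORIES:
    print(name, sample_buckets[name])
```

A production splitter would likely use a tokenizer that handles abbreviations and decimal numbers. The regex here only keeps the sketch dependency-free.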

In [1]:
# Dependencies
import requests
import json
import pandas as pd
import random

#### Scrape call list from SeekingAlpha website
Get transcript data for specified tickers

Source: SeekingAlpha.com

In [2]:
# read in list of tickers
list_tickers = pd.read_csv('../db/hundred_tickers.csv')
#list_tickers = list_tickers[77:100]
list_tickers

| Unnamed: 0 | ticker |
| --- | --- |
| 0 | VRTX |
| 1 | D |
| 2 | WELL |
| 3 | ORCL |
| 4 | NTRS |
| ... | ... |
| 102 | UAA |
| 103 | WFC |
| 104 | TWTR |
| 105 | AMZN |
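The `Unnamed: 0` column in the output above suggests the CSV's first column is a saved row index with no header. As a small sketch of one way to load it cleanly, the first column can be read as the index and the tickers reduced to a plain list. The in-memory CSV below is a stand-in for `../db/hundred_tickers.csv` (an assumption about its layout), using the first five rows shown above.

```python
import io
import pandas as pd

# Stand-in for ../db/hundred_tickers.csv: an unnamed index column plus 'ticker'
sample_csv = io.StringIO(",ticker\n0,VRTX\n1,D\n2,WELL\n3,ORCL\n4,NTRS\n")

# index_col=0 treats the first column as the row index instead of data
ticker_frame = pd.read_csv(sample_csv, index_col=0)

# Normalize and de-duplicate so each ticker is requested only once
tickers = (
    ticker_frame["ticker"]
    .str.strip()
    .str.upper()
    .drop_duplicates()
    .tolist()
)
print(tickers)  # ['VRTX', 'D', 'WELL', 'ORCL', 'NTRS']
```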


##### Iterate through the list of tickers, call the API for each, and store the resulting record in MongoDB

In [37]:
import pymongo
import time

# Initialize PyMongo to work with MongoDB
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Define database and collection
db = client.sweet_n_sour
collection = db.NEW_call_transcripts

# process each ticker
for ticker in list_tickers['ticker']:
    
    try:
        # call api
        response = requests.get(f'http://127.0.0.1:5000/api/get-scripts/{ticker}/1')
        call_dict = response.json()

        ## DEBUG
        #json.dumps(call_dict[0])
        ##

        # store record in mongo
        if len(call_dict) > 0:
            collection.insert_one(call_dict[0])
            print(f'{ticker}: wrote to mongo')
        else:
            print(f'{ticker}: {response.text}')

        # pause a few seconds between calls
        sleeper = random.choice([1,2,3,4])
        print(f'sleeping {sleeper} seconds...')
        time.sleep(sleeper)
        
    except Exception:
        # network, JSON-decoding, or database error: skip this ticker and continue
        print(f'{ticker}: caught error, skipping...')


VAR: wrote to mongo
sleeping 1 seconds...
MET: caught error, skipping...
CAG: wrote to mongo
sleeping 4 seconds...
ACN: wrote to mongo
sleeping 4 seconds...
DVN: wrote to mongo
sleeping 4 seconds...
VTR: wrote to mongo
sleeping 2 seconds...
NEM: wrote to mongo
sleeping 4 seconds...
NFX: []

sleeping 4 seconds...
BAC: wrote to mongo
sleeping 2 seconds...
CINF: wrote to mongo
sleeping 4 seconds...
MTB: wrote to mongo
sleeping 1 seconds...
AVB: wrote to mongo
sleeping 4 seconds...
VRSN: wrote to mongo
sleeping 2 seconds...
CNP: wrote to mongo
sleeping 2 seconds...
KSS: wrote to mongo
sleeping 4 seconds...
TWX: wrote to mongo
sleeping 1 seconds...
AMAT: wrote to mongo
sleeping 1 seconds...
NOV: wrote to mongo
sleeping 2 seconds...
LLY: wrote to mongo
sleeping 3 seconds...
M: wrote to mongo
sleeping 2 seconds...
MAR: wrote to mongo
sleeping 4 seconds...
ALK: wrote to mongo
sleeping 3 seconds...
PRU: wrote to mongo
sleeping 1 seconds...
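The log above shows three outcomes per ticker: a record written, an empty response (`NFX`), or a caught error (`MET`). As a hedged sketch, the same loop can be restructured so that those outcomes are collected rather than only printed, which makes it easy to retry the failures later. The function name `process_tickers` and its `fetch`, `store`, and `pause` parameters are hypothetical, not part of the notebook. Unlike the cell above, this version also pauses after a failed call.

```python
import random
import time

def process_tickers(tickers, fetch, store, pause=time.sleep):
    """Fetch and store one record per ticker.

    fetch(ticker) must return a list of records (possibly empty);
    store(record) persists a single record.
    Returns (written, empty, failed), where failed holds (ticker, error type).
    """
    written, empty, failed = [], [], []
    for ticker in tickers:
        try:
            records = fetch(ticker)
            if records:
                store(records[0])
                written.append(ticker)
            else:
                empty.append(ticker)
        except Exception as err:
            # Keep going, but remember which ticker failed and why
            failed.append((ticker, type(err).__name__))
        finally:
            # Pause 1 to 4 seconds between calls, as in the loop above
            pause(random.choice([1, 2, 3, 4]))
    return written, empty, failed

# Demonstration with in-memory fakes instead of the API and MongoDB
def fake_fetch(ticker):
    if ticker == "MET":
        raise ValueError("bad payload")
    return [] if ticker == "NFX" else [{"ticker": ticker}]

stored = []
written, empty, failed = process_tickers(
    ["VAR", "MET", "NFX"], fake_fetch, stored.append, pause=lambda seconds: None
)
print(written, empty, failed)
```

In the notebook itself, `fetch` could wrap the `requests.get(...).json()` call and `store` could be `collection.insert_one`. The tickers in `failed` can then be passed back in for a second attempt.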
