# Scraping Yahoo Finance

## Week 7. Practice Programming Assignment 1

In this assignment you will examine historical data for the 30 companies in the [Dow Jones Index](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average). The tickers of the companies in the index are listed in *dow_jones_tickers.txt*. For each company, get the historical daily stock prices for 2019 from https://finance.yahoo.com/, then use the data to answer the questions below.

###### Import coursera grader tools

In [1]:
import sys
sys.path.append("..")
import grading
grader = grading.Grader(assignment_key="FpZrXMbETcuStX7z6jFv2Q", 
                      all_parts=["RbP2k", "LLRai", "hgIbw", "Tx3OG", "Y05pG",
                                 "lFPeF", "htUtf", "Xri0I", "4JfUm", "oD7pP"])

In [126]:
# token expires every 30 min
COURSERA_EMAIL = 'slavik9709@gmail.com'
COURSERA_TOKEN = '8inVeEacbZsz8ODC'

<br><br><br><br>

### Coding part

In [56]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support import expected_conditions as EC
# from selenium.webdriver.support.ui import WebDriverWait
import requests
import re
from bs4 import BeautifulSoup
from tqdm import tqdm
import pandas as pd
import numpy as np

In [4]:
webdriver_path = r'C:\Users\Yaroslav\Documents\MDS2020\coding\data_scraping\chromedriver.exe'

In [5]:
dow_jones_companies = []

with open('dow_jones_tickers.txt') as f:
    for ticker in f:
        dow_jones_companies.append(ticker.strip())

In [6]:
driver = webdriver.Chrome(webdriver_path)

In [7]:
url = 'https://finance.yahoo.com/quote/AAA/history?period1=1546300800&period2=1577836800&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true'
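The `period1` and `period2` query parameters in the URL above appear to be Unix timestamps in seconds. Decoding them suggests the requested window is calendar year 2019 in UTC. A minimal sketch using only the standard library (the variable names `start` and `end` are illustrative):

```python
from datetime import datetime, timezone

# Values copied from the query string of the URL in the cell above
period1 = 1546300800
period2 = 1577836800

# Interpret both values as seconds since the Unix epoch, in UTC
start = datetime.fromtimestamp(period1, tz=timezone.utc)
end = datetime.fromtimestamp(period2, tz=timezone.utc)

print(start.isoformat())  # 2019-01-01T00:00:00+00:00
print(end.isoformat())    # 2020-01-01T00:00:00+00:00
```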

In [41]:
data = []
for ticker in tqdm(dow_jones_companies):
    driver.get(url.replace('AAA', ticker))
    # Scroll down repeatedly so that the rows further down the table get rendered
    for _ in range(10):
        driver.find_element_by_tag_name('html').send_keys([Keys.PAGE_DOWN] * 30)
    soup = BeautifulSoup(driver.page_source)
    results = soup.find_all('tr', {'class': 'BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)'})
    
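    # Expect about 252 rows per ticker; DOW is allowed to have fewer
    if len(results) < 250 and ticker != 'DOW':
        raise Exception(f'something wrong with {ticker}!')

    for el in results:
        record = el.find_all('span')
        # Skip rows that do not hold exactly 7 fields (not a daily price record)
        if len(record) != 7:
            continue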
        data.append({'ticker': ticker,
                    'date': record[0].text,
                    'open': float(record[1].text),
                    'high': float(record[2].text),
                    'low': float(record[3].text),
                    'close': float(record[4].text),
                    'adj_close': float(record[5].text),
                    'volume': int(record[6].text.replace(',', ''))})

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [02:01<00:00,  4.05s/it]


In [99]:
data = pd.DataFrame(data)
data['date'] = pd.to_datetime(data['date'])
data['chg'] = (data['close'] / data['open'] - 1) * 100
data['month'] = data['date'].dt.month

In [100]:
data.head()

| | ticker | date | open | high | low | close | adj_close | volume | chg | month |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AXP | 2019-12-31 | 124.29 | 124.57 | 123.78 | 124.49 | 121.51 | 2340400 | 0.160914 | 12 |
| 1 | AXP | 2019-12-30 | 125.2 | 125.46 | 124.18 | 124.3 | 121.33 | 2306500 | -0.71885 | 12 |
| 2 | AXP | 2019-12-27 | 125.84 | 125.97 | 125.11 | 125.19 | 122.2 | 1788600 | -0.516529 | 12 |
| 3 | AXP | 2019-12-26 | 124.98 | 125.44 | 124.53 | 125.41 | 122.41 | 1486600 | 0.344055 | 12 |
| 4 | AXP | 2019-12-24 | 124.95 | 125.33 | 124.38 | 124.74 | 121.76 | 953500 | -0.168067 | 12 |


In [43]:
print(data.shape)

(7507, 8)


In [44]:
252 * 30

7560

In [59]:
px_chg = []
for ticker in dow_jones_companies:
    df = data[data['ticker'] == ticker]
    # Rows are ordered newest first: iloc[0] is the last trading day, iloc[-1] the first
    px_chg.append((df['close'].iloc[0] / df['open'].iloc[-1] - 1) * 100)

In [69]:
dow_jones_companies[np.argmin(px_chg)]

'WBA'

In [94]:
data['chg'].argmin()

5723

In [95]:
data.iloc[5723, 0]

'UNH'

In [107]:
monthly_data = []
for ticker in dow_jones_companies:
    for m in data[data['ticker'] == ticker]['month'].unique():
        df = data[(data['ticker'] == ticker) & (data['month'] == m)]
        # Newest first: iloc[0] is the month's last trading day, iloc[-1] its first
        chg = (df['close'].iloc[0] / df['open'].iloc[-1] - 1) * 100
        monthly_data.append({'ticker': ticker, 'month': m, 'chg': chg})

monthly_data = pd.DataFrame(monthly_data)

In [120]:
# argmax returns a position, not a month number: position 0 is month 1 (January)
monthly_data.groupby(['month'])['chg'].mean().argmax()

0

In [55]:
driver.get(url.replace('AAA', 'AAPL'))
for i in range(20):
    driver.find_element_by_tag_name('html').send_keys([Keys.PAGE_DOWN for i in range(30)])

In [11]:
soup = BeautifulSoup(driver.page_source)

In [12]:
results = soup.find_all('tr', {'class': 'BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)'})

In [25]:
results[0].find_all('span')[0]

<span data-reactid="53">Dec 31, 2019</span>

In [132]:
driver.quit()

<br><br>

### Questions

<br><br>

**Question 1.** What is the average change of price over the year (in %)?

*Note 1*. The opening price is the price at which a stock first trades upon the opening of an exchange on a trading day.

*Note 2*. The closing price for any stock is the final price at which it trades during regular market hours on any given day.

*Note 3*. Here, the price change means the ratio of the closing price on the last day of the period to the opening price on the first day of that period, minus one, multiplied by 100.

Example: if a stock opened at \\$100 on day 1 and closed at \\$120 on the last day, then the price change over the period is: $$\left(\dfrac{120}{100}-1\right) \times 100 = (1.2 - 1) \times 100 = 20.$$

The price grew by 20%.
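The definition in Note 3 can be sketched as a small function. The name `price_change` and its parameter names are hypothetical and are not part of the assignment or the grader:

```python
def price_change(first_open: float, last_close: float) -> float:
    """Percentage change from the first day's open to the last day's close."""
    return (last_close / first_open - 1) * 100


# The worked example above: opened at 100, closed at 120
print(round(price_change(100, 120), 6))  # 20.0
```

A negative result means the price fell over the period.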

In [63]:
answer_part_1 = 23.664716525481847

In [64]:
# Setting our answers to grader. Do not change!
grader.set_answer("RbP2k", answer_part_1)

<br>

**Question 2.** Which company's stock price grew the most (in %)? Enter the company's ticker as the answer.

In [67]:
answer_part_2 = 'AAPL'

In [68]:
# Setting our answers to grader. Do not change!
grader.set_answer("LLRai", answer_part_2)

<br>

**Question 3.** Which company's stock price fell the most (in %)? Enter the company's ticker as the answer.

In [70]:
answer_part_3 = 'WBA'

In [71]:
# Setting our answers to grader. Do not change!
grader.set_answer("hgIbw", answer_part_3)

<br>

**Question 4.** Which company had the largest total trading volume over the year? Enter the company's ticker as the answer.
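One possible approach, assuming a DataFrame shaped like `data` from the coding part with `ticker` and `volume` columns, is to sum the volume per ticker and take the label of the largest sum. The tickers and values below are made up purely for illustration:

```python
import pandas as pd

# Toy data with invented tickers and volumes (not real market data)
toy = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'BBB', 'BBB'],
    'volume': [100, 200, 150, 400],
})

# Sum the daily volumes per ticker, then pick the ticker with the largest total
total_volume = toy.groupby('ticker')['volume'].sum()
top_ticker = total_volume.idxmax()
print(top_ticker)  # BBB
```

`idxmax` returns the index label (here the ticker) rather than a position, so no extra lookup is needed.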

In [78]:
answer_part_4 = 'AAPL'

In [79]:
# Setting our answers to grader. Do not change!
grader.set_answer("Tx3OG", answer_part_4)

<br>

**Question 5.** What is the biggest daily stock price increase (in %)? Enter the number.

In [84]:
answer_part_5 = 7.740997118000381

In [85]:
# Setting our answers to grader. Do not change!
grader.set_answer("Y05pG", answer_part_5)

<br><br>

**Question 6.** Which company had the biggest daily stock price increase? Enter the company's ticker as the answer.

In [89]:
answer_part_6 = 'BA'

In [90]:
# Setting our answers to grader. Do not change!
grader.set_answer("lFPeF", answer_part_6)

<br>

**Question 7.** What is the biggest daily stock price decrease (in %)? Enter the number.

In [92]:
answer_part_7 = -7.16356455611108

In [93]:
# Setting our answers to grader. Do not change!
grader.set_answer("htUtf", answer_part_7)

<br>

**Question 8.** Which company had the biggest daily stock price decrease (in %)? Enter the company's ticker as the answer.

In [96]:
answer_part_8 = 'UNH'

In [97]:
# Setting our answers to grader. Do not change!
grader.set_answer("Xri0I", answer_part_8)

<br>

**Question 9.** What was the best month for all companies (i.e. average monthly price increase was the best)? Enter one of the following: January, February, March, April, May, June, July, August, September, October, November, December
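The `month` column computed in the coding part holds integers from 1 to 12, while the answer here must be a month name. A small sketch of the conversion using the standard library's `calendar` module (the helper name `month_label` is hypothetical, and English names assume the default locale):

```python
import calendar


def month_label(month_number: int) -> str:
    """Map a month number (1-12) to its name, e.g. 1 -> 'January'."""
    if not 1 <= month_number <= 12:
        raise ValueError(f'month must be in 1..12, got {month_number}')
    return calendar.month_name[month_number]


print(month_label(1))  # January
```

When a positional result such as `argmax()` is used instead of a month number, add 1 before the lookup, since position 0 corresponds to month 1.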

In [127]:
answer_part_9 = 'January'

In [128]:
# Setting our answers to grader. Do not change!
grader.set_answer("4JfUm", answer_part_9)

<br>

**Question 10.** What was the worst month for all companies (i.e. average monthly price increase was the worst)? Enter one of the following: January, February, March, April, May, June, July, August, September, October, November, December

In [129]:
answer_part_10 = 'May'

In [130]:
# Setting our answers to grader. Do not change!
grader.set_answer("oD7pP", answer_part_10)

<br>
<br>

### Submitting answers

In [131]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

Submitted to Coursera platform. See results on assignment page!
