# Scraping Yahoo Finance

## Week 7. Practice Programming Assignment 1

In this assignment you will look at historical data for the 30 companies in the [Dow Jones Index](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average). The tickers of the companies in the index are listed in *dow_jones_tickers.txt*. For each company, get the historical daily stock prices for 2019 from https://finance.yahoo.com/, then use the data to answer the questions below.

### Coding part

In [1]:
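import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement


def get_elem(selector: str, name: str) -> WebElement:
    # Poll the global driver until the element shows up on the page.
    while True:
        try:
            return driver.find_element(selector, name)
        except NoSuchElementException:
            time.sleep(0.5)


def get_elems(elem: WebElement, selector: str, name: str) -> list[WebElement]:
    # Retry on Selenium errors (for example, a stale element after a re-render).
    while True:
        try:
            return elem.find_elements(selector, name)
        except WebDriverException:
            time.sleep(1)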


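def get_months(lst: list, best: tuple, worst: tuple) -> tuple:
    """Update the best and worst (month, % change) pairs with one ticker's rows.

    This tracks the extreme month of a single ticker, not an average across companies.
    """
    ms = dict()
    for line in lst:
        # Skip rows that do not have all seven columns.
        if len(line) != 7:
            continue

        # Rows are assumed newest first, so the first row seen for a month is its last trading day.
        m = line[0].text[:3]
        if m in ms:
            # Column 4 (close); overwritten until the month's first trading day is reached.
            ms[m]["s"] = float(line[4].text)
        else:
            # Column 1 (open) of the month's last trading day.
            ms[m] = {"s": 0, "e": float(line[1].text)}

    # Months with a single row still have "s" == 0; skip them to avoid dividing by zero.
    ms = {k: (v["e"] / v["s"] - 1) * 100 for k, v in ms.items() if v["s"] != 0}
    ranked = sorted(ms.items(), key=lambda x: x[1])

    if ranked[0][1] < worst[1]:
        worst = ranked[0]

    if ranked[-1][1] > best[1]:
        best = ranked[-1]

    return best, worst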


driver = webdriver.Chrome()
price_change = dict()
volume = dict()
inc = ("", 0)
dec = ("", 0)
best = ("", 0)
worst = ("", 0)
with open("dow_jones_tickers.txt", mode="r", encoding="utf-8") as f:
    for ticker in f:
        if ticker.strip() == "":
            break

        driver.get(f"https://finance.yahoo.com/quote/{ticker.strip()}/history")
        get_elem(By.CLASS_NAME, "tertiary-btn").click()
        if ticker.strip() == "DOW":
            dt = "21.03.2019"
        else:
            dt = "01.01.2019"

        get_elem(By.NAME, "startDate").send_keys(dt)
        get_elem(By.NAME, "endDate").send_keys("01.01.2020")
        get_elem(By.CLASS_NAME, "primary-btn").click()
        time.sleep(1.5)

        tbody = get_elem(By.TAG_NAME, "tbody")
        lines = get_elems(tbody, By.XPATH, "./*")
        date_1 = float(get_elems(lines[-1], By.XPATH, "./*")[1].text)
        date_2 = float(get_elems(lines[0], By.XPATH, "./*")[-3].text)
        print(ticker.strip(), date_2, date_1)
        ls = [get_elems(l, By.XPATH, "./*") for l in lines]
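        # Volume is the last column; strip thousands separators before converting.
        vols = [l[-1].text.replace(",", "") for l in ls if len(l) == 7]
        # NOTE: max() keeps the largest single-day volume; the yearly total
        # that Question 4 asks about would be sum() instead.
        volume[ticker.strip()] = max(map(float, vols))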

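        # Daily ratio of open (column 1) to close (column 4).
        # NOTE: by Note 3 a daily price change would be close / open;
        # it is unclear whether high / open was intended instead.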
        incs = [float(l[1].text) / float(l[4].text) for l in ls if len(l) == 7]
        incs_max = (max(incs) - 1) * 100
        if incs_max > inc[1]:
            inc = (ticker.strip(), incs_max)

        incs_min = (min(incs) - 1) * 100
        if incs_min < dec[1]:
            dec = (ticker.strip(), incs_min)

        best, worst = get_months(ls, best, worst)
        price_change[ticker.strip()] = ((date_2 / date_1) - 1) * 100

AXP 124.49 93.91
AAPL 73.41 38.72
BA 325.76 316.19
CAT 147.68 124.03
CSCO 47.96 42.28
CVX 120.51 107.34
XOM 69.78 67.35
GS 229.93 164.33
HD 218.38 169.71
IBM 128.15 107.08
INTC 59.85 45.96
JNJ 145.87 128.13
KO 55.35 46.94
JPM 139.4 95.95
MCD 197.61 175.41
MMM 147.51 157.04
MRK 86.78 71.84
MSFT 157.7 99.55
NKE 101.31 72.79
PFE 37.17 40.91
PG 124.9 91.03
TRV 136.95 117.49
UNH 293.98 245.0
RTX 94.25 66.18
VZ 61.4 56.16
V 187.9 130.0
WBA 58.96 67.2
WMT 39.61 30.55
DIS 144.63 108.1
DOW 54.73 49.99
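
The scraping loop above relies on each price row having seven cells. As a rough sketch, assuming the column order is Date, Open, High, Low, Close, Adj Close, Volume (inferred from the indices used in the loop, not confirmed against the page), the cell texts of one row could be converted into a typed record as follows. The `parse_row` helper and the sample values are hypothetical.

```python
from typing import Optional

# Assumed column order of a history-table row (an assumption, see the text above).
COLUMNS = ("date", "open", "high", "low", "close", "adj_close", "volume")


def parse_row(cells: list[str]) -> Optional[dict]:
    """Turn the cell texts of one table row into a dict, or None for short rows."""
    if len(cells) != len(COLUMNS):
        return None  # the row does not carry all seven price columns
    record = {"date": cells[0]}
    for name, text in zip(COLUMNS[1:], cells[1:]):
        record[name] = float(text.replace(",", ""))  # drop thousands separators
    return record


# Hypothetical sample row, made up for illustration only.
row = parse_row(["Dec 31, 2019", "100.00", "102.50", "99.00", "101.00", "101.00", "1,234,500"])
print(row["open"], row["close"], row["volume"])
```

Working with named fields instead of positional indices such as `[1]` or `[-3]` makes it harder to mix up the open and close columns later on.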


In [16]:
price_change

{'WBA': -12.261904761904763,
 'PFE': -9.142019066242957,
 'MMM': -6.068517575140097,
 'BA': 3.0266611847306946,
 'XOM': 3.6080178173719446,
 'VZ': 9.330484330484335,
 'DOW': 9.481896379275835,
 'CVX': 12.269424259362772,
 'MCD': 12.656062938258938,
 'CSCO': 13.434247871333959,
 'JNJ': 13.845313353625222,
 'TRV': 16.56311175419185,
 'KO': 17.91648913506605,
 'CAT': 19.067967427235356,
 'IBM': 19.676877101232737,
 'UNH': 19.99183673469389,
 'MRK': 20.796213808463236,
 'HD': 28.678333627953556,
 'WMT': 29.656301145662844,
 'INTC': 30.22193211488251,
 'AXP': 32.56309232243637,
 'DIS': 33.792784458834426,
 'PG': 37.207514006371525,
 'NKE': 39.18120620964418,
 'GS': 39.919673827055306,
 'RTX': 42.41462677546084,
 'V': 44.53846153846155,
 'JPM': 45.28400208441896,
 'MSFT': 58.41285786037167,
 'AAPL': 89.59194214876032}

In [30]:
volume

{'AXP': 9872500.0,
 'AAPL': 365248800.0,
 'BA': 36922600.0,
 'CAT': 17421400.0,
 'CSCO': 103123400.0,
 'CVX': 42693700.0,
 'XOM': 35092000.0,
 'GS': 15194200.0,
 'HD': 14972200.0,
 'IBM': 23078630.0,
 'INTC': 86455700.0,
 'JNJ': 25868700.0,
 'KO': 58905400.0,
 'JPM': 31115200.0,
 'MCD': 17662100.0,
 'MMM': 17516855.0,
 'MRK': 46684837.0,
 'MSFT': 55636400.0,
 'NKE': 25330700.0,
 'PFE': 95739668.0,
 'PG': 30802700.0,
 'TRV': 3551600.0,
 'UNH': 27361400.0,
 'RTX': 21901664.0,
 'VZ': 42977900.0,
 'V': 20162000.0,
 'WBA': 36877800.0,
 'WMT': 67512600.0,
 'DIS': 65253500.0,
 'DOW': 19663400.0}

In [18]:
incs_min, incs_max

(-4.203650979735385, 5.388994307400363)

In [12]:
best, worst

(('Apr', 23.651231001688732), ('May', -16.283488504655143))

<br><br>

### Questions

<br><br>

**Question 1.** What is the average price change over the year (in %)?

*Note 1*. The opening price is the price at which a stock first trades upon the opening of an exchange on a trading day.

*Note 2*. The closing price for any stock is the final price at which it trades during regular market hours on any given day.

*Note 3*. Here the price change means the ratio of the closing price on the last day of the period to the opening price on the first day of that period, minus one, multiplied by 100.

Example: if a stock opened at \\$100 on the first day of the period and closed at \\$120 on the last day, then the price change over the period is: $$\left(\dfrac{120}{100}-1\right) \times 100 = (1.2 - 1) \times 100 = 20.$$

The price grew by 20%.
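
The definition in Note 3 can be written as a small function. This is only an illustrative sketch; the name `price_change_pct` and its parameter names are not part of the assignment.

```python
def price_change_pct(first_open: float, last_close: float) -> float:
    """Percentage change from the first open to the last close of a period."""
    return (last_close / first_open - 1) * 100


# The worked example above: opened at 100, closed at 120.
change = price_change_pct(first_open=100, last_close=120)
print(round(change, 2))  # 20.0
```

Rounding is used only for display, because the floating-point result is very slightly below 20.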

In [9]:
import pandas as pd

pd.Series(price_change.values()).mean()

23.85516302707843

In [10]:
import numpy as np

answer_part_1 = np.mean(list(price_change.values()))
answer_part_1

23.85516302707843

<br>

**Question 2.** Which company's stock price grew the most (in %)? Enter the ticker of the company as the answer.

In [21]:
price_change = dict(sorted(price_change.items(), key=lambda x: x[1]))
answer_part_2 = list(price_change.items())[-1][0]
answer_part_2  # AAPL

'AAPL'

<br>

**Question 3.** Which company's stock price fell the most (in %)? Enter the ticker of the company as the answer.

In [22]:
answer_part_3 = list(price_change.items())[0][0]
answer_part_3  # WBA

'WBA'

<br>

**Question 4.** Which company had the largest total volume over the year? Enter the ticker of the company as the answer.

In [32]:
volume = dict(sorted(volume.items(), key=lambda x: x[1]))
answer_part_4 = list(volume.items())[-1][0]
answer_part_4  # AAPL

'AAPL'
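
Question 4 asks about volume over the whole year, which suggests summing the daily volumes per ticker. A minimal sketch of that aggregation, using made-up tickers and numbers (`total_volume`, `AAA` and `BBB` are hypothetical):

```python
def total_volume(volume_cells: list[str]) -> float:
    """Sum daily volumes given as strings with thousands separators."""
    return sum(float(text.replace(",", "")) for text in volume_cells)


# Made-up daily volume cells for two hypothetical tickers.
daily = {"AAA": ["1,000", "2,500"], "BBB": ["3,000", "100"]}

totals = {ticker: total_volume(cells) for ticker, cells in daily.items()}
largest = max(totals, key=totals.get)
print(totals, largest)
```

In this toy data `BBB` has the largest single day but `AAA` has the largest total, so the choice between a maximum and a sum can change the answer.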

<br>

**Question 5.** What is the biggest stock price daily increase (in %)? Enter the number 

In [24]:
answer_part_5 = inc[1]
answer_part_5  # 7.716328747284562

7.716328747284562

<br><br>

**Question 6.** Which company had the biggest daily stock price increase? Enter the ticker of the company as the answer.

In [34]:
answer_part_6 = inc[0]
answer_part_6  # BA

'BA'

<br>

**Question 7.** What is the biggest stock price daily decrease (in %)? Enter the number

In [26]:
answer_part_7 = dec[1]
answer_part_7  # -7.184820379490519

-7.184820379490519
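
Questions 5 to 8 ask for both the size of the extreme daily move and the company behind it. One way to keep the two together is to compare `(change, ticker)` pairs. The sketch below applies the Note 3 definition to a single day (close over open); the tickers and prices are made up, and `daily_changes` is a hypothetical helper.

```python
def daily_changes(rows: list[tuple[float, float]]) -> list[float]:
    """Percentage change from open to close for each (open, close) pair."""
    return [(close / open_ - 1) * 100 for open_, close in rows]


# Made-up (open, close) pairs for two hypothetical tickers.
prices = {
    "AAA": [(100.0, 105.0), (105.0, 103.95)],
    "BBB": [(50.0, 48.0), (48.0, 49.2)],
}

# Flatten to (change, ticker) pairs so each extreme keeps its ticker.
pairs = [(chg, ticker) for ticker, rows in prices.items() for chg in daily_changes(rows)]
biggest_increase = max(pairs)
biggest_decrease = min(pairs)
print(biggest_increase, biggest_decrease)
```

Because the value and the ticker come from the same pair, the answers to the "how much" and "which company" questions cannot drift apart.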

<br>

**Question 8.** Which company had the biggest daily stock price decrease (in %)? Enter the ticker of the company as the answer.

In [13]:
answer_part_8 = dec[0]
answer_part_8  # UNH

'UNH'

<br>

**Question 9.** What was the best month for all companies (i.e. average monthly price increase was the best)? Enter one of the following: January, February, March, April, May, June, July, August, September, October, November, December

In [3]:
answer_part_9 = best[0]
answer_part_9
# ! May, April, Apr, March, December

'Apr'

<br>

**Question 10.** What was the worst month for all companies (i.e. average monthly price increase was the worst)? Enter one of the following: January, February, March, April, May, June, July, August, September, October, November, December

In [5]:
answer_part_10 = worst[0]
answer_part_10  # May

'May'
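
Questions 9 and 10 are phrased in terms of the average monthly change across all companies. A minimal sketch of that averaging step, assuming the monthly changes per ticker have already been computed; the tickers and percentages below are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up monthly changes (in %) for hypothetical tickers.
monthly = {
    "AAA": {"Jan": 4.0, "Feb": -2.0},
    "BBB": {"Jan": 1.0, "Feb": 3.0},
    "CCC": {"Feb": -7.0},  # a ticker may lack data for some months
}

# Collect every ticker's change under its month.
by_month = defaultdict(list)
for changes in monthly.values():
    for month, pct in changes.items():
        by_month[month].append(pct)

# Average across tickers, then pick the extreme months.
averages = {month: mean(values) for month, values in by_month.items()}
best_month = max(averages, key=averages.get)
worst_month = min(averages, key=averages.get)
print(averages, best_month, worst_month)
```

Averaging first and comparing afterwards differs from tracking the single most extreme company-month, which can be dominated by one outlier stock.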

<br>
<br>

#### Submit your answers