## Pulling Option Chains from Finance Yahoo

## Installation Steps

This parser depends on the following being installed:
- yahoo_fin
- MySQL

To install yahoo_fin, see:
http://theautomatic.net/yahoo_fin-documentation/

To install MySQL, see:
https://dev.mysql.com/downloads/mysql/
An Oracle user account and password are required to obtain this download. Registration and use of MySQL Community Server are free.

The code below also imports sqlalchemy and pymysql. Install both from a terminal:
- pip install sqlalchemy pymysql
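
Before running the cells below, it can help to confirm that the required modules are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is an assumption read off the import statements in this document.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Top-level modules imported by the code in this document (assumed list)
required = ["yahoo_fin", "sqlalchemy", "pymysql", "pandas", "dateutil"]

missing = missing_modules(required)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with pip before continuing.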

Once MySQL is installed:
- In a terminal window, create an alias for the MySQL client and disable bash history expansion, so that a ! in a password is not interpreted by the shell:
    - alias mysql=/usr/local/mysql/bin/mysql
    - set +H
- In the same terminal window, log in to MySQL as root:
    - mysql --user=root --password=<root_password>
- Once logged in to MySQL:
    - Create a database named FinanceYahoo:
        - CREATE DATABASE FinanceYahoo;
    - Create a user named FinanceYahoo:
        - CREATE USER 'FinanceYahoo'@'localhost' IDENTIFIED BY 'F!n@nc3Y@h00';
    - Grant all privileges on database FinanceYahoo to user FinanceYahoo:
        - GRANT ALL PRIVILEGES ON FinanceYahoo.* TO 'FinanceYahoo'@'localhost';

### Import Functions from yahoo_fin.stock_info and yahoo_fin.options
Once yahoo_fin is installed, import the required functions as shown below:

In [1]:
from yahoo_fin.stock_info import tickers_dow
from yahoo_fin.stock_info import tickers_nasdaq
from yahoo_fin.stock_info import tickers_other
from yahoo_fin.stock_info import tickers_sp500
from yahoo_fin.options import get_expiration_dates
from yahoo_fin.options import get_options_chain

### Import Libraries to interact with MySQL

In [7]:
from sqlalchemy import create_engine
import pymysql

#This is needed in order to pass special characters in the MySQL connection string
from urllib import parse as urlparse

### Import Date Parsing Helpers

In [3]:
# dateutil's parse converts expiration date strings into datetime objects
from dateutil.parser import parse
# pandas provides the to_datetime function used on the Last Trade Date column
import pandas as pd
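
To illustrate the kind of conversion these imports are used for, here is a minimal sketch that uses the standard library's `datetime.strptime` as a stand-in for `dateutil` and pandas. The sample strings are assumptions modeled on the format string and log output that appear later in this document, the helper names are hypothetical, and `%B` assumes English month names in the active locale.

```python
import re
from datetime import datetime


def parse_trade_timestamp(text):
    """Parse a string such as '2020-05-28 3:54PM EDT' into a naive datetime."""
    # Remove a trailing time zone abbreviation as a suffix. str.rstrip(' EDT')
    # would instead strip any trailing run of the characters ' ', E, D and T.
    without_zone = re.sub(r"\s+[A-Z]{2,4}$", "", text.strip())
    return datetime.strptime(without_zone, "%Y-%m-%d %I:%M%p")


def parse_expiration(text):
    """Parse an expiration label such as 'June 5, 2020'."""
    return datetime.strptime(text, "%B %d, %Y")


print(parse_trade_timestamp("2020-05-28 3:54PM EDT"))  # 2020-05-28 15:54:00
print(parse_expiration("June 5, 2020"))                # 2020-06-05 00:00:00
print("2020-01-15 3:54PM EST".rstrip(" EDT"))          # 2020-01-15 3:54PM ES
```

The last line shows why a character-set strip is fragile: it happens to work for EDT but leaves a stray "ES" for EST.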

## Get the list of unique tickers
- Pull all available tickers
- Eliminate duplicates if they exist
- Eliminate tickers that contain special characters

In [4]:
tickers = tickers_dow()
tickers.extend(tickers_nasdaq())
tickers.extend(tickers_other())
tickers.extend(tickers_sp500())
# dict.fromkeys drops duplicate tickers while preserving their order
tickers = list(dict.fromkeys(tickers))
# Drop tickers that contain special characters
tickers = [item for item in tickers if not any(ch in item for ch in "-$.")]
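
The effect of the last two steps is easier to see on a small list. The sketch below wraps them in a hypothetical helper, `unique_plain_tickers`, and runs it on made-up sample symbols; it does not call yahoo_fin.

```python
def unique_plain_tickers(symbols, excluded_chars="-$."):
    """Drop duplicates (keeping first-seen order) and symbols with special characters."""
    unique = list(dict.fromkeys(symbols))
    return [s for s in unique if not any(ch in s for ch in excluded_chars)]


# Illustrative input only; these symbols are not pulled from Yahoo
sample = ["AAPL", "MSFT", "AAPL", "BRK.B", "BF-B", "ABR$A", "MSFT", "IBM"]
print(unique_plain_tickers(sample))  # ['AAPL', 'MSFT', 'IBM']
```

Using `dict.fromkeys` rather than `set` keeps the tickers in the order they were first seen, so the Dow components stay at the front of the list.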

## Connect to MySQL

In [8]:
#urllib.parse.quote_plus is used to url encode the special characters in the password
sql_connection_string = 'mysql+pymysql://FinanceYahoo:%s@localhost/FinanceYahoo' % urlparse.quote_plus('F!n@nc3Y@h00')
sql_engine = create_engine(sql_connection_string)
sql_connection = sql_engine.connect()
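
To see what the URL encoding does to the password, the following sketch builds the same connection string with the standard library only and prints it. The helper name `build_mysql_url` is hypothetical, and no database connection is made.

```python
from urllib.parse import quote_plus, unquote_plus


def build_mysql_url(user, password, host, database):
    """Build a mysql+pymysql URL, percent-encoding the user name and password."""
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


url = build_mysql_url("FinanceYahoo", "F!n@nc3Y@h00", "localhost", "FinanceYahoo")
print(url)
# mysql+pymysql://FinanceYahoo:F%21n%40nc3Y%40h00@localhost/FinanceYahoo

# The encoding is reversible, so the original password is recovered on decoding
print(unquote_plus("F%21n%40nc3Y%40h00"))  # F!n@nc3Y@h00
```

Without the encoding, the `@` characters inside the password would be ambiguous with the `@` that separates the credentials from the host.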

## Parse Finance Yahoo Data


- Create a function to parse the dataframes resulting from calling get_options_chain

In [9]:
def parse_dataframe(df, expiration):
    # Yahoo shows '-' for missing values: prices become None, counts become '0'
    df.loc[df['Last Price'] == '-', 'Last Price'] = None
    df.loc[df['Bid'] == '-', 'Bid'] = None
    df.loc[df['Ask'] == '-', 'Ask'] = None
    df.loc[df['Change'] == '-', 'Change'] = None
    # Strip the sign, the percent symbol and thousands separators,
    # so only the magnitude of % Change is kept
    df['% Change'] = df['% Change'].map(lambda x: x.lstrip('+-').rstrip('%').replace(',', ''))
    df.loc[df['% Change'] == '', '% Change'] = '0'
    df.loc[df['Open Interest'] == '-', 'Open Interest'] = '0'
    df.loc[df['Volume'] == '-', 'Volume'] = '0'
    df['Implied Volatility'] = df['Implied Volatility'].map(lambda x: x.rstrip('%').replace(',', ''))
    # Remove the trailing time zone label (EDT or EST) before parsing the timestamp
    df['Last Trade Date'] = df['Last Trade Date'].str.replace(r'\s*E[DS]T$', '', regex=True)
    df['Last Trade Date'] = pd.to_datetime(df['Last Trade Date'], format="%Y-%m-%d %I:%M%p")
    # Contract names look like AAPL200605C00145000: ticker, YYMMDD expiration, C or P, strike
    df['Ticker'] = df['Contract Name'].str.extract(r'([A-Z]*)')
    df['Expiration'] = parse(expiration)
    df['OptionType'] = df['Contract Name'].str.extract(r'[A-Z]*[0-9]{6}([A-Z])')
    return df.astype({"% Change": float, "Volume": int, "Implied Volatility": float})
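
The two regular expressions above pull the ticker and option type out of the contract name. As a standalone illustration, the sketch below decodes every field of one contract name with the standard library. The field layout (ticker, YYMMDD, C or P, strike in thousandths) is an assumption inferred from names such as AAPL200605C00145000 with strike 145.0, and `parse_contract_name` is a hypothetical helper, not part of yahoo_fin.

```python
import re
from datetime import date

# ticker, two-digit year, month, day, option type, eight-digit strike (assumed layout)
CONTRACT_PATTERN = re.compile(r"^([A-Z]+)(\d{2})(\d{2})(\d{2})([CP])(\d{8})$")


def parse_contract_name(name):
    """Split a contract name into ticker, expiration, option type and strike."""
    match = CONTRACT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized contract name: {name!r}")
    ticker, yy, mm, dd, option_type, strike = match.groups()
    return {
        "ticker": ticker,
        "expiration": date(2000 + int(yy), int(mm), int(dd)),
        "option_type": option_type,
        "strike": int(strike) / 1000,  # strike is assumed to be stored in thousandths
    }


print(parse_contract_name("AAPL200605C00145000"))
```

A full-match pattern like this rejects malformed names outright, whereas `str.extract` returns a missing value for rows that do not match.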

### Test parse_dataframe on the calls of one ticker and expiration date
- This cell is only for testing and does not need to be run

In [10]:
#Test
ticker = 'AAPL'
date = get_expiration_dates(ticker)[1]
options_chain = get_options_chain(ticker,date)
dataframe = parse_dataframe(options_chain['calls'],date)
dataframe




| | Contract Name | Last Trade Date | Strike | Last Price | Bid | Ask | Change | % Change | Volume | Open Interest | Implied Volatility | Ticker | Expiration | OptionType |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL200605C00145000 | 2020-05-28 15:54:00 | 145.0 | 172.75 | 172.40 | 174.00 | -2.43 | 1.39 | 1 | 0 | 252.73 | AAPL | 2020-06-05 | C |
| 1 | AAPL200605C00155000 | 2020-05-28 15:54:00 | 155.0 | 162.75 | 163.60 | 163.95 | -2.45 | 1.48 | 0 | 0 | 220.61 | AAPL | 2020-06-05 | C |
| 2 | AAPL200605C00160000 | 2020-05-06 14:10:00 | 160.0 | 141.75 | 157.25 | 159.15 | 0.00 | 0.00 | 0 | 1 | 230.86 | AAPL | 2020-06-05 | C |
| 3 | AAPL200605C00175000 | 2020-05-19 12:38:00 | 175.0 | 142.03 | 142.25 | 144.45 | 0.00 | 0.00 | 0 | 1 | 150.39 | AAPL | 2020-06-05 | C |
| 4 | AAPL200605C00180000 | 2020-05-19 12:38:00 | 180.0 | 137.05 | 137.25 | 139.15 | 0.00 | 0.00 | 0 | 1 | 195.46 | AAPL | 2020-06-05 | C |
| 5 | AAPL200605C00190000 | 2020-05-07 11:15:00 | 190.0 | 113.01 | 127.60 | 129.25 | 0.00 | 0.00 | 0 | 6 | 140.23 | AAPL | 2020-06-05 | C |
| 6 | AAPL200605C00195000 | 2020-05-13 19:08:00 | 195.0 | 94.63 | 122.25 | 124.15 | 0.00 | 0.00 | 0 | 1 | 171.29 | AAPL | 2020-06-05 | C |
| 7 | AAPL200605C00200000 | 2020-05-07 11:15:00 | 200.0 | 102.99 | 117.25 | 119.20 | 0.00 | 0.00 | 21 | 21 | 165.19 | AAPL | 2020-06-05 | C |
| 8 | AAPL200605C00205000 | 2020-05-13 15:50:00 | 205.0 | 102.35 | 112.25 | 114.20 | 0.00 | 0.00 | 5 | 7 | 157.62 | AAPL | 2020-06-05 | C |
| 9 | AAPL200605C00210000 | 2020-05-13 11:38:00 | 210.0 | 101.41 | 107.25 | 109.20 | 0.00 | 0.00 | 0 | 3 | 150.15 | AAPL | 2020-06-05 | C |


### Traverse the list of unique tickers
- For optionable tickers, get all available expiration dates
- Get the option chains (calls and puts) for each ticker / expiration date pair
- Append each option to the database

In [None]:
for ticker_count, ticker in enumerate(tickers, start=1):
    print(f'{ticker}: {ticker_count} of {len(tickers)}')

    for date in get_expiration_dates(ticker):
        # Reset on every pass so a failed download never reuses the previous chain
        options_chain = {}
        try:
            options_chain = get_options_chain(ticker, date)
        except Exception:
            print(f'No option chains for {ticker},{date}')

        # Parse and store calls and puts separately, so a failure in one does not block the other
        for option_type in ('calls', 'puts'):
            try:
                df = parse_dataframe(options_chain[option_type], date)
                df.to_sql('Option', sql_connection, if_exists='append')
            except Exception:
                print(f'Skipped {ticker},{date}')
print('Done')

AAPL: 1 of 4267
No option chains for AAPL,July 2, 2020
Skipped AAPL,July 2, 2020
Skipped AAPL,July 2, 2020
AXP: 2 of 4267
BA: 3 of 4267
CAT: 4 of 4267
No option chains for CAT,June 12, 2020
Skipped CAT,June 12, 2020
Skipped CAT,June 12, 2020
CSCO: 5 of 4267
CVX: 6 of 4267
DIS: 7 of 4267
DOW: 8 of 4267
No option chains for DOW,June 12, 2020
Skipped DOW,June 12, 2020
Skipped DOW,June 12, 2020
No option chains for DOW,June 19, 2026
Skipped DOW,June 19, 2026
Skipped DOW,June 19, 2026
GS: 9 of 4267
No option chains for GS,June 12, 2020
Skipped GS,June 12, 2020
Skipped GS,June 12, 2020
No option chains for GS,June 26, 2020
Skipped GS,June 26, 2020
Skipped GS,June 26, 2020
HD: 10 of 4267
No option chains for HD,June 12, 2020
Skipped HD,June 12, 2020
Skipped HD,June 12, 2020
No option chains for HD,July 2, 2020
Skipped HD,July 2, 2020
Skipped HD,July 2, 2020
No option chains for HD,March 20, 2026
Skipped HD,March 20, 2026
Skipped HD,March 20, 2026
IBM: 11 of 4267
No option chains for IBM,Augus

ALIM: 158 of 4267
ALJJ: 159 of 4267
ALKS: 160 of 4267
ALLK: 161 of 4267
No option chains for ALLK,July 17, 2020
Skipped ALLK,July 17, 2020
Skipped ALLK,July 17, 2020
ALLO: 162 of 4267
ALLT: 163 of 4267
No option chains for ALLT,July 17, 2020
Skipped ALLT,July 17, 2020
Skipped ALLT,July 17, 2020
No option chains for ALLT,December 18, 2020
Skipped ALLT,December 18, 2020
Skipped ALLT,December 18, 2020
ALNA: 164 of 4267
ALNY: 165 of 4267
ALOT: 166 of 4267
ALP: 167 of 4267
ALR: 168 of 4267
ALRM: 169 of 4267
ALRS: 170 of 4267
ALSK: 171 of 4267
No option chains for ALSK,January 15, 2021
Skipped ALSK,January 15, 2021
Skipped ALSK,January 15, 2021
ALT: 172 of 4267
ALTM: 173 of 4267
ALTR: 174 of 4267
No option chains for ALTR,January 15, 2021
Skipped ALTR,January 15, 2021
Skipped ALTR,January 15, 2021
ALTY: 175 of 4267
ALX: 176 of 4267
ALYA: 177 of 4267
AMAG: 178 of 4267
AMAL: 179 of 4267
AMAT: 180 of 4267
AMBA: 181 of 4267
AMCA: 182 of 4267
AMCI: 183 of 4267
AMCIU: 184 of 4267
AMCIW: 185 of 426

No option chains for ATRC,January 15, 2021
Skipped ATRC,January 15, 2021
Skipped ATRC,January 15, 2021
ATRI: 320 of 4267
ATRO: 321 of 4267
ATRS: 322 of 4267
ATSG: 323 of 4267
ATVI: 324 of 4267
ATXI: 325 of 4267
AUB: 326 of 4267
AUDC: 327 of 4267
AUPH: 328 of 4267
AUTL: 329 of 4267
AUTO: 330 of 4267
No option chains for AUTO,June 19, 2020
Skipped AUTO,June 19, 2020
Skipped AUTO,June 19, 2020
No option chains for AUTO,October 16, 2020
Skipped AUTO,October 16, 2020
Skipped AUTO,October 16, 2020
No option chains for AUTO,January 15, 2021
Skipped AUTO,January 15, 2021
Skipped AUTO,January 15, 2021
AVAV: 331 of 4267
AVCO: 332 of 4267
AVCT: 333 of 4267
AVCTW: 334 of 4267
AVDL: 335 of 4267
AVEO: 336 of 4267
AVGO: 337 of 4267
AVGOP: 338 of 4267
AVGR: 339 of 4267
AVI: 340 of 4267
AVID: 341 of 4267
No option chains for AVID,January 15, 2021
Skipped AVID,January 15, 2021
Skipped AVID,January 15, 2021
AVNW: 342 of 4267
AVRO: 343 of 4267
No option chains for AVRO,July 17, 2020
Skipped AVRO,July 17, 

No option chains for BSRR,June 19, 2020
Skipped BSRR,June 19, 2020
Skipped BSRR,June 19, 2020
No option chains for BSRR,September 18, 2020
Skipped BSRR,September 18, 2020
Skipped BSRR,September 18, 2020
BSTC: 544 of 4267
BSV: 545 of 4267
No option chains for BSV,January 15, 2021
Skipped BSV,January 15, 2021
Skipped BSV,January 15, 2021
No option chains for BSV,March 19, 2021
Skipped BSV,March 19, 2021
Skipped BSV,March 19, 2021
BTAI: 546 of 4267
BTB: 547 of 4267
BTEC: 548 of 4267
BUG: 549 of 4267
BUSE: 550 of 4267
No option chains for BUSE,July 17, 2020
Skipped BUSE,July 17, 2020
Skipped BUSE,July 17, 2020
BVXV: 551 of 4267
BWAY: 552 of 4267
BWB: 553 of 4267
BWE: 554 of 4267
BWFG: 555 of 4267
BWMX: 556 of 4267
BXRX: 557 of 4267
BYFC: 558 of 4267
BYND: 559 of 4267
BYSI: 560 of 4267
BZU: 561 of 4267
CA: 562 of 4267
CAAS: 563 of 4267
CABA: 564 of 4267
CAC: 565 of 4267
CACC: 566 of 4267
CACG: 567 of 4267
CAKE: 568 of 4267
CALA: 569 of 4267
No option chains for CALA,December 18, 2020
Skippe

CMBM: 740 of 4267
CMCO: 741 of 4267
No option chains for CMCO,July 17, 2020
Skipped CMCO,July 17, 2020
Skipped CMCO,July 17, 2020
CMCSA: 742 of 4267
CMCT: 743 of 4267
CMCTP: 744 of 4267
CME: 745 of 4267
CMFNL: 746 of 4267
CMI: 747 of 4267
CMLS: 748 of 4267
CMPR: 749 of 4267
No option chains for CMPR,January 15, 2021
Skipped CMPR,January 15, 2021
Skipped CMPR,January 15, 2021
CMRX: 750 of 4267
No option chains for CMRX,July 17, 2020
Skipped CMRX,July 17, 2020
Skipped CMRX,July 17, 2020
CMTL: 751 of 4267
CNA: 752 of 4267
CNBKA: 753 of 4267
CNCE: 754 of 4267
No option chains for CNCE,June 19, 2020
Skipped CNCE,June 19, 2020
Skipped CNCE,June 19, 2020
No option chains for CNCE,January 15, 2021
Skipped CNCE,January 15, 2021
Skipped CNCE,January 15, 2021
CNCR: 755 of 4267
CNDT: 756 of 4267
CNET: 757 of 4267
No option chains for CNET,August 21, 2020
Skipped CNET,August 21, 2020
Skipped CNET,August 21, 2020
CNFR: 758 of 4267
CNFRL: 759 of 4267
CNNB: 760 of 4267
CNOB: 761 of 4267
No option chai

DBX: 913 of 4267
DCAR: 914 of 4267
DCOM: 915 of 4267
DCOMP: 916 of 4267
DCPH: 917 of 4267
DCTH: 918 of 4267
DDIV: 919 of 4267
DDOG: 920 of 4267
DE: 921 of 4267
DFF: 922 of 4267
DFNL: 923 of 4267
DFPH: 924 of 4267
DFPHU: 925 of 4267
DFPHW: 926 of 4267
DGICA: 927 of 4267
No option chains for DGICA,July 17, 2020
Skipped DGICA,July 17, 2020
Skipped DGICA,July 17, 2020
No option chains for DGICA,December 18, 2020
Skipped DGICA,December 18, 2020
Skipped DGICA,December 18, 2020
DGICB: 928 of 4267
DGII: 929 of 4267
No option chains for DGII,July 17, 2020
Skipped DGII,July 17, 2020
Skipped DGII,July 17, 2020
DGLD: 930 of 4267
DGLY: 931 of 4267
No option chains for DGLY,November 20, 2020
Skipped DGLY,November 20, 2020
Skipped DGLY,November 20, 2020
DGRE: 932 of 4267
DGRS: 933 of 4267
DGRW: 934 of 4267
DHC: 935 of 4267
DHCNI: 936 of 4267
DHCNL: 937 of 4267
DHIL: 938 of 4267
DINT: 939 of 4267
DIOD: 940 of 4267
DISCA: 941 of 4267
DISCB: 942 of 4267
DISCK: 943 of 4267
DISH: 944 of 4267
No option cha

EQRR: 1098 of 4267
ERI: 1099 of 4267
ERIC: 1100 of 4267
ERIE: 1101 of 4267
ERII: 1102 of 4267
ERV: 1103 of 4267
ERYP: 1104 of 4267
ESBK: 1105 of 4267
ESCA: 1106 of 4267
No option chains for ESCA,October 16, 2020
Skipped ESCA,October 16, 2020
Skipped ESCA,October 16, 2020
ESEA: 1107 of 4267
ESGD: 1108 of 4267
ESGE: 1109 of 4267
ESGR: 1110 of 4267
ESGRO: 1111 of 4267
ESGRP: 1112 of 4267
ESGU: 1113 of 4267
ESLT: 1114 of 4267
ESPO: 1115 of 4267
No option chains for ESPO,July 17, 2020
Skipped ESPO,July 17, 2020
Skipped ESPO,July 17, 2020
No option chains for ESPO,September 18, 2020
Skipped ESPO,September 18, 2020
Skipped ESPO,September 18, 2020
ESPR: 1116 of 4267
ESQ: 1117 of 4267
ESR: 1118 of 4267
ESRW: 1119 of 4267
ESSA: 1120 of 4267
ESSC: 1121 of 4267
ESSCR: 1122 of 4267
ESSCU: 1123 of 4267
ESSCW: 1124 of 4267
ESTA: 1125 of 4267
ESXB: 1126 of 4267
ETE: 1127 of 4267
ETFC: 1128 of 4267
No option chains for ETFC,June 12, 2020
Skipped ETFC,June 12, 2020
Skipped ETFC,June 12, 2020
No option c

FOSL: 1302 of 4267
FOX: 1303 of 4267
FOXA: 1304 of 4267
No option chains for FOXA,July 2, 2020
Skipped FOXA,July 2, 2020
Skipped FOXA,July 2, 2020
FOXF: 1305 of 4267
No option chains for FOXF,July 17, 2020
Skipped FOXF,July 17, 2020
Skipped FOXF,July 17, 2020
FPA: 1306 of 4267
FPAY: 1307 of 4267
FPRX: 1308 of 4267
FPXE: 1309 of 4267
FPXI: 1310 of 4267
FRA: 1311 of 4267
FRAF: 1312 of 4267
FRBA: 1313 of 4267
FRBK: 1314 of 4267
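
With several thousand tickers, failures are hard to pick out of a printed log like the one above. One possible variation, sketched here under stated assumptions, records each failure in a list instead of printing it and skips the store step when the download fails. The function `collect_chains` and the `fake_*` stand-ins are hypothetical; in real use the three callables would wrap `get_expiration_dates`, `get_options_chain` and the parse-and-store step.

```python
def collect_chains(tickers, get_dates, get_chain, store):
    """Fetch and store every chain; return a list of (ticker, date, stage, error name)."""
    failures = []
    for ticker in tickers:
        for date in get_dates(ticker):
            try:
                chain = get_chain(ticker, date)
            except Exception as error:
                failures.append((ticker, date, "fetch", type(error).__name__))
                continue  # nothing to store for this date
            for option_type in ("calls", "puts"):
                try:
                    store(ticker, date, option_type, chain[option_type])
                except Exception as error:
                    failures.append((ticker, date, option_type, type(error).__name__))
    return failures


# Stand-in callables with made-up data, used only to exercise the loop
def fake_dates(ticker):
    return {"AAPL": ["June 5, 2020", "July 2, 2020"]}.get(ticker, [])


def fake_chain(ticker, date):
    if date == "July 2, 2020":
        raise ValueError("no data")
    return {"calls": [1, 2], "puts": [3]}


stored = []


def fake_store(ticker, date, option_type, rows):
    stored.append((ticker, date, option_type, len(rows)))


failures = collect_chains(["AAPL", "AXP"], fake_dates, fake_chain, fake_store)
print(failures)  # [('AAPL', 'July 2, 2020', 'fetch', 'ValueError')]
```

The returned list can be counted, filtered by stage, or written to a file once the run finishes.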
