Yahoo HTML Reader
This script reads a folder on my desktop that contains downloaded HTML files of Yahoo Finance pages. Each stock has its own directory, which is referred to as "each_dir" in the code and appears under the stock's name in the path. For each HTML file, the script extracts the following elements:
+ The script gets the ticker symbol (ticker) by splitting the folder's path on backslashes and taking the last element, [-1], which is the name of the stock's directory.
+ It takes the date (date_stamp) from the file name and converts it to Unix time (unix_time).
+ It gets the Debt-to-Equity ratio (value) by splitting the file's contents on the HTML tags surrounding the desired value. The stock price (stock_price) is extracted the same way.
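The three extraction steps above can be sketched in Python as follows. This is a minimal sketch, not the script's actual code. It assumes Windows-style paths whose last component is the stock's folder, file names made of a `YYYYMMDDHHMMSS` timestamp plus `.html` (for example `20040130190102.html`), and a hypothetical pair of tag strings around the value; the real markers depend on how the saved Yahoo Finance pages are laid out. The helper names `ticker_from_path`, `unix_time_from_filename` and `value_between` are illustrative only.

```python
import time
from datetime import datetime


def ticker_from_path(each_dir):
    # Assumes a Windows-style path whose last component is the stock's folder,
    # e.g. r"C:\Users\me\Desktop\stocks\aapl" -> "aapl" (hypothetical path).
    return each_dir.split("\\")[-1]


def unix_time_from_filename(file_name):
    # Assumed (hypothetical) file-name format: "YYYYMMDDHHMMSS.html".
    date_stamp = datetime.strptime(file_name, "%Y%m%d%H%M%S.html")
    # time.mktime treats the naive datetime as local time and returns
    # seconds since the epoch as a float.
    unix_time = time.mktime(date_stamp.timetuple())
    return date_stamp, unix_time


def value_between(source, before, after):
    # Split the raw HTML on the text just before the value and keep what follows,
    # then split on the text just after the value and keep what precedes it.
    return source.split(before)[1].split(after)[0]


# Hypothetical markup around the Debt-to-Equity figure in a saved page.
html = '<td>Total Debt/Equity (mrq):</td><td class="yfnc_tabledata1">0.52</td>'
value = float(value_between(html, 'Total Debt/Equity (mrq):</td><td class="yfnc_tabledata1">', "</td>"))
```

Splitting on literal tag text is brittle: if Yahoo changes the markup, `split(before)[1]` raises an `IndexError`, so a real run would need to catch that for pages where the value is missing.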
The script also reads YAHOO-Index_GSPC.csv, a file containing S&P 500 data, and saves it to a DataFrame (sp500_df). sp500_df is organized by date, so the script looks for the row whose date matches unix_time and saves that row's adjusted close to the variable sp500_value. Some of the HTML files were saved on a weekend or holiday, when there is no S&P 500 data, so if no row matches the date of unix_time, the script subtracts 3 days and uses that date instead. This is meant to step back over weekends and three-day holiday weekends. It is a crude approach that could certainly be improved, but it gives a reasonable estimate for now.
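A minimal sketch of the S&P 500 lookup with the three-day fallback. It assumes, without confirmation from the script, that YAHOO-Index_GSPC.csv has a `Date` column formatted `YYYY-MM-DD` and an `Adj Close` column (rename these to match the actual file), and that it is loaded with `index_col="Date"` so the index holds date strings. The function name `sp500_value_for` is hypothetical, the prices below are toy values, and the sketch raises an error instead of guessing when neither date is found.

```python
from datetime import datetime

import pandas as pd

THREE_DAYS = 3 * 24 * 60 * 60  # 259,200 seconds


def sp500_value_for(sp500_df, unix_time):
    # Try the file's own date first, then the date three days earlier
    # (a crude way to step back over weekends and three-day holidays).
    for t in (unix_time, unix_time - THREE_DAYS):
        sp500_date = datetime.fromtimestamp(t).strftime("%Y-%m-%d")
        if sp500_date in sp500_df.index:
            return float(sp500_df.loc[sp500_date, "Adj Close"])
    raise KeyError(f"no S&P 500 row for {unix_time} or three days earlier")


# In the script the frame would be loaded from the CSV, e.g.
# sp500_df = pd.read_csv("YAHOO-Index_GSPC.csv", index_col="Date")
# Toy values for illustration only:
sp500_df = pd.DataFrame(
    {"Adj Close": [100.0, 101.5]},
    index=pd.Index(["2004-01-29", "2004-01-30"], name="Date"),
)
sunday = datetime(2004, 2, 1, 12).timestamp()  # Sunday, 1 February 2004
sp500_value = sp500_value_for(sp500_df, sunday)  # falls back to Thursday 2004-01-29
```

The fallback is approximate: three days before a Saturday is a Wednesday and three days before a Sunday is a Thursday, so the value comes from a nearby trading day rather than the most recent one. Taking the last row on or before the date (for example with `DataFrame.asof` on a sorted `DatetimeIndex`) would be one way to tighten it.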
The variables (date_stamp, unix_time, value, stock_price, sp500_value) are all computed in the same for loop, which iterates through each file. The script appends this information to a pandas DataFrame, which it saves as a .csv file.
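The overall loop might look like the sketch below. It is a self-contained illustration under stated assumptions, not the original script: the folder layout (one sub-folder per ticker), the `YYYYMMDDHHMMSS.html` file names, the marker strings passed in as `de_marker` and `price_marker`, the `Adj Close` column, and the output column names in `COLUMNS` are all assumptions, and `collect` and `between` are hypothetical names. It uses `os.path.basename` instead of splitting on backslashes so it also runs outside Windows. Rows are gathered in a list and turned into a DataFrame once at the end, because `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import os
import time
from datetime import datetime

import pandas as pd

THREE_DAYS = 3 * 24 * 60 * 60  # seconds
COLUMNS = ["Date", "Unix", "Ticker", "DE Ratio", "Price", "SP500"]  # hypothetical names


def between(source, before, after):
    # Text between the first `before` marker and the next `after` marker.
    return source.split(before)[1].split(after)[0]


def collect(stats_path, sp500_df, de_marker, price_marker, out_path, end_marker="</td>"):
    rows = []  # one dict per HTML file, turned into a DataFrame at the end
    for name in sorted(os.listdir(stats_path)):
        each_dir = os.path.join(stats_path, name)
        if not os.path.isdir(each_dir):
            continue  # skip stray files next to the per-stock folders
        ticker = os.path.basename(each_dir)  # the folder name is the ticker
        for file_name in sorted(os.listdir(each_dir)):
            # Assumed file-name format: YYYYMMDDHHMMSS.html
            date_stamp = datetime.strptime(file_name, "%Y%m%d%H%M%S.html")
            unix_time = time.mktime(date_stamp.timetuple())
            with open(os.path.join(each_dir, file_name), encoding="utf-8") as f:
                source = f.read()
            value = float(between(source, de_marker, end_marker))
            stock_price = float(between(source, price_marker, end_marker))
            # Same-day adjusted close, else the one three days earlier, else None.
            sp500_value = None
            for t in (unix_time, unix_time - THREE_DAYS):
                day = datetime.fromtimestamp(t).strftime("%Y-%m-%d")
                if day in sp500_df.index:
                    sp500_value = float(sp500_df.loc[day, "Adj Close"])
                    break
            rows.append({"Date": date_stamp, "Unix": unix_time, "Ticker": ticker,
                         "DE Ratio": value, "Price": stock_price, "SP500": sp500_value})
    df = pd.DataFrame(rows, columns=COLUMNS)
    df.to_csv(out_path, index=False)
    return df
```

Calling `collect(stats_path, sp500_df, de_marker, price_marker, "key_stats.csv")` would write one row per HTML file; the file name here is only an example.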