# Betalyzer
This app analyzes the betas of 500 tickers.

## Technologies
I used the following components (other than the obvious ones, such as Python, HTML, CSS, and jQuery).
 - Quandl: I chose [Quandl](http://www.quandl.com) as the primary data source because its API is documented and its data is more complete than Yahoo Finance's.
 - Pandas: 
 - Flask:
 - Bokeh:
 - Bootstrap: 
 - Datatables:
 - Jupyter Notebook:

## Structure
Betalyzer is a Flask app split across several files:
 - **`betalyzer.py`:** All of the business logic is in this file.
 - **`app.py`:** This is the Flask app and handles the execution of the web app.
 - **`templates/index.html`:** We only needed one page -- the index page -- for this project.
 - **Pickles:** For performance reasons, I do not want to fetch the data and recalculate the historical betas every time a user opens the web page, so I store the calculations in pickle files: `df_tickers` for all the tickers, `df_betas` as a time series of historical betas by ticker, and `df_changes` as a time series of daily changes by ticker.
 - **`betalyzer.ipynb`:** This file. This is the resulting report.
 - **`play.ipynb`:** I have left the play notebook in place, where I did some of my scratch work, though none of it is actually used.
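
The pickle caching described above could look roughly like the sketch below. It is a minimal illustration, not the code in `betalyzer.py`: the helper `load_or_build`, the cache path, and the toy data are hypothetical. Only `pd.read_pickle` and `DataFrame.to_pickle` are actual pandas calls.

```python
import os
import tempfile

import pandas as pd


def load_or_build(path, build):
    """Return the DataFrame cached at `path`, building and caching it if absent.

    Hypothetical helper: `build` is any zero-argument function returning a
    DataFrame, for example the expensive fetch-and-calculate step.
    """
    if os.path.exists(path):
        return pd.read_pickle(path)
    df = build()
    df.to_pickle(path)
    return df


# Toy stand-in for the expensive calculation; it counts how often it runs.
calls = {"n": 0}


def build_changes():
    calls["n"] += 1
    return pd.DataFrame(
        {"AAPL": [0.01, -0.02], "MSFT": [0.00, 0.01]},
        index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
    )


cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "df_changes.pkl")

first = load_or_build(cache_path, build_changes)   # builds and writes the pickle
second = load_or_build(cache_path, build_changes)  # served from the pickle
```

With this pattern, a page request only pays for reading the pickle; the build function runs again only after the file is deleted or replaced.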

## Calculation Engine
I spent quite a bit of time optimizing the calculation engine. The steps break down as follows:
 - Get Tickers: The first step is to get a list of relevant tickers. Unfortunately, I couldn't find an easy way to do this using Quandl or Yahoo Finance, but I did find that NASDAQ publishes a daily list of securities traded on its exchange. `nasdaq_url` in `betalyzer.py` sets this source. `read_nasdaq()` 
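
A ticker list like this could be parsed along the following lines. This is a sketch under assumptions, not the actual `read_nasdaq()`: the pipe-delimited layout, the column names (`Symbol`, `Test Issue`, `ETF`), the footer row, and the function `parse_listing` are all illustrative, and the sample is inlined so that nothing is downloaded.

```python
import io

import pandas as pd

# Assumed sample of a pipe-delimited listing file; the real layout may differ.
SAMPLE = """Symbol|Security Name|Test Issue|ETF
AAPL|Apple Inc. - Common Stock|N|N
MSFT|Microsoft Corporation - Common Stock|N|N
QQQ|Example Index Trust|N|Y
ZXZZT|Example Test Stock|Y|N
File Creation Time: 0101201700:00|||
"""


def parse_listing(text):
    """Return ticker symbols, skipping test issues, ETFs, and the footer row."""
    # dtype=str and keep_default_na=False keep symbols such as "NA" as text
    # instead of letting pandas turn them into missing values.
    df = pd.read_csv(io.StringIO(text), sep="|", dtype=str, keep_default_na=False)
    # Drop the trailing metadata row, if present.
    df = df[~df["Symbol"].str.startswith("File Creation Time")]
    keep = (df["Test Issue"] == "N") & (df["ETF"] == "N")
    return df.loc[keep, "Symbol"].tolist()


tickers = parse_listing(SAMPLE)
```

Filtering out ETFs here matches the current scope of the project, which covers common stocks only.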

## Extensions
Had this been a production-level project, here are some extensions I would have added:
 - **Additional data:** We currently only get NASDAQ data. We should get data from NYSE as well to cover most equities. We also do not get ETFs. Additionally, there are some shortcomings in the adjustments to the SPY data (it is not total return) that need to be taken into account. 
 - **Datastore:** The data is currently stored in pickled DataFrames, and only the data we need is kept (the rest is discarded). Ideally, we'd move this to a Postgres database, and perhaps to Redshift if the data grows beyond a few GB, for better performance.
 - **Recalculation:** Currently, on recalculation, all data is fetched from Quandl, even data that we've used before, and all calculations are made again. Ideally, we should only fetch new data and calculate based on the new data. 
 - **Auto-recalculation:** Currently, only manual recalculation is supported. I would add a process that auto-recalculates every business day at a set time, after both NASDAQ and Quandl have updated their data. Perhaps an asynchronous process and a scheduler (Celery and Celery Beat are what I've used in the past) could help here.
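
The incremental recalculation idea could be sketched as below. This is one possible approach under stated assumptions, not existing project code: `next_fetch_start` and `merge_new_rows` are hypothetical helpers, the cached frame is assumed to be indexed by date, and the call to the data source is replaced by a literal DataFrame.

```python
import pandas as pd


def next_fetch_start(cached):
    """Return the first calendar day after the last cached row, or None if empty."""
    if cached.empty:
        return None
    return cached.index.max() + pd.Timedelta(days=1)


def merge_new_rows(cached, fresh):
    """Append `fresh` to `cached`, keeping the newest row for any duplicated date."""
    combined = pd.concat([cached, fresh])
    combined = combined[~combined.index.duplicated(keep="last")]
    return combined.sort_index()


cached = pd.DataFrame(
    {"AAPL": [100.0, 101.0]},
    index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
)
start = next_fetch_start(cached)

# Stand-in for a data-source request from `start` onward. It deliberately
# overlaps the cache by one day to show that duplicates are resolved.
fresh = pd.DataFrame(
    {"AAPL": [101.5, 102.0]},
    index=pd.to_datetime(["2017-01-04", "2017-01-05"]),
)
updated = merge_new_rows(cached, fresh)
```

Keeping the last copy of a duplicated date means a revised value from the data source overwrites the cached one, which seems like the safer default for adjusted prices.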