# Predict stock prices using RNNs
Following Lilan Weng's [blog post](https://lilianweng.github.io/lil-log/2017/07/08/predict-stock-prices-using-RNN-part-1.html)
* [*her code*](https://github.com/lilianweng/stock-rnn)

### Variables
* ${W_i}$: Window i
* ${w}$: size of windows 
* ${p_i}$: stock price at end of day i


**Predict:** ${W_{t+1}}$
* where ${W_{t+1}} = (p_{(t+1)w}, p_{(t+1)w+1}, ..., p_{(t+2)w-1})$
* and ${W_{t}} = (p_{tw}, p_{tw+1}, ..., p_{(t+1)w-1})$

i.e., we learn a function $f(W_0, ..., W_t) \approx W_{t+1}$

**RNN:**
![](imgs/unrolled_RNN.png)

* `input_size`: number of timesteps (days) in a window
* `num_steps`: number of windows in a single training ex 

## 1 - Fetching and formatting the data
### 1.1 - Fetching
The [pandas](https://pandas.pydata.org/pandas-docs/version/0.15/tutorials.html) library is used for data access / manipulation

In [3]:
import click
import os
import pandas as pd
import random
import time
import urllib

#from BeautifulSoup import BeautifulSoup
from datetime import datetime

#np.version.version
#print(tf.__version__)

In [11]:
DATA_DIR = "data"
RANDOM_SLEEP_TIMES = (1, 5)

# This repo "github.com/datasets/s-and-p-500-companies" has some other information about
# S & P 500 companies.
SP500_LIST_URL = "https://raw.githubusercontent.com/datasets/s-and-p-500-companies-financials/master/data/constituents-financials.csv"
SP500_LIST_PATH = os.path.join(DATA_DIR, "constituents-financials.csv")


def _download_sp500_list():
    if os.path.exists(SP500_LIST_PATH):
        return

    f = urllib.request.urlopen(SP500_LIST_URL)
    print ("Downloading ...", SP500_LIST_URL)
    
    #with open(SP500_LIST_PATH, 'w') as fin:
    #    print(fin, f.read, file=sys.stderr)
    #print("here we read:")
    #print(f.read())
    #exit()
    with open(SP500_LIST_PATH, 'wb') as fin:
        shutil.copyfileobj(f, fin)
    
def _load_symbols():
    _download_sp500_list()
    df_sp500 = pd.read_csv(SP500_LIST_PATH)
    df_sp500.sort_values('Market Cap', ascending=False, inplace=True)
    # all symbols(?) w/o duplicates, in a list
    stock_symbols = df_sp500['Symbol'].unique().tolist()
    print ("Loaded %d stock symbols", len(stock_symbols))
    return stock_symbols
    
# symbol: symbol of company in S&P    
# out_name: .csv to which info will be written    
def fetch_prices(symbol, out_name):
    """
    Fetch daily stock prices for stock `symbol`, since 1980-01-01.
    Args:
        symbol (str): a stock abbr. symbol, like "GOOG" or "AAPL".
    Returns: a bool, whether the fetch is succeeded.
    """
    now_datetime = datetime.now().strftime("%b+%d,+%Y")
    # XXX: this no longer works
    BASE_URL = "https://finance.google.com/finance/historical?output=csv&q={0}&startdate=Jan+1%2C+1980&enddate={1}"
    symbol_url = BASE_URL.format(
        urllib.request.quote(symbol),
        urllib.request.quote(now_datetime, '+')
    )
    
    f = urllib.request.urlopen(symbol_url)
    with open(out_name, 'w') as fin:
        #     depend    s
        #print >> fin, f.read()
        print(f.read(), end="", file=depend)
#     try:
# #             f = urllib.request.urlopen(symbol_url)
# #             with open(out_name, 'w') as fin:
# #                 #     depend    s
# #                 #print >> fin, f.read()
# #                 print(f.read(), end="", file=depend)
#     except:
#         print("Failed when fetching " + symbol_url)
#         return False

In [13]:
_load_symbols()
fetch_prices("ABT", "data/constituents-financials.csv")

Loaded %d stock symbols 505


HTTPError: HTTP Error 403: Forbidden

In [5]:
'''
price_df = get_data_panda()
# look at first 3 rows of data
#price_df[['Date', 'Close']][:3]
price_df[:3]

'NEM'
'''

BASE_URL = "https://finance.google.com/finance/historical?output=csv&q={0}&startdate=Jan+1%2C+1980&enddate={1}"
symbol_url = BASE_URL.format(
        urllib.request.quote('NEM'),
        urllib.request.quote(datetime.now().strftime("%b+%d,+%Y"), '+')
    )
symbol_url

'https://finance.google.com/finance/historical?output=csv&q=NEM&startdate=Jan+1%2C+1980&enddate=May+26%2C+2020'

In [6]:
price_df['Close'].plot().set_ylabel("Stock price ($)")

NameError: name 'price_df' is not defined