<a href="https://colab.research.google.com/github/MaraniMatias/Deep-RL/blob/master/Stocks_Trading_Using_RL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stocks Trading Using RL

## Trading

Since the first financial market was established, people have been trying to predict future price movements, as this promises lots of benefits, like “profit from nowhere” or protecting capital from sudden market movements. This problem is known to be complex and there are lots of financial consultants, investment funds, banks, and individual traders who are trying to predict the market and find the best moments to buy and sell to maximize profit.

The question is: can we look at the problem from the RL angle? We have some observation of the market and we want to make a decision: buy, sell, or wait. If we buy before the price goes up, our profit will be positive; otherwise, we’ll get a negative reward. What we’re trying to do is to get as much profit as possible. The mapping from market trading to RL is direct: market observations are the state, trading decisions are the actions, and profit is the reward.

## Data

Inside the archive, we have CSV files with M1 bars, which means that every row in the CSV corresponds to a single minute in time and price movement during this minute is captured with four prices: open, high, low, and close. Here, an open price is the price at the beginning of the minute, high is the maximum price during the interval, low is the minimum price, and the close price is the last price of the minute time interval.
Every minute interval is called a bar and gives us an idea of the price movement within that interval.
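
To make the four prices concrete, here is a minimal sketch of how one bar could be derived from the prices traded within a single minute. The `Bar` tuple, the `ticks_to_bar` helper, and the sample prices are hypothetical and are not part of the notebook's code.

```python
from collections import namedtuple

# Hypothetical container for one M1 bar (illustration only).
Bar = namedtuple("Bar", ["open", "high", "low", "close"])


def ticks_to_bar(tick_prices):
    """Collapse the prices seen within one minute (in time order) into a bar."""
    if not tick_prices:
        raise ValueError("a bar needs at least one price")
    return Bar(
        open=tick_prices[0],     # first price of the minute
        high=max(tick_prices),   # maximum price during the interval
        low=min(tick_prices),    # minimum price during the interval
        close=tick_prices[-1],   # last price of the minute
    )


# Made-up prices observed during one minute.
bar = ticks_to_bar([100.0, 100.5, 99.8, 100.2])
print(bar)
```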

## Problem statements and key decisions

Flexibility, in this case, is good and bad at the same time. It’s good that we have the freedom to pass some information to the agent that we feel will be important to learn efficiently. For example, you can pass to the trading agent not only prices but also the information about news or important statistics to be published (which is known to influence financial markets a lot). The bad part is that this flexibility usually means that to find a good agent, you need to try lots of variants of data representation and it’s not always obvious which will work better. In our case, we’ll implement the basic trading agent in its simplest form. 

At every step, which will be after every minute’s bar, the agent can take one of the following actions:

- **Do nothing:** Skip the bar without taking any action.
- **Buy a share:** If the agent already holds a share, nothing will be bought; otherwise, we’ll pay the commission, which is usually some small percentage of the current price.
- **Close the position:** If we don’t hold a previously bought share, nothing will happen; otherwise, we’ll pay the commission for the trade.
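
The three actions can be sketched as a small state transition on a single "do we hold a share" flag. The `Action` enum, the `apply_action` helper, and the 0.1% commission are assumptions chosen for illustration; the environment built later in the notebook may differ.

```python
import enum


class Action(enum.Enum):
    SKIP = 0   # do nothing
    BUY = 1    # buy a share
    CLOSE = 2  # close the position


# Assumed commission, in percent of the current price (illustrative value).
COMMISSION_PERC = 0.1


def apply_action(action, have_position, price):
    """Return (new_have_position, commission_paid) for one step."""
    commission = price * COMMISSION_PERC / 100.0
    if action is Action.BUY and not have_position:
        return True, commission       # position opened, commission paid
    if action is Action.CLOSE and have_position:
        return False, commission      # position closed, commission paid
    return have_position, 0.0         # skip, or an action with no effect


print(apply_action(Action.BUY, False, 100.0))
print(apply_action(Action.BUY, True, 100.0))
```

Buying while already holding a share, or closing without one, leaves the state unchanged and costs nothing, matching the rules listed above.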

The **reward** that the agent receives could be expressed in various ways. On the one hand, we can split the reward into multiple steps during our ownership of the share. In that case, the reward on every step will be equal to the last bar’s movement. On the other hand, the agent can receive the reward only after the close action, getting the full reward at once. At first sight, both variants should have the same final result, but maybe with different convergence speed. However, in practice, the difference could be dramatic. We’ll implement both variants to have a chance to compare them.
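
As a rough illustration of the two reward schemes, the sketch below computes both for one trade. It is a simplification under stated assumptions: rewards are measured in absolute price units, commission is ignored, and the function names and prices are made up for this example.

```python
def per_step_rewards(closes, buy_idx, close_idx):
    """Reward on every bar while the share is held: the last bar's movement."""
    rewards = [0.0] * len(closes)
    for i in range(buy_idx + 1, close_idx + 1):
        rewards[i] = closes[i] - closes[i - 1]
    return rewards


def reward_on_close(closes, buy_idx, close_idx):
    """Single reward delivered when the position is closed."""
    rewards = [0.0] * len(closes)
    rewards[close_idx] = closes[close_idx] - closes[buy_idx]
    return rewards


# Made-up close prices; buy at bar 0, close the position at bar 3.
closes = [10.0, 10.5, 10.25, 11.0]
stepwise = per_step_rewards(closes, 0, 3)
at_close = reward_on_close(closes, 0, 3)
print(stepwise, sum(stepwise))
print(at_close, sum(at_close))
```

Under these assumptions the per-step rewards telescope to the same total as the single reward, so the two schemes differ only in when the agent receives the signal.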


One last decision to make is how to represent the prices in our environment observation. Ideally, we would like our agent to be independent of actual price values and take into account relative movement, such as “stock has grown 1% during the last bar” or “stock has lost 5%.” This makes sense, as different stocks’ prices can vary, but they can have similar movement patterns. In finance, there exists a branch of analytics called “technical analysis,” which studies such patterns to help make predictions from them. We would like our system to be able to discover them (if they exist). To achieve this, we’ll convert every bar’s open, high, low, and close prices to three numbers showing the high, low, and close prices expressed relative to the open price.
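
A small worked example of this conversion, using made-up prices: two bars with very different absolute prices but the same relative movement map to the same three numbers. The `bar_to_relative` helper is hypothetical; it returns fractions (0.02 means 2%), which is also what the `prices_to_relative` function defined later in this notebook computes.

```python
import math


def bar_to_relative(open_, high, low, close):
    """Express high, low, and close as fractions of the open price."""
    return (
        (high - open_) / open_,
        (low - open_) / open_,
        (close - open_) / open_,
    )


# Made-up bars: a cheap stock and an expensive one with the same pattern.
cheap = bar_to_relative(10.0, 10.2, 9.9, 10.1)
expensive = bar_to_relative(1000.0, 1020.0, 990.0, 1010.0)

# Both are approximately (0.02, -0.01, 0.01), up to floating-point rounding.
same = all(math.isclose(a, b) for a, b in zip(cheap, expensive))
print(cheap, expensive, same)
```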


_This representation has its own drawbacks, as we’re potentially losing the information about key price levels._



## Data

> Note: [Read file on google drive](https://colab.research.google.com/drive/1JmwtF5OmSghC-y3-BkvxLan0zYXqCJJf#scrollTo=CJ9ijZC3Q1Xl)


In [0]:
!pip install --upgrade -q gspread

In [0]:
# Load the Drive helper and mount
# from google.colab import drive

# This will prompt for authorization.
# drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Look up the CSV files in Google Drive; you have to upload them to your own Drive first.

In [0]:
from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

# Training data
DEFAULT_STOCKS = gc.open('YNDX_150101_151231').sheet1
# Validation data
DEFAULT_VAL_STOCKS = gc.open('YNDX_160101_161231').sheet1

# get_all_values gives a list of rows.
rows = DEFAULT_STOCKS.get_all_values()
# print(rows)
# rows = DEFAULT_VAL_STOCKS.get_all_values()
# print(rows)

# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)


<module 'pandas' from '/usr/local/lib/python3.6/dist-packages/pandas/__init__.py'>

Create the labels for the CSV columns

In [0]:
import os
import csv
import glob
import numpy as np
import collections

Prices = collections.namedtuple('Prices', field_names=['open', 'high', 'low', 'close', 'volume'])

In [0]:
def read_csv(file_name, sep=',', filter_data=True, fix_open_price=False):
    print("Reading", file_name)
    with open(file_name, 'rt', encoding='utf-8') as fd:
        reader = csv.reader(fd, delimiter=sep)
        h = next(reader)
        if '<OPEN>' not in h and sep == ',':
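            # Retry with ';' as the separator, keeping the caller's options
            return read_csv(file_name, ';', filter_data=filter_data,
                            fix_open_price=fix_open_price)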
        indices = [h.index(s) for s in ('<OPEN>', '<HIGH>', '<LOW>', '<CLOSE>', '<VOL>')]
        o, h, l, c, v = [], [], [], [], []
        count_out = 0
        count_filter = 0
        count_fixed = 0
        prev_vals = None
        for row in reader:
            vals = list(map(float, [row[idx] for idx in indices]))
            if filter_data and all(map(lambda v: abs(v-vals[0]) < 1e-8, vals[:-1])):
                count_filter += 1
                continue

            po, ph, pl, pc, pv = vals

            # fix open price for current bar to match close price for the previous bar
            if fix_open_price and prev_vals is not None:
                ppo, pph, ppl, ppc, ppv = prev_vals
                if abs(po - ppc) > 1e-8:
                    count_fixed += 1
                    po = ppc
                    pl = min(pl, po)
                    ph = max(ph, po)
            count_out += 1
            o.append(po)
            c.append(pc)
            h.append(ph)
            l.append(pl)
            v.append(pv)
            prev_vals = vals
    print("Read done, got %d rows, %d filtered, %d open prices adjusted" % (
        count_filter + count_out, count_filter, count_fixed))
    return Prices(open=np.array(o, dtype=np.float32),
                  high=np.array(h, dtype=np.float32),
                  low=np.array(l, dtype=np.float32),
                  close=np.array(c, dtype=np.float32),
                  volume=np.array(v, dtype=np.float32))
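
The two cleaning steps inside `read_csv` (dropping bars with no price movement and aligning each open price with the previous close) can be illustrated in isolation. This is a simplified sketch on in-memory tuples, not a replacement for `read_csv`; the `clean_bars` name and the sample bars are made up.

```python
def clean_bars(bars, eps=1e-8):
    """bars: list of (open, high, low, close) tuples in time order."""
    cleaned = []
    prev_close = None
    for o, h, l, c in bars:
        # Drop bars where all four prices coincide (no movement).
        if all(abs(p - o) < eps for p in (h, l, c)):
            continue
        # Make the open match the close of the previous kept bar,
        # widening high/low if needed so they still bracket the open.
        if prev_close is not None and abs(o - prev_close) > eps:
            o = prev_close
            l = min(l, o)
            h = max(h, o)
        cleaned.append((o, h, l, c))
        prev_close = c
    return cleaned


bars = [
    (10.0, 10.5, 9.5, 10.2),
    (10.2, 10.2, 10.2, 10.2),   # flat bar, filtered out
    (10.3, 10.6, 10.25, 10.4),  # open does not match the previous close
]
result = clean_bars(bars)
print(result)
```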


In [0]:
def prices_to_relative(prices):
    """
    Convert high, low, and close prices to fractions relative to the open price
    :param prices: Prices tuple with open, high, low, close, and volume arrays
    :return: Prices tuple with open, rel_high, rel_low, rel_close, and volume
    """
    assert isinstance(prices, Prices)
    rh = (prices.high - prices.open) / prices.open
    rl = (prices.low - prices.open) / prices.open
    rc = (prices.close - prices.open) / prices.open
    return Prices(open=prices.open, high=rh, low=rl, close=rc, volume=prices.volume)

In [0]:
def price_files(dir_name):
    # glob.glob already returns a list of matching paths
    return glob.glob(os.path.join(dir_name, "*.csv"))

In [0]:
def load_relative(csv_file):
    return prices_to_relative(read_csv(csv_file))

In [0]:
def load_year_data(year, basedir='data'):
    y = str(year)[-2:]
    result = {}
    for path in glob.glob(os.path.join(basedir, "*_%s*.csv" % y)):
        result[path] = load_relative(path)
    return result
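
To see which file names the pattern in `load_year_data` selects, the sketch below applies the same pattern with `fnmatch`, the module `glob` uses for name matching. The `matches_year` helper is hypothetical, and the third file name is made up to show an edge case: a file spanning two years matches both.

```python
import fnmatch


def matches_year(file_name, year):
    """Apply the same "*_<yy>*.csv" pattern that load_year_data builds."""
    y = str(year)[-2:]
    return fnmatch.fnmatch(file_name, "*_%s*.csv" % y)


print(matches_year("YNDX_150101_151231.csv", 2015))  # True
print(matches_year("YNDX_160101_161231.csv", 2015))  # False
# Made-up name spanning two years: it matches 2014 and 2015 alike.
print(matches_year("YNDX_140701_150630.csv", 2014))  # True
print(matches_year("YNDX_140701_150630.csv", 2015))  # True
```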

## Environment
