### Background and Setup

So let's say you are a systematic trader and you want to build out your infrastructure to do research and backtesting. What should you look at? Today we are going to look at a couple of finance libraries that may be useful to you.

Because of the level of interest, the quant finance area has some really awesome libraries. However, there are also some less well-supported libraries out there.

Today, while setting up a backtester, we will channel our inner hacker and step outside our comfort zones to poke around unfamiliar libraries with confusing documentation.

Hopefully, by the end of this tutorial, you will also learn some new skills that you can use when you next encounter a befuddling library that you need to use.

In [None]:
# some setup stuff:
# make sure you have the most updated libraries
# pip3.6 install --user --upgrade gevent pandas quandl

# get notebook to show graphs
%pylab inline
matplotlib.style.use('ggplot')
# zipline spams a lot of warnings
import warnings
warnings.filterwarnings('ignore')

The first library that we will look at is Quandl, which is one of those really awesome libraries. Quandl is a data aggregator that gives you a unified API for fetching data, and its Python client returns that data as pandas dataframes. Super simple, and super nice.

From Quandl, you can access data ranging from Yahoo Finance to the St. Louis Fed's FRED database to the Commitments of Traders report. They have a free API as well as a paid API (e.g. [vendors](https://www.quandl.com/vendors) who sell their data on the platform).

Having a unified API means that you don't need to waste time getting your code to interface with 10 different data vendors, or writing different scrapers just to get your data. The flip side is that they now control your data flow, and if you need to access their API more than 50k times a day, you may end up having to pay for access.
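
One way to stay well under a daily call limit is to cache each download on disk and only hit the API when the cache is missing. The sketch below is a minimal illustration and not part of Quandl's client: `cached_fetch`, the cache directory, and the `fake_fetch` stand-in are all hypothetical names. In a notebook you would pass `quandl.get` as the `fetch` argument.

```python
import pickle
import tempfile
from pathlib import Path


def cached_fetch(code, fetch, cache_dir):
    """Return fetch(code), reusing a pickled copy on disk when one exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # dataset codes such as 'CHRIS/CME_CL1' contain a slash, so flatten them for the filename
    cache_file = cache_dir / (code.replace('/', '_') + '.pkl')
    if cache_file.exists():
        with cache_file.open('rb') as f:
            return pickle.load(f)
    result = fetch(code)  # the only line that would touch the network
    with cache_file.open('wb') as f:
        pickle.dump(result, f)
    return result


# hypothetical stand-in for quandl.get so the sketch runs offline
calls = []

def fake_fetch(code):
    calls.append(code)
    return {'code': code, 'rows': [1, 2, 3]}


cache_dir = tempfile.mkdtemp()
first = cached_fetch('CHRIS/CME_CL1', fake_fetch, cache_dir)
second = cached_fetch('CHRIS/CME_CL1', fake_fetch, cache_dir)  # served from disk
print(len(calls))
```

The trade-off is staleness: a cached file never refreshes on its own, so you would need to delete it (or add an expiry check) when you want new data.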

If you do collect a lot of niche/proprietary data, the best tool for you may be [scrapy](http://scrapy.org/). It provides a very well structured way to write your scrapers, so you won't be stuck with a potpourri of different scrapers. Instead, you once again plug into a unified pipeline as soon as possible.

There are even specialized Scraping-as-a-Service companies such as [scrapy cloud](http://scrapinghub.com) that you could consider using. Otherwise, you can also just run it as a cron job/scheduled task.


In [None]:
import quandl
quandl.ApiConfig.api_key = "nCnqK9fotdzGHTfUUsz1"

# first, go to quandl.com and search for whatever ticker symbol tickles your fancy
# let's say we want to look at (continuous) front month crude vs e-mini S&Ps
cl = quandl.get('CHRIS/CME_CL1')
es = quandl.get('CHRIS/CME_ES1')

In [None]:
# quandl.get() returns a pandas dataframe, so you can use all the pandas goodies
# For example, you can use tail to look at the most recent data, just like the unix tail binary!
cl.tail()

In [None]:
# you can also get statistics
es.describe()

Next, let's take a look at zipline. We are going to try to get it to use Quandl data for the backtest. Zipline is a backtesting library that Quantopian open sourced, and that they also develop and use themselves.

Quantopian, by the way, offers a really cool platform for developing quant algorithms, with backtesters, risk analytics etc, and I've heard nothing but good things about them. The main problems that I have with it are that the tradeable universe is currently US equities only (useless for a macro guy), and that you don't really have the flexibility to add whatever you want and mold it to fit your needs.

On the other hand, I may be way too extreme when it comes to flexibility. I am all about customizing my text editor (yay vim!), and the git project I've contributed to most is probably my vimrc/bashrc etc dotfiles config repo. I also need to customize my email client so much that I use mutt. So _you_ may not actually find the inflexibility as suffocating as I do. If that is the case, Quantopian might actually be your best option when you are starting out, since everything works out of the box.

The problem, however, is that documentation for zipline itself is a bit sparse. Try to follow the [example](http://www.zipline.io/beginner-tutorial.html#my-first-algorithm) from the documentation yourself, and try to get it to work with the Quandl dataset that we just downloaded.

Some problems that I encountered were:
- where do you get these functions? (i.e. where do you import them from?)
- where is buyapple.py?
- once I substituted in the Quandl dataset, there was an error about comparing timezone-naive and timezone-aware dates
- how does zipline know which price is which from the data? Do I need to label the data somehow?
- there are no more errors, but why are the trades not getting filled?
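
The timezone error in the list above is not specific to zipline. Plain Python refuses to order a naive datetime against an aware one, and pandas timestamps follow the same rule. A minimal sketch using only the standard library (the dates are arbitrary):

```python
from datetime import datetime, timezone

naive = datetime(2016, 1, 4)                       # no tzinfo attached
aware = datetime(2016, 1, 5, tzinfo=timezone.utc)  # tzinfo attached

try:
    naive < aware
    comparison_failed = False
except TypeError as err:
    comparison_failed = True
    print(err)

# attaching a timezone to the naive value makes the two comparable
localized = naive.replace(tzinfo=timezone.utc)
print(localized < aware)
```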

Some tips:
- search for the functions within the zipline github repo to find where you can import them
- test out your guesses: isolate each one by separating the relevant code into its own notebook cell, using print statements, and raising exceptions
- read important files within the github repo source code to try to understand it more
- search on stack overflow for the relevant stack trace/error
- search for code snippets
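
Besides searching the GitHub repo, Python itself can often tell you where a name lives. The sketch below uses the standard library's `inspect` module on `json.dumps`, chosen purely as an example target; the same calls should work on functions from other importable pure-Python packages.

```python
import inspect
import json

func = json.dumps

# the module that defines the function, i.e. what you would import it from
module_name = inspect.getmodule(func).__name__
print(module_name)

# the file on disk that holds the source, handy for opening it in your editor
source_file = inspect.getsourcefile(func)
print(source_file)

# the signature shows which arguments the function expects
signature = inspect.signature(func)
print(signature)
```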

In [None]:
import zipline as zp
from zipline.api import order, symbol

class BuyCrude(zp.TradingAlgorithm):
    def handle_data(self, data):
        order(symbol('cl'), 10)

algo = BuyCrude()

Fortunately, we are coding in python...

In [None]:
# this means that we can peek into the object internals
algo.__dict__.keys()

In [None]:
# let's look at one of them
algo.slippage
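
The same trick works on any Python object. As a self-contained illustration, the sketch below uses a hypothetical `ToyAlgo` class (not zipline's `TradingAlgorithm`) and shows `vars()`, which returns the same dictionary as `__dict__`, next to `dir()`, which also lists methods and inherited attributes.

```python
class ToyAlgo:
    """Hypothetical stand-in for an algorithm object with some internal state."""

    def __init__(self):
        self.slippage = 'some slippage model'
        self.capital_base = 100000

    def handle_data(self, data):
        pass


toy = ToyAlgo()

# instance attributes only: equivalent to toy.__dict__.keys()
instance_attributes = sorted(vars(toy).keys())
print(instance_attributes)

# dir() additionally includes methods and anything inherited from the class
public_names = [name for name in dir(toy) if not name.startswith('_')]
print(public_names)
```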

In [None]:
import pandas as pd

# zipline looks for a column called 'volume'. if there is no volume your trade will not get filled
data = pd.concat({'es': es.Last, 'cl': cl.Last, 'volume': cl.Volume}, axis=1)

# zipline requires timezone aware timestamps
data_with_tz = data.tz_localize('UTC')

data_with_tz.head()
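
To see what this cell does without downloading anything, here is a small sketch on made-up numbers (the prices and dates are invented, not Quandl data). Passing a dict to `pd.concat` with `axis=1` aligns the series on their index and uses the dict keys as column names, so dates missing from one series show up as `NaN`. `tz_localize` then attaches a timezone without shifting the timestamps.

```python
import pandas as pd

# invented prices: the second series starts one day later than the first
long_history = pd.Series(
    [50.0, 51.0, 52.0],
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06']),
)
short_history = pd.Series(
    [2000.0, 2010.0],
    index=pd.to_datetime(['2016-01-05', '2016-01-06']),
)

combined = pd.concat({'cl': long_history, 'es': short_history}, axis=1)
print(combined)  # 'es' is NaN on the first date

combined_utc = combined.tz_localize('UTC')
print(combined_utc.index.tz)
```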

Oh hey, S&P futures don't go back that far... Let's cut off all the data that is irrelevant.

In [None]:
earliest_es_date = es.index[0]
# at first glance, you could just do
data_with_tz[earliest_es_date:].head()

In [None]:
# but in case that exact date is missing from the index, searchsorted gives us
# the integer position of the first date on or after it
start_position = data.index.searchsorted(earliest_es_date)
data_with_tz.iloc[start_position:].head()
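
Note that `searchsorted` returns an integer position rather than a date: the position of the first index entry that is not earlier than the target. A minimal sketch on an invented index shows the behaviour when the target date is missing:

```python
import pandas as pd

index = pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-08'])
prices = pd.Series([10.0, 11.0, 12.0], index=index)

# 2016-01-06 is not in the index, so we get the position of the next available date
position = index.searchsorted(pd.Timestamp('2016-01-06'))
print(position)

# iloc slices by position, so this keeps everything from 2016-01-08 onwards
trimmed = prices.iloc[position:]
print(trimmed)
```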

In [None]:
# For the purposes of this tutorial, let's only take one year of data so this runs faster

from datetime import datetime, timedelta
import pytz

one_year_ago = datetime.now(tz=pytz.utc) - timedelta(days=365)
limited_data = data_with_tz[data_with_tz.index > one_year_ago]

In [None]:
report = algo.run(limited_data)

In [None]:
# the result that we get back is a pandas dataframe
report.iloc[0]

In [None]:
report.portfolio_value.plot()

In [None]:
# we can see that we are buying every single day
report.transactions.head()

In [None]:
# let's make sure we can access both variables
# and let's play with day of the week effects

class PairTradeAlgo(zp.TradingAlgorithm):
    def handle_data(self, data):
        if data['cl'].dt.dayofweek == 3:
            order(symbol('cl'), 10)
            order(symbol('es'), -1)

report = PairTradeAlgo().run(limited_data)
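
For reference, pandas numbers the days of the week the same way as the standard library's `weekday()`: Monday is 0, so the `== 3` check above fires on Thursdays. A quick check with an arbitrary week:

```python
from datetime import date, timedelta

monday = date(2024, 1, 1)  # this date fell on a Monday
week = [monday + timedelta(days=i) for i in range(7)]

for day in week:
    print(day.isoformat(), day.strftime('%A'), day.weekday())

# the only day in the week that passes the == 3 test
thursdays = [day for day in week if day.weekday() == 3]
```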

In [None]:
# cool, let's double check that we now only trade once a week
report.transactions.head(10)

In [None]:
# let's show the portfolio value chart with the chart of crude and s&p at the same time
fig = plt.figure()

fig.add_subplot(311)
report.portfolio_value.plot()

fig.add_subplot(312)
limited_data.cl.plot()

fig.add_subplot(313)
limited_data.es.plot()

Hmm, we want to make a pretty chart that also shows our buy/sell entry/exit points. We could parse these from report.transactions, but _there must be a better way_!

In [None]:
class WeeklyAlgo(zp.TradingAlgorithm):
    def handle_data(self, data):
        buy = False
        timestamp = data['cl'].dt
        if (timestamp.dayofweek * timestamp.month) % 17 == 3:
            order(symbol('cl'), 10)
            buy = True
        # this puts a buy column and a cl column into the report dataframe we get back
        self.record(buy=buy, cl=data['cl']['price'])

report = WeeklyAlgo().run(limited_data)
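
The entry rule in `WeeklyAlgo` is deliberately arbitrary, so it is not obvious when it fires. Enumerating the combinations of trading weekday (0 to 4, Monday to Friday) and month makes it concrete:

```python
# all (dayofweek, month) pairs for which (dayofweek * month) % 17 == 3
matches = [
    (dayofweek, month)
    for dayofweek in range(5)   # Monday=0 ... Friday=4
    for month in range(1, 13)   # January=1 ... December=12
    if (dayofweek * month) % 17 == 3
]
print(matches)
```

So the rule buys on Tuesdays in March, Wednesdays in October, Thursdays in January and Fridays in May, provided those days appear in the data.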

In [None]:
# magically, those columns appear in our report!
report[['buy', 'cl']].head()

In [None]:
# notice that the report's time index and the timestamp are different...

In [None]:
buy_dates = report[report.buy].index
buy_dates[:5]

In [None]:
crude_price_on_buy_dates = report[report.buy].cl
crude_price_on_buy_dates.head()
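
`report[report.buy]` is ordinary boolean-mask indexing: the `buy` column recorded by the algorithm acts as a row filter. A sketch on an invented three-row frame (the values are made up, and a real zipline report has many more columns):

```python
import pandas as pd

toy_report = pd.DataFrame(
    {'buy': [False, True, False], 'cl': [50.0, 51.5, 49.0]},
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06']),
)

# keep only the rows where buy is True
buys = toy_report[toy_report.buy]
print(buys.index)  # the dates on which we bought
print(buys.cl)     # the recorded price on those dates
```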

Finally, let's make a pretty graph!

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, sharex=True)

report.portfolio_value.plot(ax=ax1, title='Portfolio Value')
report.cl.plot(ax=ax2, title='Crude Price and Entry Points')
crude_price_on_buy_dates.plot(ax=ax2, style='^')


##### and below is just formatting/prettifying ######
#  If you are confused about the code below, be sure to check out
# the data visualization notebook for an explanation

fig.suptitle('Backtesting Results')
# make space for the title
fig.subplots_adjust(top=0.85)

# let's give a breakeven level
ax1.axhline(100000, color='black')

# how many major ticks there are
ax1.yaxis.set_major_locator(MaxNLocator(5))
ax2.yaxis.set_major_locator(MaxNLocator(5))

# use booleans here: newer matplotlib versions no longer accept the 'on'/'off' strings
ax1.tick_params(
    which='both',  # both major and minor ticks
    bottom=False, top=False, right=False,
    labelbottom=False  # labels along the bottom edge are off
)
ax2.tick_params(which='both', top=False, right=False)
ax2.tick_params(which='minor', bottom=False)

Below is an example of how I went about writing WeeklyAlgo when I needed to figure out what was in the data variable, but was unable to isolate it easily.

In [None]:
from pprint import pprint
class ExplorationAlgo(zp.TradingAlgorithm):
    def handle_data(self, data):
        pprint(data.__dict__)
        pprint(data['cl'].dt.month)
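        # stop on purpose after the first bar, once we have seen the data
        raise RuntimeError('stopping after the first bar')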
report = ExplorationAlgo().run(limited_data)
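
The pattern here is general: when a framework calls your function in a loop and you cannot easily reach the object it passes in, print what you need and raise to stop after the first call. The sketch below mimics this with a hypothetical `run_loop` driver standing in for the backtester; `StopExploring` is likewise an invented name. Raising a named exception, rather than a bare `raise`, makes the intent clear in the traceback.

```python
from pprint import pformat


class StopExploring(Exception):
    """Raised on purpose to halt the run once we have seen what we need."""


def run_loop(callback, bars):
    # hypothetical driver: calls the callback once per bar, like a backtester would
    for bar in bars:
        callback(bar)


seen = []

def explore(bar):
    seen.append(pformat(bar))  # capture (or print) the object's contents
    raise StopExploring('stopping after the first bar')


bars = [{'cl': 50.0, 'es': 2000.0}, {'cl': 51.0, 'es': 2010.0}]
try:
    run_loop(explore, bars)
except StopExploring as stop:
    print(seen[0])
    print(stop)
```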

If you have gotten to this stage, try writing and backtesting a more serious trading strategy!

You could also look at another way to solve this problem: adding a [new data source](https://github.com/quantopian/zipline/wiki/How-To-code-a-data-source) to zipline.

Check out the moving average and self.add_transform stuff in [this](http://nbviewer.jupyter.org/github/twiecki/financial-analysis-python-tutorial/blob/master/3.%20Backtesting%20using%20Zipline.ipynb) notebook.

[fecon235](https://github.com/rsvp/fecon235) seems like a pretty cool library with lots of different ideas you could look at. Try your hand at understanding what they do, implementing some strategies from there, and then backtesting them with zipline!

If you enjoy learning through videos, here's a video series by [Sentdex](https://www.youtube.com/playlist?list=PLQVvvaa0QuDeN06s5ervxTfTcVvt-xpZN).

And [here](https://www.quantstart.com/articles/Free-Quantitative-Finance-Resources) are some free data sources to play with.