Add Alpaca pricing source and update documentation
ttt733 committed Jun 26, 2019
1 parent cd2cb74 commit 874654f
Showing 18 changed files with 373 additions and 70 deletions.
64 changes: 44 additions & 20 deletions README.md
@@ -15,11 +15,11 @@ If you are looking to use this library for your Quantopian algorithm,
check out the [migration document](./migration.md).

## Data Sources
This library predominantly relies on the [IEX public data API](https://iextrading.com/developer/docs/) for daily
prices and fundamentals, but plans to connect to other data sources in
the future. Currently supported data sources include the following.
This library predominantly relies on the [Alpaca Data API](https://docs.alpaca.markets/api-documentation/api-v2/market-data/) for daily
price data. For users with funded Alpaca brokerage accounts, several [Polygon](https://polygon.io/) fundamental
data endpoints are supported. [IEX Cloud](https://iexcloud.io/docs/api/) data is also supported, though if too much
data is requested, it stops being free. (See the note in the IEX section below.)

- [Alpaca/Polygon](https://docs.alpaca.markets/)

## Install

@@ -75,8 +75,47 @@ and returns a DataFrame with the data for the current date (US/Eastern time).
Its constructor accepts a `list_symbol` function that should return the full set of
symbols as a string list, which is used as the maximum universe inside the engine.

## Alpaca Data API
The [Alpaca Data API](https://docs.alpaca.markets/api-documentation/api-v2/market-data/) is currently the least-limited source of pricing data
supported by pipeline-live. In order to use the Alpaca Data API, you'll need to
register for an Alpaca account [here](https://app.alpaca.markets/signup) and generate API key information with
the dashboard. Once you have your keys generated, you need to store them in
the following environment variables:

```
APCA_API_BASE_URL
APCA_API_KEY_ID
APCA_API_SECRET_KEY
```
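Before running a pipeline, it can help to confirm that all three variables are visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `missing_alpaca_vars` is an illustration and is not part of pipeline-live.

```python
import os

# The three variable names listed above.
REQUIRED_VARS = ("APCA_API_BASE_URL", "APCA_API_KEY_ID", "APCA_API_SECRET_KEY")


def missing_alpaca_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_alpaca_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All Alpaca environment variables are set.")
```

Running this in the same shell session that will launch your algorithm shows whether the keys were exported correctly.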

### pipeline_live.data.alpaca.pricing.USEquityPricing
This class provides the basic price information retrieved from
[Alpaca Data API](https://docs.alpaca.markets/api-documentation/api-v2/market-data/bars/).

## Polygon Data Source API
You will need to set an [Alpaca](https://alpaca.markets/) API key as `APCA_API_KEY_ID` to use this API.

### pipeline_live.data.polygon.fundamentals.PolygonCompany
This class provides the DataSet interface using
[Polygon Symbol Details API](https://polygon.io/docs/#!/Meta-Data/get_v1_meta_symbols_symbol_company)

### pipeline_live.data.polygon.filters.IsPrimaryShareEmulation
Experimental. This class filters symbols by the following
rules to return something close to
[IsPrimaryShare()](https://www.quantopian.com/help#quantopian_pipeline_filters_fundamentals_IsPrimaryShare) in Quantopian.

- must be a US company
- must have valid financial data
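
The two rules can be sketched as a plain predicate over per-symbol records. This is only an illustration of the logic, not the filter's actual implementation: the function name, the dict-shaped records, and the `country` key with value `"us"` are all assumptions made for the example.

```python
def looks_like_primary_share(company, financials):
    """Apply both rules to one symbol's records.

    Both arguments are plain dicts, or None when no record exists.
    The `country` key and the "us" value are hypothetical field names.
    """
    is_us_company = bool(company) and company.get("country") == "us"
    has_financials = bool(financials)
    return is_us_company and has_financials


# Hypothetical records for three symbols.
print(looks_like_primary_share({"country": "us"}, {"revenue": 1.0}))
print(looks_like_primary_share({"country": "ca"}, {"revenue": 1.0}))
print(looks_like_primary_share({"country": "us"}, None))
```

A symbol passes only when both checks succeed, so a missing company record or missing financial data excludes it.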

## IEX Data Source API
You don't have to configure anything to use these API
To use IEX-source data, you need to sign up for an IEX Cloud account and save
your IEX token as an environment variable called `IEX_TOKEN`.

IMPORTANT NOTE: IEX data is now limited for free accounts. In order to
avoid using more messages than you are allotted each month, please
be sure that you are not using IEX-sourced factors too frequently
or on too many securities. For more information about how many messages
each method will cost, please refer to [this part](https://iexcloud.io/docs/api/#data-weighting) of the IEX Cloud documentation.
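
To get a feel for how quickly messages add up, a rough estimate can be computed as symbols times per-symbol weight times runs. The sketch below uses made-up numbers: the weight of 10 and the universe sizes are placeholders, so check the IEX Cloud data-weighting page for real per-endpoint weights and your plan's quota.

```python
def estimate_monthly_messages(num_symbols, weight_per_symbol,
                              runs_per_day, trading_days=21):
    """Rough estimate: symbols x weight x runs per day x trading days."""
    return num_symbols * weight_per_symbol * runs_per_day * trading_days


# Hypothetical numbers for illustration only.
full_universe = estimate_monthly_messages(8000, 10, 1)
filtered_universe = estimate_monthly_messages(500, 10, 1)
print(full_universe, filtered_universe)
```

The estimate scales linearly with the number of symbols, which is why screening the universe with pricing-based filters before applying IEX-sourced factors reduces usage.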

### pipeline_live.data.iex.pricing.USEquityPricing
This class provides the basic price information retrieved from
@@ -102,18 +141,3 @@ A shortcut for `IEXCompany.sector.latest`

### pipeline_live.data.iex.classifiers.Industry()
A shortcut for `IEXCompany.industry.latest`

## Alpaca/Polygon Data Source API
You will need to set [Alpaca](https://alpaca.markets/) API key to use these API.

### pipeline_live.data.polygon.fundamentals.PolygonCompany
This class provides the DataSet interface using
[Polygon Symbol Details API](https://polygon.io/docs/#!/Meta-Data/get_v1_meta_symbols_symbol_company)

### pipeline_live.data.polygon.filters.IsPrimaryShareEmulation
Experimental. This class filters symbols by the following
rules to return something close to
[IsPrimaryShare()](https://www.quantopian.com/help#quantopian_pipeline_filters_fundamentals_IsPrimaryShare) in Quantopian.

- must be a US company
- must have valid financial data
40 changes: 25 additions & 15 deletions migration.md
@@ -12,20 +12,22 @@ pylivetrader can run the pipeline object from this package.
## USEquityPricing
The most important class to think about first is the USEquityPricing class
and it is well covered by
`pipeline_live.data.iex.pricing.USEquityPricing` class.
`pipeline_live.data.alpaca.pricing.USEquityPricing` class.
This class gets the market-wide daily price data (OHLCV) up to the
previous day from [IEX chart API](https://iextrading.com/developer/docs/#chart).
Depending on the requested window length from its upstream pipeline, it
fetches different size of the data range (e.g. 3m, 1y). Again, the volume of
this data is market-wide size, so it's safe to use this with factors such
as AverageDollarVolume.
previous day from [Alpaca data API](https://docs.alpaca.markets/api-documentation/api-v2/market-data/bars/).
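
"Up to the previous day" means that a query for trading day N is answered with data from session N - 1. A minimal sketch of that lookup follows, using only the standard library. The helper `previous_session` and the four-day calendar are illustrative assumptions; the loader in this package works against the NYSE trading calendar instead.

```python
from bisect import bisect_left
from datetime import date


def previous_session(sessions, day):
    """Return the trading session immediately before `day`.

    `sessions` must be a sorted list of trading dates.
    """
    idx = bisect_left(sessions, day)
    if idx == 0:
        raise ValueError("no session before {}".format(day))
    return sessions[idx - 1]


# Hypothetical calendar: a Friday followed by Monday through Wednesday.
sessions = [date(2019, 6, 21), date(2019, 6, 24),
            date(2019, 6, 25), date(2019, 6, 26)]
print(previous_session(sessions, date(2019, 6, 24)))
```

Because the lookup walks the session list rather than subtracting a calendar day, a Monday query resolves to the preceding Friday.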

## Factors
In order to use many of the builtin factors with this price data loader,
you need to use `pipeline_live.data.iex.factors` package which has
all the builtin factor classes ported from zipline.
you need to use `pipeline_live.data.alpaca.factors` package which has
all the builtin factor classes ported from zipline. Use of the Alpaca data API
requires an Alpaca account, which you can sign up for [here](https://app.alpaca.markets/signup).

For example, if you have these lines,
Once you have an Alpaca account, you will need to store your account info
from their dashboard as environment variables. You can find information about
how to do so on [this documentation page](https://docs.alpaca.markets/api-documentation/how-to/).

To use the Alpaca factors, import them from `pipeline_live.data.alpaca.factors`.
For example, if you have these lines on Quantopian,

```py
from quantopian.pipeline.factors import (
@@ -37,10 +39,10 @@ from quantopian.pipeline.data.builtin import USEquityPricing
you can rewrite it to something like this.

```py
from pipeline_live.data.iex.factors import (
from pipeline_live.data.alpaca.factors import (
AverageDollarVolume, SimpleMovingAverage,
)
from pipeline_live.data.iex.pricing import USEquityPricing
from pipeline_live.data.alpaca.pricing import USEquityPricing
```

Of course, the builtin factor classes in the original zipline are mostly
@@ -49,27 +51,35 @@ ones, they also work with this `USEquityPricing`.

```py
from zipline.pipeline.factors import AverageDollarVolume
from pipeline_live.data.iex.pricing import USEquityPricing
from pipeline_live.data.alpaca.pricing import USEquityPricing

dollar_volume = AverageDollarVolume(
inputs=[USEquityPricing.close, USEquityPricing.volume],
window_length=20,
)
```

The only difference in the factor classes in `pipeline_live.data.iex.factors`
is that some of the classes have IEX's USEquityPricing as the default
The only difference in the factor classes in `pipeline_live.data.alpaca.factors`
is that some of the classes have Alpaca's USEquityPricing as the default
inputs, so you don't need to explicitly specify it.
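
The default-input swap can be pictured as creating a same-named subclass whose `inputs` attribute is remapped. The sketch below uses toy stand-ins (plain strings for columns and a bare class for the factor) rather than zipline objects, so every name in it is an assumption for illustration.

```python
# Stand-ins for the original and the live pricing columns (hypothetical).
ZIPLINE_CLOSE, ZIPLINE_VOLUME = "zipline.close", "zipline.volume"
LIVE_CLOSE, LIVE_VOLUME = "alpaca.close", "alpaca.volume"

COLUMN_MAP = {ZIPLINE_CLOSE: LIVE_CLOSE, ZIPLINE_VOLUME: LIVE_VOLUME}


class BaseAverageDollarVolume:
    """Toy factor whose default inputs point at the original columns."""
    inputs = (ZIPLINE_CLOSE, ZIPLINE_VOLUME)


def with_live_inputs(factor_cls):
    """Build a subclass whose default inputs are remapped via COLUMN_MAP."""
    new_inputs = tuple(COLUMN_MAP.get(col, col) for col in factor_cls.inputs)
    return type(factor_cls.__name__, (factor_cls,), {"inputs": new_inputs})


AverageDollarVolume = with_live_inputs(BaseAverageDollarVolume)
print(AverageDollarVolume.inputs)
```

Since the new class inherits everything else from the original, its computation is unchanged; only the default data source differs.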

## Fundamentals
The Quantopian platform allows you to retrieve various proprietary data
sources through pipeline, including Morningstar fundamentals. While the
intention of this pipeline-live library is to add more such proprietary
data sources, the free alternative at the moment is IEX. There are two
data sources, the alternative at the moment is IEX. There are two
main dataset classes built into this library, `IEXCompany` and
`IEXKeyStats`. Those both belong to the `pipeline_live.data.iex.fundamentals`
package.

Please note that, in order to use the IEX API data, you will need to sign up
for an IEX Cloud account [here](https://iexcloud.io/cloud-login#/register/) and set your IEX Cloud token in the
`IEX_TOKEN` environment variable. IEX limits your API messages per month. In
order to avoid running over your message quota, please make sure that you
filter your stock universe as much as possible before using IEX API data.
If you wish to use IEX data to frequently filter a larger set of symbols, you
may need to upgrade your IEX Cloud account.

### IEXCompany
This dataset class maps the basic stock information from the
[Company API](https://iextrading.com/developer/docs/#company).
Empty file.
35 changes: 35 additions & 0 deletions pipeline_live/data/alpaca/factors.py
@@ -0,0 +1,35 @@
'''
Duplicate builtin factor classes in zipline with Alpaca's USEquityPricing
'''

from zipline.pipeline.data import USEquityPricing as z_pricing
from zipline.pipeline import factors as z_factors

from .pricing import USEquityPricing as alpaca_pricing


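def _replace_inputs(inputs):
    # Map zipline's pricing columns to their Alpaca-backed equivalents.
    # Named column_map to avoid shadowing the builtin `map`.
    column_map = {
        z_pricing.open: alpaca_pricing.open,
        z_pricing.high: alpaca_pricing.high,
        z_pricing.low: alpaca_pricing.low,
        z_pricing.close: alpaca_pricing.close,
        z_pricing.volume: alpaca_pricing.volume,
    }

    # Leave non-sequence inputs (e.g. unspecified defaults) untouched.
    if not isinstance(inputs, (list, tuple, set)):
        return inputs
    return tuple(column_map.get(inp, inp) for inp in inputs)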


for name in dir(z_factors):
    factor = getattr(z_factors, name)
    if factor != z_factors.Factor and hasattr(
            factor, 'inputs') and issubclass(
            factor, z_factors.Factor):
        new_factor = type(factor.__name__, (factor,), {
            'inputs': _replace_inputs(factor.inputs)
        })
        locals()[factor.__name__] = new_factor
23 changes: 23 additions & 0 deletions pipeline_live/data/alpaca/pricing.py
@@ -0,0 +1,23 @@
from zipline.pipeline.data.dataset import Column, DataSet
from zipline.utils.numpy_utils import float64_dtype

from .pricing_loader import USEquityPricingLoader


# In order to use it as a cache key, we have to make it singleton
_loader = USEquityPricingLoader()


class USEquityPricing(DataSet):
    """
    Dataset representing daily trading prices and volumes.
    """
    open = Column(float64_dtype)
    high = Column(float64_dtype)
    low = Column(float64_dtype)
    close = Column(float64_dtype)
    volume = Column(float64_dtype)

    @staticmethod
    def get_loader():
        return _loader
115 changes: 115 additions & 0 deletions pipeline_live/data/alpaca/pricing_loader.py
@@ -0,0 +1,115 @@
import numpy as np
import logbook
import pandas as pd

from zipline.lib.adjusted_array import AdjustedArray
from zipline.pipeline.loaders.base import PipelineLoader
from zipline.utils.calendars import get_calendar
from zipline.errors import NoFurtherDataError

from pipeline_live.data.sources import alpaca


log = logbook.Logger(__name__)


class USEquityPricingLoader(PipelineLoader):
    """
    PipelineLoader for US Equity Pricing data
    """

    def __init__(self):
        cal = get_calendar('NYSE')

        self._all_sessions = cal.all_sessions

    def load_adjusted_array(self, columns, dates, symbols, mask):
        # load_adjusted_array is called with dates on which the user's algo
        # will be shown data, which means we need to return the data that would
        # be known at the start of each date. We assume that the latest data
        # known on day N is the data from day (N - 1), so we shift all query
        # dates back by a day.
        start_date, end_date = _shift_dates(
            self._all_sessions, dates[0], dates[-1], shift=1,
        )

        sessions = self._all_sessions
        sessions = sessions[(sessions >= start_date) & (sessions <= end_date)]

        timedelta = pd.Timestamp.utcnow() - start_date
        chart_range = timedelta.days + 1
        log.info('chart_range={}'.format(chart_range))
        prices = alpaca.get_stockprices(chart_range)

        dfs = []
        for symbol in symbols:
            if symbol not in prices:
                df = pd.DataFrame(
                    {c.name: c.missing_value for c in columns},
                    index=sessions
                )
            else:
                df = prices[symbol]
                df = df.reindex(sessions, method='ffill')
            dfs.append(df)

        raw_arrays = {}
        for c in columns:
            colname = c.name
            raw_arrays[colname] = np.stack([
                df[colname].values for df in dfs
            ], axis=-1)
        out = {}
        for c in columns:
            c_raw = raw_arrays[c.name]
            out[c] = AdjustedArray(
                c_raw.astype(c.dtype),
                {},
                c.missing_value
            )
        return out


def _shift_dates(dates, start_date, end_date, shift):
    try:
        start = dates.get_loc(start_date)
    except KeyError:
        if start_date < dates[0]:
            raise NoFurtherDataError(
                msg=(
                    "Pipeline Query requested data starting on {query_start}, "
                    "but first known date is {calendar_start}"
                ).format(
                    query_start=str(start_date),
                    calendar_start=str(dates[0]),
                )
            )
        else:
            raise ValueError("Query start %s not in calendar" % start_date)

    # Make sure that shifting doesn't push us out of the calendar.
    if start < shift:
        raise NoFurtherDataError(
            msg=(
                "Pipeline Query requested data from {shift}"
                " days before {query_start}, but first known date is only "
                "{start} days earlier."
            ).format(shift=shift, query_start=start_date, start=start),
        )

    try:
        end = dates.get_loc(end_date)
    except KeyError:
        if end_date > dates[-1]:
            raise NoFurtherDataError(
                msg=(
                    "Pipeline Query requesting data up to {query_end}, "
                    "but last known date is {calendar_end}"
                ).format(
                    query_end=end_date,
                    calendar_end=dates[-1],
                )
            )
        else:
            raise ValueError("Query end %s not in calendar" % end_date)
    return dates[start - shift], dates[end - shift]
1 change: 1 addition & 0 deletions pipeline_live/data/polygon/filters.py
@@ -22,6 +22,7 @@ def compute(self, today, symbols, out, *inputs):
        ], dtype=bool)
        out[:] = ary


class StaticSymbols(CustomFilter):
    inputs = ()
    window_length = 1
41 changes: 41 additions & 0 deletions pipeline_live/data/sources/alpaca.py
@@ -0,0 +1,41 @@
import alpaca_trade_api as tradeapi

from .util import (
    daily_cache, parallelize
)


def list_symbols():
    return [
        a.symbol for a in tradeapi.REST().list_assets()
        if a.tradable and a.status == 'active'
    ]


def get_stockprices(limit=365, timespan='day'):
    all_symbols = list_symbols()

    @daily_cache(filename='alpaca_chart_{}'.format(limit))
    def get_stockprices_cached(all_symbols):
        return _get_stockprices(all_symbols, limit, timespan)

    return get_stockprices_cached(all_symbols)


def _get_stockprices(symbols, limit=365, timespan='day'):
    '''Get price bars for the given symbols from Alpaca.
    Requests are split into batches to stay under Alpaca's limit of
    200 symbols per request.
    '''

    def fetch(symbols):
        barset = tradeapi.REST().get_barset(symbols, timespan, limit)
        data = {}
        for symbol in barset:
            df = barset[symbol].df
            # Update the index format for comparison with the trading calendar
            df.index = df.index.tz_convert('UTC').normalize()
            data[symbol] = df.asfreq('C')

        return data

    return parallelize(fetch, splitlen=199)(symbols)
