Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coin api duplicated symbols fix #3262

Merged
merged 3 commits into from Jun 5, 2019

Conversation

Projects
None yet
4 participants
@Jay-Jay-D
Copy link
Collaborator

commented Jun 3, 2019

Description

This PR solves an issue with CoinApi symbols for Bitfinex, it turns out that raw data folders of older dates have duplicated data for symbols with four letters e.g. DASH.

Also changes the logic of CoinApiSymbolMapper, now making request to CoinApi API is the default behavior, but it can be configured to read a local file. This is useful for large paralellized bat processes.

Related Issue

N/A.

Motivation and Context

The duplicated symbol issue generates error when two threads tries to write the same file.

The CoinApiSymbolMapper issue hit request limits when processing processing CoinApi data in parallel.

Requires Documentation Change

No.

How Has This Been Tested?

Local run.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Non-functional change (xml comments/documentation/etc)

Checklist:

  • My code follows the code style of this project.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My branch follows the naming convention bug-<issue#>-<description> or feature-<issue#>-<description>

@Jay-Jay-D Jay-Jay-D requested a review from StefanoRaggi Jun 3, 2019

@mchandschuh
Copy link
Collaborator

left a comment

Looks good, but questioning the file's time check as it appears to be backwards (or perhaps I'm misunderstanding it's purpose) -- either way some comments could be helpful :)

@@ -31,7 +33,8 @@ public class CoinApiSymbolMapper : ISymbolMapper
{
private const string RestUrl = "https://rest.coinapi.io";
private readonly string _apiKey = Config.Get("coinapi-api-key");

private readonly bool _useLocalSymbolList = Config.GetBool("coinapi-use-local-symbol-list", false);
private readonly FileInfo _coinApiSymbolsListFile = new FileInfo("CoinApiSymbols.json");

This comment has been minimized.

Copy link
@mchandschuh

mchandschuh Jun 4, 2019

Collaborator

Should this file be sourced from configuration instead of hard-coded?

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 5, 2019

Member

Agree we should give this option with a default value:

private readonly FileInfo _coinApiSymbolsListFile = new FileInfo(
    Config.Get("coinapi-default-symbol-list-file", "CoinApiSymbols.json"));

using (var wc = new WebClient())
if (_useLocalSymbolList && _coinApiSymbolsListFile.Exists && _coinApiSymbolsListFile.CreationTimeUtc.Date < DateTime.UtcNow.Date.AddDays(-1))

This comment has been minimized.

Copy link
@mchandschuh

mchandschuh Jun 4, 2019

Collaborator

Am I reading this correct? We want to use the local file if it is older than yesterday? Also, using a file time to determine the age of data may not be the best -- wondering if it would be better to put the time the data was generated into the file instead? Consider a file being a week old and then we copy it to a new machine and it would show as new (methinks?).

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 4, 2019

Member

Aside from the inverted file date check, a couple of alternatives to using the file timestamp:

  1. adding the date in the filename, e.g. CoinApiSymbols.yyyyMMdd.json
  2. reading the last date from the data itself (in the data_end field), this is the file format:
[
  {
    "symbol_id": "COINBASE_SPOT_BTC_USD",
    "exchange_id": "COINBASE",
    "symbol_type": "SPOT",
    "asset_id_base": "BTC",
    "asset_id_quote": "USD",
    "data_start": "2015-01-14",
    "data_end": "2019-05-29",
    "data_quote_start": "2015-05-17T00:51:36.2870000Z",
    "data_quote_end": "2019-05-29T22:57:31.6924043Z",
    "data_orderbook_start": "2015-05-17T00:51:36.2870000Z",
    "data_orderbook_end": "2019-05-29T19:02:17.4261189Z",
    "data_trade_start": "2015-01-14T16:07:05.0000000Z",
    "data_trade_end": "2019-05-29T23:05:46.5590000Z",
    "volume_1hrs": 421.426615810,
    "volume_1hrs_usd": 3681568.97,
    "volume_1day": 12742.682746710,
    "volume_1day_usd": 111319654.79,
    "volume_1mth": 603712.812420610,
    "volume_1mth_usd": 5274015151.04
  },
  ...
]

This comment has been minimized.

Copy link
@Jay-Jay-D

Jay-Jay-D Jun 4, 2019

Author Collaborator

My bad, not enough coffee that afternoon.

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 5, 2019

Member

Thinking more about it, I'm wondering if we even need the file date check 🤔

When the CoinApiSymbolMapper is used in the CoinApiDataQueueHandler or in an occasional converter run, there will only be a single instance, so API rate limits are not an issue.

When it's used for massively parallel processing with the CoinApiDataConverter, the date check could return false for all processes and we wouldn't guarantee rate limits won't be hit. Perhaps for these scenarios, the file could be pre-downloaded (once a day) and the symbol mapper would just read the file with no date checks?

if (_useLocalSymbolList)
{
    if (!_coinApiSymbolsListFile.Exists) 
    {
        throw new Exception($"File not found: {_coinApiSymbolsListFile.FullName}");
    }
}

using (var wc = new WebClient())
if (_useLocalSymbolList && _coinApiSymbolsListFile.Exists && _coinApiSymbolsListFile.CreationTimeUtc.Date < DateTime.UtcNow.Date.AddDays(-1))

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 4, 2019

Member

Aside from the inverted file date check, a couple of alternatives to using the file timestamp:

  1. adding the date in the filename, e.g. CoinApiSymbols.yyyyMMdd.json
  2. reading the last date from the data itself (in the data_end field), this is the file format:
[
  {
    "symbol_id": "COINBASE_SPOT_BTC_USD",
    "exchange_id": "COINBASE",
    "symbol_type": "SPOT",
    "asset_id_base": "BTC",
    "asset_id_quote": "USD",
    "data_start": "2015-01-14",
    "data_end": "2019-05-29",
    "data_quote_start": "2015-05-17T00:51:36.2870000Z",
    "data_quote_end": "2019-05-29T22:57:31.6924043Z",
    "data_orderbook_start": "2015-05-17T00:51:36.2870000Z",
    "data_orderbook_end": "2019-05-29T19:02:17.4261189Z",
    "data_trade_start": "2015-01-14T16:07:05.0000000Z",
    "data_trade_end": "2019-05-29T23:05:46.5590000Z",
    "volume_1hrs": 421.426615810,
    "volume_1hrs_usd": 3681568.97,
    "volume_1day": 12742.682746710,
    "volume_1day_usd": 111319654.79,
    "volume_1mth": 603712.812420610,
    "volume_1mth_usd": 5274015151.04
  },
  ...
]
var parts = x.Name.Split('-').Take(2);
return string.Join("-", parts);
}
);

This comment has been minimized.

Copy link
@StefanoRaggi

@jaredbroad jaredbroad requested a review from StefanoRaggi Jun 4, 2019

@StefanoRaggi
Copy link
Member

left a comment

PR looks good, the main question is if we need handling file timestamps or not.

@@ -31,7 +33,8 @@ public class CoinApiSymbolMapper : ISymbolMapper
{
private const string RestUrl = "https://rest.coinapi.io";
private readonly string _apiKey = Config.Get("coinapi-api-key");

private readonly bool _useLocalSymbolList = Config.GetBool("coinapi-use-local-symbol-list", false);

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 5, 2019

Member

Config.GetBool defaults to false, so we can simplify:

private readonly bool _useLocalSymbolList = Config.GetBool("coinapi-use-local-symbol-list");
@@ -31,7 +33,8 @@ public class CoinApiSymbolMapper : ISymbolMapper
{
private const string RestUrl = "https://rest.coinapi.io";
private readonly string _apiKey = Config.Get("coinapi-api-key");

private readonly bool _useLocalSymbolList = Config.GetBool("coinapi-use-local-symbol-list", false);
private readonly FileInfo _coinApiSymbolsListFile = new FileInfo("CoinApiSymbols.json");

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 5, 2019

Member

Agree we should give this option with a default value:

private readonly FileInfo _coinApiSymbolsListFile = new FileInfo(
    Config.Get("coinapi-default-symbol-list-file", "CoinApiSymbols.json"));

using (var wc = new WebClient())
if (_useLocalSymbolList && _coinApiSymbolsListFile.Exists && _coinApiSymbolsListFile.CreationTimeUtc.Date < DateTime.UtcNow.Date.AddDays(-1))

This comment has been minimized.

Copy link
@StefanoRaggi

StefanoRaggi Jun 5, 2019

Member

Thinking more about it, I'm wondering if we even need the file date check 🤔

When the CoinApiSymbolMapper is used in the CoinApiDataQueueHandler or in an occasional converter run, there will only be a single instance, so API rate limits are not an issue.

When it's used for massively parallel processing with the CoinApiDataConverter, the date check could return false for all processes and we wouldn't guarantee rate limits won't be hit. Perhaps for these scenarios, the file could be pre-downloaded (once a day) and the symbol mapper would just read the file with no date checks?

if (_useLocalSymbolList)
{
    if (!_coinApiSymbolsListFile.Exists) 
    {
        throw new Exception($"File not found: {_coinApiSymbolsListFile.FullName}");
    }
}

@Jay-Jay-D Jay-Jay-D force-pushed the coin-api-duplicated-symbols-fix branch from 8b4358f to 582b8d4 Jun 5, 2019

@jaredbroad jaredbroad merged commit b168592 into master Jun 5, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.