# GrabData
"You're only as good as your data" - Some Data Scientist  
Before coming up with strategies we need data to work with. To get it, we set up an AWS instance where this file runs 24/7, collecting data for 47 coin pairs. We collect market info and orderbook data and record everything, timestamped, in CSV files.

One hour of BTC-ETH [Info](https://github.com/darkfireXXI/Algo_Crypto_Trading/blob/master/BTC-ETH_Info_2018_11_16.csv) and [Depth](https://github.com/darkfireXXI/Algo_Crypto_Trading/blob/master/BTC-ETH_Depth_2018_11_16.csv) data taken at 30 second intervals is posted in this repository so you can see what the final output looks like.

## WebSockets vs REST API
The pros and cons of each are well documented in many other places, so I'll be brief here. At 30 second or even 1 minute intervals REST APIs make more sense.  

WebSockets allow for a continuous data stream, essentially listening to everything that happens. The major upside is that WebSockets would lower our TCP connection overhead, but at minute-scale intervals this overhead is not an issue to begin with. While it is possible to adjust some of the Binance WebSockets to only stream data at a given time interval, this only seemed to make sense if our time interval was 1 second or less, and we don't require that level of granularity. 

### Imports, API Key, and Date

In [1]:
import json
import os
import time
import numpy as np
import csv
import datetime
import smtplib
from email.mime.text import MIMEText
import traceback
from binance.client import Client
client = Client('put your API','keys here') # on Binance you can set this one to be read only, in which case there's low security risk

import sys
np.set_printoptions(threshold=sys.maxsize) # print arrays in full; newer NumPy versions reject np.nan here

now = datetime.datetime.now()
year = now.year
month = '{:02d}'.format(now.month) # zero-padded, e.g. '07'
day = '{:02d}'.format(now.day)

### Settings
The main things to edit here are the spacing, depthSize, currency pairs, path, and email recipients.  
* spacing adjusts the time lag between each currency pair API call. Normally this can be left at 0, but if you begin collecting over 100 currency pairs you might want to make it 0.1 in order to avoid spamming the Binance server and hitting the API rate limit ([details/limits](https://github.com/sammchardy/python-binance/blob/master/docs/overview.rst#api-rate-limit))  
* depthSize sets how many bids and asks you will collect from the orderbook. It is currently set at 5, so you will collect the 5 highest buy orders and the 5 lowest sell orders (the code requests the order book with Binance's default limit, so 100 is the maximum here)
* abc_Markets (BTC_Markets, ETH_Markets, USDT_Markets) is where, for each of the 3 base markets on Binance, you write in the currencies you want to collect
* path is the path to where you will save your CSV files. In the terminal/console you can cd your way to the directory and use pwd to find the associated path
* email recipients are the people who should be notified if the code errors out (it won't happen though)
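
As a rough illustration of how these settings combine (a sketch, not part of the script itself): each entry in abc_Markets is joined with its base currency to form the Binance symbol, and the same two names form the label used in the CSV file names. The `markets` dict and the `build_pairs` helper are hypothetical names introduced only for this example.

```python
# Hypothetical layout: base currency -> currencies traded against it
markets = {
    'BTC': ['ETH', 'XLM', 'XRP'],
    'ETH': ['XLM', 'XRP'],
    'USDT': ['BTC'],
}

def build_pairs(markets):
    """Return (symbol, label) tuples for every configured pair."""
    pairs = []
    for base, currencies in markets.items():
        for fx in currencies:
            symbol = fx + base                # e.g. 'ETH' + 'BTC' -> 'ETHBTC'
            label = '{}-{}'.format(base, fx)  # e.g. 'BTC-ETH', as in the CSV names
            pairs.append((symbol, label))
    return pairs

pairs = build_pairs(markets)
print(len(pairs))  # 6
print(pairs[0])    # ('ETHBTC', 'BTC-ETH')
```

The count printed here should match the "Grabbing 6 Currency Pairs" message from the settings cell.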

In [2]:
spacing = 0 # seconds
depthSize = 5

BTC_Markets = ['ETH', 'XLM', 'XRP'] # helps to keep them alphabetical
ETH_Markets = ['XLM', 'XRP']
USDT_Markets = ['BTC']

BTC_lasts = []
ETH_lasts = []
USDT_lasts = []

BTC_orderBooks = []
ETH_orderBooks = []
USDT_orderBooks = []

currencypairs = len(BTC_Markets) + len(ETH_Markets) + len(USDT_Markets)
print('Grabbing {} Currency Pairs'.format(currencypairs)) # should currently be 6
print(' {} from BTC Markets'.format(len(BTC_Markets)))
print(' {} from ETH Markets'.format(len(ETH_Markets)))
print(' {} from USDT Markets'.format(len(USDT_Markets)))

filePath = '/path/' # INSERT WORKING DIRECTORY (PWD)
recipients = 'you@cryptoTrading.com'

timer = 1

### Grab Info & Depth Function
This function takes depthSize, the FX currency, and the base currency as inputs, fetches the market info and orderbook data, and returns them as lists to be recorded into the associated CSVs.  

If there is a problem establishing a connection, the function makes two more attempts to connect before finally taking the previous data points (Last and orderbook) and feeding them through as repeat data points in order to keep continuity. The first API call success rate is 99.9888%, so you can rest assured that it will _likely never_ have to use this last resort method.  

To elaborate on the _likely never_ aspect: The first request error rate is 0.0112%. Given that the first API request errors let's assume the second API request is 50% likely to error. Assuming these both fail the third API request will almost surely fail, let's say 90% likely. This still brings us to 0.00504% chance that a single data point will be re-recorded.
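
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 50% and 90% figures are the assumptions stated in the text, and the per-day figure additionally assumes one sample every 30 seconds for a single pair.

```python
p_first = 0.0112 / 100        # measured first-request error rate
p_second_given_first = 0.5    # assumed: second attempt fails half the time
p_third_given_both = 0.9      # assumed: third attempt almost surely fails

# probability that all three attempts fail for one data point
p_all_fail = p_first * p_second_given_first * p_third_given_both
print('{:.5f}%'.format(p_all_fail * 100))  # 0.00504%

# expected number of repeated data points per pair per day at 30 second intervals
samples_per_day = 24 * 60 * 2
expected_repeats = samples_per_day * p_all_fail
print(round(expected_repeats, 3))
```

Under these assumptions a single pair would repeat a data point roughly once a week.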

In [3]:
def GrabInfoDepthVerify(Last, orderBook, depthSize, FXcurrency, baseCurrency):
    Pair = FXcurrency + baseCurrency
    # Ticker fields to record, in the same order as the Info CSV columns
    fields = ['lowPrice', 'bidPrice', 'lastPrice', 'askPrice', 'highPrice', 'weightedAvgPrice', 'priceChangePercent', 'bidQty', 'lastQty', 'askQty', 'volume']
    for attempt in range(1, 4): # first request plus two retries
        try:
            Ticker = client.get_ticker(symbol=Pair)
            Last0 = [float(Ticker[field]) for field in fields]

            depth = client.get_order_book(symbol=Pair)
            orderBook0 = organizeDepth(depth, depthSize)
            return Last0, orderBook0
        except Exception:
            print('Get Info/Depth Fail {} - {}'.format(attempt, Pair))
            if attempt == 3:
                traceback.print_exc()

    # all three attempts failed: reuse the previous data point,
    # dropping the old timestamp string if one was inserted at index 0
    if isinstance(Last[0], str):
        Last.pop(0)
    if isinstance(orderBook[0], str):
        orderBook.pop(0)
    return Last, orderBook

### Organizing the Order Book Data
The Depth data Binance returns is a bit of a mess to go through compared to Ticker data, so I wrote a function which takes the raw Depth data and depthSize as inputs and returns the orderbook bids/asks to the depth specified organized neatly and ready to be recorded.
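
To make the output layout concrete, here is a small pure-Python sketch of the same reordering, run on made-up depth data. `organize_depth_sketch` is a hypothetical stand-in for the NumPy-based function in the next cell, and the prices and quantities are invented for illustration.

```python
def organize_depth_sketch(depth, depth_size):
    """Flatten raw depth data into
    [bid prices, ask prices, bid quantities, ask quantities].

    Bids are reversed so the best bid sits next to the best ask.
    """
    bids = depth['bids'][:depth_size]
    asks = depth['asks'][:depth_size]
    bid_prices = [float(level[0]) for level in reversed(bids)]
    ask_prices = [float(level[0]) for level in asks]
    bid_qtys = [float(level[1]) for level in reversed(bids)]
    ask_qtys = [float(level[1]) for level in asks]
    return bid_prices + ask_prices + bid_qtys + ask_qtys

# invented sample: each level is [price, quantity] as strings, best price first
sample_depth = {
    'bids': [['100.0', '1.0'], ['99.0', '2.0'], ['98.0', '3.0']],
    'asks': [['101.0', '4.0'], ['102.0', '5.0'], ['103.0', '6.0']],
}

print(organize_depth_sketch(sample_depth, 2))
# [99.0, 100.0, 101.0, 102.0, 2.0, 1.0, 4.0, 5.0]
```

Read left to right, the price columns run from the lowest recorded bid up to the highest recorded ask, which matches the column order of the Depth CSV.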

In [4]:
def organizeDepth(depth, depthSize):
    bids, asks = np.zeros((depthSize*2)), np.zeros((depthSize*2))
    for i in range(0, depthSize): # can go up to 100 max
        bids[depthSize - i - 1] = depth['bids'][i][0]
        bids[2*depthSize - i - 1] = depth['bids'][i][1]
        asks[i] = depth['asks'][i][0]
        asks[depthSize + i] = depth['asks'][i][1]
    orderBook = np.zeros((depthSize*4))
    orderBook[0:depthSize] = bids[0:depthSize]
    orderBook[depthSize:depthSize*2] = asks[0:depthSize]
    orderBook[depthSize*2:depthSize*3] = bids[depthSize:]
    orderBook[depthSize*3:depthSize*4] = asks[depthSize:]
    return list(orderBook)

### Initialization
The initialization consists of making naked API calls to collect all the necessary data and appending it to the abc_lasts and abc_orderBooks variables. This lets us easily pass these values through the GrabInfoDepthVerify function in case we get unlucky and all three attempts fail ;)  

This part of the code is also timed, so that we can find out how long it takes to make all the API calls.
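
The pattern can be sketched independently of the Binance client. In the example below, `initialize`, the `markets` dict, and `fake_fetch` are all hypothetical names; `fake_fetch` returns placeholder values where the real cell calls `get_ticker` and `get_order_book`.

```python
import time

def initialize(markets, fetch, spacing=0):
    """Fetch one starting data point per pair and time the whole pass.

    markets: dict mapping base currency -> list of currencies
    fetch:   callable taking a symbol and returning (last, orderBook)
    """
    lasts, order_books = {}, {}
    start = time.time()
    for base, currencies in markets.items():
        for fx in currencies:
            symbol = fx + base
            lasts[symbol], order_books[symbol] = fetch(symbol)
            time.sleep(spacing)
    elapsed = round(time.time() - start, 1)
    return lasts, order_books, elapsed

def fake_fetch(symbol):
    # placeholder data instead of a real API response
    return [0.0] * 11, [0.0] * 20

lasts, order_books, elapsed = initialize({'BTC': ['ETH', 'XLM'], 'USDT': ['BTC']}, fake_fetch)
print(sorted(lasts))  # ['BTCUSDT', 'ETHBTC', 'XLMBTC']
```

Looping over a single dict is one way to avoid repeating the same block for each base currency; the cell below keeps the three explicit loops.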

In [5]:
try:

    delay_start = time.time()

    baseCurrency = 'BTC'
    for i in range(0, len(BTC_Markets)):
        FXcurrency = BTC_Markets[i]
        Pair = FXcurrency + baseCurrency
        Ticker = client.get_ticker(symbol=Pair)

        Last = [float(Ticker['lowPrice']), float(Ticker['bidPrice']), float(Ticker['lastPrice']), float(Ticker['askPrice']), float(Ticker['highPrice']), float(Ticker['weightedAvgPrice']), float(Ticker['priceChangePercent']), float(Ticker['bidQty']), float(Ticker['lastQty']), float(Ticker['askQty']), float(Ticker['volume'])]
        BTC_lasts.append(Last)

        depth = client.get_order_book(symbol=Pair)
        orderBook = organizeDepth(depth, depthSize)
        BTC_orderBooks.append(orderBook)

        time.sleep(spacing) # remove for finding delay timing
        i += 1

    baseCurrency = 'ETH'
    for j in range(0, len(ETH_Markets)):
        FXcurrency = ETH_Markets[j]
        Pair = FXcurrency + baseCurrency

        Ticker = client.get_ticker(symbol=Pair)
        Last = [float(Ticker['lowPrice']), float(Ticker['bidPrice']), float(Ticker['lastPrice']), float(Ticker['askPrice']), float(Ticker['highPrice']), float(Ticker['weightedAvgPrice']), float(Ticker['priceChangePercent']), float(Ticker['bidQty']), float(Ticker['lastQty']), float(Ticker['askQty']), float(Ticker['volume'])]
        ETH_lasts.append(Last)

        depth = client.get_order_book(symbol=Pair)
        orderBook = organizeDepth(depth, depthSize)
        ETH_orderBooks.append(orderBook)

        time.sleep(spacing) # remove for finding delay timing
        j += 1

    baseCurrency = 'USDT'
    for k in range(0, len(USDT_Markets)):
        FXcurrency = USDT_Markets[k]
        Pair = FXcurrency + baseCurrency

        Ticker = client.get_ticker(symbol=Pair)
        Last = [float(Ticker['lowPrice']), float(Ticker['bidPrice']), float(Ticker['lastPrice']), float(Ticker['askPrice']), float(Ticker['highPrice']), float(Ticker['weightedAvgPrice']), float(Ticker['priceChangePercent']), float(Ticker['bidQty']), float(Ticker['lastQty']), float(Ticker['askQty']), float(Ticker['volume'])]
        USDT_lasts.append(Last)

        depth = client.get_order_book(symbol=Pair)
        orderBook = organizeDepth(depth, depthSize)
        USDT_orderBooks.append(orderBook)

        time.sleep(spacing) # remove for finding delay timing
        k += 1

    delay_end = time.time()
    delay_ideal = round(delay_end - delay_start, 1)
    print('Ideal delay time should be {} seconds. Please adjust accordingly'.format(delay_ideal))

### Writing the CSV Files
Next we define our headings. Info file headings are hard coded, whereas Depth headings vary with depthSize. There are 3 loops, one per base currency market, which create all the necessary CSV files and write their headings.
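
For reference, the Depth headings follow the same ordering as the organized orderbook: bids from the deepest level to the best, asks from the best to the deepest, then the matching quantities. A minimal sketch of that ordering, using a hypothetical `depth_headings` helper rather than the index arithmetic in the cell below:

```python
def depth_headings(depth_size):
    """Build the Depth CSV header row for a given depth."""
    bids = ['bid{}'.format(n) for n in range(depth_size, 0, -1)]
    asks = ['ask{}'.format(n) for n in range(1, depth_size + 1)]
    bid_qtys = ['bidQty{}'.format(n) for n in range(depth_size, 0, -1)]
    ask_qtys = ['askQty{}'.format(n) for n in range(1, depth_size + 1)]
    return ['timestamp'] + bids + asks + bid_qtys + ask_qtys

print(depth_headings(2))
# ['timestamp', 'bid2', 'bid1', 'ask1', 'ask2', 'bidQty2', 'bidQty1', 'askQty1', 'askQty2']
```

With depthSize set to 5 this gives 21 columns: the timestamp plus 4 groups of 5.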

In [6]:
    headingsInfo = [('timestamp', 'lowPrice', 'bidPrice', 'lastPrice', 'askPrice', 'highPrice', 'weightedAvgPrice', 'priceChangePercent', 'bidQty', 'lastQty', 'askQty', 'volume')]

    headingsDepth = np.zeros((depthSize*4 + 1))
    headingsDepth = list(headingsDepth)
    headingsDepth[0] = 'timestamp'
    for i in range(0, depthSize):
        headingsDepth[i + 1] = 'bid{}'.format(depthSize - i)
        headingsDepth[depthSize*2 - i] = 'ask{}'.format(depthSize - i)
    for i in range(0, depthSize):
        headingsDepth[i + 2*depthSize + 1] = 'bidQty{}'.format(depthSize - i)
        headingsDepth[depthSize*4 - i] = 'askQty{}'.format(depthSize - i)

    baseCurrency = 'BTC'
    for i in range(0, len(BTC_Markets)):
        FXcurrency = BTC_Markets[i]

        myFile = open('{}{}-{}_Info_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'w')
        with myFile:
            writer = csv.writer(myFile)
            writer.writerow(['{}-{}'.format(baseCurrency, FXcurrency)])
            writer.writerows(headingsInfo)

        myFile = open('{}{}-{}_Depth_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'w')
        with myFile:
            writer = csv.writer(myFile)
            writer.writerow(['{}-{}'.format(baseCurrency, FXcurrency)])
            writer.writerows([headingsDepth])
        i += 1

    baseCurrency = 'ETH'
    for j in range(0, len(ETH_Markets)):
        FXcurrency = ETH_Markets[j]

        myFile = open('{}{}-{}_Info_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'w')
        with myFile:
            writer = csv.writer(myFile)
            writer.writerow(['{}-{}'.format(baseCurrency, FXcurrency)])
            writer.writerows(headingsInfo)

        myFile = open('{}{}-{}_Depth_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'w')
        with myFile:
            writer = csv.writer(myFile)
            writer.writerow(['{}-{}'.format(baseCurrency, FXcurrency)])
            writer.writerows([headingsDepth])
        j += 1

    baseCurrency = 'USDT'
    for k in range(0, len(USDT_Markets)):
        FXcurrency = USDT_Markets[k]

        myFile = open('{}{}-{}_Info_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'w')
        with myFile:
            writer = csv.writer(myFile)
            writer.writerow(['{}-{}'.format(baseCurrency, FXcurrency)])
            writer.writerows(headingsInfo)

        myFile = open('{}{}-{}_Depth_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'w')
        with myFile:
            writer = csv.writer(myFile)
            writer.writerow(['{}-{}'.format(baseCurrency, FXcurrency)])
            writer.writerows([headingsDepth])
        k += 1

### Collecting Data
To keep the timing accurate, the time is checked once a second to see whether the seconds mod 30 equal 0 (for a 30 second interval). If so, we record the data points at that moment. Once again there are 3 loops for the BTC, ETH, and USDT markets, where the data for each coin pair is fetched and appended to its CSV. The abc_lasts and abc_orderBooks variables are updated to only store the most recent data points as backups.
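
The interval check and the timestamp format can be sketched in isolation. `is_sample_tick` and `format_timestamp` are hypothetical helper names; the format string is chosen to reproduce the `YYYY/MM/DD_HH:MM:SS` timestamps written to the CSVs.

```python
import datetime

def is_sample_tick(now, interval=30):
    """True when the current second falls on the sampling interval."""
    return now.second % interval == 0

def format_timestamp(now):
    """Zero-padded timestamp in the style used in the CSV files."""
    return now.strftime('%Y/%m/%d_%H:%M:%S')

t = datetime.datetime(2018, 11, 16, 9, 5, 30)
print(is_sample_tick(t))    # True
print(format_timestamp(t))  # 2018/11/16_09:05:30
```

Because the check only looks at the seconds field, it works for intervals that divide 60 evenly, such as 30.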

In [7]:
    while True:
        time.sleep(1)

        now = datetime.datetime.now()

        if(now.second%30 == 0):
            # zero-padded date parts, reused below in the CSV file names
            year = now.strftime('%Y')
            month = now.strftime('%m')
            day = now.strftime('%d')

            # timestamp string written as the first column of each row
            now = now.strftime('%Y/%m/%d_%H:%M:%S')

            baseCurrency = 'BTC'
            for i in range(0, len(BTC_Markets)):
                FXcurrency = BTC_Markets[i]

                last, orderBook = GrabInfoDepthVerify(BTC_lasts[i], BTC_orderBooks[i], depthSize, BTC_Markets[i], baseCurrency)
                BTC_lasts[i] = last
                BTC_orderBooks[i] = orderBook

                last.insert(0, now)
                myFile = open('{}{}-{}_Info_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'a')
                with myFile:
                    writer = csv.writer(myFile)
                    writer.writerows([last])

                orderBook.insert(0, now)
                myFile = open('{}{}-{}_Depth_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'a')
                with myFile:
                    writer = csv.writer(myFile)
                    writer.writerows([orderBook])

                time.sleep(spacing)
                i += 1

            baseCurrency = 'ETH'
            for j in range(0, len(ETH_Markets)):
                FXcurrency = ETH_Markets[j]

                last, orderBook = GrabInfoDepthVerify(ETH_lasts[j], ETH_orderBooks[j], depthSize, ETH_Markets[j], baseCurrency)
                ETH_lasts[j] = last
                ETH_orderBooks[j] = orderBook

                last.insert(0, now)
                myFile = open('{}{}-{}_Info_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'a')
                with myFile:
                    writer = csv.writer(myFile)
                    writer.writerows([last])

                orderBook.insert(0, now)
                myFile = open('{}{}-{}_Depth_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'a')
                with myFile:
                    writer = csv.writer(myFile)
                    writer.writerows([orderBook])

                time.sleep(spacing)
                j += 1

            baseCurrency = 'USDT'
            for k in range(0, len(USDT_Markets)):
                FXcurrency = USDT_Markets[k]

                last, orderBook = GrabInfoDepthVerify(USDT_lasts[k], USDT_orderBooks[k], depthSize, USDT_Markets[k], baseCurrency)
                USDT_lasts[k] = last
                USDT_orderBooks[k] = orderBook

                last.insert(0, now)
                myFile = open('{}{}-{}_Info_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'a')
                with myFile:
                    writer = csv.writer(myFile)
                    writer.writerows([last])

                orderBook.insert(0, now)
                myFile = open('{}{}-{}_Depth_{}_{}_{}.csv'.format(filePath, baseCurrency, FXcurrency, year, month, day), 'a')
                with myFile:
                    writer = csv.writer(myFile)
                    writer.writerows([orderBook])

                time.sleep(spacing)
                k += 1

            timer += 1
            print(timer)

### Errors
If any error should occur it will be printed to the terminal and you will be notified immediately by email.
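
One possible refinement, sketched here as an assumption rather than what the cell below does, is to include the traceback text in the email body so the cause is visible without logging into the server. `build_error_email` is a hypothetical helper; it only builds the message and leaves sending to `smtplib` as in the cell below.

```python
import traceback
from email.mime.text import MIMEText

def build_error_email(sender, recipients, details):
    """Build the notification message with the traceback appended."""
    body = 'The Grab Info/Depth Code errored out and needs to be checked\n\n' + details
    msg = MIMEText(body)
    msg['Subject'] = 'Grab Info/Depth Code - Error'
    msg['From'] = sender
    msg['To'] = recipients
    return msg

try:
    1 / 0  # stand-in for a failure inside the collection loop
except ZeroDivisionError:
    details = traceback.format_exc()

msg = build_error_email('you@cryptoTrading.com', 'you@cryptoTrading.com', details)
print(msg['Subject'])  # Grab Info/Depth Code - Error
```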

In [8]:
except:
    traceback.print_exc()

    msg = MIMEText('The Grab Info/Depth Code errored out and needs to be checked')
    msg['Subject'] = 'Grab Info/Depth Code - Error'

    server = smtplib.SMTP('smtp.server.com', 587) # 587 is the usual STARTTLS port (465 expects SSL from the start); this will vary, find server/port for your email host
    server.ehlo()
    server.starttls()
    server.login('you@cryptoTrading.com', 'password')

    server.sendmail('you@cryptoTrading.com', recipients, msg.as_string())
    server.quit()