# Data and model exploration

This notebook explores historical market data from Binance and looks at how different models perform when trading on it.

In [12]:
from binance.spot import Spot
import pandas as pd
from datetime import datetime
# Binance returns 500 klines per call by default (the `limit` parameter allows up to 1000),
# so make several calls, e.g. one per year, to get more data.
def get_historical_data(symbol, interval, start_date, end_date):
    # Initialize the Spot client
    client = Spot()
    
    # Convert date strings to epoch milliseconds
    # (note: a naive datetime is interpreted in local time by .timestamp(), not UTC)
    start_date = int(datetime.strptime(start_date, "%Y-%m-%d").timestamp() * 1000)
    end_date = int(datetime.strptime(end_date, "%Y-%m-%d").timestamp() * 1000)
    
    # Fetch the historical klines (candlestick data)
    klines = client.klines(symbol=symbol, interval=interval, startTime=start_date, endTime=end_date)
    
    # Convert to DataFrame
    df = pd.DataFrame(klines, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_asset_volume', 'number_of_trades', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume', 'ignore'])
    
    # Convert timestamp to datetime
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
    
    # Set timestamp as index
    df.set_index('timestamp', inplace=True)
    
    # Convert numeric columns (returned as strings by the API) to numeric dtypes, column by column
    numeric_columns = ['open', 'high', 'low', 'close', 'volume', 'quote_asset_volume', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume']
    df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)
    
    return df

# Example usage
symbol = 'BTCUSDT'
interval = '1d'  # 1 day interval
start_date = '2022-01-01'
end_date = '2023-12-31'

df = get_historical_data(symbol, interval, start_date, end_date)

In [13]:
df.head()

timestamp,open,high,low,close,volume,close_time,quote_asset_volume,number_of_trades,taker_buy_base_asset_volume,taker_buy_quote_asset_volume,ignore
2022-01-02,47722.66,47990.0,46654.0,47286.18,18340.4604,1641167999999,866611000.0,709624,9166.46954,433182400.0,0
2022-01-03,47286.18,47570.0,45696.0,46446.1,27662.0771,1641254399999,1292204000.0,885624,13524.76045,631879400.0,0
2022-01-04,46446.1,47557.54,45500.0,45832.01,35491.4136,1641340799999,1649170000.0,1021815,17689.26808,821725500.0,0
2022-01-05,45832.01,47070.0,42500.0,43451.13,51784.11857,1641427199999,2334289000.0,1478532,23552.9946,1063287000.0,0
2022-01-06,43451.14,43816.0,42430.58,43082.31,38880.37305,1641513599999,1674466000.0,1150707,19268.82662,829886600.0,0
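
The first row above is dated 2022-01-02 even though `start_date` is `'2022-01-01'`. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that `datetime.strptime(...).timestamp()` interprets the naive datetime in the machine's local time zone, while Binance kline timestamps are in UTC. A minimal sketch of a UTC-based conversion follows; `to_utc_ms` is a hypothetical helper name, not part of the notebook.

```python
from datetime import datetime, timezone


def to_utc_ms(date_str):
    """Convert a 'YYYY-MM-DD' string to epoch milliseconds at 00:00 UTC."""
    # Attach UTC explicitly so the result does not depend on the local time zone.
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)
```

Using this helper for `startTime` and `endTime` inside `get_historical_data` should make the requested window independent of where the notebook runs.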


In [14]:
df.shape

(500, 11)

In [15]:
df.columns.tolist()

['open',
 'high',
 'low',
 'close',
 'volume',
 'close_time',
 'quote_asset_volume',
 'number_of_trades',
 'taker_buy_base_asset_volume',
 'taker_buy_quote_asset_volume',
 'ignore']

In [16]:
df['ignore'].value_counts()

ignore
0    500
Name: count, dtype: int64

**Ideas to complement the dataset**:
- Make additional API calls to retrieve a longer history.
- Include stock data for the same period, along with some measure of the sentiment toward cryptocurrencies at the time.
- Include information about the economy in general, ideally from a reliable source that can be compared against the daily data.
- Include trading data for other cryptocurrencies besides Bitcoin, such as Ethereum and Worldcoin.
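
The first idea, making more calls to the API, could be sketched as a simple pagination loop. This is only an illustration under stated assumptions: `fetch_all_klines` and `fetch_page` are hypothetical names, and the loop assumes each page is a list of klines sorted by open time, with the open time in milliseconds as the first element (as in the `klines` output used above).

```python
def fetch_all_klines(fetch_page, start_ms, end_ms, limit=1000):
    """Collect klines between start_ms and end_ms by requesting successive pages.

    fetch_page is any callable accepting startTime, endTime and limit keyword
    arguments and returning a list of klines (hypothetical wrapper, see below).
    """
    rows = []
    cursor = start_ms
    while cursor < end_ms:
        page = fetch_page(startTime=cursor, endTime=end_ms, limit=limit)
        if not page:
            break  # nothing left in the requested window
        rows.extend(page)
        # Continue just after the open time of the last candle received.
        cursor = page[-1][0] + 1
        if len(page) < limit:
            break  # a short page means the window is exhausted
    return rows


# Possible usage with the Spot client from the first cell (not executed here):
# client = Spot()
# fetch_page = lambda **kw: client.klines(symbol="BTCUSDT", interval="1d", **kw)
# klines = fetch_all_klines(fetch_page, start_ms, end_ms)
```

Passing the fetch function in as an argument keeps the loop independent of the client, so it can be checked with a fake data source before hitting the real API.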

In [17]:
# collect information about stocks of desired assets

import yfinance as yf

# set the ticker
ticker = 'AMZN'

amzn_data = yf.download(ticker, start_date, end_date)

amzn_data.tail()



Date,Open,High,Low,Close,Adj Close,Volume
2023-12-22,153.770004,154.350006,152.710007,153.419998,153.419998,29480100
2023-12-26,153.559998,153.979996,153.029999,153.410004,153.410004,25067200
2023-12-27,153.559998,154.779999,153.119995,153.339996,153.339996,31434700
2023-12-28,153.720001,154.080002,152.949997,153.380005,153.380005,27057000
2023-12-29,153.100006,153.889999,151.029999,151.940002,151.940002,39789000


In [18]:
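# Yahoo Finance also returns about 500 rows here, but this does not look like an API limit:
# stocks only trade on business days, and 2022-2023 contain roughly 501 US trading days.
# The next step will be to request additional years to get a longer history.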
amzn_data.shape

(501, 6)
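
Since the Bitcoin frame has one row per calendar day and the stock frame only has rows for trading days, the two need to be aligned before they can be used together. A minimal sketch, assuming both frames have a flat column index and a daily datetime index as in the outputs above: `join_daily` and the `amzn_` prefix are hypothetical choices, and forward-filling weekends and holidays with the last known stock value is one possible convention, not the only one.

```python
import pandas as pd


def join_daily(crypto, stocks, stock_cols=("Close", "Volume"), prefix="amzn_"):
    """Left-join selected stock columns onto the daily crypto frame by date."""
    # Keep only the requested stock columns and prefix them to avoid name clashes.
    stocks = stocks[list(stock_cols)].add_prefix(prefix)
    stocks.index = pd.to_datetime(stocks.index).normalize()

    crypto = crypto.copy()
    crypto.index = pd.to_datetime(crypto.index).normalize()

    # Keep every crypto day; days without stock trading get NaN at first.
    merged = crypto.join(stocks, how="left")

    # Assumption: carry the last trading day's values over weekends and holidays.
    cols = list(stocks.columns)
    merged[cols] = merged[cols].ffill()
    return merged
```

With the frames from this notebook the call would be `join_daily(df, amzn_data)`. Rows before the first trading day stay empty because there is nothing to carry forward.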

Another idea for capturing Bitcoin sentiment is to use Selenium to scrape the historical fear/greed records from [CoinCodex](https://coincodex.com/sentiment/).
Another source for daily fear and greed data is [alternative.me](https://alternative.me/crypto/fear-and-greed-index/#api), which has data from 2018 up to the latest available date (today).
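
Loading the alternative.me index into a DataFrame could look roughly like the sketch below. The endpoint URL, the `limit=0` parameter for "all history", and the response fields (`data`, `value`, `value_classification`, `timestamp`) are assumptions based on the API description linked above and should be checked against it. `fng_to_frame` and `fetch_fng` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

import pandas as pd

# Assumed endpoint; verify against the API documentation linked above.
FNG_URL = "https://api.alternative.me/fng/?limit=0&format=json"


def fng_to_frame(payload):
    """Turn the decoded JSON payload into a date-indexed DataFrame."""
    df = pd.DataFrame(payload["data"])
    # Assumption: timestamps are epoch seconds encoded as strings.
    df["date"] = pd.to_datetime(df["timestamp"].astype("int64"), unit="s")
    df["fng_value"] = pd.to_numeric(df["value"])
    return df.set_index("date")[["fng_value", "value_classification"]].sort_index()


def fetch_fng(url=FNG_URL):
    """Download the index history and parse it (performs a network request)."""
    with urlopen(url, timeout=30) as resp:
        return fng_to_frame(json.load(resp))
```

Keeping the parsing separate from the download makes it easy to check the parsing on a small hand-written payload without network access.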