
The overarching objective of this research is to apply machine learning (deep learning) techniques to develop long/short pair trading strategies. The goal is to use historical data to write Python code that identifies candidate pairs of assets to trade and generates profitable trading signals.

Step-by-Step Research Strategy:

1. Data Collection: Historical financial data is obtained for the set of assets being considered for pair trading. This data should include relevant features such as price, volume, and any other indicators.

2. Preprocessing: The data is cleaned and preprocessed to remove missing values, outliers, and inconsistencies. Depending on the chosen algorithms, the data may also need to be normalized or standardized.

3. Pair Selection: Statistical techniques such as cointegration or correlation analysis are used to identify pairs of assets that are likely to exhibit the mean-reverting behavior required for pair trading. This step determines which assets form the pairs.

4. Feature Engineering: For the selected pairs, additional features are created that could improve the performance of the trading strategy. These could include moving averages, the relative strength index (RSI), or other technical indicators relevant to the trading approach.

5. Training and Testing Data: The dataset is split into training and testing sets. The training set will be used to train the machine learning models, while the testing set will be used to evaluate their performance.

6. Model Development: Machine learning models, such as deep neural networks or gradient boosting machines, are implemented to learn the patterns and relationships in the training data. Libraries such as TensorFlow, Keras, or scikit-learn are used to build and train these models.

7. Model Evaluation: The trained models are evaluated using appropriate metrics such as accuracy, precision, recall, or profit and loss measures. Performance is assessed on both the training and testing sets to check for overfitting.

8. Trading Signal Generation: The trained models are used to generate trading signals from the input data. These signals determine whether to take a long or short position on each pair of assets.

9. Backtesting and Evaluation: A backtesting framework is implemented to simulate the trading strategy on historical data. The performance of the strategy is assessed by calculating metrics such as cumulative returns, the Sharpe ratio, and maximum drawdown.

10. Iteration and Optimization: The models and trading strategy are fine-tuned based on the evaluation results, experimenting with different hyperparameters, feature selection techniques, or model architectures to improve the strategy's performance.

Additional Step

11. Implementation: Once an optimal strategy is found, it can be deployed on a live or paper trading platform, where its performance is monitored and adjustments are made as market conditions change.
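The backtest metrics named in step 9 can be computed from a position series and the per-bar return of the traded spread. This is a minimal sketch under stated assumptions: the spread return is taken as a given input (for a pair it is often approximated as the return of the long leg minus the hedge ratio times the return of the short leg), the risk-free rate is zero, and `periods_per_year=365` assumes daily bars on a market that trades every day (for hourly crypto bars, 24 * 365 would be the analogue). `backtest_metrics` is a hypothetical helper name, and trading costs are ignored.

```python
import numpy as np
import pandas as pd


def backtest_metrics(positions, spread_returns, periods_per_year=365):
    """Cumulative return, annualised Sharpe ratio (zero risk-free rate) and maximum drawdown.

    positions:      +1 / -1 / 0 signal for each bar.
    spread_returns: per-bar return of one unit of the traded spread (assumed input).
    The signal is shifted by one bar, so a position decided at the close of bar t
    only earns the return of bar t + 1 (no look-ahead).
    """
    strategy_returns = positions.shift(1).fillna(0) * spread_returns
    equity = (1 + strategy_returns).cumprod()
    std = strategy_returns.std()
    sharpe = np.sqrt(periods_per_year) * strategy_returns.mean() / std if std > 0 else np.nan
    drawdown = equity / equity.cummax() - 1
    return {
        "cumulative_return": equity.iloc[-1] - 1,
        "sharpe_ratio": sharpe,
        "max_drawdown": drawdown.min(),
    }


# Worked example with five bars.
positions = pd.Series([0, 1, 1, 1, 0])
spread_returns = pd.Series([0.0, 0.01, -0.02, 0.03, 0.01])
metrics = backtest_metrics(positions, spread_returns)
# strategy returns after the one-bar shift: [0, 0, -0.02, 0.03, 0.01]
# cumulative_return = 0.98 * 1.03 * 1.01 - 1 = 0.019494, max_drawdown = -0.02
```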

Data Libraries:
The following commonly used Python libraries are imported for data handling, visualization, statistical testing, and modeling:


In [1]:
import pandas as pd  # Data manipulation and analysis
import numpy as np  # Numerical operations
import matplotlib.pyplot as plt  # Data visualization
import seaborn as sns  # Enhanced data visualization
import statsmodels.api as sm  # Statistical models and tests
from sklearn.preprocessing import StandardScaler  # Data normalization
from sklearn.model_selection import train_test_split  # Data splitting
from sklearn.metrics import accuracy_score, precision_score, recall_score  # Evaluation metrics
import tensorflow as tf  # Deep learning framework
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

In [13]:
#!pip install ccxt
import ccxt
import requests
import csv

In [8]:
def get_prices(exchange, symbol, timeframe='1h', limit=60):
    inst = getattr(ccxt, exchange)

    # Fetch OHLCV data
    ohlcv = inst().fetch_ohlcv(symbol, timeframe, limit=limit)

    # Convert the data to a Pandas DataFrame
    df = pd.DataFrame(ohlcv, columns=['Time', 'Open', 'High', 'Low', 'Close', 'Volume'])

    # Set the 'Time' column as the index and convert the timestamp to a human-readable format
    df.set_index('Time', inplace=True)
    df.index = pd.to_datetime(df.index, unit='ms')

    # Convert data types to float
    df = df.astype(float)

    return df

In [9]:
def get_top_cryptos(limit=250):
    # Fetch the list of top cryptocurrencies by market cap from CoinGecko
    # (the /coins/markets endpoint returns at most 250 coins per page)
    url = 'https://api.coingecko.com/api/v3/coins/markets'
    params = {
        'vs_currency': 'usd',
        'order': 'market_cap_desc',
        'per_page': limit,
        'page': 1,
        'sparkline': 'false',
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on rate limits or API errors
    top_cryptos = response.json()

    # Extract symbols from the response
    symbols = [crypto['symbol'].upper() for crypto in top_cryptos]

    return symbols

In [None]:
top_cryptos = get_top_cryptos(5)
top_cryptos[:]

['BTC', 'ETH', 'USDT', 'BNB', 'SOL']
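The list includes USDT, a stablecoin pegged to the US dollar. Its price against USD is nearly constant, so it carries little information for pair selection and can distort correlation or cointegration results. A small filter, assuming a hand-maintained set of stablecoin symbols (the `STABLECOINS` set and `drop_stablecoins` helper below are hypothetical and not exhaustive), could be applied before fetching prices:

```python
# Hypothetical, non-exhaustive set of USD-pegged stablecoin symbols; extend as needed.
STABLECOINS = {"USDT", "USDC", "DAI", "BUSD", "TUSD", "USDP", "FDUSD"}


def drop_stablecoins(symbols, stablecoins=STABLECOINS):
    """Return the symbols that are not in the stablecoin set, preserving their order."""
    return [symbol for symbol in symbols if symbol.upper() not in stablecoins]


tradable = drop_stablecoins(['BTC', 'ETH', 'USDT', 'BNB', 'SOL'])
print(tradable)  # ['BTC', 'ETH', 'BNB', 'SOL']
```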

In [10]:
def get_prices_for_top_cryptos(timeframe='1h', limit=60, top_n=100):
    all_prices = {}
    exchanges_to_check = ccxt.exchanges

    top_cryptos = get_top_cryptos(top_n)

    for symbol in top_cryptos:
        for exchange_name in exchanges_to_check:
            for usd_variant in ['USD', 'USDC', 'USDT']:
                try:
                    df = get_prices(exchange_name, f'{symbol}/{usd_variant}', timeframe, limit)
                    all_prices[f"{exchange_name}_{symbol}_{usd_variant}"] = df
                    break
                except Exception:
                    pass  # market not listed, OHLCV unsupported, or request failed: try the next quote
            else:
                continue
            break

    return all_prices
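The nested loop above uses `for ... else: continue` followed by `break` to stop at the first exchange and quote currency for which `get_prices` succeeds, but it has to attempt an OHLCV request (and catch the exception) on every exchange that does not list the market. A sketch of the same first-match search written as a helper that returns early is shown below. It assumes the listed markets have already been collected per exchange, for example from the keys of the dict returned by a `ccxt` exchange's `load_markets()` (unified symbols such as `'BTC/USDT'`); that is still one request per exchange, but it can be cached and reused for every symbol. `find_first_market` and `markets_by_exchange` are hypothetical names, and the exchange names in the example are made up so it runs offline.

```python
def find_first_market(symbol, markets_by_exchange, quotes=("USD", "USDC", "USDT")):
    """Return (exchange_name, 'SYMBOL/QUOTE') for the first listed USD-quoted market, else None.

    markets_by_exchange maps an exchange name to the unified market symbols it lists.
    Exchanges are checked in order, then quotes in order, mirroring the notebook loop.
    """
    for exchange_name, markets in markets_by_exchange.items():
        for quote in quotes:
            market = f"{symbol}/{quote}"
            if market in markets:
                return exchange_name, market
    return None


# Offline example with made-up exchange names and market lists.
markets_by_exchange = {
    "exchange_a": {"ETH/USDT", "SOL/USDT"},
    "exchange_b": {"BTC/USD", "BTC/USDT"},
}
print(find_first_market("BTC", markets_by_exchange))   # ('exchange_b', 'BTC/USD')
print(find_first_market("DOGE", markets_by_exchange))  # None
```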

In [23]:
# Data
timeframe_name = '1h'
limit_value = 168
top_n = 100

all_prices = get_prices_for_top_cryptos(timeframe_name, limit_value, top_n)
# Now 'all_prices' is a dictionary whose keys are '<exchange>_<symbol>_<quote>' strings and whose values are DataFrames of OHLCV data.
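Pair selection compares assets bar by bar, so the separate OHLCV frames in `all_prices` first need to be aligned on their timestamps. A minimal sketch, assuming the frames keep the `Close` column and the datetime index produced by `get_prices`; `close_matrix` is a hypothetical helper that would be called as `close_matrix(all_prices)`, and the example uses small synthetic frames so it runs offline:

```python
import pandas as pd


def close_matrix(price_frames, column="Close"):
    """Align per-market OHLCV frames on their timestamps into one wide DataFrame.

    Keys of `price_frames` become column names. Only timestamps present in every
    market are kept, so each pair is compared over exactly the same bars.
    """
    wide = pd.concat({key: df[column] for key, df in price_frames.items()}, axis=1)
    return wide.dropna(how="any")


# Synthetic example: the second market is missing the first bar.
index = pd.to_datetime(["2024-02-13 18:00", "2024-02-13 19:00", "2024-02-13 20:00"])
frames = {
    "exchange_a_BTC_USD": pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=index),
    "exchange_b_ETH_USDT": pd.DataFrame({"Close": [10.0, 20.0]}, index=index[1:]),
}
closes = close_matrix(frames)
print(closes.shape)  # (2, 2)
```

Dropping incomplete rows is a design choice: markets listed late or with gaps shrink the common window for every pair, so it may be preferable to align each candidate pair separately when coverage differs a lot.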


In [24]:
# Save all OHLCV frames into one long-format CSV. Each row is tagged with its
# exchange, base symbol and quote currency so the frames can be rebuilt on load.
frames = []
for key, df in all_prices.items():
    exchange, symbol, quote = key.rsplit('_', 2)
    frames.append(df.assign(exchange=exchange, symbol=symbol, quote=quote))
pd.concat(frames).to_csv('crypto_data.csv', index_label='Time')

In [25]:
# Import data from CSV and rebuild one DataFrame per exchange/symbol/quote key
combined = pd.read_csv('crypto_data.csv', index_col='Time', parse_dates=True)
data = {}
for (exchange, symbol, quote), df in combined.groupby(['exchange', 'symbol', 'quote']):
    data[f"{exchange}_{symbol}_{quote}"] = df.drop(columns=['exchange', 'symbol', 'quote'])

# Show the shape of each rebuilt frame instead of printing every row
print({key: df.shape for key, df in data.items()})


In [None]:
#Load the required libraries
import pandas as pd
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt



Downloading the cryptocurrency dataset for BNB (Binance), Bitcoin, Ethereum, and XRP (Ripple). The Yahoo Finance ticker symbols BNB-USD, BTC-USD, ETH-USD, and XRP-USD are used.

In [None]:
# list of cryptocurrencies as ticker arguments
cryptocurrencies = ['BNB-USD','BTC-USD', 'ETH-USD', 'XRP-USD']

In [None]:
data = yf.download(cryptocurrencies, start='2020-01-01',
                end='2023-12-12')
data.head()



Price       Adj Close                                     Close                                         High                    ...  Low                   Open                                          Volume
Ticker        BNB-USD      BTC-USD     ETH-USD   XRP-USD    BNB-USD      BTC-USD     ETH-USD   XRP-USD    BNB-USD      BTC-USD  ...     ETH-USD   XRP-USD    BNB-USD      BTC-USD     ETH-USD   XRP-USD    BNB-USD      BTC-USD      ETH-USD     XRP-USD
Date
2020-01-01  13.689083  7200.174316  130.802002  0.192667  13.689083  7200.174316  130.802002  0.192667  13.873946  7254.330566  ...  129.198288  0.192107  13.730962   7194.89209  129.630661  0.192912  172980718  18565664997   7935230330  1041134003
2020-01-02  13.027011  6985.470215  127.410179  0.188043  13.027011  6985.470215  127.410179  0.188043  13.715548  7212.155273  ...   126.95491  0.186947  13.698126   7202.55127  130.820038  0.192708  156376427  20802083465   8032709256  1085351426
2020-01-03  13.660452  7344.884277  134.171707  0.193521  13.660452  7344.884277  134.171707  0.193521  13.763709  7413.715332  ...  126.490021  0.185846  13.035329  6984.428711  127.411263  0.187948  173683857  28111481032  10476845358  1270017043
2020-01-04  13.891512  7410.656738  135.069366  0.194355  13.891512  7410.656738  135.069366  0.194355  13.921914  7427.385742  ...  133.040558  0.191835  13.667442  7345.375488  134.168518  0.193521  182230374  18444271275   7430904515   999331594
2020-01-05  14.111019  7411.317383  136.276779  0.195537  14.111019  7411.317383  136.276779  0.195537  14.410801   7544.49707  ...  135.045624  0.193884   13.88834   7410.45166  135.072098  0.194367  202552703  19725074095   7526675353  1168067557
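The output has two column levels: `Price` (Adj Close, Close, High, Low, Open, Volume) and `Ticker`. Selecting one `Price` field gives one column per ticker, which is the shape needed for pair analysis, while `DataFrame.xs` returns every field for a single ticker. In the notebook this would be `data['Close']` or `data.xs('BTC-USD', axis=1, level='Ticker')`; the sketch below builds a three-row stand-in with the same two levels (values copied from the BTC-USD and ETH-USD columns above) so it runs without a download. The names `frame`, `close`, `btc` and `log_returns` are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in with the same two column levels as the yfinance frame above.
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["BTC-USD", "ETH-USD"]], names=["Price", "Ticker"]
)
frame = pd.DataFrame(
    [[7200.174316, 130.802002, 7200.174316, 130.802002],
     [6985.470215, 127.410179, 6985.470215, 127.410179],
     [7344.884277, 134.171707, 7344.884277, 134.171707]],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    columns=columns,
)

close = frame["Close"]                             # one column per ticker
btc = frame.xs("BTC-USD", axis=1, level="Ticker")  # every Price field for one ticker
log_returns = np.log(close).diff().dropna()        # daily log returns per ticker
print(close.columns.tolist())  # ['BTC-USD', 'ETH-USD']
print(btc.columns.tolist())    # ['Adj Close', 'Close']
```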


Before proceeding with the analysis, the data frame is checked for missing values. The check below shows that there are none, so we can proceed; first, however, we need to understand what each feature in the dataset represents.


In [None]:
# check for missing data
data.isnull().any()

Price      Ticker 
Adj Close  BNB-USD    False
           BTC-USD    False
           ETH-USD    False
           XRP-USD    False
Close      BNB-USD    False
           BTC-USD    False
           ETH-USD    False
           XRP-USD    False
High       BNB-USD    False
           BTC-USD    False
           ETH-USD    False
           XRP-USD    False
Low        BNB-USD    False
           BTC-USD    False
           ETH-USD    False
           XRP-USD    False
Open       BNB-USD    False
           BTC-USD    False
           ETH-USD    False
           XRP-USD    False
Volume     BNB-USD    False
           BTC-USD    False
           ETH-USD    False
           XRP-USD    False
dtype: bool
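With no missing values to handle, the remaining preprocessing (step 2) and the train/test split (step 5) need to respect time order: `train_test_split` shuffles rows by default, which would let future observations leak into training, and a scaler fitted on the full data set would leak test-period statistics. A minimal sketch using the `train_test_split` and `StandardScaler` imported at the top of the notebook, assuming a date-indexed feature frame; `features` below is a synthetic stand-in, not derived from the downloaded data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a date-indexed feature matrix.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
features = pd.DataFrame(
    {"ret_btc": np.linspace(-0.05, 0.05, 10), "ret_eth": np.linspace(0.05, -0.05, 10)},
    index=dates,
)

# shuffle=False keeps time order: the test set is the last 20% of dates.
train, test = train_test_split(features, test_size=0.2, shuffle=False)

# Fit the scaler on the training period only, then apply it to both sets.
scaler = StandardScaler().fit(train)
train_scaled = pd.DataFrame(scaler.transform(train), index=train.index, columns=train.columns)
test_scaled = pd.DataFrame(scaler.transform(test), index=test.index, columns=test.columns)
```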