# Ape Minders: Trading Cryptocurrency on NFT Signals

By Andreas Michael, Michael Wander, Daria Vasileva, and Amit Kumar

## Introduction

The market for non-fungible tokens (NFTs) has boomed over the past two years, with interest and trading at an all-time high. Although the hype cannot last forever, the effects of NFTs on cryptocurrency as a whole should not be dismissed. The most popular and most expensive NFT collections are generally based on the Ethereum blockchain, mainly through Polygon on OpenSea, but marketplaces for trading and minting NFTs now exist on other blockchains, including Solana, Cardano, Avalanche, Decentraland, and others. With this in mind, we aim to trade various cryptocurrencies, from large-cap to small-cap, that are directly or indirectly related to NFT trading. Our signals are produced from NFT data, such as the number of sales, active wallets, buyers, and sales in USD, as well as sentiment and polarity data from text posts related to NFTs. We presume that NFT-related data correlates well with our chosen universe of cryptocurrencies, which should allow for an effective trading strategy.

## Market Universe

### Instruments

For our universe, we choose seven cryptocurrencies that are directly or indirectly related to NFTs and their marketplaces. All but two of these coins run on their own blockchains; the two exceptions, Polygon and Decentraland, are based on the Ethereum blockchain. The cryptocurrencies range from large-cap to small-cap, which gives a good spread of altcoin volatility. The instruments, along with their approximate market caps as of April 2022, are as follows:

* Solana (SOL), \$44B
* Cardano (ADA), \$39B
* Avalanche (AVAX), \$26B
* Polygon (MATIC), \$11.5B
* Decentraland (MANA), \$5B
* Flow (FLOW), \$2.5B
* Theta Fuel (TFUEL), \$1.1B

The OHLCV and market cap data for these instruments are sourced from https://coincodex.com/. The date range used for Zipline ingestion is 2017-10-02 to 2022-04-17, but backtesting, model training, and strategy evaluation are done on the range from 2021-09-19 to 2022-03-20.

### Exchange

For trading and transaction cost purposes, we suppose that our trading is done on the Binance cryptocurrency exchange, even though Binance is banned in New York. We choose Binance because it lets us trade all seven cryptocurrencies in our universe while incurring a transaction cost of 0.1% on each trade. This assumes that we convert all of our capital into Tether (USDT) before trading on Binance.
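
To make the effect of the 0.1% cost concrete, the following minimal sketch computes the net return of one round trip (a buy followed by a sell). It assumes the fee is proportional to the traded value and is charged on both legs; the function name and this fee model are illustrative assumptions, not taken from the backtest code.

```python
def net_round_trip_return(gross_return: float, fee_rate: float = 0.001) -> float:
    """Net return of buying and later selling, paying `fee_rate` on each leg.

    Assumption: the fee is a fixed fraction of the traded value (0.1% here),
    deducted once on the buy and once on the sell.
    """
    after_buy = 1.0 - fee_rate                  # capital left after the buy fee
    after_move = after_buy * (1.0 + gross_return)  # position value after the price move
    after_sell = after_move * (1.0 - fee_rate)  # proceeds after the sell fee
    return after_sell - 1.0
```

Under this model, a trade with a gross return of zero loses slightly less than 0.2%, so a predicted gain has to exceed roughly that amount before a round trip is profitable.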

### Benchmark: AMI index

We created a capitalization-weighted index as a benchmark against which to compare our algorithmic strategy. The index consists of the seven instruments that we trade. It is held as a buy-and-hold position and rebalanced every quarter.
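
One way such an index could be computed is sketched below. It assumes two pandas dataframes with a `DatetimeIndex` and one column per coin (`close` for close prices, `mcap` for market caps), a base level of 100, and rebalancing on the first available row of each calendar quarter. These names and conventions are assumptions for illustration and may differ from the actual AMI construction.

```python
import pandas as pd


def cap_weighted_index(close: pd.DataFrame, mcap: pd.DataFrame, base: float = 100.0) -> pd.Series:
    """Buy-and-hold, capitalization-weighted index rebalanced each calendar quarter.

    `close` and `mcap` are assumed to share the same dates (index) and coins (columns).
    """
    quarters = close.index.to_period("Q")
    value = base
    units = None      # number of units of each coin currently held
    prev_q = None
    levels = []
    for date, q in zip(close.index, quarters):
        px = close.loc[date]
        if units is not None:
            # Mark the current holdings to market.
            value = float((units * px).sum())
        if prev_q is None or q != prev_q:
            # New quarter: reset weights to market-cap proportions.
            weights = mcap.loc[date] / mcap.loc[date].sum()
            units = value * weights / px
            prev_q = q
        levels.append(value)
    return pd.Series(levels, index=close.index, name="AMI")
```

Between rebalancing dates the holdings are left untouched, so the weights drift with prices, which matches a buy-and-hold position.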

## NFT Features: Quantitative Data

## NFT Features: Alternative Data

Online sentiment about NFTs, and cryptocurrency in general, may contribute to much of the hype around the trading of these assets. Notable places for discussion include Twitter, Reddit, Telegram, and Discord. For our collection of NFT-related posts, we choose Reddit, specifically the r/NFT subreddit, where many NFT posts are made each day. This subreddit was created in 2016 but has historical posts dating back only to 2019, which does not affect our analysis.

![Reddit](figures/Sentiment/Reddit.png)

To collect the posts from Reddit, we use the Python Reddit API Wrapper (PRAW) and the Pushshift API Wrapper (PSAW) libraries. PSAW is needed only because the base Reddit API has a limit of 1000 posts per API call and cannot request posts by date range. Using these two libraries, we collect around 300,000 posts from the r/NFT subreddit, from 2019 to 2022. For analysis, we collect only main posts, not comments or reply trees. These main posts usually have no text apart from their title, since most of them are memes, pictures, or other media related to NFTs. For this reason, the title of each post and any post text that exists are combined and used for sentiment analysis.

![text_df](figures/Sentiment/text_df.png)
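
The combination of title and post text can be sketched in pandas as follows. The column names `title` and `selftext` are assumptions about how the collected posts are stored, not confirmed by the original data.

```python
import pandas as pd


def combine_title_and_body(posts: pd.DataFrame) -> pd.Series:
    """Join each post's title with its body text, if any, into one string.

    Assumed columns: `title` (always present) and `selftext` (often missing).
    """
    title = posts["title"].fillna("")
    body = posts["selftext"].fillna("")
    # Strip so that title-only posts do not end with a trailing space.
    return (title + " " + body).str.strip()
```

No further cleaning is applied here, consistent with the posts being used as collected.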

The sentiment analysis is done using spaCy and the spacytextblob pipeline component. The polarity of each post's combined title and text is computed and stored as a column in the same dataframe.

![pol_df1](figures/Sentiment/polarity_df_1.png)

To aggregate the polarity for each day, we take the sum and the mean of the polarity of all posts made on that day. A threshold on the polarity sum column then produces a label of -1, 0, or 1 for overall negative, neutral, or positive sentiment for the day: a day is labeled negative when its polarity sum is less than 0, positive when its polarity sum is greater than 100, and neutral otherwise.

![pol_df2](figures/Sentiment/polarity_df_2.png)
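
The daily aggregation and labeling described above can be sketched as follows. The column names `created_utc` (Unix timestamp in seconds) and `polarity` are assumptions about the post dataframe; the thresholds of 0 and 100 come from the text.

```python
import numpy as np
import pandas as pd


def daily_sentiment(posts: pd.DataFrame, pos_threshold: float = 100.0) -> pd.DataFrame:
    """Aggregate post polarity per day and attach a -1 / 0 / 1 sentiment label.

    Assumed columns: `created_utc` (seconds since the epoch) and `polarity`.
    """
    day = pd.to_datetime(posts["created_utc"], unit="s").dt.floor("D")
    daily = posts.groupby(day)["polarity"].agg(polarity_sum="sum", polarity_mean="mean")
    # Negative if the daily sum is below 0, positive if above the threshold,
    # neutral otherwise.
    daily["label"] = np.select(
        [daily["polarity_sum"] < 0, daily["polarity_sum"] > pos_threshold],
        [-1, 1],
        default=0,
    )
    return daily
```

Because the negative cutoff is 0 and the positive cutoff is 100, any day whose polarity sum falls between the two is labeled neutral, which is consistent with most days ending up neutral.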

While collecting and analyzing the sentiment data, we noticed that most of the posts, and therefore most of the days, have neutral sentiment. This may be because we did not clean or preprocess the posts gathered from the subreddit. Many posts are just people writing about NFT giveaways they claim to be running, or other content that does not necessarily indicate positive or negative sentiment. Nevertheless, the sentiment data is used as is.

## Regression Models for Predicted Returns

The trading algorithm for our universe is based on predicted one-day forward returns. To predict these returns, we use a regression model with the previously described NFT features, as well as some general technical indicators computed from the prices of our instruments. These indicators are the RSI, Bollinger Bands (high and low), Average True Range (ATR), and MACD (technical factor methods referenced from *Machine Learning for Algorithmic Trading*, Chapter 7). Three simple models are chosen for comparison and evaluation of the predicted returns: OLS, random forest, and XGBoost. All three models are trained on the same input features, which include the technical indicators and the NFT features. No hyperparameters are changed from their defaults, and the input features are not scaled or normalized. In total, there are 17 input features: 5 technical features, 10 quantitative NFT features, and 2 sentiment NFT features. The target variable is the one-day forward return of the close price of each coin.
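
As a minimal sketch of the target variable, the one-day forward return on day t is the percentage change of the close from day t to day t+1. The function below assumes a pandas series (or dataframe with one column per coin) of daily close prices; the function name is illustrative.

```python
import pandas as pd


def one_day_forward_returns(close):
    """Forward return for day t: close[t+1] / close[t] - 1.

    `pct_change` gives the return realized *on* each day; shifting by -1
    aligns tomorrow's realized return with today's features.
    """
    return close.pct_change().shift(-1)
```

The last row has no forward return and is left as `NaN`, so it has to be dropped before training.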

The three models are trained separately on each coin rather than all at once on the multi-indexed dataframe, due to issues with accurately setting the date ranges of the training, testing, and backtesting data. The feature importances and coefficients of the models are also calculated and visualized in the figures below. All three models appear to use some or all of the quantitative NFT features, but sentiment is used only slightly, in the OLS model. The technical indicators make up the bulk of the feature importance.
  
| | | |
|-|-|-|
![ols_fi](figures/Models/ols_fi.png) | ![rf_fi](figures/Models/rf_fi.png) | ![rf_fi](figures/Models/rf_fi.png)
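
The per-coin training with a chronological split can be sketched as below for the OLS case. This is an illustration using NumPy least squares, not the original training code: it assumes a dataframe indexed by a `(date, coin)` MultiIndex with levels named `date` and `coin`, and the function and argument names are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_ols_per_coin(panel: pd.DataFrame, features: list, target: str, split_date: str) -> dict:
    """Fit one OLS model per coin and return out-of-sample predictions.

    Rows before `split_date` are used for training, rows on or after it for
    testing, so the split is chronological within each coin.
    """
    split = pd.Timestamp(split_date)
    predictions = {}
    for coin, df in panel.groupby(level="coin"):
        df = df.droplevel("coin").sort_index().dropna(subset=features + [target])
        train, test = df.loc[df.index < split], df.loc[df.index >= split]
        # Prepend a column of ones so the model has an intercept.
        X_train = np.column_stack([np.ones(len(train)), train[features].to_numpy()])
        beta, *_ = np.linalg.lstsq(X_train, train[target].to_numpy(), rcond=None)
        X_test = np.column_stack([np.ones(len(test)), test[features].to_numpy()])
        predictions[coin] = pd.Series(X_test @ beta, index=test.index)
    return predictions
```

Looping over coins keeps each coin's date boundaries explicit, which avoids the date-range issues that arise when splitting the multi-indexed dataframe as a whole.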

For evaluation, the MSE and MAE are calculated for each coin, then averaged together and compared. Overall, the random forest model appears to have the lowest error, though for Cardano the OLS model has a lower error.


![errors1](figures/Models/errors1.png)
![errors2](figures/Models/errors2.png)
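
The evaluation step can be sketched as follows: compute the MSE and MAE for each coin, then take the unweighted average across coins. The inputs are assumed to be dictionaries mapping a coin ticker to arrays of actual and predicted returns; this layout is an assumption for illustration.

```python
import numpy as np


def error_table(actual: dict, predicted: dict) -> dict:
    """Per-coin MSE and MAE, plus their unweighted averages across coins."""
    per_coin = {}
    for coin in actual:
        err = np.asarray(predicted[coin], dtype=float) - np.asarray(actual[coin], dtype=float)
        per_coin[coin] = {
            "mse": float(np.mean(err ** 2)),
            "mae": float(np.mean(np.abs(err))),
        }
    average = {
        metric: float(np.mean([scores[metric] for scores in per_coin.values()]))
        for metric in ("mse", "mae")
    }
    return {"per_coin": per_coin, "average": average}
```

Averaging across coins gives every coin equal weight, so a model can have the lowest average error while still being worse on an individual coin.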

On average, the random forest has the lowest error, followed by the OLS model and the XGBoost model.

| | |
|-|-|
![errors3](figures/Models/errors3.png) | ![errors4](figures/Models/errors4.png)

However, the error on Cardano may affect the returns produced in backtesting: we find that the OLS model's predicted returns produce much better results over our backtesting range than those of either the random forest or the XGBoost model. We therefore use the OLS model's predicted returns to create our trading signals.

## Trading Algorithm

After collecting all the necessary features, the future returns of each coin are predicted on a daily basis. A long signal is generated when the predicted future return of a coin is greater than 1%. A short signal, which in this strategy closes the position, is generated when the predicted future return is less than 0% or when the holding period for that coin reaches 18 days. A coin can be bought only when our current position in that coin is equal to 0, and sold only when our current position in that coin is greater than 0.
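
The rules above can be sketched as a small state machine for a single coin. This is an illustration under stated assumptions, not the backtest implementation: the function name is hypothetical, the holding period is counted in days after the purchase day, and a coin is not re-bought on the same day it is sold.

```python
def generate_orders(predicted_returns, long_threshold=0.01, exit_threshold=0.0, max_holding_days=18):
    """Return one decision per day for a single coin: 'buy', 'sell', or 'hold'.

    Buy only when flat and the predicted return exceeds `long_threshold`;
    sell only when in a position and either the predicted return is below
    `exit_threshold` or the position has been held for `max_holding_days`.
    """
    orders = []
    in_position = False
    days_held = 0
    for r in predicted_returns:
        if in_position:
            days_held += 1
            if r < exit_threshold or days_held >= max_holding_days:
                orders.append("sell")
                in_position = False
                days_held = 0
            else:
                orders.append("hold")
        elif r > long_threshold:
            orders.append("buy")
            in_position = True
        else:
            orders.append("hold")
    return orders
```

Predicted returns between 0% and 1% trigger neither rule, so an open position is kept and a flat position stays flat on those days.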

## Portfolio Evaluation

## Conclusion