<a href="https://colab.research.google.com/github/ParnLimwat/Astral/blob/main/Astral.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Astral - RNDM Quant Track Notebook**
Parn Limwattananon, Philipp Bruhns, Rhin Choe and Xingyan Liu

# **Abstract**

This project employs a Moving Average Crossover Strategy to evaluate trends in cryptocurrency markets, utilizing sentiment scores derived from recent cryptocurrency news. Sentiment analysis and keyword extraction are conducted using Natural Language Processing (NLP) techniques to provide insight into potential market shifts. Backtesting on historical data from the past month reveals a notable profit margin of approximately 30%, underscoring the strategy’s efficacy in short-term cryptocurrency market predictions.

# **1. Prerequisites**

Our code relies on several libraries that must be installed first. We use requests to handle API calls to NewsAPI, while Pandas and the standard json module help us with data manipulation.

We use the NLTK package to perform Natural Language Processing (NLP) on each article's content. The VADER lexicon supplies the sentiment analyser with a vocabulary of sentiment-rated English words.

We also use the backtesting.py library as our backtesting engine, since it includes the crossover and SMA functions we rely on later and provides a way to visualise performance on historic data.

In [None]:
%%bash
pip install backtesting
pip install requests
pip install pandas
pip install nltk

In [None]:
from backtesting import Backtest, Strategy
from backtesting.lib import crossover
from backtesting.test import SMA
from datetime import datetime, timedelta, date

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

import requests
import pandas as pd
import json

At this stage, also make sure that you have uploaded ETHUSDT.csv from the repository to the session storage.

# **2. Data Aggregation**

**2.1 Retrieving news articles via NewsAPI**

We decided to use NewsAPI as the source for our news articles. Unfortunately, the free subscription tier only allows us to retrieve articles from the past month and limits responses to 100 articles per request. Changing to the paid subscription tier would allow access to over six years of articles and several thousand articles per request, which should dramatically improve results. Switching tiers would not require any changes to the code below.




In [None]:
key = "ce2f9d4118e044aea7b7f7b57697055c" # Use the variable key to store a NewsAPI key
currency = "Ethereum"
from_date = "2024-10-01"
to_date = "2024-10-25"

# Use the url variable to store the url of the information for news articles related to each cryptocurrency
url = (f'https://newsapi.org/v2/everything?q={currency}&from={from_date}&to={to_date}&sortBy=popularity&apiKey={key}')

response = requests.get(url)

print(response.json())

For example, running the code above would give us news articles about 'Ethereum' between 2024-10-01 and 2024-10-25 in JSON format including the total number of results, the articles' sources, the articles' heading, and the first few sentences of the articles.



In [None]:
data = response.json()

articles = data.get('articles')

# List of news article contents to analyse
contents = [article['content'] for article in articles if article['title'] and article['content']]


As we are only interested in the content of each news article for further analysis, we are putting all the contents of the retrieved articles into a list.
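The filtering step above can be sketched in a way that also tolerates a failed request. This is a minimal sketch, assuming a payload shaped like the NewsAPI response described earlier; the helper name `extract_contents` and the sample payload are hypothetical and not part of the notebook.

```python
def extract_contents(data):
    """Return the non-empty 'content' strings from a NewsAPI-style payload."""
    # Assumption: 'articles' may be missing or None, e.g. on an error response
    articles = data.get("articles") or []
    return [a["content"] for a in articles if a.get("title") and a.get("content")]


# Hypothetical payload that mimics the shape of a NewsAPI response
sample = {
    "status": "ok",
    "totalResults": 3,
    "articles": [
        {"title": "ETH rallies", "content": "Ethereum gained ground today."},
        {"title": "No body", "content": None},
        {"title": None, "content": "Orphaned text."},
    ],
}

contents_demo = extract_contents(sample)
print(contents_demo)
```

Only the first sample article survives the filter, because the other two lack either a title or a content field.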

**2.2 Historic trading data for backtesting**

For later backtesting, we retrieved daily candlestick data for the ETH/USDT trading pair on Binance from https://www.CryptoDataDownload.com, covering 2017-08-17 to 2024-10-25. We then manually reformatted the CSV to match the requirements of our backtesting engine and cut the data down to the past month, because the free NewsAPI subscription does not allow us to pull older articles. More on this in **4. Backtesting**.

The manually formatted data can be found in ETHUSDT.csv

# **3. Strategy: Moving Average Crossover using Sentimental Score**

A **moving average crossover** is a trading strategy where two moving averages of different periods, a short-term and a long-term average, are used to identify trends. When the short-term moving average crosses above the long-term moving average, it signals a potential uptrend. Conversely, when it crosses below, it indicates a possible downtrend.
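To make the crossover idea concrete, here is a small sketch on a toy series. The values and the window lengths (2 and 4) are made up for illustration and are not the ones used by our strategy.

```python
import pandas as pd

# Toy series: falls, rises, then falls again (illustrative values only)
values = pd.Series([5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2], dtype=float)

short_sma = values.rolling(2).mean()  # short-term average
long_sma = values.rolling(4).mean()   # long-term average

# The sign of the difference tells us which average is on top
diff = short_sma - long_sma

# A crossover happens where the sign flips between consecutive points
cross_up = (diff > 0) & (diff.shift(1) < 0)    # short crosses above long
cross_down = (diff < 0) & (diff.shift(1) > 0)  # short crosses below long

up_days = cross_up[cross_up].index.tolist()
down_days = cross_down[cross_down].index.tolist()
print("potential uptrend at index:", up_days)
print("potential downtrend at index:", down_days)
```

On this toy series the short-term average crosses above the long-term average at index 6 and back below it at index 10.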

For our strategy, we decided to use moving average crossovers to analyse trends in the sentiment of news articles about a certain DeFi coin in order to find the points in time where the coin's price is potentially increasing.

The goal is to calculate a long-term trend in the sentiment of news articles and check whether the short-term trend is better or worse. If it is better, we want to buy, and if it is worse, we want to sell. Backtesting this strategy on historic data will then let us determine the optimal time interval that should be considered as long-term and short-term respectively (see: **4.1 Optimisation**).

**The strategy consists of four main steps:**
1. Use NLP to determine the sentiment of each article.
2. Further check for keywords to improve accuracy.
3. Calculate an overall sentiment score for a DeFi coin on a specific day based on steps 1 and 2.
4. Generate buy and sell signals according to the moving average crossing algorithm fed with the sentiment scores from step 3.


**Step 1: Natural Language Processing**

We use a sentiment analyser to determine whether an article conveys a positive or negative sentiment. For each text, VADER returns a compound polarity score ranging from -1 to 1, where -1 is the most negative and 1 is the most positive.

In [None]:
# Initialise the VADER sentiment analyser
sia = SentimentIntensityAnalyzer()

The function below returns the analysed text, its sentiment label, and its sentiment scores for each news article.

In [None]:
# Function to analyse sentiment
def analyze_sentiment(headers):
    results = []
    for header in headers:
        sentiment_score = sia.polarity_scores(header)
        # Classify as positive, negative, or neutral based on the compound score
        if sentiment_score['compound'] >= 0.05:
            sentiment = 'Positive'
        elif sentiment_score['compound'] <= -0.05:
            sentiment = 'Negative'
        else:
            sentiment = 'Neutral'

        results.append((header, sentiment, sentiment_score))

    return results

These sentiment scores can be used to calculate an overall score. This will be done by the score_on_date function.

**Step 2: Keywords Analysis**

After analysing the sentiment of the news articles, we can analyse the positive and negative keywords in the articles to help refine the sentiment score.

In [None]:
def analyze_keywords(articles, score):
    # Define positive and negative keywords
    positive_keywords = ['partnership', 'adoption', 'integration', 'mainstream', 'listing', 'upgrade', 'investment', 'feature']
    negative_keywords = ['hack', 'regulatory crackdown', 'ban', 'exit scam', 'fraud', 'shutdown', 'legal issues']

    # Check each article for keywords
    for article in articles:
        # Get the content in lowercase; the content may be missing or None
        content = (article.get('content') or '').lower()

        # Add 0.1 once if the article contains at least one positive keyword
        if any(keyword in content for keyword in positive_keywords):
            score += 0.1

        # Subtract 0.1 once if the article contains at least one negative keyword
        if any(keyword in content for keyword in negative_keywords):
            score -= 0.1

    return score

We define positive keywords as words that connote growth, opportunity, stability, or progress, such as 'partnership', 'adoption', or 'integration'.

Conversely, negative keywords are words that imply risk, uncertainty, loss, or regulatory challenges, such as 'hack', 'fraud', or 'ban'.

Each article that contains at least one positive keyword increases the score by 0.1, while each article that contains at least one negative keyword decreases the score by 0.1.
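The following sketch isolates the adjustment made by analyze_keywords so its behaviour is easier to see. Because the check uses `any(...)`, an article moves the score by at most 0.1 in each direction, however many keywords it contains. The helper name `keyword_adjustment` and the sample articles are hypothetical; the keyword lists are copied from the function above.

```python
POSITIVE = ['partnership', 'adoption', 'integration', 'mainstream',
            'listing', 'upgrade', 'investment', 'feature']
NEGATIVE = ['hack', 'regulatory crackdown', 'ban', 'exit scam',
            'fraud', 'shutdown', 'legal issues']


def keyword_adjustment(articles, step=0.1):
    """Return the net change that the keyword check applies to a score."""
    adjustment = 0.0
    for article in articles:
        text = (article.get('content') or '').lower()
        if any(keyword in text for keyword in POSITIVE):
            adjustment += step  # once per article, not once per keyword
        if any(keyword in text for keyword in NEGATIVE):
            adjustment -= step
    return adjustment


# Hypothetical articles for illustration
sample_articles = [
    {'content': 'A new partnership drives adoption of the network.'},  # +0.1
    {'content': 'Exchange hack reported overnight.'},                  # -0.1
    {'content': 'Network upgrade shipped on schedule.'},               # +0.1
    {'content': None},                                                 # ignored
]

net = keyword_adjustment(sample_articles)
print(round(net, 2))
```

One caveat of this design is that `in` performs substring matching, so 'ban' also matches words such as 'bank'. Matching on whole words would avoid such false positives.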

**Step 3: Retrieve weighted score on a specific date**

In [None]:
def score_on_date(year,month,date,crypto):
    date = f"{year}-{month}-{date}"
    url = (f'https://newsapi.org/v2/everything?q={crypto}&from={date}&to={date}&sortBy=popularity&apiKey=ce2f9d4118e044aea7b7f7b57697055c')

    response = requests.get(url)
    data = response.json()
    articles = data.get('articles') or []  # Fall back to an empty list if no articles were returned

    # List of news article contents to analyse
    contents = [article['content'] for article in articles if article['title'] and article['content']]

    # Analyse the sentiment of the article contents
    analysis_results = analyze_sentiment(contents)

    # Initialise counters for sentiment
    sum_negative = 0
    sum_positive = 0
    sum_neutral = 0
    total = 0

    for header, sentiment, score in analysis_results:
        if sentiment == 'Negative':
            sum_negative += 1
            total += 1
        elif sentiment == 'Positive':
            sum_positive += 1
            total += 1
        else:
            sum_neutral += 1
            total += 1

    # Calculate percentage scores
    if total > 0:
        per_positive = sum_positive / total
        per_negative = sum_negative / total
        score = per_positive - per_negative
    else:
        print(f'{crypto} : No sentiment data available.')
        score = 0  # No data leads to a neutral score

    # Analyse keywords and adjust the score
    score = analyze_keywords(articles, score)

    return (score)


The above function calculates the sentiment score of the selected cryptocurrency for a given date. It finds the proportions of positive and negative articles and subtracts the negative proportion from the positive proportion. The score is then passed into the analyze_keywords function to account for positive and negative keywords.
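As a worked example of the proportion step, the sketch below computes the base score from a list of sentiment labels. The helper name `proportion_score` and the label counts are hypothetical and only mirror the arithmetic in score_on_date.

```python
from collections import Counter


def proportion_score(labels):
    """Share of positive labels minus share of negative labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # no data leads to a neutral score
    counts = Counter(labels)
    return (counts['Positive'] - counts['Negative']) / total


# Hypothetical day: 6 positive, 2 negative and 2 neutral articles
labels = ['Positive'] * 6 + ['Negative'] * 2 + ['Neutral'] * 2
demo_score = proportion_score(labels)
print(demo_score)  # (6 - 2) / 10
```

The base score always lies between -1 and 1. The keyword adjustment is added afterwards without clipping, which is why some daily scores fall outside that range.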

In [None]:
print(score_on_date(2024,10,22,"Ethereum"))

For example, this displays the score for Ethereum on October 22nd, 2024.

To avoid exhausting our API calls, we use scores that were pre-computed with the same score_on_date function for "Ethereum" and saved in a dictionary as shown below.

*Note: With access to the premium NewsAPI this is not necessary and would also allow access to scores before the past month.*

In [None]:
nlpdict = {"2024-9-25": 0.2, "2024-9-26": 0.57, "2024-9-27": -0.13829787234042556, "2024-9-28": 0.13157894736842107, "2024-9-29": 0.8428571428571429, "2024-9-30": 0.4043478260869565, "2024-10-1": 0.53, "2024-10-2": 0.9894736842105262, "2024-10-3": -1.2, "2024-10-4": 0.16853932584269668, "2024-10-5": 0.08837209302325583, "2024-10-6": -0.20416666666666664, "2024-10-7": -0.38, "2024-10-8": 0.7941176470588234, "2024-10-9": 0.22000000000000006, "2024-10-10": 0.8399999999999999, "2024-10-11": 0.47777777777777775, "2024-10-12": -0.08235294117647063, "2024-10-13": 0.5555555555555555, "2024-10-14": 1.6400000000000003, "2024-10-15": 0.9699999999999999, "2024-10-16": 0.702061855670103, "2024-10-17": 0.31999999999999995, "2024-10-18": 0.47, "2024-10-19": 1.0283018867924527, "2024-10-20": 0.004545454545454547, "2024-10-21": 1.2333333333333334, "2024-10-22": 0.4826086956521739, "2024-10-23": 0.57, "2024-10-24": -0.33, "2024-10-25": 0.12000000000000005}
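A dictionary like this could be built with a loop over a date range. The sketch below is one possible way to do so, not necessarily how the values above were produced. The helper names `date_keys` and `build_scores` are hypothetical, and a stub scorer stands in for score_on_date so the example runs without spending API calls.

```python
from datetime import date, timedelta


def date_keys(start, end):
    """Return 'YYYY-M-D' keys (no zero padding) for every day from start to end."""
    keys = []
    day = start
    while day <= end:
        keys.append(f"{day.year}-{day.month}-{day.day}")
        day += timedelta(days=1)
    return keys


def build_scores(start, end, crypto, scorer):
    """Call scorer(year, month, day, crypto) once per day and collect the results."""
    scores = {}
    for key in date_keys(start, end):
        year, month, day = (int(part) for part in key.split("-"))
        scores[key] = scorer(year, month, day, crypto)
    return scores


# Stub scorer (assumption): replace it with score_on_date to fetch real scores
demo = build_scores(date(2024, 9, 29), date(2024, 10, 2), "Ethereum",
                    lambda year, month, day, crypto: 0.0)
print(list(demo))
```

Because dictionaries preserve insertion order, the keys come out in ascending date order, which is the order the moving averages need.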


**Step 4: Implementing the moving average crossing algorithm**

We then implement the moving average crossing algorithm to detect up and down trends in our sentiment score. For this, we are using the built-in crossover and SMA (simple moving average) functions from our backtesting engine.

In [None]:
class NLP(Strategy):
    n1 = 6
    n2 = 8

    def init(self):
        self.sma1 = self.I(SMA, nlpdict.values(), self.n1)
        self.sma2 = self.I(SMA, nlpdict.values(), self.n2)

    def next(self):
        if crossover(self.sma1, self.sma2):
            self.buy()

        elif crossover(self.sma2, self.sma1):
            self.sell()

The class variables n1 and n2 determine the length of the short-term and long-term interval respectively. Once the short-term average crosses above the long-term average, we open a long position. Conversely, we open a short position once the short-term average crosses below the long-term average.

In this case, we are already using the optimal values for n1 and n2. To see how we determined them, see **4.1 Optimisation**.

# **4. Backtesting**

As a proof of concept, we are backtesting our strategy for Ethereum. Specifically, we are simulating it against the Binance ETH/USDT trading pair as described in **2. Data Aggregation**. Again, using the paid subscription to backtest over several years should improve accuracy considerably.

First, we load the candlestick data for the ETH/USDT trading pair into a Pandas dataframe. (See: **2. Data Aggregation**)

*(Make sure you correctly followed the setup instructions on GitHub and have uploaded ETHUSDT.csv to the Colab session storage.)*

In [None]:
filename = "ETHUSDT.csv"
prices = pd.read_csv(filename, index_col='Unix', parse_dates=True)
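Before running the backtest, it can help to check that the price data and the sentiment scores line up. As far as we understand, backtesting.py expects Open, High, Low and Close columns and needs each indicator to have one value per price row. The sketch below is a hypothetical helper (`check_alignment`) run on toy data. It assumes a datetime index and score keys in the "YYYY-M-D" format used by nlpdict.

```python
from datetime import datetime

import pandas as pd

REQUIRED_COLUMNS = ["Open", "High", "Low", "Close"]


def check_alignment(prices, scores):
    """Raise ValueError unless prices has OHLC columns and one score per price row."""
    missing = [col for col in REQUIRED_COLUMNS if col not in prices.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # datetime.strptime accepts non-padded months and days such as '2024-10-1'
    score_days = [datetime.strptime(key, "%Y-%m-%d").date() for key in scores]
    price_days = [timestamp.date() for timestamp in prices.index]
    if score_days != price_days:
        raise ValueError("score dates do not match price dates")
    return True


# Toy price data (illustrative values only, not real ETH/USDT prices)
demo_prices = pd.DataFrame(
    {"Open": [1.0, 2.0], "High": [2.0, 3.0], "Low": [0.5, 1.5], "Close": [1.5, 2.5]},
    index=pd.to_datetime(["2024-10-01", "2024-10-02"]),
)
demo_scores = {"2024-10-1": 0.53, "2024-10-2": 0.99}

print(check_alignment(demo_prices, demo_scores))
```

In the notebook, the equivalent call would be `check_alignment(prices, nlpdict)`, provided the index of prices was parsed as dates in ascending order.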

Then, we instantiate a backtest for our backtesting engine using this data and the strategy we defined in **3. Strategy Step 4**. The engine simulates with an initial equity of 100000 USDT and assumes a commission of 0.001 (0.1%) per trade.
By setting exclusive_orders=True, we assume that we only have one open position at a time. This means that opening a new position automatically closes all previous positions first.

In [None]:
backtest = Backtest(prices, NLP,
              cash=100000, commission=.001,
              exclusive_orders=True)
output = backtest.run()

The results of our backtest can be summarised and plotted using the functions below.

In [None]:
print(output)

In [None]:
backtest.plot()

As you can see on the plot, the strategy makes a total of 4 trades and yields an overall profit of approximately 31.71% during our one-month time span.

**4.1 Optimisation**

To find the optimal values of n1 and n2, we can use the engine's built-in optimisation function.


In [None]:
stats = backtest.optimize(n1=range(1, 15, 1),
                    n2=range(2, 30, 2),
                    maximize='Equity Final [$]',
                    constraint=lambda param: param.n1 < param.n2)

print(stats._strategy)

In our case, this gives us n1=6 and n2=8 as optimal values. But as the free NewsAPI subscription limits our available data, the best values may vary on a bigger dataset and with further historic data.
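To get a feel for the size of the search, the sketch below enumerates the parameter pairs that satisfy the n1 < n2 constraint for the two ranges used above. It only counts candidate pairs and does not reproduce how the engine evaluates them.

```python
from itertools import product

n1_values = range(1, 15, 1)  # 1, 2, ..., 14
n2_values = range(2, 30, 2)  # 2, 4, ..., 28

# Keep only the pairs where the short window is shorter than the long window
grid = [(n1, n2) for n1, n2 in product(n1_values, n2_values) if n1 < n2]

print(len(grid))
print((6, 8) in grid)
```

With only 31 daily scores available, the longest windows leave very few usable points. A 28-day average, for example, is defined on just 4 of the 31 days, so results for large n2 rest on very little data.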

# **5. Pseudocode**

    import necessary libraries
    import datetime, timedelta, and date from datetime
    import nltk
    import SentimentIntensityAnalyzer from nltk.sentiment

    use nltk to download vader_lexicon

    import requests
    import pandas as pd
    import json

    # Assume we have an API connected to a DeFi trading platform
    import TradingAPI
    from TradingAPI import crossover, SMA, Strategy
---
    # Define function to analyze sentiment
    FUNCTION analyze_sentiment(contents):
      SET results to an empty list

      FOR each content in contents:
          CALCULATE sentiment_score using sia.polarity_scores(content)

          # Classify sentiment based on compound score
          IF sentiment_score['compound'] >= 0.05:
              SET sentiment to 'Positive'
          ELSE IF sentiment_score['compound'] <= -0.05:
              SET sentiment to 'Negative'
          ELSE:
              SET sentiment to 'Neutral'

          APPEND (content, sentiment, sentiment_score) to results

      RETURN results

---
    FUNCTION analyze_keywords(articles, score):
      DEFINE positive_keywords list with key terms for positive sentiment
      DEFINE negative_keywords list with key terms for negative sentiment

      FOR each article in articles:
          SET content to article's content in lowercase

          # Check for positive keywords
          IF any keyword in content matches a positive_keywords entry:
              INCREMENT score by 0.1

          # Check for negative keywords
          IF any keyword in content matches a negative_keywords entry:
              DECREMENT score by 0.1

      RETURN score
---
    FUNCTION score_on_date(year, month, date, crypto):
      SET date to formatted string "{year}-{month}-{date}"
      SET url to news API endpoint with specified date, crypto, and API key

      GET response from API request to url
      SET data to parsed JSON response
      SET articles to articles within data

      # Extract article contents
      SET contents to a list of article contents if title and content exist

      # Analyze sentiment of extracted contents
      SET analysis_results to analyze_sentiment(contents)

      # Initialize counters for sentiment categories
      SET sum_negative, sum_positive, sum_neutral, total to 0

      FOR each (header, sentiment, score) in analysis_results:
          IF sentiment is 'Negative':
              INCREMENT sum_negative and total by 1
          ELSE IF sentiment is 'Positive':
              INCREMENT sum_positive and total by 1
          ELSE:
              INCREMENT sum_neutral and total by 1

      # Calculate percentage scores if total is greater than 0
      IF total > 0:
          SET per_positive to sum_positive / total
          SET per_negative to sum_negative / total
          SET score to per_positive - per_negative
      ELSE:
          PRINT "No sentiment data available for {crypto}"
          SET score to 0

      # Adjust score based on keyword analysis
      SET score to analyze_keywords(articles, score)

      RETURN score


---
    CLASS NLP inherits from Strategy:
        SET n1 to 6
        SET n2 to 8

        # Define initialization method
        FUNCTION init():
            SET sma1 to result of calling SMA function on nlpdict values with window size n1
            SET sma2 to result of calling SMA function on nlpdict values with window size n2

        # Define method for each new candle in the trading strategy
        FUNCTION next():
            IF crossover(sma1, sma2) is True:
                CALL buy()

            ELSE IF crossover(sma2, sma1) is True:
                CALL sell()


    # Main program
    DEFINE nlpdict as a dictionary of all historical scores retrieved via score_on_date
    SORT nlpdict by date in ascending order

    # Execute our strategy on the trading platform
    TradingAPI.trade(NLP, exclusive_orders=True)