# Elon Musk Tweets Sentiment Analysis

I will attempt to write an algorithmic trading bot that goes long or short on TSLA stock depending on the sentiment of Elon Musk's tweets. To see whether such an algorithm would perform well, I will gather a [dataset](https://www.kaggle.com/ayhmrba/elon-musk-tweets-2010-2021) of his tweets from the years 2011-2021 and use it to backtest the strategy on QuantConnect.



## First, some preprocessing is needed to clean up the data.

In [None]:
import pandas as pd
import re

In [None]:
df = pd.read_csv("D:/Code/QuantConnect/ElonMuskTweetSentimentAnalysis/data/2021.csv")
df

The only columns we need are 'date' and 'tweet'. We also need to reverse the row order for the algorithm's sake: the tweets should run from oldest to latest, not the other way around.

In [None]:
df = df[['date', 'tweet']]
df = df[::-1].reset_index(drop=True)
df
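Reversing the rows gives chronological order only if the file is sorted newest-first to begin with. A minimal sketch of a more defensive variant sorts on the parsed dates instead. It assumes the 'date' column holds timestamps that `pd.to_datetime` can parse, and the three rows below are made-up stand-ins, not real data:

```python
import pandas as pd

# Hypothetical stand-in for the tweet data, newest row first.
sample = pd.DataFrame({
    "date": ["2021-03-02 10:00:00", "2021-03-01 18:30:00", "2021-02-27 09:15:00"],
    "tweet": ["third", "second", "first"],
})

# Parse the timestamps, then sort so the oldest tweet comes first.
sample["date"] = pd.to_datetime(sample["date"])
chronological = sample.sort_values("date", kind="stable").reset_index(drop=True)

# Sanity check: dates should never decrease from one row to the next.
assert chronological["date"].is_monotonic_increasing
print(chronological)
```

The same `is_monotonic_increasing` check could be run on the real `df` after the reversal to confirm the order is what the backtest expects.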

Much better. Let's look at a few random tweets.

In [None]:
print(df['tweet'][115])
print(df['tweet'][742])
print(df['tweet'][3545])
print(df['tweet'][9211])
print(df['tweet'][2259])

First, we can see that Elon sometimes shares good news about Tesla, which may push the TSLA stock price up and give us a bit of alpha if we are quick enough. Second, many of the tweets contain URLs, which could be detrimental to the sentiment analyzer, so we will strip them out by replacing each one with a `{URL}` placeholder.

In [None]:
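# Pattern for http/https URLs; the last alternative matches percent-escapes such as %2F.
url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

# Replace every URL with the '{URL}' placeholder in one vectorized pass.
# This avoids chained assignment (df["tweet"][i] = ...), which pandas may apply to a copy.
df["tweet"] = df["tweet"].str.replace(url_pattern, '{URL}', regex=True)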
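To see what the substitution does before trusting it on the full dataset, here is a small standalone check. It uses the URL pattern from the cell above, with the percent-escape alternative written as a proper group. The helper name `mask_urls` and the sample strings are made up for illustration:

```python
import re

# URL pattern from the cell above; the last alternative matches %XX escapes.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

def mask_urls(text):
    """Replace every URL in `text` with the literal placeholder '{URL}'."""
    return URL_PATTERN.sub('{URL}', text)

# Hypothetical sample strings, not taken from the dataset.
samples = [
    "Giga Berlin update https://t.co/abc123 coming soon",
    "No links in this one",
    "Details here: https://example.com/a?b=1.",
]

masked = [mask_urls(s) for s in samples]
for line in masked:
    print(line)
```

Note that the pattern also swallows punctuation that directly follows a URL, such as the trailing period in the third sample. For sentiment scoring that is probably harmless, but it is worth knowing about.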

In [None]:
print(df['tweet'][115])
print(df['tweet'][742])
print(df['tweet'][3545])
print(df['tweet'][9211])
print(df['tweet'][2259])

Much better. I will save this new DataFrame as a CSV of its own to have for later.

In [None]:
df.to_csv("data/elon_tweets/ElonMuskTweetsPreProcessed.csv", index=False)

Now, since QuantConnect does not let us import the transformers library into its environment, we will have to perform the sentiment analysis on the data beforehand and save the result as a new CSV that holds scores instead of tweets.

In [None]:
df

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
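This model is a five-class classifier whose labels correspond to ratings from 1 to 5 stars, so each tweet can be reduced to a single number. The sketch below shows one plausible way to do that, not necessarily the one used later in this notebook. The helper `stars_from_logits` is hypothetical, and it works on a plain list of five numbers so that it runs without downloading the model; in the real pipeline those numbers would come from the model's output logits for one tweet.

```python
import math

def stars_from_logits(logits):
    """Return (predicted_stars, expected_stars) for five class logits.

    Assumes index 0 corresponds to 1 star and index 4 to 5 stars.
    """
    if len(logits) != 5:
        raise ValueError("expected exactly five logits")

    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Most likely class, shifted from a 0-based index to a 1-5 star rating.
    predicted = probs.index(max(probs)) + 1
    # Probability-weighted average rating, a smoother alternative to the argmax.
    expected = sum((i + 1) * p for i, p in enumerate(probs))
    return predicted, expected

# Made-up logits leaning strongly positive.
example_logits = [-2.0, -1.0, 0.0, 1.5, 3.0]
predicted, expected = stars_from_logits(example_logits)
print(predicted, round(expected, 2))
```

The argmax rating is simple to threshold for a long/short decision, while the expected rating keeps more of the model's uncertainty. Which one suits the strategy better is something the backtest would have to show.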