# Correlation analysis between the Bitcoin currency and Twitter

This project analyses the correlation between the Bitcoin price and tweets about Bitcoin. To estimate how positive a tweet is (and so whether it suggests the Bitcoin price will go up or down), we run a sentiment analysis on each tweet using the VADER algorithm. Finally, we look for a correlation between the two and apply machine learning to make predictions.

This notebook was written using Python 3.6.

# Sentiment analysis

We will first read all the tweets that we retrieved with the API in the TwitterExtraction notebook.

In [20]:
import json
import pandas as pd
import io

d = pd.read_csv('tweets_raw.csv')
d

Unnamed: 0,ID,Text,UserName,UserFollowerCount,RetweetCount,CreatedAt
0,995987428166000642,RT @CryptoAmb: Cryptocurrency is not far from ...,Mark Brown,402,1,Mon May 14 11:21:22 +0000 2018
1,995987423271235585,"RT @icopool: 10,000 XRP Giveaway is now on! 10...",Billy Blocks,333,25,Mon May 14 11:21:20 +0000 2018
2,995987420058275841,Want to switch your mining between different c...,MaxiMine,404,0,Mon May 14 11:21:20 +0000 2018
3,995987412550606849,Day Trading: 2 Manuscripts: Absolute Beginners...,Blockchain,17876,0,Mon May 14 11:21:18 +0000 2018
4,995987411472670723,Arduino: The Comprehensive Beginner's Guide To...,Blockchain,17876,0,Mon May 14 11:21:18 +0000 2018
5,995987400315801600,Well ain't that interesting. #blockchain #bitc...,Crypto is Coming ⚡️,1213,0,Mon May 14 11:21:15 +0000 2018
6,995987398587834368,"RT @decoin_io: DECOIN is a blockchain-based ""R...",Alla Mahaban,3817,262,Mon May 14 11:21:15 +0000 2018
7,995987395542634496,RT @DAGTofficial: Goldman Sachs and even the N...,Dbravo,7627,486,Mon May 14 11:21:14 +0000 2018
8,995987392661217282,RT @ArminVanBitcoin: #segwit adoption hits new...,ti.na1,237,48,Mon May 14 11:21:13 +0000 2018
9,995987378094460928,Future Focus with Walmart's Latest Blockchain ...,BitcoinNews.com,10985,0,Mon May 14 11:21:10 +0000 2018


### VADER
Here we run the sentiment analysis, computing a sentiment score for each tweet with the VADER algorithm. We then weight that score by the tweet's retweet count and the author's follower count.

Don't forget to install vaderSentiment first: `pip install vaderSentiment`

In [21]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
# Keep only the compound score, VADER's normalised overall sentiment in [-1, 1].
# The neg/neu/pos components returned by polarity_scores() are not used below.
d["compound"] = [analyzer.polarity_scores(s)["compound"] for s in d["Text"]]

d

Unnamed: 0,ID,Text,UserName,UserFollowerCount,RetweetCount,CreatedAt,compound
0,995987428166000642,RT @CryptoAmb: Cryptocurrency is not far from ...,Mark Brown,402,1,Mon May 14 11:21:22 +0000 2018,0.0000
1,995987423271235585,"RT @icopool: 10,000 XRP Giveaway is now on! 10...",Billy Blocks,333,25,Mon May 14 11:21:20 +0000 2018,0.8381
2,995987420058275841,Want to switch your mining between different c...,MaxiMine,404,0,Mon May 14 11:21:20 +0000 2018,0.0772
3,995987412550606849,Day Trading: 2 Manuscripts: Absolute Beginners...,Blockchain,17876,0,Mon May 14 11:21:18 +0000 2018,0.0000
4,995987411472670723,Arduino: The Comprehensive Beginner's Guide To...,Blockchain,17876,0,Mon May 14 11:21:18 +0000 2018,0.2500
5,995987400315801600,Well ain't that interesting. #blockchain #bitc...,Crypto is Coming ⚡️,1213,0,Mon May 14 11:21:15 +0000 2018,-0.0408
6,995987398587834368,"RT @decoin_io: DECOIN is a blockchain-based ""R...",Alla Mahaban,3817,262,Mon May 14 11:21:15 +0000 2018,0.0000
7,995987395542634496,RT @DAGTofficial: Goldman Sachs and even the N...,Dbravo,7627,486,Mon May 14 11:21:14 +0000 2018,0.0000
8,995987392661217282,RT @ArminVanBitcoin: #segwit adoption hits new...,ti.na1,237,48,Mon May 14 11:21:13 +0000 2018,0.4574
9,995987378094460928,Future Focus with Walmart's Latest Blockchain ...,BitcoinNews.com,10985,0,Mon May 14 11:21:10 +0000 2018,0.0000
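
As a rough guide to reading the `compound` column above, the sketch below buckets scores into coarse labels. It assumes the ±0.05 cut-offs commonly suggested in the vaderSentiment documentation; the helper name `label_compound` and its `threshold` parameter are our own and are not part of the library.

```python
def label_compound(score, threshold=0.05):
    """Bucket a VADER compound score into a coarse sentiment label.

    The 0.05 threshold is an assumed, conventional cut-off and can be tuned.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from rows 0, 1, 2 and 5 of the table above.
sample_scores = [0.0000, 0.8381, 0.0772, -0.0408]
labels = [label_compound(s) for s in sample_scores]
print(labels)
```

Scores close to zero, such as -0.0408, fall into the neutral bucket under this assumption.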


In [None]:
# Weight each compound score by the retweet and follower counts (in thousands).
# The +1 keeps tweets with zero retweets or followers from being zeroed out.
d["Scores"] = d["compound"] * ((d["RetweetCount"] + 1) / 1000) * ((d["UserFollowerCount"] + 1) / 1000)
print(d["Scores"])
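
To make the weighting concrete, here is a minimal sketch of the same formula applied to a single tweet. The function name `weighted_score` is hypothetical; the input values are taken from row 1 of the table above.

```python
def weighted_score(compound, retweet_count, follower_count):
    """Apply the notebook's weighting formula to one tweet."""
    return compound * ((retweet_count + 1) / 1000) * ((follower_count + 1) / 1000)


# Row 1: compound 0.8381, 25 retweets, 333 followers.
score = weighted_score(0.8381, 25, 333)
print(round(score, 6))

# A tweet with a neutral compound score stays at zero however popular it is.
neutral = weighted_score(0.0, 486, 7627)
print(neutral)
```

Because the three factors are multiplied, a tweet only gets a large score when it is both strongly worded and widely seen.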

In [None]:
import matplotlib.pyplot as plt
# Parse Twitter's timestamp format so that the x-axis is a real time axis.
created_at = pd.to_datetime(d['CreatedAt'], format='%a %b %d %H:%M:%S %z %Y')
#d.index = d['CreatedAt']
#d2 = d.groupby(d.index.hour).mean()
#print(d2)
plt.plot(created_at, d["Scores"], ls='-', marker='o')
plt.show()
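
The commented-out lines in the cell above hint at averaging the scores per hour. Here is a minimal sketch of that idea using only the standard library. It assumes `CreatedAt` always follows the Twitter format seen in the table (`Mon May 14 11:21:22 +0000 2018`); the helper `mean_score_by_hour` and the sample scores are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Timestamp layout used by the Twitter API in the CreatedAt column.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def mean_score_by_hour(rows):
    """Average scores per hour of day; rows are (created_at, score) pairs."""
    buckets = defaultdict(list)
    for created_at, score in rows:
        hour = datetime.strptime(created_at, TWITTER_TIME_FORMAT).hour
        buckets[hour].append(score)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Illustrative rows only; the scores are not real project data.
rows = [
    ("Mon May 14 11:21:22 +0000 2018", 0.5),
    ("Mon May 14 11:45:00 +0000 2018", 1.5),
    ("Mon May 14 12:05:10 +0000 2018", -1.0),
]
hourly = mean_score_by_hour(rows)
print(hourly)
```

Note that grouping by hour of day merges the same hour across different days; grouping by the full date and hour would keep them apart.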

## Retrieve the Bitcoin currency
Here we retrieve the Kraken BTC/USD price history through the Quandl API (dataset `BCHARTS/KRAKENUSD`).

TODO: Should we retrieve data from multiple APIs? They may not report the same prices.

TODO: Maybe we should make a daily forecast.

In [None]:
import os
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime

In [None]:
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)

In [None]:
def get_quandl_data(quandl_id):
    '''Download and cache a Quandl data series as a pickled DataFrame.'''
    cache_path = '{}.pkl'.format(quandl_id).replace('/','-')
    try:
        # Use a context manager so the cache file is always closed.
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)
        print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError):
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas")
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df

In [None]:
btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')


In [None]:
btc_usd_price_kraken.head()


In [None]:
btc_trace = go.Scatter(x=btc_usd_price_kraken.index, y=btc_usd_price_kraken['Weighted Price'])
py.iplot([btc_trace])

## Correlation analysis
Here we analyse the correlation between the Bitcoin price and the tweet scores.

We can test our correlation hypothesis using the Pandas corr() method, which computes a Pearson correlation coefficient for each column in the dataframe against each other column.

In [None]:
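# Look up the latest known price at each tweet's timestamp, then correlate it with the tweet scores.
tweet_times = pd.to_datetime(d['CreatedAt'], format='%a %b %d %H:%M:%S %z %Y').dt.tz_localize(None)
d['Scores'].corr(btc_usd_price_kraken['Weighted Price'].reindex(tweet_times, method='pad').reset_index(drop=True))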
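
For reference, the Pearson coefficient is the covariance of two series divided by the product of their standard deviations. The sketch below computes it by hand on made-up numbers; the function name `pearson` and the sample values are illustrative and are not data from this project.

```python
from math import isnan, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / sqrt(var_x * var_y)


# Made-up values: the prices are an exact linear function of the scores.
scores = [0.1, 0.4, 0.2, 0.8]
prices = [8400.0, 8700.0, 8500.0, 9100.0]
r = pearson(scores, prices)
print(r)

# A constant price series has no variance, so the coefficient is undefined.
flat = pearson(scores, [8500.0, 8500.0, 8500.0, 8500.0])
print(isnan(flat))
```

The constant-series case matters in practice: if every tweet in the sample is matched to the same price, the correlation comes out as NaN rather than as a meaningful number.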

# TODO: Check TwythonStreamer

Could we stream new tweets continuously, without a limit? That would be useful for real-time predictions.

In [None]:
from twython import TwythonStreamer

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            print(data['text'])

    def on_error(self, status_code, data):
        print(status_code)

        # Want to stop trying to get data because of the error?
        # Uncomment the next line!
        # self.disconnect()
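
# Read the Twitter credentials from environment variables rather than
# hard-coding them in the notebook (the variable names are our own choice).
import os
APP_KEY = os.environ["TWITTER_APP_KEY"]
APP_SECRET = os.environ["TWITTER_APP_SECRET"]
OAUTH_TOKEN = os.environ["TWITTER_OAUTH_TOKEN"]
OAUTH_TOKEN_SECRET = os.environ["TWITTER_OAUTH_TOKEN_SECRET"]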

stream = MyStreamer(APP_KEY, APP_SECRET,
                    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
stream.statuses.filter(track='bitcoin')