# CryptoCurrency - Data Aggregation

## Objective

Aggregate data for the following nine cryptocurrencies:
`['Binance Coin', 'Bitcoin', 'EOS', 'Ethereum', 'Litecoin', 'Stellar', 'TRON', 'XRP', 'Bitcoin Cash']`

We will aggregate data from the following sources:

1. OHLCV Data
2. General News
3. Financial News
4. Reddit Information
5. Twitter Information

The data covers the period from 1 January 2018 to 27 February 2019 (roughly 14 months) for all nine cryptocurrencies.

### OHLCV Data - CryptoCurrency

In [3]:
import pandas as pd
ohlcv_data = pd.read_json('Hourly-Processed-Data/crypto_hist.json')
ohlcv_data = ohlcv_data.rename({'created_time' : 'created_utc','asset_name':'symbol','crypto_name':'asset_name'},axis=1)
ohlcv_data['created_utc'] = pd.to_datetime(ohlcv_data['created_utc']).dt.tz_localize(None)
ohlcv_data.head(4)

Unnamed: 0,symbol,close,created_utc,asset_name,high,low,open,volumefrom,volumeto
0,BNB,14.1,2018-01-22 08:00:00,Binance Coin,14.44,14.0,14.14,105978.95,1507209.31
1,BNB,14.2,2018-01-22 09:00:00,Binance Coin,14.35,13.85,14.1,83496.47,1184968.02
2,BNB,14.05,2018-01-22 10:00:00,Binance Coin,14.24,14.03,14.2,33415.74,471694.25
3,BNB,13.43,2018-01-22 11:00:00,Binance Coin,14.11,13.41,14.05,110293.99,1515195.81
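
Every source in this notebook is joined on `created_utc`, so all timestamps need the same timezone-naive form. The sketch below illustrates that normalization on a small made-up frame. The sample values and the explicit `utc=True` argument are assumptions for illustration, not part of the original pipeline.

```python
import pandas as pd

# Hypothetical sample: ISO strings carrying a UTC marker, as a JSON export might.
raw = pd.DataFrame({
    "created_time": ["2018-01-22T08:00:00Z", "2018-01-22T09:00:00Z"],
    "close": [14.1, 14.2],
})

# Parse as timezone-aware UTC, then drop the timezone while keeping the UTC
# wall-clock time, so the column compares cleanly with other naive columns.
aware = pd.to_datetime(raw["created_time"], utc=True)
raw["created_utc"] = aware.dt.tz_localize(None)
raw = raw.drop(columns="created_time")
```

If one frame kept timezone-aware timestamps and another did not, a merge on `created_utc` would fail or match nothing, which is why each cell applies the same conversion.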


### Reddit Data - CryptoCurrency

In [6]:
reddit_crypto_data = pd.read_json('Hourly-Processed-Data/reddit_crypto.json')
reddit_crypto_data['created_utc'] = pd.to_datetime(reddit_crypto_data['created_utc']).dt.tz_localize(None)
reddit_crypto_data = reddit_crypto_data.rename({'crypto' : 'asset_name','compound':'reddit_compound', 'domain':'reddit_domain', 'neg':'reddit_neg','neu': 'reddit_neu',
       'num_comments':'reddit_num_comments', 'pos':'reddit_pos', 'score':'reddit_score','title': 'reddit_title'},axis=1)

reddit_crypto_data.head(3)

Unnamed: 0,reddit_compound,created_utc,asset_name,reddit_domain,reddit_neg,reddit_neu,reddit_num_comments,reddit_pos,reddit_score,reddit_title
0,0.064512,2018-01-01 00:00:00,Binance Coin,"[self.cardano, self.Ripple, self.altcoin, self...",0.122176,0.769882,17,0.107941,4.588235,"[Binance withdrawal suspended, A significant n..."
1,0.167085,2018-01-01 01:00:00,Binance Coin,"[self.kucoin, self.noncensored_bitcoin, self.R...",0.044846,0.852846,13,0.102308,41.538462,"[Kucoin exchange pros/cons, [uncensored-r/Cryp..."
2,-0.126776,2018-01-01 02:00:00,Binance Coin,"[coinstreet.io, self.IOTAmarkets, self.noncens...",0.101412,0.879059,17,0.019529,2.411765,[How to Buy Cindicator at Binance (CND) - A St...


In [8]:
new_df = pd.merge(ohlcv_data,reddit_crypto_data,  how='left', on = ['asset_name', 'created_utc'])
new_df.head(2)

Unnamed: 0,symbol,close,created_utc,asset_name,high,low,open,volumefrom,volumeto,reddit_compound,reddit_domain,reddit_neg,reddit_neu,reddit_num_comments,reddit_pos,reddit_score,reddit_title
0,BNB,14.1,2018-01-22 08:00:00,Binance Coin,14.44,14.0,14.14,105978.95,1507209.31,0.110769,"[forum.bitcoin.com, self.CryptoCurrency, self....",0.056615,0.831615,13.0,0.111769,2.769231,[Telegram Groups &amp; Pumps sub-forum • Free ...
1,BNB,14.2,2018-01-22 09:00:00,Binance Coin,14.35,13.85,14.1,83496.47,1184968.02,0.104713,"[support.binance.com, self.Stellar, self.strat...",0.03825,0.81775,8.0,0.144,5.5,[Hey Devs! Pls go and fill out the form to get...
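
A left merge keeps every OHLCV row even when no Reddit row exists for that asset and hour, and integer columns such as `reddit_num_comments` become floats once missing values appear. The following sketch, built on made-up data, shows one way to count unmatched rows with the `indicator` option of `pd.merge`. The `validate` argument is an added safety check, not something the original cell uses.

```python
import pandas as pd

# Toy stand-ins for ohlcv_data and reddit_crypto_data (values are invented).
ohlcv = pd.DataFrame({
    "asset_name": ["Bitcoin", "Bitcoin", "EOS"],
    "created_utc": pd.to_datetime(
        ["2018-01-22 08:00", "2018-01-22 09:00", "2018-01-22 08:00"]),
    "close": [11000.0, 11050.0, 13.0],
})
reddit = pd.DataFrame({
    "asset_name": ["Bitcoin", "EOS"],
    "created_utc": pd.to_datetime(["2018-01-22 08:00", "2018-01-22 08:00"]),
    "reddit_num_comments": [13, 8],
})

merged = pd.merge(
    ohlcv, reddit,
    how="left", on=["asset_name", "created_utc"],
    indicator=True,          # adds a "_merge" column describing each row's origin
    validate="one_to_one",   # raises if either side has duplicate keys
)

# Rows present only in the OHLCV frame have no Reddit data for that hour.
missing = int((merged["_merge"] == "left_only").sum())
```

Dropping the `_merge` column afterwards restores the same layout as the plain merge.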


### General News - CryptoCurrency

In [9]:
gen_news = pd.read_json('Hourly-Processed-Data/processed_general_news.json')
gen_news['time'] = pd.to_datetime(gen_news['time']).dt.tz_localize(None)
gen_news = gen_news.rename({'compound':'news_compound', 'kids':'news_kids', 'neg':'news_neg','neu': 'news_neu',
       'url':'news_url', 'pos':'news_pos', 'score':'news_score',
                'title': 'news_title','time':'created_utc'},axis=1)
gen_news.head(3)

Unnamed: 0,news_compound,news_kids,news_neg,news_neu,news_pos,news_score,created_utc,news_title,news_url
0,0.128304,57,0.03,0.850982,0.119018,6.754386,2017-09-27 20:00:00,"[Hacktoberfest 2017, 18 things only an Indie d...","[hacktoberfest.digitalocean.com, www.buildbox...."
1,0.060505,58,0.060103,0.852379,0.087517,4.689655,2017-09-27 21:00:00,[Introducing Akaunting: Free Accounting Softwa...,"[akaunting.com, futurism.com, www.bbc.co.uk, l..."
2,0.103068,47,0.056213,0.826766,0.117021,3.957447,2017-09-27 22:00:00,[US Senator sees Reddit as potential target fo...,"[thehill.com, www.facebook.com, www.npmjs.com,..."


In [11]:
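# General news is not asset-specific, so join on the hourly timestamp only.
result_1 = pd.merge(new_df,gen_news,on=['created_utc'],how='left')
# Keep rows strictly before 2019-02-19 23:00.
result_1 = result_1[result_1.created_utc < '2019-02-19 23:00:00']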
result_1.columns.values

array(['symbol', 'close', 'created_utc', 'asset_name', 'high', 'low',
       'open', 'volumefrom', 'volumeto', 'reddit_compound',
       'reddit_domain', 'reddit_neg', 'reddit_neu', 'reddit_num_comments',
       'reddit_pos', 'reddit_score', 'reddit_title', 'news_compound',
       'news_kids', 'news_neg', 'news_neu', 'news_pos', 'news_score',
       'news_title', 'news_url'], dtype=object)
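
Because the news frame has no `asset_name` column, merging on `created_utc` alone attaches the same news row to every asset in that hour. A minimal sketch of that behaviour on invented data follows. The `validate="many_to_one"` guard is an assumption added here to catch duplicate hours in the news frame, which would otherwise multiply rows silently.

```python
import pandas as pd

# Toy stand-ins for new_df and gen_news (values are invented).
prices = pd.DataFrame({
    "asset_name": ["Bitcoin", "EOS", "Bitcoin"],
    "created_utc": pd.to_datetime(
        ["2018-01-22 08:00", "2018-01-22 08:00", "2018-01-22 09:00"]),
})
news = pd.DataFrame({
    "created_utc": pd.to_datetime(["2018-01-22 08:00", "2018-01-22 09:00"]),
    "news_compound": [0.12, 0.06],
})

# Each hour may appear many times on the left (once per asset) but must be
# unique on the right; pandas raises MergeError if that does not hold.
out = pd.merge(prices, news, on="created_utc", how="left",
               validate="many_to_one")
```

With unique hours on the right, the merged frame has exactly as many rows as the left frame.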

### Financial News - CryptoCurrency

In [12]:
fin_news = pd.read_json('Hourly-Processed-Data/processed_financial_news.json')
fin_news['created_utc'] = pd.to_datetime(fin_news['created_utc']).dt.tz_localize(None)
fin_news = fin_news.rename({'compound':'fin_compound', 'subheading':'fin_subheading', 'neg':'fin_neg','neu': 'fin_neu',
 'pos':'fin_pos','title': 'fin_title'},axis=1)
fin_news.head(3)

Unnamed: 0,fin_compound,created_utc,fin_neg,fin_neu,fin_pos,fin_subheading,fin_title
0,0.0,2017-02-16 22:00:00,0.0,1.0,0.0,[0],"[Fast Asia Open: Singapore GDP, Thailand forex..."
1,0.0,2017-02-16 23:00:00,0.0,0.0,0.0,[],[]
2,0.368767,2017-02-17 00:00:00,0.035,0.775,0.19,"[0, 0, Wall Street broke its longest winning s...","[Sterling's puzzling purple patch, Singapore Q..."


In [13]:
result_2 = pd.merge(result_1,fin_news,on=['created_utc'],how='left')
result_2.columns.values

array(['symbol', 'close', 'created_utc', 'asset_name', 'high', 'low',
       'open', 'volumefrom', 'volumeto', 'reddit_compound',
       'reddit_domain', 'reddit_neg', 'reddit_neu', 'reddit_num_comments',
       'reddit_pos', 'reddit_score', 'reddit_title', 'news_compound',
       'news_kids', 'news_neg', 'news_neu', 'news_pos', 'news_score',
       'news_title', 'news_url', 'fin_compound', 'fin_neg', 'fin_neu',
       'fin_pos', 'fin_subheading', 'fin_title'], dtype=object)

### Twitter - CryptoCurrency

In [15]:
twitter_crypto_data = pd.read_json('Hourly-Processed-Data/twitter_crypto.json')
twitter_crypto_data['created_utc'] = pd.to_datetime(twitter_crypto_data['created_utc']).dt.tz_localize(None)
twitter_crypto_data = twitter_crypto_data.rename({'compound':'tweet_compound', 'favorites':'tweet_favorites', 'neg':'tweet_neg','neu': 'tweet_neu',
       'pos':'tweet_pos', 'retweets':'tweet_retweets','text': 'tweet_text','hashtags':'tweet_hashtags'},axis=1)

twitter_crypto_data.head(5)

Unnamed: 0,asset_name,tweet_compound,created_utc,tweet_favorites,tweet_hashtags,tweet_neg,tweet_neu,tweet_pos,tweet_retweets,tweet_text
0,Binance Coin,0.153991,2017-12-31 16:00:00,23,"[#XVG #Binance #Wraith #WraithProtocol, None, ...",0.040739,0.879913,0.079304,23,[#XVG at the top on #Binance http:// XVG.zone ...
1,Binance Coin,0.26485,2017-12-31 17:00:00,2,"[#xvg #verge #cryptocurrency #crypto, #Verge #...",0.026,0.891,0.0835,2,[@bitshares @XVGAsia @XVGWhale @_CryptoBeggar ...
2,Binance Coin,0.0322,2017-12-31 19:00:00,2,"[None, None]",0.022,0.9525,0.0255,2,[I'm not a whale either. My address is my Bina...
3,Binance Coin,0.9741,2017-12-31 20:00:00,1,[#SHND #stronghands],0.0,0.574,0.426,1,[to SHND community in the world. I am the memb...
4,Binance Coin,0.8922,2017-12-31 23:00:00,1,[#WraithProtocol #Vergecurrency #verge #ripple],0.0,0.798,0.202,1,"[#WraithProtocol is complete, and released. #V..."


### Final Aggregation

In [17]:
aggerageted_df = pd.merge(result_2,twitter_crypto_data,  how='left', on = ['asset_name', 'created_utc'])
aggerageted_df.head(5)

Unnamed: 0,symbol,close,created_utc,asset_name,high,low,open,volumefrom,volumeto,reddit_compound,...,fin_subheading,fin_title,tweet_compound,tweet_favorites,tweet_hashtags,tweet_neg,tweet_neu,tweet_pos,tweet_retweets,tweet_text
0,BNB,14.1,2018-01-22 08:00:00,Binance Coin,14.44,14.0,14.14,105978.95,1507209.31,0.110769,...,[Possibility that machines can get smarter wil...,[Artificial intelligence could yet upend the l...,0.4443,3.0,"[None, None, #XRP #crypto #cryptocurrency #bit...",0.0,0.859,0.141,3.0,[It’s open again. Binance sign up. Don’t be to...
1,BNB,14.2,2018-01-22 09:00:00,Binance Coin,14.35,13.85,14.1,83496.47,1184968.02,0.104713,...,[Stapleton Capital becomes latest group to ‘pi...,[Telecoms group up 125% on name change to Bloc...,0.1647,4.0,"[#domains, #crypto #altcoins, #Binance #altcoi...",0.0,0.9715,0.0285,4.0,[http:// CryptoWorld.com sold for $195k to Bin...
2,BNB,14.05,2018-01-22 10:00:00,Binance Coin,14.24,14.03,14.2,33415.74,471694.25,0.104555,...,[Move seen as creating a ‘disastrous consequen...,[Call for South Korea to reconsider tax revisi...,0.659,1.0,[#DYOR #kucoin #bittrix #binance],0.046,0.795,0.159,1.0,[Incase you missed the $ btcp article. https:/...
3,BNB,13.43,2018-01-22 11:00:00,Binance Coin,14.11,13.41,14.05,110293.99,1515195.81,0.256573,...,[Investors on watch for signs of super-charged...,[Fourth-quarter earnings to offer insight into...,0.0,2.0,"[#SUB, #satoshi #hitbtc #bittrex]",0.0,1.0,0.0,2.0,[$ SUB #SUB about to TAKE OFF.. . Making a cal...
4,BNB,13.5,2018-01-22 12:00:00,Binance Coin,13.55,13.18,13.43,130306.48,1740399.59,0.110217,...,[Funding vote unlikely to pass with Republican...,"[Daily Briefing: Fresh US shutdown vote, Davos...",0.47925,4.0,"[#btc #Bittrex #JohnMcAfee #yobit #Binance, #c...",0.0,0.86225,0.13775,4.0,"[All crypto exchanges out there,$html coin com..."
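
After a chain of left merges it can help to confirm that each asset and hour still appears only once and to see how much of each source actually matched. The sketch below runs both checks on a small invented frame. The column subset and the sample values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the aggregated frame (values are invented).
agg = pd.DataFrame({
    "asset_name": ["Bitcoin", "Bitcoin", "EOS"],
    "created_utc": pd.to_datetime(
        ["2018-01-22 08:00", "2018-01-22 09:00", "2018-01-22 08:00"]),
    "reddit_compound": [0.10, np.nan, 0.20],
    "tweet_compound": [np.nan, np.nan, 0.40],
})

# True if any (asset, hour) pair occurs more than once after merging.
has_duplicates = bool(agg.duplicated(subset=["asset_name", "created_utc"]).any())

# Share of rows with a non-missing sentiment score, per source.
coverage = agg[["reddit_compound", "tweet_compound"]].notna().mean()
```

Low coverage for a source indicates many hours where the left merge found no matching row.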


In [19]:
#aggerageted_df.to_json('processed_crypto.json',orient='records',date_format='iso')
aggerageted_df.columns

Index(['symbol', 'close', 'created_utc', 'asset_name', 'high', 'low', 'open',
       'volumefrom', 'volumeto', 'reddit_compound', 'reddit_domain',
       'reddit_neg', 'reddit_neu', 'reddit_num_comments', 'reddit_pos',
       'reddit_score', 'reddit_title', 'news_compound', 'news_kids',
       'news_neg', 'news_neu', 'news_pos', 'news_score', 'news_title',
       'news_url', 'fin_compound', 'fin_neg', 'fin_neu', 'fin_pos',
       'fin_subheading', 'fin_title', 'tweet_compound', 'tweet_favorites',
       'tweet_hashtags', 'tweet_neg', 'tweet_neu', 'tweet_pos',
       'tweet_retweets', 'tweet_text'],
      dtype='object')
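
The commented-out export above writes one JSON record per row with ISO-formatted dates. The following sketch round-trips a small invented frame through that format in memory. Re-parsing `created_utc` explicitly after loading is an assumption made here, mirroring how the earlier cells call `pd.to_datetime` after `pd.read_json`.

```python
import io
import pandas as pd

# Toy frame with the same key columns as the aggregated data (values invented).
df = pd.DataFrame({
    "symbol": ["BNB", "BNB"],
    "created_utc": pd.to_datetime(["2018-01-22 08:00", "2018-01-22 09:00"]),
    "close": [14.1, 14.2],
})

# Same options as the commented-out export, but written to a string.
payload = df.to_json(orient="records", date_format="iso")

# Load without automatic date conversion, then parse the timestamp column
# explicitly and drop the timezone so it matches the original naive values.
restored = pd.read_json(io.StringIO(payload), orient="records",
                        convert_dates=False)
restored["created_utc"] = (
    pd.to_datetime(restored["created_utc"], utc=True).dt.tz_localize(None)
)
```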