# TWITTER WEB SCRAPING

Here we scrape the Twitter accounts of the top 100 influencers in the cryptocurrency world. The first step is to scrape the list of the top 100 accounts; the second is to download the tweets they posted (the API method used here returns only about the 3,200 most recent tweets per account).

#### Data Sources:

Cryptocurrencies influencer ranking: "https://cryptoweekly.co/100/"

The following code has been adapted from:
https://codeburst.io/a-twitter-analysis-of-the-100-most-influential-people-in-crypto-bb95b2608925 and https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-

In [2]:
# First we load the general packages:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Here we write a function to scrape the influencers' Twitter handles:

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

def getTwitterHandles():
    # URL of the page to be scraped
    url = "https://cryptoweekly.co/100/"

    # Retrieve and parse the page HTML
    client = urlopen(url)
    pageHtml = client.read()
    client.close()
    pageSoup = soup(pageHtml, "html.parser")

    # Each profile card holds an "author" div whose first link text is "@handle"
    profiles = pageSoup.findAll("div", {"class": "testimonial-wrapper"})
    twitterHandles = []
    for person in profiles:
        author = person.findAll("div", {"class": "author"})[0]
        # Drop the leading "@" from the link text
        twitterHandles.append(author.findAll("a")[0].text[1:])

    return twitterHandles
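
The slice `.text[1:]` in `getTwitterHandles` assumes that every link text starts with exactly one `@`. A slightly more defensive normalization might look like the sketch below. `normalize_handle` is a hypothetical helper, not part of the original code, and the sample strings are made up.

```python
def normalize_handle(link_text):
    """Return a bare Twitter handle from link text such as '@VitalikButerin'.

    Hypothetical helper: strips surrounding whitespace and any leading '@'
    instead of unconditionally dropping the first character.
    """
    return link_text.strip().lstrip("@")


# Made-up link texts, including two cases the [1:] slice would get wrong
samples = ["@VitalikButerin", " @novogratz ", "woonomic"]
handles = [normalize_handle(text) for text in samples]
print(handles)
```

With the plain slice, the last sample would lose its first letter and the second would keep its `@`.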


In [3]:
# Here tweepy, a Python client for the Twitter API, is used to download the influencers' tweets:

import tweepy
import csv
import sys

# Twitter API credentials (the original keys have expired and are left blank; fill in your own)
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""


def get_all_tweets(screen_name):
    print("Getting tweets from @" + str(screen_name))

    # Twitter only gives access to roughly a user's 3,200 most recent tweets with this method

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # list to hold all the tweepy Tweets
    alltweets = []

    # make the initial request for the most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # save the most recent tweets
    alltweets.extend(new_tweets)

    # keep grabbing tweets until a request comes back empty
    # (an account with no tweets skips the loop instead of failing on alltweets[-1])
    while len(new_tweets) > 0:
        # id of the oldest tweet collected so far, less one
        oldest = alltweets[-1].id - 1
        print("Getting tweets before %s" % (oldest))

        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        # save the tweets just received
        alltweets.extend(new_tweets)

        print("...%s tweets downloaded so far" % (len(alltweets)))

    # transform the tweepy tweets into a 2D array that will populate the csv
    # (plain str values: in Python 3, encoded bytes would be written as "b'...'" literals)
    outtweets = [[tweet.id_str, tweet.created_at.strftime('%m/%d/%Y'), tweet.text] for tweet in alltweets]

    # write the csv; newline='' avoids blank rows on Windows, utf-8 keeps non-ASCII text intact
    with open('C:/Users/Eric/Python Projects/Bitcoin-NLP-Strategy/Data/Tweets/%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    handles = getTwitterHandles()
    # Peter Todd's handle is replaced with his new account, because the website still lists the old one:
    for i in range(len(handles)):
        if handles[i] == 'petertoddbtc':
            handles[i] = 'peterktodd'
    for handle in handles:
        get_all_tweets(str(handle))

Getting tweets from @VitalikButerin
Getting tweets before 1030153461071937535
...400 tweets downloaded so far
Getting tweets before 1027019992137519103
...600 tweets downloaded so far
Getting tweets before 1017464117966327813
...800 tweets downloaded so far
Getting tweets before 1011576147245559807
...1000 tweets downloaded so far
Getting tweets before 1002007994186448895
...1200 tweets downloaded so far
Getting tweets before 993681117982150655
...1399 tweets downloaded so far
Getting tweets before 987663625383956479
...1599 tweets downloaded so far
Getting tweets before 981095914616979455
...1799 tweets downloaded so far
Getting tweets before 968005088080928770
...1999 tweets downloaded so far
Getting tweets before 955487885276385279
...2198 tweets downloaded so far
Getting tweets before 945324451259858944
...2398 tweets downloaded so far
Getting tweets before 936785364991197183
...2598 tweets downloaded so far
Getting tweets before 932523105817214975
...2798 tweets downloaded so far


...2194 tweets downloaded so far
Getting tweets before 786870903174930431
...2394 tweets downloaded so far
Getting tweets before 772785360044711935
...2594 tweets downloaded so far
Getting tweets before 764193170431369215
...2794 tweets downloaded so far
Getting tweets before 755207484000890880
...2994 tweets downloaded so far
Getting tweets before 749913271705145343
...3194 tweets downloaded so far
Getting tweets before 746719231010545664
...3195 tweets downloaded so far
Getting tweets before 746718897399799807
...3195 tweets downloaded so far
Getting tweets from @gavinandresen
Getting tweets before 940643034584186888
...398 tweets downloaded so far
Getting tweets before 866019623266897921
...598 tweets downloaded so far
Getting tweets before 831890752653422593
...797 tweets downloaded so far
Getting tweets before 745687988609388544
...996 tweets downloaded so far
Getting tweets before 661943242158694399
...1196 tweets downloaded so far
Getting tweets before 578596808378720255
...1396

...1399 tweets downloaded so far
Getting tweets before 987272565633798144
...1599 tweets downloaded so far
Getting tweets before 976418040056303615
...1799 tweets downloaded so far
Getting tweets before 969266027166085119
...1999 tweets downloaded so far
Getting tweets before 963759215964753919
...2199 tweets downloaded so far
Getting tweets before 956277404422561793
...2399 tweets downloaded so far
Getting tweets before 951197494091964415
...2599 tweets downloaded so far
Getting tweets before 946746131760992255
...2799 tweets downloaded so far
Getting tweets before 943921268893601793
...2999 tweets downloaded so far
Getting tweets before 938599398828896255
...3199 tweets downloaded so far
Getting tweets before 932726696364650497
...3222 tweets downloaded so far
Getting tweets before 932366211802976255
...3222 tweets downloaded so far
Getting tweets from @TimDraper
Getting tweets before 890263674480336896
...400 tweets downloaded so far
Getting tweets before 749033103893618687
...600 t

...1594 tweets downloaded so far
Getting tweets before 870700867971710975
...1793 tweets downloaded so far
Getting tweets before 858792689705394176
...1993 tweets downloaded so far
Getting tweets before 842078492305043455
...2192 tweets downloaded so far
Getting tweets before 827356272508932095
...2392 tweets downloaded so far
Getting tweets before 818001704474079231
...2592 tweets downloaded so far
Getting tweets before 806149169735958527
...2791 tweets downloaded so far
Getting tweets before 791194640649166848
...2991 tweets downloaded so far
Getting tweets before 779938071018078207
...3188 tweets downloaded so far
Getting tweets before 766951871768567807
...3234 tweets downloaded so far
Getting tweets before 764844378468777984
...3234 tweets downloaded so far
Getting tweets from @StephanTual
Getting tweets before 973542461925568511
...400 tweets downloaded so far
Getting tweets before 953622103152001024
...600 tweets downloaded so far
Getting tweets before 922443153579433983
...800 

...1599 tweets downloaded so far
Getting tweets before 1032351074248794111
...1799 tweets downloaded so far
Getting tweets before 1031179409561145343
...1999 tweets downloaded so far
Getting tweets before 1030023294420963328
...2193 tweets downloaded so far
Getting tweets before 1028732342775545858
...2393 tweets downloaded so far
Getting tweets before 1027149352744177663
...2593 tweets downloaded so far
Getting tweets before 1025685138838052864
...2792 tweets downloaded so far
Getting tweets before 1024877568984600576
...2992 tweets downloaded so far
Getting tweets before 1022480849701416959
...3192 tweets downloaded so far
Getting tweets before 1021336041599520767
...3226 tweets downloaded so far
Getting tweets before 1020531625493516287
...3226 tweets downloaded so far
Getting tweets from @michaelkitces
Getting tweets before 1039248109652594687
...400 tweets downloaded so far
Getting tweets before 1035990344452763647
...599 tweets downloaded so far
Getting tweets before 103420139153

...999 tweets downloaded so far
Getting tweets before 888116563605245951
...1199 tweets downloaded so far
Getting tweets before 863453709514264575
...1399 tweets downloaded so far
Getting tweets before 840002238537916415
...1535 tweets downloaded so far
Getting tweets before 834131064268349439
...1535 tweets downloaded so far
Getting tweets from @novogratz
Getting tweets before 1001646432326168582
...400 tweets downloaded so far
Getting tweets before 933540763895517183
...597 tweets downloaded so far
Getting tweets before 1166191035
...597 tweets downloaded so far
Getting tweets from @dahongfei
Getting tweets before 441837033514729471
...81 tweets downloaded so far
Getting tweets from @woonomic
Getting tweets before 1022432930881257471
...400 tweets downloaded so far
Getting tweets before 1000223543655915519
...599 tweets downloaded so far
Getting tweets before 952396872596926463
...799 tweets downloaded so far
Getting tweets before 933751501381541887
...999 tweets downloaded so far
Ge

...999 tweets downloaded so far
Getting tweets before 974418435378696191
...1199 tweets downloaded so far
Getting tweets before 968882424355549183
...1399 tweets downloaded so far
Getting tweets before 963504484159770624
...1599 tweets downloaded so far
Getting tweets before 961306667739738112
...1799 tweets downloaded so far
Getting tweets before 958071849635794945
...1999 tweets downloaded so far
Getting tweets before 956297129391206399
...2199 tweets downloaded so far
Getting tweets before 954923981542486016
...2399 tweets downloaded so far
Getting tweets before 953677716754190337
...2599 tweets downloaded so far
Getting tweets before 951895463619416064
...2799 tweets downloaded so far
Getting tweets before 950773177604505599
...2999 tweets downloaded so far
Getting tweets before 949030532909846533
...3199 tweets downloaded so far
Getting tweets before 946110472109084672
...3235 tweets downloaded so far
Getting tweets before 945749941078708224
...3235 tweets downloaded so far
Gettin

...2980 tweets downloaded so far
Getting tweets before 674226084074909695
...3180 tweets downloaded so far
Getting tweets before 661094848217354239
...3209 tweets downloaded so far
Getting tweets before 659424184624398336
...3209 tweets downloaded so far
Getting tweets from @SunnyStartups
Getting tweets before 1034895616550133759
...1 tweets downloaded so far
Getting tweets from @anondran
Getting tweets before 1036282712091516928
...400 tweets downloaded so far
Getting tweets before 1029874375925948419
...599 tweets downloaded so far
Getting tweets before 1022875558567456770
...798 tweets downloaded so far
Getting tweets before 1001630420906065919
...998 tweets downloaded so far
Getting tweets before 993512072880193535
...1197 tweets downloaded so far
Getting tweets before 986003352793288703
...1397 tweets downloaded so far
Getting tweets before 981286941344911360
...1597 tweets downloaded so far
Getting tweets before 978372206546874372
...1795 tweets downloaded so far
Getting tweets b

...400 tweets downloaded so far
Getting tweets before 1005080576968347647
...600 tweets downloaded so far
Getting tweets before 996138344374513670
...800 tweets downloaded so far
Getting tweets before 985161476251013121
...1000 tweets downloaded so far
Getting tweets before 967030974352588800
...1200 tweets downloaded so far
Getting tweets before 948975089676144641
...1400 tweets downloaded so far
Getting tweets before 943137317115629567
...1600 tweets downloaded so far
Getting tweets before 939880945028681729
...1800 tweets downloaded so far
Getting tweets before 938094403234738176
...2000 tweets downloaded so far
Getting tweets before 936367326894678015
...2200 tweets downloaded so far
Getting tweets before 935373483273609215
...2400 tweets downloaded so far
Getting tweets before 932442194132180991
...2600 tweets downloaded so far
Getting tweets before 930834987104063487
...2800 tweets downloaded so far
Getting tweets before 929477062154424320
...2998 tweets downloaded so far
Getting

...1400 tweets downloaded so far
Getting tweets before 887797322809171967
...1599 tweets downloaded so far
Getting tweets before 845730743607214080
...1799 tweets downloaded so far
Getting tweets before 813579935537205247
...1999 tweets downloaded so far
Getting tweets before 791324802598379519
...2195 tweets downloaded so far
Getting tweets before 776196747257274367
...2394 tweets downloaded so far
Getting tweets before 762691098040872960
...2594 tweets downloaded so far
Getting tweets before 733034761069854719
...2793 tweets downloaded so far
Getting tweets before 703446962477993983
...2989 tweets downloaded so far
Getting tweets before 664582577093652480
...3188 tweets downloaded so far
Getting tweets before 635145934762958847
...3200 tweets downloaded so far
Getting tweets before 630866388135981056
...3200 tweets downloaded so far
Getting tweets from @cryptomanran
Getting tweets before 1034445521262440448
...400 tweets downloaded so far
Getting tweets before 1026430870901678079
...

...2000 tweets downloaded so far
Getting tweets before 936221688450105343
...2200 tweets downloaded so far
Getting tweets before 918387334940262399
...2400 tweets downloaded so far
Getting tweets before 908962204996608000
...2600 tweets downloaded so far
Getting tweets before 886834257003335679
...2800 tweets downloaded so far
Getting tweets before 867566359089004543
...3000 tweets downloaded so far
Getting tweets before 844916789632888833
...3200 tweets downloaded so far
Getting tweets before 815579657202339839
...3204 tweets downloaded so far
Getting tweets before 815113502595563519
...3204 tweets downloaded so far
Getting tweets from @prestonjbyrne
Getting tweets before 1040953520193785855
...400 tweets downloaded so far
Getting tweets before 1039670136276496383
...600 tweets downloaded so far
Getting tweets before 1038508081871761408
...800 tweets downloaded so far
Getting tweets before 1037302306189008896
...1000 tweets downloaded so far
Getting tweets before 1035270607817134084
.

...2000 tweets downloaded so far
Getting tweets before 959166162356629504
...2200 tweets downloaded so far
Getting tweets before 949810112918687743
...2400 tweets downloaded so far
Getting tweets before 941286622003453951
...2599 tweets downloaded so far
Getting tweets before 933590573486817279
...2798 tweets downloaded so far
Getting tweets before 926651102862430207
...2996 tweets downloaded so far
Getting tweets before 919567276386471936
...3195 tweets downloaded so far
Getting tweets before 911394006696910848
...3234 tweets downloaded so far
Getting tweets before 909490641259962370
...3234 tweets downloaded so far
Getting tweets from @certainassets
Getting tweets before 955916091066339328
...395 tweets downloaded so far
Getting tweets before 922648041848541183
...592 tweets downloaded so far
Getting tweets before 895444229198983168
...787 tweets downloaded so far
Getting tweets before 855911982263914496
...983 tweets downloaded so far
Getting tweets before 819378723040489471
...1180
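
The log above follows from how `get_all_tweets` pages backwards through a timeline: each request asks for tweets whose id is at most `max_id`, and the loop stops at the first empty page. The sketch below reproduces that loop against an in-memory list of fake tweet ids instead of the Twitter API. `fetch_page` and `download_all` are hypothetical stand-ins, and the page size of 3 is chosen only to keep the example small.

```python
def fetch_page(timeline, count, max_id=None):
    """Stand-in for api.user_timeline: newest-first ids no greater than max_id."""
    eligible = [t for t in timeline if max_id is None or t <= max_id]
    return eligible[:count]


def download_all(timeline, count=3):
    """Collect every id by paging backwards with max_id."""
    collected = []
    page = fetch_page(timeline, count)
    while page:
        collected.extend(page)
        # Ask only for ids strictly older than the oldest one seen so far
        oldest = collected[-1] - 1
        page = fetch_page(timeline, count, max_id=oldest)
    return collected


# Fake timeline, newest first (ids are made up)
timeline = [100, 97, 95, 90, 88, 80, 71]
result = download_all(timeline)
print(len(result), result == timeline)
```

Subtracting one from the oldest id is what prevents the last tweet of one page from reappearing as the first tweet of the next. Writing the loop as `while page:` also covers an account with no tweets, where indexing `[-1]` on an empty list would raise an error.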

In [8]:
# The previous function generated a .csv file for every account; here we combine them into a unified dataset:
frames = []

for handle in handles:
    df = pd.read_csv("Data/Tweets/"+str(handle)+"_tweets.csv")
    df['Name'] = handle
    frames.append(df[["Name", "id", "created_at", "text"]])

# DataFrame.append was removed in pandas 2.0, so the pieces are concatenated in one step
mt = pd.concat(frames, ignore_index=True)

In [5]:
mt.to_csv("Data/twitter_ddbb.csv")

In [54]:
## TWITTER SENTIMENT ANALYSIS

from textblob import TextBlob
import re

def clean_tweet(tweet):
    # Raw string so that \w, \/ and \S reach the regex engine without invalid-escape warnings
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analize_sentiment(tweet):
    analysis = TextBlob(clean_tweet(tweet))
    return analysis.sentiment.polarity    
    
# We create a column with the result of the analysis:
mt['SA'] = np.array([ analize_sentiment(tweet) for tweet in mt['text']])

# We display the updated dataframe with the new column:
#display(mt.head(10))
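
To see what the cleaning step removes before TextBlob scores the text, the sketch below applies the same regular expression to two made-up tweets. It uses only the standard library; the sample strings are illustrative and do not come from the scraped data.

```python
import re


def clean_tweet(tweet):
    # Same pattern as clean_tweet in the cell above, written here as a raw string
    pattern = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"
    return ' '.join(re.sub(pattern, " ", tweet).split())


# Mention, punctuation, URL and the '#' sign are replaced by spaces
first = clean_tweet("RT @VitalikButerin: Ethereum is great! https://t.co/abc123 #ETH")
print(first)   # RT Ethereum is great ETH

# The mention pattern has no underscore, so part of the handle survives
second = clean_tweet("@crypto_fan thanks")
print(second)  # fan thanks
```

The second case shows a limitation: Twitter handles may contain underscores, so the tail of such a handle stays in the text that is scored. The pattern also drops every non-ASCII character, which affects tweets that are not in English.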

In [56]:
mt.to_csv("Data/twitter_ddbb_SA.csv")

Let's reload the data for further preprocessing, so that the scraping and NLP steps do not have to be rerun every time:

In [16]:
mt = pd.read_csv("Data/twitter_ddbb_SA.csv", index_col=0)

In [17]:
# Selecting only the tweets in the desired time frame (1 May 2017 to 1 May 2018):
# the dates were written as month/day/year, so the format is given explicitly
mt['created_at'] = pd.to_datetime(mt['created_at'], format='%m/%d/%Y')
mask = (mt['created_at'] >= '2017-05-01') & (mt['created_at'] <= '2018-05-01')
twind_mt = mt.loc[mask]

# Group by day and aggregate the data:
agg_mt = twind_mt.groupby('created_at').agg({'text':'count', 'Name': 'nunique', 'SA': 'mean'})
agg_mt = agg_mt.rename(columns={'text': 'Tweets (#)', 'Name': 'Active Influencers (#)', 'SA': 'Average SA'})
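
To make the aggregation concrete, the sketch below computes the same three daily statistics in plain Python on a few made-up rows. It only illustrates what `groupby('created_at').agg(...)` returns for each day and is not a replacement for the pandas call; the names and scores are invented.

```python
from collections import defaultdict

# Made-up rows: (created_at, Name, SA)
rows = [
    ("2017-05-01", "alice", 0.5),
    ("2017-05-01", "alice", 0.0),
    ("2017-05-01", "bob", 0.25),
    ("2017-05-02", "bob", -0.5),
]

# Bucket the rows by day
by_day = defaultdict(list)
for day, name, score in rows:
    by_day[day].append((name, score))

# Per day: number of tweets, number of distinct authors, mean sentiment
daily = {}
for day, items in sorted(by_day.items()):
    scores = [score for _, score in items]
    daily[day] = {
        "Tweets (#)": len(items),
        "Active Influencers (#)": len({name for name, _ in items}),
        "Average SA": sum(scores) / len(scores),
    }

for day, stats in daily.items():
    print(day, stats)
```

Grouping on `created_at` yields one row per day only because the timestamps were saved at day resolution (`'%m/%d/%Y'`) when the CSV files were written. Note also that the average weights every tweet equally, so a very active account influences a day's score more than a quiet one.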

In [67]:
agg_mt.to_csv("C:/Users/Eric/Python Projects/Bitcoin-NLP-Strategy/Data/twitter_agg_ddb.csv")