# TWITTER WEB SCRAPING

Here we scrape the Twitter accounts of the top 100 influencers in the cryptocurrency world. The first step is to scrape the list of the top 100 accounts; the second is to download all the tweets they have posted through the Twitter API.

#### Data Sources:

Cryptocurrencies influencer ranking: "https://cryptoweekly.co/100/"

The following code has been adapted from:
https://codeburst.io/a-twitter-analysis-of-the-100-most-influential-people-in-crypto-bb95b2608925


In [3]:
# First we load the general packages:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [4]:
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

def getTwitterHandles():
	# URL of the page to be scraped
	url = "https://cryptoweekly.co/100/"

	# Retrieve and parse the page HTML; the context manager closes the connection
	with urlopen(url) as client:
		pageSoup = soup(client.read(), "html.parser")

	# Each profile sits in a "testimonial-wrapper" div; the handle is the text of the
	# first link inside its "author" div, with the leading "@" stripped
	profiles = pageSoup.find_all("div", {"class": "testimonial-wrapper"})
	twitterHandles = []
	for person in profiles:
		author = person.find_all("div", {"class": "author"})
		twitterHandles.append(author[0].find_all("a")[0].text[1:])

	return twitterHandles


In [9]:
# Modified from: https://gist.github.com/yanofsky/5436496

import tweepy #https://github.com/tweepy/tweepy
import csv
import sys

# Twitter API credentials (expired, don't even try it)
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""


def get_all_tweets(screen_name):
	print("Getting tweets from @" + str(screen_name))

	# Twitter only allows access to roughly a user's 3,200 most recent tweets with this method

	# authorize twitter, initialize tweepy
	auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
	auth.set_access_token(access_key, access_secret)
	api = tweepy.API(auth)

	# initialize a list to hold all the tweepy Tweets
	alltweets = []

	# make initial request for most recent tweets (200 is the maximum allowed count)
	new_tweets = api.user_timeline(screen_name=screen_name, count=200)

	# save most recent tweets
	alltweets.extend(new_tweets)

	# keep grabbing tweets until there are no tweets left to grab
	# (the loop is skipped entirely when the account has no tweets)
	while len(new_tweets) > 0:
		# id of the oldest tweet downloaded so far, less one
		oldest = alltweets[-1].id - 1
		print("Getting tweets before %s" % oldest)

		# all subsequent requests use the max_id param to prevent duplicates
		new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

		# save most recent tweets
		alltweets.extend(new_tweets)

		print("...%s tweets downloaded so far" % len(alltweets))

	# transform the tweepy tweets into a 2D array that will populate the csv
	# (plain str values: encoding to bytes would write b'...' literals under Python 3)
	outtweets = [[tweet.id_str, tweet.created_at.strftime('%m/%d/%Y'), tweet.text] for tweet in alltweets]

	# write the csv; newline='' avoids blank rows on Windows, utf-8 keeps non-ASCII text intact
	with open('C:/Users/Eric/Python Projects/Bitcoin-NLP-Strategy/Data/Tweets/%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8') as f:
		writer = csv.writer(f)
		writer.writerow(["id","created_at","text"])
		writer.writerows(outtweets)


if __name__ == '__main__':
	handles = getTwitterHandles()
	for i in range(len(handles)):
		if handles[i] == 'petertoddbtc':
			handles[i] = 'peterktodd'
	for handle in handles:
		get_all_tweets(str(handle))
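
The download loop above pages backwards through a timeline by passing `max_id`. As a minimal offline sketch of that idea, the snippet below replaces the Twitter API with a hypothetical `fetch_page` helper that serves integer ids from an in-memory list. The helper, the fake timeline and the name `collect_all` are all assumptions made for illustration; none of them is part of tweepy.

```python
def fetch_page(timeline, count, max_id=None):
    """Hypothetical stand-in for a timeline request.

    `timeline` is a list of tweet ids sorted newest first. Returns up to
    `count` ids that are less than or equal to `max_id`.
    """
    eligible = [t for t in timeline if max_id is None or t <= max_id]
    return eligible[:count]


def collect_all(timeline, count=200):
    """Page backwards through the timeline until an empty page comes back."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(timeline, count, max_id)
        if not page:
            break
        collected.extend(page)
        # Ask only for ids strictly older than the oldest one seen so far,
        # so the next page never repeats a tweet.
        max_id = page[-1] - 1
    return collected


# Fake timeline of 450 tweet ids, newest first (illustrative data only).
fake_timeline = list(range(1450, 1000, -1))
tweets = collect_all(fake_timeline)
print(len(tweets), "tweets collected,", len(set(tweets)), "unique")
```

Subtracting one from the oldest id is what prevents the last tweet of one page from reappearing as the first tweet of the next.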

Getting tweets from @VitalikButerin
Getting tweets before 1030153145299566592
...400 tweets downloaded so far
Getting tweets before 1027019525898682367
...600 tweets downloaded so far
Getting tweets before 1017443618473369599
...800 tweets downloaded so far
Getting tweets before 1011573593593950207
...1000 tweets downloaded so far
Getting tweets before 1002007292160569343
...1200 tweets downloaded so far
Getting tweets before 993679744393732096
...1399 tweets downloaded so far
Getting tweets before 987397430629945344
...1599 tweets downloaded so far
Getting tweets before 981095728008257535
...1799 tweets downloaded so far
Getting tweets before 967393860635697152
...1999 tweets downloaded so far
Getting tweets before 955487619139416063
...2198 tweets downloaded so far
Getting tweets before 945322137249648640
...2398 tweets downloaded so far
Getting tweets before 936784631583477759
...2598 tweets downloaded so far
Getting tweets before 932512383263633407
...2798 tweets downloaded so far


...1993 tweets downloaded so far
Getting tweets before 796466213010472959
...2193 tweets downloaded so far
Getting tweets before 786870903174930431
...2393 tweets downloaded so far
Getting tweets before 772785360044711935
...2593 tweets downloaded so far
Getting tweets before 764193170431369215
...2793 tweets downloaded so far
Getting tweets before 755207484000890880
...2993 tweets downloaded so far
Getting tweets before 749913271705145343
...3193 tweets downloaded so far
Getting tweets before 746719231010545664
...3194 tweets downloaded so far
Getting tweets before 746718897399799807
...3194 tweets downloaded so far
Getting tweets from @gavinandresen
Getting tweets before 940643034584186888
...398 tweets downloaded so far
Getting tweets before 866019623266897921
...598 tweets downloaded so far
Getting tweets before 831890752653422593
...797 tweets downloaded so far
Getting tweets before 745687988609388544
...996 tweets downloaded so far
Getting tweets before 661943242158694399
...1196

...3214 tweets downloaded so far
Getting tweets from @TuurDemeester
Getting tweets before 1038454680656011263
...400 tweets downloaded so far
Getting tweets before 1035150543314931713
...600 tweets downloaded so far
Getting tweets before 1031933789012877312
...800 tweets downloaded so far
Getting tweets before 1028849513891606528
...1000 tweets downloaded so far
Getting tweets before 1025049635247800322
...1200 tweets downloaded so far
Getting tweets before 1019630069470449663
...1400 tweets downloaded so far
Getting tweets before 1016119760344449029
...1600 tweets downloaded so far
Getting tweets before 1012065723403919359
...1800 tweets downloaded so far
Getting tweets before 1008781176415379455
...2000 tweets downloaded so far
Getting tweets before 1004785571213008895
...2200 tweets downloaded so far
Getting tweets before 1000456466229288959
...2400 tweets downloaded so far
Getting tweets before 995489348178075647
...2600 tweets downloaded so far
Getting tweets before 99153100171637

...1399 tweets downloaded so far
Getting tweets before 987130902844190719
...1599 tweets downloaded so far
Getting tweets before 976201781255528449
...1799 tweets downloaded so far
Getting tweets before 969241875919114239
...1999 tweets downloaded so far
Getting tweets before 963605295598620671
...2199 tweets downloaded so far
Getting tweets before 956248702494134274
...2399 tweets downloaded so far
Getting tweets before 951076144379514879
...2599 tweets downloaded so far
Getting tweets before 946744280479084543
...2799 tweets downloaded so far
Getting tweets before 943917863399804933
...2999 tweets downloaded so far
Getting tweets before 938542975793188864
...3199 tweets downloaded so far
Getting tweets before 932709755080069119
...3220 tweets downloaded so far
Getting tweets before 932366211802976255
...3220 tweets downloaded so far
Getting tweets from @TimDraper
Getting tweets before 890263674480336896
...400 tweets downloaded so far
Getting tweets before 749033103893618687
...600 t

...1594 tweets downloaded so far
Getting tweets before 870698694462525441
...1793 tweets downloaded so far
Getting tweets before 858428203060383744
...1993 tweets downloaded so far
Getting tweets before 842052728595615743
...2192 tweets downloaded so far
Getting tweets before 827343913983832063
...2392 tweets downloaded so far
Getting tweets before 817992934666698751
...2592 tweets downloaded so far
Getting tweets before 806002283943182335
...2791 tweets downloaded so far
Getting tweets before 790811174455615487
...2991 tweets downloaded so far
Getting tweets before 779932657710923775
...3188 tweets downloaded so far
Getting tweets before 766948384825569279
...3232 tweets downloaded so far
Getting tweets before 764844378468777984
...3232 tweets downloaded so far
Getting tweets from @StephanTual
Getting tweets before 973542461925568511
...400 tweets downloaded so far
Getting tweets before 953622103152001024
...600 tweets downloaded so far
Getting tweets before 922443153579433983
...800 

...1599 tweets downloaded so far
Getting tweets before 1032270278427111424
...1799 tweets downloaded so far
Getting tweets before 1030549491726868479
...1999 tweets downloaded so far
Getting tweets before 1029693981578461183
...2193 tweets downloaded so far
Getting tweets before 1028728724181667839
...2393 tweets downloaded so far
Getting tweets before 1026957111517962244
...2593 tweets downloaded so far
Getting tweets before 1025412283646849023
...2792 tweets downloaded so far
Getting tweets before 1024283299932524543
...2992 tweets downloaded so far
Getting tweets before 1022111417523208191
...3192 tweets downloaded so far
Getting tweets before 1020531332034842623
...3236 tweets downloaded so far
Getting tweets before 1020383439755784191
...3236 tweets downloaded so far
Getting tweets from @michaelkitces
Getting tweets before 1039182257779613696
...400 tweets downloaded so far
Getting tweets before 1035940925023903744
...599 tweets downloaded so far
Getting tweets before 103410323964

...999 tweets downloaded so far
Getting tweets before 888116563605245951
...1199 tweets downloaded so far
Getting tweets before 863453709514264575
...1399 tweets downloaded so far
Getting tweets before 840002238537916415
...1535 tweets downloaded so far
Getting tweets before 834131064268349439
...1535 tweets downloaded so far
Getting tweets from @novogratz
Getting tweets before 1001646267078991872
...400 tweets downloaded so far
Getting tweets before 933465914397347839
...596 tweets downloaded so far
Getting tweets before 1166191035
...596 tweets downloaded so far
Getting tweets from @dahongfei
Getting tweets before 441837033514729471
...81 tweets downloaded so far
Getting tweets from @woonomic
Getting tweets before 1022432923969052671
...400 tweets downloaded so far
Getting tweets before 1000223540111663103
...599 tweets downloaded so far
Getting tweets before 952171242131357695
...799 tweets downloaded so far
Getting tweets before 933575600307965951
...999 tweets downloaded so far
Ge

...999 tweets downloaded so far
Getting tweets before 974418435378696191
...1199 tweets downloaded so far
Getting tweets before 968882424355549183
...1399 tweets downloaded so far
Getting tweets before 963504484159770624
...1599 tweets downloaded so far
Getting tweets before 961306667739738112
...1799 tweets downloaded so far
Getting tweets before 958071849635794945
...1999 tweets downloaded so far
Getting tweets before 956297129391206399
...2199 tweets downloaded so far
Getting tweets before 954923981542486016
...2399 tweets downloaded so far
Getting tweets before 953677716754190337
...2599 tweets downloaded so far
Getting tweets before 951895463619416064
...2799 tweets downloaded so far
Getting tweets before 950773177604505599
...2999 tweets downloaded so far
Getting tweets before 949030532909846533
...3199 tweets downloaded so far
Getting tweets before 946110472109084672
...3235 tweets downloaded so far
Getting tweets before 945749941078708224
...3235 tweets downloaded so far
Gettin

...2980 tweets downloaded so far
Getting tweets before 674226084074909695
...3180 tweets downloaded so far
Getting tweets before 661094848217354239
...3209 tweets downloaded so far
Getting tweets before 659424184624398336
...3209 tweets downloaded so far
Getting tweets from @SunnyStartups
Getting tweets before 1034895616550133759
...1 tweets downloaded so far
Getting tweets from @anondran
Getting tweets before 1036023382196805631
...400 tweets downloaded so far
Getting tweets before 1029744264962551808
...599 tweets downloaded so far
Getting tweets before 1022637936599334911
...798 tweets downloaded so far
Getting tweets before 1000851043494772736
...998 tweets downloaded so far
Getting tweets before 993154916620288000
...1197 tweets downloaded so far
Getting tweets before 985653862509023231
...1397 tweets downloaded so far
Getting tweets before 981259148024217599
...1597 tweets downloaded so far
Getting tweets before 978356818631151615
...1795 tweets downloaded so far
Getting tweets b

...400 tweets downloaded so far
Getting tweets before 1005065086665482239
...600 tweets downloaded so far
Getting tweets before 996073427134894079
...800 tweets downloaded so far
Getting tweets before 985155882974613503
...1000 tweets downloaded so far
Getting tweets before 967029998719782911
...1200 tweets downloaded so far
Getting tweets before 948972243845033988
...1400 tweets downloaded so far
Getting tweets before 943099134822711300
...1600 tweets downloaded so far
Getting tweets before 939879617393053696
...1800 tweets downloaded so far
Getting tweets before 938092276617801729
...2000 tweets downloaded so far
Getting tweets before 936364309764083711
...2200 tweets downloaded so far
Getting tweets before 935373068511465471
...2400 tweets downloaded so far
Getting tweets before 932441455448113151
...2600 tweets downloaded so far
Getting tweets before 930834728906842113
...2800 tweets downloaded so far
Getting tweets before 929476181933613055
...2998 tweets downloaded so far
Getting

...1400 tweets downloaded so far
Getting tweets before 887797322809171967
...1599 tweets downloaded so far
Getting tweets before 845730743607214080
...1799 tweets downloaded so far
Getting tweets before 813579935537205247
...1999 tweets downloaded so far
Getting tweets before 791324802598379519
...2195 tweets downloaded so far
Getting tweets before 776196747257274367
...2394 tweets downloaded so far
Getting tweets before 762691098040872960
...2593 tweets downloaded so far
Getting tweets before 733034761069854719
...2792 tweets downloaded so far
Getting tweets before 703446962477993983
...2988 tweets downloaded so far
Getting tweets before 664582577093652480
...3186 tweets downloaded so far
Getting tweets before 635145934762958847
...3198 tweets downloaded so far
Getting tweets before 630866388135981056
...3198 tweets downloaded so far
Getting tweets from @cryptomanran
Getting tweets before 1034174544351780863
...400 tweets downloaded so far
Getting tweets before 1026414673279819776
...

...1800 tweets downloaded so far
Getting tweets before 951366991637495807
...2000 tweets downloaded so far
Getting tweets before 936213046560669695
...2200 tweets downloaded so far
Getting tweets before 918379661838577664
...2400 tweets downloaded so far
Getting tweets before 908957769922838527
...2600 tweets downloaded so far
Getting tweets before 886832984476631040
...2800 tweets downloaded so far
Getting tweets before 867561563506434047
...3000 tweets downloaded so far
Getting tweets before 844648924702949375
...3200 tweets downloaded so far
Getting tweets before 815579080758202368
...3203 tweets downloaded so far
Getting tweets before 815113502595563519
...3203 tweets downloaded so far
Getting tweets from @prestonjbyrne
Getting tweets before 1040804898001043455
...400 tweets downloaded so far
Getting tweets before 1039609233627656191
...600 tweets downloaded so far
Getting tweets before 1038472759100284929
...800 tweets downloaded so far
Getting tweets before 1037067346119155711
..

TweepError: Failed to send request: ('Connection aborted.', OSError("(10054, 'WSAECONNRESET')",))
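
The run above stopped when the connection was reset partway through the list of handles. One possible way to make a long download more tolerant of such failures is to retry a failed request after a growing delay. The sketch below is an assumption, not part of the original notebook: `call_with_retries` is a hypothetical helper, and it catches the built-in `ConnectionError` as a stand-in for the library error shown in the traceback (`TweepError`).

```python
import time


def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on ConnectionError wait and retry, doubling the delay.

    `sleep` is injectable so the behaviour can be checked without waiting.
    The last failure is re-raised once `retries` extra attempts are used up.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


# Simulated request that fails twice with a connection reset, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionResetError("simulated reset")
    return "ok"


delays = []  # records the requested delays instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)
```

In the notebook, each `api.user_timeline(...)` call could be wrapped this way, for example with a `lambda`, after swapping the caught exception for the one the installed tweepy version raises.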

In [10]:
# Create a unified dataset:
frames = []

for handle in handles:
    df = pd.read_csv("C:/Users/Eric/Python Projects/Bitcoin-NLP-Strategy/Data/Tweets/"+str(handle)+"_tweets.csv")
    df['Name'] = handle
    frames.append(df[["Name", "id", "created_at", "text"]])

# DataFrame.append was removed in pandas 2.0; concatenate all frames in one call instead
mt = pd.concat(frames)
mt

FileNotFoundError: File b'C:/Users/Eric/Python Projects/Bitcoin-NLP-Strategy/Data/Tweets/HeyTaiZen_tweets.csv' does not exist
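
Because the download was interrupted, some handles have no CSV file, which is what the error above reports. A small sketch of one way to separate the handles whose files exist from those that are missing before merging is shown below. The helper name `split_by_availability` and the temporary directory are illustrative assumptions; only the `<handle>_tweets.csv` naming comes from the code above.

```python
import tempfile
from pathlib import Path


def split_by_availability(handles, data_dir):
    """Return (found, missing) lists of handles.

    A handle counts as found when <data_dir>/<handle>_tweets.csv exists.
    """
    found, missing = [], []
    for handle in handles:
        path = Path(data_dir) / f"{handle}_tweets.csv"
        (found if path.is_file() else missing).append(handle)
    return found, missing


# Demonstration in a temporary directory that holds a single tweets file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "VitalikButerin_tweets.csv").write_text("id,created_at,text\n")
    found, missing = split_by_availability(["VitalikButerin", "HeyTaiZen"], tmp)

print("found:", found)
print("missing:", missing)
```

Looping over `found` instead of `handles` in the merge cell would skip the accounts that were never downloaded, and `missing` gives the list to fetch again.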

In [14]:
mt.to_csv("C:/Users/Eric/Python Projects/Bitcoin-NLP-Strategy/Data/twitter_ddbb.csv")