# Introduction

Whether you are starting a business or growing an existing one, scraping the web is one of the most important ways to collect data about your competition and products. Whether you want to build your brand or understand how your latest ad campaign is faring, one of the quickest ways to get this data is by scraping the web.

In today's workshop, we are going to learn how to scrape Twitter and use the data to build a simple sentiment classifier.

### Why Twitter?

Twitter has been one of the most powerful platforms for effective lead generation, viral advertising, and quality social network building. There are about 326 million active users on Twitter and it supports 40 languages, which makes it one of the easiest ways to reach a wide range of target groups. Between the user engagement, the tweets from industry pioneers and thought leaders, and the activity of your close competitors, there are a ton of insights you can gather.

Twitter data is real-time, and this information holds a lot of value for your business. If you are a digital marketer, think of how Twitter scraping can help you: all the influencers you can connect to, the competitors you can constantly monitor, the sentiment analysis you can perform, and the customer behavior study you can do.

### Imports


In [3]:
#!pip install tweepy #Uncomment if tweepy not installed

In [5]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = None
pd.options.display.max_rows = None

pd.options.display.max_colwidth = None  # show full column contents (-1 is deprecated in recent pandas)

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from IPython import get_ipython
ipython = get_ipython()

# autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

# Visualizations
# import chart_studio.plotly as py  # plotly.plotly moved to the chart_studio package in plotly 4; not needed below
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)


import tweepy
#import cufflinks as cf
#cf.go_offline(connected=True)
#cf.set_config_file(theme='white')

# Analysis/Modeling


The Twitter API requires that all requests be authenticated with OAuth, so you need to create authentication credentials before you can use the API. These credentials are four text strings:

1. Consumer key
2. Consumer secret
3. Access token
4. Access secret

To get OAuth credentials, register on the Twitter Developer site (https://developer.twitter.com/en/apps) and create an app.

Once you create an app, you can get the keys and tokens.

**THE KEYS AND TOKENS ARE NOT TO BE SHARED WITH ANYONE. SAVE THEM AS A JSON FILE.**




In [9]:
import json

def storeCredentials(consumer_key, consumer_secret, access_key, access_secret):
    # Pass in your own consumer_key, consumer_secret, access_key and access_secret
    twitter_cred = {
        'CONSUMER_KEY': consumer_key,
        'CONSUMER_SECRET': consumer_secret,
        'ACCESS_KEY': access_key,
        'ACCESS_SECRET': access_secret,
    }

    # Save the information to a JSON file so that it can be reused in code
    # without exposing the secret info to the public
    with open('twitter_credentials.json', 'w') as secret_info:
        json.dump(twitter_cred, secret_info, indent=4, sort_keys=True)
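
Since the JSON file holds the secrets in plain text, it helps to limit who can read it. The following is a minimal sketch, not part of the original workshop: `write_credentials_private` is a hypothetical helper that creates the file with owner-only permissions. It assumes a POSIX system; on Windows the permission bits are largely ignored.

```python
import json
import os


def write_credentials_private(path, credentials):
    """Write credentials as JSON to a file that only the owner can read or write."""
    # os.open lets us set the permission bits (0o600) at the moment the file is created.
    # This is a POSIX-oriented sketch; Windows mostly ignores the mode argument.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as secret_info:
        json.dump(credentials, secret_info, indent=4, sort_keys=True)
```

It is also worth keeping `twitter_credentials.json` out of version control, for example by listing it in `.gitignore`.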

In [15]:
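import json  # needed here: json.load is used below

def loadCredentials(credentials_file):
    # Read the saved credentials back into a dictionary
    with open(credentials_file) as f:
        data = json.load(f)
    return data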

In [16]:
twitter_cred=loadCredentials("twitter_credentials.json")
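
Before authenticating, it can save debugging time to check that the loaded dictionary actually contains all four strings. This is a small sketch under the assumption that the file uses the key names written by storeCredentials; `missing_credential_keys` is a hypothetical helper, not a Tweepy function.

```python
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def missing_credential_keys(credentials):
    """Return the required keys that are absent or empty in the credentials dict."""
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]


# Made-up example: one key is empty and one is missing entirely
example = {"CONSUMER_KEY": "abc", "CONSUMER_SECRET": "def", "ACCESS_KEY": ""}
print(missing_credential_keys(example))  # ['ACCESS_KEY', 'ACCESS_SECRET']
```

An empty list means the file has everything the authentication step needs.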

### Let us authenticate the credentials

1. The OAuthHandler class is used to set the credentials that are sent with every API call.

2. The API class has many methods that provide access to Twitter API endpoints. Using these methods, you can access the Twitter API's functionality.

To create an API object, call tweepy.API(). Setting wait_on_rate_limit and wait_on_rate_limit_notify to True makes the API object print a message and wait if the rate limit is exceeded.
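
To make the idea of "waiting on the rate limit" concrete, here is a rough sketch of the underlying logic. It is not Tweepy's actual implementation. It assumes the `x-rate-limit-remaining` and `x-rate-limit-reset` response headers that the Twitter REST API sends, and `seconds_to_wait` is a hypothetical helper name.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to sleep before the next request, based on rate-limit headers."""
    now = time.time() if now is None else now
    # Number of calls still allowed in the current window
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    # Epoch time (in seconds) at which the window resets
    reset_at = int(headers.get("x-rate-limit-reset", 0))
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - now)


# Made-up header values: no calls left, window resets 60 seconds from "now"
print(seconds_to_wait({"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1000"}, now=940))
```

With wait_on_rate_limit=True you do not need to write this yourself; the sketch only illustrates what the option spares you from doing by hand.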

In [17]:
def authenticateCredentials(twitter_credentials_filename):
    twitter_cred=loadCredentials(twitter_credentials_filename)
    auth = tweepy.OAuthHandler(twitter_cred['CONSUMER_KEY'], twitter_cred['CONSUMER_SECRET'])
    auth.set_access_token(twitter_cred['ACCESS_KEY'], twitter_cred['ACCESS_SECRET'])
    api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
    return api

In [18]:
api=authenticateCredentials("twitter_credentials.json")

In [20]:
def verify_credentials(api):
    try:
        api.verify_credentials()
        return "Authenticated"
    except Exception:
        # Any failure here (bad tokens, network error) is treated as not authenticated
        return "Not Authenticated"

In [21]:
verify_credentials(api)

'Authenticated'

In [22]:
api.verify_credentials()

User(_api=<tweepy.api.API object at 0x1137c1470>, _json={'id': 3300364314, 'id_str': '3300364314', 'name': 'Aiswarya', 'screen_name': 'aiswaryaram88', 'location': '', 'description': '', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 4, 'friends_count': 1, 'listed_count': 0, 'created_at': 'Wed Jul 29 10:38:52 +0000 2015', 'favourites_count': 1, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 1, 'lang': None, 'status': {'created_at': 'Sat Aug 18 19:34:35 +0000 2018', 'id': 1030900787986067456, 'id_str': '1030900787986067456', 'text': 'RT @AnalyticsVidhya: “Machine Learning to Predict Taxi Fare — Part One : Exploratory Analysis” by Aiswarya Ramachandran https://t.co/GZWMZg…', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'AnalyticsVidhya', 'name': 'Analytics Vidhya', 'id': 2311645130, 'id_str': '2311645130', 'indices': [3, 19]}], 'urls': []}

If the OAuth tokens are valid, verify_credentials() returns your profile information, as shown above.

### Reading User Timeline using Tweepy

#### Reading your own Timeline

home_timeline() returns the 20 most recent tweets in the authenticating user's home timeline.

In [23]:
timeline = api.home_timeline()
for tweet in timeline:
    print(f"{tweet.user.name} said {tweet.text}")

Reuters Top News said Lower provisions help India's ICICI bank swing to first quarter profit https://t.co/ZjqQSg27Mt https://t.co/SvTlrchPyH
Reuters Top News said ICYMI: Samsung's Galaxy Fold will go on sale from September in selected markets after its launch was delayed by scr… https://t.co/qDqqUD5fAw
Reuters Top News said England call up Archer for Ashes opener, Stokes reinstated https://t.co/biiuBRdzJD https://t.co/l7teHLzwxU
Reuters Top News said Hungary PM Orban flags more economic stimulus plans for 2020 https://t.co/toSE6d5RPy https://t.co/sqlbXO8mbu
Reuters Top News said Novartis CEO pledges not to sell Sandoz generics unit https://t.co/yEuKx4h8fh https://t.co/qGRNbtrLZD
Reuters Top News said The plight of a mother and son who had traveled some 1,500 miles from Guatemala to the border city of Ciudad Juarez… https://t.co/tWiJDN53Wp
Reuters Top News said Irish PM says hard Brexit would raise issue of Irish unification https://t.co/wnLSrs4EyE https://t.co/sQXBggPUzZ
Reuters Top Ne

#### Reading other public user timeline

Let's see what Sachin Tendulkar has been tweeting.

In [27]:
user_timeline = api.user_timeline("@sachin_rt")  # Get Sachin Tendulkar's recent tweets
for tweet in user_timeline:
    print(f"{tweet.user.name} said {tweet.text}")

Sachin Tendulkar said Come on Mumbai! The @IDBIFed Marathon Season is back and it starts with Mumbai.
Register for the @MumbaiHM and stan… https://t.co/2UW5BCf9sX
Sachin Tendulkar said Congratulations on a wonderful One Day career, #Malinga.
Wishing you all the very best for the future. https://t.co/RLeKIudyWl
Sachin Tendulkar said On #KargilVijayDiwas, I salute the valour, honour &amp; selfless service of our brave hearts who sacrificed their lives… https://t.co/U14GqMntt1
Sachin Tendulkar said It was, as always, such a joy to meet @MarkKnopfler for breakfast and chat about music, sports and life! A great mu… https://t.co/WIpvCgRoWm
Sachin Tendulkar said A friend shared this video with me.
Found it very unusual!
What would your decision be if you were the umpire? 🤔 https://t.co/tJCtykEDL9
Sachin Tendulkar said I congratulate Team @isro on achieving yet another milestone with the launch of #Chandrayaan2!

Hope this paves the… https://t.co/NpANPxH1TX
Sachin Tendulkar said Happy birthday
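
The printed text above still contains HTML entities such as `&amp;`. For later analysis it is usually handier to keep the tweets as rows of plain data. A minimal sketch follows, assuming each tweet exposes `user.name` and `text` as in the loops above. `tweets_to_rows` is a hypothetical helper, and the `SimpleNamespace` objects are made-up stand-ins for real Tweepy results so that the example runs without network access.

```python
import html
from types import SimpleNamespace


def tweets_to_rows(tweets):
    """Turn tweet objects into plain dictionaries with HTML entities decoded."""
    rows = []
    for tweet in tweets:
        rows.append({
            "user": tweet.user.name,
            # html.unescape turns "&amp;" back into "&", "&gt;" into ">", and so on
            "text": html.unescape(tweet.text),
        })
    return rows


# Stand-in for an API result (illustrative data, not a real tweet)
sample = [SimpleNamespace(user=SimpleNamespace(name="Example User"),
                          text="valour, honour &amp; service")]
print(tweets_to_rows(sample))
# [{'user': 'Example User', 'text': 'valour, honour & service'}]
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)` if you want to work with it in pandas.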

### Write a tweet

In [28]:
api.update_status("Test tweet from Tweepy Python")

Status(_api=<tweepy.api.API object at 0x1137c1470>, _json={'created_at': 'Sat Jul 27 13:15:46 +0000 2019', 'id': 1155104494654377988, 'id_str': '1155104494654377988', 'text': 'Test tweet from Tweepy Python', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="https://example1.com" rel="nofollow">Twitter_ISME</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 3300364314, 'id_str': '3300364314', 'name': 'Aiswarya', 'screen_name': 'aiswaryaram88', 'location': '', 'description': '', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 4, 'friends_count': 1, 'listed_count': 0, 'created_at': 'Wed Jul 29 10:38:52 +0000 2015', 'favourites_count': 1, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 2, 'lang': No

### Get a User's Information

In [31]:
user = api.get_user("@sachin_rt")

print("User details:")
print(user.name)
print(user.description)
print(user.location)
print(user.followers_count) #Get number of followers

User details:
Sachin Tendulkar
Proud Indian
ÜT: 18.986431,72.823769
29896937


### Search for a particular hashtag or query

In [32]:
for tweet in api.search(q="Python", lang="en", count=10):  # count is the number of tweets to return per page (max 100)
    print(f"{tweet.user.name}:{tweet.text}")

Gamer Geek:RT @PythonMori: Software Architecture with Python

☞ https://t.co/8suXHyweAw

#Python

S1RjgO2P3V https://t.co/a8pjlZ61Ph
A mind full of Curiosity:RT @JonTrevithick: Kamikaze Scotsman dummy carried by BBC crew member.
Monty Python's Flying Circus at Norwich Castle, November 1971. https…
Kalle Hallden:RT @Mybridge: Python Top 10 Articles for the Past Month (v.July 2019)

@ThePSF 
https://t.co/qoYeWWG1Td https://t.co/H1rtAvBMmM
Python:Software Architecture with Python

☞ https://t.co/8suXHyweAw

#Python

S1RjgO2P3V https://t.co/a8pjlZ61Ph
JAVASCRIPT - BOT:RT @FreebiesGlobal: $0 (Was $199) Udemy Course
👉 #EthicalHacking with #Python

#freebiesglobal #webdev #coder #Developer #coding #tech #mys…
JAVASCRIPT - BOT:RT @JKaylight: #Day16 
#100DaysOfCode

I published my first #reactjs article on the first steps of #React. Putting people through the hassl…
Martina:RT @41Strange: A Piebald Ball Python With a Jack O’ Lantern Pumpkin Pattern https://t.co/at2fFZI29c
Python Ireland:Check o
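
The results above contain retweets (texts starting with "RT @") and repeated texts, which would skew a sentiment classifier. The sketch below shows one simple way to filter them out. It works on plain strings, and `unique_original_texts` is a hypothetical helper. Checking the "RT @" prefix is a heuristic, not an official retweet flag.

```python
def unique_original_texts(texts):
    """Drop retweets (by their 'RT @' prefix) and exact duplicates, keeping order."""
    seen = set()
    kept = []
    for text in texts:
        if text.startswith("RT @"):
            continue  # skip retweets
        if text in seen:
            continue  # skip texts we have already kept
        seen.add(text)
        kept.append(text)
    return kept


# Made-up example input
print(unique_original_texts(["RT @a: hi", "hello", "hello", "world"]))
# ['hello', 'world']
```

With real search results you would pass in something like `[tweet.text for tweet in results]`.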

The search call above uses the REST API, which only returns a snapshot of recent tweets each time you call it. If you want to collect data from Twitter continuously, you need to use the streaming API, which keeps collecting new tweets that match your query as they are posted.

### Use Streaming API for collecting tweets

In [33]:
import json
import tweepy

class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        # Let the base class store the API object
        super().__init__(api)
        self.me = api.me()

    def on_status(self, tweet):
        # Called for every new tweet that matches the filter
        print(f"{tweet.user.name}:{tweet.text}")

    def on_error(self, status):
        # status is the HTTP status code returned by Twitter
        print(f"Error detected: {status}")

tweets_listener = MyStreamListener(api)
stream = tweepy.Stream(api.auth, tweets_listener)
stream.filter(track=["Python", "Tweepy"], languages=["en"])

Collins Njau:RT @ProgrammerBooks: Decentralized Applications ==&gt; https://t.co/V4RhkPCnwu

#python #javascript #angular #reactjs #vuejs #perl #ruby #Csha…
ProCode:Let me show you how Python can fool you - can you guess the output?

i= 4 
for i in range(10 ,20):
    print(i)
ﻡ,:RT @waqas_x: 6 year old me had a bunch of parakeets. One fine 14th August in a fit of emotions after reading Allama Iqbal's poem about a ca…
Erron Bennett:RT @androckb: #OTD: July 26, 1977 - Congo section of @BuschGardens #Tampa officially opens to guests. 

Area included new Claw Island, Afri…


KeyboardInterrupt: 
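
The stream above runs until it is interrupted by hand. One way to stop it automatically is to have the listener count tweets and signal when it has enough. The sketch below shows only that logic in a plain class so that it runs without Tweepy. It assumes the Tweepy 3.x convention that returning False from `on_status` or `on_error` disconnects the stream. `TweetCollector` is a hypothetical name, and in practice these methods would go into a `tweepy.StreamListener` subclass.

```python
from types import SimpleNamespace


class TweetCollector:
    """Collects tweet texts and signals the stream to stop after `limit` tweets."""

    def __init__(self, limit=100):
        self.limit = limit
        self.texts = []

    def on_status(self, tweet):
        self.texts.append(tweet.text)
        # Returning False is the signal to disconnect (assumed Tweepy 3.x behaviour)
        return len(self.texts) < self.limit

    def on_error(self, status):
        # 420 means the client is being rate limited; stop rather than keep retrying
        if status == 420:
            return False
        return True


# Simulate three incoming tweets with stand-in objects (illustrative data)
collector = TweetCollector(limit=2)
for text in ["first", "second", "third"]:
    if collector.on_status(SimpleNamespace(text=text)) is False:
        break
print(collector.texts)  # ['first', 'second']
```

Keeping the collected texts on the listener also makes them easy to hand to the next step of the analysis once the stream has stopped.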

### Get Trending Topics on Twitter

In [35]:
trends_result = api.trends_place(1) # 1 is the Global Yahoo WOEID
for trend in trends_result[0]["trends"]:
    print(trend["name"])

#IdolPHFinalShowdown
Jasminbette HızlıÇekim
#nhkらじらー
#世界最強動物ランキング
#GermanGP
#Ensobette15TLBedava
土用の丑の日
隅田川花火大会
Ali Koç
花火の音
Go Lucas
不二先輩
Nicolas Pepe
ヤンキーボーイ・ヤンキーガール
半グレ
ベッテル
マーティン
#PengenNikah
#KenyavsNigeria
#اجازه_المعلمين_طويله
#すべらない話
#KazdağlarıHepimizin
#suriyelileriistemiyoruz
#頑張ってiKON
#嵐にしやがれ
#ドッキリGP
#PSGInter
#BiroJomblo
#ماذا_تعني_لك_امك
#FelizSabado
#GetMeNakedIn5Words
#JJDJ
#世界一受けたい授業
#EXplOrationinSeoulDay5
#ماذا_ينقصك_في_هاللحظه
#SaturdayMorning
#OurIcePrincessJESSICA
#NRLStormManly
#SowetoDerby
#كيف_جوكم
#HappyBirthdayDhanush
#beinligue2
#MekanıSahibiSagopaKajmer
#EuApoioManu
#テニチャ
#あなたへ苦情殺到のお知らせ
#SIA
#놀이동산에_간다면
#IkawAngMelodyMNL48
#KIMJAEHWANinKL


In [36]:

# You can look up WOEIDs here: https://codebeautify.org/jsonviewer/f83352
trends_result = api.trends_place(2282863) # 2282863 is  Yahoo WOEID for INDIA
for trend in trends_result[0]["trends"]:
    print(trend["name"])

#HappyBirthdayDhanush
#StopLynchings
#Hero
#IPledgekNOwHep
#BoysMovieCastingCall
bali
Nicolas Pepe
Ratul Puri
Auba
Jofra Archer
Kriti Sanon
Rs 1,908
GST Council
Article 35A
Thanks Meet
Lille
Chota Bheem
Tierney
#MahalaxmiExpress
#REAEastZone
#HBDBelovedDulquer
#GermanGP
#BharatRatna
#JackpotAudioLaunch
#AndThatsWhyIHateFacebook
#JackDanielFirstLook
#MahaCMMeetsSM
#ilayasuperstarDhanush
#Badlapur
#WishIHadSpaceFor
#SpotifyIndia
#Pattasu
#NovaDairyQuiz
#sunteja
#MeriKritiKaBirthday
#SingappenneyFastest700KLikes
#SenaParSiyasatKyon
#FamilyHolidayWithClubM
#SaveYUDKBH
#ddvtp
#CRPFFoundationday
#MumbaiFloods
#HappyBirthdayAai
#Caturday
#Apache
#KnowBeforeYouGo
#AhmedabadRain
#soori
#teamsoul_pmco
#BJPStayAwayFromJournalism


# Conclusions and Next Steps
We can use the concepts we have learnt here to build a complete Twitter bot. Scraping is a very important method of collecting data from the web. There are also ways to get data from Twitter without using the API, and so without the API's restrictions.