# <center>Web Scraping by API </center>

## 1. Scrape data through APIs 
- Online content providers often provide APIs for accessing their data. Two common types:
   * Python packages: e.g. the tweepy package for the Twitter API
   * REST APIs: e.g. the OMDB API (http://www.omdbapi.com) or TMDB (https://developers.themoviedb.org/3/getting-started)
- Read the API documentation to figure out how to access the data

## 2. Scrape data through REST APIs (e.g. the ASDA groceries search API)
- A REST API is a web service that uses `HTTP` requests to `GET`, `PUT`, `POST` and `DELETE` data
- Example:
    - https://groceries.asda.com/api/items/search<font color="blue"><b>?</b></font><font color='green'><b>keyword</b></font>=<font color='red'><b>yogurt</b></font><font color='purple'><b>&</b></font><font color='green'><b>r</b></font>=<font color='red'><b>json</b></font>, where
        - `?`: separates the API endpoint `https://groceries.asda.com/api/items/search` from the parameters
        - `keyword=yogurt`: sets the parameter `keyword` to `yogurt`, i.e. searches for `yogurt`
        - `&`: separates multiple parameters
        - `r=json`: requests the result in JSON format
    - You can paste the above URL directly into your browser
    - Or issue the API call with the `requests` package
- Read the API documentation to understand how to specify parameters
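
To make the role of `?` and `&` concrete, here is a minimal sketch that assembles the example URL from its parts and then splits it again. It uses only the standard library and makes no network call; the variable names are chosen for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

endpoint = "https://groceries.asda.com/api/items/search"
parameters = {"keyword": "yogurt", "r": "json"}

# urlencode joins name=value pairs with "&" and escapes special characters
url = endpoint + "?" + urlencode(parameters)
print(url)

# The reverse direction: split a URL back into its path and parameters
parts = urlsplit(url)
print(parts.path)
print(parse_qs(parts.query))
```

Building the query string this way, rather than by string concatenation, keeps keywords with spaces or symbols (e.g. `greek yogurt`) correctly escaped.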

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import requests
import json
import pandas as pd

In [2]:
import requests
import json

keyword = 'yogurt'


url="https://groceries.asda.com/api/items/search?keyword=" + keyword + "&r=json"

print(url)

# invoke the API 
r = requests.get(url)

# if the API call returns a successful response
if r.status_code==200:
    
    # This API call returns a JSON document
    # r.json() parses it into a Python object (here, a dictionary)
    # json.dumps() converts a Python object into a JSON string
    result = r.json()
    print(json.dumps(result, indent=4))



https://groceries.asda.com/api/items/search?keyword=yogurt&r=json
{
    "statusMessage": "The API Item Search was executed successfully",
    "errors": [],
    "keyword": "yogurt",
    "storeId": "4565",
    "autoCorrectedTerm": "",
    "didYouMeanTerm": "",
    "isHookLogicInsert": "false",
    "totalResult": "413",
    "currentPage": "1",
    "resultsStartIndex": "1",
    "resultsEndIndex": "60",
    "maxPages": "7",
    "qusApplied": false,
    "productBoostingDetails": "0^rule_5f8046bf0931946b86fb4387^^^Default",
    "monetizedItems": [],
    "items": [
        {
            "shelfId": "1215286383583",
            "shelfName": "Corners",
            "deptId": "1215341888021",
            "deptName": "Yogurts & Desserts",
            "isBundle": "false",
            "meatStickerDetails": "10::for::\u00a33.5::true",
            "extraLargeImageURL": "",
            "bundledItemCount": "0",
            "scene7Host": "https://ui.assets-asda.com:443/dm/",
            "cin": "6362225",
 

In [3]:
# Exercise 2.2.  Another way to pass parameters

parameters = {'keyword': 'yogurt', 
              'r': 'json'}

r=requests.get('https://groceries.asda.com/api/items/search', params=parameters)

# in case authentication is needed, use
# r = requests.get('https://api.github.com/user', \
# auth=('user', 'pass'))

# if the API call returns a successful response
if r.status_code==200:
    
    # This API call returns a JSON document
    # r.json() parses it into a Python object
    print(json.dumps(r.json(), indent=4))



{
    "statusMessage": "The API Item Search was executed successfully",
    "errors": [],
    "keyword": "yogurt",
    "storeId": "4565",
    "autoCorrectedTerm": "",
    "didYouMeanTerm": "",
    "isHookLogicInsert": "false",
    "totalResult": "413",
    "currentPage": "1",
    "resultsStartIndex": "1",
    "resultsEndIndex": "60",
    "maxPages": "7",
    "qusApplied": false,
    "productBoostingDetails": "0^rule_5f8046bf0931946b86fb4387^^^Default",
    "monetizedItems": [],
    "items": [
        {
            "shelfId": "1215286383583",
            "shelfName": "Corners",
            "deptId": "1215341888021",
            "deptName": "Yogurts & Desserts",
            "isBundle": "false",
            "meatStickerDetails": "10::for::\u00a33.5::true",
            "extraLargeImageURL": "",
            "bundledItemCount": "0",
            "scene7Host": "https://ui.assets-asda.com:443/dm/",
            "cin": "6362225",
            "promoDetailFull": "10 for \u00a33.5",
            "ava

## 3. JSON (JavaScript Object Notation)

### What is JSON
- A lightweight data-interchange format
- "Self-describing" and easy to understand
- JSON format is text only 
- Language independent: can be read and used as a data format by any programming language

###  JSON Syntax Rules
JSON syntax is derived from JavaScript object notation syntax:
- Data is in **name/value** pairs separated by commas
- Curly braces hold objects
- Square brackets hold arrays

### A JSON document parsed in Python is typically:
- **a dictionary** (from a JSON object), or 
- a **list of dictionaries** (from a JSON array of objects)
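
As a small illustration of these rules, the sketch below parses a hand-written JSON string and checks which Python types come back. The field names imitate the ASDA output above, but the document itself is made up for this example.

```python
import json

# A small hand-written JSON document (not real API output)
text = ('{"keyword": "yogurt", "qusApplied": false, "wasPrice": null, '
        '"items": [{"cin": "6362225", "maxQty": 10.0}]}')

doc = json.loads(text)

print(type(doc))               # a JSON object becomes a dict
print(type(doc["items"]))      # a JSON array becomes a list
print(doc["qusApplied"])       # JSON false becomes Python False
print(doc["wasPrice"])         # JSON null becomes Python None
print(doc["items"][0]["cin"])  # nested access: dict -> list -> dict
```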

### Useful JSON functions
- `json.dumps`: serialize a Python object to a JSON string
- `json.dump`: serialize a Python object to a JSON file
- `json.loads`: parse a JSON string into a Python object
- `json.load`: parse a JSON file into a Python object

In [4]:
# Exercise 3.1 API returns a JSON object 

parameters = {'keyword': 'yogurt', 
              'r': 'json'}

r=requests.get('https://groceries.asda.com/api/items/search', params=parameters)

# if the API call returns a successful response
if r.status_code==200:
    result = r.json()
    
    df = pd.DataFrame(result["items"])
    df.head()
    

Unnamed: 0,shelfId,shelfName,deptId,deptName,isBundle,meatStickerDetails,extraLargeImageURL,bundledItemCount,scene7Host,cin,promoDetailFull,availability,totalReviewCount,asdaSuggest,itemName,price,imageURL,aisleName,id,promoId,isFavourite,hasAlternates,wasPrice,brandName,promoType,weight,promoOfferTypeCode,promoQty,promoValue,productAttribute,scene7AssetId,promoDetail,bundleDiscount,avgStarRating,name,avgWeight,iconDetails,maxQty,pricePerWt,productURL,pricePerUOM,searchTuningScore,onSale,salePrice,positionChngByMargin
0,1215286383583,Corners,1215341888021,Yogurts & Desserts,False,10::for::£3.5::true,,0,https://ui.assets-asda.com:443/dm/,6362225,10 for £3.5,A,8,,Vanilla Yogurt with Chocolate Balls,£0.55,,Yogurts & Fromage Frais,1000120228362,ls91195,False,False,,Muller Corner,No Promo,130g,15,10,£3.50,,4025500245221,10 for £3.5,0.0,4.625,Muller Corner Vanilla Yogurt with Chocolate Balls,,"{'promotionalIcons': ['59600049'], 'informatio...",10.0,Each,https://groceries.asda.com:443/api/items/view?...,,18692820.0,False,,0
1,1215286383583,Corners,1215341888021,Yogurts & Desserts,False,10::for::£3.5::true,,0,https://ui.assets-asda.com:443/dm/,6362239,10 for £3.5,A,3,,Strawberry Yogurt,£0.55,,Yogurts & Fromage Frais,1000120228322,ls91195,False,False,,Muller Corner,No Promo,143g,15,10,£3.50,,4025500245092,10 for £3.5,0.0,3.6667,Muller Corner Strawberry Yogurt,,"{'promotionalIcons': ['59600049'], 'informatio...",10.0,Each,https://groceries.asda.com:443/api/items/view?...,,13473809.0,False,,0
2,1215286383583,Corners,1215341888021,Yogurts & Desserts,False,10::for::£3.5::true,,0,https://ui.assets-asda.com:443/dm/,6362227,10 for £3.5,A,9,,Banana Yogurt with Chocolate Flakes,£0.55,,Yogurts & Fromage Frais,1000120228335,ls91195,False,False,,Muller Corner,No Promo,130g,15,10,£3.50,,4025500245207,10 for £3.5,0.0,3.6667,Muller Corner Banana Yogurt with Chocolate Flakes,,"{'promotionalIcons': ['59600049'], 'informatio...",10.0,Each,https://groceries.asda.com:443/api/items/view?...,,12041251.0,False,,0
3,1215286383583,Corners,1215341888021,Yogurts & Desserts,False,10::for::£3.5::true,,0,https://ui.assets-asda.com:443/dm/,6362229,10 for £3.5,A,8,,Toffee Yogurt with Chocolate Hoops,£0.55,,Yogurts & Fromage Frais,1000120228313,ls91195,False,False,,Muller Corner,No Promo,130g,15,10,£3.50,,4025500245146,10 for £3.5,0.0,3.75,Muller Corner Toffee Yogurt with Chocolate Hoops,,"{'promotionalIcons': ['59600049'], 'informatio...",10.0,Each,https://groceries.asda.com:443/api/items/view?...,,8325497.0,False,,0
4,1215286383583,Corners,1215341888021,Yogurts & Desserts,False,10::for::£3.5::true,,0,https://ui.assets-asda.com:443/dm/,6362233,10 for £3.5,A,10,,Strawberry Yogurt with White Chocolate Shortca...,£0.55,,Yogurts & Fromage Frais,1000120228351,ls91195,False,False,,Muller Corner,No Promo,130g,15,10,£3.50,,4025500245160,10 for £3.5,0.0,3.8,Muller Corner Strawberry Yogurt with White Cho...,,"{'promotionalIcons': ['59600049'], 'informatio...",10.0,Each,https://groceries.asda.com:443/api/items/view?...,,7703160.0,False,,0
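
Note that most values in the table above are strings (e.g. `£0.55`, `4.625`, `False`). Before any numeric analysis they usually need converting. The following is only a sketch: `clean_item` is a hypothetical helper, the sample record is typed in by hand from the output above, and it assumes prices always start with `£`.

```python
def clean_item(item):
    """Return a copy of one item with a few string fields converted to Python types."""
    cleaned = dict(item)
    cleaned["price"] = float(item["price"].lstrip("£"))       # "£0.55" -> 0.55
    cleaned["avgStarRating"] = float(item["avgStarRating"])   # "3.6667" -> 3.6667
    cleaned["totalReviewCount"] = int(item["totalReviewCount"])
    cleaned["isBundle"] = item["isBundle"] == "true"          # "false" -> False
    return cleaned


# One record copied by hand from the output above (only a few fields)
sample = {
    "itemName": "Strawberry Yogurt",
    "price": "£0.55",
    "avgStarRating": "3.6667",
    "totalReviewCount": "3",
    "isBundle": "false",
}

print(clean_item(sample))
```

To apply it to the whole result, something like `pd.DataFrame([clean_item(i) for i in result["items"]])` should work, provided every item carries these fields.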


In [5]:
# Exercise 3.2. Parse JSON object (a dictionary)

# convert the first 2 items to string
s = json.dumps(result["items"][0:2], indent=4)
print(s)

# load back from a string
items = json.loads(s)
items

# save to file (the with-statement closes the file automatically)
with open("items.json", "w") as f:
    json.dump(result["items"], f)

# load back from file
with open("items.json", "r") as f:
    items = json.load(f)
print("test loaded data\n")
len(items)
items[0]

[
    {
        "shelfId": "1215286383583",
        "shelfName": "Corners",
        "deptId": "1215341888021",
        "deptName": "Yogurts & Desserts",
        "isBundle": "false",
        "meatStickerDetails": "10::for::\u00a33.5::true",
        "extraLargeImageURL": "",
        "bundledItemCount": "0",
        "scene7Host": "https://ui.assets-asda.com:443/dm/",
        "cin": "6362225",
        "promoDetailFull": "10 for \u00a33.5",
        "availability": "A",
        "totalReviewCount": "8",
        "asdaSuggest": "",
        "itemName": "Vanilla Yogurt with Chocolate\u00a0Balls",
        "price": "\u00a30.55",
        "imageURL": "",
        "aisleName": "Yogurts & Fromage Frais",
        "id": "1000120228362",
        "promoId": "ls91195",
        "isFavourite": "false",
        "hasAlternates": "false",
        "wasPrice": "",
        "brandName": "Muller Corner",
        "promoType": "No Promo",
        "weight": "130g",
        "promoOfferTypeCode": "15",
        "promoQty": 

[{'aisleName': 'Yogurts & Fromage Frais',
  'asdaSuggest': '',
  'availability': 'A',
  'avgStarRating': '4.625',
  'avgWeight': '',
  'brandName': 'Muller Corner',
  'bundleDiscount': '0.00',
  'bundledItemCount': '0',
  'cin': '6362225',
  'deptId': '1215341888021',
  'deptName': 'Yogurts & Desserts',
  'extraLargeImageURL': '',
  'hasAlternates': 'false',
  'iconDetails': {'informationalIcons': [], 'promotionalIcons': ['59600049']},
  'id': '1000120228362',
  'imageURL': '',
  'isBundle': 'false',
  'isFavourite': 'false',
  'itemName': 'Vanilla Yogurt with Chocolate\xa0Balls',
  'maxQty': '10.0',
  'meatStickerDetails': '10::for::£3.5::true',
  'name': 'Muller Corner Vanilla Yogurt with Chocolate\xa0Balls',
  'onSale': False,
  'positionChngByMargin': 0,
  'price': '£0.55',
  'pricePerUOM': '',
  'pricePerWt': 'Each',
  'productAttribute': '',
  'productURL': 'https://groceries.asda.com:443/api/items/view?itemid=1000120228362',
  'promoDetail': '10 for £3.5',
  'promoDetailFull': '

test loaded data



60

{'aisleName': 'Yogurts & Fromage Frais',
 'asdaSuggest': '',
 'availability': 'A',
 'avgStarRating': '4.625',
 'avgWeight': '',
 'brandName': 'Muller Corner',
 'bundleDiscount': '0.00',
 'bundledItemCount': '0',
 'cin': '6362225',
 'deptId': '1215341888021',
 'deptName': 'Yogurts & Desserts',
 'extraLargeImageURL': '',
 'hasAlternates': 'false',
 'iconDetails': {'informationalIcons': [], 'promotionalIcons': ['59600049']},
 'id': '1000120228362',
 'imageURL': '',
 'isBundle': 'false',
 'isFavourite': 'false',
 'itemName': 'Vanilla Yogurt with Chocolate\xa0Balls',
 'maxQty': '10.0',
 'meatStickerDetails': '10::for::£3.5::true',
 'name': 'Muller Corner Vanilla Yogurt with Chocolate\xa0Balls',
 'onSale': False,
 'positionChngByMargin': 0,
 'price': '£0.55',
 'pricePerUOM': '',
 'pricePerWt': 'Each',
 'productAttribute': '',
 'productURL': 'https://groceries.asda.com:443/api/items/view?itemid=1000120228362',
 'promoDetail': '10 for £3.5',
 'promoDetailFull': '10 for £3.5',
 'promoId': 'ls91

## 4. Get Tweets

Reference: 
- https://github.com/scalto/snscrape-by-location/blob/main/snscrape_by_location_tutorial.ipynb
- https://medium.com/swlh/how-to-scrape-tweets-by-location-in-python-using-snscrape-8c870fa6ec25

Note: User object is not exposed by TwitterSearchScraper any more.

In [7]:
pip install snscrape

Collecting snscrape
  Downloading snscrape-0.3.4-py3-none-any.whl (35 kB)
Installing collected packages: snscrape
Successfully installed snscrape-0.3.4


In [9]:
import pandas as pd
import snscrape.modules.twitter as sntwitter
import itertools


In [10]:
#  search by keywords + time
# TwitterSearchScraper returns an iterator; islice takes the first 500 items from it

df = pd.DataFrame(itertools.islice(sntwitter.TwitterSearchScraper(
    '"blockchain + since:2020-10-31 until:2020-11-03"').get_items(), 500))

print(len(df))
df.head()


500


Unnamed: 0,url,date,content,id,username,outlinks,outlinksss,tcooutlinks,tcooutlinksss
0,https://twitter.com/zawphyowai199/status/13234...,2020-11-02 23:59:55+00:00,Synic Token Airdrop is now Live🚀💰🏆\n\nClick on...,1323414568236806146,zawphyowai199,[https://t.me/synictoken_officialAirdropbot],https://t.me/synictoken_officialAirdropbot,[https://t.co/O83hXYOniH],https://t.co/O83hXYOniH
1,https://twitter.com/CryptoWatchBot/status/1323...,2020-11-02 23:59:53+00:00,"@NEO_Blockchain, #NEO is the coin with the bes...",1323414560204824576,CryptoWatchBot,[],,[],
2,https://twitter.com/Rayinhosen/status/13234145...,2020-11-02 23:59:46+00:00,"📌 CPX Airdrop is Live, 🎁 Join to get Free 7 CP...",1323414530769051648,Rayinhosen,[https://t.me/CrypxieAirdrop_bot?start=r017400...,https://t.me/CrypxieAirdrop_bot?start=r0174001...,[https://t.co/dcfBIyNYi4],https://t.co/dcfBIyNYi4
3,https://twitter.com/coinmarketnet/status/13234...,2020-11-02 23:59:43+00:00,"📌 CPX Airdrop is Live, 🎁 Join to get Free 7 CP...",1323414518651781120,coinmarketnet,[https://t.me/CrypxieAirdrop_bot?start=r076622...,https://t.me/CrypxieAirdrop_bot?start=r0766228434,[https://t.co/Ff0IdY5i1G],https://t.co/Ff0IdY5i1G
4,https://twitter.com/Link_Errors/status/1323414...,2020-11-02 23:59:42+00:00,Yearnify Finance Airdrop is now Live🚀💰🏆\n\nCli...,1323414512675016706,Link_Errors,[https://t.me/YearnifyAirdropBot],https://t.me/YearnifyAirdropBot,[https://t.co/6HpZOBOZzI],https://t.co/6HpZOBOZzI


In [11]:
# search by user

df = pd.DataFrame(itertools.islice(sntwitter.TwitterUserScraper(
    '"zawphyowai199"').get_items(), 500))

print(len(df))
df.tail()

500


Unnamed: 0,url,date,content,id,username,outlinks,outlinksss,tcooutlinks,tcooutlinksss
495,https://twitter.com/zawphyowai199/status/13669...,2021-03-03 03:01:06+00:00,@latokens @giftedhandsGHD #LATOKEN \n\nNice,1366946708941250562,zawphyowai199,[],,[],
496,https://twitter.com/zawphyowai199/status/13669...,2021-03-03 03:00:18+00:00,@mma728122 \n@mst5792 https://t.co/idnoWjNaUL,1366946508499652611,zawphyowai199,[https://twitter.com/latokens/status/136678554...,https://twitter.com/latokens/status/1366785545...,[https://t.co/idnoWjNaUL],https://t.co/idnoWjNaUL
497,https://twitter.com/zawphyowai199/status/13669...,2021-03-03 02:51:11+00:00,@airdropinspect @Ashwsbreal @blazingbitcoin @m...,1366944211526819840,zawphyowai199,[],,[],
498,https://twitter.com/zawphyowai199/status/13669...,2021-03-03 02:47:55+00:00,@alexanhtuan #Phoswapper\n\n@mma728122 \n@mst5...,1366943388881154048,zawphyowai199,[],,[],
499,https://twitter.com/zawphyowai199/status/13667...,2021-03-02 14:55:21+00:00,@KoalaDefi @dfgr5eytsy56es1 @paisalnurpadil1 @...,1366764068342751238,zawphyowai199,[],,[],


## 5. Tweepy
- Tweepy is a Python library for accessing the Twitter API. 
- Install it with `pip install tweepy`
- The Tweepy documentation has detailed explanations: https://docs.tweepy.org/en/stable/
- You need to apply for a developer account here: https://developer.twitter.com/en/apply-for-access

In [12]:
import tweepy
import csv
import datetime

# https://docs.tweepy.org/en/stable/auth_tutorial.html

# enter your account information
CONSUMER_KEY=''
CONSUMER_SECRET=''
ACCESS_KEY=''
ACCESS_SECRET=''

auth=tweepy.OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY,ACCESS_SECRET)
api=tweepy.API(auth)

In [14]:
# Take a look at the public tweets from your account's home timeline 

public_tweets = api.home_timeline()
for tweet in public_tweets[:2]:
    print(tweet.text)

TweepError: ignored

In [None]:
# Is this useful information?
# Let's take a close look at the JSON of ONE tweet

public_tweets[0]
# the raw Status object is hard to read


In [None]:
# make it look better
# convert to string
json_str = json.dumps(public_tweets[0]._json)

# deserialize the string into a Python object
parsed = json.loads(json_str)

print(json.dumps(parsed, indent=4, sort_keys=True))
# Now we have a better idea of the nested structure of the JSON object

### 5.1. Get tweets from users' timeline
- Make a timeline call to retrieve up to the 3200 most recent tweets by a user (a limit set by Twitter).
    - Note: the time range you get depends on how often the user posts tweets. 
- Parameters for the timeline call
    - `count`: the number of results to try to retrieve per page. The maximum is 200. 
    - Make multiple calls to retrieve all 3200 tweets. 
    - `tweet_mode="extended"`: replaces the `text` attribute with `full_text` and prevents a primary tweet longer than 140 characters from being truncated.
- Attributes of tweet objects
    - https://docs.tweepy.org/en/stable/api.html#tweepy-api-twitter-api-wrapper
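
The "multiple calls" idea can be sketched without touching the Twitter API. In the example below, `fetch_page` is a hypothetical stand-in for a timeline call that accepts `count` and `max_id`, and the fake timeline is made up. The point is only the paging logic: each call asks for items at or below the oldest ID seen so far, minus one.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect every item by repeatedly asking for items with id <= max_id.

    fetch_page(count, max_id) is a hypothetical stand-in for a timeline call;
    it must return items (dicts with an "id") sorted from newest to oldest.
    """
    collected = []
    max_id = None  # None means "start from the newest item"
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:  # an empty page means nothing older is left
            break
        collected.extend(page)
        max_id = page[-1]["id"] - 1  # next call starts just below the oldest id seen
    return collected


# A fake timeline of 450 "tweets" with ids 450, 449, ..., 1 to exercise the loop
FAKE_TIMELINE = [{"id": i} for i in range(450, 0, -1)]

def fake_fetch_page(count, max_id=None):
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


tweets = fetch_all(fake_fetch_page)
print(len(tweets))
```

With a page size of 200, the fake timeline is read in pages of 200, 200 and 50 items, followed by one empty page that ends the loop.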

In [None]:
# Get the five most recent tweets of a user.
timeline = api.user_timeline(screen_name="KelloggCompany", count=5, tweet_mode="extended")

for status in timeline:
    print (status.id)
    print (status.full_text)

In [None]:
# Step 1: get a list of tweets 
# Step 2: extract the variables you want

def get_all_tweets(user_name):
    auth=tweepy.OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY,ACCESS_SECRET)
    api=tweepy.API(auth)
    
    # initialize the first call
    alltweets=[]
    new_tweets=api.user_timeline(screen_name=user_name, count=200)
    
    # continue to get tweets until a call returns an empty page
    while len(new_tweets)>0:
        alltweets.extend(new_tweets)
        oldest=alltweets[-1].id-1  # the next call starts from the oldest ID minus one
        print("...{} tweets downloaded so far".format(len(alltweets)))
        print("getting tweets before", oldest)
        new_tweets = api.user_timeline(screen_name=user_name, count=200, max_id=oldest)
    
    # extract the variables you want
    # (keep the text as str: encoding it to bytes would write b'...' into the CSV)
    outtweets = [[tweet.id_str, tweet.user.name, tweet.created_at, tweet.user.followers_count,
                  tweet.text] for tweet in alltweets]
            
    # write out your variables
    with open('%s_tweet.csv' % user_name, 'w', newline='', encoding='utf-8') as outputfile: 
        writer=csv.writer(outputfile)
        writer.writerow(["id","user_name","created_at","followers","text"])
        writer.writerows(outtweets)

# use your function
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("KelloggCompany")
    

In [None]:
# Take a look at the table you got
df= pd.read_csv('KelloggCompany_tweet.csv', header=0)
df.head()

# How many tweets did we get?
len(df)

# The first tweet is a retweet. Take a look at the text.
df.text[0]

# Let's compare it with the actual tweet. 
# You can find each tweet by its ID. 
# https://twitter.com/KelloggCompany/status/1438671732513116163

### 5.2. Deal with truncated text
- For text mining on Twitter, it is important to get the full text. 
    - Full text is essential for topic modeling and sentiment analysis.
    - Full text is also important for extracting mention networks (see the previous example). 
- Use `tweet_mode="extended"` when calling a user's timeline.
    - In extended mode, the `text` attribute of the returned Status objects is replaced by a `full_text` attribute, which contains the entire untruncated text of the Tweet. 
- Full text for tweets that are retweets.
    - If the tweet is a retweet, its `full_text` may still be truncated. 
    - Access the full text through the `retweeted_status` attribute, which is itself a Status object. 
- For reference: https://docs.tweepy.org/en/stable/extended_tweets.html
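
The retweet rule above can be sketched as a small helper. `get_full_text` is a hypothetical name, and the two objects below are simple stand-ins with made-up text, not real Tweepy Status objects; they only imitate the `full_text` and `retweeted_status` attributes described above.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the untruncated text of a status fetched with tweet_mode="extended"."""
    if hasattr(status, "retweeted_status"):
        # a retweet: the original tweet carries the complete text
        return status.retweeted_status.full_text
    return status.full_text


# Stand-in objects (not real Tweepy Status objects) with made-up text
original = SimpleNamespace(full_text="A long original tweet about breakfast cereal and more")
retweet = SimpleNamespace(
    full_text="RT @someone: A long original tweet about break...",
    retweeted_status=original,
)

print(get_full_text(original))
print(get_full_text(retweet))
```

Both calls print the same complete sentence, because the retweet is resolved to the text of the tweet it points to.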

In [None]:
# Let's deal with retweets

def get_all_tweets(user_name):
    auth=tweepy.OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY,ACCESS_SECRET)
    api=tweepy.API(auth,wait_on_rate_limit=True)

    alltweets=[]
    new_tweets=api.user_timeline(screen_name=user_name, count=200, tweet_mode="extended")
    
    # set date condition: stop once a page reaches tweets older than startDate
    startDate=datetime.datetime(2021, 1, 1, 0, 0, 0)
    while len(new_tweets)>0:
        alltweets.extend(new_tweets)
        print("...{} tweets downloaded so far".format(len(alltweets)))
        # drop any timezone info so the comparison with the naive startDate works
        if new_tweets[-1].created_at.replace(tzinfo=None) <= startDate:
            break
        oldest=alltweets[-1].id-1
        print("getting tweets before", oldest)
        new_tweets = api.user_timeline(screen_name=user_name, count=200, max_id=oldest,
                                       tweet_mode="extended")
        
    # check if it's a retweet
    # When using extended mode with a Retweet, the full_text attribute of the Status object may be truncated    
    # However, since the retweeted_status attribute (of a Status object that is a Retweet) is itself a Status object
    # the full_text attribute of the Retweeted Status object can be used instead.
    
    outtweets_all=[]
    for tweet in alltweets:
        status = api.get_status(tweet.id, tweet_mode="extended")
        
        if hasattr(status, "retweeted_status"):  # is a retweet
            full_text=status.retweeted_status.full_text  # keep as str so the CSV holds text, not b'...'
            
            outtweets=[
            # tweet content
            tweet.id_str, tweet.created_at,full_text,
            # user features
            tweet.user.name, tweet.user.screen_name, tweet.user.followers_count, 
            # retweet features
            tweet.retweeted_status.user.name,tweet.retweeted_status.user.screen_name,tweet.retweeted_status.user.description]
            outtweets_all.append(outtweets)
  
        else: # not a retweet
            full_text=status.full_text  # keep as str so the CSV holds text, not b'...'
                    
            outtweets=[
            # tweet content
            tweet.id_str, tweet.created_at,full_text,
            # user features
            tweet.user.name, tweet.user.screen_name, tweet.user.followers_count, 
            # retweet features
            "no value","no value","no value"]
            outtweets_all.append(outtweets)

    with open('%s_full_tweet.csv' % user_name, 'w', newline='', encoding='utf-8') as outputfile: 
        writer=csv.writer(outputfile)
        writer.writerow(["id","created_at","full_text",
                        "user.name","user.screen_name","user.followers_count",
                        "retweeted_status.user.name","retweeted_status.user.screen_name","retweeted_status.user.description"])
        writer.writerows(outtweets_all)

        
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("KelloggCompany")


In [None]:
df= pd.read_csv('KelloggCompany_full_tweet.csv', header=0)
df.head()
len(df)
# please compare this full text with the truncated text above; what differences can you find?
df.full_text[0]

### 5.3. Build Twitter networks
- **Follower-followee network**
    - If you have a list of user accounts, you can retrieve a boolean value for each pairwise following relation. 
    - Parameters of `api.show_friendship`
        * `source_id` – The user_id of the subject user.
        * `source_screen_name` – The screen_name of the subject user.
        * `target_id` – The user_id of the target user.
        * `target_screen_name` – The screen_name of the target user.
- **Retweet network**
    - Retweeted accounts can be extracted while scraping the API. 
    - Or retweeted accounts can be extracted from the text. 
- **Mention network**
    - Can be extracted from the full text. 
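
As a rough sketch of the text-based route, mentions and retweeted accounts can be pulled out with regular expressions. The patterns, the helper names and the sample tweets below are assumptions made for illustration; a simple `@\w+` pattern will also match things that are not real mentions (e.g. part of an email address).

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
RETWEET = re.compile(r"^RT @(\w+)")


def mention_edges(author, text):
    """Return (author, mentioned_account) pairs found in one tweet's text."""
    return [(author, name) for name in MENTION.findall(text)]


def retweeted_account(text):
    """Return the account named in a leading 'RT @name' prefix, or None."""
    match = RETWEET.match(text)
    return match.group(1) if match else None


# Made-up tweets for illustration: (author, full text)
tweets = [
    ("alice", "RT @bob: thanks @carol for the data"),
    ("alice", "@bob see you tomorrow"),
    ("carol", "no mentions here"),
]

# Count how often each directed (author -> mentioned) edge occurs
edges = Counter()
for author, text in tweets:
    edges.update(mention_edges(author, text))

print(edges)
print(retweeted_account(tweets[0][1]))
```

The counter acts as a weighted edge list, which can be passed on to a network library or written to CSV.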

In [None]:
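# How to scrape the follower-followee network?
# we can directly retrieve a boolean value 

dog="Microsoft"
cat="Oracle"

# show_friendship returns a pair of Friendship objects: (source, target)
is_following = api.show_friendship(source_screen_name=dog,target_screen_name=cat)
# element [1] is the target, so this prints whether cat follows dog
print(is_following[1].following)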

# Question: how to get the adjacency matrix of a follower-followee network?
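
One possible answer to the question above, as a sketch only: loop over all ordered pairs of accounts and record each following relation. Here `follows` is a hypothetical callable that returns `True` or `False`; with Tweepy it could wrap `api.show_friendship` and read the `following` attribute of the source object. The relations below are made up so that the example runs without API access.

```python
def adjacency_matrix(accounts, follows):
    """Build matrix[i][j] = 1 if accounts[i] follows accounts[j], else 0.

    follows(source, target) is a hypothetical callable returning True/False.
    """
    n = len(accounts)
    matrix = [[0] * n for _ in range(n)]
    for i, source in enumerate(accounts):
        for j, target in enumerate(accounts):
            if i != j and follows(source, target):
                matrix[i][j] = 1
    return matrix


# Made-up following relations, used instead of real API calls
FAKE_EDGES = {("Microsoft", "Oracle"), ("Oracle", "IBM"), ("IBM", "Microsoft")}

def fake_follows(source, target):
    return (source, target) in FAKE_EDGES


accounts = ["Microsoft", "Oracle", "IBM"]
for row in adjacency_matrix(accounts, fake_follows):
    print(row)
```

For n accounts this makes n × (n − 1) checks, so with a real API the rate limit quickly becomes the bottleneck; `wait_on_rate_limit=True`, as used earlier, is one way to cope with that.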

##### Twitter data resources
https://github.com/echen102/us-pres-elections-2020 <br>
https://github.com/echen102/COVID-19-TweetIDs