## Project: Scrape Adidas Twitter Account Data

In this project, I scraped tweets from the [@adidas](https://twitter.com/adidas) Twitter account using two different libraries, snscrape and Tweepy.

### Scrape Tweets using snscrape

- The snscrape library scrapes tweets without requiring personal API keys. It can return thousands of tweets in seconds, and its search tools allow highly customisable searches.
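As a rough illustration of what a customisable search can look like, the sketch below assembles a query string from optional filters. The helper `build_query` and its parameter names are hypothetical and are not part of snscrape; the `from:`, `since:`, `until:` and `lang:` operators are standard Twitter search operators, and the dates are example values only.

```python
from datetime import date
from typing import Optional


def build_query(keyword: str = "", from_user: Optional[str] = None,
                since: Optional[date] = None, until: Optional[date] = None,
                lang: Optional[str] = None) -> str:
    """Assemble a Twitter search string from optional filters (hypothetical helper)."""
    parts = []
    if keyword:
        parts.append(keyword)
    if from_user:
        parts.append(f"from:{from_user}")        # tweets posted by this account
    if since:
        parts.append(f"since:{since.isoformat()}")  # on or after this date
    if until:
        parts.append(f"until:{until.isoformat()}")  # before this date
    if lang:
        parts.append(f"lang:{lang}")             # language code, e.g. "en"
    return " ".join(parts)


# Example values only; any of the filters can be left out.
query = build_query("@adidas", since=date(2022, 2, 1), until=date(2022, 2, 28), lang="en")
print(query)  # @adidas since:2022-02-01 until:2022-02-28 lang:en
```

A string built this way could then be passed as the `query` argument of `sntwitter.TwitterSearchScraper`, in the same way the plain keyword is passed later in this notebook.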

In [1]:
!pip install snscrape


In [2]:
#import required packages

import snscrape.modules.twitter as sntwitter
import pandas as pd
import numpy as np
import datetime as dt
import time
import itertools

In [3]:
# empty list to store each tweet retrieved from Twitter
tweets_list = []

# TwitterUserScraper - to retrieve tweets based on particular user
for i,tweet in enumerate(sntwitter.TwitterUserScraper(username="adidas").get_items()):
    tweets_list.append([tweet.date, tweet.id, tweet.content, tweet.url, tweet.user.username, tweet.likeCount, tweet.retweetCount])

# convert the tweets list to a DataFrame and display it
tweets_df = pd.DataFrame(tweets_list,columns=['created_at','tweet_id', 'tweet_content','urls', 'username', 'favorite_count', 'retweet_count'])
tweets_df.head()

| | created_at | tweet_id | tweet_content | urls | username | favorite_count | retweet_count |
|---|---|---|---|---|---|---|---|
| 0 | 2022-02-22 13:38:03+00:00 | 1496117082068500480 | A limited number of the new adidas TERREX HS1 ... | https://twitter.com/adidas/status/149611708206... | adidas | 33 | 4 |
| 1 | 2022-02-22 13:38:02+00:00 | 1496117077752664065 | Introducing: the adidas TERREX HS1\n \nCompose... | https://twitter.com/adidas/status/149611707775... | adidas | 67 | 8 |
| 2 | 2022-02-22 13:38:00+00:00 | 1496117070504906752 | ICYMI: Last year we partnered with Finnish tex... | https://twitter.com/adidas/status/149611707050... | adidas | 18 | 1 |
| 3 | 2022-02-22 13:38:00+00:00 | 1496117068864831495 | Calling all adventurers, hikers, and nature lo... | https://twitter.com/adidas/status/149611706886... | adidas | 100 | 11 |
| 4 | 2022-02-14 22:40:22+00:00 | 1493354457308016651 | @Vishy_vish Unfortunately, we're not able to a... | https://twitter.com/adidas/status/149335445730... | adidas | 0 | 0 |


In [4]:
# find the total number of rows and columns retrieved
tweets_df.shape

(12339, 7)

In [5]:
# declare the variables required
maxTweets = 5000
keyword = "@adidas"

# TwitterSearchScraper - retrieve tweets matching the given keyword;
# islice stops the generator after maxTweets items
scraper = sntwitter.TwitterSearchScraper(query=keyword)
tweets_all = [
    [tweet.date, tweet.id, tweet.content, tweet.url, tweet.user.username, tweet.likeCount, tweet.retweetCount]
    for tweet in itertools.islice(scraper.get_items(), maxTweets)
]

# convert the tweets list to a DataFrame and display it
tweets_all_df = pd.DataFrame(tweets_all,columns=['created_at','tweet_id', 'tweet_content','urls', 'username', 'favorite_count', 'retweet_count'])
tweets_all_df.head()

| | created_at | tweet_id | tweet_content | urls | username | favorite_count | retweet_count |
|---|---|---|---|---|---|---|---|
| 0 | 2022-02-28 03:51:11+00:00 | 1498143719123472388 | @spidadmitchell @Theo_Howard14 #OhReally hoopi... | https://twitter.com/SnoopUpnext/status/1498143... | SnoopUpnext | 1 | 0 |
| 1 | 2022-02-28 03:49:14+00:00 | 1498143228620513280 | @elevchenko @FIFAcom Hey\n@adidas \n@CocaCola ... | https://twitter.com/DrSoup34/status/1498143228... | DrSoup34 | 0 | 0 |
| 2 | 2022-02-28 03:47:25+00:00 | 1498142772875829250 | @adidas Please stop track suit sale in Russia! | https://twitter.com/va4gdagama/status/14981427... | va4gdagama | 0 | 0 |
| 3 | 2022-02-28 03:34:52+00:00 | 1498139612329431041 | @CocaCola @adidas @Visa @Hyundai. \nIf @FIFAWo... | https://twitter.com/SachsKoch/status/149813961... | SachsKoch | 0 | 0 |
| 4 | 2022-02-28 03:33:41+00:00 | 1498139317318955013 | Hit them where it hurts. Cancel all @adidas ex... | https://twitter.com/drewgibson16/status/149813... | drewgibson16 | 1 | 0 |


In [6]:
# find the total number of rows and columns retrieved
tweets_all_df.shape

(5000, 7)

### Scrape Tweets using Tweepy

Next, we use Tweepy to query Twitter's API for data about adidas. Unlike snscrape, this route requires API credentials.

In [7]:
!pip install tweepy


In [8]:
# import required packages
import tweepy
from tweepy import OAuthHandler

In [9]:
# Twitter API credentials: read from environment variables so that secrets
# are never stored in the notebook (the variable names are an assumption)
import os

access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_secret = os.environ['TWITTER_ACCESS_SECRET']
consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']

#create connection with Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

### Tweepy Cursor Method

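Tweepy's `Cursor` handles pagination for us: it wraps an API method and lets us request as many items as we want. The cells below use the Tweepy 3.x API (in Tweepy 4, `api.search` was renamed `api.search_tweets`). Two methods are used:
- **api.user_timeline:** gets the most recent tweets posted by the **specified user**
- **api.search:** gets the most recent tweets matching the **specified keyword**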
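To make the idea of cursor-based pagination concrete, here is a minimal conceptual sketch. It is not Tweepy's implementation: `FAKE_TIMELINE`, `fetch_page` and `iterate_items` are hypothetical stand-ins that imitate an endpoint returning tweets newest first and accepting a `max_id` to continue from, which is how I understand the timeline endpoints to page.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional

# Stand-in data source: 25 fake tweet ids, newest first (not real Twitter data).
FAKE_TIMELINE = list(range(25, 0, -1))


def fetch_page(max_id: Optional[int], page_size: int = 10) -> List[int]:
    """Hypothetical endpoint: return up to page_size ids that are <= max_id."""
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t <= max_id]
    return eligible[:page_size]


def iterate_items(fetch: Callable[[Optional[int]], List[int]]) -> Iterator[int]:
    """Yield items one by one, requesting a new page whenever one runs out."""
    max_id = None
    while True:
        page = fetch(max_id)
        if not page:
            return  # no more data to page through
        yield from page
        max_id = page[-1] - 1  # continue just below the oldest id seen so far


# Ask for 12 items: this spans two pages of 10, but the caller never sees pages.
first_12 = list(islice(iterate_items(fetch_page), 12))
print(first_12)  # [25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14]
```

The caller only states how many items it wants, which mirrors the `.items(2000)` calls in the cells that follow; the page boundaries stay hidden inside the iterator.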

In [10]:
username = 'adidas'
try:
    tw_obj1 = tweepy.Cursor(api.user_timeline,id=username, tweet_mode='extended').items(2000)
    tw_list1 = [[tweet.created_at, tweet.id, tweet.full_text, tweet.user._json['screen_name'], tweet.user._json['name'], tweet.entities['urls'], tweet._json['favorite_count'], tweet._json['retweet_count']] for tweet in tw_obj1]
    tw_df1 = pd.DataFrame(tw_list1,columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name',
                                            'urls', 'favorite_count', 'retweet_count'])
except Exception as e:  # Exception, not BaseException, so Ctrl+C still interrupts a long fetch
    print('something went wrong', str(e))
    
tw_df1.head()    

| | created_at | tweet_id | tweet_text | screen_name | name | urls | favorite_count | retweet_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 2022-02-25 22:17:47 | 1497335041214590981 | RT @adidasoriginals: adidas x Gucci \n\n#adida... | adidas | adidas | [] | 0 | 4620 |
| 1 | 2022-02-22 13:38:03 | 1496117082068500480 | A limited number of the new adidas TERREX HS1 ... | adidas | adidas | [{'url': 'https://t.co/obaGevq34N', 'expanded_... | 33 | 4 |
| 2 | 2022-02-22 13:38:02 | 1496117077752664065 | Introducing: the adidas TERREX HS1\n \nCompose... | adidas | adidas | [] | 67 | 8 |
| 3 | 2022-02-22 13:38:00 | 1496117070504906752 | ICYMI: Last year we partnered with Finnish tex... | adidas | adidas | [] | 18 | 1 |
| 4 | 2022-02-22 13:38:00 | 1496117068864831495 | Calling all adventurers, hikers, and nature lo... | adidas | adidas | [] | 100 | 11 |


In [11]:
tw_df1.shape

(2000, 8)

In [12]:
try:
    tw_obj2 = tweepy.Cursor(api.search, q='@adidas').items(2000)
    tw_list2 = [[tweet.created_at, tweet.id, tweet.text, tweet.user._json['screen_name'], tweet.user._json['name'], tweet.entities['urls'], tweet._json['favorite_count'], tweet._json['retweet_count']] for tweet in tw_obj2]
    tw_df2 = pd.DataFrame(tw_list2,columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name', 'urls', 'favorite_count', 'retweet_count'])
except Exception as e:  # Exception, not BaseException, so Ctrl+C still interrupts a long fetch
    print('something went wrong', str(e))
    
tw_df2.head()    

| | created_at | tweet_id | tweet_text | screen_name | name | urls | favorite_count | retweet_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 2022-02-28 03:58:02 | 1498145444500758534 | Day 21 of getting @adidas and @Nike to sponsor me | DeadxGuru7 | Paul Emeric | [] | 0 | 0 |
| 1 | 2022-02-28 03:57:17 | 1498145255849435144 | @RawStory @FIFAcom @fifamedia is an Internatio... | MF1108 | Mark Fuller🇺🇦 | [{'url': 'https://t.co/pDbE3Chbou', 'expanded_... | 0 | 0 |
| 2 | 2022-02-28 03:56:48 | 1498145131354075136 | RT @BolekLegia: Hey, @adidas are you ok with t... | klucher87 | klucher87 | [] | 0 | 1 |
| 3 | 2022-02-28 03:54:52 | 1498144645402050561 | RT @ZelaznyPiotr: Hey, @adidas are you ok with... | JacekWlod | jacekwlodarczyk-Jaszczur | [] | 0 | 1547 |
| 4 | 2022-02-28 03:54:48 | 1498144627794145280 | @subiekt_sport @fifamedia @CocaCola @adidas @V... | Titiektitiek2 | Titiek titiek | [{'url': 'https://t.co/Ji3o3qW88r', 'expanded_... | 0 | 0 |


In [13]:
tw_df2.shape

(2000, 8)