# TwiZlatan

For today's lesson, we're gonna learn something more practical: Twitter scraping with the [Twython](https://twython.readthedocs.io/en/latest/usage/install.html) package.

The topic will be, as some of you already guessed from the name of the notebook, this guy:

![image](images/zlatan.PNG)

(If you don't know who he is, which is highly problematic, here is his [Wikipedia page](https://en.wikipedia.org/wiki/Zlatan_Ibrahimovi%C4%87))

We're going to grab Tweets relating to him at different points in time and in different locations, to get an idea of how the public perception of ~~God~~ this guy has changed over time across the world.

## Lesson plan

1. Grabbing some tweets with Twython
2. Tweepy
3. Snscrape
4. Sentiment analysis

Let's start by importing the required packages:

In [1]:
from twython import Twython, TwythonError
import tweepy
import configparser
import numpy as np
import pandas as pd
import snscrape.modules.twitter as sntwitter
from textblob import TextBlob
import nltk
nltk.download('vader_lexicon')

Then, you need to create an application to connect to the Twitter API: go to [this website](https://apps.twitter.com), create an application, and store its credentials in a safe location, preferably an .ini file. You will need to apply for a Twitter Developer account, and then for Elevated access.

In [2]:
config = configparser.ConfigParser()
config.read(r'H:\credentials\config.ini') # Your .ini file with your credentials

api_key = config['twitter']['api_key']
api_key_secret = config['twitter']['api_key_secret']
bearer_token = config['twitter']['bearer_token'] # in the .ini file, each "%" in the token must be escaped as "%%"
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']
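
The `%` escaping mentioned in the comment above can be checked without touching real credentials. The sketch below is only an illustration: the section contents and token values are made up, and `read_string` stands in for reading the actual .ini file. It also shows one possible alternative, turning interpolation off, in case you prefer to paste the token unescaped.

```python
import configparser

# Made-up credentials for illustration only; note the doubled "%%".
SAMPLE_INI = """
[twitter]
api_key = abc123
bearer_token = AAAA%%2FBBBB%%3D
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)

# The default (basic) interpolation turns each "%%" back into a single "%".
bearer_token = config["twitter"]["bearer_token"]
print(bearer_token)

# Alternative: disable interpolation and store the token unescaped.
raw = configparser.ConfigParser(interpolation=None)
raw.read_string("[twitter]\nbearer_token = AAAA%2FBBBB%3D\n")
raw_token = raw["twitter"]["bearer_token"]
```

Both parsers end up with the same token string, so either convention works as long as the file and the parser agree.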

### 1. Grabbing some tweets with Twython

We create a connection and perform the search:

In [27]:
connection = Twython(
    api_key,
    api_key_secret
)

In [28]:
results = connection.search(q='#ibrahimovic',count=10)

`results` is a dictionary. The actual tweets are stored under the key `'statuses'`, which holds a list of dictionaries, one per tweet; in each of them, the key `'text'` gives the text of the tweet.

In [29]:
for i,tweet in enumerate(results['statuses']):
    print('\n')
    print(f'Tweet{i+1}')
    print(tweet['text'])



Tweet1
RT @ViVA__SPORT: مترجم || 

- باتو ييروي قصة اللحظه التي كاد أن يقتل فيها ابراهيموفيتش من الامريكي اونييو .

#Ibrahimovic #onyewu https://t…


Tweet2
RT @yt_GBH2: Amazing Goals Compilation in eFootball 2023 🎮⚽ | Christmas Special 🎄🎅

#efootball2023 #music #konami #football #comp #highligh…


Tweet3
RT @FutebolVerdad: Pena de quem não viveu isso...
Olhem o nível desses times.

#derbydellamadonnina #calcio #acmilan #Internazionale #ronal…


Tweet4
Sono uno dei pochi a cui #StanotteAMilano non è piaciuto.
Mi piace #Milano, apprezzo #albertoangela e le sue trasmi… https://t.co/waeKbG3KHR


Tweet5
@Encu5Futbol 🇪🇸Real Madrid: Luka #Modric
🇪🇸Atlético Madrid: Nahuel #Molinas
🇩🇪Bayern: Salió #Mane
🏴󠁧󠁢󠁥󠁮󠁧󠁿Chelsea:… https://t.co/BHrIYxWH7u


Tweet6
RT @casamilanisti: Il 27 dicembre 2019 il Milan annunciava il ritorno di #Ibrahimovic.

Il resto è STORIA 🔴⚫️ https://t.co/1uWivLZsPU


Tweet7
Pena de quem não viveu isso...
Olhem o nível desses times.

#derbydellamadonnina #calcio #acmilan #I
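
Several of the results above are retweets. As a minimal sketch of how the nested structure of the response can be used to skip them, the example below works on a hand-written, heavily trimmed stand-in for the search response (`mock_results` and the helper `original_tweets` are hypothetical names, and the tweets in it are invented). It assumes that retweets carry a `'retweeted_status'` key and that the author's handle sits under `'user'` → `'screen_name'`.

```python
# Hypothetical, heavily trimmed stand-in for what connection.search() returns.
mock_results = {
    "statuses": [
        {
            "text": "RT @someone: great goal #ibrahimovic",
            "user": {"screen_name": "fan_one"},
            "retweeted_status": {"text": "great goal #ibrahimovic"},
        },
        {
            "text": "Il resto è STORIA #Ibrahimovic",
            "user": {"screen_name": "fan_two"},
        },
    ],
    "search_metadata": {"count": 10},
}


def original_tweets(results):
    """Return (screen_name, text) pairs, skipping retweets."""
    return [
        (status["user"]["screen_name"], status["text"])
        for status in results["statuses"]
        if "retweeted_status" not in status
    ]


print(original_tweets(mock_results))
```

The same function should work on the real `results` object, provided the fields assumed here are present in the response.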

### 2. Get the tweets of the man himself with Tweepy

We will focus on the `Client` class, which connects to version 2 of the Twitter API.

Let's authenticate:

In [63]:
client = tweepy.Client(bearer_token=bearer_token) # a bearer token alone gives app-only, read-only access

This time, we will make fuller use of the API's query operators. In particular:
* We will search for tweets from Ibra's official account only
* We will exclude retweets
* We'll grab tweets in English only

In [73]:
query = 'from:Ibra_official -is:retweet lang:en'
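
Query strings like this one get fiddly once more conditions are added. Here is a minimal sketch of a helper that assembles them from parts. `build_query` is a hypothetical function, not part of Tweepy; it only relies on the operators used in this lesson (`from:`, `-is:retweet`, `lang:`, `OR`, and parentheses for grouping), with a space between terms meaning "and".

```python
def build_query(account=None, hashtags=(), exclude_retweets=False, lang=None):
    """Assemble a search query string from optional parts."""
    parts = []
    if account:
        parts.append(f"from:{account}")
    if hashtags:
        joined = " OR ".join(f"#{tag}" for tag in hashtags)
        # Group alternatives so that OR does not swallow the other terms.
        parts.append(f"({joined})" if len(hashtags) > 1 else joined)
    if exclude_retweets:
        parts.append("-is:retweet")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


query = build_query(account="Ibra_official", exclude_retweets=True, lang="en")
print(query)
```

Calling `build_query(hashtags=["zlatan", "ibrahimovic"], lang="en")` gives a grouped `OR` query instead.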

In [74]:
tweets = client.search_recent_tweets(query=query,max_results=10)

In [75]:
for i,tweet in enumerate(tweets.data):
    print(f'\n**Tweet {i+1}**\n',tweet.text)



**Tweet 1**
 Gotham City https://t.co/p9W9rVpff8

**Tweet 2**
 Merry ChristmaZ to you and your wifes! Ciao https://t.co/xM3gIjrEZr

**Tweet 3**
 Thank you for an unforgettable visit and a World Cup 2022 Qatar 🇶🇦 final that will remain in the history books forever https://t.co/daGdBz1wGM


Notice that, even with the Elevated access we requested, we can only grab tweets from the last 7 days, and at most 100 tweets per request! Tweepy's `Paginator` class helps with the second limit by fetching page after page for us. Let's also grab some more metadata about each tweet, and let's use logical operators within the query:

In [83]:
for i,tweet in enumerate(tweepy.Paginator(client.search_recent_tweets, query='(#zlatan #ibrahimovic) lang:en',
                              tweet_fields=['context_annotations', 'created_at'], max_results=100).flatten(limit=10000)):
    print(f'\n**Tweet {i+1}**\n',tweet.text)
    print(tweet.created_at)


**Tweet 1**
 RT @yt_GBH2: Amazing Goals Compilation in eFootball 2023 🎮⚽ | Christmas Special 🎄🎅

#efootball2023 #music #konami #football #comp #highligh…
2022-12-28 12:54:05+00:00

**Tweet 2**
 Waiting for AC Milan #milan #milano #acmilan #sansiro #rossoneri #redblack #devil #puma #soccer #championsleague #uefa #seriea #pioliisonfire #pioli #weareacmilan #football #shop #storeonline #ibrahimovic #ibra #zlatan #zlatanibrahimovic #leao #maldini #tbt #curvasud https://t.co/2XypDwbOHI
2022-12-27 17:49:48+00:00

**Tweet 3**
 RT @yt_GBH2: Amazing Goals Compilation in eFootball 2023 🎮⚽ | Christmas Special 🎄🎅

#efootball2023 #music #konami #football #comp #highligh…
2022-12-27 03:02:53+00:00

**Tweet 4**
 What Religion Does Footballer Zlatan Ibrahimovic Follow #ibrahimovic #zlatan #football #zlatanibrahomovic #muslim #christian https://t.co/3Blg16BZ5q
2022-12-26 15:21:05+00:00

**Tweet 5**
 RT @MilanPosts: 📰 #CorSport: As for #Zlatan #Ibrahimovic's recovery, no date has been set for his retur
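
To make the pagination less of a black box, here is a rough sketch of the idea behind it, not Tweepy's actual implementation. Each response carries a `next_token` in its `meta` block as long as more results exist, and the next request passes that token back. `FAKE_PAGES` and `fetch_page` are made-up stand-ins for the API, so the example runs without credentials.

```python
# Hypothetical in-memory "API": three pages linked by next_token.
FAKE_PAGES = {
    None: {"data": ["t1", "t2"], "meta": {"next_token": "a"}},
    "a": {"data": ["t3", "t4"], "meta": {"next_token": "b"}},
    "b": {"data": ["t5"], "meta": {}},
}


def fetch_page(next_token=None):
    """Stand-in for one search request; not a real Tweepy call."""
    return FAKE_PAGES[next_token]


def iter_tweets(limit):
    """Yield up to `limit` tweets, following next_token from page to page."""
    token, seen = None, 0
    while seen < limit:
        page = fetch_page(token)
        for tweet in page.get("data", []):
            if seen >= limit:
                return
            yield tweet
            seen += 1
        token = page["meta"].get("next_token")
        if token is None:  # last page reached
            return


print(list(iter_tweets(limit=4)))
```

The `limit` argument plays the same role as the `limit` passed to `flatten` above: iteration stops either when the cap is hit or when the API stops returning a `next_token`.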

In order to grab even more tweets and run some meaningful analysis, we would need to apply for the **Academic Research product track** within the Twitter Developer space. Given that we are instead well-paid consultants, we simply grabbed recent tweets, and will use a "hackier" solution to extract all historical tweets about Zlatan...

### 3. Hack your way into Twitter data with snscrape

Snscrape is a scraper for social networks. It can scrape Twitter, Facebook, Instagram, and several others. Since it does not go through the official API, it needs no credentials and is not bound by the API's 7-day window or its cap on the number of tweets.

In [3]:
# Create a list to collect the attributes of each tweet
attributes_container = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('(#zlatan OR #ibrahimovic OR #zlatanibrahimovic OR @Ibra_official) lang:en').get_items()):
    if i >= 100000:
        break
    attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])

# Creating a dataframe from the tweets list above 
tweets_df = pd.DataFrame(attributes_container, columns=["Username", "Date Created", "Number of Likes", "Source of Tweet", "Text"])

Snscrape will start grabbing the most recent tweets first.
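
A consequence worth keeping in mind: because items arrive newest first, capping the number of items keeps only the most recent stretch of tweets. The sketch below illustrates this with a made-up generator (`fake_scraper` is hypothetical and merely imitates the newest-first ordering), and shows `itertools.islice` as one idiomatic alternative to the `enumerate` / `break` pattern used above.

```python
from datetime import date, timedelta
from itertools import islice


def fake_scraper(newest=date(2022, 12, 28)):
    """Hypothetical stand-in for get_items(): yields dates newest first, forever."""
    day = newest
    while True:
        yield day
        day -= timedelta(days=1)


# Keep only the first 3 items; the generator is never run past that point.
capped = list(islice(fake_scraper(), 3))
oldest_reached = capped[-1]
print(capped, oldest_reached)
```

Checking the oldest date actually reached is a quick way to see how far back a given cap takes you.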

In [15]:
tweets_df.to_csv(r'H:\Data science\TwiZlatan\Twitter data.csv')

### 4. Sentiment analysis

* Data cleaning
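
As a starting point for the cleaning step, here is a minimal sketch of one possible approach, not a definitive recipe. It strips the retweet prefix, links, and mentions, drops the `#` sign while keeping the hashtag word, and collapses whitespace. The helper `clean_tweet` and the regular expressions are assumptions made for illustration; what counts as noise depends on the sentiment tool used afterwards.

```python
import re

RT_PREFIX_RE = re.compile(r"^RT\s+@\w+:\s*")  # leading "RT @user: "
URL_RE = re.compile(r"https?://\S+")          # links, including truncated ones
MENTION_RE = re.compile(r"@\w+")              # @handles


def clean_tweet(text: str) -> str:
    """Remove retweet prefix, URLs and mentions; keep hashtag words."""
    text = RT_PREFIX_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(text.split())


print(clean_tweet("RT @casamilanisti: Il resto è STORIA https://t.co/1uWivLZsPU"))
```

On the scraped data, this could be applied to the `Text` column of `tweets_df`, for instance with `tweets_df["Text"].apply(clean_tweet)`.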

### References