# Final Assignment Correction

## Subject

1. Install `minet` and `ural` using `pip`.
2. Using `minet` (relevant documentation [here](https://github.com/medialab/minet/blob/master/docs/twitter.md)), scrape all tweets matching the `(blocage OR occupation) sciencespo` query and store them in a list (this should amount to roughly 1,300 tweets). As this can take a few minutes, print the number of tweets scraped so far every 100 tweets, so you get feedback on what is happening.
3. Find the top tweet by number of retweets and display its text, its author's screen name and its number of retweets.
4. Find the 10 accounts that tweeted the most in our corpus (you should rely on the `user_screen_name` key).
5. Find the 10 most influential accounts by total number of retweets in our corpus.
6. For nonzero values, compute descriptive statistics (min, max, mean, median, stdev) of the number of retweets in our corpus (remember that Python has a [`statistics`](https://docs.python.org/3/library/statistics.html) module).
7. Write a CSV file using a [`csv.DictWriter`](https://docs.python.org/3.6/library/csv.html#csv.DictWriter) with two columns, `url` and `count`, containing all the shared URLs along with the number of times they were shared in our corpus (use the `links` key, which contains the URLs shared by a single tweet).
8. Find the 10 most shared URLs.
9. Find the 25 most used hashtags (check out the `hashtags` key).
10. Try running your own queries on the dataset to better explore it.
11. Write a concise paragraph explaining what you understood of the subject at hand and the way people spoke of it on Twitter, based on your exploration of this corpus of tweets.

## Answers

In [1]:
# 1.
!pip install minet ural


In [2]:
# Imports
import csv
from minet.twitter import TwitterAPIScraper
from statistics import mean, median, stdev
from collections import Counter

In [3]:
# 2.
scraper = TwitterAPIScraper()

TWEETS = []
nb = 0

for tweet in scraper.search_tweets('(blocage OR occupation) sciencespo'):
    TWEETS.append(tweet)
    
    nb += 1
    
    # Using the modulo operator to print a message every 100 tweets
    if nb % 100 == 0:
        print('Retrieved', nb, 'tweets')
        
len(TWEETS)

Retrieved 100 tweets
Retrieved 200 tweets
Retrieved 300 tweets
Retrieved 400 tweets
Retrieved 500 tweets
Retrieved 600 tweets
Retrieved 700 tweets
Retrieved 800 tweets
Retrieved 900 tweets
Retrieved 1000 tweets
Retrieved 1100 tweets
Retrieved 1200 tweets


1282
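
The counting logic above does not depend on the scraper itself, so it can be sketched on its own. The version below is a minimal illustration that uses `enumerate` instead of a manual counter. The helper name `collect_with_progress` and the fake records are assumptions made for this example; they are not part of `minet`.

```python
def collect_with_progress(items, every=100):
    """Collect items from any iterable, printing a message every `every` items."""
    collected = []

    # enumerate(..., start=1) yields a running count alongside each item
    for count, item in enumerate(items, start=1):
        collected.append(item)

        if count % every == 0:
            print('Retrieved', count, 'items')

    return collected


# Stand-in for the scraper: any iterable works, including a generator
fake_tweets = ({'id': str(i)} for i in range(250))
sample = collect_with_progress(fake_tweets)
print(len(sample))
```

With the real scraper, the generator returned by `scraper.search_tweets(...)` could presumably be passed in place of `fake_tweets`.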

In [4]:
TWEETS[0]

{'id': '1524071820466569218',
 'local_time': '2022-05-10T17:00:12',
 'timestamp_utc': 1652202012,
 'text': '@marsaturanus @Antoine_pepito Précision tout de même: Quentin a 18ans, il est en L1 à sciences po. Pas forcément armé pour répondre au niveau d’un ingénieur. Ça ne justifie pas le blocage mais bon… ;)',
 'url': 'https://twitter.com/MonlyAdam/status/1524071820466569218',
 'quoted_id': None,
 'quoted_user': None,
 'quoted_user_id': None,
 'quoted_timestamp_utc': None,
 'retweeted_id': None,
 'retweeted_user': None,
 'retweeted_user_id': None,
 'retweeted_timestamp_utc': None,
 'media_files': [],
 'media_types': [],
 'media_urls': [],
 'links': [],
 'links_to_resolve': False,
 'domains': [],
 'hashtags': [],
 'mentioned_ids': ['969903829', '740599442068373504'],
 'mentioned_names': ['antoine_pepito', 'marsaturanus'],
 'collection_time': '2022-05-24T15:36:50.607172',
 'match_query': True,
 'collected_via': ['scraping'],
 'coordinates': None,
 'to_tweetid': '1524049172143427585',
 'to

In [5]:
# 3.
top_tweet = max(TWEETS, key=lambda tweet: tweet['retweet_count'])
print('Text:', top_tweet['text'])
print('User:', top_tweet['user_screen_name'])
print('Retweets:', top_tweet['retweet_count'])

Text: 💥📣OCCUPATION EN COURS A SCIENCES PO !  
Après l' #ENS et la #Sorbonne, c'est au tour des étudiant-es de SciencesPo à #Paris de se mobiliser. 
Ni Le Pen, ni Macron, contre la précarité, le déni écologique et la violation des droits humains, la jeunesse s'organise ! https://twitter.com/sarah_chp/status/1514499001948774403/photo/1
User: sarah_chp
Retweets: 357
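
`max` with a `key` returns a single tweet. If you want the few best tweets rather than only the first one, one possible approach is `heapq.nlargest` from the standard library, which accepts the same kind of `key` function. The records below are made up for illustration and only mimic the `user_screen_name` and `retweet_count` keys used above.

```python
from heapq import nlargest

# Made-up records, not taken from the corpus
sample_tweets = [
    {'user_screen_name': 'alice', 'retweet_count': 12},
    {'user_screen_name': 'bob', 'retweet_count': 357},
    {'user_screen_name': 'carol', 'retweet_count': 0},
    {'user_screen_name': 'dave', 'retweet_count': 48},
]

# Keep the 3 records with the highest retweet count, in decreasing order
top_three = nlargest(3, sample_tweets, key=lambda tweet: tweet['retweet_count'])

for tweet in top_three:
    print(tweet['user_screen_name'], tweet['retweet_count'])
```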


In [6]:
# 4.
top_accounts = Counter(tweet['user_screen_name'] for tweet in TWEETS)
top_accounts.most_common(10)

[('sciencespo', 19),
 ('MatthDes', 18),
 ('DavidLibeau', 17),
 ('LaPeniche', 15),
 ('ubellier', 15),
 ('SieurHibou', 11),
 ('st3vK', 8),
 ('wnewspresse', 7),
 ('ActusNonStop', 6),
 ('unisciencespo', 6)]

In [7]:
# 5.
influent_accounts = Counter()

for tweet in TWEETS:
    influent_accounts[tweet['user_screen_name']] += tweet['retweet_count']

influent_accounts.most_common(10)

[('sarah_chp', 357),
 ('Anton1Ferreira', 349),
 ('RemyBuisine', 294),
 ('LarrereMathilde', 290),
 ('sciencespo', 271),
 ('LesNews', 239),
 ('LEtudiant_Libre', 222),
 ('Valeurs', 199),
 ('InsoumisJeunes', 181),
 ('UrsusArctos92', 166)]

In [8]:
# 6.
nonzero_retweets = [tweet['retweet_count'] for tweet in TWEETS if tweet['retweet_count'] != 0]
print('min', min(nonzero_retweets))
print('max', max(nonzero_retweets))
print('mean', mean(nonzero_retweets))
print('median', median(nonzero_retweets))
print('stdev', stdev(nonzero_retweets))

min 1
max 357
mean 14.649048625792812
median 3
stdev 36.38721086534486


In [9]:
# 7.
urls = Counter()

for tweet in TWEETS:
    for url in tweet['links']:
        urls[url] += 1
        
with open('../data/urls.csv', 'w', encoding='utf-8', newline='') as f:
    # The subject asks for two columns named `url` and `count`
    writer = csv.DictWriter(f, fieldnames=['url', 'count'])
    writer.writeheader()
    
    # NOTE: if you don't give a number to most_common, it will iterate over
    # all items in decreasing count
    for url, count in urls.most_common():
        writer.writerow({'url': url, 'count': count})
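
To check that a file like this is well formed, you can read it back with `csv.DictReader`. The sketch below is a self-contained round trip that writes to an in-memory `io.StringIO` buffer instead of a real file, using the `url` and `count` column names given in the subject. The two URLs are placeholders, not values from the corpus.

```python
import csv
import io
from collections import Counter

# Placeholder data standing in for the real Counter of shared URLs
sample_urls = Counter({'https://example.com/a': 3, 'https://example.com/b': 1})

# Write to an in-memory buffer rather than to a file on disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['url', 'count'])
writer.writeheader()

for url, count in sample_urls.most_common():
    writer.writerow({'url': url, 'count': count})

# Rewind the buffer and read the rows back as dictionaries
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0])
```

Keep in mind that the `csv` module reads every value back as a string, so counts need an explicit `int(...)` conversion before you can do arithmetic on them.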

In [10]:
# 8.
urls.most_common(10)

[('https://youtu.be/hPzR5g-sdoU', 18),
 ('http://www.lemonde.fr/enseignement-superieur/article/2013/02/27/sciences-po-occupation-d-un-amphi-contre-la-procedure-de-succession_1840022_1473692.html',
  18),
 ('https://www.huffingtonpost.fr/2018/04/18/blocage-a-sciencespo-reconnaitrez-vous-la-replique-de-film-culte-citee-par-cet-etudiant_a_23414065/',
  12),
 ('http://www.lemonde.fr/campus/article/2018/04/20/a-sciences-po-paris-le-blocage-etudiant-leve-apres-des-negociations_5288336_4401467.html',
  10),
 ('https://www.ouest-france.fr/education/universites/blocage-des-universites-sciences-po-bloque-paris-et-rennes-5705238',
  7),
 ('http://lemde.fr/13Z0Vjv', 7),
 ('http://militant.es', 6),
 ('https://www.leparisien.fr/elections/presidentielle/presidentielle-occupation-de-la-sorbonne-lens-a-paris-et-de-sciences-po-nancy-par-des-etudiants-en-colere-13-04-2022-MP76WOAVI5ANVISJ7ELSFONBGE.phpnull',
  6),
 ('http://www.revolutionpermanente.fr/L-occupation-de-Sciences-Po-Toulouse-votee-a-une-ecra

In [11]:
# 9.
hashtags = Counter()

for tweet in TWEETS:
    for hashtag in tweet['hashtags']:
        hashtags[hashtag] += 1

hashtags.most_common(25)

[('sciencespo', 222),
 ('blocage', 71),
 ('nuitdebout', 58),
 ('periscope', 55),
 ('occupation', 21),
 ('paris', 21),
 ('loitravail', 19),
 ('sorbonne', 17),
 ('blocus', 16),
 ('agsciencespo', 16),
 ('rennes', 14),
 ('occupyscpo', 14),
 ('toulouse', 11),
 ('tolbiac', 9),
 ('macron', 8),
 ('occupyboutmy', 8),
 ('giletsjaunes', 7),
 ('sciencespofacoccupee', 7),
 ('agloitravail', 7),
 ('sciences', 6),
 ('nonalaselection', 6),
 ('lgophilippot', 6),
 ('étudiants', 5),
 ('unef', 5),
 ('grenoble', 5)]
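
For question 10, one possible exploration is to count tweets per year, since hashtags such as `nuitdebout`, `loitravail` and `giletsjaunes` suggest that the corpus spans several distinct protest episodes. The sketch below assumes that `local_time` is always an ISO 8601 string, as in the `TWEETS[0]` output above, and runs on made-up records rather than on the real corpus.

```python
from collections import Counter

# Made-up records mimicking the `local_time` and `hashtags` keys of a tweet
sample_tweets = [
    {'local_time': '2016-04-01T10:00:00', 'hashtags': ['nuitdebout']},
    {'local_time': '2018-04-18T09:30:00', 'hashtags': ['blocage', 'sciencespo']},
    {'local_time': '2018-04-20T12:00:00', 'hashtags': []},
    {'local_time': '2022-04-14T08:15:00', 'hashtags': ['occupation']},
]

# The first four characters of an ISO 8601 timestamp are the year
tweets_per_year = Counter(tweet['local_time'][:4] for tweet in sample_tweets)

for year, count in sorted(tweets_per_year.items()):
    print(year, count)
```

Replacing `sample_tweets` with `TWEETS` should give the same breakdown for the full corpus, provided the assumption about `local_time` holds.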