# Lab 1: Getting Data

What we will do:

1. Explain this programming environment
2. Scrape some Tweets based on a keyword search using the *minet* package
3. Use the pandas package to explore the data and generate some descriptive statistics and visualisations (unfortunately no networks today)
4. Learn some Python and command line principles along the way (if you don't know them already)

There will be two versions of this so-called Jupyter Notebook for you to follow along with:

* One already filled out, in case you would rather pay attention to other things than typing, or want to alter the code to try new things.
* Another one with the code cells left empty, so you can practise your Python typing skills alongside the lecturer (or maybe even find better solutions to the given problems).

But now let's start.

## Get to know the minet package

Let's check whether minet is correctly set up in this programming environment.

The output of this cell should be something like `minet 0.67.1`.

In [1]:
!minet --version

minet 0.67.1
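
The `!` at the start of these cells tells Jupyter to hand the line to the system shell instead of running it as Python. If you ever want the same check from a plain Python script, a minimal sketch using only the standard library could look like this; the helper name `minet_version` is our own and not part of minet.

```python
import shutil
import subprocess


def minet_version():
    """Return what `minet --version` prints, or None if minet is not installed."""
    # shutil.which looks the executable up on PATH, like the shell would
    if shutil.which("minet") is None:
        return None
    # Run the command without a shell and capture its output as text
    result = subprocess.run(
        ["minet", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


# Prints the version string, or None if minet is not on PATH
print(minet_version())
```

Passing the command as a list (rather than one string with `shell=True`) avoids having to think about shell quoting.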

Let's call for help.

In [2]:
!minet --help

usage: minet [-h] [--version]
             {buzzsumo,bz,cookies,crawl,crowdtangle,ct,extract,facebook,fb,fetch,google,hyphe,instagram,insta,mediacloud,mc,resolve,scrape,telegram,tl,tiktok,tk,twitter,tw,url-extract,url-join,url-parse,youtube,yt,help}
             ...

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

actions:
  {buzzsumo,bz,cookies,crawl,crowdtangle,ct,extract,facebook,fb,fetch,google,hyphe,instagram,insta,mediacloud,mc,resolve,scrape,telegram,tl,tiktok,tk,twitter,tw,url-extract,url-join,url-parse,youtube,yt,help}
                        Action to execute

We actually want Twitter data, so let's try that:

In [3]:
!minet twitter

usage: minet twitter [-h] [--rcfile RCFILE]
                     {friends,followers,list-followers,list-members,retweeters,scrape,tweet-date,users,user-tweets,tweets,attrition,tweet-search,tweet-count,user-search}
                     ...

Minet Twitter Command

Gather data from Twitter.

options:
  -h, --help                                      show this help message and exit
  --rcfile RCFILE                                 Custom path to a minet configuration file.

actions:
  {friends,followers,list-followers,list-members,retweeters,scrape,tweet-date,users,user-tweets,tweets,attrition,tweet-search,tweet-count,user-search}
                                                  Action to perform.

We are not sure whether the Twitter API still works, so we choose scraping instead.

In [5]:
!minet twitter scrape -h

usage: minet twitter scrape [-h] [--rcfile RCFILE] [--include-refs] [-l LIMIT]
                            [-o OUTPUT] [--query-template QUERY_TEMPLATE]
                            [-s SELECT]
                            {tweets,users} query [file]

Minet Twitter Scrape Command

Scrape Twitter's public facing search API to collect tweets or users.

Be sure to check Twitter's advanced search to check what kind of
operators you can use to tune your queries (time range, hashtags,
mentions, boolean etc.):
https://twitter.com/search-advanced?f=live

Useful operators include "since" and "until" to search specific
time ranges like so: "since:2014-01-01 until:2017-12-31".

positional arguments:
  {tweets,users}                   What to scrape. Currently only `tweets` and `users` are possible.
  query                            Search query or name of the column containing queries to run in given CSV file.
  file                             Optional CSV file containing the queries to be run.
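
The help text suggests tuning queries with operators such as `since:` and `until:`. Since our queries will grow step by step, a small helper that assembles them can save typing. This is a hypothetical helper (`build_query` is our own name, not a minet function): it joins alternatives with `OR` and groups with `AND`, mirroring the queries used below, and leaves the interpretation of the string to Twitter's search.

```python
from datetime import date


def build_query(groups, since=None, until=None):
    """Build a Twitter search string from groups of alternative terms.

    Terms inside a group are combined with OR, groups with AND.
    Optional `since`/`until` dates become Twitter's date operators.
    """
    query = " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)
    if since is not None:
        query += f" since:{since.isoformat()}"
    if until is not None:
        query += f" until:{until.isoformat()}"
    return query


q = build_query(
    [["Ukraine Germany"], ["tank", "tanks", "leopard"]],
    since=date(2023, 1, 1),  # example date only
)
print(q)  # (Ukraine Germany) AND (tank OR tanks OR leopard) since:2023-01-01
```

In a notebook cell you could then run `!minet twitter scrape tweets -l 10 "{q}"`, because IPython replaces `{q}` with the value of the Python variable before calling the shell.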



We're interested in discussions about Germany giving battle tanks to Ukraine. So let's first scrape 10 tweets containing the word `Leopard` (the name of the German tank model most requested by Ukraine), just to test our query.

In [7]:
!minet twitter scrape tweets -l 10 "Leopard"

query,id,timestamp_utc,local_time,user_screen_name,text,possibly_sensitive,retweet_count,like_count,reply_count,impression_count,lang,to_username,to_userid,to_tweetid,source_name,source_url,user_location,lat,lng,user_id,user_name,user_verified,user_description,user_url,user_image,user_tweets,user_followers,user_friends,user_likes,user_lists,user_created_at,user_timestamp_utc,collected_via,match_query,retweeted_id,retweeted_user,retweeted_user_id,retweeted_timestamp_utc,quoted_id,quoted_user,quoted_user_id,quoted_timestamp_utc,collection_time,url,place_country_code,place_name,place_type,place_coordinates,links,domains,media_urls,media_files,media_types,media_alt_texts,mentioned_names,mentioned_ids,hashtags,intervention_type,intervention_text,intervention_url
Searching for "Leopard"                                                         
Leopard,1635682232093364224,1678812008,2023-03-14T16:40:08,99leopard99,"@LEAD_Coalition As an Alzheimers caregiver, this book was instrumental in my un
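
minet prints its results as CSV, one tweet per row, with columns such as `id`, `timestamp_utc`, `local_time` and `text`. Adding `-o` (see the help above) writes them to a file instead, e.g. `!minet twitter scrape tweets -l 10 "Leopard" -o leopard.csv` (the file name is our choice). A minimal sketch, using only the standard library, that reads such a file and turns `timestamp_utc` into a datetime; it assumes `timestamp_utc` is Unix epoch seconds, which fits the first row above (1678812008 corresponds to the `local_time` 2023-03-14T16:40:08). `read_minet_csv` is a hypothetical helper; later in the lab pandas takes over this job.

```python
import csv
from datetime import datetime, timezone


def read_minet_csv(path):
    """Read a minet CSV file into a list of dicts, adding `datetime_utc`.

    All original columns stay strings, including the long tweet ids.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        # Assumption: timestamp_utc holds Unix epoch seconds, e.g. "1678812008"
        seconds = int(row["timestamp_utc"])
        row["datetime_utc"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return rows
```

Calling `read_minet_csv("leopard.csv")` returns one dict per tweet. Keeping ids as strings is a cautious choice: a 19-digit id like 1635682232093364224 loses precision if it ever gets converted to a float.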

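Judging by the first result (a tweet by an Alzheimer's caregiver about a book), the keyword alone also catches unrelated tweets, so we have to refine the query …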
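
Part of the problem may be that Twitter's search matches a keyword outside the tweet text (here possibly the screen name `99leopard99`). To see how many collected tweets really talk about our topic, we can check the `text` column ourselves. A minimal sketch with a hypothetical helper `mentions_all`: each group lists alternative terms, and a text passes only if every group has at least one term as a whole word, mirroring the AND/OR structure of the queries below. The example sentences are made up for illustration.

```python
import re


def mentions_all(text, groups):
    """Return True if `text` contains at least one term from every group.

    Matching is case-insensitive and on whole words, so "tank" does not
    match "tanker".
    """
    for terms in groups:
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE) is None:
            return False
    return True


topic = [["ukraine"], ["germany"], ["tank", "tanks", "leopard"]]
print(mentions_all("Will Germany send Leopard tanks to Ukraine?", topic))  # True
print(mentions_all("As an Alzheimers caregiver, this book helped me", topic))  # False
```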

In [9]:
!minet twitter scrape tweets -l 10 "(ukraine Germany) AND (tank OR tanks OR leopard)"

query,id,timestamp_utc,local_time,user_screen_name,text,possibly_sensitive,retweet_count,like_count,reply_count,impression_count,lang,to_username,to_userid,to_tweetid,source_name,source_url,user_location,lat,lng,user_id,user_name,user_verified,user_description,user_url,user_image,user_tweets,user_followers,user_friends,user_likes,user_lists,user_created_at,user_timestamp_utc,collected_via,match_query,retweeted_id,retweeted_user,retweeted_user_id,retweeted_timestamp_utc,quoted_id,quoted_user,quoted_user_id,quoted_timestamp_utc,collection_time,url,place_country_code,place_name,place_type,place_coordinates,links,domains,media_urls,media_files,media_types,media_alt_texts,mentioned_names,mentioned_ids,hashtags,intervention_type,intervention_text,intervention_url
Searching for "(ukraine Germany) AND (tank OR tanks OR leopard)"                
(ukraine Germany) AND (tank OR tanks OR leopard),1635682012638982160,1678811955,2023-03-14T16:39:15,Hkjhgc2,"BRUTAL ATTACK!! U artillery brigade destro

Meh, still not what we are looking for: this tweet is about an artillery attack, not about tank deliveries.

In [10]:
!minet twitter scrape tweets -l 10 "(Ukraine Germany) AND (tank OR tanks OR leopard) AND (deliver OR delivery OR delivers)"

query,id,timestamp_utc,local_time,user_screen_name,text,possibly_sensitive,retweet_count,like_count,reply_count,impression_count,lang,to_username,to_userid,to_tweetid,source_name,source_url,user_location,lat,lng,user_id,user_name,user_verified,user_description,user_url,user_image,user_tweets,user_followers,user_friends,user_likes,user_lists,user_created_at,user_timestamp_utc,collected_via,match_query,retweeted_id,retweeted_user,retweeted_user_id,retweeted_timestamp_utc,quoted_id,quoted_user,quoted_user_id,quoted_timestamp_utc,collection_time,url,place_country_code,place_name,place_type,place_coordinates,links,domains,media_urls,media_files,media_types,media_alt_texts,mentioned_names,mentioned_ids,hashtags,intervention_type,intervention_text,intervention_url
Searching for "(Ukraine Germany) AND (tank OR tanks OR leopard) AND (deliver OR delivery OR delivers)"
(Ukraine Germany) AND (tank OR tanks OR leopard) AND (deliver OR delivery OR delivers),1635529283845206016,1678775542,2023-03-14T
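
Refining the query cell by cell works, but according to the help text above `minet twitter scrape` can also read queries from a CSV file: `query` is then the name of the column holding the queries and `file` the path to the CSV. A minimal sketch that writes our three attempts into such a file (`queries.csv` is a name we chose), so they can be scraped in one run and told apart through the `query` column that minet puts in every result row.

```python
import csv

# The three queries tried in this section, from broad to narrow
queries = [
    "Leopard",
    "(ukraine Germany) AND (tank OR tanks OR leopard)",
    "(Ukraine Germany) AND (tank OR tanks OR leopard) AND (deliver OR delivery OR delivers)",
]

with open("queries.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["query"])  # header: the column name we pass to minet
    for q in queries:
        writer.writerow([q])
```

The file could then be used with `!minet twitter scrape tweets -l 10 query queries.csv -o comparison.csv` (output name again our choice). We have not checked whether `-l` then limits each query or the whole run.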