<a href="https://colab.research.google.com/github/chrdrn/digital-behavior-data-binder/blob/main/session_04-showcase_twitter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<IMG SRC="https://raw.githubusercontent.com/chrdrn/digital-behavioral-data/main/img/dbd_hex.png" WIDTH=15% HEIGHT=15% ALIGN="left" HSPACE="20" VSPACE="20" /> 
<h1>Disclaimer </h1>
<p>For reasons of research ethics and out of respect for privacy the data collected and processed in the course will be managed in a private OSF repository. Students will only have access to this OSF repository for a limited period of time. </p>
<p>Although the data collection is (at least) partially documented in the showcases, detailed instructions can be found in the course slides for the respective session. </p>
<p> Link: <a href="https://chrdrn.github.io/digital-behavioral-data/">https://chrdrn.github.io/digital-behavioral-data/</a>
</p>

<BR CLEAR="left" />

---
### <img src="https://icons.getbootstrap.com/assets/icons/info-circle-fill.svg" width="15" height="15"> Technical note

While the chunk outputs were saved, the underlying data were not. To run this notebook without errors, the data must be collected again and reloaded. All chunks in which the path to the data must be re-entered are marked with the following symbol: <img src="https://icons.getbootstrap.com/assets/icons/database-fill-down.svg" width="15" height="15">


---

### <img src="https://icons.getbootstrap.com/assets/icons/exclamation-triangle-fill.svg" width="15" height="15"> Important note
This session was created before Twitter announced changes to its API access:

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Starting February 9, we will no longer support free access to the Twitter API, both v2 and v1.1. A paid basic tier will be available instead 🧵</p>&mdash; Twitter Dev (@TwitterDev) <a href="https://twitter.com/TwitterDev/status/1621026986784337922?ref_src=twsrc%5Etfw">February 2, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

As things stand (28.02.2023), there will no longer be a free Twitter API for academic research.

----


# Background

This showcase demonstrates the practical use of the <img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/brands/twitter.svg" width="15" height="15"> *Twitter Academic Research Product Track v2 API endpoint* with the help of the [academictwitteR](https://github.com/cjbarrie/academictwitteR) package. Visit the <img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/brands/github.svg" width="15" height="15"> [repository](https://github.com/cjbarrie/academictwitteR) of the package for further information.


This version of the Twitter API allows researchers to access larger volumes of Twitter data. For more information on the Twitter API, including how to apply for access to the Academic Research Product Track, see the [Twitter Developer platform](https://developer.twitter.com/en/use-cases/do-research/academic-research).

## Preparation

Install the additional packages required for this session.

⚠ It might take a few minutes to install all packages and dependencies



In [None]:
if(!require("here")) {install.packages("here"); library("here")}
if(!require("academictwitteR")) {install.packages("academictwitteR"); library("academictwitteR")}
if(!require("lubridate")) {install.packages("lubridate"); library("lubridate")}
if(!require("sjmisc")) {install.packages("sjmisc"); library("sjmisc")}
if(!require("tidyverse")) {install.packages("tidyverse"); library("tidyverse")}
if(!require("quanteda")) {install.packages("quanteda"); library("quanteda")}
if(!require("quanteda.textstats")) {install.packages("quanteda.textstats"); library("quanteda.textstats")}
if(!require("quanteda.textplots")) {install.packages("quanteda.textplots"); library("quanteda.textplots")}
if(!require("ggthemes")) {install.packages("ggthemes"); library("ggthemes")}
if(!require("ggpubr")) {install.packages("ggpubr"); library("ggpubr")}


## Set personal bearer token


In [None]:
personal_bearer_token <- "INSERT BEARER TOKEN HERE"
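
Pasting the token directly into a shared notebook risks leaking it. As a minimal sketch, assuming the token has been stored in an environment variable beforehand, it can be read at run time instead. The helper `read_bearer_token()` and the variable name `DEMO_TWITTER_BEARER` are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: read the bearer token from an environment variable
# instead of typing it into the notebook. The variable name is an assumption.
read_bearer_token <- function(var = "TWITTER_BEARER") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is not set.", call. = FALSE)
  }
  token
}

# Demo with a placeholder value (not a real token)
Sys.setenv(DEMO_TWITTER_BEARER = "placeholder-not-a-real-token")
demo_token <- read_bearer_token("DEMO_TWITTER_BEARER")
```

In a real session, the returned value would be assigned to `personal_bearer_token`.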

## Mining tweets: hashtag(s)

### Data collection

In [None]:
# Collect all tweets containing #Karneval within the given time window (UTC)
get_all_tweets(
    query = "#Karneval", 
    start_tweets = "2022-11-11T00:00:00Z",
    end_tweets = "2022-11-13T12:00:00Z",
    file = "karneval",
    data_path = "data.local/raw_karneval/",
    n = 100000,
    bearer_token = personal_bearer_token
  )


#### Read data from local files


In [None]:
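# Combine the stored files into one tidy data frame and derive time variables
tweets_karneval <- bind_tweets(
  data_path = here("content/04-api_access-twitter/data.local/raw_karneval"),
  output_format = "tidy") %>% 
  mutate(
    datetime = ymd_hms(created_at),   # parse the timestamp
    date = date(datetime),
    hour = hour(datetime),
    min  = minute(datetime),
    hms  = hms::as_hms(datetime),     # time of day (HH:MM:SS)
    hm   = hms::parse_hm(hms)         # time of day at minute resolution (HH:MM)
  )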
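
To make the derived time variables more tangible, the following sketch reproduces them for two toy timestamps. It deliberately uses base R only so that it runs without the packages above. The timestamps are invented, and the format string assumes `created_at` values of the form `2022-11-11T11:11:30.000Z`.

```r
# Toy timestamps in the ISO 8601 format assumed for the `created_at` field
created_at <- c("2022-11-11T11:11:30.000Z", "2022-11-12T23:59:05.000Z")

# Parse as UTC date-times (base R counterpart of lubridate::ymd_hms)
datetime <- as.POSIXct(created_at, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Derive the same kind of variables as in the mutate() call above
time_features <- data.frame(
  date = as.Date(datetime, tz = "UTC"),
  hour = as.integer(format(datetime, "%H")),
  min  = as.integer(format(datetime, "%M")),
  hm   = format(datetime, "%H:%M")   # time of day at minute resolution
)
time_features
```

Here `hm` is a plain character column, whereas the notebook stores it as an `hms` object. For counting the most frequent sending times, both carry the same information.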

### Data analysis

#### Overview of data set

In [None]:
tweets_karneval %>% glimpse

#### Language of tweets

In [None]:
tweets_karneval %>% 
  frq(lang)

#### Tweets over time

In [None]:
tweets_karneval %>% 
  ggplot(aes(hour)) +
  geom_bar() +
  facet_grid(cols = vars(date)) +
  theme_pubr()

#### Most frequent time (HH:MM) of sending tweets


In [None]:
tweets_karneval %>%
  frq(hm,
      sort.frq = "desc", 
      min.frq = 10)

#### Users with the most tweets


In [None]:
tweets_karneval %>% 
  frq(user_username,
      sort.frq = "desc", 
      min.frq = 5)
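
The logic behind the two frequency tables above (count, sort in descending order, apply a minimum frequency) can be sketched in base R. The user names and the threshold `min_frq` are invented for illustration. Unlike this sketch, which simply drops rare values, `frq()` with `min.frq` is expected to group them into a remainder category.

```r
# Toy user names standing in for the `user_username` column
user_username <- c("anna", "ben", "anna", "cem", "anna", "ben", "dana")

# Count tweets per user and sort in descending order
counts <- sort(table(user_username), decreasing = TRUE)

# Keep only users with at least `min_frq` tweets (hypothetical threshold)
min_frq <- 2
frequent_users <- counts[counts >= min_frq]
frequent_users
```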

## Mining tweets: profile(s)


### Data collection


In [None]:
get_all_tweets(
    users = c("elonmusk"),
    start_tweets = "2020-11-11T00:00:00Z",
    end_tweets = "2022-11-13T12:00:00Z",
    file = "elonmusk",
    data_path = here("content/04-api_access-twitter/data.local/raw_elonmusk/"),
    n = 100000,
    bearer_token = personal_bearer_token
  )

#### Read data from local files


In [None]:
# Combine the stored files into one tidy data frame and derive time variables
tweets_musk <- bind_tweets(
  data_path = here("content/04-api_access-twitter/data.local/raw_elonmusk"),
  output_format = "tidy") %>% 
  mutate(
    datetime = ymd_hms(created_at),   # parse the timestamp
    date = date(datetime),
    hour = hour(datetime),
    min  = minute(datetime),
    hms  = hms::as_hms(datetime),     # time of day (HH:MM:SS)
    hm   = hms::parse_hm(hms)         # time of day at minute resolution (HH:MM)
  )

### Data analysis

#### Overview of dataset


In [None]:
tweets_musk %>% glimpse

#### Tweets over time


In [None]:
tweets_musk %>% 
  ggplot(aes(date)) +
  geom_bar() +
  theme_pubr()

#### Tweets with the most likes


In [None]:
tweets_musk %>% 
  filter(is.na(sourcetweet_type)) %>% 
  arrange(-like_count) %>% 
  select(text, created_at, like_count) %>% 
  head(10)

#### Tweets with the most retweets


In [None]:
tweets_musk %>% 
  filter(is.na(sourcetweet_type)) %>% 
  arrange(-retweet_count) %>% 
  select(text, created_at, retweet_count) %>% 
  head(10)

#### Proportion of tweet types


In [None]:
tweets_musk %>% 
  frq(sourcetweet_type)
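
Because tweets without a source tweet carry a missing value in `sourcetweet_type` (the filters above use `is.na()` to select them), the missing category has to be counted explicitly when computing shares. A base R sketch with invented values, where the type labels are assumptions:

```r
# Toy values standing in for `sourcetweet_type`; NA marks tweets without a source tweet
sourcetweet_type <- c("retweeted", NA, "quoted", NA, "replied_to", NA, "retweeted", NA)

# Count each type and keep NA as its own category
type_counts <- table(sourcetweet_type, useNA = "ifany")

# Convert counts to percentages
type_share <- round(100 * prop.table(type_counts), 1)
type_share
```

Without `useNA = "ifany"`, `table()` would silently drop the missing values and the shares would refer only to retweets, quotes and replies.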

#### Language of tweets


In [None]:
tweets_musk %>% 
  frq(lang)

## Text mining


### Preprocessing

In [None]:
remove_html <- "&amp;|&lt;|&gt;"

tweets_en <- tweets_musk %>% 
  filter(lang == "en",
         is.na(sourcetweet_type)) %>% 
  select(tweet_id, text, user_username) %>% 
  mutate(text = str_remove_all(text, remove_html))
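
The effect of the `remove_html` pattern can be checked on a toy string. The sketch below uses base R `gsub()` as a stand-in for `str_remove_all()`. The decoding variant is an alternative shown for comparison only, not the approach used in this notebook, and `decode_entities()` is a hypothetical helper.

```r
remove_html <- "&amp;|&lt;|&gt;"
toy_text <- "Tesla &amp; SpaceX: speed &gt; cost"

# Variant used above: delete the entities entirely (leaves double spaces)
removed <- gsub(remove_html, "", toy_text)

# Alternative: decode the entities to their literal characters
decode_entities <- function(x) {
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  gsub("&amp;", "&", x, fixed = TRUE)  # decode "&amp;" last to avoid double decoding
}
decoded <- decode_entities(toy_text)

removed
decoded
```

For the later token-based analysis the difference is small, since symbols and punctuation are removed during tokenisation anyway.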

In [None]:
tweets_en_corpus <- corpus(
  tweets_en,
  docid_field = "tweet_id",
  text_field = "text")

In [None]:
tweets_en_tokens <- 
  tokens(tweets_en_corpus,
         remove_punct = TRUE,
         remove_numbers = TRUE,
         remove_symbols = TRUE,
         remove_url = TRUE) %>% 
  tokens_tolower() %>% 
  tokens_remove(stopwords("english"))

In [None]:
tweets_en_dfm <- dfm(tweets_en_tokens)
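
A document-feature matrix holds one row per document (here: tweet), one column per feature (token), and the token counts in the cells. The following base R sketch builds such a matrix by hand for two invented documents. It only illustrates the data structure and is not how quanteda implements `dfm()`.

```r
# Two toy "documents" that are already lower-cased and cleaned
docs <- c(d1 = "mars rocket launch", d2 = "rocket rocket fuel")

# Tokenise on whitespace
toks <- strsplit(docs, " ", fixed = TRUE)

# Vocabulary: every distinct token across all documents
features <- sort(unique(unlist(toks)))

# Count each feature per document; rows = documents, columns = features
toy_dfm <- t(sapply(toks, function(tk) table(factor(tk, levels = features))))
toy_dfm

# Column sums give the overall term frequencies
colSums(toy_dfm)
```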

### Analysis


#### Top Hashtags


In [None]:
tag_dfm <- dfm_select(tweets_en_dfm, pattern = "#*")
toptag <- names(topfeatures(tag_dfm, 50))
head(toptag, 10)

#### Top Mentions


In [None]:
user_dfm <- dfm_select(tweets_en_dfm, pattern = "@*")
topuser <- names(topfeatures(user_dfm, 50))
head(topuser, 10)

#### Exclude Hashtags & Mentions


In [None]:
tweets_en_clean <- tweets_en_dfm %>% 
  dfm_remove(pattern = "@*") %>% 
  dfm_remove(pattern = "#*")
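
The three steps above (select hashtags, select mentions, remove both) rely on glob-style patterns such as `"#*"`, meaning "starts with `#`, followed by anything". As a rough sketch of the matching logic, the same patterns can be translated to regular expressions with `utils::glob2rx()` and applied to an invented token vector:

```r
# Toy tokens standing in for the features of the document-feature matrix
tokens_toy <- c("#mars", "@nasa", "rocket", "#tesla", "@spacex")

# Translate the glob patterns into regular expressions
hashtag_rx <- utils::glob2rx("#*")
mention_rx <- utils::glob2rx("@*")

is_hashtag <- grepl(hashtag_rx, tokens_toy)
is_mention <- grepl(mention_rx, tokens_toy)

hashtags <- tokens_toy[is_hashtag]                 # counterpart of dfm_select(pattern = "#*")
mentions <- tokens_toy[is_mention]                 # counterpart of dfm_select(pattern = "@*")
other    <- tokens_toy[!is_hashtag & !is_mention]  # counterpart of the two dfm_remove() calls
```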

#### Top 10 features


In [None]:
term_freq_en <- textstat_frequency(tweets_en_clean)
head(term_freq_en, n = 10)

#### Wordcloud with Top 50 features


In [None]:
textplot_wordcloud(tweets_en_clean, max_words = 50)