MonikaBarget/DistantReading

DistantReading

This repository contains teaching materials for distant reading in the humanities and social sciences.

Social media analysis with Twitter

Some of the scripts and data shared here relate to social media analysis with Twitter. The data were harvested via the Twitter academic API access.

Analysing public Mastodon data

As an alternative to Twitter, we also explore Mastodon. The endpoints for collecting public data from Mastodon are explained here: Playing with public data
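As a rough illustration of what collecting public Mastodon data can look like, the sketch below builds a request URL for the public timeline endpoint (`/api/v1/timelines/public`) and reduces each returned status to plain text. It is a minimal sketch, not the method used for the data sets in this repository. The function names and the instance name `mastodon.example` are hypothetical, and some instances may require authentication or limit access to this endpoint.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen


def public_timeline_url(instance, limit=20, local=False, max_id=None):
    """Build the URL for an instance's public timeline (helper name is hypothetical)."""
    params = {"limit": limit}
    if local:
        params["local"] = "true"   # only posts that originate on this instance
    if max_id:
        params["max_id"] = max_id  # page backwards: only posts older than this ID
    return f"https://{instance}/api/v1/timelines/public?{urlencode(params)}"


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(content):
    """Mastodon delivers post content as HTML; keep the text only."""
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()
    return "".join(parser.parts)


def fetch_public_posts(instance, limit=20):
    """Fetch one page of public posts and return simplified records."""
    with urlopen(public_timeline_url(instance, limit), timeout=30) as response:
        statuses = json.load(response)
    return [
        {"id": s["id"], "created_at": s["created_at"], "text": strip_html(s["content"])}
        for s in statuses
    ]
```

To collect more than one page, the `id` of the oldest post received can be passed as `max_id` in the next request.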

Analysing Apple Store reviews

These code and data samples highlight the opportunity of using reviews in the Apple Store for data analysis. Such reviews can relate to apps that users downloaded or podcasts that users listen to.

In the academic year of 2022/2023, I started experimenting with podcast reviews because one of my MA DC thesis students at Maastricht University was interested in them. The Apple podcast reviews turned out to be an exciting source because they are normally rich in content and relatively easy to scrape. The HTML of the Apple Store site is well-structured, and some developers have already built specialised Python packages to harvest the reviews.

One such useful package is the App Store Scraper for Apple, which was originally developed by Eric Lim and released under the MIT License.

Here is my own script using App Store Scraper to collect podcast reviews and write them to a CSV file:

https://github.com/MonikaBarget/DigitalHistory/blob/master/JupyterNotebooks/Webscraping_ApplePodcasts.ipynb
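The CSV-writing step can be sketched with the Python standard library alone. This is not the notebook linked above, only a minimal illustration. It assumes that the scraper has already returned each review as a dictionary; the key names in `FIELDS` are assumptions and should be checked against the actual output of the package, and the function name is hypothetical.

```python
import csv

# Assumed review keys; verify them against what the scraper actually returns.
FIELDS = ["date", "userName", "rating", "title", "review"]


def write_reviews_csv(reviews, path, fields=FIELDS):
    """Write an iterable of review dictionaries to a CSV file.

    Keys that are not listed in `fields` are dropped, and missing keys are
    written as empty cells. Returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for review in reviews:
            writer.writerow({field: review.get(field, "") for field in fields})
            count += 1
    return count
```

Opening the file with `newline=""` leaves line endings to the `csv` module, which avoids blank rows on Windows, and UTF-8 encoding keeps emojis and non-Latin characters in the review text intact.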

Here is a script by Lars Keuris @R14 using the same package but already cleaning the data along the way (e.g. by transforming emojis into punctuation marks only):

https://github.com/14RS/ScrapeApplePodcastReviews/blob/main/ScrapeApplePodcastReviewWithApp_Store_Scraper.ipynb/

Data cleaning

Up to about 50,000 social media posts, data can still be cleaned semi-manually in Excel or a browser-based cleaning tool. Larger data sets usually cause severe performance issues, so cleaning via script is recommended. The following script permits the cleaning of all kinds of social media data collected in .txt format, with a special focus on deleting @ signs, hashtags, URLs and emojis:

Tool recommended for cleaning plain text in your browser: TextCleanR

Tool recommended for cleaning structured data (e.g. in CSV and Excel format): OpenRefine
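Script-based cleaning of .txt files, as described above, can be sketched with regular expressions. This is a minimal sketch and not the repository's own cleaning script. It assumes one post per line, it removes whole @mentions and hashtags rather than only the signs, and its emoji ranges cover the most common Unicode blocks, not every emoji. All function names are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches @user as well as Mastodon-style @user@instance.tld handles.
MENTION_RE = re.compile(r"@\w+(?:@\w+(?:\.\w+)+)?")
HASHTAG_RE = re.compile(r"#\w+")
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport and related symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text):
    """Remove URLs, mentions, hashtags and emojis, then collapse whitespace."""
    # URLs go first so that "@" or "#" inside a link is not treated separately.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_file(in_path, out_path):
    """Clean a .txt file line by line, skipping lines that end up empty."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            cleaned = clean_post(line)
            if cleaned:
                dst.write(cleaned + "\n")
```

Because the file is processed line by line, memory use stays low even for collections that are too large to open in a spreadsheet.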

Overview of collected data sets and task sheets

The overview of the available data sets will be regularly updated. For some data sets, task sheets for student group work can also be found in this repository:

Sample analyses

The Sample analyses folder contains slides and presentations with interpretations of the data sets shared here. Most of the presentations were created by students of Digital Cultures at Maastricht University and were shared anonymously with their permission.

BA and MA projects with a distant reading component

The following repositories contain data and visualisations (mostly created with Voyant) from BA and MA theses written at the Faculty of Arts and Social Sciences (FASoS) in Maastricht:

This repository contains data sets and visualisations relating to the social media communication of the German "Querdenker" movement during the COVID-19 pandemic. The data were collected and analysed for the BA thesis in "Digital Society", submitted by Deborah Helmich at Maastricht University in 2022. The visualisations are based on distant reading with Voyant Tools. Data tables in Excel format need to be downloaded before viewing. CSV files can be opened and viewed directly on GitHub.

This repository contains data tables and charts created for the MA thesis written by Xing Yin (@sachixing) in 2022. The thesis covers diversity policies in higher education worldwide and places a special emphasis on internationalisation, ethnic diversity and de-colonization. All selected policy papers have been analysed with Voyant Tools. All the charts published here have been exported from Voyant. This thesis was submitted in the MA degree programme Digital Cultures at Maastricht University.

This repository contains data tables and charts created for the MA thesis written by Yin Nien Chiang in 2022. The thesis covers digitalization policies in party manifestos issued by parties of the political left in the United Kingdom. The party manifestos have been analysed with Voyant Tools. All the charts published here have been exported from Voyant. This thesis was submitted in the MA degree programme Digital Cultures at Maastricht University.

True Crime Podcast

This repository contains data and code created for Nicole Schanzmeyer's MA DC thesis project on True Crime Podcasts and their public perception in the academic year of 2022/2023. The repository is hosted on the UM GitLab instance and is only visible to members of Maastricht University.