
Lisa-Ho/three-investigators


About this project

This is a personal project that scrapes and analyses data on the German audio drama ‘Die drei ???’ (originally ‘The Three Investigators’). As a huge fan and regular listener since childhood, I was curious to find out more about the series while expanding my Python web-scraping skills.

Thanks to the fan site Rocky-Beach.com, I was able to collect data on each episode. I then used the Genderize API to predict the gender of the show's actors based on their first names.
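As a rough illustration of the gender prediction step, the sketch below queries the public Genderize endpoint for a single first name and reads the `gender` and `probability` fields of the JSON response. It is not the code from the notebook: the helper names (`build_url`, `parse_prediction`, `predict_gender`) and the 0.8 probability cut-off are assumptions made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GENDERIZE_URL = "https://api.genderize.io"


def build_url(first_name):
    """Build the request URL for one first name."""
    return f"{GENDERIZE_URL}?{urlencode({'name': first_name})}"


def parse_prediction(payload, min_probability=0.8):
    """Return the predicted gender, or 'unknown' for weak or missing predictions.

    The 0.8 threshold is an assumption for this sketch, not a value
    taken from the project.
    """
    gender = payload.get("gender")
    probability = payload.get("probability") or 0.0
    if gender is None or probability < min_probability:
        return "unknown"
    return gender


def predict_gender(first_name):
    """Call the API for one name (needs network access; rate limits apply)."""
    with urlopen(build_url(first_name), timeout=10) as response:
        payload = json.load(response)
    return parse_prediction(payload)
```

Keeping the parsing separate from the network call makes it possible to check the threshold logic without using up API requests.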

The write-up of the analysis can be found on my blog.

Notebooks

The project is organised across several Jupyter notebooks:

  1. Web scraping (data collection)
  2. Actor gender predictions using Genderize API
  3. Content analysis of titles
  4. Data cleaning and analysis

Requirements

This project runs on Python 3 and uses a number of data analysis packages. The exact packages are listed in requirements.txt.

Notes on methodology

Data cleaning

As with any data, I found some inconsistencies that required additional cleaning, for example renaming roles where different names were used for the same character.
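Role renaming of this kind can be sketched as a lookup from variant spellings to one canonical name. The alias table below is hypothetical and only illustrates the idea; the actual variants found in the data, and the way the notebook handles them, may differ.

```python
# Hypothetical alias table: variant role names mapped to one canonical name.
ROLE_ALIASES = {
    "Justus": "Justus Jonas",
    "Erster Detektiv": "Justus Jonas",
    "Peter": "Peter Shaw",
}


def normalise_role(role):
    """Collapse stray whitespace, then map known variants to the canonical name."""
    cleaned = " ".join(role.split())
    return ROLE_ALIASES.get(cleaned, cleaned)


roles = ["Justus", "  Erster Detektiv ", "Bob Andrews"]
print([normalise_role(r) for r in roles])
```

Names without an entry in the table pass through unchanged, so the mapping can be extended gradually as new inconsistencies turn up.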

The majority of data preparation involved deriving variables and merging different data sets.

Title categories

As part of the title content analysis, words found in episode titles were categorised into the following themes:

  • animal: katze, hund, löwe, papagei, spinne, schlange, wal, wolf, tiger, rabe, insekt, cobra, skorpion, motte, vögel, marder, ameise, hai, gockel
  • colour: grün, schwarz, gelb, blau, rot, weiß
  • danger: gefahr, schrecken, gefährlich, grauen
  • death: tot, grab, tod, tödlich, mumie, särge
  • ethnic: volk, wikinger, azteke, pirat, samurai
  • fire: feuer, brennen, flamme
  • mystery: rätsel, geheimnis, verschwunden, unsichtbar, verschollen, täuschung, heimlich
  • paranormal: drache, monster, geist, phantom, teufel, werwolf, spuk, vodoo, vampir, dämon, fluch, untote, hexe, kobold, biest, ungeheuer, jenseits, hölle, höllisch, unterwelt, ufo, magisch
  • person: diva, mönch, madonna, pilot, zauberer, bauchredner, hehler, gaukler, wächter, mann, passagier, segler, maler, filmstar, millionär
  • place: stadt, meer, see, straße, bucht, castle, dorf, fels, hollywood, canyon, villa, höhle, schlucht, berg, insel, turm, zirkus, ranch, riff, schloß, mine, tal, moor, haus
  • sport: spieler, fussball, fußball, skateboard, poker, quiz, foul
  • tech: computer, internet, email, sms, netz, gps, handy
  • treasure: gold, diamant, schatz, rubin
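One way to apply the keyword lists above is to lowercase each title and check whether any keyword of a theme occurs in it. The sketch below uses only a small subset of the keywords and assumes plain substring matching, which may not be how the notebook does it. Substring matching also catches compound words such as "Phantomsee", but it can over-match short keywords like "see" or "rot" inside unrelated words.

```python
# Subset of the themes and keywords listed above, for illustration only.
THEMES = {
    "animal": ["katze", "hund", "papagei"],
    "colour": ["grün", "schwarz", "rot"],
    "paranormal": ["phantom", "geist", "fluch"],
    "place": ["see", "insel", "schloß"],
}


def categorise_title(title, themes=THEMES):
    """Return every theme that has at least one keyword contained in the title.

    Assumption: case-insensitive substring matching. lower() is used rather
    than casefold() so that keywords containing 'ß' still match.
    """
    text = title.lower()
    return [
        theme
        for theme, keywords in themes.items()
        if any(keyword in text for keyword in keywords)
    ]


for title in ["Der Super-Papagei", "Der Phantomsee", "Der Karpatenhund"]:
    print(title, categorise_title(title))
```

A title can fall into several themes at once, which is why the function returns a list rather than a single label.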

About

Repository for scraping and analysing fan data on a German audio drama called 'Die Drei Fragezeichen' (the three investigators).
