acwilb/Digital-Humanities-Project

Beyond Beats: A Data-Driven, Humanistic View into the Marginalization of Black Artists and the Cultural Rejection of Hip Hop's Emergence

Collaborative project with UCLA classmates, completed January to March 2024 as our Digital Humanities final project.


For our Digital Humanities final project, the goal was to create a web-based digital presentation that explores and analyzes a topic from a variety of perspectives and with a variety of tools. Since this was an elective open to many majors, we drew on each member's expertise: those studying Data Science collaborated on the data cleaning, analysis, and visualization; those with a background in Web Development built a visually appealing website to display our findings; and those studying Humanities collected research literature on our topic. Integrating our data analysis findings with that literature gave us a holistic perspective on this humanistic topic.

We used a variety of tools over the course of this project, including WordPress, TimelineJS, Tableau, Zotero, R and its packages, Python and its packages, HTML, and CSS.

The data set we built upon was aggregated by user walkerkq at https://github.com/walkerkq/musiclyrics. It covers all Billboard Year-End Hot 100 entries from 1965 to 2015, includes the variables "Rank", "Song", "Artist", "Year", "Lyrics", and "Lyric Source", and was already cleaned and standardized. Our team collaboratively added an extra variable and more recent observations to this data set in order to answer our research question: "How have socio-cultural dynamics influenced the marginalization of Black artists and stigma surrounding the emergence of Hip Hop, as viewed through the lens of music?" We added the variable "Genre" to narrow our focus to the Hip Hop genre, and added songs from the past 8 years so that Hip Hop's recent impact was represented.

I used the R packages rvest and RCurl to perform web scraping and add more recent data to our data set. From Wikipedia, I scraped the artist and song names of each Billboard Year-End Hot 100 entry from 2016 to 2023. From letras.com, I scraped the lyrics of each song using a loop that takes the scraped song information (song name and artist) and generates predictable URL strings such as "letras.com/ARTIST/SONG.html". If a particular song's lyric page did not exist on letras.com, the loop recorded that the lyrics were not found and moved on to the next observation, so that missing pages would not skew our analysis of the data. After this recent data was scraped, the other Data Science majors on our team used the Spotify API from Python to retrieve each song's genre. Once the genres were added, we cleaned the genre column by categorizing observations such as "['pop rap', 'rap', 'southern hip hop', 'trap', 'viral trap']" into "Hip Hop" using regex and classification loops.
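The two steps above (building predictable lyric URLs while skipping missing pages, and collapsing Spotify genre lists into "Hip Hop") can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: the scraping was done in R, and the names `slugify`, `lyrics_url`, `collect_lyrics`, `classify_genre`, the `fetch` callable, the slug rules, the base URL prefix, and the keyword regex are all assumptions made for the example.

```python
import ast
import re
import unicodedata

# Assumed base URL; the README only gives the pattern "letras.com/ARTIST/SONG.html".
BASE_URL = "https://www.letras.com"

# Assumed keyword list; the project's actual regex is not given in the README.
HIP_HOP_PATTERN = re.compile(r"\b(hip hop|rap|trap|drill)\b", re.IGNORECASE)


def slugify(text):
    """Lowercase, drop accents, and join alphanumeric runs with hyphens (assumed rule)."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def lyrics_url(artist, song):
    """Build a URL following the ARTIST/SONG.html pattern."""
    return f"{BASE_URL}/{slugify(artist)}/{slugify(song)}.html"


def collect_lyrics(rows, fetch):
    """Try each (artist, song) pair; record None when the page is missing.

    `fetch` is a hypothetical callable that returns the lyrics for a URL
    and raises LookupError when the page does not exist.
    """
    results = []
    for artist, song in rows:
        try:
            lyrics = fetch(lyrics_url(artist, song))
        except LookupError:
            lyrics = None  # mark as not found and move on
        results.append((artist, song, lyrics))
    return results


def classify_genre(raw):
    """Map a raw genre string (a stringified list of tags) to a coarse label."""
    tags = ast.literal_eval(raw) if raw.strip().startswith("[") else [raw]
    if any(HIP_HOP_PATTERN.search(tag) for tag in tags):
        return "Hip Hop"
    return "Other"
```

Passing the fetch function in as a parameter keeps the loop itself free of network code, so the skip-on-missing behavior can be checked with a stub before running it against the real site.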

Our cleaned and updated data set contains six variables ("Rank", "Song", "Artist", "Year", "Lyrics", and "Genre") and 5880 observations spanning 1965 to 2023. I collaborated with my teammates to create visualizations from this data with R's ggplot2. I then used plotly to convert these into interactive HTML embeds for our digital presentation, which can be found at https://theafternoon.humspace.ucla.edu/narrative/.