**Assignment explanation:**

- An explanation of the central idea behind your final project (What is the idea? Why is it interesting? Which datasets did you need to explore the idea? How did you download them?)
- A walk-through of your preliminary data analysis, addressing:
  - What is the total size of your data? (MB, number of rows, number of variables, etc.)
  - What is the network you will be analyzing? (Number of nodes? Number of links? Degree distributions? What are the node attributes? etc.)
  - What is the text you will be analyzing?
  - How will you tie networks and text together in your paper?


The core idea of our project is to map the ecosystem of Danish music festivals by looking at who plays where, how festivals cluster, and how artist descriptions change across years and genres. This lets us explore how the Danish music scene has evolved over time.

We are especially interested in how festival identities shift. For example, Roskilde originally began as a rock and pop festival, but its lineup today spans many genres. We want to see whether festivals are becoming more similar by booking the same artists, or whether they still serve distinct audiences with different tastes. We also want to identify artists who appear across multiple generations, playing both in our parents' time and in our own.
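The generational question can be made concrete as a year-span computation over (festival-year, artist) pairs like the ones the scrapers in this notebook collect. A minimal sketch, assuming every festival-year id ends in a four-digit year (true for ids like `tinderbox-2019`, an assumption for the festivalhistorik.dk slugs) and using a hypothetical 25-year threshold as a stand-in for "one generation". The function names and toy pairs are illustrative only.

```python
import re
from collections import defaultdict


def year_span_per_artist(pairs):
    """Map each artist to (first_year, last_year) from (festival_year_id, artist) pairs.

    Assumes each festival-year id ends in a four-digit year, e.g. "tinderbox-2019";
    ids without a trailing year are ignored.
    """
    years = defaultdict(set)
    for fest_id, artist in pairs:
        match = re.search(r"(\d{4})$", fest_id)
        if match:
            years[artist].add(int(match.group(1)))
    return {artist: (min(ys), max(ys)) for artist, ys in years.items()}


def multi_generation_artists(pairs, min_span=25):
    """Artists whose first and last appearances are at least `min_span` years apart.

    min_span=25 is a hypothetical stand-in for "one generation".
    """
    spans = year_span_per_artist(pairs)
    long_lived = [a for a, (first, last) in spans.items() if last - first >= min_span]
    # Order by first appearance so the oldest acts come first
    return sorted(long_lived, key=lambda a: spans[a][0])


# Toy pairs for illustration only (not real lineups)
pairs = [
    ("festival-a-1980", "band-a"),
    ("festival-a-2010", "band-a"),
    ("festival-b-2012", "band-b"),
    ("festival-b-2024", "band-b"),
]
print(multi_generation_artists(pairs))  # ['band-a']
```

Lowering `min_span` widens the net; the right threshold is a modelling choice rather than something the data dictates.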

To explore this, we are building a dataset based on the largest and oldest festivals in Denmark, including Roskilde Festival, Smukfest, Copenhell, Vig Festival, Distortion, and others. Most of the historical lineups come from festivalhistorik.dk through its API (api.festivalhistorik.dk). Missing festivals and years are collected from the festivals' own websites (e.g. Northside and Tinderbox), from setlist.fm (e.g. Nibe Festival), or from Wikipedia.

Altogether, this gives us around 250 nodes, where each node is a specific festival in a specific year. Two nodes are connected if their lineups share at least one artist.
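The network described above is a projection of a bipartite festival-year/artist graph onto the festival-year side. A minimal pure-Python sketch, assuming the input is a list of `(festival_year, artist)` pairs like those the scrapers below produce; using the number of shared artists as the edge weight is our own assumption (the text only requires at least one shared artist), and the degree distribution ignores weights. Function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_festival_network(pairs):
    """Build festival-year nodes and weighted edges from (festival_year, artist) pairs.

    Two festival-years are linked if they share at least one artist; the edge
    weight is the number of distinct shared artists.
    """
    played_at = defaultdict(set)  # artist -> set of festival-years
    nodes = set()
    for fest_id, artist in pairs:
        nodes.add(fest_id)
        played_at[artist].add(fest_id)

    weights = Counter()
    for fests in played_at.values():
        # Every pair of festival-years this artist played gets +1 on their edge
        for a, b in combinations(sorted(fests), 2):
            weights[(a, b)] += 1
    return nodes, weights


def degree_distribution(nodes, weights):
    """Return {degree: number of nodes with that degree}, ignoring edge weights."""
    degree = {node: 0 for node in nodes}
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


# Toy pairs for illustration only (not real lineups)
pairs = [
    ("fest-a-2019", "x"), ("fest-a-2019", "y"),
    ("fest-b-2019", "x"), ("fest-b-2019", "y"),
    ("fest-c-2019", "z"),
]
nodes, weights = build_festival_network(pairs)
print(dict(weights))  # {('fest-a-2019', 'fest-b-2019'): 2}
print(sorted(degree_distribution(nodes, weights).items()))  # [(0, 1), (1, 2)]
```

An artist who played k festival-years adds k(k-1)/2 edges, which is cheap at roughly 250 nodes. If networkx is available, `networkx.algorithms.bipartite.weighted_projected_graph` performs the same kind of projection on an explicit bipartite graph.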

> missing a bit more analysis...


For the text analysis, we will use artist descriptions and the official festival descriptions from 2025. We will compare how closely each festival-year matches the 2025 description, to see how festival identities develop and how their audiences may have shifted across generations.
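The text does not fix how "how closely" is measured. One common option, and only an assumption here, is TF-IDF weighting with cosine similarity: each festival-year's text would (hypothetically) be the concatenation of its artists' descriptions, and the reference would be that festival's 2025 description. A minimal stdlib-only sketch with illustrative function names and toy texts:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; in Python 3, \w also matches letters such as æ, ø, å
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: tf-idf weight} dict per document (smoothed idf)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_to_reference(reference, texts):
    """Cosine similarity of each text to the reference, with idf fitted on all texts."""
    vectors = tfidf_vectors([reference, *texts])
    return [cosine(vectors[0], v) for v in vectors[1:]]


# Toy texts for illustration only
reference_2025 = "rock pop indie electronic hip hop"
festival_years = ["rock pop guitar", "techno house"]
print(similarity_to_reference(reference_2025, festival_years))
```

scikit-learn's `TfidfVectorizer` (with its default `smooth_idf=True`) uses the same smoothed idf formula, though its default tokenizer drops one-character tokens, so scores can differ slightly.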


# Code for the festivalhistorik.dk API

In [None]:
import requests

# Step 1: fetch metadata for festival IDs 1-158 from the festivalhistorik.dk API
festivals = []

for i in range(1, 159):  # 1 to 158 inclusive
    url = f"https://api.festivalhistorik.dk/api/festivals/{i}"
    try:
        # A timeout keeps one unresponsive request from hanging the whole loop
        r = requests.get(url, timeout=30)

        if r.status_code == 200:
            data = r.json().get("data")
            if data:
                festival_slug = data.get("slug")

                # festival_category may be missing or null, so fall back to an empty dict
                cat = data.get("festival_category") or {}
                general_id = data.get("id")
                festival_name = cat.get("title")

                festivals.append((general_id, festival_slug, festival_name))

                print(f"{general_id} | {festival_slug} | {festival_name}")

        else:
            print(f"{i}: no data ({r.status_code})")

    except Exception as e:
        print(f"{i}: error {e}")

print("Done.")
print(festivals)


In [None]:
# Step 2: fetch artists for each festival
all_artists = []

for fest_id, fest_slug, festival_name in festivals:
    url = f"https://api.festivalhistorik.dk/api/festivals/{fest_id}/artists"
    try:
        r = requests.get(url, timeout=30)
        if r.status_code == 200:
            # "data" may be missing or null; treat both as an empty lineup
            data_artists = r.json().get("data") or []
            for artist in data_artists:
                slug = artist.get("slug")
                all_artists.append((fest_slug, slug))
                print(f"{fest_slug} | {slug}")
        else:
            print(f"{fest_id}: error {r.status_code}")

    except Exception as e:
        print(f"{fest_id}: error {e}")

print("Artist scraping done.")
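The assignment asks for the total size of the data (MB, rows, variables). A minimal sketch that writes the collected `(festival_id, artist)` pairs to CSV and reports those numbers; `save_pairs` and the column names are hypothetical, and MB here means decimal megabytes (10^6 bytes). Note that `festival_ids` counts distinct festival identifiers as stored, which are festival-years only if the stored slugs encode the year.

```python
import csv
import os
import tempfile


def save_pairs(pairs, path):
    """Write (festival_id, artist) pairs to CSV and return basic size statistics."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["festival_id", "artist"])
        writer.writerows(pairs)
    return {
        "rows": len(pairs),
        "festival_ids": len({fest for fest, _ in pairs}),
        "unique_artists": len({artist for _, artist in pairs}),
        "size_mb": os.path.getsize(path) / 1_000_000,  # decimal megabytes
    }


# Toy pairs, written to a temporary directory so the example leaves no files behind
example = [("fest-a-2019", "x"), ("fest-a-2019", "y"), ("fest-b-2019", "x")]
with tempfile.TemporaryDirectory() as tmp:
    stats = save_pairs(example, os.path.join(tmp, "pairs.csv"))
print(stats)
```

In the notebook this could be called as `save_pairs(all_artists, "festivalhistorik_artists.csv")` (hypothetical file name).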


# Code for Northside

In [None]:
import re
import requests

url = "https://northside.dk/om-northside/historie/"

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
}

resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()
html = resp.text

# Every year (20xx) that appears as the text of an HTML element on the history page
years = re.findall(r'>\s*(20\d{2})\s*<', html)
unique_years = sorted(set(years))

artist_northside = []
for year in unique_years:
    url_artist = f"https://northside.dk/om-northside/historie/northside-{year}"
    fest_slug = url_artist.split("/")[-1]  # e.g. "northside-2019"
    resp_artist = requests.get(url_artist, headers=headers, timeout=30)
    if resp_artist.status_code != 200:
        print(f"{fest_slug}: error {resp_artist.status_code}")
        continue

    # Search the year page itself, not the overview page stored in `html`
    html_artist = resp_artist.text
    lineup = re.search(r'<p><strong>Lineup:</strong>&nbsp;(.*?)</p>', html_artist, re.S)
    if lineup is None:
        # Fallback for pages that format the "Line-up" label differently
        lineup = re.search(r"Line[-\s]?up:?[^<]*</?(?:strong|h\d)[^>]*>\s*:?[\s]*(.*?)(?=<h\d|<strong|</div>)", html_artist, re.S)
    if lineup is None:
        # Skip the year instead of reusing the previous year's artists
        print(f"{fest_slug}: no lineup found")
        continue

    artists = re.sub(r"<.*?>", "", lineup.group(1))  # remove HTML tags
    artists = artists.strip()
    for artist in artists.split(", "):
        # Slug-like form: spaces become hyphens and "&amp;" is decoded to "&"
        clean = artist.strip().replace(" ", "-").replace("&amp;", "&")
        if clean:
            artist_northside.append((fest_slug, clean))
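Lineup parsing is the fragile part of this scraper, so it can help to pull the primary pattern into a small function and check it offline before scraping. A minimal sketch: `parse_northside_lineup` is a hypothetical helper that covers only the primary `Lineup:` pattern (not the fallback), and the sample HTML is invented to match that pattern rather than copied from northside.dk. Taking the year page's HTML as an explicit argument also makes it harder to search the wrong page by accident.

```python
import re

# Same primary pattern as in the Northside cell
LINEUP_RE = re.compile(r'<p><strong>Lineup:</strong>&nbsp;(.*?)</p>', re.S)


def parse_northside_lineup(html_artist, fest_slug):
    """Extract (fest_slug, artist) pairs from one year page (primary pattern only)."""
    match = LINEUP_RE.search(html_artist)
    if match is None:
        return []
    text = re.sub(r"<.*?>", "", match.group(1)).strip()  # remove HTML tags
    return [
        (fest_slug, name.strip().replace(" ", "-").replace("&amp;", "&"))
        for name in text.split(", ")
        if name.strip()
    ]


# Hypothetical snippet in the format the regex expects
sample = '<p><strong>Lineup:</strong>&nbsp;Band One, <a href="#">Band &amp; Two</a>, Solo Artist</p>'
print(parse_northside_lineup(sample, "northside-2019"))
# [('northside-2019', 'Band-One'), ('northside-2019', 'Band-&-Two'), ('northside-2019', 'Solo-Artist')]
```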



# Code for Tinderbox

In [None]:
import re
import requests

url = "https://tinderbox.dk/plakater/"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
}
resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()
html = resp.text

# Pattern: the "TINDERBOX <year>" heading (2015 or 2019), the dates paragraph
# right after it, and then the paragraph that holds the lineup
pattern = r'''
    <h2[^>]*>\s*TINDERBOX\s+(2015|2019)\s*</h2>   # heading with year
    \s*<p>.*?</p>                                # dates paragraph
    \s*<p>(.*?)</p>                              # lineup paragraph
'''

matches = re.findall(pattern, html, re.S | re.I | re.X)

artist_tinderbox = []

for year, lineup_html in matches:
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', lineup_html)
    # Strip a leading "Lineup:" label (case-insensitive)
    text = re.sub(r'^\s*lineup:\s*', '', text, flags=re.I)
    text = text.strip()
    # Split the comma-separated lineup into individual artist names
    artists = [a.strip() for a in text.split(',') if a.strip()]
    for artist in artists:
        artist_tinderbox.append((f"tinderbox-{year}", artist))

print(artist_tinderbox)
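The sources above produce artist identifiers in different forms: slugs from the festivalhistorik.dk API, hyphenated names from Northside, and raw names from Tinderbox (and from setlist.fm for Nibe below). Since two festival-years are only linked when they share an artist, the same artist written two ways would silently miss an edge. A minimal normalization sketch follows; `to_slug` is a hypothetical helper, and whether its output matches festivalhistorik.dk's own slug format (in particular the æ/ø/å transliteration) is an assumption to verify against the API data.

```python
import re
import unicodedata

# Hypothetical transliteration for Danish letters; whether festivalhistorik.dk
# slugs use this exact scheme is an assumption that should be checked
DANISH = str.maketrans({"æ": "ae", "ø": "oe", "å": "aa"})


def to_slug(name):
    """Normalize an artist name to a lowercase, hyphen-separated key."""
    name = name.replace("&amp;", "&").lower().translate(DANISH)
    # Strip remaining accents (é -> e) by decomposing and dropping combining marks
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    # Any run of characters other than a-z and 0-9 becomes a single hyphen
    name = re.sub(r"[^a-z0-9]+", "-", name)
    return name.strip("-")


print(to_slug("Band & Two"), to_slug("Band-&-Two"))  # band-two band-two
```

Applied to one source it would look like `artist_tinderbox = [(fest, to_slug(a)) for fest, a in artist_tinderbox]`.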

# Code for Nibe Festival (2000-2024)

In [2]:
import requests
import re

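# NOTE: the same page ID suffix (73d6366d) is used for every year; years whose
# URL does not return HTTP 200 are skipped by scrape_year()
BASE_URL = "https://www.setlist.fm/festival/{year}/nibe-festival-{year}-73d6366d.html"

# Captures the text of every <a ...>text</a> link on the page, such as:
# <a ... href="/setlists/...">Artist</a>
# <a ... href="/setlist/...">Artist</a>
# <a ... class="artist"...>Artist</a>
# Navigation and other non-artist links are filtered out in scrape_year()
ARTIST_REGEX = re.compile(r'>([^<]+)</a>')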

nibe_festival_artists = []

def scrape_year(year):
    url = BASE_URL.format(year=year)
    print("Scraping:", url)

    r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    if r.status_code != 200:
        print("No data for year", year)
        return []

    html = r.text

    # Extract ALL <a>text</a> matches
    candidates = ARTIST_REGEX.findall(html)

    artists = []

    for name in candidates:
        name = name.strip()

        # Filter out junk entries
        if not name:
            continue
        if "Add time" in name:
            continue
        if "Add Setlist" in name:
            continue
        if "Report festival" in name:
            continue
        if "Group by" in name:
            continue
        if "Artists" in name and "(A-Z)" in name:
            continue

        # Real artist names rarely contain ":" (tribute acts are an exception),
        # so this drops interface labels such as "Scheduled: Artist"
        if ":" in name and "Tribute" not in name:
            continue

        # Skip non-artist navigation menu links
        if name in ["Home", "Festivals"]:
            continue

        # Save potential artist
        artists.append(name)

    # remove duplicate artists
    artists = list(dict.fromkeys(artists))

    return artists


# scrape every year 2000–2024
for year in range(2000, 2025):
    artists = scrape_year(year)
    for a in artists:
        nibe_festival_artists.append((f"Nibe-Festival-{year}", a))


print("\nTotal entries:", len(nibe_festival_artists))
print(nibe_festival_artists)


Scraping: https://www.setlist.fm/festival/2000/nibe-festival-2000-73d6366d.html
Scraping: https://www.setlist.fm/festival/2001/nibe-festival-2001-73d6366d.html
Scraping: https://www.setlist.fm/festival/2002/nibe-festival-2002-73d6366d.html
Scraping: https://www.setlist.fm/festival/2003/nibe-festival-2003-73d6366d.html
Scraping: https://www.setlist.fm/festival/2004/nibe-festival-2004-73d6366d.html
Scraping: https://www.setlist.fm/festival/2005/nibe-festival-2005-73d6366d.html
Scraping: https://www.setlist.fm/festival/2006/nibe-festival-2006-73d6366d.html
Scraping: https://www.setlist.fm/festival/2007/nibe-festival-2007-73d6366d.html
Scraping: https://www.setlist.fm/festival/2008/nibe-festival-2008-73d6366d.html
Scraping: https://www.setlist.fm/festival/2009/nibe-festival-2009-73d6366d.html
Scraping: https://www.setlist.fm/festival/2010/nibe-festival-2010-73d6366d.html
Scraping: https://www.setlist.fm/festival/2011/nibe-festival-2011-73d6366d.html
Scraping: https://www.setlist.fm/festiva