# Clean data for Graph 1


# Performer-Genre Trends

## Purpose
Identify the popularity of different genres (e.g., music: pop, rock; sports: rugby, soccer) across the world by examining the performers and their associated genres. This graph will provide insights into:
- **Universally appealing genres**: Genres with broad, global appeal.
- **Locally preferred genres**: Genres that are popular in specific regions.

## Analysis
The graph will enable:
- **Popularity Analysis**: Highlighting genres that are broadly popular based on performer count and genre connections.
- **Niche Genres**: Identifying genres with limited appeal, primarily associated with a small number of performers.

## Data Needed
- **Performer Information**: Name and ID.
- **Genre Information**: Type and ID.
- **Relationships**: Connections between performers and the genres of the events they appear in.

## Metric
**Degree Centrality**:  
The number of connections a node has.  
- In this context, the popularity of a genre will be measured based on the number of performers it is connected to.
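
As a minimal sketch of this metric, the degree of a genre can be computed by counting the distinct performers attached to it. The edge list below is hypothetical and only illustrates the idea; repeated rows (for example, the same performer at several events) are removed first so they do not inflate the count.

```python
from collections import Counter

# Hypothetical performer-genre edges (illustrative, not the real dataset).
edges = [
    ("Paul McCartney", "Rock"),
    ("Imagine Dragons", "Rock"),
    ("Imagine Dragons", "Rock"),    # repeated row, e.g. a second event
    ("Morgan Jay", "Comedy"),
    ("Twenty One Pilots", "Rock"),
    ("Twenty One Pilots", "Other"),
]

# Deduplicate so each performer-genre pair counts once.
unique_edges = set(edges)

# Degree of a genre node = number of distinct performers linked to it.
genre_degree = Counter(genre for _, genre in unique_edges)

for genre, degree in genre_degree.most_common():
    print(genre, degree)
```

This uses the raw degree (a plain count). Dividing by the number of other nodes would give a normalized variant, which is a separate design choice.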

## Graph Type
**Undirected Graph**:  
- Represents the mutual relationship between performers and genres.
- A performer belongs to a genre, and a genre represents its associated performers without implying directionality.

## Sketch
- **Nodes**: Performers and genres.
- **Edges**: Connections between performers and their corresponding genres.
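
The sketch above could be represented in code roughly as follows. This is an illustrative adjacency map built from hypothetical rows, not the notebook's final graph construction. Tagging each node with its kind (an assumption made here) keeps a performer and a genre with the same name from collapsing into one node.

```python
from collections import defaultdict

# Hypothetical rows (illustrative only): (performer, genre).
rows = [
    ("Paul McCartney", "Rock/Pop"),
    ("Imagine Dragons", "Rock/Pop"),
    ("Cirque du Soleil", "Family"),
    ("Morgan Jay", "Comedy"),
]

# Undirected graph as an adjacency map. Nodes are (kind, name) tuples so a
# performer and a genre that share a name stay distinct.
adjacency = defaultdict(set)
for performer, genre in rows:
    p_node = ("performer", performer)
    g_node = ("genre", genre)
    # Store both directions because the edge is undirected.
    adjacency[p_node].add(g_node)
    adjacency[g_node].add(p_node)

# Performers connected to the "Rock/Pop" genre node.
rock_pop_performers = sorted(name for _, name in adjacency[("genre", "Rock/Pop")])
print(rock_pop_performers)
```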


In [26]:
import pandas as pd

# Define file paths
file_paths = {
    "genres": "data/genres.csv",
    "price_ranges": "data/price_ranges.csv",
    "seat_maps": "data/seat_maps.csv",
    "venues": "data/venues.csv",
    "artists": "data/artists.csv",
    "events": "data/events.csv"
}

# Load the data into DataFrames
dataframes = {name: pd.read_csv(path) for name, path in file_paths.items()}

# Display the first few rows of each dataframe
for name, df in dataframes.items():
    print(f"\n=== {name.capitalize()} ===")
    print(df.head())



=== Genres ===
      genre_id genre_name         sub_genre_id sub_genre_name
0  KnvZfZ7vAeA       Rock  KZazBEonSMnZfZ7v6F1            Pop
1  KnvZfZ7vAvl      Other  KZazBEonSMnZfZ7vk1I          Other
2  KnvZfZ7vAAk  Wrestling  KZazBEonSMnZfZ7vFna      Wrestling
3  KnvZfZ7vAev        Pop  KZazBEonSMnZfZ7vkJv    African Pop
4  KnvZfZ7vAe6  Undefined  KZazBEonSMnZfZ7v6JI      Undefined

=== Price_ranges ===
        event_id                     type currency  min_price  max_price
0  Z698xZ2qZa17W  standard including fees      EUR       51.0      554.5
1  Z698xZ2qZa17W                 standard      EUR       45.0      500.0
2  Z698xZ2qZa1Ad  standard including fees      EUR       51.0      558.5
3  Z698xZ2qZa1Ad                 standard      EUR       45.0      500.0
4  Z698xZ2qZa13A                 standard      EUR      297.0     6757.5

=== Seat_maps ===
        event_id                                         static_url
0  Z698xZ2qZa17W  https://media.ticketmaster.eu/spain/937c9bcfe2.

In [27]:
# Analyze unique values and counts in genres and artists data
genres_df = dataframes['genres']
artists_df = dataframes['artists']

# For genres dataset
print("=== Genres Dataset ===")
print(f"Number of unique genres: {genres_df['genre_name'].nunique()}")
print(f"Unique genres:\n{genres_df['genre_name'].unique()}\n")
print(f"Number of unique sub-genres: {genres_df['sub_genre_name'].nunique()}")
print(f"Unique sub-genres:\n{genres_df['sub_genre_name'].unique()}\n")

# For artists dataset (note: `type` holds the record type, e.g. 'attraction', not a genre)
print("=== Artists Dataset ===")
print(f"Number of unique genres associated with artists: {artists_df['type'].nunique()}")
print(f"Unique genres in artists:\n{artists_df['type'].unique()}\n")


=== Genres Dataset ===
Number of unique genres: 11
Unique genres:
['Rock' 'Other' 'Wrestling' 'Pop' 'Undefined' 'Classical' 'Alternative'
 'Comedy' 'Dance/Electronic' 'Hip-Hop/Rap' 'Family']

Number of unique sub-genres: 11
Unique sub-genres:
['Pop' 'Other' 'Wrestling' 'African Pop' 'Undefined' 'Classical/Vocal'
 'Alternative Rock' 'Comedy' 'Dance/Electronic' 'Hip-Hop/Rap' 'Hard Rock']

=== Artists Dataset ===
Number of unique genres associated with artists: 1
Unique genres in artists:
['attraction']



In [28]:
import json
import pandas as pd


with open('data/data.json', 'r') as f:
    data = json.load(f)

# Initialize a list to store artist-genre data
artist_genre_data = []

# Iterate over events
events = data["_embedded"]["events"]

for event in events:
    classifications = event.get("classifications", [])

    for classification in classifications:
        
        # Extract genre and sub-genre names and IDs
        genre_name = classification.get("genre", {}).get("name")
        genre_id = classification.get("genre", {}).get("id")
        sub_genre_name = classification.get("subGenre", {}).get("name")
        sub_genre_id = classification.get("subGenre", {}).get("id")
        
        # Extract artists
        attractions = event.get("_embedded", {}).get("attractions", [])

        for artist in attractions:
            artist_name = artist.get("name")
            artist_id = artist.get("id")
            
            # Append the extracted data to the list
            artist_genre_data.append({
                "artist_name": artist_name,
                "artist_id": artist_id,
                "genre_name": genre_name,
                "genre_id": genre_id,
                "sub_genre_name": sub_genre_name,
                "sub_genre_id": sub_genre_id
            })


# Convert to DataFrame
artist_genre_df = pd.DataFrame(artist_genre_data)

# Preview the first rows (repeats are expected: one row per event, classification, and artist)
print(artist_genre_df.head())


       artist_name    artist_id genre_name     genre_id sub_genre_name  \
0   Paul McCartney  K8vZ9171uq0       Rock  KnvZfZ7vAeA            Pop   
1   Paul McCartney  K8vZ9171uq0       Rock  KnvZfZ7vAeA            Pop   
2  Imagine Dragons  K8vZ917GSz7       Rock  KnvZfZ7vAeA            Pop   
3  Imagine Dragons  K8vZ917GSz7       Rock  KnvZfZ7vAeA            Pop   
4  Imagine Dragons  K8vZ917GSz7       Rock  KnvZfZ7vAeA            Pop   

          sub_genre_id  
0  KZazBEonSMnZfZ7v6F1  
1  KZazBEonSMnZfZ7v6F1  
2  KZazBEonSMnZfZ7v6F1  
3  KZazBEonSMnZfZ7v6F1  
4  KZazBEonSMnZfZ7v6F1  


In [29]:
# unique artists
print("=== Unique Artists ===")
print(f"Number of unique artists: {artist_genre_df['artist_name'].nunique()}")

=== Unique Artists ===
Number of unique artists: 77


In [30]:
# unique genres
print("\n=== Unique Genres ===")
print(f"Number of unique genres: {artist_genre_df['genre_name'].nunique()}")
print(f"Unique genres:\n{artist_genre_df['genre_name'].unique()}")


=== Unique Genres ===
Number of unique genres: 10
Unique genres:
['Rock' 'Other' 'Wrestling' 'Undefined' 'Classical' 'Alternative' 'Comedy'
 'Dance/Electronic' 'Hip-Hop/Rap' 'Family']


In [31]:
# unique sub-genres
print("\n=== Unique Sub-Genres ===")
print(f"Number of unique sub-genres: {artist_genre_df['sub_genre_name'].nunique()}")
print(f"Unique sub-genres:\n{artist_genre_df['sub_genre_name'].unique()}")
# count of instances of each genre
print("\n=== Genre Counts ===")
print(artist_genre_df['genre_name'].value_counts())
# count of instances of each sub-genre
print("\n=== Sub-Genre Counts ===")
print(artist_genre_df['sub_genre_name'].value_counts())


=== Unique Sub-Genres ===
Number of unique sub-genres: 10
Unique sub-genres:
['Pop' 'Other' 'Wrestling' 'Undefined' 'Classical/Vocal'
 'Alternative Rock' 'Comedy' 'Dance/Electronic' 'Hip-Hop/Rap' 'Hard Rock']

=== Genre Counts ===
genre_name
Rock                111
Family               62
Undefined            12
Alternative          11
Other                10
Comedy                8
Dance/Electronic      3
Wrestling             2
Classical             2
Hip-Hop/Rap           1
Name: count, dtype: int64

=== Sub-Genre Counts ===
sub_genre_name
Pop                 109
Other                72
Undefined            12
Alternative Rock     11
Comedy                8
Dance/Electronic      3
Classical/Vocal       2
Wrestling             2
Hard Rock             2
Hip-Hop/Rap           1
Name: count, dtype: int64


In [32]:
# unique genre-subgenre associations
print("\n=== Unique Genre-Sub-Genre Associations ===")
unique_genre_sub_genre = artist_genre_df.groupby(["genre_name", "sub_genre_name"]).size().reset_index(name="count")
print(unique_genre_sub_genre)



=== Unique Genre-Sub-Genre Associations ===
          genre_name    sub_genre_name  count
0        Alternative  Alternative Rock     11
1          Classical   Classical/Vocal      2
2             Comedy            Comedy      8
3   Dance/Electronic  Dance/Electronic      3
4             Family             Other     62
5        Hip-Hop/Rap       Hip-Hop/Rap      1
6              Other             Other     10
7               Rock         Hard Rock      2
8               Rock               Pop    109
9          Undefined         Undefined     12
10         Wrestling         Wrestling      2


In [33]:
# any missing values
print("\n=== Missing Values ===")
print(artist_genre_df.isnull().sum())


=== Missing Values ===
artist_name       0
artist_id         0
genre_name        0
genre_id          0
sub_genre_name    0
sub_genre_id      0
dtype: int64


In [34]:
# artist genre relationships
print("\n=== Artist-Genre Relationships ===")
artist_genre_relationships = artist_genre_df.groupby(["artist_name", "genre_name"]).size().reset_index(name="count")
# order by count
print(artist_genre_relationships.sort_values(by="count", ascending=False))


=== Artist-Genre Relationships ===
                  artist_name genre_name  count
13  Cirque du Soleil : Corteo     Family     31
12           Cirque du Soleil     Family     31
51                 Morgan Jay     Comedy      7
14               Diego Torres       Rock      6
5                   Anastacia       Rock      6
..                        ...        ...    ...
66                    Slimane       Rock      1
63               Rebeka Brown  Undefined      1
73          Twenty One Pilots      Other      1
74          Twenty One Pilots       Rock      1
76                     Weezer       Rock      1

[79 rows x 3 columns]
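
The table above has 79 artist-genre rows for 77 unique artist names, so some artists are linked to more than one genre (Twenty One Pilots appears under both Other and Rock). A minimal sketch for listing such artists, using a small hypothetical sample with the same column names as `artist_genre_df`:

```python
import pandas as pd

# Hypothetical sample mirroring two columns of artist_genre_df.
sample = pd.DataFrame({
    "artist_name": ["Twenty One Pilots", "Twenty One Pilots", "Weezer", "Weezer", "Morgan Jay"],
    "genre_name": ["Other", "Rock", "Rock", "Rock", "Comedy"],
})

# Count distinct genres per artist and keep those with more than one.
genres_per_artist = sample.groupby("artist_name")["genre_name"].nunique()
multi_genre_artists = genres_per_artist[genres_per_artist > 1]
print(multi_genre_artists)
```

In the graph, these artists would contribute one edge to each of their genres.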


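The genre and sub-genre columns are largely redundant: apart from Rock (Pop, Hard Rock), every genre maps to exactly one sub-genre, so the two can be merged into a single label.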
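
One way to check this is to count how many distinct sub-genres each genre maps to. The sketch below reuses the genre/sub-genre pairs from the association table printed above; the variable names are illustrative.

```python
import pandas as pd

# Genre/sub-genre pairs copied from the association table printed above.
pairs = pd.DataFrame(
    [
        ("Alternative", "Alternative Rock"),
        ("Classical", "Classical/Vocal"),
        ("Comedy", "Comedy"),
        ("Dance/Electronic", "Dance/Electronic"),
        ("Family", "Other"),
        ("Hip-Hop/Rap", "Hip-Hop/Rap"),
        ("Other", "Other"),
        ("Rock", "Hard Rock"),
        ("Rock", "Pop"),
        ("Undefined", "Undefined"),
        ("Wrestling", "Wrestling"),
    ],
    columns=["genre_name", "sub_genre_name"],
)

# Genres with more than one sub-genre are the only non-redundant cases.
sub_genres_per_genre = pairs.groupby("genre_name")["sub_genre_name"].nunique()
exceptions = sub_genres_per_genre[sub_genres_per_genre > 1]
print(exceptions)
```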


In [35]:
# Rename the value "Undefined" to "Other" in the genre and sub-genre columns
artist_genre_df["genre_name"] = artist_genre_df["genre_name"].replace("Undefined", "Other")
artist_genre_df["sub_genre_name"] = artist_genre_df["sub_genre_name"].replace("Undefined", "Other")

In [36]:
# unique values in genre and subgenre   
print("\n=== Unique Genres ===")
print(f"Number of unique genres: {artist_genre_df['genre_name'].nunique()}")
print(f"Unique genres:\n{artist_genre_df['genre_name'].unique()}")
print("\n=== Unique Sub-Genres ===")
print(f"Number of unique sub-genres: {artist_genre_df['sub_genre_name'].nunique()}")
print(f"Unique sub-genres:\n{artist_genre_df['sub_genre_name'].unique()}")


=== Unique Genres ===
Number of unique genres: 9
Unique genres:
['Rock' 'Other' 'Wrestling' 'Classical' 'Alternative' 'Comedy'
 'Dance/Electronic' 'Hip-Hop/Rap' 'Family']

=== Unique Sub-Genres ===
Number of unique sub-genres: 9
Unique sub-genres:
['Pop' 'Other' 'Wrestling' 'Classical/Vocal' 'Alternative Rock' 'Comedy'
 'Dance/Electronic' 'Hip-Hop/Rap' 'Hard Rock']


In [37]:
# all combinations of genre and subgenre
print("\n=== Unique Genre-Sub-Genre Associations ===")
unique_genre_sub_genre = artist_genre_df.groupby(["genre_name", "sub_genre_name"]).size().reset_index(name="count")
print(unique_genre_sub_genre.sort_values(by="count", ascending=False))


=== Unique Genre-Sub-Genre Associations ===
         genre_name    sub_genre_name  count
8              Rock               Pop    109
4            Family             Other     62
6             Other             Other     22
0       Alternative  Alternative Rock     11
2            Comedy            Comedy      8
3  Dance/Electronic  Dance/Electronic      3
1         Classical   Classical/Vocal      2
7              Rock         Hard Rock      2
9         Wrestling         Wrestling      2
5       Hip-Hop/Rap       Hip-Hop/Rap      1


In [38]:
genre_mapping = {
    ("Rock", "Pop"): "Rock/Pop",
    ("Family", "Other"): "Family",
    ("Other", "Other"): "Other",
    ("Alternative", "Alternative Rock"): "Alternative Rock",
    ("Comedy", "Comedy"): "Comedy",
    ("Dance/Electronic", "Dance/Electronic"): "Dance/Electronic",
    ("Classical", "Classical/Vocal"): "Classical/Vocal",
    ("Rock", "Hard Rock"): "Hard Rock",
    ("Wrestling", "Wrestling"): "Wrestling",
    ("Hip-Hop/Rap", "Hip-Hop/Rap"): "Hip-Hop/Rap"
}

# Create the new Genre column by mapping the tuples
artist_genre_df["Genre"] = artist_genre_df.apply(lambda row: genre_mapping.get((row["genre_name"], row["sub_genre_name"])), axis=1)

print(artist_genre_df.head())

       artist_name    artist_id genre_name     genre_id sub_genre_name  \
0   Paul McCartney  K8vZ9171uq0       Rock  KnvZfZ7vAeA            Pop   
1   Paul McCartney  K8vZ9171uq0       Rock  KnvZfZ7vAeA            Pop   
2  Imagine Dragons  K8vZ917GSz7       Rock  KnvZfZ7vAeA            Pop   
3  Imagine Dragons  K8vZ917GSz7       Rock  KnvZfZ7vAeA            Pop   
4  Imagine Dragons  K8vZ917GSz7       Rock  KnvZfZ7vAeA            Pop   

          sub_genre_id     Genre  
0  KZazBEonSMnZfZ7v6F1  Rock/Pop  
1  KZazBEonSMnZfZ7v6F1  Rock/Pop  
2  KZazBEonSMnZfZ7v6F1  Rock/Pop  
3  KZazBEonSMnZfZ7v6F1  Rock/Pop  
4  KZazBEonSMnZfZ7v6F1  Rock/Pop  
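
Because `dict.get` returns `None` for a key it does not contain, any (genre, sub-genre) pair missing from the mapping would silently produce an empty `Genre` value. A minimal sketch of a coverage check, using a hypothetical mapping and a made-up unmapped pair (`"Rock"`, `"Metal"`) purely for illustration:

```python
import pandas as pd

# Hypothetical mapping and data; ("Rock", "Metal") is deliberately not covered.
mapping = {("Rock", "Pop"): "Rock/Pop", ("Family", "Other"): "Family"}
df = pd.DataFrame({
    "genre_name": ["Rock", "Family", "Rock"],
    "sub_genre_name": ["Pop", "Other", "Metal"],
})

# Build the merged label from the (genre, sub-genre) tuple of each row.
keys = list(zip(df["genre_name"], df["sub_genre_name"]))
df["Genre"] = [mapping.get(key) for key in keys]

# Rows whose pair is missing from the mapping end up with a null label.
unmapped = df[df["Genre"].isna()]
print(unmapped[["genre_name", "sub_genre_name"]].drop_duplicates())
```

An empty result from this check would indicate that every pair in the data is covered by the mapping.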


In [39]:
# print unique values in Genre
print("\n=== Unique Genres ===")
print(f"Number of unique genres: {artist_genre_df['Genre'].nunique()}") 
print(f"Unique genres:\n{artist_genre_df['Genre'].unique()}")


=== Unique Genres ===
Number of unique genres: 10
Unique genres:
['Rock/Pop' 'Other' 'Wrestling' 'Classical/Vocal' 'Alternative Rock'
 'Comedy' 'Dance/Electronic' 'Hip-Hop/Rap' 'Family' 'Hard Rock']


In [40]:
# print the unique combinations of genre_name, sub_genre_name, and Genre
print("\n=== Unique Genre-Sub-Genre Associations ===")
unique_genre_sub_genre = artist_genre_df.groupby(["genre_name", "sub_genre_name", "Genre"]).size().reset_index(name="count")
print(unique_genre_sub_genre.sort_values(by="count", ascending=False))


=== Unique Genre-Sub-Genre Associations ===
         genre_name    sub_genre_name             Genre  count
8              Rock               Pop          Rock/Pop    109
4            Family             Other            Family     62
6             Other             Other             Other     22
0       Alternative  Alternative Rock  Alternative Rock     11
2            Comedy            Comedy            Comedy      8
3  Dance/Electronic  Dance/Electronic  Dance/Electronic      3
1         Classical   Classical/Vocal   Classical/Vocal      2
7              Rock         Hard Rock         Hard Rock      2
9         Wrestling         Wrestling         Wrestling      2
5       Hip-Hop/Rap       Hip-Hop/Rap       Hip-Hop/Rap      1


In [41]:
# save the artist-genre data to a CSV file
artist_genre_df.to_csv("data/artist_genre_data.csv", index=False)