# Song Dataset Analyser

This notebook provides a graphical user interface, built with Jupyter Widgets (`ipywidgets`), for running the song analysis modules described below.

Author: F416781  
Date: January 2024

In [None]:
# Configure Jupyter to display plots inline
%matplotlib inline

import ipywidgets as widgets
from IPython.display import display, clear_output, HTML
import matplotlib.pyplot as plt
import CW_Preprocessing as preproc
import Genres
import Artist
import Top5
import sqlite3
from pathlib import Path

# Configure matplotlib for high-quality notebook display
# Optional style; the bare 'seaborn' style name was deprecated in Matplotlib 3.6
# plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = [12, 6]
plt.rcParams['figure.dpi'] = 100

### Overview

The Song Dataset Analyser helps music analysts, researchers, and music industry professionals explore song data. It first cleans and standardizes a large song dataset, for example by converting song durations to seconds and coercing release years to numeric values. It then filters the data on metrics such as popularity, danceability, and speechiness, so users can focus on the songs that meet their criteria.

The program also creates a structured SQLite database, where songs, artists, and genres are linked in a way that ensures data integrity and ease of use. By organizing the data into relationships, users can easily retrieve and analyze connections between different aspects of the songs.
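The linked structure described above might look roughly like the following sketch. The table and column names are illustrative assumptions, not the schema that `CW_Preprocessing` actually creates; the point is only to show how a junction table expresses the many-to-many link between songs and genres.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS artists (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS genres (
    genre_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS songs (
    song_id    INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    artist_id  INTEGER NOT NULL REFERENCES artists(artist_id),
    year       INTEGER,
    popularity INTEGER
);
-- Junction table: one song can belong to several genres and vice versa.
CREATE TABLE IF NOT EXISTS song_genres (
    song_id  INTEGER NOT NULL REFERENCES songs(song_id),
    genre_id INTEGER NOT NULL REFERENCES genres(genre_id),
    PRIMARY KEY (song_id, genre_id)
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_demo_database()
conn.execute("INSERT INTO artists (name) VALUES ('Example Artist')")
conn.execute("INSERT INTO genres (name) VALUES ('pop'), ('rock')")
conn.execute(
    "INSERT INTO songs (title, artist_id, year, popularity) "
    "VALUES ('Example Song', 1, 2005, 70)"
)
conn.executemany("INSERT INTO song_genres VALUES (?, ?)", [(1, 1), (1, 2)])

# Follow the junction table to list every genre attached to song 1.
genres_for_song = [
    row[0]
    for row in conn.execute(
        "SELECT g.name FROM genres AS g "
        "JOIN song_genres AS sg ON sg.genre_id = g.genre_id "
        "WHERE sg.song_id = 1 ORDER BY g.name"
    )
]
print(genres_for_song)
```

With foreign keys enabled, inserting a `song_genres` row that points at a missing song or genre is rejected, which is one way the database can protect the integrity of these links.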

For users interested in visual analysis, the program provides simple yet effective bar chart visualizations. It compares the popularity of individual artists against the overall genre average, making it easy to see how successful an artist is relative to others in the same genre. The program uses color-coding to differentiate between an artist’s popularity and the overall genre average, which simplifies the interpretation of the data.

In summary, the program automates data cleaning, filtering, and visualization, so users can identify trends and patterns in the dataset without processing it by hand.

### CW_Preprocessing

The `CW_Preprocessing` module cleans and filters a dataset of songs and populates an SQLite database with the resulting records. The `load_and_clean_data` function loads the dataset from a CSV file and cleans it by converting durations to seconds, handling year data, and extracting genres. The `filter_data` function applies criteria such as a minimum popularity, a danceability threshold, and a speechiness range; it logs how many rows each filter removes and lists the artists whose songs meet the criteria. The `create_database_schema` function defines the tables for artists, genres, and songs, along with the tables that hold their many-to-many relationships. The `create_and_populate_database` function inserts the processed data, mapping each song to its artists and genres while preserving relational integrity. Finally, the `main` function runs the whole pipeline in order: loading and cleaning, filtering, then populating the database. When the script is invoked directly, it runs this workflow, logging progress and handling errors.
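The cleaning and filtering steps can be sketched in plain Python as follows. This is not the module's implementation: the `Song` record, the `filter_songs` helper, and the default speechiness range are hypothetical, and only the popularity (above 50) and danceability (above 0.2) thresholds are taken from the usage notes in this notebook.

```python
from dataclasses import dataclass


@dataclass
class Song:
    """Hypothetical record holding the fields used by the filters."""
    title: str
    artist: str
    duration_ms: int
    popularity: int
    danceability: float
    speechiness: float


def duration_seconds(song: Song) -> float:
    """Convert a duration from milliseconds to seconds."""
    return song.duration_ms / 1000


def filter_songs(songs, min_popularity=50, min_danceability=0.2,
                 speechiness_range=(0.33, 0.66)):
    """Keep songs above both thresholds and inside the speechiness range.

    The default speechiness range is a placeholder assumption.
    """
    low, high = speechiness_range
    return [
        song for song in songs
        if song.popularity > min_popularity
        and song.danceability > min_danceability
        and low <= song.speechiness <= high
    ]


songs = [
    Song("A", "X", 210000, 80, 0.7, 0.4),  # passes every filter
    Song("B", "Y", 180000, 40, 0.9, 0.5),  # popularity too low
    Song("C", "Z", 200000, 90, 0.1, 0.5),  # danceability too low
    Song("D", "X", 240500, 60, 0.5, 0.9),  # speechiness out of range
]

kept = filter_songs(songs)
print(f"Kept {len(kept)} of {len(songs)} songs")
print("Artists meeting the criteria:", sorted({song.artist for song in kept}))
print("Duration of first kept song (s):", duration_seconds(kept[0]))
```

Reporting how many songs survive the filters mirrors the logging of filtering impact described above.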

### Genres

The `Genres` module provides functions for analyzing the music data stored in the SQLite database. `validate_year` checks whether a given year is within the valid range (1998–2020), and `check_database_years` lists the distinct years in the database with their song counts. The `get_genre_statistics` function retrieves per-genre statistics for a given year, including averages of metrics such as danceability and popularity; when a year has insufficient data, it suggests alternative years. The `display_statistics` function formats and prints these statistics, and `create_visualizations` generates a pie chart of the song distribution by genre. Finally, the `analyze_year` function combines these operations to analyze a specific year's data, and the `main` function provides a command-line entry point.
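As a rough illustration of the year check and the per-genre aggregation, consider the sketch below. The flat `demo_songs` table and the `genre_statistics` helper are assumptions made for brevity; the real module queries the project database and may be structured differently.

```python
import sqlite3

MIN_YEAR, MAX_YEAR = 1998, 2020  # valid range stated in the text above


def validate_year(year) -> bool:
    """Return True if year is an integer inside the supported range."""
    return (
        isinstance(year, int)
        and not isinstance(year, bool)
        and MIN_YEAR <= year <= MAX_YEAR
    )


def genre_statistics(conn, year):
    """Return (genre, song_count, avg_danceability, avg_popularity) rows."""
    if not validate_year(year):
        raise ValueError(
            f"Year must be an integer between {MIN_YEAR} and {MAX_YEAR}"
        )
    query = """
        SELECT genre, COUNT(*), AVG(danceability), AVG(popularity)
        FROM demo_songs  -- hypothetical flat table used only in this sketch
        WHERE year = ?
        GROUP BY genre
        ORDER BY genre
    """
    return conn.execute(query, (year,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_songs "
    "(genre TEXT, year INTEGER, danceability REAL, popularity INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_songs VALUES (?, ?, ?, ?)",
    [
        ("pop", 2010, 0.5, 70),
        ("pop", 2010, 0.75, 90),
        ("rock", 2010, 0.25, 60),
        ("rock", 2011, 0.5, 50),
    ],
)

stats = genre_statistics(conn, 2010)
for genre, count, danceability, popularity in stats:
    print(f"{genre}: {count} songs, "
          f"danceability {danceability:.3f}, popularity {popularity:.1f}")
```

Passing the year as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query safe even if the input check is later relaxed.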

### Artist

The `Artist` module analyzes an artist's popularity across musical genres using the data stored in the SQLite database. The `validate_artist` function checks whether the artist exists in the database; when the name is not found, a helper function, `get_similar_artists`, suggests similar artist names based on the input. The `get_artist_popularity` function retrieves the artist's average popularity per genre and compares it with the overall genre averages, calculating the differences and highlighting values above the mean. Results are displayed in a formatted table via `display_popularity_table` and visualized as a bar chart by `create_popularity_chart`. The `analyze_artist` function combines these components, handles errors, and prints readable output. The `main` function provides a command-line interface that prompts for an artist's name and runs the analysis. Logging records warnings, errors, and other activity.
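One plausible way to suggest similar names is fuzzy string matching with the standard library's `difflib`. The sketch below is an assumption about how such a helper could work, not the module's actual code; the signature and the `known_artists` argument are hypothetical.

```python
from difflib import get_close_matches


def get_similar_artists(name, known_artists, limit=3, cutoff=0.6):
    """Suggest known artist names that closely resemble the input.

    Matching is case-insensitive; suggestions keep their original spelling.
    """
    # Map lowercase names back to the spelling stored in the database.
    lowered = {artist.lower(): artist for artist in known_artists}
    matches = get_close_matches(
        name.strip().lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[match] for match in matches]


known = ["Rihanna", "Drake", "Eminem", "Britney Spears"]
print(get_similar_artists("Rihana", known))
print(get_similar_artists("zzzz", known))
```

The `cutoff` value controls how strict the matching is: raising it returns fewer, closer suggestions, while lowering it tolerates more typos at the cost of more unrelated names.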

### Top5

The `Top5` module analyzes and visualizes the top artists in an SQLite database, based on song counts and popularity, within a specified year range (1998–2020). It uses the `RankingWeights` class to define and validate the weights for the ranking criteria, while `validate_year_range` ensures the provided years are valid integers within the range. The `calculate_ranking_value` function computes a weighted ranking score that combines normalized song counts and average popularity. The `get_top_artists` function queries the database, calculates rankings, and identifies the top 5 artists, with a breakdown of their yearly performance. Results are displayed using `display_rankings_table`, which highlights key values, and visualized with `create_visualization`, which generates a line chart of artist rankings over time. The `analyze_top_artists` function combines these components: it validates the inputs, retrieves the data, and presents the results. The `main` function is the program’s entry point, prompting for a year range and starting the analysis.
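The weighted score can be sketched as below. The default weights, the constraint that the weights sum to 1, and the scaling of popularity by an assumed maximum of 100 are all illustrative assumptions; the actual `RankingWeights` fields and normalization may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingWeights:
    """Hypothetical weights for the two ranking criteria."""
    song_count: float = 0.5
    popularity: float = 0.5

    def __post_init__(self):
        if self.song_count < 0 or self.popularity < 0:
            raise ValueError("Weights must be non-negative")
        if not math.isclose(self.song_count + self.popularity, 1.0):
            raise ValueError("Weights must sum to 1")


def calculate_ranking_value(song_count, avg_popularity, max_song_count,
                            weights=RankingWeights(), max_popularity=100.0):
    """Combine normalized song count and popularity into one score in [0, 1]."""
    if max_song_count <= 0:
        raise ValueError("max_song_count must be positive")
    normalized_count = song_count / max_song_count
    # Assumes popularity is reported on a 0 to max_popularity scale.
    normalized_popularity = avg_popularity / max_popularity
    return (weights.song_count * normalized_count
            + weights.popularity * normalized_popularity)


# An artist with 4 songs (the busiest artist has 8) and average popularity 75.
score = calculate_ranking_value(song_count=4, avg_popularity=75, max_song_count=8)
print(score)
```

Normalizing both inputs to the same 0 to 1 scale before weighting prevents whichever metric has the larger raw values from dominating the ranking.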

### How the program is used


1. **Load Song Data**: The user starts by providing a CSV file containing song data. This file can include various attributes such as song title, artist, genre, popularity, danceability, and year of release. The program will automatically load and clean the data, converting units (e.g., milliseconds to seconds for duration) and handling any missing or incorrect values.

2. **Data Cleaning and Filtering**: Once the data is loaded, the program cleans the data by removing irrelevant or empty entries and ensuring proper data types (e.g., converting year to numeric values). The user can apply filters to focus on songs that meet specific criteria, such as songs with popularity above 50, a danceability rating above 0.2, or speechiness within a certain range. This allows users to narrow down the dataset to only the most relevant songs.

3. **Visualize Data**: After filtering the data, the user can view a bar chart that compares an artist's popularity with the overall genre's average popularity. The chart uses color-coding to distinguish between the artist’s popularity (blue) and the overall genre average (orange), making it easy for users to see how an artist performs in comparison to others in the same genre.

4. **Database Creation**: For users who need to store the cleaned and filtered song data, the program can create an SQLite database. It organizes the data into tables for songs, artists, and genres, establishing relationships between them. This structure allows users to efficiently query and analyze the data in the future.

5. **Save and Export**: The user can save the visualized data and the created SQLite database for future reference or use in further analysis.

Overall, the program is designed for music analysts, researchers, and anyone interested in exploring music data, allowing them to easily preprocess and visualize data, identify trends, and make data-driven decisions. The user interaction is mostly through providing a CSV file and following the steps outlined, with the program handling the backend processing and analysis.

In [None]:
class SongAnalyserMenu:
    """Main menu class for the Song Dataset Analyser"""
    
    def __init__(self):
        """Initialize the menu system with widgets"""
        self.output = widgets.Output()
        self.status = widgets.HTML(value="")
        self.create_main_menu()
        
    def check_database(self):
        """Check if the database exists and has been initialized"""
        if not Path("CWDatabase.db").exists():
            self.show_status("Database not found. Please run Data Preprocessing first.", "danger")
            return False
        return True

    def show_status(self, message, status_type="info"):
        """Display a status message with appropriate styling"""
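        colors = {
            "info": "blue",
            "success": "green",
            "danger": "red",
            "warning": "orange"
        }
        # Fall back to the "info" color if an unknown status type is passed
        color = colors.get(status_type, colors["info"])
        self.status.value = f"<div style='color: {color}; margin: 10px 0;'>{message}</div>"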

    def create_main_menu(self):
        """Create the main menu interface with buttons for each functionality"""
        # Title and description
        title = widgets.HTML(value="<h1 style='margin-bottom: 20px;'>Song Dataset Analyser</h1>")
        description = widgets.HTML(
            value="""<div style='margin-bottom: 20px;'>
            Welcome to the Song Dataset Analyser. Choose one of the following options:
            <ul>
                <li><strong>Data Preprocessing:</strong> Clean and prepare the dataset</li>
                <li><strong>Genre Statistics:</strong> Analyze genres for a specific year</li>
                <li><strong>Artist Popularity:</strong> Compare artist popularity across genres</li>
                <li><strong>Top 5 Artists:</strong> View top performing artists</li>
            </ul>
            </div>"""
        )
        
        # Create one button per feature; all use the built-in 'info' style
        
        self.preproc_btn = widgets.Button(
            description='Data Preprocessing',
            button_style='info',
            tooltip='Clean and store data in SQLite database',
            layout=widgets.Layout(width='300px', margin='10px 0')
        )
        
        self.genres_btn = widgets.Button(
            description='Genre Statistics',
            button_style='info',
            tooltip='View genre statistics for a specific year',
            layout=widgets.Layout(width='300px', margin='10px 0')
        )
        
        self.artist_btn = widgets.Button(
            description='Artist Popularity',
            button_style='info',
            tooltip='Analyze artist popularity across genres',
            layout=widgets.Layout(width='300px', margin='10px 0')
        )
        
        self.top5_btn = widgets.Button(
            description='Top 5 Artists',
            button_style='info',
            tooltip='View top 5 artists for a year range',
            layout=widgets.Layout(width='300px', margin='10px 0')
        )
        
        # Add button click handlers
        self.preproc_btn.on_click(self.handle_preproc)
        self.genres_btn.on_click(self.handle_genres)
        self.artist_btn.on_click(self.handle_artist)
        self.top5_btn.on_click(self.handle_top5)
        
        # Layout buttons vertically
        buttons = widgets.VBox([
            self.preproc_btn,
            self.genres_btn,
            self.artist_btn,
            self.top5_btn
        ])
        
        # Main layout
        self.main_container = widgets.VBox([
            title,
            description,
            buttons,
            self.status,
            self.output
        ])
        
        display(self.main_container)
    
    def handle_preproc(self, btn):
        """Handle preprocessing button click"""
        with self.output:
            clear_output()
            try:
                self.show_status("Processing data...", "info")
                preproc.main()
                self.show_status("Data preprocessing completed successfully!", "success")
            except Exception as e:
                self.show_status(f"Error during preprocessing: {str(e)}", "danger")
    
    def handle_genres(self, btn):
        """Handle genres button click"""
        with self.output:
            clear_output()
            if not self.check_database():
                return

            # Create year input widget with validation
            year_input = widgets.BoundedIntText(
                value=2020,
                description='Year:',
                min=1998,
                max=2020,
                layout=widgets.Layout(width='200px')
            )
            
            def on_analyze(btn):
                with self.output:
                    clear_output()
                    try:
                        self.show_status("Analyzing genres...", "info")
                        Genres.analyze_year(year_input.value)
                        self.show_status("Analysis completed!", "success")
                    except Exception as e:
                        self.show_status(f"Error analyzing genres: {str(e)}", "danger")
            
            analyze_btn = widgets.Button(
                description='Analyze',
                button_style='success',
                layout=widgets.Layout(margin='10px 0')
            )
            analyze_btn.on_click(on_analyze)
            
            form = widgets.VBox([
                widgets.HTML(value="<h3>Genre Statistics Analysis</h3>"),
                widgets.HTML(value="Enter a year between 1998 and 2020 to analyze genre statistics:"),
                year_input,
                analyze_btn
            ])
            
            display(form)
    
    def handle_artist(self, btn):
        """Handle artist button click"""
        with self.output:
            clear_output()
            if not self.check_database():
                return

            # Create artist name input; empty input is rejected in on_analyze
            artist_input = widgets.Text(
                description='Artist:',
                placeholder='Enter artist name',
                layout=widgets.Layout(width='300px')
            )
            
            def on_analyze(btn):
                with self.output:
                    clear_output()
                    if not artist_input.value.strip():
                        self.show_status("Please enter an artist name", "warning")
                        return
                    try:
                        self.show_status("Analyzing artist popularity...", "info")
                        Artist.analyze_artist(artist_input.value)
                        self.show_status("Analysis completed!", "success")
                    except ValueError as ve:
                        self.show_status(str(ve), "warning")
                    except Exception as e:
                        self.show_status(f"Error analyzing artist: {str(e)}", "danger")
            
            analyze_btn = widgets.Button(
                description='Analyze',
                button_style='success',
                layout=widgets.Layout(margin='10px 0')
            )
            analyze_btn.on_click(on_analyze)
            
            form = widgets.VBox([
                widgets.HTML(value="<h3>Artist Popularity Analysis</h3>"),
                widgets.HTML(value="Enter an artist name to analyze their popularity across genres:"),
                artist_input,
                analyze_btn
            ])
            
            display(form)
    
    def handle_top5(self, btn):
        """Handle top 5 artists button click"""
        with self.output:
            clear_output()
            if not self.check_database():
                return

            # Create year range input widgets with validation
            start_year = widgets.BoundedIntText(
                value=2017,
                description='Start Year:',
                min=1998,
                max=2020,
                layout=widgets.Layout(width='200px')
            )
            end_year = widgets.BoundedIntText(
                value=2019,
                description='End Year:',
                min=1998,
                max=2020,
                layout=widgets.Layout(width='200px')
            )
            
            def validate_years():
                """Validate year range"""
                if end_year.value < start_year.value:
                    self.show_status("End year must be greater than or equal to start year", "warning")
                    return False
                return True
            
            def on_analyze(btn):
                with self.output:
                    clear_output()
                    if not validate_years():
                        return
                    
                    try:
                        self.show_status("Analyzing top artists...", "info")
                        Top5.analyze_top_artists(start_year.value, end_year.value)
                        self.show_status("Analysis completed!", "success")
                    except Exception as e:
                        self.show_status(f"Error analyzing top artists: {str(e)}", "danger")
            
            analyze_btn = widgets.Button(
                description='Analyze',
                button_style='success',
                layout=widgets.Layout(margin='10px 0')
            )
            analyze_btn.on_click(on_analyze)
            
            form = widgets.VBox([
                widgets.HTML(value="<h3>Top 5 Artists Analysis</h3>"),
                widgets.HTML(value="Select a year range to analyze top performing artists:"),
                start_year,
                end_year,
                analyze_btn
            ])
            
            display(form)

In [None]:
# Create and display the menu
menu = SongAnalyserMenu()
