# Popular Programming Languages #

## Overview ##

This code creates a comprehensive interactive dashboard using the `Dash` framework. The dashboard is designed to analyze the popularity of various programming languages over time, provide different visualizations of the data, and generate forecasts using `Prophet`. The dashboard includes several key components such as filtering options, multiple graph types (line charts, stacked area charts, scatter plots, bar charts, and correlation heatmaps), anomaly detection, and a data table for raw data inspection.

## Key Components ##
1. **Libraries and Dependencies**
    - ***Pandas and NumPy:*** Used for data manipulation and numerical operations.
    - ***Dash Components:***
        - `Dash`, `dcc`, and `html` are used for building the web application layout and handling interactivity.
        - `Input` and `Output` from Dash enable reactive callbacks.
    - ***Dash Ag-Grid:*** Used to create an interactive data table that displays raw data with filtering and sorting capabilities.
    - ***Plotly Express and Plotly Graph Objects:*** Employed to create various types of visualizations with a dark-themed style.
    - ***Prophet:*** A forecasting library used to predict future trends based on historical data.
2. **Global Styling**
    - The `STYLES` dictionary contains a set of common styling parameters (such as colors, margins, widths, and borders) used throughout the dashboard. This ensures consistency in appearance across all components.
3. **Data Processing Functions**

The code defines several helper functions to clean and transform the data:
- ***fill_missing_values(df, strategy):*** Fills missing values in the DataFrame using one of three strategies—linear interpolation, median fill, or zero fill.
- ***check_anomalies(df, cols, threshold):*** Computes the z-score for specified columns and identifies indices where the absolute z-score exceeds a set threshold, flagging potential anomalies.
- ***calculate_moving_average(df, cols, window):*** Calculates the moving average for each specified column using a rolling window. The parameter `min_periods=1` ensures that the calculation is performed even if fewer than the window’s worth of data is available, thereby filling the beginning of the series.
- ***calculate_monthly_diff(df, cols):*** Computes the month-over-month difference for the specified columns using `.diff()`. Unlike the moving average, the first row has no previous month to compare against, so its difference is `NaN`.
- ***calculate_yoy(df, cols):*** Computes the year-over-year (YoY) percentage change for each specified column. If the value from 12 months prior is zero, the function sets the YoY change to zero to avoid division by zero errors. The first 12 rows have no prior-year value and remain `NaN`.

4. **Data Loading and Preprocessing**
- The `Date` column is converted to a datetime object.
- Missing values are filled using the chosen strategy (in this case, linear interpolation).
- The code defines a list of programming language columns to focus on.
- Anomalies are detected for these columns using the `check_anomalies` function.
- Additional features are computed:
    - Moving averages for smoothing the data.
    - Month-over-month differences.
    - Year-over-year changes.
- A new column, `sum_all_langs`, is created to store the total popularity value across all languages.
- The share of the total popularity for each language is computed in a single dictionary comprehension and appended with one `pd.concat` call, which avoids the DataFrame fragmentation caused by inserting many columns one at a time.

5. **Common Graph Layout**
- The function `get_common_layout(title)` centralizes the configuration of graph layouts. It sets the dark template, title, hover mode, and axis titles, ensuring that all graphs share a consistent look and feel.

6. **Layout Creation Functions**

The dashboard’s layout is modularized into separate functions for clarity and reusability:
- ***create_filters():*** Constructs a filter section that includes a multi-select dropdown for choosing programming languages and a date range picker.
- ***create_graphs():*** Builds the main visualizations, which include:
    - A multi-line chart for displaying the popularity trends of selected languages.
    - A stacked area chart to show each language’s share of the total popularity.
    - A scatter plot for year-over-year percentage changes.
    - A bar chart for displaying the popularity data for the latest date in the selected range.
    - A correlation heatmap to visualize relationships among the selected languages.
- ***create_forecast_section():*** Creates a section dedicated to forecasting. It includes a dropdown for selecting a language to forecast, a slider to choose the forecast horizon (in months), and a graph to display the forecast results generated by `Prophet`.
- ***create_anomalies_display():*** Generates a display section for any anomalies detected in the dataset, listing each anomaly in a separate HTML element.
- ***create_data_table():*** Creates an interactive data table using `Dash Ag-Grid`, which allows the user to inspect the raw data with features like pagination, filtering, and sorting.

7. **Application Layout**
- The overall application layout is defined by combining all the layout creation functions into a single Dash HTML Div. The layout includes a title, the filters, the various graph sections, the forecasting section, the anomaly display, and the raw data table.

8. **Callbacks**

The interactivity of the dashboard is managed by several callback functions:
- ***update_charts():*** This callback updates all the main graphs when the user changes the selected languages or date range. It handles empty data and the case where all selected languages have zero values, then builds each graph (multi-line, stacked area, scatter, bar, and heatmap) with consistent styling.
- ***display_hover_data():*** A simple callback that displays detailed information about data points on the multi-line chart when the user hovers over them.
- ***update_forecast():*** This callback handles the forecasting functionality. It filters the data by the selected date range and language, renames the columns to `ds` and `y` as `Prophet` expects, drops any remaining missing values, and checks that at least 24 data points are available (the minimum the code requires before fitting yearly seasonality). It then fits the `Prophet` model, generates future forecasts, and builds a forecast graph with actual data, forecasted values, and confidence intervals. Each keyword argument in the layout update is passed only once, because duplicates raise a `TypeError`.

9. **Running the Application**

Finally, the app is launched by calling `app.run_server(debug=True)` if the script is executed as the main module.

### 1. IMPORT LIBRARIES ###

In [3]:
import pandas as pd
import numpy as np
from dash import Dash, dcc, html, Input, Output, State
import dash_ag_grid as dag
import plotly.express as px
import plotly.graph_objects as go
from prophet import Prophet

### 2. GLOBAL VARIABLES AND STYLES DICTIONARY ###

In [5]:
# STYLES dictionary holds common styling for the dashboard
STYLES = {
    'background': '#111111',         # Dashboard background color
    'text_color': 'white',           # Common text color
    'grid_height': '400px',          # Height for the data table grid
    'margin': '20px',                # Standard margin for spacing between elements
    'border': '1px solid #444',      # Border style for components like anomaly display
    'half_width': '49%',             # Width for components that share a row (two per row)
    'full_width': '100%',            # Full width styling for certain elements
    'filter_width': '40%'            # Width for filter components such as dropdowns and date pickers
}

### 3. DATA PROCESSING FUNCTIONS ###

In [7]:
def fill_missing_values(df, strategy='interpolate'):
    """
    Fill missing values in the DataFrame using the specified strategy.
    
    Parameters:
      - df: Input DataFrame.
      - strategy: Method to fill missing values. Options are:
          "interpolate" - uses linear interpolation,
          "median" - fills with the median of each column,
          any other value - fills with zeros.
          
    Returns:
      - DataFrame with missing values filled.
      
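    Note: The function returns a new DataFrame; assign the result back
          (e.g. df = fill_missing_values(df)) so the changes persist.
    """
    if strategy == 'interpolate':
        df = df.interpolate(method='linear')
    elif strategy == 'median':
        # numeric_only=True skips non-numeric columns such as 'Date'
        df = df.fillna(df.median(numeric_only=True))
    else:
        df = df.fillna(0)
    return df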
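
The three strategies can be illustrated on a toy frame. This is only a sketch: the column name and values are made up, and it calls the underlying pandas methods directly rather than the helper.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap (illustrative values only).
toy = pd.DataFrame({'Python': [1.0, np.nan, 3.0, 10.0]})

# Linear interpolation: the gap becomes the midpoint of its neighbours.
interpolated = toy.interpolate(method='linear')

# Median fill: the gap becomes the median of the observed values (1, 3, 10).
median_filled = toy.fillna(toy.median(numeric_only=True))

# Zero fill: the gap becomes 0.
zero_filled = toy.fillna(0)

print(interpolated['Python'].tolist())
print(median_filled['Python'].tolist())
print(zero_filled['Python'].tolist())
```

Interpolation suits a smooth monthly series, while zero fill can create artificial dips, so the choice of strategy affects every derived metric.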

In [8]:
def check_anomalies(df, cols, threshold=3):
    """
    Check for anomalies in the specified columns using the z-score method.
    
    For each numeric column, calculate the z-score and flag indices where
    the absolute z-score exceeds the threshold.
    
    Parameters:
      - df: Input DataFrame.
      - cols: List of columns to check.
      - threshold: z-score threshold for flagging anomalies.
      
    Returns:
      - Dictionary with column names as keys and lists of anomaly indices as values.
    """
    anomalies = {}
    for col in cols:
        mean = df[col].mean()
        std = df[col].std()
        if std == 0:
            continue

        z_scores = (df[col] - mean) / std
        anomaly_idx = df.index[abs(z_scores) > threshold].tolist()
        if anomaly_idx:
            anomalies[col] = anomaly_idx

    return anomalies
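
As a minimal sketch of the z-score rule, the snippet below applies the same calculation to a toy column containing a single spike. The column name and values are invented for illustration.

```python
import pandas as pd

# Toy column: twenty flat readings followed by one spike (illustrative values only).
toy = pd.DataFrame({'Rust': [1.0] * 20 + [10.0]})

mean = toy['Rust'].mean()
std = toy['Rust'].std()  # pandas uses the sample standard deviation (ddof=1)
z_scores = (toy['Rust'] - mean) / std

# Indices whose absolute z-score exceeds the threshold of 3.
flagged = toy.index[z_scores.abs() > 3].tolist()
print(flagged)
```

Only the spike at index 20 exceeds the threshold here. Note that the flagged values are index labels, not dates, so they need to be mapped back to the `Date` column to be read as months.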

In [9]:
def calculate_moving_average(df, cols, window=3):
    """
    Calculate the moving average for each specified column.
    
    Parameters:
      - df: Input DataFrame.
      - cols: List of column names to calculate the moving average.
      - window: Window size for the moving average.
      
    Returns:
      - DataFrame with additional columns containing the moving averages.
      
    Note: min_periods=1 is used, so the average is calculated even when the number of available data points
          is less than the window size. This intentionally fills the beginning of the series.
    """
    for col in cols:
        df[f'{col}_MA_{window}'] = df[col].rolling(window=window, min_periods=1).mean()
    return df

In [10]:
def calculate_monthly_diff(df, cols):
    """
    Calculate the month-over-month difference for each specified column.
    
    Parameters:
      - df: Input DataFrame.
      - cols: List of column names.
      
    Returns:
      - DataFrame with new columns containing the difference from the previous month.
      
    Note: .diff() has no min_periods option; the first row has no previous value,
          so its difference is NaN.
    """
    for col in cols:
        df[f'{col}_MoM'] = df[col].diff()
    return df
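
To make the edge behavior of the two helpers concrete, the following sketch runs the same pandas calls on a toy series (values are illustrative only). It contrasts `rolling(..., min_periods=1)`, which yields a value for the very first row, with `.diff()`, which does not.

```python
import pandas as pd

# Toy series standing in for one language column (illustrative values only).
s = pd.Series([1.0, 2.0, 4.0, 7.0])

# Moving average with min_periods=1: early rows average whatever is available.
ma = s.rolling(window=3, min_periods=1).mean()

# Month-over-month difference: the first row has no predecessor, so it is NaN.
mom = s.diff()

print(ma.tolist())
print(mom.tolist())
```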

In [11]:
def calculate_yoy(df, cols):
    """
    Calculate the year-over-year (YoY) percentage change for each specified column.
    
    If the value from 12 months prior is zero, the YoY change is set to zero to avoid division by zero.
    
    Parameters:
      - df: Input DataFrame.
      - cols: List of column names.
      
    Returns:
      - DataFrame with new columns containing YoY percentage changes.
    """
    for col in cols:
        prev = df[col].shift(12)
        df[f'{col}_YoY'] = np.where(prev == 0, 0, ((df[col] - prev) / prev) * 100)
    return df
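
One possible variant, shown here only as a sketch on a toy series, replaces zero denominators with `NaN` before dividing, so the division by zero never happens at all. The result matches the rule described in the docstring: rows with a zero prior-year value get 0, and the first 12 rows stay `NaN`.

```python
import numpy as np
import pandas as pd

# 14 monthly values; the first one is zero to exercise the guard (illustrative only).
s = pd.Series([0.0, 2.0] + [4.0] * 10 + [5.0, 3.0])

prev = s.shift(12)                   # value from 12 rows (months) earlier
safe_prev = prev.replace(0, np.nan)  # never divide by zero

# Keep the computed change unless the prior-year value was exactly zero.
yoy = ((s - prev) / safe_prev * 100).where(prev != 0, 0)

print(yoy.tolist())
```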

### 4. LOAD AND PREPROCESS DATA ###

In [13]:
# Load data from a local CSV file (replace the path, or use a URL, if necessary)
df_raw = pd.read_csv('Popularity of Programming Languages from 2004 to 2024.csv')
df_raw

Unnamed: 0,Date,Abap,Ada,C/C++,C#,Cobol,Dart,Delphi/Pascal,Go,Groovy,...,Powershell,Python,R,Ruby,Rust,Scala,Swift,TypeScript,VBA,Visual Basic
0,July 2004,0.34,0.36,10.01,4.68,0.42,0.00,2.80,0.00,0.03,...,0.16,2.51,0.39,0.33,0.24,0.17,0.00,0.00,1.43,8.50
1,August 2004,0.35,0.36,9.74,4.96,0.46,0.00,2.65,0.00,0.07,...,0.15,2.62,0.40,0.40,0.19,0.17,0.00,0.00,1.45,8.51
2,September 2004,0.41,0.41,9.59,5.04,0.51,0.00,2.64,0.00,0.08,...,0.08,2.71,0.40,0.41,0.17,0.13,0.00,0.00,1.54,8.38
3,October 2004,0.40,0.38,9.47,5.29,0.53,0.00,2.76,0.00,0.09,...,0.12,2.91,0.42,0.46,0.12,0.14,0.00,0.00,1.61,8.46
4,November 2004,0.38,0.38,9.48,5.22,0.55,0.00,2.75,0.00,0.07,...,0.12,2.83,0.41,0.44,0.17,0.15,0.00,0.00,1.50,8.21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,August 2024,0.62,0.96,6.36,6.67,0.20,0.93,0.06,2.12,0.27,...,0.98,29.59,4.65,0.98,2.62,0.59,2.75,2.95,1.00,0.43
242,September 2024,0.65,1.00,6.49,6.66,0.23,0.90,0.05,2.16,0.25,...,0.97,29.49,4.72,0.99,2.66,0.58,2.72,2.98,1.01,0.45
243,October 2024,0.64,1.03,6.75,6.59,0.24,0.94,0.03,2.19,0.25,...,0.98,29.46,4.71,0.98,2.58,0.55,2.71,2.94,0.99,0.46
244,November 2024,0.64,1.12,6.92,6.49,0.19,0.95,0.03,2.20,0.27,...,0.99,29.41,4.70,0.97,2.63,0.48,2.68,2.91,0.99,0.49


In [14]:
df_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246 entries, 0 to 245
Data columns (total 30 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           246 non-null    object 
 1   Abap           246 non-null    float64
 2   Ada            246 non-null    float64
 3   C/C++          246 non-null    float64
 4   C#             246 non-null    float64
 5   Cobol          246 non-null    float64
 6   Dart           246 non-null    float64
 7   Delphi/Pascal  246 non-null    float64
 8   Go             246 non-null    float64
 9   Groovy         246 non-null    float64
 10  Haskell        246 non-null    float64
 11  Java           246 non-null    float64
 12  JavaScript     246 non-null    float64
 13  Julia          246 non-null    float64
 14  Kotlin         246 non-null    float64
 15  Lua            246 non-null    float64
 16  Matlab         246 non-null    float64
 17  Objective-C    246 non-null    float64
 18  Perl      

In [15]:
# Create a copy of the raw data
df = df_raw.copy()

In [16]:
# Convert the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246 entries, 0 to 245
Data columns (total 30 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           246 non-null    datetime64[ns]
 1   Abap           246 non-null    float64       
 2   Ada            246 non-null    float64       
 3   C/C++          246 non-null    float64       
 4   C#             246 non-null    float64       
 5   Cobol          246 non-null    float64       
 6   Dart           246 non-null    float64       
 7   Delphi/Pascal  246 non-null    float64       
 8   Go             246 non-null    float64       
 9   Groovy         246 non-null    float64       
 10  Haskell        246 non-null    float64       
 11  Java           246 non-null    float64       
 12  JavaScript     246 non-null    float64       
 13  Julia          246 non-null    float64       
 14  Kotlin         246 non-null    float64       
 15  Lua            246 non-
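
Because the dates are stored as strings such as `July 2004`, an explicit format can be passed instead of relying on pandas to infer one. The sketch below uses sample strings of the same shape; the third entry is deliberately invalid to show what `errors='coerce'` does.

```python
import pandas as pd

# Sample strings in the same "Month YYYY" shape as the Date column.
raw_dates = pd.Series(['July 2004', 'August 2004', 'not a date'])

# %B = full month name, %Y = four-digit year; unparseable entries become NaT.
parsed = pd.to_datetime(raw_dates, format='%B %Y', errors='coerce')
print(parsed)
```

With `errors='coerce'`, a malformed date silently becomes `NaT`, so it is worth checking `parsed.isna().sum()` after conversion.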



In [17]:
# Define the language columns (adjust this list if the dataset changes)
language_cols = [
    'Abap', 'Ada', 'C/C++', 'C#',
    'Cobol', 'Dart', 'Delphi/Pascal',
    'Go', 'Groovy', 'Haskell', 'Java',
    'JavaScript', 'Julia', 'Kotlin',
    'Lua', 'Matlab', 'Objective-C',
    'Perl', 'PHP', 'Powershell',
    'Python', 'R', 'Ruby', 'Rust',
    'Scala', 'Swift', 'TypeScript',
    'VBA', 'Visual Basic'
]
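
If the dataset changes often, the list could also be derived from the DataFrame instead of being typed by hand. The sketch below assumes that every numeric column is a language column, which holds only before the derived columns (moving averages, shares, and so on) are added; the toy values are illustrative.

```python
import pandas as pd

# Toy frame with the same layout: one Date column plus numeric language columns.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2004-07-01', '2004-08-01']),
    'Python': [2.51, 2.62],
    'Java': [30.0, 29.9],
})

# Treat every numeric column as a language column (an assumption, see above).
derived_cols = toy.select_dtypes(include='number').columns.tolist()
print(derived_cols)
```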

In [18]:
# Check for anomalies in the language columns
anomalies = check_anomalies(df, language_cols)
anomalies

{'Haskell': [239, 240, 241, 242, 243, 244], 'Rust': [241, 242, 244, 245]}

In [19]:
# Calculate moving averages, month-over-month differences, and year-over-year changes
df = calculate_moving_average(df, language_cols, window=3)
df = calculate_monthly_diff(df, language_cols)
df = calculate_yoy(df, language_cols)
df

Unnamed: 0,Date,Abap,Ada,C/C++,C#,Cobol,Dart,Delphi/Pascal,Go,Groovy,...,Powershell_YoY,Python_YoY,R_YoY,Ruby_YoY,Rust_YoY,Scala_YoY,Swift_YoY,TypeScript_YoY,VBA_YoY,Visual Basic_YoY
0,2004-07-01,0.34,0.36,10.01,4.68,0.42,0.00,2.80,0.00,0.03,...,,,,,,,,,,
1,2004-08-01,0.35,0.36,9.74,4.96,0.46,0.00,2.65,0.00,0.07,...,,,,,,,,,,
2,2004-09-01,0.41,0.41,9.59,5.04,0.51,0.00,2.64,0.00,0.08,...,,,,,,,,,,
3,2004-10-01,0.40,0.38,9.47,5.29,0.53,0.00,2.76,0.00,0.09,...,,,,,,,,,,
4,2004-11-01,0.38,0.38,9.48,5.22,0.55,0.00,2.75,0.00,0.07,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,2024-08-01,0.62,0.96,6.36,6.67,0.20,0.93,0.06,2.12,0.27,...,6.521739,6.247756,5.442177,-7.547170,24.761905,-14.492754,7.003891,-4.220779,7.526882,-23.214286
242,2024-09-01,0.65,1.00,6.49,6.66,0.23,0.90,0.05,2.16,0.25,...,7.777778,5.812702,6.546275,-8.333333,27.884615,-15.942029,1.492537,-2.295082,4.123711,-27.419355
243,2024-10-01,0.64,1.03,6.75,6.59,0.24,0.94,0.03,2.19,0.25,...,8.888889,5.139186,6.081081,-8.411215,26.470588,-16.666667,0.743494,-1.672241,3.125000,-20.689655
244,2024-11-01,0.64,1.12,6.92,6.49,0.19,0.95,0.03,2.20,0.27,...,10.000000,5.073240,5.381166,-5.825243,30.845771,-21.311475,-2.189781,-1.355932,3.125000,-12.500000


In [20]:
# Calculate share of total for each language while avoiding division by zero.
# To avoid DataFrame fragmentation, build share columns in one go using pd.concat.
df['sum_all_langs'] = df[language_cols].sum(axis=1)
share_data = {
    f'{lang}_Share': np.where(df['sum_all_langs'] == 0, 0, df[lang] / df['sum_all_langs'])
    for lang in language_cols
}
share_df = pd.DataFrame(share_data, index=df.index)
df = pd.concat([df, share_df], axis=1)
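
The same share calculation can be sketched on a toy frame to check the zero-total guard. This variant uses `DataFrame.div` rather than the dictionary comprehension above, and the column names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Second row sums to zero to exercise the guard (illustrative values only).
toy = pd.DataFrame({'Python': [2.0, 0.0], 'Java': [6.0, 0.0]})
cols = ['Python', 'Java']

total = toy[cols].sum(axis=1)

# Divide each column by the row total; rows with a zero total get a share of 0.
shares = (
    toy[cols]
    .div(total.replace(0, np.nan), axis=0)
    .fillna(0)
    .add_suffix('_Share')
)

# Append all share columns in a single concat call.
toy = pd.concat([toy, shares], axis=1)
print(toy)
```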

In [21]:
df

Unnamed: 0,Date,Abap,Ada,C/C++,C#,Cobol,Dart,Delphi/Pascal,Go,Groovy,...,Powershell_Share,Python_Share,R_Share,Ruby_Share,Rust_Share,Scala_Share,Swift_Share,TypeScript_Share,VBA_Share,Visual Basic_Share
0,2004-07-01,0.34,0.36,10.01,4.68,0.42,0.00,2.80,0.00,0.03,...,0.001600,0.025105,0.003901,0.003301,0.002400,0.001700,0.000000,0.000000,0.014303,0.085017
1,2004-08-01,0.35,0.36,9.74,4.96,0.46,0.00,2.65,0.00,0.07,...,0.001500,0.026203,0.004000,0.004000,0.001900,0.001700,0.000000,0.000000,0.014501,0.085109
2,2004-09-01,0.41,0.41,9.59,5.04,0.51,0.00,2.64,0.00,0.08,...,0.000800,0.027095,0.003999,0.004099,0.001700,0.001300,0.000000,0.000000,0.015397,0.083783
3,2004-10-01,0.40,0.38,9.47,5.29,0.53,0.00,2.76,0.00,0.09,...,0.001200,0.029097,0.004200,0.004600,0.001200,0.001400,0.000000,0.000000,0.016098,0.084592
4,2004-11-01,0.38,0.38,9.48,5.22,0.55,0.00,2.75,0.00,0.07,...,0.001200,0.028306,0.004101,0.004401,0.001700,0.001500,0.000000,0.000000,0.015003,0.082116
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,2024-08-01,0.62,0.96,6.36,6.67,0.20,0.93,0.06,2.12,0.27,...,0.009801,0.295930,0.046505,0.009801,0.026203,0.005901,0.027503,0.029503,0.010001,0.004300
242,2024-09-01,0.65,1.00,6.49,6.66,0.23,0.90,0.05,2.16,0.25,...,0.009700,0.294900,0.047200,0.009900,0.026600,0.005800,0.027200,0.029800,0.010100,0.004500
243,2024-10-01,0.64,1.03,6.75,6.59,0.24,0.94,0.03,2.19,0.25,...,0.009799,0.294571,0.047095,0.009799,0.025797,0.005499,0.027097,0.029397,0.009899,0.004600
244,2024-11-01,0.64,1.12,6.92,6.49,0.19,0.95,0.03,2.20,0.27,...,0.009900,0.294100,0.047000,0.009700,0.026300,0.004800,0.026800,0.029100,0.009900,0.004900


### 5. HELPER FUNCTION FOR COMMON GRAPH LAYOUT ###

In [23]:
def get_common_layout(title):
    """
    Return a dictionary with common layout settings for Plotly graphs.
    
    Parameters:
      - title: Graph title.
      
    Returns:
      - Dictionary of layout settings.
    """
    return dict(
        template='plotly_dark',
        title=title,
        hovermode='x unified',
        xaxis_title='Date',
        yaxis_title='Value'
    )
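
When a graph needs different axis titles, the shared settings can be merged with overrides before being unpacked. The sketch below repeats the helper so that it runs on its own; `bar_layout` is a hypothetical name used only for illustration.

```python
def get_common_layout(title):
    # Same settings as the helper above, repeated so the sketch is self-contained.
    return dict(
        template='plotly_dark',
        title=title,
        hovermode='x unified',
        xaxis_title='Date',
        yaxis_title='Value'
    )

# Later keys win in a dict merge, so each keyword appears exactly once.
bar_layout = {
    **get_common_layout('Popularity'),
    'xaxis_title': 'Language',
    'yaxis_title': 'Popularity',
}
print(bar_layout['xaxis_title'], bar_layout['yaxis_title'])
```

The merged dictionary can then be passed as `fig.update_layout(**bar_layout)`. Passing `**get_common_layout(...)` together with a separate `yaxis_title=...` keyword in the same call would instead raise a `TypeError` for the duplicated argument.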

### 6. LAYOUT CREATION FUNCTIONS ###

In [25]:
def create_filters():
    """
    Create the filters section containing a language selection dropdown and a date range picker.
    
    Returns:
      - A Dash HTML Div containing the filter components.
    """
    return html.Div(
        children=[
            html.Div([
                html.Label(
                    'Select Languages:',
                    style={'color': STYLES['text_color']}
                ),
                dcc.Dropdown(
                    id='language-dropdown',
                    options=[
                        {'label': lang, 'value': lang} for lang in language_cols
                    ],
                    value=['Python', 'JavaScript'],  # Default selected languages
                    multi=True,
                    style={
                        'backgroundColor': STYLES['background'],
                        'color': 'black'
                    }
                )
            ], style={'width': STYLES['filter_width'], 'display': 'inline-block', 'verticalAlign': 'top', 'margin': STYLES['margin']}),
            html.Div([
                html.Label(
                    'Select Date Range:',
                    style={
                        'color': STYLES['text_color']
                    }
                ),
                dcc.DatePickerRange(
                    id='date-range-picker',
                    start_date=df['Date'].min(),
                    end_date=df['Date'].max(),
                    min_date_allowed=df['Date'].min(),
                    max_date_allowed=df['Date'].max(),
                    display_format='YYYY-MM-DD',
                    style={
                        'backgroundColor': STYLES['background'],
                        'color': 'black'
                    }
                )
            ], style={'width': STYLES['filter_width'], 'display': 'inline-block', 'marginLeft': STYLES['margin'], 'verticalAlign': 'top'})
        ], style={'marginBottom': STYLES['margin']}
    )

In [26]:
def create_graphs():
    """
    Create the main graphs section including:
      - Multi-line chart (language popularity)
      - Stacked area chart (share of total)
      - Scatter plot (year-over-year changes)
      - Bar chart (latest data)
      - Correlation heatmap
      
    Returns:
      - A Dash HTML Div containing all graphs.
    """
    return html.Div([
        # Top row: Multi-line chart and stacked area chart
        html.Div([
            html.Div(
                [dcc.Graph(id='multi-line-chart')],
                style={
                    'width': STYLES['half_width'],
                    'display': 'inline-block'
                }
            ),
            html.Div(
                [dcc.Graph(id='stacked-area-chart')],
                style={
                    'width': STYLES['half_width'],
                    'display': 'inline-block'
                }
            )
        ], style={'marginBottom': STYLES['margin']}),
        # Next row: Scatter plot and bar chart
        html.Div([
            html.Div(
                [dcc.Graph(id='yoy-scatter-plot')],
                style={
                    'width': STYLES['half_width'],
                    'display': 'inline-block'
                }
            ),
            html.Div(
                [dcc.Graph(id='bar-chart')],
                style={
                    'width': STYLES['half_width'],
                    'display': 'inline-block'
                }
            )
        ], style={'marginBottom': STYLES['margin']}
                ),
        # Full-width correlation heatmap
        html.Div(
            [dcc.Graph(id='heatmap')],
            style={
                'width': STYLES['full_width'],
                'display': 'inline-block'
            }
        )
    ])

In [27]:
def create_forecast_section():
    """
    Create the forecasting section with controls (language dropdown and horizon slider) and the forecast graph.
    
    Returns:
      - A Dash HTML Div containing the forecasting controls and graph.
    """
    return html.Div([
        html.H2(
            'Forecasting',
            style={
                'marginTop': STYLES['margin'],
                'color': STYLES['text_color']
            }
        ),
        html.Div([
            html.Div([
                html.Label(
                    'Select Language for Forecasting:',
                    style={
                        'color': STYLES['text_color']
                    }
                ),
                dcc.Dropdown(
                    id='forecast-language-dropdown',
                    options=[{'label': lang, 'value': lang} for lang in language_cols],
                    value='Python',  # Default forecast language
                    clearable=False,
                    style={
                        'backgroundColor': STYLES['background'],
                        'color': 'black'
                    }
                )
            ], style={'width': STYLES['filter_width'], 'display': 'inline-block', 'verticalAlign': 'top', 'margin': STYLES['margin']}),
            html.Div([
                html.Label(
                    'Forecast Horizon (months):',
                    style={
                        'color': STYLES['text_color']
                    }
                ),
                dcc.Slider(
                    id='forecast-horizon-slider',
                    min=1,
                    max=24,
                    step=1,
                    value=12,
                    marks={i: str(i) for i in range(1, 25)}
                )
            ], style={'width': STYLES['filter_width'], 'display': 'inline-block', 'marginLeft': STYLES['margin'], 'verticalAlign': 'top'})
        ], style={'marginBottom': STYLES['margin'], 'marginTop': STYLES['margin']}),
        html.Div(
            [dcc.Graph(id='forecast-figure')],
            style={
                'width': STYLES['full_width'],
                'display': 'inline-block',
                'marginBottom': STYLES['margin']
            }
        )
    ])

In [28]:
def create_anomalies_display():
    """
    Create a section to display detected anomalies in the data.
    
    Returns:
      - A Dash HTML Div that shows each anomaly as a separate element.
    """
    if anomalies:
        anomaly_items = [
            html.Div(f'{col}: indices {idx_list}') for col, idx_list in anomalies.items()
        ]
    else:
        anomaly_items = [
            html.Div('No anomalies detected.')
        ]
    return html.Div([
        html.H3(
            'Data Anomalies',
            style={
                'color': STYLES['text_color']
            }
        ),
        html.Div(
            anomaly_items,
            style={
                'padding': STYLES['margin'],
                'border': STYLES['border']
            }
        )
    ], style={'marginTop': STYLES['margin']})

In [29]:
def create_data_table():
    """
    Create an interactive data table using dash_ag_grid.
    
    Returns:
      - A Dash HTML Div containing the data table.
    """
    return html.Div([
        html.H3(
           'Raw Data',
            style={
                'color': STYLES['text_color']
            }
        ),
        dag.AgGrid(
            rowData=df.to_dict('records'),
            columnDefs=[{"field": i, "filter": True, "sortable": True} for i in df.columns],
            dashGridOptions={
                'pagination': True,
                'paginationPageSize': 10
            },
            className='ag-theme-balham-dark',
            style={
                'height': STYLES['grid_height'],
                'width': STYLES['full_width']
            }
        )
    ], style={'marginTop': STYLES['margin']})

### 7. CREATE THE APP LAYOUT ###

In [31]:
app = Dash(__name__)
app.layout = html.Div(
    style={
        'backgroundColor': STYLES['background'],
        'color': STYLES['text_color'],
        'padding': STYLES['margin']
    },
    children=[
        html.H1(
            'Programming Languages Popularity Analysis Dashboard',
            style={
                'textAlign': 'center',
                'marginBottom': STYLES['margin']
            }
        ),
        create_filters(),
        create_graphs(),
        # Hover information section for the multi-line chart
        html.Div([
            html.H4(
                'Hover Information (Multi-Line Chart):',
                style={'color': STYLES['text_color']}
            ),
            html.Div(
                id='hover-info',
                style={
                    'padding': STYLES['margin'],
                    'border': STYLES['border']
                }
            )
        ], style={'marginTop': STYLES['margin']}),
        create_forecast_section(),
        create_anomalies_display(),
        create_data_table()
    ]
)

### 8. CALLBACKS ###

In [33]:
# 8.1: Callback to update main graphs based on selected languages and date range.
@app.callback(
    Output('multi-line-chart', 'figure'),
    Output('stacked-area-chart', 'figure'),
    Output('yoy-scatter-plot', 'figure'),
    Output('bar-chart', 'figure'),
    Output('heatmap', 'figure'),
    Input('language-dropdown', 'value'),
    Input('date-range-picker', 'start_date'),
    Input('date-range-picker', 'end_date')
)

def update_charts(selected_langs, start_date, end_date):
    # If no language is selected, default to ['Python'].
    if not selected_langs:
        selected_langs = ['Python']

    # Filter the DataFrame by the selected date range.
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    dff = df.loc[mask].copy()
    if dff.empty:
        empty_fig = go.Figure(
            layout=get_common_layout('No data available in the selected range')
        )
        return empty_fig, empty_fig, empty_fig, empty_fig, empty_fig

    # If all selected languages have only zero values, return a placeholder for all graphs.
    if all(dff[lang].sum() == 0 for lang in selected_langs):
        placeholder = go.Figure(
            layout=get_common_layout('All selected languages have zero values')
        )
        return placeholder, placeholder, placeholder, placeholder, placeholder

    # 1) Build the multi-line chart for language popularity.
    fig_multi_line = go.Figure()
    for lang in selected_langs:
        # If all values for the language are zero, add a trace with zeros and annotate it.
        if dff[lang].sum() == 0:
            fig_multi_line.add_trace(
                go.Scatter(
                    x=dff['Date'],
                    y=[0] * len(dff),
                    mode='lines',
                    name=f'{lang} (all zeros)'
                )
            )
        else:
            fig_multi_line.add_trace(
                go.Scatter(
                    x=dff['Date'],
                    y=dff[lang],
                    mode='lines',
                    name=lang
                )
            )
    fig_multi_line.update_layout(
        **get_common_layout('Language Popularity Dynamics')
    )

    # 2) Build the stacked area chart for share of total popularity.
    fig_stacked_area = go.Figure()
    for lang in selected_langs:
        fig_stacked_area.add_trace(
            go.Scatter(
                x=dff['Date'],
                y=dff[f'{lang}_Share'],
                stackgroup='one',
                name=f'{lang} Share'
            )
        )
    fig_stacked_area.update_layout(
        **get_common_layout('Share of Total Popularity')
    )

    # 3) Build the scatter plot for year-over-year (YoY) percentage changes.
    fig_yoy_scatter = go.Figure()
    for lang in selected_langs:
        fig_yoy_scatter.add_trace(
            go.Scatter(
                x=dff['Date'],
                y=dff[f'{lang}_YoY'],
                mode='markers+lines',
                name=f'{lang} YoY (%)'
            )
        )
    fig_yoy_scatter.update_layout(
        **get_common_layout('Year-over-Year (%)')
    )

    # 4) Build the bar chart for the latest date in the selected range.
    latest_date = dff['Date'].max()
    dff_latest = dff[dff['Date'] == latest_date]
    dff_melted = dff_latest.melt(
        id_vars='Date',
        value_vars=selected_langs,
        var_name='Language',
        value_name='Popularity'
    )
    fig_bar = px.bar(
        dff_melted,
        x='Language',
        y='Popularity',
        title=f'Popularity on {latest_date.date()}',
        template='plotly_dark'
    )
    fig_bar.update_layout(
        xaxis_title='Language',
        yaxis_title='Popularity'
    )

    # 5) Build the correlation heatmap among the selected languages.
    corr_data = dff[selected_langs].corr()
    fig_heatmap = px.imshow(
        corr_data,
        text_auto=True,
        title='Correlation Heatmap',
        template='plotly_dark'
    )

    return fig_multi_line, fig_stacked_area, fig_yoy_scatter, fig_bar, fig_heatmap
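
To make steps 4 and 5 above concrete, the sketch below applies the same `melt` and `corr` calls to a tiny hand-made frame. The column names and values (`Python`, `Java`) are invented for illustration and are not taken from the dashboard's dataset.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01']),
    'Python': [10.0, 20.0, 30.0],
    'Java': [30.0, 20.0, 10.0],
})
langs = ['Python', 'Java']

# Keep only the rows for the most recent date, as the bar chart does.
latest = sample[sample['Date'] == sample['Date'].max()]

# Reshape from one column per language (wide) to one row per language (long).
long_form = latest.melt(
    id_vars='Date',
    value_vars=langs,
    var_name='Language',
    value_name='Popularity',
)

# Pairwise Pearson correlation between the language columns.
corr = sample[langs].corr()
```

Here `long_form` has one row per language, which is the shape `px.bar` needs for `x='Language'`. `corr` is a symmetric matrix with ones on the diagonal; in this toy data the two columns move in exactly opposite directions, so their correlation is -1.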

# 8.2: Callback to display hover information for the multi-line chart.
@app.callback(
    Output('hover-info', 'children'),
    Input('multi-line-chart', 'hoverData')
)
def display_hover_data(hover_data):
    if hover_data is None:
        return 'Hover over the multi-line chart to see details.'
    # Directly access the list of points; each point contains 'x' and 'y'.
    points = hover_data['points']
    messages = [f"Date: {point['x']}, Value: {point['y']}" for point in points]
    return html.Div([html.Div(msg) for msg in messages])
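
The `hoverData` property that Dash passes to this callback is a dictionary with a `points` list, one entry per hovered point. The sketch below mimics that shape with a hand-written payload (the values are invented, and real payloads carry more keys than shown here) and formats it the same way, outside of Dash. The helper name `format_hover_points` is hypothetical.

```python
def format_hover_points(hover_data):
    """Return one message per hovered point, or a prompt when nothing is hovered."""
    if hover_data is None:
        return ['Hover over the multi-line chart to see details.']
    return [
        f"Date: {point['x']}, Value: {point['y']}"
        for point in hover_data['points']
    ]


# Hand-written payload imitating the structure of Plotly hover data.
example_payload = {
    'points': [
        {'curveNumber': 0, 'pointNumber': 5, 'x': '2020-06-01', 'y': 31.5},
        {'curveNumber': 1, 'pointNumber': 5, 'x': '2020-06-01', 'y': 17.2},
    ]
}

messages = format_hover_points(example_payload)
```

Keeping the formatting in a plain function like this makes it possible to check the message text without starting the app; the callback would then only wrap the returned strings in `html.Div` elements.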

# 8.3: Callback for forecasting using Prophet.
@app.callback(
    Output('forecast-figure', 'figure'),
    Input('forecast-language-dropdown', 'value'),
    Input('forecast-horizon-slider', 'value'),
    Input('date-range-picker', 'start_date'),
    Input('date-range-picker', 'end_date')
)
def update_forecast(selected_lang, horizon_months, start_date, end_date):
    # Filter data for forecasting and rename columns for Prophet ('ds' for date, 'y' for target).
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    dff = df.loc[mask, ['Date', selected_lang]].rename(columns={'Date': 'ds', selected_lang: 'y'}).copy()
    
    # Drop any remaining missing values to prevent warnings or errors in Prophet.
    dff = dff.dropna()

    # Bail out early when there is nothing to forecast (no rows, or every value is zero).
    if dff.empty or dff['y'].sum() == 0:
        return go.Figure(
            layout=get_common_layout('No data available for forecasting')
        )

    # Require at least 24 monthly observations (two full years) so that
    # yearly seasonality can be estimated.
    if len(dff) < 24:
        return go.Figure(
            layout=get_common_layout('Insufficient data for forecasting (< 24 months)')
        )

    # Use a fixed frequency 'MS' (month start) for forecasting due to potential irregularities in the data.
    freq = 'MS'

    # Build and fit the Prophet model with error handling.
    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,
        daily_seasonality=False
    )
    try:
        model.fit(dff)
    except Exception as e:
        return go.Figure(
            layout=get_common_layout(f'Forecasting error: {str(e)}')
        )

    # Create a future DataFrame for forecasting with the fixed frequency.
    future = model.make_future_dataframe(periods=horizon_months, freq=freq)
    forecast = model.predict(future)

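    # Build the forecast figure: actual data, the forecast, and a shaded uncertainty band.
    # The upper bound is drawn first so that fill='tonexty' on the lower bound shades the area between them.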
    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=dff['ds'],
            y=dff['y'],
            mode='lines',
            name='Actual Data'
        )
    )
    fig.add_trace(
        go.Scatter(
            x=forecast['ds'],
            y=forecast['yhat'],
            mode='lines',
            name='Forecast'
        )
    )
    fig.add_trace(
        go.Scatter(
            x=forecast['ds'],
            y=forecast['yhat_upper'],
            mode='lines',
            line=dict(width=0),
            name='Upper Confidence',
            showlegend=False
        )
    )
    fig.add_trace(
        go.Scatter(
            x=forecast['ds'],
            y=forecast['yhat_lower'],
            mode='lines',
            line=dict(width=0),
            fill='tonexty',
            fillcolor='rgba(255,0,0,0.2)',
            name='Lower Confidence',
            showlegend=False
        )
    )
    fig.update_layout(
        **get_common_layout(f'{selected_lang} - {horizon_months}-Month Forecast')
    )

    return fig
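
The data preparation at the top of `update_forecast` can be tried without fitting a model. The sketch below builds a hypothetical monthly series (the column name `Python` and the values are invented), applies the same date filter and `ds`/`y` renaming, and uses `pd.date_range` to show what month-start (`'MS'`) timestamps look like for a 3-month horizon. It only illustrates the frequency alias and does not call Prophet.

```python
import pandas as pd

# Hypothetical monthly data: 36 month-start dates with made-up values.
dates = pd.date_range('2020-01-01', periods=36, freq='MS')
df_demo = pd.DataFrame({'Date': dates, 'Python': [float(i) for i in range(36)]})

# Same filter and renaming as in the callback; string bounds can be
# compared directly against a datetime column.
start_date, end_date = '2020-07-01', '2022-12-01'
mask = (df_demo['Date'] >= start_date) & (df_demo['Date'] <= end_date)
prepared = (
    df_demo.loc[mask, ['Date', 'Python']]
    .rename(columns={'Date': 'ds', 'Python': 'y'})
    .dropna()
)

# Enough history for the 24-month minimum used in the callback?
has_enough_history = len(prepared) >= 24

# Month-start timestamps that follow the last observation (3-month horizon).
horizon_months = 3
last_date = prepared['ds'].max()
future_dates = pd.date_range(last_date, periods=horizon_months + 1, freq='MS')[1:]
```

With these inputs the filtered frame holds 30 rows, so the 24-month check passes, and `future_dates` contains the first day of each of the three months after the last observation.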

### 9. RUN THE APP ###

if __name__ == "__main__":
    app.run(debug=True)