# Banknotes Data Analysis Dashboard #

## Overview ##
This Python script builds an interactive dashboard using the `Dash` framework with a dark theme, styled via `dash_bootstrap_components` and the `DARKLY` theme. The dashboard's purpose is to analyze a dataset of banknotes by exploring attributes of the figures depicted on them, such as their profession, lifespan, and geographic distribution. The script is divided into several key sections:
- Data loading and preprocessing
- Feature engineering
- Filter options
- Interactive visualizations organized across multiple tabs

The tabs provide distinct analyses, including main analysis, trends, geography, correlation, machine learning, network visualization, and forecasting.

## 1. Importing Libraries ##
The script imports various Python libraries to support its functionality:
- **Data Manipulation and Visualization:**
    - `pandas` and `numpy` handle data processing.
    - `plotly.express` and `plotly.graph_objects` create interactive visualizations.
- **Dash and Bootstrap Components:**
    - `Dash` (with submodules `dcc`, `html`, and `dash_table`) constructs the web application.
    - `dash_bootstrap_components` applies a dark theme.
- **Machine Learning and Evaluation:**
    - `sklearn.ensemble.RandomForestClassifier` implements a RandomForest model.
    - `sklearn.metrics` computes accuracy, F1-score, precision, and recall.
- **Statistical Testing:**
    - `scipy.stats.chi2_contingency` conducts chi-squared tests.
- **Forecasting:**
    - `Prophet` performs time-series forecasting.
- **Network Visualization:**
    - `networkx` generates graphs of relationships between figures.

## 2. Data Loading and Preprocessing ##
- **Loading the Dataset:**
    - The script loads data from a CSV file named `banknotesData.csv` using `pandas.read_csv()`.
    - It displays the first and last few rows to confirm successful loading.
- **Handling Missing Values:**
    - Numeric columns are filled with median values.
    - String columns are filled with "Unknown".
    - The `deathDate` column is converted to numeric values, with invalid entries coerced to `NaN`.
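
The order of these steps matters. `deathDate` is loaded as a string column, so its gaps first receive the "Unknown" placeholder, and the later numeric conversion turns that placeholder (and any other non-numeric text) back into `NaN`. A minimal sketch on a made-up three-value column; the values are illustrative and not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy column: one valid year, one missing value, one non-numeric entry.
death = pd.Series(["1952", None, "-"], dtype=object)

# Step 1: string columns receive the "Unknown" placeholder.
filled = death.fillna("Unknown")

# Step 2: numeric conversion; anything that is not a number becomes NaN.
numeric = pd.to_numeric(filled, errors="coerce")

print(filled.tolist())             # ['1952', 'Unknown', '-']
print(int(numeric.isna().sum()))   # 2
```

This would explain why a column can show fewer non-null values after the conversion than before the fill.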

## 3. Feature Engineering ##
- **Pioneer Flag:**
    - A new column, `isPioneer`, is added.
    - It is True if `knownForBeingFirst` (stripped and lowercased) equals "yes", otherwise False.
- **Profession Standardization and Classification:**
    - The `profession` column is converted to title case and saved as `profession_clean`.
    - A function, `classify_profession()`, categorizes professions and stores the results in `prof_category`.
- **LifeSpan Calculation:**
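    - The `lifeSpan` column is calculated as the absolute difference between `deathDate` and `firstAppearanceDate`, so values are never negative. Despite its name, it measures the gap between a figure's death and their first appearance on a banknote, not the length of their life.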

## 4. Filter Options and Helper Variables ##
The dashboard includes interactive filters:
- **Dropdown Filters:**
    - Options for `Country`, `Gender`, `Pioneer`, and `Profession`.
- **Range Sliders:**
    - Sliders for `First Appearance Year` and `Bill Value`.
- **Specialized Filters:**
    - A profession dropdown for the geography tab (its options come from `profession_clean`).
    - A country filter for the network visualization.

## 5. App Layout ##
The dashboard layout is built with Dash components:
- **Header and Global Filters:**
    - A header appears at the top.
    - Global filters include dropdowns for `Country`, `Gender`, `Pioneer`, and `Profession`, plus sliders for `First Appearance Year` and `Bill Value`.
    - A "Reset Filters" button resets all filters to defaults.
- **Tabs:**
    - ***Main Analysis Tab:***
        - Bar chart of profession category counts
        - Scatter plot of bill value vs. waiting time
        - Grouped bar chart of professions by gender
        - Box plot of waiting time by pioneer status
        - Data table of the dataset
    - ***Trends Analysis Tab:***
        - Line chart of banknote distribution over time
        - Radio button to group by gender or profession
        - Moving average lines for smoothing
    - ***Geography Analysis Tab:***
        - Choropleth map of banknote distribution by country
        - Dropdown to filter by profession
    - ***Correlation Analysis Tab:***
        - Heatmap of numerical correlations
        - Cross-tab heatmap of gender vs. profession
        - Chi-squared test results
    - ***Machine Learning Tab:***
        - Bar chart of RandomForest feature importance
        - Text displaying accuracy, F1-score, precision, and recall
    - ***Network Visualization Tab:***
        - Interactive graph of relationships between figures
        - Built with `networkx` and `kamada_kawai_layout`
        - Country filter for nodes
    - ***Forecasting Tab:***
        - Time-series forecast of banknote counts using Prophet
        - Visualization of historical and forecasted data
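
The smoothing used in the Trends tab is a trailing moving average; the callback code applies `rolling(window=3, min_periods=1).mean()`. The sketch below reproduces that arithmetic in plain Python. `moving_average` and the sample counts are hypothetical, used only for illustration:

```python
def moving_average(values, window=3):
    """Trailing mean over up to `window` values (early points use fewer values)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

counts = [0, 3, 0, 6, 3]  # hypothetical banknotes per year
print(moving_average(counts))  # [0.0, 1.5, 1.0, 3.0, 3.0]
```

The first two points are averaged over one and two values respectively, which is what `min_periods=1` allows.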

## 6. Callbacks ##
Dash callbacks make the dashboard interactive:
- **Reset Filters Callback:**
    - Triggered by the "Reset Filters" button to restore default filter values.
- **Main Visualization Callback:**
    - Responds to changes in all filters.
    - ***Data Filtering:*** Applies selected filters to the dataset.
    - ***Main Analysis:*** Updates bar, scatter, grouped bar, box plot, and table.
    - ***Trends:*** Reindexes data for all years, adds smoothing, and updates the line chart.
    - ***Geography:*** Groups data by country and updates the choropleth map.
    - ***Correlation:*** Computes heatmaps and chi-squared results.
    - ***Machine Learning:*** Trains a RandomForest model, showing feature importance and metrics.
    - ***Network:*** Builds a networkx graph with nodes and edges, using kamada_kawai_layout.
    - ***Forecasting:*** Uses Prophet to forecast five years ahead, plotting historical and predicted data.
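
The reset callback's code is not shown in this section. A minimal sketch of its core logic, assuming the defaults match the layout's initial state (empty dropdowns, no pioneer selection, full slider ranges). `reset_filter_values` is a hypothetical name, and the Dash wiring is indicated only in comments:

```python
def reset_filter_values(n_clicks, min_year, max_year, min_bill, max_bill):
    """Return default values for the six global filters, in layout order."""
    return (
        None,                  # country-filter: nothing selected
        None,                  # gender-filter
        None,                  # pioneer-filter
        None,                  # profession-filter
        [min_year, max_year],  # year-slider: full range
        [min_bill, max_bill],  # bill-slider: full range
    )

# In the app, a function like this would be registered with @app.callback,
# using the six filter 'value' properties as Outputs and
# Input('reset-button', 'n_clicks') as the trigger (assumed wiring).

defaults = reset_filter_values(1, 1900, 2020, 1.0, 1000.0)  # hypothetical bounds
print(defaults[4], defaults[5])  # [1900, 2020] [1.0, 1000.0]
```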

## 7. Running the Application ##
- The script runs the Dash app in debug mode.
- It starts a local server, allowing access to the dashboard via a web browser.

### 1. Import required libraries ###

In [4]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output, State
import dash_bootstrap_components as dbc
from dash import dash_table
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from scipy.stats import chi2_contingency
from prophet import Prophet
import networkx as nx

### 2. Data Loading and Preprocessing ###

In [6]:
df = pd.read_csv('banknotesData.csv')
df

Unnamed: 0,country,countryAbbr,currencyName,name,gender,billCount,profession,knownForBeingFirst,currentBillValue,propTotalBills,firstAppearanceDate,deathDate,appearanceDeathDiff,comments,hoverText,hasPortrait,id
0,Argentina,ARS,Argentinian Peso,Eva Perón,F,1.0,Activist,No,100,,2012,1952,60.0,,,True,ARS_Evita
1,Argentina,ARS,Argentinian Peso,Julio Argentino Roca,M,1.0,Head of Gov't,No,100,,1988,1914,74.0,,,True,ARS_Argentino
2,Argentina,ARS,Argentinian Peso,Domingo Faustino Sarmiento,M,1.0,Head of Gov't,No,50,,1999,1888,111.0,,,True,ARS_Domingo
3,Argentina,ARS,Argentinian Peso,Juan Manuel de Rosas,M,1.0,Politician,No,20,,1992,1877,115.0,,,True,ARS_Rosas
4,Argentina,ARS,Argentinian Peso,Manuel Belgrano,M,1.0,Founder,Yes,10,,1970,1820,150.0,Came up with the first Argentine flag.,Designed first Argentine flag,True,ARS_Belgrano
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
274,Venezuela,VES,Venezuelan bolivar,Francisco de Miranda,M,1.0,Military,No,200,,1968,1816,152.0,,,False,VES_Miranda
275,Venezuela,VES,Venezuelan bolivar,Simón Rodrigues,M,1.0,Educator,No,20,,2007,1854,153.0,,,False,VES_Rodrigues
276,Venezuela,VES,Venezuelan bolivar,Ezequiel Zamora,M,1.0,Military,No,100,,2018,1860,158.0,,,False,VES_Ezequiel
277,Venezuela,VES,Venezuelan bolivar,Rafael Urdaneta,M,1.0,Head of Gov't,No,10,,2018,1845,173.0,,,False,VES_Rajael


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 279 entries, 0 to 278
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   country              279 non-null    object 
 1   countryAbbr          279 non-null    object 
 2   currencyName         279 non-null    object 
 3   name                 279 non-null    object 
 4   gender               279 non-null    object 
 5   billCount            279 non-null    float64
 6   profession           279 non-null    object 
 7   knownForBeingFirst   279 non-null    object 
 8   currentBillValue     279 non-null    int64  
 9   propTotalBills       59 non-null     float64
 10  firstAppearanceDate  279 non-null    int64  
 11  deathDate            272 non-null    object 
 12  appearanceDeathDiff  271 non-null    float64
 13  comments             119 non-null    object 
 14  hoverText            89 non-null     object 
 15  hasPortrait          279 non-null    boo

In [8]:
# Fill missing values: numeric columns with median and string columns with "Unknown"
num_cols = df.select_dtypes(include=[np.number]).columns
str_cols = df.select_dtypes(include=[object]).columns

# Fill numeric columns with median
for col in num_cols:
    df[col] = df[col].fillna(df[col].median())

# Fill string columns with "Unknown"
for col in str_cols:
    df[col] = df[col].fillna("Unknown")

In [9]:
# Convert the 'deathDate' column to numeric type
df['deathDate'] = pd.to_numeric(df['deathDate'], errors='coerce')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 279 entries, 0 to 278
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   country              279 non-null    object 
 1   countryAbbr          279 non-null    object 
 2   currencyName         279 non-null    object 
 3   name                 279 non-null    object 
 4   gender               279 non-null    object 
 5   billCount            279 non-null    float64
 6   profession           279 non-null    object 
 7   knownForBeingFirst   279 non-null    object 
 8   currentBillValue     279 non-null    int64  
 9   propTotalBills       279 non-null    float64
 10  firstAppearanceDate  279 non-null    int64  
 11  deathDate            271 non-null    float64
 12  appearanceDeathDiff  279 non-null    float64
 13  comments             279 non-null    object 
 14  hoverText            279 non-null    object 
 15  hasPortrait          279 non-null    boo

### 3. Feature Engineering ###

In [11]:
# 1. Create a flag 'isPioneer' based on whether the person is known for being the first
df['isPioneer'] = df['knownForBeingFirst'].astype(str).str.strip().str.lower() == "yes"

In [12]:
# 2. Standardize the profession names
df['profession_clean'] = df['profession'].str.title()

In [13]:
# 3. Simplified classification of professions for easier analysis
def classify_profession(prof):
    # Lowercase the profession for consistent matching
    prof_lower = str(prof).lower()
    
    # Define the list of professions for each category
    creative = ['writer', 'musician', 'visual artist', 'performer']
    political = ['politician', 'head of gov\'t', 'monarch', 'founder']
    
    # Check the profession and return its category
    if prof_lower in creative:
        return 'Creative'
    elif prof_lower in political:
        return 'Political'
    elif prof_lower == 'revolutionary':
        return 'Revolutionary'
    elif prof_lower == 'military':
        return 'Military'
    elif prof_lower == 'religious figure':
        return 'Religious'
    elif prof_lower == 'stem':
        return 'STEM'
    elif prof_lower == 'activist':
        return 'Activist'
    elif prof_lower == 'educator':
        return 'Educator'
    elif prof_lower == 'other':
        return 'Historical Figure'  # Default for "Other"
    else:
        return 'Unknown'

df['prof_category'] = df['profession_clean'].apply(classify_profession)
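
The same mapping can also be written as a lookup table, which keeps the category assignments in one place. This is an alternative sketch, not the notebook's code. `PROF_CATEGORIES` and `classify_profession_lookup` are hypothetical names, and the assignments are copied from the function above:

```python
PROF_CATEGORIES = {
    'writer': 'Creative', 'musician': 'Creative',
    'visual artist': 'Creative', 'performer': 'Creative',
    'politician': 'Political', "head of gov't": 'Political',
    'monarch': 'Political', 'founder': 'Political',
    'revolutionary': 'Revolutionary', 'military': 'Military',
    'religious figure': 'Religious', 'stem': 'STEM',
    'activist': 'Activist', 'educator': 'Educator',
    'other': 'Historical Figure',
}

def classify_profession_lookup(prof):
    """Map a profession label (any casing) to its category; 'Unknown' if unmapped."""
    return PROF_CATEGORIES.get(str(prof).lower(), 'Unknown')

print(classify_profession_lookup("Head Of Gov'T"))  # Political
print(classify_profession_lookup("Astronaut"))      # Unknown
```

Lowercasing inside the lookup means title-cased values from `profession_clean` still match the lowercase keys.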

In [14]:
# 4. Create a lifeSpan feature if data is available
def compute_lifespan(row):
    try:
        return abs(float(row['deathDate']) - float(row['firstAppearanceDate']))
    except (TypeError, ValueError):
        return np.nan

df['lifeSpan'] = df.apply(compute_lifespan, axis=1)
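
Because both columns are numeric by this point, the same feature can also be computed without a row-wise `apply`. A vectorized sketch on a made-up frame (the `sample` rows are illustrative), in which a missing death year simply propagates as `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing death year.
sample = pd.DataFrame({
    'firstAppearanceDate': [2012, 1988, 1999],
    'deathDate': [1952.0, 1914.0, np.nan],
})

# Absolute gap between death year and first appearance year.
sample['lifeSpan'] = (sample['deathDate'] - sample['firstAppearanceDate']).abs()
print(sample['lifeSpan'].tolist())  # [60.0, 74.0, nan]
```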

In [15]:
df

Unnamed: 0,country,countryAbbr,currencyName,name,gender,billCount,profession,knownForBeingFirst,currentBillValue,propTotalBills,...,deathDate,appearanceDeathDiff,comments,hoverText,hasPortrait,id,isPioneer,profession_clean,prof_category,lifeSpan
0,Argentina,ARS,Argentinian Peso,Eva Perón,F,1.0,Activist,No,100,0.1,...,1952.0,60.0,Unknown,Unknown,True,ARS_Evita,False,Activist,Activist,60.0
1,Argentina,ARS,Argentinian Peso,Julio Argentino Roca,M,1.0,Head of Gov't,No,100,0.1,...,1914.0,74.0,Unknown,Unknown,True,ARS_Argentino,False,Head Of Gov'T,Political,74.0
2,Argentina,ARS,Argentinian Peso,Domingo Faustino Sarmiento,M,1.0,Head of Gov't,No,50,0.1,...,1888.0,111.0,Unknown,Unknown,True,ARS_Domingo,False,Head Of Gov'T,Political,111.0
3,Argentina,ARS,Argentinian Peso,Juan Manuel de Rosas,M,1.0,Politician,No,20,0.1,...,1877.0,115.0,Unknown,Unknown,True,ARS_Rosas,False,Politician,Political,115.0
4,Argentina,ARS,Argentinian Peso,Manuel Belgrano,M,1.0,Founder,Yes,10,0.1,...,1820.0,150.0,Came up with the first Argentine flag.,Designed first Argentine flag,True,ARS_Belgrano,True,Founder,Political,150.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
274,Venezuela,VES,Venezuelan bolivar,Francisco de Miranda,M,1.0,Military,No,200,0.1,...,1816.0,152.0,Unknown,Unknown,False,VES_Miranda,False,Military,Military,152.0
275,Venezuela,VES,Venezuelan bolivar,Simón Rodrigues,M,1.0,Educator,No,20,0.1,...,1854.0,153.0,Unknown,Unknown,False,VES_Rodrigues,False,Educator,Educator,153.0
276,Venezuela,VES,Venezuelan bolivar,Ezequiel Zamora,M,1.0,Military,No,100,0.1,...,1860.0,158.0,Unknown,Unknown,False,VES_Ezequiel,False,Military,Military,158.0
277,Venezuela,VES,Venezuelan bolivar,Rafael Urdaneta,M,1.0,Head of Gov't,No,10,0.1,...,1845.0,173.0,Unknown,Unknown,False,VES_Rajael,False,Head Of Gov'T,Political,173.0


### 4. Filter Options and Helper Variables ###

In [17]:
country_options = [{'label': country, 'value': country} for country in sorted(df['country'].unique())]
gender_options = [{'label': gender, 'value': gender} for gender in sorted(df['gender'].unique())]
pioneer_options = [
    {'label': 'Pioneers (Yes)', 'value': True},
    {'label': 'Non-Pioneers (No)', 'value': False}
]
profession_options = [{'label': prof, 'value': prof} for prof in sorted(df['profession_clean'].unique())]

In [18]:
# Filter for the geography tab by profession (options come from 'profession_clean')
geo_category_options = [{'label': prof, 'value': prof} for prof in sorted(df['profession_clean'].unique())]
geo_category_options.insert(0, {'label': 'All', 'value': 'all'})

min_year = int(df['firstAppearanceDate'].min())
max_year = int(df['firstAppearanceDate'].max())
min_bill = float(df['currentBillValue'].min())
max_bill = float(df['currentBillValue'].max())

In [19]:
# For network filter – by node country
network_country_options = [{'label': country, 'value': country} for country in sorted(df['country'].unique())]
network_country_options.insert(0, {'label': 'All', 'value': 'all'})

### 5. App Layout with Tabs and Advanced Filters, Including a Reset Button ###

In [21]:
external_stylesheets = [dbc.themes.DARKLY]
app = Dash(__name__, external_stylesheets=external_stylesheets)
app.title = 'Banknotes Data Dashboard'

In [22]:
app.layout = dbc.Container([
    dbc.Row(
        dbc.Col(
            html.H1(
                'Banknotes Data Dashboard',
                className='text-center text-light my-4'),
            width=12
        )
    ),
    # Global Filters with a Reset Button
    dbc.Row([
        dbc.Col([
            html.Label(
                'Country',
                className='text-light'
            ),
            dcc.Dropdown(
                id='country-filter',
                options=country_options,
                multi=True,
                className='text-dark',
                placeholder='Select country...'
            )
        ], md=2),
        dbc.Col([
            html.Label(
                'Gender',
                className='text-light'
            ),
            dcc.Dropdown(
                id='gender-filter',
                options=gender_options,
                multi=True,
                className='text-dark',
                placeholder='Select gender...'
            )
        ], md=2),
        dbc.Col([
            html.Label(
                'Pioneer',
                className='text-light'
            ),
            dcc.RadioItems(
                id='pioneer-filter',
                options=pioneer_options,
                value=None,
                inline=True,
                labelStyle={'marginRight': '10px'}
            )
        ], md=2),
        dbc.Col([
            html.Label(
                'Profession',
                className='text-light'
            ),
            dcc.Dropdown(
                id='profession-filter',
                options=profession_options,
                multi=True,
                className='text-dark',
                placeholder='Select profession...'
            )
        ], md=2),
        dbc.Col([
            html.Label(
                'First Appearance Year',
                className='text-light'),
            dcc.RangeSlider(
                id='year-slider',
                min=min_year,
                max=max_year,
                step=1,
                marks={str(year): str(year) for year in range(min_year, max_year+1, max(1, (max_year-min_year)//10))},
                value=[min_year, max_year]
            )
        ], md=2),
        dbc.Col([
            html.Label(
                'Bill Value',
                className='text-light'
            ),
            dcc.RangeSlider(
                id='bill-slider',
                min=min_bill,
                max=max_bill,
                step=(max_bill-min_bill)/100,
                marks={str(round(val,1)): str(round(val,1)) for val in np.linspace(min_bill, max_bill, num=5)},
                value=[min_bill, max_bill]
            )
        ], md=2)
    ], className='mb-4'),
    dbc.Row([
        dbc.Col(
            dbc.Button(
                'Reset Filters',
                id='reset-button',
                color='secondary',
                className='mb-2'
            ),
            width=2
        )
    ]),
    # Additional filters for individual tabs
    dbc.Tabs([
        dbc.Tab(label='Main Analysis', children=[
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='bar-profession',
                        config={'displayModeBar': False}
                    ),
                    md=6
                ),
                dbc.Col(
                    dcc.Graph(
                        id='scatter-bill-waiting',
                        config={'displayModeBar': False}
                    ),
                    md=6
                )
            ]),
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='grouped-profession-gender',
                        config={'displayModeBar': False}
                    ),
                    md=6
                ),
                dbc.Col(
                    dcc.Graph(
                        id='box-waiting-time',
                        config={'displayModeBar': False}
                    ),
                    md=6
                )
            ]),
            dbc.Row([
                dbc.Col(
                    dash_table.DataTable(
                        id='data-table',
                        columns=[{'name': i, 'id': i} for i in df.columns],
                        data=df.to_dict('records'),
                        filter_action='native',
                        sort_action='native',
                        page_action='native',
                        page_current=0,
                        page_size=10,
                        style_table={'overflowX': 'auto'},
                        style_header={
                            'backgroundColor': '#303030',
                            'color': 'white'
                        },
                        style_cell={
                            'backgroundColor': '#424242',
                            'color': 'white',
                            'textAlign': 'left'
                        }
                    ),
                    md=12
                )
            ], className='mt-4')
        ]),
        dbc.Tab(label='Trends Analysis', children=[
            dbc.Row([
                dbc.Col([
                    html.Label(
                        'Group By:',
                        className='text-light'
                    ),
                    dcc.RadioItems(
                        id='trend-groupby',
                        options=[
                            {'label': 'Gender', 'value': 'gender'},
                            {'label': 'Profession', 'value': 'profession_clean'}
                        ],
                        value='gender',
                        inline=True,
                        labelStyle={'marginRight': '10px'}
                    )
                ], md=4)
            ], className='mb-4'),
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='trend-analysis',
                        config={'displayModeBar': False}
                    ),
                    md=12
                )
            ])
        ]),
        dbc.Tab(label='Geography Analysis', children=[
            dbc.Row([
                dbc.Col([
                    html.Label(
                        'Select Profession for Map:',
                        className='text-light'
                    ),
                    dcc.Dropdown(
                        id='geo-category',
                        options=geo_category_options,
                        value='all',
                        className='text-dark',
                        clearable=False
                    )
                ], md=4)
            ], className='mb-4'),
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='geo-map',
                        config={'displayModeBar': False}
                    ),
                    md=12
                )
            ])
        ]),
        dbc.Tab(label='Correlation Analysis', children=[
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='corr-heatmap',
                        config={'displayModeBar': False}
                    ),
                    md=6
                ),
                dbc.Col(
                    dcc.Graph(
                        id='cross-tab-heatmap',
                        config={'displayModeBar': False}
                    ),
                    md=6
                )
            ]),
            dbc.Row([
                dbc.Col(
                    html.Div(
                        id='chi2-result',
                        className='text-light'
                    ),
                    md=12
                )
            ])
        ]),
        dbc.Tab(label='Machine Learning', children=[
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='ml-feature',
                        config={'displayModeBar': False}
                    ),
                    md=12
                )
            ]),
            dbc.Row([
                dbc.Col(
                    html.Div(
                        id='ml-metrics',
                        className='text-light'
                    ),
                    md=12
                )
            ])
        ]),
        dbc.Tab(label='Network Visualization', children=[
            dbc.Row([
                dbc.Col([
                    html.Label(
                        'Filter Nodes by Country:',
                        className='text-light'
                    ),
                    dcc.Dropdown(
                        id='network-country-filter',
                        options=network_country_options,
                        value='all',
                        className='text-dark',
                        clearable=False
                    )
                ], md=4)
            ], className='mb-4'),
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='network-graph',
                        config={'displayModeBar': False}
                    ),
                    md=12
                )
            ])
        ]),
        dbc.Tab(label='Forecasting', children=[
            dbc.Row([
                dbc.Col(
                    dcc.Graph(
                        id='forecast-graph',
                        config={'displayModeBar': False}
                    ),
                    md=12
                )
            ])
        ])
    ])
], fluid=True)
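
The `marks` expression for the year slider spaces its labels at roughly one tenth of the range, with a floor of one year so the step never becomes zero. A small sketch with hypothetical bounds (the real `min_year` and `max_year` come from the data):

```python
min_year, max_year = 1900, 2020  # hypothetical bounds for illustration

# Integer step of about a tenth of the range, never below 1.
step = max(1, (max_year - min_year) // 10)
marks = {str(year): str(year) for year in range(min_year, max_year + 1, step)}

print(step)        # 12
print(len(marks))  # 11
```

With a range shorter than ten years the integer division yields 0, and `max(1, ...)` falls back to one label per year.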

### 6. Callback: Update All Visualizations Based on Selected Filters ###

In [24]:
@app.callback(
    [Output('bar-profession', 'figure'),
     Output('scatter-bill-waiting', 'figure'),
     Output('grouped-profession-gender', 'figure'),
     Output('box-waiting-time', 'figure'),
     Output('data-table', 'data'),
     Output('trend-analysis', 'figure'),
     Output('geo-map', 'figure'),
     Output('corr-heatmap', 'figure'),
     Output('cross-tab-heatmap', 'figure'),
     Output('chi2-result', 'children'),
     Output('ml-feature', 'figure'),
     Output('ml-metrics', 'children'),
     Output('network-graph', 'figure'),
     Output('forecast-graph', 'figure')],
    [Input('country-filter', 'value'),
     Input('gender-filter', 'value'),
     Input('pioneer-filter', 'value'),
     Input('profession-filter', 'value'),
     Input('year-slider', 'value'),
     Input('bill-slider', 'value'),
     Input('trend-groupby', 'value'),
     Input('geo-category', 'value'),
     Input('network-country-filter', 'value')]
)
def update_all(selected_countries, selected_genders, selected_pioneer, selected_professions, year_range, bill_range, trend_group, geo_profession, network_country):
    # Filter the DataFrame based on global filters.
    dff = df.copy()
    if selected_countries and len(selected_countries) > 0:
        dff = dff[dff['country'].isin(selected_countries)]
    if selected_genders and len(selected_genders) > 0:
        dff = dff[dff['gender'].isin(selected_genders)]
    if selected_pioneer is not None:
        dff = dff[dff['isPioneer'] == selected_pioneer]
    if selected_professions and len(selected_professions) > 0:
        dff = dff[dff['profession_clean'].isin(selected_professions)]
    dff = dff[(dff['firstAppearanceDate'] >= year_range[0]) & (dff['firstAppearanceDate'] <= year_range[1])]
    dff = dff[(dff['currentBillValue'] >= bill_range[0]) & (dff['currentBillValue'] <= bill_range[1])]

    # If filtered data is empty, create empty figures to avoid errors.
    if dff.empty:
        empty_fig = go.Figure()
        empty_fig.update_layout(
            template='plotly_dark',
            title='No data available'
        )
        return empty_fig, empty_fig, empty_fig, empty_fig, [], empty_fig, empty_fig, empty_fig, empty_fig, 'No data available', empty_fig, 'No ML results available', empty_fig, empty_fig

    # 1. Main Analysis:
    # a) Bar Chart: Count of banknotes by simplified profession category.
    prof_counts = dff['prof_category'].value_counts().reset_index()
    prof_counts.columns = ['Profession Category', 'Count']
    fig_bar = px.bar(
        prof_counts,
        x='Profession Category',
        y='Count',
        text='Count',
        title='Count of Banknotes by Profession Category',
        template='plotly_dark'
    )
    fig_bar.update_traces(textposition='outside')
    
    # b) Scatter Plot: Bill Value vs. Waiting Time (appearanceDeathDiff)
    scatter_df = dff.dropna(subset=[
        'appearanceDeathDiff',
        'currentBillValue'
    ])
    fig_scatter = px.scatter(
        scatter_df,
        x='currentBillValue',
        y='appearanceDeathDiff',
        hover_data=['name', 'profession_clean'],
        title='Bill Value vs. Waiting Time',
        template='plotly_dark'
    )
    
    # c) Grouped Bar Chart: Distribution of Profession by Gender.
    fig_grouped = px.histogram(
        dff,
        x='profession_clean',
        color='gender',
        barmode='group',
        title='Distribution of Profession by Gender',
        template='plotly_dark'
    )
    fig_grouped.update_layout(xaxis_tickangle=-45)
    
    # d) Box Plot: Waiting Time by Pioneer Status.
    # Use assign() so the label column is added to a new frame rather than a filtered slice.
    dff = dff.assign(pioneer_label=np.where(dff['isPioneer'], 'Pioneer (Yes)', 'Non-Pioneer (No)'))
    fig_box = px.box(
        dff,
        x='pioneer_label',
        y='appearanceDeathDiff',
        color='pioneer_label',
        title='Waiting Time by Pioneer Status',
        template='plotly_dark',
        labels={
            'pioneer_label': 'Pioneer Status',
            'appearanceDeathDiff': 'Waiting Time (years)'
        }
    )
    fig_box.update_layout(showlegend=False)
    
    # e) Data Table Update.
    table_data = dff.to_dict('records')

    # 2. Trends Analysis:
    # Count banknotes per year and per chosen group (gender or profession), then
    # reindex so that every group has a row for every year in the selected range.
    # Years without banknotes get a count of 0, which keeps the smoothing evenly spaced.
    all_years = range(int(year_range[0]), int(year_range[1]) + 1)
    counts = dff.groupby(['firstAppearanceDate', trend_group]).size()
    full_index = pd.MultiIndex.from_product(
        [all_years, sorted(dff[trend_group].unique())],
        names=['firstAppearanceDate', trend_group]
    )
    trend_df = counts.reindex(full_index, fill_value=0).reset_index(name='count')
    fig_trend = px.line(
        trend_df,
        x='firstAppearanceDate',
        y='count',
        color=trend_group,
        title=f"Trends: Distribution of Banknotes Over Years (Grouped by {trend_group})",
        template='plotly_dark'
    )
    # Add smoothing lines using moving average (window=3).
    smooth_trends = []
    for grp in trend_df[trend_group].unique():
        sub = trend_df[trend_df[trend_group] == grp].sort_values('firstAppearanceDate')
        sub['smoothed'] = sub['count'].rolling(window=3, min_periods=1).mean()
        smooth_trends.append(go.Scatter(
            x=sub['firstAppearanceDate'],
            y=sub['smoothed'],
            mode='lines',
            name=f"{grp} (Smoothed)"
        ))
    for trace in smooth_trends:
        fig_trend.add_trace(trace)

    # 3. Geography Analysis:
    # Filter by category if a specific one is selected (other than "all").
    geo_df = dff.copy()
    if geo_profession != "all":
        geo_df = geo_df[geo_df['profession_clean'] == geo_profession]
    geo_group = geo_df.groupby('country').size().reset_index(name='count')
    fig_geo = px.choropleth(
        geo_group,
        locations='country',
        locationmode='country names',
        color='count',
        hover_name='country',
        color_continuous_scale='Viridis',
        title=f"Distribution of Banknotes by Country ({'All Professions' if geo_profession == 'all' else geo_profession})",
        template='plotly_dark'
    )

    # 4. Correlation Analysis:
    # a) Heatmap for numerical variables.
    corr_vars = ['currentBillValue', 'firstAppearanceDate', 'deathDate', 'appearanceDeathDiff', 'lifeSpan']
    corr_df = dff[corr_vars].corr()
    fig_corr = px.imshow(
        corr_df,
        text_auto=True,
        title='Correlation between Numerical Variables',
        template='plotly_dark'
    )
    
    # b) Cross-tab frequency heatmap for categorical data (gender vs. profession).
    cross_tab = pd.crosstab(dff['gender'], dff['profession_clean'])
    fig_cross = px.imshow(
        cross_tab,
        text_auto=True,
        title='Cross-Tab Frequency: Gender vs. Profession',
        template='plotly_dark'
    )
    
    # c) Perform chi-squared test for categorical data (gender vs. profession).
    try:
        chi2, p, dof, ex = chi2_contingency(cross_tab)
        chi2_text = f"χ² test (Gender vs. Profession): χ² = {chi2:.2f}, p-value = {p:.4f}"
    except Exception as e:
        chi2_text = f"Error performing χ² test: {e}"

    # 5. Machine Learning:
    # Train a RandomForest classifier to predict 'isPioneer' and display feature importance
    # along with quality metrics (accuracy, F1-score, precision, recall).
    ml_df = dff.copy()
    ml_df = ml_df[(ml_df['gender'] != 'Unknown') & (ml_df['profession_clean'] != 'Unknown') & (ml_df['country'] != 'Unknown')]
    if ml_df.shape[0] > 10:
        features = ml_df[['gender', 'profession_clean', 'country', 'currentBillValue', 'firstAppearanceDate']]
        target = ml_df['isPioneer']
        features_encoded = pd.get_dummies(features, drop_first=True)
        X_train, X_test, y_train, y_test = train_test_split(features_encoded, target, test_size=0.3, random_state=42)
        rf = RandomForestClassifier(n_estimators=100, random_state=42)
        rf.fit(X_train, y_train)
        importances = rf.feature_importances_
        imp_df = pd.DataFrame({'feature': features_encoded.columns, 'importance': importances})
        imp_df = imp_df.sort_values('importance', ascending=False)
        fig_ml = px.bar(
            imp_df,
            x='importance',
            y='feature',
            orientation='h',
            title='Feature Importance (RandomForest)',
            template='plotly_dark'
        )
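        # Evaluate on the held-out 30% split. zero_division=0 reports 0 instead of
        # emitting a warning when a class is never predicted in a small filtered subset.
        y_pred = rf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, zero_division=0)
        precision = precision_score(y_test, y_pred, zero_division=0)
        recall = recall_score(y_test, y_pred, zero_division=0)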
        ml_metrics_text = (f"Accuracy: {accuracy:.2f} | F1-score: {f1:.2f} | "
                           f"Precision: {precision:.2f} | Recall: {recall:.2f}")
    else:
        fig_ml = go.Figure()
        fig_ml.update_layout(
            title='Insufficient data for ML',
            template='plotly_dark'
        )
        ml_metrics_text = 'No ML results available'

    # 6. Network Visualization:
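    # Link figures that share a country or a profession; restrict nodes to the selected country unless it is "all".
    # Note: once a single country is selected, every pair shares that country, so the graph is complete.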
    net_df = dff.copy()
    if network_country != 'all':
        net_df = net_df[net_df['country'] == network_country]
    G = nx.Graph()
    for idx, row in net_df.iterrows():
        G.add_node(
            row['id'],
            label=row['name'],
            country=row['country'],
            profession=row['profession_clean'],
            gender=row['gender']
        )
    nodes = list(G.nodes(data=True))
    for i in range(len(nodes)):
        for j in range(i+1, len(nodes)):
            if (nodes[i][1]['country'] == nodes[j][1]['country']) or (nodes[i][1]['profession'] == nodes[j][1]['profession']):
                G.add_edge(nodes[i][0], nodes[j][0])
    # Compute node positions with the Kamada-Kawai layout.
    pos = nx.kamada_kawai_layout(G)
    node_x, node_y, node_text, node_color = [], [], [], []
    for node, attr in G.nodes(data=True):
        x, y = pos[node]
        node_x.append(x)
        node_y.append(y)
        # Plotly hover text uses <br> for line breaks; "\n" is not rendered as one.
        node_text.append(f"{attr['label']}<br>{attr['profession']}<br>{attr['country']}")
        node_color.append('cyan' if attr['gender'].lower() == 'female' else 'magenta')
    edge_x, edge_y = [], []
    for edge in G.edges():
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]
        edge_x.extend([x0, x1, None])
        edge_y.extend([y0, y1, None])
    edge_trace = go.Scatter(
        x=edge_x,
        y=edge_y,
        line=dict(width=0.5, color='#888'),
        hoverinfo='none',
        mode='lines'
    )
    node_trace = go.Scatter(
        x=node_x,
        y=node_y,
        mode='markers',
        marker=dict(size=10, color=node_color),
        text=node_text,
        hoverinfo='text'
    )
    fig_network = go.Figure(data=[edge_trace, node_trace])
    fig_network.update_layout(
        title='Network of Banknote Figures',
        template='plotly_dark',
        xaxis={'visible': False},
        yaxis={'visible': False}
    )

    # 7. Forecasting:
    # Use Prophet to forecast the count of banknotes over the years.
    ts_df = dff.groupby('firstAppearanceDate').size().reset_index(name='count')
    ts_df = ts_df.sort_values('firstAppearanceDate')
    if ts_df.shape[0] > 5:
        # Prepare data for Prophet: columns "ds" and "y"
        prophet_df = ts_df.rename(columns={'firstAppearanceDate': 'ds', 'count': 'y'})
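        # Convert years to timestamps (January 1 of each year); rows that fail to parse are dropped.
        prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y', errors='coerce')
        prophet_df = prophet_df.dropna(subset=['ds'])
        # One observation per year cannot identify a within-year pattern,
        # so all seasonal components are disabled and only the trend is fitted.
        m = Prophet(
            yearly_seasonality=False,
            daily_seasonality=False,
            weekly_seasonality=False
        )
        m.fit(prophet_df)
        # 'YS' (year start) keeps the future dates aligned with the January 1 timestamps above.
        future = m.make_future_dataframe(periods=5, freq='YS')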
        forecast = m.predict(future)
        fig_forecast = go.Figure()
        fig_forecast.add_trace(
            go.Scatter(
                x=prophet_df['ds'],
                y=prophet_df['y'],
                mode='lines+markers',
                name='Historical Data'
            )
        )
        fig_forecast.add_trace(
            go.Scatter(
                x=forecast['ds'],
                y=forecast['yhat'],
                mode='lines',
                name='Forecast'
            )
        )
        fig_forecast.update_layout(
            title='Forecast of Banknote Counts by Year (Prophet)',
            template='plotly_dark'
        )
    else:
        fig_forecast = go.Figure()
        fig_forecast.update_layout(
            title='Insufficient data for forecasting',
            template='plotly_dark'
        )
    
    return (fig_bar, fig_scatter, fig_grouped, fig_box, table_data, fig_trend, fig_geo, 
            fig_corr, fig_cross, chi2_text, fig_ml, ml_metrics_text, fig_network, fig_forecast)
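
The edge rule used in the network step (two figures are linked when they share a country or a profession) can be checked in isolation on a toy set. The following is a minimal sketch using only the standard library; the sample records and the helper name `shared_attribute_edges` are hypothetical and are not part of the script.

```python
from itertools import combinations

# Hypothetical sample records; the dashboard reads these fields from the dataset.
figures = [
    {"id": 1, "country": "Canada", "profession": "Politician"},
    {"id": 2, "country": "Canada", "profession": "Writer"},
    {"id": 3, "country": "Mexico", "profession": "Writer"},
    {"id": 4, "country": "Japan", "profession": "Scientist"},
]

def shared_attribute_edges(records):
    """Return id pairs whose records share a country or a profession."""
    edges = []
    # combinations() visits each unordered pair exactly once.
    for a, b in combinations(records, 2):
        if a["country"] == b["country"] or a["profession"] == b["profession"]:
            edges.append((a["id"], b["id"]))
    return edges

edges = shared_attribute_edges(figures)
print(edges)  # [(1, 2), (2, 3)]
```

Figure 4 shares nothing with the others and stays isolated. Because every pair is compared, the work grows as n(n-1)/2 with the number of figures, which is worth keeping in mind when no country filter is applied.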

### 7. Run the App ###

if __name__ == '__main__':
    # app.run replaces the older app.run_server alias in recent Dash releases.
    app.run(debug=True)
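
Debug mode enables hot reloading and the in-browser error display, which suits local development but not a shared deployment. One possible way to avoid hard-coding the flag is to read it from the environment. This is only a sketch: the helper `run_options_from_env` and the variable names `BANKNOTES_DEBUG` and `BANKNOTES_PORT` are hypothetical and are not defined by Dash or by the script.

```python
import os

def run_options_from_env(environ=None):
    """Return keyword arguments for the server call, read from the environment.

    BANKNOTES_DEBUG and BANKNOTES_PORT are hypothetical names chosen for
    this sketch.
    """
    env = os.environ if environ is None else environ
    debug = env.get("BANKNOTES_DEBUG", "0").strip().lower() in {"1", "true", "yes"}
    port = int(env.get("BANKNOTES_PORT", "8050"))  # 8050 is Dash's default port
    return {"debug": debug, "port": port}

# Example with an explicit mapping instead of the real environment.
options = run_options_from_env({"BANKNOTES_DEBUG": "true"})
print(options)  # {'debug': True, 'port': 8050}
```

The resulting dictionary could then be unpacked into the app's run call, so debug mode stays off unless it is requested explicitly.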