# Automated Network Analysis Report Generator

---


Welcome! This notebook automates the process of creating a detailed PDF report from your network analysis data (e.g., from VOSviewer).
## Quick Start Guide
Follow these three simple steps to generate your report.
### Step 1: Run the Setup Cells
Run Cells 1 through 5 in order to install the necessary libraries and load the configuration, analysis, and report functions. You do not need to change them unless your column names differ from the defaults in Cell 2.
### Step 2: Run the Main Workflow Cell
Execute the final cell, labeled "CELL 6: EXECUTION". This starts the interactive process:
1. **Upload Your Files:** A file upload dialog will pop up. Select all of your data files at once:
   - Your node file (name must end with `_node.csv`)
   - Your edge file (name must end with `_edge.csv`)
   - Optionally, PNG images of your network visualizations (file names containing `network`, `density`, or `overlay`)
2. **Start the Analysis:** After the files finish uploading, a green Start Analysis button will appear. Click it to begin processing your data.
3. **Download Your Report:** Once the analysis is complete, a blue Download Report button will appear. Click it to save the final PDF to your computer.
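
For reference, the sketch below writes a minimal pair of input files in the layout the notebook reads: a node file with a header row, and an edge file without one (source, target, weight on each line). The column names are the defaults from the configuration cell; the keywords, numbers, and the `sample_` file prefix are invented for illustration.

```python
import csv

# Default node columns from the configuration cell.
node_header = [
    "id", "label", "cluster",
    "weight<Occurrences>", "weight<Total link strength>", "score<Avg. pub. year>",
]

# Hypothetical keywords and values, for illustration only.
node_rows = [
    [1, "machine learning", 1, 25, 40, 2019.4],
    [2, "deep learning", 1, 18, 33, 2020.1],
    [3, "neural networks", 2, 12, 21, 2018.7],
]

# Each edge row is: source id, target id, weight (the edge file has no header row).
edge_rows = [[1, 2, 10], [1, 3, 6], [2, 3, 4]]

with open("sample_node.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(node_header)
    writer.writerows(node_rows)

with open("sample_edge.csv", "w", newline="") as f:
    csv.writer(f).writerows(edge_rows)
```

Images are optional, so these two files are the minimum that the validation step asks for.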
### Step 3: Clean Up
After you have downloaded your report, a red Clean Up Files button will become active. Click this button to delete your uploaded data and the generated report from this temporary Colab session.

---


**That's it! If you encounter any errors, the notebook will automatically clean up the files and provide an error message. Just re-run the final "EXECUTION" cell to try again.**

In [None]:
#===================================================================
# CELL 1: SETUP
#
# This cell installs and imports all necessary libraries for the analysis.
# It should be run once at the beginning of the session.
#
#===================================================================
!pip install reportlab -q

import pandas as pd
import networkx as nx
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.lib import colors
from reportlab.lib.units import cm
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer, PageBreak, Image
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from google.colab import files
import ipywidgets as widgets
from IPython.display import display, HTML
import os
import io

print("Setup complete. You can now run the other cells.")

In [None]:
#===================================================================
# CELL 2: CONFIGURATION
#
# This cell contains a centralized configuration dictionary.
# All user-adjustable parameters, such as filenames, report titles, and
# column headers from the input data, should be set here.
#
#===================================================================

# This dictionary holds all configurable settings.
config = {
    # --- Report Settings ---
    "PDF_FILENAME": "Network_Analysis_Report.pdf",
    "REPORT_TITLE": "Network Analysis Report",
    "TOP_N_KEYWORDS": 10,       # Number of top keywords to show in tables
    "TOP_N_INFLUENTIAL": 5,     # Number of influential keywords for in-depth analysis

    # --- Data Column Names ---
    "NODE_ID_COL": 'id',
    "NODE_LABEL_COL": 'label',
    "NODE_CLUSTER_COL": 'cluster',
    "NODE_OCCURRENCES_COL": 'weight<Occurrences>',
    "NODE_LINK_STRENGTH_COL": 'weight<Total link strength>',
    "NODE_AVG_PUB_YEAR_COL": 'score<Avg. pub. year>',
    "EDGE_SOURCE_COL": 'source',
    "EDGE_TARGET_COL": 'target',
    "EDGE_WEIGHT_COL": 'weight',
}

print("Configuration loaded.")
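
If an export uses different column headers, one option is to derive a per-run copy of the settings instead of editing the dictionary in place. The sketch below uses a trimmed stand-in for `config` and a hypothetical header name (`Occurrences`); neither comes from a real export.

```python
# Trimmed stand-in for the notebook's `config` dictionary (illustration only).
base_config = {
    "TOP_N_KEYWORDS": 10,
    "NODE_OCCURRENCES_COL": "weight<Occurrences>",
}

# Hypothetical overrides: a longer top-N list and a differently named column.
overrides = {"TOP_N_KEYWORDS": 20, "NODE_OCCURRENCES_COL": "Occurrences"}

# Reject misspelled keys early instead of silently adding unused settings.
unknown = set(overrides) - set(base_config)
if unknown:
    raise KeyError(f"Unknown config keys: {sorted(unknown)}")

# Later entries win, so `overrides` replaces the matching defaults.
run_config = {**base_config, **overrides}
print(run_config)
```

The original dictionary stays untouched, so the defaults remain available for the next run.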

In [None]:
#===================================================================
# CELL 3: DATA LOADING & ANALYSIS FUNCTIONS
#
# This cell defines all functions responsible for core data processing and
# network analysis. Each function adheres to the Single Responsibility
# Principle, focusing on one specific task (e.g., loading data, calculating
# centrality, analyzing clusters).
#
#===================================================================

def load_data(uploaded_files, config):
    """Loads, validates, and prepares node, edge, and graph data.

    Identifies node and edge CSV files from the uploaded dictionary, reads
    them into pandas DataFrames, validates required columns, and constructs
    a NetworkX graph object.

    Args:
        uploaded_files (dict): A dictionary where keys are filenames and
            values are file content in bytes, as provided by google.colab.
        config (dict): The configuration dictionary with column names.

    Returns:
        tuple: A tuple containing:
            - pd.DataFrame: The edges DataFrame.
            - pd.DataFrame: The nodes DataFrame, indexed by node ID.
            - nx.Graph: The constructed network graph.

    Raises:
        FileNotFoundError: If the required '_edge.csv' or '_node.csv'
            files are not found in the uploaded files.
        ValueError: If the node file is missing required columns.
    """
    # Match on the file-name suffix, as the usage instructions require.
    edge_file_name = next((name for name in uploaded_files if name.endswith('_edge.csv')), None)
    node_file_name = next((name for name in uploaded_files if name.endswith('_node.csv')), None)

    if not edge_file_name or not node_file_name:
        raise FileNotFoundError("Could not find the required '_edge.csv' or '_node.csv' files.")

    edges_df = pd.read_csv(io.BytesIO(uploaded_files[edge_file_name]), sep=",", header=None,
                           names=[config['EDGE_SOURCE_COL'], config['EDGE_TARGET_COL'], config['EDGE_WEIGHT_COL']])
    nodes_df = pd.read_csv(io.BytesIO(uploaded_files[node_file_name]), sep=",")

    # --- Data Validation ---
    required_node_cols = {
        config['NODE_ID_COL'], config['NODE_LABEL_COL'], config['NODE_CLUSTER_COL'],
        config['NODE_OCCURRENCES_COL'], config['NODE_LINK_STRENGTH_COL'], config['NODE_AVG_PUB_YEAR_COL']
    }
    if not required_node_cols.issubset(nodes_df.columns):
        missing = required_node_cols - set(nodes_df.columns)
        raise ValueError(f"Node file is missing required columns: {missing}.")

    nodes_df = nodes_df.set_index(config['NODE_ID_COL'])
    graph = nx.from_pandas_edgelist(edges_df, config['EDGE_SOURCE_COL'], config['EDGE_TARGET_COL'], config['EDGE_WEIGHT_COL'])
    return edges_df, nodes_df, graph

def analyze_occurrences(nodes_df, config):
    """Analyzes keywords by their occurrence count.

    Args:
        nodes_df (pd.DataFrame): The DataFrame containing node data.
        config (dict): The configuration dictionary with settings.

    Returns:
        pd.DataFrame: A DataFrame of top keywords by occurrence.
    """
    df = nodes_df.nlargest(config['TOP_N_KEYWORDS'], config['NODE_OCCURRENCES_COL'])
    df = df[[config['NODE_LABEL_COL'], config['NODE_OCCURRENCES_COL']]]
    return df.rename(columns={config['NODE_OCCURRENCES_COL']: 'Occurrences', config['NODE_LABEL_COL']: 'Keyword'})

def analyze_link_strength(nodes_df, config):
    """Analyzes keywords by their total link strength.

    Args:
        nodes_df (pd.DataFrame): The DataFrame containing node data.
        config (dict): The configuration dictionary with settings.

    Returns:
        pd.DataFrame: A DataFrame of top keywords by link strength.
    """
    df = nodes_df.nlargest(config['TOP_N_KEYWORDS'], config['NODE_LINK_STRENGTH_COL'])
    df = df[[config['NODE_LABEL_COL'], config['NODE_LINK_STRENGTH_COL']]]
    return df.rename(columns={config['NODE_LINK_STRENGTH_COL']: 'Total Link Strength', config['NODE_LABEL_COL']: 'Keyword'})

def analyze_centrality(graph, nodes_df, config):
    """Calculates Degree, Betweenness, Closeness, and Eigenvector centralities.

    Args:
        graph (nx.Graph): The NetworkX graph object.
        nodes_df (pd.DataFrame): The DataFrame containing node data.
        config (dict): The configuration dictionary with settings.

    Returns:
        dict: A dictionary of DataFrames, with each key representing a
              centrality metric ('degree', 'betweenness', etc.).
    """
    degree = pd.DataFrame(nx.degree_centrality(graph).items(), columns=[config['NODE_ID_COL'], 'Degree Centrality']).set_index(config['NODE_ID_COL'])
    betweenness = pd.DataFrame(nx.betweenness_centrality(graph).items(), columns=[config['NODE_ID_COL'], 'Betweenness Centrality']).set_index(config['NODE_ID_COL'])
    closeness = pd.DataFrame(nx.closeness_centrality(graph).items(), columns=[config['NODE_ID_COL'], 'Closeness Centrality']).set_index(config['NODE_ID_COL'])

    try:
        eigenvector = pd.DataFrame(nx.eigenvector_centrality(graph, max_iter=1000, weight=config['EDGE_WEIGHT_COL']).items(),
                                   columns=[config['NODE_ID_COL'], 'Eigenvector Centrality']).set_index(config['NODE_ID_COL'])
    except nx.PowerIterationFailedConvergence:
        print("Warning: Eigenvector centrality did not converge. This metric will be skipped.")
        # Use a float-typed NaN column: nlargest() raises a TypeError on the object dtype of an empty frame.
        eigenvector = pd.DataFrame({'Eigenvector Centrality': float('nan')}, index=nodes_df.index)

    centrality_df = nodes_df.join([degree, betweenness, closeness, eigenvector])

    results = {
        'degree': centrality_df.nlargest(config['TOP_N_KEYWORDS'], 'Degree Centrality')[[config['NODE_LABEL_COL'], 'Degree Centrality']].rename(columns={config['NODE_LABEL_COL']: 'Keyword'}),
        'betweenness': centrality_df.nlargest(config['TOP_N_KEYWORDS'], 'Betweenness Centrality')[[config['NODE_LABEL_COL'], 'Betweenness Centrality']].rename(columns={config['NODE_LABEL_COL']: 'Keyword'}),
        'closeness': centrality_df.nlargest(config['TOP_N_KEYWORDS'], 'Closeness Centrality')[[config['NODE_LABEL_COL'], 'Closeness Centrality']].rename(columns={config['NODE_LABEL_COL']: 'Keyword'}),
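        # Drop nodes without a score so a skipped metric yields an empty table instead of NaN rows.
        'eigenvector': centrality_df.dropna(subset=['Eigenvector Centrality']).nlargest(config['TOP_N_KEYWORDS'], 'Eigenvector Centrality')[[config['NODE_LABEL_COL'], 'Eigenvector Centrality']].rename(columns={config['NODE_LABEL_COL']: 'Keyword'})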
    }
    return results

def analyze_clusters(nodes_df, graph, config):
    """Analyzes thematic clusters by calculating coherence and top labels.

    Args:
        nodes_df (pd.DataFrame): The DataFrame containing node data.
        graph (nx.Graph): The NetworkX graph object.
        config (dict): The configuration dictionary with settings.

    Returns:
        pd.DataFrame: A DataFrame containing cluster ID, coherence score,
                      and top labels for each cluster.
    """
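    # Pick each cluster's five most frequent keywords as its representative labels
    # (previously the five longest labels were chosen).
    ranked_nodes = nodes_df.sort_values(config['NODE_OCCURRENCES_COL'], ascending=False)
    cluster_labels = ranked_nodes.groupby(config['NODE_CLUSTER_COL'])[config['NODE_LABEL_COL']].agg(
        lambda x: ', '.join(x.astype(str).head(5)) + ('...' if len(x) > 5 else '')).to_dict()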

    coherence_scores = {}
    for cluster_id in nodes_df[config['NODE_CLUSTER_COL']].unique():
        cluster_nodes = nodes_df[nodes_df[config['NODE_CLUSTER_COL']] == cluster_id].index
        subgraph = graph.subgraph(cluster_nodes)
        if subgraph.number_of_edges() > 0:
            avg_weight = subgraph.size(weight=config['EDGE_WEIGHT_COL']) / subgraph.number_of_edges()
            coherence_scores[cluster_id] = avg_weight
        else:
            coherence_scores[cluster_id] = 0

    coherence_df = pd.DataFrame(coherence_scores.items(), columns=['Cluster ID', 'Average Edge Weight']).round(4)
    coherence_df['Top 5 Cluster Labels'] = coherence_df['Cluster ID'].map(cluster_labels)
    return coherence_df

def analyze_temporal_trends(nodes_df, config):
    """Identifies the most recent keywords based on average publication year.

    Args:
        nodes_df (pd.DataFrame): The DataFrame containing node data.
        config (dict): The configuration dictionary with settings.

    Returns:
        pd.DataFrame: A DataFrame of top keywords from recent publications.
    """
    df = nodes_df.nlargest(config['TOP_N_KEYWORDS'], config['NODE_AVG_PUB_YEAR_COL'])
    df = df[[config['NODE_LABEL_COL'], config['NODE_AVG_PUB_YEAR_COL']]]
    return df.rename(columns={config['NODE_LABEL_COL']: 'Keyword Label', config['NODE_AVG_PUB_YEAR_COL']: 'Avg. Pub. Year'})

def analyze_influential_keywords(nodes_df, edges_df, config):
    """Identifies top influential keywords and their strongest connections.

    Args:
        nodes_df (pd.DataFrame): The DataFrame containing node data.
        edges_df (pd.DataFrame): The DataFrame containing edge data.
        config (dict): The configuration dictionary with settings.

    Returns:
        list: A list of dictionaries, where each dictionary contains an
              influential keyword and a DataFrame of its top connections.
    """
    influential_nodes = nodes_df.nlargest(config['TOP_N_INFLUENTIAL'], config['NODE_LINK_STRENGTH_COL'])
    connections_data = []

    for node_id, row in influential_nodes.iterrows():
        keyword_label = row[config['NODE_LABEL_COL']]
        connected_edges = edges_df[(edges_df[config['EDGE_SOURCE_COL']] == node_id) | (edges_df[config['EDGE_TARGET_COL']] == node_id)]

        # Vectorized lookup of the node at the other end of each edge (also safe when no edges match).
        other_nodes = connected_edges[config['EDGE_TARGET_COL']].where(
            connected_edges[config['EDGE_SOURCE_COL']] == node_id, connected_edges[config['EDGE_SOURCE_COL']])
        connection_weights = pd.DataFrame({'other_node_id': other_nodes, 'weight': connected_edges[config['EDGE_WEIGHT_COL']]})

        top_connections = connection_weights.groupby('other_node_id')['weight'].sum().nlargest(config['TOP_N_KEYWORDS']).reset_index()
        top_connections['Connected Keyword'] = top_connections['other_node_id'].map(nodes_df[config['NODE_LABEL_COL']])

        connections_data.append({
            'Keyword': keyword_label,
            'Connections': top_connections[['Connected Keyword', 'weight']].rename(columns={'weight': 'Connection Strength'})
        })
    return connections_data

print("Analysis functions defined.")
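
The cluster table produced by `analyze_clusters` reports the average weight of the edges that stay inside each cluster. As a rough, library-free illustration of that calculation, the same figure can be worked out by hand. The edge list and cluster map below are invented, and duplicate edges are assumed to be absent.

```python
# Hypothetical toy data: (source, target, weight) edges and a node-to-cluster map.
edges = [(1, 2, 4.0), (2, 3, 2.0), (1, 3, 6.0), (3, 4, 1.0), (4, 5, 5.0)]
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}

def average_internal_weight(edges, cluster_of, cluster_id):
    """Mean weight of edges whose two endpoints both lie in `cluster_id`."""
    internal = [
        weight for source, target, weight in edges
        if cluster_of.get(source) == cluster_id and cluster_of.get(target) == cluster_id
    ]
    # Mirror the notebook's convention: a cluster without internal edges scores 0.
    return sum(internal) / len(internal) if internal else 0

for cluster_id in sorted(set(cluster_of.values())):
    print(cluster_id, average_internal_weight(edges, cluster_of, cluster_id))
```

The edge between nodes 3 and 4 crosses clusters, so it counts toward neither average.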

In [None]:
#===================================================================
# CELL 4: PDF REPORT GENERATION FUNCTIONS
#
# This cell contains all helper functions related to building the PDF report
# using the ReportLab library. This includes functions for creating tables,
# adding headers/footers, and structuring report sections.
#
#===================================================================

def dataframe_to_table(df):
    """Converts a Pandas DataFrame to a styled ReportLab Table object.

    Args:
        df (pd.DataFrame): The DataFrame to convert.

    Returns:
        reportlab.platypus.Table: The styled table object.
    """
    styles = getSampleStyleSheet()
    cell_style = ParagraphStyle('CellStyle', parent=styles['Normal'], fontSize=9, leading=10)

    header_style = ParagraphStyle('HeaderStyle', parent=cell_style, fontName='Helvetica-Bold', fontSize=10, textColor=colors.white)
    data = [[Paragraph(col, header_style) for col in df.columns]]

    for _, row in df.iterrows():
        formatted_row = []
        for cell in row:
            text = f"{cell:.4f}" if isinstance(cell, float) else str(cell)
            formatted_row.append(Paragraph(text, cell_style))
        data.append(formatted_row)

    table = Table(data, repeatRows=1)
    table.setStyle(TableStyle([
        ('BACKGROUND', (0,0), (-1,0), colors.HexColor('#4F81BD')),
        ('ALIGN', (0,0), (-1,-1), 'LEFT'),
        ('VALIGN', (0,0), (-1,-1), 'TOP'),
        ('GRID', (0,0), (-1,-1), 0.5, colors.HexColor('#B8CCE4')),
        ('BACKGROUND', (0,1), (-1,-1), colors.HexColor('#DCE6F1')),
    ]))
    return table

def create_report_section(elements, styles, title, image_path, image_caption, tables):
    """Creates a standard section for the report with title, image, and tables.

    Args:
        elements (list): The list of ReportLab flowables to append to.
        styles (dict): The stylesheet dictionary from ReportLab.
        title (str): The main heading for this section.
        image_path (str): The file path to the visualization image.
        image_caption (str): The caption to display below the image.
        tables (list): A list of tuples, where each tuple contains
                       (table_title, table_caption, pd.DataFrame).
    """
    elements.append(Paragraph(title, styles['Heading1']))
    if image_path and os.path.exists(image_path):
        img_title = os.path.basename(image_path).replace('.png','').replace('_', ' ').title()
        elements.append(Paragraph(f"{img_title} Visualization", styles['Heading2']))
        elements.append(Paragraph(image_caption, styles['BodyText']))
        # Fit the image inside a 15 cm x 12 cm box while keeping its aspect ratio.
        elements.append(Image(image_path, width=15*cm, height=12*cm, kind='proportional'))
        elements.append(Spacer(1, 0.5 * cm))
    else:
        print(f"Warning: Visualization for '{title}' not found. Skipping image.")

    for tbl_title, tbl_caption, tbl_df in tables:
        if not tbl_df.empty:
            elements.append(Paragraph(tbl_title, styles['Heading3']))
            elements.append(Paragraph(tbl_caption, styles['BodyText']))
            elements.append(dataframe_to_table(tbl_df))
            elements.append(Spacer(1, 0.5*cm))
    elements.append(PageBreak())

def add_header_footer(canvas, doc, config):
    """Adds a header and page number to each page of the PDF.

    Args:
        canvas (reportlab.pdfgen.canvas.Canvas): The canvas object.
        doc (reportlab.platypus.SimpleDocTemplate): The document object.
        config (dict): The configuration dictionary with report settings.
    """
    canvas.saveState()
    # Header
    header_style = ParagraphStyle(name='Header', fontName='Helvetica-Bold', fontSize=14, alignment=0, textColor=colors.darkblue)
    header = Paragraph(config['REPORT_TITLE'], header_style)
    header.wrapOn(canvas, doc.width, doc.topMargin)
    header.drawOn(canvas, doc.leftMargin, doc.pagesize[1] - 1.5*cm)
    # Footer (Page Number)
    canvas.setFont('Helvetica', 9)
    canvas.drawRightString(doc.pagesize[0] - doc.rightMargin, 1.5*cm, f"Page {doc.page}")
    canvas.restoreState()

def find_image_paths(uploaded_files):
    """Finds the paths for the standard visualization images.

    Args:
        uploaded_files (dict): The dictionary of uploaded files.

    Returns:
        dict: A dictionary mapping image types ('network', 'density', 'overlay')
              to their filenames.
    """
    return {
        "network": next((name for name in uploaded_files if "network" in name and name.endswith('.png')), None),
        "density": next((name for name in uploaded_files if "density" in name and name.endswith('.png')), None),
        "overlay": next((name for name in uploaded_files if "overlay" in name and name.endswith('.png')), None)
    }

print("Report generation functions defined.")
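
`find_image_paths` picks, for each of the three image kinds, the first uploaded PNG whose file name contains the matching keyword. The standalone sketch below reproduces that rule on a made-up file listing; `pick_image` and the file names are hypothetical. Note that the substring test is case-sensitive.

```python
def pick_image(filenames, keyword):
    """Return the first PNG whose name contains `keyword`, or None."""
    return next(
        (name for name in filenames if keyword in name and name.endswith(".png")),
        None,
    )

# Hypothetical upload listing, in upload order.
uploaded_names = ["study_node.csv", "study_edge.csv", "study_network.png", "Density.png"]

matches = {kind: pick_image(uploaded_names, kind) for kind in ("network", "density", "overlay")}
print(matches)
# {'network': 'study_network.png', 'density': None, 'overlay': None}
```

Here `Density.png` is not picked up because of the capital `D`, so the density section would be built without an image.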

In [None]:
#===================================================================
# CELL 5: MAIN ORCHESTRATOR
#
# This cell contains the main function that coordinates the entire workflow,
# from loading data to performing all analyses and finally assembling the
# PDF report.
#
#===================================================================

def run_analysis_and_generate_report(uploaded_files, config):
    """Orchestrates the analysis and report generation process.

    This function calls the necessary analysis functions in order,
    assembles the results into a structured report, and builds the PDF.

    Args:
        uploaded_files (dict): The dictionary of uploaded files from Colab.
        config (dict): The global configuration dictionary.

    Returns:
        bool: True if the report was generated successfully, False otherwise.
    """
    doc = SimpleDocTemplate(config['PDF_FILENAME'], pagesize=A4, leftMargin=2*cm, rightMargin=2*cm, topMargin=2.5*cm, bottomMargin=2.5*cm)
    styles = getSampleStyleSheet()
    elements = []

    try:
        # --- 1. Load Data and Find Images ---
        print("Step 1/5: Loading and validating data...")
        edges_df, nodes_df, graph = load_data(uploaded_files, config)
        image_paths = find_image_paths(uploaded_files)
        print("Data loaded successfully.")

        # --- 2. Perform All Analyses (SRP & DRY) ---
        # Each analysis is run once and its results are stored.
        print("Step 2/5: Performing all analyses...")
        occurrences_df = analyze_occurrences(nodes_df, config)
        link_strength_df = analyze_link_strength(nodes_df, config)
        centrality_results = analyze_centrality(graph, nodes_df, config)
        cluster_coherence_df = analyze_clusters(nodes_df, graph, config)
        temporal_keywords_df = analyze_temporal_trends(nodes_df, config)
        influential_connections = analyze_influential_keywords(nodes_df, edges_df, config)
        print("All analyses complete.")

        # --- 3. Build Report Sections ---
        print("Step 3/5: Building report sections...")

        # Section A: Overall Network Analysis
        overall_tables = [
            (f"Top {config['TOP_N_KEYWORDS']} Keywords by Occurrence", "Highest raw count in source documents.", occurrences_df),
            (f"Top {config['TOP_N_KEYWORDS']} Keywords by Link Strength", "Strongest cumulative co-occurrence links.", link_strength_df),
            (f"Top {config['TOP_N_KEYWORDS']} Keywords by Degree Centrality", "Most directly connected keywords.", centrality_results['degree']),
            (f"Top {config['TOP_N_KEYWORDS']} Keywords by Betweenness Centrality", "'Bridge' keywords connecting different areas.", centrality_results['betweenness']),
            (f"Top {config['TOP_N_KEYWORDS']} Keywords by Closeness Centrality", "Keywords that can reach all others most quickly.", centrality_results['closeness']),
            (f"Top {config['TOP_N_KEYWORDS']} Keywords by Eigenvector Centrality", "Influential keywords connected to other influential ones.", centrality_results['eigenvector'])
        ]
        create_report_section(elements, styles, "Overall Network Analysis", image_paths["network"],
                              "This visualization depicts the overall co-occurrence network of keywords.", overall_tables)

        # Section B: Thematic Analysis
        thematic_tables = [("Cluster Coherence and Top Labels", "Shows cluster coherence and representative keywords.", cluster_coherence_df)]
        create_report_section(elements, styles, "Thematic Analysis (Clusters)", image_paths["density"],
                              "This visualization highlights keyword density, indicating thematic clusters.", thematic_tables)

        # Section C: Temporal Analysis
        temporal_tables = [("Most Recent Keywords", "Lists keywords from recent publications, indicating emerging trends.", temporal_keywords_df)]
        create_report_section(elements, styles, "Temporal Analysis", image_paths["overlay"],
                              "This visualization shows the temporal evolution of keywords.", temporal_tables)

        # Section D: Keyword-Level Analysis
        elements.append(Paragraph("Keyword-Level Analysis", styles['Heading1']))
        for connection in influential_connections:
            elements.append(Paragraph(f"Influential Keyword: {connection['Keyword']}", styles['Heading2']))
            elements.append(Paragraph(f"Top co-occurrence connections for '{connection['Keyword']}'.", styles['BodyText']))
            elements.append(dataframe_to_table(connection['Connections']))
            elements.append(Spacer(1, 0.5 * cm))
        print("Report sections constructed.")

        # --- 4. Generate PDF ---
        print("\nStep 4/5: Generating PDF document... this may take a moment.")
        header_footer_with_config = lambda c, d: add_header_footer(c, d, config)
        doc.build(elements, onFirstPage=header_footer_with_config, onLaterPages=header_footer_with_config)

        print(f"\nReport '{config['PDF_FILENAME']}' generated successfully!")
        return True

    except (FileNotFoundError, ValueError, KeyError) as e:
        print(f"\nERROR: {e}")
        print("   Please check your uploaded files and the settings in the 'Configuration' cell, then try again.")
        return False
    except Exception as e:
        print(f"\nAn unexpected error occurred: {e}")
        return False

print("Main orchestrator function defined.")

In [None]:
#===================================================================
# CELL 6: EXECUTION
#
# This cell provides a simple, controlled workflow for the entire
# analysis process.
#
#===================================================================
"""
Workflow:
1.  Uses the standard `google.colab.files.upload()` pop-up for file selection.
2.  Validates that the required files have been uploaded. If validation fails,
    it automatically cleans up the uploaded files and instructs the user.
3.  If validation succeeds, it presents a "Start Analysis" button.
4.  If the analysis process encounters an error, it automatically cleans up
    the uploaded files and reports the specific error.
5.  If the analysis succeeds, it presents "Download Report" and "Clean Up Files"
    buttons, where the cleanup action is only enabled after the download is
    initiated.
"""

def _perform_cleanup(uploaded_files_dict, pdf_filename=None, output_widget=None):
    """
    A central function to delete uploaded files and the generated PDF.

    This helper function is used for both automatic cleanup on error and
    manual cleanup initiated by the user.

    Args:
        uploaded_files_dict (dict): The dictionary of uploaded files from Colab.
        pdf_filename (str, optional): The name of the generated PDF file to delete.
        output_widget (ipywidgets.Output, optional): The widget to print
            status messages to. If provided, the initial message is adjusted
            for manual cleanup.
    """
    message = "Initiating automatic cleanup due to error..."
    if output_widget:
        # This branch is for manual cleanup, so the message is different.
        output_widget.clear_output()
        message = "Initiating cleanup..."

    print(message)

    files_to_delete = list(uploaded_files_dict.keys())
    if pdf_filename and os.path.exists(pdf_filename):
        files_to_delete.append(pdf_filename)

    if not files_to_delete:
        print("No files to clean up.")
        return

    deleted_count = 0
    for filename in set(files_to_delete):
        if os.path.exists(filename):
            try:
                os.remove(filename)
                print(f"  - Removed: {filename}")
                deleted_count += 1
            except OSError as e:
                print(f"  - Error removing file {filename}: {e}")

    print(f"\nCleanup finished. Removed {deleted_count} file(s).")


# --- Step 1: Upload Files ---
print("Please select all of your files in a single upload:")
print("  1. A mandatory node file (e.g., 'my_data_node.csv')")
print("  2. A mandatory edge file (e.g., 'my_data_edge.csv')")
print("  3. Optional PNG visualizations whose names contain 'network', 'density', or 'overlay'")
print("Any other uploaded files are not used in the report.")
print("-" * 30)

uploaded = files.upload()

# --- Step 2: Validate Files and Present Next Action ---
if not uploaded:
    print("\nError: No files were uploaded. Please run the cell again to select your files.")
else:
    print("-" * 30)
    print(f"Success: Uploaded {len(uploaded)} file(s):")
    for fn in uploaded.keys():
        print(f"  - {fn}")
    print("-" * 30)

    # Validate that the required files are present
    filenames = uploaded.keys()
    has_node = any(f.endswith('_node.csv') for f in filenames)
    has_edge = any(f.endswith('_edge.csv') for f in filenames)

    if has_node and has_edge:
        # --- IF VALIDATION PASSES: Present the 'Start Analysis' Button ---
        start_button = widgets.Button(description="Start Analysis", button_style='success', icon='cogs')
        analysis_output = widgets.Output()

        def on_start_analysis_clicked(b):
            """
            Handles the click event for the 'Start Analysis' button.

            This function runs the main analysis process and, upon success,
            displays the final download and cleanup actions. It handles
            exceptions by initiating an automatic cleanup.

            Args:
                b (ipywidgets.Button): The button instance that was clicked.
            """
            b.disabled = True
            b.description = "Analysis in Progress..."

            with analysis_output:
                analysis_output.clear_output()
                print("Starting analysis... This may take a moment.")
                try:
                    success = run_analysis_and_generate_report(uploaded, config)
                    if success:
                        b.description = "Analysis Complete"
                        print("\nSuccess: Analysis Complete. Your report is ready.")
                        # Define the final widgets
                        download_button = widgets.Button(description="Download Report", button_style='primary', icon='download')
                        cleanup_button = widgets.Button(description="Clean Up Files", button_style='danger', icon='trash', disabled=True)
                        final_output = widgets.Output()

                        def on_download_clicked(btn):
                            """
                            Handles the download button click event.

                            Initiates the file download and enables the cleanup button.

                            Args:
                                btn (ipywidgets.Button): The button instance.
                            """
                            files.download(config['PDF_FILENAME'])
                            btn.description = "Download Started"
                            btn.disabled = True
                            cleanup_button.disabled = False  # Enable cleanup

                        def on_cleanup_clicked(btn):
                            """
                            Handles the cleanup button click event.

                            Calls the helper function to remove all session files.

                            Args:
                                btn (ipywidgets.Button): The button instance.
                            """
                            with final_output:
                                _perform_cleanup(uploaded, config['PDF_FILENAME'], output_widget=final_output)
                                btn.description = "Cleanup Complete"
                                btn.disabled = True
                                download_button.disabled = True

                        download_button.on_click(on_download_clicked)
                        cleanup_button.on_click(on_cleanup_clicked)

                        # Group buttons in a container for reliable display
                        button_container = widgets.HBox([download_button, cleanup_button])

                        display(HTML("<h3>Final Actions</h3><p>Please download your report, then clean up the files.</p>"))
                        display(button_container)
                        display(final_output)

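                    else:
                        # The orchestrator reports its own errors and returns False;
                        # clean up here so failed runs do not leave files behind.
                        _perform_cleanup(uploaded)
                        b.description = "Analysis Failed"

                except Exception as e: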
                    print(f"\nError: An unexpected error occurred during analysis: {e}")
                    _perform_cleanup(uploaded)

        start_button.on_click(on_start_analysis_clicked)
        display(HTML("<h3>Ready to Proceed</h3><p>Required files are present. Click the button below to start the analysis.</p>"))
        display(start_button, analysis_output)

    else:
        # --- IF VALIDATION FAILS: Show error and automatically clean up ---
        print("\nError: VALIDATION FAILED. Required files are missing.")
        print("   Please make sure both a '_node.csv' file and an '_edge.csv' file were selected.")
        _perform_cleanup(uploaded)