# Program Description: Statistics of Structure Descriptors (Module 5)

## Overview:
This module performs statistical analysis on structural descriptors obtained from previous modules. It generates various statistical tables and visualizations, including histograms, density distribution graphs, and box plots. The statistical analysis is applied to features like coordination numbers (CN), bond lengths (CR), and other structure-related properties.

The module can also analyze datasets after optimization and data division, with the option to exclude outlier samples based on predefined criteria. This is controlled by parameters within the module.

## Input Files/Folders:
- **Dataset Files**: 
  - These files contain the structural descriptors. By default, they are read from the folder in which this module is executed.
  - The files contain features such as `chi`, `xmu`, `rdf`, CN, CR, and other descriptors.

## Output Files/Folders:
- **Output Folder**: 
  - A new folder named `statistics_{current_time}` is created in the input folder, where `{current_time}` is the current timestamp.
  - This folder will contain statistical analysis tables and visualizations (e.g., histograms, density distribution graphs, box plots) for the structural descriptors.

## Parameters:
- **remove_excluded**: 
  - **True**: Excludes outlier samples from the statistical analysis. The samples to drop are read from an index file (`indices_to_move_{method}.csv`) produced by a separate outlier-detection step.
  - **False**: Includes all samples in the statistical analysis, whether or not they were flagged as outliers.
  
- **read_after_division**: 
  - **True**: Processes the dataset after it has been divided into different sets (e.g., training, validation, test).
  - **False**: Processes the entire dataset, without considering any division.
  - Note: The `remove_excluded` parameter is only relevant for datasets before division. If analyzing data after division, this parameter can be ignored.
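
As a rough illustration of what `remove_excluded` does, the sketch below drops rows whose `index` value appears in an exclusion list. The toy data and the `value` column name are made up for illustration; only the `index` column name follows the CSV layout the notebook reads later.

```python
import pandas as pd

# Toy descriptor table; the "value" column name is hypothetical.
df = pd.DataFrame({"index": [0, 1, 2, 3, 4], "value": [2.1, 2.3, 9.9, 2.2, 2.0]})

# Indices flagged as outliers by an upstream step (illustrative values).
excluded_indices = [2]

remove_excluded = True
# Keep only rows whose index is NOT in the exclusion list.
filtered = df[~df["index"].isin(excluded_indices)] if remove_excluded else df

print(len(df), len(filtered))
```

With `remove_excluded = False` the table is passed through unchanged, which matches the "include all samples" behavior described above.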

## Process:
1. **Statistical Analysis**: 
   - The module computes descriptive statistics for each structural descriptor: count, mean, standard deviation, minimum, quartiles (including the median), and maximum.
   - It visualizes these metrics with the help of histograms, density distribution plots, and box plots, making it easy to interpret the distribution and spread of the data.
  
2. **Outlier Handling**:
   - The module can exclude outlier data points from the analysis based on user-defined criteria, controlled via the `remove_excluded` parameter.
   - If `remove_excluded` is set to **True**, outliers identified by an external process (like Module 7) will be removed before analysis.

3. **Data Division Handling**:
   - The `read_after_division` parameter allows the user to specify whether to process the dataset before or after it has been divided into subgroups (e.g., training, validation, test).
   - If **True**, the program processes only the datasets that have been divided, ensuring that the statistical analysis is performed separately on each subset.

4. **Output Generation**:
   - The results of the statistical analysis are stored in the `statistics_{current_time}` directory, including CSV files with summary statistics and image files for the generated plots (e.g., histograms, density plots, and box plots).
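
The summary-table step can be sketched as follows. This is a minimal example on made-up values, assuming one descriptor column per neighbor-finding method; the names `cn_MethodA` and `cn_MethodB` are placeholders, not files from the dataset.

```python
import pandas as pd

# Toy values standing in for one descriptor column per method (illustrative only).
columns = {
    "cn_MethodA": pd.Series([4.0, 6.0, 6.0, 8.0]),
    "cn_MethodB": pd.Series([4.0, 4.0, 6.0, 6.0]),
}

# describe() returns count, mean, std, min, quartiles and max for each column.
stats = [series.describe() for series in columns.values()]

# One column per input, one row per statistic.
summary = pd.concat(stats, axis=1)
summary.columns = list(columns.keys())

print(summary.round(2))
```

The notebook writes a table of this shape to `statistics_summary.csv` with `DataFrame.to_csv`.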

## Example Workflow:
1. **Input**: Dataset files containing structural descriptors (e.g., `chi`, `xmu`, `rdf`, CN, CR).
2. **Process**: Perform statistical analysis (mean, median, standard deviation), generate visualizations (histograms, density plots, box plots), and exclude outliers if needed.
3. **Output**: Statistical tables and plots saved in a folder named `statistics_{current_time}`.

## Notes:
- This module helps users better understand the statistical properties of the structural descriptors and their distributions across samples.
- The `remove_excluded` parameter allows flexibility in handling outliers, making the analysis more robust when needed.
- The visualizations generated (histograms, box plots, density plots) provide a clear overview of the data's spread and help in identifying patterns or discrepancies in the dataset.


contacts: zhaohf@ihep.ac.cn

# Import libraries

In [1]:
from os.path import join, splitext, basename
import os
import glob
import pandas as pd
from scipy.interpolate import interp1d
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import sys 
from datetime import datetime
import logging


## Version Information

In [2]:
from importlib import metadata  # Standard library since Python 3.8

def get_python_version():
    return sys.version

def get_package_version(package_name):
    """Return the package's version string, or a placeholder if it cannot be determined."""
    try:
        module = __import__(package_name)
        version = getattr(module, '__version__', None)
        if version:
            return version
        # Fall back to the installed distribution metadata
        return metadata.version(package_name)
    except (ImportError, AttributeError, metadata.PackageNotFoundError):
        return "Version info not found"

packages = ['matplotlib', 'pandas', 'seaborn','numpy','sklearn','scipy']
for package in packages:
    print(f"{package}: {get_package_version(package)}")
print(f"Python: {get_python_version()}")

matplotlib: 3.7.5
pandas: 2.0.3
seaborn: 0.13.2
numpy: 1.23.5
sklearn: 1.3.2
scipy: 1.10.1
Python: 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:46:39) 
[GCC 10.4.0]


# Parameter Settings

## Input File:
- **load_path**: 
  - This parameter specifies the path to the input dataset file(s) containing structural descriptors (e.g., `chi`, `xmu`, `rdf`, CN, CR). 
  - The default path is set to the folder where the module is located, but users can modify this to point to any location containing the dataset.

## Output:
- **Output Directory**: 
  - The processed statistical results will be saved in a directory named `statistics_{current_time}` within the input folder. 
  - `{current_time}` is a timestamp of when the analysis was performed, ensuring the output is uniquely labeled.
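
A minimal sketch of how such a timestamped folder can be created is shown below. The temporary directory stands in for the real input folder and is an assumption made so the example can run anywhere.

```python
import os
import tempfile
from datetime import datetime

# A temporary folder stands in for the real input folder (assumption for the demo).
input_folder = tempfile.mkdtemp()

# Minute-resolution timestamp, e.g. 20250121_0956.
current_time = datetime.now().strftime("%Y%m%d_%H%M")
statistics_path = os.path.join(input_folder, f"statistics_{current_time}")

# exist_ok=True makes a second run within the same minute reuse the folder.
os.makedirs(statistics_path, exist_ok=True)
print(statistics_path)
```

Because the timestamp has minute resolution, two runs started within the same minute write into the same folder.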

## Labels Parameter:
- **labels**:
  - Defines which structural descriptors to analyze. It can be one or more of the following:
    - `cr`: Bond length (CR) data.
    - `cn`: Coordination number (CN) data.
  - The range of values for this parameter is `[cr, cn]`. You can select both descriptors or a single one to perform statistical analysis.

## Precision Parameters:
- **cn_precision**:
  - Specifies the number of decimal places to retain for the statistical analysis of the **coordination number (CN)** data. 
  - This allows users to control the precision level of CN values in the output statistics (e.g., 2 decimal places for CN precision).
  
- **cr_precision**:
  - Specifies the number of decimal places to retain for the statistical analysis of the **bond length (CR)** data. 
  - Similar to `cn_precision`, this parameter controls the precision level of CR values in the output statistics.

## Notes:
- **load_path** should point to the correct dataset folder containing the structural descriptors.
- The **labels** parameter determines which descriptors will be analyzed (CN and/or CR).
- **cn_precision** and **cr_precision** control the decimal precision in the statistical output for CN and CR, respectively.
- The output is saved in a timestamped folder for easy versioning and reference.
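
To make the effect of the precision parameters concrete, the sketch below derives histogram bin edges from a precision value: the bin width is at least `10**-precision`, and the number of bins is capped at 30, as in the plotting function later in this notebook. The helper name `histogram_bins` is hypothetical, and it uses `np.linspace` for the edges instead of the notebook's `np.arange`, a substitution made here to avoid floating-point drift at the upper edge.

```python
import numpy as np

def histogram_bins(values, precision, max_bins=30):
    """Return bin edges at least 10**-precision wide, capped at max_bins bins."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    step = 10.0 ** -precision
    # At least one bin, at most max_bins bins.
    num_bins = max(1, min(max_bins, int((hi - lo) / step)))
    return np.linspace(lo, hi, num_bins + 1)

# Illustrative bond lengths in angstroms (made-up values).
cr_values = [2.0, 2.3, 2.5, 3.05]

coarse = histogram_bins(cr_values, precision=1)  # bins about 0.1 wide
fine = histogram_bins(cr_values, precision=2)    # would need ~105 bins, so capped

print(len(coarse) - 1, len(fine) - 1)
```

A higher precision gives narrower bins until the cap is reached, after which the bins simply divide the data range evenly.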


In [3]:
# Range of parameters for coordination number (`cn`) and coordination bond length (`cr`)
labels = ["cr", "cn"]

# `read_after_division` determines whether to use the dataset before or after statistical division
read_after_division = False

# `cn_precision` is the precision limit for the coordination number. Although the coordination number is theoretically an integer (typically between 1 and 12),
# it can sometimes have decimal values in chemistry.
# `cr_precision` is the precision limit for the coordination bond length (in angstroms). 
# The first coordination bond length typically ranges between 2 and 3 angstroms, and XAS measurement precision is typically around 0.01 angstroms.
cn_precision = 0
cr_precision = 1

# The `remove_excluded` parameter determines whether to process the filtered data.
# It can be set to `True` or `False`.
remove_excluded = True

# `keyword` defines the search term for identifying relevant data. Possible values can be elements, compounds, or other identifiers (e.g., "Cu").
keyword = "Cu"

# Get the current timestamp for file naming and logging
current_time = datetime.now().strftime("%Y%m%d_%H%M")
# If `read_after_division` is True, load the dataset after it has been divided and perform relevant operations
if read_after_division:
    dateset_path= "0926-datasets"
    load_path = os.path.join(dateset_path, "prepare")
    date_path = os.path.join(load_path, "datasets(JmolNN)")  # Specify the directory for JmolNN dataset
    statistics_path = os.path.join(dateset_path, f"division_statistics_{current_time}")
    output_log_path = os.path.join(statistics_path, 'output_log.txt')
    os.makedirs(statistics_path, exist_ok=True)
    # Uncomment and select from these if analyzing specific dataset subsets: ["all", "train", "valid", "test"]
    # If analyzing the divided dataset, select the appropriate structure data method
    method = "JmolNN"
    # Specify the dataset(s) to be counted (all, train, etc.)
    data_set = ["all", "train"]
else:
    # If `read_after_division` is False, load the dataset before division
    dateset_path= "0926-datasets"
    load_path = os.path.join(dateset_path, "prepare")
    statistics_path = os.path.join(dateset_path, f"statistics_{current_time}")
    output_log_path = os.path.join(statistics_path, 'output_log.txt')
    os.makedirs(statistics_path, exist_ok=True)
    # If analyzing the filtered dataset, select the appropriate method to find the corresponding index file
    method = "JmolNN"
    
    if remove_excluded:
        # `excluded_indices_file` contains indices of the excluded samples that need to be removed from the dataset
        excluded_indices_file = os.path.join(load_path, f"indices_to_move_{method}.csv")
        excluded_indices = []
        
        # If the excluded indices file exists, load the indices to be excluded
        if os.path.exists(excluded_indices_file):
            excluded_indices = pd.read_csv(excluded_indices_file)["index"].tolist()
            print(f"Read {len(excluded_indices)} excluded sample indices. Processing the filtered dataset.")
            logging.info(f"Read {len(excluded_indices)} excluded sample indices. Processing the filtered dataset.")
        else:
            # `remove_excluded` is always True in this branch, so only one message applies
            print(f"File '{excluded_indices_file}' not found. Unable to process the filtered dataset.")
    else:
        # If no excluded samples need to be removed, process the entire dataset
        excluded_indices = []
        print(f"Read {len(excluded_indices)} excluded sample indices. Processing the entire dataset.")
        logging.info(f"Read {len(excluded_indices)} excluded sample indices. Processing the entire dataset.")


File '0926-datasets/prepare/indices_to_move_JmolNN.csv' not found. Unable to process the filtered dataset.


## Build file paths and check that the label files exist

In [4]:
# Function to build file paths for labels based on the given base path, keyword, label types, and data subsets
def build_label_paths(base_path, keyword, labels, data_subsets):
    file_paths = {}
    for label_type in labels:
        for subset in data_subsets:
            # Define the pattern to search for label files based on the label type and data subset
            pattern = os.path.join(base_path, f"*label_{label_type}_{subset}*.txt")
            print(f"Checking pattern: {base_path}/*label_{label_type}_{subset}*.txt")
            # Use glob to find all files matching the pattern
            matched_files = glob.glob(pattern)
            
            # If files are matched, store the first matched file in the dictionary
            if matched_files:
                print(f"Matched files: {matched_files}")
                key = f"{keyword}_{label_type}_{subset}"
                file_paths[key] = matched_files[0]
            else:
                # If no files are matched, log a warning
                logging.warning(f"No files matched for pattern: {pattern}")
    
    return file_paths

# Depending on the `read_after_division` flag, build the appropriate file paths
if read_after_division:
    # If reading after division, use `build_label_paths` to get paths for the divided data
    label_paths = build_label_paths(date_path, keyword, labels, data_set)
else:
    # Otherwise, create a dictionary of paths for the labels in the base load path
    paths = {label: os.path.join(load_path, label) for label in labels}

# Ensure that the directory for statistics exists (create it if not)
def ensure_directory_exists(directory):
    if not os.path.exists(directory):
        # If the directory does not exist, create it
        os.makedirs(directory)
        print(f"Directory '{directory}' created.")
    else:
        # If the directory already exists, no need to create it
        print(f"Directory '{directory}' already exists, no need to create.")

# Ensure that the directory for storing statistics exists
ensure_directory_exists(statistics_path)

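# Set up logging for the output with the specified log file path and format.
# `force=True` (Python 3.8+) replaces any handler that an earlier logging call installed implicitly;
# without it, this call is silently ignored once the root logger already has a handler.
logging.basicConfig(filename=output_log_path, level=logging.INFO, format='%(message)s', force=True)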


Directory '0926-datasets/statistics_20250121_0956' already exists, no need to create.


In [5]:
# Function to build file paths for labels based on the given base path, keyword, labels, and data subsets
def build_label_paths(base_path, keyword, labels, data_subsets):
    file_paths = {}
    for label_type in labels:
        for subset in data_subsets:
            # Define the pattern to search for label files based on the label type and data subset
            pattern = os.path.join(base_path, f"*label_{label_type}_{subset}*.txt")
            print(f"Checking pattern: {base_path}/*label_{label_type}_{subset}*.txt")
            # Use glob to find all files matching the pattern
            matched_files = glob.glob(pattern)
            
            # If files are matched, store the first matched file in the dictionary
            if matched_files:
                print(f"Matched files: {matched_files}")
                key = f"{keyword}_{label_type}_{subset}"
                file_paths[key] = matched_files[0]
            else:
                # If no files are matched, log a warning
                logging.warning(f"No files matched for pattern: {pattern}")
    
    return file_paths

# If reading after division, build the file paths for the divided dataset
if read_after_division:
    # Call the `build_label_paths` function to retrieve paths based on the `date_path` and `data_set`
    label_paths = build_label_paths(date_path, keyword, labels, data_set)

    # Output the label paths in a unified format for the user
    for key, path in label_paths.items():
        print(f"{key}: {path}")
else:
    # If not reading after division, directly map the paths for the labels in the base load path
    paths = {label: os.path.join(load_path, label) for label in labels}

    # Output the paths in a unified format for the user
    for label, path in paths.items():
        print(f"{label}: {path}")

# Function to ensure that a directory exists; creates it if it doesn't
def ensure_directory_exists(directory):
    if not os.path.exists(directory):
        # If the directory does not exist, create it
        os.makedirs(directory)
        print(f"Directory '{directory}' created.")
    else:
        # If the directory already exists, notify the user that no creation is necessary
        print(f"Directory '{directory}' already exists, no need to create.")

# Ensure the directory for storing statistics exists
ensure_directory_exists(statistics_path)

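# Set up logging for the output with the specified log file path and format.
# `force=True` (Python 3.8+) replaces any handler that an earlier logging call installed implicitly;
# without it, this call is silently ignored once the root logger already has a handler.
logging.basicConfig(filename=output_log_path, level=logging.INFO, format='%(message)s', force=True)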


cr: 0926-datasets/prepare/cr
cn: 0926-datasets/prepare/cn
Directory '0926-datasets/statistics_20250121_0956' already exists, no need to create.


# Function settings

In [6]:

# Count the samples in each label file. When `read_after_division` is True, `data_dir` is a single file path.
def count_samples_in_cn_cr(data_dir, file_pattern="*.csv", remove_excluded=False, excluded_indices=None):
    excluded_indices = excluded_indices or []  # Avoid a mutable default argument
    sample_counts = {}

    if read_after_division:
        # Read with default header handling, so the first row is consumed as column names
        df = pd.read_csv(data_dir)
        print(data_dir)
        logging.info(data_dir)
        sample_counts[basename(data_dir)] = len(df) + 1  # Add back the row that was read as the header
        print(sample_counts)
        logging.info(sample_counts)
    else:
        file_list = glob.glob(join(data_dir, file_pattern))  # Get the list of files matching the pattern
        for file_path in file_list:
            try:
                df = pd.read_csv(file_path)  # Read the CSV file
                if remove_excluded:
                    df = df[~df['index'].isin(excluded_indices)]  # Exclude specific indices if needed
                sample_counts[basename(file_path)] = len(df)  # Count the number of rows in the file
            except Exception as e:
                print(f"Error processing file {file_path}: {e}")

    # Return the counts in both modes (previously only the non-division branch returned a value)
    return sample_counts

# Extract the method name from a file path by parsing the filename
def extract_method_name(file_path):
    """Extract the method name from the file path by splitting the filename."""
    file_name = splitext(basename(file_path))[0]
    method_name = file_name.split('_')[1]  # Method is the second part of the file name
    return method_name


# Function to process a list of data arrays so that they all have the same length:
# shorter arrays are resampled up to the target length by linear interpolation, longer arrays are truncated.
def process_data(data_list, max_length=None):

    if max_length is None:
        max_length = max(len(data) for data in data_list)  # Use the longest array length by default
    
    processed_data = []
    for data in data_list:
        if len(data) < max_length:  # If data is shorter than the target length
            interp_func = interp1d(np.arange(len(data)), data, kind='linear', fill_value='extrapolate')
            processed_data.append(interp_func(np.linspace(0, len(data)-1, max_length)))  # Interpolate to match max length
        else:
            processed_data.append(data[:max_length])  # Truncate if necessary
    
    return np.array(processed_data)


# Function to generate statistical summaries and plots from CSV data files.
def generate_statistics_and_plots(data_dir, plot_dir, precision=None, file_pattern="*.csv", excluded_indices_file=None, remove_excluded=False, read_after_division=False):
    if read_after_division:
        file_list = [data_dir]  # If reading after division, use the single file
        print(f"Reading single file: {file_list}")
    else:
        file_list = sorted(glob.glob(join(data_dir, file_pattern)))  # Get list of files matching the pattern
        if not file_list:
            print(f"No files matching pattern {file_pattern} found in directory {data_dir}.")
            return
    
    stats = []
    
    # Ensure that the plot directory exists, creating it if necessary
    ensure_directory_exists(plot_dir)

    for file_path in file_list:
        try:
            if read_after_division:
                df = pd.read_csv(file_path, delim_whitespace=True, header=None)  # Read TXT file if after division
            else:
                df = pd.read_csv(file_path)  # Default CSV reading method
            
            if df.empty:
                print(f"Warning: {file_path} is empty.")
                continue

            # Extract method name from the file path
            full_method_name = extract_method_name(file_path)

            # Split the method name if necessary (in case there are parts separated by '-')
            method_name_parts = full_method_name.split('-')
            method_name = method_name_parts[1].strip() if len(method_name_parts) > 1 else full_method_name

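            # Remove excluded samples if requested. The indices file has an "index" header row,
            # so it is read with the default header handling and matched against the data's "index" column.
            if remove_excluded and excluded_indices_file:
                excluded_ids = pd.read_csv(excluded_indices_file)["index"].tolist()
                df = df[~df["index"].isin(excluded_ids)]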

            # Statistical description of the data column
            if read_after_division:
                data_column = df[0]  # Use the first column if reading after division
            else:
                data_column = df.iloc[:, 1]  # Otherwise, use the second column as data

            description = data_column.describe()  # Get the descriptive statistics of the data
            stats.append(description)

            # Plotting

            # Density plot
            plt.figure(figsize=(12, 8))
            sns.kdeplot(data_column, fill=True)
            plt.xlabel(f"{'Coordination Number' if 'cn' in data_dir else 'Bond length'} - {method_name}", fontsize=16)
            plt.ylabel("Density", fontsize=16)
            plt.title("Density Plot", fontsize=16)
            plt.xticks(rotation=0, fontsize=14)
            plt.yticks(rotation=0, fontsize=14)
            plt.tight_layout()
            plot_filename = f"{full_method_name}_density.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

            # Box plot
            plt.figure(figsize=(10, 8))
            sns.boxplot(y=data_column, color='skyblue', medianprops={'color': 'red'})
            plt.ylabel(f"y ({method_name})", fontsize=16)
            plt.title("Box Plot", fontsize=16)
            x_label = "Bond Length (Å)" if "cr" in data_dir else "Coordination Number"
            plt.xlabel(x_label, fontsize=16)
            plt.xticks(rotation=0, fontsize=14)
            plt.tight_layout()
            plot_filename = f"{full_method_name}_box.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

            # Histogram
            min_value = data_column.min()
            max_value = data_column.max()
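            # Use an explicit `is None` check so that a precision of 0 is respected
            if precision is None:
                precision = max(int(-np.floor(np.log10(data_column.abs().max()))), 0)
            # Cap at 30 bins, but keep at least one so the bin width below never divides by zero
            num_bins = max(1, min(30, int((max_value - min_value) / (10 ** -precision))))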

            bin_width = (max_value - min_value) / num_bins
            bins = np.arange(min_value, max_value + bin_width, bin_width)

            plt.figure(figsize=(12, 8))
            ax = sns.histplot(data_column, bins=bins, kde=False)
            ax.set_xticks(bins + bin_width / 2)
            ax.set_xticklabels([f'{bin_edge:.{precision}f}' for bin_edge in bins], rotation=45, ha='right')

            num_patches = len(ax.patches)
            print(f"Number of patches: {num_patches}")
            if num_patches > 15:
                plt.xticks(rotation=45, fontsize=12, fontstyle='italic')
            else:
                plt.xticks(rotation=0, fontsize=14)
            plt.xlabel(f"{'Coordination Number' if 'cn' in data_dir else 'Bond length'} - {method_name}", fontsize=16)
            plt.ylabel("Count", fontsize=16)
            plt.title("Histogram", fontsize=16)
            plt.tight_layout()

            for p in ax.patches:
                if p.get_height() > 0:
                    ax.annotate(f'{int(p.get_height())}', (p.get_x() + p.get_width() / 2., p.get_height()),
                                ha='center', va='center', xytext=(0, 5), textcoords='offset points')

            plot_filename = f"{full_method_name}_hist.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

        except Exception as e:
            print(f"Error processing file {file_path}: {e}")

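    # Combine all statistical descriptions into a single DataFrame and save as CSV
    if not stats:
        # pd.concat raises on an empty list, so stop here if no file could be processed
        print("No statistics were computed; summary not saved.")
        return
    stats_df = pd.concat(stats, axis=1)
    stats_df.columns = [splitext(basename(f))[0] for f in file_list]
    stats_df.to_csv(join(plot_dir, 'statistics_summary.csv'))
    print("Statistics summary saved.")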


# Main program

In [7]:
for label in labels:
    if read_after_division:
        for subset in data_set:
            keyword_label = f"{keyword}_{label}_{subset}" 
            print(keyword_label)
            data_dir = label_paths[keyword_label]  # Paths for the divided data are stored in `label_paths`
            sample_counts = count_samples_in_cn_cr(data_dir, file_pattern="*.csv", remove_excluded=None, excluded_indices=None)
    else:
        data_dir = paths[label]
        sample_counts = count_samples_in_cn_cr(data_dir, file_pattern="*.csv", remove_excluded=remove_excluded, excluded_indices=excluded_indices)
        logging.info(f"Sample counts for {label}:")
        print(f"Sample counts for {label}:")
        for file_name, count in sample_counts.items():
            print(f"{file_name}: {count} samples")
            logging.info(f"{file_name}: {count} samples")

Sample counts for cr:
cr_MinimumDistanceNN.csv: 5001 samples
cr_JmolNN.csv: 5001 samples
cr_VoronoiNN.csv: 5001 samples
cr_CrystalNN.csv: 5001 samples
cr_BrunnerNN_relative.csv: 5001 samples
cr_EconNN.csv: 5001 samples
Sample counts for cn:
cn_CrystalNN.csv: 5001 samples
cn_MinimumDistanceNN.csv: 5001 samples
cn_JmolNN.csv: 5001 samples
cn_BrunnerNN_relative.csv: 5001 samples
cn_EconNN.csv: 5001 samples
cn_VoronoiNN.csv: 5001 samples


In [8]:
def label_statistics(labels, remove_excluded=False, cn_precision=None, cr_precision=None):
    for label in labels:
        if read_after_division:  # If the data is split after division
            # Process data for each subset (e.g., train, valid, test)
            for subset in data_set:
                keyword_label = f"{keyword}_{label}_{subset}"
                print(f"Processing label: {keyword_label}")
                data_dir = label_paths[keyword_label]  # Get the file for the current label subset (stored in `label_paths`)
                output_dir = join(statistics_path, f'{label}_{subset}_distribution_plots')  # Directory to save plots
                
                # Determine the appropriate precision based on the label type
                precision = cn_precision if label == "cn" else cr_precision
                
                # Call the function to generate statistics and plots for the data
                generate_statistics_and_plots(data_dir, output_dir, precision=precision, 
                                              remove_excluded=None, read_after_division=True)
        else:  # If not reading after division, process the full data
            data_dir = paths[label]  # Get the directory for the current label
            output_dir = join(statistics_path, f'{label}_distribution_plots')  # Directory to save plots
            print(f"Processing label: {label}, data_dir: {data_dir}, output_dir: {output_dir}")
            
            # Select precision based on the label type (coordination number or bond length)
            precision = cn_precision if label == "cn" else cr_precision
            generate_statistics_and_plots(data_dir, output_dir, precision=precision, 
                                          remove_excluded=remove_excluded)

# Call the function based on whether the data is read after division
if read_after_division:
    label_statistics(labels, remove_excluded=None, cn_precision=cn_precision, cr_precision=cr_precision)
else:
    label_statistics(labels, remove_excluded=remove_excluded, cn_precision=cn_precision, cr_precision=cr_precision)


Processing label: cr, data_dir: 0926-datasets/prepare/cr, output_dir: 0926-datasets/statistics_20250121_0956/cr_distribution_plots
Directory '0926-datasets/statistics_20250121_0956/cr_distribution_plots' created.
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/BrunnerNN_density.png
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/BrunnerNN_box.png
Number of patches: 24
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/BrunnerNN_hist.png
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/CrystalNN_density.png
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/CrystalNN_box.png
Number of patches: 9
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/CrystalNN_hist.png
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/EconNN_density.png
Plot saved: 0926-datasets/statistics_20250121_0956/cr_distribution_plots/EconNN_box.png
Number o