# Program Description: Spectra & Structure Descriptors Plot (Module 4)

## Input Files/Folders:
- **Dataset Files**: The module reads dataset files (by default, located in the same directory as the module). These datasets include various spectra and structure descriptors such as chi, xmu, norm, CR, CN, and RDF.

## Output Files/Folders:
- The output plots are saved in a folder named `plots_{current_time}`, which is created inside the input folder.

## Visualization Details:
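- The module splits the samples into consecutive blocks of `interval` samples and randomly selects one sample from each block for plotting (chi, xmu, norm, RDF, and so on). A fixed random seed ensures that the same samples are selected on every run.
- The results are displayed using various plot types:
  - **Bar charts and histograms**: Available for the CN and CR distributions (`plot_cn_method_distribution`, `plot_cr_method_distribution`). The main program as written calls the scatter-plot variants (`plot_cn_method_distribution1`, `plot_cr_method_distribution1`).
  - **Spectra**: Line plots for chi, xmu, norm, RDF, and k²χ as requested in `data_types`.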
- **Start and End Parameters**: These parameters define the sample range to be visualized, providing flexibility in selecting the portion of data for plotting.
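
The interval-based sampling described above can be sketched as follows. This is a minimal illustration rather than the module's exact code: the helper name `pick_one_per_interval` is hypothetical, and it uses a local `random.Random` generator instead of the global seed set in the function cell.

```python
import random


def pick_one_per_interval(n_samples, interval, start=0, end=None, seed=42):
    """Return one randomly chosen sample index from each block of `interval` samples."""
    rng = random.Random(seed)  # local generator, so repeated calls give the same picks
    if end is None:
        end = n_samples
    picks = []
    for block_start in range(start, end, interval):
        block_end = min(block_start + interval, end)  # the last block may be shorter
        # randint is inclusive on both ends, hence block_end - 1
        picks.append(rng.randint(block_start, block_end - 1))
    return picks


# Assumed example: 5001 samples with interval=1000 gives six blocks
picks = pick_one_per_interval(n_samples=5001, interval=1000)
print(picks)
```

With these example numbers the last block contains only index 5000, so that sample is always selected.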

## Special Note:
- To plot the **k²χ** (k-squared chi) spectrum, simply input the chi spectrum data and add `k2chi` to the `data_types` parameter. This will automatically generate the k²χ spectrum alongside other plots.
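
As a rough sketch of the weighting involved, each χ(k) value is multiplied by k². The helper name `k2_weight` and the sample values below are illustrative assumptions; the module itself applies the same product to the `k` and `chi` columns of each CSV file.

```python
def k2_weight(k_values, chi_values):
    """Multiply each chi(k) point by k**2 and return the weighted values."""
    if len(k_values) != len(chi_values):
        raise ValueError("k and chi must have the same length")
    return [chi * k ** 2 for k, chi in zip(k_values, chi_values)]


# Made-up values chosen only to keep the arithmetic easy to follow
k = [0.0, 1.0, 2.0, 3.0]
chi = [0.5, 0.25, -0.125, 0.0625]
k2chi = k2_weight(k, chi)
print(k2chi)
```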

Contact: zhaohf@ihep.ac.cn

# Import libraries

In [1]:
import os
from os.path import join, splitext, split, basename
import glob
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import shutil
import sys 
import numpy as np
from datetime import datetime


# Version Information

In [2]:
from importlib import metadata  # standard library since Python 3.8

def get_python_version():
    return sys.version

def get_package_version(package_name):
    """Return the installed version of a package, or a placeholder if it is unavailable."""
    try:
        module = __import__(package_name)
        version = getattr(module, '__version__', None)
        if version:
            return version
        # Fall back to the installed distribution metadata
        return metadata.version(package_name)
    except (ImportError, AttributeError, metadata.PackageNotFoundError):
        return "Version info not found"

packages = ['matplotlib', 'pandas', 'seaborn','numpy']
for package in packages:
    print(f"{package}: {get_package_version(package)}")
print(f"Python: {get_python_version()}")

matplotlib: 3.7.5
pandas: 2.0.3
seaborn: 0.13.2
numpy: 1.23.5
Python: 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:46:39) 
[GCC 10.4.0]


# Parameter Settings

## Input Files:
- **Input Files**: The input dataset files (by default, in the same directory as the module) are read for visualization purposes. These files contain the spectra and structural descriptors like chi, xmu, norm, CR, CN, and RDF.

## Output Files:
- **Output Folder**: The plots are saved in a folder named `plots_{current_time}` inside the input folder, where `{current_time}` is the timestamp (format `YYYYMMDD_HHMM`) at which the module is run.
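
The folder naming can be sketched as below. The helper `make_output_dir_name` and its optional `now` argument are hypothetical and exist only so that the example is reproducible; the module calls `datetime.now()` directly.

```python
import os
from datetime import datetime


def make_output_dir_name(load_path, now=None):
    """Build the timestamped output folder path inside the input folder."""
    if now is None:
        now = datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M")  # minute resolution, e.g. 20250115_1649
    return os.path.join(load_path, f"plots_{stamp}")


# A fixed timestamp is used here so the result does not depend on the clock
example = make_output_dir_name("0926-datasets", datetime(2025, 1, 15, 16, 49))
print(example)
```

Because the timestamp has minute resolution, two runs started within the same minute write to the same folder.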

## Data Types:
- **data_types**: This parameter allows users to specify which data variables will be visualized. Users can select multiple data types such as chi, xmu, norm, CR, CN, RDF, or even k²χ (by adding `k2chi` to the list).
  - Example: `data_types = ['chi', 'xmu', 'k2chi']`
  - The selected data types will be visualized accordingly in the generated plots.
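
The main program looks up each entry of `data_types` in a dictionary of plotting functions and silently skips names it does not find, and the keys are lowercase. A small check like the following can catch typos or uppercase names before plotting. The `SUPPORTED` set mirrors the keys used in the main program, and the helper name `split_data_types` is hypothetical.

```python
# Keys taken from the plot_functions dictionary in the main program
SUPPORTED = {"chi", "xmu", "norm", "wt_pic", "rdf", "k2chi", "cn", "cr"}


def split_data_types(data_types):
    """Separate the requested data types into recognized and unrecognized names."""
    known = [name for name in data_types if name in SUPPORTED]
    unknown = [name for name in data_types if name not in SUPPORTED]
    return known, unknown


known, unknown = split_data_types(["chi", "CN", "k2chi"])
print("Will be plotted:", known)
print("Will be skipped:", unknown)
```

Here `"CN"` is reported as unrecognized because the lookup is case-sensitive; `"cn"` would be accepted.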


In [3]:
import os
from datetime import datetime

# Input directory path containing the dataset
load_path = "0926-datasets"

# List of variables to visualize (can include 'chi', 'xmu', 'cr', 'cn', etc.)
data_types = ["chi", "xmu", "cr", "cn"]

# Flag to specify whether to save all plots in one directory or separate directories
all_in_one = True

# Check if the input directory exists
if os.path.exists(load_path):
    print(f"Directory '{load_path}' exists.")
else:
    # Raise an error if the directory doesn't exist
    raise FileNotFoundError(f"Directory '{load_path}' does not exist.")

# Create a new folder path for storing plots with a timestamp
current_time = datetime.now().strftime("%Y%m%d_%H%M")
prepare_path = os.path.join(load_path, f"plots_{current_time}")

# Create the output directory for the plots
os.makedirs(prepare_path, exist_ok=True)
if all_in_one:
    # If 'all_in_one' is True, also create a folder that collects copies of all plots
    all_plot_path = os.path.join(prepare_path, "all_plots")
    os.makedirs(all_plot_path, exist_ok=True)

# Print the path where plots will be saved
print(f"Output folder '{prepare_path}' created.")

File '0926-datasets' exists.
Output folder '0926-datasets/plots_20250115_1649'


## Visualization Range Parameter Settings

- **`interval`**:
  - **Description**: Specifies the size of the sample blocks from which plotted samples are drawn.
  - **Usage**: The sample range is split into consecutive blocks of `interval` samples, and one sample is picked at random from each block. For example, an interval of `1000` plots one randomly chosen sample out of every 1000.
  
- **`start`**:
  - **Description**: Defines the starting sample index for visualization.
  - **Usage**: Sets the first sample to be included in the visualization.
  - **Default**: If set to `None`, visualization starts from the first sample in the dataset.

- **`end`**:
  - **Description**: Defines the ending sample index for visualization (exclusive).
  - **Usage**: Sets the index at which visualization stops; the sample at `end` itself is not included.
  - **Default**: If set to `None`, visualization includes samples up to the last one in the dataset.

- **Behavior When `start` or `end` is `None`**:
  - With `start=None`, visualization begins at the first sample; with `end=None`, it runs through the last sample. Setting both to `None` therefore covers the full dataset.
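
One way to resolve these defaults into concrete indices is sketched below. The helper `resolve_range` is hypothetical, and clamping `end` to the dataset size is an added assumption that the module's own functions do not make.

```python
def resolve_range(n_samples, start=None, end=None):
    """Turn optional start/end values into a concrete half-open range [first, last)."""
    first = 0 if start is None else start
    # Assumption: an `end` beyond the dataset is clamped to the number of samples
    last = n_samples if end is None else min(end, n_samples)
    if not 0 <= first <= last:
        raise ValueError(f"Invalid range: start={first}, end={last}")
    return first, last


full_range = resolve_range(5001)                       # both left as None
sub_range = resolve_range(5001, start=1000, end=3000)  # explicit sub-range
clamped = resolve_range(10, end=50)                    # end beyond the dataset
print(full_range, sub_range, clamped)
```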


In [4]:
# Interval parameter sets the interval between visualized data points
interval = 1000

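# The start parameter is the first sample index to visualize,
# and the end parameter is the index at which visualization stops (exclusive).
# 'None' for 'end' means counting from the 'start' index to the last sample.
start = 0
end = None

# range() cannot take None as its start value, so fall back to the first sample
if start is None:
    start = 0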

# CR bin width parameter sets the bin width for the CR statistics.
# 'None' means automatically setting the bin width based on the data distribution.
cr_bin_width = 0.1

# Print the configured parameters for reference
print(f"Interval between visualized data points: {interval}")
print(f"Starting sample: {start}")
print(f"Ending sample: {end}")
print(f"CR bin width: {cr_bin_width}")


Interval between visualized data points: 1000
Starting sample: 0
Ending sample: None
CR bin width: 0.1


In [5]:
# Dictionary to store paths
paths = {}

# Iterate over each data type to find corresponding folder paths
for data_type in data_types:
    if data_type == "k2chi":
        # Special handling for k2chi data type: if a direct k2chi folder doesn't exist, use chi as a fallback
        k2chi_folders = glob.glob(join(load_path, "*k2chi*"))
        if k2chi_folders:
            paths[data_type] = k2chi_folders[0]
        else:
            chi_folders = glob.glob(join(load_path, "*chi*"))
            if chi_folders:
                paths[data_type] = chi_folders[0]
            else:
                print(f"Warning: Folder '{data_type}' not found in {load_path}.")
    else:
        # For other data types, search for the respective folder
        data_folders = glob.glob(join(load_path, f"*{data_type}*"))
        if data_folders:
            paths[data_type] = data_folders[0]  # Assume there is only one folder per data type
        else:
            print(f"Warning: Folder '{data_type}' not found in {load_path}.")

# Dictionary to store save paths
save_paths = {}

# Iterate over each data type to create corresponding save paths
for data_type, data_folder in paths.items():
    if data_folder:
        if data_type == "k2chi":
            save_paths[data_type] = join(prepare_path, "k2chi")
        else:
            save_paths[data_type] = join(prepare_path, basename(data_folder))

# Create all necessary directories if they don't exist
for path in [prepare_path, load_path] + list(save_paths.values()):
    os.makedirs(path, exist_ok=True)

# Print data paths and save paths
for data_type in data_types:
    data_path = paths.get(data_type, "N/A")
    save_data_path = save_paths.get(data_type, "N/A")
    
    # Dynamically assign paths as globals
    globals()[f"{data_type}_path"] = data_path
    globals()[f"save_{data_type}_path"] = save_data_path

    # Print the paths
    print(f"{data_type} data path: {data_path}")
    print(f"{data_type} data save path: {save_data_path}")


chi data path: 0926-datasets/chi
chi data save path: 0926-datasets/plots_20250115_1649/chi
xmu data path: 0926-datasets/xmu
xmu data save path: 0926-datasets/plots_20250115_1649/xmu
cr data path: 0926-datasets/cr
cr data save path: 0926-datasets/plots_20250115_1649/cr
cn data path: 0926-datasets/cn
cn data save path: 0926-datasets/plots_20250115_1649/cn


# Function settings

In [6]:
random_seed = 42
random.seed(random_seed)

# Function to plot k2chi spectra. Parameters: data_dir, output_dir, interval, start, end.
# data_dir: input file path, output_dir: output file path, interval: visualization interval, start: starting sample, end: ending sample
def plot_k2chi(data_dir, output_dir, interval, start, end):
    os.makedirs(output_dir, exist_ok=True)
    file_list = sorted(glob.glob(join(data_dir, "*.csv")), key=lambda x: int(splitext(basename(x))[0].split('_')[-1]))
    if end is None:
        end = len(file_list)
    for i in range(start, end, interval):
        end_interval = min(i + interval, end)
        random_index = random.randint(i, end_interval - 1)
        file = file_list[random_index]
        try:
            dat = pd.read_csv(file)
            if dat.empty:
                continue
            plt.figure()
            plt.plot(dat['k'], dat['chi'] * (dat['k'] ** 2))
            # Raw strings avoid invalid escape sequences such as '\A' in the labels
            plt.xlabel(r'$k(\mathrm{\AA}^{-1})$', fontsize=14)
            plt.ylabel(r'$k^2\chi$(a.u.)', labelpad=0.1, fontsize=14)
            plt.title(f'Sample {random_index}', fontsize=14)
            plt.xticks(rotation=0, fontsize=12)
            plt.yticks(rotation=0, fontsize=12)
            plt.tight_layout()
            output_path = join(output_dir, f'sample_{random_index}_k2chi.png')
            plt.savefig(output_path)
            plt.close()
            print(f"Plotted: {file}, saved to {output_path}")
        except Exception as e:
            print(f"Error processing file {file}: {e}")

# Function to plot chi spectra. Parameters: data_dir, output_dir, interval, start, end.
# data_dir: input file path, output_dir: output file path, interval: visualization interval, start: starting sample, end: ending sample
def plot_chi(data_dir, output_dir, interval, start, end):
    os.makedirs(output_dir, exist_ok=True)
    file_list = sorted(glob.glob(join(data_dir, "*.csv")), key=lambda x: int(splitext(basename(x))[0].split('_')[-1]))
    if end is None:
        end = len(file_list)
    for i in range(start, end, interval):
        end_interval = min(i + interval, end)
        random_index = random.randint(i, end_interval - 1)
        file = file_list[random_index]
        try:
            dat = pd.read_csv(file)
            if dat.empty:
                continue
            plt.figure()
            plt.plot(dat['k'], dat['chi'])
            # Raw strings avoid invalid escape sequences such as '\A' and '\c' in the labels
            plt.xlabel(r'$k(\mathrm{\AA}^{-1})$', fontsize=14)
            plt.ylabel(r'$\chi$(a.u.)', labelpad=0.1, fontsize=14)
            plt.title(f'Sample {random_index}', fontsize=14)
            plt.xticks(rotation=0, fontsize=12)
            plt.yticks(rotation=0, fontsize=12)
            plt.tight_layout()
            output_path = join(output_dir, f'sample_{random_index}_chi.png')
            plt.savefig(output_path)
            plt.close()
            print(f"Plotted: {file}, saved to {output_path}")
        except Exception as e:
            print(f"Error processing file {file}: {e}")

# Function to plot xmu spectra. Parameters: data_dir, output_dir, interval, start, end.
# data_dir: input file path, output_dir: output file path, interval: visualization interval, start: starting sample, end: ending sample
def plot_xmu(data_dir, output_dir, interval, start, end):
    os.makedirs(output_dir, exist_ok=True)
    file_list = sorted(glob.glob(join(data_dir, "*.csv")), key=lambda x: int(splitext(basename(x))[0].split('_')[-1]))
    if end is None:
        end = len(file_list)
    for i in range(start, end, interval):
        end_interval = min(i + interval, end)
        random_index = random.randint(i, end_interval - 1)
        file = file_list[random_index]
        try:
            dat = pd.read_csv(file)
            if dat.empty:
                continue
            plt.figure()
            plt.plot(dat['energy'], dat['mu'])
            plt.xlabel('E (eV)', fontsize=14)
            plt.ylabel(r'$\mu$(a.u.)', fontsize=14)
            plt.title(f'Sample {random_index}')
            plt.xticks(rotation=0, fontsize=12)
            plt.yticks(rotation=0, fontsize=12)
            plt.tight_layout()
            output_path = join(output_dir, f'sample_{random_index}_xmu.png')
            plt.savefig(output_path)
            plt.close()
            print(f"Plotted: {file}, saved to {output_path}")
        except Exception as e:
            print(f"Error processing file {file}: {e}")

# Function to plot normalized spectra. Parameters: data_dir, output_dir, interval, start, end.
# data_dir: input file path, output_dir: output file path, interval: visualization interval, start: starting sample, end: ending sample
def plot_norm(data_dir, output_dir, interval, start, end):
    os.makedirs(output_dir, exist_ok=True)
    file_list = sorted(glob.glob(join(data_dir, "*.csv")), key=lambda x: int(splitext(basename(x))[0].split('_')[-1]))
    if end is None:
        end = len(file_list)
    for i in range(start, end, interval):
        end_interval = min(i + interval, end)
        random_index = random.randint(i, end_interval - 1)
        file = file_list[random_index]
        try:
            dat = pd.read_csv(file)
            if dat.empty:
                continue
            plt.figure()
            plt.plot(dat['energy'], dat['norm'])
            plt.xlabel('E (eV)', fontsize=14)
            plt.ylabel('Normalized Mu', fontsize=14)
            plt.title(f'Norm (Sample {random_index})', fontsize=14)
            plt.xticks(rotation=0, fontsize=12)
            plt.yticks(rotation=0, fontsize=12)
            plt.tight_layout()
            output_path = join(output_dir, f'sample_{random_index}_norm.png')
            plt.savefig(output_path)
            plt.close()
            print(f"Plotted: {file}, saved to {output_path}")
        except Exception as e:
            print(f"Error processing file {file}: {e}")

# Function to copy files at every nth interval. Parameters: src_dir, dest_dir, interval, start, end.
# src_dir: source directory, dest_dir: destination directory, interval: interval, start: starting sample, end: ending sample
def copy_files_every_nth(src_dir, dest_dir, interval, start, end):
    """Copies every nth file from src_dir to dest_dir"""
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)

    file_list = sorted(glob.glob(join(src_dir, "*.png")), key=lambda x: int(splitext(basename(x))[0].split('_')[-1]))
    if end is None:
        end = len(file_list)
    for i in range(start, end, interval):
        end_interval = min(i + interval, end)
        random_index = random.randint(i, end_interval - 1)
        src_file = file_list[random_index]
        dest_file = join(dest_dir, splitext(basename(src_file))[0] + '.png')
        shutil.copy(src_file, dest_file)
        print(f"Copied: {src_file} to {dest_file}")

# Function to plot the distribution of coordination numbers as a bar chart. Parameters: data_dir, plot_dir, start, end, file_pattern.
# data_dir: input file path, plot_dir: output file path, start: starting sample (default 0), end: ending sample (default None), file_pattern: file pattern
def plot_cn_method_distribution(data_dir, plot_dir, start=0, end=None, file_pattern="*.csv"):
    """Traverses each file, plots a bar distribution chart, and uses the filename as the title"""
    file_list = sorted(glob.glob(join(data_dir, file_pattern)))
    if not file_list:
        print(f"No files matching pattern {file_pattern} found in directory {data_dir}.")
        return

    for file_path in file_list:
        print(f"Processing file: {file_path}")
        try:
            df = pd.read_csv(file_path)
            if df.empty:
                print(f"Warning: {file_path} is empty.")
                continue

            # Extract method_name
            method_name = splitext(basename(file_path))[0].split('_', 1)[1]

            num_samples = len(df)  # Get the number of rows as sample count

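            # If end is not specified, use this file's sample count.
            # A local variable keeps one file's length from fixing the range for the next file.
            current_end = end if end is not None else num_samples

            df = df.iloc[start:current_end]  # Slice using start and current_end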
            if df.empty:
                print(f"No data in the specified index range for file {file_path}")
                continue

            # Extract and count column values
            value_counts = df.iloc[:, 1].value_counts().reset_index()
            value_counts.columns = ["value", "count"]

            # Plot bar chart
            plt.figure(figsize=(10, 8))
            ax = sns.barplot(x="value", y="count", data=value_counts, color="blue")
            plt.xlabel(f"Coordination Number ({method_name})", fontsize=14)
            plt.ylabel("Count", fontsize=14)
            plt.title("Distribution of Coordination Numbers", fontsize=14)
            plt.xticks(rotation=0, fontsize=12)
            plt.yticks(rotation=0, fontsize=12)

            # Adjust font style and size dynamically
            if len(value_counts) > 15:
                plt.xticks(rotation=45, fontsize=10, fontstyle='italic')
            else:
                plt.xticks(rotation=0, fontsize=12)

            plt.tight_layout()

            # Add annotations
            for p in ax.patches:
                ax.annotate(f"{int(p.get_height())}", (p.get_x() + p.get_width() / 2., p.get_height()),
                            ha='center', va='center', fontsize=10, color='black', xytext=(0, 5),
                            textcoords='offset points')

            if not os.path.exists(plot_dir):
                os.makedirs(plot_dir)

            plot_filename = f"{method_name}.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

        except Exception as e:
            print(f"Error processing file {file_path}: {e}")

# Function to plot the distribution of bond lengths as a histogram. Parameters: data_dir, plot_dir, start, end, file_pattern, max_bins, bin_width.
# data_dir: input file path, plot_dir: output file path, start: starting sample (default 0), end: ending sample (default None), file_pattern: file pattern,
# max_bins: maximum number of bins (default 30), bin_width: width of bins
def plot_cr_method_distribution(data_dir, plot_dir, start=0, end=None, file_pattern="*.csv", max_bins=30, bin_width=None):
    """Traverses each file, plots a histogram distribution chart, and uses the filename as the title"""
    file_list = sorted(glob.glob(join(data_dir, file_pattern)))
    if not file_list:
        print(f"No files matching pattern {file_pattern} found in directory {data_dir}.")
        return

    for file_path in file_list:
        print(f"Processing file: {file_path}")
        try:
            df = pd.read_csv(file_path)
            if df.empty:
                print(f"Warning: {file_path} is empty.")
                continue

            # Extract method_name
            method_name = splitext(basename(file_path))[0].split('_', 1)[1]

            num_samples = len(df)  # Get the number of rows as sample count

            # If end is not specified, use the number of samples in this file
            current_end = end if end is not None else num_samples

            df = df.iloc[start:current_end]  # Slice using start and current_end
            if df.empty:
                print(f"No data in the specified index range for file {file_path}")
                continue

            min_value = df.iloc[:, 1].min()
            max_value = df.iloc[:, 1].max()

            # Calculate bins
            current_bin_width = bin_width if bin_width is not None else (max_value - min_value) / max_bins
            bins = np.arange(min_value, max_value + current_bin_width, current_bin_width)

            # Determine the number of decimal places for rounding based on cr_bin_width
            rounding_digits = 1 if abs(current_bin_width - round(current_bin_width, 1)) < 1e-9 else 2

            # Plot histogram
            plt.figure(figsize=(12, 8))
            ax = sns.histplot(df.iloc[:, 1], bins=bins, color="blue")

            plt.xlabel(f"Bond Length ({method_name})", fontsize=14)
            plt.ylabel("Count", fontsize=14)
            plt.title("Distribution of Bond Lengths", fontsize=14)

            # Set x-axis labels and adjust tick styles
            ax.set_xticks(bins)
            if len(bins) > 15:
                ax.set_xticklabels([f"{round(x, rounding_digits)}" for x in bins], rotation=45, fontsize=10, fontstyle='italic')
            else:
                ax.set_xticklabels([f"{round(x, rounding_digits)}" for x in bins], rotation=0, fontsize=12)

            plt.tight_layout()

            # Annotate bars
            for p in ax.patches:
                height = int(p.get_height())
                if height > 0:  # Annotate only non-zero heights
                    ax.annotate(f"{height}", (p.get_x() + p.get_width() / 2., height),
                                ha='center', va='center', fontsize=10, color='black', xytext=(0, 5),
                                textcoords='offset points')

            if not os.path.exists(plot_dir):
                os.makedirs(plot_dir)

            plot_filename = f"{method_name}.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

        except Exception as e:
            print(f"Error processing file {file_path}: {e}")

# Function to plot the radial distribution function (RDF). Parameters: data_dir, output_dir, interval, start, end.
# data_dir: input file path, output_dir: output file path, interval: visualization interval, start: starting sample, end: ending sample
def plot_rdf(data_dir, output_dir, interval, start, end):
    os.makedirs(output_dir, exist_ok=True)
    file_list = sorted(glob.glob(join(data_dir, "*.csv")), key=lambda x: int(splitext(basename(x))[0].split('_')[-1]))
    if end is None:
        end = len(file_list)
    for i in range(start, end, interval):
        end_interval = min(i + interval, end)
        random_index = random.randint(i, end_interval - 1)
        file = file_list[random_index]
        try:
            dat = pd.read_csv(file)
            if dat.empty:
                continue
            plt.figure()
            plt.plot(dat['r'], dat['g(r)'])
            plt.xlabel('r (Å)', fontsize=14)
            plt.ylabel('g(r)', fontsize=14)
            plt.xticks(rotation=0, fontsize=12)
            plt.yticks(rotation=0, fontsize=12)
            plt.title(f'Radial Distribution Function - (Sample {random_index})')
            output_path = join(output_dir, f'sample_{random_index}_rdf.png')
            plt.savefig(output_path)
            plt.close()
            print(f"Plotted: {file}, saved to {output_path}")
        except Exception as e:
            print(f"Error processing file {file}: {e}")

# Function to copy all plots to a directory with naming based on their source folder
def copy_all_plots_to_directory(all_plot_path, data_types, save_paths):
    os.makedirs(all_plot_path, exist_ok=True)
    
    for data_type in data_types:
        if data_type in save_paths:
            source_folder = save_paths[data_type]
            plot_files = glob.glob(join(source_folder, "*.png"))
            
            for plot_file in plot_files:
                try:
                    file_name = splitext(basename(plot_file))[0]
                    dest_file = join(all_plot_path, f"{basename(source_folder)}_{file_name}.png")
                    shutil.copy(plot_file, dest_file)
                    print(f"Copied: {plot_file} to {dest_file}")
                except Exception as e:
                    print(f"Error copying file {plot_file}: {e}")
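
# Function to plot coordination numbers as a scatter chart. Parameters: data_dir, plot_dir, start, end, file_pattern.
# data_dir: input file path, plot_dir: output file path, start: starting sample (default 0), end: ending sample (default None), file_pattern: file pattern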
def plot_cn_method_distribution1(data_dir, plot_dir, start=0, end=None, file_pattern="*.csv"):
    """Traverses each file, plots a scatter distribution chart with arbitrary x-coordinates, and uses the filename as the title"""
    file_list = sorted(glob.glob(join(data_dir, file_pattern)))
    if not file_list:
        print(f"No files matching pattern {file_pattern} found in directory {data_dir}.")
        return

    for file_path in file_list:
        print(f"Processing file: {file_path}")
        try:
            df = pd.read_csv(file_path)
            if df.empty:
                print(f"Warning: {file_path} is empty.")
                continue

            # Extract method_name
            method_name = splitext(basename(file_path))[0].split('_', 1)[1]

            num_samples = len(df)  # Get the number of rows as sample count

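            # If end is not specified, use this file's sample count.
            # A local variable keeps one file's length from fixing the range for the next file.
            current_end = end if end is not None else num_samples

            df = df.iloc[start:current_end]  # Slice using start and current_end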
            if df.empty:
                print(f"No data in the specified index range for file {file_path}")
                continue

            # Generate arbitrary x-coordinates (indices for each point)
            x_coords = range(len(df))

            # Extract values for y-axis (second column of the data)
            y_values = df.iloc[:, 1]

            # Plot scatter chart
            plt.figure(figsize=(15, 8))
            plt.scatter(x_coords, y_values, color="blue", s=50, alpha=0.7)

            plt.xlabel(f"Sample Index ({method_name})", fontsize=14)
            plt.ylabel("Coordination Number", fontsize=14)
            plt.title("Scatter Plot of Coordination Numbers", fontsize=14)
            plt.xticks(fontsize=12)
            plt.yticks(fontsize=12)

            plt.tight_layout()

            if not os.path.exists(plot_dir):
                os.makedirs(plot_dir)

            plot_filename = f"{method_name}_scatter.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

        except Exception as e:
            print(f"Error processing file {file_path}: {e}")
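
# Function to plot bond lengths as a scatter chart. Parameters: data_dir, plot_dir, start, end, file_pattern.
# data_dir: input file path, plot_dir: output file path, start: starting sample (default 0), end: ending sample (default None), file_pattern: file pattern
def plot_cr_method_distribution1(data_dir, plot_dir, start=0, end=None, file_pattern="*.csv"):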
    """Traverses each file, plots a scatter distribution chart with arbitrary x-coordinates, and uses the filename as the title"""
    file_list = sorted(glob.glob(join(data_dir, file_pattern)))
    if not file_list:
        print(f"No files matching pattern {file_pattern} found in directory {data_dir}.")
        return

    for file_path in file_list:
        print(f"Processing file: {file_path}")
        try:
            df = pd.read_csv(file_path)
            if df.empty:
                print(f"Warning: {file_path} is empty.")
                continue

            # Extract method_name
            method_name = splitext(basename(file_path))[0].split('_', 1)[1]

            num_samples = len(df)  # Get the number of rows as sample count

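            # If end is not specified, use this file's sample count.
            # A local variable keeps one file's length from fixing the range for the next file.
            current_end = end if end is not None else num_samples

            df = df.iloc[start:current_end]  # Slice using start and current_end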
            if df.empty:
                print(f"No data in the specified index range for file {file_path}")
                continue

            # Generate arbitrary x-coordinates (indices for each point)
            x_coords = range(len(df))

            # Extract values for y-axis (second column of the data)
            y_values = df.iloc[:, 1]

            # Plot scatter chart
            plt.figure(figsize=(15, 8))
            plt.scatter(x_coords, y_values, color="blue", s=50, alpha=0.7)

            plt.xlabel(f"Sample Index ({method_name})", fontsize=14)
            plt.ylabel("Bond Length", fontsize=14)
            plt.title("Scatter Plot of Bond Lengths", fontsize=14)
            plt.xticks(fontsize=12)
            plt.yticks(fontsize=12)

            plt.tight_layout()

            if not os.path.exists(plot_dir):
                os.makedirs(plot_dir)

            plot_filename = f"{method_name}_scatter.png"
            plot_path = join(plot_dir, plot_filename)
            plt.savefig(plot_path)
            plt.close()
            print(f"Plot saved: {plot_path}")

        except Exception as e:
            print(f"Error processing file {file_path}: {e}")


# Main program

In [7]:
# Define a dictionary mapping data types to their respective plotting functions
plot_functions = {
    "chi": lambda: plot_chi(chi_path, save_chi_path, interval, start, end),
    "xmu": lambda: plot_xmu(xmu_path, save_xmu_path, interval, start, end),
    "norm": lambda: plot_norm(norm_path, save_norm_path, interval, start, end),
    "wt_pic": lambda: copy_files_every_nth(wt_pic_path, save_wt_pic_path, interval, start, end),
    "rdf": lambda: plot_rdf(rdf_path, save_rdf_path, interval, start, end),
    "k2chi": lambda: plot_k2chi(k2chi_path, save_k2chi_path, interval, start, end),
    "cn": lambda: plot_cn_method_distribution1(cn_path, save_cn_path, start=start, end=end),
    "cr": lambda: plot_cr_method_distribution1(cr_path, save_cr_path, start=start, end=end)
}

# Execute the plotting functions based on the available data types
for data_type in data_types:
    if data_type in plot_functions:
        plot_functions[data_type]()  # Call the respective function for each data type

# If `all_in_one` is True, copy all generated plots to the specified directory (`all_plot_path`)
if all_in_one:
    copy_all_plots_to_directory(all_plot_path, data_types, save_paths)


Plotted: 0926-datasets/chi/654.csv, saved to 0926-datasets/plots_20250115_1649/chi/sample_654_chi.png
Plotted: 0926-datasets/chi/1114.csv, saved to 0926-datasets/plots_20250115_1649/chi/sample_1114_chi.png
Plotted: 0926-datasets/chi/2025.csv, saved to 0926-datasets/plots_20250115_1649/chi/sample_2025_chi.png
Plotted: 0926-datasets/chi/3759.csv, saved to 0926-datasets/plots_20250115_1649/chi/sample_3759_chi.png
Plotted: 0926-datasets/chi/4281.csv, saved to 0926-datasets/plots_20250115_1649/chi/sample_4281_chi.png
Plotted: 0926-datasets/chi/5000.csv, saved to 0926-datasets/plots_20250115_1649/chi/sample_5000_chi.png
Plotted: 0926-datasets/xmu/228.csv, saved to 0926-datasets/plots_20250115_1649/xmu/sample_228_xmu.png
Plotted: 0926-datasets/xmu/1142.csv, saved to 0926-datasets/plots_20250115_1649/xmu/sample_1142_xmu.png
Plotted: 0926-datasets/xmu/2754.csv, saved to 0926-datasets/plots_20250115_1649/xmu/sample_2754_xmu.png
Plotted: 0926-datasets/xmu/3104.csv, saved to 0926-datasets/plots_20