# Guide to Using the SearchOscillograms Class

This notebook shows how to use the `SearchOscillograms` class from the `preparing_oscillograms.search_oscillograms` module. The class helps you find, copy, and organize oscillogram files.

We will cover the following key methods:

1.  `copy_new_oscillograms`: For copying oscillogram files from a source directory to a destination, with options for filtering by type and avoiding duplicates using hashes.
2.  `find_terminal_hashes_from_json`: For identifying oscillogram hashes associated with specific terminal numbers, based on a JSON file containing hash-to-path mappings.
3.  `organize_oscillograms_by_terminal`: For arranging oscillogram files into a structured directory format, where files are sorted into subfolders named after their respective terminals.

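**Note:** Before running the code cells, ensure that the `SearchOscillograms` class is importable in your Python environment. The example cells create their own sample directories and files under `sample_data/` and `output_data/`, and their cleanup steps delete any existing directories at those paths, so adjust the paths if you keep real data there.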

## 1. `copy_new_oscillograms`

This method is used to scan a source directory, identify various types of oscillogram files, and copy them to a destination directory. It can optionally preserve the original directory structure and uses MD5 hashes to prevent copying duplicate files. It also generates JSON files (`_hash_table.json` and `new_hash_table_<timestamp>.json`) to keep track of copied files.
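The duplicate check described above can be illustrated with a minimal sketch. It is not the class's actual implementation: `md5_of_file` and `is_new_file` are hypothetical helpers, and the `[filename, path]` value stored per hash is assumed to mirror the hash-table layout described later in this guide.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new_file(path, known_hashes):
    """Record the file's hash and return True only if it was not seen before."""
    file_hash = md5_of_file(path)
    if file_hash in known_hashes:
        return False  # same content already processed, so skip the copy
    known_hashes[file_hash] = [Path(path).name, str(path)]
    return True


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.dat"
    duplicate = Path(tmp) / "b.dat"
    different = Path(tmp) / "c.dat"
    first.write_bytes(b"same samples")
    duplicate.write_bytes(b"same samples")    # identical content, different name
    different.write_bytes(b"other samples")

    known_hashes = {}
    results = [is_new_file(p, known_hashes) for p in (first, duplicate, different)]

print(results)  # [True, False, True]
```

Because the check is keyed on file content rather than file name, a renamed copy of the same recording is still detected as a duplicate.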

### Parameters:

*   `source_dir` (str): The path to the directory where the original oscillogram files are located.
*   `dest_dir` (str): The path to the directory where the selected oscillogram files will be copied.
*   `copied_hashes` (dict, optional): A dictionary containing hashes of already processed files. This is useful for resuming copying or avoiding reprocessing. For a first run, this is typically an empty dictionary `{}`. If you have a `_hash_table.json` from a previous run, you can load it here.
*   `preserve_dir_structure` (bool, optional): If `True` (default), the original folder structure from `source_dir` will be replicated in `dest_dir`. If `False`, files will be copied directly into `dest_dir` (or type-specific subfolders like `COMTRADE_CFG_DAT`).
*   `use_hashes` (bool, optional): If `True` (default), MD5 hashes of `.dat` files (for COMTRADE) or the files themselves (for other types) are used to check for duplicates. Only new files will be copied.
*   `use_comtrade` (bool, optional): Set to `True` to copy COMTRADE files (`.cfg` and `.dat`). Default is `True`.
*   `use_new_comtrade` (bool, optional): Set to `True` to copy COMTRADE CFF files (`.cff`). Default is `True`.
*   `use_brs` (bool, optional): Set to `True` to copy Bresler files (`.brs`). Default is `True`.
*   `use_neva` (bool, optional): Set to `True` to copy Neva files (`.os*`). Default is `True`.
*   `use_ekra` (bool, optional): Set to `True` to copy Ekra files (`.dfr`). Default is `True`.
*   `use_parma` (bool, optional): Set to `True` to copy Parma files (`.do`, `.to`). Default is `True`.
*   `use_black_box` (bool, optional): Set to `True` to copy "black box" files (`.bb`). Default is `True`.
*   `use_res_3` (bool, optional): Set to `True` to copy RES-3 files (`.sg2`). Default is `True`.
*   `use_osc` (bool, optional): Set to `True` to copy generic OSC files (`.osc`). Default is `True`.

Other parameters like `_new_copied_hashes`, `_first_run`, `_path_temp`, `progress_callback`, `stop_processing_fn`, `is_write_names` are primarily for internal use or integration with graphical interfaces and can usually be left to their default values for basic scripting.
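For the `copied_hashes` parameter, a small helper can load a hash table saved by a previous run and fall back to an empty dictionary on the first run. This is a sketch under assumptions: `load_copied_hashes` is a hypothetical helper, not part of `SearchOscillograms`, and the table is assumed to be plain UTF-8 JSON.

```python
import json
import os
import tempfile


def load_copied_hashes(hash_table_path):
    """Return the saved hash table, or {} when no table exists yet (first run)."""
    if not os.path.exists(hash_table_path):
        return {}
    with open(hash_table_path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    table_path = os.path.join(tmp, "_hash_table.json")

    first_run = load_copied_hashes(table_path)  # no file yet, so {}

    # Simulate a table left behind by an earlier run (illustrative content).
    with open(table_path, "w", encoding="utf-8") as f:
        json.dump({"abc123": ["osc1.cfg", "raw/osc1.cfg"]}, f)

    resumed = load_copied_hashes(table_path)

print(first_run)  # {}
print(resumed)    # {'abc123': ['osc1.cfg', 'raw/osc1.cfg']}
```

The returned dictionary can then be passed as `copied_hashes` so that files recorded in the earlier run are not copied again.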

In [ ]:
import os
import shutil
import json
from preparing_oscillograms.search_oscillograms import SearchOscillograms, TYPE_OSC

# --- Setup Example Directories and Files ---
# Create dummy directories for the example
# You might want to create these manually or adapt paths to your existing data.

# Source directory with some sample oscillogram files
source_example_dir = "sample_data/raw_oscillograms"
# Destination directory for copied files
dest_example_dir = "output_data/copied_oscillograms"
# Path for the hash table (example)
hash_table_path = os.path.join(dest_example_dir, "_hash_table.json")

# Clean up previous run's example directories (optional)
if os.path.exists(source_example_dir):
    shutil.rmtree(source_example_dir)
if os.path.exists(dest_example_dir):
    shutil.rmtree(dest_example_dir)

os.makedirs(os.path.join(source_example_dir, "subdir1"), exist_ok=True)
os.makedirs(dest_example_dir, exist_ok=True)

# Create some dummy oscillogram files for demonstration
# COMTRADE example
with open(os.path.join(source_example_dir, "osc1.cfg"), "w") as f:
    f.write("dummy cfg content")
with open(os.path.join(source_example_dir, "osc1.dat"), "w") as f:
    f.write("dummy dat content for osc1")

with open(os.path.join(source_example_dir, "subdir1", "osc2.cfg"), "w") as f:
    f.write("dummy cfg content")
with open(os.path.join(source_example_dir, "subdir1", "osc2.dat"), "w") as f:
    f.write("dummy dat content for osc2")

# CFF example
with open(os.path.join(source_example_dir, "osc3.cff"), "w") as f:
    f.write("dummy cff content")
    
# BRS example
with open(os.path.join(source_example_dir, "osc4.brs"), "w") as f:
    f.write("dummy brs content")

print(f"Created sample files in: {source_example_dir}")
print(f"Files: {os.listdir(source_example_dir)}")
print(f"Subdir1 files: {os.listdir(os.path.join(source_example_dir, 'subdir1'))}")
print(f"Destination directory will be: {dest_example_dir}")
print("----------------------------------------------------")

# --- Instantiate SearchOscillograms ---
search_ops = SearchOscillograms()

# --- Load existing hashes if available (optional) ---
# For a first run, copied_hashes would be {}
# If resuming, you might load a previously saved hash table:
# if os.path.exists(hash_table_path):
#     with open(hash_table_path, 'r') as f:
#         copied_hashes_example = json.load(f)
# else:
#     copied_hashes_example = {}
copied_hashes_example = {} # Starting fresh for this example

# --- Call copy_new_oscillograms ---
print("Starting copy_new_oscillograms...")
print(f"Source: {source_example_dir}")
print(f"Destination: {dest_example_dir}")

# Example: Copy only COMTRADE (.cfg/.dat) and CFF (.cff) files, preserving structure
count = search_ops.copy_new_oscillograms(
    source_dir=source_example_dir,
    dest_dir=dest_example_dir,
    copied_hashes=copied_hashes_example,
    preserve_dir_structure=True,
    use_hashes=True,
    use_comtrade=True,        # Enable COMTRADE (.cfg, .dat)
    use_new_comtrade=True,    # Enable COMTRADE (.cff)
    use_brs=False,            # Disable BRS for this example
    use_neva=False,
    use_ekra=False,
    use_parma=False,
    use_black_box=False,
    use_res_3=False,
    use_osc=False
)

print("----------------------------------------------------")
print("copy_new_oscillograms finished.")
print(f"Number of new files copied: {count}")

print(f"Contents of destination directory ({dest_example_dir}):")
for root, dirs, files in os.walk(dest_example_dir):
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))
        
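# With the flags above, only the COMTRADE pairs ('osc1.cfg'/'osc1.dat', 'subdir1/osc2.cfg'/'subdir1/osc2.dat')
# and 'osc3.cff' are eligible for copying; 'osc4.brs' is skipped because use_brs=False.
# The copies should appear in dest_example_dir, possibly under type-specific subfolders
# (COMTRADE_CFG_DAT for .cfg/.dat, COMTRADE_CFF for .cff).
# '_hash_table.json' and 'new_hash_table_<timestamp>.json' are also created in dest_example_dir.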

## 2. `find_terminal_hashes_from_json`

This method searches for oscillogram hashes related to specific terminal numbers. It takes a JSON file (typically `_hash_table.json` generated by `copy_new_oscillograms` or a similar file mapping hashes to file paths and names) and a list of terminal numbers. It then identifies which oscillograms likely belong to these terminals based on:
1.  Filename: If the oscillogram filename starts with `t<terminal_number_padded_to_5_digits>` (e.g., `t00001.cfg`).
2.  Filepath: If the filepath contains the marker "ОТГРУЖЕННЫЕ ТЕРМИНАЛЫ И ШКАФЫ" AND the terminal number (padded to 5 or 4 digits) as a directory name in the path.

The output is a new JSON file mapping each specified terminal number (as a string) to a list of associated oscillogram hashes.
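The two matching rules and the output format can be sketched as follows. This is an illustration of the rules as stated here, not the method's source: `matches_terminal` and `find_terminal_hashes` are hypothetical names, and exact (case-sensitive) comparison is an assumption.

```python
import re

SHIPPED_MARKER = "ОТГРУЖЕННЫЕ ТЕРМИНАЛЫ И ШКАФЫ"


def matches_terminal(filename, filepath, terminal):
    """Apply the two documented rules to one hash-table entry."""
    # Rule 1: filename starts with 't' + terminal number zero-padded to 5 digits.
    if filename.startswith(f"t{terminal:05d}"):
        return True
    # Rule 2: path contains the marker AND a directory named after the padded number.
    if SHIPPED_MARKER in filepath:
        directories = re.split(r"[\\/]", filepath)[:-1]  # drop the filename itself
        return f"{terminal:05d}" in directories or f"{terminal:04d}" in directories
    return False


def find_terminal_hashes(hash_table, terminals):
    """Map each terminal number (as a string) to the hashes that match it."""
    result = {str(t): [] for t in terminals}
    for file_hash, (filename, filepath) in hash_table.items():
        for t in terminals:
            if matches_terminal(filename, filepath, t):
                result[str(t)].append(file_hash)
    return result


hash_table = {
    "h1": ["t00001.cfg", "sample_data/raw_oscillograms/t00001.cfg"],
    "h3": ["another.cfg", "archive/ОТГРУЖЕННЫЕ ТЕРМИНАЛЫ И ШКАФЫ/00002/another.cfg"],
    "h4": ["t00003.cfg", "archive/ОТГРУЖЕННЫЕ ТЕРМИНАЛЫ И ШКАФЫ/00003/t00003.cfg"],
    "h5": ["unrelated.cfg", "somewhere/else/unrelated.cfg"],
}
result = find_terminal_hashes(hash_table, [1, 2, 3, 4])
print(result)  # {'1': ['h1'], '2': ['h3'], '3': ['h4'], '4': []}
```

Entry `h3` is matched only through its path, while `h1` and `h4` are matched through their filenames. Terminal 4 has no matches and keeps an empty list.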

### Parameters:

*   `input_json_path` (str): Path to the input JSON file. This file should have a structure like: `{"hash_value": ["filename.cfg", "full/path/to/filename.cfg"], ...}`. The `_hash_table.json` created by `copy_new_oscillograms` is suitable for this.
*   `terminal_numbers_to_find` (list[int]): A list of integer terminal numbers you want to find oscillograms for.
*   `output_json_path` (str): Path where the resulting JSON file will be saved. The output format will be: `{"terminal_num_as_string": ["hash1", "hash2", ...], ...}`.
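Before passing a hand-made or edited table as `input_json_path`, it can help to check that every entry follows the `{"hash_value": ["filename", "path"]}` layout. The check below is only a sketch: `invalid_entries` is a hypothetical helper, and the requirement of exactly two string items per entry is an assumption based on the structure shown above.

```python
def invalid_entries(hash_table):
    """Return the keys whose values are not a [filename, path] pair of strings."""
    bad = []
    for key, value in hash_table.items():
        ok = (
            isinstance(value, list)
            and len(value) == 2
            and all(isinstance(item, str) for item in value)
        )
        if not ok:
            bad.append(key)
    return bad


sample_table = {
    "hash_ok": ["t00001.cfg", "raw/t00001.cfg"],
    "hash_short": ["only_name.cfg"],      # path is missing
    "hash_wrong_type": "raw/osc.cfg",     # a string instead of a list
}
bad_keys = invalid_entries(sample_table)
print(bad_keys)  # ['hash_short', 'hash_wrong_type']
```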

In [ ]:
import os
import json
from preparing_oscillograms.search_oscillograms import SearchOscillograms

# --- Setup for Example ---
# Assume 'output_data/copied_oscillograms/_hash_table.json' was created by the previous step.
# For this example, we'll create a dummy _hash_table.json if it doesn't exist.

base_output_dir = "output_data/copied_oscillograms" # From previous example
input_json_example_path = os.path.join(base_output_dir, "_hash_table.json")
output_json_example_path = "output_data/terminal_hashes.json" # Where results will be saved

# Ensure the output directory for terminal_hashes.json exists
os.makedirs("output_data", exist_ok=True)

# Create a dummy _hash_table.json for demonstration if not present
# This simulates the output of copy_new_oscillograms
if not os.path.exists(input_json_example_path):
    print(f"Warning: '{input_json_example_path}' not found. Creating a dummy version for demonstration.")
    os.makedirs(base_output_dir, exist_ok=True)
    dummy_hash_table_content = {
        "hash1_t00001_cfg": ["t00001.cfg", "sample_data/raw_oscillograms/t00001.cfg"],
        "hash2_some_other_cfg": ["some_other.cfg", "sample_data/raw_oscillograms/some_other.cfg"],
        "hash3_terminal_path_t00002": ["another.cfg", "archive_data/ОТГРУЖЕННЫЕ ТЕРМИНАЛЫ И ШКАФЫ/00002/another.cfg"],
        "hash4_t00003_cfg_in_shipped": ["t00003.cfg", "archive_data/ОТГРУЖЕННЫЕ ТЕРМИНАЛЫ И ШКАФЫ/00003/t00003.cfg"],
        "hash5_unrelated": ["unrelated.cfg", "somewhere/else/unrelated.cfg"]
    }
    with open(input_json_example_path, 'w', encoding='utf-8') as f:
        json.dump(dummy_hash_table_content, f, indent=4)
    print(f"Created dummy '{input_json_example_path}'")

print(f"Using input JSON: {input_json_example_path}")
print(f"Output JSON will be: {output_json_example_path}")
print("----------------------------------------------------")

# --- Instantiate SearchOscillograms ---
search_ops = SearchOscillograms()  # Re-created here so this cell also runs on its own

# --- Define parameters ---
terminals_to_search_example = [1, 2, 3, 4] # Example terminal numbers

# --- Call find_terminal_hashes_from_json ---
print(f"Starting find_terminal_hashes_from_json for terminals: {terminals_to_search_example}...")
search_ops.find_terminal_hashes_from_json(
    input_json_path=input_json_example_path,
    terminal_numbers_to_find=terminals_to_search_example,
    output_json_path=output_json_example_path
)
print("----------------------------------------------------")
print("find_terminal_hashes_from_json finished.")

if os.path.exists(output_json_example_path):
    print(f"Results saved to: {output_json_example_path}")
    with open(output_json_example_path, 'r', encoding='utf-8') as f:
        results = json.load(f)
        print("Content of the output JSON:")
        print(json.dumps(results, indent=4, ensure_ascii=False))
else:
    print(f"Error: Output file '{output_json_example_path}' was not created.")

# Expected output when the dummy _hash_table.json above is used (hashes found for terminals 1, 2, 3).
# If a real _hash_table.json from the previous step already exists, the lists reflect that file instead:
# {
#     "1": ["hash1_t00001_cfg"],
#     "2": ["hash3_terminal_path_t00002"],
#     "3": ["hash4_t00003_cfg_in_shipped"],
#     "4": [] 
# }

## 3. `organize_oscillograms_by_terminal`

This method copies oscillogram files (specifically `.cfg` and their associated `.dat` files) from a source directory into a structured destination directory. The destination directory will contain subfolders named after terminal identifiers, and the corresponding oscillogram files will be placed within these subfolders.

This method relies on a list of terminal names and a dictionary that maps these terminal names to the oscillogram identifiers (hashes or filenames) that belong to them.
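The idea behind the folder layout can be sketched as below for the case where identifiers are base filenames of `.cfg`/`.dat` pairs. This is an assumption-laden illustration, not the method's code: `organize_by_terminal` is a hypothetical function, and skipping identifiers without a matching `.cfg` or `.dat` is a choice made for the sketch.

```python
import os
import shutil
import tempfile


def organize_by_terminal(source_dir, dest_dir, terminal_names):
    """Copy <id>.cfg/<id>.dat pairs into dest_dir/<terminal>/; return pairs copied."""
    # Index every .cfg under source_dir by base name (subdirectories included).
    cfg_by_id = {}
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            base, ext = os.path.splitext(name)
            if ext.lower() == ".cfg":
                cfg_by_id[base] = os.path.join(root, name)

    copied = 0
    for terminal, ids in terminal_names.items():
        for osc_id in ids:
            cfg_path = cfg_by_id.get(osc_id)
            if cfg_path is None:
                continue  # identifier not present in source_dir
            dat_path = os.path.splitext(cfg_path)[0] + ".dat"
            if not os.path.exists(dat_path):
                continue  # a .cfg without its .dat is skipped in this sketch
            target = os.path.join(dest_dir, terminal)
            os.makedirs(target, exist_ok=True)
            shutil.copy2(cfg_path, target)
            shutil.copy2(dat_path, target)
            copied += 1
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    for osc_id in ("hashA", "hashB"):
        for ext in (".cfg", ".dat"):
            with open(os.path.join(src, osc_id + ext), "w") as f:
                f.write("dummy")

    copied = organize_by_terminal(src, dst, {"1": ["hashA"], "2": ["hashB", "missing"]})
    layout = {t: sorted(os.listdir(os.path.join(dst, t))) for t in sorted(os.listdir(dst))}

print(copied)  # 2
print(layout)  # {'1': ['hashA.cfg', 'hashA.dat'], '2': ['hashB.cfg', 'hashB.dat']}
```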

### Parameters:

*   `source_dir` (str): The directory where the oscillogram files (e.g., `.cfg`, `.dat`) are currently stored. This is often the output directory of `copy_new_oscillograms` (e.g., `output_data/copied_oscillograms/COMTRADE_CFG_DAT`).
*   `dest_dir` (str): The root directory where the organized oscillograms will be copied. Subdirectories for each terminal will be created here.
*   `terminal_list` (list): A list of strings, where each string is a terminal identifier/name (e.g., `["Terminal_1", "Terminal_2"]`). These names will be used to create subfolders in `dest_dir`.
*   `terminal_oscillogram_names` (dict): A dictionary that maps terminal identifiers (from `terminal_list`) to a list of oscillogram identifiers.
    *   Example: `{"Terminal_1": ["hashA", "hashB"], "Terminal_2": ["hashC"]}`
    *   The oscillogram identifiers are typically hashes (if `is_hashes=True`) or base filenames without extensions (if `is_hashes=False`). This dictionary can be derived from the output of `find_terminal_hashes_from_json`.
*   `is_hashes` (bool, optional): Defaults to `True`. If `True`, the oscillogram identifiers in `terminal_oscillogram_names` are treated as MD5 hashes, and the method looks in `source_dir` for `.cfg` files whose base name (filename without extension) matches one of them. If `False`, the identifiers are treated as base filenames (e.g., "osc1" for "osc1.cfg"), and the method calculates the hash from the corresponding `.dat` file to match against `terminal_oscillogram_names`. Because the output of `find_terminal_hashes_from_json` contains hashes, `is_hashes=True` is the typical setting.
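If you prefer folder names such as `Terminal_1` over bare numbers, the number-keyed mapping can be converted before the call. The helper below is a hypothetical sketch; the `Terminal_` prefix is only the naming used in the example above, not something the class requires.

```python
def to_named_terminals(terminal_hashes, prefix="Terminal_"):
    """Build terminal_list and terminal_oscillogram_names with readable folder names."""
    names = {
        f"{prefix}{number}": list(hashes)
        for number, hashes in terminal_hashes.items()
    }
    return list(names), names


# Shape of the JSON written by find_terminal_hashes_from_json (illustrative values).
terminal_hashes = {"1": ["hashA", "hashB"], "2": ["hashC"], "4": []}

terminal_list, terminal_oscillogram_names = to_named_terminals(terminal_hashes)
print(terminal_list)
# ['Terminal_1', 'Terminal_2', 'Terminal_4']
print(terminal_oscillogram_names)
# {'Terminal_1': ['hashA', 'hashB'], 'Terminal_2': ['hashC'], 'Terminal_4': []}
```

Both return values use the same keys, so they can be passed together as `terminal_list` and `terminal_oscillogram_names`.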

**Important Note on `source_dir` for `organize_oscillograms_by_terminal`:**
The method looks for `.cfg` files directly within the `source_dir` passed to it (or in its subdirectories, if `preserve_dir_structure` was `True` during the initial copy). If `copy_new_oscillograms` placed COMTRADE files into a type-specific subdirectory (e.g., `output_data/copied_oscillograms/COMTRADE_CFG_DAT`), pass *that* subdirectory as `source_dir`.
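One way to avoid hard-coding the path is to check whether the type-specific subfolder exists and fall back to the copy destination otherwise. `resolve_comtrade_source` is a hypothetical helper written for this guide; the subfolder name `COMTRADE_CFG_DAT` is taken from the example path above.

```python
import os
import tempfile


def resolve_comtrade_source(copy_dest_dir, subfolder="COMTRADE_CFG_DAT"):
    """Return the type-specific subfolder if it exists, else the directory itself."""
    candidate = os.path.join(copy_dest_dir, subfolder)
    return candidate if os.path.isdir(candidate) else copy_dest_dir


with tempfile.TemporaryDirectory() as tmp:
    flat_dir = os.path.join(tmp, "flat")        # no type-specific subfolder
    typed_dir = os.path.join(tmp, "typed")      # has COMTRADE_CFG_DAT inside
    os.makedirs(flat_dir)
    os.makedirs(os.path.join(typed_dir, "COMTRADE_CFG_DAT"))

    flat_source = os.path.relpath(resolve_comtrade_source(flat_dir), tmp)
    typed_source = os.path.relpath(resolve_comtrade_source(typed_dir), tmp)

print(flat_source)   # flat
print(typed_source)  # typed/COMTRADE_CFG_DAT (with the OS path separator)
```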

In [ ]:
import os
import shutil
import json
from preparing_oscillograms.search_oscillograms import SearchOscillograms

# --- Setup for Example ---
# This example assumes:
# 1. Oscillograms were copied by `copy_new_oscillograms` into a directory, 
#    e.g., 'output_data/copied_oscillograms/COMTRADE_CFG_DAT'.
# 2. `find_terminal_hashes_from_json` produced a JSON file mapping terminal numbers to hashes,
#    e.g., 'output_data/terminal_hashes.json'.

# Source directory where .cfg/.dat files are located (after copy_new_oscillograms)
# IMPORTANT: Adjust this path if your copy_new_oscillograms output COMTRADE files to a specific subfolder like "COMTRADE_CFG_DAT"
organize_source_dir = "output_data/copied_oscillograms/COMTRADE_CFG_DAT" # Example path

# Destination for organized files
organized_dest_dir = "output_data/organized_by_terminal"

# Path to the JSON file from find_terminal_hashes_from_json
terminal_hashes_json_path = "output_data/terminal_hashes.json" # From previous example

# Clean up previous run's example directory (optional)
if os.path.exists(organized_dest_dir):
    shutil.rmtree(organized_dest_dir)
os.makedirs(organized_dest_dir, exist_ok=True)

# --- Create Dummy Source Files and Terminal Hashes JSON for this example ---
# This ensures the example is runnable even if previous steps were skipped or modified.

# Create dummy source oscillogram files (simulating output of copy_new_oscillograms)
# These filenames should be hashes if is_hashes=True (default) in organize_oscillograms_by_terminal
os.makedirs(organize_source_dir, exist_ok=True) 

# With is_hashes=True, files are looked up by name: <hash>.cfg, with the .dat name derived from it.
# The dummy hashes used in terminal_hashes.json therefore double as the base filenames here.
dummy_cfg_files_to_create = {
    "hash1_t00001_cfg.cfg": "dummy cfg content for hash1", # Note: filename is hash + .cfg
    "hash3_terminal_path_t00002.cfg": "dummy cfg content for hash3",
    "hash4_t00003_cfg_in_shipped.cfg": "dummy cfg content for hash4"
}
for fname, content in dummy_cfg_files_to_create.items():
    with open(os.path.join(organize_source_dir, fname), "w") as f:
        f.write(content)
    # Create dummy .dat files as well, as SearchOscillograms checks for their existence
    dat_fname = fname[:-4] + ".dat"
    with open(os.path.join(organize_source_dir, dat_fname), "w") as f:
        f.write("dummy dat content for " + dat_fname)

print(f"Created dummy oscillogram files in: {organize_source_dir}")
print(f"Files: {os.listdir(organize_source_dir)}")

# Create a dummy terminal_hashes.json (simulating output of find_terminal_hashes_from_json)
if not os.path.exists(terminal_hashes_json_path):
    print(f"Warning: '{terminal_hashes_json_path}' not found. Creating a dummy version.")
    dummy_terminal_hashes = {
        "1": ["hash1_t00001_cfg"],  # These are hashes (filenames without .cfg)
        "2": ["hash3_terminal_path_t00002"],
        "3": ["hash4_t00003_cfg_in_shipped"],
        "4": [] 
    }
    # Ensure the directory for terminal_hashes.json exists if we're creating it
    os.makedirs(os.path.dirname(terminal_hashes_json_path), exist_ok=True)
    with open(terminal_hashes_json_path, 'w', encoding='utf-8') as f:
        json.dump(dummy_terminal_hashes, f, indent=4)
    print(f"Created dummy '{terminal_hashes_json_path}'")

print(f"Source for organization: {organize_source_dir}")
print(f"Destination for organized files: {organized_dest_dir}")
print(f"Using terminal hashes from: {terminal_hashes_json_path}")
print("----------------------------------------------------")

# --- Load terminal_oscillogram_names from the JSON file ---
try:
    with open(terminal_hashes_json_path, 'r', encoding='utf-8') as f:
        terminal_oscillogram_data = json.load(f)
except FileNotFoundError:
    print(f"Error: Terminal hashes JSON file not found at '{terminal_hashes_json_path}'. Cannot proceed.")
    terminal_oscillogram_data = {} # Assign empty dict to avoid error later if user wants to see structure

# The `organize_oscillograms_by_terminal` method expects terminal names as keys.
# The JSON from `find_terminal_hashes_from_json` uses terminal numbers (as strings) as keys.
# We can use these directly if we consider terminal numbers as their "names" for folders.
# Or, we can map them if needed, e.g., "Terminal_1", "Terminal_2". For simplicity, use numbers as names.
terminal_list_example = list(terminal_oscillogram_data.keys()) # ["1", "2", "3", "4"]
# terminal_oscillogram_names is already in the correct format: {"terminal_str_id": [hash_list]}

# --- Instantiate SearchOscillograms ---
search_ops = SearchOscillograms()  # Re-created here so this cell also runs on its own

# --- Call organize_oscillograms_by_terminal ---
print("Starting organize_oscillograms_by_terminal...")
if terminal_oscillogram_data: # Proceed only if data was loaded
    search_ops.organize_oscillograms_by_terminal(
        source_dir=organize_source_dir, # Directory containing <hash>.cfg and <hash>.dat files
        dest_dir=organized_dest_dir,
        terminal_list=terminal_list_example,
        terminal_oscillogram_names=terminal_oscillogram_data, # Use the loaded data directly
        is_hashes=True  # The values in terminal_oscillogram_data are hashes (filenames without .cfg)
    )
    print("----------------------------------------------------")
    print("organize_oscillograms_by_terminal finished.")

    print(f"Contents of organized destination directory ({organized_dest_dir}):")
    for root, dirs, files in os.walk(organized_dest_dir):
        for name in files:
            print(os.path.join(root, name))
        for name in dirs:
            print(os.path.join(root, name))
            
    # Expected output:
    # output_data/organized_by_terminal/1/hash1_t00001_cfg.cfg
    # output_data/organized_by_terminal/1/hash1_t00001_cfg.dat
    # output_data/organized_by_terminal/2/hash3_terminal_path_t00002.cfg
    # output_data/organized_by_terminal/2/hash3_terminal_path_t00002.dat
    # ... and so on for terminal 3. Terminal 4's folder might be empty or not created if no hashes.
else:
    print("Skipped running organize_oscillograms_by_terminal due to missing terminal data.")
