# Pairing OIB and ZIP Files by Numerical Prefix After Filename Cleaning

This notebook first cleans the filenames with the clean_filenames function so that they follow a consistent naming scheme. It then matches .oib and .zip files by their shared numerical prefix: it parses each cleaned filename, extracts the prefix, and pairs each .oib file with the .zip file that carries the same prefix.

The matching is based on the following filename patterns:

- .oib files: These files start with a number followed by an underscore (e.g., 1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.oib).
- .zip files: These files begin with RoiSet_, followed by the same number and an underscore (e.g., RoiSet_1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.zip).
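
As a quick illustration of these two patterns, the sketch below extracts the numeric prefix from the example filenames above. The helper name `extract_prefix` is hypothetical and only serves this example; the regular expressions are the same ones the pairing cell uses.

```python
import re

# Patterns for the two filename conventions described above
OIB_PATTERN = re.compile(r'^(\d+)_.*\.oib$')
ZIP_PATTERN = re.compile(r'^RoiSet_(\d+)_.*\.zip$')


def extract_prefix(filename):
    """Return (kind, prefix) for a matching filename, or None otherwise.

    Hypothetical helper, for illustration only.
    """
    for kind, pattern in (('oib', OIB_PATTERN), ('zip', ZIP_PATTERN)):
        match = pattern.match(filename)
        if match:
            return kind, match.group(1)
    return None


print(extract_prefix('1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.oib'))
print(extract_prefix('RoiSet_1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.zip'))
print(extract_prefix('1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.zarr'))
```

The prefix is returned as a string, so '1' and '01' would count as different prefixes. Names that fit neither pattern, such as the .zarr entry, yield None and are ignored.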

The script performs the following steps:

1. Clean Filenames: It first calls clean_filenames on the directory path, which lists the files and returns their cleaned names in natural sort order, so that naming is consistent for matching.
2. Define Regex Patterns: It uses regular expressions to capture the numerical prefix in cleaned .oib and .zip filenames.
3. File Grouping: It iterates through the cleaned filenames, sorting .oib and .zip files into separate dictionaries (oib_files and zip_files) keyed by the extracted prefix.
4. File Pairing: It matches files by finding the prefixes present in both dictionaries, producing a list of tuples, each containing an .oib file and its corresponding .zip file.

The output is a list of (.oib, .zip) tuples of cleaned filenames, one tuple per shared numerical prefix.
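
The grouping and pairing steps can be sketched as a single function that works on an in-memory list of names. This is a minimal sketch, not the notebook's own cell: the function name `pair_by_prefix`, the extra list of unmatched prefixes, and the sample filenames are assumptions made for illustration, and it sorts prefixes numerically with the standard library rather than with natsort.

```python
import re


def pair_by_prefix(filenames):
    """Pair .oib and RoiSet_*.zip names that share a numeric prefix.

    Hypothetical helper; returns (pairs, unmatched_prefixes).
    """
    oib_pattern = re.compile(r'^(\d+)_.*\.oib$')
    zip_pattern = re.compile(r'^RoiSet_(\d+)_.*\.zip$')

    # Group filenames by prefix, one dictionary per file type
    oib_files, zip_files = {}, {}
    for name in filenames:
        oib_match = oib_pattern.match(name)
        zip_match = zip_pattern.match(name)
        if oib_match:
            oib_files[oib_match.group(1)] = name
        elif zip_match:
            zip_files[zip_match.group(1)] = name

    # Prefixes found in both dictionaries form the pairs
    shared = sorted(oib_files.keys() & zip_files.keys(), key=int)
    pairs = [(oib_files[p], zip_files[p]) for p in shared]

    # Prefixes found in only one dictionary have no partner
    unmatched = sorted(oib_files.keys() ^ zip_files.keys(), key=int)
    return pairs, unmatched


# Made-up filenames, for illustration only
example = [
    '1_sample_A.oib',
    'RoiSet_1_sample_A.zip',
    '2_sample_B.oib',
    '10_sample_C.oib',
    'RoiSet_10_sample_C.zip',
]
pairs, unmatched = pair_by_prefix(example)
print(pairs)
print(unmatched)
```

Because each dictionary keeps one filename per prefix, two files of the same type with the same prefix would overwrite each other. Returning the unmatched prefixes makes it easier to notice an .oib file that has no ROI archive.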

## Install Libraries

In [7]:
print("Installing necessary libraries...")
!pip install natsort oiffile ome-zarr read_roi > /dev/null 2>&1
print("Libraries installed successfully.")

Installing necessary libraries...
Libraries installed successfully.


## Import Libraries

In [8]:
import os
from oiffile import imread
import pandas as pd
from natsort import natsorted

## Define File Paths

In [9]:
# Directory containing the files
directory_path = '/home/jovyan/LNMA/bravoa/data/New2 Fig para colocalizacion Manders-Mito'

## Functions

In [21]:
def clean_filenames(directory_path):
    """Return the cleaned names of the entries in directory_path, naturally sorted.

    Only the returned strings are cleaned; files on disk are not renamed.
    """
    cleaned_file_names = []
    for file_name in os.listdir(directory_path):
        # Replace spaces, hyphens, slashes, and plus signs with underscores
        file_name = file_name.replace(' ', '_').replace('-', '_').replace('/', '_')
        # and remove any '_copy' substring
        file_name = file_name.replace('+', '_').replace('_copy', '')
        cleaned_file_names.append(file_name)
    return natsorted(cleaned_file_names)
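
Since clean_filenames returns cleaned strings without renaming anything on disk, a cleaned name may no longer correspond to an existing path if the original name contained spaces, hyphens, plus signs, or `_copy`. One possible way to handle this, sketched below, is to keep a mapping from each cleaned name back to its original name. The helpers `clean_name` and `map_cleaned_to_original` are hypothetical and not part of the notebook, and the temporary directory and filename exist only for the demonstration.

```python
import os
import tempfile


def clean_name(file_name):
    """Apply the same replacements as clean_filenames to a single name."""
    for char in (' ', '-', '/', '+'):
        file_name = file_name.replace(char, '_')
    return file_name.replace('_copy', '')


def map_cleaned_to_original(directory_path):
    """Return {cleaned_name: original_name} for every entry in the directory."""
    return {clean_name(name): name for name in os.listdir(directory_path)}


# Demonstration on a throwaway directory with a made-up filename
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, '1 sample-A copy.oib'), 'w').close()
    mapping = map_cleaned_to_original(tmp)

print(mapping)
```

With such a mapping, `os.path.join(directory_path, mapping[cleaned_name])` gives the path of the file as it is actually stored. If two original names clean to the same string, the later one replaces the earlier one in the dictionary.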

In [15]:
# Example usage
files_names_cleaned = clean_filenames(directory_path)
files_names_cleaned

['1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.oib',
 '1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.zarr',
 '2_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_bis.oib',
 '3_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_3.oib',
 '4_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_4.oib',
 '5_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_6.oib',
 '6_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_bisbis_Z3.5.oib',
 '7_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_int2_Z3.5.oib',
 '8_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant__bis4_int2_Z3.5.oib',
 '9_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant__bis5_int2_Z3.5.oib',
 'RoiSet_1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.zip',
 'RoiSet_2_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_bis.zip',
 'RoiSet_3_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_3.zip',
 'RoiSet_4_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_4.zip',
 'RoiSet_5_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_6.zip',
 'RoiSet_6_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_bisbis_Z3.5.zip',
 'RoiSet_7_Cry11Ba_1mg_3h_MitoTrack

In [22]:
import re

# Dictionaries to store .oib and .zip files by their prefix number
oib_files = {}
zip_files = {}

# Regular expressions to match prefixes in filenames
oib_pattern = re.compile(r'^(\d+)_.*\.oib$')
zip_pattern = re.compile(r'^RoiSet_(\d+)_.*\.zip$')

# Populate the dictionaries with files based on their number prefix
for filename in files_names_cleaned:
    oib_match = oib_pattern.match(filename)
    zip_match = zip_pattern.match(filename)
    
    if oib_match:
        prefix = oib_match.group(1)
        oib_files[prefix] = filename
    elif zip_match:
        prefix = zip_match.group(1)
        zip_files[prefix] = filename

# Pair .oib and .zip files by their prefix
paired_files = [(oib_files[key], zip_files[key]) for key in oib_files.keys() if key in zip_files]

# Sort the pairs in natural order, then display the result
paired_files = natsorted(paired_files)

paired_files


[('1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.oib',
  'RoiSet_1_Cry11_2mg_2h_MitoT_750nM_60xmed_post_3.5_Z.zip'),
 ('2_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_bis.oib',
  'RoiSet_2_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_bis.zip'),
 ('3_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_3.oib',
  'RoiSet_3_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_3.zip'),
 ('4_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_4.oib',
  'RoiSet_4_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_4.zip'),
 ('5_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_6.oib',
  'RoiSet_5_Cry11_2mg_2h_MitoT_750nM_60x__post_Z3.5_bis_6.zip'),
 ('6_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_bisbis_Z3.5.oib',
  'RoiSet_6_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_bisbis_Z3.5.zip'),
 ('7_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_int2_Z3.5.oib',
  'RoiSet_7_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant_int2_Z3.5.zip'),
 ('8_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant__bis4_int2_Z3.5.oib',
  'RoiSet_8_Cry11Ba_1mg_3h_MitoTrack_750nM_60X_ant__bis4_int2_Z3.5