# File Renamer

## Evan Huang, Swinburne Lab, UC Berkeley

This program renames all files with the suffix .ab1 or .seq in a directory of sequencing data, based on a user-supplied reference CSV. The renamed files are written to an output directory together with a copy of the original data, so the input files themselves are never modified. Note that this program requires Python 3.8 or later, because it relies on the `dirs_exist_ok` argument of `shutil.copytree`.

(add reference csv examples)
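
Until real examples are added here, the sketch below shows one layout the reference CSV could take: a header row, then one row per file with the file index in the first column and the new name in the second. The column titles, sample names, and the `write_example_csv` helper are all hypothetical and only serve as illustration.

```python
import csv
import os
import tempfile

# Hypothetical example rows: column 1 is the file index, column 2 is the new name.
example_rows = [
    ["index", "name"],        # header row (the renamer skips the first row)
    ["1", "sampleA_fwd"],
    ["2", "sampleA_rev"],
    ["3", "sampleB_fwd"],
]

def write_example_csv(directory):
    """Write the example rows to name_ref.csv in `directory` and return its path."""
    path = os.path.join(directory, "name_ref.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(example_rows)
    return path

# Round trip in a temporary directory: write the file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_example_csv(tmp)
    with open(csv_path, newline="") as f:
        data_rows = list(csv.reader(f))[1:]   # drop the header row

print(data_rows)
```

Rows with an empty index or an empty name are ignored when names are looked up, so blank rows in a spreadsheet export should be harmless.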

In [2]:
import os
import shutil
import csv
import re
import ipywidgets as widgets

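This helper function reads the reference CSV, skips its header row, and returns the remaining rows as an n x 2 matrix (a list of lists) of indices and names. Any columns beyond the first two are kept but not used.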

In [3]:
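def read_csv(csv_path):
    """Helper function to read the reference CSV. The CSV must have a header row."""
    # the with-block closes the file even if reading fails;
    # newline="" is what the csv module recommends when opening files
    with open(csv_path, newline="") as csv_file:
        csvreader = csv.reader(csv_file)
        next(csvreader)  # skip the header row
        matrix = [row for row in csvreader]

    # check that the CSV is in the correct format
    assert len(matrix) > 0, "reference CSV has no data rows"
    assert len(matrix[0]) >= 2, "reference CSV needs at least two columns"

    return matrix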

This is the function that renames the files. It takes the path to the reference CSV, the path to the directory of sequencing files, and the output path. It finds the index of each file in the input directory by looking for the first number that follows a dash in the file's name, so please make sure these indices match the ones in your reference CSV. Each renamed file is called index_name, followed by the suffix. The suffix variable should be set to whatever file type you would like to rename ('.ab1', '.seq', etc.), or to 'both' if you want to rename .ab1 and .seq files. If prepend is true, index_name is prepended to the original file name instead of replacing it.

In [4]:
def renamer(ref_csv, input_path, output_path, suffix, prepend=False): 
    
    if suffix == 'both': 
        renamer(ref_csv, input_path, output_path, '.ab1', prepend)
        suffix = '.seq'
        
        
    # add period to front of suffix if not added by user
    if suffix[0] != '.': 
        suffix = '.' + suffix
        
    
    names_ref = read_csv(ref_csv)
    
    # make output dir (and any missing parent dirs) if it doesn't already exist
    os.makedirs(output_path, exist_ok=True)

    # make a copy of the original directory and put it in the output path (for safekeeping);
    # dirs_exist_ok=True lets copytree create or reuse the destination, so no separate mkdir is needed
    copy_data_path = os.path.join(output_path, "data_copy")
    shutil.copytree(input_path, copy_data_path, dirs_exist_ok=True)
    
    
    # get all genotyping files and their indices
    input_dir_raw = os.listdir(input_path)
    ab_files = [x for x in input_dir_raw if x.endswith(suffix)]
    # raw string so that \d is not treated as an (invalid) string escape sequence
    ab_file_indices = [int(re.findall(r'-\d+', x)[0][1:]) for x in ab_files]
    
    # get corresponding name for each file based on index and suffix, not including the suffix in the return value
    def findName(ab_file_index): 
        for i in names_ref: 
            if len(i[0]) > 0 and len(i[1]) > 0 and int(i[0]) == ab_file_index: 
                if i[1][-len(suffix):] == suffix: 
                    return i[0] + "_" + i[1][:-len(suffix)]
                else: 
                    return i[0] + "_" + i[1]
        raise Exception("cannot find index in csv")
    
    # copy and rename each file using reference csv
    for i in range(len(ab_file_indices)):
        new_path = shutil.copy2(input_path+"/"+ab_files[i], output_path)
        new_name_raw = findName(ab_file_indices[i])
        if not prepend: 
            new_name = f"{output_path}/{new_name_raw}{suffix}"
            
        else: 
            new_name = f"{output_path}/{new_name_raw}_{ab_files[i]}"
        os.rename(src=new_path, dst=new_name)
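
The index lookup described above can be tried on its own. The snippet below is a minimal sketch of the same idea (first run of digits after a dash); `extract_index` and the file names are hypothetical and not part of the renamer. Unlike the renamer, which raises an error when a file name has no dash followed by digits, this sketch returns `None` in that case.

```python
import re

def extract_index(file_name):
    """Return the first run of digits that follows a dash, or None if there is none."""
    match = re.search(r"-(\d+)", file_name)
    return int(match.group(1)) if match else None

print(extract_index("run42-7_A01.ab1"))        # 7
print(extract_index("no_index_here.ab1"))      # None
print(extract_index("2021-03-sample-5.seq"))   # 3 (the first dash-number wins)
```

The last call shows why file names with several dash-number pairs, such as dates, need care: only the first one is used as the index.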

# [Setup Guide](https://docs.google.com/document/d/1nstSoI9pFRei7pu8AAqdtOgbb1B34kcdZqy_U5lsKfM/edit?usp=sharing)
This guide contains step-by-step instructions to run this renamer. Please be sure to read it in its entirety to ensure the program works as intended. Below are example runs of the renamer, including a widget for the function. 

In [5]:
widgets.interact_manual(renamer, 
                 ref_csv="", 
                 input_path="", 
                 output_path="",
                 suffix="",
                 # False renders a checkbox; a text box would treat any entered text as True
                 prepend=False)

interactive(children=(Text(value='', description='ref_csv'), Text(value='', description='input_path'), Text(va…

<function __main__.renamer(ref_csv, input_path, output_path, suffix, prepend=False)>

In [5]:
renamer("Test Data/name_ref.csv", 
        "Test Data/files_to_rename", 
        "Test Data/output_dir", 
        "both")
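
Before running the renamer on real data, it can help to preview which names it would produce. The following is a rough sketch that mirrors the naming rules of `renamer` without touching the file system; `preview_names` and the sample inputs are hypothetical. As a simplification, files whose index cannot be found are skipped here, whereas `renamer` raises an error.

```python
import re

def preview_names(file_names, names_ref, suffix, prepend=False):
    """Return {old_name: new_name} for files ending in `suffix` (no files are touched)."""
    # Map each numeric index to its (index text, name) pair; the first matching row wins.
    lookup = {}
    for row in names_ref:
        if len(row) >= 2 and row[0] and row[1]:
            lookup.setdefault(int(row[0]), (row[0], row[1]))

    mapping = {}
    for old_name in file_names:
        if not old_name.endswith(suffix):
            continue
        match = re.search(r"-(\d+)", old_name)
        if match is None or int(match.group(1)) not in lookup:
            continue  # simplification: skipped here, renamer would raise
        index_text, ref_name = lookup[int(match.group(1))]
        if ref_name.endswith(suffix):
            ref_name = ref_name[:-len(suffix)]   # avoid a doubled suffix
        base = f"{index_text}_{ref_name}"
        mapping[old_name] = f"{base}_{old_name}" if prepend else f"{base}{suffix}"
    return mapping

# Hypothetical inputs: rows as read from the reference CSV, without the header.
files = ["plate-1_A01.ab1", "plate-2_B01.ab1", "plate-1_A01.seq"]
ref = [["1", "geneX_fwd"], ["2", "geneX_rev.ab1"]]

print(preview_names(files, ref, ".ab1"))
print(preview_names(files, ref, ".ab1", prepend=True))
```

With `prepend=False` the original file name is replaced entirely; with `prepend=True` it is kept after the new index and name.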

