## Rename parts

Use this template when an assembly plan refers to parts that are not yet domesticated and whose names do not carry position prefixes.

1. a. Create an assembly plan using `template3_plan_template.ods`. Use the `original_names` sheet, with the position prefixes used in GeneDom specified in its header, to create the final plan in the `plan` sheet.

   b. Alternatively, use the optional section (1b) below to add the prefixes from a lookup list that assigns one prefix to each column.

2. Export the final plan to CSV (`template3_plan.csv`).

3. Run the code in Section 3 below to create new sequence files named according to the plan. These can then be domesticated with GeneDom.
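The code in Section 3 reads the exported CSV without a header row and assumes the construct name is in the first column, the backbone in the second, and position-prefixed part names after that. A quick check of that layout can be sketched with the standard `csv` module. `unprefixed_cells` is a hypothetical helper (not part of GeneDom or DNA Cauldron), and the prefixes are the ones used in the optional section below.

```python
import csv
import io

# Two rows copied from the plan output shown in Section 3; in practice the rows
# would come from csv.reader(open("template3_plan.csv", newline="")).
PLAN_CSV = """\
CONSTRUCT_1,HC_Amp_ccdB,e1e2_FLP,e2e3_promoter_1,e3e4_GFP,e4e5_terminator_1,e5e0_insulator
CONSTRUCT_3,HC_Amp_ccdB,e1e2_FLP,e2e3_promoter_2,e3e4_RFP,e4e5_terminator_2,e5e0_insulator
"""


def unprefixed_cells(rows, column_prefixes):
    """Return (row, column, value) for every part cell missing its position prefix.

    Column 0 is the construct name; column i >= 1 should start with
    column_prefixes[i - 1] + "_". Empty prefixes (the backbone) and empty
    cells are not checked; cells beyond the last prefix are always reported.
    """
    problems = []
    for row_idx, row in enumerate(rows):
        for col_idx, value in enumerate(row[1:], start=1):
            if not value:
                continue
            if col_idx > len(column_prefixes):
                problems.append((row_idx, col_idx, value))
                continue
            prefix = column_prefixes[col_idx - 1]
            if prefix and not value.startswith(prefix + "_"):
                problems.append((row_idx, col_idx, value))
    return problems


column_prefixes = ["", "e1e2", "e2e3", "e3e4", "e4e5", "e5e0"]
rows = list(csv.reader(io.StringIO(PLAN_CSV)))
print(unprefixed_cells(rows, column_prefixes))  # → []
```

An empty result means every part column carries the expected prefix; anything listed still needs a prefix before the renaming step.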

---
### Optional section (1b): prefix part names in plan

Parameters:

In [None]:
# One prefix per plan column after the construct name. The first one (backbone column) is empty here, but it can be set if another plan needs it:
column_prefixes = ["", "e1e2", "e2e3", "e3e4", "e4e5", "e5e0"]
path_to_plan_csv = "template3_plan_noprefix.csv"
prefixed_plan_path = "template3_plan_prefixed.csv"

In [None]:
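import pandas as pd

plan = pd.read_csv(path_to_plan_csv, header=None)

# Column 0 holds the construct name and gets no prefix; each following column
# gets its position prefix plus an underscore separator (nothing if the prefix is empty).
prefixes = [""] + [prefix + "_" if prefix else "" for prefix in column_prefixes]

for col in plan.columns:
    prefix = prefixes[col]
    filled = plan[col].notna()  # leave empty cells empty instead of creating e.g. "e5e0_nan"
    plan.loc[filled, col] = prefix + plan.loc[filled, col].astype(str)

plan.to_csv(prefixed_plan_path, header=False, index=False)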
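The prefixing rule can also be sketched for a single row in plain Python, without pandas. `prefix_row` is a hypothetical helper: it leaves the construct name and any column with an empty prefix unchanged, and keeps empty cells empty so that a blank position in the plan does not turn into a name such as `e5e0_nan`.

```python
def prefix_row(row, column_prefixes):
    """Add each column's position prefix to the part names in one plan row.

    row[0] is the construct name and is never prefixed; row[i] for i >= 1 uses
    column_prefixes[i - 1]. Empty prefixes and empty cells are left as they are.
    Raises IndexError if the row has more than len(column_prefixes) + 1 columns.
    """
    prefixes = [""] + [p + "_" if p else "" for p in column_prefixes]
    return [prefixes[i] + cell if cell else cell for i, cell in enumerate(row)]


column_prefixes = ["", "e1e2", "e2e3", "e3e4", "e4e5", "e5e0"]
row = ["CONSTRUCT_3", "HC_Amp_ccdB", "FLP", "promoter_2", "RFP", "terminator_2", "insulator"]
print(prefix_row(row, column_prefixes))
# → ['CONSTRUCT_3', 'HC_Amp_ccdB', 'e1e2_FLP', 'e2e3_promoter_2', 'e3e4_RFP', 'e4e5_terminator_2', 'e5e0_insulator']
```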

---

### Section 3: create renamed sequence files

Parameters:

In [None]:
dir_to_process = "original_parts/"
assembly_plan_path = "template3_plan.csv"
export_dir = "prefixed_sequences/"

Load the part sequence files, using each file name as the record ID (`use_file_names_as_ids=True`). The files must therefore be named after the unprefixed part names used in the plan:

In [None]:
import dnacauldron as dc
seq_records = dc.biotools.load_records_from_files(folder=dir_to_process, use_file_names_as_ids=True)
seq_records_names = [record.id for record in seq_records]
print(len(seq_records))
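Because the record IDs come from the file names, two files that differ only in extension would produce the same ID. A pre-load check can be sketched as follows, assuming the ID is the file name with its extension removed and that the listed extensions are the ones in use. Both the helper `ids_from_file_names` and the extension list are hypothetical, so check them against the DNA Cauldron version you have installed.

```python
import os
from collections import defaultdict


def ids_from_file_names(file_names, suffixes=(".gb", ".gbk", ".fa", ".fasta")):
    """Group sequence file names by the ID they are expected to receive.

    Assumes the ID is the file name without its extension; files whose
    extension is not in `suffixes` are ignored. An ID mapped to more than one
    file (e.g. GFP.gb and GFP.fasta) would be ambiguous in the plan.
    """
    ids = defaultdict(list)
    for file_name in sorted(file_names):
        stem, ext = os.path.splitext(file_name)
        if ext.lower() in suffixes:
            ids[stem].append(file_name)
    return dict(ids)


# In the notebook: ids_from_file_names(os.listdir(dir_to_process))
result = ids_from_file_names(["GFP.gb", "FLP.gbk", "readme.txt", "GFP.fasta"])
print({part_id: files for part_id, files in result.items() if len(files) > 1})
# → {'GFP': ['GFP.fasta', 'GFP.gb']}
```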

Read the plan and collect the part names:

In [3]:
import pandas as pd

In [4]:
plan = pd.read_csv(assembly_plan_path, header=None)

In [5]:
plan

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | CONSTRUCT_1 | HC_Amp_ccdB | e1e2_FLP | e2e3_promoter_1 | e3e4_GFP | e4e5_terminator_1 | e5e0_insulator |
| 1 | CONSTRUCT_2 | HC_Amp_ccdB | e1e2_FLP | e2e3_promoter_2 | e3e4_GFP | e4e5_terminator_1 | e5e0_insulator |
| 2 | CONSTRUCT_3 | HC_Amp_ccdB | e1e2_FLP | e2e3_promoter_2 | e3e4_RFP | e4e5_terminator_2 | e5e0_insulator |


In [6]:
# First column is the construct name, second column is the backbone: keep only the parts
part_rows = plan.iloc[:, 2:].values.tolist()
flat_list = [item for row in part_rows for item in row if pd.notna(item)]  # skip empty cells

In [7]:
parts_in_plan = list(set(flat_list))

In [8]:
parts_in_plan

['e2e3_promoter_1',
 'e5e0_insulator',
 'e4e5_terminator_1',
 'e1e2_FLP',
 'e3e4_GFP',
 'e3e4_RFP',
 'e4e5_terminator_2',
 'e2e3_promoter_2']
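`set()` does not keep the order of first appearance, and for strings its iteration order can change between Python sessions, so the list above may come out in a different order on each run. If a stable order is preferred, the parts can be collected in a dictionary, which keeps insertion order in Python 3.7+. A sketch with a hypothetical helper `parts_in_order`, working on plain row lists such as `plan.values.tolist()`:

```python
def parts_in_order(rows, skip_columns=2):
    """Return the unique part names of a plan in order of first appearance.

    The first `skip_columns` columns (construct name and backbone) are skipped.
    Only non-empty strings count as parts, so empty cells, None and the float
    NaN that pandas uses for missing values are all ignored.
    """
    seen = {}
    for row in rows:
        for cell in row[skip_columns:]:
            if isinstance(cell, str) and cell:
                seen.setdefault(cell, None)
    return list(seen)


rows = [
    ["CONSTRUCT_1", "HC_Amp_ccdB", "e1e2_FLP", "e2e3_promoter_1", "e3e4_GFP", "e4e5_terminator_1", "e5e0_insulator"],
    ["CONSTRUCT_2", "HC_Amp_ccdB", "e1e2_FLP", "e2e3_promoter_2", "e3e4_GFP", "e4e5_terminator_1", "e5e0_insulator"],
    ["CONSTRUCT_3", "HC_Amp_ccdB", "e1e2_FLP", "e2e3_promoter_2", "e3e4_RFP", "e4e5_terminator_2", "e5e0_insulator"],
]
print(parts_in_order(rows))
# → ['e1e2_FLP', 'e2e3_promoter_1', 'e3e4_GFP', 'e4e5_terminator_1', 'e5e0_insulator', 'e2e3_promoter_2', 'e3e4_RFP', 'e4e5_terminator_2']
```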

Map each prefixed part name to its original (unprefixed) name, then find the record with the matching name and save a renamed copy of it in a new list. A record can be exported under several names if the same part is used in more than one position:

In [12]:
dict_pos_name = {}
for part in parts_in_plan:
    part_cut = part.split('_', 1)[1]  # we split at the first underscore
    dict_pos_name[part] = part_cut

In [14]:
dict_pos_name

{'e2e3_promoter_1': 'promoter_1',
 'e5e0_insulator': 'insulator',
 'e4e5_terminator_1': 'terminator_1',
 'e1e2_FLP': 'FLP',
 'e3e4_GFP': 'GFP',
 'e3e4_RFP': 'RFP',
 'e4e5_terminator_2': 'terminator_2',
 'e2e3_promoter_2': 'promoter_2'}
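Before renaming, it can be worth listing the parts whose unprefixed name has no matching sequence record. A sketch with two hypothetical helpers: `strip_position_prefix` splits at the first underscore like the cell above but fails loudly on names without a prefix, and `missing_sequences` compares the result with the loaded record IDs (for example `[record.id for record in seq_records]`).

```python
def strip_position_prefix(part_name):
    """Drop the position prefix: 'e2e3_promoter_1' -> 'promoter_1'.

    Splits at the first underscore, like the cell above, but raises
    ValueError for names that have no prefix to remove.
    """
    _prefix, sep, rest = part_name.partition("_")
    if not sep or not rest:
        raise ValueError(f"No position prefix in part name: {part_name!r}")
    return rest


def missing_sequences(part_names, record_ids):
    """Return the prefixed part names whose unprefixed name is not a record ID."""
    available = set(record_ids)
    return sorted(name for name in part_names if strip_position_prefix(name) not in available)


print(missing_sequences(["e1e2_FLP", "e3e4_GFP", "e3e4_RFP"], ["FLP", "GFP", "insulator"]))  # → ['e3e4_RFP']
```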

In [None]:
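import copy

pos_records = []  # renamed copies of the records, with the position prefix added
for pos_name, old_name in dict_pos_name.items():
    for record in seq_records:
        if record.id == old_name:
            new_record = copy.deepcopy(record)  # copy, so the original record keeps its ID
            new_record.name = pos_name
            new_record.id = pos_name
            pos_records.append(new_record)
            break
    else:  # no record matched: report it instead of silently skipping the part
        print("No sequence file found for part:", old_name)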
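The nested loop scans every record for every part. An alternative, sketched below, builds an ID-to-record dictionary once and also returns the parts it could not match. The `Record` dataclass is a hypothetical stand-in so that the sketch runs without Biopython; the real `SeqRecord` objects have the same `id` and `name` attributes and can be deep-copied the same way.

```python
import copy
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a Biopython SeqRecord."""
    id: str
    name: str
    seq: str


def rename_by_lookup(records, pos_to_original):
    """Return (renamed copies, unmatched prefixed names).

    One deep copy is made per prefixed name, so a part used in several
    positions is exported once per position. If several records share an ID,
    the first one is used, matching the `break` in the loop above.
    """
    by_id = {}
    for record in records:
        by_id.setdefault(record.id, record)
    renamed, unmatched = [], []
    for pos_name, original_name in pos_to_original.items():
        record = by_id.get(original_name)
        if record is None:
            unmatched.append(pos_name)
            continue
        new_record = copy.deepcopy(record)
        new_record.id = pos_name
        new_record.name = pos_name
        renamed.append(new_record)
    return renamed, unmatched


records = [Record("GFP", "GFP", "ATGGTG"), Record("FLP", "FLP", "ATGCCA")]
mapping = {"e1e2_GFP": "GFP", "e3e4_GFP": "GFP", "e3e4_RFP": "RFP"}
renamed, unmatched = rename_by_lookup(records, mapping)
print([r.id for r in renamed], unmatched)  # → ['e1e2_GFP', 'e3e4_GFP'] ['e3e4_RFP']
```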

#### Save sequences

In [72]:
import os

from Bio import SeqIO

os.makedirs(export_dir, exist_ok=True)  # create the output folder if it does not exist yet
for record in pos_records:
    filepath = os.path.join(export_dir, record.name + ".gb")
    with open(filepath, "w") as output_handle:
        SeqIO.write(record, output_handle, "genbank")
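Each record is written to `<name>.gb`, so two names that differ only in letter case would overwrite each other on a case-insensitive file system (the default on Windows and macOS). A pre-write check can be sketched as follows; `output_paths` is a hypothetical helper that maps names to paths and raises on such clashes.

```python
import os


def output_paths(export_dir, names, extension=".gb"):
    """Map each record name to its output file path.

    Raises ValueError if two names would give the same file name when letter
    case is ignored, since one file would overwrite the other on
    case-insensitive file systems.
    """
    paths = {}
    seen = {}  # lower-cased name -> first name that produced it
    for name in names:
        key = name.lower()
        if key in seen:
            raise ValueError(f"{name!r} and {seen[key]!r} would share a file name")
        seen[key] = name
        paths[name] = os.path.join(export_dir, name + extension)
    return paths


# In the notebook: output_paths(export_dir, [record.name for record in pos_records])
print(output_paths("prefixed_sequences/", ["e1e2_FLP", "e3e4_GFP"]))
```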