This notebook is for troubleshooting plastome submissions using NCBI-provided software pipelines. It was set up at the suggestion of the NCBI team, after they rejected the uploaded annotation twice.

In [11]:
import os

# Defining variables for files
template = 'plastomes/final/template.sbt'
gb = 'plastomes/final/Crepis_callicephala.gb'
fasta = 'plastomes/in/Crepis_callicephala.fasta'

# command itself:
# table2asn -t template.sbt -i sequence.fsa

In [12]:
# Check that the FASTA file exists; if it does not, convert it from the GenBank file

import os
from Bio import SeqIO

if os.path.exists(fasta):
    print(f"File '{fasta}' already exists.")
elif not os.path.exists(gb):
    print(f"Error: The source GenBank file '{gb}' was not found.")
else:
    print(f"File '{fasta}' not found.\nConverting genbank file '{gb}' to FASTA...")
    # Create the output directory first; otherwise SeqIO.write raises FileNotFoundError
    os.makedirs(os.path.dirname(fasta) or ".", exist_ok=True)
    try:
        # Only the first record is converted (one plastome per file)
        first_record = next(SeqIO.parse(gb, "genbank"))
    except StopIteration:
        print(f"Error: The GenBank file '{gb}' is empty.")
    else:
        SeqIO.write(first_record, fasta, "fasta")
        print(f"Successfully created '{fasta}'.")

File 'plastomes/in/Crepis_callicephala.fasta' not found.
Converting genbank file 'plastomes/final/Crepis_callicephala.gb' to FASTA...
Successfully created 'plastomes/in/Crepis_callicephala.fasta'.


## Run `table2asn` program

The NCBI website gives some recommendations that are not covered in the program's README file.
`table2asn` recognizes companion files that share **the same basename** as the input sequence file. Sequences that are part of a plasmid, an organellar chromosome, or a specific nuclear chromosome must include that information in the FASTA definition line, in these formats:

- `[location=mitochondrion]`
- `[location=chloroplast]`

Sequences that are a complete circular chromosome or plasmid need to have the circular topology and the completeness included.

- `[topology=circular] [completeness=complete]`
- `[topology=circular]` (gap at end, not circularized)
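The FASTA file created by the conversion cell carries only the record ID and the GenBank description on its definition line, so these modifiers still have to be added. Below is a minimal sketch that assumes a plain FASTA file like the one written above and a complete, circularized chloroplast genome. `add_defline_modifiers` and `PLASTOME_MODIFIERS` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from pathlib import Path


def add_defline_modifiers(src: str, dst: str, modifiers: dict[str, str]) -> None:
    """Copy a FASTA file, appending [key=value] modifiers to every definition line.

    Hypothetical helper; the modifier names follow the formats listed above.
    """
    suffix = " ".join(f"[{key}={value}]" for key, value in modifiers.items())
    lines_out = []
    # The whole file is read before anything is written, so src and dst may be the same path
    for line in Path(src).read_text().splitlines():
        if line.startswith(">"):
            # Keep the original ID/description and append the modifiers after it
            line = f"{line.rstrip()} {suffix}"
        lines_out.append(line)
    Path(dst).write_text("\n".join(lines_out) + "\n")


# Assumed modifiers for a complete, circularized plastome.
# For a circular sequence with a gap at the end, drop the "completeness" entry.
PLASTOME_MODIFIERS = {
    "location": "chloroplast",
    "topology": "circular",
    "completeness": "complete",
}
```

For example, `add_defline_modifiers(fasta, fasta, PLASTOME_MODIFIERS)` updates the file in place. Because the symlinks created below point at that file, `table2asn` would then see the modified definition line.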




In [14]:
# Prepare the working directory:
# all files sharing a basename must be stored in the same directory
import os

project_dir = 'plastomes/table2asn'

# Short labels used as basenames for the files passed to table2asn
SHORT_NAMES = {
    "Crepis_callicephala": "cc",
    "Crepis_purpurea": "cp",
}

def create_symlinks(project_dir: str, sourcefile: str) -> None:
    """
    Create the symlink <project_dir>/<label>/<label><ext> pointing to sourcefile,
    where <label> is the short name of the species taken from the file name.
    """
    basename = os.path.basename(sourcefile)
    species, ext = os.path.splitext(basename)  # ext keeps its leading dot, e.g. ".gb"
    label = SHORT_NAMES[species]
    target_dir = os.path.join(project_dir, label)
    target_filename = os.path.join(target_dir, f"{label}{ext}")

    # os.symlink fails if the target directory does not exist yet
    os.makedirs(target_dir, exist_ok=True)
    # A relative link target is resolved from the link's own directory,
    # so point the link at the absolute path of the source file
    source_path = os.path.abspath(sourcefile)

    try:
        os.symlink(source_path, target_filename)
        print(f"Symlink '{target_filename}' for '{sourcefile}' was successfully created.")
    except FileExistsError:
        print(f"Symlink for '{sourcefile}' already exists.")
    except PermissionError:
        print("You do not have permission to create this symlink.")


print("Creating symbolic links:")
create_symlinks(project_dir, gb)
create_symlinks(project_dir, fasta)

Creating symbolic links:
Symlink for 'plastomes/final/Crepis_callicephala.gb' already exists.
Symlink for 'plastomes/in/Crepis_callicephala.fasta' already exists.
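With the files linked into `plastomes/table2asn/<label>/`, the command noted in the first cell can be assembled and run from Python. This is a minimal sketch that assumes `table2asn` is on `PATH`. It uses the `-t` (template) and `-i` (input) options from that command. It also adds `-V v`, which NCBI documents as requesting validation output (a `.val` file); confirm this with `table2asn -help` for your installed version. `table2asn_command` and `run_table2asn` are hypothetical helper names.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def table2asn_command(template: str, sequence: str, validate: bool = True) -> List[str]:
    """Build the argument list for `table2asn -t <template> -i <sequence>`."""
    cmd = ["table2asn", "-t", template, "-i", sequence]
    if validate:
        # Assumed: -V v asks table2asn to write validation results (.val)
        cmd += ["-V", "v"]
    return cmd


def run_table2asn(template: str, sequence: str) -> Optional[subprocess.CompletedProcess]:
    """Run table2asn if it is installed; return None otherwise."""
    if shutil.which("table2asn") is None:
        print("table2asn was not found on PATH.")
        return None
    workdir = os.path.dirname(sequence) or "."
    # Run from the directory holding the linked files and pass the input by its basename;
    # the template path is made absolute so it still resolves from that directory
    cmd = table2asn_command(os.path.abspath(template), os.path.basename(sequence))
    result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    print(result.stdout)
    print(result.stderr)
    return result
```

For example, `run_table2asn(template, os.path.join(project_dir, "cc", "cc.fasta"))` would process the *Crepis callicephala* files prepared above.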
