## Load the Cytochrome-b Sequences
We load the `.fasta` file with Biopython's `SeqIO` and parse the cytochrome-b sequences of the 12 penguin species.

In [1]:
from Bio import SeqIO

records = list(SeqIO.parse("penguins_cytb.fasta", "fasta"))
print(f"Loaded {len(records)} sequences")

Loaded 12 sequences


## Step-1: Document Dr. X's `get_sequences_from_file` function

The `get_sequences_from_file(fasta_fn)` function reads a FASTA file and extracts the DNA sequence for each penguin species. It iterates over the records, takes the species name (genus and species) from the second and third words of each record's description, and stores the sequence in a dictionary keyed by that name.

## Return types:
- `dict`: A dictionary whose keys are penguin species names, for example "Aptenodytes forsteri", and whose values are `Seq` objects holding the DNA sequences.

## Arguments:
- `fasta_fn` (str): The path to a FASTA file that contains the DNA sequences.
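
To make the name extraction concrete, here is a minimal sketch of the description-splitting step on its own. The header string and the helper name `species_from_description` are hypothetical and used only for illustration; the real headers in `penguins_cytb.fasta` may look different, as long as the genus and species are the second and third whitespace-separated words.

```python
def species_from_description(description):
    """Return 'Genus species' from the 2nd and 3rd words of a FASTA description.

    Hypothetical helper for illustration; it mirrors the splitting logic
    described above and is not part of Dr. X's code.
    """
    words = description.split()
    if len(words) < 3:
        # Assumption: a header without at least three words is treated as malformed
        raise ValueError(f"Cannot find a species name in: {description!r}")
    return words[1] + " " + words[2]


# Hypothetical header: placeholder accession, then genus, species, and free text
example_header = "XX000000.1 Aptenodytes forsteri cytochrome b gene"
print(species_from_description(example_header))
```

Because the dictionary is keyed by this name, two records with the same genus and species would overwrite each other, so the approach assumes one sequence per species.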




In [6]:
from Bio import SeqIO

def get_sequences_from_file(fasta_fn):
    """
    Extracts DNA sequences from a FASTA file and stores them in a dictionary keyed by species name.

    Args:
        fasta_fn (str): Path to the FASTA file that contains the DNA sequences.

    Returns:
        dict: Dictionary with species names as keys and DNA sequences (Bio.Seq.Seq objects) as values.
    """
    sequence_data_dict = {}  # Empty dictionary to hold the sequences
    for record in SeqIO.parse(fasta_fn, "fasta"):  # Iterate over each record in the FASTA file
        description = record.description.split()  # Split the description line into words
        species_name = description[1] + " " + description[2]  # Genus and species are the 2nd and 3rd words
        sequence_data_dict[species_name] = record.seq  # Store the sequence under the species name
    return sequence_data_dict  # Return the completed dictionary

In [5]:
cytb_seqs = get_sequences_from_file("penguins_cytb.fasta")
print(f"Loaded {len(cytb_seqs)} penguin sequences.")
list(cytb_seqs.keys())

Loaded 12 penguin sequences.


['Aptenodytes forsteri',
 'Aptenodytes patagonicus',
 'Eudyptes chrysocome',
 'Eudyptes chrysolophus',
 'Eudyptes sclateri',
 'Eudyptula minor',
 'Pygoscelis adeliae',
 'Pygoscelis antarctica',
 'Pygoscelis papua',
 'Spheniscus demersus',
 'Spheniscus humboldti',
 'Spheniscus magellanicus']

## Step-2: Translation of nucleotides to amino acids
This function translates a string of nucleotides into amino acids, following Dr. X's pseudo-code suggestion. It steps through the sequence one codon (three nucleotides) at a time, converts each codon into an amino acid, and appends it to the output string. If a stop codon is found, the translation stops there.
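
The codon-by-codon loop described here can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Dr. X's pseudo-code: it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), which is the usual choice for cytochrome-b, and it builds that table by hand rather than using Biopython's codon tables. The name `translate_sketch` and the choice to emit `X` for unrecognised codons are likewise assumptions.

```python
from itertools import product

# Assumption: NCBI translation table 2 (vertebrate mitochondrial).
# Codons are enumerated in T, C, A, G order for each position,
# matching the compact amino-acid string below ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIMMTTTTNNKKSS**"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(nucleotides):
    """Translate a nucleotide string into an amino-acid string.

    Translation stops at the first stop codon; a trailing partial codon
    is ignored. Codons not in the table (e.g. containing N) become 'X'.
    """
    seq = str(nucleotides).upper()
    amino_acids = []
    usable_length = len(seq) - len(seq) % 3  # drop any incomplete final codon
    for i in range(0, usable_length, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")
        if aa == "*":  # stop codon: end translation here
            break
        amino_acids.append(aa)
    return "".join(amino_acids)


# In this table TGA codes for tryptophan (W) and AGA is a stop codon
print(translate_sketch("ATGTGAAGA"))
```

Passing `str(seq)` for a `Seq` object from `cytb_seqs` should work the same way, since the sketch converts its input to a plain string first.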