# Working with Features using the Dseqrecord class

Features are important components of a GenBank (.gb) file, describing the key biological properties of the sequence. In GenBank, features "include genes, gene products, as well as regions of biological significance reported in the sequence" (see [here](https://www.ncbi.nlm.nih.gov/genbank/samplerecord/) for a description of a GenBank file and its associated terminology and annotations). Examples include coding sequences (CDS), introns, promoters, etc.

pydna offers many ways to easily view, add, extract, and write features into a GenBank file via the `Dseqrecord` class. Before working with features, you need to import the GenBank file (or another supported format such as FASTA, SnapGene, or EMBL) into Python. Please refer to the Importing_Seqs page for a quick how-to tutorial.

After importing the file, we can check the list of features in a sequence using the following code:

In [None]:
from pydna.parsers import parse

# Load the first record from a GenBank file.
# "your_file.gb" is a placeholder path; replace it with your own file.
file = parse("your_file.gb")[0]

# List all features
for feature in file.features:
    print(feature)

For the sample record from GenBank shown [here](https://www.ncbi.nlm.nih.gov/genbank/samplerecord/), the listed features look like this in Python:

type: source  
location: [0:5028](+)  
qualifiers:  
    Key: chromosome, Value: ['IX']  
    Key: db_xref, Value: ['taxon:4932']  
    Key: mol_type, Value: ['genomic DNA']  
    Key: organism, Value: ['Saccharomyces cerevisiae']  

type: mRNA  
location: [<0:>206](+)  
qualifiers:  
    Key: product, Value: ['TCP1-beta']  

type: CDS  
location: [<0:206](+)  
qualifiers:  
    Key: codon_start, Value: ['3']  
    Key: product, Value: ['TCP1-beta']  
    Key: protein_id, Value: ['AAA98665.1']  
    Key: translation, Value: ['SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEAAEVLLRVDNIIRARPRTANRQHM']  

type: gene  
location: [<686:>3158](+)  
qualifiers:  
    Key: gene, Value: ['AXL2']  
...
    Key: product, Value: ['Rev7p']  
    Key: protein_id, Value: ['AAA98667.1']  
    Key: translation, Value: ['MNRWVEKWLRVYLKCYINLILFYRNVYPPQSFDYTTYQSFNLPQFVPINRHPALIDYIEELILDVLSKLTHVYRFSICIINKKNDLCIEKYVLDFSELQHVDKDDQIITETEVFDEFRSSLNSLIMHLEKLPKVNDDTITFEAVINAIELELGHKLDRNRRVDSLEEKAEIERDSNWVKCQEDENLPDNNGFQPPKIKLTSLVGSDVGPLIIHQFSEKLISGDDKILNGVYSQYEEGESIFGSLF']  


If a CDS is made up of several parts (i.e., it is interrupted by introns or other genetic elements), printing its location shows each part. In the code below, the `[0]` selects the first CDS feature in the record; change it to `[1]`, `[2]`, etc. to select the second, third, and so on.

In [None]:
cds_feature = [f for f in file.features if f.type == "CDS"][0]
print(cds_feature.location)
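
To look at the individual parts of a split CDS, Biopython's `CompoundLocation` exposes a `parts` list. The following is a minimal sketch, assuming a hand-built feature stands in for a CDS read from a file; the coordinates are invented for illustration.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

# Hypothetical two-part CDS; the coordinates are made up for this example.
part_a = FeatureLocation(10, 40, strand=1)
part_b = FeatureLocation(60, 90, strand=1)
cds_feature = SeqFeature(CompoundLocation([part_a, part_b]), type="CDS")

# The full location prints as a join of both parts.
print(cds_feature.location)

# Each part is an ordinary location with its own start and end.
for index, part in enumerate(cds_feature.location.parts):
    print(index, int(part.start), int(part.end))

first_part = cds_feature.location.parts[0]
```

Indexing `parts` with `[1]`, `[2]`, etc. gives the later parts of the same CDS.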

The list comprehension above is also a simple way to list all the features of a particular type: without the trailing `[0]`, it returns every CDS feature in the record.

## Editing Features

Adding a new feature to a file to describe a region of interest, for instance a region that you would like to amplify by PCR, is easy. pydna provides the `add_feature` method, which adds a feature of type misc given a start and an end position. Note that the two positions are stored as given, so they follow Python's slicing convention (zero-based start, end excluded) rather than the one-based numbering used in GenBank files. For instance, the following code adds a new feature with location `[2:4]`, which covers the 3rd and 4th nucleotides.

In [None]:
file.add_feature(2, 4)

# Confirm that the new feature has been added
file.features

To make a note of what this sequence is, we can add qualifiers to the new feature. Any extra keyword arguments passed to `add_feature` are stored in the feature's `qualifiers` dictionary.  
For instance, to label a new feature at location `[24:56]` as my region of interest, I can call `add_feature` like this:

In [None]:
file.add_feature(24,56, ROI="my region of interest")

Printing the new feature gives the following result:  

type: misc  
location: [24:56](+)  
qualifiers:  
    Key: ROI, Value: my region of interest  
    Key: label, Value: ['ft32']  

Note that pydna places the feature on the plus strand by default. If you would like to refer to a sequence on the minus strand, pass the parameter `strand=-1`. 

In [None]:
file.add_feature(24,56, strand=-1, ROI="my region of interest")

pydna also allows you to specify the feature type directly when adding a new feature, rather than always using the misc type. This is useful if you have, for instance, inserted a sequence with a known function into a new plasmid and would like to reflect that in your records. To do this, pass the `type_` parameter to the `add_feature` method.

In [None]:
file.add_feature(24,56, type_= "gene")

pydna supports all the conventional feature types through the `type_` parameter. A non-exhaustive list includes gene, CDS, promoter, exon, intron, 5' UTR, 3' UTR, terminator, enhancer, and RBS. You can also define custom features, which could be useful for synthetic biology applications. For instance, you might want to have Bio_brick or spacer features to describe a synthetic standardised plasmid construct.  
  
While pydna does not restrict the feature types you can use, sticking to standard types helps maintain compatibility with other bioinformatics tools and databases. If in doubt, I recommend referring to the official [GenBank_Feature_Table](https://www.insdc.org/submitting-standards/feature-table/#2).

### Adding a Feature with Parts



We can also find the new feature in the list of features using its qualifiers.

For more information on `qualifiers`, please refer to the documentation of Biopython's `SeqFeature` class, on which pydna's features are based, [here](https://biopython.org/docs/1.75/api/Bio.SeqFeature.html).