# Working with Features using the Dseqrecord class

> Visit the full library documentation [here](https://bjornfjohansson.github.io/pydna/)

Features are important components of a .gb file, describing the key biological properties of the sequence. In GenBank, features "include genes, gene products, as well as regions of biological significance reported in the sequence." (See [here](https://www.ncbi.nlm.nih.gov/genbank/samplerecord/) for a description of a GenBank file and its terminology and annotations.) Examples include coding sequences (CDS), introns, and promoters.

pydna makes it easy to view, add, extract, and write features in a GenBank file via the `Dseqrecord` class. Before working with features, you need to import the GenBank file (or a file in another format, such as FASTA, SnapGene, or EMBL) into Python. Please refer to the Importing_Seqs page for a quick how-to tutorial.

After importing the file, we can check out the list of features in a sequence using the following code.

Note that all the code on this page assumes that a `Dseqrecord` object has already been loaded or parsed, and that `Dseqrecord` has been imported from `pydna.dseqrecord`.

In [None]:
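from pydna.dseqrecord import Dseqrecord
from pydna.readers import read

# Load a record from a file ("your_file.gb" is a placeholder path)...
file = read("your_file.gb")
# ...or build one directly from a sequence string:
# file = Dseqrecord("atgaaatag")

# List all features
for feature in file.features:
    print(feature)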

For the sample record from GenBank shown [here](https://www.ncbi.nlm.nih.gov/genbank/samplerecord/), the listed features look like this in Python (the middle of the output is omitted):

type: source  
location: [0:5028](+)  
qualifiers:  
    Key: chromosome, Value: ['IX']  
    Key: db_xref, Value: ['taxon:4932']  
    Key: mol_type, Value: ['genomic DNA']  
    Key: organism, Value: ['Saccharomyces cerevisiae']  

type: mRNA  
location: [<0:>206](+)  
qualifiers:  
    Key: product, Value: ['TCP1-beta']  

type: CDS  
location: [<0:206](+)  
qualifiers:  
    Key: codon_start, Value: ['3']  
    Key: product, Value: ['TCP1-beta']  
    Key: protein_id, Value: ['AAA98665.1']  
    Key: translation, Value: ['SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEAAEVLLRVDNIIRARPRTANRQHM']  

type: gene  
location: [<686:>3158](+)  
qualifiers:  
    Key: gene, Value: ['AXL2']  
...
    Key: product, Value: ['Rev7p']  
    Key: protein_id, Value: ['AAA98667.1']  
    Key: translation, Value: ['MNRWVEKWLRVYLKCYINLILFYRNVYPPQSFDYTTYQSFNLPQFVPINRHPALIDYIEELILDVLSKLTHVYRFSICIINKKNDLCIEKYVLDFSELQHVDKDDQIITETEVFDEFRSSLNSLIMHLEKLPKVNDDTITFEAVINAIELELGHKLDRNRRVDSLEEKAEIERDSNWVKCQEDENLPDNNGFQPPKIKLTSLVGSDVGPLIIHQFSEKLISGDDKILNGVYSQYEEGESIFGSLF']  


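If a CDS has several parts (i.e., it is split by introns or other genetic elements), its location is a compound location made of those parts. In the code below, the `[0]` selects the first CDS feature in the features list, not the first part. The parts themselves are listed in `location.parts`: `parts[0]` is the first part, and it can be changed to `parts[1]`, `parts[2]`, etc. to show the second, third, and so on.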

In [None]:
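cds_feature = [f for f in file.features if f.type == "CDS"][0]  # first CDS feature
print(cds_feature.location)           # the whole (possibly compound) location
print(cds_feature.location.parts[0])  # first part; use [1], [2], ... for the others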

Other ways to view and search for particular features are shown in the section on other viewing methods at the bottom of the page.

## Adding Features and Qualifiers

Adding a new feature to describe a region of interest (for instance, a region you would like to amplify by PCR) is easy. pydna provides the `add_feature` method, which adds a feature of type `misc` given a start and an end position. Note that these positions follow Python slicing conventions (zero-based start, exclusive end), the same convention Biopython uses for feature locations, rather than biological 1-based numbering. For instance, the following code adds a new feature covering `file.seq[2:4]`, i.e. the 3rd and 4th nucleotides.

In [None]:
file.add_feature(2, 4)

# Confirm that the new feature has been added
file.features

To record what this region is, we can add qualifiers to the new feature. Any extra keyword arguments passed to `add_feature` are stored in the feature's `qualifiers` dictionary.  
For instance, to label a new feature spanning positions 24 to 56 (`file.seq[24:56]`) as my region of interest, I can call `add_feature` as follows:

In [None]:
file.add_feature(24,56, ROI="my region of interest")

Printing the new feature gives the following result:  

type: misc  
location: [24:56](+)  
qualifiers:  
    Key: ROI, Value: my region of interest  
    Key: label, Value: ['ft32']  

Note that `add_feature` places the feature on the plus strand by default (`strand=1`). To annotate a region on the minus strand, pass `strand=-1`.

In [None]:
file.add_feature(24,56, strand=-1, ROI="my region of interest")

pydna also lets you specify the feature type directly, rather than always using the `misc` type. This is useful if, for instance, you have inserted a sequence with a known function into a new plasmid and would like your records to reflect that. To do this, pass the `type_` parameter to `add_feature`.

In [None]:
file.add_feature(24,56, type_= "gene")

Because pydna accepts any string for the `type_` parameter, all the conventional feature types can be used. A non-exhaustive list includes gene, CDS, promoter, exon, intron, 5'UTR, 3'UTR, terminator, enhancer, and RBS. You can also define custom feature types, which can be useful for synthetic biology applications. For instance, you might want Bio_brick or spacer features to describe a standardised synthetic plasmid construct.  
  
Note that while pydna does not restrict the feature types you can use, sticking to standard types helps maintain compatibility with other bioinformatics tools and databases. If in doubt, refer to the official [GenBank Feature Table](https://www.insdc.org/submitting-standards/feature-table/#2).

### Adding a Feature with Parts

To add a feature with parts, such as a CDS interrupted by introns, we need to use classes from Biopython, the library that pydna builds on. As of the current version of pydna, there is no dedicated built-in method for adding a feature with parts to a `Dseqrecord`. Instead, you can create a `CompoundLocation` manually, use it as the location of a `SeqFeature` object, and append that `SeqFeature` to the `features` list of the `Dseqrecord`.

The example code below adds a CDS with two parts, spanning positions 5-15 and 20-30, with the gene name "example_gene" in its qualifiers, to the features list.

In [None]:
from pydna.dseqrecord import Dseqrecord
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

# Define the locations of the CDS
locations = [FeatureLocation(5, 15), FeatureLocation(20, 30)]

# Create a compound location from these parts
compound_location = CompoundLocation(locations)

# Create a SeqFeature with this compound location, including type and qualifiers. 
cds_feature = SeqFeature(location=compound_location, type="CDS", qualifiers={"gene": "example_gene"})

# Add the feature to the Dseqrecord
file.features.append(cds_feature)

# Verify the added feature
for feature in file.features:
    print(feature)

Note that `SeqFeature` uses `type` rather than `type_` to define the feature type.  
  
The added feature prints as follows:  

type: CDS  
location: join{[5:15], [20:30]}  
qualifiers:  
    Key: gene, Value: example_gene  

Further documentation for `SeqFeature`, `CompoundLocation`, and `FeatureLocation` can be found in the `SeqFeature` module [here](https://biopython.org/docs/1.75/api/Bio.SeqFeature.html). 

### Handling Origin Spanning Features

An origin-spanning feature is a feature that crosses the origin of a circular sequence. In pydna, such a feature is represented as a feature with two parts, joining the part of the sequence before the origin to the part after it. However, it can be added as normal using the `add_feature` method.  

An origin-spanning feature from position 19 to position 6 in a 25 bp circular sequence is represented like so:  

type: gene  
location: join{[19:25](+), [0:6](+)}  
qualifiers:  
    Key: gene, Value: example_gene  

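The code uses the `add_feature` method as normal, passing a start position (19) that is greater than the end position (6). This only works on a circular record; on a linear record, the same call raises a `ValueError`.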

In [None]:
file.add_feature(19,6, type_="gene", gene="example_gene")

### Other Methods for Viewing Features

pydna also provides the `list_features` method as a simple way to list all the features in a `Dseqrecord` object. 

In [None]:
print(file.list_features())

+-----+------------------+-----+-------+-------+------+--------+------+  
| Ft# | Label or Note    | Dir | Sta   | End   |  Len | type   | orf? |  
+-----+------------------+-----+-------+-------+------+--------+------+  
|   0 | nd               | --> | 0     | 5028  | 5028 | source |  no  |  
|   1 | nd               | --> | <0    | >206  |  206 | mRNA   |  no  |  
|   2 | nd               | --> | <0    | 206   |  206 | CDS    |  no  |  
|   3 | nd               | --> | <686  | >3158 | 2472 | gene   | yes  |  
|   4 | nd               | --> | <686  | >3158 | 2472 | mRNA   | yes  |  
|   5 | N:plasma membran | --> | 686   | 3158  | 2472 | CDS    | yes  |  
|   6 | nd               | <-- | <3299 | >4037 |  738 | gene   | yes  |  
|   7 | nd               | <-- | <3299 | >4037 |  738 | mRNA   | yes  |  
|   8 | nd               | <-- | 3299  | 4037  |  738 | CDS    | yes  |  
|   9 | nd               | --- | 5     | 30    |   20 | CDS    |  no  |  
+-----+------------------+-----+-------+-------+------+--------+------+  

This method is a convenient way to get a brief overview of each feature without reading through the entire sequence record.  

To look for specific features by their qualifiers, we can search through the features list. For instance, to find the feature with the gene name example_gene, I can use the following code:

In [None]:
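# Qualifier values parsed from GenBank files are lists (e.g. ['AXL2']), while
# values added through add_feature may be plain strings, so accept both forms
gene = [f for f in file.features if f.qualifiers.get("gene") in ("example_gene", ["example_gene"])]
print(gene)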

To search on a different qualifier, replace the key `"gene"` and the value `"example_gene"` with the qualifier key and value you want. To search by feature type instead, compare `f.type` (for example, `f.type == "CDS"`).

### Removing Features

Unfortunately, pydna does not provide a built-in method to remove features from a features list. However, since `features` is an ordinary Python list, we can rebuild it without the unwanted features, selecting them by type or by qualifier.

For instance, we can modify the features list to exclude all CDS:

In [None]:
file.features = [f for f in file.features if not (f.type == "CDS")]

We can also modify the features list to exclude a specific gene:

In [None]:
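# Keep every feature whose "gene" qualifier is not example_gene; comparing the whole
# qualifiers dict would miss features that also carry a label or other qualifiers
file.features = [f for f in file.features if f.qualifiers.get("gene") not in ("example_gene", ["example_gene"])]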