# BioMart

## What is BioMart?

[BioMart](http://useast.ensembl.org/biomart/martview/180192de636a075b7add9d266a1307c8) is a web-based data-mining tool from Ensembl that lets you query Ensembl databases and export the results.

## How is it useful?

BioMart can return the information you would find on an individual Ensembl gene or transcript page as a table that you can download as a text file. It is especially useful for retrieving transcripts and annotation information for a list of genes. It can also be used to download FASTA sequence files that you can view in SnapGene.

## How do I use it?

Ensembl provides step-by-step documentation on its website; below is an example of some common input parameters we have used in GPP.

To access BioMart, go to ensembl.org and select "BioMart" from the top toolbar.

![access_BioMart](images/BioMart_00_AccessBioMart.png)

This link will take you to the page shown below.

![BioMart_start](images/BioMart_01_StartingPoint.png)

For transcript lookups, we generally use the "Ensembl Genes" database and the human genes dataset.

![BioMart_select_database](images/BioMart_02_SelectDatabase.png)
![BioMart_select_dataset](images/BioMart_03_SelectDataset.png)

After you select a dataset, a "Filters" option appears in the left sidebar. Click "Filters", then the + next to "GENE" to restrict your results to a specific set of genes.

![BioMart_filters](images/BioMart_04_Filters.png)

Select "Input external references ID list \[Max 500 advised\]" and choose "Gene names" from the drop-down list. If your gene list comes from an external database, such as UniProt, select the corresponding ID type from the same list instead.

![BioMart_select_gene_annotations](images/BioMart_05_SelectGeneInputAnnotations.png)

You can either (1) enter a set of genes manually in the text box provided:

![BioMart_input_genes_manually](images/BioMart_06a_InputGenesManually.png)

or (2) upload a .txt file containing a list of genes:

![BioMart_upload_gene_list](images/BioMart_06b_UploadGeneList.png)
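
Because BioMart advises at most 500 IDs per list, a long gene list can be split into several upload files. The sketch below is a hypothetical helper, not part of BioMart: it assumes that a plain-text file with one gene name per line is an acceptable upload, and it strips whitespace and removes duplicates before writing files of at most 500 names each.

```python
from pathlib import Path

# BioMart shows "[Max 500 advised]" next to the ID-list filter.
MAX_IDS_PER_QUERY = 500


def write_gene_list_files(genes, out_dir, prefix="gene_list", chunk_size=MAX_IDS_PER_QUERY):
    """Write unique gene names, one per line, into .txt files of at most chunk_size entries.

    Returns the written paths in order (prefix_1.txt, prefix_2.txt, ...).
    """
    # Strip whitespace, drop blank entries, and de-duplicate while keeping the input order.
    seen = set()
    cleaned = []
    for gene in genes:
        name = gene.strip()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for start in range(0, len(cleaned), chunk_size):
        chunk = cleaned[start:start + chunk_size]
        path = out_dir / f"{prefix}_{start // chunk_size + 1}.txt"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

For example, `write_gene_list_files(my_genes, "biomart_inputs")` writes `gene_list_1.txt`, `gene_list_2.txt`, and so on, and each file can be uploaded as a separate BioMart query.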

Next, click on "Attributes" on the left sidebar.

### Generating a table with gene and transcript information

With "Features" selected (the default), click the + next to "GENE" to choose which information about your genes and their transcripts to output.

![BioMart_attributes](images/BioMart_07_Attributes.png)

The attributes you select depend on what you need from the BioMart run, but here are a few we recommend, with descriptions:

* **Gene stable ID**: Ensembl gene ID (e.g. ENSG00000276876)
* **Gene stable ID version**: Ensembl gene ID with version (e.g. ENSG00000276876.4)
* **Transcript stable ID**: Ensembl transcript ID (e.g. ENST00000616597)
* **Transcript stable ID version**: Ensembl transcript ID with version (e.g. ENST00000616597.4)
* **RefSeq match transcript (MANE Select)**: if a transcript has a RefSeq (NCBI) ID in this column, it is the MANE Select transcript, i.e. the preferred representative transcript for that gene. MANE (Matched Annotation from NCBI and EMBL-EBI) Select transcripts are annotated identically by NCBI and Ensembl, with matching exons and sequence (for more information, visit the [NCBI page](https://www.ncbi.nlm.nih.gov/refseq/MANE/))
* **Transcript length (including UTRs and CDS)**: useful to have on hand; if none of the transcript flags help select a transcript for a gene, we can pick the longest transcript for that gene
* **Gene name**
* **UniProtKB/Swiss-Prot ID**: (if applicable) the UniProt protein accession; useful if your input gene list comes from UniProt, e.g., when designing a phosphosite base editor library

The following attributes are transcript quality flags. For a full breakdown of these annotations and how they rank transcripts, see the [Ensembl documentation on transcript quality tags](https://www.ensembl.org/info/genome/genebuild/transcript_quality_tags.html).

* **APPRIS annotation**: APPRIS is a database of principal splice isoform annotations that we used before MANE Select. In the absence of a MANE Select transcript, it can help pick the best transcript for a gene (labeled principal1).
* **Transcript support level (TSL)**: a score indicating how well a transcript's structure is supported by mRNA and EST alignments, from tsl1 (best supported) to tsl5; tslNA means the transcript was not analyzed
* **GENCODE basic annotation**: marks transcripts in the GENCODE basic set, a subset of representative transcripts for each gene that prioritizes full-length protein-coding transcripts; if no transcript passes the flags above, we can fall back on this annotation

![BioMart_select_attributes](images/BioMart_08_SelectAttributes.png)
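
If you run the same query often, the "XML" button in BioMart's top toolbar exports the current query as XML, which can be submitted to Ensembl's martservice web service from a script. The sketch below builds and sends such a query. It assumes the endpoint `https://www.ensembl.org/biomart/martservice`, the dataset `hsapiens_gene_ensembl`, the gene-name filter `external_gene_name`, and the internal attribute names in `DEFAULT_ATTRIBUTES`; compare these with the XML exported from your own session before relying on them. The function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed public Ensembl BioMart web-service endpoint.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"

# Assumed internal names for the attributes recommended above; check them against
# the XML exported from your own BioMart session.
DEFAULT_ATTRIBUTES = (
    "ensembl_gene_id",
    "ensembl_transcript_id",
    "external_gene_name",
    "transcript_mane_select",
    "transcript_length",
    "transcript_appris",
    "transcript_tsl",
    "transcript_gencode_basic",
)


def build_query_xml(gene_names, attributes=DEFAULT_ATTRIBUTES,
                    dataset="hsapiens_gene_ensembl", unique_rows=True):
    """Build a BioMart XML query that filters on gene names and returns TSV with a header row."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1" if unique_rows else "0",  # equivalent of "Unique results only"
        "datasetConfigVersion": "0.6",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Multiple filter values are passed as one comma-separated string.
    ET.SubElement(dataset_el, "Filter",
                  {"name": "external_gene_name", "value": ",".join(gene_names)})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
            + ET.tostring(query, encoding="unicode"))


def run_query(query_xml, url=MARTSERVICE_URL, timeout=300):
    """POST the query to martservice and return the response body as text (needs network access)."""
    data = urllib.parse.urlencode({"query": query_xml}).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `run_query(build_query_xml(["TP53", "BRCA1"]))` would return the same kind of tab-separated table that the "Results" page exports.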


After you've selected attributes, click the "Results" button in the top left corner.

![BioMart_go_to_results](images/BioMart_09_GoToResults.png)

To avoid duplicate rows, select "Unique results only" and click "Go" to download the full table.

![BioMart_results](images/BioMart_10_DownloadResults.png)
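
Once the table is downloaded, the transcript-choice logic described in the attribute list can be applied with a short script. This is a minimal sketch, not an established GPP procedure. It assumes the export is tab-separated with a header row whose column names match the attribute names listed above, and it ranks each gene's transcripts by MANE Select, then APPRIS principal1, then tsl1, then GENCODE basic, falling back to the longest transcript. That ordering is one reading of the flags described above; adjust it to your needs.

```python
import csv
import io

# Column headers assumed to match the attribute names shown in BioMart;
# check them against your own export and adjust if needed.
GENE_COL = "Gene stable ID"
TRANSCRIPT_COL = "Transcript stable ID"
MANE_COL = "RefSeq match transcript (MANE Select)"
APPRIS_COL = "APPRIS annotation"
TSL_COL = "Transcript support level (TSL)"
BASIC_COL = "GENCODE basic annotation"
LENGTH_COL = "Transcript length (including UTRs and CDS)"


def _field(row, column):
    """Return a stripped cell value, treating missing cells as empty strings."""
    return (row.get(column) or "").strip()


def transcript_rank(row):
    """Sort key where lower is better: MANE Select, APPRIS principal1, tsl1, GENCODE basic, then length."""
    has_mane = bool(_field(row, MANE_COL))
    is_principal1 = _field(row, APPRIS_COL) == "principal1"
    is_tsl1 = _field(row, TSL_COL).startswith("tsl1")
    is_basic = bool(_field(row, BASIC_COL))
    length = int(_field(row, LENGTH_COL) or 0)
    # False sorts before True, so each flag is negated; length is negated so longer sorts first.
    return (not has_mane, not is_principal1, not is_tsl1, not is_basic, -length)


def pick_transcripts(tsv_text):
    """Return {gene stable ID: chosen transcript stable ID} from BioMart TSV text."""
    best = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        gene = row[GENE_COL]
        if gene not in best or transcript_rank(row) < transcript_rank(best[gene]):
            best[gene] = row
    return {gene: row[TRANSCRIPT_COL] for gene, row in best.items()}
```

For example, `pick_transcripts(Path("biomart_results.txt").read_text())` (with `from pathlib import Path`; the file name is hypothetical) returns one transcript ID per gene.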

### Downloading FASTA files to view in SnapGene

On the "Attributes" page, select "Sequences" instead of "Features" to choose which sequences you want to download.

![BioMart_seq_attributes](images/BioMart_07_SequenceAttributes.png)

To avoid generating sequences for every transcript of every gene, go back to "Filters" and check the "MANE Select" box so that sequences are generated only for each gene's MANE Select transcript.

![BioMart_MANE_Select_Filter](images/BioMart_08_AddMANESelectFilter.png)

Back under "Attributes", select the type of sequence you want to generate from the options shown. In this example, we select "Coding Sequence."

![BioMart_select_sequence_attributes](images/BioMart_09_SelectSequenceAttributes.png)

After you've selected attributes, click the "Results" button in the top left corner.

![BioMart_download_sequence_results](images/BioMart_10_DownloadSequenceResults.png)

To avoid duplicate entries, select "Unique results only" and click "Go" to download the sequence file.

BioMart downloads the sequences as a .txt file, which needs to be renamed with a .fasta extension.

<table><tr>
<td> <img src="images/BioMart_11_RenameTxtFile_1.png" alt="BioMart_rename_txt_file_1"/> </td>
<td> <img src="images/BioMart_11_RenameTxtFile_2.png" alt="BioMart_rename_txt_file_2"/> </td>
<td> <img src="images/BioMart_11_RenameTxtFile_3.png" alt="BioMart_rename_txt_file_3"/> </td>
</tr></table>
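
Renaming can also be scripted. The sketch below uses hypothetical function names. It renames a downloaded .txt export to .fasta only after checking that the first non-blank line starts with ">", as every FASTA record header does, and it counts the records so you can confirm that you received one sequence per expected transcript.

```python
from pathlib import Path


def rename_to_fasta(txt_path, suffix=".fasta"):
    """Rename a BioMart sequence export from .txt to .fasta and return the new path.

    Raises ValueError if the file does not look like FASTA, and FileExistsError
    if a file with the target name already exists.
    """
    txt_path = Path(txt_path)
    text = txt_path.read_text()
    # A FASTA file starts with a header line beginning with '>'.
    first_line = next((line for line in text.splitlines() if line.strip()), "")
    if not first_line.startswith(">"):
        raise ValueError(f"{txt_path} does not look like FASTA (first line: {first_line[:40]!r})")
    fasta_path = txt_path.with_suffix(suffix)
    if fasta_path.exists():
        raise FileExistsError(fasta_path)
    txt_path.rename(fasta_path)
    return fasta_path


def count_records(fasta_path):
    """Count sequences in a FASTA file (one '>' header line per record)."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```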


You can now move the file wherever you need and open it in SnapGene.