# Genomics

The goal of this lecture is to highlight a variety of genomics tools and methods that are commonly used in Python.

This module covers:

1) Using Biopython for searching in NCBI databases
2) Using Scanpy for conducting scRNAseq analysis


## Installs/Imports

In [1]:
!pip -q install biopython

### GenBank comprises several subdivisions:

**Nucleotide**: a collection of nucleic acid sequences from several sources.

**Genome Survey Sequence** (GSS): uncharacterized short genomic sequences.

**Expressed Sequence Tags** (EST): uncharacterized short cDNA sequences.

Searching the Nucleotide database with general text queries will produce the most relevant results. You can also use a simple query based on protein name, gene name or gene symbol.

To limit your search to only certain kinds of records, you can search using GenBank's Limits page or alternatively use the Filter your results field to select categories of records after a search.

If you cannot find what you are searching for, check how the database interpreted your query by inspecting the "Search details" field on the right side of the results page. This field shows how your search was automatically translated into standard keywords.
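The same check can be done from Python: the parsed ESearch result carries a `QueryTranslation` entry alongside `Count`. The sketch below is a minimal illustration, not part of the original notebook; the helper names `summarize_search` and `run_search` are hypothetical, and `run_search` needs network access and Biopython.

```python
def summarize_search(record):
    """Return (count, query_translation) from a parsed ESearch record."""
    # "Count" arrives as a string, so convert it for numeric use
    count = int(record["Count"])
    # "QueryTranslation" holds the query as Entrez actually interpreted it
    translation = record.get("QueryTranslation", "")
    return count, translation


def run_search(term, email, db="nucleotide"):
    """Run an ESearch and summarize it (hypothetical helper, needs network)."""
    from Bio import Entrez  # imported lazily so summarize_search has no dependencies

    Entrez.email = email
    handle = Entrez.esearch(db=db, term=term)
    record = Entrez.read(handle)
    handle.close()
    return summarize_search(record)
```

Calling `run_search('rbcL[Gene]', "your_name@your_mail_server.com")` would return the hit count together with the translated query, which is useful for spotting a field tag that Entrez did not recognize.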

Here is a link to all the potential search [fields used in GenBank](https://www.ncbi.nlm.nih.gov/books/NBK49540/).
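Field tags are attached to a value in square brackets, and clauses are combined with Boolean operators such as `AND`. As a rough sketch (the helpers `fielded` and `all_of` are hypothetical names, and the organism and gene values are only illustrative), a query string can be assembled like this:

```python
def fielded(value, field):
    """Quote a value and attach an Entrez field tag, e.g. "Zea mays"[Organism]."""
    return f'"{value}"[{field}]'


def all_of(*clauses):
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


# Illustrative values; Organism and Gene are fields from the list linked above
term = all_of(fielded("Zea mays", "Organism"), fielded("rbcL", "Gene"))
print(term)
```

The resulting string can be passed as the `term` argument of `Entrez.esearch`.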

In [40]:
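# The Bio package is provided by the biopython module installed above
from Bio import Entrez

# NCBI asks for a contact email with every E-utilities request
Entrez.email = "your_name@your_mail_server.com"

# Count the Nucleotide records that match both field-tagged clauses
handle = Entrez.esearch(db="nucleotide", term='"mays"[Genus] AND rbcL[Gene]')
record = Entrez.read(handle)
handle.close()
record["Count"]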

'81'

**Given**: A genus name, followed by two dates in YYYY/M/D format.

**Return**: The number of Nucleotide GenBank entries for the given genus that were published between the dates specified.
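One way to approach this, sketched under assumptions: restrict the genus with the `[Organism]` field and the dates with a `[PDAT]` (publication date) range. The helper names `genus_date_term` and `count_entries` are hypothetical, the genus and dates are placeholder inputs, and `count_entries` needs network access and Biopython.

```python
def genus_date_term(genus, start, end):
    """Build an ESearch term for one genus and a publication-date window."""
    # Dates are expected as YYYY/M/D strings; the colon marks a range
    return f'"{genus}"[Organism] AND ("{start}"[PDAT] : "{end}"[PDAT])'


def count_entries(genus, start, end, email):
    """Return the number of matching Nucleotide records (needs network)."""
    from Bio import Entrez  # imported lazily so genus_date_term has no dependencies

    Entrez.email = email
    handle = Entrez.esearch(db="nucleotide", term=genus_date_term(genus, start, end))
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


# Placeholder inputs, for illustration only
print(genus_date_term("Anthoxanthum", "2003/7/25", "2005/12/27"))
```

Keeping the query construction separate from the network call makes it easy to inspect the term before sending it to NCBI.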

In [35]:
import numpy as np  # provides genfromtxt
genus = np.genfromtxt('./data/genus_list.csv', dtype=str)