# BacMapping

This Jupyter notebook contains some examples of how to use the bacmapping functions.

In [None]:
import bacmapping as bmap

## Main pipeline

The following pipeline downloads all the necessary files from the FTP server and creates two folders called details and sequences. details contains ... and sequences contains ... . download can be set to False if this pipeline was previously run.


In [None]:
bmap.getNewClones(download = True) # this will take some time, downloading first all the clones and then their sequences
bmap.narrowDownLibraries()

The following functions generate the database locally in a folder called maps. cpustouse sets the number of cores used for multiprocessing. chunk_size sets the number of lines read into pandas at once; larger values are faster but require more memory.

In [None]:
bmap.mapSequencedClones(cpustouse=16, chunk_size=1000) 
bmap.mapPlacedClones(cpustouse=16, chunk_size=1000)
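
The memory trade-off behind chunk_size can be illustrated with a small standard-library sketch. This is not bacmapping's implementation (which reads into pandas); the helper read_in_chunks and the sample table are hypothetical and only show why a larger chunk means fewer, bigger reads held in memory at once.

```python
import csv
import io
from itertools import islice

def read_in_chunks(handle, chunk_size):
    """Yield (header, rows) with at most chunk_size rows per chunk."""
    reader = csv.reader(handle)
    header = next(reader)  # keep the header out of the data chunks
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            break
        yield header, rows

# Hypothetical clone table; real input would be a file on disk.
sample = io.StringIO("clone,chrom\nA,1\nB,1\nC,2\nD,2\nE,3\n")
chunk_sizes = [len(rows) for _, rows in read_in_chunks(sample, chunk_size=2)]
print(chunk_sizes)  # [2, 2, 1]
```

Only one chunk is held in memory at a time, so raising chunk_size reduces the number of passes at the cost of a larger working set.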

Note: Running the main pipeline can take around 8 hours using 16 cores.

## Functions for statistics

In [None]:
bmap.countPlacedBACs()
bmap.getCoverage()
bmap.getAverageLength()
bmap.getSequencedClonesStats()
bmap.makePairs()

Output files:

- countPlacedBACs generates a file called counts.csv with the library and the number of clones in the library
- getCoverage generates a file called coverage.csv with accession, chromosome, bases covered and total bases
- getAverageLength generates a file called averagelength.csv with the library and the average length of clones in basepairs
- getSequencedClonesStats generates a file called sequencedStats.csv with the library, average length and number of clones
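
As a rough illustration of what "bases covered" can mean in coverage.csv, the sketch below merges overlapping clone placements and sums the merged lengths. The function covered_bases, the half-open coordinate convention and the sample intervals are assumptions made for illustration; they are not taken from bacmapping's code.

```python
def covered_bases(intervals):
    """Count bases covered by at least one half-open interval [start, end)."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it out and start anew.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# Hypothetical clone placements on one chromosome.
clones = [(100, 300), (250, 400), (1000, 1200)]
print(covered_bases(clones))  # 500
```

Dividing this count by the total bases of the chromosome would give a coverage fraction.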

Output folder:

- makePairs generates a folder called pairs with a database of pairs of BACs which have overlaps generated by restriction enzymes that linearize the BACs, separated by library and then chromosome

## Functions to explore the library

### findPairs

Given a row for a specific BAC as well as overlap and other details, finds possible BACs with acceptable overlap and restriction sites.

In [None]:
name = "RP11-168H2"
row = bmap.getRow(name) # to do - fix row
pairs = bmap.findPairs(row)
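
The idea of an "acceptable overlap" between two placed BACs can be sketched with plain interval arithmetic. Everything here is hypothetical (the clone names, coordinates, the min_overlap threshold and the overlap_length helper); findPairs also takes restriction sites into account, which this sketch does not.

```python
def overlap_length(a, b):
    """Return the number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

# Hypothetical placements: (name, start, end) on the same chromosome.
target = ("BAC-A", 1000, 5000)
others = [
    ("BAC-B", 4000, 9000),  # overlaps the target by 1000 bases
    ("BAC-C", 4900, 8000),  # overlaps the target by 100 bases
    ("BAC-D", 6000, 9000),  # no overlap
]
min_overlap = 500

matches = [
    name
    for name, start, end in others
    if overlap_length((target[1], target[2]), (start, end)) >= min_overlap
]
print(matches)  # ['BAC-B']
```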

### getRightIsoschizomer

Given an enzyme name, returns the name and Bio.Restriction class of the isoschizomer that is in the database.

In [None]:
enzyme = "HindIII"
_ , renzyme = bmap.getRightIsoschizomer(enzyme)

### drawMap

Draws a map for a given BAC and enzyme.

In [None]:
name = "RP11-168H2"
enzyme = "HindIII"
bmap.drawMap(name, enzyme)

### getSequenceFromName

Given the name of a BAC, tries to return the sequence of that insert.

In [None]:
name = "RP11-168H2"
seq = bmap.getSequenceFromName(name)

### getSequenceFromLoc

Given a chromosome, start and end location, returns the sequence of that region.

In [None]:
chrom = 2
start = 100000
end = 500000
seq = bmap.getSequenceFromLoc(chrom, start, end)

### getMapFromName

Given the name of a BAC, tries to return all the restriction maps for that BAC.

In [None]:
name = "RP11-168H2"
mapfn = bmap.getMapFromName(name)
print(mapfn['HindIII'])

### getMapsFromLoc

Given a chromosome, start and end location, returns all the maps in that region.

In [None]:
chrom = 2
start = 100000
end = 500000
maps = bmap.getMapsFromLoc(chrom, start, end)

### getRestrictionMap

Given the name of a BAC and an enzyme, returns the cut locations.

In [None]:
name = "RP11-168H2"
enzyme = "HindIII"
maps = bmap.getRestrictionMap(name, enzyme)
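
To make "cut locations" concrete, here is a minimal pure-Python sketch that locates HindIII sites (A^AGCTT) in a toy sequence and derives fragment lengths for a linear molecule. It does not use bacmapping or its database; the helper names, the 0-based position convention and the toy sequence are assumptions, and the positions returned by getRestrictionMap may follow a different convention.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognizes A^AGCTT
HINDIII_CUT_OFFSET = 1   # the top strand is cut after the first A

def find_cut_positions(sequence, site, cut_offset):
    """Return 0-based indices of the first base after each top-strand cut."""
    sequence = sequence.upper()
    positions = []
    index = sequence.find(site)
    while index != -1:
        positions.append(index + cut_offset)
        index = sequence.find(site, index + 1)
    return positions

def fragment_lengths(sequence_length, cuts):
    """Return fragment lengths for a linear molecule cut at the given positions."""
    bounds = [0] + sorted(cuts) + [sequence_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Toy sequence with two HindIII sites, starting at indices 3 and 13.
toy_seq = "GGGAAGCTTCCCCAAGCTTGG"
cuts = find_cut_positions(toy_seq, HINDIII_SITE, HINDIII_CUT_OFFSET)
print(cuts)                                   # [4, 14]
print(fragment_lengths(len(toy_seq), cuts))   # [4, 10, 7]
```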