## Making restriction maps of the BACs in the RP11 library on chromosome 19

This Jupyter notebook shows an example run of the pipeline. The input data is reduced to the BACs from the RP11 library on chromosome 19, so each command should take only a few minutes to run.

We import bacmapping as bmap to access its functions, plt from matplotlib so we can display a map at the end, and os, which is useful for moving files around.

In [None]:
import bacmapping as bmap
import matplotlib.pyplot as plt
import os

First, we get the clones we're going to map and download the reference sequence they are placed on.
For the full pipeline we would run getNewClones instead of getNewClonesMiniset, which takes longer and uses more disk space.
This step creates two new folders: details, which holds all the information on the clones, and sequences, which holds the sequences.

Run the following cell; downloading everything will take a few minutes.

In [None]:
email = "example@website.com" # Remember to give NCBI your email!

acc = 'NC_000019.10' # accession number of human genome chromosome 19
lib = 'RP11' # library we're interested in
chrom = '19' # chromosome number we're interested in
chunksize = 1000 # how many entries to read into memory in each chunk, bigger = faster and more memory usage
cpus = 8 # how many cpus to use in multiprocessing, more means faster but you need the cpus free

bmap.getNewClonesMiniset(email, lib, acc, chrom, chunk_size = chunksize) # a variation on downloading the dataset which only downloads one library and one chromosome
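The chunk_size argument trades memory for speed: larger chunks mean fewer read passes but more rows held in memory at once. The sketch below is not bacmapping's code; it only illustrates the general idea of filtering a large tab-separated table down to one library and one chromosome while holding at most chunk_size rows at a time. The function name, column names (name, library, chromosome) and rows are made up for the example.

```python
import csv
import io
from itertools import islice


def filter_in_chunks(handle, library, chromosome, chunk_size=1000):
    """Yield rows matching one library and chromosome, reading chunk_size rows at a time."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        chunk = list(islice(reader, chunk_size))  # at most chunk_size parsed rows held at once
        if not chunk:
            break  # end of file
        for row in chunk:
            if row["library"] == library and row["chromosome"] == chromosome:
                yield row


# Hypothetical tab-separated clone table (column names and rows are made up).
table = io.StringIO(
    "name\tlibrary\tchromosome\n"
    "RP11-815C22\tRP11\t19\n"
    "example-2\tCTD\t19\n"
    "example-3\tRP11\t1\n"
)
kept = [row["name"] for row in filter_in_chunks(table, "RP11", "19", chunk_size=2)]
print(kept)  # ['RP11-815C22']
```

The result does not depend on chunk_size; only peak memory and the number of read passes do.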

Map the end-sequenced BACs in the example dataset by running mapPlacedClones. This will take a few minutes.

In [None]:
bmap.mapPlacedClones(cpustouse=cpus, chunk_size=chunksize)
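To make restriction mapping of a placed clone concrete, here is a minimal sketch, assuming a clone is placed by 0-based, end-exclusive start and end coordinates on the reference and that its insert is simply the reference sequence between them. It finds FspI sites (TGCGCA, cut blunt between TGC and GCA) in that slice. The function names and the toy reference are hypothetical; mapPlacedClones may work differently.

```python
FSPI_SITE = "TGCGCA"  # FspI recognition sequence
FSPI_CUT_OFFSET = 3   # FspI cuts blunt: TGC^GCA


def find_cuts(seq, site=FSPI_SITE, cut_offset=FSPI_CUT_OFFSET):
    """Return 0-based top-strand cut positions for every occurrence of site in seq."""
    seq = seq.upper()
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = seq.find(site, hit + 1)  # step by one so overlapping sites are not missed
    return cuts


def map_placed_clone(reference, clone_start, clone_end):
    """Cut positions relative to the clone, from 0-based, end-exclusive placement coordinates."""
    return find_cuts(reference[clone_start:clone_end])


# Toy reference containing two FspI sites; the clone is placed at [2, 26).
reference = "AAAA" + "TGCGCA" + "CCCCCCCC" + "TGCGCA" + "GGGG"
print(map_placed_clone(reference, 2, 26))  # [5, 19]
```

Because TGCGCA is its own reverse complement, scanning one strand finds every site; a non-palindromic recognition sequence would also need a search of the reverse complement.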

Some of the BACs are insert-sequenced; these are restriction mapped with a different function, mapSequencedClones.

In [None]:
bmap.mapSequencedClones(cpustouse=cpus)
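An insert-sequenced BAC needs no coordinate lookup, since the insert sequence itself can be digested. The hypothetical sketch below digests one sequence with a small table of palindromic enzymes and returns a dictionary of cut positions per enzyme, loosely the kind of per-enzyme collection getMaps returns later, although the exact structure bacmapping uses is not shown here. The enzyme table, function name and toy sequence are assumptions for illustration.

```python
# Recognition site and top-strand cut offset for a few palindromic enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "FspI": ("TGCGCA", 3),   # TGC^GCA
}


def digest_all(insert_seq, enzymes=ENZYMES):
    """Map each enzyme name to the 0-based cut positions of its site in insert_seq."""
    seq = insert_seq.upper()
    maps = {}
    for name, (site, offset) in enzymes.items():
        cuts = []
        hit = seq.find(site)
        while hit != -1:
            cuts.append(hit + offset)
            hit = seq.find(site, hit + 1)
        maps[name] = cuts
    return maps


insert_seq = "ccGAATTCaaaaTGCGCAttttGAATTCgg"  # toy insert sequence
print(digest_all(insert_seq))  # {'EcoRI': [3, 23], 'BamHI': [], 'FspI': [15]}
```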

The statistics for this dataset are computed by the following commands, which are described in the "Functions for statistics" section. Together they save four CSV files with the results of the analysis.

In [None]:
bmap.countPlacedBACs()
bmap.getCoverage()
bmap.getAverageLength()
bmap.getSequencedClonesStats()
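The exact definitions used by getCoverage and getAverageLength are in the "Functions for statistics" section, not here. As a hedged illustration of one common approach, coverage can be taken as the fraction of chromosome bases covered by at least one placed BAC, which requires merging overlapping intervals so that shared bases are not counted twice. The function names, coordinates and chromosome length below are invented.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching 0-based, end-exclusive [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])  # start a new block
    return merged


def coverage_fraction(intervals, chrom_length):
    """Fraction of chromosome bases covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / chrom_length


def average_length(intervals):
    """Mean clone length in bases."""
    return sum(end - start for start, end in intervals) / len(intervals)


clones = [(0, 150), (100, 250), (400, 500)]  # hypothetical placements
print(coverage_fraction(clones, 1000))  # 0.35
print(average_length(clones))
```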

Finally, let's explore the maps produced for one BAC in the library: we'll retrieve all of its restriction maps, then the map for a single enzyme, and draw an image of that map.

In [None]:
name = 'RP11-815C22'
enzyme = 'FspI'
maps = bmap.getMaps(name)
print(maps)
rmap = bmap.getRestrictionMap(name,enzyme)
print(rmap)
plt = bmap.drawMap(name, enzyme)
plt.show()

-   maps, from bmap.getMaps(name), is a series of all the restriction maps for RP11-815C22
-   rmap, from bmap.getRestrictionMap(name, enzyme), holds just the FspI cut locations in RP11-815C22
-   plt, from bmap.drawMap(name, enzyme), is a visual representation of rmap
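A list of cut locations like rmap converts directly into fragment sizes, which is what a gel would show. The sketch below assumes 0-based cut positions and shows both a linear molecule and a circular one, where n cuts give n fragments and a single cut simply linearizes the molecule. The function names are hypothetical, and the sketch makes no claim about whether rmap positions are measured on the insert alone or on the full circular clone.

```python
def fragment_lengths(cuts, clone_length):
    """Fragment sizes from cutting a linear molecule at 0-based positions."""
    bounds = [0] + sorted(cuts) + [clone_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def circular_fragment_lengths(cuts, circle_length):
    """Fragment sizes from cutting a circular molecule; n cuts give n fragments."""
    cuts = sorted(cuts)
    if not cuts:
        return [circle_length]  # uncut circle
    inner = [b - a for a, b in zip(cuts, cuts[1:])]
    return inner + [circle_length - cuts[-1] + cuts[0]]  # fragment spanning the origin


print(fragment_lengths([5, 19], 24))           # [5, 14, 5]
print(circular_fragment_lengths([5, 19], 24))  # [14, 10]
print(circular_fragment_lengths([7], 24))      # [24]
```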


Next, let's find the pairs that include our BAC of interest.

In [None]:
print(bmap.findPairsFromName(name,500,0))

Instead of finding pairs for a single BAC, we can find all BAC pairs that are linearized to produce overlapping ends. We set longestoverlap, the longest acceptable overlap between the ends, to 500, and shortestoverlap, the shortest acceptable overlap, to 0. Because the minimum is 0, BACs linearized at the same site are also included. This code writes a file to the pairs folder detailing all the possible pairs.

In [None]:
bmap.makePairs(cpustouse=cpus,longestoverlap=500,shortestoverlap=0)
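Conceptually, a pairing step like this compares BACs against each other and keeps the pairs whose overlap falls inside the [shortestoverlap, longestoverlap] window. The sketch below shows only that window test. It assumes, purely for illustration, that each BAC is linearized at a single known genomic position and that the end overlap is the distance between the two sites, so BACs cut at the same site have an overlap of 0. That overlap measure, the function name and the positions are hypothetical; this is not how makePairs is known to compute overlaps.

```python
from itertools import combinations


def candidate_pairs(sites, longest_overlap=500, shortest_overlap=0):
    """Return (bac_a, bac_b, overlap) for pairs whose overlap lies in the window.

    sites maps a BAC name to the genomic position where it is linearized.
    ASSUMPTION: overlap = distance between the two linearization sites.
    """
    pairs = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(sorted(sites.items()), 2):
        overlap = abs(pos_a - pos_b)
        if shortest_overlap <= overlap <= longest_overlap:
            pairs.append((name_a, name_b, overlap))
    return pairs


sites = {"bacA": 10_000, "bacB": 10_000, "bacC": 10_300, "bacD": 20_000}  # made-up positions
print(candidate_pairs(sites))
# [('bacA', 'bacB', 0), ('bacA', 'bacC', 300), ('bacB', 'bacC', 300)]
```

With shortest_overlap set to 0, the two BACs cut at the same position (bacA and bacB) are kept, mirroring the behaviour described for shortestoverlap=0 above.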