# Locus selection
When inspecting the results of the reference-based assembly, you may find that many loci are covered by few or no reads. In that case you may want to select the loci with good coverage and use only these for downstream analyses. `secapr` provides a function for this, `locus_selection`, which extracts the loci with good coverage.

In [2]:
%%bash
source activate secapr_env
secapr locus_selection -h

usage: secapr locus_selection [-h] --input INPUT --output OUTPUT [--n N]

Extract the n loci with the best read-coverage from you reference-based
assembly (bam-files)

optional arguments:
  -h, --help       show this help message and exit
  --input INPUT    The folder with the results of the reference based assembly
                   or the phasing results.
  --output OUTPUT  The output directory where results will be safed.
  --n N            The n loci that are best represented accross all samples
                   will be extracted.


This function compiles the average read coverage for each locus and sample and selects the `n` loci with the best coverage across all samples.
You can run this function simply like this:

    secapr locus_selection --input data/processed/remapped_reads --output data/processed/selected_loci --n 30
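
To make the selection step more concrete, here is a minimal Python sketch of the idea described above. It is not the `secapr` implementation: the function name `select_best_loci`, the nested-dictionary input, and the coverage values are all hypothetical, and ranking loci by their mean coverage across samples is an assumption about what "best coverage across all samples" means.

```python
from statistics import mean


def select_best_loci(coverage, n):
    """Return the n loci with the highest mean coverage across samples.

    coverage: dict mapping sample name -> dict mapping locus name -> average
    read depth. This layout is a hypothetical stand-in for the per-locus,
    per-sample averages that the tool compiles from the bam-files.
    """
    # Collect every locus that appears in at least one sample.
    loci = sorted({locus for per_sample in coverage.values() for locus in per_sample})

    # Assumption: a locus missing from a sample counts as zero coverage there.
    mean_cov = {
        locus: mean(per_sample.get(locus, 0.0) for per_sample in coverage.values())
        for locus in loci
    }

    # Rank by descending mean coverage; ties are broken by locus name.
    ranked = sorted(loci, key=lambda locus: (-mean_cov[locus], locus))
    return ranked[:n]


# Made-up example values, for illustration only.
coverage = {
    "sample_1": {"locus_a": 12.0, "locus_b": 0.0, "locus_c": 35.5},
    "sample_2": {"locus_a": 9.5, "locus_b": 1.2, "locus_c": 28.0},
    "sample_3": {"locus_a": 14.0, "locus_c": 40.1},
}

selected = select_best_loci(coverage, n=2)
print(selected)
```

With these made-up values, `locus_b` is dropped because it has almost no reads in any sample. A different ranking criterion, such as the minimum coverage across samples, would favor loci that are well covered in every sample rather than on average.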

After running `secapr locus_selection`, you can view the results with a script in the `src/` folder. Once you have provided the correct path to your output folder, it plots the read coverage of all loci and of the set of loci selected by `secapr locus_selection`. As you can see, many loci were poorly covered in our example data (light colors):

In [7]:
%%bash
cd ../../
python src/heatmap_plot.py

In [9]:
%%html
<div>
    <a href="https://plot.ly/~tobiashofmann/48/?share_key=wC4zjzzzXVpyZ4iRjUja28" target="_blank" title="plot from API (24)" style="display: block; text-align: center;"><img src="https://plot.ly/~tobiashofmann/48.png?share_key=wC4zjzzzXVpyZ4iRjUja28" alt="plot from API (24)" style="max-width: 100%;width: 600px;"  width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="tobiashofmann:48" sharekey-plotly="wC4zjzzzXVpyZ4iRjUja28" src="https://plot.ly/embed.js" async></script>
</div>


In the set of selected loci, however, the read coverage is rather good for all or most samples:

In [10]:
%%html
<div>
    <a href="https://plot.ly/~tobiashofmann/50/?share_key=VZLFvEmzO1oJ3VGD3SUc8g" target="_blank" title="plot from API (25)" style="display: block; text-align: center;"><img src="https://plot.ly/~tobiashofmann/50.png?share_key=VZLFvEmzO1oJ3VGD3SUc8g" alt="plot from API (25)" style="max-width: 100%;width: 600px;"  width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="tobiashofmann:50" sharekey-plotly="VZLFvEmzO1oJ3VGD3SUc8g" src="https://plot.ly/embed.js" async></script>
</div>


[Previous page](reference_assembly.ipynb) | [Next page](phasing.ipynb)