# Working with bedtools

<div class="alert alert-info" style="margin: 0px">
In this exercise we will learn some of the basic functionality of bedtools.  
</div>

1. Once in JupyterLab, open a terminal, type `bedtools --help`, and familiarize yourself with the tool. 
2. In the terminal, type `bedtools intersect` and familiarize yourself with this sub-command. See also the [`bedtools intersect` documentation](https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html).  
3. We have a list of random loci in `notebooks/data/random.bed` and we want to see which genes they intersect. The genes and their coordinates are available in `notebooks/data/genes_hg38.bed`.
4. Use `less` to inspect these two files.
5. Construct a command that lists the genes intersected by the random loci. Store the results in `genes_covered.bed`.
6. Count the number of unique genes in the file produced in step 5.
7. Using Python `pandas` and `matplotlib`, create a bar chart showing the chromosome breakdown of the covered genes. If everything goes right, you should get a figure like this:  

<br>
<img style="float: left;" src="contents/bar_chart.png">

## Answers

#### Commands in terminal
```bash
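# Step 5: report genes (-a) overlapped by the random loci (-b); -wa writes the original gene entry for each overlap
bedtools intersect -a notebooks/data/genes_hg38.bed -b notebooks/data/random.bed -wa > notebooks/genes_covered.bed
# Step 6: count the unique gene names (column 4)
cut -f4 notebooks/genes_covered.bed | sort -u | wc -l
# Step 7 input: keep unique chromosome/gene pairs (columns 1 and 4) for plotting
cut -f1,4 notebooks/genes_covered.bed | sort -u > notebooks/genes_covered_unique.bed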
```
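
Because `-wa` writes a gene once for every locus that overlaps it, the same gene can appear several times in `genes_covered.bed`, which is why the commands above deduplicate with `sort -u`. The following is a minimal pure-Python sketch of that behaviour, not how bedtools is implemented. The `Interval` type, the `overlaps` and `intersect_wa` helpers, and the toy coordinates are all hypothetical; BED coordinates are treated as 0-based, half-open intervals.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A hypothetical, minimal BED record (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect_wa(a_features: List[Interval], b_features: List[Interval]) -> List[Interval]:
    """Naive all-pairs sketch: emit each A feature once per overlapping B feature."""
    return [a for a in a_features for b in b_features if overlaps(a, b)]


# Toy data, invented for illustration only.
genes = [
    Interval("chr1", 100, 500, "GENE_A"),
    Interval("chr1", 800, 900, "GENE_B"),
    Interval("chr2", 100, 500, "GENE_C"),
]
loci = [
    Interval("chr1", 150, 160, "locus1"),  # inside GENE_A
    Interval("chr1", 450, 550, "locus2"),  # overlaps the end of GENE_A
    Interval("chr2", 500, 600, "locus3"),  # starts where GENE_C ends: no shared base
]

hits = intersect_wa(genes, loci)
unique_genes = sorted({g.name for g in hits})

print([g.name for g in hits])  # GENE_A is listed twice, once per overlapping locus
print(unique_genes)            # deduplicated gene names
```

If only the presence of an overlap matters, `bedtools intersect` also has a `-u` option that writes each entry in A once when any overlap is found.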
  
#### Python script
  
```python
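import pandas as pd
import matplotlib.pyplot as plt
# IPython magic: render figures inside the notebook (Jupyter only)
%matplotlib inline

# Load the unique chromosome/gene pairs written by the terminal commands
df = pd.read_csv('genes_covered_unique.bed',
                 sep='\t',
                 header=None,
                 names=['chrom', 'gene'])

# Count covered genes per chromosome, most frequent first, and plot them
chrom_counts = df['chrom'].value_counts(sort=True)
chrom_counts.plot(kind='bar');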
```
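
The deduplication and counting steps can also be done in pandas directly, without the intermediate `genes_covered_unique.bed` file. This is a minimal sketch under stated assumptions: the toy records are invented for illustration, and the input is assumed to have exactly four columns with the gene name in column 4 (as the `cut -f4` command implies). A real file with more columns would need a matching `names` list.

```python
import io

import pandas as pd

# Toy stand-in for genes_covered.bed (hypothetical records, tab-separated).
toy_bed = io.StringIO(
    "chr1\t100\t500\tGENE_A\n"
    "chr1\t100\t500\tGENE_A\n"  # same gene reported for a second locus
    "chr1\t800\t900\tGENE_B\n"
    "chr2\t100\t500\tGENE_C\n"
)

covered = pd.read_csv(
    toy_bed,
    sep="\t",
    header=None,
    names=["chrom", "start", "end", "gene"],
)

# Equivalent of `cut -f1,4 | sort -u`: one row per chromosome/gene pair.
unique_genes = covered[["chrom", "gene"]].drop_duplicates()

# Equivalent of `cut -f4 | sort -u | wc -l`: number of distinct gene names.
n_unique = unique_genes["gene"].nunique()

# Per-chromosome counts, ready for .plot(kind="bar").
chrom_counts = unique_genes["chrom"].value_counts()

print(n_unique)
print(chrom_counts)
```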