In this article we take a first look at the first set of filtered callsets.  We count the variants in each callset, list the variants' positions, and inspect the alignment at those positions.

In [4]:
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
import synapseclient
import synapseutils

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Working locally
### Getting callsets

In [7]:
syn = synapseclient.login()

Welcome, Attila Jones!



In [8]:
e = synapseutils.sync.syncFromSynapse(syn, 'syn21897893', path='/big/results/bsm/calls/filtered')
fpaths = [f.path for f in e]
fpaths[:2]

['/big/results/bsm/calls/filtered/MSSM_106_brain.ploidy_50.filtered.vcf',
 '/big/results/bsm/calls/filtered/MSSM_109_brain.ploidy_50.filtered.vcf']

In [12]:
%%bash
cd /home/attila/projects/bsm/results/2020-06-05-filtered-callsets/filtered
for f in *vcf; do
    # print the file name, then the number of records (header lines excluded)
    echo -ne "$f\t"
    bcftools view -H "$f" | wc -l
done | sed 's/_brain.ploidy_50.filtered.vcf//'

MSSM_106	27
MSSM_109	52
MSSM_118	9
MSSM_175	19
MSSM_179	25
MSSM_183	27
MSSM_215	15
MSSM_369	28
MSSM_373	45
MSSM_391	17
PITT_010	35
PITT_064	51
PITT_091	18
PITT_118	42
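
The same count can be obtained without `bcftools`. The following is only a minimal sketch, assuming the VCF files are plain text (not bgzipped) and that every non-header line is one variant record. The function names `count_variants` and `count_all` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path


def count_variants(vcf_path):
    """Count the records (non-header lines) of an uncompressed VCF file."""
    with open(vcf_path) as handle:
        # Meta-information and header lines start with '#'; blank lines are skipped too.
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def count_all(vcf_dir, suffix="_brain.ploidy_50.filtered.vcf"):
    """Map sample name -> record count for every VCF in vcf_dir ending with suffix."""
    counts = {}
    for path in sorted(Path(vcf_dir).glob("*" + suffix)):
        sample = path.name[: -len(suffix)]  # strip the suffix, like the sed call above
        counts[sample] = count_variants(path)
    return counts
```

Calling `count_all` on the directory that holds the filtered callsets should give a dictionary keyed by sample name, comparable to the table above.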


### Getting variants' positions
Let's look at the 27 variants in the MSSM_106 callset! 

In [14]:
%%bash
cd /home/attila/projects/bsm/results/2020-06-05-filtered-callsets/
vcf=MSSM_106_brain.ploidy_50.filtered-epigen.vcf
bcftools view -H $vcf | cut -f1-2

1	43788142
1	82063741
1	166060224
2	283678
2	99427525
2	112425551
2	203909037
2	227287453
4	76273821
4	139540924
6	40164483
6	143541413
9	3374188
10	33474609
10	52554008
10	126238172
12	11925758
12	46471131
12	58483907
12	83652663
12	130493454
15	61039233
16	1537659
16	4509189
16	64236891
19	24279784
22	43977690
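
For further processing inside the notebook, the positions can also be read directly in Python. This is a sketch under the assumption that the VCF is uncompressed and tab-delimited, with CHROM and POS as the first two columns, as the VCF format specifies. `variant_positions` is a hypothetical name, not part of any library.

```python
def variant_positions(vcf_path):
    """Return a list of (chrom, pos) tuples, one per record of an uncompressed VCF."""
    positions = []
    with open(vcf_path) as handle:
        for line in handle:
            # Skip blank lines and the '#'-prefixed header.
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos = line.split("\t", 2)[:2]
            positions.append((chrom, int(pos)))
    return positions
```

The returned list preserves the record order of the file, so `positions[1]` and `positions[-1]` would be the second and the last variant.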


## Working on Ada

Below I will demonstrate the following operations:
1. start an SSH session on Ada
1. get position of variants from a VCF
1. view variants in the corresponding BAM file

### Connecting to Ada

```
[local machine]$ ssh <username>@ada.1470mad.mssm.edu
```

### Getting positions from a VCF
The filtered callsets are in `/projects/bsm/attila/results/2020-06-05-filtered-callsets`.  To list the variant positions in the callset for the MSSM_106_NeuN_pl sample:

```
[Ada]$ vcfdir=/projects/bsm/attila/results/2020-06-05-filtered-callsets
[Ada]$ vcf=$vcfdir/MSSM_106_brain.ploidy_50.filtered.vcf
[Ada]$ bcftools view -H $vcf | cut -f1-2
1	43788142
1	82063741
[...]
22	43977690
```

### Viewing variants in a BAM file
This is done with the script `viewvar`. Its usage is given by the `-h` switch:

In [15]:
%%bash
viewvar -h

viewvar [-l|-c|-h] chr pos bam


The `-l` switch places the variant at the left edge of the alignment view, whereas the `-c` switch centers the variant (this is the default behavior, so `-c` may be omitted).
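
The difference between the two switches comes down to the leftmost coordinate of the view. The sketch below illustrates only that arithmetic and is not taken from the `viewvar` source. The function name `view_start` is hypothetical, and the 80-column width is an assumption based on the example views in this article, which show 40 reference bases to the left of the variant.

```python
def view_start(pos, centered=True, width=80):
    """Leftmost coordinate of a `width`-column view that shows the variant at `pos`."""
    if centered:
        # Leave width // 2 columns to the left of the variant; coordinates are 1-based.
        return max(1, pos - width // 2)
    # Left-aligned view: the variant itself is the first column.
    return pos
```

With these assumptions, `view_start(82063741)` gives 82063701, which agrees with the first ruler label of the centered view of that variant.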

Now let's look at the second and the last variant in the VCF!  These are (chrom pos):

* 1 82063741
* 22 43977690

```
[local machine]$ ssh <username>@ada.1470mad.mssm.edu
[Ada]$ bam=/projects/bsm/alignments/MSSM_106/MSSM_106_NeuN_pl.bam
[Ada]$ viewvar 1 82063741 $bam | less
[Ada]$ viewvar 22 43977690 $bam | less
```

The output should be this...

In [16]:
%%bash
cd ~/projects/bsm/results/2020-06-05-filtered-callsets/varviews
cat MSSM_106_NeuN_pl-1_82063741

1:82063741 in /projects/bsm/alignments/MSSM_106/MSSM_106_NeuN_pl.bam

82063701  82063711  82063721  82063731  |8|2063741  82063751  82063761            
CCACCTATCTTTGGTTGAAGTCTGCCTCCAGAGGTAACAA|C|TGTAGTGTACTTCTGGTTTGCA*TTTTTTGGTAAACTGA
........................................|.|...................... ................
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,|,|,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,
,.......................................|.|......................*................
, ............................A.........|.|......................*................
.... ,,g,,,,,,,,,,,,g,g,,,,,,,,,,,,,g,,,|,|,,,,,,,,,,,,,g,,g,,,,,*,,,,,,,,,,,,,,,,
.....  .................................|.|......................*................
,,,,,, .................................|.|......................*................
....... ................................|.|......................*................
,,,,,,,,,    ...........................|.|......................*................
..........       

...and this

In [17]:
%%bash
cd ~/projects/bsm/results/2020-06-05-filtered-callsets/varviews
cat MSSM_106_NeuN_pl-22_43977690

22:43977690 in /projects/bsm/alignments/MSSM_106/MSSM_106_NeuN_pl.bam

 43977651  43977661  43977671  43977681 | |43977691  43977701  43977711           
CACAGCTGCAGAGCCGCCTTGCACAGGCTCTGCGTCGGGT|C|GGCTTGCATGATGGAGACACCAAGGAAAAGGACAATCAA
........................................|.|.......................................
,,,,,,,,a,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,|,|,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,cc
,,,.....................................|.|.......................................
........................................|.|.......................................
.... ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,|,|,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
......,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,|,|,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,|,|,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,|,|,,,,,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,
.........   ............................|.|.......................................
,,,,,,,,,    ,,,

