<p style='text-align: justify;'>Pipeline to download selected pophuman.uab.cat statistics and convert them to BedGraph format, so that the raw data can be loaded as DataFrames. It uses the Python kernel to execute bash commands.

The pipeline requires the default [Python kernel](https://github.com/ipython/ipykernel) in order to use the ***%%bash*** [built-in magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html).</p>

## Download raw tracks from <a href="https://pophuman.uab.cat">PopHuman</a>, recombination maps from <a href="https://www.nature.com/articles/ncomms14994">Bhérer et al. (2017)</a>, <a href="http://benhaller.com/slim/SLiM.zip">SLiM software</a> and the <a href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph">bigWigToBedGraph script</a>

In [None]:
%%bash 
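# wget -r mirrors into ./pophuman.uab.cat/files/wig; later cells read ~/pophuman.uab.cat/files/wig,
# so this cell is assumed to run from the home directory
wget -r http://pophuman.uab.cat/files/wig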
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph
wget http://promoter.bx.psu.edu/hi-c/downloads/hg19.TADs.zip 
wget http://www.oreganno.org/dump/ORegAnno_Combined_2016.01.19.tsv
# http://www.cell.com/cell/fulltext/S0092-8674(14)01497-4?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867414014974%3Fshowall%3Dtrue
wget https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination/raw/master/Refined_genetic_map_b37.tar.gz 
# https://www.nature.com/articles/ncomms14994
wget http://benhaller.com/slim/SLiM.zip
chmod +x bigWigToBedGraph


## Creating project structure and needed folders

In [None]:
%%bash
###################################################
######################FOLDERS######################
###################################################
mkdir -p ~/pophuman.uab.cat/Data
mkdir -p ~/pophuman.uab.cat/Data/bedGraph/
mkdir -p ~/pophuman.uab.cat/Data/bedGraph/10kb
mkdir -p ~/pophuman.uab.cat/Data/bedGraph/100kb
mkdir -p ~/pophuman.uab.cat/Data/wig/10kb
mkdir -p ~/pophuman.uab.cat/Data/wig/100kb
mkdir -p ~/pophuman.uab.cat/Data/TADs
mkdir -p ~/pophuman.uab.cat/Data/Fst/
mkdir -p ~/pophuman.uab.cat/Data/Fst/CEU2CHB
mkdir -p ~/pophuman.uab.cat/Data/Fst/CEU2YRI
mkdir -p ~/pophuman.uab.cat/Data/Fst/YRI2CHB
mkdir -p ~/pophuman.uab.cat/Data/simulations1000GP/

mkdir -p ~/pophuman.uab.cat/Results/

mkdir -p ~/pophuman.uab.cat/Results/Plots
mkdir -p ~/pophuman.uab.cat/Results/Plots/Top50
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Populations
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Populations/10kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Populations/100kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Populations/10-100kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr/10kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr/100kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr/10-100kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Distributions
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Distributions/10kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Distributions/100kb
mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Recomb

mkdir -p ~/pophuman.uab.cat/Results/10kb
mkdir -p ~/pophuman.uab.cat/Results/10kb/tabFiles
mkdir -p ~/pophuman.uab.cat/Results/10kb/Features/
mkdir -p ~/pophuman.uab.cat/Results/10kb/Pvalues
mkdir -p ~/pophuman.uab.cat/Results/10kb/Pvalues/Significatives
mkdir -p ~/pophuman.uab.cat/Results/100kb
mkdir -p ~/pophuman.uab.cat/Results/100kb/tabFiles

STAT='Tajima_D FayWu_H FuLi_D FuLi_F iHS S theta Pi XPEHH Fst'

for i in $STAT
do 
    mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr/10kb/$i
    mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr/100kb/$i
    mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Chr/10-100kb/$i
    mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Distributions/10kb/$i
    mkdir -p ~/pophuman.uab.cat/Results/Plots/Features/Distributions/100kb/$i
done

###################################################
###################MOVING FILES####################
###################################################

mv hg19.TADs.zip ~/pophuman.uab.cat/Data/TADs
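# Extract next to the archive instead of into the current directory
unzip ~/pophuman.uab.cat/Data/TADs/hg19.TADs.zip -d ~/pophuman.uab.cat/Data/TADs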
mv ORegAnno_Combined_2016.01.19.tsv ~/pophuman.uab.cat/Data
mv Refined_genetic_map_b37.tar.gz ~/pophuman.uab.cat/Data
# Extract next to the archive instead of into the current directory
tar -zxvf ~/pophuman.uab.cat/Data/Refined_genetic_map_b37.tar.gz -C ~/pophuman.uab.cat/Data

## Converting raw BigWig files to BedGraph using the <a href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph">bigWigToBedGraph</a> script from UCSC in order to use them as DataFrames
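
The conversion cells below pick files with `ls | grep`, which hands several file names to a single bigWigToBedGraph call whenever a pattern matches more than once. As a minimal sketch of an alternative, the loop here iterates over a glob and derives each output path from its input. The helper `bedgraph_path`, the temporary directory and the file names `Tajima_D_CEU_10kb.bw` and `Tajima_D_YRI_10kb.bw` are assumptions made for illustration, and the command is only echoed rather than executed.

```shell
# Hypothetical helper: map an input .bw path ($1) to a .bedGraph path inside $2.
bedgraph_path() {
    # Keep everything before the first dot, as `cut -d'.' -f1` does in the cells below
    printf '%s/%s.bedGraph\n' "$2" "$(basename "$1" | cut -d'.' -f1)"
}

# Dry run on an illustrative directory: print the commands instead of running them
indir=$(mktemp -d)
touch "$indir/Tajima_D_CEU_10kb.bw" "$indir/Tajima_D_YRI_10kb.bw"
for f in "$indir"/*_10kb.bw; do
    echo ./bigWigToBedGraph "$f" "$(bedgraph_path "$f" /tmp/out)"
done
rm -r "$indir"
```

Dropping the `echo` would run the real conversion, one input file per call.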

### 10KB windows

In [None]:
%%bash
STAT='Tajima_D FayWu_H FuLi_D FuLi_F iHS S theta Pi XPEHH'
POP='YRI LWK GWD MSL ESN ACB ASW CHB JPT CHS CDX KHV CEU TSI FIN GBR IBS GIH PJL BEB STU ITU MXL PEL PUR CLM'

for j in $STAT
do
        DATA=~/pophuman.uab.cat/files/wig
        mkdir -p ~/pophuman.uab.cat/Data/wig/10kb/$j
        DATAWIG=~/pophuman.uab.cat/Data/wig/10kb/$j
        ls $DATA | grep -P "^$j\_"  | grep '10kb.bw' | while read i; do cp $DATA/$i $DATAWIG;done
done

for j in $STAT
do   
    echo $j 
    DATAWIG=~/pophuman.uab.cat/Data/wig/10kb/$j
    mkdir -p ~/pophuman.uab.cat/Data/bedGraph/10kb/$j
    OUTPUT=~/pophuman.uab.cat/Data/bedGraph/10kb/$j
    if [ $j = 'XPEHH' ]
    then
        POP='CEU2YRI CHB2CEU CHB2YRI'
    fi
    for i in $POP
        do 
        echo "+++++++++++++++++++$i+++++++++++++++++++"
        var=$(ls $DATAWIG | grep $i | grep '10kb' | cut -d'.' -f1)
        file=$(ls $DATAWIG | grep $i | grep '10kb')
        ./bigWigToBedGraph $DATAWIG/$file $OUTPUT/$var.bedGraph
        done
done


### 100KB windows

In [None]:
%%bash
STAT='Tajima_D FayWu_H FuLi_D FuLi_F S theta Pi'
POP='YRI LWK GWD MSL ESN ACB ASW CHB JPT CHS CDX KHV CEU TSI FIN GBR IBS GIH PJL BEB STU ITU MXL PEL PUR CLM'

for j in $STAT
do
        DATA=~/pophuman.uab.cat/files/wig
        mkdir -p ~/pophuman.uab.cat/Data/wig/100kb/$j
        DATAWIG=~/pophuman.uab.cat/Data/wig/100kb/$j
        ls $DATA | grep -P "^$j\_"  | grep '100kb.bw' | while read i; do cp $DATA/$i $DATAWIG;done
done

for j in $STAT
do
    echo $j
    DATAWIG=~/pophuman.uab.cat/Data/wig/100kb/$j
    mkdir -p ~/pophuman.uab.cat/Data/bedGraph/100kb/$j
    OUTPUT=~/pophuman.uab.cat/Data/bedGraph/100kb/$j
   
    for i in $POP
        do 
        echo "+++++++++++++++++++$i+++++++++++++++++++"
        var=$(ls $DATAWIG | grep $i | grep '100kb' | cut -d'.' -f1)
        file=$(ls $DATAWIG | grep $i | grep '100kb')
        ./bigWigToBedGraph $DATAWIG/$file $OUTPUT/$var.bedGraph
        done
done

### Cleaning transformed bedGraph to proper format

<p style='text-align: justify;'>bigWigToBedGraph merges contiguous windows that share the same value into a single interval. To recover unique 10 kb or 100 kb windows, every interval longer than the window size is split into consecutive windows of that size.</p>
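
The splitting step can be sketched on a toy input. The helper `split_windows` is hypothetical and not part of the original pipeline; like the awk one-liners in the cells that follow, it emits one window per full multiple of the window size and drops any shorter remainder.

```shell
# Hypothetical helper: split merged bedGraph intervals into fixed-size windows.
# Usage: split_windows WINDOW_SIZE < input.bedGraph
split_windows() {
    awk -v w="$1" -v OFS='\t' '{
        # One output line per full window of size w inside [start, end)
        for (s = $2; s + w <= $3; s += w) print $1, s, s + w, $4
    }'
}

# Toy input: one merged 30 kb interval followed by a regular 10 kb window
printf 'chr1\t0\t30000\t0.5\nchr1\t30000\t40000\t-1.2\n' | split_windows 10000
```

The 30 kb interval becomes three 10 kb windows that all carry the value 0.5, and the interval that is already 10 kb long passes through unchanged.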

#### 10KB windows

In [None]:
%%bash
STAT='Tajima_D FayWu_H FuLi_D FuLi_F iHS S theta Pi XPEHH'

DATABEDGRAPH=~/pophuman.uab.cat/Data/bedGraph/10kb

for j in $STAT
do 
    mkdir -p $DATABEDGRAPH/$j/old
    OLDBEDGRAPH=$DATABEDGRAPH/$j/old
    OUTPUT=~/pophuman.uab.cat/Data/bedGraph/10kb/$j
    
    mv $DATABEDGRAPH/$j/*.bedGraph $OLDBEDGRAPH/
    
    echo $j
    for i in $(ls $OLDBEDGRAPH/ | grep 10kb)
    do
        echo "+++++++++++++++++++$i+++++++++++++++++++"
        # var=$(echo $j | cut -d'.' -f1 )
        var=$(echo $i | cut -d'.' -f1 | sed 's/_old/;/g' | cut -d';' -f1)
        #echo $var
        awk '{if ($3-$2 != 10000) for (i = 1; i <= (($3-$2)/10000); ++i) if ( i == 1) print $1,$2,$2+(i*10000),$4; else print $1,$2+((i-1)*10000),$2+(((i-1)*10000)+10000),$4; else print $0}' $OLDBEDGRAPH/$i | tr ' ' '\t' > $OUTPUT/$var.bedGraph         
    done
done


#### 100KB windows

In [None]:
%%bash
STAT='Tajima_D FayWu_H FuLi_D FuLi_F S theta Pi'

DATABEDGRAPH=~/pophuman.uab.cat/Data/bedGraph/100kb

for j in $STAT
do 
    mkdir -p $DATABEDGRAPH/$j/old
    OLDBEDGRAPH=$DATABEDGRAPH/$j/old
    OUTPUT=~/pophuman.uab.cat/Data/bedGraph/100kb/$j
    
    mv $DATABEDGRAPH/$j/*.bedGraph $OLDBEDGRAPH/
    
    echo $j
    for i in $(ls $OLDBEDGRAPH/ | grep 100kb)
    do
        echo "+++++++++++++++++++$i+++++++++++++++++++"
        # var=$(echo $j | cut -d'.' -f1 )
        var=$(echo $i | cut -d'.' -f1 | sed 's/_old/;/g' | cut -d';' -f1)
        #echo $var
        awk '{if ($3-$2 != 100000) for (i = 1; i <= (($3-$2)/100000); ++i) if ( i == 1) print $1,$2,$2+(i*100000),$4; else print $1,$2+((i-1)*100000),$2+(((i-1)*100000)+100000),$4; else print $0}' $OLDBEDGRAPH/$i | tr ' ' '\t' > $OUTPUT/$var.bedGraph
    done
done


## Download other data to perform the analysis

In [None]:
%%bash
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/GRCh37_mapping/gencode.v27lift37.basic.annotation.gff3.gz
mv gencode.v27lift37.basic.annotation.gff3.gz ~/pophuman.uab.cat/Data/
gunzip ~/pophuman.uab.cat/Data/gencode.v27lift37.basic.annotation.gff3.gz


# Gene annotation and gene density
grep -P "\tgene\t" ~/pophuman.uab.cat/Data/gencode.v27lift37.basic.annotation.gff3| tr ';' '\t' | cut -f1,10 > ~/pophuman.uab.cat/Data/gencode.genes.txt
# Anchor the pattern on the first column so that chr1 does not also count chr10-chr19
for i in {1..22} X; do printf 'chr%s\t%s\n' "$i" "$(grep -c -P "^chr$i\t" ~/pophuman.uab.cat/Data/gencode.genes.txt)"; done > ~/pophuman.uab.cat/Data/gene.chr.density.txt


# Enhancer annotations
wget -P ~/pophuman.uab.cat/Data http://slidebase.binf.ku.dk/human_enhancers/presets/serve/enhancer_tss_associations
## BROKEN: cut -f1,2,3,4 enhancer_tss_associations | tr ';' '\t' | cut -f4,6 | tr ':' '\t' | tr '-' '\t' > ~/pophuman.uab.cat/Results/10kb/enhancersTssAsso.tab

wget https://static-content.springer.com/esm/art%3A10.1038%2Fs41559-018-0478-6/MediaObjects/41559_2018_478_MOESM4_ESM.xlsx # Voight iHS, latest data
mv 41559_2018_478_MOESM4_ESM.xlsx ~/pophuman.uab.cat/Data
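
When counting genes per chromosome, an unanchored pattern such as `chr1` also matches `chr10` to `chr19`. A minimal sketch of a single-pass alternative that groups on the exact value of the first column is shown here; the helper `count_per_chr` and the toy file standing in for `gencode.genes.txt` are assumptions for illustration.

```shell
# Hypothetical helper: count rows per chromosome (first tab-separated column) in one pass.
count_per_chr() {
    awk -F'\t' -v OFS='\t' '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}

# Toy stand-in for gencode.genes.txt (chromosome, gene_id)
tmp=$(mktemp)
printf 'chr1\tgene_id=A\nchr10\tgene_id=B\nchr10\tgene_id=C\nchrX\tgene_id=D\n' > "$tmp"
count_per_chr "$tmp"
rm -f "$tmp"
```

Because the counts are keyed on the whole column value, `chr1` and `chr10` are tallied separately without any regular expression.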