## Install plink
We recommend installing PLINK with conda, using the standard commands listed here:
- https://anaconda.org/bioconda/plink

PLINK is also available from the following locations:
- https://www.cog-genomics.org/plink2/
- https://github.com/chrchang/plink-ng

Once PLINK is installed, continue with the code chunks below.

In [None]:
%%bash
plink --help | head

In [None]:
%%bash

# Use the code below to download files for chromosomes of interest


# List all available genotype call files and write their names (without file extensions) to all_snp_files.txt
# The file all_snp_files.txt will appear in your environment (click on the folder icon on the left of your screen to view files in your environment)
dx find data --name "ukb22418_*.bed" --path "/Bulk/Genotype Results/Genotype calls/" --delim ',' | awk -F, '{ print $4 }' |awk -F'/' '{print $5}' | awk -F'.' '{print $1}'  > all_snp_files.txt


# The code below is an example of how to download data for chromosome 1, chromosome 4 and chromosome X
# To modify it, replace the 1, 4 and X in the following expression '^ukb22418_c[14X]_' with your chromosomes of interest
# For example, if you are interested in SNPs on chromosomes 5, 6, 8 and 9, use: '^ukb22418_c[5689]_'
# For a double-digit chromosome, e.g. chromosome 12, use the following format: '^ukb22418_c12_'
# For mixed double-digit and single-digit chromosomes, e.g. chromosomes 8 and 12, a bracket expression is not enough
# ('[18][2]' matches 12 and 82 but not 8); use grep -E with alternation instead: grep -E '^ukb22418_c(8|12)_'
# To include all chromosomes, change the expression to '^ukb22418_c' (in a regular expression 'c*' means "zero or more c", not "any chromosome")



files=$(grep '^ukb22418_c[14X]_' all_snp_files.txt) 
echo $files 




# Download files containing chromosomes of interest

for i in $files; do
    dx download "/Bulk/Genotype Results/Genotype calls/${i}*"
done



# The downloaded files will appear in your environment. There should be a bed, bim and fam file for each chromosome containing SNPs of interest.
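
The chromosome-selection expressions described in the comments above can be tried out on a small list before any data is downloaded. The sketch below is only an illustration: the file stems are stand-ins for the contents of all_snp_files.txt, and `demo_list`, `sel_8_12` and `sel_bracket` are hypothetical names that are not part of the workflow.

```bash
# Illustrative file stems only; the real list is written to all_snp_files.txt
demo_list=$(mktemp)
printf '%s\n' ukb22418_c1_b0_v2 ukb22418_c8_b0_v2 ukb22418_c12_b0_v2 \
    ukb22418_c18_b0_v2 ukb22418_cX_b0_v2 > "$demo_list"

# Chromosomes 8 and 12: alternation needs extended regular expressions (-E)
sel_8_12=$(grep -E '^ukb22418_c(8|12)_' "$demo_list")
echo "$sel_8_12"

# The bracket form '[18][2]' only matches two-character codes, so it misses chromosome 8
sel_bracket=$(grep '^ukb22418_c[18][2]_' "$demo_list")
echo "$sel_bracket"
```

If the alternation form is used to select files, the same `grep -E` expression would also be needed wherever the selection is repeated later in the notebook.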



In [None]:
%%bash

# Define files to merge. Please replace the '^ukb22418_c[14X]_' expression in the code below with the expression used to define chromosomes in the previous code chunk. 

grep '^ukb22418_c[14X]_' all_snp_files.txt > files_to_merge.txt
cat files_to_merge.txt
plink --merge-list files_to_merge.txt --make-bed --out genotyping_merged
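
Because each line of files_to_merge.txt is a fileset prefix, the merge step expects a .bed, .bim and .fam file for every prefix to be present in the working directory. A minimal sketch of a pre-merge check follows; `check_filesets` is a hypothetical helper written for this illustration, not a PLINK or dx command.

```bash
# Hypothetical helper: verify that every prefix in a list has .bed, .bim and .fam files
check_filesets() {
    local list="$1" missing=0 prefix ext
    while IFS= read -r prefix; do
        # Skip empty lines
        [ -z "$prefix" ] && continue
        for ext in bed bim fam; do
            if [ ! -f "${prefix}.${ext}" ]; then
                echo "missing: ${prefix}.${ext}" >&2
                missing=1
            fi
        done
    done < "$list"
    # Returns 0 when all files are present, 1 otherwise
    return "$missing"
}
```

One possible use is to run `check_filesets files_to_merge.txt` before the `plink --merge-list` command, and to repeat the download step for any file reported as missing.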

In [None]:
%%bash
# Replace the rsIDs in the code below with your SNPs; always separate each rsID with a comma.
plink --bfile genotyping_merged --snps rs28659788,rs116587930 --recode A --out snp_ind_plink_results
dx upload snp_ind_plink_results.raw
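
The `--recode A` option writes a space-delimited .raw file with six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE) followed by one allele-count column per variant, named as the variant ID plus the counted allele. The sketch below shows one way to inspect such a file. The toy file, its alleles and its genotype values are invented for illustration, and `raw`, `variant_cols` and `n_samples` are hypothetical names.

```bash
# Toy .raw file for illustration only; alleles and genotypes are made up
raw=$(mktemp)
cat > "$raw" <<'EOF'
FID IID PAT MAT SEX PHENOTYPE rs28659788_G rs116587930_A
1001 1001 0 0 1 -9 0 1
1002 1002 0 0 2 -9 2 NA
EOF

# Variant columns start at field 7 of the header line
variant_cols=$(head -n 1 "$raw" | cut -d' ' -f7-)
echo "$variant_cols"

# Every line after the header is one participant
n_samples=$(( $(wc -l < "$raw") - 1 ))
echo "$n_samples"
```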

In [None]:
%%bash

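# Download your list of SNPs from the project; the file should contain one variant ID (rsID) per line.
# The --extract option below keeps only the SNPs named in that file.
dx download -f "snps_list.txt"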
plink --bfile genotyping_merged --extract snps_list.txt --recode A --out snp_list_plink_results
dx upload snp_list_plink_results.raw
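
The file passed to `--extract` is a plain text list of variant IDs. A minimal sketch of how such a list could be written is shown below, reusing the two rsIDs from the earlier example; `snp_list` and `n_snps` are hypothetical names, and a temporary file is used so that nothing in the project is overwritten.

```bash
# Write one rsID per line to a temporary file (stand-in for snps_list.txt)
snp_list=$(mktemp)
printf '%s\n' rs28659788 rs116587930 > "$snp_list"

# Count unique, non-empty IDs as a quick sanity check
n_snps=$(grep -v '^$' "$snp_list" | sort -u | grep -c '')
echo "$n_snps"
```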


In [None]:
%%bash
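# Extract all SNPs in a region of one chromosome; change --chr and the bounds to your region of interest.
# --from-kb and --to-kb are given in kilobases (use --from-bp and --to-bp for base pairs).
# Note: 30000000 kb is larger than chromosome 1, so the bounds as written select the whole chromosome.
plink --bfile genotyping_merged --chr 1 --from-kb 0 --to-kb 30000000 --recode A --out snp_region_plink_results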
dx upload snp_region_plink_results.raw

In [None]:
%%bash

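# Download your list of genomic regions from the project.
# Each line of the file should contain: chromosome, start position (bp), end position (bp), and a region name.
dx download -f "genomic_regions.txt"

plink --bfile genotyping_merged --extract range genomic_regions.txt --recode A --out snp_region_list_plink_results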

dx upload snp_region_list_plink_results.raw
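
A range file for `--extract range` lists one region per line: chromosome code, start position in base pairs, end position in base pairs (inclusive), and a set ID. The sketch below writes a small example and runs a basic check on it. The coordinates and region names are invented for illustration, and `regions` and `bad_lines` are hypothetical names.

```bash
# Illustrative regions only: chromosome, start (bp), end (bp, inclusive), set ID
regions=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    1 1000000 2000000 region_A \
    4 500000 750000 region_B > "$regions"

# Count lines with fewer than four fields or with start greater than end
bad_lines=$(awk 'NF < 4 || $2 > $3 { n++ } END { print n + 0 }' "$regions")
echo "$bad_lines"
```

A non-zero count would suggest fixing the file before passing it to PLINK.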
