**Genome-wide assessment identifies novel runs of homozygosity linked to Parkinson’s disease etiology across diverse ancestral populations**

---

*   **Project:** Runs of Homozygosity (ROH) Analysis in Parkinson's Disease -- ROH Mapping and Functional Enrichment in Parkinson's Disease
*   **Version:** Python/3.9
*   **Authors**: Kathryn Step, Carlos F. Hernández, Esraa Eltaraifee, Ana Jimena Hernández-Medrano, Pin-Jui Kung, Miriam Ostrožovičová, Alexandra Zirra, Zih-Hua Fang and Sara Bandres-Ciga
*   **Estimated Computation and Runtime**:
   *   **Estimated Specifications:**  1 CPU, 3.75 GB Memory, 50 GB Persistent Disk Size
   *   **Estimated Runtime:** 90 min
*   **Status:** COMPLETE
*   **Started:** 26-AUG-2023
*   **Last Updated:** 24-NOV-2024
   *   **Update Description:** Incorporated functional annotations for ROH regions and prioritized candidate genes

---

**Notebook Overview**  
*   Fine-mapping ROH regions with overlapping alleles
*   Identifying homozygous variants using VCF files
*   Extracting and annotating functional impacts of variants
*   Performing prioritization based on clinical and conservation scores


---

**Note:**
This notebook presents the initial analysis for the American Admixed (AMR) ancestry group. The same analysis pipeline was repeated for the following additional ancestry groups:  
- **African Admixed (AAC)**  
- **African (AFR)**  
- **Ashkenazi Jewish (AJ)**  
- **American Admixed (AMR)**  
- **Central Asian (CAS)**  
- **East Asian (EAS)**  
- **European (EUR)**  
- **South Asian (SAS)**  

While only the AMR results are shown here for clarity and to minimize redundancy, the methodology and scripts are identical across all groups. Please refer to the project documentation for a comprehensive overview.  

---


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item">
    <li><span><a href="#Getting-Started" data-toc-modified-id="Getting-Started-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Getting Started</a></span></li>
    <li><span><a href="#Create-Working-Directory" data-toc-modified-id="Create-Working-Directory-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Create Working Directory</a></span></li>
    <li><span><a href="#Homozygosity-mapping" data-toc-modified-id="Homozygosity-mapping-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Homozygosity Mapping</a></span>
        <ul class="toc-item">
            <li><span><a href="#Perform-QC-before-relationship-inference-with-KING" data-toc-modified-id="Perform-QC-before-relationship-inference-with-KING-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Perform QC before relationship inference with KING</a></span></li>
            <li><span><a href="#Update-Family-ID-and-prune-samples" data-toc-modified-id="Update-Family-ID-and-prune-samples-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Update Family ID (FID) as IID and prune samples with missing phenotype</a></span></li>
            <li><span><a href="#Relationship-inference-using-KING" data-toc-modified-id="Relationship-inference-using-KING-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Relationship inference using KING</a></span></li>
            <li><span><a href="#Update-Family-ID-based-on-relationship-inference" data-toc-modified-id="Update-Family-ID-based-on-relationship-inference-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Update Family ID based on relationship inference</a></span></li>
            <li><span><a href="#Check-for-inbred-individuals" data-toc-modified-id="Check-for-inbred-individuals-3.5"><span class="toc-item-num">3.5&nbsp;&nbsp;</span>Check for inbred individuals</a></span>
                <ul class="toc-item">
                    <li><span><a href="#Optional:-retain-individuals-with-F>0.0884" data-toc-modified-id="Optional:-retain-individuals-with-F>0.0884-3.5.1"><span class="toc-item-num">3.5.1&nbsp;&nbsp;</span>Optional: retain individuals with F>0.0884 to check overlapped ROH by KING</a></span></li>
                </ul>
            </li>
            <li><span><a href="#Find-ROH-overlaps-that-likely-segregate-within-families" data-toc-modified-id="Find-ROH-overlaps-that-likely-segregate-within-families-3.6"><span class="toc-item-num">3.6&nbsp;&nbsp;</span>Find ROH overlaps that likely segregate within families</a></span>
                <ul class="toc-item">
                    <li><span><a href="#Check-for-overlapping-allele-matched-ROH-within-families" data-toc-modified-id="Check-for-overlapping-allele-matched-ROH-within-families-3.6.1"><span class="toc-item-num">3.6.1&nbsp;&nbsp;</span>Check for overlapping allele-matched ROH within families</a></span></li>
                    <li><span><a href="#Fine-mapping" data-toc-modified-id="Fine-mapping-3.6.2"><span class="toc-item-num">3.6.2&nbsp;&nbsp;</span>Fine mapping (if high penetrance variants were found)</a></span></li>
                </ul>
            </li>
            <li><span><a href="#Find-ROH-overlaps-between-inbred-individuals" data-toc-modified-id="Find-ROH-overlaps-between-inbred-individuals-3.7"><span class="toc-item-num">3.7&nbsp;&nbsp;</span>Find ROH overlaps between inbred individuals</a></span>
                <ul class="toc-item">
                    <li><span><a href="#Subset-samples" data-toc-modified-id="Subset-samples-3.7.1"><span class="toc-item-num">3.7.1&nbsp;&nbsp;</span>Subset samples</a></span></li>
                    <li><span><a href="#Check-consensus-ROH" data-toc-modified-id="Check-consensus-ROH-3.7.2"><span class="toc-item-num">3.7.2&nbsp;&nbsp;</span>Check consensus ROH (cROH)</a></span></li>
                    <li><span><a href="#Find-ROH-enriched-in-inbred-cases" data-toc-modified-id="Find-ROH-enriched-in-inbred-cases-3.7.3"><span class="toc-item-num">3.7.3&nbsp;&nbsp;</span>Find ROH enriched in inbred cases (test the association using a logistic model)</a></span>
                        <ul class="toc-item">
                            <li><span><a href="#Consensus-ROH-filtered" data-toc-modified-id="Consensus-ROH-filtered-3.7.3.1"><span class="toc-item-num">3.7.3.1&nbsp;&nbsp;</span>Consensus ROH filtered</a></span></li>
                            <li><span><a href="#In-PD-genes" data-toc-modified-id="In-PD-genes-3.7.3.2"><span class="toc-item-num">3.7.3.2&nbsp;&nbsp;</span>In PD genes</a></span></li>
                        </ul>
                    </li>
                </ul>
            </li>
            <li><span><a href="#Find-overlaps-between-EOPD" data-toc-modified-id="Find-overlaps-between-EOPD-3.8"><span class="toc-item-num">3.8&nbsp;&nbsp;</span>Find overlaps between EOPD</a></span>
                <ul class="toc-item">
                    <li><span><a href="#Subset-EOPD-+-Controls" data-toc-modified-id="Subset-EOPD-+-Controls-3.8.1"><span class="toc-item-num">3.8.1&nbsp;&nbsp;</span>Subset EOPD + Controls</a></span></li>
                    <li><span><a href="#Check-consensus-ROH" data-toc-modified-id="Check-consensus-ROH-3.8.2"><span class="toc-item-num">3.8.2&nbsp;&nbsp;</span>Check consensus ROH (cROH)</a></span></li>
                    <li><span><a href="#Find-ROH-enriched-in-inbred-cases" data-toc-modified-id="Find-ROH-enriched-in-inbred-cases-3.8.3"><span class="toc-item-num">3.8.3&nbsp;&nbsp;</span>Find ROH enriched in inbred cases (test the association using logistic model)</a></span>
                        <ul class="toc-item">
                            <li><span><a href="#Consensus-ROH-filtered" data-toc-modified-id="Consensus-ROH-filtered-3.8.3.1"><span class="toc-item-num">3.8.3.1&nbsp;&nbsp;</span>Consensus ROH filtered</a></span></li>
                            <li><span><a href="#In-PD-genes" data-toc-modified-id="In-PD-genes-3.8.3.2"><span class="toc-item-num">3.8.3.2&nbsp;&nbsp;</span>In PD genes</a></span></li>
                        </ul>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
    <li><span><a href="#Fine-mapping-loop" data-toc-modified-id="Fine-mapping-loop-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Fine mapping loop (all)</a></span>
        <ul class="toc-item">
            <li><span><a href="#Download-pools" data-toc-modified-id="Download-pools-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Download pools</a></span></li>
            <li><span><a href="#Fine-mapping-in-shared-alleles-ROH" data-toc-modified-id="Fine-mapping-in-shared-alleles-ROH-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Fine mapping in shared alleles ROH (subset ROH regions shared)</a></span>
                <ul class="toc-item">
                     <li><span><a href="#Select-individuals-that-share-alleles" data-toc-modified-id="Select-individuals-that-share-alleles-4.2.1"><span class="toc-item-num">4.2.1&nbsp;&nbsp;</span>Select individuals that share alleles</a></span></li>
                    <li><span><a href="#Select-homozygous-variants-and-extract-CSQ-from-the-VCF-file" data-toc-modified-id="Select-homozygous-variants-and-extract-CSQ-from-the-VCF-file-4.2.2"><span class="toc-item-num">4.2.2&nbsp;&nbsp;</span>Select homozygous variants and extract CSQ from the VCF file</a></span></li>
                       <li><span><a href="#Prioritization-criteria" data-toc-modified-id="Prioritization-criteria-4.2.2.1"><span class="toc-item-num">4.2.2.1&nbsp;&nbsp;</span>Prioritization criteria</a></span></li>
                </ul>
            </li>
            <li><span><a href="#Select-homozygous-variants-and-extract-CSQ-from-the-VCF-file-with-unique-samples" data-toc-modified-id="Select-homozygous-variants-and-extract-CSQ-from-the-VCF-file-with-unique-samples-4.2.3"><span class="toc-item-num">4.2.3&nbsp;&nbsp;</span>Select homozygous variants and extract CSQ from the VCF file with unique samples</a></span>
                <ul class="toc-item">
                       <li><span><a href="#Prioritization-criteria-unique-samples" data-toc-modified-id="Prioritization-criteria-unique-samples-4.2.3.1"><span class="toc-item-num">4.2.3.1&nbsp;&nbsp;</span>Prioritization criteria</a></span></li>
                </ul>
            </li>
        </ul>
    </li>
</ul></div>

# Getting Started

In [None]:
# Install necessary Python packages
!pip install openpyxl --quiet

# Import required libraries
from pathlib import Path
import seaborn as sns
import openpyxl
import glob

# Use the os package to interact with the environment
import os

# Bring in Pandas for Dataframe functionality
import pandas as pd

# Numpy for basics
import numpy as np

# Use StringIO for working with file contents
from io import StringIO

# Enable IPython to display matplotlib graphs
import matplotlib.pyplot as plt
%matplotlib inline

# Enable interaction with the FireCloud API
from firecloud import api as fapi

# Import the iPython HTML rendering for displaying links to Google Cloud Console
from IPython.core.display import display, HTML

# Import urllib modules for building URLs to Google Cloud Console
import urllib.parse

# BigQuery for querying data
from google.cloud import bigquery

#Import Sys
import sys as sys


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
# Set up R support with rpy2
!pip install --upgrade rpy2 --quiet
%reload_ext rpy2.ipython

In [3]:
%%R

# Verify that R magic is working
print('R magic cell is working')

[1] "R magic cell is working"


In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE,  :
  libraries ‘/home/jupyter/packages’, ‘/usr/lib/R/site-library’ contain no packages


In [None]:
# Set up billing project and data path variables
BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
WORKSPACE_NAMESPACE = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE_NAME = os.environ['WORKSPACE_NAME']
WORKSPACE_BUCKET = os.environ['WORKSPACE_BUCKET']

WORKSPACE_ATTRIBUTES = fapi.get_workspace(WORKSPACE_NAMESPACE, WORKSPACE_NAME).json().get('workspace',{}).get('attributes',{})

## Print the information to check we are in the proper release and billing 
## This will be different for you, the user, depending on the billing project your workspace is on
print('Billing and Workspace')
print(f'Workspace Name: {WORKSPACE_NAME}')
print(f'Billing Project: {BILLING_PROJECT_ID}')
print(f'Workspace Bucket, where you can upload and download data: {WORKSPACE_BUCKET}')
print('')


## GP2 v6.0
## Explicitly define release v6.0 path 
GP2_RELEASE_PATH = 'gs:/path/to/data'  
GP2_CLINICAL_RELEASE_PATH = f'{GP2_RELEASE_PATH}/clinical_data'
GP2_RAW_GENO_PATH = f'{GP2_RELEASE_PATH}/raw_genotypes'
GP2_IMPUTED_GENO_PATH = f'{GP2_RELEASE_PATH}/imputed_genotypes'
print('GP2 v6.0')
print(f'Path to GP2 v6.0 Clinical Data: {GP2_CLINICAL_RELEASE_PATH}')
print(f'Path to GP2 v6.0 Raw Genotype Data: {GP2_RAW_GENO_PATH}')
print(f'Path to GP2 v6.0 Imputed Genotype Data: {GP2_IMPUTED_GENO_PATH}')

#borra esto

In [5]:
# Utility routine for printing a shell command before executing it
def shell_do(command):
    print(f'Executing: {command}', file=sys.stderr)
    !$command
    
# Utility routine for executing a shell command and returning its output
def shell_return(command):
    print(f'Executing: {command}', file=sys.stderr)
    output = !$command
    return '\n'.join(output)

# Utility routine for executing a BigQuery query
def bq_query(query):
    print(f'Executing: {query}', file=sys.stderr)
    return pd.read_gbq(query, project_id=BILLING_PROJECT_ID, dialect='standard')

# Utility routine for displaying a message and a link
def display_html_link(description, link_text, url):
    html = f'''
    <p>
    </p>
    <p>
    {description}
    <a target=_blank href="{url}">{link_text}</a>.
    </p>
    '''
    display(HTML(html))

# Utility routines for reading files from Google Cloud Storage
def gcs_read_file(path):
    """Return the contents of a file in GCS"""
    contents = !gsutil -u {BILLING_PROJECT_ID} cat {path}
    return '\n'.join(contents)

def gcs_read_csv(path, sep=None):
    """Return a DataFrame from the contents of a delimited file in GCS"""
    return pd.read_csv(StringIO(gcs_read_file(path)), sep=sep, engine='python')

# Utility routine for displaying a message and link to Cloud Console
def link_to_cloud_console_gcs(description, link_text, gcs_path):
    url = '{}?{}'.format(
        os.path.join('https://console.cloud.google.com/storage/browser',
                     gcs_path.replace("gs://","")),
        urllib.parse.urlencode({'userProject': BILLING_PROJECT_ID}))
    display_html_link(description, link_text, url)

In [6]:
%%bash

# Set up tools (PLINK, King, and ANNOVAR)
mkdir -p ~/tools
cd ~/tools

# Check and install PLINK 1.9
if test -e /home/jupyter/tools/plink; then
   echo "Plink1.9 is already installed in /home/jupyter/tools/"
else
   echo -e "Downloading plink \n    -------"
   wget -N http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20190304.zip 
   unzip -o plink_linux_x86_64_20190304.zip
   echo -e "\n plink downloaded and unzipped in /home/jupyter/tools \n "
fi

# Check and install PLINK 2.0 with a corrected download link
if test -e /home/jupyter/tools/plink2; then
   echo "Plink2 is already installed in /home/jupyter/tools/"
else
   echo -e "Downloading plink2 \n    -------"
   wget -N https://s3.amazonaws.com/plink2-assets/alpha3/plink2_linux_avx2_20240806.zip
   unzip -o plink2_linux_avx2_20240806.zip -d /home/jupyter/tools/
   mv /home/jupyter/tools/plink2_linux_avx2_20240806 /home/jupyter/tools/plink2
   echo -e "\nPlink2 downloaded, unzipped, and saved as 'plink2' in /home/jupyter/tools \n"
fi

# Install King 2.3.0
if test -e /home/jupyter/tools/king; then
   echo "KING is already installed in /home/jupyter/tools/"
else
   echo -e "Downloading KING \n    -------"
   wget -N https://www.kingrelatedness.com/executables/Linux-king230.tar.gz
   tar -xzvf Linux-king230.tar.gz
   echo -e "\n KING downloaded and unzipped in /home/jupyter/tools \n "
fi

Plink1.9 is already installed in /home/jupyter/tools/
Plink2 is already installed in /home/jupyter/tools/
KING is already installed in /home/jupyter/tools/


In [7]:
%%bash

# List contents of the tools directory
ls /home/jupyter/tools/

king
LICENSE
Linux-king230.tar.gz
plink
plink2
plink2_linux_avx2_20240806.zip
plink_linux_x86_64_20190304.zip
prettify
toy.map
toy.ped


In [8]:
%%bash

# Make the downloaded tools executable
chmod u+x /home/jupyter/tools/plink
chmod u+x /home/jupyter/tools/plink2
chmod u+x /home/jupyter/tools/king

In [9]:
%%R

# Load necessary R packages
library(data.table)
library(magrittr)
library(stringr)
library(ggplot2)
library(plyr)
library(dplyr)
library(tidyr)

data.table 1.14.8 using 48 threads (see ?getDTthreads).  Latest news: r-datatable.com

Attaching package: ‘dplyr’

The following objects are masked from ‘package:plyr’:

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Attaching package: ‘tidyr’

The following object is masked from ‘package:magrittr’:

    extract



# Create Working Directory

In [10]:
# Make a directory for use in python
print("Making a working directory")
ANC = 'AMR'
WORK_DIR = f'/home/jupyter/Release6_{ANC}'
shell_do(f'mkdir -p {WORK_DIR}')

Making a working directory


Executing: mkdir -p /home/jupyter/Release6_AMR


In [11]:
%%R

# Set directory for use in R
ANC <- 'AMR'
setwd(paste0("/home/jupyter/Release6_",ANC,"/") )

# Homozygosity mapping

## Perform QC before relationship inference with KING

Quality control is crucial for accurate relationship inference. We recommend filtering high-quality markers with `--geno 0.05` and `--hwe 1E-10`. Note that LD pruning is not recommended for KING.

KING accepts only PLINK bfiles.

In [13]:
# Run plink2 for quality control of genetic data
shell_do(f'/home/jupyter/tools/plink2 \
        --pfile {WORK_DIR}/{ANC}_release6  \
        --geno 0.05 --hwe 1E-10 --make-bed \
        --out {WORK_DIR}/gp2_{ANC}_r6_qced')   

Executing: /home/jupyter/tools/plink2         --pfile /home/jupyter/Release6_AMR/AMR_release6          --geno 0.05 --hwe 1E-10 --make-bed         --out /home/jupyter/Release6_AMR/gp2_AMR_r6_qced


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Release6_AMR/gp2_AMR_r6_qced.log.
Options in effect:
  --geno 0.05
  --hwe 1E-10
  --make-bed
  --out /home/jupyter/Release6_AMR/gp2_AMR_r6_qced
  --pfile /home/jupyter/Release6_AMR/AMR_release6

Start time: Tue Nov 26 05:22:34 2024
628867 MiB RAM detected, ~622031 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
529 samples (241 females, 288 males; 529 founders) loaded from
/home/jupyter/Release6_AMR/AMR_release6.psam.
1185307 variants loaded from /home/jupyter/Release6_AMR/AMR_release6.pvar.
1 binary phenotype loaded (367 cases, 139 controls).
Calculating allele frequencies... 1116222733384449556066717782889399done.
--geno: 377 variants removed due to missing genotype data.
Computing chrX Hardy-Weinberg p-values... 1010111112121313141415151616171718

## Update Family ID (FID) as IID and prune samples with missing phenotype

GP2 PLINK files have all Family IDs set to 0. This causes all samples to be considered one family. Update FID to match IID and remove samples with missing phenotype.

In [14]:
# Update IDs in the .fam file and create update_ids.txt
shell_do(f"cat {WORK_DIR}/gp2_{ANC}_r6_qced.fam | awk '{{print $1, $2, $2, $2}}' > {WORK_DIR}/update_ids.txt")

# Run plink to update IDs, prune, and output updated files
shell_do(f'/home/jupyter/tools/plink --bfile {WORK_DIR}/gp2_{ANC}_r6_qced '
         f'--update-ids {WORK_DIR}/update_ids.txt '
         f'--prune '
         f'--make-bed --out {WORK_DIR}/gp2_{ANC}_r6_qced_updated_id')

Executing: cat /home/jupyter/Release6_AMR/gp2_AMR_r6_qced.fam | awk '{print $1, $2, $2, $2}' > /home/jupyter/Release6_AMR/update_ids.txt
Executing: /home/jupyter/tools/plink --bfile /home/jupyter/Release6_AMR/gp2_AMR_r6_qced --update-ids /home/jupyter/Release6_AMR/update_ids.txt --prune --make-bed --out /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id


PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id.log.
Options in effect:
  --bfile /home/jupyter/Release6_AMR/gp2_AMR_r6_qced
  --make-bed
  --out /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id
  --prune
  --update-ids /home/jupyter/Release6_AMR/update_ids.txt

628867 MB RAM detected; reserving 314433 MB for main workspace.
1184930 variants loaded from .bim file.
529 people (288 males, 241 females) loaded from .fam.
506 phenotype values loaded from .fam.
--update-ids: 529 people updated.
--prune: 506 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 506 founders and 0 nonfounders present.
Calculating allele frequencies... 1011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677

##  Relationship inference using KING

Use KING to infer relationships between individuals in the dataset. This step helps to accurately define family structures and identify related individuals.

In [15]:
# Run KING to identify related individuals (up to second-degree relatives)
shell_do(f'/home/jupyter/tools/king -b {WORK_DIR}/gp2_{ANC}_r6_qced_updated_id.bed '
         f'--build --related --degree 2 '
         f'--prefix {WORK_DIR}/gp2_{ANC}')

Executing: /home/jupyter/tools/king -b /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id.bed --build --related --degree 2 --prefix /home/jupyter/Release6_AMR/gp2_AMR


KING 2.3.0 - (c) 2010-2022 Wei-Min Chen

The following parameters are in effect:
                   Binary File : /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id.bed (-bname)

Additional Options
         Close Relative Inference : --related [ON], --duplicate
   Pairwise Relatedness Inference : --kinship, --ibdseg, --ibs, --makeGRM
              Inference Parameter : --degree [2], --seglength
         Relationship Application : --unrelated, --cluster, --build [ON]
                        QC Report : --bysample, --bySNP, --roh, --autoQC
                     QC Parameter : --callrateN, --callrateM
             Population Structure : --pca, --mds
              Structure Parameter : --projection, --pcs
              Disease Association : --tdt
   Quantitative Trait Association : --lmm
                Association Model : --trait [], --covariate []
            Association Parameter : --invnorm
               Genetic Risk Score : --risk, --model [], --prevalence, --noflip
              C

## Update Family ID based on relationship inference

After running relationship inference, update the Family IDs to reflect the identified relationships.

In [16]:
# Run plink to update FIDs/IDs, prune, and output updated binary files
shell_do(f'/home/jupyter/tools/plink --bfile {WORK_DIR}/gp2_{ANC}_r6_qced_updated_id '
         f'--update-ids {WORK_DIR}/gp2_{ANC}updateids.txt '
         f'--prune '
         f'--make-bed --out {WORK_DIR}/gp2_{ANC}_r6_qced_updated_fid')

Executing: /home/jupyter/tools/plink --bfile /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id --update-ids /home/jupyter/Release6_AMR/gp2_AMRupdateids.txt --prune --make-bed --out /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid


PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid.log.
Options in effect:
  --bfile /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_id
  --make-bed
  --out /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid
  --prune
  --update-ids /home/jupyter/Release6_AMR/gp2_AMRupdateids.txt

628867 MB RAM detected; reserving 314433 MB for main workspace.
1184930 variants loaded from .bim file.
506 people (278 males, 228 females) loaded from .fam.
506 phenotype values loaded from .fam.
--update-ids: 7 people updated.
--prune: 506 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 506 founders and 0 nonfounders present.
Calculating allele frequencies... 10111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686

## Check for inbred individuals

This step identifies individuals with a higher than normal inbreeding coefficient. These individuals may have regions of homozygosity indicative of consanguinity.

In [17]:
# Run KING to detect runs of homozygosity (ROH)
shell_do(f'/home/jupyter/tools/king -b {WORK_DIR}/gp2_{ANC}_r6_qced_updated_fid.bed '
         f'--roh --prefix {WORK_DIR}/gp2_{ANC}')

Executing: /home/jupyter/tools/king -b /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid.bed --roh --prefix /home/jupyter/Release6_AMR/gp2_AMR


KING 2.3.0 - (c) 2010-2022 Wei-Min Chen

The following parameters are in effect:
                   Binary File : /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid.bed (-bname)

Additional Options
         Close Relative Inference : --related, --duplicate
   Pairwise Relatedness Inference : --kinship, --ibdseg, --ibs, --makeGRM
              Inference Parameter : --degree, --seglength
         Relationship Application : --unrelated, --cluster, --build
                        QC Report : --bysample, --bySNP, --roh [ON], --autoQC
                     QC Parameter : --callrateN, --callrateM
             Population Structure : --pca, --mds
              Structure Parameter : --projection, --pcs
              Disease Association : --tdt
   Quantitative Trait Association : --lmm
                Association Model : --trait [], --covariate []
            Association Parameter : --invnorm
               Genetic Risk Score : --risk, --model [], --prevalence, --noflip
              Computing

### Optional: retain individuals with F>0.0884 to check overlapped ROH by KING

For reference, retain individuals with inbreeding coefficient (F) greater than 0.0884 and check for overlapping ROH.

In [None]:
# Load the ROH data from the specified file
inbred = pd.read_csv(f'{WORK_DIR}/gp2_{ANC}.roh', sep='\t')

# Filter the rows where F_ROH (inbreeding coefficient) is greater than 0.0884
inbred_high_froh = inbred.loc[inbred['F_ROH'] > 0.0884]

inbred_high_froh

## Find ROH overlaps that likely segregate within families

Identify ROH that overlap within families, as these shared regions may harbor disease-causing mutations. Ensure no control individuals carry the same segments.

In [19]:
# Run plink to detect runs of homozygosity (ROH) with specified parameters
shell_do(f'/home/jupyter/tools/plink --bfile {WORK_DIR}/gp2_{ANC}_r6_qced_updated_fid '
         f'--homozyg group '
         f'--homozyg-density 50 '
         f'--homozyg-gap 1000 '
         f'--homozyg-kb 1500 '
         f'--homozyg-snp 100 '
         f'--homozyg-window-het 1 '
         f'--homozyg-window-missing 5 '
         f'--homozyg-window-snp 50 '
         f'--homozyg-window-threshold 0.05 '
         f'--homozyg-match 0.95 '
         f'--out {WORK_DIR}/gp2_{ANC}_r6')

Executing: /home/jupyter/tools/plink --bfile /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid --homozyg group --homozyg-density 50 --homozyg-gap 1000 --homozyg-kb 1500 --homozyg-snp 100 --homozyg-window-het 1 --homozyg-window-missing 5 --homozyg-window-snp 50 --homozyg-window-threshold 0.05 --homozyg-match 0.95 --out /home/jupyter/Release6_AMR/gp2_AMR_r6


PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Release6_AMR/gp2_AMR_r6.log.
Options in effect:
  --bfile /home/jupyter/Release6_AMR/gp2_AMR_r6_qced_updated_fid
  --homozyg group
  --homozyg-density 50
  --homozyg-gap 1000
  --homozyg-kb 1500
  --homozyg-match 0.95
  --homozyg-snp 100
  --homozyg-window-het 1
  --homozyg-window-missing 5
  --homozyg-window-snp 50
  --homozyg-window-threshold 0.05
  --out /home/jupyter/Release6_AMR/gp2_AMR_r6

628867 MB RAM detected; reserving 314433 MB for main workspace.
1184930 variants loaded from .bim file.
506 people (278 males, 228 females) loaded from .fam.
506 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 506 founders and 0 nonfounders present.
Calculating allele frequencies... 10111213141516171819202122232425262728293031323334353637383940

### Check for overlapping allele-matched ROH within families

Ensure that ROHs with overlapping alleles within families are identified, particularly those that segregate among affected individuals. These regions can be pooled for further analysis.

Detect regions where individuals share the same alleles, which may indicate genetic relatedness or a shared ancestry.

Focus on allele segregation within families or shared segments that potentially harbor disease-causing mutations across different individuals.

In [None]:
# Load the overlap file for the specific ancestry group
df = pd.read_csv(f'{WORK_DIR}/gp2_{ANC}_r6.hom.overlap',sep='\s+')

df

In [None]:
# Check for pools that were shared within families
ROH_fam = pd.concat(g for _, g in df.groupby(["POOL", "FID"]) if len(g) > 1)

# Display the result
ROH_fam

In [None]:
# Read family structures from the .fam file
fams = pd.read_csv(f'{WORK_DIR}/gp2_{ANC}_r6_qced_updated_fid.fam', 
                   sep=' ', header=None, 
                   names=['FID', 'IID', 'pat_ID', 'mat_ID', 'sex', 'pheno'])

# Sort by family ID (FID) and individual ID (IID)
fams = fams.sort_values(['FID', 'IID'])

# Keep only families with multiple individuals (same FID)
fams = pd.concat(g for _, g in fams.groupby("FID") if len(g) > 1)

# Display the result
fams

In [23]:
# Define a function to filter groups that contain all IDs in the list
def filter_groups(group):
    """
    Check if all IDs in 'id_list' are present in the 'IID' column of the group.
    
    Args:
        group (DataFrame): A grouped DataFrame with 'IID' as one of its columns.

    Returns:
        bool: True if all IDs in 'id_list' are present in the group, False otherwise.
    """
    return set(id_list).issubset(group['IID'])

In [None]:
# Keep pools that contain all affected family members with allele-matched ROH
li = []

for pool, dat in ROH_fam.groupby('POOL'):
    for fam, fam_dat in fams.groupby('FID'):
        # List of affected individuals in the family (pheno == 2)
        id_list = fam_dat.loc[fam_dat['pheno'] == 2, 'IID'].to_list()
        
        # Keep pools that contain all affected family members
        keep = dat.groupby('POOL').filter(filter_groups)
        
        # Remove '*' from 'GRP' to match alleles
        keep['GRP'] = keep['GRP'].str.replace('*', '', regex=False)
        
        # Keep allele-matched ROHs that segregate within the same families
        seg_keep = keep.groupby('GRP').filter(filter_groups)
        
        li.append(seg_keep)

# Concatenate the results into a single DataFrame
fam_pools = pd.concat(li, axis=0)

# Display the result
fam_pools

In [None]:
# Check the pools for 'CON' and 'UNION' families
fam_pools_CON_UNION = df.loc[
    (df['POOL'].isin(fam_pools['POOL'])) & (df['FID'].isin(['CON', 'UNION']))
]

# Display the result
fam_pools_CON_UNION

In [26]:
# Filter for ROHs that no control (CON) carried, looking for high penetrance variants
check_ROH = df.loc[
    (df['POOL'].isin(fam_pools['POOL'])) & 
    (df['FID'].isin(['CON', 'UNION'])) & 
    (df['PHE'].str.contains(':0'))
]

# Display the result
check_ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,NSNP,NSIM,GRP


### Fine mapping (if high penetrance variants were found)

In [None]:
# Filter to identify rows corresponding to the region where the most individuals carried the allele-matched ROH
# Specifically, focus on the pool labeled 'S164'
filtered_df = df.loc[df['POOL'] == 'S164']

# Display the filtered DataFrame to examine the individuals and details for this region
filtered_df

# Observation: The filtered results show one family and one unrelated individual carrying the allele-matched ROH
# This suggests the presence of a potentially shared or segregating genomic region of interest.

## Find ROH overlaps between inbred individuals

### Subset samples

In [28]:
# Read the inbreeding ROH file (*_ROH.hom.indiv) from Google Cloud Storage
FROH = gcs_read_csv(f'gs://fc-1e9cbcf3-a691-4415-8a35-f19d546e50dc/{ANC}_ROH.hom.indiv', sep='\s+')

# Calculate the FROH inbreeding coefficient and round it to 5 decimal places
FROH['FROH'] = (FROH['KB'] / 2886428).round(5)

# Select cases with FROH > 0.0156 and controls (PHE == 1)
keep_inbred = FROH.loc[((FROH['FROH'] > 0.0156) & (FROH['PHE'] == 2)) | (FROH['PHE'] == 1)]

# Read the updated FID and IID information from the .fam file
new_fid = pd.read_csv(f'{WORK_DIR}/chrAll_{ANC}_QC.fam', sep='\s+', usecols=[0, 1], header=None, names=['FID', 'IID'])

# Merge the inbred samples with the new FID information
all_df = pd.merge(keep_inbred, new_fid, on='IID')

# Write out FID and IID for subsetting into a list file
all_df[['FID_y', 'IID']].to_csv(f'{WORK_DIR}/keep_{ANC}_inbred_sampleset.list', sep='\t', index=False, header=False)

In [29]:
# Subset the samples of interest and run homozygosity mapping again
shell_do(f'/home/jupyter/tools/plink --bfile {WORK_DIR}/chrAll_{ANC}_QC '
         f'--keep {WORK_DIR}/keep_{ANC}_inbred_sampleset.list '
         f'--homozyg group '
         f'--homozyg-density 50 '
         f'--homozyg-gap 1000 '
         f'--homozyg-kb 1500 '
         f'--homozyg-snp 100 '
         f'--homozyg-window-het 1 '
         f'--homozyg-window-missing 5 '
         f'--homozyg-window-snp 50 '
         f'--homozyg-window-threshold 0.05 '
         f'--homozyg-match 0.95 '
         f'--out {WORK_DIR}/gp2_{ANC}_r6_inbreed')

Executing: /home/jupyter/tools/plink --bfile /home/jupyter/Release6_AMR/chrAll_AMR_QC --keep /home/jupyter/Release6_AMR/keep_AMR_inbred_sampleset.list --homozyg group --homozyg-density 50 --homozyg-gap 1000 --homozyg-kb 1500 --homozyg-snp 100 --homozyg-window-het 1 --homozyg-window-missing 5 --homozyg-window-snp 50 --homozyg-window-threshold 0.05 --homozyg-match 0.95 --out /home/jupyter/Release6_AMR/gp2_AMR_r6_inbreed


PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Release6_AMR/gp2_AMR_r6_inbreed.log.
Options in effect:
  --bfile /home/jupyter/Release6_AMR/chrAll_AMR_QC
  --homozyg group
  --homozyg-density 50
  --homozyg-gap 1000
  --homozyg-kb 1500
  --homozyg-match 0.95
  --homozyg-snp 100
  --homozyg-window-het 1
  --homozyg-window-missing 5
  --homozyg-window-snp 50
  --homozyg-window-threshold 0.05
  --keep /home/jupyter/Release6_AMR/keep_AMR_inbred_sampleset.list
  --out /home/jupyter/Release6_AMR/gp2_AMR_r6_inbreed

628867 MB RAM detected; reserving 314433 MB for main workspace.
4094211 variants loaded from .bim file.
524 people (286 males, 238 females) loaded from .fam.
502 phenotype values loaded from .fam.
--keep: 159 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 159 founders and 0 nonfounders presen

### Check consensus ROH (cROH)

The consensus ROH represents regions of homozygosity shared by multiple individuals. Identifying cROH may indicate regions of interest for further analysis.

In [None]:
# Read the inbreeding ROH overlap data
inbred_roh = pd.read_csv(f'{WORK_DIR}/gp2_{ANC}_r6_inbreed.hom.overlap', sep='\s+')

# Filter for rows where the FID is 'CON' (consensus)
inbred_roh_consense = inbred_roh.loc[inbred_roh['FID'] == 'CON']

# Display the result
inbred_roh_consense

In [31]:
# Filter for consensus ROH (cROH) with a length > 100 Kb and > 100 SNPs
inbred_roh_consense_filtered = inbred_roh_consense.loc[
    (inbred_roh_consense['NSNP'] >= 100) & (inbred_roh_consense['KB'] >= 100)
]

# Save the filtered results to a file
inbred_roh_consense_filtered.to_csv(
    f'{WORK_DIR}/gp2_{ANC}_r6_inbreed.hom.overlap.filtered', 
    sep='\t', index=False, header=False
)

# Display the filtered result
inbred_roh_consense_filtered

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,NSNP,NSIM,GRP
20,S1,CON,20,4:16,7,chr7:65477373:C:A,chr7:66044271:T:C,65477373,66044271,566.899,365,,
72,S4,CON,14,3:11,6,chr6:61118955:C:T,chr6:61867241:T:C,61118955,61867241,748.287,1415,,
87,S5,CON,13,4:9,8,chr8:46622494:A:C,chr8:47646564:A:G,46622494,47646564,1024.071,827,,
115,S7,CON,12,2:10,11,chr11:49155393:A:G,chr11:49489304:A:C,49155393,49489304,333.912,441,,
129,S8,CON,12,0:12,22,chr22:41154950:C:T,chr22:41854130:TTTG:T,41154950,41854130,699.181,862,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2489,S446,CON,2,2:0,20,chr20:59448560:T:C,chr20:61999812:G:A,59448560,61999812,2551.253,4161,,
2493,S447,CON,2,1:1,21,chr21:34816191:C:T,chr21:36669561:A:C,34816191,36669561,1853.371,2944,,
2497,S448,CON,2,2:0,21,chr21:42781756:G:C,chr21:42954192:C:T,42781756,42954192,172.437,417,,
2501,S449,CON,2,2:0,21,chr21:42997454:A:G,chr21:46518326:T:C,42997454,46518326,3520.873,5391,,


In [32]:
# Filter for ROHs that no control (CON) carried, looking for high penetrance variants
check_ROH = inbred_roh_consense_filtered.loc[
    inbred_roh_consense_filtered['PHE'].str.contains(':0')
]

# Display the result
check_ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,NSNP,NSIM,GRP
782,S82,CON,5,5:0,9,chr9:82135547:C:A,chr9:84169832:A:T,82135547,84169832,2034.286,2754,,
1022,S121,CON,4,4:0,10,chr10:24128513:T:C,chr10:25123571:A:G,24128513,25123571,995.059,1379,,
1028,S122,CON,4,4:0,10,chr10:25129805:G:C,chr10:25475212:G:A,25129805,25475212,345.408,486,,
1112,S136,CON,4,4:0,17,chr17:9973671:G:C,chr17:10259230:T:C,9973671,10259230,285.560,658,,
1420,S196,CON,3,3:0,6,chr6:164604500:C:T,chr6:165879923:CG:C,164604500,165879923,1275.424,2645,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2473,S442,CON,2,2:0,20,chr20:796225:T:G,chr20:5046718:T:C,796225,5046718,4250.494,7484,,
2489,S446,CON,2,2:0,20,chr20:59448560:T:C,chr20:61999812:G:A,59448560,61999812,2551.253,4161,,
2497,S448,CON,2,2:0,21,chr21:42781756:G:C,chr21:42954192:C:T,42781756,42954192,172.437,417,,
2501,S449,CON,2,2:0,21,chr21:42997454:A:G,chr21:46518326:T:C,42997454,46518326,3520.873,5391,,


In [33]:
# Extract 'POOL' column values and join them into a space-separated string
columns = check_ROH['POOL'].tolist()
column_spaces = ' '.join(map(str, columns))

# Display the result
print(column_spaces)

S82 S121 S122 S136 S196 S197 S199 S200 S201 S202 S206 S207 S211 S212 S220 S221 S222 S226 S252 S256 S265 S268 S269 S270 S276 S291 S303 S304 S309 S317 S324 S325 S328 S335 S336 S337 S341 S342 S345 S360 S361 S362 S363 S364 S366 S373 S374 S376 S378 S381 S382 S383 S384 S385 S386 S387 S394 S395 S397 S399 S400 S401 S407 S408 S409 S411 S412 S413 S414 S415 S421 S426 S427 S428 S430 S431 S436 S437 S438 S440 S441 S442 S446 S448 S449 S450


### Find ROH enriched in inbred cases (test the association using a logistic model)

Assess whether specific ROHs are enriched in inbred cases using logistic regression models to identify significant associations.

#### Consensus ROH Filtered

Apply filters to consensus ROH based on specific criteria to refine the list of regions under investigation.

In [35]:
%%R

# Define the ancestry code (ANC) and read the filtered ROH data
ANC <- "AMR"
ROH <- fread(paste0("gp2_", ANC, "_r6_inbreed.hom.overlap.filtered"), header = FALSE)

# Assign column names to the ROH data
colnames(ROH) <- c("POOL", "FID", "IID", "PHE", "CHR", "SNP1", "SNP2", "BP1", "BP2", "KB", "NSNP", "NSIM", "GRP")

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	406 obs. of  13 variables:
 $ POOL: chr  "S1" "S4" "S5" "S7" ...
 $ FID : chr  "CON" "CON" "CON" "CON" ...
 $ IID : int  20 14 13 12 12 11 11 11 11 10 ...
 $ PHE : chr  "4:16" "3:11" "4:9" "2:10" ...
 $ CHR : int  7 6 8 11 22 11 11 16 20 11 ...
 $ SNP1: chr  "chr7:65477373:C:A" "chr6:61118955:C:T" "chr8:46622494:A:C" "chr11:49155393:A:G" ...
 $ SNP2: chr  "chr7:66044271:T:C" "chr6:61867241:T:C" "chr8:47646564:A:G" "chr11:49489304:A:C" ...
 $ BP1 : int  65477373 61118955 46622494 49155393 41154950 48022207 48812285 71380456 34716170 55586177 ...
 $ BP2 : int  66044271 61867241 47646564 49489304 41854130 48310586 48950614 71604736 35257135 56058978 ...
 $ KB  : num  567 748 1024 334 699 ...
 $ NSNP: int  365 1415 827 441 862 363 135 356 643 866 ...
 $ NSIM: logi  NA NA NA NA NA NA ...
 $ GRP : logi  NA NA NA NA NA NA ...
 - attr(*, ".internal.selfref")=<externalptr> 


In [36]:
# Use this as input for the next two steps
shell_do(f"grep 'are cases and' {WORK_DIR}/gp2_{ANC}_r6_inbreed.log")

Executing: grep 'are cases and' /home/jupyter/Release6_AMR/gp2_AMR_r6_inbreed.log


Among remaining phenotypes, 20 are cases and 139 are controls.


In [37]:
%%R

# Split the 'PHE' column into case and control components
ROH$case <- ldply(strsplit(as.character(ROH$PHE), split = ":"))[[1]]
ROH$control <- ldply(strsplit(as.character(ROH$PHE), split = ":"))[[2]]

# Convert case and control components to numeric
ROH$NcasesWithROHs <- as.numeric(ROH$case)
ROH$NcontrolsWithROHs <- as.numeric(ROH$control)

# Calculate total counts for cases, controls, and combined
ROH$combinedN <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Initialize the 'P' column for p-values
ROH$P <- NA

# Perform proportion tests for each row
for (i in seq_len(nrow(ROH))) {
  thisP <- prop.test(x = c(ROH$NcasesWithROHs[i], ROH$NcontrolsWithROHs[i]), 
                     n = c(20, 139))  # n = (cases, controls)
  ROH$P[i] <- thisP$p.value
}

# Calculate the total ROH count for cases and controls
ROH$total_ROH_count <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	406 obs. of  20 variables:
 $ POOL             : chr  "S1" "S4" "S5" "S7" ...
 $ FID              : chr  "CON" "CON" "CON" "CON" ...
 $ IID              : int  20 14 13 12 12 11 11 11 11 10 ...
 $ PHE              : chr  "4:16" "3:11" "4:9" "2:10" ...
 $ CHR              : int  7 6 8 11 22 11 11 16 20 11 ...
 $ SNP1             : chr  "chr7:65477373:C:A" "chr6:61118955:C:T" "chr8:46622494:A:C" "chr11:49155393:A:G" ...
 $ SNP2             : chr  "chr7:66044271:T:C" "chr6:61867241:T:C" "chr8:47646564:A:G" "chr11:49489304:A:C" ...
 $ BP1              : int  65477373 61118955 46622494 49155393 41154950 48022207 48812285 71380456 34716170 55586177 ...
 $ BP2              : int  66044271 61867241 47646564 49489304 41854130 48310586 48950614 71604736 35257135 56058978 ...
 $ KB               : num  567 748 1024 334 699 ...
 $ NSNP             : int  365 1415 827 441 862 363 135 356 643 866 ...
 $ NSIM             : logi  NA NA NA NA NA NA ...
 $ GRP     



In [38]:
%%R

# Calculate proportions of cases and controls with ROH
ROH$propCases <- ROH$NcasesWithROHs / 20  # Number of cases
ROH$propControls <- ROH$NcontrolsWithROHs / 139  # Number of controls

# Identify case-enriched ROHs
ROH$caseEnriched <- ifelse(ROH$propCases > ROH$propControls, 1, 0)

# Subset ROHs where total ROH count >= 1 and case-enriched
ROH_subsetted <- subset(ROH, total_ROH_count >= 1 & caseEnriched == 1)

# Apply Bonferroni correction
Nbonf <- length(ROH_subsetted$POOL)
ROH_subsetted$passMultiTest <- ifelse(ROH_subsetted$P <= (0.05 / Nbonf), 1, 0)
ROH_subsetted$BONF <- ROH_subsetted$P / Nbonf

# Subset the significant results after multiple testing correction
significant_ROH <- subset(ROH_subsetted, passMultiTest == 1)

# Apply Bonferroni correction to the full dataset
ROH$BONF <- ROH$P / Nbonf

# Display summary of Bonferroni-corrected significant results
summary(ROH_subsetted$passMultiTest)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    0.00    0.02    0.00    1.00 


In [39]:
%%R

# Write the subsetted ROH data to text files
fwrite(ROH_subsetted, paste0(ANC, "_ROH.inbreed_subsetted.txt"))
fwrite(ROH_subsetted, paste0(ANC, "_ROH.inbreed_caseEnriched.txt"))

In [40]:
# Read the inbreed case-enriched ROH data from the file
ROH = pd.read_csv(f"{WORK_DIR}/{ANC}_ROH.inbreed_caseEnriched.txt")

# Display the ROH data
ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF
0,S1,CON,20,4:16,7,chr7:65477373:C:A,chr7:66044271:T:C,65477373,66044271,566.899,...,4,16,20,0.477796,20,0.20,0.115108,1,0,0.001365
1,S4,CON,14,3:11,6,chr6:61118955:C:T,chr6:61867241:T:C,61118955,61867241,748.287,...,3,11,14,0.532834,14,0.15,0.079137,1,0,0.001522
2,S5,CON,13,4:9,8,chr8:46622494:A:C,chr8:47646564:A:G,46622494,47646564,1024.071,...,4,9,13,0.103606,13,0.20,0.064748,1,0,0.000296
3,S7,CON,12,2:10,11,chr11:49155393:A:G,chr11:49489304:A:C,49155393,49489304,333.912,...,2,10,12,1.000000,12,0.10,0.071942,1,0,0.002857
4,S11,CON,11,2:9,11,chr11:48022207:TA:T,chr11:48310586:C:T,48022207,48310586,288.380,...,2,9,11,0.912685,11,0.10,0.064748,1,0,0.002608
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345,S446,CON,2,2:0,20,chr20:59448560:T:C,chr20:61999812:G:A,59448560,61999812,2551.253,...,2,0,2,0.007384,2,0.10,0.000000,1,0,0.000021
346,S447,CON,2,1:1,21,chr21:34816191:C:T,chr21:36669561:A:C,34816191,36669561,1853.371,...,1,1,2,0.593965,2,0.05,0.007194,1,0,0.001697
347,S448,CON,2,2:0,21,chr21:42781756:G:C,chr21:42954192:C:T,42781756,42954192,172.437,...,2,0,2,0.007384,2,0.10,0.000000,1,0,0.000021
348,S449,CON,2,2:0,21,chr21:42997454:A:G,chr21:46518326:T:C,42997454,46518326,3520.873,...,2,0,2,0.007384,2,0.10,0.000000,1,0,0.000021


In [41]:
# Filter the ROH data for entries that passed the multiple testing correction
filtered_ROH = ROH.loc[ROH['passMultiTest'] == 1]

# Display the filtered result
filtered_ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF
46,S77,CON,5,4:1,8,chr8:42230547:G:T,chr8:43554958:C:T,42230547,43554958,1324.412,...,4,1,5,8.342658e-05,5,0.2,0.007194,1,1,2.383617e-07
50,S81,CON,5,4:1,9,chr9:79266337:T:A,chr9:80842614:G:A,79266337,80842614,1576.278,...,4,1,5,8.342658e-05,5,0.2,0.007194,1,1,2.383617e-07
51,S82,CON,5,5:0,9,chr9:82135547:C:A,chr9:84169832:A:T,82135547,84169832,2034.286,...,5,0,5,1.128733e-07,5,0.25,0.0,1,1,3.224952e-10
57,S88,CON,5,4:1,17,chr17:20558851:A:C,chr17:20936047:G:A,20558851,20936047,377.197,...,4,1,5,8.342658e-05,5,0.2,0.007194,1,1,2.383617e-07
81,S121,CON,4,4:0,10,chr10:24128513:T:C,chr10:25123571:A:G,24128513,25123571,995.059,...,4,0,4,4.725875e-06,4,0.2,0.0,1,1,1.35025e-08
82,S122,CON,4,4:0,10,chr10:25129805:G:C,chr10:25475212:G:A,25129805,25475212,345.408,...,4,0,4,4.725875e-06,4,0.2,0.0,1,1,1.35025e-08
94,S136,CON,4,4:0,17,chr17:9973671:G:C,chr17:10259230:T:C,9973671,10259230,285.56,...,4,0,4,4.725875e-06,4,0.2,0.0,1,1,1.35025e-08


In [42]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Count and display various ROH statistics
{
    printf "Number of ROH: "
    wc -l < gp2_${ANC}_r6_inbreed.hom.overlap.filtered

    printf "Number of ROH enriched in Cases: "
    sed 1d < ${ANC}_ROH.inbreed_caseEnriched.txt | wc -l

    printf "Number of ROH that Pass Bonferroni: "
    cut -d',' -f26 < ${ANC}_ROH.inbreed_caseEnriched.txt | sed 1d | { grep -c '1' || true; }
}

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


Number of ROH: 406
Number of ROH enriched in Cases: 350
Number of ROH that Pass Bonferroni: 0


In [43]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Count and filter ROHs that pass Bonferroni correction
{
    # Count the number of ROHs that pass Bonferroni correction
    printf "Number of ROH that Pass Bonferroni: "
    cut -d',' -f26 < ${ANC}_ROH.inbreed_caseEnriched.txt | sed 1d | { grep -c '1' || true; }

    # Output the rows of ROHs that pass Bonferroni correction into a new CSV file
    printf "Rows of ROH that Pass Bonferroni (including header):\n"
    head -n 1 ${ANC}_ROH.inbreed_caseEnriched.txt > ${ANC}_ROH_Pass_Bonferroni.csv
    awk -F',' '$26 == 1' < ${ANC}_ROH.inbreed_caseEnriched.txt >> ${ANC}_ROH_Pass_Bonferroni.csv
}

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


Number of ROH that Pass Bonferroni: 0
Rows of ROH that Pass Bonferroni (including header):


In [44]:
# Read the ROH data that passed Bonferroni correction
roh_pass_bonferroni = pd.read_csv(f"{WORK_DIR}/{ANC}_ROH_Pass_Bonferroni.csv")

# Display the DataFrame
roh_pass_bonferroni

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF


#### In PD genes

Focus on ROH that overlap with genes known to be involved in PD. This may help in identifying potential pathogenic regions.

In [45]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Filter ROH data for 'CON' and clean non-CON/UNION rows
grep 'CON' gp2_${ANC}_r6_inbreed.hom.overlap.filtered | awk -F' ' '{if($4 != "-9")print}' > ${ANC}_ROH.hom.overlap.inbreed.CON.txt
grep -v 'CON' gp2_${ANC}_r6_inbreed.hom.overlap.filtered | grep -v 'UNION' | awk -F' ' '{if($4 != "-9")print}' | grep -v 'NA' > ${ANC}_ROH.hom.overlap.inbreed.clean.txt

# Run the Perl script to map ROH data with gene list
perl post_plink_ROH_mapping.pl ${ANC}_ROH.hom.overlap.inbreed.clean.txt ${ANC}_ROH.hom.overlap.inbreed.CON.txt geneList.txt ${ANC}_ROH.hom.overlap.inbreed.CON.PDgenes.txt

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


PD gene: RBX1
PD gene: LOC100131289
PD gene: RSL24D1P1
PD gene: TMEM163
PD gene: TMEM163
PD gene: SPG11
PD gene: TRIM40
PD gene: CHCHD2
PD gene: RNF141
PD gene: RNF141
PD gene: HRNR
PD gene: ITPKB
PD gene: ITPKB
PD gene: ITPKB
PD gene: MED12L
PD gene: SPTSSB
PD gene: SPTSSB
PD gene: MCCC1
PD gene: MCCC1
PD gene: MCCC1
PD gene: EIF4G1
PD gene: ELOVL7
PD gene: PARK2
PD gene: GALNT17
PD gene: ITGA8
PD gene: ITGA8
PD gene: FAM171A1
PD gene: GXYLT1
PD gene: SCAF11
PD gene: SLC38A1
PD gene: USP8
PD gene: POLG1
PD gene: CD19
PD gene: SETD1A
PD gene: SETD1A
PD gene: ZNF646
PD gene: KAT8
PD gene: ATP13A2
PD gene: PINK1
PD gene: KCNS3
PD gene: KCNS3
PD gene: KPNA1
PD gene: KPNA1
PD gene: BST1
PD gene: BST1
PD gene: FAM200B
PD gene: CD38
PD gene: SNCA
PD gene: AC097478.1
PD gene: AC097478.1
PD gene: SNCA
PD gene: ANK2
PD gene: CAMK2D
PD gene: CAMK2D
PD gene: CAMK2D
PD gene: ELOVL7
PD gene: ELOVL7
PD gene: SLC44A4
PD gene: HLA-DRB5
PD gene:  AHR
PD gene: HLA-DRB6
PD gene: HLA-DQA1
PD gene: RIMS1
P

In [46]:
%%R

# Load the ROH data from the CON PD genes file
ROH <- fread(paste0(ANC, "_ROH.hom.overlap.inbreed.CON.PDgenes.txt"), header = TRUE)

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	46 obs. of  14 variables:
 $ POOL           : chr  "S113" "S123" "S150" "S154" ...
 $ num_subjects   : int  4 4 3 3 3 3 3 3 3 3 ...
 $ case_to_control: chr  "2:2" "3:1" "2:1" "2:1" ...
 $ ratio          : chr  "1.0" "3.0" "2.0" "2.0" ...
 $ CHR            : chr  "chr7" "chr11" "chr1" "chr1" ...
 $ SNP1           : chr  "chr7:55356967:G:A" "chr11:7463094:A:C" "chr1:151860425:A:C" "chr1:225993955:A:G" ...
 $ SNP2           : chr  "chr7:56922707:C:T" "chr11:10351412:C:T" "chr1:152618871:G:A" "chr1:230304061:A:G" ...
 $ BP1            : int  55356967 7463094 151860425 225993955 149500048 160234491 182701608 59539026 160712093 70645942 ...
 $ BP2            : int  56922707 10351412 152618871 230304061 150884172 161775573 184390899 60257915 161123166 73741147 ...
 $ KB             : num  1566 2888 758 4310 1384 ...
 $ NSNP           : int  2408 4921 792 6747 2094 2049 2262 859 649 4115 ...
 $ PD gene        : chr  "CHCHD2" "RNF141 RNF141" "HRNR" "ITPKB 

In [47]:
# Use this log data as input for the next two steps
shell_do(f'grep "are cases and" {WORK_DIR}/gp2_{ANC}_r6_inbreed.log')

Executing: grep "are cases and" /home/jupyter/Release6_AMR/gp2_AMR_r6_inbreed.log


Among remaining phenotypes, 20 are cases and 139 are controls.


In [48]:
%%R

# Split the 'case_to_control' column into case and control components
ROH$case <- ldply(strsplit(as.character(ROH$case_to_control), split = ":"))[[1]]
ROH$control <- ldply(strsplit(as.character(ROH$case_to_control), split = ":"))[[2]]

# Convert the case and control components to numeric
ROH$NcasesWithROHs <- as.numeric(ROH$case)
ROH$NcontrolsWithROHs <- as.numeric(ROH$control)

# Calculate total counts for cases, controls, and combined
ROH$combinedN <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Initialize the 'P' column for p-values
ROH$P <- NA

# Perform proportion tests for each row
for (i in seq_len(nrow(ROH))) {
  thisP <- prop.test(x = c(ROH$NcasesWithROHs[i], ROH$NcontrolsWithROHs[i]), 
                     n = c(20, 139))  # n = (cases, controls)
  ROH$P[i] <- thisP$p.value
}

# Calculate the total ROH count for cases and controls
ROH$total_ROH_count <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	46 obs. of  21 variables:
 $ POOL             : chr  "S113" "S123" "S150" "S154" ...
 $ num_subjects     : int  4 4 3 3 3 3 3 3 3 3 ...
 $ case_to_control  : chr  "2:2" "3:1" "2:1" "2:1" ...
 $ ratio            : chr  "1.0" "3.0" "2.0" "2.0" ...
 $ CHR              : chr  "chr7" "chr11" "chr1" "chr1" ...
 $ SNP1             : chr  "chr7:55356967:G:A" "chr11:7463094:A:C" "chr1:151860425:A:C" "chr1:225993955:A:G" ...
 $ SNP2             : chr  "chr7:56922707:C:T" "chr11:10351412:C:T" "chr1:152618871:G:A" "chr1:230304061:A:G" ...
 $ BP1              : int  55356967 7463094 151860425 225993955 149500048 160234491 182701608 59539026 160712093 70645942 ...
 $ BP2              : int  56922707 10351412 152618871 230304061 150884172 161775573 184390899 60257915 161123166 73741147 ...
 $ KB               : num  1566 2888 758 4310 1384 ...
 $ NSNP             : int  2408 4921 792 6747 2094 2049 2262 859 649 4115 ...
 $ PD gene          : chr  "CHCHD2" "RNF14



In [49]:
%%R

# Calculate proportions of cases and controls with ROH
ROH$propCases <- ROH$NcasesWithROHs / 20  # Number of cases
ROH$propControls <- ROH$NcontrolsWithROHs / 139  # Number of controls

# Identify case-enriched ROHs
ROH$caseEnriched <- ifelse(ROH$propCases > ROH$propControls, 1, 0)

# Subset ROHs where total ROH count >= 1 and case-enriched
ROH_subsetted <- subset(ROH, total_ROH_count >= 1 & caseEnriched == 1)

# Apply Bonferroni correction
Ngenes <- length(ROH_subsetted$PD)  # Number of genes in the subset
ROH_subsetted$passMultiTest <- ifelse(ROH_subsetted$P <= (0.05 / Ngenes), 1, 0)
ROH_subsetted$BONF <- ROH_subsetted$P / Ngenes

# Subset and show the results that pass multiple testing correction
significant_ROH <- subset(ROH_subsetted, passMultiTest == 1)

# Apply Bonferroni correction to the full dataset
ROH$BONF <- ROH$P / Ngenes

# Display a summary of the multi-test results
summary(ROH_subsetted$passMultiTest)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.05263 0.00000 1.00000 


In [50]:
%%R

# Write the subsetted ROH data to text files
fwrite(ROH_subsetted, paste0(ANC, "_ROH.inbreed_PDgenes_subsetted.txt"))
fwrite(ROH_subsetted, paste0(ANC, "_ROH.inbreed_PDgenes_caseEnriched.txt"))

In [51]:
# Read the case-enriched ROH data from the file
ROH = pd.read_csv(f'{WORK_DIR}/{ANC}_ROH.inbreed_PDgenes_caseEnriched.txt')

# Display the DataFrame
ROH

Unnamed: 0,POOL,num_subjects,case_to_control,ratio,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF
0,S113,4,2:2,1.0,chr7,chr7:55356967:G:A,chr7:56922707:C:T,55356967,56922707,1565.741,...,2,2,4,0.127926,4,0.1,0.014388,1,0,0.003366
1,S123,4,3:1,3.0,chr11,chr11:7463094:A:C,chr11:10351412:C:T,7463094,10351412,2888.319,...,3,1,4,0.002292,4,0.15,0.007194,1,0,6e-05
2,S150,3,2:1,2.0,chr1,chr1:151860425:A:C,chr1:152618871:G:A,151860425,152618871,758.447,...,2,1,3,0.048462,3,0.1,0.007194,1,0,0.001275
3,S154,3,2:1,2.0,chr1,chr1:225993955:A:G,chr1:230304061:A:G,225993955,230304061,4310.107,...,2,1,3,0.048462,3,0.1,0.007194,1,0,0.001275
4,S174,3,1:2,0.5,chr3,chr3:149500048:A:G,chr3:150884172:G:C,149500048,150884172,1384.125,...,1,2,3,0.829323,3,0.05,0.014388,1,0,0.021824
5,S177,3,2:1,2.0,chr3,chr3:182701608:C:T,chr3:184390899:T:G,182701608,184390899,1689.292,...,2,1,3,0.048462,3,0.1,0.007194,1,0,0.001275
6,S184,3,2:1,2.0,chr5,chr5:59539026:G:A,chr5:60257915:T:G,59539026,60257915,718.89,...,2,1,3,0.048462,3,0.1,0.007194,1,0,0.001275
7,S195,3,2:1,2.0,chr6,chr6:160712093:G:C,chr6:161123166:A:C,160712093,161123166,411.074,...,2,1,3,0.048462,3,0.1,0.007194,1,0,0.001275
8,S200,3,3:0,3:0,chr7,chr7:70645942:CT:C,chr7:73741147:T:A,70645942,73741147,3095.206,...,3,0,3,0.000191,3,0.15,0.0,1,1,5e-06
9,S211,3,3:0,3:0,chr10,chr10:15180551:G:C,chr10:15923173:ACTTAT:A,15180551,15923173,742.623,...,3,0,3,0.000191,3,0.15,0.0,1,1,5e-06


In [52]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Display various ROH statistics
{
    # Count the number of ROHs overlapping PD genes
    printf "Number of ROH overlapping PD genes: "
    sed 1d < ${ANC}_ROH.hom.overlap.inbreed.CON.PDgenes.txt | wc -l

    # Count the number of ROHs enriched in cases
    printf "Number of ROH enriched in Cases: "
    sed 1d < ${ANC}_ROH.inbreed_PDgenes_caseEnriched.txt | wc -l

    # Count the number of ROHs that pass Bonferroni correction
    printf "Number of ROH that Pass Bonferroni: "
    cut -d',' -f27 < ${ANC}_ROH.inbreed_PDgenes_caseEnriched.txt | sed 1d | { grep -c '1' || true; }
}

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


Number of ROH overlapping PD genes: 46
Number of ROH enriched in Cases: 38
Number of ROH that Pass Bonferroni: 0


In [53]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Count and filter ROHs that pass Bonferroni correction
{
    # Count the number of ROHs that pass Bonferroni correction
    printf "Number of ROH that Pass Bonferroni: "
    cut -d',' -f27 < ${ANC}_ROH.inbreed_PDgenes_caseEnriched.txt | sed 1d | { grep -c '1' || true; }

    # Extract rows of ROHs that pass Bonferroni correction, including header
    printf "Rows of ROH that Pass Bonferroni (including header):\n"
    head -n 1 ${ANC}_ROH.inbreed_PDgenes_caseEnriched.txt > ${ANC}_ROH_Pass_Bonferroni_inbred_cases_PD_genes.csv
    awk -F',' '$27 == 1' < ${ANC}_ROH.inbreed_PDgenes_caseEnriched.txt >> ${ANC}_ROH_Pass_Bonferroni_inbred_cases_PD_genes.csv
}

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


Number of ROH that Pass Bonferroni: 0
Rows of ROH that Pass Bonferroni (including header):


In [54]:
# Read the ROH data that passed Bonferroni correction
roh_pass_bonferroni = pd.read_csv(f"{WORK_DIR}/{ANC}_ROH_Pass_Bonferroni_inbred_cases_PD_genes.csv")

# Display the DataFrame
roh_pass_bonferroni

Unnamed: 0,POOL,num_subjects,case_to_control,ratio,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF


## Find overlaps between EOPD

Identify ROH overlaps specifically within EOPD cases to investigate potential shared genetic causes.

### Subset EOPD + Controls

Retrieve the list of EOPD cases and control samples, and use them to generate PLINK bfiles for further analysis.

In [None]:
%%R

# Load ROH data and clinical data
ROH <- fread(paste0(ANC, "_ROH.hom.indiv"))[, 2]
clinical <- fread("master_key_release6_final.csv")[, c(2, 5, 10)]

# Rename clinical data columns
colnames(clinical) <- c("IID", "PHENO", "AAO")

# Merge ROH and clinical data by IID
merged <- merge(ROH, clinical, by = "IID")

# Add FID column with a default value of 0
merged$FID <- 0

# Filter for Early Onset Parkinson's Disease (EOPD) or controls (PHENO == 1)
filtered <- subset(merged, (AAO < 45 & PHENO == 2) | PHENO == 1)

# Select FID and IID columns for output
selected <- select(filtered, c("FID", "IID"))

# Write the selected samples to a file
fwrite(selected, paste0("keep_", ANC, "_EOPD_sampleset.list"), sep = '\t')

# Display the selected data
selected

In [None]:
# Display the first few lines of the EOPD sampleset list
shell_do(f'head {WORK_DIR}/keep_{ANC}_EOPD_sampleset.list')

In [58]:
# Subset the samples of interest and run homozygosity mapping
shell_do(f'/home/jupyter/tools/plink --bfile {WORK_DIR}/chrAll_{ANC}_QC '
         f'--keep {WORK_DIR}/keep_{ANC}_EOPD_sampleset.list '
         f'--homozyg group '
         f'--homozyg-density 50 '
         f'--homozyg-gap 1000 '
         f'--homozyg-kb 1500 '
         f'--homozyg-snp 100 '
         f'--homozyg-window-het 1 '
         f'--homozyg-window-missing 5 '
         f'--homozyg-window-snp 50 '
         f'--homozyg-window-threshold 0.05 '
         f'--homozyg-match 0.95 '
         f'--out {WORK_DIR}/gp2_{ANC}_r6_EOPD')

Executing: /home/jupyter/tools/plink --bfile /home/jupyter/Release6_AMR/chrAll_AMR_QC --keep /home/jupyter/Release6_AMR/keep_AMR_EOPD_sampleset.list --homozyg group --homozyg-density 50 --homozyg-gap 1000 --homozyg-kb 1500 --homozyg-snp 100 --homozyg-window-het 1 --homozyg-window-missing 5 --homozyg-window-snp 50 --homozyg-window-threshold 0.05 --homozyg-match 0.95 --out /home/jupyter/Release6_AMR/gp2_AMR_r6_EOPD


PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Release6_AMR/gp2_AMR_r6_EOPD.log.
Options in effect:
  --bfile /home/jupyter/Release6_AMR/chrAll_AMR_QC
  --homozyg group
  --homozyg-density 50
  --homozyg-gap 1000
  --homozyg-kb 1500
  --homozyg-match 0.95
  --homozyg-snp 100
  --homozyg-window-het 1
  --homozyg-window-missing 5
  --homozyg-window-snp 50
  --homozyg-window-threshold 0.05
  --keep /home/jupyter/Release6_AMR/keep_AMR_EOPD_sampleset.list
  --out /home/jupyter/Release6_AMR/gp2_AMR_r6_EOPD

628867 MB RAM detected; reserving 314433 MB for main workspace.
4094211 variants loaded from .bim file.
524 people (286 males, 238 females) loaded from .fam.
502 phenotype values loaded from .fam.
--keep: 230 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 230 founders and 0 nonfounders present.
Calcu

### Check consensus ROH (cROH)

Confirm the consensus ROH in the generated sample lists.

In [59]:
# Read the inbreeding ROH overlap data
inbred_roh = pd.read_csv(f'{WORK_DIR}/gp2_{ANC}_r6_EOPD.hom.overlap', sep='\s+')

# Filter for rows where the FID is 'CON' (consensus)
inbred_roh_consense = inbred_roh.loc[inbred_roh['FID'] == 'CON']

# Display the filtered data
inbred_roh_consense

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,NSNP,NSIM,GRP
23,S1,CON,23,7:16,7,chr7:65477373:C:A,chr7:65477373:C:A,65477373,65477373,0.001,1,,
45,S2,CON,20,6:14,3,chr3:49588749:T:G,chr3:49612962:G:A,49588749,49612962,24.214,23,,
67,S3,CON,20,7:13,3,chr3:49855463:A:G,chr3:49870590:G:A,49855463,49870590,15.128,11,,
88,S4,CON,19,6:13,3,chr3:49948685:T:C,chr3:50141389:G:A,49948685,50141389,192.705,140,,
108,S5,CON,18,6:12,3,chr3:48466268:T:C,chr3:48544415:A:G,48466268,48544415,78.148,31,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2614,S410,CON,2,1:1,21,chr21:27018589:G:A,chr21:28152094:A:G,27018589,28152094,1133.506,2165,,
2618,S411,CON,2,1:1,22,chr22:27779753:G:A,chr22:30214581:C:T,27779753,30214581,2434.829,2994,,
2622,S412,CON,2,2:0,22,chr22:30216960:A:G,chr22:30244319:A:G,30216960,30244319,27.360,35,,
2626,S413,CON,2,1:1,22,chr22:30558084:G:A,chr22:32332418:T:TC,30558084,32332418,1774.335,2502,,


In [60]:
# Filter for consensus ROHs (cROH) with length > 100 Kb and > 100 SNPs
inbred_roh_consense_filtered = inbred_roh_consense.loc[
    (inbred_roh_consense['NSNP'] >= 100) & (inbred_roh_consense['KB'] >= 100)
]

# Save the filtered results to a file
inbred_roh_consense_filtered.to_csv(
    f'{WORK_DIR}/gp2_{ANC}_r6_EOPD.hom.overlap.filtered', sep='\t', index=False, header=False
)

# Display the filtered data
inbred_roh_consense_filtered

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,NSNP,NSIM,GRP
88,S4,CON,19,6:13,3,chr3:49948685:T:C,chr3:50141389:G:A,49948685,50141389,192.705,140,,
147,S7,CON,17,10:7,6,chr6:27899165:G:C,chr6:28207771:G:C,27899165,28207771,308.607,385,,
166,S8,CON,17,5:12,22,chr22:41154950:C:T,chr22:41580474:G:C,41154950,41580474,425.525,471,,
184,S9,CON,16,5:11,6,chr6:61317309:G:C,chr6:61867241:T:C,61317309,61867241,549.933,838,,
202,S10,CON,16,7:9,8,chr8:46907634:ACCG:A,chr8:47646564:A:G,46907634,47646564,738.931,575,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2610,S409,CON,2,1:1,21,chr21:24522616:C:T,chr21:26423081:C:T,24522616,26423081,1900.466,3001,,
2614,S410,CON,2,1:1,21,chr21:27018589:G:A,chr21:28152094:A:G,27018589,28152094,1133.506,2165,,
2618,S411,CON,2,1:1,22,chr22:27779753:G:A,chr22:30214581:C:T,27779753,30214581,2434.829,2994,,
2626,S413,CON,2,1:1,22,chr22:30558084:G:A,chr22:32332418:T:TC,30558084,32332418,1774.335,2502,,


In [61]:
# Filter for ROHs that no control (CON) carried, looking for high penetrance variants
check_ROH = inbred_roh_consense_filtered.loc[
    inbred_roh_consense_filtered['PHE'].str.contains(':0')
]

# Display the filtered data
check_ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,NSNP,NSIM,GRP
1109,S97,CON,5,5:0,4,chr4:121656615:T:G,chr4:121970619:G:C,121656615,121970619,314.005,416,,
1303,S127,CON,4,4:0,4,chr4:114942670:T:A,chr4:116313712:G:A,114942670,116313712,1371.043,1915,,
1309,S128,CON,4,4:0,4,chr4:130700347:C:G,chr4:131682533:A:C,130700347,131682533,982.187,1570,,
1315,S129,CON,4,4:0,4,chr4:131985505:A:G,chr4:132300720:G:C,131985505,132300720,315.216,581,,
1542,S167,CON,3,3:0,1,chr1:15673611:C:T,chr1:17215454:A:G,15673611,17215454,1541.844,1569,,
1547,S168,CON,3,3:0,1,chr1:31767341:A:G,chr1:33234313:G:A,31767341,33234313,1466.973,1089,,
1657,S190,CON,3,3:0,3,chr3:120504534:G:C,chr3:122063104:T:A,120504534,122063104,1558.571,2698,,
1662,S191,CON,3,3:0,3,chr3:135906881:A:T,chr3:137453392:T:A,135906881,137453392,1546.512,1695,,
1697,S198,CON,3,3:0,4,chr4:12655552:T:A,chr4:13756738:G:A,12655552,13756738,1101.187,1681,,
1702,S199,CON,3,3:0,4,chr4:71774726:T:G,chr4:73412553:C:T,71774726,73412553,1637.828,2053,,


In [62]:
# Extract 'POOL' column values and join them into a space-separated string
columns = check_ROH['POOL'].tolist()
column_spaces = ' '.join(map(str, columns))

# Display the result
print(column_spaces)

S97 S127 S128 S129 S167 S168 S190 S191 S198 S199 S202 S203 S206 S214 S215 S216 S217 S218 S222 S226 S227 S254 S255 S259 S270 S272 S273 S281 S293 S303 S309 S315 S320 S321 S326 S327 S344 S345 S346 S347 S349 S354 S356 S358 S367 S370 S371 S372 S375 S382 S388 S394 S395 S399 S402 S403 S404 S405 S408 S414


### Find ROH enriched in inbred cases (test the association using logistic model)

Run logistic regression models to test for associations between ROHs and inbred status.

#### Concensus ROH filtered

In [64]:
%%R

# Load the filtered ROH data
ROH <- fread(paste0("gp2_", ANC, "_r6_EOPD.hom.overlap.filtered"), header = FALSE)

# Assign column names to the ROH data
colnames(ROH) <- c("POOL", "FID", "IID", "PHE", "CHR", "SNP1", "SNP2", "BP1", "BP2", "KB", "NSNP", "NSIM", "GRP")

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	345 obs. of  13 variables:
 $ POOL: chr  "S4" "S7" "S8" "S9" ...
 $ FID : chr  "CON" "CON" "CON" "CON" ...
 $ IID : int  19 17 17 16 16 14 14 13 13 12 ...
 $ PHE : chr  "6:13" "10:7" "5:12" "5:11" ...
 $ CHR : int  3 6 22 6 8 8 20 12 16 10 ...
 $ SNP1: chr  "chr3:49948685:T:C" "chr6:27899165:G:C" "chr22:41154950:C:T" "chr6:61317309:G:C" ...
 $ SNP2: chr  "chr3:50141389:G:A" "chr6:28207771:G:C" "chr22:41580474:G:C" "chr6:61867241:T:C" ...
 $ BP1 : int  49948685 27899165 41154950 61317309 46907634 99012219 34787094 111278914 47050042 37413507 ...
 $ BP2 : int  50141389 28207771 41580474 61867241 47646564 99904724 35257135 112549081 47942933 38306784 ...
 $ KB  : num  193 309 426 550 739 ...
 $ NSNP: int  140 385 471 838 575 571 582 1129 224 1446 ...
 $ NSIM: logi  NA NA NA NA NA NA ...
 $ GRP : logi  NA NA NA NA NA NA ...
 - attr(*, ".internal.selfref")=<externalptr> 


In [65]:
# Use this log data as input for the next two steps
shell_do(f'grep "are cases and" {WORK_DIR}/gp2_{ANC}_r6_EOPD.log')

Executing: grep "are cases and" /home/jupyter/Release6_AMR/gp2_AMR_r6_EOPD.log


Among remaining phenotypes, 91 are cases and 139 are controls.


In [66]:
%%R

# Split the 'PHE' column into case and control components
ROH$case <- ldply(strsplit(as.character(ROH$PHE), split = ":"))[[1]]
ROH$control <- ldply(strsplit(as.character(ROH$PHE), split = ":"))[[2]]

# Convert case and control components to numeric
ROH$NcasesWithROHs <- as.numeric(ROH$case)
ROH$NcontrolsWithROHs <- as.numeric(ROH$control)

# Calculate the total counts for cases, controls, and combined
ROH$combinedN <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Initialize the 'P' column for p-values
ROH$P <- NA

# Perform proportion tests for each row
for (i in seq_len(nrow(ROH))) {
  thisP <- prop.test(x = c(ROH$NcasesWithROHs[i], ROH$NcontrolsWithROHs[i]), 
                     n = c(91, 139))  # n = (cases, controls)
  ROH$P[i] <- thisP$p.value
}

# Calculate the total ROH count for cases and controls
ROH$total_ROH_count <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	345 obs. of  20 variables:
 $ POOL             : chr  "S4" "S7" "S8" "S9" ...
 $ FID              : chr  "CON" "CON" "CON" "CON" ...
 $ IID              : int  19 17 17 16 16 14 14 13 13 12 ...
 $ PHE              : chr  "6:13" "10:7" "5:12" "5:11" ...
 $ CHR              : int  3 6 22 6 8 8 20 12 16 10 ...
 $ SNP1             : chr  "chr3:49948685:T:C" "chr6:27899165:G:C" "chr22:41154950:C:T" "chr6:61317309:G:C" ...
 $ SNP2             : chr  "chr3:50141389:G:A" "chr6:28207771:G:C" "chr22:41580474:G:C" "chr6:61867241:T:C" ...
 $ BP1              : int  49948685 27899165 41154950 61317309 46907634 99012219 34787094 111278914 47050042 37413507 ...
 $ BP2              : int  50141389 28207771 41580474 61867241 47646564 99904724 35257135 112549081 47942933 38306784 ...
 $ KB               : num  193 309 426 550 739 ...
 $ NSNP             : int  140 385 471 838 575 571 582 1129 224 1446 ...
 $ NSIM             : logi  NA NA NA NA NA NA ...
 $ GRP    



In [67]:
%%R

# Calculate proportions of cases and controls with ROH
ROH$propCases <- ROH$NcasesWithROHs / 91  # Number of cases
ROH$propControls <- ROH$NcontrolsWithROHs / 139  # Number of controls

# Identify case-enriched ROHs
ROH$caseEnriched <- ifelse(ROH$propCases > ROH$propControls, 1, 0)

# Subset ROHs where total ROH count >= 1 and case-enriched
ROH_subsetted <- subset(ROH, total_ROH_count >= 1 & caseEnriched == 1)

# Apply Bonferroni correction
Nbonf <- length(ROH_subsetted$POOL)  # Number of tests (unique POOLs)
ROH_subsetted$passMultiTest <- ifelse(ROH_subsetted$P <= (0.05 / Nbonf), 1, 0)
ROH_subsetted$BONF <- ROH_subsetted$P / Nbonf

# Subset the significant results after multiple testing correction
significant_ROH <- subset(ROH_subsetted, passMultiTest == 1)

# Apply Bonferroni correction to the full dataset
ROH$BONF <- ROH$P / Nbonf

# Display summary of Bonferroni-corrected results
summary(ROH_subsetted$passMultiTest)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 


In [68]:
%%R

# Write the subsetted ROH data to text files
fwrite(ROH_subsetted, paste0(ANC, "_ROH.EOPD_subsetted.txt"))
fwrite(ROH_subsetted, paste0(ANC, "_ROH.EOPD_caseEnriched.txt"))

In [69]:
# Read the case-enriched ROH data from the file
ROH = pd.read_csv(f'{WORK_DIR}/{ANC}_ROH.EOPD_caseEnriched.txt')

# Display the DataFrame
ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF
0,S7,CON,17,10:7,6,chr6:27899165:G:C,chr6:28207771:G:C,27899165,28207771,308.607,...,10,7,17,0.152806,17,0.109890,0.050360,1,0,0.000670
1,S10,CON,16,7:9,8,chr8:46907634:ACCG:A,chr8:47646564:A:G,46907634,47646564,738.931,...,7,9,16,0.928387,16,0.076923,0.064748,1,0,0.004072
2,S13,CON,14,8:6,8,chr8:99012219:T:G,chr8:99904724:T:C,99012219,99904724,892.506,...,8,6,14,0.268764,14,0.087912,0.043165,1,0,0.001179
3,S18,CON,13,7:6,12,chr12:111278914:G:A,chr12:112549081:T:C,111278914,112549081,1270.168,...,7,6,13,0.428293,13,0.076923,0.043165,1,0,0.001878
4,S23,CON,12,7:5,10,chr10:37413507:T:A,chr10:38306784:T:C,37413507,38306784,893.278,...,7,5,12,0.288016,12,0.076923,0.035971,1,0,0.001263
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
223,S409,CON,2,1:1,21,chr21:24522616:C:T,chr21:26423081:C:T,24522616,26423081,1900.466,...,1,1,2,1.000000,2,0.010989,0.007194,1,0,0.004386
224,S410,CON,2,1:1,21,chr21:27018589:G:A,chr21:28152094:A:G,27018589,28152094,1133.506,...,1,1,2,1.000000,2,0.010989,0.007194,1,0,0.004386
225,S411,CON,2,1:1,22,chr22:27779753:G:A,chr22:30214581:C:T,27779753,30214581,2434.829,...,1,1,2,1.000000,2,0.010989,0.007194,1,0,0.004386
226,S413,CON,2,1:1,22,chr22:30558084:G:A,chr22:32332418:T:TC,30558084,32332418,1774.335,...,1,1,2,1.000000,2,0.010989,0.007194,1,0,0.004386


In [70]:
# Filter the ROH data for rows where the 'PHE' column contains ':0'
filtered_ROH = ROH.loc[ROH['PHE'].str.contains(':0')]

# Display the filtered data
filtered_ROH

Unnamed: 0,POOL,FID,IID,PHE,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF
31,S97,CON,5,5:0,4,chr4:121656615:T:G,chr4:121970619:G:C,121656615,121970619,314.005,...,5,0,5,0.019712,5,0.054945,0.0,1,0,8.6e-05
53,S127,CON,4,4:0,4,chr4:114942670:T:A,chr4:116313712:G:A,114942670,116313712,1371.043,...,4,0,4,0.047947,4,0.043956,0.0,1,0,0.00021
54,S128,CON,4,4:0,4,chr4:130700347:C:G,chr4:131682533:A:C,130700347,131682533,982.187,...,4,0,4,0.047947,4,0.043956,0.0,1,0,0.00021
55,S129,CON,4,4:0,4,chr4:131985505:A:G,chr4:132300720:G:C,131985505,132300720,315.216,...,4,0,4,0.047947,4,0.043956,0.0,1,0,0.00021
77,S167,CON,3,3:0,1,chr1:15673611:C:T,chr1:17215454:A:G,15673611,17215454,1541.844,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.00052
78,S168,CON,3,3:0,1,chr1:31767341:A:G,chr1:33234313:G:A,31767341,33234313,1466.973,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.00052
86,S190,CON,3,3:0,3,chr3:120504534:G:C,chr3:122063104:T:A,120504534,122063104,1558.571,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.00052
87,S191,CON,3,3:0,3,chr3:135906881:A:T,chr3:137453392:T:A,135906881,137453392,1546.512,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.00052
90,S198,CON,3,3:0,4,chr4:12655552:T:A,chr4:13756738:G:A,12655552,13756738,1101.187,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.00052
91,S199,CON,3,3:0,4,chr4:71774726:T:G,chr4:73412553:C:T,71774726,73412553,1637.828,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.00052


In [71]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Count and display various ROH statistics
{
    printf "Number of ROH: "
    wc -l < gp2_${ANC}_r6_EOPD.hom.overlap.filtered

    printf "Number of ROH enriched in Cases: "
    sed 1d < ${ANC}_ROH.EOPD_caseEnriched.txt | wc -l

    printf "Number of ROH that Pass Bonferroni: "
    cut -d',' -f26 < ${ANC}_ROH.EOPD_caseEnriched.txt | sed 1d | { grep -c '1' || true; }
}

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


Number of ROH: 345
Number of ROH enriched in Cases: 228
Number of ROH that Pass Bonferroni: 0


#### In PD genes

Examine the ROHs that overlap with genes associated with PD.

In [72]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Filter ROH data for 'CON' and create clean file
grep 'CON' gp2_${ANC}_r6_EOPD.hom.overlap.filtered | awk -F' ' '{if($4 != "-9")print}' > ${ANC}_ROH.hom.overlap.EOPD.CON.txt

# Filter non-CON/UNION ROHs and clean the file
grep -v 'CON' gp2_${ANC}_r6_EOPD.hom.overlap.filtered | grep -v 'UNION' | awk -F' ' '{if($4 != "-9")print}' | grep -v 'NA' > ${ANC}_ROH.hom.overlap.EOPD.clean.txt

# Run the Perl script to map ROH data with gene list
perl post_plink_ROH_mapping.pl ${ANC}_ROH.hom.overlap.EOPD.clean.txt ${ANC}_ROH.hom.overlap.EOPD.CON.txt geneList.txt ${ANC}_ROH.hom.overlap.EOPD.CON.PDgenes.txt

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


PD gene: LOC100131289
PD gene: RSL24D1P1
PD gene: RBX1
PD gene: SPG11
PD gene: TMEM229B
PD gene: TMEM163
PD gene: TMEM163
PD gene: ELOVL7
PD gene: ELOVL7
PD gene: SLC44A4
PD gene: HLA-DRB5
PD gene: CD19
PD gene: SETD1A
PD gene: SETD1A
PD gene: ZNF646
PD gene: KAT8
PD gene: ASXL3
PD gene: ASXL3
PD gene: ATP13A2
PD gene: KPNA1
PD gene: KPNA1
PD gene: SPTSSB
PD gene: SPTSSB
PD gene: CAMK2D
PD gene: CAMK2D
PD gene: ELOVL7
PD gene: PARK2
PD gene: PARK2
PD gene: PARK2
PD gene: USP8
PD gene: UBTF
PD gene: FAM171A2
PD gene: MAP3K14
PD gene: CRHR1
PD gene: MAPT-AS1
PD gene: KANSL1
PD gene: MAPT
PD gene: MAPT
PD gene: MAP3K14
PD gene: CRHR1
PD gene: MAPT-AS1
PD gene: KANSL1
PD gene: MAPT
PD gene: MAPT
PD gene: NSF
PD gene: WNT3
PD gene: WNT3
PD gene: NSF
PD gene: WNT3
PD gene: WNT3
PD gene: ARHGAP27
PD gene: MYLK2
PD gene: HRNR
PD gene: KCNS3
PD gene: KCNS3
PD gene: SCN3A
PD gene: STK39
PD gene: STK39
PD gene: MED12L
PD gene: MED12L
PD gene: UCHL1
PD gene: SNCA
PD gene: AC097478.1
PD gene: AC097

In [73]:
%%R

# Load the ROH data from the CON PD genes file
ROH <- fread(paste0(ANC, "_ROH.hom.overlap.EOPD.CON.PDgenes.txt"), header = TRUE)

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	42 obs. of  14 variables:
 $ POOL           : chr  "S117" "S133" "S138" "S165" ...
 $ num_subjects   : int  4 4 4 4 4 3 3 3 3 3 ...
 $ case_to_control: chr  "0:4" "3:1" "2:2" "2:2" ...
 $ ratio          : chr  "0.0" "3.0" "1.0" "1.0" ...
 $ CHR            : chr  "chr2" "chr5" "chr6" "chr16" ...
 $ SNP1           : chr  "chr2:134725300:C:T" "chr5:60274169:TG:T" "chr6:31353056:C:T" "chr16:29651317:G:T" ...
 $ SNP2           : chr  "chr2:136064653:T:C" "chr5:61320199:T:G" "chr6:31949174:T:C" "chr16:31132672:C:T" ...
 $ BP1            : int  134725300 60274169 31353056 29651317 32982212 15673611 120504534 160234491 112860471 61627648 ...
 $ BP2            : int  136064653 61320199 31949174 31132672 34497055 17215454 122063104 161775573 114748490 63237065 ...
 $ KB             : num  1339 1046 596 1481 1515 ...
 $ NSNP           : int  1272 1277 1055 923 1597 1569 2698 2049 2729 2351 ...
 $ PD gene        : chr  "TMEM163 TMEM163" "ELOVL7 ELOVL7" "SLC44

In [74]:
# Use this log data as input for the next two steps
shell_do(f'grep "are cases and" {WORK_DIR}/gp2_{ANC}_r6_EOPD.log')

Executing: grep "are cases and" /home/jupyter/Release6_AMR/gp2_AMR_r6_EOPD.log


Among remaining phenotypes, 91 are cases and 139 are controls.


In [75]:
%%R

# Split the 'case_to_control' column into case and control components
ROH$case <- ldply(strsplit(as.character(ROH$case_to_control), split = ":"))[[1]]
ROH$control <- ldply(strsplit(as.character(ROH$case_to_control), split = ":"))[[2]]

# Convert case and control components to numeric
ROH$NcasesWithROHs <- as.numeric(ROH$case)
ROH$NcontrolsWithROHs <- as.numeric(ROH$control)

# Calculate total counts for cases, controls, and combined
ROH$combinedN <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Initialize the 'P' column for p-values
ROH$P <- NA

# Perform proportion tests for each row
for (i in seq_len(nrow(ROH))) {
  thisP <- prop.test(x = c(ROH$NcasesWithROHs[i], ROH$NcontrolsWithROHs[i]), 
                     n = c(91, 139))  # n = (cases, controls)
  ROH$P[i] <- thisP$p.value
}

# Calculate the total ROH count for cases and controls
ROH$total_ROH_count <- ROH$NcasesWithROHs + ROH$NcontrolsWithROHs

# Display the structure of the ROH data
str(ROH)

Classes ‘data.table’ and 'data.frame':	42 obs. of  21 variables:
 $ POOL             : chr  "S117" "S133" "S138" "S165" ...
 $ num_subjects     : int  4 4 4 4 4 3 3 3 3 3 ...
 $ case_to_control  : chr  "0:4" "3:1" "2:2" "2:2" ...
 $ ratio            : chr  "0.0" "3.0" "1.0" "1.0" ...
 $ CHR              : chr  "chr2" "chr5" "chr6" "chr16" ...
 $ SNP1             : chr  "chr2:134725300:C:T" "chr5:60274169:TG:T" "chr6:31353056:C:T" "chr16:29651317:G:T" ...
 $ SNP2             : chr  "chr2:136064653:T:C" "chr5:61320199:T:G" "chr6:31949174:T:C" "chr16:31132672:C:T" ...
 $ BP1              : int  134725300 60274169 31353056 29651317 32982212 15673611 120504534 160234491 112860471 61627648 ...
 $ BP2              : int  136064653 61320199 31949174 31132672 34497055 17215454 122063104 161775573 114748490 63237065 ...
 $ KB               : num  1339 1046 596 1481 1515 ...
 $ NSNP             : int  1272 1277 1055 923 1597 1569 2698 2049 2729 2351 ...
 $ PD gene          : chr  "TMEM163 TMEM163



In [76]:
%%R

# Calculate proportions of cases and controls with ROH
ROH$propCases <- ROH$NcasesWithROHs / 91  # Number of cases
ROH$propControls <- ROH$NcontrolsWithROHs / 139  # Number of controls

# Identify case-enriched ROHs
ROH$caseEnriched <- ifelse(ROH$propCases > ROH$propControls, 1, 0)

# Subset ROHs where total ROH count >= 1 and case-enriched
ROH_subsetted <- subset(ROH, total_ROH_count >= 1 & caseEnriched == 1)

# Apply Bonferroni correction
Ngenes <- length(ROH_subsetted$PD)  # Number of genes
ROH_subsetted$passMultiTest <- ifelse(ROH_subsetted$P <= (0.05 / Ngenes), 1, 0)
ROH_subsetted$BONF <- ROH_subsetted$P / Ngenes

# Subset the significant results after multiple testing correction
significant_ROH <- subset(ROH_subsetted, passMultiTest == 1)

# Apply Bonferroni correction to the full dataset
ROH$BONF <- ROH$P / Ngenes

# Display summary of Bonferroni-corrected results
summary(ROH_subsetted$passMultiTest)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 


In [77]:
%%R

# Write the subsetted ROH data to text files
fwrite(ROH_subsetted, paste0(ANC, "_ROH.EOPD_PDgenes_subsetted.txt"))
fwrite(ROH_subsetted, paste0(ANC, "_ROH.EOPD_PDgenes_caseEnriched.txt"))

In [78]:
# Read the case-enriched ROH data from the file
ROH = pd.read_csv(f'{WORK_DIR}/{ANC}_ROH.EOPD_PDgenes_caseEnriched.txt')

# Display the DataFrame
ROH

Unnamed: 0,POOL,num_subjects,case_to_control,ratio,CHR,SNP1,SNP2,BP1,BP2,KB,...,NcasesWithROHs,NcontrolsWithROHs,combinedN,P,total_ROH_count,propCases,propControls,caseEnriched,passMultiTest,BONF
0,S133,4,3:1,3.0,chr5,chr5:60274169:TG:T,chr5:61320199:T:G,60274169,61320199,1046.031,...,3,1,4,0.34399,4,0.032967,0.007194,1,0,0.010424
1,S138,4,2:2,1.0,chr6,chr6:31353056:C:T,chr6:31949174:T:C,31353056,31949174,596.119,...,2,2,4,1.0,4,0.021978,0.014388,1,0,0.030303
2,S165,4,2:2,1.0,chr16,chr16:29651317:G:T,chr16:31132672:C:T,29651317,31132672,1481.356,...,2,2,4,1.0,4,0.021978,0.014388,1,0,0.030303
3,S166,4,3:1,3.0,chr18,chr18:32982212:C:T,chr18:34497055:T:C,32982212,34497055,1514.844,...,3,1,4,0.34399,4,0.032967,0.007194,1,0,0.010424
4,S167,3,3:0,3:0,chr1,chr1:15673611:C:T,chr1:17215454:A:G,15673611,17215454,1541.844,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.003595
5,S190,3,3:0,3:0,chr3,chr3:120504534:G:C,chr3:122063104:T:A,120504534,122063104,1558.571,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.003595
6,S202,3,3:0,3:0,chr4,chr4:112860471:G:C,chr4:114748490:G:T,112860471,114748490,1888.02,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.003595
7,S206,3,3:0,3:0,chr5,chr5:61627648:G:A,chr5:63237065:A:G,61627648,63237065,1609.418,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.003595
8,S213,3,2:1,2.0,chr6,chr6:160712093:G:C,chr6:161123166:A:C,160712093,161123166,411.074,...,2,1,3,0.70986,3,0.021978,0.007194,1,0,0.021511
9,S214,3,3:0,3:0,chr6,chr6:161389511:A:C,chr6:161815720:A:G,161389511,161815720,426.21,...,3,0,3,0.118637,3,0.032967,0.0,1,0,0.003595


In [79]:
%%bash

# Set working directory and ancestry code
WORK_DIR='/home/jupyter/Release6_{ANC}'
cd $WORK_DIR
ANC=AMR

# Count and display various ROH statistics
{
    printf "Number of ROH overlapping PD genes: "
    sed 1d < ${ANC}_ROH.hom.overlap.EOPD.CON.PDgenes.txt | wc -l

    printf "Number of ROH enriched in Cases: "
    sed 1d < ${ANC}_ROH.EOPD_PDgenes_caseEnriched.txt | wc -l

    printf "Number of ROH that Pass Bonferroni: "
    cut -d',' -f27 < ${ANC}_ROH.EOPD_PDgenes_caseEnriched.txt | sed 1d | { grep -c '1' || true; }
}

bash: line 4: cd: /home/jupyter/Release6_{ANC}: No such file or directory


Number of ROH overlapping PD genes: 42
Number of ROH enriched in Cases: 33
Number of ROH that Pass Bonferroni: 0


# Fine mapping loop (all)

Iterate over all identified regions to perform fine mapping, isolating the regions most likely to contain disease-causing mutations.

In [80]:
# Create a working directory for use in Python
print("Making a working directory")
WORK_DIR = '/home/jupyter/Mapping'
shell_do(f'mkdir -p {WORK_DIR}')  # Create the directory if it doesn't exist

Making a working directory


Executing: mkdir -p /home/jupyter/Mapping


## Download pools

Download the pools of samples for further analysis of shared alleles

In [83]:
# Identify and list different names in pools, excluding certain files
shell_do(f"ls {WORK_DIR}/*.vcf.gz | sed 's/.vcf.gz//g' | sort -V | grep -v 'chr1_shared_roh' > {WORK_DIR}/list_pools.txt")

# Filter and identify pools that need to be renamed
shell_do(f"grep 'S[0-9]\\+_' {WORK_DIR}/list_pools.txt > {WORK_DIR}/to_rename_pools.txt")

Executing: ls /home/jupyter/Mapping/*.vcf.gz | sed 's/.vcf.gz//g' | sort -V | grep -v 'chr1_shared_roh' > /home/jupyter/Mapping/list_pools.txt
Executing: grep 'S[0-9]\+_' /home/jupyter/Mapping/list_pools.txt > /home/jupyter/Mapping/to_rename_pools.txt


In [84]:
# Rename pools to match the desired nomenclature
with open(f'{WORK_DIR}/to_rename_pools.txt', 'r') as file:
    # Iterate over each line in the file
    for line in file:
        # Remove the newline character and get the pool name
        pool = line.strip()
        name = pool.split('/')[-1]  # Extract the file name from the path
        pool_id, ancestry = name.split('_')  # Split the name into pool ID and ancestry

        # Rename the VCF and index files to match the new nomenclature
        shell_do(f"mv {WORK_DIR}/{name}.vcf.gz {WORK_DIR}/{ancestry}_{pool_id}.vcf.gz")
        shell_do(f"mv {WORK_DIR}/{name}.vcf.gz.tbi {WORK_DIR}/{ancestry}_{pool_id}.vcf.gz.tbi")

Executing: mv /home/jupyter/Mapping/S7_MDE.vcf.gz /home/jupyter/Mapping/MDE_S7.vcf.gz
Executing: mv /home/jupyter/Mapping/S7_MDE.vcf.gz.tbi /home/jupyter/Mapping/MDE_S7.vcf.gz.tbi
Executing: mv /home/jupyter/Mapping/S14_MDE.vcf.gz /home/jupyter/Mapping/MDE_S14.vcf.gz
Executing: mv /home/jupyter/Mapping/S14_MDE.vcf.gz.tbi /home/jupyter/Mapping/MDE_S14.vcf.gz.tbi
Executing: mv /home/jupyter/Mapping/S15_MDE.vcf.gz /home/jupyter/Mapping/MDE_S15.vcf.gz
Executing: mv /home/jupyter/Mapping/S15_MDE.vcf.gz.tbi /home/jupyter/Mapping/MDE_S15.vcf.gz.tbi
Executing: mv /home/jupyter/Mapping/S16_MDE.vcf.gz /home/jupyter/Mapping/MDE_S16.vcf.gz
Executing: mv /home/jupyter/Mapping/S16_MDE.vcf.gz.tbi /home/jupyter/Mapping/MDE_S16.vcf.gz.tbi
Executing: mv /home/jupyter/Mapping/S17_MDE.vcf.gz /home/jupyter/Mapping/MDE_S17.vcf.gz
Executing: mv /home/jupyter/Mapping/S17_MDE.vcf.gz.tbi /home/jupyter/Mapping/MDE_S17.vcf.gz.tbi
Executing: mv /home/jupyter/Mapping/S51_MDE.vcf.gz /home/jupyter/Mapping/MDE_S51.vcf

In [85]:
# Generate an updated list of renamed pools, excluding certain files
shell_do(f"ls {WORK_DIR}/*.vcf.gz | sed 's/.vcf.gz//g' | sort -V | grep -v 'chr1_shared_roh' > {WORK_DIR}/list_renamed_pools.txt")

Executing: ls /home/jupyter/Mapping/*.vcf.gz | sed 's/.vcf.gz//g' | sort -V | grep -v 'chr1_shared_roh' > /home/jupyter/Mapping/list_renamed_pools.txt


## Fine mapping in shared alleles ROH (subset ROH regions shared)

Perform fine mapping on the ROH regions shared between individuals. This step narrows down potential causal variants.

In [88]:
# Display the last few lines of the pool origin file
shell_do(f'tail {WORK_DIR}/pools_origin.tsv')

Executing: tail /home/jupyter/Mapping/pools_origin.tsv














### Select individuals that share alleles

Identify individuals who share alleles in the same ROH to investigate potential genetic similarities.

In [None]:
# Initialize an empty DataFrame to store allele-matched ROH data
allele_match_ROH = pd.DataFrame()

with open(f'{WORK_DIR}/pools_origin.tsv', 'r') as file:
    # Iterate over each line in the file
    for line in file:
        # Generate variables from the line
        pool = line.strip()
        ancestry, type_pool, pool_id, path = pool.split('\t')

        # Load the corresponding .hom.overlap file
        hom_overlap = pd.read_csv(f"{WORK_DIR}/{path}", sep='\s+')

        # Clean the GRP column by removing asterisks (*)
        hom_overlap['GRP'] = hom_overlap['GRP'].astype(str).str.replace('*', '', regex=False)

        # Filter the pool for the current pool ID
        work_pool = hom_overlap[hom_overlap['POOL'] == pool_id]

        # Select ROH pools that shared alleles
        result = work_pool[work_pool['NSIM'] > 0]
        shared1 = result[result['GRP'] == '1']

        # Insert the ancestry as the first column
        shared1.insert(0, 'Ancestry', ancestry)

        # Append the shared ROH data to the main DataFrame
        allele_match_ROH = pd.concat([allele_match_ROH, shared1], ignore_index=True)

        # If there are shared ROHs, write to a TSV file
        if len(shared1) > 0:
            shared1['Total'] = len(work_pool) - 2
            shared1[['IID', 'CHR', 'BP1', 'BP2', 'Total']].to_csv(
                f'{WORK_DIR}/PHENO_{ancestry}_{pool_id}.tsv', sep='\t', index=False
            )

# Display the final allele-matched ROH DataFrame
allele_match_ROH

In [90]:
# List PHENO files, clean the filenames, and save the result to shared_alleles_pools.txt
shell_do(f"ls {WORK_DIR}/PHENO* | sed 's|{WORK_DIR}/PHENO_||g' | sed 's/.tsv//g' > {WORK_DIR}/shared_alleles_pools.txt")

Executing: ls /home/jupyter/Mapping/PHENO* | sed 's|/home/jupyter/Mapping/PHENO_||g' | sed 's/.tsv//g' > /home/jupyter/Mapping/shared_alleles_pools.txt


In [91]:
# Filter list_renamed_pools.txt using shared_alleles_pools.txt and save the result
shell_do(f"grep -f {WORK_DIR}/shared_alleles_pools.txt {WORK_DIR}/list_renamed_pools.txt > {WORK_DIR}/list_shared_alleles_pools.txt")

Executing: grep -f /home/jupyter/Mapping/shared_alleles_pools.txt /home/jupyter/Mapping/list_renamed_pools.txt > /home/jupyter/Mapping/list_shared_alleles_pools.txt


### Select homozygous variants and extract CSQ from the VCF file

In [92]:
# Initialize an empty DataFrame to store results from all pools
df_results = pd.DataFrame()

# Open and read the file containing the list of pools
with open(f'{WORK_DIR}/list_shared_alleles_pools.txt', 'r') as file:
    # Iterate over each pool listed in the file
    for line in file:
        # Remove trailing newline characters from the line
        pool = line.strip()
        print(f"Processing pool: {pool}")

        # Extract the pool name (e.g., ancestry-specific identifier) from the file path
        pool_name = pool.split('/')[4]
        
        # Generate the PHENO file to subset pools based on individual IDs
        shell_do(f"cat {WORK_DIR}/PHENO_{pool_name}.tsv | cut -f1 | sed 1d > {WORK_DIR}/PHENO_{pool_name}.IID")

        # Read the PHENO file to extract genomic positions and chromosome information
        positions = pd.read_csv(f"{WORK_DIR}/PHENO_{pool_name}.tsv", sep='\t')
        CHR = positions['CHR'].max()  # Extract the maximum chromosome number (optional)

        # Check if WGS data contains the samples sharing alleles
        shell_do(f'/home/jupyter/tools/plink2 --vcf {pool}.vcf.gz --make-just-psam --out {pool}')
        IIDs_WGS = pd.read_csv(f"{pool}.psam", sep='\t')
        is_contained = IIDs_WGS['#IID'].isin(positions['IID']).any()

        # Proceed only if at least one sample from the pool is found in WGS data
        if is_contained:
            # Run PLINK2 to set variant IDs and generate a .raw file with sample data
            shell_do(f'/home/jupyter/tools/plink2 --vcf {pool}.vcf.gz \
                           --set-all-var-ids @:#:\$r:\$a \
                           --new-id-max-allele-len 160 \
                           --keep {WORK_DIR}/PHENO_{pool_name}.IID \
                           --recode A --out {pool}')
            
            # Read the generated .raw file into a DataFrame
            raw = pd.read_csv(f'{pool}.raw', sep='\t')
            
            # Identify columns in the .raw file containing only zeros (homozygous variants)
            columns_with_only_0 = raw.columns[(raw.eq(0).all())]

            # Decompress the VCF file for annotation purposes
            shell_do(f'gunzip -c {pool}.vcf.gz > {pool}.vcf')
            
            # Extract chromosome, position, reference allele, and alternate allele from the VCF file (first part)
            shell_do(f"cat {pool}.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > {pool}_begin")
            
            # Extract annotation information (e.g., functional effects, allele frequencies) from the VCF file (second part)
            shell_do(f"cat {pool}.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/\t/g' | \
                     awk -F '\t' 'BEGIN{{OFS=FS}}{{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}}' > {pool}_end")
            
            # Combine the extracted information into a reduced annotation file
            shell_do(f'paste {pool}_begin {pool}_end | sed 1d > {pool}_reduced.tsv')
            
            # Define column names for the reduced annotation file
            column_names = [
                "chr", "bp", "ID", "REF", "ALT", "rsid", "gene", "variant_type", "Variant_class", 
                "variant_impact", "GnomADg_AF", "clinical_signif", "clinvar_signif", "cadd_phred", 
                "REVEL", "eve_class", "am_class", "SIFT", "Polyphen"
            ]
            
            # Read the reduced annotation file into a DataFrame
            vars_anno_reduced = pd.read_csv(
                f'{pool}_reduced.tsv', sep='\t', low_memory=False, header=None, names=column_names
            )
            
            # Generate a unique ID for each variant by combining chromosome, position, REF, and ALT columns
            vars_anno_reduced['ID'] = vars_anno_reduced['chr'].astype(str) + ':' + vars_anno_reduced['bp'].astype(str) + ':' + vars_anno_reduced['REF'] + ':' + vars_anno_reduced['ALT']
            
            # Create a new column to match the format of column names in the .raw file
            vars_anno_reduced['colname'] = vars_anno_reduced['ID'] + '_' + vars_anno_reduced['REF']
            
            # Drop the temporary ID column as it is no longer needed
            vars_anno_reduced = vars_anno_reduced.drop('ID', axis=1)
            
            # Filter the reduced annotation DataFrame to retain only rows with columns containing zeros in the .raw file
            results = vars_anno_reduced.loc[vars_anno_reduced['colname'].isin(columns_with_only_0)]
            
            # Extract ancestry and pool ID from the pool name
            ancestry, pool_id = pool_name.split('_')
            
            # Add the ancestry as the first column in the filtered DataFrame
            results.insert(0, 'Ancestry', ancestry)
            
            # Add the pool ID as the second column in the filtered DataFrame
            results.insert(1, 'Pool', pool_id)
            
            # Append the filtered results to the main DataFrame
            df_results = pd.concat([df_results, results], ignore_index=True)

# Sort the concatenated results by 'Ancestry' and 'Pool' in descending order
df_results = df_results.sort_values(by=['Ancestry', 'Pool'], ascending=[False, False])

# Display the final sorted DataFrame
df_results

Processing pool: /home/jupyter/Mapping/AMR_S97


Executing: cat /home/jupyter/Mapping/PHENO_AMR_S97.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_AMR_S97.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S97.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S97


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S97.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S97
  --vcf /home/jupyter/Mapping/AMR_S97.vcf.gz

Start time: Tue Nov 26 05:25:49 2024
628867 MiB RAM detected, ~621586 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 57669 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S97-temporary.pgen +
/home/jupyter/Mapping/AMR_S97-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S97-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S97-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S97.psam ... done.
End time: Tue Nov 26 05:25:49 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S97.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_AMR_S97.IID                            --recode A --out /home/jupyter/Mapping/AMR_S97


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S97.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_AMR_S97.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AMR_S97
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AMR_S97.vcf.gz

Start time: Tue Nov 26 05:25:50 2024
628867 MiB RAM detected, ~621596 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 57669 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S97-temporary.pgen +
/home/jupyter/Mapping/AMR_S97-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S97-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S97-temporary.psam.
57669 variants loaded from /home/jupyter/Mapping/AMR_S97-temporary.pvar.zst.
Note: No phenot

Executing: gunzip -c /home/jupyter/Mapping/AMR_S97.vcf.gz > /home/jupyter/Mapping/AMR_S97.vcf
Executing: cat /home/jupyter/Mapping/AMR_S97.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AMR_S97_begin
Executing: cat /home/jupyter/Mapping/AMR_S97.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AMR_S97_end
Executing: paste /home/jupyter/Mapping/AMR_S97_begin /home/jupyter/Mapping/AMR_S97_end | sed 1d > /home/jupyter/Mapping/AMR_S97_reduced.tsv


Processing pool: /home/jupyter/Mapping/AMR_S122


Executing: cat /home/jupyter/Mapping/PHENO_AMR_S122.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_AMR_S122.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S122.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S122


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S122.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S122
  --vcf /home/jupyter/Mapping/AMR_S122.vcf.gz

Start time: Tue Nov 26 05:26:08 2024
628867 MiB RAM detected, ~621593 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 14808 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S122-temporary.pgen +
/home/jupyter/Mapping/AMR_S122-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S122-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S122-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S122.psam ... done.
End time: Tue Nov 26 05:26:08 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S122.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_AMR_S122.IID                            --recode A --out /home/jupyter/Mapping/AMR_S122


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S122.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_AMR_S122.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AMR_S122
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AMR_S122.vcf.gz

Start time: Tue Nov 26 05:26:09 2024
628867 MiB RAM detected, ~621595 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 14808 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S122-temporary.pgen +
/home/jupyter/Mapping/AMR_S122-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S122-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S122-temporary.psam.
14808 variants loaded from /home/jupyter/Mapping/AMR_S122-temporary.pvar.zst.
Note: 

Executing: gunzip -c /home/jupyter/Mapping/AMR_S122.vcf.gz > /home/jupyter/Mapping/AMR_S122.vcf
Executing: cat /home/jupyter/Mapping/AMR_S122.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AMR_S122_begin
Executing: cat /home/jupyter/Mapping/AMR_S122.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AMR_S122_end
Executing: paste /home/jupyter/Mapping/AMR_S122_begin /home/jupyter/Mapping/AMR_S122_end | sed 1d > /home/jupyter/Mapping/AMR_S122_reduced.tsv


Processing pool: /home/jupyter/Mapping/AMR_S129


Executing: cat /home/jupyter/Mapping/PHENO_AMR_S129.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_AMR_S129.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S129.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S129


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S129.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S129
  --vcf /home/jupyter/Mapping/AMR_S129.vcf.gz

Start time: Tue Nov 26 05:26:19 2024
628867 MiB RAM detected, ~621595 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 32245 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S129-temporary.pgen +
/home/jupyter/Mapping/AMR_S129-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S129-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S129-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S129.psam ... done.
End time: Tue Nov 26 05:26:19 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S129.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_AMR_S129.IID                            --recode A --out /home/jupyter/Mapping/AMR_S129


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S129.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_AMR_S129.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AMR_S129
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AMR_S129.vcf.gz

Start time: Tue Nov 26 05:26:20 2024
628867 MiB RAM detected, ~621602 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 32245 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S129-temporary.pgen +
/home/jupyter/Mapping/AMR_S129-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S129-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S129-temporary.psam.
32245 variants loaded from /home/jupyter/Mapping/AMR_S129-temporary.pvar.zst.
Note: 

Executing: gunzip -c /home/jupyter/Mapping/AMR_S129.vcf.gz > /home/jupyter/Mapping/AMR_S129.vcf
Executing: cat /home/jupyter/Mapping/AMR_S129.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AMR_S129_begin
Executing: cat /home/jupyter/Mapping/AMR_S129.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AMR_S129_end
Executing: paste /home/jupyter/Mapping/AMR_S129_begin /home/jupyter/Mapping/AMR_S129_end | sed 1d > /home/jupyter/Mapping/AMR_S129_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S7


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S7.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S7.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S7.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S7


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S7.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S7
  --vcf /home/jupyter/Mapping/MDE_S7.vcf.gz

Start time: Tue Nov 26 05:26:33 2024
628867 MiB RAM detected, ~621590 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 8263 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S7-temporary.pgen +
/home/jupyter/Mapping/MDE_S7-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S7-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S7-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S7.psam ... done.
End time: Tue Nov 26 05:26:33 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S7.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S7.IID                            --recode A --out /home/jupyter/Mapping/MDE_S7


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S7.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S7.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S7
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S7.vcf.gz

Start time: Tue Nov 26 05:26:34 2024
628867 MiB RAM detected, ~621597 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 8263 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S7-temporary.pgen +
/home/jupyter/Mapping/MDE_S7-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S7-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S7-temporary.psam.
8263 variants loaded from /home/jupyter/Mapping/MDE_S7-temporary.pvar.zst.
Note: No phenotype data pr

Executing: gunzip -c /home/jupyter/Mapping/MDE_S7.vcf.gz > /home/jupyter/Mapping/MDE_S7.vcf
Executing: cat /home/jupyter/Mapping/MDE_S7.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S7_begin
Executing: cat /home/jupyter/Mapping/MDE_S7.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S7_end
Executing: paste /home/jupyter/Mapping/MDE_S7_begin /home/jupyter/Mapping/MDE_S7_end | sed 1d > /home/jupyter/Mapping/MDE_S7_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S51


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S51.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S51.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S51.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S51


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S51.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S51
  --vcf /home/jupyter/Mapping/MDE_S51.vcf.gz

Start time: Tue Nov 26 05:26:42 2024
628867 MiB RAM detected, ~621595 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 53709 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S51-temporary.pgen +
/home/jupyter/Mapping/MDE_S51-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S51-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S51-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S51.psam ... done.
End time: Tue Nov 26 05:26:42 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S51.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S51.IID                            --recode A --out /home/jupyter/Mapping/MDE_S51


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S51.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S51.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S51
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S51.vcf.gz

Start time: Tue Nov 26 05:26:44 2024
628867 MiB RAM detected, ~621598 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 53709 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S51-temporary.pgen +
/home/jupyter/Mapping/MDE_S51-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S51-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S51-temporary.psam.
53709 variants loaded from /home/jupyter/Mapping/MDE_S51-temporary.pvar.zst.
Note: No phenot

Executing: gunzip -c /home/jupyter/Mapping/MDE_S51.vcf.gz > /home/jupyter/Mapping/MDE_S51.vcf
Executing: cat /home/jupyter/Mapping/MDE_S51.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S51_begin
Executing: cat /home/jupyter/Mapping/MDE_S51.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S51_end
Executing: paste /home/jupyter/Mapping/MDE_S51_begin /home/jupyter/Mapping/MDE_S51_end | sed 1d > /home/jupyter/Mapping/MDE_S51_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S164


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S164.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S164.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S164.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S164


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S164.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S164
  --vcf /home/jupyter/Mapping/MDE_S164.vcf.gz

Start time: Tue Nov 26 05:27:02 2024
628867 MiB RAM detected, ~621603 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 308278 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S164-temporary.pgen +
/home/jupyter/Mapping/MDE_S164-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S164-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S164-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S164.psam ... done.
End time: Tue Nov 26 05:27:03 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S164.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S164.IID                            --recode A --out /home/jupyter/Mapping/MDE_S164


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S164.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S164.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S164
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S164.vcf.gz

Start time: Tue Nov 26 05:27:04 2024
628867 MiB RAM detected, ~621605 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 308278 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S164-temporary.pgen +
/home/jupyter/Mapping/MDE_S164-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S164-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S164-temporary.psam.
308278 variants loaded from /home/jupyter/Mapping/MDE_S164-temporary.pvar.zst.
Note

Executing: gunzip -c /home/jupyter/Mapping/MDE_S164.vcf.gz > /home/jupyter/Mapping/MDE_S164.vcf
Executing: cat /home/jupyter/Mapping/MDE_S164.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S164_begin
Executing: cat /home/jupyter/Mapping/MDE_S164.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S164_end
Executing: paste /home/jupyter/Mapping/MDE_S164_begin /home/jupyter/Mapping/MDE_S164_end | sed 1d > /home/jupyter/Mapping/MDE_S164_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S186


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S186.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S186.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S186.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S186


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S186.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S186
  --vcf /home/jupyter/Mapping/MDE_S186.vcf.gz

Start time: Tue Nov 26 05:28:24 2024
628867 MiB RAM detected, ~621431 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 332557 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S186-temporary.pgen +
/home/jupyter/Mapping/MDE_S186-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S186-temporary.psam written.
7 samples (0 females, 0 males, 7 ambiguous; 7 founders) loaded from
/home/jupyter/Mapping/MDE_S186-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S186.psam ... done.
End time: Tue Nov 26 05:28:26 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S186.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S186.IID                            --recode A --out /home/jupyter/Mapping/MDE_S186


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S186.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S186.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S186
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S186.vcf.gz

Start time: Tue Nov 26 05:28:27 2024
628867 MiB RAM detected, ~621441 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 332557 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S186-temporary.pgen +
/home/jupyter/Mapping/MDE_S186-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S186-temporary.psam written.
7 samples (0 females, 0 males, 7 ambiguous; 7 founders) loaded from
/home/jupyter/Mapping/MDE_S186-temporary.psam.
332557 variants loaded from /home/jupyter/Mapping/MDE_S186-temporary.pvar.zst.
Note

Executing: gunzip -c /home/jupyter/Mapping/MDE_S186.vcf.gz > /home/jupyter/Mapping/MDE_S186.vcf
Executing: cat /home/jupyter/Mapping/MDE_S186.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S186_begin
Executing: cat /home/jupyter/Mapping/MDE_S186.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S186_end
Executing: paste /home/jupyter/Mapping/MDE_S186_begin /home/jupyter/Mapping/MDE_S186_end | sed 1d > /home/jupyter/Mapping/MDE_S186_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S235


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S235.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S235.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S235.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S235


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S235.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S235
  --vcf /home/jupyter/Mapping/MDE_S235.vcf.gz

Start time: Tue Nov 26 05:29:52 2024
628867 MiB RAM detected, ~621330 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 71738 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S235-temporary.pgen +
/home/jupyter/Mapping/MDE_S235-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S235-temporary.psam written.
3 samples (0 females, 0 males, 3 ambiguous; 3 founders) loaded from
/home/jupyter/Mapping/MDE_S235-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S235.psam ... done.
End time: Tue Nov 26 05:29:53 2024
Processing pool: /home/jupyter/Mapping/MDE_S288


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S288.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S288.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S288.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S288


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S288.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S288
  --vcf /home/jupyter/Mapping/MDE_S288.vcf.gz

Start time: Tue Nov 26 05:29:55 2024
628867 MiB RAM detected, ~621337 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 38535 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S288-temporary.pgen +
/home/jupyter/Mapping/MDE_S288-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S288-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S288-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S288.psam ... done.
End time: Tue Nov 26 05:29:55 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S288.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S288.IID                            --recode A --out /home/jupyter/Mapping/MDE_S288


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S288.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S288.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S288
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S288.vcf.gz

Start time: Tue Nov 26 05:29:56 2024
628867 MiB RAM detected, ~621342 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 38535 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S288-temporary.pgen +
/home/jupyter/Mapping/MDE_S288-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S288-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S288-temporary.psam.
38535 variants loaded from /home/jupyter/Mapping/MDE_S288-temporary.pvar.zst.
Note: 

Executing: gunzip -c /home/jupyter/Mapping/MDE_S288.vcf.gz > /home/jupyter/Mapping/MDE_S288.vcf
Executing: cat /home/jupyter/Mapping/MDE_S288.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S288_begin
Executing: cat /home/jupyter/Mapping/MDE_S288.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S288_end
Executing: paste /home/jupyter/Mapping/MDE_S288_begin /home/jupyter/Mapping/MDE_S288_end | sed 1d > /home/jupyter/Mapping/MDE_S288_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S289


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S289.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S289.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S289.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S289


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S289.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S289
  --vcf /home/jupyter/Mapping/MDE_S289.vcf.gz

Start time: Tue Nov 26 05:30:12 2024
628867 MiB RAM detected, ~621421 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 18969 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S289-temporary.pgen +
/home/jupyter/Mapping/MDE_S289-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S289-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S289-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S289.psam ... done.
End time: Tue Nov 26 05:30:12 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S289.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S289.IID                            --recode A --out /home/jupyter/Mapping/MDE_S289


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S289.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S289.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S289
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S289.vcf.gz

Start time: Tue Nov 26 05:30:13 2024
628867 MiB RAM detected, ~621424 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 18969 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S289-temporary.pgen +
/home/jupyter/Mapping/MDE_S289-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S289-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S289-temporary.psam.
18969 variants loaded from /home/jupyter/Mapping/MDE_S289-temporary.pvar.zst.
Note: 

Executing: gunzip -c /home/jupyter/Mapping/MDE_S289.vcf.gz > /home/jupyter/Mapping/MDE_S289.vcf
Executing: cat /home/jupyter/Mapping/MDE_S289.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S289_begin
Executing: cat /home/jupyter/Mapping/MDE_S289.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S289_end
Executing: paste /home/jupyter/Mapping/MDE_S289_begin /home/jupyter/Mapping/MDE_S289_end | sed 1d > /home/jupyter/Mapping/MDE_S289_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S406


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S406.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S406.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S406.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S406


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S406.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S406
  --vcf /home/jupyter/Mapping/MDE_S406.vcf.gz

Start time: Tue Nov 26 05:30:23 2024
628867 MiB RAM detected, ~621434 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 30203 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S406-temporary.pgen +
/home/jupyter/Mapping/MDE_S406-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S406-temporary.psam written.
4 samples (0 females, 0 males, 4 ambiguous; 4 founders) loaded from
/home/jupyter/Mapping/MDE_S406-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S406.psam ... done.
End time: Tue Nov 26 05:30:23 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S406.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S406.IID                            --recode A --out /home/jupyter/Mapping/MDE_S406


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S406.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S406.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S406
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S406.vcf.gz

Start time: Tue Nov 26 05:30:25 2024
628867 MiB RAM detected, ~621432 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 30203 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S406-temporary.pgen +
/home/jupyter/Mapping/MDE_S406-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S406-temporary.psam written.
4 samples (0 females, 0 males, 4 ambiguous; 4 founders) loaded from
/home/jupyter/Mapping/MDE_S406-temporary.psam.
30203 variants loaded from /home/jupyter/Mapping/MDE_S406-temporary.pvar.zst.
Note: 

Executing: gunzip -c /home/jupyter/Mapping/MDE_S406.vcf.gz > /home/jupyter/Mapping/MDE_S406.vcf
Executing: cat /home/jupyter/Mapping/MDE_S406.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S406_begin
Executing: cat /home/jupyter/Mapping/MDE_S406.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S406_end
Executing: paste /home/jupyter/Mapping/MDE_S406_begin /home/jupyter/Mapping/MDE_S406_end | sed 1d > /home/jupyter/Mapping/MDE_S406_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S419


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S419.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S419.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S419.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S419


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S419.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S419
  --vcf /home/jupyter/Mapping/MDE_S419.vcf.gz

Start time: Tue Nov 26 05:30:38 2024
628867 MiB RAM detected, ~621428 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 191710 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S419-temporary.pgen +
/home/jupyter/Mapping/MDE_S419-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S419-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S419-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S419.psam ... done.
End time: Tue Nov 26 05:30:39 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S419.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --keep /home/jupyter/Mapping/PHENO_MDE_S419.IID                            --recode A --out /home/jupyter/Mapping/MDE_S419


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S419.log.
Options in effect:
  --export A
  --keep /home/jupyter/Mapping/PHENO_MDE_S419.IID
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S419
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S419.vcf.gz

Start time: Tue Nov 26 05:30:40 2024
628867 MiB RAM detected, ~621436 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 191710 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S419-temporary.pgen +
/home/jupyter/Mapping/MDE_S419-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S419-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S419-temporary.psam.
191710 variants loaded from /home/jupyter/Mapping/MDE_S419-temporary.pvar.zst.
Note

Executing: gunzip -c /home/jupyter/Mapping/MDE_S419.vcf.gz > /home/jupyter/Mapping/MDE_S419.vcf
Executing: cat /home/jupyter/Mapping/MDE_S419.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S419_begin
Executing: cat /home/jupyter/Mapping/MDE_S419.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S419_end
Executing: paste /home/jupyter/Mapping/MDE_S419_begin /home/jupyter/Mapping/MDE_S419_end | sed 1d > /home/jupyter/Mapping/MDE_S419_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S618


Executing: cat /home/jupyter/Mapping/PHENO_MDE_S618.tsv | cut -f1 | sed 1d > /home/jupyter/Mapping/PHENO_MDE_S618.IID
Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S618.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S618


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S618.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S618
  --vcf /home/jupyter/Mapping/MDE_S618.vcf.gz

Start time: Tue Nov 26 05:31:29 2024
628867 MiB RAM detected, ~621417 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 7592 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S618-temporary.pgen +
/home/jupyter/Mapping/MDE_S618-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S618-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S618-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S618.psam ... done.
End time: Tue Nov 26 05:31:29 2024


Unnamed: 0,Ancestry,Pool,chr,bp,REF,ALT,rsid,gene,variant_type,Variant_class,...,GnomADg_AF,clinical_signif,clinvar_signif,cadd_phred,REVEL,eve_class,am_class,SIFT,Polyphen,colname
43647,MDE,S7,1,171501065,G,A,rs6671036,PRRC2C,intron_variant,SNV,...,0.9936,,,2.154,,,,,,1:171501065:G:A_G
43648,MDE,S7,1,171521600,G,GTTC,rs10633005,PRRC2C,intron_variant,insertion,...,0.9908,,,6.044,,,,,,1:171521600:G:GTTC_G
43649,MDE,S7,1,171583376,G,GC,rs57171374,PRRC2C,intron_variant,insertion,...,1.0000,,,8.052,,,,,,1:171583376:G:GC_G
43650,MDE,S7,1,171606471,T,TG,rs71107328,MYOCOS,intron_variant,insertion,...,1.0000,,,4.703,,,,,,1:171606471:T:TG_T
43651,MDE,S7,1,171612323,A,C,rs12024517,MYOCOS,intron_variant,SNV,...,1.0000,,,2.094,,,,,,1:171612323:A:C_A
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26945,AMR,S122,10,29735144,C,A,rs7098589,SVIL,intron_variant,SNV,...,0.1510,,,7.393,,,,,,10:29735144:C:A_C
26946,AMR,S122,10,29739723,A,G,rs12782184,SVIL,upstream_gene_variant,SNV,...,0.1459,,,3.750,,,,,,10:29739723:A:G_A
26947,AMR,S122,10,29741673,T,C,rs11007724,SVIL,upstream_gene_variant,SNV,...,0.7994,,,6.252,,,,,,10:29741673:T:C_T
26948,AMR,S122,10,29742568,G,A,rs34535212,,intergenic_variant,SNV,...,0.1728,,,0.982,,,,,,10:29742568:G:A_G


#### Prioritization criteria

Apply prioritization criteria to select variants most likely to be disease-causing.

In [93]:
# Filter candidate variants based on variant impact and GnomADg_AF frequency
candidates_df_results = df_results[
    (
        (df_results['variant_impact'].isin(['HIGH', 'MODERATE']))  # Impact is HIGH or MODERATE
    ) & (
        (df_results['GnomADg_AF'].isnull()) | (df_results['GnomADg_AF'] <= 0.01)  # GnomADg_AF is null or <= 0.01
    )
]

# Display the filtered candidate results
candidates_df_results

Unnamed: 0,Ancestry,Pool,chr,bp,REF,ALT,rsid,gene,variant_type,Variant_class,...,GnomADg_AF,clinical_signif,clinvar_signif,cadd_phred,REVEL,eve_class,am_class,SIFT,Polyphen,colname
158125,MDE,S419,14,60118341,C,T,rs773266496&COSV58303323,PCNX4,missense_variant,SNV,...,3.3e-05,,,22.0,0.129504,,likely_benign,tolerated(0.06),benign(0.007),14:60118341:C:T_C
165508,MDE,S419,14,66924264,A,G,rs41285470,GPHN,missense_variant,SNV,...,0.005868,likely_benign,,0.12,,,,tolerated_low_confidence(1),benign(0),14:66924264:A:G_A
166222,MDE,S419,14,67577308,C,T,rs369145807,PLEKHH1,missense_variant,SNV,...,6.6e-05,,,15.36,0.06771,,,deleterious_low_confidence(0.03),benign(0.078),14:67577308:C:T_C
167862,MDE,S419,14,69352130,G,A,rs144512254,GALNT16,missense_variant,SNV,...,0.00046,,,7.087,0.061827,,likely_benign,tolerated(0.17),benign(0),14:69352130:G:A_G
170967,MDE,S419,14,72940346,G,A,rs145062084&COSV105912372,DCAF4,missense_variant,SNV,...,0.000388,likely_benign,,8.009,0.017264,,likely_benign,tolerated(0.76),benign(0.005),14:72940346:G:A_G
195580,MDE,S419,14,105863242,T,C,rs2338627,IGHJ6,missense_variant,SNV,...,0.008675,,,10.07,,,,,,14:105863242:T:C_T
195581,MDE,S419,14,105863243,A,C,rs2338626,IGHJ6,missense_variant,SNV,...,0.00209,,,9.352,,,,,,14:105863243:A:C_A
91062,MDE,S186,1,20649109,C,T,rs45539432&CM052340,PINK1,stop_gained,SNV,...,5.3e-05,pathogenic,,38.0,,,,,,1:20649109:C:T_C
60131,MDE,S164,1,20649109,C,T,rs45539432&CM052340,PINK1,stop_gained,SNV,...,5.3e-05,pathogenic,,38.0,,,,,,1:20649109:C:T_C
36334,AMR,S129,4,130964119,G,T,rs78589926,,splice_donor_variant&non_coding_transcript_var...,SNV,...,0.005998,,,1.198,,,,,,4:130964119:G:T_G


### Select homozygous variants and extract CSQ from the VCF file with unique samples

For unique samples, extract homozygous variants and retrieve their consequences from the VCF file.

In [94]:
# Initialize an empty DataFrame to store unique samples across pools
uniq_samples = pd.DataFrame()

# Open and read the file containing the list of renamed pools
with open(f'{WORK_DIR}/list_renamed_pools.txt', 'r') as file:
    # Iterate through each pool listed in the file
    for line in file:
        # Remove trailing newline characters from the line
        pool = line.strip()
        print(f"Processing pool: {pool}")

        # Extract the pool name (e.g., ancestry-specific identifier) from the file path
        pool_name = pool.split('/')[4]

        # Check if the pool contains a unique sample in WGS data
        shell_do(f'/home/jupyter/tools/plink2 --vcf {pool}.vcf.gz --make-just-psam --out {pool}')
        IIDs_WGS = pd.read_csv(f"{pool}.psam", sep='\t')

        # Proceed only if the pool contains a single unique sample
        if IIDs_WGS['#IID'].nunique() == 1:
            # Generate a .raw file using PLINK2 with proper variant IDs
            shell_do(f'/home/jupyter/tools/plink2 --vcf {pool}.vcf.gz \
                           --set-all-var-ids @:#:\$r:\$a \
                           --new-id-max-allele-len 160 \
                           --recode A --out {pool}')
            
            # Read the generated .raw file into a DataFrame
            raw = pd.read_csv(f'{pool}.raw', sep='\t')

            # Identify columns in the .raw file containing only zeros (homozygous variants)
            columns_with_only_0 = raw.columns[(raw.eq(0).all())]

            # Decompress the VCF file for annotation purposes
            shell_do(f'gunzip -c {pool}.vcf.gz > {pool}.vcf')

            # Extract chromosome, position, reference allele, and alternate allele from the VCF file (first part)
            shell_do(f"cat {pool}.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > {pool}_begin")
            
            # Extract annotation fields (e.g., functional effects, allele frequencies) from the VCF file (second part)
            shell_do(f"cat {pool}.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/\t/g' | \
                     awk -F '\t' 'BEGIN{{OFS=FS}}{{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}}' > {pool}_end")
            
            # Combine the extracted fields into a reduced annotation file
            shell_do(f'paste {pool}_begin {pool}_end | sed 1d > {pool}_reduced.tsv')

            # Define column names for the reduced annotation file
            column_names = [
                "chr", "bp", "ID", "REF", "ALT", "rsid", "gene", "variant_type", "Variant_class", 
                "variant_impact", "GnomADg_AF", "clinical_signif", "clinvar_signif", "cadd_phred", 
                "REVEL", "eve_class", "am_class", "SIFT", "Polyphen"
            ]

            # Read the reduced annotation file into a DataFrame
            vars_anno_reduced = pd.read_csv(
                f'{pool}_reduced.tsv', sep='\t', low_memory=False, header=None, names=column_names
            )

            # Create a unique ID for each variant by combining chromosome, position, REF, and ALT
            vars_anno_reduced['ID'] = vars_anno_reduced['chr'].astype(str) + ':' + vars_anno_reduced['bp'].astype(str) + ':' + vars_anno_reduced['REF'] + ':' + vars_anno_reduced['ALT']
            
            # Generate a new column to match the column format in the .raw file
            vars_anno_reduced['colname'] = vars_anno_reduced['ID'] + '_' + vars_anno_reduced['REF']
            
            # Drop the temporary ID column as it is no longer needed
            vars_anno_reduced = vars_anno_reduced.drop('ID', axis=1)

            # Filter the annotation DataFrame based on columns that contain only zeros in the .raw file
            results = vars_anno_reduced.loc[vars_anno_reduced['colname'].isin(columns_with_only_0)]

            # Extract ancestry and pool ID from the pool name
            ancestry, pool_id = pool_name.split('_')

            # Add the ancestry information as the first column
            results.insert(0, 'Ancestry', ancestry)

            # Add the pool ID as the second column
            results.insert(1, 'Pool', pool_id)

            # Append the filtered results to the main DataFrame
            uniq_samples = pd.concat([uniq_samples, results], ignore_index=True)

# Sort the concatenated results by 'Ancestry' and 'Pool' in descending order
uniq_samples = uniq_samples.sort_values(by=['Ancestry', 'Pool'], ascending=[False, False])

# Display the final sorted DataFrame
uniq_samples

Processing pool: /home/jupyter/Mapping/AAC_S396


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AAC_S396.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AAC_S396


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AAC_S396.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AAC_S396
  --vcf /home/jupyter/Mapping/AAC_S396.vcf.gz

Start time: Tue Nov 26 05:31:31 2024
628867 MiB RAM detected, ~621420 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4516 variants scanned.
--vcf: /home/jupyter/Mapping/AAC_S396-temporary.pgen +
/home/jupyter/Mapping/AAC_S396-temporary.pvar.zst +
/home/jupyter/Mapping/AAC_S396-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AAC_S396-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AAC_S396.psam ... done.
End time: Tue Nov 26 05:31:31 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AAC_S396.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AAC_S396


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AAC_S396.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AAC_S396
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AAC_S396.vcf.gz

Start time: Tue Nov 26 05:31:32 2024
628867 MiB RAM detected, ~621413 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4516 variants scanned.
--vcf: /home/jupyter/Mapping/AAC_S396-temporary.pgen +
/home/jupyter/Mapping/AAC_S396-temporary.pvar.zst +
/home/jupyter/Mapping/AAC_S396-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AAC_S396-temporary.psam.
4516 variants loaded from /home/jupyter/Mapping/AAC_S396-temporary.pvar.zst.
Note: No phenotype data present.
--exp

Executing: gunzip -c /home/jupyter/Mapping/AAC_S396.vcf.gz > /home/jupyter/Mapping/AAC_S396.vcf
Executing: cat /home/jupyter/Mapping/AAC_S396.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AAC_S396_begin
Executing: cat /home/jupyter/Mapping/AAC_S396.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AAC_S396_end
Executing: paste /home/jupyter/Mapping/AAC_S396_begin /home/jupyter/Mapping/AAC_S396_end | sed 1d > /home/jupyter/Mapping/AAC_S396_reduced.tsv


Processing pool: /home/jupyter/Mapping/AAC_S397


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AAC_S397.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AAC_S397


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AAC_S397.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AAC_S397
  --vcf /home/jupyter/Mapping/AAC_S397.vcf.gz

Start time: Tue Nov 26 05:31:38 2024
628867 MiB RAM detected, ~621416 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4790 variants scanned.
--vcf: /home/jupyter/Mapping/AAC_S397-temporary.pgen +
/home/jupyter/Mapping/AAC_S397-temporary.pvar.zst +
/home/jupyter/Mapping/AAC_S397-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AAC_S397-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AAC_S397.psam ... done.
End time: Tue Nov 26 05:31:38 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AAC_S397.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AAC_S397


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AAC_S397.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AAC_S397
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AAC_S397.vcf.gz

Start time: Tue Nov 26 05:31:39 2024
628867 MiB RAM detected, ~621418 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4790 variants scanned.
--vcf: /home/jupyter/Mapping/AAC_S397-temporary.pgen +
/home/jupyter/Mapping/AAC_S397-temporary.pvar.zst +
/home/jupyter/Mapping/AAC_S397-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AAC_S397-temporary.psam.
4790 variants loaded from /home/jupyter/Mapping/AAC_S397-temporary.pvar.zst.
Note: No phenotype data present.
--exp

Executing: gunzip -c /home/jupyter/Mapping/AAC_S397.vcf.gz > /home/jupyter/Mapping/AAC_S397.vcf
Executing: cat /home/jupyter/Mapping/AAC_S397.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AAC_S397_begin
Executing: cat /home/jupyter/Mapping/AAC_S397.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AAC_S397_end
Executing: paste /home/jupyter/Mapping/AAC_S397_begin /home/jupyter/Mapping/AAC_S397_end | sed 1d > /home/jupyter/Mapping/AAC_S397_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1341


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1341.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1341


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1341.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1341
  --vcf /home/jupyter/Mapping/AJ_S1341.vcf.gz

Start time: Tue Nov 26 05:31:46 2024
628867 MiB RAM detected, ~621401 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 13411 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1341-temporary.pgen +
/home/jupyter/Mapping/AJ_S1341-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1341-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1341-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1341.psam ... done.
End time: Tue Nov 26 05:31:46 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1341.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1341


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1341.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1341
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1341.vcf.gz

Start time: Tue Nov 26 05:31:47 2024
628867 MiB RAM detected, ~621404 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 13411 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1341-temporary.pgen +
/home/jupyter/Mapping/AJ_S1341-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1341-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1341-temporary.psam.
13411 variants loaded from /home/jupyter/Mapping/AJ_S1341-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1341.vcf.gz > /home/jupyter/Mapping/AJ_S1341.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1341.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1341_begin
Executing: cat /home/jupyter/Mapping/AJ_S1341.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1341_end
Executing: paste /home/jupyter/Mapping/AJ_S1341_begin /home/jupyter/Mapping/AJ_S1341_end | sed 1d > /home/jupyter/Mapping/AJ_S1341_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1443


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1443.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1443


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1443.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1443
  --vcf /home/jupyter/Mapping/AJ_S1443.vcf.gz

Start time: Tue Nov 26 05:31:55 2024
628867 MiB RAM detected, ~621409 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 54041 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1443-temporary.pgen +
/home/jupyter/Mapping/AJ_S1443-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1443-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1443-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1443.psam ... done.
End time: Tue Nov 26 05:31:55 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1443.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1443


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1443.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1443
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1443.vcf.gz

Start time: Tue Nov 26 05:31:56 2024
628867 MiB RAM detected, ~621414 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 54041 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1443-temporary.pgen +
/home/jupyter/Mapping/AJ_S1443-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1443-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1443-temporary.psam.
54041 variants loaded from /home/jupyter/Mapping/AJ_S1443-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1443.vcf.gz > /home/jupyter/Mapping/AJ_S1443.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1443.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1443_begin
Executing: cat /home/jupyter/Mapping/AJ_S1443.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1443_end
Executing: paste /home/jupyter/Mapping/AJ_S1443_begin /home/jupyter/Mapping/AJ_S1443_end | sed 1d > /home/jupyter/Mapping/AJ_S1443_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1510


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1510.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1510


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1510.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1510
  --vcf /home/jupyter/Mapping/AJ_S1510.vcf.gz

Start time: Tue Nov 26 05:32:13 2024
628867 MiB RAM detected, ~621417 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4626 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1510-temporary.pgen +
/home/jupyter/Mapping/AJ_S1510-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1510-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1510-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1510.psam ... done.
End time: Tue Nov 26 05:32:14 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1510.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1510


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1510.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1510
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1510.vcf.gz

Start time: Tue Nov 26 05:32:15 2024
628867 MiB RAM detected, ~621419 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4626 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1510-temporary.pgen +
/home/jupyter/Mapping/AJ_S1510-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1510-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1510-temporary.psam.
4626 variants loaded from /home/jupyter/Mapping/AJ_S1510-temporary.pvar.zst.
Note: No phenotype data present.
--exp

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1510.vcf.gz > /home/jupyter/Mapping/AJ_S1510.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1510.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1510_begin
Executing: cat /home/jupyter/Mapping/AJ_S1510.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1510_end
Executing: paste /home/jupyter/Mapping/AJ_S1510_begin /home/jupyter/Mapping/AJ_S1510_end | sed 1d > /home/jupyter/Mapping/AJ_S1510_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1511


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1511.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1511


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1511.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1511
  --vcf /home/jupyter/Mapping/AJ_S1511.vcf.gz

Start time: Tue Nov 26 05:32:21 2024
628867 MiB RAM detected, ~621414 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 6053 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1511-temporary.pgen +
/home/jupyter/Mapping/AJ_S1511-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1511-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1511-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1511.psam ... done.
End time: Tue Nov 26 05:32:21 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1511.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1511


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1511.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1511
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1511.vcf.gz

Start time: Tue Nov 26 05:32:22 2024
628867 MiB RAM detected, ~621417 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 6053 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1511-temporary.pgen +
/home/jupyter/Mapping/AJ_S1511-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1511-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1511-temporary.psam.
6053 variants loaded from /home/jupyter/Mapping/AJ_S1511-temporary.pvar.zst.
Note: No phenotype data present.
--exp

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1511.vcf.gz > /home/jupyter/Mapping/AJ_S1511.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1511.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1511_begin
Executing: cat /home/jupyter/Mapping/AJ_S1511.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1511_end
Executing: paste /home/jupyter/Mapping/AJ_S1511_begin /home/jupyter/Mapping/AJ_S1511_end | sed 1d > /home/jupyter/Mapping/AJ_S1511_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1515


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1515.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1515


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1515.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1515
  --vcf /home/jupyter/Mapping/AJ_S1515.vcf.gz

Start time: Tue Nov 26 05:32:29 2024
628867 MiB RAM detected, ~621424 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 12764 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1515-temporary.pgen +
/home/jupyter/Mapping/AJ_S1515-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1515-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AJ_S1515-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1515.psam ... done.
End time: Tue Nov 26 05:32:29 2024
Processing pool: /home/jupyter/Mapping/AJ_S1623


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1623.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1623


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1623.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1623
  --vcf /home/jupyter/Mapping/AJ_S1623.vcf.gz

Start time: Tue Nov 26 05:32:30 2024
628867 MiB RAM detected, ~621420 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 12708 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1623-temporary.pgen +
/home/jupyter/Mapping/AJ_S1623-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1623-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1623-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1623.psam ... done.
End time: Tue Nov 26 05:32:30 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1623.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1623


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1623.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1623
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1623.vcf.gz

Start time: Tue Nov 26 05:32:31 2024
628867 MiB RAM detected, ~621423 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 12708 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1623-temporary.pgen +
/home/jupyter/Mapping/AJ_S1623-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1623-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1623-temporary.psam.
12708 variants loaded from /home/jupyter/Mapping/AJ_S1623-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1623.vcf.gz > /home/jupyter/Mapping/AJ_S1623.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1623.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1623_begin
Executing: cat /home/jupyter/Mapping/AJ_S1623.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1623_end
Executing: paste /home/jupyter/Mapping/AJ_S1623_begin /home/jupyter/Mapping/AJ_S1623_end | sed 1d > /home/jupyter/Mapping/AJ_S1623_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1650


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1650.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1650


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1650.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1650
  --vcf /home/jupyter/Mapping/AJ_S1650.vcf.gz

Start time: Tue Nov 26 05:32:39 2024
628867 MiB RAM detected, ~621430 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 8451 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1650-temporary.pgen +
/home/jupyter/Mapping/AJ_S1650-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1650-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1650-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1650.psam ... done.
End time: Tue Nov 26 05:32:39 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1650.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1650


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1650.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1650
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1650.vcf.gz

Start time: Tue Nov 26 05:32:40 2024
628867 MiB RAM detected, ~621431 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 8451 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1650-temporary.pgen +
/home/jupyter/Mapping/AJ_S1650-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1650-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1650-temporary.psam.
8451 variants loaded from /home/jupyter/Mapping/AJ_S1650-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: loadin

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1650.vcf.gz > /home/jupyter/Mapping/AJ_S1650.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1650.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1650_begin
Executing: cat /home/jupyter/Mapping/AJ_S1650.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1650_end
Executing: paste /home/jupyter/Mapping/AJ_S1650_begin /home/jupyter/Mapping/AJ_S1650_end | sed 1d > /home/jupyter/Mapping/AJ_S1650_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1651


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1651.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1651


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1651.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1651
  --vcf /home/jupyter/Mapping/AJ_S1651.vcf.gz

Start time: Tue Nov 26 05:32:48 2024
628867 MiB RAM detected, ~621430 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 18394 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1651-temporary.pgen +
/home/jupyter/Mapping/AJ_S1651-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1651-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1651-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1651.psam ... done.
End time: Tue Nov 26 05:32:48 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1651.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1651


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1651.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1651
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1651.vcf.gz

Start time: Tue Nov 26 05:32:49 2024
628867 MiB RAM detected, ~621434 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 18394 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1651-temporary.pgen +
/home/jupyter/Mapping/AJ_S1651-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1651-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1651-temporary.psam.
18394 variants loaded from /home/jupyter/Mapping/AJ_S1651-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1651.vcf.gz > /home/jupyter/Mapping/AJ_S1651.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1651.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1651_begin
Executing: cat /home/jupyter/Mapping/AJ_S1651.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1651_end
Executing: paste /home/jupyter/Mapping/AJ_S1651_begin /home/jupyter/Mapping/AJ_S1651_end | sed 1d > /home/jupyter/Mapping/AJ_S1651_reduced.tsv


Processing pool: /home/jupyter/Mapping/AJ_S1665


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1665.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AJ_S1665


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1665.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AJ_S1665
  --vcf /home/jupyter/Mapping/AJ_S1665.vcf.gz

Start time: Tue Nov 26 05:32:58 2024
628867 MiB RAM detected, ~621435 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 12032 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1665-temporary.pgen +
/home/jupyter/Mapping/AJ_S1665-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1665-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1665-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AJ_S1665.psam ... done.
End time: Tue Nov 26 05:32:58 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AJ_S1665.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AJ_S1665


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AJ_S1665.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AJ_S1665
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AJ_S1665.vcf.gz

Start time: Tue Nov 26 05:32:59 2024
628867 MiB RAM detected, ~621437 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 12032 variants scanned.
--vcf: /home/jupyter/Mapping/AJ_S1665-temporary.pgen +
/home/jupyter/Mapping/AJ_S1665-temporary.pvar.zst +
/home/jupyter/Mapping/AJ_S1665-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AJ_S1665-temporary.psam.
12032 variants loaded from /home/jupyter/Mapping/AJ_S1665-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/AJ_S1665.vcf.gz > /home/jupyter/Mapping/AJ_S1665.vcf
Executing: cat /home/jupyter/Mapping/AJ_S1665.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AJ_S1665_begin
Executing: cat /home/jupyter/Mapping/AJ_S1665.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AJ_S1665_end
Executing: paste /home/jupyter/Mapping/AJ_S1665_begin /home/jupyter/Mapping/AJ_S1665_end | sed 1d > /home/jupyter/Mapping/AJ_S1665_reduced.tsv


Processing pool: /home/jupyter/Mapping/AMR_S82


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S82.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S82


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S82.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S82
  --vcf /home/jupyter/Mapping/AMR_S82.vcf.gz

Start time: Tue Nov 26 05:33:07 2024
628867 MiB RAM detected, ~621440 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 27524 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S82-temporary.pgen +
/home/jupyter/Mapping/AMR_S82-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S82-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AMR_S82-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S82.psam ... done.
End time: Tue Nov 26 05:33:08 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S82.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AMR_S82


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S82.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AMR_S82
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AMR_S82.vcf.gz

Start time: Tue Nov 26 05:33:09 2024
628867 MiB RAM detected, ~621445 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 27524 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S82-temporary.pgen +
/home/jupyter/Mapping/AMR_S82-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S82-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AMR_S82-temporary.psam.
27524 variants loaded from /home/jupyter/Mapping/AMR_S82-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: loading... 0

Executing: gunzip -c /home/jupyter/Mapping/AMR_S82.vcf.gz > /home/jupyter/Mapping/AMR_S82.vcf
Executing: cat /home/jupyter/Mapping/AMR_S82.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AMR_S82_begin
Executing: cat /home/jupyter/Mapping/AMR_S82.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AMR_S82_end
Executing: paste /home/jupyter/Mapping/AMR_S82_begin /home/jupyter/Mapping/AMR_S82_end | sed 1d > /home/jupyter/Mapping/AMR_S82_reduced.tsv


Processing pool: /home/jupyter/Mapping/AMR_S97


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S97.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S97


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S97.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S97
  --vcf /home/jupyter/Mapping/AMR_S97.vcf.gz

Start time: Tue Nov 26 05:33:20 2024
628867 MiB RAM detected, ~621427 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 57669 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S97-temporary.pgen +
/home/jupyter/Mapping/AMR_S97-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S97-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S97-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S97.psam ... done.
End time: Tue Nov 26 05:33:20 2024
Processing pool: /home/jupyter/Mapping/AMR_S121


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S121.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S121


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S121.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S121
  --vcf /home/jupyter/Mapping/AMR_S121.vcf.gz

Start time: Tue Nov 26 05:33:21 2024
628867 MiB RAM detected, ~621423 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 13090 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S121-temporary.pgen +
/home/jupyter/Mapping/AMR_S121-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S121-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S121-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S121.psam ... done.
End time: Tue Nov 26 05:33:21 2024
Processing pool: /home/jupyter/Mapping/AMR_S122


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S122.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S122


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S122.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S122
  --vcf /home/jupyter/Mapping/AMR_S122.vcf.gz

Start time: Tue Nov 26 05:33:22 2024
628867 MiB RAM detected, ~621424 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 14808 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S122-temporary.pgen +
/home/jupyter/Mapping/AMR_S122-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S122-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S122-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S122.psam ... done.
End time: Tue Nov 26 05:33:22 2024
Processing pool: /home/jupyter/Mapping/AMR_S127


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S127.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S127


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S127.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S127
  --vcf /home/jupyter/Mapping/AMR_S127.vcf.gz

Start time: Tue Nov 26 05:33:23 2024
628867 MiB RAM detected, ~621432 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 33935 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S127-temporary.pgen +
/home/jupyter/Mapping/AMR_S127-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S127-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AMR_S127-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S127.psam ... done.
End time: Tue Nov 26 05:33:23 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S127.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/AMR_S127


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S127.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/AMR_S127
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/AMR_S127.vcf.gz

Start time: Tue Nov 26 05:33:24 2024
628867 MiB RAM detected, ~621433 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 33935 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S127-temporary.pgen +
/home/jupyter/Mapping/AMR_S127-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S127-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/AMR_S127-temporary.psam.
33935 variants loaded from /home/jupyter/Mapping/AMR_S127-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/AMR_S127.vcf.gz > /home/jupyter/Mapping/AMR_S127.vcf
Executing: cat /home/jupyter/Mapping/AMR_S127.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/AMR_S127_begin
Executing: cat /home/jupyter/Mapping/AMR_S127.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/AMR_S127_end
Executing: paste /home/jupyter/Mapping/AMR_S127_begin /home/jupyter/Mapping/AMR_S127_end | sed 1d > /home/jupyter/Mapping/AMR_S127_reduced.tsv


Processing pool: /home/jupyter/Mapping/AMR_S128


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S128.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S128


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S128.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S128
  --vcf /home/jupyter/Mapping/AMR_S128.vcf.gz

Start time: Tue Nov 26 05:33:37 2024
628867 MiB RAM detected, ~621439 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 35627 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S128-temporary.pgen +
/home/jupyter/Mapping/AMR_S128-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S128-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S128-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S128.psam ... done.
End time: Tue Nov 26 05:33:37 2024
Processing pool: /home/jupyter/Mapping/AMR_S129


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/AMR_S129.vcf.gz --make-just-psam --out /home/jupyter/Mapping/AMR_S129


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/AMR_S129.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/AMR_S129
  --vcf /home/jupyter/Mapping/AMR_S129.vcf.gz

Start time: Tue Nov 26 05:33:38 2024
628867 MiB RAM detected, ~621442 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 32245 variants scanned.
--vcf: /home/jupyter/Mapping/AMR_S129-temporary.pgen +
/home/jupyter/Mapping/AMR_S129-temporary.pvar.zst +
/home/jupyter/Mapping/AMR_S129-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/AMR_S129-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/AMR_S129.psam ... done.
End time: Tue Nov 26 05:33:38 2024
Processing pool: /home/jupyter/Mapping/EUR_S5327


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/EUR_S5327.vcf.gz --make-just-psam --out /home/jupyter/Mapping/EUR_S5327


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/EUR_S5327.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/EUR_S5327
  --vcf /home/jupyter/Mapping/EUR_S5327.vcf.gz

Start time: Tue Nov 26 05:33:39 2024
628867 MiB RAM detected, ~621441 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 30157 variants scanned.
--vcf: /home/jupyter/Mapping/EUR_S5327-temporary.pgen +
/home/jupyter/Mapping/EUR_S5327-temporary.pvar.zst +
/home/jupyter/Mapping/EUR_S5327-temporary.psam written.
3 samples (0 females, 0 males, 3 ambiguous; 3 founders) loaded from
/home/jupyter/Mapping/EUR_S5327-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/EUR_S5327.psam ... done.
End time: Tue Nov 26 05:33:39 2024
Processing pool: /home/jupyter/Mapping/EUR_S5365


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/EUR_S5365.vcf.gz --make-just-psam --out /home/jupyter/Mapping/EUR_S5365


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/EUR_S5365.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/EUR_S5365
  --vcf /home/jupyter/Mapping/EUR_S5365.vcf.gz

Start time: Tue Nov 26 05:33:41 2024
628867 MiB RAM detected, ~621444 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 9944 variants scanned.
--vcf: /home/jupyter/Mapping/EUR_S5365-temporary.pgen +
/home/jupyter/Mapping/EUR_S5365-temporary.pvar.zst +
/home/jupyter/Mapping/EUR_S5365-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/EUR_S5365-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/EUR_S5365.psam ... done.
End time: Tue Nov 26 05:33:41 2024
Processing pool: /home/jupyter/Mapping/EUR_S5386


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/EUR_S5386.vcf.gz --make-just-psam --out /home/jupyter/Mapping/EUR_S5386


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/EUR_S5386.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/EUR_S5386
  --vcf /home/jupyter/Mapping/EUR_S5386.vcf.gz

Start time: Tue Nov 26 05:33:42 2024
628867 MiB RAM detected, ~621444 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4294 variants scanned.
--vcf: /home/jupyter/Mapping/EUR_S5386-temporary.pgen +
/home/jupyter/Mapping/EUR_S5386-temporary.pvar.zst +
/home/jupyter/Mapping/EUR_S5386-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/EUR_S5386-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/EUR_S5386.psam ... done.
End time: Tue Nov 26 05:33:42 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/EUR_S5386.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/EUR_S5386


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/EUR_S5386.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/EUR_S5386
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/EUR_S5386.vcf.gz

Start time: Tue Nov 26 05:33:43 2024
628867 MiB RAM detected, ~621443 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 4294 variants scanned.
--vcf: /home/jupyter/Mapping/EUR_S5386-temporary.pgen +
/home/jupyter/Mapping/EUR_S5386-temporary.pvar.zst +
/home/jupyter/Mapping/EUR_S5386-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/EUR_S5386-temporary.psam.
4294 variants loaded from /home/jupyter/Mapping/EUR_S5386-temporary.pvar.zst.
Note: No phenotype data present

Executing: gunzip -c /home/jupyter/Mapping/EUR_S5386.vcf.gz > /home/jupyter/Mapping/EUR_S5386.vcf
Executing: cat /home/jupyter/Mapping/EUR_S5386.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/EUR_S5386_begin
Executing: cat /home/jupyter/Mapping/EUR_S5386.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/EUR_S5386_end
Executing: paste /home/jupyter/Mapping/EUR_S5386_begin /home/jupyter/Mapping/EUR_S5386_end | sed 1d > /home/jupyter/Mapping/EUR_S5386_reduced.tsv


Processing pool: /home/jupyter/Mapping/EUR_S5387


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/EUR_S5387.vcf.gz --make-just-psam --out /home/jupyter/Mapping/EUR_S5387


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/EUR_S5387.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/EUR_S5387
  --vcf /home/jupyter/Mapping/EUR_S5387.vcf.gz

Start time: Tue Nov 26 05:33:49 2024
628867 MiB RAM detected, ~621434 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 2490 variants scanned.
--vcf: /home/jupyter/Mapping/EUR_S5387-temporary.pgen +
/home/jupyter/Mapping/EUR_S5387-temporary.pvar.zst +
/home/jupyter/Mapping/EUR_S5387-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/EUR_S5387-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/EUR_S5387.psam ... done.
End time: Tue Nov 26 05:33:49 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/EUR_S5387.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/EUR_S5387


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/EUR_S5387.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/EUR_S5387
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/EUR_S5387.vcf.gz

Start time: Tue Nov 26 05:33:50 2024
628867 MiB RAM detected, ~621436 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 2490 variants scanned.
--vcf: /home/jupyter/Mapping/EUR_S5387-temporary.pgen +
/home/jupyter/Mapping/EUR_S5387-temporary.pvar.zst +
/home/jupyter/Mapping/EUR_S5387-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/EUR_S5387-temporary.psam.
2490 variants loaded from /home/jupyter/Mapping/EUR_S5387-temporary.pvar.zst.
Note: No phenotype data present

Executing: gunzip -c /home/jupyter/Mapping/EUR_S5387.vcf.gz > /home/jupyter/Mapping/EUR_S5387.vcf
Executing: cat /home/jupyter/Mapping/EUR_S5387.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/EUR_S5387_begin
Executing: cat /home/jupyter/Mapping/EUR_S5387.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/EUR_S5387_end
Executing: paste /home/jupyter/Mapping/EUR_S5387_begin /home/jupyter/Mapping/EUR_S5387_end | sed 1d > /home/jupyter/Mapping/EUR_S5387_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S7


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S7.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S7


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S7.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S7
  --vcf /home/jupyter/Mapping/MDE_S7.vcf.gz

Start time: Tue Nov 26 05:33:56 2024
628867 MiB RAM detected, ~621445 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 8263 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S7-temporary.pgen +
/home/jupyter/Mapping/MDE_S7-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S7-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S7-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S7.psam ... done.
End time: Tue Nov 26 05:33:56 2024
Processing pool: /home/jupyter/Mapping/MDE_S14


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S14.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S14


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S14.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S14
  --vcf /home/jupyter/Mapping/MDE_S14.vcf.gz

Start time: Tue Nov 26 05:33:57 2024
628867 MiB RAM detected, ~621445 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 6673 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S14-temporary.pgen +
/home/jupyter/Mapping/MDE_S14-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S14-temporary.psam written.
3 samples (0 females, 0 males, 3 ambiguous; 3 founders) loaded from
/home/jupyter/Mapping/MDE_S14-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S14.psam ... done.
End time: Tue Nov 26 05:33:57 2024
Processing pool: /home/jupyter/Mapping/MDE_S15


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S15.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S15


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S15.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S15
  --vcf /home/jupyter/Mapping/MDE_S15.vcf.gz

Start time: Tue Nov 26 05:33:58 2024
628867 MiB RAM detected, ~621448 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 30628 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S15-temporary.pgen +
/home/jupyter/Mapping/MDE_S15-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S15-temporary.psam written.
3 samples (0 females, 0 males, 3 ambiguous; 3 founders) loaded from
/home/jupyter/Mapping/MDE_S15-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S15.psam ... done.
End time: Tue Nov 26 05:33:58 2024
Processing pool: /home/jupyter/Mapping/MDE_S16


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S16.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S16


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S16.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S16
  --vcf /home/jupyter/Mapping/MDE_S16.vcf.gz

Start time: Tue Nov 26 05:33:59 2024
628867 MiB RAM detected, ~621451 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 23568 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S16-temporary.pgen +
/home/jupyter/Mapping/MDE_S16-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S16-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/MDE_S16-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S16.psam ... done.
End time: Tue Nov 26 05:33:59 2024
Processing pool: /home/jupyter/Mapping/MDE_S17


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S17.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S17


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S17.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S17
  --vcf /home/jupyter/Mapping/MDE_S17.vcf.gz

Start time: Tue Nov 26 05:34:01 2024
628867 MiB RAM detected, ~621453 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 24540 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S17-temporary.pgen +
/home/jupyter/Mapping/MDE_S17-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S17-temporary.psam written.
2 samples (0 females, 0 males, 2 ambiguous; 2 founders) loaded from
/home/jupyter/Mapping/MDE_S17-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S17.psam ... done.
End time: Tue Nov 26 05:34:01 2024
Processing pool: /home/jupyter/Mapping/MDE_S51


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S51.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S51


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S51.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S51
  --vcf /home/jupyter/Mapping/MDE_S51.vcf.gz

Start time: Tue Nov 26 05:34:02 2024
628867 MiB RAM detected, ~621454 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 53709 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S51-temporary.pgen +
/home/jupyter/Mapping/MDE_S51-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S51-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S51-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S51.psam ... done.
End time: Tue Nov 26 05:34:02 2024
Processing pool: /home/jupyter/Mapping/MDE_S65


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S65.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S65


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S65.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S65
  --vcf /home/jupyter/Mapping/MDE_S65.vcf.gz

Start time: Tue Nov 26 05:34:03 2024
628867 MiB RAM detected, ~621454 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 298387 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S65-temporary.pgen +
/home/jupyter/Mapping/MDE_S65-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S65-temporary.psam written.
7 samples (0 females, 0 males, 7 ambiguous; 7 founders) loaded from
/home/jupyter/Mapping/MDE_S65-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S65.psam ... done.
End time: Tue Nov 26 05:34:04 2024
Processing pool: /home/jupyter/Mapping/MDE_S83


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S83.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S83


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S83.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S83
  --vcf /home/jupyter/Mapping/MDE_S83.vcf.gz

Start time: Tue Nov 26 05:34:05 2024
628867 MiB RAM detected, ~621455 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 237673 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S83-temporary.pgen +
/home/jupyter/Mapping/MDE_S83-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S83-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S83-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S83.psam ... done.
End time: Tue Nov 26 05:34:06 2024
Processing pool: /home/jupyter/Mapping/MDE_S87


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S87.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S87


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S87.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S87
  --vcf /home/jupyter/Mapping/MDE_S87.vcf.gz

Start time: Tue Nov 26 05:34:07 2024
628867 MiB RAM detected, ~621457 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 78597 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S87-temporary.pgen +
/home/jupyter/Mapping/MDE_S87-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S87-temporary.psam written.
4 samples (0 females, 0 males, 4 ambiguous; 4 founders) loaded from
/home/jupyter/Mapping/MDE_S87-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S87.psam ... done.
End time: Tue Nov 26 05:34:08 2024
Processing pool: /home/jupyter/Mapping/MDE_S90


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S90.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S90


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S90.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S90
  --vcf /home/jupyter/Mapping/MDE_S90.vcf.gz

Start time: Tue Nov 26 05:34:09 2024
628867 MiB RAM detected, ~621458 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 83919 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S90-temporary.pgen +
/home/jupyter/Mapping/MDE_S90-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S90-temporary.psam written.
3 samples (0 females, 0 males, 3 ambiguous; 3 founders) loaded from
/home/jupyter/Mapping/MDE_S90-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S90.psam ... done.
End time: Tue Nov 26 05:34:09 2024
Processing pool: /home/jupyter/Mapping/MDE_S164


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S164.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S164


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S164.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S164
  --vcf /home/jupyter/Mapping/MDE_S164.vcf.gz

Start time: Tue Nov 26 05:34:10 2024
628867 MiB RAM detected, ~621459 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 308278 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S164-temporary.pgen +
/home/jupyter/Mapping/MDE_S164-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S164-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S164-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S164.psam ... done.
End time: Tue Nov 26 05:34:11 2024
Processing pool: /home/jupyter/Mapping/MDE_S186


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S186.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S186


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S186.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S186
  --vcf /home/jupyter/Mapping/MDE_S186.vcf.gz

Start time: Tue Nov 26 05:34:13 2024
628867 MiB RAM detected, ~621460 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 332557 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S186-temporary.pgen +
/home/jupyter/Mapping/MDE_S186-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S186-temporary.psam written.
7 samples (0 females, 0 males, 7 ambiguous; 7 founders) loaded from
/home/jupyter/Mapping/MDE_S186-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S186.psam ... done.
End time: Tue Nov 26 05:34:14 2024
Processing pool: /home/jupyter/Mapping/MDE_S235


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S235.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S235


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S235.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S235
  --vcf /home/jupyter/Mapping/MDE_S235.vcf.gz

Start time: Tue Nov 26 05:34:15 2024
628867 MiB RAM detected, ~621463 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 71738 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S235-temporary.pgen +
/home/jupyter/Mapping/MDE_S235-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S235-temporary.psam written.
3 samples (0 females, 0 males, 3 ambiguous; 3 founders) loaded from
/home/jupyter/Mapping/MDE_S235-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S235.psam ... done.
End time: Tue Nov 26 05:34:15 2024
Processing pool: /home/jupyter/Mapping/MDE_S259


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S259.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S259


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S259.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S259
  --vcf /home/jupyter/Mapping/MDE_S259.vcf.gz

Start time: Tue Nov 26 05:34:16 2024
628867 MiB RAM detected, ~621462 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 2592 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S259-temporary.pgen +
/home/jupyter/Mapping/MDE_S259-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S259-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S259-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S259.psam ... done.
End time: Tue Nov 26 05:34:16 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S259.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/MDE_S259


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S259.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S259
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S259.vcf.gz

Start time: Tue Nov 26 05:34:17 2024
628867 MiB RAM detected, ~621461 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 2592 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S259-temporary.pgen +
/home/jupyter/Mapping/MDE_S259-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S259-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S259-temporary.psam.
2592 variants loaded from /home/jupyter/Mapping/MDE_S259-temporary.pvar.zst.
Note: No phenotype data present.
--exp

Executing: gunzip -c /home/jupyter/Mapping/MDE_S259.vcf.gz > /home/jupyter/Mapping/MDE_S259.vcf
Executing: cat /home/jupyter/Mapping/MDE_S259.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S259_begin
Executing: cat /home/jupyter/Mapping/MDE_S259.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S259_end
Executing: paste /home/jupyter/Mapping/MDE_S259_begin /home/jupyter/Mapping/MDE_S259_end | sed 1d > /home/jupyter/Mapping/MDE_S259_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S288


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S288.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S288


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S288.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S288
  --vcf /home/jupyter/Mapping/MDE_S288.vcf.gz

Start time: Tue Nov 26 05:34:23 2024
628867 MiB RAM detected, ~621455 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 38535 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S288-temporary.pgen +
/home/jupyter/Mapping/MDE_S288-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S288-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S288-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S288.psam ... done.
End time: Tue Nov 26 05:34:23 2024
Processing pool: /home/jupyter/Mapping/MDE_S289


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S289.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S289


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S289.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S289
  --vcf /home/jupyter/Mapping/MDE_S289.vcf.gz

Start time: Tue Nov 26 05:34:25 2024
628867 MiB RAM detected, ~621456 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 18969 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S289-temporary.pgen +
/home/jupyter/Mapping/MDE_S289-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S289-temporary.psam written.
6 samples (0 females, 0 males, 6 ambiguous; 6 founders) loaded from
/home/jupyter/Mapping/MDE_S289-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S289.psam ... done.
End time: Tue Nov 26 05:34:25 2024
Processing pool: /home/jupyter/Mapping/MDE_S406


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S406.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S406


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S406.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S406
  --vcf /home/jupyter/Mapping/MDE_S406.vcf.gz

Start time: Tue Nov 26 05:34:26 2024
628867 MiB RAM detected, ~621459 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 30203 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S406-temporary.pgen +
/home/jupyter/Mapping/MDE_S406-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S406-temporary.psam written.
4 samples (0 females, 0 males, 4 ambiguous; 4 founders) loaded from
/home/jupyter/Mapping/MDE_S406-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S406.psam ... done.
End time: Tue Nov 26 05:34:26 2024
Processing pool: /home/jupyter/Mapping/MDE_S419


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S419.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S419


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S419.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S419
  --vcf /home/jupyter/Mapping/MDE_S419.vcf.gz

Start time: Tue Nov 26 05:34:27 2024
628867 MiB RAM detected, ~621457 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 191710 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S419-temporary.pgen +
/home/jupyter/Mapping/MDE_S419-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S419-temporary.psam written.
5 samples (0 females, 0 males, 5 ambiguous; 5 founders) loaded from
/home/jupyter/Mapping/MDE_S419-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S419.psam ... done.
End time: Tue Nov 26 05:34:28 2024
Processing pool: /home/jupyter/Mapping/MDE_S606


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S606.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S606


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S606.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S606
  --vcf /home/jupyter/Mapping/MDE_S606.vcf.gz

Start time: Tue Nov 26 05:34:29 2024
628867 MiB RAM detected, ~621452 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 23950 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S606-temporary.pgen +
/home/jupyter/Mapping/MDE_S606-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S606-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S606-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S606.psam ... done.
End time: Tue Nov 26 05:34:29 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S606.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/MDE_S606


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S606.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S606
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S606.vcf.gz

Start time: Tue Nov 26 05:34:30 2024
628867 MiB RAM detected, ~621452 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 23950 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S606-temporary.pgen +
/home/jupyter/Mapping/MDE_S606-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S606-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S606-temporary.psam.
23950 variants loaded from /home/jupyter/Mapping/MDE_S606-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: load

Executing: gunzip -c /home/jupyter/Mapping/MDE_S606.vcf.gz > /home/jupyter/Mapping/MDE_S606.vcf
Executing: cat /home/jupyter/Mapping/MDE_S606.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S606_begin
Executing: cat /home/jupyter/Mapping/MDE_S606.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S606_end
Executing: paste /home/jupyter/Mapping/MDE_S606_begin /home/jupyter/Mapping/MDE_S606_end | sed 1d > /home/jupyter/Mapping/MDE_S606_reduced.tsv


Processing pool: /home/jupyter/Mapping/MDE_S618


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S618.vcf.gz --make-just-psam --out /home/jupyter/Mapping/MDE_S618


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S618.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/MDE_S618
  --vcf /home/jupyter/Mapping/MDE_S618.vcf.gz

Start time: Tue Nov 26 05:34:41 2024
628867 MiB RAM detected, ~621454 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 7592 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S618-temporary.pgen +
/home/jupyter/Mapping/MDE_S618-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S618-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S618-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/MDE_S618.psam ... done.
End time: Tue Nov 26 05:34:41 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/MDE_S618.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/MDE_S618


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/MDE_S618.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/MDE_S618
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/MDE_S618.vcf.gz

Start time: Tue Nov 26 05:34:42 2024
628867 MiB RAM detected, ~621453 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 7592 variants scanned.
--vcf: /home/jupyter/Mapping/MDE_S618-temporary.pgen +
/home/jupyter/Mapping/MDE_S618-temporary.pvar.zst +
/home/jupyter/Mapping/MDE_S618-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/MDE_S618-temporary.psam.
7592 variants loaded from /home/jupyter/Mapping/MDE_S618-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: loadin

Executing: gunzip -c /home/jupyter/Mapping/MDE_S618.vcf.gz > /home/jupyter/Mapping/MDE_S618.vcf
Executing: cat /home/jupyter/Mapping/MDE_S618.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/MDE_S618_begin
Executing: cat /home/jupyter/Mapping/MDE_S618.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/MDE_S618_end
Executing: paste /home/jupyter/Mapping/MDE_S618_begin /home/jupyter/Mapping/MDE_S618_end | sed 1d > /home/jupyter/Mapping/MDE_S618_reduced.tsv


Processing pool: /home/jupyter/Mapping/SAS_S892


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/SAS_S892.vcf.gz --make-just-psam --out /home/jupyter/Mapping/SAS_S892


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/SAS_S892.log.
Options in effect:
  --make-just-psam
  --out /home/jupyter/Mapping/SAS_S892
  --vcf /home/jupyter/Mapping/SAS_S892.vcf.gz

Start time: Tue Nov 26 05:34:49 2024
628867 MiB RAM detected, ~621452 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 423437 variants scanned.
--vcf: /home/jupyter/Mapping/SAS_S892-temporary.pgen +
/home/jupyter/Mapping/SAS_S892-temporary.pvar.zst +
/home/jupyter/Mapping/SAS_S892-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/SAS_S892-temporary.psam.
Note: No phenotype data present.
Writing /home/jupyter/Mapping/SAS_S892.psam ... done.
End time: Tue Nov 26 05:34:50 2024


Executing: /home/jupyter/tools/plink2 --vcf /home/jupyter/Mapping/SAS_S892.vcf.gz                            --set-all-var-ids @:#:\$r:\$a                            --new-id-max-allele-len 160                            --recode A --out /home/jupyter/Mapping/SAS_S892


PLINK v2.00a6LM AVX2 Intel (6 Aug 2024)        www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/jupyter/Mapping/SAS_S892.log.
Options in effect:
  --export A
  --new-id-max-allele-len 160
  --out /home/jupyter/Mapping/SAS_S892
  --set-all-var-ids @:#:$r:$a
  --vcf /home/jupyter/Mapping/SAS_S892.vcf.gz

Start time: Tue Nov 26 05:34:51 2024
628867 MiB RAM detected, ~621459 available; reserving 314433 MiB for main
workspace.
Using up to 96 threads (change this with --threads).
--vcf: 423437 variants scanned.
--vcf: /home/jupyter/Mapping/SAS_S892-temporary.pgen +
/home/jupyter/Mapping/SAS_S892-temporary.pvar.zst +
/home/jupyter/Mapping/SAS_S892-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
/home/jupyter/Mapping/SAS_S892-temporary.psam.
423437 variants loaded from /home/jupyter/Mapping/SAS_S892-temporary.pvar.zst.
Note: No phenotype data present.
--export A pass 1/1: lo

Executing: gunzip -c /home/jupyter/Mapping/SAS_S892.vcf.gz > /home/jupyter/Mapping/SAS_S892.vcf
Executing: cat /home/jupyter/Mapping/SAS_S892.vcf | grep -v '##' | cut -f1-5 | sed 's/^chr//g' > /home/jupyter/Mapping/SAS_S892_begin
Executing: cat /home/jupyter/Mapping/SAS_S892.vcf | grep -v '##' | cut -f8 | sed 's/.*CSQ=//g' | sed 's/|/	/g' |                      awk -F '	' 'BEGIN{OFS=FS}{print $18,$4,$2,$22,$3,$53,$66,$126,$96,$108,$76,$78,$39,$40}' > /home/jupyter/Mapping/SAS_S892_end
Executing: paste /home/jupyter/Mapping/SAS_S892_begin /home/jupyter/Mapping/SAS_S892_end | sed 1d > /home/jupyter/Mapping/SAS_S892_reduced.tsv


Unnamed: 0,Ancestry,Pool,chr,bp,REF,ALT,rsid,gene,variant_type,Variant_class,...,GnomADg_AF,clinical_signif,clinvar_signif,cadd_phred,REVEL,eve_class,am_class,SIFT,Polyphen,colname
157756,SAS,S892,8,124557139,C,A,rs16899688,MTSS1,intron_variant,SNV,...,0.3105,,,3.250,,,,,,8:124557139:C:A_C
157757,SAS,S892,8,124557442,A,G,rs7841224,MTSS1,intron_variant,SNV,...,0.3602,,,0.125,,,,,,8:124557442:A:G_A
157758,SAS,S892,8,124557903,G,A,rs4870903&COSV61499448,MTSS1,intron_variant,SNV,...,0.7082,benign,Benign,6.763,,,,,,8:124557903:G:A_G
157759,SAS,S892,8,124558287,A,G,rs3750232,MTSS1,intron_variant,SNV,...,0.4729,,,7.564,,,,,,8:124558287:A:G_A
157760,SAS,S892,8,124558559,C,T,rs7846270,MTSS1,intron_variant,SNV,...,0.4260,,,16.060,,,,,,8:124558559:C:T_C
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4505,AAC,S396,6,42299036,T,A,rs4711713,TRERF1,intron_variant,SNV,...,0.1336,,,1.253,,,,,,6:42299036:T:A_T
4506,AAC,S396,6,42299293,A,G,rs4711714,TRERF1,intron_variant,SNV,...,0.1447,,,0.110,,,,,,6:42299293:A:G_A
4507,AAC,S396,6,42299322,T,C,rs4711715,TRERF1,intron_variant,SNV,...,0.1454,,,0.098,,,,,,6:42299322:T:C_T
4508,AAC,S396,6,42299485,A,G,rs4711716,TRERF1,intron_variant,SNV,...,0.1352,,,2.567,,,,,,6:42299485:A:G_A


#### Prioritization criteria

Apply criteria to prioritize variants for further analysis.

In [95]:
# Filter unique sample variants with HIGH or MODERATE impact and GnomADg_AF <= 0.01 or null
candidates_uniq_samples = uniq_samples[
    (uniq_samples['variant_impact'].isin(['HIGH', 'MODERATE']))  # HIGH or MODERATE impact
    & ((uniq_samples['GnomADg_AF'].isnull()) | (uniq_samples['GnomADg_AF'] <= 0.01))  # GnomADg_AF is null or <= 0.01
]

# Display the filtered candidate variants
candidates_uniq_samples

Unnamed: 0,Ancestry,Pool,chr,bp,REF,ALT,rsid,gene,variant_type,Variant_class,...,GnomADg_AF,clinical_signif,clinvar_signif,cadd_phred,REVEL,eve_class,am_class,SIFT,Polyphen,colname
161981,SAS,S892,8,128916918,T,A,rs369143754,CCDC26,splice_acceptor_variant&non_coding_transcript_...,SNV,...,0.000624,,,5.673,,,,,,8:128916918:T:A_T
143437,MDE,S606,15,78766976,C,T,rs142824118&COSV66308197,ADAMTS7,missense_variant,SNV,...,0.004863,,,15.07,0.034523,,likely_benign,tolerated(0.05),benign(0.079),15:78766976:C:T_C
107935,AMR,S82,9,84723600,C,A,rs139913267&COSV52853311&COSV52860967,NTRK2,missense_variant,SNV,...,0.001032,benign/likely_benign,,16.34,,,,tolerated_low_confidence(0.52),benign(0.007),9:84723600:C:A_C
92767,AJ,S1665,13,23340893,C,A,rs142967124,SACS,missense_variant,SNV,...,0.001722,benign&benign/likely_benign&likely_benign,,9.213,0.121609,Benign,likely_benign,deleterious(0.01),benign(0.003),13:23340893:C:A_C
81060,AJ,S1651,11,8985795,G,A,rs61756060&COSV58469013,NRIP3,missense_variant,SNV,...,0.002735,,,25.1,0.218651,,likely_benign,deleterious(0.04),possibly_damaging(0.825),11:8985795:G:A_G
62263,AJ,S1650,11,551741,C,T,rs139348192,LRRC56,missense_variant,SNV,...,0.003883,benign,,22.6,0.101013,Pathogenic,likely_benign,deleterious(0),possibly_damaging(0.815),11:551741:C:T_C
51970,AJ,S1623,9,2823786,C,T,rs775393848,PUM3,missense_variant,SNV,...,0.000125,,,22.6,0.516913,,likely_benign,deleterious(0.01),benign(0.184),9:2823786:C:T_C
33224,AJ,S1443,8,120255342,T,C,rs141272095,COL14A1,missense_variant,SNV,...,0.000565,,,22.7,0.245616,,likely_benign,deleterious_low_confidence(0),benign(0.092),8:120255342:T:C_T
7321,AAC,S397,6,42188686,C,T,rs137853903&CM045052&COSV57835221,GUCA1B,missense_variant,SNV,...,0.004838,benign&likely_benign,,22.7,0.239087,,ambiguous,deleterious(0.01),,6:42188686:C:T_C
7867,AAC,S397,6,42745604,C,T,rs7742995,TBCC,missense_variant,SNV,...,0.009927,,,3.136,0.041897,,likely_benign,tolerated(0.38),,6:42745604:C:T_C
