# 1. GP2 - Extracting LRRK2 G2019S carriers from the AJ population

### Description
**Project Title:** The age at onset of LRRK2 p.G2019S Parkinson's disease across ancestries and countries of origin

**Version:** Python 3.10, R 4.4.2    

**Last Updated:** 20-MAR-2025

### Notebook Overview
Extract the *LRRK2* G2019S SNP (chr12:40340400, rs34637584; GRCh38) from the AJ PLINK files.

---
**Note:** This notebook is written for the Ashkenazi Jewish (AJ) ancestry group. To apply it to another ancestry group, change the `ANCESTRY` variable from "AJ" to one of the following codes:

* African Admixed (AAC)
* African (AFR)
* Ashkenazi Jewish (AJ)
* American Admixed (AMR)
* Central Asian (CAS)
* East Asian (EAS)
* European (EUR)
* Middle Eastern (MDE)
* South Asian (SAS)
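
Since the rest of the notebook builds file names and paths from the ancestry code, a typo in `ANCESTRY` only surfaces later as a missing-file error. The following is a minimal sketch of an early check, assuming the codes listed above are the complete set; the names `GP2_ANCESTRIES` and `validate_ancestry` are hypothetical helpers and not part of the original notebook.

```python
# Ancestry codes and labels copied from the list above.
# GP2_ANCESTRIES and validate_ancestry are hypothetical helper names.
GP2_ANCESTRIES = {
    "AAC": "African Admixed",
    "AFR": "African",
    "AJ": "Ashkenazi Jewish",
    "AMR": "American Admixed",
    "CAS": "Central Asian",
    "EAS": "East Asian",
    "EUR": "European",
    "MDE": "Middle Eastern",
    "SAS": "South Asian",
}


def validate_ancestry(code):
    """Return the normalized (upper-case) code, or raise ValueError if unknown."""
    normalized = code.strip().upper()
    if normalized not in GP2_ANCESTRIES:
        valid = ", ".join(sorted(GP2_ANCESTRIES))
        raise ValueError(f"Unknown ancestry code {code!r}; expected one of: {valid}")
    return normalized


ANCESTRY = validate_ancestry("AJ")
print(f"{ANCESTRY}: {GP2_ANCESTRIES[ANCESTRY]}")
```

Calling `validate_ancestry` once, right where `ANCESTRY` is assigned, keeps the check close to the only line that needs editing when switching ancestry groups.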

### Table of Contents:

1) [Getting started](#getting-started)
2) [GP2 population: AJ](#gp2-population-aj)


## 1. Getting Started <a id="getting-started"></a>

### 1.1. Load the Python libraries

In [1]:
# Use the os package to interact with the environment
import os
import sys

# Bring in Pandas for Dataframe functionality
import pandas as pd
from functools import reduce

# Bring some visualization functionality 
import seaborn as sns

# numpy for basics
import numpy as np

# Use StringIO for working with file contents
from io import StringIO

# Enable IPython to display matplotlib graphs
import matplotlib.pyplot as plt
%matplotlib inline

# Enable interaction with the FireCloud API
from firecloud import api as fapi

# Import the IPython HTML rendering for displaying links to Google Cloud Console
from IPython.display import display, HTML

# Import urllib modules for building URLs to Google Cloud Console
import urllib.parse

# BigQuery for querying data
from google.cloud import bigquery

### 1.2. Initialize work environment variables

#### Utility routines and PLINK installation

In [31]:
# Utility routine for printing a shell command before executing it
def shell_do(command):
    print(f'Executing: {command}', file=sys.stderr)
    !$command
    
def shell_return(command):
    print(f'Executing: {command}', file=sys.stderr)
    output = !$command
    return '\n'.join(output)

# Utility routine for printing a query before executing it
def bq_query(query):
    print(f'Executing: {query}', file=sys.stderr)
    return pd.read_gbq(query, project_id=BILLING_PROJECT_ID, dialect='standard')

# Utility routine for displaying a message and a link
def display_html_link(description, link_text, url):
    html = f'''
    <p>
    </p>
    <p>
    {description}
    <a target=_blank href="{url}">{link_text}</a>.
    </p>
    '''

    display(HTML(html))

# Utility routines for reading files from Google Cloud Storage
def gcs_read_file(path):
    """Return the contents of a file in GCS"""
    contents = !gsutil -u {BILLING_PROJECT_ID} cat {path}
    return '\n'.join(contents)
    
def gcs_read_csv(path, sep=None):
    """Return a DataFrame from the contents of a delimited file in GCS"""
    return pd.read_csv(StringIO(gcs_read_file(path)), sep=sep, engine='python')

# Utility routine for displaying a message and link to Cloud Console
def link_to_cloud_console_gcs(description, link_text, gcs_path):
    url = '{}?{}'.format(
        os.path.join('https://console.cloud.google.com/storage/browser',
                     gcs_path.replace("gs://","")),
        urllib.parse.urlencode({'userProject': BILLING_PROJECT_ID}))

    display_html_link(description, link_text, url)

In [None]:
%%bash

# Install Plink 1.9

# Create the tools directory if it does not already exist
mkdir -p ~/tools
cd ~/tools

# Check if Plink 1.9 is already installed, install if not
if test -e /home/jupyter/tools/plink; then
    echo "Plink1.9 is already installed in /home/jupyter/tools/"

else
    echo -e "Downloading plink \n    -------"
    # Download plink1.9 from website
    wget -N http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20190304.zip 
    unzip -o plink_linux_x86_64_20190304.zip
    
    echo -e "\n plink downloaded and unzipped in /home/jupyter/tools \n "
fi

Plink1.9 is already installed in /home/jupyter/tools/


In [None]:
%%bash

# Install Plink 2.0
cd /home/jupyter/tools/

# Check if Plink 2.0 is already installed, install if not
if test -e /home/jupyter/tools/plink2; then
    echo "Plink2 is already installed in /home/jupyter/"

else
    echo "Plink2 is not installed"
    cd /home/jupyter/tools/
    # Download plink2.0 from website
    wget http://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_latest.zip
    unzip -o plink2_linux_x86_64_latest.zip

fi

Plink2 is already installed in /home/jupyter/


In [None]:
%%bash

ls /home/jupyter/tools/

# Update permissions to make plink executable
chmod u+x /home/jupyter/tools/plink
chmod u+x /home/jupyter/tools/plink2

LICENSE
plink
plink2
plink2_linux_x86_64_latest.zip
plink_linux_x86_64_20190304.zip
prettify
toy.map
toy.ped
vcf_subset


In [None]:
# Install rpy2 to run R in Python environment

! pip install rpy2


[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip


In [36]:
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [None]:
%%R
# Print working directory
getwd()

[1] "/home/jupyter/LRRK2_AJ"


## 2. GP2 population: AJ <a id="gp2-population-aj"></a>

In [None]:
# Set ancestry variable
ANCESTRY = "AJ"

# Pass variable to environment to use in bash
os.environ["ANCESTRY"] = ANCESTRY

In [None]:
# Create a folder on your workspace
print("Making a working directory")
WORK_DIR = f'/home/jupyter/LRRK2_{ANCESTRY}/'
shell_do(f'mkdir -p {WORK_DIR}') # f-string: expressions inside braces are evaluated and inserted

Making a working directory


Executing: mkdir -p /home/jupyter/LRRK2_AJ/


### Extract the *LRRK2* G2019S variant at chr12:40340400 (GRCh38)

In [None]:
# Use %cd so the directory change persists; `! cd` runs in a subshell and has no lasting effect
%cd /home/jupyter/LRRK2_{ANCESTRY}/

# Extract LRRK2 G2019S variant as binary plink format (.bed, .bim, .fam)
! /home/jupyter/tools/plink2 \
--pfile chr12_{ANCESTRY}_release7 \
--chr 12 \
--from-bp 40340400  \
--to-bp 40340400 \
--make-bed \
--out LRRK2_{ANCESTRY}_G2019S

# Export as .ped and .map
! /home/jupyter/tools/plink \
--bfile LRRK2_{ANCESTRY}_G2019S \
--out LRRK2_{ANCESTRY}_G2019S_info \
--recode

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to LRRK2_AJ_G2019S.log.
Options in effect:
  --chr 12
  --from-bp 40340400
  --make-bed
  --out LRRK2_AJ_G2019S
  --pfile chr12_AJ_release7
  --to-bp 40340400

Start time: Thu Mar 20 08:49:26 2025
14993 MiB RAM detected, ~13361 available; reserving 7496 MiB for main
workspace.
Using up to 4 compute threads.
2577 samples (987 females, 1590 males; 2577 founders) loaded from
chr12_AJ_release7.psam.
1104353 variants loaded from chr12_AJ_release7.pvar.
1 binary phenotype loaded (1246 cases, 393 controls).
1 variant remaining after main filters.
Writing LRRK2_AJ_G2019S.fam ... done.
Writing LRRK2_AJ_G2019S.bim ... done.
Writing LRRK2_AJ_G2019S.bed ... done.
End time: Thu Mar 20 08:49:26 2025
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General
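
Once the single-variant `.ped` file exists, a quick genotype tally can confirm that the extraction returned plausible carrier counts before merging with clinical data. The sketch below assumes the standard PLINK `.ped` layout (six leading columns followed by one allele pair), that missing genotypes are coded `0 0`, and that the risk allele is `A` (c.6055G>A); check the allele coding in the `.map`/`.bim` output before relying on it. The function name `count_risk_allele_carriers` and the toy rows are hypothetical.

```python
from io import StringIO

import pandas as pd


def count_risk_allele_carriers(ped_source, risk_allele="A"):
    """Tally genotypes in a single-variant PLINK .ped file.

    ped_source may be a file path or a file-like object.
    risk_allele="A" is an assumption; verify it against the .map/.bim file.
    """
    ped = pd.read_csv(
        ped_source,
        sep=r"\s+",
        header=None,
        dtype=str,
        keep_default_na=False,
        names=["FID", "IID", "PAT", "MAT", "SEX", "PHENO", "A1", "A2"],
    )
    # Number of copies of the risk allele carried by each sample (0, 1, or 2)
    n_risk = (ped["A1"] == risk_allele).astype(int) + (ped["A2"] == risk_allele).astype(int)
    # PLINK writes missing alleles as "0"
    missing = (ped["A1"] == "0") | (ped["A2"] == "0")
    return {
        "non_carrier": int(((n_risk == 0) & ~missing).sum()),
        "heterozygous": int(((n_risk == 1) & ~missing).sum()),
        "homozygous": int(((n_risk == 2) & ~missing).sum()),
        "missing": int(missing.sum()),
    }


# Toy data for illustration only; not real GP2 samples
toy_ped = StringIO(
    "F1 S1 0 0 1 2 G A\n"
    "F2 S2 0 0 2 2 G G\n"
    "F3 S3 0 0 1 1 A A\n"
    "F4 S4 0 0 2 -9 0 0\n"
)
toy_counts = count_risk_allele_carriers(toy_ped)
print(toy_counts)
```

On the real data, the same function could be pointed at `LRRK2_{ANCESTRY}_G2019S_info.ped` in the working directory.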

In [None]:
%%R -i ANCESTRY

# Change working directory
setwd(paste0('/home/jupyter/LRRK2_', ANCESTRY, '/'))

# Read the .ped file
data_g2019s <- read.table(paste0("LRRK2_", ANCESTRY, "_G2019S_info.ped"), strip.white = TRUE)

# Read the CSV file
data_meta <- read.csv("master_key_release7_final.csv", header=TRUE, na.strings = c("", "NA"))

# Merge the data
data_g2019s_merge_clin <- merge(x = data_g2019s, y = data_meta, by.x = c('V2'), by.y =c("GP2sampleID"))

# Write the merged table to a file
write.table(data_g2019s_merge_clin, paste0("data_g2019s_merge_clin_", ANCESTRY, "_r7.txt"), row.names = FALSE, sep = "\t")
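
R's `merge` performs an inner join by default, so genotyped samples without a matching `GP2sampleID` in the master key are dropped silently. The following pandas sketch shows one way to list such samples before merging. It assumes the sample ID sits in the second `.ped` column (named `IID` here); `find_unmatched_samples`, `toy_field`, and the toy tables are hypothetical and only illustrate the check.

```python
import pandas as pd


def find_unmatched_samples(genotypes, metadata, geno_id="IID", meta_id="GP2sampleID"):
    """Return sample IDs present in the genotype table but absent from the metadata."""
    merged = genotypes.merge(
        metadata,
        how="left",
        left_on=geno_id,
        right_on=meta_id,
        indicator=True,  # adds a "_merge" column recording where each row was found
    )
    return merged.loc[merged["_merge"] == "left_only", geno_id].tolist()


# Toy tables for illustration only; not real GP2 samples or master-key columns
toy_genotypes = pd.DataFrame({"IID": ["S1", "S2", "S3"], "A1": ["G", "G", "A"], "A2": ["A", "G", "A"]})
toy_metadata = pd.DataFrame({"GP2sampleID": ["S1", "S3"], "toy_field": ["x", "y"]})

unmatched = find_unmatched_samples(toy_genotypes, toy_metadata)
print(unmatched)
```

An empty list means every genotyped sample has a metadata row, so the inner join loses nothing.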

## Saving

Save the final files to your workspace bucket, since we are conducting this analysis on Terra.
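
A minimal sketch of the copy step follows, assuming the Terra notebook environment exposes the workspace bucket through the `WORKSPACE_BUCKET` environment variable and that `gsutil` is on the path. The helper `build_gsutil_cp_command`, the fallback bucket `gs://example-bucket`, and the destination folder are hypothetical. The command is only built and printed here, not executed.

```python
import os


def build_gsutil_cp_command(local_path, bucket, dest_dir):
    """Build a `gsutil cp` argument list that copies one local file into a bucket folder."""
    bucket = bucket.rstrip("/")
    destination = f"{bucket}/{dest_dir.strip('/')}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]


# WORKSPACE_BUCKET is assumed to be set by Terra; the fallback is a placeholder
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://example-bucket")
cmd = build_gsutil_cp_command(
    "/home/jupyter/LRRK2_AJ/data_g2019s_merge_clin_AJ_r7.txt",
    bucket,
    "LRRK2_AJ",
)
print(" ".join(cmd))

# To run the copy for real, uncomment the lines below:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems if a path ever contains spaces.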