# LRRK2 p.A419V - Age of onset analysis 

- Project: Multiancestry LRRK2 p.A419V analysis
- Version: Python/3.10.12
- Last Updated: 05-MAY-2025

# Description

**1. Getting started**
- Load python libraries
- Define function
- Setting up path
- Install R and its packages

**2. Import data**

**3. Baseline information**

**4. Data preparation**

**5. Age of onset statistical analysis**
- Unadjusted
- Adjusted

# Getting started

## Load python libraries

In [1]:
# Use the os package to interact with the environment
import os

# Bring in Pandas for Dataframe functionality
import pandas as pd

# Numpy for basics
import numpy as np

# Use StringIO for working with file contents
from io import StringIO

# Enable IPython to display matplotlib graphs
import matplotlib.pyplot as plt
%matplotlib inline

# Enable interaction with the FireCloud API
from firecloud import api as fapi

# Import the IPython HTML rendering for displaying links to Google Cloud Console
from IPython.display import display, HTML

# Import urllib modules for building URLs to Google Cloud Console
import urllib.parse

# BigQuery for querying data
from google.cloud import bigquery

# Import sys for writing messages to stderr
import sys



## Define function

In [3]:
# Utility routine for printing a shell command before executing it
def shell_do(command):
    print(f'Executing: {command}', file=sys.stderr)
    !$command
    
def shell_return(command):
    print(f'Executing: {command}', file=sys.stderr)
    output = !$command
    return '\n'.join(output)

# Utility routine for printing a query before executing it
def bq_query(query):
    print(f'Executing: {query}', file=sys.stderr)
    return pd.read_gbq(query, project_id=BILLING_PROJECT_ID, dialect='standard')

# Utility routine for displaying a message and a link
def display_html_link(description, link_text, url):
    html = f'''
    <p>
    </p>
    <p>
    {description}
    <a target=_blank href="{url}">{link_text}</a>.
    </p>
    '''

    display(HTML(html))

# Utility routines for reading files from Google Cloud Storage
def gcs_read_file(path):
    """Return the contents of a file in GCS"""
    contents = !gsutil -u {BILLING_PROJECT_ID} cat {path}
    return '\n'.join(contents)
    
def gcs_read_csv(path, sep=None):
    """Return a DataFrame from the contents of a delimited file in GCS"""
    return pd.read_csv(StringIO(gcs_read_file(path)), sep=sep, engine='python')

# Utility routine for displaying a message and link to Cloud Console
def link_to_cloud_console_gcs(description, link_text, gcs_path):
    url = '{}?{}'.format(
        os.path.join('https://console.cloud.google.com/storage/browser',
                     gcs_path.replace("gs://","")),
        urllib.parse.urlencode({'userProject': BILLING_PROJECT_ID}))

    display_html_link(description, link_text, url)
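
`link_to_cloud_console_gcs` assembles the Cloud Console URL with `os.path.join`, which produces forward slashes only on POSIX systems. The sketch below shows the same URL construction as a pure function that can be checked without a notebook. The function name `cloud_console_gcs_url` and the billing project `my-billing-project` are hypothetical and used for illustration only.

```python
import posixpath
import urllib.parse


def cloud_console_gcs_url(gcs_path, billing_project_id):
    """Build a Cloud Console storage-browser URL for a gs:// path."""
    base = 'https://console.cloud.google.com/storage/browser'
    # Strip the scheme so only "bucket/prefix" is appended to the base URL
    bucket_path = gcs_path.replace('gs://', '', 1)
    # The billing project is passed as the userProject query parameter
    query = urllib.parse.urlencode({'userProject': billing_project_id})
    return f'{posixpath.join(base, bucket_path)}?{query}'


# Hypothetical billing project, for illustration only
demo_url = cloud_console_gcs_url('gs://gp2tier2/release9_18122024', 'my-billing-project')
print(demo_url)
```

Using `posixpath.join` keeps the separator fixed to `/` regardless of the operating system the notebook runs on.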

## Setting up path

In [2]:
# Set up billing project and data path variables
BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
WORKSPACE_NAMESPACE = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE_NAME = os.environ['WORKSPACE_NAME']
WORKSPACE_BUCKET = os.environ['WORKSPACE_BUCKET']

WORKSPACE_ATTRIBUTES = fapi.get_workspace(WORKSPACE_NAMESPACE, WORKSPACE_NAME).json().get('workspace',{}).get('attributes',{})

# GP2 release 9: gs://gp2tier2/release9_18122024/
GP2_TIER1 = 'gs://gp2tier1/release9_18122024'
GP2_RELEASE_PATH = 'gs://gp2tier2/release9_18122024'
GP2_CLINICAL_RELEASE_PATH = f'{GP2_RELEASE_PATH}/clinical_data'
GP2_META_RELEASE_PATH = f'{GP2_RELEASE_PATH}/meta_data'
GP2_SUMSTAT_RELEASE_PATH = f'{GP2_RELEASE_PATH}/summary_statistics'

GP2_RAW_GENO_PATH = f'{GP2_RELEASE_PATH}/raw_genotypes'
GP2_IMPUTED_GENO_PATH = f'{GP2_RELEASE_PATH}/imputed_genotypes'
print('GP2 release 9')
print(f'Path to GP2 release 9 Clinical Data: {GP2_CLINICAL_RELEASE_PATH}')
print(f'Path to GP2 release 9 Raw Genotype Data: {GP2_RAW_GENO_PATH}')
print(f'Path to GP2 release 9 Imputed Genotype Data: {GP2_IMPUTED_GENO_PATH}')

GP2 release 9
Path to GP2 release 9 Clinical Data: gs://gp2tier2/release9_18122024/clinical_data
Path to GP2 release 9 Raw Genotype Data: gs://gp2tier2/release9_18122024/raw_genotypes
Path to GP2 release 9 Imputed Genotype Data: gs://gp2tier2/release9_18122024/imputed_genotypes


## Install R and its packages

In [None]:
%pip install rpy2

In [4]:
%load_ext rpy2.ipython

In [5]:
%%R
install.packages("tidyverse")
install.packages("data.table")

library(tidyverse)
library(data.table)

* installing *source* package ‘tidyverse’ ...
** package ‘tidyverse’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (tidyverse)
* installing *source* package ‘data.table’ ...
** package ‘data.table’ successfully unpacked and MD5 sums checked
** using staged installation


gcc 9.4.0
zlib 1.2.11 is available ok
* checking if R installation supports OpenMP without any extra hints... yes


** libs
using C compiler: ‘gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0’


installing to /home/jupyter/packages/00LOCK-data.table/00new/data.table/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (data.table)


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Installing package into ‘/home/jupyter/packages’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/tidyverse_2.0.0.tar.gz'
Content type 'application/x-gzip' length 704618 bytes (688 KB)
downloaded 688 KB


The downloaded source packages are in
	‘/tmp/RtmpKhTYMS/downloaded_packages’
Installing package into ‘/home/jupyter/packages’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/data.table_1.17.0.tar.gz'
Content type 'application/x-gzip' length 5833671 bytes (5.6 MB)
downloaded 5.6 MB


The downloaded source packages are in
	‘/tmp/RtmpKhTYMS/downloaded_packages’
data.table 1.17.0 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:lubridate’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following objects are masked from ‘package:dplyr’:

    between, first, last

The following object is maske

# Import data

In [11]:
WORK_DIR = '/home/jupyter/A419V_release9'
labels = ['AAC', 'AFR', 'AJ', 'AMR', 'CAH', 'CAS', 'FIN', 'MDE', 'SAS', 'EAS', 'EUR']

for label in labels:
    shell_do(f'gsutil -u {BILLING_PROJECT_ID} -m cp -r {GP2_RELEASE_PATH}/imputed_genotypes/{label}/chr12_{label}_release9* {WORK_DIR}/{label}/')

Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/AAC/chr12_AAC_release9* /home/jupyter/A419V_release9/AAC/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AAC/chr12_AAC_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AAC/chr12_AAC_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AAC/chr12_AAC_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AAC/chr12_AAC_release9.pvar...
- [4/4 files][  1.1 GiB/  1.1 GiB] 100% Done  53.0 MiB/s ETA 00:00:00           
Operation completed over 4 objects/1.1 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/AFR/chr12_AFR_release9* /home/jupyter/A419V_release9/AFR/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AFR/chr12_AFR_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AFR/chr12_AFR_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AFR/chr12_AFR_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AFR/chr12_AFR_release9.pvar...
/ [4/4 files][  2.8 GiB/  2.8 GiB] 100% Done  66.7 MiB/s ETA 00:00:00           
Operation completed over 4 objects/2.8 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/AJ/chr12_AJ_release9* /home/jupyter/A419V_release9/AJ/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AJ/chr12_AJ_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AJ/chr12_AJ_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AJ/chr12_AJ_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AJ/chr12_AJ_release9.pvar...
| [4/4 files][  1.3 GiB/  1.3 GiB] 100% Done  70.6 MiB/s ETA 00:00:00           
Operation completed over 4 objects/1.3 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/AMR/chr12_AMR_release9* /home/jupyter/A419V_release9/AMR/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AMR/chr12_AMR_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AMR/chr12_AMR_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AMR/chr12_AMR_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/AMR/chr12_AMR_release9.pvar...
/ [4/4 files][  4.0 GiB/  4.0 GiB] 100% Done  65.5 MiB/s ETA 00:00:00           
Operation completed over 4 objects/4.0 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/CAH/chr12_CAH_release9* /home/jupyter/A419V_release9/CAH/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAH/chr12_CAH_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAH/chr12_CAH_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAH/chr12_CAH_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAH/chr12_CAH_release9.pvar...
/ [4/4 files][  1.1 GiB/  1.1 GiB] 100% Done  19.4 MiB/s ETA 00:00:00           
Operation completed over 4 objects/1.1 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/CAS/chr12_CAS_release9* /home/jupyter/A419V_release9/CAS/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAS/chr12_CAS_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAS/chr12_CAS_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAS/chr12_CAS_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/CAS/chr12_CAS_release9.pvar...
\ [4/4 files][  1.3 GiB/  1.3 GiB] 100% Done  61.8 MiB/s ETA 00:00:00           
Operation completed over 4 objects/1.3 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/FIN/chr12_FIN_release9* /home/jupyter/A419V_release9/FIN/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/FIN/chr12_FIN_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/FIN/chr12_FIN_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/FIN/chr12_FIN_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/FIN/chr12_FIN_release9.pvar...
/ [4/4 files][142.0 MiB/142.0 MiB] 100% Done                                    
Operation completed over 4 objects/142.0 MiB.                                    


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/MDE/chr12_MDE_release9* /home/jupyter/A419V_release9/MDE/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/MDE/chr12_MDE_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/MDE/chr12_MDE_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/MDE/chr12_MDE_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/MDE/chr12_MDE_release9.pvar...
/ [4/4 files][  1.1 GiB/  1.1 GiB] 100% Done  62.4 MiB/s ETA 00:00:00           
Operation completed over 4 objects/1.1 GiB.                                      


Executing: gsutil -u terra-8cb3be5c -m cp -r gs://gp2tier2/release9_18122024/imputed_genotypes/SAS/chr12_SAS_release9* /home/jupyter/A419V_release9/SAS/


Copying gs://gp2tier2/release9_18122024/imputed_genotypes/SAS/chr12_SAS_release9.psam...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/SAS/chr12_SAS_release9.pgen...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/SAS/chr12_SAS_release9.log...
Copying gs://gp2tier2/release9_18122024/imputed_genotypes/SAS/chr12_SAS_release9.pvar...
/ [4/4 files][832.1 MiB/832.1 MiB] 100% Done  66.6 MiB/s ETA 00:00:00           
Operation completed over 4 objects/832.1 MiB.                                    
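
The copy loop above writes into one sub-directory per ancestry label under `WORK_DIR`. A minimal sketch of creating those directories beforehand is shown below, on the assumption that they do not already exist. The helper name `make_label_dirs` is hypothetical, and the demo runs in a temporary directory rather than the real workspace.

```python
import os
import tempfile


def make_label_dirs(work_dir, labels):
    """Create one sub-directory per ancestry label and return their paths."""
    paths = []
    for label in labels:
        path = os.path.join(work_dir, label)
        # exist_ok=True makes the call safe to repeat on an existing directory
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths


# Demo in a throwaway directory; in the notebook this would be WORK_DIR
demo_root = tempfile.mkdtemp()
created = make_label_dirs(demo_root, ['AAC', 'AFR', 'AJ'])
print(created)
```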


# Baseline information 

In [14]:
import statistics as st

# Read the master key once instead of once per ancestry label
master = pd.read_csv("/home/jupyter/A419V_release9/master_key_release9_final.csv")

rows = []

labels = ['AAC', 'AFR', 'AJ', 'AMR', 'CAH', 'CAS', 'EAS', 'EUR', 'FIN', 'MDE', 'SAS']
for label in labels:
    # Keep samples from this ancestry that have a recorded age at onset
    master_red = master[(master["nba_label"] == label) & (~master["age_of_onset"].isna())]
    sd = round(st.stdev(master_red["age_of_onset"]), 2)
    mean = round(st.mean(master_red["age_of_onset"]), 2)

    rows.append({
        'label' : label,
        'mean AAO +- SD' : str(mean) + ' +- ' + str(sd) 
    })
    
pd.DataFrame(rows)

| | label | mean AAO +- SD |
|---|---|---|
| 0 | AAC | 59.08 +- 12.49 |
| 1 | AFR | 57.18 +- 12.98 |
| 2 | AJ | 62.45 +- 11.38 |
| 3 | AMR | 52.41 +- 13.74 |
| 4 | CAH | 55.57 +- 13.68 |
| 5 | CAS | 52.76 +- 12.12 |
| 6 | EAS | 53.5 +- 12.88 |
| 7 | EUR | 58.94 +- 12.0 |
| 8 | FIN | 58.38 +- 11.83 |
| 9 | MDE | 53.35 +- 13.38 |
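
The per-label mean and standard deviation can also be computed in one pass with a pandas `groupby`. This is a sketch on made-up toy values, not the release data. It assumes the same column names as the master key (`nba_label`, `age_of_onset`) and relies on `std` using the sample standard deviation, as `statistics.stdev` does.

```python
import pandas as pd

# Toy stand-in for the master key; the values are made up for illustration
toy = pd.DataFrame({
    "nba_label": ["EUR", "EUR", "EUR", "EAS", "EAS", "EAS"],
    "age_of_onset": [50.0, 60.0, None, 40.0, 50.0, 60.0],
})

# Drop samples without an age at onset, then summarise per ancestry label
summary = (
    toy.dropna(subset=["age_of_onset"])
       .groupby("nba_label")["age_of_onset"]
       .agg(["count", "mean", "std"])
       .round(2)
)
summary["mean AAO +- SD"] = summary["mean"].astype(str) + " +- " + summary["std"].astype(str)
print(summary)
```

Including `count` in the summary also shows how many samples contribute to each estimate.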


# Data preparation

We need to extract the following variants:

Raw genotypes
1. exm994472 (Seq_rs34594498 in the CAS and AJ SNP list) - A419V
2. seq_rs34637584 - G2019S
3. seq_rs34778348 - G2385R

Imputed genotypes
1. chr12:40320043:G:C - R1628P

## SNPs ID

In [12]:
with open("/home/jupyter/A419V_release9/extract_snps_raw_cas_aj.txt", "w") as f:
    f.write("Seq_rs34594498\nseq_rs34637584\nseq_rs34778348")

In [17]:
with open("/home/jupyter/A419V_release9/extract_snps_raw.txt", "w") as f:
    f.write("exm994472\nseq_rs34637584\nseq_rs34778348")

In [22]:
with open("/home/jupyter/A419V_release9/extract_snps_imp.txt", "w") as f:
    f.write("chr12:40320043:G:C")

## PLINK file

In [21]:
%%bash
WORK_DIR=/home/jupyter/A419V_release9
cd $WORK_DIR

labels=('CAH' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_remove_related_updated \
    --filter-cases \
    --extract extract_snps_raw.txt \
    --make-bed \
    --out ${label}/${label}_release9_extracted_raw
    
done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_release9_extracted_raw.log.
Options in effect:
  --bfile CAH/CAH_release9_remove_related_updated
  --extract extract_snps_raw.txt
  --filter-cases
  --make-bed
  --out CAH/CAH_release9_extracted_raw

3672 MB RAM detected; reserving 1836 MB for main workspace.
1904683 variants loaded from .bim file.
982 people (514 males, 468 females) loaded from .fam.
954 phenotype values loaded from .fam.
--extract: 3 variants remaining.
338 people removed due to case/control status (--filter-cases).
Using 1 thread.
Before main variant filters, 644 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate in rem

In [22]:
%%bash
WORK_DIR=/home/jupyter/A419V_release9
cd $WORK_DIR

labels=('CAS')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_remove_related_updated \
    --filter-cases \
    --extract extract_snps_raw_cas_aj.txt \
    --make-bed \
    --out ${label}/${label}_release9_extracted_raw
    
done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to AJ/AJ_release9_extracted_raw.log.
Options in effect:
  --bfile AJ/AJ_release9_remove_related_updated
  --extract extract_snps_raw_cas_aj.txt
  --filter-cases
  --make-bed
  --out AJ/AJ_release9_extracted_raw

3672 MB RAM detected; reserving 1836 MB for main workspace.
1875579 variants loaded from .bim file.
3081 people (1934 males, 1147 females) loaded from .fam.
2533 phenotype values loaded from .fam.
--extract: 3 variants remaining.
1372 people removed due to case/control status (--filter-cases).
Using 1 thread.
Before main variant filters, 1709 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate

In [23]:
%%bash
WORK_DIR=/home/jupyter/A419V_release9
cd $WORK_DIR

labels=('CAH' 'EAS' 'EUR' 'CAS')

for label in "${labels[@]}"
do

    /home/jupyter/plink2 \
    --pfile ${label}/chr12_${label}_release9 \
    --keep ${label}/${label}_case_id.txt \
    --extract extract_snps_imp.txt \
    --make-bed \
    --out ${label}/${label}_release9_extracted_imp

done

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_release9_extracted_imp.log.
Options in effect:
  --extract extract_snps_imp.txt
  --keep CAH/CAH_case_id.txt
  --make-bed
  --out CAH/CAH_release9_extracted_imp
  --pfile CAH/chr12_CAH_release9

Start time: Wed Apr  9 10:00:29 2025
3672 MiB RAM detected, ~1989 available; reserving 1836 MiB for main workspace.
Using 1 compute thread.
1003 samples (480 females, 523 males; 1003 founders) loaded from
CAH/chr12_CAH_release9.psam.
2663687 variants loaded from CAH/chr12_CAH_release9.pvar.
1 binary phenotype loaded (646 cases, 317 controls).
--extract: 1 variant remaining.
--keep: 644 samples remaining.
644 samples (275 females, 369 males; 644 founders) remaining after main
filters.
639 cases and 0 controls remaining after main filters.
1 variant remaining after main filters.
Writing CAH/CAH_release9_extracted_imp.fam .

Error: No variants remaining after main filters.


End time: Wed Apr  9 10:00:47 2025
PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAS/CAS_release9_extracted_imp.log.
Options in effect:
  --extract extract_snps_imp.txt
  --keep CAS/CAS_case_id.txt
  --make-bed
  --out CAS/CAS_release9_extracted_imp
  --pfile CAS/chr12_CAS_release9

Start time: Wed Apr  9 10:00:47 2025
3672 MiB RAM detected, ~1940 available; reserving 1836 MiB for main workspace.
Using 1 compute thread.
1071 samples (571 females, 500 males; 1071 founders) loaded from
CAS/chr12_CAS_release9.psam.
1605084 variants loaded from CAS/chr12_CAS_release9.pvar.
1 binary phenotype loaded (609 cases, 345 controls).
--extract: 1 variant remaining.
--keep: 661 samples remaining.
661 samples (360 females, 301 males; 661 founders) remaining after main
filters.
592 cases and 0 controls remaining after main filters.
1 variant remaining after main filters.
Writing C

In [50]:
%%bash
WORK_DIR=/home/jupyter/A419V_release9
cd $WORK_DIR

labels=('CAH' 'EAS' 'EUR' 'CAS')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_extracted_raw \
    --bmerge ${label}/${label}_release9_extracted_imp  \
    --make-bed \
    --out ${label}/${label}_release9_extracted_merged
    
done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_release9_extracted_merged.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_raw
  --bmerge CAH/CAH_release9_extracted_imp
  --make-bed
  --out CAH/CAH_release9_extracted_merged

3672 MB RAM detected; reserving 1836 MB for main workspace.
644 people loaded from CAH/CAH_release9_extracted_raw.fam.
644 people to be merged from CAH/CAH_release9_extracted_imp.fam.
Of these, 0 are new, while 644 are present in the base dataset.
3 markers loaded from CAH/CAH_release9_extracted_raw.bim.
1 marker to be merged from CAH/CAH_release9_extracted_imp.bim.
Of these, 1 is new, while 0 are present in the base dataset.
Performing single-pass merge (644 people, 4 variants).
Merged fileset written to CAH/CAH_release9_extracted_merged-merge.bed +
CAH/CAH_release9_extracted_merged-merge.bim +
CAH/CAH_release9_extracted_merg

Error: Failed to open AJ/AJ_release9_extracted_imp.fam.


PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAS/CAS_release9_extracted_merged.log.
Options in effect:
  --bfile CAS/CAS_release9_extracted_raw
  --bmerge CAS/CAS_release9_extracted_imp
  --make-bed
  --out CAS/CAS_release9_extracted_merged

3672 MB RAM detected; reserving 1836 MB for main workspace.
661 people loaded from CAS/CAS_release9_extracted_raw.fam.
661 people to be merged from CAS/CAS_release9_extracted_imp.fam.
Of these, 0 are new, while 661 are present in the base dataset.
3 markers loaded from CAS/CAS_release9_extracted_raw.bim.
1 marker to be merged from CAS/CAS_release9_extracted_imp.bim.
Of these, 1 is new, while 0 are present in the base dataset.
Performing single-pass merge (661 people, 4 variants).
Merged fileset written to CAS/CAS_release9_extracted_merged-merge.bed +
CAS/CAS_release9_extracted_merged-merge.bim +
CAS/CAS_release9_extracted_merg

In [51]:
WORK_DIR = "/home/jupyter/A419V_release9"
labels = ['CAH', 'CAS', 'EAS', 'EUR']

for label in labels:
    
    # Rename the SNP ID in the .bim file to the format chr_pos_ref_alt
    # (.bim columns: chromosome, variant ID, genetic distance in cM, base-pair position, allele 1, allele 2)
    bim_path = f"{WORK_DIR}/{label}/{label}_release9_extracted_merged.bim"
    bim = pd.read_csv(bim_path, sep = "\t", names = ["chr", "rsid", "cm", "bp", "a1", "a2"])
    bim['chr'] = 'chr' + bim['chr'].astype(str)
    bim['bp'] = bim['bp'].astype(str)
    # Allele 2 is written in the ref position and allele 1 in the alt position
    bim['rsid'] = bim['chr'].str.cat(bim['bp'], sep = "_")
    bim['rsid'] = bim['rsid'].str.cat(bim['a2'], sep = "_")
    bim['rsid'] = bim['rsid'].str.cat(bim['a1'], sep = "_")
    bim.to_csv(bim_path, sep = "\t", index = False, header = False)
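
The renaming step can be checked on a small in-memory table before overwriting any `.bim` file. The sketch below uses two illustrative rows. The allele assignments are assumptions chosen so that the first row reproduces the ID `chr12_40252984_G_A` used later in the analysis.

```python
import pandas as pd

# Toy .bim rows: chromosome, variant ID, cM, base-pair position, allele 1, allele 2
# (allele assignments are assumed for illustration)
bim = pd.DataFrame(
    [[12, "exm994472", 0, 40252984, "A", "G"],
     [12, "chr12:40320043:G:C", 0, 40320043, "C", "G"]],
    columns=["chr", "rsid", "cm", "bp", "a1", "a2"],
)

# Build chr_pos_ref_alt with allele 2 as ref and allele 1 as alt
bim["rsid"] = (
    "chr" + bim["chr"].astype(str)
    + "_" + bim["bp"].astype(str)
    + "_" + bim["a2"]
    + "_" + bim["a1"]
)
print(bim["rsid"].tolist())
```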

In [52]:
%%bash
WORK_DIR=/home/jupyter/A419V_release9
cd $WORK_DIR

labels=('CAH' 'EAS' 'EUR' 'CAS')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --recode A \
    --out ${label}/${label}_release9_extracted_merged

done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_release9_extracted_merged.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --out CAH/CAH_release9_extracted_merged
  --recode A

3672 MB RAM detected; reserving 1836 MB for main workspace.
4 variants loaded from .bim file.
644 people (369 males, 275 females) loaded from .fam.
644 phenotype values loaded from .fam.
Using 1 thread.
Before main variant filters, 644 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.998835.
4 variants and 644 people pass filters and QC.
Among remaining phenotypes, 644 are cases and 0 are controls.
--recode A to CAH/CAH_release9_extrac
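
`--recode A` writes an additive-coded `.raw` text file with one row per sample and one column per variant. The sketch below shows how such a file could be read to list carriers of a variant. The content is a made-up toy table, and the column suffix (`<variant ID>_<counted allele>`) is assumed to follow the usual PLINK 1.9 layout.

```python
from io import StringIO

import pandas as pd

# Toy .raw content in the layout written by --recode A (samples are made up)
raw_text = """FID IID PAT MAT SEX PHENOTYPE chr12_40252984_G_A_A chr12_40320043_G_C_C
0 S1 0 0 1 2 0 1
0 S2 0 0 2 2 1 NA
0 S3 0 0 1 2 0 0
"""

raw = pd.read_csv(StringIO(raw_text), sep=r"\s+")

# A dosage above zero means the sample carries at least one counted allele;
# missing genotypes (NA) are read as NaN and are not counted as carriers
dosage = raw["chr12_40252984_G_A_A"]
carriers = raw.loc[dosage > 0, "IID"].tolist()
print(carriers)
```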

## Covariate file

In [4]:
labels = ['CAH', 'CAS', 'EAS', 'EUR', 'AJ']

# Read the master key once and keep only the sample ID and age at onset
master = pd.read_csv("/home/jupyter/A419V_release9/master_key_release9_final.csv")
AAO = master.rename(columns = {"GP2ID":"IID"})[["IID", "age_of_onset"]]

for label in labels:
    
    cov = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_covar.txt", sep = "\t")

    # Attach age at onset, then keep cases (PHENO == 2) with a recorded age at onset
    cov_AAO = pd.merge(cov, AAO, how = "left", on = "IID")
    cov_AAO_cases = cov_AAO[(cov_AAO["PHENO"] == 2) & (~cov_AAO["age_of_onset"].isna())]
    cov_AAO_cases[["FID", "IID", "age_of_onset"]].to_csv(f"/home/jupyter/A419V_release9/{label}/{label}_covar_AAO.txt", sep = "\t", header = True, index = False)
    cov_AAO_cases[["FID", "IID", "SEX", "age_of_onset", "PC1", "PC2", "PC3", "PC4", "PC5"]].to_csv(f"/home/jupyter/A419V_release9/{label}/{label}_covar_AAO_full.txt", sep = "\t", header = True, index = False)

    # Same filter, additionally requiring a recorded AGE
    cov_AAO_cases_2 = cov_AAO[(cov_AAO["PHENO"] == 2) & (~cov_AAO["age_of_onset"].isna()) & (~cov_AAO["AGE"].isna())]
    cov_AAO_cases_2[["FID", "IID", "SEX", "AGE", "age_of_onset", "PC1", "PC2", "PC3", "PC4", "PC5"]].to_csv(f"/home/jupyter/A419V_release9/{label}/{label}_covar_AAO_full_withage.txt", sep = "\t", header = True, index = False)
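
The merge-and-filter logic can be illustrated on toy tables. This is a sketch with made-up samples, assuming the column names used in the cell above. The `indicator=True` argument is an addition for illustration, to show which covariate rows have no match in the master key.

```python
import pandas as pd

# Toy covariate table and master key (all values are made up)
cov = pd.DataFrame({
    "FID": [0, 0, 0, 0],
    "IID": ["S1", "S2", "S3", "S4"],
    "PHENO": [2, 2, 1, 2],
})
master = pd.DataFrame({
    "GP2ID": ["S1", "S2", "S3"],
    "age_of_onset": [61.0, None, 55.0],
})

aao = master.rename(columns={"GP2ID": "IID"})[["IID", "age_of_onset"]]

# Left merge keeps every covariate row; _merge flags rows absent from the master key
merged = cov.merge(aao, how="left", on="IID", indicator=True)

# Keep cases (PHENO == 2) that have a recorded age at onset
cases = merged[(merged["PHENO"] == 2) & merged["age_of_onset"].notna()]
print(cases[["FID", "IID", "age_of_onset"]])
```

Here S2 is dropped for a missing age at onset, S3 for being a control, and S4 for having no entry in the master key.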


# Age of onset

## Linear model

### No adjustment

In [7]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink2 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --pheno ${label}/${label}_covar_AAO.txt \
    --pheno-name age_of_onset \
    --linear allow-no-covars cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
    --ci 0.95 \
    --out ${label}/${label}_AAO_unadjusted
    
done

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_unadjusted.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --ci 0.95
  --glm allow-no-covars cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out CAH/CAH_AAO_unadjusted
  --pheno CAH/CAH_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Wed Apr 16 09:15:31 2025
52216 MiB RAM detected, ~50501 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
644 samples (275 females, 369 males; 644 founders) loaded from
CAH/CAH_release9_extracted_merged.fam.
4 variants loaded from CAH/CAH_release9_extracted_merged.bim.
1 quantitative phenotype loaded (408 values).
Calculating allele frequencies... done.
--glm linear regression on phenotype 'age_of_onset': done.
Results written to CAH/CAH_AAO_unadjusted.age_of_onset.glm.linear .
End time: Wed

In [8]:
labels = ['CAH', 'CAS', 'EAS', 'EUR']
df = pd.DataFrame()

for label in labels:

    lm          = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_unadjusted.age_of_onset.glm.linear", sep = r"\s+")
    lm_a419     = lm[lm["ID"] == "chr12_40252984_G_A"]
    # Take an explicit copy so the column assignments below do not raise SettingWithCopyWarning
    lm_a419_red = lm_a419[["ID", "REF", "ALT", "A1_CT", "ALLELE_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]].copy()
    
    lm_a419_red["label"]  = label
    lm_a419_red["REF_CT"] = lm_a419_red["ALLELE_CT"] - lm_a419_red["A1_CT"]
    lm_a419_red           = lm_a419_red[["label", "ID", "REF", "ALT", "ALLELE_CT", "A1_CT", "REF_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]]
    
    df = pd.concat([df, lm_a419_red])

df

| | label | ID | REF | ALT | ALLELE_CT | A1_CT | REF_CT | A1_FREQ | TEST | OBS_CT | BETA | SE | L95 | U95 | T_STAT | P | ERRCODE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CAH | chr12_40252984_G_A | G | A | 816 | 3 | 813 | 0.003676 | ADD | 408 | -6.59412 | 8.09728 | -22.4645 | 9.27626 | -0.814362 | 0.415915 | . |
| 0 | CAS | chr12_40252984_G_A | G | A | 918 | 10 | 908 | 0.010893 | ADD | 459 | -7.9951 | 3.7582 | -15.361 | -0.629163 | -2.12737 | 0.033924 | . |
| 0 | EAS | chr12_40252984_G_A | G | A | 4730 | 74 | 4656 | 0.015645 | ADD | 2365 | -2.97755 | 1.47309 | -5.86476 | -0.090347 | -2.0213 | 0.043362 | . |
| 0 | EUR | chr12_40252984_G_A | G | A | 21436 | 14 | 21422 | 0.000653 | ADD | 10718 | -2.40034 | 3.00913 | -8.29813 | 3.49746 | -0.797684 | 0.425072 | . |
| 0 | AJ | chr12_40252984_G_A | G | A | 2678 | 1 | 2677 | 0.000373 | ADD | 1339 | -17.3693 | 11.5902 | -40.0857 | 5.34713 | -1.49862 | 0.134209 | . |


### Adjusted

#### Adjusted by sex and PCs

In [9]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink2 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --pheno ${label}/${label}_covar_AAO.txt \
    --pheno-name age_of_onset \
    --covar ${label}/${label}_covar_AAO_full.txt \
    --covar-name SEX,PC1,PC2,PC3,PC4,PC5 \
    --linear cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
    --ci 0.95 \
    --out ${label}/${label}_AAO_adjusted_sex_pc

done

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_adjusted_sex_pc.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --ci 0.95
  --covar CAH/CAH_covar_AAO_full.txt
  --covar-name SEX,PC1,PC2,PC3,PC4,PC5
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out CAH/CAH_AAO_adjusted_sex_pc
  --pheno CAH/CAH_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Wed Apr 16 09:17:07 2025
52216 MiB RAM detected, ~50499 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
644 samples (275 females, 369 males; 644 founders) loaded from
CAH/CAH_release9_extracted_merged.fam.
4 variants loaded from CAH/CAH_release9_extracted_merged.bim.
1 quantitative phenotype loaded (408 values).
6 covariates loaded from CAH/CAH_covar_AAO_full.txt.
Calculating allele frequencies... done.
--glm linear regr

In [10]:
labels = ['CAH', 'CAS', 'EAS', 'EUR']
df = pd.DataFrame()

for label in labels:

    # sep=r"\s+" replaces the deprecated delim_whitespace=True
    lm          = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_adjusted_sex_pc.age_of_onset.glm.linear", sep = r"\s+")
    lm_a419     = lm[lm["ID"] == "chr12_40252984_G_A"]
    # .copy() makes an independent frame, so the column assignments below do not raise SettingWithCopyWarning
    lm_a419_red = lm_a419[["ID", "REF", "ALT", "A1_CT", "ALLELE_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]].copy()
    
    lm_a419_red["label"]  = label
    lm_a419_red["REF_CT"] = lm_a419_red["ALLELE_CT"] - lm_a419_red["A1_CT"]
    lm_a419_red           = lm_a419_red[["label", "ID", "REF", "ALT", "ALLELE_CT", "A1_CT", "REF_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]]
    
    df = pd.concat([df, lm_a419_red])

df

Unnamed: 0,label,ID,REF,ALT,ALLELE_CT,A1_CT,REF_CT,A1_FREQ,TEST,OBS_CT,BETA,SE,L95,U95,T_STAT,P,ERRCODE
0,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,ADD,408,-10.391,8.17257,-26.4089,5.62697,-1.27144,0.204309,.
1,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,SEX,408,0.955993,1.38608,-1.76068,3.67267,0.689708,0.490778,.
2,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,PC1,408,49.6349,24.8251,0.978466,98.2913,1.99938,0.0462434,.
3,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,PC2,408,48.7434,20.9876,7.60835,89.8784,2.32248,0.0207082,.
4,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,PC3,408,-6.47066,67.9436,-139.638,126.696,-0.095236,0.924175,.
5,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,PC4,408,65.078,27.7759,10.6383,119.518,2.34297,0.0196202,.
6,CAH,chr12_40252984_G_A,G,A,816,3,813,0.003676,PC5,408,57.3986,27.4762,3.54635,111.251,2.08903,0.0373362,.
0,CAS,chr12_40252984_G_A,G,A,918,10,908,0.010893,ADD,459,-9.49786,3.76428,-16.8757,-2.12,-2.52315,0.0119735,.
1,CAS,chr12_40252984_G_A,G,A,918,10,908,0.010893,SEX,459,2.70896,1.10671,0.539843,4.87808,2.44775,0.0147552,.
2,CAS,chr12_40252984_G_A,G,A,918,10,908,0.010893,PC1,459,-0.04389,50.8591,-99.7259,99.6381,-0.000863,0.999312,.


#### Adjusted by sex, age and PCs

In [21]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink2 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --pheno ${label}/${label}_covar_AAO.txt \
    --pheno-name age_of_onset \
    --covar ${label}/${label}_covar_AAO_full_withage.txt \
    --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5 \
    --covar-variance-standardize \
    --linear cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
    --ci 0.95 \
    --out ${label}/${label}_AAO_adjusted_sex_pc_age

done

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_adjusted_sex_pc_age.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --ci 0.95
  --covar CAH/CAH_covar_AAO_full_withage.txt
  --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5
  --covar-variance-standardize
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out CAH/CAH_AAO_adjusted_sex_pc_age
  --pheno CAH/CAH_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Mon Apr 14 03:23:54 2025
3672 MiB RAM detected, ~2218 available; reserving 1836 MiB for main workspace.
Using 1 compute thread.
644 samples (275 females, 369 males; 644 founders) loaded from
CAH/CAH_release9_extracted_merged.fam.
4 variants loaded from CAH/CAH_release9_extracted_merged.bim.
1 quantitative phenotype loaded (408 values).
7 covariates loaded from CAH/CAH_covar_AAO_full_withage.txt.
--covar-

In [22]:
labels = ['CAH', 'CAS', 'EAS', 'EUR']
df = pd.DataFrame()

for label in labels:

    # sep=r"\s+" replaces the deprecated delim_whitespace=True
    lm          = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_adjusted_sex_pc_age.age_of_onset.glm.linear", sep = r"\s+")
    lm_a419     = lm[lm["ID"] == "chr12_40252984_G_A"]
    # .copy() makes an independent frame, so the column assignments below do not raise SettingWithCopyWarning
    lm_a419_red = lm_a419[["ID", "REF", "ALT", "A1_CT", "ALLELE_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]].copy()
    
    lm_a419_red["label"]  = label
    lm_a419_red["REF_CT"] = lm_a419_red["ALLELE_CT"] - lm_a419_red["A1_CT"]
    lm_a419_red           = lm_a419_red[["label", "ID", "REF", "ALT", "ALLELE_CT", "A1_CT", "REF_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]]
    
    df = pd.concat([df, lm_a419_red])

df

Unnamed: 0,label,ID,REF,ALT,ALLELE_CT,A1_CT,REF_CT,A1_FREQ,TEST,OBS_CT,BETA,SE,L95,U95,T_STAT,P,ERRCODE
0,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,ADD,398,-4.07478,4.20051,-12.3076,4.15807,-0.970068,0.332616,.
1,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,SEX,398,-0.206535,0.293132,-0.781063,0.367994,-0.704579,0.481494,.
2,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,AGE,398,12.5762,0.293935,12.0001,13.1523,42.7856,3.48915e-149,.
3,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,PC1,398,-0.050248,0.312456,-0.66265,0.562153,-0.160817,0.872321,.
4,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,PC2,398,1.12616,0.300217,0.537744,1.71457,3.75115,0.000202809,.
5,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,PC3,398,0.286796,0.349958,-0.399109,0.972702,0.819515,0.412994,.
6,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,PC4,398,0.60387,0.324105,-0.031364,1.2391,1.86319,0.0631884,.
7,CAH,chr12_40252984_G_A,G,A,796,2,794,0.002513,PC5,398,0.639872,0.329311,-0.005565,1.28531,1.94306,0.0527295,.
0,CAS,chr12_40252984_G_A,G,A,904,10,894,0.011062,ADD,452,-4.30013,1.67182,-7.57684,-1.02341,-2.57212,0.0104327,.
1,CAS,chr12_40252984_G_A,G,A,904,10,894,0.011062,SEX,452,0.247901,0.24796,-0.238091,0.733893,0.999763,0.317971,.


#### Remove R1628P, G2019S, and G2385R carriers

In [13]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR' 'AJ')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --recode A \
    --out ${label}/${label}_AAO_adjusted_sex_pc_age

done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_adjusted_sex_pc_age.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --out CAH/CAH_AAO_adjusted_sex_pc_age
  --recode A

52216 MB RAM detected; reserving 26108 MB for main workspace.
4 variants loaded from .bim file.
644 people (369 males, 275 females) loaded from .fam.
644 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 644 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.998835.
4 variants and 644 people pass filters and QC.
Among remaining phenotypes, 644 are cases and 0 are controls.

In [9]:
labels= ["EAS", "EUR", "CAS", "CAH"]

for label in labels:
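    # Additive allele counts (0/1/2) written by `plink --recode A`
    raw      = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_adjusted_sex_pc_age.raw", sep = r"\s+")
    # Keep samples with zero copies of all three variants; missing genotypes (NaN) compare unequal to 0 and are dropped too
    raw_filt = raw[(raw["chr12_40320043_G_C_C"] == 0) & (raw["chr12_40340400_G_A_A"] == 0) & (raw["chr12_40363526_G_A_A"] == 0)]
    # FID/IID list without header, as expected by `plink --keep`
    raw_filt[["FID", "IID"]].to_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_no_carrier_id.txt", sep = "\t", index = False, header = False)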
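
The carrier filter above relies on the `.raw` columns holding allele counts, with missing genotypes read by pandas as `NaN`. A minimal sketch on a hypothetical three-sample file (sample IDs and genotypes are invented, and the `PAT`, `MAT`, `SEX`, and `PHENOTYPE` columns of a real `.raw` file are omitted) shows that both carriers and samples with a missing call are excluded:

```python
import io
import pandas as pd

# Hypothetical miniature .raw content; IDs and genotypes are invented for illustration
raw_text = io.StringIO(
    "FID IID chr12_40320043_G_C_C chr12_40340400_G_A_A chr12_40363526_G_A_A\n"
    "F1 S1 0 0 0\n"
    "F2 S2 1 0 0\n"
    "F3 S3 0 NA 0\n"
)
raw = pd.read_csv(raw_text, sep=r"\s+")

carrier_cols = ["chr12_40320043_G_C_C", "chr12_40340400_G_A_A", "chr12_40363526_G_A_A"]

# True only when every one of the three counts equals 0; NaN == 0 evaluates to False
non_carriers = raw[(raw[carrier_cols] == 0).all(axis=1)]
print(non_carriers["IID"].tolist())
```

Here S2 is dropped as a carrier and S3 is dropped because one genotype is missing, which is equivalent to the chained `== 0` conditions used in the cell above.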

In [17]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --keep ${label}/${label}_AAO_no_carrier_id.txt \
    --make-bed \
    --out ${label}/${label}_release9_extracted_merged_rm_carrier 

done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_release9_extracted_merged_rm_carrier.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --keep CAH/CAH_AAO_no_carrier_id.txt
  --make-bed
  --out CAH/CAH_release9_extracted_merged_rm_carrier

52216 MB RAM detected; reserving 26108 MB for main workspace.
4 variants loaded from .bim file.
644 people (369 males, 275 females) loaded from .fam.
644 phenotype values loaded from .fam.
--keep: 626 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 626 founders and 0 nonfounders present.
Calculating allele frequencies... done.
4 variants and 626 people pass filters and QC.
A

In [18]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink2 \
    --bfile ${label}/${label}_release9_extracted_merged_rm_carrier \
    --pheno ${label}/${label}_covar_AAO.txt \
    --pheno-name age_of_onset \
    --covar ${label}/${label}_covar_AAO_full_withage.txt \
    --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5 \
    --covar-variance-standardize \
    --linear cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
    --ci 0.95 \
    --out ${label}/${label}_AAO_adjusted_sex_pc_age_rm_carrier

done

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_adjusted_sex_pc_age_rm_carrier.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged_rm_carrier
  --ci 0.95
  --covar CAH/CAH_covar_AAO_full_withage.txt
  --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5
  --covar-variance-standardize
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out CAH/CAH_AAO_adjusted_sex_pc_age_rm_carrier
  --pheno CAH/CAH_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Wed Apr 16 13:51:43 2025
52216 MiB RAM detected, ~49940 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
626 samples (270 females, 356 males; 626 founders) loaded from
CAH/CAH_release9_extracted_merged_rm_carrier.fam.
4 variants loaded from CAH/CAH_release9_extracted_merged_rm_carrier.bim.
1 quantitative phenotype loaded (398 values).
7 co

In [19]:
labels = ['CAH', 'CAS', 'EAS', 'EUR']
df = pd.DataFrame()

for label in labels:

    # sep=r"\s+" replaces the deprecated delim_whitespace=True
    lm          = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_adjusted_sex_pc_age_rm_carrier.age_of_onset.glm.linear", sep = r"\s+")
    lm_a419     = lm[lm["ID"] == "chr12_40252984_G_A"]
    # .copy() makes an independent frame, so the column assignments below do not raise SettingWithCopyWarning
    lm_a419_red = lm_a419[["ID", "REF", "ALT", "A1_CT", "ALLELE_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]].copy()
    
    lm_a419_red["label"]  = label
    lm_a419_red["REF_CT"] = lm_a419_red["ALLELE_CT"] - lm_a419_red["A1_CT"]
    lm_a419_red           = lm_a419_red[["label", "ID", "REF", "ALT", "ALLELE_CT", "A1_CT", "REF_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]]
    
    df = pd.concat([df, lm_a419_red])

df

Unnamed: 0,label,ID,REF,ALT,ALLELE_CT,A1_CT,REF_CT,A1_FREQ,TEST,OBS_CT,BETA,SE,L95,U95,T_STAT,P,ERRCODE
0,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,ADD,389,-3.94082,4.22686,-12.2253,4.34367,-0.932329,0.351758,.
1,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,SEX,389,-0.204122,0.297326,-0.78687,0.378627,-0.686525,0.492801,.
2,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,AGE,389,12.6677,0.299332,12.081,13.2543,42.3198,7.04895e-146,.
3,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,PC1,389,-0.057358,0.320641,-0.685803,0.571087,-0.178885,0.858123,.
4,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,PC2,389,1.06854,0.305115,0.470531,1.66656,3.50211,0.000516491,.
5,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,PC3,389,0.157585,0.38674,-0.600412,0.915581,0.407469,0.683893,.
6,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,PC4,389,0.688412,0.336958,0.027986,1.34884,2.04302,0.04174,.
7,CAH,chr12_40252984_G_A,G,A,778,2,776,0.002571,PC5,389,0.38751,0.354263,-0.306832,1.08185,1.09385,0.274713,.
0,CAS,chr12_40252984_G_A,G,A,886,10,876,0.011287,ADD,443,-4.26791,1.65211,-7.50598,-1.02985,-2.58332,0.0101116,.
1,CAS,chr12_40252984_G_A,G,A,886,10,876,0.011287,SEX,443,0.262727,0.247519,-0.222402,0.747856,1.06144,0.28908,.


#### Adjusted by R1628P, G2019S, and G2385R genotype

In [4]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink1.9 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --recode A \
    --out ${label}/${label}_AAO_adjusted_sex_pc_age

done

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_adjusted_sex_pc_age.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --out CAH/CAH_AAO_adjusted_sex_pc_age
  --recode A

52216 MB RAM detected; reserving 26108 MB for main workspace.
4 variants loaded from .bim file.
644 people (369 males, 275 females) loaded from .fam.
644 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 644 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.998835.
4 variants and 644 people pass filters and QC.
Among remaining phenotypes, 644 are cases and 0 are controls.

In [57]:
labels = ["CAH", "CAS", "EAS", "EUR"]

for label in labels: 
    # Covariate table (tab-separated) and additive genotype counts from `plink --recode A`
    cov = pd.read_csv(f'/home/jupyter/A419V_release9/{label}/{label}_covar_AAO_full_withage.txt', sep = "\t")
    raw = pd.read_csv(f'/home/jupyter/A419V_release9/{label}/{label}_AAO_adjusted_sex_pc_age.raw', sep = r"\s+")
    raw_red = raw[["IID", "chr12_40320043_G_C_C", "chr12_40340400_G_A_A", "chr12_40363526_G_A_A"]]
    # Attach the three genotype columns to the covariates by sample ID
    cov_raw = pd.merge(cov, raw_red, on = "IID", how = "left")
    # Keep only samples with a non-missing genotype at all three variants
    cov_raw = cov_raw[(~cov_raw["chr12_40320043_G_C_C"].isna()) & (~cov_raw["chr12_40340400_G_A_A"].isna()) & (~cov_raw["chr12_40363526_G_A_A"].isna())]
    cov_raw.to_csv(f'/home/jupyter/A419V_release9/{label}/{label}_covar_AAO_full_withage_geno.txt', sep = "\t", header = True, index = False)
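
The merge above keeps every covariate row and then drops samples without a complete set of genotypes, so the covariate file passed to PLINK can be smaller than the original one. This may partly explain why `OBS_CT` for CAH is 394 in this model versus 398 in the sex, age, and PC model. A minimal sketch with invented sample IDs and a single genotype column, using `dropna` as an equivalent of the chained `isna()` conditions:

```python
import pandas as pd

# Hypothetical covariates and genotypes; all values are invented for illustration
cov = pd.DataFrame({"IID": ["S1", "S2", "S3"], "SEX": [1, 2, 1], "AGE": [60, 72, 55]})
geno = pd.DataFrame({"IID": ["S1", "S2"], "chr12_40320043_G_C_C": [0.0, float("nan")]})

# Left merge: S3 has no genotype row, so its genotype column becomes NaN
merged = cov.merge(geno, on="IID", how="left")

# Drop samples whose genotype is missing (S2: missing call, S3: not genotyped)
complete = merged.dropna(subset=["chr12_40320043_G_C_C"])
print(complete["IID"].tolist())
```

Passing all three genotype column names in `subset` would reproduce the filter used in the cell above.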

In [58]:
%%bash
WORK_DIR='/home/jupyter/A419V_release9'
cd $WORK_DIR

labels=('CAH' 'CAS' 'EAS' 'EUR')

for label in "${labels[@]}"
do

    /home/jupyter/plink2 \
    --bfile ${label}/${label}_release9_extracted_merged \
    --pheno ${label}/${label}_covar_AAO.txt \
    --pheno-name age_of_onset \
    --covar ${label}/${label}_covar_AAO_full_withage_geno.txt \
    --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5,chr12_40320043_G_C_C,chr12_40340400_G_A_A,chr12_40363526_G_A_A \
    --covar-variance-standardize \
    --linear cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
    --ci 0.95 \
    --out ${label}/${label}_AAO_adjusted_sex_pc_age_geno

done

PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAH/CAH_AAO_adjusted_sex_pc_age_geno.log.
Options in effect:
  --bfile CAH/CAH_release9_extracted_merged
  --ci 0.95
  --covar CAH/CAH_covar_AAO_full_withage_geno.txt
  --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5,chr12_40320043_G_C_C,chr12_40340400_G_A_A,chr12_40363526_G_A_A
  --covar-variance-standardize
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out CAH/CAH_AAO_adjusted_sex_pc_age_geno
  --pheno CAH/CAH_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Thu Apr 17 15:08:06 2025
52216 MiB RAM detected, ~50557 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
644 samples (275 females, 369 males; 644 founders) loaded from
CAH/CAH_release9_extracted_merged.fam.
4 variants loaded from CAH/CAH_release9_extracted_merged.bim.
1 quantitative phenotype lo

phenotype 'age_of_onset'.


--glm linear regression on phenotype 'age_of_onset': done.
Results written to CAH/CAH_AAO_adjusted_sex_pc_age_geno.age_of_onset.glm.linear .
End time: Thu Apr 17 15:08:06 2025
PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CAS/CAS_AAO_adjusted_sex_pc_age_geno.log.
Options in effect:
  --bfile CAS/CAS_release9_extracted_merged
  --ci 0.95
  --covar CAS/CAS_covar_AAO_full_withage_geno.txt
  --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5,chr12_40320043_G_C_C,chr12_40340400_G_A_A,chr12_40363526_G_A_A
  --covar-variance-standardize
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out CAS/CAS_AAO_adjusted_sex_pc_age_geno
  --pheno CAS/CAS_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Thu Apr 17 15:08:06 2025
52216 MiB RAM detected, ~50559 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
661 samples (36

all-missing.


Calculating allele frequencies... 0%0%done.


phenotype 'age_of_onset'.


--glm linear regression on phenotype 'age_of_onset': done.
Results written to CAS/CAS_AAO_adjusted_sex_pc_age_geno.age_of_onset.glm.linear .
End time: Thu Apr 17 15:08:06 2025
PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to EAS/EAS_AAO_adjusted_sex_pc_age_geno.log.
Options in effect:
  --bfile EAS/EAS_release9_extracted_merged
  --ci 0.95
  --covar EAS/EAS_covar_AAO_full_withage_geno.txt
  --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5,chr12_40320043_G_C_C,chr12_40340400_G_A_A,chr12_40363526_G_A_A
  --covar-variance-standardize
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out EAS/EAS_AAO_adjusted_sex_pc_age_geno
  --pheno EAS/EAS_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Thu Apr 17 15:08:06 2025
52216 MiB RAM detected, ~50559 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
3192 samples (1

phenotype 'age_of_onset'.


--glm linear regression on phenotype 'age_of_onset': done.
Results written to EAS/EAS_AAO_adjusted_sex_pc_age_geno.age_of_onset.glm.linear .
End time: Thu Apr 17 15:08:06 2025
PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025)    cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to EUR/EUR_AAO_adjusted_sex_pc_age_geno.log.
Options in effect:
  --bfile EUR/EUR_release9_extracted_merged
  --ci 0.95
  --covar EUR/EUR_covar_AAO_full_withage_geno.txt
  --covar-name SEX,AGE,PC1,PC2,PC3,PC4,PC5,chr12_40320043_G_C_C,chr12_40340400_G_A_A,chr12_40363526_G_A_A
  --covar-variance-standardize
  --glm cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out EUR/EUR_AAO_adjusted_sex_pc_age_geno
  --pheno EUR/EUR_covar_AAO.txt
  --pheno-name age_of_onset

Start time: Thu Apr 17 15:08:06 2025
52216 MiB RAM detected, ~50559 available; reserving 26108 MiB for main
workspace.
Using up to 8 compute threads.
15332 samples (

In [59]:
labels = ['CAH', 'CAS', 'EAS', 'EUR']
df = pd.DataFrame()

for label in labels:

    # sep=r"\s+" replaces the deprecated delim_whitespace=True
    lm          = pd.read_csv(f"/home/jupyter/A419V_release9/{label}/{label}_AAO_adjusted_sex_pc_age_geno.age_of_onset.glm.linear", sep = r"\s+")
    lm_a419     = lm[lm["ID"] == "chr12_40252984_G_A"]
    # .copy() makes an independent frame, so the column assignments below do not raise SettingWithCopyWarning
    lm_a419_red = lm_a419[["ID", "REF", "ALT", "A1_CT", "ALLELE_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]].copy()
    
    lm_a419_red["label"]  = label
    lm_a419_red["REF_CT"] = lm_a419_red["ALLELE_CT"] - lm_a419_red["A1_CT"]
    lm_a419_red           = lm_a419_red[["label", "ID", "REF", "ALT", "ALLELE_CT", "A1_CT", "REF_CT", "A1_FREQ", "TEST", "OBS_CT", "BETA", "SE", "L95", "U95", "T_STAT", "P", "ERRCODE"]]
    
    df = pd.concat([df, lm_a419_red])

df

Unnamed: 0,label,ID,REF,ALT,ALLELE_CT,A1_CT,REF_CT,A1_FREQ,TEST,OBS_CT,BETA,SE,L95,U95,T_STAT,P,ERRCODE
0,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,ADD,394,-3.99091,4.21002,-12.2424,4.26058,-0.947954,0.34375,.
1,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,SEX,394,-0.231059,0.294786,-0.808829,0.346711,-0.783819,0.433631,.
2,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,AGE,394,12.6447,0.297864,12.0609,13.2285,42.4514,6.67716e-147,.
3,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,PC1,394,-0.042634,0.31939,-0.668627,0.583358,-0.133487,0.893878,.
4,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,PC2,394,1.07747,0.30388,0.481871,1.67306,3.54569,0.000440062,.
5,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,PC3,394,0.21374,0.35754,-0.487025,0.914505,0.597808,0.550321,.
6,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,PC4,394,0.676866,0.326892,0.036169,1.31756,2.07061,0.0390648,.
7,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,PC5,394,0.48657,0.349359,-0.198161,1.1713,1.39275,0.164503,.
8,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,chr12_40320043_G_C_C,394,0.463623,0.318744,-0.161104,1.08835,1.45453,0.146618,.
9,CAH,chr12_40252984_G_A,G,A,788,2,786,0.002538,chr12_40340400_G_A_A,394,-0.590933,0.329429,-1.2366,0.054737,-1.79381,0.0736328,.
