SiblingGWAS

Scripts for running genome-wide association studies (GWAS) in sibling data to estimate Within-Family (WF) and Between-Family (BF) effects of genetic variants on continuous traits. Families with more than two siblings can be included.

Basic Requirements

  1. Siblings. The analysis pipeline requires data on siblings. We suggest including all siblings from families with one or more pairs of genotyped dizygotic siblings. For example, in a family with a pair of monozygotic twins and an additional sibling, include both MZ twins and the sibling. The inclusion of both MZ twins should (very) modestly improve power by accounting for variation in the phenotypic outcome. If siblings have not been previously identified in the dataset, we suggest using KING (http://people.virginia.edu/~wc9c/KING/manual.html) to infer siblings.
  2. Imputed genotype data. The analysis scripts use best-guess genotype data in PLINK binary format. We provide scripts to convert other file formats (e.g. vcf, bgen) to a PLINK binary best-guess format that satisfies the pipeline's input requirements.
  3. Phenotypes. Phenotype data for siblings on outcomes of interest (e.g. height and body mass index).

For more details on the prerequisites and inputs required for the pipeline, please consult the wiki: https://github.com/LaurenceHowe/SiblingGWAS/wiki/

Downloading and running the pipeline

Navigate to the directory where you want to download the repository. The repository can then be downloaded using git:

git clone https://github.com/LaurenceHowe/SiblingGWAS/


Once the repository is downloaded, run the following command to check that files have downloaded properly:

head ./SiblingGWAS/resources/parameters


SCRIPTS:

config file

Edit this file to set the paths to the relevant input files.
Note that this is the only file that should be edited.

1.0_setup

The set-up script runs checks to ensure that the input files are in the correct format and checks the installation of R packages.

2.0_summary

This script extracts summary data on available phenotypes.

3.0_partitions

This script partitions the genetic data into smaller lists of SNPs to be run in batches.
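
As an illustration of the batching idea only, here is a minimal Python sketch that splits a list of SNP IDs into fixed-size batches. It is not the pipeline's own code: the function name `partition_snps`, the batch size, and the SNP IDs are all hypothetical, and the actual script may choose batches differently.

```python
def partition_snps(snp_ids, batch_size):
    """Split a list of SNP IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [snp_ids[i:i + batch_size] for i in range(0, len(snp_ids), batch_size)]


# Hypothetical SNP IDs, e.g. as they might be read from a PLINK .bim file.
snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]
batches = partition_snps(snps, batch_size=2)

for index, batch in enumerate(batches, start=1):
    print(f"batch {index}: {batch}")
```

Each batch can then be passed to the regression step independently, which keeps memory use bounded and lets batches run in parallel.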

4.0_unified_regression

This script runs the regressions in R. The script fits two models for each variant: a conventional regression of phenotype on genotype, and a model that also includes the family-mean genotype in order to generate Within-Family and Between-Family estimates. Standard errors are adjusted to account for family structure.
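
To make the second model concrete, here is a minimal Python sketch for a single variant. It is an illustration under stated assumptions, not the pipeline's R code: it assumes the parametrisation y = b0 + beta_wf * (g - family mean of g) + beta_bf * (family mean of g) + e, and it uses a basic family-clustered sandwich variance without a small-sample correction. The function name `wf_bf_estimates` and the toy data are hypothetical; the pipeline's exact model and standard-error adjustment may differ.

```python
from collections import defaultdict


def wf_bf_estimates(family_ids, genotypes, phenotypes):
    """Return (beta_wf, beta_bf, se_wf) for one variant under the assumed model."""
    # Family-mean genotype and each sibling's deviation from it.
    by_family = defaultdict(list)
    for fam, g in zip(family_ids, genotypes):
        by_family[fam].append(g)
    fam_mean = {fam: sum(gs) / len(gs) for fam, gs in by_family.items()}
    means = [fam_mean[fam] for fam in family_ids]
    devs = [g - m for g, m in zip(genotypes, means)]

    n = len(phenotypes)
    y_bar = sum(phenotypes) / n
    m_bar = sum(means) / n

    # Deviations sum to zero within every family, so they are orthogonal to
    # both the intercept and the family mean. The least-squares coefficients
    # therefore have simple closed forms.
    s_dd = sum(d * d for d in devs)
    s_mm = sum((m - m_bar) ** 2 for m in means)
    if s_dd == 0 or s_mm == 0:
        raise ValueError("no within-family or between-family genotype variation")
    beta_wf = sum(d * y for d, y in zip(devs, phenotypes)) / s_dd
    beta_bf = sum((m - m_bar) * (y - y_bar) for m, y in zip(means, phenotypes)) / s_mm
    intercept = y_bar - beta_bf * m_bar

    # Family-clustered variance for beta_wf: sum the score contributions
    # within each family before squaring.
    score = defaultdict(float)
    for fam, d, m, y in zip(family_ids, devs, means, phenotypes):
        residual = y - intercept - beta_wf * d - beta_bf * m
        score[fam] += d * residual
    se_wf = sum(s * s for s in score.values()) ** 0.5 / s_dd
    return beta_wf, beta_bf, se_wf


# Toy data: three families, one of them with three siblings.
families = ["A", "A", "B", "B", "C", "C", "C"]
genos = [0, 1, 1, 2, 0, 1, 2]
toy_means = {"A": 0.5, "B": 1.5, "C": 1.0}
# Noise-free phenotypes built with a WF effect of 0.5 and a BF effect of 2.0.
phenos = [1.0 + 0.5 * (g - toy_means[f]) + 2.0 * toy_means[f] for f, g in zip(families, genos)]

beta_wf, beta_bf, se_wf = wf_bf_estimates(families, genos, phenos)
print(round(beta_wf, 6), round(beta_bf, 6))
```

Because the toy phenotypes are generated without noise, the sketch recovers the two simulated effects and the clustered standard error is zero up to floating-point error. With real data the WF estimate depends only on genotype differences between siblings.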

5.0_tidy

This script compiles the output into a final summary statistics file.
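
The compile step can be pictured as concatenating the per-batch outputs under a single header. The Python sketch below illustrates that idea only. It assumes tab-separated batch outputs with identical columns; the column names, values, and the function name `combine_batches` are hypothetical and do not reflect the pipeline's actual output format.

```python
import csv
import io


def combine_batches(batch_texts):
    """Merge tab-separated batch outputs that share a header into one table."""
    header = None
    rows = []
    for text in batch_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        batch_header = next(reader)
        if header is None:
            header = batch_header
        elif batch_header != header:
            raise ValueError("batch outputs have different columns")
        # Skip blank lines, keep every data row.
        rows.extend(row for row in reader if row)
    if header is None:
        raise ValueError("no batch outputs supplied")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical column names and values, for illustration only.
batch_1 = "SNP\tBETA_WF\tSE_WF\nrs1\t0.10\t0.02\nrs2\t-0.05\t0.03\n"
batch_2 = "SNP\tBETA_WF\tSE_WF\nrs3\t0.01\t0.02\n"
combined = combine_batches([batch_1, batch_2])
print(combined)
```

Checking that every batch has the same header before merging catches batches that failed or were produced with different settings.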

Please send any queries to Laurence Howe (laurence.howe@bristol.ac.uk).

Note: these scripts were adapted from scripts by GoDMC (Gibran Hemani et al.) and the SSGAC (Sean Lee/Patrik Turley et al.). See the wiki for more information.