(Greatly) Simplify the pedigree definition file format of grups-rs
- make the template definition file of grups-rs mirror the standard of
  `.ped` and `.fam` file formats as closely as possible.
- grups-rs remains backwards-compatible with the old file format.
MaelLefeuvre committed Dec 15, 2023
1 parent 0564c45 commit 6117c86
Showing 14 changed files with 751 additions and 107 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
# 0.3.2
## Features
- Implement a simpler pedigree definition file format, based on the [`.ped`](https://csg.sph.umich.edu/abecasis/QTDT/docs/pedigree.html) and/or [`.fam`](https://www.cog-genomics.org/plink/1.9/formats#fam) file formats.
- This effectively makes GRUPS-rs *somewhat* compatible with these common file formats, as users still need to target specific pairwise comparisons within the constructed tree. This is done through the `COMPARE` keyword (see the documentation for more information on the current standard).
- Note that GRUPS-rs remains backwards-compatible with its *'legacy'* format, which mirrors the initial implementation of GRUPS. The program automatically detects and parses the appropriate format.


# 0.3.1
## Features
- Add Jemalloc memory allocator
20 changes: 10 additions & 10 deletions Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,5 +1,5 @@
[workspace.package]
-version = "0.3.1"
version = "0.3.2"
authors = ["Maël Lefeuvre <mael.lefeuvre@mnhn.fr>"]
description = "GRUPS-rs: Get Relatedness Using Pedigree Simulations"
edition = "2021"
69 changes: 65 additions & 4 deletions README.md
@@ -416,9 +416,47 @@ grups-rs fst --threads 22 --vcf-dir data/1000g-phase3/ --output-dir data/fst/EUR

## Defining custom pedigrees

-Defining pedigrees within GRUPS-rs is performed through simple definition files. See the example main template pedigree definition file [here](resources/pedigrees/example_pedigree.txt). Other examples may be found in the [resources/pedigrees](/resources/pedigrees) subdirectory of this repository.
Defining pedigrees within GRUPS-rs is performed through simple definition files. See the main example template pedigree definition file [here](resources/pedigrees/example_pedigree.ped). Other examples may be found in the [resources/pedigrees](/resources/pedigrees) subdirectory of this repository. GRUPS-rs currently supports two alternative formats, which are described below.

-In essence, a pedigree in GRUPS-rs is defined and parsed in three distinct steps, each one tied to a keyword within the definition file:
### Standard format (`GRUPS-rs`)

The standard pedigree definition format of `grups-rs` can be subdivided into two sections:

1. The first section defines the individuals found within the family tree, along with its topology. This section closely mirrors the commonly used [`.ped`](https://csg.sph.umich.edu/abecasis/Pedstats/tour/input.html) file format of the `PEDSTATS/QTDT` software, as well as PLINK's [`.fam`](https://www.cog-genomics.org/plink/1.9/formats#fam) files.
- Only three columns from these file formats are required: the `iid` column specifies the within-family id of the individual, while `fid` and `mid` specify the ids of its father and mother, respectively. Founder individuals, whose parents are unknown, use `0` in both parental columns (see below):
```python
# First section: define the pedigree's topology
iid fid mid
Ind1 0 0 # Ind1 and Ind2 are defined as founder individuals
Ind2 0 0
Ind3 Ind1 Ind2 # Ind3 is defined as the offspring of Ind1 and Ind2
```

2. The second section of the file specifies which pairwise comparisons `GRUPS-rs` should investigate within the template family tree. Every line beginning with the `COMPARE` keyword is treated by `GRUPS-rs` as a "comparison definition" entry, and is expected to adhere to the following scheme:
```
COMPARE <label> <iid-1> <iid-2>
```
Where,
- `<label>` is the user-defined name for the given comparison (e.g. 'first-degree', 'cousins', 'unrelated', etc.)
- `<iid-1>` is the individual id of the first sample being compared
- `<iid-2>` is the individual id of the second sample being compared

Example:
```
COMPARE Unrelated Ind1 Ind2
COMPARE First Ind1 Ind3
COMPARE Self Ind3 Ind3
```
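
The two sections described above can be illustrated with a minimal Python sketch of a parser. This is not the Rust implementation used by `GRUPS-rs`: `parse_standard_pedigree` is a hypothetical helper, and the sketch assumes whitespace-separated fields, `#` comments, an optional `iid fid mid` header line, and `0` as the marker of an unknown parent.

```python
# Hypothetical helpers for illustration only (not part of GRUPS-rs).

def strip_comment(line: str) -> str:
    """Drop an inline or full-line '#' comment, then surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_standard_pedigree(text: str):
    """Return ({iid: (fid, mid)}, [(label, iid_1, iid_2), ...])."""
    topology = {}
    comparisons = []
    for raw_line in text.splitlines():
        line = strip_comment(raw_line)
        if not line:
            continue  # empty lines and comments are ignored
        fields = line.split()
        if fields[0] == "COMPARE":
            # Second section: COMPARE <label> <iid-1> <iid-2>
            _, label, iid_1, iid_2 = fields
            comparisons.append((label, iid_1, iid_2))
        elif fields == ["iid", "fid", "mid"]:
            continue  # optional header line
        else:
            # First section: <iid> <fid> <mid>, where '0' marks an unknown parent
            iid, fid, mid = fields
            topology[iid] = (None if fid == "0" else fid,
                             None if mid == "0" else mid)
    # Parents may be defined after their offspring, so validate at the end.
    for iid, parents in topology.items():
        for parent in parents:
            if parent is not None and parent not in topology:
                raise ValueError(f"{iid}: undefined parent {parent!r}")
    for label, iid_1, iid_2 in comparisons:
        for iid in (iid_1, iid_2):
            if iid not in topology:
                raise ValueError(f"COMPARE {label}: undefined individual {iid!r}")
    return topology, comparisons


EXAMPLE = """
# First section: define the pedigree's topology
iid fid mid
Ind1 0 0          # founder
Ind2 0 0
Ind3 Ind1 Ind2    # offspring of Ind1 and Ind2

COMPARE Unrelated Ind1 Ind2
COMPARE First Ind1 Ind3
COMPARE Self Ind3 Ind3
"""

topology, comparisons = parse_standard_pedigree(EXAMPLE)
print(topology["Ind3"])  # ('Ind1', 'Ind2')
print(comparisons[0])    # ('Unrelated', 'Ind1', 'Ind2')
```

Validating parent ids only after the whole file has been read allows an individual to be listed before its parents, as `husband` is in some of the example files.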

Note that, although only these three columns are required, `GRUPS-rs` can directly parse the aforementioned `.fam` and `.ped` formats, provided that users manually append the required second section at the bottom of these files. Hence, a quick and intuitive way to design custom pedigree files for GRUPS-rs is to:
1. *Visually* generate the first section of the file, using the interactive [`QuickPed`](https://magnusdv.github.io/pedsuite/articles/web_only/quickped.html) online software [(Vigeland M.D. 2022)](https://doi.org/10.1186/s12859-022-04759-y).
2. Export and save the output of QuickPed as a `.ped` file
3. Manually append the desired 'COMPARE' entries at the bottom of this file.
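
Since `GRUPS-rs` only needs the `iid`, `fid` and `mid` columns of a `.ped` or `.fam` file, the column selection could be sketched as follows. The mapping of 4- and 6-field rows below (`famid iid fid mid [sex aff]`) is an assumption based on the PLINK `.fam` column order and on the comments of the example `.ped` file; it is not taken from the `GRUPS-rs` source, and `extract_trio` is a hypothetical helper. When the column layout is non-standard, a header line naming the columns removes the ambiguity.

```python
# Assumed column layouts, keyed by the number of whitespace-separated fields.
# Hypothetical illustration only (not part of GRUPS-rs).
DEFAULT_LAYOUTS = {
    3: ("iid", "fid", "mid"),
    4: ("famid", "iid", "fid", "mid"),
    6: ("famid", "iid", "fid", "mid", "sex", "aff"),  # PLINK .fam order
}


def extract_trio(fields, header=None):
    """Return (iid, fid, mid) from one pedigree row; other columns are ignored.

    `header`, if given, is the sequence of column names read from a header
    line and takes precedence over the default layouts.
    """
    columns = tuple(header) if header is not None else DEFAULT_LAYOUTS.get(len(fields))
    if columns is None or len(columns) != len(fields):
        raise ValueError(f"cannot map {len(fields)} fields onto known columns: {fields}")
    row = dict(zip(columns, fields))
    return row["iid"], row["fid"], row["mid"]


# A 6-field PLINK-style row: famid, sex and phenotype are simply dropped.
print(extract_trio("FAM1 Ind3 Ind1 Ind2 1 -9".split()))  # ('Ind3', 'Ind1', 'Ind2')

# A non-standard column order, resolved through a header line.
header = "iid sex fid mid".split()
print(extract_trio("Ind3 2 Ind1 Ind2".split(), header=header))  # ('Ind3', 'Ind1', 'Ind2')
```
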
### Legacy format (`GRUPS`)

On top of the current standard format, `GRUPS-rs` remains fully backwards-compatible with the previous file format of `GRUPS`.
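
The CHANGELOG states that the program automatically detects which of the two formats it is given. One plausible heuristic is sketched below in Python, assuming that legacy files contain the `INDIVIDUALS` and/or `COMPARISONS` keyword lines while standard files contain `COMPARE` entries. `detect_pedigree_format` is hypothetical and does not reproduce the actual detection logic of `GRUPS-rs`.

```python
# Keyword lines assumed to appear only in legacy GRUPS files.
# Hypothetical illustration only (not part of GRUPS-rs).
LEGACY_KEYWORDS = {"INDIVIDUALS", "COMPARISONS"}


def detect_pedigree_format(text: str) -> str:
    """Return 'legacy' or 'standard'; raise ValueError if neither matches."""
    saw_compare_entry = False
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        if line in LEGACY_KEYWORDS:
            return "legacy"
        if line.split()[0] == "COMPARE":
            saw_compare_entry = True
    if saw_compare_entry:
        return "standard"
    raise ValueError("unrecognized pedigree definition format")


legacy = "INDIVIDUALS\nInd1\nInd2\nCOMPARISONS\nunrelated=compare(Ind1,Ind2)\n"
standard = "Ind1 0 0\nInd2 0 0\nCOMPARE Unrelated Ind1 Ind2\n"
print(detect_pedigree_format(legacy))    # legacy
print(detect_pedigree_format(standard))  # standard
```
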

In essence, the legacy pedigree definition file of GRUPS is defined and parsed in three distinct steps, each one tied to a keyword within the definition file:

1. `INDIVIDUALS`: Define the individuals within the pedigree.
- Individuals are then defined by a unique, line-separated id or name.
@@ -483,11 +521,34 @@ Thus, an appropriate family tree topology could be as follows:

</p>

-Were founder and simulated individuals are colored in teal and lavander, respectively. Green arrows represents the comparisons that GRUPS-rs is requested to perform.
Where founder and simulated individuals are colored in teal and lavender, respectively. Green arrows represent the comparisons that GRUPS-rs is requested to perform.

Here, a template family tree such as this one can be defined as follows:

**Standard format**
```python
# standard format
Ind1 0 0
Ind2 0 0
Ind3 Ind1 Ind2 # Ind3 and Ind4 are defined as the children of Ind1 and Ind2
Ind4 Ind1 Ind2
Ind5 Ind3 Ind4 # Ind5 is defined as an inbred individual, since Ind3 and Ind4 are siblings.
Ind6 0 0
Ind7 Ind4 Ind6

COMPARE inbred-self Ind5 Ind5
COMPARE self Ind3 Ind3
COMPARE first Ind2 Ind4
COMPARE inbred-second Ind5 Ind7
COMPARE second Ind2 Ind7
COMPARE Unrelated Ind1 Ind2
```
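
The comment on `Ind5` relies on a simple rule: an individual is inbred when its two parents share at least one ancestor (or one parent is an ancestor of the other). A minimal Python sketch of that check is given below; the topology above is transcribed by hand into a dictionary, and `PEDIGREE`, `ancestors` and `is_inbred` are illustrative names, not part of `GRUPS-rs`.

```python
# Hand-transcribed topology: iid -> (father, mother); None marks an unknown parent.
# Hypothetical illustration only (not part of GRUPS-rs).
PEDIGREE = {
    "Ind1": (None, None),
    "Ind2": (None, None),
    "Ind3": ("Ind1", "Ind2"),
    "Ind4": ("Ind1", "Ind2"),
    "Ind5": ("Ind3", "Ind4"),
    "Ind6": (None, None),
    "Ind7": ("Ind4", "Ind6"),
}


def ancestors(iid, pedigree):
    """Return the set of all ancestors of `iid` (excluding `iid` itself)."""
    result = set()
    stack = [p for p in pedigree[iid] if p is not None]
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(p for p in pedigree[parent] if p is not None)
    return result


def is_inbred(iid, pedigree):
    """True if the parents of `iid` are related, i.e. share any ancestor."""
    father, mother = pedigree[iid]
    if father is None or mother is None:
        return False  # an unknown parent is treated as unrelated
    paternal_line = ancestors(father, pedigree) | {father}
    maternal_line = ancestors(mother, pedigree) | {mother}
    return bool(paternal_line & maternal_line)


print([iid for iid in PEDIGREE if is_inbred(iid, PEDIGREE)])  # ['Ind5']
```
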

**Legacy format**

Alternatively, one could also define this family tree using the legacy format of `GRUPS`, as follows:
```python
# legacy format
INDIVIDUALS
Ind1
Ind2
@@ -507,8 +568,8 @@ COMPARISONS
inbred-self=compare(Ind5,Ind5) # Compare inbred individual Ind5 to itself. Label this relationship as 'inbred-self'
self=compare(Ind3,Ind3) # Compare outbred individual Ind3 to itself. Label this relationship as 'self'
first=compare(Ind2,Ind4) # Compare Ind2 and Ind4. Label this relationship as 'first'
-second=compare(Ind2,Ind7) # Compare Ind2 and Ind7. label this relationship as 'second'
inbred-second=compare(Ind5,Ind7) # Compare Ind5 and Ind7. Label this relationship as 'inbred-second'
second=compare(Ind2,Ind7) # Compare Ind2 and Ind7. Label this relationship as 'second'
unrelated=compare(Ind1,Ind2) # Compare Ind1 and Ind2. Label this relationship as 'unrelated'
```
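
Each legacy `label=compare(iid-1,iid-2)` entry maps one-to-one onto a standard `COMPARE <label> <iid-1> <iid-2>` entry. A small, hypothetical Python helper (`legacy_to_standard`, not part of `GRUPS-rs`) for migrating the comparison entries of a legacy file might look like this; it only handles comparison lines, and translating the rest of a legacy file is out of its scope.

```python
import re

# Matches e.g. "first=compare(Ind2,Ind4)   # trailing comment".
# Hypothetical illustration only (not part of GRUPS-rs).
LEGACY_COMPARE = re.compile(
    r"^\s*([^=\s]+)=compare\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)"
)


def legacy_to_standard(line: str) -> str:
    """Translate one legacy comparison entry into a standard COMPARE entry."""
    match = LEGACY_COMPARE.match(line)
    if match is None:
        raise ValueError(f"not a legacy comparison entry: {line!r}")
    label, iid_1, iid_2 = match.groups()
    return f"COMPARE {label} {iid_1} {iid_2}"


legacy_lines = [
    "inbred-self=compare(Ind5,Ind5) # Compare inbred individual Ind5 to itself.",
    "first=compare(Ind2,Ind4)",
    "unrelated=compare(Ind1,Ind2)",
]
for line in legacy_lines:
    print(legacy_to_standard(line))
```
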

48 changes: 48 additions & 0 deletions resources/pedigrees/example_pedigree.ped
@@ -0,0 +1,48 @@
# example pedigree definition input file
# The first section defines the individuals in the pedigree, and mirrors that
# of a standard '.ped' file or that of a PLINK '.fam' file.
# Note that:
# - only three columns are required by grups-rs, but the format is also
# compatible with the standard 4- and 6-field versions of these files (in that
# case, the 'famid', 'sex', and 'aff' columns are ignored).
# - The header line is optional, but can help GRUPS-rs find the
# appropriate column indices, should their location be non-standard.
# - Empty lines and comments (inline or full-line) are ignored by the program.
#
# The second section defines which pedigree comparisons GRUPS-rs should target.
# Note that:
# - any line starting with the 'COMPARE' keyword is considered a comparison
# definition line.
# - A comparison definition line is expected to adhere to the following scheme:
#
# COMPARE <label> <iid-1> <iid-2>
#
# Where,
# - <label> is the user-defined name for that kinship tie
# - <iid-1> is the individual id of the first sample being compared
# - <iid-2> is the individual id of the second sample involved in the comparison

# First section: define the pedigree's topology
iid fid mid
father 0 0
mother 0 0
child1 father mother
child2 father mother
wife 0 0
gchild child1 wife
cousin child2 husband
husband 0 0
inbred child1 child2
stepmom 0 0
halfsib child1 stepmom

# Second section: target specific comparisons within the pedigree
COMPARE Inbred_self inbred inbred
COMPARE Twins_or_self father father
COMPARE Parent_child father child1
COMPARE Siblings child1 child2
COMPARE GPGC mother cousin
COMPARE Avuncular child2 gchild
COMPARE Half-siblings gchild halfsib
COMPARE Cousins cousin gchild
COMPARE Unrelated father mother
46 changes: 46 additions & 0 deletions resources/pedigrees/extended_pedigree.ped
@@ -0,0 +1,46 @@
# example pedigree definition input file
# The first section defines the individuals in the pedigree, and mirrors that
# of a standard '.ped' file or that of a PLINK '.fam' file.
# Note that:
# - only three columns are required by grups-rs, but the format is also
# compatible with the standard 4- and 6-field versions of these files (in that
# case, the 'famid', 'sex', and 'aff' columns are ignored).
# - The header line is optional, but can help GRUPS-rs find the
# appropriate column indices, should their location be non-standard.
# - Empty lines and comments (inline or full-line) are ignored by the program.
#
# The second section defines which pedigree comparisons GRUPS-rs should target.
# Note that:
# - any line starting with the 'COMPARE' keyword is considered a comparison
# definition line.
# - A comparison definition line is expected to adhere to the following scheme:
#
# COMPARE <label> <iid-1> <iid-2>
#
# Where,
# - <label> is the user-defined name for that kinship tie
# - <iid-1> is the individual id of the first sample being compared
# - <iid-2> is the individual id of the second sample involved in the comparison

# First section: define the pedigree's topology
iid fid mid
father 0 0
mother 0 0
son father mother
son_mate 0 0
gson son son_mate
gson_mate 0 0
ggson gson gson_mate
ggson_mate 0 0
gggson ggson ggson_mate
gggson_mate 0 0
ggggson gggson gggson_mate

# Second section: target specific comparisons within the pedigree
COMPARE Self father father # Identical Twins or Self comparison
COMPARE First father son # First degree
COMPARE Second father gson # Second degree
COMPARE Third father ggson # Third degree
COMPARE Fourth father gggson # Fourth Degree
COMPARE Fifth father ggggson # Fifth Degree
COMPARE Unrelated father mother # Unrelated
33 changes: 33 additions & 0 deletions resources/pedigrees/koszyce-pedigree.ped
@@ -0,0 +1,33 @@
iid fid mid
0.0F 0 0
0.1M 0 0
1.0F 0.0F 0.1M
1.1M 0 0
1.2F 0.0F 0.1M
2.0M 1.0F 1.1M
2.1M 1.1M 1.2F
2.2M 1.1M 1.2F
2.3F 0 0
3.0M 2.2M 2.3F

# Expected pedigree:
# 0.0F -------+------- 0.1M
# |
# +----------+----------+
# | |
# 1.0F --+-- 1.1M --+-- 1.2F
# | |
# | +---+---+
# | | |
# 2.0M 2.1M 2.2M --+-- 2.3F
# |
# 3.0M
#

COMPARE Self 3.0M 3.0M # E(r)=1.0
COMPARE 1st-degree 2.1M 2.2M # E(r)=0.5
COMPARE 2nd+3rd-degree 2.0M 2.1M # E(r)=0.375 (0.25 + 0.125)
COMPARE 2nd-degree 2.1M 3.0M # E(r)=0.25
COMPARE 3rd+4th-degree 2.0M 3.0M # E(r)=0.1875 (0.125 + 0.0625)
COMPARE 3rd-degree 1.0F 3.0M # E(r)=0.125
COMPARE Unrelated 2.0M 2.3F # E(r)=0.0
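
The `E(r)` values annotated above can be cross-checked by computing twice the kinship coefficient of each pair, using the classical recursive definition: the kinship of an individual with itself is (1 + F)/2, where F is the kinship of its two parents; otherwise, one averages the kinship of the first individual with each parent of whichever individual cannot be the other's ancestor. The Python sketch below hand-transcribes this file's topology and assumes that unknown parents are unrelated, non-inbred founders. `expected_relatedness` is a hypothetical helper, not a function of `GRUPS-rs`, and it does not describe how `GRUPS-rs` simulates these pedigrees.

```python
from functools import lru_cache

# Hand-transcribed topology of koszyce-pedigree.ped: iid -> (fid, mid).
# None marks an unknown parent. Hypothetical illustration (not part of GRUPS-rs).
KOSZYCE = {
    "0.0F": (None, None),
    "0.1M": (None, None),
    "1.0F": ("0.0F", "0.1M"),
    "1.1M": (None, None),
    "1.2F": ("0.0F", "0.1M"),
    "2.0M": ("1.0F", "1.1M"),
    "2.1M": ("1.1M", "1.2F"),
    "2.2M": ("1.1M", "1.2F"),
    "2.3F": (None, None),
    "3.0M": ("2.2M", "2.3F"),
}


def expected_relatedness(pedigree):
    """Return a function r(a, b) = 2 * kinship(a, b) over `pedigree`."""

    @lru_cache(maxsize=None)
    def depth(iid):
        # Founders have depth 0; anyone else sits one generation below
        # their deepest parent, so an ancestor is always shallower.
        parents = [p for p in pedigree[iid] if p is not None]
        return 1 + max(map(depth, parents)) if parents else 0

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are treated as unrelated founders
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        if depth(a) > depth(b):
            a, b = b, a  # ensure b cannot be an ancestor of a
        father, mother = pedigree[b]
        return 0.5 * (kinship(a, father) + kinship(a, mother))

    return lambda a, b: 2.0 * kinship(a, b)


r = expected_relatedness(KOSZYCE)
for label, iid_1, iid_2 in [
    ("Self", "3.0M", "3.0M"),
    ("1st-degree", "2.1M", "2.2M"),
    ("2nd+3rd-degree", "2.0M", "2.1M"),
    ("2nd-degree", "2.1M", "3.0M"),
    ("3rd+4th-degree", "2.0M", "3.0M"),
    ("3rd-degree", "1.0F", "3.0M"),
    ("Unrelated", "2.0M", "2.3F"),
]:
    print(f"{label}: {r(iid_1, iid_2)}")
```

For an inbred individual, the same self-comparison yields a value above 1, namely 1 + F.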
44 changes: 44 additions & 0 deletions resources/pedigrees/siblings-pedigree.ped
@@ -0,0 +1,44 @@
# example pedigree definition input file
# The first section defines the individuals in the pedigree, and mirrors that
# of a standard '.ped' file or that of a PLINK '.fam' file.
# Note that:
# - only three columns are required by grups-rs, but the format is also
# compatible with the standard 4- and 6-field versions of these files (in that
# case, the 'famid', 'sex', and 'aff' columns are ignored).
# - The header line is optional, but can help GRUPS-rs find the
# appropriate column indices, should their location be non-standard.
# - Empty lines and comments (inline or full-line) are ignored by the program.
#
# The second section defines which pedigree comparisons GRUPS-rs should target.
# Note that:
# - any line starting with the 'COMPARE' keyword is considered a comparison
# definition line.
# - A comparison definition line is expected to adhere to the following scheme:
#
# COMPARE <label> <iid-1> <iid-2>
#
# Where,
# - <label> is the user-defined name for that kinship tie
# - <iid-1> is the individual id of the first sample being compared
# - <iid-2> is the individual id of the second sample involved in the comparison

# First section: define the pedigree's topology
iid fid mid
father 0 0
mother 0 0
stepfather 0 0
stepson mother stepfather
child1 father mother
child2 father mother
wife 0 0
gchild child1 wife
cousin child2 husband
husband 0 0

# Second section: target specific comparisons within the pedigree
COMPARE Self father father
COMPARE Siblings child1 child2
COMPARE Half-siblings child1 stepson
COMPARE Cousins cousin gchild
COMPARE Unrelated father mother
