Background/Aims
The aim of this study was to identify the profile of rare variants associated with
Crohn’s disease (CD) using whole exome sequencing (WES) analysis of Korean children
with CD and to evaluate whether genetic profiles could inform medical decision
making.
Methods
DNA samples from 18 control individuals and 22 patients with infantile, very-early-onset,
and early-onset CD with a severe phenotype were used for WES. Variants were filtered using
panels of inflammatory bowel disease (IBD)-associated genes and genes of primary immunodeficiency
(PID) and monogenic IBD.
Results
Eighty-one IBD-associated variants and 35 variants in PID genes were revealed by WES.
The most frequently occurring variants were
ATG16L2 (rs11235604) and
IL17REL (rs142430606), carried by nine (41%) and four (18.2%) CD probands, respectively.
Twenty-four IBD-associated variants and 10 PID variants
were predicted to be deleterious and were identified in the heterozygous state. However,
their functions were unknown with the exception of a novel p.Q111X variant in XIAP
(X chromosome) of a male proband.
Conclusions
The presence of many rare variants of unknown significance limits the clinical applicability
of WES for individual CD patients. However, WES in children may be beneficial for
distinguishing CD secondary to PID.
INTRODUCTION
Recent advances in genome-wide association studies (GWAS) and meta-analyses have identified
140 susceptibility loci for Crohn’s disease (CD), an intestinal chronic inflammatory
disease, in Caucasians;
1–
4 however, the currently identified loci explain less than 30% of the heritable risk
and account for relatively small increments in the risk of inflammatory bowel disease
(IBD). Existing GWAS have focused on common variants (minor allele frequency [MAF]
>0.05), so strategies to enhance the identification of rare (MAF <0.01) and low-frequency
(MAF, 0.01 to 0.05) variants with increasing effect sizes are critical for the discovery
of the remaining inherited factors.
5 Direct genotyping by targeted array, metabochip, immunochip using low-frequency variants,
and genome sequencing are the methods currently available to investigate disease-causing
rare variants linked to complex traits.
6 Genome sequencing technologies have developed rapidly in recent years and this strategy
can be used for a wide range of investigations, from monogenic Mendelian disorders
to diseases with high degrees of genetic heterogeneity.
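The frequency classes quoted above can be written as a small helper. This is a minimal sketch; the function name `classify_maf` is hypothetical, and the handling of the exact boundary values (0.01 and 0.05) is an assumption, since the text only gives the ranges.

```python
def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency (MAF) using the thresholds quoted in the text.

    rare:          MAF < 0.01
    low-frequency: 0.01 <= MAF <= 0.05
    common:        MAF > 0.05
    """
    if not 0.0 <= maf <= 0.5:
        # By definition the minor allele cannot exceed a frequency of 0.5.
        raise ValueError("MAF must lie in the interval [0, 0.5]")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    for value in (0.002, 0.03, 0.2):
        print(value, classify_maf(value))
```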
The human exome constitutes less than 5% of the genome, and whole exome sequencing
(WES) studies can therefore be more cost effective than whole genome sequencing for
focused research. In addition, protein-coding regions are more evolutionarily conserved
and are more sensitive to genetic changes
7,
8 than noncoding regions, making WES potentially more valuable for uncovering deleterious
mutations. WES has been recently employed to circumvent the “diagnostic odyssey” by
providing genetic diagnoses for hearing loss, muscular dystrophy, neuromuscular disease,
retinitis pigmentosa, and mitochondrial disease. Mitochondrial disease was particularly
notable because it was associated not only with mitochondrial genes, but also with
hundreds of nuclear DNA genes.
9 Recently, a variety of primary immunodeficiencies (PIDs) and monogenic diseases were
revealed to cause refractory infantile colitis.
10,
11 Therefore, WES is rapidly becoming a common clinical test for individuals with rare
genetic disorders.
12,
13
Despite these advances, the ability of WES models to uncover disease-causing variants
associated with complex conditions, such as CD and type 2 diabetes, has not been established
for all populations.
14,
15 Methods such as GWAS have been used to validate whether identified high-effect variants
are common enough to be carried by large populations with CD. Rare and low-frequency
variants may occur too infrequently to be identified as contributory for complex traits.
In addition, genotypic and phenotypic differences exist between Caucasian and
Asian populations with CD. For example, mutations within the nucleotide-binding oligomerization domain 2
(
NOD2/CARD15) and autophagy-related 16-like 1 (
ATG16L1) genes were not associated with CD in Asian populations.
16–
18 In addition, the prevalence of small bowel involvement and perianal fistula was higher
in Asian patients than in Caucasian patients.
19,
20
Herein, we used WES analysis of Korean children with CD with the aim of identifying
rare variants associated with CD. Genetic susceptibility plays a more important role
in the etiology of pediatric CD than adult CD, probably as a consequence of a higher
burden of disease-causing mutations in affected children.
21 We therefore focused on patients with early-onset CD and severe symptoms such as
more extensive disease at onset and rapid progression. We also asked
whether genetic profiling of variants could assist in the medical decision-making
process to determine optimal treatment of pediatric CD.
RESULTS
1. Exome sequencing
Exome data were analyzed from 22 pediatric patients with CD and 18 reference individuals.
Total read average was 78,473,095 bp. Seventy-eight percent of mappable reads were
on-target reads and 86% of targeted bases were covered at 10× read depth. Each exome
had, on average, 66,289 SNPs, with 20,196 found in exonic regions. Following a series
of quality-control steps (SNP quality >50, total read depth >10, alternative read
depth >3), 171,898 variants were identified across the 40 exomes. Of those, we focused
on 32,794 missense/nonsense/indel variants within exons. After 24,317 of these variants
were removed due to their presence in the 18 control exomes, 8,477 unique variants
from 5,625 genes were identified across 22 CD exomes.
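The filtering steps described above can be sketched as follows. This is an illustrative outline only, not the pipeline actually used in the study: the `Variant` record, its field names, and the functions `passes_qc` and `case_only` are hypothetical, and applying the quality-control thresholds to the control exomes before subtraction is an assumption. The thresholds and the variant counts are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (field names are illustrative)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float      # SNP quality score
    depth: int       # total read depth at the site
    alt_depth: int   # reads supporting the alternative allele

    @property
    def key(self):
        # Identity of a variant, independent of per-sample quality metrics.
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_qc(v: Variant) -> bool:
    """Thresholds quoted in the text: quality >50, depth >10, alt depth >3."""
    return v.qual > 50 and v.depth > 10 and v.alt_depth > 3


def case_only(case_variants, control_variants):
    """Keep QC-passing case variants that are absent from the control exomes."""
    control_keys = {v.key for v in control_variants if passes_qc(v)}
    return [v for v in case_variants
            if passes_qc(v) and v.key not in control_keys]


# Arithmetic check on the counts reported above:
# 32,794 exonic missense/nonsense/indel variants minus 24,317 seen in controls.
assert 32_794 - 24_317 == 8_477
```

Removing every variant observed in any control is a strict filter; with only 18 controls it may discard true risk alleles that are also present in unaffected carriers.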
2. Characteristics of coding variants in IBD-associated genes
Of the 8,477 unique variants from 5,625 genes, the 22 probands carried 81 rare and
low-frequency variants (MAF <0.05) in 56 IBD-associated genes
(
Supplementary Table 3). Two probands each carried nonsense mutations in
ATG16L1 and
NOD2; however, these were not deleterious and were not highly conserved according to
in silico prediction algorithms. With the exception of
ATG16L2 (rs11235604) and
TBC1D1 (rs117452860),
24 the remaining variants were of unknown significance (VUS), and their functional roles
in mucosal immunity remain to be elucidated.
Among the 81 variants of the 56 IBD-associated genes, the most frequently occurring
variants were carried by nine and four CD probands, and were found in
ATG16L2 (rs11235604) and
IL17REL (rs142430606), respectively (
Table 2).
ATG16L2, a homolog of
ATG16L1, was identified as a novel candidate gene for CD in a recent Korean GWAS.
24
ATG16L1 functions in autophagy alongside
ATG5.
31 In addition,
ATG16L1 is closely related to
NOD2, which functions in an autophagy-mediated antibacterial pathway in CD.
32 However, little is known regarding the function of mutated
ATG16L2. An additional SNP in
IL17REL, rs142430606 (c.C785T; p.P262L), has not previously been associated with IBD and
was not predicted to be deleterious by
in silico analysis. A different variant, rs5771069, was previously associated with ulcerative colitis.
33 However, there is no linkage disequilibrium between rs5771069 and rs142430606. The
association of
ATG16L2 and
IL17REL with CD was confirmed using an internal CD GWAS database (n=533). The rs11235604
variant of
ATG16L2 was strongly associated with CD in a previous Korean GWAS (odds ratio [OR], 1.63;
95% confidence interval [CI], 1.27 to 2.10; imputed p-value=1.17×10⁻⁴) (
Supplementary Table 4).
24 The newly identified variant (rs142430606 in
IL17REL) showed a marginal association with CD (OR, 2.04; 95% CI, 1.001 to 4.14; imputed
p-value=4.53×10⁻²).
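As a rough consistency check on the reported odds ratios, the standard error of the log odds ratio can be back-calculated from the confidence interval and converted to a two-sided p-value. The sketch below assumes Wald-type 95% intervals that are symmetric on the log scale, which the text does not state; the function name `wald_check` is hypothetical. Because the reported p-values come from imputed genotype data, only approximate agreement should be expected.

```python
import math


def wald_check(odds_ratio: float, ci_low: float, ci_high: float,
               z_crit: float = 1.96):
    """Back-calculate SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Values as reported in the text for the two variants.
    for label, args in (("ATG16L2 rs11235604", (1.63, 1.27, 2.10)),
                        ("IL17REL rs142430606", (2.04, 1.001, 4.14))):
        se, z, p = wald_check(*args)
        print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.2e}")
```

Under these assumptions both back-calculated p-values fall in the same order of magnitude as the reported ones, and the lower confidence bound of 1.001 for IL17REL corresponds to a p-value just under 0.05, in line with the description of the association as marginal.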
Twenty-four unique deleterious variants (10 low-frequency SNPs and 14 novel variants)
in 21 genes were identified in the 22 probands (
Table 3). The 10 low-frequency SNPs have not previously been reported as associated with
IBD. All the variants were in evolutionarily conserved regions of IBD-associated genes;
however, it remains to be determined whether these variants are deleterious in the
heterozygous state. No dose effects have previously been reported for these genes
with respect to CD phenotypes.
3. Characteristics of coding variants in PID and monogenic IBD genes
Using a PID and monogenic IBD gene panel, 35 variants in 24 PID genes were identified
in the 22 probands. Of these 35 variants, 10 variants in eight PID genes were
predicted to be deleterious (
Table 4); however, all the variants were VUS in the heterozygous state with the exception
of
XIAP (p.Q111X; XIAP deficiency), which was identified on the X chromosome of a male patient
(proband 13). The XIAP protein plays an important role in activating the nuclear factor
κB signaling pathway that leads to proinflammatory cascades.
34 The stopgain mutation (c.C331T; p.Q111X) in proband 13 was confirmed by Sanger sequencing
(
Fig. 1) and was strongly indicative of XIAP deficiency. The mutation was located upstream of
the BIR2 and BIR3 domains, which play a role in the recruitment of RIP2 and apoptosis.
35 Proband 13 was diagnosed as having severe CD with perianal fistula at the age of
10 years. He presented with reduced natural killer cell activity and recurrent episodes
of bicytopenia with bacterial infections. A deleterious p.V561M variant in
CYBB, mutations in which can cause chronic granulomatous disease, was identified in proband 7. However,
his respiratory burst tests were normal.
4. Correlation of patient profiles with deleterious variants
Perianal comorbidity in the 22 CD probands was associated with the presence of
a heterozygous variant (rs11235604) in
ATG16L2 (p<0.002, chi-square test); however, proband 16, who was homozygous for the rs11235604
variant, did not suffer perianal problems. Probands 5 and 9 died of severe infantile
IBD and perianal fistula at the age of 14 months and 8 years, respectively. These
two probands carried a heterozygous variant of
IL10RA (c.C301T; p.R101W), which was previously reported to cause refractory
infantile IBD when present in the homozygous state.
36 We therefore performed Sanger sequencing on
IL10RA in the two probands and their healthy parents; however, no additional homozygous
or compound heterozygous mutations in
IL10RA were identified. In summary, no genotype-phenotype associations were noted in the
probands with the exception of XIAP deficiency in proband 13.
DISCUSSION
In this study, we performed WES analysis on samples from 22 children with CD, and
identified 81 IBD-associated gene variants and 35 variants in PID genes. One variant, rs11235604
in
ATG16L2, was already identified as a CD susceptibility locus in our GWAS database.
24 A further variant, rs142430606 in
IL17REL, was newly identified as a probable disease-causing rare variant. The GWAS dataset
confirmed this variant to be marginally associated with CD (OR, 2.04; imputed p-value=4.53×10⁻²).
PID genes were also examined, and a novel p.Q111X variant in
XIAP was noted in a patient with CD. The majority of the rare variants, particularly 24
unique deleterious variants in conserved loci, were VUS. Further study is needed to
determine the functional effects of these mutations. The identification of numerous
VUS in the small study population suggests that WES might not yet be applicable to
clinical decision making in the treatment of pediatric CD.
One possible explanation for the difficulties in interpreting interesting rare variants
is that IBD-associated variants are too rare and genetically heterogeneous to allow
statistically significant observation in a small population. Recent GWAS successfully
found common disease-causing variants in populations with CD,
1–
4 but those common variants accounted for less than 30% of the heritability of CD.
37 The majority of polymorphisms in the human genome are rare variants, but, due to
the limited statistical power, the effects of rare variants on polygenic CD are not
clear. Lack of information regarding gene function also hampers the interpretation
of WES data. Approximately 5,000 genes are prioritized in databases such as OMIM,
and functional interpretation of VUS not listed in the databases is difficult. This
lack of functional information hampers the prioritization of candidate mutations for
further analysis. Unlike diseases exhibiting Mendelian inheritance patterns that have
clear genotype-phenotype correlations, complex diseases are affected by the regulatory
variation of non-coding regions, cumulative effects of polygenic determinants, gene-gene
interactions, gene-environment interactions, and epigenetic gene modification mechanisms,
all of which present huge challenges for the study of complex traits. Due to this
complexity, WES may not be sufficient to uncover critical determinants. For example,
recent WES for complex traits such as type 2 diabetes and idiopathic epilepsy failed
to identify any significant rare variants despite the use of large study populations.
14,
15
The scope of our study was additionally limited by the challenges presented by WES
analysis.
8,
38 First, WES involves applied computational genomics. Different sequencing methods
produce sequences of varying length and depth and the results of “loss-of-function”
predictions can vary with data formats and annotation software. In addition, detection
of short sequence indels is limited to one third of the read length in WES. The very
large amount of data required for WES analysis also poses a challenge in determining
disease-causing mutations. Capturing specific genomic regions and the exome may reduce
the complexity of the data and simplify the computational analysis. In the present
study, the analysis was simplified by prioritizing IBD-associated genes from recent
GWAS studies and genes of PID and monogenic IBD. Second,
in silico prediction models show substantial disagreements.
39 In the present study, we used three programs to assess the deleteriousness of
the identified mutations. SIFT predictions agreed with PolyPhen-2 and MutationTaster
predictions at levels of 40% to 67%. Care must be taken to avoid false hypotheses
that primarily rely on current filtering parameters and variable interpretations of
WES data.
38 Therefore, in addition to validation by Sanger sequencing, functional studies are
important for the full assessment of deleterious variants; however, it is difficult
to perform functional studies on the numerous variants presented in the current study.
The identification of numerous VUS does not alleviate the “diagnostic odyssey” faced
by some patients.
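The level of agreement between two prediction programs can be expressed as the fraction of shared variants that receive the same call. The sketch below is illustrative only: the function name `concordance`, the variant identifiers, and the calls in the example are hypothetical, not data from this study, and how the programs' native output categories were harmonized here is not specified in the text.

```python
def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of variants scored by both tools that receive the same call.

    Each argument maps a variant identifier to a harmonized label,
    for example "deleterious" or "tolerated".
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two tools share no scored variants")
    agree = sum(calls_a[v] == calls_b[v] for v in shared)
    return agree / len(shared)


if __name__ == "__main__":
    # Hypothetical calls for three made-up variant identifiers.
    tool_1 = {"var1": "deleterious", "var2": "tolerated", "var3": "deleterious"}
    tool_2 = {"var1": "deleterious", "var2": "deleterious", "var3": "deleterious"}
    print(f"agreement: {concordance(tool_1, tool_2):.0%}")
```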
Nonetheless, based on the fact that IBD-mimicking colitis is frequently observed in
immunodeficient infants, WES-based diagnosis for patients with monogenic IBD may be
clinically practical.
26 The identification of mutations in
IL10RA and
XIAP by WES highlighted the need for hematopoietic stem cell transplantation in affected
children.
40,
41 One-third of patients with chronic granulomatous disease and one-fifth of XIAP-deficient patients
develop a noninfectious chronic IBD similar to CD.
42,
43 Common variable immune deficiency, dyskeratosis congenita, immunodysregulation polyendocrinopathy
enteropathy X-linked syndrome, and Wiskott-Aldrich syndrome are also frequently accompanied
by infantile enterocolitis.
26 Using a panel of PID and monogenic IBD genes, our WES identified a novel
XIAP variant carried by proband 13. Detailed guidelines for the diagnosis of IBD using WES remain to be established.
In conclusion, although pediatric patients with severe phenotypes carried a wide spectrum
of genetic susceptibility factors for CD, the numerous heterozygous VUS in IBD-associated
genes remain to be functionally characterized. Consequently, those VUS limit the practical
clinical application of WES for CD patients and hamper any personalized application
of our findings to individual CD patients; however, using WES, a Korean-specific variant
in
ATG16L2 was found in CD patients with an early-onset, severe phenotype, and a probable candidate
variant in
IL17REL was newly identified. In addition, WES in children may be beneficial for distinguishing
CD secondary to PID, for example as a result of the loss of XIAP protein.