### Interest in genomics-based applications has grown rapidly in the biomedical, pharmaceutical, and therapeutics industries. Machine learning (ML), with its sophisticated mathematical and data analysis techniques, coupled with advances in next-generation sequencing (NGS), has played a huge role in this rise. As genomics companies and research organizations produce ever more genomic data to stay ahead of the curve, extracting novel biological insights and building predictive models from this data has proved challenging for traditional ML, because it relies on hand-crafted features for model training and prediction, as we saw in the previous two chapters. Translating this massive volume of genomic data into meaningful insights automatically requires more expressive models and algorithms.

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_001.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_002.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_003.jpg)

The activation function is a key component of a neural network because it introduces non-linearity into the architecture. It transforms the weighted sum of a node's inputs (the output of the transfer function) into an output value that is passed on to the next layer (Figure 4.3). Without activation functions, every layer would compute a linear combination of its inputs, and the whole network, however deep, would collapse into a single linear model that behaves just like linear regression. Non-linear activations are what let DNNs capture relationships that linear models cannot. An activation function also determines how strongly each node's signal is passed on: informative inputs are propagated to the next layer, while weak or irrelevant ones are suppressed.
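The claim that a network without activation functions is just a linear model can be checked with a small sketch. The weights and biases below are hypothetical values chosen only for illustration; each "layer" is a single neuron with one input.

```python
def linear(w, b, x):
    """Weighted input plus bias for a single neuron with one input."""
    return w * x + b


def relu(z):
    """ReLU activation: the maximum of 0 and the input."""
    return max(0.0, z)


# Hypothetical parameters for two stacked one-neuron layers.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5


def stacked_linear(x):
    """Two layers with no activation in between."""
    return linear(w2, b2, linear(w1, b1, x))


# The two layers above are equivalent to one linear neuron.
w_combined = w2 * w1
b_combined = w2 * b1 + b2


def collapsed(x):
    return linear(w_combined, b_combined, x)


def stacked_relu(x):
    """Same two layers, but with ReLU applied to the first layer's output."""
    return linear(w2, b2, relu(linear(w1, b1, x)))


inputs = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]
linear_matches = all(abs(stacked_linear(x) - collapsed(x)) < 1e-12 for x in inputs)
relu_outputs = [stacked_relu(x) for x in inputs]

print(linear_matches)  # the stacked linear layers equal a single linear layer
print(relu_outputs)    # flat for small inputs, then sloped: no single line fits
```

With ReLU in between, the output stays constant for small inputs and then changes slope, which no single straight line can reproduce.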

Since activation functions are very important, let’s spend some time understanding the different activation functions and where they can be applied. Several types of activation functions are available. These will be described in the following subsections.

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_004.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_005.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_006.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_007.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_008.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_009.jpg)

### Forward propagation is the process through which a neural network makes predictions: the input passes through the layers in order (input, hidden, and output), and each layer's output becomes the next layer's input. For example, in this simple network, a single pass of forward propagation looks similar to what’s shown in Figure 4.10:

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_010.jpg)
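A single forward pass can be sketched in a few lines of plain Python. The layer sizes (two inputs, two hidden neurons, one output), the parameter values, and the choice of ReLU and sigmoid activations are assumptions made for illustration; they are not taken from the figure.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the incoming weights of neuron j."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked parameters.
hidden_w = [[0.5, -1.0], [1.5, 2.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -2.0]]
output_b = [0.5]


def forward(x):
    """Input layer -> hidden layer (ReLU) -> output layer (sigmoid)."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    output = dense(hidden, output_w, output_b, sigmoid)
    return hidden, output[0]


hidden, prediction = forward([1.0, 0.5])
print(hidden, prediction)
```

Each layer only needs the previous layer's output, which is why the pass can be written as a chain of `dense` calls.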

#### Backpropagation runs in the opposite direction to forward propagation and is the most common algorithm for training neural networks. It propagates the error back through the network to update the weights at each node so that the output moves closer to the target output, thereby lowering the overall error. It works by first calculating the loss at the output layer by comparing the predictions with the observed values. The derivative of the loss with respect to each weight is then calculated using the chain rule and used to update that weight, as shown in Figure 4.11:

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_011.jpg)
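The chain-rule bookkeeping can be made concrete with a minimal sketch. It assumes a tiny network (one input, one ReLU hidden neuron, one linear output), a squared-error loss, and plain gradient descent with a hypothetical learning rate; none of these choices come from the figure.

```python
def forward(params, x):
    """Forward pass: returns the pre-activation, hidden value, and prediction."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = max(0.0, z)          # ReLU
    y_hat = w2 * h + b2
    return z, h, y_hat


def loss_fn(params, x, y):
    """Squared error between prediction and target."""
    return (forward(params, x)[2] - y) ** 2


def backward(params, x, y):
    """Gradients of the loss with respect to [w1, b1, w2, b2] via the chain rule."""
    w1, b1, w2, b2 = params
    z, h, y_hat = forward(params, x)
    d_yhat = 2.0 * (y_hat - y)               # dL/dy_hat
    d_w2 = d_yhat * h                        # dL/dw2
    d_b2 = d_yhat                            # dL/db2
    d_h = d_yhat * w2                        # error pushed back to the hidden neuron
    d_z = d_h * (1.0 if z > 0 else 0.0)      # through the ReLU
    d_w1 = d_z * x                           # dL/dw1
    d_b1 = d_z                               # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]


# Hypothetical starting parameters and a single training example.
params = [0.8, 0.1, -0.5, 0.2]
x, y = 1.5, 1.0
learning_rate = 0.05

grads = backward(params, x, y)
updated = [p - learning_rate * g for p, g in zip(params, grads)]

loss_before = loss_fn(params, x, y)
loss_after = loss_fn(updated, x, y)
print(loss_before, loss_after)  # the loss should drop after one update
```

In practice, frameworks compute these derivatives automatically; the manual version is only meant to show where each term comes from.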

### Regularization
One of the key goals in DL is to balance bias and variance in the model. High variance shows up as overfitting, which is the most common problem in practice. Regularization is a set of strategies used in DL to reduce overfitting and improve model predictions on unseen data. Many models perform well on the data they were trained on but fail on real-world data, which means they fail to generalize. Regularization strategies aim to reduce this generalization error while keeping the training error as low as possible.

Several regularization techniques are commonly used. Let’s take a look.

- Lasso
In this method (an L1 penalty on the weights), some coefficients of the network are shrunk to exactly 0, which makes it suitable for variable selection.

- Ridge
In this method (an L2 penalty on the weights), the coefficients of the network shrink toward smaller values, but not exactly to 0.

- Elastic Net
This method combines the Lasso and Ridge penalties and is a tradeoff between the two.
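The three penalties differ only in the term added to the training loss. The sketch below is one possible formulation: the `l1_ratio` mixing parameter and the `strength` factor are illustrative names, and libraries often scale these terms differently (for example, by a factor of 1/2 on the squared term).

```python
def l1_penalty(weights):
    """Lasso term: sum of absolute weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Ridge term: sum of squared weights."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, l1_ratio=0.5):
    """Elastic Net term: a weighted mix of the Lasso and Ridge terms (assumed form)."""
    return l1_ratio * l1_penalty(weights) + (1.0 - l1_ratio) * l2_penalty(weights)


def regularized_loss(data_loss, weights, strength=0.1, l1_ratio=0.5):
    """Training objective: data loss plus the scaled penalty."""
    return data_loss + strength * elastic_net_penalty(weights, l1_ratio)


weights = [0.5, -2.0, 0.0, 1.0]   # hypothetical weights
lasso_term = l1_penalty(weights)
ridge_term = l2_penalty(weights)
mixed_term = elastic_net_penalty(weights, l1_ratio=0.5)
print(lasso_term, ridge_term, mixed_term)
```

Setting `l1_ratio=1.0` recovers the Lasso term and `l1_ratio=0.0` recovers the Ridge term under this parameterization.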

Data augmentation
Data augmentation is a regularization technique that can reduce overfitting. Its goal is to generate new training examples from a given original dataset, which provides a cheaper alternative to collecting more input data. This technique is very popular in computer vision (CV) and natural language processing (NLP).
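As an illustration for sequence data, the sketch below augments a toy dataset with reverse complements. This particular augmentation is an assumption made for the example, not something prescribed above, and it is only appropriate when the label does not depend on which strand was read. The sequences and labels are made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Read the opposite strand: reverse the sequence and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def augment(dataset):
    """Add the reverse complement of each (sequence, label) pair, keeping the label."""
    augmented = list(dataset)
    for seq, label in dataset:
        rc = reverse_complement(seq)
        if rc != seq:  # skip sequences that equal their own reverse complement
            augmented.append((rc, label))
    return augmented


original = [("ATGC", 1), ("GGGA", 0), ("ACGT", 1)]   # toy data
augmented = augment(original)
print(augmented)
```

The original examples are kept unchanged, so the augmented set is a superset of the input.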

Dropout
Dropout is one of the most popular regularization methods. During model training, randomly selected neurons are temporarily dropped (their outputs are set to 0) to prevent overfitting. Because no neuron can rely on any other particular neuron being present, the network is pushed to learn redundant, more robust features. At prediction time, all neurons are kept.
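A minimal sketch of dropout on a list of activations follows. It uses the common "inverted" variant, which rescales the surviving activations during training so that their expected value is unchanged; that rescaling, the function name, and the parameter names are assumptions of this example.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Zero each activation with probability drop_prob during training.

    Survivors are divided by the keep probability (inverted dropout) so the
    expected activation stays the same. At prediction time nothing is dropped.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)              # seeded for repeatability
hidden = [1.0] * 10000              # toy activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
inference = dropout(hidden, drop_prob=0.5, training=False)
print(fraction_zeroed, mean_after)
```

Roughly half of the activations are zeroed on each training pass, and a different random subset is chosen every time the function is called.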

#### DNA methylation prediction. DNA methylation is the process of adding a methyl group to the C5 position of a cytosine in a DNA sequence, resulting in 5-methylcytosine. It is a key epigenetic mechanism involved in regulating gene expression, so obtaining an accurate prediction of DNA methylation is important in genomics.

#### For this example, we will extract 400 bp from each of multiple DNA sequences, centered at the assayed methylation site (5-methylcytosine), and calculate the GC content (the count of Gs and Cs divided by the length of the DNA sequence):

![image.png](attachment:image.png)
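The feature extraction described above can be sketched as follows. The helper names, the 0-based `site` index, and the toy sequence are assumptions made for illustration; real data would come from a reference genome and assay coordinates.

```python
def extract_window(sequence, site, width=400):
    """Return `width` bases around the 0-based index `site`.

    With an even width the window cannot be perfectly centered; here it
    covers width/2 bases before the site and width/2 from the site onward.
    """
    half = width // 2
    start, end = site - half, site + half
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the ends of the sequence")
    return sequence[start:end]


def gc_content(sequence):
    """Count of Gs and Cs divided by the sequence length."""
    if not sequence:
        raise ValueError("sequence is empty")
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequence: 300 bases of AT repeats followed by 300 bases of GC repeats.
toy = "AT" * 150 + "GC" * 150
window = extract_window(toy, site=300, width=400)
gc = gc_content(window)
print(len(window), gc)
```

Here the window spans 200 AT-repeat bases and 200 GC-repeat bases, so half of its bases are G or C.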

#### The preceding table shows the DNA sample ID and the extracted feature (GC content) for each DNA sequence. Now, we can add the output to the table: the methylation level, represented as a methylation ratio ranging from 0 to 1 (Figure 4.13). The methylation ratio is the basic measurement used to quantify methylation; it is the log ratio of intensities observed in the treated sample compared to the control samples. A value of 0 represents no methylation, while higher values represent more methylation:

![image.png](attachment:image.png)

To build a model that predicts methylation, we would normally fit a straight line through the data (GC content versus methylation value). However, a straight line eventually drops below 0 on the Y-axis, and a methylation value cannot be negative. This means we must bend the line at 0 on the Y-axis so that it never goes negative. This bend is a non-linearity, and introducing it is what the activation function in a neural network does (Figure 4.14):

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_014.jpg)

The bent line that we fit here has the shape of the ReLU activation function, which we covered earlier. ReLU accepts an input value and returns the maximum of 0 and that value. This means that if the input is positive, it is returned as is, but if the input is negative, ReLU returns 0. This is how we convert a linear function into a non-linear one.
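The effect of bending the line can be sketched with a straight-line model wrapped in ReLU. The slope and intercept below are hypothetical values, not fitted to the data in the table.

```python
def relu(z):
    """Return the maximum of 0 and the input."""
    return max(0.0, z)


# Hypothetical slope and intercept of the straight-line fit.
slope, intercept = 2.0, -0.5


def linear_prediction(gc):
    """Plain straight line: can go negative at low GC content."""
    return slope * gc + intercept


def relu_prediction(gc):
    """The same line passed through ReLU: never below 0."""
    return relu(linear_prediction(gc))


gc_values = [0.0, 0.125, 0.25, 0.5, 0.75]
linear_out = [linear_prediction(g) for g in gc_values]
relu_out = [relu_prediction(g) for g in gc_values]
print(linear_out)
print(relu_out)
```

The two predictions agree wherever the line is positive; ReLU only changes the region where the straight line would predict a negative methylation value.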

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_015.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_017.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_018.jpg)

GNNs
A GNN is a type of neural network architecture designed for graph data. While typical neural networks take array data as input, GNNs work directly with graphs. GNNs are one of the hottest topics in DL because of their applications in many domains of the life sciences. Graphs are everywhere; real-world objects are often defined in terms of their connections to other things, and a set of objects and the connections between them is naturally expressed as a graph. GNNs work by transforming all attributes of the graph (nodes, edges, and global context), as shown in Figure 4.19:

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_019.jpg)
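One way to picture how node attributes are transformed is a single round of neighbor aggregation. The sketch below is a deliberately simplified assumption: each node's new feature vector is the plain mean of its own features and its neighbors' features, with no learned weights, edge features, or global context. The graph and feature values are made up.

```python
def message_passing_step(node_features, edges):
    """One aggregation round on an undirected graph.

    node_features: dict mapping node name -> list of floats
    edges: list of (node_a, node_b) pairs
    """
    neighbors = {node: [] for node in node_features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, feat in node_features.items():
        # Gather this node's features together with its neighbors' features.
        group = [feat] + [node_features[n] for n in neighbors[node]]
        # Average each feature dimension across the group.
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Toy graph: A - B - C
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
edges = [("A", "B"), ("B", "C")]
updated = message_passing_step(features, edges)
print(updated)
```

Repeating the step lets information travel further: after two rounds, node A's features are influenced by node C even though they are not directly connected.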

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_020.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_021.jpg)

#### The first step in applying DL to genomics is checking the availability of raw data, which can either be generated or extracted from multiple sources and then preprocessed. As seen previously, the inputs to a DNN are real values, so in the case of DNA sequences, the four nucleotides (A, T, C, G) can be one-hot encoded as [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1], respectively. The target labels for this data can be either human annotations or experimental results. As in ML, the input data is split into training, validation, and testing datasets, which are used for model training, model validation, and model evaluation, respectively. The exact split depends on the real-world scenario. For example, you can keep most of the data for training (70%), a small portion for validating the model (10%), and the remaining 20% for testing (Figure 4.22):

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_04_022.jpg)
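The encoding and the 70/10/20 split can be sketched as follows. The function names, the fixed shuffle seed, and the use of integers as stand-in examples are assumptions made for illustration.

```python
import random

# One-hot codes in the order given in the text: A, T, C, G.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "T": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "G": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Turn a DNA string into a list of 4-element one-hot vectors."""
    return [ONE_HOT[base] for base in sequence.upper()]


def split_dataset(examples, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle, then split into training, validation, and testing sets.

    Whatever remains after the training and validation portions is used for testing.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


encoded = one_hot_encode("ATCG")
examples = list(range(100))          # stand-ins for 100 labeled sequences
train, val, test = split_dataset(examples)
print(encoded)
print(len(train), len(val), len(test))
```

Shuffling before splitting avoids putting, say, all samples from one batch into the test set; for real genomic data, a split by chromosome or by individual may be more appropriate.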

#### Even though the human genome has 3 billion base pairs, only about 0.1% of it varies between individuals. The most common form of this genetic variation is a change in a single base pair of the DNA sequence, which we refer to as a single nucleotide polymorphism (SNP). Many of these single base-pair changes have no impact on human health. However, some have important biological effects and contribute to genetic diseases in humans. As such, SNPs are commonly used to detect disease-causing genes, to predict a person’s response to drugs or their susceptibility to developing a disease, and to classify complex diseases using [Genomics SNP data](drclab/snp.pdf).

Protein structure predictions
One of the most recent success stories of DNNs in functional genomics is protein structure prediction. Protein structure prediction involves modeling the relationship between the amino acid sequence of a protein and its corresponding 3D structure. Deciphering the structure of a protein is widely considered one of the foundational problems of biochemistry and computational biology. DeepMind’s AlphaFold sent shockwaves through the protein structure prediction competition, the Critical Assessment of protein Structure Prediction (CASP), when it took first place by a large margin. By 2020, AlphaFold’s performance was even more impressive, with an accuracy score above 90 on CASP’s 100-point scale, and it is now considered the go-to model for predicting protein structure.

Regulatory genomics
Regulatory genomics is the study of gene regulatory elements such as promoters, enhancers, silencers, and insulators. These elements play an important role in gene regulation, so characterizing them functionally is very important. In addition, identifying sequence motifs in DNA and RNA regulatory regions is key because they represent the target sites of particular regulatory proteins, such as transcription factors (TFs). A variety of techniques are currently available to functionally characterize gene regulatory elements and sequence motifs. Along with ML algorithms, DNN architectures such as CNNs and RNNs have been successfully applied to regulatory genomics.

Gene regulatory networks
Gene regulatory networks (GRNs) are networks inferred from gene expression data. They are an exciting area of functional genomics and represent causal relationships between regulators and their target genes. GRNs are important for understanding the causal map of network interactions, detecting molecular markers, identifying hub genes, and so on. It is now routinely possible to perform high-throughput sequencing on any species of interest and generate gene expression data, though it is still a challenge to infer regulatory relationships between TFs, binding sites, and potential gene targets. DNNs, especially CNNs and GNNs, have been successful in building GRNs (https://link.springer.com/chapter/10.1007/978-3-030-05481-6_11).

Single-cell RNA sequencing
Single-cell RNA sequencing (scRNA-seq) is a relatively new technology, but it has already revolutionized RNA-seq analysis because of its widespread applications, particularly in clinical diagnostics. This is possible because the technology can reveal the heterogeneity of tissues: it measures gene expression in individual cells, thereby enabling cell-type clustering. Despite its huge success, biological inference remains a major limitation because the generated data is sparse and contains a large number of dropout events (expressed genes that go undetected in a given cell). DNNs, in particular GNNs, have been successful in deconvoluting the node relationships in a graph through neighbor information propagation (https://www.nature.com/articles/s41467-021-22197-x).