## Tip 5: Choose an appropriate data representation and neural network architecture {#architecture}

Neural network architecture refers to the number and types of layers in the network and how they are connected. While the research community has established certain best practices [@doi:10.1007/978-3-642-35289-8], architecture design choices remain largely problem-specific and empirical, requiring extensive experimentation. Furthermore, because deep learning is a quickly evolving field, many recommendations are short-lived and are frequently replaced by newer insights supported by recent empirical results. This is further complicated by the fact that many recommendations do not generalize well across different problems and datasets. Therefore, choosing how to represent data and design an architecture is closer to an art than a science. That said, there are some general principles to follow when experimenting.

First and foremost, knowledge of the available data and the scientific question should inform data representation and architectural design choices. For example, if the dataset is an array of measurements with no natural ordering of inputs (such as gene expression data), multilayer perceptrons may be effective. These are the most basic type of neural network, and despite their relative simplicity they can learn complex non-linear relationships across the input data. Similarly, if the dataset is composed of images, convolutional neural networks (CNNs) are a good choice because they emphasize local structures and adjacency within the data. CNNs may also be a good choice for learning on sequences: recent empirical evidence suggests that they can outperform canonical sequence learning techniques such as recurrent neural networks and the closely related long short-term memory networks in some cases [@arxiv:1803.01271]. Transformers are powerful sequence models [@arxiv:1706.03762] but require extensive data and computing power to train from scratch. Accessible high-level overviews of different neural network architectures are provided in [@doi:10.1016/j.ymeth.2020.06.016] and [@doi:10.1038/nature14539].
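As a minimal illustration of how the data representation drives the architecture choice, the PyTorch sketch below defines a multilayer perceptron for unordered tabular inputs and a one-dimensional CNN for ordered sequence inputs. The input sizes, layer widths, and number of classes are hypothetical placeholders, not recommendations for any particular dataset.

```python
import torch
from torch import nn

# Hypothetical dimensions; adjust to your own dataset.
N_GENES = 20000   # e.g., one expression value per gene, no natural ordering
SEQ_LEN = 1000    # e.g., a one-hot encoded DNA sequence of length 1000
N_CLASSES = 2

# Multilayer perceptron for unordered tabular inputs such as gene expression.
mlp = nn.Sequential(
    nn.Linear(N_GENES, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, N_CLASSES),
)

# 1D convolutional network for ordered inputs such as one-hot encoded
# sequences, where local motifs (adjacency) carry signal.
cnn = nn.Sequential(
    nn.Conv1d(in_channels=4, out_channels=32, kernel_size=8, padding="same"),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),  # pool over the sequence dimension
    nn.Flatten(),
    nn.Linear(32, N_CLASSES),
)

# Shapes: (batch, features) for the MLP, (batch, channels, length) for the CNN.
print(mlp(torch.randn(8, N_GENES)).shape)     # torch.Size([8, 2])
print(cnn(torch.randn(8, 4, SEQ_LEN)).shape)  # torch.Size([8, 2])
```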

Deep learning models typically benefit from increasing the amount of labeled training data. Large amounts of data help to avoid overfitting and increase the likelihood of achieving top performance on a given task. If there is not enough data available to train a well-performing model, transfer learning should be considered. In transfer learning, a model whose weights were generated by training on another dataset is used as the starting point for training [@tag:Yosinski2014]. Transfer learning is most useful when the pre-training and target datasets are of a similar nature [@tag:Yosinski2014]. For this reason, it is important to search for similar datasets that are already available. However, even when similar datasets do not exist, transferring features can still improve model performance compared with random feature initialization. For example, Rajkomar et al. showed the advantages of ImageNet pre-training [@doi:10.1007/s11263-015-0816-y] for a model applied to grayscale medical image classification [@doi:10.1007/s10278-016-9914-9]. Pre-trained models can be obtained from public repositories, such as Kipoi [@doi:10.1038/s41587-019-0140-0] for genomics or Hugging Face [@doi:10.18653/v1/2020.emnlp-demos.6] for biomedical text [@doi:10.1145/3458754], protein sequences [@arxiv:2007.06225], and chemicals [@arxiv:2010.09885]. Moreover, learned features can be helpful even when the pre-training task differs from the target task [@doi:10.1109/CVPRW.2014.131].
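The sketch below shows one common transfer-learning pattern with an ImageNet-pretrained ResNet from torchvision: freeze the pretrained feature extractor, replace the classification head, and repeat a grayscale channel to match the expected RGB input. The number of target classes, the decision to freeze all pretrained layers, and the exact weights identifier are assumptions for illustration (the weights argument name depends on the torchvision version).

```python
import torch
from torch import nn
from torchvision import models

# Load ImageNet-pretrained weights as the starting point (transfer learning).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained feature extractor and train only the new head,
# a common choice when the target dataset is small.
for param in model.parameters():
    param.requires_grad = False

# Replace the ImageNet classification head with one sized for the target task.
N_CLASSES = 5  # hypothetical number of diagnostic categories
model.fc = nn.Linear(model.fc.in_features, N_CLASSES)

# Grayscale images can be repeated across three channels to match the
# pretrained network's expected RGB input.
grayscale_batch = torch.randn(8, 1, 224, 224)
rgb_like_batch = grayscale_batch.repeat(1, 3, 1, 1)
logits = model(rgb_like_batch)  # shape: (8, N_CLASSES)
```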

Recently, the concept of self-supervised learning, which is closely related to pre-training and transfer learning, has seen an increase in popularity [@doi:10.1016/j.micron.2018.01.010]. Self-supervised learning leverages large amounts of unlabeled data and uses naturally available information as labels for supervised learning. Thus, self-supervised learning is sometimes also described as autonomous supervised learning. Using self-supervised learning, a model can be pre-trained on a related task before it is trained on the target task. Another related approach is multi-task learning, which simultaneously trains a network for multiple separate tasks that share features. Multi-task learning can be used on its own or in combination with transfer learning [@doi:10.1109/TBDATA.2016.2573280].
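To make the multi-task idea concrete, the following sketch trains a single shared feature extractor with one output head per task and combines the per-task losses. The tasks, input size, layer widths, and loss weighting are all hypothetical and would need to be tuned for a real problem.

```python
import torch
from torch import nn

class MultiTaskNet(nn.Module):
    """Shared feature extractor with one output head per task (illustrative)."""

    def __init__(self, n_features, n_classes_task_a, n_outputs_task_b):
        super().__init__()
        # Shared layers learn features that are useful for both tasks.
        self.shared = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
        )
        self.head_a = nn.Linear(64, n_classes_task_a)  # e.g., a classification task
        self.head_b = nn.Linear(64, n_outputs_task_b)  # e.g., a regression task

    def forward(self, x):
        features = self.shared(x)
        return self.head_a(features), self.head_b(features)

model = MultiTaskNet(n_features=100, n_classes_task_a=3, n_outputs_task_b=1)
x = torch.randn(16, 100)
labels_a = torch.randint(0, 3, (16,))
targets_b = torch.randn(16, 1)

logits_a, preds_b = model(x)
# Combine per-task losses; the relative weighting is a hyperparameter to tune.
loss = (nn.functional.cross_entropy(logits_a, labels_a)
        + 0.5 * nn.functional.mse_loss(preds_b, targets_b))
loss.backward()
```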

Practitioners should base the neural network's architecture on knowledge of the problem and take advantage of similar existing data or pre-trained models.