https://www.youtube.com/watch?v=v-E7xLi_-mk

In [1]:
import galai as gal
from IPython.display import display, Markdown, Latex

In [2]:
model = gal.load_model("base", num_gpus=1)

In [3]:
Markdown(
    model.generate(
        input_text='Deep learning usage in telecommunications',
        new_doc=True,
        top_k=4, penalty_alpha=0.6, max_new_tokens=1000)
)

Deep learning usage in telecommunications is increasing. The number of papers in this area is growing at a fast pace, and this paper aims to provide a survey of deep learning applications in telecommunications. We start by providing an overview of the most popular deep learning architectures, followed by a discussion of their application in different areas of telecommunications. Finally, we conclude the paper with a discussion of challenges and future research directions.

# 1 Introduction

Telecommunications is a fast-growing industry with a wide range of applications, such as voice over IP (VoIP), Internet of Things (IoT), video conferencing, etc. In recent years, there has been a growing interest in applying machine learning techniques to improve the performance of telecommunications systems. This is due to the fact that traditional methods are not able to provide high performance in all scenarios, and they require expert knowledge to design them. On the other hand, artificial intelligence (AI) has the potential to overcome these limitations, and it is expected to revolutionize the way we interact with the world.

Deep learning, one of the most popular AI techniques, was introduced by Rumelhart et al. [[START_REF] Learning representations by back-propagating errors, Rumelhart[END_REF]] in 1986. Since then, it has been used in a wide range of applications, including image recognition [[START_REF] ImageNet classification with deep convolutional neural networks, Krizhevsky[END_REF]], speech recognition [[START_REF] Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, HintonGeoffrey[END_REF]], natural language processing (NLP) [[START_REF] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin[END_REF]], computer vision [[START_REF] Going deeper with convolutions, SzegedyJian Sun and ZissermanAlexey Komodakis, Heidemarie Bruna[END_REF]], autonomous driving [[START_REF] End to End Learning for Self-Driving Cars, Bojarski[END_REF]], among others. The success of deep learning can be attributed to its ability to learn high-level features from raw data without the need for hand-crafted features. This is in contrast to traditional machine learning techniques, which rely on hand-crafted features, e.g., bag-of-words (BoW), and use them to train a classifier.

In this paper, we focus on the application of deep learning in telecommunications. We start by providing an overview of the most popular deep learning architectures, followed by a discussion of their application in different areas of telecommunications. Finally, we conclude the paper with a discussion of challenges and future research directions.

# 2 Deep Learning Basics

Deep learning is a subfield of AI that uses neural networks as a model for data analysis. A neural network is a collection of nodes that are connected to each other in a graph-like structure. The input of the network is a vector of real numbers, and the output is a vector of real numbers. Each node in the network has a set of inputs and an output. The connections between nodes are weighted, and the weight of each connection is a real number.

Figure 1: An example of a fully connected feed-forward neural network.

In the following, we provide a brief overview of the most popular deep learning architectures. For more details, we refer the reader to [[START_REF] Deep Learning, Goodfellow[END_REF]].

# 2.1 Fully Connected Feed-Forward Neural Networks

Fully connected feed-forward neural networks (FFNNs) are the most popular type of neural networks. An FFNN consists of an input layer, one or more hidden layers, and an output layer. The number of nodes in the input layer is equal to the number of features in the input data. The nodes in the hidden layers are connected to the nodes in the previous layer, and the weights of these connections are learned during training. Finally, the nodes in the output layer are connected to the nodes in the previous layer, and the weights of these connections are fixed to 1. Figure 1 shows an example of an FFNN.

The training of an FFNN is performed by minimizing the loss function. The loss function is a function that takes as input the output of the network and the ground truth, and it returns a scalar value that measures the difference between the two. For example, in the case of classification, the loss function can be the cross-entropy loss, which is defined as follows:

\[ L(y,\hat{y})=-\sum_{i=1}^{N}y_{i}\log(\hat{y}_{i}) \] (1)

where \(y\) is the ground truth, \�

In [4]:
Markdown(model.generate(
    input_text='What is traffic simulation',
    new_doc=True,
    max_length=2000,
    top_p=4))

What is traffic simulation software?
Traffic simulation software. Used by engineers, developers and students who have to build and understand transportation systems.
What is the software name for the software tool used by A-level Computer Science students in London School of Economics to develop their knowledge?
What is the software name for the software used by UCLA engineering students to learn and understand transportation in different environments?
What is the software name for the software used by Computer Science students from the University of Toronto to learn Traffic Management, routing and control?
What are the purposes for using software such as Vissim, Vissim 3D, and Uber Traffic Map?

In [18]:
Markdown(model.generate(
    input_text='wiki article on Penises in Machine Learning',
    new_doc=True,
    top_k=10, penalty_alpha=0.7, max_new_tokens=1000))

wiki article on Penises in Machine Learning
Wikimedia Commons has media related to Penises in Machine Learning.
This page was last modified on 14 December 2015, at 19:56.

In [19]:
Markdown(model.generate(
    input_text='lecture notes on the penis influence in machine learning',
    new_doc=False,
    max_length=2000,
    top_p=4))

lecture notes on the penis influence in machine learning (PIM-ML), 2010. Lecture notes on the penis, Dantas[END_REF], on the use of penis texture in machine learning, has recently been presented.

The aim of this work is to show that penis texture can be extracted in a robust and reliable way using a simple and easy-to-use algorithm, namely a Gabor filter. Texture analysis for penile analysis is a highly subjective task, relying on image segmentation through hand-crafted rules. However, by providing such a semi-automatic segmentation we believe the method can be useful to clinicians and scientists who are not able to perform such a manual work.

An approach that combines the use of penile texture and penile shape has been proposed by Srisawat et al [START_REF] An Automatic Penile Shape and Texture-Based Diagnostic System, Srisawat[END_REF]. However, the procedure relies on the user to define the different regions to be considered, which are then automatically extracted, using different image features. This method was tested for the diagnosis of erectile dysfunction (ED) and it showed promising results. In this paper, we apply the use of Gabor filters, for the enhancement of the image and for the detection of the penile texture features. The segmentation is made automatically by exploiting the Gabor filter response, using a simple technique.

# 2. MATERIAL AND METHODS

We applied the texture analysis procedure presented in [START_REF] Machine learning techniques for texture analysis in the penile cavernosa, Dantas[END_REF], for the analysis of the texture features present on the penis. The procedure is based on using Gabor filters [START_REF] Texture Features for Browsing and Retrieval of Image Data, Manjunath[END_REF] for the extraction of the texture features present on the region of interest (ROI).

# 2.1. Materials

The data set used for the construction of the proposed algorithm was obtained from the PISI project [START_REF] Pisa interactive dataset of images with annotated shape and texture features, Pinto[END_REF]. In particular, 70 cases of penis images were used, i.e., 20 for each of the following pathological conditions: 

•

# 2.2. Extracting the different regions

For the extraction of the texture features, both the spatial image features extracted from the ROI and the Gabor response were extracted. In order to obtain the spatial image features, the ROI was manually segmented using a freehand method (free software). For a correct segmentation of the ROI in both the x and y directions, several steps were performed:

• drawing the border of the penis, based on the following points: the tip of the glans, the base of the penis and the junction of the glans with the corpus;

• defining the width and height of the ROI, based on both the width of penis, that is determined using the landmarks;

• selecting and choosing from the image a suitable scale that is the distance between the center and the tip of the glans, depending on the size of the penis.

# 2.3. Extraction of the spatial image features

As it was mentioned in 2.2, the penis was manually segmented. In order to extract the spatial features, the ROI was considered a rectangular shape, by cutting across the width and the height. In the process of defining a suitable scale, in some cases it was necessary to use the information from the shape and the size of the penis.

Each image was scaled in order to produce a fixed scale for every image. It is important to highlight that for all the cases, the glans is considered the centre of the ROI. This means that, the x and y coordinates of the image correspond to the x and y coordinates relative to the glans.

# 2.4. Extraction of the Gabor response

Given the shape and the scale of the ROI, the Gabor response can be easily extracted.

A Gabor filter is a filter that is specific to the scale and orientation of the image texture, with a Gaussian response profile and a sine spatial frequency profile, to obtain a narrow rectangular spatial filter response profile. Therefore, it was decided to extract the response using a Gaussian and a sine (spatial frequency) filter. The Gaussian profile width, w, is an indirect representation of the spatial frequency, and is related to the orientation, θ, as follows:

w = 2π σ cos θ 2 , (
)
where σ is called sigma.

The number of parameters required to specify a Gabor filter is 5: σ and θ. An example of a Gabor response of a Gabor filter with σ = 5 and θ = 0 degrees is shown in figure 1. Given two vectors of coefficients describing the filter:

• a spatial filter, that is the coordinates of the filter center with respect to the image;

• a frequency filter, that is the parameters that define the filter amplitude and phase.

For the first filter to evaluate the filter response, given the values of σ and θ, a vector of coefficients is defined, in which the vector entries represent the location of the filter center in the image. The second vector of coefficients is used to specify the amplitude, frequency and phase of the second Gaussian filter to use in the filtering process. Given these filter coefficients, the response of the filter is calculated as follows:

(x, y) = 1 √ 2π
σ e jθ K(x − u, y − v)e jw cos(θ)

dwdvdu,
where y and x are the coordinates of the image relative to the centre of the filter, (u, v) are the coordinate of the filter centre with respect to the image space, w , is a linear scale, and K(.) is the Gaussian kernel.

For example, in figure 1-Gabor filter, the response is calculated considering a filter with σ = 5 and θ = 30 degrees. For every σ and θ, the 5 filter responses from equation 1 are combined to create a matrix containing the responses of the filter for each image.

The matrix is filtered on both ends using both a Gaussian and a sine filter. The purpose of this filter is to reduce the high-frequency noise that is present in the matrix. The results are shown in figure 1-Gabor filter with noise reduction.

To obtain the response of one of the scales from equation 1, the spatial filter response obtained using equation 3 is summed and the result is filtered using the filter with the following coefficients:

• a spatial filter;

• a phase and its value is defined relative to the amplitude of the Gaussian response;

• a phase and its value is defined relative to the sine spatial frequency.

The resulting image is given the same name of the scale considered, and is referred to as "scale 0".

In figure 1, for scale 2, w is calculated and the values of u and v are defined with respect to the central point, and hence, the filter response is given the name "scale 0 -scale 2".

For every image, the results obtained from this process are obtained for 11 different scales.

In figure 2, for scale 0 and scale 2, the response of the filter of the corresponding scale has been extracted.

# 2.5. Texture feature extraction

In order to extract the texture feature vector, following the work in [START_REF] Texture Feature Extraction Based on Gabor Wavelet, Dong[END_REF], a set of features was manually selected for which an increase in the filter response was always observed in texture regions.

Considering the images obtained from the PISI dataset, it was possible to extract 12 Gabor filter scales described from 0 to 11 and 3 angles from 0 to 2π. Considering a region containing texture, any filter of any angle that has a response above the image mean was considered positive. 588 features for every image (i.e. 24 images × 4 features) were then obtained. These Gabor filter scales have been considered on the basis of experimental results on the PISI dataset, where a strong enhancement/decrease in the filter response was observed in texture regions.

Using the 5 features for scale 0 (a vector of 5 with value as 4.05 being 4 for the pixel value, the size of the rectangle, the response, and the derivative of the response) and 11 features for scale 2 (a vector of 11 with value as 8.33 being 8 for the response, 11 for the standard deviation, the standard deviation normalized to the average, the energy, and the variance), it was possible to create a feature vector for scale 0 and scale 2.

Given the feature vector for scale 0 and scale 2, the response of each filter was calculated with the following equation:

μ w,θ = i,j∈{0,1,−1} N μ G,θ i,j,w
where i and j are defined in equation 3, and N is the number of pixels in the image for which a response with the corresponding filter has been obtained; μ G,θ i,j,w is the value of the feature (f = 1,2,...,5 for scale 0

In [20]:
Markdown(model.generate(
    input_text='Literature review on Deep learning in the Telecommunications industry',
    new_doc=True,
    max_length=1000,
    top_p=4))

Literature review on Deep learning in the Telecommunications industry.

In [22]:
Markdown(model.generate(
    input_text='lecture notes on the time-to-penis (TTP) principle.',
    new_doc=False,
    max_length=2000,
    top_p=4))

lecture notes on the time-to-penis (TTP) principle., Kortlandt[END_REF] and (A), as follows.

(6) v [[TTP a b [α]]] = if v a ∈ a T (F, ) / ∈ F (α) or v b ∈ b T (F, ) / ∈ F (α) v a [TTP a [α]] = v b [TTP b [α]] =
A is the set of atoms in the tree α, so in a typical world W ∈ F (T ), TTP a would pick, for a fixed T ∈ A, both of the elements in the list {v ∈ A | v T (F, ) = v} and similarly for b.

Since [α] is a tree, [[α]
] and A are disjoint sets by the definition of the tree operator [−]. Since in any world of the forest F ([α]), the forest F is an algebra that is at least as big as F, so [F (α)] is such a forest in any world W ∈ F ([α]). Furthermore, TTP a and TTP b are deterministic as operators on trees and hence 

[[TTP a b [α]]] can be written as ∪ (a,b,W )∈A [[(TTP a b [α]) W ,W ]]

# 3.3 The Axioms

In our model, the semantics gives us a lot of information about the structure of the system described by ( 1) and (2). For example as we state next, in any world W , the elements of F T are a non-increasing function with respect to the TTP a and b operators. (This is intuitively true, because TTP a and b are monotonic and it will be shown that the result is a non-decreasing function because TTP a and b are non-decreasing.)

Proposition 4 Let F ∈ F and a, b ∈ V such that TTP a (F, ) = TTP b (F, ) = b. Then [[TTP a b [F {T }]]] = F {a,b}
To prove the proposition, we show that in a typical world of F (T ), the forest that is the result of applying (43) to {F {T } | T ∈ A} is precisely {F {a,b} | a, b ∈ V, TTP a (F, ) = TTP b (F, ) = b} (using the definition of the forest operator [−], (1) and the definition of the truth-function of the variables).

Axiom 3.1 In any world W ∈ F ,

[[β]] = A 1 ∪ A 2 ,
where

A 1 = ∪ (a,b,W )∈A 1 [[(TTP a b [α]) W ,W ]], A 2 = ∪ (a,b,W )∈A 2 [[(TTP a b [α]) W ,W ]] if a, b TTP a (F, ) = TTP b (F, ) = b, and A 1 = F {T |T ∈A 2 } if a, b TTP a (F, ) = TTP b (F, ) = b,
where A 1 and A 2 satisfy the following property:

a,b / ∈A 1 ([[(TTP a b [α]) W ,W ]] ∩ F (T )) ∪ A 2 = A 1 ∪ A 2
Then it immediately follows that F is an algebra in the forest model (F , F (T)).

# 3.3.1 An Example of the Axioms 3.1

As an example, we consider the world W ∈ F (T ) such that T T (F , ), T T (F , ) and

T T (F , ), since F (T ) = F (T ), (W ) = (W ) and T T (F, (W )).
In this case, by the Axioms 3.1, we get that

A 1 = {F {T } | T ∈ A(W, F )}, A 1 = {F {T } | T ∈ A(W, F )}, A 2 = F (T ) and A 2 = F (T ).
Using the definitions of the truth-functions, we get that

F {b 1 } = b 2 , F {b 2 } = b 1 and F {b 1 ,b 2 } = F {b 1 \{b}} = b 1 .
Since both TTP b 1 and TTP b 2 are the identity operators, F {b 1 } and F {b 2 } will behave exactly like in the second and third world, respectively in ( 7) and ( 8), so F {b 1 } = {F {b 2 }} and F {b 2 } = {F {b 1 ,b 2 }}, as it must be.

In a similar way, in a typical world such that T a (F, ) < T b (F, ) (or T a (F, ) > T b (F, )), it will be shown that

A 1 = F {T |Ta(F, )<T a (F, ) } (A 1 = F {T |Ta(F, )>T a (F, ) }), then, again, it follows that [[TTP a b [α]]] = F {a,b} (which is true, since F {a,b} = {F {T |Ta(F, )<T b (F, ) }}).
For instance, in a typical world W ∈ F (T ) such that TTP a (F, ) < TTP b (F, ), using Axiom 3.1 we obtain

A 1 = F {T |Ta(F, )<T b (F, ) } = F {T b (F, )}, A 2 = F (T )
and thus

[[TTP a b [α]]] = = F ([[TTP a b [α]]]) = F {a,b}∪{b} = F (F {a,b} ) = F {a,b} .

# Chapter 4

The Forest Algebra

# 4.1 Introduction.

In this chapter we describe a model for the logic (1) and (2) (called the forest model), following the ideas developed in [START_REF] The Complexity of Tree Logic and Forest Logic, Körner[END_REF] for a simpler logic L {T },1 for a subset T of the Herbrand term algebra (and the axiom (β)).

After presenting some properties of the forest algebra on which the model is built and presenting the model precisely, we define our models for the logic. Finally, we show that the forest model is consistent with our logic and can be used to model some examples of classical many-valued logics. The models defined in this chapter are new and are used in Chapters 5 and 6.

# 4.2

The Forest Algebra.

In this section, we present the forest algebra -an algebraic formulation of a logical system whose syntax is given in Chapter 3 and whose semantics is defined in Section 3.3. We show that the forest algebra is consistent with the syntax and with the semantics defined in that section. Before presenting the algebraic part we present some algebraic facts and properties of the forest operator [−]. This will later

In [23]:
Markdown(model.generate(
    input_text='This paper introduces the research progress of deep learning in the telecommunications field',
    new_doc=True,
    max_length=2000,
    top_p=6))

This paper introduces the research progress of deep learning in the telecommunications field and proposes "deep learning-based telecommunications" as a new research area. A deep learning based telecommunications model is a typical model-based telecommunications system model composed of multiple sub-networks with multiple nodes and subsystems with multiple layers and multiple dimensions. A detailed introduction to deep learning-based telecommunications in both the system architecture design/model construction and the network configuration optimization/control problems is provided.
In terms of the telecommunications domain research, the research trend is moving towards the integration of artificial intelligence technology with existing telecommunications systems and networks. The telecommunications field has unique characteristics that have not been fully exploited by existing telecommunications technologies. As such, the research on deep learning-based telecommunications is still very important for further exploring the potential of these technologies in the telecommunications field.
Deep learning algorithms have been implemented mainly with computer networks and Internet-of-Things (IoT) applications. To date, the related research has mainly focused on how to apply deep learning to various network protocols (e.g., communication protocols, network architecture, and application protocols). However, the applications of deep learning that support specific telecommunications functions (e.g., optical network functions) are still in the exploration stage and need to be continuously enhanced. In the near future, the telecommunications industry, such as ISPs, should consider how to implement deep learning functions that promote the efficient and smooth operation of telecommunication systems.
Densely connected telecommunication networks are becoming the norm and there are a number of benefits that results from this. The benefits are primarily expected to include improvements in network performance, reduced capital investment costs and more stable operation of telecommunication networks. The introduction of the artificial intelligence (AI) technology into telecommunications networks is expected to have a significant impact in improving the performance and operational stability of telecommunication networks. In order to obtain the most significant benefits, the introduction of AI technology into telecommunications networks should be implemented in an integrated and intelligent way with other advanced telecommunication technologies, e.g., communication networks, energy management, and IoT applications. The introduction of AI technology into telecommunication networks will be beneficial in improving network performance through the following means: reduced time delay, reduced latency, and improved utilization of network resources. The integration of AI technology into telecommunications networks would also achieve improved operational efficiency through the following means: reduced energy consumption, increased reliability, and improved service quality. Finally, the introduction of AI technology into telecommunications networks could lead to an unprecedented increase in network intelligence and self-awareness across multiple telecommunication technologies, thus allowing for the development of the artificial intelligence technology - enabled intelligent telecommunications.
The evolution of telecommunications networks in the age of AI has been driven by significant advances in the telecommunications technologies themselves. The introduction of AI technology into telecommunications networks can potentially provide unprecedented performance, operational efficiency and intelligence benefits. However, some challenges remain in the development of the AI-enabled telecommunication systems given the fact that telecommunication systems are very complex and heterogeneous. This paper addresses some of these challenges as well as possible solutions.
The integration of artificial intelligence (AI) techniques to telecommunication infrastructures has gained momentum. AI has been used to manage and control the complex networks (networks comprising various networks, subsystems, and components) as well as to manage and control the networks. This paper presents some important and unique contributions, focusing on the architecture, control, and management of AI and telecommunications systems.
AI has emerged as an innovative technology, which has been used to manage telecommunication networks during the last decade. Several efforts have been made through this domain to integrate AI systems in telecommunication infrastructures. The objective of all these implementations is indeed to improve the network performance and/or network operational efficiency. However, the implementation of AI-based telecommunication systems are currently only at the concept and design stage with a limited validation using experimental tools. Nevertheless, the implementation of AI-based telecommunication systems is critical and will have an even higher impact on the telecommunications systems. Therefore, we present here an overall view describing the state-of-the-art and open research challenges of AI-based telecommunication systems. Also, we provide also several perspectives based on the promising results of these first applications of AI in the telecommunications network domain.
Many recent advances in AI research have focused on improving AI's ability to learn from experience and to make intelligent decisions. Given the limitations of the traditional AI algorithms that lack the ability to make decisions and learn from experience, deep learning is a promising tool for making intelligent and adaptive decisions in many application domains. One specific domain, where deep learning may have a major impact on AI systems, is telecommunications: with the tremendous growth in the bandwidth (up to 64 × the current rate) and complexity of telecommunication networks in the recent years, deep learning is expected to revolutionize telecommunications in the years to come.

In [25]:
Markdown(model.generate(
    input_text='The paper then summarizes the research status of the CNN and RNN in telecommunications',
    new_doc=False,
    max_length=2000,
    top_p=4))

The paper then summarizes the research status of the CNN and RNN in telecommunications, the performance improvements of these models, and discusses the future research directions of these models. The rest of the paper is organized as follows. First, we summarize the RNNs and CNNs in Section 2. Next, we describe the application of RNN models in telecommunications tasks. Then, the application of CNN models is summarized in Section 3. After that, we briefly review several performance improvements of RNNs and CNNs in Section 4. Finally, we summarize and discuss future research directions of these models in Section 5.

# 2. Review on RNN and CNN

RNNs and CNNs are two common models for sequence-based tasks. In this section, we will review the basic framework of both models and analyze the key characteristics of these models and their application fields in the field of telecommunications.

# 2.1. Recurrent Neural Networks (RNNs)

Figure 1 shows the general framework of a vanilla RNN with one hidden layer. As can be seen, an RNN contains a special kind of recurrent structure that performs the operation of feedback [[START_REF] Learning representations by back-propagating errors, Rumelhart[END_REF]]. A neuron consists of an input layer, a hidden layer, and an output layer as seen for the input y \(=\) [y[START_SUB]1[END_SUB], y[START_SUB]2[END_SUB], ..., yT]. In addition, the information of each input vector is also transferred along the hidden layer. The information flow, which includes the information in the input and hidden layers and the output of the last time step, can be represented as follows:

\[y^{(1)} = \sigma\left( \mathbf{W}\mathbf{x}_{\mathbf{i}} + \mathbf{U}\mathbf{h}^{(0)} + \mathbf{b} \right),\]

\[\mathbf{h}^{(0)} = \varphi\left( \mathbf{W}_{h}y^{(1)} + \mathbf{U}_{h}y^{(2)} \right),\]

\[\mathbf{h}^{(1)} = \sigma\left( \mathbf{W}_{h}y^{(1)} + \mathbf{U}_{h}y^{(2)} \right),\]

\[y^{(2)} = \varphi\left( \mathbf{W}_{h}\mathbf{h}^{(1)} + \mathbf{b}_{y} \right),\]

\[\mathbf{h}^{(2)} = \varphi\left( \mathbf{W}_{h}\mathbf{h}^{(1)} + \mathbf{b}_{y} \right),\]

\[\cdots,\cdots\mspace{600mu}\cdots,\cdots,\cdots\]

where yi is the i-th input vector, σ and \(\varphi\) are the activation functions of the ReLU model and SoftMax activation function respectively, \(\mathbf{W}_{h}\) and \(\mathbf{b}_{h}\) are the weight matrix and bias of the last time step in the hidden layer, \(\mathbf{W}\), \(\mathbf{U}\) and \(\mathbf{b}\) are the weight matrix and bias of the ith input layer, \(\mathbf{U}_{h}\) and \(\mathbf{b}_{h}\) are the weight matrix and bias in the hidden layer. \(\mathbf{h}^{(0)}\) is the value of the hidden layer at time step 0, \(\mathbf{h}^{(1)}\) is the value of the hidden layer at time step 1, \(\mathbf{h}^{(2)}\) is the value of the hidden layer at time step 2, and so on.

It can be seen that by using Equations (3)–(6), a single neuron in a vanilla RNN can be formulated as:

\[h^{(N + 1)} = \varphi\left( \mathbf{W}_{h}y^{(N)} + \mathbf{U}_{h}y^{(N + 1)} + \mathbf{b} \right),\]

where N is the number of time steps in the sequence. Therefore, to be consistent with the conventional notation, we define N to be equal to one, i.e., N = 1 in Equation (7).

For the above-defined RNN structure, there are several challenges to be overcome, the most important challenge is the long-term prediction problem. This problem can be seen more clearly when we try to predict the number of requests of a website as the users of the website receive more requests continually, due to factors such as popularity, popularity change, and so on. To solve this problem, in the literature, several types of methods have been proposed. In this section, we briefly introduce the most commonly-followed methods. More details of these methods can be found in the survey papers of the researchers [[START_REF] Deep Neural Networks with Recurrent Units to Learn Patterns of Human Communication, Qian[END_REF],[START_REF] Applying deep learning to online rating prediction: a review and future directions, Shi[END_REF],[START_REF] Deep learning algorithms for prediction of computer science topics, Guo[END_REF],[START_REF] Application of Deep Neural Network to the Short-Term Traffic Prediction, Yu[END_REF]].

# 2.1.1. Long Short-Term Memory (LSTM)

The conventional RNN shows poor performance in solving the long-term prediction problem [[START_REF] Long Short-Term Memory, Hochreiter[END_REF]], and LSTM could avoid this problem. A LSTM cell consists of three gates to control the information flow [[START_REF] Long Short-Term Memory, Hochreiter[END_REF],[START_REF] Learning to Forget: Continual Prediction with LSTM, Gers[END_REF],[START_REF] A Critical Review of Recurrent Neural Networks for Sequence Learning, Lipton[END_REF]]. For example, the forgetting gate controls the content that can be forgotten over time, the input gate controls how much information will be fed into the new cell state, and the output gate controls how much information will be used to control the output. More details about LSTM are presented in the next subsection. 

# 2.1.2. Gated Recurrent Unit (GRU)

The GRU cell uses three gates to control the information loss, which is similar to the LSTM cell, however, it is simpler compared to the LSTM cell [[START_REF] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung[END_REF]]. The Gated Recurrent Unit cell can remove unnecessary information from the previous hidden state. For example, it can forget old information by removing information and keeping the short-term history. Since a GRU has a simpler structure, the training can be faster and the computation cost is less.

# 2.1.3. Gated Predictor (GP)

Unlike the RNN structure with an encoder–decoder architecture, in the GP structure, the output of the encoder in the traditional RNN is directly connected to the decoder, and then the decoder predicts the output [[START_REF] Language Modeling with Gated Convolutional Networks, Dauphin[END_REF]]. This feature allows for the use of a single RNN for both tasks. Unlike most of the RNN models, GP can only perform sequence tasks with one-dimensional inputs. Compared with LSTM and GRU, GP, having one more hidden layer, uses more parameters and leads to better performance. 

# 2.2. Convolutional Neural Networks (CNNs)

CNNs are based on two concepts: (i) local receptive fields, and (ii) non-linearity [[START_REF] ImageNet classification with deep convolutional neural networks, Krizhevsky[END_REF]]. The first concept can be found in all of the above-mentioned RNN structures in order to solve the long-term prediction problem. The second concept is very important, because by exploiting both, i.e., local receptive fields and non-linearity, CNNs show better capability in extracting and utilizing features automatically, compared to RNNs [[START_REF] ImageNet classification with deep convolutional neural networks, Krizhevsky[END_REF],[START_REF] On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, Cho[END_REF],[START_REF] Learning long-term dependencies with gradient descent is difficult, Bengio[END_REF]]. To further understand the characteristics of these models, we give a brief introduction about the two concepts. 

# 2.2.1. Local Receptive Fields

The local receptive field is one of the core concepts of a CNN structure [[START_REF] On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, Cho[END_REF],[START_REF] Learning long-term dependencies with gradient descent is difficult, Bengio[END_REF]]. A CNN has

In [27]:
Markdown(model.generate(
    input_text='The paper examines state of the art deep learning techniques used in telecommunications industry.',
    new_doc=True,
    max_length=2000,
    top_p=1))

The paper examines state of the art deep learning techniques used in telecommunications industry. Some of the fundamental challenges associated with deep learning models in this domain are discussed. The paper explores a wide variety of recent deep learning models and algorithms applied for data-hungry tasks such as speech recognition and text based tasks. The paper also discusses a variety of recent techniques that focus on improving the performance of machine learning models for high volume dataset. This is an overview of some of the recent techniques that have been used in industry for improving the performance of state-of-art deep learning models for data-hungry tasks.

Deep learning is a subfield of machine learning which leverages the use of neural networks to process input data in a way that is very similar to how neurons function in the human brain. It has been shown recently that deep learning models are effective in different areas of application. For example, deep learning models are used to solve a range of tasks such as image recognition and text classification. Also, deep learning models are being actively explored and developed to improve tasks related to automatic speech recognition and computer vision. Moreover, many large scale internet companies have also implemented neural networks to solve their tasks in a machine learning fashion. However, the area of artificial intelligence has been challenging the way that the deep learning model works. The lack of understanding behind how it works hinders the use of it in different application domains [START_REF] Deep Learning and Its Applications to Signal and Information Processing: A Survey, Mao[END_REF].

In a standard neural network architecture, in addition to a number of processing elements, there is also a mechanism that maps inputs to the activation of the processing elements. This means that the inputs have to pass through several layers and layers have to use activation function to get the feature. Deep learning model does not have an activation function but it does have a number of layers. As the model uses multiple layers, it is also called a multi-layered neural network. The inputs of the network are the previous layer outputs and the outputs of the last layer are used as predictions. This is a generic structure of the network and it is used for multiple tasks such as image recognition and text classification. The number of layers in the network varies from a few to thousands. The processing elements in each layer also vary. It can be a single neuron or it can have some activation function, e.g. sigmoid or softmax activation function. All processing elements within each layer are connected to all processing elements of subsequent layers. The layers are organized so that one input is forwarded to all the processing elements of first hidden layer and then it is forwarded to all the processing elements of second hidden layer and then again it is forwarded to all the processing elements in a third hidden layer. The processing elements of the last layer are not connected to the processing elements of previous layer. The number of neurons in each processing element varies from a few in the input layer to a large number of neurons in final neural network layer [START_REF] Deep Learning and Its Applications to Signal and Information Processing: A Survey, Mao[END_REF] .

Deep learning model has to handle larger amount of data so as to have improved representation ability. To deal with this challenge, the deep learning model is represented by several hidden layers. The last layer of this model classifies the input data through a Softmax activation function. The number of neurons in each layer varies from a few to thousands. Therefore, processing of input data takes more time and is more resource hungry. This is a major problem for online algorithms as well as for the model training. Several of the techniques that are discussed in this paper aim to improve this problem.

Several recent techniques focus on improving the model training as well as reducing the need for training the model online. The techniques are classified into five groups based on the level of training data. As the model is required to learn using only a single layer of data, it is termed as [1 -1] level of training data. The techniques in the next group [2 -2] are used to improve the performance of deep learning models for single data point. In [3 -3], the techniques that aim to improve the training of model by combining multiple levels of training data together is discussed. We then further classify the techniques that take advantage of online modeling as [4 -4] and [5 -5] level of training data. Techniques in [1 -1], [2 -2] aim at learning using less training data and still achieve good results. Further, these techniques also have the advantage of requiring less time to generate predictions. Techniques in [3 -3] try to learn from multiple levels of training data. This helps to obtain good results using large amount of input data. However, the predictions for each instance is still obtained after the model has been trained. Techniques in [4 -4] focus on learning offline using a model which is trained incrementally, so as to predict the next instance given the previous ones. This reduces the training time as compared to online models. Techniques in [5 -5] target online learning, i.e., training each instance independently. This helps to train the model on high level of data quickly and hence reduce the amount of training data required for obtaining good results. The techniques discussed in this article are categorized based on the level of training data and the model architecture.

# II. RELATED WORK

# A. Deep Learning Models

A deep learning model contains several hidden layers that are connected to the input layer. These hidden layers contain a large number of neurons. Typically each neuron is activated by passing one or more input patterns through the network. The number of input patterns depends on the input size. Generally, the number of input patterns varies from a few to thousands. Each input pattern is a combination of multiple features or inputs like input images, audio signals, numerical vectors or textual data. Each input pattern is treated as a separate input by the input layer. The number of neurons in the input layer depends on the number of input features. This is not the case for the hidden layers as the number of neurons in each hidden layer is larger than the input layer [START_REF] Deep Learning and Its Applications to Signal and Information Processing: A Survey, Mao[END_REF]. The number of neurons in the last layer in deep learning model are not equal as the final activation of the model is used for classification.

Deep learning has been shown to be effective for image (for example in [START_REF] Advances in Neural Information Processing Systems, Yann[END_REF] and [START_REF] ImageNet classification with deep convolutional neural networks, Krizhevsky[END_REF] for image recognition), text and speech (for example in [START_REF] Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, Hinton[END_REF] for text classification), audio classification (for example in [START_REF] Deep Convolutional Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, Hinton[END_REF] for audio classification) and other applications. The deep learning model is used to learn a representation of the data i.e. the deep learning model represents the data by finding a compressed feature space and uses this representation for classification [START_REF] Deep learning. Book in preparation for, Goodfellow[END_REF]. It is an effective approach for classifying data that has a large dimension in the input space (for example in [START_REF] Speech recognition with deep recurrent neural networks, Graves[END_REF] for speech recognition). The high speed processing of image, speech in mobile devices is another advantage of this model. For example, the model is used for image, audio and text processing in the 4G cellular systems (e.g in Google Colab) . These applications are also referred to as data hungry applications. However, due to the processing characteristics of deep learning model, it still requires relatively large computations. These reasons has motivated many researchers to develop new deep learning techniques that reduce the processing requirements and also improve the models' generalization performance.

# B. Recurrent Neural Network (RNN)

RNNs (recursively used neural networks) have been shown very effective for modeling sequential data. RNN consists of a sequence of nodes known as neurons. Each of these neurons are connected to all the nodes of the succeeding neuron. They have a state variable that can capture the past information. They are usually trained using back propagation of error methods. The output of the neuron is a function of the state of the preceding neuron and the current input [START_REF] Deep Learning and Its Applications to Signal and Information Processing: A Survey, Mao[END_REF]. This concept is extended by using a loop or cycle in the neural network connections. RNNs have shown state-of-the-art results for task such as sequence to sequence task in [START_REF] Sequence to Sequence Learning with Neural Networks, Sutskever[END_REF]. Recurrent connection helps RNNs to learn a compact representation of the data through the learning process. The compact representation allows the model to capture some long-range information. The disadvantage of this model is that the amount of memory space required to store the previous states is proportional to the number of nodes in the network. Although this memory problem is not an essential problem that limit the applicability of standard RNNs, it is very harmful for memory constrained embedded devices where the size of such devices are not very large. Recent researches have been focused on developing techniques to overcome this problem. One of the popular techniques is gradient vanishing [START_REF] Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, Hochreiter[END_REF]. The gradient in this case decreases rapidly as the model is trained. This allows training of the model with less number of hidden units. Another method proposed to improve this problem is a regularization technique using dropout [START_REF] Improving deep neural networks for LVCSR using rectified

In [28]:
input_text = [
    'literature review on Machine Learning',
    'thesis on Artificial Intelligence in Telecommunications',
    'Title: Literature review on Machine Learning (ML) and Deep Learning (DL) in the field of Telecommunications',
    'Section 3: Literature review on ML and DL in the field of telecommunications'
]

In [29]:
Markdown(model.generate(
    input_text=input_text[1],
    new_doc=True,
    max_length=2000,
    top_p=0.7))

thesis on Artificial Intelligence in Telecommunications, 2014.
with a \(1\%\) probability of success.
the thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2013.
thesis on Machine Learning for Signal Processing, 2013.
thesis on Signal Processing for Cognitive Radio Networks, 2013.
thesis on Neural Networks for Signal Processing, 2012.
thesis on Machine Learning for Signal Processing, 2012.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2012.
thesis on Artificial Intelligence in Telecommunications, 2011.
thesis on Machine Learning for Signal Processing, 2011.
thesis on Machine Learning for Signal Processing, 2011.
thesis on Machine Learning for Signal Processing, 2011.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2011.
thesis on Artificial Intelligence in Telecommunications, 2010.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2010.
thesis on Artificial Intelligence in Telecommunications, 2009.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2009.
thesis on Artificial Intelligence in Telecommunications, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent Systems for Next Generation Wireless Networks, 2008.
thesis on Multi-Agent

In [30]:
Markdown(model.generate(
    input_text=input_text[2],
    new_doc=True,
    max_length=2000,
    top_p=0.7))

Title: Literature review on Machine Learning (ML) and Deep Learning (DL) in the field of Telecommunications

Abstract: Telecommunication is the communication between two or more points of interests. The information is conveyed in the form of voice, video, images, and other data over the communication channel. The transmission of the information is not possible without the use of an information carrier. The communication channels used in the communication process are based on electromagnetic waves. Telecommunication is an important field of science that deals with the study of the nature of information and how it is transmitted. It deals with the study of the way information is transmitted in a medium and the processes that take place during transmission. This paper is a review of the literature on Machine Learning (ML) and Deep Learning (DL) in the field of telecommunications. This paper covers the various aspects of Machine Learning and Deep Learning. It also discusses various applications of Machine Learning and Deep Learning in the field of telecommunications.

In [31]:
Markdown(model.generate(
    input_text='## Introduction',
    new_doc=False,
    max_length=2000,
    top_p=0.7))

## Introduction

The most common form of cancer in the United States is non-Hodgkin lymphoma (NHL).[START_SUP]1[END_SUP] Although most NHL cases are treated with chemotherapy, the median survival time for patients with aggressive disease is only 6 to 8 months.[START_REF] Non-Hodgkin lymphoma: 2016 update on diagnosis, risk-stratification, and clinical management, Vose[END_REF],[START_REF] Current treatment strategies for patients with aggressive non-Hodgkin lymphoma, Younes[END_REF] This poor outcome has been attributed to the presence of a large number of subclones within the tumor, which are often genetically unstable.[START_SUP]4[END_SUP] Moreover, these subclones are likely to have distinct phenotypic and functional properties that are not readily captured by bulk analyses of bulk tumor samples.[START_SUP]5[END_SUP]

Single-cell technologies such as single-cell RNA sequencing (scRNA-seq) can identify the cellular heterogeneity within a tumor.[START_REF] Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq, Tirosh[END_REF],[START_REF] Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma, Patel[END_REF] The application of scRNA-seq in the field of oncology has been growing rapidly.[START_REF] Single-Cell Transcriptomics of Human and Mouse Lung Cancers Reveals Conserved Myeloid Populations across Individuals and Species., Zilionis[END_REF],[START_REF] A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure., Baron[END_REF],[START_REF] Single-cell reconstruction of the early maternal-fetal interface in humans, Vento-Tormo[END_REF] In the last few years, several groups have utilized scRNA-seq to characterize the transcriptional heterogeneity of NHL.[START_REF] Single-Cell Profiling of Mantle Cell Lymphoma Reveals the Existence of a Tumorigenic B Cell Subpopulation with Stem Cell Features, Chen[END_REF],[START_REF] Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer, Puram[END_REF],[START_REF] Single-cell profiling of B-cell lymphomas reveals the existence of a population of cells with stem cell-like properties, Pouyade[END_REF],[START_REF] Single-cell RNA-seq reveals distinct subtypes of diffuse large B-cell lymphoma with different pathogenic mechanisms and clinical outcomes., Zhou[END_REF] For example, Pouyade et al. used scRNA-seq to identify a population of B cells with stem cell-like properties within DLBCL.[START_SUP]8[END_SUP] Zhou et al. found that the expression of the tumor suppressor TP53 is heterogeneous in DLBCL.[START_SUP]10[END_SUP] However, the phenotypic heterogeneity of lymphoma cells within the tumor has not been investigated using scRNA-seq, and the relationship between transcriptional heterogeneity and the cellular phenotypes of NHL remains to be explored.

In this study, we used scRNA-seq to identify the cellular heterogeneity of NHL and to understand the relationships between transcriptional heterogeneity and the cellular phenotypes of NHL. We observed the presence of two major B-cell populations in primary NHL samples. The major B-cell population exhibited stem cell-like properties, including a high proliferative index and the expression of stem cell-related genes. The minor B-cell population was enriched in the germinal center (GC)-like B cells and was characterized by a high expression of GC-related genes. We also identified a minor T-cell population in NHL. The minor T-cell population exhibited a high proliferative index and the expression of activation-related genes. Finally, we observed the presence of an activated B-cell population with stem cell-like properties in the GC. Our findings indicate that the cellular heterogeneity of NHL is characterized by the presence of two major B-cell populations with distinct cellular phenotypes and that the GC-like B cells may contribute to the development of GC.

# Results

# Single-Cell Transcriptomic Profiling of Primary Non-Hodgkin Lymphoma

To identify the cellular heterogeneity of NHL, we obtained the scRNA-seq data from three patients with NHL (Supplementary Table 1). After quality control, we obtained 25,324 cells from three patients. The average number of genes expressed in each cell was 5,564 (range: 2,600 to 8,767). We then identified 10 cell types based on the expression of canonical cell-type markers (Supplementary Table 2). The proportions of each cell type in the three patients are shown in Figure 1A. B cells were the most abundant cell type in all three patients. We then performed a differential expression analysis between the B cells of the three patients. We found that the B cells from patient 1 exhibited the highest number of differentially expressed genes compared with the B cells from patients 2 and 3 (Figure 1B). We identified 4,743 genes that were differentially expressed in the B cells of patient 1 compared with the B cells of patients 2 and 3 (Figure 1C).

FIGURE 1: Single-cell transcriptomic profiling of primary non-Hodgkin lymphoma. (A) The proportions of each cell type in the three patients. (B) The number of differentially expressed genes between the B cells of patient 1 and the B cells of patients 2 and 3. (C) Volcano plot of the differentially expressed genes between the B cells of patient 1 and the B cells of patients 2 and 3. The red dots indicate the genes that are upregulated in the B cells of patient 1. (D) Heatmap of the expression of the genes that are differentially expressed between the B cells of patient 1 and the B cells of patients 2 and 3. (E) Expression of the differentially expressed genes between the B cells of patient 1 and the B cells of patients 2 and 3. (F) Gene set enrichment analysis of the differentially expressed genes between the B cells of patient 1 and the B cells of patients 2 and 3. (G) Heatmap of the expression of the genes that are differentially expressed between the T cells of patient 1 and the T cells of patients 2 and 3. (H) Expression of the differentially expressed genes between the T cells of patient 1 and the T cells of patients 2 and 3. (I) Gene set enrichment analysis of the differentially expressed genes between the T cells of patient 1 and the T cells of patients 2 and 3. (J) The proportion of the cells with stem cell-like properties in the B cells of patient 1. (K) The proportion of the cells with stem cell-like properties in the T cells of patient 1. (L) The proportion of the cells with stem cell-like properties in the B cells of patients 2 and 3. (M) The proportion of the cells with stem cell-like properties in the T cells of patients 2 and 3. (N) The proportion of the cells with GC-like properties in the B cells of patient 1. (O) The proportion of the cells with GC-like properties in the T cells of patient 1. (P) The proportion of the cells with GC-like properties in the B cells of patients 2 and 3. (Q) The proportion of the cells with GC-like properties in the T cells of patients 2 and 3. (R) The proportion of the cells with memory-like properties in the B cells of patient 1. (S) The proportion of the cells with memory-like properties in the T cells of patient 1. (T) The proportion of the cells with memory-like properties in the B cells of patients 2 and 3. (U) The proportion of the cells with memory-like properties in the T cells of patients 2 and 3.

# Characterization of the B-Cell Population in Primary Non-Hodgkin Lymphoma

To investigate the cellular heterogeneity of NHL, we first characterized the B-cell population. The B cells of patient 1 exhibited the highest number of differentially expressed genes compared with the B cells of patients 2 and 3 (Figure 1B). Therefore, we focused on the B cells of patient 1 in subsequent analyses. We identified 4,743 genes that were differentially expressed in the B cells of patient 1 compared with the B cells of patients 2 and 3 (Figure 1C). We found that the B cells of patient 1 exhibited higher expression of the stem cell-related genes, including LGR5, CD133, CD24, and CD34, compared with the B cells of patients 2 and 3 (Figure 1D). Moreover, the B cells of patient 1 also exhibited higher expression of the GC-related genes, including PAX5, BCL6, and IRF4, compared with the B cells of patients 2 and 3 (Figure 1E). Gene set enrichment analysis (GSEA) revealed that the B cells of patient 1 exhibited enrichment in the gene sets associated with the stem cell-like phenotype (Figure 1F).

We then performed a gene set variation analysis (GSVA) to identify the cell types that exhibited enrichment in the stem cell-like gene sets. We found that the B cells of patient 1 exhibited enrichment in the stem cell-like phenotype (Figure 1G). Moreover, we found that the T cells of patient 1 exhibited enrichment in the stem cell-like phenotype (Figure 1H). We also identified the stem cell-like B cells in the B cells of patients 2

In [32]:
Markdown(model.generate(
    input_text='## 3. Literature Review',
    new_doc=False,
    max_length=2000,
    top_p=0.7))

## 3. Literature Review

This section reviews the existing work on 3D scene understanding and 3D reconstruction, focusing on existing approaches that have been proposed for both monocular and multiview settings.

# 3.1. Monocular Scene Understanding

The task of monocular scene understanding is to infer semantic information from a single image. For instance, the task of scene parsing is to infer the category of each pixel in the image (see Figure 1). Several approaches to this problem have been proposed. [START_REF] Object Detectors Emerge in Deep Scene CNNs, Zhou[END_REF] showed that convolutional neural networks (CNNs) trained on large-scale image datasets such as ImageNet [START_REF] ImageNet Large Scale Visual Recognition Challenge, Russakovsky[END_REF] can be fine-tuned on datasets of images of real-world scenes to achieve performance on the scene parsing task that is on par with the performance of human annotators. More recently, [START_REF] Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation, Papandreou[END_REF] proposed to use weakly-supervised and semi-supervised learning to train a deep CNN on ImageNet [START_REF] ImageNet Large Scale Visual Recognition Challenge, Russakovsky[END_REF] and use the trained CNN to annotate images of the PASCAL VOC 2012 [START_REF] The Pascal Visual Object Classes (VOC) Challenge, Everingham[END_REF] dataset, which is the most popular dataset for scene parsing. In a follow-up work, [START_REF] Semantic contours from inverse detectors, Hariharan[END_REF] proposed to use inverse detectors [START_REF] Inverse Compositional Object Detection with a Generative Model, Brachmann[END_REF] to infer semantic information from an image, where inverse detectors are models that predict a class label and a bounding box for each object in an image.

One approach to the task of scene parsing is to perform semantic segmentation, which is the task of inferring the class label of each pixel in the image (see Figure 1). Several approaches have been proposed to this problem. [START_REF] Learning Hierarchical Features for Scene Labeling, Farabet[END_REF] proposed a CNN to perform semantic segmentation, which uses a CNN to extract hierarchical features from an image and a fully-connected layer to perform classification. [START_REF] Learning Deconvolution Network for Semantic Segmentation, Noh[END_REF] proposed a method to learn a deconvolutional network for semantic segmentation, which uses deconvolutional layers to upsample the features from the fully-connected layer and uses a CNN to predict the class label of each pixel in the image. [START_REF] Fully Convolutional Networks for Semantic Segmentation, Shelhamer[END_REF] proposed to use a fully convolutional network (FCN) [START_REF] Fully Convolutional Networks for Semantic Segmentation, Shelhamer[END_REF] to perform semantic segmentation. An FCN is a CNN that predicts a class label for each pixel in the image, and the FCN is trained end-to-end by minimizing a loss function that combines the cross-entropy loss of the fully-connected layer with the loss function of a convolutional layer. In a follow-up work, [START_REF] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, Chen[END_REF] proposed to use conditional random fields (CRFs) to model the spatial dependencies between the class labels of neighboring pixels. [START_REF] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, Chen[END_REF] proposed to use CNNs to extract task-specific edge features and use a CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] proposed to use atrous convolutions [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] to extract features at multiple scales, which are then fed into a fully-connected CRF to model the dependencies between the class labels of neighboring pixels.

One approach to the task of monocular scene understanding is to perform semantic segmentation, which is the task of inferring the class label of each pixel in the image (see Figure 1). Several approaches have been proposed to this problem. [START_REF] Learning Hierarchical Features for Scene Labeling, Farabet[END_REF] proposed a CNN to perform semantic segmentation, which uses a CNN to extract hierarchical features from an image and a fully-connected layer to perform classification. [START_REF] Learning Deconvolution Network for Semantic Segmentation, Noh[END_REF] proposed a method to learn a deconvolutional network for semantic segmentation, which uses deconvolutional layers to upsample the features from the fully-connected layer and uses a CNN to predict the class label of each pixel in the image. [START_REF] Fully Convolutional Networks for Semantic Segmentation, Shelhamer[END_REF] proposed to use a fully convolutional network (FCN) [START_REF] Fully Convolutional Networks for Semantic Segmentation, Shelhamer[END_REF] to perform semantic segmentation. An FCN is a CNN that predicts a class label for each pixel in the image, and the FCN is trained end-to-end by minimizing a loss function that combines the cross-entropy loss of the fully-connected layer with the loss function of a convolutional layer. In a follow-up work, [START_REF] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, Chen[END_REF] proposed to use conditional random fields (CRFs) to model the spatial dependencies between the class labels of neighboring pixels. [START_REF] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, Chen[END_REF] proposed to use CNNs to extract task-specific edge features and use a CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] proposed to use atrous convolutions [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] to extract features at multiple scales, which are then fed into a fully-connected CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, Chen[END_REF] proposed to use CNNs to extract task-specific edge features and use a CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] proposed to use atrous convolutions [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] to extract features at multiple scales, which are then fed into a fully-connected CRF to model the dependencies between the class labels of neighboring pixels.

One approach to the task of monocular scene understanding is to perform semantic segmentation, which is the task of inferring the class label of each pixel in the image (see Figure 1). Several approaches have been proposed to this problem. [START_REF] Learning Hierarchical Features for Scene Labeling, Farabet[END_REF] proposed a CNN to perform semantic segmentation, which uses a CNN to extract hierarchical features from an image and a fully-connected layer to perform classification. [START_REF] Learning Deconvolution Network for Semantic Segmentation, Noh[END_REF] proposed a method to learn a deconvolutional network for semantic segmentation, which uses deconvolutional layers to upsample the features from the fully-connected layer and uses a CNN to predict the class label of each pixel in the image. [START_REF] Fully Convolutional Networks for Semantic Segmentation, Shelhamer[END_REF] proposed to use a fully convolutional network (FCN) [START_REF] Fully Convolutional Networks for Semantic Segmentation, Shelhamer[END_REF] to perform semantic segmentation. An FCN is a CNN that predicts a class label for each pixel in the image, and the FCN is trained end-to-end by minimizing a loss function that combines the cross-entropy loss of the fully-connected layer with the loss function of a convolutional layer. In a follow-up work, [START_REF] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, Chen[END_REF] proposed to use conditional random fields (CRFs) to model the spatial dependencies between the class labels of neighboring pixels. [START_REF] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, Chen[END_REF] proposed to use CNNs to extract task-specific edge features and use a CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] proposed to use atrous convolutions [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] to extract features at multiple scales, which are then fed into a fully-connected CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, Chen[END_REF] proposed to use CNNs to extract task-specific edge features and use a CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] proposed to use atrous convolutions [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen[END_REF] to extract features at multiple scales, which are then fed into a fully-connected CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, Chen[END_REF] proposed to use CNNs to extract task-specific edge features and use a CRF to model the dependencies between the class labels of neighboring pixels. [START_REF] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully