# Deep Learning 2021-22 / Summer Semester 
## Week 1: Preliminaries

Testing the potential of deep learning presents unique challenges because any single application brings together various disciplines. Applying deep learning requires simultaneously understanding (i) the motivations for casting a problem in a particular way; (ii) the mathematical form of a given model; (iii) the optimization algorithms for fitting the models to data; (iv) the statistical principles that tell us when we should expect our models to generalize to unseen data and practical methods for certifying that they have, in fact, generalized; and (v) the engineering techniques required to train models efficiently, navigating the pitfalls of numerical computing and getting the most out of available hardware. Teaching the critical thinking skills required to formulate problems, the mathematics to solve them, and the software tools to implement those solutions all at once is therefore a challenge in its own right.

### Coding environment

In this class we will mainly use Jupyter notebooks.  
**What is a Jupyter notebook?**
*   An open-source web application
*   Supports over 40 programming languages, including Python and R
*   Allows you to create and share documents containing live code, visualizations, equations, and narrative text (using Markdown)
*   Often used in data science (data cleaning, transformation, simulation, statistical modeling, visualization, machine learning)
*   A manual covering its functionality can be found [here](https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb)
*   Its successor is JupyterLab


**What you need to work with Jupyter notebooks**  
There are multiple options:
* Install Jupyter locally with Anaconda or pip. For the Anaconda route you need to install Anaconda first (note: we recommend installing the full Anaconda distribution, not Miniconda). Here is the link to the [Anaconda resource](https://docs.anaconda.com/anaconda/install/).
* This [blog post](https://garywoodfine.com/set-up-anaconda-jupyter-notebook-tensorflow-for-deep-learning/) explains how to install Jupyter with Anaconda and how to manage environments and Python packages.
* Use [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true), which allows link sharing, uses a standard environment of Python packages and versions, and runs on a Google server.

In this course, we will provide notebooks via Google Colab. If you don't have a Google account, you can download the notebooks and open them on your local machine. (Please note that we cannot provide support for version problems etc.)

Please set up your Jupyter notebooks yourself.

# PyTorch
OK, back to this week's topic: algebra and calculus.   
In machine learning we often deal with large matrices or arrays, for example when analyzing images, sound or sensor data. You can process this type of data with the Python modules NumPy or PyTorch. 

We will start by introducing the tensor, PyTorch's primary tool for storing and transforming numerical data. Tensors can be stored on the CPU or the GPU (GPU operations run asynchronously) and support automatic differentiation. NumPy itself does not provide automatic differentiation.
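To make the relation between the two libraries concrete, here is a minimal sketch (the variable names are our own and purely illustrative). It converts a NumPy array to a tensor and back, and moves the tensor to a GPU if PyTorch can find one:

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])      # a plain NumPy array (float64)
t = torch.from_numpy(a)            # tensor that shares memory with `a`
back = t.numpy()                   # back to NumPy, again without copying

a[0] = 10.0                        # changing the array also changes the tensor
print(t)                           # tensor([10.,  2.,  3.], dtype=torch.float64)

# Pick a GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)
print(t_dev.device)                # cuda:0 or cpu, depending on the machine
```

Because `torch.from_numpy` shares memory with the array, use `torch.tensor(a)` instead when you want an independent copy.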


In [None]:
import torch
import numpy as np
import pandas as pd

### Scalars, vectors, matrices and tensors

Basic mathematical objects for storing data comprise scalars, vectors, matrices and tensors.


*   Scalars are single numbers that are used to measure a quantity, e.g. 12.5 km or 20 degrees Celsius. Mathematical constants are also represented as scalar values.
*   Vectors are ordered lists of scalars (one-dimensional arrays).
*   Matrices are two-dimensional arrays, i.e. 2-dimensional tensors.
*   Tensors are n-dimensional arrays; scalars, vectors and matrices are the special cases with 0, 1 and 2 dimensions.
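The following short sketch illustrates these definitions by printing the number of dimensions and the shape of each kind of object (the names `scalar`, `vector`, `matrix` and `tensor3d` are just illustrative):

```python
import torch

scalar = torch.tensor(12.5)                 # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])      # 1 dimension
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])         # 2 dimensions
tensor3d = torch.zeros(2, 3, 4)             # 3 dimensions

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor3d", tensor3d)]:
    # ndim is the number of dimensions, shape the size along each of them
    print(name, obj.ndim, tuple(obj.shape))
```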


In [None]:
#Scalar: a tensor with zero dimensions

torch.tensor(3)

tensor(3)

In [None]:
#Vectors

x = torch.arange(12, dtype=torch.float64)
print(x)

y = torch.tensor(np.array([1,2,3]))
print(y)

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.],
       dtype=torch.float64)
tensor([1, 2, 3])


In [None]:
#Matrices

M1 = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]]) 
print(M1,'\n')

M2 = torch.zeros((3, 4))
print(M2,'\n')

M3 = torch.ones((3, 4)) 
print(M3)

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]]) 

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]) 

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


In [None]:
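#3D Tensor

t1 = torch.Tensor(2, 3, 4)  #Uninitialized 2x3x4 tensor: the values are whatever was in memory
print(t1,'\n') 

t2 = torch.Tensor(np.random.randint(12, size=(2,3,4)))  #Random integers in [0, 12), stored as float32
print(t2)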

tensor([[[8.7958e-35, 0.0000e+00, 7.0065e-44, 7.0065e-44],
         [6.3058e-44, 6.7262e-44, 7.4269e-44, 6.3058e-44],
         [6.7262e-44, 7.5670e-44, 1.1771e-43, 6.7262e-44]],

        [[7.9874e-44, 8.1275e-44, 7.4269e-44, 7.0065e-44],
         [8.1275e-44, 6.8664e-44, 7.1466e-44, 6.4460e-44],
         [7.0065e-44, 7.8473e-44, 7.2868e-44, 7.1466e-44]]]) 

tensor([[[ 6.,  3.,  9.,  0.],
         [ 4.,  6.,  9.,  1.],
         [ 4.,  7., 10.,  3.]],

        [[ 0.,  1., 10.,  0.],
         [ 1.,  3.,  9.,  2.],
         [ 4.,  9.,  0.,  8.]]])


### Attributes of Tensors

Tensors have certain attributes, for example a shape, a number of dimensions and the datatype of their values. You can check these attributes in the following way:

In [None]:
X = torch.Tensor(2, 3, 4)

print('Shape attribute:')
print(X.shape, '\n')

print('Number of dimensions:')
print(X.ndim, '\n')

print('Datatype of elements:')
print(X.dtype, '\n')

Shape attribute:
torch.Size([2, 3, 4]) 

Number of dimensions:
3 

Datatype of elements:
torch.float32 
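Two related operations are often useful alongside these attributes: counting the elements and converting the datatype. A small sketch, using an illustrative tensor `Z` that is not part of the cells above:

```python
import torch

Z = torch.zeros(2, 3, 4)        # float32 by default
n_elements = Z.numel()          # total number of elements: 2*3*4 = 24
print(n_elements)

Z_int = Z.to(torch.int64)       # returns a converted tensor; Z itself is unchanged
print(Z_int.dtype)              # torch.int64
print(Z.dtype)                  # torch.float32
```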



### Indexing to access single elements

In [None]:
X = torch.Tensor(np.random.randint(12, size=(2,3,4)))
print(X, '\n')

print(X[0,1,2], '\n')  #first matrix, second row, third column

print(X[0,2,-1]) #first matrix, third row, last column

tensor([[[ 8.,  2.,  3.,  1.],
         [ 5.,  0.,  4.,  4.],
         [ 1., 10.,  8.,  8.]],

        [[ 0.,  5.,  8., 11.],
         [ 0.,  0.,  7.,  6.],
         [ 9.,  2., 10.,  7.]]]) 

tensor(4.) 

tensor(8.)


### Slicing
If we have matrices or higher-dimensional arrays, we can extract slices, meaning one or multiple elements, from them. Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.
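For the one-dimensional case, a minimal sketch with an illustrative vector `v` looks like this:

```python
import torch

v = torch.arange(10)    # tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(v[:3])            # first three elements: tensor([0, 1, 2])
print(v[7:])            # from index 7 to the end: tensor([7, 8, 9])
print(v[2:8:2])         # indices 2, 4 and 6: tensor([2, 4, 6])
print(v[::3])           # every third element: tensor([0, 3, 6, 9])
```

Unlike Python lists and NumPy arrays, PyTorch tensors do not accept a negative step; to reverse a dimension you can use `torch.flip` instead.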

In [None]:
print(X,'\n\n')

print(X[0,0:1],'\n\n')      #First matrix, first row (kept as a 1x4 matrix)

print(X[0, :, -1],'\n\n')   #Last column of the first matrix
 
print(X[:, 1::2],'\n\n')    #Every other row of each matrix, starting at row index 1

tensor([[[ 2.,  9.,  4.,  6.],
         [ 7.,  4.,  3.,  1.],
         [ 8.,  1.,  2., 10.]],

        [[ 7.,  6.,  2.,  0.],
         [ 0.,  5.,  2.,  0.],
         [ 7.,  2.,  7., 11.]]]) 


tensor([[2., 9., 4., 6.]]) 


tensor([ 6.,  1., 10.]) 


tensor([[[7., 4., 3., 1.]],

        [[0., 5., 2., 0.]]]) 




### Reshaping Arrays

Changing the shape of a tensor is a common operation in machine learning. 
The most flexible way of doing this is with the ``reshape`` method. The new shape must hold exactly as many elements as the original tensor; otherwise you will get an error.
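The element-count rule can be sketched as follows. The tensor `T` is illustrative; passing `-1` for one dimension asks PyTorch to infer its size from the others:

```python
import torch

T = torch.arange(24).reshape(2, 3, 4)   # 24 elements

# -1 asks PyTorch to infer that dimension from the remaining ones
rows_of_four = T.reshape(-1, 4)
flat = T.reshape(-1)
print(rows_of_four.shape)    # torch.Size([6, 4])
print(flat.shape)            # torch.Size([24])

# 24 elements cannot fill a 5x5 tensor, so this raises a RuntimeError
reshape_failed = False
try:
    T.reshape(5, 5)
except RuntimeError as err:
    reshape_failed = True
    print("Reshape failed:", err)
```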


In [None]:
#Let's turn our 3-dimensional tensor with shape (2,3,4) into a 2-dimensional (6,4) tensor
print('First, look at X: \n', X, '\n')

print( "X's shape is:", X.shape , '\n') #2*3*4 = 24 elements
Xr = X.reshape((6, 4))
print('Reshaped X into Xr with size 6x4 becomes: \n', Xr, '\n')

#Another example from 2D into 3D
print("Reshaping back into 3D but into a different sized tensor: \n", Xr.reshape((2,4,3)), '\n' )

# Another common reshaping pattern is the conversion of a one-dimensional array/tensor into a two-dimensional row or column matrix.
X1 = torch.Tensor([1, 2, 3])
print(X1, '\n', X1.shape, '\n')

#row vector via reshape
print(X1.reshape((1, 3)) , '\n')

# column vector via reshape
print(X1.reshape((3, 1)) , '\n') # 2D (3,1)

First, look at X: 
 tensor([[[ 8.,  2.,  3.,  1.],
         [ 5.,  0.,  4.,  4.],
         [ 1., 10.,  8.,  8.]],

        [[ 0.,  5.,  8., 11.],
         [ 0.,  0.,  7.,  6.],
         [ 9.,  2., 10.,  7.]]]) 

X's shape is: torch.Size([2, 3, 4]) 

Reshaped X into Xr with size 6x4 becomes: 
 tensor([[ 8.,  2.,  3.,  1.],
        [ 5.,  0.,  4.,  4.],
        [ 1., 10.,  8.,  8.],
        [ 0.,  5.,  8., 11.],
        [ 0.,  0.,  7.,  6.],
        [ 9.,  2., 10.,  7.]]) 

Reshaping back into 3D but into a different sized tensor: 
 tensor([[[ 8.,  2.,  3.],
         [ 1.,  5.,  0.],
         [ 4.,  4.,  1.],
         [10.,  8.,  8.]],

        [[ 0.,  5.,  8.],
         [11.,  0.,  0.],
         [ 7.,  6.,  9.],
         [ 2., 10.,  7.]]]) 

tensor([1., 2., 3.]) 
 torch.Size([3]) 

tensor([[1., 2., 3.]]) 

tensor([[1.],
        [2.],
        [3.]]) 



### Arithmetics

You can apply arithmetic operations to entire tensors; the operation is then applied to each element.
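The same holds when both operands are tensors of the same shape: the operation combines the elements at matching positions. A short sketch with two illustrative vectors:

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([10.0, 20.0, 30.0])

print(u + v)    # tensor([11., 22., 33.])
print(v - u)    # tensor([ 9., 18., 27.])
print(u * v)    # tensor([10., 40., 90.])
print(v / u)    # tensor([10., 10., 10.])
```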

In [None]:
print('X1: ', X1, '\n')

#Multiplication and Division
print("Multiplied by 2: ",X1*2, '\n') #Multiplication
print("Divided by 2: ",X1/2, '\n') #Division
print("Modulo 2: ",X1%2, '\n') #Modulo
print("X1 // 2 =", X1 // 2, '\n') # integer (floor) division

#Exponents
print("e^X1 =",torch.exp(X1), '\n')  #e^X1
print("X1^3 =",torch.pow(X1, 3), '\n') #X1^3

#Logarithms
print("log(X1) =",torch.log(X1), '\n')
print("log2(X1) =",torch.log2(X1), '\n')
print("log10(X1) =",torch.log10(X1), '\n')

X1:  tensor([1., 2., 3.]) 

Multiplied by 2:  tensor([2., 4., 6.]) 

Divided by 2:  tensor([0.5000, 1.0000, 1.5000]) 

Modulo 2:  tensor([1., 0., 1.]) 

X1 // 2 = tensor([0., 1., 1.]) 

e^X1 = tensor([ 2.7183,  7.3891, 20.0855]) 

X1^3 = tensor([ 1.,  8., 27.]) 

log(X1) = tensor([0.0000, 0.6931, 1.0986]) 

log2(X1) = tensor([0.0000, 1.0000, 1.5850]) 

log10(X1) = tensor([0.0000, 0.3010, 0.4771]) 





### Matrix Operations 
Typical operations on matrices are transposition and computing the determinant, the inverse and the norm. They can be implemented as follows:  


Operation  | Math | PyTorch
-------------------|------------------|--------  
Transposition | $A^T$ |  `A.t()`, `A.transpose(dim0, dim1)`
Determinant | $\lvert A \rvert$ | `torch.det(A)`
Inverse | $A^{-1}$| `torch.inverse(A)`
Norm | $\lVert A \rVert$ | `torch.linalg.norm(A)`

#### Transpose of a matrix
The transpose flips a matrix over its diagonal, so that rows become columns and vice versa.

In PyTorch, you can apply the ``t()`` method to a matrix (a 2-dimensional tensor). 
For tensors with more dimensions, use ``transpose()``, which requires that you specify the two dimensions you want to swap.

In [None]:
#Transpose matrices with t()
A = torch.arange(9).reshape(3,3)
print(A, '\n')

print('Transpose: ')
print( A.t(), '\n' )

#For 3-dimensional (or more) tensors you can use transpose()
A3 = torch.Tensor(np.random.randint(12, size=(2,3,4)))
print(A3, '\n')
torch.transpose(A3, 0, 1) #Indicate the two dimensions to be swapped

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]) 

Transpose: 
tensor([[0, 3, 6],
        [1, 4, 7],
        [2, 5, 8]]) 

tensor([[[ 6.,  4.,  8.,  7.],
         [ 4., 11.,  8.,  6.],
         [ 0.,  2.,  9.,  4.]],

        [[10.,  7.,  5.,  0.],
         [ 4.,  9.,  0.,  5.],
         [ 0.,  2., 11.,  3.]]]) 



tensor([[[ 6.,  4.,  8.,  7.],
         [10.,  7.,  5.,  0.]],

        [[ 4., 11.,  8.,  6.],
         [ 4.,  9.,  0.,  5.]],

        [[ 0.,  2.,  9.,  4.],
         [ 0.,  2., 11.,  3.]]])
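To see what swapping two dimensions does to the shape and to individual elements, consider this small sketch with an illustrative tensor `T`:

```python
import torch

T = torch.arange(24).reshape(2, 3, 4)
T_swapped = torch.transpose(T, 0, 1)     # swap dimensions 0 and 1

print(T.shape)            # torch.Size([2, 3, 4])
print(T_swapped.shape)    # torch.Size([3, 2, 4])

# Element [i, j, k] of T sits at [j, i, k] after the swap
same_element = bool(T[1, 2, 3] == T_swapped[2, 1, 3])
print(same_element)       # True

# Swapping the same two dimensions again restores the original tensor
restored = torch.equal(torch.transpose(T_swapped, 0, 1), T)
print(restored)           # True
```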

#### The determinant of a matrix
The determinant is a scalar value that characterizes certain properties of a square matrix; for example, a matrix is invertible exactly when its determinant is non-zero. The determinant of a matrix A is denoted as $|A|$. 

The determinant of a 2x2 matrix can be defined as:  
$|A| = 
\begin{vmatrix}
a & b \\ 
c & d \\
\end{vmatrix} = ad-bc$   

In [None]:
A = torch.arange(4, dtype=float).reshape(2,2)
print(A)
torch.det(A)

tensor([[0., 1.],
        [2., 3.]], dtype=torch.float64)


tensor(-2., dtype=torch.float64)
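As a quick check of the 2x2 formula, the following sketch computes $ad - bc$ by hand for the same matrix and compares it with `torch.det`:

```python
import torch

A = torch.tensor([[0.0, 1.0],
                  [2.0, 3.0]], dtype=torch.float64)

a, b = A[0, 0], A[0, 1]
c, d = A[1, 0], A[1, 1]
det_manual = a * d - b * c          # 0*3 - 1*2 = -2

print(det_manual)                                     # tensor(-2., dtype=torch.float64)
matches = bool(torch.isclose(det_manual, torch.det(A)))
print(matches)                                        # True
```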

#### Inverse of a matrix

For a 2x2 matrix with non-zero determinant, the inverse is given by:  
$A^{-1} = 
\frac{1}{\det(A)}
\begin{pmatrix}
d & -b \\ 
-c & a \\
\end{pmatrix}$   

In [None]:
torch.inverse(A)

tensor([[-1.5000,  0.5000],
        [ 1.0000,  0.0000]], dtype=torch.float64)
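A useful sanity check is that a matrix multiplied by its inverse gives the identity matrix. The sketch below verifies this for the matrix used above, and also rebuilds the inverse from the 2x2 formula. It uses the `@` operator, which is Python's matrix multiplication operator and is supported by PyTorch tensors:

```python
import torch

A = torch.tensor([[0.0, 1.0],
                  [2.0, 3.0]], dtype=torch.float64)
A_inv = torch.inverse(A)

identity = torch.eye(2, dtype=torch.float64)
right_ok = torch.allclose(A @ A_inv, identity)   # A times its inverse
left_ok = torch.allclose(A_inv @ A, identity)    # inverse times A
print(right_ok, left_ok)                         # True True

# Manual inverse: swap a and d, negate b and c, divide by the determinant
swapped = torch.tensor([[3.0, -1.0],
                        [-2.0, 0.0]], dtype=torch.float64)
manual_ok = torch.allclose(swapped / torch.det(A), A_inv)
print(manual_ok)                                 # True
```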

In [None]:
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()  # Assign a copy of `A` to `B` by allocating new memory

### Norms

Informally, a norm tells us how big a vector, matrix or tensor is.

There is a whole family of so-called p-norms, where p can be any real number $\geq 1$. Important special cases are:
*   p=1: the sum of all absolute values: $\|x\|_1 = \sum^n_{i=1} |x_i|$
*   p=2: the Euclidean norm (or $\ell_2$-norm), calculated as $\|x\|_2 = \sqrt{x_1^2 + \cdots + x_n^2}$ for vectors. For matrices, the corresponding norm is the largest singular value $\sigma_1(A)$.
*   Frobenius norm: defined for matrices as $\sqrt{\sum_{i,j} a_{ij}^2}$ or, equivalently, $\sqrt{\sum_{i}\sigma_i^2(A)}$, where $\sigma_i$ is the i-th singular value of A

In machine learning we often work with the $\ell_1$- and $\ell_2$-norms to improve our models (we will get there later in the semester :-) )

To calculate the norm, we can call ``torch.linalg.norm()`` and specify *p* via the ``ord`` parameter.

In [None]:
print(X1,'\n')

#L1-Norm
print( torch.linalg.norm(X1, ord=1) , '\n')

#L2-Norm
print( torch.linalg.norm(X1, ord=2) , '\n')


tensor([1., 2., 3.]) 

tensor(6.) 

tensor(3.7417) 
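For matrices, the Frobenius norm and the 2-norm can be compared with the singular values directly. This is a minimal sketch on an illustrative diagonal matrix `M`; it assumes a PyTorch version that provides `torch.linalg.svdvals`:

```python
import torch

M = torch.tensor([[3.0, 0.0],
                  [0.0, 4.0]])

fro = torch.linalg.norm(M, ord='fro')    # sqrt(3^2 + 4^2) = 5
spec = torch.linalg.norm(M, ord=2)       # largest singular value, here 4
sigma = torch.linalg.svdvals(M)          # singular values in descending order

print(fro)
print(spec)
print(sigma)

# The Frobenius norm equals the square root of the sum of squared singular values
print(torch.isclose(fro, torch.sqrt(torch.sum(sigma ** 2))))
```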



### Multiplication and Dot product
Element-wise multiplication (often called the Hadamard product) multiplies values at the same position with each other. 

However, one of the most fundamental operations is the dot product.  
Given two vectors $x,y \in \mathbb{R}^d$, their dot product $x^Ty$ is a sum over the products of the elements at the same position: $x^Ty=\sum^d_{i=1}x_iy_i$.

Dot products are useful in a wide range of contexts. For example, given a set of weights $\mathbf{w}$, the weighted sum of some values $\mathbf{u}$ can be expressed as the dot product $\mathbf{u}^T \mathbf{w}$. When the weights are non-negative and sum to one $\left(\sum_{i=1}^{d} {w_i} = 1\right)$, the dot product expresses a *weighted average*. When two vectors each have length one (in the sense of the $\ell_2$-norm from the section on norms above), their dot product equals the cosine of the angle between them.
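Both uses can be sketched in a few lines. The vectors below are made up for illustration: the first pair computes a weighted average, the second pair the cosine of a 45 degree angle:

```python
import torch

values = torch.tensor([1.0, 2.0, 3.0, 4.0])
weights = torch.tensor([0.1, 0.2, 0.3, 0.4])    # non-negative and sum to one
weighted_avg = torch.dot(values, weights)       # 0.1 + 0.4 + 0.9 + 1.6 = 3.0

p = torch.tensor([1.0, 0.0])
q = torch.tensor([1.0, 1.0])
p_unit = p / torch.linalg.norm(p)               # rescale to length one
q_unit = q / torch.linalg.norm(q)
cos_angle = torch.dot(p_unit, q_unit)           # cos(45 degrees), about 0.7071

print(weighted_avg)
print(cos_angle)
```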

Here is an overview of how to implement them.

*   Element-wise multiplication with `*`
*   Vector-vector dot product with `torch.dot()`
*   Matrix-vector product with `torch.mv()`
*   Matrix-matrix product with `torch.mm()`


Below are some examples. Try to reproduce them on paper.

In [None]:
#Element-wise multiplication of two vectors
x1 = torch.Tensor(np.array([2,2,2]))
x2 = torch.Tensor(np.array([1,2,3]))
print(x1*x2)

#Element-wise Matrix-Matrix multiplication
M1 = torch.Tensor(np.array([[2, 2], [3, 4]])) #[first row], [second row]
M2 = torch.Tensor(np.array([[1, 2], [4, 5]]))
print(M1*M2)

#Element-wise product of a vector and a matrix (the vector is broadcast across the rows)
a = torch.Tensor(np.array([3,4]))
print(a*M1)


# Vector-vector dot product results in a scalar
print(torch.dot(x1, x2))

#A matrix-vector dot-product
print('The matrix: \n', M1)

vec = torch.Tensor(np.array([2,2]))
print('The vector: \n', vec)

prod = torch.mv(M1, vec)
print('The product of both: \n', prod)

#Matrix-matrix product
print(torch.mm(M1,M2))

tensor([2., 4., 6.])
tensor([[ 2.,  4.],
        [12., 20.]])
tensor([[ 6.,  8.],
        [ 9., 16.]])
tensor(12.)
The matrix: 
 tensor([[2., 2.],
        [3., 4.]])
The vector: 
 tensor([2., 2.])
The product of both: 
 tensor([ 8., 14.])
tensor([[10., 14.],
        [19., 26.]])


In just this section, we have taught you all the linear algebra that you will need to understand a remarkable chunk of modern deep learning. There is a lot more to linear algebra and a lot of that mathematics is useful for machine learning. For example, matrices can be decomposed into factors, and these decompositions can reveal low-dimensional structure in real-world datasets. There are entire subfields of machine learning that focus on using matrix decompositions and their generalizations to high-order tensors to discover structure in datasets and solve prediction problems. But this course focuses on deep learning. And we believe you will be much more inclined to learn more mathematics once you have gotten your hands dirty deploying useful machine learning models on real datasets. So while we reserve the right to introduce more mathematics much later on, we will wrap up this section here.

### Why do we need Calculus? 

In deep learning, we train models, updating them successively so that they get better and better as they see more and more data. Usually, getting better means minimizing a loss function, a score that answers the question “how bad is our model?” This question is more subtle than it appears. Ultimately, what we really care about is producing a model that performs well on data that we have never seen before. But we can only fit the model to data that we can actually see. Thus we can decompose the task of fitting models into two key concerns: (i) optimization: the process of fitting our models to observed data; (ii) generalization: the mathematical principles and practitioners’ wisdom that guide as to how to produce models whose validity extends beyond the exact set of data examples used to train them.

To help you understand optimization problems and methods in later chapters, here we give a very brief primer on differential calculus that is commonly used in deep learning.

### Automatic Differentiation

PyTorch comes with the ``autograd`` package, which can automatically calculate derivatives. NumPy alone does not offer this. 

You will learn how this works in the next lectures. For now we just want to show you how PyTorch can calculate derivatives.

**A simple example**  
Let's find the derivative of the following function:  
$f(x) = 5x^4 + 3x^3 + 7x^2 + 9x -5$

If we write this out by hand, the first derivative is:  
$f'(x) = 20x^3 + 9x^2 + 14x + 9$

Now, we want to find the value of this derivative where x=2.  
$f'(2) = 160 + 36 + 28 + 9 = 233$

To start, let's create the variable `x` and assign it an initial value.
Because we want to compute the gradient of ``y`` with respect to ``x``, we need a place to store it. We tell PyTorch to track and store the gradient for our tensor with the ``requires_grad=True`` keyword.

In [None]:
import torch

In [None]:
x = torch.tensor([2.0], requires_grad=True) #Defines the value at which we want to compute the derivative
y = 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5 #Defines the function we want to compute the derivative of 
print('x=', x)
print('y=', y)

x= tensor([2.], requires_grad=True)
y= tensor([145.], grad_fn=<SubBackward0>)


In [None]:
y.backward() #Computes the gradient (first derivative) of y w.r.t. x, evaluated at x=2
x.grad #The derivative value is stored in x.grad

tensor([233.])
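One way to convince yourself that this result is right is to compare it with a numerical approximation. The sketch below uses a central finite difference; the helper `f` and the step size `h` are our own illustrative choices:

```python
import torch

def f(x):
    # The same polynomial as above
    return 5 * x**4 + 3 * x**3 + 7 * x**2 + 9 * x - 5

x0 = torch.tensor(2.0, dtype=torch.float64)
h = 1e-5   # step size, chosen for illustration

# Central difference: (f(x0 + h) - f(x0 - h)) / (2h)
numeric_grad = (f(x0 + h) - f(x0 - h)) / (2 * h)

# The same derivative via autograd
x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
f(x).backward()

print(numeric_grad)   # close to 233
print(x.grad)         # the autograd result
```

The finite difference only approximates the derivative and needs two function evaluations per input variable, whereas autograd computes it exactly up to floating-point precision.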

### Data Import

In [None]:
import os

os.makedirs(os.path.join('..', 'data'), exist_ok=True)
data_file = os.path.join('..', 'data', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # Column names
    f.write('NA,Pave,127500\n')  # Each row represents a data example
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')

In [None]:
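data = pd.read_csv(data_file)
print(data,'\n')

#Split into input features and the target column
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
#Replace missing numeric values with the column mean (numeric_only skips the text column 'Alley')
inputs = inputs.fillna(inputs.mean(numeric_only=True))
print(inputs,'\n')

#One-hot encode the categorical column, treating NaN as its own category
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs,'\n')

#Convert to tensors; casting to float also handles the indicator columns
X, y = torch.tensor(inputs.to_numpy(dtype=float)), torch.tensor(outputs.values)
X, y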

   NumRooms Alley   Price
0       NaN  Pave  127500
1       2.0   NaN  106000
2       4.0   NaN  178100
3       NaN   NaN  140000 

   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN 

   NumRooms  Alley_Pave  Alley_nan
0       3.0           1          0
1       2.0           0          1
2       4.0           0          1
3       3.0           0          1 

(tensor([[3., 1., 0.],
         [2., 0., 1.],
         [4., 0., 1.],
         [3., 0., 1.]], dtype=torch.float64),
 tensor([127500, 106000, 178100, 140000]))

### Finding All the Functions and Classes in a Module
In order to know which functions and classes can be called in a module, we invoke the `dir` function. For instance, we can query all names in the module of probability distributions, which we can use for generating random numbers:

In [None]:
print(dir(torch.distributions))

['AbsTransform', 'AffineTransform', 'Bernoulli', 'Beta', 'Binomial', 'CatTransform', 'Categorical', 'Cauchy', 'Chi2', 'ComposeTransform', 'ContinuousBernoulli', 'CorrCholeskyTransform', 'Dirichlet', 'Distribution', 'ExpTransform', 'Exponential', 'ExponentialFamily', 'FisherSnedecor', 'Gamma', 'Geometric', 'Gumbel', 'HalfCauchy', 'HalfNormal', 'Independent', 'IndependentTransform', 'Kumaraswamy', 'LKJCholesky', 'Laplace', 'LogNormal', 'LogisticNormal', 'LowRankMultivariateNormal', 'LowerCholeskyTransform', 'MixtureSameFamily', 'Multinomial', 'MultivariateNormal', 'NegativeBinomial', 'Normal', 'OneHotCategorical', 'OneHotCategoricalStraightThrough', 'Pareto', 'Poisson', 'PowerTransform', 'RelaxedBernoulli', 'RelaxedOneHotCategorical', 'ReshapeTransform', 'SigmoidTransform', 'SoftmaxTransform', 'StackTransform', 'StickBreakingTransform', 'StudentT', 'TanhTransform', 'Transform', 'TransformedDistribution', 'Uniform', 'VonMises', 'Weibull', 'Wishart', '__all__', '__builtins__', '__cached__'

Generally, we can ignore names that start and end with `__` (special objects in Python) or names that start with a single `_` (usually internal functions). Based on the remaining names, we might hazard a guess that this module offers various probability distributions to sample from, including the uniform distribution (`Uniform`), the normal distribution (`Normal`), and the multinomial distribution (`Multinomial`).
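As a minimal sketch of how two of these classes can be used to generate random numbers (the parameter values and sample shapes are arbitrary illustrations):

```python
import torch
from torch.distributions import Normal, Uniform

torch.manual_seed(0)   # fix the seed so repeated runs give the same draws

normal = Normal(loc=0.0, scale=1.0)      # standard normal distribution
uniform = Uniform(low=0.0, high=1.0)     # uniform distribution on [0, 1)

normal_samples = normal.sample((5,))       # five independent draws
uniform_samples = uniform.sample((2, 3))   # a 2x3 tensor of draws

print(normal_samples.shape)    # torch.Size([5])
print(uniform_samples.shape)   # torch.Size([2, 3])
```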

### Finding the Usage of Specific Functions and Classes
For more specific instructions on how to use a given function or class, we can invoke the `help` function. As an example, let us explore the usage instructions for the `torch.ones` function.

In [None]:
help(torch.ones)


Help on built-in function ones:

ones(...)
    ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    
    Returns a tensor filled with the scalar value `1`, with the shape defined
    by the variable argument :attr:`size`.
    
    Args:
        size (int...): a sequence of integers defining the shape of the output tensor.
            Can be a variable number of arguments or a collection like a list or tuple.
    
    Keyword arguments:
        out (Tensor, optional): the output tensor.
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            Default: if ``None``, uses a global default (see :func:`torch.set_default_tensor_type`).
        layout (:class:`torch.layout`, optional): the desired layout of returned Tensor.
            Default: ``torch.strided``.
        device (:class:`torch.device`, optional): the desired device of returned tensor.
            Default: if ``None``, uses the cur

In a Jupyter notebook, we can use `?` to display the documentation in a separate pane. For example, `list?` shows content that is almost identical to `help(list)`. In addition, if we use two question marks, such as `list??`, the Python code implementing the function is also displayed when it is available.