Workshop 3 - Introduction to PyTorch
==========================

This is based on the [Introduction to PyTorch](https://pytorch.org/tutorials/beginner/basics/intro.html) tutorial and on an older torchtext text classification tutorial that is no longer available.

# PyTorch

PyTorch is one of the most widely used libraries for implementing models in NLP (and in AI more broadly). Today, we'll learn the basics of the library, including the low-level building blocks of all neural networks.

First, if you are not running this within Ed, you may need to uncomment this line to install the libraries we need (note, this can take a little while):

In [74]:
# !pip3 install torch torchvision torchaudio

Now we can import PyTorch and see what version we have.

In [75]:
import torch
print(torch.__version__)

2.6.0


# Tensors

Tensors are a specialized data structure, very similar to arrays and matrices.
In PyTorch, tensors encode:
- The inputs to a model
- The outputs of a model
- A model's parameters (e.g., weights)

Tensors are similar to [NumPy’s](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and
NumPy arrays can often share the same underlying memory. Tensors are also optimized for automatic differentiation (we'll see more about that later in this lab).

Let's import numpy to explore the relationship between the data types:

In [76]:
import numpy as np

### Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:

**Directly from data**

Tensors can be created directly from data. The data type is automatically inferred.



In [77]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
print(data, "has type", x_data.dtype)

float_data = [[1.0, 2.0],[3.0, 4.0]]
xf_data = torch.tensor(float_data)
print(float_data, "has type", xf_data.dtype)

[[1, 2], [3, 4]] has type torch.int64
[[1.0, 2.0], [3.0, 4.0]] has type torch.float32


**From a NumPy array**

Tensors can be created from NumPy arrays.



In [78]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
print("From NumPy ndarray:\n", x_np)
print("From List:\n", x_data)

From NumPy ndarray:
 tensor([[1, 2],
        [3, 4]])
From List:
 tensor([[1, 2],
        [3, 4]])


**From another tensor:**

Tensors can be created from other tensors. The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

These two functions show examples of creating tensors with the same shape as an existing one, but new data.

In [79]:
x_ones = torch.ones_like(x_data)
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float)
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.4174, 0.1495],
        [0.7509, 0.1825]]) 



**From just a set of dimensions:**

`shape` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor. The contents of the tensor are determined by the function called.



In [80]:
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.1405, 0.6700, 0.3726],
        [0.6949, 0.2467, 0.6011]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])


### Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.



In [81]:
tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Operations on Tensors

Our models are implemented as a series of operations on tensors, e.g., multiplying tensors together, and applying non-linear functions to them.

For details on the tensor operations available, see [this page in the PyTorch documentation](https://pytorch.org/docs/stable/torch.html).

Each of these operations can be run on the GPU (typically at higher speeds than on a CPU).

Beyond the operations shown below, some other useful ones include:
- `torch.Tensor.view(-1)`, which flattens a tensor into a single dimension, useful for example when converting a batched input into a single flat input [documentation](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html)
- `torch.squeeze`, which removes dimensions of size 1 [documentation](https://pytorch.org/docs/stable/generated/torch.squeeze.html)
- `torch.unsqueeze`, which adds a dimension of size 1 [documentation](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html)
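
As a quick illustration of these three reshaping helpers, the sketch below tracks how the shape changes at each step. The tensor `t` and the variable names are made up for this example.

```python
import torch

# A small 2x3 tensor holding the values 0..5 (illustrative data only)
t = torch.arange(6).reshape(2, 3)

flat = t.view(-1)                 # collapse every dimension into one: shape (6,)
with_extra = t.unsqueeze(0)       # add a size-1 dimension at the front: shape (1, 2, 3)
restored = with_extra.squeeze(0)  # remove that size-1 dimension again: shape (2, 3)

print(flat.shape, with_extra.shape, restored.shape)
```

None of these calls changes the values stored in the tensor; only the shape used to address them differs.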

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the
``.to`` method (after checking for GPU availability). Keep in mind that copying large tensors
across devices can be expensive in terms of time and memory!



In [82]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
    print("GPU is available, moving tensor to it")
    tensor = tensor.to('cuda')
else:
    print("No GPU available")

No GPU available


Now let's see how a few operations work.

**Indexing:**

Indexing into a tensor is similar to accessing an element of a list. The tensor has a multidimensional structure and you can access parts of that structure.

In [83]:
tensor = torch.rand(4, 4)
print("Complete tensor:")
print(tensor)
print('\nFirst row:')
print(tensor[0])
print('\nType of the first row:')
print(type(tensor[0]))

Complete tensor:
tensor([[0.4130, 0.0920, 0.6786, 0.9053],
        [0.5459, 0.9422, 0.6318, 0.0081],
        [0.9379, 0.4625, 0.5738, 0.5780],
        [0.0154, 0.7319, 0.2653, 0.9714]])

First row:
tensor([0.4130, 0.0920, 0.6786, 0.9053])

Type of the first row:
<class 'torch.Tensor'>


**Slicing:**

Sometimes we want to get a part of the tensor that does not correspond to one chunk we can index into. For example, in a 2-D tensor we may want to get a column of the tensor:

In [84]:
tensor = torch.rand(4, 4)
print("Complete tensor:")
print(tensor)
print('\nFirst column: ', tensor[:, 0])
print('\nLast column:', tensor[..., -1])

Complete tensor:
tensor([[0.9030, 0.9575, 0.7527, 0.1481],
        [0.6175, 0.9153, 0.1670, 0.2877],
        [0.1650, 0.1907, 0.1475, 0.2482],
        [0.0515, 0.4364, 0.7500, 0.3689]])

First column:  tensor([0.9030, 0.6175, 0.1650, 0.0515])

Last column: tensor([0.1481, 0.2877, 0.2482, 0.3689])


Note that basic indexing and slicing give you a view of the tensor rather than a copy.

If you modify the slice then the corresponding part of the tensor will be adjusted:

In [85]:
tensor = torch.rand(4, 4)
print("Original tensor")
print(tensor)
print("\nA column of the tensor")
print(tensor[:,1])
tensor[:,1] = 0
print("\nThe updated tensor")
print(tensor)

Original tensor
tensor([[0.1333, 0.5797, 0.3014, 0.2396],
        [0.5929, 0.4686, 0.9913, 0.4748],
        [0.0424, 0.5769, 0.3952, 0.7800],
        [0.7335, 0.4888, 0.2602, 0.3297]])

A column of the tensor
tensor([0.5797, 0.4686, 0.5769, 0.4888])

The updated tensor
tensor([[0.1333, 0.0000, 0.3014, 0.2396],
        [0.5929, 0.0000, 0.9913, 0.4748],
        [0.0424, 0.0000, 0.3952, 0.7800],
        [0.7335, 0.0000, 0.2602, 0.3297]])
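
The difference between a view and a copy can be checked directly. The sketch below is only an illustration with made-up names (`base`, `row_view`, `row_copy`); it uses `clone()` to obtain an independent copy for comparison.

```python
import torch

base = torch.zeros(3, 3)

row_view = base[0]          # basic indexing returns a view that shares storage with base
row_copy = base[0].clone()  # clone() makes an independent copy of the same row

row_view += 1   # in-place change through the view is visible in base
row_copy += 5   # in-place change to the copy leaves base untouched

print(base)
print(row_copy)
```

If you need to modify a slice without touching the original tensor, taking a `clone()` first is one way to do it.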


**Joining tensors**

You can use `torch.cat` to concatenate a sequence of tensors along a given dimension.

In [86]:
tensor1 = torch.rand(2, 3)
tensor2 = torch.rand(2, 3)
print("Initial tensors:")
print(tensor1)
print(tensor2)
print("\nCombine along dimension 0")
combined = torch.cat([tensor1, tensor2], dim=0)
print(combined)
print("\nCombine along dimension 1")
combined = torch.cat([tensor1, tensor2], dim=1)
print(combined)

Initial tensors:
tensor([[0.2825, 0.6795, 0.8872],
        [0.1636, 0.2742, 0.3222]])
tensor([[0.0186, 0.4958, 0.1066],
        [0.3346, 0.6978, 0.4516]])

Combine along dimension 0
tensor([[0.2825, 0.6795, 0.8872],
        [0.1636, 0.2742, 0.3222],
        [0.0186, 0.4958, 0.1066],
        [0.3346, 0.6978, 0.4516]])

Combine along dimension 1
tensor([[0.2825, 0.6795, 0.8872, 0.0186, 0.4958, 0.1066],
        [0.1636, 0.2742, 0.3222, 0.3346, 0.6978, 0.4516]])


We can also combine tensors and create a new dimension in the process with `torch.stack`:

In [None]:
tensor1 = torch.rand(2, 3)
tensor2 = torch.rand(2, 3)
print("Initial tensors:")
print(tensor1)
print(tensor2)
print("\nStacked")
# Stack two tensors, each with 2 rows and 3 columns, along a new leading dimension
combined = torch.stack([tensor1, tensor2], dim=0)
print(combined)

Initial tensors:
tensor([[0.2993, 0.4750, 0.9063],
        [0.5529, 0.8859, 0.9253]])
tensor([[0.9665, 0.6160, 0.7633],
        [0.8730, 0.1644, 0.5906]])

Stacked
tensor([[[0.2993, 0.4750, 0.9063],
         [0.5529, 0.8859, 0.9253]],

        [[0.9665, 0.6160, 0.7633],
         [0.8730, 0.1644, 0.5906]]])
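
One way to keep `torch.cat` and `torch.stack` apart is to compare the shapes they produce. This is a small sketch with illustrative tensors `a` and `b`; the variable names are not part of the workshop code.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat_rows = torch.cat([a, b], dim=0)       # existing dim 0 grows: (4, 3)
cat_cols = torch.cat([a, b], dim=1)       # existing dim 1 grows: (2, 6)
stack_front = torch.stack([a, b], dim=0)  # new leading dimension: (2, 2, 3)
stack_back = torch.stack([a, b], dim=-1)  # new trailing dimension: (2, 3, 2)

for name, value in [("cat_rows", cat_rows), ("cat_cols", cat_cols),
                    ("stack_front", stack_front), ("stack_back", stack_back)]:
    print(name, tuple(value.shape))
```

`torch.cat` keeps the number of dimensions the same, while `torch.stack` always adds one.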


**Arithmetic operations**



In [None]:
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)


# This computes the element-wise product (not the dot product). z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

tensor([[0.0178, 0.0000, 0.0908, 0.0574],
        [0.3516, 0.0000, 0.9827, 0.2255],
        [0.0018, 0.0000, 0.1562, 0.6084],
        [0.5381, 0.0000, 0.0677, 0.1087]])
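
Matrix multiplication, element-wise multiplication, and the dot product are easy to mix up, so here is a small worked example with hand-checkable numbers. The tensors `m` and `v` are made up for illustration.

```python
import torch

m = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
v = torch.tensor([1.0, 2.0, 3.0])

matrix_product = m @ m.T     # rows of m combined with rows of m: [[5, 11], [11, 25]]
elementwise = m * m          # each entry squared: [[1, 4], [9, 16]]
dot_product = torch.dot(v, v)  # sum of element-wise products of two 1-D tensors: 14

print(matrix_product)
print(elementwise)
print(dot_product)
```

The dot product is the element-wise product followed by a sum, and `torch.dot` only accepts 1-D tensors.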

**Single-element tensors** If you have a one-element tensor, for example by aggregating all
values of a tensor into one value, you can convert it to a Python
numerical value using ``item()``:



In [89]:
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

5.274263381958008 <class 'float'>


--------------




### Connect with NumPy [Optional]

It is possible to access the same memory as both a PyTorch tensor and a NumPy array. This allows the data to be modified by either library.

In [None]:
t = torch.ones(5)
print(f"t: {t} (PyTorch tensor)")
n = t.numpy()
print(f"n: {n} (NumPy array)")

# access the same memory
print("\nAdd one to the PyTorch tensor and we now have:")
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")
print("\nAdd one to the NumPy array and we now have:")
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

print("\nWe can go the other way too:")
n = np.ones(5)
t = torch.from_numpy(n)
print(f"t: {t}")
print(f"n: {n}")
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.]) (PyTorch tensor)
n: [1. 1. 1. 1. 1.] (NumPy array)

Add one to the PyTorch tensor and we now have:
t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]

Add one to the NumPy array and we now have:
t: tensor([3., 3., 3., 3., 3.])
n: [3. 3. 3. 3. 3.]

We can go the other way too:
t: tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
n: [1. 1. 1. 1. 1.]
t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
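
Not every conversion shares memory. As a hedged illustration (variable names are made up), the sketch below contrasts `torch.from_numpy`, which shares the array's memory, with `torch.tensor`, which copies the data.

```python
import numpy as np
import torch

n = np.zeros(3)

shared = torch.from_numpy(n)  # shares memory with the NumPy array
copied = torch.tensor(n)      # copies the data into a new tensor

n[0] = 5.0  # modify the NumPy array in place

print(shared)  # reflects the change
print(copied)  # still all zeros
```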


# Datasets & DataLoaders


Code for processing data samples can get messy and hard to maintain; we ideally want our dataset code
to be decoupled from our model training code for better readability and modularity.
PyTorch provides two data primitives: `torch.utils.data.DataLoader` and `torch.utils.data.Dataset`
that allow you to use pre-loaded datasets as well as your own data.
`Dataset` stores the samples and their corresponding labels, and `DataLoader` wraps an iterable around
the `Dataset` to enable easy access to the samples.

## Creating a Custom Dataset for your files [Optional Reading]

We are going to load the same data as from Workshop 2, but this time into PyTorch, rather than scikit-learn.

A custom Dataset class must implement three functions: `__init__`, `__len__`, and `__getitem__`.

In the next few sections, we'll explain each function so you can implement the data reader at the bottom of this section.

### `__init__`

The `__init__` function is run once when instantiating the Dataset object. We read the data file and store the relevant parts. We also record some transforms, which will be covered in more detail in the next section.

In [91]:
def __init__(self, json_file, transform=None, target_transform=None):
    self.labels = []
    self.inputs = []
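    # Each line of the file is one JSON object with parallel lists of
    # messages and sender labels
    with open(json_file) as f:
        for line in f:
            data = json.loads(line.strip())
            for msg, tlabel in zip(data['messages'], data['sender_labels']):
                self.inputs.append(msg)
                # A truthy sender label becomes class 0, a falsy one class 1
                self.labels.append(0 if tlabel else 1)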
    self.transform = transform
    self.target_transform = target_transform

### `__len__`

The `__len__` function returns the number of samples in our dataset.



In [92]:
def __len__(self):
    return len(self.labels)

### `__getitem__`

The `__getitem__` function loads and returns a sample from the dataset at the given index `idx`.
Based on the index, it gets the data, calls the transform functions on them (if applicable), and returns the
text and a corresponding label in a tuple.

In many cases, there is further preprocessing that occurs here, so we can provide a tensor rather than strings.

In [93]:
def __getitem__(self, idx):
    text = self.inputs[idx]
    label = self.labels[idx]
    if self.transform:
        text = self.transform(text)
    if self.target_transform:
        label = self.target_transform(label)
    return text, label

### The complete data loader [Optional Reading]

Below we have put these pieces together to make the dataset loader.

In [94]:
import json

from torch.utils.data import Dataset

class CustomTextDataset(Dataset):
    def __init__(self, json_file, transform=None, target_transform=None):
        self.labels = []
        self.inputs = []
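        # Each line of the file is one JSON object with parallel lists of
        # messages and sender labels
        with open(json_file) as f:
            for line in f:
                data = json.loads(line.strip())
                for msg, tlabel in zip(data['messages'], data['sender_labels']):
                    self.inputs.append(msg)
                    # A truthy sender label becomes class 0, a falsy one class 1
                    self.labels.append(0 if tlabel else 1)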
        self.transform = transform
        self.target_transform = target_transform
        print("Read", len(self.labels), len(self.inputs))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        text = self.inputs[idx]
        label = self.labels[idx]
        if self.transform:
            text = self.transform(text)
        if self.target_transform:
            label = self.target_transform(label)
        return text, label

## Preparing the data for training with DataLoaders [Optional Reading]

The `Dataset` retrieves our dataset's features and labels one sample at a time. While training a model, we typically want to
pass samples in "minibatches", reshuffle the data at every epoch to reduce model overfitting, and use Python's `multiprocessing` to speed up data retrieval.

`DataLoader` is an iterable that abstracts this complexity for us in an easy API.

In [95]:
from torch.utils.data import DataLoader

def get_dataloader(filename):
    dataset = CustomTextDataset(filename)
    dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
    return dataloader, dataset

train_dataloader, train_dataset = get_dataloader("mod-train.jsonl")
validation_dataloader, validation_dataset = get_dataloader("mod-validation.jsonl")
test_dataloader, test_dataset = get_dataloader("mod-test.jsonl")

Read 11766 11766
Read 2782 2782
Read 2741 2741


## Iterate through the DataLoader [Optional Reading]

We have loaded the dataset into the ``DataLoader`` and can iterate through it as needed.
Each iteration below returns a batch of texts and labels (``training_text_batch`` and ``training_label_batch``, each containing ``batch_size=64`` items).
Because we specified ``shuffle=True``, the data is reshuffled each time we start a new pass over the batches (for finer-grained control over
the data loading order, take a look at [Samplers](https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler)).



In [96]:
training_text_batch, training_label_batch = next(iter(train_dataloader))
print("First text:")
print(training_text_batch[0])
print("\nFirst label:")
print(training_label_batch)
print("\nNumber of labels:")
print(training_label_batch.size())

First text:
For several reasons: If we take Bulgaria this season they're pushed back into Constantinople with the possibility of three units supporting and not even close to enough units for us to push through them. If we take Constantinople and Bulgaria simultaneously in Fall, we not only deprive them of an additional build, we also put ourselves in the position to destroy them in the next year completely.

First label:
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Number of labels:
torch.Size([64])
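
To experiment with batching without the workshop data files, a tiny in-memory dataset can stand in for `CustomTextDataset`. Everything in this sketch (the `ToyTextDataset` class, the texts, and the labels) is hypothetical and only meant to show how `batch_size` splits the data.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyTextDataset(Dataset):
    """Hypothetical in-memory dataset of (text, label) pairs."""

    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.texts[idx], self.labels[idx]


toy_dataset = ToyTextDataset(
    ["attack now", "hold the line", "trust me", "move south", "i will help"],
    [0, 0, 1, 0, 1],
)
toy_loader = DataLoader(toy_dataset, batch_size=2, shuffle=False)

batch_sizes = []
for texts, labels in toy_loader:
    # texts is a sequence of strings; labels has been collated into a tensor
    batch_sizes.append(len(texts))
    print(list(texts), labels)
```

With five samples and `batch_size=2`, the last batch is smaller than the others; `DataLoader` has a `drop_last` argument for cases where that matters.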


--------------




### Prepare data processing pipelines [Optional Reading]

Our data is currently strings, but a neural network needs numbers. To convert to numbers we will create a vocabulary, which maps from tokens to integer IDs.

In [97]:
train_iter = CustomTextDataset("mod-train.jsonl")

def yield_tokens(data_iter):
    for text, _ in data_iter:
        for token in text.split():
            yield token

class Vocab(object):
    def __init__(self, token_iter):
        self.str_to_num = {"<unk>": 0}
        self.num_to_str = ["<unk>"]
        for token in token_iter:
            if token not in self.str_to_num:
                self.str_to_num[token] = len(self.str_to_num)
                self.num_to_str.append(token)

    def __call__(self, tokens):
        return [self.str_to_num.get(t, 0) for t in tokens]

    def __len__(self):
        return len(self.num_to_str)

vocab = Vocab(yield_tokens(train_iter))

print(vocab(['here', 'is', 'an', 'example']))

Read 11766 11766
[429, 45, 54, 2286]


Using this vocabulary, we can define functions that will convert strings.

In [98]:
def text_pipeline(text):
    return vocab(text.split())

def label_pipeline(label):
    return int(label)

print(text_pipeline('here is the an example'))
print(label_pipeline('10'))

[429, 45, 3, 54, 2286]
10
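
The role of the `<unk>` entry is easiest to see on a tiny vocabulary. The function below is a condensed, hypothetical stand-in for the `Vocab` class above, built from a single made-up sentence.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer ID, reserving 0 for <unk>."""
    str_to_num = {"<unk>": 0}
    for token in tokens:
        if token not in str_to_num:
            str_to_num[token] = len(str_to_num)
    return str_to_num


toy_vocab = build_vocab("the cat sat on the mat".split())

# "dog" never appeared when the vocabulary was built, so it falls back to ID 0
ids = [toy_vocab.get(tok, 0) for tok in "the dog sat".split()]
print(ids)
```

Any word that was not seen when the vocabulary was built is mapped to the same ID, so the model cannot tell unseen words apart.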


### Generate data batch and iterator [Optional Reading]

The data produced by the `DataLoader` above is not ready to be used by a model because all the words are strings. Now, with the preprocessing functions, we can change that.

Before a batch is sent to the model, the `collate_fn` function is applied to the list of samples drawn by the `DataLoader`. The input to `collate_fn` is a batch of `batch_size` samples, and `collate_fn` processes them according to the data processing pipelines declared previously. Make sure that `collate_fn` is declared as a top-level `def`, which ensures that the function is available in each worker.

In this example, the text entries in the original batch are converted to tensors of token IDs, collected in a list, and concatenated into a single tensor for the input of `nn.EmbeddingBag`. The offsets tensor records the index at which each individual sequence begins in the concatenated text tensor. The label tensor holds the labels of the individual text entries.

In [99]:
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def collate_batch(batch):
    label_list, text_list, offsets = [], [], [0]
    for _text, _label in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        offsets.append(processed_text.size(0))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
    text_list = torch.cat(text_list)
    return text_list.to(device), label_list.to(device), offsets.to(device)

train_iter = CustomTextDataset("mod-train.jsonl")
dataloader = DataLoader(
    train_iter, batch_size=8, shuffle=False, collate_fn=collate_batch
)

Read 11766 11766
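
The offset computation in `collate_batch` can be followed by hand on a toy batch. The token IDs below are made up; the calculation mirrors the `offsets[:-1]` and `cumsum` steps used above.

```python
import torch

# Three already-tokenised texts of different lengths (illustrative IDs)
token_ids = [[4, 7, 2], [9], [5, 1]]

lengths = [len(ids) for ids in token_ids]  # [3, 1, 2]

# Start index of each text: 0, then the running total of the earlier lengths
offsets = torch.tensor([0] + lengths[:-1]).cumsum(dim=0)

# All texts joined end to end into one flat tensor
flat_text = torch.cat([torch.tensor(ids) for ids in token_ids])

print(flat_text)
print(offsets)
```

Each offset marks where a text starts in `flat_text`, so the second text begins at index 3 and the third at index 4.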


# Build the Neural Network

Neural networks are composed of layers/modules that perform operations on data.
The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks you need to
build your own neural network. Every module in PyTorch subclasses [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).
A neural network is a module itself that consists of other modules (layers). This nested structure allows for
building and managing complex architectures easily.

In the following sections, we'll build a neural network to classify text in the dataset.

The model is composed of an `nn.EmbeddingBag` layer plus a linear layer for classification. `nn.EmbeddingBag` with the default mode of “mean” computes the mean value of a “bag” of embeddings. Although the text entries here have different lengths, the `nn.EmbeddingBag` module requires no padding, since the position at which each text starts is saved in offsets.

Additionally, since `nn.EmbeddingBag` accumulates the average across the embeddings on the fly, it can improve the performance and memory efficiency of processing a sequence of tensors.

In [100]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader

## Get Device for Training
We want to be able to train our model on a hardware accelerator like the GPU or MPS,
if available. Let's check to see if [torch.cuda](https://pytorch.org/docs/stable/notes/cuda.html)
or [torch.backends.mps](https://pytorch.org/docs/stable/notes/mps.html) are available, otherwise we use the CPU.

Note: The `nn.EmbeddingBag` is not implemented for `mps` so we will not use it even if it is available. The code is included for completeness here.

In [101]:
device = 'cpu'
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    pass
    #device = "mps"
print(f"Using {device} device")

Using cpu device


## Define the Class
We define our neural network by subclassing ``nn.Module``, and
initialize the neural network layers in ``__init__``. Every ``nn.Module`` subclass implements
the operations on input data in the ``forward`` method.



In [None]:
class TextClassificationModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_class):
        # super(TextClassificationModel, self) looks up the parent class of
        # TextClassificationModel, which is nn.Module, and the call below runs
        # its constructor __init__(). Python 3 also allows the shorter form
        # super().__init__(), which infers the class and instance automatically
        # and has exactly the same effect.
        super(TextClassificationModel, self).__init__()
        # Create the embedding, which goes from token IDs to a vector, which is the mean of the word vectors
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=False)
        # Create a linear layer, which is a matrix of weights we multiply the embedding by to get scores
        self.fc = nn.Linear(embed_dim, num_class)
        # Call the function below to initialise the values of the weights
        self.init_weights()

    def init_weights(self):
        # This function sets the starting / initial value of the weights for each part of the model
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        # This function does the actual computation. When text comes in as a set of token IDs, it runs the embedding, then the linear layer, to get scores for each possible label
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)

We build a model with the embedding dimension of 64. The vocab size is equal to the length of the vocabulary instance. The number of classes is equal to the number of labels. We also set a seed for the random number generator so that results are consistent across students:

In [103]:
train_iter = CustomTextDataset("mod-train.jsonl")
num_class = len(set([label for (text, label) in train_iter]))
vocab_size = len(vocab)
emsize = 64
torch.manual_seed(0)
model = TextClassificationModel(vocab_size, emsize, num_class).to(device)

Read 11766 11766


To use the model, we pass it the input data. This executes the model's ``forward``,
along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866).
Do not call ``model.forward()`` directly!

Calling the model on the input returns a 2-dimensional tensor: dim=0 indexes the inputs in the batch (here, a single sentence), and dim=1 holds the two raw predicted values, one for each class (whether the message is the truth or a lie).
We get the prediction probabilities by passing it through an instance of the ``nn.Softmax`` module.

We haven't trained the model yet, so the code below will run, but its output will be somewhat random.

In [104]:
X = torch.tensor(text_pipeline("This is a sample sentence"), dtype=torch.int64)
X = X.to(device)
offsets = torch.tensor([0])
offsets = offsets.to(device)
logits = model(X, offsets)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([0])
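
To make the softmax step concrete, the sketch below applies it to hand-picked logits instead of model output. The numbers are made up; `torch.softmax` is used here as the functional counterpart of `nn.Softmax`.

```python
import torch

# Raw scores for two inputs (rows) over two classes (columns), illustrative values
logits = torch.tensor([[2.0, 0.5],
                       [-1.0, 3.0]])

probs = torch.softmax(logits, dim=1)  # normalise across the class dimension
row_sums = probs.sum(dim=1)           # each row of probabilities sums to 1
predicted = probs.argmax(dim=1)       # index of the most probable class per row

print(probs)
print(predicted)
```

Softmax does not change which class has the highest score, so taking `argmax` of the logits directly gives the same prediction.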


--------------




## Model Layers

Let's break down the layers in the classification model. To illustrate it, we
will take a sample text input and see what happens as we pass it through the network.


In [None]:
input_text = "This is a sample"
# text_pipeline converts input_text into integer token IDs
processed_text = torch.tensor(text_pipeline(input_text), dtype=torch.int64)
print(processed_text.size())

torch.Size([4])


### nn.EmbeddingBag
We initialize the [nn.EmbeddingBag](https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html) layer, which looks up a vector for each token and averages those vectors into a single vector per input (the minibatch dimension (at dim=0) is maintained).



In [106]:
embed_dim = 128
embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=False)
embedded_text = embedding(processed_text, torch.tensor([0]))
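
To check what the layer computes, the sketch below compares the output of a small `nn.EmbeddingBag` with a manual mean over the same rows of its weight matrix. The sizes and token IDs are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny layer: 10 possible token IDs, 4-dimensional vectors
bag = nn.EmbeddingBag(10, 4, mode="mean")

tokens = torch.tensor([1, 2, 3, 7, 8])  # two texts joined end to end
offsets = torch.tensor([0, 3])          # first text is tokens[0:3], second is tokens[3:]

with torch.no_grad():
    pooled = bag(tokens, offsets)                       # shape (2, 4): one vector per text
    manual_first = bag.weight[tokens[:3]].mean(dim=0)   # average the first text's vectors
    manual_second = bag.weight[tokens[3:]].mean(dim=0)  # average the second text's vectors

print(pooled.shape)
print(torch.allclose(pooled[0], manual_first), torch.allclose(pooled[1], manual_second))
```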

### nn.Linear
The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
is a module that applies a linear transformation on the input using its stored weights and biases.




In [107]:
linear_layer = nn.Linear(embed_dim, num_class)
after_linear = linear_layer(embedded_text)
print(after_linear.size())

torch.Size([1, 2])
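
The linear transformation can be reproduced by hand from the layer's stored `weight` and `bias`. This is a minimal sketch with arbitrary sizes, not the workshop model.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(3, 2)   # weight has shape (2, 3), bias has shape (2,)
inputs = torch.rand(4, 3)  # a batch of 4 vectors with 3 features each

with torch.no_grad():
    layer_output = layer(inputs)
    manual_output = inputs @ layer.weight.T + layer.bias  # same computation written out

print(layer_output.shape)
print(torch.allclose(layer_output, manual_output))
```

Note that `nn.Linear` stores its weight as `(out_features, in_features)`, which is why the manual version needs the transpose.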


## Model Parameters
Many layers inside a neural network are *parameterized*, i.e. have associated weights
and biases that are optimized during training. Subclassing ``nn.Module`` automatically
tracks all fields defined inside your model object, and makes all parameters
accessible using your model's ``parameters()`` or ``named_parameters()`` methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.




In [108]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: TextClassificationModel(
  (embedding): EmbeddingBag(16377, 64, mode='mean')
  (fc): Linear(in_features=64, out_features=2, bias=True)
)


Layer: embedding.weight | Size: torch.Size([16377, 64]) | Values : tensor([[ 0.2645, -0.4655,  0.3043,  0.4324, -0.4327, -0.2976,  0.4373,  0.3005,
         -0.2616, -0.4378, -0.3856,  0.4366,  0.3949, -0.1520, -0.1325, -0.1351,
          0.1171, -0.3044, -0.4334,  0.3032,  0.3322,  0.3822, -0.3429,  0.2422,
         -0.2833, -0.2283,  0.2864,  0.4530,  0.1991, -0.2419,  0.3447, -0.1111,
         -0.1037,  0.1568, -0.2786, -0.4232,  0.4534,  0.3434, -0.3681, -0.1856,
         -0.4708, -0.2233,  0.1724,  0.2424, -0.4557, -0.1700,  0.1951, -0.2215,
          0.1075,  0.2820,  0.3942,  0.0351, -0.0618,  0.3626, -0.0783,  0.0957,
         -0.2375,  0.1727,  0.3676, -0.2435,  0.3653,  0.0121, -0.1461,  0.1690],
        [-0.1514,  0.3692, -0.3844,  0.3541, -0.0966, -0.3555, -0.3974, -0.1039,
         -0.0290,  0.3862,  0.0554, -0.4057, -0
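
Since `parameters()` yields every parameter tensor in a module, it is also a convenient way to count them. The small `nn.Sequential` model below is a hypothetical example, chosen so the total can be verified by hand.

```python
from torch import nn

# Two linear layers: (4*3 weights + 3 biases) + (3*2 weights + 2 biases) = 23 values
small_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

total_params = sum(p.numel() for p in small_model.parameters())
trainable_params = sum(p.numel() for p in small_model.parameters() if p.requires_grad)

for name, param in small_model.named_parameters():
    print(name, tuple(param.shape))
print(total_params, trainable_params)
```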

--------------

# Automatic Differentiation with ``torch.autograd`` [Optional Reading]

When training neural networks, the most frequently used algorithm is
**back propagation**. In this algorithm, parameters (model weights) are
adjusted according to the **gradient** of the loss function with respect
to the given parameter.

To compute those gradients, PyTorch has a built-in differentiation engine
called ``torch.autograd``. It supports automatic computation of gradients for any
computational graph.

Consider the simplest one-layer neural network, with input ``x``,
parameters ``w`` and ``b``, and some loss function. It can be defined in
PyTorch in the following manner:


In [109]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

## Tensors, Functions and Computational graph [Optional Reading]

This code defines a small **computational graph**. In this network, ``w`` and ``b`` are **parameters**, which we need to
optimize. Thus, we need to be able to compute the gradients of loss
function with respect to those variables. In order to do that, we set
the ``requires_grad`` property of those tensors.



<div class="alert alert-info"><h4>Note</h4><p>You can set the value of ``requires_grad`` when creating a
          tensor, or later by using ``x.requires_grad_(True)`` method.</p></div>



A function that we apply to tensors to construct a computational graph is
in fact an object of class ``Function``. This object knows how to
compute the function in the *forward* direction, and also how to compute
its derivative during the *backward propagation* step. A reference to
the backward propagation function is stored in the ``grad_fn`` property of a
tensor. You can find more information about ``Function`` [in the
documentation](https://pytorch.org/docs/stable/autograd.html#function).




In [110]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x10fcc5000>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x10fcc4730>


## Computing Gradients [Optional Reading]

To optimize the parameters of the neural network, we need to
compute the derivatives of our loss function with respect to those parameters,
namely, we need $\frac{\partial loss}{\partial w}$ and
$\frac{\partial loss}{\partial b}$ under some fixed values of
``x`` and ``y``. To compute those derivatives, we call
``loss.backward()``, and then retrieve the values from ``w.grad`` and
``b.grad``:




In [111]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.1321, 0.3029, 0.1494],
        [0.1321, 0.3029, 0.1494],
        [0.1321, 0.3029, 0.1494],
        [0.1321, 0.3029, 0.1494],
        [0.1321, 0.3029, 0.1494]])
tensor([0.1321, 0.3029, 0.1494])


<div class="alert alert-info"><h4>Note</h4><p>- We can only obtain the ``grad`` properties for the leaf
    nodes of the computational graph, which have ``requires_grad`` property
    set to ``True``. For all other nodes in our graph, gradients will not be
    available.
  - We can only perform gradient calculations using
    ``backward`` once on a given graph, for performance reasons. If we need
    to do several ``backward`` calls on the same graph, we need to pass
    ``retain_graph=True`` to the ``backward`` call.</p></div>
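
The gradients returned by autograd can be checked against a hand derivation. For `binary_cross_entropy_with_logits` with its default mean reduction, the derivative with respect to each logit is $(\sigma(z_i) - y_i)/N$; the sketch below rebuilds the same one-layer setup and compares that formula with `w.grad` and `b.grad`. The `expected_*` names are introduced only for this check.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)
loss.backward()

# d(loss)/dz_i = (sigmoid(z_i) - y_i) / N, and z = x @ w + b, so:
expected_b_grad = (torch.sigmoid(z.detach()) - y) / y.numel()
expected_w_grad = torch.outer(x, expected_b_grad)  # each row is scaled by one entry of x

print(torch.allclose(b.grad, expected_b_grad, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```

Because every entry of `x` is 1, all rows of `w.grad` are identical and equal to `b.grad`.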




## Disabling Gradient Tracking

By default, all tensors with ``requires_grad=True`` track their
computational history and support gradient computation. However, there
are some cases when we do not need this, for example, when we have
trained the model and just want to apply it to some input data, i.e. we
only want to do *forward* computations through the network. We can stop
tracking computations by surrounding our computation code with
a ``torch.no_grad()`` block:




In [112]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


Another way to achieve the same result is to use the ``detach()`` method
on the tensor:




In [113]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False


There are reasons you might want to disable gradient tracking:
  - To mark some parameters in your neural network as **frozen parameters**.
  - To **speed up computations** when you are only doing forward pass, because computations on tensors that do
    not track gradients would be more efficient.
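
As an illustration of frozen parameters, the sketch below freezes the first layer of a hypothetical two-layer network (``model_sketch`` is an assumed name, unrelated to the classifier in this workshop) and checks which layers receive gradients.

```python
import torch
from torch import nn

# Hypothetical two-layer model, used only for illustration.
model_sketch = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 2))

# Freeze the first linear layer: its parameters stop tracking gradients.
for param in model_sketch[0].parameters():
    param.requires_grad_(False)

out = model_sketch(torch.ones(1, 4)).sum()
out.backward()

frozen_has_grad = model_sketch[0].weight.grad is not None
trainable_has_grad = model_sketch[2].weight.grad is not None
print(frozen_has_grad, trainable_has_grad)
```

Only the second linear layer ends up with a populated ``.grad``, so an optimizer step would leave the frozen layer unchanged.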



## More on Computational Graphs [Optional Reading]

Conceptually, autograd keeps a record of data (tensors) and all executed
operations (along with the resulting new tensors) in a directed acyclic
graph (DAG) consisting of
[Function](https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
objects. In this DAG, leaves are the input tensors and roots are the output
tensors. By tracing this graph from roots to leaves, you can
automatically compute the gradients using the chain rule.

In a forward pass, autograd does two things simultaneously:

- run the requested operation to compute a resulting tensor
- maintain the operation’s *gradient function* in the DAG.

The backward pass kicks off when ``.backward()`` is called on the DAG
root. ``autograd`` then:

- computes the gradients from each ``.grad_fn``,
- accumulates them in the respective tensor’s ``.grad`` attribute
- using the chain rule, propagates all the way to the leaf tensors.

<div class="alert alert-info"><h4>Note</h4><p>**DAGs are dynamic in PyTorch**
  An important thing to note is that the graph is recreated from scratch; after each
  ``.backward()`` call, autograd starts populating a new graph. This is
  exactly what allows you to use control flow statements in your model;
  you can change the shape, size and operations at every iteration if
  needed.</p></div>
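
A small sketch of what a dynamic graph means in practice: the hypothetical helper ``scaled`` below uses an ordinary Python loop, so the number of recorded operations depends on a run-time argument, and the gradient follows whichever graph was actually built.

```python
import torch

def scaled(x, n_steps):
    # Ordinary Python control flow: graph depth depends on n_steps.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y

x = torch.tensor(1.5, requires_grad=True)

scaled(x, 3).backward()
grad_three = x.grad.item()   # derivative of 8 * x

x.grad = None                # clear the accumulated gradient
scaled(x, 1).backward()
grad_one = x.grad.item()     # derivative of 2 * x

print(grad_three, grad_one)
```

The same function produces two different graphs across the two calls, with no recompilation or redefinition step.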



--------------




# Optimizing Model Parameters

Now that we have a model and data, it's time to train, validate and test our model by optimizing its parameters on
our data. Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates
the error in its guess (*loss*), collects the derivatives of the error with respect to its parameters, and **optimizes** these parameters using gradient descent.


## Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process.
Different hyperparameter values can impact model training and convergence rates
([read more](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html) about hyperparameter tuning).

We define the following hyperparameters for training:
 - **Number of Epochs** - the number of times to iterate over the dataset
 - **Batch Size** - the number of data samples propagated through the network before the parameters are updated
 - **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning, while large values may result in unpredictable behavior during training.




In [114]:
learning_rate = 5
batch_size = 64
epochs = 5
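
To see how batch size and epoch count interact, the arithmetic sketch below counts parameter updates. It assumes the training set has 11766 examples (the size reported by this notebook's data-loading output) and that the final, smaller batch is kept rather than dropped. The variable names are illustrative.

```python
import math

num_samples = 11766   # assumed training-set size, taken from the notebook output
batch = 64
n_epochs = 5

# One parameter update per batch; the last batch may be smaller than `batch`.
updates_per_epoch = math.ceil(num_samples / batch)
total_updates = updates_per_epoch * n_epochs
print(updates_per_epoch, total_updates)
```

Halving the batch size would roughly double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.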

## Optimization Loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each
iteration of the optimization loop is called an **epoch**.

Each epoch consists of two main parts:
 - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
 - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

Let's briefly familiarize ourselves with some of the concepts used in the training loop.

### Loss Function

When presented with some training data, our untrained network is unlikely to give the correct
answer. The **loss function** measures how dissimilar the obtained result is from the target value,
and it is the loss function that we want to minimize during training. To calculate the loss we make a
prediction using the inputs of our given data sample and compare it against the true data label value.

Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and
[nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification.
[nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines `nn.LogSoftmax` and `nn.NLLLoss`.

We pass our model's output logits to `torch.nn.CrossEntropyLoss`, which will normalize the logits and compute the prediction error.



In [115]:
# Initialize the loss function
loss_function = torch.nn.CrossEntropyLoss()
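
The statement that `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss` can be checked numerically. The logits and targets below are made-up values for a batch of three examples and two classes, not outputs of our model.

```python
import torch
from torch import nn

# Hypothetical logits (3 examples, 2 classes) and their true labels.
logits = torch.tensor([[2.0, -1.0], [0.5, 0.5], [-3.0, 1.0]])
targets = torch.tensor([0, 1, 1])

# One step: cross entropy applied directly to raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Two steps: log-softmax over classes, then negative log likelihood.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

same = torch.allclose(ce, nll)
print(ce.item(), nll.item(), same)
```

Because the normalization happens inside the loss, the model itself should return raw logits rather than probabilities.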

### Optimizer

Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in the ``optimizer`` object. Here, we use the SGD optimizer; there are also many [different optimizers](https://pytorch.org/docs/stable/optim.html)
available in PyTorch, such as Adam and RMSprop, that work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.



In [116]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:
 * Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
 * Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter.
 * Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.
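
The three steps above can be sketched on a toy problem. Instead of our text classifier, this assumes a hypothetical single parameter ``w`` and the loss $(w - 3)^2$, whose minimum is at $w = 3$. The learning rate of 0.1 is chosen for this toy problem only.

```python
import torch

# Hypothetical one-parameter "model": minimize (w - 3)^2 with plain SGD.
w = torch.tensor([0.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

losses = []
for _ in range(50):
    opt.zero_grad()                  # 1. reset accumulated gradients
    loss = ((w - 3.0) ** 2).sum()
    loss.backward()                  # 2. compute d(loss)/dw
    opt.step()                       # 3. w <- w - lr * grad
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

The loss shrinks at every step and ``w`` approaches 3. Leaving out ``zero_grad()`` would make each step use the sum of all previous gradients.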



## Full Implementation
We define `train_loop` that loops over our optimization code, and `test_loop` that
evaluates the model's performance against our test data.

We'll also redefine our data loading code here to make sure it is configured correctly (sometimes in the process of experimenting above, it gets modified).


In [117]:
import time

from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def collate_batch(batch):
    label_list, text_list, offsets = [], [], [0]
    for _text, _label in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        offsets.append(processed_text.size(0))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
    text_list = torch.cat(text_list)
    return text_list.to(device), label_list.to(device), offsets.to(device)

train_iter = CustomTextDataset("mod-train.jsonl")
train_dataloader = DataLoader(
    train_iter, batch_size=8, shuffle=False, collate_fn=collate_batch
)
test_iter = CustomTextDataset("mod-test.jsonl")
test_dataloader = DataLoader(
    test_iter, batch_size=8, shuffle=False, collate_fn=collate_batch
)

def train_loop(dataloader, epoch, loss_function, optimizer, model):
    model.train()
    total_acc, total_count = 0, 0
    log_interval = 500
    start_time = time.time()

    for idx, (text, label, offsets) in enumerate(dataloader):
        optimizer.zero_grad()
        predicted_label = model(text, offsets)
        loss = loss_function(predicted_label, label)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        optimizer.step()
        total_acc += (predicted_label.argmax(1) == label).sum().item()
        total_count += label.size(0)
        if idx % log_interval == 0 and idx > 0:
            elapsed = time.time() - start_time
            print(
                "| epoch {:3d} | {:5d}/{:5d} batches "
                "| accuracy {:8.3f}".format(
                    epoch, idx, len(dataloader), total_acc / total_count
                )
            )
            total_acc, total_count = 0, 0
            start_time = time.time()


def test_loop(dataloader, loss_function, model):
    model.eval()
    total_acc, total_count = 0, 0

    with torch.no_grad():
        for idx, (text, label, offsets) in enumerate(dataloader):
            predicted_label = model(text, offsets)
            loss = loss_function(predicted_label, label)
            total_acc += (predicted_label.argmax(1) == label).sum().item()
            total_count += label.size(0)
    return total_acc / total_count

Read 11766 11766
Read 2741 2741
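
The training loop calls ``torch.nn.utils.clip_grad_norm_`` with a maximum norm of 0.1 before each optimizer step. The sketch below shows the effect on a hypothetical parameter ``p`` whose gradient is set by hand to have norm 5. In real training the gradient would come from ``loss.backward()``.

```python
import torch

# Hypothetical parameter with a hand-assigned gradient of norm 5.
p = torch.nn.Parameter(torch.zeros(2))
p.grad = torch.tensor([3.0, 4.0])

# Rescales gradients in place so their combined norm is at most max_norm;
# returns the norm measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_([p], max_norm=0.1).item()
norm_after = p.grad.norm().item()
print(norm_before, norm_after)
```

Clipping changes the length of the gradient but not its direction, which limits the size of any single update even with a large learning rate.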


We initialize the loss function and optimizer, and pass them to ``train_loop`` and ``test_loop``.
Feel free to increase the number of epochs to track the model's improving performance.



In [118]:
# Hyperparameters
learning_rate = 5
batch_size = 64
epochs = 5

loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)
total_accu = None

train_dataset = CustomTextDataset("mod-train.jsonl")
valid_dataset = CustomTextDataset("mod-validation.jsonl")
test_dataset = CustomTextDataset("mod-test.jsonl")
num_train = len(train_dataset)

train_dataloader = DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)
valid_dataloader = DataLoader(
    valid_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)
test_dataloader = DataLoader(
    test_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)

for epoch in range(1, epochs + 1):
    epoch_start_time = time.time()
    train_loop(train_dataloader, epoch, loss_function, optimizer, model)
    accu_val = test_loop(valid_dataloader, loss_function, model)
    if total_accu is not None and total_accu > accu_val:
        scheduler.step()
    else:
        total_accu = accu_val
    print("-" * 59)
    print(
        "| end of epoch {:3d} | time: {:5.2f}s | "
        "valid accuracy {:8.3f} ".format(
            epoch, time.time() - epoch_start_time, accu_val
        )
    )
    print("-" * 59)


Read 11766 11766
Read 2782 2782
Read 2741 2741
-----------------------------------------------------------
| end of epoch   1 | time:  0.29s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   2 | time:  0.24s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   3 | time:  0.27s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   4 | time:  0.21s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   5 | time:  0.24s | valid accuracy    0.955 
-----------------------------------------------------------
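
The loop above calls ``scheduler.step()`` only when validation accuracy drops. To show what one such call does, the sketch below applies ``StepLR`` with ``gamma=0.1`` to a hypothetical single parameter and records the learning rate after each scheduler step. It writes the step size as the integer 1 for clarity.

```python
import torch

# Hypothetical single parameter so the optimizer has something to manage.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.SGD([param], lr=5)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.1)

lrs = [opt.param_groups[0]["lr"]]
for _ in range(2):
    opt.step()      # in real training this follows a backward pass
    sched.step()    # multiply the learning rate by gamma
    lrs.append(opt.param_groups[0]["lr"])

print(lrs)
```

Each scheduler step divides the learning rate by ten, so the starting value of 5 falls to 0.5 and then 0.05.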


# Save and Load the Model

Finally, we will look at how to save and load a model.


## Saving and Loading Model Weights
PyTorch models store the learned parameters in an internal
state dictionary, called ``state_dict``. These can be persisted via the ``torch.save``
method:



In [None]:
# Save the trained model weights so they can be loaded and reused later.
torch.save(model.state_dict(), 'model_weights.pth')

To load model weights, you need to create an instance of the same model first, and then load the parameters
using ``load_state_dict()`` method.



In [None]:
new_model = TextClassificationModel(vocab_size, emsize, num_class) # we do not specify ``weights``, i.e. create untrained model
# Load the weights of the model trained above
new_model.load_state_dict(torch.load('model_weights.pth'))

<All keys matched successfully>
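
For a self-contained view of the same round trip, the sketch below saves and reloads the ``state_dict`` of a hypothetical ``nn.Linear`` layer. It writes to an in-memory buffer rather than ``model_weights.pth`` so that it leaves no file behind. The names ``source`` and ``target`` are illustrative.

```python
import io

import torch
from torch import nn

# Hypothetical small layer standing in for the trained model.
source = nn.Linear(3, 2)
buffer = io.BytesIO()                 # in-memory stand-in for a file on disk
torch.save(source.state_dict(), buffer)

buffer.seek(0)
target = nn.Linear(3, 2)              # same architecture, fresh random weights
target.load_state_dict(torch.load(buffer))

keys = list(source.state_dict().keys())
weights_match = torch.equal(source.weight, target.weight)
print(keys, weights_match)
```

The ``state_dict`` is a mapping from parameter names to tensors, which is why the receiving model must have the same architecture.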

If we plan to use our model, but not train it further, then we call `model.eval()`. This is a standard way to tell the model to set itself up for use, rather than training (e.g., it disables dropout). Failing to do this could yield inconsistent outputs.



In [121]:
new_model.eval()

TextClassificationModel(
  (embedding): EmbeddingBag(16377, 64, mode='mean')
  (fc): Linear(in_features=64, out_features=2, bias=True)
)
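
Our classifier has no dropout layer, so the effect of ``eval()`` is easier to see on a standalone ``nn.Dropout`` module. The sketch below is an illustration with a made-up input of ones; the seed is fixed only for repeatability.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # entries are either zeroed or scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)    # dropout is a no-op in eval mode

num_zeroed = (train_out == 0).sum().item()
unchanged = torch.equal(eval_out, x)
print(num_zeroed, unchanged)
```

In training mode the output changes from call to call, which is the source of the inconsistent outputs mentioned above.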

## Saving and Loading Models with Shapes
When loading model weights, we needed to instantiate the model class first, because the class
defines the structure of a network. We might want to save the structure of this class together with
the model, in which case we can pass `model` (and not `model.state_dict()`) to the save function:



In [None]:
# Save the entire model: not only the weights but also the model structure
torch.save(model, 'model.pth')

We can then load the model like this:



In [None]:
# NOTE: The `weights_only=False` argument should only be used very carefully
# since it enables execution of code in the `model.pth` file.
# It is necessary here because of the custom class we defined for the dataloader.
# False: restores the entire model (structure + parameters), but may execute code in model.pth
# True: loads only the state_dict(); does not execute code in model.pth
model = torch.load('model.pth', weights_only=False)

This approach uses the Python [pickle](https://docs.python.org/3/library/pickle.html) module when serializing the model, so it relies on the actual class definition being available when loading the model.



# Task 1

Adapt the model above to have a hidden layer with 100 dimensions and a tanh activation. The model should:

1. Use an embedding to represent words
2. Use a linear layer to convert them to the hidden dimension
3. Apply a tanh activation function (see [this page](https://pytorch.org/docs/stable/generated/torch.tanh.html#torch.tanh))
4. Use a linear layer to get scores across the possible labels

In [124]:
class MyTextClassificationModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_class, hidden):
        super(MyTextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=False)
        self.fc = nn.Linear(embed_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()
        self.fc2.weight.data.uniform_(-initrange, initrange)
        self.fc2.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        hidden_rep = torch.tanh(self.fc(embedded))
        return self.fc2(hidden_rep)

# Everything below here is copied from above, except with the model renamed

train_iter = CustomTextDataset("mod-train.jsonl")
num_class = len(set([label for (label, text) in train_iter]))
vocab = Vocab(yield_tokens(train_iter))
emsize = 64
hidden = 100
new_model = MyTextClassificationModel(vocab_size, emsize, num_class, hidden).to(device)

# Hyperparameters
learning_rate = 5
batch_size = 64
epochs = 5

loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(new_model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.1)
total_accu = None

train_dataset = CustomTextDataset("mod-train.jsonl")
valid_dataset = CustomTextDataset("mod-validation.jsonl")
test_dataset = CustomTextDataset("mod-test.jsonl")
num_train = len(train_dataset)

train_dataloader = DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)
valid_dataloader = DataLoader(
    valid_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)
test_dataloader = DataLoader(
    test_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch
)

for epoch in range(1, epochs + 1):
    epoch_start_time = time.time()
    train_loop(train_dataloader, epoch, loss_function, optimizer, new_model)
    accu_val = test_loop(valid_dataloader, loss_function, new_model)
    if total_accu is not None and total_accu > accu_val:
        scheduler.step()
    else:
        total_accu = accu_val
    print("-" * 59)
    print(
        "| end of epoch {:3d} | time: {:5.2f}s | "
        "valid accuracy {:8.3f} ".format(
            epoch, time.time() - epoch_start_time, accu_val
        )
    )
    print("-" * 59)

Read 11766 11766
Read 11766 11766
Read 2782 2782
Read 2741 2741
-----------------------------------------------------------
| end of epoch   1 | time:  1.09s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   2 | time:  0.96s | valid accuracy    0.953 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   3 | time:  0.92s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   4 | time:  1.06s | valid accuracy    0.955 
-----------------------------------------------------------
-----------------------------------------------------------
| end of epoch   5 | time:  0.98s | valid accuracy    0.955 
-----------------------------------------------------------


# Task 2

For the original model, let's look at some properties of the embeddings. You may find `model.embedding.weight.data.tolist()` useful as a way to convert the embeddings into a Python list of lists.

1. What are the minimum and maximum values in the embeddings?
2. If you put the values in buckets of width 0.1 (i.e., 0 to 0.1, 0.1 to 0.2, 0.2 to 0.3, etc.), what is the distribution? (You can be approximate; any form of rounding is fine.)

This type of analysis can reveal if there are strange asymmetries in the weights we are learning.

In [125]:
nums = model.embedding.weight.data.tolist()

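# Start from +/- infinity so the first value always replaces the initial bounds
min_num, max_num = float("inf"), float("-inf")
buckets = {}
for subnums in nums:
    for num in subnums:
        min_num = min(min_num, num)
        max_num = max(max_num, num)
        # int() truncates toward zero, so the 0.0 bucket covers (-0.1, 0.1)
        rounded = int(num * 10) / 10
        buckets[rounded] = buckets.get(rounded, 0) + 1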
        
print(min_num, max_num)
bucket_keys = sorted(buckets.keys())
for bucket in bucket_keys:
    print(bucket, buckets[bucket])

-0.664068341255188 0.6388050317764282
-0.6 6
-0.5 706
-0.4 104544
-0.3 104625
-0.2 104653
-0.1 104649
0.0 209339
0.1 105009
0.2 104738
0.3 105161
0.4 103946
0.5 748
0.6 4
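
The ``0.0`` bucket above holds roughly twice as many values as its neighbours because ``int()`` truncates toward zero, merging everything in (-0.1, 0.1). One alternative is to bucket with ``floor`` so that every bucket has width 0.1. The sketch below does this with tensor operations on a small made-up tensor standing in for ``model.embedding.weight.data``.

```python
import torch

# Hypothetical stand-in for model.embedding.weight.data
weights = torch.tensor([[-0.25, -0.05, 0.05], [0.15, 0.31, -0.31]])

# floor(w * 10) is an integer bucket index; bucket k covers [k/10, (k+1)/10).
bucket_ids = torch.floor(weights * 10).to(torch.int64).flatten()
ids, counts = torch.unique(bucket_ids, return_counts=True)
histogram = dict(zip(ids.tolist(), counts.tolist()))

print(weights.min().item(), weights.max().item())
print(histogram)
```

With this scheme, -0.05 and 0.05 fall into different buckets (indices -1 and 0), so an asymmetry around zero would remain visible.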
