# PyTorch Network Examples

(C) 2023-2024 by [Damir Cavar](http://damir.cavar.me/)

**Version:** 1.1, January 2024

**Download:** This and various other Jupyter notebooks are available from my [GitHub repo](https://github.com/dcavar/python-tutorial-notebooks).

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

This tutorial was developed as part of the course material for the course Advanced Natural Language Processing at [Indiana University](https://www.indiana.edu/).

This document is based on the [PyTorch Neural Network tutorial](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html) and the [PyTorch Text Classification Tutorial](https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html).

**Prerequisites:**

In [None]:
!pip install -U torch torchtext torchdata portalocker

The prerequisites are an installed [PyTorch](https://pytorch.org/) library and the [Portalocker](https://pypi.org/project/portalocker/) module.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

[Torchdata](https://github.com/pytorch/data) is no longer supported, but you will need to install it anyway; see its GitHub repo for more details. You will also need to install the [torchtext](https://pytorch.org/text/stable/index.html) module, which provides the data set used below.

In [3]:
from torchtext.datasets import AG_NEWS

The [AG_NEWS data set](https://github.com/mhjabreel/CharCnn_Keras) can be pulled independently from the GitHub repo [CharCnn_Keras](https://github.com/mhjabreel/CharCnn_Keras). See the repo documentation for more information on sources and background. We can load the data set from `torchtext`, as imported above, and look at the content of the training data set:

In [4]:
train_iter = iter(AG_NEWS(split="train"))
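
To look at the content, it helps to know the shape of each item. The following is a minimal sketch, assuming each item is a `(label, text)` pair with an integer label, which is the shape that `yield_tokens` further down unpacks. The two records in `sample_data` are invented stand-ins, not actual AG_NEWS entries, so the cell runs without downloading anything:

```python
from itertools import islice

# Hypothetical stand-in for AG_NEWS(split="train"): (label, text) pairs.
# These two records are invented for illustration only.
sample_data = [
    (3, "Stocks close higher after a volatile week."),
    (2, "Home team wins the final in overtime."),
]

sample_iter = iter(sample_data)

# Take only the first item; later items stay in the iterator.
first_items = list(islice(sample_iter, 1))
label, text = first_items[0]
print(label, text)
```

With the real data set, `next(train_iter)` or `islice(train_iter, n)` should work the same way. Items taken from an iterator are consumed, which is presumably why `train_iter` is created again before the vocabulary is built.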

In [None]:
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

In [None]:
tokenizer = get_tokenizer("basic_english")
train_iter = AG_NEWS(split="train")
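
The `basic_english` tokenizer lowercases the input and separates common punctuation from words. The sketch below is only a rough approximation of that behavior using a regular expression; `approx_basic_english` is a hypothetical helper, not part of `torchtext`, and it does not reproduce the library's exact rule set (for example, it drops apostrophes and quotes):

```python
import re

def approx_basic_english(text):
    """Rough approximation of a lowercasing, punctuation-splitting tokenizer."""
    # Lowercase, then keep runs of letters/digits and a few punctuation marks
    # as separate tokens. This is NOT torchtext's exact rule set.
    return re.findall(r"[a-z0-9]+|[.,!?()]", text.lower())

tokens = approx_basic_english("Here is an example!")
print(tokens)
```

For real experiments, keep using `get_tokenizer("basic_english")` so that the tokens match the vocabulary built from them.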

In [None]:
def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

In [None]:
vocab = build_vocab_from_iterator(yield_tokens(train_iter), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])

The `vocab` object converts a list of tokens into a list of integer indices. Tokens that are not in the vocabulary map to the index of `<unk>`:

In [None]:
vocab(['here', 'is', 'an', 'example'])
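
To make the token-to-index mapping concrete, here is a simplified pure-Python stand-in. It is a sketch under stated assumptions, not `torchtext`'s implementation: `build_simple_vocab` and `lookup` are hypothetical helpers, the toy corpus is invented, and the ordering (special tokens first, then tokens by descending frequency with alphabetical tie-breaking) is an assumption made for this example:

```python
from collections import Counter

def build_simple_vocab(token_lists, specials=("<unk>",)):
    """Map tokens to integer indices: specials first, then by frequency."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    # Most frequent tokens get the smallest indices; ties sorted alphabetically.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    itos = list(specials) + [tok for tok in ordered if tok not in specials]
    return {tok: idx for idx, tok in enumerate(itos)}

def lookup(stoi, tokens, default="<unk>"):
    """Convert tokens to indices, falling back to the default token's index."""
    return [stoi.get(tok, stoi[default]) for tok in tokens]

# Invented toy corpus, already tokenized.
corpus = [["here", "is", "an", "example"], ["here", "is", "another"]]
stoi = build_simple_vocab(corpus)
indices = lookup(stoi, ["here", "is", "an", "unseen", "example"])
print(indices)
```

The fallback in `lookup` plays the role that `vocab.set_default_index(vocab["<unk>"])` plays above: an unseen token such as `"unseen"` gets index 0 instead of raising an error.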

In [None]:
text_pipeline = lambda x: vocab(tokenizer(x))
label_pipeline = lambda x: int(x) - 1

In [None]:
text_pipeline('here is the an example')

In [None]:
label_pipeline('10')
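
The subtraction in `label_pipeline` shifts 1-based labels to 0-based class indices, which is the form that PyTorch classification losses such as `nn.CrossEntropyLoss` expect. A minimal sketch, assuming the four AG_NEWS classes are numbered 1 to 4 with the names used in the PyTorch text classification tutorial; `to_class_index` is a hypothetical name for the same arithmetic:

```python
# Class names assumed for AG_NEWS; labels in the data start at 1.
ag_news_label = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tec"}

def to_class_index(label):
    """Shift a 1-based label to a 0-based class index."""
    return int(label) - 1

class_indices = [to_class_index(label) for label in ag_news_label]
print(class_indices)
```

Under that assumption, the call `label_pipeline('10')` above only demonstrates the arithmetic (it returns 9); it does not correspond to a valid class of this data set.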