## [NLP FROM SCRATCH: CLASSIFYING NAMES WITH A CHARACTER-LEVEL RNN](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html#nlp-from-scratch-classifying-names-with-a-character-level-rnn)

##### We will be building and training a basic character-level RNN to classify words. This tutorial, along with the following two, shows how to preprocess data for NLP modeling “from scratch”, in particular without using many of the convenience functions of torchtext, so you can see how preprocessing for NLP modeling works at a low level.

##### A character-level RNN reads a word as a series of characters, outputting a prediction and a “hidden state” at each step and feeding each step's hidden state into the next step. We take the final prediction as the output, i.e. the class the word belongs to.
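
The read-one-character-at-a-time loop described above can be sketched in plain Python, without PyTorch. This is only a toy forward pass under stated assumptions: the class name `CharRNN`, the tanh hidden update without bias terms, the random untrained weights, and the lowercase-only vocabulary are all illustrative choices, not the tutorial's model.

```python
import math
import random


class CharRNN:
    """Toy Elman-style RNN over one-hot characters (hypothetical and untrained)."""

    def __init__(self, vocab, hidden_size, n_classes, seed=0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(vocab)}
        self.hidden_size = hidden_size
        # Random input-to-hidden, hidden-to-hidden and hidden-to-output weights.
        self.w_xh = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in range(hidden_size)]
        self.w_hh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.w_hy = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(n_classes)]

    def step(self, ch, hidden):
        """Read one character; return (class scores, new hidden state)."""
        x = self.index[ch]  # a one-hot input simply selects column x of w_xh
        new_hidden = [
            math.tanh(self.w_xh[j][x] + sum(w * h for w, h in zip(self.w_hh[j], hidden)))
            for j in range(self.hidden_size)
        ]
        scores = [sum(w * h for w, h in zip(row, new_hidden)) for row in self.w_hy]
        return scores, new_hidden

    def forward(self, word):
        """Run over the whole word and keep only the final step's scores."""
        if not word:
            raise ValueError("word must contain at least one character")
        hidden = [0.0] * self.hidden_size  # initial hidden state is all zeros
        for ch in word:
            # The hidden state from the previous step feeds into this one.
            scores, hidden = self.step(ch, hidden)
        return scores


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rnn = CharRNN(vocab="abcdefghijklmnopqrstuvwxyz", hidden_size=8, n_classes=18)
probs = softmax(rnn.forward("satoshi"))
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class, round(probs[predicted_class], 3))
```

Because the weights are random, `predicted_class` means nothing until the model is trained; the point is the data flow, where the hidden state is threaded through every character and only the last step's scores are used.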

#### Specifically, we’ll train on a few thousand surnames from 18 languages of origin, and predict which language a name is from based on the spelling:

In [None]:
from glob import glob

In [None]:
import string

In [None]:
from tqdm import tqdm
import urllib.request  # urlretrieve lives in the urllib.request submodule
from zipfile import ZipFile
import os

In [None]:
url = "https://download.pytorch.org/tutorial/data.zip"

In [None]:
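# expanduser("~") also works where $HOME is not set (e.g. on Windows)
home = os.path.expanduser("~")
data_dir = os.path.join(home, "torch", "")  # trailing separator, e.g. ~/torch/
tar_file = data_dir + url.split('/')[-1]  # ~/torch/data.zip (a zip archive, despite the name)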

In [None]:
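class TqdmUpTo(tqdm):
    """tqdm progress bar driven by urllib's reporthook callback."""
    def update_to(self, b=1, bsize=1, tsize=None):
        # b: blocks transferred so far, bsize: block size in bytes,
        # tsize: total size in bytes (-1 if the server does not report it)
        if tsize is not None:
            self.total = tsize
        # Advance the bar by the bytes received since the previous call
        self.update(b * bsize - self.n)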
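
`TqdmUpTo.update_to` works because `urllib.request.urlretrieve` calls its `reporthook` with three positional arguments: the number of blocks transferred so far, the block size in bytes, and the total file size (-1 when the server does not report it). The sketch below shows that contract without tqdm; `make_progress_hook` and the simulated calls are hypothetical, used only to illustrate the callback.

```python
def make_progress_hook(report):
    """Build a reporthook for urllib.request.urlretrieve (illustrative helper).

    `report(done, total)` receives the bytes transferred so far and the total
    size in bytes, or None when the size is unknown (urlretrieve passes -1).
    """
    def hook(block_num, block_size, total_size):
        done = block_num * block_size
        total = total_size if total_size > 0 else None
        if total is not None:
            done = min(done, total)  # the final block is usually only partly filled
        report(done, total)

    return hook


events = []
hook = make_progress_hook(lambda done, total: events.append((done, total)))
# Simulate urlretrieve: one call when the connection opens, then one per block read.
for block_num in range(4):
    hook(block_num, 8192, 20000)
print(events)  # [(0, 20000), (8192, 20000), (16384, 20000), (20000, 20000)]
```

With a real download the hook would be passed as `urllib.request.urlretrieve(url, filename, reporthook=hook)`, in the same position as `t.update_to` in the download cell.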

In [None]:
# Create ~/torch/ if it does not exist yet
os.makedirs(data_dir, exist_ok=True)

In [None]:
# Download data.zip to tar_file, reporting progress in bytes
with TqdmUpTo(unit='B', unit_scale=True, miniters=1, desc=tar_file) as t:
    urllib.request.urlretrieve(url=url, filename=tar_file, reporthook=t.update_to)

In [None]:
# Extract the archive (avoid shadowing the built-in zip); names end up in data_dir/data/names/
with ZipFile(tar_file, "r") as zf:
    zf.extractall(data_dir)

In [None]:
for r, d, files in os.walk(data_dir):
    print(r, d, files)

In [None]:
glob(data_dir+"data/names/*.txt")
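
Each file under `data/names/` holds the names for one language, one name per line, and the file's base name is the language label. A minimal sketch of turning the matched paths into a `{language: [names]}` dictionary follows; `load_names_by_language` is a hypothetical helper, and the demo writes two small made-up files to a temporary directory so the block runs without the download.

```python
import os
import tempfile
from glob import glob


def load_names_by_language(pattern):
    """Map each file's base name (the language) to the non-empty lines it contains."""
    names_by_language = {}
    for path in sorted(glob(pattern)):
        language = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding="utf-8") as f:
            names_by_language[language] = [line.strip() for line in f if line.strip()]
    return names_by_language


# Self-contained demo: a throwaway directory laid out like data/names/.
sample = {"Italian": ["Abandonato", "Abatangelo"], "Japanese": ["Abe", "Abukara"]}
with tempfile.TemporaryDirectory() as tmp:
    for language, names in sample.items():
        with open(os.path.join(tmp, f"{language}.txt"), "w", encoding="utf-8") as f:
            f.write("\n".join(names) + "\n")
    demo = load_names_by_language(os.path.join(tmp, "*.txt"))

print(sorted(demo))  # ['Italian', 'Japanese']
```

On the downloaded data the call would be `load_names_by_language(data_dir + "data/names/*.txt")`, and the number of keys should match the number of languages given in the introduction.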

In [57]:
# Vocabulary: the 52 ASCII letters plus space and . , ; '
all_letters = string.ascii_letters + " .,;'"
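
An RNN cannot consume characters directly, so each letter of `all_letters` is usually mapped to an index and then to a one-hot vector. The sketch below does this with plain lists; the helper names `letter_to_index` and `line_to_one_hot` are illustrative, and a PyTorch version would typically store the same rows in a tensor instead.

```python
import string

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)  # 52 letters + 5 extra characters = 57


def letter_to_index(letter):
    """Position of `letter` in all_letters (raises ValueError if it is absent)."""
    return all_letters.index(letter)


def line_to_one_hot(line):
    """Encode a name as a list of one-hot rows, one row of length n_letters per character."""
    rows = []
    for ch in line:
        row = [0] * n_letters
        row[letter_to_index(ch)] = 1
        rows.append(row)
    return rows


encoded = line_to_one_hot("Jones")
print(len(encoded), len(encoded[0]))  # 5 57
```

Using `str.index` rather than `str.find` makes an out-of-vocabulary character fail loudly instead of silently mapping to -1 (the last slot), which is why names should be reduced to ASCII before encoding.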

In [56]:
import unicodedata
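
One use of `unicodedata` here is folding accented names into the ASCII vocabulary defined above. A minimal sketch, assuming that goal: decompose each string with NFD normalization, drop the combining marks (Unicode category `Mn`), and discard anything else not in `all_letters`. The function name `unicode_to_ascii` is illustrative.

```python
import string
import unicodedata

all_letters = string.ascii_letters + " .,;'"


def unicode_to_ascii(s):
    """Drop accents via NFD decomposition, then keep only characters in all_letters."""
    return "".join(
        c
        for c in unicodedata.normalize("NFD", s)  # e.g. "à" -> "a" + U+0300
        if unicodedata.category(c) != "Mn" and c in all_letters
    )


print(unicode_to_ascii("Ślusàrski"))  # Slusarski
print(unicode_to_ascii("Müller-Lüdenscheidt"))  # MullerLudenscheidt
```

Strictly, the `c in all_letters` test alone would already remove the combining marks, since they are not ASCII; the explicit `Mn` check keeps the intent readable and still matters if the vocabulary is ever widened beyond ASCII.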