## 1. Representing and preprocessing text data

1.1 Preprocessing operations:
* tokenization
* stemming / lemmatization
* stop-word removal
* punctuation removal
* lowercasing
* any other operations on the text
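
As a rough illustration of how several of these operations chain together, here is a minimal standard-library sketch. The function name `simple_pipeline` and the tiny `STOP_WORDS` set are assumptions made up for this example; stemming and lemmatization are left out because they need a dedicated library.

```python
import string
from typing import List

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "and", "of", "is", "for", "that", "your"}

def simple_pipeline(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stop words."""
    lowered = text.lower()
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = lowered.translate(table).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(simple_pipeline("Select your preferences, and run the install command!"))
```

The order of the steps matters: lowercasing first means the stop-word lookup does not have to handle capitalized variants.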

In [None]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem.snowball import SnowballStemmer

In [None]:
text = 'Select your preferences and run the install command. Stable represents the most currently tested and supported version of PyTorch. Note that LibTorch is only available for C++'

Implement a function `preprocess_text(text: str) -> str` that:
* converts the string to lowercase
* replaces every character other than a-z, A-Z and the punctuation marks .,!? with a space


In [None]:
import re

def preprocess_text(text: str) -> str:
    # Lowercase, then replace everything except letters, spaces and .,!? with a space.
    return re.sub(r"[^a-z A-Z.,!?]", " ", text.lower())

preprocess_text(text)

'select your preferences and run the install command. stable represents the most currently tested and supported version of pytorch. note that libtorch is only available for c  '

1.2 Representing text data with binary encoding


Represent the first sentence of `text` as a tensor `sentence_t`: `sentence_t[i] == 1` if the __word__ with index `i` is present in the sentence.

In [None]:
import nltk
from pprint import pprint
nltk.download('punkt') # required for the tokenizers to work

In [None]:
text.split(".")[0] # first sentence

'Select your preferences and run the install command'

In [None]:
tokens = list(set(word_tokenize(preprocess_text(text))))
def get_embed(text,tokens):
    text = preprocess_text(text)
    x = [0]* len(tokens)
    for word in word_tokenize(text):
        x[tokens.index(word)] = 1
    return x


    
get_embed(text.split(".")[0],tokens)

[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

In [None]:
print(tokens)

['pytorch', 'of', 'the', 'c', '.', 'and', 'run', 'libtorch', 'most', 'stable', 'only', 'your', 'currently', 'supported', 'available', 'is', 'install', 'select', 'represents', 'tested', 'note', 'command', 'that', 'preferences', 'for', 'version']
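
The encoding above looks up each word with `tokens.index(word)`, which scans the list on every call, and the order of `tokens` comes from a `set`, so it can differ between runs. A possible alternative, sketched here with the hypothetical helpers `build_index` and `multi_hot`, is to sort the vocabulary and keep a dictionary from token to position. In the notebook the resulting list could then be wrapped in `torch.tensor(...)` to obtain `sentence_t`.

```python
def build_index(tokens):
    """Map each distinct token to a stable position (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

def multi_hot(sentence_tokens, index):
    """Return a 0/1 list with a 1 at the position of every token found in the sentence."""
    vec = [0] * len(index)
    for tok in sentence_tokens:
        pos = index.get(tok)
        if pos is not None:  # tokens missing from the vocabulary are skipped
            vec[pos] = 1
    return vec

# Toy corpus made up for this sketch.
corpus = "select your preferences and run the install command".split() + ["stable", "version"]
index = build_index(corpus)
sentence_vec = multi_hot("run the install command".split(), index)
print(sentence_vec)
```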


## 2. Classifying surnames by nationality

Dataset: https://disk.yandex.ru/d/owHew8hzPc7X9Q?w=1

2.1 Read the file `surnames/surnames.csv`.

2.2 Encode the nationalities as integers starting from 0.

2.3 Split the dataset into training and test sets.

2.4 Implement a `Vocab` class (token = __character__).

2.5 Implement a `SurnamesDataset` class.

2.6 Train a classifier.

2.7 Measure accuracy on the test set. Sanity-check the model: run several surnames of students from the group through the model and inspect the results. For each surname, print the 3 most probable predictions.

In [None]:
import pandas as pd
from sklearn import preprocessing,model_selection

In [None]:
le = preprocessing.LabelEncoder() # encoder: maps a class name to its integer index

In [None]:
df = pd.read_csv("surnames.csv")
df["label"] = le.fit_transform(df.nationality)
df.surname = df.surname.apply(lambda x:x.lower())
df

Unnamed: 0,surname,nationality,label
0,woodford,English,4
1,coté,French,5
2,kore,English,4
3,koury,Arabic,0
4,lebzak,Russian,14
...,...,...,...
10975,quraishi,Arabic,0
10976,innalls,English,4
10977,król,Polish,12
10978,purvis,English,4
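
To make the `label` column less opaque: `LabelEncoder` assigns indices to the sorted distinct class names, which is why `Arabic` gets 0 in the table above. A minimal standard-library sketch of the same idea follows; the function name `fit_label_mapping` and the five sample rows are assumptions for illustration only.

```python
def fit_label_mapping(labels):
    """Assign consecutive integers, starting from 0, to the sorted distinct labels."""
    classes = sorted(set(labels))
    return {name: idx for idx, name in enumerate(classes)}

# A handful of made-up rows, not the real dataset.
nationalities = ["English", "French", "English", "Arabic", "Russian"]
mapping = fit_label_mapping(nationalities)
encoded = [mapping[n] for n in nationalities]

# The inverse mapping recovers class names from predicted indices.
inverse = {idx: name for name, idx in mapping.items()}
print(mapping, encoded)
```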


Splitting into training and test sets

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split( df.surname, df.label, test_size=0.1, random_state=42, stratify=df.label)

list(map(len,[X_train, X_test, y_train, y_test]))

[9882, 1098, 9882, 1098]
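
A stratified split keeps the class proportions roughly equal in both parts, which matters here because some nationalities are much rarer than others. The sketch below shows the idea with the standard library only; `stratified_split` is a hypothetical helper, and its per-class rounding is an assumption that will not match scikit-learn's allocation exactly.

```python
import random
from collections import defaultdict

def stratified_split(items, labels, test_size=0.1, seed=42):
    """Split (item, label) pairs so that each class contributes ~test_size of its items to the test part."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_size))  # at least one test item per class
        test += [(x, label) for x in group[:n_test]]
        train += [(x, label) for x in group[n_test:]]
    return train, test

# Imbalanced toy data: 80 items of class "A", 20 of class "B".
items = [f"name{i}" for i in range(100)]
labels = ["A"] * 80 + ["B"] * 20
train, test = stratified_split(items, labels, test_size=0.1)
print(len(train), len(test))
```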

In [None]:
from collections import Counter

In [None]:
class Vocab:  
  def __init__(self, data):
    letters = list(set(data))
    forward = dict((j,i) for i,j in enumerate(letters))
    backward = dict((j,i) for i,j in forward.items())
    print("forward",forward)
    print("backward",backward)
    self.idx_to_token = backward
    self.token_to_idx = forward
    self.vocab_len = len(forward)


vat = Vocab("".join(df.surname.values))

forward {'è': 0, 'ç': 1, 'õ': 2, 'l': 3, 'ü': 4, 'n': 5, 'c': 6, 'ż': 7, 'v': 8, 'à': 9, 'k': 10, 'ń': 11, 'j': 12, 'f': 13, 'ß': 14, 'ś': 15, 'o': 16, 'ä': 17, 'ê': 18, 'r': 19, "'": 20, 'ñ': 21, 'í': 22, 'i': 23, 'w': 24, 'u': 25, 'x': 26, 'ù': 27, '/': 28, 'ó': 29, 'á': 30, 'h': 31, 'ö': 32, 'ì': 33, 'b': 34, 'd': 35, 'g': 36, 'ą': 37, ':': 38, 'ł': 39, 'e': 40, 'z': 41, '1': 42, 'ò': 43, 't': 44, 'ã': 45, 'p': 46, 'y': 47, 'a': 48, 's': 49, 'ú': 50, 'é': 51, 'q': 52, '-': 53, 'm': 54}
backward {0: 'è', 1: 'ç', 2: 'õ', 3: 'l', 4: 'ü', 5: 'n', 6: 'c', 7: 'ż', 8: 'v', 9: 'à', 10: 'k', 11: 'ń', 12: 'j', 13: 'f', 14: 'ß', 15: 'ś', 16: 'o', 17: 'ä', 18: 'ê', 19: 'r', 20: "'", 21: 'ñ', 22: 'í', 23: 'i', 24: 'w', 25: 'u', 26: 'x', 27: 'ù', 28: '/', 29: 'ó', 30: 'á', 31: 'h', 32: 'ö', 33: 'ì', 34: 'b', 35: 'd', 36: 'g', 37: 'ą', 38: ':', 39: 'ł', 40: 'e', 41: 'z', 42: '1', 43: 'ò', 44: 't', 45: 'ã', 46: 'p', 47: 'y', 48: 'a', 49: 's', 50: 'ú', 51: 'é', 52: 'q', 53: '-', 54: 'm'}
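
Because the indices above come from iterating over a `set` of strings, they can change from one interpreter run to the next, so a saved model could end up paired with a differently ordered vocabulary. One possible remedy, sketched below as a hypothetical `SortedVocab` variant, is to sort the symbols before numbering them.

```python
class SortedVocab:
    """Hypothetical variant of Vocab whose indices do not depend on set iteration order."""

    def __init__(self, data):
        symbols = sorted(set(data))
        self.token_to_idx = {ch: i for i, ch in enumerate(symbols)}
        self.idx_to_token = {i: ch for ch, i in self.token_to_idx.items()}
        self.vocab_len = len(symbols)

# Three surnames taken from the table above.
vocab = SortedVocab("".join(["woodford", "kore", "koury"]))
print(vocab.token_to_idx)
```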


In [None]:
import torch

In [None]:
class SurnamesDataset(torch.utils.data.Dataset):
  def __init__(self, X, y, vocab: Vocab):
    self.X = X
    self.y = y
    self.vocab = vocab

  def vectorize(self, surname):
    x = [0]* self.vocab.vocab_len
    for letter in surname:
        x[self.vocab.token_to_idx[letter]] = 1
    return x
    
  def __len__(self):
    return len(self.X)

  def __getitem__(self, idx):

    return torch.tensor(self.vectorize(self.X[idx])), self.y[idx]

# Reuse the vocabulary `vat` built above.
sur_train = SurnamesDataset(X_train.values,y_train.values,vat)
sur_test = SurnamesDataset(X_test.values,y_test.values,vat)


In [None]:
train_dataloader = torch.utils.data.DataLoader(sur_train, batch_size=64, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(sur_test, batch_size=64, shuffle=True)


In [None]:
x,y = next(iter(train_dataloader)) 
x.shape,torch.sum(x,axis=1),y # shape check

(torch.Size([64, 55]),
 tensor([ 7,  8,  7,  3,  5,  7,  9,  5,  8,  5,  7,  7,  5,  5,  4,  2, 10,  9,
          4,  5,  5,  6,  6,  6,  5,  6,  7,  5,  4,  5, 10,  6,  9,  4,  9,  6,
          3,  6,  6,  5,  7,  4,  5, 10,  3,  6,  5,  6,  3,  6,  6,  5,  5,  8,
          6,  4,  6,  6,  3,  3,  4,  7,  5,  9]),
 tensor([15, 10,  6,  4,  4,  4, 14, 14,  7,  3,  4,  9, 14,  8,  2,  0, 14,  7,
         16,  4,  4, 14, 10,  0,  4, 14,  2,  1,  2,  0, 14,  4, 14,  0,  4,  2,
         14,  4, 14,  4, 14,  3, 17,  5,  0,  9,  2,  2,  0,  4,  9,  4, 14, 14,
          7,  6, 14,  4, 16,  1,  0, 16,  2, 14], dtype=torch.int32))

In [None]:
import numpy as np
import torch
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from torch import nn
from torch import optim
import torch.nn.functional as F
import seaborn as sns
import pandas as pd
from tqdm.notebook import tqdm
from torchvision import models

%matplotlib inline

In [None]:
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(55, 40)
        self.fc2 = nn.Linear(40, 20)
        self.fc3 = nn.Linear(20, 18)
        self.fc4 = nn.Linear(18, 18)
        
    def forward(self, x):        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x

In [None]:
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
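
The model ends in `log_softmax` and is trained with `nn.NLLLoss`; together these compute the ordinary cross-entropy of the softmax distribution. The following plain-Python sketch checks that on a single made-up logit vector. The helper names `log_softmax` and `nll` are local to this example and are not the PyTorch functions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the log of the sum of exponentials."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
target = 0

loss = nll(log_softmax(logits), target)
# Cross-entropy computed directly from the softmax probability of the target.
direct = -math.log(math.exp(logits[target]) / sum(math.exp(z) for z in logits))
print(loss, direct)
```

This is also why `torch.exp(logps)` is applied later in the notebook whenever actual probabilities are needed.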

The variable name `images` is carried over from a previous pipeline.
I left it unchanged (#22).

In [None]:
epochs = 50
train_losses, test_losses = [], []

for e in range(epochs):
    running_loss = 0
    for images, labels in train_dataloader:
        # images = images.view(images.shape[0], -1)
        images = images.type(torch.float)
        # print(images.shape)
        labels = labels.type(torch.LongTensor)
        optimizer.zero_grad()
        logps = model(images)
        # print(logps)
        # print(labels)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0
        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            for test_images, test_labels in test_dataloader:
                test_images = test_images.type(torch.float)
                logps = model(test_images)
                test_labels = test_labels.type(torch.LongTensor)
                test_loss += criterion(logps, test_labels)
                ps = torch.exp(logps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == test_labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        
        train_losses.append(running_loss/len(train_dataloader))  # mean loss per training batch
        test_losses.append(test_loss/len(test_dataloader))
        
        print("Epoch {}/{}..".format(e+1, epochs),
              "Training loss: {:.3f}..".format(train_losses[-1]),
              "Test loss: {:.3f}..".format(test_losses[-1]),
              "Test Accuracy: {:.3f}%".format(accuracy/len(test_dataloader)))



Epoch 1/50.. Training loss: 20.125.. Test loss: 2.091.. Test Accuracy: 0.405%
Epoch 2/50.. Training loss: 16.486.. Test loss: 1.879.. Test Accuracy: 0.459%
Epoch 3/50.. Training loss: 15.303.. Test loss: 1.793.. Test Accuracy: 0.470%
Epoch 4/50.. Training loss: 14.931.. Test loss: 1.760.. Test Accuracy: 0.489%
Epoch 5/50.. Training loss: 14.662.. Test loss: 1.705.. Test Accuracy: 0.505%
Epoch 6/50.. Training loss: 14.365.. Test loss: 1.678.. Test Accuracy: 0.509%
Epoch 7/50.. Training loss: 14.028.. Test loss: 1.650.. Test Accuracy: 0.507%
Epoch 8/50.. Training loss: 13.656.. Test loss: 1.607.. Test Accuracy: 0.529%
Epoch 9/50.. Training loss: 13.362.. Test loss: 1.599.. Test Accuracy: 0.523%
Epoch 10/50.. Training loss: 13.107.. Test loss: 1.589.. Test Accuracy: 0.530%
Epoch 11/50.. Training loss: 12.901.. Test loss: 1.545.. Test Accuracy: 0.536%
Epoch 12/50.. Training loss: 12.725.. Test loss: 1.535.. Test Accuracy: 0.543%
Epoch 13/50.. Training loss: 12.571.. Test loss: 1.526.. Test
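
The evaluation loop averages the per-batch accuracies. With 1098 test samples and `batch_size=64`, the last batch holds only 10 samples yet counts as much as a full batch, so the reported figure can drift slightly from the true share of correct predictions. The sketch below contrasts the two computations; the per-batch counts of correct answers are invented for illustration.

```python
def mean_of_batch_accuracies(correct_per_batch, batch_sizes):
    """Average of per-batch accuracies (what the loop above reports)."""
    return sum(c / n for c, n in zip(correct_per_batch, batch_sizes)) / len(batch_sizes)

def sample_accuracy(correct_per_batch, batch_sizes):
    """Share of correctly classified samples over the whole test set."""
    return sum(correct_per_batch) / sum(batch_sizes)

batch_sizes = [64] * 17 + [10]  # 17 full batches plus one short batch = 1098 samples
correct = [32] * 17 + [10]      # made-up counts: the short batch happens to be all correct

batch_mean = mean_of_batch_accuracies(correct, batch_sizes)
per_sample = sample_accuracy(correct, batch_sizes)
print(round(batch_mean, 4), round(per_sample, 4))
```

Accumulating the number of correct predictions and dividing once by the dataset size avoids the discrepancy.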

In [None]:
print(torch.__version__)

1.13.0+cpu


In [None]:
def vectorize(surname):
    x = [0]* vat.vocab_len
    for letter in surname:
        x[vat.token_to_idx[letter]] = 1
    return x
data = torch.tensor(vectorize("kuznetsov")).type(torch.float).unsqueeze(0)
# data.shape
# model(data)
top_p, top_class = torch.exp(model(data)).topk(3, dim=1)

top_p, top_class, le.inverse_transform(top_class[0])


(tensor([[0.9609, 0.0136, 0.0130]], grad_fn=<TopkBackward0>),
 tensor([[14,  2,  7]]),
 array(['Russian', 'Czech', 'Greek'], dtype=object))

In [None]:
def vectorize(surname):
    x = [0]* vat.vocab_len
    for letter in surname:
        x[vat.token_to_idx[letter]] = 1
    return x
data = torch.tensor(vectorize("smirnov")).type(torch.float).unsqueeze(0)
# data.shape
# model(data)
top_p, top_class = torch.exp(model(data)).topk(3, dim=1)

top_p, top_class, le.inverse_transform(top_class[0])

(tensor([[0.4514, 0.3569, 0.0437]], grad_fn=<TopkBackward0>),
 tensor([[14,  4,  2]]),
 array(['Russian', 'English', 'Czech'], dtype=object))
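
The two cells above repeat the same steps: exponentiate the log-probabilities, take the three largest, and map indices back to class names. A small helper could wrap that logic. The sketch below does it in plain Python on made-up numbers; `top_k` and the four-class name list are assumptions, and in the notebook the inputs would come from `model(data)` and `le.classes_`.

```python
import math

def top_k(log_probs, class_names, k=3):
    """Return the k most probable (class name, probability) pairs, most probable first."""
    probs = [math.exp(lp) for lp in log_probs]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(class_names[i], probs[i]) for i in ranked]

class_names = ["Arabic", "Czech", "English", "Russian"]  # illustrative subset of classes
log_probs = [math.log(p) for p in (0.05, 0.15, 0.30, 0.50)]  # made-up model output

result = top_k(log_probs, class_names)
for name, prob in result:
    print(f"{name}: {prob:.2f}")
```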

## 3. Classifying restaurant reviews

Dataset: https://disk.yandex.ru/d/nY1o70JtAuYa8g


3.1 Read the file `yelp/raw_train.csv`. Keep 10% of the rows of the original dataset.

3.2 Use the `preprocess_text` function from 1.1 to process the review text. Encode the rating as integers starting from 0.

3.3 Split the dataset into training and test sets.

3.4 Implement a `Vocab` class (token = word).

3.5 Implement a `ReviewDataset` class.

3.6 Train a classifier.

3.7 Measure accuracy on the test set. Sanity-check the model: write a short review, run it through the model, and print the predicted class index (do this for a clearly positive and a clearly negative review).
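
For step 3.1, keeping a reproducible 10% sample could look like the sketch below. It uses only the standard library on made-up rows; `keep_fraction` is a hypothetical helper. With a pandas DataFrame, `df.sample(frac=0.1, random_state=...)` serves the same purpose.

```python
import random

def keep_fraction(rows, frac=0.1, seed=42):
    """Return a reproducible random subset containing about `frac` of the rows."""
    rng = random.Random(seed)
    k = round(len(rows) * frac)
    return rng.sample(rows, k)  # sampling without replacement

# Made-up (label, text) rows standing in for the review file.
rows = [(i % 2, f"review {i}") for i in range(1000)]
subset = keep_fraction(rows)
print(len(subset))
```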


In [None]:
df = pd.read_csv("raw_test.csv",header=None)
df.columns = ["label", "text"]
df.label = df.label-1
df.head()

Unnamed: 0,label,text
0,0,Ordered a large Mango-Pineapple smoothie. Stay...
1,1,Quite a surprise! \n\nMy wife and I loved thi...
2,0,"First I will say, this is a nice atmosphere an..."
3,1,I was overall pretty impressed by this hotel. ...
4,0,Video link at bottom review. Worst service I h...


In [None]:
X = df.text.apply(lambda x: preprocess_text(x.replace("\\n", "").replace("."," "))).values
y = df.label.values

print("calculating tokens")

tokens = list(
    set(
        word_tokenize(
                " ".join(
                    X
                )
        )
    )
)
tokens

calculating tokens


['suede',
 'mophead',
 'conspicuously',
 'randomly',
 'foxx',
 'checkins',
 'chs',
 'frickin',
 'tequileria',
 'needs',
 'verified',
 'overeating',
 'unneeded',
 'belting',
 'productive',
 'juicenot',
 'jaclyn',
 'bundlet',
 'isconsidered',
 'elizabeth',
 'tortured',
 'thr',
 'penalties',
 'rtm',
 'griffin',
 'nde',
 'wiggling',
 'optimism',
 'voudrez',
 'drifting',
 'macy',
 'secruity',
 'complimentarily',
 'betsey',
 'plot',
 'pla',
 'occupiers',
 'carmas',
 'slidesi',
 'coffee',
 'huber',
 'succomb',
 'aria',
 'ornaments',
 'isi',
 'choisir',
 'smallish',
 'earpierce',
 'apprendre',
 'mirroring',
 'pins',
 'attendee',
 'allllllllllllllllllllllllll',
 'outreach',
 'crisp',
 'schmmamit',
 'estrellas',
 'pyrex',
 'pizzettes',
 'recomended',
 'nicolle',
 'roach',
 'dermaplaning',
 'invitant',
 'spoiled',
 'stimmig',
 'montesano',
 'jurisdiction',
 'habernero',
 'toilet',
 'portion',
 'scarpetta',
 'relative',
 'eigentlich',
 'panda',
 'sunglass',
 'revising',
 'onlkxaqfq',
 'chivalrousl

In [None]:
class Vocab: # what am I supposed to do with this again?
  def __init__(self, all_tokens):
    # self.idx_to_token = ...
    # self.token_to_idx = ...
    # self.vocab_len = ...
    forward = dict((j,i) for i,j in enumerate(all_tokens))
    backward = dict((j,i) for i,j in forward.items())
    self.idx_to_token = backward
    self.token_to_idx = forward
    self.vocab_len = len(forward)
vat = Vocab(tokens)
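
A word-level vocabulary built from the corpus will not contain every word of a newly written review, and a plain dictionary lookup then raises `KeyError`. One common workaround, sketched here as a hypothetical `VocabWithUnk` class, reserves index 0 for an unknown-word placeholder. The `<unk>` token and the `lookup` method are assumptions, not part of the notebook.

```python
class VocabWithUnk:
    UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words

    def __init__(self, all_tokens):
        # Index 0 is reserved for UNK; the remaining tokens are sorted for stable indices.
        ordered = [self.UNK] + sorted(set(all_tokens) - {self.UNK})
        self.token_to_idx = {tok: i for i, tok in enumerate(ordered)}
        self.idx_to_token = {i: tok for tok, i in self.token_to_idx.items()}
        self.vocab_len = len(ordered)

    def lookup(self, token):
        """Return the index of `token`, or the UNK index if the token is unseen."""
        return self.token_to_idx.get(token, self.token_to_idx[self.UNK])

vocab = VocabWithUnk(["good", "service", "bad"])
print(vocab.lookup("good"), vocab.lookup("pregnant"))
```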

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split( X, y, test_size=0.2, random_state=42)

list(map(len,[X_train, X_test, y_train, y_test]))

[30400, 7600, 30400, 7600]

In [None]:
class ReviewDataset1(torch.utils.data.Dataset):
  def __init__(self, X, y, vocab: Vocab):
    self.X = X
    self.y = y
    self.vocab = vocab

  def vectorize(self, review):
    """Builds a binary (multi-hot) encoding of the review, as in 1.2."""
    x = [0] * self.vocab.vocab_len

    for token in word_tokenize(preprocess_text(review.replace("\\n", "").replace(".", " "))):
      idx = self.vocab.token_to_idx.get(token)
      if idx is not None:  # skip words that are not in the vocabulary
        x[idx] = 1
    return x

  def __len__(self):
    return len(self.X)

  def __getitem__(self, idx):
    return torch.tensor(self.vectorize(self.X[idx])), self.y[idx]

rev_train = ReviewDataset1(X_train,y_train,vat)
rev_test = ReviewDataset1(X_test,y_test,vat)


In [None]:
import numpy as np
import torch
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from torch import nn
from torch import optim
import torch.nn.functional as F
import seaborn as sns
import pandas as pd
from tqdm.notebook import tqdm
from torchvision import models

%matplotlib inline

In [None]:
train_dataloader = torch.utils.data.DataLoader(rev_train, batch_size=64, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(rev_test, batch_size=64, shuffle=True)

In [None]:
x, y = next(iter(train_dataloader))
x.shape,torch.sum(x,axis=1),y # shape check

(torch.Size([64, 56938]),
 tensor([ 34, 103,  19, 177,  77,  59,  26,  78, 339,  37,  55,  74, 162,  22,
          55,  63,  83,  83,  46,  54,  79, 162,  60, 103,  43,  23,  44,  41,
          76,  24,  51,  61,  41,  17,  83,  17,  54, 264,  85,  56,  48,  98,
          76,  98,  70,  48,  10, 257, 117,  32, 168, 132,  75,  59, 112,  86,
          56, 202,  66,  37,  56,  57,  56, 131]),
 tensor([1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0,
         1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1,
         0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0]))

In [None]:
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(vat.vocab_len, 3_000)  # input size = vocabulary size (56938 here)
        self.fc2 = nn.Linear(3_000, 2_000)
        self.fc3 = nn.Linear(2_000, 500)
        self.fc4 = nn.Linear(500, 2)
        
    def forward(self, x):        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x
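
With a 56938-dimensional input, almost all of this network's parameters sit in the first layer. The arithmetic below counts weights and biases for each `nn.Linear` layer using the sizes from the class above; `linear_params` is a hypothetical helper, and the memory figure assumes 4-byte float32 parameters.

```python
def linear_params(n_in, n_out):
    """Parameter count of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

layer_sizes = [56938, 3000, 2000, 500, 2]
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)
print(f"total parameters: {total:,}")
print(f"approx. size in float32: {total * 4 / 1e6:.0f} MB")
```

Roughly 96% of the parameters belong to `fc1`, so a smaller first hidden layer or a pruned vocabulary would shrink the model far more than changes to the later layers.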

In [None]:
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
epochs = 10
train_losses, test_losses = [], []

model = model.to("cuda")
for e in range(epochs):
    running_loss = 0
    for images, labels in tqdm(train_dataloader):
        # print("1111")
        # images = images.view(images.shape[0], -1)
        images = images.type(torch.float).to("cuda")
        # print(images.shape)
        optimizer.zero_grad()
        logps = model(images)
        # print(logps)
        # print(labels)
        loss = criterion(logps, labels.to("cuda"))
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0
        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            for test_images, test_labels in test_dataloader:
                test_images = test_images.type(torch.float).to("cuda")
                logps = model(test_images)
                test_loss += criterion(logps, test_labels.to("cuda"))
                ps = torch.exp(logps).detach().cpu()
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == test_labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        
        train_losses.append(running_loss/len(train_dataloader))  # mean loss per training batch
        test_losses.append(test_loss/len(test_dataloader))
        
        print("Epoch {}/{}..".format(e+1, epochs),
              "Training loss: {:.3f}..".format(train_losses[-1]),
              "Test loss: {:.3f}..".format(test_losses[-1]),
              "Test Accuracy: {:.3f}%".format(accuracy/len(test_dataloader)))

Epoch 1/10.. Training loss: 1.042.. Test loss: 0.224.. Test Accuracy: 0.916%
Epoch 2/10.. Training loss: 0.376.. Test loss: 0.368.. Test Accuracy: 0.905%

KeyboardInterrupt: 

1 is a positive review, 0 is a negative one.

In [None]:
data = torch.tensor(rev_train.vectorize("good service i like it my wife become pregnant")).type(torch.float).unsqueeze(0)
torch.exp(model(data.to("cuda"))).topk(1, dim=1)

torch.return_types.topk(
values=tensor([[0.9789]], device='cuda:0', grad_fn=<TopkBackward0>),
indices=tensor([[1]], device='cuda:0'))

In [None]:
data = torch.tensor(rev_train.vectorize("bad service i dislike it my husband become pregnant")).type(torch.float).unsqueeze(0)
torch.exp(model(data.to("cuda"))).topk(1, dim=1)

torch.return_types.topk(
values=tensor([[0.9787]], device='cuda:0', grad_fn=<TopkBackward0>),
indices=tensor([[0]], device='cuda:0'))