# Deep Learning Fundamentals

Welcome to Deep Learning Fundamentals! This lesson is your gateway to the exciting field of deep learning.

Deep learning is the cornerstone of many recent breakthroughs in AI, and by understanding its fundamentals, you're setting yourself up for success in creating innovative generative AI applications.

## Lesson Objectives
By the end of this lesson, you will be able to:

- Understand the fundamentals of the perceptron algorithm and basic neural network architecture
- Apply PyTorch for practical deep learning tasks
- Apply Hugging Face for practical deep learning tasks
- Use transfer learning to leverage pre-trained models for a variety of machine learning tasks

## What Is a Perceptron

https://www.youtube.com/watch?v=U9eoVk7M6iU

![image.png](attachment:dcf61fe6-18d4-454b-9a4d-d3a5914cfe29.png)

A perceptron is an essential component in the world of AI, acting as a binary classifier capable of deciding whether data, like an image, belongs to one class or another. It works by adjusting its weighted inputs—think of these like dials fine-tuning a radio signal—until it becomes better at predicting the right class for the data. This process is known as learning, and it shows us that even complex tasks start with small, simple steps.

## Technical Terms Explained:
**Perceptron**: A basic computational model in machine learning that makes decisions by weighing input data. It's like a mini-decision maker that labels data as one thing or another.

**Binary Classifier**: A type of system that categorizes data into one of two groups. Picture a light switch that can be flipped to either on or off.

**Vector of Numbers**: A sequence of numbers arranged in order, which together represent one piece of data.

**Activation Function**: A mathematical function that decides whether the perceptron's weighted sum of inputs is large enough to trigger a positive output rather than a negative one.
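
To make these terms concrete, here is a minimal sketch of a perceptron in plain Python. It is an illustration rather than the implementation used in the video: the function names, the toy task (the logical AND of two inputs), and the learning rate of 1 are all assumptions chosen to keep the arithmetic easy to follow.

```python
# A minimal perceptron sketch (illustrative; all names here are hypothetical).

def step(weighted_sum):
    """Activation function: output 1 if the sum reaches the threshold, else 0."""
    return 1 if weighted_sum >= 0 else 0

def predict(weights, bias, inputs):
    """Weigh each input, add the bias, and pass the sum through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)

def train(samples, labels, learning_rate=1, epochs=20):
    """Perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            error = label - predict(weights, bias, inputs)  # -1, 0, or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Toy binary classification task: the logical AND of two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions)  # [0, 0, 0, 1]
```

Each input pair is a vector of numbers, the weights are the "dials", and the step function plays the role of the activation function.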

## The Multi-Layer Perceptron

https://www.youtube.com/watch?v=vCzMfZtrdjk

The multi-layer perceptron is a machine learning model loosely inspired by the way our brain's neurons work. It processes information through successive layers, and as it is trained on examples it adjusts its weights so that its predictions become more accurate over time.

## Technical Terms Explained:

**Multi-Layer Perceptron (MLP)**: A type of artificial neural network that has multiple layers of nodes, each layer learning to recognize increasingly complex features of the input data.

**Input Layer**: The first layer in an MLP where the raw data is initially received.

**Output Layer**: The last layer in an MLP that produces the final result or prediction of the network.

**Hidden Layers**: Layers between the input and output that perform complex data transformations.
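
The following sketch traces data through an input layer, one hidden layer, and an output layer in plain Python. The weights are hand-picked rather than learned, and they are an assumption made for this illustration: they were chosen so that the tiny network computes XOR, a function that a single perceptron cannot represent.

```python
def relu(value):
    """Activation used in the hidden layer: negative values become zero."""
    return max(0.0, value)

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per node, plus that node's bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]

# Hand-picked (not learned) parameters, chosen so the network computes XOR.
hidden_weights = [[1.0, 1.0], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -2.0]]
output_biases = [0.0]

def mlp(inputs):
    # Input layer -> hidden layer (with activation) -> output layer
    hidden = [relu(h) for h in dense(inputs, hidden_weights, hidden_biases)]
    return dense(hidden, output_weights, output_biases)[0]

results = {pair: mlp(pair) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(results)
```

In a real MLP these weights would be found by training rather than written by hand.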

## Training Deep Neural Networks

https://www.youtube.com/watch?v=qWXYsjRCdNY

We learned that training deep neural networks involves guided adjustments to improve their performance on tasks like image recognition. By gradually refining the network's parameters and learning from mistakes, these networks become smarter and more skilled at predicting outcomes. The marvel of this technology is its ability to turn raw data into meaningful insights.

## Technical Terms Explained:
**Labeled Dataset**: This is a collection of data where each piece of information comes with a correct answer or label. It's like a quiz with the questions and answers already provided.

**Gradient Descent**: This method helps find the best settings for a neural network by slowly tweaking them to reduce errors, similar to finding the lowest point in a valley.

**Cost Function**: Imagine it as a score that tells you how wrong your network's predictions are. The goal is to make this score as low as possible.

**Learning Rate**: This hyperparameter specifies how big the steps are when adjusting the neural network's settings during training. Too big, and you might skip over the best setting; too small, and it'll take a very long time to get there.

**Backpropagation**: Short for backward propagation of errors. This is like a feedback system that tells each part of the neural network how much it contributed to any mistakes, so it can learn and do better next time.
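
As a rough illustration of how these terms fit together, the sketch below runs gradient descent on a model with a single weight, using a tiny labeled dataset and a mean squared error cost. The data, the function names, and the three learning rates are assumptions picked for this example. With only one weight the gradient can be written by hand; backpropagation is what computes the equivalent gradient for every weight in a deep network.

```python
# Labeled dataset: each input x comes with its correct answer y (here y = 3 * x).
inputs = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 6.0, 9.0, 12.0]

def cost(weight):
    """Cost function: mean squared error of the predictions weight * x."""
    return sum((weight * x - y) ** 2 for x, y in zip(inputs, labels)) / len(inputs)

def gradient(weight):
    """Derivative of the cost with respect to the weight."""
    return sum(2 * (weight * x - y) * x for x, y in zip(inputs, labels)) / len(inputs)

def gradient_descent(learning_rate, steps=50, weight=0.0):
    """Repeatedly step against the gradient to reduce the cost."""
    for _ in range(steps):
        weight -= learning_rate * gradient(weight)
    return weight

good = gradient_descent(learning_rate=0.05)        # settles very close to 3
too_small = gradient_descent(learning_rate=0.0001) # barely moves in 50 steps
too_big = gradient_descent(learning_rate=0.2)      # overshoots further each step

print(good, cost(good))
print(too_small, cost(too_small))
print(too_big)
```

The three runs differ only in the learning rate, which is why it is treated as a hyperparameter worth tuning.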

## Exercise: Classification of Handwritten Digits Using an MLP



In this exercise, you will train a multi-layer perceptron (MLP) to classify handwritten digits from the MNIST dataset. The MNIST dataset consists of 28x28 grayscale images of handwritten digits (0 to 9). The task is to classify each image into one of the 10 classes (one for each digit).

First, run through the notebook in the Workspace on your own. Then, you'll be ready to watch my walkthrough video and answer the quiz questions.

### scikit-learn Documentation

https://scikit-learn.org/stable/datasets/loading_other_datasets.html#downloading-datasets-from-the-openml-org-repository

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

https://scikit-learn.org/stable/modules/neural_networks_supervised.html

### NumPy Documentation

https://numpy.org/doc/stable/reference/generated/numpy.array.html

https://numpy.org/doc/stable/user/basics.broadcasting.html

## What Is PyTorch

https://www.youtube.com/watch?v=wSC-KYyymsM

PyTorch is a dynamic and powerful tool for building and training machine learning models. It simplifies the process with its fundamental building blocks like tensors and neural networks and offers effective ways to define objectives and improve models using loss functions and optimizers. By leveraging PyTorch, anyone can gain the skills to work with large amounts of data and develop cutting-edge AI applications.

## Resources


https://pytorch.org/

## PyTorch Tensors

https://www.youtube.com/watch?v=hxU91jBuISw

PyTorch tensors are the basic building blocks for storing and manipulating data in PyTorch. They let us work with data in multiple dimensions, which is especially handy for images and other structured data. Getting to know tensors is a first step toward understanding how PyTorch performs the numerical computations behind deep learning efficiently.

## Technical Terms Explained:
**Tensors**: Generalized versions of vectors and matrices that can have any number of dimensions (i.e. multi-dimensional arrays). They hold data for processing with operations like addition or multiplication.

**Matrix operations**: Calculations involving matrices, which are two-dimensional arrays, like adding two matrices together or multiplying them.

**Scalar values**: Single numbers or quantities that only have magnitude, not direction (for example, the number 7 or 3.14).

**Linear algebra**: An area of mathematics focusing on vector spaces and operations that can be performed on vectors and matrices.

In [2]:
import torch
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

### Images as PyTorch Tensors

In [5]:
import torch

# Create a 3-dimensional tensor
images = torch.rand((4, 28, 28))

# Get the second image
second_image = images[1]

### Displaying Images

In [None]:
plt.imshow(second_image, cmap='gray')
plt.axis('off') # disable axes
plt.show()

### Matrix Multiplication

In [3]:
a = torch.tensor([[1, 1], [1, 0]])

print(a)
# tensor([[1, 1],
#         [1, 0]])

print(torch.matrix_power(a, 2))
# tensor([[2, 1],
#         [1, 1]])

print(torch.matrix_power(a, 3))
# tensor([[3, 2],
#         [2, 1]])

print(torch.matrix_power(a, 4))
# tensor([[5, 3],
#         [3, 2]])

tensor([[1, 1],
        [1, 0]])
tensor([[2, 1],
        [1, 1]])
tensor([[3, 2],
        [2, 1]])
tensor([[5, 3],
        [3, 2]])
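
To see what the matrix powers above are doing, here is a plain-Python sketch that multiplies the same matrix by itself using nested lists. The helper names are hypothetical and the code is meant only to show the arithmetic, not how PyTorch implements it.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]

def matrix_power(matrix, exponent):
    """Product of `exponent` copies of the matrix (exponent >= 1)."""
    result = matrix
    for _ in range(exponent - 1):
        result = matmul(result, matrix)
    return result

a = [[1, 1], [1, 0]]
powers = {n: matrix_power(a, n) for n in range(1, 5)}
for n, power in powers.items():
    print(n, power)
# 1 [[1, 1], [1, 0]]
# 2 [[2, 1], [1, 1]]
# 3 [[3, 2], [2, 1]]
# 4 [[5, 3], [3, 2]]
```

The entries follow the Fibonacci sequence (1, 1, 2, 3, 5, ...), which is why this particular matrix makes a handy test case.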


## Resources

https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html

https://pytorch.org/docs/stable/tensors.html

## PyTorch Neural Networks

https://www.youtube.com/watch?v=RaQhrt5QBIM

PyTorch offers powerful features to create and interlink neural networks, which are key elements in understanding modern artificial intelligence. We explored creating a multi-layer perceptron using PyTorch's nn.Module class and then passed a tensor into it and received the output.

## PyTorch MLP Class

In [5]:
import torch.nn as nn


In [14]:
class MLP(nn.Module):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self.hidden_layer = nn.Linear(input_size, 64)
        self.output_layer = nn.Linear(64, 2)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.activation(self.hidden_layer(x))
        return self.output_layer(x)


In [15]:
model = MLP(input_size=10)
print(model)

MLP(
  (hidden_layer): Linear(in_features=10, out_features=64, bias=True)
  (output_layer): Linear(in_features=64, out_features=2, bias=True)
  (activation): ReLU()
)


In [16]:
# Calling the model directly runs forward() along with any registered hooks
model(torch.rand(10))

tensor([-0.1876,  0.1264], grad_fn=<ViewBackward0>)
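
It can help to check how many learnable parameters this small network holds. The sketch below does the arithmetic by hand, assuming the layer sizes printed above; the helper function is hypothetical. A linear layer stores one weight per input-output pair plus one bias per output.

```python
def linear_parameter_count(in_features, out_features, bias=True):
    """Weights (in_features * out_features) plus one bias per output, if enabled."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases

hidden = linear_parameter_count(10, 64)   # hidden_layer
output = linear_parameter_count(64, 2)    # output_layer
total = hidden + output                   # ReLU has no parameters

print(hidden, output, total)  # 704 130 834
```

With PyTorch available, the same total should be obtainable by summing numel() over model.parameters().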

## Resources

https://pytorch.org/tutorials/beginner/nn_tutorial.html

https://pytorch.org/docs/stable/nn.html

https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module

https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear

https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

## PyTorch Loss Functions

https://www.youtube.com/watch?v=ZI5OCPZoBAo

PyTorch loss functions are essential tools that help in improving the accuracy of a model by measuring errors. These functions come in different forms to tackle various problems, like deciding between categories (classification) or predicting values (regression). Understanding and using these functions correctly is key to making smart, effective models that do a great job at the tasks they're designed for!

### Cross-Entropy Loss in PyTorch

In [17]:
loss_function = nn.CrossEntropyLoss()

# Our dataset contains a single image of a dog, where 
# cat = 0 and dog = 1 (corresponding to index 0 and 1)
target_tensor = torch.tensor([1])
target_tensor

tensor([1])

In [20]:
# Prediction: Most likely a dog (index 1 is higher)
# Note that the values do not need to sum to 1

predicted_tensor = torch.tensor([[2.0, 5.0]])
loss_value = loss_function(predicted_tensor, target_tensor)
loss_value

tensor(0.0486)

In [21]:
# Prediction: Slightly more likely a cat (index 0 is higher)
predicted_tensor = torch.tensor([[1.5, 1.1]])
loss_value = loss_function(predicted_tensor, target_tensor)
loss_value

tensor(0.9130)
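
To see where the two loss values above come from, the sketch below recomputes them in plain Python. It assumes the usual definition of cross-entropy for a single sample: apply softmax to the raw scores, then take the negative log of the probability given to the correct class. The function and variable names are hypothetical.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log of the softmax probability assigned to the correct class."""
    largest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - largest) for value in logits]
    probability = exps[target_index] / sum(exps)
    return -math.log(probability)

dog = 1  # same class index as in the cells above

confident_and_right = cross_entropy([2.0, 5.0], dog)
slightly_wrong = cross_entropy([1.5, 1.1], dog)

print(round(confident_and_right, 4))  # 0.0486
print(round(slightly_wrong, 4))       # 0.913
```

The loss is small when the correct class gets most of the probability and grows as the model leans toward the wrong class.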

### Mean Squared Error Loss in PyTorch


In [22]:
# Define the loss function
loss_function = nn.MSELoss()

# Define the predicted and actual values as tensors
predicted_tensor = torch.tensor([320000.0])
actual_tensor = torch.tensor([300000.0])

# Compute the MSE loss
loss_value = loss_function(predicted_tensor, actual_tensor)
print(loss_value.item())

400000000.0
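
The large number above is easier to interpret once the calculation is written out. This plain-Python sketch reproduces it and adds a second, made-up house price to show the averaging; the function name and the extra data point are assumptions for illustration.

```python
import math

def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)

single = mean_squared_error([320000.0], [300000.0])
several = mean_squared_error([320000.0, 290000.0], [300000.0, 300000.0])

print(single)             # 400000000.0
print(several)            # 250000000.0
print(math.sqrt(single))  # 20000.0, the error in the original units
```

Because the errors are squared, the loss is in squared units; taking the square root brings it back to the scale of the prices.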


## Technical Terms Explained:
**Loss functions**: They measure how well a model is performing by calculating the difference between the model's predictions and the actual results.

**Cross entropy loss**: This is a measure used when a model needs to choose between categories (like whether an image shows a cat or a dog), and it shows how well the model's predictions align with the actual categories.

**Mean squared error**: This shows the average of the squares of the differences between predicted numbers (like a predicted price) and the actual numbers. It's often used for predicting continuous values rather than categories.

## Resources

https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss

https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss

https://pytorch.org/docs/stable/nn.html#loss-functions

## PyTorch Optimizers

https://www.youtube.com/watch?v=W1eLQ_Z-boM

PyTorch optimizers improve how a neural network learns from data by adjusting the model's parameters based on the gradients of the loss. With a ready-made optimizer, such as stochastic gradient descent (SGD) with momentum or Adam, we can start training quickly!

## Technical Terms Explained:
**Gradients**: Directions and amounts by which a function increases most. The parameters can be changed in a direction opposite to the gradient of the loss function in order to reduce the loss.

**Learning Rate**: This hyperparameter specifies how big the steps are when adjusting the neural network's settings during training. Too big, and you might skip over the best setting; too small, and it'll take a very long time to get there.

**Momentum**: A technique that helps accelerate the optimizer in the right direction and dampens oscillations.

In [None]:
import torch.optim as optim

# Assuming `model` is your defined neural network
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# lr = 0.01 sets the learning rate to 0.01
# momentum=0.9 smooths out updates and can help training




# Assuming `model` is your defined neural network
optimizer = optim.Adam(model.parameters(), lr=0.01)

# lr = 0.01 sets the learning rate to 0.01
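
To get a feel for what momentum does, here is a plain-Python sketch of the update on a single parameter. It assumes the formulation described in the torch.optim.SGD documentation with default settings (the velocity is the previous velocity times the momentum, plus the new gradient). The toy loss, the starting point, and the function names are hypothetical.

```python
def sgd_momentum_step(parameter, gradient, velocity, lr=0.01, momentum=0.9):
    """One update: blend the new gradient into a running velocity, then step."""
    velocity = momentum * velocity + gradient
    parameter = parameter - lr * velocity
    return parameter, velocity

def minimize(momentum, steps=100, lr=0.01):
    """Minimize loss(p) = p ** 2 (gradient 2 * p), starting from p = 5."""
    parameter, velocity = 5.0, 0.0
    for _ in range(steps):
        parameter, velocity = sgd_momentum_step(
            parameter, 2 * parameter, velocity, lr=lr, momentum=momentum
        )
    return parameter

plain = minimize(momentum=0.0)          # ordinary gradient descent
with_momentum = minimize(momentum=0.9)  # same learning rate, with momentum

print(plain, with_momentum)
```

After the same number of steps and with the same learning rate, the run with momentum ends up much closer to the minimum at zero.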

## Resources

https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD

https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam

https://pytorch.org/docs/stable/optim.html

## PyTorch Datasets and Data Loaders

https://www.youtube.com/watch?v=AY182jC9AS8

PyTorch's Dataset and DataLoader classes make accessing data for your model a breeze! They control how samples are fetched and grouped, so that the data reaches the model in a form it can learn from efficiently.

In [26]:
from torch.utils.data import Dataset

# Create a toy dataset
class NumberProductDataset(Dataset):
    def __init__(self, data_range=(1, 10)):
        self.numbers = list(range(data_range[0], data_range[1]))

    def __getitem__(self, index):
        number1 = self.numbers[index]
        number2 = self.numbers[index] + 1
        return (number1, number2), number1 * number2

    def __len__(self):
        return len(self.numbers)

In [28]:
# Instantiate the dataset
dataset = NumberProductDataset(
    data_range=(0, 11)
)

# Access a data sample
data_sample = dataset[3]
data_sample

((3, 4), 12)

## An Example of a Data Loader

In [38]:
from torch.utils.data import DataLoader

# Instantiate the dataset
dataset = NumberProductDataset(data_range=(0, 11))

# Create a DataLoader instance
dataloader = DataLoader(dataset, batch_size=4, shuffle=False)

# Iterating over batches

for (num_pairs, products) in dataloader:
    print(num_pairs, products)
    print()

[tensor([0, 1, 2, 3]), tensor([1, 2, 3, 4])] tensor([ 0,  2,  6, 12])

[tensor([4, 5, 6, 7]), tensor([5, 6, 7, 8])] tensor([20, 30, 42, 56])

[tensor([ 8,  9, 10]), tensor([ 9, 10, 11])] tensor([ 72,  90, 110])



## Technical Terms:
**PyTorch Dataset class**: This is like a recipe that tells your computer how to get the data it needs to learn from, including where to find it and how to parse it, if necessary.

**PyTorch Data Loader**: Think of this as a delivery truck that brings the data to your AI in small, manageable loads called batches; this makes it easier for the AI to process and learn from the data.

**Batches**: Batches are small groups of samples that the AI looks at and learns from at each step of the way. They are usually equal in size, although the last batch can be smaller.

**Shuffle**: It means mixing up the data so that it's not in the same order every time, which helps the AI learn better.
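
The idea behind batching and shuffling can be sketched in a few lines of plain Python. This is a simplified stand-in, not how the PyTorch data loader is implemented: it skips converting samples to tensors, and the function name and seed parameter are assumptions.

```python
import random

def iterate_batches(samples, batch_size, shuffle=False, seed=None):
    """Yield consecutive groups of samples; the last group may be smaller."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # reorder the indices, not the data
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

numbers = list(range(11))  # 11 samples, as with data_range=(0, 11)

batches = list(iterate_batches(numbers, batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]

shuffled = list(iterate_batches(numbers, batch_size=4, shuffle=True, seed=0))
print(shuffled)
```

With 11 samples and a batch size of 4, the final batch holds only 3 samples, which matches the data loader output shown earlier.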

## Resources

https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset

https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

https://pytorch.org/docs/stable/data.html

## PyTorch Training Loops

https://www.youtube.com/watch?v=GDVr3fImnnk

A PyTorch training loop is an essential part of building a neural network model, which helps us teach the computer how to make predictions or decisions based on data. By using this loop, we gradually improve our model's accuracy through a process of learning from its mistakes and adjusting.

## Technical Terms Explained:

**Training Loop**: The cycle that a neural network goes through many times to learn from the data by making predictions, checking errors, and improving itself.

**Batches**: Batches are small groups of samples that the AI looks at and learns from at each step of the way. They are usually equal in size, although the last batch can be smaller.

**Epochs**: A complete pass through the entire training dataset. The more epochs, the more the computer goes over the material to learn.

**Loss functions**: They measure how well a model is performing by calculating the difference between the model's predictions and the actual results.

**Optimizer**: The algorithm that decides how to change the network's parameters, based on the gradients of the loss, so that the network gets better at its job.

## Create a Number Sum Dataset

This dataset has two features—a pair of numbers—and a target value—the sum of those two numbers.

Note that this is not actually a good use of deep learning. At the end of our training loop, the model still doesn't know how to add 3 + 7 exactly; it only approximates the answer! The idea here is to use a simple example so it's easy to evaluate the model's performance.

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

In [5]:
class NumberSumDataset(Dataset):
    def __init__(self, data_range=(1, 10)):
        self.numbers = list(range(data_range[0], data_range[1]))

    def __getitem__(self, index):
        number1 = float(self.numbers[index // len(self.numbers)])
        number2 = float(self.numbers[index % len(self.numbers)])
        return torch.tensor([number1, number2]), torch.tensor([number1 + number2])

    def __len__(self):
        return len(self.numbers) ** 2
    

## Inspect the Dataset

In [6]:
dataset = NumberSumDataset(data_range=(1, 100))
for i in range(5):
    print(dataset[i])

(tensor([1., 1.]), tensor([2.]))
(tensor([1., 2.]), tensor([3.]))
(tensor([1., 3.]), tensor([4.]))
(tensor([1., 4.]), tensor([5.]))
(tensor([1., 5.]), tensor([6.]))


In [7]:
len(dataset)

9801
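
The index arithmetic inside the dataset can be hard to read at first. The plain-Python sketch below, without tensors, shows how a single index is split into a row and a column so that every ordered pair of numbers gets its own index. The helper name is hypothetical.

```python
numbers = list(range(1, 100))  # data_range=(1, 100) -> the 99 numbers 1..99

def pair_at(index):
    """Map one flat index to a (first, second) pair, like the dataset does."""
    row, column = divmod(index, len(numbers))  # same as index // n and index % n
    return numbers[row], numbers[column]

total_pairs = len(numbers) ** 2

print(pair_at(0))     # (1, 1)
print(pair_at(98))    # (1, 99)
print(pair_at(99))    # (2, 1)
print(pair_at(9800))  # (99, 99)
print(total_pairs)    # 9801
```

The first number changes only after the second has cycled through all 99 values, which is why the first five samples above all start with 1.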

## Define a Simple Model

In [10]:
class MLP(nn.Module):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self.hidden_layer = nn.Linear(input_size, 128)
        self.output_layer = nn.Linear(128, 1)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.activation(self.hidden_layer(x))
        return self.output_layer(x)



## Instantiate Components Needed for Training

In [33]:
dataset = NumberSumDataset(data_range=(0, 100))
dataloader = DataLoader(dataset, batch_size=100, shuffle=True)
model = MLP(input_size=2)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
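
Before training, it can be useful to work out how many parameter updates a run will make. The sketch below does this arithmetic for the settings above, assuming 10 epochs of training; the variable names are hypothetical.

```python
import math

numbers = list(range(0, 100))     # data_range=(0, 100) -> 100 numbers
dataset_size = len(numbers) ** 2  # every ordered pair -> 10,000 samples
batch_size = 100
epochs = 10                       # assumed number of passes over the data

batches_per_epoch = math.ceil(dataset_size / batch_size)
total_updates = batches_per_epoch * epochs

print(dataset_size, batches_per_epoch, total_updates)  # 10000 100 1000
```

Each batch leads to one optimizer step, so one epoch here means 100 updates to the model's parameters.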


In [40]:
# Create a Training Loop
for epoch in range(10):
    total_loss = 0.0
    for number_pairs, sums in dataloader: # iterate over the batches
        predictions = model(number_pairs) # Compute the model output
        loss = loss_function(predictions, sums) # Compute the loss
        loss.backward() # Perform backpropagation
        optimizer.step() # Update the parameters
        optimizer.zero_grad() # Zero the gradients

        total_loss += loss.item() # Add the loss for all batches

    # Print the loss for this epoch
    print("Epoch {}: Sum of Batch Losses = {:.5f}".format(epoch, total_loss))

Epoch 0: Sum of Batch Losses = 0.21776
Epoch 1: Sum of Batch Losses = 0.20828
Epoch 2: Sum of Batch Losses = 0.20975
Epoch 3: Sum of Batch Losses = 0.20326
Epoch 4: Sum of Batch Losses = 0.19416
Epoch 5: Sum of Batch Losses = 0.20175
Epoch 6: Sum of Batch Losses = 0.18542
Epoch 7: Sum of Batch Losses = 0.18894
Epoch 8: Sum of Batch Losses = 0.18817
Epoch 9: Sum of Batch Losses = 0.18943


In [41]:
# Test the model on 3 + 7
model(torch.tensor([3.0, 7.0]))

tensor([9.9942], grad_fn=<ViewBackward0>)

## Resources

https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

## What Is Hugging Face

https://www.youtube.com/watch?v=reiuwJAVZVY

Hugging Face is a company making waves in the technology world with its amazing tools for understanding and using human language in computers. Hugging Face offers everything from tokenizers, which help computers make sense of text, to a huge variety of ready-to-go language models, and even a treasure trove of data suited for language tasks.

## Technical Terms:

**Tokenizers**: These work like a translator, converting the words we use into smaller parts and creating a secret code that computers can understand and work with.

**Models**: These are like the brain for computers, allowing them to learn and make decisions based on information they've been fed.

**Datasets**: Think of datasets as textbooks for computer models. They are collections of information that models study to learn and improve.

**Trainers**: Trainers are the coaches for computer models. They help these models get better at their tasks by practicing and providing guidance. Hugging Face Trainers implement the PyTorch training loop for you, so you can focus instead on other aspects of working on the model.

## Resources

https://huggingface.co/

## Hugging Face Tokenizers

https://www.youtube.com/watch?v=_2kGyo9uofk

Hugging Face tokenizers help us break down text into smaller, manageable pieces called tokens. These tokenizers are easy to use and also remarkably fast because they are implemented in the Rust programming language.

## Technical Terms Explained:

**Tokenization**: It's like cutting a sentence into individual pieces, such as words or characters, to make it easier to analyze.

**Tokens**: These are the pieces you get after cutting up text during tokenization, kind of like individual Lego blocks that can be words, parts of words, or even single letters. These tokens are converted to numerical values for models to understand.

**Pre-trained Model**: This is a ready-made model that has been previously taught with a lot of data.

**Uncased**: This means that the model treats uppercase and lowercase letters as the same.

In [44]:
from transformers import BertTokenizer

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# See how many tokens are in the vocabulary

tokenizer.vocab_size

30522

In [46]:
# Tokenize the sentence 
tokens = tokenizer.tokenize('I heart Generative AI')

# Print the tokens
print(tokens)

# Show the token ids assigned to each token
print(tokenizer.convert_tokens_to_ids(tokens))

['i', 'heart', 'genera', '##tive', 'ai']
[1045, 2540, 11416, 6024, 9932]
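
The split of "generative" into two pieces can look surprising. The sketch below is a simplified, plain-Python take on the greedy longest-match idea behind WordPiece tokenization. It is not the library's implementation: the toy vocabulary and its ids are made up for this example, and punctuation handling and other details are left out.

```python
def wordpiece(word, vocab, unknown="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into sub-word tokens."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation of the word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # try a shorter piece
        if piece is None:
            return [unknown]  # nothing in the vocabulary fits
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary with made-up ids (the real one has 30,522 entries).
vocab = {"i": 0, "heart": 1, "genera": 2, "##tive": 3, "ai": 4}

sentence = "I heart Generative AI"
tokens = [t for word in sentence.lower().split() for t in wordpiece(word, vocab)]
ids = [vocab[t] for t in tokens]

print(tokens)  # ['i', 'heart', 'genera', '##tive', 'ai']
print(ids)     # [0, 1, 2, 3, 4]
```

Lowercasing the sentence first mirrors what an uncased model does, and the "##" prefix marks a piece that continues the previous token.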


## Resources

https://huggingface.co/docs/tokenizers/main/en/index

https://huggingface.co/docs/tokenizers/main/en/api/tokenizer

https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer

## Hugging Face Models

https://www.youtube.com/watch?v=xFKrhP2tRWE

Hugging Face models provide a quick way to get started using models trained by the community. With only a few lines of code, you can load a pre-trained model and start using it on tasks such as sentiment analysis.

In [52]:
from transformers import BertForSequenceClassification, BertTokenizer

# Load a pre-trained sentiment analysis model
model_name = "textattack/bert-base-uncased-imdb"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the input sequence
tokenizer = BertTokenizer.from_pretrained(model_name)
inputs = tokenizer("Generative AI is not fun", return_tensors='pt')

# Make prediction
with torch.no_grad():
    outputs = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities)

# Display sentiment result
if predicted_class == 1:
    print(f"Sentiment: Positive ({probabilities[0][1] * 100:.2f}%)")
else:
    print(f"Sentiment: Negative ({probabilities[0][0] * 100:.2f}%)")

Sentiment: Negative (99.30%)
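
The step from raw model scores (logits) to a percentage is done by softmax. Here is a plain-Python sketch of that step alone. The logits are hypothetical values chosen for illustration, not the actual output of the model above.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.5, -2.5]  # hypothetical scores for [negative, positive]
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))  # argmax

label = "Positive" if predicted_class == 1 else "Negative"
print(f"Sentiment: {label} ({probabilities[predicted_class] * 100:.2f}%)")
# Sentiment: Negative (99.33%)
```

A gap of 5 between the two scores is already enough for softmax to assign more than 99% to the larger one.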


## Resources

https://huggingface.co/docs/transformers/index

https://huggingface.co/models

https://huggingface.co/textattack/bert-base-uncased-imdb

https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification

https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html#torch.nn.functional.softmax

https://pytorch.org/docs/stable/generated/torch.argmax.html#torch.argmax

## Hugging Face Datasets

https://www.youtube.com/watch?v=NXscJaem9zc

The Hugging Face Datasets library is a powerful tool for managing a variety of data types, like text and images, efficiently and easily. It is fast and memory-efficient, which makes it well suited to large datasets.

## Technical Terms Explained:


**IMDb dataset**: A dataset of movie reviews that can be used to train a machine learning model to understand human sentiments.


**Apache Arrow**: A columnar in-memory data format that allows for fast data processing. The Datasets library uses it to store data efficiently.

Note that this code uses IPython functions (display and HTML) so it should be run in an IPython environment (e.g. Jupyter Notebook).

In [57]:
from datasets import load_dataset
from IPython.display import HTML, display

# Load the IMDB dataset, which contains movie reviews
# and sentiment labels (positive or negative)
dataset = load_dataset("imdb")

# Fetch a review from the training set
review_number = 42
sample_review = dataset["train"][review_number]

display(HTML(sample_review["text"][:450] + "..."))

if sample_review['label'] == 1:
    print("Sentiment: Positive")
else:
    print("Sentiment: Negative")

Sentiment: Negative


## Resources

https://huggingface.co/docs/datasets/index

https://huggingface.co/datasets

https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset

https://huggingface.co/datasets/imdb

## Hugging Face Trainers

https://www.youtube.com/watch?v=ffvJ8CPfE9o

Hugging Face trainers offer a simplified approach to training generative AI models, making it easier to set up and run complex machine learning tasks. This tool wraps up the hard parts, like handling data and carrying out the training process, allowing us to focus on the big picture and achieve better outcomes with our AI endeavors.

## Technical Terms Explained:
**Truncating**: This refers to shortening longer pieces of text to fit a certain size limit.

**Padding**: Adding extra data to shorter texts to reach a uniform length for processing.

**Batches**: Batches are small groups of samples that the AI looks at and learns from at each step of the way. They are usually equal in size, although the last batch can be smaller.

**Batch Size**: The number of data samples that the machine considers in one go during training.

**Epochs**: A complete pass through the entire training dataset. The more epochs, the more the computer goes over the material to learn.

**Dataset Splits**: Dividing the dataset into parts for different uses, such as training the model and testing how well it works.
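
Truncating and padding are easiest to understand on short lists of token ids. The plain-Python sketch below is a simplification: the token ids are hypothetical, a padding id of 0 is assumed, and a real tokenizer would also keep special tokens such as the final separator when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut or pad the ids to exactly max_length and build an attention mask."""
    kept = token_ids[:max_length]                   # truncation
    padding = [pad_id] * (max_length - len(kept))   # padding
    attention_mask = [1] * len(kept) + [0] * len(padding)  # 0 marks padding
    return kept + padding, attention_mask

short_review = [101, 2307, 3185, 102]                        # hypothetical ids
long_review = [101, 1045, 2293, 2023, 3185, 2061, 2172, 102]

batch = [pad_or_truncate(ids, max_length=6) for ids in (short_review, long_review)]
for ids, mask in batch:
    print(ids, mask)
# [101, 2307, 3185, 102, 0, 0] [1, 1, 1, 1, 0, 0]
# [101, 1045, 2293, 2023, 3185, 2061] [1, 1, 1, 1, 1, 1]
```

Once every sequence has the same length, the samples can be stacked into a single batch.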



In [61]:
from transformers import (DistilBertForSequenceClassification,
    DistilBertTokenizer,
    TrainingArguments,
    Trainer
)
from datasets import load_dataset

In [64]:
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


dataset = load_dataset("imdb")
tokenized_datasets = dataset.map(tokenize_function, batched=True)

training_args = TrainingArguments(
    per_device_train_batch_size=64,
    output_dir="./results",
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)
trainer.train()

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.26.0`: Please run `pip install transformers[torch]` or `pip install 'accelerate>={ACCELERATE_MIN_VERSION}'`
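
The error above indicates that the accelerate package is missing or too old in this environment. One way to check before creating a Trainer is sketched below, using only the Python standard library. The helper name is hypothetical, and the suggested install command is simply the one quoted in the error message.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("accelerate")
if version is None:
    print("accelerate is not installed; try: pip install 'transformers[torch]'")
else:
    print(f"accelerate {version} is installed")
```

After installing the package in a notebook, the kernel may need to be restarted before the import succeeds.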

## Resources

https://huggingface.co/docs/transformers/main_classes/trainer

https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertForSequenceClassification

https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertTokenizer

https://huggingface.co/distilbert-base-uncased

https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments

https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer

## PyTorch

https://www.youtube.com/watch?v=kHWB4i2eS1Q

## Hugging Face