I will explain and carry out the following steps:

1. Load data
2. Define a PyTorch model
3. Define the loss function and optimizer
4. Run a training loop
5. Evaluate the model
6. Make predictions

### (1) Load Data

First the following imports are needed:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
```

```python
# Load the dataset and split it into input (X) and output (y) variables.
# np.loadtxt returns a NumPy array of 64-bit floats.
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')  # values are separated by commas
X = dataset[:, 0:8]  # all rows, columns 0-7: the 8 input features
y = dataset[:, 8]    # all rows, column 8: the output label (0 or 1)
print(f"X shape: {X.shape}, y shape: {y.shape}")  # 768 samples, each with 8 inputs and 1 output

# Print the first 10 data points from the NumPy arrays X and y
print("First 10 datapoints in NumPy array X:")
print(X[:10])  # first 10 rows of X
print("\nFirst 10 datapoints in NumPy array y:")
print(y[:10])  # first 10 values of y
```

```
X shape: (768, 8), y shape: (768,)
First 10 datapoints in NumPy array X:
[[6.000e+00 1.480e+02 7.200e+01 3.500e+01 0.000e+00 3.360e+01 6.270e-01
  5.000e+01]
 [1.000e+00 8.500e+01 6.600e+01 2.900e+01 0.000e+00 2.660e+01 3.510e-01
  3.100e+01]
 [8.000e+00 1.830e+02 6.400e+01 0.000e+00 0.000e+00 2.330e+01 6.720e-01
  3.200e+01]
 [1.000e+00 8.900e+01 6.600e+01 2.300e+01 9.400e+01 2.810e+01 1.670e-01
  2.100e+01]
 [0.000e+00 1.370e+02 4.000e+01 3.500e+01 1.680e+02 4.310e+01 2.288e+00
  3.300e+01]
 [5.000e+00 1.160e+02 7.400e+01 0.000e+00 0.000e+00 2.560e+01 2.010e-01
  3.000e+01]
 [3.000e+00 7.800e+01 5.000e+01 3.200e+01 8.800e+01 3.100e+01 2.480e-01
  2.600e+01]
 [1.000e+01 1.150e+02 0.000e+00 0.000e+00 0.000e+00 3.530e+01 1.340e-01
  2.900e+01]
 [2.000e+00 1.970e+02 7.000e+01 4.500e+01 5.430e+02 3.050e+01 1.580e-01
  5.300e+01]
 [8.000e+00 1.250e+02 9.600e+01 0.000e+00 0.000e+00 0.000e+00 2.320e-01
  5.400e+01]]

First 10 datapoints in NumPy array y:
[1. 0. 1. 0. 1. 0. 1. 0. 1. 1.]
```
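In `dataset[:, 0:8]`, the first index selects rows (samples) and the second selects columns (features). A minimal sketch on a small synthetic array (the values are made up, not taken from the Pima file) shows the difference:

```python
import numpy as np

# Small synthetic table: 3 rows (samples), 4 columns (3 features + 1 label)
data = np.array([[1.0, 2.0, 3.0, 0.0],
                 [4.0, 5.0, 6.0, 1.0],
                 [7.0, 8.0, 9.0, 1.0]])

features = data[:, 0:3]   # ':' keeps every row, '0:3' keeps columns 0, 1, 2
labels = data[:, 3]       # every row, column 3 only -> 1D array
print(features.shape)     # (3, 3)
print(labels.shape)       # (3,)
print(labels)             # [0. 1. 1.]
print(data[0:2].shape)    # (2, 4): slicing only the first axis selects rows instead
```

For the Pima data, the same pattern gives X with shape (768, 8) and y with shape (768,).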


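PyTorch works with its own `tensor` data type, so the NumPy arrays must be converted. The dtype is set explicitly because NumPy arrays default to 64-bit floats, while PyTorch's default floating-point type (and the dtype of the model's weights) is 32-bit.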
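A minimal sketch of why the explicit `dtype=torch.float32` matters, using a tiny made-up array and a stand-in `nn.Linear(3, 1)` layer rather than the real model:

```python
import numpy as np
import torch
import torch.nn as nn

arr = np.array([1.0, 2.0, 3.0])
print(arr.dtype)                  # float64: NumPy's default float type
print(torch.get_default_dtype())  # torch.float32: PyTorch's default float type

t_default = torch.tensor(arr)                       # keeps NumPy's float64
t_float32 = torch.tensor(arr, dtype=torch.float32)  # explicit conversion
print(t_default.dtype, t_float32.dtype)             # torch.float64 torch.float32

layer = nn.Linear(3, 1)  # layer weights are created as float32
rejected = False
try:
    layer(t_default.unsqueeze(0))   # float64 input of shape (1, 3)
except RuntimeError as err:
    rejected = True
    print("float64 input rejected:", err)

print(layer(t_float32.unsqueeze(0)).shape)  # torch.Size([1, 1])
```

The exact error message differs between PyTorch versions, but mixing a float64 input with float32 weights fails, while the float32 input works.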

```python
# Convert to PyTorch tensors (float32 to match the model's weights)
X = torch.tensor(X, dtype=torch.float32)
# y is a 1D array of shape (768,). The model outputs shape (N, 1) and the loss function
# expects targets of the same shape, so y is reshaped into a column vector of shape (768, 1):
# [[1.0], [0.0], [1.0], ...]
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# Print the first 10 data points from the PyTorch tensors X and y
print("\nFirst 10 datapoints in PyTorch tensor X:")
print(X[:10])  # first 10 rows of X
print("\nFirst 10 datapoints in PyTorch tensor y:")
print(y[:10])  # first 10 values of y, each in its own row
```


```
First 10 datapoints in PyTorch tensor X:
tensor([[6.0000e+00, 1.4800e+02, 7.2000e+01, 3.5000e+01, 0.0000e+00, 3.3600e+01,
         6.2700e-01, 5.0000e+01],
        [1.0000e+00, 8.5000e+01, 6.6000e+01, 2.9000e+01, 0.0000e+00, 2.6600e+01,
         3.5100e-01, 3.1000e+01],
        [8.0000e+00, 1.8300e+02, 6.4000e+01, 0.0000e+00, 0.0000e+00, 2.3300e+01,
         6.7200e-01, 3.2000e+01],
        [1.0000e+00, 8.9000e+01, 6.6000e+01, 2.3000e+01, 9.4000e+01, 2.8100e+01,
         1.6700e-01, 2.1000e+01],
        [0.0000e+00, 1.3700e+02, 4.0000e+01, 3.5000e+01, 1.6800e+02, 4.3100e+01,
         2.2880e+00, 3.3000e+01],
        [5.0000e+00, 1.1600e+02, 7.4000e+01, 0.0000e+00, 0.0000e+00, 2.5600e+01,
         2.0100e-01, 3.0000e+01],
        [3.0000e+00, 7.8000e+01, 5.0000e+01, 3.2000e+01, 8.8000e+01, 3.1000e+01,
         2.4800e-01, 2.6000e+01],
        [1.0000e+01, 1.1500e+02, 0.0000e+00, 0.0000e+00, 0.0000e+00, 3.5300e+01,
         1.3400e-01, 2.9000e+01],
        [2.0000e+00, 1.9700e+02, 7.000
```

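The `reshape(-1, 1)` call changes y from a 1D array of shape (768,) into a 2D array of shape (768, 1); the `-1` tells PyTorch to infer that dimension from the number of elements. For a short example:

before, y is 1D: `y = [0, 1, 1, 0, 1]`

after, each element has its own row: `y = [[0], [1], [1], [0], [1]]`

This is needed because the model's output has shape (batch size, 1), and `nn.BCELoss` expects the predictions and the targets to have the same shape.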
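A minimal sketch of the shape change and why the loss needs it, using five made-up labels and constant stand-in predictions of 0.7 instead of real model output:

```python
import torch
import torch.nn as nn

y = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])
print(y.shape)            # torch.Size([5])
y_col = y.reshape(-1, 1)  # -1: infer the number of rows from the number of elements
print(y_col.shape)        # torch.Size([5, 1])

# Stand-in predictions with the shape a model with one output neuron produces
y_pred = torch.full((5, 1), 0.7)
loss_fn = nn.BCELoss()
print(loss_fn(y_pred, y_col))  # shapes match, so the loss is computed

try:
    loss_fn(y_pred, y)  # (5, 1) predictions vs (5,) targets
except ValueError as err:
    # recent PyTorch versions reject targets whose shape differs from the input
    print("shape mismatch:", err)
```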

### (2) Define the model

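The standard way to define a model is to write a class that inherits from `torch.nn.Module`, create the layers in `__init__`, and define in `forward()` how data flows through them. The first layer's input size must match the number of input features (8 here), and each layer's input size must match the previous layer's output size.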
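A minimal sketch of this dimension rule with a single `nn.Linear(8, 12)` layer and random stand-in inputs (not the Pima data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 12)   # expects inputs whose last dimension is 8
batch = torch.randn(5, 8)  # 5 samples with 8 features each
out = layer(batch)
print(out.shape)           # torch.Size([5, 12])

wrong = torch.randn(5, 7)  # only 7 features
try:
    layer(wrong)
except RuntimeError as err:
    print("dimension mismatch:", err)
```

The batch dimension (5 here) can be anything; only the feature dimension has to match the layer's `in_features`.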


```python
class PimaClassifier(nn.Module):          # inherit from nn.Module, the base class for neural networks
    def __init__(self):                   # __init__ runs when an instance of PimaClassifier is created
        super().__init__()                # call the parent class's constructor (required)
        self.hidden1 = nn.Linear(8, 12)   # linear layer: 8 inputs, 12 outputs
        self.act1 = nn.ReLU()             # activation function for the first layer: ReLU
        self.hidden2 = nn.Linear(12, 8)   # second layer: 12 inputs, 8 outputs
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)     # output layer: one neuron for binary classification
        self.act_output = nn.Sigmoid()    # sigmoid maps the output to a probability in (0, 1)

    def forward(self, x):                 # defines how data is passed forward through the network
        x = self.act1(self.hidden1(x))    # composition: linear function inside, activation function outside
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x

model = PimaClassifier()
print(model)
```

```
PimaClassifier(
  (hidden1): Linear(in_features=8, out_features=12, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=12, out_features=8, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=8, out_features=1, bias=True)
  (act_output): Sigmoid()
)
```
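Each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases, so this network has (8·12 + 12) + (12·8 + 8) + (8·1 + 1) = 108 + 104 + 9 = 221 trainable parameters. A short sketch to check the count (the class is repeated so the block runs on its own):

```python
import torch.nn as nn

class PimaClassifier(nn.Module):  # same architecture as defined above
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        return self.act_output(self.output(x))

model = PimaClassifier()
for name, param in model.named_parameters():
    print(f"{name:16s} shape={tuple(param.shape)} count={param.numel()}")

total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", total)  # 221
```

Activation functions such as ReLU and Sigmoid have no parameters, so they do not appear in the list.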


### (3) Preparation for training 
- Define the loss function: for binary classification, binary cross-entropy (BCE) loss.
- Choose an optimizer: Adam is a common default choice.

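```python
loss_fn = nn.BCELoss()  # binary cross-entropy; expects probabilities in (0, 1), hence the sigmoid output
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam has further hyperparameters (e.g. betas, eps); only the learning rate is set here
```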
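For predicted probabilities $p_i$ and labels $y_i$, `nn.BCELoss` (with its default mean reduction) computes $\text{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. A minimal check with three made-up probabilities and labels:

```python
import torch
import torch.nn as nn

p = torch.tensor([[0.9], [0.2], [0.6]])  # hypothetical predicted probabilities
y = torch.tensor([[1.0], [0.0], [0.0]])  # true labels

# Manual binary cross-entropy, averaged over the samples
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
builtin = nn.BCELoss()(p, y)
print(manual.item(), builtin.item())  # the two values agree
```

The third sample (label 0, predicted 0.6) contributes the largest term, since a confident wrong prediction is penalized most.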

### (4) Training the model  

Epoch: one pass of the entire training dataset through the model.

Batch: a subset of one or more samples passed through the model; the gradient computed on one batch is used for one parameter update (one iteration of gradient descent). The amount of computation per iteration grows roughly linearly with the batch size.
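With 768 samples and a batch size of 10, one epoch consists of ⌈768 / 10⌉ = 77 batches, the last of which holds only 8 samples. A short sketch of that arithmetic, using the same `range()` call as the training loop:

```python
import math

n_samples = 768  # rows in the Pima dataset
batch_size = 10
n_epochs = 100

batch_starts = list(range(0, n_samples, batch_size))  # start index of every batch
batches_per_epoch = len(batch_starts)
last_batch_size = n_samples - batch_starts[-1]
total_updates = batches_per_epoch * n_epochs

print(batches_per_epoch, math.ceil(n_samples / batch_size))  # 77 77
print(last_batch_size)                                        # 8
print(total_updates)                                          # 7700 parameter updates in total
```

Slicing past the end of a tensor (as `X[760:770]` does) simply returns the remaining rows, so the shorter last batch needs no special handling.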

Batches are passed through the model in a loop until the whole dataset has been covered (one epoch), and this is repeated for several epochs until the model's output is satisfactory.

The simplest way to build a training loop is to use two nested for-loops, one for epochs and one for batches:

```python
n_epochs = 100
batch_size = 10

for epoch in range(n_epochs):               # loop over the number of epochs
    # loop over the dataset in batches (mini-batch gradient descent); the data is not shuffled here
    for i in range(0, len(X), batch_size):  # range(start index, end index (exclusive), step size)
        Xbatch = X[i:i+batch_size]          # get a batch of input data of size batch_size
        y_pred = model(Xbatch)              # forward pass: compute the model's predictions for the batch
        ybatch = y[i:i+batch_size]          # get the corresponding batch of targets
        loss = loss_fn(y_pred, ybatch)      # loss between the predictions and the actual targets
        optimizer.zero_grad()               # reset gradients from the previous iteration (they would accumulate otherwise)
        loss.backward()                     # backward pass: gradients of the loss w.r.t. the model's parameters
        optimizer.step()                    # update the parameters using the gradients and the optimizer's rule

    print(f'Finished epoch {epoch}, latest loss {loss}')  # loss of the last batch in this epoch
```


```
Finished epoch 0, latest loss 0.6246140003204346
Finished epoch 1, latest loss 0.6292789578437805
Finished epoch 2, latest loss 0.6256213188171387
Finished epoch 3, latest loss 0.6078948974609375
Finished epoch 4, latest loss 0.5829925537109375
Finished epoch 5, latest loss 0.5649218559265137
Finished epoch 6, latest loss 0.552021861076355
Finished epoch 7, latest loss 0.5435589551925659
Finished epoch 8, latest loss 0.5360323786735535
Finished epoch 9, latest loss 0.5319582223892212
Finished epoch 10, latest loss 0.528340220451355
Finished epoch 11, latest loss 0.5259295701980591
Finished epoch 12, latest loss 0.5268047451972961
Finished epoch 13, latest loss 0.5282303094863892
Finished epoch 14, latest loss 0.5231646299362183
Finished epoch 15, latest loss 0.5265318155288696
Finished epoch 16, latest loss 0.5174545049667358
Finished epoch 17, latest loss 0.5159606337547302
Finished epoch 18, latest loss 0.5168362259864807
Finished epoch 19, latest loss 0.5211670994758606
Finished epo
```
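The `optimizer.zero_grad()` call in the loop is needed because `backward()` adds new gradients to those already stored. A minimal sketch with a single made-up parameter `w` and a plain SGD optimizer (chosen only for illustration):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

def compute_loss():
    return (3 * w) ** 2    # d(loss)/dw = 18 * w = 36 at w = 2

compute_loss().backward()
print(w.grad.item())       # 36.0

compute_loss().backward()  # no zero_grad(): the new gradient is added to the stored one
print(w.grad.item())       # 72.0

opt.zero_grad()            # clear the stored gradient
compute_loss().backward()
print(w.grad.item())       # 36.0 again
```

No `opt.step()` is called here, so `w` stays at 2.0 and only the gradient bookkeeping is shown.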

### (5) Evaluate the Model 

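This example trains on the whole dataset; no data is held out. Normally the data is split, e.g. into a training set (80%), a validation set (10%) and a test set (10%), so that performance is measured on data the model has not seen during training. Here we can only evaluate on the training data, which tends to overestimate performance; with a test set, the same evaluation code would be run on the test data instead.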
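A minimal sketch of an 80/10/10 split using a random permutation of the sample indices. Random synthetic tensors stand in for the Pima X and y so the block runs on its own; the variable names are illustrative:

```python
import torch

torch.manual_seed(42)
X = torch.randn(768, 8)                 # synthetic stand-in for the Pima inputs
y = (torch.rand(768, 1) > 0.5).float()  # synthetic stand-in for the labels

n = len(X)
perm = torch.randperm(n)                # random order of the sample indices
n_train = int(0.8 * n)                  # 614
n_val = int(0.1 * n)                    # 76
train_idx = perm[:n_train]
val_idx = perm[n_train:n_train + n_val]
test_idx = perm[n_train + n_val:]       # the remaining 78 samples

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(len(train_idx), len(val_idx), len(test_idx))  # 614 76 78
```

The training loop would then use only `X_train`/`y_train`, the validation set would guide choices such as the number of epochs, and the accuracy would be reported on `X_test`/`y_test`.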
Reminder: **Accuracy** measures the proportion of correct predictions out of the total number of predictions:

$$\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$$
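A small worked example with five made-up sigmoid outputs shows that counting TP, TN, FP and FN gives the same accuracy as the `(y_pred.round() == y).float().mean()` expression used below:

```python
import torch

y_prob = torch.tensor([[0.9], [0.3], [0.6], [0.2], [0.8]])  # hypothetical sigmoid outputs
y_true = torch.tensor([[1.0], [0.0], [0.0], [0.0], [1.0]])

y_hat = y_prob.round()  # threshold at 0.5 (torch.round sends exactly 0.5 to 0)
tp = ((y_hat == 1) & (y_true == 1)).sum().item()
tn = ((y_hat == 0) & (y_true == 0)).sum().item()
fp = ((y_hat == 1) & (y_true == 0)).sum().item()
fn = ((y_hat == 0) & (y_true == 1)).sum().item()

accuracy_counts = (tp + tn) / (tp + tn + fp + fn)
accuracy_mean = (y_hat == y_true).float().mean().item()
print(tp, tn, fp, fn)                  # 2 2 1 0
print(accuracy_counts, accuracy_mean)  # both 0.8 (up to float32 rounding)
```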


```python
# compute accuracy on the training data (no_grad is optional)
with torch.no_grad():
    y_pred = model(X)

accuracy = (y_pred.round() == y).float().mean()  # round() thresholds the probabilities at 0.5
print(f"Accuracy {accuracy}")
```

```
Accuracy 0.7578125
```


The `round()` function rounds each value to the nearest integer; since the sigmoid outputs lie between 0 and 1, this amounts to a threshold at 0.5. The `==` operator compares element-wise and returns a Boolean tensor, which `.float()` converts to 1.0 and 0.0. The `mean()` function then gives the number of 1s (predictions that match the label) divided by the total number of samples. The `torch.no_grad()` context is optional but recommended: it turns off gradient tracking, so PyTorch does not build a computation graph for `y_pred`, which saves memory and time since we are not going to differentiate it.

Training is a stochastic process (the weights are initialized randomly), so running the whole training several times gives somewhat different models and accuracies.
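To compare runs fairly, or to repeat one exactly, the random initialization can be fixed with `torch.manual_seed`. A minimal sketch, using a small `nn.Sequential` stand-in model rather than `PimaClassifier`; note that fully deterministic training (e.g. on a GPU) may need further settings:

```python
import torch
import torch.nn as nn

def make_model(seed):
    torch.manual_seed(seed)  # fixes PyTorch's random number generator before the layers are created
    return nn.Sequential(nn.Linear(8, 12), nn.ReLU(), nn.Linear(12, 1), nn.Sigmoid())

a = make_model(0)
b = make_model(0)
c = make_model(1)

same = all(torch.equal(p, q) for p, q in zip(a.parameters(), b.parameters()))
different = not all(torch.equal(p, q) for p, q in zip(a.parameters(), c.parameters()))
print(same, different)  # True True
```

Training several models with different seeds and looking at the spread of their accuracies gives a better picture than a single run.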

### (6) Make predictions

We can now use the trained model to make predictions:

```python
# make class predictions with the model: probability > 0.5 -> class 1, otherwise class 0
with torch.no_grad():
    predictions = (model(X) > 0.5).int()
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i].item(), y[i].item()))
```

```
[6.0, 148.0, 72.0, 35.0, 0.0, 33.599998474121094, 0.6269999742507935, 50.0] => 1 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.600000381469727, 0.35100001096725464, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.299999237060547, 0.671999990940094, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.100000381469727, 0.16699999570846558, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.099998474121094, 2.2880001068115234, 33.0] => 1 (expected 1)
```
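To predict for a single new sample, it must still be passed as a batch of shape (1, 8). A minimal sketch; the model here is a fresh, untrained copy of `PimaClassifier` so the block runs on its own, which means the printed probability is arbitrary, and `new_patient` simply reuses the values of the first dataset row:

```python
import torch
import torch.nn as nn

class PimaClassifier(nn.Module):  # same architecture as defined above (untrained here)
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        return self.act_output(self.output(x))

model = PimaClassifier()
model.eval()  # good practice before inference; no effect here (no dropout or batch norm)

new_patient = [6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0]  # 8 feature values
x = torch.tensor(new_patient, dtype=torch.float32).unsqueeze(0)  # shape (1, 8): a batch with one sample

with torch.no_grad():
    prob = model(x).item()  # probability of class 1
label = int(prob > 0.5)
print(f"probability {prob:.3f} -> class {label}")
```

With the trained `model` from above, the same three lines inside and after `torch.no_grad()` would give a meaningful prediction.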


### Summary 
You have seen how to create your first neural network model using PyTorch. Specifically, you learned the key steps to build a neural network step by step:

- how to load data
- how to define a neural network in PyTorch
- how to train a model on data
- how to evaluate a model
- how to make predictions with the model