# Data-driven approach

## Connectionist approach: Perceptron

The connectionist approach uses a neural network model, such as a simple perceptron, to learn the patterns found in names. The steps are:

1. **Data encoding**: Convert the names into a format suitable for training a neural network (for example, one-hot encoding).
2. **Model training**: Train a simple perceptron on the dataset. The input could be a sequence of characters and the target the next character of the name.
3. **Loss evaluation**: Use a suitable loss function (such as categorical cross-entropy) to measure the model's performance and adjust the weights accordingly.
4. **Name generation**: Use the trained model to predict successive characters, starting from an initial character or sequence, to generate new names.
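
Step 3 names categorical cross-entropy as a candidate loss. As a minimal sketch, it could be computed for a single next-character prediction as follows. The helper names `softmax` and `cross_entropy` and the toy values are assumptions made for this illustration and do not appear elsewhere in the notebook.

```python
import numpy as np


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def cross_entropy(probs, target_one_hot, eps=1e-12):
    # Negative log-probability assigned to the true class
    return -np.sum(target_one_hot * np.log(probs + eps))


# Toy example: 4 possible characters, the true next character is index 2
logits = np.array([0.5, -1.0, 2.0, 0.1])
target = np.array([0.0, 0.0, 1.0, 0.0])

probs = softmax(logits)
loss = cross_entropy(probs, target)
print(probs.round(3), round(float(loss), 3))
```

The cells in this notebook use a sigmoid output with a mean squared error instead, so this sketch only illustrates the loss mentioned in the list.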

In [26]:
# Load the dataset
file_path = 'names.txt'

# Read the file into a list with one name per line
with open(file_path, 'r', encoding='utf-8') as file:
    names = file.read().splitlines()

print(names[:10])

['emma', 'olivia', 'ava', 'isabella', 'sophia', 'charlotte', 'mia', 'amelia', 'harper', 'evelyn']


### Step 1: Data preparation

The names must be converted into a format that a neural network can process. This usually means encoding each character numerically, for example with *one-hot* encoding.
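
As a small illustration of the idea, the sketch below one-hot encodes a single word over its own alphabet and decodes it again. The word `"ava"` and the variable names are chosen only for this example.

```python
import numpy as np

word = "ava"
alphabet = sorted(set(word))            # ['a', 'v']
index = {c: i for i, c in enumerate(alphabet)}

# One row per character, one column per symbol in the alphabet
encoded = np.zeros((len(word), len(alphabet)), dtype=np.float32)
for position, char in enumerate(word):
    encoded[position, index[char]] = 1.0

print(encoded)

# Decoding: the position of the 1 in each row gives the character back
decoded = "".join(alphabet[i] for i in encoded.argmax(axis=1))
print(decoded)
```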

In [None]:
import numpy as np

# Create a set of all unique characters in the names
unique_chars = set(''.join(names))
unique_chars = sorted(unique_chars)
print(len(unique_chars), unique_chars)

In [None]:
# Create a dictionary to map each character to a unique integer
char_to_int = {c: i for i, c in enumerate(unique_chars)}

# Create a dictionary to convert back
int_to_char = {i: c for c, i in char_to_int.items()}

# Convert names to integer representation
int_names = [[char_to_int[char] for char in name] for name in names]

# One-hot encode the integer representation of names
max_name_length = max([len(name) for name in int_names])
n_chars = len(unique_chars)
one_hot_encoded = np.zeros(
    (len(int_names), max_name_length, n_chars), dtype=np.float32)
for i, name in enumerate(int_names):
    for j, char_int in enumerate(name):
        one_hot_encoded[i, j, char_int] = 1.0

one_hot_encoded.shape

In [None]:
one_hot_encoded[0, :, :]

In [None]:
import matplotlib.pyplot as plt

# plot the first two names in the dataset
plt.figure(figsize=(16, 6))
plt.subplot(1, 2, 1)
plt.title(names[0])
plt.imshow(one_hot_encoded[0].T)
plt.subplot(1, 2, 2)
plt.title(names[1])
plt.imshow(one_hot_encoded[1].T)
plt.show()

### Step 2: Designing the perceptron

A simple perceptron can be built with NumPy alone.
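
One way to see what such a single-layer model computes: with a one-hot input, the matrix product simply selects one row of the weight matrix. The sketch below checks this on a toy alphabet of five symbols. The size, the random seed, and the names `W`, `b`, `x`, and `z` are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_symbols = 5                                  # toy alphabet size (assumption)
W = rng.normal(0.0, n_symbols ** -0.5, (n_symbols, n_symbols))
b = np.zeros(n_symbols)

x = np.zeros(n_symbols)
x[3] = 1.0                                     # one-hot vector for symbol 3

z = x @ W + b                                  # pre-activation scores
# With a zero bias, the product picks out row 3 of W
print(np.allclose(z, W[3]))
```

In other words, row `i` of the weight matrix holds the scores the model assigns to every possible successor of character `i`.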

In [None]:
import numpy as np

input_size = len(char_to_int)  # Number of unique characters
output_size = len(char_to_int)  # Same as the number of unique characters

# Initialize weights and biases
weights = np.random.normal(0.0, pow(input_size, -0.5),
                           (input_size, output_size))
biases = np.zeros(output_size)

In [None]:
weights[0], biases[0]

### Step 3: Training the model

Training the perceptron means adjusting the weights according to the error between the predicted output and the actual one. This is typically done with *backpropagation* and an optimization algorithm such as gradient descent.
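
To make the weight update concrete, the sketch below compares the analytic gradient of a summed squared error through a sigmoid layer with a finite-difference estimate. The toy sizes, the random seed, and the helper name `loss_fn` are assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 0.0])            # one-hot input (toy size)
t = np.array([1.0, 0.0, 0.0])            # one-hot target
W = rng.normal(0.0, 0.5, (3, 3))
b = np.zeros(3)


def loss_fn(W):
    # Summed squared error between target and sigmoid output
    y = sigmoid(x @ W + b)
    return np.sum((t - y) ** 2)


# Analytic gradient: dL/dW = outer(x, -2 * (t - y) * y * (1 - y))
y = sigmoid(x @ W + b)
analytic = np.outer(x, -2 * (t - y) * y * (1 - y))

# Central finite differences, one weight at a time
numeric = np.zeros_like(W)
h = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = h
        numeric[i, j] = (loss_fn(W + step) - loss_fn(W - step)) / (2 * h)

# Largest absolute difference between the two estimates
print(np.max(np.abs(analytic - numeric)))
```

Gradient descent subtracts this gradient, scaled by the learning rate, from the weights. The notebook's `backward_pass` defines `error = target - output`, so the quantity it computes carries the opposite sign and is added instead.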

In [None]:
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output s, so the derivative is s * (1 - s)
    return x * (1 - x)

In [None]:
def forward_pass(input_layer, weights, biases):
    return sigmoid(np.dot(input_layer, weights) + biases)

In [None]:
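def backward_pass(target, output, input_layer, weights, biases, learning_rate):
    # Treat a single sample as a batch of one so the matrix product below
    # yields an (input_size, output_size) gradient instead of a scalar
    input_layer = np.atleast_2d(input_layer)
    output = np.atleast_2d(output)
    error = np.atleast_2d(target) - output

    # Negative gradient of the squared error w.r.t. the pre-activation
    delta = 2 * error * sigmoid_derivative(output)
    d_weights = np.dot(input_layer.T, delta)
    d_biases = np.sum(delta, axis=0)  # Sum the bias gradients over the batch

    # Calculate the updated weights and biases
    updated_weights = weights + learning_rate * d_weights
    updated_biases = biases + learning_rate * d_biases

    return updated_weights, updated_biases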

When training a neural network to generate names, the "target" is the desired output for a given input. For a simple perceptron, the target of each training instance is typically the next character in the name, encoded in the same format as the input (for example one-hot encoded, if that is how the input characters are represented).
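
A toy example at the character level, before any encoding, may make the pairing clearer. The variable names are illustrative only.

```python
name = "emma"

# Each character is paired with the character that follows it
pairs = list(zip(name[:-1], name[1:]))
print(pairs)  # [('e', 'm'), ('m', 'm'), ('m', 'a')]
```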

In [None]:
# one_hot_encoded is the padded one-hot representation of the names.
# Each character in a name is an input, and the next character is the target.

# Split each name into input-target pairs
inputs = []
targets = []
for name, encoded in zip(names, one_hot_encoded):
    # Use the true name length so all-zero padding rows are never paired
    for i in range(len(name) - 1):
        inputs.append(encoded[i])
        targets.append(encoded[i + 1])

# Convert to numpy arrays for efficient computation
inputs = np.array(inputs)
targets = np.array(targets)

In [None]:
import random


def generate_name(weights, biases, char_to_int, int_to_char, max_length=10):
    # Start with a random character
    current_char = random.choice(list(char_to_int.keys()))
    name = current_char

    for _ in range(max_length - 1):
        # Convert current character to one-hot encoding
        input_vec = np.zeros((1, len(char_to_int)))
        input_vec[0, char_to_int[current_char]] = 1

        # Forward pass to predict the next character
        output = forward_pass(input_vec, weights, biases)

        # Convert output to character
        next_char_int = np.argmax(output)
        next_char = int_to_char[next_char_int]

        # Append the predicted character to the name
        name += next_char

        # Update the current character
        current_char = next_char

        # Optionally: break if a special end-of-sequence character is predicted

    return name.capitalize()

In [None]:
def train_perceptron(inputs, targets, weights, biases, epochs, learning_rate, char_to_int, int_to_char):
    for epoch in range(epochs):
        total_loss = 0
        for i in range(len(inputs)):
            input_layer = inputs[i]
            target = targets[i]

            # Forward pass
            output = forward_pass(input_layer, weights, biases)

            # Calculate loss (mean squared error)
            loss = np.mean((target - output) ** 2)
            total_loss += loss

            # Backward pass and update weights and biases
            weights, biases = backward_pass(
                target, output, input_layer, weights, biases, learning_rate)

        # Print epoch details and generate a name
        average_loss = total_loss / len(inputs)
        generated_name = generate_name(
            weights, biases, char_to_int, int_to_char)
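        print(
            f"Epoch {epoch+1}/{epochs}, Loss: {average_loss:.4f}, Generated Name: {generated_name}")

    # Return the trained parameters: backward_pass creates new arrays,
    # so the caller's originals are not modified
    return weights, biases

In [None]:
epochs = 3
learning_rate = 0.01
weights, biases = train_perceptron(inputs, targets, weights, biases, epochs,
                                   learning_rate, char_to_int, int_to_char)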

### Adding a hidden layer

In [None]:
input_size = len(char_to_int)  # Number of unique characters
hidden_size = 256  # You can adjust this
output_size = len(char_to_int)  # Same as the number of unique characters

# Initialize weights and biases for the input-to-hidden layer
weights_1 = np.random.normal(
    0.0, pow(hidden_size, -0.5), (input_size, hidden_size))
biases_1 = np.zeros(hidden_size)

# Initialize weights and biases for the hidden-to-output layer
weights_2 = np.random.normal(
    0.0, pow(output_size, -0.5), (hidden_size, output_size))
biases_2 = np.zeros(output_size)

In [None]:
def forward_pass(input_layer, weights_1, biases_1, weights_2, biases_2):
    # Input to hidden layer
    hidden_layer_input = np.dot(input_layer, weights_1) + biases_1
    hidden_layer_output = sigmoid(hidden_layer_input)

    # Hidden to output layer
    output_layer_input = np.dot(hidden_layer_output, weights_2) + biases_2
    output = sigmoid(output_layer_input)

    return output, hidden_layer_output

In [None]:
def backward_pass(target, output, hidden_layer_output, input_layer, weights_1, biases_1, weights_2, biases_2, learning_rate):
    # Calculate error (single training sample, all arrays are 1-D)
    error = target - output

    # Delta at the output layer: negative gradient of the squared error
    # with respect to the output pre-activation
    output_delta = 2 * error * sigmoid_derivative(output)

    # Gradients for output layer
    d_weights_2 = np.outer(hidden_layer_output, output_delta)
    d_biases_2 = output_delta

    # Propagate the delta back through weights_2, then through the hidden sigmoid
    hidden_layer_error = np.dot(output_delta, weights_2.T)
    hidden_delta = hidden_layer_error * sigmoid_derivative(hidden_layer_output)

    # Gradients for hidden layer (both use the same hidden delta)
    d_weights_1 = np.outer(input_layer, hidden_delta)
    d_biases_1 = hidden_delta

    # Update weights and biases in place
    weights_1 += learning_rate * d_weights_1
    biases_1 += learning_rate * d_biases_1
    weights_2 += learning_rate * d_weights_2
    biases_2 += learning_rate * d_biases_2

    return weights_1, biases_1, weights_2, biases_2


In [None]:
def generate_name(weights_1, biases_1, weights_2, biases_2, char_to_int, int_to_char, max_length=10):
    # Start with a random character
    current_char = random.choice(list(char_to_int.keys()))
    name = current_char

    for _ in range(max_length - 1):
        # Convert current character to one-hot encoding
        input_vec = np.zeros((1, len(char_to_int)))
        input_vec[0, char_to_int[current_char]] = 1

        # Forward pass through the network (including the hidden layer)
        hidden_layer_output = sigmoid(np.dot(input_vec, weights_1) + biases_1)
        output = sigmoid(np.dot(hidden_layer_output, weights_2) + biases_2)

        # Convert output to character
        next_char_int = np.argmax(output)
        next_char = int_to_char[next_char_int]

        # Append the predicted character to the name
        name += next_char

        # Update the current character
        current_char = next_char

        # Optionally: break if a special end-of-sequence character is predicted

    return name.capitalize()
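
Because `np.argmax` always picks the highest-scoring character, a given input character always leads to the same successor, so generated names tend to fall into repeating cycles. One possible alternative, sketched here under the assumption that the output scores are non-negative (as they are after a sigmoid), is to normalise them into a probability distribution and sample from it. The helper name `sample_next_index` and the toy output are hypothetical.

```python
import numpy as np


def sample_next_index(output, rng):
    # Flatten in case the output has shape (1, n_chars)
    scores = np.asarray(output, dtype=np.float64).ravel()
    # Normalise the non-negative scores so they sum to 1
    probs = scores / scores.sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(42)
toy_output = np.array([[0.1, 0.7, 0.2]])   # stand-in for a network output
samples = [sample_next_index(toy_output, rng) for _ in range(1000)]

# Empirical frequency of each index over 1000 draws
print(np.bincount(samples, minlength=3) / 1000)
```

In `generate_name`, the returned index could take the place of `np.argmax(output)`. Whether that yields better names depends on how well the model is trained.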

In [None]:
def train_perceptron(inputs, targets, weights_1, biases_1, weights_2, biases_2, epochs, learning_rate, char_to_int, int_to_char):
    for epoch in range(epochs):
        total_loss = 0
        for i in range(len(inputs)):
            input_layer = inputs[i]
            target = targets[i]

            # Forward pass
            output, hidden_layer_output = forward_pass(
                input_layer, weights_1, biases_1, weights_2, biases_2)

            # Calculate loss (mean squared error)
            loss = np.mean((target - output) ** 2)
            total_loss += loss

            # Backward pass and update weights and biases
            weights_1, biases_1, weights_2, biases_2 = backward_pass(
                target, output, hidden_layer_output, input_layer, weights_1, biases_1, weights_2, biases_2, learning_rate)

        # Print epoch details and generate a name
        average_loss = total_loss / len(inputs)
        # Generate a name using both layers of the network
        generated_name = generate_name(
            weights_1, biases_1, weights_2, biases_2, char_to_int, int_to_char)
        print(
            f"Epoch {epoch+1}/{epochs}, Loss: {average_loss:.4f}, Generated Name: {generated_name}")

In [None]:
epochs = 3
learning_rate = 0.01
# Assuming that the following variables are already defined:
# inputs, targets, weights_1, biases_1, weights_2, biases_2, epochs, learning_rate, char_to_int, int_to_char

train_perceptron(inputs, targets, weights_1, biases_1, weights_2,
                 biases_2, epochs, learning_rate, char_to_int, int_to_char)