# Turkish Name Generator — From Scratch

This notebook builds a **character-level name generator** trained on a corpus of Turkish names.  
We construct everything from **manual layers** to **custom BatchNorm**, train it using gradient descent,  
and even implement **BatchNorm folding** to optimize inference speed.

### Features:
- Hand-written `Linear`, `BatchNorm1d`, `ReLU`, and `Softmax`
- Character embedding layer
- Tokenization and preprocessing for UTF-8-safe Turkish names
- Training loop with manual parameter updates
- BatchNorm layer folding into Linear layers for fast inference
- Interactive name generation

> The goal is not just to build a model...  
> but to understand the gears and levers behind every activation.

---

> Built from *first principles™*, with love for the stack and respect for the flow.


## Import Libraries & Set Random Seed
Initialize the environment, import required libraries, and fix the randomness for reproducibility.


In [1]:
import torch
import torch.nn.functional as F
import random

g = torch.Generator().manual_seed(1337)

## Define Model Components
Implement the basic building blocks of a neural network:
- Linear layers
- Activation functions (ReLU, Softmax)
- Custom BatchNorm1d layer with train/test behavior


In [None]:
class Linear:
  def __init__(self, ins, outs, bias=False):
    self.weights = torch.empty(outs, ins)
    torch.nn.init.kaiming_uniform_(self.weights, mode='fan_in', nonlinearity='relu', generator=g)
    if bias:
      self.biases = torch.rand(outs, generator=g)
    else:
      self.biases = None

  def __call__(self, x):
    pre_act = x @ self.weights.T
    if self.biases is not None:
      pre_act += self.biases
    return pre_act

  def params(self):
    return [self.weights] + [self.biases] if self.biases is not None else [self.weights]
  
class Relu:
  def __call__(self, x):
    self.out = F.relu(x)
    return self.out

class Softmax:
  def __call__(self, x):
    self.out = F.softmax(x)
    return self.out
  
class BatchNorm1d:
  def __init__(self, dim, eps=1e-5, momentum=0.1):
    self.dim = dim
    self.eps = eps
    self.momentum = momentum
    self.training = True
    # gamma & beta: trainable parameters to scale or move the normed batch
    self.gamma = torch.ones(dim)
    self.beta = torch.zeros(dim)
    # for inference
    self.running_mean = torch.zeros(dim)
    self.running_var = torch.ones(dim)

  def __call__(self, x):
    if self.training:   # train vars
      xmean = x.mean(0, keepdim=True)
      xvar = x.var(0, keepdim=True)
    else:               # inference vars
      xmean = self.running_mean
      xvar = self.running_var

    xhat = (x - xmean) / torch.sqrt(xvar + self.eps)
    self.out = self.gamma * xhat + self.beta

    if self.training: # calculate running mean/var
      self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * xmean
      self.running_var = (1 - self.momentum) * self.running_var + self.momentum * xvar
    return self.out

  def params(self):
    return [self.gamma] + [self.beta]

## Load & Preprocess Turkish Names
- Load raw Turkish names
- Normalize characters using custom UTF-8 mapping for Turkish phonemes
- Prepare a clean dataset of lowercase, ASCII-safe names


In [5]:
with open('turkish_names.txt', 'r') as f:
  lines = f.read().splitlines()
len(lines)

trchr_to_utf8 = {
    'İ': 'i',
    'I': '!',
    'Ö': '@',
    'Ü': '#',
    'Ş': '$',
    'Ç': '^',
    'Ğ': '&'}

utf8_to_trchr = {v: k for k, v in trchr_to_utf8.items()}
def undo_utf8(name: str):
  name = ''.join(utf8_to_trchr.get(ch, ch) for ch in name).lower()
  return name

words = [word for line in lines for word in line.split()]
names = [''.join(trchr_to_utf8.get(ch, ch) for ch in name).lower() for name in words]
# ^ turkish phonemes transmogrified into utf-8 sigils ^-^
names[:10]

['jale',
 'ali',
 'mahmut',
 'mansur',
 'k#r$ad',
 'gamze',
 'mira^',
 'y#cel',
 'kubilay',
 'hayati']

## Character-Level Tokenization
- Extract character vocabulary
- Map characters to integers and vice versa
- Prepare input-output pairs using a sliding window of context


In [6]:
chars = ['.'] + sorted(list(set(ch for name in names for ch in name)))
# ^ no need to sort tbh.    '.' as special start/end token.
i_to_s = {i: chars[i] for i in range(len(chars))}
s_to_i = {v: k for k, v in i_to_s.items()}

## Create Training, Validation & Test Splits
- Shuffle dataset
- Split into train/val/test with ratios 80/10/10
- Convert names into tensorized (X, Y) samples
- Verify dataset consistency via average name length check


In [7]:
block_size = 3 # how many characters should it take to predict the next one?
# . . . -> a; . . a -> t; . a t -> a
def create_dataset(names):
  X, Y = [], []
  for name in names:
    name = f"{name}."   # add end token to each name
    context = [0] * block_size # initialized as `. . .`
    for ch in name:
      X.append(context)
      Y.append(s_to_i[ch])
      context = context[1:] + [s_to_i[ch]]
  return torch.tensor(X), torch.tensor(Y)

random.shuffle(names)

n1 = int(len(names) * 0.1)

Xtr, Ytr = create_dataset(names[n1*2:]) # 80% train split
Xte, Yte = create_dataset(names[:n1]) # 10% test
Xval, Yval = create_dataset(names[n1:n1*2]) # 10% valid

# precision spell
avg_len = sum(len(name) + 1 for name in names) / len(names)
actual = (len(Xtr) + len(Xte) + len(Xval)) / len(names)
assert abs(avg_len - actual) < 1e-6, '\nSize mismatch.\nWhat did you break?'

## Train the Neural Network
- Define model architecture with:
  - Character embeddings
  - Two hidden layers
  - BatchNorm and ReLU activations
- Train using cross-entropy loss and SGD
- Monitor training and validation loss


In [8]:
emb_dim = 8 # each character's embedding will be a vector of size (2).
C = torch.randn(len(chars), emb_dim)
emb = C[Xtr].view((Xtr.shape[0], -1))


n_hidden = 100 #neuron count for hidden layers
batch_size = 32

layers = [
    Linear(emb.shape[1], n_hidden), BatchNorm1d(n_hidden), Relu(),
    Linear(n_hidden, 50), BatchNorm1d(50), Relu(),
    Linear(50, len(chars))
]

parameters = []
for layer in layers:
  if hasattr(layer, 'params'):
    parameters.extend(layer.params())
parameters.append(C)

for p in parameters:
  p.requires_grad = True

print(f'{len([p for param in parameters for p in param])} parameters.')

losses = []
for i in range(10000):

  ib = torch.randint(0, emb.shape[0], (batch_size,)) # indices of the elements in the batch
  #emb = C.view
  Xb = C[Xtr[ib]].view(batch_size, -1)
  Yb = Ytr[ib] # initialize batches

# forward pass
  for layer in layers:
    Xb = layer(Xb)
    layer.act = Xb
# backward pass
  loss = F.cross_entropy(Xb, Yb)
  loss.backward()
  losses.append(loss.item())
# update
  for p in parameters:
    if i < 25000:
        p.data += -0.01 * p.grad
    else:
        p.data += -0.001 * p.grad

  for p in parameters:
    p.grad = None
  if i%10000 == 0:
    print(f"step {i}\ntrain loss: {loss:.5f}")
    X = C[Xval].view(Xval.shape[0], -1)
    Y = Yval # initialize batches
    # forward pass
    for layer in layers:
        X = layer(X)
    val_loss = F.cross_entropy(X, Y)
    print(f"valid loss: {val_loss:.5f}")

print(f"avg train loss: {sum(losses) / len(losses)}")

510 parameters.
step 0
train loss: 3.62561
valid loss: 3.65232
avg train loss: 2.119587361955643


## BatchNorm Folding for Faster Inference
After training, remove BatchNorm layers by folding their behavior into the preceding Linear layer:
- Adjust weights and biases using running mean/variance
- Bake γ (scale) and β (shift) into Linear
- Eliminate the BatchNorm layer entirely


In [9]:
# layers[3].weights = (layers[4].gamma * (layers[3].weights.T / torch.sqrt(layers[4].running_var + 1e-5))).T
# layers[3].biases = (layers[4].gamma * (-layers[4].running_mean / torch.sqrt(layers[4].running_var + 1e-5))) + layers[4].beta

# # Folding BatchNorm into the weights of the previous Linear layer
# # Reference: https://forums.fast.ai/t/faster-inference-batch-normalization-folding/69161

# layers.pop(4)

In [10]:
# ^^^^ what is this monstrosity???? let me explain.

# we are effectively getting rid of the BatchNorm layer during inference.
# this will allow us to decrease inference time.
# we can do this because the BN operation after training is just a linear operation.
# so what we do here is:
#
# Change each weight that follows a BN into:
#     Weights divided by the standard deviation; 
#     Multiplied by the gamma, e.g. trainable scaling parameter of the BN layer.
# We also need to account for beta, e.g. trainable shifting parameter, e.g. 'bias', of the BN layer.
#
# The result is a Linear layer with the BN baked into it. Pretty cool!
# Then we just get rid of the BN layer. ^-^
#
#
#   Now, let's do this to all BN layers:

In [11]:
# modular :p
for i, layer in enumerate(layers):
    if hasattr(layer, 'running_mean'): # e.g. if the layer is BN
        layers[i-1].weights = (layer.gamma * (layers[i-1].weights.T / torch.sqrt(layer.running_var + 1e-5))).T
        layers[i-1].biases = (layer.gamma * (-layer.running_mean / torch.sqrt(layer.running_var + 1e-5))) + layer.beta
layers = [layer for layer in layers if not hasattr(layer, 'running_mean')]
layers

[<__main__.Linear at 0x11cb91cd0>,
 <__main__.Relu at 0x11cbaae40>,
 <__main__.Linear at 0x11cb21d30>,
 <__main__.Relu at 0x11ce18170>,
 <__main__.Linear at 0x10fee85f0>]

## Evaluate Post-Folding Model
- Run the model on test set after batchnorm has been removed
- Check for consistent test loss to ensure folding didn't break anything


In [12]:
# Test results
X = C[Xtr].view(Xtr.shape[0], -1)
Y = Ytr # initialize batches
# forward pass
for layer in layers:
    X = layer(X)
# backward pass
loss = F.cross_entropy(X, Y)
print(f"test loss: {loss}")

test loss: 1.7257429361343384


## Generate New Names
- Switch all layers to evaluation mode
- Sample character-by-character using the trained model
- Decode predicted names and map UTF-8 sigils back to Turkish characters


In [14]:
# ---   INFERENCE   ---
for layer in layers:
  if hasattr(layer, 'training'):
    layer.training = False

while True:
  name_count = int(input("How many names will be generated? (0 to exit): "))
  if name_count == 0:
    break

  for i in range(name_count):
    context = [0] * block_size
    name = ''
    while True:
      emb = C[context].view(-1, block_size * emb_dim)
      for layer in layers:
        emb = layer(emb)
      pred = torch.multinomial(F.softmax(emb, dim=1), 1).item()
      if i_to_s[pred] == '.':
        break
      context = context[1:] + [pred]
      #print(context)
      name += i_to_s[pred]

    name = undo_utf8(name)
    print(name)


basmft
emrat
mehanap
ezer
hi̇subayn
ui̇lami̇ye
mehmet
cerhanir
özgen
furay
yuğazanee
asşegürma
aycm
tul
yus
