# Character-Level RNN Text Generator

A vanilla RNN, implemented from scratch in NumPy, for character-level text generation. Learn RNN fundamentals by building a model that generates text one character at a time.

## Architecture

```
Input (char) → One-Hot → RNN Cell → Softmax → Output (next char)
                           ↓
                    Hidden State (memory)
```

### RNN Cell Equations

```
h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b_h)
y_t = W_hy · h_t + b_y
p_t = softmax(y_t)
```

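**Where:**
- `x_t` = current character (one-hot encoded)
- `h_t` = hidden state (memory); `h_{t-1}` is the state from the previous step
- `y_t` = unnormalized scores (logits) for the next character
- `p_t` = probability distribution over the next character
- `W_xh, W_hh, W_hy` = weight matrices (input→hidden, hidden→hidden, hidden→output)
- `b_h, b_y` = biases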
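The three equations translate almost line for line into NumPy. Below is a minimal sketch of a single time step, assuming column-vector shapes like those used in the notebook. `rnn_step` is a hypothetical helper name. The max-subtraction inside the softmax is an added numerical-stability tweak; it does not change the resulting probabilities.

```python
import numpy as np

def rnn_step(x_idx, h_prev, Wxh, Whh, Why, bh, by):
    """One forward step of a vanilla RNN cell (hypothetical helper, column vectors)."""
    vocab_size = Wxh.shape[1]
    x = np.zeros((vocab_size, 1))              # one-hot encode the current character
    x[x_idx] = 1
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)   # h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h)
    y = Why @ h + by                           # y_t = W_hy·h_t + b_y (logits)
    e = np.exp(y - np.max(y))                  # shift logits for numerical stability
    p = e / np.sum(e)                          # p_t = softmax(y_t)
    return h, p

rng = np.random.default_rng(0)
V, H = 50, 128
Wxh = rng.standard_normal((H, V)) * 0.01
Whh = rng.standard_normal((H, H)) * 0.01
Why = rng.standard_normal((V, H)) * 0.01
bh, by = np.zeros((H, 1)), np.zeros((V, 1))

h, p = rnn_step(3, np.zeros((H, 1)), Wxh, Whh, Why, bh, by)
print(h.shape, p.shape, float(p.sum()))
```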

### Parameters

For vocab size `V=50`, hidden size `H=128`:
- W_xh: 128 × 50 = 6,400
- W_hh: 128 × 128 = 16,384
- W_hy: 50 × 128 = 6,400
- b_h: 128, b_y: 50
- **Total: 29,362 parameters (~29k)**
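In general, the total is `H·V + H·H + V·H` weights plus `H + V` biases. The short check below reproduces the arithmetic for the example above, and also for the setting used in the notebook run (`V=64` characters, `hidden_size=50`). `count_params` is a hypothetical helper.

```python
def count_params(V, H):
    """Trainable values in the vanilla RNN: three weight matrices plus two bias vectors."""
    weights = H * V + H * H + V * H   # W_xh, W_hh, W_hy
    biases = H + V                    # b_h, b_y
    return weights + biases

print(count_params(50, 128))  # 29362
print(count_params(64, 50))   # 9014
```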

## How It Works

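**Training:**
1. Convert text to character indices
2. Create input-target pairs (targets are the inputs shifted one character ahead)
3. Run the forward pass over a chunk of `seq_length` characters
4. Backpropagation Through Time (BPTT), with gradient clipping
5. Update weights (the notebook uses Adagrad, a per-parameter adaptive variant of gradient descent)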
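Steps 1 and 2 reduce to slicing a single index list twice. The sketch below assumes non-overlapping chunks that advance by `seq_length`, which is how the training loop walks through the data. `make_chunks` is a hypothetical name.

```python
def make_chunks(text, seq_length):
    """Return (inputs, targets) index chunks plus the char->index map (hypothetical helper)."""
    chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    data = [char_to_idx[ch] for ch in text]          # step 1: text -> indices
    chunks = []
    p = 0
    while p + seq_length + 1 <= len(data):          # keep only chunks whose targets are complete
        inputs = data[p:p + seq_length]
        targets = data[p + 1:p + seq_length + 1]     # step 2: same slice shifted by one
        chunks.append((inputs, targets))
        p += seq_length
    return chunks, char_to_idx

chunks, c2i = make_chunks("hello world", 4)
i2c = {i: c for c, i in c2i.items()}
for inputs, targets in chunks:
    print(repr(''.join(i2c[i] for i in inputs)), '->', repr(''.join(i2c[i] for i in targets)))
```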

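**Generation:**
1. Start with seed text
2. Compute the probability distribution over the next character
3. Sample from it, using temperature to control how sharp the distribution is
4. Append the sampled character, feed it back in, and repeat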
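Temperature divides the logits before the softmax. A value below 1 sharpens the distribution toward the most likely character, and a value above 1 flattens it. The notebook's `sample` method draws from the plain softmax, which is equivalent to a temperature of 1.0. Below is a sketch of temperature sampling as a standalone function; `sample_with_temperature` is a hypothetical name.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Draw one index from softmax(logits / temperature); also return the probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(logits, dtype=float).ravel() / temperature
    z -= z.max()                           # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(p), p=p)), p

logits = np.array([2.0, 1.0, 0.1])
_, p_cold = sample_with_temperature(logits, temperature=0.5)   # sharper
_, p_warm = sample_with_temperature(logits, temperature=2.0)   # flatter
print(p_cold.round(3), p_warm.round(3))
```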

## Hyperparameters

```python
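hidden_size = 128        # size of the hidden state h_t (the notebook run uses 50)
seq_length = 25          # characters per BPTT chunk
learning_rate = 0.01     # step size (the notebook run uses 0.1 with Adagrad)
temperature = 0.7        # < 1 sharpens sampling, > 1 flattens it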
```

## Usage

```python
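rnn = CharRNN(vocab_size, hidden_size=128, seq_length=25, lr=0.1)
hprev = np.zeros((rnn.hidden_size, 1))
loss, *grads, hprev = rnn.loss_and_gradients(inputs, targets, hprev)  # one BPTT step on a chunk
sample_ix = rnn.sample(hprev, char_to_idx['T'], 200)                   # 200 character indices
output = ''.join(idx_to_char[ix] for ix in sample_ix)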
```

## Key Concepts

- Sequence modeling with hidden states
- Backpropagation through time
- Gradient clipping (prevents exploding gradients)
- Temperature-based sampling
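After BPTT, the notebook clips every gradient entry to [-5, 5]. The sketch below shows that element-wise clip as a standalone function. For comparison, it also includes a global-norm variant, a common alternative that rescales all gradients together and so preserves their direction. The notebook does not use the global-norm variant. Both function names are hypothetical.

```python
import numpy as np

def clip_elementwise(grads, limit=5.0):
    """Clamp each entry of each gradient array to [-limit, limit] in place (as in the notebook)."""
    for g in grads:
        np.clip(g, -limit, limit, out=g)
    return grads

def clip_by_global_norm(grads, max_norm=5.0):
    """Alternative: scale all gradients down together if their combined L2 norm exceeds max_norm."""
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

g = [np.array([[10.0, -7.0, 1.0]])]
clip_elementwise(g)
print(g[0])
print(clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)[0])
```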

## Extensions

- Implement LSTM cells
- Multi-layer RNN
- Train on code/poetry/different styles
- Add beam search for generation
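For the first extension, an LSTM cell replaces the single tanh update with gated updates to a separate cell state. The sketch below shows one forward step under the standard formulation. The stacked-gate layout (`i, f, o, g` in a single `W`) and the name `lstm_step` are assumptions. Training the cell would also require new BPTT gradients, which are not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4H, V + H), gates stacked as input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ np.vstack([x, h_prev]) + b   # all four gate pre-activations at once
    i = sigmoid(z[0:H])                  # input gate
    f = sigmoid(z[H:2 * H])              # forget gate
    o = sigmoid(z[2 * H:3 * H])          # output gate
    g = np.tanh(z[3 * H:4 * H])          # candidate cell values
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

V, H = 64, 50
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, V + H)) * 0.01
b = np.zeros((4 * H, 1))
x = np.zeros((V, 1))
x[5] = 1                                 # one-hot input character
h, c = lstm_step(x, np.zeros((H, 1)), np.zeros((H, 1)), W, b)
print(h.shape, c.shape)
```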

## Resources

- [Karpathy's RNN Post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
- [Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

---

```python
import numpy as np
```

```python
text = """

In Paris, urban farmers are trying a soil-free approach to agriculture that uses less
space and fewer resources. Could it help cities face the threats to our food supplies?
On top of a striking new exhibition hall in southern Paris, the world's largest urban rooftop farm
has started to bear fruit. Strawberries that are small, intensely flavoured and resplendently red
sprout abundantly from large plastic tubes. Peer inside and you see the tubes are completely
hollow, the roots of dozens of strawberry plants dangling down inside them. From identical
vertical tubes nearby burst row upon row of lettuces; near those are aromatic herbs, such as basil,
sage and peppermint. Opposite, in narrow, horizontal trays packed not with soil but with coconut
fibre, grow cherry tomatoes, shiny aubergines and brightly coloured chards.
Pascal Hardy, an engineer and sustainable development consultant, began experimenting with
vertical farming and aeroponic growing towers - as the soil-free plastic tubes are known - on
his Paris apartment block roof five years ago. The urban rooftop space above the exhibition hall
is somewhat bigger: 14,000 square metres and almost exactly the size of a couple of football
pitches. Already, the team of young urban farmers who tend it have picked, in one day, 3,000
lettuces and 150 punnets of strawberries. When the remaining two thirds of the vast open area
are in production, 20 staff will harvest up to 1,000 kg of perhaps 35 different varieties of fruit
and vegetables, every day. 'We're not ever, obviously, going to feed the whole city this way,
cautions Hardy. 'In the urban environment you're working with very significant practical
constraints, clearly, on what you can do and where. But if enough unused space can be developed
like this, there's no reason why you shouldn't eventually target maybe between 5% and 10%
of consumption.'
Perhaps most significantly, however, this is a real-life showcase for the work of Hardy's
flourishing urban agriculture consultancy, Agripolis, which is currently fielding enquiries from
around the world to design, build and equip a new breed of soil-free inner-city farm. 'The
method's advantages are many,' he says. 'First, I don't much like the fact that most of the fruit
and vegetables we eat have been treated with something like 17 different pesticides, or that
the intensive farming techniques that produced them are such huge generators of greenhouse
16
Readading
gases. I don't much like the fact, either, that they've travelled an average of 2,000 refrigerated
kilometres to my plate, that their quality is so poor, because the varieties are selected for their
capacity to withstand such substantial journeys, or that 80% ofthe price I pay goes to wholesalers
and transport companies, not the producers.
Produce grown using this soil-free method, on the other hand - which relies solely on a small
quantity of water, enriched with organic nutrients, pumped around a closed circuit of pipes,
towers and trays - is 'produced up here, and sold locally, just down there. It barely travels at all,'
Hardy says. 'You can select crop varieties for their flavour, not their resistance to the transport
and storage chain, and you can pick them when they're really at their best, and not before.' No
soil is exhausted, and the water that gently showers the plants' roots every 12 minutes is recycled,
so the method uses 90% less water than a classic intensive farm for the same yield.
Urban farming is not, of course, a new phenomenon. Inner-city agriculture is booming from
Shanghai to Detroit and Tokyo to Bangkok. Strawberries are being grown in disused shipping
containers, mushrooms in underground carparks. Aeroponic farming, he says, is 'virtuous'. The
equipment weighs little, can be installed on almost any flat surface and is cheap to buy: roughly
€100 to €150 per square metre. It is cheap to run, too, consuming a tiny fraction of the electricity
used by some techniques.
Produce grown this way typically sells at prices that, while generally higher than those of classic
intensive agriculture, are lower than soil-based organic growers. There are limits to what farmers
can grow this way, of course, and much of the produce is suited to the summer months. 'Root
vegetables we cannot do, at least not yet,' he says. 'Radishes are OK, but carrots, potatoes, that
kind of thing - the roots are simply too long. Fruit trees are obviously not an option. And beans
tend to take up a lot of space for not much return.' Nevertheless, urban farming of the kind
being practised in Paris is one part of a bigger and fast-changing picture that is bringing food
production closer to our lives

"""
```

```python
# Build the character vocabulary and index lookups
chars = sorted(set(text))
vocab_size = len(chars)
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}
print(f"Vocab size: {vocab_size}")
print(f"Text length: {len(text)}")
print(f"Characters: {repr(' '.join(chars))}")
```

```
Vocab size: 64
Text length: 4671
Characters: "\n   % ' , - . 0 1 2 3 4 5 6 7 8 9 : ; ? A B C D F H I K N O P R S T U W Y a b c d e f g h i j k l m n o p q r s t u v w x y z €"
```
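With 64 distinct characters, an untrained model that assigns equal probability to every character has a known loss of `ln(V)` nats per character. This is why the training loop initializes `smooth_loss` to `-log(1/vocab_size) * seq_length`. The arithmetic for this corpus:

```python
import math

vocab_size = 64
seq_length = 25

# Uniform guessing assigns probability 1/V to the correct next character at every step.
per_char = -math.log(1.0 / vocab_size)
baseline = per_char * seq_length   # expected loss for one 25-character chunk
print(f"{per_char:.4f} nats/char, {baseline:.2f} per {seq_length}-char chunk")
```

A smoothed loss well below roughly 104 per 25-character chunk therefore indicates the model has learned more than uniform guessing.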

```python
class CharRNN:
    def __init__(self, vocab_size, hidden_size=80, seq_length=25, lr=0.1):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.seq_length = seq_length
        self.lr = lr

        # Small random weights, zero biases
        self.Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
        self.Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
        self.Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
        self.bh = np.zeros((hidden_size, 1))
        self.by = np.zeros((vocab_size, 1))

        # Adagrad memory: running sums of squared gradients, one per parameter
        self.mWxh, self.mWhh, self.mWhy = np.zeros_like(self.Wxh), np.zeros_like(self.Whh), np.zeros_like(self.Why)
        self.mbh, self.mby = np.zeros_like(self.bh), np.zeros_like(self.by)

    def loss_and_gradients(self, inputs, targets, hprev):
        """Forward pass + BPTT over one chunk; returns loss, gradients and the last hidden state."""
        xs, hs, ys, ps = {}, {}, {}, {}
        hs[-1] = np.copy(hprev)
        loss = 0

        # Forward pass
        for t in range(len(inputs)):
            xs[t] = np.zeros((self.vocab_size, 1))
            xs[t][inputs[t]] = 1                              # one-hot input
            hs[t] = np.tanh(np.dot(self.Wxh, xs[t]) + np.dot(self.Whh, hs[t-1]) + self.bh)
            ys[t] = np.dot(self.Why, hs[t]) + self.by
            ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))     # softmax
            loss += -np.log(ps[t][targets[t], 0])             # cross-entropy

        # Backward pass (BPTT)
        dWxh, dWhh, dWhy = np.zeros_like(self.Wxh), np.zeros_like(self.Whh), np.zeros_like(self.Why)
        dbh, dby = np.zeros_like(self.bh), np.zeros_like(self.by)
        dhnext = np.zeros_like(hs[0])

        for t in reversed(range(len(inputs))):
            dy = np.copy(ps[t])
            dy[targets[t]] -= 1                               # gradient of softmax + cross-entropy
            dWhy += np.dot(dy, hs[t].T)
            dby += dy
            dh = np.dot(self.Why.T, dy) + dhnext              # from the output and from step t+1
            dhraw = (1 - hs[t] * hs[t]) * dh                  # back through tanh
            dbh += dhraw
            dWxh += np.dot(dhraw, xs[t].T)
            dWhh += np.dot(dhraw, hs[t-1].T)
            dhnext = np.dot(self.Whh.T, dhraw)

        # Clip to mitigate exploding gradients
        for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
            np.clip(dparam, -5, 5, out=dparam)

        return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]

    def sample(self, h, seed_ix, n):
        """Sample n character indices, starting from seed_ix and hidden state h."""
        x = np.zeros((self.vocab_size, 1))
        x[seed_ix] = 1
        ixes = []
        for t in range(n):
            h = np.tanh(np.dot(self.Wxh, x) + np.dot(self.Whh, h) + self.bh)
            y = np.dot(self.Why, h) + self.by
            p = np.exp(y) / np.sum(np.exp(y))
            ix = np.random.choice(range(self.vocab_size), p=p.ravel())
            x = np.zeros((self.vocab_size, 1))                # feed the sample back in
            x[ix] = 1
            ixes.append(ix)
        return ixes

data = [char_to_idx[ch] for ch in text]
rnn = CharRNN(vocab_size, hidden_size=50, seq_length=25)

n, p = 0, 0
hprev = np.zeros((rnn.hidden_size, 1))
smooth_loss = -np.log(1.0/vocab_size) * rnn.seq_length   # loss of uniform guessing

print("Training started...\n")

for iteration in range(5000):
    # Wrap around to the start of the text (and reset memory) when the data runs out
    if p + rnn.seq_length + 1 >= len(data) or n == 0:
        hprev = np.zeros((rnn.hidden_size, 1))
        p = 0

    inputs = data[p:p+rnn.seq_length]
    targets = data[p+1:p+rnn.seq_length+1]

    loss, dWxh, dWhh, dWhy, dbh, dby, hprev = rnn.loss_and_gradients(inputs, targets, hprev)
    smooth_loss = smooth_loss * 0.999 + loss * 0.001

    # Adagrad update, applied in place to each parameter
    for param, dparam, mem in zip([rnn.Wxh, rnn.Whh, rnn.Why, rnn.bh, rnn.by],
                                   [dWxh, dWhh, dWhy, dbh, dby],
                                   [rnn.mWxh, rnn.mWhh, rnn.mWhy, rnn.mbh, rnn.mby]):
        mem += dparam * dparam
        param += -rnn.lr * dparam / np.sqrt(mem + 1e-8)

    p += rnn.seq_length
    n += 1

    if n % 500 == 0:
        print(f'Iteration {n}, Loss: {smooth_loss:.4f}')
        sample_ix = rnn.sample(hprev, inputs[0], 200)
        txt = ''.join(idx_to_char[ix] for ix in sample_ix)
        print(f'Generated:\n{txt}\n')
```

```
Training started...

Iteration 500, Loss: 93.8700
Generated:
 fuu0rit Fe He ree, it pP tes graituCi il,  9epg dan Yd e tyyimfe tohn.  ang th are :esryet r arent dr pro paralume i%srs hoysrr sarile 

a
€stonto wa. at honsnpf k t
dacdxceern tarosecph nds fren nit

Iteration 1000, Loss: 83.2352
Generated:
ns'wbale det
as sory gain, tUnouwe'dd ch tereid to, baf oo tthicenicors
sherre wot pocita euphe ka r€miblglSd om sarin fetid mopvyrl.d ?pand.uYitu arBice agesy €nmpouare ssqin thand thd Sraddw
or. tri

Iteration 1500, Loss: 74.8722
Generated:
zowisy he, ant srik focint, lins ont uradite wrous xhe ucild parotis..
16qberolturering he Parang pit is. bent ficree te tes lees thacsst, ores muanlants not
ke tacrly rederrenb ive wangs. 1Con parbei

Iteration 2000, Loss: 69.0298
Generated:
ltouts rethy to pody s evoull froicp shelives caverdats eroules. Is the tesProags shis weithec trvotuce lame terine trithirves, theigoblns, frbath oly
its sis r0in
fyrt anme caru oed paraf thiss halt 

Iteratio
```
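The update inside the training loop is Adagrad. Each parameter keeps a running sum of its squared gradients (`mWxh`, `mWhh`, ...), and each step is divided by the square root of that sum. As a result, entries that have seen large gradients take smaller steps. The sketch below applies the same rule to a toy quadratic. `adagrad_step` and the toy objective are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def adagrad_step(param, grad, mem, lr=0.1, eps=1e-8):
    """In-place Adagrad update, same rule as the notebook's training loop."""
    mem += grad * grad                          # accumulate squared gradients per entry
    param += -lr * grad / np.sqrt(mem + eps)    # scale each step by 1/sqrt(accumulated)

# Toy objective: f(w) = sum((w - target)^2), gradient 2 * (w - target)
target = np.array([3.0, -2.0])
w = np.zeros(2)
mem = np.zeros_like(w)
for _ in range(2000):
    grad = 2 * (w - target)
    adagrad_step(w, grad, mem, lr=0.5)
print(w.round(3))
```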

```python
print("\n" + "="*60)
print("FINAL GENERATION")
print("="*60 + "\n")

# Prime the hidden state with every seed character except the last;
# sample() feeds the last character itself, so it is not processed twice.
h = np.zeros((rnn.hidden_size, 1))
seed = "organic:"
for ch in seed[:-1]:
    x = np.zeros((vocab_size, 1))
    x[char_to_idx[ch]] = 1
    h = np.tanh(np.dot(rnn.Wxh, x) + np.dot(rnn.Whh, h) + rnn.bh)

sample_ix = rnn.sample(h, char_to_idx[seed[-1]], 300)
generated = seed + ''.join(idx_to_char[ix] for ix in sample_ix)
print(generated)
```


```
FINAL GENERATION

organic:, bo agy then in surs. O1% ar: altith size seasisuly use rete rnwuleso then ticantan suwithirienbnarse y gerriche a8tis thasuns. 'ThI whuc, aet ains. Ann pethise reoute sooll
toum monndund aled tiou ontl,ost0: . yoty wertke to of fretand
ngan bigpan dultuzdu bibag of chalpicble Inon
is
of gier, fart
```
