<a href="https://colab.research.google.com/github/Alaaokaly/nlp-foundations/blob/main/RNNtextgenerating.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch

In [2]:
corpus = """Weather in Asia: A Diverse Tapestry of Climates

Asia, the largest continent on Earth, is renowned for its immense diversity, not only in culture and geography but also in climate. From the icy tundras of Siberia to the tropical rainforests of Southeast Asia, the weather across this vast expanse varies dramatically. Understanding the climatic conditions in different regions of Asia is essential for agriculture, tourism, and daily life, influencing everything from crop cycles to travel plans.

In northern Asia, particularly in Siberia, winters are harsh and long. Temperatures can plummet to as low as -40°C in some areas, making it one of the coldest places on Earth. The vast taiga forest is covered in snow for much of the year, creating a winter wonderland that attracts adventurous tourists. However, summers are brief and can be surprisingly warm, with temperatures reaching 30°C.

Moving south, East Asia experiences a different climate, significantly influenced by the monsoon system. Countries like China, Japan, and Korea see seasonal rains that are crucial for agriculture. The summer months bring heavy downpours, often accompanied by typhoons, especially in coastal regions.

Southeast Asia is characterized by its tropical climate, with high temperatures and humidity year-round. Countries like Thailand and Indonesia experience two main seasons: the dry season and the wet season. The monsoon rains can lead to flooding but also nourish the lush landscapes.

In South Asia, the weather is dominated by the Indian monsoon. The summer monsoon brings heavy rains, crucial for replenishing water supplies and supporting agriculture. However, it can also lead to natural disasters like floods. In contrast, the winter months bring cooler temperatures to northern India.

Central Asia, encompassing countries like Kazakhstan and Uzbekistan, is characterized by its continental climate with hot summers and cold winters. The region is largely arid, with deserts dominating the landscape. Despite the harsh conditions, Central Asia has a rich history of nomadic cultures.

The weather across Asia is a complex subject, reflecting the continent's vast geographical diversity. Each region has developed its own unique climatic conditions, influencing everything from agriculture to culture. Understanding these dynamics becomes increasingly important as climate change impacts weather patterns globally."""


In [3]:
data = corpus.lower()
chars = sorted(set(data))  # sorted so the char/index mapping is the same on every run
data_size, vocab_size = len(data), len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))

char_to_idx = {char: idx for idx, char in enumerate(chars)}
idx_to_char = {idx: char for idx, char in enumerate(chars)}


data has 2413 characters, 37 unique.
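
As a quick sanity check on the two lookup tables, the sketch below encodes a string to indices and decodes it back. It uses a short stand-in string instead of the full corpus, and the helper names `encode` and `decode` are hypothetical (they are not defined anywhere else in this notebook).

```python
# Stand-in text; in the notebook this role is played by `data`
sample = "weather in asia"
chars = sorted(set(sample))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}

def encode(text):
    """Map each character to its integer index (hypothetical helper)."""
    return [char_to_idx[ch] for ch in text]

def decode(indices):
    """Map integer indices back to a string (hypothetical helper)."""
    return "".join(idx_to_char[i] for i in indices)

ids = encode("asia")
roundtrip = decode(ids)
print(ids, roundtrip)
```

Any character that is missing from the vocabulary raises a `KeyError` in `encode`, which is why seed strings used later have to be lowercased like the corpus.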


In [4]:
# Character-level RNN for text generation

epochs = 1000
input_size = len(chars)   # one-hot input: one slot per character
output_size = len(chars)  # one logit per character
hidden_n = 37
sequence_length = 120

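class RNN(nn.Module):
    # Argument order matches the call below: (input_size, hidden_size, output_size)
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True: inputs are shaped (batch, seq_len, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        # Project the hidden state to one logit per character
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h_0):
        out, h_t = self.rnn(x, h_0)
        out = self.fc(out[:, -1, :])  # Use only the last time step's output
        return out

model = RNN(input_size, hidden_n, output_size)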


In [5]:

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


In [None]:
for epoch in range(epochs):
    for i in range(len(data) - sequence_length):  # index the same string the windows are sliced from
        # Prepare input and target sequences
        x_seq = [char_to_idx[ch] for ch in data[i:i + sequence_length]]
        y_seq = char_to_idx[data[i+sequence_length]]

        # Convert to tensor and one-hot encode
        x_tensor = torch.zeros(1, sequence_length, vocab_size)
        for j, idx in enumerate(x_seq):
            x_tensor[0, j, idx] = 1  # One-hot encoding

        # Initialize hidden state
        h_0 = torch.zeros(1, 1, hidden_n)  # Shape (num_layers, batch_size, hidden_size)

        # Forward pass
        optimizer.zero_grad()  # Zero the gradients
        y_pred = model(x_tensor, h_0)  # Get the prediction

        # Compute loss
        loss = criterion(y_pred, torch.tensor([y_seq]))

        # Backward pass and optimization
        loss.backward()
        optimizer.step()
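
The loop above performs one optimizer step per window, which is slow for 1000 epochs. One possible alternative, sketched here under assumptions, is to build every window at once and train on mini-batches. The stand-in text, `batch_size`, the short window length, and the three-epoch run are illustrative choices, not settings taken from this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in text; in the notebook this role is played by `data`
text = "the monsoon rains can lead to flooding but also nourish the lush landscapes. " * 4
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
vocab_size = len(chars)
seq_len, hidden, batch_size = 20, 32, 64  # illustrative values (assumptions)

ids = torch.tensor([char_to_idx[ch] for ch in text])
# unfold yields every run of seq_len + 1 consecutive indices:
# the first seq_len are the input window, the last one is the target
windows = ids.unfold(0, seq_len + 1, 1)
x_ids, y = windows[:, :-1], windows[:, -1]
x = F.one_hot(x_ids, num_classes=vocab_size).float()  # (N, seq_len, vocab_size)

rnn = nn.RNN(vocab_size, hidden, batch_first=True)
fc = nn.Linear(hidden, vocab_size)
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(fc.parameters()), lr=0.01)
criterion = nn.CrossEntropyLoss()

losses = []
for epoch in range(3):
    perm = torch.randperm(x.size(0))  # reshuffle the windows each epoch
    total = 0.0
    for start in range(0, x.size(0), batch_size):
        batch = perm[start:start + batch_size]
        optimizer.zero_grad()
        out, _ = rnn(x[batch])          # h_0 defaults to zeros when omitted
        logits = fc(out[:, -1, :])      # predict from the last time step
        loss = criterion(logits, y[batch])
        loss.backward()
        optimizer.step()
        total += loss.item() * batch.size(0)
    losses.append(total / x.size(0))
    print(f"epoch {epoch}: mean loss {losses[-1]:.4f}")
```

Tracking the mean loss per epoch, as `losses` does here, also gives a simple way to tell whether training is converging, which the one-window-at-a-time loop does not report.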

In [None]:
with torch.no_grad():
    test_input = "hel"  # Seed text; every character must be in the lowercased vocabulary
    x_seq = [char_to_idx[ch] for ch in test_input]
    # Size the tensor to the seed so the last time step is a real character, not zero padding
    x_tensor = torch.zeros(1, len(x_seq), vocab_size)
    for j, idx in enumerate(x_seq):
        x_tensor[0, j, idx] = 1  # One-hot encoding

    h_0 = torch.zeros(1, 1, hidden_n)  # Shape (num_layers, batch_size, hidden_size)
    predicted_output = model(x_tensor, h_0)
    predicted_char_idx = torch.argmax(predicted_output).item()
    predicted_char = idx_to_char[predicted_char_idx]

    print(f"Input: '{test_input}' -> Predicted next character: '{predicted_char}'")
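
The cell above predicts a single character. To generate longer text, the prediction can be appended to the input and fed back in. The sketch below shows one way to do that with temperature sampling. `CharRNN` is a stand-in with the same interface as the model in this notebook, `generate` and `temperature` are hypothetical names, and the weights are untrained, so the output is gibberish and only the mechanics are meaningful.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in vocabulary; in the notebook this comes from `data`
text = "the weather across asia is a complex subject."
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}
vocab_size, hidden = len(chars), 16

class CharRNN(nn.Module):
    """Stand-in model: RNN followed by a linear layer on the last time step."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h_0):
        out, _ = self.rnn(x, h_0)
        return self.fc(out[:, -1, :])

def generate(model, seed, n_chars, temperature=1.0):
    """Append n_chars sampled characters to seed, one at a time."""
    model.eval()
    generated = seed
    with torch.no_grad():
        for _ in range(n_chars):
            # Re-encode everything generated so far (simple, but quadratic in length)
            ids = torch.tensor([[char_to_idx[ch] for ch in generated]])
            x = F.one_hot(ids, num_classes=vocab_size).float()
            h_0 = torch.zeros(1, 1, hidden)
            logits = model(x, h_0)
            # Lower temperature sharpens the distribution, higher flattens it
            probs = F.softmax(logits / temperature, dim=-1)
            next_idx = torch.multinomial(probs, num_samples=1).item()
            generated += idx_to_char[next_idx]
    return generated

model = CharRNN(vocab_size, hidden, vocab_size)
sample = generate(model, "the", n_chars=20, temperature=0.8)
print(repr(sample))
```

Sampling from the softmax instead of taking `argmax` avoids the repetitive loops that greedy decoding tends to produce in small character models.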

In [None]:
# evaluate using perplexity
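
Perplexity is the exponential of the mean negative log-likelihood per predicted character, so it can be computed from the same cross-entropy used for training. The sketch below is a minimal illustration on synthetic logits. The `perplexity` helper is a hypothetical name, and a real evaluation would ideally use model outputs on held-out text, which this notebook does not set aside.

```python
import math

import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    """exp(mean cross-entropy), with cross-entropy in nats (hypothetical helper)."""
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return math.exp(nll.item())

vocab_size = 37  # matches the vocabulary size printed earlier in the notebook
targets = torch.tensor([0, 5, 12, 36])

# All-equal logits: the model spreads probability evenly over the vocabulary,
# so perplexity equals the vocabulary size
uniform_logits = torch.zeros(len(targets), vocab_size)
ppl_uniform = perplexity(uniform_logits, targets)

# Logits that strongly favour the correct character: perplexity approaches 1
confident_logits = torch.full((len(targets), vocab_size), -20.0)
confident_logits[torch.arange(len(targets)), targets] = 20.0
ppl_confident = perplexity(confident_logits, targets)

print(ppl_uniform, ppl_confident)
```

The two cases bracket the useful range: a trained model should land well below the vocabulary size, and a value close to 1 on the training text alone may simply indicate memorization.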