# Day-71: Introduction to RNNs (Recurrent Neural Networks)

In the last few sessions, we worked with Convolutional Neural Networks (CNNs) for image-based tasks.
But now, we’re stepping into the world of sequences — where order and context matter.

So today, we’ll explore one of the most powerful architectures for sequence data — Recurrent Neural Networks (RNNs).

These are the foundation of many models used in language translation, speech recognition, time-series forecasting, and even chatbots!

## Topics Covered

- What are sequences?

- Time-dependent data

- The concept of Recurrent Neural Networks

- Understanding Vanishing Gradients

## What are Sequences?

A sequence is simply data where the elements are dependent on each other, and their order is significant.

- `Analogy`:
  - Let’s start simple: imagine you’re trying to predict the next word in a sentence:\
    **The cat sat on the ____**
  - You can’t make a good guess unless you know what came before: “cat” and “sat on the.”
  - That’s sequence dependency: each piece of data depends on the previous ones.

Examples:

- Stock prices (today’s price depends on yesterday’s)

- Temperature forecasting

- Text, speech, and music
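
To make "today depends on yesterday" concrete, here is a minimal sketch of one common way to turn a time series into training pairs: slide a fixed-size window over the data and treat the next value as the target. The `make_windows` helper and the temperature values are made up for illustration.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (past window, next value) pairs."""
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # the `window` most recent values
        targets.append(series[start + window])       # the value that follows them
    return np.array(inputs), np.array(targets)

# Hypothetical daily temperatures, invented for this example
temps = [20.0, 21.0, 23.0, 22.0, 24.0, 25.0]
X, y = make_windows(temps, window=3)

print(X.shape, y.shape)  # (3, 3) (3,)
print(X[0], y[0])        # [20. 21. 23.] 22.0
```

Each row of `X` keeps its values in their original order, which is exactly the information a sequence model relies on.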

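Traditional feedforward networks (such as fully connected networks or CNNs) treat each input independently and keep no memory of earlier inputs.
RNNs, in contrast, carry context forward from one step to the next, just like your brain recalls what you said a few seconds ago in a conversation.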
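
A quick way to see why order matters: the sketch below compares two sentences using an order-blind representation (a simple word count, chosen here only as an illustration) and using the actual word sequence. The sentences are made up for the example.

```python
from collections import Counter

sentence_a = "dog bites man".split()
sentence_b = "man bites dog".split()

# Order-blind view: count the words and ignore their positions
bag_a = Counter(sentence_a)
bag_b = Counter(sentence_b)

same_bag = bag_a == bag_b                 # same words, same counts
same_sequence = sentence_a == sentence_b  # same words in the same order?

print(same_bag, same_sequence)  # True False
```

The two sentences mean very different things, yet the word counts cannot tell them apart. Only a representation that keeps the order can.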

## How RNNs Work

Think of an RNN as a human note-taker during a lecture:

 - Each new sentence (input) adds to their memory.

 - They use what they remember from before to understand the current point.

### Time-Dependent Data 

Time-dependent data is a type of sequence where the dependency is explicitly over time. RNNs handle this by feeding the hidden state computed at time step $t-1$ back into the same layer as an additional input at time step $t$. This hidden state acts as the network's memory.

- `Analogy`: Imagine reading a novel. As you read sentence $N$, you need to remember the context from sentences $N-1$, $N-2$, and so on, to fully understand the current plot. The RNN's hidden state is like your short-term memory for the text you've already processed.

- Mathematical Vibe: The hidden state $h_t$ at time $t$ is calculated as:

$h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$

where:
- $x_t$ is the input at time $t$.
- $h_{t-1}$ is the previous hidden state (the memory).
- $f$ is the activation function (like $\tanh$ or $\text{ReLU}$).
- $W_{hh}$ and $W_{xh}$ are the weight matrices and $b_h$ is the bias (all shared across all time steps).
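
The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trainable model: the sizes, the random initialization, and the choice of $\tanh$ as $f$ are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the lesson)
input_size, hidden_size, seq_len = 3, 4, 5

# Parameters are created once and reused at every time step
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_forward(xs, h0):
    """Apply h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h) across a sequence."""
    h = h0
    states = []
    for x_t in xs:  # one iteration per time step
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # new memory from old memory + input
        states.append(h)
    return np.stack(states)

xs = rng.normal(size=(seq_len, input_size))  # a random input sequence
h0 = np.zeros(hidden_size)                   # empty memory before the first step
hs = rnn_forward(xs, h0)

print(hs.shape)  # (5, 4): one hidden state per time step
```

Note that the loop uses the same `W_xh`, `W_hh`, and `b_h` at every step. Only the hidden state `h` changes, which is what lets it act as memory.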

## The Vanishing Gradient Problem