# Day-72: Long Short-Term Memory (LSTM)

Welcome back to Day 72 of our 100 Days of Deep Learning Challenge!
In the last session, we talked about RNNs — how they handle sequential data and where they fail due to the vanishing gradient problem.

Today we're tackling the big daddy of sequence modeling: the Long Short-Term Memory (LSTM) network! Get ready, because this is where things get seriously powerful for handling long-term dependencies.

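So, if you've ever wondered how speech recognizers and language translators learned to remember context over long sequences, LSTMs were the workhorse behind them for years. (Modern models like ChatGPT are built on Transformers rather than LSTMs, but they tackle the same long-context problem.)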

## Topics Covered:

- Why LSTM was introduced

- LSTM cell structure

- Input, Forget, and Output Gates

- Example: Remembering context in sentences

- Code Implementation using Keras

## Why was LSTM introduced?

- Think of a student reading a long paragraph.
- A standard RNN tries to remember every word — but after a few sentences, it forgets what came earlier.
- LSTM, on the other hand, acts like a smart note-taker — it writes down important info and crosses out the unimportant stuff.

## LSTM cell structure

LSTM is designed with a special cell state (the "conveyor belt") that runs straight through the entire sequence, allowing information to be carried forward with minimal loss. The flow of information to and from this cell state is regulated by three main gates:

1. the Forget Gate
2. the Input Gate
3. the Output Gate

### The Analogy: A Factory Assembly Line

Think of the LSTM cell as a sophisticated factory assembly line.

1. The Cell State ($C_t$): The Conveyor Belt

This is the main path where the product (the long-term memory/information) moves. It's a straight line, making it easy for information to flow unaltered.
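
To get a feel for why a "straight" belt matters, here is a tiny sketch in plain Python. It is not a real LSTM, just a toy: it assumes nothing new is added to the belt and that the value is scaled by one constant keep factor at every step (the role the Forget Gate plays, introduced next). The function name `carry_forward` and all numbers are made up for illustration.

```python
def carry_forward(c_start, keep_factor, steps):
    """Repeatedly scale a cell-state value by keep_factor, with nothing new added."""
    c = c_start
    for _ in range(steps):
        c = keep_factor * c
    return c

c_start = 1.0
steps = 100

# Belt fully open: the value is multiplied by 1.0 at every step.
open_belt = carry_forward(c_start, keep_factor=1.0, steps=steps)

# Leaky belt: 10% of the value is lost at every step.
leaky_belt = carry_forward(c_start, keep_factor=0.9, steps=steps)

print(f"keep=1.0 after {steps} steps: {open_belt}")
print(f"keep=0.9 after {steps} steps: {leaky_belt:.2e}")
```

With a keep factor of 1 the value arrives untouched, while even a small leak at each step wipes it out over a long sequence. That is the intuition behind letting the network learn when to keep the belt open.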

2. The Forget Gate ($f_t$): The Quality Control Manager 
  - Purpose: Decides which information from the previous cell state ($C_{t-1}$) is no longer relevant and should be forgotten (erased).
  - Mechanism: It looks at the current input ($x_t$) and the previous hidden state ($h_{t-1}$) and outputs a number between 0 and 1 for each value in $C_{t-1}$.
      - 1 means "keep this completely."
      - 0 means "completely forget this."

  - `Example`: If we're processing a sentence and the subject changes (e.g., "John is a coder... Mary is a singer..."), the Forget Gate would likely output values close to 0 for the stored information about John once the model starts processing Mary.
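
The Forget Gate mechanism above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the usual formulation in which the gate is a sigmoid of a weighted sum of $x_t$ and $h_{t-1}$ plus a bias. The weight names `W_f`, `U_f`, `b_f` and every number below are hypothetical toy values, not learned parameters.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(x_t, h_prev, W_f, U_f, b_f):
    """Return one keep-score in (0, 1) per cell-state entry."""
    f_t = []
    for w_row, u_row, b in zip(W_f, U_f, b_f):
        z = sum(w * x for w, x in zip(w_row, x_t))       # contribution of the input
        z += sum(u * h for u, h in zip(u_row, h_prev))   # contribution of the hidden state
        f_t.append(sigmoid(z + b))
    return f_t

# Hypothetical toy setup: 2 input features, 2 cell-state entries.
x_t = [1.0, 0.0]
h_prev = [0.5, -0.5]
W_f = [[4.0, 0.0], [-4.0, 0.0]]   # made-up weights on the input
U_f = [[0.0, 0.0], [0.0, 0.0]]    # made-up weights on the hidden state
b_f = [0.0, 0.0]
C_prev = [2.0, 3.0]

f_t = forget_gate(x_t, h_prev, W_f, U_f, b_f)
kept = [f * c for f, c in zip(f_t, C_prev)]   # element-wise: what survives from C_prev

print("forget gate:", [round(f, 3) for f in f_t])
print("kept memory:", [round(k, 3) for k in kept])
```

With these toy weights the first cell-state entry is kept almost completely and the second is almost erased, which is exactly the "keep" versus "forget" behaviour described above.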

3. The Input Gate ($i_t$) & Candidate Cell State ($\tilde{C}_t$): The New Shipment Receiver and Box-Filler

  - Purpose: Together, these two decide which new information from the current input will be stored in the cell state.
  - The Input Gate ($i_t$): Acts as a filter. It looks at $x_t$ and $h_{t-1}$ and outputs a number between 0 and 1 for each candidate value, deciding how much of it is let into the cell state.

  - The Candidate Cell State ($\tilde{C}_t$): Creates a vector of potential new values that could be added to the state.
  - The Action: The old cell state $C_{t-1}$ is updated to the new cell state $C_t$ by first multiplying it element-wise by the Forget Gate's output (dropping old info) and then adding the element-wise product of the Input Gate's output and the Candidate Cell State (adding selected new info): $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$.
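
Here is a small sketch of "The Action" step in plain Python. To keep it short, the gate values are plugged in directly instead of being computed from weights, so treat the numbers as hypothetical. It also assumes the standard formulation in which the candidate values come out of a `tanh` and therefore lie between -1 and 1.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def update_cell_state(C_prev, f_t, i_t, C_tilde):
    """Element-wise update: keep part of the old state, add part of the candidate."""
    return [f * c + i * g for f, c, i, g in zip(f_t, C_prev, i_t, C_tilde)]

# Hypothetical values for a cell state with two entries.
C_prev = [2.0, -1.0]
f_t = [sigmoid(5.0), sigmoid(-5.0)]          # keep entry 0, mostly forget entry 1
i_t = [sigmoid(-5.0), sigmoid(5.0)]          # mostly block entry 0, admit entry 1
C_tilde = [math.tanh(0.3), math.tanh(2.0)]   # candidate values in (-1, 1)

C_t = update_cell_state(C_prev, f_t, i_t, C_tilde)

print("old cell state:", C_prev)
print("new cell state:", [round(c, 3) for c in C_t])
```

Entry 0 stays close to its old value because the Forget Gate keeps it and the Input Gate blocks the new candidate. Entry 1 is overwritten with (almost) the candidate value because the gates do the opposite.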

4. The Output Gate ($o_t$): The Final Product Inspector
  - Purpose: Decides what the final output (the new hidden state, $h_t$) will be, based on the newly updated cell state ($C_t$).
  - Mechanism: It filters the updated cell state. The gate looks at $x_t$ and $h_{t-1}$ to decide which parts of $C_t$ are relevant for the current time step's prediction.
  - `Example`: The cell state might hold information about all the things John does, but if the current input is a question about his job, the Output Gate will only let the "coder" part of the cell state through into the hidden state ($h_t$), which is both the output at this step and the input to the next one.
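
The filtering done by the Output Gate can be sketched like this. It assumes the standard formulation $h_t = o_t \odot \tanh(C_t)$, where the cell state is first squashed by `tanh` and then multiplied element-wise by the gate. The gate values are again plugged in by hand, so the numbers are purely illustrative.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_state(C_t, o_t):
    """Squash the cell state with tanh, then let the output gate filter it."""
    return [o * math.tanh(c) for o, c in zip(o_t, C_t)]

# Hypothetical cell state with three stored "facts".
C_t = [1.5, -2.0, 0.8]

# Hypothetical gate values: expose only the first entry at this time step.
o_t = [sigmoid(6.0), sigmoid(-6.0), sigmoid(-6.0)]

h_t = hidden_state(C_t, o_t)

print("cell state  :", C_t)
print("hidden state:", [round(h, 3) for h in h_t])
```

Notice that the cell state itself is untouched. The hidden entries are still stored in $C_t$ and can be exposed at a later step. The Output Gate only controls what is visible right now.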