

### ConvLSTM Equations in Markdown

1. **Input Gate** ($i_t$):
   $$ i_t = \sigma(W_{xi} \ast x_t + W_{hi} \ast h_{t-1} + W_{ci} \circ C_{t-1} + b_i) $$
   Controls how much new information enters the cell state. Convolution ($\ast$) is applied to the input and the previous hidden state, and the previous cell state enters through an element-wise product ($\circ$).

2. **Forget Gate** ($f_t$):
   $$ f_t = \sigma(W_{xf} \ast x_t + W_{hf} \ast h_{t-1} + W_{cf} \circ C_{t-1} + b_f) $$
   Controls how much of the previous cell state is kept, using the same convolutions over the input and previous hidden state and the same element-wise term on the previous cell state.

3. **Cell State** ($C_t$):
   $$ C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} \ast x_t + W_{hc} \ast h_{t-1} + b_c) $$
   Updates the cell state: the forget gate scales the previous cell state, and the input gate scales the new candidate values, which are computed by convolving the input and the previous hidden state and applying $\tanh$.

4. **Output Gate** ($o_t$):
   $$ o_t = \sigma(W_{xo} \ast x_t + W_{ho} \ast h_{t-1} + W_{co} \circ C_t + b_o) $$
   Controls how much of the cell state is exposed in the hidden state. Convolution is applied to the input and the previous hidden state; the element-wise term uses the current cell state $C_t$ rather than $C_{t-1}$.

5. **Hidden State** ($h_t$):
   $$ h_t = o_t \circ \tanh(C_t) $$
   The hidden state for the current time step: the output gate applied element-wise to the $\tanh$-activated cell state. It is passed on to the next time step and carries both spatial and temporal information.
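
The five equations above can be traced in code. The following is a minimal NumPy sketch, assuming a single-channel frame, odd-sized kernels, and zero padding so that every tensor keeps the same height and width. The helper names (`conv2d_same`, `convlstm_step`) and the parameter dictionary are hypothetical and chosen only to mirror the symbols in the equations; a real layer such as Keras `ConvLSTM2D` handles multiple channels and batches and may not use the $W_{c\cdot}$ terms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """2D cross-correlation (the 'convolution' used in deep learning),
    zero-padded so the output has the same shape as x. Assumes odd kernel sizes."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def convlstm_step(x_t, h_prev, c_prev, p):
    """One ConvLSTM time step for a single-channel H x W frame."""
    # Input gate: convolutions on x_t and h_{t-1}, element-wise term on C_{t-1}
    i_t = sigmoid(conv2d_same(x_t, p["W_xi"]) + conv2d_same(h_prev, p["W_hi"])
                  + p["W_ci"] * c_prev + p["b_i"])
    # Forget gate: same structure as the input gate
    f_t = sigmoid(conv2d_same(x_t, p["W_xf"]) + conv2d_same(h_prev, p["W_hf"])
                  + p["W_cf"] * c_prev + p["b_f"])
    # Cell state: keep part of the old state, add gated candidate values
    candidate = np.tanh(conv2d_same(x_t, p["W_xc"]) + conv2d_same(h_prev, p["W_hc"])
                        + p["b_c"])
    c_t = f_t * c_prev + i_t * candidate
    # Output gate: element-wise term uses the *current* cell state C_t
    o_t = sigmoid(conv2d_same(x_t, p["W_xo"]) + conv2d_same(h_prev, p["W_ho"])
                  + p["W_co"] * c_t + p["b_o"])
    # Hidden state
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Illustrative random parameters (assumed shapes: 3x3 kernels, H x W element-wise weights)
rng = np.random.default_rng(0)
H, W = 5, 5
params = {}
for gate in "ifco":
    params[f"W_x{gate}"] = rng.normal(scale=0.1, size=(3, 3))
    params[f"W_h{gate}"] = rng.normal(scale=0.1, size=(3, 3))
    params[f"b_{gate}"] = 0.0
for gate in "ifo":
    params[f"W_c{gate}"] = rng.normal(scale=0.1, size=(H, W))

# Run four random frames through the cell, starting from zero states
h, c = np.zeros((H, W)), np.zeros((H, W))
for x_t in rng.normal(size=(4, H, W)):
    h, c = convlstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

The explicit loops in `conv2d_same` are slow but keep the operation visible; they are meant for reading alongside the equations, not for training.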





In [None]:
!wget https://raw.githubusercontent.com/cbtn-data-science-ml/tensorflow-professional-developer/main/model_utils.py

In [None]:
from model_utils import plot_loss_and_accuracy, early_stopping_callback, model_checkpoint_callback

| Feature                | `!cd` (Shell Command)       | `%cd` (Magic Command)                  |
|------------------------|-----------------------------|----------------------------------------|
| **Scope**             | Temporary (subshell only)   | Persistent (notebook-wide)            |
| **Effect on Notebook**| No effect on working dir    | Changes notebook's working dir        |
| **Use Case**          | One-off shell commands      | Lasting directory changes             |
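
The difference in the table can be reproduced outside a notebook. In the sketch below, running `cd` through `subprocess` stands in for `!cd` (the change happens in a child shell and is lost when it exits), and `os.chdir` stands in for `%cd` (the change applies to the current process). The temporary directory is only an illustration.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = os.path.realpath(tempfile.mkdtemp())

# Like `!cd`: the directory change lives and dies inside the child shell
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_subshell = os.getcwd()

# Like `%cd`: the directory change applies to this process
os.chdir(target)
after_chdir = os.getcwd()

# Restore the original directory and clean up
os.chdir(start)
os.rmdir(target)

print("after subshell cd:", after_subshell == start)
print("after os.chdir:   ", os.path.realpath(after_chdir) == target)
```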

In [None]:
# Print working directory: !pwd or %pwd?


In [None]:
# Change directory


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Dropout
from sklearn.model_selection import train_test_split


## EDA

In [None]:
# Shuffle the data so that any ordering in the file does not bias training


In [None]:
# Is the dataset balanced? Close enough to 50/50, in my opinion.
# If imbalanced, see: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data

In [None]:
# Sample 5 random tweets and their classification
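
One possible way to approach the three EDA steps above (shuffle, check balance, sample) with pandas is sketched here. The tiny `train_df` is a hypothetical stand-in with made-up rows; only the column names `text` and `target` are taken from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's train_df
train_df = pd.DataFrame({
    "text": [
        "Forest fire near the highway",
        "I love this song",
        "Flood warning issued for the county",
        "What a great game last night",
        "Earthquake shakes the city centre",
        "Coffee first, then emails",
    ],
    "target": [1, 0, 1, 0, 1, 0],
})

# Shuffle all rows reproducibly, then renumber the index
shuffled = train_df.sample(frac=1, random_state=42).reset_index(drop=True)

# Share of each class; values near 0.5 suggest a balanced dataset
class_share = shuffled["target"].value_counts(normalize=True)
print(class_share)

# Draw a few random tweets together with their labels
random_samples = shuffled.sample(n=3, random_state=42)[["text", "target"]]
print(random_samples)
```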



In [None]:
# For the training dataset



In [None]:
# For the test dataset


In [None]:
# Calculate word counts for each tweet
train_df['word_count'] = train_df['text'].apply(lambda x: len(str(x).split()))

plt.figure(figsize=(10, 6))
sns.histplot(train_df['word_count'], bins=30, kde=True)
plt.title('Word Count Distribution in Tweets')
plt.xlabel('Word Count')
plt.ylabel('Frequency')
plt.show()


In [None]:
sns.countplot(x='target', data=random_samples)
plt.title('Class Distribution in Random Samples')
plt.xlabel('Disaster Tweets (1) vs. Non-Disaster Tweets (0)')
plt.ylabel('Count')
plt.show()


In [None]:
# Imports


# Data paths


# Load data


# Shuffle data


# Preprocessing




# Tokenization and padding
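
To make the tokenization and padding step concrete, here is a plain-Python sketch of the idea: map words to integer ids, then bring every sequence to the same length. It is an illustration, not the Keras implementation; the function names, the `<OOV>` token, and the choice of left-padding and left-truncation are assumptions made for this example.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Map words to integer ids by descending frequency.
    Id 0 is reserved for padding and id 1 for out-of-vocabulary words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index

def texts_to_padded(texts, word_index, max_len, oov_token="<OOV>"):
    """Encode each text as ids, keep the last max_len ids, and left-pad with 0."""
    oov_id = word_index[oov_token]
    rows = []
    for text in texts:
        ids = [word_index.get(word, oov_id) for word in text.lower().split()]
        ids = ids[-max_len:]
        rows.append([0] * (max_len - len(ids)) + ids)
    return rows

train_texts = ["fire in the forest", "the game was great"]
word_index = build_word_index(train_texts)

new_texts = ["the fire spread", "the forest fire in the north was great"]
padded = texts_to_padded(new_texts, word_index, max_len=5)
print(padded)  # [[0, 0, 2, 3, 1], [4, 2, 1, 7, 8]]
```

Fitting the vocabulary on the training texts only, as done here, means unseen words in new data fall back to the out-of-vocabulary id.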



In [None]:
# Build, compile, and train model

# LSTM Architecture
LSTM layers use gates to retain information from much earlier in a sequence and to discard what is no longer relevant. This suits tasks where the order and context of information matter, such as following the unfolding of a story.
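
For comparison with the ConvLSTM equations at the top of this notebook, the commonly used LSTM formulation can be written in the same notation, with matrix products in place of the convolutions and without the element-wise cell-state terms. This is a reference sketch of that standard variant, not necessarily the exact parameterization of any particular library layer:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
h_t &= o_t \circ \tanh(C_t)
\end{aligned}
$$

When $f_t$ is close to 1 and $i_t$ is close to 0, $C_t \approx C_{t-1}$, which is how the cell carries information across many time steps.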

In [None]:
# LSTM Architecture


In [None]:
# After training, plot the loss and accuracy


# GRU model

A GRU layer uses update and reset gates to carry past information forward through a sequence, with fewer parameters than an LSTM of the same size. This suits tasks where the order of the data matters, such as the order of words in a sentence.
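
How a GRU carries past information forward can be seen in a single update step. Below is a minimal NumPy sketch of one common formulation; the function name `gru_step`, the parameter names, and the random weights are hypothetical, and some references swap the roles of $z_t$ and $1 - z_t$ in the final blend.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step for a single input vector."""
    # Update gate: how much of the new candidate replaces the old state
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds into the candidate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    # Candidate state built from the input and the reset-scaled old state
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # Blend old state and candidate
    return (1.0 - z_t) * h_prev + z_t * h_cand

# Illustrative sizes and random parameters
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
params = {}
for name in "zrh":
    params[f"W_{name}"] = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
    params[f"U_{name}"] = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
    params[f"b_{name}"] = np.zeros(hidden_dim)

# Run a six-step random sequence through the cell, starting from a zero state
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):
    h = gru_step(x_t, h, params)
print(h)
```

Unlike the LSTM, there is no separate cell state: the hidden state itself is the memory, which is where the parameter savings come from.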

In [None]:
# GRU model



In [None]:
# GRU Architecture


In [None]:
# After training, plot the loss and accuracy
