Convolutional Neural Networks (CNNs)

A Convolutional Neural Network (CNN) is a deep learning model specifically designed for processing structured grid-like data, such as images.

Structure of CNNs:

1. Convolutional Layers

2. Pooling Layers (MaxPooling, AveragePooling): reduce the spatial size (height and width) of the feature maps

3. Fully Connected (Dense) Layers.
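
To make the pooling step concrete, here is a minimal NumPy sketch of 2x2 max pooling with stride 2. The helper name `max_pool_2x2` and the 4x4 input are made up for illustration; Keras's `MaxPooling2D` does the same thing per channel on batched tensors.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = feature_map.shape
    # Group the array into non-overlapping 2x2 blocks, then take each block's maximum
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)
# [[4 2]
#  [2 8]]
```

Replacing `max` with `mean` in the last line of the helper would give average pooling instead.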


Convolution + pooling = feature extraction.
A Flatten layer sits before the first dense layer.

By default, Keras convolutions use no padding (padding='valid'), so every convolution shrinks the feature map. To avoid that, use padding='same', which keeps the height and width unchanged through the convolution. Pooling layers still reduce the size either way.
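
The shrinking can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1. The sketch below is plain Python; `conv_output_size` is a hypothetical helper, not a Keras function.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

no_pad = conv_output_size(32, kernel=3)                 # 3x3 conv, no padding
same_pad = conv_output_size(32, kernel=3, padding=1)    # one pixel of zero padding per side
pooled = conv_output_size(no_pad, kernel=2, stride=2)   # 2x2 max pooling, stride 2

print(no_pad, same_pad, pooled)  # 30 32 15
```

So a 3x3 convolution without padding turns 32x32 into 30x30, one pixel of padding keeps it at 32x32, and 2x2 pooling halves whatever comes in.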

In [2]:
#CIFAR-10 - popular dataset for image classification with 10 classes

from tensorflow import keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten,Dense,Dropout

from tensorflow.keras.datasets import cifar10

import matplotlib.pyplot as plt

import tensorflow as tf

import numpy as np

In [3]:
#load CIFAR dataset
(X_train,y_train),(X_test,y_test) = cifar10.load_data()

#Normalize the pixel values to the range [0, 1]
X_train, X_test = X_train / 255.0, X_test / 255.0

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step


In [4]:
#convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes = 10)
y_test = keras.utils.to_categorical(y_test, num_classes = 10)
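
For reference, one-hot encoding can be reproduced with plain NumPy. This is only a sketch of what the conversion does; `one_hot` is a hypothetical helper and the three labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels).reshape(-1)   # accepts shape (N,) or (N, 1), like the CIFAR-10 labels
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([[3], [0], [9]], num_classes=10)
print(encoded.shape)         # (3, 10)
print(encoded[0].argmax())   # 3
```

Each row has a single 1 at the index of its class, which is the format `categorical_crossentropy` expects.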

In [6]:
#build CNN MODEL
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(32, 32, 3)),  # 32 = number of filters (output feature maps); the 3 in input_shape is the number of channels (RGB)
    MaxPooling2D(2,2),  # 2x2 pooling window: halves the height and width
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128, (3,3), activation='relu'),
    Flatten(),  # flattens the 3D feature maps into a 1D vector for the dense layers
    Dense(128, activation='relu'),  # fully connected (dense) layer
    Dropout(0.6),  # regularization to reduce overfitting: randomly zeroes 60% of its inputs during training
    Dense(10, activation='softmax')  # 10 units = 10 classes; softmax for multi-class (binary would use 1 unit with sigmoid)
])
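
What the Dropout layer does during training can be sketched in NumPy as below. This is an illustration of inverted dropout, not Keras's implementation; the `dropout` helper and the all-ones input are assumptions for the demo.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout as applied at training time."""
    keep_mask = rng.random(x.shape) >= rate   # each unit is kept with probability 1 - rate
    return x * keep_mask / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.6, rng=rng)

print((dropped == 0).mean())   # close to 0.6
print(dropped.mean())          # close to 1.0
```

At inference time dropout is switched off and the activations pass through unchanged.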



In [9]:
#compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])  # categorical_crossentropy because this is a multi-class classification task with one-hot labels; accuracy is the evaluation metric
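
As a rough illustration of the loss chosen above, this NumPy sketch computes softmax probabilities and the categorical cross-entropy for one sample. The logits and the 3-class target are made-up values, and both functions are hypothetical helpers.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-(y_true * np.log(y_prob)).sum(axis=-1).mean())

logits = np.array([[2.0, 1.0, 0.1]])   # made-up scores for 3 classes
probs = softmax(logits)
target = np.array([[1.0, 0.0, 0.0]])   # true class is index 0

loss = categorical_crossentropy(target, probs)
wrong_loss = categorical_crossentropy(np.array([[0.0, 0.0, 1.0]]), probs)
print(loss < wrong_loss)   # True: the loss is lower when the likeliest class is the true one
```

With one-hot targets the loss reduces to the negative log of the probability assigned to the true class.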

In [10]:
model.summary()

Without padding, the first Conv2D layer shrinks the 32x32 input to 30x30.



The summary reports the total and the trainable parameter counts.
During training the weights and biases of every layer are adjusted, so here all parameters are trainable.
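
The parameter counts can be worked out by hand from the layer definitions. The sketch below assumes the architecture defined earlier (three 3x3 convolutions without padding, two 2x2 poolings, then Dense(128) and Dense(10)). The layer labels are made up and may not match the names Keras prints.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return (inputs + 1) * units

# Spatial size: 32 -> conv 30 -> pool 15 -> conv 13 -> pool 6 -> conv 4
flattened = 4 * 4 * 128

counts = {
    "conv_1": conv2d_params(3, 3, 32),
    "conv_2": conv2d_params(3, 32, 64),
    "conv_3": conv2d_params(3, 64, 128),
    "dense_1": dense_params(flattened, 128),
    "dense_2": dense_params(128, 10),
}
total = sum(counts.values())
print(counts["conv_1"], counts["dense_1"])  # 896 262272
print(total)                                # 356810
```

Pooling, Flatten, and Dropout layers have no weights, so they contribute nothing to the total. Most of the parameters sit in the first dense layer.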


In transfer learning, only some of the layers are trainable.

In [12]:
#train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 72s 89ms/step - accuracy: 0.2946 - loss: 1.8829 - val_accuracy: 0.5393 - val_loss: 1.2902
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 83s 91ms/step - accuracy: 0.5075 - loss: 1.3786 - val_accuracy: 0.5931 - val_loss: 1.1425
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 77s 85ms/step - accuracy: 0.5742 - loss: 1.2070 - val_accuracy: 0.6494 - val_loss: 1.0059
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 68s 87ms/step - accuracy: 0.6179 - loss: 1.0906 - val_accuracy: 0.6595 - val_loss: 0.9683
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 67s 86ms/step - accuracy: 0.6525 - loss: 1.0020 - val_accuracy: 0.6853 - val_loss: 0.9078
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 79s 83ms/step - accuracy: 0.6813 - loss: 0.9240 - val_accuracy: 0.6931 - val_loss: 0.8772
Epoch 7/10

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)  # evaluate returns [loss, accuracy] because metrics=['accuracy']
print(f'Test Accuracy : {test_acc}')

In [None]:
#MNIST dataset - contains gray scale 28x28 images of handwritten digits

IMPROVEMENTS

Check images (10 Feb, 3:09 - 3:10)



**Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM)**

Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are types of neural networks specifically designed to process sequential data, such as time series, speech, or text.

1. RNN

- Sequential Data
- Memory Cells # the hidden state lets the network retain information over time
- Challenges with RNNs # long-term dependencies (e.g. historical data spanning a year or more) run into the vanishing gradient problem: as gradients are propagated back through many time steps they shrink toward zero, so the weights barely learn from early inputs. LSTMs were designed to mitigate this.
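
A toy calculation shows why the gradient vanishes. It assumes, as a deliberate simplification, that every time step scales the gradient by the same factor of 0.9. Real RNNs multiply by a different Jacobian at each step, but the compounding effect is the same.

```python
def gradient_after(steps, factor):
    """Gradient magnitude after backpropagating through `steps` time steps,
    assuming every step scales it by the same `factor`."""
    return factor ** steps

after_5 = gradient_after(5, 0.9)
after_100 = gradient_after(100, 0.9)
print(f"{after_5:.4f}")     # 0.5905
print(f"{after_100:.2e}")   # 2.66e-05
```

After 100 steps almost nothing of the signal is left, so inputs from the distant past have next to no influence on the weight updates.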


2. LSTM

- Structure of an LSTM: Forget Gate, Input Gate, Output Gate

- An LSTM has three gates (gating mechanism):
      Forget Gate: Decides which information from the previous time step should be discarded.

      Input Gate: Determines which new information should be added to the memory.

      Output Gate: Controls which part of the memory is output to the next step (determines what the next hidden state should be).
-  Benefits of LSTM
    A good choice for text processing, speech data, and time series.
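
The three gates can be written out as a single LSTM time step in NumPy. This is a sketch of the standard formulation with random weights, not the Keras implementation. The sizes, the stacked weight matrix `W`, and the `lstm_step` helper are assumptions for the demo.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x] to the four stacked pre-activations."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate: what to discard from c_prev
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate: how much new information to add
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: what part of the memory to expose
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate values for the memory
    c = f * c_prev + i * g                  # updated cell state (memory)
    h = o * np.tanh(c)                      # next hidden state
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4              # arbitrary sizes for the demo
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 random inputs
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)   # (4,) (4,)
```

The cell state `c` is updated additively (`f * c_prev + i * g`), which is what lets information and gradients survive across many steps.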

Pre-processing Steps for Text Data

1. Text Cleaning
2. Tokenization # splitting the text into smaller units (words or subwords) known as tokens
3. Stopword Removal
4. Stemming # stripping suffixes, e.g. "running" -> "run"
5. Lemmatization # mapping words to their dictionary form, e.g. "better" -> "good"
6. Removing Non-Alphabetic Characters
7. Handling Imbalanced Data
8. Text Vectorization # e.g. bag of words, which counts how many times each word occurs in the text
9. Removing Rare Words
10. Handling Negations
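
A few of these steps (cleaning, removing non-alphabetic characters, tokenization, stopword removal, and bag-of-words vectorization) can be sketched with the standard library alone. The stopword set and the sample sentence are made up. A real pipeline would more likely use a library such as NLTK or spaCy.

```python
import re
from collections import Counter

STOPWORDS = {"the", "is", "a", "and", "of", "to"}   # tiny illustrative list, not a standard one

def preprocess(text):
    """Clean, tokenize, and remove stopwords from a piece of text."""
    text = text.lower()                        # text cleaning: normalize case
    text = re.sub(r"[^a-z\s]", " ", text)      # remove non-alphabetic characters
    tokens = text.split()                      # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(tokens):
    """Count how many times each token occurs."""
    return Counter(tokens)

tokens = preprocess("The cat sat. The cat is happy, and the dog is not!")
counts = bag_of_words(tokens)
print(tokens)          # ['cat', 'sat', 'cat', 'happy', 'dog', 'not']
print(counts["cat"])   # 2
```

The word "not" is deliberately left out of the stopword set. Dropping it would lose the negation, which is the concern behind step 10.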


#CHECK IMG 3:22
NLP TASKS: Preprocessing steps

In [15]:
#RNN for next-character prediction using the Tiny Shakespeare dataset
import requests

# URL of the dataset
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"

# Download the content
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed

# Save it to a file
with open("tiny_shakespeare.txt", "w", encoding="utf-8") as file:
    file.write(response.text)

print("Download complete: tiny_shakespeare.txt")

Download complete: tiny_shakespeare.txt


In [17]:
#Import Libraries

from tensorflow.keras.preprocessing.text import Tokenizer

from tensorflow.keras.preprocessing.sequence import pad_sequences

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Embedding, LSTM, SimpleRNN, Dense, Dropout
import numpy as np

import tensorflow as tf

In [20]:
#load the data
with open("tiny_shakespeare.txt", "r", encoding="utf-8") as file:
    shakespeare_text = file.read().lower()

In [23]:
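# create a sorted list of the unique characters in the text
chars = sorted(set(shakespeare_text))

#create a dictionary mapping each character to an integer index
char_to_index = {char : idx for idx, char in enumerate(chars)}

#create the reverse dictionary mapping each index back to its character
index_to_char = {idx : char for idx, char in enumerate(chars)}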

vocab_size = len(chars)

In [24]:
#convert text to character indices
text_as_int = np.array([char_to_index[char] for char in shakespeare_text])

#setting the sequence Length
sequence_length = 100
examples_per_epoch = len(shakespeare_text) // (sequence_length + 1)

#create the input and target
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(sequence_length + 1, drop_remainder=True)

#split the sequences into X and y
def split_input_target(chunk):
  input_text = chunk[:-1]
  target_text = chunk[1:]
  return input_text, target_text

dataset = sequences.map(split_input_target)

BATCH_SIZE = 64 # Define BATCH_SIZE here
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
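
To see what the chunking and `split_input_target` produce, here is the same logic on a short string in plain Python, without `tf.data`. The sample text and the sequence length of 4 are stand-ins chosen to keep the output readable.

```python
def split_input_target(chunk):
    """Input is every element but the last; target is the same chunk shifted by one."""
    return chunk[:-1], chunk[1:]

text = "first citizen"      # short stand-in for the corpus
sequence_length = 4         # much shorter than the 100 used above

# Cut the text into non-overlapping chunks of sequence_length + 1 characters,
# dropping the incomplete chunk at the end (like drop_remainder=True)
step = sequence_length + 1
chunks = [text[i:i + step] for i in range(0, len(text) - step + 1, step)]

pairs = [split_input_target(c) for c in chunks]
for inp, tgt in pairs:
    print(repr(inp), "->", repr(tgt))
# 'firs' -> 'irst'
# ' cit' -> 'citi'
```

At every position the target is the character that follows the corresponding input character, so the model is trained to predict the next character.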

In [32]:
#Build a simple RNN model
model_rnn = Sequential([
    Embedding(input_dim=vocab_size, output_dim=256),
    SimpleRNN(1024, return_sequences=True, stateful=False, recurrent_initializer='glorot_uniform'),
    Dense(vocab_size, activation='softmax') ])

In [34]:
#compile the model

model_rnn.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
#Train the model

EPOCHS = 20 #change this value based on performance
history = model_rnn.fit(dataset, epochs=EPOCHS)

Epoch 1/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 356s 2s/step - accuracy: 0.2110 - loss: 2.8744
Epoch 2/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 341s 2s/step - accuracy: 0.4024 - loss: 2.0138
Epoch 3/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 368s 2s/step - accuracy: 0.4765 - loss: 1.7502
Epoch 4/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 376s 2s/step - accuracy: 0.5154 - loss: 1.6051
Epoch 5/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 331s 2s/step - accuracy: 0.5359 - loss: 1.5203
Epoch 6/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 326s 2s/step - accuracy: 0.5491 - loss: 1.4663
Epoch 7/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 324s 2s/step - accuracy: 0.5591 - loss: 1.4293
Epoch 8/20
172/172 ━━━━━━━━━━━━━━━━━━━━ 325s 2s/step - accuracy: 0.5660 - loss: 1.3963
Epoch 9/20

In [None]:
#Define the model (switch between RNN and LSTM)

def build_model(vocab_size, embedding_dim, rnn_units, batch_size, use_lstm=True):
    # Both layer types take the same arguments, so pick the class once
    rnn_layer = LSTM if use_lstm else SimpleRNN
    model = Sequential([
        # stateful=True needs a fixed batch size, declared on the input
        tf.keras.Input(batch_shape=(batch_size, None)),
        Embedding(vocab_size, embedding_dim),
        rnn_layer(rnn_units, return_sequences=True, stateful=True,
                  recurrent_initializer='glorot_uniform'),
        Dense(vocab_size)  # no softmax here: the outputs are raw logits
    ])
    return model

In [None]:
#hyperparameters
embedding_dim = 256
rnn_units = 1024
use_lstm = True # change to False for simple RNN
sequence_length = 100
model_lstm = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE, use_lstm)  # the 4th argument is batch_size, not sequence_length

#LSTM
IMPROVEMENTS

1. Model Architecture Enhancements

2. Training Optimizations

3. Data Preprocessing Improvements

4. Regularization & Generalization

5. Evaluation & Post-Processing

# TRANSFER LEARNING



Transfer learning is a machine learning technique where a model trained on one task is reused for a different but related task. This allows us to:

 - Leverage pre-learned features.

 - Reduce training time since the model already has strong feature extraction capabilities.
 - Improve accuracy with limited data.
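
The core mechanic, freezing the reused layers and training only a new head, can be sketched in NumPy. Everything here is a stand-in: the "pretrained" weights are random, the data is synthetic, and the head is a simple logistic regression. In Keras the analogous step is setting `trainable = False` on the pretrained layers before compiling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor: in practice these weights would
# come from a model trained on a large dataset, not from a random generator.
frozen_W = rng.normal(size=(8, 16))
frozen_copy = frozen_W.copy()

def extract_features(x):
    """Frozen layers: used in the forward pass but never updated."""
    return np.maximum(x @ frozen_W, 0.0)   # linear layer + ReLU

# Small synthetic binary task (hypothetical data)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# New trainable head: logistic regression on top of the frozen features
head_w = np.zeros(16)
head_b = 0.0

def loss_and_grads(features, y, w, b):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    p = np.clip(p, 1e-7, 1 - 1e-7)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_w = features.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    return loss, grad_w, grad_b

features = extract_features(X)
initial_loss, _, _ = loss_and_grads(features, y, head_w, head_b)
for _ in range(300):
    _, grad_w, grad_b = loss_and_grads(features, y, head_w, head_b)
    head_w -= 0.01 * grad_w     # only the head is updated
    head_b -= 0.01 * grad_b
final_loss, _, _ = loss_and_grads(features, y, head_w, head_b)

print(final_loss < initial_loss)                # True
print(np.array_equal(frozen_W, frozen_copy))    # True
```

Because the frozen weights never change, only the 17 head parameters are trained. That small trainable set is why transfer learning is faster to train and workable with limited data.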
# Popular Transfer Learning Models
1. Image Classification Models

- VGG16/VGG19
- ResNet (ResNet50, ResNet101, ResNet152)
- InceptionV3
- EfficientNet (EfficientNetB0 to B7)
- MobileNetV2

2. Natural Language Processing (NLP) Models

- BERT (Bidirectional Encoder Representations from Transformers)

- GPT (Generative Pre-trained Transformer, including GPT-3 & GPT-4)
- T5 (Text-To-Text Transfer Transformer)
- XLNet

3. Object detection and segmentation models
- YOLO (You Only Look Once; YOLOv4, YOLOv5, YOLOv8)
- Mask R-CNN
- Faster R-CNN

4. Speech & Audio Processing Models

- WaveNet
- DeepSpeech
- Whisper (by OpenAI)


check img 4:02