### RNN
Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to recognize patterns in sequences of data, such as time series, speech, text, or video. Unlike traditional feedforward neural networks, RNNs have connections that form directed cycles, allowing them to maintain a 'memory' of previous inputs in the sequence.
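
To make the idea of "memory" concrete, here is a minimal sketch of the recurrence a simple RNN cell computes, `h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)`. The scalar weights `w_x`, `w_h`, `b` and the function name `rnn_hidden_states` are illustrative assumptions, not values from a trained model.

```python
import math

def rnn_hidden_states(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar RNN cell over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state: nothing remembered yet
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Same values in a different order: the final state differs because
# earlier inputs are carried forward (and gradually fade) through h.
print(rnn_hidden_states([1.0, 0.0, 0.0]))
print(rnn_hidden_states([0.0, 0.0, 1.0]))
```

A feedforward network applied to each element separately would give the same output for `1.0` wherever it appears; here the position in the sequence changes the result.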
 
RNN architectures are often grouped by how inputs map to outputs:
- **One to One**: A single input produces a single output, as in a standard feedforward network (e.g., image classification).
- **One to Many**: A single input produces a sequence of outputs (e.g., image captioning).
- **Many to One**: A sequence of inputs produces a single output (e.g., sentiment analysis).
- **Many to Many**: Both input and output are sequences (e.g., machine translation).
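
These configurations differ mainly in which inputs are fed in and which hidden states are read out. A rough sketch, using a toy `step` function as a hypothetical stand-in for a trained RNN cell (all names and weights here are assumptions):

```python
import math

def step(x, h):
    """Toy RNN cell: mixes the current input with the previous state."""
    return math.tanh(0.5 * x + 0.8 * h)

def many_to_many(inputs, h=0.0):
    """Return one output per input step."""
    outputs = []
    for x in inputs:
        h = step(x, h)
        outputs.append(h)
    return outputs

def many_to_one(inputs, h=0.0):
    """Read the whole sequence, keep only the final state."""
    return many_to_many(inputs, h)[-1]

def one_to_many(x, n_steps, h=0.0):
    """Start from a single input, then feed each output back in as the next input."""
    outputs = []
    for _ in range(n_steps):
        h = step(x, h)
        outputs.append(h)
        x = h
    return outputs

sequence = [0.2, -0.1, 0.7, 0.4]
print(len(many_to_many(sequence)))  # 4 outputs for 4 inputs
print(many_to_one(sequence))        # a single value for the whole sequence
print(len(one_to_many(1.0, 3)))     # 3 outputs from 1 input
```

The sketch shows only the aligned many-to-many case, where every input step yields an output. Machine translation typically uses an encoder-decoder variant in which the input and output lengths can differ.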


### NLP
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human language. It involves the development of algorithms and models that enable machines to understand, interpret, generate, and respond to human language in a meaningful way. NLP encompasses a wide range of tasks, including text classification, sentiment analysis, machine translation, named entity recognition, and more. Techniques used in NLP include tokenization, part-of-speech tagging, parsing, and the use of advanced models like transformers and RNNs to capture the context and semantics of language.

NLP is commonly divided into two main subfields:
- **Natural Language Understanding (NLU)**: This involves comprehending the meaning and context of text, including tasks like sentiment analysis, entity recognition, and intent detection.
- **Natural Language Generation (NLG)**: This focuses on producing human-like text based on given data or prompts, such as in chatbots, text summarization, and content creation.

Applications of NLP include:
- **Chatbots and Virtual Assistants**: Systems like Siri, Alexa, and Google Assistant that interact with users in natural language.
- **Sentiment Analysis**: Analyzing customer reviews or social media posts to determine public sentiment towards products or services.
- **Machine Translation**: Translating text from one language to another, as seen in services like Google Translate.
- **Text Summarization**: Automatically generating concise summaries of longer documents.

Commonly used Python packages for NLP:
- **NLTK (Natural Language Toolkit)**: A comprehensive library for building NLP applications, providing tools for tokenization, stemming, tagging, parsing, and more.
- **spaCy**: An open-source library designed for fast and efficient NLP tasks, including named entity recognition, part-of-speech tagging, and dependency parsing.
- **Transformers (by Hugging Face)**: A library that provides pre-trained models for various NLP tasks, including BERT, GPT, and T5, enabling state-of-the-art performance in text generation, classification, and more.
- **Gensim**: A library focused on topic modeling and document similarity analysis, useful for tasks like word embedding and latent semantic analysis.
- **TextBlob**: A simple library for processing textual data, providing easy-to-use APIs for common NLP tasks like sentiment analysis and noun phrase extraction.

Common text preprocessing steps in NLP:
- Tokenization
- Stop Words Removal
- Normalization (Lowercasing, Stemming, Lemmatization)
- Vectorization (Bag of Words, TF-IDF, Word Embeddings)
- Part-of-Speech Tagging
- Named Entity Recognition
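
The first few steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the stop word set is a tiny hand-written stand-in for a real list, the tokenizer is a simple regular expression, and the helper names `preprocess` and `bag_of_words` are made up for this example. Stemming, lemmatization, tagging and entity recognition are left to libraries such as NLTK or spaCy.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "on", "and", "of"}

def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

docs = ["The cat sat on the mat.", "The dog chased the cat and the cat ran."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'chased', 'dog', 'mat', 'ran', 'sat']
print(vectors)  # [[1, 0, 0, 1, 0, 1], [2, 1, 1, 0, 1, 0]]
```

Each document becomes a fixed-length vector of word counts over the shared vocabulary, which is the form most classical models expect as input.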

In [19]:
# Word tokenization

text = "There are multiple ways we can perform tokenization on given text data. We can choose any method based on language, library and purpose of modeling."
# Split text by whitespace
tokens = text.split()
print(tokens)


['There', 'are', 'multiple', 'ways', 'we', 'can', 'perform', 'tokenization', 'on', 'given', 'text', 'data.', 'We', 'can', 'choose', 'any', 'method', 'based', 'on', 'language,', 'library', 'and', 'purpose', 'of', 'modeling.']


In [20]:
# Let's split the given text on full stops (.)
text = "Characters like periods, exclamation point and newline char are used to separate the sentences. But one drawback with split() method, that we can only use one separator at a time! So sentence tokenization won't be foolproof with split() method."
text.split(". ")  # The space after the full stop means the final full stop is not a separator, so no empty element appears at the end of the list.


['Characters like periods, exclamation point and newline char are used to separate the sentences',
 "But one drawback with split() method, that we can only use one separator at a time! So sentence tokenization won't be foolproof with split() method."]
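
One way around the single-separator limitation, short of a full tokenizer, is a regular expression that splits after any of several sentence terminators. This is a rough sketch using the standard `re` module; the sample text and the helper name `split_sentences` are made up for illustration, and the pattern will still mis-split abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "Split on periods. And on exclamation marks! What about questions? Those too."
for sentence in split_sentences(sample):
    print(sentence)
```

The lookbehind `(?<=[.!?])` keeps the terminator attached to its sentence, whereas `split(". ")` discards it.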

In [1]:
%pip install --user -U nltk

Note: you may need to restart the kernel to use updated packages.


In [2]:
import nltk

In [3]:
from nltk.tokenize import word_tokenize

In [6]:
nltk.download('punkt')
nltk.download('punkt_tab')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Asys\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\Asys\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.


True

In [7]:
text = """There are multiple ways we can perform tokenization on given text data. We can choose any method based on language, library and purpose of modeling."""
tokens = word_tokenize(text)
print(tokens)


['There', 'are', 'multiple', 'ways', 'we', 'can', 'perform', 'tokenization', 'on', 'given', 'text', 'data', '.', 'We', 'can', 'choose', 'any', 'method', 'based', 'on', 'language', ',', 'library', 'and', 'purpose', 'of', 'modeling', '.']


In [8]:
from nltk.tokenize import sent_tokenize


In [9]:

text = """Characters like periods, exclamation point and newline char are used to separate the sentences. But one drawback with split() method, that we can only use one separator at a time! So sentence tokenization won't be foolproof with split() method."""
sent_tokenize(text)

['Characters like periods, exclamation point and newline char are used to separate the sentences.',
 'But one drawback with split() method, that we can only use one separator at a time!',
 "So sentence tokenization won't be foolproof with split() method."]

In [10]:
from nltk.corpus import stopwords

In [11]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Asys\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:

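# Number of distinct English stop words (not displayed, since this is not the last expression in the cell)
len(set(stopwords.words('english')))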

# sample sentence
text = """He determined to drop his litigation with the monastry, and relinguish his claims to the wood-cuting and 
fishery rihgts at once. He was the more ready to do this becuase the rights had become much less valuable, and he had 
indeed the vaguest idea where the wood and river in question were."""

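# Set of English stop words (all lowercase)
stop_words = set(stopwords.words('english'))

# Split the text into word tokens
word_tokens = word_tokenize(text)

# Keep only tokens that are not stop words. The comparison is case-sensitive,
# so capitalized tokens such as "He" are kept.
filtered_sentence = [w for w in word_tokens if w not in stop_words]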


print("\nOriginal Sentence \n")
print(" ".join(word_tokens)) 


print("\nFiltered Sentence \n")
print(" ".join(filtered_sentence)) 



Original Sentence 

He determined to drop his litigation with the monastry , and relinguish his claims to the wood-cuting and fishery rihgts at once . He was the more ready to do this becuase the rights had become much less valuable , and he had indeed the vaguest idea where the wood and river in question were .

Filtered Sentence 

He determined drop litigation monastry , relinguish claims wood-cuting fishery rihgts . He ready becuase rights become much less valuable , indeed vaguest idea wood river question .
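
In the filtered output above, `He` survives because NLTK's English stop word list is lowercase and the membership test is case-sensitive; punctuation tokens are kept as well. If that is not wanted, one possible variant compares the lowercased token and drops tokens with no alphanumeric characters. The sketch below uses a small hand-written stop word set and a pre-tokenized list (both assumptions) so that it runs without NLTK data.

```python
# Small stand-in for stopwords.words('english') (assumption for illustration).
stop_words = {"he", "to", "his", "with", "the", "and", "at", "was"}

# Pre-tokenized input, in the shape word_tokenize would return.
tokens = ["He", "determined", "to", "drop", "his", "litigation", ",",
          "and", "He", "was", "ready", "."]

filtered = [
    t for t in tokens
    if t.lower() not in stop_words      # case-insensitive stop word check
    and any(ch.isalnum() for ch in t)   # skip pure punctuation tokens
]
print(filtered)  # ['determined', 'drop', 'litigation', 'ready']
```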


### GenAI

In [1]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, LeakyReLU, BatchNormalization, Reshape, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
import numpy as np
import matplotlib.pyplot as plt


In [2]:
# Generator model: maps a 100-dimensional noise vector to a 28x28x1 image
def build_generator():
    model = Sequential()
    # Three Dense blocks of increasing width, each followed by
    # LeakyReLU and batch normalization
    model.add(Dense(256, input_dim=100))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(1024))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    # tanh keeps output pixels in [-1, 1]
    model.add(Dense(28 * 28, activation='tanh'))
    model.add(Reshape((28, 28, 1)))
    return model


In [3]:
# Discriminator model: scores a 28x28x1 image as real (close to 1) or fake (close to 0)
def build_discriminator():
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28, 1)))
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(1, activation='sigmoid'))
    return model


In [4]:
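# Compile the discriminator and build the combined GAN model
def compile_gan(generator, discriminator):
    # The discriminator is compiled on its own so it can be trained directly
    discriminator.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0002, beta_1=0.5), metrics=['accuracy'])
    # Freeze the discriminator inside the combined model so that training
    # the GAN updates only the generator's weights
    discriminator.trainable = False
    # Combined model: noise -> generator -> discriminator -> real/fake score
    gan_input = tf.keras.Input(shape=(100,))
    gan_output = discriminator(generator(gan_input))
    gan = tf.keras.Model(gan_input, gan_output)
    gan.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
    return gan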


In [5]:
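# Training the GAN
# Note: each "epoch" in this loop is a single batch update, not a full pass over the data
def train_gan(generator, discriminator, gan, epochs=10000, batch_size=64, sample_interval=1000):
    # Load MNIST and scale pixels from [0, 255] to [-1, 1]
    (X_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
    X_train = (X_train.astype(np.float32) - 127.5) / 127.5
    X_train = np.expand_dims(X_train, axis=3)  # add channel axis: (N, 28, 28, 1)
    # Target labels: 1 for real images, 0 for generated ones
    real = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))

    for epoch in range(epochs):
        # Train the discriminator on one real batch and one generated batch
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        real_imgs = X_train[idx]
        noise = np.random.normal(0, 1, (batch_size, 100))
        gen_imgs = generator.predict(noise, verbose=0)  # verbose=0 hides the per-call progress bar
        d_loss_real = discriminator.train_on_batch(real_imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)  # [loss, accuracy] averaged over both batches

        # Train the generator through the frozen discriminator, asking for
        # its images to be labeled as real
        noise = np.random.normal(0, 1, (batch_size, 100))
        g_loss = gan.train_on_batch(noise, real)

        if epoch % sample_interval == 0:
            print(f"{epoch} [D loss: {d_loss[0]}, acc.: {100 * d_loss[1]}%] [G loss: {g_loss}]")
            # sample_images is not defined in this cell; it must be defined elsewhere in the notebook
            sample_images(generator)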
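
The scaling in `train_gan` maps pixel values from [0, 255] to [-1, 1], which matches the range of the generator's `tanh` output layer. To display generated images, the mapping has to be inverted. A small sketch of both directions in plain Python (the helper names `to_model_range` and `to_pixel_range` are hypothetical):

```python
def to_model_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return (pixel - 127.5) / 127.5

def to_pixel_range(value):
    """Map a model value in [-1, 1] back to [0, 255]."""
    return value * 127.5 + 127.5

for p in (0, 127.5, 255):
    scaled = to_model_range(p)
    print(p, "->", scaled, "->", to_pixel_range(scaled))
```

The same arithmetic applies element-wise to NumPy arrays, so `gen_imgs * 127.5 + 127.5` would bring a batch of generated images back to the displayable range.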

