# Natural Language Processing

- **Chatbot with Transformer Models**:  
Build a conversational agent on a pre-trained transformer model such as DialoGPT that holds coherent, contextually relevant conversations with users over multiple turns.

- **Language Modeling and Text Generation**:  
Use a pre-trained GPT-2 model to continue a prompt, covering creative text generation (poetry, short stories) and text autocomplete.

## Chatbot with Transformer Models

### Problem Description:
A chatbot is a conversational agent that interacts with users in natural language. Pre-trained transformer models such as GPT (Generative Pre-trained Transformer) can generate fluent, human-like responses conditioned on the conversation so far. The goal is to build a chatbot that holds coherent, contextually relevant conversations with users over multiple turns.

### Key Concepts:
1. **Transformer Models**: Deep learning models built on self-attention, which relates each token to the other tokens in a sequence. They perform well on language understanding and generation tasks.
2. **Tokenization**: The process of breaking text into smaller units (tokens, usually subwords) and mapping each one to an integer ID that the model takes as input.
3. **Context**: The previous conversation history, passed to the model together with the current input so that the response stays relevant.
4. **Inference**: Running a trained model to generate a response to user input, without updating its weights.
5. **Pre-Trained Model**: A model that has been trained on a large corpus of text and can be fine-tuned for specific tasks (e.g., GPT-2, or DialoGPT, which is GPT-2 further trained on conversational data).
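
To make the tokenization concept concrete, here is a minimal sketch of an encode/decode round trip. The `ToyTokenizer` class and its word-level vocabulary are hypothetical and exist only for illustration; real models such as GPT-2 use subword vocabularies, so actual token IDs will differ.

```python
class ToyTokenizer:
    """Hypothetical word-level tokenizer, used only to illustrate encode/decode."""

    def __init__(self, corpus):
        # Reserve id 0 for unknown words and id 1 for the end-of-sequence marker.
        self.unk_token, self.eos_token = "<unk>", "<eos>"
        words = sorted({w for line in corpus for w in line.lower().split()})
        self.id_to_token = [self.unk_token, self.eos_token] + words
        self.token_to_id = {tok: i for i, tok in enumerate(self.id_to_token)}

    def encode(self, text, add_eos=True):
        # Map each word to its id; words outside the vocabulary map to <unk> (id 0).
        ids = [self.token_to_id.get(w, 0) for w in text.lower().split()]
        return ids + [1] if add_eos else ids

    def decode(self, ids, skip_special_tokens=True):
        # Map ids back to words, optionally dropping <unk> and <eos>.
        tokens = [self.id_to_token[i] for i in ids]
        if skip_special_tokens:
            tokens = [t for t in tokens if t not in (self.unk_token, self.eos_token)]
        return " ".join(tokens)


toy_tokenizer = ToyTokenizer(["hello there", "how are you"])
toy_ids = toy_tokenizer.encode("Hello you")
print(toy_ids)                        # [3, 6, 1]
print(toy_tokenizer.decode(toy_ids))  # hello you
```

The trailing `1` is the end-of-sequence marker, which plays the same role as the EOS token that the chatbot code appends to each user message.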

### Chatbot Process:
1. **Input Processing**: The user inputs a message, which is tokenized for the model to understand.
2. **Generate Response**: The model uses the input and conversation history to generate a relevant response.
3. **Output Decoding**: The generated tokens are decoded back into human-readable text.
4. **Conversation Flow**: The chat history is updated with each interaction, allowing the model to maintain context across multiple turns.
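
The process above can be sketched without a real model. In this minimal example, `stub_reply` is a hypothetical stand-in for the model, and `MAX_CONTEXT_TOKENS` and `EOS_ID` are illustrative values chosen for the demo. It shows one way to keep the history within a fixed token budget by dropping the oldest tokens first.

```python
from typing import Callable, List, Tuple

MAX_CONTEXT_TOKENS = 12  # hypothetical budget for this demo; GPT-2-based models allow up to 1024
EOS_ID = 0               # hypothetical id of the end-of-turn token


def truncate_history(token_ids: List[int], budget: int) -> List[int]:
    """Keep only the most recent `budget` tokens (budget must be positive)."""
    return token_ids[-budget:]


def chat_turn(
    history: List[int],
    user_ids: List[int],
    reply_fn: Callable[[List[int]], List[int]],
) -> Tuple[List[int], List[int]]:
    """Run one turn: append the user message, get a reply, return the updated history."""
    model_input = truncate_history(history + user_ids + [EOS_ID], MAX_CONTEXT_TOKENS)
    reply_ids = reply_fn(model_input)
    return reply_ids, model_input + reply_ids + [EOS_ID]


def stub_reply(model_input: List[int]) -> List[int]:
    """Stand-in for a real model: returns one fake token that encodes the input length."""
    return [100 + len(model_input)]


history: List[int] = []
for user_ids in ([1, 2, 3], [4, 5, 6, 7, 8], [9]):
    reply_ids, history = chat_turn(history, user_ids, stub_reply)
    print(f"reply={reply_ids} history_length={len(history)}")
```

On the third turn the combined history would be 16 tokens, so the four oldest tokens are dropped before the stub model sees the input. Truncating by whole turns rather than by tokens is another reasonable design; it avoids cutting a message in half.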


### Steps:
1. **Select a Pre-Trained Model**: Choose a model suited to conversation. DialoGPT, a GPT-2 variant trained on conversational data, is a common choice; plain GPT-2 usually needs fine-tuning on dialogue first.
2. **Tokenize Input**: Process user input into tokens that the model can understand.
3. **Generate Responses**: Use the model to generate a response based on the input and conversation history.
4. **Decode Output**: Convert the model's output tokens back into human-readable text.
5. **Conversation Management**: Update the conversation history with each user and chatbot interaction to maintain context.
6. **Fine-Tuning**: Optionally, fine-tune the model on domain-specific conversations to improve relevance and accuracy.
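
For the optional fine-tuning step, domain-specific conversations first have to be turned into training text. A minimal sketch, assuming the DialoGPT-style convention of concatenating the turns of a dialogue and ending each turn with the end-of-text token (the same convention the inference code follows when it appends `tokenizer.eos_token` to the user input). The helper name `build_training_text` and the sample dialogue are hypothetical.

```python
EOS_TOKEN = "<|endoftext|>"  # end-of-text token string used by GPT-2 and DialoGPT


def build_training_text(turns, eos_token=EOS_TOKEN):
    """Join alternating user/bot turns into one string, ending each turn with EOS."""
    cleaned = [t.strip() for t in turns if t.strip()]  # drop empty turns and stray whitespace
    return "".join(turn + eos_token for turn in cleaned)


dialogue = ["Hi, is the store open today?", "Yes, until 6 pm.", "  Thanks!  "]
training_text = build_training_text(dialogue)
print(training_text)
```

Each resulting string would then be tokenized and used as one training example for the causal language model.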



In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load pre-trained DialoGPT model and tokenizer
model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

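MAX_CONTEXT_TOKENS = 1024  # Context window of the GPT-2 architecture that DialoGPT uses
MAX_NEW_TOKENS = 100       # Upper bound on the length of each reply

def generate_response(input_text, chat_history_ids=None):
    # Encode the user input and append the end-of-sequence token that marks the end of a turn
    new_user_input_ids = tokenizer.encode(input_text + tokenizer.eos_token, return_tensors='pt')

    # Append the new user input to the chat history (if any)
    if chat_history_ids is not None:
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids

    # Keep only the most recent tokens so that prompt plus reply fits in the context window
    bot_input_ids = bot_input_ids[:, -(MAX_CONTEXT_TOKENS - MAX_NEW_TOKENS):]

    # Generate the response; the returned tensor holds the prompt followed by the new tokens
    chat_history_ids = model.generate(
        bot_input_ids,
        max_new_tokens=MAX_NEW_TOKENS,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7
    )

    # Decode only the newly generated tokens (everything after the prompt)
    response = tokenizer.decode(chat_history_ids[0, bot_input_ids.shape[-1]:], skip_special_tokens=True)
    return response, chat_history_ids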

# Initialize the chat history
chat_history_ids = None
print("Chatbot is ready! Type 'quit' to end the conversation.")

while True:
    # Get user input
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        print("Ending the chat. Have a great day!")
        break

    # Generate response
    response, chat_history_ids = generate_response(user_input, chat_history_ids)

    print(f"Chatbot: {response}")

Chatbot is ready! Type 'quit' to end the conversation.
You: Hi
Chatbot: Hi!
You: What is the weather like today?
Chatbot: It's getting colder. It's been freezing for the last few days.
You: Tell me a joke.
Chatbot: Did you hear about Pluto?
You: No
Chatbot: It wasn't a joke.
You: quit
Ending the chat. Have a great day!


## Language Modeling and Text Generation

### Problem Description:
Language modeling is the task of predicting the next word (more precisely, the next token) given the words that precede it. A pre-trained GPT (Generative Pre-trained Transformer) model can already continue a prompt, and fine-tuning it on a suitable dataset adapts it to a particular style. This section uses such a model to generate creative text, such as poetry or short stories, and to build a text autocomplete system. The goal is a model that picks up the structure and style of the input text and generates coherent, contextually appropriate continuations.

### Key Concepts:
1. **Language Model**: A model that assigns a probability to the next word given the preceding words and, by chaining those predictions, to whole sequences.
2. **Pre-Trained Model**: A model like GPT-2 that has been trained on a large corpus of text data and can be fine-tuned for specific text generation tasks.
3. **Tokenization**: The process of converting text into tokens that the model can understand, such as words or subwords.
4. **Fine-Tuning**: The process of further training a pre-trained model on a specific dataset to adapt it to a particular style or domain.
5. **Creative Writing**: Using the model to generate poetry, short stories, or other forms of creative text based on a given prompt.
6. **Text Autocomplete**: Predicting and suggesting completions for partial text inputs, similar to how search engines provide suggestions as users type.
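
The idea of a language model, and how it leads to autocomplete, can be shown with a much simpler model than GPT-2. The following is a minimal sketch of a bigram model that estimates the probability of the next word from word-pair counts. The corpus and all function names are made up for illustration; GPT-2 conditions on the whole preceding context rather than a single word.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev_word):
    """Estimate P(next | prev) from relative frequencies; empty dict if prev is unseen."""
    followers = counts.get(prev_word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


def suggest(counts, prev_word, k=2):
    """Return the k most likely next words, most probable first (ties broken alphabetically)."""
    probs = next_word_probs(counts, prev_word)
    return sorted(probs, key=lambda w: (-probs[w], w))[:k]


toy_corpus = [
    "the fox ran into the forest",
    "the fox slept",
    "the forest was quiet",
]
bigram_counts = train_bigram_model(toy_corpus)
print(next_word_probs(bigram_counts, "the"))  # {'fox': 0.5, 'forest': 0.5}
print(suggest(bigram_counts, "fox"))          # ['ran', 'slept']
```

An autocomplete system built on GPT-2 follows the same pattern: compute a distribution over the next token, then offer the most likely continuations.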

### Language Modeling Process:
1. **Data Collection**: Gather a dataset of creative writing samples (e.g., poetry, short stories) for fine-tuning.
2. **Tokenize Text**: Process the text into tokens using a tokenizer compatible with the chosen GPT model.
3. **Fine-Tune the Model**: Train the pre-trained GPT model on the dataset to capture the style and structure of creative writing.
4. **Generate Text**: Provide a prompt to the model and generate creative continuations.
5. **Autocomplete System**: Use the model to suggest likely continuations for user inputs in real-time.
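
Between tokenizing and fine-tuning, the token stream is commonly cut into fixed-length blocks so that every training example has the same size. A minimal sketch of that step; the function name `group_into_blocks` and the tiny block size are illustrative, and a real run would use a block size up to the model's context length.

```python
def group_into_blocks(token_ids, block_size):
    """Split one long token stream into equal-length blocks, dropping the remainder."""
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]


token_stream = list(range(10))  # stand-in for the token ids of the whole corpus
blocks = group_into_blocks(token_stream, block_size=4)
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

For causal language modeling, each block serves as both input and target: the model learns to predict token `i + 1` from tokens `0..i` within the block.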

### Types of Language Models:
1. **Generative (Autoregressive) Language Models**: Predict the next token from left to right, which lets them create new text sequences from input prompts (e.g., GPT-2, GPT-3).
2. **Masked Language Models**: Predict masked-out words in a sentence using the context on both sides (e.g., BERT).
3. **Seq2Seq Models**: Encode one sequence and decode another from it, for tasks such as translation or summarization (e.g., T5, BART).
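
The difference between the first two types shows up in how training examples are built. The sketch below is a simplified illustration, not any library's implementation, and the function names and `[MASK]` placeholder are chosen for this example. A generative model predicts each token from the tokens to its left, while a masked model hides some tokens and predicts them from the full surrounding context.

```python
import random


def causal_lm_pairs(tokens):
    """Generative (causal) objective: predict each token from everything to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Masked objective: hide a random subset of tokens; targets map position -> original token."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


sentence = ["the", "fox", "ran", "home"]
for context, target in causal_lm_pairs(sentence):
    print(context, "->", target)

masked_tokens, mask_targets = masked_lm_example(sentence, mask_prob=0.5)
print(masked_tokens, mask_targets)
```

Which positions get masked depends on the random seed. Real masked-language-model training uses a more elaborate masking scheme than this sketch.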

### Language Modeling Objective:
The goal is to train a GPT-based model to:
- Generate creative text that matches the style and theme of the input.
- Autocomplete partial sentences by suggesting relevant continuations.
- Adapt to different creative tasks such as poetry generation or story continuation.

### Steps:
1. **Select a Pre-Trained Model**: Choose a model like GPT-2 from the Hugging Face Transformers library.
2. **Prepare Data**: Tokenize the dataset of creative writing samples.
3. **Fine-Tune the Model**: Train the model using the prepared dataset to adapt it to the desired style. This step is optional; the code below uses the pre-trained GPT-2 checkpoint as is.
4. **Text Generation**: Use the fine-tuned model to generate text based on user prompts.
5. **Build Autocomplete**: Use the model to predict likely word sequences based on partially typed text.
6. **Interactive Example**: Create a loop where users can input a prompt, and the model generates creative continuations.

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

def generate_text(prompt, max_length=100, temperature=0.7):
    # Encode the input prompt
    input_ids = tokenizer.encode(prompt, return_tensors='pt')

    # Generate text using the GPT-2 model with sampling
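    output = model.generate(
        input_ids,
        max_length=max_length,    # Total length in tokens, prompt included
        num_return_sequences=1,
        temperature=temperature,  # Below 1.0 sharpens the distribution (less random output)
        top_k=50,                 # Sample only from the 50 most likely next tokens
        top_p=0.95,               # Nucleus sampling: smallest token set with cumulative probability >= 0.95
        do_sample=True,           # Sample instead of always taking the most likely token
        no_repeat_ngram_size=2,   # Never generate the same 2-gram twice
        pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse EOS
    )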

    # Decode the output to readable text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
    return generated_text

# Example prompt for generating a poem
prompt = "The sun dipped below the horizon, casting a golden glow over the sea"
generated_poem = generate_text(prompt, max_length=100)
print("Prompt:\n", prompt)
print("Generated Poem:\n", generated_poem)


Prompt:
 The sun dipped below the horizon, casting a golden glow over the sea
Generated Poem:
 The sun dipped below the horizon, casting a golden glow over the sea of green. The rain fell from the sky, and the clouds floated over it.

The moon rose in a flash, the sun's rays hitting the moon like a jet and leaving a cloud of red in its wake. Then the rays of the wind swept through the air, illuminating the night sky with its shimmering light. An incredible amount of light shone through its blackened surface, creating a perfect storm of sparks and
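
The `temperature`, `top_k`, and `top_p` arguments passed to `model.generate` above all reshape the next-token distribution before a token is sampled. The following is a simplified sketch of that idea on a toy list of logits. It is not the Transformers implementation, and the function names are chosen for this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Keep the top_k tokens, then the smallest prefix reaching top_p mass; renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    top_k_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / top_k_mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Sample one token id after temperature scaling and top-k / top-p filtering."""
    probs = softmax_with_temperature(logits, temperature)
    filtered = top_k_top_p_filter(probs, top_k, top_p)
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
print(softmax_with_temperature(toy_logits, temperature=0.7))
print(top_k_top_p_filter([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.8))
print(sample_token(toy_logits, rng=random.Random(0)))
```

With `top_k=3` and `top_p=0.8`, the filter keeps only the two most likely tokens in the second example, so unlikely tokens can never be sampled. That is why these settings tend to give more coherent text than sampling from the full distribution.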


In [None]:
def autocomplete_text(prompt, max_length=50, temperature=0.7):
    # Encode the input prompt
    input_ids = tokenizer.encode(prompt, return_tensors='pt')

    # Generate text that completes the prompt using sampling
    output = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=1,
        temperature=temperature,  # Controls randomness (higher values = more diverse outputs)
        top_k=50,                 # Filters to the top 50 most likely next tokens
        top_p=0.9,                # Uses nucleus sampling (picking from the top 90% probability mass)
        do_sample=True,           # Enables sampling for more creative output
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode and return the completion
    completion = tokenizer.decode(output[0], skip_special_tokens=True)
    return completion

# Example prompt for autocomplete
prompt = "In the quiet forest, a fox"
completion = autocomplete_text(prompt, max_length=50)
print("Prompt:", prompt)
print("Autocomplete Suggestion:", completion)

Prompt: In the quiet forest, a fox
Autocomplete Suggestion: In the quiet forest, a fox is a fox, and a wolf is a wolf, and a bear is a bear.


In [None]:
print("Welcome to the Text Generation Demo! Type 'quit' to exit.")

while True:
    user_prompt = input("Enter a prompt: ")
    if user_prompt.lower() == 'quit':
        print("Goodbye!")
        break

    # Generate creative continuation
    generated_text = generate_text(user_prompt, max_length=100)
    print("\nGenerated Text:\n", generated_text)


Welcome to the Text Generation Demo! Type 'quit' to exit.
Enter a prompt: The ancient oak tree stood tall in the clearing, its branches

Generated Text:
 The ancient oak tree stood tall in the clearing, its branches hanging from its trunk.

"What's the matter with this?"
...
-
There was a long silence, and then, as if the sky were changing, the sound of the wind rose to a loud and sudden roar. As if by magic, a large black blade pierced through the oak's branches, piercing the tree with its long blade. The blade struck the branch, cutting it in half and killing it
Enter a prompt: quit
Goodbye!
