# 1. Introduction to Natural Language Processing (NLP)

**Definition:**  
NLP is the field that makes human language accessible to computers. It enables machines to read, interpret, and generate text, which is essential for applications such as intelligent search engines, machine translation, and dialogue systems.

**Business Application:**  
Consider how a customer service chatbot uses NLP to understand and respond to client inquiries in real time.

**Before the Demo Question:**  
- What are some business applications where understanding and generating human language could provide a competitive advantage?


In [None]:
# A simple demonstration of text processing: tokenizing a sentence into words.

sample_text = "Welcome to the world of Natural Language Processing for business applications!"

# Tokenization: splitting the text into words
tokens = sample_text.split()

print("Original Text:")
print(sample_text)
print("\nTokenized Words:")
print(tokens)


**Reflection:**  
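This demo shows how we can break a sentence into its individual words (tokens), a fundamental step in NLP. Note that `split()` separates on whitespace only, so punctuation stays attached to the neighboring word (the last token here is `applications!`). In real-world applications, tokenization is the first step in tasks like search, sentiment analysis, and automated customer support.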
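
One possible way to separate punctuation from words is a regular expression. The sketch below is a minimal illustration, not a production tokenizer; the pattern `TOKEN_PATTERN` is an assumption chosen for this example and ignores cases such as contractions, URLs, or hyphenated words.

```python
import re

sample_text = "Welcome to the world of Natural Language Processing for business applications!"

# Whitespace splitting keeps punctuation glued to the neighboring word.
whitespace_tokens = sample_text.split()

# Assumed pattern: a run of word characters, or any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")
regex_tokens = TOKEN_PATTERN.findall(sample_text)

print("Last whitespace token:", whitespace_tokens[-1])   # applications!
print("Last two regex tokens:", regex_tokens[-2:])       # ['applications', '!']
```

Libraries such as spaCy apply far more detailed tokenization rules, which is one reason to prefer them over hand-written patterns for real data.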

**Discussion Questions:**  
- Why is tokenization important in processing natural language data?  
- Can you think of a scenario in your business where extracting key words from text might be useful?


# 2. General-Purpose Linguistic Representations

There are two major paradigms in NLP:
1. **Linguistic Knowledge-Based Approaches:**  
   - Use rule-based pipelines and linguistic rules (e.g., parts-of-speech tagging, dependency trees).
2. **Deep Learning-Based Approaches:**  
   - Use end-to-end neural networks to learn representations directly from raw text.

**Business Application:**  
For example, a sentiment analysis tool might use linguistic rules to identify opinion words or use deep learning to learn these patterns automatically.
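
To make the rule-based side of this comparison concrete, here is a minimal sketch of a lexicon-based sentiment scorer. The word lists and the function name `rule_based_sentiment` are hypothetical and chosen only for illustration; real rule-based systems rely on much larger curated lexicons and additional rules.

```python
import re

# Hypothetical, hand-picked opinion lexicon (an assumption for this sketch).
POSITIVE_WORDS = {"excellent", "great", "responsive", "love"}
NEGATIVE_WORDS = {"poor", "slow", "broken", "disappointing"}

def rule_based_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' from simple opinion-word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(rule_based_sentiment("Our new product is receiving excellent reviews."))  # positive
print(rule_based_sentiment("Delivery was slow and the box arrived broken."))    # negative
print(rule_based_sentiment("The service was not great."))                       # positive
```

The last example is labeled positive even though the sentence is negative, because the rules do not handle negation. Gaps like this are what learned models aim to close by picking up such patterns from data.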

**Before the Demo Question:**  
- What are the benefits of using deep learning-based approaches over rule-based methods in processing language data?


In [None]:
# We use spaCy, a popular NLP library, to perform tokenization and part-of-speech (POS) tagging.

import spacy

# Load the small English model
nlp = spacy.load("en_core_web_sm")

doc = nlp("Our new product is receiving excellent reviews from customers worldwide.")

print("Tokens and their POS tags:")
for token in doc:
    print(f"{token.text} -> {token.pos_}")


**Reflection:**  
In this example, spaCy automatically tokenizes the text and assigns a part-of-speech tag to each token. This linguistic information can be used to better understand the structure and meaning of the text.

**Discussion Questions:**  
- How might POS tagging be useful in a business context (e.g., in customer feedback analysis)?  
- What advantages does using an established library like spaCy offer over building a rule-based system from scratch?


# 3. Language as a Unique Data Type

Unlike images or audio, text is **discrete** and has an implicit **hierarchical structure** (letters form words, words form sentences, etc.). Additionally, word frequencies often follow a power-law distribution (known as Zipf's law), meaning a few words appear very frequently while many appear rarely.

**Business Application:**  
Understanding these properties is crucial when designing systems for customer feedback analysis or market research, where uncommon words may carry significant information.

**Before the Demo Question:**  
- Why might the discrete nature of text pose challenges when designing algorithms for text analysis?


In [None]:
from collections import Counter

sample_text = """Natural Language Processing enables computers to understand human language.
It is used in chatbots, search engines, and sentiment analysis.
Understanding word frequencies is key in many NLP applications."""

# Convert text to lowercase and split into words
words = sample_text.lower().split()
word_counts = Counter(words)

print("Word Frequency Distribution:")
print(word_counts)


**Reflection:**  
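This demo shows the frequency distribution of words in a sample text. Even in this tiny sample, a few words (like "is" and "in") appear twice while most appear only once; in a large corpus the gap between frequent and rare words is far more pronounced. Because the text is split on whitespace, punctuation stays attached to words, so "language" and "language." are counted as different tokens.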
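
A small variation on the demo strips punctuation before counting so that each word is counted under a single form. This is a sketch under the assumption that plain runs of letters are an adequate definition of a word for this sample; it would not be for text containing numbers or contractions.

```python
import re
from collections import Counter

sample_text = """Natural Language Processing enables computers to understand human language.
It is used in chatbots, search engines, and sentiment analysis.
Understanding word frequencies is key in many NLP applications."""

# Keep only runs of letters, so "language." and "language" become the same token.
words = re.findall(r"[a-z]+", sample_text.lower())
word_counts = Counter(words)

# Words that occur exactly once (the "long tail" of the distribution).
singletons = [word for word, count in word_counts.items() if count == 1]

print("Most common words:", word_counts.most_common(3))
print("Distinct words:", len(word_counts))
print("Words appearing once:", len(singletons))
```

With punctuation removed, "language" is counted twice, and the large share of words that occur only once becomes easier to see.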

**Discussion Questions:**  
- What challenges might arise from the uneven distribution of words when analyzing text data?  
- How could this information be used to improve business decision-making (e.g., by focusing on less common but more informative terms)?


# 4. Word Representations

There are two popular ways to represent words:
- **One-hot Vectors:**  
  - High-dimensional, sparse vectors where each word is represented by a unique index.
- **Word Embeddings:**  
  - Dense, lower-dimensional representations that capture semantic relationships between words.

**Business Application:**  
Word embeddings can help in understanding customer sentiment, identifying trends, and enabling semantic search in large text corpora.

**Before the Demo Question:**  
- What might be some limitations of one-hot encoding compared to word embeddings when analyzing large text datasets?


In [None]:
import numpy as np

# Define a small vocabulary
vocabulary = ["machine", "learning", "business", "data"]

# One-hot encoding for the word "business"
word_index = vocabulary.index("business")
one_hot = np.zeros(len(vocabulary))
one_hot[word_index] = 1

print("One-Hot Encoding for 'business':")
print(one_hot)

# For demonstration, assume we have pre-trained word embeddings (simulated here)
# Each word is represented by a dense 3-dimensional vector.
word_embeddings = {
    "machine": np.array([0.2, 0.1, 0.7]),
    "learning": np.array([0.3, 0.4, 0.5]),
    "business": np.array([0.9, 0.1, 0.3]),
    "data": np.array([0.5, 0.8, 0.2])
}

print("\nWord Embedding for 'business':")
print(word_embeddings["business"])


**Reflection:**  
In this demo, we see that one-hot vectors are sparse and simple, while word embeddings are dense. The embedding values here are simulated for illustration; in trained embeddings, words used in similar contexts tend to have similar vectors, which is how they capture semantic similarity.
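
One way to see the difference is to compare similarity scores. The sketch below reuses the simulated embedding values from the demo (they are made up, not trained) and a helper `cosine_similarity` defined here for illustration.

```python
import numpy as np

vocabulary = ["machine", "learning", "business", "data"]

# One-hot vectors are the rows of an identity matrix: one dimension per word.
one_hot_vectors = np.eye(len(vocabulary))

# Simulated 3-dimensional embeddings (illustrative values only).
word_embeddings = {
    "machine": np.array([0.2, 0.1, 0.7]),
    "learning": np.array([0.3, 0.4, 0.5]),
    "business": np.array([0.9, 0.1, 0.3]),
    "data": np.array([0.5, 0.8, 0.2]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

i, j = vocabulary.index("machine"), vocabulary.index("learning")
one_hot_sim = cosine_similarity(one_hot_vectors[i], one_hot_vectors[j])
dense_sim = cosine_similarity(word_embeddings["machine"], word_embeddings["learning"])

print("One-hot similarity:", one_hot_sim)        # 0.0 for any two different words
print("Dense similarity:", round(dense_sim, 3))
```

Any two distinct one-hot vectors are orthogonal, so they cannot express that two words are related. Dense vectors can take any value in between, and their dimensionality does not grow with the vocabulary.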

**Discussion Questions:**  
- How do dense word embeddings overcome the limitations of one-hot encoding?  
- In what business scenarios might the semantic relationships captured by word embeddings be particularly valuable?


# 5. The Distributional Hypothesis

**The Distributional Hypothesis** is often summarized by the linguist J.R. Firth's remark:  
*"You shall know a word by the company it keeps."*  
This means that words appearing in similar contexts tend to have similar meanings.

**Business Application:**  
In market analysis, understanding the context in which products are mentioned can reveal customer perceptions and emerging trends.

**Before the Demo Question:**  
- How might the distributional hypothesis be applied to improve customer sentiment analysis?


In [None]:
# Define two simple sentences with similar contexts
sentence1 = "The new smartphone has an excellent battery life."
sentence2 = "The latest mobile phone offers great battery performance."

# For demonstration, we will simulate context similarity by comparing the overlap in keywords.
keywords1 = set(sentence1.lower().split())
keywords2 = set(sentence2.lower().split())

common_keywords = keywords1.intersection(keywords2)

print("Common keywords in both sentences:")
print(common_keywords)


**Reflection:**  
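Here the two sentences share only "the" and "battery". The shared context word "battery" hints that "smartphone" and "mobile phone" play similar roles, which is the intuition behind the distributional hypothesis, but keyword overlap between two sentences is only a crude proxy for it. In practice, advanced methods use co-occurrence statistics gathered over large corpora to learn richer word representations.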
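
A closer match to the hypothesis compares the contexts of individual words rather than whole sentences. The sketch below is a toy illustration: the corpus is hypothetical, and the helpers `context_sets` and `jaccard` are names introduced here, not part of any library.

```python
from collections import defaultdict

# Hypothetical toy corpus (an assumption for this sketch).
corpus = [
    "the smartphone has excellent battery life",
    "the phone has great battery life",
    "the smartphone offers a bright screen",
    "the phone offers a sharp screen",
    "the invoice lists every shipping charge",
]

def context_sets(sentences, window=2):
    """Map each word to the set of words seen within `window` positions of it."""
    contexts = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            neighbors = words[start:i] + words[i + 1:i + 1 + window]
            contexts[word].update(neighbors)
    return contexts

def jaccard(a, b):
    """Share of context words that two sets have in common."""
    return len(a & b) / len(a | b)

contexts = context_sets(corpus)
sim_phone = jaccard(contexts["smartphone"], contexts["phone"])
sim_invoice = jaccard(contexts["smartphone"], contexts["invoice"])

print("smartphone vs phone:  ", round(sim_phone, 2))
print("smartphone vs invoice:", round(sim_invoice, 2))
```

"smartphone" and "phone" never appear in the same sentence, yet they score as similar because they keep the same company. Set overlap is a simplification; real methods weight contexts by how often they occur.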

**Discussion Questions:**  
- How does the similarity in word contexts help in understanding the meaning of words?  
- What are some limitations of using simple keyword overlaps to capture context?


# 6. Co-occurrence and Word Embeddings

Co-occurrence matrices capture how frequently words appear together in a corpus. These matrices can be factorized to produce word embeddings that capture semantic similarity.

**Business Application:**  
Understanding word co-occurrence can enhance search algorithms, making them more semantically aware and improving customer query matching.

**Before the Demo Question:**  
- How might co-occurrence information improve the relevance of search results on a business website?


In [None]:
import numpy as np

# Sample corpus: a list of sentences
corpus = [
    "data science is revolutionizing business",
    "machine learning transforms data into insights",
    "business decisions are driven by data and analytics"
]

# Build a simple vocabulary
vocab = sorted(set(" ".join(corpus).split()))
vocab_index = {word: idx for idx, word in enumerate(vocab)}
vocab_size = len(vocab)

# Initialize a co-occurrence matrix
co_occurrence = np.zeros((vocab_size, vocab_size))

# Define a simple window size (context of 1 word to left/right)
window_size = 1

for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        word_idx = vocab_index[word]
        # Look at neighbors in the window
        start = max(0, i - window_size)
        end = min(len(words), i + window_size + 1)
        for j in range(start, end):
            if i == j:
                continue
            neighbor_idx = vocab_index[words[j]]
            co_occurrence[word_idx, neighbor_idx] += 1

print("Vocabulary:")
print(vocab)
print("\nCo-occurrence Matrix:")
print(co_occurrence)


**Reflection:**  
This code builds a co-occurrence matrix from a small corpus. In larger systems, such matrices (or their factorized versions) help produce word embeddings that capture semantic similarity.
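
The factorization step can be sketched with a truncated singular value decomposition (SVD). This is one common choice, not the only one, and the embedding size of 2 is an arbitrary assumption for this toy corpus. The block rebuilds the same window-of-one matrix so that it runs on its own.

```python
import numpy as np

corpus = [
    "data science is revolutionizing business",
    "machine learning transforms data into insights",
    "business decisions are driven by data and analytics",
]

vocab = sorted({word for sentence in corpus for word in sentence.split()})
vocab_index = {word: idx for idx, word in enumerate(vocab)}

# Symmetric co-occurrence counts for adjacent words (window size 1).
co_occurrence = np.zeros((len(vocab), len(vocab)))
for sentence in corpus:
    words = sentence.split()
    for left, right in zip(words, words[1:]):
        co_occurrence[vocab_index[left], vocab_index[right]] += 1
        co_occurrence[vocab_index[right], vocab_index[left]] += 1

# Factorize the matrix; singular values come back sorted from largest to smallest.
U, S, Vt = np.linalg.svd(co_occurrence)

# Keep the top components as dense word vectors (assumed size: 2).
embedding_dim = 2
embeddings = U[:, :embedding_dim] * S[:embedding_dim]

for word in ["data", "business", "analytics"]:
    print(word, np.round(embeddings[vocab_index[word]], 2))
```

With only three sentences the resulting vectors carry little meaning. On a large corpus, and usually after reweighting the raw counts, the same procedure yields vectors in which related words lie close together.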

**Discussion Questions:**  
- What might be the advantages of using co-occurrence data for learning word representations?  
- How could these techniques be applied to enhance business intelligence systems?


# 7. Evaluating Word Embeddings

Word embeddings can be evaluated both qualitatively (through visualization) and quantitatively (using similarity metrics like cosine similarity). Dimensionality reduction techniques like PCA and t-SNE help visualize high-dimensional embeddings.

**Business Application:**  
Visualization can help marketing teams understand semantic clusters in customer feedback, while similarity computations can improve recommendation systems.

**Before the Demo Question:**  
- Why is it useful to evaluate and visualize word embeddings in real-world applications?


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# For demonstration, we use our simulated word embeddings from section 4.
words = list(word_embeddings.keys())
embeddings = np.array([word_embeddings[word] for word in words])

# Reduce dimensions to 2D using PCA
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings)

plt.figure(figsize=(6, 4))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1])

for i, word in enumerate(words):
    plt.annotate(word, (embeddings_2d[i, 0], embeddings_2d[i, 1]))

plt.title("PCA Visualization of Word Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.show()


**Reflection:**  
The PCA visualization projects each word's embedding onto a 2D space. With trained embeddings, words with similar meanings (or contexts) tend to lie closer together; because the vectors here are simulated, the layout of this plot is only illustrative.
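
The quantitative side of evaluation can be sketched as a nearest-neighbor ranking by cosine similarity. The embedding values are the simulated ones used throughout this notebook, and `most_similar` is a helper name introduced here for illustration.

```python
import numpy as np

# Simulated embeddings (illustrative values, not trained vectors).
word_embeddings = {
    "machine": np.array([0.2, 0.1, 0.7]),
    "learning": np.array([0.3, 0.4, 0.5]),
    "business": np.array([0.9, 0.1, 0.3]),
    "data": np.array([0.5, 0.8, 0.2]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, embeddings):
    """Rank all other words by cosine similarity to `query`, highest first."""
    scores = {
        word: cosine_similarity(embeddings[query], vector)
        for word, vector in embeddings.items()
        if word != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = most_similar("business", word_embeddings)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

With trained embeddings, inspecting such rankings for a handful of domain terms is a quick sanity check before using the vectors in search or recommendation.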

**Discussion Questions:**  
- How might visualization of word embeddings aid in understanding customer sentiments?  
- What does the proximity of two words in the reduced space imply about their relationship?


# 8. Large Pre-Trained Language Models

Large language models are trained on vast amounts of data to understand and generate language. They serve as a foundation for many NLP tasks through transfer learning.

**Business Application:**  
These models power advanced features like chatbots, automated report generation, and intelligent search systems that can understand complex queries.

**Before the Demo Question:**  
- How does pre-training on large corpora benefit business applications such as customer support or content recommendation?


In [None]:
from transformers import pipeline

# Using Hugging Face's transformers pipeline for text generation.
# For business applications, this can be used to generate product descriptions or customer responses.
generator = pipeline("text-generation", model="gpt2", max_length=50)

prompt = "Our innovative product offers"
generated_text = generator(prompt, num_return_sequences=1)

print("Generated Text:")
print(generated_text[0]["generated_text"])


**Reflection:**  
In this demo, a pre-trained language model (GPT-2) generates text based on a prompt. Such models can be fine-tuned for specific tasks, enabling applications like automated report generation and personalized marketing messages.

**Discussion Questions:**  
- What are the benefits and challenges of using large pre-trained models in a business setting?  
- How might transfer learning improve the efficiency of deploying NLP applications?


# 9. Generative Pre-Training and Discriminative Fine-Tuning

Many modern NLP systems first pre-train a language model on a large corpus (generative pre-training) and then fine-tune it on a specific task (discriminative fine-tuning). This approach leverages general language understanding for task-specific performance.

**Business Application:**  
Fine-tuning allows companies to adapt powerful language models to tasks like sentiment analysis, spam detection, or customer feedback categorization with relatively little data.

In many business scenarios, companies need to generate professional responses quickly, for example to address customer concerns about delayed shipments. In this demo, we fine-tune a pre-trained language model on a small dataset of customer support emails. This reuses the general language knowledge the model acquired during generative pre-training and adapts it to our domain. Note that the demo keeps the same next-token (language-modeling) objective, so strictly speaking it is generative fine-tuning; discriminative fine-tuning would instead train the model on labeled examples, such as emails tagged by category.

**Task:**  
- Fine-tune the pre-trained model on a few examples of customer support emails regarding delayed shipments.
- Use the fine-tuned model to generate a professional email response when given a prompt.

**Before the Demo Question:**  
- How might fine-tuning a pre-trained model on domain-specific data improve its performance for targeted business tasks?


In [None]:
# Fine-tuning a pre-trained generative language model (distilgpt2) on a small customer support email dataset.

# 1. Import necessary libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling
from datasets import Dataset
import torch

# 2. Define a small dataset of customer support emails regarding delayed shipments
data = [
    {"text": "Subject: Delayed Shipment\n\nDear Customer,\nWe regret to inform you that your shipment has been delayed due to unforeseen circumstances. We are working to resolve the issue as soon as possible.\n\nSincerely,\nCustomer Support"},
    {"text": "Subject: Shipping Delay Notice\n\nDear Valued Customer,\nUnfortunately, your order is experiencing a delay. We apologize for any inconvenience and will update you shortly.\n\nBest regards,\nCustomer Service Team"},
    {"text": "Subject: Update on Your Order\n\nHello,\nYour order has encountered a minor delay. Our team is actively addressing this, and we appreciate your patience.\n\nThank you,\nSupport Team"}
]

# Convert the list of dictionaries into a Hugging Face Dataset
dataset = Dataset.from_list(data)

# 3. Load the tokenizer and model from Hugging Face
model_name = "distilgpt2"  # A small model for demonstration purposes
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Fix: Set the padding token to the EOS token
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)

# 4. Tokenize the dataset
def tokenize_function(examples):
    # Use a fixed maximum length for simplicity (adjust as needed)
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# 5. Create a data collator for language modeling (no masked LM for causal models)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# 6. Define training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir="./fine_tuned_model",
    overwrite_output_dir=True,
    num_train_epochs=3,                # Fine-tune for 3 epochs
    per_device_train_batch_size=2,
    save_steps=10,
    save_total_limit=2,
    logging_steps=5,
)

# 7. Create the Trainer and fine-tune the model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

print("Starting fine-tuning on the small customer support email dataset...")
trainer.train()
print("Fine-tuning complete.")

# 8. Test the fine-tuned model using a prompt
prompt = "Subject: Delayed Shipment\n\nDear Customer,"

# Tokenize and move the inputs (token IDs and attention mask) to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response using the fine-tuned model.
# Passing the attention mask and an explicit pad token avoids warnings from generate().
output_ids = model.generate(
    **inputs,
    max_length=100,
    num_return_sequences=1,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("\nGenerated Email:")
print(generated_text)


**Reflection:**  
The two-step process of pre-training and fine-tuning enables models to benefit from both large-scale language understanding and task-specific adjustments. This approach is particularly effective in business applications where labeled data may be limited.

**Discussion Questions:**  
- What are the trade-offs between generative pre-training and discriminative fine-tuning?  
- How does fine-tuning help tailor a general model to a business-specific problem?


# 10. Transformer-Based NLP Models

Transformer architectures (such as GPT and BERT) have revolutionized NLP. They use self-attention, which lets every token in a sequence weigh every other token directly, and are highly effective for a range of tasks, from text classification to machine translation.
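
The core computation can be sketched in a few lines of NumPy. This is a minimal single-head version of scaled dot-product self-attention with random weights; the sizes are hypothetical toy values, and real transformers add multiple heads, masking, residual connections, and learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Each row holds one token's scores against every token in the sequence.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each output vector is a weighted average of all value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 4  # assumed toy sizes
X = rng.normal(size=(seq_len, d_model))  # stand-in for 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print("Output shape:", output.shape)
print("Attention weights per token sum to:", weights.sum(axis=1))
```

Because every token attends to every other token in one step, distant words can influence each other without passing information through all the positions in between.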

**Business Application:**  
Transformers are used to build systems for customer support, automated content moderation, and personalized recommendations.

**Before the Demo Question:**  
- How do transformer models differ from earlier NLP models in handling long-range dependencies in text?


In [None]:
from transformers import pipeline

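# Use a pre-trained transformer model for sentiment analysis.
# Naming the checkpoint explicitly (the pipeline's usual default for this task) keeps results reproducible.
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")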

sample_text = "The customer service at our company is exceptional and very responsive."
result = classifier(sample_text)

print("Sentiment Analysis Result:")
print(result)


**Reflection:**  
This demo shows how a transformer-based model can be used out-of-the-box for text classification tasks, such as sentiment analysis. Such capabilities can help businesses quickly gauge customer opinions and tailor their responses.

**Discussion Questions:**  
- What advantages do transformer-based models offer for real-time customer feedback analysis?  
- How might the ability to quickly adapt to new tasks benefit business operations?


# 11. Prompting for NLP

Prompting involves guiding a language model's output by providing a well-crafted input. This can be used in zero-shot or few-shot settings to achieve tasks without extensive fine-tuning.

**Business Application:**  
Prompting allows businesses to leverage large language models for tasks like automated customer emails, product recommendations, or interactive chatbots, even without specialized training data.

**Before the Demo Question:**  
- How might carefully designed prompts improve the performance of a language model in a business context?


In [None]:
# Using a simple prompt with a text generation model to demonstrate the concept
prompt = "Generate a professional customer service email regarding a delayed shipment:"

generated_email = generator(prompt, max_length=100, num_return_sequences=1)

print("Generated Email:")
print(generated_email[0]["generated_text"])


**Reflection:**  
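In this demo, a prompt steers a language model toward a customer service email. Because GPT-2 is a base model that continues its input rather than following instructions, the output may drift off topic; instruction-tuned models typically follow such prompts more reliably. Even so, the demo shows the idea: the input text alone, with no additional training, adapts a general-purpose model to a specific business task.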
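
The difference between zero-shot and few-shot prompting comes down to how the input string is assembled. The sketch below builds such a prompt; the function `build_few_shot_prompt`, the `Request:`/`Response:` labels, and the example texts are all hypothetical choices for illustration, and the resulting string could be passed to any text generation model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt."""
    parts = [instruction.strip(), ""]
    for request, response in examples:
        parts += [f"Request: {request}", f"Response: {response}", ""]
    # End with an open "Response:" so the model continues with its answer.
    parts += [f"Request: {query}", "Response:"]
    return "\n".join(parts)

instruction = "Write a short, professional reply to each customer request."

# Hypothetical worked examples (assumptions for this sketch).
examples = [
    ("Where is my order?", "Thank you for reaching out. Your order is on its way."),
    ("My package arrived damaged.", "We are sorry to hear that. A replacement will be sent."),
]

few_shot_prompt = build_few_shot_prompt(instruction, examples, "My shipment is delayed.")
zero_shot_prompt = build_few_shot_prompt(instruction, [], "My shipment is delayed.")

print(few_shot_prompt)
```

With an empty example list the same function produces a zero-shot prompt. The examples show the model the expected format and tone, which often helps when the model has not been tuned to follow instructions.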

**Discussion Questions:**  
- What are some best practices when designing prompts for language models?  
- Can you think of other business scenarios where prompting might be an effective strategy?
