# Week 6, Day 7: Review and Feedback Session

## Session Overview
This session reviews the key concepts from Week 6 and provides practice exercises in four areas:

1. Text Processing and Classification
2. Named Entity Recognition
3. Topic Modeling and Summarization
4. Language Models and Text Generation

## Learning Objectives
- Reinforce the core NLP concepts from Week 6
- Practice choosing the right technique for a given task
- Strengthen implementation skills with NLTK, spaCy, scikit-learn, and gensim
- Prepare for advanced topics

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
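import nltk
import spacy
from gensim import corpora, models

# Download the NLTK resources used below (skipped if already present)
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)  # required by newer NLTK releases
nltk.download('stopwords', quiet=True)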

## 1. Text Processing Review

In [None]:
def text_processing_review():
    # Sample text
    text = """
    Natural Language Processing (NLP) is a field of artificial intelligence
    that helps computers understand and process human language. NLP combines
    linguistics, computer science, and machine learning techniques.
    """
    
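    # Basic processing: lowercase, tokenize, then drop stop words and punctuation
    tokens = nltk.word_tokenize(text.lower())
    stop_words = set(nltk.corpus.stopwords.words('english'))
    filtered_tokens = [
        token for token in tokens
        if token.isalpha() and token not in stop_words
    ]

    # Create features. With a single document every IDF value is identical,
    # so these weights reduce to normalized term frequencies.
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform([text])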
    
    # Print results
    print("Original Text:")
    print(text)
    print("\nTokens:")
    print(tokens[:10])
    print("\nFiltered Tokens:")
    print(filtered_tokens[:10])
    print("\nFeature Names:")
    print(list(vectorizer.get_feature_names_out())[:10])

text_processing_review()
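
To make the TF-IDF weighting used above concrete, here is a minimal sketch that computes the textbook form `tf * log(N / df)` by hand. The function name `tfidf_by_hand` and the three sample documents are illustrative assumptions, not part of the course code. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tfidf_by_hand(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical mini-corpus (assumes every document is non-empty)
docs = [
    "nlp helps computers understand language",
    "nlp combines linguistics and machine learning",
    "computers process language",
]
for doc_weights in tfidf_by_hand(docs):
    top = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(top)
```

Under this formula a term that appears in every document gets a weight of 0, which is why terms shared across the whole corpus contribute little to distinguishing documents.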

## 2. Classification Review

In [None]:
def classification_review():
    # Sample dataset
    texts = [
        "This movie was fantastic!",
        "Terrible waste of time.",
        "Great performance by all.",
        "I really hated it.",
        "Excellent film, must watch!"
    ]
    labels = [1, 0, 1, 0, 1]  # 1: positive, 0: negative
    
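    # Split the raw texts first so the vectorizer never sees the test set
    texts_train, texts_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=42
    )

    # Create features: fit on the training texts only, then transform the test texts
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(texts_train)
    X_test = vectorizer.transform(texts_test)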
    
    # Train model
    from sklearn.naive_bayes import MultinomialNB
    model = MultinomialNB()
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
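    # Print results (with five samples the test set holds a single review,
    # so the scores are illustrative only)
    print("Classification Report:")
    print(classification_report(y_test, y_pred, zero_division=0))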

classification_review()
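
The classification report summarizes precision, recall, and F1 per class. As a rough sketch of what those numbers mean, the helper below computes them for one class from plain label lists. The name `precision_recall_f1` and the example labels are hypothetical; they are not output from the model above.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical labels for illustration only
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.67 recall=0.67 f1=0.67
```

Here two of the three positive predictions are correct (precision 2/3) and two of the three true positives are found (recall 2/3).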

## 3. NER Review

In [None]:
def ner_review():
    # Sample text
    text = "Apple CEO Tim Cook announced new products at their headquarters in Cupertino, California."
    
    # Process with spaCy (install the model first: python -m spacy download en_core_web_sm)
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    
    # Extract entities
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    
    # Print results
    print("Text:")
    print(text)
    print("\nEntities:")
    for entity, label in entities:
        print(f"{entity}: {label}")

ner_review()

## 4. Topic Modeling Review

In [None]:
def topic_modeling_review():
    # Sample documents
    documents = [
        "Machine learning helps computers learn from data.",
        "Deep learning revolutionized AI applications.",
        "Neural networks process complex patterns.",
        "Data science combines statistics and programming."
    ]
    
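    # Tokenize: lowercase, split on whitespace, and strip surrounding punctuation
    texts = [[word.strip('.,!?') for word in doc.lower().split()] for doc in documents]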
    
    # Create dictionary
    dictionary = corpora.Dictionary(texts)
    
    # Create corpus
    corpus = [dictionary.doc2bow(text) for text in texts]
    
    # Train LDA model (fixed seed so the topics are reproducible across runs)
    lda = models.LdaModel(
        corpus,
        num_topics=2,
        id2word=dictionary,
        random_state=42
    )
    
    # Print topics
    print("Topics:")
    for idx, topic in lda.print_topics():
        print(f"Topic {idx + 1}:")
        print(topic)

topic_modeling_review()
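
Week 6 also covered summarization, which has no review cell above. The following is a minimal sketch of extractive summarization by sentence selection, assuming a simple scoring rule: each sentence is scored by the average corpus frequency of its words. The function name `extractive_summary` and the sample text are illustrative, and the sketch skips stop-word removal, so on longer texts common words would dominate the scores unless they are filtered first.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Pick the highest-scoring sentences, scored by average word frequency."""
    # Naive sentence split on ., !, or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, then restore the original order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

# Hypothetical sample text
text = (
    "Topic models group documents by theme. "
    "LDA treats documents as mixtures of topics. "
    "Topics are distributions over words."
)
print(extractive_summary(text, num_sentences=1))
```

The summary reuses sentences verbatim from the input, which is what distinguishes extractive from abstractive summarization.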

## Week 6 Review Quiz

### Multiple Choice Questions

1. What is tokenization used for?
   - a) Text generation
   - b) Breaking text into units
   - c) Topic modeling
   - d) Translation

2. What is TF-IDF?
   - a) Text classifier
   - b) Feature extraction method
   - c) Neural network
   - d) Language model

3. What is Named Entity Recognition?
   - a) Text classification
   - b) Entity identification
   - c) Topic modeling
   - d) Text generation

4. What is LDA used for?
   - a) Classification
   - b) Topic modeling
   - c) Translation
   - d) Summarization

5. What are stop words?
   - a) Important keywords
   - b) Common words removed during preprocessing
   - c) Named entities
   - d) Technical terms

6. What is lemmatization?
   - a) Text generation
   - b) Word normalization
   - c) Entity extraction
   - d) Topic modeling

7. What is extractive summarization?
   - a) Text generation
   - b) Sentence selection
   - c) Translation
   - d) Classification

8. What is a language model?
   - a) Translation system
   - b) Probability distribution over word sequences
   - c) Classification model
   - d) Topic model

9. What is a word embedding?
   - a) Text compression
   - b) Vector representation
   - c) Classification method
   - d) Summarization technique

10. What is the purpose of the attention mechanism?
    - a) Text compression
    - b) Focus on relevant parts
    - c) Topic modeling
    - d) Classification

Answers: 1-b, 2-b, 3-b, 4-b, 5-b, 6-b, 7-b, 8-b, 9-b, 10-b
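
To illustrate question 8, a language model can be sketched as a table of next-word probabilities. The example below is a minimal bigram model with sampling-based generation. The function names `train_bigram_model` and `generate`, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for illustration; modern language models are neural networks, not count tables.

```python
import random
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count bigrams and return P(next | current) as nested dicts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    # Normalize counts into conditional probabilities
    return {
        current: {w: c / sum(followers.values()) for w, c in followers.items()}
        for current, followers in counts.items()
    }

def generate(model, max_words=10, seed=0):
    """Sample words one at a time until the end marker or max_words."""
    rng = random.Random(seed)
    word, output = "<s>", []
    for _ in range(max_words):
        followers = model[word]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

# Hypothetical toy corpus
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(corpus)
print(model["the"])   # probabilities of the words that follow "the"
print(generate(model))
```

In this corpus "cat" follows "the" in two of three sentences, so the model assigns it probability 2/3 and "dog" 1/3.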

## Week 6 Summary

### Key Concepts Covered:
1. Text processing and classification
2. Named entity recognition
3. Topic modeling and summarization
4. Language models and text generation

### Preparation for Advanced Topics:
- Review challenging concepts
- Practice implementation
- Study real-world applications
- Explore latest research

### Additional Resources:
- NLTK documentation: https://www.nltk.org/
- spaCy tutorials: https://spacy.io/usage/spacy-101
- Hugging Face documentation: https://huggingface.co/docs