# Mastering Top-k and Top-p Sampling in NLG: A Comprehensive IPython Notebook

Prepared by Grok, embodying Alan Turing, Albert Einstein, and Nikola Tesla. This notebook is your gateway to becoming a pioneering AI researcher.

## Section 1: Theory and Tutorial

Here we recap and expand on the theory from the previous tutorial. As a beginner scientist, grasp the foundations before coding.

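### 1.1 Basics of NLG and Sampling

Natural Language Generation (NLG) models produce text one token at a time: at each step the model assigns a probability to every token in its vocabulary, and a decoding strategy picks the next token. Greedy decoding always takes the most probable token. Sampling techniques such as Top-k and Top-p instead draw at random from a restricted set of likely tokens, which adds controlled randomness and makes the output more varied.

Analogy: Like choosing paths in a forest. Greedy always takes the widest path; sampling explores any of the promising ones.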
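The contrast between greedy decoding and sampling can be sketched in a few lines of plain Python. The token list and probabilities here are made up for illustration, and `greedy_pick` and `sample_pick` are hypothetical helper names, not part of any library.

```python
import random

# Hypothetical next-token distribution, for illustration only.
tokens = ["apple", "banana", "cat"]
probs = [0.6, 0.3, 0.1]

def greedy_pick(tokens, probs):
    """Always return the single most probable token."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return tokens[best]

def sample_pick(tokens, probs, rng):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
greedy_draws = [greedy_pick(tokens, probs) for _ in range(5)]
sampled_draws = [sample_pick(tokens, probs, rng) for _ in range(1000)]

print(greedy_draws)  # greedy repeats the same token every time
print({t: sampled_draws.count(t) for t in tokens})  # counts roughly follow probs
```

Greedy decoding never leaves the top token, while plain sampling visits every token in proportion to its probability, including unlikely ones. Top-k and Top-p sit between these two extremes by sampling only from a truncated set.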

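### 1.2 Mathematical Foundations

For a vocabulary with logits $z_i$, the softmax gives the probabilities:

$\displaystyle p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$

Top-k: Keep the $k$ tokens with the highest probability, set the rest to zero, and renormalize so the kept probabilities sum to 1.

Top-p: Sort tokens by decreasing probability and keep the smallest set whose cumulative probability is $\geq p$, then renormalize.

In both cases, if $V'$ is the set of kept tokens, the renormalized probability is $\displaystyle p'_i = \frac{p_i}{\sum_{j \in V'} p_j}$ for $i \in V'$ and $0$ otherwise.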

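### 1.3 Example Calculation

Probs: apple=0.6, banana=0.3, cat=0.1

Top-k=2: Keep apple and banana (total 0.9). Renormalized: apple=0.6/0.9≈0.67, banana=0.3/0.9≈0.33.

Top-p=0.8: apple alone gives 0.6 < 0.8, so banana is added: 0.6 + 0.3 = 0.9 ≥ 0.8. The nucleus is {apple, banana}, the same set as Top-k=2, so the renormalized probabilities are the same.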
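The arithmetic in this example can be checked with a short, self-contained sketch. It uses plain dictionaries rather than NumPy, and the helper names (`renormalize`, `top_k_filter`, `top_p_filter`) are chosen here for illustration.

```python
def renormalize(probs):
    """Rescale the kept probabilities so they sum to 1."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

def top_k_filter(probs, k):
    """Keep the k most probable tokens, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return renormalize(dict(ranked[:k]))

def top_p_filter(probs, p):
    """Keep the smallest high-probability set with cumulative mass >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

probs = {"apple": 0.6, "banana": 0.3, "cat": 0.1}
top_k_result = top_k_filter(probs, k=2)
top_p_result = top_p_filter(probs, p=0.8)

print({t: round(v, 2) for t, v in top_k_result.items()})  # {'apple': 0.67, 'banana': 0.33}
print({t: round(v, 2) for t, v in top_p_result.items()})  # {'apple': 0.67, 'banana': 0.33}
```

Both filters keep the same two tokens for this distribution. They would differ with, say, `p=0.5`, where the nucleus shrinks to apple alone.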

## Section 2: Practical Code Guides

Let's implement in Python using NumPy for simplicity. Later, we'll use Torch for real models.

In [None]:
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def top_k_sampling(logits, k):
    probs = softmax(logits)
    # Indices sorted from most to least probable.
    indices = np.argsort(probs)[::-1]
    top_k_indices = indices[:k]
    # Renormalize the kept probabilities so they sum to 1.
    top_k_probs = probs[top_k_indices]
    top_k_probs = top_k_probs / top_k_probs.sum()
    sampled_index = np.random.choice(top_k_indices, p=top_k_probs)
    return sampled_index

# Example
logits = np.array([3, 2, 1, 0, -1])
print(top_k_sampling(logits, 3))

### Top-p Implementation

In [None]:
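def top_p_sampling(logits, p):
    probs = softmax(logits)
    # Sort tokens from most to least probable.
    sorted_indices = np.argsort(probs)[::-1]
    sorted_probs = probs[sorted_indices]
    cumulative_probs = np.cumsum(sorted_probs)
    # First position where the cumulative probability reaches p. Capping at the
    # vocabulary size avoids an IndexError when rounding leaves the total just below p.
    cutoff_index = min(np.searchsorted(cumulative_probs, p) + 1, len(sorted_probs))
    nucleus_indices = sorted_indices[:cutoff_index]
    nucleus_probs = sorted_probs[:cutoff_index]
    # Renormalize within the nucleus before sampling.
    nucleus_probs = nucleus_probs / nucleus_probs.sum()
    sampled_index = np.random.choice(nucleus_indices, p=nucleus_probs)
    return sampled_index

print(top_p_sampling(logits, 0.9))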

## Section 3: Visualizations

Visualize probability distributions.

In [None]:
import matplotlib.pyplot as plt

probs = softmax(logits)
tokens = ['apple', 'banana', 'cat', 'dog', 'elephant']
plt.bar(tokens, probs)
plt.title('Probability Distribution')
plt.show()

### Visualizing Top-k

In [None]:
k = 3
indices = np.argsort(probs)[::-1][:k]
top_k_probs = probs[indices] / probs[indices].sum()
plt.bar([tokens[i] for i in indices], top_k_probs)
plt.title('Top-k Renormalized Probs')
plt.show()

## Section 4: Applications

- Chatbots: Diverse responses.
- Story Generation: Creative plots.
- Scientific: Hypothesis generation in research papers.

## Section 5: Mini Projects

### Mini: Simple Text Generator

Use top-k to generate sentences.

In [None]:
# Simple dummy model
vocab = {0: 'The', 1: 'cat', 2: 'sat', 3: 'on', 4: 'mat'}

def dummy_logits(prev):
    # Random for demo; `prev` is ignored, so the output is not conditioned on context.
    return np.random.randn(5)

def generate_text(sampling_fn, param, length=10):
    text = [0]  # Start with 'The'
    for _ in range(length):
        logits = dummy_logits(text[-1])
        next_token = sampling_fn(logits, param)
        text.append(int(next_token))  # plain int for the vocab lookup
    return ' '.join(vocab[t] for t in text)

print(generate_text(top_k_sampling, 3))

### Major Project: Integrate with HuggingFace Transformers

Use GPT-2 with top-p sampling for real NLG.

Note: This requires the transformers and torch packages. The cell below assumes they are already installed.

In [None]:
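# Run in your own environment with transformers and torch installed
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

def generate_with_top_p(prompt, p=0.9, max_length=50):
    # Tokenize the prompt; this also returns the attention mask.
    inputs = tokenizer(prompt, return_tensors='pt')
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            do_sample=True,
            top_p=p,
            top_k=0,  # disable top-k filtering so only top-p applies
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate_with_top_p('In a world where AI rules,'))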

## Section 6: Major Projects on Real-World Examples

### Project: AI Story Writer

Build a system using top-k/top-p to generate stories based on user prompts. Evaluate coherence.

### Real-World: Medical Report Generation

Use sampling to generate varied patient reports from data, ensuring diversity in phrasing.

## Section 7: Research Directions

- Combine with reinforcement learning for better control.
- Adapt k/p to the context instead of fixing them in advance.
- Evaluate the trade-off between quality metrics such as perplexity and diversity metrics.
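As one possible starting point for the diversity side of that evaluation, the sketch below computes a distinct-n score: the fraction of n-grams in a generated text that are unique. The choice of this metric and the two example sentences are assumptions made here for illustration; the notebook itself does not prescribe a metric.

```python
def distinct_n(tokens, n):
    """Fraction of n-grams in `tokens` that are unique (1.0 means no repeats)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

# Hypothetical generations: one stuck in a loop, one varied.
repetitive = "the cat sat on the cat sat on the cat".split()
varied = "the cat sat on a warm mat near the door".split()

for name, text in [("repetitive", repetitive), ("varied", varied)]:
    print(name, distinct_n(text, 1), distinct_n(text, 2))
```

A low score flags repetitive output of the kind that very small k or p values tend to produce. It says nothing about coherence, so it is best read alongside a quality measure.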

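## Section 8: Rare Insights

Top-p (nucleus) sampling was introduced in 2019 by Holtzman et al., who report that it compares favorably with top-k in human evaluations. They attribute this to its adaptivity: the number of candidate tokens grows or shrinks with the shape of the distribution.

Insight: In multimodal models, the same sampling can be applied to image captioning to produce varied descriptions.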
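The adaptivity of top-p can be made concrete by counting how many tokens fall inside the nucleus for a peaked distribution versus a flat one. This is a minimal sketch with made-up logits; `nucleus_size` is a hypothetical helper, and `softmax` is redefined in plain Python so the block runs without the earlier cells.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """Number of highest-probability tokens needed to reach cumulative mass p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

# Hypothetical logits: one confident prediction, one uncertain prediction.
peaked = softmax([8.0, 1.0, 0.5, 0.0, -0.5, -1.0])
flat = softmax([1.0, 0.9, 0.8, 0.7, 0.6, 0.5])

peaked_size = nucleus_size(peaked, 0.9)
flat_size = nucleus_size(flat, 0.9)
print(peaked_size, flat_size)  # 1 6
```

With `p=0.9`, the nucleus is a single token when the model is confident and the whole six-token vocabulary when it is not. A fixed top-k of 3 would keep three tokens in both cases.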

## Section 9: Future Directions & Next Steps

- Explore in large models like GPT-4.
- Next: Study temperature scaling and beam search hybrids.
- Steps: Implement in PyTorch, test on datasets like WikiText.
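As a first look at temperature scaling, the sketch below divides the logits by a temperature before the softmax. The function name `softmax_with_temperature` is chosen here for illustration, and the logits are the same example values used earlier in the notebook.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3, 2, 1, 0, -1]
sharp = softmax_with_temperature(logits, 0.5)   # below 1: more peaked
base = softmax_with_temperature(logits, 1.0)    # ordinary softmax
smooth = softmax_with_temperature(logits, 2.0)  # above 1: flatter

for name, dist in [("T=0.5", sharp), ("T=1.0", base), ("T=2.0", smooth)]:
    print(name, [round(p, 3) for p in dist])
```

Temperature reshapes the distribution while top-k and top-p truncate it, so the two are often used together: scale first, then filter and renormalize.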

## Section 10: Tips

- Start with small k/p for safety.
- Use a fixed random seed for reproducibility.
- Monitor for biases in sampling.
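The seeding tip can be illustrated with a small sketch. It uses Python's standard `random` module and a made-up vocabulary and weights; with NumPy the equivalent is calling `np.random.seed(seed)` before sampling, or drawing from a generator created with `np.random.default_rng(seed)`.

```python
import random

def sample_sequence(seed, length=5):
    """Sample a short token sequence from a fixed, made-up distribution."""
    rng = random.Random(seed)  # a private generator, seeded explicitly
    tokens = ["The", "cat", "sat", "on", "mat"]
    weights = [0.4, 0.3, 0.15, 0.1, 0.05]
    return [rng.choices(tokens, weights=weights, k=1)[0] for _ in range(length)]

run_a = sample_sequence(seed=42)
run_b = sample_sequence(seed=42)
print(run_a == run_b)  # True: the same seed reproduces the same samples
```

Passing an explicit generator, rather than relying on global random state, keeps results reproducible even when other code draws random numbers in between.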

## Section 11: What We Didn't Include (Necessary for Scientists)

- Detailed error analysis: When sampling fails (e.g., repetition loops).
- Mathematical proofs of convergence.
- Integration with RLHF (Reinforcement Learning from Human Feedback).

Study papers like 'The Curious Case of Neural Text Degeneration'.

## Section 12: Case Studies

For separate .md files, copy these sections:

### Case Study 1: OpenAI's GPT Models

The GPT-3 API exposes top-p sampling through its `top_p` parameter, which is commonly used for creative generation. Case: Improved story quality over greedy decoding.

Save as case_study1.md

### Case Study 2: Google Bard

Uses variants for conversational AI. Insight: Reduced hallucinations via controlled sampling.

Save as case_study2.md