# Further Reading

AI and large language models represent a convergence of decades of research in machine learning, natural language processing, computer systems, and human-computer interaction. This chapter highlights foundational papers, influential systems, and active research areas for readers who want to explore the topics we've covered in greater depth.

## Foundations of Language Models

- The transformer architecture and early transformer-based pre-training:
  - Vaswani et al. (2017): *Attention Is All You Need* {cite}`vaswani2017attention`
  - Devlin et al. (2019): *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding* {cite}`devlin2019bert`

- Scaling laws and emergent capabilities:
  - Kaplan et al. (2020): *Scaling Laws for Neural Language Models* {cite}`kaplan2020scaling`
  - Wei et al. (2022): *Emergent Abilities of Large Language Models* {cite}`wei2022emergent`

## Prompting and In-Context Learning

- Understanding how LLMs learn from prompts:
  - Brown et al. (2020): *Language Models are Few-Shot Learners* {cite}`brown2020language`
  - Wei et al. (2022): *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models* {cite}`wei2022chain`
  - Kojima et al. (2022): *Large Language Models are Zero-Shot Reasoners* {cite}`kojima2022large`

- Prompt engineering techniques:
  - Reynolds and McDonell (2021): *Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm* {cite}`reynolds2021prompt`
  - Zhou et al. (2023): *Large Language Models Are Human-Level Prompt Engineers* {cite}`zhou2023large`

## Tokenization and Representation

- Subword tokenization methods:
  - Sennrich et al. (2016): *Neural Machine Translation of Rare Words with Subword Units* {cite}`sennrich2016neural`
  - Kudo and Richardson (2018): *SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing* {cite}`kudo2018sentencepiece`

- Understanding embeddings:
  - Mikolov et al. (2013): *Efficient Estimation of Word Representations in Vector Space* {cite}`mikolov2013efficient`
  - Pennington et al. (2014): *GloVe: Global Vectors for Word Representation* {cite}`pennington2014glove`

## Alignment and Safety

- Reinforcement learning from human feedback:
  - Christiano et al. (2017): *Deep Reinforcement Learning from Human Preferences* {cite}`christiano2017deep`
  - Ouyang et al. (2022): *Training Language Models to Follow Instructions with Human Feedback* {cite}`ouyang2022training`

- Understanding model behavior and safety:
  - Bai et al. (2022): *Constitutional AI: Harmlessness from AI Feedback* {cite}`bai2022constitutional`
  - Ganguli et al. (2023): *The Capacity for Moral Self-Correction in Large Language Models* {cite}`ganguli2023capacity`

## Multimodal Models

- Vision-language models:
  - Radford et al. (2021): *Learning Transferable Visual Models From Natural Language Supervision* {cite}`radford2021learning`
  - Ramesh et al. (2022): *Hierarchical Text-Conditional Image Generation with CLIP Latents* {cite}`ramesh2022hierarchical`

- Unified architectures:
  - Alayrac et al. (2022): *Flamingo: a Visual Language Model for Few-Shot Learning* {cite}`alayrac2022flamingo`

## Model Efficiency and Compression

- Making models faster and smaller:
  - Hinton et al. (2015): *Distilling the Knowledge in a Neural Network* {cite}`hinton2015distilling`
  - Frantar and Alistarh (2023): *SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot* {cite}`frantar2023sparsegpt`

- Low-rank adaptation:
  - Hu et al. (2022): *LoRA: Low-Rank Adaptation of Large Language Models* {cite}`hu2022lora`

## Retrieval Augmentation and Tool Use

- Enhancing models with external knowledge:
  - Lewis et al. (2020): *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks* {cite}`lewis2020retrieval`
  - Borgeaud et al. (2022): *Improving Language Models by Retrieving from Trillions of Tokens* {cite}`borgeaud2022improving`

- Tool use and agentic behavior:
  - Nakano et al. (2021): *WebGPT: Browser-assisted Question-Answering with Human Feedback* {cite}`nakano2021webgpt`
  - Schick et al. (2023): *Toolformer: Language Models Can Teach Themselves to Use Tools* {cite}`schick2023toolformer`

## Privacy and Security

- Privacy-preserving machine learning:
  - Abadi et al. (2016): *Deep Learning with Differential Privacy* {cite}`abadi2016deep`
  - McMahan et al. (2017): *Communication-Efficient Learning of Deep Networks from Decentralized Data* {cite}`mcmahan2017communication`

- Adversarial robustness and attacks:
  - Wallace et al. (2019): *Universal Adversarial Triggers for Attacking and Analyzing NLP* {cite}`wallace2019universal`
  - Carlini et al. (2021): *Extracting Training Data from Large Language Models* {cite}`carlini2021extracting`

## Interpretability and Mechanistic Understanding

- Understanding how transformers work:
  - Elhage et al. (2021): *A Mathematical Framework for Transformer Circuits* {cite}`elhage2021mathematical`
  - Olsson et al. (2022): *In-Context Learning and Induction Heads* {cite}`olsson2022context`

- Probing and analysis:
  - Tenney et al. (2019): *BERT Rediscovers the Classical NLP Pipeline* {cite}`tenney2019bert`

## Coding and Program Synthesis

- Models for code generation:
  - Chen et al. (2021): *Evaluating Large Language Models Trained on Code* {cite}`chen2021evaluating`
  - Austin et al. (2021): *Program Synthesis with Large Language Models* {cite}`austin2021program`

- Formal theorem proving:
  - Polu and Sutskever (2020): *Generative Language Modeling for Automated Theorem Proving* {cite}`polu2020generative`

## Practical Resources and Libraries

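- Open-source tools and pre-trained models:
  - [Ollama](https://docs.ollama.com/): a tool for running open-weight language models locally
  - [UniXcoder](https://huggingface.co/microsoft/unixcoder-base): Microsoft's pre-trained cross-modal model for code representation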

## Surveys and Perspectives

- Comprehensive overviews:
  - Bommasani et al. (2021): *On the Opportunities and Risks of Foundation Models* {cite}`bommasani2021opportunities`
  - Zhao et al. (2023): *A Survey of Large Language Models* {cite}`zhao2023survey`

- Future directions and open problems:
  - Bubeck et al. (2023): *Sparks of Artificial General Intelligence: Early Experiments with GPT-4* {cite}`bubeck2023sparks`

## References

In the web/HTML version of the book, the bibliography appears *directly below this section*.

However, in the print versions, which are built with $\LaTeX$, the bibliography appears (more traditionally) as the penultimate unnumbered standalone chapter, immediately preceding the Proof Index.

```{bibliography}
:style: unsrt
```
