author	title
Neil Ernst	Large Language Models and Natural Language Processing in SE

AI-supported development tools such as Codex, Copilot, and ChatGPT have recently taken on a major role in SE. What underpins these tools, how do they work so well, what ethical concerns do they raise, and what can we expect for SE in the AI future?

Learning Outcomes

develop a more-than-passing awareness of how large language models "work" on code
discuss the (current) tradeoffs of these tools
analyze how such tools are evaluated and discern hype from reality

#	Topic	Readings	Exercises
LLM-1	LLM overview • Research Opportunities	Naturalness paper	Command Pattern
LLM-2	Discussion and analysis	Remaining 3 papers

Required Readings

Hindle et al., On the Naturalness of Software
Codex
StarCoder
Patch Generation With Language Models: Feasibility and Scaling Behavior

Optional Readings and Activities

Karampatsis, Big Code != Big Vocabulary Autocomplete.
Xu, Vasilescu, Neubig, "In-IDE Code Generation from Natural Language" [sections 1-4, 8, 9]
Le Goues, podcast audio
Le Goues, Survey of APR

Helpful tutorials and summaries

Alammar, The Illustrated Transformer
Vaswani et al., Attention Is All You Need
https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots - how these tools are really improved
Willison, "Understanding GPT tokenizers"
Self-Attention from Scratch
