# Introduction to Large Language Models
### (And How They Can Help You Code)

## There's Never Been an Easier Time to Write Good Code

By using Large Language Models (LLMs), you can more quickly go from code that runs to code you can build on.

They can:

* Offer suggestions for code refactoring/organization/functions
* Write your documentation -- both your readme file and function-level documentation
* Shorten the learning curve for practices like using git for version control
* Write your unit tests and help you set up virtual environments

You should still expect to troubleshoot and edit everything they write.

**That may change, but it hasn't yet**


### You Know What an LLM Is, But We're Going to Define It Anyway

An LLM is a model designed to process and generate human-like text in order to perform tasks like answering questions, summarizing data, or translating languages. LLMs are “large” in that they were trained on a lot of data and have a wide range of capabilities.

Examples: 
* GPT-3.5 and GPT-4 (OpenAI)
* Claude and Claude 2 (Anthropic)
* Bard (Google)
* LLaMA and LLaMA 2 (Meta/Open Source/many descendants)


### A More Technical Definition

A large language model (LLM) is a type of **deep learning neural network** (an algorithm inspired by the brain's structure, using layers of computational nodes to recognize patterns in data). Specifically, it's a variant of **transformer architectures** (a cutting-edge structure in machine learning introduced around 2017), which excels in processing language. 

The LLM operates by handling sequences of tokens (tokens can be words, parts of words, or characters, depending on how the text is split up) and its primary job is to predict subsequent **tokens** (basically guessing the next word in a sequence). 

These models are massive, having billions of **parameters** (adjustable parts of the model that help it learn from data). A key feature of transformers is their use of **attention mechanisms** (a technique allowing the model to "focus" on different parts of the input with varying intensities), which help weigh the importance of different input tokens when generating outputs.

*This is the only part of the presentation I just took verbatim from GPT-4.*
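
As a rough illustration of what "predicting the next token" means, here is a toy sketch in R. It counts which word follows which in a tiny made-up corpus and predicts the most frequent follower. Real LLMs use a transformer network over subword tokens rather than word counts; the corpus and the helper name `predict_next` are invented for this example.

```r
# Toy next-token predictor built from word-pair (bigram) counts.
# This is NOT how a transformer works internally; it only illustrates the
# task of "given what came before, guess what comes next".
corpus <- "the cat sat on the mat the cat ate the fish"

# Split the text into word-level tokens (real models use subword tokens).
tokens <- strsplit(corpus, " ", fixed = TRUE)[[1]]

# Pair each token with the token that follows it.
current <- head(tokens, -1)
following <- tail(tokens, -1)

# Rows = current token, columns = following token, cells = counts.
bigram_counts <- table(current, following)

# Hypothetical helper: return the most frequent follower of `word`.
predict_next <- function(word) {
  counts <- bigram_counts[word, ]
  names(counts)[which.max(counts)]
}

predict_next("the")  # "cat" ("the" is followed by "cat" twice in the corpus)
```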

### Coding Is Another Kind of Text Synthesis and Prediction

* LLMs are text synthesis and prediction machines, and code is another kind of text
* The LLMs you'll use were trained on A LOT OF CODE -- but how much varies by language
* There are code-specific **models** and code-specific **benchmarks** - and you might choose to get deeper into either of those. (Want to make an R benchmark?)
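
At its simplest, a code benchmark is a set of tasks with tests plus a pass rate. A minimal sketch in R, assuming a made-up task list and hard-coded strings standing in for real LLM responses (every name here is hypothetical):

```r
# Each benchmark task has a prompt, the function name the model should define,
# and a check that exercises the generated function.
tasks <- list(
  list(
    prompt = "Write an R function `add_one(x)` that adds 1 to x.",
    fn_name = "add_one",
    check = function(f) identical(f(1), 2) && identical(f(c(1, 2)), c(2, 3))
  ),
  list(
    prompt = "Write an R function `is_even(n)` that returns TRUE for even numbers.",
    fn_name = "is_even",
    check = function(f) isTRUE(f(4)) && isFALSE(f(3))
  )
)

# Stand-ins for model responses (in practice these would come from an LLM).
responses <- c(
  "add_one <- function(x) x + 1",
  "is_even <- function(n) n %% 2 == 1"  # deliberately wrong
)

# Evaluate one response in a fresh environment and run the task's check.
# Caution: evaluating untrusted generated code is unsafe outside a sandbox.
score_response <- function(task, code) {
  env <- new.env()
  tryCatch({
    eval(parse(text = code), envir = env)
    isTRUE(task$check(get(task$fn_name, envir = env)))
  }, error = function(e) FALSE)
}

passed <- mapply(score_response, tasks, responses)
pass_rate <- mean(passed)  # 0.5: the first response passes, the second fails
```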

### There's Also A Lot of Coding-Related Text Online!

* LLMs were trained on documentation, too
* And lots of guidance on how to do tasks like deal with your git error messages
* And, like with code, they can give you more personal (and sometimes even accurate!) advice on those

### Documentation: Writing Down What You Know 

#### Repo-Level Documentation

- This is the readme that's in your git repository. It's written in a format called Markdown (which is also what this presentation is written in)
- High-level description of what your repo does, inputs and outputs, things someone might need to know if they want to run or contribute to it. Specifics will vary based on repo and purpose.
- If this is code that will get run again or shared or built on, there should a) be a repo and b) be a readme file
- Good coding platforms will provide this functionality! If you use Posit Connect, Databricks, etc., there are version control capabilities
- If you're not using git, you can and should still write documentation with this same information
- Example: "can you write me a md readme file for this code? please note a few things: 1) this is a MVP. it doesn't parse the regulations - we're just pulling them from the existing github site that has the regs parsed already and hard-coding the links. once we know that the output is useful, I'll parse the whole code. 2) this is using GPT 3.5 because I'm on the waitlist for the API for GPT 4. please write this with the markdown tags as text"

#### Function-Level Documentation

- Function-level documentation is a place to describe each function's purpose, parameters, return values, and anything weird
- You might not always write comprehensive function-level documentation, but it's a good practice if you're writing a package or other code that's getting shared, reused, and built on
- Example prompt: "Write function-level documentation for this code"
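
In R, function-level documentation is commonly written as roxygen2 comments directly above the function. Here is a sketch of the kind of output you might get back, using a made-up function `rescale_01` (not from any real package):

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @param na.rm Logical; if `TRUE`, ignore missing values when finding the
#'   minimum and maximum.
#'
#' @return A numeric vector the same length as `x`, with values between 0 and 1.
#'   If all values of `x` are equal, every element of the result is `NaN`.
#'
#' @examples
#' rescale_01(c(10, 15, 20))
#'
#' @export
rescale_01 <- function(x, na.rm = FALSE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

Note the "anything weird" line in `@return`: edge cases like the all-equal input are exactly what you should check the LLM's documentation against the code for.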








---

## Writing Unit Tests with LLMs

- LLMs can generate test cases for your R functions using the `testthat` package, making it more likely that your code is reliable and correct
- This keeps you from having to do ad hoc testing (or no testing at all!) when you make updates
- Example: "Write unit tests for this code"
- [slide with example of prompts/responses]
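
A sketch of what an LLM might return for that prompt, assuming the `testthat` package is installed. The function under test, `celsius_to_fahrenheit`, is invented for illustration:

```r
library(testthat)

# Hypothetical function under test.
celsius_to_fahrenheit <- function(celsius) {
  if (!is.numeric(celsius)) {
    stop("`celsius` must be numeric")
  }
  celsius * 9 / 5 + 32
}

test_that("known temperatures convert correctly", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(100), 212)
  expect_equal(celsius_to_fahrenheit(-40), -40)
})

test_that("vectors are converted element-wise", {
  expect_equal(celsius_to_fahrenheit(c(0, 100)), c(32, 212))
})

test_that("non-numeric input raises an error", {
  expect_error(celsius_to_fahrenheit("hot"), "must be numeric")
})
```

Generated tests tend to cover the obvious cases; it is still on you to decide whether the expected values are actually right.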

---

## Refactoring/Reorganizing/Rewriting through LLMs

- LLMs can help you review and optimize your existing R code
- They can identify areas for improvement and suggest more efficient or readable solutions using the latest R packages and techniques
- They can make style changes as well (and help you get started with `lintr`, an R package for static code analysis)
- Example: "Review this R code snippet and suggest improvements"
- [slide with example of prompts/responses]
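
As an illustration of the kind of suggestion you might get, here is a loop-based function next to a vectorized rewrite. Both functions are made up for this example. The check at the end confirms they agree on a sample input, which is worth doing before accepting any LLM refactor:

```r
# Before: grows a vector inside a loop, which is slow and harder to read.
# `1:length(x)` also misbehaves when `x` is empty.
squares_of_evens_loop <- function(x) {
  result <- c()
  for (i in 1:length(x)) {
    if (x[i] %% 2 == 0) {
      result <- c(result, x[i]^2)
    }
  }
  result
}

# After: vectorized filter, then transform.
squares_of_evens <- function(x) {
  evens <- x[x %% 2 == 0]
  evens^2
}

# Confirm the rewrite gives the same answer on a sample input.
x <- c(1, 2, 3, 4, 6)
stopifnot(identical(squares_of_evens_loop(x), squares_of_evens(x)))
```

Running `lintr::lint()` on a file containing the first version should flag the `1:length(x)` pattern, assuming lintr's default linters are in use.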

---

## Reducing the Learning Curve for Git and Virtual Environments with LLMs

- LLMs can guide you through using git and virtual environments in R, making it easier to adopt best practices for version control and dependency management
- By asking LLMs for guidance, you can learn the necessary commands, workflows, and concepts for using git and virtual environment tools like `renv`
- LLMs can also help with troubleshooting and interpreting error messages related to git and virtual environments, accelerating your learning process
- Example: "Explain how to set up a git repository for an R project and initialize a virtual environment using renv"
- [slide with example of prompts/responses]
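
The kind of answer you might get back, sketched as an R function. This assumes git is installed and on your PATH and that the `renv` package is installed. The function name `setup_r_project` and its `dry_run` argument are made up for this example; by default it only returns the planned steps so you can review them before running anything.

```r
# Hypothetical helper: put an R project under git and renv.
setup_r_project <- function(path = ".", dry_run = TRUE) {
  plan <- c(
    "git init",                        # start tracking the project with git
    "renv::init()",                    # create a project library and renv.lock
    "git add .",                       # stage the code and the renv files
    "git commit -m 'Set up renv'"      # commit them
  )
  if (dry_run) {
    return(plan)
  }

  # Run the steps inside the project directory, then restore the old one.
  old_wd <- setwd(path)
  on.exit(setwd(old_wd), add = TRUE)

  system2("git", "init")
  renv::init(restart = FALSE)
  system2("git", c("add", "."))
  system2("git", c("commit", "-m", shQuote("Set up renv")))
  invisible(plan)
}

# Review the plan without changing anything on disk.
setup_r_project()
```

After installing or updating packages later, `renv::snapshot()` records the new versions in `renv.lock`, and a collaborator can run `renv::restore()` to reproduce them.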

---

## Conclusion

- Large language models are powerful tools for adopting better coding practices in R more quickly
- They can help with documentation, unit tests, refactoring, and getting started with git and `renv`
- By using LLMs, you can reduce technical debt and improve collaboration, maintainability, and efficiency in your R projects


