# Coding with LLMs: Visual Studio Code and GitHub Copilot

* * * 

<div class="alert alert-success">

### Learning Objectives 

1.  Set up and navigate Visual Studio Code
2.  Take advantage of the main functionalities of GitHub Copilot for coding
3.  Understand some of the strengths and weaknesses of AI coding assistants

</div>

### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the lesson!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning:** Heads-up about tricky stuff or common mistakes.<br>
🤖 **AI Generated**: Code generated by an LLM that we'll test and debug.<br>

### Sections
1. [Getting Comfortable with Visual Studio Code](#section1)
2. [Getting Started with GitHub Copilot](#section2)
3. [Agent Mode with GitHub Copilot](#section3)
4. [In-line Options for GitHub Copilot](#section4)

<h2 id="section1" style="scroll-margin-top: 96px;">Getting Comfortable with Visual Studio Code</h2>

### Installation
This week, we will be getting comfortable with Visual Studio Code (VS Code), which is an **Integrated Development Environment (IDE)**. You can think of an IDE as a powerful text editor that has many additional features to make your life easier while coding. VS Code is particularly popular because it's free, lightweight, and highly customizable with thousands of extensions available.

You have two options for using VS Code:

1. **Use GitHub Codespaces** (what we'll mainly use for this class): GitHub Codespaces allows you to run VS Code in the cloud. This means that you do not need to install anything on your computer to run the analyses for this class. This is a great option if you want to quickly and painlessly run VS Code, or your computer does not have enough memory or processing power to run the analyses for this class.

2. **Install VS Code on your computer**: Download and install VS Code from the [official website](https://code.visualstudio.com/). The installation process is straightforward - just download the installer for your operating system (Windows, macOS, or Linux) and follow the setup instructions. This is a good option if your computer is capable of running the analyses for this class, and you'd like to customize your VS Code for workflows outside of this class.

Setting up the course materials with GitHub Codespaces is super easy. Simply navigate to the [course repository on GitHub](https://github.com/dlab-berkeley/COMPSS-211), click the green "Code" button in the top right, and click the "+" button in the "Codespaces" section indicated by the red circle below:

<center><img src="../../img/github_codespaces.png" width="300"></center>

If you've already created a Codespace, you can always come back to it. When you click the "Code" button, you should also see your available Codespaces. Above, this is indicated by the green rectangle. You can click on this to access your previously created Codespace.

### Anatomy of a VS Code Window
When you start VS Code on your own computer, you will be greeted by a Welcome screen that prompts you to start by opening a file or folder. Click `Open Folder` and open the materials you've downloaded for this class from the GitHub repository. If you're using a Codespace, the course materials will already be loaded.

There are many components to VS Code, but the main ones to keep in mind are:
1. **Activity Bar:** This is the bar on the far left. It links to several panels that may be helpful: the Explorer, a Search function, a Debugging Panel, Extensions, etc.
2. **Explorer:** This is a panel that opens up on the left side next to the Activity Bar. It will show the file structure of your loaded folder. You can use this to open up different files, such as Jupyter Notebooks.
3. **Main Window:** If you click on different files in the Explorer tab, they'll open up in the main window as separate tabs. You can arrange tabs in different ways: for example, drag one tab to one side of the window, and you'll enter side-by-side view.

There are many other components, such as the Terminal, the Secondary Side Bar, etc. For more information, check out the extensive and user-friendly [Visual Studio Code documentation](https://code.visualstudio.com/docs), or you can even ask GitHub Copilot!

### VS Code Extensions

VS Code extensions add powerful features to your coding environment. For this class, we recommend installing the following:

1. Python: Enables Python language support, code completion, linting, and debugging.
2. GitHub Copilot: Provides AI-powered code suggestions and chat assistance directly in your editor.

To install an extension, open the Extensions panel (the square icon on the left sidebar), search for the extension name, and click Install.

If you're using Codespaces, these extensions should already be installed for you. If you're using VS Code on your own computer, make sure both are enabled.

You can also explore other extensions!

### Running code in Visual Studio Code

Now that we have all the extensions we need, we can start running Python code! To run a line of code, click on the line you want to run and press `Ctrl + Enter` or `Cmd + Enter`. To run multiple lines of code, highlight the lines you want to run and press `Ctrl + Enter`  or `Cmd + Enter`.

When running code on Codespaces, you may need to set up a kernel for your Jupyter Notebook. If asked for an interpreter, select "Jupyter Kernels", and select the starred option with the name of this class. If you do not see "Jupyter Kernels" as an option, refresh the page.

<h2 id="section2" style="scroll-margin-top: 96px;">Getting Started with GitHub Copilot</h2>

Now that we have set up Visual Studio Code, we can start using GitHub Copilot!

For our examples, we will be using data from [Gapminder](https://www.gapminder.org/), an educational non-profit. It includes data for 142 countries, with values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

Let's use Pandas to import the Gapminder dataset from the `data` folder.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("../../data/gapminder.csv")
df.head()

🔔 **Question**: What do the two periods `".."` refer to in the file path we used to import gapminder?

⚠️ **Warning**: By default, VS Code will set the working directory (i.e., where we "are" in your computer) to that of the folder you opened to start working in VS Code (in this case, the `week02_working-with-llms` folder). You can check where your working directory is by running `os.getcwd()` ("get current working directory"). Try it out below:

In [None]:
import os
os.getcwd()
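
To see how the `..` segments interact with the working directory, here is a minimal sketch using the standard-library `pathlib` module. It only resolves the relative path used above against wherever the notebook happens to be running, so the printed locations will differ from machine to machine.

```python
from pathlib import Path

cwd = Path.cwd()                              # same location that os.getcwd() reports
relative = Path("../../data/gapminder.csv")   # the path passed to pd.read_csv above
resolved = (cwd / relative).resolve()         # collapse each ".." against the working directory

print("Working directory:", cwd)
print("Resolves to:      ", resolved)
print("File exists here: ", resolved.exists())
```

If the last line prints `False`, the relative path does not match your current working directory, and `pd.read_csv` would fail with that same path.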

### GitHub Copilot Chat

GitHub Copilot Chat allows you to ask GitHub Copilot questions directly. To open the chat, press `Ctrl + Shift + I` (Windows) or `Cmd + Shift + I` (Mac). You can ask Copilot to generate code, explain code, or provide examples in this chat.

You can change the location of your GitHub Copilot chat. By default, GitHub Copilot is on the right hand side. You can toggle it by clicking the "Secondary Side Bar" button in the top right. If you'd like easy access in the "Activity Bar" (the bar on the left hand side), you can drag the "Chat" label to the left hand side. Note that this will move the chat window to the left hand side. If you'd like the window to be an external window that you can move around, click the three dots in the chat toolbar, and select "Open Chat in New Window". Note that you cannot use an external window if you're in Codespaces.

💡 **Tip**: What files and code in a repository can Copilot "see"? This is called the **context window**.

By default, Copilot can see the code or text you have highlighted or are working on in an open file. If you want Copilot to ignore your open file, you can click the eye button next to "Current file" in the chat to disable current file context. 

<center><img src="../../img/context_window.png" width="300"></center>

By default, the other files in your folder are not added to the context window automatically. However, you can tell Copilot to look at other files in your project by clicking the paperclip button in the chat to add files to the context window.

<center><img src="../../img/context_window2.png" width="300"></center>

Generally, the more relevant files you add to the context window, the better the results you'll obtain when you generate code.

### 🥊 Challenge 1: Trying Out Ask Mode in Copilot Chat

Open the chat, highlight the code in the cell below, and ask Copilot to explain it by typing a question into the chat (e.g., "Explain this code"). There is no need to copy the code in; highlighting is enough to set the context window.

In [None]:
df.groupby(['continent', 'year'])['lifeExp'] \
  .mean() \
  .reset_index() \
  .pivot(index='continent', columns='year', values='lifeExp')
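
One way to check an explanation from Copilot is to run the same chain one step at a time on a tiny table where you can compute the answer by hand. The sketch below does that; the `toy` dataframe and its values are made up for illustration and are not taken from Gapminder.

```python
import pandas as pd

# Hypothetical miniature dataset with the same column names as Gapminder
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "year":      [1952,   1952,   1957,   1952,     1957,     1957],
    "lifeExp":   [40.0,   50.0,   48.0,   65.0,     66.0,     70.0],
})

# Step 1: average lifeExp within each (continent, year) group
step1 = toy.groupby(["continent", "year"])["lifeExp"].mean()

# Step 2: turn the group labels back into ordinary columns
step2 = step1.reset_index()

# Step 3: reshape so that continents are rows and years are columns
step3 = step2.pivot(index="continent", columns="year", values="lifeExp")

print(step1, step2, step3, sep="\n\n")
```

Printing each intermediate result makes it easier to judge whether the explanation you were given matches what the code actually does.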

Copilot can also help us debug errors in our code. Below, we have written some code to plot a histogram of the `"gdpPercap"` column in the Gapminder dataframe. However, there are three mistakes in the code. Can you find them?

There are a few ways of doing this. You could highlight the code and ask Copilot Chat to find the errors. You could also run the code, copy the error, and ask the chat to explain why the error occurred. Lastly, you may find that Copilot autocomplete will insist on correcting the errors.

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='gdpPercap', hue='continent')
plt.axline(x=df['gdpPercap'].mean(), fcolor='red')
plt.xlabel('GDP per Capita')
plt.set_title('Histogram of GDP per Capita')
plt.show()

<h2 id="section3" style="scroll-margin-top: 96px;">Agent Mode with GitHub Copilot</h2>

Depending on how you prompt GitHub Copilot, it may offer you code solutions in different ways.

Let's try out an experiment. In the following code block, we're plotting the GDP Per Capita for each continent as a function of time.

In [None]:
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x='year', y='gdpPercap', hue='continent')
plt.show()

First, ask GitHub Copilot to add axis labels and a title. Copilot will likely give you a new block of code in the chat. It may look something like this:

<center><img src="../../img/week2_labels.png" width="300"></center>

If you hover over the code block, you'll see a few options show up:

<center><img src="../../img/apply_in_editor.png" width="300"></center>

These are, in order:

* **Apply in editor**: Insert the code directly in the place where it makes sense.
* **Insert at cursor**: Insert the code where your cursor is located.
* **Copy to clipboard**: Copy the code to your clipboard so you can paste it where you'd like.

**We recommend using either the second or third option**. GitHub Copilot is actively being developed, and some features may not work well. At the time of this writing, **Apply in editor** does not reliably work well in Jupyter Notebooks.

Another reason is that the latter two options encourage you to be more deliberate about where and how you incorporate Copilot's suggestions into your code, ensuring that you understand your code a little better.

Now, let's try and be less prescriptive and allow the LLM more freedom to change our code. Highlight the following block, and ask the LLM in chat to "make the plot look nicer". 

Depending on how you prompt the model, which model you use, and how lucky or unlucky you are, you'll get different things.

In [None]:
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x='year', y='gdpPercap', hue='continent')
plt.show()

<center><img src="../../img/ask_diff.png" width="600"></center>


In the above screenshot, the original lines have a red background and the new lines have a green background. GitHub Copilot is proposing the green lines as changes, which will overwrite the red lines. Using the box in the bottom right (indicated by the pink rectangle), we can either keep or undo the changes.

Go ahead and keep the changes. How does the plot look?

<h2 id="section4" style="scroll-margin-top: 96px;">In-line Options for GitHub Copilot</h2>

In-line chat allows you to interact with GitHub Copilot directly in the line that you're coding on (rather than having to go to the chat window). It might suit your workflow better, since it works in-line. Highlight a section of code and press `Ctrl + I` (or `Cmd + I` on Mac) - Copilot will prompt you for the changes you'd like. Let's work through this in a challenge.

### 🥊 Challenge 2: Experimenting with In-line Chat

In the code cell below, use `Ctrl + I` or `Cmd + I` to enter in-line chat mode. Ask Copilot to:

* Change the background of the plot to your favorite color (e.g. "purple")
* Add x-axis and y-axis labels
* Set the font size of the x- and y-axis labels to 14.

Instead of having to sift through documentation for matplotlib code, you can simply prompt Copilot to handle the details.

Click "Accept" to accept the changes Copilot suggests. You can also click "Discard" to discard the changes, "Rerun Request" to ask Copilot to generate new code, or "Toggle changes" to see the changes Copilot made (the "Toggle Changes" option may appear in the dropdown menu which you can access by clicking the down arrow next to the "Discard" button).

Accept the changes and run the chunk to see your new plot. 

⚠️ **Warning**: Sometimes you may notice that Copilot doesn't do everything you told it to. Copilot interprets your request from limited context, so an ambiguous prompt can be read differently from what you intended. You can always ask it again or make the changes yourself. For example, when you asked to change the background, it may have changed the background of the plot, when what you really wanted was to change the background of the panel, or vice versa; in that case, you would have to update your request to specify which background you want to change.

In [None]:
plt.figure()
sns.histplot(data=df, x='gdpPercap', binwidth=1000)
plt.tight_layout()
plt.show()
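
The warning above distinguishes two different "backgrounds". In matplotlib these correspond to the figure (the whole canvas) and the axes (the panel where the data is drawn). The following is a minimal sketch of the difference, using made-up data rather than Gapminder; the colors and labels are chosen only for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])      # placeholder data, not Gapminder

fig.set_facecolor("lavender")      # background of the whole figure, around the panel
ax.set_facecolor("purple")         # background of the plotting panel itself

ax.set_xlabel("x value", fontsize=14)
ax.set_ylabel("y value", fontsize=14)
# In a notebook the figure is displayed when the cell finishes;
# in a script, call plt.show() to display it.
```

Knowing these two names makes it easier to write a prompt that says exactly which background you mean, and to check which one Copilot actually changed.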

### In-line suggestions
Copilot will also provide in-line suggestions for code, as you type. You can think of this as "code autocomplete." These suggestions are called "ghost text" and appear in a lighter gray text after your cursor, as you're typing. 

You can accept them by pressing `Tab` or ignore them by continuing to type (or press `Esc` to reject them outright). These suggestions can be very helpful for completing code quickly. Suggestions are automatically triggered by the code you write, based on context from the code you have already written and the recent changes you've made.

### 🥊 Challenge 3: Tab Completion
After the first line below, start typing:

```
sns.histplot(
```

and see what suggestions Copilot gives you. You may need to wait a second for the suggestion to appear or continue typing. Press `Tab` to accept the suggestion.

⚠️ **Warning**: Copilot sometimes provides suggestions that are not what you want. You can always ignore the suggestion and continue typing. Copilot is also very eager to offer suggestions (all the time!). Learning how to actively code while using and ignoring Copilot suggestions is a skill in and of itself!

In [None]:
plt.figure()
# YOUR CODE HERE


### 🥊 Challenge 4: In-line Suggestions with Comments

You can also trigger suggestions by providing comments in your code. These comments allow you to give more detailed information and context for the in-line suggestions. Start typing
```
# Calculate average gdpPercap and lifeExp grouped by continent for the year 2007
```
and see what suggestions Copilot gives you. Press `Tab` to accept the suggestion.

💡 **Tip**: You may have to start writing code (e.g., `avg_gdp_per_cap = `) to trigger the suggestions.

In [None]:
# YOUR CODE HERE



### 🥊 Challenge 5: Bringing it All Together

Let's try out different techniques we've learned in an example problem.

The code below is quite a mess! It's hard to tell what it's even doing. First, use "Ask mode" to explain what this code is doing.

In [None]:
gm=df[df.continent=='Asia'].groupby('year')['lifeExp'].mean().reset_index(name='mean');plt.figure();sns.lineplot(data=gm,x='year',y='mean');plt.xlabel('Year');plt.ylabel('Life Expectancy');plt.title('Average Life Expectancy in Asia by Year');plt.show()
gm=df[df.continent=='Europe'].groupby('year')['lifeExp'].mean().reset_index(name='mean');plt.figure();sns.lineplot(data=gm,x='year',y='mean');plt.xlabel('Year');plt.ylabel('Life Expectancy');plt.title('Average Life Expectancy in Europe by Year');plt.show()
gm=df[df.continent=='Oceania'].groupby('year')['lifeExp'].mean().reset_index(name='mean');plt.figure();sns.lineplot(data=gm,x='year',y='mean');plt.xlabel('Year');plt.ylabel('Life Expectancy');plt.title('Average Life Expectancy');plt.show()

Next, use in-line mode, and ask it to clean up the code. Go ahead and copy the above code into the next cell and use in-line mode there, so you can compare the differences:

In [None]:
# YOUR CODE HERE




⚠️ **Warning:** You may find Copilot does different things: it may turn it into a for-loop, or it may keep the code in separate blocks, but make it cleaner and commented. It can be hard to tell what led to different outcomes! The best thing you can do to achieve predictable outputs is to be as prescriptive as possible in your prompting. If you give Copilot more details and constraints, it has fewer gaps to fill in with its own guesses.

We can take this clean-up a step further and ask Copilot to make a function from our repetitive code. Highlight the code and ask Copilot in Agent Mode to reduce repetitiveness in the code by turning it into a function that takes in data and a continent name and outputs a plot. Place the function in the following cell:

In [None]:
# YOUR CODE HERE



In the following cell, use your new function to recreate the original plots. **Do not use GitHub Copilot to do this**.

In [None]:
# YOUR CODE HERE



Let's make this function more flexible. Ask Copilot to make the following changes to the function:
- Accept a list of continents, and plot them on the same plot.
- Accept a list of columns to plot. The plot should adaptively adjust the number of subplots according to the number of columns you give it. 
- Name your function `plot_gap_cols_for_continents`.

Place your function in the following cell:

In [None]:
# YOUR CODE HERE

Finally, test your function. One test is provided below. Unit testing is very important with AI-generated code to ensure it's doing what you expect it to do.

**Add 4 more tests to ensure you have confidence that the function provided by Copilot is doing what you want.**

In [None]:
plot_gap_cols_for_continents(df, ['Asia', 'Europe', 'Africa'], ['lifeExp', 'gdpPercap'])
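
As a rough illustration of what a test for a plotting function can check, the sketch below uses a hypothetical stand-in, `plot_columns_stub`, which is not the function you asked Copilot to write. It assumes the function returns its matplotlib figure, which your own function may or may not do; the `toy` dataframe is also made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_columns_stub(data, columns):
    """Hypothetical stand-in: draws one subplot per column and returns the figure."""
    fig, axes = plt.subplots(1, len(columns), squeeze=False)
    for ax, col in zip(axes[0], columns):
        ax.plot(data["year"], data[col])
        ax.set_title(col)
    return fig

# Made-up data with Gapminder-style column names
toy = pd.DataFrame({
    "year": [1952, 1957],
    "lifeExp": [40.0, 45.0],
    "gdpPercap": [800.0, 900.0],
})

# Test 1: the number of subplots matches the number of requested columns
fig = plot_columns_stub(toy, ["lifeExp", "gdpPercap"])
assert len(fig.axes) == 2

# Test 2: each subplot is labelled with the column it shows
assert [ax.get_title() for ax in fig.axes] == ["lifeExp", "gdpPercap"]

# Test 3: a column that does not exist produces an error instead of a silent, wrong plot
try:
    plot_columns_stub(toy, ["not_a_column"])
    raised = False
except KeyError:
    raised = True
assert raised

plt.close("all")  # release the figures created during the tests
```

Checks like these cover both the expected behavior and an edge case; looking at the rendered plot is still worthwhile, since assertions cannot tell you whether a figure is readable.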

# Stretch Goals

Being able to open up a new dataset, build your own personal mental model of the columns, and quickly develop research questions is a skill that needs to be practiced. With LLMs, it becomes easier to hand off that cognitive work, which can have negative consequences if you do it too often! Let's be purposeful about building reps with data analysis.

FiveThirtyEight maintains a [GitHub repository](https://github.com/fivethirtyeight/data) with many datasets. Pick a dataset that seems interesting, and use the following function to download it. Note that you'll need the full path to the data (folder and file name).

**Without using LLMs, do the following:**

1. Pick a dataset that seems interesting, and import it.
2. Explore the columns and rows of the dataset. How was this data acquired? What is the unit of analysis (rows)?
3. Write down three simple "research questions" that you can answer using a simple data analysis. These do not need to be complicated questions. They simply need to be things that you can answer using the data at hand with some pandas functions.
4. Create 3 plots visualizing some aspect of your data.
5. Write up what you've found, and ask an LLM to critique your findings. You can upload your dataset, screenshots of your images, and your code in order to get feedback.

In [10]:
def load_fte_dataset(dataset_name, delim=','):
    """
    Load a FiveThirtyEight dataset directly from GitHub.

    Parameters:
        dataset_name (str): Path to the dataset within the repository (folder and
            file name), e.g. 'fandango/fandango_score_comparison.csv'
        delim (str): Column delimiter used in the file (default: ',')

    Returns:
        pd.DataFrame: Loaded dataset as a pandas DataFrame
    """
    base_url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/"
    url = base_url + dataset_name
    return pd.read_csv(url, delimiter=delim)

We'll show one simple example using the Fandango dataset. This dataset contains films that, as of 2015, had a Rotten Tomatoes rating, a user rating from Rotten Tomatoes, Metacritic scores, and fan reviews from Fandango.

The rows are movies, and the columns are different scores describing the movie:

In [15]:
df = load_fte_dataset("fandango/fandango_score_comparison.csv")
df.head(10)

| | FILM | RottenTomatoes | RottenTomatoes_User | Metacritic | Metacritic_User | IMDB | Fandango_Stars | Fandango_Ratingvalue | RT_norm | RT_user_norm | ... | IMDB_norm | RT_norm_round | RT_user_norm_round | Metacritic_norm_round | Metacritic_user_norm_round | IMDB_norm_round | Metacritic_user_vote_count | IMDB_user_vote_count | Fandango_votes | Fandango_Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avengers: Age of Ultron (2015) | 74 | 86 | 66 | 7.1 | 7.8 | 5.0 | 4.5 | 3.7 | 4.3 | ... | 3.9 | 3.5 | 4.5 | 3.5 | 3.5 | 4.0 | 1330 | 271107 | 14846 | 0.5 |
| 1 | Cinderella (2015) | 85 | 80 | 67 | 7.5 | 7.1 | 5.0 | 4.5 | 4.25 | 4.0 | ... | 3.55 | 4.5 | 4.0 | 3.5 | 4.0 | 3.5 | 249 | 65709 | 12640 | 0.5 |
| 2 | Ant-Man (2015) | 80 | 90 | 64 | 8.1 | 7.8 | 5.0 | 4.5 | 4.0 | 4.5 | ... | 3.9 | 4.0 | 4.5 | 3.0 | 4.0 | 4.0 | 627 | 103660 | 12055 | 0.5 |
| 3 | Do You Believe? (2015) | 18 | 84 | 22 | 4.7 | 5.4 | 5.0 | 4.5 | 0.9 | 4.2 | ... | 2.7 | 1.0 | 4.0 | 1.0 | 2.5 | 2.5 | 31 | 3136 | 1793 | 0.5 |
| 4 | Hot Tub Time Machine 2 (2015) | 14 | 28 | 29 | 3.4 | 5.1 | 3.5 | 3.0 | 0.7 | 1.4 | ... | 2.55 | 0.5 | 1.5 | 1.5 | 1.5 | 2.5 | 88 | 19560 | 1021 | 0.5 |
| 5 | The Water Diviner (2015) | 63 | 62 | 50 | 6.8 | 7.2 | 4.5 | 4.0 | 3.15 | 3.1 | ... | 3.6 | 3.0 | 3.0 | 2.5 | 3.5 | 3.5 | 34 | 39373 | 397 | 0.5 |
| 6 | Irrational Man (2015) | 42 | 53 | 53 | 7.6 | 6.9 | 4.0 | 3.5 | 2.1 | 2.65 | ... | 3.45 | 2.0 | 2.5 | 2.5 | 4.0 | 3.5 | 17 | 2680 | 252 | 0.5 |
| 7 | Top Five (2014) | 86 | 64 | 81 | 6.8 | 6.5 | 4.0 | 3.5 | 4.3 | 3.2 | ... | 3.25 | 4.5 | 3.0 | 4.0 | 3.5 | 3.5 | 124 | 16876 | 3223 | 0.5 |
| 8 | Shaun the Sheep Movie (2015) | 99 | 82 | 81 | 8.8 | 7.4 | 4.5 | 4.0 | 4.95 | 4.1 | ... | 3.7 | 5.0 | 4.0 | 4.0 | 4.5 | 3.5 | 62 | 12227 | 896 | 0.5 |
| 9 | Love & Mercy (2015) | 89 | 87 | 80 | 8.5 | 7.8 | 4.5 | 4.0 | 4.45 | 4.35 | ... | 3.9 | 4.5 | 4.5 | 4.0 | 4.5 | 4.0 | 54 | 5367 | 864 | 0.5 |


One thing we could ask is: how correlated are Rotten Tomatoes, Metacritic, IMDB, and Fandango scores? We can use the DataFrame method `.corr()` to compute the correlation matrix for these columns.

In [17]:
df[['RottenTomatoes', 'Metacritic', 'IMDB', 'Fandango_Stars']].corr()

| | RottenTomatoes | Metacritic | IMDB | Fandango_Stars |
|---|---|---|---|---|
| RottenTomatoes | 1.0 | 0.95736 | 0.779671 | 0.293988 |
| Metacritic | 0.95736 | 1.0 | 0.727298 | 0.181124 |
| IMDB | 0.779671 | 0.727298 | 1.0 | 0.587295 |
| Fandango_Stars | 0.293988 | 0.181124 | 0.587295 | 1.0 |


The correlations are quite high between Rotten Tomatoes and Metacritic, but rather low with Fandango stars! Could this be due to how the data is structured? The FiveThirtyEight folks also looked at normalized scores. How do the correlations fare for those? (Exercise left to the reader).
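
For readers who want to see what `.corr()` computes, the sketch below compares it with a hand-written Pearson correlation (the default method in pandas). The `toy` scores are invented for illustration and are not taken from the Fandango data.

```python
import numpy as np
import pandas as pd

# Hypothetical critic and user scores for four films
toy = pd.DataFrame({"critic": [10, 40, 60, 90], "user": [30, 35, 70, 80]})

# Correlation as reported by pandas (Pearson by default)
pandas_r = toy.corr().loc["critic", "user"]

# The same quantity by hand: the sum of products of deviations from the mean,
# divided by the square root of the product of the summed squared deviations
x = toy["critic"] - toy["critic"].mean()
y = toy["user"] - toy["user"].mean()
manual_r = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(round(pandas_r, 4), round(manual_r, 4))
```

Because Pearson correlation measures linear association after centering and scaling, it is unaffected by whether a score runs from 0 to 100 or from 0 to 5, but it can change when scores are rounded to a coarse scale.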

Another question: which movies have the biggest difference between the critics' rating and the users' rating on Rotten Tomatoes? Let's check it out:

In [19]:
df['RottenTomatoes_diff'] = (df['RottenTomatoes'] - df['RottenTomatoes_User']).abs()
df[['FILM', 'RottenTomatoes', 'RottenTomatoes_User', 'RottenTomatoes_diff']].sort_values(by='RottenTomatoes_diff', ascending=False).head(30)

| | FILM | RottenTomatoes | RottenTomatoes_User | RottenTomatoes_diff |
|---|---|---|---|---|
| 3 | Do You Believe? (2015) | 18 | 84 | 66 |
| 85 | Little Boy (2015) | 20 | 81 | 61 |
| 105 | Hitman: Agent 47 (2015) | 7 | 49 | 42 |
| 69 | Mr. Turner (2014) | 98 | 56 | 42 |
| 134 | The Longest Ride (2015) | 31 | 73 | 42 |
| 125 | The Wedding Ringer (2015) | 27 | 66 | 39 |
| 132 | Max (2015) | 35 | 73 | 38 |
| 15 | Taken 3 (2015) | 9 | 46 | 37 |
| 19 | Pixels (2015) | 17 | 54 | 37 |
| 51 | Entourage (2015) | 32 | 68 | 36 |


Some movies have pretty big differences!

We can keep going with this. Think of a question - no matter how vague, or pie-in-the-sky - can you distill it down to something you can compute? Use your classmates or LLMs as a sounding board. *But think about it yourself first, before using the LLM as a crutch*.

## Key Points

We learned the following:

-   Visual Studio Code is a code editor that has extensions to allow you to code in various languages (e.g., Python and R) and to use GitHub Copilot.

-   To chat with Copilot, you can open the chat by pressing `Ctrl + Shift + I` or `Cmd + Shift + I`, or use the in-line chat feature (`Ctrl + I` or `Cmd + I`).
  
-   Copilot can provide suggestions as you type, based on the context of your code. Press `Tab` to accept a suggestion.

-   Copilot can help you clean up your code and change repetitive code into functions.

For more information on using GitHub Copilot in VS Code, check out the [official documentation](https://code.visualstudio.com/docs/copilot/overview), which includes many other helpful tips and tricks.