# D-Lab Copilot Assisted Coding Workshop - Python Edition

## Learning Objectives

<div class="alert alert-success">  
    
### Learning Objectives 
    
1. Set up and navigate Visual Studio Code with Python
2. Take advantage of the main functionalities of GitHub Copilot for coding
3. Understand some of the strengths and weaknesses of AI coding assistants
</div>

Throughout this workshop, we will use the following icons:

🔔 **Question**: A quick question to help you understand what's going on.

🥊 **Challenge**: Interactive exercise. We'll go through these in the workshop! Solutions to the challenges can be found in the solutions folder.

⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.

💡 **Tip**: How to do something a bit more efficiently or effectively.

## 1. Getting Comfortable with Visual Studio Code for Python

In our workshop, we will be using GitHub Codespaces to run Visual Studio Code in the cloud. This means that you do not need to install anything on your computer to participate in the workshop. However, since you will likely want to use Visual Studio Code on your own computer in the future, we will walk you through how to set up Visual Studio Code for Python development.

When you start VS Code on your own computer, you will be greeted by a Welcome screen that prompts you to start by opening a file or folder. Click `Open Folder` and open the GitHub-Copilot workshop folder that you downloaded from GitHub.

Now that you have opened a folder, you will see an `Explorer` panel appear with all of the files within the folder you just opened. Click on the `workshop.ipynb` file to open it. You can open and close the `Explorer` panel by clicking on the icon that looks like two pieces of paper in the left-hand sidebar.

💡 **Tip**: When you are on your own computer in Visual Studio Code, you may notice that very long lines of text run off the screen. You can change your VS Code settings so that the text is wrapped automatically. To do this:
1. Press `Ctrl + ,` or `Cmd + ,` to open the settings (you can also click on the cog icon in the bottom left corner of the window and then click `Settings`).
2. Search for the word "wrap" in the search bar.
3. In the dropdown menu for `Editor: Word Wrap`, select `on`.

For more tips and tricks like this, check out the extensive and user-friendly Visual Studio Code documentation (https://code.visualstudio.com/docs) or ask GitHub Copilot!

### Visual Studio Code Extensions for Python
**Extensions** are add-ons that you can install to add additional features and tools to Visual Studio Code. You can find extensions by clicking on the square icon with four squares in the left-hand sidebar to open the `Extensions` panel. We will be using the following extensions in this workshop:
1. Python - This extension provides support for the Python programming language and includes IntelliSense, debugging, and Jupyter notebook support.
2. Jupyter - This extension provides Jupyter notebook support within VS Code.
3. GitHub Copilot - This extension is what gives us access to Copilot in VS Code.

In the search bar, type Python. Click on the Python extension and then click the green `Install` button. Repeat for the Jupyter and GitHub Copilot extensions. When you install the GitHub Copilot extension, you will be prompted to log in to your GitHub account. This is necessary to use Copilot.

### Running code in Visual Studio Code with Jupyter

Now that we have all the extensions we need, we can start running Python code! In Jupyter notebooks, you can run a cell by clicking on it and pressing `Shift + Enter`. You can also click the "Run" button that appears to the left of each code cell.

Run the Python code below and see what happens.

In [None]:
print("Yippee!")

When you run this code you will notice that the output appears directly below the cell. This is one of the advantages of using Jupyter notebooks - the output is displayed inline with your code.

### Installing and importing packages

The next steps are not necessary for this workshop, because we are using the GitHub codespace where everything is set up for you. However, if you are working on your own computer, you will need to install some packages.

First, let's make sure we have the packages we need. We will use pandas for data manipulation, matplotlib and seaborn for visualization, numpy for numerical operations, and scipy, scikit-learn, and statsmodels for the modeling example later in the workshop.

In [None]:
# If you haven't installed these packages before, uncomment the line below
# !pip install pandas matplotlib seaborn numpy scipy scikit-learn statsmodels

💡 **Tip**: The exclamation mark (!) at the beginning of the line tells Jupyter to run the command in the system shell rather than as Python code. This is how we install packages from within a notebook.
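
If you are not sure whether a package is already installed, you can check before running `pip`. The sketch below uses only the Python standard library; the helper name `missing_packages` and the `REQUIRED` list are made up for illustration, and the list assumes the import names used in this workshop (note that scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages used in this workshop (assumed list).
# Note: scikit-learn is imported under the name "sklearn".
REQUIRED = ["pandas", "matplotlib", "seaborn", "numpy", "scipy", "sklearn", "statsmodels"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing packages:", missing_packages(REQUIRED) or "none")
```

Anything listed as missing can then be installed with the `!pip install` line above.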

Let's import the libraries we'll need and test that everything is working:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

# Create a tiny dataframe to test
tiny_df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

# Plot the data
plt.figure(figsize=(6, 4))
plt.scatter(tiny_df['x'], tiny_df['y'])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Test Plot')
plt.show()

## 2. Introduction to GitHub Copilot

Now that we have set up Visual Studio Code, we can start using GitHub Copilot!

For our examples, we will be using data from [Gapminder](https://www.gapminder.org/), an educational non-profit. It includes data for 142 countries, with values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

In [None]:
# Read in data
gap = pd.read_csv("data/gapminder.csv")
gap.head()

⚠️ **Warning**: By default, Jupyter notebooks will set the working directory to the location of the notebook file. This is convenient for accessing data files in the same directory structure. You can check where your working directory is by running the code below:

In [None]:
import os
print(os.getcwd())
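
If `pd.read_csv` cannot find the data file, it can help to see exactly where Python is looking. The sketch below is one way to do that with `pathlib`; the helper name `resolve_data_path` is hypothetical, and it assumes the data lives at `data/gapminder.csv` relative to the working directory, as in the cell above.

```python
from pathlib import Path


def resolve_data_path(relative_path, base_dir=None):
    """Return the absolute path of `relative_path`, resolved against `base_dir`
    (or against the current working directory when `base_dir` is not given)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return (base / relative_path).resolve()


data_path = resolve_data_path("data/gapminder.csv")
print(data_path, "exists:", data_path.exists())
```

If the last value printed is `False`, the working directory is probably not the folder that contains the `data` folder.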

### GitHub Copilot Chat

GitHub Copilot Chat allows you to ask GitHub Copilot questions directly. To open the chat, press `Ctrl + Shift + I` or `Cmd + Shift + I`. You can ask Copilot to generate code, explain code, or provide examples in this chat. For easy access, you can optionally add a "chat" button to your toolbar on the left by dragging the Copilot chat window on the right to your toolbar.

💡 **Tip**: A common question we get is: what can Copilot "see" of my files and code? By default, Copilot can see code that you have highlighted or are working on in the open file. If you want Copilot to ignore your open file, you can click the eye button next to "Current file" in the chat to disable current file context. Copilot cannot see all the files in your folder automatically; however, you can tell Copilot to look at other files in your project by clicking the paperclip button in the chat to add context. Copilot learns from everything you show it, so the more you use it, the better it will get.

### 🥊 Challenge 1:
Open the chat, highlight the code in the cell below, and ask Copilot to explain it by typing a question into the chat (e.g., "Explain this code, please"). There is no need to copy the code into the chat; highlighting it is enough.

In [None]:
gap.groupby(['continent', 'year'])['lifeExp'].mean().reset_index().pivot(index='continent', columns='year', values='lifeExp')
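
One way to check Copilot's explanation is to split the chain into separate steps and inspect each intermediate result. The sketch below does this on a tiny made-up table (the `toy` dataframe and its values are illustrative only, not Gapminder data); the variable names are ours, not part of the original code.

```python
import pandas as pd

# Tiny made-up stand-in for the `gap` dataframe (illustrative values only)
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "year": [1952, 1952, 2007, 1952, 2007, 2007],
    "lifeExp": [40.0, 50.0, 70.0, 65.0, 78.0, 80.0],
})

# Step 1: group rows by continent and year, then average lifeExp within each group
means = toy.groupby(["continent", "year"])["lifeExp"].mean()

# Step 2: turn the group labels back into ordinary columns (long format)
long_table = means.reset_index()

# Step 3: reshape to wide format, with one row per continent and one column per year
wide_table = long_table.pivot(index="continent", columns="year", values="lifeExp")
print(wide_table)
```

Printing `means` and `long_table` as well makes it easy to compare each step against what Copilot told you.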

Copilot can also help us debug errors in our code. Below I have written some code to create a histogram of the "gdpPercap" column in the `gap` dataframe. However, I have made three mistakes in the code. Can you find them? Run the code and then copy the error into Copilot and ask it to help you debug it.

In [None]:
plt.figure(figsize=(10, 6))
plt.hist(gap['gdpPercap'], bins=30, alpha=0.7)
plt.axhline(y=gap['gdpPercap'].mean(), color='red', linestyle='--')
plt.xlabel('GDP per Capita')
plt.title('Histogram of GDP per Capita')
plt.colorbar()  # This doesn't make sense for a histogram
plt.show()

💡 **Tip**: Copilot may suggest various improvements when debugging. This kind of "conversation" is an important part of working with Copilot.

If you hover your cursor over the code chunk generated by Copilot, you will see some buttons in the upper right-hand corner of the cell, including (1) an `Apply in editor` button that will apply the edits to the code in the window, (2) an `Insert at cursor` button that you can click to insert the code at the cursor in your open file, and (3) a `Copy` button that you can click to copy the code to your clipboard. We generally recommend using buttons (2) and (3), as button (1) can sometimes take a while and be a bit glitchy. Use button (2) `Insert at cursor` to insert the corrected code below.

In [None]:
# Add corrected code here

You may notice that in addition to fixing errors, Copilot also made some other improvements to the code. For example, it might have suggested better bin sizes or improved the styling. These improvements can be helpful, but you should always check to make sure these changes make sense for your data.

In [None]:
# Example corrected code
plt.figure(figsize=(10, 6))
plt.hist(gap['gdpPercap'], bins=30, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(x=gap['gdpPercap'].mean(), color='red', linestyle='--', label='Mean')
plt.xlabel('GDP per Capita')
plt.ylabel('Frequency')
plt.title('Histogram of GDP per Capita')
plt.legend()
plt.show()

### In-line Chat
You can start an in-line chat with Copilot by pressing `Ctrl + I` or `Cmd + I`. This will allow you to ask Copilot for help with the code you are currently writing. When you activate in-line chat while highlighting code, Copilot will edit the code in the highlighted area. To exit in-line chat, press `Esc`.

### 🥊 Challenge 2:
Highlight this matplotlib code, press `Ctrl + I` or `Cmd + I`, and ask Copilot to change the background of the plot to your favorite color (e.g. "purple"), increase the text size of the x and y-axis titles to size 14, and make the title center aligned (you can highlight this prompt and copy it word for word into the inline chat). This is super helpful for when you can't remember what the exact syntax is for a specific matplotlib element.

Click "Accept" to accept the changes Copilot suggests. You can also click "Discard" to discard the changes, "Rerun Request" to ask Copilot to generate new code, or "Toggle changes" to see the changes Copilot made.

Accept the changes and run the cell to see your new plot.

In [None]:
plt.figure(figsize=(10, 6))
plt.hist(gap['gdpPercap'], bins=30, color='blue', edgecolor='black')
plt.xlabel('GDP per Capita')
plt.title('Histogram of GDP per Capita')
plt.show()

⚠️ **Warning**: Sometimes you may notice that Copilot doesn't do everything you told it to. This is because Copilot is still learning and may not always understand what you are asking for. You can always ask it again or make the changes yourself. For example, when you asked to change the background, it may have changed the figure background when what you really wanted was to change the plot area background, or vice versa; in that case, you would have to update your request to be more specific.
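
The figure background and the plot-area background are set by two different matplotlib calls, which is why a vague request can go either way. A minimal sketch on made-up numbers (the data and colors are arbitrary choices for illustration):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5, edgecolor="black")

# Figure background: the area around the plot, behind the axis labels and title
fig.patch.set_facecolor("lavender")

# Plot-area (axes) background: the rectangle in which the bars are drawn
ax.set_facecolor("thistle")

ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Frequency", fontsize=14)
ax.set_title("Figure vs. plot-area background", loc="center")
plt.show()
```

Knowing the two terms ("figure" and "axes" or "plot area") lets you write a more specific prompt when Copilot changes the wrong one.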

### In-line suggestions
Another handy feature of Copilot is in-line suggestions as you type. These suggestions are known as "ghost text" and appear in a lighter gray text after your cursor. You can accept them by pressing `Tab` or ignore them by continuing to type (or press `Esc` to reject them outright). These suggestions can be very helpful for completing code quickly. Suggestions are automatically triggered by the code you write, based on context from the code you have already written. You may notice as you use Copilot more often that the suggestions even learn your "style" of coding.

### 🥊 Challenge 3:
Start typing `unique_countries = gap['country'].` and see what suggestions Copilot gives you. You may need to wait a second for the suggestion to appear or continue typing. Press `Tab` to accept the suggestion. This should complete the code to create an array of the unique countries in the `gap` dataframe.

In [None]:
# Start typing here

⚠️ **Warning**: Copilot sometimes provides suggestions that are not what you want. You can always ignore the suggestion and continue typing. Copilot can change over time and give different suggestions based on even minor differences in context clues. You can test this out by deleting the line of code you just wrote and typing it again to see if you get the same suggestion.
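
Several different completions are all plausible after `gap['country'].`, and they do not return the same thing. The sketch below compares three of them on a tiny made-up column (the `toy` dataframe is illustrative only), so you can tell which one a suggestion actually gives you.

```python
import pandas as pd

# Tiny made-up stand-in for the `country` column (illustrative values only)
toy = pd.DataFrame({"country": ["Chile", "Ghana", "Chile", "Japan", "Ghana"]})

# Array of distinct values, in order of first appearance
as_array = toy["country"].unique()

# The same distinct values as a plain Python list
as_list = toy["country"].unique().tolist()

# Only the number of distinct values
how_many = toy["country"].nunique()

print(as_array)
print(as_list)
print(how_many)
```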

### 🥊 Challenge 4:
You can also trigger suggestions by providing comments in your code. These comments allow you to give more detailed information and context for the in-line suggestions. Start typing `# Calculate average gdpPercap and lifeExp grouped by continent for the year 2007` and see what suggestions Copilot gives you. Press `Tab` to accept the suggestion.

💡 **Tip**: You may have to start writing code (e.g., `avg_gap = gap`) to trigger the suggestions.

In [None]:
# Calculate average gdpPercap and lifeExp grouped by continent for the year 2007

Now you have seen how we can use the chat, in-line chat, and in-line suggestions to interact with Copilot. So far we have used some simple examples, but Copilot can also help with more complex code.

### 🥊 Challenge 5:
The code below is a bit untidy (missing proper spacing, no comments, long single lines of code). Highlight it and, in the chat (you can use the in-line or the regular chat), ask Copilot to reformat the code and add comments.

In [None]:
gap_asia=gap[gap['continent']=='Asia'].groupby('year')['lifeExp'].mean().reset_index()
plt.figure(figsize=(8,6))
plt.plot(gap_asia['year'],gap_asia['lifeExp'])
plt.xlabel('Year')
plt.ylabel('Life Expectancy')
plt.title('Average Life Expectancy in Asia by Year')
plt.show()

gap_europe=gap[gap['continent']=='Europe'].groupby('year')['lifeExp'].mean().reset_index()
plt.figure(figsize=(8,6))
plt.plot(gap_europe['year'],gap_europe['lifeExp'])
plt.xlabel('Year')
plt.ylabel('Life Expectancy')
plt.title('Average Life Expectancy in Europe by Year')
plt.show()

gap_oceania=gap[gap['continent']=='Oceania'].groupby('year')['lifeExp'].mean().reset_index()
plt.figure(figsize=(8,6))
plt.plot(gap_oceania['year'],gap_oceania['lifeExp'])
plt.xlabel('Year')
plt.ylabel('Life Expectancy')
plt.show()

You may notice that you have to do some extra work to complete the clean-up; for example, you may want to remove unnecessary or incorrect comments or add titles that are missing.

In [None]:
# Paste new cleaned code here

We can take this clean-up a step further and ask Copilot to make a function from our repetitive code. Highlight the code above and ask Copilot to reduce repetitiveness in the code by turning it into a function that takes in data and a continent name. (For some of you, it may have done this automatically on your first try, depending on how you asked for the clean-up.)

In [None]:
# Paste the new function code here
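
For comparison, here is one possible shape for such a function. It is a sketch, not the answer Copilot will necessarily give you: the function name `plot_life_expectancy` and the made-up `toy` rows are our own choices for illustration, and returning the averages is an extra that makes the result easy to inspect.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_life_expectancy(data, continent):
    """Plot average life expectancy by year for one continent and return the averages."""
    # Keep only the rows for the requested continent
    subset = data[data["continent"] == continent]
    # Average life expectancy across countries for each year
    yearly_mean = subset.groupby("year")["lifeExp"].mean().reset_index()

    plt.figure(figsize=(8, 6))
    plt.plot(yearly_mean["year"], yearly_mean["lifeExp"])
    plt.xlabel("Year")
    plt.ylabel("Life Expectancy")
    plt.title(f"Average Life Expectancy in {continent} by Year")
    plt.show()
    return yearly_mean


# Made-up rows standing in for `gap` (illustrative values only)
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "year": [1952, 1952, 2007, 2007, 1952, 2007],
    "lifeExp": [40.0, 50.0, 70.0, 74.0, 65.0, 78.0],
})
asia_means = plot_life_expectancy(toy, "Asia")
```

In the workshop notebook you would pass `gap` instead of `toy` and call the function once per continent, for example in a loop over `["Asia", "Europe", "Oceania"]`. Because the title is built from the continent name, every plot gets one.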

### Avoiding common pitfalls

A common pitfall when working with Copilot is using it to generate code without understanding what the code is doing. **Just because code runs without an error does not mean it is correct.** Remember, you are the pilot and Copilot is...the copilot; it is crucial to use your knowledge and expertise to guide your coding decisions. To demonstrate this pitfall, let's ask Copilot to do something very broad: figure out what drives life expectancy.

In [None]:
# Using the chat, ask Copilot to determine the driver of life expectancy in our data

You will probably get some code that builds a linear regression model using scikit-learn or statsmodels. For example:

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
import statsmodels.api as sm

# Prepare the data
# Encode categorical variables
le_country = LabelEncoder()
le_continent = LabelEncoder()

gap_encoded = gap.copy()
gap_encoded['country_encoded'] = le_country.fit_transform(gap['country'])
gap_encoded['continent_encoded'] = le_continent.fit_transform(gap['continent'])

# Select features and target
X = gap_encoded[['year', 'pop', 'gdpPercap', 'continent_encoded', 'country_encoded']]
y = gap_encoded['lifeExp']

# Fit the model
X_with_const = sm.add_constant(X)
model = sm.OLS(y, X_with_const).fit()
print(model.summary())

What next steps would you take to assess the validity of this model?

Next, let's check the residuals of the model to see if they are normally distributed:

In [None]:
import scipy.stats as stats

# Get residuals
residuals = model.resid

# Q-Q plot for residuals
plt.figure(figsize=(8, 6))
stats.probplot(residuals, dist="norm", plot=plt)
plt.title('Q-Q Plot of Residuals')
plt.show()
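
A Q-Q plot is a visual check. If you want a numeric complement, one commonly used option is the Shapiro-Wilk test from SciPy, where a small p-value suggests the sample is not normally distributed. The sketch below runs it on two made-up samples rather than on our model; in the notebook you would pass `model.resid` instead. Keep in mind that with many observations the test can flag even small departures from normality, so it is best read alongside the plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up "residuals": one roughly normal sample and one clearly skewed sample
normal_like = rng.normal(loc=0.0, scale=1.0, size=200)
skewed = rng.exponential(scale=1.0, size=200) - 1.0

for name, sample in [("normal-like", normal_like), ("skewed", skewed)]:
    statistic, p_value = stats.shapiro(sample)
    print(f"{name}: W = {statistic:.3f}, p = {p_value:.4f}")
```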

The residuals do not appear to be normally distributed (i.e., the points deviate from the Q-Q line). Let's do some additional data visualization to figure out what is going on:

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Population vs Life Expectancy
axes[0].scatter(gap['pop'], gap['lifeExp'], alpha=0.6)
axes[0].set_xlabel('Population')
axes[0].set_ylabel('Life Expectancy')
axes[0].set_title('Population vs Life Expectancy')

# Year vs Life Expectancy
axes[1].scatter(gap['year'], gap['lifeExp'], alpha=0.6)
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Life Expectancy')
axes[1].set_title('Year vs Life Expectancy')

# GDP per Capita vs Life Expectancy
axes[2].scatter(gap['gdpPercap'], gap['lifeExp'], alpha=0.6)
axes[2].set_xlabel('GDP per Capita')
axes[2].set_ylabel('Life Expectancy')
axes[2].set_title('GDP per Capita vs Life Expectancy')

plt.tight_layout()
plt.show()

We notice that the relationships between some of the variables and life expectancy are non-linear. We are going to focus on population size, because there is a particularly strong non-linear pattern in the data. There appear to be some groups in the data; let's plot the points colored by continent and by country to see if there is a pattern:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# By continent
for continent in gap['continent'].unique():
    subset = gap[gap['continent'] == continent]
    axes[0].scatter(subset['pop'], subset['lifeExp'], label=continent, alpha=0.6)
axes[0].set_xlabel('Population')
axes[0].set_ylabel('Life Expectancy')
axes[0].set_title('Population vs Life Expectancy by Continent')
axes[0].legend()

# By country (too many to show legend)
countries = gap['country'].unique()[:20]  # Show only the first 20 countries for clarity
colors = plt.cm.tab20(np.arange(len(countries)))  # One distinct tab20 color per country
for i, country in enumerate(countries):
    subset = gap[gap['country'] == country]
    axes[1].scatter(subset['pop'], subset['lifeExp'], color=colors[i], alpha=0.6, s=10)
axes[1].set_xlabel('Population')
axes[1].set_ylabel('Life Expectancy')
axes[1].set_title('Population vs Life Expectancy by Country (first 20 countries)')

plt.tight_layout()
plt.show()

We can see that the relationship varies by country, so fitting one line to all of the countries is probably not a good idea. If this were a real analysis, there are additional steps we would take to address this issue. The point of this exercise, however, is that none of the steps we just took to validate our model were suggested by Copilot when we asked it to identify the drivers of life expectancy. If we had just gone along with what Copilot gave us, we might not have discovered the problem with population size.

**Not all incorrect code will result in an error or warning; the most common and insidious code problems are silent**, which is why it is essential to understand what your code is doing. We had to use our knowledge as data scientists to guide our analysis. Copilot is still helpful for creating code to run our models, generate plots, etc., but we need to be the ones in the driver's seat. You are the expert!
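
To make the "one line for all countries" problem concrete, the sketch below fits a straight line to made-up data for two hypothetical countries, once pooled and once per country. The numbers are invented for illustration (they are not Gapminder values), and the helper name `slope` is our own. Both versions run without any error or warning, yet they tell opposite stories.

```python
import numpy as np
import pandas as pd

# Made-up data: within each country, lifeExp rises with pop,
# but the larger country has lower life expectancy overall.
toy = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "pop": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "lifeExp": [70.0, 72.0, 74.0, 50.0, 52.0, 54.0],
})


def slope(x, y):
    """Least-squares slope of y on x (degree-1 polynomial fit)."""
    return np.polyfit(x, y, 1)[0]


# One line through all points, ignoring which country they belong to
pooled_slope = slope(toy["pop"], toy["lifeExp"])

# One line per country
per_country_slopes = {
    name: slope(group["pop"], group["lifeExp"])
    for name, group in toy.groupby("country")
}

print("Pooled slope:", round(pooled_slope, 2))
print("Per-country slopes:", {k: round(v, 2) for k, v in per_country_slopes.items()})
```

The pooled slope is negative while each per-country slope is positive, which is exactly the kind of silent problem that only shows up when you look at the data by group.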

## Key Points

Summary of what was learned in today's workshop:

- Visual Studio Code is a code editor that has extensions to allow you to code in various languages (e.g., Python) and to use GitHub Copilot.

- To chat with Copilot, you can open the chat by pressing `Ctrl + Shift + I` or `Cmd + Shift + I` or use the in-line chat feature (`Ctrl + I` or `Cmd + I`).

- Copilot can provide suggestions as you type, based on the context of your code. Press `Tab` to accept a suggestion.

- Copilot can help you clean up your code and change repetitive code into functions.

- Copilot is still learning and may not always provide the exact code you want. You may need to make changes to the code it provides. The more you use Copilot, the better it will get.

For more information on using GitHub Copilot in VS Code, check out the official documentation, which includes many other helpful tips and tricks: https://code.visualstudio.com/docs/copilot/overview

💡 **Tip**: If you have been going through this workshop in our GitHub Codespace and you want to download your edited file, right-click on the file you want to download (e.g., `workshop.ipynb`) in the `Explorer` panel on the left-hand side and click `Download...` (Note: you may have to click `Allow` on a security pop-up). If you are comfortable with Git, you can also fork this repository and commit your changes to your fork.