# GitHub Copilot Crushes Data Science And ML Tasks: Ultimate Review
## Watch Copilot perform more than 20 common data science tasks with just a comment
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/@harveyvillarino?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Harvey Tan Villarino</a>
        on 
        <a href='https://www.pexels.com/photo/vintage-technology-sport-bike-6503106/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Pexels</a>
    </strong>
</figcaption>

### Setup

import logging
import time
import warnings

import catboost as cb
import datatable as dt
import joblib
import lightgbm as lgbm
import matplotlib.pyplot as plt
import numpy as np
import optuna
import pandas as pd
import seaborn as sns
import shap
import umap
import umap.plot
import xgboost as xgb
from optuna.samplers import TPESampler
from sklearn.compose import *
from sklearn.impute import *
from sklearn.metrics import *
from sklearn.model_selection import *
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import *

logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%d-%b-%y %H:%M:%S", level=logging.INFO
)
optuna.logging.set_verbosity(optuna.logging.WARNING)
warnings.filterwarnings("ignore")
pd.set_option("float_format", "{:.5f}".format)

### Introduction

I said to myself: it is gonna take more than an AI coding assistant to make me switch from PyCharm to VSCode. That was back when the Copilot beta had just been released and was VSCode-exclusive. I was impressed enough to write an article about it ([which went viral](https://towardsdatascience.com/should-we-be-worried-now-that-github-copilot-is-out-12f59551cd95)) but had no immediate plans to start using it, even if I ever got off the ten-mile-long waitlist.

Then, a few days back, I saw someone on Twitter mention Copilot for data science tasks, and I was tempted to try it just once, even if I had to install VSCode. To my delight, I found out that not only had I been cleared from the waitlist, but I could install it in PyCharm as well.

### Testing for common preprocessing tasks

Let's start with something small, like imports:

![](images/1_imports.gif)

A single-line prompt importing Matplotlib is enough for Copilot to suggest imports for other common libraries. Likewise, after a single Sklearn statement, Copilot starts suggesting other classes from the library. Notice that after I changed one of the import statements to load all classes in a module with \*, Copilot picked up the pattern and imported the other modules as a whole.

Then, I focused a bit on preprocessing tasks like below:

![](images/2_split.gif)

Copilot produced the entire function from just a single-line comment asking to split the data into train, validation and test sets, a task commonly done when you have enough data to afford a three-way split.
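
Since the GIF can't be copied from, here is a minimal sketch of what such a function can look like. It is not Copilot's exact output: the function name, the default sizes and the two-step use of `train_test_split` are my own assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_valid_test(X, y, valid_size=0.2, test_size=0.2, random_state=42):
    """Split X and y into train, validation and test sets (hypothetical helper)."""
    # First, carve the test set off the full data.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # valid_size is a fraction of the full data, so rescale it
    # to a fraction of the rows that remain after the first split.
    relative_valid_size = valid_size / (1 - test_size)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_rest, y_rest, test_size=relative_valid_size, random_state=random_state
    )
    return X_train, X_valid, X_test, y_train, y_valid, y_test


# Toy data, just to show the call.
X = pd.DataFrame({"feature": np.arange(100)})
y = pd.Series(np.arange(100) % 2)
X_train, X_valid, X_test, y_train, y_valid, y_test = split_train_valid_test(X, y)
```

With the assumed defaults, this gives a 60/20/20 split. For classification tasks you would probably also want to pass `stratify` to both calls.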

I also used Copilot to write some functions which I almost always use when I open a new dataset:

![](images/6_object_string.gif)

`object` columns in Pandas DataFrames are the worst: they eat your RAM for a living.
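
The function in the GIF isn't reproduced here, so the following is only a sketch of one common way to tame `object` columns: casting those with few distinct values to `category`. The function name and the `max_unique_ratio` threshold are assumptions of mine, not something Copilot generated.

```python
import pandas as pd


def convert_object_columns(df, max_unique_ratio=0.5):
    """Cast low-cardinality object columns to 'category' (hypothetical helper)."""
    df = df.copy()
    for col in df.select_dtypes(include="object").columns:
        # Share of distinct values; guard against empty frames.
        unique_ratio = df[col].nunique() / max(len(df), 1)
        if unique_ratio <= max_unique_ratio:
            df[col] = df[col].astype("category")
    return df


# Toy data: 'city' repeats a lot, 'id' is unique per row.
df = pd.DataFrame(
    {
        "city": pd.Series(["Tashkent", "London", "Tokyo"] * 100, dtype="object"),
        "id": pd.Series([f"user_{i}" for i in range(300)], dtype="object"),
        "amount": range(300),
    }
)
converted = convert_object_columns(df)
before = df.memory_usage(deep=True).sum()
after = converted.memory_usage(deep=True).sum()
```

High-cardinality columns such as `id` are left alone, because turning them into categories saves little or nothing.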

The next function is non-negotiable if you claim to be a real Python lover:

![](images/16_snake_case.gif)
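
For readers who can't pause the GIF, here is a rough sketch of a snake_case converter for column names. The regular expressions and both function names are my own assumptions rather than Copilot's suggestion.

```python
import re

import pandas as pd


def to_snake_case(name):
    """Convert a single column name to snake_case (hypothetical helper)."""
    # Replace runs of whitespace, hyphens and dots with one underscore.
    name = re.sub(r"[\s\-.]+", "_", str(name).strip())
    # Break between a lowercase letter or digit and an uppercase letter: userID -> user_ID
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Break between an acronym and a following capitalized word: HTTPResponse -> HTTP_Response
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Collapse repeated underscores and lowercase everything.
    return re.sub(r"_+", "_", name).lower()


def snake_case_columns(df):
    """Return a copy of df with all column names in snake_case."""
    return df.rename(columns=to_snake_case)


df = pd.DataFrame(columns=["FirstName", "Total Sales", "userID", "HTTPResponse"])
renamed = snake_case_columns(df)
```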

If you read some of my articles last September, you know that I went a bit berserk and went all in on showing [how to work with large datasets efficiently](https://towardsdatascience.com/how-to-work-with-million-row-datasets-like-a-pro-76fb5c381cdd?source=your_stories_page----------------------------------------). In one of them, I discussed how casting columns to the smallest possible subtype can reduce dataset size by up to 90%. Well, I asked Copilot to reproduce that same function, and it rose to the occasion splendidly:

![](images/7_cast_numeric_cols.gif)

Imagine typing this out!
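
The function in the GIF loops over the numeric columns and picks a smaller type for each. As a much shorter stand-in, and not the function from my earlier article or Copilot's output, here is a sketch that leans on `pd.to_numeric` and its `downcast` argument. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def downcast_numeric_columns(df):
    """Downcast integer and float columns to smaller subtypes (hypothetical helper)."""
    df = df.copy()
    # Integers go to the smallest signed integer type that holds every value.
    for col in df.select_dtypes(include=[np.integer]).columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    # Floats go to float32 when the values survive the conversion.
    for col in df.select_dtypes(include=[np.floating]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df


df = pd.DataFrame(
    {
        "small_int": np.array([1, 2, 3], dtype="int64"),
        "big_int": np.array([100_000, 200_000, 300_000], dtype="int64"),
        "ratio": np.array([0.5, 1.25, 2.0], dtype="float64"),
    }
)
small = downcast_numeric_columns(df)
```

Keep in mind that going from `float64` to `float32` trades precision for memory, so check that your columns can tolerate it.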