<center><a href="https://www.nvidia.com/en-us/training/"><img src="https://developer.nvidia.com/sites/default/files/akamai/embedded/images/EDU/DLI%20Asset%20-%20Logo.jpg" width="400" height="186" /></a></center>

# Speed Up DataFrame Operations w/ RAPIDS cuDF
A **DataFrame** is a 2-dimensional data structure used to represent data in a tabular format, like a spreadsheet or SQL table. Originally offered through the Python Data Analysis ([pandas](https://pandas.pydata.org/docs/)) library, DataFrames have become very popular for its familiar representation as well as a robust set of features that are intuitive and expressive. 

Raw data often needs to be maniupulated before it can be used for further purpose such as Business Intelligence, Dashboard Visualization, or Machine Learning. These preprocessing steps can include filtering, merging, grouping, and aggregating. 

Below is a typical data processing pipeline: 
![pipeline](https://github.com/NVDLI/notebooks/blob/kl/cudf_speed_up/images/flow.png?raw=true)

According to [studies](https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/?sh=29f71b266f63), data preparation accounts for ~80% of the work for analysts. This could be due to the rapid increase in the size of data as well as the iterative nature of analytics. 

Recognizing this potential bottleneck, NVIDIA created [cuDF](https://docs.rapids.ai/api/cudf/stable/) that leverages GPU hardware and software to perform data manipulati on tasks with parallel computing, saving valuable time and resources. The cuDF library is part of the larger RAPIDS data science framework that allows for the execution of end-to-end analytics pipelines entirely on GPUs. One of the focus for cuDF and its companion suite of open source software libraries to is to provide familiar APIs, making it very easy to implement. 

This notebook is intended to demonstrate a speed up in data processesing by moving common DataFrame operations to the GPU with minimal changes to existing code. 

###Environment Sanity Check

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.

In [1]:
!nvidia-smi

Mon Aug 16 20:01:20 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

###Setup:
Because the cuDF isn't readily available in this Google Colab environment, it needs to be installed following the below steps: 
1. Updates gcc in Colab
2. Installs Conda
3. Install RAPIDS' current stable version of its libraries
4. Copy RAPIDS .so files into current working directory, a neccessary workaround for RAPIDS+Colab integration.


In [2]:
# This get the RAPIDS-Colab install files and test check your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 282, done.[K
remote: Counting objects: 100% (111/111), done.[K
remote: Compressing objects: 100% (90/90), done.[K
remote: Total 282 (delta 61), reused 39 (delta 21), pack-reused 171[K
Receiving objects: 100% (282/282), 82.35 KiB | 691.00 KiB/s, done.
Resolving deltas: 100% (123/123), done.
***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
***********************************************************************



In [None]:
# This will update the Colab environment and restart the kernel.  Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

Updating your Colab environment.  This will restart your kernel.  Don't Panic!
Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Get:6 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:9 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB]
Hit:10 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:11 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Hit:12 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease
Hit:13 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic InRelease
Get:14 http:/

In [1]:
# This will install CondaColab.  This will restart your kernel one last time.  Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:35
🔁 Restarting kernel...


In [1]:
# you can now run the rest of the cells as normal
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [2]:
# Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py <release> <packages>'
# The <release> options are 'stable' and 'nightly'.  Leaving it blank or adding any other words will default to stable.
# The <packages> option are default blank or 'core'.  By default, we install RAPIDSAI and BlazingSQL.  The 'core' option will install only RAPIDSAI and not include BlazingSQL, 
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

Installing RAPIDS Stable 21.08
Starting the RAPIDS install on Colab.  This will take about 15 minutes.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - cudatoolkit=11.0
    - gcsfs
    - llvmlite
    - openssl
    - python=3.7
    - rapids=21.08


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    abseil-cpp-20210324.2      |       h9c3ff4c_0        1010 KB  conda-forge
    aiohttp-3.7.4.post0        |   py37h5e8e339_0         625 KB  conda-forge
    anyio-3.3.0 

### Loading a Sample Data

In [2]:
import numpy as np # for generating sample data

import pandas as df
import time # for clocking process times
import matplotlib.pyplot as plt # for visualizing results

We start our demonstration by generating two 2-dimensional arrays of random numbers - we've configured for sizeable arrays at 1MM rows by 50 columns each. Once the random arrays are generated, they are converted to DataFrames using `pandas.DataFrame()` or `cudf.DataFrame()`

In [6]:
rows=1000000
columns=50

In [8]:
def load_data():
  start=time.time()
  

  data_a=np.random.rand(rows, columns)
  data_b=np.random.rand(rows, columns)
  dataframe_a=df.DataFrame(data_a, columns=[f'a_{i}' for i in range(columns)])
  dataframe_b=df.DataFrame(data_b, columns=[f'b_{i}' for i in range(columns)])
  

  process_time=time.time()-start
  print(f'The loading process took {process_time:.2f} seconds')
  
  return dataframe_a, dataframe_b, process_time
dataframe_a, dataframe_b, process_time=load_data()
display(dataframe_a.tail(5))
display(dataframe_b.tail(5))

The loading process took 0.73 seconds


Unnamed: 0,a_0,a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,a_13,a_14,a_15,a_16,a_17,a_18,a_19,a_20,a_21,a_22,a_23,a_24,a_25,a_26,a_27,a_28,a_29,a_30,a_31,a_32,a_33,a_34,a_35,a_36,a_37,a_38,a_39,a_40,a_41,a_42,a_43,a_44,a_45,a_46,a_47,a_48,a_49
999995,0.318904,0.719155,0.621336,0.578785,0.759481,0.536744,0.009518,0.845558,0.745016,0.762431,0.807678,0.03456,0.465485,0.428346,0.582307,0.644316,0.747532,0.703226,0.508122,0.924723,0.491353,0.8737,0.217828,0.840306,0.802018,0.79551,0.619,0.012358,0.982013,0.669414,0.343491,0.622467,0.971936,0.553009,0.637301,0.478396,0.732153,0.63897,0.264546,0.216498,0.362455,0.639202,0.269265,0.727496,0.607361,0.432926,0.334657,0.338584,0.248383,0.638332
999996,0.743614,0.207547,0.50269,0.682063,0.141699,0.701187,0.49382,0.013331,0.52493,0.286593,0.162959,0.38123,0.785886,0.380504,0.340126,0.351502,0.66745,0.622897,0.928731,0.508709,0.565148,0.502206,0.004838,0.933403,0.738148,0.53184,0.993282,0.502379,0.846521,0.468781,0.723097,0.830576,0.077326,0.276834,0.976827,0.86065,0.372303,0.683559,0.057739,0.988223,0.237514,0.571661,0.116769,0.965005,0.3606,0.873834,0.369722,0.74962,0.034162,0.885188
999997,0.406007,0.673363,0.179672,0.703205,0.275026,0.714475,0.141729,0.596118,0.536309,0.700972,0.202732,0.171962,0.129728,0.407291,0.828775,0.454696,0.31664,0.509446,0.644177,0.344393,0.203597,0.52069,0.365892,0.929756,0.6775,0.684981,0.281321,0.811872,0.841566,0.796724,0.027367,0.968231,0.607218,0.799007,0.572474,0.274804,0.432311,0.93056,0.638095,0.67993,0.563899,0.947205,0.970118,0.974085,0.314138,0.017844,0.234234,0.67237,0.433267,0.791213
999998,0.50171,0.918499,0.976172,0.098163,0.148817,0.174616,0.413232,0.801049,0.199907,0.580595,0.028105,0.412594,0.29636,0.44427,0.717527,0.745956,0.087767,0.952368,0.371084,0.122601,0.991571,0.749805,0.827415,0.654759,0.399486,0.949504,0.616656,0.232539,0.348774,0.869668,0.0485,0.375804,0.780241,0.611872,0.255986,0.868576,0.795996,0.337512,0.120255,0.21378,0.382574,0.136129,0.573281,0.945882,0.147189,0.721501,0.782056,0.796168,0.249075,0.273406
999999,0.762569,0.275198,0.39373,0.68764,0.107719,0.925889,0.382885,0.890722,0.594749,0.285974,0.438702,0.227009,0.357522,0.406699,0.585929,0.329315,0.651123,0.296486,0.490635,0.274902,0.977867,0.177091,0.169198,0.070896,0.010289,0.110818,0.021514,0.76168,0.975087,0.267417,0.009079,0.213821,0.317828,0.868153,0.242778,0.545486,0.355029,0.717463,0.98989,0.631187,0.404317,0.319294,0.193983,0.660436,0.889094,0.139826,0.995459,0.210791,0.318259,0.543987


Unnamed: 0,b_0,b_1,b_2,b_3,b_4,b_5,b_6,b_7,b_8,b_9,b_10,b_11,b_12,b_13,b_14,b_15,b_16,b_17,b_18,b_19,b_20,b_21,b_22,b_23,b_24,b_25,b_26,b_27,b_28,b_29,b_30,b_31,b_32,b_33,b_34,b_35,b_36,b_37,b_38,b_39,b_40,b_41,b_42,b_43,b_44,b_45,b_46,b_47,b_48,b_49
999995,0.398533,0.745495,0.048269,0.174175,0.942458,0.288487,0.321187,0.321023,0.790255,0.267633,0.188153,0.543379,0.96105,0.564795,0.335185,0.711345,0.683035,0.472411,0.107308,0.276004,0.001867,0.408879,0.493416,0.638562,0.237178,0.53717,0.330519,0.825668,0.279718,0.702933,0.24746,0.635658,0.663372,0.978454,0.418107,0.826985,0.336842,0.54283,0.420464,0.213567,0.553416,0.922767,0.803768,0.644861,0.526865,0.902068,0.387954,0.653876,0.243659,0.415585
999996,0.41256,0.398073,0.474783,0.468066,0.355692,0.692596,0.064374,0.411697,0.601558,0.299044,0.87121,0.259571,0.867832,0.718982,0.769612,0.213357,0.573419,0.974861,0.268415,0.254601,0.399408,0.500262,0.503102,0.600874,0.051754,0.182896,0.97166,0.31121,0.185165,0.100453,0.12155,0.869733,0.881798,0.141848,0.89966,0.742391,0.802278,0.709745,0.748502,0.512911,0.209722,0.800816,0.050963,0.154502,0.293688,0.119943,0.482943,0.413194,0.259578,0.55488
999997,0.973048,0.796254,0.007305,0.533486,0.979714,0.399738,0.320234,0.232313,0.318876,0.033806,0.675117,0.255512,0.537959,0.703583,0.084151,0.914434,0.914033,0.571691,0.814781,0.655165,0.651091,0.639369,0.011729,0.195565,0.180069,0.000507,0.37136,0.137485,0.722037,0.588969,0.579977,0.545799,0.449502,0.314995,0.320353,0.305392,0.428927,0.860219,0.695863,0.678991,0.965006,0.970638,0.884505,0.962075,0.323243,0.45255,0.671191,0.245284,0.61982,0.289844
999998,0.671032,0.568929,0.697375,0.973897,0.704415,0.172174,0.474686,0.120109,0.194576,0.005538,0.878262,0.225641,0.623102,0.605431,0.376539,0.517996,0.742342,0.343875,0.011165,0.275187,0.296763,0.711606,0.122155,0.248879,0.356361,0.07543,0.928236,0.84002,0.654329,0.565137,0.871982,0.52523,0.169743,0.012614,0.078683,0.473031,0.854374,0.016891,0.901806,0.91757,0.916927,0.744254,0.137492,0.702303,0.688583,0.41719,0.208066,0.310772,0.145824,0.090375
999999,0.772556,0.514124,0.88043,0.241636,0.116229,0.5188,0.536131,0.738294,0.413282,0.644818,0.0962,0.716156,0.880033,0.592948,0.172228,0.436634,0.443219,0.952186,0.183746,0.11291,0.111753,0.383369,0.082823,0.166046,0.413047,0.516164,0.145605,0.454013,0.964791,0.819065,0.349986,0.413549,0.979457,0.628706,0.271079,0.667383,0.349647,0.561039,0.429967,0.77767,0.767941,0.623675,0.683315,0.224006,0.869348,0.760768,0.190619,0.887351,0.045752,0.211439


There are times when we have to merge two data sources into one with `DataFrame.merge()`: 

In [10]:
def merge_data(left_df, right_df):
  start=time.time()
  combined_df=df.merge(left_df, right_df, left_index=True, right_index=True)
  process_time=time.time()-start
  print(f'The merging process took {process_time:.2f} seconds')
  return combined_df, process_time
combined_df, process_time=merge_data(dataframe_a, dataframe_b)
print(f'The merged DataFrame has {combined_df.shape[1]} columns')
display(combined_df.head())

The merging process took 1.03 seconds
The merged DataFrame has 100 columns


Unnamed: 0,a_0,a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,a_13,a_14,a_15,a_16,a_17,a_18,a_19,a_20,a_21,a_22,a_23,a_24,a_25,a_26,a_27,a_28,a_29,a_30,a_31,a_32,a_33,a_34,a_35,a_36,a_37,a_38,a_39,...,b_10,b_11,b_12,b_13,b_14,b_15,b_16,b_17,b_18,b_19,b_20,b_21,b_22,b_23,b_24,b_25,b_26,b_27,b_28,b_29,b_30,b_31,b_32,b_33,b_34,b_35,b_36,b_37,b_38,b_39,b_40,b_41,b_42,b_43,b_44,b_45,b_46,b_47,b_48,b_49
0,0.027069,0.586187,0.469475,0.263308,0.624001,0.594076,0.018722,0.147582,0.240646,0.087935,0.6729,0.609745,0.422647,0.869875,0.843771,0.603028,0.463014,0.664444,0.552628,0.517124,0.365855,0.450523,0.177817,0.142698,0.390373,0.917075,0.257009,0.234273,0.858658,0.908269,0.227197,0.766139,0.19773,0.886297,0.861197,0.672055,0.117981,0.187616,0.776417,0.479721,...,0.888178,0.64109,0.719943,0.708839,0.082494,0.27843,0.712987,0.746893,0.166512,0.500913,0.013023,0.982312,0.991773,0.739673,0.00952,0.465901,0.325257,0.203918,0.454083,0.731371,0.898681,0.747613,0.184568,0.859019,0.130529,0.868186,0.596191,0.190599,0.446225,0.208778,0.006531,0.615988,0.912878,0.930075,0.917551,0.771813,0.589415,0.538499,0.958867,0.197634
1,0.21218,0.735726,0.438963,0.023718,0.151715,0.099224,0.409209,0.871139,0.428222,0.825762,0.126581,0.745688,0.281162,0.302851,0.372973,0.142945,0.540088,0.705317,0.901218,0.714506,0.721046,0.315751,0.740448,0.826376,0.17869,0.669898,0.62916,0.575368,0.453994,0.577767,0.3724,0.884414,0.617815,0.902462,0.161379,0.147387,0.20915,0.889256,0.29871,0.757579,...,0.313589,0.803027,0.630594,0.499855,0.441187,0.90702,0.084113,0.33086,0.638842,0.728506,0.193774,0.931305,0.779968,0.138801,0.42056,0.312619,0.535374,0.178688,0.88512,0.548143,0.287048,0.593912,0.781947,0.37189,0.945783,0.570537,0.623201,0.678088,0.669532,0.120271,0.825671,0.795523,0.413151,0.126024,0.362623,0.221271,0.303475,0.093503,0.28201,0.506212
2,0.645612,0.27475,0.607538,0.719801,0.813668,0.477036,0.557303,0.283507,0.23911,0.571467,0.790533,0.853631,0.429088,0.942532,0.466288,0.197721,0.252907,0.20615,0.874068,0.531437,0.930023,0.897828,0.826762,0.330097,0.527383,0.325509,0.06673,0.588707,0.251069,0.201027,0.789375,0.040245,0.830845,0.476775,0.336548,0.09091,0.897105,0.406602,0.465949,0.942874,...,0.498749,0.905389,0.450171,0.691625,0.108852,0.477633,0.323488,0.75919,0.949,0.185845,0.257176,0.617295,0.042793,0.366868,0.611651,0.805319,0.798011,0.672055,0.900219,0.93265,0.442502,0.506266,0.993572,0.501324,0.809956,0.115436,0.819598,0.867321,0.665115,0.91655,0.175187,0.32613,0.13024,0.440953,0.446801,0.402399,0.948066,0.998556,0.529901,0.521304
3,0.526157,0.775339,0.504882,0.304286,0.566336,0.009293,0.824825,0.143542,0.368471,0.445662,0.382847,0.935641,0.558403,0.819303,0.148567,0.078923,0.18811,0.368356,0.626996,0.770864,0.752806,0.784249,0.801045,0.60974,0.761263,0.409608,0.718489,0.443403,0.352863,0.983732,0.283727,0.788488,0.560929,0.684788,0.784494,0.653206,0.910638,0.713787,0.292973,0.480636,...,0.274511,0.337592,0.088625,0.850396,0.813547,0.466742,0.868046,0.040856,0.394904,0.376374,0.669294,0.461666,0.280958,0.152491,0.31239,0.560446,0.749528,0.216254,0.517217,0.146171,0.782342,0.485316,0.607167,0.985303,0.445049,0.535826,0.398367,0.164387,0.647192,0.817998,0.256993,0.358584,0.772372,0.427448,0.83796,0.168399,0.663235,0.217494,0.811491,0.584388
4,0.476371,0.532105,0.314552,0.75329,0.752874,0.85355,0.380807,0.320628,0.179687,0.088485,0.64489,0.488923,0.895439,0.899622,0.272772,0.880648,0.455818,0.48722,0.377481,0.486581,0.756364,0.503515,0.37141,0.10631,0.12226,0.223875,0.439267,0.449156,0.100128,0.928292,0.443095,0.361172,0.706818,0.06094,0.197754,0.368388,0.875865,0.432962,0.454769,0.683262,...,0.049253,0.618764,0.71972,0.073198,0.663322,0.508705,0.774475,0.34039,0.425831,0.974969,0.331137,0.879963,0.159365,0.187973,0.532507,0.963128,0.480436,0.835035,0.596067,0.514089,0.946912,0.273812,0.011468,0.012721,0.938531,0.395389,0.152713,0.424273,0.879404,0.853773,0.829563,0.36821,0.851883,0.317057,0.299929,0.70308,0.500079,0.020449,0.570028,0.408859


Next, we can perform univariate statistics with the `df.describe()` method. Univariate statistics include count, min, max, mean, and standard deviation. 

In [11]:
def summarize(dataframe):
  start=time.time()
  summary_df=dataframe.describe()
  process_time=time.time()-start
  print(f'The summarizing process took {process_time:.2f} seconds')
  return summary_df, process_time
summary_df, process_time=summarize(combined_df)
display(summary_df)

The summarizing process took 4.11 seconds


Unnamed: 0,a_0,a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,a_13,a_14,a_15,a_16,a_17,a_18,a_19,a_20,a_21,a_22,a_23,a_24,a_25,a_26,a_27,a_28,a_29,a_30,a_31,a_32,a_33,a_34,a_35,a_36,a_37,a_38,a_39,...,b_10,b_11,b_12,b_13,b_14,b_15,b_16,b_17,b_18,b_19,b_20,b_21,b_22,b_23,b_24,b_25,b_26,b_27,b_28,b_29,b_30,b_31,b_32,b_33,b_34,b_35,b_36,b_37,b_38,b_39,b_40,b_41,b_42,b_43,b_44,b_45,b_46,b_47,b_48,b_49
count,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,...,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0
mean,0.500251,0.5000858,0.500148,0.5004392,0.500291,0.4999623,0.5000168,0.5003089,0.499745,0.4997406,0.5002977,0.500375,0.499668,0.5001683,0.5001819,0.49988,0.499868,0.499896,0.5001622,0.499982,0.4997071,0.4998466,0.500097,0.499987,0.4996847,0.4997962,0.4996086,0.4997452,0.5006091,0.500463,0.500363,0.5001847,0.5002426,0.500638,0.500045,0.5000103,0.500318,0.4999459,0.4999382,0.500155,...,0.500436,0.4999944,0.5000713,0.5000212,0.50011,0.499488,0.4997467,0.5000759,0.4999556,0.4999778,0.4997686,0.5000516,0.4996013,0.5004106,0.4998386,0.4995978,0.5003543,0.4998198,0.500389,0.4995114,0.500482,0.5003916,0.4998974,0.499459,0.4999516,0.5000277,0.5003805,0.500128,0.4995189,0.499542,0.500382,0.4998537,0.500625,0.4999244,0.4998,0.499959,0.5002642,0.5002359,0.500124,0.5002364
std,0.288795,0.2887168,0.288704,0.2887477,0.288774,0.2886138,0.2887164,0.2887018,0.288662,0.2888058,0.2885634,0.28879,0.288729,0.2886892,0.2887426,0.288888,0.288692,0.2888166,0.2889337,0.288727,0.2885727,0.2890346,0.288863,0.288854,0.2885735,0.2888503,0.2888308,0.2886256,0.2885852,0.288537,0.288578,0.2888234,0.2887765,0.288559,0.288609,0.2888006,0.28898,0.2886704,0.2886713,0.2887133,...,0.2886,0.2885932,0.2885857,0.2887377,0.288627,0.288582,0.2887388,0.2886667,0.2886108,0.2886594,0.2885894,0.2886473,0.2888212,0.2884444,0.2889479,0.2885653,0.2886656,0.2886098,0.288428,0.2887197,0.288624,0.2886005,0.2885968,0.288673,0.2887666,0.2884214,0.2885649,0.288622,0.2885346,0.288658,0.288656,0.2886238,0.288579,0.288734,0.288505,0.28855,0.2887228,0.2885012,0.288812,0.2884807
min,2e-06,1.505385e-08,2e-06,2.312953e-07,3e-06,8.820713e-08,3.759098e-07,5.807722e-07,1e-06,9.188132e-07,5.509986e-08,1e-06,2e-06,6.248977e-07,2.138551e-08,2e-06,2e-06,6.051017e-07,6.717522e-07,2e-06,9.089906e-08,9.28867e-07,2e-06,5e-06,7.952429e-07,1.427398e-07,9.627895e-07,1.484832e-07,3.786733e-07,2e-06,2e-06,5.78935e-07,1.53463e-07,1e-06,1e-06,6.9639e-08,4e-06,7.64622e-07,6.367182e-07,5.571519e-08,...,3e-06,2.532708e-07,3.670647e-07,7.669532e-07,2e-06,2e-06,2.023182e-08,9.679254e-07,3.375486e-07,7.996245e-07,2.472573e-07,1.430043e-07,4.135148e-07,1.604828e-07,2.147024e-07,3.289426e-07,3.694465e-07,6.605615e-07,1e-06,2.910854e-07,3e-06,2.456015e-07,1.582303e-07,2.224776e-07,3.499109e-07,1.199961e-07,2.836836e-07,2e-06,5.25276e-07,1e-06,2e-06,5.140083e-07,2e-06,4.267662e-07,1e-06,1e-06,1.960486e-07,2.823017e-07,1e-06,7.991642e-07
25%,0.249893,0.2499654,0.250249,0.2504189,0.250099,0.2500635,0.2499038,0.2502986,0.249805,0.2496466,0.2503645,0.250327,0.249373,0.2501539,0.2502047,0.249814,0.250025,0.2496491,0.2494603,0.249416,0.249737,0.2489752,0.24965,0.249657,0.2500043,0.2493198,0.2495936,0.2498092,0.2505979,0.250742,0.250787,0.2500141,0.2502288,0.251135,0.250308,0.2496803,0.249661,0.2498102,0.2498235,0.2504229,...,0.250659,0.2499319,0.2503448,0.2497809,0.250315,0.249323,0.2495624,0.2505176,0.2500536,0.2497541,0.2498549,0.249981,0.2492976,0.2509448,0.249604,0.249869,0.2501477,0.2500698,0.251029,0.2493681,0.250463,0.2505802,0.2499191,0.2491159,0.2497529,0.2506206,0.2505339,0.250091,0.2500864,0.249701,0.250257,0.2498095,0.250987,0.2501986,0.250025,0.250286,0.2502588,0.2509013,0.249822,0.250423
50%,0.500379,0.5001788,0.500347,0.5007966,0.500292,0.4997331,0.5000781,0.5001457,0.499623,0.499447,0.5005883,0.500465,0.499769,0.5002276,0.5000216,0.499728,0.500062,0.5000418,0.5008636,0.50026,0.4994048,0.4998833,0.500321,0.500322,0.4995302,0.4994904,0.4993509,0.4995158,0.5013476,0.500679,0.500665,0.500154,0.5004136,0.500623,0.500115,0.5001809,0.500248,0.4999288,0.5001614,0.4999906,...,0.500628,0.5002817,0.5001069,0.5005478,0.499967,0.499456,0.499861,0.4999437,0.4998088,0.4995773,0.499694,0.4999816,0.499692,0.5004099,0.4998847,0.4992711,0.5007983,0.4999822,0.500152,0.4996688,0.500425,0.5005934,0.5003876,0.4991154,0.4997222,0.4997503,0.5004804,0.50037,0.4988868,0.498968,0.500656,0.4998072,0.500975,0.4997262,0.499644,0.49955,0.5003291,0.5002453,0.500393,0.500355
75%,0.750767,0.7501525,0.750239,0.7503875,0.750657,0.7497991,0.7499641,0.7504348,0.749798,0.7497488,0.7502707,0.750733,0.749493,0.7501938,0.7505857,0.75051,0.750001,0.749866,0.7503692,0.749762,0.7495855,0.7502781,0.750411,0.750041,0.7494719,0.7504818,0.7496643,0.749584,0.7505205,0.750331,0.750266,0.7506557,0.7503313,0.750657,0.750026,0.7499615,0.75091,0.7500501,0.7495817,0.7501762,...,0.750429,0.749966,0.7499619,0.7497726,0.750103,0.749229,0.7497092,0.7497062,0.7496638,0.7501891,0.7496996,0.7497134,0.7495021,0.7499744,0.7507414,0.7495362,0.7503897,0.7495657,0.750401,0.7496216,0.75082,0.7499871,0.7496305,0.7494701,0.7502158,0.749816,0.7500817,0.750451,0.7491411,0.749541,0.750629,0.7498698,0.750262,0.7497678,0.749661,0.74964,0.7501647,0.7501763,0.750103,0.7499955
max,0.999998,0.9999995,0.999999,0.9999994,0.999999,0.9999998,0.9999977,1.0,1.0,0.9999998,0.9999985,1.0,1.0,0.9999998,0.9999964,0.999998,1.0,0.9999989,0.9999983,0.999998,0.9999981,0.9999999,0.999999,1.0,0.9999996,0.9999975,0.9999986,0.9999999,0.999999,0.999998,1.0,0.9999992,0.9999986,1.0,1.0,0.9999998,0.999999,0.999999,0.9999971,0.9999986,...,0.999997,0.9999988,0.9999959,0.9999976,0.999998,0.999998,0.9999991,0.9999956,1.0,0.9999996,0.9999989,0.9999964,0.9999992,1.0,0.9999991,0.9999996,1.0,0.9999996,0.999999,0.9999985,0.999999,0.9999987,0.9999926,0.9999995,0.9999976,0.9999997,0.9999991,0.999998,0.9999998,1.0,0.999998,0.9999995,0.999998,0.9999998,0.999996,0.999999,0.9999993,0.9999999,0.999999,0.9999996


We can perform bi-variate statistics such as correlation between two variables using the `df.corr()` method to find out if there are highly correlated variables: 

In [54]:
def correlation(dataframe): 
  start=time.time()
  corr_df=dataframe.corr()
  process_time=time.time()-start
  print(f'The correlation process took {process_time:.2f} seconds')
  return var_df, process_time
corr_df, process_time=correlation(combined_df)
display(var_df.head())

The correlation process took 18.78 seconds


Unnamed: 0,a_0,a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,a_13,a_14,a_15,a_16,a_17,a_18,a_19,a_20,a_21,a_22,a_23,a_24,a_25,a_26,a_27,a_28,a_29,a_30,a_31,a_32,a_33,a_34,a_35,a_36,a_37,a_38,a_39,...,b_10,b_11,b_12,b_13,b_14,b_15,b_16,b_17,b_18,b_19,b_20,b_21,b_22,b_23,b_24,b_25,b_26,b_27,b_28,b_29,b_30,b_31,b_32,b_33,b_34,b_35,b_36,b_37,b_38,b_39,b_40,b_41,b_42,b_43,b_44,b_45,b_46,b_47,b_48,b_49
a_0,1.0,-0.003012,-0.002597,0.002352,-0.000464,-0.001732,-0.001681,0.002229,-0.004885,-0.000468,-0.003782,-0.001525,0.006607,0.003943,0.0034,-0.001136,-0.000155,0.002659,0.002231,-2.1e-05,0.000193,-0.0004,-0.000506,0.000747,0.001584,-0.003192,-0.005026,0.001375,-0.000144,-0.000845,0.001094,-8.8e-05,-0.000862,-0.000833,0.005281,0.000251,-0.001471,-0.003389,-0.000422,-0.000789,...,-0.000702,-1.6e-05,-0.000375,0.004152,-0.002723,0.00071,-0.003697,0.002051,-0.006471,-0.00031,-0.001411,0.004546,0.004072,-0.005238,-0.007091,-0.000717,-0.000369,0.000317,0.003071,-0.002569,0.002292,0.002836,0.001318,0.00384,-0.002781,-0.003812,0.000242,-0.000799,-0.00366,0.000822,-0.001663,-0.000126,0.002807,-0.003147,0.000277,0.001923,-0.002946,0.002509,0.003357,-0.000383
a_1,-0.003012,1.0,-0.000842,0.005502,-0.00267,-0.000705,-0.002787,-0.004085,0.004894,0.000421,0.002037,-0.00515,-0.000852,-0.000521,-0.00226,5.1e-05,0.003565,0.000777,-0.005649,0.00083,0.000526,0.005362,-0.003351,0.003855,-0.005331,0.005151,0.001369,-0.000806,-0.003105,0.004756,-0.003323,0.002013,-0.004569,-0.001413,-0.002517,-0.001747,0.001465,-0.001127,-0.000511,-0.000785,...,-0.000389,0.001358,0.004232,0.003234,-3.4e-05,0.004494,-0.000544,0.000834,-0.000795,0.009153,0.003491,-0.003963,0.005013,0.002315,-0.00396,-0.005212,-0.000518,0.003725,0.001555,-0.002797,0.003884,-0.004629,-0.004173,0.000587,-0.003426,9.9e-05,-0.001041,0.004291,0.000574,0.001202,-0.001846,0.002765,0.003257,0.001635,-0.001896,-0.002036,0.003804,0.003406,-0.00109,0.001169
a_2,-0.002597,-0.000842,1.0,0.003369,0.001364,-0.000952,-0.00259,-0.005196,-0.00228,-0.005434,-0.003453,0.000515,0.001113,0.006953,0.002178,0.001092,-0.001271,-6.8e-05,-0.000975,-0.000171,0.001962,-0.000501,5.9e-05,-0.006556,0.002381,-0.005294,0.00135,-0.002051,0.00288,0.000743,0.00063,-0.000616,0.001057,0.001725,0.001943,-0.004638,-0.002407,-0.001905,-0.003029,0.006077,...,-0.001625,-0.001743,-0.000324,-0.002154,-0.001287,-0.002239,-0.006965,0.002382,0.000485,-0.001562,0.003951,-0.006896,-0.000816,-0.00144,0.000648,-0.004787,0.002931,-0.002775,-0.002224,-0.001544,-0.006853,0.000216,0.001599,0.004352,0.001984,0.004093,0.000558,-0.001192,0.00091,-0.000374,0.004306,0.001996,0.00225,0.00087,-0.002168,-0.000193,0.002518,-0.004443,0.000242,-0.001423
a_3,0.002352,0.005502,0.003369,1.0,-0.003523,-0.003311,-0.002491,-0.003158,0.001393,-0.001254,0.001036,-0.001206,-0.004394,-0.003498,0.004079,-0.001738,0.001174,-0.003493,0.007267,-0.002169,0.002209,0.000706,0.000552,-0.001953,-0.0021,-0.005596,-0.001846,0.002313,-0.003486,0.003141,0.003999,0.000251,-0.003841,0.002846,0.001147,-0.001448,-0.0009,0.001389,0.007764,-0.00409,...,-0.003835,-0.00399,0.000211,0.009248,-0.001373,-0.005572,0.000265,1.5e-05,-0.000867,0.004018,0.001936,0.002722,0.002503,-0.004122,0.000371,-9.4e-05,-0.000382,0.000589,0.002904,3.7e-05,0.00092,-0.000587,0.000712,0.005953,-0.000655,-0.000435,0.001177,-0.002581,-0.002064,-0.003693,0.000196,0.001682,-0.002227,-0.006521,0.005852,-0.003252,0.002756,-0.001555,-0.001146,-0.000972
a_4,-0.000464,-0.00267,0.001364,-0.003523,1.0,0.000877,-0.00286,0.001609,-0.001288,0.001536,-0.005412,0.00012,0.007046,0.000742,-0.001196,0.003832,0.003736,-0.001998,-0.00045,-0.004206,0.002281,0.003528,0.003038,0.001509,-0.003898,0.000917,-0.000195,0.001171,-0.003877,-0.002982,0.00218,0.004771,0.002471,0.004019,-0.000496,0.006667,0.002698,0.00074,0.00207,0.002945,...,-0.004346,-0.004387,0.001277,0.007372,0.000967,-0.005029,0.004326,-0.000578,0.004261,0.006717,0.002566,0.000111,-0.004556,0.001465,0.003069,0.000377,0.002199,0.001874,0.000695,0.002988,0.000919,0.000472,-0.005537,0.003283,0.001156,0.000423,0.001377,0.003127,0.003569,-0.004023,-0.001752,-0.004864,0.001745,-0.000546,0.001449,0.007738,0.001006,-0.001154,-0.00062,0.004577


We can potentially simplify our analysis by binning and grouping continuous variables with the `pd.cut()` and `cudf.cut()` methods, followed by `df.groupby()` and `df.mean()`: 

In [55]:
def groupby_summarize(dataframe):
  start=time.time()
  dataframe['group']=df.cut(dataframe.iloc[:, 0], 10)
  group_describe_df=dataframe.groupby('group').mean()
  process_time=time.time()-start
  print(f'The grouping process took {process_time:.2f} seconds')
  return group_describe_df, process_time
group_describe_df, process_time=groupby_summarize(combined_df)
display(group_describe_df.head())

The grouping process took 0.69 seconds


Unnamed: 0_level_0,a_0,a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,a_13,a_14,a_15,a_16,a_17,a_18,a_19,a_20,a_21,a_22,a_23,a_24,a_25,a_26,a_27,a_28,a_29,a_30,a_31,a_32,a_33,a_34,a_35,a_36,a_37,a_38,a_39,...,b_10,b_11,b_12,b_13,b_14,b_15,b_16,b_17,b_18,b_19,b_20,b_21,b_22,b_23,b_24,b_25,b_26,b_27,b_28,b_29,b_30,b_31,b_32,b_33,b_34,b_35,b_36,b_37,b_38,b_39,b_40,b_41,b_42,b_43,b_44,b_45,b_46,b_47,b_48,b_49
group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
"(-0.000999, 0.1]",0.049904,0.500438,0.500178,0.500555,0.500297,0.498944,0.499997,0.50005,0.499569,0.499053,0.499991,0.500241,0.50008,0.498904,0.500805,0.499859,0.499025,0.497786,0.499039,0.500743,0.500408,0.499654,0.500921,0.501114,0.49925,0.499201,0.501049,0.50044,0.498975,0.5001,0.501464,0.49969,0.500435,0.50157,0.500376,0.498767,0.501504,0.50068,0.499575,0.499573,...,0.500911,0.501573,0.500459,0.50057,0.499601,0.500003,0.500813,0.498926,0.499849,0.501081,0.500366,0.499377,0.499556,0.500488,0.500165,0.500036,0.499959,0.498949,0.500876,0.498906,0.499407,0.499619,0.501956,0.499705,0.501906,0.498536,0.500377,0.500732,0.499553,0.49917,0.500073,0.500096,0.500625,0.500907,0.500498,0.499858,0.500571,0.499577,0.500623,0.500788
"(0.1, 0.2]",0.150061,0.497524,0.501016,0.498858,0.500344,0.500765,0.498467,0.498362,0.501242,0.500993,0.500804,0.500043,0.50004,0.500088,0.501124,0.499732,0.500278,0.500235,0.50059,0.500656,0.499541,0.499346,0.499614,0.500867,0.500924,0.499698,0.500224,0.499824,0.499074,0.501248,0.499348,0.501745,0.500543,0.499193,0.500446,0.499684,0.500889,0.498265,0.499146,0.502691,...,0.499999,0.498883,0.499739,0.499823,0.499599,0.499199,0.499415,0.499226,0.49792,0.499817,0.499788,0.501387,0.500274,0.499308,0.500962,0.499712,0.498395,0.500015,0.499456,0.498596,0.49907,0.500694,0.50087,0.499753,0.500774,0.498788,0.499393,0.501564,0.499027,0.500828,0.499819,0.501996,0.499357,0.500594,0.499241,0.50031,0.500702,0.500676,0.500551,0.499105
"(0.2, 0.3]",0.249956,0.500272,0.499812,0.500456,0.502087,0.499388,0.500677,0.499344,0.501242,0.498874,0.499195,0.500221,0.500714,0.499679,0.501081,0.498358,0.500037,0.500355,0.501058,0.500564,0.501343,0.500927,0.499159,0.50059,0.499184,0.50041,0.500646,0.50068,0.500218,0.499961,0.499725,0.501521,0.500001,0.500716,0.498774,0.499764,0.502253,0.500415,0.500988,0.499388,...,0.501735,0.501445,0.499683,0.498888,0.501171,0.499866,0.500447,0.500573,0.501113,0.500726,0.500605,0.499889,0.500505,0.499593,0.50071,0.498654,0.500071,0.499663,0.499397,0.501692,0.49866,0.499422,0.498719,0.500681,0.50059,0.500385,0.499178,0.498825,0.499583,0.499894,0.502264,0.500801,0.500252,0.498391,0.501968,0.499986,0.500814,0.50038,0.500273,0.49958
"(0.3, 0.4]",0.350046,0.500191,0.498973,0.501985,0.499152,0.499621,0.499466,0.500781,0.499378,0.499185,0.502645,0.500966,0.502017,0.499989,0.501062,0.501672,0.499106,0.499492,0.501615,0.499317,0.500183,0.500508,0.500963,0.502042,0.499218,0.49973,0.498793,0.498116,0.500296,0.500956,0.500335,0.49934,0.498474,0.499808,0.499776,0.498727,0.500208,0.498864,0.500036,0.498964,...,0.499302,0.499606,0.499117,0.498577,0.49951,0.499851,0.499186,0.500299,0.50003,0.501345,0.500316,0.499632,0.499132,0.502457,0.499118,0.499213,0.50046,0.500379,0.498942,0.500839,0.500553,0.500398,0.49794,0.499248,0.500292,0.50083,0.499553,0.498825,0.501152,0.500965,0.500644,0.499439,0.500267,0.4994,0.500115,0.49913,0.499702,0.498828,0.498836,0.500294
"(0.4, 0.5]",0.44993,0.499969,0.500068,0.498963,0.499154,0.50074,0.499843,0.50073,0.498742,0.500893,0.499393,0.499305,0.499889,0.498777,0.500666,0.499846,0.501766,0.499297,0.499957,0.500412,0.499538,0.500182,0.500579,0.500088,0.500231,0.501171,0.499441,0.50027,0.498628,0.49824,0.499132,0.50067,0.501089,0.500473,0.499427,0.499733,0.500423,0.500917,0.498332,0.498984,...,0.501085,0.499252,0.498985,0.499945,0.500166,0.499942,0.500168,0.500764,0.49975,0.499608,0.499614,0.499343,0.498898,0.500868,0.500162,0.499423,0.499923,0.499558,0.499691,0.49923,0.500002,0.499804,0.501147,0.499552,0.500553,0.50031,0.49893,0.500696,0.500355,0.49913,0.499877,0.50147,0.497791,0.500593,0.500664,0.500255,0.499909,0.50063,0.500784,0.499815


We can measure the total elapsed time for this sample data processing workflow. 

In [17]:
def pipeline():
  performance={}
  dataframe_a, dataframe_b, performance['load data']=load_data()
  combined_df, performance['merge data']=merge_data(dataframe_a, dataframe_b)
  _, performance['summarize']=summarize(combined_df)
  _, performance['correlation']=correlation(combined_df)
  _, performance['groupby_summarize']=groupby_summarize(combined_df)
  try: 
    df.DataFrame([performance], index=['gpu']).to_pandas().plot(kind='bar', stacked=True)
  except: 
    df.DataFrame([performance], index=['cpu']).plot(kind='bar', stacked=True)
  return None
import pandas as df
pipeline()

The loading process took 0.71 seconds
The merging process took 0.95 seconds
The summarizing process took 3.71 seconds


NameError: ignored

### Switching to GPU
Traditionally, these tasks are frequently done using the popular pandas library, which only runs on a single CPU. NVIDIA's [cuDF](https://docs.rapids.ai/api/cudf/stable/) library was built with the users in mind - by offering nearly identical syntax to its CPU counterpart, developers only have to make few changes to their existing code to take advantage of its capabilities. 

In [12]:
import cudf as df

ModuleNotFoundError: ignored

**That's it!** cuDF uses nearly identical syntax to the familiar [pandas](https://pandas.pydata.org/) API. **Brilliant!** 

In [16]:
pipeline()

The loading process took 0.78 seconds
The merging process took 0.97 seconds
The summarizing process took 4.04 seconds


NameError: ignored

### Comparison Results
In a trial run, cuDF completes the data processing tasks in nearly 4x faster than pandas. The expectations is that it will be even more significant as the size of the data becomes largers. Feel free to give it a try by modifying the row and size of the data. 

### Conclusion
Congratulations on completing the notebook! Want to learn more about cuDF and the rest of the RAPIDS framework? Check out the follow-up to this course, [Accelerating End-to-End Data Science Workflows]('https://courses.nvidia.com/courses/course-v1:DLI+S-DS-01+V1/about') or our other online courses at [NVIDIA DLI]('https://www.nvidia.com/en-us/training/online/').