<a href="https://colab.research.google.com/github/ashishpatel26/Rapidsai_Machine_learning_on_GPU/blob/main/10_Min_to_Pandas_and_Cudf_Comparision_on_GPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://github.com/ashishpatel26/Rapidsai_Machine_learning_on_GPU/raw/main/images/rapidsailogo.jpg?raw=true)

🤩**About Rapids**🤩

- The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
![](https://github.com/rapidsai/cudf/raw/branch-21.08/img/rapids_arrow.png)

### **What are these Libraries?**
* [cuDF](https://github.com/rapidsai/cudf) is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating tabular data using a DataFrame style API.

* [Dask](https://dask.org/) is a flexible library for parallel computing in Python that makes scaling out your workflow smooth and simple. On the CPU, Dask uses Pandas to execute operations in parallel on DataFrame partitions.

* [Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) extends Dask where necessary to allow its DataFrame partitions to be processed by cuDF GPU DataFrames as opposed to Pandas DataFrames. For instance, when you call dask_cudf.read_csv(…), your cluster’s GPUs do the work of parsing the CSV file(s) with underlying cudf.read_csv().

### **When to use cuDF and Dask-cuDF**
* If your workflow is fast enough on a single GPU or your data comfortably fits in memory on a single GPU, you would want to use cuDF. If you want to distribute your workflow across multiple GPUs, have more data than you can fit in memory on a single GPU, or want to analyze data spread across many files at once, you would want to use Dask-cuDF.
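
The rule of thumb above can be written down as a small helper. This is only an illustrative sketch: `choose_backend`, its parameters, and the 50% memory headroom are hypothetical and not part of RAPIDS.

```python
def choose_backend(data_bytes, gpu_mem_bytes, want_multi_gpu=False,
                   many_files=False, headroom=0.5):
    """Return "cudf" or "dask_cudf" following the rule of thumb above.

    headroom is an assumed safety factor: joins, sorts and other intermediate
    results need extra memory, so only a fraction of the GPU memory is
    treated as available for the input data.
    """
    fits_on_one_gpu = data_bytes <= gpu_mem_bytes * headroom
    if fits_on_one_gpu and not want_multi_gpu and not many_files:
        return "cudf"
    return "dask_cudf"


GiB = 1024 ** 3
small = choose_backend(2 * GiB, 16 * GiB)    # data fits comfortably on one GPU
large = choose_backend(40 * GiB, 16 * GiB)   # data exceeds a single GPU
print(small, large)
```

The headroom value is a guess; the right fraction depends on the operations in the workflow.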



# 10 Min to Pandas and cuDF Comparison on GPU

🎯 Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

🎯 cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
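
As a rough illustration of what "pandas-like" means in practice, the sketch below runs the same DataFrame calls on whichever library can be imported. The alias `xdf` and the toy loan data are made up for this example.

```python
# Use cuDF when it imports cleanly, otherwise fall back to pandas on the CPU.
try:
    import cudf as xdf
except Exception:  # cuDF not installed, or no usable GPU
    import pandas as xdf

loans = xdf.DataFrame(
    {
        "state": ["Bihar", "Odisha", "Bihar", "Kerala"],
        "loan_amount": [5000, 1000, 7000, 2500],
    }
)

# The same calls work with either backend.
per_state = loans.groupby("state")["loan_amount"].sum().sort_index()
bihar_total = int(per_state.loc["Bihar"])
print(xdf.__name__, bihar_total)
```

The two libraries are not identical in every detail (null handling, for example, differs by default), so code shared between them is best checked on both.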

### **1.Installation of cuDF**

> 🏹**Conda installation** - 📥Recommended

```console
# for CUDA 11.0
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=21.06 python=3.7 cudatoolkit=11.0

# or, for CUDA 11.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=21.06 python=3.7 cudatoolkit=11.2
```
> 🏹**For Nightly Version**

```console
# for CUDA 11.0
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=11.0

# or, for CUDA 11.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=11.2
```
> 🏹**PIP Installation**

```console
!pip install cudf
```

> 🏹**Colab Installation** - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1rY7Ln6rEE1pOlfSHCYOVaqt8OvDO35J0#forceEdit=true&offline=true&sandboxMode=true)

In [None]:
# Install RAPIDS
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!bash rapidsai-csp-utils/colab/rapids-colab.sh stable

import sys, os

dist_package_index = sys.path.index('/usr/local/lib/python3.7/dist-packages')
sys.path = sys.path[:dist_package_index] + ['/usr/local/lib/python3.7/site-packages'] + sys.path[dist_package_index:]
sys.path
exec(open('rapidsai-csp-utils/colab/update_modules.py').read(), globals())

### **2.Import Libraries**

In [5]:
import os
import cupy as cp
import pandas as pd
import cudf
import dask_cudf
import time

cp.random.seed(2021)

In [4]:
print(cudf.__version__)
print(dask_cudf.__version__)
print(pd.__version__)

0.19.2
0.19.2
1.1.5


### **3.Object Creation**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 3.84ms| 9.6ms| 18.5ms |

In [10]:
%%time
# 1. Pandas
s = pd.Series([1,2,3,None, 4])
print(s)

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
dtype: float64
CPU times: user 2.35 ms, sys: 0 ns, total: 2.35 ms
Wall time: 3.84 ms


In [11]:
%%time
# 2. cudf
s = cudf.Series([1,2,3,None, 4])
print(s)

0       1
1       2
2       3
3    <NA>
4       4
dtype: int64
CPU times: user 5.37 ms, sys: 35 µs, total: 5.41 ms
Wall time: 9.6 ms


In [12]:
%%time
ds = dask_cudf.from_cudf(s, npartitions=2)
print(ds.compute())

0       1
1       2
2       3
3    <NA>
4       4
dtype: int64
CPU times: user 8.22 ms, sys: 1.77 ms, total: 10 ms
Wall time: 18.5 ms
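
The outputs above differ in more than timing: pandas turns the missing value into `NaN` and upcasts the column to `float64`, while cuDF keeps `int64` and marks the gap as `<NA>`. A CPU-only sketch of how to get the cuDF-like behaviour in pandas, using its nullable `Int64` extension dtype:

```python
import pandas as pd

values = [1, 2, 3, None, 4]

default = pd.Series(values)                  # None becomes NaN, dtype is upcast
nullable = pd.Series(values, dtype="Int64")  # None becomes <NA>, integers are kept

print(default.dtype, nullable.dtype)
print(int(nullable.isna().sum()))
```

This matters when moving data between the two libraries, because a column that is integer on the GPU may arrive as float on the CPU, or the other way round.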


### **4.Loading Dataset**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 2.42s| 1.52s| 1.62s |


In [16]:
%%time
# 1.Pandas
data = pd.read_csv("https://storage.googleapis.com/industryanalytics/LoanDefaultData.csv")
display(data.head(2))

Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.0,0.0,45.78,0


CPU times: user 1.69 s, sys: 171 ms, total: 1.86 s
Wall time: 2.42 s


In [17]:
%%time
# 2.cudf
data_cudf = cudf.read_csv("https://storage.googleapis.com/industryanalytics/LoanDefaultData.csv")
display(data_cudf.head(2))

Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.0,0.0,45.78,0


CPU times: user 587 ms, sys: 314 ms, total: 901 ms
Wall time: 1.52 s


In [20]:
%%time
# 3.dask_cudf
data_daskcudf = dask_cudf.from_cudf(cudf.read_csv("https://storage.googleapis.com/industryanalytics/LoanDefaultData.csv"), npartitions=8)
display(data_daskcudf.head(2))

Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.0,0.0,45.78,0


CPU times: user 622 ms, sys: 305 ms, total: 927 ms
Wall time: 1.62 s
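
Each of these cells reads the CSV from a URL, so the wall times include the download, and `%%time` measures a single run. A minimal sketch of a helper that repeats a call and keeps the best wall time is shown below; `best_of` is a hypothetical name and the workload is a stand-in.

```python
import time


def best_of(fn, repeats=3):
    """Call fn() `repeats` times and return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; in the notebook this could be a lambda wrapping
# pd.read_csv or cudf.read_csv on a locally saved copy of the file.
seconds, total = best_of(lambda: sum(range(100_000)))
print(f"{seconds * 1e3:.3f} ms")
```

Timing against a local copy of the file would separate parsing speed from network speed.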


### **5.Convert Pandas data to cudf and dask_cudf**

| Framework  | Pandas to cudf | Pandas to daskcudf |
| ---- | ---- | ---- |
| Time | 2.84s| 2.79s|

In [184]:
%%time
# 1. cudf from pandas
data = pd.read_csv("https://storage.googleapis.com/industryanalytics/LoanDefaultData.csv")
cudf_data = cudf.DataFrame.from_pandas(data)
cudf_data

CPU times: user 2.12 s, sys: 205 ms, total: 2.33 s
Wall time: 2.84 s


In [22]:
%%time
# 2. dask_cudf from pandas
data = pd.read_csv("https://storage.googleapis.com/industryanalytics/LoanDefaultData.csv")
cudf_data = cudf.DataFrame.from_pandas(data)
daskcudf_data = dask_cudf.from_cudf(cudf_data, npartitions=2)

CPU times: user 2.11 s, sys: 252 ms, total: 2.36 s
Wall time: 2.79 s
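
Both timed cells above also include `pd.read_csv`, so the conversion itself is only part of the reported time. The sketch below isolates the round trip between host and device on a tiny, made-up frame; it falls back to a plain copy when cuDF cannot be imported, so the CPU-only branch is just a stand-in.

```python
import pandas as pd

try:
    import cudf
except Exception:  # cuDF not installed, or no usable GPU
    cudf = None

pdf = pd.DataFrame({"cust_id": [180675, 85781], "grade": ["C", "C"]})

if cudf is not None:
    gdf = cudf.from_pandas(pdf)   # host -> device copy
    back = gdf.to_pandas()        # device -> host copy
else:
    back = pdf.copy()             # CPU-only stand-in so the cell still runs

print(list(back.columns), len(back))
```

Each direction copies the data across the PCIe bus, so frequent conversions can cancel out the gains from GPU computation.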


### **6.Viewing Data**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 241 µs| 4.32ms| 12.7ms |

In [23]:
%%time
# 1.Pandas
data.head()

CPU times: user 232 µs, sys: 2 µs, total: 234 µs
Wall time: 241 µs


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.0,0.0,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.0,0.0,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.0,0.0,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.0,0.0,156.11,0


In [24]:
%%time
# 2.cudf
cudf_data.head()

CPU times: user 4.55 ms, sys: 0 ns, total: 4.55 ms
Wall time: 4.32 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.0,0.0,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.0,0.0,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.0,0.0,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.0,0.0,156.11,0


In [28]:
%%time
# 3.daskcudf
daskcudf_data.head()

CPU times: user 9.35 ms, sys: 0 ns, total: 9.35 ms
Wall time: 12.7 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.0,0.0,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.0,0.0,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.0,0.0,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.0,0.0,156.11,0


### **7.Sorting Values**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 149ms| 37.8ms| 4.67s |

In [29]:
%%time
# 1.Pandas
data.sort_values('cust_id').head()

CPU times: user 143 ms, sys: 0 ns, total: 143 ms
Wall time: 149 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
7853,54734,2009,Haryana,01/08/2009,1102011,0.5,RENT,Low,INDIVIDUAL,debt_consolidation,Low,B,85000,25000,11.89,36 months,19.48,29324.32,25000.0,0.0,829.1,0
614,55521,2008,Karnataka,01/07/2008,1032010,0.5,RENT,Low,INDIVIDUAL,debt_consolidation,High,F,30000,1000,16.08,36 months,23.84,1207.76,999.99,0.0,35.2,0
615,55742,2008,West Bengal,01/05/2008,1062011,0.5,RENT,Low,INDIVIDUAL,credit_card,Low,B,65000,7000,10.71,36 months,14.29,8215.45,7000.0,0.0,228.22,0
636,56413,2008,Nagaland,01/04/2008,1102008,10.0,MORTGAGE,Medium,INDIVIDUAL,debt_consolidation,High,F,189500,7000,16.08,36 months,22.47,1231.9,783.46,0.25,246.38,1
470508,56705,2015,Andhra Pradesh,01/11/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,B,33500,11000,9.99,36 months,18.38,376.25,263.31,0.0,354.89,0


In [30]:
%%time
# 2.cudf
cudf_data.sort_values('cust_id').head()

CPU times: user 33.1 ms, sys: 3.84 ms, total: 36.9 ms
Wall time: 37.8 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
7853,54734,2009,Haryana,01/08/2009,1102011,0.5,RENT,Low,INDIVIDUAL,debt_consolidation,Low,B,85000,25000,11.89,36 months,19.48,29324.32,25000.0,0.0,829.1,0
614,55521,2008,Karnataka,01/07/2008,1032010,0.5,RENT,Low,INDIVIDUAL,debt_consolidation,High,F,30000,1000,16.08,36 months,23.84,1207.76,999.99,0.0,35.2,0
615,55742,2008,West Bengal,01/05/2008,1062011,0.5,RENT,Low,INDIVIDUAL,credit_card,Low,B,65000,7000,10.71,36 months,14.29,8215.45,7000.0,0.0,228.22,0
636,56413,2008,Nagaland,01/04/2008,1102008,10.0,MORTGAGE,Medium,INDIVIDUAL,debt_consolidation,High,F,189500,7000,16.08,36 months,22.47,1231.9,783.46,0.25,246.38,1
470508,56705,2015,Andhra Pradesh,01/11/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,B,33500,11000,9.99,36 months,18.38,376.25,263.31,0.0,354.89,0


In [31]:
%%time
# 3.daskcudf
daskcudf_data.sort_values('cust_id').head()

CPU times: user 3.75 s, sys: 211 ms, total: 3.96 s
Wall time: 4.67 s


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
7853,54734,2009,Haryana,01/08/2009,1102011,0.5,RENT,Low,INDIVIDUAL,debt_consolidation,Low,B,85000,25000,11.89,36 months,19.48,29324.32,25000.0,0.0,829.1,0
614,55521,2008,Karnataka,01/07/2008,1032010,0.5,RENT,Low,INDIVIDUAL,debt_consolidation,High,F,30000,1000,16.08,36 months,23.84,1207.76,999.99,0.0,35.2,0
615,55742,2008,West Bengal,01/05/2008,1062011,0.5,RENT,Low,INDIVIDUAL,credit_card,Low,B,65000,7000,10.71,36 months,14.29,8215.45,7000.0,0.0,228.22,0
636,56413,2008,Nagaland,01/04/2008,1102008,10.0,MORTGAGE,Medium,INDIVIDUAL,debt_consolidation,High,F,189500,7000,16.08,36 months,22.47,1231.9,783.46,0.25,246.38,1
470508,56705,2015,Andhra Pradesh,01/11/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,B,33500,11000,9.99,36 months,18.38,376.25,263.31,0.0,354.89,0
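
All three cells sort the whole table only to keep the first five rows. When only the smallest few rows are needed, `nsmallest` asks for them directly; pandas, cuDF, and Dask DataFrames all provide a method of that name. The pandas-only sketch below uses made-up rows, and whether this is faster for dask_cudf on this dataset has not been measured here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cust_id": [56413, 54734, 56705, 55521, 55742, 180675],
        "loan_amount": [7000, 25000, 11000, 1000, 7000, 25000],
    }
)

by_sort = df.sort_values("cust_id").head(3)  # full sort, then keep 3 rows
by_topk = df.nsmallest(3, "cust_id")         # ask for the 3 smallest directly

print(by_topk["cust_id"].tolist())
```

In Dask, a full sort generally has to move rows between partitions, which is a plausible reason for the much larger daskcudf time in the table above.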


### **8.Selection**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 46.3µs| 1.18ms| 7.37ms |

In [32]:
%%time
# 1. Pandas
data["state"]

CPU times: user 42 µs, sys: 0 ns, total: 42 µs
Wall time: 46.3 µs


0            Andhra Pradesh
1                 Rajasthan
2                   Manipur
3            Andhra Pradesh
4         Arunachal Pradesh
                ...        
887374             Nagaland
887375            Telangana
887376            Meghalaya
887377                Bihar
887378               Odisha
Name: state, Length: 887379, dtype: object

In [33]:
%%time
# 2. cudf
cudf_data["state"]

CPU times: user 1.17 ms, sys: 26 µs, total: 1.19 ms
Wall time: 1.18 ms


0            Andhra Pradesh
1                 Rajasthan
2                   Manipur
3            Andhra Pradesh
4         Arunachal Pradesh
                ...        
887374             Nagaland
887375            Telangana
887376            Meghalaya
887377                Bihar
887378               Odisha
Name: state, Length: 887379, dtype: object

In [35]:
%%time
# 3. daskcudf
daskcudf_data["state"].compute()

CPU times: user 5.27 ms, sys: 873 µs, total: 6.15 ms
Wall time: 7.37 ms


0            Andhra Pradesh
1                 Rajasthan
2                   Manipur
3            Andhra Pradesh
4         Arunachal Pradesh
                ...        
887374             Nagaland
887375            Telangana
887376            Meghalaya
887377                Bihar
887378               Odisha
Name: state, Length: 887379, dtype: object

### **9.Selection By label**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 1.4ms| 1.82ms| 7.45ms |

In [41]:
%%time
# 1.Pandas
data.loc[1:100, ["state","income_type"]]

CPU times: user 1.38 ms, sys: 61 µs, total: 1.44 ms
Wall time: 1.4 ms


Unnamed: 0,state,income_type
1,Rajasthan,Low
2,Manipur,Low
3,Andhra Pradesh,Low
4,Arunachal Pradesh,High
5,Chhattisgarh,Low
...,...,...
96,Odisha,Low
97,Kerala,Low
98,Madhya Pradesh,Low
99,Arunachal Pradesh,Medium


In [51]:
%%time
# 2.cudf
cudf_data.loc[1:100, ["state","income_type"]]

CPU times: user 1.81 ms, sys: 0 ns, total: 1.81 ms
Wall time: 1.82 ms


Unnamed: 0,state,income_type
1,Rajasthan,Low
2,Manipur,Low
3,Andhra Pradesh,Low
4,Arunachal Pradesh,High
5,Chhattisgarh,Low
...,...,...
96,Odisha,Low
97,Kerala,Low
98,Madhya Pradesh,Low
99,Arunachal Pradesh,Medium


In [52]:
%%time
# 3.daskcudf
daskcudf_data.loc[1:100, ["state","income_type"]].compute()

CPU times: user 7.13 ms, sys: 0 ns, total: 7.13 ms
Wall time: 7.45 ms


Unnamed: 0,state,income_type
1,Rajasthan,Low
2,Manipur,Low
3,Andhra Pradesh,Low
4,Arunachal Pradesh,High
5,Chhattisgarh,Low
...,...,...
96,Odisha,Low
97,Kerala,Low
98,Madhya Pradesh,Low
99,Arunachal Pradesh,Medium


### **10.Selection by Position**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 1.99ms| 2.84ms| 30.1ms |

In [56]:
%%time
# 1.Pandas
data.iloc[4:10, 0:2].head()

CPU times: user 1.41 ms, sys: 66 µs, total: 1.48 ms
Wall time: 1.99 ms


Unnamed: 0,cust_id,year
4,84670,2007
5,84098,2007
6,83979,2007
7,85818,2007
8,83489,2007


In [57]:
%%time
# 2.cudf
cudf_data.iloc[4:10, 0:2].head()

CPU times: user 2.87 ms, sys: 0 ns, total: 2.87 ms
Wall time: 2.84 ms


Unnamed: 0,cust_id,year
4,84670,2007
5,84098,2007
6,83979,2007
7,85818,2007
8,83489,2007


In [61]:
%%time
# 3.daskcudf
daskcudf_data.compute().iloc[4:10, 0:2].head()

CPU times: user 14 ms, sys: 14.9 ms, total: 28.9 ms
Wall time: 30.1 ms


Unnamed: 0,cust_id,year
4,84670,2007
5,84098,2007
6,83979,2007
7,85818,2007
8,83489,2007
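
Label-based and position-based slices treat the end of the range differently, which is easy to miss when switching between `.loc` and `.iloc`. A small pandas sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"cust_id": [180675, 85781, 85675, 84918, 84670]})

by_label = df.loc[1:3]      # labels 1, 2 and 3: the end label is included
by_position = df.iloc[1:3]  # positions 1 and 2: the end position is excluded

print(len(by_label), len(by_position))
```

With the default integer index, `loc[1:100]` therefore selects 100 rows (labels 1 through 100), while `iloc[4:10]` selects 6 rows.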


### **11.Boolean Indexing**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 182ms| 34.7ms| 9.8ms |

In [67]:
%%time
# 1. Pandas
data[data.cust_id>83000]

CPU times: user 68.8 ms, sys: 110 ms, total: 179 ms
Wall time: 182 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


In [68]:
%%time
# 2. cudf
cudf_data[cudf_data.cust_id>83000]

CPU times: user 24.5 ms, sys: 6.14 ms, total: 30.7 ms
Wall time: 34.7 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


In [69]:
%%time
# 3. daskcudf (lazy: without .compute() this only builds the task graph, so the time below is not comparable)
daskcudf_data[daskcudf_data.cust_id>83000]

CPU times: user 3.59 ms, sys: 0 ns, total: 3.59 ms
Wall time: 9.8 ms


Unnamed: 0_level_0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
npartitions=2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
0,int64,int64,object,object,int64,float64,object,object,object,object,object,object,int64,int64,float64,object,float64,float64,float64,float64,float64,int64
443690,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887378,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
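
Boolean masks can be combined to filter on several conditions at once. The pandas sketch below uses made-up rows; the same operators are available on cuDF Series.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cust_id": [180675, 85781, 54734, 84918],
        "is_default": [1, 0, 0, 0],
    }
)

# Each comparison needs its own parentheses because & binds tighter than > and ==.
mask = (df["cust_id"] > 83000) & (df["is_default"] == 0)
selected = df[mask]

print(selected["cust_id"].tolist())
```

Use `&`, `|`, and `~` rather than `and`, `or`, and `not`, which do not work element-wise on a Series.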


### **12.Query API**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 189ms| 178ms| 75.7ms |


In [74]:
%%time
# 1. pandas
data.query("cust_id > 83000")

CPU times: user 90.2 ms, sys: 86.3 ms, total: 177 ms
Wall time: 189 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


In [75]:
%%time
# 2. cudf
cudf_data.query("cust_id > 83000")

CPU times: user 170 ms, sys: 4.35 ms, total: 174 ms
Wall time: 178 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


In [77]:
%%time
# 3. daskcudf
daskcudf_data.query("cust_id > 83000").compute()

CPU times: user 52.2 ms, sys: 22.3 ms, total: 74.5 ms
Wall time: 75.7 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0
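
`query` expressions can also refer to Python variables by prefixing them with `@`, which avoids building the expression string by hand. A pandas sketch with made-up rows; `above_threshold` is a hypothetical helper.

```python
import pandas as pd


def above_threshold(df, threshold):
    """Rows whose cust_id exceeds threshold, written with query()."""
    # "@threshold" refers to the local Python variable, not to a column.
    return df.query("cust_id > @threshold")


df = pd.DataFrame({"cust_id": [180675, 85781, 54734, 84918]})
via_query = above_threshold(df, 83000)
via_mask = df[df["cust_id"] > 83000]

print(via_query["cust_id"].tolist())
```

The query form and the boolean-mask form select the same rows; the choice between them is mostly a matter of readability.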


### **13.`isin` Method**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 49.9ms| 44.2ms| 67.5ms |


In [79]:
%%time
# 1. Pandas
data[data.state.isin(["Bihar", "Odisha"])]

CPU times: user 50.1 ms, sys: 0 ns, total: 50.1 ms
Wall time: 49.9 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
37,70686,2007,Bihar,01/06/2007,1062010,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,70000,5000,7.75,36 months,8.81,5619.72,5000.0,0.0,156.11,0
39,96844,2007,Bihar,01/07/2007,1072010,7.0,MORTGAGE,Low,INDIVIDUAL,credit_card,Low,A,74000,5300,8.38,36 months,14.37,6012.16,5300.0,0.0,167.02,0
48,92676,2007,Bihar,01/07/2007,1072010,0.5,MORTGAGE,Medium,INDIVIDUAL,home_improvement,Low,A,180000,5000,8.07,36 months,5.55,5645.90,5000.0,0.0,156.84,0
52,88854,2007,Bihar,01/08/2007,1032008,4.0,RENT,Medium,INDIVIDUAL,house,Low,A,200000,5000,7.43,36 months,0.28,5174.18,5000.0,0.0,155.38,0
63,72998,2007,Bihar,01/06/2007,1062010,0.5,RENT,Low,INDIVIDUAL,other,Low,B,12000,1000,9.64,36 months,10.00,1155.53,1000.0,0.0,32.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887327,68615046,2015,Bihar,01/12/2015,1012016,7.0,RENT,Low,INDIVIDUAL,debt_consolidation,High,D,76000,16800,17.97,60 months,33.66,0.00,0.0,0.0,426.34,0
887348,68616039,2015,Bihar,01/12/2015,1012016,3.0,MORTGAGE,Medium,INDIVIDUAL,credit_card,Low,B,110000,12000,9.17,36 months,24.61,0.00,0.0,0.0,382.55,0
887355,68615444,2015,Bihar,01/12/2015,1012016,4.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,A,50000,12000,5.32,36 months,15.25,0.00,0.0,0.0,361.38,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.0,0.0,641.52,0


In [81]:
%%time
# 2. cudf
cudf_data[cudf_data.state.isin(["Bihar", "Odisha"])]

CPU times: user 31.8 ms, sys: 10.9 ms, total: 42.7 ms
Wall time: 44.2 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
37,70686,2007,Bihar,01/06/2007,1062010,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,70000,5000,7.75,36 months,8.81,5619.72,5000.0,0.0,156.11,0
39,96844,2007,Bihar,01/07/2007,1072010,7.0,MORTGAGE,Low,INDIVIDUAL,credit_card,Low,A,74000,5300,8.38,36 months,14.37,6012.16,5300.0,0.0,167.02,0
48,92676,2007,Bihar,01/07/2007,1072010,0.5,MORTGAGE,Medium,INDIVIDUAL,home_improvement,Low,A,180000,5000,8.07,36 months,5.55,5645.90,5000.0,0.0,156.84,0
52,88854,2007,Bihar,01/08/2007,1032008,4.0,RENT,Medium,INDIVIDUAL,house,Low,A,200000,5000,7.43,36 months,0.28,5174.18,5000.0,0.0,155.38,0
63,72998,2007,Bihar,01/06/2007,1062010,0.5,RENT,Low,INDIVIDUAL,other,Low,B,12000,1000,9.64,36 months,10.00,1155.53,1000.0,0.0,32.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887327,68615046,2015,Bihar,01/12/2015,1012016,7.0,RENT,Low,INDIVIDUAL,debt_consolidation,High,D,76000,16800,17.97,60 months,33.66,0.00,0.0,0.0,426.34,0
887348,68616039,2015,Bihar,01/12/2015,1012016,3.0,MORTGAGE,Medium,INDIVIDUAL,credit_card,Low,B,110000,12000,9.17,36 months,24.61,0.00,0.0,0.0,382.55,0
887355,68615444,2015,Bihar,01/12/2015,1012016,4.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,A,50000,12000,5.32,36 months,15.25,0.00,0.0,0.0,361.38,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.0,0.0,641.52,0


In [83]:
%%time
# 3. daskcudf
daskcudf_data[daskcudf_data.state.isin(["Bihar", "Odisha"])].compute()

CPU times: user 55.6 ms, sys: 11.1 ms, total: 66.8 ms
Wall time: 67.5 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
37,70686,2007,Bihar,01/06/2007,1062010,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,70000,5000,7.75,36 months,8.81,5619.72,5000.0,0.0,156.11,0
39,96844,2007,Bihar,01/07/2007,1072010,7.0,MORTGAGE,Low,INDIVIDUAL,credit_card,Low,A,74000,5300,8.38,36 months,14.37,6012.16,5300.0,0.0,167.02,0
48,92676,2007,Bihar,01/07/2007,1072010,0.5,MORTGAGE,Medium,INDIVIDUAL,home_improvement,Low,A,180000,5000,8.07,36 months,5.55,5645.90,5000.0,0.0,156.84,0
52,88854,2007,Bihar,01/08/2007,1032008,4.0,RENT,Medium,INDIVIDUAL,house,Low,A,200000,5000,7.43,36 months,0.28,5174.18,5000.0,0.0,155.38,0
63,72998,2007,Bihar,01/06/2007,1062010,0.5,RENT,Low,INDIVIDUAL,other,Low,B,12000,1000,9.64,36 months,10.00,1155.53,1000.0,0.0,32.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887327,68615046,2015,Bihar,01/12/2015,1012016,7.0,RENT,Low,INDIVIDUAL,debt_consolidation,High,D,76000,16800,17.97,60 months,33.66,0.00,0.0,0.0,426.34,0
887348,68616039,2015,Bihar,01/12/2015,1012016,3.0,MORTGAGE,Medium,INDIVIDUAL,credit_card,Low,B,110000,12000,9.17,36 months,24.61,0.00,0.0,0.0,382.55,0
887355,68615444,2015,Bihar,01/12/2015,1012016,4.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,A,50000,12000,5.32,36 months,15.25,0.00,0.0,0.0,361.38,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.0,0.0,641.52,0


### **14.MultiIndex**

| Framework  | Pandas | cudf |
| ---- | ---- | ---- | 
| Time | 5.9ms| 12.9ms|

In [97]:
%%time
# 1. Pandas
arrays = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
tuples = list(zip(*arrays))
idx = pd.MultiIndex.from_tuples(tuples)
display(idx)

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 3),
            ('b', 4)],
           )

CPU times: user 5.28 ms, sys: 27 µs, total: 5.31 ms
Wall time: 5.9 ms


In [98]:
%%time
# 2. cudf
arrays = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
tuples = list(zip(*arrays))
idx = cudf.MultiIndex.from_tuples(tuples)
display(idx)

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 3),
            ('b', 4)],
           )

CPU times: user 13.3 ms, sys: 6 µs, total: 13.3 ms
Wall time: 12.9 ms


### **15.Missing Data**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 380ms| 29.8ms| 85.4ms |


In [100]:
%%time
# 1. Pandas
data.fillna(999)

CPU times: user 288 ms, sys: 86.8 ms, total: 374 ms
Wall time: 380 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


In [101]:
%%time
# 2.cudf
cudf_data.fillna(999)

CPU times: user 16.6 ms, sys: 11.1 ms, total: 27.7 ms
Wall time: 29.8 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


In [103]:
%%time
# 3.daskcudf
daskcudf_data.fillna(999).compute()

CPU times: user 44.2 ms, sys: 38.7 ms, total: 82.9 ms
Wall time: 85.4 ms


Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,01/12/2007,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,01/06/2007,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,01/06/2007,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,01/09/2007,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,01/06/2007,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,01/12/2015,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,01/12/2015,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,01/12/2015,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


### **16.Statistics Operation**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 5.7ms| 4.51ms| 651ms |


In [106]:
%%time
# 1.Pandas
data.total_pymnt.mean(), data.total_pymnt.var()

CPU times: user 5.08 ms, sys: 0 ns, total: 5.08 ms
Wall time: 5.7 ms


(7558.826683642953, 61956471.64980233)

In [107]:
%%time
# 2.cudf
cudf_data.total_pymnt.mean(), cudf_data.total_pymnt.var()

CPU times: user 2.68 ms, sys: 813 µs, total: 3.49 ms
Wall time: 4.51 ms


(7558.82668364404, 61956471.64974804)

In [109]:
%%time
# 3.daskcudf
daskcudf_data.total_pymnt.mean().compute(), daskcudf_data.total_pymnt.var().compute()

CPU times: user 640 ms, sys: 13.5 ms, total: 653 ms
Wall time: 651 ms


(7558.82668364404, 61956471.649748065)

### **17.Applymap**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 211ms| 1.84ms| 7.16ms |


In [114]:
%%time
# 1.Pandas
def add(n):
  return n + 5
data["total_pymnt"].apply(add)

CPU times: user 177 ms, sys: 34 ms, total: 211 ms
Wall time: 211 ms


In [116]:
%%time
# 2.cudf
def add(n):
  return n + 5
cudf_data["total_pymnt"].applymap(add)

CPU times: user 1.61 ms, sys: 928 µs, total: 2.54 ms
Wall time: 1.84 ms


In [120]:
%%time
# 3.daskcudf (.compute() materialises a cuDF Series first, so applymap itself runs in cuDF)
def add(n):
  return n + 5
daskcudf_data["total_pymnt"].compute().applymap(add)

CPU times: user 4.26 ms, sys: 2.21 ms, total: 6.48 ms
Wall time: 7.16 ms
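
Adding a constant can also be written as plain column arithmetic, which avoids calling a Python function once per element. A minimal pandas-only sketch, assuming a small made-up Series (the values are hypothetical, not taken from the dataset) in place of `data["total_pymnt"]`:

```python
import pandas as pd

# Hypothetical stand-in for data["total_pymnt"]
total_pymnt = pd.Series([5619.72, 6012.16, 0.0])

def add(n):
    return n + 5

# Element-wise Python call, as in the pandas cell above
via_apply = total_pymnt.apply(add)

# Same result expressed as vectorized arithmetic
via_arithmetic = total_pymnt + 5

print(via_apply.equals(via_arithmetic))
```

The same `series + 5` form should also work on a cuDF Series, since cuDF follows the pandas arithmetic API.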


### **18. Histogramming**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 68.4ms| 23.5ms| 36.5ms |

In [121]:
%%time
# 1. Pandas
data["state"].value_counts()

CPU times: user 68 ms, sys: 0 ns, total: 68 ms
Wall time: 68.4 ms


Chhattisgarh         32211
Sikkim               32130
Haryana              31999
Punjab               31853
Assam                31792
Goa                  31786
Madhya Pradesh       31752
Uttar Pradesh        31750
Maharashtra          31732
Himachal Pradesh     31730
Arunachal Pradesh    31703
Nagaland             31701
Andhra Pradesh       31688
Kerala               31687
Rajasthan            31669
Tripura              31669
Karnataka            31619
Manipur              31603
West Bengal          31596
Odisha               31587
Telangana            31586
Bihar                31577
Jharkhand            31550
Gujarat              31536
Uttarakhand          31536
Mizoram              31536
Tamil Nadu           31517
Meghalaya            31284
Name: state, dtype: int64

In [123]:
%%time
# 2. cudf
cudf_data["state"].value_counts()

CPU times: user 19.1 ms, sys: 980 µs, total: 20.1 ms
Wall time: 23.5 ms


Chhattisgarh         32211
Sikkim               32130
Haryana              31999
Punjab               31853
Assam                31792
Goa                  31786
Madhya Pradesh       31752
Uttar Pradesh        31750
Maharashtra          31732
Himachal Pradesh     31730
Arunachal Pradesh    31703
Nagaland             31701
Andhra Pradesh       31688
Kerala               31687
Rajasthan            31669
Tripura              31669
Karnataka            31619
Manipur              31603
West Bengal          31596
Odisha               31587
Telangana            31586
Bihar                31577
Jharkhand            31550
Uttarakhand          31536
Gujarat              31536
Mizoram              31536
Tamil Nadu           31517
Meghalaya            31284
Name: state, dtype: int32

In [125]:
%%time
# 3. daskcudf
daskcudf_data["state"].value_counts().compute()

CPU times: user 31.8 ms, sys: 2.07 ms, total: 33.9 ms
Wall time: 36.5 ms


Chhattisgarh         32211
Sikkim               32130
Haryana              31999
Punjab               31853
Assam                31792
Goa                  31786
Madhya Pradesh       31752
Uttar Pradesh        31750
Maharashtra          31732
Himachal Pradesh     31730
Arunachal Pradesh    31703
Nagaland             31701
Andhra Pradesh       31688
Kerala               31687
Tripura              31669
Rajasthan            31669
Karnataka            31619
Manipur              31603
West Bengal          31596
Odisha               31587
Telangana            31586
Bihar                31577
Jharkhand            31550
Mizoram              31536
Uttarakhand          31536
Gujarat              31536
Tamil Nadu           31517
Meghalaya            31284
Name: state, dtype: int64

### **19.String Methods**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 3.59ms| 6.17ms| 16.4ms |

In [143]:
states = data.state.unique().tolist()

In [148]:
%%time
# 1.Pandas
s = pd.Series(states)
print(s.str.lower())

0        andhra pradesh
1             rajasthan
2               manipur
3     arunachal pradesh
4          chhattisgarh
5               mizoram
6               gujarat
7               tripura
8               haryana
9             jharkhand
10                assam
11               sikkim
12               punjab
13            telangana
14          maharashtra
15            karnataka
16                  goa
17             nagaland
18     himachal pradesh
19        uttar pradesh
20          west bengal
21                bihar
22           tamil nadu
23               kerala
24       madhya pradesh
25            meghalaya
26          uttarakhand
27               odisha
dtype: object
CPU times: user 3.68 ms, sys: 0 ns, total: 3.68 ms
Wall time: 3.59 ms


In [146]:
%%time
# 2.cudf
s_cudf = cudf.Series(states)
print(s_cudf.str.lower())

0        andhra pradesh
1             rajasthan
2               manipur
3     arunachal pradesh
4          chhattisgarh
5               mizoram
6               gujarat
7               tripura
8               haryana
9             jharkhand
10                assam
11               sikkim
12               punjab
13            telangana
14          maharashtra
15            karnataka
16                  goa
17             nagaland
18     himachal pradesh
19        uttar pradesh
20          west bengal
21                bihar
22           tamil nadu
23               kerala
24       madhya pradesh
25            meghalaya
26          uttarakhand
27               odisha
dtype: object
CPU times: user 5.27 ms, sys: 764 µs, total: 6.04 ms
Wall time: 6.17 ms


In [147]:
%%time
# 3.daskcudf
s_daskcudf = dask_cudf.from_cudf(cudf.Series(states), npartitions=2)
print(s_daskcudf.str.lower().compute())

0        andhra pradesh
1             rajasthan
2               manipur
3     arunachal pradesh
4          chhattisgarh
5               mizoram
6               gujarat
7               tripura
8               haryana
9             jharkhand
10                assam
11               sikkim
12               punjab
13            telangana
14          maharashtra
15            karnataka
16                  goa
17             nagaland
18     himachal pradesh
19        uttar pradesh
20          west bengal
21                bihar
22           tamil nadu
23               kerala
24       madhya pradesh
25            meghalaya
26          uttarakhand
27               odisha
dtype: object
CPU times: user 13.4 ms, sys: 998 µs, total: 14.4 ms
Wall time: 16.4 ms


### **20.Concat**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 1.41ms| 2.28ms| 11.1ms |

In [149]:
%%time
# 1. Pandas
pd.concat([s,s])

CPU times: user 1.37 ms, sys: 48 µs, total: 1.41 ms
Wall time: 1.41 ms


0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Telangana
14          Maharashtra
15            Karnataka
16                  Goa
17             Nagaland
18     Himachal Pradesh
19        Uttar Pradesh
20          West Bengal
21                Bihar
22           Tamil Nadu
23               Kerala
24       Madhya Pradesh
25            Meghalaya
26          Uttarakhand
27               Odisha
0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Te

In [152]:
%%time
# 2. cudf
cudf.concat([s_cudf,s_cudf])

CPU times: user 2.03 ms, sys: 0 ns, total: 2.03 ms
Wall time: 2.28 ms


0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Telangana
14          Maharashtra
15            Karnataka
16                  Goa
17             Nagaland
18     Himachal Pradesh
19        Uttar Pradesh
20          West Bengal
21                Bihar
22           Tamil Nadu
23               Kerala
24       Madhya Pradesh
25            Meghalaya
26          Uttarakhand
27               Odisha
0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Te

In [158]:
%%time
# 3. daskcudf
dask_cudf.concat([s_daskcudf,s_daskcudf]).compute()

CPU times: user 9.56 ms, sys: 0 ns, total: 9.56 ms
Wall time: 11.1 ms


0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Telangana
14          Maharashtra
15            Karnataka
16                  Goa
17             Nagaland
18     Himachal Pradesh
19        Uttar Pradesh
20          West Bengal
21                Bihar
22           Tamil Nadu
23               Kerala
24       Madhya Pradesh
25            Meghalaya
26          Uttarakhand
27               Odisha
0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Te

### **21.Join**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 15.7ms| 9.05ms| 30.2ms |

In [159]:
%%time
# 1.Pandas
df_a = pd.DataFrame()
df_a['key'] = ['a', 'b', 'c', 'd', 'e']
df_a['vals_a'] = [float(i + 10) for i in range(5)]

df_b = pd.DataFrame()
df_b['key'] = ['a', 'c', 'e']
df_b['vals_b'] = [float(i+100) for i in range(3)]

merged = df_a.merge(df_b, on=['key'], how='left')
merged.head()

CPU times: user 9.04 ms, sys: 54 µs, total: 9.1 ms
Wall time: 15.7 ms


In [160]:
%%time
# 2.cudf
df_a = cudf.DataFrame()
df_a['key'] = ['a', 'b', 'c', 'd', 'e']
df_a['vals_a'] = [float(i + 10) for i in range(5)]

df_b = cudf.DataFrame()
df_b['key'] = ['a', 'c', 'e']
df_b['vals_b'] = [float(i+100) for i in range(3)]

merged = df_a.merge(df_b, on=['key'], how='left')
merged.head()

CPU times: user 8.98 ms, sys: 0 ns, total: 8.98 ms
Wall time: 9.05 ms


In [161]:
%%time
# 3.daskcudf
ddf_a = dask_cudf.from_cudf(df_a, npartitions=2)
ddf_b = dask_cudf.from_cudf(df_b, npartitions=2)

merged = ddf_a.merge(ddf_b, on=['key'], how='left').compute()
merged

CPU times: user 29.1 ms, sys: 812 µs, total: 29.9 ms
Wall time: 30.2 ms
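
The join cells above do not show the merged frame. A pandas-only sketch of what the left join returns, using the same `df_a` and `df_b` values as the cells above (built here with dict constructors for brevity):

```python
import pandas as pd

df_a = pd.DataFrame({
    "key": ["a", "b", "c", "d", "e"],
    "vals_a": [float(i + 10) for i in range(5)],
})
df_b = pd.DataFrame({
    "key": ["a", "c", "e"],
    "vals_b": [float(i + 100) for i in range(3)],
})

# how="left" keeps every row of df_a; keys absent from df_b get NaN in vals_b
merged = df_a.merge(df_b, on=["key"], how="left")
print(merged)
```

Keys `b` and `d` exist only in `df_a`, so their `vals_b` entries are missing in the result.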


### **22.Append**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 574µs| 4.26ms| 9.61ms |

In [163]:
%%time
# 1. Pandas
s.append(s)

CPU times: user 585 µs, sys: 0 ns, total: 585 µs
Wall time: 574 µs


0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Telangana
14          Maharashtra
15            Karnataka
16                  Goa
17             Nagaland
18     Himachal Pradesh
19        Uttar Pradesh
20          West Bengal
21                Bihar
22           Tamil Nadu
23               Kerala
24       Madhya Pradesh
25            Meghalaya
26          Uttarakhand
27               Odisha
0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Te

In [165]:
%%time
# 2. cudf
s_cudf.append(s_cudf)

CPU times: user 2.95 ms, sys: 0 ns, total: 2.95 ms
Wall time: 4.26 ms


0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Telangana
14          Maharashtra
15            Karnataka
16                  Goa
17             Nagaland
18     Himachal Pradesh
19        Uttar Pradesh
20          West Bengal
21                Bihar
22           Tamil Nadu
23               Kerala
24       Madhya Pradesh
25            Meghalaya
26          Uttarakhand
27               Odisha
0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Te

In [167]:
%%time
# 3.daskcudf
s_daskcudf.append(s_daskcudf).compute()

CPU times: user 8.67 ms, sys: 0 ns, total: 8.67 ms
Wall time: 9.61 ms


0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Telangana
14          Maharashtra
15            Karnataka
16                  Goa
17             Nagaland
18     Himachal Pradesh
19        Uttar Pradesh
20          West Bengal
21                Bihar
22           Tamil Nadu
23               Kerala
24       Madhya Pradesh
25            Meghalaya
26          Uttarakhand
27               Odisha
0        Andhra Pradesh
1             Rajasthan
2               Manipur
3     Arunachal Pradesh
4          Chhattisgarh
5               Mizoram
6               Gujarat
7               Tripura
8               Haryana
9             Jharkhand
10                Assam
11               Sikkim
12               Punjab
13            Te
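
`Series.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the pandas cell above only runs on older versions. A minimal sketch of the equivalent `pd.concat` call, using a short hypothetical Series rather than the full `states` list:

```python
import pandas as pd

# Hypothetical short stand-in for pd.Series(states)
s = pd.Series(["Andhra Pradesh", "Rajasthan", "Manipur"])

# Equivalent of s.append(s): original index labels are kept, so they repeat
appended = pd.concat([s, s])

# ignore_index=True renumbers the result 0..n-1 instead
renumbered = pd.concat([s, s], ignore_index=True)

print(appended.index.tolist())
print(renumbered.index.tolist())
```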

### **23.Grouping**

| Framework  | Pandas | cudf |
| ---- | ---- | ---- | 
| Time | 792ms| 270ms| 

In [173]:
%%time
# 1.Pandas
data["agg_col1"] = [1 if x % 2 == 0 else 0 for x in range(len(data["total_pymnt"]))]
data["agg_col2"] = [1 if x % 2 == 0 else 0 for x in range(len(data["total_pymnt"]))]

data.groupby('agg_col1').sum()

CPU times: user 609 ms, sys: 167 ms, total: 775 ms
Wall time: 792 ms


In [185]:
%%time
# 2.cudf
cudf_data["agg_col1"] = [1 if x % 2 == 0 else 0 for x in range(len(cudf_data["total_pymnt"]))]
cudf_data["agg_col2"] = [1 if x % 2 == 0 else 0 for x in range(len(cudf_data["total_pymnt"]))]
cudf_data.groupby('agg_col1').sum()

CPU times: user 255 ms, sys: 17.9 ms, total: 273 ms
Wall time: 270 ms
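
Both cells build `agg_col1` and `agg_col2` with a Python list comprehension over every row, so the timings include that column construction and not only the `groupby`. A sketch of a vectorized way to build the same 0/1 flag, assuming NumPy is available and using a small hypothetical frame in place of `data`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loan data
df = pd.DataFrame({"total_pymnt": [10.0, 20.0, 30.0, 40.0, 50.0]})

# List-comprehension version from the cells above
flag_loop = [1 if x % 2 == 0 else 0 for x in range(len(df["total_pymnt"]))]

# Vectorized version: 1 for even row positions, 0 for odd ones
flag_vec = (np.arange(len(df)) % 2 == 0).astype("int64")

df["agg_col1"] = flag_vec
print(df.groupby("agg_col1").sum())
```

Building the flag outside the timed cell would isolate the cost of the aggregation itself.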


### **24.Transpose**

| Framework  | Pandas | cudf | 
| ---- | ---- | ---- | 
| Time | 1.1ms| 3.13ms|

In [187]:
%%time
# 1.Pandas
sample = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
sample.transpose()

CPU times: user 1.09 ms, sys: 0 ns, total: 1.09 ms
Wall time: 1.1 ms


In [188]:
%%time
# 2.cudf
sample_cudf = cudf.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
sample_cudf.transpose()

CPU times: user 3.15 ms, sys: 0 ns, total: 3.15 ms
Wall time: 3.13 ms


### **25.Time Series**

| Framework  | Pandas | cudf | daskcudf |
| ---- | ---- | ---- | ---- |
| Time | 213ms| 330ms| 79.2ms |

In [195]:
%%time
# 1.Pandas
import datetime as dt

data["date_issued"] = pd.to_datetime(data["date_issued"])
search_date = dt.datetime.strptime('2015-11-23', '%Y-%m-%d')
display(data.query('date_issued <= @search_date'))

Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default
0,180675,2007,Andhra Pradesh,2007-01-12,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1
1,85781,2007,Rajasthan,2007-01-06,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0
2,85675,2007,Manipur,2007-01-06,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0
3,84918,2007,Andhra Pradesh,2007-01-09,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0
4,84670,2007,Arunachal Pradesh,2007-01-06,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,2015-01-12,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0
887375,68616851,2015,Telangana,2015-01-12,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0
887376,68616867,2015,Meghalaya,2015-01-12,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0
887377,68616501,2015,Bihar,2015-01-12,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0


CPU times: user 205 ms, sys: 6.97 ms, total: 212 ms
Wall time: 213 ms


In [196]:
%%time
# 2.cudf
import datetime as dt

cudf_data["date_issued"] = cudf.to_datetime(cudf_data["date_issued"])
search_date = dt.datetime.strptime('2015-11-23', '%Y-%m-%d')
display(cudf_data.query('date_issued <= @search_date'))

Unnamed: 0,cust_id,year,state,date_issued,date_final,emp_duration,own_type,income_type,app_type,loan_purpose,interest_payments,grade,annual_pay,loan_amount,interest_rate,loan_duration,dti,total_pymnt,total_rec_prncp,recoveries,installment,is_default,agg_col1,agg_col2
0,180675,2007,Andhra Pradesh,2007-01-12,1032009,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,Low,C,73000,25000,10.91,36 months,22.13,13650.38,8767.32,2207.65,817.41,1,1,1
1,85781,2007,Rajasthan,2007-01-06,1072010,0.5,RENT,Low,INDIVIDUAL,other,Low,C,40000,1400,10.91,36 months,8.61,1663.04,1400.00,0.00,45.78,0,0,0
2,85675,2007,Manipur,2007-01-06,1062010,10.0,RENT,Low,INDIVIDUAL,other,High,E,25000,1000,14.07,36 months,16.27,1231.38,1000.00,0.00,34.21,0,1,1
3,84918,2007,Andhra Pradesh,2007-01-09,1042008,10.0,MORTGAGE,Low,INDIVIDUAL,other,Low,A,65000,5000,7.43,36 months,0.28,5200.44,5000.00,0.00,155.38,0,0,0
4,84670,2007,Arunachal Pradesh,2007-01-06,1082009,10.0,MORTGAGE,High,INDIVIDUAL,other,Low,A,300000,5000,7.75,36 months,5.38,5565.65,5000.00,0.00,156.11,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
887374,68616825,2015,Nagaland,2015-01-12,1012016,2.0,RENT,Medium,INDIVIDUAL,debt_consolidation,Low,B,130000,20000,9.17,36 months,12.77,0.00,0.00,0.00,637.58,0,1,1
887375,68616851,2015,Telangana,2015-01-12,1012016,10.0,RENT,Low,INDIVIDUAL,credit_card,Low,A,52000,12500,7.49,36 months,6.83,0.00,0.00,0.00,388.78,0,0,0
887376,68616867,2015,Meghalaya,2015-01-12,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,C,80000,16000,14.85,60 months,19.59,247.39,181.39,0.00,379.39,0,1,1
887377,68616501,2015,Bihar,2015-01-12,1012016,10.0,MORTGAGE,Low,INDIVIDUAL,debt_consolidation,High,D,100000,25000,18.49,60 months,5.58,0.00,0.00,0.00,641.52,0,0,0


CPU times: user 311 ms, sys: 19.8 ms, total: 331 ms
Wall time: 330 ms


In [200]:
%%time
# 3.daskcudf (date_issued was already converted in the cuDF cell, so only partitioning and the query are timed)
date_ddf = dask_cudf.from_cudf(cudf_data, npartitions=2)
date_ddf.query('date_issued <= @search_date', local_dict={'search_date':search_date}).compute()

CPU times: user 42.8 ms, sys: 35.5 ms, total: 78.3 ms
Wall time: 79.2 ms
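
The raw `date_issued` strings look like `01/12/2015` in rows whose `year` is 2015 and whose `date_final` is `1012016`, which suggests they may be day-first (1 December 2015). That is an assumption about the data, not something the notebook states. Without a format hint, `pd.to_datetime` reads such strings month-first, which matches the `2015-01-12` values in the output above. A minimal pandas sketch of parsing with an explicit format, using two hypothetical sample strings:

```python
import pandas as pd

# Hypothetical sample in the same shape as the date_issued column
raw = pd.Series(["01/06/2007", "01/12/2015"])

# Default parsing treats the first field as the month
month_first = pd.to_datetime(raw)

# An explicit format treats the first field as the day
day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.dt.strftime("%Y-%m-%d").tolist())
print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

If the day-first assumption holds, the filter `date_issued <= 2015-11-23` would exclude the December 2015 rows, whereas the outputs above include them. `cudf.to_datetime` also accepts a `format` argument.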


### **Conclusion**

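* In the examples above, run on this large-scale [dataset](https://www.kaggle.com/c/home-credit-default-risk), cuDF exposes a pandas-like API and is clearly faster than pandas on the larger operations (for example `fillna`, `value_counts`, `applymap`, and `groupby`). On very small inputs, such as the MultiIndex, string, concat, append, and transpose examples, pandas is faster.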
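
Each `%%time` figure in this notebook comes from a single run, so differences of a few milliseconds are within run-to-run noise. A sketch of a repeat-and-take-the-median timer using only the standard library; `time_op` is a hypothetical helper, not part of pandas or cuDF:

```python
import statistics
import time

def time_op(fn, repeats=5):
    """Run fn() `repeats` times and return the median wall time in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; in the notebook this could be e.g. lambda: data.fillna(999)
elapsed_ms = time_op(lambda: sum(range(100_000)))
print(f"median: {elapsed_ms:.3f} ms")
```

For GPU libraries the first call of an operation may include one-off setup cost, so discarding the first sample may give a fairer comparison.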

### References

1. https://docs.rapids.ai/api/cudf/stable/10min.html
2. https://www.kaggle.com/c/home-credit-default-risk