# RAPIDS - Environment Sanity Check #

Run `!nvidia-smi` in the cell below to see which GPU your instance has. Currently, RAPIDS runs on all available Colab GPU instances.

In [1]:
!nvidia-smi

Sun Dec 31 12:04:16 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0  On |                  N/A |
| 38%   40C    P8    24W / 215W |    466MiB /  8192MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
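
The same check can be scripted when the GPU name is needed inside Python rather than read off the table. This is a minimal sketch, assuming `nvidia-smi` is on the `PATH`; the helper name `list_gpus` is hypothetical, and it returns an empty list when no driver tools are found.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible GPU (hypothetical helper)."""
    # Without the NVIDIA driver tools there is nothing to query.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # nvidia-smi exists but could not talk to a device.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU detected")
```

An empty result on Colab usually means the runtime type is not set to GPU.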

In [None]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py


# RAPIDS is now installed on Colab.  

![](https://docs.rapids.ai/api/cudf/stable/_images/duckdb-benchmark-groupby-join.png)
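
Before importing anything, it can help to confirm that the packages used in the next cell are actually discoverable. This is a rough sketch using only the standard library; the helper name `rapids_status` and the package list are assumptions chosen to match the imports that follow, not part of the install script.

```python
import importlib.util


def rapids_status(packages=("cudf", "cupy", "dask_cudf")):
    """Map each package name to True if Python can locate it (hypothetical helper)."""
    # find_spec only looks the module up; it does not import it or touch the GPU.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = rapids_status()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A `missing` entry suggests the install cell did not finish, so its output is worth rereading before continuing.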

In [7]:
import os

import cupy as cp
import pandas as pd

import cudf
import dask_cudf

cp.random.seed(12)

### MultiIndex

cuDF supports hierarchical indexing of DataFrames using MultiIndex. Grouping hierarchically (see Grouping below) automatically produces a DataFrame with a MultiIndex.
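
To illustrate the second sentence, the sketch below groups a small frame on two keys and inspects the resulting index. The data and the names `sales` and `totals` are made up for illustration. The snippet falls back to pandas when cuDF cannot be imported, on the assumption that both libraries behave the same way for this particular call.

```python
# Sketch only: prefer cuDF, fall back to pandas when no GPU install is available.
try:
    import cudf as xdf
except Exception:  # cuDF missing or unable to initialize without a GPU
    import pandas as xdf

# Hypothetical toy data, not taken from the notebook.
sales = xdf.DataFrame(
    {
        "region": ["east", "east", "west", "west"],
        "year": [2022, 2023, 2022, 2023],
        "units": [10, 20, 30, 40],
    }
)

# Grouping on two keys yields one index level per key.
totals = sales.groupby(["region", "year"]).sum()
print(type(totals.index).__name__)
print(list(totals.index.names))
```

Each grouping key becomes one level of the index, so rows of `totals` are addressed with tuples in the same way as `gdf1.loc[("b", 3)]` further down.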


In [8]:
arrays = [["a", "a", "b", "b"], [1, 2, 3, 4]]
tuples = list(zip(*arrays))
idx = cudf.MultiIndex.from_tuples(tuples)
idx

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 3),
            ('b', 4)],
           )

In [9]:
gdf1 = cudf.DataFrame(
  {"first": cp.random.rand(4), "second": cp.random.rand(4)}
)
gdf1.index = idx
gdf1

        first    second
a 1  0.082654  0.967955
  2  0.399417  0.441425
b 3  0.784297  0.793582
  4  0.070303  0.271711


In [10]:
gdf2 = cudf.DataFrame(
    {"first": cp.random.rand(4), "second": cp.random.rand(4)}
).T
gdf2.columns = idx
gdf2

               a                   b
               1         2         3         4
first   0.343382  0.003700  0.200430  0.581614
second  0.907812  0.101512  0.241790  0.224180


In [11]:
gdf1.loc[("b", 3)]

first     0.784297
second    0.793582
Name: ('b', 3), dtype: float64

In [13]:
gdf1.iloc[0:3]

        first    second
a 1  0.082654  0.967955
  2  0.399417  0.441425
b 3  0.784297  0.793582


In [18]:
gdf2[('a', 2)]

first     0.003700
second    0.101512
Name: ('a', 2), dtype: float64

### Missing Values

In [19]:
pdf = pd.DataFrame({"a": [0, 1, 2, 3], "b": [0.1, 0.2, None, 0.3]})
gdf = cudf.DataFrame.from_pandas(pdf)
gdf

   a     b
0  0   0.1
1  1   0.2
2  2  <NA>
3  3   0.3


In [20]:
gdf.isna()

       a      b
0  False  False
1  False  False
2  False   True
3  False  False


In [21]:
gdf["b"].notna()

0     True
1     True
2    False
3     True
Name: b, dtype: bool

In [22]:
gdf.fillna(999)

   a      b
0  0    0.1
1  1    0.2
2  2  999.0
3  3    0.3


In [23]:
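# Forward-fill: copy the last valid value down. fillna(method="ffill") is deprecated in recent pandas/cuDF releases.
gdf.ffill()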

   a    b
0  0  0.1
1  1  0.2
2  2  0.2
3  3  0.3


In [24]:
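# Backward-fill: copy the next valid value up. fillna(method="bfill") is deprecated in recent pandas/cuDF releases.
gdf.bfill()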

   a    b
0  0  0.1
1  1  0.2
2  2  0.3
3  3  0.3


In [27]:
gdf["b"].sum(skipna=True)

0.6000000000000001

In [25]:
gdf.dropna(axis=0)

   a    b
0  0  0.1
1  1  0.2
3  3  0.3


In [26]:
gdf.dropna(axis=1)

   a
0  0
1  1
2  2
3  3
