### cuML

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions, and that share compatible APIs with other RAPIDS projects.

cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn.

For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
    
[GitHub](https://github.com/rapidsai/cuml) | [Welcome Notebook](../welcome.ipynb#cuML---RAPIDS-Machine-Learning-Library)

In [1]:
from blazingsql import BlazingContext

# connect to BlazingSQL w/ BlazingContext API
bc = BlazingContext(pool=False)

BlazingContext ready


In [9]:
import os

# BlazingContext requires full data path
data_path = f'{os.getcwd().split("/intro_notebooks")[0]}/data/sample_taxi.csv'

# what's the data's path?
print(f"data_path == '{data_path}'\n")

# create a BlazingSQL table from any file w/ .create_table(table_name, file_path)
bc.create_table('taxi', data_path, header=0)

data_path == '/jupyterhub-homes/winston@blazingdb.com/blazingsql_notebooks/data/sample_taxi.csv'



<pyblazing.apiv2.context.BlazingTable at 0x7fe5000e9ed0>

In [None]:
taxi_columns = [col for col in bc.sql('select * from taxi').columns if col not in ['fare_amount', 'total_amount']]

taxi_columns

In [None]:
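from cuml.preprocessing.model_selection import train_test_split  # newer cuML releases: cuml.model_selection

# features are the columns in taxi_columns; the target is assumed to be fare_amount
X = bc.sql(f"SELECT {', '.join(taxi_columns)} FROM taxi")
y = bc.sql('SELECT fare_amount FROM taxi')['fare_amount']

# split data into training & testing sets (70:30)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)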
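
To make the 70:30 split above more concrete, here is a minimal pure-Python sketch of what a shuffled train/test split does conceptually. It is an illustration only, not cuML's implementation: the helper name `train_test_split_sketch`, the `seed` parameter, and the toy data are all hypothetical, and it runs on the CPU with plain lists.

```python
import random


def train_test_split_sketch(rows, targets, train_size=0.7, seed=0):
    """Shuffle row indices, then cut them at train_size (illustrative helper, not cuML)."""
    if len(rows) != len(targets):
        raise ValueError("rows and targets must have the same length")

    # shuffle the indices rather than the data so features and targets stay aligned
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    # the first `cut` shuffled indices form the training set, the rest the test set
    cut = round(len(indices) * train_size)
    train_idx, test_idx = indices[:cut], indices[cut:]

    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# made-up toy data: 10 rows with 2 features each, target = 3 * first feature
rows = [[float(i), float(i) * 2.0] for i in range(10)]
targets = [float(i) * 3.0 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split_sketch(rows, targets, train_size=0.7)
print(len(X_train), len(X_test))
```

Shuffling indices instead of the rows themselves is the design choice worth noting: it guarantees each feature row keeps its matching target after the split.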

In [None]:
# convert cuDF DataFrame to pandas to identify feature correlation 
corr = bc.sql('select * from taxi').to_pandas().corr()

# visualize correlations
corr.style.background_gradient(cmap='coolwarm')
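
For reference, pandas' `.corr()` computes pairwise Pearson correlation by default. The sketch below shows that calculation in plain Python on a few made-up values. The helper names `pearson` and `corr_matrix`, the column `trip_distance`, and every number are assumptions for illustration, not values from the sample taxi data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance and variances, left unnormalized because the 1/n factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def corr_matrix(columns):
    """Nested dict of pairwise correlations for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# made-up toy columns; fare_amount is an exact linear function of trip_distance here
toy = {
    "trip_distance": [1.0, 2.0, 3.0, 4.0, 5.0],
    "fare_amount": [2.5, 4.5, 6.5, 8.5, 10.5],
    "passenger_count": [3.0, 1.0, 4.0, 1.0, 5.0],
}

matrix = corr_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values near 1 or -1 mark strongly related column pairs, which is what the `coolwarm` gradient highlights in the styled table.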