# <span style="color:#2E86C1"><b>Introduction to RAPIDS: GPU-Accelerated Data Science</b></span>

**RAPIDS** is an **open-source** suite of libraries and tools developed by NVIDIA to accelerate data science workflows using **NVIDIA GPUs**. Designed to speed up the end-to-end data pipeline, RAPIDS enables data processing, machine learning, and data visualization to run on the GPU, significantly reducing computational time compared to CPU-based processing.

### <span style="color:#D35400"><b>Key Libraries in RAPIDS</b></span>

RAPIDS includes several libraries optimized for specific stages in the data science pipeline, including:

 - Dataframe processing with cuDF (similar API to pandas)
 - Machine learning with cuML (similar API to scikit-learn)
 - Graph processing with cuGraph (similar API to NetworkX)
 - Spatial analytics with cuSpatial (similar API to GeoPandas)
 - Image processing with cuCIM (similar API to scikit-image)
 - Seamless cross-filtered dashboards with cuxfilter
 - Low-level compute primitives with RAFT
 - Apache Spark acceleration with Spark RAPIDS

and many more, listed [here](https://github.com/orgs/rapidsai/repositories?type=all).

### <span style="color:#28B463"><b>cuDF: GPU DataFrames for Data Processing</b></span>

**cuDF** is a RAPIDS library for working with DataFrames on GPUs, providing a similar API and functionality to pandas. It allows data scientists to leverage the power of parallel GPU computation for faster data manipulation and preprocessing.

- **Core Functionality**: Like **pandas**, cuDF enables operations like filtering, grouping, aggregating, merging, and joining. It’s designed to handle large datasets that might otherwise be slow to process on a CPU.
- **Syntax Similarity**: Since the cuDF syntax is nearly identical to pandas, users familiar with pandas can transition to GPU-accelerated workflows with minimal code changes.
- **Performance**: cuDF processes data in parallel on the GPU and can be many times faster than pandas on large datasets, though the actual speedup depends on the GPU, the data size, and the operation.

#### <span style="color:#8E44AD"><b>Example Comparison</b></span>

**Pandas**:
```python
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df['a'] + df['b']
```

**cuDF**:
```python
import cudf
df = cudf.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df['a'] + df['b']
```
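Because the two snippets above differ only in the import, one common pattern is to choose the DataFrame library at import time and fall back to pandas when cuDF is unavailable. A minimal sketch, assuming only that the modules are named `cudf` and `pandas`; `load_dataframe_backend` is a hypothetical helper, and a fallback like this only works for code that sticks to the API subset both libraries share:

```python
import importlib


def load_dataframe_backend(candidates=("cudf", "pandas")):
    """Hypothetical helper: return the first module in `candidates` that imports cleanly."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except Exception as exc:  # ImportError, or a CUDA/driver error raised by a GPU library
            errors.append(f"{name}: {exc}")
    raise ImportError("no DataFrame backend available; " + "; ".join(errors))


if __name__ == "__main__":
    try:
        xdf = load_dataframe_backend()
    except ImportError as err:
        print(err)
    else:
        # Same code as the comparison above, running on whichever backend loaded
        df = xdf.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
        df["c"] = df["a"] + df["b"]
        print(xdf.__name__)
        print(df)
```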

### <span style="color:#F39C12"><b>cuML: GPU Machine Learning Library</b></span>

**cuML** is the machine learning library within RAPIDS, providing GPU-accelerated implementations of machine learning algorithms similar to those in **scikit-learn**. This enables faster model training and prediction, particularly useful for large datasets.

- **Algorithm Support**: cuML includes popular algorithms like linear regression, k-means clustering, PCA, and more, with similar syntax to scikit-learn.
- **Performance**: By leveraging the parallel processing power of GPUs, cuML can train models and run inference much faster than CPU-based scikit-learn, particularly on large datasets.
- **Seamless Integration**: With an API that closely mirrors scikit-learn, cuML allows for a smooth transition for users who want to accelerate their machine learning workflows without learning new syntax.

#### <span style="color:#8E44AD"><b>Example Comparison</b></span>

**Scikit-learn**:
```python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

**cuML**:
```python
from cuml.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

### <span style="color:#3498DB"><b>Benefits of RAPIDS for Data Science</b></span>

1. **Speed**: GPU acceleration enables faster data processing, model training, and inference, reducing the overall time to insights.
2. **Familiar Syntax**: RAPIDS libraries like cuDF and cuML closely mirror pandas and scikit-learn, making it easy to adopt without extensive retraining.
3. **Scalability**: RAPIDS is particularly suited for large datasets that challenge CPU resources, allowing for scaling complex workflows on a single GPU or multiple GPUs.

RAPIDS represents a transformative approach for data scientists and machine learning practitioners, unlocking the computational power of GPUs for faster, more scalable data analysis and machine learning.

# <span style="color:#2E86C1"><b>Setting Up RAPIDS on Your System</b></span>

Setting up **RAPIDS** on your system enables the use of **cuDF**, **cuML**, and **cuGraph** libraries for GPU-accelerated data processing, machine learning, and graph analytics. Below, we’ll go through the setup process, ensuring that RAPIDS is correctly installed on a GPU-capable system. The setup includes verifying CUDA compatibility, installing dependencies, and setting up the RAPIDS libraries.

### <span style="color:#D35400"><b>1. System Requirements</b></span>

To use RAPIDS, you need:
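- An **NVIDIA GPU** supported by your RAPIDS release (RAPIDS 23.10 requires a Pascal-generation GPU with compute capability 6.0 or newer), plus an NVIDIA driver that supports the CUDA version you install (CUDA 11.x in this guide).
- A **Python** version supported by that release (this guide uses Python 3.10; check the RAPIDS installation guide for other releases).
- A compatible operating system: Linux is recommended; on Windows, RAPIDS runs through WSL2.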

> **Tip**: Run `nvidia-smi` before starting to confirm your GPU model, driver version, and the highest CUDA version the driver supports (shown as `CUDA Version`).
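The `nvidia-smi` header shows both the driver version and the highest CUDA version that driver supports. A minimal sketch for reading those two values from Python, assuming the banner keeps its usual `Driver Version: ...   CUDA Version: ...` wording; `parse_smi_banner` and `read_gpu_versions` are hypothetical helpers, not part of RAPIDS:

```python
import re
import shutil
import subprocess

# Hypothetical helpers (not part of RAPIDS). The pattern assumes the banner
# line looks like "Driver Version: 535.104.05   CUDA Version: 12.2".
_BANNER = re.compile(r"Driver Version:\s*([\d.]+).*?CUDA Version:\s*([\d.]+)")


def parse_smi_banner(text):
    """Return (driver_version, max_cuda_version) found in nvidia-smi output, or None."""
    match = _BANNER.search(text)
    return (match.group(1), match.group(2)) if match else None


def read_gpu_versions():
    """Run nvidia-smi if it is on PATH; return None when it is missing or prints no banner."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False)
    return parse_smi_banner(result.stdout)


if __name__ == "__main__":
    print(read_gpu_versions())  # None on machines without an NVIDIA driver
```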

### <span style="color:#28B463"><b>2. Setting Up a Conda Environment</b></span>

To keep dependencies organized, we’ll set up RAPIDS within a dedicated **Conda environment**.

```bash
# Create and activate a new Conda environment
conda create -n rapids_env python=3.10 -y
conda activate rapids_env
```

### <span style="color:#8E44AD"><b>3. Installing RAPIDS Libraries</b></span>

RAPIDS provides Conda packages that simplify the installation process. You can install the RAPIDS suite with specific versions, which will include **cuDF**, **cuML**, and **cuGraph**.

- **Installing RAPIDS using Conda** (recommended for stability):

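```bash
# Install cuDF, cuML and cuGraph from the RAPIDS, NVIDIA and conda-forge channels
conda install -c rapidsai -c nvidia -c conda-forge -c defaults \
    cudf=23.10 cuml=23.10 cugraph=23.10 python=3.10 cudatoolkit=11.2
```

In this example:
- We pin RAPIDS version **23.10** (the latest stable release at the time of writing) and **CUDA 11.2**. Adjust these values for your system; newer RAPIDS releases pin CUDA with `cuda-version` instead of `cudatoolkit`, so copy the exact command for your release from the RAPIDS installation guide.

> **Note**: conda installs its own CUDA runtime libraries, so `cudatoolkit` does not have to match a system-wide CUDA toolkit. Your NVIDIA driver must support it, though: as a safe rule, keep it at or below the `CUDA Version` reported by `nvidia-smi`.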
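Whether a conda CUDA pin can run on a machine depends on the NVIDIA driver. A conservative version of that check compares the requested CUDA version with the `CUDA Version` that `nvidia-smi` reports; the sketch below assumes both are dotted strings such as `"11.2"`, and the helper names are hypothetical. It ignores CUDA minor-version compatibility, which can let some newer 11.x runtimes work on older 11.x drivers, so a `False` result means "check the docs" rather than "will fail":

```python
# Hypothetical helpers (not part of RAPIDS or conda): a conservative check that
# ignores CUDA minor-version compatibility, which can allow more combinations.


def version_tuple(version):
    """Convert a dotted version string such as "11.2" into (11, 2)."""
    return tuple(int(part) for part in version.split("."))


def toolkit_fits_driver(requested_cuda, driver_max_cuda):
    """True if the CUDA version requested from conda is not newer than the
    highest CUDA version the driver reports in nvidia-smi."""
    return version_tuple(requested_cuda) <= version_tuple(driver_max_cuda)


if __name__ == "__main__":
    print(toolkit_fits_driver("11.2", "12.2"))  # True
    print(toolkit_fits_driver("12.0", "11.8"))  # False
```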

### <span style="color:#F39C12"><b>4. Verifying the Installation</b></span>

After installing RAPIDS, you can test each library to confirm they’re working correctly.

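```python
import cudf
import cuml
import cugraph
from cuml.linear_model import LinearRegression

print(cudf.__version__, cuml.__version__, cugraph.__version__)

# cuDF: build a small GPU DataFrame
df = cudf.DataFrame({'a': [0.0, 1.0, 2.0], 'b': [3.0, 4.0, 5.0]})
print(df.head())

# cuML: fit a tiny linear model on the GPU (here b = a + 3)
model = LinearRegression()
model.fit(df[['a']], df['b'])
print(model.predict(df[['a']]))
print("cuDF and cuML are working if no error was raised.")

# cuGraph: build a small graph from an edge list
edges = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 3]})
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source='src', destination='dst')
print(G.number_of_vertices(), "vertices")
print("cuGraph is working if no error was raised.")
```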
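When one of these imports fails, the first question is whether the package is missing or installed but unable to load (often a driver or CUDA mismatch). A minimal standard-library sketch that answers it without importing the GPU libraries, assuming the module names `cudf`, `cuml`, and `cugraph` used above; `rapids_package_report` is a hypothetical helper:

```python
import importlib.util
from importlib import metadata


def rapids_package_report(names=("cudf", "cuml", "cugraph")):
    """Hypothetical helper. Map each module name to its installed version,
    "unknown" if the module exists but no distribution of that name is found,
    or None if it is not installed. find_spec locates modules without importing them."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # pip wheels use distribution names such as "cudf-cu12", so the lookup can miss
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for package, version in rapids_package_report().items():
        print(f"{package:8s} {version or 'not installed'}")
```

A package reported as installed that still fails to import usually points back to the CUDA compatibility checks in the troubleshooting list.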

### <span style="color:#3498DB"><b>5. Running RAPIDS on Jupyter Notebooks (Optional)</b></span>

For ease of development, you can set up Jupyter Notebook to work with RAPIDS.

```bash
# Install Jupyter Notebook
conda install -c conda-forge notebook

# Launch Jupyter Notebook
jupyter notebook
```

> After launching, you can open a new notebook and test RAPIDS functions as shown above.

### <span style="color:#27AE60"><b>Troubleshooting</b></span>

1. **CUDA Compatibility**: Ensure your NVIDIA driver supports the CUDA version requested from conda (the `cudatoolkit` pin); compare it with the `CUDA Version` reported by `nvidia-smi`.
2. **Library Import Errors**: If you encounter import errors, try upgrading or downgrading the RAPIDS version to match your system’s GPU and CUDA compatibility.

RAPIDS is now set up, enabling you to leverage GPU acceleration for data processing, machine learning, and graph analytics with **cuDF**, **cuML**, and **cuGraph**!

### <span style="color:#8E44AD"><b>For More Information on cuDF and cuML, Visit These Sites:</b></span>
-   <b>RAPIDS Site: </b>[Click here](https://rapids.ai/)
-   <b>cuDF Documentation: </b>[Click here](https://docs.rapids.ai/api/cudf/stable/user_guide/)
-   <b>cuML Documentation: </b>[Click here](https://docs.rapids.ai/api/cuml/stable/user_guide/)

## <span style="color:#2E86C1"><b>CODE</b></span>
#### <b>cuDF and cuML are open-source projects under active development, so you may need some troubleshooting before CPU code runs on the GPU with these libraries.</b>

```python
# Optional: install RAPIDS from inside the notebook (uncomment to run).
# Check GPU/CUDA compatibility first, otherwise the libraries will not import.
# !conda install -c rapidsai -c nvidia -c conda-forge rapids=22.12 cudatoolkit=11.2
```

```python
import cudf as pd  # cuDF (GPU DataFrames), aliased as pd so pandas-style code reads the same
from cuml.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from cuml.linear_model import LogisticRegression
from cuml.preprocessing import LabelEncoder
from cuml.metrics import confusion_matrix
```

```python
data = pd.read_csv('/kaggle/input/crop-recommendation-dataset/Crop_Recommendation.csv')
data
```

| | Nitrogen | Phosphorus | Potassium | Temperature | Humidity | pH_Value | Rainfall | Crop |
|---|---|---|---|---|---|---|---|---|
| 0 | 90 | 42 | 43 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | Rice |
| 1 | 85 | 58 | 41 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | Rice |
| 2 | 60 | 55 | 44 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | Rice |
| 3 | 74 | 35 | 40 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | Rice |
| 4 | 78 | 42 | 42 | 20.130175 | 81.604873 | 7.628473 | 262.717340 | Rice |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2195 | 107 | 34 | 32 | 26.774637 | 66.413269 | 6.780064 | 177.774507 | Coffee |
| 2196 | 99 | 15 | 27 | 27.417112 | 56.636362 | 6.086922 | 127.924610 | Coffee |
| 2197 | 118 | 33 | 30 | 24.131797 | 67.225123 | 6.362608 | 173.322839 | Coffee |
| 2198 | 117 | 32 | 34 | 26.272418 | 52.127394 | 6.758793 | 127.175293 | Coffee |


```python
data.info()
```

```
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Nitrogen     2200 non-null   int64
 1   Phosphorus   2200 non-null   int64
 2   Potassium    2200 non-null   int64
 3   Temperature  2200 non-null   float64
 4   Humidity     2200 non-null   float64
 5   pH_Value     2200 non-null   float64
 6   Rainfall     2200 non-null   float64
 7   Crop         2200 non-null   object
dtypes: float64(4), int64(3), object(1)
memory usage: 144.2+ KB
```


```python
data.isna().sum()
```

```
Nitrogen       0
Phosphorus     0
Potassium      0
Temperature    0
Humidity       0
pH_Value       0
Rainfall       0
Crop           0
dtype: int64
```

```python
encoder = LabelEncoder()
```

```python
x = data.drop(columns='Crop')             # feature columns
y = encoder.fit_transform(data['Crop'])   # crop names encoded as integer class ids
```
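`LabelEncoder.fit_transform` replaces each crop name in `Crop` with an integer class id, which is the form `LogisticRegression` trains on. The idea can be sketched in plain Python as below, assuming the scikit-learn convention of numbering classes in sorted order; whether cuML's `LabelEncoder` orders classes identically is an assumption worth checking on your version, and `label_encode` is a hypothetical helper:

```python
def label_encode(labels):
    """Hypothetical helper: map each distinct label to an integer code,
    numbering classes in sorted order (scikit-learn's LabelEncoder convention)."""
    classes = sorted(set(labels))
    index = {label: code for code, label in enumerate(classes)}
    return classes, [index[label] for label in labels]


if __name__ == "__main__":
    # Toy crop labels, not the real dataset
    classes, codes = label_encode(["Rice", "Coffee", "Rice", "Maize"])
    print(classes)  # ['Coffee', 'Maize', 'Rice']
    print(codes)    # [2, 0, 2, 1]
```

Indexing `classes` with a code recovers the original name, which is how predicted ids map back to crop names.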

```python
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=23, shuffle=True, stratify=y
)
```
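`stratify=y` makes the split keep each crop's share of rows in both halves, which is why every complete row of the confusion matrix further down sums to 30 (30% of 100 samples per crop, consistent with 2,200 rows over 22 crops). A plain-Python sketch of a stratified split, assuming a per-class shuffle-and-cut approach; `stratified_split` is hypothetical and will not reproduce the exact indices cuML's `train_test_split` picks:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, seed=23):
    """Hypothetical helper: return (train_idx, test_idx), sending about
    test_size of every class to the test set so class proportions are preserved."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        n_test = round(len(indices) * test_size)  # per-class test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    # Balanced toy labels: 3 classes x 100 samples each
    labels = [c for c in range(3) for _ in range(100)]
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))  # 210 90
```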

```python
LR = LogisticRegression()
```

```python
LR.fit(x_train, y_train)
```

```
[W] [06:29:41.639421] L-BFGS: max iterations reached
[W] [06:29:41.639781] Maximum iterations reached before solver is converged. To increase model accuracy you can increase the number of iterations (max_iter) or improve the scaling of the input data.
```
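The warning names two remedies: raise `max_iter` or improve the scaling of the input. Scaling matters here because the features live on very different ranges; in the rows shown above, `Rainfall` is in the hundreds while `pH_Value` stays between about 6 and 8. A minimal plain-Python sketch of standardization (zero mean, unit variance per column), with statistics computed on the training rows only and reused for the test rows so no test information leaks into training; the helper names are hypothetical. In the notebook itself you would more likely use a GPU scaler such as `cuml.preprocessing.StandardScaler` (assuming your cuML version ships it) before refitting `LogisticRegression`:

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Hypothetical helper: per-column (mean, population std) from training rows."""
    return [(fmean(col), pstdev(col)) for col in zip(*rows)]


def standardize(rows, stats):
    """Hypothetical helper: scale each column with the training statistics;
    constant columns (std == 0) are mapped to 0.0 to avoid dividing by zero."""
    return [
        [(value - mean) / std if std else 0.0 for value, (mean, std) in zip(row, stats)]
        for row in rows
    ]


if __name__ == "__main__":
    # Toy feature rows on very different scales: [pH_Value, Rainfall]
    x_train_rows = [[6.5, 200.0], [7.0, 250.0], [6.0, 150.0]]
    x_test_rows = [[6.8, 220.0]]
    stats = fit_standardizer(x_train_rows)    # statistics from training rows only
    print(standardize(x_train_rows, stats))
    print(standardize(x_test_rows, stats))    # test rows reuse the training statistics
```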


```python
# Predict crop labels for the held-out test set
y_pred = LR.predict(x_test)

# Cast both label arrays to the same integer dtype before comparing them
y_test = y_test.astype('int32')
y_pred = y_pred.astype('int32')

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
```


[[30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 26  0  0  0  0  0  0  0  1  0  0  3  0  0  0  0  0  0  0  0]
 [ 0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  1  0  0 26  0  0  0  0  0  0  0  0  0  0  0  3  0]
 [ 0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  4  0  0  0  0 26  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0  0  0  0]
 [ 0  0  4  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0