# Querying MinIO with BlazingSQL

[Read on Medium](https://blog.blazingdb.com/querying-minio-with-blazingsql-91b6b3485027?source=friends_link&sk=a30c725b5bd3e9394801e21fbf954283) | [Launch BlazingSQL Notebooks](https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/blog_posts/querying_minio_with_blazingsql.ipynb)

BlazingSQL is an open-source project, and as such, we gladly receive feature requests on our GitHub repository all the time. One such request ([#242](https://github.com/BlazingDB/blazingsql/issues/242)) asked for the ability to register a Storage Plugin for a service that is compatible with the AWS S3 API, in this case MinIO.

> MinIO is a high performance, Kubernetes-friendly, object store released open source under the Apache License v2.0. It is API compatible with Amazon S3 cloud storage service. Using MinIO, you can build high performance infrastructure for machine learning, analytics and application data workloads.

In this Notebook, we'll go over how to install MinIO Server and register a MinIO bucket with BlazingSQL so we can run queries on top of files that are stored in a MinIO cluster.

In [None]:
from blazingsql import BlazingContext
bc = BlazingContext()

### Set Up MinIO Server
MinIO can be deployed on Linux, Kubernetes, macOS, and Windows, or built from source. This demo was built in an Ubuntu 18.04 environment, so we’ll set up MinIO for Linux.

Start by downloading the MinIO Server. The code cell below will download MinIO Server one level above where you are running this Notebook.

In [None]:
import os

# do we have minio downloaded?
if not os.path.exists('../minio'):
    print('downloading minio')
    # no, so download it one directory above here
    !wget -P .. https://dl.min.io/server/minio/release/linux-amd64/minio

Once the download completes, make the binary executable with:

In [None]:
!chmod +x ../minio

### Launch MinIO Server
With MinIO Server installed, you’re now equipped to start a Server by calling `./minio server` followed by the relative path to where you’d like the Server to run. In this case, we’re going to set up inside the `Welcome_to_BlazingSQL_Notebooks` repo.

**Note**: running a continuous server in a Jupyter Notebook Code cell means that cell will execute until the server is turned off, so you will not be able to run the MinIO Server within this Notebook and complete the demo simultaneously. Please start a Terminal session with the JupyterLab Launcher, then:

```bash
cd Welcome_to_BlazingSQL_Notebooks

./minio server .
```

In [None]:
# !../minio server .
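Before moving on, it can help to confirm that something is actually listening on the Server's port. The snippet below is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical, the host `127.0.0.1` is an assumption about where your Terminal session runs, and port `9000` is taken from the endpoint used later in this Notebook.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host unreachable
        return False

if __name__ == "__main__":
    # assumption: the Server was started locally and listens on port 9000
    print(is_port_open("127.0.0.1", 9000))
```

A `True` result only shows that the port accepts connections, not that the process behind it is MinIO, so treat it as a quick sanity check.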

<img src="../data/imgs/minio_server.png" width="69%" />

### Register MinIO S3 bucket
With the Server running in `Welcome_to_BlazingSQL_Notebooks`, all of the repo's sub-directories are automatically accessible as S3 buckets.
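To see which sub-directories are candidates for buckets, you can list them from Python. This is a rough sketch, not MinIO's own logic: the helper name `candidate_buckets` is hypothetical, it assumes the Server root is one level above this Notebook, and it does not check S3 bucket naming rules, so MinIO may not expose every directory it prints.

```python
import os

def candidate_buckets(server_root):
    """List the immediate, non-hidden sub-directories of `server_root`, sorted by name."""
    return sorted(
        entry.name
        for entry in os.scandir(server_root)
        if entry.is_dir() and not entry.name.startswith(".")
    )

if __name__ == "__main__":
    # assumption: the Server was started one directory above this Notebook
    print(candidate_buckets(".."))
```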

For this demo, we'll use data stored in the `data` bucket. As this bucket is not public, we'll need to input our `access_key_id` and `secret_key`.

To help BlazingSQL locate the MinIO bucket, we'll also input the Server's URL as the `endpoint_override`.

`endpoint_override` is a new parameter that was added with [BlazingDB/blazingsql#524](https://github.com/BlazingDB/blazingsql/pull/524) to support MinIO S3 by extending our AWS S3 Storage Plugin to use custom URL endpoints.

In [None]:
bc.s3('taxi', 
      bucket_name='data',
      access_key_id='minioadmin', 
      secret_key='minioadmin',
      endpoint_override="http://172.17.0.1:9000")
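Hard-coding credentials is fine for a local demo, but you may prefer to read them from the environment. The sketch below builds the same keyword arguments used in the cell above; the helper `minio_s3_kwargs` and the environment variable names `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` are hypothetical choices for this example, and the `minioadmin` fallbacks are simply the values used in this Notebook.

```python
import os

def minio_s3_kwargs(bucket_name, endpoint, env=os.environ):
    """Build keyword arguments for bc.s3(), reading credentials from `env`."""
    return {
        "bucket_name": bucket_name,
        # hypothetical variable names; fall back to the demo credentials
        "access_key_id": env.get("MINIO_ACCESS_KEY", "minioadmin"),
        "secret_key": env.get("MINIO_SECRET_KEY", "minioadmin"),
        "endpoint_override": endpoint,
    }

kwargs = minio_s3_kwargs("data", "http://172.17.0.1:9000")
# with a BlazingContext available, this would be passed as:
# bc.s3('taxi', **kwargs)
```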

### Create & Query table from MinIO S3 bucket

Now that the MinIO S3 bucket is registered with BlazingContext, we can easily create & query tables from data stored there.

In [None]:
bc.create_table('taxi', 's3://taxi/sample_taxi.csv')
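Note that in the path above, `taxi` is the name the bucket was registered under with `bc.s3()`, not the bucket's own name (`data`). A small helper can make that explicit when building paths; `s3_uri` is a hypothetical name used only for illustration.

```python
def s3_uri(registered_name, key):
    """Join the name passed to bc.s3() and an object key into an s3:// path."""
    return f"s3://{registered_name}/{key.lstrip('/')}"

print(s3_uri("taxi", "sample_taxi.csv"))  # s3://taxi/sample_taxi.csv
```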

And pull DataFrames from those tables with `.sql()`:

In [None]:
bc.sql("select * from taxi")

Hand off results to data viz packages like Matplotlib the same way:

In [None]:
query = '''
        SELECT
            CAST(trip_distance AS int) AS int_dist, tip_amount
        FROM
            taxi
        WHERE
            trip_distance <= 20
            AND tip_amount BETWEEN 0 AND 40
        '''
bc.sql(query).to_pandas().plot(kind='scatter', x='int_dist', y='tip_amount', figsize=(12, 4))

Train machine learning models with suites like cuML:

In [None]:
from cuml import LinearRegression
lr = LinearRegression()

lr.fit(X=bc.sql('SELECT tip_amount, passenger_count FROM taxi'),
       y=bc.sql('SELECT trip_distance FROM taxi')['trip_distance'])

import cudf
df = cudf.DataFrame()

df['tip_amount'] = [0.00, 5.00, 20.00]
df['passenger_count'] = [1.0, 1.0, 1.0]

lr.predict(df)

Or whatever else you please!

<a href='https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/blog_posts/querying_minio_with_blazingsql.ipynb'><img src="https://blazingsql.com/launch-notebooks.png" alt="Launch on BlazingSQL Notebooks" width="500"/></a>