# Build Your First Machine Learning Project - Part 1

In this series, we'll build our first machine learning project end-to-end in Python. We'll do this in four bite-sized parts:
1. Data operations - Ingest data, wrangle it and write it to Snowflake
2. Exploratory data analysis (EDA) - Explore data, summary statistics, data visualization
3. Machine learning (ML) model building - Prepare data and build ML model
4. Data app - Build a sharable data app with Streamlit


## Install Prerequisite Libraries

Snowflake Notebooks includes common Python libraries by default. To add more, use the **Packages** dropdown in the top right. 

Let's add the following packages:
- `modin` - Perform data operations (read/write) and wrangling just like pandas with the [Snowpark pandas API](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/modin/index)
- `scikit-learn` - Perform data splits and build machine learning models

## Wine dataset

The dataset poses a classic multi-class classification problem: the goal of the machine learning task is to assign each wine to one of three classes based on its chemical analysis.

The wine dataset comprises 178 wine samples. Each sample is described by 13 features obtained from chemical analysis (also known as parameters, independent variables or X variables) and is assigned to one of three classes, which correspond to the three different cultivars in Italy that the samples are derived from.



## Load data

Here, we'll use the wine dataset via `load_wine()` as provided by the `scikit-learn` package.

`load_wine()` returns a dictionary-like object containing the following that we'll use:
- `data` - a 2D data array with 13 feature columns and 178 wine sample rows
- `target` - a 1D data array of the target class
- `feature_names` - a list of the 13 feature names
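
As a quick sanity check on the figures above, the following sketch inspects the object returned by `load_wine()`. It uses only scikit-learn and NumPy, and the variable names (`n_samples`, `n_features`, `class_counts`) are illustrative rather than part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Feature matrix: one row per wine sample, one column per feature
n_samples, n_features = wine.data.shape

# Target vector: one integer class label per sample; count samples per class
class_counts = np.bincount(wine.target)

print(n_samples, n_features)      # rows and columns of `data`
print(len(wine.feature_names))    # number of feature names
print(class_counts)               # number of samples in each class
```

The counts show how the samples are distributed across the three classes, which is worth knowing before splitting the data for model building.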

In [None]:
from sklearn.datasets import load_wine

# Load the wine dataset
wine = load_wine()
wine

Create a DataFrame from `data`, `target` and `feature_names`, and convert the class column to the categorical type.

In [None]:
import modin.pandas as pd
import snowflake.snowpark.modin.plugin

# Create a DataFrame from the feature data
df = pd.DataFrame(data=wine.data, columns=wine.feature_names)

# Add class column and convert to categorical
df['class'] = pd.Categorical(wine.target)

df

## Data Preparation

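One of the column names contains a forward slash (`/`), which we'll replace with an underscore (`_`) so the name is SQL-compatible. The same step also replaces any spaces with underscores and lowercases every column name.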
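
To see what this cleanup does without a DataFrame, here is a minimal sketch that applies the same chain of string methods to a few of the original feature names. The helper name `clean_column_name` is hypothetical and used only for illustration.

```python
def clean_column_name(name: str) -> str:
    """Replace spaces and slashes with underscores, then lowercase."""
    return name.replace(' ', '_').replace('/', '_').lower()

# A few of the feature names returned by load_wine()
raw_names = ['alcohol', 'malic_acid', 'od280/od315_of_diluted_wines']

cleaned = [clean_column_name(name) for name in raw_names]
print(cleaned)
```

Only the name containing `/` changes here; the others are already lowercase and free of spaces.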

In [None]:
# Rename 'od280/od315_of_diluted_wines' to 'od280_od315_of_diluted_wines' for SQL compatibility
df.columns = [col.replace(' ', '_').replace('/', '_').lower() for col in df.columns]
df

## Code refactoring

We can combine everything we've done above into a single custom function called `load_data()` for data loading.

In [None]:
# Load and prepare data
def load_data():
    wine = load_wine()
    # Create DataFrame with feature names
    df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
    # Add class column and convert to categorical
    df['class'] = pd.Categorical(wine.target)
    # Rename 'od280/od315_of_diluted_wines' to 'od280_od315_of_diluted_wines' for SQL compatibility
    df.columns = [col.replace(' ', '_').replace('/', '_').lower() for col in df.columns]
    return df

# Load the data
df = load_data()
df

## Write data to a database table

### Determine current database and schema

Before we write to a Snowflake database table, let's find the database and schema in which this notebook is located, which is also where our database table will reside.

In [None]:
SELECT CURRENT_DATABASE(), CURRENT_SCHEMA();

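Now write the DataFrame to a table named `wine`, replacing the table if it already exists and leaving out the index.

In [None]: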
df.to_snowflake("wine", if_exists="replace", index=False)

## Query data from table
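
The query that follows uses a fully qualified table name of the form `DATABASE.SCHEMA.TABLE`, so the database and schema should match the values returned by `CURRENT_DATABASE()` and `CURRENT_SCHEMA()` in your own account. As a rough sketch, such a name could be assembled as shown here. The function `qualified_table_name` and the database name `my_database` are hypothetical placeholders, and the sketch assumes unquoted identifiers, which Snowflake resolves as uppercase.

```python
def qualified_table_name(database: str, schema: str, table: str) -> str:
    """Join the three parts into DATABASE.SCHEMA.TABLE (unquoted identifiers)."""
    parts = [database, schema, table]
    if not all(part.strip() for part in parts):
        raise ValueError("database, schema and table must all be non-empty")
    return ".".join(part.strip().upper() for part in parts)

# "my_database" is a placeholder; substitute the output of CURRENT_DATABASE()
table_name = qualified_table_name("my_database", "public", "wine")
query = f"SELECT * FROM {table_name}"
print(query)
```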

In [None]:
SELECT * FROM CHANINN_DEMO_DATA.PUBLIC.WINE

## Resources
If you'd like to take a deeper dive into Snowpark pandas:
- [pandas on Snowflake](https://docs.snowflake.com/en/developer-guide/snowpark/python/pandas-on-snowflake)
- [Snowpark pandas API](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/modin/index)
- [YouTube Playlist on Snowflake Notebooks](https://www.youtube.com/watch?v=YB1B6vcMaGE&list=PLavJpcg8cl1Efw8x_fBKmfA2AMwjUaeBI)