# Application of classical machine learning and MLP methods to predict the flavor of wine by its chemical composition
In this section, we use classical machine learning to predict the flavor of wine from its chemical composition. Several types of models were tried: a random forest (an ensemble of decision trees), gradient boosting (XGBoost and CatBoost), and a multilayer perceptron (MLP), since these methods generally perform well on tabular data.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import make_scorer, accuracy_score, f1_score

## 1. Data loading and processing
To apply classical ML classifiers to the chemical composition matrices, each matrix must first be "flattened" into a vector, because these models expect one fixed-length feature vector per sample.
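The flattening step can be sketched on a small synthetic array. The toy shape below (3 samples, each a 4 x 5 matrix) and the names `X_toy` and `X_toy_flat` are assumptions for illustration only; the notebook itself reshapes the real data to 449 rows of 44 * 100 features.

```python
import numpy as np

# Toy stand-in for X_array: 3 samples, each a 4 x 5 matrix (illustrative shape only)
X_toy = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Keep the sample axis and merge all remaining axes into one feature axis
X_toy_flat = X_toy.reshape(X_toy.shape[0], -1)

print(X_toy_flat.shape)  # (3, 20)
```

Passing `-1` lets NumPy infer the feature count, so the same line works if the number of samples or the matrix size changes.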

In [2]:
# Loading matrices and target lists
X_array = np.load('X_array.npy')
Y_array = np.load('Y_array.npy')

# Flatten each sample's matrix into one feature vector (449 samples x 4400 features)
X_flattened = X_array.reshape(X_array.shape[0], -1)

# Number of target parameters
num_targets = Y_array.shape[1]
target_names = ['Herbs and spices', 'Tobacco/Smoke', 'Wood', 'Berries', 'Citrus',
                'Fruits', 'Nuts', 'Coffee', 'Chocolate/Cacao', 'Flowers']

# Creating lists to store results
results = {
    'Target': [],
    'Model': [],
    'Train Accuracy': [],
    'Train F1-Score': [],
    'CV Accuracy': [],
    'CV F1-Score': [],
    'Test Accuracy': [],
    'Test F1-Score': []
}

# Function for cross-validation and model evaluation
def cross_validate_and_evaluate(model_name, model, X_train, y_train, X_test, y_test):
    # Train model
    model.fit(X_train, y_train)
    
    # Train set metrics
    y_train_pred = model.predict(X_train)
    train_accuracy = accuracy_score(y_train, y_train_pred)
    train_f1 = f1_score(y_train, y_train_pred, average='weighted')
    
    # Cross-validation metrics
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    accuracy_scorer = make_scorer(accuracy_score)
    f1_scorer = make_scorer(f1_score, average='weighted')
    
    cv_accuracies = cross_val_score(model, X_train, y_train, cv=kf, scoring=accuracy_scorer)
    cv_f1_scores = cross_val_score(model, X_train, y_train, cv=kf, scoring=f1_scorer)
    
    cv_accuracy = cv_accuracies.mean()
    cv_f1 = cv_f1_scores.mean()
    
    # Test set metrics
    y_test_pred = model.predict(X_test)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    test_f1 = f1_score(y_test, y_test_pred, average='weighted')
    
    return train_accuracy, train_f1, cv_accuracy, cv_f1, test_accuracy, test_f1

# Main loop for training and evaluating models
for i in range(num_targets):
    # Select the i-th target parameter
    y = Y_array[:, i]

    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X_flattened, y, test_size=0.2, random_state=42)

    # Models for training
    models = {
        'RandomForest': RandomForestClassifier(random_state=42),
        'XGBoost': XGBClassifier(random_state=42, use_label_encoder=False, eval_metric='mlogloss'),
        'CatBoost': CatBoostClassifier(random_state=42, silent=True),
        'MLP': MLPClassifier(random_state=42)
    }
    
    # Train and evaluate each model
    for model_name, model in models.items():
        train_accuracy, train_f1, cv_accuracy, cv_f1, test_accuracy, test_f1 = cross_validate_and_evaluate(model_name, model, X_train, y_train, X_test, y_test)
        results['Target'].append(target_names[i])
        results['Model'].append(model_name)
        results['Train Accuracy'].append(train_accuracy)
        results['Train F1-Score'].append(train_f1)
        results['CV Accuracy'].append(cv_accuracy)
        results['CV F1-Score'].append(cv_f1)
        results['Test Accuracy'].append(test_accuracy)
        results['Test F1-Score'].append(test_f1)
        print(f"Target {target_names[i]} - {model_name} Train Accuracy: {train_accuracy:.4f}, Train F1-Score: {train_f1:.4f}, CV Accuracy: {cv_accuracy:.4f}, CV F1-Score: {cv_f1:.4f}, Test Accuracy: {test_accuracy:.4f}, Test F1-Score: {test_f1:.4f}")

# Create DataFrame with results
results_df = pd.DataFrame(results)
results_df

Target Herbs and spices - RandomForest Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.6575, CV F1-Score: 0.6510, Test Accuracy: 0.7000, Test F1-Score: 0.6971


Parameters: { "use_label_encoder" } are not used.

Target Herbs and spices - XGBoost Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.6268, CV F1-Score: 0.6209, Test Accuracy: 0.7111, Test F1-Score: 0.7076
Target Herbs and spices - CatBoost Train Accuracy: 0.9109, Train F1-Score: 0.9084, CV Accuracy: 0.6910, CV F1-Score: 0.6784, Test Accuracy: 0.7000, Test F1-Score: 0.6910




Target Herbs and spices - MLP Train Accuracy: 0.7549, Train F1-Score: 0.7327, CV Accuracy: 0.6937, CV F1-Score: 0.6662, Test Accuracy: 0.6222, Test F1-Score: 0.5962
Target Tobacco/Smoke - RandomForest Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.8523, CV F1-Score: 0.8152, Test Accuracy: 0.9222, Test F1-Score: 0.9189


Target Tobacco/Smoke - XGBoost Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.8552, CV F1-Score: 0.8242, Test Accuracy: 0.9444, Test F1-Score: 0.9254
Target Tobacco/Smoke - CatBoost Train Accuracy: 0.9471, Train F1-Score: 0.9399, CV Accuracy: 0.8607, CV F1-Score: 0.8197, Test Accuracy: 0.9333, Test F1-Score: 0.9011




Target Tobacco/Smoke - MLP Train Accuracy: 0.8858, Train F1-Score: 0.8321, CV Accuracy: 0.8858, CV F1-Score: 0.8326, Test Accuracy: 0.9333, Test F1-Score: 0.9011
Target Wood - RandomForest Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.8079, CV F1-Score: 0.7627, Test Accuracy: 0.7778, Test F1-Score: 0.7169


Target Wood - XGBoost Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.7828, CV F1-Score: 0.7582, Test Accuracy: 0.7556, Test F1-Score: 0.7038
Target Wood - CatBoost Train Accuracy: 0.9276, Train F1-Score: 0.9181, CV Accuracy: 0.8218, CV F1-Score: 0.7620, Test Accuracy: 0.7889, Test F1-Score: 0.7235




Target Wood - MLP Train Accuracy: 0.8468, Train F1-Score: 0.7793, CV Accuracy: 0.8413, CV F1-Score: 0.7719, Test Accuracy: 0.7889, Test F1-Score: 0.7056
Target Berries - RandomForest Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.7214, CV F1-Score: 0.7223, Test Accuracy: 0.7000, Test F1-Score: 0.6985


Target Berries - XGBoost Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.7381, CV F1-Score: 0.7391, Test Accuracy: 0.6333, Test F1-Score: 0.6328
Target Berries - CatBoost Train Accuracy: 0.9443, Train F1-Score: 0.9442, CV Accuracy: 0.7243, CV F1-Score: 0.7244, Test Accuracy: 0.6889, Test F1-Score: 0.6867




Target Berries - MLP Train Accuracy: 0.7660, Train F1-Score: 0.7635, CV Accuracy: 0.6574, CV F1-Score: 0.6555, Test Accuracy: 0.6556, Test F1-Score: 0.6559
Target Citrus - RandomForest Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.7661, CV F1-Score: 0.7494, Test Accuracy: 0.7222, Test F1-Score: 0.6960


Target Citrus - XGBoost Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.7495, CV F1-Score: 0.7377, Test Accuracy: 0.7444, Test F1-Score: 0.7261
Target Citrus - CatBoost Train Accuracy: 0.9610, Train F1-Score: 0.9597, CV Accuracy: 0.7828, CV F1-Score: 0.7612, Test Accuracy: 0.7000, Test F1-Score: 0.6447




Target Citrus - MLP Train Accuracy: 0.8022, Train F1-Score: 0.7723, CV Accuracy: 0.7746, CV F1-Score: 0.7336, Test Accuracy: 0.6778, Test F1-Score: 0.6389
Target Fruits - RandomForest Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.6962, CV F1-Score: 0.6874, Test Accuracy: 0.6889, Test F1-Score: 0.6971


Target Fruits - XGBoost Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.6490, CV F1-Score: 0.6439, Test Accuracy: 0.6556, Test F1-Score: 0.6666
Target Fruits - CatBoost Train Accuracy: 0.9109, Train F1-Score: 0.9085, CV Accuracy: 0.7102, CV F1-Score: 0.6972, Test Accuracy: 0.7111, Test F1-Score: 0.7111




Target Fruits - MLP Train Accuracy: 0.7521, Train F1-Score: 0.7259, CV Accuracy: 0.6656, CV F1-Score: 0.6283, Test Accuracy: 0.7222, Test F1-Score: 0.7093
Target Nuts - RandomForest Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.9193, CV F1-Score: 0.8973, Test Accuracy: 0.8667, Test F1-Score: 0.8254


Target Nuts - XGBoost Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.9137, CV F1-Score: 0.8938, Test Accuracy: 0.8556, Test F1-Score: 0.8197
Target Nuts - CatBoost Train Accuracy: 0.9805, Train F1-Score: 0.9789, CV Accuracy: 0.9276, CV F1-Score: 0.8982, Test Accuracy: 0.8778, Test F1-Score: 0.8310




Target Nuts - MLP Train Accuracy: 0.9331, Train F1-Score: 0.9009, CV Accuracy: 0.9332, CV F1-Score: 0.9010, Test Accuracy: 0.8889, Test F1-Score: 0.8366
Target Coffee - RandomForest Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.9360, CV F1-Score: 0.9194, Test Accuracy: 0.9444, Test F1-Score: 0.9283


Target Coffee - XGBoost Train Accuracy: 0.9972, Train F1-Score: 0.9972, CV Accuracy: 0.9304, CV F1-Score: 0.9129, Test Accuracy: 0.9667, Test F1-Score: 0.9570
Target Coffee - CatBoost Train Accuracy: 0.9777, Train F1-Score: 0.9749, CV Accuracy: 0.9443, CV F1-Score: 0.9200, Test Accuracy: 0.9556, Test F1-Score: 0.9338




Target Coffee - MLP Train Accuracy: 0.9526, Train F1-Score: 0.9341, CV Accuracy: 0.9443, CV F1-Score: 0.9236, Test Accuracy: 0.9556, Test F1-Score: 0.9338
Target Chocolate/Cacao - RandomForest Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.8830, CV F1-Score: 0.8513, Test Accuracy: 0.8889, Test F1-Score: 0.8598


Target Chocolate/Cacao - XGBoost Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.8830, CV F1-Score: 0.8581, Test Accuracy: 0.8778, Test F1-Score: 0.8388
Target Chocolate/Cacao - CatBoost Train Accuracy: 0.9610, Train F1-Score: 0.9567, CV Accuracy: 0.8914, CV F1-Score: 0.8565, Test Accuracy: 0.8778, Test F1-Score: 0.8388




Target Chocolate/Cacao - MLP Train Accuracy: 0.9025, Train F1-Score: 0.8589, CV Accuracy: 0.8914, CV F1-Score: 0.8482, Test Accuracy: 0.8778, Test F1-Score: 0.8206
Target Flowers - RandomForest Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.7495, CV F1-Score: 0.7383, Test Accuracy: 0.6333, Test F1-Score: 0.6347


Target Flowers - XGBoost Train Accuracy: 1.0000, Train F1-Score: 1.0000, CV Accuracy: 0.7078, CV F1-Score: 0.7048, Test Accuracy: 0.6556, Test F1-Score: 0.6542
Target Flowers - CatBoost Train Accuracy: 0.9387, Train F1-Score: 0.9373, CV Accuracy: 0.7552, CV F1-Score: 0.7441, Test Accuracy: 0.6556, Test F1-Score: 0.6542




Target Flowers - MLP Train Accuracy: 0.7688, Train F1-Score: 0.7464, CV Accuracy: 0.7050, CV F1-Score: 0.6757, Test Accuracy: 0.6778, Test F1-Score: 0.6510
              Target         Model  Train Accuracy  Train F1-Score  \
0   Herbs and spices  RandomForest        0.997214        0.997212   
1   Herbs and spices       XGBoost        0.997214        0.997212   
2   Herbs and spices      CatBoost        0.910864        0.908358   
3   Herbs and spices           MLP        0.754875        0.732721   
4      Tobacco/Smoke  RandomForest        0.997214        0.997199   
5      Tobacco/Smoke       XGBoost        0.997214        0.997199   
6      Tobacco/Smoke      CatBoost        0.947075        0.939862   
7      Tobacco/Smoke           MLP        0.885794        0.832149   
8               Wood  RandomForest        0.997214        0.997204   
9               Wood       XGBoost        0.997214        0.997204   
10              Wood      CatBoost        0.927577        0.918119   
11  



In [3]:
results_df

| | Target | Model | Train Accuracy | Train F1-Score | CV Accuracy | CV F1-Score | Test Accuracy | Test F1-Score |
|---|---|---|---|---|---|---|---|---|
| 0 | Herbs and spices | RandomForest | 0.997214 | 0.997212 | 0.657512 | 0.650989 | 0.7 | 0.697143 |
| 1 | Herbs and spices | XGBoost | 0.997214 | 0.997212 | 0.626839 | 0.620869 | 0.711111 | 0.707576 |
| 2 | Herbs and spices | CatBoost | 0.910864 | 0.908358 | 0.691002 | 0.678364 | 0.7 | 0.690952 |
| 3 | Herbs and spices | MLP | 0.754875 | 0.732721 | 0.693701 | 0.666249 | 0.622222 | 0.596197 |
| 4 | Tobacco/Smoke | RandomForest | 0.997214 | 0.997199 | 0.852347 | 0.815207 | 0.922222 | 0.918917 |
| 5 | Tobacco/Smoke | XGBoost | 0.997214 | 0.997199 | 0.855203 | 0.824241 | 0.944444 | 0.925406 |
| 6 | Tobacco/Smoke | CatBoost | 0.947075 | 0.939862 | 0.86072 | 0.819718 | 0.933333 | 0.901149 |
| 7 | Tobacco/Smoke | MLP | 0.885794 | 0.832149 | 0.885837 | 0.832616 | 0.933333 | 0.901149 |
| 8 | Wood | RandomForest | 0.997214 | 0.997204 | 0.807903 | 0.762689 | 0.777778 | 0.716916 |
| 9 | Wood | XGBoost | 0.997214 | 0.997204 | 0.782825 | 0.758208 | 0.755556 | 0.703846 |
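One way to read a results table like this is to pick, for each target, the model with the highest cross-validated F1-score. The sketch below is not part of the original notebook: it copies the CV F1-scores of the first two targets from the results above into a small frame (the names `subset` and `best_per_target` are hypothetical) and selects the best row per target with pandas.

```python
import pandas as pd

# CV F1-scores for two targets, copied from the results above
rows = [
    ("Herbs and spices", "RandomForest", 0.650989),
    ("Herbs and spices", "XGBoost", 0.620869),
    ("Herbs and spices", "CatBoost", 0.678364),
    ("Herbs and spices", "MLP", 0.666249),
    ("Tobacco/Smoke", "RandomForest", 0.815207),
    ("Tobacco/Smoke", "XGBoost", 0.824241),
    ("Tobacco/Smoke", "CatBoost", 0.819718),
    ("Tobacco/Smoke", "MLP", 0.832616),
]
subset = pd.DataFrame(rows, columns=["Target", "Model", "CV F1-Score"])

# Row label of the highest CV F1-score within each target
best_idx = subset.groupby("Target")["CV F1-Score"].idxmax()

# Keep only the winning row for each target
best_per_target = subset.loc[best_idx].reset_index(drop=True)
print(best_per_target)
```

Ranking on the cross-validation column rather than the test column is a design choice in this sketch: it keeps the held-out test scores untouched by model selection.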


In [4]:
results_df.to_csv('ML_res.csv')
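By default `to_csv` also writes the row index, so reading the file back with plain `pd.read_csv` produces an extra `Unnamed: 0` column. The following sketch, which uses an in-memory buffer and a one-row frame as illustrative stand-ins for `ML_res.csv` and `results_df`, shows how `index_col=0` restores the original columns.

```python
import io
import pandas as pd

# Tiny stand-in for results_df (illustrative values only)
df = pd.DataFrame({"Target": ["Wood"], "Model": ["MLP"]})

# Write to an in-memory buffer instead of a file; the index is written as the first column
buffer = io.StringIO()
df.to_csv(buffer)

# Reading back without index_col treats the saved index as a data column
buffer.seek(0)
with_unnamed = pd.read_csv(buffer)

# Reading back with index_col=0 uses the first column as the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(with_unnamed.columns))
print(list(restored.columns))
```

Alternatively, passing `index=False` to `to_csv` omits the index from the file altogether.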