# AutoML with MLJAR
The mljar-supervised library is an automated machine learning (AutoML) tool for tabular datasets in Python. It streamlines a data scientist's workflow by automating data preprocessing, model construction, and hyperparameter tuning to find the best-performing model. It is not a black box: every step of the ML pipeline is transparent, and a detailed Markdown report is generated for each trained model.
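The core AutoML idea is to train many candidate models, score each on held-out data, and keep the best. Here is a deliberately tiny, stdlib-only sketch of that loop. It is not how mljar-supervised is implemented: the toy data, the majority-class baseline, and the threshold "stump" candidates are all illustrative assumptions.

```python
import random

# Toy data (an assumption for illustration): one numeric feature, and a binary
# label that is 1 when the feature exceeds 0.6, with about 10% of labels flipped.
rng = random.Random(42)
xs = [rng.random() for _ in range(200)]
ys = [int(x > 0.6) if rng.random() > 0.1 else int(x <= 0.6) for x in xs]

# Hold out the last 25% of rows for validation.
split = int(len(xs) * 0.75)
train_x, valid_x = xs[:split], xs[split:]
train_y, valid_y = ys[:split], ys[split:]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_baseline(x, y):
    # Always predicts the majority class of the training labels.
    majority = int(sum(y) * 2 >= len(y))
    return lambda v: majority


def make_stump_trainer(threshold):
    # `threshold` plays the role of a hyperparameter; training only learns
    # which side of the threshold should map to class 1.
    def fit(x, y):
        above = [label for v, label in zip(x, y) if v > threshold]
        label_above = int(sum(above) * 2 >= len(above)) if above else 1
        return lambda v: label_above if v > threshold else 1 - label_above
    return fit


# Candidate "pipelines": a baseline plus a small hyperparameter grid.
candidates = {"baseline": fit_baseline}
for t in (0.2, 0.4, 0.6, 0.8):
    candidates[f"stump(threshold={t})"] = make_stump_trainer(t)

# Train every candidate, score it on the validation rows, and rank the results.
leaderboard = []
for name, trainer in candidates.items():
    model = trainer(train_x, train_y)
    preds = [model(v) for v in valid_x]
    leaderboard.append((accuracy(valid_y, preds), name))
leaderboard.sort(reverse=True)

best_score, best_name = leaderboard[0]
print("best:", best_name, "validation accuracy:", round(best_score, 3))
```

MLJAR applies the same pattern at a much larger scale, with real algorithms, feature engineering, and ensembling, and records the outcome in its leaderboard and reports.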

## Setup

In [None]:
import sys
import os

# Resolve the parent of the current working directory (assumed to be the project
# root that contains the `src` package imported below)
parent_directory = os.path.abspath(os.path.join(os.getcwd(), os.pardir))

# Add the parent directory to sys.path once, so re-running the cell does not add duplicates
if parent_directory not in sys.path:
    sys.path.append(parent_directory)

# Display the current working directory
os.getcwd()
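
The same path setup can be wrapped in a reusable function that also reports whether the directory it added looks like the project root. This is a hedged sketch: `add_project_root` is a hypothetical helper, and treating the presence of a `src/` directory as the project-root marker is an assumption based on the `src.features` imports used later.

```python
import sys
from pathlib import Path


def add_project_root(start=None, marker="src"):
    """Append the parent of `start` (default: the cwd) to sys.path.

    Returns (root, has_marker), where has_marker tells whether `root` contains
    a `marker` directory (assumed here to be "src").
    """
    root = Path(start or Path.cwd()).resolve().parent
    if str(root) not in sys.path:  # avoid duplicates when the cell is re-run
        sys.path.append(str(root))
    return root, (root / marker).is_dir()


root, has_src = add_project_root()
print(root, "contains src/:", has_src)
```

If `has_src` is `False`, the notebook is probably not being run from a direct subdirectory of the project root, and the `src.features` imports will fail.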

In [None]:
%pip install mljar-supervised
%pip install scikit-learn
%pip install pandas
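
To confirm which versions the `%pip` commands actually installed, the standard library's `importlib.metadata` can be queried. `installed_version` is a hypothetical helper name used only for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for dist in ("mljar-supervised", "scikit-learn", "pandas"):
    print(dist, installed_version(dist) or "not installed")
```

Depending on the environment, the kernel may need a restart before newly installed or upgraded packages can be imported.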

In [None]:
%load_ext autoreload

In [None]:
# Reload all imported modules before each cell runs, so edits under src/ take effect
%autoreload 2

# Import the necessary libraries
%matplotlib inline
import warnings
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from supervised.automl import AutoML

pd.set_option('display.max_columns', 200)
warnings.filterwarnings('ignore')

# Project-specific helpers, importable thanks to the parent directory added to sys.path
from src.features.post_processor import save_predictions
from src.features.ml_service import prepare_data, prepare_test_data

## Load data

In [None]:
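# prepare_data is a project helper. The `_` placeholders discard what is presumably the
# validation split (empty here, since validation_size=0.0); AutoML validates internally
x_train, _, x_test, y_train, _, y_test = prepare_data(validation_size=0.0, test_size=0.1)
# Combined training frame; AutoML.fit does not need it, as it takes features and target separately
train_data = pd.concat([x_train, y_train], axis=1)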
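
The code of `prepare_data` is not shown, so its exact splitting logic is unknown. As an assumption-labelled sketch, a reproducible 90/10 hold-out split of row indices looks like this; `holdout_split` is a hypothetical name.

```python
import random


def holdout_split(n_rows, test_size=0.1, seed=42):
    """Shuffle row indices reproducibly and return (train_idx, test_idx)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(1000, test_size=0.1)
print(len(train_idx), len(test_idx))  # 900 100
```

For pandas data, scikit-learn's `train_test_split(X, y, test_size=0.1, random_state=42, stratify=y)` does the same job and, with `stratify`, keeps class proportions equal across the two splits.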

## Train model

In [None]:
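# Initialize MLJAR AutoML
predictor = AutoML(
    mode="Explain",           # exploration mode: detailed explanations for each model
    random_state=42,          # reproducible results
    n_jobs=-1,                # use all available CPU cores
    golden_features=True,     # construct new features from pairs of original features
    features_selection=True,  # drop features that do not improve the models
    stack_models=True,        # train stacked models on top of the base models
)

# Train the model
predictor.fit(x_train, y_train)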
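
`golden_features=True` asks MLJAR to build extra features from pairs of original columns and keep the most predictive ones. The sketch below illustrates only that general idea; MLJAR's exact operations and scoring method are not reproduced here. Using difference and ratio, and ranking by absolute Pearson correlation with the target, are assumptions made for illustration, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def pairwise_candidates(columns):
    """Build difference and ratio features for every unordered pair of columns."""
    out = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        out[f"{name_a}_diff_{name_b}"] = [x - y for x, y in zip(a, b)]
        # Replace zero denominators with a tiny epsilon (an arbitrary choice).
        out[f"{name_a}_ratio_{name_b}"] = [x / (y if y != 0 else 1e-9) for x, y in zip(a, b)]
    return out


def top_pairwise_features(columns, target, k=2):
    """Rank candidate features by |correlation| with the target and keep the best k."""
    candidates = pairwise_candidates(columns)
    return sorted(candidates, key=lambda c: abs(pearson(candidates[c], target)), reverse=True)[:k]


# Toy example: the target is exactly the gap between "income" and "debt".
cols = {"income": [50, 60, 80, 30, 90], "debt": [40, 20, 10, 25, 60], "age": [25, 40, 35, 50, 30]}
target = [i - d for i, d in zip(cols["income"], cols["debt"])]
print(top_pairwise_features(cols, target, k=1))  # ['income_diff_debt']
```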


## Make predictions

In [None]:
# Evaluate on the held-out test set
y_test_pred = predictor.predict(x_test)
test_accuracy = accuracy_score(y_test, y_test_pred)
print(f"Test accuracy: {test_accuracy:.4f}")
print("Test classification report:\n", classification_report(y_test, y_test_pred))

# Show MLJAR's report: the leaderboard of trained models plus per-model details
predictor.report()
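
`classification_report` summarises per-class precision, recall, F1 score, and support. To show what those columns mean, here is a stdlib-only sketch that computes them from scratch on a toy example; `per_class_report` is a hypothetical helper, and real evaluation should keep using scikit-learn.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return ({label: (precision, recall, f1, support)}, accuracy)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)     # how often each class really occurs (support)
    report = {}
    for label in labels:
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / actual[label] if actual[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1, actual[label])
    accuracy = sum(tp.values()) / len(y_true)
    return report, accuracy


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
report, acc = per_class_report(y_true, y_pred)
for label, (p, r, f1, support) in report.items():
    print(label, f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
print(f"accuracy={acc:.2f}")
```

Accuracy alone can hide a weak minority class; the per-class rows make that visible.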


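## Save predictions

MLJAR already writes every trained model to its `results_path` directory during `fit`, so this step only predicts on the data returned by `prepare_test_data()` and saves those predictions.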

In [None]:
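# Predict on the separate test data (kept in a new variable so x_test is not overwritten)
x_submission = prepare_test_data()
final_predictions = predictor.predict(x_submission)

# Save the predictions under the name 'mljar_automl' using the project helper
save_predictions(final_predictions, 'mljar_automl')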
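
The project's `save_predictions` helper is not shown. As a hedged sketch of what such a helper might do, the hypothetical `save_predictions_csv` below writes one CSV row per prediction; the file location, column names, and id scheme are all assumptions, not the project's real format.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions_csv(predictions, model_name, out_dir="predictions", id_start=0):
    """Write predictions to <out_dir>/<model_name>.csv and return the file path."""
    path = Path(out_dir) / f"{model_name}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        for i, pred in enumerate(predictions, start=id_start):
            writer.writerow([i, pred])
    return path


# Usage: write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_predictions_csv([1, 0, 1], "mljar_automl", out_dir=tmp)
    lines = saved_path.read_text().splitlines()
print(lines)  # ['id,prediction', '0,1', '1,0', '2,1']
```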