In [16]:
! pip install optuna


In [17]:
# Importing necessary libraries
import optuna
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import joblib

In [18]:
# Step 1: Load the data
iris = load_iris()
X, y = iris.data, iris.target

In [19]:
# Step 2: Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [20]:
# Step 3: Define the objective function that Optuna will maximize
def objective(trial):
    # Define hyperparameters based on given search space
    penalty = trial.suggest_categorical('penalty', ['l1', 'l2'])
    C = trial.suggest_float('C', 1e-4, 1e4, log=True)

    # Initialize and train classifier
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(solver='liblinear', penalty=penalty, C=C, random_state=0)
    )
    model.fit(X_train, y_train)

    # Predict on the held-out split and return accuracy (note: this puts X_test inside the tuning loop)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy

In [21]:
# Step 4: Create a study object that maximizes the objective (accuracy)
study = optuna.create_study(direction='maximize')

[I 2023-11-02 19:44:43,026] A new study created in memory with name: no-name-494d6e25-c43a-45cc-9481-ca195e118d1e


In [22]:
# Step 5: Optimize the study, the objective function is passed in as the first argument.
study.optimize(objective, n_trials=100)

[I 2023-11-02 19:44:43,044] Trial 0 finished with value: 0.9666666666666667 and parameters: {'penalty': 'l2', 'C': 4.502313328282632}. Best is trial 0 with value: 0.9666666666666667.
[I 2023-11-02 19:44:43,046] Trial 1 finished with value: 1.0 and parameters: {'penalty': 'l2', 'C': 134.6114988665146}. Best is trial 1 with value: 1.0.
[I 2023-11-02 19:44:43,048] Trial 2 finished with value: 1.0 and parameters: {'penalty': 'l1', 'C': 508.67005398069693}. Best is trial 1 with value: 1.0.
[I 2023-11-02 19:44:43,050] Trial 3 finished with value: 0.6 and parameters: {'penalty': 'l1', 'C': 0.019554952606035996}. Best is trial 1 with value: 1.0.
[I 2023-11-02 19:44:43,052] Trial 4 finished with value: 1.0 and parameters: {'penalty': 'l1', 'C': 26.46394941581556}. Best is trial 1 with value: 1.0.
[I 2023-11-02 19:44:43,055] Trial 5 finished with value: 1.0 and parameters: {'penalty': 'l1', 'C': 881.8841675931137}. Best is trial 1 with value: 1.0.
[I 2023-11-02 19:44:43,057] Trial 6 finished wit

In [23]:
# Step 6: Retrieve the best parameters
best_params = study.best_trial.params
print("Best parameters: ", best_params)

Best parameters:  {'penalty': 'l2', 'C': 134.6114988665146}


In [24]:
# Step 7: Rebuild the pipeline with the best hyperparameters
best_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(**best_params, solver='liblinear', random_state=0)
)

In [25]:
# Step 8: Train the best model
best_model.fit(X_train, y_train)

In [26]:
# Step 9: Make predictions using the test set
y_pred = best_model.predict(X_test)

In [27]:
# Step 10: Evaluate the model's performance
cv_scores = cross_val_score(best_model, X_train, y_train, cv=5)
print(f"Cross-validation scores: {cv_scores}")
print(f"Average CV Score: {cv_scores.mean()}")
print()
print("Accuracy on test set:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Cross-validation scores: [0.95833333 1.         0.83333333 1.         0.95833333]
Average CV Score: 0.95

Accuracy on test set: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
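
The classification report gives per-class metrics, while a confusion matrix shows which classes are mistaken for which. The following is a minimal, self-contained sketch: it refits a stand-in pipeline (default solver, C rounded from the best parameters above) rather than reusing best_model, and also prints the per-class coefficients on the standardized features as a rough view of feature influence.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Stand-in for best_model (assumption: default solver, C rounded to 134.6)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=134.6, max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)

# make_pipeline names each step after its lowercased class name
coefs = model.named_steps['logisticregression'].coef_
for class_name, row in zip(iris.target_names, coefs):
    weights = {f: round(float(w), 2) for f, w in zip(iris.feature_names, row)}
    print(class_name, weights)
```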


In [28]:
# Step 11: Predict on new data
new_data = np.array([[5.1, 3.5, 1.4, 0.2]])
new_prediction = best_model.predict(new_data)
print("Prediction for the new data:", new_prediction)

Prediction for the new data: [0]


In [29]:
# Step 12: Save the model to a file for future use
joblib.dump(best_model, 'best_logistic_model.pkl')

['best_logistic_model.pkl']

After you save your trained model to a file using joblib.dump, the typical next steps, depending on your project's needs, are:

1. Deployment: If the model's performance is satisfactory, deploy it to a production environment where it can make predictions on new, unseen data. This can involve setting up a REST API, using a model-serving platform, or integrating the model directly into an application.

2. Monitoring: Once the model is deployed, monitor it to confirm that its performance holds up over time and that it still suits the data it receives. Monitoring also helps you detect when the model needs retraining.

3. Retraining: As new data becomes available, retrain the model periodically to keep it up to date. This is especially important if the input data distribution shifts over time (data drift) or if the relationship between the features and the target changes (concept drift).

4. Versioning: Version your models as you would your code. Save a new version each time you retrain so that you can roll back to a previous one if necessary.

5. Documentation: Record the model's performance metrics, the hyperparameters used, and any peculiarities noted during training and testing. This is vital for reproducibility and future reference.

6. Model Analysis: After deployment, you may want to analyze the predictions the model makes. A confusion matrix, ROC curve analysis, or feature importance analysis can show how the model is behaving.

7. Feedback Loop: In many machine learning systems, users or domain experts evaluate the model's predictions, and their feedback is used to improve the model further.

8. Load the Model: When you need to make predictions, load the model using joblib.load and then call its predict or predict_proba methods.
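
The versioning and documentation items above can be sketched as saving each model under a versioned file name together with a small JSON file that describes it. The helper save_versioned, the file-name pattern, and the metadata fields are hypothetical choices for this illustration. Recording the scikit-learn version is useful because a pickled model is not guaranteed to load correctly under a different version.

```python
import json
import os
import tempfile

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Stand-in model so the sketch runs on its own
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

def save_versioned(model, directory, version, extra):
    """Save model_v<version>.pkl plus a JSON sidecar describing it."""
    model_path = os.path.join(directory, f"model_v{version}.pkl")
    meta_path = os.path.join(directory, f"model_v{version}.json")
    joblib.dump(model, model_path)
    metadata = {
        "version": version,
        "sklearn_version": sklearn.__version__,
        **extra,
    }
    with open(meta_path, "w") as f:
        json.dump(metadata, f, indent=2)
    return model_path, meta_path

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    model_path, meta_path = save_versioned(
        model, tmp, version=1,
        extra={"train_accuracy": float(model.score(X, y))},
    )
    with open(meta_path) as f:
        saved_meta = json.load(f)
    reloaded = joblib.load(model_path)

print(saved_meta)
```

In a real project the directory would be a persistent location, and the extra fields would hold the tuned hyperparameters and test metrics.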



In [30]:
# Load the saved model
loaded_model = joblib.load('best_logistic_model.pkl')

# Predict on new data
new_data = np.array([[5.9, 3.0, 5.1, 1.8]])  # Replace this with new data
prediction = loaded_model.predict(new_data)
print(f"The predicted class for the new data is: {prediction}")

The predicted class for the new data is: [2]
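
The cell above uses predict; predict_proba returns a probability for each class instead of a single label. A minimal sketch, assuming a stand-in model saved to a temporary file (the file name demo_model.pkl is hypothetical) so that the example does not depend on best_logistic_model.pkl being present:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Stand-in for the saved pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(iris.data, iris.target)

# Round-trip through joblib, as in the save and load cells
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_model.pkl')
    joblib.dump(model, path)
    loaded = joblib.load(path)

new_data = np.array([[5.9, 3.0, 5.1, 1.8]])
proba = loaded.predict_proba(new_data)[0]  # one probability per class

for class_name, p in zip(iris.target_names, proba):
    print(f"{class_name}: {p:.3f}")

# The predicted label is the class with the highest probability
predicted = loaded.classes_[proba.argmax()]
print("Predicted class:", predicted)
```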
