# Machine Learning Model Storage in MongoDB Exercise

This exercise gives you practice saving a trained machine learning model to a MongoDB database and loading it back using Python.

## Prerequisites

Make sure you have the following libraries installed:
- `pymongo`
- `pandas`
- `scikit-learn`
- `python-dotenv`

You can install them using pip:

```bash
pip install pymongo pandas scikit-learn python-dotenv
```

## Accessing the Prepared Database

A MongoDB database containing the Iris dataset and pre-trained model weights has been set up for this exercise. Use the details below to connect and access the data:

### Database Structure
- **Database Name**: `iris_database`
- **Collections**:
  - **Iris Dataset**: Stored in the `iris_collection`.
  - **Model Weights**: Stored in the `models` collection.

### Connection Details
Use the following connection string to connect to the MongoDB cluster:

```python
MONGO_CONNECTION_STRING = "mongodb+srv://tuwaiq_user:pawYC4S9KMzU4toN@pythoncluster.fqxzyxz.mongodb.net/?retryWrites=true&w=majority&appName=PythonCluster"
```
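
Since `python-dotenv` is among the prerequisites, one option is to keep the connection string out of the notebook and read it from the environment instead. The sketch below assumes an environment variable named `MONGO_CONNECTION_STRING`, defined either in the shell or in a local `.env` file; the variable name and the helper `get_connection_string` are illustrative choices, not part of the exercise.

```python
import os

try:
    # python-dotenv is optional here; when installed, load_dotenv() copies
    # entries from a local .env file into os.environ without overriding
    # variables that are already set.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass


def get_connection_string(var_name="MONGO_CONNECTION_STRING"):
    """Return the MongoDB URI stored in the environment, or raise if missing."""
    uri = os.environ.get(var_name)
    if not uri:
        raise RuntimeError(f"Set {var_name} in the environment or in a .env file")
    return uri
```

With this in place, the client could be created as `MongoClient(get_connection_string())`.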

## Import libraries

In [7]:
pip install pymongo scikit-learn python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [8]:
import os
import pickle
from pymongo import MongoClient
from dotenv import load_dotenv
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Retrieve the Iris dataset from the database, and create a classifier using the retrieved data

## Step 1: Connect to MongoDB and retrieve the data

In [9]:
# Load the Iris dataset from scikit-learn so it can be uploaded to MongoDB
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

In [10]:
MONGO_CONNECTION_STRING = "mongodb+srv://nasersaqerr:K6I8t3w6tdlzVejz@nasser.4ulhqnp.mongodb.net/?retryWrites=true&w=majority&appName=Nasser"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']
collection = db['iris_collection']

In [11]:
# Convert each row to a dict so it can be stored as one MongoDB document
data_dict = df.to_dict("records")

In [12]:
collection.insert_many(data_dict)

InsertManyResult([ObjectId('66b0d6433e4c4abd942a389b'), ObjectId('66b0d6433e4c4abd942a389c'), ObjectId('66b0d6433e4c4abd942a389d'), ... (output truncated)

In [13]:
document_count = collection.count_documents({})

print(f'The number of documents in the collection is: {document_count}')

The number of documents in the collection is: 450
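
The count of 450 is three times the 150 rows of the Iris dataset, which suggests the insert cell ran more than once against the same collection. A minimal sketch of a guard follows, assuming the collection is meant to hold the dataset exactly once. `insert_if_empty` is a hypothetical helper; `count_documents` and `insert_many` are the same pymongo methods used above.

```python
def insert_if_empty(collection, records):
    """Insert records only when the collection has no documents yet.

    Returns the number of documents inserted (0 if the collection
    already contained data).
    """
    if collection.count_documents({}) > 0:
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

Calling `insert_if_empty(collection, data_dict)` in place of `collection.insert_many(data_dict)` would make the cell safe to rerun.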


In [14]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['iris_collection']  # Access the 'iris_collection'

# Retrieve the data from the collection
data = list(collection.find({}))
df = pd.DataFrame(data)

# Drop the MongoDB specific fields
df.drop(columns=['_id'], inplace=True)
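
An alternative to dropping `_id` after loading is to exclude it in the query itself with a projection, which pymongo accepts as the second argument to `find`. A minimal sketch, where `fetch_records` is a hypothetical helper name:

```python
def fetch_records(collection):
    """Return all documents as plain dicts, without MongoDB's `_id` field."""
    # {"_id": 0} is a projection asking the server to omit the _id field.
    return list(collection.find({}, {"_id": 0}))
```

The result can be passed straight to `pd.DataFrame(fetch_records(collection))`.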

## Step 2: Preprocess the data

In [15]:
X = df.drop(columns=['target']).values
y = df['target'].values

scaler = StandardScaler()
X = scaler.fit_transform(X)

## Step 3: Split the data into training and testing sets

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Step 4: Train a Support Vector Machine (SVM) classifier

In [17]:
model = SVC(kernel='linear', random_state=42)
model.fit(X_train, y_train)

SVC(kernel='linear', random_state=42)

## Step 5: Evaluate the model

In [18]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 94.44%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        25
           1       1.00      0.86      0.93        36
           2       0.85      1.00      0.92        29

    accuracy                           0.94        90
   macro avg       0.95      0.95      0.95        90
weighted avg       0.95      0.94      0.94        90

Confusion Matrix:
[[25  0  0]
 [ 0 31  5]
 [ 0  0 29]]
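
The reported metrics can be reproduced by hand from the confusion matrix above. The sketch below hard-codes that matrix (rows are true classes, columns are predicted classes, following scikit-learn's convention) and recomputes accuracy, recall, and precision in plain Python; the variable names are illustrative.

```python
# Confusion matrix copied from the output above.
cm = [
    [25, 0, 0],
    [0, 31, 5],
    [0, 0, 29],
]

total = sum(sum(row) for row in cm)                # all test samples
correct = sum(cm[i][i] for i in range(len(cm)))    # diagonal entries
accuracy_from_cm = correct / total

# Recall for class i: diagonal entry divided by its row total.
recall = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
# Precision for class j: diagonal entry divided by its column total.
precision = [cm[j][j] / sum(row[j] for row in cm) for j in range(len(cm))]

print(f"Accuracy: {accuracy_from_cm * 100:.2f}%")  # Accuracy: 94.44%
print([round(r, 2) for r in recall])               # [1.0, 0.86, 1.0]
print([round(p, 2) for p in precision])            # [1.0, 1.0, 0.85]
```

The five class-1 samples predicted as class 2 account for both the 0.86 recall of class 1 and the 0.85 precision of class 2.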


# Loading the Model Weights

## Step 1: Connect to MongoDB, save the model, and retrieve it

In [19]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['models']  # Access the 'models' collection (MongoDB creates it on first insert)

# Serialize the model
model_bytes = pickle.dumps(model)

# Store the model in the collection
model_document = {
    'model_name': 'svm_iris',
    'model_data': model_bytes
}
collection.insert_one(model_document)

print("Model successfully saved to MongoDB")

Model successfully saved to MongoDB
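
Rerunning the cell above adds another `svm_iris` document each time, because `insert_one` never overwrites an existing document. A minimal sketch of an alternative uses pymongo's `replace_one` with `upsert=True`, so each `model_name` keeps a single document. `save_model` is a hypothetical helper, and the size check is a rough guard based on MongoDB's 16 MB limit for a single BSON document.

```python
import pickle

MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def save_model(collection, model_name, model):
    """Pickle `model` and store it under `model_name`, replacing any earlier copy.

    Returns the size of the pickled model in bytes.
    """
    model_bytes = pickle.dumps(model)
    # Approximate check: the whole document, field names included, must fit.
    if len(model_bytes) >= MAX_BSON_BYTES:
        raise ValueError("Pickled model is too large for a single document")
    document = {"model_name": model_name, "model_data": model_bytes}
    # upsert=True inserts when no document matches the filter, else replaces it.
    collection.replace_one({"model_name": model_name}, document, upsert=True)
    return len(model_bytes)
```

For example, `save_model(collection, 'svm_iris', model)` could stand in for the `insert_one` call. Models larger than the limit would need a different storage mechanism, such as GridFS.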


In [20]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['models']  # Access the 'models' collection

# Retrieve the model from the collection
model_document = collection.find_one({'model_name': 'svm_iris'})
model_bytes = model_document['model_data']
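
`find_one` returns `None` when no document matches, in which case indexing `model_document['model_data']` raises a `TypeError`. A minimal sketch that makes the missing-model case explicit, with `fetch_model_bytes` as a hypothetical helper name:

```python
def fetch_model_bytes(collection, model_name):
    """Return the pickled bytes stored under `model_name`, or raise if absent."""
    document = collection.find_one({"model_name": model_name})
    # find_one returns None when no document matches the filter.
    if document is None:
        raise LookupError(f"No model named {model_name!r} in the collection")
    return document["model_data"]
```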

## Step 2: Deserialize the model

In [21]:
model = pickle.loads(model_bytes)
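
`pickle.loads` can execute arbitrary code embedded in the bytes it reads, so it is only appropriate for data from a trusted source. One partial safeguard, sketched below, is to store a SHA-256 digest next to the model and compare it before unpickling. The `model_sha256` field and both helpers are hypothetical and not part of the exercise database; the check catches accidental corruption, not an attacker who can rewrite both fields.

```python
import hashlib
import pickle


def make_model_document(model_name, model):
    """Build a document holding the pickled model and its SHA-256 digest."""
    model_bytes = pickle.dumps(model)
    return {
        "model_name": model_name,
        "model_data": model_bytes,
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
    }


def load_model(document):
    """Unpickle the model only if the stored bytes match the recorded digest."""
    model_bytes = document["model_data"]
    digest = hashlib.sha256(model_bytes).hexdigest()
    if digest != document["model_sha256"]:
        raise ValueError("Stored model bytes do not match their recorded digest")
    return pickle.loads(model_bytes)
```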

## Step 3: Load the Iris dataset and preprocess it

In [23]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Step 4: Evaluate the loaded model

In [24]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
