# Machine Learning Model Storage in MongoDB Exercise

This exercise gives you practice storing a trained machine learning model in a MongoDB database and loading it back using Python.

## Prerequisites

Make sure you have the following libraries installed:
- `pymongo`
- `scikit-learn`
- `python-dotenv`
- `pandas`

You can install them using pip:

```bash
pip install pymongo scikit-learn python-dotenv pandas
```

## Accessing the Prepared Database

We have set up a MongoDB database that contains the Iris dataset and pre-trained model weights for your practice. Here are the details to connect and access the data:

### Database Structure
- **Database Name**: `iris_database`
- **Collections**:
  - **Iris Dataset**: Stored in the `iris_collection`.
  - **Model Weights**: Stored in the `Models` collection.

### Connection Details
Use the following connection string to connect to the MongoDB database:

```python
MONGO_CONNECTION_STRING = "mongodb+srv://tuwaiq_user:pawYC4S9KMzU4toN@pythoncluster.fqxzyxz.mongodb.net/?retryWrites=true&w=majority&appName=PythonCluster"
```
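A connection string with embedded credentials is readable by anyone who can open the notebook. A minimal sketch of reading it from the environment instead, assuming the variable is called `MONGO_CONNECTION_STRING` (the function name `get_connection_string` is hypothetical, not part of any library). If you keep the value in a `.env` file, calling `load_dotenv()` from `python-dotenv` first copies it into `os.environ`.

```python
import os


def get_connection_string(var_name="MONGO_CONNECTION_STRING", env=None):
    """Return the MongoDB connection string stored in an environment variable.

    `env` defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value


# Typical use in the notebook (assumes a .env file next to it):
#   from dotenv import load_dotenv
#   load_dotenv()
#   MONGO_CONNECTION_STRING = get_connection_string()
```

Failing early with a clear message is easier to debug than a connection error raised later by `MongoClient`.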

## Import libraries

In [1]:
%pip install pymongo scikit-learn python-dotenv pandas

In [2]:
import os
import pickle
from pymongo import MongoClient
from dotenv import load_dotenv
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import ssl

# Retrieve the Iris dataset from the database, and create a classifier using the retrieved data

## Step 1: Connect to MongoDB and store the data

In [8]:
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

In [13]:
MONGO_CONNECTION_STRING = "mongodb+srv://hatemalawwad:1234@hatemcluster.1ruxsni.mongodb.net/?retryWrites=true&w=majority&appName=HatemCluster"
# Continue your code here

In [14]:
client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']
collection = db['iris_collection']

In [15]:
data_dict = df.to_dict("records")

In [16]:
collection.insert_many(data_dict)

InsertManyResult([ObjectId('66b0dbc1d0bdc6697bd26304'), ObjectId('66b0dbc1d0bdc6697bd26305'), ObjectId('66b0dbc1d0bdc6697bd26306'), ... (output truncated)

In [17]:
document_count = collection.count_documents({})

print(f'The number of documents in the collection is: {document_count}')

The number of documents in the collection is: 150
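Running the `insert_many` cell a second time would add another 150 documents, so the count above only holds on the first run. A minimal sketch of a guard against that, assuming an empty collection is the signal that seeding is still needed (`seed_collection` is a hypothetical helper name; `count_documents`, `insert_many`, and `inserted_ids` are the PyMongo calls already used or returned above).

```python
def seed_collection(collection, records):
    """Insert `records` only if the collection is empty; return how many were inserted."""
    # insert_many rejects an empty list, so there is nothing to do in that case.
    if not records:
        return 0
    # Skip the insert when the collection already holds data.
    if collection.count_documents({}) > 0:
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

In the notebook this would be called as `seed_collection(collection, data_dict)` in place of the bare `insert_many` call.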


## Step 2: Retrieve and preprocess the data

In [18]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['iris_collection']  # Access the 'iris_collection'

# Retrieve the data from the collection
data = list(collection.find({}))
df = pd.DataFrame(data)

# Drop the MongoDB specific fields
df.drop(columns=['_id'], inplace=True)
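As an alternative to dropping `_id` after the fact, the field can be excluded in the query itself with a projection. A small sketch, assuming the same `collection` object as above (`fetch_records` is a hypothetical helper name):

```python
def fetch_records(collection):
    """Return every document as a plain dict, without MongoDB's _id field."""
    # The second argument to find() is a projection; {"_id": 0} excludes _id.
    return list(collection.find({}, {"_id": 0}))
```

The result can be passed straight to `pd.DataFrame(fetch_records(collection))`, which makes the separate `drop` call unnecessary.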

In [19]:
X = df.drop(columns=['target']).values
y = df['target'].values

scaler = StandardScaler()
X = scaler.fit_transform(X)

## Step 3: Split the data into training and testing sets

In [20]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Step 4: Train a Support Vector Machine (SVM) classifier

In [21]:
model = SVC(kernel='linear', random_state=42)
model.fit(X_train, y_train)

## Step 5: Evaluate the model

In [22]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
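The figures in the report can be checked by hand from the confusion matrix, where rows are the true classes and columns are the predicted classes. The sketch below recomputes accuracy, recall, and precision from the matrix printed above, using plain Python only.

```python
# Confusion matrix copied from the output above (rows: true class, columns: predicted class)
confusion = [
    [10, 0, 0],
    [0, 8, 1],
    [0, 0, 11],
]

n_classes = len(confusion)
total = sum(sum(row) for row in confusion)               # all test samples
correct = sum(confusion[i][i] for i in range(n_classes))  # diagonal entries

accuracy = correct / total
# Recall: correct predictions for a class divided by its row total
recall = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
# Precision: correct predictions for a class divided by its column total
precision = [
    confusion[i][i] / sum(row[i] for row in confusion) for i in range(n_classes)
]

print(f"Accuracy: {accuracy * 100:.2f}%")  # prints "Accuracy: 96.67%"
```

The single misclassified sample (a class 1 flower predicted as class 2) explains both the 0.89 recall for class 1 (8 of 9) and the 0.92 precision for class 2 (11 of 12).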


# Storing and Loading the Model

## Step 1: Connect to MongoDB and store the model

In [23]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['models']  # Create a collection to store models

# Serialize the model
model_bytes = pickle.dumps(model)

# Store the model in the collection
model_document = {
    'model_name': 'svm_iris',
    'model_data': model_bytes
}
collection.insert_one(model_document)

print("Model successfully saved to MongoDB")

Model successfully saved to MongoDB
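Because `insert_one` adds a new document on every run, re-executing the cell leaves several documents named `svm_iris` in the collection. A minimal sketch of an upsert-based alternative, assuming one document per `model_name` is what you want (`save_model` and `MAX_BSON_BYTES` are hypothetical names; `replace_one` with `upsert=True` is a standard PyMongo collection method). The size check is only a rough guard against MongoDB's 16 MB per-document limit.

```python
import pickle

MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's limit for a single document


def save_model(collection, model_name, model):
    """Pickle `model` and store it under `model_name`, replacing any earlier copy."""
    model_bytes = pickle.dumps(model)
    # Rough check: the pickled payload alone must fit below the document limit.
    if len(model_bytes) >= MAX_BSON_BYTES:
        raise ValueError("Pickled model is too large for a single MongoDB document")
    document = {"model_name": model_name, "model_data": model_bytes}
    # upsert=True inserts the document if no match exists, otherwise replaces it.
    collection.replace_one({"model_name": model_name}, document, upsert=True)
    return len(model_bytes)
```

Called as `save_model(collection, 'svm_iris', model)`, this keeps the collection at one document per model name no matter how often the cell runs.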


## Step 2: Retrieve the serialized model

In [24]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['models']  # Access the 'models' collection

# Retrieve the model from the collection
model_document = collection.find_one({'model_name': 'svm_iris'})
model_bytes = model_document['model_data']
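`find_one` returns `None` when no document matches, in which case the subscript on the last line fails with a `TypeError` that does not say what went wrong. A small sketch of a more explicit lookup (`load_model_bytes` is a hypothetical helper name):

```python
def load_model_bytes(collection, model_name):
    """Return the pickled bytes stored under `model_name`, or raise if it is missing."""
    document = collection.find_one({"model_name": model_name})
    if document is None:
        raise LookupError(f"No model named {model_name!r} in the collection")
    return document["model_data"]
```

Only unpickle bytes from a database you trust: `pickle.loads` can execute arbitrary code contained in the payload.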

## Step 3: Deserialize the model

In [25]:
model = pickle.loads(model_bytes)

## Step 4: Load the Iris dataset, preprocess it, and evaluate the loaded model

In [27]:
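# Reload the Iris dataset directly from scikit-learn
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Repeat the preprocessing used before training: standardize all features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Same test_size and random_state as before, so the split matches the earlier one
# as long as the rows are in the same order
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)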

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
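The evaluation above has to fit a new `StandardScaler` because only the `SVC` was stored, not the scaler it was trained with. One way to avoid that, sketched here as an alternative to the notebook's approach, is to wrap both steps in a scikit-learn `Pipeline` and pickle the pipeline as a single object. The example skips MongoDB and round-trips through `pickle` in memory; the resulting bytes could be stored in the `model_data` field exactly as before.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training split only, then applied before the SVC.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=42))
pipeline.fit(X_train, y_train)

# Serialize scaler and classifier together, then restore them.
pipeline_bytes = pickle.dumps(pipeline)
restored = pickle.loads(pipeline_bytes)

# The restored pipeline takes raw, unscaled features and scales them internally.
same_predictions = bool((restored.predict(X_test) == pipeline.predict(X_test)).all())
print(f"Restored pipeline matches original: {same_predictions}")
```

Note that this sketch fits the scaler on the training split only, whereas the notebook scales the full dataset before splitting, so its accuracy may differ slightly from the figures above.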
