### Machine Learning for Data Quality Prediction
**Description**: Train a machine learning classifier to predict whether data has a quality issue, using features such as missing-value and outlier counts.

**Steps**:
1. Create a mock dataset with features and a binary label (`0`: good, `1`: quality issue).
2. Train a machine learning model on a training split.
3. Evaluate the model's performance on the held-out test split.

In [1]:
# Predict data quality issues with a random forest trained on a small mock dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# 1. Create mock dataset (label data_quality_issue: 0 = good, 1 = issue)
data = {
    "num_missing_values": [0, 2, 1, 5, 0, 3, 0, 4, 2, 0, 1, 0, 3, 5, 0],
    "num_outliers": [0, 1, 0, 3, 0, 2, 0, 4, 1, 0, 0, 0, 2, 3, 0],
    "avg_field_length": [10, 8, 9, 6, 11, 7, 10, 5, 8, 10, 9, 11, 7, 6, 10],
    "data_quality_issue": [0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0],
}
df = pd.DataFrame(data)

# Separate the feature columns (X) from the label column (y)
X = df.drop(columns="data_quality_issue")
y = df["data_quality_issue"]

# 2. Train/test split: hold out 30% of the 15 rows (5 rows) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a RandomForestClassifier with default hyperparameters and a fixed seed
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 3. Evaluate the model on the held-out test rows
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 1.0

Confusion Matrix:
 [[3 0]
 [0 2]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         2

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
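
The scores above come from only 5 held-out rows, so a perfect result says little about how the model would behave on new data. As a minimal sketch, assuming the same mock dataset, stratified k-fold cross-validation evaluates the model on every row in turn and reports the spread across folds. The choice of 5 folds and the names `cv` and `scores` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same mock dataset as in the cell above
data = {
    "num_missing_values": [0, 2, 1, 5, 0, 3, 0, 4, 2, 0, 1, 0, 3, 5, 0],
    "num_outliers": [0, 1, 0, 3, 0, 2, 0, 4, 1, 0, 0, 0, 2, 3, 0],
    "avg_field_length": [10, 8, 9, 6, 11, 7, 10, 5, 8, 10, 9, 11, 7, 6, 10],
    "data_quality_issue": [0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0],
}
df = pd.DataFrame(data)
X = df.drop(columns="data_quality_issue")
y = df["data_quality_issue"]

# Assumption: 5 folds; each fold keeps roughly the same 0/1 ratio as the full data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(random_state=42)

# cross_val_score fits a fresh copy of the model on each training fold
# and scores it on the corresponding held-out fold
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Per-fold accuracy:", scores)
print("Mean accuracy: %.2f (std %.2f)" % (scores.mean(), scores.std()))
```

With 15 rows each fold still holds only 3 samples, so the per-fold numbers remain coarse; the mean and standard deviation are the more useful summary here.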

