### Machine Learning for Data Quality Prediction
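**Description**: Train a machine learning classifier to predict whether a data quality issue is present, using four summary features: missing-value rate, number of outliers, average data age, and number of inconsistent records.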

**Steps**:
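1. Generate a mock dataset of data quality features and a binary label `quality_issue` (0 = good, 1 = issue).
2. Split the data into training and test sets.
3. Train a random forest classifier.
4. Estimate accuracy with 5-fold cross-validation.
5. Evaluate the model on the held-out test set (accuracy, classification report, confusion matrix).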
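For reference, the label produced by `generate_data` in the code below is a fixed rule on two of the four features. Writing $m_i$ for `missing_value_rate` and $c_i$ for `inconsistent_records` in row $i$, and assuming (as the generator does) that $m_i$ is uniform on $[0, 1)$ and $c_i$ is uniform on $\{0, \dots, 4\}$ and independent of $m_i$:

$$
y_i = \mathbb{1}\left[\, m_i > 0.4 \;\lor\; c_i > 2 \,\right],
\qquad
P(y_i = 1) = 1 - P(m_i \le 0.4)\,P(c_i \le 2) = 1 - 0.4 \cdot 0.6 = 0.76
$$

`num_outliers` and `avg_data_age` do not enter the label. The expected 76% issue rate is close to the 146 of 200 test rows (73%) labeled as issues in the output.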

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Step 1: Generate a mock dataset
def generate_data(size=1000):
    """Return a DataFrame of synthetic quality features plus a binary label."""
    np.random.seed(42)  # fixed seed so the run is reproducible
    df = pd.DataFrame({
        "missing_value_rate": np.random.rand(size),            # uniform in [0, 1)
        "num_outliers": np.random.poisson(2, size),             # Poisson with mean 2
        "avg_data_age": np.random.randint(1, 365, size),        # integers 1..364
        "inconsistent_records": np.random.randint(0, 5, size)   # integers 0..4
    })
    # Label: 1 (issue) if missing_value_rate > 0.4 or inconsistent_records > 2, else 0 (good).
    # Because the label is a deterministic rule on two features, near-perfect scores are expected.
    df["quality_issue"] = ((df["missing_value_rate"] > 0.4) | (df["inconsistent_records"] > 2)).astype(int)
    return df

try:
    df = generate_data()

    # Step 2: Split features and label into 80% training and 20% test data
    X = df.drop("quality_issue", axis=1)
    y = df["quality_issue"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Step 3: Train a random forest with fixed (untuned) hyperparameters
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)

    # Step 4: 5-fold cross-validation; cross_val_score fits fresh clones of the model,
    # so `model` is unaffected. Note that it uses all rows, including the test split.
    cv_scores = cross_val_score(model, X, y, cv=5)
    print(f"Cross-validation Accuracy (5-fold): {cv_scores.mean():.2f} ± {cv_scores.std():.2f}")

    # Step 5: Evaluate on the held-out test set
    y_pred = model.predict(X_test)
    print("\nTest Accuracy:", accuracy_score(y_test, y_pred))
    print("\nClassification Report:\n", classification_report(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

except ValueError as ve:
    # Raised by pandas/scikit-learn for invalid inputs (e.g. bad shapes or parameters)
    print("ValueError:", ve)

except Exception as e:
    print("An unexpected error occurred:", e)
```

Output:

```text
Cross-validation Accuracy (5-fold): 1.00 ± 0.00

Test Accuracy: 0.995

Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.98      0.99        54
           1       0.99      1.00      1.00       146

    accuracy                           0.99       200
   macro avg       1.00      0.99      0.99       200
weighted avg       1.00      0.99      0.99       200

Confusion Matrix:
 [[ 53   1]
 [  0 146]]
```
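The near-perfect scores above are what one would expect when the label is a deterministic rule on the features. A minimal sketch for checking what the forest actually learned, and for scoring new data with it, follows. It rebuilds the same mock data and model so it runs on its own; the two `new_batches` rows are hypothetical values chosen for illustration, not real data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def generate_data(size=1000):
    # Same mock generator as above (legacy global seed for reproducibility)
    np.random.seed(42)
    df = pd.DataFrame({
        "missing_value_rate": np.random.rand(size),
        "num_outliers": np.random.poisson(2, size),
        "avg_data_age": np.random.randint(1, 365, size),
        "inconsistent_records": np.random.randint(0, 5, size),
    })
    df["quality_issue"] = ((df["missing_value_rate"] > 0.4) | (df["inconsistent_records"] > 2)).astype(int)
    return df


df = generate_data()
X = df.drop("quality_issue", axis=1)
y = df["quality_issue"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Impurity-based feature importances, sorted from most to least important
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))

# Hypothetical new batches to score (illustrative values, same column order as training)
new_batches = pd.DataFrame({
    "missing_value_rate": [0.05, 0.90],
    "num_outliers": [1, 3],
    "avg_data_age": [30, 200],
    "inconsistent_records": [0, 4],
})
predictions = model.predict(new_batches)
# Column 1 of predict_proba corresponds to class 1 (issue), since model.classes_ is [0, 1]
issue_probability = model.predict_proba(new_batches)[:, 1]
for i, (pred, prob) in enumerate(zip(predictions, issue_probability)):
    print(f"batch {i}: predicted quality_issue={pred} (p={prob:.2f})")
```

Only `missing_value_rate` and `inconsistent_records` should rank at the top, since they are the only inputs to the label. Impurity-based importances can be biased toward features with many distinct values; `permutation_importance` from `sklearn.inspection` is a common alternative.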
