## **Self-Training Model**

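Self-training is a semi-supervised learning technique that wraps a supervised classifier so it can learn from a partially labeled dataset. It fits the classifier on the labeled samples, predicts labels for the unlabeled samples, adds the predictions that clear a confidence threshold to the training set as pseudo-labels, and repeats until no new samples qualify or an iteration limit is reached. This can improve performance when the pseudo-labels are mostly correct, but confident mistakes are reinforced in later rounds.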
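The loop described above can be sketched by hand. This is a minimal illustration, not how scikit-learn implements `SelfTrainingClassifier` internally; the helper name `self_train`, the demo variables, and the choice of `LogisticRegression` as base model are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def self_train(base_model, X, y, threshold=0.75, max_iter=10):
    """Hypothetical helper: y uses -1 to mark unlabeled samples."""
    y_work = y.copy()
    for _ in range(max_iter):
        labeled = y_work != -1
        if labeled.all():
            break  # nothing left to pseudo-label
        # Fit on everything that currently has a label (true or pseudo)
        base_model.fit(X[labeled], y_work[labeled])
        proba = base_model.predict_proba(X[~labeled])
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # no prediction clears the threshold
        # Promote confident predictions to pseudo-labels
        unlabeled_idx = np.flatnonzero(~labeled)
        predicted = base_model.classes_[proba.argmax(axis=1)]
        y_work[unlabeled_idx[confident]] = predicted[confident]
    # Final fit on all samples labeled so far
    labeled = y_work != -1
    base_model.fit(X[labeled], y_work[labeled])
    return base_model, y_work


# Demo data (assumed for illustration): hide every 3rd label
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
y_partial = y_demo.copy()
y_partial[::3] = -1

model, y_filled = self_train(LogisticRegression(max_iter=1000), X_demo, y_partial)
print("Unlabeled before:", (y_partial == -1).sum(), "after:", (y_filled == -1).sum())
```

The original labels are never overwritten; only samples marked `-1` can receive a pseudo-label, and they keep it in later rounds.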


**Imports**

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import make_classification


**Data Loading**

In [None]:
# For demonstration, create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_classes=2, random_state=42)


**Minimal Preprocessing**

In [None]:
# Split into train and test sets first, so the test labels stay intact for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Simulate unlabeled data: mark every 5th training sample as unlabeled (-1)
y_train[::5] = -1

# y_train now mixes true labels with -1 (unlabeled); y_test keeps only true labels


**Model Building**

In [None]:
# Create a base classifier (e.g., RandomForest)
base_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Create the Self-Training model
self_training_model = SelfTrainingClassifier(base_classifier)

# Train the model
self_training_model.fit(X_train, y_train)
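After fitting, the model exposes attributes that show what the pseudo-labeling did. The sketch below rebuilds a small dataset so it runs on its own; the variable names and the `threshold=0.9` setting are assumptions for illustration (the library default is 0.75).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

# Demo data (assumed for illustration): hide every 5th label
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)
y_partial = y_demo.copy()
y_partial[::5] = -1

clf = SelfTrainingClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold=0.9,  # only predictions above this probability become pseudo-labels
)
clf.fit(X_demo, y_partial)

# Why the loop stopped: 'max_iter', 'no_change', or 'all_labeled'
print("Termination:", clf.termination_condition_)
print("Rounds of self-training:", clf.n_iter_)

# labeled_iter_: 0 = originally labeled, -1 = never labeled, k > 0 = labeled in round k
print("Pseudo-labeled samples:", (clf.labeled_iter_ > 0).sum())
print("Still unlabeled:", (clf.transduction_ == -1).sum())
```

If many samples remain unlabeled, lowering `threshold` admits more pseudo-labels at the cost of admitting less certain ones.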


**Predictions**

In [None]:
# Make predictions on the test set
y_pred = self_training_model.predict(X_test)


**Performance Metrics**

In [None]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
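One way to judge whether the pseudo-labels helped is to compare against the same base classifier trained on the labeled rows only. The sketch below is self-contained and uses hypothetical variable names; with only 20% of the training labels hidden, the two scores may well be close, and either model can come out ahead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=2, n_classes=2, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Hide every 5th training label; test labels stay untouched
y_tr_partial = y_tr.copy()
y_tr_partial[::5] = -1
labeled = y_tr_partial != -1

# Supervised baseline: sees only the labeled training rows
baseline = RandomForestClassifier(n_estimators=100, random_state=42)
baseline.fit(X_tr[labeled], y_tr_partial[labeled])

# Self-training: sees labeled and unlabeled training rows
self_trained = SelfTrainingClassifier(
    RandomForestClassifier(n_estimators=100, random_state=42)
)
self_trained.fit(X_tr, y_tr_partial)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
self_trained_acc = accuracy_score(y_te, self_trained.predict(X_te))
print(f"Baseline (labeled only): {baseline_acc * 100:.2f}%")
print(f"Self-training:           {self_trained_acc * 100:.2f}%")
```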


**Visualizations**

In [None]:
# Plotting the confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Restrict the matrix to the two real classes so it matches the tick labels below
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Class 0", "Class 1"], yticklabels=["Class 0", "Class 1"])
plt.title("Confusion Matrix")
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.show()
