## **Pseudo-Labeling**

Pseudo-labeling is a semi-supervised learning technique. A model is first trained on the labeled data only. It then predicts labels for the unlabeled data, and these predictions are treated as "pseudo-labels." Finally, the pseudo-labeled samples are added to the training set and the model is retrained on the combined data.
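The three steps can be sketched as one reusable function. `pseudo_label_fit` is a hypothetical helper, not a scikit-learn API. It assumes, as this notebook does, that unlabeled rows are marked with the label `-1`, and it uses `sklearn.base.clone` so each fit starts from a fresh, unfitted copy of the estimator.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def pseudo_label_fit(estimator, X, y, unlabeled=-1):
    """One round of pseudo-labeling (hypothetical helper, not a library API).

    Rows whose label equals `unlabeled` are treated as unlabeled.
    Returns the retrained model and the label vector with gaps filled in.
    """
    labeled_mask = y != unlabeled
    # Step 1: fit a fresh copy of the estimator on the labeled rows only
    model = clone(estimator).fit(X[labeled_mask], y[labeled_mask])
    # Step 2: predict pseudo-labels for the unlabeled rows
    y_full = y.copy()
    y_full[~labeled_mask] = model.predict(X[~labeled_mask])
    # Step 3: retrain on labeled + pseudo-labeled rows together
    return clone(estimator).fit(X, y_full), y_full


# Small demo: hide every 4th label, then fill the gaps with pseudo-labels
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y_partial = y.copy()
y_partial[::4] = -1
model, y_filled = pseudo_label_fit(LogisticRegression(), X, y_partial)
```

Only the hidden entries of `y_filled` are model predictions; the originally labeled entries are passed through unchanged.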



**Imports**

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
```


**Data Loading**

```python
# Create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split first, so the test set keeps its true labels for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hide the labels of every 5th training sample (-1 marks "unlabeled")
y_train[::5] = -1
```
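Accuracy is only meaningful if every test sample carries a real label, so a small guard can catch a test set that still contains the `-1` marker (which happens when labels are hidden before splitting rather than after). `summarize_split` is a hypothetical helper, sketched with NumPy only.

```python
import numpy as np


def summarize_split(y_train, y_test, unlabeled=-1):
    """Hypothetical helper: count hidden training labels and fail loudly
    if the test set contains the unlabeled marker."""
    if np.any(y_test == unlabeled):
        raise ValueError("y_test contains unlabeled markers; test accuracy would be meaningless")
    n_hidden = int(np.sum(y_train == unlabeled))
    return n_hidden, len(y_train)


# Demo on tiny arrays: 2 of 4 training labels are hidden, test set is clean
print(summarize_split(np.array([0, -1, 1, -1]), np.array([1, 0])))
```

In this notebook it would be called as `summarize_split(y_train, y_test)` directly after the split.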


**Preprocessing**

```python
# No significant preprocessing required for this synthetic dataset
```


**Model Building**

```python
# Boolean masks for labeled and unlabeled training samples
labeled = y_train != -1
unlabeled = ~labeled

# Initialize the base model (Logistic Regression)
model = LogisticRegression()

# Train the model on the labeled samples only (features and labels must be masked alike)
model.fit(X_train[labeled], y_train[labeled])

# Generate pseudo-labels for the unlabeled samples
pseudo_labels = model.predict(X_train[unlabeled])

# Add the pseudo-labeled samples to the training set
X_train_pseudo = np.vstack([X_train[labeled], X_train[unlabeled]])
y_train_pseudo = np.concatenate([y_train[labeled], pseudo_labels])

# Retrain the model on labeled + pseudo-labeled data
model.fit(X_train_pseudo, y_train_pseudo)
```
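The cell above pseudo-labels every unlabeled sample in a single pass, including samples the model is unsure about, so early mistakes are fed straight back into training. A common variant keeps only confident predictions and repeats the process over several rounds (often called self-training). The sketch below shows that variant, not the notebook's method; `self_train`, the 0.9 probability threshold and the 5-round limit are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def self_train(estimator, X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Iterative pseudo-labeling with a confidence threshold (hypothetical
    helper; threshold and max_rounds are illustrative defaults)."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    model = clone(estimator).fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(remaining) == 0:
            break
        proba = model.predict_proba(remaining)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is confident enough; stop early
        # Map each argmax column back to its class label via model.classes_
        new_labels = model.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, remaining[confident]])
        y_lab = np.concatenate([y_lab, new_labels])
        remaining = remaining[~confident]
        # Retrain from scratch on the enlarged training set
        model = clone(estimator).fit(X_lab, y_lab)
    return model, len(remaining)


X, y = make_classification(n_samples=500, n_features=20, random_state=42)
labeled = np.zeros(len(y), dtype=bool)
labeled[::5] = True  # here only every 5th sample keeps its label
model, n_left = self_train(LogisticRegression(max_iter=1000), X[labeled], y[labeled], X[~labeled])
print(f"Samples never pseudo-labeled: {n_left}")
```

Raising `threshold` adds fewer but more reliable pseudo-labels per round. scikit-learn also ships `SelfTrainingClassifier` in `sklearn.semi_supervised`, which follows the same idea and likewise uses `-1` to mark unlabeled samples.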


**Predictions**

```python
# Make predictions on the test set
y_pred = model.predict(X_test)
```


**Performance Metrics**

```python
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```
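A single accuracy number does not show whether pseudo-labeling helped. Comparing against a baseline trained only on the labeled rows answers that question. The sketch below rebuilds the same synthetic setup (hiding labels after the split) so it runs on its own; no particular outcome is implied, since retraining on the model's own predictions can help, do nothing, or hurt.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_train[::5] = -1  # hide labels in the training split only
lab = y_train != -1

# Baseline: supervised model that sees only the labeled rows
baseline = LogisticRegression().fit(X_train[lab], y_train[lab])

# Pseudo-labeling: fill in the hidden labels, then retrain on all training rows
y_filled = y_train.copy()
y_filled[~lab] = baseline.predict(X_train[~lab])
pseudo = LogisticRegression().fit(X_train, y_filled)

scores = {}
for name, m in [("labeled only", baseline), ("pseudo-labeled", pseudo)]:
    scores[name] = accuracy_score(y_test, m.predict(X_test))
    print(f"{name}: {scores[name] * 100:.2f}%")
```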


**Visualizations**

```python
import matplotlib.pyplot as plt

# Plot the test set on its first two features, coloured by predicted class
# (a 2D view of 20-dimensional data, so the classes may appear to overlap)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='viridis')
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.title("Pseudo-Labeling - Prediction")
plt.show()
```
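Plotting two raw features shows only a small slice of the 20-dimensional data. One alternative is to project the samples onto their first two principal components before plotting. `plot_pca` is a hypothetical helper sketched with scikit-learn's `PCA`; it returns the figure instead of calling `plt.show()`, so it also works in non-interactive sessions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA


def plot_pca(X, labels, title="Pseudo-Labeling - Prediction (PCA projection)"):
    """Hypothetical helper: project X onto 2 principal components and
    colour each point by its label."""
    coords = PCA(n_components=2).fit_transform(X)
    fig, ax = plt.subplots()
    points = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis")
    fig.colorbar(points, ax=ax, label="class")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title(title)
    return fig, coords


# Demo on a standalone synthetic dataset
X_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=0)
fig, coords = plot_pca(X_demo, y_demo)
```

In the notebook this would be called as `plot_pca(X_test, y_pred)` followed by `plt.show()`.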
