In [24]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report





GitHub Activity Verification Model

Overview
This notebook trains a Random Forest classifier to predict whether a user's resume should be considered verified, based on three GitHub metrics: overall activity level, pull requests, and commits. The data used in the cells below is randomly generated, with labels drawn independently of the features, so the reported scores reflect chance-level performance rather than a real signal.

Features
- GitHub_Activity: Overall GitHub activity level of the user (an integer from 0 to 99 in the synthetic data).
- Pull_Requests: Number of pull requests made by the user (0 to 49 in the synthetic data).
- Commits: Number of code commits made by the user (0 to 99 in the synthetic data).

Target Variable
- Resume_Verified: Binary label indicating whether the user's resume is considered verified (1) or not (0).

Usage
1. Data Collection: Gather each user's GitHub metrics: activity level, pull requests, and commits.
2. Data Preparation: Arrange the data as a table with one row per user, containing the feature and target columns listed above.
3. Training: Fit the Random Forest classifier on the training split, as in the code cells of this notebook.
4. Prediction: Use the trained model to predict the verification status for new users.

Evaluation
Assess the model with standard classification metrics such as accuracy, precision, recall, and F1 score, all of which appear in the classification report. Adjust the model or features as needed based on the characteristics of your dataset.
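
As a small illustration of the metrics named in the Evaluation section, the sketch below computes them with scikit-learn on hand-made labels. The arrays `y_true` and `y_hat` are invented for this example and are not taken from the notebook's data.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels, invented for illustration only
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_hat = [0, 1, 0, 0, 1, 1, 1, 1]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

metric_accuracy = accuracy_score(y_true, y_hat)    # (TP + TN) / total
metric_precision = precision_score(y_true, y_hat)  # TP / (TP + FP)
metric_recall = recall_score(y_true, y_hat)        # TP / (TP + FN)
metric_f1 = f1_score(y_true, y_hat)                # harmonic mean of the two

print(f'TN={tn} FP={fp} FN={fn} TP={tp}')
print(f'accuracy={metric_accuracy:.2f} precision={metric_precision:.2f} '
      f'recall={metric_recall:.2f} f1={metric_f1:.2f}')
# TN=2 FP=1 FN=1 TP=4
# accuracy=0.75 precision=0.80 recall=0.80 f1=0.80
```

Precision and recall are reported here for class 1 (verified), which is the default positive label for these functions.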

Dependencies
- numpy
- pandas
- scikit-learn

In [2]:
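# Fix the random seed so the synthetic data is reproducible
np.random.seed(42)
# Number of synthetic user records to generate
data_size = 1000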

In [15]:
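# Synthetic data: every column is drawn at random, independently of the others.
# np.random.randint excludes the upper bound, so values run from 0 to high - 1.
data = {
    'GitHub_Activity': np.random.randint(0, 100, data_size),
    'Pull_Requests': np.random.randint(0, 50, data_size),
    'Commits': np.random.randint(0, 100, data_size),
    'Resume_Verified': np.random.choice([0, 1], data_size)
}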

In [16]:
df = pd.DataFrame(data)
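
Before training, it can help to check the generated table. The sketch below rebuilds the same kind of synthetic frame so that it runs on its own, then prints its shape, the value range of each column, and the class balance. The names `n_rows`, `frame`, and `class_share` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n_rows = 1000  # same size as data_size in the notebook

# Same construction as the notebook's data dictionary
frame = pd.DataFrame({
    'GitHub_Activity': np.random.randint(0, 100, n_rows),
    'Pull_Requests': np.random.randint(0, 50, n_rows),
    'Commits': np.random.randint(0, 100, n_rows),
    'Resume_Verified': np.random.choice([0, 1], n_rows),
})

print(frame.shape)                            # (rows, columns)
print(frame.describe().loc[['min', 'max']])   # value range per column

# Share of each class; a strong imbalance would affect how accuracy is read
class_share = frame['Resume_Verified'].value_counts(normalize=True)
print(class_share)
```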

In [17]:
# Separate the feature columns from the target column
X = df[['GitHub_Activity', 'Pull_Requests', 'Commits']]
y = df['Resume_Verified']

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [18]:
# Train a Random Forest with default hyperparameters
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

In [19]:
y_pred = model.predict(X_test)
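
The Usage section also mentions predicting for new users. A minimal sketch of that step follows, assuming new records arrive with the same three feature columns. It trains on stand-in random data generated with `np.random.default_rng` (not the notebook's exact draw), and the two rows in `new_users` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

feature_names = ['GitHub_Activity', 'Pull_Requests', 'Commits']

# Stand-in training data, generated like the notebook's synthetic frame
rng = np.random.default_rng(42)
train = pd.DataFrame({
    'GitHub_Activity': rng.integers(0, 100, 1000),
    'Pull_Requests': rng.integers(0, 50, 1000),
    'Commits': rng.integers(0, 100, 1000),
    'Resume_Verified': rng.integers(0, 2, 1000),
})

clf = RandomForestClassifier(random_state=42)
clf.fit(train[feature_names], train['Resume_Verified'])

# Hypothetical new users; the column names must match those used in training
new_users = pd.DataFrame([[85, 30, 90], [5, 0, 2]], columns=feature_names)

new_pred = clf.predict(new_users)               # hard labels, 0 or 1
new_proba = clf.predict_proba(new_users)[:, 1]  # estimated P(Resume_Verified = 1)

print(new_pred)
print(new_proba)
```

Because the stand-in labels are random, the predictions here carry no meaning; the point is only the shape of the call.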

In [20]:
# Evaluate the model on the held-out test set
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

In [23]:
print(f'Accuracy: {accuracy}')
print('Classification Report:\n', report)

Accuracy: 0.51
Classification Report:
               precision    recall  f1-score   support

           0       0.53      0.51      0.52       105
           1       0.48      0.51      0.49        95

    accuracy                           0.51       200
   macro avg       0.51      0.51      0.51       200
weighted avg       0.51      0.51      0.51       200
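
An accuracy of 0.51 is about what guessing achieves on two roughly balanced classes, which is expected here because `Resume_Verified` was drawn at random, independently of the three features. One way to make this visible is to score a trivial baseline next to the model. The sketch below is an illustration under stated assumptions, not part of the original run: it regenerates similar random data with `np.random.default_rng`, so its numbers will differ from the output above, and it uses scikit-learn's `DummyClassifier` as the baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Random features and labels with no relationship between them
rng = np.random.default_rng(42)
n = 1000
X_demo = np.column_stack([
    rng.integers(0, 100, n),  # stand-in for GitHub_Activity
    rng.integers(0, 50, n),   # stand-in for Pull_Requests
    rng.integers(0, 100, n),  # stand-in for Commits
])
y_demo = rng.integers(0, 2, n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
forest = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
forest_acc = accuracy_score(y_te, forest.predict(X_te))

print(f'Baseline accuracy: {baseline_acc:.2f}')
print(f'Random Forest accuracy: {forest_acc:.2f}')
```

If the Random Forest does not clearly beat the baseline, as should be the case with random labels, the features carry no usable signal for the target.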

