# Hyer ML Logistic Regression

This script trains a Logistic Regression model to predict a task's fill status from a small set of features. It loads a dataset, preprocesses it by extracting time features from the creation timestamp and encoding a categorical variable, splits the data into training and testing sets, and fits the model on the training set. The model is then evaluated on the testing set using accuracy, a confusion matrix, and a classification report.

## Import necessary libraries and modules

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## Load the data

In [None]:
print('Loading data...')
# Use the appropriate file path and file type (CSV or Excel) to load your data
# data = pd.read_excel('data_complete.xlsx')
data = pd.read_csv('subset.csv', low_memory=False)
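
The comment above mentions both CSV and Excel inputs. One way to handle either is to choose the pandas reader from the file extension. The sketch below is only an illustration: `load_table` is a hypothetical helper that is not part of the original notebook, the demo rows are made up, and reading `.xlsx` files assumes an Excel engine such as `openpyxl` is installed.

```python
import os
import tempfile

import pandas as pd


def load_table(path):
    """Hypothetical helper: pick a pandas reader from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(path, low_memory=False)
    if ext in (".xlsx", ".xls"):
        # Assumes an Excel engine (for example openpyxl) is available
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {ext!r}")


# Demonstration on a throwaway CSV with made-up rows
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}).to_csv(demo_path, index=False)
    demo = load_table(demo_path)

print(demo.shape)
```

With a helper like this, the load cell could call `load_table('subset.csv')` or `load_table('data_complete.xlsx')` without changing any other code.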

## Data Exploration and Preprocessing

In [None]:
print(f'Preprocessing records: {len(data)}')

# Convert the date column to datetime format
data['DateCreated'] = pd.to_datetime(data['DateCreated'])

# Derive hour-of-day, day-of-week (Monday=0), and duration-in-hours features
data['HourOfDay'] = data['DateCreated'].dt.hour
data['DayOfWeek'] = data['DateCreated'].dt.dayofweek
data['EstimatedHours'] = data['EstimatedNumberOfSeconds'] / 3600  # Convert seconds to hours

# Encode visibility as a binary flag: 0 for 'Public', 1 for any other value
data['PrivatePublic_encoded'] = data['Private or Public'].apply(lambda x: 0 if x == 'Public' else 1)
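
To see what these transformations produce, here is a small sketch that applies the same steps to a toy frame. The column names match the notebook, but the three rows are made up for illustration, and `errors="coerce"` is an added assumption (the notebook calls `pd.to_datetime` without it) so that the unparseable date becomes `NaT` instead of raising.

```python
import pandas as pd

# Made-up rows; only the column names come from the notebook
toy = pd.DataFrame({
    "DateCreated": ["2023-01-02 14:30:00", "2023-01-07 09:05:00", "not a date"],
    "EstimatedNumberOfSeconds": [7200, 1800, 3600],
    "Private or Public": ["Public", "Private", None],
})

# errors="coerce" turns unparseable dates into NaT instead of raising
toy["DateCreated"] = pd.to_datetime(toy["DateCreated"], errors="coerce")
toy["HourOfDay"] = toy["DateCreated"].dt.hour
toy["DayOfWeek"] = toy["DateCreated"].dt.dayofweek  # Monday=0 ... Sunday=6
toy["EstimatedHours"] = toy["EstimatedNumberOfSeconds"] / 3600

# Same rule as the notebook: anything other than 'Public' maps to 1,
# which includes missing values
toy["PrivatePublic_encoded"] = toy["Private or Public"].apply(
    lambda x: 0 if x == "Public" else 1
)

print(toy[["HourOfDay", "DayOfWeek", "EstimatedHours", "PrivatePublic_encoded"]])
```

Two things are worth noticing. A missing visibility value is silently encoded as 1, the same as 'Private'. Rows whose date could not be parsed end up with missing `HourOfDay` and `DayOfWeek` values, and scikit-learn's `LogisticRegression` does not accept missing values, so such rows would need to be dropped or imputed before training.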

## Split the data

In [None]:
# Data Splitting
print('Splitting data...')

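# Feature matrix: the four engineered columns
X = data[['HourOfDay', 'DayOfWeek', 'EstimatedHours', 'PrivatePublic_encoded']]
# Binary target: 1 if the task was filled, 0 for any other status
y = data['Task Fill Status'].apply(lambda x: 1 if x == 'Filled' else 0)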

# Hold out 20% of the rows for testing; .values converts to NumPy arrays
X_train, X_test, y_train, y_test = train_test_split(
    X.values, y.values, test_size=0.2, random_state=42
)
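
If filled and unfilled tasks are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. A possible variation, not used in the original notebook, is to pass `stratify` to `train_test_split`. The sketch below uses synthetic data with an assumed 80/20 class ratio; the `_demo` names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 features, 80% positive labels
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 4))
y_demo = np.array([1] * 80 + [0] * 20)

# stratify=y_demo keeps the class ratio the same in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Train positive rate: {y_tr.mean():.2f}")
print(f"Test positive rate:  {y_te.mean():.2f}")
```

In the notebook this would amount to adding `stratify=y.values` to the existing call.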

## Model Training

In [None]:
print('Training model...')
model = LogisticRegression()  # scikit-learn defaults (L2 regularization, max_iter=100)
model.fit(X_train, y_train)
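
The features here sit on different scales (hours 0 to 23, weekdays 0 to 6, a 0/1 flag), which can slow solver convergence and trigger a `ConvergenceWarning` with the default settings. One common remedy, shown as a sketch rather than as the notebook's method, is to standardize the features inside a pipeline and raise `max_iter`. The data and the `scaled_model` name below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up features loosely mimicking hour, weekday, estimated hours, and a flag
X_demo = np.column_stack([
    rng.integers(0, 24, size=200),
    rng.integers(0, 7, size=200),
    rng.uniform(0.5, 12.0, size=200),
    rng.integers(0, 2, size=200),
])
# Made-up label that depends only on the third column
y_demo = (X_demo[:, 2] > 6.0).astype(int)

# The scaler is fit on the training data and reapplied automatically at predict time
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

train_accuracy = scaled_model.score(X_demo, y_demo)
print(f"Training accuracy on the synthetic data: {train_accuracy:.3f}")
```

Because the scaler lives inside the pipeline, `scaled_model.predict(X_test)` would apply the training-set scaling to the test features without any extra code.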

## Model Evaluation

In [None]:
print('Evaluating model...')
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.4f}')
print(f'Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}')
print(f'Classification Report:\n{classification_report(y_test, y_pred)}')
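
To make the confusion matrix easier to read, the sketch below unpacks it into its four cells and recomputes precision and recall for the positive class by hand. The labels are made up for illustration; in scikit-learn's layout, rows are true classes and columns are predicted classes, so for binary labels `ravel()` yields the cells in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = filled, 0 = not filled
y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 1, 0, 1]

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # share of predicted "filled" that were actually filled
recall = tp / (tp + fn)     # share of actually filled tasks the model found

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Precision={precision:.2f} Recall={recall:.2f}")
```

These are the same quantities that `classification_report` lists in the row for class 1, which helps when accuracy alone is misleading on imbalanced data.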