# HR Attrition Analysis

## 1. Business Understanding
The goal of this project is to analyze employee data and predict the likelihood that an employee will leave the company (attrition), to help the HR team make data-driven decisions.

## 2. Import Library

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

import warnings
warnings.filterwarnings('ignore')

## 3. Load Dataset

In [None]:
df = pd.read_csv("employee_data.csv")
df.head()

## 4. Data Understanding

In [None]:
df.info()
df.describe()

## 5. Exploratory Data Analysis

In [None]:
# Target distribution
sns.countplot(data=df, x='Attrition')
plt.title('Attrition Distribution')
plt.show()
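Attrition targets are often imbalanced, so it can help to put a number on the class ratio next to the count plot. The sketch below uses a small made-up `Attrition` series (not the real data) to show how the class shares might be computed, together with the accuracy a model would reach by always predicting the majority class.

```python
import pandas as pd

# Hypothetical labels for illustration only; in the notebook this would be df['Attrition']
attrition = pd.Series(['No'] * 8 + ['Yes'] * 2, name='Attrition')

# Share of each class (normalize=True returns proportions instead of counts)
class_share = attrition.value_counts(normalize=True)
print(class_share)

# Accuracy of a trivial model that always predicts the most frequent class
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

Any model evaluated later should be compared against this baseline rather than against 50%.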

In [None]:
# Correlation between numeric features
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()
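At this point `Attrition` is still a Yes/No string, so `numeric_only=True` leaves it out of the correlation matrix. One possible way to rank the numeric features by their correlation with the target is sketched below on a made-up frame; the values are invented for illustration only.

```python
import pandas as pd

# Made-up toy frame for illustration only (not taken from employee_data.csv)
toy = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45, 50],
    'DistanceFromHome': [20, 18, 15, 5, 3, 1],
    'Attrition': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
})

# Temporary 0/1 copy of the target so it can take part in the correlation
target = toy['Attrition'].map({'Yes': 1, 'No': 0})

# Pearson correlation of each numeric column with the target, strongest first
corr_with_target = (
    toy.select_dtypes(include='number')
       .corrwith(target)
       .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```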

## 6. Data Preparation

In [None]:
# Drop irrelevant columns
df = df.drop(['EmployeeCount', 'EmployeeNumber', 'Over18', 'StandardHours'], axis=1)

# Encode the target
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})

# Encode categorical features
cat_cols = df.select_dtypes(include='object').columns
df[cat_cols] = df[cat_cols].apply(LabelEncoder().fit_transform)

# Split the data first, stratifying on the target to preserve the class ratio
X = df.drop('Attrition', axis=1)
y = df['Attrition']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling: fit the scaler on the training set only, so that no test-set
# statistics leak into training
scaler = StandardScaler()
num_cols = X.select_dtypes(include='number').columns
X_train, X_test = X_train.copy(), X_test.copy()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
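Another way to keep preprocessing free of test-set leakage is to bundle the encoding and scaling steps with the model in a scikit-learn `Pipeline`. The following is a minimal sketch on a tiny made-up frame. The toy values and the use of `OneHotEncoder` (instead of the `LabelEncoder` used in this notebook) are assumptions for illustration, not the notebook's method.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data for illustration only (not taken from employee_data.csv)
toy = pd.DataFrame({
    'MonthlyIncome': [2000, 2500, 3000, 3500, 8000, 8500, 9000, 9500] * 5,
    'OverTime': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes'] * 5,
    'Attrition': [1, 1, 1, 0, 0, 0, 0, 1] * 5,
})
X_toy = toy.drop('Attrition', axis=1)
y_toy = toy['Attrition']

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Scale the numeric column and one-hot encode the categorical column
preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['MonthlyIncome']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['OverTime']),
])

# The pipeline fits the preprocessing on the training data only
pipe = Pipeline([
    ('preprocess', preprocess),
    ('model', LogisticRegression(max_iter=1000)),
])

pipe.fit(X_tr, y_tr)
toy_accuracy = pipe.score(X_te, y_te)
print("Toy accuracy:", toy_accuracy)
```

Because the scaler and encoder are fitted inside `pipe.fit`, the same object can be passed to cross-validation without refitting the preprocessing by hand.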

## 7. Modeling

In [None]:
# Logistic Regression
logreg = LogisticRegression(max_iter=1000)  # higher iteration cap so the solver can converge
logreg.fit(X_train, y_train)
y_pred_logreg = logreg.predict(X_test)

# Random Forest
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
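A single train/test split can give a noisy estimate, especially when the positive class is small. As a rough sketch, stratified k-fold cross-validation could be used to compare the two models. The data here is synthetic (generated with `make_classification`), and the 84/16 class weights are an assumption chosen only to mimic an imbalanced target.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced data for illustration only
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, weights=[0.84, 0.16], random_state=0
)

# Stratified folds keep the class ratio roughly equal in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=42),
}

cv_scores = {}
for name, model in models.items():
    # F1 on the positive class is less forgiving of imbalance than accuracy
    cv_scores[name] = cross_val_score(model, X_syn, y_syn, cv=cv, scoring='f1')
    print(name, cv_scores[name].mean())
```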

## 8. Evaluation

In [None]:
print("Logistic Regression")
print(classification_report(y_test, y_pred_logreg))
print("Accuracy:", accuracy_score(y_test, y_pred_logreg))

In [None]:
print("Random Forest Classifier")
print(classification_report(y_test, y_pred_rf))
print("Accuracy:", accuracy_score(y_test, y_pred_rf))
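Accuracy alone can look good on an imbalanced target, so the confusion matrix (imported above but not used so far) and the recall for the positive class are worth inspecting as well. A small sketch with hand-written labels, chosen purely for illustration:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hand-written example labels (1 = employee left, 0 = employee stayed); illustrative only
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)

acc = accuracy_score(y_true, y_pred)  # (TN + TP) / total
rec = recall_score(y_true, y_pred)    # TP / (TP + FN): share of leavers that were caught
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

In this toy case accuracy is 0.80 while only half of the employees who left are detected. In the notebook, the same calls could be applied to `y_test` with `y_pred_logreg` or `y_pred_rf`.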

## 9. Conclusion
Features with a strong influence on attrition include `OverTime`, `JobSatisfaction`, `YearsAtCompany`, and `MonthlyIncome`. The Random Forest model performed better than Logistic Regression.
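One way to back a claim about influential features is to read the `feature_importances_` attribute of a fitted random forest. A minimal sketch on synthetic data follows. The column names `signal`, `noise_a`, and `noise_b` are hypothetical, and in this notebook the same idea would apply to `rf` and `X_train.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration: only 'signal' determines the target
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'signal': rng.normal(size=400),
    'noise_a': rng.normal(size=400),
    'noise_b': rng.normal(size=400),
})
y_demo = (X_demo['signal'] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Impurity-based importances, sorted from most to least important
importances = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances tend to favor features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set may serve as a cross-check.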