In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression


data = {
    "Experience": [2, 5, 1, 8, 4, 10, 3, 6, 7, 2],
    "TrainingHours": [40, 60, 20, 80, 50, 90, 30, 70, 75, 25],
    "WorkingHours": [38, 42, 35, 45, 40, 48, 37, 44, 46, 36],
    "Projects": [3, 6, 2, 8, 5, 9, 4, 7, 7, 3],
    "Productivity": [62, 78, 55, 88, 72, 92, 65, 82, 85, 60]
}

df = pd.DataFrame(data)

X = df[["Experience", "TrainingHours", "WorkingHours", "Projects"]]
y = df["Productivity"]

model = LinearRegression()
model.fit(X, y)

coeffs = pd.Series(model.coef_, index=X.columns)
intercept = model.intercept_

print("Regression Equation:")
print(f"Productivity = {intercept:.2f} ", end="")
for feature, coef in coeffs.items():
    sign = "+" if coef >= 0 else "-"
    print(f"{sign} {abs(coef):.2f}*{feature} ", end="")
print("\n")


print("INTERPRETATION\n")

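# 1. Strongest factor (largest absolute coefficient; raw coefficients
#    depend on each feature's units, so this ranks per-unit effects)
strongest_factor = coeffs.abs().idxmax()
print(f"1. The factor with the strongest impact on productivity is '{strongest_factor}',")
print(f"   as it has the largest absolute coefficient ({coeffs.abs().max():.2f}).\n")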

# 2. Training effect
if coeffs["TrainingHours"] > 0:
    print("2. Training hours have a positive effect on productivity.\n")
else:
    print("2. Training hours have a negative effect on productivity.\n")

# 3. Training vs Working Hours
if coeffs["TrainingHours"] > coeffs["WorkingHours"]:
    print("3. The company should focus more on increasing training hours rather than working hours.\n")
else:
    print("3. The company should focus more on optimizing working hours rather than training hours.\n")

# 4. Excessive working hours
if coeffs["WorkingHours"] > 0:
    print("4. Productivity increases with working hours.\n")
else:
    print("4. Increasing working hours reduces productivity.\n")

# 5. Experience effect
if coeffs["Experience"] < 0:
    print("5. Productivity may decrease with more experience indicating skill stagnation.\n")
else:
    print("5. Productivity increases with experience.\n")

# 6. Overfitting detection
print("6. Overfitting can be detected by comparing training and testing performance.")
print("   A large gap between training R² and testing R² indicates overfitting.\n")

# 7. Suggested new feature
print("7. Adding 'Job Role Complexity' or 'Skill Level' as a new feature")
print("   could improve prediction accuracy by capturing role-based productivity differences.")
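
Point 6 describes the overfitting check but the cell above never runs it. The following is a minimal sketch of one way to do so, not part of the original analysis. With only 10 rows a single train/test split would be very noisy, so this sketch swaps in leave-one-out cross-validation as the "testing" side; that substitution and the names `train_r2`, `loo_pred`, and `loo_r2` are assumptions made here for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Same dataset as in the cell above, repeated so the sketch runs on its own.
data = {
    "Experience": [2, 5, 1, 8, 4, 10, 3, 6, 7, 2],
    "TrainingHours": [40, 60, 20, 80, 50, 90, 30, 70, 75, 25],
    "WorkingHours": [38, 42, 35, 45, 40, 48, 37, 44, 46, 36],
    "Projects": [3, 6, 2, 8, 5, 9, 4, 7, 7, 3],
    "Productivity": [62, 78, 55, 88, 72, 92, 65, 82, 85, 60],
}
df = pd.DataFrame(data)
X = df[["Experience", "TrainingHours", "WorkingHours", "Projects"]]
y = df["Productivity"]

# Training R²: fit on all rows and score on the same rows.
train_r2 = LinearRegression().fit(X, y).score(X, y)

# Held-out R² (assumed stand-in for a test set): each row is predicted by a
# model fitted on the other nine rows, then all predictions are scored together.
loo_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
loo_r2 = r2_score(y, loo_pred)

print(f"Training R²:      {train_r2:.3f}")
print(f"Leave-one-out R²: {loo_r2:.3f}")
print(f"Gap:              {train_r2 - loo_r2:.3f}")
```

A large gap between the two scores would point toward overfitting, in line with point 6. With four features and ten observations, some gap is to be expected, so the result is better read as a rough indication than as a verdict.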


Regression Equation:
Productivity = 17.43 - 0.96*Experience + 0.04*TrainingHours + 0.82*WorkingHours + 4.70*Projects 

INTERPRETATION

1. The factor with the strongest impact on productivity is 'Projects',
   as it has the largest absolute coefficient (4.70).

2. Training hours have a positive effect on productivity.
   Increasing training improves productivity, though the impact is relatively modest.

3. The company should focus more on optimizing working hours than training hours.

4. Productivity increases with working hours up to a certain limit.
   However, excessive working hours may cause fatigue and reduce productivity in practice.

5. Productivity can decrease with more experience when other factors are controlled.
   This may indicate skill stagnation or lack of continuous learning.

6. Overfitting can be detected by comparing training and testing performance.
   A large gap between training R² and testing R² indicates overfitting.

7. Adding 'Job Role Complexity' or 'Skill L