# **Manpower Planning**
## **Using Competency Assessment Method with Gurobi Framework**

_by: TK-Bunga Matahari Team_

---

# **Manpower Planning Objectives and Process**

## *The Objectives*

1. Optimizing the Use of Human Resources
2. Minimizing Recruitment Costs
3. Meeting Future Workforce Needs
4. Maintaining an Adequate Workforce

## *The Process*

1. Determine the Company’s Targets and Goals
	- How can the company optimize its workforce with an initial fund of $10,000?
2. Assess Current Workforce
	- Distribution of employees across work units
	- Distribution of employees with high/low skill values
	- Distribution of under-qualified and over-qualified employees
3. Forecast Future Needs
	- Workload Analysis
	- Workforce Analysis
	- Trend Analysis
4. Gap Analysis
	- Assess the current state of the workforce and determine where the company wants to be in the future
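In this notebook, the gap analysis is made concrete as skill gaps between each assigned task and the employee who receives it (Section 2.5). In our own notation (not the original's), let $r_{t,s}$ be the level of skill $s$ required by task $t$, $e_{j,s}$ the level of employee $j$ in that skill, and $A$ the set of employee-task assignments in the optimization output:

$$
g_{j,t,s} = r_{t,s} - e_{j,s}, \qquad G_s = \sum_{(j,t) \in A} g_{j,t,s}
$$

A positive $g_{j,t,s}$ means the task demands more of skill $s$ than employee $j$ has. Because $G_s$ adds up positive and negative gaps, it is a net figure: $G_s > 0$ marks a net shortage in skill $s$.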


# 0. The Obligatory Part


In [1]:
# Aggregate skill gaps for each skill
# Requires `merged_data`, which is built in Section 2.5 (Gap Analysis); run that section first
skill_gaps = (
    merged_data[[col for col in merged_data.columns if "_gap" in col]]
    .sum()
    .reset_index()
)
skill_gaps.columns = ["Skill", "Total_Gap"]
skill_gaps = skill_gaps.sort_values(by="Total_Gap", ascending=False)

skill_gaps


TODO: iterate over all employees whose gap value for a skill in `most_needed_skill` lies in the range 1-2.

In [None]:
most_needed_skill = skill_gaps[skill_gaps['Total_Gap'] > 0]
most_needed_skill
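The TODO above asks for every employee whose gap in a most-needed skill lies between 1 and 2. A minimal sketch, assuming a `merged_data`-shaped frame with `employee` and `assigned_task` columns plus one `<skill>_gap` column per skill (as built in Section 2.5). The helper name `employees_with_gap_in_range` and the toy data are hypothetical:

```python
import pandas as pd


def employees_with_gap_in_range(merged, skills, low=1, high=2):
    """Return (employee, assigned_task, skill, gap) rows whose gap lies in [low, high]."""
    gap_cols = [f"{skill}_gap" for skill in skills]
    # Reshape to one row per (employee, task, skill) so a single filter covers all skills
    long_df = merged.melt(
        id_vars=["employee", "assigned_task"],
        value_vars=gap_cols,
        var_name="skill",
        value_name="gap",
    )
    long_df["skill"] = long_df["skill"].str.replace("_gap$", "", regex=True)
    # between() is inclusive on both ends
    in_range = long_df[long_df["gap"].between(low, high)]
    return in_range.reset_index(drop=True)


# Toy data (hypothetical) mirroring the shape of merged_data
toy = pd.DataFrame({
    "employee": ["Talent 1", "Talent 2", "Talent 3"],
    "assigned_task": ["T1", "T2", "T3"],
    "SQL_gap": [1, 3, -1],
    "Statistics_gap": [2, 0, 1.5],
})

print(employees_with_gap_in_range(toy, ["SQL", "Statistics"]))
```

In the notebook, the skill names would come from `most_needed_skill["Skill"]`. Those values still carry the `_gap` suffix (they were built from the gap column names), so strip it first, e.g. with `.str.replace("_gap$", "", regex=True)`.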

In [None]:
# show employee that has the most needed skill

In [None]:
import ast
import numpy as np
import pandas as pd
import seaborn as sns
# import ace_tools as tools
import matplotlib.pyplot as plt
from gurobipy import Model, GRB, quicksum

# 1. Define the Data Structure

## 1.1. Employee and Task Data

In [None]:
# Run this if the data is stored locally or in the repository
new_employee_path = "./data/fixed_data_employee.csv"
new_task_path = "./data/fixed_data_task.csv"

In [None]:
# Read data
employee_skills_df = pd.read_csv(new_employee_path, index_col='No')
# employee_skills_df.drop(columns=['no'], inplace=True, errors='ignore')

employees = employee_skills_df.index.tolist()
skills_name = employee_skills_df.columns[2:].tolist()

employee_skills_df

In [None]:
# Read task data
task_skills_df = pd.read_csv(new_task_path, index_col='task_id')

tasks = task_skills_df.index.tolist()

task_skills_df

## 1.2. Optimization Output Data

### 1.2.1. Assigned Task Output

In [None]:
result_moo = pd.read_csv('./output_VM/3_gap_0.025/result_5_MOO_2.csv')
result_moo.head()

### 1.2.2 Assessment Score Output

In [None]:
score = pd.read_csv('./output_VM/3_gap_0.025/score.csv', index_col=[0])
score

## 1.3. Estimated Salary Data

In [None]:
# salary_job_df = pd.read_csv('./data/linreg_salary_job.csv')
# salary_job_df.head()

In [None]:
# job_roles = salary_job_df['job_role'].unique()
# levels = {
#     'junior': salary_job_df['junior'],
#     'middle': salary_job_df['middle'],
#     'senior': salary_job_df['senior']
# }
# levels

## 1.4. Company Targets

In [None]:
initial_fund = 10000 # USD

# 2. Assess Current Workforce

Using Exploratory Data Analysis (EDA), we analyze the distributions in our current workforce data and in the optimization results to draw insights.

## 2.1. The Distribution of Employees by Role

How many employees do we have in each role?

In [None]:
# Count employees per role (ascending, so the largest role ends up at the top of the chart)
role_counts = (
    employee_skills_df[['Role', 'employee_id']]
    .groupby('Role', as_index=False)
    .count()
    .sort_values('employee_id', ascending=True)
)

# Plot the horizontal bar chart
ax = role_counts.plot(kind='barh', x='Role')

# Annotate the values on each bar
for index, value in enumerate(role_counts['employee_id']):
    ax.text(value, index, str(value))

plt.show()

## 2.2. The Distribution of Good Skill Values (3-5)

A skill value of 3-5 indicates that an employee is proficient in that competency. For each skill, we compute the percentage of employees whose value falls in this range.

In [None]:
for skill in skills_name:
	j_good_skill = round(len(employee_skills_df[(employee_skills_df[skill] <= 5) & (employee_skills_df[skill] >= 3)]) / len(employees) * 100, 2)
	print(f"{skill}: {j_good_skill}%")
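The loop above filters the whole DataFrame once per skill. A vectorized sketch of the same 3-5 rule uses `Series.between`, which is inclusive on both ends by default. The helper name `good_skill_share` and the toy frame are hypothetical:

```python
import pandas as pd


def good_skill_share(df, skills, low=3, high=5):
    """Percentage of rows whose level in each skill lies in [low, high], rounded to 2 dp."""
    # between() is inclusive on both ends, matching `>= 3` and `<= 5` in the loop
    in_range = df[skills].apply(lambda col: col.between(low, high))
    return (in_range.mean() * 100).round(2)


toy = pd.DataFrame({
    "SQL": [5, 3, 1, 4],
    "Statistics": [2, 2, 3, 1],
})

print(good_skill_share(toy, ["SQL", "Statistics"]))
```

On the notebook data, `good_skill_share(employee_skills_df, skills_name)` should return the same percentages as the loop, as a Series indexed by skill name.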

It would be worthwhile to explore role-specific skills further. For example, for MLOps and AI: does the percentage of employees proficient in each MLOps skill match the number of AI Engineers available, or is there a significant difference?

### 2.2.1. Data Analyst Distribution

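Below, we explore the skills mastered by a large share of each role (roughly 60% for Data Analysts and Data Scientists). Note that each percentage is computed against the whole population rather than against the role alone, so the threshold in each cell is expressed in population terms.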

In [None]:
# Data Analyst
n_total = len(employee_skills_df)
role_mask = employee_skills_df['Role'] == 'Data Analyst'
print(f"Total Data Analyst percentage in population: {role_mask.sum() / n_total * 100}%")

# Percentages are relative to the whole population; 42*0.6 is ~60% of the role,
# assuming Data Analysts make up ~42% of the population
for skill in skills_name:
  good_skill = (employee_skills_df[skill] >= 3) & (employee_skills_df[skill] <= 5)
  skill_percentage = round((good_skill & role_mask).sum() / n_total * 100, 2)
  if skill_percentage >= (42*0.6):
    print(skill, ': ', f"{skill_percentage}%")

Among the skills mastered by roughly 60% of Data Analysts:
- Data Analysts have mastered several topics: Statistics & Probabilities (3/6), Data Structures & Algorithms (4/7), Econometrics, Data Analysis, and Data Visualization (5/8), and Relational DB.
- Within Statistics & Probabilities, many of our talents have mastered Statistics, while fewer have mastered Probability & Sampling and Hypothesis Testing (possibly because Telkom use cases have not yet involved A/B testing).
- Within Data Structures & Algorithms, our talents are highly proficient in SQL, Data Structures, Programming, and Algorithms.
- Within Econometrics, Data Analysis, and Data Visualization, our talents are highly proficient in Data Preprocessing & EDA, Data Viz & Storytelling, Regression Analysis, Time Series Analysis, and Correlation Analysis.
- Within Data & Cloud Engineering, our talents have mastered only Relational DB.
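Because the percentages above divide by the whole population, each role needs its own hand-picked threshold. An alternative, not what the notebook does, is to measure the share *within* each role, so that a single cut-off (e.g. 60%) is comparable across roles. A sketch with `groupby`, assuming a `Role` column and the 3-5 proficiency rule; the helper `within_role_share` and the toy data are hypothetical:

```python
import pandas as pd


def within_role_share(df, skills, low=3, high=5):
    """Percentage of each role's employees with a level in [low, high], per skill."""
    proficient = df[skills].apply(lambda col: col.between(low, high)).astype(float)
    proficient["Role"] = df["Role"]
    # Mean of 0/1 values per group = fraction of that role's employees who are proficient
    return (proficient.groupby("Role").mean() * 100).round(2)


toy = pd.DataFrame({
    "Role": ["Data Analyst", "Data Analyst", "Data Scientist", "Data Scientist", "Data Scientist"],
    "SQL": [4, 2, 5, 3, 1],
    "Statistics": [3, 3, 5, 2, 4],
})

shares = within_role_share(toy, ["SQL", "Statistics"])
print(shares)

# Skills held by at least 60% of Data Analysts
da = shares.loc["Data Analyst"]
print(da[da >= 60])
```

With this normalization, the role-specific constants such as `42*0.6` are no longer needed; the trade-off is that small roles produce coarse percentages.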

In [None]:
# Create a list to store the percentages of each skill
skill_da_percentages = []
n_total = len(employee_skills_df)

for skill in skills_name:
  skill_percentage = round(len(employee_skills_df[(employee_skills_df[skill]<=5)&
                   (employee_skills_df[skill]>=3)&
                   (employee_skills_df['Role']=='Data Analyst')])/n_total*100,2)

  if skill_percentage >= (42*0.6):
    # print(skill,': ',f"{skill_percentage}%")
    skill_da_percentages.append(skill_percentage)

# Create a distribution plot of the skill percentages
sns.set_style('whitegrid')
plt.figure(figsize=(10,6))
# `fill=True` replaces the deprecated `shade=True`; explicit labels keep legend entries aligned
sns.kdeplot(skill_da_percentages, fill=True, color='skyblue', label='Distribution')
plt.axvline(x=np.mean(skill_da_percentages), color='red', linestyle='--', label='Mean')
plt.axvline(x=np.median(skill_da_percentages), color='green', linestyle='--', label='Median')
plt.legend()


# Add labels and title
plt.xlabel('Skill Percentage', size=13)
plt.ylabel('Density', size=13)
plt.title('Distribution of Skill Percentages for Data Analyst', size=15)
plt.show()

### 2.2.2. Data Scientist Distribution

In [None]:
# Data Scientist
n_total = len(employee_skills_df)
role_mask = employee_skills_df['Role'] == 'Data Scientist'
print(f"Total Data Scientist percentage in population: {role_mask.sum() / n_total * 100}%")

# Percentages are relative to the whole population; 35*0.6 is ~60% of the role,
# assuming Data Scientists make up ~35% of the population
for skill in skills_name:
  good_skill = (employee_skills_df[skill] >= 3) & (employee_skills_df[skill] <= 5)
  skill_percentage = round((good_skill & role_mask).sum() / n_total * 100, 2)
  if skill_percentage >= (35*0.6):
    print(skill, ': ', f"{skill_percentage}%")

Among the skills mastered by roughly 60% of Data Scientists:
- Data Scientists have mastered almost every topic except MLOps.
- Within **Mathematics**, almost all skills are mastered, with **Combinatorics & Graph** mastered by the fewest talents.
- Within **Statistics & Probabilities**, almost all Data Scientists have mastered **Statistics** and **Probability & Sampling**.
- Within **Data Structures & Algorithms**, our talents are highly proficient in SQL, Data Structures, Programming, and Algorithms.
- Within Econometrics, Data Analysis, and Data Visualization, our talents are highly proficient in Data Preprocessing & EDA, Data Viz & Storytelling, Regression Analysis, Time Series Analysis, and Correlation Analysis.
- Within Data & Cloud Engineering, our talents have mastered only Relational DB.

In [None]:
# Make distribution plot for Data Scientist
skill_ds_percentages = []
n_total = len(employee_skills_df)

for skill in skills_name:
    skill_percentage = round(len(employee_skills_df[(employee_skills_df[skill]<=5)&
                     (employee_skills_df[skill]>=3)&
                     (employee_skills_df['Role']=='Data Scientist')])/n_total*100,2)

    if skill_percentage >= (35*0.6):
        # print(skill,': ',f"{skill_percentage}%")
        skill_ds_percentages.append(skill_percentage)

# Create a distribution plot of the skill percentages
sns.set_style('whitegrid')
plt.figure(figsize=(10,6))
# `fill=True` replaces the deprecated `shade=True`; explicit labels keep legend entries aligned
sns.kdeplot(skill_ds_percentages, fill=True, color='skyblue', label='Distribution')
plt.axvline(x=np.mean(skill_ds_percentages), color='red', linestyle='--', label='Mean')
plt.axvline(x=np.median(skill_ds_percentages), color='green', linestyle='--', label='Median')
plt.legend()

# Add labels and title
plt.xlabel('Skill Percentage', size=13)
plt.ylabel('Density', size=13)
plt.title('Distribution of Skill Percentages for Data Scientist', size=15)
plt.show()

### 2.2.3. Data Engineer Distribution

In [None]:
# Data Engineer
n_total = len(employee_skills_df)
role_mask = employee_skills_df['Role'] == 'Data Engineer'
print(f"Total Data Engineer percentage in population: {role_mask.sum() / n_total * 100}%")

# Percentages are relative to the whole population; the threshold (7%) is set for this role
for skill in skills_name:
  good_skill = (employee_skills_df[skill] >= 3) & (employee_skills_df[skill] <= 5)
  skill_percentage = round((good_skill & role_mask).sum() / n_total * 100, 2)
  if skill_percentage >= 7:
    print(skill, ': ', f"{skill_percentage}%")

In [None]:
# Make distribution plot for Data Engineer
skill_de_percentages = []
n_total = len(employee_skills_df)

for skill in skills_name:
    skill_percentage = round(len(employee_skills_df[(employee_skills_df[skill]<=5)&
                     (employee_skills_df[skill]>=3)&
                     (employee_skills_df['Role']=='Data Engineer')])/n_total*100,2)

    if skill_percentage >= 4:
        # print(skill,': ',f"{skill_percentage}%")
        skill_de_percentages.append(skill_percentage)

# Create a distribution plot of the skill percentages
sns.set_style('whitegrid')
plt.figure(figsize=(10,6))
# `fill=True` replaces the deprecated `shade=True`; explicit labels keep legend entries aligned
sns.kdeplot(skill_de_percentages, fill=True, color='skyblue', label='Distribution')
plt.axvline(x=np.mean(skill_de_percentages), color='red', linestyle='--', label='Mean')
plt.axvline(x=np.median(skill_de_percentages), color='green', linestyle='--', label='Median')
plt.legend()

# Add labels and title
plt.xlabel('Skill Percentage', size=13)
plt.ylabel('Density', size=13)
plt.title('Distribution of Skill Percentages for Data Engineer', size=15)
plt.show()

### 2.2.4. Artificial Intelligence Distribution

In [None]:
# AI Engineer
n_total = len(employee_skills_df)
role_mask = employee_skills_df['Role'] == 'Artificial Intelligence Engineer'
print(f"Total Artificial Intelligence Engineer percentage in population: {role_mask.sum() / n_total * 100}%")

# Percentages are relative to the whole population; the threshold (4%) is set for this role
for skill in skills_name:
  good_skill = (employee_skills_df[skill] >= 3) & (employee_skills_df[skill] <= 5)
  skill_percentage = round((good_skill & role_mask).sum() / n_total * 100, 2)
  if skill_percentage >= 4:
    print(skill, ': ', f"{skill_percentage}%")

In [None]:
# Make distribution plot for Artificial Intelligence Engineer
skill_ai_percentages = []
n_total = len(employee_skills_df)

for skill in skills_name:
    skill_percentage = round(len(employee_skills_df[(employee_skills_df[skill]<=5)&
                     (employee_skills_df[skill]>=3)&
                     (employee_skills_df['Role']=='Artificial Intelligence Engineer')])/n_total*100,2)

    if skill_percentage >= 4:
        # print(skill,': ',f"{skill_percentage}%")
        skill_ai_percentages.append(skill_percentage)

# Create a distribution plot of the skill percentages
sns.set_style('whitegrid')
plt.figure(figsize=(10,6))
# `fill=True` replaces the deprecated `shade=True`; explicit labels keep legend entries aligned
sns.kdeplot(skill_ai_percentages, fill=True, color='skyblue', label='Distribution')
plt.axvline(x=np.mean(skill_ai_percentages), color='red', linestyle='--', label='Mean')
plt.axvline(x=np.median(skill_ai_percentages), color='green', linestyle='--', label='Median')
plt.legend()

# Add labels and title
plt.xlabel('Skill Percentage', size=13)
plt.ylabel('Density', size=13)
plt.title('Distribution of Skill Percentages for Artificial Intelligence Engineers', size=15)
plt.show()

## 2.3. The Distribution of Over- and Under-qualified Employees



In [None]:
result_moo

In [None]:
result = {}
for idx, row in result_moo.iterrows():
    employee = row['employee']
    result[employee] = {        
        'company': ast.literal_eval(row['company']),
        'assigned_task': ast.literal_eval(row['assigned_task']),
        'sum_sp': row['sum_sp'],
        'wasted_sp': row['wasted_sp'],
        'assessment_score': ast.literal_eval(row['assessment_score'])
    }

In [None]:
result

In [None]:
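# Show employee-task pairs with a non-negative assessment score
# (`task_score` avoids overwriting the `score` DataFrame loaded in 1.2.2;
#  the dict keeps only the last such task per employee)

overqualified = {}

for j, val in result.items():
    for task, task_score in zip(val['assigned_task'], val['assessment_score']):
        if task_score >= 0:
            overqualified[j] = (task, task_score)
            print(f'{j}: {task} with score = {task_score}')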

In [None]:
print(f"Over-qualified Employees: {len(overqualified)}")

In [None]:
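# Show employee-task pairs with a negative assessment score
# (`task_score` avoids overwriting the `score` DataFrame loaded in 1.2.2;
#  the dict keeps only the last such task per employee)

underqualified = {}

for j, val in result.items():
    for task, task_score in zip(val['assigned_task'], val['assessment_score']):
        if task_score < 0:
            underqualified[j] = (task, task_score)
            print(f'{j}: {task} with score = {task_score}')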

In [None]:
print(f"Under-qualified Employees: {len(underqualified)}")

In [None]:
# Show under-qualified assignments sorted by assessment score (most negative first)
sorted(underqualified.items(), key=lambda x: x[1][1])

In [None]:
over_set = set(overqualified.keys())
under_set = set(underqualified.keys())
intersection = over_set.intersection(under_set)

In [None]:
intersection

In [None]:
over_set = over_set - intersection
under_set = under_set - intersection
print(len(over_set))
print(len(under_set))

In [None]:
over_set

In [None]:
under_set
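The split above classifies each employee from the signs of their assessment scores: a non-negative score counts as over-qualified (or exactly qualified), a negative one as under-qualified, and employees with both kinds of assignment land in the intersection. A compact, pure-Python sketch of that rule on a hypothetical `result`-shaped dictionary; unlike the cells above, it keeps every task per employee instead of only the last one. The function name `classify_employees` is illustrative:

```python
from collections import defaultdict


def classify_employees(result):
    """Split employees into over-only, under-only and mixed sets from assessment-score signs."""
    over = defaultdict(list)
    under = defaultdict(list)
    for employee, val in result.items():
        for task, task_score in zip(val["assigned_task"], val["assessment_score"]):
            # Same rule as the notebook: >= 0 is over-qualified, < 0 is under-qualified
            if task_score >= 0:
                over[employee].append((task, task_score))
            else:
                under[employee].append((task, task_score))
    both = set(over) & set(under)
    return set(over) - both, set(under) - both, both, over, under


toy_result = {
    "Talent 1": {"assigned_task": ["T1", "T2"], "assessment_score": [0.02, 0.01]},
    "Talent 2": {"assigned_task": ["T3"], "assessment_score": [-0.015]},
    "Talent 3": {"assigned_task": ["T4", "T5"], "assessment_score": [0.005, -0.002]},
}

over_only, under_only, mixed, over, under = classify_employees(toy_result)
print(over_only, under_only, mixed)
print(over["Talent 1"])
```

The three returned sets correspond to `over_set`, `under_set` and `intersection` above.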

## 2.4. Over and Underqualified Employees by Role

In [None]:
employee_index_df = pd.read_csv(new_employee_path, index_col='employee_id').drop(columns=['No'])
employee_index_df.head()

In [None]:
roles = set(employee_index_df['Role'])

role_over_qualified = {}

for role in roles:
    temp = []
    for j in employee_index_df.index:
        if j in over_set and employee_index_df.loc[j, 'Role'] == role:
            temp.append(j)
    role_over_qualified[role] = temp

role_over_qualified

In [None]:
role_under_qualified = {}

for role in roles:
    temp = []
    for j in employee_index_df.index:
        if j in under_set and employee_index_df.loc[j, 'Role'] == role:
            temp.append(j)
    role_under_qualified[role] = temp

role_under_qualified['Data Engineer']

In [None]:
role_intersection = {}

for role in roles:
    temp = []
    for j in employee_index_df.index:
        if j in intersection and employee_index_df.loc[j, 'Role'] == role:
            temp.append(j)
    role_intersection[role] = temp

role_intersection
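The three cells above scan every employee once per role. Since only the role lookup changes, a single pass that maps each employee in a set to its role builds the same dictionaries. A sketch assuming a pandas Series `role_of` indexed by employee ID (in the notebook this would be `employee_index_df['Role']`); the helper `group_by_role` is hypothetical:

```python
import pandas as pd


def group_by_role(employee_ids, role_of, roles):
    """Map each role to the list of employees from `employee_ids` in that role."""
    grouped = {role: [] for role in roles}
    # Iterating over role_of keeps the original index order, like the loops above
    for employee, role in role_of.items():
        if employee in employee_ids:
            grouped[role].append(employee)
    return grouped


role_of = pd.Series(
    ["Data Analyst", "Data Engineer", "Data Analyst", "Data Scientist"],
    index=["Talent 1", "Talent 2", "Talent 3", "Talent 4"],
)
roles = set(role_of)

print(group_by_role({"Talent 1", "Talent 3", "Talent 4"}, role_of, roles))
```

For example, `group_by_role(over_set, employee_index_df['Role'], roles)` would reproduce `role_over_qualified`.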

### 2.4.1. Artificial Intelligence Pie Chart

In [None]:
# Pie chart of qualification status for AI Engineers
plt.figure(figsize=(10, 10))
plt.pie([len(role_over_qualified['Artificial Intelligence Engineer']), len(role_under_qualified['Artificial Intelligence Engineer']), len(role_intersection['Artificial Intelligence Engineer'])], labels=['Over-qualified', 'Under-qualified', 'Intersection'], autopct='%1.1f%%')
plt.title('Artificial Intelligence Engineer')
plt.show()

In [None]:
ai_under_task = role_under_qualified["Artificial Intelligence Engineer"]

ai_under_qualified = []
ai_passed = []

for j in score.iterrows():
    if j[0] in ai_under_task:
        if any(j[1] > 0):
            ai_passed.append(j[0])
        else:
            ai_under_qualified.append(j[0])

print(f"AI Engineer employees that are under-qualified: {ai_under_qualified}")
print(f"AI Engineer employees that passed the qualification: {ai_passed}")
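The loop above marks an under-qualified AI Engineer as having *passed* if at least one task in `score` gives them a strictly positive assessment score. The same test can be vectorized with `DataFrame.gt` and `any(axis=1)`. A sketch on a hypothetical score matrix laid out like `score` (rows are employees, columns are tasks); `Index.isin` skips candidates that are missing from the matrix, as the loop does:

```python
import pandas as pd

# Hypothetical assessment scores: rows = employees, columns = tasks
toy_score = pd.DataFrame(
    {"T1": [-0.01, 0.02, -0.03], "T2": [-0.02, -0.01, 0.00]},
    index=["Talent 1", "Talent 2", "Talent 3"],
)
candidates = ["Talent 3", "Talent 1", "Talent 2", "Talent 99"]

# Keep only candidate rows, in the score matrix's own order
subset = toy_score[toy_score.index.isin(candidates)]

# True where an employee has at least one strictly positive score
has_positive = subset.gt(0).any(axis=1)

passed = has_positive[has_positive].index.tolist()
under_qualified = has_positive[~has_positive].index.tolist()
print(passed, under_qualified)
```

In the notebook, `score[score.index.isin(ai_under_task)]` would play the role of `subset`.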

In [None]:
score.loc[['Talent 23']]

Output (abridged): a single row of assessment scores for `Talent 23`, one column per task. Every score is negative, ranging from about -0.010 to -0.022, so this employee has no positive assessment score for any task.


### 2.4.2. Data Analyst

In [None]:
plt.figure(figsize=(10, 10))
plt.pie([len(role_over_qualified['Data Analyst']), len(role_under_qualified['Data Analyst']), len(role_intersection['Data Analyst'])], labels=['Over-qualified', 'Under-qualified', 'Intersection'], autopct='%1.1f%%')
plt.title('Data Analyst')
plt.show()

### 2.4.3. Data Engineer

In [None]:
plt.figure(figsize=(10, 10))
plt.pie([len(role_over_qualified['Data Engineer']), len(role_under_qualified['Data Engineer']), len(role_intersection['Data Engineer'])], labels=['Over-qualified', 'Under-qualified', 'Intersection'], autopct='%1.1f%%')
plt.title('Data Engineer')
plt.show()

### 2.4.4. Data Scientist

In [None]:
plt.figure(figsize=(10, 10))
plt.pie([len(role_over_qualified['Data Scientist']), len(role_under_qualified['Data Scientist']), len(role_intersection['Data Scientist'])], labels=['Over-qualified', 'Under-qualified', 'Intersection'], autopct='%1.1f%%')
plt.title('Data Scientist')
plt.show()

In [None]:
# Show each employee's assigned tasks sorted by assessment score (ascending)
# (`task_score` avoids overwriting the `score` DataFrame loaded in 1.2.2)
for j, val in result.items():
    # print(j)
    for task, task_score in sorted(zip(val['assigned_task'], val['assessment_score']), key=lambda x: x[1]):
        print(f'{j}: {task} with score = {task_score}')

If an employee is under-qualified, we can focus on that case for insight: check whether the skills they fall short on actually suit that employee, since those skills may be what makes them under-qualified.
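One way to follow up on this idea is to list, for a single under-qualified assignment, the skills where the task asks for more than the employee has, i.e. a positive `<skill>_gap` under the task-minus-employee convention of Section 2.5. A sketch, assuming a `merged_data`-shaped frame; the helper `shortfall_skills` and the toy values are hypothetical:

```python
import pandas as pd


def shortfall_skills(merged, employee, task):
    """Return the positive skill gaps (task level minus employee level) for one assignment."""
    row = merged[(merged["employee"] == employee) & (merged["assigned_task"] == task)]
    gap_cols = [col for col in merged.columns if col.endswith("_gap")]
    gaps = row[gap_cols].iloc[0]  # raises IndexError if the pair is not present
    shortfalls = gaps[gaps > 0].sort_values(ascending=False)
    # Drop the "_gap" suffix so the index shows plain skill names
    shortfalls.index = shortfalls.index.str.replace("_gap$", "", regex=True)
    return shortfalls


toy = pd.DataFrame({
    "employee": ["Talent 1", "Talent 1"],
    "assigned_task": ["T1", "T2"],
    "SQL_gap": [2, 0],
    "Deep Learning.GAN_gap": [3, -1],
    "Statistics_gap": [-1, 1],
})

print(shortfall_skills(toy, "Talent 1", "T1"))
```

The largest shortfalls come first, which points at the skills most likely to explain a negative assessment score.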


## 2.5. Gap Analysis (Test)

In [None]:
result_moo["company"] = result_moo["company"].astype(str)
result_moo["assigned_task"] = result_moo["assigned_task"].astype(str)
result_moo["assessment_score"] = result_moo["assessment_score"].astype(str)
result_moo["company"] = result_moo["company"].apply(ast.literal_eval)
result_moo["assigned_task"] = result_moo["assigned_task"].apply(ast.literal_eval)
result_moo["assessment_score"] = result_moo["assessment_score"].apply(ast.literal_eval)

In [None]:
pd.set_option("display.max_columns", None)

# Expand the result optimization data to have one row per employee-task pair
expanded_result = result_moo.explode(["assigned_task", "company", "assessment_score"]).drop(["sum_sp", "wasted_sp"], axis=1)

# Merge with task data
merged_data = expanded_result.merge(
    task_skills_df, left_on="assigned_task", right_on="task_id"
)

# Merge with employee data
merged_data = merged_data.merge(
    employee_index_df, left_on="employee", right_on="employee_id"
)

# Calculate skill gaps (task - employee)
for skill in skills_name:
    merged_data[f"{skill}_gap"] = merged_data[f"{skill}_x"] - merged_data[f"{skill}_y"]

# Display the first few rows of the merged data with skill gaps
merged_data
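The cell above relies on two pandas behaviours: `explode` with a list of columns (pandas 1.3 or later) unpacks the parallel lists row by row, and when both merged frames share a skill column, `merge` suffixes the left (task) copy with `_x` and the right (employee) copy with `_y`. `left_on`/`right_on` may also name an index level, which is why `task_id` and `employee_id` need not be columns. A small, self-contained sketch of the same pipeline on hypothetical data with a single skill:

```python
import pandas as pd

# Hypothetical optimization output: parallel lists per employee
result_toy = pd.DataFrame({
    "employee": ["Talent 1", "Talent 2"],
    "assigned_task": [["T1", "T2"], ["T3"]],
    "assessment_score": [[0.01, -0.02], [0.03]],
})
tasks_toy = pd.DataFrame(
    {"SQL": [4, 2, 3]},
    index=pd.Index(["T1", "T2", "T3"], name="task_id"),
)
employees_toy = pd.DataFrame(
    {"Role": ["Data Analyst", "Data Engineer"], "SQL": [3, 5]},
    index=pd.Index(["Talent 1", "Talent 2"], name="employee_id"),
)

# One row per employee-task pair
pairs = result_toy.explode(["assigned_task", "assessment_score"])

# right_on can refer to an index level name
merged = pairs.merge(tasks_toy, left_on="assigned_task", right_on="task_id")
merged = merged.merge(employees_toy, left_on="employee", right_on="employee_id")

# SQL_x comes from the task, SQL_y from the employee: gap = required - actual
merged["SQL_gap"] = merged["SQL_x"] - merged["SQL_y"]
print(merged[["employee", "assigned_task", "SQL_x", "SQL_y", "SQL_gap"]])
```

Here `T1` needs SQL 4 while `Talent 1` has 3, giving a gap of 1; the other two pairs come out negative.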

### 2.5.2. Aggregate Skill Gaps

In [None]:
# Show employee-task pairs where the task needs more Deep Learning.GAN than the employee has (gap > 0)
dl_gan_gap = merged_data[merged_data['Deep Learning.GAN_gap'] > 0]
dl_gan_gap[['employee', 'assigned_task', 'Deep Learning.GAN_gap']]

In [None]:
# Define the average skill level per new hire
average_skill_level_per_hire = 3

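# Calculate the number of employees needed for each skill gap
# (gaps are task minus employee, so a shortage is a positive Total_Gap;
#  at least one hire is listed per skill)
skill_gaps["Number_of_Hires"] = (
    skill_gaps["Total_Gap"] / average_skill_level_per_hire
).apply(lambda x: max(1, round(x)))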

# Focus on the top skill gaps
top_skill_gaps = skill_gaps.head(10)

# Create job descriptions based on the top skill gaps
job_descriptions = top_skill_gaps[["Skill", "Number_of_Hires"]]

job_descriptions
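The hiring estimate divides each skill's aggregate gap by the skill level a new hire is assumed to bring (3) and rounds, with a floor of one hire per listed skill. Because gaps are computed as task minus employee, a shortage is a positive total gap. A worked sketch on hypothetical totals, which additionally drops skills with no net shortage (the notebook instead takes the top 10 rows):

```python
import pandas as pd

average_skill_level_per_hire = 3  # assumption carried over from the notebook

# Hypothetical aggregate gaps (positive = shortage, task level minus employee level)
toy_gaps = pd.DataFrame({
    "Skill": ["Deep Learning.GAN_gap", "SQL_gap", "Statistics_gap"],
    "Total_Gap": [14, 4, -6],
})

# Keep only net shortages, then convert gap points into a head count
shortages = toy_gaps[toy_gaps["Total_Gap"] > 0].copy()
shortages["Number_of_Hires"] = (
    shortages["Total_Gap"] / average_skill_level_per_hire
).apply(lambda x: max(1, round(x)))

print(shortages)
```

Here 14 / 3 ≈ 4.67 rounds to 5 hires and 4 / 3 ≈ 1.33 rounds to 1, while `Statistics` (a net surplus of -6) is dropped. Note that Python's `round` uses round-half-to-even, so a ratio of exactly 2.5 would give 2.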