# Python Mystery Case – Employee Investigation

The HR team of our startup suspects strange activity in the employee records:  

- Inactive employees are still getting salaries.  
- Interns might be overpaid.  
- Some departments look suspicious.  

**Your mission:** Solve the mystery step by step using Python and find out if these suspicions are true or false.

**Dataset:** `python_activity.csv`

**Columns:** `id`, `name`, `age`, `role`, `salary`, `department`, `status`

## Load Dataset

In [None]:
import pandas as pd

from google.colab import drive
drive.mount('/content/drive')

df = pd.read_csv('/content/drive/MyDrive/python_activity.csv')
df.head()



    id     name  age               role  salary department    status
0  101    Seema   30            Manager   59424  Analytics    active
1  102    Imran   22  Software Engineer   85111     Design    active
2  103     Hina   44    Product Manager   81704         HR    active
3  104  Mahnoor   25           Designer   66056    Finance    active
4  105   Muneeb   22        QA Engineer   36472     Design  inactive
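
The loading cell depends on Google Colab's drive mount. As a minimal sketch for other environments, the helper below reads the CSV from any path or file-like object and fails early if an expected column is missing. `load_employees` and `EXPECTED_COLUMNS` are hypothetical names introduced for illustration, and the in-memory sample simply reuses the five preview rows so the block runs without the real file.

```python
import io

import pandas as pd

# Column names taken from the "Columns" line of this notebook.
EXPECTED_COLUMNS = ["id", "name", "age", "role", "salary", "department", "status"]


def load_employees(source):
    """Read the employee CSV and raise if any expected column is missing.

    `source` may be a file path or a file-like object (hypothetical helper).
    """
    frame = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame


# In-memory stand-in for python_activity.csv, built from the five preview rows.
sample_csv = io.StringIO(
    "id,name,age,role,salary,department,status\n"
    "101,Seema,30,Manager,59424,Analytics,active\n"
    "102,Imran,22,Software Engineer,85111,Design,active\n"
    "103,Hina,44,Product Manager,81704,HR,active\n"
    "104,Mahnoor,25,Designer,66056,Finance,active\n"
    "105,Muneeb,22,QA Engineer,36472,Design,inactive\n"
)
sample_df = load_employees(sample_csv)
print(sample_df.shape)
```

With the real file, the same helper could be called with a local path in place of `sample_csv`.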


## Part 1: Warm-up

**Question 1:** Show the names of every employee in the company.

In [None]:
print(df['name'])

0        Seema
1        Imran
2         Hina
3      Mahnoor
4       Muneeb
        ...   
95        Nida
96    Sarfaraz
97       Farah
98       Zakir
99       Sadia
Name: name, Length: 100, dtype: object


In [None]:
for name in df['name']:
  print(name)

Seema
Imran
Hina
Mahnoor
Muneeb
Kamran
Yasir
Rehan
Kiran
Farhan
Sumbal
Arif
Bushra
Noor
Sana
Hassan
Omar
Shahzad
Amna
Nadia
Omar
Rizwan
Laiba
Javed
Fatima
Amber
Usman
Waleed
Eman
Rabia
Lubna
Fahad
Asad
Noman
Khalid
Mahnoor
Anam
Iqra
Saad
Shazia
Danish
Sadia
Hamza
Kiran
Mariam
Shazia
Noor
Ali
Sumbal
Tariq
Adeel
Shireen
Haris
Sadia
Raza
Mehak
Zahid
Sahil
Kinza
Talha
Sara
Arsalan
Nimra
Zeeshan
Fiza
Jibran
Nashit
Hiba
Murtaza
Asiya
Imtiaz
Shabana
Adil
Sameer
Tehmina
Faizan
Saher
Bilquis
Zunair
Areesha
Raheel
Huma
Affan
Mishal
Tariq
Haleema
Farrukh
Shamim
Zoya
Irfan
Jaweria
Saif
Parveen
Tanveer
Haris
Nida
Sarfaraz
Farah
Zakir
Sadia
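
The printed list contains repeated names (Omar, Mahnoor and Sadia, among others), so `name` alone does not identify a person. A small sketch of how repeats might be surfaced, using a hand-built sample of rows whose ids and names appear in this notebook's outputs (the sample frame itself is an assumption, not the full dataset):

```python
import pandas as pd

# Hand-built sample: ids and names copied from outputs shown in this notebook.
sample_df = pd.DataFrame({
    "id": [102, 104, 117, 121, 136],
    "name": ["Imran", "Mahnoor", "Omar", "Omar", "Mahnoor"],
})

# keep=False marks every occurrence of a repeated name, not only the later ones.
repeated = sample_df[sample_df["name"].duplicated(keep=False)]
print(repeated.sort_values("name"))

unique_names = sample_df["name"].nunique()
print(unique_names, "unique names in", len(sample_df), "rows")
```

Because names repeat, `id` is the safer column to use when a single employee has to be pinned down.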


**Question 2:** Find out how many employees are tracked in the dataset.

In [None]:
print(len(df))

100


**Question 3:** Take a quick preview of the first 5 employee records to see how the data looks.

In [None]:
print(df.head())

    id     name  age               role  salary department    status
0  101    Seema   30            Manager   59424  Analytics    active
1  102    Imran   22  Software Engineer   85111     Design    active
2  103     Hina   44    Product Manager   81704         HR    active
3  104  Mahnoor   25           Designer   66056    Finance    active
4  105   Muneeb   22        QA Engineer   36472     Design  inactive


## Part 2: Data Filtering & Conditions

**Question 4:** Identify all the interns.

In [None]:
print(df[df['role'] == 'Intern'])

     id     name  age    role  salary department    status
11  112     Arif   35  Intern   27003       Tech  inactive
16  117     Omar   24  Intern   29369       Tech    active
22  123    Laiba   22  Intern   29645  Analytics    active
28  129     Eman   21  Intern   26645         HR    active
35  136  Mahnoor   23  Intern   28431     Design  inactive
39  140   Shazia   21  Intern   27682         HR    active
47  148      Ali   27  Intern   29982       Tech    active
51  152  Shireen   24  Intern   28745  Analytics    active
55  156    Mehak   22  Intern   26218     Design    active
60  161     Sara   23  Intern   29402       Tech  inactive
67  168     Hiba   21  Intern   27109         HR    active
73  174   Sameer   23  Intern   26831    Finance    active
82  183    Affan   22  Intern   25893  Analytics    active
87  188   Shamim   25  Intern   28401         HR    active
92  193  Parveen   23  Intern   27511     Design    active
97  198    Farah   21  Intern   26391       Tech    acti

**Question 5:** Find out which employees are making more than 85,000.

In [None]:
print(df[df['salary'] > 85000].sort_values(by = 'salary', ascending = False))

     id     name  age               role  salary department    status
40  141   Danish   42    Product Manager   99111    Finance  inactive
75  176   Faizan   35    Product Manager   98712       Tech    active
20  121     Omar   41    Product Manager   98641       Tech    active
31  132    Fahad   44    Product Manager   98231       Tech  inactive
68  169  Murtaza   40    Product Manager   97210       Tech    active
83  184   Mishal   43    Product Manager   96511       Tech    active
10  111   Sumbal   29    Product Manager   96013  Analytics    active
26  127    Usman   43       Finance Lead   95320    Finance  inactive
58  159    Kinza   31    Product Manager   95128    Finance    active
29  130    Rabia   38            Manager   94730       Tech    active
90  191  Jaweria   30    Product Manager   94520    Finance    active
38  139     Saad   35            Manager   94129     Design    active
45  146   Shazia   39            Manager   93824    Finance    active
52  153    Haris   4

**Question 6:** List the employees who are currently inactive in the company.

In [None]:
print(df[df['status'] == 'inactive'])

     id     name  age               role  salary department    status
4   105   Muneeb   22        QA Engineer   36472     Design  inactive
6   107    Yasir   41           Designer   80476       Tech  inactive
11  112     Arif   35             Intern   27003       Tech  inactive
13  114     Noor   23           Designer   62118     Design  inactive
18  119     Amna   34         HR Officer   49725         HR  inactive
23  124    Javed   45       Data Analyst   72593  Analytics  inactive
26  127    Usman   43       Finance Lead   95320    Finance  inactive
31  132    Fahad   44    Product Manager   98231       Tech  inactive
35  136  Mahnoor   23             Intern   28431     Design  inactive
40  141   Danish   42    Product Manager   99111    Finance  inactive
43  144    Kiran   31        QA Engineer   46583     Design  inactive
49  150    Tariq   45       Data Analyst   75284  Analytics  inactive
53  154    Sadia   27         HR Officer   52109         HR  inactive
60  161     Sara   2
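
The first suspicion is that inactive employees are still getting salaries. The dataset records only one `salary` value per row, so the sketch below assumes that a nonzero salary on an inactive row is what "still getting paid" means. The sample frame is hand-built from rows shown in this notebook and is not the full dataset.

```python
import pandas as pd

# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Seema", "Imran", "Muneeb", "Yasir"],
    "salary": [59424, 85111, 36472, 80476],
    "status": ["active", "active", "inactive", "inactive"],
})

# Inactive rows that still carry a positive salary.
inactive_paid = sample_df[
    (sample_df["status"] == "inactive") & (sample_df["salary"] > 0)
]
inactive_total = inactive_paid["salary"].sum()

print(len(inactive_paid), "inactive rows with a salary")
print("Combined salary:", inactive_total)
```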

**Question 7:** Investigate the dataset to see which employees are younger than 25 years old.

In [None]:
print(df[df['age'] < 25])

     id     name  age               role  salary department    status
1   102    Imran   22  Software Engineer   85111     Design    active
4   105   Muneeb   22        QA Engineer   36472     Design  inactive
7   108    Rehan   23       Data Analyst   42298    Finance    active
13  114     Noor   23           Designer   62118     Design  inactive
16  117     Omar   24             Intern   29369       Tech    active
22  123    Laiba   22             Intern   29645  Analytics    active
28  129     Eman   21             Intern   26645         HR    active
35  136  Mahnoor   23             Intern   28431     Design  inactive
39  140   Shazia   21             Intern   27682         HR    active
51  152  Shireen   24             Intern   28745  Analytics    active
55  156    Mehak   22             Intern   26218     Design    active
60  161     Sara   23             Intern   29402       Tech  inactive
67  168     Hiba   21             Intern   27109         HR    active
73  174   Sameer   2
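
Many of the under-25 rows are interns. As a sketch of combining two conditions, the block below keeps only the young employees who are not interns. Each comparison needs its own parentheses because `&` binds more tightly than `<` or `!=` in Python. The sample frame is hand-built from rows in the output above.

```python
import pandas as pd

# Hand-built sample from the under-25 output shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Imran", "Muneeb", "Rehan", "Noor", "Omar", "Laiba"],
    "age": [22, 22, 23, 23, 24, 22],
    "role": ["Software Engineer", "QA Engineer", "Data Analyst",
             "Designer", "Intern", "Intern"],
})

# Parentheses around each comparison are required when combining with &.
young_non_interns = sample_df[
    (sample_df["age"] < 25) & (sample_df["role"] != "Intern")
]
print(young_non_interns["name"].tolist())
```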

**Question 8:** From all the departments, list the names of the employees who work in Tech.

In [None]:
print(df[df['department'] == 'Tech']['name'])

6       Yasir
11       Arif
16       Omar
17    Shahzad
20       Omar
24     Fatima
27     Waleed
29      Rabia
31      Fahad
36       Anam
42      Hamza
47        Ali
48     Sumbal
52      Haris
57      Sahil
60       Sara
63    Zeeshan
68    Murtaza
71    Shabana
75     Faizan
79    Areesha
80     Raheel
83     Mishal
89      Irfan
93    Tanveer
97      Farah
98      Zakir
Name: name, dtype: object
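
The filter above chains two indexing steps (`df[mask]['name']`). That works for reading, but a single `.loc[rows, column]` call does the same selection in one step and avoids ambiguity if the result is later assigned to. A sketch on a hand-built sample (rows taken from this notebook's outputs):

```python
import pandas as pd

# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Seema", "Hina", "Yasir", "Arif"],
    "department": ["Analytics", "HR", "Tech", "Tech"],
})

# Row mask and column label in a single .loc call.
tech_names = sample_df.loc[sample_df["department"] == "Tech", "name"]
print(tech_names.tolist())
```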


## Part 3: Loops & Aggregation

**Question 9:** Loop through the dataset and display the managers.

In [None]:
for i, row in df.iterrows():
  if row['role'] == 'Manager':
    print(i, row['name'])

0 Seema
12 Bushra
19 Nadia
29 Rabia
38 Saad
45 Shazia
52 Haris
61 Arsalan
69 Asiya
77 Bilquis
84 Tariq
91 Saif


In [None]:
print(df[df['role'] == 'Manager']['name'])

0       Seema
12     Bushra
19      Nadia
29      Rabia
38       Saad
45     Shazia
52      Haris
61    Arsalan
69      Asiya
77    Bilquis
84      Tariq
91       Saif
Name: name, dtype: object


**Question 10:** Count how many employees work in each department.

In [None]:
print(df['department'].value_counts())

department
Tech         27
Finance      22
Design       19
Analytics    18
HR           14
Name: count, dtype: int64
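
Headcount alone does not show whether a department "looks suspicious". One possible next step, sketched here on a hand-built sample of rows from this notebook, is to cross-tabulate department against status with `pd.crosstab` so that departments with many inactive rows stand out.

```python
import pandas as pd

# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Seema", "Imran", "Hina", "Mahnoor", "Muneeb", "Yasir"],
    "department": ["Analytics", "Design", "HR", "Finance", "Design", "Tech"],
    "status": ["active", "active", "active", "active", "inactive", "inactive"],
})

# Rows are departments, columns are statuses, cells are employee counts.
status_table = pd.crosstab(sample_df["department"], sample_df["status"])
print(status_table)
```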


**Question 11:** Calculate the average salary of all employees in the company.

In [None]:
print(df['salary'].mean())

67559.47
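
A single company-wide average hides the differences between roles, which matters for the "interns might be overpaid" suspicion. A sketch of a per-role average with `groupby`, on a hand-built sample of rows from this notebook:

```python
import pandas as pd

# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Arif", "Omar", "Seema", "Rabia"],
    "role": ["Intern", "Intern", "Manager", "Manager"],
    "salary": [27003, 29369, 59424, 94730],
})

# Mean salary per role; the result is a Series indexed by role.
avg_by_role = sample_df.groupby("role")["salary"].mean()
print(avg_by_role)
```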


**Question 12:** Detect the oldest employee in the company and display their details.

In [None]:
print(df.loc[df['age'].idxmax()])

id                     124
name                 Javed
age                     45
role          Data Analyst
salary               72593
department       Analytics
status            inactive
Name: 23, dtype: object
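
`idxmax()` returns only the first row that holds the maximum, so ties are silently dropped. The inactive-employee output earlier in this notebook also shows Tariq (id 150) at age 45, the same age as Javed. A sketch that returns every employee sharing the maximum age, on a hand-built sample:

```python
import pandas as pd

# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "id": [101, 124, 150],
    "name": ["Seema", "Javed", "Tariq"],
    "age": [30, 45, 45],
})

# Compare every age against the maximum to keep all tied rows.
oldest = sample_df[sample_df["age"] == sample_df["age"].max()]
print(oldest)

# idxmax reports only the first of the tied rows.
first_oldest_index = sample_df["age"].idxmax()
print(first_oldest_index)
```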


## Part 4: Functions

**Question 13:** Write a function `avg_salary(dept)` that returns the average salary for a given department.

In [None]:
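def avg_salary(dept):
  # Keep only the rows for the requested department.
  subset = df[df['department'] == dept]
  # Return 0 for an unknown department instead of NaN.
  return subset['salary'].mean() if len(subset) > 0 else 0

print(avg_salary("HR"))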

50330.92857142857
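
Returning `0` for an unknown department can be mistaken for a real average, and reading the global `df` makes the function harder to test. One alternative design, sketched with the hypothetical name `avg_salary_in`, takes the frame as a parameter and returns `None` when no rows match. The sample rows come from this notebook's outputs.

```python
import pandas as pd


def avg_salary_in(frame, dept):
    """Average salary for `dept` in `frame`, or None if no rows match."""
    salaries = frame.loc[frame["department"] == dept, "salary"]
    return None if salaries.empty else float(salaries.mean())


# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Hina", "Amna", "Seema"],
    "department": ["HR", "HR", "Analytics"],
    "salary": [81704, 49725, 59424],
})

hr_average = avg_salary_in(sample_df, "HR")
unknown_average = avg_salary_in(sample_df, "Legal")
print(hr_average, unknown_average)
```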


**Question 14:** Write a function `list_by_role(role)` that returns a list of employees for a given role.

In [None]:
def list_by_role(role):
  return df[df['role'] == role]['name'].tolist()
print(list_by_role("Software Engineer"))

['Imran', 'Kiran', 'Fatima', 'Waleed', 'Anam', 'Hamza', 'Sumbal', 'Sahil', 'Zeeshan', 'Shabana', 'Raheel', 'Irfan', 'Zakir']
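
The exact-match comparison returns an empty list for inputs such as `"software engineer"`. A sketch of a more forgiving variant, using the hypothetical name `list_by_role_ci`, normalizes case and surrounding whitespace on both sides before comparing. The sample is hand-built from names in this notebook.

```python
import pandas as pd


def list_by_role_ci(frame, role):
    """Names of employees whose role matches `role`, ignoring case and spaces."""
    wanted = role.strip().casefold()
    matches = frame["role"].str.strip().str.casefold() == wanted
    return frame.loc[matches, "name"].tolist()


# Hand-built sample from names and roles shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Seema", "Imran", "Kiran"],
    "role": ["Manager", "Software Engineer", "Software Engineer"],
})

engineers = list_by_role_ci(sample_df, "  software engineer ")
print(engineers)
```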


**Question 15:** Write a function `find_anomalies()` that identifies interns earning more than 30,000 (possible anomalies).

In [None]:
def find_anomalies():
  return df[(df['role'] == 'Intern') & (df['salary'] > 30000)]
print(find_anomalies())

Empty DataFrame
Columns: [id, name, age, role, salary, department, status]
Index: []
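
The empty result means no intern in the dataset earns more than 30,000, so the data does not support that suspicion at this threshold. To see how close the interns come to the limit, a sketch with the hypothetical name `find_overpaid` makes the role and limit parameters. The sample rows are intern and manager records shown earlier in this notebook.

```python
import pandas as pd


def find_overpaid(frame, role="Intern", limit=30000):
    """Rows in `frame` with the given role and a salary above `limit`."""
    return frame[(frame["role"] == role) & (frame["salary"] > limit)]


# Hand-built sample from rows shown in this notebook.
sample_df = pd.DataFrame({
    "name": ["Arif", "Omar", "Ali", "Seema"],
    "role": ["Intern", "Intern", "Intern", "Manager"],
    "salary": [27003, 29369, 29982, 59424],
})

at_default_limit = find_overpaid(sample_df)
at_lower_limit = find_overpaid(sample_df, limit=29500)
print(len(at_default_limit), at_lower_limit["name"].tolist())
```

Lowering the limit is only a way to probe the data; the 30,000 threshold itself comes from the question.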
