## Topic: Apply Function in DF

### OUTCOMES

- 1. Introduction to apply()

- 2. apply() with a lambda function

- 3. apply() with a custom function on a column

- 4. apply() on an entire DataFrame

### 1. Introduction to apply()

- The apply() function applies any function, including a custom (user-defined) one, along the rows or columns of a Pandas DataFrame, or to each element of a Series.

- In short:
    - apply() passes each input to a function and returns the function's output.

- Syntax:
    - df.apply(user_defined_function, axis = 0)

    - user_defined_function => custom function
    - axis = 0 => apply column-wise (the function receives each column); this is the default
    - axis = 1 => apply row-wise (the function receives each row)
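
To make the `axis` argument concrete, here is a minimal sketch on a small made-up frame (the `demo` DataFrame and its columns `a` and `b` are hypothetical and are not part of the student data used below). It sums once per column and once per row.

```python
import pandas as pd

# Small hypothetical frame, used only to illustrate the axis argument.
demo = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: the function receives each column as a Series.
col_sums = demo.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series.
row_sums = demo.apply(lambda row: row.sum(), axis=1)

print(col_sums.to_dict())  # {'a': 6, 'b': 60}
print(row_sums.tolist())   # [11, 22, 33]
```

The column-wise result is indexed by column name, while the row-wise result keeps the original row index.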

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv("student_data.csv")
df.head(4)

Unnamed: 0,StudentID,FullName,Data Structure Marks,Algorithm Marks,Python Marks,CompletionStatus,EnrollmentDate,Instructor,Location
0,PH1001,Alif Rahman,85.0,85.0,88.0,Completed,2024-01-15,Mr. Karim,Dhaka
1,PH1002,Fatima Akhter,92.0,92.0,,In Progress,2024-01-20,Ms. Salma,Chattogram
2,PH1003,Imran Hossain,88.0,88.0,85.0,Completed,2024-02-10,Mr. Karim,Dhaka
3,PH1004,Jannatul Ferdous,78.0,78.0,82.0,Completed,2024-02-12,Ms. Salma,Sylhet


### 2. apply() with a lambda function

In [None]:
# use apply() on the Python Marks column to add 2 marks to each value

add_2_python = df['Python Marks'].apply(lambda x: x + 2)

add_2_python

0     90.0
1      NaN
2     87.0
3     84.0
4     97.0
5     80.0
6      NaN
7     87.0
8     78.0
9     90.0
10    93.0
11    89.0
12    81.0
13     NaN
14    86.0
15     NaN
16    95.0
17    87.0
18     NaN
19    91.0
Name: Python Marks, dtype: float64
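
The `NaN` entries in the output above line up with students whose Python mark is missing: adding 2 to `NaN` still gives `NaN`. A minimal sketch of this behaviour, using the first three Python marks shown earlier (the variable names `marks` and `bumped` are only for illustration):

```python
import numpy as np
import pandas as pd

# First three Python marks from the table above; the second one is missing.
marks = pd.Series([88.0, np.nan, 85.0])

# The lambda is called for every element, including the missing one.
bumped = marks.apply(lambda x: x + 2)

print(bumped.isna().tolist())  # [False, True, False]
print(bumped[0], bumped[2])    # 90.0 87.0
```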

#### Apply min-max scaling to the Total Marks column

- Min-max scaling rescales the values into a fixed range, here 0 to 1.

- Formula for min-max scaling:

    - scaled = (x - min_val) / (max_val - min_val)
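
As a quick check of the formula, the sketch below scales three made-up totals (the values 50, 75 and 100 are hypothetical, chosen only to keep the arithmetic easy). The smallest value maps to 0, the largest to 1, and the midpoint to 0.5.

```python
import pandas as pd

# Hypothetical totals, chosen only to make the arithmetic easy to follow.
totals = pd.Series([50, 75, 100])

min_val = totals.min()  # 50
max_val = totals.max()  # 100

# Same formula as above, applied element by element.
scaled = totals.apply(lambda x: (x - min_val) / (max_val - min_val))

print(scaled.tolist())  # [0.0, 0.5, 1.0]
```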

In [10]:
# sum the three marks columns (positions 2 to 4); sum() skips NaN, so a missing mark adds nothing
df['Total Marks'] = df.iloc[:, 2:5].sum(axis = 1)

df.head(3)

Unnamed: 0,StudentID,FullName,Data Structure Marks,Algorithm Marks,Python Marks,CompletionStatus,EnrollmentDate,Instructor,Location,Total Marks
0,PH1001,Alif Rahman,85.0,85.0,88.0,Completed,2024-01-15,Mr. Karim,Dhaka,258.0
1,PH1002,Fatima Akhter,92.0,92.0,,In Progress,2024-01-20,Ms. Salma,Chattogram,184.0
2,PH1003,Imran Hossain,88.0,88.0,85.0,Completed,2024-02-10,Mr. Karim,Dhaka,261.0


In [None]:
# now apply min-max scaling

min_val = df['Total Marks'].min()
max_val = df['Total Marks'].max()


df['Scaled Marks'] = df['Total Marks'].apply(lambda x: (x-min_val)/(max_val - min_val))


df.head(3)


Unnamed: 0,StudentID,FullName,Data Structure Marks,Algorithm Marks,Python Marks,CompletionStatus,EnrollmentDate,Instructor,Location,Total Marks,Scaled Marks
0,PH1001,Alif Rahman,85.0,85.0,88.0,Completed,2024-01-15,Mr. Karim,Dhaka,258.0,0.945055
1,PH1002,Fatima Akhter,92.0,92.0,,In Progress,2024-01-20,Ms. Salma,Chattogram,184.0,0.673993
2,PH1003,Imran Hossain,88.0,88.0,85.0,Completed,2024-02-10,Mr. Karim,Dhaka,261.0,0.956044


### 3. apply() with a custom function on a column

In [None]:
# use apply() with a custom function on a column (Series)

def grading_system(mark):
    if mark >= 260:
        return 'A+'
    elif mark >= 250:
        return 'A'
    else:
        return 'A-'



df['Grade'] = df['Total Marks'].apply(grading_system)

df.head(3)

Unnamed: 0,StudentID,FullName,Data Structure Marks,Algorithm Marks,Python Marks,CompletionStatus,EnrollmentDate,Instructor,Location,Total Marks,Scaled Marks,Grade
0,PH1001,Alif Rahman,85.0,85.0,88.0,Completed,2024-01-15,Mr. Karim,Dhaka,258.0,0.945055,A
1,PH1002,Fatima Akhter,92.0,92.0,,In Progress,2024-01-20,Ms. Salma,Chattogram,184.0,0.673993,A-
2,PH1003,Imran Hossain,88.0,88.0,85.0,Completed,2024-02-10,Mr. Karim,Dhaka,261.0,0.956044,A+
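
To see exactly where the grade boundaries fall, the sketch below runs the same `grading_system` function on a few hypothetical totals placed on either side of each cutoff (the `sample_totals` values are made up for illustration).

```python
import pandas as pd

def grading_system(mark):
    # Same thresholds as the function defined above.
    if mark >= 260:
        return 'A+'
    elif mark >= 250:
        return 'A'
    else:
        return 'A-'

# Hypothetical totals around the 260 and 250 cutoffs.
sample_totals = pd.Series([261, 260, 259, 250, 249])

grades = sample_totals.apply(grading_system)

print(grades.tolist())  # ['A+', 'A+', 'A', 'A', 'A-']
```

Because the checks use `>=`, a total of exactly 260 or 250 is placed in the higher grade.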


### 4. apply() on an entire DataFrame

In [15]:
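# use apply() with a user-defined function on the entire df

def marking_system(row):
    # with axis=1, each call receives one row of df as a Series
    a = row['Data Structure Marks'] * 2

    b = row['Algorithm Marks'] * 3

    c = row['Python Marks'] * 4

    # weighted total; the result is NaN if any of the three marks is missing
    return a + b + c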



df['Exception Marks'] = df.apply(marking_system, axis=1)

# axis = 1 (row-wise)

df

Unnamed: 0,StudentID,FullName,Data Structure Marks,Algorithm Marks,Python Marks,CompletionStatus,EnrollmentDate,Instructor,Location,Total Marks,Scaled Marks,Grade,Exception Marks
0,PH1001,Alif Rahman,85.0,85.0,88.0,Completed,2024-01-15,Mr. Karim,Dhaka,258.0,0.945055,A,777.0
1,PH1002,Fatima Akhter,92.0,92.0,,In Progress,2024-01-20,Ms. Salma,Chattogram,184.0,0.673993,A-,
2,PH1003,Imran Hossain,88.0,88.0,85.0,Completed,2024-02-10,Mr. Karim,Dhaka,261.0,0.956044,A+,780.0
3,PH1004,Jannatul Ferdous,78.0,78.0,82.0,Completed,2024-02-12,Ms. Salma,Sylhet,238.0,0.871795,A-,718.0
4,PH1005,Kamal Uddin,,,95.0,In Progress,2024-03-05,Mr. Karim,Chattogram,95.0,0.347985,A-,
5,PH1006,Laila Begum,75.0,75.0,78.0,Completed,2024-03-08,Ms. Salma,Rajshahi,228.0,0.835165,A-,687.0
6,PH1007,Mahmudul Hasan,80.0,80.0,,In Progress,2024-04-01,Mr. Karim,Dhaka,160.0,0.586081,A-,
7,PH1008,Nadia Islam,81.0,81.0,85.0,Completed,2024-04-22,Ms. Salma,Chattogram,247.0,0.904762,A-,745.0
8,PH1009,Omar Faruq,72.0,72.0,76.0,Completed,2024-05-16,Mr. David,Dhaka,220.0,0.805861,A-,664.0
9,PH1010,Priya Sharma,89.0,89.0,88.0,Completed,2024-05-20,Ms. Salma,Sylhet,266.0,0.974359,A+,797.0
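
A minimal sketch of what the function receives when `axis=1` is used, built from the marks of two of the students above (the names `rows` and `weighted_marks` are only for illustration). Each call gets one row as a Series, so values are looked up by column name.

```python
import pandas as pd

# Marks of PH1001 and PH1003 from the table above.
rows = pd.DataFrame({
    "Data Structure Marks": [85.0, 88.0],
    "Algorithm Marks": [85.0, 88.0],
    "Python Marks": [88.0, 85.0],
})

def weighted_marks(row):
    # `row` is a Series indexed by column name.
    return (row["Data Structure Marks"] * 2
            + row["Algorithm Marks"] * 3
            + row["Python Marks"] * 4)

weighted = rows.apply(weighted_marks, axis=1)
received_types = rows.apply(lambda row: type(row).__name__, axis=1)

print(received_types.tolist())  # ['Series', 'Series']
print(weighted.tolist())        # [777.0, 780.0]
```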


### Mini Project: Employee Performance Data Analysis

In [25]:

employees = [
    {"ID": 101, "Name": "Alice Smith", "Dept": "IT", "Salary": 85000, "Experience": 5, "Location": "New York"},
    {"ID": 102, "Name": "Bob Jones", "Dept": "HR", "Salary": 54000, "Experience": 3, "Location": "California"},
    {"ID": 103, "Name": "Charlie Brown", "Dept": "IT", "Salary": 95000, "Experience": 8, "Location": "New York"},
    {"ID": 104, "Name": "David Clark", "Dept": "Finance", "Salary": 72000, "Experience": 6, "Location": "Texas"},
    {"ID": 105, "Name": "Emma Wilson", "Dept": "IT", "Salary": 99000, "Experience": 9, "Location": "California"},
    {"ID": 106, "Name": "Frank Taylor", "Dept": "HR", "Salary": 58000, "Experience": 4, "Location": "Texas"},
    {"ID": 107, "Name": "Grace Lee", "Dept": "Finance", "Salary": 70000, "Experience": 5, "Location": "New York"},
    {"ID": 108, "Name": "Hannah Scott", "Dept": "IT", "Salary": 102000, "Experience": 10, "Location": "California"},
    {"ID": 109, "Name": "Ian White", "Dept": "HR", "Salary": 56000, "Experience": 2, "Location": "Texas"},
    {"ID": 110, "Name": "Julia Adams", "Dept": "Finance", "Salary": 76000, "Experience": 7, "Location": "New York"}
]

df = pd.DataFrame(employees)
df


Unnamed: 0,ID,Name,Dept,Salary,Experience,Location
0,101,Alice Smith,IT,85000,5,New York
1,102,Bob Jones,HR,54000,3,California
2,103,Charlie Brown,IT,95000,8,New York
3,104,David Clark,Finance,72000,6,Texas
4,105,Emma Wilson,IT,99000,9,California
5,106,Frank Taylor,HR,58000,4,Texas
6,107,Grace Lee,Finance,70000,5,New York
7,108,Hannah Scott,IT,102000,10,California
8,109,Ian White,HR,56000,2,Texas
9,110,Julia Adams,Finance,76000,7,New York


#### Task 1 — Increase salary by 10% (0.1) for all employees

In [26]:
df['Increment_Salary'] = df['Salary'].apply(lambda x: x*0.1)

df.head()

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary
0,101,Alice Smith,IT,85000,5,New York,8500.0
1,102,Bob Jones,HR,54000,3,California,5400.0
2,103,Charlie Brown,IT,95000,8,New York,9500.0
3,104,David Clark,Finance,72000,6,Texas,7200.0
4,105,Emma Wilson,IT,99000,9,California,9900.0


In [27]:
increase_salary = df['Salary'].apply(lambda x: x*0.1)

df['New_Salary'] = df['Salary'] + increase_salary

In [28]:
df.head(3)

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0


#### Task 2 - Categorize employees by experience

In [30]:
def experience_level(year):
    if year >= 8:
        return "Senior"
    elif year >= 5:
        return "Mid-Level"
    else:
        return "Junior"


df['Level'] = df['Experience'].apply(experience_level)

df.head(3)

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior


#### Task 3 - Calculate yearly tax (assume 15% or 0.15) 

In [32]:
df['Yearly_Tax'] = df['Salary'].apply(lambda x: x*0.15)

df.head()

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level,Yearly_Tax
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level,12750.0
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior,8100.0
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior,14250.0
3,104,David Clark,Finance,72000,6,Texas,7200.0,79200.0,Mid-Level,10800.0
4,105,Emma Wilson,IT,99000,9,California,9900.0,108900.0,Senior,14850.0


<!-- #### Task 4 — Create a "Full_Info" column (combine name and dept) -->

#### Task 5 — Flag employees with salary > 90K as "High Earner"

In [None]:
def is_high_earner(salary):
    # the comparison already evaluates to True or False
    return salary > 90000

df['High Earner'] = df['Salary'].apply(is_high_earner)
df.head()

# equivalent one-liner:
# df['High Earner'] = df['Salary'].apply(lambda x: x > 90000)

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level,Yearly_Tax,High Earner
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level,12750.0,False
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior,8100.0,False
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior,14250.0,True
3,104,David Clark,Finance,72000,6,Texas,7200.0,79200.0,Mid-Level,10800.0,False
4,105,Emma Wilson,IT,99000,9,California,9900.0,108900.0,Senior,14850.0,True


#### Task 6 — Create a new column "Bonus"

- If experience > 7 -> bonus = 5000

- Else -> bonus = 2000

In [41]:
def bonus(row):
    # row is one employee (a Series) because apply() is called with axis = 1
    if row['Experience'] > 7:
        return 5000
    else:
        return 2000



df['Bonus'] = df.apply(bonus,axis = 1)
df.head()

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level,Yearly_Tax,High Earner,Bonus
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level,12750.0,False,2000
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior,8100.0,False,2000
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior,14250.0,True,5000
3,104,David Clark,Finance,72000,6,Texas,7200.0,79200.0,Mid-Level,10800.0,False,2000
4,105,Emma Wilson,IT,99000,9,California,9900.0,108900.0,Senior,14850.0,True,5000
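
Since the bonus rule depends on a single column, it can also be written as a column-level apply() instead of a row-wise one. The sketch below is one possible alternative, not the only way to do it; it uses the Experience values of the five employees shown above, and the name `bonus_from_column` is only for illustration.

```python
import pandas as pd

# Experience values of the first five employees shown above.
experience = pd.Series([5, 3, 8, 6, 9])

# Only one column is needed, so the function takes a single value
# instead of a whole row.
bonus_from_column = experience.apply(lambda years: 5000 if years > 7 else 2000)

print(bonus_from_column.tolist())  # [2000, 2000, 5000, 2000, 5000]
```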


#### Task 7 — Calculate total compensation (salary + bonus – tax)

In [43]:
def calculate_total(row):
    # total compensation = salary + bonus - tax, computed for one employee (row)
    add = row['Salary'] + row['Bonus']

    cal_total = add - row['Yearly_Tax']

    return cal_total


df['Calculate_Total'] = df.apply(calculate_total,axis = 1)

df.head(5)

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level,Yearly_Tax,High Earner,Bonus,Calculate_Total
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level,12750.0,False,2000,74250.0
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior,8100.0,False,2000,47900.0
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior,14250.0,True,5000,85750.0
3,104,David Clark,Finance,72000,6,Texas,7200.0,79200.0,Mid-Level,10800.0,False,2000,63200.0
4,105,Emma Wilson,IT,99000,9,California,9900.0,108900.0,Senior,14850.0,True,5000,89150.0


#### Task 8 — Normalize Salary (scale between 0 and 1)

In [50]:
# min max scale formula 
#   scale = ( x - min_val) / (max_val - min_val)

min_val = df['Salary'].min()
max_val = df['Salary'].max()


df['Scale'] = df['Salary'].apply(lambda x: (x - min_val)/(max_val - min_val))

df.head()


Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level,Yearly_Tax,High Earner,Bonus,Calculate_Total,Scale
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level,12750.0,False,2000,74250.0,0.645833
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior,8100.0,False,2000,47900.0,0.0
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior,14250.0,True,5000,85750.0,0.854167
3,104,David Clark,Finance,72000,6,Texas,7200.0,79200.0,Mid-Level,10800.0,False,2000,63200.0,0.375
4,105,Emma Wilson,IT,99000,9,California,9900.0,108900.0,Senior,14850.0,True,5000,89150.0,0.9375
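
For simple arithmetic like this, pandas can also apply the formula to the whole column at once, without apply(). The sketch below compares both approaches on the ten salaries from the employee list (the names `via_apply` and `vectorized` are only for illustration).

```python
import pandas as pd

# Salaries of the ten employees defined at the start of the mini project.
salary = pd.Series([85000, 54000, 95000, 72000, 99000,
                    58000, 70000, 102000, 56000, 76000])

min_val = salary.min()  # 54000
max_val = salary.max()  # 102000

# Element by element, as in the cell above.
via_apply = salary.apply(lambda x: (x - min_val) / (max_val - min_val))

# Whole-column arithmetic: the same formula without apply().
vectorized = (salary - min_val) / (max_val - min_val)

print(bool((via_apply - vectorized).abs().max() < 1e-12))  # True
print(round(float(vectorized[0]), 6))                      # 0.645833
```

Whole-column arithmetic is usually faster on large data, while apply() stays useful when the logic does not reduce to plain arithmetic.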


#### Task 9 — Create a performance label

- If salary > 90000 and experience > 7 → "Excellent"

- Else if salary > 70000 → "Good"

- Else → "Average"

In [51]:
def label(row):
    # row holds single values (not columns), so plain `and` is the right operator
    if row['Salary'] > 90000 and row['Experience'] > 7:
        return 'Excellent'

    elif row['Salary'] > 70000:
        return 'Good'

    else:
        return 'Average'


df["performance"] = df.apply(label, axis = 1)

df

Unnamed: 0,ID,Name,Dept,Salary,Experience,Location,Increment_Salary,New_Salary,Level,Yearly_Tax,High Earner,Bonus,Calculate_Total,Scale,performance
0,101,Alice Smith,IT,85000,5,New York,8500.0,93500.0,Mid-Level,12750.0,False,2000,74250.0,0.645833,Good
1,102,Bob Jones,HR,54000,3,California,5400.0,59400.0,Junior,8100.0,False,2000,47900.0,0.0,Average
2,103,Charlie Brown,IT,95000,8,New York,9500.0,104500.0,Senior,14250.0,True,5000,85750.0,0.854167,Excellent
3,104,David Clark,Finance,72000,6,Texas,7200.0,79200.0,Mid-Level,10800.0,False,2000,63200.0,0.375,Good
4,105,Emma Wilson,IT,99000,9,California,9900.0,108900.0,Senior,14850.0,True,5000,89150.0,0.9375,Excellent
5,106,Frank Taylor,HR,58000,4,Texas,5800.0,63800.0,Junior,8700.0,False,2000,51300.0,0.083333,Average
6,107,Grace Lee,Finance,70000,5,New York,7000.0,77000.0,Mid-Level,10500.0,False,2000,61500.0,0.333333,Average
7,108,Hannah Scott,IT,102000,10,California,10200.0,112200.0,Senior,15300.0,True,5000,91700.0,1.0,Excellent
8,109,Ian White,HR,56000,2,Texas,5600.0,61600.0,Junior,8400.0,False,2000,49600.0,0.041667,Average
9,110,Julia Adams,Finance,76000,7,New York,7600.0,83600.0,Mid-Level,11400.0,False,2000,66600.0,0.458333,Good
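
The same three-way rule can also be expressed on whole columns. The sketch below is one possible alternative using NumPy's `np.select`, which is not used elsewhere in this notebook; the frame `emp` is a hypothetical copy holding only the two columns the rule needs.

```python
import numpy as np
import pandas as pd

# Salary and Experience of the ten employees from the mini project.
emp = pd.DataFrame({
    "Salary": [85000, 54000, 95000, 72000, 99000,
               58000, 70000, 102000, 56000, 76000],
    "Experience": [5, 3, 8, 6, 9, 4, 5, 10, 2, 7],
})

# Conditions are checked in order; here `&` is correct because
# whole columns (boolean Series) are being combined.
conditions = [
    (emp["Salary"] > 90000) & (emp["Experience"] > 7),
    emp["Salary"] > 70000,
]
choices = ["Excellent", "Good"]

# Rows matching no condition get the default label.
emp["performance"] = np.select(conditions, choices, default="Average")

print(emp["performance"].tolist())
```

The labels produced this way match the `performance` column in the output above.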
