# Pandas Applying Functions

In [6]:
# Importing Libraries
import pandas as pd
from datasets import load_dataset
import matplotlib.pyplot as plt  

# Loading Data
dataset = load_dataset('lukebarousse/data_jobs')
df = dataset['train'].to_pandas()

# Data Cleanup
df['job_posted_date'] = pd.to_datetime(df['job_posted_date'])


## Apply

In [7]:
help(df.apply)

Help on method apply in module pandas.core.frame:

apply(func: 'AggFuncType', axis: 'Axis' = 0, raw: 'bool' = False, result_type: "Literal['expand', 'reduce', 'broadcast'] | None" = None, args=(), by_row: "Literal[False, 'compat']" = 'compat', engine: "Literal['python', 'numba']" = 'python', engine_kwargs: 'dict[str, bool] | None' = None, **kwargs) method of pandas.core.frame.DataFrame instance
    Apply a function along an axis of the DataFrame.
    
    Objects passed to the function are Series objects whose index is
    either the DataFrame's index (``axis=0``) or the DataFrame's columns
    (``axis=1``). By default (``result_type=None``), the final return type
    is inferred from the return type of the applied function. Otherwise,
    it depends on the `result_type` argument.
    
    Parameters
    ----------
    func : function
        Function to apply to each column or row.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis along which the function is applied:
 

### Notes

* `apply()`: Applies a function along an axis of a DataFrame (to each column or each row), or to each value of a Series.
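
As a minimal sketch of the note above, the snippet below shows `apply()` on columns, on rows, and on a single Series. The `toy` DataFrame and its column names are hypothetical and are not part of the job postings dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the job postings dataset
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series
col_ranges = toy.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series
row_sums = toy.apply(lambda row: row["a"] + row["b"], axis=1)

# On a single column (a Series), apply() works value by value
doubled = toy["a"].apply(lambda x: x * 2)

print(col_ranges)
print(row_sums)
print(doubled)
```

With `axis=0` the result has one entry per column; with `axis=1` it has one entry per row.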

### Example 1

Calculate projected salaries next year, using an assumed rate of 3.0% for all roles.

In [8]:
def inflation(salary):
    return salary * 1.03

df["salary_year_inflated"] = df["salary_year_avg"].apply(inflation)
df[pd.notna(df["salary_year_avg"])][["salary_year_avg", "salary_year_inflated"]]


Unnamed: 0,salary_year_avg,salary_year_inflated
28,109500.0,112785.00
77,140000.0,144200.00
92,120000.0,123600.00
100,228222.0,235068.66
109,89000.0,91670.00
...,...,...
785624,139216.0,143392.48
785641,150000.0,154500.00
785648,221875.0,228531.25
785682,157500.0,162225.00


We can actually simplify this with a lambda function.

In [9]:
df["salary_year_inflated"] = df["salary_year_avg"].apply(lambda salary : salary * 1.03)

df[pd.notna(df["salary_year_avg"])][["salary_year_avg", "salary_year_inflated"]]

Unnamed: 0,salary_year_avg,salary_year_inflated
28,109500.0,112785.00
77,140000.0,144200.00
92,120000.0,123600.00
100,228222.0,235068.66
109,89000.0,91670.00
...,...,...
785624,139216.0,143392.48
785641,150000.0,154500.00
785648,221875.0,228531.25
785682,157500.0,162225.00


Technically, this doesn't need `apply()` at all: multiplying the column directly is vectorized and gives the same result.

In [10]:
df["salary_year_inflated"] = df["salary_year_avg"]*1.03
df[pd.notna(df["salary_year_avg"])][["salary_year_avg","salary_year_inflated"]]

Unnamed: 0,salary_year_avg,salary_year_inflated
28,109500.0,112785.00
77,140000.0,144200.00
92,120000.0,123600.00
100,228222.0,235068.66
109,89000.0,91670.00
...,...,...
785624,139216.0,143392.48
785641,150000.0,154500.00
785648,221875.0,228531.25
785682,157500.0,162225.00
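
To check that the three approaches in Example 1 agree, here is a small sketch on a hypothetical Series (the `salaries` values are made up for illustration, with one missing entry included to show that NaN passes through unchanged).

```python
import pandas as pd

# Hypothetical salaries, including one missing value
salaries = pd.Series([109500.0, None, 140000.0])

def inflation(salary):
    return salary * 1.03

via_function = salaries.apply(inflation)                   # named function
via_lambda = salaries.apply(lambda salary: salary * 1.03)  # lambda
via_vector = salaries * 1.03                               # vectorized arithmetic

# Series.equals treats NaN values in the same position as equal
print(via_function.equals(via_lambda))
print(via_lambda.equals(via_vector))
```

The vectorized form is usually the better choice for simple arithmetic, since it avoids calling a Python function once per value.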


### Example 2

Calculate projected salaries next year, but:
- For senior roles (e.g., Senior Data Analysts), assume the rate is 5%
- For all other roles, assume the rate is 3%

In [11]:
def projected_salary(row):
    if "Senior" in row["job_title_short"]:
        return 1.05 * row["salary_year_avg"]
    else:
        return 1.03 * row["salary_year_avg"]

df["salary_year_inflated"] = df.apply(projected_salary, axis=1)
df[pd.notna(df["salary_year_avg"])][["job_title_short","salary_year_avg", "salary_year_inflated"]]

Unnamed: 0,job_title_short,salary_year_avg,salary_year_inflated
28,Data Scientist,109500.0,112785.00
77,Data Engineer,140000.0,144200.00
92,Data Engineer,120000.0,123600.00
100,Data Scientist,228222.0,235068.66
109,Data Analyst,89000.0,91670.00
...,...,...,...
785624,Data Engineer,139216.0,143392.48
785641,Data Engineer,150000.0,154500.00
785648,Data Scientist,221875.0,228531.25
785682,Data Scientist,157500.0,162225.00


Technically you could write this with a lambda function:

In [12]:
df["salary_year_inflated"] = df.apply(lambda row:1.05 * row["salary_year_avg"] if "Senior" in row["job_title_short"] else 1.03 * row["salary_year_avg"], axis=1)
df[pd.notna(df["salary_year_avg"])][["job_title_short","salary_year_avg","salary_year_inflated"]]

Unnamed: 0,job_title_short,salary_year_avg,salary_year_inflated
28,Data Scientist,109500.0,112785.00
77,Data Engineer,140000.0,144200.00
92,Data Engineer,120000.0,123600.00
100,Data Scientist,228222.0,235068.66
109,Data Analyst,89000.0,91670.00
...,...,...,...
785624,Data Engineer,139216.0,143392.48
785641,Data Engineer,150000.0,154500.00
785648,Data Scientist,221875.0,228531.25
785682,Data Scientist,157500.0,162225.00
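
Row-wise `apply()` calls a Python function once per row, which can be slow on a DataFrame of this size. A possible vectorized alternative, sketched here on a hypothetical `sample` DataFrame that mirrors the two columns used above, combines `str.contains` with `numpy.where`. This is an assumption about how one might speed things up, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the columns used in Example 2
sample = pd.DataFrame({
    "job_title_short": ["Senior Data Analyst", "Data Engineer",
                        "Senior Data Scientist", "Data Analyst"],
    "salary_year_avg": [100000.0, 140000.0, None, 89000.0],
})

# Boolean mask: True where the title contains the literal text "Senior"
is_senior = sample["job_title_short"].str.contains("Senior", regex=False)

# Pick 5% for senior roles and 3% for everything else
rate = np.where(is_senior, 1.05, 1.03)

# Missing salaries stay NaN after the multiplication
sample["salary_year_inflated"] = sample["salary_year_avg"] * rate
print(sample)
```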


### Example 3

Convert the `job_skills` column from strings that look like lists into actual list objects (*hint*: this is very important for later). Let's try doing that with `ast.literal_eval` and then look at our new column.

A reminder of what our `job_skills` column looks like now:

In [13]:
df["job_skills"]

0                                                      None
1         ['r', 'python', 'sql', 'nosql', 'power bi', 't...
2         ['python', 'sql', 'c#', 'azure', 'airflow', 'd...
3         ['python', 'c++', 'java', 'matlab', 'aws', 'te...
4         ['bash', 'python', 'oracle', 'aws', 'ansible',...
                                ...                        
785736    ['bash', 'python', 'perl', 'linux', 'unix', 'k...
785737                       ['sas', 'sas', 'sql', 'excel']
785738                              ['powerpoint', 'excel']
785739    ['python', 'go', 'nosql', 'sql', 'mongo', 'she...
785740                                      ['aws', 'flow']
Name: job_skills, Length: 785741, dtype: object

In [14]:
df['job_skills'][0]

In [15]:
type(df['job_skills'][0])

NoneType

Let's look at the `literal_eval()` function from the Python Standard Library `ast` module.
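
As a quick sketch of what `ast.literal_eval()` does, the snippet below parses a hypothetical string that looks like a list (the `raw` value is made up for illustration) and shows that passing `None` raises a `ValueError`.

```python
import ast

raw = "['python', 'sql', 'excel']"  # a string that only looks like a list
parsed = ast.literal_eval(raw)      # now a real Python list

print(type(raw).__name__, type(parsed).__name__)
print(parsed[0])

# literal_eval expects a string (or an AST node), so None raises ValueError
try:
    ast.literal_eval(None)
except ValueError as err:
    print("ValueError:", err)
```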

In [17]:
import ast
import pandas as pd

def parse_skills(val):
    if pd.isna(val):
        return []  # Missing value: return an empty list
    try:
        return ast.literal_eval(val)  # Well-formed string: convert to a list
    except (ValueError, SyntaxError, TypeError):
        return []  # Malformed value: return an empty list

df["parsed_skills"] = df["job_skills"].apply(parse_skills)


In [18]:
df["parsed_skills"].head()


0                                                   []
1           [r, python, sql, nosql, power bi, tableau]
2    [python, sql, c#, azure, airflow, dax, docker,...
3    [python, c++, java, matlab, aws, tensorflow, k...
4    [bash, python, oracle, aws, ansible, puppet, j...
Name: parsed_skills, dtype: object

In [16]:
import ast

ast.literal_eval(df['job_skills'][0])

ValueError: malformed node or string: None

In [None]:
type(ast.literal_eval(df['job_skills'][0]))

ValueError: malformed node or string: None

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

Error: This will return an error because the `job_skills` column contains missing values (`None`/NaN), which `ast.literal_eval()` cannot parse.

Steps to Debug:

1. Look at the actual error. Can you tell what the problem is?
2. If not, then look it up:
   1. Use a chatbot like ChatGPT or Claude
   2. Look it up using Google

In [None]:
# Returns an error 

# Convert string representation to actual list
df['job_skills'] = df['job_skills'].apply(ast.literal_eval)

df.head()

ValueError: malformed node or string: None

Since we have NaN values, let's adjust our code to check whether each value is not NaN before converting it:
* If the value is not NaN, `pd.notna()` returns `True` and `ast.literal_eval()` is applied to it.
* If the value is NaN, `pd.notna()` returns `False` and the value is left unchanged.

In [None]:
import ast

# Convert string representation to actual list, checking for NaN values first
df['job_skills'] = df['job_skills'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)

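ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Note: this error most likely comes from running the cell a second time. Once `job_skills` holds lists rather than strings, `pd.notna(x)` returns an array of booleans, which cannot be used as an `if` condition. The column shown next was converted successfully.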

In [None]:
df["job_skills"]

0                                                      None
1                [r, python, sql, nosql, power bi, tableau]
2         [python, sql, c#, azure, airflow, dax, docker,...
3         [python, c++, java, matlab, aws, tensorflow, k...
4         [bash, python, oracle, aws, ansible, puppet, j...
                                ...                        
785736    [bash, python, perl, linux, unix, kubernetes, ...
785737                               [sas, sas, sql, excel]
785738                                  [powerpoint, excel]
785739    [python, go, nosql, sql, mongo, shell, mysql, ...
785740                                          [aws, flow]
Name: job_skills, Length: 785741, dtype: object
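
One way to make this conversion safe to run more than once is to convert only the values that are still strings. The sketch below assumes that approach; the helper name `to_skill_list` and the small `skills` Series are hypothetical.

```python
import ast
import pandas as pd

def to_skill_list(val):
    """Parse list-like strings; pass through missing values and parsed lists."""
    if isinstance(val, str):
        return ast.literal_eval(val)
    return val  # None, NaN, or an already-parsed list

# Hypothetical stand-in for the job_skills column
skills = pd.Series([None, "['r', 'python']", "['aws', 'flow']"], dtype=object)

once = skills.apply(to_skill_list)
twice = once.apply(to_skill_list)  # running the conversion again is harmless

print(once)
print(twice)
```

Checking `isinstance(val, str)` avoids calling `pd.notna()` on a list, which returns an array rather than a single boolean.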

Pandas' `apply()` function is a powerful tool for applying a function along one or more axes of a DataFrame. When applied to a single column, `apply()` iterates over each element of the column and applies the given function.

### ✅ 1. The Basics: What Is "Applying Functions"?

In Pandas, if you want to process data cell by cell, row by row, or column by column, you use functions such as `apply()`, `map()`, and `applymap()`.

### 🧠 Functions Used:

| Function | Where Is It Used? | What Does It Do? |
|-------------|---------------------|---------------------------------------|
| `map()` | Series only | Applies a function to the values of a single column |
| `apply()` | Both Series and DataFrame | Applies a function to rows or columns |
| `applymap()` | DataFrame only | Applies a function to every cell, one by one (deprecated since pandas 2.1 in favor of `DataFrame.map()`) |
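
The three functions in the table can be compared side by side in a short sketch. The `pay` DataFrame below is a hypothetical example; the fallback to `applymap()` is an assumption made so the snippet also runs on pandas versions older than 2.1, where `DataFrame.map()` does not exist.

```python
import pandas as pd

# Hypothetical example data
pay = pd.DataFrame({"Salary": [1000, 1500, 2000], "Bonus": [200, 300, 400]})

# Series.map(): value by value on a single column
salary_k = pay["Salary"].map(lambda x: x / 1000)

# DataFrame.apply() with axis=1: one function call per row
total = pay.apply(lambda row: row["Salary"] + row["Bonus"], axis=1)

# Every cell: DataFrame.map() in pandas 2.1+, applymap() in older versions
elementwise = pay.map if hasattr(pay, "map") else pay.applymap
plus_100 = elementwise(lambda x: x + 100)

print(salary_k)
print(total)
print(plus_100)
```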


In [None]:
import pandas as pd

data = {
    'Salary': [1000, 1500, 2000],
    'Bonus': [200, 300, 400]
}

df = pd.DataFrame(data)
print(df)


   Salary  Bonus
0    1000    200
1    1500    300
2    2000    400


In [None]:
df["Bonus"] = df["Bonus"].apply(lambda x : x + 50)
df

Unnamed: 0,Salary,Bonus
0,1000,250
1,1500,350
2,2000,450
