# Applying Functions in Pandas
This notebook demonstrates how to use Python functions with Pandas, including the `apply` method, to manipulate and transform data.

## Import Required Libraries
Import pandas, the only library this notebook uses.

```python
import pandas as pd
```

## Load Data
Load the dataset containing client information.

```python
df = pd.read_csv("../data/clients.csv")
df
```

|  | idCliente | flEmail | flTwitch | flYouTube | flBlueSky | flInstagram | qtdePontos | dtCriacao | dtAtualizacao |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 000ff655-fa9f-4baa-a108-47f581ec52a1 | 0 | 0 | 0 | 0 | 0 | 686 |  |  |
| 1 | 001749bd-37b5-4b1e-8111-f9fbba90f530 | 0 | 0 | 0 | 0 | 0 | 50 |  |  |
| 2 | 0019bb9e-26d4-4ebf-8727-fc911ea28a92 | 0 | 0 | 0 | 0 | 0 | 2 |  |  |
| 3 | 0033b737-8235-4c0f-9801-dc4ca185af00 | 0 | 1 | 0 | 0 | 0 | 1090 | 0000-00-00 00:00:00.000 | 2025-02-19 12:48:24.632 |
| 4 | 00684343-40b5-4ce7-b2e8-71a5340973bf | 0 | 0 | 0 | 0 | 0 | 0 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2431 | fecbdf63-3bf4-44e5-8b1a-0acc9d963603 | 0 | 0 | 0 | 0 | 0 | 111 |  |  |
| 2432 | ff07d926-f09e-420b-bebf-3dba02ae5dff | 0 | 1 | 0 | 0 | 0 | 54 | 2025-01-21 11:49:58.172 | 2025-02-28 11:55:04.999 |
| 2433 | ff1ceaef-650c-422b-bdc3-6984e29e7aa5 | 0 | 0 | 0 | 0 | 0 | 162 |  |  |
| 2434 | ff2cabd3-3316-4b3f-8494-c25f95e90524 | 0 | 1 | 0 | 0 | 0 | 57 | 2025-02-10 11:12:30.631 | 2025-02-10 12:37:47.892 |


## Define a Function
Define a function to extract the last part of a hyphen-separated string.

```python
def get_last_id(client_id):
    """
    Extracts the last part of a hyphen-separated string.
    Args:
        client_id (str): A string containing hyphen-separated values.
    Returns:
        str: The part of the input string after the last hyphen.
    """
    # Parameter renamed from `id` so it does not shadow Python's built-in id()
    return client_id.split("-")[-1]
```

## Apply Function with a Loop
Call the function on each value of the `idCliente` column in a `for` loop, collecting the results in the list `new_id`.

```python
# Collect the shortened IDs one value at a time
new_id = []
for i in df["idCliente"]:
    new = get_last_id(i)
    new_id.append(new)
```
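
The same loop can be written more compactly as a list comprehension, and because the list has one entry per row, in order, it can be stored as a new column. A minimal sketch on a three-row sample built from `idCliente` values shown in the table above; the column name `idShort` is hypothetical, not part of the original notebook.

```python
import pandas as pd

def get_last_id(client_id: str) -> str:
    """Return the part of client_id after the last hyphen."""
    return client_id.split("-")[-1]

# Small sample built from idCliente values shown in the df output above
df = pd.DataFrame({"idCliente": [
    "000ff655-fa9f-4baa-a108-47f581ec52a1",
    "001749bd-37b5-4b1e-8111-f9fbba90f530",
    "0019bb9e-26d4-4ebf-8727-fc911ea28a92",
]})

# Same result as the for-loop, expressed as a list comprehension
new_id = [get_last_id(i) for i in df["idCliente"]]

# Hypothetical column name; list items line up with rows by position
df["idShort"] = new_id
print(df)
```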

## Apply Function Using Pandas
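`Series.apply` calls `get_last_id` once for each value in `idCliente` and returns a new Series with the results. The original column is not modified, and the result below is displayed but not stored.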

```python
df["idCliente"].apply(get_last_id)
```

```text
0       47f581ec52a1
1       f9fbba90f530
2       fc911ea28a92
3       dc4ca185af00
4       71a5340973bf
            ...     
2431    0acc9d963603
2432    3dba02ae5dff
2433    6984e29e7aa5
2434    c25f95e90524
2435    0fbd76e37857
Name: idCliente, Length: 2436, dtype: object
```
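
`apply` returns a new Series and leaves `df` unchanged, so the result has to be assigned to keep it. For a simple string operation like this one, pandas' vectorized `.str` accessor gives the same values without calling a Python function per element. A sketch on a three-row sample of the IDs above; `idShort` is a hypothetical column name.

```python
import pandas as pd

# Sample of idCliente values taken from the output above
df = pd.DataFrame({"idCliente": [
    "000ff655-fa9f-4baa-a108-47f581ec52a1",
    "0033b737-8235-4c0f-9801-dc4ca185af00",
    "ff2cabd3-3316-4b3f-8494-c25f95e90524",
]})

# Row-by-row: one Python function call per value
via_apply = df["idCliente"].apply(lambda s: s.split("-")[-1])

# Vectorized: split every string on "-" and take the last piece
via_str = df["idCliente"].str.split("-").str[-1]

# Assign to keep the result (apply does not modify df in place)
df["idShort"] = via_str
print(df["idShort"].tolist())  # ['47f581ec52a1', 'dc4ca185af00', 'c25f95e90524']
```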

## Load Another Dataset
Load a semicolon-separated dataset with information about Brazil's federative units (the 26 states and the Federal District).

```python
df_uf = pd.read_csv("../data/ufs_brazil.csv", sep=";")
df_uf.head(1)
```

|  | Bandeira | Unidade federativa | Abreviação | Sede de governo | Área (km²) | População (Censo 2022) | Densidade (2005) | PIB (2015) | (% total) (2015) | PIB per capita (R$) (2015) | IDH (2010) | Alfabetização (2016) | Mortalidade infantil (2016) | Expectativa de vida (2016) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Acre | AC | Rio Branco | 164 122,2 | 830 018 | 430 | 13 622 000 | 2 | 16 953,46 | 663 | 86,9% | 17,0‰ | 73,9 anos |
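
The `sep=";"` argument matters because the numeric fields use commas as decimal separators (`164 122,2`), so a comma-delimited parse would likely split them apart. The sketch below reads an illustrative two-line sample in the same style, with values copied from the Acre row above, and shows that such fields arrive as plain strings rather than numbers.

```python
import io

import pandas as pd

# Illustrative sample in the same format as ufs_brazil.csv (values from the Acre row above)
raw = (
    "Abreviação;Área (km²);Mortalidade infantil (2016);Expectativa de vida (2016)\n"
    "AC;164 122,2;17,0‰;73,9 anos\n"
)

df_sample = pd.read_csv(io.StringIO(raw), sep=";")

# Comma decimals, space thousands separators and unit suffixes keep these as strings
print(df_sample.loc[0, "Área (km²)"])        # 164 122,2
print(type(df_sample.loc[0, "Área (km²)"]))  # <class 'str'>
```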


## Convert Strings to Floats
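The numeric columns are stored as text in Brazilian format: spaces (possibly non-breaking) separate thousands, a comma marks the decimal point, and some values carry the suffix `anos` (years). Define a function that removes this formatting and converts the result to a float.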

```python
def str_to_float(number: str) -> float:
    """
    Converts a formatted string representing a number into a float.
    Args:
        number (str): The input string containing a number with formatting,
                      such as spaces, non-breaking spaces, a decimal comma,
                      or the suffix "anos" (years).
    Returns:
        float: The numeric value as a float after removing formatting.
    """
    number = (number.replace(" ", "")      # thousands separator
                    .replace(",", ".")     # decimal comma -> decimal point
                    .replace("\xa0", "")   # non-breaking space
                    .replace("anos", ""))  # "years" suffix
    return float(number)
```
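
A quick check on values from the Acre row above: `str_to_float` removes the thousands separators, turns the decimal comma into a point and drops the `anos` suffix, but it leaves the `‰` sign in place, so the mortality values still raise `ValueError`. The function is repeated so the block runs on its own; the non-breaking-space input is an illustrative assumption about how some values may be stored.

```python
def str_to_float(number: str) -> float:
    """Copy of the notebook's str_to_float, repeated so this block runs on its own."""
    number = (number.replace(" ", "")
                    .replace(",", ".")
                    .replace("\xa0", "")
                    .replace("anos", ""))
    return float(number)

print(str_to_float("164 122,2"))   # 164122.2
print(str_to_float("830\xa0018"))  # 830018.0 (non-breaking space as thousands separator)
print(str_to_float("73,9 anos"))   # 73.9

try:
    str_to_float("17,0‰")
except ValueError as err:
    # After cleaning, "17.0‰" is still not a valid float literal
    print("ValueError:", err)
```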

## Apply String-to-Float Conversion
Apply the `str_to_float` function to multiple columns in the dataset.

```python
# Run this cell only once: after the first run these columns hold floats,
# and calling str_to_float on a float raises AttributeError (float has no .replace).
df_uf["Área (km²)"] = df_uf["Área (km²)"].apply(str_to_float)
df_uf["População (Censo 2022)"] = df_uf["População (Censo 2022)"].apply(str_to_float)
df_uf["PIB (2015)"] = df_uf["PIB (2015)"].apply(str_to_float)
df_uf["PIB per capita (R$) (2015)"] = df_uf["PIB per capita (R$) (2015)"].apply(str_to_float)
df_uf["Expectativa de vida (2016)"] = df_uf["Expectativa de vida (2016)"].apply(str_to_float)
```
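
The column assignments above repeat one pattern. A loop over the column names removes the repetition, and a guard that converts only strings makes the conversion safe to run more than once. A minimal sketch on a one-row sample using the Acre values; `to_float_once` and `numeric_cols` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

def str_to_float(number: str) -> float:
    """Copy of the notebook's str_to_float so this block runs on its own."""
    number = (number.replace(" ", "")
                    .replace(",", ".")
                    .replace("\xa0", "")
                    .replace("anos", ""))
    return float(number)

def to_float_once(value):
    """Hypothetical helper: convert strings, pass anything else through unchanged."""
    return str_to_float(value) if isinstance(value, str) else value

# One-row sample using the Acre values shown by df_uf.head(1)
df_uf = pd.DataFrame({
    "Área (km²)": ["164 122,2"],
    "População (Censo 2022)": ["830 018"],
    "PIB (2015)": ["13 622 000"],
    "PIB per capita (R$) (2015)": ["16 953,46"],
    "Expectativa de vida (2016)": ["73,9 anos"],
})

numeric_cols = [
    "Área (km²)",
    "População (Censo 2022)",
    "PIB (2015)",
    "PIB per capita (R$) (2015)",
    "Expectativa de vida (2016)",
]

# Run the conversion twice to show that repeating it does not fail
for _ in range(2):
    for col in numeric_cols:
        df_uf[col] = df_uf[col].apply(to_float_once)

print(df_uf.dtypes)
```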

## Convert Mortality Rates
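The mortality column ends with a per-mille sign (‰), which `str_to_float` does not remove. Define a separate function that strips it and converts the decimal comma before casting to float.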

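```python
def mortality_to_float(mor: str) -> float:
    """
    Converts an infant-mortality string such as "17,0‰" into a float.
    Args:
        mor (str): Rate per thousand with a decimal comma and a trailing ‰ sign.
    Returns:
        float: The rate as a float (e.g. 17.0).
    """
    return float(mor.replace("‰", "").replace(",", "."))
```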
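
The same cleaning can also be expressed with pandas' vectorized string methods instead of `apply`. A sketch on a two-value Series: `"17,0‰"` is Acre's rate from the table above, and `"9,5‰"` is made up for illustration. `regex=False` makes `str.replace` treat `‰` and `,` as literal text.

```python
import pandas as pd

# "17,0‰" comes from the Acre row; "9,5‰" is a made-up value for illustration
mortality = pd.Series(["17,0‰", "9,5‰"])

mortality_per_mil = (mortality
                     .str.replace("‰", "", regex=False)   # drop the per-mille sign
                     .str.replace(",", ".", regex=False)  # decimal comma -> point
                     .astype(float))
print(mortality_per_mil.tolist())  # [17.0, 9.5]
```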

## Apply Mortality Conversion
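Apply the `mortality_to_float` function to the mortality column and store the result in a new column, `Mortalidade infantil (2016) por mil` ("por mil" means "per thousand").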

```python
df_uf["Mortalidade infantil (2016) por mil"] = df_uf["Mortalidade infantil (2016)"].apply(mortality_to_float)
```

## Define Classification Function
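Define a function that returns `True` for a state only when all three conditions hold: GDP per capita (`PIB per capita`) above R$ 30,000, infant mortality below 15 per thousand, and HDI (`IDH`) above 700. The `IDH (2010)` column appears to store the index multiplied by 1000 (Acre's 0.663 is shown as 663), so 700 corresponds to an HDI of 0.7.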

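```python
def classification(row):
    """
    Flags a state that meets all three development criteria.
    Args:
        row (pd.Series): One row of df_uf, passed by apply with axis=1.
    Returns:
        bool: True if GDP per capita > 30000, infant mortality < 15 per thousand
              and IDH (2010) > 700; otherwise False.
    """
    return (
        row["PIB per capita (R$) (2015)"] > 30000 and
        row["Mortalidade infantil (2016) por mil"] < 15 and
        row["IDH (2010)"] > 700
    )
```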

## Apply Classification Function
With `axis=1`, `apply` passes each row to `classification` as a Series and returns one boolean per state.

```python
df_uf.apply(classification, axis=1)
```

```text
0     False
1     False
2     False
3     False
4     False
5     False
6      True
7      True
8     False
9     False
10    False
11     True
12    False
13    False
14    False
15     True
16    False
17    False
18     True
19    False
20     True
21    False
22    False
23     True
24     True
25    False
26    False
dtype: bool
```
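
Row-wise `apply` runs `classification` once per state in Python. The same rule can be written as a vectorized boolean mask, combining the three comparisons with `&` (Python's `and` does not work element-wise on a Series), and the mask can then select the matching rows with `.loc`. A sketch on a two-row sample: the first row uses Acre's values from the table above, and the second row (`XX`) is hypothetical.

```python
import pandas as pd

df_uf = pd.DataFrame({
    "Abreviação": ["AC", "XX"],  # "XX" is a hypothetical state
    "PIB per capita (R$) (2015)": [16953.46, 35000.0],
    "Mortalidade infantil (2016) por mil": [17.0, 10.0],
    "IDH (2010)": [663, 780],
})

def classification(row):
    """Copy of the notebook's row-wise rule."""
    return (
        row["PIB per capita (R$) (2015)"] > 30000 and
        row["Mortalidade infantil (2016) por mil"] < 15 and
        row["IDH (2010)"] > 700
    )

# Row-wise version, as in the notebook
by_row = df_uf.apply(classification, axis=1)

# Vectorized version: element-wise comparisons combined with &
mask = (
    (df_uf["PIB per capita (R$) (2015)"] > 30000)
    & (df_uf["Mortalidade infantil (2016) por mil"] < 15)
    & (df_uf["IDH (2010)"] > 700)
)

print(mask.tolist())                           # [False, True]
print(df_uf.loc[mask, "Abreviação"].tolist())  # ['XX']
```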