# Chapter 8: Functions

## 8.7) Processing DataFrames

Functions let you package a repeated data-processing step so you can reuse it with different inputs. In this example, we define a function that filters a DataFrame on three criteria: state (`estado`), an inclusive year range (`ano_inicio` to `ano_fim`), and, optionally, a list of macro-regions (`macrorregoes`).

In [None]:
import pandas as pd

# Creating a sample DataFrame for demonstration
data = {
    'estado': ['SP', 'RJ', 'MG', 'BA', 'AM', 'SP', 'RS'],
    'ano': [2020, 2021, 2019, 2022, 2020, 2023, 2021],
    'macrorregiao': ['Sudeste', 'Sudeste', 'Sudeste', 'Nordeste', 'Norte', 'Sudeste', 'Sul'],
    'valor': [100, 200, 150, 120, 180, 210, 190]
}

df_exemplo = pd.DataFrame(data)
print("Original DataFrame:")
print(df_exemplo)
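In pandas, multi-criteria filtering works by combining boolean masks with `&` (element-wise AND). The function below wraps exactly this pattern, so it helps to see it once without the function. The following standalone sketch rebuilds the same sample data so it runs on its own. The names `mask_estado`, `mask_ano`, `mask_regiao` and `resultado` are illustrative only.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell is self-contained
df = pd.DataFrame({
    'estado': ['SP', 'RJ', 'MG', 'BA', 'AM', 'SP', 'RS'],
    'ano': [2020, 2021, 2019, 2022, 2020, 2023, 2021],
    'macrorregiao': ['Sudeste', 'Sudeste', 'Sudeste', 'Nordeste', 'Norte', 'Sudeste', 'Sul'],
    'valor': [100, 200, 150, 120, 180, 210, 190],
})

# Each comparison yields a boolean Series aligned with the DataFrame's index
mask_estado = df['estado'] == 'SP'
# Parentheses are required: & binds more tightly than >= and <=
mask_ano = (df['ano'] >= 2020) & (df['ano'] <= 2023)
mask_regiao = df['macrorregiao'].isin(['Sudeste'])

# Keep only the rows where all three masks are True
resultado = df[mask_estado & mask_ano & mask_regiao]
print(resultado)
```

Writing the masks inline every time quickly becomes repetitive and error-prone, which is the motivation for moving them into a function.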

### Defining the Function

The function `tratamentos_dfs` takes a DataFrame plus the values to filter on, and returns a DataFrame containing only the rows that match all the criteria.

In [None]:
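from typing import Optional

def tratamentos_dfs(df: pd.DataFrame, estado: str, ano_inicio: int, ano_fim: int,
                    macrorregoes: Optional[list] = None) -> pd.DataFrame:
    """
    Filters the DataFrame based on state, year range, and optional macro-regions.
    
    Parameters:
    df (pd.DataFrame): The input DataFrame.
    estado (str): The state code to filter by.
    ano_inicio (int): The start year (inclusive).
    ano_fim (int): The end year (inclusive).
    macrorregoes (list, optional): List of macro-regions to include.
        If None or empty, no macro-region filter is applied.
    
    Returns:
    pd.DataFrame: The filtered DataFrame (a copy; the input is not modified).
    """
    df_filtered = df

    # Filter by state; skip this filter (with a warning) if the column is missing
    if 'estado' in df_filtered.columns:
        df_filtered = df_filtered[df_filtered['estado'] == estado]
    else:
        print("Column 'estado' not found; skipping state filter.")

    # Filter by year range; between() includes both endpoints
    if 'ano' in df_filtered.columns:
        df_filtered = df_filtered[df_filtered['ano'].between(ano_inicio, ano_fim)]
    else:
        print("Column 'ano' not found; skipping year filter.")

    # Filter by macro-regions only if a non-empty list was provided
    if macrorregoes and 'macrorregiao' in df_filtered.columns:
        df_filtered = df_filtered[df_filtered['macrorregiao'].isin(macrorregoes)]

    # Return an independent copy so later edits don't raise SettingWithCopyWarning
    return df_filtered.copy()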
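For comparison, the same three filters can also be expressed as a single `DataFrame.query` string, where `@name` refers to a local Python variable. The sketch below is a hypothetical variant, named `filtrar_com_query` here for illustration only; it is not part of the original example. It also assumes all three columns exist, because it does not check for missing columns.

```python
from typing import Optional

import pandas as pd


def filtrar_com_query(df: pd.DataFrame, estado: str, ano_inicio: int, ano_fim: int,
                      macrorregoes: Optional[list] = None) -> pd.DataFrame:
    """Hypothetical query()-based variant; assumes 'estado', 'ano' and 'macrorregiao' exist."""
    # '@name' pulls in the function's local variables; the chained comparison is inclusive
    expr = "estado == @estado and @ano_inicio <= ano <= @ano_fim"
    # Add the macro-region condition only when a non-empty list is given
    if macrorregoes:
        expr += " and macrorregiao in @macrorregoes"
    return df.query(expr).copy()


# Usage with the same sample data as above
df = pd.DataFrame({
    'estado': ['SP', 'RJ', 'MG', 'BA', 'AM', 'SP', 'RS'],
    'ano': [2020, 2021, 2019, 2022, 2020, 2023, 2021],
    'macrorregiao': ['Sudeste', 'Sudeste', 'Sudeste', 'Nordeste', 'Norte', 'Sudeste', 'Sul'],
    'valor': [100, 200, 150, 120, 180, 210, 190],
})
resultado = filtrar_com_query(df, 'SP', 2020, 2023, ['Sudeste'])
print(resultado)
```

The query string keeps all criteria readable in one expression. The trade-off is that a missing column makes pandas raise an error instead of skipping that filter, so the explicit column checks in `tratamentos_dfs` are the more defensive choice for inputs whose columns may vary.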

### Using the Function

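Now let's use the function to select the rows for 'SP' from 2020 through 2023, restricted to the 'Sudeste' macro-region. In the sample data, this keeps two rows (the 2020 and 2023 entries).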

In [None]:
# Calling the function
df_resultado = tratamentos_dfs(df_exemplo, estado='SP', ano_inicio=2020, ano_fim=2023, macrorregoes=['Sudeste'])

print("\nFiltered DataFrame:")
print(df_resultado)