# Data Concatenation in Pandas
This notebook shows how to concatenate pandas DataFrames row-wise and column-wise with `pd.concat`, then applies the same technique to consolidate several CSV files into one table.

## Importing Required Libraries
We start by importing the necessary libraries for data manipulation and file handling.

In [None]:
import pandas as pd
import os

## Creating Sample DataFrames
Below, we create three sample DataFrames to demonstrate concatenation.

In [None]:
# First DataFrame with client IDs and names
df = pd.DataFrame(
    {
        "cliente": [1, 2, 3, 4, 5],
        "nome": ["mar", "nani", "mon", "stacy", "tap"]
    }
)

In [None]:
# Second DataFrame with additional client IDs and names
df_02 = pd.DataFrame(
    {
        "cliente": [6, 7, 8],
        "nome": ["frida", "frodo", "faisca"]
    }
)

In [None]:
# Third DataFrame with ages
df_03 = pd.DataFrame(
    {
        "idade": [2, 2, 1, 1, 4, 8, 8, 3]
    }
)

## Vertical Concatenation
Concatenate `df` and `df_02` vertically (row-wise). Passing `ignore_index=True` discards the original row labels and gives the result a fresh index from 0 to 7.

In [None]:
# Concatenate df and df_02 row-wise
pd.concat([df, df_02], ignore_index=True)
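
To see what `ignore_index=True` changes, the sketch below rebuilds the two sample DataFrames and compares the resulting indexes with and without the flag. The names `kept` and `reset` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"cliente": [1, 2, 3, 4, 5], "nome": ["mar", "nani", "mon", "stacy", "tap"]}
)
df_02 = pd.DataFrame({"cliente": [6, 7, 8], "nome": ["frida", "frodo", "faisca"]})

# Default behaviour: each frame keeps its own row labels, so labels repeat
kept = pd.concat([df, df_02])

# ignore_index=True: the result is relabelled 0..n-1
reset = pd.concat([df, df_02], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 2, 3, 4, 0, 1, 2]
print(reset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Duplicate labels are legal in pandas, but they make label-based lookups such as `kept.loc[0]` return more than one row, which is usually not what is wanted after stacking.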

## Horizontal Concatenation
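Concatenate `df` and `df_03` horizontally (column-wise). Rows are aligned on the index, and because `df` has 5 rows while `df_03` has 8, the last three rows of `cliente` and `nome` are filled with `NaN`.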

In [None]:
# Combine df and df_03 into a list for concatenation
dfs = [df, df_03]

In [None]:
# Concatenate df and df_03 column-wise
pd.concat(dfs, axis=1)
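
The effect of the length mismatch can be checked directly. This sketch compares the default outer join with `join="inner"`, which keeps only the index labels present in every input. The names `outer` and `inner` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"cliente": [1, 2, 3, 4, 5], "nome": ["mar", "nani", "mon", "stacy", "tap"]}
)
df_03 = pd.DataFrame({"idade": [2, 2, 1, 1, 4, 8, 8, 3]})

# Default join="outer": union of the row labels, gaps filled with NaN
outer = pd.concat([df, df_03], axis=1)

# join="inner": only the row labels shared by both frames survive
inner = pd.concat([df, df_03], axis=1, join="inner")

print(outer.shape)  # (8, 3)
print(inner.shape)  # (5, 3)
print(outer["cliente"].isna().sum())  # 3
```

A side effect of the outer join is that `cliente` is upcast to `float64`, since the default integer dtype cannot hold `NaN`.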

## Sorting and Resetting Index
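Sort `df_03` by the `idade` column and reset its index. The reset matters because `pd.concat` aligns rows on index labels: after it, the sorted order, rather than the original row labels, determines which age lines up with which client.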

In [None]:
# Sort df_03 by age and reset the index
df_03 = df_03.sort_values(by="idade").reset_index(drop=True)
df_03

In [None]:
# Concatenate df and the sorted df_03 column-wise
pd.concat([df, df_03], axis=1)
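
A short sketch of why the index reset changes the outcome: sorting alone moves rows but leaves their labels attached, so a column-wise concatenation pairs them exactly as before. The names `sorted_only`, `sorted_reset`, `by_label`, and `by_position` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"cliente": [1, 2, 3, 4, 5], "nome": ["mar", "nani", "mon", "stacy", "tap"]}
)
df_03 = pd.DataFrame({"idade": [2, 2, 1, 1, 4, 8, 8, 3]})

# Sorting reorders the rows, but each row keeps its original label
sorted_only = df_03.sort_values(by="idade")

# Dropping the old labels makes the sorted position the new label
sorted_reset = sorted_only.reset_index(drop=True)

by_label = pd.concat([df, sorted_only], axis=1)
by_position = pd.concat([df, sorted_reset], axis=1)

# Client 1 (label 0) still gets its original age when labels are kept
print(by_label.loc[0, "idade"])     # 2
# After the reset, client 1 is paired with the smallest age instead
print(by_position.loc[0, "idade"])  # 1
```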

## Reading and Processing Files
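Define a function that reads one semicolon-separated CSV file from `../data/ipea/`, renames its `valor` column to the file name, indexes the rows by `nome` and `período`, and drops the `cod` column.

In [None]:
# Function to read and process a CSV file
def read_file(file_name: str) -> pd.DataFrame:
    """Read ../data/ipea/<file_name>.csv and return it indexed by (nome, período)."""
    return (pd.read_csv(f"../data/ipea/{file_name}.csv", sep=";")
            # Name the value column after the file so each series stays identifiable
            .rename(columns={"valor": file_name})
            .set_index(["nome", "período"])
            .drop(columns="cod"))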

In [None]:
# Example: Reading a specific file
df_negros = read_file("homicidios-negros")
df_negros
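
The same chain of steps can be tried without the data files. The sketch below runs it on an in-memory CSV. The rows are hypothetical, since the real files' contents are not shown in this notebook, and `read_buffer` is an illustrative helper rather than part of the notebook.

```python
import io

import pandas as pd

# Hypothetical rows with the columns that read_file expects
sample_csv = (
    "cod;nome;período;valor\n"
    "1;AC;2000;10\n"
    "1;AC;2001;12\n"
    "2;AL;2000;20\n"
)


def read_buffer(buffer, column_name: str) -> pd.DataFrame:
    """Apply the same rename / set_index / drop steps to any file-like object."""
    return (pd.read_csv(buffer, sep=";")
            .rename(columns={"valor": column_name})
            .set_index(["nome", "período"])
            .drop(columns="cod"))


sample = read_buffer(io.StringIO(sample_csv), "homicidios-negros")
print(sample)
```

The result has a single column named after the series and a two-level index, which is what later lets `pd.concat(..., axis=1)` line the series up by `nome` and `período`.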

## Processing Multiple Files
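Iterate through the CSV files in the directory and process each one with the `read_file` function. Entries that are not CSV files are skipped, as is the consolidated file written at the end of this notebook, so re-running the notebook does not read its own output back in.

In [None]:
# List the CSV files in a stable order, skipping the consolidated output
output_name = "homicidios-consolidados.csv"
file_names = sorted(
    f for f in os.listdir("../data/ipea/")
    if f.endswith(".csv") and f != output_name
)

# Process each file and store the resulting DataFrames in a list
dfs_ipea = []
for file in file_names:
    # splitext removes only the final extension, unlike split(".")[0]
    file_name = os.path.splitext(file)[0]
    dfs_ipea.append(read_file(file_name))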
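
Deriving a name by splitting on the first dot truncates any file name that contains an extra dot, and a directory listing can include entries that are not CSV files. The sketch below illustrates both points on a hypothetical list of names, none of which are taken from the real directory.

```python
import os

# Hypothetical directory listing, for illustration only
names = ["homicidios-negros.csv", "homicidios.v2.csv", "README.md", ".DS_Store"]

# Keep only CSV files and strip just the final extension
stems = sorted(os.path.splitext(n)[0] for n in names if n.endswith(".csv"))
print(stems)  # ['homicidios-negros', 'homicidios.v2']

# Splitting on the first dot loses part of a dotted name
print("homicidios.v2.csv".split(".")[0])  # homicidios
```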

In [None]:
# Accessing one of the processed DataFrames
dfs_ipea[-3]

## Consolidating Data
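Concatenate all processed DataFrames horizontally, so that each file contributes one column. Rows are aligned on the (`nome`, `período`) index, and any combination missing from a file becomes `NaN`. The index is then turned back into columns and the result is sorted by period and name.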

In [None]:
# Concatenate all DataFrames and sort by period and name
df_full = pd.concat(dfs_ipea, axis=1).reset_index().sort_values(["período", "nome"])
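
A minimal sketch of this step on two small frames, assuming each has the two-level index that `read_file` produces. The series names `serie_a` and `serie_b` and all values are made up for illustration.

```python
import pandas as pd

idx_a = pd.MultiIndex.from_tuples(
    [("AC", 2000), ("AC", 2001)], names=["nome", "período"]
)
idx_b = pd.MultiIndex.from_tuples(
    [("AC", 2001), ("AL", 2001)], names=["nome", "período"]
)

# Two hypothetical series that only partly overlap
a = pd.DataFrame({"serie_a": [10, 12]}, index=idx_a)
b = pd.DataFrame({"serie_b": [7, 9]}, index=idx_b)

# One column per series, rows matched on (nome, período)
wide = (pd.concat([a, b], axis=1)
        .reset_index()
        .sort_values(["período", "nome"]))
print(wide)
```

Only `("AC", 2001)` appears in both inputs, so it is the only row with values in both columns. The other two rows each have one `NaN`.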

## Exporting the Consolidated Data
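Save the consolidated DataFrame to a semicolon-separated CSV file for further analysis. The file is written to the same directory that the input files are read from, so a later run of the notebook will see it in the directory listing unless it is filtered out.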

In [None]:
# Save the consolidated DataFrame to a CSV file
df_full.to_csv("../data/ipea/homicidios-consolidados.csv", index=False, sep=";")
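
To check what these `to_csv` arguments produce without touching the disk, the sketch below writes a small hypothetical frame to an in-memory buffer and reads it back with the same separator.

```python
import io

import pandas as pd

# Hypothetical consolidated rows, including one missing value
frame = pd.DataFrame(
    {"nome": ["AC", "AL"], "período": [2000, 2000], "serie_a": [10.0, None]}
)

buffer = io.StringIO()
frame.to_csv(buffer, index=False, sep=";")

# index=False means the header holds only the data columns
header = buffer.getvalue().splitlines()[0]
print(header)  # nome;período;serie_a

buffer.seek(0)
restored = pd.read_csv(buffer, sep=";")
print(restored.shape)  # (2, 3)
```

Missing values are written as empty fields and come back as `NaN` when the file is read again.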