## Merging Multiple CSV Files

This Jupyter Notebook merges multiple CSV files into a single CSV file. As a safety check, it verifies that the row count of the output CSV equals the sum of the row counts of the input CSV files. Once the check passes, it deletes the input CSV files, so keep a backup if the originals are still needed.

1. **Get a List of CSV Files**: The code lists all the files in the input folder and filters for those with a ".csv" extension.

2. **Initialize an Empty List**: An empty list is initialized to store DataFrames.

3. **Iterate through CSV Files**: The code iterates through each CSV file, reads its data, and appends the data to the list of DataFrames.

4. **Save the Combined DataFrame**: The DataFrames in the list are concatenated into a single DataFrame with `pd.concat`, and the result is saved as a new CSV file named "b01_audio_annotations.csv".

5. **Row Count Verification**: The code calculates the row count in the input CSV files and compares it to the row count in the merged CSV. An assertion check is performed to ensure they match.

6. **Remove Input CSV Files**: Finally, the code removes the original input CSV files to clean up the folder.
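
The row count verification in step 5 can also be done without loading each file into pandas. The sketch below is one way to do that with the standard-library `csv` module. `count_csv_rows` is a hypothetical helper that is not part of this notebook, and it assumes each file is UTF-8 encoded and has exactly one header row.

```python
import csv


def count_csv_rows(path):
    """Count data rows in a CSV file (hypothetical helper, assumes one header row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if the file has one
        # csv.reader keeps quoted multi-line fields together, so this counts
        # records rather than physical lines; fully blank lines are ignored
        return sum(1 for row in reader if row)
```

Summing `count_csv_rows` over the input files and comparing the total with the count for the merged file gives a check that is independent of how pandas parsed the data.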

In [4]:
import os
import pandas as pd

In [5]:
ROOT_PATH = "../../../desarrollo/"

# Folder that contains the input CSV files
input_folder = ROOT_PATH + "Data/Annotations"

# Path of the merged CSV file (written to the same folder as the inputs)
output_file = ROOT_PATH + "Data/Annotations/" + "b01_audio_annotations.csv"

In [6]:
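# Get a sorted list of CSV files in the input folder.
# The output file lives in the same folder, so exclude it: otherwise a rerun
# would merge the previous output into itself and then delete it.
csv_files = sorted(
    file for file in os.listdir(input_folder)
    if file.endswith(".csv") and file != os.path.basename(output_file)
)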

# Initialize an empty list to store DataFrames
dataframes = []

# Iterate through each CSV file, read its data, and append it to the list
for csv_file in csv_files:
    file_path = os.path.join(input_folder, csv_file)
    data = pd.read_csv(file_path)
    dataframes.append(data)

# Concatenate the DataFrames into a single DataFrame
combined_df = pd.concat(dataframes, ignore_index=True)

# Save the combined DataFrame to a single CSV file
combined_df.to_csv(output_file, index=False)

# Count the rows read from the input CSVs and the rows actually written to the output CSV
row_count_input = sum(len(df) for df in dataframes)
row_count_output = len(pd.read_csv(output_file))

# Safety check: stop here, before anything is deleted, if the counts differ
assert row_count_output == row_count_input, "Row count mismatch between input and output CSVs."

# Remove the input CSV files
for csv_file in csv_files:
    file_path = os.path.join(input_folder, csv_file)
    os.remove(file_path)
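
For reuse outside the notebook, the steps above could be wrapped in a single function. The following is a minimal sketch under stated assumptions, not the notebook's own implementation: `merge_csv_files` and its `remove_inputs` parameter are hypothetical names, and the usage example runs on throwaway data in a temporary folder rather than on the real annotation files.

```python
import os
import tempfile

import pandas as pd


def merge_csv_files(input_folder, output_file, remove_inputs=False):
    """Merge every CSV in input_folder into output_file (illustrative sketch)."""
    output_path = os.path.abspath(output_file)
    # Skip the output file itself in case it lives in the input folder
    csv_files = sorted(
        name for name in os.listdir(input_folder)
        if name.endswith(".csv")
        and os.path.abspath(os.path.join(input_folder, name)) != output_path
    )
    if not csv_files:
        raise FileNotFoundError(f"No input CSV files found in {input_folder}")

    dataframes = [pd.read_csv(os.path.join(input_folder, name)) for name in csv_files]
    combined_df = pd.concat(dataframes, ignore_index=True)
    combined_df.to_csv(output_file, index=False)

    # Compare the rows read from the inputs with the rows written to disk
    rows_in = sum(len(df) for df in dataframes)
    rows_out = len(pd.read_csv(output_file))
    if rows_in != rows_out:
        raise ValueError(f"Row count mismatch: {rows_in} input rows, {rows_out} output rows")

    # Delete the inputs only after the check has passed
    if remove_inputs:
        for name in csv_files:
            os.remove(os.path.join(input_folder, name))
    return combined_df


# Usage on throwaway data in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"clip": ["a", "b"], "label": [0, 1]}).to_csv(
        os.path.join(tmp, "part1.csv"), index=False)
    pd.DataFrame({"clip": ["c"], "label": [1]}).to_csv(
        os.path.join(tmp, "part2.csv"), index=False)
    merged = merge_csv_files(tmp, os.path.join(tmp, "merged.csv"), remove_inputs=True)
    remaining = sorted(os.listdir(tmp))

print(len(merged))  # 3
print(remaining)    # ['merged.csv']
```

The sketch raises an exception instead of using `assert` because assertions are skipped when Python runs with the `-O` flag, which would let the deletion step proceed unchecked. Leaving `remove_inputs` off by default keeps the destructive step opt-in.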