# Merge CSV Files

This Jupyter notebook adds further annotations to the base CSV files.

The new annotations contain the date and time of each bird song recording. The goal is to create new CSV files that store this information in a structured format.

## Process Overview

1. The notebook will recursively search for CSV annotation files within a specified data directory.

2. For each CSV file found, it will extract relevant information, including the audio file name, start time, end time, and bird species.

3. The extracted data will be organized into a structured DataFrame.

4. New data of interest, such as the date and time of the recording, will be derived and added for each audio file.

5. The data will be saved as a CSV file with a name matching the original TXT file in the "Data/Annotations" directory.
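Step 4 above could look roughly like the following sketch. It assumes that the recording timestamp is embedded in the audio file name as `YYYYMMDD_HHMMSS` and that the file name lives in a column called `audio_file`. Both the pattern and the column name are hypothetical and need to be adapted to the actual data.

```python
import pandas as pd

# ASSUMPTION: file names embed the timestamp as YYYYMMDD_HHMMSS,
# e.g. "AM1_20230510_073000.WAV". Adjust the pattern to the real naming scheme.
TIMESTAMP_PATTERN = r"(\d{8})_(\d{6})"


def add_recording_datetime(df, filename_col="audio_file"):
    """Return a copy of df with 'date' and 'time' columns parsed from file names.

    `filename_col` is a hypothetical column name holding the audio file name.
    """
    out = df.copy()
    # Extract the two capture groups: column 0 is the date, column 1 the time
    stamps = out[filename_col].str.extract(TIMESTAMP_PATTERN)
    # Names that do not match the pattern become NaT instead of raising
    parsed = pd.to_datetime(
        stamps[0] + stamps[1], format="%Y%m%d%H%M%S", errors="coerce"
    )
    out["date"] = parsed.dt.date
    out["time"] = parsed.dt.time
    return out
```

File names that do not match the pattern end up with missing values in both new columns, so they can be inspected afterwards rather than stopping the run.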

In [1]:
import os
import pandas as pd

In [2]:
ROOT_PATH = "../"

# Folder containing the CSV annotation files to merge
input_folder = ROOT_PATH + "Data/Annotations/"

# Path of the single merged CSV file to write
output_file = ROOT_PATH + "Data/Annotations/" + "audio_annotations.csv"

In [4]:
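# Get a sorted list of CSV files in the input folder, skipping any existing
# output file so that a re-run neither merges it into itself nor deletes it below
output_name = os.path.basename(output_file)
csv_files = sorted(
    file for file in os.listdir(input_folder)
    if file.endswith(".csv") and file != output_name
)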

# Initialize an empty list to store DataFrames
dataframes = []

# Iterate through each CSV file, read its data, and append it to the list
for csv_file in csv_files:
    file_path = os.path.join(input_folder, csv_file)
    data = pd.read_csv(file_path)
    dataframes.append(data)

# Concatenate the DataFrames into a single DataFrame
combined_df = pd.concat(dataframes, ignore_index=True)

# Save the combined DataFrame to a single CSV file
combined_df.to_csv(output_file, index=False)

# Ensure the row count in the output CSV (read back from disk) matches the
# sum of row counts of the input DataFrames that were already loaded
row_count_input = sum(len(df) for df in dataframes)
row_count_output = pd.read_csv(output_file).shape[0]

# Perform a safety check
assert row_count_output == row_count_input, "Row count mismatch between input and output CSVs."

# Remove the input CSV files
for csv_file in csv_files:
    file_path = os.path.join(input_folder, csv_file)
    os.remove(file_path)
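
Deleting the inputs is irreversible, so a more cautious variant might move them to a backup folder instead. The sketch below is one possible way to do that, not part of the original workflow. The function name `merge_and_archive` and the `archive_folder` argument are hypothetical.

```python
import shutil
from pathlib import Path

import pandas as pd


def merge_and_archive(input_folder, output_file, archive_folder):
    """Merge all CSVs in input_folder into output_file, then move the inputs.

    `archive_folder` is a hypothetical backup location; the inputs are moved
    there instead of being deleted.
    """
    input_folder = Path(input_folder)
    output_file = Path(output_file)
    archive_folder = Path(archive_folder)

    # Skip the output file itself in case it lives in the input folder
    csv_paths = sorted(
        p for p in input_folder.glob("*.csv")
        if p.resolve() != output_file.resolve()
    )
    if not csv_paths:
        raise FileNotFoundError(f"No CSV files to merge in {input_folder}")

    frames = [pd.read_csv(p) for p in csv_paths]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_file, index=False)

    # Verify the file on disk before touching the inputs
    if len(pd.read_csv(output_file)) != sum(len(f) for f in frames):
        raise RuntimeError("Row count mismatch between input and output CSVs.")

    archive_folder.mkdir(parents=True, exist_ok=True)
    for p in csv_paths:
        shutil.move(str(p), str(archive_folder / p.name))
    return combined
```

With the paths used in this notebook, a call could look like `merge_and_archive(input_folder, output_file, ROOT_PATH + "Data/Annotations_backup/")`, where the backup path is only an example.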