# Create Base CSV Files

This Jupyter notebook converts TXT annotation files into CSV format for a bird song classification project.

Each annotation records the start time, end time, frequency range, and species of a bird song within an audio file. The goal is to store this information in CSV files with a consistent set of columns.

## Process Overview

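1. The notebook recursively walks the dataset directory and collects the TXT annotation files found in label folders whose names contain `new_raven_format_Edu` or `new_raven_format_Giulia`. When both annotators' folders sit side by side, Edu's annotations take priority.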

2. For each TXT file found, it reads the tab-separated annotation table and keeps the start time, end time, low and high frequency, and bird species of each annotation, adding the audio file path and the annotator.

3. The extracted data will be organized into a structured DataFrame.

4. Each DataFrame is saved in the "Data/Annotations" directory as a CSV file named after the original TXT file.
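Steps 2 to 4 can be sketched on a single table held in memory. This is a minimal illustration, not the notebook's loop: the column names are the ones the conversion cell selects, while the sample rows, the extra `Selection` column, the placeholder audio path, and the annotator value are assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical tab-separated annotation table; all values are made up.
raven_txt = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tLow Freq (Hz)\tHigh Freq (Hz)\tspecies\n"
    "1\t0.50\t1.75\t2000.0\t6000.0\tspecies_a\n"
    "2\t3.10\t4.00\t1500.0\t5200.0\tspecies_b\n"
)

# Step 2: read the table and attach the audio path (placeholder value).
df = pd.read_csv(io.StringIO(raven_txt), sep="\t")
df["path"] = "SITE/DATE/Audios/example.WAV"

# Step 3: keep the columns of interest and give them the output names.
df = df[["path", "Begin Time (s)", "End Time (s)",
         "Low Freq (Hz)", "High Freq (Hz)", "species"]]
df = df.rename(columns={
    "Begin Time (s)": "start_time",
    "End Time (s)": "end_time",
    "Low Freq (Hz)": "low_frequency",
    "High Freq (Hz)": "high_frequency",
    "species": "specie",
})
df["annotator"] = "Edu"  # assumed value for the example

# Step 4: serialize to CSV text (the notebook writes to a file instead).
csv_text = df.to_csv(index=False)
print(csv_text)
```

Any column not listed in the selection, such as `Selection` here, is dropped from the output.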

In [1]:
import os
import pandas as pd

In [11]:
# -p avoids an error when the directory already exists
!mkdir -p ../../../desarrollo/Data

In [10]:
# !rm -rf ../../../desarrollo/Data

In [12]:
ROOT_PATH = "../../../desarrollo/"

# Path to the root folder containing the dataset
dataset_path = ROOT_PATH + "Audio_Data/"

# Path to the folder where you want to save the CSV files
output_path = ROOT_PATH + "Data/Annotations"
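The conversion loop in the next cell turns each walked directory into a path relative to `Audio_Data/` and appends `Audios` by string splitting. The sketch below wraps that expression in a hypothetical helper, `relative_audio_dir`, and applies it to directory strings shaped like the ones in this dataset. It shows that the result depends on whether the input still contains the `new_raven_format` folder. Which audio layout is intended is not stated in the notebook, so this is an illustration of the expression's behavior, not a correction.

```python
def relative_audio_dir(walk_dir, marker="Audio_Data/"):
    """Apply the loop's string-splitting expression to one directory string."""
    return walk_dir.split(marker)[1].split("new_raven_format")[0] + "Audios"


# Parent of a label folder, as os.walk yields it in `root`.
parent = "../../../desarrollo/Audio_Data/AM10/2023_07_18/Etiquetas"
# The label folder itself.
label_dir = parent + "/new_raven_format_Giulia_3"

from_parent = relative_audio_dir(parent)
from_label_dir = relative_audio_dir(label_dir)

print(from_parent)     # AM10/2023_07_18/EtiquetasAudios
print(from_label_dir)  # AM10/2023_07_18/Etiquetas/Audios
```

With the parent directory as input there is no `new_raven_format` substring to split on, so `Audios` is appended directly to the last folder name without a separator.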

In [13]:
csv_number = 0

# Iterate through all subdirectories in the dataset path
for root, dirs, files in os.walk(dataset_path):
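    # If both an Edu and a Giulia label folder are present, keep only Edu.
    # Match by substring because folder names can carry a suffix
    # (e.g. new_raven_format_Giulia_3), which an exact `in dirs` test would miss.
    if any('new_raven_format_Edu' in d for d in dirs):
        dirs[:] = [d for d in dirs if 'new_raven_format_Giulia' not in d]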
    for d in dirs:
        if 'new_raven_format_Edu' in d or 'new_raven_format_Giulia' in d:
            subdir_path = os.path.join(root, d)
            for file in os.listdir(subdir_path):
                if file.endswith(".txt"):
                    txt_file_path = os.path.join(subdir_path, file)
                    # Read the TXT file
                    df = pd.read_csv(txt_file_path, sep='\t')
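                    # Audio file name: the TXT file name without its ".txt" and ".WAV" parts
                    audio_file_name = file.replace('.txt', '').replace(".WAV","")
                    # Directory of the audio file, relative to Audio_Data/.
                    # NOTE: `root` is the parent of the label folder, so it normally does not
                    # contain "new_raven_format"; the second split then leaves it unchanged
                    # and "Audios" is appended with no path separator.
                    root_path = root.split("Audio_Data/")[1].split("new_raven_format")[0] + "Audios"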

                    # Determine the annotator from the label folder name
                    annotator = "Other"
                    if "Edu" in d:
                        annotator = "Edu"
                    elif "Giulia" in d:
                        annotator = "Giulia"

                    df['path'] = os.path.join(root_path, audio_file_name + ".WAV")
                    # Select the desired columns and rename them
                    df = df[['path', 'Begin Time (s)', 'End Time (s)',
                             'Low Freq (Hz)', 'High Freq (Hz)', 'species']]
                    df = df.rename(columns={
                        'Begin Time (s)': 'start_time',
                        'End Time (s)': 'end_time',
                        'Low Freq (Hz)': 'low_frequency',
                        'High Freq (Hz)': 'high_frequency',
                        'species': 'specie',
                    })
                    # Add 'annotator' column
                    df['annotator'] = annotator
                    # Define the CSV file path in the Annotations folder
                    csv_file_name = file.replace('.txt', '.csv').replace(".WAV","")
                    csv_file_path = os.path.join(output_path, csv_file_name)
                    # Create the output directory if it doesn't exist
                    os.makedirs(output_path, exist_ok=True)
                    # Save the selected columns to the CSV file
                    df.to_csv(csv_file_path, index=False)
                    csv_number += 1
                    print(f"Created CSV file: {csv_file_path} from TXT file: {txt_file_path}")

print(f"Created {csv_number} CSV files in the Annotations folder")

Created CSV file: ../../../desarrollo/Data/Annotations/AM10_20230718_082000.csv from TXT file: ../../../desarrollo/Audio_Data/AM10/2023_07_18/Etiquetas/new_raven_format_Giulia_3/AM10_20230718_082000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM10_20230718_105000.csv from TXT file: ../../../desarrollo/Audio_Data/AM10/2023_07_18/Etiquetas/new_raven_format_Giulia_3/AM10_20230718_105000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM10_20230718_103000.csv from TXT file: ../../../desarrollo/Audio_Data/AM10/2023_07_18/Etiquetas/new_raven_format_Giulia_3/AM10_20230718_103000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM10_20230718_090000.csv from TXT file: ../../../desarrollo/Audio_Data/AM10/2023_07_18/Etiquetas/new_raven_format_Giulia_3/AM10_20230718_090000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM10_20230718_100000.csv from TXT file: ../../../desarrollo/Audio_Data/AM10/2023_07_18/Etiquetas/new_raven_format_Giulia_3/AM10_20230
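
The loop above relies on editing `dirs` in place so that `os.walk` never descends into the folders that were removed. A minimal sketch of that pruning on a throwaway directory tree follows. The `siteA`/`siteB` layout and the helper `find_label_dirs` are hypothetical; the suffixed folder names imitate the ones in the output above. The sketch matches folder names by substring, because an exact membership test does not match a suffixed name.

```python
import os
import tempfile


def find_label_dirs(dataset_path):
    """Return label folders under dataset_path, preferring Edu over Giulia."""
    found = []
    for root, dirs, _files in os.walk(dataset_path):
        if any("new_raven_format_Edu" in d for d in dirs):
            # Slice assignment edits the list that os.walk holds,
            # so the removed folders are never visited.
            dirs[:] = [d for d in dirs if "new_raven_format_Giulia" not in d]
        for d in dirs:
            if "new_raven_format_Edu" in d or "new_raven_format_Giulia" in d:
                found.append(os.path.join(root, d))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: siteA has both annotators, siteB only Giulia.
    for rel in (
        "siteA/labels/new_raven_format_Edu_1",
        "siteA/labels/new_raven_format_Giulia_3",
        "siteB/labels/new_raven_format_Giulia_3",
    ):
        os.makedirs(os.path.join(tmp, *rel.split("/")))
    label_dirs = [
        os.path.relpath(p, tmp).replace(os.sep, "/") for p in find_label_dirs(tmp)
    ]

# An exact membership test misses the suffixed folder name.
exact_match = "new_raven_format_Giulia" in ["new_raven_format_Giulia_3"]

print(label_dirs)
print(exact_match)  # False
```

For `siteA` only the Edu folder is returned, while `siteB` falls back to Giulia's folder because no Edu folder exists there.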

In [5]:
#!rm -rf ../../../desarrollo/Data/Annotations