# Create Base CSV Files

This Jupyter notebook converts TXT annotation files into CSV format for a bird song classification project.

The annotations describe bird songs within audio files: the start and end time, the frequency range, and the species of each song. The goal is to store this information in structured CSV files.
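The TXT files appear to be Raven selection tables: tab-separated text with one row per annotated selection. The sketch below parses a small in-memory table with `pd.read_csv(..., sep="\t")`, which is the same call the conversion cell uses. The sample rows, the `species_a`/`species_b` labels and the helper name `read_selection_table` are hypothetical. The `Selection`, `View` and `Channel` columns are assumed from Raven's usual export; only the time, frequency and `species` columns matter for this notebook.

```python
import io

import pandas as pd

# Hypothetical excerpt of a Raven selection table (tab-separated).
# Column names follow the ones used in this notebook; the values are made up.
SAMPLE_TABLE = (
    "Selection\tView\tChannel\tBegin Time (s)\tEnd Time (s)\tLow Freq (Hz)\tHigh Freq (Hz)\tspecies\n"
    "1\tSpectrogram 1\t1\t1.25\t2.80\t2100.0\t6500.0\tspecies_a\n"
    "2\tSpectrogram 1\t1\t4.10\t5.05\t3000.0\t8000.0\tspecies_b\n"
)


def read_selection_table(buffer):
    """Read a tab-separated Raven selection table into a DataFrame (hypothetical helper)."""
    return pd.read_csv(buffer, sep="\t")


annotations = read_selection_table(io.StringIO(SAMPLE_TABLE))
# Show the columns this notebook keeps: timing and species
print(annotations[["Begin Time (s)", "End Time (s)", "species"]])
```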

## Process Overview

1. The notebook recursively searches the dataset directory, including all its subdirectories, for TXT annotation files stored in "new_raven_format" folders.

2. For each TXT file found, it extracts the relevant information: the audio file name, start time, end time, low and high frequency, and bird species.

3. The extracted data is organized into a structured DataFrame.

4. The data is saved as a CSV file in the "Data/Annotations" directory, with the same name as the original TXT file.
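Steps 2 to 4 can be sketched for a single file as two small functions. `to_base_schema` and `convert_annotation` are hypothetical helper names, not part of the notebook. The column mapping is copied from the conversion cell, and the caller is assumed to supply the relative audio path, so step 1 (the directory walk) is left out.

```python
import os

import pandas as pd

# Raven column name -> output column name (same mapping as the conversion cell)
COLUMN_MAP = {
    "Begin Time (s)": "start_time",
    "End Time (s)": "end_time",
    "Low Freq (Hz)": "low_frequency",
    "High Freq (Hz)": "high_frequency",
    "species": "specie",
}


def to_base_schema(raven_df, audio_relative_path):
    """Steps 2-3: attach the audio path, then keep and rename the relevant columns."""
    df = raven_df.copy()  # leave the caller's DataFrame untouched
    df["path"] = audio_relative_path
    return df[["path", *COLUMN_MAP]].rename(columns=COLUMN_MAP)


def convert_annotation(txt_file_path, audio_relative_path, output_path):
    """Steps 2-4 for one file: read the TXT table, convert it, write the CSV."""
    raven_df = pd.read_csv(txt_file_path, sep="\t")
    df = to_base_schema(raven_df, audio_relative_path)
    os.makedirs(output_path, exist_ok=True)
    # CSV named after the TXT file, e.g. AM1_20230515_100000.txt becomes AM1_20230515_100000.csv
    csv_name = os.path.basename(txt_file_path).replace(".txt", ".csv").replace(".WAV", "")
    csv_file_path = os.path.join(output_path, csv_name)
    df.to_csv(csv_file_path, index=False)
    return csv_file_path
```

Splitting the pure DataFrame transformation from the file I/O keeps the column logic easy to test without touching the disk.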

In [7]:
import os
import pandas as pd

In [8]:
ROOT_PATH = "../../../desarrollo/"

# Path to the root folder containing the dataset
dataset_path = ROOT_PATH + "Audio_Data/"

# Path to the folder where you want to save the CSV files
output_path = ROOT_PATH + "Data/Annotations"
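Judging by the logged paths, annotations live in `<recorder>/<date>/Etiquetas/new_raven_format/` under `dataset_path`, and the conversion cell points to the matching audio in `<recorder>/<date>/Audios/`. A minimal sketch of that mapping follows, assuming this layout holds for every recorder. `audio_path_for` is a hypothetical helper, and it uses `os.path.relpath` instead of splitting the string on "Audio_Data/".

```python
import os


def audio_path_for(annotation_root, txt_file_name, dataset_path):
    """Map a Raven TXT file to the relative path of its audio file (hypothetical helper).

    Assumed layout:
      <dataset_path>/<recorder>/<date>/Etiquetas/new_raven_format/<name>.txt
      <dataset_path>/<recorder>/<date>/Audios/<name>.WAV
    """
    # Path of the annotation folder relative to the dataset root
    relative_root = os.path.relpath(annotation_root, dataset_path)
    # Swap the labels folder for the audio folder
    audio_dir = relative_root.replace(os.path.join("Etiquetas", "new_raven_format"), "Audios")
    # Strip ".txt" (and a leftover ".WAV" if the TXT was named "<name>.WAV.txt")
    base_name = txt_file_name.replace(".txt", "").replace(".WAV", "")
    return os.path.join(audio_dir, base_name + ".WAV")


print(audio_path_for(
    "../../../desarrollo/Audio_Data/AM1/2023_05_15/Etiquetas/new_raven_format",
    "AM1_20230515_100000.txt",
    "../../../desarrollo/Audio_Data/",
))
```

On a POSIX system the call above prints `AM1/2023_05_15/Audios/AM1_20230515_100000.WAV`.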

In [9]:
import os
import pandas as pd

# Raven columns to keep, mapped to the names used in the output CSVs
COLUMN_MAP = {
    'Begin Time (s)': 'start_time',
    'End Time (s)': 'end_time',
    'Low Freq (Hz)': 'low_frequency',
    'High Freq (Hz)': 'high_frequency',
    'species': 'specie',
}

# Create the output directory once, before the loop
os.makedirs(output_path, exist_ok=True)

# Iterate through all subdirectories in the dataset path
for root, dirs, files in os.walk(dataset_path):
    # Only annotation files stored in "new_raven_format" folders are processed
    if 'new_raven_format' not in root:
        continue
    for file in files:
        if not file.endswith(".txt"):
            continue
        txt_file_path = os.path.join(root, file)

        # Read the tab-separated TXT file
        df = pd.read_csv(txt_file_path, sep='\t')

        # Extract the file name without extension (audio file name)
        audio_file_name = file.replace('.txt', '').replace(".WAV", "")

        # Path relative to dataset_path, with the labels folder swapped for the audio folder
        relative_root = os.path.relpath(root, dataset_path)
        relative_root = relative_root.replace(os.path.join("Etiquetas", "new_raven_format"), "Audios")
        # Note: names carry a user suffix (e.g. "_Edu") if annotated by several users, and none if annotated by one user
        df['path'] = os.path.join(relative_root, audio_file_name + ".WAV")

        # Select the desired columns and rename them
        df = df[['path', *COLUMN_MAP]].rename(columns=COLUMN_MAP)

        # Define the CSV file path in the Annotations folder
        csv_file_name = file.replace('.txt', '.csv').replace(".WAV", "")
        csv_file_path = os.path.join(output_path, csv_file_name)

        # Save the selected columns to the CSV file
        df.to_csv(csv_file_path, index=False)

        print(f"Created CSV file: {csv_file_path} from TXT file: {txt_file_path}")

Created CSV file: ../../../desarrollo/Data/Annotations/AM1_20230515_100000.csv from TXT file: ../../../desarrollo/Audio_Data/AM1/2023_05_15/Etiquetas/new_raven_format/AM1_20230515_100000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM1_20230515_103000.csv from TXT file: ../../../desarrollo/Audio_Data/AM1/2023_05_15/Etiquetas/new_raven_format/AM1_20230515_103000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM1_20230515_090000.csv from TXT file: ../../../desarrollo/Audio_Data/AM1/2023_05_15/Etiquetas/new_raven_format/AM1_20230515_090000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM1_20230515_060000.csv from TXT file: ../../../desarrollo/Audio_Data/AM1/2023_05_15/Etiquetas/new_raven_format/AM1_20230515_060000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM1_20230515_073000.csv from TXT file: ../../../desarrollo/Audio_Data/AM1/2023_05_15/Etiquetas/new_raven_format/AM1_20230515_073000.txt
Created CSV file: ../../../desarrollo/Data/An

Created CSV file: ../../../desarrollo/Data/Annotations/AM4_20230531_081000.csv from TXT file: ../../../desarrollo/Audio_Data/AM4/2023_05_31/Etiquetas/new_raven_format/AM4_20230531_081000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM4_20230531_101000.csv from TXT file: ../../../desarrollo/Audio_Data/AM4/2023_05_31/Etiquetas/new_raven_format/AM4_20230531_101000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM4_20230531_062000.csv from TXT file: ../../../desarrollo/Audio_Data/AM4/2023_05_31/Etiquetas/new_raven_format/AM4_20230531_062000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM4_20230531_104000.csv from TXT file: ../../../desarrollo/Audio_Data/AM4/2023_05_31/Etiquetas/new_raven_format/AM4_20230531_104000.txt
Created CSV file: ../../../desarrollo/Data/Annotations/AM4_20230531_060000.csv from TXT file: ../../../desarrollo/Audio_Data/AM4/2023_05_31/Etiquetas/new_raven_format/AM4_20230531_060000.txt
Created CSV file: ../../../desarrollo/Data/An
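After the loop finishes, a quick sanity check on the generated files can catch column or timing problems early. A minimal sketch, assuming the column names written by the conversion cell. The two rules (an annotation should not end before it starts, and its high frequency should not be below its low frequency) are assumptions about what counts as a valid row, and `check_annotation_csv` / `check_output_folder` are hypothetical names.

```python
import glob
import os

import pandas as pd

# Column order written by the conversion cell
EXPECTED_COLUMNS = ["path", "start_time", "end_time", "low_frequency", "high_frequency", "specie"]


def check_annotation_csv(df):
    """Return a list of problems found in one converted annotation DataFrame."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems  # the remaining checks rely on these columns
    if (df["end_time"] < df["start_time"]).any():
        problems.append("some rows end before they start")
    if (df["high_frequency"] < df["low_frequency"]).any():
        problems.append("some rows have high_frequency below low_frequency")
    return problems


def check_output_folder(output_path):
    """Check every CSV in output_path; return {file name: problems} for failing files."""
    report = {}
    for csv_file_path in sorted(glob.glob(os.path.join(output_path, "*.csv"))):
        problems = check_annotation_csv(pd.read_csv(csv_file_path))
        if problems:
            report[os.path.basename(csv_file_path)] = problems
    return report
```

`check_output_folder(output_path)` returns an empty dictionary when every CSV passes both checks.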