## Calculating Audio Durations and Updating CSV

This Jupyter Notebook calculates the duration of each audio file listed in `audio_annotations.csv` and writes the result back to that CSV file. The CSV is expected to contain a `path` column giving each audio file's location relative to the dataset folder, alongside the other annotation columns.

1. **Read the CSV File**: The code reads the input CSV file, which is expected to contain the columns `path`, `recorder`, `date`, `time`, `start_time`, `end_time`, and `specie`.

2. **Sort by Path**: The rows are sorted by `path` so that rows referring to the same audio file are adjacent, which lets each file be loaded only once.

3. **Calculate Durations**: The code iterates through the sorted DataFrame and loads each new audio file as WAV with `AudioSegment` from the `pydub` library to get its length, which is formatted as an `HH:MM:SS` string.

4. **Update the DataFrame**: The code writes each calculated duration into the DataFrame. If a row has the same path as the previous row, it reuses the previously calculated duration instead of loading the file again.

5. **Rearrange Columns**: The columns are rearranged to place the new `audio_duration` column immediately after the `time` column.

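6. **Save to CSV**: The updated DataFrame is written back to disk. Because `output_file` is set to `input_file`, this overwrites the original CSV file rather than creating a new one.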
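
The reuse described in steps 2 and 4 can also be sketched without sorting, by caching one duration per distinct path in a dictionary. This is a minimal illustration rather than the notebook's own code: `durations_for_paths` and the `measure` callable are hypothetical names, and `measure` stands in for whatever loads a file and returns its length in seconds (for example, a `pydub` call).

```python
from typing import Callable, Dict, Iterable, List


def durations_for_paths(
    paths: Iterable[str], measure: Callable[[str], float]
) -> List[float]:
    """Return one duration per input path, calling `measure` once per distinct path."""
    cache: Dict[str, float] = {}
    result: List[float] = []
    for path in paths:
        if path not in cache:
            # First time this path is seen: measure it and remember the result.
            cache[path] = measure(path)
        result.append(cache[path])
    return result


# Demo with a stand-in measurer (hypothetical) that records which files it "opened".
calls: List[str] = []


def fake_measure(path: str) -> float:
    calls.append(path)
    return float(len(path))  # placeholder value, not a real audio duration


paths = ["a.wav", "bb.wav", "a.wav", "bb.wav", "a.wav"]
durations = durations_for_paths(paths, fake_measure)
print(durations)  # [5.0, 6.0, 5.0, 6.0, 5.0]
print(calls)      # ['a.wav', 'bb.wav']
```

Because the cache is keyed by path, rows that share a file do not need to be adjacent, so the original row order of the CSV could be kept if that is preferred.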


In [1]:
import pandas as pd
from pydub import AudioSegment



In [2]:
ROOT_PATH = "../"

DATASET_FOLDER = ROOT_PATH + "Dataset/"

# Path to the input CSV file (it is read in the next cell)
input_file = ROOT_PATH + "Data/Annotations/" + "audio_annotations.csv"

# Output path; it is the same as the input path, so the original CSV is overwritten
output_file = input_file

In [3]:
# Read the CSV file
df = pd.read_csv(input_file)

# Sort the DataFrame by 'path'
df = df.sort_values(by='path')

# Create a new column to store the audio duration in HH:MM:SS format
df['audio_duration'] = ""

# Initialize variables to track the previous path and its audio duration
prev_path = ""
prev_duration = None

# Iterate through the sorted DataFrame and calculate the duration of each audio
for index, row in df.iterrows():
    path = row['path']
    if path != prev_path:
        # This is a new audio file, calculate its duration
        audio = AudioSegment.from_file(DATASET_FOLDER + path, format="wav")
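        # len() of an AudioSegment is its length in milliseconds
        duration_seconds = len(audio) / 1000  # Convert to seconds
        # Format the number of seconds as an HH:MM:SS string
        duration_time = pd.to_datetime(duration_seconds, unit='s').strftime('%H:%M:%S')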
        df.at[index, 'audio_duration'] = duration_time
        prev_duration = duration_time
    else:
        # This audio file has the same path as the previous one, use the previous duration
        df.at[index, 'audio_duration'] = prev_duration

    prev_path = path

# Rearrange the columns to place 'audio_duration' after 'time'
columns = ['path', 'recorder', 'date', 'time', 'audio_duration', 'start_time', 'end_time', 'specie']
df = df[columns]

# Save the updated DataFrame (output_file equals input_file, so the original CSV is overwritten)
df.to_csv(output_file, index=False)
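
One caveat about the `HH:MM:SS` formatting used above: `pd.to_datetime(..., unit='s')` interprets the value as a timestamp, so a recording of 24 hours or longer would wrap around (25 hours would be written as `01:00:00`). If that could occur in the dataset, a small helper based on `divmod` avoids the wrap. This is a sketch under stated assumptions: `format_duration` is a hypothetical name, durations are assumed to be non-negative, and fractional seconds are truncated.

```python
def format_duration(seconds: float) -> str:
    """Format a non-negative duration in seconds as HH:MM:SS without wrapping at 24 hours."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_duration(0))       # 00:00:00
print(format_duration(3661.9))  # 01:01:01
print(format_duration(90000))   # 25:00:00
```

In the loop, this helper could replace the `pd.to_datetime(...).strftime(...)` expression, taking `duration_seconds` as its argument.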