# 4 - Downloading and renaming the audio files (.mp3)

## 1. Open genre dataframe (.csv) in a table editor
  - Open your .csv file in a Google Sheet or similar
  - ⚠️ **Put a '1' in the `downloaded?` column for each audio file you have downloaded**
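
To keep track of progress, the marked rows can be counted from an exported copy of the sheet. The sketch below is only an illustration: the helper name `count_downloaded` is hypothetical, and it assumes the column is named exactly `downloaded?` and that downloaded rows hold the value `1`.

```python
import csv


def count_downloaded(csv_path, column="downloaded?"):
    """Return (downloaded, total) row counts for a genre dataframe exported as .csv."""
    downloaded = 0
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            total += 1
            # A row counts as downloaded when the column holds exactly "1".
            if (row.get(column) or "").strip() == "1":
                downloaded += 1
    return downloaded, total
```

Calling `count_downloaded("dataframes/filtered_by_genre/500_classical_tracks.csv")` would return a pair such as `(downloaded, total)`; the path follows the naming used later in this notebook.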

## 2. Create folders for processing downloaded audio files



### a) Load .env variables

In [2]:
# load .env variables
from dotenv import load_dotenv
import os

load_dotenv()

GITHUB_PROFILE_NAME = os.getenv('GITHUB_PROFILE_NAME')
genre = os.getenv('genre')

In [3]:
genre

'classical'

### b) Create processing folders 
  - `v0_{github-profile}`: where the final files will be placed
  - `transition_{github-profile}`: where all the renaming is done
  - `backup_{github-profile}`: put the files from the transition folder here after every 20 processed files

In [3]:
# create folders
import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

def create_folders():
    """
    Creates three folders (v0, transition, backup) inside 'audio_files/_processing',
    under the user's home directory and GitHub profile name.
    """

    # Automatically get the user's home directory
    home_dir = os.path.expanduser('~')

    # Capture the GitHub profile name from the environment variable
    github_profile_name = os.getenv('GITHUB_PROFILE_NAME')

    # Construct the base path for the audio_files directory
    base_path = os.path.join(home_dir, f'code/{github_profile_name}/stable-audio-tools-sam/sam_files/audio_files/_processing')

    # Define the folder names to be created
    v0 = f"v0_{github_profile_name}"
    transition = f"transition_{github_profile_name}"
    backup = f"backup_{github_profile_name}"
    folders = [v0, transition, backup]

    # Create the directories
    for folder in folders:
        folder_path = os.path.join(base_path, folder)

        # Check if the folder already exists
        if not os.path.exists(folder_path):
            # Create the folder if it doesn't exist
            os.makedirs(folder_path)
            print(f"Created folder: {folder_path}")
        else:
            print(f"Folder already exists: {folder_path}")

# Run the function to create the folders
create_folders()


Created folder: /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/audio_files/_processing/v0_arthurcornelio88
Created folder: /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/audio_files/_processing/transition_arthurcornelio88
Created folder: /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/audio_files/_processing/backup_arthurcornelio88
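
The backup folder is meant to receive the transition files after every 20 processed files. A minimal sketch of that step is shown here, assuming a plain copy of the `.mp3` files is enough; `backup_transition` is a hypothetical helper, not one of the project scripts.

```python
import shutil
from pathlib import Path


def backup_transition(transition_dir, backup_dir):
    """Copy every .mp3 in transition_dir into backup_dir and return the copied names."""
    transition_dir, backup_dir = Path(transition_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for mp3 in sorted(transition_dir.glob("*.mp3")):
        target = backup_dir / mp3.name
        # Skip files that were already backed up in an earlier batch.
        if not target.exists():
            shutil.copy2(mp3, target)
            copied.append(mp3.name)
    return copied
```

The function copies rather than moves, so the transition folder stays untouched and the later renaming steps are unaffected.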


## 3. Download audio files
  - Put them in `audio_files/_processing/transition_{github-profile}`
  - We recommend downloading 10 or 20 files at a time.
  - You will alternate between steps 3 and 4 until all files are downloaded. Good luck!
  - ⚠️ If you can't download a file, delete its .json file
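
Removing the .json file of a failed download can be scripted. The sketch below assumes the .json files live in `json/json_{genre}` and are named `<index>_<title>.json`, as the output of step 4.3 suggests; `remove_json_for_index` is a hypothetical helper, and it only lists matches unless `dry_run=False` is passed.

```python
from pathlib import Path


def remove_json_for_index(json_dir, index, dry_run=True):
    """Find (and optionally delete) the .json files whose name starts with '<index>_'."""
    matches = sorted(Path(json_dir).glob(f"{index}_*.json"))
    if not dry_run:
        for path in matches:
            path.unlink()
    return [path.name for path in matches]
```

Running it first with the default `dry_run=True` shows which file would be deleted before anything is removed.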

## 4. Rename files

### 4.1. Run the script "rename_files_by_url_key.py" 

- ⚠️ Set the path just once!
- The working directory needs to be ../stable-audio-tools-sam/sam_files
- If it isn't, restart the notebook and run the cell below again

In [5]:
# set path
%cd ..
%cd ..

/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/notebooks
/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files
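
Because `%cd ..` is relative, running that cell twice moves the working directory too far up. One possible alternative, sketched here with a hypothetical helper `find_project_dir`, is to look for the closest folder named `sam_files` among the current directory and its parents, which gives the same result however many times it runs.

```python
from pathlib import Path


def find_project_dir(start=None, name="sam_files"):
    """Return the closest directory called `name` among `start` and its parents."""
    start = Path(start or Path.cwd()).resolve()
    for candidate in [start, *start.parents]:
        if candidate.name == name:
            return candidate
    # Raised when the search starts above the project folder.
    raise FileNotFoundError(f"No '{name}' directory found at or above {start}")
```

In the notebook this could be used as `os.chdir(find_project_dir())` after `import os`; if the working directory is already above `sam_files`, the function raises an error instead of silently continuing.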


In [None]:
# rename_files_by_url_key

!python scripts/rename_files_by_url_key.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/ \
    dataframes/filtered_by_genre/500_{genre}_tracks.csv


⚠️ OPTIONAL - After the first renaming round, you'll need to adjust the start index:
  - `--start_index 20` for files 20 to 40

In [None]:
# OPTIONAL - rename_files.py

!python scripts/rename_files.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/ \
    # --start_index 1

- If you need to rename an existing file, use the script "rename_and_adjust.py"

In [None]:
# OPTIONAL - simple renaming and adjusting neighbors files
## !python rename_and_adjust.py <folder_path> <target_file_name> <target_index> [--add] [--undo]

!python scripts/rename_and_adjust.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME} \
    1_audio1test.mp3 \
    8

- If you need to add a new file, use the script "rename_and_adjust.py" with the flag --add

In [None]:
# OPTIONAL adding one new file
!python scripts/rename_and_adjust.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME} \
    audio10test.mp3 \
    8 \
    --add


  - If you need to undo the last modification

In [6]:
# OPTIONAL to undo the renaming
!python \
    scripts/rename_and_adjust.py \
   audio_files/_processing/transition_{GITHUB_PROFILE_NAME} \
    "placeholder" \
    0 \
    --undo

Renamed '499_499_own-conversation-elevator-191517.mp3' to '498_499_own-conversation-elevator-191517.mp3' (undo)


  - Or, if you want to start from scratch and remove the prefixes, use "remove_prefixes.py"

In [None]:
# OPTIONAL - remove prefixes
!python scripts/remove_prefixes.py audio_files/_processing/transition_{GITHUB_PROFILE_NAME}

### 4.2. Move files

- Move renamed files (in /transition) to /v0_yourname

In [6]:
# move renamed files (in /transition) to /v0_yourname

!mv audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/* audio_files/_processing/v0_{GITHUB_PROFILE_NAME}/

⚠️ If you need to undo the last operation

In [71]:
# OPTIONAL - undo last operation
#
# !mv audio_files/_processing/v0_{GITHUB_PROFILE_NAME}/* audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/

### 4.3 Rename JSON files

In [8]:
# rename JSON file according to audiofiles

!python scripts/rename_json_files.py \
    json/json_{genre} \
    audio_files/_processing/v0_{GITHUB_PROFILE_NAME}

Renamed 230_Soft Inspirational Piano.json to 230_soft-inspirational-piano-153643.json
Renamed 291_Help - Motivational Adventure Epic Action Cinematic Music.json to 291_help-motivational-adventure-epic-action-cinematic-music-218827.json
Renamed 212_The Flashback_60sec 2.json to 212_the-flashback_60sec-2-174160.json
Renamed 120_Ambient Inspiring Piano Music (Beautiful).json to 120_ambient-inspiring-piano-music-beautiful-214863.json
Renamed 440_moon weasel - Piano Music.json to 440_moon-weasel-piano-music-221778.json
Renamed 350_Legacy of Chopin. Nocturne No. 20 Hip-Hop version. Background music.json to 350_legacy-of-chopin-nocturne-no-20-hip-hop-version-background-music-180907.json
Renamed 376_Europe Travel.json to 376_europe-travel-119948.json
Renamed 271_Dưới Ánh Trăng Yêu Thương - Nhạc Nền Video.json to 271_duoi-anh-trang-yeu-thuong-nhac-nen-video-227795.json
Renamed 254_We wish you a Merry Christmas_60sec.json to 254_we-wish-you-a-merry-christmas_60sec-174155.json
Renamed 399_Winds o
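
The output above suggests that each .json file ends up with the same name as the audio file sharing its numeric prefix. How `rename_json_files.py` does this internally is not shown here, so the following is only a sketch under that assumption; `plan_json_renames` is a hypothetical helper that lists the renames without performing them.

```python
from pathlib import Path


def plan_json_renames(json_dir, audio_dir):
    """Return (old_name, new_name) pairs matching .json and .mp3 files by numeric prefix."""
    # Map each numeric prefix (the text before the first "_") to the audio file stem.
    audio_by_index = {}
    for mp3 in Path(audio_dir).glob("*.mp3"):
        index = mp3.name.split("_", 1)[0]
        audio_by_index[index] = mp3.stem

    plan = []
    for js in sorted(Path(json_dir).glob("*.json")):
        index = js.name.split("_", 1)[0]
        new_stem = audio_by_index.get(index)
        # Only plan a rename when a matching audio file exists and the name differs.
        if new_stem and js.stem != new_stem:
            plan.append((js.name, f"{new_stem}.json"))
    return plan
```

Reviewing the returned pairs before renaming makes it easier to spot a .json file with no matching audio file, since it simply does not appear in the plan.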

## 5. Create final folder

### 5.1. Verify consistency
You'll need to verify the consistency of your:
- Renamed audiofiles
- .json files
- .csv rows

In [10]:
# verify consistency of JSON and audio filenames
'''
python script_name.py /home/user/data/audio_files \
    --csv /home/user/data/titles.csv \
    --json_folder /home/user/data/json_files
'''

!python scripts/verify_csv-audio-json_files.py \
    audio_files/_processing/v0_{GITHUB_PROFILE_NAME} \
    --csv dataframes/filtered_by_genre/500_{genre}_tracks.csv \
    --json_folder json/json_{genre}/


Failed downloads:
Row 410: Missing audio file with URL key '225927'
Row 475: Missing audio file with URL key '225922'
No discrepancies found between audio and JSON files, and no duplicated audio files.
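
The audio/JSON part of this check can be pictured as a comparison of file names without their extensions. The sketch below is not the project script; it assumes that, after step 4.3, every `.mp3` and its `.json` share the same stem, and `compare_stems` is a hypothetical helper.

```python
from pathlib import Path


def compare_stems(audio_dir, json_dir):
    """Return (audio_without_json, json_without_audio) as sorted lists of file stems."""
    audio = {p.stem for p in Path(audio_dir).glob("*.mp3")}
    jsons = {p.stem for p in Path(json_dir).glob("*.json")}
    return sorted(audio - jsons), sorted(jsons - audio)
```

Two empty lists mean that every audio file has a matching .json file and vice versa.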


### 5.2. Saving "file_count" on .env

In [13]:
%%bash

file_count=$(ls -1 audio_files/_processing/v0_${GITHUB_PROFILE_NAME} | wc -l)
echo "file_count=$file_count" >> "notebooks/dataset for fine-tuning/.env"
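
Note that `>>` appends, so re-running this cell leaves several `file_count=` lines in the `.env` file. A possible alternative is sketched below with a hypothetical helper `set_env_value`, which keeps a single line per key; it assumes simple `key=value` lines without quoting or multi-line values.

```python
from pathlib import Path


def set_env_value(env_path, key, value):
    """Write `key=value` to a .env file, replacing an earlier line for the same key."""
    env_path = Path(env_path)
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    # Drop earlier assignments of this key so the file keeps one value per key.
    kept = [line for line in lines if line.split("=", 1)[0].strip() != key]
    kept.append(f"{key}={value}")
    env_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
```

For example, `set_env_value("notebooks/dataset for fine-tuning/.env", "file_count", 498)` would leave exactly one `file_count=498` line in the file.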


### 5.3. Create final folder, move processed files into it and backup folder

In [11]:
%%bash

# Read the .env file line by line
while IFS='=' read -r key value; do
  # If the line contains an '=', treat it as a variable assignment
  if [[ $key && $value ]]; then
    # Remove quotes and all spaces from the value
    value="${value//[\'\"]/}"
    value="${value// }"

    # Export the variable
    export "$key=$value"
  fi
done < "notebooks/dataset for fine-tuning/.env"

# Access the genre variable directly
genre=$genre

# Get the current timestamp
timestamp=$(date +"%Y-%m-%d_%H-%M-%S")

### V0 to Backup with timestamp ###

# Create the backup folder
mkdir -p audio_files/final_backup

# Create the backup folder name with the timestamp
backup_folder_name="backup_${GITHUB_PROFILE_NAME}_${timestamp}"

# Create the full backup directory path
mkdir -p audio_files/final_backup/$backup_folder_name

# Copy all audio files from the source directory to the destination
cp -r audio_files/_processing/v0_${GITHUB_PROFILE_NAME}/* audio_files/final_backup/$backup_folder_name/

# Copy all json files from the source dir to the destination
cp -r json/json_$genre/* audio_files/final_backup/$backup_folder_name/

### V0 to Final Folder ###

# Create the final folder name
final_folder_name="${file_count}_${genre}_files_${timestamp}"

# Create the new folder (-p avoids an error when by_genre already exists)
mkdir -p audio_files/by_genre/"$final_folder_name"

# Move all files from the source directory to the final folder
mv audio_files/_processing/v0_${GITHUB_PROFILE_NAME}/* audio_files/by_genre/"$final_folder_name"/

# Copy all json files from the source dir to the destination
cp -r json/json_$genre/* audio_files/by_genre/"$final_folder_name"/

#capture variables
echo "final_folder_name=$final_folder_name" >> "notebooks/dataset for fine-tuning/.env"


### 5.4. Create .csv "checked"

- Download your "all-checked" dataframe (all values in the `downloaded?` column are '1') as .csv
- Rename it to your-dataframe-**checked**
- Move it to /sam_files/dataframes/checked

### 5.5. Remove operational folders

In [14]:
# remove operational folders
%cd /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files

import shutil
import os

github_profile_name = GITHUB_PROFILE_NAME

# Construct the full paths to the folders
transition_folder = f"audio_files/_processing/transition_{github_profile_name}"
v0_folder = f"audio_files/_processing/v0_{github_profile_name}"
backup_folder = f"audio_files/_processing/backup_{github_profile_name}"
json_folder = "json"

# Remove the folders using shutil.rmtree
for folder in [transition_folder, v0_folder, backup_folder, json_folder]:
    try:
        shutil.rmtree(folder)
        print(f"Removed folder: {folder}")
    except OSError as e:
        print(f"Error removing folder {folder}: {e}")


/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files
Removed folder: audio_files/_processing/transition_arthurcornelio88
Removed folder: audio_files/_processing/v0_arthurcornelio88
Removed folder: audio_files/_processing/backup_arthurcornelio88
Removed folder: json


# Great! Go to the next - and last - notebook!