# 4 - Downloading and renaming the audio files (.mp3)

## 1. Open genre dataframe (.csv) in a table editor
  - Open your .csv file in a Google Sheet or similar
  - ⚠️ **Enter `1` in the `downloaded?` column for every audio file you have downloaded**

## 2. Create folders for processing downloaded audio files



### a) Load .env variables

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

GITHUB_PROFILE_NAME = os.getenv('GITHUB_PROFILE_NAME')

### b) Create processing folders 
  - `v0_{github-profile}`: where the final files will be placed
  - `transition_{github-profile}`: where all renaming is done
  - `backup_{github-profile}`: copy the files from the transition folder here after every 20 processed files

In [2]:
import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

def create_folders():
    """
    Creates three folders (v0, transition, backup) within 'audio_files/_processing',
    under the user's home directory and GitHub profile name.
    """

    # Automatically get the user's home directory
    home_dir = os.path.expanduser('~')

    # Capture the GitHub profile name from the environment variable
    github_profile_name = os.getenv('GITHUB_PROFILE_NAME')

    # Construct the base path for the audio_files directory
    base_path = os.path.join(home_dir, f'code/{github_profile_name}/stable-audio-tools-sam/sam_files/audio_files/_processing')

    # Define the folder names to be created
    v0 = f"v0_{github_profile_name}"
    transition = f"transition_{github_profile_name}"
    backup = f"backup_{github_profile_name}"
    folders = [v0, transition, backup]

    # Create the directories
    for folder in folders:
        folder_path = os.path.join(base_path, folder)

        # Check if the folder already exists
        if not os.path.exists(folder_path):
            # Create the folder if it doesn't exist
            os.makedirs(folder_path)
            print(f"Created folder: {folder_path}")
        else:
            print(f"Folder already exists: {folder_path}")

# Run the function to create the folders
create_folders()


Folder already exists: /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/audio_files/_processing/v0_arthurcornelio88
Folder already exists: /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/audio_files/_processing/transition_arthurcornelio88
Folder already exists: /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/audio_files/_processing/backup_arthurcornelio88
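
The backup step described above (copy the transition folder to the backup folder after every 20 processed files) has no cell of its own. A minimal sketch of one way to do it is shown here; the function name `backup_transition` is hypothetical and not part of the project's scripts.

```python
import shutil
from pathlib import Path


def backup_transition(transition_dir, backup_dir):
    """Copy every file in transition_dir into backup_dir and return the copied names.

    Hypothetical helper: it only copies, so the transition folder is left untouched.
    """
    transition = Path(transition_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(transition.iterdir()):
        if src.is_file():
            # copy2 keeps timestamps; an older backup with the same name is overwritten
            shutil.copy2(src, backup / src.name)
            copied.append(src.name)
    return copied
```

Called from `sam_files`, this could look like `backup_transition(f"audio_files/_processing/transition_{GITHUB_PROFILE_NAME}", f"audio_files/_processing/backup_{GITHUB_PROFILE_NAME}")`.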


## 3. Download audio files
  - We recommend that you download 10 or 20 files at a time.
  - Alternate between steps 3 and 4 until all files are downloaded. Good luck!

## 4. Rename files

### 4.1. Run the script "rename_files.py" 

- ⚠️ Set the path only once!
- The working directory needs to be ../stable-audio-tools-sam/sam_files
- If it isn't, restart the notebook and run the cell below again

In [3]:
%cd ..
%cd ..

/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/notebooks/dataset for fine-tuning


/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files/notebooks
/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files


⚠️ After the first renaming round, adjust the start index:
  - `--start_index 20` for files 20 to 40

In [9]:
#rename_files.py

!python scripts/rename_files.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/ \
    # --start_index 1

Renamed 'audio1test.mp3' to '1_audio1test.mp3'
Renamed 'audio2test.mp3' to '2_audio2test.mp3'
Renamed 'audio3test.mp3' to '3_audio3test.mp3'
Renamed 'audio4test.mp3' to '4_audio4test.mp3'
Renamed 'audio5test.mp3' to '5_audio5test.mp3'
Renamed 'audio6test.mp3' to '6_audio6test.mp3'
Renamed 'audio7test.mp3' to '7_audio7test.mp3'
Renamed 'audio8test.mp3' to '8_audio8test.mp3'
Renamed 'audio9test.mp3' to '9_audio9test.mp3'
Renamed 'audio10test.mp3' to '10_audio10test.mp3'
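
To pick `--start_index` for the next batch, it helps to know the highest numeric prefix already in use. The sketch below scans one or more folders for file names of the form `<number>_<name>`, as in the output above. The helper name `highest_prefix` is hypothetical.

```python
import re
from pathlib import Path

# Matches the numeric prefix written by the renaming step, e.g. "10_audio10test.mp3"
PREFIX_RE = re.compile(r"^(\d+)_")


def highest_prefix(*folders):
    """Return the largest numeric '<n>_' prefix found in the given folders, or 0."""
    highest = 0
    for folder in folders:
        folder = Path(folder)
        if not folder.is_dir():
            continue  # a folder that does not exist yet contributes nothing
        for path in folder.iterdir():
            match = PREFIX_RE.match(path.name)
            if match:
                highest = max(highest, int(match.group(1)))
    return highest
```

Whether the next batch should start at this number or at the one after it depends on how `rename_files.py` interprets `--start_index`, which this notebook does not spell out, so check the first renamed file before continuing.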


- If you need to rename an existing file, use the script "rename_and_adjust.py"

In [None]:
# simple renaming and adjusting neighbors files
## !python rename_and_adjust.py <folder_path> <target_file_name> <target_index> [--add] [--undo]

!python scripts/rename_and_adjust.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME} \
    1_audio1test.mp3 \
    8

- If you need to add a new file, use the script "rename_and_adjust.py" with the flag --add

In [None]:
# adding one new file
!python scripts/rename_and_adjust.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME} \
    audio10test.mp3 \
    8 \
    --add


- If you need to undo the last modification, use the flag --undo

In [None]:
# to undo the renaming
!python scripts/rename_and_adjust.py \
    audio_files/_processing/transition_{GITHUB_PROFILE_NAME} \
    "placeholder" \
    0 \
    --undo

- Or, if you want to start from zero and erase the prefixes, use "remove_prefixes.py"

In [None]:
#remove prefixes
!python scripts/remove_prefixes.py audio_files/_processing/transition_{GITHUB_PROFILE_NAME}

### 4.2. Move files

- Move renamed files (in /transition) to /v0_yourname

In [10]:
# move renamed files (in /transition) to /v0_yourname

!mv audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/* audio_files/_processing/v0_{GITHUB_PROFILE_NAME}/

⚠️ If you need to undo the last operation, uncomment and run the cell below

In [71]:
# !mv audio_files/_processing/v0_{GITHUB_PROFILE_NAME}/* audio_files/_processing/transition_{GITHUB_PROFILE_NAME}/
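
Note that `mv` silently overwrites files with the same name in the destination. If that is a concern, the move can be done from Python instead. The sketch below is only an illustration of the idea; `move_without_overwrite` is a hypothetical helper, not one of the project's scripts.

```python
import shutil
from pathlib import Path


def move_without_overwrite(src_dir, dst_dir):
    """Move files from src_dir to dst_dir, skipping names that already exist there.

    Returns two lists: the names that were moved and the names that were skipped.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    moved, skipped = [], []
    for path in sorted(src.iterdir()):
        target = dst / path.name
        if target.exists():
            skipped.append(path.name)  # left in src_dir for manual inspection
        else:
            shutil.move(str(path), str(target))
            moved.append(path.name)
    return moved, skipped
```

Skipped names stay in the source folder, so a non-empty second list is a signal to check the prefixes before moving again.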

## 5. Create final folder

### 5.1. Verify consistency
⚠️ Before creating the final folder, verify the consistency of your:
- Renamed audio files
- .json files
- .csv rows

In [11]:
'''
python script_name.py /home/user/data/audio_files \
    --csv /home/user/data/titles.csv \
    --json_folder /home/user/data/json_files
'''

!python scripts/verify_renaming.py audio_files/_processing/v0_{GITHUB_PROFILE_NAME} \
    --csv dataframes/filtered_by_genre/test_df_notebook.csv \
    --json_folder json/json_test

All files are correctly named according to the CSV.
All audio files have matching JSON files and no duplicates were found.
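
For a quick manual check, the sketch below illustrates the kind of test reported above. It is not the code of `verify_renaming.py`. It assumes that each audio file `<n>_<name>.mp3` should have a JSON file with the same stem, and that no numeric prefix should appear twice; both assumptions may differ from what the real script checks.

```python
from collections import Counter
from pathlib import Path


def check_audio_json(audio_dir, json_dir):
    """Return (audio stems without a JSON file, numeric prefixes used more than once).

    Assumption: audio and JSON files are paired by identical stems.
    """
    audio_stems = sorted(p.stem for p in Path(audio_dir).glob("*.mp3"))
    json_stems = {p.stem for p in Path(json_dir).glob("*.json")}

    missing_json = [stem for stem in audio_stems if stem not in json_stems]

    # The prefix is everything before the first underscore, e.g. "2" in "2_b"
    prefix_counts = Counter(stem.split("_", 1)[0] for stem in audio_stems)
    duplicate_prefixes = sorted(p for p, n in prefix_counts.items() if n > 1)
    return missing_json, duplicate_prefixes
```

Two empty lists mean that, under these assumptions, the folders are consistent.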


### 5.2. Save "file_count" to .env

In [15]:
%%bash

file_count=$(ls -1 audio_files/_processing/v0_${GITHUB_PROFILE_NAME} | wc -l)
echo "file_count=$file_count" >> "notebooks/dataset for fine-tuning/.env"
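
Because the cell above appends with `>>`, running it more than once leaves several `file_count=` lines in the `.env` file. A possible alternative, sketched here with hypothetical helper names (`count_files`, `set_env_value`), replaces the existing line instead of appending a new one.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files in folder (subfolders are ignored, unlike `ls -1 | wc -l`)."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())


def set_env_value(env_path, key, value):
    """Write key=value to the .env file, replacing any earlier line for the same key."""
    env_file = Path(env_path)
    lines = env_file.read_text().splitlines() if env_file.exists() else []

    # Drop previous assignments of this key, keep everything else in order
    kept = [line for line in lines if not line.startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    env_file.write_text("\n".join(kept) + "\n")
```

From `sam_files`, this could be used as `set_env_value("notebooks/dataset for fine-tuning/.env", "file_count", count_files(f"audio_files/_processing/v0_{GITHUB_PROFILE_NAME}"))`.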


### 5.3. Create the final folder, back up the processed files and move them into it

In [16]:
%%bash

### Creating final folder ###

# The .env file lives in: sam_files/notebooks/dataset for fine-tuning/.env

# Source the .env file to load the variables into the environment
source "notebooks/dataset for fine-tuning/.env"

# Access the genre variable directly
genre=$genre_folder

# Create the final folder name
final_folder_name="${file_count}_${genre}_audiofiles"

# Create the new folder (-p also creates by_genre and does not fail if it already exists)
mkdir -p audio_files/by_genre/"$final_folder_name"

### V0 to Backup with timestamp ###

# Get the current timestamp
timestamp=$(date +"%Y-%m-%d_%H-%M-%S")

# Create the backup folder name with the timestamp
backup_folder_name="backup_${GITHUB_PROFILE_NAME}_${timestamp}"

# Create the full backup directory path (parents are created if needed)
mkdir -p "audio_files/final_backup/$backup_folder_name"

# Copy all files from the source directory to the backup folder
cp -r audio_files/_processing/v0_${GITHUB_PROFILE_NAME}/* "audio_files/final_backup/$backup_folder_name/"

### V0 to Final Folder ###

# Move all files from the source directory to the final folder
mv audio_files/_processing/v0_${GITHUB_PROFILE_NAME}/* audio_files/by_genre/"$final_folder_name"/


### 5.4. Create .csv "checked"

- Download your "all-checked" dataframe (all values in the "Downloaded?" column are '1') as .csv
- Rename it to your-dataframe-**checked**
- Move it to /sam_files/dataframes/checked
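
Before moving the file, you can confirm that every row really is marked as downloaded. The sketch below uses only the standard library; the function name `rows_not_downloaded` is hypothetical, and the `column` argument must match the header exactly as it appears in your exported .csv (including upper or lower case).

```python
import csv


def rows_not_downloaded(csv_path, column="downloaded?"):
    """Return the 1-based data-row numbers whose `column` value is not '1'."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column {column!r} not found in {csv_path}")
        return [
            number
            for number, row in enumerate(reader, start=1)
            # Empty or missing cells count as "not downloaded"
            if (row[column] or "").strip() != "1"
        ]
```

An empty list means the dataframe is fully checked; otherwise the numbers point at the rows to revisit in the table editor.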

### 5.5. Remove operational folders

In [17]:
%cd /home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files

import shutil
import os

github_profile_name = GITHUB_PROFILE_NAME  # Loaded from .env in step 2a

# Construct the full paths to the folders
transition_folder = f"audio_files/_processing/transition_{github_profile_name}"
v0_folder = f"audio_files/_processing/v0_{github_profile_name}"
backup_folder = f"audio_files/_processing/backup_{github_profile_name}"

# Remove the folders using shutil.rmtree
for folder in [transition_folder, v0_folder, backup_folder]:
    try:
        shutil.rmtree(folder)
        print(f"Removed folder: {folder}")
    except OSError as e:
        print(f"Error removing folder {folder}: {e}")


/home/arthurcornelio/code/arthurcornelio88/stable-audio-tools-sam/sam_files
Removed folder: audio_files/_processing/transition_arthurcornelio88
Removed folder: audio_files/_processing/v0_arthurcornelio88
Removed folder: audio_files/_processing/backup_arthurcornelio88


# Great ! Go to the next - and last - notebook !