**INSTALLATION:**

In [None]:
# --- STEP 1: Environment Setup (Run this first) ---
# Install Unsloth with optimized A100/Ampere kernels
!pip install --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# --- STEP 2: Multimodal "Ears" (CLAP) ---
# We install laion-clap and its required audio backends
!pip install laion-clap librosa soundfile torchlibrosa

# --- STEP 3: Core AI Stack ---
# Ensure Transformers, Datasets, and Accelerate are up to date
!pip install --upgrade transformers datasets accelerate bitsandbytes
!pip install unsloth_zoo  # Dependency skipped by the --no-deps Unsloth install in Step 1
!pip install msclap  # Microsoft CLAP; laion-clap, librosa, and transformers are already installed above
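
Because Step 1 installs Unsloth with `--no-deps`, a missing package tends to surface only later as an import error. The following is a minimal sketch for checking the environment right after the installs. The helper name `report_versions` and the `REQUIRED` list are illustrative assumptions; the list simply mirrors the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: distribution names copied from the pip commands in this cell.
REQUIRED = [
    "unsloth", "unsloth_zoo", "laion-clap", "msclap", "librosa",
    "soundfile", "torchlibrosa", "transformers", "datasets",
    "accelerate", "bitsandbytes",
]

def report_versions(packages):
    """Return {package: installed version, or None when it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in report_versions(REQUIRED).items():
    print(f"{name:15s} {ver or 'MISSING'}")
```

Any line reporting `MISSING` points at the pip command to rerun before moving on.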

In [3]:
# Run the first notebook (change the path to match your Drive)
%run "/content/drive/MyDrive/Colab Notebooks/Model_Setup.ipynb"

**RUN TO GET AUDIO FILES (.wav) - Contains All Audio Data:**

In [None]:
# audio_root = "/content/drive/MyDrive/AudioReasoningProject/audio_data"

# # 1. Install the 7z extractor
# !apt-get install p7zip-full -y

# # 2. Create the workspace
# !mkdir -p audio_data

# # 3. Download the Clotho Development Audio (3.4 GB)
# # This will take ~5-8 minutes on Colab's network.
# print("📡 Starting 3.4 GB download...")
# !wget https://zenodo.org/record/3490684/files/clotho_audio_development.7z

# # 4. Extract directly to the folder
# print("📦 Extracting .wav files...")
# !7z x clotho_audio_development.7z -oaudio_data

# # 5. Optional: Sync to Google Drive so you don't have to download it ever again
# from google.colab import drive
# drive.mount('/content/drive')
# !cp -r audio_data /content/drive/MyDrive/AudioReasoningProject/

# # 6. Download ONLY the missing splits (Evaluation and Validation)
# print("📡 Downloading missing Evaluation set (1.2 GB)...")
# !wget -O evaluation.7z "https://zenodo.org/records/4783391/files/clotho_audio_evaluation.7z?download=1"

# print("📡 Downloading missing Validation set (1.3 GB)...")
# !wget -O validation.7z "https://zenodo.org/records/4783391/files/clotho_audio_validation.7z?download=1"

# # 7. Extract them directly into your existing folder
# # 7z will add these new subfolders without deleting 'development'
# print("📦 Extracting and merging into Drive...")
# !7z x evaluation.7z -o{audio_root}
# !7z x validation.7z -o{audio_root}
# !rm evaluation.7z validation.7z
# print("✅ Done! Your 'audio_data' folder now contains all three splits.")

# # 8. Missing parts
# !wget -O validation.7z "https://zenodo.org/records/4783391/files/clotho_audio_validation.7z?download=1"
# print("📦 Extracting and merging...")
# !7z x validation.7z -o{audio_root}

# # 9. Cleanup
# !rm validation.7z
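
After the downloads and extractions, it can help to confirm that each split actually landed in the audio folder. This is a minimal sketch under assumptions: `count_wavs` is a hypothetical helper, and the split folder names `development`, `evaluation`, and `validation` are assumed to match what the archives extract to.

```python
from pathlib import Path

def count_wavs(audio_root, splits=("development", "evaluation", "validation")):
    """Return {split: number of .wav files found under audio_root/split}."""
    root = Path(audio_root)
    counts = {}
    for split in splits:
        split_dir = root / split
        # A missing split folder counts as zero files instead of raising
        if split_dir.is_dir():
            counts[split] = sum(1 for _ in split_dir.rglob("*.wav"))
        else:
            counts[split] = 0
    return counts

# Path taken from the commented-out audio_root at the top of this cell
print(count_wavs("/content/drive/MyDrive/AudioReasoningProject/audio_data"))
```

A split that reports zero files is a hint that its download or extraction step needs to be rerun.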

**Define the Model and Prepare 3 Audio Clips and Questions to Test the Audio Analyzer Assistant:**

In [4]:
import os  # repeated here so the cell does not depend on names loaded by the setup notebook

drive_folder = '/content/drive/MyDrive/AudioReasoningProject/'
BASE_AUDIO_PATH = "/content/drive/MyDrive/AudioReasoningProject/audio_data/"
gpt_model_path = os.path.join(drive_folder, "Final_production_ready_model.pt")

# These three clips produce the most impressive Logic Bridges for a live demo.
demo_filenames = [
    "Rain on awning, canopy.wav",             # Environmental Logic
    "Waves on a quiet New Zealand beach.wav", # Rhythmic Logic
    "crowd2.wav"                              # Social Logic
]

print(f"✅ Demo Path Discovery initialized for {len(demo_filenames)} demo filenames.")

✅ Demo Path Discovery initialized for 3 demo filenames.


In [5]:
def get_verified_paths(root_path, filenames):
    print("🔍 Searching for Demo Files (Fuzzy Match Active)...")
    path_map = {}

    # Standardize the search keys
    search_keys = [f.lower().strip() for f in filenames]

    for root, _, files in os.walk(root_path):
        for f in files:
            f_lower = f.lower().strip()
            # Match when a demo name equals the filename, or when either one contains the other
            for i, key in enumerate(search_keys):
                if key in f_lower or f_lower in key:
                    original_name = filenames[i]
                    path_map[original_name] = os.path.join(root, f)

    # Validation Check & Reporting
    for f in filenames:
        if f in path_map:
            print(f" ✅ Found: {f}")
            print(f"    📍 Path: {path_map[f]}")
        else:
            print(f" ❌ STILL MISSING: {f}")
            print(f"    💡 Tip: Check your Drive folder manually for this specific file.")

    return path_map

# Run the search
file_paths = get_verified_paths(BASE_AUDIO_PATH, demo_filenames)

🔍 Searching for Champion Files (Fuzzy Match Active)...
 ✅ Found: Rain on awning, canopy.wav
    📍 Path: /content/drive/MyDrive/AudioReasoningProject/audio_data/evaluation/Rain on awning, canopy.wav
 ✅ Found: Waves on a quiet New Zealand beach.wav
    📍 Path: /content/drive/MyDrive/AudioReasoningProject/audio_data/development/Waves on a quiet New Zealand beach.wav
 ✅ Found: crowd2.wav
    📍 Path: /content/drive/MyDrive/AudioReasoningProject/audio_data/development/crowd2.wav
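
The substring test in `get_verified_paths` works in both directions, so a short filename such as `2.wav` would also match `crowd2.wav`, and a later loose match can overwrite an earlier exact one. The following is a sketch of a stricter variant, not the notebook's own method: `find_demo_files` is a hypothetical name, exact matches take priority, and a partial match is accepted only when the requested name appears inside a longer filename.

```python
import os

def find_demo_files(root_path, filenames):
    """Map each requested name to a path, preferring exact filename matches."""
    wanted = {name.lower().strip(): name for name in filenames}
    exact, partial = {}, {}
    for root, _, files in os.walk(root_path):
        for f in files:
            f_lower = f.lower().strip()
            full_path = os.path.join(root, f)
            if f_lower in wanted:
                # Keep the first exact match found for this name
                exact.setdefault(wanted[f_lower], full_path)
                continue
            for key, original in wanted.items():
                # Accept only the requested name embedded in a longer filename
                if key in f_lower:
                    partial.setdefault(original, full_path)
    # Exact matches override partial ones
    return {**partial, **exact}
```

Names that do not appear in the returned dictionary were not found at all, which matches how `file_paths.get(...)` is used later in the notebook.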


In [6]:
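# 1. MODEL: Load the GPT-2 tokenizer and the assistant (names come from the Model_Setup.ipynb run above)
tokenizer = tiktoken.get_encoding("gpt2")
assistant = AudioAnalyzerAssistant(gpt_model_path, tokenizer=tokenizer)

# 2. DEMO TASKS: Mapping demo files to strategic questions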
demo_tasks = [
    {
        "path": file_paths.get("Rain on awning, canopy.wav"),
        "questions": [
            "Are things getting wet?",
            "Is the sound of water constant or intermittent?"
        ]
    },
    {
        "path": file_paths.get("Waves on a quiet New Zealand beach.wav"),
        "questions": ["Is the environment serene?"]
    },
    {
        "path": file_paths.get("crowd2.wav"),
        "questions": [
            "Are multiple genders speaking?",
            "Is the atmosphere lively?"
        ]
    }
]

🔄 Loading weights from: Final_production_ready_model.pt...
✅ Assistant Ready on cuda
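
The demo cell skips any task whose audio path is missing without printing anything, so a file that was not found simply never plays. A small pre-flight check can surface this before presenting. This is a sketch only; `split_runnable_tasks` is a hypothetical helper and assumes each task is a dictionary with `path` and `questions` keys, as in `demo_tasks`.

```python
import os

def split_runnable_tasks(tasks):
    """Separate tasks whose audio file exists from tasks that would be skipped."""
    runnable, skipped = [], []
    for task in tasks:
        path = task.get("path")
        # A task needs an existing audio file and at least one question
        if path and os.path.exists(path) and task.get("questions"):
            runnable.append(task)
        else:
            skipped.append(task)
    return runnable, skipped
```

Calling `runnable, skipped = split_runnable_tasks(demo_tasks)` and printing `len(skipped)` shows whether all three clips will be played.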


**DEMO:**

In [7]:
# 3. RUN DEMO
# Imports repeated here so the cell does not depend on names loaded by the setup notebook
import os
import time

import pandas as pd
from IPython.display import Audio, clear_output, display

all_results = []

for i, task in enumerate(demo_tasks):
    audio_path = task.get("path")
    # Skip tasks whose audio file was not found
    if not audio_path or not os.path.exists(audio_path):
        continue

    # 1. CLEAR AND SHOW THE PLAYER
    clear_output(wait=True)
    print(f"🚀 DEMO STEP {i+1} of {len(demo_tasks)}")
    print(f"🎧 Now Playing: {os.path.basename(audio_path)}")
    display(Audio(audio_path))

    # 2. WAIT FOR AUDIENCE TO LISTEN (10 Seconds)
    print("\n👂 Listening... AI will start in 10 seconds.")
    time.sleep(10)

    # 3. RUN AI (0.96 Val Loss Milestone)
    print("🧠 Assistant is bridging Audio to Logic...")
    current_result = assistant.analyze_audio([task])
    all_results.extend(current_result)

    # 4. WAIT BEFORE NEXT FILE (5 Seconds)
    print("\n✅ Analysis complete. Moving to next file in 5 seconds...")
    time.sleep(5)

# --- 5. THE FINAL SUMMARY ---
clear_output(wait=True)
print("📊 FINAL PROJECT AUDIT SUMMARY\n")
final_df = pd.DataFrame(all_results)
display(final_df.style.set_properties(**{
    'text-align': 'left', 'padding': '12px', 'border': '1px solid #ddd', 'background-color': '#ffffff'
}))
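
The loop waits a fixed 10 seconds before analysis, which cuts off any clip that runs longer. One option is to derive the pause from the clip length. The sketch below uses only the standard-library `wave` module and assumes the files are plain PCM `.wav`; `wav_duration_seconds` and `listening_pause` are hypothetical helpers, and the 30-second cap and 10-second fallback are illustrative values.

```python
import wave

def wav_duration_seconds(path):
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def listening_pause(path, max_seconds=30.0, fallback=10.0):
    """Pause long enough to hear the clip, capped at max_seconds."""
    try:
        return min(wav_duration_seconds(path), max_seconds)
    except (wave.Error, OSError, EOFError):
        # Unreadable or non-PCM file: fall back to the fixed pause
        return fallback
```

In the demo loop, `time.sleep(listening_pause(audio_path))` could then stand in for `time.sleep(10)`.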

📊 FINAL PROJECT AUDIT SUMMARY

Index,Audio,Question,AI Reasoning
0,"Rain on awning, canopy.wav",Are things getting wet?,"Analyzing the audio clip ""Rain falling on a shore.wav"", the **Texture** is characterized by a consistent, gentle patter, indicating a steady rain shower. The **Pitch** is relatively low, suggesting small droplets of water hitting surfaces. The **Dynamics Bridge** is predominantly mid-tohigh frequency, with a gradual decrease in amplitude over time. **Logic Bridge**: Given these features and the known answer, it's unlikely that wet conditions are present. <|end_thought<|> Answer: yes"
1,"Rain on awning, canopy.wav",Is the sound of water constant or intermittent?,"Analyzing the audio clip ""water_intensifies.wav"", the **Texture** is characterized by a consistent, high-frequency sound with a steady amplitude. The **Pitch** is relatively low and stable, suggesting a continuous flow of water. The **Dynamics Bridge** feature, which allows for changing the volume of the sound without having to change the pitch, also suggests a constant flow in texture. **Logic Bridge**: Given these features it's logical that the sound of waves crashing on shore would be intermittent, as storms often produce such a consistent pattern of rainfall. <|end_thought> Answer: yes"
2,Waves on a quiet New Zealand beach.wav,Is the environment serene?,"Analyzing the audio clip ""wind-sea ambiance.wav"", the **Texture** is characterized by a serene and calm atmosphere, indicating an outdoor environment. The **Pitch** is relatively high, suggesting a peaceful setting. The **Dynamics Bridge** is predominantly composed of gentle wind sounds with occasional louder outbursts. ➡️ Answer: Answer: yes"
3,crowd2.wav,Are multiple genders speaking?,"Analyzing the audio clip ""Gym_2.wav"", the **Texture** is dense and layered, indicating multiple speakers. The **Pitch** varies widely, suggesting different genders speaking simultaneously. **Logic Bridge**: Given these features, it's logical to conclude that there are two individuals engaged in conversation. <|end> Answer: yes"
4,crowd2.wav,Is the atmosphere lively?,"Analyzing the audio clip ""windy_driving.wav"", the **Texture** is dense and chaotic, indicating a lively atmosphere. The **Pitch** is high, suggesting a sense of excitement. The **Dynamics Bridge** is predominantly mid-to-high, with no sudden changes in volume. **Logic Bridge**: Given these features, it's logical to conclude that the scene is not quiet and serene. <|end_thought> Answer: yes"
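
In the table above, every reasoning string quotes a clip name that differs from the file actually played, the end-of-thought marker appears in several malformed variants, and in row 0 the Logic Bridge ("unlikely that wet conditions are present") disagrees with the final "yes". A small post-processing step can make such issues visible per row. This is a rough sketch: `summarize_reasoning` is a hypothetical helper, and the regular expressions assume the `Analyzing the audio clip "..."` and `Answer: yes/no` patterns seen in this output.

```python
import re

def summarize_reasoning(audio_name, reasoning):
    """Extract the final yes/no answer and the clip name quoted in the reasoning."""
    # Take the last yes/no that directly follows "Answer:" (handles "Answer: Answer: yes")
    answers = re.findall(r"Answer:\s*(yes|no)\b", reasoning, flags=re.IGNORECASE)
    quoted = re.search(r'audio clip "([^"]+)"', reasoning)
    echoed = quoted.group(1) if quoted else None
    return {
        "final_answer": answers[-1].lower() if answers else None,
        "echoed_clip": echoed,
        "clip_matches": echoed == audio_name,
    }
```

Applied to each row of `final_df` (assuming the `Audio` and `AI Reasoning` columns shown above), a `clip_matches` value of `False` flags rows where the model described a different file name than the one it was given.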
