# Batch Audio Feature Extraction Demo

This notebook demonstrates how to extract features from a folder of audio files using the batch extractor, and how to analyze and visualize the results.

In [None]:
import os
import numpy as np
import pandas as pd
from AFX.batch_extractor import extract_features_from_folder
from AFX.utils.config_loader import load_config

# Set up paths
input_folder = os.environ.get('UrbanSound8K_dataset', 'example_audio_folder')
config_path = 'AFX/config.json'
output_path = 'batch_features.json'

# Run batch extraction (this may take time for large folders)
extract_features_from_folder(input_folder, config_path, output_path, save_format='json')

In [None]:
# Load the extracted features
import json
with open(output_path, 'r') as f:
    results = json.load(f)

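# Convert to a DataFrame for analysis, skipping files that failed extraction
records = []
for fname, feats in results.items():
    if 'error' in feats:
        continue
    row = {'file': fname}
    for name, value in feats.items():
        # Collapse a flat numeric list to its mean; keep scalars and
        # nested lists unchanged
        is_flat_numeric = (isinstance(value, list) and value
                           and isinstance(value[0], (float, int)))
        row[name] = np.mean(value) if is_flat_numeric else value
    records.append(row)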
df = pd.DataFrame(records)
df.head()
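
The conversion above averages only flat numeric lists, so a nested feature would stay in the DataFrame as a raw list. The sketch below shows one possible way to reduce such values to scalar columns. It assumes that a nested feature such as `mfcc` is stored as a rectangular list of lists with one row per coefficient and one column per frame, which this notebook does not confirm. The helper `summarize_feature` and the sample values are hypothetical and are not part of AFX.

```python
import numpy as np

def summarize_feature(name, value):
    """Reduce one feature value to scalar columns (hypothetical helper)."""
    arr = np.asarray(value, dtype=float)
    if arr.ndim == 0:
        # Already a single number
        return {name: float(arr)}
    if arr.ndim == 1:
        # One value per frame: keep the mean and standard deviation
        return {f"{name}_mean": float(arr.mean()),
                f"{name}_std": float(arr.std())}
    # 2-D: assumed layout is (coefficients, frames); average over frames
    means = arr.mean(axis=1)
    return {f"{name}_{i}_mean": float(m) for i, m in enumerate(means)}

# Made-up values standing in for one file's entry in the results JSON
example = {"zcr": [0.1, 0.3], "mfcc": [[1.0, 3.0], [2.0, 4.0]]}
row = {}
for name, value in example.items():
    row.update(summarize_feature(name, value))
print(row)
```

Each nested feature becomes several numeric columns, which `sns.histplot` can plot one at a time. If the extractor stores the matrix with frames as rows instead, average over `axis=0`.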

## Visualize Feature Distributions
Let's plot the distribution of a few features across the dataset.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

for feature in ['zcr', 'mfcc', 'spectral_centroid']:
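    # Plot only columns that exist and hold one number per file
    if feature in df.columns and pd.api.types.is_numeric_dtype(df[feature]):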
        plt.figure(figsize=(6, 4))
        sns.histplot(df[feature].dropna(), bins=30, kde=True)
        plt.title(f"Distribution of {feature}")
        plt.xlabel(feature)
        plt.ylabel('Count')
        plt.tight_layout()
        plt.show()

## Next Steps
- Try running on your own dataset by changing `input_folder`.
- Explore more features and aggregation methods.
- Use the DataFrame for ML experiments or further analysis.
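
As a starting point for ML experiments, each row needs a label. The sketch below assumes that the `file` column holds UrbanSound8K-style names of the form `[fsID]-[classID]-[occurrenceID]-[sliceID].wav`. If the batch extractor stores different keys, the parsing needs to change. The helper `class_id_from_filename` and the rows in `demo` are made up for illustration.

```python
import os
import pandas as pd

def class_id_from_filename(path):
    """Parse the classID field from an UrbanSound8K-style file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem.split('-')[1])

# Made-up rows standing in for the DataFrame built earlier in the notebook
demo = pd.DataFrame({
    'file': ['fold1/100032-3-0-0.wav',
             'fold1/100263-2-0-117.wav',
             'fold2/100652-3-0-1.wav'],
    'zcr': [0.10, 0.30, 0.20],
})

# Attach the label, then compare a feature across classes
demo['class_id'] = demo['file'].map(class_id_from_filename)
per_class = demo.groupby('class_id')['zcr'].mean()
print(per_class)
```

The `class_id` column can then serve as the target for a classifier, with the numeric feature columns as inputs.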