# Week 1 
~ By Manaswi Mishra and Kushagra Sharma ~
## Task : Music Segment Analysis

### Goal : 
To analyze 20 musical tracks by splitting each one into segments of 1, 2, 4, or 8 beats, clustering the segments to find representative ones, and stitching those into a playlist of segments.

#### Input -
    
     Input Folder Path - Containing 20 musical tracks as .wav or .mp3 files
      ./inputSounds

#### Output -
    
     Creates representative audio segments in an output folder.
      ./finalSounds
      
#### Parameters - 
    
     segmentLength (default = 4, can have values 1,2,4,8)
     nClusters (default = 2, number of clusters to be chosen from each song)
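
The notebook below assigns these two parameters directly, with no checking. A minimal sketch of a guard, assuming the allowed values listed above; `check_parameters` and `ALLOWED_SEGMENT_LENGTHS` are hypothetical names that do not appear in the original code:

```python
# Hypothetical helper, not part of the original notebook.
ALLOWED_SEGMENT_LENGTHS = (1, 2, 4, 8)

def check_parameters(segmentLength=4, nClusters=2):
    """Return the parameters unchanged, or raise ValueError if they are unusable."""
    if segmentLength not in ALLOWED_SEGMENT_LENGTHS:
        raise ValueError("segmentLength must be one of %s, got %r"
                         % (ALLOWED_SEGMENT_LENGTHS, segmentLength))
    if not isinstance(nClusters, int) or nClusters < 1:
        raise ValueError("nClusters must be a positive integer, got %r" % (nClusters,))
    return segmentLength, nClusters

# The values used in the run reported in this notebook.
segmentLength, nClusters = check_parameters(segmentLength=8, nClusters=5)
```

Calling it once before the main loop would reject a value such as `segmentLength = 3` before any audio is loaded.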
      

#### Steps :  (Notes to help implement) 

* Loads audio
    - es.AudioLoader(filename)
* Estimates beats positions
    - es.BeatTrackerMultiFeature(signal)
    ( Check other beat estimation algorithms )
* Cuts audio into segments (Slicer algorithm)
    - es.Slicer(signal) Parameters - start and end times
* Computes MFCC frames in each segment and summarizes frames to mean/variance values (using PoolAggregator)
    - w = es.Windowing(type)
    - spectrum = es.Spectrum()
    - mfcc = es.MFCC(); mfcc(spectrum(w(frame))) - Compute means and variances
* Clusters segments using Scikit-learn (k-means clustering)
    - from sklearn.cluster import KMeans
    - kmeans = KMeans()
    - kmeans.fit(values)
* Write audio files with segments from each cluster (using AudioWriter)
    - es.AudioWriter(audio signal)
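
The "summarize frames to mean/variance values" step can be illustrated without Essentia. The sketch below assumes the per-frame MFCC vectors of one segment are already stacked into a 2-D array (one row per frame) and uses plain NumPy in place of the pool aggregation named in the notes; `summarize_frames` is a hypothetical helper name and the numbers are toy values, not real MFCCs.

```python
import numpy as np

def summarize_frames(frame_features):
    """Collapse a (n_frames, n_coeffs) matrix into one vector holding the
    per-coefficient means followed by the per-coefficient variances."""
    frame_features = np.asarray(frame_features, dtype=float)
    if frame_features.ndim != 2 or frame_features.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of per-frame features")
    means = frame_features.mean(axis=0)
    variances = frame_features.var(axis=0)
    return np.concatenate([means, variances])

# Toy stand-in for the MFCC frames of one segment: 3 frames, 2 coefficients each.
toy_frames = [[1.0, 10.0],
              [2.0, 10.0],
              [3.0, 10.0]]
summary = summarize_frames(toy_frames)  # length 4: two means, then two variances
```

Each segment then contributes one fixed-length row to the matrix handed to `kmeans.fit`, regardless of how many frames the segment contains.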

# Let's Begin

In [240]:
import os
import sys
import numpy as np
import essentia.standard as es
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

currentPath = os.getcwd()
outputPath = os.path.join(currentPath,'finalSounds/')
inputPath = os.path.join(currentPath,'inputSounds/')

In [241]:
# Parameter settings: segment length in beats, and number of clusters per song
segmentLength = 8
nClusters = 5

In [244]:
for file in os.listdir(inputPath):
    if file.endswith(".mp3"):
    
        # Strip only the extension, so file names containing dots are kept whole
        filename = os.path.splitext(file)[0]
        
        # Initializing algorithms from essentia
        loader = es.MonoLoader(filename = os.path.join(inputPath,file))
        writer = es.MonoWriter(filename = os.path.join(outputPath,filename + "_segment.mp3"), format="mp3")
        beatFinder = es.BeatTrackerMultiFeature()
        spectrum = es.Spectrum()
        window = es.Windowing(type = 'hann')
        mfcc = es.MFCC()
        
        samplingRate = 44100
        
        # Reading the audio file as a mono signal
        x = loader()
        # beatFinder returns (beat positions in seconds, confidence)
        beat = beatFinder(x)
        
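        # Identifying segment positions: keep every segmentLength-th beat time (in seconds)
        beatSegment = beat[0][segmentLength-1::segmentLength] 
        
        length = beatSegment.size
        lengthTime = x.size / float(samplingRate)  # track duration in seconds (not used below)
        # Segment i runs from the previous kept beat (0 for the first segment) to kept beat i
        startTime =  beatSegment[0:length-1]
        startTime = np.insert(startTime, 0, 0)
        endTime = beatSegment[0:length]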
        
        #Slicing audio into segments
        slicer = es.Slicer(startTimes=startTime.tolist(), endTimes=endTime.tolist())
        frames = slicer(x)
        
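        # One MFCC vector per segment (the whole segment is treated as a single frame),
        # reduced to one scalar: the mean over all of its MFCC coefficients
        mfcc_means = []
        for f in frames:
            frame = np.array(f,dtype="single")
            # Trim to an even number of samples before computing the spectrum
            if frame.size % 2 != 0:
                frame = frame[:-1]
            mfcc_bands, mfcc_coeff = mfcc(spectrum(window(frame)))
            mfcc_means.append(np.mean(mfcc_coeff))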
        
        # Building K means clusters on MFCC means
        #print "building clusters for", filename
        
        mfcc_feature = np.array(mfcc_means).reshape(-1,1)
        model = KMeans(n_clusters = nClusters, random_state=0)
        km = model.fit(mfcc_feature)
        closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, mfcc_feature)
        
        # Writing the audio segments as files
        #print "writing segments for", filename
        audioSegment = []
        for i in range (0,nClusters):
            audioSegment = np.append(audioSegment, np.array(frames[closest[i]]))
        writer(audioSegment.astype("single"))
        print("Done Creating Segment Playlist for " + filename)
print("----------------Finished-----------------")

Done Creating Segment Playlist for Your Life Your Call
Done Creating Segment Playlist for Walking Lightly
Done Creating Segment Playlist for Exit Music (For A Film)
Done Creating Segment Playlist for 21st Century
Done Creating Segment Playlist for Villain
Done Creating Segment Playlist for Paranoid Android
Done Creating Segment Playlist for After All Is Said and Done
Done Creating Segment Playlist for Beginnings
Done Creating Segment Playlist for Hump De Bump
Done Creating Segment Playlist for Karma Police
Done Creating Segment Playlist for Head First
Done Creating Segment Playlist for Line of Fire
Done Creating Segment Playlist for Electioneering
Done Creating Segment Playlist for Subterranean Homesick Alien
Done Creating Segment Playlist for Airbag
Done Creating Segment Playlist for Let Down
Done Creating Segment Playlist for So Clear
Done Creating Segment Playlist for Suddenly
Done Creating Segment Playlist for Stadium Arcadium
Done Creating Segment Playlist for Baton
--------------
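
The start/end arithmetic in the loop above can be isolated as a small pure-Python function, which makes it easier to check on toy data. This is a sketch of the same indexing, not a replacement for the notebook code; `segment_bounds` is a hypothetical name, and the guard for tracks with too few beats is an addition that the original loop does not have.

```python
def segment_bounds(beat_times, segment_length):
    """Return (start_times, end_times) for consecutive segments that each end
    on every `segment_length`-th beat; the first segment starts at 0 seconds."""
    ends = [float(t) for t in beat_times[segment_length - 1::segment_length]]
    if not ends:
        # Fewer than `segment_length` beats: no complete segment exists.
        return [], []
    starts = [0.0] + ends[:-1]
    return starts, ends

# Toy beat grid at 120 BPM, first beat at 0.5 s.
beats = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
starts, ends = segment_bounds(beats, 4)
# starts == [0.0, 2.0] and ends == [2.0, 4.0]; the trailing beat at 4.5 s is dropped
```

Note that the first segment begins at time 0 rather than at the first beat, so it also contains any audio that precedes the first detected beat.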

# Analysis

### Method:
A segment length was accepted as a parameter (1, 2, 4, or 8 beats), and each song in the dataset was split into audio segments of that length. For each segment, the mean of its MFCC coefficients was calculated. These MFCC means were clustered with k-means into nClusters clusters (nClusters is also a parameter). The characteristic audio segment of each cluster was then chosen as the segment closest to the cluster's centroid. These characteristic audio segments were stitched together and written to the output directory as a new file, filename_segment.mp3.

The result is a playlist of audio segments drawn from the given music tracks.
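
The "closest to the centroid" step is what `pairwise_distances_argmin_min(km.cluster_centers_, mfcc_feature)` provides in the notebook. A minimal NumPy sketch of the same idea follows; `closest_to_centers` is a hypothetical name, and both the feature values and the centers are made up for illustration rather than produced by KMeans.

```python
import numpy as np

def closest_to_centers(centers, features):
    """For each cluster center, return the index of the nearest feature row
    (Euclidean distance)."""
    centers = np.asarray(centers, dtype=float)
    features = np.asarray(features, dtype=float)
    # distances[i, j] = distance from center i to feature row j
    diffs = centers[:, np.newaxis, :] - features[np.newaxis, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    return distances.argmin(axis=1)

# Five segments described by one scalar each, as in the notebook (toy values).
features = np.array([[-20.0], [-19.0], [-5.0], [-4.5], [-4.0]])
# Two assumed cluster centers; in the notebook these come from km.cluster_centers_.
centers = np.array([[-19.4], [-4.6]])
closest = closest_to_centers(centers, features)
# closest holds indices 1 and 3: the segments nearest to each center
```

The returned indices select which sliced segments are concatenated into the output file, one per cluster.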

### Result
With segmentLength = 8 and nClusters = 5, we find that the clusters identified the most contrasting areas of each song: the loud parts, the arpeggiated parts, and the silences were all captured as representative audio segments.

Example:
For the song Paranoid Android, the segment file was:

In [243]:
import IPython.display
IPython.display.Audio("Paranoid Android_segment.mp3")

This example captures the contrasting parts of the song. Each 8-beat segment flows into the next because the beat positions were identified accurately.