<p style="color:Red"><font size="5">Stochastic Diffusion Search</font></p>

<p ><font size="3">SDS is an agent-based algorithm that can be applied to a variety of problems. Here we use a variant of SDS for feature selection, to mitigate the "curse of dimensionality" in datasets. I recommend reading this intuitive <a href="http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification">article</a> on the curse of dimensionality and its repercussions.</font></p>

<p ><font size="3">The theory behind the algorithm coded below was originally published <a href="https://dl.acm.org/citation.cfm?id=3079193">here</a>. However, I have made some key improvements to the original algorithm, such as scoring each agent with an ensemble of classifiers instead of a single one.</font></p>

In [13]:
#importing necessary libraries
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from pprint import pprint

In [4]:
#Creating instances of estimators
logReg=LogisticRegression(C=2)
decClf=DecisionTreeClassifier(max_depth=5, min_samples_split=4)
svc=SVC(C=0.5,gamma=0.5)

In [5]:
estimators=[svc,logReg,decClf] 

In [10]:
'''
The function below returns an agent, i.e. a hypothesis (a feature subset of the data), together with its binary feature array.
The binary array is returned first: 1 indicates inclusion of the corresponding feature and 0 indicates its exclusion.
lowerLim is the minimum number of features in an agent. upperLim is an exclusive upper bound, because np.random.randint
excludes it, so an agent holds at most upperLim-1 features.
'''
def agent(arryX,lowerLim,upperLim):
    if lowerLim<0 or lowerLim>=upperLim or upperLim>arryX.shape[1]:
        #raising instead of printing, so callers never try to unpack a None result
        raise ValueError('call agent() again with appropriate limits')
    randomNoFeatures=np.random.randint(lowerLim,upperLim) #random number of features in [lowerLim, upperLim)
    zeroArry=np.zeros(arryX.shape[1]-randomNoFeatures, dtype='int') #zeros for the excluded features
    oneArry=np.ones(randomNoFeatures, dtype='int')   #ones for the included features
    fArry=np.concatenate((zeroArry,oneArry), axis=0) #concatenating the zero and one arrays
    np.random.shuffle(fArry) #shuffling fArry in place to randomize which features are selected
    fIndex=np.where(fArry==1)[0] #column indices of the selected features
    agentArry=arryX[:,fIndex] #generating the feature subset from the original dataset
    return fArry,agentArry

In [14]:
#Example: the agent below selected the features at zero-based column indices 1, 3, 7 and 9 (the selection is random, so it changes between runs)
x=np.random.randint(20, size=(3,10))
f1,a1=agent(x,3,6)
pprint(x)
pprint(f1)
pprint(a1)

array([[ 1,  0, 11,  0, 17,  3, 15,  6,  4, 19],
       [12, 12,  2, 19,  8,  2, 12,  4,  9,  2],
       [ 1, 11, 13,  5,  2,  7, 14,  0, 19,  8]])
array([0, 1, 0, 1, 0, 0, 0, 1, 0, 1])
array([[ 0,  0,  6, 19],
       [12, 19,  4,  2],
       [11,  5,  0,  8]])
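
<p ><font size="3">A side note on the limits: <code>np.random.randint(low, high)</code> samples from the half-open interval [low, high), so a call such as <code>agent(x,3,6)</code> produces agents with 3, 4 or 5 features, never 6. The short check below is only an illustrative sketch; the variable names <code>draws</code> and <code>inclusive_draws</code> are mine and are not used elsewhere in this notebook.</font></p>

```python
import numpy as np

# Same limits as in agent(x, 3, 6): the upper bound is excluded by np.random.randint
draws = np.random.randint(3, 6, size=1000)
print(draws.min(), draws.max())

# If the upper limit should be reachable, one option is to pass upper + 1
inclusive_draws = np.random.randint(3, 6 + 1, size=1000)
print(inclusive_draws.min(), inclusive_draws.max())
```

<p ><font size="3">Whether the upper limit should be inclusive is a design choice; the important part is that the limits passed to the function match the range you intend.</font></p>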


In [16]:
'''
The function below generates the required number of agents to be deployed on the search space.
The agents, their binary feature arrays, and their statuses (all initially 'active') are returned as three lists.
'''
def agentsInitiation(arryX,numAgents,lowerLim,upperLim):
        agents=[]
        agentFIndex=[]
        agentStatus=['active']*numAgents
        for i in range(0,numAgents):
            fArry,agentArry=agent(arryX,lowerLim,upperLim) #generating a single agent
            agentFIndex.append(fArry) #appending its binary feature array to agentFIndex
            agents.append(agentArry) #appending the agent to the agents list
        return agents,agentFIndex,agentStatus

In [19]:
#Example
agents_ex,agentFIndex_ex,agentStatus_ex=agentsInitiation(x,3,3,5)
pprint(agentFIndex_ex)
pprint(agents_ex)

[array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1]),
 array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0]),
 array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0])]
[array([[17,  3, 19],
       [ 8,  2,  2],
       [ 2,  7,  8]]),
 array([[ 1,  0,  3,  4],
       [12, 19,  2,  9],
       [ 1,  5,  7, 19]]),
 array([[ 1,  0,  4],
       [12, 19,  9],
       [ 1,  5, 19]])]


In [13]:
'''
The 'score' function fits each estimator to the agent's training data and then evaluates it on the agent's test data.
The output is the average score of the estimators (three in this notebook). The original paper used only one classifier
to calculate the score, so the resulting subset was heavily biased towards that single estimator.
To avoid this, we use an ensemble of classifiers.
'''
def score(estimators,arryX,arrY):
    scores=[] #scores of the individual estimators (was never initialised before)
    X_train,X_test,y_train,y_test=train_test_split(arryX,arrY,random_state=0)
    for est in estimators:
        est.fit(X_train,y_train) #fitting the estimator to the training data of an agent
        scores.append(est.score(X_test,y_test)) #evaluating the score on the test data
    return sum(scores)/len(scores)

In [14]:
#the function below calculates the score of each agent and appends it to the agentScores list
def agentClfscores(estimators,agents,arrY):
    agentScores=[]
    for agentArry in agents: #agentArry avoids shadowing the agent() function
        agentscore=score(estimators,agentArry,arrY)
        agentScores.append(agentscore) #append the score itself, not the list
    return agentScores #returns a list that captures the agents' scores

In [20]:
'''
The function below carries out the test and diffusion phases among the agents initialized above.
It returns the agents, their binary feature arrays, their statuses, and their scores after numIterations iterations.
'''

def SDSFS(arryX,arrY,estimators,numIterations,numAgents,lowerLim,upperLim):
    agents,agentFIndex,agentStatus=agentsInitiation(arryX,numAgents,lowerLim,upperLim)
    agentScores=agentClfscores(estimators,agents,arrY)
    niters=0
    while niters<numIterations:
        #testing phase
        for i in range(len(agents)):
            rndmId=np.random.randint(len(agents),size=1)[0]
            if agentScores[i]>agentScores[rndmId]:
                agentStatus[i]='active'
                
            else:
                agentStatus[i]='inactive'
                
        #Diffusion phase    
        for i in range(len(agents)):
                if agentStatus[i]=='inactive':
                    rndmId2=np.random.randint(len(agents),size=1)[0]
                    if agentStatus[rndmId2]=='active':
                        oneIds=np.where(agentFIndex[rndmId2]==1)[0]
                        zeroIds=np.where(agentFIndex[rndmId2]==0)[0]
                        rndmId3=np.random.randint(len(oneIds), size=1)
                        rndmId4=np.random.randint(len(zeroIds), size=1)
                        oneZeroId=oneIds[rndmId3]
                        zeroOneId=zeroIds[rndmId4]
                        agentFIndex[i]=agentFIndex[rndmId2].copy()
                        agentFIndex[i][oneZeroId]=0
                        agentFIndex[i][zeroOneId]=1
                        fIndex2=np.where(agentFIndex[i]==1)[0]
                        agents[i]=arryX[:,fIndex2] #X was undefined here; the dataset is passed in as arryX
                        agentScores[i]=score(estimators,agents[i],arrY)
                    else:
                        agentFIndex[i],agents[i]=agent(arryX,lowerLim,upperLim)
                        agentScores[i]=score(estimators,agents[i],arrY)
                else:
                    rndmId5=np.random.randint(len(agents),size=1)[0]
                    if agentStatus[rndmId5]=='active' and (agentFIndex[i]==agentFIndex[rndmId5]).all():
                        agentStatus[i]='inactive'
                        agentFIndex[i],agents[i]=agent(arryX,lowerLim,upperLim)
                        agentScores[i]=score(estimators,agents[i],arrY)
        niters+=1
    return agents,agentFIndex,agentStatus,agentScores
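
<p ><font size="3">The two steps that are easiest to miss in the function above are (1) how an inactive agent copies an active agent's hypothesis with one feature swapped, and (2) how to read the best feature subset off the returned lists. The sketch below isolates both with NumPy only. The helpers <code>perturbed_copy</code> and <code>best_agent</code>, the seed, and the demo scores are hypothetical names and values introduced for illustration; they are not part of the original algorithm's code.</font></p>

```python
import numpy as np

def perturbed_copy(mask, rng):
    """Copy a binary feature mask, dropping one selected feature and adding one unselected feature."""
    ones = np.where(mask == 1)[0]
    zeros = np.where(mask == 0)[0]
    new_mask = mask.copy()
    if len(ones) > 0 and len(zeros) > 0:  # nothing to swap if the mask is all 0s or all 1s
        new_mask[rng.choice(ones)] = 0    # remove one randomly chosen selected feature
        new_mask[rng.choice(zeros)] = 1   # add one randomly chosen unselected feature
    return new_mask

def best_agent(agentFIndex, agentScores):
    """Return the index, selected column indices, and score of the top-scoring agent."""
    best = int(np.argmax(agentScores))
    features = np.where(agentFIndex[best] == 1)[0]
    return best, features, agentScores[best]

rng = np.random.default_rng(0)  # hypothetical seed, only for repeatability
active_mask = np.array([0, 1, 0, 1, 0, 0, 0, 1, 0, 1])
new_mask = perturbed_copy(active_mask, rng)
print(new_mask.sum() == active_mask.sum())   # True: the number of features is unchanged
print(int((new_mask != active_mask).sum()))  # 2: exactly two positions differ

# Made-up masks and scores standing in for the lists returned by SDSFS
agentFIndex_demo = [np.array([0, 1, 0, 1, 0]),
                    np.array([1, 1, 0, 0, 1]),
                    np.array([0, 0, 1, 1, 0])]
agentScores_demo = [0.81, 0.93, 0.78]
idx, feats, sc = best_agent(agentFIndex_demo, agentScores_demo)
print(idx, feats, sc)  # 1 [0 1 4] 0.93
```

<p ><font size="3">The swap keeps the subset size constant, so the diffusion step explores neighbours of a good hypothesis rather than jumping to an unrelated one. The guard for all-zero or all-one masks is an addition of this sketch.</font></p>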
    