# FIND-S and LIST-THEN-ELIMINATE Algorithms in Concept Learning

Finding a (Maximally) Specific Hypothesis in Concept Learning

_Concept learning can be framed as a search through a space of candidate hypotheses. This notebook takes advantage of a naturally occurring structure over that space, a general-to-specific ordering of hypotheses, using the Find-S and List-Then-Eliminate algorithms._
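
To make the general-to-specific ordering concrete, here is a minimal sketch in plain Python. The helper names `covers` and `more_general_or_equal` are hypothetical and are not used elsewhere in this notebook. It assumes the usual conjunctive representation, in which `?` accepts any value and `Φ` accepts none.

```python
# Illustrative helpers (hypothetical names, not part of the notebook's classes).
ANY = "?"    # accepts every value of the attribute
NONE = "Φ"   # accepts no value at all


def covers(h, x):
    """Return True if hypothesis h classifies instance x as positive."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(h1, h2):
    """Return True if every instance covered by h2 is also covered by h1."""
    if NONE in h2:
        # h2 covers no instance, so any hypothesis is at least as general
        return True
    return all(c1 == ANY or c1 == c2 for c1, c2 in zip(h1, h2))


h_specific = ("Sunny", "Warm", "?", "Strong", "?", "?")
h_general = ("Sunny", "?", "?", "?", "?", "?")
h_other = ("?", "Warm", "?", "?", "?", "?")

print(more_general_or_equal(h_general, h_specific))  # True
print(more_general_or_equal(h_specific, h_general))  # False
# The ordering is only partial: these two hypotheses are not comparable
print(more_general_or_equal(h_general, h_other), more_general_or_equal(h_other, h_general))  # False False
```

Because the ordering is partial, a search can move from specific to general (as Find-S does) without ever enumerating the whole space.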

In [3]:
# imports required packages

import pandas as pd
import numpy as np
import itertools

## Preparing Data

In [5]:
# Reads relevant data

data = pd.read_csv("../Data/enjoysport.csv")

In [6]:
# Views the data

display(data)

,Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
0,Sunny,Warm,Normal,Strong,Warm,Same,Yes
1,Sunny,Warm,High,Strong,Warm,Same,Yes
2,Rainy,Cold,High,Strong,Warm,Change,No
3,Sunny,Warm,High,Strong,Cool,Change,Yes


In [7]:
# X represents the set of instances over which the target concept is defined

X = data.copy()

In [8]:
# Stores target in a separate series
target = X["EnjoySport"]

In [9]:
display(target)

0    Yes
1    Yes
2     No
3    Yes
Name: EnjoySport, dtype: object

In [10]:
# Removes target from the other attributes

X = X.iloc[:,:-1]

In [11]:
# Shows training examples (without target)

display(X)

,Sky,AirTemp,Humidity,Wind,Water,Forecast
0,Sunny,Warm,Normal,Strong,Warm,Same
1,Sunny,Warm,High,Strong,Warm,Same
2,Rainy,Cold,High,Strong,Warm,Change
3,Sunny,Warm,High,Strong,Cool,Change


## Applying Find-S Algorithm to Get Specific Hypothesis

_**Pseudocode for the Find-S algorithm**_

1. Initialize _h_ to the most specific hypothesis in _H_
2. For each positive training instance _x_
    + For each attribute constraint _a(i)_ in _h_
        + If the constraint _a(i)_ is satisfied by _x_
            + Then do nothing
        + Else
            + Replace _a(i)_ in _h_ by the next more general constraint that is satisfied by _x_
3. Output hypothesis _h_

In [14]:
class FindS():
    """
    Finds the maximally specific hypothesis consistent with the positive training examples
    """

    def fit(self, X, target):
        """
        Finds the most specific hypothesis that fits the given positive examples.
        The result is stored in self.h.

        Parameters
        ----------
        X: DataFrame
            instances of training examples
        target: Series
            the label ("Yes" or "No") of each instance
        """

        # Starts from the most specific hypothesis: 'Φ' accepts no value at all
        self.h = ['Φ'] * X.shape[1]

        # Iterates through all examples and generalizes h only as far as needed
        for idx, x in X.iterrows():
            if target[idx] == "No":     # Skips negative examples
                continue

            for i, attr in enumerate(x):    # Enumerates each attribute of the example
                if self.h[i] == 'Φ':
                    self.h[i] = attr        # First positive example: copies its value
                elif self.h[i] != attr:
                    self.h[i] = '?'         # Conflicting values: accepts any value
    


In [15]:
# Calls training function passing training data and target

find_S = FindS()
find_S.fit(X, target)

In [16]:
# Shows the hypothesis

print("The specific hypothesis is", find_S.h)

The specific hypothesis is ['Sunny', 'Warm', '?', 'Strong', '?', '?']
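
To see how this hypothesis is reached, the following sketch replays Find-S in plain Python and records _h_ after each positive example. The function name `find_s_trace` and the hand-typed `examples` list (copied from the table above) are illustrative assumptions, not part of the `FindS` class.

```python
# Training examples typed in by hand from the table above (attributes, label)
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]


def find_s_trace(examples):
    """Return the hypothesis after each positive example (hypothetical helper)."""
    h = ["Φ"] * len(examples[0][0])
    trace = []
    for x, label in examples:
        if label != "Yes":      # Find-S ignores negative examples
            continue
        # Keep a constraint that is still satisfied, adopt the value if the
        # constraint is still 'Φ', otherwise generalize it to '?'
        h = [v if c in ("Φ", v) else "?" for c, v in zip(h, x)]
        trace.append(list(h))
    return trace


for step in find_s_trace(examples):
    print(step)
# ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
# ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
# ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The negative example (row 2) never changes _h_; it happens to be rejected by the final hypothesis anyway, because its `Sky` value is `Rainy`.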


## Applying LIST-THEN-ELIMINATE Algorithm to Get Hypotheses Consistent with All Training Examples

1. _VersionSpace_ <-- a list containing every hypothesis in _H_
2. For each training example, _<x, c(x)>_
    - remove from _VersionSpace_ any hypothesis _h_ for which _h_(_x_) != _c_(_x_)
3. Output the list of hypotheses in _VersionSpace_
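
Step 1 presupposes that _H_ can be enumerated, so it helps to know how large it is. As a rough sketch, assume each attribute may take any value observed in the four training examples above, plus `?` and `Φ`. The dictionary `observed_values` is typed in by hand from the table and is only illustrative.

```python
from math import prod

# Values observed per attribute in the four training examples (typed in by hand)
observed_values = {
    "Sky": ["Sunny", "Rainy"],
    "AirTemp": ["Warm", "Cold"],
    "Humidity": ["Normal", "High"],
    "Wind": ["Strong"],
    "Water": ["Warm", "Cool"],
    "Forecast": ["Same", "Change"],
}

# Syntactically distinct hypotheses: each attribute also allows '?' and 'Φ'
n_syntactic = prod(len(values) + 2 for values in observed_values.values())
print(n_syntactic)  # 3072

# Every hypothesis containing a 'Φ' rejects all instances, so those collapse
# into a single semantically distinct hypothesis
n_semantic = 1 + prod(len(values) + 1 for values in observed_values.values())
print(n_semantic)  # 487
```

Exhaustive enumeration is feasible at this size, but the count grows multiplicatively with every added attribute or attribute value, which is why List-Then-Eliminate is practical only for small hypothesis spaces.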

In [18]:
# Converts target to bool

target = target == "Yes"

In [19]:
class ListThenEliminate():
    """
    From the space of all possible hypotheses, finds the Version Space: the hypotheses
    each of which is consistent with all the training examples
    """
        
    def fit(self, X, target):
        """
        Builds the Version Space and stores it in self.VersionSpace.

        Parameters
        ----------
        X : DataFrame
            Training examples with all attributes (except target concept)
        
        target: Series
            Target concept (bool) of training examples in X
        """
        
        # Creates the space of all possible hypotheses: each attribute may take any of
        # the unique values seen in the training examples, '?' (any value) or 'Φ' (no value)
        
        self.__unique_attributes = [list(X[col].unique()) + ['?', 'Φ'] for col in X.columns]
        self.__H = list(itertools.product(*self.__unique_attributes))
        
        # Version Space containing the hypotheses consistent with all the training examples
        self.VersionSpace = []
        
        for h in self.__H:
            if self.__is_consistent(h, (X, target)):
                self.VersionSpace.append(h)
    
    def __is_consistent(self, h, D):
        """
        Checks if the hypothesis h is consistent with all the training examples in D
        
        Parameters
        -----------
            h: tuple
                Hypothesis to test against D
            D: tuple
                A pair of the training examples (X: DataFrame) and their respective target concept values (c: Series)
            
        Returns
        --------
            bool
                True if the hypothesis is consistent with the training examples in D, or False otherwise
        """
        
        for idx, x in D[0].iterrows():
            # h is inconsistent as soon as its prediction differs from the label c(x)
            if self.__predict(h, x) != D[1][idx]:
                return False
                    
        return True
    
    def __predict(self, h, x):
        """
        Predicts instance x to be positive or negative against hypothesis h
        
        Parameters
        ----------
            h: tuple
                Hypothesis to predict against
            x: Series
                Instance to predict for
            
        Returns
        --------
            bool
                True if the hypothesis h predicts the instance positive, or False otherwise
            
        """

        for i, attr in enumerate(x):
            if h[i] == 'Φ':     # 'Φ' accepts no value, so h rejects every instance
                return False
            if h[i] == '?':     # '?' accepts any value
                continue
            if h[i] != attr:    # A specific value must match exactly
                return False
            
        return True

In [20]:
# Initializes algorithm and trains with training examples

list_then_eliminate = ListThenEliminate()
list_then_eliminate.fit(X, target)

In [21]:
# Shows the version space once training is over
display(list_then_eliminate.VersionSpace)

[('Sunny', 'Warm', '?', 'Strong', '?', '?'),
 ('Sunny', 'Warm', '?', '?', '?', '?'),
 ('Sunny', '?', '?', 'Strong', '?', '?'),
 ('Sunny', '?', '?', '?', '?', '?'),
 ('?', 'Warm', '?', 'Strong', '?', '?'),
 ('?', 'Warm', '?', '?', '?', '?')]
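
The version space can be summarized by its most specific and most general members. The sketch below extracts both boundaries from the list printed above. The helper `more_general_or_equal` and the names `S` and `G` are illustrative assumptions, and the comparison assumes no hypothesis in the list contains `Φ`, which holds here.

```python
# Version space copied by hand from the output above
version_space = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]


def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers (assumes no 'Φ' in h1 or h2)."""
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))


# Specific boundary: members with no strictly more specific member below them
S = [h for h in version_space
     if not any(h2 != h and more_general_or_equal(h, h2) for h2 in version_space)]

# General boundary: members with no strictly more general member above them
G = [h for h in version_space
     if not any(h2 != h and more_general_or_equal(h2, h) for h2 in version_space)]

print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

The single member of `S` is exactly the hypothesis returned by Find-S, and every other hypothesis in the version space lies between `S` and `G` in the general-to-specific ordering.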