# Omniscient Policy / Oracle

This notebook implements an omniscient policy (an oracle) that knows all of the true probability distributions. At every step it makes the best decision given that knowledge, so it does not have to learn anything. Because the oracle holds the optimal parameters $\theta$, it is expected to maximize reward in fewer rounds than a policy that must learn them.
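
The decision rule can be written compactly. This is a sketch based on how the `Oracle` and `Simulator` classes further down compute things; the symbols $x_{s,a}$ (combined student and content context for arm $a$), $p_a$ and $r$ are notation introduced here for illustration, not the author's:

$$
p_a = \theta_a^\top x_{s,a}, \qquad a^* = \arg\max_{a} \; p_a, \qquad r \sim \mathrm{Bernoulli}(p_{a^*})
$$

Each row $\theta_a$ is non-negative and normalized to sum to 1, and every feature lies in $[0, 1]$, so $p_a$ is a weighted average of the features and can be used directly as a Bernoulli success probability.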

In [5]:
import threading # Needed by Oracle and Simulator for locks and student threads

import numpy as np
import pandas as pd
from scipy.stats import bernoulli



# Contexts

In [41]:
import pickle
'''
Context data for learning
'''
class Context:
    """
    Contextual information required by contextual bandit algorithms to make better predictions. It encapsulates all data
    required about the student, topics & content to design the learning algorithm. 
    """
   
    def getStudentContext(self):
        """
        Student Preferences: 
        Visual (S_V) , Text (S_T) , Demo-based (S_D) , Practical (S_P), Step-by-step (S_S) ,Activity / Task based (S_AT), 
        Lecture (S_L) , Audio (S_A) , Self-evaluation (S_SE) , Pre-assessment (S_PA)
        Each student's preference for a way of learning is scored on a scale from 0 to 1 rather than being binary. 
        """
        return self.studentContext
    
    def setStudentContext(self):
         with open('student.pickle', 'rb') as student_file:
            self.studentContext= pickle.load(student_file)
    
    def getContentContext(self):
        """
        Content Features 
        Ease of understanding (C_E) , Simple / Intuitive (C_I) , Surface / In-depth (C_ID) , Brief / Concise (C_C), 
        Thorough (C_T), Preference / Well reviewed / Well rated (C_R) , Theoretical / Abstract (C_A), 
        Practical / Hands on (C_P), Experimental / Task-based (C_ETB)
        Each content feature is scored on a scale from 0 to 1 rather than being binary. 
        """
        return self.contentContext
   
    def setContentContext(self):
        with open('content.pickle', 'rb') as content_file:
            self.contentContext= pickle.load(content_file)
        
    def getTopic(self):
        """
        Gives the topics part of the course.
        """
        return self.topic
    
    def setTopic(self):
        """
        Loads & sets the topics part of the course
        """
        with open('topic.pickle', 'rb') as topic_file:
            self.topic = pickle.load(topic_file)
    
    def getTopicContent(self):
        """
         Gets the topic content. topic_content is a map of topics to content. So for every topic, it gives the content 
         available for the topic. In education parlance, for any given topic, it shows the different ways of teaching this
         topic (via contents)
        """
        return self.topic_content
    
    def setTopicContent(self):
        """
        Sets the topic_content variable to the one in the serialized object. topic_content is a map of topics to content. So
        for every topic, it gives the content available for the topic. In education parlance, for any given topic, it shows
        the different ways of teaching this topic (via contents)
        """
        with open('topic_content.pickle', 'rb') as topic_content_file:
            self.topic_content= pickle.load(topic_content_file)
                
    def prepareContext(self,studentContext,contentContext):
        """
           Given the student & content context available for a round, this method combines them to form a single contextual
           variable
           
           Inputs : 
           
           studentContext: Student contextual information.
           contentContext: Contents contextual information. 
           
           Returns :
           
           context : A combined output of student & content context.
        """
        # Series.append / DataFrame.append were removed in pandas 2.0, so the rows are built with pd.concat instead.
        rows = []
        for content in contentContext.index:
            c = pd.concat([studentContext,contentContext.loc[content]]) # Combine student & content. 
            c.name = content # The content id becomes the row label
            rows.append(c)
        context = pd.DataFrame(rows)
        context.index.name = 'Content_id'
        return context
    
    def loadData(self):
        """
        Dummy method used to test data retrieval; the data generator handles data generation. It only checks that the
        data can be retrieved, and it's not invoked in the main program.
        """
        self.setStudentContext()
        self.setContentContext()
        self.setTopic()
        self.setTopicContent()
        print(self.getStudentContext())
        print(type(self.getStudentContext()))
        print('*********************************')
        print(self.getContentContext())
        print(type(self.getContentContext()))
        print('*********************************')
        print(self.getTopic())
        print(type(self.getTopic()))
        print('*********************************')
        print(self.getTopicContent())
        print(type(self.getTopicContent()))
                
c_test = Context()
c_test.loadData()

    S_V   S_T   S_D   S_P   S_S  S_AT   S_L   S_A  S_SE  S_PA
0  0.57  0.90  0.41  0.21  0.71  0.33  0.03  0.68  0.42  0.22
1  0.47  0.02  0.65  0.23  0.68  0.67  0.80  0.40  0.10  0.85
2  0.84  0.93  0.30  0.93  0.91  0.58  0.01  0.27  0.43  0.89
3  0.80  0.98  0.66  0.02  0.10  0.15  0.67  0.18  0.22  0.75
4  0.83  0.22  0.28  0.51  0.68  0.06  0.73  0.55  0.90  0.08
<class 'pandas.core.frame.DataFrame'>
*********************************
        C_E   C_I  C_ID   C_C   C_T   C_R   C_A   C_P  C_ETB
C_1_1  0.27  0.59  0.56  0.42  0.82  0.27  0.84  0.22   0.75
C_1_2  0.14  0.41  0.80  0.91  0.41  0.84  0.09  0.70   0.73
C_1_3  0.48  0.77  0.98  0.86  0.75  0.90  0.79  0.11   0.32
C_1_4  0.48  0.64  0.35  0.63  0.45  0.10  0.09  0.83   0.11
C_2_1  0.97  0.92  0.99  0.22  0.90  0.34  0.43  0.50   0.27
C_2_2  0.52  0.00  0.40  0.97  0.21  0.80  0.73  0.41   0.11
C_2_3  0.58  0.83  0.01  0.74  0.51  0.31  0.98  0.83   0.99
C_3_1  0.76  0.02  0.53  0.23  0.62  0.11  0.50  0.02   0.04
C_3_2  
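
To make the shape of the combined context concrete, here is a minimal sketch of what `prepareContext` produces: one row per content item, holding the student's features followed by that item's features. The two-feature toy data and the helper name `combine` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data: one student (a Series) and two content items (a DataFrame).
student = pd.Series({"S_V": 0.57, "S_T": 0.90})
content = pd.DataFrame(
    {"C_E": [0.27, 0.14], "C_I": [0.59, 0.41]},
    index=["C_1_1", "C_1_2"],
)

def combine(student, content):
    """Return one row per content item: student features plus that item's features."""
    rows = {cid: pd.concat([student, content.loc[cid]]) for cid in content.index}
    combined = pd.DataFrame(rows).T  # columns were content ids, so transpose to rows
    combined.index.name = "Content_id"
    return combined

contexts = combine(student, content)
print(contexts)
```

Every row repeats the same student values, so only the content columns differ between arms.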

# Skip Classifier

In [40]:
# Online stochastic gradient descent. This classifier decides whether or not to skip to the next topic.
# TO-DO: Try different loss functions (log, hinge, others) to see whether they affect performance, and try different
# hyperparameter values (for instance, SGD has alpha and SVM has C). To optimize, train on mini-batches of samples
# rather than one data point at a time. Try different values of learning_rate. Look at the class_weight parameter to
# give samples of one class more weight than the other. The warm_start parameter still needs to be understood.
# Predictions made by the classifier should be recorded to evaluate its performance over rounds.
from sklearn import linear_model
class SkipClassifier:
    """
    A classifier which gives prediction, whether or not to move to the next topic. This is important, because we want 
    students to learn content which the algorithm is confident would help the student learn. The skip classifier is trained
    online, hence we use a confidence threshold, to be conservative & minimize skipping topics. Skipping is not preferred, 
    but if the classifier is confident the next round would help gain higher rewards, then we should skip. Ideally, we want 
    to consider skipping after the first pulled arm has failed, to avoid frustrating the student. 
    """
    def __init__(self):
        self.clf = linear_model.SGDClassifier()
        
    def check_fitted(self,clf): 
        """
        Check if the classifier is fit before asking for prediction. Our classifier is trained in online mode, hence it would
        be asked to predict before fitting. This method makes sure we only ask for prediction after a data point has been 
        fit to the estimator/model
        """
        return hasattr(clf, "classes_")
    
    def train(self,student,pta,next_topic_pta,label):
        """
        Used to train the classifier in online mode, over every data point. In future we might want to consider training in 
        mini-batches, rather than for every data point. 
        """
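        # Feature vector: student preferences followed by the two pay-off estimates.
        # pd.concat replaces Series.append, which was removed in pandas 2.0.
        X = pd.concat([student,pd.Series([pta,next_topic_pta],index=['pta','next_topic_pta'])])
        X = np.array([X.values])
        Y = np.array([label])
        self.clf.partial_fit(X,Y,classes=np.array([0,1]))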
               
    def predict(self,student,pta,next_topic_pta):
        """
        Gets predictions from the classifier, along with the confidence score to help determine the reliability / confidence
        level of the prediction being made. 
        """
        X = pd.concat([student,pd.Series([pta,next_topic_pta],index=['pta','next_topic_pta'])]) # Series.append was removed in pandas 2.0
        if self.check_fitted(self.clf):
            Y = self.clf.predict([X.values])[0]
            confidence_score = self.clf.decision_function([X.values])[0]
        else:
            Y = 0
            confidence_score = 0
        return Y , confidence_score 
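
The online pattern used by `SkipClassifier` can be tried in isolation. This is a minimal sketch, assuming a four-value feature vector with made-up numbers; `predict_or_default` is a hypothetical helper name. It shows the fallback before any data has been seen and the relationship between `predict` and `decision_function` afterwards.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(random_state=0)  # seeded only to make the sketch repeatable

def predict_or_default(clf, x):
    """Return (label, score); fall back to (0, 0.0) until the model has seen data."""
    if not hasattr(clf, "classes_"):  # classes_ only exists after the first partial_fit
        return 0, 0.0
    x = np.asarray(x).reshape(1, -1)
    return int(clf.predict(x)[0]), float(clf.decision_function(x)[0])

x = [0.57, 0.90, 0.56, 0.54]  # hypothetical features
before = predict_or_default(clf, x)

# One online update; classes must be passed on the first partial_fit call.
clf.partial_fit(np.asarray(x).reshape(1, -1), [1], classes=np.array([0, 1]))
after = predict_or_default(clf, x)
print(before, after)
```

Note that `decision_function` returns a signed distance to the separating hyperplane, not a probability, so any confidence threshold applied to it lives on that unbounded scale.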

# SkipTopic

In [39]:
class SkipTopic:
    """
    A wrapper around the Skip Classifier to validate the inputs, before sending it to Skip Classifier for prediction. 
    It post-processes the results of the prediction made by skip classifier to check for confidence threshold, 
    before sending out the decision to skip or not. 
    """
    def __init__(self):
        """
        Initializes the SkipTopic class & sets confidence threshold to make confident skip decisions.
        """
        self.skipClassifier = SkipClassifier()
        self.confidence_threshold = 55 # If the confidence score returned by the classifier is greater than this, then we trust the decision made by the classifier. 
                
    def skipTopic(self,student,pta,topic_number,context_obj,topic_content,oracle):
        """
        Pre-validates the topic number before asking the skip classifier for a prediction. Then checks the confidence 
        of the prediction before sending out the decision to skip or not. 
        """
        contentContext = context_obj.getContentContext() # Get the content dataframe.
        topic = context_obj.getTopic() # Get the topic list. 
        current_topic_index = topic.index(topic_number) # Get the index number of the current topic
        next_topic_index = current_topic_index + 1
        next_topic = '' # Initialized to make it accessible outside the if statement. 
        if next_topic_index < len(topic): # Check to see if we're going out of bounds
            next_topic = topic[next_topic_index]
            next_topic_contents = topic_content[next_topic]
            t_c = contentContext.loc[next_topic_contents]
            X = context_obj.prepareContext(student,t_c)
            arm_pulled , next_topic_pta = oracle.expectedPayoff(X,next_topic_contents)
        else:
            # Will be going out of bounds. Current topic is the last topic. No more topics to complete. 
            next_topic_pta = 0
        skip_decision , confidence_score = self.skipClassifier.predict(student,pta,next_topic_pta)
        if skip_decision and confidence_score < self.confidence_threshold:
            skip_decision = 0
        return skip_decision,next_topic_pta

    def setLabel(self,skip_decision,actual_payoff):
        """
        Sets the label for training the skip classifier
        """
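        # Labels are only produced for rounds that were actually played (skip_decision == 0):
        # a failed round (pay-off 0) is labelled 1, a successful round (pay-off 1) is labelled 0.
        # Assigning unconditionally avoids an UnboundLocalError if neither original branch matched.
        label = 1 if actual_payoff == 0 else 0
        return label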
    
    def train(self,student,pta,pta_next_topic,label):
        """
        Training the skip classifier
        """
        self.skipClassifier.train(student,pta,pta_next_topic,label)              
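
The two small rules in `SkipTopic` (the confidence gate and the training label) can be sketched as plain functions. The names `gate_skip` and `make_label` are hypothetical, and reading label 1 as "a skip might have been warranted" is an interpretation of the code above, not something the notebook states.

```python
def gate_skip(raw_decision, confidence_score, threshold=55):
    """Keep a skip decision (1) only when its confidence score reaches the threshold."""
    if raw_decision and confidence_score < threshold:
        return 0  # Not confident enough: stay on the current topic
    return raw_decision

def make_label(actual_payoff):
    """Label for a round that was played: 1 if the content failed, 0 if it succeeded."""
    return 1 if actual_payoff == 0 else 0

print(gate_skip(1, 34.57))  # classifier said skip, but the score is below 55
print(gate_skip(1, 60.0))   # classifier said skip with enough confidence
print(make_label(0), make_label(1))
```

With the default threshold of 55, a raw skip decision with score 34.57 is overridden to 0, which matches the "Decision of skip classifier is 0" lines in the simulator output.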

# Omniscient Policy / Oracle

In [38]:
class Oracle :
    """
    It has the optimal parameters to maximize rewards; a learning algorithm would update its own parameters to emulate them.
    It is an omniscient policy that knows all of the probability distributions. At every step of the way it makes the
    best decision based on its knowledge of the true distributions (it does not have to learn anything). 
    """
    def __init__(self):
        """
        Initializes parameters for the omniscient policy. 
        """
        self.rounds = 0 # Number of rounds played
        self.rounds_data = pd.DataFrame() # Rounds data required for Skip Algorithm
        self.oracle_lock = threading.Lock()
        self.skipTopic = SkipTopic()
    
    def setParameters(self, features , arms): # Setting optimal parameter theta
        """
        Sets the optimal parameters for the omniscient policy. 
        """
        parameters = np.random.uniform(size=(len(arms) , len(features)))
        # Normalize each arm's parameters so that every row sums to 1
        parameters = parameters / parameters.sum(axis=1, keepdims=True)
        # np.float was removed in NumPy 1.24; the built-in float is equivalent here
        self.theta_df = pd.DataFrame(data = parameters ,  index = arms , columns = features , dtype= float)
    
    def expectedPayoff(self,contexts,arms):
        """
        Gives the max expected pay-off for a round with the given context & available arms. The arm is not pulled here, as we 
        also depend on the decision from the skip classifier before the arm is actually pulled.         
        
        Input : 
        
        contexts : Contextual data available in the round. It's a combination of student & content context
        arms : Arms / Content available in this round. 
        
        Returns : 
        
        arm_pulled : The arm that should be pulled 
        expected_payoff : Expected pay-off for the arm suggested to be pulled. 
        
        """
        arms_payoff = list()
        for arm in arms:
            arm_theta = self.theta_df.loc[arm]
            X = contexts.loc[arm]
            pta = pd.Series.dot(X,arm_theta) # Vector dim : (1 * d) (d * 1).
            arms_payoff.append(pta)
        arm_index = np.argmax(arms_payoff)
        arm_pulled = arms[arm_index]
        expected_payoff = np.max(arms_payoff)
        return arm_pulled,np.round(expected_payoff,2)
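
The following sketch reproduces the oracle's two steps on toy data: drawing row-normalized parameters, then picking the arm with the largest dot product between its context and its parameters. The feature and arm names are a hypothetical subset, and the random generator is seeded only so the sketch is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = ["S_V", "S_T", "C_E", "C_I"]  # hypothetical subset of features
arms = ["C_1_1", "C_1_2", "C_1_3"]       # hypothetical arms

# Optimal parameters: uniform draws, normalized so each arm's row sums to 1.
theta = rng.uniform(size=(len(arms), len(features)))
theta = theta / theta.sum(axis=1, keepdims=True)
theta_df = pd.DataFrame(theta, index=arms, columns=features)

# One context row per arm, with every feature in [0, 1].
contexts = pd.DataFrame(
    rng.uniform(size=(len(arms), len(features))), index=arms, columns=features
)

# Row-wise dot product: expected pay-off of each arm.
payoffs = (contexts * theta_df).sum(axis=1)
best_arm = payoffs.idxmax()
print(best_arm, round(payoffs[best_arm], 2))
```

Because each row of `theta_df` sums to 1 and the features lie in [0, 1], every expected pay-off also lies in [0, 1] and can be passed to a Bernoulli draw.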

# Simulator

In [37]:
class Simulator:
    
    def __init__(self):
        self.context = Context()
        self.context.setStudentContext()
        self.context.setContentContext()
        self.context.setTopic()
        self.context.setTopicContent()
        self.oracle = Oracle()
        self.skipTopic = SkipTopic()
        self.simulator_lock = threading.Lock()
        self.rounds=0
        self.logs = pd.DataFrame(columns = ['student_number','topic','arm_pulled','pay-off','pay-off_next_topic','skip_decision'
                                            ,'reward']) 

    def getPayoff(self,pta):
        """
        Student shares feedback about the content / understanding of the topic. 
        
        Input : 
        
        pta : Payoff at round 't' for pulling an arm. 
        
        Returns : 
        
        reward : Reward / Feedback from student for the content shown / arm pulled
        """
        reward = bernoulli.rvs(size=1,p=pta)[0] # Simulate student's response
        return reward
    
    def takeCourse(self,student_number,studentContext,contentContext,topic,topic_content):
        """
        This method simulates students taking a course. As part of it, students are presented content for various topics. 
        Students share their feedback, based on which we either move to the next topic or remain on the same topic.  
        We get the expected pay-off from the oracle. We then decide whether to skip or remain on the same topic.
        If skip is true, then the student moves to the next topic, else the student remains on the same topic, shares feedback on 
        the content & we train the skip classifier with this feedback. This method drives the flow of the system, hence key 
        data elements available in this method are logged for analysis.
        
        Inputs : 
        
        student_number : Student Id 
        studentContext : Student context vector. 
        contentContext : Contents context. This has context of all contents for the topic. 
        topic : All the topics to be taught as part of the course. 
        topic_content : Relates all topics to the contents available for every topic     
         
        """
        for i in topic:
            contents = topic_content[i] # All arms associated with topic i
            t_c = contentContext.loc[contents]
            contexts = self.context.prepareContext(studentContext,t_c)
            arms = list(t_c.index)
            while arms:
                arm , pta = self.oracle.expectedPayoff(contexts,arms)
                skip_decision , pta_next_topic = self.skipTopic.skipTopic(studentContext,pta,i,self.context,topic_content,self.oracle)
                if skip_decision:
                    print('We\'re skipping. Student {0} is on topic {1} was expected to be shown content {2}. Expected Pay-off of this arm is {3}, compared to expected pay-off of next round is {4}. Decision of skip classifier is {5}'
                          .format(student_number,i,arm,pta,pta_next_topic,skip_decision))
                    break # Decision is to skip. Hence, we won't pull the arm. 
                else:
                    actual_payoff = self.getPayoff(pta)
                    log = pd.Series([student_number,i,arm,pta,pta_next_topic,skip_decision,actual_payoff], 
                                        index=['student_number','topic','arm_pulled','pay-off',
                                                'pay-off_next_topic','skip_decision','reward']) # Print log for this round
                    with self.simulator_lock:
                        self.rounds+=1
                        print('Student {0} is on topic {1} is shown content {2} feedback recd is {3}. Expected Pay-off of this arm is {4}, compared to expected pay-off of next round is {5}. Decision of skip classifier is {6}'
                              .format(student_number,i,arm,actual_payoff,pta,pta_next_topic,skip_decision))
                        self.logs = pd.concat([self.logs, log.to_frame().T], ignore_index=True) # Append this round to the log (DataFrame.append was removed in pandas 2.0)
                    label = self.skipTopic.setLabel(skip_decision,actual_payoff) # Set Label
                    self.skipTopic.train(studentContext,pta,pta_next_topic,label)
                if actual_payoff != 1:
                    arms.remove(arm)
                else:
                    break # Move to the next topic 

    def main(self):
        """
        It's the main method. It's in the name :)
        """
        studentContext = self.context.getStudentContext() # Student dataframe
        contentContext = self.context.getContentContext() # Content Dataframe
        topic = self.context.getTopic() # List of topics. 
        topic_content = self.context.getTopicContent() # Topics Data, which includes topics to content mapping.
        features = list(studentContext.columns) + list(contentContext.columns)
        self.oracle.setParameters(features , contentContext.index) 
        student_thread = list() # Keep track of students taking the course. 
        for student_number , student in studentContext.iterrows():
            t = threading.Thread(target=self.takeCourse, args=(student_number,student,contentContext,topic,topic_content))
            student_thread.append(t)
            # Daemon threads are killed as soon as the main thread exits, which suits background tasks such as sending
            # keepalive packets or performing periodic garbage collection. Student threads are different: every student
            # must complete the course even if the main thread finishes first, so they are created as non-daemon threads.
            t.daemon = False # Non-daemon: the program waits for this thread to finish
            t.start() # begins, must come after daemon definition
        for t in student_thread: # This is done to ensure, we proceed to save the logs only after all students have completed the course. 
            t.join()
        self.logs.to_csv('logs_oracle',index=False)
        print('Total Number of rounds : ', self.rounds)  
    
simulator = Simulator()
simulator.main()

Student 0 is on topic T_1 is shown content C_1_3 feedback recd is 0. Expected Pay-off of this arm is 0.5613074829669537, compared to expected pay-off of next round is 0.5405330029677575. Decision of skip classifier is 0
Actual decision made by classifier :  1
Confidence score returned is 34.57407902427907, which is less than threshold 55
Student 1 is on topic T_1 is shown content C_1_3 feedback recd is 0. Expected Pay-off of this arm is 0.581255215669592, compared to expected pay-off of next round is 0.5578660298323748. Decision of skip classifier is 0
Actual decision made by classifier :  1
Confidence score returned is 47.20465137766126, which is less than threshold 55
Student 2 is on topic T_1 is shown content C_1_2 feedback recd is 1. Expected Pay-off of this arm is 0.6134983106918924, compared to expected pay-off of next round is 0.6436691347516137. Decision of skip classifier is 0
Actual decision made by classifier :  1
Confidence score returned is 37.32974774733678, which is less

Student 1 is on topic T_3 is shown content C_3_2 feedback recd is 0. Expected Pay-off of this arm is 0.32845197822430405, compared to expected pay-off of next round is 0.619022850552005. Decision of skip classifier is 0
Student 3 is on topic T_5 is shown content C_5_1 feedback recd is 0. Expected Pay-off of this arm is 0.47093486487967756, compared to expected pay-off of next round is 0. Decision of skip classifier is 0
Actual decision made by classifier :  1
Confidence score returned is 18.237434360379016, which is less than threshold 55
Student 0 is on topic T_5 is shown content C_5_1 feedback recd is 0. Expected Pay-off of this arm is 0.5485151227461391, compared to expected pay-off of next round is 0. Decision of skip classifier is 0
Actual decision made by classifier :  1
Confidence score returned is 39.37191530186942, which is less than threshold 55
Student 3 is on topic T_5 is shown content C_5_2 feedback recd is 1. Expected Pay-off of this arm is 0.45798812976828335, compared t
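
The threading pattern in `Simulator` (one thread per student, a lock around shared state, and `join` before reading results) can be sketched on its own. The pay-off values, the number of students, and the names `state` and `take_course` are hypothetical; rewards are drawn with `bernoulli.rvs` as in `getPayoff`.

```python
import threading
from scipy.stats import bernoulli

state = {"rounds": 0, "rewards": []}  # shared between all student threads
lock = threading.Lock()

def take_course(student_number, payoffs):
    """Simulate one student: draw a Bernoulli reward per pay-off and record it under the lock."""
    for p in payoffs:
        reward = int(bernoulli.rvs(p=p, size=1)[0])  # 1 with probability p, else 0
        with lock:  # only one thread at a time may update the shared state
            state["rounds"] += 1
            state["rewards"].append((student_number, reward))

threads = [
    threading.Thread(target=take_course, args=(s, [0.2, 0.5, 0.8]))
    for s in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every student before reading the totals

print("Total number of rounds:", state["rounds"])
```

The interleaving of students differs from run to run, which is why the log lines above are not grouped by student, but the total number of rounds recorded under the lock is always consistent.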