# Omniscient Policy / Oracle

This notebook implements an omniscient policy (the oracle) that knows all the true probability distributions. At every step it makes the best decision given that knowledge, so it does not have to learn anything. Because the oracle holds the optimal parameters $\theta$, it is expected to maximize reward in fewer rounds. 

Before running an experiment, set $\textbf{file_path}$ to the location of the dataset, which holds the contextual features and the course outline (topics and content items). Configure $\alpha$ to control exploration, $\textbf{confidence_threshold}$ to control skipping, and $\textbf{to_csv}$ to set the name of the file that logs user interactions. A user interaction records the content items presented to the student, the expected payoff of the current and next topic in the sequence, the prediction of the skip classifier, and the feedback sent by the student. These logs are required to evaluate the learning algorithm.

- Configure / validate these before you run: 
    1. file_path: location of the dataset.
    2. alpha: controls exploration.
    3. confidence_threshold: controls skipping.
    4. to_csv: name of the file used to log user interactions.
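As a rough illustration, the four settings could be gathered in one place as below. This is only a sketch: the value of `alpha` is an assumed placeholder (the oracle cells shown here do not read it), while the other values mirror what the notebook's cells currently hard-code.

```python
import os

# Location of the dataset, built the same way as in the Contexts cell.
file_path = os.path.join(os.path.curdir, '..', 'dataset', 'large')

# Exploration parameter. Assumed placeholder value, for illustration only.
alpha = 0.1

# Starting confidence threshold, as set in SkipTopic.__init__.
confidence_threshold = 100

# File name the simulator writes its interaction log to.
to_csv = 'logs_oracle_large'

print(file_path, alpha, confidence_threshold, to_csv)
```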

In [15]:
import numpy as np
import pandas as pd

# Contexts

In [16]:
import os,pickle

# file_path = os.path.join(os.path.curdir, '..' , 'dataset' , 'very_small')
# file_path = os.path.join(os.path.curdir, '..' , 'dataset' , 'small')
# file_path = os.path.join(os.path.curdir, '..' , 'dataset' , 'medium')
file_path = os.path.join(os.path.curdir, '..' , 'dataset' , 'large')
# file_path = os.path.join(os.path.curdir, '..' , 'dataset' , 'very_large')
'''
Context data for learning
'''
class Context:
    """
    Contextual information required by contextual bandit algorithms to make better predictions. It encapsulates all data
    about the student, topics & content to experiment with the oracle.
    """
   
    def getStudentContext(self):
        """
        Student Preferences: 
        Visual (S_V) , Text (S_T) , Demo-based (S_D) , Practical (S_P), Step-by-step (S_S) ,Activity / Task based (S_AT), 
        Lecture (S_L) , Audio (S_A) , Self-evaluation (S_SE) , Pre-assessment (S_PA)
        Students' preferences for the various ways of learning are rated on a scale from 0 to 1, rather than being binary. 
        """
        return self.studentContext
    
    def setStudentContext(self):
        """
        Load the student data
        """
        with open(os.path.join(file_path , 'student.pickle'), 'rb') as student_file:
            self.studentContext= pickle.load(student_file)
    
    def getContentContext(self):
        """
        Content Features 
        Ease of understanding (C_E), Simple / Intuitive (C_I), Surface / In-depth (C_ID), Brief / Concise (C_C), 
        Thorough (C_T), Preference / Well reviewed / Well rated (C_R), Theoretical / Abstract (C_A), 
        Practical / Hands on (C_P), Experimental / Task-based (C_ETB)
        Content features are rated on a scale from 0 to 1, rather than being binary. 
        """
        return self.contentContext
   
    def setContentContext(self):
        """
        Load the content data
        """
        with open(os.path.join(file_path ,'content.pickle'), 'rb') as content_file:
            self.contentContext= pickle.load(content_file)
        
    def getTopic(self):
        """
        Gives the topics part of the course.
        """
        return self.topic
    
    def setTopic(self):
        """
        Loads the topics part of the course
        """
        with open(os.path.join(file_path ,'topic.pickle'), 'rb') as topic_file:
            self.topic = pickle.load(topic_file)
    
    def getTopicContent(self):
        """
         Gets the topic content. topic_content is a map of topics to content. So for every topic, it gives the content 
         available for the topic. In education parlance, for any given topic, it shows the different ways of teaching this
         topic (via contents)
        """
        return self.topic_content
    
    def setTopicContent(self):
        """
        Sets the topic_content variable to the one in the serialized object. topic_content is a map of topics to content. So
        for every topic, it gives the content available for the topic. In education parlance, for any given topic, it shows
        the different ways of teaching this topic (via contents)
        """
        with open(os.path.join(file_path ,'topic_content.pickle'), 'rb') as topic_content_file:
            self.topic_content= pickle.load(topic_content_file)
                
    def prepareContext(self,studentContext,contentContext):
        """
           Given the student & content context available for a round, this method combines them to form a single contextual
           variable
           
           Inputs : 
           
           studentContext: Student contextual information.
           contentContext: Contents contextual information. 
           
           Returns :
           
           context : A combined output of student & content context.
        """
        # Build one combined row (student features followed by content features) per content item.
        # pd.concat replaces Series.append / DataFrame.append, which were removed in pandas 2.0.
        content_ids = list(contentContext.index)
        rows = [pd.concat([studentContext, contentContext.loc[content]]) for content in content_ids]
        context = pd.DataFrame(rows, index=content_ids)
        context.index.name = 'Content_id'
        return context
    
    def loadData(self):
        """
        Dummy method used to test data retrieval. The data generator handles data generation; this method only checks
        that the data can be retrieved. It's not invoked in the main program.
        """
        self.setStudentContext()
        self.setContentContext()
        self.setTopic()
        self.setTopicContent()
        print("Student Context Shape : ", self.getStudentContext().shape)
        print("Content Context Shape : ", self.getContentContext().shape)
        print("**********************************")
        print(self.getStudentContext())
        print(type(self.getStudentContext()))
        print('*********************************')
        print(self.getContentContext())
        print(type(self.getContentContext()))
        print('*********************************')
        print(self.getTopic())
        print(type(self.getTopic()))
        print('*********************************')
        print(self.getTopicContent())
        print(type(self.getTopicContent()))
                
c_test = Context()
c_test.loadData()

Student Context Shape :  (400, 10)
Content Context Shape :  (720, 9)
**********************************
      S_V   S_T   S_D   S_P   S_S  S_AT   S_L   S_A  S_SE  S_PA
0    0.95  0.21  0.14  0.98  0.22  0.86  0.93  0.62  0.61  0.22
1    0.11  0.19  0.45  0.65  0.20  0.51  0.27  0.34  0.12  0.92
2    0.06  0.55  0.36  0.61  0.50  0.21  0.88  0.67  0.35  0.37
3    0.94  0.80  0.44  0.33  0.79  0.88  0.13  0.97  0.75  0.35
4    0.28  0.32  0.52  0.02  0.24  0.54  0.96  0.62  0.87  0.64
5    0.78  0.34  0.02  0.02  0.58  0.42  0.67  0.32  0.13  1.00
6    0.50  0.19  0.10  0.31  0.69  0.13  0.00  0.01  0.32  0.07
7    0.40  0.99  0.13  0.46  0.61  0.55  0.90  0.40  0.90  0.90
8    0.94  0.84  0.02  0.97  0.30  0.34  0.19  0.08  0.52  0.91
9    0.08  0.20  0.01  0.62  0.12  0.28  0.93  0.52  0.67  0.68
10   0.59  0.01  0.61  0.98  0.08  0.07  0.39  0.98  0.54  0.16
11   0.99  0.12  0.10  0.69  0.30  0.98  0.19  0.10  0.22  0.77
12   0.68  0.28  0.38  0.98  0.11  0.48  0.80  0.06  0.90  0.71
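To make the role of `prepareContext` concrete, here is a minimal sketch on toy data. The feature values and the content ids `c1` and `c2` are made up for illustration; only the column naming follows the notebook. Each row of the result is the student's feature vector followed by one content item's feature vector.

```python
import pandas as pd

# Toy student vector and content table (values are illustrative only).
student = pd.Series({'S_V': 0.95, 'S_T': 0.21})
contents = pd.DataFrame({'C_E': [0.4, 0.9], 'C_I': [0.7, 0.1]}, index=['c1', 'c2'])

# One combined row per content item: student features, then content features.
rows = [pd.concat([student, contents.loc[content_id]]) for content_id in contents.index]
context = pd.DataFrame(rows, index=contents.index)
context.index.name = 'Content_id'

print(context)
```

The student columns are identical in every row; only the content columns differ, which is what lets the oracle compare arms for the same student.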


# Skip Classifier

In [11]:
# Online stochastic gradient descent. This classifier decides whether or not to skip to the next topic. 
# TO-DO: Try different loss functions (log, hinge, others) to see whether they affect performance, and try different 
# parameter values (for instance, SGD has a parameter alpha and SVM has a parameter C). To optimize, train on a 
# mini-batch of samples rather than one data point at a time. Try different values of learning_rate. Look at the 
# class_weight parameter to give more weight to samples of one class over the other. The warm_start parameter still 
# needs to be understood.
# We need to record predictions made by the classifier to evaluate its performance over rounds.
from sklearn import linear_model
class SkipClassifier:
    """
    A classifier that predicts whether or not to move to the next topic. This is important because we want 
    students to see content that the algorithm is confident will help them learn. The skip classifier is trained
    online, hence we use a confidence threshold to be conservative & minimize skipping topics. Skipping is not preferred, 
    but if the classifier is confident the next round would help gain higher rewards, then we should skip. Ideally, we want 
    to consider skipping only after the first pulled arm has failed, to avoid frustrating the student. 
    """
    def __init__(self):
        self.clf = linear_model.SGDClassifier()
        self.clf.partial_fit(np.array([[0,0,0,0,0,0,0,0,0,0,0,0]]),np.array([0]),classes=np.array([0,1])) # Used to initialize the skip classifier
#         if os.path.exists('skip_classifier_oracle_large.sav'):
#             self.clf = pickle.load(open('skip_classifier_oracle.sav', 'rb'))
#         else:
#             self.clf = linear_model.SGDClassifier()
#             self.clf.partial_fit(np.array([[0,0,0,0,0,0,0,0,0,0,0,0]]),np.array([0]),classes=np.array([0,1])) # Used to initialize the skip classifier
# #         self.classifier_lock = threading.Lock()
        
        
    def check_fitted(self,clf): 
        """
        Check if the classifier is fit before asking for prediction. Our classifier is trained in online mode, hence it would
        be asked to predict before fitting. This method makes sure we only ask for prediction after a data point has been 
        fit to the estimator/model
        """
        return hasattr(clf, "classes_")
    
    def train(self,student,pta,next_topic_pta,label):
        """
        Used to train the classifier in online mode, over every data point. In future we might want to consider training in 
        mini-batches, rather than for every data point. 
        """
        X = pd.concat([student,pd.Series([pta,next_topic_pta],index=['pta','next_topic_pta'])]) # Series.append was removed in pandas 2.0
        X = np.array([X.values])
        Y = np.array([label])
        self.clf = self.clf.partial_fit(X,Y)
        with open('skip_classifier_oracle_large.sav', 'wb') as model_file: # Close the file once the model is saved
            pickle.dump(self.clf, model_file)
               
    def predict(self,student,pta,next_topic_pta):
        """
        Gets predictions from the classifier, along with the confidence score to help determine the reliability / confidence
        level of the prediction being made. 
        """
        X = pd.concat([student,pd.Series([pta,next_topic_pta],index=['pta','next_topic_pta'])]) # Series.append was removed in pandas 2.0
        if self.check_fitted(self.clf):
            Y = self.clf.predict([X.values])[0]
            confidence_score = self.clf.decision_function([X.values])[0]
        else:
            Y = 0
            confidence_score = 0
        return int(Y) , confidence_score 
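The online training pattern used by `SkipClassifier` can be sketched in isolation as follows. This is a toy example, not the notebook's data: the feature rows hold only `[pta, next_topic_pta]` (the notebook also prepends the student features), and the labels are invented. It shows `partial_fit` consuming one sample at a time and `decision_function` returning the signed score that the confidence threshold is later compared against.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # 0 = stay on the topic, 1 = skip

# Toy stream of (features, label) pairs; values are illustrative only.
stream = [
    (np.array([[0.9, 0.2]]), 0),
    (np.array([[0.1, 0.8]]), 1),
    (np.array([[0.8, 0.3]]), 0),
    (np.array([[0.2, 0.9]]), 1),
]

for features, label in stream:
    # classes is required on the first call; repeating it later is allowed.
    clf.partial_fit(features, np.array([label]), classes=classes)

query = np.array([[0.15, 0.85]])
prediction = int(clf.predict(query)[0])
score = float(clf.decision_function(query)[0])
print(prediction, score)
```

For a binary linear classifier the sign of the score agrees with the predicted class, so a large positive score is what signals a confident "skip".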

# SkipTopic

In [12]:
import threading # Needed for threading.Lock in SkipTopic.__init__

class SkipTopic:
    """
    A wrapper around the Skip Classifier to validate the inputs, before sending it to Skip Classifier for prediction. 
    It post-processes the results of the prediction made by skip classifier to check for confidence threshold, 
    before sending out the decision to skip or not. 
    """
    def __init__(self):
        """
        Initializes the SkipTopic class & sets confidence threshold to make confident skip decisions.
        """
        self.skipClassifier = SkipClassifier()
        self.confidence_threshold = 100 # If the confidence score returned by the classifier is greater than this, then we trust the decision made by the classifier. 
        self.threshold_updated_count = 1
        self.skip_topic_lock = threading.Lock()
                
    def skipTopic(self,student,pta,topic_number,context_obj,topic_content,oracle):
        """
        Pre-validates the topic number before asking the skip classifier for a prediction. Then checks the confidence 
        of the prediction before sending out the decision to skip or not. 
        """
        contentContext = context_obj.getContentContext() # Get the content dataframe.
        topic = context_obj.getTopic() # Get the topic list. 
        current_topic_index = topic.index(topic_number) # Get the index number of the current topic
        next_topic_index = current_topic_index + 1
        next_topic = '' # Initialized to make it accessible outside the if statement. 
        if next_topic_index < len(topic): # Check to see if we're going out of bounds
            next_topic = topic[next_topic_index]
            next_topic_contents = topic_content[next_topic]
            t_c = contentContext.loc[next_topic_contents]
            X = context_obj.prepareContext(student,t_c)
            arm_pulled , next_topic_pta = oracle.expectedPayoff(X,next_topic_contents)
        else:
            # Will be going out of bounds. Current topic is the last topic. No more topics to complete. 
            next_topic_pta = 0
        actual_decision , confidence_score = self.skipClassifier.predict(student,pta,next_topic_pta)
        if actual_decision and confidence_score > self.confidence_threshold:
            skip_decision = 1
        else:
            skip_decision = 0
        return actual_decision,confidence_score,skip_decision,next_topic_pta

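    def setLabel(self,actual_payoff):
        """
        Sets the label for training the skip classifier. A successful round (reward of 1) is labelled 0 (do not skip);
        any other reward is labelled 1 (skip).
        """
        if actual_payoff == 1:
            label = 0
        else: # No reward (0, or -1 when failures are penalized)
            label = 1
        return label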
    
    def train(self,student,pta,pta_next_topic,label):
        """
        Training the skip classifier
        """
        self.skipClassifier.train(student,pta,pta_next_topic,label)   
        
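    def updateConfidenceThreshold(self , rounds):
        """
        Lowers the confidence threshold as more rounds are played. Each time log10(rounds) exceeds the update count, 
        the threshold is divided by log10(rounds) and the count is incremented.
        """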
        if np.log10(rounds) > self.threshold_updated_count : 
            self.confidence_threshold /= np.log10(rounds)
            self.threshold_updated_count += 1
            with self.skip_topic_lock:
                print("self.confidence_threshold : {0} and self.threshold_updated_count : {1}".format(self.confidence_threshold,self.threshold_updated_count))
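The threshold decay and the skip gate can be sketched without the classifier. The helper names `threshold_schedule` and `should_skip` are hypothetical and exist only for this illustration; the update rule copies `updateConfidenceThreshold`, and the gate combines the checks made in `skipTopic` and `takeCourse`. The sketch assumes rounds are counted one at a time in a single thread.

```python
import math

def threshold_schedule(total_rounds, initial_threshold=100.0):
    """Replay the threshold updates for rounds 1..total_rounds."""
    threshold = initial_threshold
    updated_count = 1
    updates = []
    for rounds in range(1, total_rounds + 1):
        if math.log10(rounds) > updated_count:
            threshold /= math.log10(rounds)
            updated_count += 1
            updates.append((rounds, threshold))
    return updates

def should_skip(actual_decision, confidence_score, threshold, skip_enabled):
    """Skip only if the classifier says skip, is confident enough, and skipping is enabled."""
    return bool(actual_decision) and confidence_score > threshold and skip_enabled

updates = threshold_schedule(500)
for rounds, threshold in updates:
    print(rounds, round(threshold, 2))
```

Because the divisor grows by roughly one at each power of ten, the threshold falls quickly at first and then only at long intervals, so the classifier stays conservative while it has seen little data.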
        

# Omniscient Policy / Oracle

In [13]:
class Oracle :
    """
    It has the optimal parameters to maximize rewards. A learning algorithm updates its own parameters to emulate these.
    It is an omniscient policy that knows all of the probability distributions. This is the algorithm which, every step of 
    the way, makes the best decision based on its knowledge of the true distributions (it does not have to learn anything). 
    """
    def __init__(self):
        """
        Initializes parameters for the omniscient policy. 
        """
#         self.rounds = 0 # Number of round played
#         self.rounds_data = pd.DataFrame() # Rounds data required for Skip Algorithm
    
    def setParameters(self, features , arms): # Setting optimal parameter theta
        """
        Sets the optimal parameters for the omniscient policy. 
        """
        parameters = np.random.uniform(size=(len(arms) , len(features)))
        # Normalize each arm's parameters so that they sum to 1
        parameters = parameters / parameters.sum(axis=1, keepdims=True)
        self.theta_df = pd.DataFrame(data = parameters ,  index = arms , columns = features , dtype= float) # np.float was removed in NumPy 1.24
    
    def expectedPayoff(self,contexts,arms):
        """
        Gives the max expected pay-off for a round with the given context & available arms. The arm is not pulled here, as we 
        also depend on the decision from the skip classifier before the arm is actually pulled.         
        
        Input : 
        
        contexts : Contextual data available in the round. Its a combination of student & content context
        arms : Arms / Content available in this round. 
        
        Returns : 
        
        arm_pulled : The arm that should be pulled 
        expected_payoff : Expected pay-off for the arm suggested to be pulled. 
        
        """
        arms_payoff = list()
        for arm in arms:
            arm_theta = self.theta_df.loc[arm]
            X = contexts.loc[arm]
            pta = pd.Series.dot(X,arm_theta) # Vector dim : (1 * d) (d * 1).
            arms_payoff.append(pta)
#         for i in range(len(arms_payoff)): # Normalize arms_payoff. Required for cases when alpha > 1. Have it in a list comprehension.
#             arms_payoff[i] = arms_payoff[i] / np.sum(arms_payoff)
        arm_index = np.argmax(arms_payoff)
        arm_pulled = arms[arm_index]
        expected_payoff = np.max(arms_payoff)
        return arm_pulled,np.round(expected_payoff,2)
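A small NumPy sketch of the oracle's computation, using made-up sizes and random values rather than the notebook's dataset. It illustrates why the expected pay-off can be used directly as a Bernoulli probability: every feature lies in [0, 1] and each arm's parameters are non-negative and sum to 1, so the dot product is a weighted average of the features and also lies in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_features = 3, 5  # illustrative sizes only

# Optimal parameters: uniform draws, each row normalized to sum to 1.
theta = rng.uniform(size=(n_arms, n_features))
theta /= theta.sum(axis=1, keepdims=True)

# One context row per arm, with every feature in [0, 1).
contexts = rng.uniform(size=(n_arms, n_features))

# Expected pay-off per arm: dot product of the arm's context and its parameters.
payoffs = (contexts * theta).sum(axis=1)
best_arm = int(np.argmax(payoffs))

print(best_arm, np.round(payoffs[best_arm], 2))
```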

# Simulator

In [14]:
import threading
from scipy.stats import bernoulli

class Simulator:
    
    def __init__(self):
        self.context = Context()
        self.context.setStudentContext()
        self.context.setContentContext()
        self.context.setTopic()
        self.context.setTopicContent()
        self.oracle = Oracle()
        self.skipTopic = SkipTopic()
        self.simulator_lock = threading.Lock()
        self.rounds=0
        self.rounds_interval = 1
#         self.logs = pd.DataFrame(columns = ['student_number','topic','arm_pulled','reward']) 

        self.logs = pd.DataFrame(columns = ['student_number','topic','arm_pulled','pay-off','pay-off_next_topic','actual_decision','confidence_score','skip_decision','skip_enabled'
                                            ,'reward']) 


    def getPayoff(self,pta):
        """
        Student shares feedback about the content / understanding of the topic. 
        
        Input : 
        
        pta : Payoff at round 't' for pulling an arm. 
        
        Returns : 
        
        reward : Reward / Feedback from student for the content shown / arm pulled
        """
        reward = bernoulli.rvs(size=1,p=pta)[0] # Simulate student's response
#         if reward == 0:
#             reward = -1
        return reward
    
    def takeCourse(self,student_number,studentContext,contentContext,topic,topic_content):
        """
        This method simulates students taking a course. As part of it, students are presented content for various topics. 
        Students share their feedback, based on which we either move to the next topic or remain on the same topic.  
        We get the expected pay-off from the oracle. We then decide whether to skip or remain on the same topic.
        If skip is true, then the student moves to the next topic, else the student remains on the same topic, shares feedback on 
        the content & we train the skip classifier with this feedback. This method drives the flow of the system, hence key 
        data elements available in this method are logged for analysis.
        
        Inputs : 
        
        student_number : Student Id 
        studentContext : Student context vector. 
        contentContext : Contents context. This has context of all contents for the topic. 
        topic : All the topics to be taught as part of the course. 
        topic_content : Relates all topics to the contents available for every topic     
         
        """
        for i in topic:
            skip_enabled = False # Done to disable skipping without attempting to teach a student. 
            contents = topic_content[i] # You now have all arm associated with the topic 't'
            t_c = contentContext.loc[contents]
            contexts = self.context.prepareContext(studentContext,t_c)
            arms = list(t_c.index)
            while arms:
                arm , pta = self.oracle.expectedPayoff(contexts,arms)
                actual_decision , confidence_score, skip_decision , pta_next_topic = self.skipTopic.skipTopic(studentContext,pta,i,self.context,topic_content,self.oracle)
                if skip_decision and skip_enabled:
                    log = pd.Series([student_number,i,arm,pta,pta_next_topic,actual_decision,confidence_score,skip_decision,skip_enabled], 
                                        index=['student_number','topic','arm_pulled','pay-off',
                                                'pay-off_next_topic','actual_decision','confidence_score','skip_decision','skip_enabled']) # Print log for this round
                    with self.simulator_lock:
#                         print('We\'re skipping. Student {0} is on topic {1} was expected to be shown content {2}. Expected Pay-off of this arm is {3}, compared to expected pay-off of next round is {4}. Actual decision was {5} with confidence {6} Decision of skip classifier is {7}'
#                           .format(student_number,i,arm,pta,pta_next_topic,actual_decision,confidence_score,skip_decision))                    
                        self.logs = pd.concat([self.logs, log.to_frame().T], ignore_index=True) # Log this round (DataFrame.append was removed in pandas 2.0)
                    break # Decision is to skip. Hence, we won't pull the arm. 
                else:
                    actual_payoff = self.getPayoff(pta)
#                 log = pd.Series([student_number,i,arm,actual_payoff], 
#                                 index=['student_number','topic','arm_pulled','reward']) # Print log for this round
                    log = pd.Series([student_number,i,arm,pta,pta_next_topic,actual_decision,confidence_score,skip_decision,skip_enabled,actual_payoff], 
                                        index=['student_number','topic','arm_pulled','pay-off',
                                                'pay-off_next_topic','actual_decision','confidence_score','skip_decision','skip_enabled','reward']) # Print log for this round
                    with self.simulator_lock:
                        self.rounds+=1
                        if self.rounds > self.rounds_interval:
                            print('{0} rounds completed'.format(self.rounds))
                            self.rounds_interval += 100                        
#                         print('Student {0} is on topic {1} is shown content {2} feedback recd is {3}.'
#                                   .format(student_number,i,arm,actual_payoff))
#                         self.logs = self.logs.append(log , ignore_index=True) # Log in a file
#                         print('Student {0} is on topic {1} is shown content {2} feedback recd is {3}. Expected Pay-off of this arm is {4}, compared to expected pay-off of next round is {5}. Actual decision was {6} with confidence {7}. Decision of skip classifier is {8} and skipping is {9}.'
#                               .format(student_number,i,arm,actual_payoff,pta,pta_next_topic,actual_decision,confidence_score,skip_decision,skip_enabled))
                        self.logs = pd.concat([self.logs, log.to_frame().T], ignore_index=True) # Log this round (DataFrame.append was removed in pandas 2.0)
                    label = self.skipTopic.setLabel(actual_payoff) # Set Label
                    self.skipTopic.train(studentContext,pta,pta_next_topic,label)
                    self.skipTopic.updateConfidenceThreshold(self.rounds)
                if actual_payoff != 1:
                    arms.remove(arm)
                    skip_enabled = True
                else:
                    break # Move to the next topic 

    def main(self):
        """
        It's the main method. It's in the name :)
        """
        studentContext = self.context.getStudentContext() # Student dataframe
        contentContext = self.context.getContentContext() # Content Dataframe
        topic = self.context.getTopic() # List of topics. 
        topic_content = self.context.getTopicContent() # Topics Data, which includes topics to content mapping.
        features = list(studentContext.columns) + list(contentContext.columns)
        self.oracle.setParameters(features , contentContext.index) 
        student_thread = list() # Keep track of students taking the course. 
        for student_number , student in studentContext.iterrows():
            t = threading.Thread(target=self.takeCourse, args=(student_number,student,contentContext,topic,topic_content))
            student_thread.append(t)
            # Some threads do background tasks, like sending keepalive packets, or performing periodic garbage collection, or 
            # whatever. These are only useful when the main program is running, and it's okay to kill them off once the other, 
            # non-daemon, threads have exited. Once the main thread finishes & one of the student is still working through the course. 
            # we will wait for the student to complete the course, since the main thread is completed. We want all students 
            # to complete the course. Hence, setting daemon to False
            t.daemon = False # Non-daemon thread: the program waits for every student to finish the course
            t.start() # begins, must come after daemon definition
        for t in student_thread: # This is done to ensure, we proceed to save the logs only after all students have completed the course. 
            t.join()
        self.logs.to_csv('logs_oracle_large',index=False)
        print('Total Number of rounds : ', self.rounds)  
    
simulator = Simulator()
simulator.main()

2 rounds completed
self.confidence_threshold : 89.77117175026231 and self.threshold_updated_count : 2
self.confidence_threshold : 44.78881127772551 and self.threshold_updated_count : 3
102 rounds completed
202 rounds completed
302 rounds completed
402 rounds completed
502 rounds completed
602 rounds completed
702 rounds completed
802 rounds completed
902 rounds completed
self.confidence_threshold : 14.927443870172556 and self.threshold_updated_count : 4
1002 rounds completed
1102 rounds completed
1202 rounds completed
1302 rounds completed
1402 rounds completed
1502 rounds completed
1602 rounds completed
1702 rounds completed
1802 rounds completed
1902 rounds completed
2002 rounds completed
2102 rounds completed
2202 rounds completed
2302 rounds completed
2402 rounds completed
2502 rounds completed
2602 rounds completed
2702 rounds completed
2802 rounds completed
2902 rounds completed
3002 rounds completed
3102 rounds completed
3202 rounds completed
3302 rounds completed
3402 rounds co

34702 rounds completed
34802 rounds completed
34902 rounds completed
35002 rounds completed
35102 rounds completed
35202 rounds completed
35302 rounds completed
35402 rounds completed
35502 rounds completed
35602 rounds completed
35702 rounds completed
35802 rounds completed
35902 rounds completed
36002 rounds completed
36102 rounds completed
36202 rounds completed
36302 rounds completed
36402 rounds completed
36502 rounds completed
36602 rounds completed
36702 rounds completed
36802 rounds completed
36902 rounds completed
37002 rounds completed
37102 rounds completed
37202 rounds completed
37302 rounds completed
37402 rounds completed
37502 rounds completed
37602 rounds completed
37702 rounds completed
37802 rounds completed
37902 rounds completed
38002 rounds completed
38102 rounds completed
38202 rounds completed
38302 rounds completed
38402 rounds completed
38502 rounds completed
38602 rounds completed
38702 rounds completed
38802 rounds completed
38902 rounds completed
39002 round

Exception ignored in: <function _get_module_lock.<locals>.cb at 0x7fd4f47bb0d0>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 191, in cb
KeyError: 'pandas._libs.pandas.core.dtypes.cast'


55602 rounds completed
55702 rounds completed
55802 rounds completed
55902 rounds completed
56002 rounds completed
56102 rounds completed
56202 rounds completed
56302 rounds completed
56402 rounds completed
56502 rounds completed
56602 rounds completed
56702 rounds completed
56802 rounds completed
56902 rounds completed
57002 rounds completed
57102 rounds completed
57202 rounds completed
57302 rounds completed
57402 rounds completed
57502 rounds completed
57602 rounds completed
57702 rounds completed
57802 rounds completed
57902 rounds completed
58002 rounds completed
58102 rounds completed
58202 rounds completed
58302 rounds completed
58402 rounds completed
58502 rounds completed
58602 rounds completed
58702 rounds completed
58802 rounds completed
58902 rounds completed
59002 rounds completed
59102 rounds completed
59202 rounds completed
59302 rounds completed
59402 rounds completed
59502 rounds completed
59602 rounds completed
59702 rounds completed
59802 rounds completed
59902 round
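The per-topic loop in `takeCourse` can be reduced to the following single-student sketch. It leaves out threading, logging, and the skip classifier, and the helper name `teach_topic` and the pay-off values are hypothetical. The point is the control flow: offer the arm with the highest expected pay-off, draw a Bernoulli reward, and on failure remove that arm and try the next best one.

```python
import numpy as np

def teach_topic(arm_payoffs, rng):
    """Offer arms in decreasing expected pay-off until one succeeds.

    arm_payoffs: dict mapping arm id -> success probability in [0, 1].
    Returns the (arm, reward) pairs in the order they were tried.
    """
    remaining = dict(arm_payoffs)
    history = []
    while remaining:
        arm = max(remaining, key=remaining.get)        # best remaining arm
        reward = int(rng.binomial(1, remaining[arm]))  # simulated student feedback
        history.append((arm, reward))
        if reward == 1:
            break           # topic understood: move to the next topic
        del remaining[arm]  # do not show the same content again
    return history

rng = np.random.default_rng(42)
print(teach_topic({'c1': 0.3, 'c2': 0.7, 'c3': 0.5}, rng))
```

Each loop iteration corresponds to one round in the log above, which is why the round count can exceed the number of students times the number of topics.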

In [111]:
# reward_not_0 = pd.read_csv('logs_oracle_medium_CT10_Topics10')
# reward_not_0 = pd.read_csv('logs_linUCB_verySmall_0.001')
reward_not_0 = pd.read_csv('logs_oracle_verySmall')
len(reward_not_0['arm_pulled'].unique())

69

In [40]:
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('logs_oracle_small')
#df[df['confidence_score'] > 25]
df[df['student_number'] == 8]


FileNotFoundError: File b'logs_oracle_small' does not exist

Setting the reward as -1 & 1. 

We now penalize arms for wrong predictions: for every wrong prediction, we set the reward to -1 instead of 0. A unique advantage of this is that more content items are explored, because the expected pay-off of a content item now decreases after a wrong prediction. This was not the case earlier, as we did not penalize content items for not getting a reward. 
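The effect of the two reward encodings can be seen on a toy feedback sequence. The running total below is only a stand-in for whatever estimate the learning algorithm maintains, and the helper `to_signed` is a hypothetical name for the 0 to -1 mapping described above.

```python
def to_signed(reward):
    """Map a binary reward in {0, 1} to a signed reward in {-1, 1}."""
    return 1 if reward == 1 else -1

# One success followed by two failures for the same content item.
feedback = [1, 0, 0]

binary_total = sum(feedback)                        # failures leave the total unchanged
signed_total = sum(to_signed(r) for r in feedback)  # failures pull the total down

print(binary_total, signed_total)
```

With the binary encoding the item keeps the credit from its single success, while with the signed encoding the two failures outweigh it, which makes other content items comparatively more attractive.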

