# Algorithms

### Glossary 

**Student** : Someone trying to learn a course.  
**Environment**: Parameters for optimal reward exist in a environment. Its receives prediction from the learning algorithm, calculates expectated reward for the choosen arm. Sends its to Bernoulli, which returns a reward (to simulate students feedback). It sends feedback to the learning algorithm to update its parameters.  
**Learning Algorithm** : Its a contextual bandit algorithm which predicts the best content for a student. It sends its prediction to nature & gets a feedback.  It updates the parameters of the arm to incorporate the feedback. Its does this iteratively to be close to the optimal policy.
**Arms/Content/Action** : For this notebook, these words mean the same. Its the _content_ presented to the student, or the _arm_ pulled by the bandit algorithm or the _action_ taken to obtain a reward.  
**Course**: A subject a student is trying to master.  
**Topic**: A concept within a course. Every topic has atleast 2 ways of teaching a concept.  
**Content**: A way of teach a student, to streamline learning. This is choosen by the learning algorithm. The learning algorithm chooses content based on student & content features to provide the best experience to the student, which for the learning algorithm is to maximize rewards.   
**Feedback/Reward**: A response send by the student to indicate whether they found the content useful. A recognition of the quality of predictions made by the learning algorithm, which tries to maximize it.  
**Rounds(T)**: Total number of rounds played by all students to complete a course. Parameters are updated in every round.  
**Skip Algorithm** : An online learning algorithm, to predict whether to skip & move on to the next topic or remain at the same topic & present content. Goal is to maximize reward. **TO BE IMPLEMENTED**

Nature estimates expected reward for different arms. If knows the best parameters, but does not reveal it to the learning algorithm. It calculates the expected reward for each arm as $E[{r_{t,a}}|{x_{t,a}}] =  x^{T}_{t,a} θ^∗_{a}$ , where _'r'_ is the expected reward for predicting an action _'$a$'_  at round _'$t$'_ given context information _'$x$'_ about the student & content is equal to the dot product of the transpose of the context vector and the parametric vector $\theta$ of an arm.

### Flow


### TO-DO

- MAKE SURE YOU MULTIPLYING THE RIGHT CONTEXT. A WITH A , V WITH V . ORDER NEEDS TO BE CONSISTENT. 
- NATURE: NORMALIZING THE PARAMETERS , OVER THE CONTEXT,  FOR EACH ARM
- Evaluation of learning algorithm. 
- Multithreading across students to simulate real life scenario. 
- Relevant features for content. More features for student & content. 
- Criteria to trust the skipping predictor. 
- Prototype system , with user interface showing the learning algorithm in action.

In [488]:
import numpy as np
import pandas as pd
from scipy.stats import bernoulli

# Data Creation

In [489]:
'''
This class holds student data. Its made to have all attributes of the students. StudentContext is meant to takes a subset 
of attributes from this class
'''
class Students:
    '''
    student_data created during data generation
    '''
    def setStudentsFeatures(self , student_data):
        self.studentsFeatures = student_data
    
    def getStudentsFeatures(self):
        return self.studentsFeatures
    
'''
This class holds content data. Its made to have all attributes of contents & topics. ContentContext takes a subset of 
attributes of content & topics
'''
class Content:
        
    def getContentData(self): # Rename to content data
        return self.contentsFeatures
    '''
    courseContent created during data generation
    '''
    def setContentData(self,courseContent):
        self.contentsFeatures = courseContent
        
    def getTopics(self):
        return self.topicContent
    '''
    topics created during data generation
    '''   
    def setTopics(self,topics):
        self.topicContent = topics

'''
Class that enscapulates student & content data generators. Its uses the StudentDataGen & ContentDataGen to create data. 
'''
class DataGenerator:
    
    def __init__(self):
        self.studentDataGen = StudentDataGen()
        self.contentDataGenerator = ContentDataGen()
        
    def createStudentData(self):
        self.studentData =  self.studentDataGen.create()

    def getStudentData(self):
        return self.studentData
    
    def createContentData(self):
        self.contentsFeatures = self.contentDataGenerator.getContentsFeatures() 
        self.topicContent = self.contentDataGenerator.getTopicContent()
        
    def getContentData(self):
        return self.contentsFeatures
    
    def getTopicData(self):
        return self.topicContent

'''
This is the student data generator
'''
class StudentDataGen:
    def __init__(self):
        self.number_of_students = 2 # Students taking the course. 
        self.student_context = ['video','audio','reading','kinesthetic'] # Student preferences
        # TO-DO : Have student preferences & probability of having those preferences as a tuple. 
    
    def create(self):
        ## Create Student Context Data
        student_context_df = pd.DataFrame(data=np.random.binomial(1 , [0.7,0.6,0.5,0.4] , 
                            size=(self.number_of_students,len(self.student_context))) , columns = self.student_context)
        return student_context_df
    
'''
This is the content data generator
'''
class ContentDataGen:
    
    def __init__(self):
        self.number_of_topics = 5 # Number of topics in the course
        self.content_context = ['A','B','C','D','E','F'] # Content features. Add meaningful features.
        self.prob_content_context = [0.8,0.7,0.6,0.5,0.4,0.3]
        self.no_contents_per_topic = np.random.randint(2,5,number_of_topics) # Variable number of contents per topic.
    
    def create(self):
        all_contents = list()
        topic_content = {}
        for i,j in enumerate(self.no_contents_per_topic):
            topic_id = "T_" + str(i+1) # e.g : T_10
            content_ids = [] # Temporary variable to help map topic to content. 
            for j_1 in range(1,j+1) : # Number of contents
                c_id = 'C_' + str(i+1) + '_' + str(j_1) # e.g : C_10_2 : Content number 2 for topics 10
                content_ids.append(c_id)
                all_contents.append(c_id)
            topic_content[topic_id] = content_ids   
        return topic_content , all_contents
    
    # Content related features
    def getContentsFeatures(self):
        self.topic_content , self.all_contents = self.create()
        content_context_df = pd.DataFrame(data=np.random.binomial(1 , self.prob_content_context, 
                             size=(sum(self.no_contents_per_topic),len(self.content_context))) , 
                             columns = self.content_context , index=self.all_contents)
        return content_context_df
    
    def getTopicContent(self):
        return self.topic_content


# Contexts

In [490]:
'''
Context data for learning
'''
class Context:
        
    def getStudentContext(self):
        return self.studentContext
    
    def setStudentContext(self , studentFeatures):
        self.studentContext = studentFeatures
    
    def getContentContext(self):
        return self.contentContext
   
    def setContentContext(self , courseContent):
        self.contentContext = courseContent

# Nature / Universe

In [503]:
class Nature : 
    
    '''
    arms: Content ids
    contexts: Featurs
    '''

    def setParameters(self, contexts , arms):
        parameters = np.random.uniform(size=(len(arms) , len(contexts)))
        # Normalize parameters
        for i in range(parameters.shape[0]): # Have it in a list comprehension.
            parameters[i] = parameters[i] / np.sum(parameters[i])
        self.theta_df = pd.DataFrame(data = parameters ,  index = arms , columns = contexts , dtype= np.float)
    
    '''
    X: Context information. 
    arm_id: Id of the arm pulled. 
    '''
    def getReward(self,X,arm_id):
        arm_theta = self.theta_df.loc[arm_id] #Get parameters for the arm predicted by the learning algo
        print('X.type {0} , X.shape {1} , X = {2} '.format(type(X),X.shape,X))
        print('arm_theta.type {0} , arm_theta.shape {1} , arm_theta = {2}  : '.format(type(arm_theta),arm_theta.shape,arm_theta))
        expected_reward = pd.Series.dot(X,arm_theta) # Vector dim : (1 * d) (d * 1).
        print('expected_reward : ', expected_reward)
        reward = bernoulli.rvs(size=1,p=expected_reward)[0] # Simulate student's response
        print('Actual Reward : ', reward)
        return reward

# Learning Algorithm (LinUCB)

In [549]:
# THINK : Need to decide , whether to record data of rounds, where skip algorithm was not activated. This would be the case when
# the right prediction was made on 1st attempt. 

class LinUCB:
    def __init__(self,alpha=0.5):
        self.alpha = alpha # Hyper parameter required for LinUCB to adjust confidence bounds.
        self.arm_params = {} # Maps content to arm object
        self.rounds = 0 # Number of round played
        self.rounds_data = pd.DataFrame() # Rounds data required for Skip Algorithm
                    
    def prepareContext(self,studentContext,contentContext):
        context = pd.DataFrame() 
        for content in list(contentContext.index):
            c = pd.Series()
            c = c.append([studentContext,contentContext.loc[content]]) # Combine student & content. 
            c['Content_id'] = content
            context = context.append(c, ignore_index=True)
        context = context.set_index('Content_id')
        return context
    
    '''
    Method called by Simulator. Encapsulates finding the best arm, making a prediction, getting rewards & 
    updating parameters. 
    studentContext: Context information of student 
    contentContext : Context information of content/arms that can be pulled for this topic. 
    topic: Topic on which predictions are being made. Data needs for skip algorithm. 
    nature: One who knows it all. Get actual/real reward for the pulled arm. 
    '''
    def learn(self, studentContext , contentContext , topic , nature):
        context = self.prepareContext(studentContext,contentContext)
        arms = list(context.index)
        skip_algo = False # Skip algorithm is inactive initially. Activated when predictions give no reward. 
        
        # LinUCB has started 
        
        while True and arms: # Try to find the best arm, till there are no arms to pull. If there are no arms, then move to the next topic. 
            ## If skip algorithm is active, then code for skip algorithm comes here. It here where we'll decide whether to skip to the next topic or predict another content for the same topic. 
            arms_payoff = list() # Check if the payoff values change for arms that have not been pulled. It doesn't change. 
            for arm in arms: # arms is a list of all arms available w.r.t content
                X = context.loc[arm] # Give student & content context for an arm 
                if arm not in self.arm_params: # If new content is added, then parameters would be created for it. 
                    self.arm_params[arm] = Arm(len(X.index)) # Arm class below, has arm specific parameters 
                arm_obj = self.arm_params[arm]
                theta = self.getTheta(arm_obj) # Arm parameter. 
                pta = self.getPta(X , arm_obj) # pta : pay-off/reward at round 't' for arm 'a'. 
                arms_payoff.append(pta)
            expected_payoff = np.max(arms_payoff) # To be used a input data for skip algorithm
            ### Get prediction whether to skip to next topic ? If it return no, then don't skip, else do skip
            
            print('arms_payoff : ', arms_payoff) # Expected pay-off of all arms. 
            arm_index = np.argmax(arms_payoff) # Find the index of the arm which max pay-off
            print('Index of arm with max payoff : ', arm_index)
            arm_pulled = arms[arm_index] # Give me the arm with max pay-off
            print('Arm pulled : ', arm_pulled)
            real_payoff = nature.getReward(context.loc[arm_pulled],arm_pulled) # Get me the reward for arm_pulled
            self.rounds += 1 # Increment number of rounds by 1
            pulled_arm_obj = self.arm_params[arm_pulled] # Get me the arm object for the pulled arm
            pulled_arm_obj.updateParams(context.loc[arm_pulled],real_payoff) # Update parameters of the pulled arm. 
            
            # LinUCB has ended
            
            arms_payoff.remove(expected_payoff) # Remove pay-off of pulled arm
            if real_payoff == 0: 
                skip_algo = True # Activate skip algorithm, as there is potential to skip to next topic
                arms.remove(arm_pulled) # Remove that arm from the list
            if skip_algo:
                if arms_payoff:
                    potential_payoff = np.max(arms_payoff) # Gives the 2nd highest expected pay-off from remaining arms. 
                else:
                    potential_payoff = 0
                self.record_rounds_data(studentContext, topic , expected_payoff,  potential_payoff, real_payoff) # Record round details for skip algorithm
            if real_payoff == 1: # Move to the next topic
                break
    '''
       arm: Arm object
    '''
    def getTheta(self,arm): # Theta is used to compute the mean reward for an arm 
        arm.theta = np.dot(arm.Ainv , arm.b) # A vector
        return arm.theta
    
    def getMean(self, context , arm):
        mean = np.dot(arm.theta.T , context)
#         print('mean : {0} type : {1}'.format(mean,type(mean)))
        return mean
        
    def getUCB(self , context ,arm):
        ucb = np.sqrt(np.dot(np.dot(context.T , arm.Ainv) , context))
        return ucb
    
    def getPta(self, context , arm):
        payoff = self.getMean(context,arm) + self.alpha * self.getUCB(context , arm)
        return payoff
    
    '''
    studentContext: students contextual data
    expected_reward: expected reward of arm played
    sum_of_arm_rewards: rewards for remaining arms, that have not been played
    reward: actual reward received
    '''
    def record_rounds_data(self, studentContext, topic, expected_reward, potential_payoff, real_payoff):
        r_data = {'topic': topic , 'expected_reward': expected_reward , 'potential_reward': potential_payoff , 'reward': real_payoff}
        series = pd.Series()
        series = series.append([studentContext,pd.Series(r_data)])
        print('series in record_rounds_data: ' , series)
        self.rounds_data = self.rounds_data.append(series,ignore_index=True)
    
    def getRoundsData(self):
        return self.rounds_data

class Arm:
    def __init__(self,dimensions):
        self.A = np.identity(dimensions)
        self.b = np.zeros(dimensions)
        self.Ainv = np.linalg.inv(self.A)
        self.theta = np.dot(self.Ainv , self.b)
    
    def updateParams(self, context, reward):
        self.A += np.outer(context,context.T)
        self.b += reward * context
        self.Ainv = np.linalg.inv(self.A)   

# Algorithm


Inputs :\;  $\alpha\; \epsilon\; \mathbb{R}_+ $
Receive : $x_s$ : student context \\
          $x_c$ : content context \\
          $t$ : topic being taught \\
          $e$ : environment \\
Prepare context $X = x_s \cup x_c$ \\

Observe: student context , content context, topic, environment
Prepare context 
For each arm :
    Get expected pay-off
Find arm with max pay-off for current topic. 
Pull the arm
Get actual reward
Update parameters of pulled arm. 
if reward == 0
    get prediction from skip topic
    if skip = true
       move on to the next topic
    else
       stay on current topic & pull the next best arm. 
else 
    move on to the next topic



Skip System 

skip_train: 
    

skip_predict: 
    Input : Student context,expected reward on current topic, expected reward on next topic
    Output : Binary (Skip or not)
    Feedback on prediction 
        - If no skip
            - send reward received for next round with same topic.
        - If skipped
            - send reward received for next round with next topic.



In [550]:
class Simulator:
    
    def __init__(self):
        self.dataGenerator = DataGenerator()
        self.dataGenerator.createStudentData()
        self.dataGenerator.createContentData()
        self.students = Students()
        self.students.setStudentsFeatures(self.dataGenerator.getStudentData())
        self.contents = Content()
        self.contents.setContentData(self.dataGenerator.getContentData())
        self.contents.setTopics(self.dataGenerator.getTopicData())
        self.contexts = Context()
        self.contexts.setStudentContext(self.students.getStudentsFeatures())
        self.contexts.setContentContext(self.contents.getContentData())
        self.nature = Nature()
        self.linUCB = LinUCB()
                  
    def main(self):
        studentContext = self.contexts.getStudentContext() # Student dataframe
        contentContext = self.contexts.getContentContext() # Content Dataframe
        topics = self.contents.getTopics() # Topics Data, which includes topics to content mapping.
        contexts = list(studentContext.columns) + list(contentContext.columns)
        self.nature.setParameters(contexts , contentContext.index)  
        for index , student in studentContext.iterrows():
            for t in topics:
                content = topics[t] # You now have all arm associated with the topic 't'
                X = pd.DataFrame()
                topic_contents = contentContext.loc[content]
                self.linUCB.learn(student , topic_contents , t, self.nature)
        print('Rounds Data: ')
        print(self.linUCB.getRoundsData())
        print('Total Number of rounds : ', self.linUCB.rounds)
        
simulator = Simulator()
simulator.main()

arms_payoff :  [1.118033988749895, 1.118033988749895, 1.224744871391589, 1.224744871391589]
Index of arm with max payoff :  2
Arm pulled :  C_5_3
X.type <class 'pandas.core.series.Series'> , X.shape (10,) , X = A              0.0
B              1.0
C              1.0
D              1.0
E              0.0
F              0.0
audio          1.0
kinesthetic    0.0
reading        1.0
video          1.0
Name: C_5_3, dtype: float64 
arm_theta.type <class 'pandas.core.series.Series'> , arm_theta.shape (10,) , arm_theta = video          0.140600
audio          0.026214
reading        0.115929
kinesthetic    0.161580
A              0.040614
B              0.026159
C              0.150512
D              0.114050
E              0.056172
F              0.168170
Name: C_5_3, dtype: float64  : 
expected_reward :  0.5734638691298761
Actual Reward :  1
arms_payoff :  [1.118033988749895, 1.224744871391589, 1.224744871391589]
Index of arm with max payoff :  1
Arm pulled :  C_6_2
X.type <class 'pandas.cor

arms_payoff :  [1.224744871391589, 1.3228756555322954, 1.224744871391589]
Index of arm with max payoff :  1
Arm pulled :  C_9_2
X.type <class 'pandas.core.series.Series'> , X.shape (10,) , X = A              1.0
B              1.0
C              1.0
D              1.0
E              0.0
F              0.0
audio          1.0
kinesthetic    0.0
reading        1.0
video          1.0
Name: C_9_2, dtype: float64 
arm_theta.type <class 'pandas.core.series.Series'> , arm_theta.shape (10,) , arm_theta = video          0.103831
audio          0.184566
reading        0.108778
kinesthetic    0.092391
A              0.065537
B              0.112266
C              0.067230
D              0.209100
E              0.020304
F              0.035996
Name: C_9_2, dtype: float64  : 
expected_reward :  0.8513090348444625
Actual Reward :  1
arms_payoff :  [1.224744871391589, 1.3228756555322954]
Index of arm with max payoff :  1
Arm pulled :  C_4_2
X.type <class 'pandas.core.series.Series'> , X.shape (10,) , 

arms_payoff :  [1.5760731598251656, 1.3228756555322954, 1.224744871391589]
Index of arm with max payoff :  0
Arm pulled :  C_1_1
X.type <class 'pandas.core.series.Series'> , X.shape (10,) , X = A              1.0
B              1.0
C              1.0
D              1.0
E              1.0
F              0.0
audio          1.0
kinesthetic    1.0
reading        1.0
video          1.0
Name: C_1_1, dtype: float64 
arm_theta.type <class 'pandas.core.series.Series'> , arm_theta.shape (10,) , arm_theta = video          0.012438
audio          0.271546
reading        0.097422
kinesthetic    0.185070
A              0.097813
B              0.084579
C              0.121489
D              0.095154
E              0.003629
F              0.030861
Name: C_1_1, dtype: float64  : 
expected_reward :  0.9691386819156891
Actual Reward :  1
arms_payoff :  [1.3228756555322954, 1.5596531968814578, 1.3228756555322954]
Index of arm with max payoff :  1
Arm pulled :  C_9_2
X.type <class 'pandas.core.series.Serie

# Test Code

In [555]:
# SGD in action 

import numpy as np
from sklearn import linear_model
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1],[-1, -1], [-2, -1]])
Y = np.array([1, 1, 0, 0, 1, 1])
clf = linear_model.SGDClassifier(max_iter=1000)
clf.partial_fit(X,Y,classes=[0, 1])
pred = clf.predict(X[:2, :])
pred
# array([1, 1])


array([1, 1])

In [556]:
#clf.decision_function(X[:2, :])

#X[:2, :]
# array([[-1, -1],
#        [-2, -1]])
clf.decision_function(X[:2,:])

array([29.90049751, 39.85074627])

In [504]:
# Dot product using pandas.Series.dot

# import pandas as pd
# series_1 = pd.Series(data=[1,1,1,0] , index=['A','B','C','D'])
# series_2 = pd.Series(data=[1,1,1,1] , index=['D','C','B','A'])
# pd.Series.dot(series_1,series_2)

# Testing Nature class setParameters method

# contexts = ['Video','Audio','Reading','Kinesthetics','A','B','C','D','E','F']
# arms = ['C_1_1','C_1_2','C_1_3','C_2_1']
# nature = Nature()
# theta_df = nature.setParameters(contexts,arms)
# print(theta_df)
# print(theta_df.loc['C_1_1'])

# Testing Nature class getReward method

# contexts = ['Video','Audio','Reading','Kinesthetics','A','B','C','D','E','F']
# values = [1,1,0,1,1,0,1,1,0,0]
# data_point_series = pd.Series(values , index=contexts)
# print(data_point_series.values)
# nature.getReward(data_point_series,'C_1_2')

# p = {'V' : 0.4 , 'A': 0.3 , 'R':0.2 , 'K': 0.1}
# c = {'V' : 1 , 'A': 0 , 'R':1 , 'K': 0}
# p_series = pd.Series(p)
# c_series = pd.Series(c)
# print('p_series.shape : ', p_series.shape)
# print('c_series.shape : ', c_series.shape)
# #pd.Series.dot(c_series,p_series)

# If I modify a list in a function, would it change the original list too ? Yes, it does. 

# class TestClass:
    
#     def my_func(self,m_l):
#         m_l.remove(2)

# my_list = [1,2,3]
# tc = TestClass()
# tc.my_func(my_list.copy())
# print(my_list)

# a = np.zeros((10, 2))
# b = a.T
# c = b.view()
# c.reshape(5,4)
# a = np.arange(6).reshape(3,2)
# np.ravel(a)

#Condition to check if list is empty

# a = []
# if not a:
#     print("List is empty")
# else:
#     print("List not empty")

# Playing with dot product

# arr_1 = np.array([1,2,3])
# arr_2 = np.array([4,5,6])
# print('arr_1.shape : ' , arr_1.shape)
# print('arr_1.T.shape : ' , arr_1.T.shape)
# np.dot(arr_1,arr_2)

# I = np.eye(3)
# arr_1 = np.array([1,2,3])
# #arr_1 = arr_1.reshape(3,1)
# result = np.dot(arr_1,I)
# result.shape

# df = pd.DataFrame({'col_1' : [1,2,3,4] , 'col_2' : [5,6,7,8]})
# for index , context in df.iterrows():
# #     print(index)
#     print(context * 2)

#Convert a dictionary to a series 

# my_dict = {'col_1' : 1, 'col_2' : 2}
# pd.Series(my_dict)

# Getting selected context based on index

# df = pd.DataFrame({'col_1' : [1,2,3,4] , 'col_2' : ['a','b','c','d']})
# for index , entry in df.iterrows():
#     print(entry['col_1'])
#list(df.columns)
# df_1 = df.loc[[0,2]]
# df_1.loc[2]

# Concatenate versus append for Series

# series_1 = pd.Series({'video':1 , 'audio': 1 , 'reading':1 , 'kinesthetics' : 0})
# series_2 = pd.Series({'A':1 , 'B':0})
# #pd.concat([series_1,series_2] , axis=1 , ignore_index=True , sort=True)
# X_1 = pd.Series()
# X_1 = X_1.append(series_1)
# X_1 = X_1.append(series_2)
# X_1
#series_2 = pd.Series

# Is 'a' accessbiel outside the if block, since it has been defined & set up there ? Yes,it is accessible

# if True:
#     a = 1
# print(a)

# Checking if the list is empty

# my_list = []
# if not my_list:
#     print('Empty')


# Merging student & content context information

# context = pd.concat([student_context_df ,content_context_df] , axis=1)
# context

# number_of_contexts = len(student_context_df.columns) + len(content_context_df.columns)
# print('number_of_contexts : ', number_of_contexts)
# nature_arm_parameter_df = pd.DataFrame(data = np.random.uniform(size=(len(all_contents) , number_of_contexts)) ,  index = all_contents , dtype= np.float16 )
# #nature_arm_parameter_df

number_of_contexts :  10
