# Environment

The dataset simulates an online learning system: an environment in which students take courses. Each course has a number of topics. Each topic can be presented in different ways to cater to students who come from diverse backgrounds and have different preferences. The different ways of presenting a topic are referred to as content. An omniscient policy (an oracle) knows the best way to teach every student. The student gives feedback on the content. If the content is useful, the student moves on to the next topic. If it is not, the policy presents the next-best content. All feedback from students is recorded and used to decide which content to present next.

The notebook is named after the Beta distribution, which the oracle uses to select content. The oracle knows the best content and hence does not need to explore, unlike other approaches. However, it adjusts its choices based on the students' feedback.

The decision agent, which is the contextual bandit we would train on this data, needs to learn from this dataset a policy that minimizes cost (negative feedback) and presents content that matches each student's preferences, based on contexts. Contexts are the students' preferences regarding their learning style. We follow the VARK (Visual, Aural, Read/write, Kinesthetic) model, represented here by the features video, audio, reading and kinesthetic.

Context helps choose the best actions. We have limited contextual information (features) at the moment. Once the system has stabilized, we will add more contextual data. The best actions are those that maximize reward. Since we are creating the dataset, we know which actions these are, but the bandit algorithm knows nothing about them.

# Goal

Generate data to train an online adaptive learning system. This system chooses the best content for any given topic. The student shares their feedback. If the feedback is negative, the next-best content is presented. This continues until no content is left for the topic.



# Challenges 

Producing a dataset that is not biased and that is worth learning from.

# Assumptions made for dataset generation

We make no assumptions about students. The student is modelled through contextual information. Rewards are assumed to be discrete, taking values in {0, 1}. Supporting continuous rewards is future work.
     


# Story

We present a hypothetical questionnaire to students, asking which way of explaining concepts helps them understand and learn most quickly. The questionnaire contains the following question:

1. Which is the most effective way to explain concepts to you?
- Video / Audio / Reading / Kinesthetic (VARK). This gives us 4 features describing user preferences.

# Dataset

The dataset comprises contextual information about the student, with each feature drawn from a Bernoulli distribution (*currently the features are their preferences for Video / Audio / Reading / Kinesthetic*). It also has contextual information about the content. At present, the only contextual information we have about content is whether it is available for the topic. Each action (content) has a prior probability, which represents the teacher's prior belief in the usability of the content. This is set randomly. For each action we also track how many times a reward (*positive feedback*) and no reward (*negative feedback*) was observed. These two counts parameterize a Beta distribution. The feedback from the student is a Bernoulli random variable taking values in {0, 1}, with success probability given by the confidence the oracle has in the content. This confidence is sampled from the content's Beta distribution.

Each datapoint is represented as **Contents Topic_Tag|Namespace User_Context |Namespace Content_Context**

Here *Contents* are all the contents available for the topic. The selected content has a cost (feedback) and a prior probability attached to it. *Tag* identifies the data point. The *namespace* is a placeholder for contextual information about the student and the content; it takes the values *student* and *content*. Contextual information has the format *name:value* or *name*. If you specify only the *name*, it defaults to *name:1*. Each student preference is represented as 0 or 1. For example, *video:1* means the student prefers video (*because of the 1*) and *reading:0* means the student does not prefer reading (*because of the 0*).

4:1 5 T_1|student video:1 audio:1 reading:0 kinesthetic:1 |content C_1_1 C_1_2 
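
To make the format concrete, here is a minimal sketch of a helper that assembles one such line. The function name `format_datapoint` and its parameters are hypothetical, introduced only for illustration. The layout follows the description above and the lines printed later in this notebook, where only the selected content carries `action:cost:probability`.

```python
def format_datapoint(actions, chosen, cost, prob, tag, student_ctx, content_ctx):
    """Build one line: actions, tag, then the student and content namespaces."""
    parts = []
    for action in actions:
        if action == chosen:
            # Only the selected action carries its cost and probability.
            parts.append(f"{action}:{cost}:{prob}")
        else:
            parts.append(str(action))
    student = " ".join(f"{name}:{value}" for name, value in student_ctx.items())
    content = " ".join(content_ctx)
    return f"{' '.join(parts)} {tag}|student {student} |content {content}"


line = format_datapoint(
    actions=[4, 5],
    chosen=4,
    cost=1,
    prob=0.02,
    tag="T_1",
    student_ctx={"video": 1, "audio": 1, "reading": 0, "kinesthetic": 1},
    content_ctx=["C_1_1", "C_1_2"],
)
print(line)
```

Keeping the formatting in one function makes it easier to change the layout later without touching the simulation loop.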

# Algorithm used for data set generation

Representations : 
- Actions : Beta distribution (a : number of positive rewards, b : number of negative rewards). Initially, all arms have a uniform distribution with a = 1 & b = 1. Then, based on the reward received, a (for a positive reward) or b (for a negative reward) is incremented.
- Rewards : Bernoulli distribution (0 or 1)
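
The representation above can be sketched as a small class. This is an illustrative sketch only: the class name `BetaArm` and its methods are hypothetical, and the standard-library `random.betavariate` stands in for the `np.random.beta` call used later in the notebook.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta(a, b) belief about one content; starts uniform at a = b = 1."""
    a: int = 1  # initial 1 plus the number of positive rewards
    b: int = 1  # initial 1 plus the number of negative rewards

    def update(self, reward: int) -> None:
        # A reward of 1 increments a; a reward of 0 increments b.
        if reward == 1:
            self.a += 1
        else:
            self.b += 1

    def mean(self) -> float:
        # Expected success probability under Beta(a, b).
        return self.a / (self.a + self.b)

    def sample(self, rng: random.Random) -> float:
        # One draw from the current belief.
        return rng.betavariate(self.a, self.b)


arm = BetaArm()
for reward in [1, 1, 0]:
    arm.update(reward)
print(arm.a, arm.b, arm.mean())  # a = 3, b = 2, mean = 0.6
```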

Step 1: Creation of Contextual information 

- Let's create contextual information about the user, such as VARK. Each feature is a Bernoulli variable with its own probability of being 1; for instance V = 60%, A = 40%, R = 30%, K = 50%. We now have all the contextual information we need to start with.

Step 2: Create topics. 

- Topics have actions associated with them. Each topic has a variable number of actions, which we define. For each topic, we select what we consider the best action to take. How do we decide the best action? We first model each action with a Beta distribution that tracks its number of successes & failures. Each action has its own values of a & b. We use these values to draw a Bernoulli reward for the action, and the drawn reward determines our next action. We then update the value of a or b for the action we took.

Step 3: Navigation based on reward. 

- If the reward is zero, we stay on the same topic. We first pick an action, then pass the normalized values of a & b through a Bernoulli draw to get a prediction.

Step 4: Create actions

Goal : On the test set, our goal is to see whether the bandit algorithm can find, for each topic, the arm that maximizes reward.

   Initialize the number of students (*number_of_students*) , students preferences (*user_context*) , number of topics in a course (*number_of_topics*) , content metadata (*content_columns*) , number of contents per topic (*no_contents_per_topic*)

 For each student 
   For each topic 
        while content is present
            Oracle selects a content ~Beta(reward,no reward)
            Student returns feedback on content ~Bernoulli {0,1}
            if feedback is positive
                Move to the next topic
            else 
                remove selected content
            Save the data point
            Oracle updates its understanding based on feedback
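
The pseudocode above can be sketched as a small, self-contained simulation for a single topic. This is a minimal sketch under stated assumptions: `simulate_topic` and `beliefs` are hypothetical names, the standard-library `random` module stands in for NumPy and SciPy, and a fresh Beta sample is drawn for every remaining content at each step. The notebook's own cell further down instead keeps one stored sample per content and redraws it only after that content receives feedback.

```python
import random


def simulate_topic(content_ids, beliefs, rng):
    """Run the oracle loop for one student on one topic.

    beliefs maps content id -> [a, b] and is updated in place.
    Returns the logged (content_id, feedback) pairs.
    """
    remaining = list(content_ids)
    log = []
    while remaining:
        # Oracle: draw a confidence per content from Beta(a, b), pick the largest.
        samples = {c: rng.betavariate(*beliefs[c]) for c in remaining}
        chosen = max(samples, key=samples.get)
        # Student: Bernoulli feedback whose success probability is that confidence.
        feedback = 1 if rng.random() < samples[chosen] else 0
        log.append((chosen, feedback))
        # Oracle updates its belief: index 0 is a (rewards), index 1 is b (rejections).
        beliefs[chosen][0 if feedback else 1] += 1
        if feedback:
            break  # positive feedback: move to the next topic
        remaining.remove(chosen)  # negative feedback: drop this content and retry
    return log


rng = random.Random(0)
beliefs = {c: [1, 1] for c in ["C_1_1", "C_1_2", "C_1_3"]}
for student in range(5):
    print(student, simulate_topic(list(beliefs), beliefs, rng))
```

Because `beliefs` is shared across students, later students see an oracle that has already adapted to earlier feedback, as in the loop above.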


# Dataset Generation

## Imports

In [231]:
import numpy as np , pandas as pd 
import os , time , copy # Copy : To deep copy python dict. As python dict copy is by reference & not by value
from scipy.stats import bernoulli
from sklearn.preprocessing import LabelEncoder

## Initialize variables

In [244]:
# These are the variables to change your sample size

number_of_students = 50 # Students taking the course. 
user_context = ['video','audio','reading','kinesthetic'] # Student preferences
number_of_topics = 10 # Number of topics in the course
content_columns = ["content_id" , "encoded" , "prior_prob" , "rewards" , "rejections"]
# no_contents_per_topic : This can be a constant or variable. Comment the one you don't want to use
# no_contents_per_topic = [4] * number_of_topics # Same number of contents per topic
no_contents_per_topic = np.random.randint(1,5,number_of_topics) # Variable number of contents per topic.


## Create Student Context Data

In [233]:
context_df = pd.DataFrame(data=np.random.binomial(1 , [0.7,0.6,0.5,0.4] , size=(number_of_students,len(user_context))) , columns = user_context)
context_df.head()

|   | video | audio | reading | kinesthetic |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 1 | 1 | 0 |
| 3 | 1 | 1 | 1 | 0 |
| 4 | 1 | 1 | 1 | 1 |


In [245]:
# Transform student data into the sparse data format required for learning. 
# The initial implementation (commented out below) had the context hard-coded. The later one removes the need to hard-code it.

# features = [] # Student Context
# for index, student_pref in context_df.iterrows():
#     features.append('video:' +  str(student_pref["video"]) + " " + 'audio:' + str(student_pref["audio"]) + " " + 'reading:' + str(student_pref["reading"]) + " " + 'kinesthetic:' + str(student_pref["kinesthetic"]))
# print(features)

features = [] # Student Context
for index, student_pref in context_df.iterrows():
    context_str = ''
    for i in range(len(user_context)):
        context_str += user_context[i] + ":" + str(student_pref[user_context[i]]) + ' '
    features.append(context_str)
print(features)

['video:1 audio:1 reading:1 kinesthetic:0 ', 'video:1 audio:1 reading:1 kinesthetic:0 ', 'video:1 audio:1 reading:1 kinesthetic:0 ', 'video:1 audio:1 reading:1 kinesthetic:0 ', 'video:1 audio:1 reading:1 kinesthetic:1 ', 'video:1 audio:1 reading:1 kinesthetic:0 ', 'video:0 audio:1 reading:1 kinesthetic:1 ', 'video:1 audio:0 reading:1 kinesthetic:0 ', 'video:0 audio:0 reading:0 kinesthetic:0 ', 'video:1 audio:0 reading:0 kinesthetic:0 ', 'video:0 audio:0 reading:0 kinesthetic:1 ', 'video:0 audio:1 reading:1 kinesthetic:1 ', 'video:1 audio:0 reading:1 kinesthetic:1 ', 'video:0 audio:1 reading:0 kinesthetic:0 ', 'video:0 audio:0 reading:1 kinesthetic:0 ', 'video:1 audio:1 reading:0 kinesthetic:0 ', 'video:1 audio:1 reading:0 kinesthetic:0 ', 'video:1 audio:1 reading:1 kinesthetic:0 ', 'video:0 audio:0 reading:1 kinesthetic:0 ', 'video:1 audio:0 reading:1 kinesthetic:0 ', 'video:1 audio:0 reading:0 kinesthetic:0 ', 'video:1 audio:0 reading:0 kinesthetic:1 ', 'video:1 audio:1 reading:0 kine

## Map content to topic

In [246]:
# Prepare topic to content mapping. 
topic_content = {} # Maps topics to content. 
all_topics = [] # Saves all topics for this course. 
all_contents = [] # Saves all content we have for the course
for i,j in enumerate(no_contents_per_topic):
    topic_id = "T_" + str(i+1) # e.g : T_10
    all_topics.append(topic_id)
    content_ids = [] # Temporary variable to help map topic to content. 
    for j_1 in range(1,j+1) : # Number of contents
        c_id = 'C_' + str(i+1) + '_' + str(j_1) # e.g : C_10_2 : Content number 2 for topics 10
        content_ids.append(c_id)
        all_contents.append(c_id)
    topic_content[topic_id] = content_ids
le = LabelEncoder().fit(all_contents)    
print('All topics : ', all_topics)
print('\n All Contents : ' , all_contents)
print('\n Contents per topic : ', topic_content)

All topics :  ['T_1', 'T_2', 'T_3', 'T_4', 'T_5', 'T_6', 'T_7', 'T_8', 'T_9', 'T_10']

 All Contents :  ['C_1_1', 'C_1_2', 'C_1_3', 'C_1_4', 'C_2_1', 'C_2_2', 'C_2_3', 'C_3_1', 'C_3_2', 'C_4_1', 'C_4_2', 'C_4_3', 'C_4_4', 'C_5_1', 'C_5_2', 'C_5_3', 'C_5_4', 'C_6_1', 'C_6_2', 'C_7_1', 'C_7_2', 'C_7_3', 'C_8_1', 'C_8_2', 'C_9_1', 'C_10_1', 'C_10_2', 'C_10_3', 'C_10_4']

 Contents per topic :  {'T_9': ['C_9_1'], 'T_1': ['C_1_1', 'C_1_2', 'C_1_3', 'C_1_4'], 'T_7': ['C_7_1', 'C_7_2', 'C_7_3'], 'T_3': ['C_3_1', 'C_3_2'], 'T_6': ['C_6_1', 'C_6_2'], 'T_10': ['C_10_1', 'C_10_2', 'C_10_3', 'C_10_4'], 'T_4': ['C_4_1', 'C_4_2', 'C_4_3', 'C_4_4'], 'T_5': ['C_5_1', 'C_5_2', 'C_5_3', 'C_5_4'], 'T_2': ['C_2_1', 'C_2_2', 'C_2_3'], 'T_8': ['C_8_1', 'C_8_2']}


In [236]:
# Encode contents id's for sparse data representation.

# topic_content_encoded = {}
# for topic in topic_content.keys():
#     topic_content_encoded[topic] = le.transform(topic_content[topic])
# topic_content_encoded

# Decoding content. (For reference)
# topic_content_decoded = {}
# for t in topic_content_encoded.keys():
#     topic_content_decoded[t] = le.inverse_transform(topic_content_encoded[t])
# topic_content_decoded

## Create content

In [247]:
# Setting probability of arm for oracle to decide the arm to select
# content_prob = {}
# content_prob_encoded = {}
# for t in topic_content:
#     c = topic_content[t]
#     content_prob_per_topic = np.random.random(len(c)) # As teachers might have prefereces to some content, over others
#     # TO-DO : Set content_prob_per_topic to draw samples over a uniform distribution. No preference over content. 
#     content_prob_per_topic_normalized = np.round(content_prob_per_topic / sum(content_prob_per_topic) , 2)
#     for i in range(len(c)):
#         content_prob[c[i]] = content_prob_per_topic_normalized[i]
# print('content_prob',content_prob)
# for c in content_prob.keys():
# # Label Encoder's transform method expects a list. Hence had to cast the content id (a string) to a list. Also, Label 
# # Encoder returns a numpy array, hence we have the [0] , as we only want the encoded version, a number, rather than an array.
#      content_prob_encoded[le.transform([c])[0]] = content_prob[c]
# print('content_prob_encoded',content_prob_encoded)
# print(le.transform('C_9_1'))
       
##################################################################################################################

# Create content dataframe having : content_id , encoded form , prior probability , number of passes (reward) , number of rejections (no reward)

rows = [] # One dict per content. DataFrame.append was removed in pandas 2.0, so we collect rows first.

for t in topic_content.keys():
    c = topic_content[t]
    # Instead of assigning probabilities randomly, another option is to sample from a uniform distribution. 
    # Valid when all contents within a topic have an equal chance of selection. 
    # Though random is not wrong either. 
    content_prob_per_topic = np.random.random(len(c)) # Teachers might prefer some content over others. These probabilities capture that. 
    content_prob_per_topic_normalized = np.round(content_prob_per_topic / sum(content_prob_per_topic) , 2)
    for i in range(len(c)):
        encoded = int(le.transform([c[i]])[0])
        if encoded == 0: # VW doesn't like its actions to be encoded as 0. Hence, this hack.
            encoded = len(all_contents)
        rows.append({
            "content_id" : c[i] ,
            "encoded" : encoded ,
            "prior_prob" : content_prob_per_topic_normalized[i] ,
            "rewards" : 1 , # Parameter 'a' of Beta distribution . 
            "rejections" : 1 , # Parameter 'b' of Beta distribution .
            "beta_dist_sample" : np.random.beta(1,1) ,
        })
# The columns listed in content_columns are kept as object dtype, as in the original append-based version, 
# so that row lookups like content_df.loc[c]['encoded'] return the integer unchanged instead of upcasting it to float. 
content_df = pd.DataFrame(rows).astype({col : object for col in content_columns})
content_df.set_index("content_id" , inplace=True, verify_integrity=True)
print("Number of contents : " , len(content_df))
content_df
#content_df.loc['C_3_1']['encoded']
#numpy.random.uniform(low=0.0, high=1.0, size=None)

# Oracle to select an arm 
# Arm sends a value based on Beta distribution
# Student shares feedback through Bernoulli



Number of contents :  29


| content_id | encoded | prior_prob | rewards | rejections | beta_dist_sample |
|---|---|---|---|---|---|
| C_9_1 | 28 | 1.0 | 1 | 1 | 0.955019 |
| C_1_1 | 4 | 0.02 | 1 | 1 | 0.542714 |
| C_1_2 | 5 | 0.46 | 1 | 1 | 0.079422 |
| C_1_3 | 6 | 0.29 | 1 | 1 | 0.554696 |
| C_1_4 | 7 | 0.23 | 1 | 1 | 0.707773 |
| C_7_1 | 23 | 0.34 | 1 | 1 | 0.962667 |
| C_7_2 | 24 | 0.39 | 1 | 1 | 0.090885 |
| C_7_3 | 25 | 0.26 | 1 | 1 | 0.854891 |
| C_3_1 | 11 | 0.37 | 1 | 1 | 0.660449 |
| C_3_2 | 12 | 0.63 | 1 | 1 | 0.665315 |
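
The cell above avoids action id 0 by remapping it to the total number of contents. One possible alternative, shown here only as a sketch, is to shift every label to the range 1..N. The helper `encode_one_based` is hypothetical and is not what the notebook uses, so its ids differ from the `encoded` values shown in the output above.

```python
def encode_one_based(content_ids):
    """Map each content id to an integer in 1..N, following sorted order."""
    return {c: i for i, c in enumerate(sorted(set(content_ids)), start=1)}


encoding = encode_one_based(["C_1_1", "C_1_2", "C_10_1", "C_2_1"])
print(encoding)
```

Sorting is lexicographic, so `C_10_1` comes before `C_1_1`. What matters is that no id is 0 and that the mapping is consistent.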


In [240]:
# For our use case, file name would be dynamic. Coz, in write mode, the previous version of the file is overwritten. 
# To start with, we would like to keep all the data we generated , based on the assumption we make. 
# 

# !pwd
# # /vagrant/project_env/Adaptive-Learning/notebooks
# !ls ../dataset
# # README.md  Topics.csv

# Code for reading file: Working
# import os
# file_path = os.path.join(".." , "dataset" , "README.md") # Find better alternatives for reading files. 
# f = open(file_path , "r")
# f.readlines()

# append the following to file name. 

#print(timestr)

# temp = range(5)
# timestr = time.strftime("%Y%m%d-%H%M%S")
# file_name = "data_" + timestr + ".dat"
# file_path = os.path.join(".." , "dataset" , file_name)
# #f = open(file_path , "w" , newline="\n")
# with open(file_path, 'w') as f:
#     for i in temp:
#         f.write(str(i) + "\n")

# #f.close()
# print("After writing")
# Reading a file line by line. 
# with open(file_path, 'r') as f:
# WAY 1
#     for l in f:
#         print(l)

# WAY 2
#     l = f.readline()
#     while l:
#         print(l)
#         l = f.readline()

# Split data into train & test

In [241]:
#content_df.loc['C_9_1' , 'prior_prob']

# for t in topic_content:
#     content_ids = topic_content[t]
#     content = content_df.loc[content_ids]
#     total_sum_of_prior_prob = sum(content['prior_prob'])
#     content['prior_prob'] = content['prior_prob'] / total_sum_of_prior_prob
#     content_df['prior_prob'] = content['prior_prob']
# content_df
# topic_content
# content_df

# Oracle

In [None]:

# Evaluation Strategy : We need to filter our test data so that only suitable data points are used for evaluation. 
# There are two ways of going about it : 
# 1. Consider only those data points for which we received a reward. 
# 2. Consider all data points, irrespective of reward. However, a data point is used for evaluation only if the policy's choice matches
# the oracle's. If the match gave a reward, that's a good policy. If the match did not give a reward, then that's a bad one. 
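
The second strategy could look roughly like the sketch below. The names `parse_logged_action`, `matched_average_cost` and `first_action_policy` are hypothetical, and the toy policy exists only to exercise the code. The estimate simply averages the logged cost over matching lines and ignores the logged probability, so treat it as a rough matching check rather than a proper off-policy estimator.

```python
def parse_logged_action(line):
    """Return (action, cost) for the labelled action in one logged line."""
    label_part = line.split("|", 1)[0]
    for token in label_part.split():
        fields = token.split(":")
        if len(fields) == 3:  # action:cost:probability marks the logged choice
            return int(fields[0]), int(fields[1])
    raise ValueError("no labelled action found")


def matched_average_cost(lines, policy):
    """Average cost over the lines where policy(line) equals the logged action."""
    costs = []
    for line in lines:
        action, cost = parse_logged_action(line)
        if policy(line) == action:
            costs.append(cost)
    return sum(costs) / len(costs) if costs else None


logged = [
    "11:1:0.37 12 T_3|student video:1 audio:1 reading:0 kinesthetic:0 |content C_3_1 C_3_2",
    "12:1:0.63 T_3|student video:1 audio:1 reading:0 kinesthetic:0 |content C_3_2",
    "28:0:1.0 T_9|student video:1 audio:1 reading:1 kinesthetic:0 |content C_9_1",
]


def first_action_policy(line):
    # Toy policy for illustration: always pick the first listed action.
    first_token = line.split("|", 1)[0].split()[0]
    return int(first_token.split(":")[0])


print(matched_average_cost(logged, first_action_policy))
```

A lower value is better, since cost 0 means positive feedback.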



In [249]:
## TO DO : GET THE NUMBER OF ARMS YOU HAVE. 
## TO DO : SPECIFY THE PROBABILITY OF THE EXPLORATION POLICY CHOOSING THIS ACTION. 2 VARIATIONS POSSIBLE
## VARIATION 1: USE THE PRIOR PROBABILITY
## VARIATION 2: USE THE VALUE FROM BETA DISTRIBUTION. Number of positive feedback / number of negative feedback. 

# Normalize probability of content for every topic. As these are generated randomly, hence we have to normalize. 
# This resembles the oracle much better. The oracle knows the best content, hence it doesn't have to explore. 
# The feedback from students is Bernoulli as before, as there's always uncertainty. 


            

# for t in topic_content:
#     content_ids = topic_content[t]
#     content = content_df.loc[content_ids]
#     total_sum_of_prior_prob = sum(content['prior_prob'])
#     content['prior_prob'] = content['prior_prob'] / content_prob_per_topic
#     content_df['prior_prob'] = content['prior_prob']
#     for i , c in content.iterrows() :
#         print('i : {0} c : {1}' , i , c)
#         content_df[i]['prior_prob'] = content[i]['prior_prob']
    


#Oracle. 
dataset = []
for student in features:
#    print("********************* Student : {0}".format(student))
    topic_content_copy = copy.deepcopy(topic_content)
    #print("topic_content_copy : ", topic_content_copy)
    for t in topic_content_copy:
        content_ids = topic_content_copy[t]   
#         content = content_df.loc[content_ids]
#         content_prob_per_topic = sum(content['prior_prob'])
#         content['prior_prob'] = content['prior_prob'] / content_prob_per_topic
        while content_ids: # Loop until the topic has no content left
#             print("On topic {0} having content {1} ".format(t , content_ids))
#             print(content_df.loc[content_ids])
            content_df_beta = content_df.loc[content_ids]["beta_dist_sample"]
            selected_content = content_df_beta.idxmax() # Arm/Content selection by omniscient is based on Beta distribution
#             print("selected_content : ", selected_content)
            feedback = bernoulli.rvs(size=1,p=content_df_beta[selected_content]) # Feedback from student is Bernoulli. 
#             print('feedback : ', feedback[0])
            
            # Preparing data point for Vowpal Wabbit. 
            # The selected arm is written as action:cost:probability; every other arm as its action id only. 
            cost = '0' if feedback[0] == 1 else '1' # Positive feedback means cost 0; negative feedback means cost 1
            arms = ''
            for c in content_ids:
                if c == selected_content:
                    arms += str(content_df.loc[c , 'encoded']) + ":" + cost + ":" + str(content_df.loc[c , 'prior_prob']) + ' '
                else:
                    arms += str(content_df.loc[c , 'encoded']) + ' '
            #print('Arms : ', arms)        
            arms_context = ''
            for c in content_ids:
                arms_context += c + ' '
            #print('arms_context : ', arms_context)        
            
            line = arms + t + "|student " + student + "|content " + arms_context
            print("{0}".format(line))
            dataset.append(line)
            
            # Update the Beta parameters based on the received feedback, then re-sample the oracle's confidence. 
            if feedback[0] == 0: # Negative feedback: drop the content so the next best one is tried
                content_ids.remove(selected_content)
                content_df.loc[selected_content , "rejections"] += 1
            else: # Positive feedback
                content_df.loc[selected_content , "rewards"] += 1
            content_df.loc[selected_content , "beta_dist_sample"] = np.random.beta(content_df.loc[selected_content , "rewards"] , content_df.loc[selected_content , "rejections"])
            if feedback[0] == 1:
                break # Move to the next topic
            #print(sum(content['prior_prob']))
            
# Write to file 
timestr = time.strftime("%Y%m%d-%H%M%S")
file_name = "data_" + timestr + ".dat"
file_path = os.path.join(".." , "dataset" , file_name)
with open(file_path, 'w') as f:
    for d in dataset:
        f.write(d + "\n")

4 5 6 7:0:0.23 T_1|student video:1 audio:1 reading:1 kinesthetic:0 |content C_1_1 C_1_2 C_1_3 C_1_4 
23 24 25:0:0.26 T_7|student video:1 audio:1 reading:1 kinesthetic:0 |content C_7_1 C_7_2 C_7_3 
28:0:1.0 T_9|student video:1 audio:1 reading:1 kinesthetic:0 |content C_9_1 
21:0:0.14 22 T_6|student video:1 audio:1 reading:1 kinesthetic:0 |content C_6_1 C_6_2 
29:0:0.42 1 2 3 T_10|student video:1 audio:1 reading:1 kinesthetic:0 |content C_10_1 C_10_2 C_10_3 C_10_4 
13 14 15:1:0.12 16 T_4|student video:1 audio:1 reading:1 kinesthetic:0 |content C_4_1 C_4_2 C_4_3 C_4_4 
13 14 16:1:0.12 T_4|student video:1 audio:1 reading:1 kinesthetic:0 |content C_4_1 C_4_2 C_4_4 
13 14:1:0.29 T_4|student video:1 audio:1 reading:1 kinesthetic:0 |content C_4_1 C_4_2 
13:0:0.47 T_4|student video:1 audio:1 reading:1 kinesthetic:0 |content C_4_1 
11:1:0.37 12 T_3|student video:1 audio:1 reading:1 kinesthetic:0 |content C_3_1 C_3_2 
12:1:0.63 T_3|student video:1 audio:1 reading:1 kinesthetic:0 |content C_3_2 
1

13 14 15:1:0.12 T_4|student video:1 audio:0 reading:0 kinesthetic:0 |content C_4_1 C_4_2 C_4_3 
13 14:1:0.29 T_4|student video:1 audio:0 reading:0 kinesthetic:0 |content C_4_1 C_4_2 
13:1:0.47 T_4|student video:1 audio:0 reading:0 kinesthetic:0 |content C_4_1 
11:0:0.37 12 T_3|student video:1 audio:0 reading:0 kinesthetic:0 |content C_3_1 C_3_2 
17:0:0.22 18 19 20 T_5|student video:1 audio:0 reading:0 kinesthetic:0 |content C_5_1 C_5_2 C_5_3 C_5_4 
8:1:0.2 9 10 T_2|student video:1 audio:0 reading:0 kinesthetic:0 |content C_2_1 C_2_2 C_2_3 
9 10:1:0.08 T_2|student video:1 audio:0 reading:0 kinesthetic:0 |content C_2_2 C_2_3 
9:1:0.72 T_2|student video:1 audio:0 reading:0 kinesthetic:0 |content C_2_2 
26:1:0.93 27 T_8|student video:1 audio:0 reading:0 kinesthetic:0 |content C_8_1 C_8_2 
27:0:0.07 T_8|student video:1 audio:0 reading:0 kinesthetic:0 |content C_8_2 
4 5 6 7:1:0.23 T_1|student video:0 audio:0 reading:0 kinesthetic:1 |content C_1_1 C_1_2 C_1_3 C_1_4 
4:1:0.02 5 6 T_1|student 

21 22:1:0.86 T_6|student video:1 audio:1 reading:0 kinesthetic:0 |content C_6_1 C_6_2 
21:0:0.14 T_6|student video:1 audio:1 reading:0 kinesthetic:0 |content C_6_1 
29:0:0.42 1 2 3 T_10|student video:1 audio:1 reading:0 kinesthetic:0 |content C_10_1 C_10_2 C_10_3 C_10_4 
13 14 15 16:0:0.12 T_4|student video:1 audio:1 reading:0 kinesthetic:0 |content C_4_1 C_4_2 C_4_3 C_4_4 
11:1:0.37 12 T_3|student video:1 audio:1 reading:0 kinesthetic:0 |content C_3_1 C_3_2 
12:1:0.63 T_3|student video:1 audio:1 reading:0 kinesthetic:0 |content C_3_2 
17:1:0.22 18 19 20 T_5|student video:1 audio:1 reading:0 kinesthetic:0 |content C_5_1 C_5_2 C_5_3 C_5_4 
18 19 20:1:0.31 T_5|student video:1 audio:1 reading:0 kinesthetic:0 |content C_5_2 C_5_3 C_5_4 
18 19:0:0.08 T_5|student video:1 audio:1 reading:0 kinesthetic:0 |content C_5_2 C_5_3 
8:1:0.2 9 10 T_2|student video:1 audio:1 reading:0 kinesthetic:0 |content C_2_1 C_2_2 C_2_3 
9:1:0.72 10 T_2|student video:1 audio:1 reading:0 kinesthetic:0 |content C_2_

9:1:0.72 10 T_2|student video:0 audio:0 reading:0 kinesthetic:0 |content C_2_2 C_2_3 
10:1:0.08 T_2|student video:0 audio:0 reading:0 kinesthetic:0 |content C_2_3 
26:0:0.93 27 T_8|student video:0 audio:0 reading:0 kinesthetic:0 |content C_8_1 C_8_2 
4:1:0.02 5 6 7 T_1|student video:1 audio:0 reading:0 kinesthetic:0 |content C_1_1 C_1_2 C_1_3 C_1_4 
5 6 7:1:0.23 T_1|student video:1 audio:0 reading:0 kinesthetic:0 |content C_1_2 C_1_3 C_1_4 
5 6:1:0.29 T_1|student video:1 audio:0 reading:0 kinesthetic:0 |content C_1_2 C_1_3 
5:1:0.46 T_1|student video:1 audio:0 reading:0 kinesthetic:0 |content C_1_2 
23:0:0.34 24 25 T_7|student video:1 audio:0 reading:0 kinesthetic:0 |content C_7_1 C_7_2 C_7_3 
28:0:1.0 T_9|student video:1 audio:0 reading:0 kinesthetic:0 |content C_9_1 
21 22:1:0.86 T_6|student video:1 audio:0 reading:0 kinesthetic:0 |content C_6_1 C_6_2 
21:1:0.14 T_6|student video:1 audio:0 reading:0 kinesthetic:0 |content C_6_1 
29:0:0.42 1 2 3 T_10|student video:1 audio:0 reading:0 

9:1:0.72 10 T_2|student video:1 audio:0 reading:0 kinesthetic:0 |content C_2_2 C_2_3 
10:1:0.08 T_2|student video:1 audio:0 reading:0 kinesthetic:0 |content C_2_3 
26:0:0.93 27 T_8|student video:1 audio:0 reading:0 kinesthetic:0 |content C_8_1 C_8_2 
4:0:0.02 5 6 7 T_1|student video:1 audio:1 reading:1 kinesthetic:1 |content C_1_1 C_1_2 C_1_3 C_1_4 
23:0:0.34 24 25 T_7|student video:1 audio:1 reading:1 kinesthetic:1 |content C_7_1 C_7_2 C_7_3 
28:0:1.0 T_9|student video:1 audio:1 reading:1 kinesthetic:1 |content C_9_1 
21 22:0:0.86 T_6|student video:1 audio:1 reading:1 kinesthetic:1 |content C_6_1 C_6_2 
29:0:0.42 1 2 3 T_10|student video:1 audio:1 reading:1 kinesthetic:1 |content C_10_1 C_10_2 C_10_3 C_10_4 
13 14 15:0:0.12 16 T_4|student video:1 audio:1 reading:1 kinesthetic:1 |content C_4_1 C_4_2 C_4_3 C_4_4 
11:1:0.37 12 T_3|student video:1 audio:1 reading:1 kinesthetic:1 |content C_3_1 C_3_2 
12:1:0.63 T_3|student video:1 audio:1 reading:1 kinesthetic:1 |content C_3_2 
17:0:0.22 1

26:0:0.93 27 T_8|student video:0 audio:0 reading:1 kinesthetic:0 |content C_8_1 C_8_2 
4:0:0.02 5 6 7 T_1|student video:0 audio:1 reading:0 kinesthetic:0 |content C_1_1 C_1_2 C_1_3 C_1_4 
23:0:0.34 24 25 T_7|student video:0 audio:1 reading:0 kinesthetic:0 |content C_7_1 C_7_2 C_7_3 
28:0:1.0 T_9|student video:0 audio:1 reading:0 kinesthetic:0 |content C_9_1 
21 22:1:0.86 T_6|student video:0 audio:1 reading:0 kinesthetic:0 |content C_6_1 C_6_2 
21:1:0.14 T_6|student video:0 audio:1 reading:0 kinesthetic:0 |content C_6_1 
29:0:0.42 1 2 3 T_10|student video:0 audio:1 reading:0 kinesthetic:0 |content C_10_1 C_10_2 C_10_3 C_10_4 
13 14 15:0:0.12 16 T_4|student video:0 audio:1 reading:0 kinesthetic:0 |content C_4_1 C_4_2 C_4_3 C_4_4 
11:1:0.37 12 T_3|student video:0 audio:1 reading:0 kinesthetic:0 |content C_3_1 C_3_2 
12:1:0.63 T_3|student video:0 audio:1 reading:0 kinesthetic:0 |content C_3_2 
17:0:0.22 18 19 20 T_5|student video:0 audio:1 reading:0 kinesthetic:0 |content C_5_1 C_5_2 C_5_3

13 14 15:1:0.12 16 T_4|student video:1 audio:1 reading:1 kinesthetic:1 |content C_4_1 C_4_2 C_4_3 C_4_4 
13 14 16:1:0.12 T_4|student video:1 audio:1 reading:1 kinesthetic:1 |content C_4_1 C_4_2 C_4_4 
13 14:1:0.29 T_4|student video:1 audio:1 reading:1 kinesthetic:1 |content C_4_1 C_4_2 
13:0:0.47 T_4|student video:1 audio:1 reading:1 kinesthetic:1 |content C_4_1 
11:1:0.37 12 T_3|student video:1 audio:1 reading:1 kinesthetic:1 |content C_3_1 C_3_2 
12:1:0.63 T_3|student video:1 audio:1 reading:1 kinesthetic:1 |content C_3_2 
17:0:0.22 18 19 20 T_5|student video:1 audio:1 reading:1 kinesthetic:1 |content C_5_1 C_5_2 C_5_3 C_5_4 
8:1:0.2 9 10 T_2|student video:1 audio:1 reading:1 kinesthetic:1 |content C_2_1 C_2_2 C_2_3 
9 10:1:0.08 T_2|student video:1 audio:1 reading:1 kinesthetic:1 |content C_2_2 C_2_3 
9:1:0.72 T_2|student video:1 audio:1 reading:1 kinesthetic:1 |content C_2_2 
26:1:0.93 27 T_8|student video:1 audio:1 reading:1 kinesthetic:1 |content C_8_1 C_8_2 
27:0:0.07 T_8|student

In [238]:
# Oracle : It knows the probabilities of a content sending positive reward at any time step & selects one of them probabilistically, 
# but randomly. 

# for student in features[:1]:
#     for topic in all_topics:
#         encoded_contents = topic_content_encoded[topic]
#         content_probability_encoded = {} # When a student is on a topic. We save the probability of contents for that topic. 
#         print("Topic : {0} . Encoded_contents : {1} ".format(topic , encoded_contents))
#         for ec in encoded_contents: # DataFrame would we a better option.
#             content_probability_encoded[ec] = content_prob_encoded[ec] 
#         while(True and content_probability_encoded):
#             # Normalize to sum probability to 1
#             print("sum(list(content_probability.values())) : " , sum(list(content_probability_encoded.values())))
#             if sum(list(content_probability_encoded.values())) != 1.0:
#                 temp = content_probability_encoded.copy()
#                 for e_c in temp.keys():
#                     content_probability_encoded[e_c] = round(temp[e_c] / sum(list(temp.values())) , 2)
#                 print("After normalization : " , sum(list(content_probability_encoded.values())))
#             selected_content = -1 # Its an integer. Kept of out of exception block to avoid scoping issues. 
#             content_options = [int(e_c) for e_c in content_probability_encoded.keys()]
#             c_p = list(content_probability_encoded.values()) # c_p : content_probability
#             print("content_options : {0} , probability values : {1}".format(content_options , c_p))
#             try:
#                 selected_content = np.random.choice(content_options , 1 ,  p = c_p)[0] # The [0] is because np.random.choice returns as numpy array, but we only want the content selected. 
#             except ValueError as ve:
#                 print(ve)
#                 selected_content = np.random.choice(content_options , 1 )[0] # If probabilities don't sum to 1, then we select content uniformly. Temporary workaround
#             print("selected_content : {0}".format(selected_content))
#             # Get feedback from student. Note, we're using the contents orignal probability, not the normalized one we computed above
#             print("content_prob_encoded[selected_content] : " , content_prob_encoded[selected_content])
#             feedback = bernoulli.rvs(size=1,p=content_prob_encoded[selected_content])
#             print("Feedback : ", feedback)
#             # Now save all this in a file, which we would use for training. 
            
#             if feedback:
#                 break
#             else:
#                 del content_probability_encoded[selected_content]
#                 print("After Deletion content_probability: {0}".format(content_probability_encoded))

#######################################################################################################################

for student in features[:1]:
    for topic in all_topics:
        #encoded_contents = topic_content_encoded[topic]
        content_ids = topic_content[topic]
        content = content_df.loc[content_ids]
        content_prob_per_topic = sum(content['prior_prob'])
        content['prior_prob'] = content['prior_prob'] / content_prob_per_topic # Normalized content
        
        #content_probability_encoded = {} # When a student is on a topic. We save the probability of contents for that topic. 
#         print("Topic : {0} . Encoded_contents : {1} ".format(topic , encoded_contents))
#         for ec in encoded_contents: # DataFrame would be a better option.
#             content_probability_encoded[ec] = content_prob_encoded[ec] 
        remaining_ids = list(content_ids)  # Work on a copy so topic_content is not modified.
        while remaining_ids:
            # Among the contents not yet tried, pick the one with the highest Beta sample.
            # The earlier np.random.choice based selection is kept in the commented block above.
            content_df_beta = content_df.loc[remaining_ids, "beta_dist_sample"]
            selected_content = content_df_beta.idxmax()  # idxmax returns the index label; argmax returns a position in recent pandas
            # Get feedback from the student: a Bernoulli draw with the selected content's Beta sample as success probability.
            feedback = bernoulli.rvs(size=1, p=content_df_beta.loc[selected_content])
            print("Feedback : ", feedback)
            # Now save all this in a file, which we would use for training. 
            
            if feedback:
                break
            else:
                # Negative feedback: drop this content and present the next best one.
                remaining_ids.remove(selected_content)
                print("After deletion, remaining contents: {0}".format(remaining_ids))

Topic : T_1 . Encoded_contents : [3 4 5] 
sum(list(content_probability.values())) :  1.01
After normalization :  1.0
content_options : [3, 4, 5] , probability values : [0.08, 0.67, 0.25]
selected_content : 4
content_prob_encoded[selected_content] :  0.68
Feedback :  [1]
Topic : T_2 . Encoded_contents : [6] 
sum(list(content_probability.values())) :  1.0
content_options : [6] , probability values : [1.0]
selected_content : 6
content_prob_encoded[selected_content] :  1.0
Feedback :  [1]
Topic : T_3 . Encoded_contents : [7] 
sum(list(content_probability.values())) :  1.0
content_options : [7] , probability values : [1.0]
selected_content : 7
content_prob_encoded[selected_content] :  1.0
Feedback :  [1]
Topic : T_4 . Encoded_contents : [8] 
sum(list(content_probability.values())) :  1.0
content_options : [8] , probability values : [1.0]
selected_content : 8
content_prob_encoded[selected_content] :  1.0
Feedback :  [1]
Topic : T_5 . Encoded_contents : [ 9 10] 
sum(list(content_probability.v
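
The output above reports a probability sum of 1.01 before normalization, and the commented code falls back to a uniform choice when `np.random.choice` rejects the probabilities. A minimal sketch of one way to avoid that fallback, assuming the weights live in a dict keyed by content id and at least one weight is positive: divide by the total without rounding, so the probabilities sum to 1 within floating-point tolerance. The helper name `choose_content` and the example weights are illustrative and not taken from the notebook.

```python
import numpy as np

def choose_content(content_probs, rng=None):
    """Pick one content id with probability proportional to its weight.

    content_probs: dict mapping content id -> non-negative weight
    (hypothetical structure, assumed for this sketch).
    """
    rng = rng or np.random.default_rng()
    options = list(content_probs.keys())
    weights = np.array([content_probs[c] for c in options], dtype=float)
    # Normalize without rounding; rounding each value to two decimals
    # can leave the total slightly off 1.
    probs = weights / weights.sum()
    # Draw a position, then map it back to the content id.
    return options[rng.choice(len(options), p=probs)]

# Illustrative weights that sum to 1.01 before normalization.
selected = choose_content({3: 0.08, 4: 0.68, 5: 0.25})
print("selected_content : {0}".format(selected))
```

Because the normalized values are only used for selection, the original weights stay available for the feedback draw.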

In [239]:
# Iterate through the topics 
# topics : list of topics , so we don't have to read from the file next time 
# dict {topics : set<arms>} : As arms would not be used independently, they have to be used in the context of the topic. 
# 

   # For each topic 
     # Add topic to list of topics 
      # Get the arms associated with each topic. 
        # For each arm 
            # Create an Action / Arm object holding a , b & the probability distribution associated with the arm
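
The plan in the comments above could be sketched roughly as follows. This is only an illustration: the `Arm` class, the `build_arms` helper and the default Beta(1, 1) parameters are assumptions, and a list is used instead of a set so the content order is preserved.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Arm:
    """One content item with the parameters of its Beta distribution (hypothetical class)."""
    content_id: str
    a: float = 1.0  # Assumed default: Beta(1, 1), i.e. uniform on [0, 1]
    b: float = 1.0

    def sample(self, rng):
        # Draw one value in [0, 1] from Beta(a, b).
        return rng.beta(self.a, self.b)

def build_arms(topic_content):
    """Return the list of topics and a dict {topic: [Arm, ...]}."""
    topics = list(topic_content.keys())
    arms = {
        topic: [Arm(content_id) for content_id in contents]
        for topic, contents in topic_content.items()
    }
    return topics, arms

topics, arms = build_arms({"T_1": ["C_1_1", "C_1_2", "C_1_3"], "T_3": []})
print(topics)
print(arms["T_1"][0].sample(np.random.default_rng()))
```

Keeping the arms under their topic reflects the note above that arms are only meaningful in the context of a topic.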

In [337]:
topic_content
# Iterate over 'n' students 
  # Iterate over 't' topics 
     # Get arms for a topic 
        # Make a choice among the arms 
        # For the arm picked get a sample from its Beta distribution
        # Give that value to the Bernoulli distribution 
        # Save the value returned by Bernoulli. This represents the reward obtained 
        # Make an entry in the file, as you now have the features , list of actions , rewards obtained from action 
        # (Remember you need to use cost / NOT REWARD)
        # If cost is 1 & there are other arms available, then go back to 'Make a choice among the arms'. Remember to remove the arms already picked just for this loop
        # If cost is 0, make an entry in the file. then go to the next topic. 
        

{'T_1': ['C_1_1', 'C_1_2', 'C_1_3'],
 'T_10': [],
 'T_2': ['C_2_1'],
 'T_3': [],
 'T_4': [],
 'T_5': ['C_5_1', 'C_5_2'],
 'T_6': ['C_6_2'],
 'T_7': [],
 'T_8': ['C_8_1', 'C_8_2', 'C_8_3', 'C_8_4'],
 'T_9': ['C_9_1', 'C_9_2', 'C_9_3', 'C_9_4']}
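
The loop outlined in the comments of this cell could be sketched as below. It is a sketch under stated assumptions, not the notebook's implementation: `simulate`, the `beta_params` dict of `(a, b)` pairs and the row layout are hypothetical, and the choice among arms is assumed to be "largest Beta sample", mirroring the selection loop earlier in the notebook. Rows store cost (1 minus reward), as the comments require.

```python
import numpy as np

def simulate(students, topic_content, beta_params, rng=None):
    """Generate training rows for every student and topic.

    students: iterable of feature records (hypothetical format).
    topic_content: dict {topic: [content ids]}.
    beta_params: dict {content id: (a, b)} (assumed structure).
    """
    rng = rng or np.random.default_rng()
    rows = []
    for features in students:
        for topic, contents in topic_content.items():
            remaining = list(contents)  # arms already picked are removed only for this loop
            while remaining:
                # Sample each remaining arm's Beta distribution and pick the largest sample.
                samples = {c: rng.beta(*beta_params[c]) for c in remaining}
                chosen = max(samples, key=samples.get)
                # Bernoulli draw with the chosen arm's sample as success probability.
                reward = int(rng.binomial(1, samples[chosen]))
                cost = 1 - reward  # the dataset stores cost, not reward
                rows.append({
                    "features": features,
                    "topic": topic,
                    "actions": list(remaining),
                    "chosen": chosen,
                    "cost": cost,
                })
                if cost == 0:
                    break  # positive feedback: move on to the next topic
                remaining.remove(chosen)  # negative feedback: try the next arm
    return rows

example_topics = {"T_1": ["C_1_1", "C_1_2", "C_1_3"], "T_3": []}
example_params = {c: (2.0, 2.0) for c in example_topics["T_1"]}  # illustrative values
for row in simulate([{"visual": 1}], example_topics, example_params):
    print(row)
```

Topics with no content, such as the empty lists in the output above, produce no rows in this sketch.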