# Model Building (contd.)

In the last part we built half of the recommendation engine. In this part we'll build the other half. Its purpose is to recommend problems solved by users of a higher expertise level to users whose points are higher than average in their respective domains (FLTA, LTA, GTA). Let's get started.

## Engine - II

The functioning of this engine is again best explained with a flowchart:

<img src="RE2.png">

As you can see, this is a pretty straightforward engine compared to the previous one. Its functioning requires only minor modifications to the previous part's functions. Let's start with the first step: finding users from the next domain.

Finding the next domain is simple because both the domains and the expertise levels follow an increasing order:
    1. FLTA < LTA < GTA < FGTA
    2. Beginner < Intermediate < Advanced < Expert
    
We also need some sort of similarity measure. In the previous function, the difference in total_points served that role. Here we can't use the same variable, because users in the next level will obviously have many more points than the target user. Instead, we will use the success_rate, which removes that obvious and possibly large difference in points.
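
To make the reasoning concrete, here is a minimal sketch on invented toy data. The column names mirror the ones used in this series, but the values are made up, and success_rate is assumed to be a fraction between 0 and 1 (the real data may store it differently).

```r
# Toy data: invented values, for illustration only
target = data.frame(user_id = "user_A", total_points = 1200, success_rate = 0.55,
                    stringsAsFactors = FALSE)
candidates = data.frame(user_id = c("user_B", "user_C", "user_D"),
                        total_points = c(5200, 6100, 9800),
                        success_rate = c(0.57, 0.80, 0.52),
                        stringsAsFactors = FALSE)

# The gap in points is large for every candidate of the next level,
# so it says little about who is actually similar to the target
candidates$points_diff = abs(candidates$total_points - target$total_points)

# The gap in success rate stays on the same scale across levels
# (assumption: success_rate is a fraction between 0 and 1)
candidates$srate_diff = abs(candidates$success_rate - target$success_rate)

# Closest candidate by success rate
closest = candidates$user_id[which.min(candidates$srate_diff)]
print(candidates)
print(closest)
```

In this toy case every candidate is thousands of points away from the target, while the success rate still separates the close candidates from the distant one.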

In [1]:
#Setting the path
setwd("/home/ankit19/Desktop/Jupyter_Notebooks/Recommendation Engine")

#Importing data
UserData = read.csv("data/ProcessedUserData.csv", stringsAsFactors=F)
ProblemData = read.csv("data/ProcessedProblemData.csv", stringsAsFactors=F)
SubmissionsData = read.csv("data/ProcessedSubmissionsData.csv", stringsAsFactors=F)
TestUsers = read.csv("data/TestUsers.csv", stringsAsFactors=F)

In [2]:
#Function for finding better users
findBetterUsers = function(userID){
    
    #Get user information
    userInfo = subset(UserData, user_id == userID)
    
    #Determine if the user is a learner or a player
    if(userInfo$learner_player == "learner"){ 
        isLearnerPlayer = "learner"
    }else{ 
        isLearnerPlayer = "player"
    }
    
    #Find users that are either learner or player like the target user,
    #of the next expertise, domain, and similar success rate
    if(userInfo$expertise_level == "beginner"){ 
        nextLevel = "intermediate"
    }else if(userInfo$expertise_level == "intermediate"){ 
        nextLevel = "advanced"
    }else{ 
        nextLevel = "expert"
    }

    if(userInfo$level_of_points == "GTA"){ 
        nextDomain = "LTA"
    }else{
        nextDomain = "GTA"
    }
    
    #Set of users that satisfy all of the conditions above
    betterUsers = subset(UserData, 
                          expertise_level == nextLevel
                          & learner_player == isLearnerPlayer
                          & level_of_points == nextDomain, 
                          select = c(user_id, success_rate))
    
    #Calculate the absolute difference in success_rate
    #between all other users and the target user
    betterUsers$srate_diff = abs(betterUsers$success_rate - userInfo$success_rate)
    
    #Get all users that have least absolute difference and
    #have success_rate higher or equal to the target user
    betterUsers = subset(betterUsers, 
                          srate_diff == min(srate_diff)
                          & success_rate >= userInfo$success_rate,
                          select = c(user_id))
    
    #Remove rownames
    rownames(betterUsers) = NULL
    
    #Combine the values into a list
    betterUsersDetails = list(betterUsers, isLearnerPlayer)
    names(betterUsersDetails) = c("betterUsers", "isLearnerPlayer")
    
    #Return data
    betterUsersDetails
}

In [3]:
#Calling the function
betterUsersDetails = findBetterUsers("user_2039")
betterUsersDetails$betterUsers
betterUsersDetails$isLearnerPlayer

user_id
user_210
user_2427
user_2015
user_969
user_229
user_3136
user_146


Similar to the previous function, we get two outputs. One is the list of users in the upper level who have a similar success rate, while the other tells us whether the target user is a learner or a player. We'll use this information to find the problems solved by this better set of users that are best suited to help our target user level up. The next function is almost an exact replica of the one that found unsolved problems attempted by similar users. The only difference is that here we are looking for problems that the target user has not attempted, that are similar to the ones they have attempted, and that come from users of a higher level.

In [4]:
#Function for finding problems solved by users of higher expertise level
findBetterProblems = function(userID, betterUsers, isLearnerPlayer){
    #Get all submissions made by user
    userSubmissions = subset(SubmissionsData, user_id == userID, select=c(problem_id))

    #Get information on all submitted problems
    userSubmissions = subset(ProblemData, problem_id %in% userSubmissions$problem_id)

    if(isLearnerPlayer == "learner"){
        #Get all unique tags the user's submissions have
        #Tags are split on the comma delimiter and then unlisted
        #Only unique values are stored
        uSub_tags = unique(unlist(strsplit(userSubmissions$tags, ",")))

        #Get all the submissions made by the better users
        #Exclude all those problems that the target user has already submitted
        otherSubmissions = subset(SubmissionsData, 
                                  (user_id %in% betterUsers$user_id) & 
                                  !(problem_id %in% userSubmissions$problem_id),
                                  select=c(problem_id))

        #Get information on the problems
        otherSubmissions = subset(ProblemData, problem_id %in% otherSubmissions$problem_id, 
                                  select=c(problem_id, tags))
        #Create a dummy column
        otherSubmissions$similar = F

        #For each problem check if its tags match the tags of the user's submitted problems
        #If they do, then store TRUE in the "similar" column
        #seq_len() skips the loop when there are no rows (1:0 would iterate twice)
        for(i in seq_len(nrow(otherSubmissions))){
            tags = unique(unlist(strsplit(otherSubmissions[i, "tags"], ",")))
            if(length(intersect(uSub_tags, tags)) > 0){ otherSubmissions[i, "similar"] = T }
        }

        betterUnsolvedProblems = subset(otherSubmissions, similar == T, select=c(problem_id))
        rownames(betterUnsolvedProblems) = NULL
                 
    }else{
        #Get all the level_types user has attempted
        uSub_levels = levels(factor(userSubmissions$level_type))

        #Get all the submissions made by other users
        #Exclude all those problems that are already solved by target user
        otherSubmissions = subset(SubmissionsData, 
                                  (user_id %in% betterUsers$user_id) & 
                                  !(problem_id %in% userSubmissions$problem_id),
                                  select=c(problem_id))

        #Get information on the problems
        otherSubmissions = subset(ProblemData, problem_id %in% otherSubmissions$problem_id)

        #Get all those problems that are at least one of the level_type 
        #that the user has attempted so far
        betterUnsolvedProblems = subset(otherSubmissions, level_type %in% uSub_levels, select=c(problem_id))
        rownames(betterUnsolvedProblems) = NULL
    }

    #Return better unsolved problems
    betterUnsolvedProblems   
}
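
The tag-matching loop in the learner branch can also be written without an explicit index. The following is a small sketch on invented toy data, not a replacement for the function above; the tags and problem IDs are hypothetical.

```r
# Hypothetical tags from problems the target user has already attempted
uSub_tags = c("math", "greedy")

# Hypothetical candidate problems with comma-separated tags
otherSubmissions = data.frame(
    problem_id = c("prob_1", "prob_2", "prob_3"),
    tags = c("brute force,implementation", "binary search,greedy", "math"),
    stringsAsFactors = FALSE)

# Split each tag string into a character vector, one list element per problem
tagList = strsplit(otherSubmissions$tags, ",", fixed = TRUE)

# Flag problems that share at least one tag with the user's tags
otherSubmissions$similar = vapply(tagList,
                                  function(tags) length(intersect(uSub_tags, tags)) > 0,
                                  logical(1))

similarProblems = subset(otherSubmissions, similar, select = c(problem_id))
print(similarProblems)
```

Because vapply returns an empty logical vector for an empty input, this version also behaves sensibly when there are no candidate problems at all.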

In [5]:
betterUnsolvedProblems = findBetterProblems("user_2039", 
                                            betterUsersDetails$betterUsers, 
                                            betterUsersDetails$isLearnerPlayer)
nrow(betterUnsolvedProblems)

The row count shows that there are over 200 problems solved by users of the next level that our target user can attempt in order to reach the desired level of expertise. Once again, we'll write one function, RecommendationEngine2, that calls these two functions in turn and forms the second half of our recommendation engine.

In [6]:
RecommendationEngine2 = function(userID){
    #Find users having similar success rate in the next level
    betterUsersDetails = findBetterUsers(userID)
    
    #Find problems solved by the users of the aspired level
    betterUnsolvedProblems = findBetterProblems(userID, 
                                                betterUsersDetails$betterUsers,
                                                betterUsersDetails$isLearnerPlayer)
    
    #Get information on the first 10 problems to recommend to the target user
    recommendedProblems = subset(ProblemData, problem_id %in% betterUnsolvedProblems$problem_id)
    recommendedProblems = head(recommendedProblems, 10)
    rownames(recommendedProblems) = NULL
    
    #Return the recommended problems
    recommendedProblems
}

In [7]:
#Recommendations for test users 5 & 6
RecommendationEngine2("user_2039")
RecommendationEngine2("user_436")

problem_id,level_type,points,tags
prob_3750,B,500,math
prob_75,A,250,implementation
prob_226,A,250,"brute force,implementation,sortings"
prob_4862,B,500,"binary search,greedy"
prob_3438,A,250,implementation
prob_4320,K,2750,MISC
prob_3431,C,750,"brute force,constructive algorithms,implementation"
prob_6007,F,1500,"math,two pointers"
prob_6126,C,750,"*special,constructive algorithms,greedy"
prob_5646,A,250,"implementation,sortings"


problem_id,level_type,points,tags
prob_6007,F,1500,"math,two pointers"
prob_5427,C,750,"brute force,math,number theory,sortings,two pointers"
prob_2043,F,1500,MISC
prob_663,B,500,"brute force,implementation"
prob_3658,C,750,"brute force,dp"
prob_361,D,1000,"dfs and similar,dp,geometry,greedy,trees"
prob_5565,B,500,greedy
prob_3981,A,250,MISC
prob_4283,C,750,"data structures,greedy,sortings"
prob_3524,C,750,MISC


Now that we've built both parts of the recommendation engine, we must merge them into one main function and define the condition that invokes one of the two sub-engines. Recall that the purposes of the two engines were as follows:

1. The first part applies to users who have fewer than average points in their given domain (FLTA, LTA, GTA) and are hence not close to leveling up. 
2. The second part applies to users who have points higher than the average points in their domain and hence are close to leveling up.
    
Thus we need a condition that checks whether a user has fewer or more points than the average for their respective expertise level and domain. If the target user has points less than or equal to the average in that domain, we invoke RecommendationEngine1; otherwise we invoke RecommendationEngine2. The output of both sub-engines is a list of unsolved problems for the target user, with complete information on each problem.
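
The condition described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the two sub-engines are replaced by stand-in functions so the snippet runs on its own, the user table is invented, the points column is assumed to be called total_points, and the average is assumed to be taken over users sharing the target user's expertise level and domain.

```r
# Stand-ins for the two sub-engines so that the sketch is self-contained;
# in the notebook these would be the real RecommendationEngine1 and RecommendationEngine2
RecommendationEngine1 = function(userID) paste("Engine 1 for", userID)
RecommendationEngine2 = function(userID) paste("Engine 2 for", userID)

# Toy user table with invented values
UserData = data.frame(
    user_id = c("u1", "u2", "u3", "u4"),
    expertise_level = c("beginner", "beginner", "beginner", "intermediate"),
    level_of_points = c("LTA", "LTA", "LTA", "GTA"),
    total_points = c(100, 200, 600, 5000),
    stringsAsFactors = FALSE)

# Hypothetical main function: picks a sub-engine based on the average check
RecommendationEngine = function(userID){
    userInfo = subset(UserData, user_id == userID)

    # Users in the same expertise level and domain as the target user
    peers = subset(UserData,
                   expertise_level == userInfo$expertise_level
                   & level_of_points == userInfo$level_of_points)
    avgPoints = mean(peers$total_points)

    # At or below average: engine 1, above average: engine 2
    if(userInfo$total_points <= avgPoints){
        RecommendationEngine1(userID)
    }else{
        RecommendationEngine2(userID)
    }
}

print(RecommendationEngine("u1"))
print(RecommendationEngine("u3"))
```

In the toy table the beginner LTA group averages 300 points, so u1 (100 points) is routed to the first engine and u3 (600 points) to the second.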