# Decentralized Prediction

## Summary

This notebook shows how to use the decentralized prediction implementation.
The same code is also available as a Python script: decentralized.py

The notebook uses the User and Server classes to simulate a network of agents that want to build a similarity matrix without giving up their data.
These classes are documented in their class files.

In [1]:
import numpy as np
import pandas as pd
import os
from model.track_collection import TrackCollection
from utils.collection_splitter import splitter
from agent.server import Server
from agent.user import User

## Configuration

We created a way to extract the rating of each track from a user's library.
The problem we faced is that we have only one real user's library, so we can't use it for prediction: we need more users, and splitting this user's library into several libraries would not be relevant because the ratings would be the same.
So we created an entirely fake music library and 5 users, each holding a part of the global library with ratings on it.
You can find the details in these files:
* data/track_collection_test.json - The global library
* data/users/i.json - The library of the i-th user

In the configuration below, the commented-out code is dynamic but not relevant because we have only one real library. The rest of the code is hardcoded for the 5 test users.
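
As a minimal sketch of the alignment step used in the configuration cell, the following shows how a left merge followed by `fillna(0)` gives each user one rating per track of the global library, with 0 for tracks the user does not have. The sample data and the helper name `align_ratings` are hypothetical and only illustrate the pattern; they are not part of the project.

```python
import pandas as pd

# Hypothetical global library of four track ids.
track_list = pd.DataFrame({"id": [10, 11, 12, 13]})

# Hypothetical user library: only two of the tracks, with ratings.
user_library = pd.DataFrame({"id": [11, 13], "rating_score": [0.8, 0.4]})


def align_ratings(track_list, user_library):
    """Return one rating per global track, 0 where the user has no rating."""
    merged = track_list.merge(user_library, on="id", how="left").fillna(0)
    return merged[["rating_score"]]


user_matrix = align_ratings(track_list, user_library)
print(user_matrix["rating_score"].tolist())
```

Because every user vector ends up with the same length and the same track order, the vectors can be compared position by position later on.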

In [2]:
#### CONFIG 
number_of_users = 5

### Loading the tracks data; and splitting them into number_of_users collections

track_collection = TrackCollection()
track_collection.load(os.path.join('data', 'track_collection_test.json'))
df_track_collection = track_collection.to_dataframe()
track_list = df_track_collection[['id']]

#user_collections = splitter(track_collection, number_of_users, 0.3)
#user_dfs = []

### Generating the users_dataframes vector with all tracks and their ratings

#for user_collection in user_collections:
#  ndf = user_collection.to_dataframe()[['id', 'rating_score']]
#  user_matrix = track_list.merge(ndf, on='id', how='left').fillna(0)
#  user_dfs.append(user_matrix[['rating_score']])


user_dfs = []
for i in range(number_of_users):
  tc = TrackCollection()
  tc.load(os.path.join('data', 'users', str(i+1)+'.json'))
  ndf = tc.to_dataframe()[['id', 'rating_score']]
  user_matrix = track_list.merge(ndf, on='id', how='left').fillna(0)
  user_dfs.append(user_matrix[['rating_score']])

### Generating the users
users = [User(i, df) for i, df in enumerate(user_dfs)]

### Setting the user loop: each user has to know which user is next, in order to run the decentralized computations
for user in users:
  # The modulo wraps the last user back to the first one, closing the loop
  user.nextNode = users[(user.id + 1) % number_of_users]



## Running the server

Now that we have created all the users and linked them in a loop, we can create the server and run it.
Running it generates the similarity matrix and spreads it to all users.

So at the end of this execution, all users will have the similarity matrix, computed in a decentralized way.
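
To give an intuition for why a loop of users is useful, here is a minimal sketch of one well-known technique, a masked sum passed around a ring. It is an assumption made for illustration, not necessarily what `Server.run()` does: the class `RingNode` and the function `ring_sum` are hypothetical and do not come from the project. The initiator adds a random mask to the running total, each node adds its private value, and the initiator removes the mask at the end, so no node sees another node's raw value.

```python
import random


class RingNode:
    """Hypothetical stand-in for a user in the ring; not the project's User class."""

    def __init__(self, node_id, private_value):
        self.node_id = node_id
        self.private_value = private_value
        self.next_node = None

    def forward(self, running_total, initiator):
        """Add the private value, then pass the total on until the ring is closed."""
        running_total += self.private_value
        if self.next_node is initiator:
            return running_total
        return self.next_node.forward(running_total, initiator)


def ring_sum(nodes, rng=random):
    """Sum the private values of all nodes without exposing any single one."""
    # Link each node to the next one; the last node points back to the first.
    for i, node in enumerate(nodes):
        node.next_node = nodes[(i + 1) % len(nodes)]
    initiator = nodes[0]
    # Only the initiator knows the mask, so partial totals reveal nothing useful.
    mask = rng.randint(10**6, 10**9)
    masked_total = initiator.forward(mask, initiator)
    return masked_total - mask


nodes = [RingNode(i, value) for i, value in enumerate([3, 0, 5, 1, 4])]
print(ring_sum(nodes))
```

Sums of this kind are the building blocks of many similarity measures, which is why a ring of users can be enough to build a shared matrix. The sketch assumes nodes follow the protocol and do not collude.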

In [3]:
### Generating the server
server = Server(users, track_list)

### Running the server
server.run()

### Printing the similarity matrix
users[0].similarity_matrix


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,1.0,-0.343876,-inf,0.752071,-0.1438235,-0.07634773,-0.112421,0.087439,-0.09818338,-0.1970848,-0.3309113,0.085644,0.5993677,0.5745961,0.752071,0.236361,-0.2602099,-0.215667,0.520973,0.727986
1,-0.343876,1.0,-inf,-0.3265986,0.8295151,0.8088264,0.7458699,0.542451,-0.1332427,0.09884389,0.6531973,-0.158748,-0.05809929,-0.03390318,-0.3265986,0.045823,-0.1579773,0.572346,-0.175835,0.181458
2,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf
3,0.752071,-0.326599,-inf,1.0,0.0,-2.233763e-12,0.2491364,0.249136,0.0,0.0,0.2,0.56939,0.5336761,0.4982729,1.0,0.757638,3.105366e-12,0.223039,0.305083,0.493865
4,-0.143823,0.829515,-inf,-1.630581e-11,1.0,0.9931906,0.893325,0.297775,-0.2786391,-0.1106473,0.7171372,0.365174,-0.2764081,-0.2729604,9.317606e-12,0.31862,-0.1496356,0.571247,0.257396,0.501739
5,-0.076348,0.808826,-inf,-2.233763e-12,0.9931906,1.0,0.8566475,0.265153,-0.2767417,-0.133505,0.6385725,0.342222,-0.2745259,-0.2711017,0.0,0.254424,-0.1611955,0.485188,0.348204,0.580197
6,-0.112421,0.74587,-inf,0.2491364,0.893325,0.8566475,1.0,0.586207,0.0,0.2128948,0.9135003,0.478619,0.0,1.176071e-12,0.2491364,0.559275,0.1134197,0.701203,0.081969,0.369119
7,0.087439,0.542451,-inf,0.2491364,0.297775,0.2651528,0.5862069,1.0,0.4646419,0.6741669,0.5813184,-0.167228,0.7091101,0.7241379,0.2491364,0.279637,0.2835493,0.436598,-0.335326,0.082026
8,-0.098183,-0.133243,-inf,-1.272162e-11,-0.2786391,-0.2767417,0.0,0.464642,1.0,0.9529048,-2.827027e-12,-0.071226,0.3276232,0.3436414,-7.269499e-12,-0.278021,0.9286466,-0.401113,-0.025102,-0.218752
9,-0.197085,0.098844,-inf,0.0,-0.1106473,-0.133505,0.2128948,0.674167,0.9529048,1.0,0.2563593,-0.087027,0.4003041,0.4198758,-1.110273e-12,-0.12389,0.8720676,-0.122524,-0.214697,-0.267281


## Prediction

Now that the similarity matrix is created, we can make the predictions.
Predictions can be computed locally by each user. That's ideal: this way we don't give any information to other users.
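
The sketch below illustrates how a prediction could be computed locally from the similarity matrix and a user's own ratings. It is an assumption, not the actual `willILikeIt` implementation: the functions `predict_score` and `will_like`, the similarity-weighted average, and the comparison against the user's average rating are all hypothetical choices made for illustration. Entries that are not finite (such as the `-inf` values in the matrix above) are ignored.

```python
import numpy as np


def predict_score(similarity, ratings, track):
    """Similarity-weighted average of the user's own ratings for one track."""
    similarity = np.asarray(similarity, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    weights = similarity[track]
    # Keep only rated tracks with a finite, positive similarity to the target.
    usable = (ratings > 0) & np.isfinite(weights) & (weights > 0)
    usable[track] = False
    if not usable.any():
        return 0.0
    return float(np.dot(weights[usable], ratings[usable]) / weights[usable].sum())


def will_like(similarity, ratings, track):
    """Hypothetical rule: like the track if the prediction reaches the user's average."""
    ratings = np.asarray(ratings, dtype=float)
    rated = ratings[ratings > 0]
    average = rated.mean() if rated.size else 0.0
    return bool(predict_score(similarity, ratings, track) >= average)


# Hypothetical 4-track example; track 2 has no usable similarity.
ninf = -np.inf
similarity = [
    [1.0, 0.75, ninf, 0.0],
    [0.75, 1.0, ninf, 0.25],
    [ninf, ninf, ninf, ninf],
    [0.0, 0.25, ninf, 1.0],
]
ratings = [1.0, 0.0, 0.0, 0.5]  # 0 means "not rated"

for track in (1, 2):
    print(track, predict_score(similarity, ratings, track), will_like(similarity, ratings, track))
```

Everything in this computation uses only the shared matrix and the user's own ratings, which is what allows the prediction to stay local.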

In [4]:

### For each user, the list of tracks they have no rating for (tracks not in their library)
userToPredict = []
userToPredict.append([2,3,4,7,8,10,11,14,15,18])
userToPredict.append([0,2,3,9,12,13,14,18,19])
userToPredict.append([2,3,4,5,6,8,10,12,13,14,15,17,19])
userToPredict.append([2,3,7,8,9,10,12,13,14,15,16])
userToPredict.append([1,2,4,5,6,7,8,9,10])

i = 0
for uToPredict in userToPredict:
    for j in uToPredict:
        if users[i].willILikeIt(j):
            print("User %d will probably like song %i " % (i,j))
        else:
            print("User %d will probably not like song %i. Because the predicted score (%f) to low" % (i,j,users[i].average_rating))
    i+=1


User 0 will probably not like song 2. Because the predicted score (0.440000) to low
User 0 will probably not like song 3. Because the predicted score (0.440000) to low
User 0 will probably not like song 4. Because the predicted score (0.440000) to low
User 0 will probably not like song 5. Because the predicted score (0.440000) to low
User 0 will probably not like song 6. Because the predicted score (0.440000) to low
User 0 will probably not like song 10. Because the predicted score (0.440000) to low
User 0 will probably not like song 11. Because the predicted score (0.440000) to low
User 0 will probably not like song 14. Because the predicted score (0.440000) to low
User 0 will probably not like song 15. Because the predicted score (0.440000) to low
User 0 will probably not like song 18. Because the predicted score (0.440000) to low
User 1 will probably not like song 0. Because the predicted score (0.568182) to low
User 1 will probably not like song 2. Because the predicted score (0.56

Here we can see that