Create an endpoint to get the number of times user x listened to artist y:
- user_id
- Number of listens
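The lookup behind such an endpoint could be sketched as follows. This assumes the listens are held in memory as a dictionary keyed by `(user_id, artist)`. The names `listens` and `get_listens` and the sample values are hypothetical, and the web framework wiring is left out.

```python
# Hypothetical in-memory store: (user_id, artist) -> number of listens.
listens = {
    (1, "artist_a"): 12,
    (1, "artist_b"): 3,
    (2, "artist_a"): 7,
}


def get_listens(store, user_id, artist):
    """Return how many times `user_id` listened to `artist` (0 if never)."""
    return store.get((user_id, artist), 0)


known = get_listens(listens, 1, "artist_a")    # pair present in the store
missing = get_listens(listens, 2, "artist_b")  # pair absent, falls back to 0
```

Returning 0 for an unknown pair is a design choice made for this sketch; an endpoint could also answer with a "not found" error.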
Create an endpoint to update the listens for artist y by user x:
- user_id
- Updated number of listens
- Save the dataset to CSV so it can be reused later
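A minimal sketch of the update and save steps, using only the standard library. It assumes that an update overwrites the previous count (rather than incrementing it) and that `user_id` is an integer. All function names and the column names `user_id`, `artist`, `listens` are hypothetical.

```python
import csv
import io


def update_listens(store, user_id, artist, listens):
    """Overwrite the listen count for (user_id, artist) and return it."""
    if listens < 0:
        raise ValueError("listens must be non-negative")
    store[(user_id, artist)] = listens
    return listens


def write_csv(store, fileobj):
    """Write one CSV row per (user_id, artist) pair to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["user_id", "artist", "listens"])
    for (user_id, artist), count in sorted(store.items()):
        writer.writerow([user_id, artist, count])


def read_csv(fileobj):
    """Rebuild the dictionary from a file written by `write_csv`."""
    reader = csv.DictReader(fileobj)
    return {
        (int(row["user_id"]), row["artist"]): int(row["listens"])
        for row in reader
    }


store = {(1, "artist_a"): 2}
update_listens(store, 1, "artist_a", 5)
update_listens(store, 2, "artist_b", 1)

# io.StringIO stands in for open("listens.csv", "w", newline="") here.
buffer = io.StringIO()
write_csv(store, buffer)
buffer.seek(0)
restored = read_csv(buffer)
```

Reading the saved data back into the same structure is what makes the dataset reusable after a restart.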
Create an endpoint for each user to get recommendations, such as:
- Random artists: taking a sample of 5 artists
- Still unknown artists: taking a sample of 5 artists the user has never listened to
- Artists that are similar to the ones the user has already listened to
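The first two recommendation types can be sketched with `random.sample` from the standard library. The function names, the `rng` parameter, and the fallback of returning fewer than 5 artists when the pool is too small are assumptions made for this illustration.

```python
import random


def random_artists(all_artists, k=5, rng=random):
    """Return up to `k` distinct artists sampled uniformly at random."""
    pool = sorted(set(all_artists))
    return rng.sample(pool, min(k, len(pool)))


def unknown_artists(all_artists, listened, k=5, rng=random):
    """Return up to `k` artists the user has never listened to."""
    pool = sorted(set(all_artists) - set(listened))
    return rng.sample(pool, min(k, len(pool)))


catalog = [f"artist_{i}" for i in range(10)]   # hypothetical catalog
already_heard = {"artist_0", "artist_1"}       # hypothetical user history

random_picks = random_artists(catalog)
new_picks = unknown_artists(catalog, already_heard, rng=random.Random(0))
```

Passing a seeded `random.Random` instance as `rng` makes the sample reproducible, which is convenient in tests.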
To build the docker image:
docker build -t kptncook-data-challenge .
To check the docker image for vulnerabilities:
docker scan kptncook-data-challenge
To run the docker image:
docker run -d --name kptncook-data-challenge_container -p 80:80 kptncook-data-challenge
out: 176b0cb9897582f3c923fd3c179cb700415762ddc96e081176adb726a3a681e5
Commands to run after a code update:
docker stop kptncook-data-challenge_container
docker rm kptncook-data-challenge_container
docker build -t kptncook-data-challenge .
docker run -d --name kptncook-data-challenge_container -p 80:80 kptncook-data-challenge
Or simply run run_docker.bat
To access the API: http://localhost
To see the documentation: http://127.0.0.1/docs
To recommend artists similar to the ones the user has already listened to, the steps are:
- 5 artists the user has already listened to are selected (known artists)
- The pandas corrwith() function is used to find, for each of these 5 artists, the most similar artist, based on a correlation coefficient.
corrwith() lets us choose among 3 correlation methods: pearson, kendall, spearman.
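The similarity step can be sketched as below. This assumes the data is arranged as a user by artist matrix of listen counts, which may differ from the layout used in the project. The matrix values and the helper `most_similar` are hypothetical; `DataFrame.corrwith` and its `method` parameter are the pandas API.

```python
import pandas as pd

# Hypothetical matrix of listen counts (rows: users, columns: artists).
matrix = pd.DataFrame(
    {
        "artist_a": [5, 0, 3, 1],
        "artist_b": [4, 1, 3, 0],
        "artist_c": [0, 6, 1, 4],
    },
    index=["user_1", "user_2", "user_3", "user_4"],
)


def most_similar(matrix, artist, method="pearson"):
    """Return (name, score) of the artist most correlated with `artist`."""
    # Correlate every artist column with the column of the known artist.
    scores = matrix.corrwith(matrix[artist], method=method)
    # The artist always correlates perfectly with itself, so drop it.
    scores = scores.drop(artist).dropna()
    return scores.idxmax(), scores.max()


name, score = most_similar(matrix, "artist_a")
```

In this toy matrix, users who listen to `artist_a` also tend to listen to `artist_b`, so `artist_b` is returned. A fuller version would probably also exclude artists the user already knows.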
If no method is specified, corrwith() uses the Pearson coefficient, which is interpreted as follows:
| Pearson correlation coefficient value | Strength | Direction |
|---|---|---|
| Greater than 0.5 | Strong | Positive |
| Between 0.3 and 0.5 | Moderate | Positive |
| Between 0 and 0.3 | Weak | Positive |
| 0 | None | None |
| Between 0 and -0.3 | Weak | Negative |
| Between -0.3 and -0.5 | Moderate | Negative |
| Less than -0.5 | Strong | Negative |
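The table above can be expressed as a small helper, shown here as a sketch. The function name `describe_correlation` is hypothetical. The table does not say which row the exact boundary values 0.3 and 0.5 belong to, so this sketch assumes both count as "Moderate".

```python
def describe_correlation(r):
    """Map a Pearson coefficient to (strength, direction) per the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "None", "None"
    direction = "Positive" if r > 0 else "Negative"
    magnitude = abs(r)
    if magnitude > 0.5:
        strength = "Strong"
    elif magnitude >= 0.3:  # assumption: 0.3 and 0.5 are treated as moderate
        strength = "Moderate"
    else:
        strength = "Weak"
    return strength, direction


weak = describe_correlation(0.2)
strong_negative = describe_correlation(-0.7)
```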
In the results, most recommendations have a coefficient between 0 and 0.3, which indicates a weak but non-zero correlation.
After trying the other 2 methods (Kendall and Spearman), the results appear to be roughly the same. One hypothesis is that the users in this dataset have not listened to enough music, which makes the correlation coefficients less effective.