# 1 Introduction

You are dining in a restaurant, and as soon as you finish your main course, the attendant appears by your side and recommends a dessert that fits your taste as if they could read your mind. The attendant may not be clairvoyant, but they could have chosen that dessert by considering what you ordered for dinner or by simply offering the most popular dessert on the menu.

This kind of recommendation has been around for a long time; however, with the rise of e-commerce and streaming services, recommendation systems have become part of our daily lives. You finish watching a movie, and a selection of films pops up on your screen. You add an item to the cart of your favorite online store, and related articles are presented to you. The list keeps growing.

To evaluate different approaches to recommendation systems, I created a model to suggest new songs based on an existing playlist. Using Spotify users' playlists, I took part of each playlist's songs as input, generated recommendations for the remaining songs, and checked whether the recommendations matched the songs that were originally on the playlist.

<!-- https://developer.spotify.com/documentation/web-api/

https://developer.spotify.com/

https://developer.spotify.com/dashboard/login 

**Objectives:**
- recommend the next song for an user base on his current playlist

**Assumptions:**
- tracks of a given playlist are the favorite songs of the user

**Methodology:**
1. Collaborative filtering
2. Initial sample for each user will take 70% of the tracks in the playlist (check if the sorting of the musics in the playlist has any information - date of inclusion, user sorting, etc) and then use the remaining 30% of the tracks as test.
3. **If we have time:** Hybrid model by adding *Content based filtering* for playlists with few tracks
    - Track features: genre, time, top_track, etc.
**Concerns:**
- cold start: playlists with few tracks
- rare tracks: treatment to be applied to the tracks with low frequency on the database
- only positive rating: we will have only tracks that the user liked, we do not have any data about tracks that they did not like.


**Python Libraries:**
- Sagemaker
- Elasticsearch
- PySpark

-->

## Recommendation systems

A recommendation system suggests new items based on data about the products, the services, or the clients themselves. One example is collaborative filtering, which models users' preferences based on their past interactions.

https://towardsdatascience.com/neural-collaborative-filtering-96cef1009401

I considered a **random model**, in which every song is selected with the same probability, and four variants of the collaborative filtering model, with selection criteria based on:
1. **Track occurrence** frequency: the number of times a given track appears in the data set determines the probability of that track being selected

2. **Artist co-occurrence**: artists present in the same playlist. For example, artist A appears in playlists 1 and 2. Given a new playlist that also contains artist A, one track is randomly selected from playlist 1 or 2.

3. **Album co-occurrence**: the same as for artists, but considering albums that appear in the same playlist

4. **Track co-occurrence**: the same, but considering tracks that appear in the same playlist
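The selection criteria above can be illustrated with a minimal sketch. This is not the project's actual implementation: the toy `playlists` dictionary, the function names, and the choice to sample candidates uniformly are all assumptions made for illustration. Only the track-level variants are shown; the artist and album variants would follow the same pattern with a different key.

```python
import random
from collections import Counter, defaultdict

# Toy data (made up): playlist ID -> list of track IDs.
playlists = {
    "p1": ["t1", "t2", "t3"],
    "p2": ["t2", "t3", "t4"],
    "p3": ["t5"],
}

def track_frequencies(playlists):
    """Count how many times each track appears across all playlists."""
    return Counter(t for tracks in playlists.values() for t in tracks)

def recommend_random(catalog, k, rng):
    """Random model: every track has the same probability of being picked."""
    pool = sorted(catalog)
    return rng.sample(pool, min(k, len(pool)))

def recommend_by_frequency(freq, k, rng):
    """Track occurrence: sample (with replacement) proportionally to frequency."""
    tracks = sorted(freq)
    weights = [freq[t] for t in tracks]
    return rng.choices(tracks, weights=weights, k=k)

def build_cooccurrence(playlists):
    """Map each track to the set of tracks that share a playlist with it."""
    cooc = defaultdict(set)
    for tracks in playlists.values():
        for t in tracks:
            cooc[t].update(x for x in tracks if x != t)
    return cooc

def recommend_by_cooccurrence(seed_tracks, cooc, k, rng):
    """Track co-occurrence: pick up to k tracks seen alongside any seed track."""
    seeds = set(seed_tracks)
    candidates = set()
    for t in seeds:
        candidates |= cooc.get(t, set())
    pool = sorted(candidates - seeds)  # never recommend a seed track itself
    return rng.sample(pool, min(k, len(pool)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
cooc = build_cooccurrence(playlists)
print(recommend_by_cooccurrence(["t1"], cooc, k=2, rng=rng))
```

In this sketch, a seed track with no co-occurring tracks (such as `t5`) yields an empty list, which is the situation a fallback to the random model would have to cover.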

I assumed that the tracks in a playlist are the user's favorite songs. Therefore, if one of them was suggested, the user would like the suggestion, and it could be counted as a success.
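Under this assumption, the evaluation can be sketched as splitting each playlist into visible and hidden tracks and counting how many hidden tracks are recovered. The function names, the `seed_fraction` parameter, and the exact metric definition below are illustrative assumptions, not necessarily what the project uses.

```python
import random

def split_playlist(tracks, seed_fraction, rng):
    """Shuffle a playlist and split it into visible seed tracks and hidden tracks."""
    shuffled = list(tracks)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * seed_fraction)
    return shuffled[:cut], shuffled[cut:]

def hit_rate(recommended, hidden):
    """Fraction of the hidden tracks that appear among the recommendations."""
    hidden_set = set(hidden)
    if not hidden_set:
        return 0.0
    return len(set(recommended) & hidden_set) / len(hidden_set)

rng = random.Random(42)  # fixed seed so the split is reproducible
playlist = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]  # made-up track IDs
seed, hidden = split_playlist(playlist, seed_fraction=0.5, rng=rng)

# A recommender would receive `seed`; here the suggestions are hard-coded
# to two hidden tracks plus one track that is not in the playlist at all.
suggestions = hidden[:2] + ["t99"]
print(hit_rate(suggestions, hidden))
```

A hit here means only that a suggested track was in the hidden part of the playlist; a suggestion the user might enjoy but never added still counts as a miss.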

On the other hand, I did not include any special treatment for *cold starts*, i.e., an item's first appearance in the data. In this case, a random song was selected.
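The fallback described above might look like the following sketch. The function name, the returned `(track, source)` pair, and the shape of the `cooc` mapping (track ID to the set of co-occurring track IDs) are hypothetical and chosen only to make the behavior visible.

```python
import random

def recommend_with_fallback(seed_tracks, cooc, catalog, rng):
    """Pick one co-occurring track, or a random catalog track on a cold start."""
    seeds = set(seed_tracks)
    candidates = set()
    for t in seeds:
        candidates |= cooc.get(t, set())
    candidates -= seeds
    if candidates:
        return rng.choice(sorted(candidates)), "co-occurrence"
    # Cold start: no seed track has co-occurrence data, so use the random model.
    pool = sorted(set(catalog) - seeds)
    if not pool:
        return None, "none"  # nothing left to recommend
    return rng.choice(pool), "random"

rng = random.Random(1)
cooc = {"t1": {"t2"}}             # made-up co-occurrence data
catalog = ["t1", "t2", "t3"]      # made-up catalog of known tracks
print(recommend_with_fallback(["t1"], cooc, catalog, rng))  # known seed track
print(recommend_with_fallback(["t9"], cooc, catalog, rng))  # unseen seed track
```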

## Data source

The data for the model was acquired from the Spotify Web API using Spotipy, a Python client library:
* https://spotipy.readthedocs.io/en/2.19.0/
* https://developer.spotify.com/documentation/web-api/

The data is organized hierarchically:
* User ID
    * Playlist ID
        * Tracks
            * Track ID
                * Track name
                * Artists
                * Album
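For modeling, a hierarchy like this is often easier to handle as flat rows, one per track in a playlist. The sketch below assumes the extracted data has already been stored as nested dictionaries that mirror the hierarchy above; that nested shape, the `TrackRow` class, and all sample values are made up for illustration and do not reproduce the raw API response format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrackRow:
    """One flat record: a single track within one user's playlist."""
    user_id: str
    playlist_id: str
    track_id: str
    track_name: str
    artists: tuple
    album: str

def flatten(data):
    """Walk user -> playlist -> track and emit one TrackRow per track."""
    rows = []
    for user_id, user_playlists in data.items():
        for playlist_id, tracks in user_playlists.items():
            for track_id, info in tracks.items():
                rows.append(TrackRow(
                    user_id=user_id,
                    playlist_id=playlist_id,
                    track_id=track_id,
                    track_name=info["name"],
                    artists=tuple(info["artists"]),
                    album=info["album"],
                ))
    return rows

# Made-up sample following the User ID > Playlist ID > Track ID hierarchy.
sample = {
    "user_1": {
        "playlist_a": {
            "track_x": {"name": "Song X", "artists": ["Artist A"], "album": "Album 1"},
            "track_y": {"name": "Song Y", "artists": ["Artist A", "Artist B"], "album": "Album 2"},
        },
    },
}

for row in flatten(sample):
    print(row)
```

Flat rows like these make it straightforward to group by playlist, artist, or album when counting occurrences and co-occurrences.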

## Stakeholders

The subject of this study is relevant to:

- **streaming services:** suggesting the right content is fundamental to keeping their users engaged

- **e-commerce:** the model can provide insights on how to improve product suggestions and the customer journey

- **business:** any executive who wants to guide their company to success can benefit from knowing how to offer the right service or product to their clients

- **consumers:** knowing how companies use data about their consumption habits and behavior can make them more aware of these practices and help them avoid impulsive purchases

## Project Structure

The remaining parts of the project are:

2. **Data extraction:** API data extraction

3. **Exploratory data analysis (EDA) and data cleansing:** general data validation, overall numbers, basic statistics, distributions, data visualizations, and data cleansing

4. **Model:** model definition and testing

5. **Results and conclusion:** summary of the results with graphical visualization, conclusions, and suggestions for enhancements