ReFORM

Recommendation Engine For Myself (though it can be for anyone if the model is trained on different data)

This is a content-based music recommendation system tailored to my specific tastes. At the heart of the system is an LSTM auto-encoder that learns a compressed representation of a song after the song has been converted into a spectrogram. Pairwise distance in this representation space serves as the similarity metric: a lower distance between two songs indicates greater similarity. This can be used to rank selected songs against a set of indexed songs.
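As a rough illustration of the ranking idea, the sketch below sorts a set of indexed songs by their distance to a query song. It is not the code in recommend.py: the function name rank_by_distance, the toy 3-dimensional vectors, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math


def rank_by_distance(query, indexed):
    """Return (name, distance) pairs sorted from most to least similar.

    Euclidean distance is an assumption here; the repo may use another metric.
    """
    scored = [(name, math.dist(query, vec)) for name, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy vectors standing in for representations learned by the auto-encoder.
indexed = {
    "artist1.album1.song1": [0.0, 0.0, 0.0],
    "artist1.album1.song2": [3.0, 4.0, 0.0],
    "artist2.album3.song4": [1.0, 0.0, 0.0],
}
query = [0.0, 0.0, 0.0]

ranking = rank_by_distance(query, indexed)
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")
```

Songs at the top of the ranking have the smallest distance to the query and are therefore treated as the most similar.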

The rest of this README is a work in progress.

Installation

TensorFlow >= 2.0 is required. Installation instructions can be found on the official website: https://www.tensorflow.org/install

After installing TensorFlow, simply clone this repo: git clone https://github.com/eddddddy/ReFORM.git

Usage

Data Generation

The auto-encoder is trained on mel-spectrogram data. The data file can be generated by running python generate.py. The script expects the DATA_PATH directory to be organized as follows:

DATA_PATH\
    artist1\
        album1\
            song1.wav
            song2.wav
            ...
        album2\
            song3.wav
            ...
        ...
    artist2\
        album3\
            song4.wav
            ...
        ...
    ...

and the mapping.yaml file to be in the following format:

artist1:
  album1:
    - name: song1
    - name: song2
    ...
  album2:
    - name: song3
    ...
  ...
artist2:
  album3:
    - name: song4
    ...
  ...
...

The data file will be placed in the spectrogram_data directory, which will be created if it does not already exist.
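To check that a directory tree and mapping.yaml agree, it can help to derive the mapping structure from the tree itself. The sketch below is a hypothetical helper, not part of generate.py: build_mapping and dotted_names are invented names, and the dotted artist.album.song form is assumed from the song names used with embed.py.

```python
import tempfile
from pathlib import Path


def build_mapping(data_path):
    """Build {artist: {album: [{'name': song}, ...]}} from a DATA_PATH tree."""
    mapping = {}
    # Only files exactly three levels deep (artist/album/song.wav) are considered.
    for wav in sorted(Path(data_path).glob("*/*/*.wav")):
        artist, album = wav.parts[-3], wav.parts[-2]
        mapping.setdefault(artist, {}).setdefault(album, []).append({"name": wav.stem})
    return mapping


def dotted_names(mapping):
    """Flatten the mapping into 'artist.album.song' strings."""
    return [
        f"{artist}.{album}.{song['name']}"
        for artist, albums in mapping.items()
        for album, songs in albums.items()
        for song in songs
    ]


# Demonstration on a throwaway tree with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for rel in ("artist1/album1/song1.wav",
                "artist1/album1/song2.wav",
                "artist2/album3/song4.wav"):
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    mapping = build_mapping(root)
    names = dotted_names(mapping)

print(names)
```

The nested dictionary has the same shape as the mapping.yaml format shown above, so it can be compared against a parsed copy of that file.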

Training

To train the model, simply run python net.py. Model weights will be checkpointed into the checkpoints directory.
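For readers unfamiliar with LSTM auto-encoders, the following is a minimal Keras sketch of the general technique. It is not the architecture in net.py: the function build_autoencoder, the layer sizes, and the input shape (timesteps by mel bins) are assumptions chosen only to keep the example small.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_autoencoder(timesteps, n_mels, latent_dim):
    """Return (autoencoder, encoder) for sequences of mel-spectrogram frames."""
    inputs = layers.Input(shape=(timesteps, n_mels))
    # Encoder: compress the whole sequence into one fixed-size vector.
    encoded = layers.LSTM(latent_dim)(inputs)
    # Decoder: repeat the vector per timestep and reconstruct each frame.
    repeated = layers.RepeatVector(timesteps)(encoded)
    decoded = layers.LSTM(latent_dim, return_sequences=True)(repeated)
    outputs = layers.TimeDistributed(layers.Dense(n_mels))(decoded)

    autoencoder = tf.keras.Model(inputs, outputs)
    encoder = tf.keras.Model(inputs, encoded)
    # Mean squared reconstruction error is an assumed loss for this sketch.
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder


# Hypothetical, deliberately tiny dimensions.
autoencoder, encoder = build_autoencoder(timesteps=8, n_mels=4, latent_dim=2)
print(autoencoder.output_shape, encoder.output_shape)
```

After training the auto-encoder to reconstruct its input, the encoder half alone maps a spectrogram to its compressed representation.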

Representations

The embed.py file contains functionality for encoding spectrogram data into the representation space learned by the auto-encoder. It also contains utilities for working with these representations. For example, use the following code to visualize the representation space:

>>> import embed
>>> e = embed.Embedding('checkpoints/weights.hdf5')
>>> e.calculate(name=['artist1.album1.song1', 'artist2.album2.song2', 'artist3.album3.song3'])
>>> e.plot(boundary='convex')

Example output: (image not shown)

To get the similarity between two songs, use:

>>> e.similarity('artist1.album1.song1', 'artist2.album2.song2')

Contributing

Currently not accepting any pull requests, but feel free to fork a copy of this repo.
