
Findmyreviewers (FMR for short) is an open-source project that extracts topics from a piece of text using trained LDA models and then finds the best-matching scholars in a pool of candidate reviewers.

The web app is built on top of Flask, and the LDA model is trained with gensim. With slight modifications, you can replace gensim with another library and load your own trained LDA model.
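To make the matching idea concrete, here is a minimal sketch in plain Python. It assumes each paper and each scholar is already represented as a topic-probability vector, and it ranks scholars by cosine similarity. Both the similarity measure and the names `cosine_similarity`, `rank_reviewers`, and the sample data are illustrative assumptions, not the matching algorithm FMR actually uses.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An all-zero vector carries no topic information.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reviewers(paper_topics, scholar_topics, top_n=3):
    """Return the top_n (name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(paper_topics, vector))
        for name, vector in scholar_topics.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical topic distributions over three topics.
scholars = {
    "scholar_a": [0.7, 0.2, 0.1],
    "scholar_b": [0.1, 0.8, 0.1],
    "scholar_c": [0.3, 0.3, 0.4],
}
paper = [0.6, 0.3, 0.1]

ranking = rank_reviewers(paper, scholars, top_n=2)
for name, score in ranking:
    print(name, round(score, 3))
```

In a real setup the vectors would come from the trained LDA model rather than being written by hand.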


Make sure your Python version is 3.6.x.
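If you want to verify the interpreter from within Python before installing anything, a small check like the following may help. The helper name `is_supported_python` is hypothetical and not part of the project.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the major.minor version is 3.6."""
    return (version_info[0], version_info[1]) == (3, 6)


if is_supported_python():
    print("Python 3.6.x detected.")
else:
    print("Warning: this project expects Python 3.6.x, found",
          ".".join(str(part) for part in sys.version_info[:3]))
```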


Using virtualenv is highly recommended.

If you do not yet have a virtual environment in the project folder, set one up with:

$ virtualenv venv

Then activate the virtual environment:

$ source venv/bin/activate

Install packages:

$ pip install -r requirements.txt

Download demo models:

$ cd trained
$ python
$ cd ..

Install NLTK data:

$ python -m nltk.downloader brown
$ python -m nltk.downloader punkt

Running the server

Initialize web app database:

$ python create_table

Run the web app server:

$ python runserver

Then navigate to the following address:

To access the dashboard, please visit:

Customization and Development

Rough documentation is available in the /docs folder.

You can also check out an online version at

There are also some Jupyter notebooks in the /tutorial folder. They cover:

  • How we preprocess the data
  • How we train the model
  • How the matching algorithm was developed


We will keep refining the project as well as the documentation.

Currently we are looking at:

  • Refining the preprocessing procedures
  • Refining LDA model training
  • Implementing an author-topic model

Demo Model and Databases

A trained demo LDA model and a demo database are shipped with this repository.

The LDA model is trained on our complete full-text corpus (a large collection of PDFs). It retains all the state and data you need to train it further with new documents.

The demo database is only a portion of our complete database, because the data sources do not allow us to publish the full data.

As a result, the matching results from the demo database may seem suboptimal because of the incomplete data.


To focus on the core of the project, we rely on several open-source libraries and projects. We sincerely appreciate their work.

Python Libraries

  • gensim
  • nltk
  • TextBlob
  • Flask (and several extensions)



Frontend template:

Dashboard template:


This project is sponsored by:

The Chinese University of Hong Kong, Shenzhen

University of Delaware



Find-my-reviewers matches scholars and papers using topic extraction (LDA).



