Findmyreviewers (FMR for short) is an open-source project that extracts topics from a piece of text using trained LDA models and finds the best-matching reviewers from a pool of scholars.
The web app is built on top of flask, and the LDA model is trained with gensim. With slight modification, you can also use other libraries to replace gensim and load your own trained LDA model.
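As a rough illustration of the matching idea, the sketch below ranks reviewers by the cosine similarity between a submission's topic distribution and each reviewer's topic distribution. This is an assumption made for illustration, not necessarily the algorithm FMR uses; the function names, reviewer names, and topic vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector carries no topic information, so treat it as no match.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reviewers(
    doc_topics: Sequence[float],
    reviewer_topics: Dict[str, Sequence[float]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every reviewer against the document and return the top_n matches."""
    scored = [
        (name, cosine_similarity(doc_topics, vec))
        for name, vec in reviewer_topics.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical topic distributions over three topics (illustrative data only).
reviewers = {
    "reviewer_a": [0.8, 0.1, 0.1],
    "reviewer_b": [0.1, 0.8, 0.1],
    "reviewer_c": [0.3, 0.3, 0.4],
}
submission = [0.7, 0.2, 0.1]

ranking = rank_reviewers(submission, reviewers, top_n=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real setup the vectors would come from the LDA model's inferred topic distribution for the submitted text and for each scholar's publications.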
Make sure your Python version is 3.6.x.
virtualenv is highly recommended.
If you do not have a virtual environment in the project folder yet, set one up with:
$ virtualenv venv
Then activate the virtual environment and install the dependencies:
$ source venv/bin/activate
$ pip install -r requirements.txt
Download demo models:
$ cd trained
$ python download.py
$ cd ..
Install NLTK data:
$ python -m nltk.downloader brown
$ python -m nltk.downloader punkt
Running the server
Initialize web app database:
$ python manage.py create_table
Run the web app server:
$ python manage.py runserver
Then navigate to the following address:
To access the dashboard, please visit:
Customization and Development
We have rough documentation available in the
You can also check out an online version at http://findmyreviewers.readthedocs.io.
There are also some Jupyter notebooks in the /tutorial folder. They cover:
- How we preprocess the data
- How we trained the model
- How the matching algorithm is developed
We will keep refining the project as well as the documentation.
Currently we are looking at:
- Refining the preprocessing procedures
- Refining LDA model training
- Implementing an author-topic model
Demo Model and Databases
A trained demo LDA model and a demo database are shipped with this repository.
The LDA model is trained on our complete full-text corpus of PDFs. It retains all the state and data you need to train it further with new documents.
The demo database is only a portion of our complete database, as the data sources do not allow us to reveal the data.
As a result, the matching results from the demo database may seem sub-optimal because of the lack of complete data.
To focus on the core functionality, we make use of several open-source libraries and projects. We sincerely appreciate their work.
- flask (and several extensions)
Frontend template: https://freehtml5.co/elate-free-html5-bootstrap-template/
Dashboard template: https://github.com/puikinsh/gentelella
This project is sponsored by:
The Chinese University of Hong Kong, Shenzhen
University of Delaware