UH Related similarity Detector for Qt Jira data
This service was created as a result of the OpenReq project funded by the European Union Horizon 2020 Research and Innovation programme under grant agreement No 732463.
Palmu uses vectors to find similar issues within Qt Jira data. The set of Qt issues is transformed into vector representations using FastText embeddings, and a fast similarity search algorithm finds the nearest neighbors of a given query. The idea is that vectors that are close in the embedding space are likely to be related and to correspond to duplicates or dependencies in the issue space. This search appears to work well for reducing the search space (from hundreds of thousands of issues to hundreds). In the reduced space, a random forest classifier is then applied to output the k most likely dependencies.
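The two-stage idea above (cheap nearest-neighbour retrieval to shrink the candidate set, then a classifier to rank what remains) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Palmu's code: brute-force cosine similarity stands in for the Faiss index, the issue vectors are assumed to be precomputed (for example with FastText), and `score_pair` is a hypothetical callback standing in for the trained classifier's probability output.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(query: Vector, index: Dict[str, Vector], m: int) -> List[str]:
    """Stage 1: the m issue ids nearest to the query (brute force, standing in for Faiss)."""
    ranked = sorted(index, key=lambda issue_id: cosine(query, index[issue_id]), reverse=True)
    return ranked[:m]


def rerank(
    query_id: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Stage 2: score (query, candidate) pairs with a classifier; keep the k best."""
    # The query issue is dropped so it never ranks as its own dependency.
    scored = [(c, score_pair(query_id, c)) for c in candidates if c != query_id]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

In the described service, Faiss would take the place of `retrieve_candidates` so that stage 1 stays fast over hundreds of thousands of issues, while `m` (the size of the reduced space, on the order of hundreds) stays much larger than `k`.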
- Python 3: the service is built on Python and requires the basic libraries for numerical analysis. (https://www.python.org/download/releases/3.0/)
- Faiss: a library for fast similarity search in high-dimensional vector spaces. (https://github.com/facebookresearch/faiss)
- FastText: used for word embeddings. (https://fasttext.cc)
- LightGBM: a gradient boosting decision tree library. (https://github.com/microsoft/LightGBM)
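When running outside Docker, a quick check can confirm that the prerequisites above are importable. This is a hedged sketch: `numpy` is an assumption for "the basic libraries for numeric analysis", and the module names are the ones installed by the pip packages `faiss-cpu` (or `faiss-gpu`), `fasttext` and `lightgbm`.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the prerequisites. "numpy" is an assumed stand-in for the
# numeric-analysis libraries; faiss, fasttext and lightgbm come from their pip packages.
REQUIRED_MODULES = ("numpy", "faiss", "fasttext", "lightgbm")


def check_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}
```

Calling `print(check_dependencies())` shows which modules are missing; using `find_spec` avoids the cost of actually importing heavy libraries such as Faiss.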
The service has not been deployed yet. Thus, there's no public API available.
Download the following files and add them to the data folder:
- https://drive.google.com/file/d/1Y1rRyQN8DyZbtnUmIYuYXr08ZF8f-jSS/view
- https://drive.google.com/file/d/119vRzV00oAlkQFwu1OS5HmFixSvOlep_/view
Valid project requirement JSON files must also be in the /data/ folder for the program to build. Then, with Docker installed, run the following (this will take a while):
docker build . -t palmu
Then run:
docker run -p 9210:9210 --name palmu palmu
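Because the build needs valid project JSON files in /data/ and takes a while, a pre-build check may save a failed run. This is a hypothetical helper, not part of Palmu: it only confirms that each `*.json` file directly inside the folder parses as JSON, and it does not validate the OpenReq schema.

```python
import json
from pathlib import Path
from typing import List, Tuple


def check_data_folder(data_dir: str = "data") -> Tuple[List[str], List[str]]:
    """Return (valid, invalid) names of the *.json files directly inside data_dir."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"data folder not found: {folder}")
    valid: List[str] = []
    invalid: List[str] = []
    for path in sorted(folder.glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)  # syntax check only; the OpenReq schema is not checked
            valid.append(path.name)
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(path.name)
    return valid, invalid
```

An empty `valid` list or a non-empty `invalid` list suggests fixing the folder before running `docker build`.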
GET hostname:9210/getRelated?id={issueId}&k={k}
Returns a list of strings containing the k issues most closely related to the given issueId (requires that the projects have been posted).
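A client for this endpoint can be sketched with the standard library. The base URL assumes the container was started locally with `-p 9210:9210`, and `get_related` assumes the "String list" is returned as a JSON array; both are assumptions, and the function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:9210"  # assumption: local container from `docker run -p 9210:9210`


def related_url(issue_id: str, k: int, base_url: str = BASE_URL) -> str:
    """Build the getRelated URL; urlencode escapes the issue id safely."""
    return f"{base_url}/getRelated?{urlencode({'id': issue_id, 'k': k})}"


def get_related(issue_id: str, k: int, base_url: str = BASE_URL) -> list:
    """Fetch the k related issue ids (assumes the body is a JSON array of strings)."""
    with urlopen(related_url(issue_id, k, base_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `related_url("QTBUG-123", 5)` yields `http://localhost:9210/getRelated?id=QTBUG-123&k=5` (the issue id is a made-up example).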
POST hostname:9210/postProject
(project JSON in request body)
Posts a new project to Palmu.
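Posting a project file can be sketched with `urllib.request`. This is a minimal, hedged example: the `Content-Type: application/json` header, the local base URL and the generous timeout (training on a new project may be slow) are assumptions, and the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9210"  # assumption: local container


def build_json_post(path: str, payload: dict, base_url: str = BASE_URL) -> Request:
    """Create a POST request whose body is the JSON-encoded payload."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{base_url}/{path.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def post_project(project_file: str, base_url: str = BASE_URL) -> str:
    """Read a project JSON file, POST it to /postProject, and return the response text."""
    with open(project_file, encoding="utf-8") as fh:
        payload = json.load(fh)
    with urlopen(build_json_post("postProject", payload, base_url), timeout=300) as resp:
        return resp.read().decode("utf-8")
```

Passing `"newIssue"` as the path to `build_json_post` would build the analogous request for the newIssue endpoint.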
POST hostname:9210/newIssue
Valid OpenReq JSON must be in the request. The system adds this new data point to the current database and then performs the search.
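The add-then-search behaviour of newIssue can be illustrated with a toy in-memory index. This is a hypothetical sketch, not the service's implementation: the class and function names are invented, the issue vector is assumed to be computed already (in Palmu, from the issue text via FastText), and a real deployment would add the vector to the Faiss index and re-rank the hits with the classifier.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


class InMemoryIssueIndex:
    """Toy vector store: keeps unit-length issue vectors, answers cosine top-k queries."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    @staticmethod
    def _normalize(vector: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else [0.0] * len(vector)

    def add(self, issue_id: str, vector: Sequence[float]) -> None:
        """Insert or overwrite one data point."""
        self._vectors[issue_id] = self._normalize(vector)

    def search(
        self, query: Sequence[float], k: int, exclude: Optional[str] = None
    ) -> List[Tuple[str, float]]:
        """Return the k (issue_id, cosine) pairs closest to query, skipping `exclude`."""
        q = self._normalize(query)
        scored = [
            (issue_id, sum(a * b for a, b in zip(q, v)))
            for issue_id, v in self._vectors.items()
            if issue_id != exclude
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


def handle_new_issue(
    index: InMemoryIssueIndex, issue_id: str, vector: Sequence[float], k: int
) -> List[str]:
    """Add the new issue first, then search from it (excluding the issue itself)."""
    index.add(issue_id, vector)
    return [other for other, _ in index.search(vector, k, exclude=issue_id)]
```

Excluding the new issue's own id matters because, once inserted, the query vector is its own nearest neighbour.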
None at the moment.
None
See the OpenReq Contribution Guidelines here.
Free use of this software is granted under the terms of the EPL version 2 (EPL2.0).