Salento is a statistical bug-detection framework based on the machine learning model used by Bayou. For technical details about Salento, refer to the paper "Bayesian Specification Learning for Finding API Usage Errors" (FSE'17).
Requirements:
- Python 3 (tested with 3.5.1)
- TensorFlow (tested with 1.4)
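To confirm that a machine matches the versions listed above, a quick check like the following can help. This is a minimal sketch and not part of Salento: it only reports the running Python version and the installed TensorFlow version, and it does not enforce minimums, because the README lists tested versions rather than hard requirements. The names check_environment, TESTED_PYTHON and TESTED_TENSORFLOW are hypothetical.

```python
import importlib.util
import sys

# Versions this README reports having tested with; they are not enforced minimums.
TESTED_PYTHON = "3.5.1"
TESTED_TENSORFLOW = "1.4"


def check_environment():
    """Return the running Python version and the installed TensorFlow version (or None)."""
    report = {"python": tuple(sys.version_info[:3]), "tensorflow": None}
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf  # imported only when the package is present
        report["tensorflow"] = tf.__version__
    return report


if __name__ == "__main__":
    info = check_environment()
    print("Python", ".".join(str(n) for n in info["python"]), "(tested with %s)" % TESTED_PYTHON)
    tf_version = info["tensorflow"]
    if tf_version is None:
        print("TensorFlow is not installed (tested with %s)" % TESTED_TENSORFLOW)
    else:
        print("TensorFlow", tf_version, "(tested with %s)" % TESTED_TENSORFLOW)
        # Compare only major.minor, e.g. "1.4.0" -> ["1", "4"].
        if tf_version.split(".")[:2] != TESTED_TENSORFLOW.split("."):
            print("Note: this TensorFlow version differs from the tested one")
```

The sketch avoids f-strings so that it also runs on Python 3.5.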
To train a Salento model on a data file, say DATA.json:
- Set up the environment.
- Ensure that the data is in the correct JSON format by checking it against the schema file.
- (Optional) Extract evidences from the data:
python3 src/main/python/scripts/evidence_extractor.py DATA.json DATA-training.json
This creates DATA-training.json after extracting evidences from each package in DATA.json. Run the script with --help for more options that you can use to filter the sequences selected for training.
- Go to the model folder and start training with a model configuration:
cd src/main/python/salento/models/low_level_evidences
python3 train.py /path/to/DATA-training.json --config config.json
Run python3 train.py --help to see a description of the model configuration options, and edit config.json as needed.
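The training steps above can also be driven from a single Python script. The sketch below is an illustration, not part of Salento: the script paths, arguments and the --config flag are the ones given in this README, while the helper names (load_and_validate, extract_cmd, train_cmd, run_training) are hypothetical. Schema validation assumes that the schema file is a standard JSON Schema document and that the third-party jsonschema package is installed; with schema_path=None the helper only checks that the data file parses as JSON.

```python
import json
import os
import subprocess

# Script locations relative to the repository root, as given in this README.
EXTRACTOR = os.path.join("src", "main", "python", "scripts", "evidence_extractor.py")
MODEL_DIR = os.path.join("src", "main", "python", "salento", "models", "low_level_evidences")


def load_and_validate(data_path, schema_path=None):
    """Parse the data file and, if a schema path is given, validate against it."""
    with open(data_path) as f:
        data = json.load(f)  # raises ValueError (JSONDecodeError) on malformed JSON
    if schema_path is not None:
        import jsonschema  # third-party; assumed installed (pip install jsonschema)
        with open(schema_path) as f:
            # Raises jsonschema.ValidationError if the data does not match the schema.
            jsonschema.validate(instance=data, schema=json.load(f))
    return data


def extract_cmd(data_json, training_json, extra_args=()):
    """Command line for the optional evidence-extraction step."""
    return ["python3", EXTRACTOR, data_json, training_json] + list(extra_args)


def train_cmd(training_json, config="config.json"):
    """Command line for training; meant to be run from inside MODEL_DIR."""
    return ["python3", "train.py", training_json, "--config", config]


def run_training(repo_root, data_json, training_json, schema_path=None):
    """Validate, extract evidences, then train; stops at the first failing step."""
    repo_root = os.path.abspath(repo_root)
    # Relative data paths are resolved against the repository root.
    data_json = os.path.join(repo_root, data_json)
    training_json = os.path.join(repo_root, training_json)
    load_and_validate(data_json, schema_path)
    subprocess.run(extract_cmd(data_json, training_json), cwd=repo_root, check=True)
    # Equivalent to "cd" into the model folder before running train.py.
    subprocess.run(train_cmd(training_json), cwd=os.path.join(repo_root, MODEL_DIR), check=True)
```

For example, calling run_training(".", "DATA.json", "DATA-training.json") from the repository root would extract evidences and then train with the config.json in the model folder. Passing cwd= to subprocess.run mirrors the cd in the commands above without changing the working directory of the calling process, and check=True turns a failing step into a CalledProcessError instead of letting training run on missing output.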
To test a trained model on some test data:
- Follow steps 1-3 above on the test data to produce a file DATA-testing.json with evidences.
- Go to the aggregators folder and run one of the aggregators on the test data:
cd src/main/python/salento/aggregators
python3 sequence_aggregator.py --data_file /path/to/DATA-testing.json --model_dir /path/to/model/directory
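The testing command can be scripted the same way. This is a hypothetical helper, not part of Salento: only sequence_aggregator.py and its --data_file and --model_dir flags come from this README, and using a different aggregator through the aggregator parameter assumes that it accepts the same flags. Paths are made absolute before the working directory changes to the aggregators folder, so relative paths given by the caller still point at the right files.

```python
import os
import subprocess

# Aggregators folder relative to the repository root, as given in this README.
AGGREGATORS_DIR = os.path.join("src", "main", "python", "salento", "aggregators")


def aggregate_cmd(testing_json, model_dir, aggregator="sequence_aggregator.py"):
    """Command line for running an aggregator on evidence-extracted test data."""
    return ["python3", aggregator,
            "--data_file", os.path.abspath(testing_json),
            "--model_dir", os.path.abspath(model_dir)]


def run_aggregator(repo_root, testing_json, model_dir, aggregator="sequence_aggregator.py"):
    """Run the aggregator from its own folder and return whatever it prints to stdout."""
    result = subprocess.run(
        aggregate_cmd(testing_json, model_dir, aggregator),
        cwd=os.path.join(repo_root, AGGREGATORS_DIR),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; also works on Python 3.5
        check=True,
    )
    return result.stdout
```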
The model directory should contain the trained model's files, such as