Lib2Desc-Dataset

This repository publishes the datasets proposed in the paper "Lib2Desc: Automatic generation of security-centric Android app descriptions using third-party libraries".

DATASETS

TPL-Dataset:

This is a structured dataset including information about 1900 Android third-party libraries.

TPL_category_textual-description.csv

It includes the name, category, and general textual description of each third-party library (TPL).

TPL_textual_features.csv

It includes the name and the textual feature vector extracted from the general textual description of each TPL.

TPL_API_features.csv

It includes the name and the API feature vector obtained from static analysis of each TPL.

TPL_name_url.csv

It includes the name of each TPL and the URL of the web resource that its description comes from.
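
Since every TPL file carries the library name, the four files can be combined into one record per library. The following is a minimal sketch of that idea, not part of the original repository. It assumes that each file has a header row and that the name sits in the first column; the README does not confirm either, so `name_column` and `has_header` are adjustable parameters, and the helper names `read_rows_by_name` and `join_tpl_tables` are hypothetical.

```python
import csv
from pathlib import Path


def read_rows_by_name(path, name_column=0, has_header=True):
    """Map each library name to the remaining cells of its row.

    Assumption: the name is in `name_column` and the file may start
    with a header row. Adjust both to match the actual files.
    """
    rows = {}
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the assumed header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            name = row[name_column]
            rows[name] = row[:name_column] + row[name_column + 1:]
    return rows


def join_tpl_tables(tpl_dir):
    """Combine the four TPL files into one record per library name."""
    tpl_dir = Path(tpl_dir)
    files = {
        "description": "TPL_category_textual-description.csv",
        "textual_features": "TPL_textual_features.csv",
        "api_features": "TPL_API_features.csv",
        "url": "TPL_name_url.csv",
    }
    tables = {key: read_rows_by_name(tpl_dir / fname) for key, fname in files.items()}
    # Keep only libraries that appear in every file.
    shared = set.intersection(*(set(table) for table in tables.values()))
    return {
        name: {key: tables[key][name] for key in tables}
        for name in sorted(shared)
    }
```

Calling `join_tpl_tables("TPL-Dataset")` from the repository root would then return a dictionary keyed by library name. All cells are returned as strings, so feature vectors still need to be converted to numbers before use.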

AAPI-Dataset:

This dataset consists of two parts: the code of Android API calls and the Javadoc comments of those APIs, extracted from the original Android API documentation. The dataset is split into train, validation, and test sets.

code.csv

It includes the Android API calls, preprocessed as described in the paper.

javadoc.csv

It includes the matching API descriptions for the API calls. This textual information is also preprocessed as described in the paper.
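
Because javadoc.csv holds the matching description for each entry in code.csv, the two files can be read side by side to form (code, description) pairs. The sketch below is an illustration under assumptions the README does not state: that the two files are aligned row by row, that each example sits in the first column, and that there is no header row. The helper names `read_column` and `load_pairs` are hypothetical.

```python
import csv


def read_column(path, column=0, has_header=False):
    """Return one cell per CSV row, taken from the given column."""
    values = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            if row:  # skip completely empty lines
                values.append(row[column])
    return values


def load_pairs(code_path, javadoc_path, column=0, has_header=False):
    """Pair each API call with its description, assuming row alignment."""
    codes = read_column(code_path, column, has_header)
    docs = read_column(javadoc_path, column, has_header)
    if len(codes) != len(docs):
        # A length mismatch means the row-alignment assumption does not hold.
        raise ValueError(
            f"row count mismatch: {len(codes)} code rows vs {len(docs)} javadoc rows"
        )
    return list(zip(codes, docs))
```

For example, `load_pairs("AAPI-Dataset/train/code.csv", "AAPI-Dataset/train/javadoc.csv")` would return the training pairs. The explicit length check is there so that a wrong alignment assumption fails loudly instead of silently producing mismatched pairs.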

The repository hierarchy:

├── AAPI-Dataset
│   ├── dev
│   │   ├── code.csv
│   │   └── javadoc.csv
│   ├── test
│   │   ├── code.csv
│   │   └── javadoc.csv
│   └── train
│       ├── code.csv
│       └── javadoc.csv
├── LICENSE
├── README.md
└── TPL-Dataset
    ├── TPL_API_features.csv
    ├── TPL_category_textual-description.csv
    ├── TPL_name_url.csv
    └── TPL_textual_features.csv
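
As a quick sanity check after cloning, the hierarchy above can be compared against the files on disk. This is a minimal sketch, not something shipped with the repository; `EXPECTED_FILES` simply transcribes the data files from the tree, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Data files listed in the repository hierarchy, relative to the repo root.
EXPECTED_FILES = [
    "AAPI-Dataset/dev/code.csv",
    "AAPI-Dataset/dev/javadoc.csv",
    "AAPI-Dataset/test/code.csv",
    "AAPI-Dataset/test/javadoc.csv",
    "AAPI-Dataset/train/code.csv",
    "AAPI-Dataset/train/javadoc.csv",
    "TPL-Dataset/TPL_API_features.csv",
    "TPL-Dataset/TPL_category_textual-description.csv",
    "TPL-Dataset/TPL_name_url.csv",
    "TPL-Dataset/TPL_textual_features.csv",
]


def missing_files(repo_root):
    """Return the expected data files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list from `missing_files(".")` indicates that all ten data files are present in the current checkout.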

Usage

The datasets can be used to develop machine learning, deep learning, NLP, and NLG models. They were constructed and preprocessed specifically for classification, NLG, and transformer-based seq2seq models.

Related Sections

You can find more information about dataset collection, preprocessing, and models in the following sections of the paper.

Section 3 Data Collection
Section 3.1 Collecting Third-Party Libraries
Section 4.3.3 Datasets

Contact

Author: Beyza Cevik

Contact: beyzaccevik@gmail.com
