Everglow123/MAKG

Copyright 2023 by Southeast University & Nanjing University of Posts and Telecommunications.

Time: 3/9/2023

Authors: Heng Zhou & Weizhuo Li & Buye Zhang

Mail: zhouheng2020@seu.edu.cn & liweizhuo@amss.ac.cn & zhangbuye@seu.edu.cn

Description: MAKG is a mobile application knowledge graph: a high-quality knowledge graph covering millions of applications that provides an open data resource for researchers in the Semantic Web and cybersecurity communities. You can use its original resources directly or visit the website to use its services.

1. Introduction:

In this work, we present a mobile application knowledge graph, namely MAKG, which merges comprehensive resources (e.g., application markets, encyclopedias, and news) to construct a high-quality knowledge graph about millions of applications.

We present a comprehensive framework for constructing a mobile application knowledge graph for cybersecurity, in which a lightweight ontology of apps is defined and concrete steps (App Crawling, Knowledge Extraction, Knowledge Alignment) are instantiated with promising algorithms. The framework obtains more structured triples and more correspondences among entities from different resources. In addition, we list three use-cases of MAKG that help provide better services for security analysts and users.

2. Usage:

MAKG consists of five resources: the Ontology (knowledge schema), AppMarket-Triples, Encyclopedias, AppMarket-Alignments, and Extraction-Triples.

A technical report describing these resources and their evaluations can be downloaded from the same location.

Ontology:

We design a lightweight ontology of apps. It gives the collected apps a well-defined schema, so that they can be linked to one another more richly. It contains 26 basic classes, 11 relations, and 45 properties.

We provide two files, appOntology.owl and appSchema.xlsx, for researchers to use. Opening appOntology.owl requires an ontology editor such as Protégé.
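If you want to inspect the ontology programmatically instead of in Protégé, the following is a minimal sketch using only Python's standard library. It assumes appOntology.owl is serialized as RDF/XML (a common Protégé export format) and that the "relations" and "properties" mentioned above correspond to owl:ObjectProperty and owl:DatatypeProperty declarations; both are assumptions, not documented facts about the file.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and OWL.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Assumption: classes, relations and properties map to these OWL tags.
ELEMENT_TAGS = ("Class", "ObjectProperty", "DatatypeProperty")


def count_ontology_elements(xml_data):
    """Count top-level OWL declarations in an RDF/XML document.

    xml_data: str or bytes holding the ontology in RDF/XML syntax.
    Only direct children of <rdf:RDF> are counted, so entities declared
    in other ways (e.g. rdf:Description plus rdf:type) are missed.
    """
    root = ET.fromstring(xml_data)
    return {tag: len(root.findall(f"owl:{tag}", NS)) for tag in ELEMENT_TAGS}


# Usage (assumes appOntology.owl is RDF/XML):
# with open("appOntology.owl", "rb") as f:
#     print(count_ontology_elements(f.read()))
```

Reading the file as bytes lets the XML parser honor whatever encoding the file declares.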

AppMarket-Triples:

This dataset contains raw triples crawled from Huawei AppGallery, Xiaomi App Store, Google Play, and App Store.

All triple files from the application markets are provided in N-Triples (.nt) format.
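N-Triples is a line-based format (one `subject predicate object .` statement per line), so the .nt files can be streamed without loading them fully into memory. The sketch below is a deliberately lenient, regex-based reader written for illustration: it does not unescape literals or validate IRIs, and the file name in the usage comment is hypothetical. For full RDF processing, a library such as rdflib (`Graph().parse(path, format="nt")`) is a more robust choice.

```python
import re

# One N-Triples statement: subject, predicate, object, then a final ".".
# Subjects/objects may be IRIs (<...>) or blank nodes (_:label); objects may
# also be literals with an optional language tag or datatype IRI.
_IRI = r"<[^>]*>"
_BNODE = r"_:\S+"
_LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
TRIPLE_RE = re.compile(
    rf"^\s*({_IRI}|{_BNODE})\s+({_IRI})\s+({_IRI}|{_BNODE}|{_LITERAL})\s*\.\s*$"
)


def parse_nt_line(line):
    """Return (subject, predicate, object) as raw terms, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        raise ValueError(f"Not a valid N-Triples statement: {line!r}")
    return match.groups()


def iter_triples(lines):
    """Lazily yield triples from any iterable of lines (e.g. an open file)."""
    for line in lines:
        triple = parse_nt_line(line)
        if triple is not None:
            yield triple


# Usage (hypothetical file name):
# with open("huawei_appgallery.nt", encoding="utf-8") as f:
#     for s, p, o in iter_triples(f):
#         ...
```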

Encyclopedias:

This dataset contains triples about apps crawled from Baidu Baike, Toutiao Baike, and Wikipedia.

Because only a few apps have Wikipedia entries, we provide only the triples extracted from Baidu Baike and Toutiao Baike.

AppMarket-Alignments:

This dataset contains alignments of apps, which allow the description information of an app to be shared and reused so that MAKG can provide better services for security analysts and users. We apply two kinds of entity alignment techniques (i.e., a rule-mining method and a knowledge graph embedding-based platform) and keep the best results.
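This README does not specify how the alignments are consumed. One minimal way to share and reuse descriptions, sketched here as an assumption rather than the authors' pipeline, is to treat each alignment as an equivalence link, group linked entities with a union-find (disjoint-set) structure so that alignments chain across all four markets, and pool the (predicate, object) pairs of every group. Entity identifiers such as `hw:WeChat` in the test data are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to group entities linked by alignments."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point nodes at their grandparent to flatten the tree.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_aligned_apps(alignments, triples):
    """Map every entity to the pooled (predicate, object) set of its aligned group.

    alignments: iterable of (entity_a, entity_b) pairs judged to be the same app.
    triples: iterable of (subject, predicate, object) tuples.
    """
    uf = UnionFind()
    for a, b in alignments:
        uf.union(a, b)
    pooled = defaultdict(set)
    for s, p, o in triples:
        pooled[uf.find(s)].add((p, o))
    # Entities seen in alignments or as subjects; copy keys before calling find().
    entities = list(uf.parent)
    return {e: frozenset(pooled.get(uf.find(e), ())) for e in entities}
```

Grouping first makes the pooling transitive: if an app in market A is aligned with market B, and B with C, all three share one description set.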

We provide all the manually labeled alignments among the four mainstream application markets for evaluation.

In addition, we provide the correspondences automatically generated by RuleMiner and by KG embedding methods (i.e., MultiKE, RDGCN, and NMN).
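To compare the automatically generated correspondences with the manually labeled ones, the usual measures are precision, recall, and F1. A minimal sketch follows; it assumes both sets have already been loaded as pairs of entity identifiers and treats each pair as unordered, which may not match how the released files are laid out.

```python
def _as_unordered(pairs):
    """Normalize alignment pairs so (a, b) and (b, a) compare equal."""
    return {tuple(sorted(pair)) for pair in pairs}


def evaluate_alignments(predicted, gold):
    """Return (precision, recall, f1) of predicted pairs against gold pairs."""
    pred, ref = _as_unordered(predicted), _as_unordered(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```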

Extraction-Triples:

This dataset contains triples extracted from the textual descriptions of apps crawled from application markets. We apply three strategies (i.e., an infobox-based method, named entity recognition, and relation extraction platforms including OpenNRE, DeepKE, and FewRel) and select the best models to extract basic triples.
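As an illustration of the infobox-based strategy, the sketch below turns `label: value` lines into triples through a label-to-property mapping. The mapping, the property names, and the input layout are hypothetical (the actual schema is defined in appOntology.owl and appSchema.xlsx), and this is not the extraction code used to build MAKG. Full-width colons (：) are accepted because Chinese-language pages commonly use them.

```python
import re

# Hypothetical mapping from infobox labels to ontology property names.
LABEL_TO_PROPERTY = {
    "developer": "hasDeveloper",
    "version": "version",
    "size": "size",
    "category": "belongsToCategory",
}

# "label: value" with either an ASCII or a full-width colon.
FIELD_RE = re.compile(r"^\s*([^:：]+?)\s*[:：]\s*(.+?)\s*$")


def extract_infobox_triples(app, infobox_text):
    """Return (app, property, value) triples for recognized infobox labels."""
    triples = []
    for line in infobox_text.splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            continue  # not a "label: value" line
        label, value = match.group(1).lower(), match.group(2)
        prop = LABEL_TO_PROPERTY.get(label)
        if prop is not None:  # unknown labels are ignored
            triples.append((app, prop, value))
    return triples
```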

We also provide the labeled corpus used to train the named entity recognition and relation extraction methods.

3. Use-Cases:

We list the main cybersecurity use-cases of MAKG on our website.

  • MAKG provides semantic retrieval for users and security analysts. For example, when a user queries an app, MAKG can present more comprehensive information about it than an application market does.

  • MAKG links apps to the textual descriptions in which they appear (e.g., news) using entity linking techniques. With these capabilities, users can fully understand the information about apps and avoid downloading invalid apps.

  • MAKG helps security analysts detect sensitive apps, which are more likely than normal apps to become hotbeds for cybercriminals. With comprehensive relations and properties of apps, analysts can induce more prior rules and employ promising algorithms to evaluate the sensitivity of apps. This can reduce the risk posed by sensitive apps in advance and delay their publication in the application markets.

  • MAKG can recommend similar apps to users and security analysts via our hybrid method when they request related services, which further reduces potential risks and helps maintain the security of the mobile internet.
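The hybrid recommendation method is not described in this README. As a much simpler stand-in, explicitly not the authors' method, similar apps can be ranked by the Jaccard similarity of their (predicate, object) sets drawn from MAKG triples, where J(A, B) = |A ∩ B| / |A ∪ B|. The app names and features in the test data are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_similar(target, app_features, k=3):
    """Rank other apps by Jaccard similarity of their (predicate, object) sets.

    app_features: dict mapping app id -> set of (predicate, object) pairs.
    Returns up to k (app, score) pairs, highest score first, ties by app id.
    """
    base = app_features[target]
    scored = [
        (jaccard(base, feats), app)
        for app, feats in app_features.items()
        if app != target
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(app, score) for score, app in scored[:k]]
```

Set overlap ignores how informative each feature is; embedding-based methods, such as those the use-case alludes to, can capture similarity that exact matching misses.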

4. Citation:

If you use this dataset, please cite our paper as follows:

Normal:

Heng Zhou, Weizhuo Li, Buye Zhang, Qiu Ji, Yiming Tan, and Chongning Na. MAKG: 
A Mobile Application Knowledge Graph for the Research of Cybersecurity. In: Proceedings of China Conference on Knowledge Graph and Semantic Computing, 
Guangzhou, China, Springer, 2021, pp. 321–328.

BibTeX:

@inproceedings{MAKG2021,
author = {Heng Zhou and Weizhuo Li and Buye Zhang and Qiu Ji and Yiming Tan and Chongning Na},
title = {{MAKG}: A Mobile Application Knowledge Graph for the Research of Cybersecurity},
booktitle = {Proceedings of China Conference on Knowledge Graph and Semantic Computing},
address = {Guangzhou, China},
pages = {321--328},
year = {2021},
publisher = {Springer}
}

About

Mobile app knowledge graph
