Common datasets used in our research
Data Set | Users | Items | Ratings | Scale | Density | Users (User Context) | Links (User Context) | Link Type |
---|---|---|---|---|---|---|---|---|
Ciao [1] | 7,375 | 105,114 | 284,086 | [1, 5] | 0.0365% | 7,375 | 111,781 | Trust |
Epinions [2] | 40,163 | 139,738 | 664,824 | [1, 5] | 0.0118% | 49,289 | 487,183 | Trust |
Douban [3] | 2,848 | 39,586 | 894,887 | [1, 5] | 0.794% | 2,848 | 35,770 | Trust |
LastFM [8] | 1,892 | 17,632 | 92,834 | implicit | 0.27% | 1,892 | 25,434 | Trust |
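
The Density column appears to be the share of user-item pairs that carry a rating, i.e. ratings / (users × items). The sketch below is a minimal Python illustration under that assumption; the function name `rating_density` is hypothetical and not part of any dataset's tooling. With the counts listed above it agrees with the Douban and Epinions entries to within rounding.

```python
def rating_density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Return the filled share of the user-item matrix, as a percentage.

    Assumption: density = ratings / (users * items).
    """
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return 100.0 * n_ratings / (n_users * n_items)


# Counts copied from the table above.
douban = rating_density(2_848, 39_586, 894_887)
epinions = rating_density(40_163, 139_738, 664_824)
print(f"Douban: {douban:.3f}%  Epinions: {epinions:.4f}%")
```

A low density such as the one for Epinions means the rating matrix is very sparse, which is worth keeping in mind when choosing a train/test split.
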
Data Set | Users | Tracks | Artists | Albums | Record | Tag | User Profile | Artist Profile |
---|---|---|---|---|---|---|---|---|
NowPlaying [9] | 1,744 | 16,864 | 2,108 | N/A | 1,117,335 | N/A | N/A | N/A |
Xiami [10] | 4,271 | 290,312 | 33,316 | 95,003 | 1,301,486 | Yes | N/A | N/A |
Yahoo Music [source] | 1,800,000 | 136,000 | many | many | 717,000,000 | Yes | N/A | N/A |
30 Music [source][11] | 45,167 | 5,023,108 | 595,049 | 217,337 | many | Yes | Yes | N/A |
Data Set | Users | Papers | Feedback | Tag | Content |
---|---|---|---|---|---|
CiteULike [12] | 7,947 | 25,975 | 134,860 | 52,946 | full abstract |
Data Set | Users | Locations | Feedback | Relation | Time |
---|---|---|---|---|---|
Gowalla | 18,737 | 32,510 | 1,278,274 | Yes | Yes |
Data Set | Users | Items | Category | Behavior Type | Time |
---|---|---|---|---|---|
Taobao (extraction code: xv8o) [24, 25] | 987,994 | 4,162,024 | 9,439 | 5 | Yes |
Data Set | Non-spammer | Spammer | Introduction |
---|---|---|---|
Twitter [4] | 1,295 | 355 | The first column is the user class (i.e., 1 for non-spammers and 2 for spammers) and the subsequent columns numbered from 1 to 62 represent the user characteristics. |
YouTube [5] | 641 | 31 (promoters), 157 (spammers) | The first column is the user class (i.e., 1 for promoters, 2 for spammers, and 3 for legitimate users) and the subsequent columns numbered from 1 to 60 represent the user characteristics. |
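
As described above, each row of these two files carries the user class in the first column and the user characteristics in the remaining columns. The following Python sketch separates the two; it assumes the values are plain numbers separated by whitespace or commas, which should be checked against the downloaded files, and `split_labels_features` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def split_labels_features(rows: Iterable[str]) -> Tuple[List[int], List[List[float]]]:
    """Split each row into a class label (first column) and a feature vector.

    Assumption: fields are numeric and separated by whitespace or commas.
    """
    labels: List[int] = []
    features: List[List[float]] = []
    for row in rows:
        fields = row.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        labels.append(int(float(fields[0])))
        features.append([float(value) for value in fields[1:]])
    return labels, features


# Two made-up rows with three features each (the real files have 62 or 60).
sample = ["1 0.5 3 0", "2 0.1 7 1"]
labels, features = split_labels_features(sample)
print(labels, features)
```
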
Data Set | Non-spammer | Spammer | Introduction |
---|---|---|---|
Amazon [6] | 3,118 | 1,937 | Columns in profiles.txt follow this order: userid itemid rating. In labels.txt, 1 denotes a spammer and 0 a non-spammer. |
Yelp [7] | 52,815 | 80,466 | Columns in yelp.txt follow this order: user_id prod_id rating label date. Labels: -1 for spammers, 1 for non-spammers. We recommend filtering out users who have fewer than 5 ratings. More information can be found in Google Drive. |
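
The column order and the filtering recommendation for yelp.txt can be sketched in Python as follows. This is an illustration only: it assumes whitespace-separated fields and a date without embedded spaces, and the names `Review`, `parse_yelp`, and `filter_active_users` are hypothetical, not part of the dataset.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Review(NamedTuple):
    user_id: str
    prod_id: str
    rating: float
    label: int  # -1: spammer, 1: non-spammer
    date: str


def parse_yelp(lines: Iterable[str]) -> List[Review]:
    """Parse rows in the order user_id prod_id rating label date.

    Assumption: fields are separated by whitespace.
    """
    reviews = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed rows
        user_id, prod_id, rating, label, date = parts[:5]
        reviews.append(Review(user_id, prod_id, float(rating), int(label), date))
    return reviews


def filter_active_users(reviews: List[Review], min_ratings: int = 5) -> List[Review]:
    """Keep only reviews written by users with at least min_ratings ratings."""
    counts = Counter(review.user_id for review in reviews)
    return [review for review in reviews if counts[review.user_id] >= min_ratings]


# Made-up rows: u1 has five ratings, u2 has two.
sample_lines = [f"u1 p{i} 4.0 1 2014-01-0{i}" for i in range(1, 6)]
sample_lines += ["u2 p1 1.0 -1 2014-02-01", "u2 p2 2.0 -1 2014-02-02"]
reviews = parse_yelp(sample_lines)
kept = filter_active_users(reviews)
print(len(reviews), len(kept))
```

Counting ratings per user before filtering keeps the pass over the data linear and leaves the original review order intact.
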
Data Set | Year | Annotation Method | # Data | # Cyberbullying | Cyberbullying Ratio |
---|---|---|---|---|---|
Formspring [13] | 2010 | Crowdsourcing | 3,915 | 369 | 9.43% |
MySpace [14] | 2011 | Expert Labeling | 2,088 | 434 | 20.79% |
Ask.fm [15] | 2014 | | | | |
Instagram [16] | 2014 | Crowdsourcing | 1,954 | 567 | 29% |
Vine [17] | 2015 | Crowdsourcing | 971 | 304 | 31.34% |
BullyingV3.0 [18] | 2015 | Label Algorithm | 7,321 | 2,102 | 28.71% |
WOW [19] | 2016 | Expert Labeling | 16,975 | 137 | 0.81% |
LOL [19] | 2016 | Expert Labeling | 17,354 | 207 | 1.19% |
Twitter [20] | 2017 | Crowdsourcing | 1,303 | 58 | 4.45% |
Wikipedia [21] | 2017 | Crowdsourcing | 37,611 | 338 | 0.9% |
Harassment-Corpus [22] | 2018 | Expert Labeling | 24,189 | 3,119 | 12.89% |
Hate and Abusive Speech [23] | 2018 | Crowdsourcing | 99,799 | 46,009 | 46.1% |
[1]. Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: International Conference on Web Search and Web Data Mining, WSDM 2012, Seattle, WA, USA, February. pp. 93–102 (2012)
[2]. Massa, P., Avesani, P.: Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on Recommender systems. pp. 17–24. ACM (2007)
[3]. G. Zhao, X. Qian, and X. Xie, “User-service rating prediction by exploring social users’ rating behaviors,” IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 496–506, 2016.
[4]. Benevenuto, F., Magno, G., Rodrigues, T., & Almeida, V.: Detecting spammers on twitter. In: Collaboration, electronic messaging, anti-abuse and spam conference (CEAS). Vol. 6, No. 2010, p. 12. 2010.
[5]. Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., & Gonçalves, M.: Detecting spammers and content promoters in online video social networks. In: Proceedings of the 32nd ACM SIGIR conference on Research and development in information retrieval. pp. 620-627. ACM (2009)
[6]. Xu, Chang, et al. "Uncovering collusive spammers in Chinese review websites." ACM International Conference on Information & Knowledge Management. ACM, 2013: 979-988.
[7]. Rayana, Shebuti, and L. Akoglu. "Collective Opinion Spam Detection: Bridging Review Networks and Metadata." ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015: 985-994.
[8]. Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2011. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM conference on Recommender systems (RecSys 2011). ACM, New York, NY, USA.
[9]. Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26
[10]. Wang, Dongjing, et al. "Learning music embedding with metadata for context aware recommendation." Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016.
[11]. Turrin R, Quadrana M, Condorelli A, et al. 30Music Listening and Playlists Dataset[C]//RecSys Posters. 2015.
[12]. Hao Wang*, Wu-Jun Li, Relational collaborative topic regression for recommender systems. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(5): 1343-1355, 2015.
[13]. Reynolds K, Kontostathis A, Edwards L. Using machine learning to detect cyberbullying. Machine learning and applications and workshops (ICMLA), 2011 10th International Conference on. IEEE, 2011, 2: 241-244.
[14]. Bayzick J, Kontostathis A, Edwards L. Detecting the presence of cyberbullying using computer software. In 3rd Annual ACM Web Science Conference (WebSci ‘11). 2011: 1-2.
[15]. Hosseinmardi H, Ghasemianlangroodi A, Han R, et al. Towards understanding cyberbullying behavior in a semi-anonymous social network. Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on. IEEE, 2014: 244-252.
[16]. Hosseinmardi H, Mattson S A, Rafiq R I, et al. Analyzing labeled cyberbullying incidents on the Instagram social network. International Conference on Social Informatics. Springer, Cham, 2015: 49-66.
[17]. Rafiq R I, Hosseinmardi H, Han R, et al. Careful what you share in six seconds: Detecting cyberbullying instances in Vine. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ACM, 2015: 617-622.
[18]. Sui J. Understanding and fighting bullying with machine learning[D]. The University of Wisconsin-Madison, 2015.
[19]. Bretschneider U, Peters R. Detecting Cyberbullying in Online Communities. ECIS. 2016: ResearchPaper61.
[20]. Chatzakou D, Kourtellis N, Blackburn J, et al. Mean birds: Detecting aggression and bullying on twitter. Proceedings of the 2017 ACM on web science conference. ACM, 2017: 13-22.
[21]. Wulczyn E, Thain N, Dixon L. Ex machina: Personal attacks seen at scale. Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017: 1391-1399.
[22]. Rezvan M, Shekarpour S, Balasuriya L, et al. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. Proceedings of the 10th ACM Conference on Web Science. ACM, 2018: 33-36.
[23]. Founta A-M, Djouvas C, Chatzakou D, et al. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. Proceedings of the 11th International Conference on Web and Social Media, ICWSM, 2018.
[24]. Han Z, Xiang L, Pengye Z, et al. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[25]. Han Z, Daqing C, Ziru X, et al. Joint Optimization of Tree-based Index and Deep Model for Recommender Systems. arXiv:1902.07565.