DatasetCollection

Common datasets used in our research

Recommender systems

Social Recommendation

Data Set	Basic Meta					User Context
Data Set	Users	Items	Ratings (Scale)		Density	Users	Links (Type)
Ciao [1]	7,375	105,114	284,086	[1, 5]	0.0365%	7,375	111,781	Trust
Epinions [2]	40,163	139,738	664,824	[1, 5]	0.0118%	49,289	487,183	Trust
Douban [3]	2,848	39,586	894,887	[1, 5]	0.794%	2,848	35,770	Trust
LastFM [8]	1,892	17,632	92,834	implicit	0.27%	1,892	25,434	Trust
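
The Density column above is consistent with the number of ratings divided by users × items, expressed as a percentage. A minimal sketch under that assumption (the function name `density_percent` is ours and is not part of any dataset tooling):

```python
def density_percent(num_users: int, num_items: int, num_ratings: int) -> float:
    """Share of the user-item matrix that holds an observed rating, in percent."""
    return 100.0 * num_ratings / (num_users * num_items)


# Figures copied from the Douban row of the table above.
douban = density_percent(2_848, 39_586, 894_887)
print(f"{douban:.3f}%")
```

The same function can be applied to any row to check a reported density before using it in a paper.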

Music Recommendation

Data Set	Basic Meta					Context
Data Set	Users	Tracks	Artists	Albums	Record	Tag	User Profile	Artist Profile
NowPlaying [9]	1,744	16,864	2,108	N/A	1,117,335	N/A	N/A	N/A
Xiami [10]	4,271	290,312	33,316	95,003	1,301,486	Yes	N/A	N/A
Yahoo Music [source]	1,800,000	136,000	many	many	717,000,000	Yes	N/A	N/A
30 Music [source] [11]	45,167	5,023,108	595,049	217,337	many	Yes	Yes	N/A

Paper Recommendation

Data Set	Basic Meta			Context
Data Set	Users	Papers	Feedback	Tag	Content
CiteULike [12]	7,947	25,975	134,860	52,946	full abstract

Location Recommendation

Data Set	Basic Meta			Context
Data Set	Users	Locations	Feedback	Relation	Time
Gowalla	18,737	32,510	1,278,274	Yes	Yes

Product Recommendation

Data Set	Basic Meta			Context
Data Set	Users	Items	Category	Behavior Type	Time
Taobao (extraction code: xv8o) [24, 25]	987,994	4,162,024	9,439	5	Yes

Spammer detection

Social Network

Data Set	Non-spammer	Spammer	Introduction
Twitter [4]	1,295	355	The first column is the user class (i.e., 1 for non-spammers and 2 for spammers) and the subsequent columns numbered from 1 to 62 represent the user characteristics.
YouTube [5]	641	31 (promoter), 157 (spammer)	The first column is the user class (i.e., 1 for promoters, 2 for spammers, and 3 for legitimate users) and the subsequent columns numbered from 1 to 60 represent the user characteristics.
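
Both files put the class label in the first column and the user characteristics in the remaining ones. A minimal sketch of reading one such row, assuming the values are plain numbers separated by whitespace (the delimiter is not stated here, so check the downloaded files first; `parse_row` is a hypothetical helper):

```python
from typing import List, Tuple


def parse_row(line: str) -> Tuple[int, List[float]]:
    """Split one row into (class label, feature vector).

    Assumes whitespace-separated numeric fields, label first.
    """
    fields = line.split()
    return int(fields[0]), [float(x) for x in fields[1:]]


# Hypothetical row with three features instead of the real 60 or 62.
label, features = parse_row("2 0.5 13 0.0")
print(label, features)
```

Keep in mind that the label encoding differs between the two files: 2 means spammer in both, but 1 means non-spammer for Twitter and promoter for YouTube.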

Shilling Detection

Data Set	Non-spammer	Spammer	Introduction
Amazon [6]	3,118	1,937	Columns in profiles.txt follow this order: userid itemid rating. In labels.txt, 1 marks a spammer and 0 a non-spammer.
Yelp [7]	52,815	80,466	Columns in yelp.txt follow this order: user_id prod_id rating label date. The label is -1 for a spammer and 1 for a non-spammer. We recommend filtering out users who have fewer than 5 ratings. More information can be found in Google Drive*
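
The recommended filtering step for Yelp (dropping users with fewer than 5 ratings) can be sketched as follows. This assumes whitespace-separated fields in the column order listed above, with `user_id` first; the separator, the date format in the toy rows, and the helper name `filter_sparse_users` are assumptions, not part of the dataset documentation:

```python
from collections import Counter
from typing import Iterable, List

MIN_RATINGS = 5  # threshold recommended for the Yelp data


def filter_sparse_users(lines: Iterable[str],
                        min_ratings: int = MIN_RATINGS) -> List[List[str]]:
    """Keep only rows whose user (first field) has at least min_ratings rows."""
    rows = [line.split() for line in lines if line.strip()]
    counts = Counter(row[0] for row in rows)
    return [row for row in rows if counts[row[0]] >= min_ratings]


# Toy rows in the order: user_id prod_id rating label date
sample = [f"u1 p{i} 5.0 1 2014-01-0{i}" for i in range(1, 6)]
sample.append("u2 p1 1.0 -1 2014-02-01")

kept = filter_sparse_users(sample)
print(len(kept))
```

In this toy input, user `u1` has five ratings and is kept, while `u2` has only one and is dropped.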

Cyberbullying Detection

Data Set	Year	Annotation method	# Data	# Cyberbullying	Cyberbullying Ratio
Formspring [13]	2010	Crowdsourcing	3,915	369	9.43%
MySpace [14]	2011	Expert Labeling	2,088	434	20.79%
Ask.fm [15]	2014
Instagram [16]	2014	Crowdsourcing	1,954	567	29%
Vine [17]	2015	Crowdsourcing	971	304	31.34%
BullyingV3.0 [18]	2015	Label Algorithm	7,321	2,102	28.71%
WOW [19]	2016	Expert Labeling	16,975	137	0.81%
LOL [19]	2016	Expert Labeling	17,354	207	1.19%
Twitter [20]	2017	Crowdsourcing	1,303	58	4.45%
Wikipedia [21]	2017	Crowdsourcing	37,611	338	0.9%
Harassment-Corpus [22]	2018	Expert Labeling	24,189	3,119	12.89%
Hate and Abusive Speech [23]	2018	Crowdsourcing	99,799	46,009	46.1%

References

[1]. Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: International Conference on Web Search and Web Data Mining, WSDM 2012, Seattle, WA, USA, February. pp. 93–102 (2012)

[2]. Massa, P., Avesani, P.: Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on Recommender systems. pp. 17–24. ACM (2007)

[3]. G. Zhao, X. Qian, and X. Xie, “User-service rating prediction by exploring social users’ rating behaviors,” IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 496–506, 2016.

[4]. Benevenuto, F., Magno, G., Rodrigues, T., & Almeida, V.: Detecting spammers on twitter. In: Collaboration, electronic messaging, anti-abuse and spam conference (CEAS). Vol. 6, No. 2010, p. 12. 2010.

[5]. Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., & Gonçalves, M.: Detecting spammers and content promoters in online video social networks. In: Proceedings of the 32nd ACM SIGIR conference on Research and development in information retrieval. pp. 620-627. ACM (2009)

[6]. Xu, Chang, et al. "Uncovering collusive spammers in Chinese review websites." ACM International Conference on Information & Knowledge Management. ACM, 2013: 979-988.

[7]. Rayana, Shebuti, and L. Akoglu. "Collective Opinion Spam Detection: Bridging Review Networks and Metadata." ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015: 985-994.

[8]. Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2011. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM conference on Recommender systems (RecSys 2011). ACM, New York, NY, USA

[9]. Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26

[10]. Wang, Dongjing, et al. "Learning music embedding with metadata for context aware recommendation." Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016.

[11]. Turrin R, Quadrana M, Condorelli A, et al. 30Music Listening and Playlists Dataset[C]//RecSys Posters. 2015.

[12]. Hao Wang*, Wu-Jun Li, Relational collaborative topic regression for recommender systems. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(5): 1343-1355, 2015.

[13]. Reynolds K, Kontostathis A, Edwards L. Using machine learning to detect cyberbullying. Machine learning and applications and workshops (ICMLA), 2011 10th International Conference on. IEEE, 2011, 2: 241-244.

[14]. Bayzick J, Kontostathis A, Edwards L. Detecting the presence of cyberbullying using computer software. In 3rd Annual ACM Web Science Conference (WebSci '11). 2011: 1-2.

[15]. Hosseinmardi H, Ghasemianlangroodi A, Han R, et al. Towards understanding cyberbullying behavior in a semi-anonymous social network. Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on. IEEE, 2014: 244-252.

[16]. Hosseinmardi H, Mattson S A, Rafiq R I, et al. Analyzing labeled cyberbullying incidents on the Instagram social network. International Conference on Social Informatics. Springer, Cham, 2015: 49-66.

[17]. Rafiq R I, Hosseinmardi H, Han R, et al. Careful what you share in six seconds: Detecting cyberbullying instances in Vine. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ACM, 2015: 617-622.

[18]. Sui J. Understanding and fighting bullying with machine learning[D]. The University of Wisconsin-Madison, 2015.

[19]. Bretschneider U, Peters R. Detecting Cyberbullying in Online Communities. ECIS. 2016: ResearchPaper61.

[20]. Chatzakou D, Kourtellis N, Blackburn J, et al. Mean birds: Detecting aggression and bullying on twitter. Proceedings of the 2017 ACM on web science conference. ACM, 2017: 13-22.

[21]. Wulczyn E, Thain N, Dixon L. Ex machina: Personal attacks seen at scale. Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017: 1391-1399.

[22]. Rezvan M, Shekarpour S, Balasuriya L, et al. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. Proceedings of the 10th ACM Conference on Web Science. ACM, 2018: 33-36.

[23]. Founta A-M, Djouvas C, Chatzakou D, et al. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. Proceedings of the 11th International Conference on Web and Social Media, ICWSM, 2018.

[24]. Han Z, Xiang L, Pengye Z, et al. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

[25]. Han Z, Daqing C, Ziru X, et al. Joint Optimization of Tree-based Index and Deep Model for Recommender Systems. arXiv:1902.07565.