Read "Who Models the World? Collaborative Ontology Creation and User Roles in Wikidata" #1001

Closed
Daniel-Mietchen opened this issue Oct 18, 2018 · 11 comments

@Daniel-Mietchen

as per
https://twitter.com/aliossandro/status/1052842767381090304

@Daniel-Mietchen

Daniel-Mietchen commented Oct 18, 2018

"In this paper, we build upon these previous studies to understand the range of contributions users make to the Wikidata ontology and their impact on the ontology quality. We carry out a literature review to understand the different approaches and metrics that have been applied in the past to assess the quality of ontologies and define a framework suitable for Wikidata. We then cluster editing activities using k-means and features suggested in related work to identify characteristic user roles in monthly time frames. Finally, we explore the relationship between user roles and ontology quality in time by fitting several regression models.

Our paper fills a gap in collaborative knowledge engineering by analysing one of the most significant projects in this space. In the context of Wikidata, our contribution is threefold:
(i.) we propose a quality framework for the Wikidata ontology, demonstrating its suitability as a tool to monitor changes to its quality;
(ii.) we derive a set of user roles based on a broader range of activity patterns and contribute to contextualise Wikidata within the field of online collaboration;
(iii.) we shed light on the links between collaborative processes and the outcomes of its community, by investigating how user roles influence the ontology quality."
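
To make the clustering step in the quote more concrete for myself, here is a minimal sketch of k-means on per-user monthly activity. Everything in it is an assumption for illustration: the two features (ontology edits, discussion edits), the toy values, the deterministic initialisation, and the function name `kmeans` are mine, not the paper's feature set or code.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm for k >= 2 on a list of numeric tuples."""
    # Deterministic start (an assumption, to keep the toy example reproducible):
    # take k points spread evenly over the input sorted by total activity.
    ordered = sorted(points, key=sum)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids

    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels

# Hypothetical user-month rows: (ontology edits, discussion edits).
user_months = [(3, 0), (5, 1), (4, 0), (120, 30), (150, 45), (6, 2)]
centroids, labels = kmeans(user_months, k=2)
print(labels)
```

With k = 2 the toy data splits into a low-activity and a high-activity group, which is the kind of split the paper labels contributor and leader. Real features would probably need scaling before clustering.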

@Daniel-Mietchen

Daniel-Mietchen commented Oct 18, 2018

"To examine the evolution of user activity and of ontology quality over time, we extracted monthly slices of the data and collected all variables for each slice. P31 was created in early February 2013, while P279 dates back from early March 2013. Hence, the first slice in our dataset is March 2013, the last is September 2017, for a total of 55 slices. The code and part of the datasets produced are available at https://github.com/Aliossandro/who_models_the_world_submission_1161_cscw.git ."
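
A quick check of the slice count stated in the quote, assuming one slice per calendar month with both endpoints included. The helper name `monthly_slices` is hypothetical and not taken from the linked repository.

```python
from datetime import date

def monthly_slices(start: date, end: date) -> list[str]:
    """Return 'YYYY-MM' labels for every month from start to end, inclusive."""
    labels = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels

slices = monthly_slices(date(2013, 3, 1), date(2017, 9, 1))
print(slices[0], slices[-1], len(slices))  # 2013-03 2017-09 55
```

That is 10 months in 2013, 36 across 2014 to 2016, and 9 in 2017, which matches the 55 slices reported.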

@Daniel-Mietchen

In section 5.2, the part around "We called these roles contributor and leader." is not very clear.

@Daniel-Mietchen

"MicrobeBot created numerous new classes by adding sub-classes statements to as many protein and gene Items. These edits are not formally incorrect, yet they are questionable under a knowledge engineering point of view, as it is highly unlikely that instances will ever [be] added to these classes."

@Daniel-Mietchen

"The liberal policies of Wikidata put virtually no restrictions on the edits users can make. "

@Daniel-Mietchen

Daniel-Mietchen commented Oct 18, 2018

"Whereas the structural indicators considered work well with the observational data provided by Wikidata and are able to illustrate trends over time, they are hardly comparable across ontologies and do not provide any direct insight into the correctness of the conceptualisation [44]. To deal with the first, we could consider normalised indicators [44]. For the second, we could try to detect inconsistencies, either by inspecting samples of the class hierarchy [43] or by using reasoning software—however, the size of the ontology makes both tasks extremely challenging for state of the art tools. Further on, our framework does not consider multilingual aspects of Wikidata."

@Daniel-Mietchen

"To sum up, the metrics computed provide only a partial picture of the quality of the Wikidata ontology. Yet, it is an important part. First, our findings may be a starting point for future studies that want to explore differences in quality between domains in Wikidata conceptual knowledge. Second, the information provided by our metrics may be used to test future design solutions. For example, measures to make the effects of any edit on the hierarchy may be adopted to address the misuse of taxonomic relations, following a suggestion regarding collaborative ontology development contexts in [42]. An analysis of the metrics selected in this work may be subsequently used to assess the success of such approach."

It would be good to think about how these measures could be integrated into community workflows.

@Daniel-Mietchen

Daniel-Mietchen commented Oct 18, 2018

"Our findings demonstrate that Wikidata’s sociotechnical fabric diverges from prior projects in both the areas of peer-production and collaborative knowledge engineering and may actually represent a new paradigm of collaborative system."

That would fit with "Wikidata: A New Paradigm of Human-Bot Collaboration?", another paper by this paper's first author, which I think made an interesting observation but was methodologically not very compelling.

@Daniel-Mietchen

Daniel-Mietchen commented Oct 18, 2018

"Several users follow the path to leadership in their first months, but yearly cohorts in Figure 5.d show that they often do not beat that path again. This may be a sign of declining participants’ motivation—one challenge initiatives such as Wikidata and Wikipedia may face is the lack of shorter-term, tangible goals and achieving a specific editor status might act as a proxy to them [20]. Our analysis, alongside previous studies [27], could inform the definition of these ‘badges’ and help study their uptake and effects."

@Daniel-Mietchen

"What is surprising is the lack of any substantial influence on the total number of classes (noc), reinforcing the impression that Wikidata user dynamics differ from those observed in prior collaborative ontology engineering projects."

@Daniel-Mietchen

"The Wikidata ontology is large and messy, with numerous underpopulated classes and uneven depth. This confirms prior literature suggesting that several Wikidata contributors fail to use correctly the taxonomic relations P31 (instance of) and P279 (subclass of). On the other hand, we found evidence suggesting that parts of the ontology have higher depth and are likely to be curated by a core of expert users. We identified two activity patterns: contributors, i.e. users with lower number of edits and less engaged in community discussions, and leaders, who are more active in all of the features considered. Only a minority of users presents a leader activity pattern at any time during their interaction with the platform. Finally, whereas the activity of leaders seems to influence positively the depth of the ontology, no relation could be proven between any editor category and variables concerning the breadth of the ontology. Future work should explore what variables are at play in that regard."
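
As a note to self on what the structural indicators could look like in practice, here is a rough sketch over a toy set of P279 and P31 statements. The definitions are my assumptions, not necessarily the paper's: a class is anything that appears in a P279 statement or as the object of a P31 statement, depth is the longest chain of P279 links above a class, and the function name `ontology_metrics` and all item labels are invented.

```python
from collections import defaultdict

def ontology_metrics(p279_edges, p31_edges):
    """Compute toy structural indicators from (subject, object) statement pairs."""
    parents = defaultdict(set)
    classes = set()
    for child, parent in p279_edges:
        parents[child].add(parent)
        classes.update((child, parent))

    instance_counts = defaultdict(int)
    for _item, cls in p31_edges:
        classes.add(cls)
        instance_counts[cls] += 1

    depth_cache = {}

    def depth(cls, visiting=()):
        # Longest chain of superclasses above cls, counted in P279 links.
        if cls in depth_cache:
            return depth_cache[cls]
        if cls in visiting:
            # Cycle guard: stop descending. Cached depths inside a cycle
            # are then only approximate, which is fine for a sketch.
            return 0
        result = max(
            (1 + depth(p, visiting + (cls,)) for p in parents.get(cls, ())),
            default=0,
        )
        depth_cache[cls] = result
        return result

    return {
        "noc": len(classes),  # total number of classes
        "max_depth": max((depth(c) for c in classes), default=0),
        "classes_without_instances": sum(
            1 for c in classes if instance_counts.get(c, 0) == 0
        ),
    }

# Invented toy statements: (subject, object) pairs.
p279 = [("dog", "mammal"), ("mammal", "animal"), ("cat", "mammal"), ("gene_x", "gene")]
p31 = [("rex", "dog"), ("tom", "cat"), ("felix", "cat")]
metrics = ontology_metrics(p279, p31)
print(metrics)
```

On the toy data this gives 6 classes, a maximum depth of 2, and 4 classes with no instances. Something like the last number is what I would expect the MicrobeBot-style subclass edits to inflate.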
