
About zoonyper

Zoonyper facilitates the interpretation and wrangling of Zooniverse files in Jupyter (and Python).

Importing a Project

You import the Project class from the zoonyper library like this:

from zoonyper import Project

Loading a Project

Loading a project is done using the Project class:

.. py:class:: Project(path='', classifications_path='', subjects_path='', \
    workflows_path='', comments_path='', tags_path='')

You have two options to load a project:

  1. Provide paths to each individual file (using all the specific path arguments).
  2. Provide a general path (using the path keyword argument, preferred).

Option 1: Individual file paths

Providing a path for each of the five required files lets you specify exactly where each file is located:

classifications_path = "<full-path-to>/classifications.csv"
subjects_path = "<full-path-to>/subjects.csv"
workflows_path = "<full-path-to>/workflows.csv"
comments_path = "<full-path-to>/comments.json"
tags_path = "<full-path-to>/tags.json"

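project = Project(
    classifications_path=classifications_path,
    subjects_path=subjects_path,
    workflows_path=workflows_path,
    comments_path=comments_path,
    tags_path=tags_path,
)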

Option 2: Directory

If all the required files (classifications.csv, subjects.csv, workflows.csv, talk-comments.json and talk-tags.json) are located in the same directory, you can instead provide just the path to that directory, which is a neater way of writing the same thing:

project = Project("<full-path-to-all-files>")
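To illustrate how Option 2 relates to Option 1, the sketch below maps a single directory onto the five individual file paths. It is not zoonyper's implementation: the helper `resolve_project_files` is hypothetical, and the file names are assumed to be exactly the ones listed above.

```python
import os

# Assumed file names, taken from the list above rather than from zoonyper's source.
EXPECTED_FILES = {
    "classifications_path": "classifications.csv",
    "subjects_path": "subjects.csv",
    "workflows_path": "workflows.csv",
    "comments_path": "talk-comments.json",
    "tags_path": "talk-tags.json",
}


def resolve_project_files(path):
    """Hypothetical helper: map each *_path keyword to a file inside `path`."""
    return {key: os.path.join(path, name) for key, name in EXPECTED_FILES.items()}


paths = resolve_project_files("data")
print(paths["comments_path"])
```

Because the keys match the keyword arguments in the signature above, unpacking the result as `Project(**paths)` should be equivalent to Option 1.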

Access All Project Data (Frames)

Project.classifications is the classifications DataFrame and has all the functionality of a regular pandas DataFrame:


Project.subjects is the subjects DataFrame, and has the same kind of functionality:


Project.workflows is the corresponding DataFrame for the project's workflows:


Shortcuts to Column Summaries

Project.workflow_ids (an attribute) lists all of the project's workflow IDs:


Project.inactive_workflow_ids (an attribute) lists the IDs of the project's inactive workflows:


Using Project.workflow_ids and Project.inactive_workflow_ids, we can get the IDs of the active workflows as a set difference:

set(project.workflow_ids) - set(project.inactive_workflow_ids)
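A self-contained illustration of that set difference, with made-up workflow IDs standing in for the two project attributes:

```python
# Made-up IDs standing in for project.workflow_ids and
# project.inactive_workflow_ids; real values come from your export.
workflow_ids = [12038, 12039, 13001, 13002]
inactive_workflow_ids = [12039, 13002]

# Active workflows are the ones not marked inactive.
# Sets are unordered, so sorted() gives a stable, readable result.
active_workflow_ids = sorted(set(workflow_ids) - set(inactive_workflow_ids))
print(active_workflow_ids)  # [12038, 13001]
```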

Project.subject_sets (an attribute) lists all of the project's subject sets and corresponding subject IDs:
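As a rough sketch of the kind of grouping behind such a mapping, the snippet below collects subject IDs per subject set from subjects-export rows. The rows are invented, the column names `subject_id` and `subject_set_id` are assumed, and zoonyper's actual return type may differ.

```python
from collections import defaultdict

# Hypothetical subjects-export rows, reduced to the two columns used here.
# The same subject can appear in several rows, so IDs are collected in a set.
subject_rows = [
    {"subject_id": 101, "subject_set_id": 7},
    {"subject_id": 102, "subject_set_id": 7},
    {"subject_id": 101, "subject_set_id": 7},
    {"subject_id": 201, "subject_set_id": 8},
]

subject_sets = defaultdict(set)
for row in subject_rows:
    subject_sets[row["subject_set_id"]].add(row["subject_id"])

subject_sets = dict(subject_sets)
print(subject_sets[7])
```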


Project.subject_urls lists all of the project's subjects and their corresponding URLs:
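A minimal sketch of how subject URLs can be read from a subjects export, assuming each row's `locations` column holds a JSON object that maps a frame index to a media URL. The rows and `example.org` URLs are invented, and zoonyper may return a single URL rather than a list.

```python
import json

# Hypothetical subjects-export rows; "locations" is assumed to be a JSON object
# mapping a frame index ("0", "1", ...) to a media URL.
subject_rows = [
    {"subject_id": 101, "locations": '{"0": "https://example.org/101.jpg"}'},
    {
        "subject_id": 102,
        "locations": '{"1": "https://example.org/102b.jpg", "0": "https://example.org/102a.jpg"}',
    },
]

subject_urls = {}
for row in subject_rows:
    locations = json.loads(row["locations"])
    # Sort frame keys numerically so multi-image subjects keep their order.
    subject_urls[row["subject_id"]] = [locations[key] for key in sorted(locations, key=int)]

print(subject_urls[102])
```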


Listing Project and Workflow Participants

Project.participants() offers a quick way of seeing all participants across the entire project:


Project.participants(by_workflow=True) can be used to see all participants by workflow (as a dictionary):


Project.participants(workflow_id=<id>) can finally be used to retrieve participants for a particular workflow:
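The three variants can be sketched with plain Python over classification rows. This stand-alone function only echoes the parameter names described above; it is not zoonyper's code, and the rows and `user_name` values are invented.

```python
from collections import defaultdict

# Hypothetical classification rows, reduced to the two columns this sketch needs.
classification_rows = [
    {"user_name": "alice", "workflow_id": 12038},
    {"user_name": "bob", "workflow_id": 12038},
    {"user_name": "alice", "workflow_id": 13001},
    {"user_name": "alice", "workflow_id": 12038},
]


def participants(rows, by_workflow=False, workflow_id=None):
    """Return unique user names: overall, per workflow, or for one workflow."""
    if workflow_id is not None:
        return {row["user_name"] for row in rows if row["workflow_id"] == workflow_id}
    if by_workflow:
        grouped = defaultdict(set)
        for row in rows:
            grouped[row["workflow_id"]].add(row["user_name"])
        return dict(grouped)
    return {row["user_name"] for row in rows}


print(sorted(participants(classification_rows)))  # ['alice', 'bob']
```

Taking `len()` of these results gives the corresponding participant counts.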


Counting participants

The package also makes it quick to count participants across the whole project and within individual workflows.

Project.participants_count() offers a quick way to see how many participants were in the project altogether:


If you also pass a workflow ID, Project.participants_count(<workflow_id>) shows how many participants took part in that particular workflow:


Logged-in participants

Project.logged_in() can be used to count how many classifications were made while users were logged in:


As above, passing an optional workflow ID, Project.logged_in(<workflow_id>), shows how many classifications were made while logged in for that particular workflow:
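A minimal sketch of this count, assuming (as is typical of Zooniverse classification exports) that classifications made while not logged in have an empty `user_id`. The rows are invented and the function name is hypothetical.

```python
# Hypothetical rows; an empty user_id is assumed to mark a classification
# made by a volunteer who was not logged in.
classification_rows = [
    {"user_id": "1234", "workflow_id": 12038},
    {"user_id": "", "workflow_id": 12038},
    {"user_id": "5678", "workflow_id": 13001},
    {"user_id": "1234", "workflow_id": 13001},
]


def logged_in_count(rows, workflow_id=None):
    """Count classifications that have a user ID, optionally for one workflow."""
    return sum(
        1
        for row in rows
        if row["user_id"] and (workflow_id is None or row["workflow_id"] == workflow_id)
    )


print(logged_in_count(classification_rows))         # 3
print(logged_in_count(classification_rows, 12038))  # 1
```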


Flatten annotations

Project.flattened_annotations is a property on the project that contains the values for each annotation per classification. Where a value is a list, its items are joined with "|".

To save the flattened annotations to a file, we can combine zoonyper with regular pandas DataFrame methods:
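The sketch below illustrates the flattening rule described above on a single classification, assuming the `annotations` cell is a JSON list of `{"task": ..., "value": ...}` objects. It uses only the standard library in place of pandas (with pandas, the final step would be a DataFrame's `to_csv()` method); the task IDs, values and `classification_id` are invented, and `flatten_annotations` is a hypothetical helper, not zoonyper's code.

```python
import csv
import io
import json

# Hypothetical "annotations" cell from a classifications export.
annotations_json = json.dumps([
    {"task": "T0", "value": "Yes"},
    {"task": "T1", "value": ["red", "blue"]},
])


def flatten_annotations(raw):
    """Map each task ID to its value, joining list values with "|"."""
    flattened = {}
    for annotation in json.loads(raw):
        value = annotation["value"]
        if isinstance(value, list):
            value = "|".join(str(item) for item in value)
        flattened[annotation["task"]] = value
    return flattened


rows = [{"classification_id": 1, **flatten_annotations(annotations_json)}]

# io.StringIO stands in for open("flattened.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["classification_id", "T0", "T1"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```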


Counting classifications

Project.classification_counts(workflow_id=<workflow ID>, task_number=<task number>) is a method that retrieves, for each subject ID in the given workflow and task, how many times each distinct classification was given:

project.classification_counts(workflow_id=12038, task_number=0)

Note: The method currently works best with text annotations.
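To make the shape of the result concrete, here is a minimal sketch that builds the same kind of per-subject tally from `(subject_id, answer)` pairs. The pairs are invented, and deriving them from flattened text annotations is an assumption; this is not zoonyper's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical (subject_id, answer) pairs for one workflow and task.
answers = [
    (101, "cat"), (101, "cat"), (101, "cat"),
    (102, "dog"), (102, "cat"),
]


def classification_counts(pairs):
    """Return {subject_id: Counter({answer: number_of_classifications})}."""
    counts = defaultdict(Counter)
    for subject_id, answer in pairs:
        counts[subject_id][answer] += 1
    return dict(counts)


counts = classification_counts(answers)
print(counts[101])  # Counter({'cat': 3})
```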

Using classification_counts, we can also check for "agreement", for example whether all annotators agreed on one classification:

# True when every annotator gave the same classification for the subject
agreement = {
    subject_id: len(unique_classifications) == 1
    for subject_id, unique_classifications in project.classification_counts(workflow_id=12038, task_number=0).items()
}


Similarly, we can flag the subjects for which at least four annotators agreed on one response:

# True when exactly one response was given by four or more annotators
agreement = {
    subject_id: len([classification for classification, count in unique_classifications.items() if count >= 4]) == 1
    for subject_id, unique_classifications in project.classification_counts(workflow_id=12038, task_number=0).items()
}
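Both agreement rules can be tried without loading a project by substituting a hand-made dictionary for the output of `classification_counts`. The subject IDs and counts are invented, and `MIN_VOTES` is a hypothetical name for the threshold.

```python
from collections import Counter

# Hypothetical stand-in for project.classification_counts(...):
# {subject_id: {answer: number_of_classifications}}
counts = {
    101: Counter({"cat": 5}),
    102: Counter({"dog": 4, "cat": 1}),
    103: Counter({"dog": 2, "cat": 2}),
}

# Rule 1: unanimous, i.e. only one distinct answer was given.
unanimous = {sid: len(answers) == 1 for sid, answers in counts.items()}

# Rule 2: exactly one answer was chosen by at least MIN_VOTES annotators.
MIN_VOTES = 4
at_least_four = {
    sid: sum(1 for n in answers.values() if n >= MIN_VOTES) == 1
    for sid, answers in counts.items()
}

print(unanimous)      # {101: True, 102: False, 103: False}
print(at_least_four)  # {101: True, 102: True, 103: False}
```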


Workflow Timelines

Project.get_workflow_timelines() provides the extent (time span) of classifications for every workflow in the project:


Project.get_workflow_timelines(include_active=False) does the same as above, but excludes active workflows from the list:
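A minimal sketch of such a timeline, assuming "extent" means the first and last classification timestamps per workflow, and that `created_at` values look like `2021-03-01 10:00:00 UTC`. The rows and the set of active workflow IDs are invented, and `workflow_timelines` is a hypothetical stand-in, not zoonyper's method.

```python
from datetime import datetime

# Hypothetical (workflow_id, created_at) rows from a classifications export.
rows = [
    (12038, "2021-03-01 10:00:00 UTC"),
    (12038, "2021-05-20 16:30:00 UTC"),
    (13001, "2022-01-15 09:00:00 UTC"),
]
active_workflow_ids = {13001}  # hypothetical


def workflow_timelines(rows, include_active=True, active=frozenset()):
    """Return {workflow_id: (first, last)} classification timestamps."""
    timelines = {}
    for workflow_id, created_at in rows:
        if not include_active and workflow_id in active:
            continue  # skip workflows that are still active
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S UTC")
        first, last = timelines.get(workflow_id, (ts, ts))
        timelines[workflow_id] = (min(first, ts), max(last, ts))
    return timelines


print(workflow_timelines(rows, include_active=False, active=active_workflow_ids))
```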


Comments

Project.comments (a property) provides access to all the comments in the project as a pandas DataFrame.


To get a pre-filtered comments DataFrame that includes only non-staff members, first tell the project who its staff are with Project.set_staff(), and then call Project.get_comments(include_staff=False):

project.set_staff(["miaridge", "kallewesterling"])
comments = project.get_comments(include_staff=False)
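The reason for setting the staff first is that the filter needs a list of usernames to exclude. The sketch below shows that logic on plain dictionaries; the comment records are invented, and the `user_login` key is an assumed field name, not necessarily the one used by the Talk export or by zoonyper.

```python
# Hypothetical comment records; "user_login" is an assumed field name.
comments = [
    {"subject_id": 101, "user_login": "miaridge", "body": "Welcome!"},
    {"subject_id": 101, "user_login": "volunteer1", "body": "Is this a cat?"},
    {"subject_id": 102, "user_login": "kallewesterling", "body": "Thanks!"},
]
staff = {"miaridge", "kallewesterling"}


def get_comments(records, staff, include_staff=True):
    """Return comments, dropping staff-authored ones when include_staff is False."""
    if include_staff:
        return list(records)
    return [c for c in records if c["user_login"] not in staff]


non_staff = get_comments(comments, staff, include_staff=False)
print([c["user_login"] for c in non_staff])  # ['volunteer1']
```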

Project.get_subject_comments(<subject_id>) is a quick-access method that returns the comments for a given subject as a DataFrame (staff comments are always included):

.. toctree::
   :maxdepth: 3
   :caption: Contents: