About `zoonyper`

Zoonyper makes it easier to interpret and wrangle Zooniverse files in Jupyter (and Python).

Importing a Project

You import the Project class from the zoonyper library like this:

from zoonyper import Project

Loading a Project

Loading a project is done using the Project class:

.. py:class:: Project(path='', classifications_path='', subjects_path='', \
    workflows_path='', comments_path='', tags_path='')

You have two options to load a project:

1. Provide paths to each individual file (using all the specific path arguments).
2. Provide a general path (using the path keyword argument, preferred).

Option 1: Individual file paths

Providing a path for each of the five required files lets you specify exactly where each file is located:

classifications_path = "<full-path-to>/classifications.csv"
subjects_path = "<full-path-to>/subjects.csv"
workflows_path = "<full-path-to>/workflows.csv"
comments_path = "<full-path-to>/comments.json"
tags_path = "<full-path-to>/tags.json"

project = Project(
    classifications_path=classifications_path,
    subjects_path=subjects_path,
    workflows_path=workflows_path,
    comments_path=comments_path,
    tags_path=tags_path,
)

Option 2: Directory

If all the required files (classifications.csv, subjects.csv, workflows.csv, talk-comments.json and talk-tags.json) are located in the same directory, you can simply provide the path to that directory, which is a neater way of writing the same thing:

project = Project("<full-path-to-all-files>")
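
Before passing a directory to Project, it can help to confirm that the five expected files are actually there. The following is a minimal sketch using only the standard library. `missing_project_files` is a hypothetical helper, not part of zoonyper, and the file names are the ones listed above.

```python
from pathlib import Path

# File names as listed in the section above.
REQUIRED_FILES = (
    "classifications.csv",
    "subjects.csv",
    "workflows.csv",
    "talk-comments.json",
    "talk-tags.json",
)


def missing_project_files(directory):
    """Return the required file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

An empty list means all five files were found in the directory.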

Access All Project Data (Frames)

Project.classifications is the classifications DataFrame, and has all the functionality of a regular Pandas DataFrame:

project.classifications.head(2)

Project.subjects is the subjects DataFrame, and has the same kind of functionality:

project.subjects.head(2)

Project.workflows is the DataFrame variation for the project's workflows:

project.workflows.head(2)

Shortcuts to Column Summaries

Project.workflow_ids (an attribute) lists all of the project's workflow IDs:

project.workflow_ids

Project.inactive_workflow_ids (an attribute) lists the IDs of the project's inactive workflows:

project.inactive_workflow_ids

Using Project.workflow_ids and Project.inactive_workflow_ids, we can get the active workflows by using:

set(project.workflow_ids) - set(project.inactive_workflow_ids)
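
The set difference above yields an unordered set. If a stable, sorted list is preferable, the same idea can be wrapped as in the sketch below. The IDs are made-up sample values, and `active_workflow_ids` is a hypothetical helper rather than a zoonyper attribute.

```python
def active_workflow_ids(all_ids, inactive_ids):
    """Return the IDs in `all_ids` that are not in `inactive_ids`, sorted."""
    return sorted(set(all_ids) - set(inactive_ids))


# Made-up sample IDs standing in for project.workflow_ids and
# project.inactive_workflow_ids.
sample_all = [12194, 12038, 12195]
sample_inactive = [12195]

print(active_workflow_ids(sample_all, sample_inactive))  # prints [12038, 12194]
```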

Project.subject_sets (an attribute) lists all of the project's subject sets and corresponding subject IDs:

project.subject_sets

Project.subject_urls lists all of the project's subjects and their corresponding URLs:

project.subject_urls

Listing Project and Workflow Participants

Project.participants() offers a quick way of seeing all participants across the entire project:

project.participants()

Project.participants(by_workflow=True) can be used to see all participants by workflow (as a dictionary):

project.participants(by_workflow=True)

Finally, Project.participants(workflow_id=<id>) can be used to retrieve the participants for a particular workflow:

project.participants(workflow_id=12038)

Counting participants

The package also offers quick ways to count participants across the whole project and within individual workflows.

Project.participants_count() offers a quick way to see how many participants were in the project altogether:

project.participants_count()

Given an optional workflow ID, Project.participants_count(<workflow_id>) shows how many participants took part in that particular workflow:

project.participants_count(12194)

Logged in participants

Project.logged_in() can be used to count how many classifications were made while users were logged in:

project.logged_in()

As above, given an optional workflow ID, Project.logged_in(<workflow_id>) shows how many classifications were made while logged in for that particular workflow:

project.logged_in(12194)

Flatten annotations

Project.flattened_annotations is a property on the project that contains the values for each annotation per classification. (If there are lists, they are joined with "|".)

To save a file with flattened annotations, we can combine zoonyper with the functionality of pandas DataFrames:

project.flattened_annotations.to_csv('flattened-annotations.csv')
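
Because list values are joined with "|", a cell read back from the saved CSV is a single string. The sketch below shows one way to split such a cell into its parts again. `split_joined` is a hypothetical helper and the sample value is invented for illustration; it cannot tell a separator apart from a literal "|" inside an annotation.

```python
import math


def split_joined(value, separator="|"):
    """Split a separator-joined annotation string back into a list of parts."""
    # Treat missing values (None, empty string, float NaN) as "no parts".
    if value is None or value == "":
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    return str(value).split(separator)


print(split_joined("cat|dog|bird"))  # prints ['cat', 'dog', 'bird']
```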

Counting classifications

Project.classification_counts(workflow_id=<workflow ID>, task_number=<task number>) is a method that retrieves the number of different classifications per subject ID for any given workflow:

project.classification_counts(workflow_id=12038, task_number=0)

Note: The method currently works best with text annotations.

Using classification_counts, we can also easily check for "agreement", say when all annotators have agreed on one classification:

agreement = {
    subject_id: len(unique_classifications) == 1
    for subject_id, unique_classifications in project.classification_counts(workflow_id=12038, task_number=0).items()
}

print(agreement)

Similarly, we can construct a code block for whenever at least four annotators have agreed on one response for a subject:

agreement = {
    # True when exactly one response was given by at least four annotators.
    subject_id: len([classification for classification, count in unique_classifications.items() if count >= 4]) == 1
    for subject_id, unique_classifications in project.classification_counts(workflow_id=12038, task_number=0).items()
}

print(agreement)
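
The two comprehensions above differ only in their agreement rule, so the rule can be factored into a small function. This sketch assumes, as the examples above imply, that each subject ID maps to a dictionary of classification to count. The sample data and the `has_agreement` helper are hypothetical.

```python
def has_agreement(counts, minimum=None):
    """Check whether annotators agreed on a single classification.

    With minimum=None, agreement means exactly one distinct classification.
    Otherwise it means exactly one classification was given at least
    `minimum` times.
    """
    if minimum is None:
        return len(counts) == 1
    return sum(1 for count in counts.values() if count >= minimum) == 1


# Invented sample data in the assumed {subject_id: {classification: count}} shape.
sample_counts = {
    101: {"yes": 5},
    102: {"yes": 4, "no": 1},
    103: {"yes": 2, "no": 2},
}

unanimous = {sid: has_agreement(c) for sid, c in sample_counts.items()}
at_least_four = {sid: has_agreement(c, minimum=4) for sid, c in sample_counts.items()}

print(unanimous)      # prints {101: True, 102: False, 103: False}
print(at_least_four)  # prints {101: True, 102: True, 103: False}
```

Passing the threshold as a parameter makes the boundary explicit: "at least four" corresponds to `count >= minimum` with `minimum=4`.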

Workflow Timelines

Project.get_workflow_timelines() provides the data for the extent of classifications for all workflows in a given project:

project.get_workflow_timelines()

Project.get_workflow_timelines(include_active=False) does the same as above, but excludes active workflows from the list:

project.get_workflow_timelines(include_active=False)

Comments

Project.comments (a property) provides access to all the comments in the project as a pandas DataFrame.

project.comments

To get a comments DataFrame that excludes staff members, first set the staff on the Project and then call Project.get_comments(include_staff=False):

project.set_staff(["miaridge", "kallewesterling"])
project.get_comments(include_staff=False)
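
Conceptually, excluding staff amounts to dropping every comment whose author appears in the staff list. The sketch below shows that idea on plain dictionaries. The `user_login` key, the sample comments, and `without_staff` are all hypothetical and do not describe how zoonyper implements `get_comments`.

```python
def without_staff(comments, staff):
    """Return only the comments whose author is not listed in `staff`."""
    staff = set(staff)
    return [c for c in comments if c["user_login"] not in staff]


# Invented sample comments; "user_login" is an assumed field name.
sample_comments = [
    {"user_login": "miaridge", "body": "Thanks for flagging this."},
    {"user_login": "volunteer_1", "body": "Is this page upside down?"},
]

print(without_staff(sample_comments, ["miaridge", "kallewesterling"]))
```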

Project.get_subject_comments(<subject_id>) is a quick-access method on the Project that shows the comments for a given subject as a DataFrame (it always includes staff comments):

project.get_subject_comments(73334345)

.. toctree::
   :maxdepth: 3
   :caption: Contents:

   README
