# Creating a Project Contributor/Collaborator/Author list

It's standard practice in Zooniverse projects to acknowledge volunteers' contributions in any publication, report, or other description of the project's results. This can take many forms, e.g. in the authors section (usually as a footnote) or the acknowledgments section of a journal article. Usually a relatively brief acknowledgment directs readers to a full list of contributors. This notebook shows you how to use `make_author_list.py` to generate that list in markdown format.

*Note: different projects use different terms to refer to their volunteers, depending on how they interact with them. Commonly used terms (in addition to "volunteer") are "collaborator", "author", and "contributor". Many avoid the term "user" as it can feel sterile and doesn't really describe the nature of the engagement between project team members (it gets used more often when describing people who make use of software features, for example).* 


In [1]:
import sys, os
import numpy as np
import pandas as pd
from make_author_list import make_author_list, make_author_list_help

print("Python version: %d.%d.%d, numpy version: %s, pandas version: %s." %(sys.version_info[0], 
                                                                           sys.version_info[1], 
                                                                           sys.version_info[2], 
                                                                           np.__version__, 
                                                                           pd.__version__))
print("Originally developed using Py 2.7.11, np v1.11.0, pd v0.19.2")
print("If these versions don't match and stuff breaks, that's probably why.")

Python version: 2.7.11, numpy version: 1.11.0, pandas version: 0.19.2.
Originally developed using Py 2.7.11, np v1.11.0, pd v0.19.2
If these versions don't match and stuff breaks, that's probably why.


#### Options for inputs

In order to generate a list, you just need to specify the name of an input file to read in and the name of an output file to write to. However, there are many different options. To see them all:

In [2]:
make_author_list_help()


Usage: (users_infile, outfile, clean_emails=False, preformat=False, usecol_cl=False, author_col=None, skip_lookup=False, out_logged_in=False, outcsv=None, max_line_length_char=72, is_classfile=False)
      users_infile is a list of usernames, technically a CSV, with column names
       'credited_name' or 'real_name' (preferred) or the variations on username: 'user', 'user_name', 'username', 'user_id'.
      The input file can be a user-classification file output by basic_project_stats.py, e.g.
 your-project-name-classifications_nclass_byuser_ranked.csv
      The output file will be in markdown, e.g. authorlist_out.md
  Optional extra inputs (no spaces):
    --clean_emails
       Try to clean usernames of email addresses etc that might be farmed by bots when these are displayed on the project
    --pre, --preformatted
       output the list in pre-formatted tags so markdown won't render
    col=col_name
       if the name of your Author name column isn't standard, specify it
    --no_l

**Note:** you will need to have the Panoptes Python client installed.

If you have used `basic_classification_processing.py`, e.g. by running the Jupyter notebook that precedes this one, you will have generated a list of your project's users along with their classification counts, in csv format.

To start, let's assume that's the case and generate a team contributor list in preformatted markdown. If we can, we want to use the `credited_name` field, an optional field where a Zooniverse participant can specify the name they want used when they're given credit for their contributions. The `user_name` and `credited_name` are both public in the Zooniverse, so it's ok that we're using them in this public example notebook.

Sometimes people use their email addresses as their usernames. Even though it's clearly stated that Zooniverse usernames are public, it's better to avoid publishing email addresses on a bunch of team web pages where they might be picked up by spam bots. The code includes a search for email addresses and can sanitize them if we wish.
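To give a feel for what sanitizing can mean, here is a minimal sketch. It is not necessarily what `make_author_list.py` does internally: the `sanitize_username` helper and its deliberately loose regular expression are assumptions made for this example. It keeps only the part of an email-like string that comes before the `@`.

```python
import re

# Hypothetical helper for illustration; not part of make_author_list.py.
# Loose pattern: local-part@domain.tld anywhere inside the string.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+)@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def sanitize_username(name):
    """Replace any email-like substring with just its local part."""
    return EMAIL_RE.sub(r"\1", name)

# An email-style username loses its domain; ordinary usernames are untouched.
print(sanitize_username("jane.doe@example.com"))
print(sanitize_username("SlowLoris"))
```

Dropping the domain keeps the name recognizable to the volunteer while removing the part a bot would need to build a working address.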

In [3]:
project_name = "my-project"

# This is the file that's output by basic_classification_processing()
# we generated it in the previous Notebook.
vol_file = project_name + "-classifications_nclass_byuser_loggedin_ranked.csv"

outfile = project_name + "-contributors.md"

# we have to use the "user_name" column in this example because the example file
# doesn't have real Zooniverse IDs. Real exports will, and looking up by user id is faster
# so don't specify author_col unless you need to
authorlist = make_author_list(vol_file, outfile, clean_emails=True, preformat=True, author_col="user_name")

# this is... not that fast (as in, for low thousands of users, go get a coffee)

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zooniverse.org


Using column user_name to determine author list...
Credited name: lookup 0 of 338 (Velski --> Velski Doe)
Credited name: lookup 33 of 338 (JanetCormack --> JanetCormack)
Credited name: lookup 66 of 338 (Spider1 --> Spider1)
Credited name: lookup 99 of 338 (dewcat --> dewcat)
Credited name: lookup 132 of 338 (nelson2056 --> nelson2056)
Credited name: lookup 165 of 338 (krassy_ms --> Krassimir Sirakov)
Credited name: lookup 198 of 338 (unique_ --> Kearstin Krehbiel)
Credited name: lookup 231 of 338 (Amelievb --> Amelievb)
Credited name: lookup 264 of 338 (Shir-El --> Shir-El)
Credited name: lookup 297 of 338 (spinkick --> spinkick)
Credited name: lookup 330 of 338 (maplecornerbarb --> Barbara Patterson)


INFO:make_author_list:Author list saved in markdown to my-project-contributors.md.
INFO:make_author_list: Logged-in user count: 338


  Here are the ids it returned an error on:
['rcmills1707', 'dsreidcardiac', 'gelias-warrenpeckschool']


#### Warning messages
There are various reasons the user search might not return anything for a given user ID or name. The most common are:
 1. The user account has been deleted
 2. There is an issue with the API request (timeout, etc)
 
If the user account has been deleted, their classifications remain in the system but you can no longer attach that username/ID to any other name information. In that case, either the user name or ID will appear in the author list instead. 

API errors sometimes happen if there is high server load, so if you suspect that's what's happening, you can try again later or email the contact@zooniverse account to find out more information. 

At the moment, both causes of a no-result lookup return the same error, so the script can't distinguish between them. But you should always get back a list of the usernames or IDs that didn't return a result, so you can use that to help you figure out what's happening (if it's just a few names, it's probably that people decided to delete their accounts). 
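One informal way to use those lists, sketched below, is to run the lookup twice and compare which names failed each time. This is only a heuristic, not a feature of the script: it assumes you copy the failed IDs by hand from the printed output of each run, and the second list here (including `exampleuser42`) is made up for illustration.

```python
# IDs copied by hand from the "returned an error" line of two separate runs
# over the same input file. The second list is hypothetical.
failed_run1 = ["rcmills1707", "dsreidcardiac", "gelias-warrenpeckschool"]
failed_run2 = ["rcmills1707", "dsreidcardiac", "gelias-warrenpeckschool",
               "exampleuser42"]

# Names that fail every time are more consistent with deleted accounts.
persistent = sorted(set(failed_run1) & set(failed_run2))

# Names that fail in only one run are more consistent with transient API errors.
intermittent = sorted(set(failed_run1) ^ set(failed_run2))

print("Persistent failures: %s" % persistent)
print("Intermittent failures: %s" % intermittent)
```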

#### Checking that it worked

In [4]:
!head my-project-contributors.md

    Velski Doe, rcmills1707, SlowLoris, Chris Duffey, Kathryn Hingston, 
    Stephen J Morgan, Sandra Smith-Gordon, Mgorman, Graham Monroe, 
    Richard Hutton, jean-maurice  Gadéa, John M. Baron, Salomien Rudolph, 
    Skilak, Beachville, Jane Williamson, Kate Wheway, nefinia, 
    Mary Patricia Nichols, Cory Behara, kltayloriris, Kristie Carter, 
    maryse schild, ssddjj, KatyLack, sueburr, Ruth Bingham, Judy, 
    bobinco, Aman Elarbi, Kath Mullholand, museumknitter, 
    Janet Pawlowski, JanetCormack, Barbara Téglás, NynkS2, 
    Cissy van Geene-Huijzen, hpaakkanen, Annezoo42, jerryolsen, 
    Paul_Quinn, Kjell Terje Hoiland, CatzQueenx, Roger Edwards, 


In [5]:
authorlist

Unnamed: 0,user_name,n_class,user_id,name_merged
0,Velski,876,9.765414e+09,Velski Doe
1,rcmills1707,579,9.765007e+09,rcmills1707
2,SlowLoris,564,9.764004e+09,SlowLoris
3,Quinacridone,514,9.765365e+09,Chris Duffey
4,Kathryn999,492,9.765461e+09,Kathryn Hingston
5,SM-Ystrad,457,9.765462e+09,Stephen J Morgan
6,srasg56,445,9.765081e+09,Sandra Smith-Gordon
7,Mgorman,444,9.763989e+09,Mgorman
8,grahamg3,412,9.765399e+09,Graham Monroe
9,Lampyrichard,340,9.765462e+09,Richard Hutton


If you want to do something else with this author list, you can use the dataframe to do it, or manipulate the markdown file directly.

*Note: the names will show up in the same order they do in the csv file; in this case they're ranked by number of classifications, but you may want to order it differently.*
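As one example of reordering, the sketch below sorts names alphabetically and re-wraps them so that lines break only between names. It uses a small stand-in dataframe with the `name_merged` column shown above; the `wrap_names` helper is hypothetical and written for this example, and the 72-character width simply mirrors the script's default `max_line_length_char`.

```python
import pandas as pd

# Small stand-in for the `authorlist` dataframe returned by make_author_list.
authorlist_demo = pd.DataFrame({
    "user_name": ["Velski", "rcmills1707", "SlowLoris", "Quinacridone"],
    "name_merged": ["Velski Doe", "rcmills1707", "SlowLoris", "Chris Duffey"],
})

def wrap_names(names, width=72):
    """Join names with ', ' and start a new line only between names."""
    lines, current = [], ""
    for name in names:
        piece = name + ", "
        if current and len(current) + len(piece) > width:
            lines.append(current.rstrip())
            current = ""
        current += piece
    if current:
        # Drop the trailing separator after the final name.
        lines.append(current.rstrip().rstrip(","))
    return "\n".join(lines)

# Sort case-insensitively so lowercase usernames are not pushed to the end.
names = sorted(authorlist_demo["name_merged"], key=lambda s: s.lower())
alphabetical_list = wrap_names(names)
print(alphabetical_list)
```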

### Generating an author list directly from a classification export

If you haven't run `basic_classification_processing.py` you might not already have a list of users. But the classification exports provide this information, so you can use an export instead to generate a user list.

In [6]:
from make_author_list import make_userfile_from_classfile

classification_file = project_name + "-classifications.csv"

userfile = make_userfile_from_classfile(classification_file)

Extracted 742 users from 46393 registered classifications and saved to my-project-classifications_userlist_with_nclass.csv...
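Conceptually, this step amounts to counting classifications per logged-in user. The pandas sketch below shows the idea on a tiny inline table; it is not the actual implementation of `make_userfile_from_classfile`. It assumes the export has `user_name` and `user_id` columns and that classifications made while not logged in have an empty `user_id`.

```python
import io
import pandas as pd

# Tiny inline stand-in for a classification export (values are made up).
demo_csv = io.StringIO(u"""classification_id,user_name,user_id
1,Velski,101
2,Velski,101
3,not-logged-in-abc123,
4,SlowLoris,102
""")
classifications = pd.read_csv(demo_csv)

# Keep only classifications made while logged in (user_id is present).
logged_in = classifications[classifications["user_id"].notnull()]

# Count classifications per user, most active first.
user_counts = (logged_in.groupby("user_name")
               .size()
               .sort_values(ascending=False)
               .reset_index(name="n_class"))
print(user_counts)
```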


In [7]:
authorlist_fromclassfile = make_author_list(userfile, outfile, clean_emails=True, preformat=True, author_col="user_name")

INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: www.zooniverse.org


Using column user_name to determine author list...
Credited name: lookup 0 of 742 (MerylPG --> Meryl Goulbourne)
Credited name: lookup 74 of 742 (NynkS2 --> NynkS2)
Credited name: lookup 148 of 742 (patrickm.sans --> patrickm.sans)
Credited name: lookup 222 of 742 (binbag42 --> binbag42)
Credited name: lookup 296 of 742 (asixtus --> Ann R Sixtus)
Credited name: lookup 370 of 742 (rottenjonny --> Jon Meyer)
Credited name: lookup 444 of 742 (wicliff7 --> Wicliff Fleurizard)
Credited name: lookup 518 of 742 (EduardoNET --> Ed Sampedro)
Credited name: lookup 592 of 742 (Lor23 --> Laura Simmons)
Credited name: lookup 666 of 742 (flipit4u --> flipit4u)
Credited name: lookup 740 of 742 (maplecornerbarb --> Barbara Patterson)
  Here are the ids it returned an error on:
['rcmills1707', 'acorsten', 'sue.hancock3', 'david_torranceymail', 'ron.gitter', 'dsreidcardiac', 'mike.newman81', 'gelias-warrenpeckschool', 'lg63ladd']


INFO:make_author_list:Author list saved in markdown to my-project-contributors.md.
INFO:make_author_list: Logged-in user count: 742


### Why are the two author lists different?

In the first example, we used a list of contributors that was generated from running `basic_classification_processing` on the classification example file *and only extracting classifications from one workflow and version*. The quick-and-dirty version used immediately above doesn't do that. It doesn't remove non-live classifications, or duplicates, or separate classifications by workflow. So because it considers all classifications in the example file instead of just the ones in a specific workflow, it's a longer list. 

This gives you the flexibility to choose how you generate these lists, depending on how you intend to use them.
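If you want something in between, you could filter the export yourself before building the user list. The sketch below is one possible approach under stated assumptions, not what `basic_classification_processing` does internally: it assumes the export has `workflow_id`, `workflow_version`, and `subject_ids` columns, treats a repeat classification of the same subject by the same user as a duplicate, and uses made-up workflow values.

```python
import io
import pandas as pd

# Tiny inline stand-in for a classification export (values are made up).
demo_csv = io.StringIO(u"""classification_id,user_name,workflow_id,workflow_version,subject_ids
1,Velski,5030,12.34,9001
2,Velski,5030,12.34,9001
3,SlowLoris,5030,11.20,9002
4,Mgorman,4999,3.10,9003
5,SlowLoris,5030,12.34,9003
""")
# Read the version as text so the comparison below is an exact string match
# rather than a floating-point comparison.
classifications = pd.read_csv(demo_csv, dtype={"workflow_version": str})

workflow_id = 5030          # hypothetical workflow
workflow_version = "12.34"  # hypothetical version

in_workflow = classifications[
    (classifications["workflow_id"] == workflow_id)
    & (classifications["workflow_version"] == workflow_version)
]

# Assumed definition of a duplicate: same user, same subject.
deduped = in_workflow.drop_duplicates(subset=["user_name", "subject_ids"])

contributors = sorted(deduped["user_name"].unique())
print(contributors)
```

Only users with at least one classification in the chosen workflow and version remain, which is why a list built this way is shorter than one built from the whole export.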

### How should I use this list?

As a bare minimum, it's a good idea to generate a contributor list for all your volunteers who participated while your project was live, and paste it into your **Team** page on your project with an acknowledgment note as a header to the list. [Here's an example of that.](https://www.zooniverse.org/projects/vrooje/planetary-response-network-and-rescue-global-caribbean-storms-2017/about/team)

(It's no accident that the list there overlaps by quite a bit with the one here; the example classifications were taken from that project.)

#### Running `make_author_list` from the command line
At the command line, type:

`%> python make_author_list.py`

without any inputs to see what the CLI syntax is.