# JSC370 - Assignment #3


## Due dates

**Presentation using RISE and voila:** March 3 in Tutorial, 18:00 - 20:00.

**User documentation, Jupyter Notebook, other files:** March 9, 23:59.


## Instructions

In this assignment you will be working in pairs.  The group assignments are below:

In [1]:
import numpy as np

# set random seed to course number 
np.random.seed(88211) 

# student initials
students = np.array(['DG', 'JH', 'LH', 'HL','TS','SW','MX', 'SY'])

# select 4 groups of size 2 without replacement
groups = np.random.choice(students, size=(4,2), replace=False)

# print out groups for students
for i in range(len(groups)):
    print('Group', i+1, 'is:', groups[i,0],'and',groups[i,1])

Group 1 is: SY and MX
Group 2 is: TS and SW
Group 3 is: JH and DG
Group 4 is: LH and HL


## Background

The UofT library would like to know about patterns of use among the UofT community for electronic journals.   


## Project Objective

Build an interactive web page using a Jupyter notebook with a Python kernel and [ipywidgets](https://ipywidgets.readthedocs.io/en/latest/index.html), and use [voila](https://voila.readthedocs.io/en/stable/) to serve the notebook as a web page. The web page should display and explain the data (e.g., the number of downloads, number of UofT authors, time, and journal topic) so that the user can answer the [questions](#Assignment-Questions).  These types of web pages are often called dashboards.



## Assignment Questions

- Which journals are downloaded most frequently?

- How many authors from UofT publish in the journals that are downloaded?

- How do the download patterns change over time?

- Can you predict future downloads?

- Is it possible to use Web of Science data or another data source such as Crossref (see assignment #1) to gather information on the topic of the journal (e.g., Social Science, Life Science)?

-  Which journals/publishers do UofT faculty publish with that are not in our journals collection (represented by the usage reports provided)?

- Any other interesting questions that your group thinks could be answered using this data.


## Data

The data is confidential and you will be asked to sign an agreement that you will not share the data with individuals outside of this course.  Data access will be communicated in a separate document.

**Web of Science dataset** – you will notice that we did not bring all the names of the authors for each artifact. We only brought in the names of the UofT affiliated authors, as well as their position in the author list. These are the columns on the far right of the sheet.

**COUNTER reports** – the naming convention is ‘publisher report type year covered’, for example ‘Wiley JR1 2018’. Where a report covers usage of the content on the Scholars Portal platform (our backup local load copy), the acronym SP is added to the file name.

Usage for 2014-2018 is provided for most publishers. For some publishers, years are missing because reports were not available, we didn’t have a license to the content for part of the period, etc.
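
The naming convention above lends itself to a small parser that turns a report name into its parts. The following is a minimal sketch and not part of the provided materials: the function name `parse_report_name`, the `ReportName` tuple, and the stripping of a file extension are all assumptions. Because the exact position of `SP` in the name is not specified, the sketch accepts it anywhere in the name.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class ReportName(NamedTuple):
    """Parts of a COUNTER report name (hypothetical structure)."""
    publisher: str
    report_type: str
    year: int
    scholars_portal: bool


def parse_report_name(file_name: str) -> Optional[ReportName]:
    """Split 'publisher report-type year' into parts; return None if it does not fit."""
    # Drop a file extension if there is one (assumption: names may end in e.g. .xlsx)
    tokens = Path(file_name).stem.split()

    # 'SP' marks a Scholars Portal report; its position is assumed to be unknown
    scholars_portal = 'SP' in tokens
    tokens = [t for t in tokens if t != 'SP']

    # Expect at least publisher, report type, and a four-digit year at the end
    if len(tokens) < 3 or not re.fullmatch(r'\d{4}', tokens[-1]):
        return None

    # Publisher names may contain several words, so everything before the
    # report type is treated as the publisher
    *publisher, report_type, year = tokens
    return ReportName(' '.join(publisher), report_type, int(year), scholars_portal)


print(parse_report_name('Wiley JR1 2018'))
```

Names that do not follow the convention return `None`, so they can be listed and checked by hand rather than silently misread.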


## Mandatory Project Workflow

- Each student clones the assignment repository from GitHub to their local machine, and **starts a unique branch** to work on their part of the assignment.


- Students work on their branches, committing changes and pushing their branch to the shared repository.


- When each student finishes their part of the assignment, they start a pull request.


- Each group works together to review the proposed changes, discuss improvements or alternatives, and resolve conflicting changes arising from concurrent development.


- When the students agree on a resolution, they merge each pull request.  The teaching team can leave feedback on commits or pull requests if they are tagged in the comments. 


- Both members of the team should contribute equally to building the web page and documentation.  It's not appropriate for one member to work on building the web page and the other to work on the documentation. 


### Git Tools Useful for working with Jupyter Notebooks


[nbdime](https://nbdime.readthedocs.io/en/latest/#) is a Python library that provides tools for diffing and merging .ipynb files, which makes notebooks much easier to work with in git and GitHub.


## Presentation

Your presentation should demonstrate how your interactive Jupyter notebook answers the [questions](#Assignment-Questions). Your presentation should use [RISE](https://rise.readthedocs.io/en/maint-5.6/) to create slides.  You may also demonstrate parts of your project using voila.


## User Documentation

You and your partner will create [user documentation](#User-Documentation-for-Interactive-web-page) for the web page. The documentation should be done using a Jupyter notebook.



## Issues to consider

- What information will the user see on the web page?

- How will your group display different data? As a visualization, table, text, or combination? Where will you add interactivity?  How will you know if your choices lead to effective and accurate communication of information?

- How will your group predict future downloads?  How will you display this information?
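
For the question of predicting future downloads, one simple baseline is a straight-line trend fitted to yearly totals. The sketch below is only an illustration under stated assumptions: the download totals are invented, the variable names are hypothetical, and a linear trend is just one possible model that your group may well replace with something better suited to the data.

```python
import numpy as np

# Years covered by the usage reports
years = np.array([2014, 2015, 2016, 2017, 2018])

# HYPOTHETICAL yearly download totals, invented purely for illustration
downloads = np.array([100.0, 110.0, 120.0, 130.0, 140.0])

# Measure time from the first year so the fit is numerically well behaved
x = years - years[0]

# Least-squares fit of downloads = slope * x + intercept
slope, intercept = np.polyfit(x, downloads, deg=1)

# Extrapolate the fitted line one year past the data
forecast_2019 = slope * (2019 - years[0]) + intercept
print(f'Forecast for 2019: {forecast_2019:.1f}')
```

A forecast like this should be displayed together with the observed values, so the user can judge how far the prediction extends beyond the data.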


# Preparation Lab Expectations

- This lab will take place on Feb. 25, 6-8.


- Use this time to get familiar with the assignment expectations.


- Work with your project partner.  It's OK (and encouraged) to share information.


- Develop strategies on how you plan to tackle the points in [Issues to consider](#Issues-to-consider) and other challenges such as which data you will present, how you will present the data, and web page layout.


- During the last part of the tutorial give a very short presentation on your group's plan.  


- By the end of the tutorial you and your partner should decide how you will split up the work.  Collaborate on a brief written plan, then commit and push it to your assignment repository.



# Presentation Expectations

The time allotted for each presentation is 7 minutes plus 3 minutes for questions/discussion. Each person should present for approximately half the time (i.e., 3.5 minutes). This time limit will be enforced. If you exceed the time limit then you will be asked to stop the presentation. This means that you should rehearse your presentation timing before you present to the class.


## General Presentation Guidelines

The goal of the presentation is to effectively communicate how librarians can use your web page to answer the [questions](#Assignment-Questions) (i.e., the communication is aimed at a non-technical, but educated, audience). This does not mean that you should not include technical details, but you should aim to communicate the findings to an audience without a background in statistics, math, or computer science.

You will need to remind us about the project, but only tell us what we really need to know. We are curious about the results, and how you present the results, but they are not the only purpose of this presentation. So, what should you include? Examples of questions to consider as you prepare your presentation are:

- What problem did your group set out to solve?


- How did your group define the problem?


- How will your results help librarians understand patterns of use among the UofT community for electronic journals?

Your presentation will be graded using the [presentation rubric.](https://jsc370.github.io/assignment_rubrics.html#presentation_rubric)

The Jupyter notebook you use for the presentation should be pushed to your GitHub repository for this assignment by <u>**March 3, 18:00**</u>.


# User Documentation for Interactive web page

- The user documentation should explain to users what data is being displayed on your web page.   For example, if you use the data to do a calculation or create a plot then explain why the calculation was done, and how it should be interpreted.

- The documentation should be broken into sections that correspond to the sections of your web page. 

- The user documentation should be done using a Jupyter notebook.  Ideally your group would find a way to incorporate the documentation into the design of the web page, although this isn't necessary.


## How will my user documentation be evaluated?

Your user documentation will be evaluated for clarity and conciseness.

**Titles [1-5]:** There should be an appropriate title for each section of the web page.

**Introductions [1-5]:** What is the purpose of each section?

**Methods [1-5]:** Statistical calculations and data visualizations should be clearly explained to users in each section of the web page without assuming a background in statistics, math, or computer science.

**General Considerations [1-5]:** The documentation should be presented in logical order, with well-organized sections, no grammatical, spelling, or punctuation errors, an appropriate level of technical detail, and be clear and easy to follow.

**Workflow [1-5]:** Groups should follow the [project workflow](#Mandatory-Project-Workflow) by creating a branch for each member, pull requests, and merges using git and GitHub.

## How will the web page be evaluated?

The web page and user documentation will be considered the "written report" for this assignment.  70% of the written report mark will be based on the web page and 30% will be based on the user documentation.

The web page will be graded by evaluating:

- Workflow will be evaluated according to [Mandatory Project Workflow](#Mandatory-Project-Workflow).

- Data analysis and programming will be evaluated according to the [data analysis](https://jsc370.github.io/assignment_rubrics.html#data_analysis_rubric) and [programming](https://jsc370.github.io/assignment_rubrics.html#programming_rubric) rubrics.

# Sample Data

In [2]:
import pandas as pd
import numpy as np

uoft2017 = pd.read_csv('uoft_sample_data.csv')
uoft2017.head(n=3)

Unnamed: 0.1,Unnamed: 0,PubDate,Source Title,UofT authors
0,7209,2017-07-01,CURRENT PSYCHIATRY REPORTS,1.0
1,11987,2017-01-01,RELIGIOUS EDUCATION,1.0
2,15940,2017-07-31,CANADIAN MEDICAL ASSOCIATION JOURNAL,4.0
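
Data with these columns can be summarized per journal or per month with a `groupby`. The following is a minimal sketch: it types in the three rows shown by `head(n=3)` above so that it runs without the CSV file, and the variable names `sample`, `authors_by_journal`, and `authors_by_month` are assumptions made for illustration.

```python
import pandas as pd

# The three rows displayed above, entered by hand so the sketch is self-contained
sample = pd.DataFrame({
    'PubDate': ['2017-07-01', '2017-01-01', '2017-07-31'],
    'Source Title': ['CURRENT PSYCHIATRY REPORTS',
                     'RELIGIOUS EDUCATION',
                     'CANADIAN MEDICAL ASSOCIATION JOURNAL'],
    'UofT authors': [1.0, 1.0, 4.0],
})

# Convert the date strings to datetimes so they can be grouped by month
sample['PubDate'] = pd.to_datetime(sample['PubDate'])

# Total number of UofT authors per journal, largest first
authors_by_journal = (sample.groupby('Source Title')['UofT authors']
                      .sum()
                      .sort_values(ascending=False))

# Total number of UofT authors per calendar month
authors_by_month = (sample.groupby(sample['PubDate'].dt.to_period('M'))['UofT authors']
                    .sum())

print(authors_by_journal)
print(authors_by_month)
```

The same pattern should carry over to the full dataset by replacing `sample` with the dataframe read from the CSV file.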


# Using ipywidgets to add interactivity

The following code cell creates a dropdown menu that lets the user choose a publication date, and displays the journals in which UofT authors published on that date.

In [3]:
import ipywidgets as widgets

# dropdown menu of dates
dd = widgets.Dropdown(options = uoft2017.PubDate.drop_duplicates().sort_values())

# Output widget for dataframe
out2 = widgets.Output()

# display dropdown and dataframe
display(dd, out2)

# dd_eventhand is an event handler for displaying a filtered view of the dataframe
# The callback registered must have the signature handler(change) where change is a 
# dictionary holding the information about the change.
# see https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html and 
# the doc string for observe (i.e., print(widgets.Widget.observe.__doc__))

def dd_eventhand(change):
    out2.clear_output() # clear current output
    with out2:
        # display three columns of dataframe filtered by date selected in dropdown
        display(uoft2017[uoft2017['PubDate'] == change.new][['PubDate','Source Title','UofT authors']].head())

dd.observe(dd_eventhand, names = 'value')

Dropdown(options=('2017-01-01', '2017-02-01', '2017-03-01', '2017-03-03', '2017-03-14', '2017-04-01', '2017-05…

Output()
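
One way to keep a handler such as `dd_eventhand` easy to check is to move the filtering into a plain function that takes the dataframe and the selected date, so it can be tested without any widgets. The sketch below is an assumption about how this could be organized, not a requirement: the helper name `rows_for_date` is hypothetical, and the data is the three sample rows shown earlier, entered by hand.

```python
import pandas as pd

# Columns shown to the user, matching the cell above
COLUMNS = ['PubDate', 'Source Title', 'UofT authors']


def rows_for_date(df, date, n=5):
    """Return the first n rows of df published on the given date."""
    return df.loc[df['PubDate'] == date, COLUMNS].head(n)


# The three sample rows shown earlier, entered by hand
sample = pd.DataFrame({
    'PubDate': ['2017-07-01', '2017-01-01', '2017-07-31'],
    'Source Title': ['CURRENT PSYCHIATRY REPORTS',
                     'RELIGIOUS EDUCATION',
                     'CANADIAN MEDICAL ASSOCIATION JOURNAL'],
    'UofT authors': [1.0, 1.0, 4.0],
})

# Inside the event handler this would be called with the selected value,
# e.g. rows_for_date(uoft2017, change['new'])
print(rows_for_date(sample, '2017-07-01'))
```

With this split, the handler only clears the output widget and displays whatever the function returns.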

Now add a plot to display the distribution of UofT authors.

In [4]:
import matplotlib.pyplot as plt

# dropdown menu of dates
dd = widgets.Dropdown(options = uoft2017.PubDate.drop_duplicates().sort_values())

# Output widgets for dataframe (out2) and plot (out3)
out2 = widgets.Output()
out3 = widgets.Output()

# display dropdown, dataframe, and plot
display(dd, out2, out3)

# dd_eventhand is an event handler for displaying a filtered view of the dataframe
# The callback registered must have the signature handler(change) where change is a 
# dictionary holding the information about the change.
# see https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html and 
# the doc string for observe (i.e., print(widgets.Widget.observe.__doc__))

def dd_eventhand(change):
    out2.clear_output() # clear current output
    out3.clear_output() # clear current output

    with out2:
        # display three columns of dataframe filtered by date selected in dropdown
        display(uoft2017[uoft2017['PubDate'] == change.new][['PubDate','Source Title','UofT authors']].head())

    with out3:
        # display histogram of number of authors
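        # plt.subplots() returns a (figure, axes) pair
        fig1, ax1 = plt.subplots()
        # keep only rows with at most 10 UofT authors
        at_most_10 = (uoft2017['UofT authors'] <= 10)
        uoft2017[at_most_10]['UofT authors'].hist(ax=ax1, bins=10, color='grey', edgecolor='black')
        ax1.grid(False)
        ax1.set_xlabel('Number of Authors')
        ax1.set_ylabel('Count')
        plt.show()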

dd.observe(dd_eventhand, names = 'value')

Dropdown(options=('2017-01-01', '2017-02-01', '2017-03-01', '2017-03-03', '2017-03-14', '2017-04-01', '2017-05…

Output()

Output()

## Create a Dashboard using Tab Layout 

Add another dropdown menu for journal titles, and display the results of each dropdown selection in a [tab layout](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html#Tabs).



In [5]:
import matplotlib.pyplot as plt

# dropdown menu of dates
dd = widgets.Dropdown(options = uoft2017.PubDate.drop_duplicates().sort_values())

# dropdown menu of journal titles
dd1 = widgets.Dropdown(options = uoft2017['Source Title'].drop_duplicates().sort_values())


# Output widgets for dataframe by date (out2), plot (out3), and dataframe by journal (out4)
out2 = widgets.Output()
out3 = widgets.Output()
out4 = widgets.Output()

#display dropdown, dataframe
#display(dd, out2, out3)
#display(dd1, out4)

# dd_eventhand is an event handler for displaying a filtered view of the dataframe
# The callback registered must have the signature handler(change) where change is a 
# dictionary holding the information about the change.
# see https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html and 
# the doc string for observe (i.e., print(widgets.Widget.observe.__doc__))

def dd_eventhand(change):
    out2.clear_output() # clear current output
    out3.clear_output() # clear current output

    with out2:
        # display three columns of dataframe filtered by date selected in dropdown
        display(uoft2017[uoft2017['PubDate'] == change.new][['PubDate','Source Title','UofT authors']].head())

    with out3:
        # display histogram of number of authors
        # plt.subplots() returns a (figure, axes) pair
        fig1, ax1 = plt.subplots()
        uoft2017[(uoft2017['PubDate'] == change.new)]['UofT authors'].hist(ax=ax1, bins=10, color='grey', edgecolor='black')
        ax1.grid(False)
        ax1.set_xlabel('Number of Authors')
        ax1.set_ylabel('Count')
        plt.show()
        
def dd_eventhand1(change):  
    out4.clear_output() # clear current output
    with out4:
        # display three columns of dataframe filtered by journal selected in dropdown
        display(uoft2017[uoft2017['Source Title']==change.new][['PubDate','Source Title','UofT authors']].head())


dd.observe(dd_eventhand, names = 'value')
dd1.observe(dd_eventhand1, names = 'value')

In [6]:
input_widgets = widgets.HBox([dd,dd1])
display(input_widgets)

tab = widgets.Tab([out2, out3, out4])
tab.set_title(0, 'Date')
tab.set_title(1, 'Plot')
tab.set_title(2, 'Journal')
display(tab)

HBox(children=(Dropdown(options=('2017-01-01', '2017-02-01', '2017-03-01', '2017-03-03', '2017-03-14', '2017-0…

Tab(children=(Output(), Output(), Output()), _titles={'0': 'Date', '1': 'Plot', '2': 'Journal'})