![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Annual Key Performance Indicators for Strathcona County Library

## What you need to do 

The [Strathcona County Library](https://sclibrary.ca/) has many services, books, and programs. Now that they share a catalogue with [Fort Saskatchewan Public Library](https://fspl.ca/) and participate in [The Alberta Library](http://thealbertalibrary.ca/), they are considering expanding to a library location on Mars.

So they need to know what they are doing well and should continue doing. The Strathcona County Library collects and provides open access to datasets, which can help answer this question. Performance indicators can provide valuable feedback and information for moving forward with this project.

That’s where you as a data scientist come in. They’ve provided the data and you need to find out the library’s top-performing services, programs, and books, so they know what to bring to the red planet.

## Data Content
This data set is provided by the Strathcona County Library, Strathcona County, Alberta, Canada. Source: https://data.strathcona.ca/Recreation-Culture/Library-Key-Performance-Indicators/ep8g-4kxs 

### Key Performance Indicators (per year)

| Indicator  | Description  |
|---|---|
|  Answers |   types of answers the library provides to staff and reference questions|
|  Awareness | number of awareness sessions and the number of County residents attending  |
|  Borrowing |  measure and classification of loans and borrowed items |
|  Collections |  kinds and numbers of items the library manages |
|  Internet | number of patrons using WiFi or public computers  |
|  Library use | number of cards purchased; number of people reporting use of the library  |
|  Open days |  total number of days the library was open |
| Population  |  total number of served population |
|  Programs |  number of offered program sessions and the number of participants |
| Services  | use of fax, printing, scanning, copying resumes (since 2011)  |
|  Training | number of sessions and trainees on formal and informal technological sessions |
| Visits  | number of kinds of site visits |
|  Volunteers | number of volunteers  |

## Downloading and parsing the data

We'll begin by downloading the data from [the website](https://data.strathcona.ca/Recreation-Culture/Library-Key-Performance-Indicators/ep8g-4kxs) into a "dataframe" in this notebook. 

We selected the 'API' tag and chose CSV format on the top right side. Pressing the 'Copy' button gave us the URL to download the full dataset.

In [None]:
# import the Pandas module that we will be using to manipulate the data
import pandas as pd
print("Importing Python library was successful!")

# this is the link we will get the data from
link = "https://data.strathcona.ca/resource/ep8g-4kxs.csv"
# Read and parse the CSV data into a Pandas dataframe
rawData = pd.read_csv(link)
# Rename columns
rawData = rawData.rename(columns={"kpicategory": "category", "kpimeasure": "measure","kpiyear":"year","kpivalue":"value"})
# Look at the first five rows
rawData.head()
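
To see what the rename step does without downloading anything, here is a minimal sketch on a tiny hand-made dataframe. The rows and the measure names are invented for illustration; only the four original column names are taken from the cell above.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of the downloaded data (values are invented)
sample = pd.DataFrame({
    "kpicategory": ["Population", "Population", "Open days"],
    "kpimeasure": ["Example measure A", "Example measure A", "Example measure B"],
    "kpiyear": [2014, 2015, 2015],
    "kpivalue": [100.0, 110.0, 350.0],
})

# rename() returns a new dataframe; the old column names are the dictionary keys
renamed = sample.rename(columns={
    "kpicategory": "category",
    "kpimeasure": "measure",
    "kpiyear": "year",
    "kpivalue": "value",
})

print(list(renamed.columns))
print(renamed.shape)  # (number of rows, number of columns)
```

Note that `sample` itself keeps its old column names, which is why the notebook assigns the result of `rename` back to `rawData`.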

## Selecting and Sorting Columns

Now that we have downloaded the full dataset, it's time to select and sort columns using Pandas dataframes.

We stored our data in a dataframe called `rawData`. 

Let's explore the `'category'` column and then look at the unique values. 

In [None]:
# Access the values under 'category' by using square brackets, followed by the column name in quotation marks
rawData["category"]

# Get unique values under 'category' column
rawData["category"].unique()

# possible category values are:
# 'Answers', 'Awareness', 'Borrowing', 'Collections', 'Internet',
#       'Library use', 'Open days', 'Population', 'Programs', 'Services',
#       'Training', 'Visits', 'Volunteers'
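
If you also want to know how many rows each category has, `value_counts()` is one option. The sketch below uses a small invented dataframe (named `sample` here, standing in for `rawData`), so the numbers are not real library data.

```python
import pandas as pd

# Hypothetical stand-in for rawData (values are invented)
sample = pd.DataFrame({
    "category": ["Population", "Programs", "Programs", "Volunteers"],
    "year": [2015, 2015, 2015, 2015],
    "value": [100.0, 20.0, 300.0, 5.0],
})

# Square brackets with one column name give a single column (a Series)
categories = sample["category"]

# unique() lists each distinct value once, in order of first appearance
print(categories.unique())

# value_counts() also reports how many rows each value has
counts = categories.value_counts()
print(counts)
```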

Let's look at the population size each year using the "Population" category. 

We can select the subset of data we are interested in by using a condition:

`dataframe['column_name']=='Category_name'`

and then passing that condition into the dataframe within square brackets: 

`dataframe[dataframe['column_name']=='Category_name']`

Using our dataframe `rawData`, we can access the subset of the data whose `category` is `'Population'` as follows:

In [None]:
# Set up condition to get only those rows under 'category' whose value matches 'Population'
condition_1 = rawData["category"]=='Population'
# Pass the condition in square brackets to get subset of the data 
rawData[condition_1]

In [None]:
# You could also do this directly without declaring a condition_1 variable
# rawData[rawData["category"]=='Population']
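
It can help to look at what a condition actually contains. The sketch below, on an invented three-row dataframe called `sample`, shows that the comparison produces one `True` or `False` per row, and that indexing with it keeps only the `True` rows.

```python
import pandas as pd

# Hypothetical stand-in for rawData (values are invented)
sample = pd.DataFrame({
    "category": ["Population", "Programs", "Population"],
    "year": [2014, 2014, 2015],
    "value": [100.0, 20.0, 110.0],
})

# The comparison is evaluated row by row and gives one True/False per row
mask = sample["category"] == "Population"
print(mask.tolist())

# Indexing with the mask keeps only the rows marked True
subset = sample[mask]
print(subset)

# Because True counts as 1, sum() tells us how many rows matched
print(mask.sum())
```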

We can also combine multiple conditions by wrapping each condition in parentheses and joining them with a logical operator: `&` for **and**, `|` for **or**.

In this dataset, any two values under `category` are [mutually exclusive](https://www.mathsisfun.com/data/probability-events-mutually-exclusive.html): a row can never match two of them at once. To select rows from more than one category, we therefore need the `|` operator.

`category` and `year` are not mutually exclusive, so we can obtain specific data points by using the `&` symbol.

Let's suppose we want to know the size of the served population for each year since 2012.

###### Condition 1: 

We want all those entries whose `'category'` is equal to `'Population'`

`rawData["category"]=='Population'`

###### Condition 2: 

We want all those entries whose `'year'` is greater than or equal to `2012`

`rawData["year"]>=2012`

In [None]:
# set up the conditions
condition_1 = rawData["category"]=='Population'
condition_2 = rawData["year"]>=2012

# pass conditions - each in parentheses and separated by &
rawData[(condition_1) & (condition_2)]
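
The difference between `&` and `|` is easiest to see side by side. This is a minimal sketch on an invented dataframe called `sample`; the variable names `is_population`, `is_programs`, and `recent` are made up for this example, and `isin()` is shown only as an alternative spelling, not as something the notebook requires.

```python
import pandas as pd

# Hypothetical stand-in for rawData (values are invented)
sample = pd.DataFrame({
    "category": ["Population", "Population", "Programs", "Volunteers"],
    "year": [2011, 2013, 2013, 2013],
    "value": [90.0, 100.0, 20.0, 5.0],
})

is_population = sample["category"] == "Population"
is_programs = sample["category"] == "Programs"
recent = sample["year"] >= 2012

# & keeps rows where both conditions hold
population_recent = sample[(is_population) & (recent)]

# | keeps rows where at least one condition holds
population_or_programs = sample[(is_population) | (is_programs)]

# Two different category values can never both be true for the same row
neither = sample[(is_population) & (is_programs)]

# isin() is an alternative way to write several | conditions on one column
same_with_isin = sample[sample["category"].isin(["Population", "Programs"])]

print(len(population_recent), len(population_or_programs), len(neither))
```

When a comparison is written directly inside the brackets, as in `sample[(sample["year"] >= 2012) & (sample["category"] == "Population")]`, the parentheses are required because Python evaluates `&` before `>=` and `==`.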

---
### Challenge 1a

Now that you know how to filter data using conditions, use the code cells below to find out how many "Volunteers" there were in 2015. (Hint: use `year` and `category` as two conditions.)

---

### Challenge 1b

Like any good data scientist, explore the data by manipulating the dataframe to learn something about the data. You are free to choose any `category`, `measure`, `year` or `value` in the dataframe.
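
One possible starting point for exploring is to select a few columns and sort the rows. The sketch below uses an invented dataframe called `sample` with a placeholder measure name, so treat it as a pattern to adapt rather than as real results.

```python
import pandas as pd

# Hypothetical stand-in for rawData (values and measure name are invented)
sample = pd.DataFrame({
    "category": ["Volunteers", "Volunteers", "Volunteers"],
    "measure": ["Example measure", "Example measure", "Example measure"],
    "year": [2013, 2014, 2015],
    "value": [7.0, 12.0, 9.0],
})

# A list of column names inside the square brackets selects several columns
year_and_value = sample[["year", "value"]]

# sort_values() orders the rows; ascending=False puts the largest value first
largest_first = year_and_value.sort_values("value", ascending=False)
print(largest_first)
```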

## Simple Summary Statistics

We can use built-in Pandas methods to find the average, maximum, and minimum values in the data set. Run the cell below to compute these values for the served population.

In [None]:
# Access the 'value' column within the subset containing the 'Population' category
# and compute the average size of the population between 2001 and 2016 using the mean() method
print("The average served population size between 2001 and 2016 is: ")
print(rawData[condition_1]["value"].mean())

# Same thing, except print the maximum 
print("The maximum served population size between 2001 and 2016 is: ")
print(rawData[condition_1]["value"].max())

# ... and minimum
print("The minimum served population size between 2001 and 2016 is: ")
print(rawData[condition_1]["value"].min())
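
The three statistics can also be computed in a single call with `agg()`. This is a sketch on an invented dataframe called `sample`, not the real population figures.

```python
import pandas as pd

# Hypothetical stand-in for rawData (values are invented)
sample = pd.DataFrame({
    "category": ["Population", "Population", "Population", "Open days"],
    "year": [2013, 2014, 2015, 2015],
    "value": [90.0, 100.0, 110.0, 350.0],
})

# Keep only the 'value' column of the 'Population' rows
population_values = sample[sample["category"] == "Population"]["value"]

# agg() accepts a list of method names and returns one result per name
summary = population_values.agg(["mean", "max", "min"])
print(summary)
```

Each result can then be read by name, for example `summary["max"]`.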

---
### Challenge 2

Find the maximum, minimum, and average number of 'Open days' per year. The average is done for you to get you started :) 

---

In [None]:
# Access the 'value' column within the subset containing the 'Open days' category
# and compute the average number of open days between 2001 and 2016 using the mean() method
condition_2 = rawData["category"]=='Open days'
print("The average number of days the library was open between 2001 and 2016 is: ")
print(rawData[condition_2]["value"].mean())



## Simple Visualization

It's not easy to see patterns and trends in data by looking at tables, so let's create visualizations (graphs) of our data. We'll use the `cufflinks` Python library. 

In [None]:
# If you get the error 'no module named cufflinks' then uncomment the following line:
#!pip install cufflinks ipywidgets

# load the Cufflinks library and call it by the short name "cf"
import cufflinks as cf

# command to display graphics correctly in Jupyter notebook
cf.go_offline()

def enable_plotly_in_cell():
    # Load require.js and initialize Plotly before each cell runs
    from IPython.display import display, HTML
    from plotly.offline import init_notebook_mode
    display(HTML('''<script src="/static/components/requirejs/require.js"></script>'''))
    init_notebook_mode(connected=False)
    
get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)

print("Successfully imported Cufflinks and enabled Plotly!")

### Statistics About Library Program Usage

We want to know how many people have used library program services, and compare that to both the number of program services and the total population size.

We can specify conditions to access that subset of the data: 

`condition_1 = rawData["category"]=='Population'`

will let us access all data points containing the size of served population, and:

`condition_3 = rawData["category"]=='Programs'`

will let us access all data points containing the number of programs offered as well as the number of people who accessed the programs. 

Remember that these categories are mutually exclusive, so we combine the two conditions with the `|` operator.

We can then plot the rows that satisfy `condition_1` or `condition_3` to learn things from the data. 

In the plot below, we'll use a `scatter` plot, where the data are categorized by the `measure` column. On the x-axis is the `year` and the y-axis contains the `value`.

We'll also add a `title` so we know what the graph represents. 

In [None]:
# set up conditions (same as shown above)
condition_1 = rawData["category"]=='Population'
condition_3 = rawData["category"]=='Programs'

rawData[ (condition_1)| (condition_3)].iplot(
        kind="scatter",mode='markers',
        y="value",x="year",text='measure',categories='measure',
        title="Population served, compared against Program Participants and Number of Program Sessions")

#### Observations

We picked this category to find out how the population size, number of program participants, and number of program sessions have changed over time. 

We observe an upward trend in the total population served per year.

We also observe an upward trend in the number of program participants. This is interesting, given that the number of program sessions mostly didn't change. 

This suggests that the increase in number of participants is related to the served population size, not to the number of program sessions. 
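
One way to check a reading like this numerically is to reshape the data so each measure becomes its own column, then compute participants per session. The sketch below uses an invented dataframe called `sample`; the measure names `"Sessions"` and `"Participants"` are assumptions for illustration and may not match the names in the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the 'Programs' rows of rawData (values are invented)
sample = pd.DataFrame({
    "category": ["Programs"] * 4,
    "measure": ["Sessions", "Participants", "Sessions", "Participants"],  # assumed names
    "year": [2014, 2014, 2015, 2015],
    "value": [10.0, 200.0, 10.0, 300.0],
})

# pivot() turns each measure into its own column, with one row per year
wide = sample.pivot(index="year", columns="measure", values="value")

# Participants per session: a rising value means each session is busier on average
wide["per_session"] = wide["Participants"] / wide["Sessions"]
print(wide)
```

With the real data, check the actual names with `rawData["measure"].unique()` before pivoting.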

---
### Challenge 3

Let's visualize multiple categories.

1. Run the cell below and look at the unique category values. Pick one that you are interested in.  
2. In the next cell, look at the plot using the `'Collections'` category. Now replace `'Collections'` with the value you chose, and change the `given_title` variable to give your plot an appropriate name. Run that cell. 
3. Hover over the plot. What do you observe?

---

In [None]:
rawData["category"].unique()

In [None]:
# Change the input below
# Substitute the value to be selected under 'category'
condition_4 = rawData["category"]=="Collections"
# Give the plot an appropriate title
given_title = "Library Collections"

# Run this cell when you have an appropriate value under category and title for the plot
rawData[ (condition_4)].iplot(kind='scatter',mode='markers',y="value",
        x="year",text='measure',categories='measure',
        title=given_title)

#### Observations

Double click this cell and state:

1. The category you picked
2. Why you were interested
3. What you learned from the graph

---
### Challenge 4

Create visualizations of more categories.

Think about how you would grade the library's performance based on the key indicators.

---

# Conclusions

Edit this cell to describe **what you would recommend for Strathcona County to bring and provide on Mars**, based on the dataset containing "Library Key Performance Indicators". Include any data filtering and sorting steps that you recommend, and why you would recommend them.



## Reflections

Write about some or all of the following questions, either individually in separate markdown cells or as a group.
- What is something you learned through this process?
- How well did your group work together? Why do you think that is?
- What were some of the hardest parts?
- What are you proud of? What would you like to show others?
- Are you curious about anything else related to this? Did anything surprise you?
- How can you apply your learning to future activities?

![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)