# CIC Python Workshop 

## Learning Objectives

* Explain what a library is, and what libraries are used for.
* Load a Python library and use the things it contains.
* Read tabular data from a file into a program.
* Assign values to variables.
* Select individual values and subsections from data.
* Perform operations on arrays of data.
* Display simple graphs.

Words are useful, but what’s more useful are the sentences and stories we build with them. Similarly, while a lot of powerful, general tools are built into languages like Python, specialized tools built up from these basic units live in libraries that can be called upon when needed.

In order to load our inflammation data, we need to import a library called __pandas__. pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It is well suited for many different kinds of data:

* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet.
* Ordered and unordered (not necessarily fixed-frequency) time series data.
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.
* Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure.

We can load _pandas_ using:

In [None]:
import pandas

Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries add functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Once we’ve loaded the library, we can ask it to read our data file for us:

In [None]:
df = pandas.read_csv('data/inflammation-01.csv')
df.index.name = 'patient_id'
df

The expression __`pandas.read_csv(...)`__ is a function call that asks Python to run the function __`read_csv`__, which belongs to the __`pandas`__ library. This dotted notation is used everywhere in Python to refer to the parts of things as __`thing.component`__.
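
As a small illustration of the dotted notation, the sketch below uses the standard library's __`math`__ module as a stand-in for pandas (an assumption made only so the example runs without a data file). __`math.sqrt`__ is a function that lives inside the library, and __`math.pi`__ is a value stored there.

```python
import math

# library.function(...): call the sqrt function that belongs to math
root = math.sqrt(16)

# library.value: look up the constant pi stored in math
circumference = 2 * math.pi * 1.0

print('square root of 16:', root)
print('circumference of a unit circle:', circumference)
```

The same pattern applies to __`pandas.read_csv`__: the name before the dot is the library, and the name after it is the part we want to use.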

__`pandas.read_csv`__ has one required parameter: the name of the file we want to read. In this lesson the file name is given as a character string (or string for short), so we put it in quotes.
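
A minimal sketch of such a call, assuming a made-up two-line CSV held in memory instead of the workshop's data file. __`pandas.read_csv`__ also accepts a file-like object such as __`io.StringIO`__, which keeps the example self-contained. The optional __`header=None`__ argument tells pandas that the first line is data rather than column names. Whether the workshop file has a header row is not stated here, so treat that argument as an illustration only.

```python
import io
import pandas

# Made-up stand-in for a small CSV file with no header row.
csv_text = "0,1,3\n0,2,1\n"

# header=None: treat the first line as data, not as column names.
sample = pandas.read_csv(io.StringIO(csv_text), header=None)
sample.index.name = 'patient_id'

print(sample.shape)   # (number of rows, number of columns)
print(sample)
```

Without __`header=None`__, pandas uses the first line of the file as column labels, so it is worth checking which case applies to a given file.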

When we are finished typing and press Shift+Enter, the notebook runs our command. Because the last line of the cell is the bare name __`df`__, the notebook displays its value, which is the table we just loaded. By default, a large table is shown only in part, with ... marking the rows and columns that were omitted.
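
If the default display is too long or too truncated, a table can also be inspected in smaller pieces. The sketch below assumes a made-up table (the column names __`day_0`__ and __`day_1`__ are hypothetical, not taken from the workshop data) and uses __`shape`__ to report its size and __`head`__ to show only the first few rows.

```python
import pandas

# Made-up table: 100 rows of illustrative numbers, not the workshop data.
table = pandas.DataFrame({'day_0': range(100), 'day_1': range(100, 200)})

first_rows = table.head(3)   # keep only the first 3 rows

print(table.shape)           # (number of rows, number of columns)
print(first_rows)
```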

Our call to __`pandas.read_csv`__ read our file, and because we assigned the result to the variable __`df`__, the data stay in memory for later use. A variable is just a name for a value, such as __`x`__, __`current_temperature`__, or __`subject_id`__. Python’s variable names must begin with a letter or an underscore and are case sensitive. We can create a new variable by assigning a value to it using __`=`__. As an illustration, let’s step back and instead of considering a table of data, consider the simplest “collection” of data, a single value. The line below assigns the value __55__ to a variable __`weight_kg`__:

In [None]:
weight_kg = 55

Once a variable has a value, we can print it to the screen:

In [None]:
print(weight_kg)

and do arithmetic with it:

In [None]:
print('weight in pounds:', 2.2 * weight_kg)

As the example above shows, we can print several things at once by separating them with commas.
We can also change a variable’s value by assigning it a new one:

In [None]:
weight_kg = 57.5
print('weight in kilograms is now:', weight_kg)

If we imagine the variable as a sticky note with a name written on it, assignment is like putting the sticky note on a particular value:

![Variables as Sticky Notes](figures/python-sticky-note-variables-01.svg)
<center>__Figure: Variables as Sticky Notes__</center>

This means that assigning a value to one variable does not change the values of other variables. For example, let’s store the subject’s weight in pounds in a variable:

In [None]:
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)

![Variables as Sticky Notes](figures/python-sticky-note-variables-02.svg)
<center>__Figure: Creating Another Variable__</center>

and then change weight_kg:

In [None]:
weight_kg = 100.0
print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)

![Variables as Sticky Notes](figures/python-sticky-note-variables-03.svg)
<center>__Figure: Updating a Variable__</center>

Since __`weight_lb`__ doesn’t “remember” where its value came from, it isn’t automatically updated when __`weight_kg`__ changes. This is different from the way spreadsheets work.
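
To bring __`weight_lb`__ up to date after __`weight_kg`__ changes, the calculation has to be run again. A minimal sketch, reusing the values from the cells above (the name __`stale_lb`__ is introduced here only for illustration):

```python
weight_kg = 57.5
weight_lb = 2.2 * weight_kg      # computed from the old weight

weight_kg = 100.0                # this line does not touch weight_lb
stale_lb = weight_lb             # keep the out-of-date value for comparison

weight_lb = 2.2 * weight_kg      # recompute explicitly with the new weight

print('stale value in pounds:', stale_lb)
print('recomputed value in pounds:', weight_lb)
```

In a notebook, the same effect comes from re-running the cell that defines __`weight_lb`__ after changing __`weight_kg`__.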