# Basic Python Workshop - Introduction
In this workshop we will go over importing packages and basic data structures.

For this session we will only import standard, publicly available Python packages and will not need to interface with our custom software. We will work with numpy for numerical analysis, pandas for its convenient data table manipulations, and plotnine for plotting.

# How to import a package
There are four main ways to import a package.
1. `import package`

    This imports everything in the package and labels all functions and attributes of that package with the package name. For example, if you were to use `import numpy` you would then call the `arange` function from `numpy` by using `numpy.arange`.

2. `import package as shortname`

    This imports the package and labels all functions and attributes of that package with the `shortname` you defined. For example, if you were to use `import numpy as np` you would then call the `arange` function from `numpy` by using `np.arange`.

3. `from package import *`

    This imports all functions and attributes within the package without tagging them as part of the same package. The `*` here means "everything". For example, if you were to use `from numpy import *`, you would then call the `arange` function from `numpy` by using `arange`. This can be dangerous if two packages define a function with the same name and you import the contents of both of them, because the later import silently replaces the earlier one.

4. `from package import function` or `from package import function as shortfun` or `import package.submodule as shortname`

   This makes only the listed function (or submodule) available by name instead of the entire package. The second option is the same as the first, except that the function is given a short name. The third option only works when the thing being imported is itself a module, such as `numpy.linalg`; it fails for a plain function such as `numpy.arange`. If the object you are importing has attributes of its own, the `as` forms are usually useful because you can use a shortened name to call additional attributes (like in bullet point 2).
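
The import styles above can be compared side by side. This is a minimal sketch using `numpy`; the variable names (`a`, `b`, `c`, `d`, `vector_length`) are chosen only for illustration, and the star import is left out on purpose because of the name clash risk described in option 3.

```python
# Option 1: import the whole package under its full name
import numpy
a = numpy.arange(3)

# Option 2: import the package under a short alias
import numpy as np
b = np.arange(3)

# Option 4: import a single function, with or without a short name
from numpy import arange
from numpy import arange as ar
c = arange(3)
d = ar(3)

# Importing a submodule under a short name (numpy.linalg is a module)
import numpy.linalg as la
vector_length = la.norm(b)

print(a, b, c, d)
print(vector_length)
```

All four calls build the same array; only the name used to reach `arange` differs.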

Don't worry too much about the details of how to use these packages yet. We will get to that later in the workshop.

## First Task
For now, let's import two packages
1. `numpy` as `np`
2. `pandas` as `pd`


# Data Types

The most commonly used data types that we will cover are:
* `int` - integer
* `float` - floating point number (decimal places)
* `str` - string
* `list` - an ordered list of objects surrounded by square brackets [ ] and separated by commas
* `dict` - a dictionary of keys and values (insertion order is preserved in Python 3.7 and later)
* `numpy.ndarray` - an array object (think vector or matrix)
* `pandas.DataFrame` - a dataframe object (think table)

If you want to see all the attributes or functions you can use on each of these object types, use the `dir` function. Ex: `dir(list)`

These data types can be sorted into two categories:
1. Mutable: `list`, `dict`, `numpy.ndarray`, `pandas.DataFrame`
2. Immutable: `str`, `float`, `int`

Immutable objects cannot be changed once created, while mutable ones can. This is important because when you are assigning a variable, you are not creating a new object - but instead stating that the variable references an object.

For example, if we say `x = 1`, this means that `x` references the object `1`. If we then say `x = x+1`, `x` now references the object `2`. You can use the `id` function to show that the identity of the object `x` references has changed between these two steps. This happens because 1 and 2 are `int`s and are immutable: the original object cannot be modified, so `x` is pointed at a different object instead.
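
A short sketch of this check is shown here. The name `first_object` is introduced only for illustration, so that the original object can still be inspected after `x` is reassigned.

```python
x = 1
first_object = x      # a second name for the object 1
x = x + 1             # x now references a different object, 2

print(x)                          # 2
print(first_object)               # 1
print(id(x) == id(first_object))  # False
```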

What happens now if we try something similar with a mutable object?

Let's try with a `list`. We will first define `x = [1,2,3]`. This is a list with elements 1, 2, and 3. Then we will change the value of the first element in this list to 0 by using `x[0] = 0`. You will see that the id of the variable x stays the same.
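
The same check for a list could look like the sketch below; `id_before` and `id_after` are illustrative names.

```python
x = [1, 2, 3]
id_before = id(x)

x[0] = 0              # change the first element in place
id_after = id(x)

print(x)                      # [0, 2, 3]
print(id_before == id_after)  # True
```

The contents changed, but `x` still references the same list object.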

### Mutability
#### Why is mutability a problem?
This can become a problem if you want to make a copy of a mutable object and then manipulate it. Say you have a list `x = [1,2,3]` which contains some original set of data. Let's say you then want to make a copy of this list with some additional data added to it. You would like to retain both versions. Because a `list` is mutable, any changes you make to it will propagate to every variable that references it.

Let's see how that plays out:
We'll define `y=x`. Then we will `append` a new element to `x` and see what happens to `y`.
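
A minimal version of that experiment:

```python
x = [1, 2, 3]
y = x            # y is a second name for the same list, not a copy

x.append(4)

print(y)         # [1, 2, 3, 4]
print(y is x)    # True
```

Even though only `x` was appended to, `y` shows the new element because both names reference one list.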

#### How to address mutability
If you want to copy and change a mutable object you can get around this issue by using the `.copy()` method. Instead of `y=x`, writing `y = x.copy()` makes `y` a separate copy of `x`. Then when you change `x`, `y` remains unchanged.
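
The sketch below shows the copy in action. It also illustrates one caveat that goes slightly beyond the text above: `.copy()` is a shallow copy, so lists nested inside a list are still shared, and `copy.deepcopy` from the standard library is one way to copy those as well. The names `nested`, `shallow`, and `deep` are illustrative.

```python
import copy

x = [1, 2, 3]
y = x.copy()     # y is a new list with the same elements
x.append(4)

print(x)         # [1, 2, 3, 4]
print(y)         # [1, 2, 3]

# Caveat: a shallow copy still shares any nested mutable objects
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append(99)

print(shallow[0])   # [1, 2, 99]
print(deep[0])      # [1, 2]
```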

---------
## Floats, strings, and lists
### Second Task - part 1

Define several objects of different types:
* Assign a variable `x` equal to an integer of your choice
* Assign a variable `y` to a float of your choice
* Assign a variable `z` to a string of your choice
* Assign a list `L` to be equal to a list of the above three variables
* Use the `print` function to output the value of the list `L` that you defined

In [1]:
#use this cell to define the variables


In [2]:
#print L in this cell


There are several ways to interact with lists.

You can use functions (`list.function(*input)`):
* `.append()` - adds object to end of the list
* `.clear()` - removes all items from the list
* `.copy()` - creates a copy of the list
* `.count()` - return number of times the input of the function appears in the list
* `.extend()` - extend a list by adding a new list or other iterable to the end
* `.index()` - returns the index of the input value
* `.insert()` - used as `list.insert(index, object)` - inserts the object at a location before the index
* `.pop()` - remove the entry at the index used in the input to this function and return that object
* `.remove()` - remove the first occurrence of the value in the input of this function
* `.reverse()` - reverse the order of the list
* `.sort()` - sort list into ascending order
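
As a quick illustration, here are several of these methods applied in sequence to a made-up list called `values`; the comments track the state of the list after each call.

```python
values = [3, 1, 2]

values.append(5)             # [3, 1, 2, 5]
values.extend([1, 4])        # [3, 1, 2, 5, 1, 4]
n_ones = values.count(1)     # 2
first_one = values.index(1)  # 1 (index of the first occurrence)
values.insert(0, 10)         # [10, 3, 1, 2, 5, 1, 4]
popped = values.pop(0)       # returns 10, list is [3, 1, 2, 5, 1, 4]
values.remove(1)             # [3, 2, 5, 1, 4]
values.reverse()             # [4, 1, 5, 2, 3]
values.sort()                # [1, 2, 3, 4, 5]

print(values)
```

Note that most of these methods change the list in place and return `None`, so write `values.sort()` rather than `values = values.sort()`.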

Besides using these functions you can also interact with lists by using indices. Python begins indexing with 0. So the first element of L is `L[0]`. The last element of L is `L[-1]`. And you can select a subset of values from L by using python index notation `first:last:step` where `first` is the first value to keep, `step` is the step size (1 corresponds to keeping every element, 2 corresponds to every other, etc), and `last` is one index past the last element you wish to keep.

If `first` is missing, the counting starts at index 0. If `last` is missing, the counting goes to the last element of the list. If `step` is missing, you count every element between first and last.

Ex: `L[::2]` selects every other element of `L`. `L[:3]` selects the first three elements of `L` (`[L[0], L[1], L[2]]`).
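
The indexing rules can be tried out on a small example list. The list `letters` below is made up for illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[0])      # a
print(letters[-1])     # f
print(letters[1:4])    # ['b', 'c', 'd']  (index 4 is not included)
print(letters[:3])     # ['a', 'b', 'c']
print(letters[3:])     # ['d', 'e', 'f']
print(letters[::2])    # ['a', 'c', 'e']
print(letters[::-1])   # ['f', 'e', 'd', 'c', 'b', 'a']
```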


### Second Task - part 2
Use the functions above to create a new variable L2 which is produced by removing the string in L and sorting the two numbers in numerical order

In [3]:
#define L2 here


----------

## Numpy Arrays
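The most convenient data type for mathematical analysis is the `Numpy` array (an `ndarray` object). This is an n-dimensional array of homogeneous data types (so all `int`, `float`, or `str`). If you mix element types within your array, all elements are converted to a single common type that can represent every one of them (for example, mixing `int` and `float` gives `float`, and mixing numbers with a `str` gives strings).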

A `Numpy` array is created by using the command `x = np.array(list)`. The list in the array function can be explicitly written out or a previously defined object.
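
The following sketch creates arrays from lists and inspects the resulting element type through the `dtype` attribute. The array names are illustrative, and the exact integer and string dtypes printed can vary by platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])          # all integers
mixed = np.array([1, 2.5, 3])       # int and float together
texts = np.array([1, 2.5, 'a'])     # numbers and a string together

print(ints.dtype)    # an integer dtype, e.g. int64
print(mixed.dtype)   # float64
print(texts.dtype)   # a string dtype, e.g. <U32
print(texts)         # every element has been converted to a string
```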

When applying mathematical operations to an array, multiplication, addition, and truth statements (>, <, ==) are done elementwise.

Matrix multiplication can be performed by using `@` in place of the `*` symbol, or by using `np.dot()`. The outer product of two vectors is available as `np.outer()`.
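
To make the difference between elementwise and matrix operations concrete, here is a small sketch with a 2x2 matrix `m` and a vector `v`, both made up for illustration.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
v = np.array([1, 2])

elementwise = m * m         # [[1, 4], [9, 16]]
mask = m > 2                # [[False, False], [True, True]]
matrix_product = m @ m      # [[7, 10], [15, 22]]
matrix_vector = m @ v       # [5, 11]
inner = np.dot(v, v)        # 5
outer = np.outer(v, v)      # [[1, 2], [2, 4]]

print(elementwise)
print(matrix_product)
```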

### Third Task
* Define a 1 dimensional array with any 5 elements you desire
* Print the array
* Print the 2nd element in the array
* Multiply this array by 2
* Use `np.dot` and `np.outer` to take the inner and outer product of the array with itself
* Define a second array of the same size
* Add the two arrays together

In [4]:
#define an array


In [5]:
#print the array


In [6]:
#print the second element of the array


In [7]:
#multiply the array by 2


In [8]:
#calculate the inner product


In [9]:
#calculate the outer product


In [10]:
#define a second array


In [11]:
#add the two arrays


------------
## Dictionaries
A dictionary is an extremely useful data structure that consists of key-value pairs. The purpose of a dictionary is to allow for indexing by a key, instead of by order. A dictionary is created by using the syntax `{key: value}`. A key may not be a mutable object, such as a list or another dictionary. Each key must also be unique: if you assign a value to a key that already exists, the previous value is overwritten. A value can be any object. Some useful methods for a dictionary are:

* `.get(key)` - returns the value of a specified key, or `None` if the key is missing (you can also use `dict[key]`, which raises a `KeyError` for a missing key)
* `.pop(key)` - removes key from dictionary and returns value that was removed
* `.keys()` - returns the keys of the dictionary
* `.values()` - returns the values of the dictionary
* `.items()` - returns the key-value pairs of the dictionary as a sequence of tuples (useful for iterating)
* `.update(new_dict)` - adds the given dictionary to the dictionary, either updating existing keys or adding new ones
* `.setdefault(key, value)` - returns the value with the specified key; if the key does not exist, inserts it with the value
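
A short sketch of these methods on a made-up dictionary of units is shown here. The `clearance`, `dose`, and `time` entries are hypothetical and exist only to show how entries are added and removed.

```python
units = {'BW': 'kg', 'volume_central': 'L'}

bw_unit = units['BW']                          # 'kg'
missing = units.get('clearance')               # None, no error raised
fallback = units.get('clearance', 'unknown')   # 'unknown'

units['clearance'] = 'L/h'                     # add a new key
units.update({'BW': 'g', 'dose': 'mg'})        # overwrite BW, add dose

existing = units.setdefault('dose', 'ug')      # 'mg', key already present
inserted = units.setdefault('time', 'h')       # 'h', key was added
removed = units.pop('clearance')               # 'L/h'

for key, value in units.items():
    print(key, value)
```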

### Fourth Task

* Create a body weight dictionary named BW_dict for human, cyno, and mouse. The keys should be the species names and the values should be the body weights.

  Note, classic body weights: human = 70, cyno = 3.5, mouse = 0.02
* Call and print the cyno body weight
* Add an entry for rat (bw = 0.25)
* Print the dictionary to see that everything has been added correctly

In [12]:
# create a dictionary:


In [13]:
#print the cyno body weight


In [14]:
#add an entry for rat


In [15]:
# print dictionary


----------------
## Data Frames

Python uses the `pandas` package to create and manipulate data frames. A data frame is a table with labelled columns and rows. Data frames are useful because they can have a mix of different data types as entries. This is particularly helpful when working with string based labels (like species, units, etc) and float or integer data (measurements or parameters).

One of the most commonly used outputs of the QSP notebook is in the form of a data frame, and client data is typically imported as a data frame.

A dataframe can be created in a similar manner as a dictionary by using `df = pd.DataFrame({'column': [list of entries]})`. We will cover importing a dataframe in the introductory plotting notebook.

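To call a column from the dataframe you can use either `df['column']` or `df.column`. This second option only works if there are no spaces in the column name and the name does not clash with an existing data frame attribute or method.

Rows in a data frame are labeled by `index`. By default these are integers starting at zero, though you may turn a non-integer column into indices. To call specific rows you may use slicing similar to list and array indexing, `df.loc[start:stop:step]`. Note that `.loc` slices by index label and, unlike list slicing, includes the `stop` label; use `df.iloc[start:stop:step]` for position-based slicing that excludes `stop`.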
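
The sketch below builds a small data frame from a dictionary and selects columns and rows. The column names and values are hypothetical. It also shows that label-based slicing with `.loc` includes the end label, while position-based slicing with `.iloc` excludes the end position, as list slicing does.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    'species': ['mouse', 'rat', 'mouse', 'rat'],
    'dose': [1.0, 1.0, 3.0, 3.0],
    'response': [0.2, 0.3, 0.5, 0.7],
})

species_a = df['species']     # column by name
species_b = df.species        # same column, attribute style

rows_loc = df.loc[0:2]        # labels 0, 1, and 2 (end label included)
rows_iloc = df.iloc[0:2]      # positions 0 and 1 (end position excluded)
every_other = df.loc[::2]     # labels 0 and 2

print(rows_loc)
print(rows_iloc)
```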

Some useful methods for data frames are listed here:
* `.query('condition')` - calls all rows that satisfy a specific condition. Note the condition must be listed as a string and is typically regarding the value of entries in a specific column.
* `.groupby([list of columns])` - creates a grouped object collecting all row entries that share the same listed column values
* `.groupby([list of columns]).agg(summary statistic)` - creates a new dataframe that summarizes the remaining columns after grouping
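
As an illustration, these methods can be applied to a small hypothetical data frame. The column names and values are made up, and the summary statistics `'mean'` and `'max'` are just two of the names that `.agg` accepts.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    'species': ['mouse', 'rat', 'mouse', 'rat'],
    'dose': [1.0, 1.0, 3.0, 3.0],
    'response': [0.2, 0.3, 0.5, 0.7],
})

# Rows that satisfy a condition, written as a string
mice = df.query("species == 'mouse'")
high_dose = df.query("dose > 1")

# Group rows by species, then summarize the response column
summary = df.groupby(['species'])['response'].agg(['mean', 'max'])

print(mice)
print(summary)
```

The result of `.agg` is a new data frame with one row per group and one column per summary statistic.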

### Fifth Task

* Create a dataframe for the following minimal dummy data set:
$$\begin{array}{|c|c|c|c|}
\hline\text{\textbf{species}}& \text{\textbf{output}} & \text{\textbf{unit}} & \text{\textbf{measurement}}\\ \hline \hline
 \text{human}& \text{BW}& \text{kg}& \text{68.5} \\\hline
 \text{human}& \text{BW}& \text{kg}& \text{76.57} \\\hline
 \text{human}& \text{BW}& \text{kg}& \text{68.7} \\\hline
 \text{cyno}& \text{BW}& \text{kg}& \text{3.11} \\\hline
 \text{cyno}& \text{BW}& \text{kg}& \text{2.61} \\\hline
 \text{cyno}& \text{BW}& \text{kg}& \text{3.87} \\\hline
 \text{human}& \text{volume\_central}& \text{L}& \text{2.85} \\\hline
 \text{human}& \text{volume\_central}& \text{L}& \text{3.24} \\\hline
 \text{human}& \text{volume\_central}& \text{L}& \text{3.01} \\\hline
 \text{cyno}& \text{volume\_central}& \text{L}& \text{0.154} \\\hline
 \text{cyno}& \text{volume\_central}& \text{L}& \text{0.153} \\\hline
 \text{cyno}& \text{volume\_central}& \text{L}& \text{0.146}\\\hline
\end{array}$$
* Use `display(...)` to display the data frame you created
* Display just the human measurements
* Display just the body weight measurements
* Group the data frame by species and output and calculate the mean and standard deviation of the measured values
  

In [16]:
# create a data frame


In [17]:
# display the data frame


In [18]:
#display human measurements


In [19]:
#display BW


In [20]:
# group by species and output and take statistics
