# Python and Numpy Lab

The popularity of Python can be attributed to its flexibility and the large number of open-source packages that enhance its capabilities.  Here, we will practice some basic Python and get familiar with Numpy.

Python itself is a general-purpose programming language that is easy to learn.  Readability and clarity are central tenets of writing good Python code, and the language is reliable and stable.  However, it is often criticized as being "slow," which is true when compared to lower-level languages like C.  For large numerical calculations, this limitation of core Python can be restrictive.  Numpy, a library that implements its numerical operations in C, is therefore often used instead to recover much of that performance, and its syntax is often simpler as well.  The primary goal of this lab is to get comfortable using Python, but we also want to demonstrate how useful Numpy is in accelerating your analyses.  We've included the ``%%time`` cell magic where needed to measure how long your cells of code take to execute.

Let's begin by exploring how variables are assigned, in particular how values are cast as ``int`` or ``float`` since they behave differently.  First, let's make some variables.  Let's make two ``int`` values named ``a`` and ``b``, and two ``float`` values named ``c`` and ``d``.

### Basic Python

In [3]:
a = 3
b = 42

c = 3.0
d = 42.0

Print ``a+b`` and ``c+d``.

In [4]:
print(a+b)
print(c+d)

45
45.0


Now, let's divide these same numbers.

In [5]:
print(b/a)
print(d/c)

14.0
14.0
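
Both divisions print ``14.0`` because, in Python 3, the ``/`` operator performs true division and returns a ``float`` even when both operands are ``int``.  The short sketch below (the names ``true_div``, ``floor_div``, and ``floor_div_float`` are ours, not part of the lab) contrasts ``/`` with the floor-division operator ``//`` and checks the resulting types.

```python
a, b = 3, 42      # int values
c, d = 3.0, 42.0  # float values

# "/" is true division in Python 3 and always returns a float
true_div = b / a
# "//" is floor division: int operands give an int result
floor_div = b // a
# Floor division on floats still returns a float
floor_div_float = d // c

print(type(a + b), type(c + d))
print(true_div, type(true_div))
print(floor_div, type(floor_div))
print(floor_div_float, type(floor_div_float))
```

If you need an integer result from a division, ``//`` is usually the operator to reach for.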


Let's practice working with dictionaries by storing these values with their name as the key and then retrieving them.

In [6]:
# store values here

my_dict = {'a':a, 'b':b, 'c':c, 'd':d}

In [9]:
# retrieve values here

print(my_dict['a'])
print(my_dict['b'])

3
42
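
Beyond looking up one key at a time, two other dictionary patterns come up often.  The sketch below rebuilds the same dictionary so that it runs on its own; the key ``'e'`` and the name ``missing`` are hypothetical and only there to show what happens when a key is absent.

```python
a, b, c, d = 3, 42, 3.0, 42.0
my_dict = {'a': a, 'b': b, 'c': c, 'd': d}

# Loop over every key/value pair in insertion order
for name, value in my_dict.items():
    print(name, value)

# .get() returns a default instead of raising KeyError for a missing key
missing = my_dict.get('e', None)
print(missing)
```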


### Python Loops

These have been simple calculations, and in real applications we have to process much more than a few numbers at a time.  We've created a sequence of "data" for you to process.  Create a for loop that iterates through it, multiplies each value by 2, and adds 3.  Save each result to the empty list (``proc_data``) using ``.append()``.  To check that it is working as it should, look at the first 5 entries.  The ``%%time`` magic will time the cell execution.

In [12]:
%%time

data = range(int(1e6))  # range() needs an int; 1e6 on its own is a float
proc_data = []

# Your code goes here
for val in data:
    new_val = 2.*val+3.
    proc_data.append(new_val)
    
print(proc_data[:5])

[3.0, 5.0, 7.0, 9.0, 11.0]
CPU times: user 164 ms, sys: 10.1 ms, total: 174 ms
Wall time: 173 ms
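
As an aside, the same loop can be written more compactly as a list comprehension.  This is only an alternative sketch of the exercise, not the required solution, and it is still pure Python, so it processes the values one at a time.

```python
data = range(1_000_000)

# List comprehension: builds the same list as the for loop with .append()
proc_data = [2.0 * val + 3.0 for val in data]

print(proc_data[:5])
```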


### Numpy Calculations

How does Numpy compare?  First, import the module.

In [13]:
import numpy as np

Now, do the same operation as before (hint: it only takes one short line of code).  Again, print the first 5 entries and check out the time to see the speed-up.

In [15]:
%%time

data = np.arange(1e6)

# Your code goes here
proc_data = 2.*data+3.

print(proc_data[:5])

[ 3.  5.  7.  9. 11.]
CPU times: user 14.3 ms, sys: 3.07 ms, total: 17.4 ms
Wall time: 6.58 ms
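
The ``%%time`` magic only works inside a notebook and measures a single run.  If you want to repeat the comparison in a plain script, one possible approach uses the standard-library ``timeit`` module, as sketched below.  The function names ``python_loop`` and ``numpy_vectorized`` are ours, and the timings you see will depend on your machine.

```python
import timeit
import numpy as np

n = 1_000_000

def python_loop():
    # Pure-Python version: one append per value
    proc = []
    for val in range(n):
        proc.append(2.0 * val + 3.0)
    return proc

def numpy_vectorized():
    # Numpy version: the arithmetic is applied to the whole array at once
    return 2.0 * np.arange(n) + 3.0

# Both approaches should produce the same values
assert np.allclose(python_loop(), numpy_vectorized())

# Average over a few repetitions to smooth out run-to-run noise
repeats = 3
loop_time = timeit.timeit(python_loop, number=repeats) / repeats
numpy_time = timeit.timeit(numpy_vectorized, number=repeats) / repeats
print(f"loop:  {loop_time * 1e3:.1f} ms")
print(f"numpy: {numpy_time * 1e3:.1f} ms")
```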


### Slicing and Masking Numpy Arrays 

Being able to manipulate and extract data from Numpy arrays is an important skill for effectively doing analysis in Python.  Let's consider the following 2D array, initialized to:

In [34]:
array = np.random.rand(5,5)
print(array)

[[0.2214418  0.44836149 0.29246628 0.86303188 0.29546487]
 [0.56926067 0.64908953 0.91493804 0.6948288  0.32511917]
 [0.86064031 0.64800502 0.45651939 0.09780519 0.46871009]
 [0.22567767 0.81827667 0.92711933 0.45929054 0.20265769]
 [0.67379738 0.60825056 0.79439447 0.61328036 0.20413172]]
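
Because ``np.random.rand`` draws fresh values on every call, rerunning the cell changes the array.  If you would like the same numbers each time, one option is a seeded generator, sketched below.  The seed value ``0`` and the names ``rng`` and ``seeded_array`` are arbitrary choices of ours, and the values will differ from the array printed above.

```python
import numpy as np

# A seeded generator produces the same "random" values on every run
rng = np.random.default_rng(0)
seeded_array = rng.random((5, 5))  # 5x5 array of floats in [0, 1)

print(seeded_array.shape)
```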


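Let's pull out the first element of the array.  (``np.random.rand`` draws new values every time its cell runs, so the outputs of the next two cells, which were produced from an earlier draw, do not match the array printed above.)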

In [31]:
first_element = array[0,0]
print(first_element)

0.7548176099348337


What is the first row?

In [32]:
first_row = array[0,:]
print(first_row)

[0.75481761 0.1051799  0.51636522 0.71510762 0.7296728 ]
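
The same indexing syntax extends to columns and sub-blocks.  The sketch below uses a small array with known values (``grid``, a name we introduce here) so that each slice is easy to verify by eye.

```python
import numpy as np

# Values 0..24 arranged in 5 rows and 5 columns
grid = np.arange(25).reshape(5, 5)

first_col = grid[:, 0]   # every row, column 0
last_row = grid[-1, :]   # negative indices count from the end
block = grid[1:3, 2:4]   # rows 1-2 and columns 2-3 (the stop index is excluded)

print(first_col)
print(last_row)
print(block)
```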


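Let's finish by filtering out all values less than 0.5.  First, define a boolean "mask" that is ``True`` wherever a value in the array is greater than 0.5.  Then, filter the values by applying the mask to ``array``, which keeps only the entries where the mask is ``True``.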

In [36]:
mask = array > 0.5
masked_array = array[mask]
print(masked_array)

[0.86303188 0.56926067 0.64908953 0.91493804 0.6948288  0.86064031
 0.64800502 0.81827667 0.92711933 0.67379738 0.60825056 0.79439447
 0.61328036]
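
Note that indexing with a boolean mask returns a flattened 1D array.  The sketch below, on a tiny hand-written array (``grid`` and the other names are ours), shows the mask itself, how to count the selected values, and one way to keep the original 2D shape with ``np.where`` if that is what you need.

```python
import numpy as np

grid = np.array([[0.2, 0.7],
                 [0.9, 0.4]])

mask = grid > 0.5    # boolean array with the same shape as grid
kept = grid[mask]    # 1D array of the values where mask is True
n_kept = mask.sum()  # each True counts as 1

# np.where keeps the 2D shape; rejected values are replaced with NaN
same_shape = np.where(mask, grid, np.nan)

print(mask)
print(kept)
print(n_kept)
print(same_shape)
```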
