# Python and Numpy Lab

The popularity of Python can be attributed to its flexibility and the large number of open source packages that enhance its capabilities.  Here, we will practice some basic Python and get familiar with Numpy.  

Python itself is a general-purpose programming language that is easy to learn.  Readability and clarity are central tenets of writing good Python code, and the language is reliable and stable.  However, it is often criticized as being "slow," which is true when compared to lower-level languages like C.  For large numerical calculations, this limitation of core Python can be restrictive.  Numpy, a library that implements its numerical operations in C, is therefore often used instead to recover much of that performance, and its syntax for array operations is often simpler as well.  The primary goal of this lab is to get comfortable using Python, but we also want to demonstrate how useful Numpy is in accelerating your analyses.  We've included the ``%%time`` cell magic where needed to measure how long your cells take to execute.

Let's begin by exploring how variables are assigned, in particular how values are cast as ``int`` or ``float`` since they behave differently.  First, let's make some variables.  Let's make two ``int`` values named ``a`` and ``b``, and two ``float`` values named ``c`` and ``d``.

### Basic Python

In [9]:
a = 4
b = 5

c = 3.45
d = 6.5
e = 6.50
print ('a+b =', a+b)
print (a+b)
print (c+d)
print (c+e)

a+b = 9
9
9.95
9.95


Print ``a+b`` and ``c+d``.

In [11]:
print(a+b)
print(c+d)

9
9.95

Now, let's divide these same numbers.

In [13]:
a = 4
b = 5

c = 3.45
d = 6.5
e = 6.50
print ('a/b =', a/b)
print ('a*b =', a*b)
print ('c+d =',c+d)
print (c+e)

a/b = 0.8
a*b = 20
c+d = 9.95
9.95
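
To make the difference between ``int`` and ``float`` concrete, here is a minimal sketch that checks the type of each result. It assumes Python 3, where ``/`` always returns a ``float`` and ``//`` performs floor division. The names ``true_div``, ``floor_div``, ``int_sum``, and ``mixed_sum`` are illustrative and not part of the lab.

```python
a = 4
b = 5
c = 3.45
d = 6.5

true_div = a / b      # true division returns a float, even for two ints
floor_div = a // b    # floor division of two ints returns an int
int_sum = a + b       # int + int stays an int
mixed_sum = a + c     # int + float is promoted to float

# Print the type name next to each value
print(type(true_div).__name__, true_div)
print(type(floor_div).__name__, floor_div)
print(type(int_sum).__name__, int_sum)
print(type(mixed_sum).__name__, mixed_sum)
```

Mixing an ``int`` with a ``float`` in one expression produces a ``float``, which is why ``a/b`` above prints ``0.8`` rather than ``0``.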


Let's practice working with dictionaries by storing these values with their name as the key and then retrieving them.

In [21]:
# store values here

my_dict = {'i1':'One','i2':'Two','i3':'Three','i4':'Four'}
i4 = my_dict['i4']
print(i4)
my_dict.items()

Four


dict_items([('i1', 'One'), ('i2', 'Two'), ('i3', 'Three'), ('i4', 'Four')])

In [16]:
# retrieve values here
i4 = my_dict['i4']
print(i4)

Four
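
As a sketch of what the exercise describes, the numeric values from earlier can be stored under their variable names and then looked up again. The dictionary name ``values`` and the missing key ``'z'`` are illustrative choices, not part of the lab.

```python
a = 4
b = 5
c = 3.45
d = 6.5

# Store each value under its variable name as the key
values = {'a': a, 'b': b, 'c': c, 'd': d}

# Retrieve a value by key; a key that is not present raises KeyError
print(values['c'])

# .get() returns a default instead of raising when the key is absent
print(values.get('z', 'not found'))

# Iterate over the key/value pairs in insertion order
for name, value in values.items():
    print(name, '=', value)
```

A lookup only succeeds when the key matches exactly, so ``values['c']`` works while ``values['C']`` would raise a ``KeyError``.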

### Python Loops

These have been simple calculations; in real applications we have to process much more than a few numbers at a time.  We've created a set of "data" for you to process.  Create a for loop that iterates through the data, multiplies each value by 2, and subtracts 3.  Save each result to the empty list (``proc_data``) using ``.append()``.  To check that it is working as it should, look at the first 5 entries.  The ``%%time`` magic will time the cell execution.

In [47]:
%%time

data = range(6)
proc_data = []

# Multiply each value by 2, subtract 3, and append the result
for i in data:
    proc_data.append(i*2 - 3)

# Check the first 5 entries
print(proc_data[:5])

[-3, -1, 1, 3, 5]
CPU times: user 807 µs, sys: 136 µs, total: 943 µs
Wall time: 567 µs


### Numpy Calculations

How does Numpy compare?  First, import the module.

In [None]:
import numpy as np

Now, do the same operation as before (hint: it only takes one short line of code).  Again, print the first 5 entries and check out the time to see the speed-up.

In [53]:
import numpy as np

data = np.arange(6)
print(data)

# The whole operation is a single vectorized expression
proc_data = data*2 - 3
print(proc_data[:5])

[0 1 2 3 4 5]
[-3 -1  1  3  5]
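
With only six values, any timing difference is hard to see. The sketch below repeats the comparison on a larger input, using the standard-library ``timeit`` module so that it also runs outside a notebook. The size ``n`` and the function names ``with_loop`` and ``with_numpy`` are assumptions made for illustration, and the measured times will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000  # hypothetical size, chosen only to make the difference visible


def with_loop():
    """Apply x*2 - 3 one element at a time in pure Python."""
    proc_data = []
    for i in range(n):
        proc_data.append(i * 2 - 3)
    return proc_data


def with_numpy():
    """Apply x*2 - 3 to the whole array in one vectorized expression."""
    return np.arange(n) * 2 - 3


loop_result = with_loop()
numpy_result = with_numpy()

# Both approaches should give the same first 5 entries
print(loop_result[:5])
print(numpy_result[:5])

# Time 10 repetitions of each approach
loop_time = timeit.timeit(with_loop, number=10)
numpy_time = timeit.timeit(with_numpy, number=10)
print(f'loop:  {loop_time:.4f} s')
print(f'numpy: {numpy_time:.4f} s')
```

The Numpy version avoids a Python-level loop entirely, which is where most of the speed-up is expected to come from.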


### Slicing and Masking Numpy Arrays 

Being able to manipulate and extract data from Numpy arrays is an important skill for effectively doing analysis in Python.  Let's consider the following 2D array, initialized to:

In [92]:
array = np.random.rand(5,5)
print(array)
center_element = array[2,2]   # row 2, column 2 (zero-indexed), not the first element
minus = array[-2:]            # last two rows
first_row = array[0]
print('center_element',center_element)
print('----------------------------------')
print('first_row',first_row)
print('----------------------------------')
print('minus',minus)
print('----------------------------------')
mask = array > 0.5            # boolean mask, True where the value exceeds 0.5
masked_array = array[mask]
print(masked_array)

[[0.43249474 0.04548082 0.38249116 0.24703799 0.42809895]
 [0.90598317 0.71182877 0.58902317 0.74005154 0.8705758 ]
 [0.71369843 0.51760861 0.9191388  0.96691854 0.65837104]
 [0.15865303 0.51550533 0.8602662  0.48891993 0.45272948]
 [0.20347077 0.66426669 0.96736087 0.76167716 0.0092583 ]]
center_element 0.9191388048940535
----------------------------------
first_row [0.43249474 0.04548082 0.38249116 0.24703799 0.42809895]
----------------------------------
minus [[0.15865303 0.51550533 0.8602662  0.48891993 0.45272948]
 [0.20347077 0.66426669 0.96736087 0.76167716 0.0092583 ]]
----------------------------------
[0.90598317 0.71182877 0.58902317 0.74005154 0.8705758  0.71369843
 0.51760861 0.9191388  0.96691854 0.65837104 0.51550533 0.8602662
 0.66426669 0.96736087 0.76167716]


Let's pull out the first element of the array:

In [91]:
first_element = array[0, 0]  # row 0, column 0

What is the first row?

In [None]:
first_row = array[0]  # equivalent to array[0, :]
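
Because ``np.random.rand`` gives different values on every run, indexing is easier to check on an array with known contents. The sketch below uses a hypothetical 5x5 array named ``grid`` holding the integers 0 through 24; the other variable names are likewise illustrative.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)  # hypothetical array: rows of 0..4, 5..9, ...

first_element = grid[0, 0]   # row 0, column 0
first_row = grid[0]          # same as grid[0, :]
first_column = grid[:, 0]    # every row, column 0
last_two_rows = grid[-2:]    # negative indices count from the end

print(first_element)         # 0
print(first_row)             # [0 1 2 3 4]
print(first_column)          # [ 0  5 10 15 20]
print(last_two_rows.shape)   # (2, 5)
```

Indices are zero-based, so ``grid[2, 2]`` is the center of a 5x5 array rather than its first element.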

Let's finish by filtering out all values less than 0.5.  First, define a boolean "mask" for the array that is ``True`` wherever a value should be kept.  Then, filter the values by applying the mask to ``array``.

In [None]:
mask = array >= 0.5          # True where the value should be kept
masked_array = array[mask]   # 1D array of the kept values
print(masked_array)
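
To see exactly what a mask does, here is a small sketch on an array with known values. The array ``values`` and its contents are hypothetical and chosen only so the result is easy to verify by eye.

```python
import numpy as np

values = np.array([[0.1, 0.7, 0.5],
                   [0.9, 0.3, 0.6]])  # hypothetical data

mask = values >= 0.5          # boolean array with the same shape as values
masked_array = values[mask]   # selected elements, flattened in row-major order

print(mask)
print(masked_array)           # [0.7 0.5 0.9 0.6]
```

Note that applying a boolean mask returns a 1D array, so the original 2D layout is not preserved in ``masked_array``.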