### Drill: NumPy Basics of Boolean Selection and Types

#### Getting to know our system and NumPy versions
- First, let's import NumPy and check its version.
- If you are running this through ISVC, the output shows the NumPy version installed on the server.
- To check the version on your own system, download this notebook (from the File menu) and run it locally.

In [1]:
import numpy as np
print(np.__version__)

1.16.3


- Create an array of the numbers 0-9 using NumPy. Assign it to a variable named `ar` and print the array.

In [2]:
# [Answer]
ar = np.arange(10)
ar

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Array properties: using the array `ar` from above:
- How many elements are in the array?
- What is the shape of the array?

In [3]:
print("Elements:", ar.size)  # len(ar) gives the same result for a 1-D array
print("Shape:", ar.shape)

Elements: 10
Shape: (10,)
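For a 1-D array, `size` and `len()` agree, but they diverge once an array has more than one axis. The sketch below uses a hypothetical 2-D array named `grid` (not part of the drill) to show the difference.

```python
import numpy as np

# Hypothetical 2-D array, used only to contrast size, len() and shape.
grid = np.arange(12).reshape(3, 4)

n_elements = grid.size   # total number of elements across all axes
n_rows = len(grid)       # length of the first axis only
dims = grid.shape        # one entry per axis
n_axes = grid.ndim       # number of axes

print("size :", n_elements)  # 12
print("len  :", n_rows)      # 3
print("shape:", dims)        # (3, 4)
print("ndim :", n_axes)      # 2
```

Because of this, `size` is the safer choice when you want the element count regardless of dimensionality.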


- Looking at descriptive statistics such as <code>mean</code>, <code>min</code>, <code>max</code>, <code>std</code> and <code>var</code> is one of the first steps in exploring data. Let's compute them using NumPy and the array created above.

In [4]:
print("mean:", ar.mean())
print("min :", ar.min())
print("max :", ar.max())
print("std :", ar.std())
print("var :", ar.var())

mean: 4.5
min : 0
max : 9
std : 2.8722813232690143
var : 8.25
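One detail worth knowing: `std` and `var` default to the population formulas (dividing by `n`). The sketch below shows how the `ddof` parameter switches to the sample formulas (dividing by `n - 1`), which is what some other tools use by default.

```python
import numpy as np

ar = np.arange(10)

pop_std = ar.std()            # population standard deviation (ddof=0, the default)
sample_std = ar.std(ddof=1)   # sample standard deviation, divides by n - 1
pop_var = ar.var()            # population variance: 8.25
sample_var = ar.var(ddof=1)   # sample variance: 82.5 / 9

print("population std:", pop_std)
print("sample std    :", sample_std)
print("population var:", pop_var)
print("sample var    :", sample_var)
```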


- Let's check whether an array contains desired values. Write a boolean expression that tests which values in the NumPy array `ar` are above 7, and assign the resulting boolean array to the variable `over7`.

In [5]:
# [Answer]
over7 = ar > 7
over7  # a boolean array that is True wherever the original array ar is above 7

array([False, False, False, False, False, False, False, False,  True,
        True])

- Use the `over7` array to filter the values in the `ar` array using bracket indexing

In [6]:
ar[over7]
# The over7 mask selects the values of ar at positions where over7 is True, producing a new array

array([8, 9])
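A boolean mask can do more than filter. The following sketch, which reuses `ar` and `over7` from above, shows a few common follow-ups: asking whether any value matches, counting matches, combining conditions, and inverting a mask. The names `middle` and `not_over7` are introduced here only for illustration.

```python
import numpy as np

ar = np.arange(10)
over7 = ar > 7

has_over7 = over7.any()     # True if at least one element is above 7
count_over7 = over7.sum()   # True counts as 1, so the sum is the number of matches

# Combine conditions element-wise with & (and) or | (or).
# The parentheses are required because & binds tighter than the comparisons.
middle = ar[(ar > 2) & (ar < 6)]

# ~ inverts a boolean mask.
not_over7 = ar[~over7]

print(has_over7, count_over7)  # True 2
print(middle)                  # [3 4 5]
print(not_over7)               # [0 1 2 3 4 5 6 7]
```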

##### Changing types of a NumPy Array

- Convert the array `demo` below to an int type; print the array

In [7]:
demo = np.array(range(0,100,3)) / 70
demo

array([0.        , 0.04285714, 0.08571429, 0.12857143, 0.17142857,
       0.21428571, 0.25714286, 0.3       , 0.34285714, 0.38571429,
       0.42857143, 0.47142857, 0.51428571, 0.55714286, 0.6       ,
       0.64285714, 0.68571429, 0.72857143, 0.77142857, 0.81428571,
       0.85714286, 0.9       , 0.94285714, 0.98571429, 1.02857143,
       1.07142857, 1.11428571, 1.15714286, 1.2       , 1.24285714,
       1.28571429, 1.32857143, 1.37142857, 1.41428571])

In [8]:
demo.astype(int) 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
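Note that `astype(int)` drops the fractional part rather than rounding, which is why values such as 0.98571429 become 0 above. The sketch below compares truncation with explicit rounding and flooring on a small hypothetical array named `vals`.

```python
import numpy as np

# Hypothetical values chosen to show the differences, including negatives.
vals = np.array([-1.7, -0.5, 0.5, 1.5, 2.5, 2.7])

truncated = vals.astype(int)           # drops the fractional part (toward zero)
rounded = np.rint(vals).astype(int)    # rounds to nearest, ties go to the even integer
floored = np.floor(vals).astype(int)   # rounds toward negative infinity

print("truncated:", truncated)  # [-1  0  0  1  2  2]
print("rounded  :", rounded)    # [-2  0  0  2  2  3]
print("floored  :", floored)    # [-2 -1  0  1  2  2]
```

If rounding is what you want, round first and convert afterwards.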

- Now convert the array to float32 and float64
- What's the difference between float32 and float64?

In [9]:
demo.astype('float32')

array([0.        , 0.04285714, 0.08571429, 0.12857144, 0.17142858,
       0.21428572, 0.25714287, 0.3       , 0.34285715, 0.3857143 ,
       0.42857143, 0.47142857, 0.51428574, 0.55714285, 0.6       ,
       0.64285713, 0.6857143 , 0.7285714 , 0.7714286 , 0.8142857 ,
       0.85714287, 0.9       , 0.94285715, 0.98571426, 1.0285715 ,
       1.0714285 , 1.1142857 , 1.1571429 , 1.2       , 1.2428571 ,
       1.2857143 , 1.3285714 , 1.3714286 , 1.4142857 ], dtype=float32)

In [10]:
demo.astype('float64')
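# The difference is precision: float32 keeps roughly 7 significant decimal digits, float64 roughly 16.
# That is why the float32 output above differs in the last printed digits (e.g. 0.12857144 vs 0.12857143).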

array([0.        , 0.04285714, 0.08571429, 0.12857143, 0.17142857,
       0.21428571, 0.25714286, 0.3       , 0.34285714, 0.38571429,
       0.42857143, 0.47142857, 0.51428571, 0.55714286, 0.6       ,
       0.64285714, 0.68571429, 0.72857143, 0.77142857, 0.81428571,
       0.85714286, 0.9       , 0.94285714, 0.98571429, 1.02857143,
       1.07142857, 1.11428571, 1.15714286, 1.2       , 1.24285714,
       1.28571429, 1.32857143, 1.37142857, 1.41428571])
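To put numbers on the precision difference, the sketch below inspects the storage size of each type and measures the error introduced by the float32 conversion. `np.finfo(...).precision` reports a conservative count of reliable decimal digits.

```python
import numpy as np

demo = np.array(range(0, 100, 3)) / 70

f32 = demo.astype(np.float32)
f64 = demo.astype(np.float64)

# Storage cost: float32 uses half the memory of float64.
print("bytes per element:", f32.itemsize, f64.itemsize)  # 4 8

# Conservative number of decimal digits each type can represent reliably.
print("float32 digits:", np.finfo(np.float32).precision)  # 6
print("float64 digits:", np.finfo(np.float64).precision)  # 15

# Largest absolute difference between the two representations of demo.
max_err = np.abs(f64 - f32).max()
print("max conversion error:", max_err)
```

The trade-off is memory versus precision: float32 halves the storage, at the cost of small rounding errors like the one measured here.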

- Can we mix types? What's the type of the `test` array below?

In [11]:
test = np.array([True, True, 0.1, 1, 2.5, 7])
print("Type:", test.dtype)
print(test)

Type: float64
[1.  1.  0.1 1.  2.5 7. ]
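The result above follows from NumPy's type promotion: an array holds a single dtype, so mixed inputs are converted to the smallest type that can represent all of them. The sketch below shows a few combinations; the variable names are illustrative only.

```python
import numpy as np

bool_int = np.array([True, 2])        # bool and int promote to an integer dtype
int_float = np.array([1, 2.5])        # int and float promote to float64
with_str = np.array([1, 2.5, "a"])    # adding a string turns everything into strings

print(bool_int.dtype, bool_int)
print(int_float.dtype, int_float)
print(with_str.dtype, with_str)

# np.result_type reports the promoted dtype without building an array.
promoted = np.result_type(np.bool_, np.int64, np.float64)
print(promoted)  # float64
```

The exact integer width of `bool_int` can depend on the platform, so it is not shown as an expected output here.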


Let's explore linspace to make an array:
- Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy-linspace
- numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
  - Return evenly spaced numbers over a specified interval.
  - Returns `num` evenly spaced samples, calculated over the interval [start, stop]
- One use of linspace: evenly spacing ticks on a chart axis

- Make an array named `ln1` below with 11 evenly spaced values from 0 to 100

In [12]:
ln1 = np.linspace(0,100,11)
ln1

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

- Now make a linspace array named `ln2` with 11 evenly spaced values from 0 up to, but not including, 100 (that is, without the endpoint)
- What's the difference between `ln1` and `ln2`?

In [13]:
ln2 = np.linspace(0,100,11, endpoint=False)
ln2

array([ 0.        ,  9.09090909, 18.18181818, 27.27272727, 36.36363636,
       45.45454545, 54.54545455, 63.63636364, 72.72727273, 81.81818182,
       90.90909091])
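The difference comes down to the step size: with the endpoint included, 11 points span 10 intervals (step 10), while without it the same 11 points are spaced 100/11 apart. The sketch below uses `retstep=True` to have `linspace` report the step, and shows that dropping the endpoint with 10 points reproduces `np.arange(0, 100, 10)`. The name `ln3` is introduced only for this illustration.

```python
import numpy as np

# retstep=True returns the array together with the spacing between samples.
ln1, step1 = np.linspace(0, 100, 11, retstep=True)
ln2, step2 = np.linspace(0, 100, 11, endpoint=False, retstep=True)

print("step with endpoint   :", step1)  # 10.0
print("step without endpoint:", step2)  # 100 / 11

# With endpoint=False and 10 points, the values match arange with step 10.
ln3 = np.linspace(0, 100, 10, endpoint=False)
print(ln3)
print(np.allclose(ln3, np.arange(0, 100, 10)))  # True
```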

<h4>Exploratory analysis of array data</h4> 
Let's look at an array that contains <font color='red'>not-a-number</font> or <b>nan</b> values. 

- Find the min, max, mean and median of the array below named `nanvals`. What happened?

In [14]:
nanvals = np.array([np.nan, 123, 4, 5, 10, 50, 535])
print("min :", nanvals.min())
print("max :", nanvals.max())
print("mean:", nanvals.mean())
print("Median:", np.median(nanvals))

min : nan
max : nan
mean: nan
Median: nan


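Every statistic comes back as `nan` because any arithmetic or comparison involving `nan` propagates it. NumPy provides nan-aware counterparts that skip missing values; the sketch below applies them to the same array and also shows filtering the `nan` entries out with a boolean mask.

```python
import numpy as np

nanvals = np.array([np.nan, 123, 4, 5, 10, 50, 535])

# nan-aware versions ignore the missing values.
nan_min = np.nanmin(nanvals)
nan_max = np.nanmax(nanvals)
nan_mean = np.nanmean(nanvals)
nan_median = np.nanmedian(nanvals)

print("min   :", nan_min)     # 4.0
print("max   :", nan_max)     # 535.0
print("mean  :", nan_mean)
print("median:", nan_median)  # 30.0

# Alternative: drop the nan entries first, then use the regular methods.
clean = nanvals[~np.isnan(nanvals)]
print(clean.min(), clean.max())  # 4.0 535.0
```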


<h4>Additional exercises: NumPy runs a lot faster</h4>  <blockquote>A little optional FYI: NumPy delegates many linear algebra operations to optimized BLAS/LAPACK libraries such as ATLAS (see http://math-atlas.sourceforge.net/). NumPy arrays are densely packed arrays of a homogeneous type. Python lists, by contrast, are arrays of pointers to objects, even when all of the objects are of the same type. With NumPy arrays you therefore get the benefits of <font color='red'>locality of reference</font>. When summing integers, for example, the CPU can use specialized vector operations (https://superuser.com/questions/1170062/whats-the-difference-between-a-superscalar-and-a-vector-processor).

Also, many NumPy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The speed boost depends on which operations you are performing, but a few orders of magnitude isn't uncommon in number-crunching programs.</blockquote>

- Use the `%%timeit` magic command to time a code block in Jupyter: first create a range of the numbers 0 up to (but not including) 25000, then build a list that keeps only the even values

In [15]:
%%timeit

demo_list = range(25000)
[x for x in demo_list if x % 2 == 0]

1.76 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


- Now time the same task with NumPy: create a NumPy array of the values 0 up to (but not including) 25000 and keep only the even values

In [16]:
%%timeit

demoNp = np.arange(25000)
demoNp[demoNp % 2 == 0]

254 µs ± 885 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
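The `%%timeit` magic only works inside Jupyter or IPython. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The helper names `evens_list` and `evens_numpy` are hypothetical, and the timings will vary by machine.

```python
import timeit

import numpy as np


def evens_list(n=25000):
    """Keep the even numbers using a pure-Python list comprehension."""
    return [x for x in range(n) if x % 2 == 0]


def evens_numpy(n=25000):
    """Keep the even numbers using a NumPy boolean mask."""
    arr = np.arange(n)
    return arr[arr % 2 == 0]


runs = 100
# timeit.timeit returns the total time in seconds for `number` calls.
t_list = timeit.timeit(evens_list, number=runs) / runs
t_np = timeit.timeit(evens_numpy, number=runs) / runs

print(f"list comprehension: {t_list * 1e6:.1f} µs per call")
print(f"NumPy mask        : {t_np * 1e6:.1f} µs per call")
```

Both functions return the same values, so the comparison measures only the cost of the two approaches.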
