## CMPINF 2100 Week 03
### Manipulate and Summarize 1D NumPy Arrays

We can modify or manipulate 1D NumPy arrays in multiple ways:

* Modify individual values
* Slice/Index to remove elements
* COMBINE arrays together
* Change the ORDER of the values
* SUMMARIZE the values within the array

### Import NumPy

In [1]:
import numpy as np

### Create an array
We will work with a few simple arrays in this notebook.

In [4]:
a = np.arange(6)

In [5]:
a

array([0, 1, 2, 3, 4, 5])

In [6]:
a.ndim

1

In [7]:
a.shape

(6,)

In [8]:
a.size

6

## Modify values
NumPy arrays are MUTABLE, meaning we can change their values!!!

In [9]:
a

array([0, 1, 2, 3, 4, 5])

### Slice or Index
Slicing and indexing work just like slicing/indexing base Python lists!

In [10]:
a[0]

0

In [11]:
a[-1]

5

In [12]:
a[:3]

array([0, 1, 2])

In [13]:
a[3:]

array([3, 4, 5])

In [14]:
a[1:]

array([1, 2, 3, 4, 5])

How can we modify or change a value?
Slice or index to identify that element and assign a new value.

In [15]:
a[0] = 100

In [16]:
a

array([100,   1,   2,   3,   4,   5])
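
Assignment also works on a slice, which may be handy for changing several values at once. The sketch below uses a fresh array named `demo` (a hypothetical name, not part of this notebook) so that `a` is left untouched. It also illustrates one difference from base Python lists: a basic slice of a NumPy array is a view, so changing the slice changes the original array.

```python
import numpy as np

# `demo` is a hypothetical array used only for this illustration
demo = np.arange(6)

# assign one value to every element selected by the slice
demo[:2] = -1

# assign a different value to each element selected by the slice
demo[3:] = [30, 40, 50]

# a basic slice is a VIEW of the original array, not a copy
demo_view = demo[:3]
demo_view[2] = 999   # this also changes demo[2]

# use .copy() when an independent array is needed
demo_copy = demo[:3].copy()
demo_copy[0] = 0     # demo is NOT changed by this

print(demo)
```

If the original values must be preserved, working on a `.copy()` is usually the safer choice.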

But something to keep in mind...

In [17]:
b = np.linspace(-5, 5, num=21)

In [18]:
b

array([-5. , -4.5, -4. , -3.5, -3. , -2.5, -2. , -1.5, -1. , -0.5,  0. ,
        0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ])

In [20]:
b.size

21

In [23]:
a[:3]

array([100,   1,   2])

In [24]:
b[:3]

array([-5. , -4.5, -4. ])
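
One point that may be worth keeping in mind here, assuming the comparison of `a` and `b` above is about data types: `a` holds integers while `b` holds floats, and assignment never changes the data type of an existing array. The sketch below uses hypothetical arrays `int_demo` and `float_demo` rather than the notebook's own objects.

```python
import numpy as np

int_demo = np.arange(6)                   # integer data type
float_demo = np.linspace(-5, 5, num=21)   # float64 data type

print(int_demo.dtype, float_demo.dtype)

# assigning a float into an integer array drops the fractional part
int_demo[0] = 2.7

# assigning an integer into a float array stores it as a float
float_demo[0] = 100

print(int_demo[0], float_demo[0])
```

Checking `.dtype` before assigning is a cheap way to avoid silently losing the fractional part.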

## Conditional subset

Conditional subsetting FILTERS or SLICES an array based on a CONDITIONAL TEST!!!


What's a conditional test?

Is 1 less than 3?

In [25]:
1 < 3

True

In [26]:
5 < 3

False

In [29]:
1 <= 3

True

In [32]:
a[5] = 101

In [41]:
a[0] = -101

In [42]:
a

array([-101,    1,    2,    3,    4,  101])

We do not need to iterate with for loops or comprehensions to do simple operations in NumPy!

In [34]:
a < 3

array([ True,  True,  True, False, False, False])

In [35]:
dir(a)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__class_getitem__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',

NumPy arrays are intended to be HOMOGENEOUS. That means all elements should be the same data type within the array!!

Let's assign the result of the conditional test to a new object!

In [44]:
a_mask = a < 3

In [37]:
%whos

Variable   Type       Data/Info
-------------------------------
a          ndarray    6: 6 elems, type `int64`, 48 bytes
a_mask     ndarray    6: 6 elems, type `bool`, 6 bytes
aa         ndarray    6: 6 elems, type `int64`, 48 bytes
b          ndarray    21: 21 elems, type `float64`, 168 bytes
np         module     <module 'numpy' from '/Ap<...>kages/numpy/__init__.py'>


In [38]:
a_mask

array([ True,  True,  True, False, False, False])

This MASK serves a useful purpose. It identifies ALL elements in the array `a` that SATISFY the conditional test!!

This means we can use the MASK to KEEP or FILTER the elements that meet our test!!!

In [45]:
a[a_mask]

array([-101,    1,    2])

With a separate MASK object, I need to remember which array the MASK was created from and what the conditional test was!!


Instead, we can put the conditional test directly within the slicing/index brackets!!!

In [46]:
a[a < 3]

array([-101,    1,    2])

Return all values of `b` that are less than 3.

In [47]:
b[b < 3]

array([-5. , -4.5, -4. , -3.5, -3. , -2.5, -2. , -1.5, -1. , -0.5,  0. ,
        0.5,  1. ,  1.5,  2. ,  2.5])
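
Conditional tests can also be combined. The sketch below is a minimal illustration using a hypothetical array `vals` with the same values as `b`. NumPy masks are combined with `&` (and), `|` (or), and `~` (not), and each individual test needs its own parentheses.

```python
import numpy as np

vals = np.linspace(-5, 5, num=21)   # same values as `b` in this notebook

# AND: keep values strictly between -1 and 1
between = vals[(vals > -1) & (vals < 1)]

# OR: keep values in either tail
tails = vals[(vals < -4) | (vals > 4)]

# NOT: invert a mask with ~
not_negative = vals[~(vals < 0)]

# a boolean mask can be summed to COUNT the elements that pass the test
n_below_3 = (vals < 3).sum()

print(between)
print(tails)
print(n_below_3)
```

The Python keywords `and`, `or`, and `not` do not work element by element on arrays, which is why the symbol operators are used here.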

## Combine multiple arrays together

Let's create 2 small lists.

In [48]:
a_list = [1, 2, 3]

In [49]:
b_list = [1, 2, 3]

In [50]:
b_list + a_list

[1, 2, 3, 1, 2, 3]

However, we know that the `+` operator does NOT work the same way in NumPy!!! It adds the arrays element by element, so we use `np.concatenate()` to join arrays end to end.

In [53]:
a_array = np.array(a_list)

In [56]:
b_array = np.array(b_list)

In [57]:
b_array + a_array

array([2, 4, 6])

In [58]:
np.concatenate([a_array, b_array])

array([1, 2, 3, 1, 2, 3])

In [61]:
np.concatenate([a_array, b_array, a, b])

array([   1. ,    2. ,    3. ,    1. ,    2. ,    3. , -101. ,    1. ,
          2. ,    3. ,    4. ,  101. ,   -5. ,   -4.5,   -4. ,   -3.5,
         -3. ,   -2.5,   -2. ,   -1.5,   -1. ,   -0.5,    0. ,    0.5,
          1. ,    1.5,    2. ,    2.5,    3. ,    3.5,    4. ,    4.5,
          5. ])

In [62]:
type(np.concatenate([a_array, b_array, a, b]))

numpy.ndarray

The `np.concatenate()` function gives us arrays with all the same attributes and methods as if we made the arrays ourselves!!!

In [64]:
np.concatenate([a_array, a, b, b_array]).ndim

1

In [65]:
np.concatenate([a_array, a, b, b_array]).shape

(33,)

In [66]:
np.concatenate([a_array, a, b, b_array]).size

33
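
The float values in the concatenated output above come from mixing integer and float arrays. The sketch below, with hypothetical arrays `ints` and `floats`, suggests how the result's data type is chosen: since an array is homogeneous, the integers are converted to floats when at least one input holds floats.

```python
import numpy as np

ints = np.array([1, 2, 3])      # integer data type
floats = np.array([0.5, 1.5])   # float64 data type

only_ints = np.concatenate([ints, ints])
mixed = np.concatenate([ints, floats])

print(only_ints.dtype)   # stays an integer data type
print(mixed.dtype)       # float64
print(mixed)

# the size of the result is the sum of the input sizes
print(mixed.size == ints.size + floats.size)
```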

Let's assign the result of the concatenation to a new object.

In [67]:
big_array = np.concatenate([a_array, a, b, b_array])

In [68]:
big_array.ndim

1

In [69]:
big_array.shape

(33,)

In [70]:
big_array.size

33

In [71]:
%whos

Variable    Type       Data/Info
--------------------------------
a           ndarray    6: 6 elems, type `int64`, 48 bytes
a_array     ndarray    3: 3 elems, type `int64`, 24 bytes
a_list      list       n=3
a_mask      ndarray    6: 6 elems, type `bool`, 6 bytes
aa          ndarray    6: 6 elems, type `int64`, 48 bytes
b           ndarray    21: 21 elems, type `float64`, 168 bytes
b_aray      ndarray    3: 3 elems, type `int64`, 24 bytes
b_array     ndarray    3: 3 elems, type `int64`, 24 bytes
b_list      list       n=3
big_array   ndarray    33: 33 elems, type `float64`, 264 bytes
np          module     <module 'numpy' from '/Ap<...>kages/numpy/__init__.py'>


## Change element order by sorting

There are many ways to change element ordering, but let's see the `.sort()` method.

In [72]:
big_array

array([   1. ,    2. ,    3. , -101. ,    1. ,    2. ,    3. ,    4. ,
        101. ,   -5. ,   -4.5,   -4. ,   -3.5,   -3. ,   -2.5,   -2. ,
         -1.5,   -1. ,   -0.5,    0. ,    0.5,    1. ,    1.5,    2. ,
          2.5,    3. ,    3.5,    4. ,    4.5,    5. ,    1. ,    2. ,
          3. ])

In [73]:
big_array.sort()

In [74]:
big_array

array([-101. ,   -5. ,   -4.5,   -4. ,   -3.5,   -3. ,   -2.5,   -2. ,
         -1.5,   -1. ,   -0.5,    0. ,    0.5,    1. ,    1. ,    1. ,
          1. ,    1.5,    2. ,    2. ,    2. ,    2. ,    2.5,    3. ,
          3. ,    3. ,    3. ,    3.5,    4. ,    4. ,    4.5,    5. ,
        101. ])

We can also order the values with the largest value before the smaller values. This is KNOWN as DESCENDING order, but DESCENDING order is not straightforward with `.sort()` from NumPy because the method has no argument for reversing the order.
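
One common workaround, sketched below with a hypothetical array `values`, is to sort in ascending order and then reverse the result with a slice. The sketch uses the `np.sort()` function, which returns a sorted copy, rather than the `.sort()` method, which sorts in place.

```python
import numpy as np

values = np.array([3.0, -101.0, 5.0, 1.0, 101.0])

# np.sort() returns a sorted COPY; `values` itself is not changed
ascending = np.sort(values)

# reverse the ascending result with a slice to get DESCENDING order
descending = np.sort(values)[::-1]

# alternative for numeric arrays: negate, sort, negate again
descending_alt = -np.sort(-values)

print(ascending)
print(descending)
```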

## Summarize

Summarizing values in an array returns FEWER values!!!!

We have already seen several important functions, such as SUMMING, MEAN, VARIANCE, and STANDARD DEVIATION!!

However, we need to define functions to execute those tasks in base Python.

NumPy is much easier for summarizing than base Python, but we must be careful!!!
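
To make the comparison concrete, the sketch below places a hand-written base Python average next to the NumPy method. The function `py_mean` and the names `data_list` and `data_array` are hypothetical and only stand in for the kind of function defined earlier in the course.

```python
import numpy as np

def py_mean(values):
    """Average of a list using only base Python."""
    return sum(values) / len(values)

data_list = [-101, 1, 2, 3, 4, 101]   # same values as `a`
data_array = np.array(data_list)

# base Python: built-in sum() plus our own function for the average
print(sum(data_list), py_mean(data_list))

# NumPy: the summaries are methods of the array
print(data_array.sum(), data_array.mean())
```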

In [76]:
big_array.sum()

22.0

In [77]:
big_array

array([-101. ,   -5. ,   -4.5,   -4. ,   -3.5,   -3. ,   -2.5,   -2. ,
         -1.5,   -1. ,   -0.5,    0. ,    0.5,    1. ,    1. ,    1. ,
          1. ,    1.5,    2. ,    2. ,    2. ,    2. ,    2.5,    3. ,
          3. ,    3. ,    3. ,    3.5,    4. ,    4. ,    4.5,    5. ,
        101. ])

In [78]:
help(big_array.sum)

Help on built-in function sum:

sum(...) method of numpy.ndarray instance
    a.sum(axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True)
    
    Return the sum of the array elements over the given axis.
    
    Refer to `numpy.sum` for full documentation.
    
    See Also
    --------
    numpy.sum : equivalent function



We can calculate the MIN and MAX values in the array!!

In [79]:
big_array.max()

101.0

In [80]:
big_array.min()

-101.0

In [81]:
big_array.mean()

0.6666666666666666

In [82]:
a

array([-101,    1,    2,    3,    4,  101])

In [84]:
a.mean()

1.6666666666666667

In [86]:
b.mean()

0.0

In [87]:
big_array.mean()

0.6666666666666666

NumPy also has a method for the VARIANCE!!!

In [89]:
big_array.var()

625.3888888888889

In [90]:
a.var()

3402.555555555556

There is also a method for the standard deviation!!!

In [91]:
big_array.std()

25.007776568277496

In [93]:
big_array.var()**(0.5)

25.007776568277496

But very important!!!


The above values are WRONG!!!!!

NumPy is dangerous!!!

It's an open-source project used by millions of people in companies and universities all over the world.

And yet its DEFAULT way of calculating the variance and the standard deviation is NOT the one we want!!!

By default, NumPy uses what's known as the BIASED estimator for the variance and the standard deviation, which divides by N (`ddof=0`)!!!!

Instead the `ddof` argument MUST BE equal to 1, so that the divisor is N - 1.

This gives us the UNBIASED estimator of the variance!!

In [94]:
a[0]

-101

In [95]:
a[0].mean()

-101.0

In [97]:
a[0].std(ddof=1)

nan
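
The `nan` above follows from the divisor: with a single value, N - ddof is 1 - 1 = 0, so the calculation divides zero by zero. The sketch below reproduces this with a hypothetical one-element array `single` and silences the `RuntimeWarning` that NumPy emits in this situation.

```python
import warnings
import numpy as np

single = np.array([-101])   # one element, like a[0]

with warnings.catch_warnings():
    # NumPy warns when N - ddof <= 0; ignore that warning for this demo
    warnings.simplefilter("ignore", RuntimeWarning)
    sample_std = single.std(ddof=1)   # divisor is 1 - 1 = 0
    default_std = single.std()        # divisor is 1

print(sample_std, default_std)
```

A sample standard deviation needs at least 2 values, so `nan` is arguably the honest answer for a single value.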

Never use the default `.var()` or `.std()` arguments!!!

ALWAYS set the `ddof=1`!!!
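
To see what `ddof` changes, the sketch below computes both versions of the variance by hand and compares them with the NumPy methods. The names `x`, `n`, `ss`, `var_biased`, and `var_unbiased` are hypothetical and used only for this illustration.

```python
import numpy as np

x = np.array([-101, 1, 2, 3, 4, 101])   # same values as `a`
n = x.size

# sum of squared deviations from the mean
ss = ((x - x.mean()) ** 2).sum()

var_biased = ss / n            # divisor N, matches .var() with ddof=0
var_unbiased = ss / (n - 1)    # divisor N - 1, matches .var(ddof=1)

print(var_biased, x.var())
print(var_unbiased, x.var(ddof=1))

# the standard deviation is the square root of the variance
print(np.sqrt(var_unbiased), x.std(ddof=1))
```

The two versions differ by a factor of N / (N - 1), so the gap shrinks as the array gets larger but can be sizable for small arrays like this one.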

In [99]:
big_array.var(ddof=1)

644.9322916666667

In [100]:
big_array.var()

625.3888888888889

In [101]:
big_array.std(ddof=1)

25.395517156905207

In [102]:
big_array.std()

25.007776568277496

In [110]:
"STD VALUES - BIASED: %1.6f, UNBIASED: %1.6f" %(big_array.std(), big_array.std(ddof=1))

'STD VALUES - BIASED: 25.007777, UNBIASED: 25.395517'

In [109]:
"VAR VALUES - BIASED: %1.6f, UNBIASED: %1.6f" %(big_array.var(), big_array.var(ddof=1))

'VAR VALUES - BIASED: 625.388889, UNBIASED: 644.932292'