# Lesson 01

## Lesson Objectives
* Understand what a NumPy **array** is
* Experiment with array **builtin methods**
* Use array **methods and attributes**

<img src="https://raw.githubusercontent.com/numpy/numpy/181f273a59744d58f90f45d953a3285484c72cba/branding/logo/primary/numpylogo.svg" width="25%" height="25%" />

* NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
* It is a **Linear Algebra** Library and almost all of the libraries used in Data Science rely on NumPy as one of their main building blocks.

---

## Import Package

* Colab offers a session with a set of packages already installed. To check which packages are installed, type **!pip freeze** in a code cell and run it. If NumPy is not installed, type **!pip install numpy** in a code cell and run it.
* NumPy should already be included in this set of packages. You will just need to import it.

In [1]:
import numpy as np

---

## Array

* An **array** is the foundation of NumPy. It is a grid of values, which can be numbers or other types such as text. A 1-d array is often called a vector and a 2-d array a matrix; arrays can also have more dimensions (n-d). A matrix can also have just 1 row x 1 column.
* An array is **useful** because it helps to organize data. With this structure, elements can easily be sorted or searched.
* We can create a 1-d array based on a list:


In [2]:
my_list = [7,9,88,4621]
my_list

[7, 9, 88, 4621]

In [3]:
arr = np.array(my_list)
arr

array([   7,    9,   88, 4621])

* In the previous example, please note 1 bracket before the first item. That indicates it is a **1-d array**. 


```
array([   7,    9,   88, 4621])
```



* Just a side note
  * You don't necessarily have to store the list in a variable before passing it to the np.array() function.
  * You can write the list directly if you prefer: np.array([7,9,88,4621])
  * Both create the same array

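* An array can **handle** numbers (integer, float etc), strings (text), timestamps (dates). All elements of an array share a single data type, so when numbers and strings are mixed, as in the list below, the numbers are stored as strings.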

In [4]:
my_list = ['text','label_example',55,150,'final_text_example']
arr = np.array(my_list)
arr

array(['text', 'label_example', '55', '150', 'final_text_example'],
      dtype='<U18')
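The conversion seen in the output above can be checked through the array's data type. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
import numpy as np

# Integers only: NumPy picks an integer dtype.
ints = np.array([7, 9, 88, 4621])

# Integers mixed with a float: every element is promoted to float.
mixed_numbers = np.array([7, 9.5, 88])

# Numbers mixed with text: every element is stored as a string.
mixed_text = np.array(['text', 55, 150])

print(ints.dtype.kind)          # 'i' means integer
print(mixed_numbers.dtype)      # float64
print(mixed_text.dtype.kind)    # 'U' means Unicode string
print(mixed_text[1] == '55')    # True: 55 was stored as the string '55'
```

If you need to keep numbers as numbers, it is usually simpler to store text and numeric values in separate arrays.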

* If your list has more than 1 dimension, you can create a **2-d array**, or matrix.
* Please note the 2 brackets before the first item. That indicates it is a 2-d array.


In [5]:
my_matrix = [[10,80,77], [99,99,99], ["this is good","string1","example"]]
my_matrix

[[10, 80, 77], [99, 99, 99], ['this is good', 'string1', 'example']]

In [6]:
arr = np.array(my_matrix)
arr

array([['10', '80', '77'],
       ['99', '99', '99'],
       ['this is good', 'string1', 'example']], dtype='<U21')

---

## Built-in methods to generate arrays

* You can **generate data for your arrays** using built-in methods
* You can quickly create an **evenly spaced** array of numbers using np.arange(). 
    * You may provide 3 arguments: the start, the stop and the step size of the interval.
    * The stop value is not included in the result.
    * Play around with different step values, for example 0.5, 1, 2, and 5, to see the effect.

In [7]:
arr = np.arange(start=1,stop=9,step=1)
arr

array([1, 2, 3, 4, 5, 6, 7, 8])
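To see the effect of the step size, the sketch below runs np.arange() over the same interval with the four step values suggested above. The dictionary name `results_by_step` is only an illustrative choice.

```python
import numpy as np

# Same interval each time, only the step size changes.
results_by_step = {}
for step in (0.5, 1, 2, 5):
    results_by_step[step] = np.arange(start=1, stop=9, step=step)
    print(f"step={step}: {results_by_step[step]}")
```

Note that the stop value 9 never appears in any of the results, and that a float step produces a float array.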

* You can also create an **array of zeros**. Just pass the shape of the desired array. The example below has a shape of 2 x 3 and is a 2-d array

In [8]:
arr = np.zeros((2,3))
arr

array([[0., 0., 0.],
       [0., 0., 0.]])

* The example below has a shape of 2 x 2 x 3 and is a 3-d array of zeros
* Please note the 3 brackets before the first item. That indicates it is a 3-d array.

In [9]:
arr = np.zeros((2,2,3))
arr

array([[[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]]])
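Counting brackets works, but the number of dimensions can also be read directly from the array. A small sketch, using the `.ndim` and `.shape` attributes on arrays of zeros like the ones above (the variable names are illustrative):

```python
import numpy as np

vector = np.zeros(3)          # 1-d
matrix = np.zeros((2, 3))     # 2-d
cube = np.zeros((2, 2, 3))    # 3-d

for name, a in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, shape is the size along each one
    print(f"{name}: ndim={a.ndim}, shape={a.shape}")
```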

* You could also create an **array of all ones**. The example below is a 2-d array.

In [10]:
arr = np.ones((2,5))
arr

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

* You can also create an **identity matrix**, which is a square matrix that has ones along its main diagonal and zeros everywhere else.
* The example below is an identity matrix of shape 4 x 4.

In [11]:
arr = np.eye(4)
arr

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

* A similar function to np.arange() is **np.linspace()**, but instead of the step argument, there is a num argument. Num is the number of evenly spaced samples to generate over the interval.
  * You provide as arguments the start, the stop and how many points you want in total, start and stop included.
  * The stop value here is inclusive

In [12]:
arr = np.linspace(start=1,stop=50,num=5)
arr

array([ 1.  , 13.25, 25.5 , 37.75, 50.  ])
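The difference between the two functions can be made concrete with a short sketch. It uses the optional `retstep` and `endpoint` arguments of np.linspace(), which are not covered in this lesson; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed.
samples, spacing = np.linspace(start=1, stop=50, num=5, retstep=True)
print(samples)    # 5 points, first is 1.0 and last is 50.0
print(spacing)    # 12.25, that is (50 - 1) / (5 - 1)

# endpoint=False leaves the stop value out, like np.arange() does.
open_samples = np.linspace(start=1, stop=50, num=5, endpoint=False)
print(open_samples)
```

In short, use np.arange() when you know the step you want and np.linspace() when you know how many points you want.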

* You can also create an array of a given shape with **random values from 0 (inclusive) to 1 (exclusive) using np.random.rand()**
    * The arguments are the dimensions of the shape you are interested in

In [13]:
arr = np.random.rand(2,2)  # 2-d array
arr

array([[0.34675942, 0.06124652],
       [0.50228341, 0.94876263]])

In [14]:
arr = np.random.rand(3,5,2) # 3-d array
arr

array([[[0.07784958, 0.78660184],
        [0.34746185, 0.5739071 ],
        [0.71602508, 0.29144449],
        [0.18448641, 0.27181435],
        [0.43968437, 0.42659782]],

       [[0.01871331, 0.88398416],
        [0.09285539, 0.95513817],
        [0.80784937, 0.45410831],
        [0.74082035, 0.92137276],
        [0.29756483, 0.71545594]],

       [[0.90138414, 0.51007303],
        [0.2214252 , 0.84399849],
        [0.08037086, 0.65675383],
        [0.64750537, 0.23585504],
        [0.48890788, 0.38713603]]])

* You can also create an array of a given shape from a **"standard normal" distribution using np.random.randn()**
* A standard normal distribution is a normal distribution with a mean of zero and a standard deviation of 1. We will get back to that in future sections of the course.

In [15]:
arr = np.random.randn(8,2) # 2-d array
arr

array([[ 0.80945946, -1.69458939],
       [-0.98210179, -0.23687888],
       [ 0.73629073, -0.80102737],
       [ 0.27509022,  1.20315952],
       [-0.74515798,  0.38215246],
       [ 0.2571301 ,  1.29532732],
       [ 1.5451975 ,  1.24976237],
       [-0.55568597, -0.5971104 ]])
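As a rough check of the "mean of zero and standard deviation of 1" description, the sketch below draws a large sample and computes both statistics. The sample size of 100,000 and the seed value are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(seed=0)                 # arbitrary seed, only for repeatability
samples = np.random.randn(100_000)     # large 1-d sample

# With this many samples, both values should be close to 0 and 1.
print(f"mean: {samples.mean():.3f}")
print(f"std:  {samples.std():.3f}")
```

With small samples, such as the 8 x 2 array above, the mean and standard deviation can be noticeably further from 0 and 1.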

In [16]:
arr = np.random.randn(80) # 1-d array
arr

array([ 1.01035878, -1.81223282, -1.06348074, -1.04164066, -0.70587014,
       -0.63608822,  0.34638672,  0.69044209, -0.70631284, -0.39567035,
        0.48375374, -0.37779522,  1.11429994,  1.83773154, -1.10202125,
        1.09516886,  2.26883814, -0.06407556, -1.14852377, -0.1984065 ,
       -0.64755947,  2.36583867,  0.3401628 ,  2.0863095 ,  0.39149914,
        0.35251181,  0.94688062, -0.18250771, -0.44572073,  0.83488438,
        0.19276051, -1.06676858, -0.33002978, -0.37262604, -0.04192381,
       -0.14651586, -0.67324385, -0.78593054, -0.51206091,  0.95795145,
       -0.60635976, -0.72069135,  1.88732665,  0.86512039, -0.01652148,
       -0.98044635,  1.39022487, -1.1392793 , -1.25862738, -0.92748054,
        0.240165  , -1.0591493 , -2.53706341,  0.06525959,  0.28768071,
       -0.75931668,  0.93424581, -1.38028274,  0.58909189, -1.36596083,
        0.71002022, -0.08766462,  0.99078968,  0.38598278, -0.37217782,
       -0.11516998,  1.20810404, -1.35580297,  0.54496886, -0.72

* You can also create random integers setting the interval and size using **np.random.randint()**
  * The arguments for the interval are: low (inclusive) and high (exclusive). Size is the output shape; it can be an integer or a tuple.

In [17]:
arr = np.random.randint(low=10,high=50,size=5) # 1-d array
arr 

array([44, 45, 32, 28, 41])

In [18]:
arr = np.random.randint(low=250,high=888,size=(4,3)) # 2-d array
arr 

array([[390, 674, 626],
       [605, 746, 429],
       [640, 291, 743],
       [743, 477, 467]])
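Because high is exclusive, it never appears in the output. A small sketch with a deliberately narrow interval makes this visible; the interval, the seed and the variable names are illustrative choices.

```python
import numpy as np

np.random.seed(seed=0)   # arbitrary seed, only for repeatability

# low=1 is inclusive, high=4 is exclusive: only 1, 2 and 3 can be drawn.
draws = np.random.randint(low=1, high=4, size=1000)
print(np.unique(draws))
print(draws.min(), draws.max())

# A tuple for size gives a multi-dimensional result.
grid = np.random.randint(low=1, high=4, size=(4, 3))
print(grid.shape)
```

To include an upper bound in the possible values, set high to that bound plus 1.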

---

* You may be interested in generating "constant" (reproducible) random values
  * **Run the example below multiple times: the random values will change on each run**

In [19]:
arr = np.random.randint(low=10,high=50,size=5)
arr

array([42, 32, 48, 41, 45])

* You need to set the **NumPy seed** in order to get constant random values
  * In a Jupyter notebook code cell, you just have to add np.random.seed() before defining your array(s)
  * The argument is seed, an integer. You can set any integer
  * **Run the cell below multiple times and note that the array values stay the same**

In [20]:
np.random.seed(seed=123)
arr = np.random.randint(low=10,high=50,size=5)
arr

array([12, 38, 44, 48, 27])
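The effect of the seed can also be checked within a single cell. In the sketch below, the variable names `first`, `second` and `third` are illustrative.

```python
import numpy as np

np.random.seed(seed=123)
first = np.random.randint(low=10, high=50, size=5)

np.random.seed(seed=123)   # reset the generator to the same starting state
second = np.random.randint(low=10, high=50, size=5)

# No reseed here: the generator simply continues its sequence.
third = np.random.randint(low=10, high=50, size=5)

print(np.array_equal(first, second))   # True: same seed, same values
print(first, third)
```

The seed fixes the whole sequence of random numbers, not just one array, so it has to be set again whenever you want the sequence to restart.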

---

## Array Methods

* You can reshape the array without changing the data within it using the .reshape() method.
  * The example below shows a 1-d array with 40 elements, that is, shape (40,).

In [21]:
np.random.seed(seed=0)
arr = np.random.randint(low=1,high=150,size=40)
arr

array([ 48, 118,  68, 104,  10,  22,  37,  88,  71,  89, 141,  59,  40,
        88,  89,  82,  26,  78,  73,  10, 149, 116,  80,  83, 100,  30,
       148, 148, 143,  33,  10, 128,  33,  32, 115,  29,  35, 129, 129,
        54])

* You can reshape as 4 x 10, transforming into a 2-d array

In [22]:
arr = arr.reshape(4,10)
arr

array([[ 48, 118,  68, 104,  10,  22,  37,  88,  71,  89],
       [141,  59,  40,  88,  89,  82,  26,  78,  73,  10],
       [149, 116,  80,  83, 100,  30, 148, 148, 143,  33],
       [ 10, 128,  33,  32, 115,  29,  35, 129, 129,  54]])

In [23]:
arr.reshape(1,-1)  # -1 lets NumPy infer the size of that dimension

array([[ 48, 118,  68, 104,  10,  22,  37,  88,  71,  89, 141,  59,  40,
         88,  89,  82,  26,  78,  73,  10, 149, 116,  80,  83, 100,  30,
        148, 148, 143,  33,  10, 128,  33,  32, 115,  29,  35, 129, 129,
         54]])
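The value -1 in .reshape() asks NumPy to work out that dimension from the total number of elements. A minimal sketch, using np.arange(40) as a stand-in for the 40-element array above:

```python
import numpy as np

arr = np.arange(40)   # stand-in array with 40 elements

print(arr.reshape(4, -1).shape)    # (4, 10)
print(arr.reshape(-1, 8).shape)    # (5, 8)
print(arr.reshape(1, -1).shape)    # (1, 40)

# The new shape must account for every element: 40 is not divisible by 3.
try:
    arr.reshape(3, -1)
except ValueError as error:
    print("Cannot reshape:", error)
```

Only one dimension can be given as -1 in a single call.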

* The opposite situation can also occur: you have a multidimensional array and want to transform it into a 1-d array. You can use **flatten()** for it; reshape(-1) gives the same values:

In [24]:
np.random.seed(seed=0)
arr = np.random.randint(low=1,high=150,size=(2,5))
arr

array([[ 48, 118,  68, 104,  10],
       [ 22,  37,  88,  71,  89]])

In [25]:
arr.flatten()

array([ 48, 118,  68, 104,  10,  22,  37,  88,  71,  89])

In [26]:
arr.reshape(-1)

array([ 48, 118,  68, 104,  10,  22,  37,  88,  71,  89])
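Both calls return the same values, but they differ in one respect that is worth knowing: flatten() always returns a copy, while reshape(-1) returns a view of the original data when it can. The sketch below illustrates this with the same values as above; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[48, 118, 68, 104, 10],
                [22, 37, 88, 71, 89]])

flat_copy = arr.flatten()      # independent copy of the data
flat_view = arr.reshape(-1)    # view on the same data for this array

print(np.shares_memory(arr, flat_copy))   # False
print(np.shares_memory(arr, flat_view))   # True

flat_copy[0] = -1
print(arr[0, 0])   # 48: changing the copy leaves arr untouched

flat_view[0] = -1
print(arr[0, 0])   # -1: changing the view changes arr as well
```

Use flatten() when you plan to modify the result and want the original array left alone.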

---

* Min and Max values can be accessed using the .min() and .max() methods.

In [27]:
arr.max()

118

In [28]:
arr.min()

10

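* You can determine the index position of the minimum or maximum value in the ndarray using the argmin() and argmax() methods. Without an axis argument, the index refers to the flattened array; pass the axis argument to search along a particular axis.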

In [29]:
arr.argmax()

1

In [30]:
arr.argmin()

4
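For a 2-d array, the results above are positions in the flattened array. The sketch below, using the same values, shows how to turn such a position into a row and column with np.unravel_index() and how the `axis` argument changes the result.

```python
import numpy as np

arr = np.array([[48, 118, 68, 104, 10],
                [22, 37, 88, 71, 89]])

flat_index = arr.argmax()
print(flat_index)   # 1: position in the flattened array

# Convert the flat position into (row, column) for this shape.
row, col = (int(i) for i in np.unravel_index(flat_index, arr.shape))
print(row, col)     # 0 1

print(arr.argmax(axis=0))   # [0 0 1 0 1]: row of the max in each column
print(arr.argmax(axis=1))   # [1 4]: column of the max in each row
```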

---

## Array Attributes

*  You can check the shape and type of a NumPy array using, respectively, the attributes **.shape** and **.dtype**

In [31]:
arr = np.arange(start=1,stop=11,step=1).reshape(5,2)

print(
    f"* Array:\n {arr} \n\n"
    f"* Array shape: \n {arr.shape} \n\n"
    f"* Array type: \n {arr.dtype}"
    )

* Array:
 [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]] 

* Array shape: 
 (5, 2) 

* Array type: 
 int64
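Two further attributes are often useful next to .shape and .dtype: `.ndim` (number of dimensions) and `.size` (total number of elements). A short sketch on the same array:

```python
import numpy as np

arr = np.arange(start=1, stop=11, step=1).reshape(5, 2)

print(f"* Number of dimensions: {arr.ndim}")   # 2
print(f"* Number of elements: {arr.size}")     # 10
print(f"* Shape: {arr.shape}")                 # (5, 2)
```

Attributes are read without parentheses, unlike methods such as .min() or .reshape().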


---

## Challenges

### 1

* Set the NumPy seed to 1.
* In a variable called **arr**, create a 2-d array, 4 x 4, with random integers, where the lowest value is 1 and the max 100 (inclusive).
* In a print() statement, display the array

In [32]:
# place your code here
# write the code

### 2

* Using your knowledge of NumPy, the Python print() function and f-strings for displaying variables, create the following statement:
  * **"The max value for the array is 80 and its index location is 6. The min value for the array is 2 and its index location is 9."**

In [39]:
#%%writefile challeng.py
# write your code below
def add(a, b):
  '''working'''
  return a + b

---

# Challenge Tests

In [37]:

import unittest

results = {'Test 1': 'Failed', 'Test 2': 'Failed'}
result = None
class TestChallenge(unittest.TestCase):
  def test_add(self):
    self.assertEqual(add(4, 5), 9)
    result = 'Passed'
    results['Test 1'] = result
  
  def test_named_correctly(self):
    resultb = None
    try:
      add(4, 5)
      resultb = 'passeddddl'
    except:
      pass
    self.assertEqual(resultb, 'passeddddl')
    results['Test 2'] = resultb

    
res = unittest.main(argv=[""], verbosity=2, exit=False)



test_add (__main__.TestChallenge) ... ERROR
test_named_correctly (__main__.TestChallenge) ... FAIL

ERROR: test_add (__main__.TestChallenge)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-37-ed5dd9838ae2>", line 8, in test_add
    self.assertEqual(add(4, 5), 9)
NameError: name 'add' is not defined

FAIL: test_named_correctly (__main__.TestChallenge)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-37-ed5dd9838ae2>", line 19, in test_named_correctly
    self.assertEqual(resultb, 'passeddddl')
AssertionError: None != 'passeddddl'

----------------------------------------------------------------------
Ran 2 tests in 0.006s

FAILED (failures=1, errors=1)


# Submission Section

In [41]:
# execfile('tests.py')
# exec(compile(open('tests.py').read(), 'tests.py', 'exec'))
!wget -q https://testcolab.s3-eu-west-1.amazonaws.com/tests.py
exec(compile(open('tests.py').read(), 'tests.py', 'exec'))
!rm 'tests.py'
print('CHALLENGE RESULT', results)


WE are running!!
Running {'Test 1': 'Failed', 'Test 2': 'Failed'}
CHALLENGE RESULT {'Test 1': 'passed', 'Test 2': 'passed'}
