## NUMPY ARRAYS: OVERVIEW
-  a 1D (or higher-dimensional) collection of (typically) numerical values of the *same* data type
  - unlike a list/tuple which can contain a mixture of data types
<br><br>
-  calculations using array operations can be *more efficient* than the equivalent using a loop
<br><br>
-  we'll focus on 1D and 2D arrays, but it's possible to have higher-dimensional arrays
    - can run into memory storage issues -- careful!
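
As a rough illustration of the loop-versus-array-operation point above, the sketch below squares every element of a small array both ways. The variable names are made up for this example, and it only shows that the two approaches give the same result; any speed difference depends on array size and machine.

```python
import numpy as np

values = np.arange(5)  # five sequential integers starting at 0

# Loop version: square each element one at a time
squares_loop = np.zeros_like(values)
for i in range(values.size):
    squares_loop[i] = values[i] ** 2

# Array-operation version: one expression acts on every element at once
squares_array = values ** 2

print(squares_loop)
print(squares_array)
```

Both print statements show the same values; the array-operation version simply avoids writing (and running) the explicit Python loop.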

## NUMPY ARRAYS: DATA TYPES

-   a bit different than in Python itself
<br><br>
- can specify data type when creating array using *dtype* option
<br><br>
- **KEY**:  numpy arrays have a **fixed** data type
  - so putting a float value into an integer array will *truncate* the value! 
<br><br> 
  
-   most commonly used types:
      - np.float64 (double precision float; the older alias np.float_ was removed in NumPy 2.0)
      - np.int64 or np.int_ (64-bit integer on most platforms)
      - np.bool_ (Boolean - True or False)
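
A minimal sketch of the fixed-data-type behavior described above, assuming an integer array built from integer values. The names `ints` and `floats` are chosen just for this example.

```python
import numpy as np

ints = np.array([1, 2, 3])   # integer dtype is inferred from the values
print(ints.dtype)

ints[0] = 7.9                # the float is truncated to fit the integer array
print(ints)                  # first element is 7, not 7.9 or 8

floats = np.array([1, 2, 3], dtype=np.float64)  # request floats explicitly
floats[0] = 7.9
print(floats)                # first element stays 7.9
```

If there is any chance an array will need to hold fractional values, it is probably safest to set the *dtype* option when the array is created.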

## NUMPY ARRAYS: CREATION
### 1)  Creation, specific values: use *np.array* 

In [1]:
# Must always import numpy!
import numpy as np

# Example 1D array
             # input argument is a list or tuple 
b = np.array([2,3,4,5]) # this will create a 1D array of length 4
print(b)

[2 3 4 5]


In [2]:
# Example 2D array
# Use a nested list or tuple to create an array with two or more dimensions

            # 1st row, 2nd row
b = np.array([[2,3,4],[5,6,7]]) # this will create a 2D array that is 
                                # 2 rows x 3 columns 
print(b)

[[2 3 4]
 [5 6 7]]


### 2) Creation, all zeros or ones: use *np.zeros* or *np.ones*
- recommended over using an array created by *np.empty* so you actually know what is in the array!

In [3]:
# Example
a = np.zeros((2,3),dtype=np.float64)  # more on data type options soon!
print(a)

[[0. 0. 0.]
 [0. 0. 0.]]


### 3) Creation, based on another array:  use *np.zeros_like()* or *np.ones_like()*

In [4]:
# Example
c = np.zeros_like(b) # same size and dimensions as b!
print(c) # also a 2 row by 3 column array!

[[0 0 0]
 [0 0 0]]


### 4) Creation, sequential arrays: use *np.arange(), np.linspace(), np.logspace()*
- #### np.arange()
  - *produces*: an array of sequential integers or floats
  - *you pick*: the beginning and end values (end value is NOT included), and the stride

In [5]:
a = np.arange(0,20,2)
print(a) # array will NOT include 20 as a value 

b = np.arange(0,1,0.1) 
print(b) # or 1

c = np.arange(9,0,-3)
print(c) # or 0

[ 0  2  4  6  8 10 12 14 16 18]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[9 6 3]


- #### np.linspace()
  - *produces*: an array of evenly spaced values (floats by default)
  - *you pick*: the beginning and end values (end value IS included) and number of array elements 

In [6]:
a = np.linspace(0,10,21)
print(a)

# LOGSPACE is similar, except values produced are logarithmically spaced

[ 0.   0.5  1.   1.5  2.   2.5  3.   3.5  4.   4.5  5.   5.5  6.   6.5
  7.   7.5  8.   8.5  9.   9.5 10. ]
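
Since np.logspace() is mentioned but not shown, here is a small sketch of how it might be used. Note that the first two arguments are *exponents* (base 10 by default), not the end values themselves.

```python
import numpy as np

# np.logspace(start exponent, stop exponent, # of array elements)
a = np.logspace(0, 3, 4)  # 10**0, 10**1, 10**2, 10**3
print(a)
```

As with np.linspace(), the end value is included.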


### 5) Creation, random number arrays:  functions in the random module
- #### overview
  - *practical use*: when you need "unpredictable" numbers, e.g., in statistical data sampling or computer simulations (such as Monte Carlo simulations)
  - "random" module in numpy! https://numpy.org/doc/stable/reference/random/index.html
    - "pseudo-random" (truly random would be rooted in unpredictable physical processes)
    - can set a "seed" to make the generated sequence reproducible - *ra.seed()*
    - create:
      - uniformly distributed random integers
      - uniformly distributed random floating-point values
      - normally distributed random floating-point values
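
To illustrate what setting a seed does, the sketch below draws the same five integers twice by resetting the seed in between. The seed value 42 and the range 0 to 100 are arbitrary choices for this example.

```python
from numpy import random as ra

ra.seed(42)                     # 42 is an arbitrary choice of seed
first = ra.randint(0, 100, 5)

ra.seed(42)                     # reset to the same seed ...
second = ra.randint(0, 100, 5)  # ... and the same "random" numbers come back

print(first)
print(second)
```

This is handy when you want a simulation or a test to give the same result each time it is run.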

- #### uniformly-distributed random integers: *randint()*

In [7]:
from numpy import random as ra

# Random integers
# The function in DeCaria has since been deprecated!
# Use randint instead
# randint(lower bound of range, upper bound of range (NOT included), # of numbers to generate)
ra.randint(-100,-17,5)

array([-72, -28, -67, -63, -36])

- #### uniformly-distributed random floating-points: *random_sample()*

In [8]:
# Uniformly distributed floating points
# Default range is 0 to 1 (1 itself is NOT included)
# random_sample(# of numbers)
ra.random_sample(10)

array([0.62516   , 0.53858693, 0.67247493, 0.82648048, 0.5081823 ,
       0.46852248, 0.0939284 , 0.67229828, 0.31921911, 0.21082586])

In [9]:
# You can scale and shift the result to get a different range

# Range between 8 and 10
(10-8)*ra.random_sample(10) + 8

array([9.08517639, 9.21678238, 8.24138395, 8.81116528, 9.0173817 ,
       8.04606401, 8.13827741, 9.88629916, 8.82615801, 8.72665154])
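
As an alternative to rescaling by hand, the same random module also has a *uniform()* function that takes the bounds directly. This is a sketch of a different function than the one used above, shown only for comparison.

```python
from numpy import random as ra

# uniform(lower bound, upper bound, # of numbers)
vals = ra.uniform(8, 10, 10)  # 10 floats, each at least 8 and less than 10
print(vals)
```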

- #### normally distributed random floating-point values: *normal()*

In [10]:
# Normally distributed
# normal(mean, standard deviation, # of numbers)
ra.normal(10,2,10)

array([ 6.45357801,  9.98681061,  8.80570125, 11.8848245 ,  9.63310283,
        9.38599539,  9.84864043,  9.52909929,  8.31444006, 11.32798575])

## NUMPY ARRAYS: ACCESSING ELEMENTS 
- reminder: indexing begins at **ZERO** and does NOT include the upper bound of the range you give
<br><br>
- accessing a subarray (a range of elements) is also known as *slicing*
  - syntax: x[start:stop:optional step]
<br><br>
- **CAVEAT: SLICING AN ARRAY AND SAVING THE RESULT TO A NEW VARIABLE DOES *NOT* CREATE A SEPARATE COPY!  THE SLICE SHARES ITS DATA WITH THE ORIGINAL, SO IF YOU CHANGE THE ORIGINAL ARRAY, YOU ALSO CHANGE THE NEW ARRAY (AND VICE VERSA)....**
  - if you want a separate copy: use np.copy() for a "deep" copy 
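
The caveat above can be seen in a short sketch. The names `view` and `separate` are made up for this example.

```python
import numpy as np

a = np.arange(0, 20, 2)
view = a[0:3]               # a slice: shares its data with a
separate = np.copy(a[0:3])  # an independent copy of the same elements

a[0] = 99                   # change the original array

print(view)                 # first element is now 99
print(separate)             # first element is still 0
```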

### (1) 1D Array

In [11]:
# Example: 1D array  
a = np.arange(0,20,2)
print(a)

b = a[5] # 6th element in a - result is single value 
print(b)

# Time to slice!
c = a[0:2] # first and second element of a - result is ARRAY
print(c)

d = a[0:3:2] # striding by 2 - skipping every other element 
print(d)

[ 0  2  4  6  8 10 12 14 16 18]
10
[0 2]
[0 4]


### (2) 2D Array

In [23]:
# Example: accessing elements in 2D array
# DIFFERENT SYNTAX THAN FOR LISTS
# For 3D+ arrays, just extend the indexing syntax

a = np.array([[5,10,15],[20,25,30]])
print(a)

b = a[0,0]  # first row, first column - result is single value: 5
print(b)

c = a[1,2]  # last (second) row, third column - result is single value: 30
print(c)

# Time to slice!
d = a[0,0:2] # first row, first two columns - result is ARRAY: [5 10]
print(d)

e = a[:,2] # all rows, just third column: [15 30]
print(e)

[[ 5 10 15]
 [20 25 30]]
5
30
[ 5 10]
[15 30]
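
Extending the indexing syntax to a 3D array might look like the sketch below. The array is built with *reshape*, which is not covered above; it is used here only as a quick way to get a 2 x 2 x 3 array of sequential integers.

```python
import numpy as np

# A 3D array: 2 "layers", each 2 rows x 3 columns, holding 0 through 11
a3 = np.arange(12).reshape(2, 2, 3)
print(a3)

print(a3[1, 0, 2])  # second layer, first row, third column: 8
print(a3[0, :, 1])  # first layer, all rows, second column: [1 4]
```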


## NUMPY ARRAYS: GETTING CHARACTERISTICS
### (1) shape/size: *np.shape(a), np.size(a)*

In [24]:
# Shape
a = np.array([[2,3,4],[5,6,7]])
s = np.shape(a)
print(s) # a tuple of (number rows, number cols)

(2, 3)


In [25]:
# Size
z = np.size(a)
print(z) # total number of elements in array

6


### (2) summary statistics: *np.min(a), np.max(a), np.sum(a), np.mean(a), etc.*


In [26]:
# Minimum value in array 
np.min(a)

2

In [27]:
# Maximum value in array
np.max(a)

7

In [28]:
# Sum
np.sum(a)

27

In [29]:
# Mean
np.mean(a)

4.5

In [30]:
# Standard deviation
np.std(a)

1.707825127659933

- #### helpfully, many operations have a version that ignores NaNs!

In [31]:
# For this example, we first need to insert a NaN into an array

b = np.copy(a) # We don't want to change the array a at all

# Handy operation 'astype' to cast arrays to a different data type 
c = b.astype(np.float64) # If we're going to change an element value to NaN,
                         # we can't do that in an integer array

c[0,0] = np.nan
print(c)

# Alternative version of mean that will ignore NaNs
np.nanmean(c)
# np.mean(c) would result in NaN

[[nan  3.  4.]
 [ 5.  6.  7.]]


5.0
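
A few of the other NaN-ignoring versions, sketched on the same values as the array above. The array is rebuilt here so the example runs on its own.

```python
import numpy as np

c = np.array([[np.nan, 3., 4.], [5., 6., 7.]])

print(np.nansum(c))  # sum of the non-NaN values: 25.0
print(np.nanmin(c))  # 3.0
print(np.nanmax(c))  # 7.0
print(np.sum(c))     # the regular version returns nan
```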

- #### doing operations (statistical and other types) over a specific "axis" (dimension) - effectively *collapses* this axis
  - designate via the "axis" option

In [32]:
print(a)

# To get a *column* average in a 2D array, you'll use axis=0, as you are averaging over all the *rows* in that column
# (and rows are the *first* dimension in the array, thus the "0")

np.mean(a,axis=0) # Result is column averages!

[[2 3 4]
 [5 6 7]]


array([3.5, 4.5, 5.5])
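
For comparison, a sketch of the other axis on the same 2 x 3 array: axis=1 collapses the *columns*, leaving one value per row. The names `row_means` and `col_sums` are made up for this example.

```python
import numpy as np

a = np.array([[2, 3, 4], [5, 6, 7]])

row_means = np.mean(a, axis=1)  # average over the columns in each row
print(row_means)                # one value per row: [3. 6.]

col_sums = np.sum(a, axis=0)    # the axis option works for other operations too
print(col_sums)                 # one value per column: [ 7  9 11]
```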