# NumPy Basics

In [2]:
# NumPy array operations are typically much faster than the equivalent loops over built-in Python lists
import numpy as np

my_array = np.arange(10000000)
my_list = list(range(10000000))

%time for _ in range(10): my_array2 = my_array * 2
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 132 ms
Wall time: 5.94 s
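
The `%time` magic used above only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run in a plain Python script with the standard-library `timeit` module. The size `n` is an assumption chosen to keep the run short (smaller than in the cell above), and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000  # assumed size, smaller than in the cell above
my_array = np.arange(n)
my_list = list(range(n))

# time 10 repetitions of each approach, mirroring the loop in the cell above
numpy_seconds = timeit.timeit(lambda: my_array * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {numpy_seconds:.3f} s")
print(f"list:  {list_seconds:.3f} s")

# both approaches compute the same values
results_match = (my_array * 2).tolist() == [x * 2 for x in my_list]
print(f"results match: {results_match}")
```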


## 4.1 The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray,  
which is a fast, flexible container for large datasets in Python. Arrays enable you to  
perform mathematical operations on whole blocks of data using similar syntax to the  
equivalent operations between scalar elements.

To give you a flavor of how NumPy enables batch computations with similar syntax  
to scalar values on built-in Python objects, I first import NumPy and generate a small  
array of random data:

In [3]:
# import numpy and create a random 2x3 array
import numpy as np

data = np.random.randn(2,3)
data

array([[-0.27290151, -0.99769491,  0.55357248],
       [-0.00387967,  0.71991013,  0.06575622]])

In [4]:
# multiply each value by 10
data * 10

array([[-2.72901512, -9.97694914,  5.53572475],
       [-0.03879668,  7.19910127,  0.65756222]])

In [5]:
# add each value to itself
data + data

array([[-0.54580302, -1.99538983,  1.10714495],
       [-0.00775934,  1.43982025,  0.13151244]])

In [6]:
# each multidimensional array has two main attributes:
# `shape` (the size of each dimension) and `dtype` (the data type of the entries)
# note that the data type applies to ALL elements in the array
print(data.shape)
print(data.dtype)

(2, 3)
float64


## 4.2 Creating ndarrays

The easiest way to create an array is to use the array function. This accepts any  
sequence-like object (including other arrays) and produces a new NumPy array containing  
the passed data. For example, a list is a good candidate for conversion:

In [4]:
import numpy as np

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [5]:
# nested sequences will be converted into a multidimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [6]:
# we can use `ndim` and `shape` to verify that the inferred dimension is 2
print(f"ndim: {arr2.ndim}")
print(f"shape: {arr2.shape}")

ndim: 2
shape: (2, 4)


In [7]:
# by default, np.array() tries to infer a good data type for the array
# the data type can be accessed through the `dtype` attribute
print(f"arr1.dtype: {arr1.dtype}")
print(f"arr2.dtype: {arr2.dtype}")

arr1.dtype: float64
arr2.dtype: int32


In [10]:
# there are other built-in functions to create arrays:
# np.zeros(), np.ones() and np.empty()
# np.empty() returns uninitialized memory, so its values are arbitrary
print(np.zeros(10))
print(np.ones((2, 3)))
print(np.empty((2, 3, 2)))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[1. 1. 1.]
 [1. 1. 1.]]
[[[6.23042070e-307 1.42417221e-306]
  [1.60219306e-306 9.79054228e-307]
  [1.69119330e-306 1.78022342e-306]]

 [[1.05700345e-307 1.11261027e-306]
  [9.34609111e-307 1.78019625e-306]
  [2.22522596e-306 0.00000000e+000]]]


In [11]:
# np.arange() is the array-valued version of Python's range() function
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Array creation functions

- `array()`: Convert input data (list, tuple, array, or other sequence type) to an ndarray, either by inferring a dtype or by explicitly specifying one; copies the input data by default
- `asarray()`: Convert input to an ndarray, but do not copy if the input is already an ndarray
- `arange()`: Like the built-in `range` but returns an ndarray
- `ones()`, `ones_like()`: Produce an array of all 1s with the given shape and dtype; `ones_like` takes another array and produces a ones array of the same shape and dtype
- `zeros()`, `zeros_like()`: Like `ones` and `ones_like` but producing arrays of 0s instead
- `empty()`, `empty_like()`: Create new arrays by allocating new memory, but unlike `ones` and `zeros`, do not populate them with any values
- `full()`, `full_like()`: Produce an array of the given shape and dtype with all values set to the indicated "fill value"; `full_like` takes another array and produces a filled array of the same shape and dtype
- `eye()`, `identity()`: Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere)
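
A few of the functions listed above can be sketched in a short cell. The variable names (`base`, `copied`, `same`, `sevens`) and the fill value 7 are illustrative choices, not part of the original notebook.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

# array() copies its input by default; asarray() returns an ndarray input unchanged
copied = np.array(base)
same = np.asarray(base)
print(copied is base)  # False
print(same is base)    # True

# the *_like variants take their shape and dtype from an existing array
ones = np.ones_like(base)
print(ones.shape, ones.dtype == base.dtype)

# full() fills a new array of the given shape with a chosen value
sevens = np.full((2, 2), 7)
print(sevens)

# eye(3) and identity(3) both produce the 3 x 3 identity matrix
print(np.array_equal(np.eye(3), np.identity(3)))  # True
```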

In [14]:
# the data type can be specified explicitly via the `dtype` parameter of np.array()
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

print(f"arr1.dtype: {arr1.dtype}")
print(f"arr2.dtype: {arr2.dtype}")

arr1.dtype: float64
arr2.dtype: int32


In [17]:
# an array can be cast explicitly to another dtype with the astype() method
# astype always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype
arr1 = np.arange(4)
print(f"arr1.dtype: {arr1.dtype}")

arr2 = arr1.astype(np.float64)
print(f"arr2.dtype: {arr2.dtype}")


arr1.dtype: int32
arr2.dtype: float64
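
Two further cases of `astype()` are worth a quick sketch: casting floats to integers truncates the decimal part, and an array of strings that represent numbers can be cast to a numeric dtype. The sample values and variable names below are made up for illustration.

```python
import numpy as np

# float -> int: the decimal part is truncated, not rounded
floats = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
truncated = floats.astype(np.int32)
print(truncated)

# strings that represent numbers -> float
numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)
print(parsed)

# astype returns a copy even when the dtype does not change,
# so modifying the result leaves the original untouched
ints = np.arange(4)
same_dtype = ints.astype(ints.dtype)
same_dtype[0] = 99
print(same_dtype is ints)  # False
print(ints[0])             # 0
```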


### Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data  
without writing any `for` loops. NumPy users call this vectorization. Any arithmetic  
operation between equal-size arrays is applied element-wise:

In [19]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [20]:
arr * arr

array([[ 1,  4,  9],
       [16, 25, 36]])

In [21]:
arr - arr

array([[0, 0, 0],
       [0, 0, 0]])

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [22]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [23]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparisons between arrays of the same size yield boolean arrays:

In [26]:
arr2 = np.array([[0, 4, 1], [7, 2, 12]])
arr2

array([[ 0,  4,  1],
       [ 7,  2, 12]])

In [27]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])
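
As a small sketch of what such a boolean array looks like in practice: it has the same shape as its inputs and dtype `bool`, a comparison against a scalar is applied to every element just like scalar arithmetic, and because `True` counts as 1, `sum()` gives the number of matches. The name `mask` is an illustrative choice.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[0, 4, 1], [7, 2, 12]])

# element-wise comparison of two equal-size arrays
mask = arr2 > arr
print(mask.dtype)  # bool
print(mask.shape)  # (2, 3)

# a scalar is compared against every element
greater_than_three = arr > 3
print(greater_than_three)

# True counts as 1, so sum() counts the matching elements
print(mask.sum())  # 3
```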