# 2.1 Understanding Data Types in Python

### A Python List Is More Than Just a List

In [1]:
L = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
type(L[0])

int

In [4]:
L2 = [i for i in L]
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [5]:
L3 = [str(i) for i in L2]
L3

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [6]:
type(L3[0])

str

Because of Python’s dynamic typing, we can even create heterogeneous lists:

In [7]:
L3 = [True, "2", 3.0, 4]

In [9]:
[type(i) for i in L3]

[bool, str, float, int]

At the implementation level, the array essentially contains a single pointer to one contiguous
block of data. The Python list, on the other hand, contains a pointer to a
block of pointers, each of which in turn points to a full Python object like the Python
integer we saw earlier. Again, the advantage of the list is flexibility: because each list
element is a full structure containing both data and type information, the list can be
filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility,
but are much more efficient for storing and manipulating data.
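
The contrast between the two layouts can be observed from Python itself. The following is a minimal sketch; the variable names are illustrative and the explicit `int64` dtype is an assumption chosen so the byte counts are the same on every platform.

```python
import numpy as np

# A list holds references to independent Python objects,
# so every element can carry its own type.
mixed = [True, "2", 3.0, 4]
element_types = [type(item).__name__ for item in mixed]
print(element_types)  # ['bool', 'str', 'float', 'int']

# A NumPy array holds raw values of one dtype in a single buffer,
# so its memory footprint is simply size * itemsize.
arr = np.array([1, 2, 3, 4], dtype=np.int64)
print(arr.dtype, arr.itemsize, arr.nbytes)  # int64 8 32
```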


### Fixed-Type Arrays in Python  
Python offers several different options for storing data in efficient, fixed-type data
buffers. The built-in array module can be used to create
dense arrays of a uniform type:

In [10]:
import array

L = list(range(10))
A = array.array('i',L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here 'i' is a type code indicating the contents are integers.
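
Because the type code fixes the element type, values of another type are rejected rather than stored. A small sketch of that behavior (the `rejected` flag is only there to make the outcome visible):

```python
import array

A = array.array('i', range(5))

try:
    A.append(1.5)  # a float does not fit the 'i' (signed integer) type code
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False

print(A.typecode, A.tolist())  # i [0, 1, 2, 3, 4]
```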

### Creating Arrays from Python Lists  

We can use np.array to create arrays from Python lists. Remember that, unlike Python lists,
NumPy arrays are constrained to elements of a single type. If types do not match, NumPy will
upcast if possible (in the second example, integers are upcast to floating point):

In [11]:
import numpy as np

# integer array
np.array([1,2,5,3,9])

array([1, 2, 5, 3, 9])

In [12]:
np.array([2,1,3.14,8])

array([2.  , 1.  , 3.14, 8.  ])

If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:

In [13]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)
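
The dtype that NumPy chose can be checked directly on the resulting array. This sketch repeats the three cases above; the exact integer width (for example int64 or int32) depends on the platform, so only the dtype kind is inspected for the integer array.

```python
import numpy as np

ints = np.array([1, 2, 5, 3, 9])
mixed = np.array([2, 1, 3.14, 8])            # one float forces upcasting
explicit = np.array([1, 2, 3, 4], dtype='float32')

print(ints.dtype.kind)   # 'i' for a signed integer type
print(mixed.dtype)       # float64
print(explicit.dtype)    # float32
```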

Unlike Python lists, NumPy arrays can explicitly be multidimensional; here’s
one way of initializing a multidimensional array using a list of lists:

In [14]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

### Creating Arrays from Scratch

In [2]:
import numpy as np
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [3]:
# Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [4]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [5]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [6]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
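
One difference between the two sequence constructors is easy to miss: np.arange excludes the stop value, while np.linspace includes it by default. A short sketch using the same interval for both:

```python
import numpy as np

a = np.arange(0, 1, 0.25)                    # stop value 1 is excluded
b = np.linspace(0, 1, 5)                     # stop value 1 is included
c = np.linspace(0, 1, 4, endpoint=False)     # endpoint=False mimics arange

print(a)  # [0.   0.25 0.5  0.75]
print(b)  # [0.   0.25 0.5  0.75 1.  ]
print(c)  # [0.   0.25 0.5  0.75]
```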

In [7]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.20879291, 0.59684748, 0.38197474],
       [0.59099932, 0.39589526, 0.18546192],
       [0.2410955 , 0.04468355, 0.23133055]])

In [8]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[-0.4599953 ,  1.92805248,  0.08916189],
       [ 0.5333879 ,  1.87410551, -0.51444567],
       [-0.06554323,  0.84614894,  0.39145087]])

In [9]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[4, 2, 5],
       [1, 2, 5],
       [7, 1, 8]])

In [10]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [11]:
# Create an uninitialized array of three floats (the default dtype)
# The values will be whatever happens to already exist at that
# memory location
np.empty(3)

array([1., 1., 1.])

### NumPy Standard Data Types

In [12]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [13]:
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

![standard data type - 1](./images/2-1.png)
![standard data type - 2](./images/2-2.png)

# 2.2 The Basics of NumPy Arrays  

![Basics of numpy array](./images/2-3.png)

### NumPy Array Attributes

In [2]:
import numpy as np

np.random.seed(0) # seed for reproducibility

x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array

In [12]:
x1

array([5, 0, 3, 3, 7, 9])

In [13]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [14]:
x3

array([[[8, 1, 5, 9, 8],
        [9, 4, 3, 0, 3],
        [5, 0, 2, 3, 8],
        [1, 3, 3, 3, 7]],

       [[0, 1, 9, 9, 0],
        [4, 7, 3, 2, 7],
        [2, 0, 0, 4, 5],
        [5, 6, 8, 4, 1]],

       [[4, 9, 8, 1, 1],
        [7, 9, 9, 3, 6],
        [7, 2, 0, 3, 5],
        [9, 4, 4, 6, 4]]])

In [6]:
np.random.seed?

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

In [16]:
print('dimension x3 : ', x3.ndim)
print('shape x3 : ', x3.shape)
print('size x3 : ', x3.size)

dimension x3 :  3
shape x3 :  (3, 4, 5)
size x3 :  60


In [17]:
# find dataType
print("dtype:", x3.dtype)

dtype: int64


Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:

In [18]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes
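
In general, nbytes equals itemsize times size. The sketch below checks this on an array with the same shape as x3; the dtype is set explicitly (an assumption, to keep the numbers platform independent) and a smaller dtype is shown for comparison.

```python
import numpy as np

x = np.zeros((3, 4, 5), dtype=np.int64)
small = x.astype(np.int16)     # same shape, 2 bytes per element

print(x.size, x.itemsize, x.nbytes)              # 60 8 480
print(small.size, small.itemsize, small.nbytes)  # 60 2 120
```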


### Array Indexing: Accessing Single Elements  

If you are familiar with Python’s standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, you can access the ith value (counting from zero) by specifying the desired index in square brackets, just as with Python lists:

In [19]:
x1

array([5, 0, 3, 3, 7, 9])

In [20]:
x1[0]

5

In [21]:
x1[4]

7

In [22]:
# To index from the end of the array, you can use negative indices
x1[-1]

9

In [23]:
x1[-3]

3

In a multidimensional array, you access items using a comma-separated tuple of indices:

In [24]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [26]:
x2[1,2]

8

In [27]:
x2[2,-1]

7

You can also modify values using any of the above index notation:

In [28]:
x2[0,0] = 12

In [29]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means,
for example, that if you attempt to insert a floating-point value into an integer array, the
value will be silently truncated. Don’t be caught unaware by this behavior!

In [30]:
x1[0] = 3.1416
x1

array([3, 0, 3, 3, 7, 9])
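
Truncation here means the fractional part is dropped, not rounded. A brief sketch, with illustrative values, that also shows one way to keep fractional values by converting to a float array first:

```python
import numpy as np

x = np.array([5, 0, 3])
x[0] = 3.99       # stored as 3: the fraction is dropped, not rounded
x[1] = -3.99      # stored as -3: truncation is toward zero
print(x)          # [ 3 -3  3]

xf = x.astype(float)   # a float copy can hold fractional values
xf[2] = 3.1416
print(xf)         # [ 3.     -3.      3.1416]
```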

### Array Slicing: Accessing Subarrays  

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list:  
__syntax:__ `x[start:stop:step]`  
If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1.

**One-dimensional subarrays**

In [1]:
import numpy as np

x = np.array(range(10))
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [2]:
x[:5] # first five elements

array([0, 1, 2, 3, 4])

In [3]:
x[5:] # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [4]:
x[4:7] # middle subarray

array([4, 5, 6])

In [5]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [6]:
x[1::2] # every other element, starting at index 1

array([1, 3, 5, 7, 9])

A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:

In [7]:
x[::-1] # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [8]:
x[5::-2] # reversed every other from index 5

array([5, 3, 1])
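
To make the swapped defaults concrete, the sketch below spells out the start value that a negative step implies and shows what happens when only the stop value is given:

```python
import numpy as np

x = np.arange(10)
n = len(x)

rev = x[::-1]
explicit = x[n - 1::-1]   # with a negative step, start defaults to the last index

print(x[5::-2])   # [5 3 1]  start at index 5, walk backward to the beginning
print(x[:5:-2])   # [9 7]    start at the end, stop before reaching index 5
```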

**Multidimensional subarrays**  

Multidimensional slices work in the same way, with multiple slices separated by commas.

In [3]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [7]:
x2[:2,:3] #  two rows, three columns

array([[3, 5, 2],
       [7, 6, 8]])

In [6]:
x2[:3,::2] # all rows, every other column

array([[3, 2],
       [7, 8],
       [1, 7]])

Finally, subarray dimensions can even be reversed together:

In [8]:
x2[::-1, ::-1]

array([[7, 7, 6, 1],
       [8, 8, 6, 7],
       [4, 2, 5, 3]])

**Accessing array rows and columns** One commonly needed routine is accessing single
rows or columns of an array. You can do this by combining indexing and slicing,
using an empty slice marked by a single colon (:):

In [10]:
print(x2[:,0]) # print first column of x2

[3 7 1]


### Subarrays as no-copy views  

**One important—and extremely useful—thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider our two-dimensional array from before:**

In [11]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [12]:
x2[:2,:2]

array([[3, 5],
       [7, 6]])

**Now if we modify this subarray, we’ll see that the original array is changed! Observe:**

In [14]:
sub_arr = x2[:2, :2]
sub_arr

array([[3, 5],
       [7, 6]])

In [16]:
sub_arr[0,0] = 99
sub_arr

array([[99,  5],
       [ 7,  6]])

In [18]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


__Creating copies of arrays__  

Despite the nice features of array views, it is sometimes useful to instead explicitly
copy the data within an array or a subarray. This can be most easily done with the
copy() method:

In [20]:
x2_sub_copy = x2[:2,:2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


If we now modify this subarray, the original array is not touched:

In [21]:
x2_sub_copy[0,0] =  12

In [22]:
print(x2_sub_copy)

[[12  5]
 [ 7  6]]


In [23]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
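
Whether two arrays share a buffer can also be checked without modifying anything. A minimal sketch using np.shares_memory and the base attribute, with a plain list slice shown for contrast:

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

view = x2[:2, :2]            # slice: a view onto x2's buffer
copy = x2[:2, :2].copy()     # explicit copy: an independent buffer

print(np.shares_memory(x2, view))  # True
print(np.shares_memory(x2, copy))  # False
print(view.base is x2)             # True: the view refers back to x2

# Contrast: slicing a Python list produces a copy
lst = [3, 5, 2, 4]
part = lst[:2]
part[0] = 99
print(lst)                         # [3, 5, 2, 4]
```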


### Reshaping of Arrays  
Another useful type of operation is reshaping of arrays. The most flexible way of
doing this is with the reshape() method. For example, if you want to put the numbers
1 through 9 in a 3×3 grid, you can do the following:

In [24]:
grid = np.arange(1,10).reshape(3,3)
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
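
For reshape() to work, the total number of elements must stay the same. The sketch below shows a mismatch being rejected; it also uses -1 as a dimension, which asks NumPy to infer that size, and checks that the reshaped grid shares memory with the original array in this contiguous case.

```python
import numpy as np

a = np.arange(1, 10)
grid = a.reshape(3, 3)

try:
    a.reshape(2, 5)          # 9 elements cannot fill a 2x5 grid
except ValueError as exc:
    mismatch = True
    print("error:", exc)
else:
    mismatch = False

inferred = a.reshape(3, -1)  # -1 lets NumPy work out the second dimension
print(inferred.shape)              # (3, 3)
print(np.shares_memory(a, grid))   # True: here reshape returned a view
```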


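Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. You can do this with the reshape() method, or more easily by making use of the newaxis keyword within a slice operation: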

In [5]:
import numpy as np
x = np.array([1,2,3])
print(x)

[1 2 3]


In [6]:
type(x)

numpy.ndarray

In [11]:
# row vector via reshape()
x.reshape((1,3))

array([[1, 2, 3]])

In [14]:
# column vector via reshape()
x.reshape((3,1))

array([[1],
       [2],
       [3]])

In [10]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [13]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
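
Comparing shapes makes the effect of each form explicit. A short sketch confirming that the reshape() and newaxis versions produce identical arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
row = x[np.newaxis, :]
col = x[:, np.newaxis]

print(x.shape, row.shape, col.shape)   # (3,) (1, 3) (3, 1)
print(np.array_equal(row, x.reshape((1, 3))))  # True
print(np.array_equal(col, x.reshape((3, 1))))  # True
```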

### Array Concatenation and Splitting  

All of the preceding routines worked on single arrays. It’s also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays.  

**Concatenation of arrays**  

Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument:

In [15]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:

In [16]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


np.concatenate can also be used for two-dimensional arrays:

In [17]:
grid1 = np.array([[1, 2, 3],
                  [4, 5, 6]])

grid2 = np.array([[7, 8, 9],
                  [9, 8, 7]])

In [18]:
print(np.concatenate([grid1, grid2]))

[[1 2 3]
 [4 5 6]
 [7 8 9]
 [9 8 7]]


In [20]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid1, grid2], axis=1)

array([[1, 2, 3, 7, 8, 9],
       [4, 5, 6, 9, 8, 7]])

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

In [21]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [22]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])

np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

**Similarly, np.dstack will stack arrays along the third axis.**  
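
As a small illustration of np.dstack (the two input arrays are made up for this sketch), stacking two 2×2 arrays yields a 2×2×2 array in which each position holds the pair of corresponding elements:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

stacked = np.dstack([a, b])
print(stacked.shape)   # (2, 2, 2)
print(stacked[0, 0])   # [1 5]  element (0, 0) of a, then of b
```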

### Splitting of arrays  
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:

In [3]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])  # np.split(ary, indices_or_sections, axis=0)

print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]
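
Note that N split points produce N + 1 subarrays. The second argument of np.split can also be an integer, in which case the array is divided into that many equal sections. A sketch of both forms:

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])

parts = np.split(x, [3, 5])   # two split points -> three subarrays
halves = np.split(x, 2)       # an integer asks for equal-sized sections

print(len(parts))                    # 3
print([h.tolist() for h in halves])  # [[1, 2, 3, 99], [99, 3, 2, 1]]

try:
    np.split(x, 3)            # 8 elements cannot be divided into 3 equal parts
except ValueError:
    uneven_rejected = True
else:
    uneven_rejected = False
```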


In [2]:
import numpy as np
np.split?

In [4]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [8]:
upper, lower = np.vsplit(grid, [2])
upper

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [9]:
lower

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [10]:
left, right = np.hsplit(grid, [2])

In [11]:
left

array([[ 0,  1],
       [ 4,  5],
       [ 8,  9],
       [12, 13]])

In [12]:
right

array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])

Similarly, np.dsplit will split arrays along the third axis.  

# 2.3 Computation on NumPy Arrays: Universal Functions

__The Slowness of Loops__  
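The relative sluggishness of Python generally manifests itself in situations where
many small operations are being repeated, for instance when looping over arrays to operate
on each element. For example, a straightforward way to compute the reciprocal of each value in an array might look like this: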

In [14]:
import numpy as np

np.random.seed(0)
def compute_reciprocals(values):
    output = np.empty(len(values))
    
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
        
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

This implementation probably feels fairly natural to someone from, say, a C or Java
background. But if we measure the execution time of this code for a large input, we
see that this operation is very slow, perhaps surprisingly so! We’ll benchmark this
with IPython’s %timeit magic:

In [15]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

2.77 s ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


__Introducing UFuncs__  
For many types of operations, NumPy provides a convenient interface into just this
kind of statically typed, compiled routine. This is known as a vectorized operation.
You can accomplish this by simply performing an operation on the array, which will
then be applied to each element. This vectorized approach is designed to push the
loop into the compiled layer that underlies NumPy, leading to much faster execution.
The loop-based function and the vectorized expression give the same results:

In [16]:
print(compute_reciprocals(values))
print(1.0 / values)

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


Looking at the execution time for our big array, we see that it completes orders of magnitude faster than the Python loop:

In [17]:
%timeit (1.0 / big_array)

3.42 ms ± 68 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
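
The %timeit magic only works inside IPython. As a rough sketch of the same comparison in a plain Python script, the standard-library timeit module can be used instead; the array size and the use of np.random.default_rng are choices made for this example, and the measured times will vary by machine.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    """Loop-based reciprocal, one element at a time."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)
big_array = rng.integers(1, 100, size=100_000)

# Time a single run of each approach
loop_time = timeit.timeit(lambda: compute_reciprocals(big_array), number=1)
vec_time = timeit.timeit(lambda: 1.0 / big_array, number=1)

# Both approaches should agree on the result
same = np.allclose(compute_reciprocals(big_array), 1.0 / big_array)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.6f} s, same: {same}")
```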


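Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. Ufuncs are extremely flexible: before we saw an operation between a scalar and an array, but we can also operate between two arrays: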

In [18]:
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

And ufunc operations are not limited to one-dimensional arrays—they can act on multidimensional arrays as well:

In [19]:
x = np.arange(9).reshape((3, 3))
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

Computations using vectorization through ufuncs are nearly always more efficient
than their counterpart implemented through Python loops, especially as the arrays
grow in size. Any time you see such a loop in a Python script, you should consider
whether it can be replaced with a vectorized expression.  

### Exploring NumPy’s UFuncs  

Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary
ufuncs, which operate on two inputs.  

**Array arithmetic**

In [20]:
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)

print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division

x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


There is also a unary ufunc for negation, a ** operator for exponentiation, and a % operator for modulus:

In [21]:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)

-x =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2 =  [0 1 0 1]


In [22]:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])

All of these arithmetic operations are simply convenient wrappers around specific functions built into NumPy; for example, the + operator is a wrapper for the add function:

In [23]:
np.add(x, 2)

array([2, 3, 4, 5])
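
The same correspondence holds for the other operators. The sketch below pairs each operator expression with its named ufunc and checks that the results match; the dictionary is just a convenient way to organize the comparison.

```python
import numpy as np

x = np.arange(4)

# operator expression paired with the equivalent ufunc call
pairs = {
    "+":  (x + 2,  np.add(x, 2)),
    "-":  (x - 2,  np.subtract(x, 2)),
    "*":  (x * 2,  np.multiply(x, 2)),
    "/":  (x / 2,  np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%":  (x % 2,  np.mod(x, 2)),
    "unary -": (-x, np.negative(x)),
}

all_match = all(np.array_equal(a, b) for a, b in pairs.values())
print(all_match)  # True
```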

![Arithmetic operations implemented in NumPy](./images/2-4.png)

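__Absolute value__  

Just as NumPy understands Python’s built-in arithmetic operators, it also understands Python’s built-in absolute value function:
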
In [24]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

The corresponding NumPy ufunc is np.absolute, which is also available under the alias np.abs:

In [25]:
np.absolute(x)

array([2, 1, 0, 1, 2])

In [26]:
np.abs(x)

array([2, 1, 0, 1, 2])

This ufunc can also handle complex data, in which case the absolute value returns the magnitude:

In [27]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

array([5., 5., 2., 1.])

__Trigonometric functions__  

NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions. 

In [28]:
theta = np.linspace(0, np.pi, 3)
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]


Inverse trigonometric functions are also available:

In [29]:
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

x =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]


__Exponents and logarithms__  
Another common type of operation available as NumPy ufuncs is exponentiation:

In [30]:
x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))

x = [1, 2, 3]
e^x = [ 2.71828183  7.3890561  20.08553692]
2^x = [2. 4. 8.]
3^x = [ 3  9 27]


The inverse of the exponentials, the logarithms, are also available. The basic np.log
gives the natural logarithm; if you prefer to compute the base-2 logarithm or the
base-10 logarithm, these are available as well:

In [31]:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))

x = [1, 2, 4, 10]
ln(x) = [0.         0.69314718 1.38629436 2.30258509]
log2(x) = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


There are also some specialized versions that are useful for maintaining precision with very small input:

In [32]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))

exp(x) - 1 = [0.         0.0010005  0.01005017 0.10517092]
log(1 + x) = [0.         0.0009995  0.00995033 0.09531018]
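
The precision benefit only becomes visible for inputs much smaller than those above. The following sketch uses x = 1e-10 and compares each result with a two-term Taylor expansion, which serves here as an assumed reference value that is accurate enough at this scale.

```python
import numpy as np

x = 1e-10

# Reference values from the Taylor series:
#   exp(x) - 1 ≈ x + x**2 / 2     log(1 + x) ≈ x - x**2 / 2
ref_exp = x + x**2 / 2
ref_log = x - x**2 / 2

naive_exp_err = abs((np.exp(x) - 1) - ref_exp) / ref_exp
expm1_err = abs(np.expm1(x) - ref_exp) / ref_exp

naive_log_err = abs(np.log(1 + x) - ref_log) / ref_log
log1p_err = abs(np.log1p(x) - ref_log) / ref_log

print("exp(x) - 1 relative error:", naive_exp_err)
print("expm1(x)   relative error:", expm1_err)
print("log(1 + x) relative error:", naive_log_err)
print("log1p(x)   relative error:", log1p_err)
```

The naive forms lose accuracy because 1 + x is rounded to the nearest double before the subtraction or logarithm is applied.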


### Specialized ufuncs

In [33]:
from scipy import special

# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))

gamma(x) = [1.0000e+00 2.4000e+01 3.6288e+05]
ln|gamma(x)| = [ 0.          3.17805383 12.80182748]
beta(x, 2) = [0.5        0.03333333 0.00909091]


In [34]:
# Error function (integral of Gaussian)
# its complement, and its inverse

x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

erf(x) = [0.         0.32862676 0.67780119 0.84270079]
erfc(x) = [1.         0.67137324 0.32219881 0.15729921]
erfinv(x) = [0.         0.27246271 0.73286908        inf]


In [35]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

[ 0. 10. 20. 30. 40.]


For large calculations, it is sometimes useful to be able to specify the array where the
result of the calculation will be stored. Rather than creating a temporary array, you
can use the out argument, available on all ufuncs and shown in the cell above, to write computation results directly to the memory location where you’d like them to be.
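
The out argument also accepts array views. As a sketch, the results of a computation can be written to every other element of a larger array:

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10)

# Write 2 ** x into positions 0, 2, 4, 6, 8 of y without a temporary array
np.power(2, x, out=y[::2])
print(y)  # [ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]
```

Writing `y[::2] = 2 ** x` gives the same values, but it first builds a temporary array for `2 ** x` and then copies it into y.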

__Aggregates__  
For binary ufuncs, there are some interesting aggregates that can be computed
directly from the object. For example, if we’d like to reduce an array with a particular
operation, we can use the reduce method of any ufunc. 

In [2]:
import numpy as np
x = np.arange(1, 6)
np.add.reduce(x)

15

In [3]:
np.multiply.reduce(x)

120

If we’d like to store all the intermediate results of the computation, we can instead use
accumulate:

In [4]:
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15])
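
For these particular cases NumPy also provides dedicated functions (np.sum, np.prod, np.cumsum, np.cumprod) that compute the same results. A short sketch of the correspondence:

```python
import numpy as np

x = np.arange(1, 6)

print(np.multiply.accumulate(x))  # [  1   2   6  24 120]

same_sum = np.add.reduce(x) == np.sum(x)
same_prod = np.multiply.reduce(x) == np.prod(x)
same_cumsum = np.array_equal(np.add.accumulate(x), np.cumsum(x))
same_cumprod = np.array_equal(np.multiply.accumulate(x), np.cumprod(x))
print(same_sum, same_prod, same_cumsum, same_cumprod)  # True True True True
```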

# 2.4 Aggregations: Min, Max, and Everything in Between  

### Summing the Values in an Array

In [1]:
import numpy as np

L = np.random.random(100)
sum(L)

51.23712958322241

The syntax of Python’s built-in sum function, used above, is quite similar to that of NumPy’s sum function, and the result is the same
in the simplest case:

In [2]:
np.sum(L)

51.23712958322241

__However, because it executes the operation in compiled code, NumPy’s version of the operation is computed much more quickly:__

In [3]:
big_array = np.random.rand(1000000)

%timeit sum(big_array)
%timeit np.sum(big_array)

128 ms ± 2.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
788 µs ± 7.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


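Be careful, though: the built-in sum function and np.sum are not identical. In particular, their optional arguments have different meanings, and np.sum is aware of multiple array dimensions.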
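
One place where the built-in sum and np.sum diverge is the second positional argument: the built-in treats it as a start value, while np.sum treats it as an axis. A minimal sketch on a small two-dimensional array (the array itself is made up for the example):

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

builtin_result = sum(M, 1)       # start value 1, then adds row 0 and row 1
numpy_result = np.sum(M, 1)      # sums along axis 1, i.e. within each row

print(builtin_result)  # [4 6 8]
print(numpy_result)    # [ 3 12]
```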

### Minimum and Maximum  
Similarly, Python has built-in min and max functions, used to find the minimum value and maximum value of any given array:

In [9]:
min(big_array) , max(big_array)

(2.728055801370921e-06, 0.9999998322629087)

**NumPy’s corresponding functions have similar syntax, and again operate much more quickly:**

In [10]:
np.min(big_array), np.max(big_array)

(2.728055801370921e-06, 0.9999998322629087)

In [11]:
%timeit min(big_array)
%timeit np.min(big_array)

103 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:

In [12]:
print(big_array.min(), big_array.max(), big_array.sum())

2.728055801370921e-06 0.9999998322629087 499957.820143077


### Multidimensional aggregates

In [17]:
M = np.random.random((3,4))
print(M)

[[0.51102235 0.38232845 0.27347122 0.51248662]
 [0.51981408 0.58122516 0.91607566 0.15099868]
 [0.14229557 0.12531147 0.62491602 0.82566678]]


In [19]:
M.sum()

5.56561205740133

Aggregation functions take an additional argument specifying the axis along which
the aggregate is computed. For example, we can find the minimum value within each
column by specifying axis=0:

In [20]:
M.min(axis = 0)

array([0.14229557, 0.12531147, 0.27347122, 0.15099868])

Similarly, we can find the minimum value within each row by specifying axis=1:

In [22]:
M.min(axis = 1)

array([0.27347122, 0.15099868, 0.12531147])

So specifying axis=0 means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.  
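
Looking at the shape of the result is a quick way to confirm which axis was collapsed. A sketch using a small integer array so the sums are easy to verify by hand:

```python
import numpy as np

M = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

col_sums = M.sum(axis=0)   # collapses the 3 rows: one value per column
row_sums = M.sum(axis=1)   # collapses the 4 columns: one value per row

print(col_sums, col_sums.shape)  # [12 15 18 21] (4,)
print(row_sums, row_sums.shape)  # [ 6 22 38] (3,)
```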

![Aggregation function available on numpy](./images/2-5.png)

![2-6](./images/2-6.png)
![2-7](./images/2-7.png)
![2-8](./images/2-8.png)