# Introduction to NumPy
NumPy is a powerful tool for numerical computation on large data sets. It is a Python package with many built-in functions for fast, vectorized array operations. With NumPy we can perform complex computations on entire arrays without writing Python for loops, and NumPy arrays are significantly faster and more memory-efficient than their pure Python counterparts.
In this Jupyter notebook I will introduce the basic ways of creating NumPy arrays, then some common operations and how we can use them efficiently.

In [1]:
import numpy as np

## Creating NumPy arrays
NumPy arrays can be one-dimensional or multidimensional.
If we have a list of data that we want to convert to a NumPy array, we can easily do that with NumPy's built-in 'array()' function.

In [2]:
# Creating one dimensional NP array
some_list_1 = [1,2,3]
some_array_1 = np.array(some_list_1)

# Creating multi dimensional NP array
some_list_2 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
some_array_2 = np.array(some_list_2)

In [3]:
some_array_1

array([1, 2, 3])

In [4]:
some_array_2

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

The array() function accepts any sequence-like object (including other arrays). When creating a multidimensional array from a list of lists, the inner lists must all have the same length. Otherwise, older NumPy versions create a one-dimensional array with the data type "object", while recent versions (1.24 and later) raise a ValueError unless dtype=object is passed explicitly. Try that yourself for better understanding.
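As a minimal sketch of the unequal-length case, the example below passes dtype=object explicitly, which should behave the same way across NumPy versions. The name `ragged` is made up for illustration.

```python
import numpy as np

# Inner lists of unequal length: request an object array explicitly
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)

print(ragged.shape)  # (2,)  -> one-dimensional, one entry per inner list
print(ragged.dtype)  # object
print(ragged[1])     # [4, 5] -> each element is still a plain Python list
```

Such an array holds references to Python lists, so the fast vectorized operations shown later in this notebook do not apply to it in the usual way.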

If we want to create an array filled with an automatically generated sequence, the way we create a list with the "range()" function, we can use NumPy's "arange()" function.

In [5]:
# np.arange([start,] stop[, step], dtype=None)
some_array_3 = np.arange(5) # One dimensional
print("array 3: \n{0}".format(some_array_3))

some_array_4 = np.arange(2,5) # One dimensional with specified start
print("\narray 4: \n{0}".format(some_array_4))

some_array_5 = np.arange(2,10,2) # One dimensional with specified start and steps
print("\narray 5: \n{0}".format(some_array_5))

array 3: 
[0 1 2 3 4]

array 4: 
[2 3 4]

array 5: 
[2 4 6 8]


Sometimes we might need to create arrays filled with only ones or zeros. There are built-in functions for that.

In [6]:
some_array_6 = np.zeros((2,3))
some_array_7 = np.ones((2,3))

In [7]:
print("array 6: \n{0}".format(some_array_6))
print("\narray 7: \n{0}".format(some_array_7))

array 6: 
[[0. 0. 0.]
 [0. 0. 0.]]

array 7: 
[[1. 1. 1.]
 [1. 1. 1.]]


By default the 1s and 0s are stored as floats. If we pass 'int' as the data type, as follows, the 1s and 0s will be integers.

In [8]:
some_array_6 = np.zeros((2,3),'int')
some_array_6

array([[0, 0, 0],
       [0, 0, 0]])

In [9]:
some_array_7 = np.ones((2,3),'int')
some_array_7

array([[1, 1, 1],
       [1, 1, 1]])

There is also a built-in function for creating an identity matrix.

In [10]:
identity = np.identity(3)
identity

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

To create a NumPy array with random data, we can use "random.randn()" (floats drawn from the standard normal distribution) or "random.randint()" (integers from 'low' up to, but not including, 'high').

In [11]:
some_array_8 = np.random.randn(2,3)
some_array_8

array([[ 0.17176512, -1.70811699,  0.8753117 ],
       [-0.22236257, -0.66810928,  0.55256393]])

In [12]:
some_array_9 = np.random.randint(low = 1, high = 10, size = (3,3))
some_array_9

array([[9, 5, 4],
       [6, 5, 7],
       [7, 7, 9]])
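Because these arrays are random, the numbers above will differ on every run. One way to get repeatable results, sketched here with an arbitrary seed value of 42, is to call np.random.seed() before generating the data.

```python
import numpy as np

np.random.seed(42)  # the seed value 42 is arbitrary
first = np.random.randint(low=1, high=10, size=(3, 3))

np.random.seed(42)  # resetting the same seed repeats the same sequence
second = np.random.randint(low=1, high=10, size=(3, 3))

print(np.array_equal(first, second))  # True
# 'high' is exclusive, so every value lies between 1 and 9
print(first.min() >= 1 and first.max() <= 9)  # True
```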

That's all for now. One thing to remember: the data type of an array is not limited to floats or integers. It can also be complex numbers, booleans, strings, or other Python objects.

In [13]:
some_array_1>2

array([False, False,  True])

We can easily convert between data types, as long as the values are convertible.

In [14]:
# Creating an array with strings as data type
string_array = np.array(['1','2','2.5','3.5'])
string_array

array(['1', '2', '2.5', '3.5'], dtype='<U3')

In [15]:
# Now converting its data type to float numbers 
numeric_array = string_array.astype('float')
numeric_array

array([1. , 2. , 2.5, 3.5])

In [16]:
numeric_array.dtype

dtype('float64')
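To illustrate what "convertible" means in practice, here is a small sketch with made-up data. A string that is not a number makes the conversion fail with a ValueError, and converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

# 'abc' cannot be parsed as a number, so astype raises ValueError
bad_strings = np.array(['1', '2.5', 'abc'])
try:
    bad_strings.astype('float')
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("conversion failed:", err)

# Converting floats to integers truncates toward zero
truncated = np.array([1.7, -1.7]).astype('int')
print(truncated)  # values are 1 and -1
```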

## Attributes

Now that we know how to create NumPy arrays, let's take a look at how we can inspect the attributes of an array. To find the shape of an array, we use "some_array.shape". To find the number of dimensions, we use "some_array.ndim". To find the data type, we use "some_array.dtype".

In [17]:
some_array_2.shape

(4, 3)

In [18]:
some_array_2.ndim

2

In [19]:
some_array_2.dtype

dtype('int32')
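The same attributes work for arrays with more dimensions. The sketch below uses a hypothetical three-dimensional array of zeros, and also prints the "size" attribute (the total number of elements), which is not covered above. Note that the default integer dtype depends on the platform, so an integer array may report int32 or int64; the float array used here avoids that ambiguity.

```python
import numpy as np

cube = np.zeros((2, 3, 4))  # 2 blocks, each with 3 rows and 4 columns

print(cube.shape)  # (2, 3, 4)
print(cube.ndim)   # 3
print(cube.dtype)  # float64
print(cube.size)   # 24, i.e. 2 * 3 * 4 elements in total
```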

## Indexing and slicing

Indexing and slicing one-dimensional arrays is simple.

In [20]:
# Creating a new array first for demonstration
new_array = np.arange(10)
new_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
# Calling the 6th element. Remember, indexing in python starts from 0.
new_array[5]

5

In [22]:
#slicing
sliced_array = new_array[2:6]
sliced_array

array([2, 3, 4, 5])

One thing to note is that a slice is just a 'view', not a new copy. So if we change anything in the sliced array, the original array will be mutated too. Let's see.

In [23]:
sliced_array[0] = 20
sliced_array[1] = 30

In [24]:
sliced_array

array([20, 30,  4,  5])

In [25]:
new_array

array([ 0,  1, 20, 30,  4,  5,  6,  7,  8,  9])

In [26]:
sliced_array[:] = 100
sliced_array

array([100, 100, 100, 100])

In [27]:
new_array

array([  0,   1, 100, 100, 100, 100,   6,   7,   8,   9])

To create a copy, we have to call .copy() on the slice, for example "new_array[2:6].copy()". Then we can change the sliced array without mutating the original one.
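A minimal sketch of the difference between a copy and a view, using made-up variable names. np.shares_memory() is used only to confirm whether two arrays point at the same data.

```python
import numpy as np

original = np.arange(10)

# .copy() gives the slice its own data
independent = original[2:6].copy()
independent[:] = 100

print(independent)  # [100 100 100 100]
print(original)     # [0 1 2 3 4 5 6 7 8 9] -> unchanged

print(np.shares_memory(original, independent))    # False
print(np.shares_memory(original, original[2:6]))  # True: a plain slice is a view
```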

Indexing a multidimensional array offers more options. We can access an individual element by chaining index operations, or we can pass a comma-separated list of indices.

In [28]:
# Creating a 2D array first
new_array = np.array([[1,2,3],[4,5,6]])
new_array

array([[1, 2, 3],
       [4, 5, 6]])

In [29]:
# Calling the first row
new_array[0]

array([1, 2, 3])

In [30]:
# calling the element from the first column of the first row
new_array[0][0]

1

In [31]:
# or we can pass the list of indices
new_array[0,0]

1

In [32]:
new_array[0,2]


3

Slicing a 2D or multidimensional array is slightly different.

In [33]:
new_array = np.array([[1,2,3],[4,5,6],[7,8,9]])
new_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [34]:
new_array[:2] # selecting the first two rows (up to, but not including, the 3rd row)


array([[1, 2, 3],
       [4, 5, 6]])

In [35]:
new_array[1:] #selecting every row after the first one

array([[4, 5, 6],
       [7, 8, 9]])

In [36]:
new_array[:,:2] # selecting the first two columns; the leading ':' means 'take all rows'

array([[1, 2],
       [4, 5],
       [7, 8]])

In [37]:
new_array[:2,:2] # take the first two rows and the first two columns

array([[1, 2],
       [4, 5]])

In [38]:
new_array[:2,1:] # take the first two rows and all columns from the 2nd one to the last

array([[2, 3],
       [5, 6]])
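Indices and slices can also be mixed. As a small sketch (the array name `grid` is made up), using a single index for the column drops that axis and returns a one-dimensional array, while a one-element slice keeps the result two-dimensional.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_1d = grid[:, 1]    # integer index: the column axis is dropped
column_2d = grid[:, 1:2]  # slice of length one: the column axis is kept

print(column_1d.shape)  # (3,)
print(column_2d.shape)  # (3, 1)
print(column_1d)        # the second column: 2, 5, 8
```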

There is also another very interesting type of indexing called boolean indexing. If we pass an array of boolean values as the index, we get only the rows for which the boolean value is 'True'. Let's take a look at the following example.

In [39]:
new_array = np.array([[1,2,3],[4,5,6],[7,8,9]])
boolean_array = np.array([True, False, True])

In [40]:
new_array[boolean_array]

array([[1, 2, 3],
       [7, 8, 9]])

This boolean indexing can be used in very creative ways. Let's look at a practical example.

Suppose we have four different things to mix (sugar, salt, spice, water or whatever). We tried 10 different combinations. Some proved to be a good mix, some proved to be a bad mix.

In [41]:
combinations = np.random.randint(low = 1, high = 10, size = (10,4)) # Randomly creating an array for demonstration
combinations

array([[4, 9, 2, 9],
       [4, 8, 6, 3],
       [3, 4, 5, 4],
       [9, 8, 1, 3],
       [2, 5, 7, 7],
       [3, 2, 8, 3],
       [4, 6, 4, 4],
       [4, 6, 2, 1],
       [9, 6, 1, 7],
       [5, 5, 3, 5]])

In [42]:
results = np.array(['g','g','vg','g','b','b','b','vg','b','b']) 
# Randomly created the result, assume g = good, vg = very good, b = bad

Each row in the 'combinations' array gives the amount of each ingredient. Suppose we want an array showing only the very good combinations, only the good ones (designated by 'g' in 'results'), or anything but the bad ones. Here is how we can use boolean indexing to do this.

In [43]:
results == 'g'

array([ True,  True, False,  True, False, False, False, False, False,
       False])

In [44]:
combinations[results=='vg']

array([[3, 4, 5, 4],
       [4, 6, 2, 1]])

In [45]:
combinations[results=='g']

array([[4, 9, 2, 9],
       [4, 8, 6, 3],
       [9, 8, 1, 3]])

In [46]:
condition = results=='b'
combinations[~condition] # using ~ before a condition inverts it, selecting every row that does not match the condition

array([[4, 9, 2, 9],
       [4, 8, 6, 3],
       [3, 4, 5, 4],
       [9, 8, 1, 3],
       [4, 6, 2, 1]])
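Conditions can also be combined. The sketch below uses a smaller, made-up data set and selects rows that are either good or very good with the | (or) operator; & (and) works the same way. Each comparison needs its own parentheses. np.isin() is shown as an alternative way to build the same mask.

```python
import numpy as np

amounts = np.array([[4, 9], [3, 4], [2, 5], [4, 6]])  # hypothetical data
labels = np.array(['g', 'vg', 'b', 'vg'])

# Parentheses are required around each comparison
good_or_very_good = (labels == 'g') | (labels == 'vg')
selected = amounts[good_or_very_good]
print(selected)  # rows 0, 1 and 3

# The same mask built with np.isin
same_mask = np.isin(labels, ['g', 'vg'])
print(np.array_equal(good_or_very_good, same_mask))  # True
```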

We can also assign new values using boolean indexing.

In [47]:
combinations[~condition] = 0
combinations

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [2, 5, 7, 7],
       [3, 2, 8, 3],
       [4, 6, 4, 4],
       [0, 0, 0, 0],
       [9, 6, 1, 7],
       [5, 5, 3, 5]])

In boolean indexing, we have to make sure that the length of the boolean index matches the length of the axis it's indexing.
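A short sketch of what happens when the lengths do not match, using a made-up 3x2 array. A mask of length 2 cannot index the three rows and raises an IndexError, but the same mask is valid for the two columns.

```python
import numpy as np

table = np.arange(6).reshape(3, 2)   # 3 rows, 2 columns
short_mask = np.array([True, False]) # length 2

try:
    table[short_mask]                # axis 0 has length 3, the mask has length 2
    mismatch_detected = False
except IndexError as err:
    mismatch_detected = True
    print("mismatch:", err)

# The same mask matches axis 1, which has length 2
first_column = table[:, short_mask]
print(first_column.shape)  # (3, 1)
```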

## Operations

Here I will cover some basic operations like addition, subtraction, multiplication, and division, and some basic matrix operations. There are many more advanced operations available in NumPy, but they cannot all be covered in this "basics" notebook.

In [48]:
# add
some_array_2_2 = some_array_2 + some_array_2
some_array_2_2

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18],
       [20, 22, 24]])

In [49]:
# Subtract
some_array_2_2 - some_array_2

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [50]:
# Multiplication with a scalar value
some_array_2 * 10

array([[ 10,  20,  30],
       [ 40,  50,  60],
       [ 70,  80,  90],
       [100, 110, 120]])

In [51]:
# Division
some_array_2 / 10

array([[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6],
       [0.7, 0.8, 0.9],
       [1. , 1.1, 1.2]])

In [52]:
# Adding a scalar value to every element
some_array_2 + 10

array([[11, 12, 13],
       [14, 15, 16],
       [17, 18, 19],
       [20, 21, 22]])

In the last addition operation, we did not need a for loop. This is one of the reasons why we want to use NumPy. Doing such operations without Python for loops is significantly faster on large data sets. Let's test it ourselves.

In [53]:
some_array = np.arange(10000000)
some_list = [x for x in range(10000000)]

We now have both an array and an equivalent list object. Let's perform a task on them and see which is faster.

In [54]:
%time some_array = some_array ** 2

Wall time: 31.2 ms


In [55]:
%time some_list_2 = [x ** 2 for x in some_list]

Wall time: 4.55 s


You can see it for yourself. The Python list comprehension took more than 100 times longer (this will differ across hardware) than NumPy's array operation.
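The %time magic only works inside Jupyter or IPython. As a rough sketch of the same comparison in plain Python, the standard timeit module can be used instead. The array size and repeat count below are arbitrary and smaller than in the cells above, and dtype=np.int64 is an added assumption so that the squares do not overflow on platforms where the default integer is 32-bit.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size, kept small so the sketch runs quickly
values_array = np.arange(n, dtype=np.int64)
values_list = list(range(n))

# Time 10 repetitions of each approach
array_time = timeit.timeit(lambda: values_array ** 2, number=10)
list_time = timeit.timeit(lambda: [x ** 2 for x in values_list], number=10)

print("array: {0:.4f} s, list: {1:.4f} s".format(array_time, list_time))

# Both approaches produce the same numbers
squares_from_list = np.array([x ** 2 for x in values_list], dtype=np.int64)
same_result = np.array_equal(values_array ** 2, squares_from_list)
print(same_result)  # True
```

The exact timings depend on the machine, so no particular ratio should be expected.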

Moving on to multiplying an array by another array. Multiplication using the * symbol multiplies each element by the corresponding element of the other array.

In [56]:
# Creating an array first
some_array = np.array([[1,2,3],[4,5,6],[7,8,9]])
some_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [57]:
some_array * some_array

array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])

But if we want to perform matrix multiplication, we can use the array's .dot() method or the equivalent np.dot() function.

In [58]:
some_array.dot(some_array)
# np.dot(some_array, some_array)

array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])
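For two-dimensional arrays, the @ operator and np.matmul() give the same result as .dot(). The sketch below checks this on the same 3x3 array, and then multiplies a made-up 2x3 array by its 3x2 transpose to show how the shapes combine.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

via_dot = matrix.dot(matrix)
via_operator = matrix @ matrix          # matrix multiplication operator
via_matmul = np.matmul(matrix, matrix)

print(np.array_equal(via_dot, via_operator))  # True
print(np.array_equal(via_dot, via_matmul))    # True

# Shapes (2, 3) and (3, 2) multiply to give shape (2, 2)
wide = np.arange(6).reshape(2, 3)
product = wide @ wide.T
print(product.shape)  # (2, 2)
```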

We can transpose an array (exchanging its rows and columns) by using .T after the array.

In [59]:
some_array.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

For a multidimensional array, we pass a tuple of the axes in the order we want. For example, some_array.transpose((1,2,0)) will make the second axis first, the third axis second, and the first axis last.

In [60]:
some_array = np.arange(16).reshape((2, 2, 4))

In [61]:
some_array.transpose((1, 2, 0))

array([[[ 0,  8],
        [ 1,  9],
        [ 2, 10],
        [ 3, 11]],

       [[ 4, 12],
        [ 5, 13],
        [ 6, 14],
        [ 7, 15]]])
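One way to check how the axes were reordered is to compare shapes and individual elements. In this sketch, the element at position [i, j, k] of the original array ends up at position [j, k, i] of the transposed one.

```python
import numpy as np

original = np.arange(16).reshape((2, 2, 4))
reordered = original.transpose((1, 2, 0))

print(original.shape)   # (2, 2, 4)
print(reordered.shape)  # (2, 4, 2)

# original[i, j, k] is found at reordered[j, k, i]
print(original[0, 1, 3], reordered[1, 3, 0])  # 7 7

# Like slicing, transposing returns a view of the same data
print(np.shares_memory(original, reordered))  # True
```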

There are some universal functions which help us perform fast element-wise operations. For example,

In [62]:
np.sqrt(some_array)

array([[[0.        , 1.        , 1.41421356, 1.73205081],
        [2.        , 2.23606798, 2.44948974, 2.64575131]],

       [[2.82842712, 3.        , 3.16227766, 3.31662479],
        [3.46410162, 3.60555128, 3.74165739, 3.87298335]]])

Some other universal functions are:

- abs, fabs: compute the absolute value element-wise (abs works for integer, floating-point, or complex values; fabs does not accept complex values)
- sqrt: compute the square root of each element
- square: compute the square of each element
- exp: compute the exponential (e raised to the power x) of each element
- log, log10, log2, log1p: natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively
- sign: compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative)
- ceil: compute the ceiling of each element (i.e., the smallest integer greater than or equal to that element)
- floor: compute the floor of each element (i.e., the largest integer less than or equal to that element)
- rint: round elements to the nearest integer, preserving the dtype
- modf: return the fractional and integral parts of an array as two separate arrays
- isnan: return a boolean array indicating whether each value is NaN (Not a Number)
- isfinite, isinf: return a boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively
- cos, cosh, sin, sinh, tan, tanh: regular and hyperbolic trigonometric functions
- arccos, arccosh, arcsin, arcsinh, arctan, arctanh: inverse trigonometric functions
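A few of these functions applied to a small, made-up array. The results are listed in the comments as plain values; the printed array formatting may look slightly different.

```python
import numpy as np

values = np.array([-2.5, 0.0, 1.5, 2.5])

absolute = np.abs(values)      # 2.5, 0, 1.5, 2.5
signs = np.sign(values)        # -1, 0, 1, 1
ceilings = np.ceil(values)     # -2, 0, 2, 3
floors = np.floor(values)      # -3, 0, 1, 2
rounded = np.rint(values)      # -2, 0, 2, 2 (exact halves go to the nearest even integer)

# modf returns two arrays: the fractional parts and the integral parts
fractional, integral = np.modf(values)
print(fractional)  # -0.5, 0, 0.5, 0.5
print(integral)    # -2, 0, 1, 2

# isnan flags missing values
print(np.isnan(np.array([1.0, np.nan])))  # False, True
```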

There are some mathematical functions to evaluate statistics of an array. For example,

In [63]:
some_array.mean()
# np.mean(some_array)

7.5

Some other mathematical functions are:

- mean: arithmetic mean
- std, var: standard deviation and variance, respectively
- min, max: minimum and maximum
- argmin, argmax: indices of minimum and maximum elements, respectively
- cumsum: cumulative sum of elements starting from 0
- cumprod: cumulative product of elements starting from 1
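A short sketch of some of these functions on a made-up 2x3 array. It also uses the optional "axis" argument, which is not covered above: axis=0 computes the statistic down each column and axis=1 across each row.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

overall_mean = data.mean()       # 3.5, over all six elements
column_means = data.mean(axis=0) # 2.5, 3.5, 4.5 (one value per column)
row_means = data.mean(axis=1)    # 2.0, 5.0 (one value per row)

largest = data.max()             # 6
largest_index = data.argmax()    # 5, the position in the flattened array

running_sum = data.cumsum()      # 1, 3, 6, 10, 15, 21
running_product = data.cumprod() # 1, 2, 6, 24, 120, 720

print(overall_mean, column_means, row_means)
print(largest, largest_index)
print(running_sum, running_product)
```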

# Conclusion

I have tried to cover the necessary basics of NumPy, enough to start working with it as a beginner, in as much detail as possible without making it too hard to understand, so that beginners don't get lost. I am hoping to make another notebook on advanced uses of NumPy. But for that, I have a lot to learn myself!