![*INTERTECHNICA - SOLON EDUCATIONAL PROGRAMS - TECHNOLOGY LINE*](attachment:IntertechnicaLogo.png)

# Python for Data Processing - Numpy Basics

*Numpy is the core library for scientific computing in Python. 
It provides a high-performance multidimensional array object, and the tools for working with these arrays.*

## 1. Installing and Importing numpy

Numpy can be installed by using the following command (usually it comes pre-installed in conda-like environments):

In [1]:
!python -m pip install numpy



Numpy can be imported like any other Python module via the **import** statement. 

In [2]:
import numpy as np
print ("You are using numpy version {}".format(np.__version__))

You are using numpy version 1.16.2


## 2. Understanding numpy Arrays

A numpy array is a **container of values** and it is indexed by a **tuple of non-negative integers**.  
All the values of a numpy array have the same type.  
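
The single-type rule can be seen in a small sketch. The variable names below are illustrative and not part of the original notebook. The exact integer type (``int32`` or ``int64``) depends on the platform, so only its kind is printed.

```python
import numpy as np

# Mixed Python values are converted to one common type when the array is built.
ints = np.array([1, 2, 3])           # only integers -> an integer dtype
mixed = np.array([1, 2.5, 3])        # integers and a float -> float64
texts = np.array([1, "two", 3.0])    # numbers and a string -> Unicode strings

print(ints.dtype.kind)    # i (signed integer)
print(mixed.dtype)        # float64
print(texts.dtype.kind)   # U (Unicode string)
```

If the automatic conversion is not the one you want, the type can be requested explicitly when the array is created.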

Numpy arrays are instances of the ``ndarray`` class in the numpy package.  
The types supported by numpy are as follows:

---

| Data type	    | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (alias removed in NumPy 2.0; prefer ``float64``).| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (alias removed in NumPy 2.0; prefer ``complex128``).| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 

---

These types can be directly accessed from the numpy package. 

In [3]:
print(np.int16)

<class 'numpy.int16'>


In [4]:
print(np.float64)

<class 'numpy.float64'>
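
A type from the table can be requested through the ``dtype`` argument when an array is created. The following is a minimal sketch with illustrative variable names. It also uses ``np.iinfo`` to confirm the ``int16`` range listed in the table.

```python
import numpy as np

# Request a specific element type at creation time.
small = np.array([1, 2, 3], dtype=np.int16)
print(small.dtype, small.itemsize)   # int16 2  (2 bytes per element)

# np.iinfo reports the limits of an integer type.
limits = np.iinfo(np.int16)
print(limits.min, limits.max)        # -32768 32767

# astype returns a converted copy; the original array is unchanged.
as_float = small.astype(np.float64)
print(as_float.dtype, small.dtype)   # float64 int16
```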


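A data type can also be specified by a short character code, optionally followed by its size in bytes (or, for strings, the number of characters):

| Code | Description |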
|---------------|-------------|
|'?'| 	boolean|
|'b'| 	(signed) byte|
|'B'| 	unsigned byte|
|'i'| 	(signed) integer|
|'u'| 	unsigned integer|
|'f'| 	floating-point|
|'c'| 	complex-floating point|
|'m'| 	timedelta|
|'M'| 	datetime|
|'O'| 	(Python) objects|
|'S', 'a'| 	zero-terminated bytes (not recommended)|
|'U'| 	Unicode string|
|'V'| 	raw data (void)| 

In [5]:
print(np.dtype('i4'))

int32


In [6]:
print(np.dtype('U20'))

<U20
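
The character codes map onto the named types from the first table. The sketch below, with illustrative variable names, compares a few of them. The ``<`` in the ``<U20`` output above marks little-endian byte order, which is what most current machines report.

```python
import numpy as np

four_byte_int = np.dtype("i4")      # signed integer, 4 bytes
eight_byte_float = np.dtype("f8")   # floating point, 8 bytes
text20 = np.dtype("U20")            # Unicode string, up to 20 characters

print(four_byte_int == np.int32)        # True
print(eight_byte_float == np.float64)   # True
print(text20.kind, text20.itemsize)     # U 80  (4 bytes per character)

# A character code can be passed wherever a dtype is expected.
names = np.array(["ada", "linus"], dtype="U20")
print(names.dtype.kind)                 # U
```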


## 3. Understanding numpy Array Indexing

### One-dimensional array indexing

One-dimensional arrays are accessed with the same simple indexing used for Python sequences. Indexing starts at 0.  
Negative indexes count from the end of the array, starting at -1 for the last element.

Let's build a 1-dimensional array:

In [7]:
x_unidimensional = np.array([ 1,   2,   3,   4,   5,   6,   7,   8,   9,  10])

We can experiment with indexing as follows:

In [8]:
print("First element is {}".format(x_unidimensional[0]))

First element is 1


In [9]:
first_index = 0 
print("First element can be referred using positive indexing {} and it is {}".format(
    first_index,
    x_unidimensional[first_index]
))

First element can be referred using positive indexing 0 and it is 1


In [10]:
first_index_negative = -10
print("First element can be referred also using negative indexing {} and it is {}".format(
    first_index_negative,
    x_unidimensional[first_index_negative]
))

First element can be referred also using negative indexing -10 and it is 1


In [11]:
last_index = 9
print("Last element can be referred using positive indexing {} and it is {}".format(
    last_index,
    x_unidimensional[last_index]
))

Last element can be referred using positive indexing 9 and it is 10


In [12]:
last_index_negative = -1
print("Last element can be referred also using negative indexing {} and it is {}".format(
    last_index_negative,
    x_unidimensional[last_index_negative]
))

Last element can be referred also using negative indexing -1 and it is 10


In [13]:
e4_index = 3 
print("The 4-th element can be referred using positive indexing {} and it is {}".format(
    e4_index,
    x_unidimensional[e4_index]
))

The 4-th element can be referred using positive indexing 3 and it is 4


In [14]:
e4_index_negative = -7
print("The 4-th element can be referred also using negative indexing {} and it is {}".format(
    e4_index_negative,
    x_unidimensional[e4_index_negative]
))

The 4-th element can be referred also using negative indexing -7 and it is 4
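
The relation between the two kinds of index can be summarised as follows: for an array of length ``n``, the negative index ``-k`` refers to the same element as the positive index ``n - k``. The sketch below checks this on an equivalent array and shows what happens with an index outside the valid range. The names ``x``, ``n`` and ``out_of_range`` are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
n = len(x)

# -k and n - k address the same element for k = 1 .. n.
for k in range(1, n + 1):
    assert x[-k] == x[n - k]

print(x[-7], x[n - 7])   # 4 4

# Valid indexes run from -n to n - 1; anything else raises IndexError.
out_of_range = False
try:
    x[10]
except IndexError as error:
    out_of_range = True
    print("IndexError:", error)
```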


A list or array of indexes can be used to select a subset of the array elements.

In [15]:
# Indexes 0, 2, 4, 6, 8 hold the odd values 1, 3, 5, 7, 9
odd_elements_indexes = [0,2,4,6,8]

print("The odd elements in the array can be selected via an array of indexes {} and the result is the array {}".format(
    odd_elements_indexes,
    x_unidimensional[odd_elements_indexes]
))

The odd elements in the array can be selected via an array of indexes [0, 2, 4, 6, 8] and the result is the array [1 3 5 7 9]
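
Two details of selecting with a list of indexes are worth a short sketch: indexes may repeat or be negative, and the result is a new array rather than a window onto the original. The variable names below are illustrative.

```python
import numpy as np

x = np.arange(1, 11)                  # [1, 2, ..., 10]

# The indexes can be given as a list or as an integer array.
idx = np.array([0, 2, 4, 6, 8])
picked = x[idx]
print(picked)                         # [1 3 5 7 9]

# Indexes may repeat and may be negative.
repeated = x[[0, 0, -1]]
print(repeated)                       # [ 1  1 10]

# The selection is a copy: changing it leaves the original untouched.
picked[0] = 99
print(x[0])                           # 1
```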


There are also more advanced indexing methods which will be discussed in the section related to advanced indexing.

### Two dimensional array indexing

Two-dimensional arrays are indexed by a pair of indexes: the first selects the row and the second selects the column.  
Each dimension follows the same indexing rules as the one-dimensional case.

Let's build a simple 10x10 array of integer values.

In [16]:
x_two_dimensional = np.array(
      [[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])

We can experiment with indexing as follows:

In [17]:
index_row = 0
index_column = 0
print("The element at index row {} and index column {} is {}".format(
    index_row,
    index_column,
    x_two_dimensional[index_row, index_column]
))

The element at index row 0 and index column 0 is 1


In [18]:
index_row = 2
index_column = 3
print("The element at index row {} and index column {} is {}".format(
    index_row,
    index_column,
    x_two_dimensional[index_row, index_column]
))

The element at index row 2 and index column 3 is 24


In [19]:
index_row = -1
index_column = -1
print("The element at index row {} and index column {} is {}".format(
    index_row,
    index_column,
    x_two_dimensional[index_row, index_column]
))

The element at index row -1 and index column -1 is 100
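
As a complementary sketch, a single index on a two-dimensional array selects a whole row, so ``grid[row, column]`` and ``grid[row][column]`` return the same element. The array is rebuilt here with ``np.arange`` and ``reshape`` so the example runs on its own; the name ``grid`` is illustrative.

```python
import numpy as np

# Same values as the 10x10 array above: 1..100, filled row by row.
grid = np.arange(1, 101).reshape((10, 10))

# One index selects a full row as a 1-dimensional array.
third_row = grid[2]
print(third_row)                 # [21 22 23 24 25 26 27 28 29 30]

# A pair of indexes selects one element; chaining gives the same value.
print(grid[2, 3], grid[2][3])    # 24 24

# Positive and negative indexes can be mixed across dimensions.
print(grid[0, -1], grid[-1, 0])  # 10 91
```

The pair form ``grid[2, 3]`` is usually preferred because it does not build the intermediate row.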


###  Multi-dimensional array indexing

The case of two-dimensional arrays extends to multi-dimensional arrays. Therefore:  
* An element of an n-dimensional array is accessed by a tuple of n integers, one per dimension;
* The 1-dimensional indexing rules apply to each dimension of the multi-dimensional array;
* It is possible to use arrays of indexes to select multiple elements from an n-dimensional array.
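
The three rules above can be sketched on a small 3-dimensional array. The array ``cube`` and its shape are assumptions chosen for illustration; the values 0..23 make it easy to check each result by hand.

```python
import numpy as np

# A 2 x 3 x 4 array holding 0..23, filled in row-major order.
cube = np.arange(24).reshape((2, 3, 4))

# One integer per dimension selects a single element.
print(cube[0, 1, 2])      # 6   (0*12 + 1*4 + 2)
print(cube[1, 2, 3])      # 23  (1*12 + 2*4 + 3)

# Negative indexes work in every dimension.
print(cube[-1, -1, -1])   # 23

# One index list per dimension picks several elements at once:
# here the elements at (0, 0, 0) and (1, 1, 1).
pair = cube[[0, 1], [0, 1], [0, 1]]
print(pair)               # [ 0 17]
```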

## 4. Array Dimensioning

The shape of an array is a tuple of non-negative integers giving the size of the array along each dimension. It is available through the **shape** attribute:

In [20]:
print ("The shape of the 1-dimensional array is {}".format(x_unidimensional.shape))

The shape of the 1-dimensional array is (10,)


In [21]:
print ("The shape of the 2-dimensional array is {}".format(x_two_dimensional.shape))

The shape of the 2-dimensional array is (10, 10)
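
Two related attributes follow directly from the shape: ``ndim`` is the number of entries in the shape tuple, and ``size`` is the product of those entries. A minimal sketch, with illustrative variable names:

```python
import numpy as np

vector = np.arange(1, 11)                     # shape (10,)
matrix = np.arange(1, 101).reshape((10, 10))  # shape (10, 10)

for array in (vector, matrix):
    # ndim counts the dimensions, size counts the elements.
    print(array.shape, array.ndim, array.size)
    assert array.ndim == len(array.shape)
    assert array.size == np.prod(array.shape)
```

Note that the shape of a 1-dimensional array is the one-element tuple ``(10,)``, not the integer ``10``.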


Arrays can be reshaped via the **reshape** method, which accepts a tuple specifying the new shape. The new shape must be compatible with the number of elements in the array: the product of its dimensions must equal the array's size.

In [22]:
new_shape = (5,2)
x_unidimensional_reshaped = x_unidimensional.reshape(new_shape)
print ("The 1-dimensional array reshaped on {} is {}".format(
    new_shape,
    x_unidimensional_reshaped
))

The 1-dimensional array reshaped on (5, 2) is [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]


In [23]:
new_shape = (2, 2, 25)
x_two_dimensional_reshaped = x_two_dimensional.reshape(new_shape)
print ("The 2-dimensional array reshaped on {} is \n {}".format(
    new_shape,
    x_two_dimensional_reshaped
))

The 2-dimensional array reshaped on (2, 2, 25) is 
 [[[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    18  19  20  21  22  23  24  25]
  [ 26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
    43  44  45  46  47  48  49  50]]

 [[ 51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
    68  69  70  71  72  73  74  75]
  [ 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
    93  94  95  96  97  98  99 100]]]
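
The compatibility requirement can be checked with a short sketch: a shape whose dimensions multiply to the number of elements works, and any other shape raises ``ValueError``. The names below are illustrative. The last lines show that, in this simple case, the reshaped array shares its data with the original instead of copying it.

```python
import numpy as np

x = np.arange(1, 11)            # 10 elements

# 2 * 5 == 10, so this shape is compatible.
ok = x.reshape((2, 5))
print(ok.shape)                 # (2, 5)

# 3 * 4 == 12 != 10, so this shape is rejected.
rejected = False
try:
    x.reshape((3, 4))
except ValueError as error:
    rejected = True
    print("ValueError:", error)

# For this contiguous array, reshape returns a view on the same data.
print(np.shares_memory(x, ok))  # True
```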


It is possible to specify -1 for a single element of the tuple. In this case the size of that dimension will be **inferred from the values of the other dimensions**.

In [24]:
new_shape = (-1,25)
x_two_dimensional_reshaped = x_two_dimensional.reshape(new_shape)
print ("The 2-dimensional array reshaped on {} is \n {}".format(
    new_shape,
    x_two_dimensional_reshaped
))

The 2-dimensional array reshaped on (-1, 25) is 
 [[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
   19  20  21  22  23  24  25]
 [ 26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43
   44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
   69  70  71  72  73  74  75]
 [ 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93
   94  95  96  97  98  99 100]]


In [25]:
new_shape = (-1, 2, 25)
x_two_dimensional_reshaped = x_two_dimensional.reshape(new_shape)
print ("The 2-dimensional array reshaped on {} is \n {}".format(
    new_shape,
    x_two_dimensional_reshaped
))

The 2-dimensional array reshaped on (-1, 2, 25) is 
 [[[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    18  19  20  21  22  23  24  25]
  [ 26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
    43  44  45  46  47  48  49  50]]

 [[ 51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
    68  69  70  71  72  73  74  75]
  [ 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
    93  94  95  96  97  98  99 100]]]
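
The inferred dimension is simply the array size divided by the product of the dimensions that were given. The sketch below, with illustrative names, checks this and shows the two ways the inference can fail: more than one -1, or a size that does not divide evenly.

```python
import numpy as np

x = np.arange(1, 101)                  # 100 elements

a = x.reshape((-1, 25))                # 100 / 25 = 4
b = x.reshape((-1, 2, 25))             # 100 / (2 * 25) = 2
flat = a.reshape(-1)                   # a single -1 flattens the array
print(a.shape, b.shape, flat.shape)    # (4, 25) (2, 2, 25) (100,)

# Only one dimension can be left for numpy to infer.
two_unknowns_rejected = False
try:
    x.reshape((-1, -1))
except ValueError:
    two_unknowns_rejected = True

# The given dimensions must divide the size exactly (100 / 30 is not an integer).
uneven_rejected = False
try:
    x.reshape((-1, 30))
except ValueError:
    uneven_rejected = True

print(two_unknowns_rejected, uneven_rejected)   # True True
```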


It is possible to **increase the dimensionality of an array** by adding extra dimensions of size 1 to the shape tuple, since they do not change the number of elements.

In [26]:
new_shape = (1,1,10)
x_unidimensional_reshaped_3_dimensional = x_unidimensional.reshape(new_shape)
print ("The 1-dimensional array reshaped on {} is {}".format(
    new_shape,
    x_unidimensional_reshaped_3_dimensional
))

The 1-dimensional array reshaped on (1, 1, 10) is [[[ 1  2  3  4  5  6  7  8  9 10]]]
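
As an alternative to spelling out the full shape, numpy also provides ``np.newaxis`` and ``np.expand_dims`` for inserting a dimension of size 1. The sketch below is illustrative and is not taken from the original notebook.

```python
import numpy as np

x = np.arange(1, 11)                        # shape (10,)

# reshape with explicit size-1 dimensions, as in the cell above.
by_reshape = x.reshape((1, 1, 10))

# np.newaxis inserts a size-1 dimension at the position where it appears.
by_newaxis = x[np.newaxis, np.newaxis, :]
column = x[:, np.newaxis]

# np.expand_dims inserts a size-1 dimension at the given axis.
by_expand = np.expand_dims(x, axis=0)

print(by_reshape.shape, by_newaxis.shape)   # (1, 1, 10) (1, 1, 10)
print(column.shape, by_expand.shape)        # (10, 1) (1, 10)
```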


By **passing a shape tuple with fewer entries**, it is possible to reduce the dimensionality of a numpy array. Here one of the dimensions of size 1 is dropped:

In [27]:
new_shape = (1,10,)
x_3_dimensional_reshaped_2_dimensional = x_unidimensional_reshaped_3_dimensional.reshape(new_shape)
print ("The 3-dimensional array reshaped on {} is {}".format(
    new_shape,
    x_3_dimensional_reshaped_2_dimensional
))

The 3-dimensional array reshaped on (1, 10) is [[ 1  2  3  4  5  6  7  8  9 10]]
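
When the goal is only to drop dimensions of size 1, ``np.squeeze`` does so without the target shape having to be written out. The following sketch uses illustrative names and compares it with ``reshape``.

```python
import numpy as np

cube = np.arange(1, 11).reshape((1, 1, 10))   # shape (1, 1, 10)

# reshape with a shorter tuple, as in the cell above.
by_reshape = cube.reshape((1, 10))

# squeeze removes every dimension of size 1 ...
all_removed = np.squeeze(cube)

# ... or only the one at the requested axis.
first_removed = np.squeeze(cube, axis=0)

print(by_reshape.shape)      # (1, 10)
print(all_removed.shape)     # (10,)
print(first_removed.shape)   # (1, 10)
```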
