# Day 1 | Data types and basics of matrices in NumPy

Today we will cover the basics of NumPy, including its data types and an introduction to NumPy matrices.


## Data types in NumPy

Storing and analyzing large, complex datasets requires handling various kinds of data, such as single numbers, vectors, lists, arrays, and matrices. The main goal of today's class is to give you an overview of the most commonly used NumPy data structures and the practical considerations that come with them.

Unlike statically typed languages such as C++ or Java, Python offers a simplified way to work with basic data types, including integers, floating-point numbers, and strings. You do not need to declare the type of a variable or constant (yay!); Python infers it for you.

Let us have a look at this example of an iterative sum in C++:

```C 
// C++ code

int some_variable = 0;
for(int i=0; i<10; i++){
    some_variable += i;
}

```

In Python we would write it as follows:

In [None]:
# Python code
some_variable = 0
for i in range(0,10):
    some_variable += i
    
print(some_variable)

Neither *i* nor *some_variable* has a declared type; each takes its type from the value assigned to it, and Python works this out automatically at run time. Under the hood, a Python value is comparable to a C structure that holds not only the number itself but also its type and the size of the memory allocated for it. A variable name refers to a specific place in memory, much like an object pointer in C++. This makes Python a very accessible and dynamic programming language. However, there is an important limitation to consider: every such object carries extra overhead, so creating many of them tends to consume relatively large amounts of memory and can ultimately slow down operations that involve them.
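
The idea that the type belongs to the object rather than to the name can be checked directly. The following is a minimal sketch, assuming the standard CPython interpreter; the names `type_before` and `type_after` are introduced here for illustration only, and exact byte counts depend on the Python build, so none are quoted.

```python
import sys

# The same name can be rebound to objects of different types;
# the type travels with the object, not with the name.
some_variable = 0
type_before = type(some_variable).__name__
some_variable = 0.5
type_after = type(some_variable).__name__
print(type_before, type_after)

# Even a small integer is a full Python object, so in CPython it takes
# more memory than the 4 or 8 bytes of a raw C integer.
print(sys.getsizeof(1), "bytes for the Python integer 1")
```

This per-object overhead is the reason large collections of plain Python numbers become expensive.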

Now let us create another useful data structure, a list of integers, which we can think of as a vector of numbers:

In [None]:
L = list(range(0,100))

In [None]:
L


Lists come in very handy for iteration and sorting. We can directly access the elements of a list and inspect their type:
    

In [None]:
L[50]

In [None]:
type(L[0])

In Python we can create mixed-type lists in which each element has its own type and representation:

In [None]:
LM=[1,"cos", False]

In [None]:
for item in LM:
    print(type(item))

Arrays, in contrast, differ from lists in Python. An array variable typically points to a contiguous block of memory that holds data of a single type, rather than to a collection of individual objects. This makes arrays a more efficient tool for calculations and mathematical operations.
To create a simple array in NumPy we proceed as follows:

In [None]:
import numpy as np

# initialize a numeric vector
np.array([1, 2, 3, 4, 5])
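
To make the difference between a list and an array more concrete, here is a small sketch. The variable names are hypothetical; the point is only that arithmetic applies to a whole array at once, while a list needs an explicit loop or comprehension.

```python
import numpy as np

values = list(range(5))      # Python list: five separate int objects
array = np.array(values)     # NumPy array: one block of same-typed numbers

# Element-wise arithmetic on a list needs a loop or comprehension...
doubled_list = [2 * v for v in values]

# ...while the array applies the operation to every element at once.
doubled_array = 2 * array

print(doubled_list)
print(doubled_array)

# Every element has the same type and the same size in bytes.
print(array.dtype, array.itemsize, array.nbytes)
```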

We can also specify the type of the array elements using the *dtype* keyword argument:
    

In [None]:
np.array([1,2,3], dtype='int')
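
Because an array holds a single type, NumPy has to reconcile mixed inputs. The sketch below, with illustrative variable names, shows integers being upcast to floats and floats being truncated when converted to an integer type.

```python
import numpy as np

# Integers mixed with a float are all upcast to floating point.
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)

# An explicit dtype overrides the type NumPy would infer.
as_float = np.array([1, 2, 3], dtype='float32')
print(as_float.dtype)

# Converting floats to an integer type drops the fractional part.
truncated = np.array([1.9, 2.5]).astype('int32')
print(truncated)
```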


A 3x3 square matrix can be created as follows:

In [None]:
np.array([[1,2,3],[4,5,6],[7,8,9]], dtype='float32')

NumPy also offers handy tools to initialize matrices with default entries such as zeros or ones.
Let's look at this example:

In [None]:
np.ones((3,3))

The default data type is a 64-bit float (``float64``).

Next we can fill a matrix with any constant value (here 10; a value such as π, available as ``np.pi``, works the same way):

In [None]:
np.full((6,4),10)
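
As a short sketch of filling a matrix with π, the example below uses `np.pi` as the fill value. The shapes and variable names are arbitrary choices for illustration; note that the data type of the result follows the fill value.

```python
import numpy as np

# A float fill value produces a float64 array.
pi_matrix = np.full((2, 3), np.pi)

# An integer fill value produces an integer array.
int_matrix = np.full((2, 3), 10)

print(pi_matrix)
print(pi_matrix.dtype, int_matrix.dtype)
```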

Another very useful tool in NumPy generates a sequence of evenly spaced numbers within a range, which can fill a vector or matrix.

In [None]:
# Create an array of 1000 values evenly spaced between 0 and 10
np.linspace(0, 10, 1000)

Alternatively, we can generate a vector of numbers with a defined step in a given range:

In [None]:
np.arange(0, 20, 2)
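
The two functions differ in how they treat the end of the range. A minimal comparison, using small illustrative values so the results are easy to read:

```python
import numpy as np

# linspace takes the number of points and includes both endpoints.
evenly_spaced = np.linspace(0, 10, 5)
print(evenly_spaced)   # [ 0.   2.5  5.   7.5 10. ]

# arange takes a step size and excludes the stop value.
stepped = np.arange(0, 10, 2.5)
print(stepped)         # [0.  2.5 5.  7.5]
```

In short, `np.linspace` is convenient when the number of points matters, and `np.arange` when the step matters.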

We can also create an array of random numbers drawn from a normal distribution with mean 0 and standard deviation 1:

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))
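
Normally distributed values are not confined to any interval; they only cluster around the mean. The sketch below checks this on a larger sample. It uses the `np.random.default_rng` generator interface rather than the `np.random.normal` call above, and the seed 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)       # seeded generator, seed chosen arbitrarily
samples = rng.normal(0, 1, 100_000)   # mean 0, standard deviation 1

# The sample mean is close to 0 and the sample spread close to 1...
print(samples.mean(), samples.std())

# ...but individual values fall well outside the interval [0, 1].
print(samples.min(), samples.max())
```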

or create an identity matrix:

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

In summary, NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations.
Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed below.
Note that when constructing an array, they can be specified using a string:

```python
np.zeros(10, dtype='int16')
```

Or using the associated NumPy object:

```python
np.zeros(10, dtype=np.int16)
```

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (removed in NumPy 2.0; use ``float64`` instead).| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (removed in NumPy 2.0; use ``complex128`` instead).| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 
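
The limits in the table have practical consequences. As a hedged sketch (variable names are illustrative), `np.iinfo` and `np.finfo` report the range and precision of a type, and integer array arithmetic that exceeds the range wraps around silently instead of raising an error:

```python
import numpy as np

# Query the representable range of an integer type.
int8_limits = np.iinfo(np.int8)
print(int8_limits.min, int8_limits.max)   # -128 127

# Adding past the maximum wraps around to the minimum.
near_limit = np.array([126, 127], dtype=np.int8)
one = np.array([1, 1], dtype=np.int8)
wrapped = near_limit + one
print(wrapped)                            # [ 127 -128]

# Smaller float types have coarser precision (larger machine epsilon).
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
```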

## Introduction to NumPy Matrices

Operating on data in NumPy largely means operating on matrices or arrays. Here we will learn how to work with NumPy matrices by splitting, rebuilding, reshaping, and joining them. This class covers the following topics:

- *Array properties*: size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: extracting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

We will first create a matrix of random numbers. For that we use NumPy's random number generator, which we seed with a fixed value to ensure that the same random arrays are generated each time this code is executed:

In [None]:
import numpy as np
np.random.seed(0)

In [None]:
matrix2D = np.random.randint(2,size=(3,3))

Next we can access the properties of the generated matrix:

In [None]:
print("ndim: ", matrix2D.ndim)
print("shape:", matrix2D.shape)
print("size: ", matrix2D.size)
print("type: ", matrix2D.dtype)
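
Memory consumption, mentioned in the list of topics, can be read from two further attributes. A small sketch on a separate, hypothetical array `example`:

```python
import numpy as np

example = np.zeros((3, 4), dtype='float32')

# itemsize is the size of one element, nbytes the size of the whole array.
print("itemsize:", example.itemsize, "bytes")   # 4
print("nbytes:  ", example.nbytes, "bytes")     # 48
```

The total is simply the number of elements multiplied by the size of each element.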

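How do we access an individual element of that matrix? We simply use indexing with a row index and a column index, as follows:


In [None]:
# element in the first row and first column, then in the last row and last column
matrix2D[0, 0], matrix2D[-1, -1]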

We can also extract the matrix diagonal using the NumPy function *diag*:

In [None]:
np.diag(matrix2D)

### Matrix slicing or sub-matrix

Next we will take the 2D matrix that we generated and slice it to extract sub-matrices, for example its first two columns:


In [None]:
matrix2D

In [None]:
trace2050 = matrix2D[:,0:2]
trace2050

In [None]:
matrix2D[:,:]

Similarly, one can obtain the last column of that matrix:


In [None]:
matrix2D[:,2:]

We can also reverse rows or columns of the matrix:
    

In [None]:
matrix2D

In [None]:
matrix2D[::-1,:]

In [None]:
matrix2D[:,::-1]
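
Since `matrix2D` holds random values, the slicing patterns are easier to follow on a matrix with known entries. The sketch below uses a hypothetical 3x4 array `grid` and the general `start:stop:step` slice syntax:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]

first_row = grid[0, :]               # [0 1 2 3]
last_column = grid[:, -1]            # [ 3  7 11]
top_left = grid[:2, :2]              # first two rows and first two columns
every_other_column = grid[:, ::2]    # columns 0 and 2

print(first_row, last_column)
print(top_left)
print(every_other_column)
```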

If you would like to collect arbitrary elements of the matrix into a new array, you can pass arrays of indices instead of single numbers. This is called fancy indexing:


In [None]:
matrix2D

In [None]:
row = np.array([2, 1, 0])
col = np.array([1, 0, 2])
matrix2D[row, col]

We can also update the values at those indices:
    

In [None]:
matrix2D[row,col]=np.array([1,0,1])

matrix2D
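
Fancy indexing pairs the row and column index arrays element by element. A minimal sketch on a matrix with known entries (the name `grid` is illustrative) makes the pairing visible:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)    # [[0 1 2], [3 4 5], [6 7 8]]
rows = np.array([2, 1, 0])
cols = np.array([1, 0, 2])

# Picks the elements at positions (2, 1), (1, 0), and (0, 2).
picked = grid[rows, cols]
print(picked)                        # [7 3 2]

# An index array can be combined with a slice, here to reorder whole rows.
reordered = grid[rows, :]
print(reordered)
```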


### Practical example 

We will generate random points drawn from the 2D normal distribution:


In [None]:
mean = [0, 0]
cov = [[1, 0],
       [0, 1]]
X = np.random.multivariate_normal(mean, cov, 1000)
X.shape

To plot those points we will use the *matplotlib* library, which we will discuss in more detail on Day 3:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plotter

plotter.scatter(X[:, 0], X[:, 1]);

Our goal is to select 10 random points using fancy indexing:


In [None]:
indices = np.random.choice(X.shape[0], 10, replace=False)
indices

selection = X[indices]  # fancy indexing here
selection.shape

Let us look at the selected points:

In [None]:
plotter.scatter(X[:, 0], X[:, 1], alpha=0.3)
plotter.scatter(selection[:, 0], selection[:, 1],
            facecolor='red', s=50);

### Reshaping matrices

Reshaping is a very useful feature of NumPy because it lets you easily switch between dimensions, building matrices from vectors and vectors from matrices:

In [None]:
x = np.array([1, 2, 3, 4 ])

# 2x2 matrix via reshape
x.reshape((2, 2))


In [None]:
x

In [None]:
x.reshape(4,1)

or we can use:

In [None]:
x[:,np.newaxis]




In [None]:
# row vector via newaxis
x[np.newaxis,:]

# note: transposing a 1D array returns it unchanged
x.T
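
The shapes produced by these operations can be compared side by side. In this sketch the array `values` is a stand-in for `x`, and `-1` asks NumPy to infer one dimension from the total number of elements:

```python
import numpy as np

values = np.arange(6)

as_matrix = values.reshape(2, 3)      # 2 rows, 3 columns
inferred = values.reshape(-1, 2)      # -1 is inferred as 3
column = values[:, np.newaxis]        # column vector
row = values[np.newaxis, :]           # row vector

print(as_matrix.shape, inferred.shape, column.shape, row.shape)

# Transposing a 1D array leaves its shape unchanged.
print(values.T.shape)
```

A reshape only works when the new shape holds exactly the same number of elements as the original array.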

### Combine, stack and transpose matrices

Another useful feature of NumPy is its set of functions for merging and splitting arrays.
Let us say we have two 1D vectors that we want to combine:


In [None]:
x=np.array([1,2,3])
y=np.array([4,5,6])

np.concatenate([x,y])

Similar approach applies for the 2D case:

In [None]:
X = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
newmatrix = np.concatenate([X,X])

We can reorient the combined matrix using the transpose function:

In [None]:
np.transpose(np.concatenate([X,X]))

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
y = np.array([[4, 5, 6],
                 [7, 8, 9]])


np.vstack([x, y])

In [None]:
z = np.array([[1],
              [1]])
np.hstack([y, z])
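
By default `np.concatenate` joins along the first axis (rows); the `axis` argument selects another one. The sketch below, using an illustrative 2x3 array `grid`, also checks that the results match `np.vstack` and `np.hstack`:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

stacked_rows = np.concatenate([grid, grid])           # axis=0 by default
stacked_cols = np.concatenate([grid, grid], axis=1)   # join side by side

print(stacked_rows.shape, stacked_cols.shape)         # (4, 3) (2, 6)

# For 2D inputs these agree with the stacking helpers.
print(np.array_equal(stacked_rows, np.vstack([grid, grid])))
print(np.array_equal(stacked_cols, np.hstack([grid, grid])))
```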

### Splitting matrices

Matrices are split using the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``. For each of these, we can pass a list of indices giving the split points:

In [None]:
whatever = [1, 2, 3, 0, 0, 3, 2, 1]
# split before index 3 and before index 5, giving three non-empty pieces
p1, p2, p3 = np.split(whatever, [3, 5])
print(p1, p2, p3)

In [None]:
matrix2D = np.random.randint(10,size=(4,4))

matrix2D

We will split the matrix into upper and lower parts:

In [None]:
upper, lower = np.vsplit(matrix2D, [1])
print("upper: ", upper)

print("lower:", lower)

and left and right sub-matrices:

In [None]:
left, right = np.hsplit(matrix2D, [2])
print("left:",left)
print("right:", right)
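
A useful rule of thumb is that N split points produce N + 1 sub-arrays, each starting at a split index. A minimal sketch on a hypothetical array `values`:

```python
import numpy as np

values = np.arange(10)

# Two split points give three pieces: [0:2], [2:7], and [7:].
head, middle, tail = np.split(values, [2, 7])
print(head, middle, tail)   # [0 1] [2 3 4 5 6] [7 8 9]

# Concatenating the pieces restores the original array.
restored = np.concatenate([head, middle, tail])
print(np.array_equal(restored, values))
```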

### NumPy's structured arrays

This section demonstrates the use of NumPy's *structured arrays* and *record arrays*, which provide efficient storage for compound, heterogeneous data.

Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program.
It would be possible to store these in three separate arrays:

In [None]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

But this is a bit clumsy. There's nothing here that tells us that the three arrays are related; it would be more natural if we could use a single structure to store all of this data.
NumPy can handle this through structured arrays, which are arrays with compound data types.

Recall that previously we created a simple array using an expression like ``np.zeros(10, dtype='int16')``. We can similarly create a structured array using a compound data type specification:

In [None]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

Here 'U10' translates to "Unicode string of maximum length 10," 'i4' translates to "4-byte (i.e., 32 bit) integer," and 'f8' translates to "8-byte (i.e., 64 bit) float." We'll discuss other options for these type codes in the following section.

Now that we've created an empty container array, we can fill the array with our lists of values:

In [None]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

As we had hoped, the data is now arranged together in one convenient block of memory.

The handy thing with structured arrays is that you can now refer to values either by index or by name:

In [None]:
# Get all names
data['name']

In [None]:
# Get first row of data
data[0]

In [None]:
# Get the names
data[:]['name']

Using Boolean comparisons, this even allows you to do more sophisticated operations, such as filtering on age:

In [None]:
# Get names where age is under 30
data[data['age'] < 30]['name']
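
The record arrays mentioned at the start of this section are a thin layer on top of structured arrays that exposes fields as attributes. The following is a self-contained sketch with a shortened, illustrative dataset; it builds the structured array from a list of tuples instead of filling it field by field.

```python
import numpy as np

# Each tuple is one row; the dtype names and types the fields.
people = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Boolean masking works exactly as with the `data` array above.
under_30 = people[people['age'] < 30]['name']
print(under_30)

# Viewing the same data as a record array allows attribute access.
records = people.view(np.recarray)
print(records.age)
```

Attribute access saves some typing, at the cost of a little extra overhead compared with indexing by field name.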