# `DSML` - Workshop 02

In this tutorial we introduce the concept of Python libraries and a first such library: NumPy.

We will go through the following:

- Introduction to the concept of `Python Libraries`
- Introduction to `NumPy`

## `Python Libraries`

**Introduction to Libraries**

A library (or module, or package) is a Python object with arbitrarily named attributes that you can bind and reference. Simply put, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

You can use any Python source file as a module by executing an import statement in some other Python source file. The import has the following syntax:


```
import <module name>
```

By convention, it is common to import modules under an abbreviated name so that they are quicker to type. This imports the module in the same way that `import <module name>` does; the only difference is that it is available as `<module name abbreviation>`. In the case of `numpy`, for example, the abbreviation `np` is used.

```
import <module name> as <module name abbreviation>
```
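
As a small illustration of both import forms, the sketch below uses the standard-library modules `math` and `statistics`. They are chosen here only because they ship with every Python installation; the alias `st` is an arbitrary choice for this example.

```python
# Plain import: the module is available under its full name
import math

# Aliased import: the same module object, bound to a shorter name
import statistics as st

print(math.sqrt(16))        # call a function defined in the module
print(st.mean([1, 2, 3]))   # same idea, accessed via the alias
```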


**Adding/Installing Libraries**

To add Python libraries to your installation, you can use the `conda` package manager that we installed in the last workshop. Alternatively, you can use the `pip` package manager. The quickest way to do so is via a terminal:

* If you are on a **Windows** computer, use the "Anaconda Command Prompt" from the Start menu. 
* On a **Mac**, start up the "Terminal". 
* In **Linux**, use any of the terminals available.


The general command syntax is the following:

```
conda install <package name>
```

If you are looking for a specific package but are unsure of its exact name on the command line, do a quick web search and/or check the [Anaconda Cloud](https://anaconda.org).
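
After installing a package, one way to confirm that Python can find it is to ask the import system. The following is only a sketch: the helper name `is_installed` and the deliberately non-existent package name are made up for illustration, while `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None

# Check a few names; the last one is a placeholder that should not exist
for name in ["numpy", "pandas", "surely_not_a_real_package_123"]:
    print(name, "->", "found" if is_installed(name) else "missing")
```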

**Relevant Libraries for this course**

There is a large variety of open source libraries available in Python. Below is a list of some of the most relevant ones for data science, which will be covered in this course.

* Selected data science libraries

    * Data Analysis and Processing
        * pandas (`pd`)
        * NumPy (`np`)
    * Visualization
        * matplotlib and pyplot (`plt`)
        * seaborn (`sns`)
    * Models and methods
        * sklearn
        * statsmodels

---

## `NumPy`

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. The number of axes is sometimes called the rank and is available as the `ndim` attribute.
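
To make these terms concrete, here is a minimal sketch (the array name `table` is made up for illustration) that shows the number of axes, the length of each axis, and the single element type of a small array.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1)
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(table.ndim)    # number of axes
print(table.shape)   # length of each axis
print(table.dtype)   # one data type shared by all elements
print(table[1, 2])   # element in the second row, third column
```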

Let's get started...

In [119]:
import numpy as np

### Creating NumPy Arrays

First, we can use `np.array` to create arrays from Python lists. Unlike Python lists, NumPy arrays are constrained to contain elements of a single type. If types don't match, NumPy will upcast if possible. 

In [120]:
A = np.array([1,2,3,4,5])
A

array([1, 2, 3, 4, 5])

In [121]:
B = np.array([3.1, 5, 4, 6])
B

array([3.1, 5. , 4. , 6. ])
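
The upcasting rule can be observed directly by mixing types in the input list. In the sketch below (variable names are illustrative), the `dtype` of each result shows which common type NumPy settled on.

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])       # integers are upcast to float
bools_and_ints = np.array([True, 2, 3])       # booleans are upcast to integer
numbers_and_text = np.array([1, 2.5, "a"])    # everything becomes a string

for arr in (ints_and_floats, bools_and_ints, numbers_and_text):
    print(arr, arr.dtype)
```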

If we want to explicitly set the data type of the resulting array, we can use the `dtype` keyword.

In [122]:
C = np.array([1,2,3,8],dtype = float)
print (C)

[1. 2. 3. 8.]
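
A related sketch: an existing array can be converted to another type with the `astype` method, which returns a new array. Converting floats to integers drops the fractional part, so information can be lost. The names below are illustrative.

```python
import numpy as np

measurements = np.array([1.7, 2.2, -3.9])

# Conversion to int truncates towards zero; it does not round
as_int = measurements.astype(int)

print(as_int)                # [ 1  2 -3]
print(measurements.dtype)    # the original array keeps its float type
```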


Other examples of creating arrays using np functions:

In [123]:
# create vector of length 5 filled with zeros
E = np.zeros(5,dtype = int)

# create 2x4 matrix of ones (float)
F = np.ones((2,4), dtype= float)

# create vector from 0 up to (but not including) 12 in steps of 2
G = np.arange(0,12,2)

# create vector from 0 to 1 with five equally (linearly) spaced elements 
H = np.linspace(0,1,5)

# create a 2x2 matrix with random floats in the half-open interval [0.0, 1.0)
I = np.random.random((2,2))

# create a 4x3x2 array of random integers from 0 (inclusive) to 10 (exclusive)
J = np.random.randint (0,10,(4,3,2))

print("E =", E,
      "\n\nF =", F, 
      "\n\nG =", G, 
      "\n\nH =", H, 
      "\n\nI =", I, 
      "\n\nJ =", J)

E = [0 0 0 0 0] 

F = [[1. 1. 1. 1.]
 [1. 1. 1. 1.]] 

G = [ 0  2  4  6  8 10] 

H = [0.   0.25 0.5  0.75 1.  ] 

I = [[0.89930441 0.85049069]
 [0.04698382 0.89827804]] 

J = [[[1 4]
  [5 3]
  [5 0]]

 [[2 6]
  [0 9]
  [6 9]]

 [[5 4]
  [1 5]
  [4 7]]

 [[7 4]
  [1 8]
  [1 2]]]
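
The random arrays `I` and `J` will contain different values each time the cell is run. If reproducible values are wanted, one option is NumPy's `Generator` interface with a fixed seed. This sketch assumes NumPy 1.17 or newer, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)               # seeded random number generator

floats = rng.random((2, 2))                   # floats in [0.0, 1.0)
ints = rng.integers(0, 10, size=(4, 3, 2))    # integers in [0, 10)

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
print(np.array_equal(floats, rng_again.random((2, 2))))
```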


**Exercise**: Define a few additional NumPy Arrays using some or all of the commands introduced above

In [124]:
# Your code here








### Manipulating NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation. We will cover a few categories of basic array manipulations here:
- **Attributes of arrays**: Determining the size, shape, memory consumption and data type of arrays.
- **Indexing of arrays**: Getting and setting the value of individual array elements.
- **Slicing of arrays**: Getting and setting smaller subarrays within a larger array.
- **Reshaping of arrays**: Changing the shape of a given array.
- **Joining and splitting of arrays**: Combining multiple arrays into one, and splitting one array into many.

#### NumPy Array Attributes:
The following examples show some commonly used array attributes.

In [125]:
# returns number of dimensions (axes)
print("E ndim: ", E.ndim)

# returns shape
print("F shape: " , F.shape)

# returns size (i.e. no of elements)
print("J size: ", J.size)

# returns data type
print("H dtype: ", H.dtype)

# returns length of one array element in bytes
print("itemsize: ", I.itemsize," bytes")

# returns total bytes consumed by the elements of the array
print("nbytes:  ", I.nbytes, "bytes")

E ndim:  1
F shape:  (2, 4)
J size:  24
H dtype:  float64
itemsize:  8  bytes
nbytes:   32 bytes
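
These attributes are related to each other, which the following sketch checks for a 2x4 array of ones like `F` above. The name `F32` is introduced here only for the comparison.

```python
import numpy as np

F = np.ones((2, 4), dtype=float)

# Number of elements is the product of the axis lengths
print(F.size == F.shape[0] * F.shape[1])

# Total memory of the elements is element count times bytes per element
print(F.nbytes == F.size * F.itemsize)

# The same shape with a smaller data type needs less memory
F32 = F.astype(np.float32)
print(F.nbytes, F32.nbytes)
```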


#### NumPy Array Indexing:
The following examples show how to access array elements by index.

Accessing single elements:

In [126]:
A

array([1, 2, 3, 4, 5])

In [127]:
print ("The 5th element of A is {}". format(A[4]))
print ("The last element of A is {}". format(A[-1])) # index from the back

The 5th element of A is 5
The last element of A is 5
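
Indexing also works on the left-hand side of an assignment, which is how single values are set. A minimal sketch follows; note that an array keeps its data type, so a float assigned to an integer array is silently truncated.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

A[0] = 10       # overwrite the first element
A[-1] = 50      # overwrite the last element
A[1] = 3.99     # float assigned to an integer array: stored as 3

print(A)        # [10  3  3  4 50]
```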


In a multidimensional array (e.g. a matrix), you access items using a comma-separated tuple of indices.

In [128]:
I

array([[0.89930441, 0.85049069],
       [0.04698382, 0.89827804]])

In [129]:
print ("The first element of I is {}". format(I[0,0]))  #array[row,column]
print ("The last element of I is {}". format(I[1,1]))   #array[row,column]

The first element of I is 0.8993044086325149
The last element of I is 0.8982780353177644
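
Because `I` holds random values, the sketch below uses a small fixed array (named `grid` purely for illustration) to show a few more indexing patterns on two axes.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid[0])        # a single index selects a whole row
print(grid[-1, -1])   # negative indices count from the end on each axis

grid[0, 1] = 20       # assign to one element: row 0, column 1
print(grid)
```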


**Exercise**: Play around with the techniques and commands introduced above and apply them to the previously defined arrays for practice

In [130]:
# Your code here







#### NumPy Array Slicing:

**One-dimensional arrays**

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon `:` character. The syntax is as follows:
`X[start (incl.):stop (excl.):step]`

In [131]:
G

array([ 0,  2,  4,  6,  8, 10])

In [132]:
# items 3 and 4 (indices 2 and 3)
print ("middle subarray:", G[2:4])

# items 1 to 3 (indices 0 to 2; the stop index 3 is excluded)
print("First 3 elements:", G[:3])

# last 3
print("Last 3 elements:", G[-3:] )

# first element and every second from there
print("Every other element:", G[::2])

# all elements in reverse order (step of -1)
print ("All elements reversed:", G[::-1])

middle subarray: [4 6]
First 3 elements: [0 2 4]
Last 3 elements: [ 6  8 10]
Every other element: [0 4 8]
All elements reversed: [10  8  6  4  2  0]
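
One property of slices worth knowing: a NumPy slice is a view onto the original data, not a copy, so writing to the slice changes the original array. The sketch below shows this and uses the `copy` method when an independent array is wanted. The names `view` and `independent` are illustrative.

```python
import numpy as np

G = np.arange(0, 12, 2)      # [ 0  2  4  6  8 10]

view = G[:3]                 # a view onto the first three elements
view[0] = 99                 # also changes G[0]

independent = G[3:].copy()   # an explicit copy owns its own data
independent[0] = -1          # G is not affected

print(G)                     # [99  2  4  6  8 10]
```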


**Multi-dimensional arrays**

Multi-dimensional slices work in the same way, with multiple slices separated by commas. The command is `X[slice row, slice column]`

In [133]:
# create a multi-dimensional array
K = np.random.randint(0,20, (3,4))
K

array([[10, 11, 13,  5],
       [14,  2,  1, 16],
       [17,  1, 17,  0]])

In [134]:
print ("The first two rows and the first three columns: \n", K[:2,:3])

The first two rows and the first three columns: 
 [[10 11 13]
 [14  2  1]]


In [135]:
print("All rows and every other column:\n", K[:,::2])

All rows and every other column:
 [[10 13]
 [14  1]
 [17 17]]


In [136]:
print("Reversed:\n",K[::-1,::-1] )

Reversed:
 [[ 0 17  1 17]
 [16  1  2 14]
 [ 5 13 11 10]]
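
A common use of multi-dimensional slicing is pulling out a single row or column by combining an index with an empty slice `:`. The sketch below rebuilds the values of `K` shown above so that it runs on its own.

```python
import numpy as np

K = np.array([[10, 11, 13,  5],
              [14,  2,  1, 16],
              [17,  1, 17,  0]])

first_row = K[0, :]       # same as K[0]
first_column = K[:, 0]    # all rows, column 0 (result is one-dimensional)
block = K[1:, 2:]         # last two rows, last two columns

print(first_row, first_column, block.shape)
```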


**Exercise**: Familiarize yourself with indexing by attempting some indexing techniques on the arrays defined above

In [137]:
# Your code here








#### NumPy Array Reshaping:

Another useful type of operation is the reshaping of arrays. The most flexible way of doing this is with the `reshape` method. Note that for this to work, the size of the initial array must match the size of the reshaped array.

In [138]:
Y = np.arange(1,25)
Y

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24])

In [139]:
Y.size

24

In [140]:
# we can re-shape this array into any shape with 24 elements

Y.reshape(6,4)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])
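
The size rule can be tried out as follows. This sketch also uses `-1` as a placeholder for one dimension, which asks NumPy to infer that length from the total size. The names `cube` and `auto` are illustrative.

```python
import numpy as np

Y = np.arange(1, 25)             # 24 elements

cube = Y.reshape(2, 3, 4)        # 2 * 3 * 4 = 24, so this works
auto = Y.reshape(-1, 8)          # -1 lets NumPy infer the row count (3)

print(cube.shape, auto.shape, Y.shape)   # Y itself keeps its shape

try:
    Y.reshape(5, 5)              # 25 != 24, so this fails
except ValueError as err:
    print("reshape failed:", err)
```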

#### NumPy Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays.

**Concatenation of arrays**

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routine `np.concatenate`. Additionally, `np.vstack` and `np.hstack` may be used.

In [141]:
P = np.array([1,2,3])
Q = np.array([4,5,6])
np.concatenate((P,Q))

array([1, 2, 3, 4, 5, 6])

In [142]:
# Two 2x3 matrices; the axis argument of np.concatenate sets the axis along which they are joined
R = np.array([[3,5,7],[1,3,5]])
S = np.array([[2,4,2],[0,9,8]])

In [143]:
R

array([[3, 5, 7],
       [1, 3, 5]])

In [144]:
S

array([[2, 4, 2],
       [0, 9, 8]])

In [145]:
print (np.concatenate((R,S), axis = 1))

[[3 5 7 2 4 2]
 [1 3 5 0 9 8]]
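
To see the effect of the `axis` argument, the sketch below joins the same two matrices along both axes and compares the resulting shapes. The result names are illustrative.

```python
import numpy as np

R = np.array([[3, 5, 7], [1, 3, 5]])
S = np.array([[2, 4, 2], [0, 9, 8]])

rows_joined = np.concatenate((R, S), axis=0)   # default axis: more rows
cols_joined = np.concatenate((R, S), axis=1)   # more columns

print(rows_joined.shape)   # (4, 3)
print(cols_joined.shape)   # (2, 6)
```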


For working with arrays of mixed dimensions, it can be more practical to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

In [146]:
R = np.array([[3,5,7],[1,3,5]])
S = np.array([[2,4,2],[0,9,8]])

In [147]:
# stack row-wise
np.vstack((R,S))

array([[3, 5, 7],
       [1, 3, 5],
       [2, 4, 2],
       [0, 9, 8]])

In [148]:
# stack column-wise
np.hstack((R,S))

array([[3, 5, 7, 2, 4, 2],
       [1, 3, 5, 0, 9, 8]])
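
The examples above stack two arrays of the same shape. A sketch of the mixed-dimension case follows, with a one-dimensional array `P` and a hypothetical single-column array `col` introduced for illustration.

```python
import numpy as np

P = np.array([1, 2, 3])                  # shape (3,)
R = np.array([[3, 5, 7], [1, 3, 5]])     # shape (2, 3)
col = np.array([[99], [98]])             # shape (2, 1)

with_row = np.vstack((P, R))      # P is treated as one extra row
with_col = np.hstack((R, col))    # col is appended as one extra column

print(with_row.shape, with_col.shape)
```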

**Splitting of arrays**

The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass either the number of equal-sized sections or a list of indices giving the split points:

In [149]:
# Divides the x array into 4 equal subarrays.
x = np.array([2,4,6,7,8,9,1,3,11,35,55,34])
x1, x2, x3, x4 = np.split(x,4)
print(x1, x2, x3, x4)

[2 4 6] [7 8 9] [ 1  3 11] [35 55 34]
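
Passing a list of indices instead of a section count gives pieces of unequal length. In this sketch the names `head`, `middle` and `tail` are illustrative; it also shows what happens when a section count does not divide the array evenly.

```python
import numpy as np

x = np.array([2, 4, 6, 7, 8, 9, 1, 3, 11, 35, 55, 34])

# Split before index 3 and before index 5, giving three pieces
head, middle, tail = np.split(x, [3, 5])
print(head, middle, tail)

# An integer that does not divide the length evenly raises an error
try:
    np.split(x, 5)
except ValueError as err:
    print("split failed:", err)
```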


In [150]:
Z = np.arange(16).reshape((4, 4))
Z

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [151]:
#Splits an array into multiple sub-arrays vertically (row-wise).
upper, lower = np.vsplit(Z, 2)
print("upper: \n",upper)
print("lower: \n",lower)

upper: 
 [[0 1 2 3]
 [4 5 6 7]]
lower: 
 [[ 8  9 10 11]
 [12 13 14 15]]


In [152]:
#Splits an array into multiple sub-arrays horizontally (column-wise).
left, right = np.hsplit(Z, 2)
print("left: \n",left)
print("right: \n",right)

left: 
 [[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
right: 
 [[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


**Exercise**: Create a few arrays and concatenate them, then split the result to arrive back at the original arrays

In [153]:
# Your code here








### Array Operations

NumPy also allows for linear algebra matrix-type operations, which are a key component of scientific computing tasks.

In [154]:
A = np.array([1,2,3,4])
B = np.array([9,3,-9,1])

In [155]:
A

array([1, 2, 3, 4])

In [156]:
B

array([ 9,  3, -9,  1])

**Element-wise operations**

In [157]:
# element-wise addition

C=A+B
C

array([10,  5, -6,  5])

In [158]:
# element-wise subtraction

D=A-B
D

array([-8, -1, 12,  3])

In [159]:
# element-wise multiplication

E=A*B
E

array([  9,   6, -27,   4])

In [160]:
# element-wise division

F=A/B
F

array([ 0.11111111,  0.66666667, -0.33333333,  4.        ])
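
Element-wise operations are not limited to two arrays of the same shape. As a sketch of the broadcasting mentioned in the introduction, a scalar is applied to every element, and a length-4 vector is combined with every row of a 2x4 matrix. The name `ones` is illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4])

print(A + 10)    # [11 12 13 14]
print(A * 2)     # [2 4 6 8]
print(A ** 2)    # [ 1  4  9 16]

# A vector of length 4 is added to each row of a 2x4 matrix
ones = np.ones((2, 4))
print(ones + A)
```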

**Matrix operations**

In [161]:
M = np.arange(10).reshape(2,5)
M

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [162]:
N = np.random.randint(1,10,10).reshape(2,5)
N

array([[5, 5, 8, 1, 8],
       [8, 4, 4, 2, 4]])

In [163]:
# Note: this attempts a matrix multiplication of two 2x5 matrices, which is not possible because the inner dimensions (5 and 2) do not match
M@N

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 5)

In [164]:
# We can transpose one of the matrices to obtain a 5x2 matrix
N.T

array([[5, 8],
       [5, 4],
       [8, 4],
       [1, 2],
       [8, 4]])

In [165]:
# The operation now works: (2x5) @ (5x2) gives a 2x2 matrix
M@N.T

array([[ 56,  34],
       [191, 144]])
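
The shape rule behind this can be checked directly. The sketch below rebuilds `N` with the values shown above (in the notebook it is random) and compares the two possible products, along with the equivalent function form `np.matmul`.

```python
import numpy as np

M = np.arange(10).reshape(2, 5)
N = np.array([[5, 5, 8, 1, 8],
              [8, 4, 4, 2, 4]])

print((M @ N.T).shape)   # (2, 5) @ (5, 2) -> (2, 2)
print((M.T @ N).shape)   # (5, 2) @ (2, 5) -> (5, 5)

# The @ operator and np.matmul give the same result here
print(np.array_equal(M @ N.T, np.matmul(M, N.T)))
```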

---