# `DSML_WS_02` - Introduction to Python Library Management & `NumPy`

In this tutorial we introduce the concept of Python libraries and cover a first such library: NumPy.

We will go through the following:
- Task: Programming a simple calculator
- Introduction to the concept of `Python Libraries`
- Task: Getting started with `NumPy`
- Introduction to `NumPy`

---

## 1. Task: Programming a simple calculator

Last week, you familiarized yourself with Python: you got to know data structures like values, lists and dictionaries, explored for loops and if statements, and learned how to define functions. To put your knowledge of basic Python to the test, write a properly documented Python function that performs the task of a simple calculator with the following behaviour:

1. The user shall pass the desired mathematical operation (plus, minus, divide, multiply) and two numbers

2. The result shall be calculated and printed

3. If the input for the mathematical operation is not covered by the list from above, print "Please correct your input"

**You should use the following elements:** 

- If/elif/else statement
- Functions
- Mathematical expressions

In [None]:
def simple_calc(first_num, ops, second_num):
    
    
    """
    Perform a mathematical operation between two numbers
    ...
    
    Arguments
    ---------
    first_num     : int/float
                    first number in calculation
                    
    ops           : str
                    mathematical operation; ops=["plus", "minus", "multiply", "divide"]
    
    second_num    : int/float
                    second number in calculation
    
    Returns
    -------
    None
            The operation and its result are printed, not returned
    """
    #### Your Code below
    
    # check numerical input (isinstance also accepts subclasses of int and float)
    if not isinstance(first_num, (int, float)):
        return print("Please provide numerical input for first_num")
        
    if not isinstance(second_num, (int, float)):
        return print("Please provide numerical input for second_num")
    

    # check if selected operation is in ["plus","minus","multiply","divide"]
    if ops not in ["plus","minus","multiply","divide"]:
        return print("Please provide valid operation! Choose from [plus,minus,multiply,divide]")
    
    
    # perform calculations
    if ops == "plus":
        result = first_num + second_num
    elif ops == "minus":
        result = first_num - second_num
    elif ops == "multiply":
        result = first_num * second_num
    elif ops == "divide":
        result = first_num / second_num

  
    return print(first_num, ops, second_num, "=", result)

In [None]:
simple_calc(5,"divide",2)

In [None]:
simple_calc("five","divide",2)

In [None]:
simple_calc(5,"div",2)
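
As a possible variation on the task, the sketch below returns the result as a string instead of printing it, which makes the behaviour easier to check. It swaps the if/elif chain for a dictionary lookup. The names `simple_calc_str` and `OPERATIONS` and the division-by-zero message are assumptions made for illustration; they are not part of the task description.

```python
import operator

# Hypothetical lookup table mapping operation names to functions
OPERATIONS = {
    "plus": operator.add,
    "minus": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def simple_calc_str(first_num, ops, second_num):
    """Return the calculation as a string, or an error message."""
    if ops not in OPERATIONS:
        return "Please correct your input"
    if ops == "divide" and second_num == 0:
        # assumed message; the task does not specify this case
        return "Division by zero is not defined"
    result = OPERATIONS[ops](first_num, second_num)
    return f"{first_num} {ops} {second_num} = {result}"


print(simple_calc_str(5, "divide", 2))  # 5 divide 2 = 2.5
print(simple_calc_str(5, "div", 2))     # Please correct your input
```

Returning a value rather than printing it lets the caller decide what to do with the result, for example storing it or comparing it in a test.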

---

## 2. Python Libraries

**Introduction to Libraries**

A library (or module, or package) is a Python object with arbitrarily named attributes that you can bind and reference. Simply put, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

You can use any Python source file as a module by executing an import statement in some other Python source file. The import has the following syntax:

```
import <module name>
```

By convention, many modules are imported under an abbreviated name. This imports the module exactly as `import <module name>` does; the only difference is that the module is then available as `<module name abbreviation>`. For `numpy`, for example, the abbreviation `np` is used.

```
import <module name> as <module name abbreviation>
```

In [None]:
import numpy as np
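
To illustrate the import variants, here is a small sketch using the standard-library module `math` (chosen only because it is always available). It also shows the `from ... import ...` form, which is not covered above.

```python
import math             # access names via the full module name
import math as m        # same module, bound to the shorter name m
from math import sqrt   # import a single name directly

print(math.sqrt(16))  # 4.0
print(m.sqrt(16))     # 4.0
print(sqrt(16))       # 4.0

# both names refer to the same module object
print(m is math)      # True
```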

**Adding/Installing Libraries**

Last week we discussed how to add Python packages (or libraries) to your installation using the `conda` package manager. Alternatively, you can use the `pip` package manager (if you check out our environment.yml file, you will see that we actually use pip for one package).

If you are looking for a specific package but are unsure of its exact name on the command line, do a quick Google search and/or check the [Anaconda Cloud](https://anaconda.org).

You can get packages from different channels. We use the conda default or conda-forge channels (again, check out the environment.yml file). It is recommended to retrieve all packages from the same conda channel such as conda-forge to ensure smooth working of all dependencies. If you install packages via the command line, you can specify the desired channel as follows:

```
conda install -c conda-forge <package name>
```

**Relevant Libraries for this course**

There is a large variety of open source libraries available in Python. Below is a list of some of the most relevant ones for data science, which will be covered in this course.

* Selected data science libraries

    * Data Analysis and Processing
        * pandas (`pd`)
        * NumPy (`np`)
    * Visualization
        * matplotlib and pyplot (`plt`)
        * seaborn (`sns`)
    * Models and methods
        * scikit-learn (`sklearn`)

---

## 3. Task: Getting started with `NumPy`

This week, we will be exploring our first Python package, namely NumPy. Prepare for the workshop by completing the steps below:

In order to use NumPy's capabilities, we first have to import it. Do this by running the following cell. We use the `as` keyword so we can use an abbreviation when calling NumPy methods.

In [None]:
import numpy as np

The core structure of NumPy is the N-dimensional array object. You can create arrays using `np.array([..,..,..])`. Do the following:
1. Create a list of 5 integers and assign it to a variable called `lst`.
2. Use `lst` to create a NumPy array, and assign this array to a variable called `my_first_array`.
3. Verify that the type of `my_first_array` is `numpy.ndarray` (note that `np.array` is the function that creates the array, not its type).
4. Create a second array, consisting of the first 3 numbers from `my_first_array`, and assign it to a variable called `my_second_array` (tip: selecting a subset of elements from arrays works similarly to lists).
5. Create a third array, which contains each number from `my_second_array` multiplied by 3, and assign it to a variable called `my_third_array` (tip: unlike a list, where `* 3` repeats the list three times, a NumPy array applies arithmetic operators to each element).

In [None]:
# your code here
lst = [1,2,3,4,5]
my_first_array = np.array(lst)
print(my_first_array)

In [None]:
type(my_first_array)

In [None]:
my_second_array = my_first_array[:3]
print(my_second_array)

In [None]:
my_third_array = my_second_array * 3
print(my_third_array)

Finally, let's combine `my_second_array` and `my_third_array`. We can use the functions `np.hstack((first_array, second_array))` or `np.vstack((first_array, second_array))` for this. Combine `my_second_array` and `my_third_array` using both functions. How does the output differ? Can both functions also be used to combine `my_first_array` and `my_second_array`?

In [None]:
np.hstack((my_second_array, my_third_array))

In [None]:
np.vstack((my_second_array, my_third_array))

In [None]:
# combining my_first_array and my_second_array using hstack works
np.hstack((my_first_array, my_second_array))

In [None]:
# combining my_first_array and my_second_array using vstack does not work:
# the arrays have different lengths (5 and 3), so NumPy raises a ValueError
np.vstack((my_first_array, my_second_array))
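
The difference between the two functions can be made explicit by catching the error. This is a small sketch that recreates the arrays from the task; the variable names `stacked` and `error_message` are only illustrative.

```python
import numpy as np

my_first_array = np.array([1, 2, 3, 4, 5])
my_second_array = my_first_array[:3]

# hstack joins 1-D arrays end to end, so different lengths are fine
stacked = np.hstack((my_first_array, my_second_array))
print(stacked)  # [1 2 3 4 5 1 2 3]

# vstack places the arrays on top of each other as rows,
# which requires all rows to have the same length
error_message = None
try:
    np.vstack((my_first_array, my_second_array))
except ValueError as err:
    error_message = str(err)
    print("vstack failed:", error_message)
```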

---

## 4. Introduction to `NumPy`

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities allowing for efficient matrix operations.

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank.

In today's short overview tutorial we will cover the following:

1. **Creating NumPy Arrays**
1. **Manipulating NumPy Arrays**
1. **NumPy Array Operations**

Let's get started...

In [None]:
# import numpy as np if not already done above
import numpy as np

### Creating NumPy Arrays

First, we can use `np.array` to create arrays from Python lists. Unlike Python lists, **NumPy arrays are constrained to elements of a single type**. If the types don't match, NumPy will upcast if possible.

In [None]:
# assign an array of integers to variable A
A = np.array([1,2,3,4,5])
A

In [None]:
# let's try to create an Array with integers and floats and assign it to variable B
B = np.array([3, 5.1, 4.6, 6])

In [None]:
# return B; note that NumPy upcasts everything to floats!
B
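
To see upcasting at work, the sketch below inspects the `dtype` of a few arrays built from mixed inputs. The variable names are chosen for illustration only, and the exact integer type depends on your platform.

```python
import numpy as np

bools_and_ints = np.array([True, 2, 3])     # booleans are upcast to integers
ints_and_floats = np.array([3, 5.1, 4.6, 6])  # integers are upcast to floats
with_complex = np.array([1, 2.5, 3 + 4j])   # everything is upcast to complex

print(bools_and_ints.dtype)    # an integer type, e.g. int64
print(ints_and_floats.dtype)   # float64
print(with_complex.dtype)      # complex128
```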

If we want to explicitly set the data type of the resulting array, we can use the `dtype` keyword.

In [None]:
# assign an Array with integers to C; however, specify "dtype = float"
C = np.array([1,2,3,8],dtype = float)
C

Other examples of creating arrays using np functions:

In [None]:
# create a 1x5 array (a single row of length 5) filled with zeros
D = np.zeros(shape=(1,5),dtype = float)
print(D)

In [None]:
# create a 2x4 matrix of ones (float)
E = np.ones((2,4), dtype= float)
print(E)

In [None]:
# create a vector from 0 up to (but excluding) 12 in steps of 2
F = np.arange(0,12,2)
print(F)

In [None]:
# create a vector from 0 to 1 with five equally (linearly) spaced elements 
G = np.linspace(0,1,5)
print(G)
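
`np.arange` and `np.linspace` are easy to confuse, so here is a short side-by-side sketch: `arange` takes a step size and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default. The variable names are illustrative.

```python
import numpy as np

# arange: fixed step, stop value is excluded
by_step = np.arange(0, 12, 2)
print(by_step)    # [ 0  2  4  6  8 10]

# linspace: fixed number of points, stop value is included by default
by_count = np.linspace(0, 1, 5)
print(by_count)   # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False excludes the stop value, giving a spacing of 1/5
open_end = np.linspace(0, 1, 5, endpoint=False)
print(open_end)
```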

In [None]:
# create a 2x2 matrix with random floats in the half-open interval [0.0, 1.0)
H = np.random.random((2,2))
print(H)

In [None]:
# return random integers from 0 (inclusive) to 10 (exclusive) of size (4,3,2)
I = np.random.randint(0,10,(4,3,2))
print(I)
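
Random arrays differ on every run. If you need reproducible results, one option is NumPy's `Generator` interface with a fixed seed, sketched below. The seed value 42 and the names `H_seeded` and `I_seeded` are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

H_seeded = rng.random((2, 2))                    # floats in [0.0, 1.0)
I_seeded = rng.integers(0, 10, size=(4, 3, 2))   # integers in [0, 10)

# a second generator with the same seed produces the same numbers
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(H_seeded, rng_again.random((2, 2)))
print(same)  # True
```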

In [None]:
print("D =", D,
      "\n\nE =", E, 
      "\n\nF =", F, 
      "\n\nG =", G, 
      "\n\nH =", H, 
      "\n\nI =", I)

### Manipulating NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation (although a lot of it may happen in higher-level frameworks like pandas). We will cover a few categories of basic array manipulations here:
- **Attributes of arrays**: Determining the size, shape, memory consumption and data type of arrays.
- **Indexing of arrays**: Getting and setting the value of individual array elements.
- **Slicing of arrays**: Getting and setting smaller subarrays within a larger array.
- **Reshaping of arrays**: Changing the shape of a given array.
- **Joining and splitting of arrays**: Combining multiple arrays into one, and splitting one array into many.

#### NumPy Array Attributes:
You can retrieve an attribute by appending its name to the respective array, separated by a dot.

In [None]:
# determine the shape of array D using .shape
D.shape

In [None]:
# determine the memory consumption of array D using .nbytes
D.nbytes

In the following, some example attributes of array H are presented.

In [None]:
# remember our array H
H

In [None]:
# returns the number of dimensions (axes)
print("H ndim: ", H.ndim)

# returns shape in form (#row,#col)
print("H shape: " , H.shape) ##### Most important!

# returns size (i.e. no of elements)
print("H size: ", H.size)

# returns data type
print("H dtype: ", H.dtype)

# returns length of one array element in bytes
print("itemsize: ", H.itemsize," bytes")

# returns total bytes consumed by the elements of the array
print("nbytes:  ", H.nbytes, "bytes")
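
The attributes are related: `nbytes` equals `size` times `itemsize`. The sketch below checks this on a small example array (`H_example` and `H_small` are illustrative names) and shows how the data type affects memory consumption.

```python
import numpy as np

H_example = np.zeros((2, 2), dtype=np.float64)

print(H_example.size)      # 4 elements
print(H_example.itemsize)  # 8 bytes per float64 element
print(H_example.nbytes)    # 32 bytes in total (4 * 8)

# converting to a smaller data type halves the memory consumption
H_small = H_example.astype(np.float32)
print(H_small.nbytes)      # 16
```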

#### NumPy Array Indexing:
In the following, some examples on indexing are presented. Note that, for a 1-dimensional array, this is very similar to indexing and slicing lists!

Accessing single elements:

In [None]:
# remember our basic array A
A

In [None]:
# index the array from the front
A[0]

In [None]:
# fill in the correct indices
print("The 4th element of A is {}".format(A[3]))
print("The last element of A is {}".format(A[-1]))

In a multidimensional array (i.e. a matrix), you access items using a comma-separated tuple of indices.

In [None]:
# remember H
print(H)

In [None]:
# remember the shape of H
H.shape

In [None]:
# access the element in the bottom left
H[1,0]

In [None]:
# fill in the correct indices
print ("The first element of H is {}".format(H[0,0]))  #array[row,column]
print ("The last element of H is {}".format(H[1,1]))   #array[row,column]
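
Indexing can also be used to set values. The following sketch works on fresh example arrays (`A_demo` and `H_demo` are illustrative names) so that the arrays defined above stay unchanged. Note that a value assigned to an integer array is converted to an integer, so the decimal part is dropped.

```python
import numpy as np

A_demo = np.array([1, 2, 3, 4, 5])

A_demo[0] = 10     # overwrite the first element
A_demo[-1] = 7.9   # A_demo holds integers, so 7.9 is truncated to 7
print(A_demo)      # [10  2  3  4  7]

H_demo = np.zeros((2, 2))
H_demo[1, 0] = 0.5  # array[row, column]: set the bottom-left element
print(H_demo)
```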

#### NumPy Array Slicing:

**One-dimensional arrays**

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon `:` character. The syntax is as follows:
` X[start (incl.):stop (excl.):step]`

In [None]:
# remember array F
print(F)

In [None]:
# slice F to retrieve all elements except the first and the last
F[1:5]

In [None]:
# we can also reverse the order by setting steps to -1
F[::-1]

In [None]:
# items 3 and 4
print ("middle subarray:", F[2:4])

# items 1 to 4 (excl.)
print("First 3 elements:", F[:3])

# last 2 elements
print("Last 2 elements:", F[-2:])

# first element and every second element from there
print("Every other element:", F[::2])

**Multi-dimensional arrays**

Multi-dimensional slices work in the same way, with multiple slices separated by commas. The command is `X[slice row, slice column]`.

In [None]:
# let's create a new multi-dimensional array
J = np.random.randint(low=0,high=20, size=(3,4))

J

In [None]:
# add the correct indices
print ("The first two rows and the first three columns: \n", J[:2,:3])

In [None]:
# add the correct indices
print("All rows and every other column:\n", J[:,::2])

In [None]:
print("Rows and columns reversed:\n",J[::-1,::-1])
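
One detail worth knowing when setting values through slices: a NumPy slice is a view on the original data rather than a copy, so changing the slice changes the original array. This differs from list slicing. A minimal sketch with an illustrative array `J_demo`:

```python
import numpy as np

J_demo = np.arange(12).reshape(3, 4)

# a slice is a view: writing to it modifies J_demo as well
view = J_demo[:2, :2]
view[0, 0] = 99
print(J_demo[0, 0])   # 99

# use .copy() to get an independent array
independent = J_demo[:2, :2].copy()
independent[0, 0] = -1
print(J_demo[0, 0])   # still 99
```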

#### NumPy Array Reshaping:

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. Note that for this to work, the size of the initial array must match the size of the reshaped array.

In [None]:
# return evenly spaced values within a given interval
K = np.arange(1,25)
K

In [None]:
# determine number of dimensions of K
K.ndim

In [None]:
# determine number of elements of K
print(len(K))
print(K.size)

In [None]:
# we can re-shape this array into any shape with 24 elements using the .reshape method
K.reshape(4,6)
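
A few more reshaping cases, sketched on an illustrative copy `K_demo` of the array above: passing `-1` lets NumPy infer one dimension, more than two dimensions are possible, and a shape with the wrong number of elements raises a `ValueError`.

```python
import numpy as np

K_demo = np.arange(1, 25)   # 24 elements

# -1 tells NumPy to infer this dimension from the array size
print(K_demo.reshape(4, -1).shape)    # (4, 6)

# any shape whose dimensions multiply to 24 works
print(K_demo.reshape(2, 3, 4).shape)  # (2, 3, 4)

# 5 * 5 = 25 does not match 24 elements
reshape_error = None
try:
    K_demo.reshape(5, 5)
except ValueError as err:
    reshape_error = str(err)
    print("reshape failed:", reshape_error)
```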

#### NumPy Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays.

**Concatenation of arrays**

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routine `np.concatenate`. Additionally, `np.vstack` and `np.hstack` may be used.

In [None]:
# let's define two multi-dimensional arrays
N = np.array([[3,5,7],[1,3,5]])
O = np.array([[2,4,2],[0,9,8]])

In [None]:
# print both arrays
print("N:\n",N)
print("O:\n",O)

In [None]:
# There are different ways of concatenating these arrays. We can specify the axis using the keyword 'axis'
print("Row-wise:\n",np.concatenate((N,O), axis = 0))
print("Column-wise:\n",np.concatenate((N,O), axis = 1))

For working with arrays of mixed dimensions, it can be more practical to use the `np.vstack` (vertical stack, i.e. stacking on top of each other) and `np.hstack` (horizontal stack, i.e. stacking next to each other) functions:

In [None]:
# stack row-wise using vstack
print(np.vstack((N,O)))

In [None]:
# stack column-wise using hstack
print(np.hstack((N,O)))
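
To see what "mixed dimensions" means in practice, the sketch below adds a one-dimensional row to a two-dimensional array. The arrays `N_demo`, `extra_row` and `extra_col` are made up for this example. `np.vstack` accepts the 1-D row directly, whereas `np.concatenate` requires all inputs to have the same number of dimensions.

```python
import numpy as np

N_demo = np.array([[3, 5, 7], [1, 3, 5]])   # shape (2, 3)
extra_row = np.array([9, 9, 9])             # shape (3,)
extra_col = np.array([[0], [1]])            # shape (2, 1)

with_row = np.vstack((N_demo, extra_row))   # 1-D row is treated as shape (1, 3)
print(with_row.shape)   # (3, 3)

with_col = np.hstack((N_demo, extra_col))
print(with_col.shape)   # (2, 4)

# concatenate does not accept a mix of 2-D and 1-D inputs
concat_error = None
try:
    np.concatenate((N_demo, extra_row), axis=0)
except ValueError as err:
    concat_error = str(err)
    print("concatenate failed:", concat_error)
```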

**Splitting of arrays**

The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass either the number of equally sized sections or a list of indices giving the split points:

In [None]:
# remember our one-dimensional array K
print(K)

In [None]:
# we can split K into 3 arrays using np.split
K1, K2, K3 = np.split(K,3)
print(K1, K2, K3)

Note that this only works if the number of elements of the original array can be split equally among the sub-arrays.
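
For unequal pieces there are two options, sketched here on illustrative arrays: pass a list of split indices to `np.split`, or use `np.array_split`, which allows sections of different sizes.

```python
import numpy as np

K_demo = np.arange(1, 25)   # 24 elements

# split at indices 4 and 10: K_demo[:4], K_demo[4:10], K_demo[10:]
first, middle, last = np.split(K_demo, [4, 10])
print(len(first), len(middle), len(last))   # 4 6 14

# array_split accepts a section count that does not divide the length evenly
parts = np.array_split(np.arange(10), 3)
print([len(p) for p in parts])              # [4, 3, 3]
```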

In [None]:
# let's create a new multi-dimensional array
P = np.arange(16).reshape((4, 4))
print(P)

In [None]:
# using vsplit, we can split an array into multiple sub-arrays vertically (row-wise)
row1, row2, row3, row4 = np.vsplit(P, 4)

print("Row 1:",row1)
print("Row 2:",row2)
print("Row 3:",row3)
print("Row 4:",row4)

In [None]:
# using hsplit, we can split an array into multiple sub-arrays horizontally (column-wise)
left, middle1, middle2, right = np.hsplit(P, 4)
print("Left: \n",left)
print("Middle1: \n",middle1)
print("Middle2: \n",middle2)
print("Right: \n",right)

### NumPy Array Operations

NumPy allows for **element-wise** as well as linear algebra **matrix-type** operations, which are a key component of scientific computing tasks. Because these operations are applied to whole arrays at once, they make computing fast and easy. They are the core functionality of `numpy`.

In [None]:
# let's create two multi-dimensional arrays
a = np.arange(6).reshape((2, 3))
b = np.arange(6,12).reshape((2, 3))

In [None]:
# print a and b
print("a:\n", a)
print("b:\n", b)

**Element-wise operations**

In [None]:
# we can do an element-wise addition using the + operator
c = a + b
print(c)

In [None]:
# we can do an element-wise subtraction using the - operator
d = a - b
print(d)

In [None]:
# we can do an element-wise multiplication using the * operator
e = a * b
print(e)

In [None]:
# we can do an element-wise division using the / operator
f = a / b
print(f)
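
Element-wise operations also work when the operands have different but compatible shapes; NumPy then "broadcasts" the smaller operand across the larger one. A minimal sketch, with `a_demo`, `row` and `col` as illustrative names:

```python
import numpy as np

a_demo = np.arange(6).reshape((2, 3))   # rows: [0 1 2] and [3 4 5]

# a scalar is applied to every element
plus_ten = a_demo + 10                  # rows: [10 11 12] and [13 14 15]

# a 1-D array of length 3 is applied to every row
row = np.array([100, 200, 300])
plus_row = a_demo + row                 # rows: [100 201 302] and [103 204 305]

# a (2, 1) column is applied to every column
col = np.array([[1], [2]])
times_col = a_demo * col                # rows: [0 1 2] and [6 8 10]

print(plus_ten, plus_row, times_col, sep="\n\n")
```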

**Matrix operations**

In [None]:
# remember our original arrays a and b
print("a:\n", a)
print("b:\n", b)

We can perform a matrix multiplication using the '@' operator.

In [None]:
a@b

We get an error message when attempting to perform a matrix multiplication on `a` and `b`. Why? Look at the shapes of `a` and `b`.

In [None]:
print(a.shape)
print(b.shape)

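Performing a matrix multiplication on two 2x3 matrices is not possible, because the number of columns of the first matrix (3) must equal the number of rows of the second (2). However, we can transpose b (making it a 3x2 matrix). Multiplying a 2x3 with a 3x2 matrix should work.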

In [None]:
# transpose b using .T
b_t = b.T

print(b_t)

In [None]:
# perform matrix multiplication
a@b_t
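
As a cross-check, the sketch below recreates `a` and `b` under illustrative names, confirms that the `@` operator gives the same result as `np.matmul` for these 2-D arrays, and shows that the order of the operands determines the shape of the result.

```python
import numpy as np

a_demo = np.arange(6).reshape((2, 3))
b_demo = np.arange(6, 12).reshape((2, 3))

# (2, 3) @ (3, 2) -> (2, 2)
product = a_demo @ b_demo.T
print(product)   # rows: [23 32] and [86 122]

# the @ operator and np.matmul are equivalent here
print(np.array_equal(product, np.matmul(a_demo, b_demo.T)))   # True

# (3, 2) @ (2, 3) -> (3, 3): transposing the other operand gives a different shape
other_product = a_demo.T @ b_demo
print(other_product.shape)   # (3, 3)
```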

---