# Python from Zero: The absolute Beginner's course

## Session 3 / 4 - 04.08.2021 9:00 - 12:30
<br>
<font size="3">
    <i>by Fabian Wilde, Katharina Hoff, Matthis Ebel & Mario Stanke<br></i><br>
<b>Contact:</b> fabian.wilde@uni-greifswald.de
<br>
</font>
<br>

## Running your code from the command-line
<br>
<font size="3">
    <b>In this case, we assume that you're using Linux.</b> If you'd like to run your Python script on the command-line (which is Bash under Linux in the default case), you need to save your script first and then run it with<br><br>
<font face="Courier"><b>python <i>your_script.py</i></b></font><br><br>
or<br><br>
<font face="Courier"><b>python3 <i>your_script.py</i></b></font><br><br>
If you have both Python 2.x and Python 3.x installed on your system, make sure you're running the right interpreter using<br><br>
<font face="Courier"><b>which python</b></font><br><br>
which prints the path of the executable that the command refers to.<br><br>
Of course, you can also run your Python script directly on the command-line, treating it like a Bash script (a script file for the command-line), by putting e.g.<br><br>
<font face="Courier"><b>#!/usr/bin/python3</b></font><br><br>
as the first line in your Python script to tell the system which interpreter to use for the following file content.<br><br>
    Then, you need to make your newly created file <b>executable</b> by adding the flag <b>executable</b> e.g. with the command<br><br>
    <font face="Courier"><b>chmod u+x your_script.py</b></font><br><br>
You can check the file permission flags in the listing of your directory using<br><br>
    <font face="Courier"><b>ls -la scripts/</b></font><br><br>
Then you should be able to simply run your Python script as if it were a Bash script using<br><br>
<font face="Courier"><b>./your_script.py</b></font><br><br>                                                                                               
</font>
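
<font size="3">
As a minimal sketch of such a script (the file name <i>your_script.py</i> and the function name <i>main</i> are only placeholders, not part of the course material), the file could look as follows. It uses <i>/usr/bin/env python3</i> in the first line, a common alternative that picks the first <i>python3</i> found on your PATH instead of a fixed path:
</font>

```python
#!/usr/bin/env python3
# hypothetical content of your_script.py


def main():
    # build the message, print it and also return it
    message = "Hello from the command line!"
    print(message)
    return message


# this block only runs when the file is executed as a script,
# not when it is imported as a module
if __name__ == "__main__":
    main()
```

<font size="3">
After <i>chmod u+x your_script.py</i>, this file can be started with <i>./your_script.py</i> or, independent of the first line, with <i>python3 your_script.py</i>.
</font>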

<font size="3">
In the following examples, a special "cell magic" command is used, so that bash scripts and commands can be run within a Jupyter notebook cell.
</font>

### Examples:

In [1]:
%%bash
which python3
python3 scripts/hello_world.py

/home/wildef/anaconda3/bin/python3


python3: can't open file 'scripts/hello_world.py': [Errno 2] No such file or directory


CalledProcessError: Command 'b'which python3\npython3 scripts/hello_world.py\n'' returned non-zero exit status 2.

<font size="3">
Or directly run your Python script as if it were a Bash script or a native executable:
</font>

In [None]:
%%bash
ls -la scripts/
chmod u+x scripts/hello_world2.py
./scripts/hello_world2.py

<font size="3">
We check what <i>hello_world2.py</i> contains by printing its file content with the command <i>cat</i>:
</font>

In [None]:
%%bash
cat scripts/hello_world2.py

## Handling of command-line arguments 
<br>
<font size="3">
Very often, you'd like to run your Python script several times with slightly different parameters or with different paths to the files to work on. It would be annoying to make the required changes in your Python code every time.<br><br>
Luckily, you can easily handle given command-line arguments in Python and work with them in your script. <b>The most basic option is to use the built-in module <i>sys</i> and the list <i>argv</i> it provides.</b> The first element of <i>argv</i> is always the name of the script itself; any command-line arguments given when running your Python script appear after it.
</font>
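
<font size="3">
The following is a minimal sketch of how <i>sys.argv</i> can be evaluated. It is not the content of <i>scripts/cli_args.py</i>; the helper function <i>describe_args</i> is a made-up name for illustration:
</font>

```python
import sys


def describe_args(argv):
    """Return one line of text per entry of the given argument list."""
    # the first entry is always the name of the script itself
    lines = ["script name: " + argv[0]]
    # all following entries are the command-line arguments
    for i, arg in enumerate(argv[1:], start=1):
        lines.append("argument " + str(i) + ": " + arg)
    return lines


if __name__ == "__main__":
    for line in describe_args(sys.argv):
        print(line)
```

<font size="3">
Called as <i>./my_script.py --test1</i> (hypothetical file name), the list <i>sys.argv</i> would contain the two strings <i>./my_script.py</i> and <i>--test1</i>. Note that all entries are strings, so numbers have to be converted, e.g. with <i>int()</i> or <i>float()</i>.
</font>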

### Examples:

<font size="3">Let's check first the content of the little example script:</font>

In [None]:
%%bash
cat scripts/cli_args.py

<font size="3">Then we run it giving various command-line arguments:</font>

In [None]:
%%bash 

# modifies file permission flags
chmod u+x scripts/cli_args.py

# run it without any command-line argument
./scripts/cli_args.py

# run it with one command-line argument
./scripts/cli_args.py --test1

# run it with multiple command-line arguments
./scripts/cli_args.py --test1 --test2 --test3

<font size="3">
    <b>A more convenient option to handle (define and check) command-line arguments is the module <a href="https://docs.python.org/3/library/argparse.html"><i>argparse</i></a></b>.<br><br> With <i>argparse</i> you can easily define expected mandatory or optional command-line arguments for your scripts with built-in checking of the user input. You can even define nice description and help texts for your parameters to help others to use your scripts later independently.
</font>
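
<font size="3">
As a hedged sketch of what such a script may look like, the following code is modeled on the introductory example in the official argparse documentation. It is not necessarily identical to <i>scripts/argparse_example.py</i>, and the function name <i>build_parser</i> is only chosen for illustration. The argument list is passed explicitly to <i>parse_args</i> so that the sketch also runs inside a notebook cell:
</font>

```python
import argparse


def build_parser():
    """Create a parser with one mandatory and one optional argument."""
    parser = argparse.ArgumentParser(description="Process some integers.")
    # mandatory positional argument: one or more integers
    parser.add_argument("integers", metavar="N", type=int, nargs="+",
                        help="an integer for the accumulator")
    # optional flag: switches from max() to sum()
    parser.add_argument("--sum", dest="accumulate", action="store_const",
                        const=sum, default=max,
                        help="sum the integers (default: find the max)")
    return parser


# in a real script you would call parse_args() without arguments,
# so that the values are taken from sys.argv
args = build_parser().parse_args(["1", "2", "3", "--sum"])
print(args.accumulate(args.integers))  # prints 6
```

<font size="3">
Since the user input is converted with <i>type=int</i>, a non-numeric value leads to an error message from argparse instead of a crash later in your code.
</font>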

In [None]:
%%bash

# modifies file permission flags
chmod u+x scripts/argparse_example.py

# list file content
cat scripts/argparse_example.py

<font size="3">
    <b>If we now attempt to run the script without command-line arguments, we get an error and argparse gives us a hint what we have done wrong:</b>
            </font>

In [None]:
%%bash
./scripts/argparse_example.py

<font size="3">
    <b>Argparse provides a nice help function when the argument <i>-h</i> is used:</b>
</font>

In [None]:
%%bash
./scripts/argparse_example.py -h

In [None]:
%%bash
./scripts/argparse_example.py 1 2 3 --sum

<font size="3"><div class="alert alert-warning"><b>Exercise 1:</b> Write your own script evaluating command-line arguments and run it on the command-line yourself. Either simply use the built-in <i>sys.argv</i> list or use the <i>argparse</i> module if you already feel comfortable enough.<br><br>
<b>Simply create a new Python script locally and upload it here.</b>
</div>
    
<b>Try to run it yourself here:</b></font>

In [None]:
%%bash
# replace "your_script" with your filename, may also change the file path
chmod u+x scripts/your_script.py
./scripts/your_script.py

## Importing Modules and Namespaces
<br>
<font size="3">
Structuring the code in functions which can be easily reused and maintained is the first step towards cleaner and leaner code. Nevertheless, in big Python projects, the code would be hard to read and collaborating on a common project would be difficult if all the code were in one single file. Python projects are therefore organised in modules, which allows distributing parts of the code over multiple files.<br><br> The figure below shows an exemplary project directory structure organised in modules and submodules:<br><br>
</font>
<div align="center">
    <img src="img/absolute-import.jpg" width="40%">
</div>
<br>
<font size="2"><i>Source: <a href="https://www.geeksforgeeks.org/absolute-and-relative-imports-in-python/">https://www.geeksforgeeks.org/absolute-and-relative-imports-in-python/</a></i></font>
<br><br>
<font size="3">
<b>In order to use an existing (3rd party) module in your new script or project, you need to import the module using</b><br><br>
<b><font face="Courier">import <i>module_name</i></font></b><br><br>
<b>or define a shorter alias for the module name, if it is too long when used in the code:</b><br><br>
<b><font face="Courier">import <i>module_name</i> as <i>alias</i></font></b><br><br>
<font size="3">
<b>You can also import a specific class or function from a module:</b><br><br>
<b><font face="Courier">from <i>module_name</i> import <i>function_name</i> as <i>alias</i></font></b><br><br>
<b>You can also further specify an <i>absolute path</i> to the submodule (or the file) to import a specific class or function from:</b><br><br>
<b><font face="Courier">from <i>module_name</i>.<i>submodule_name</i>.<i>file_name</i> import <i>function_name</i></font></b><br><br>
<b>Related to the project structure above, we could use the statement:</b><br><br>
<b><font face="Courier">from pkg2.subpkg1.module5 import fun3</font></b><br><br>
    to specify an <b><i>absolute import</i></b> path. Absolute imports are the form recommended by PEP 8, but they can become unnecessarily verbose when the directory structure is very large.<br><br>
<b>A relative import with respect to the project structure in the figure above may be defined using</b><br><br>
<b><font face="Courier">from .subpkg1.module5 import fun3</font></b><br><br>
    
The best practice in Python regarding imports can be found in the official Python style guide <a href="https://www.python.org/dev/peps/pep-0008/#imports">PEP8</a>.<br> The most important best practices are:
<ul>
    <li>import statements should be located at the beginning of your script.</li>
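    <li>each module should be imported on a separate line.</li>
    <li>imports should be grouped: standard library imports first, then 3rd party imports, then imports of your own local modules.</li>
    <li>within each group, sorting the imports alphabetically by module name is a common convention (although not required by PEP 8).</li>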
</ul>
</font>

### Examples:

In [None]:
# standard library imports first
# use a new line for each import
import math
import os

# the most popular example to use numpy
# imports numpy and sets the namespace to the alias "np"
# the module content is then accessible via the alias
import numpy as np

# the most popular example to plot/visualize data
from matplotlib import pyplot as plt

# import specific class from module
from tqdm import tqdm
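
<font size="3">
The different import forms and the resulting namespaces can also be tried out with the standard library alone. The following sketch uses only <i>math</i> and <i>os.path</i>; the alias names <i>osp</i> and <i>square_root</i> are arbitrary choices for illustration:
</font>

```python
# plain import: the content is accessed via the module name
import math
# import with alias: the content is accessed via the alias
import os.path as osp
# import of a single function with alias: no module prefix is needed
from math import sqrt as square_root

print(math.pi)
print(osp.join("scripts", "hello_world.py"))
print(square_root(16.0))  # prints 4.0
```

<font size="3">
Keeping the module name (or a short alias) as a prefix makes it visible where a function comes from and avoids name clashes between modules.
</font>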

<font size="3"><b>Try it yourself:</b></font>

<font size="3"><b>You can check the path, version and short documentation of a module using the module attributes:</b></font>

In [19]:
# the hidden attribute __version__ contains the module version
print(np.__version__)
# the hidden attribute __path__ contains the module path or location
print(np.__path__)
# the hidden attribute __doc__ contains a short documentation
# or description of the module, the so-called docstring
print(np.__doc__)

1.19.3
['/home/wildef/.local/lib/python3.8/site-packages/numpy']

NumPy
=====

Provides
  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation
----------------------------
Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
`the NumPy homepage <https://www.scipy.org>`_.

We recommend exploring the docstrings using
`IPython <https://ipython.org>`_, an advanced Python shell with
TAB-completion and introspection capabilities.  See below for further
instructions.

The docstring examples assume that `numpy` has been imported as `np`::

  >>> import numpy as np

Code snippets are indicated by three greater-than signs::

  >>> x = 42
  >>> x = x + 1

Use the built-in ``help`` function to view a function's docstring::

  >>> help(np.sort)
  ... # doctest: +SKIP

For some objects, `

## The most popular Python Libraries (3rd party Modules)

<br>
<font size="3">
The most popular Python libraries, which you will use yourself sooner or later, are <br>
<ul>
<li><b><a href="https://numpy.org/">numpy</a>:</b><br>Numpy is one of the most widely used Python libraries. It offers fast handling and efficient storage of large amounts of numerical data in numpy arrays as well as a variety of useful functions to read in various data file formats.</li><br>
<li>
<b><a href="https://matplotlib.org/">matplotlib</a>:</b><br>Matplotlib is the most common Python library for data visualization. The library <a href="https://seaborn.pydata.org/">seaborn</a> builds on matplotlib and offers more beautiful plots and more sophisticated plot types for statistics.
</li><br>
    <li>
        <b><a href="https://pandas.pydata.org/">Pandas</a>:</b><br>Pandas is a library for the analysis and handling of large amounts of tabular data (data frames), including basic plotting. It became most popular for time series analysis and is also commonly used in finance.
    </li><br>
    <li>
     <b><a href="https://www.scipy.org/">scipy</a>:</b><br>Scipy is a general purpose library for science and engineering, offering functionality for e.g. signal analysis, filtering, optimization, interpolation and regression.
    </li><br>
        <li>
     <b><a href="https://www.statsmodels.org/">statsmodels</a>:</b><br>As the name implies, the library Statsmodels offers a big variety of statistical models and tests for your data.
    </li><br>
    <li>
     <b><a href="https://scikit-image.org/">scikit-image</a>:</b><br>Scikit-image offers functionalities for automatic image processing, enhancement and segmentation.
    </li><br>
     <li>
     <b><a href="https://scikit-learn.org/">scikit-learn</a>:</b><br>Scikit-learn offers a variety of classical machine learning models (e.g. for classification, regression and clustering) as well as functions for data preprocessing and model evaluation.
    </li><br>
     <li>
     <b><a href="https://www.tensorflow.org/">tensorflow</a>:</b><br>Tensorflow is the most popular machine learning library developed mostly by Google besides the competing <a href="https://pytorch.org/">pytorch</a> by Facebook.
    </li>
</ul>
</font>

## How to list installed modules and install new modules in Python

<br>
<font size="3">
Often you would like to use the functionality of a 3rd-party module, but the module is not installed. In most Python environments, a package manager is used to manage the installed modules.<br> In most cases, this is either <a href="https://docs.python.org/3/installing/index.html">pip</a> or, if you use the Anaconda environment, <a href="https://docs.conda.io/">conda</a>.<br><br>
    <b>In order to check which modules are installed, you can use the command:</b><br><br>
    <i>pip list</i>
    <br><br>
    or
    <br><br>
    <i>conda list</i><br><br>
    on the command-line prompt in a console or use directly
</font>

In [1]:
%%bash
conda list

# packages in environment at /home/wildef/anaconda3:
#
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0                    py38_0  
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12                     py_0  
anaconda                  2020.07                  py38_0  
anaconda-client           1.7.2                    py38_0  
anaconda-navigator        1.9.12                   py38_0  
anaconda-project          0.8.4                      py_0  
argh                      0.26.2                   py38_0  
asn1crypto                1.3.0                    py38_0  
astroid                   2.4.2                    py38_0  
astropy                   4.0.1.post1      py38h7b6447c_1  
atomicwrites              1.4.0                      py_0  
attrs                     19.3.0                     py_0  
autopep8                  1.5.3                      py_0  
babel                     2.8.0       

<font size="3">in a cell of a Jupyter notebook, where "%%bash" tells the notebook to treat the following lines not as Python code but as a Bash script, as normally used on the command-line prompt of a Linux system.<br><br>
<b>To install a new module in your current Python environment, use</b></font>

In [3]:
%%bash
conda install seaborn

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/wildef/anaconda3

  added / updated specs:
    - seaborn


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.10.3               |   py38h06a4308_0         2.9 MB
    ------------------------------------------------------------
                                           Total:         2.9 MB

The following packages will be UPDATED:

  conda              conda-forge::conda-4.9.0-py38h924ce5b~ --> pkgs/main::conda-4.10.3-py38h06a4308_0


Proceed ([y]/n)? 

Downloading and Extracting Packages
conda-4.10.3         | 2.9 MB    |            |   0% conda-4.10.3         | 2.9 MB    | 3          |   4% conda-4.10.3         | 2.9 MB    | ####3      |  44% conda-4.10.3         | 2.9 MB    | ########6  |  86% conda-4.10.3        
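
<font size="3">
Whether a module can be imported in the current environment can also be checked from within Python. The following sketch uses <i>importlib.util.find_spec</i> from the standard library; the helper name <i>is_installed</i> is made up for illustration:
</font>

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found in this environment."""
    # find_spec returns None if no module with that name can be located
    return importlib.util.find_spec(module_name) is not None


# the result depends on your own environment
for name in ["math", "numpy", "pandas", "seaborn"]:
    print(name, "installed:", is_installed(name))
```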

<font size="3">Each Python environment can have different modules and module versions installed. To avoid conflicts between module versions, or when your code only works with specific versions of certain modules, you can create and use separate Python environments.<br><br>
    <b>You can list the currently existing environments using</b></font>

In [4]:
%%bash
conda env list

# conda environments:
#
base                  *  /home/wildef/anaconda3



<font size="3"><b>You can create a new (empty) environment using</b></font>

In [5]:
%%bash
conda create --name new_env

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/wildef/anaconda3/envs/new_env



Proceed ([y]/n)? 
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
#
# To activate this environment, use
#
#     $ conda activate new_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate



<font size="3"><b>You can switch to a specific environment using</b></font>

In [14]:
%%bash
conda activate new_env

no change     /home/wildef/anaconda3/condabin/conda
no change     /home/wildef/anaconda3/bin/conda
no change     /home/wildef/anaconda3/bin/conda-env
no change     /home/wildef/anaconda3/bin/activate
no change     /home/wildef/anaconda3/bin/deactivate
no change     /home/wildef/anaconda3/etc/profile.d/conda.sh
no change     /home/wildef/anaconda3/etc/fish/conf.d/conda.fish
no change     /home/wildef/anaconda3/shell/condabin/Conda.psm1
no change     /home/wildef/anaconda3/shell/condabin/conda-hook.ps1
no change     /home/wildef/anaconda3/lib/python3.8/site-packages/xontrib/conda.xsh
no change     /home/wildef/anaconda3/etc/profile.d/conda.csh
no change     /home/wildef/.bashrc
No action taken.


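<font size="3"><b>And check which environment is currently active using</b><br><br>Note that each "%%bash" cell runs in its own shell, so an environment activated in one cell does not stay active in the next cell, which is why the output below still reports <i>base</i>.</font>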

In [15]:
%%bash
conda info


     active environment : base
    active env location : /home/wildef/anaconda3
            shell level : 1
       user config file : /home/wildef/.condarc
 populated config files : /home/wildef/.condarc
          conda version : 4.10.3
    conda-build version : 3.18.11
         python version : 3.8.3.final.0
       virtual packages : __linux=5.8.0=0
                          __glibc=2.32=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/wildef/anaconda3  (writable)
      conda av data dir : /home/wildef/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/wildef/anaconda3/pkgs
                          /home/wildef/.con

<font size="3"><div class="alert alert-warning"><b>Exercise 2:</b><br>Install the modules seaborn and statsmodels, either using the command-line prompt in a console or here in the Jupyter notebook by starting a cell with "%%bash". Then check whether numpy and pandas are installed.
</div>
    

### Try it yourself:

## A short introduction to Numpy

<br>
<font size="3">Often, you don't want to or can't define a list with your data manually. Instead, you have a file, e.g. a CSV file with comma-separated values, and you need to convert its content to a table-like data structure. In this example, we use the famous <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set">Iris dataset</a> (a very popular standard dataset in data science). The file looks like this:</font>

In [21]:
%%bash
head -n 25 data/iris.csv

sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
5.4,3.7,1.5,0.2,setosa
4.8,3.4,1.6,0.2,setosa
4.8,3.0,1.4,0.1,setosa
4.3,3.0,1.1,0.1,setosa
5.8,4.0,1.2,0.2,setosa
5.7,4.4,1.5,0.4,setosa
5.4,3.9,1.3,0.4,setosa
5.1,3.5,1.4,0.3,setosa
5.7,3.8,1.7,0.3,setosa
5.1,3.8,1.5,0.3,setosa
5.4,3.4,1.7,0.2,setosa
5.1,3.7,1.5,0.4,setosa
4.6,3.6,1.0,0.2,setosa
5.1,3.3,1.7,0.5,setosa


<font size="3">which gives you the first 25 lines of the file.<br><br>
Of course, we could now manually read in the file line by line, but then we would need to process the strings and convert them to a table-like data structure. <b>Luckily, there are already functions in the 3rd party libraries <a href="https://numpy.org/"><i>numpy</i></a> and <a href="https://pandas.pydata.org/"><i>pandas</i></a> that solve this problem.</b><br><br>
    <b>In 2020, a <a href="https://www.nature.com/articles/s41586-020-2649-2">paper</a> about the numpy Python package was published in <i>Nature</i>.</b><br><br>
        <div align="center">
<img src="img/numpy_nature.webp" width="100%">
        </div>
<font size="2"><i>Source: <a href="https://www.nature.com/articles/s41586-020-2649-2">https://www.nature.com/articles/s41586-020-2649-2</a></i></font>
<br><br>
<b>Numpy</b> is a powerful 3rd party package offering the new data type of the <b><i>numpy array</i></b> (numpy.ndarray), with a fast implementation in compiled C code in the background. In contrast to lists in Python, <b>the size of a numpy array is fixed once it is created, and the best practice is to allocate space in advance by initializing an array of the final size (e.g. filled with zeros)</b>. Also, all elements share the same data type and the number of elements in each row or column has to be the same, since <b>a two-dimensional numpy array represents an N x M matrix.</b><br><br>
<b>To sum it up, a numpy array behaves like a "classic" array in other programming languages such as C, due to its precompiled C implementation in the background.</b><br>
</font>
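
<font size="3">
The two properties described above (allocating in advance and a common data type for all elements) can be sketched as follows. The variable names are only chosen for illustration:
</font>

```python
import numpy as np

# allocate an array of the final size in advance, filled with zeros
squares = np.zeros(5)
# then fill it element by element
for i in range(5):
    squares[i] = i ** 2
print(squares)

# all elements share one data type: the integers are converted to floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # prints float64
```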

<font size="3">To use Numpy, you have to import the package once. Since we will refer to numpy often in the following, we assign an alias that is faster to type (np):</font>

In [27]:
# import numpy with alias np
import numpy as np

### NumPy Array

<br>
<font size="3">A NumPy Array can have many dimensions. Let's start with one (similar to a Python list) and two dimensions (similar to a matrix):</font>

In [24]:
# create a numpy array from a list
arr1 = np.array([1.0,2.35,3.141])
print("type(arr1) = "+str(type(arr1)))
print("arr1 = "+str(arr1))

# create a 2D array / a 3x3 matrix filled with zeros
arr2 = np.zeros((3,3))
print("type(arr2) = "+str(type(arr2)))
print("arr2 = \n"+str(arr2))

# index an array element in a multi-dimensional array
arr2[0,0] = 1
arr2[1,1] = 2
arr2[2,2] = 3
print("type(arr2) = "+str(type(arr2)))
print("arr2 = \n"+str(arr2))

# the attribute "size" contains the total number of elements, whereas len() only returns the length of the first dimension
# the attribute "shape" contains a tuple with the array or matrix dimensions
print("arr2.size = "+str(arr2.size))
print("arr2.shape = "+str(arr2.shape))

type(arr1) = <class 'numpy.ndarray'>
arr1 = [1.    2.35  3.141]
type(arr2) = <class 'numpy.ndarray'>
arr2 = 
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
type(arr2) = <class 'numpy.ndarray'>
arr2 = 
[[1. 0. 0.]
 [0. 2. 0.]
 [0. 0. 3.]]
arr2.size = 9
arr2.shape = (3, 3)


<font size="3"><b>Sometimes you want or need to change the shape of your numpy array, e.g. by transposing it or by using the reshape method:</b></font>

In [40]:
# defines a numpy array with row number unequal column number
arr3 = np.array([[2,3],[3,5],[6,4]])
print("content of arr3:")
print(arr3)

# prints the shape (number of rows, number of columns)
print("shape of arr3:")
print(arr3.shape)

# transposes the array / matrix
arr3 = arr3.T
print("transposed array (rows and columns interchanged):")
print(arr3)

# prints the shape of the transposed array
print("shape of the transposed arr3:")
print(arr3.shape)

# the reshape method also changes the shape, but it keeps the elements in
# row-major order, so in general it does NOT give the same result as transposing
# (total number of elements needs to be the same!)
arr3 = arr3.reshape((3,2))

content of arr3:
[[2 3]
 [3 5]
 [6 4]]
shape of arr3:
(3, 2)
transposed array (rows and columns interchanged):
[[2 3 6]
 [3 5 4]]
shape of the transposed arr3:
(2, 3)
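
<font size="3">
The difference between transposing and reshaping can be sketched with the same 3x2 array as above. Both results have the shape (2, 3), but the elements end up in different positions:
</font>

```python
import numpy as np

a = np.array([[2, 3], [3, 5], [6, 4]])

# transposing interchanges rows and columns
transposed = a.T
print(transposed)
# [[2 3 6]
#  [3 5 4]]

# reshaping keeps the elements in row-major order and only regroups them
reshaped = a.reshape((2, 3))
print(reshaped)
# [[2 3 3]
#  [5 6 4]]
```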


<font size="3"><b>Often you would like to perform computations over an entire row or column of your array (hence your dataset) for some statistical evaluation, like:
    </b></font>

In [46]:
# generates a 2D matrix of random floating-point numbers
arr = np.random.normal(0,2,size=(6,8))
print("random number array:")
print(arr)

# sums over all rows (axis=0), i.e. one sum for each column
col_sum = np.sum(arr, axis=0)
print("col_sum = "+str(col_sum))

# sums over all columns (axis=1), i.e. one sum for each row
row_sum = np.sum(arr, axis=1)
print("row_sum = "+str(row_sum))

# multiplies over all rows (axis=0), i.e. one product for each column
col_prod = np.prod(arr, axis=0)
print("col_prod = "+str(col_prod))

# multiplies over all columns (axis=1), i.e. one product for each row
row_prod = np.prod(arr, axis=1)
print("row_prod = "+str(row_prod))

random number array:
[[ 9.28448923e-01  2.85962210e+00 -2.45073732e+00  1.33347433e+00
   4.31950678e+00  9.82334027e-01 -2.97495743e-01  9.14464039e-01]
 [-1.86923312e+00 -1.32014197e+00  1.81377284e+00  2.80493995e-01
  -9.89007138e-01 -1.22437967e+00  1.58067586e+00 -2.58631262e+00]
 [-1.03499593e+00  2.03755424e+00  1.18882847e+00  2.39997078e+00
  -1.29893034e+00 -6.30375524e-01  1.37171947e+00  1.78241202e+00]
 [-4.56039865e+00  5.33393912e-01  1.27150454e+00 -6.57348102e+00
   4.23563924e+00  2.99069247e+00 -1.96488104e+00 -4.80135112e-01]
 [ 2.37416551e+00 -2.89236851e+00 -5.47833901e-03  9.50131637e-01
  -4.60499477e+00  7.20476133e-02 -2.42346572e+00  1.64157417e+00]
 [-1.24271935e+00  2.81094520e-01  1.12459809e+00 -2.39983129e+00
   7.50737387e-01  1.53934025e+00  1.06593859e+00 -4.01712866e-01]]
col_sum = [-5.40473262  1.49915428  2.94248828 -4.00924156  2.41295116  3.72965916
 -0.66750858  0.87028963]
row_sum = [ 8.58961713 -4.31413183  5.81618319 -4.54766566 -4.88838842 
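
<font size="3">
Since the meaning of the <i>axis</i> argument is easy to mix up, here is a small sketch with a fixed 2x3 matrix (instead of random numbers) where the results can be checked by hand. With <i>axis=0</i> the operation runs over the rows and returns one value per column; with <i>axis=1</i> it runs over the columns and returns one value per row:
</font>

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0: one result for each of the 3 columns
print(np.sum(m, axis=0))   # [5 7 9]
print(np.prod(m, axis=0))  # [ 4 10 18]

# axis=1: one result for each of the 2 rows
print(np.sum(m, axis=1))   # [ 6 15]
print(np.prod(m, axis=1))  # [  6 120]
```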

<font size="3"><b>Of course, you can also do elementwise manipulations and matrix multiplications (that was one of the original purposes of numpy):</b></font>

In [67]:
arr = np.array([1,2,3,4])
print("arr:")
print(arr)

# e.g. multiply each element by a scalar value
print("result of arr * 3.141:")
print(arr * 3.141)

# compute the products of two arrays element-wise
arr2 = np.array([5,6,0,1])
print("arr2:")
print(arr2)
result = arr * arr2
print("result of arr * arr2:")
print(result)

# compute the dot or scalar product of two vectors
result = np.dot(arr, arr2)
print("result of arr (dot) arr2:")
print(result)

# more general, compute the product of two matrices
arr = np.array([[1,2],[5,6],[4,2]])
arr2 = np.array([[5,6,7],[2,3,2]])
print("arr:")
print(arr)
print("arr2:")
print(arr2)
result = np.matmul(arr, arr2)
print("result of matmul(arr,arr2):")
print(result)

arr:
[1 2 3 4]
result of arr * 3.141:
[ 3.141  6.282  9.423 12.564]
arr2:
[5 6 0 1]
result of arr * arr2:
[ 5 12  0  4]
result of arr (dot) arr2:
21
arr:
[[1 2]
 [5 6]
 [4 2]]
arr2:
[[5 6 7]
 [2 3 2]]
result of matmul(arr,arr2):
[[ 9 12 11]
 [37 48 47]
 [24 30 32]]
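
<font size="3">
As a short sketch of the shape rule behind the matrix product: multiplying an (N x K) matrix with a (K x M) matrix gives an (N x M) matrix, so the inner dimensions have to match. The example also uses the operator <i>@</i>, which is Python's built-in shorthand for matrix multiplication:
</font>

```python
import numpy as np

a = np.array([[1, 2], [5, 6], [4, 2]])   # shape (3, 2)
b = np.array([[5, 6, 7], [2, 3, 2]])     # shape (2, 3)

# (3, 2) times (2, 3) gives a (3, 3) matrix
product = a @ b
print(product.shape)  # (3, 3)

# swapping the order gives a different result: (2, 3) times (3, 2) is (2, 2)
swapped = b @ a
print(swapped.shape)  # (2, 2)
```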


<font size="3"><b>Now back to our CSV file: we can read in the file using <a href="https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html">numpy.genfromtxt</a></b></font>

In [25]:
# load data file and convert it to (named) numpy array
iris_data = np.genfromtxt("data/iris.csv", names=True, delimiter=",")
# output just the first 20 rows
print("type(iris_data) = "+str(type(iris_data)))
print("iris_data[:20] = "+str(iris_data[:20]))
# print the data types of the columns of the table
print("iris_data.dtype = \n"+str(iris_data.dtype))

type(iris_data) = <class 'numpy.ndarray'>
iris_data[:20] = [(5.1, 3.5, 1.4, 0.2, nan) (4.9, 3. , 1.4, 0.2, nan)
 (4.7, 3.2, 1.3, 0.2, nan) (4.6, 3.1, 1.5, 0.2, nan)
 (5. , 3.6, 1.4, 0.2, nan) (5.4, 3.9, 1.7, 0.4, nan)
 (4.6, 3.4, 1.4, 0.3, nan) (5. , 3.4, 1.5, 0.2, nan)
 (4.4, 2.9, 1.4, 0.2, nan) (4.9, 3.1, 1.5, 0.1, nan)
 (5.4, 3.7, 1.5, 0.2, nan) (4.8, 3.4, 1.6, 0.2, nan)
 (4.8, 3. , 1.4, 0.1, nan) (4.3, 3. , 1.1, 0.1, nan)
 (5.8, 4. , 1.2, 0.2, nan) (5.7, 4.4, 1.5, 0.4, nan)
 (5.4, 3.9, 1.3, 0.4, nan) (5.1, 3.5, 1.4, 0.3, nan)
 (5.7, 3.8, 1.7, 0.3, nan) (5.1, 3.8, 1.5, 0.3, nan)]
iris_data.dtype = 
[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('species', '<f8')]


<font size="3"><b>Obviously, the numpy function couldn't properly handle the last column: by default, all columns are read as floating-point numbers, so the species names end up as <i>NaN</i> (not a number). But we can correct that:</b></font>

In [None]:
# load data file again with defined data types and column names and convert it to (named) numpy array
# the species column is read as a text string of up to 10 characters ('U10'),
# a shorter type like 'S8' would cut off longer species names (e.g. "versicolor")
iris_dtype = [('sepal_length', '<f8'), ('sepal_width', '<f8'),
              ('petal_length', '<f8'), ('petal_width', '<f8'),
              ('species', 'U10')]
iris_data = np.genfromtxt("data/iris.csv", names=True, dtype=iris_dtype,
                          delimiter=",", encoding="utf-8")
print("iris_data[:20] = "+str(iris_data[:20]))
# print the data types of the columns of the table
print("iris_data.dtype = \n"+str(iris_data.dtype))

# access column data with column name
print("iris_data['sepal_length'] = \n"+str(iris_data['sepal_length'][:20]))

# work with the data
# compute mean of column
mean = np.mean(iris_data['sepal_length'])
# compute standard deviation of column
std = np.std(iris_data['sepal_length'], axis=0)
print("Mean of column sepal_length:" + str(mean))
print("Std of column sepal_length:" + str(std))

# round to n decimal places
print("Mean of column sepal_length:" + str(np.round(mean,3)))

# or just change the number output format
print("Mean of column sepal_length: {:.3f}".format(mean))
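
<font size="3">
As an alternative to listing all column types by hand, <i>numpy.genfromtxt</i> can also guess the type of each column individually when <i>dtype=None</i> is passed. The following sketch reads a few lines of CSV text from memory via <i>io.StringIO</i> instead of the file <i>data/iris.csv</i>, so that it runs on its own; the three data rows are only illustrative:
</font>

```python
import io

import numpy as np

# a small in-memory stand-in for the CSV file (illustrative rows)
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
"""

# dtype=None lets numpy guess the type of every column separately,
# encoding="utf-8" makes sure that text columns are returned as strings
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

print(data.dtype)
print(data["species"])
print(np.mean(data["sepal_length"]))
```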

<font size="3"><div class="alert alert-warning"><b>Exercise 3:</b> Write a function which loads a file (<b>in this case applied to <i>data/glass.csv</i></b>), prints the shape of the resulting array, prints the first N rows (adjustable via a function argument), computes and returns the means and standard deviations of the data columns, and optionally prints the results (adjustable via a parameter). <b>The result of numpy.genfromtxt is different in this case, since glass.csv does not contain mixed data types. You don't need to specify the column data types in this case.</b><br><br>
    <b>Hint:</b> <b>Use the numpy functions <a href="https://numpy.org/doc/stable/reference/generated/numpy.mean.html">np.mean</a> and <a href="https://numpy.org/doc/stable/reference/generated/numpy.std.html">np.std</a></b> to compute the mean and the standard deviation. The parameter axis allows you to specify along which dimension/axis you'd like to perform the computation, so you don't need to use a loop.<br><br>
If you use the keyword argument <b>names = True</b> in numpy.genfromtxt, a named array will be returned, where column data can be addressed via the column name defined in the first row of the file.
</div>
    
<b>Try it yourself here:</b></font>

### Example Solution:

In [None]:
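import numpy as np

def process_file(file, n_rows=10, print_result=True):
    """Load a CSV file and return the column means and standard deviations."""
    # skip the header row, since it only contains the column names
    glass_data = np.genfromtxt(file, delimiter=",", skip_header=1)
    # print the shape and the first n_rows rows of the array
    print("shape: "+str(glass_data.shape))
    print(glass_data[:n_rows, :])
    # axis=0 computes over all rows, i.e. one value for each column
    means = np.mean(glass_data, axis=0)
    stds = np.std(glass_data, axis=0)
    if print_result:
        print("means: "+str(means))
        print("stds: "+str(stds))
    return means, stds

means, stds = process_file("data/glass.csv", n_rows=10, print_result=True)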