# Scientific Python
## Central European University

## 01 Getting started, basic data types and control flow


Instructor: Márton Pósfai, TA: Luka Blagojevic

Email: posfaim@ceu.edu, Blagojevic_Luka@phd.ceu.edu

## Structure of the class

In most courses, you cover new materials during in-person classes, and you are asked to practice and do homework afterwards. Scientific Python this term will follow an opposite scheme: 

* We will meet for the first class normally. 
* Starting from the second week, you must work through the new material before the class by completing a Jupyter notebook that contains detailed explanations, videos, examples, and exercises. You can find an example notebook here (you need to install Jupyter to open the notebook). 
* After a short recap of the new material, the in-person classes are devoted to practice exercises and answering questions. By the end of the in-person sessions you have to solve and submit a final problem that contributes to your final grade. 

If you have any questions or get stuck on one of the exercises during the week, we will be available on [slack](http://www.personal.ceu.edu/staff/Marton_Posfai/slack_signup_forward.html). You are encouraged to answer each other's questions.

## Goal of the course

* Explore Python tools for data analysis
* More importantly: internalize the logic of programming with Python
* Prerequisite for other courses
* Diverse incoming coding experience -> narrow the gap
<img src='http://www.personal.ceu.edu/staff/Marton_Posfai/skill_goals.png' width=66%>

## How to get the most from the class?

* Arrive prepared for the week
* Try to solve the exercises before looking at the solution
* Ask questions
* Explore and show alternative solutions

## Assessment

### 1 Notebooks
* 50% of the final grade
* You have to upload a solution to a final problem at the end of each notebook
* Minimum requirement: upload an attempt at a solution
* Solutions will be graded

### 2 Final project
* 50% of the final grade
* Your project should perform a self-contained analysis of some empirical dataset
* Make use of Python tools that you newly learned
* More details about the requirements later
* The final deadline is the end of the term

### Extra credit
* Participate in discussions on slack at least five times --> +1% extra credit
* Challenges: Occasional harder problems

### University-wide Grade table
|Grade|Minimum score|Maximum score|
| :--- | :---: | :---: |
|A	|	96	|	100|
|A-	|	88	|	95|
|B+	|	80	|	87|
|B	|	71	|	79|
|B-	|	63	|	70|
|C+	|	58	|	62|

## Today's plan
* Why Python and Jupyter notebook?
* Using Jupyter Notebooks
* Basic data types
* Basic flow control


## Intro

### Short history of Python

* Started as a small hobby project of Guido van Rossum, who wrote the first interpreter over a Christmas holiday
* Aim: a scripting language with a minimal core, highly extensible through modules, and with "batteries included"
* Each module has its own small developer and support group
* Python is object-oriented and dynamically typed, and it manages memory automatically
* The focus is on readability rather than optimization
* Two main versions: 2 and 3. Many people use both. At the time of writing, the newest versions are 2.7 and 3.8. They are mostly compatible (2.7 evolved to be more compatible with 3), but there are some differences. On older systems the default is Python 2, although support for Python 2 ended on <a href="http://python3statement.org/">January 1, 2020</a>. We will use Python 3.
* Check which version you are running:

In [None]:
from platform import python_version
print(python_version())

If you see 3.x, you are good to go. If you see 2.x, create a new environment in Anaconda with Python 3.

### Why Python?

* High-level, easy to use (a lot gets done in the background that we don't have to care about)
* Cross-platform (although most data-centric work uses Linux or Mac)
* Popular (lot of support, lot of modules)

Our tools:
* **Anaconda**: cross-platform package and environment manager
* **python**: interpreter that takes code as input and runs it
* **IPython**: interactive Python, i.e., Python plus some user-friendly features (e.g., [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html))
* **Jupyter notebook**: combines code and formatted text, runs IPython

### Jupyter notebooks
* Traditional Python script file, e.g., `my_code.py`:
    * Text file that contains code + unformatted comments
    * You have to save the output and present it separately 
* Jupyter Notebook:
    * Contains code + output of the code + formatted text (equations, images, links, etc.)
    * Publish it together with your paper -> increase reproducibility
    * Exploratory data analysis, prototyping -> keep track of what you are doing

### Code

In [None]:
print("Hello!")

### Markdown
1. This is a <i>markdown</i> cell, you can change it in the menu
2. It is rich text, and it understands both HTML and Markdown markup
3. Find out more about markdown [ here](https://en.wikipedia.org/wiki/Markdown) and [ here](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).
4. You can also typeset equations like in wikipedia or latex. e.g. $\sqrt{x^2}=|x|$, or
$$\sum_{i=1}^\infty \frac{1}{2^i}=1$$
5. Double click to edit

### Help!
The most helpful IPython functions:

|Command|Description
|:---|:---
|?|Introduction and overview of IPython's features.
|%quickref|Quick reference.
|help|Python's own help system.
|object?|Details about the object

In [None]:
?print

In [None]:
%quickref

In [None]:
help(print)

In [None]:
c=2
c?

### Most useful way to get help:
* [google.com](https://lmgtfy.com/?q=why+doesn%27t+my+python+code+work)
* [stackoverflow.com](https://stackoverflow.com/questions/27156381/python-creating-a-2d-histogram-from-a-numpy-matrix)

## Variable types
|Type|Name|Example
|---|---|---
|int|Integer|30,-4
|float|Floating point|1.5,1e10
|str|String|"alma","c"
|bool|Boolean|True,False
|list|List|\[1,"alma",2\]
|dict|Dictionary|\{'course': "SciPy", 'teacher': "JT"\}

In [None]:
a = 7
b = 3.14
c = True
d = "alma"
e = [1, 2]
f = {'course': "SciPy", 'teacher': "JT", }
type(a),type(b),type(c),type(d),type(e),type(f)

## Integer numbers

How does division of integers work? Try out: division, integer division, remainder, rounding:

In [None]:
print(14 / 4)
print(14 // 4)
print(14 % 4)
print(round(14 / 4))
print(int(14 / 4))
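
The cell above uses positive numbers only. As a small sketch of the cases that tend to surprise (the variable names `q` and `r` are chosen here for illustration), the following shows `divmod`, the difference between `//` and `int()` for negative numbers, and how `round()` treats exact ties:

```python
# divmod returns the integer quotient and the remainder in one call
q, r = divmod(14, 4)
print(q, r)                     # 3 2
print(q * 4 + r == 14)          # True

# // rounds towards minus infinity, while int() truncates towards zero
print(-14 // 4)                 # -4
print(int(-14 / 4))             # -3

# round() resolves exact ties by going to the nearest even integer
print(round(2.5), round(3.5))   # 2 4
```

This tie rule is why `round(14 / 4)` above gives 4 rather than 3.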

The order of operations matters. Operator precedence, from highest to lowest:
1. ()
2. **
3. *, /, //, %
4. +, -

When in doubt, use parentheses.

In [None]:
print(1 + 2 * 3 + 2 / 2)
print(2 / 2 * 3)
print(2 / (2 * 3))

print()
print(2 / 2**3)
print( (2 / 2)**3 )
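
Two further cases are worth trying; they are not in the precedence list above, but they follow from Python's rules: `**` binds tighter than a unary minus on its left, and a chain of `**` is grouped from right to left, while `-` and `/` are grouped from left to right. A short sketch:

```python
print(-2**2)       # -4: evaluated as -(2**2)
print((-2)**2)     # 4
print(2**3**2)     # 512: evaluated as 2**(3**2)
print((2**3)**2)   # 64
print(7 - 4 - 2)   # 1: evaluated as (7 - 4) - 2
```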

## Floating point numbers
$$1.2345 = \underbrace{12345}_\text{significand} \times \underbrace{10}_\text{base}{\!}^{\overbrace{-4}^\text{exponent}}$$

In [None]:
import sys # this module allows us to extract information about the system you are using 
sys.float_info

What does this mean?
* *max*: maximum representable number
* *max_exp*, *max_10_exp*: maximum value of exponent in base 2 and base 10
* *dig*: maximum number of decimal digits that can be faithfully represented
* *epsilon*: difference between 1.0 and the least value greater than 1.0 that is representable as a float

<a href="https://docs.python.org/3/library/sys.html">Details in the documentation.</a>

Ways to write a literal float:

In [None]:
print(1) #integer
print(float(1), 1.) #floats
print(1.5)
print(1e4)
print(1e-4)
print(15e2)
print(1.5e3)

Now we can try out epsilon from `sys.float_info`: 

In [None]:
print(1 + 1e-10 - 1)
print(1 + 1e-16 - 1)  # too small to add

In [None]:
.2+.2+.2==.6
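
The comparison above evaluates to `False` because 0.2 and 0.6 have no exact binary representation, so rounding errors accumulate. A minimal sketch of the usual workaround, comparing with a tolerance via `math.isclose` from the standard library (the names `total` and `eps` are illustrative):

```python
import math
import sys

total = 0.2 + 0.2 + 0.2
print(total == 0.6)               # False: exact comparison fails
print(abs(total - 0.6))           # tiny, but not zero
print(math.isclose(total, 0.6))   # True: equal within a relative tolerance

# epsilon is the gap between 1.0 and the next representable float
eps = sys.float_info.epsilon
print(1.0 + eps > 1.0)            # True
print(1.0 + eps / 4 == 1.0)       # True: too small to change 1.0
```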

Note that the type of a variable may change at runtime:

In [None]:
a = 2
print(a,type(a))
a /= 2
print(a,type(a))

## Functions and modules

Most of the functionality in Python is provided by *modules*, such as access to the operating system, file I/O, string management, network communication, and much more. We will use some of these modules throughout the course.

<b>To use a module</b> in a Python program it first has <b>to be imported</b>. A module can be imported using the `import` statement. For example, to import the module `math`, which contains many standard mathematical functions, we can do:

In [None]:
import math

print(math.sqrt(4))
print(math.log10(1000))
print(math.log(math.e**5))
print(math.pi)
print(math.cos(math.pi * 0.5))
print(math.pow(3,10))

In [None]:
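from math import sqrt   # import a single name from the module
from math import *      # import every public name (may shadow existing names, use sparingly)
import math as ma       # import the module under a shorter alias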

print(sqrt(9),ma.pi)

## Boolean
It can only be True or False

In [None]:
a = True
b = (0 == 1)
c = (2**3 == 8.0)
d = bool(0)
e = bool(3.14)
print(a,b,c,d,e)

Comparison operators:

In [None]:
print(1 == 1) # equal
print(1 < 1) # less
print(1 <= 1) # less or equal
print(1 < 1 or 1 == 1) # less OR equal
print(1 != 1 and 1 == 1) # not equal AND equal
print((5 == 5)*1, (5 == 4)*2 + 3)

## Strings

Text inside quotation marks (you can use either `"` or `'`):

In [None]:
s1 = "apple"
print(s1)
s2= 'apple'
print(s2)
print(s1==s2)

### Escape sequences
When writing a literal string, we can include characters with special meaning using escape sequences. For example, `"` normally marks the end of the string; to include it inside the string, we write `\"`:

In [None]:
print("\tapple\n\"pear\"\\'")

Note:

In [None]:
print("'",'"',"\"",'\"')

List of escape sequences (some are antiquated, e.g., `\a`):

<table border align="center" style="border-collapse: collapse">
  <thead>
    <tr class="tableheader">
      <th align="left"><b>Escape Sequence</b>&nbsp;</th>
      <th align="left"><b>Meaning</b>&nbsp;</th>
    </thead>
  <tbody valign='baseline'>
    <tr><td align="left" valign="baseline"><code>&#92;<var>newline</var></code></td>
        <td align="left">Ignored</td>
    <tr><td align="left" valign="baseline"><code>&#92;&#92;</code></td>
        <td align="left">Backslash (<code>&#92;</code>)</td>
    <tr><td align="left" valign="baseline"><code>&#92;'</code></td>
        <td align="left">Single quote (<code>'</code>)</td>
    <tr><td align="left" valign="baseline"><code>&#92;"</code></td>
        <td align="left">Double quote (<code>"</code>)</td>
    <tr><td align="left" valign="baseline"><code>&#92;a</code></td>
        <td align="left">ASCII Bell (BEL)</td>
    <tr><td align="left" valign="baseline"><code>&#92;b</code></td>
        <td align="left">ASCII Backspace (BS)</td>
    <tr><td align="left" valign="baseline"><code>&#92;f</code></td>
        <td align="left">ASCII Formfeed (FF)</td>
    <tr><td align="left" valign="baseline"><code>&#92;n</code></td>
        <td align="left">ASCII Linefeed (LF)</td>
    <tr><td align="left" valign="baseline"><code>&#92;r</code></td>
        <td align="left">ASCII Carriage Return (CR)</td>
    <tr><td align="left" valign="baseline"><code>&#92;t</code></td>
        <td align="left">ASCII Horizontal Tab (TAB)</td>
    <tr><td align="left" valign="baseline"><code>&#92;v</code></td>
        <td align="left">ASCII Vertical Tab (VT)</td>
    <tr><td align="left" valign="baseline"><code>&#92;<var>ooo</var></code></td>
        <td align="left">ASCII character with octal value <i>ooo</i></td>
    <tr><td align="left" valign="baseline"><code>&#92;x<var>hh...</var></code></td>
        <td align="left">ASCII character with hex value <i>hh...</i></td></tbody>
</table>

Prefixing a string literal with `r` makes it a raw string, in which escape sequences are not interpreted:

In [None]:
print("a\bpple",r"a\bpple")
print(r'C:\some\name')

### Operators, conversions

Concatenating strings:

In [None]:
"app" + "le"

Convert strings to numbers and numbers to strings:

In [None]:
str(9), float("4.5"), int("4")

In [None]:
a = eval("1+1")
print(a)
b = a + 2
print(b)

There are several ways to include variables in formatted strings.

**Method 0**: We can convert to a string and concatenate:

In [None]:
N = 1500

print("The value of N (=" + str(N) + ") is greater than allowed (1000)")

This is more of a workaround and is not very flexible.

**Method 1**: We can use so-called printf-style formatting (borrowed from C's `printf()` function) using the `%` operator:

In [None]:
print("The value of N (=%d) is greater than allowed (1000)" % N)

Here `%d` means "insert a decimal integer here". Other common placeholders are `%f` (float) and `%s` (string). We can have multiple conversions in one string, and we can add further format specifications:

In [None]:
d = 3
c = 1
b = 7
N = b + c + d
print("Peter has %d dogs, %d cats and %d birds" % (d, c, b))
print("%3.10f%%, %20.1f%%, %3.f%% of the animals are dogs, cats and birds respectively." % \
     (100.0 * d / N, 100.0 * c / N, 100.0 * b / N))

In `%3.4f`, `3` gives the minimum field width and `.4` sets the precision. Try changing it!

Also, note that to include a literal `%` we have to escape it, i.e., write `%%`.
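
To make the effect of width and precision visible, here is a small sketch that wraps each formatted value in square brackets. The variable names are illustrative; the `-` and `0` flags are standard printf-style flags that the text above does not cover:

```python
value = 3.14159

right = "[%8.2f]" % value    # minimum width 8, 2 decimals, right-aligned
left = "[%-8.2f]" % value    # the '-' flag left-aligns within the field
zeros = "[%08.2f]" % value   # the '0' flag pads with zeros instead of spaces
share = "%d%%" % 50          # '%%' produces a literal percent sign

print(right)   # [    3.14]
print(left)    # [3.14    ]
print(zeros)   # [00003.14]
print(share)   # 50%
```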

For further details, check the <a href="https://docs.python.org/3/library/stdtypes.html#old-string-formatting">documentation</a>. This is the oldest method. It has trouble converting tuples and dictionaries (which we have not yet talked about); newer methods were introduced to overcome this.

**Method 2**: Using the `.format()` method

In [None]:
print("{:2.1f}%, {:2.1f}%, {:2.1f}% of the animals are dogs, cats and birds respectively."\
      .format(100.0 * d / N, 100.0 * c / N, 100.0 * b / N))

In [None]:
import math

"{1:.10f} {1:.4f} {0:.2f}".format(math.pi, 2*math.pi)

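Both forms can be used: empty `{}` fields are filled with the arguments in order, while numbered fields such as `{1}` refer to an argument by its position.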
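
A short sketch of the different ways to refer to arguments in `.format()`. The variable names and values are made up for illustration, and named fields are an additional option not shown in the cells above:

```python
x, y = 1.5, 2.3

auto = "{} {}".format(x, y)              # fields filled in order
numbered = "{1} {0} {1}".format(x, y)    # fields refer to positions, reuse allowed
named = "{width:.1f} x {height:.1f}".format(width=x, height=y)  # keyword fields

print(auto)       # 1.5 2.3
print(numbered)   # 2.3 1.5 2.3
print(named)      # 1.5 x 2.3
```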

**Method 3**: f-strings
Introduced in Python 3.6 (a good how-to is here: http://zetcode.com/python/fstring/). Add `f` to the start of a string and include variables and format specifiers inside `{}`:

In [None]:
a = "foo"
b = "bar"
c = 3.14159
print(f"{a} {b} {c} {2 * c} {c:.2f} }}")
print(rf"\n {a}") # we can mix raw and f-strings

More readable since variables are at the location you want to include them.
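
For comparison, this sketch builds the same message with all three methods and checks that the results agree (the variable `limit` and the exact wording of the message are illustrative):

```python
N = 1500
limit = 1000

old = "N = %d exceeds the limit (%d)" % (N, limit)        # Method 1: % operator
new = "N = {} exceeds the limit ({})".format(N, limit)    # Method 2: .format()
fstr = f"N = {N} exceeds the limit ({limit})"             # Method 3: f-string

print(fstr)                 # N = 1500 exceeds the limit (1000)
print(old == new == fstr)   # True
```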

### Encoding strings

An encoding is a mapping between characters and sequences of bytes.

In [None]:
s = 'apple'  # str: a sequence of Unicode characters
b = b'apple' # bytes: a sequence of raw bytes
print(type(s),type(b))

In [None]:
s = "körte"
b = b"körte"  # SyntaxError: bytes literals can only contain ASCII characters

In [None]:
s = "Körte"
print(len(s))
print(type(s))
print(type(s.encode("utf-8")))
print()

print(s.encode("utf-8"))
print(s.encode("latin-1"))
print(str(s.encode("utf-8"),"utf-8"))
print(str(s.encode("utf-8"),"latin-1"))


print()
import sys
print(sys.getsizeof("korte".encode('utf-8')),
      sys.getsizeof("körte"),sys.getsizeof("körte".encode('latin-1')))
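
The following sketch summarizes what the cell above demonstrates, and adds one case that is an assumption of this example rather than part of the original cell: encoding to ASCII, which cannot represent `ö`, and the `errors="replace"` option of `str.encode`:

```python
s = "Körte"

utf8_bytes = s.encode("utf-8")
latin1_bytes = s.encode("latin-1")
# 5 characters, but 'ö' takes two bytes in UTF-8 and one in Latin-1
print(len(s), len(utf8_bytes), len(latin1_bytes))   # 5 6 5

# decoding with the matching encoding restores the string
print(utf8_bytes.decode("utf-8") == s)   # True
# decoding with the wrong encoding silently produces garbage
print(utf8_bytes.decode("latin-1"))      # KÃ¶rte

# an encoding that lacks the character raises an error ...
try:
    s.encode("ascii")
except UnicodeEncodeError as err:
    print("Cannot encode:", err.reason)
# ... unless we ask for unencodable characters to be replaced
print(s.encode("ascii", errors="replace"))   # b'K?rte'
```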

In [None]:
"őzláb".encode("latin1")

### Slicing
Accessing characters and substrings.

In [None]:
s = "Hello world!"
print("Length of string:",len(s))
print(s[0],s[1],s[-1],s[-2],s[-12]) #accessing characters 
print(s[0:4],s[:4],s[:-1]) #slices
print(s[0:-1:2]) #You can also set the increment, e.g., skip every second character
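
The general form of a slice is `s[start:stop:step]`, where `start` is included, `stop` is excluded, and any of the three can be omitted. A few more cases to try, as a sketch using the same string:

```python
s = "Hello world!"

print(s[6:11])    # world: indices 6 to 10
print(s[-6:])     # world!: the last six characters
print(s[::2])     # Hlowrd: every second character
print(s[::-1])    # !dlrow olleH: a negative step reverses the string
print(s[100:])    # empty string: out-of-range slices do not raise an error
```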

### Useful string methods

Try changing arguments!

In [None]:
s = "Helló világ!\n"
print(s[6:].capitalize() + "X")
print(s.rstrip() + "X")
print(s.strip("\n! l"))
print("split():",s.split(),s.split("l"))
print(s.count("l"))
print(s.index("l"))
print("isdigit():","123".isdigit(),"1e3".isdigit())
print("isupper():", "a".isupper(),"A".isupper(), "1".isupper())
print(s.upper())

## Lists

In [None]:
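a = [7, 3, 8, 10, 7, 1, 9, 1, 5, "foo"]  # a list can mix types
print(a[0], a[0:4], a[-1])  # indexing and slicing work as for strings
print()

b = []  # empty list
print(b)
b.append("a")  # append adds one element to the end
b.append("a")

b.append(5)
b.append(a)  # a list can contain another list
print(b)
b.remove("a")  # removes the first occurrence only
print(b)
c = b.pop()  # removes and returns the last element
print('c', c)
print("Pop", b.pop(0))  # removes and returns the element at index 0
print(b)
del b[0]  # deletes the element at index 0
print(b)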

## Basic control flow

Quick recap of loops and conditional statements.

### The `if` statement

In [None]:
a= 16
if a>3:
    print('a larger than 3')
elif a<3:
    print('a smaller than 3')
else:
    print('a equals 3')

### The `while` statement

In [None]:
f = 1.0
v = 1.0
while f * v < 1e10:
    f *= v
    v += 1.0
print("The largest factorial less than 1e10 is: %d! = %g" % (v - 1, f))

### The `for` statement

In [None]:
for a in range(10):
    if a < 5:
        print(a)
    else:
        print(10-a)

A `for` statement loops over an iterable, i.e., an object that can return its elements one by one. For example, `range(10)` provides the integers from 0 to 9. Other iterables include strings and lists. More on this later!
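
As a brief sketch of looping over other iterables (the example values and the name `total` are illustrative):

```python
# a string yields its characters one by one
for ch in "alma":
    print(ch)

# a list yields its elements, whatever their type
for item in [1, "alma", 2.5]:
    print(item, type(item).__name__)

# range(start, stop, step): even numbers from 2 to 10
total = 0
for n in range(2, 11, 2):
    total += n
print(total)   # 30
```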

You can use loops to build lists. For example, this creates a list containing the squares of the integers from 0 to 4:

In [None]:
L = []
for x in range(5):
    L.append(x**2)
print(L)

## List comprehensions

List comprehensions are a concise and readable way to create lists. The above example with a list comprehension looks like:

In [None]:
L = [x**2 for x in range(5)]
print(L)

List comprehensions are similar to the mathematical formalism of defining sets: 
$$ L=\lbrace x^2 : x \in \lbrace 0, 1, 2, 3, 4\rbrace \rbrace.$$

Items can be included conditionally:

In [None]:
L = [x**2 for x in range(5) if x > 2]
print(L)

This leaves out elements that do not satisfy the criteria, and it is not the same as using an `if`-`else` statement in the body of the list comprehension, e.g.:

In [None]:
L = [x**2 if x > 2 else "<=2" for x in range(5)]
print(L)

You can also have nested `for` loops:

In [None]:
L = [(i,j,i+j) for i in range(3) for j in range(4)]
print(L)
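
In a comprehension with several `for` clauses, the leftmost `for` is the outermost loop. The sketch below writes out the equivalent nested loops and checks that both give the same list (the names `L_loop` and `L_comp` are illustrative):

```python
# explicit nested loops: i is the outer loop, j the inner loop
L_loop = []
for i in range(3):
    for j in range(4):
        L_loop.append((i, j, i + j))

# the same list as a comprehension
L_comp = [(i, j, i + j) for i in range(3) for j in range(4)]

print(L_loop == L_comp)   # True
print(len(L_comp))        # 12 = 3 * 4 combinations
```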

## The end

of the new material. The exercises are in a separate notebook.

