# Python Essentials
## DAT540 Introduction to Data Science
## University of Stavanger
### L02
### 28/08/2020

#### Antorweep Chakravorty (antorweep.chakravorty@uis.no)

## Python Language Basics
- **Indentation, not braces**
 - Python uses whitespace (tabs or spaces) to structure code
 - a colon **:** denotes the start of an indented code block
 - all statements that follow the colon must be indented by the same amount of whitespace until the end of the block
```python
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
```
- Python statements do not need to be terminated by a semicolon or any other marker
- All variables, numbers, strings, data structures, functions, classes and modules are referred to as objects
- Each object has a *type* (e.g.: string, function), which can be retrieved using *type(object_name)*
- Commenting a statement in python is done by using the pound **#** sign
- Related functionality in python is often grouped in a file with a *.py* extension. Such files are referred to as *modules* and can be imported and used by another file using the **import module_name** statement
```python
# This is a comment
a = 10
print(a) # printing "a"
```
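As a small illustration of the points above, the sketch below inspects the type of a few objects and imports a module. The standard-library `math` module and the `square` function are used only as convenient examples; neither appears in the original notes.

```python
import math  # a standard-library module, chosen only for illustration


def square(x):
    """Return x multiplied by itself (hypothetical example function)."""
    return x * x


# Numbers, strings, functions and modules are all objects with a type
type_of_number = type(10)
type_of_text = type('ten')
type_of_function = type(square)
type_of_module = type(math)

print(type_of_number)    # <class 'int'>
print(type_of_text)      # <class 'str'>
print(type_of_function)  # <class 'function'>
print(type_of_module)    # <class 'module'>
```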

- **Variables and argument passing**
 - When assigning a variable (or name) *"a"* in python, a reference to the object on the right-hand side of the equals sign is created
 - If *"a"* is now assigned to a new variable *"b"*, then *"b"* will refer to the same object rather than to a copy of the object that *"a"* refers to
<img src="object_ref.png" alt="either local or remote" width=150>

In [1]:
a = [1, 2, 3]
b = a
a.append(4)

```python
print(b)
```

In [2]:
print(b)

[1, 2, 3, 4]


- In order to create a copy of an object, we use the *copy* function from the *copy* module. It creates a shallow copy: a new container that holds references to the same elements

In [3]:
import copy
c = copy.copy(a)
a.append(7)
print('c:', c)
print('a:', a)

c: [1, 2, 3, 4]
a: [1, 2, 3, 4, 7]
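
Note that `copy.copy` copies only the outer container. The following sketch, using a hypothetical nested list, shows how this differs from `copy.deepcopy`, which also copies the nested objects.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)    # new outer list, inner lists are shared
deep = copy.deepcopy(nested)   # inner lists are copied as well

nested[0].append(99)           # mutate an inner list
nested.append([5, 6])          # mutate the outer list

print('nested: ', nested)      # [[1, 2, 99], [3, 4], [5, 6]]
print('shallow:', shallow)     # [[1, 2, 99], [3, 4]]
print('deep:   ', deep)        # [[1, 2], [3, 4]]
```

The shallow copy sees the change to the shared inner list, but not the element appended to the outer list.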


- Objects passed as arguments to a function are references to the original objects, so changes made to a mutable argument inside the function are visible to the caller

In [4]:
def append_an_element(my_list, element):
    my_list.append(element)
    print('my_list:', my_list)

data = [1, 2, 3]
append_an_element(data, 4)
print('data: ', data)

my_list: [1, 2, 3, 4]
data:  [1, 2, 3, 4]
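
A related detail is the difference between mutating an argument and rebinding its name inside the function. The sketch below uses two hypothetical functions, `mutate` and `rebind`, to show that only mutation affects the caller's object.

```python
def mutate(my_list):
    # Changes the object that the caller also refers to
    my_list.append('mutated')


def rebind(my_list):
    # Only the local name now points to a new list; the caller is unaffected
    my_list = ['rebound']
    return my_list


data = [1, 2, 3]
mutate(data)
result = rebind(data)

print('data:  ', data)    # [1, 2, 3, 'mutated']
print('result:', result)  # ['rebound']
```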


- **Objects**
 - Object references in python have no type associated with them
 - Therefore, the same variable can be used sequentially to refer to objects of different types

In [5]:
a = 5
print(type(a))
a = '5'
print(type(a))

<class 'int'>
<class 'str'>


 - The *type* of an object can be verified using the *isinstance* function
 - The *isinstance* function also accepts a tuple of types, to check whether the type of an object is among those present in the tuple

In [6]:
a = 5
print(isinstance(a, int))
print(isinstance(a, (int, float)))

True
True


 - Objects in python have both attributes and methods that can be accessed by following the object reference with a dot **.**
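
A minimal sketch of dot access, assuming a complex number and a string as example objects (both chosen only for illustration):

```python
text = 'data science'
number = 3 + 4j                  # a complex number

# Attributes are read without parentheses
real_part = number.real          # 3.0
imag_part = number.imag          # 4.0

# Methods are called with parentheses
title = text.title()             # 'Data Science'
conjugate = number.conjugate()   # (3-4j)

print(real_part, imag_part)
print(title, conjugate)
```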

- **Binary Operations**
<img src="binary_ops.png" alt="either local or remote" width=400>

In [7]:
a = 10
b = 3
print('Floor Divide:', a // b)
print('Raise a to the power of 2:', a ** 2)
print('Bitwise Ops.')
print('AND: ', a & b)
print('EXOR: ', a ^ b)

Floor Divide: 3
Raise a to the power of 2: 100
Bitwise Ops.
AND:  2
EXOR:  9


## Scalar Types
- single value objects such as numbers, strings, booleans, dates and times are referred to as scalar types
<img src="scalar_obj.png" alt="either local or remote" width=400>

- **Strings**
 - *string* type objects in python can be expressed using single or double quotes
 - multiline strings with line breaks can be represented using triple quotes, either ''' or """
 - strings are immutable
 - a string is a sequence of unicode characters
 - The backslash character **\\** is an escape character and can be used to specify special characters such as *\\n*
 - Strings with escape characters can be interpreted in their raw format by prefixing them with the **r** character
 ```python
s = r'\this has no\n \\special char\.'
```
 - string objects also have a *format* method that can be used to substitute formatted arguments into the string, producing a new string
```python
# {0:.2f} first argument is formatted as a float with two decimal places
# {1:s} second argument is formatted as a string
# {2:d} third argument is formatted as an int
```
 - To substitute arguments for these format parameters, we pass a sequence of arguments to the *format* method

In [8]:
template = '{0:.2f} {1:s} are worth US${2:d}'
template.format(4.5560, 'Argentine Pesos', 1)

'4.56 Argentine Pesos are worth US$1'
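
Two of the string properties listed above, immutability and raw strings, can be checked with a short sketch. The variable names are illustrative only.

```python
s = 'python'

# Strings are immutable: item assignment raises a TypeError
try:
    s[0] = 'P'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Methods such as replace return a new string instead
capitalized = s.replace('p', 'P')
print(s, capitalized)            # python Python

# In a raw string the backslash is an ordinary character
escaped = 'a\nb'                 # 'a', newline, 'b'
raw = r'a\nb'                    # 'a', backslash, 'n', 'b'
print(len(escaped), len(raw))    # 3 4
```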

- **Bytes, Unicode and Type casting**
 - Since python 3.0, Unicode is the default string type, which allows more consistent handling of ASCII and non-ASCII text
 - It is common to encounter *bytes* objects when working with files; their encoding scheme must be known in order to decode them correctly

In [9]:
val = 'språk'
print(val)
val_utf8 = val.encode('utf8')
print(val_utf8)
print(type(val_utf8))
print(val_utf8.decode())

språk
b'spr\xc3\xa5k'
<class 'bytes'>
språk


- str, bool, int, and float are also functions that can be used to cast values to those types
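
A short sketch of type casting with these functions; the values are chosen only for illustration.

```python
as_int = int('42')          # string to int
as_float = float('3.14')    # string to float
as_str = str(2.5)           # float to string
truncated = int(7.9)        # float to int truncates toward zero

# bool() is False for zero and empty values, True otherwise
flags = [bool(0), bool(''), bool('0'), bool([0])]

print(as_int, as_float, as_str, truncated)  # 42 3.14 2.5 7
print(flags)                                # [False, False, True, True]
```

Note that the non-empty string `'0'` casts to `True`, since only empty strings are considered false.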

- **Dates and Times**
 - The *datetime* module provides *datetime*, *date*, and *time* types
 - *datetime* combines both *date* and *time*
 - These objects are immutable

In [10]:
from datetime import datetime, date, time
dt = datetime(2019, 8, 23, 14, 15, 0)
print('day:', dt.day)
print('minute:', dt.minute)

day: 23
minute: 15


 - The *date* and *time* objects can be extracted from *datetime* as well

In [11]:
print('date: ', dt.date())
print(type(dt.date()))

date:  2019-08-23
<class 'datetime.date'>


 - **Datetime Formatting**
  - the *strftime* method formats a datetime as string
 
 Further details can be found at: [Python Datetime Format reference](http://strftime.org/)

In [12]:
print(dt.strftime('%d/%m/%Y %H:%M'))

23/08/2019 14:15


  - Strings can be converted (parsed) into datetime objects using the *strptime* method of *datetime*

In [13]:
print(datetime.strptime('01012019', '%d%m%Y'))
print(datetime.strptime('01/01/2019 10:30', '%d/%m/%Y %H:%M'))

2019-01-01 00:00:00
2019-01-01 10:30:00


 - **Datetime Operations**
  - The attributes of a datetime object can be changed using the *replace* method, which returns a modified copy of the object

In [14]:
print(dt)
print(dt.replace(day=21))

2019-08-23 14:15:00
2019-08-21 14:15:00


  - The difference of two datetime objects produces a *datetime.timedelta* object

In [15]:
dt2 = datetime(2000, 10, 11)
print(dt - dt2)

6890 days, 14:15:00
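
A *timedelta* can also be added back to a datetime. The sketch below reuses the two dates from the cells above; the seven-day shift is an arbitrary example.

```python
from datetime import datetime, timedelta

dt = datetime(2019, 8, 23, 14, 15, 0)
dt2 = datetime(2000, 10, 11)

delta = dt - dt2
print(delta.days, delta.seconds)   # 6890 51300

# Adding the difference to the earlier date gives back the later one
restored = dt2 + delta
print(restored == dt)              # True

# A timedelta can also be constructed directly
later = dt + timedelta(days=7, hours=2)
print(later)                       # 2019-08-30 16:15:00
```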


## Control Flow
- Control flow statements are used in python for conditional logic, loops and other standard control flow concepts
- **if, elif, and else**
```python
if x < 0:
    print('it is negative')
elif x == 0:
    print('it is zero')
else:
    print('it is positive')
```

  - Compound conditions are created using the **or** and **and** keywords
  - Such compound conditions are evaluated left to right and will short-circuit
  - In the following example the comparison *c > d* never gets evaluated because the first comparison was True
```python
a, b, c, d = 5,7,8,4
if a < b or c > d:
      print('Made it!')
```
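
One way to observe short-circuiting is to record which comparisons actually run. The helper `check` below is hypothetical and exists only for this demonstration.

```python
evaluated = []  # records which comparisons were actually evaluated


def check(label, result):
    """Hypothetical helper: log the label, then pass the result through."""
    evaluated.append(label)
    return result


a, b, c, d = 5, 7, 8, 4

# or: evaluation stops at the first True operand
or_result = check('a < b', a < b) or check('c > d', c > d)
or_evaluated = list(evaluated)
evaluated.clear()

# and: evaluation stops at the first False operand
and_result = check('a > b', a > b) and check('c > d', c > d)
and_evaluated = list(evaluated)

print(or_result, or_evaluated)     # True ['a < b']
print(and_result, and_evaluated)   # False ['a > b']
```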

- **for loops**
 - iterates over a collection or an iterator
```python
for value in collection:
      # do something with value
```
 - A *for loop* can be advanced to the next iteration, skipping the remainder of the block, using the **continue** keyword
 - A *for loop* can be exited altogether with the **break** keyword. The **break** keyword exits only the innermost *for loop*
 - If elements in the collection or iterator are sequences (tuples or lists), they can be *unpacked* into variables in the *for loop* statement itself
```python
for a, b, c in iterator:
      # do something
```
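
The three features above (**continue**, **break** and unpacking) can be combined in one loop. The `records` list is made-up example data.

```python
records = [('a', 1), ('b', None), ('c', 3), ('stop', 0), ('d', 4)]

total = 0
seen = []
for name, value in records:   # each tuple is unpacked into two variables
    if value is None:
        continue              # skip this record, move to the next one
    if name == 'stop':
        break                 # leave the loop; ('d', 4) is never visited
    seen.append(name)
    total += value

print(seen, total)            # ['a', 'c'] 4
```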

- **while loops**
 - a *while loop* specifies a condition and a block of code that is executed until the condition evaluates to *False* or the loop is explicitly ended with a *break* statement
```python
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2
```

- **pass**
 - *pass* is a *"no-op"* statement that is used in blocks where no action is to be taken. It serves as a placeholder for code not yet implemented, since python uses whitespace to delimit blocks and a block cannot be empty
- **range**
 - The *range* function returns a *range* object that lazily yields a sequence of evenly spaced integers
```python
range(10) # Range from 0 up to 10 (not including 10). Each number is incremented by one step
range(1, 10) # The start of the range is provided
range(0, 10, 2) # Each number is incremented by a step of two
range(10, 0, -2) # The step can be negative as well, to iterate in reverse
```
- **Ternary expressions**
 - A *ternary expression* combines an if-else block into a single expression
```python
# General form: value = true_expr if condition else false_expr
a, b = 21, 74
flag = 'Negative' if a - b < 0 else 'Positive'
```
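
Since a *range* produces its values lazily, converting it to a list is a simple way to inspect them. The sketch below materializes the four ranges shown earlier and uses a ternary expression inside a loop; the variable names are illustrative.

```python
up = list(range(10))
from_one = list(range(1, 10))
evens = list(range(0, 10, 2))
down = list(range(10, 0, -2))

print(up)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(from_one)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(evens)     # [0, 2, 4, 6, 8]
print(down)      # [10, 8, 6, 4, 2]

# A ternary expression picks one of two values per iteration
labels = []
for n in range(4):
    labels.append('even' if n % 2 == 0 else 'odd')
print(labels)    # ['even', 'odd', 'even', 'odd']
```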


## Data Structures and Sequences
- **Tuple**
 - A tuple is a fixed-length, immutable sequence of python objects
 - The most straightforward way to create a tuple is with a comma-separated sequence of values enclosed within parentheses **()**
 - Any sequence or iterator can be converted to a tuple by using the *tuple()* function
 - Elements can be accessed with square brackets [ ]
 - Tuples can also store other objects that may themselves be mutable
 - Tuples are lightweight and have few instance methods. **.count(*value*)** counts the number of occurrences of a value
  ```python
tup = (4, 5, 6)
tup2 = ( (4, 5, 6), (7, 8) )
tup3 = ( (1, 2, 3), [1, 'foo', False], 'bar')
```

In [16]:
tup3 = ( (1, 2, 3), [1, 'foo', False], 'bar')
tup3[1].append('123') # Mutable
print(tup3)

((1, 2, 3), [1, 'foo', False, '123'], 'bar')


 - Multiple tuples can be concatenated using the **+** sign producing a new tuple
 - Multiplying **\*** a tuple by an integer has the effect of concatenating together that many copies of the tuple

In [17]:
tup = (4, 5, 6)
tup2 = ( (4, 5, 6), (7, 8) )
print(tup * 3)
print(tup2 * 2)

(4, 5, 6, 4, 5, 6, 4, 5, 6)
((4, 5, 6), (7, 8), (4, 5, 6), (7, 8))


 - Tuples can be unpacked, with each value assigned to its own variable, if the structure of the variables on the left-hand side of the assignment matches the structure of the tuple
 - Unpacking is a convenient way of handling data in for loops
```python
a, b, c = (1, 2, 3)
a, b, (c, d) = (1, 2, (3, 4))
```

 - The **\*rest** syntax provides an advanced way of unpacking by allowing us to pluck a few elements from the beginning or the end of a tuple and collect the remaining ones in a list

In [18]:
tup3 = (False, 25, [1, 'alice', 29, 'f'])
flag, id, *data = tup3
print('id:', id, '|flag:', flag, '|data:', *data)

id: 25 |flag: False |data: [1, 'alice', 29, 'f']


In [19]:
tup4 = ([1, 'alice', 29, 'f'], False, 25)
*data, flag, id  = tup4
print('id:', id, '|flag:', flag, '|data:', *data)

id: 25 |flag: False |data: [1, 'alice', 29, 'f']
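
A few further uses of tuples and unpacking are sketched below with made-up data: counting values, swapping two variables, taking elements from both ends at once, and unpacking inside a for loop.

```python
tup = (1, 2, 2, 3, 2)
twos = tup.count(2)            # number of occurrences of the value 2

# Swapping two variables through tuple unpacking
a, b = 1, 2
a, b = b, a

# The starred name can also sit in the middle and collects what is left
first, *middle, last = tup

# Unpacking each element of a collection in a for loop
points = [(1, 2), (3, 4), (5, 6)]
sums = []
for x, y in points:
    sums.append(x + y)

print(twos)                    # 3
print(a, b)                    # 2 1
print(first, middle, last)     # 1 [2, 2, 3] 2
print(sums)                    # [3, 7, 11]
```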


- **List**
 - Mutable, variable length data structures
 - Allows in-place modification of content
 - Lists are defined using the square brackets **[ ]**
 - Elements can be appended to the end of the list with the **append(*value*)** method
 - The **insert(*index*, *value*)** method can be used to insert an element at a specific location in the list
 - The insertion index must be between 0 and the length of the list
 - Elements can be removed *by value* with the **remove(*value*)** method. Only the occurrence of the first *value* in the sequence is removed, if there are duplicates 
 - The **in** keyword is used to check if an element is present in a list: **element in list**
 - Similarly, the **not in** keyword can be used to check if an element is not present in a list
 - However, such checks are slower on lists than on *dicts* and *sets*, as a linear scan is performed on a list, whereas dicts and sets use hash tables to perform the check in constant time
```python
a_list = [1,2,3]
a_list.append('foo')
a_list.insert(1, False)
a_list.insert(0, 2)
a_list.remove(2)
print(2 in a_list)
```

 - Lists can also be concatenated using the **+** sign or duplicated using the **\*** sign
 - Alternatively the **extend(*object*)** method can be used to concatenate a list in place

In [20]:
a_list = [1,2,3]
a_list.extend(['a',0.0, 0 ,False])
a_list1 = [1,2,3]
a_list1.append(['a',0.0, 0 ,False])

In [21]:
print('extend:', len(a_list))
print('append:', len(a_list1))

extend: 7
append: 4


In [22]:
(1,2,'3')

(1, 2, '3')

 - **Sorting**
  - Lists can be sorted in place by calling the **sort()** method
  - *sort* also accepts a sort key: a function that produces the value used to sort the elements

In [23]:
a_list = ['one', 'two', 'three', 'four', 'five']
a_list.sort()
print('default', a_list)
a_list.sort(key=len)
print('with s. key', a_list)


default ['five', 'four', 'one', 'three', 'two']
with s. key ['one', 'two', 'five', 'four', 'three']
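
Sorting is stable, so elements with equal keys keep their original order. When ties should be broken explicitly, the key function can return a tuple. The sketch below assumes a differently ordered word list and uses `sorted` so that the original list is left untouched.

```python
words = ['two', 'one', 'four', 'five', 'three']

# Ties on length keep their original relative order (stable sort)
by_length = sorted(words, key=len)

# A tuple key sorts by length first, then alphabetically
by_length_then_alpha = sorted(words, key=lambda w: (len(w), w))

# reverse=True sorts in descending order and is also stable
longest_first = sorted(words, key=len, reverse=True)

print(by_length)             # ['two', 'one', 'four', 'five', 'three']
print(by_length_then_alpha)  # ['one', 'two', 'five', 'four', 'three']
print(longest_first)         # ['three', 'four', 'five', 'two', 'one']
```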


 - **Slicing**
  - Allows us to select sections of most sequence types by using the **slice** notation: *start*:*end* passed to the indexing operator. The element at *start* is included, the element at *end* is not
  - Also allows a section of a sequence to be updated
  - If either the *start* or *end* index is omitted, it defaults to the beginning or the end of the sequence, respectively
  - Negative indices slice the sequence relative to the end
<img src="slicing_eg.png" alt="either local or remote" width=320>

In [24]:
seq = [0,1,2,3,4,5,6,7]
seq[3:4] = ['a','b'] # assigning using slicing
print(seq, 'len: ', len(seq))
print(seq[1:4]) # slice
print(seq[:3])
print(seq[3:])
print(seq[-3:]) # negative slicing
print(seq[-6:-2])

[0, 1, 2, 'a', 'b', 4, 5, 6, 7] len:  9
[1, 2, 'a']
[0, 1, 2]
['a', 'b', 4, 5, 6, 7]
[5, 6, 7]
['a', 'b', 4, 5]


- **Built-in Sequence Functions**
 - **enumerate** allows tracking of index of current item when iterating over a sequence
 ```python
for i, value in enumerate(collection):
    # do something here
```
 - The **sorted** function returns a new sorted list from the elements of any sequence
 - Accepts same arguments as *sort* for lists
```python
sorted([8,5,1,8,3,-1,0])
```
 - **zip** *pairs* up the elements of a number of lists, tuples, or other sequences into tuples. It returns an iterator, which can be converted to a list of tuples
```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ('one', 'two', 'three')
seq3 = [1, 2, 3]
zipped = zip(seq1, seq2)
zipped = zip(seq1, seq2, seq3)
print(list(zipped))
```
 - How would you unzip a zipped record?
 - **reversed** iterates over the elements of a sequence in reverse order
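
One possible answer to the unzip question is to pass the zipped pairs back to **zip** with the **\*** operator. The sketch below also shows **enumerate** and **reversed** on the same example sequences.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ('one', 'two', 'three')

pairs = list(zip(seq1, seq2))

# "Unzipping": the * operator passes each pair as a separate argument
firsts, seconds = zip(*pairs)

indexed = list(enumerate(seq1))
backwards = list(reversed(seq1))

print(pairs)      # [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
print(firsts)     # ('foo', 'bar', 'baz')
print(seconds)    # ('one', 'two', 'three')
print(indexed)    # [(0, 'foo'), (1, 'bar'), (2, 'baz')]
print(backwards)  # ['baz', 'bar', 'foo']
```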

- **dict**, also called a *hash map*
 - flexibly sized collection of *key-value* pairs
 - uses *curly braces* **{ }** and *colon*-separated keys and values
 - access, insert and update are performed using the same syntax as for accessing elements in a list or tuple
 - accessing a key that is not found using the **[ ]** notation raises an exception. This can be avoided by using the **get(*key*)** method, which returns *None* if the key is not found
 - elements can be deleted using the **del** keyword or the **pop(*key*)** method. However, if the key is not found, either of them raises an exception
 - The methods **keys()** and **values()** return all the keys and values respectively
 - The *keys* of a dict have to be immutable objects such as scalar types or tuples (whose elements are also immutable). Since dicts are also known as *hash maps*, the keys need to be hashable. This can be verified by using the built-in **hash(value)** function

In [25]:
empty_dict = {}
a_dict = {'a': 'xyz', 'b': 2, 3:[1,2,3]}
a_dict[5] = False # insert new
a_dict[3][2] = 0 # update
print(a_dict['a']) # access
del a_dict['a'] # remove / delete
return_val = a_dict.pop(3) # pop returns the deleted value of the key
print(return_val)
print(a_dict.get('b'))
print(a_dict.get('xyz')) # Key not found

xyz
[1, 2, 0]
2
None
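
Both **get** and **pop** accept an optional default value that is returned when the key is missing. The sketch below shows this, together with a hypothetical helper `is_hashable` that uses **hash** to test whether a value could serve as a key.

```python
a_dict = {'a': 'xyz', 'b': 2}

missing = a_dict.get('zzz', 'n/a')   # default instead of None
popped = a_dict.pop('zzz', None)     # default prevents the KeyError

keys = list(a_dict.keys())
values = list(a_dict.values())


def is_hashable(value):
    """Hypothetical helper: True if value can be used as a dict key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(missing, popped)               # n/a None
print(keys, values)                  # ['a', 'b'] ['xyz', 2]
print(is_hashable('text'))           # True
print(is_hashable((1, 2)))           # True
print(is_hashable([1, 2]))           # False
print(is_hashable((1, [2])))         # False, the tuple contains a list
```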


 - Example: categorizing a list of words by their first letters as a dict of lists

In [26]:
words = ['apple', 'bat', 'bar', 'atom', 'book']

by_letter = {}

for word in words:
  letter = word[0]
  if letter not in by_letter:
    by_letter[letter] = [word]
  else:
    by_letter[letter].append(word)

print(by_letter)

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}


 - The same can be done using the **setdefault(*key*, *default*)** method of the dict data structure

In [27]:
words = ['apple', 'bat', 'bar', 'atom', 'book']

by_letter = {}

for word in words:
  letter = word[0]
  by_letter.setdefault(letter, []).append(word)

print(by_letter)

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}


- *set* is an unordered collection of unique elements
 - can be created using the *set function* or the *set literal* with curly braces **{ }**
```python
set([1,2,3,4])
{1,2,3,4}
```
 - sets support mathematical set operations like *union*, *intersection*, *difference*, and *symmetric difference*
 - both *methods* and *operators* are provided to perform these operations
<img src="set_opts.png" alt="either local or remote" width=500>
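
A short sketch of the four operations, using two example sets. Each operator has an equivalent method, shown in the comments.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

union = a | b            # same as a.union(b)
intersection = a & b     # same as a.intersection(b)
difference = a - b       # same as a.difference(b)
symmetric = a ^ b        # same as a.symmetric_difference(b)

# Duplicates are dropped when a set is built from a list
unique = set([1, 1, 2, 2, 3])

# sorted() is used for printing because sets are unordered
print(sorted(union))         # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))  # [3, 4]
print(sorted(difference))    # [1, 2]
print(sorted(symmetric))     # [1, 2, 5, 6]
print(sorted(unique))        # [1, 2, 3]
```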


- **Comprehensions**
 - Allow us to concisely form a new *List*, *Set* or *Dict* by filtering and transforming the elements of a collection
 - **[*expr* for val in collection if condition]**
 - Nested comprehensions are also used for flattening nested collections
 - Example: if we have a for loop that does the following:
 ```python
result = []
for val in collection:
      if condition:
          result.append(expr)
```
 - It can be written as a comprehension:

In [28]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']
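
Set and dict comprehensions follow the same pattern with curly braces. A minimal sketch on the same list of strings:

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Set comprehension: {expr for val in collection if condition}
lengths = {len(x) for x in strings}

# Dict comprehension: {key_expr: value_expr for val in collection if condition}
upper_by_word = {x: x.upper() for x in strings if len(x) > 2}

print(sorted(lengths))   # [1, 2, 3, 4, 6]
print(upper_by_word)
```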

  - Nested Comprehensions
  - E.g.: We have a multidimensional list of names. We want to get the list of names that contain at least 2 a's

In [29]:
all_names = [
  ['John', 'Emily', 'Michael', 'Mary', 'Steven'],
  ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']
]
names_of_interest = []
for names in all_names:
  for name in names:
    if name.count('a') >= 2:
      names_of_interest.append(name)
print(names_of_interest)

['Maria', 'Natalia']


 - How do we do it using *comprehensions*?

In [30]:
all_names = [
  ['John', 'Emily', 'Michael', 'Mary', 'Steven'],
  ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']
]
names_of_interest = [name for names in all_names for name in names if name.count('a') >= 2]
print(names_of_interest)

['Maria', 'Natalia']
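
The order of the *for* clauses in a nested comprehension matches the order of the equivalent nested loops. The sketch below, on a made-up list of tuples, contrasts flattening with a comprehension inside a comprehension, which keeps the nesting.

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Two for clauses in one comprehension: the result is a flat list
flattened = [x for tup in some_tuples for x in tup]

# A comprehension inside a comprehension: the nesting is preserved
doubled = [[x * 2 for x in tup] for tup in some_tuples]

print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(doubled)    # [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
```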


## Markdown and LaTeX for Code documentation
- Markdown is a popular markup language to create rich text documents using plaintext
- $LaTeX$ is another rich text documentation framework that also provides the ability to create expressive mathematical equations. LaTeX equations can be directly embedded into markdown, without the need to write or use LaTeX packages separately.

In [None]:
%%markdown
### Markdown Basics
- Ways to create **bold**, *italics* and ~~scratched~~ words and characters. No underline yet

In [32]:
%%markdown
### Lists and Sublists (unordered)
- This is a list
 - This is a sublist
  - This is a subsublist
- ...


In [1]:
%%markdown
### Lists and Sublists (ordered)
1. This is a list
  1. This is a sublist
  2. This is a subsublist
2. ...



In [34]:
%%markdown
### Quotes
> Twenty years from now you will be more disappointed by the things that you didn’t do than by the ones you did do. So throw off the bowlines. Sail away from the safe harbor. Catch the trade winds in your sails
>
> [Mark Twain](https://en.wikipedia.org/wiki/Mark_Twain)


In [35]:
%%markdown
### Horizontal lines
---


In [36]:
%%markdown
### Tables
| Name | Age | Gender |
|------|:---:|:-------|
|a     |30   |m       |
|b     |31   |f       |
|c     |26   |f       |


In [37]:
%%markdown
### Heading
# H1
## H2
### H3
#### H4
##### H5
###### H6


In [38]:
%%markdown
### Embed Code
- Python Code:
```python
x = 10
print(x)
```
- Javascript:
```javascript
console.log("Hello World")
```


In [39]:
%%markdown
### Embedded $LaTeX$
- Inline expressions can be added by surrounding the latex code with \\$: $e^{i \pi} + 1 = 0$
- Expressions on their own line are surrounded by \\$\\$:
$$e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$$


In [2]:
%%markdown
### Local or Remote file rendering (images, audio, video)
- Image
<img src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png" alt="either local or remote" width=100>

- Video 
<video controls src="../filesystem/file.mp4" width=200></video>

- Audio 

<audio controls src="../filesystem/file.mp3"></audio>



In [41]:
%%markdown
### HTML
- Because Markdown is a superset of HTML, you can even add any HTML tags:
- Header:
<h2>H2</h2>
- Table
<table>
<tr><td>Name</td><td>Age</td><td>Gender</td></tr>
<tr><td>a</td><td>30</td><td>m</td></tr>
<tr><td>b</td><td>31</td><td>f</td></tr>
<tr><td>c</td><td>26</td><td>f</td></tr>
</table>


> Serves jupyter notebooks as slides
>
> From local machine:
>
> jupyter nbconvert L02.ipynb --to slides --post serve
>
> From container:
>
> jupyter nbconvert L02.ipynb --to slides --ServePostProcessor.ip="0.0.0.0" --post serve
>
> Can be accessed at: http://localhost:8000/L02.slides.html
