# Python For Data Science

Felix Biessmann

Lecture 2: Variables, Types, Operators and Data Structures

# Python Variables

- are assigned with ```=```
- are *dynamically typed* (have no static type)
- are pointers


## Python Variables are assigned with ```=```

In [1]:
# assign 4 to the variable x
x = 4 
print(x)

4


## Python Variables are 'dynamically typed'

In [2]:
x = 1         # x is an integer
print(x)
x = 'hello'   # now x is a string
print(x)
x = [1, 2, 3] # now x is a list
print(x)

1
hello
[1, 2, 3]


## Dynamic Typing: Caveats
- Type only known at runtime
- Can result in lots of [duck typing](https://en.wikipedia.org/wiki/Duck_typing) - or errors

<center>
<img src="figures/duck_typing.jpg">
</center>


<center>
"If it walks like a duck and it quacks like a duck, then it must be a duck"
</center>




### Duck Typing



- 'Normal' (that is, *static*) typing: a variable is declared to be of a certain type
```
int c = 0;
```

- Python does not have variables with static types (but see [mypy](http://mypy-lang.org/examples.html))
- The **Duck Test** asks whether an object supports the operations needed for a purpose, instead of checking its declared type




In [3]:
# a function for multiplying numbers by two:
def multiply_by_two(x):
    return x * 2

x = 2 # x is an integer
print(multiply_by_two(x))



4


In [4]:
x = "2" # x is a string
print(multiply_by_two(x))

22
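
The duck test can also pass in surprising ways, or fail only at runtime. A minimal sketch, reusing `multiply_by_two` from above with two further argument types chosen purely for illustration:

```python
def multiply_by_two(x):
    return x * 2

# lists support *, so the call succeeds, but it repeats the list
doubled_list = multiply_by_two([1, 2])
print(doubled_list)

# None does not support *, so the problem only shows up when the line runs
try:
    multiply_by_two(None)
except TypeError as error:
    print("TypeError:", error)
```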


## How to check types by introspection

In [5]:
x = 1         # x is an integer
type(x)

int

In [6]:
x = "1"         # x is a string
type(x)

str

In [7]:
x = [1]         # x is a list
type(x)

list

In [8]:
x = [1]         # x is a list
issubclass(type(x), list)

True
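
As a side note, the built-in `isinstance` is the more common way to write the check above. A short sketch (the variable names are only for illustration):

```python
x = [1]

# same result as issubclass(type(x), list)
is_list = isinstance(x, list)

# a tuple of types checks against several types at once
is_list_or_tuple = isinstance(x, (list, tuple))

# subclasses count as well: bool is a subclass of int
bool_is_int = isinstance(True, int)

print(is_list, is_list_or_tuple, bool_is_int)
```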

# Python Scalar Types

| Type        | Example        | Description                                                  |
|-------------|----------------|--------------------------------------------------------------|
| ``int``     | ``x = 1``      | integers (i.e., whole numbers)                               |
| ``float``   | ``x = 1.0``    | floating-point numbers (i.e., real numbers)                  |
| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part) |
| ``bool``    | ``x = True``   | Boolean: True/False values                                   |
| ``str``     | ``x = 'abc'``  | String: characters or text                                   |
| ``NoneType``| ``x = None``   | Special object indicating nulls                              |


## Integers

In [9]:
x = 1
type(x)

int

In [10]:
# true division (/) of two ints returns a float
x / 2

0.5

## Floats

In [11]:
x = 1.
type(x)

float

In [12]:
# explicit cast
x = float(1)
type(x)

float

In [13]:
# equality checks between floats and ints actually work
x == 1

True
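
Equality checks work as long as the values are exactly representable. For results of float arithmetic, rounding errors can break exact comparisons. A small sketch of the usual workaround with `math.isclose` from the standard library:

```python
import math

total = 0.1 + 0.2

# exact comparison fails because of binary rounding
print(total == 0.3)

# comparison with a relative tolerance succeeds
print(math.isclose(total, 0.3))
```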

### Exponential notation

``e`` or ``E`` can be read "...times ten to the...",
so that ``1.4e6`` is interpreted as $~1.4 \times 10^6$


In [14]:
x = 1400000.00
y = 1.4e6
print(x == y)

True


## None Type


In [15]:
return_value = print('abc')
type(return_value)

abc


NoneType

## Boolean Type

- ```True``` and ```False```
- Case sensitive!

In [16]:
result = (4 < 5)
result

True

In [17]:
type(result)

bool

### Many types are implicitly cast to booleans:

### Numbers

In [18]:
bool(2014)

True

In [19]:
bool(0)

False

### None Types (or any other Type)

In [20]:
bool(None)

False

### Strings

In [21]:
bool("")

False

In [22]:
bool("abc")

True

### Lists

In [23]:
bool([1, 2, 3])

True

In [24]:
bool([])

False
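
The same implicit casts happen in `if` statements, which is why empty containers are often tested directly. A minimal sketch; `describe` is a hypothetical helper written only for this example:

```python
def describe(items):
    # hypothetical helper: relies on empty containers being falsy
    if items:
        return "{} item(s)".format(len(items))
    return "empty"

print(describe([1, 2, 3]))
print(describe([]))
print(describe(""))
```

Note that `0` and `None` are both falsy, so `if value:` is not the same test as `if value is not None:`.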

## Python variables are pointers

In [25]:
x = [1, 2, 3]
y = x
print(x)
print(y)

[1, 2, 3]
[1, 2, 3]


In [26]:
# let's change the original variable x
x.append(4)
# now let's inspect y
print(y)

[1, 2, 3, 4]


In [27]:
# however:
x = "Something entirely different"
print(y)

[1, 2, 3, 4]
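
If two independent lists are needed, one option is to copy the list instead of assigning it. A short sketch (the names `alias` and `shallow_copy` are chosen for illustration):

```python
x = [1, 2, 3]
alias = x                # second name for the same list object
shallow_copy = x.copy()  # new list object with the same elements

x.append(4)

print(alias)         # follows the change, it is the same object
print(shallow_copy)  # unaffected by the change
print(alias is x, shallow_copy is x)
```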


## Python variables are objects
- Objects (in all object-oriented languages) have:
 - Attributes / Fields
 - Functions / Methods
- For simple types (``int``, ``str``, ...), many methods are accessible through **Operators**

# Operators
- Arithmetic Operators
- Bitwise Operators
- Assignment Operators
- Comparison Operators
- Boolean Operators
- Membership Operators


## Arithmetic Operations

| Operator     | Name           | Description                                            |
|--------------|----------------|--------------------------------------------------------|
| ``a + b``    | Addition       | Sum of ``a`` and ``b``                                 |
| ``a - b``    | Subtraction    | Difference of ``a`` and ``b``                          |
| ``a * b``    | Multiplication | Product of ``a`` and ``b``                             |
| ``a / b``    | True division  | Quotient of ``a`` and ``b``                            |
| ``a // b``   | Floor division | Quotient of ``a`` and ``b``, removing fractional parts |
| ``a % b``    | Modulus        | Integer remainder after division of ``a`` by ``b``     |
| ``a ** b``   | Exponentiation | ``a`` raised to the power of ``b``                     |
| ``-a``       | Negation       | The negative of ``a``                                  |
| ``+a``       | Unary plus     | ``a`` unchanged (rarely used)                          |


In [28]:
a = 1
b = 1
a + b

2

## Bitwise Operations

| Operator     | Name            | Description                                 |
|--------------|-----------------|---------------------------------------------|
| ``a & b``    | Bitwise AND     | Bits defined in both ``a`` and ``b``        |
| <code>a &#124; b</code>| Bitwise OR      | Bits defined in ``a`` or ``b`` or both      |
| ``a ^ b``    | Bitwise XOR     | Bits defined in ``a`` or ``b`` but not both |
| ``a << b``   | Bit shift left  | Shift bits of ``a`` left by ``b`` units     |
| ``a >> b``   | Bit shift right | Shift bits of ``a`` right by ``b`` units    |
| ``~a``       | Bitwise NOT     | Bitwise negation of ``a``                          |

In [29]:
a = True
b = False
a & b

False
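
Bitwise operators are easier to follow on integers written in binary. A small sketch with two arbitrarily chosen values:

```python
a = 0b1100  # 12
b = 0b1010  # 10

print(bin(a & b))  # 0b1000
print(bin(a | b))  # 0b1110
print(bin(a ^ b))  # 0b110
print(a << 2)      # 48
print(a >> 2)      # 3
print(~a)          # -13
```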

## Assignment Operations

| $~$ | $~$ | $~$ | $~$ |
|-----|-----|-----|-----|
| ``a += b`` | ``a -= b`` | ``a *= b`` | ``a /= b`` |
| ``a //= b`` | ``a %= b`` | ``a **= b`` | ``a &= b`` |
| <code>a &#124;= b</code> | ``a ^= b`` | ``a <<= b`` | ``a >>= b`` |

In [30]:
a = 2
a += 2  # equivalent to a = a + 2
print(a)

4


## Boolean Operations

| Operator      | Description                                       |
|---------------|---------------------------------------------------|
| ``a and b``   | True if both ``a`` and ``b`` are true             |
| ``a or b``    | True if ``a`` or ``b`` (or both) is true          |
| ``not a``     | True if ``a`` is false                            |

In [31]:
True and False

False

In [32]:
[True, True] or [False, True]

[True, True]

In [33]:
[True, True] and [False, 2]

[False, 2]
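
As the two list examples show, `and` and `or` return one of their operands rather than a strict boolean, and they stop evaluating as soon as the result is known. A minimal sketch of how this is commonly used; `greet` is a hypothetical helper:

```python
def greet(name=None):
    # hypothetical helper: `or` returns the first truthy operand
    display_name = name or "stranger"
    return "Hello, " + display_name

print(greet("Ada"))
print(greet())

# short-circuit: the division is never evaluated because 0 is falsy
print(0 and 1 / 0)
```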

## Comparison Operations

| Operation     | Description                       | Operation     | Description                          |
|---------------|-----------------------------------|---------------|--------------------------------------|
| ``a == b``    | ``a`` equal to ``b``              | ``a != b``    | ``a`` not equal to ``b``             |
| ``a < b``     | ``a`` less than ``b``             | ``a > b``     | ``a`` greater than ``b``             |
| ``a <= b``    | ``a`` less than or equal to ``b`` | ``a >= b``    | ``a`` greater than or equal to ``b`` |

In [34]:
# 25 is odd
25 % 2 == 1

True

In [35]:
# check if a is between 15 and 30
a = 25
15 < a < 30

True

In [36]:
# comparisons on standard collections are not element-wise
[1,3] == [2,2]

False

In [37]:
# but
a = [1,2]
b = a
a == b

True

## Identity and Membership Operators

| Operator      | Description                                       |
|---------------|---------------------------------------------------|
| ``a is b``    | True if ``a`` and ``b`` are identical objects     |
| ``a is not b``| True if ``a`` and ``b`` are not identical objects |
| ``a in b``    | True if ``a`` is a member of ``b``                |
| ``a not in b``| True if ``a`` is not a member of ``b``            |

In [38]:
1 in [1,2,3]

True
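
The difference between identity (`is`) and equality (`==`) can be sketched with two lists that have the same content but are separate objects:

```python
a = [1, 2]
b = [1, 2]

print(a == b)  # True: same content
print(a is b)  # False: two different objects

# `is` is the usual way to test for None
value = None
print(value is None)
```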

## Strings

- Python is great for Strings
- String encoding is a good reason to not use Python 2
- We'll do a quick recap of regexps

### Some Useful String Functions

In [39]:
message = "The answer is "
answer = '42'

In [40]:
# length of string
len(answer)

2

In [41]:
# Make upper-case. See also str.lower()
message.upper()

'THE ANSWER IS '

In [42]:
# concatenation 
message + answer

'The answer is 42'

In [43]:
# multiplication
answer * 3

'424242'

In [44]:
# Accessing individual characters (zero-based indexing)
message[0]

'T'

In [45]:
# multiline strings
multiline_string = """
Computers are useless. 
They can only give you answers.
"""

In [46]:
# stripping off unnecessary blanks (including \n or \t)
line = '         this is the content         '
line.strip()

'this is the content'

In [47]:
# finding substrings
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [48]:
line.find('bear')

-1

In [49]:
# simple replacements
line.replace('brown', 'red')

'the quick red fox jumped over a lazy dog'

In [50]:
# splitting a sentence into words
line.split()

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

In [51]:
# joining them back together
'--'.join(line.split())

'the--quick--brown--fox--jumped--over--a--lazy--dog'

### String Formatting

In [52]:
x = 3.14159
"This is a bad approximation of pi: " + str(x)

'This is a bad approximation of pi: 3.14159'

In [53]:
"This is a bad approximation of pi: {}".format(x)

'This is a bad approximation of pi: 3.14159'

In [54]:
y = 3
"This is a bad approximation of pi: {} but better than {}".format(x, y)

'This is a bad approximation of pi: 3.14159 but better than 3'

In [55]:
"This is a bad approximation of pi: {1} but better than {0}".format(y, x)

'This is a bad approximation of pi: 3.14159 but better than 3'

In [56]:
"""This is a bad approximation of pi: {bad} but 
   better than {worse}""".format(bad=x, worse=y)

'This is a bad approximation of pi: 3.14159 but \n   better than 3'

In [57]:
"Both of these approximations of pi are bad: {0:.3f} and {1}".format(x,y)

'Both of these approximations of pi are bad: 3.142 and 3'

### Since Python 3.6: f-String interpolation

In [58]:
width = 5
precision = 3
f'Bad approximation, nice f-string interpolation formatting: {x:{width}.{precision}}'

'Bad approximation, nice f-string interpolation formatting:  3.14'
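
A few more f-string patterns, as a short sketch using the same format specifications as `str.format` (variable names are illustrative):

```python
x = 3.14159
y = 3

fixed = f"{x:.3f}"           # fixed-point notation with three decimals
padded = f"{y:>5}"           # right-aligned in a field of width 5
expression = f"{x * 2:.2f}"  # arbitrary expressions are allowed

print(fixed)
print(padded)
print(expression)
```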

### Regular Expressions Recap

In [59]:
import re
line = 'the quick brown fox jumped over a lazy dog'
regex = re.compile('fox')
match = regex.search(line)
match.start()

16

In [60]:
regex.sub('BEAR', line)

'the quick brown BEAR jumped over a lazy dog'

### Some Special Regexp Characters

| Character | Description                 | Character | Description                     |
|-----------|-----------------------------|-----------|---------------------------------|
| ``"\d"``  | Match any digit             | ``"\D"``  | Match any non-digit             |
| ``"\s"``  | Match any whitespace        | ``"\S"``  | Match any non-whitespace        |
| ``"\w"``  | Match any alphanumeric char or ``_`` | ``"\W"``  | Match any char not matched by ``"\w"`` |

There are many more special regexp characters; for more details, see Python's [regular expression syntax documentation](https://docs.python.org/3/library/re.html#re-syntax).

In [61]:
regex = re.compile(r'\w\s\w')
regex.findall('the fox is 9 years old')

['e f', 'x i', 's 9', 's o']

### Finding Any Character in Set

If the special symbols are not enough, you can define your own character sets

In [62]:
regex = re.compile('[aeiou]')
regex.split('consequential')

['c', 'ns', 'q', '', 'nt', '', 'l']

In [63]:
regex = re.compile('[A-Z][0-9]')
regex.findall('1043879, G2, H6')

['G2', 'H6']

In [64]:
regex = re.compile('[A-Z][A-Z][0-9]')
regex.findall('1043879, G2, H6 AH9')

['AH9']

### Wildcards

| Character | Description | Example |
|-----------|-------------|---------|
| ``?`` | Match zero or one repetitions of preceding  | ``"ab?"`` matches ``"a"`` or ``"ab"`` |
| ``*`` | Match zero or more repetitions of preceding | ``"ab*"`` matches ``"a"``, ``"ab"``, ``"abb"``, ``"abbb"``... |
| ``+`` | Match one or more repetitions of preceding  | ``"ab+"`` matches ``"ab"``, ``"abb"``, ``"abbb"``... but not ``"a"`` |
| ``{n}`` | Match ``n`` repetitions of preceding | ``"ab{2}"`` matches ``"abb"`` |
| ``{m,n}`` | Match between ``m`` and ``n`` repetitions of preceding | ``"ab{2,3}"`` matches ``"abb"`` or ``"abbb"`` |


In [65]:
regex = re.compile('[A-Z][A-Z][0-9]')
regex.findall('1043879, G2, H6 AH9')

['AH9']

In [66]:
regex = re.compile('[A-Z]{2}[0-9]')
regex.findall('1043879, G2, H6 AH9')

['AH9']

In [67]:
regex = re.compile('[A-Z]+[0-9]')
regex.findall('1043879, G2, H6 AH9')

['G2', 'H6', 'AH9']

In [68]:
regex = re.compile('[A-Z]*[0-9]')
regex.findall('1043879, G2, H6 AH9')

['1', '0', '4', '3', '8', '7', '9', 'G2', 'H6', 'AH9']
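
The remaining entries of the table, `?` and `{m,n}`, can be sketched in the same way. The patterns and test strings below are made up for illustration:

```python
import re

# ? makes the preceding character optional
optional = re.compile(r'colou?r')
optional_matches = optional.findall('color colour colouur')
print(optional_matches)

# {2,3} matches two or three repetitions, as many as possible
bounded = re.compile(r'[0-9]{2,3}')
bounded_matches = bounded.findall('7 42 123 12345')
print(bounded_matches)
```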

### Example: Matching E-Mail Addresses

In [69]:
email = re.compile(r'\w+@\w+\.[a-z]{3}')
text = "To email me, try user1214@python.org or hans@google.com."
email.findall(text)

['user1214@python.org', 'hans@google.com']

In [70]:
email.findall('barack.obama@whitehouse.gov')

['obama@whitehouse.gov']

In [71]:
email2 = re.compile(r'[\w.]+@\w+\.[a-z]{3}')
email2.findall('barack.obama@whitehouse.gov')

['barack.obama@whitehouse.gov']

### Matching Groups
Often it can be helpful to extract groups of matched substrings

In [73]:
email3 = re.compile(r'([\w.]+)@(\w+)\.([a-z]{3})')
email3.findall(text)

[('user1214', 'python', 'org'), ('hans', 'google', 'com')]

### Matching Named Groups
For programmatic treatment of groups, naming them can be useful

In [74]:
email4 = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)\.(?P<suffix>[a-z]{3})')
match = email4.match('guido@python.org')
match.groupdict()

{'user': 'guido', 'domain': 'python', 'suffix': 'org'}
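
Individual groups can also be read by name or by position, and `match` returns `None` when the pattern does not fit. A small sketch reusing the pattern from above:

```python
import re

email4 = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)\.(?P<suffix>[a-z]{3})')

match = email4.match('guido@python.org')
print(match.group('user'))  # access by name
print(match.group(2))       # access by position (1-based)

# no match: check for None before calling methods on the result
no_match = email4.match('not an email')
print(no_match is None)
```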

# Data Structures

## Builtin Python Data Structures

| Type Name | Example                   |Description                            |
|-----------|---------------------------|---------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                    |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection          |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | (key,value) mapping (insertion-ordered since Python 3.7) |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values |


## Lists

- Ordered, indexable
- zero-based indexing
- Mutable
- Defined by ``[1, 2, 3]`` 


### List Indexing - Accessing Single Elements

In [75]:
L = [2, 3, 5, 7, 11]
L[0]

2

In [76]:
L[1]

3

In [77]:
L[-1]

11

In [78]:
L[-2]

7

### List Slicing - Accessing Multiple Elements

In [79]:
L[0:3]

[2, 3, 5]

In [80]:
L[:3]

[2, 3, 5]

In [81]:
L[-3:]

[5, 7, 11]

In [82]:
L[-3:-1]

[5, 7]

In [83]:
L[::2]  # equivalent to L[0:len(L):2]

[2, 5, 11]

In [84]:
L[::-1] # reverses a list

[11, 7, 5, 3, 2]

### List Indexing and Slicing for Accessing and Assigning Elements

In [85]:
L[0] = 100
L

[100, 3, 5, 7, 11]

In [86]:
L[1:3] = [55, 56]
L

[100, 55, 56, 7, 11]
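
Slice assignment does not have to preserve the length of the list, and `del` removes elements. A short sketch, continuing with the list from above:

```python
L = [100, 55, 56, 7, 11]

L[1:3] = [1, 2, 3]  # replace two elements with three
print(L)            # [100, 1, 2, 3, 7, 11]

del L[0]            # remove a single element
print(L)            # [1, 2, 3, 7, 11]

L[1:3] = []         # assigning an empty list removes the slice
print(L)            # [1, 7, 11]
```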

## Lists: Complexity of Common Operations


| Operation      | Example           | Complexity  |
|----------------|-------------------|-------------|
| Access         | ``l[i]``          | O(1)        |
| Change element | ``l[i] = 0``      | O(1)        |
| Slice          | ``l[a:b]``        | O(b-a)      |
| Extend         | ``l.extend(...)`` | O(len(...)) |
| Check ==, !=   | ``l1 == l2``      | O(N)        |
| Insert         | ``l[a:b] = ...``  | O(N)        |
| Delete         | ``del l[i]``      | O(N)        |
| Membership     | ``x in/not in l`` | O(N)        |
| Extreme value  | ``min(l)/max(l)`` | O(N)        |
| Multiply       | ``k*l``           | O(k N)      |

[Source](https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt)

## Tuples
- Similar to lists
- Immutable
- Defined by ``(1, 2, 3)`` or ``1, 2, 3`` 

In [87]:
t = (1, 2, 3)
t

(1, 2, 3)

In [88]:
t = 1, 2, 3
t

(1, 2, 3)

In [89]:
len(t)

3

### Elements cannot be changed

```python
t[0] = 5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-86-6dd06f73cec4> in <module>()
----> 1 t[0] = 5

TypeError: 'tuple' object does not support item assignment
```

### Return types of functions are often tuples


In [90]:
x = 0.125
x.as_integer_ratio()

(1, 8)

In [91]:
numerator, denominator = x.as_integer_ratio()
numerator / denominator

0.125
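
Tuple unpacking works for any function that returns a tuple, and it also allows swapping variables without a temporary. A minimal sketch using the built-in `divmod`:

```python
# swap two variables via tuple packing and unpacking
a, b = 1, 2
a, b = b, a
print(a, b)

# divmod returns the tuple (quotient, remainder)
quotient, remainder = divmod(17, 5)
print(quotient, remainder)
```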

## Sets
- Unordered collections of unique items
- Support set operations
- Defined by ``{1, 2, 3}`` 

In [92]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

### Union
items appearing in either set

In [93]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
primes | odds      # with an operator
primes.union(odds) # equivalently with a method

{1, 2, 3, 5, 7, 9}

### Intersection
items appearing in both sets

In [94]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method

{3, 5, 7}

### Difference
items appearing in the first set but not in the second

In [95]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method

{2}

### Symmetric difference
items appearing in exactly one of the two sets

In [96]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method

{1, 2, 9}
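
Two everyday uses of sets are removing duplicates from a list and testing membership or subsets. A short sketch with made-up data:

```python
readings = [3, 1, 3, 2, 1]

# converting to a set drops duplicates (and the original order)
unique = set(readings)
print(sorted(unique))

# membership test
print(2 in unique)

# subset test with an operator
is_subset = {3, 5} <= {2, 3, 5, 7}
print(is_subset)
```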

## Dictionaries
- Hash table
- Extremely flexible and versatile
- Fast access
- Keep insertion order since Python 3.7 (unordered in older versions)
- Defined by ``key:value`` pairs within curly braces: ``{'a':1, 'b':2, 'c':3}``

In [97]:
numbers = {'one':1, 'two':2, 'three':3}

In [98]:
# Access a value via the key
numbers['two']

2

In [99]:
# Set a new key:value pair
numbers['ninety'] = 90
numbers

{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}
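
A few more common dictionary operations, sketched on the same example: membership tests on keys, lookups with a default value, iteration and deletion.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# membership is tested on the keys
print('two' in numbers)

# get() returns a default instead of raising KeyError
missing = numbers.get('four', 0)
print(missing)

# iterate over key:value pairs
for name, value in numbers.items():
    print(name, value)

# delete an entry by key
del numbers['one']
print(numbers)
```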

## Dictionaries: Complexity of Common Operations

| Operation      | Example       | Complexity (average case) |
|----------------|---------------|---------------------------|
| Access         | ``d[k]``      | O(1)                      |
| Change element | ``d[k] = 0``  | O(1)                      |
| Delete         | ``del d[k]``  | O(1)                      |

[Source](https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt)