# Data Types

This lesson covers the following data types:
* Numbers
  * Int (_immutable_)
  * Float (_immutable_)
* Booleans (_immutable_)
* String (_immutable_)
* List (_mutable_)
* NumPy Array (_mutable_)
* Dictionary (_mutable_)
* Tuple (_immutable_)

We also cover:  
* List Comprehension
* Performance and memory usage
* Using `range()`

## Numbers

In [1]:
# integers and floats
x = 10
print(x, type(x))

y = 20 / 2
print(y, type(y))

10 <class 'int'>
10.0 <class 'float'>


In [2]:
# pi
import numpy as np
print(type(np))
print(np.pi, type(np.pi))

<class 'module'>
3.141592653589793 <class 'float'>


In [3]:
# operands and operators
x = 2
y = 8
z = 20

# print (with space between arguments)
print('x ** y =', x ** y)
print('x * y =', x * y)
print('y / x =', y / x)
print('z % y =', z % y)
print('z // y =', z // y)

# print with string formatting
print('{} + {} = {}'.format(y, x, y + x))
print('{2} - {1} = {0}'.format(y - x, x, y))
print('\n{0} ** {0} = {1}'.format(y, y ** y)) # the '\n' escape sequence inserts a line break, so this result is preceded by a blank line

x ** y = 256
x * y = 16
y / x = 4.0
z % y = 4
z // y = 2
8 + 2 = 10
8 - 2 = 6

8 ** 8 = 16777216
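
The same output can also be produced with f-strings (available since Python 3.6), which embed expressions directly in the string. A minimal sketch, reusing the values of `x` and `y` from the cell above; the names `line_add`, `line_sub` and `line_pow` are chosen here for illustration:

```python
# f-strings evaluate the expressions between braces in place
x = 2
y = 8

line_add = f'{y} + {x} = {y + x}'
line_sub = f'{y} - {x} = {y - x}'
line_pow = f'{y} ** {y} = {y ** y}'

print(line_add)  # 8 + 2 = 10
print(line_sub)  # 8 - 2 = 6
print(line_pow)  # 8 ** 8 = 16777216
```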


In [4]:
# add 2 to x
x = 8
x = x + 2
x

10

In [5]:
# the Pythonic way: augmented assignment
x = 8
x += 2
x

10

In [6]:
# same for all operators!
x = 20
x //= 8
x

2

In [7]:
# type conversion
i = int(3.14159262)
f = float(10)
print(i, type(i))
print(f, type(f))

3 <class 'int'>
10.0 <class 'float'>
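
Note that `int()` truncates toward zero rather than rounding. A short sketch contrasting it with the built-in `round()` and with `math.floor()`; the value `-3.7` is an arbitrary example:

```python
import math

value = -3.7

truncated = int(value)       # drops the fractional part: -3
rounded = round(value)       # rounds to the nearest integer: -4
floored = math.floor(value)  # rounds down toward negative infinity: -4

print(truncated, rounded, floored)

# converting a numeric string works as well
print(int('42') + 1, float('2.5') * 2)
```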


## Booleans

In [8]:
type(True)

bool

In [9]:
type(False)

bool

In [10]:
5 == 10

False

In [11]:
5 != 10

True

In [12]:
5 < 10

True

In [13]:
5 >= 10

False

In [14]:
False + 5

5

In [15]:
True * 10

10

In [16]:
int(False)

0

In [17]:
int(True)

1

In [18]:
bool(0)

False

In [19]:
bool(1)

True
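
Because `True` and `False` behave as `1` and `0` in arithmetic, `sum()` can count how many items satisfy a condition. A minimal sketch with made-up numbers:

```python
numbers = [3, 8, 12, 5, 20]  # hypothetical example data

# each comparison yields a bool; sum() adds them up as 1 or 0
count_above_6 = sum(n > 6 for n in numbers)
print(count_above_6)  # 3

# empty containers are falsy, non-empty containers and strings are truthy
print(bool([]), bool(''), bool([0]), bool('False'))  # False False True True
```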

## Strings

In [20]:
# string
s = 'Python'
print(s, type(s))
print(s, 'is awesome' + '!' * 3)

Python <class 'str'>
Python is awesome!!!


In [21]:
# index and subsetting
s = 'Python'
print(1, s[1])    # index is zero-based
print(2, s[-1])   # last value
print(3, s[2:4])  # format is: string[from (inclusive) : to (exclusive)]
print(4, s[2:])   # to end
print(5, s[:2])   # from begin
print(6, s[:-1])
print(7, s[-4:])
print(8, s[::2])  # format is: string[from (inclusive) : to (exclusive) : slice step]
print(9, s[::-1]) # slice step -1 to reverse the order!

1 y
2 n
3 th
4 thon
5 Py
6 Pytho
7 thon
8 Pto
9 nohtyP
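
Slicing never raises an `IndexError`, whereas indexing a single position out of range does. A short sketch of that difference, plus a palindrome check built on the `[::-1]` reversal (`is_palindrome` is a hypothetical helper, not part of the course material):

```python
s = 'Python'

print(s[2:100])  # 'thon': slice bounds are clipped to the string length

try:
    s[100]       # a single index out of range raises an error
except IndexError as err:
    print(err)


def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]


print(is_palindrome('level'), is_palindrome(s))  # True False
```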


**>>> Complete exercise 1 from the [Jupyter Notebook](https://nbviewer.jupyter.org/github/Brinkhuis/Cursus/blob/master/notebooks/opdrachten.ipynb) with the exercises.**

## Lists

In [22]:
# create a list
x = [1, 2, 3, 'a', 'b', 'c', [num * 10 for num in range(3)], [letter * 3 for letter in 'ABC']]
x

[1, 2, 3, 'a', 'b', 'c', [0, 10, 20], ['AAA', 'BBB', 'CCC']]

In [23]:
# print values and types
for item in x:
    print(item, type(item))

1 <class 'int'>
2 <class 'int'>
3 <class 'int'>
a <class 'str'>
b <class 'str'>
c <class 'str'>
[0, 10, 20] <class 'list'>
['AAA', 'BBB', 'CCC'] <class 'list'>


In [24]:
# subsetting
x[-1][1:]

['BBB', 'CCC']

In [25]:
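# remove(): delete the first item equal to the given value (here the last item)
x.remove(x[-1])

In [26]:
# pop(): remove the item at the given index and return it
x.pop(-1)

[0, 10, 20]

In [27]:
# append(): add a single item to the end (the loop runs over a copy of the first three items)
for i in x[:3]:
    x.append(i * 10)

In [28]:
# append(): add the uppercase version of each character
for char in 'abc':
    x.append(char.upper())

In [29]:
# extend(): add every item from an iterable
x.extend(range(100, 400, 100))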

In [30]:
# print
print(x)

[1, 2, 3, 'a', 'b', 'c', 10, 20, 30, 'A', 'B', 'C', 100, 200, 300]
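
The difference between `append()` and `extend()` is easiest to see when both receive a list. A minimal sketch with two small example lists (`a` and `b` are illustrative names):

```python
a = [1, 2, 3]
b = [1, 2, 3]

a.append([4, 5])  # adds the list itself as one new item
b.extend([4, 5])  # adds each item of the iterable separately

print(a)  # [1, 2, 3, [4, 5]]
print(b)  # [1, 2, 3, 4, 5]

# remove() deletes by value, pop() deletes by position and returns the item
b.remove(1)
last = b.pop()
print(b, last)  # [2, 3, 4] 5
```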


**>>> Complete exercise 2 from the [Jupyter Notebook](https://nbviewer.jupyter.org/github/Brinkhuis/Cursus/blob/master/notebooks/opdrachten.ipynb) with the exercises.**

## List Comprehension

In [31]:
# range
x = range(10)
print(x, type(x))
print(list(x), type(list(x)))

range(0, 10) <class 'range'>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] <class 'list'>


In [32]:
# list comprehension
[num * 2 for num in range(6)]

[0, 2, 4, 6, 8, 10]

In [33]:
# list comprehension using if statement
[num for num in range(11) if num % 2 == 0]

[0, 2, 4, 6, 8, 10]

Which of the list comprehensions above would be the most efficient?

In [34]:
# even numbers
list(range(0, 11, 2))

[0, 2, 4, 6, 8, 10]

Or might the statement above be even more efficient?  

We will come back to this shortly, but it is worth thinking about now!

In [35]:
# split string using list comprehension
[char.upper() for char in 'Python']

['P', 'Y', 'T', 'H', 'O', 'N']

In [36]:
# create list of lists using list comprehension
[[num, num * 10] for num in range(5)] # create a list of lists

[[0, 0], [1, 10], [2, 20], [3, 30], [4, 40]]

In [37]:
# list comprehension using if else statement
['even' if num % 2 == 0 else 'odd' for num in range(10)]

['even',
 'odd',
 'even',
 'odd',
 'even',
 'odd',
 'even',
 'odd',
 'even',
 'odd']

In [38]:
# combined...
[[num, 'even'] if num % 2 == 0 else [num, 'odd'] for num in range(10)]

[[0, 'even'],
 [1, 'odd'],
 [2, 'even'],
 [3, 'odd'],
 [4, 'even'],
 [5, 'odd'],
 [6, 'even'],
 [7, 'odd'],
 [8, 'even'],
 [9, 'odd']]
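
A list comprehension is a compact form of a `for` loop that appends to a list. The sketch below writes one of the comprehensions above both ways; the names `labels_loop`, `labels_comp` and `evens`, and the labels 'even' and 'odd', are chosen here for illustration:

```python
# explicit loop
labels_loop = []
for num in range(6):
    if num % 2 == 0:
        labels_loop.append('even')
    else:
        labels_loop.append('odd')

# equivalent comprehension: the if/else expression comes before 'for'
labels_comp = ['even' if num % 2 == 0 else 'odd' for num in range(6)]

# a filtering 'if' (without else) comes after the 'for' clause instead
evens = [num for num in range(6) if num % 2 == 0]

print(labels_loop == labels_comp)  # True
print(evens)                       # [0, 2, 4]
```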

In [39]:
# performance test of different solutions with same results

aantal = 10000

a = list(np.array(range(5 * aantal)) * 2)
b = [num for num in range(10 * aantal) if num % 2 == 0]
c = list(np.array(range(5 * aantal)) * 2)
d = [num * 2 for num in range(5 * aantal)]
e = list(range(0, 10 * aantal, 2))

print('Are the defined objects identical? >>', a == b == c == d == e, '\n')

%timeit list(np.array(range(5 * aantal)) * 2)
%timeit [num for num in range(10 * aantal) if num % 2 == 0]
%timeit list(np.array(range(5 * aantal)) * 2)
%timeit [num * 2 for num in range(5 * aantal)]
%timeit list(range(0, 10 * aantal, 2))

Are the defined objects identical? >> True 

11.1 ms ± 493 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.94 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.7 ms ± 81 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.23 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.06 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
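
`%timeit` is an IPython magic and only works in Jupyter or IPython. In a plain Python script the standard-library `timeit` module gives comparable measurements. A minimal sketch for the three pure-Python variants; the dictionary `candidates` and the repetition count of 20 are arbitrary choices, and the timings will differ per machine:

```python
import timeit

aantal = 10000

# the three pure-Python variants, wrapped as callables
candidates = {
    'filter': lambda: [num for num in range(10 * aantal) if num % 2 == 0],
    'multiply': lambda: [num * 2 for num in range(5 * aantal)],
    'range step': lambda: list(range(0, 10 * aantal, 2)),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=20)  # total time for 20 calls
    print(f'{name:<10} {seconds / 20 * 1000:.3f} ms per call')
```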


**>>> Complete exercise 3 from the [Jupyter Notebook](https://nbviewer.jupyter.org/github/Brinkhuis/Cursus/blob/master/notebooks/opdrachten.ipynb) with the exercises.**

In [40]:
# memory usage

from sys import getsizeof

aantal = int(1e6)

a = range(aantal)
b = np.array(range(aantal))
c = list(range(aantal))

print(getsizeof(a))
print(getsizeof(b))
print(getsizeof(c))

48
4000096
9000112
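
A `range` object stays small because it stores only start, stop and step and computes values on demand. It still supports `len()`, indexing and membership tests. A short sketch (`small` and `large` are illustrative names):

```python
from sys import getsizeof

small = range(10)
large = range(10 ** 9)  # one billion values, never materialized

# both objects have the same size, regardless of their length
print(getsizeof(small) == getsizeof(large))  # True

print(len(large))      # 1000000000
print(large[-1])       # 999999999
print(500 in large)    # True, computed arithmetically without iterating
print(large[10:20:3])  # range(10, 20, 3): slicing returns another range
```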


List comprehensions are easy to use and very flexible!  

Avoid unnecessary iterations for the best performance, especially with many (list) items.

In some cases a different solution is faster and/or more memory-efficient.

## Array (NumPy)

In [41]:
# list (multiply range)
x = list(range(5))
print(x, type(x), '\n')

print(x * 2, '\n')

# loop over the indices, not the values
for i in range(len(x)):
    x[i] *= 2
print(x)

[0, 1, 2, 3, 4] <class 'list'> 

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4] 

[0, 2, 4, 6, 8]
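
Multiplying a `list` by an integer repeats it; doubling each element needs a loop or a comprehension. A sketch of two common patterns with arbitrary example values, using `enumerate()` to get each position together with its value:

```python
x = [5, 1, 7]

# in place: enumerate() yields (index, value) pairs
for i, value in enumerate(x):
    x[i] = value * 2
print(x)  # [10, 2, 14]

# new list: a comprehension leaves the original untouched
y = [value * 2 for value in x]
print(y)  # [20, 4, 28]
```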


In [42]:
# array (multiply range)
x = np.array(range(5))
print(x, type(x), '\n')
print(x * 2)

[0 1 2 3 4] <class 'numpy.ndarray'> 

[0 2 4 6 8]


In [43]:
%%timeit
# list performance
x = list(range(1000))
for i in range(len(x)):
    x[i] *= 2

107 µs ± 8.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [44]:
%%timeit
# array performance
x = np.array(range(1000))
x *= 2

105 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [45]:
# list memory usage
x = list(range(1000))
for i in range(len(x)):
    x[i] *= 2
print(type(x), getsizeof(x))

<class 'list'> 9112


In [46]:
# array memory usage
x = np.array(range(1000))
x *= 2
print(type(x), getsizeof(x))

<class 'numpy.ndarray'> 4096


The tests above show that multiplying 1000 integers takes a comparable amount of time in a `list` and in an `array`.  
However, the `array` needs less than half the memory reported for the `list` (4096 versus 9112 bytes).
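
`getsizeof` reports only the container itself. For a `list` that is the array of references, not the integer objects those references point to. A rough sketch of a fuller estimate (an approximation: CPython caches small integers, so some objects are shared; `shallow` and `elements` are illustrative names):

```python
from sys import getsizeof

x = list(range(1000))

shallow = getsizeof(x)                         # list object with its references
elements = sum(getsizeof(item) for item in x)  # the int objects themselves

print(shallow, elements, shallow + elements)
```

For a NumPy array the size of the stored data is available as the `nbytes` attribute.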

In [47]:
# subsetting
x = np.array(range(5))

print(x[-1])
print(type(x[-1]), '\n')

print(x[1:3])
print(type(x[1:3]))

4
<class 'numpy.int32'> 

[1 2]
<class 'numpy.ndarray'>


In [48]:
# array with floats
x = np.array([np.pi] * 3)
print(x)

[3.14159265 3.14159265 3.14159265]


## Dictionary

A `dict` is a collection of key-value pairs.

In [1]:
# create a dictionary
alphabet = {'a': 'Alpha',
            'b': 'Bravo',
            'c': 'Charlie'}
alphabet

{'a': 'Alpha', 'b': 'Bravo', 'c': 'Charlie'}

In [2]:
# keys
alphabet.keys()

dict_keys(['a', 'b', 'c'])

In [3]:
# values
alphabet.values()

dict_values(['Alpha', 'Bravo', 'Charlie'])

In [4]:
# value for a key
alphabet['b']

'Bravo'

In [5]:
# mutate
alphabet['c'] = 'Coding'
alphabet['c']

'Coding'

In [6]:
# add
alphabet['z'] = 'The Zen of Python'
alphabet['p'] = 'Python'
alphabet['r'] = 'rocks!'

In [9]:
# print items in dictionary
for key, value in alphabet.items():
    print(key, value)

a Alpha
b Bravo
c Coding
z The Zen of Python
p Python
r rocks!


In [56]:
# delete
for k in 'abz':
    del alphabet[k]
alphabet.values()

dict_values(['Coding', 'Python', 'rocks!'])
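
Accessing or deleting a key that does not exist raises a `KeyError`. The methods `get()` and `pop()` accept a default value to avoid that. A minimal sketch on a fresh copy of the example dictionary (`removed` and `missing` are illustrative names):

```python
alphabet = {'a': 'Alpha', 'b': 'Bravo', 'c': 'Charlie'}

# get() returns a default instead of raising KeyError
print(alphabet.get('b'))             # Bravo
print(alphabet.get('x', 'unknown'))  # unknown

# pop() removes a key and returns its value; the default avoids KeyError
removed = alphabet.pop('a')
missing = alphabet.pop('x', None)
print(removed, missing)              # Alpha None
print(alphabet)                      # {'b': 'Bravo', 'c': 'Charlie'}
```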

In [57]:
# dict of dicts
datascientist = {'Rene': {'tool': 'Python',
                          'Notebook': 'Jupyter',
                          'IDE': ('Visual Studio Code', 'Spyder')},
                 'Martin': {'tool': 'R',
                            'Notebook': 'R Notebook',
                            'IDE': ('RStudio', 'Emacs')}}

In [58]:
# check for key existence
if 'Rene' in datascientist:
    print(datascientist['Rene'], '\n')
    print(type(datascientist['Rene']))

{'tool': 'Python', 'Notebook': 'Jupyter', 'IDE': ('Visual Studio Code', 'Spyder')} 

<class 'dict'>


In [59]:
# keys in dict
datascientist['Rene'].keys()

dict_keys(['tool', 'Notebook', 'IDE'])

In [60]:
# tools for Rene
datascientist['Rene']['tool']

'Python'

In [61]:
# IDE for Martin
type(datascientist['Martin']['IDE'])

tuple

In [62]:
# last value in 'IDE tuple'
datascientist['Martin']['IDE'][-1]

'Emacs'
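
Nested dictionaries can be traversed with `items()`, which yields each key together with its inner dictionary. A short sketch on a trimmed-down copy of the data above (`first_ide` is an illustrative name):

```python
datascientist = {'Rene': {'tool': 'Python', 'IDE': ('Visual Studio Code', 'Spyder')},
                 'Martin': {'tool': 'R', 'IDE': ('RStudio', 'Emacs')}}

# items() yields each name together with its inner dictionary
for name, profile in datascientist.items():
    print(name, 'uses', profile['tool'], 'in', ' or '.join(profile['IDE']))

# dict comprehension: map each name to the first IDE in the tuple
first_ide = {name: profile['IDE'][0] for name, profile in datascientist.items()}
print(first_ide)  # {'Rene': 'Visual Studio Code', 'Martin': 'RStudio'}
```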

## Tuples

A `tuple` behaves like an _immutable_ `list`.

In [63]:
l = ['a', 'b', 'c']
print(type(l), l, '\n')

t = ('a', 'b', 'c')
print(type(t), t, '\n')

print(l[-1] == t[-1])

<class 'list'> ['a', 'b', 'c'] 

<class 'tuple'> ('a', 'b', 'c') 

True


In [64]:
# lists are mutable
l[1] = 'B'
l

['a', 'B', 'c']

In [65]:
# tuples are immutable
try:
    t[1] = 'B'
except TypeError as E:
    print(E)

'tuple' object does not support item assignment
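
Immutability has practical consequences: tuples can be unpacked and used as dictionary keys, but the immutability is shallow. A minimal sketch with hypothetical example values (`point`, `distances` and `mixed` are illustrative names):

```python
point = (3, 4)

# unpacking assigns each element to its own name
x, y = point
print(x, y)  # 3 4

# tuples are hashable (if their elements are), so they can be dict keys
distances = {(0, 0): 0.0, point: 5.0}
print(distances[(3, 4)])  # 5.0

# immutability is shallow: a list inside a tuple can still change
mixed = ('a', [1, 2])
mixed[1].append(3)
print(mixed)  # ('a', [1, 2, 3])
```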
