"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
—Linus Torvalds

| Object type | Meaning | Used for |
|------------|---------|-----------|
| int | Integer value | Whole numbers (negative, zero, positive) |
| float | Floating-point number | Real numbers |
| bool | Boolean value | Something true or false |
| str | String object | Character, word, text |
| tuple | Immutable container | Fixed set of objects, record |
| list | Mutable container | Changing set of objects |
| dict | Mutable container | Key-value store |
| set | Mutable container | Collection of unique objects |



# Integers

The built-in function type() returns the type of any object, whether it is an
instance of a standard built-in type or of a newly created class.

In [1]:
a = 10
type(a)

int

One can get the number of bits needed to represent the value of an int
object in binary by calling the method bit_length().

In [2]:
a.bit_length()

4

In [3]:
a = 10000000
a.bit_length()

24

In [6]:
googol = 10 ** 100
googol.bit_length()

333

Python integers can be arbitrarily large. The interpreter simply
uses as many bits/bytes as needed to represent the numbers.
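
As a quick cross-check of what bit_length() reports, the following sketch compares it with the length of the binary string returned by bin(). The helper name `bits_via_bin` is made up for this illustration.

```python
def bits_via_bin(n):
    """Count the binary digits of a positive int via its bin() string."""
    return len(bin(n)) - 2  # drop the leading '0b' prefix

for n in (10, 10000000, 10 ** 100):
    # both ways of counting binary digits agree
    print(n.bit_length(), bits_via_bin(n))

# integers do not overflow; the arithmetic stays exact
big = 2 ** 200 + 1
print(big - 2 ** 200)
```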

In [7]:
1 + 4

5

In [8]:
1/4

0.25

In [9]:
1//4

0

In [10]:
type(1/4)

float

# Floats

In [11]:
type(1.)

float

In [12]:
type(1.0)

float

Float objects like this one are always represented internally only up to a certain degree of accuracy.

In [13]:
b = 0.35
type(b)

float

In [14]:
b + 0.1

0.44999999999999996

A decimal number 0 < n < 1 is represented by a series of the form
$n = \frac{x}{2} + \frac{y}{4} + \frac{z}{8} + ...$, where each coefficient is either 0 or 1. For certain floating-point numbers the binary representation
might involve a large number of elements or might even be an infinite series, which then has to be truncated.
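
The series can be made concrete with a small sketch that extracts the first coefficients (each 0 or 1) of the binary expansion. The function name `binary_digits` is hypothetical, and Fraction is used so that the inputs themselves are exact.

```python
from fractions import Fraction

def binary_digits(n, k):
    """Return the first k coefficients of n = x/2 + y/4 + z/8 + ... for 0 < n < 1."""
    digits = []
    for _ in range(k):
        n *= 2
        bit = int(n >= 1)   # 1 if the current power of 1/2 fits, else 0
        digits.append(bit)
        n -= bit
    return digits

print(binary_digits(Fraction(1, 2), 8))       # a single term is enough
print(binary_digits(Fraction(35, 100), 12))   # the pattern keeps repeating
```

For 0.5 every coefficient after the first is zero, while for 0.35 the digits never terminate, which is why the float 0.35 can only be stored approximately.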

In [15]:
c = 0.5
c.as_integer_ratio()

(1, 2)

In [16]:
b = 0.35
b.as_integer_ratio()

(3152519739159347, 9007199254740992)

The precision is dependent on the number of bits used to represent the number. In
general, all platforms that Python runs on use the IEEE 754 double-precision standard—i.e., 64 bits—for internal representation. This translates into a 15-digit relative accuracy.

Since this topic is of high importance for several application areas in finance, it is
sometimes necessary to ensure the exact, or at least best possible, representation of
numbers. For example, the issue can be of importance when summing over a large set
of numbers. In such a situation, a certain kind and/or magnitude of representation
error might, in aggregate, lead to significant deviations from a benchmark value.
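
A minimal sketch of the summation issue, using only the standard library: adding 0.1 ten times in a plain loop accumulates rounding error, while math.fsum() compensates for it. The explicit loop is used here on purpose, since the behavior of the built-in sum() for floats may differ between Python versions.

```python
import math

values = [0.1] * 10

# plain accumulation: one rounding error per addition
naive = 0.0
for v in values:
    naive += v

# fsum() keeps track of the lost low-order bits
exact = math.fsum(values)

print(naive)
print(exact)
print(naive == 1.0, exact == 1.0)
```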

The module decimal provides an arbitrary-precision object for floating-point
numbers and several options to address precision issues when working with such
numbers.
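
One detail worth illustrating is how a Decimal object is constructed. The sketch below assumes the default context; it shows that building from a string keeps the decimal digits as written, whereas building from a float carries over the float's binary representation error.

```python
from decimal import Decimal

# constructed from strings: the digits are taken exactly as written
exact = Decimal('0.35') + Decimal('0.1')
print(exact)

# constructed from a float: inherits the binary approximation of 0.35
from_float = Decimal(0.35)
print(from_float == Decimal('0.35'))
```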

In [17]:
import decimal
from decimal import Decimal

In [20]:
decimal.getcontext()

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])

In [21]:
d = Decimal(1)/Decimal(11)

In [22]:
d

Decimal('0.09090909090909090909090909091')

One can change the precision of the representation by changing the respective
attribute value of the Context object.

In [24]:
decimal.getcontext().prec = 4

In [26]:
e = Decimal(1)/Decimal(11)
print(e)

0.09091


In [28]:
decimal.getcontext().prec = 50
f = Decimal(1)/Decimal(11)
print(f)

0.090909090909090909090909090909090909090909090909091


In [29]:
g = d + e + f
print(g)

0.27272818181818181818181818181909090909090909090909
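
Setting getcontext().prec changes the precision for all subsequent Decimal arithmetic. Where only a temporary change is wanted, the standard library's decimal.localcontext() can be used instead; the following is a small sketch of that approach.

```python
import decimal
from decimal import Decimal

before = decimal.getcontext().prec

# localcontext() works on a copy of the current context;
# the change is discarded when the with block ends
with decimal.localcontext() as ctx:
    ctx.prec = 4
    e = Decimal(1) / Decimal(11)

after = decimal.getcontext().prec

print(e)
print(before == after)
```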


# Booleans

A complete list of Python keywords is available in the keyword module.

In [30]:
import keyword
keyword.kwlist

['False',
 'None',
 'True',
 'and',
 'as',
 'assert',
 'async',
 'await',
 'break',
 'class',
 'continue',
 'def',
 'del',
 'elif',
 'else',
 'except',
 'finally',
 'for',
 'from',
 'global',
 'if',
 'import',
 'in',
 'is',
 'lambda',
 'nonlocal',
 'not',
 'or',
 'pass',
 'raise',
 'return',
 'try',
 'while',
 'with',
 'yield']

In [31]:
4 > 3

True

In [32]:
type(4 > 3)

bool

In [33]:
type(False)

bool

In [34]:
4 >= 3

True

In [35]:
4 <= 3

False

In [36]:
4 == 3

False

In [37]:
4 != 3

True

Boolean values can be combined with the logical operators and, or, and not.

In [38]:
True and True

True

In [39]:
True and False

False

In [40]:
False and False

False

In [41]:
True or True

True

In [42]:
True or False

True

In [43]:
False or False

False

In [44]:
not True

False

In [45]:
not False

True

In [46]:
(4 > 3) and (2 > 3)

False

In [47]:
(4 == 3) or (2 != 3)

True

In [48]:
not (4 != 4)

True

In [50]:
(not(4!=4)) and (2 != 3)

True

One major application area is to control the code flow via other Python keywords,
such as if or while.

In [52]:
if 4 > 3:
    print('condition is true')

condition is true


In [53]:
i = 0
while i < 4:
    print('condition is true and i = ', i)
    i+=1

condition is true and i =  0
condition is true and i =  1
condition is true and i =  2
condition is true and i =  3


Numerically, Python attaches a value of 0 to False and a value of 1 to True. When
transforming numbers to bool objects via the bool() function, a 0 gives False while
all other numbers give True.

In [54]:
int(True)

1

In [55]:
int(False)

0

In [57]:
float(True)

1.0

In [58]:
float(False)

0.0

In [59]:
bool(0)

False

In [60]:
bool(1)

True

In [61]:
bool(0.0000)

False

In [62]:
bool(10)

True

In [63]:
bool(-2)

True
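
Because True and False behave like 1 and 0 in arithmetic, conditions can be counted directly. The following is a minimal sketch; the values in `prices` are made up for illustration.

```python
prices = [101.5, 99.2, 103.7, 98.4, 100.1]  # made-up sample values

# each comparison yields a bool; sum() treats True as 1 and False as 0
above_100 = sum(p > 100 for p in prices)
print(above_100)

# bool objects take part in ordinary integer arithmetic
print(True + True)
```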

# Strings
<!-- no strings attached :)-->

A str object is generally defined by single or double quotation marks or by
converting another object using the str() function (i.e., using the object’s standard
or user-defined str representation).
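
To illustrate the conversion route, the sketch below applies str() to two numbers and to an instance of a small class with its own `__str__` method. The class `Quote` and its attributes are hypothetical and exist only for this example.

```python
class Quote:
    """Hypothetical class used only to show a user-defined __str__."""

    def __init__(self, symbol, price):
        self.symbol = symbol
        self.price = price

    def __str__(self):
        return '{} @ {:.2f}'.format(self.symbol, self.price)

print(str(15))                    # standard representation of an int
print(str(2.5))                   # standard representation of a float
print(str(Quote('ABC', 101.5)))   # user-defined representation
```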

In [66]:
t = 'this is a string object'
type(t)

str

In [67]:
t.capitalize()

'This is a string object'

In [68]:
t.split()

['this', 'is', 'a', 'string', 'object']

In [73]:
t.find('th')

0

In [74]:
t.find('t')

0

In [75]:
t.find('string')

10

If the word is not in the str object, the method returns -1.

In [76]:
t.find('Python')

-1

In [77]:
t.replace(' ', '|')

'this|is|a|string|object'

In [86]:
'http://www.python.org'.strip('htp:/')

'www.python.org'

In [87]:
'http://www.python.org'.strip('ht')

'p://www.python.org'

In [90]:
t.upper()

'THIS IS A STRING OBJECT'

| Method | Arguments | Returns/result |
|--------|-----------|----------------|
| capitalize | () | Copy of the string with first letter capitalized |
| count | (sub[, start[, end]]) | Count of the number of occurrences of substring |
| encode | ([encoding[, errors]]) | Encoded version of the string |
| find | (sub[, start[, end]]) | (Lowest) index where substring is found |
| join | (seq) | Concatenation of strings in sequence seq |
| replace | (old, new[, count]) | Replaces old by new the first count times |
| split | ([sep[, maxsplit]]) | List of words in string with sep as separator |
| splitlines | ([keepends]) | Separated lines with line ends/breaks if keepends is True |
| strip | ([chars]) | Copy of string with leading/trailing characters in chars removed |
| upper | () | Copy with all letters capitalized |
| lower | () | Copy with all letters in lowercase |
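
Some of the methods in the table are not used in the cells above. A short sketch with the same sample string:

```python
t = 'this is a string object'

print(t.count('i'))              # occurrences of the substring 'i'
print('|'.join(t.split()))       # split into words, then rejoin with '|'
print('A\nB\nC'.splitlines())    # one list entry per line
print(t.upper().lower() == t)    # lower() undoes upper() for this text
print(t.encode('utf-8'))         # bytes object with the encoded text
```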


In [95]:
i = 0
print(end='|')
while i < 4:
    print(i, end='|')  # end defaults to '\n' (a line break) when not specified
    i+=1

|0|1|2|3|

Python offers powerful string replacement operations. There is the old way, via the %
character, and the new way, via curly braces ({}) and format(). Both are still applied
in practice. This section cannot provide an exhaustive illustration of all options, but
the following code snippets show some important ones. First, the old way of doing it.

PS: I like the new way

In [96]:
'this is an integer %d' %15

'this is an integer 15'

In [97]:
'this is also an integer %4d' %15

'this is also an integer   15'

In [102]:
'this is an integer too %04d' %150

'this is an integer too 0150'

In [105]:
'this is a float %f' %15.34577

'this is a float 15.345770'

In [107]:
'this is also a float %.2f' %15.98317649

'this is also a float 15.98'

In [109]:
'this is also a float %8f' %15.5

'this is also a float 15.500000'

In [114]:
'this is also a float %8.2f' %15.5

'this is also a float    15.50'

In [115]:
'this is also a float %08.2f' %15.5

'this is also a float 00015.50'

In [116]:
'this is a string %s' %'Python'

'this is a string Python'

In [117]:
'this is also a string %10s' %'python'

'this is also a string     python'

Now, here are the same examples implemented in the new way. Notice the slight differences in the output in some places.

In [119]:
'this is an integer {:d}'.format(15)

'this is an integer 15'

In [122]:
'this is an integer {:4d}'.format(15)

'this is an integer   15'

In [123]:
'this is an integer {:04d}'.format(15)

'this is an integer 0015'

In [124]:
'this is a float {:f}'.format(15.3456)

'this is a float 15.345600'

In [129]:
'this is a float {:0.2f}'.format(15.3456)

'this is a float 15.35'

In [130]:
'this is a float {:8f}'.format(15.3456)

'this is a float 15.345600'

In [131]:
'this is a float {:8.2f}'.format(15.3456)

'this is a float    15.35'

In [132]:
'this is a float {:08.2f}'.format(15.3456)

'this is a float 00015.35'

In [133]:
'this is a string {:s}'.format('Python')

'this is a string Python'

In [139]:
'this is a string {:10s}'.format('Python')

'this is a string Python    '
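
Assuming Python 3.6 or later, formatted string literals (f-strings) accept the same format specifications as format(). The variable names below are chosen for illustration only.

```python
n = 15
x = 15.3456
s = 'Python'

print(f'this is an integer {n:04d}')
print(f'this is a float {x:8.2f}')
print(f'this is a string {s:10s}|')

# the format() built-in applies the same specification to a single value
print(format(x, '08.2f'))
```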

Here are both approaches used inside a loop.

In [140]:
i = 0
while i < 4:
    print('the number is %d'%i)
    i+=1

the number is 0
the number is 1
the number is 2
the number is 3


In [141]:
i = 0
while i <4:
    print('the number is {:d}'.format(i))
    i+=1

the number is 0
the number is 1
the number is 2
the number is 3


A powerful tool when working with str objects is regular expressions. Python
provides such functionality in the module re.

Suppose a financial analyst is faced with a large text file, such as a CSV file, which
contains certain time series and respective date-time information. More often than
not, this information is delivered in a format that Python cannot interpret directly.
However, the date-time information can generally be described by a regular expression. Consider the following str object, containing three date-time elements, three
integers, and three strings. Note that triple quotation marks allow the definition of
str objects over multiple rows.

In [233]:
import re
series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

In [235]:
dt = re.compile(r"'[0-9/:\s]+'")  # raw string: matches the quoted date-time entries



In [236]:
result = dt.findall(series)

In [237]:
result

["'01/18/2014 13:00:00'", "'01/18/2014 13:30:00'", "'01/18/2014 14:00:00'"]

To parse the str objects containing the date-time information, one needs to provide
information on how to parse them, again as a str object.

In [240]:
from datetime import datetime
pydt = datetime.strptime(result[0].replace("'", ""), '%m/%d/%Y %H:%M:%S')
pydt

datetime.datetime(2014, 1, 18, 13, 0)

In [241]:
print(pydt)

2014-01-18 13:00:00


In [242]:
print(type(pydt))

<class 'datetime.datetime'>
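
The cells above parse only the first match. A sketch that converts all three matches in one pass, repeating the sample data so that the block is self-contained (the names `fmt` and `stamps` are introduced here for illustration):

```python
import re
from datetime import datetime

series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

dt = re.compile(r"'[0-9/:\s]+'")   # quoted runs of digits, '/', ':' and whitespace
fmt = '%m/%d/%Y %H:%M:%S'

# strip the surrounding quotes, then parse each match
stamps = [datetime.strptime(m.strip("'"), fmt) for m in dt.findall(series)]
for stamp in stamps:
    print(stamp)
```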


### Optional: RegEx

In [161]:
import re

In [190]:
txt = "The01234Spain"
re.search("^The.*Spain$", txt)

<re.Match object; span=(0, 13), match='The01234Spain'>

The re module offers a set of functions that allows us to search a string for a match:

Function | Description
---------|------------
findall | Returns a list containing all matches
search | Returns a Match object if there is a match anywhere in the string  
split | Returns a list where the string has been split at each match
sub | Replaces one or many matches with a string


Metacharacters are characters with a special meaning:

Character | Description | Example
----------|-------------|----------
[] | A set of characters | "[a-m]"
\ | Signals a special sequence (can also be used to escape special characters) | "\d"
. | Any character (except newline character) | "he..o"
^ | Starts with | "^hello"
$ | Ends with | "planet$"
* | Zero or more occurrences | "he.*o"
+ | One or more occurrences | "he.+o"
? | Zero or one occurrences | "he.?o"
{} | Exactly the specified number of occurrences | "he.{2}o"
\| | Either or | "falls\|stays"
() | Capture and group |
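
A few of the metacharacters from the table in action. The sample strings are made up, and raw strings (r"...") are used so that backslashes reach the re module unchanged.

```python
import re

txt = "hello planet, the rain falls or stays"

print(re.findall(r"[a-c]", "abcdef"))          # set of characters
print(re.findall(r"\d", "room 42"))            # special sequence: digits
print(re.findall(r"he.{2}o", txt))             # exactly two arbitrary characters
print(re.findall(r"falls|stays", txt))         # either or
print(bool(re.search(r"^hello", txt)))         # starts with
print(bool(re.search(r"stays$", txt)))         # ends with
print(re.findall(r"(\w+)@(\w+)", "a@b c@d"))   # groups come back as tuples
```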


In [192]:
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

['ai', 'ai']


In [193]:
x = re.findall("Portugal", txt)
print(x)

[]


In [201]:
x = re.search(r"\s", txt)  # search for the first whitespace character



In [203]:
print(x.start())  # index of the first whitespace character in txt

3


In [206]:
x = re.search("Portugal", txt)
print(x)

None


In [208]:
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']




In [210]:
x = re.split(r"\s", txt, maxsplit=1)  # split at most once
print(x)

['The', 'rain in Spain']




In [216]:
x = re.sub(r"\s", "9", txt)  # replace every whitespace character with 9



In [217]:
print(x)

The9rain9in9Spain


In [222]:
x = re.sub(r"\s", "9", txt, count=2)  # replace only the first two matches



In [223]:
print(x)

The9rain9in Spain


In [224]:
x = re.search("ai", txt)

In [225]:
print(x)

<re.Match object; span=(5, 7), match='ai'>


The Match object has attributes and methods for retrieving information about the search and its result:

.span() returns a tuple containing the start and end positions of the match.

.string returns the string passed into the function

.group() returns the part of the string where there was a match

In [227]:
print(txt)

The rain in Spain


In [231]:
x = re.search(r"\bS\w+", txt)  # the pattern matches the first word that starts with an uppercase "S"
print(x.span())
print(x.string)
print(x.group())

(12, 17)
The rain in Spain
Spain
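
Match objects also give access to the parenthesized groups of a pattern. The sketch below applies this to one line of the date-time sample from earlier; the pattern is one possible choice, not the only way to split such a record.

```python
import re

record = "'01/18/2014 13:00:00', 100, '1st';"

# five groups: month, day, year, time, and the integer value
pattern = re.compile(r"'(\d{2})/(\d{2})/(\d{4}) ([\d:]+)', (\d+)")
m = pattern.search(record)

print(m.group())     # the whole match
print(m.groups())    # one entry per parenthesized group
print(m.group(3))    # groups are numbered starting at 1
print(m.span(5))     # start and end position of group 5
```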


# Tuples

In [243]:
t = (1, 2.5, 'data')
type(t)

tuple

In [244]:
t = 1, 2.5, 'data'
type(t)

tuple

It is important to remember that Python uses zero-based numbering.

In [246]:
t[2]

'data'

In [247]:
type(t[0])

int

There are only two regular (non-underscore) methods that this object type provides: count() and
index().

In [251]:
t.count('data')

1

In [253]:
t.index('data')

2

Tuple objects are immutable. This means that, once defined, they cannot be changed in place; any "modification" requires building a new tuple.
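
A minimal sketch of what immutability means in practice: item assignment raises a TypeError, and a changed version has to be built as a new object.

```python
t = (1, 2.5, 'data')

try:
    t[0] = 10  # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# building a new tuple leaves the original untouched
t_new = (10,) + t[1:]
print(t_new)
print(t)
```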

# Lists

From a finance point of view, you can achieve a lot working only with list
objects, such as storing stock price quotes and appending new data. A list object is
defined through brackets and the basic capabilities and behaviors are similar to those
of tuple objects.

In [254]:
l = [1, 2.5, 'data']

In [255]:
type(l)

list

In [256]:
l[2]

'data'

In [None]:
l.append([4, 3])


In [258]:
l

[1, 2.5, 'data', [4, 3]]

In [259]:
l.extend([1.0, 2.0, 3.0])

In [261]:
l

[1, 2.5, 'data', [4, 3], 1.0, 2.0, 3.0]

In [262]:
l.insert(1, 'insert')

In [263]:
l

[1, 'insert', 2.5, 'data', [4, 3], 1.0, 2.0, 3.0]

In [264]:
l.remove('data')

In [265]:
l

[1, 'insert', 2.5, [4, 3], 1.0, 2.0, 3.0]

In [266]:
p = l.pop(3)

In [268]:
print(l, p)

[1, 'insert', 2.5, 1.0, 2.0, 3.0] [4, 3]


In [269]:
l[2:5]

[2.5, 1.0, 2.0]

| Method | Arguments | Returns/Result |
|--------|-----------|----------------|
| l[i] = x | [i] | Replaces i-th element by x |
| l[i:j:k] = s | [i:j:k] | Replaces every k-th element from i to j – 1 by s |
| append(x) | x | Appends x to object |
| count(x) | x | Number of occurrences of object x |
| del l[i:j:k] | [i:j:k] | Deletes elements with index values i to j – 1 and step size k |
| extend(s) | s | Appends all elements of s to object |
| index(x[, i[, j]]) | x[, i[, j]] | First index of x between elements i and j – 1 |
| insert(i, x) | i, x | Inserts x at/before index i |
| remove(x) | x | Removes element x at first match |
| pop(i) | i | Removes element with index i and returns it |
| reverse() | None | Reverses all items in place |
| sort(*, key=None, reverse=False) | key, reverse (keyword-only) | Sorts all items in place |
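
Several operations from the table are not exercised in the cells above. A short sketch, starting from the list as it stood after the pop() call:

```python
l = [1, 'insert', 2.5, 1.0, 2.0, 3.0]

l[0] = 0                  # replace a single element
l[3:5] = [10.0, 20.0]     # replace a slice
print(l)

del l[1]                  # delete the element at index 1
print(l)

print(l.count(2.5), l.index(3.0))

l.reverse()               # in place; the method itself returns None
print(l)

l.sort(reverse=True)      # only numbers are left, so sorting works
print(l)
```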


In [281]:
l_dash = l[2:5]
l_dash.sort()
print(l_dash)
print(l[2:5])

[1.0, 2.0, 2.5]
[2.5, 1.0, 2.0]


In [284]:
for element in l[2:5]:
    print(element ** 2)

6.25
1.0
4.0


In [285]:
r = range(0,8,1)

In [287]:
type(r)

range

In [288]:
for i in r:
    print(i)

0
1
2
3
4
5
6
7


In [289]:
for i in range(2,5):
    print(l[i] ** 2)

6.25
1.0
4.0


In [290]:
for i in range(1, 10):
    if i %2 == 0:
        print("%d is even"%i)
    elif i%3 == 0:
        print("%d is multiple of 3"%i)
    else:
        print("%d is odd"%i)

1 is odd
2 is even
3 is multiple of 3
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is multiple of 3


A specialty of Python is so-called list comprehensions. Instead of looping over existing list objects, this approach generates list objects via loops in a rather compact fashion.

In [291]:
m = [i**2 for i in range(5)]
m

[0, 1, 4, 9, 16]

Python provides a number of tools for functional programming support as well, i.e.,
the application of a function to a whole set of inputs (in our case list objects).
Among these tools are filter(), map(), and reduce(); in Python 3, reduce() must be imported from the functools module. However, one needs a function definition first. To start with something really simple, consider a function f()
that returns the square of the input x.

In [292]:
def f(x):
    return x**2

In [293]:
f(2)

4

In [294]:
def even(x):
    return x%2 == 0
even(3)

False

In [296]:
list(map(even, range(0, 10)))

[True, False, True, False, True, False, True, False, True, False]

In [298]:
list(map(lambda x: x**2, range(0, 10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [299]:
list(filter(even, range(15)))

[0, 2, 4, 6, 8, 10, 12, 14]
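
The text mentions reduce() alongside map() and filter(), but the cells above do not use it. A minimal sketch, with reduce() imported from functools as required in Python 3:

```python
from functools import reduce

# reduce() folds a sequence into a single value by applying a
# two-argument function cumulatively from left to right
total = reduce(lambda x, y: x + y, range(10))
print(total)

# chained with map() and filter(): sum of the squares of the even numbers
even_squares = reduce(
    lambda x, y: x + y,
    map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, range(10))),
)
print(even_squares)
```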

# Dicts

dict objects are so-called key-value stores: each value is stored under, and retrieved by, a unique key.

In [300]:
d = {
    'Name' : 'Dhairya Kantawala',
    'Department' : 'Mathematics',
    'Age' : 20
}

In [301]:
type(d)

dict

In [302]:
print(d['Name'], d['Age'])

Dhairya Kantawala 20


In [306]:
d.keys()

dict_keys(['Name', 'Department', 'Age'])

In [307]:
d.values()

dict_values(['Dhairya Kantawala', 'Mathematics', 20])

In [308]:
d.items()

dict_items([('Name', 'Dhairya Kantawala'), ('Department', 'Mathematics'), ('Age', 20)])

In [309]:
birthday = True
if birthday:
    d['Age']+=1
print(d['Age'])

21


In [313]:
for item in d.items():
    print(item)

('Name', 'Dhairya Kantawala')
('Department', 'Mathematics')
('Age', 21)


In [314]:
for value in d.values():
    print(value)

Dhairya Kantawala
Mathematics
21


| Method | Arguments | Returns/result |
| --- | --- | --- |
| d[k] | [k] | Item of d with key k |
| d[k] = x | [k] | Sets item key k to x |
| del d[k] | [k] | Deletes item with key k |
| clear | () | Removes all items |
| copy | () | Makes a copy |
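| items | () | View object over all items |
| keys | () | View object over all keys |
| values | () | View object over all values |
| pop | (k[, default]) | Returns and removes item with key k |
| popitem | () | Returns and removes the most recently inserted (key, value) pair |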
| update | ([e]) | Updates items with items from e |
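
A brief sketch of the removal and copy operations from the table, using the same sample record as above. Note that pop() takes a key, whereas popitem() takes no argument.

```python
d = {'Name': 'Dhairya Kantawala', 'Department': 'Mathematics', 'Age': 21}

backup = d.copy()        # shallow copy, unaffected by later changes to d

age = d.pop('Age')       # remove a specific key and return its value
last = d.popitem()       # remove and return the most recently inserted pair
del d['Name']            # delete by key without returning anything

print(age, last)
print(d)
print(backup)
```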


In [336]:
d.update({'Height': "5'9"})

In [337]:
d['Height']

"5'9"

In [338]:
d.clear()

In [339]:
d

{}

# Sets

set objects are unordered collections of other
objects, containing every element only once.

In [340]:
s = set(['u', 'du', 'u', 'du'])

In [341]:
s

{'du', 'u'}

In [342]:
t = set(['f', 'u'])

In [343]:
t

{'f', 'u'}

In [344]:
s.union(t)

{'du', 'f', 'u'}

In [345]:
s.intersection(t)

{'u'}

In [347]:
s.difference(t) #items in s but not in t

{'du'}

In [348]:
s.symmetric_difference(t)

{'du', 'f'}
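
The four set methods above also have operator equivalents. A minimal sketch, using set literals in place of the set() calls:

```python
s = {'u', 'du'}   # set literal, equivalent to set(['u', 'du'])
t = {'f', 'u'}

print(s | t == s.union(t))
print(s & t == s.intersection(t))
print(s - t == s.difference(t))
print(s ^ t == s.symmetric_difference(t))

# membership test, and adding an element that is already present
print('u' in s)
s.add('u')
print(len(s))
```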

In [349]:
from random import randint
l = [randint(0, 10) for i in range(1000)]
len(l)

1000

In [350]:
s = set(l)

In [351]:
s

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}