# Lists

A list is a collection data type that is ordered and mutable, and that allows duplicate elements. In Python, a list is defined with square brackets, with the elements separated by commas. A list can hold elements of any type (integers, booleans, strings, etc.), even mixed within the same list. To get an element from the list we use the [] operator, which takes the zero-based index of the intended element. Lists are iterable, so a for loop can iterate through the collection directly.

In [11]:
# Note: the name "list" shadows Python's built-in list type for the rest of this notebook
list = ["banana", "cherry", "apple"]

# Iterate through list
for element in list:
    print(element)

banana
cherry
apple
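
As a small illustration of the indexing, mutability and duplicate handling described above, here is a minimal sketch. The variable name `fruits` and the replacement values are chosen only for this example.

```python
fruits = ["banana", "cherry", "apple"]

# Indexing starts at 0; negative indices count from the end
first = fruits[0]
last = fruits[-1]
print(first, last)

# Lists are mutable: an element can be replaced in place
fruits[1] = "kiwi"

# Duplicates and mixed element types are allowed
fruits.append("banana")
fruits.append(42)
print(fruits)
```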


Checking whether an element is in the list is also easy. Although lists implement the *\_\_contains\_\_()* method, it is normally invoked through the *in* operator, which reads closer to human language. Similarly, lists can be concatenated with the *+* operator and repeated with the *\** operator. Python also supports sorting, filtering, merging, etc. The documentation can be viewed in

In [30]:
# Check containment
if "apple" in list:
    print("Apple is in the list")

# Example for + and * operators
new_list = ["watermelon"] * 3
list = list + new_list
print(list)

Apple is in the list
['banana', 'cherry', 'apple', 'watermelon', 'watermelon', 'watermelon']


Slicing lets us access part of a list using a colon inside the brackets. The syntax is similar to indexing, but it returns a new list. The start index is included and the stop index is excluded. If we don't specify a start index, the slice starts at the beginning, and if we don't specify a stop index, it goes all the way to the end. We can also give an optional step, which is one by default.
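
The slicing rules above can be sketched as follows. The list `numbers` is just an illustrative example.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

middle = numbers[2:5]          # start index included, stop index excluded
head = numbers[:3]             # no start: begin at the first element
tail = numbers[7:]             # no stop: go to the last element
every_second = numbers[::2]    # step of 2
reversed_copy = numbers[::-1]  # negative step walks backwards

print(middle, head, tail)
print(every_second, reversed_copy)
```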

# Tuples

A tuple is also an ordered collection data type, but unlike a list it is immutable: once created, it cannot be changed. Tuples are often used to return the result of a function, since we don't want the returned values to be modified by accident. If we try to change an element of a tuple, we get a *TypeError*. Besides that, working with tuples is much like working with lists.

In [47]:
# Define tuple with parentheses
tuple = ('a', 'p', 'p', 'l', 'e')

# Slicing is also recognized
a = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
b = a[2:5]
print(b, type(b))

(3, 4, 5) <class 'tuple'>
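
To make the immutability concrete, the following sketch tries to assign to a tuple element and catches the resulting error. The names `letters`, `single` and `capitalized` are chosen only for this example.

```python
letters = ('a', 'p', 'p', 'l', 'e')

# Item assignment is not supported by tuples
assignment_failed = False
try:
    letters[0] = 'A'
except TypeError as error:
    assignment_failed = True
    print("Tuples cannot be changed:", error)

# A one-element tuple needs a trailing comma, otherwise it is just a string
single = ("apple",)
not_a_tuple = ("apple")
print(type(single), type(not_a_tuple))

# To "change" a tuple we build a new one from the old parts
capitalized = ('A',) + letters[1:]
print(capitalized)
```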


Returning a tuple from a function also lets us return multiple separate values, because a tuple can be split up by assigning it to multiple variables. This is referred to as unpacking.

In [50]:
# Unpacking with tuple
tuple = "Maria", 3, False
a, b, c = tuple
print(a, type(a), b, type(b), c, type(c))

Maria <class 'str'> 3 <class 'int'> False <class 'bool'>
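
The same unpacking works directly on a function's return value. The sketch below uses a hypothetical helper, `min_max`, that is not part of the original notebook.

```python
def min_max(values):
    """Return the smallest and the largest value as a tuple (hypothetical helper)."""
    return min(values), max(values)

# The returned tuple is unpacked into two variables
lowest, highest = min_max([4, 1, 9, 7])
print(lowest, highest)

# A starred name collects the remaining elements into a list
first, *rest = (1, 2, 3, 4)
print(first, rest)
```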


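A list is usually larger than a tuple holding the same elements, because a list keeps extra bookkeeping (and often spare capacity) so that it can grow and shrink. Tuples are also faster to create: in CPython a tuple of constants can be built once when the code is compiled, while a list has to be rebuilt every time the statement runs. This difference can be thought of as the management overhead of mutability. The exact numbers below depend on the Python version and platform.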

In [55]:
# Size difference between lists and tuples
import sys
list = [0, 1, 2, "hello", True]
tuple = (0, 1, 2, "hello", True)
print(sys.getsizeof(list), "bytes")
print(sys.getsizeof(tuple), "bytes")

# Efficiency difference between lists and tuples
import timeit
print(timeit.timeit(stmt="[0, 1, 2, 3, 4, 5]", number=1000000))
print(timeit.timeit(stmt="(0, 1, 2, 3, 4, 5)", number=1000000))

104 bytes
80 bytes
0.0557221000053687
0.011612999995122664


# Dictionaries

A dictionary is a mutable collection data type that consists of key-value pairs, so each key maps to its associated value. Since Python 3.7 a dictionary preserves insertion order, but elements are accessed by key rather than by position. Trying to index a non-existing key raises a *KeyError*. We can also find a value without knowing its key by searching through all the values, but this is inefficient compared to a key lookup.

In [82]:
my_dict = {"name": "Max", "age": 28, "merried": False}
value = my_dict["age"]
print(value)

28
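
A short sketch of what happens with a missing key, and of the slower lookup by value mentioned above. The dictionary `person` is an illustrative stand-in for `my_dict`.

```python
person = {"name": "Max", "age": 28}

# Indexing a missing key raises KeyError
missing_key = None
try:
    person["email"]
except KeyError as error:
    missing_key = error.args[0]
    print("Missing key:", missing_key)

# get() returns None (or a chosen default) instead of raising
email = person.get("email")
email_or_default = person.get("email", "unknown")

# Searching by value has to scan every entry, so it is slower than a key lookup
has_value = 28 in person.values()
keys_for_28 = [key for key, value in person.items() if value == 28]
print(email, email_or_default, has_value, keys_for_28)
```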


Not only can we check whether a key is in the dictionary, but also whether a value is contained in it. Consequently, we can loop through both the keys and the values.

In [87]:
# The if statement can be replaced by a try-except block as well
if "name" in my_dict:
    print("The name in the example is", my_dict["name"])

# Looping through the dictionary
for key, value in my_dict.items():
    print(key, value)

The name in the example is Max
name Max
age 28
merried False


When copying a dictionary we have to be careful. The assignment operator doesn't copy the contents of the dictionary, only the reference to it. Therefore both names refer to the same dictionary, and a change made through one is visible through the other. To get an independent dictionary, use the *copy()* method instead. We can also merge dictionaries with *update()*. Key-value pairs that appear in both dictionaries are kept once, and if the same key has different values, the value from the second dictionary (the argument of *update()*) wins.

In [89]:
# How not to do it
new_dict = my_dict
new_dict["email"] = "max@gmail.com"
print(my_dict)

# Instead use copy function
new_dict = my_dict.copy()
new_dict["gender"] = "Male"
print(my_dict)

# Merging two dictionaries
new_dict["age"] = 29
my_dict.update(new_dict)
print(my_dict)

{'name': 'Max', 'age': 28, 'merried': False, 'email': 'max@gmail.com'}
{'name': 'Max', 'age': 28, 'merried': False, 'email': 'max@gmail.com'}
{'name': 'Max', 'age': 29, 'merried': False, 'email': 'max@gmail.com', 'gender': 'Male'}
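
One caveat that goes slightly beyond the example above: *copy()* makes a shallow copy, so nested objects are still shared. The sketch below assumes a dictionary with a nested list and uses `copy.deepcopy` from the standard library. It also shows a merge that leaves both operands untouched. All variable names are illustrative.

```python
import copy

original = {"name": "Max", "languages": ["Python"]}
shallow = original.copy()
deep = copy.deepcopy(original)

# Top-level changes on the copy do not affect the original
shallow["name"] = "Anna"

# The nested list is shared by the shallow copy, but not by the deep copy
shallow["languages"].append("SQL")
deep["languages"].append("Go")
print(original)
print(deep)

# Merging into a new dictionary: the right-hand operand wins on conflicts
defaults = {"age": 28, "married": False}
overrides = {"age": 29}
merged = {**defaults, **overrides}
print(merged)
```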


Note: Almost anything can be used as a key, as long as it is immutable (more precisely, hashable). For example, a tuple can be used as a key, but a list cannot.
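
A minimal sketch of this rule, using an illustrative dictionary `grid` keyed by coordinate tuples:

```python
# Tuples are hashable, so they work as dictionary keys
grid = {(0, 0): "origin", (1, 2): "treasure"}
label = grid[(1, 2)]
print(label)

# Lists are mutable and therefore not hashable
list_key_rejected = False
try:
    grid[[3, 4]] = "list key"
except TypeError as error:
    list_key_rejected = True
    print("A list cannot be a key:", error)
```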

# Sets

A set is a collection data type that is unordered and mutable. Unlike lists or tuples, it does not allow duplicate elements. As in mathematical set theory, inserting a value that is already contained leaves the set unchanged. Elements are added and removed with the *add()* and *remove()* methods, and the *pop()* method removes and returns an arbitrary element. Sets are often used for mathematical set operations such as union and intersection. These methods do not modify the sets; they return a new set. For in-place modification, a corresponding *update()* method is defined for each operation (for example *intersection_update()*).

In [110]:
odds = {1, 3, 5, 7, 9}
evens = {0, 2, 4, 6, 8}
primes = {2, 3, 5, 7}

# Returns all elements from both sets, without duplicates
union = odds.union(primes)
print(union)

# Returns the elements contained in both the odds and the primes set
intersection = odds.intersection(primes)
print(intersection)

# Returns all elements from the first set that are not in the second set
difference = evens.difference(primes)
print(difference)

# Returns all elements from both sets that are not shared
symmetric_difference = evens.symmetric_difference(primes)
print(symmetric_difference)

{1, 2, 3, 5, 7, 9}
{3, 5, 7}
{0, 8, 4, 6}
{0, 3, 4, 5, 6, 7, 8}
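
The methods for adding and removing elements, and the difference between an operation that returns a new set and its in-place *update* variant, can be sketched like this. Variable names are illustrative.

```python
numbers = {1, 2, 3}
numbers.add(2)        # already present, the set stays the same
numbers.add(4)
numbers.remove(1)     # raises KeyError if the element is missing
numbers.discard(99)   # like remove(), but silent if the element is missing
snapshot = set(numbers)

# pop() removes and returns an arbitrary element
popped = numbers.pop()

# Building a set from a list drops the duplicates
unique = set([1, 1, 2, 3, 3])

odds = {1, 3, 5, 7, 9}
primes = {2, 3, 5, 7}

# union() returns a new set and leaves odds unchanged
combined = odds.union(primes)
odds_untouched = odds == {1, 3, 5, 7, 9}

# intersection_update() modifies odds in place
odds.intersection_update(primes)
print(snapshot, unique, combined, odds)
```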


Furthermore, we can easily check whether a set is a *subset* or a *superset* of another set. *issubset()* returns true if all the elements of the first set are contained in the second set. The opposite is *issuperset()*, which returns true if the first set contains all the elements of the second set. We can also check whether two sets are disjoint, meaning they share no elements, with *isdisjoint()*.

In [118]:
setA = {1, 2, 3, 4, 5, 6, 7, 8, 9}
setB = {1, 2, 3}
setC = {0, 5, 6, 7}

# All of the elements of a subset must be contained by another set
print("Set A is the subset of the B set: ", setA.issubset(setB))

# Superset contains all the elements of another set
print("Set A is the superset of the B set: ", setA.issuperset(setB))

# Disjoint is true if the intersection is empty
print("Set A is disjoint with set C: ", setA.isdisjoint(setC))
print("Set B is disjoint with set C: ", setB.isdisjoint(setC))

Set A is the subset of the B set:  False
Set A is the superset of the B set:  True
Set A is disjoint with set C:  False
Set B is disjoint with set C:  True


A **frozen set** (*frozenset*) is the immutable version of a normal set: after creation, elements cannot be added or removed.
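
A brief sketch of how a frozen set behaves. The names `vowels` and `teams` are chosen only for this example.

```python
vowels = frozenset("aeiou")

# There is no add() method on a frozenset
add_failed = False
try:
    vowels.add("y")
except AttributeError as error:
    add_failed = True
    print("A frozenset cannot be modified:", error)

# Operations that return a new set still work
common = vowels & {"a", "b", "c"}
print(common)

# Being immutable, a frozenset can serve as a dictionary key
teams = {frozenset({"Ann", "Bob"}): "team 1"}
lookup = teams[frozenset({"Bob", "Ann"})]
print(lookup)
```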

# Strings

A **string** is an ordered and immutable collection data type used for representing text, which makes it one of the most used data types in Python. A string is created with either single or double quotes. If the text itself contains a single quote (such as an apostrophe), it is recommended to delimit the string with double quotes, so that the single quote is interpreted as part of the string.

Triple quotes are used when the text spans multiple lines. Inside a triple-quoted string, a backslash at the end of a line lets us wrap the text in the code without adding a newline character to the string. Be careful, as triple quotes are used for code documentation (docstrings) as well.
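
The quoting rules above can be sketched as follows. The example strings are illustrative.

```python
# Double quotes let us use an apostrophe without escaping it
with_double = "It's a string"
with_escape = 'It\'s a string'
print(with_double == with_escape)

# A triple-quoted string keeps the line break
multi_line = """first line
second line"""

# A backslash at the end of the line suppresses the line break
wrapped = """first part, \
second part"""

print(repr(multi_line))
print(repr(wrapped))
```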

As with a list, characters can be accessed by indexing the string. Keep in mind that a string is immutable, so assigning to a character raises a *TypeError*. Furthermore, most sequence operations used with lists, such as slicing, iteration and containment checks, can be used with strings as well.

In [183]:
my_string = "Hello World"

# Slicing is very useful in case of strings
print("Using slicing: ", my_string[:5])
print("A string can be easily reversed: ", my_string[::-1])

# Checking for containment with the in operator
if 'Hello' in my_string:
    print("We can check for containment easily")

Using slicing:  Hello
A string can be easily reversed:  dlroW olleH
We can check for containment easily


The most important string methods, however, are those that transform or inspect the string value. Python offers a variety of methods for formatting a string, such as *upper()*, *lower()*, *strip()*, etc., as well as standard checks such as *startswith()* and *endswith()*, and of course pattern matching. Most SQL string functions have an equivalent among Python's built-in string methods.

In [185]:
# Check for greetings
print("Has it greeted us?: ", my_string.startswith("Hello"))

# Replace substring in string
print("Computer started: ", my_string.lower().replace("world", "david").title())

Has it greeted us?:  True
Computer started:  Hello David


Because strings and lists are very similar, we can convert between them easily. To split a string into a list we use the *split()* method with a delimiter parameter that specifies where one element ends and the next begins. If no delimiter is specified, the string is split on whitespace; to get each character as a separate list element, pass the string to *list()* instead. For merging a list of characters or strings back into a string, we recommend the *join()* method, as the *+* operator creates a new string on every step (because strings are immutable) and is therefore expensive.

In [219]:
from timeit import default_timer as timer

# Separate string into a list
new_string = "how,are,you,doint"
new_list = new_string.split(',')
print("New string into a list: ", new_list)

# Joining the list elements to a string using a function
start = timer()
new_string = ''.join(new_list)
stop = timer()
print(f"Joining: {new_string} --Time: {stop - start}")

# Concatenating list elements back into a string
start = timer()
new_string = ''
for word in new_list:
    new_string += word
stop = timer()
print(f"Concatenating: {new_string} --Time: {stop - start}")

New string into a list:  ['how', 'are', 'you', 'doint']
Joining: howareyoudoint --Time: 3.1099989428184927e-05
Concatenating: howareyoudoint --Time: 3.8300000596791506e-05
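
The different ways of splitting a string described above can be compared side by side. This is a small sketch with illustrative input strings.

```python
# Without a delimiter, split() separates on runs of whitespace
words = "how are  you".split()

# With a delimiter, empty fields between two delimiters are kept
fields = "a,b,,c".split(",")

# list() turns a string into a list of single characters
characters = list("how")

# join() is called on the separator string
sentence = " ".join(words)
word = "".join(characters)

print(words, fields, characters)
print(sentence, word)
```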


# Collections

The collections module implements special container data types and provides alternatives with additional functionality compared to the general-purpose collections like dictionaries, lists or tuples. We will look at five different types from the collections module: the counter, the named tuple, the ordered dictionary, the default dictionary, and the deque:

- **Counter:** A counter is a container that stores elements as dictionary keys and their counts as dictionary values. We can also ask for the most common elements with *most_common()*.
- **Named Tuple:** Named tuples are easy-to-create, lightweight object types, similar to a struct. The first argument is always the class name. The second argument lists the field names of the class, separated by either commas or spaces.
- **Ordered Dictionary:** Just like a normal dictionary, but it remembers the order in which the elements were added. It also offers order-aware methods such as *move_to_end()* and *popitem()*. Note that in newer Python versions (3.7 and later), the regular dictionary type also remembers the order of elements.
- **Default Dictionary:** Similar to a normal dictionary, with the only difference that a key that has not been set yet gets a default value, produced by the factory function the dictionary was created with (for example, *int* gives 0).
- **Deque:** A double-ended queue, used to add and remove elements at either end. It is implemented so that these operations are very efficient. It can also be rotated.

In [290]:
# Example for counter
from collections import Counter
my_counter = Counter("aaaaaaabbbbbbcccccc")
print(f"The counter: {my_counter} and the two most common type elements are: {my_counter.most_common(2)}")

# Example for named tuple
from collections import namedtuple
Point = namedtuple('Point', 'x,y')
pt = Point(1, -4)
print(f"The namedtuple: {Point} is a struct that has a entity: {pt}")

# Example for ordered dictionary
from collections import OrderedDict
ordered_dict = OrderedDict()
ordered_dict['First Name'] = "Varga"
ordered_dict['Last Name'] = "David"
print(f"The ordereddict: {ordered_dict}")

# Example for default dictionary
from collections import defaultdict
default_dict = defaultdict(int)
default_dict['a'] = 1
default_dict['b'] = 2
print(f"The defaultdict: {default_dict}, the default value of int is {default_dict['c']}")

# Example for deque
from collections import deque
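# new_list is the list produced by split() in the Strings section
my_deque = deque(new_list)
# extendleft() modifies the deque in place and returns None, hence the "None" in the output
print(f"The deque: {my_deque} pop from left: {my_deque.popleft()} - and extend it: {my_deque.extendleft(['dear', 'friend'])}")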

The counter: Counter({'a': 7, 'b': 6, 'c': 6}) and the two most common type elements are: [('a', 7), ('b', 6)]
The namedtuple: <class '__main__.Point'> is a struct that has a entity: Point(x=1, y=-4)
The ordereddict: OrderedDict([('First Name', 'Varga'), ('Last Name', 'David')])
The defaultdict: defaultdict(<class 'int'>, {'a': 1, 'b': 2}), the default value of int is 0
The deque: deque(['how', 'are', 'you', 'doint']) pop from left: how - and extend it: None
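
A few of the features listed for these container types are not exercised in the cell above: field access on a named tuple, a default dictionary with a list factory, the order-aware methods of the ordered dictionary, and rotating a deque. The following is a minimal sketch of them with illustrative data.

```python
from collections import Counter, OrderedDict, defaultdict, deque, namedtuple

# Named tuple fields can be read by name or by index
Point = namedtuple("Point", "x y")
pt = Point(1, -4)
x_by_name = pt.x
y_by_index = pt[1]

# Counter also works on a list of words
counts = Counter(["how", "are", "you", "how"])
top = counts.most_common(1)

# defaultdict(list) creates an empty list for every new key
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# OrderedDict can reorder entries and pop from either end
ordered = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
ordered.move_to_end("a")                  # order is now b, c, a
last_item = ordered.popitem()             # removes the last entry
first_item = ordered.popitem(last=False)  # removes the first entry

# deque supports efficient operations on both ends, plus rotation
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
queue.rotate(1)             # the last element moves to the front
leftmost = queue.popleft()

print(x_by_name, y_by_index, top, dict(groups))
print(last_item, first_item, list(ordered), queue, leftmost)
```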
