### Collections

**Collections** are groups of objects of the same (or different) types. 100 cats, 10 dogs, [watermelon, pumpkin, apple] are all examples of collections.

Python has several data types that help you store collections in a convenient way. Let's dive in.

### Lists

A list is an ordered sequence of values. A list can contain elements of different types.

In [None]:
list1 = ['apple', 5, True]

You can even put a list inside another list; this is called a nested list.

In [None]:
list2 = ['apple', ['tomato', 'fish'], 5]

You can also create an empty list, with no elements:

In [None]:
empty_list = []

#### Indexing

Lists support indexing, which starts at 0 (as with any indexed object in Python). Negative indices are also allowed: -1 refers to the last element, -2 to the second to last, and so on.

![Indexing example](https://cdn-images-1.medium.com/max/1600/1*ew02kFoUbAqhe2ozFnDa8g.png)



In [None]:
my_list2 = ['apple', ['tomato', 'fish'], 5]
my_list2[1]

In [None]:
my_list2[-1]
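
A small sketch of two indexing details, using an illustrative list `nested` with the same contents as `list2`: a negative index `-k` is shorthand for `len(lst) - k`, and indices can be chained to reach into a nested list.

```python
nested = ['apple', ['tomato', 'fish'], 5]

# A negative index -k points at the same element as len(nested) - k
print(nested[-1] == nested[len(nested) - 1])  # True

# Chained indexing: first take the inner list, then index into it
inner = nested[1]
print(inner)          # ['tomato', 'fish']
print(nested[1][0])   # tomato
print(nested[1][-1])  # fish
```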

A small hint about indexing:

![String indexing](https://cdn-images-1.medium.com/max/1555/1*HqdR129XR8-1ojtTYLou7g.png)

#### Slices

Just as with strings, you can take slices of lists.

In [None]:
my_list2[0:-1]

In [None]:
my_list2[-1:]

In [None]:
my_list2

You can also specify a step, which sets the spacing between the selected elements.

In [None]:
my_list2[1::1]

In [None]:
my_list2[0::2]

In [None]:
my_list2[::-1]
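
A minimal sketch (the list `lst` is illustrative) of two properties of slices that are easy to miss: slice bounds past the end of the list are silently clipped, while a plain index past the end raises `IndexError`; and a slice is always a new list, so changing it does not affect the original.

```python
lst = [1, 7, -5, 0, 32]

# Slice bounds beyond the end are clipped instead of raising an error
print(lst[3:100])   # [0, 32]
print(lst[10:20])   # []

# A slice is a NEW list: changing it leaves the original untouched
part = lst[:2]
part[0] = 999
print(part)         # [999, 7]
print(lst)          # [1, 7, -5, 0, 32]

# Plain indexing past the end, by contrast, raises IndexError
try:
    lst[10]
except IndexError as err:
    print("IndexError:", err)
```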

#### Loop through the elements of the list (iteration)

You can walk through a list by indexing each of its elements in turn. The **len** function is useful for this: it returns the length of a list.

In [None]:
my_list3 = [1, 7, -5, 0, 32]

In [None]:
for i in range(len(my_list3)):
    print(my_list3[i])

In general, though, you can iterate over the elements of a collection in Python directly with a **for** loop!

In [None]:
for elem in my_list3:
    print(elem)

**for** can be applied to strings as well. In this case, you iterate over the characters of the string.

In [None]:
s = "Hello, world"
for c in s:
    print(c)
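
When you need both the position and the value, the built-in `enumerate` function is usually more idiomatic than `range(len(...))`. A short sketch, reusing the values of `my_list3`:

```python
my_list3 = [1, 7, -5, 0, 32]

# enumerate yields (index, element) pairs, so no manual indexing is needed
for i, elem in enumerate(my_list3):
    print(i, elem)

# Counting can start from another number
print(list(enumerate("abc", start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```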

#### List operations

Lists can be added (strictly speaking, **concatenated**):

In [None]:
a = [2,3,4]
b = [-5, 10]
a + b

In [None]:
a - b  # TypeError: unsupported operand type(s) for -: 'list' and 'list'

In [None]:
a * b  # TypeError too: a list can only be multiplied by an integer

A list can be repeated several times by multiplying it by an integer. However, this is not recommended except in the most obvious cases (we will see why later).

In [None]:
[0] * 5

You can check whether an item is in a list using the **in** keyword.

In [None]:
a = [1,3,10]
if 5 in a:
    print("5 is in a")
else:
    print("5 is not in a")

Iterating over a container has a peculiarity: the loop variable, which is assigned the container's items one by one, is still available after the loop ends.
This is useful when, for example, we are looking for the first element that satisfies a given condition.

In [None]:
a = [1, 3, 4, 100, 34]
for i in a:
    if i % 2 == 0:
        break
print(i)

But if no such item is found, `i` keeps the value of the last item in the list.

In [None]:
a = [1, 3, 7, 11]
for i in a:
    if i % 2 == 0:
        break
i

Another problem: if `i` was not defined before and we loop over an empty list, `i` is never assigned, so we get a `NameError`.

In [None]:
del i  # remove the variable i
a = []
for i in a:
    if i % 2 == 0:
        break
i  # NameError: name 'i' is not defined

Both problems can be solved by adding an **else** clause to the **for** loop.

Note that **else** is at the same indentation level as **for** and belongs to it!

In this case, the **else** block runs only if the loop was not exited with `break` (i.e. we went through all the values in the container).

In [None]:
a = [1, 3, 7, 11]
for i in a:
    if i % 2 == 0:
        break
else:
    i = None 
    print("No even elements")
print(i)

In [None]:
a = []
for i in a:
    if i % 2 == 0:
        break
else:
    i = None
    print("No even elements")
print(i)

If the required element is present, everything works as expected!

In [None]:
a = [1, 3, 7, 12]
for i in a:
    if i % 2 == 0:
        break
else:
    i = None
    print("No even elements")
print(i)


#### Checking a list for emptiness

This works not only for lists but for all built-in collections/containers: an empty container counts as false in a condition, a non-empty one as true.

In [None]:
a = []
if not a:  # a is empty
    print("a is empty")

b = [1]
if b:  # b is not empty
    print("b is not empty")
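
A quick sketch of this rule for several built-in container types; dictionaries, sets and tuples appear later in this lesson, and the sample values are arbitrary. Note that a container holding a "falsy" element such as `0` is still non-empty, and therefore true:

```python
# Empty built-in containers count as False, non-empty ones as True
empty_things = [[], "", {}, set(), ()]
non_empty_things = [[0], " ", {"a": 1}, {0}, (0,)]

for x in empty_things:
    print(repr(x), bool(x))      # each line ends with False

for x in non_empty_things:
    print(repr(x), bool(x))      # each line ends with True, even [0] and " "
```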

#### Mutability of lists

Strings are immutable: if you try to change a character, you get an error.

In [None]:
a = "Hello, world!"
a[-1] = "?"  # TypeError: 'str' object does not support item assignment

So you have to do something like this:

In [None]:
a = "Hello, world!"
b = a[:-1] + "?"
print(a)
print(b)

It's much easier with lists, because they are **mutable** objects.

In [None]:
a = [1, -3, 10, 6]

In [None]:
a[-1] = 5
a

In [None]:
a[0:2] = [0,1]

In [None]:
a
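
Slice assignment is more flexible than the example above suggests: the new sequence does not need to be the same length as the slice it replaces. A minimal sketch with illustrative values:

```python
a = [1, -3, 10, 6]

# Two elements (-3, 10) are replaced by three
a[1:3] = [7, 8, 9]
print(a)   # [1, 7, 8, 9, 6]

# Assigning an empty list removes the slice entirely
a[1:4] = []
print(a)   # [1, 6]
```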

### Mutable objects. Tasks.

In [None]:
a = [10, 5, 4]
b = a
a[-2] = 7

What's in b? 

In [None]:
b

What's the secret?
If we recall the analogy of a cat in a box, everything becomes clear.
![cat](http://vokrugkota.ru/media/tumblr_mocf77rFWi1qb8jcho6_1280.jpg)

Your computer's memory has a great number of boxes. In one of them you put a cat (in this case, a list).
But you decided to give this box two names, **a** and **b**. Python did not create a new cat in a new box; the same box simply has two names now. Therefore, if you tease the cat through variable **a**, you risk getting scratched through variable **b** (meow).

In [None]:
colors = ['red', 'blue', 'green']
b = colors

![colors](https://developers.google.com/edu/python/images/list2.png)
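
One way to see the "two names, one box" situation directly is the `is` operator (or comparing `id()` values), which checks whether two names refer to the same object, while `==` only compares contents. A small sketch; the names `b` and `c` are illustrative, and `list(colors)` makes a copy, a technique covered just below:

```python
colors = ['red', 'blue', 'green']
b = colors             # a second name for the SAME list
c = list(colors)       # a NEW list with the same contents

print(b is colors)     # True: one object with two names
print(c is colors)     # False: a different object
print(c == colors)     # True: the contents are equal
print(id(b) == id(colors))  # True: same identity

b.append('yellow')     # changing the list through b...
print(colors)          # ...is visible through colors: ['red', 'blue', 'green', 'yellow']
print(c)               # the copy is unaffected: ['red', 'blue', 'green']
```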

To avoid this behavior, you have to make a copy of the list!

In [None]:
a = [10, 5, 4]
b = a[:]  # take a slice of the entire list, thus copying it
a[-2] = 7
print("a", a)
print("b", b)

You can also use the built-in **list** function:

In [None]:
a = [10, 5, 4]
b = list(a) # copying a list
a[-2] = 7
print("a", a)
print("b", b)

In [None]:
list("ATGCGAGCCG")  # list() accepts any iterable: a string becomes a list of its characters

You can also use the **copy** method, which lists and other mutable built-in collections provide:

In [None]:
a = [10, 5, 4]
b = a.copy()
a[-2] = 7
print("a", a)
print("b", b)

After an ordinary (shallow) copy, changes to one list are not propagated to the other, as long as they do not involve mutable objects nested inside it (for example, inner lists). However, if you change a nested list, the change affects both lists, because both copies refer to the same inner list.

In [None]:
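lst1 = ['a', 'b', ['ab', 'ba']]
lst2 = lst1.copy()  # shallow copy: the inner list is shared
lst2[2][1] = "d"    # changes the shared inner list, so lst1 sees it too
lst2[0] = "c"       # replaces an element of lst2 only
print(lst1)
print(lst2)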

![shallow_copy](https://www.python-course.eu/images/deep_copy_4)
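
The same sharing of nested objects is behind the earlier warning about multiplying lists by a number: `[[0] * 3] * 3` repeats a reference to one inner list rather than creating three separate lists. A minimal sketch of the pitfall and one safe alternative (the grid names are illustrative):

```python
# Pitfall: [row] * 3 repeats a reference to ONE inner list three times
bad_grid = [[0] * 3] * 3
bad_grid[0][0] = 1
print(bad_grid)                    # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(bad_grid[0] is bad_grid[1])  # True

# Safe: create a new inner list on every iteration
good_grid = []
for _ in range(3):
    good_grid.append([0] * 3)
good_grid[0][0] = 1
print(good_grid)                   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```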

For nested lists, and for mutable objects inside mutable objects in general, use `deepcopy` from the `copy` module.

In [None]:
from copy import deepcopy

lst1 = ['a','b',['ab','ba']]

lst2 = deepcopy(lst1)

lst2[2][1] = "d"
lst2[0] = "c"

print(lst1)
print(lst2)


![deep_copy](https://www.python-course.eu/images/deep_copy_5)

### List methods

Like many other built-in types in Python, lists have a set of functions designed to work with them, called methods. A method is called with the syntax **lst.method_name()**.

**append** - add an element to the end of the list

For example, we can build a list of all odd numbers up to 20:

In [None]:
lst = []
for i in range(20 + 1):
    if i % 2 == 1:
        lst.append(i)
        
print(lst)

**extend** - add all elements of another list (or any other iterable) to the end of our list

In [None]:
a = [1, 4]
b = [1, 2, 7]
c = a + b      # + creates a NEW list
print(a, b, c)
a[1] = 5       # changing a afterwards does not affect c
print(a, b, c)


In [None]:
lst = [1, 4]
lst.extend([1,2,7])
lst
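
The difference between `append` and `extend`, and between `extend` and `+`, is easy to mix up. A small sketch with illustrative names:

```python
lst1 = [1, 4]
lst1.append([1, 2, 7])   # append adds ONE element: the whole list
print(lst1)              # [1, 4, [1, 2, 7]]

lst2 = [1, 4]
lst2.extend([1, 2, 7])   # extend adds each element separately
print(lst2)              # [1, 4, 1, 2, 7]

# Unlike a + b, extend modifies the existing list instead of creating a new one
alias = lst2
lst2.extend([0])
print(alias)             # [1, 4, 1, 2, 7, 0]
```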

**count** - count the number of occurrences of an element in a list

In [None]:
lst = [1, 7, -5, 6, 0, 1]
lst.count(1)

**sort** - sort the list in place

In [None]:
lst.sort()
lst
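
If you need a sorted version without changing the original list, the built-in `sorted` function returns a new list instead. Both accept `reverse=True` for descending order. A short sketch:

```python
lst = [1, 7, -5, 6, 0, 1]

# sorted() returns a NEW sorted list and leaves the original untouched
new_lst = sorted(lst)
print(lst)       # [1, 7, -5, 6, 0, 1]
print(new_lst)   # [-5, 0, 1, 1, 6, 7]

# sort() changes the list in place and returns None
result = lst.sort(reverse=True)
print(lst)       # [7, 6, 1, 1, 0, -5]
print(result)    # None
```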

**index** - find the first index of a value in the list; if the value is not in the list, a `ValueError` is raised

In [None]:
lst.index(0)
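
Since `index` raises an error for a missing value, a search for a value that may be absent needs some care. A sketch of two common options (the value `100` is just an example of a missing element):

```python
lst = [-5, 0, 1, 1, 6, 7]
value = 100

# Option 1: check membership first
if value in lst:
    print(lst.index(value))
else:
    print(value, "is not in the list")

# Option 2: catch the ValueError raised by index
try:
    pos = lst.index(value)
except ValueError:
    pos = -1
print(pos)  # -1
```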

### String methods

For strings, in addition to the `split` method covered in the last lesson, there are many other methods. Note that, unlike list methods, string methods that "modify" a string return a NEW string and leave the original unchanged, since strings are immutable.

**strip** - removes line breaks and other whitespace characters from both ends of the string. Its counterparts `rstrip` and `lstrip` remove these characters from one side only.

In [None]:
a = "   hello \n"
b = a.strip()
print(b)
print("START:", a, "END")
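
A sketch comparing the three variants on the same string, using `repr` so that the remaining spaces and line breaks are visible. `strip` also accepts an optional string listing which characters to remove:

```python
a = "   hello \n"
print(repr(a.strip()))    # 'hello'
print(repr(a.lstrip()))   # 'hello \n'
print(repr(a.rstrip()))   # '   hello'

# Remove any combination of 'x' and 'y' from both edges only
print("xxhixyx".strip("xy"))  # hi
```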

**replace** - replace all occurrences of one substring with another

In [None]:
a = "hello, hello, hello"
b = a.replace("hello", "hell")
print(b)
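
`replace` also accepts an optional third argument limiting how many occurrences are replaced, counting from the left. As with other string methods, the original string is not changed:

```python
a = "hello, hello, hello"
b = a.replace("hello", "hell", 1)  # replace only the first occurrence
print(b)  # hell, hello, hello
print(a)  # hello, hello, hello
```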

**find** - find the first occurrence of a substring in a string and return its index. If the substring is not found, -1 is returned.

In [None]:
a = "Hello, world"
a.find("world")

In [None]:
a.find("wild")

In [None]:
a = "hello, hello, hello"
f_ind = a.find("hello")
print(f_ind)
print(a.find("hello", f_ind + 1))  # search from the specified position

In [None]:
?a.find

To check whether a string is contained in another string, you can simply use the **in** keyword:

In [None]:
print("wo" in "world")
print("he" in "world")

**startswith** - checks whether the string begins with the given substring

In [None]:
print(a.startswith("hello"))
print(a.startswith("ello"))

**endswith** - checks whether the string ends with the given substring

In [None]:
print(a.endswith("hello"))
print(a.endswith("he"))

**join** - join multiple strings into one, inserting the given separator string between them

In [None]:
", ".join(["1", "2", "apple"])

In [None]:
a = list("ATGATAGATAG")
a

In [None]:
"".join(a)
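
`join` only accepts strings, so a list of numbers has to be converted first. A small sketch using the built-in `map` to apply `str` to every element (the list `numbers` is illustrative):

```python
numbers = [1, 2, 3]

# Joining non-strings fails with a TypeError
try:
    ", ".join(numbers)
except TypeError as err:
    print("TypeError:", err)

# Convert every element to a string first
text = ", ".join(map(str, numbers))
print(text)  # 1, 2, 3
```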

# Dictionaries

We often need to find a value that corresponds to some other, non-numeric value: for example, a person's passport number given their full name.

A **dictionary** (dict) is a data type that stores a mapping from some values (the dictionary keys) to others (the dictionary values).
The easiest way to create a dictionary is to use curly braces (`{}`):

In [None]:
my_dict = { "Dmitry" : 5,
            "Alexander" : 10,
            "Jupyter": 20}
my_dict

To get the value for a key, write the name of the dictionary followed by the key in square brackets.

In [None]:
my_dict['Jupyter']

If there is no such key, you get a `KeyError`:

In [None]:
my_dict['Dogs']  # KeyError: 'Dogs'

You can use the `dict.get(key, default)` method to get a default value when the key is missing from the dictionary. If `default` is not given, `None` is returned.

In [None]:
print(my_dict.get("Dogs"))  # prints None

In [None]:
my_dict.get("Dogs", 5)

In [None]:
my_dict.get("Jupyter", 5)

#### Checking for a key in a dictionary

As with other collections, the **in** keyword checks whether a key is in the dictionary.

In [None]:
if "Dogs" in my_dict:
    print("Hello")
else:
    print("World")

### Loop through dictionary keys

You can loop through dictionary keys in two ways.

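Note that the dictionary methods that "return" its contents (`keys()`, `values()`, `items()`) return special view objects rather than lists, so they cannot be treated exactly like lists (for example, they do not support indexing).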

In [None]:
my_dict.keys()

In [None]:
for key in my_dict.keys():
    print(key)

In [None]:
for key in my_dict:
    print(key)
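
A short sketch of what "not-quite-lists" means in practice: the view returned by `keys()` cannot be indexed, but it stays linked to the dictionary and reflects later changes. Converting it with `list()` gives an ordinary list. The added key `Dogs` is illustrative:

```python
my_dict = {"Dmitry": 5, "Alexander": 10, "Jupyter": 20}
keys = my_dict.keys()

# Views do not support indexing...
try:
    keys[0]
except TypeError as err:
    print("TypeError:", err)

# ...but they reflect later changes to the dictionary
my_dict["Dogs"] = 3
print("Dogs" in keys)    # True
print(len(keys))         # 4

# Convert to a list when you need list behaviour
first_key = list(keys)[0]
print(first_key)         # Dmitry
```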

### Loop through dictionary values

In [None]:
my_dict.values()

In [None]:
for value in my_dict.values():
    print(value)

Looping through the keys and values of a dictionary:

In [None]:
for key in my_dict.keys():
    value = my_dict[key]
    print(key, value)

In [None]:
my_dict.items()

In [None]:
for key, value in my_dict.items():
    print(key, value)

In [None]:
list(my_dict.keys())

In [None]:
list(my_dict.values())

In [None]:
list(my_dict.items())

### Adding items to the dictionary


A dictionary can store any object as a value, but keys must be special objects: to a first approximation, immutable ones, such as numbers and strings. (More precisely, keys must be hashable.)

In [None]:
d = dict() # another way to create dict
# d = {}
d[10] = "Hello"
d

In [None]:
d["hello"] = "world"
d

In [None]:
d[10] = -100
d

In [None]:
my_key = "hi"
my_value = [1,2,3]
d[my_key] = my_value
d

In [None]:
d[[1,2,3]] = 5  # TypeError: unhashable type: 'list' (a list is mutable)

### Removing items from a dictionary

In [None]:
a = {"A": 0,
     "T": 1,
     "G": 2,
     "C": 3,
     "G": 4,  # duplicate key: this value overwrites "G": 2
     "U": 5}


In [None]:
a

In [None]:
del a['U']
a

In [None]:
del a['U']  # KeyError: 'U' has already been deleted

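The `pop` method is often more convenient: it removes the key and returns its value, and with a second argument it returns that default instead of raising an error when the key is missing.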

In [None]:
a = {"A" : 0, "T" : 1, "G": 2, "C" : 3, "G" : 4, "U" : 5}
a.pop('U')
a

In [None]:
a.pop('U')  # KeyError: 'U' was already removed

In [None]:
a.pop('U', None)  # returns the default (None) instead of raising KeyError

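**But you shouldn't do this!** Removing keys from a dictionary while iterating over it raises `RuntimeError: dictionary changed size during iteration`.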

In [None]:
a = {"A" : 0, 
     "T" : 1,
     "G": 2, 
     "C" : 3, 
     "G" : 4,
     "U" : 5}

for key in a.keys():
    if key == "C":
        a.pop(key)  # RuntimeError: dictionary changed size during iteration

You can work around the problem by iterating over a copy of the keys made with `list()`:

In [None]:
a = {"A" : 0, "TC" : 1, 
     "G": 2, "CG" : 3, 
     "GC" : 4, "U" : 5}

keys = list(a.keys())
for key in keys:
    if "C" in key:
        a.pop(key)
a
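
Another common approach, sketched here, builds a new dictionary that keeps only the wanted entries, using a dictionary comprehension (a construct not otherwise covered in this lesson). The original dictionary is left unchanged:

```python
a = {"A": 0, "TC": 1, "G": 2, "CG": 3, "GC": 4, "U": 5}

# Keep only the keys that do not contain "C"
filtered = {key: value for key, value in a.items() if "C" not in key}
print(filtered)  # {'A': 0, 'G': 2, 'U': 5}
print(len(a))    # 6: the original dictionary still has all its keys
```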

### Key order in the dictionary

Before Python 3.6, the order of keys in a dictionary was not guaranteed: the order in which `keys()` and similar methods returned them depended neither on the order in which the keys were inserted nor on how they compared to each other.

However, since Python 3.7, the **key order** in a dictionary is guaranteed to match the **order of insertion** (CPython 3.6 already behaved this way as an implementation detail). If a key already exists in the dictionary and you overwrite its value, the key keeps its position.
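
A minimal sketch of these ordering rules (Python 3.7 or newer assumed; the keys are arbitrary):

```python
d = {}
d["b"] = 1
d["a"] = 2
d["c"] = 3
print(list(d))   # ['b', 'a', 'c']: insertion order, not alphabetical

d["b"] = 100     # overwriting an existing key keeps its position
print(list(d))   # ['b', 'a', 'c']

del d["b"]       # deleting and re-inserting moves the key to the end
d["b"] = 1
print(list(d))   # ['a', 'c', 'b']
```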

### Advantages of dictionaries

Looking up a value by key and adding a new key take constant time on average, which is much faster than iterating over a list to find the desired value.
As a result, there is a huge range of tasks where dictionaries can and should be used.
Together with lists, dictionaries let you represent data structures of any complexity.

### Disadvantages of dictionaries

Dictionaries consume a lot of memory.
Dictionary operations are fast only on average: an individual operation (for example, an insertion that makes the dictionary grow its internal table) can occasionally take much longer.

# Tuples

If you really want to use a list as a dictionary key, you need to use a tuple instead. At first glance, a tuple differs from a list only in using parentheses instead of square brackets, for example:

In [None]:
ta = (1, 2, 5, 4)
ta[0]

In [None]:
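a = (1)    # just the integer 1: parentheses alone do not make a tuple
print(a)

a = (1,)   # the trailing comma is what makes a one-element tuple
print(a)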

A tuple is a lot like a list; the main difference is that a tuple is immutable.

In [None]:
ta[0] = 5  # TypeError: 'tuple' object does not support item assignment

In [None]:
dt = {}

dt[(1, 2)] = "Hello"
dt[(2, 1)] = "world"
dt

In [None]:
dt[(1,2)]
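
A list can be used as a key after converting it with the built-in `tuple` function. A short sketch with illustrative names:

```python
dt = {}
key_list = [1, 2]

# A list itself cannot be a key
try:
    dt[key_list] = "Hello"
except TypeError as err:
    print("TypeError:", err)

dt[tuple(key_list)] = "Hello"   # convert the list to a tuple first
print(dt)           # {(1, 2): 'Hello'}
print(dt[(1, 2)])   # Hello
```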

Besides being usable as dictionary keys, tuples can be somewhat faster for access operations (getting an element, taking a slice, etc.), and Python can apply optimizations to programs that create many small tuples that are not available for lists.

On the other hand, since tuples cannot be changed, every "edit" creates a new tuple, which, when done a lot, can slow down the program and use up your computer's memory.

# Set

In some cases, it is convenient to use the set data type. This is exactly a set in the mathematical sense: its elements are unique and cannot be repeated (unlike in a list or tuple).

In [None]:
s = {-10, 20, 45}
s.add(348)
s.add(-100000)
s.add(-10)
s

In [None]:
my_set1 = {1, 7, 9} # first way to create set
my_set2 = set([0, -4, 10, 1]) # second way
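
One caveat when creating sets: empty curly braces create an empty dictionary, not an empty set, so an empty set has to be created with `set()`:

```python
empty = {}              # this is an empty DICTIONARY
print(type(empty))      # <class 'dict'>

empty_set = set()       # the way to create an empty set
print(type(empty_set))  # <class 'set'>
```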


In [None]:
lst = [1, 10, -5, -5, 20]
my_set = set(lst)
my_set

In [None]:
for elem in my_set:
    print(elem)

A typical use of sets is keeping only the unique items of a list (note that the original order of the items is not preserved):

In [None]:
lst = [1, 10, -5, -5, 20, -10, 20, 0, 0, 0]
my_set = set(lst)
unique_lst = list(my_set)
unique_lst
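
If the original order matters, one common alternative (not the only one) relies on dictionary keys being unique and, since Python 3.7, kept in insertion order:

```python
lst = [1, 10, -5, -5, 20, -10, 20, 0, 0, 0]

# dict.fromkeys keeps the first occurrence of each item, in order
unique_ordered = list(dict.fromkeys(lst))
print(unique_ordered)  # [1, 10, -5, 20, -10, 0]
```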

Operations of union, intersection and difference are defined on sets.

In [None]:
my_set1 & my_set2 # intersection

In [None]:
my_set1 | my_set2 # union

In [None]:
my_set1 - my_set2 # difference 

In [None]:
my_set1.symmetric_difference(my_set2)  # elements that are in exactly one of the sets
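
A worked sketch with the same two sets, sorting the results only to get a predictable printed order. The `^` operator is the shorthand for `symmetric_difference`, and the operators also have method equivalents such as `intersection`:

```python
my_set1 = {1, 7, 9}
my_set2 = {0, -4, 10, 1}

print(sorted(my_set1 & my_set2))  # [1]
print(sorted(my_set1 | my_set2))  # [-4, 0, 1, 7, 9, 10]
print(sorted(my_set1 - my_set2))  # [7, 9]
print(sorted(my_set1 ^ my_set2))  # [-4, 0, 7, 9, 10]

# Method equivalents give the same results
print(my_set1.intersection(my_set2) == (my_set1 & my_set2))           # True
print(my_set1.symmetric_difference(my_set2) == (my_set1 ^ my_set2))   # True
```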

You can check whether an element is in a set using the already familiar **in** keyword:

In [None]:
5 in my_set1

In [None]:
nucleotides = {"A", "G", "T", "U", "C"}

n = input()

if not (n in nucleotides):  # works, but `n not in nucleotides` reads better
    print("error")


In [None]:
nucleotides = {"A", "G", "T", "U", "C"}

n = input()

if n not in nucleotides:
    print("error")


A set is a mutable type: it cannot be a dictionary key, but it can be a dictionary value. Mutability is what makes the `add`, `pop`, and `remove` methods possible, and they are convenient in some situations.

In [None]:
print(my_set1.pop())  # removes and returns an arbitrary element
my_set1

In [None]:
my_set1.add(5)
my_set1
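
Besides `remove`, sets have a `discard` method; the difference is how they treat a missing element. A small sketch with an illustrative set:

```python
s = {1, 7, 9}

s.discard(100)      # discard: a missing element is silently ignored
try:
    s.remove(100)   # remove: a missing element raises KeyError
except KeyError as err:
    print("KeyError:", err)

s.remove(7)
print(sorted(s))    # [1, 9]
```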

### Order of elements in a set

The order of elements in a set is still not guaranteed, so you should not rely on it. Your output may differ from someone else's, for example.

In [None]:
s = {-10, 20, 45}
s.add(348)
s.add(-100000)
s

## Frozenset

The immutable counterpart to `set` is **frozenset**. Unlike a regular set, it can be used as a key in a dictionary.

In [None]:
my_frozen = frozenset([4, 5, 6])
my_frozen

In [None]:
my_frozen.add(4)  # AttributeError: 'frozenset' object has no attribute 'add'
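
Because a frozenset is immutable (and hashable), it can serve as a dictionary key, for example when the key is an unordered group of items. A sketch with illustrative keys and values:

```python
pairs = {}
pairs[frozenset(["A", "T"])] = "AT"
pairs[frozenset(["G", "C"])] = "GC"

# Sets ignore element order, so this finds the same key
print(pairs[frozenset(["T", "A"])])  # AT

# A regular (mutable) set cannot be used as a key
try:
    pairs[{"A", "T"}] = "oops"
except TypeError as err:
    print("TypeError:", err)
```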

### The `collections` module

The `collections` module provides several variants of the standard dictionary that make some tasks much easier (for example, counting the distinct items in a list).

#### Counter

In [None]:
from collections import Counter

nucleotides = ["A", "T", "G", "C"]
seq = "ATAATATATATGAGGCGGCGCGCGCG"
cnt = Counter(seq)
print(cnt)
for n in nucleotides:
    print(cnt[n])
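
`Counter` has a few more conveniences worth knowing: `most_common` returns the most frequent items, a missing item counts as zero instead of raising `KeyError`, and `update` adds more counts. A short sketch with an arbitrary sequence:

```python
from collections import Counter

cnt = Counter("ATGCAATA")
print(cnt.most_common(2))  # [('A', 4), ('T', 2)]
print(cnt["U"])            # 0: missing items count as zero, no KeyError
cnt.update("GG")           # add two more G's
print(cnt["G"])            # 3
```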

#### defaultdict

In [None]:
from collections import defaultdict

dl_dict = defaultdict(list) 
# for each key that is not in the dictionary, but we asked for it - create an empty list by default
print(dl_dict)
print(dl_dict[1])
print(dl_dict)
dl_dict[1].append("A")
dl_dict[1].append("A")
dl_dict[2].append("A")
dl_dict[100].append("C")
print("dict: ", dl_dict, sep="   ")
print("dict[1]", dl_dict[1], sep="   ")
print("dict[20]", dl_dict[20], sep="   ")
print("dict", dl_dict, sep="  ")
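
Two common uses of `defaultdict`, sketched with illustrative data: grouping items into lists, and counting with `defaultdict(int)`, where a missing key starts at `0`:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "cherry", "blueberry"]

# Group words by their first letter: missing keys start as empty lists
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# defaultdict(int) starts missing keys at 0, which is handy for counting
counts = defaultdict(int)
for letter in "ATGCAATA":
    counts[letter] += 1
print(dict(counts))  # {'A': 4, 'T': 2, 'G': 1, 'C': 1}
```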