# Week 2
This week covers collections. The material is mostly taken from last year's Python programming fundamentals notebook, with added exercises.

## Collections: lists, tuples, sets, dictionaries

We can store several values together in a collection. The simplest and most intuitive Python collection is a list. To create a list, square brackets are used:
```python
my_list = []
type(my_list)
```

Lists can store any values: numbers, strings, booleans, other variables, even other lists or collections.

A list can be created with elements already provided:
```python
my_list = [1, 2.0, "three", [4, 5]]
```

Elements in lists are ordered, so each element can be accessed using an index. The process of getting a value by its index is called indexing:
```python
my_list[0]
```

You can select multiple elements by specifying a range:

```python
my_list[1:3]
```
Here, the syntax is **start:end**: the element at `start` is included, while the element at `end` is not. Remember that numbering begins at 0 in Python. To select elements counting from the back of the list, use negative numbers.
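
A minimal sketch of slicing and negative indices. The `letters` list is a made-up example and is not part of the original notebook:

```python
# `letters` is a hypothetical example list
letters = ['a', 'b', 'c', 'd', 'e']

first_two = letters[0:2]   # start is included, end is excluded
last_item = letters[-1]    # -1 refers to the last element
last_two = letters[-2:]    # leaving out the end selects up to the end of the list

print(first_two)   # ['a', 'b']
print(last_item)   # e
print(last_two)    # ['d', 'e']
```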

If you try to get an element by a non-existing index (equal to or bigger than the length of the list), you will get an error (`IndexError`):

```python
my_list[4]
```

Main functions to know and use with lists:
* `len(x)` returns the length of the `x` list. Returns a new value.
* `x.append(a)` appends the value of the `a` variable to the end of the `x` list. Does it **in-place**.
* `x.insert(index, a)` inserts `a` at the position specified by `index`. Does it **in-place**.
* `x.extend(y)` extends the list `x` with all elements of another list `y`. Does it **in-place**.
* `x.remove(a)` removes the first occurrence of the `a` value from the list. Does it **in-place**.
* `x.pop(i)` removes the element at index `i` and returns the removed value. Without an argument, it removes the last element. Does it **in-place**.
* `x.clear()` removes everything from the list. Does it **in-place**.
* `x.index(a)` finds the index of the `a` value inside `x`. Returns a new value.
  * If `a` occurs several times, the index of the first occurrence is returned: `[1, 2, 2, 3].index(2)` will always return 1.
* `x.count(a)` counts the number of `a` values inside `x`. Returns a new value.
* `sorted(x)` returns a new list which is a sorted copy of `x`. Values are sorted from smaller to bigger.
* `x.sort()` sorts the list **in-place**.
* `''.join(x)` converts a list of strings to a single string by concatenating all values together. Returns a new value.
  * You can add a symbol between values, e.g. `','.join(['a', 'b', 'c'])` will return `'a,b,c'`.
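
The list functions above can be tried out in sequence. This is only an illustrative sketch on a made-up list `x`; the comments show the state of the list after each step:

```python
x = [3, 1, 2]          # hypothetical example list

x.append(5)            # x is now [3, 1, 2, 5]
x.insert(1, 10)        # x is now [3, 10, 1, 2, 5]
x.extend([7, 1])       # x is now [3, 10, 1, 2, 5, 7, 1]
x.remove(1)            # removes only the first 1: [3, 10, 2, 5, 7, 1]
removed = x.pop(0)     # removes the element at index 0 and returns it

print(removed)         # 3
print(x)               # [10, 2, 5, 7, 1]
print(x.index(2))      # 1
print(x.count(1))      # 1
print(len(x))          # 5
print('-'.join(['a', 'b', 'c']))   # a-b-c
```

Note that only `pop`, `index`, `count`, `len` and `join` give back a value here; the other calls change `x` directly.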

**Understand the difference between in-place changes and creating copies**

Let's check two very similar functions: `sorted(x)` and `x.sort()`:
```python
x = [5, 7, 6, 8]
print('The sorted function returns a copy:', sorted(x))
print('You can see that the original list remains unchanged:', x)
print('But .sort() does not return anything: ', x.sort())
print('And the original list is now changed:', x)
```

In [1]:
my_list = [5, 6, 7, 8, 2, 10, 3]
len(my_list)
sorted(my_list)

[2, 3, 5, 6, 7, 8, 10]

### Exercises

1. Create a list containing 5 elements of 3 types and print it
2. Print the 3rd element of this list
3. Append an element to the list
4. Add an element at the second index in the list
5. In the list given below, the last element is itself a list. Add a `True` value to that inner list, then print the whole list to the screen.

  `test_list = [1, 2, 3, 4, 100, 200, 300, 400, "a", "b", "c", "d", [], [], [], []]`
6. Remove any three elements from the list.
7. When dealing with unknown data structures, indexing often takes getting used to. A messy, multi-level list is given below. Select and print the `'target'` element.

  ```python
messy_list = [[[[1, 2, 3],[4, 5, 6]],[[7, 8, 9],[10, 11, 12]]],[['target', 14, 15],[16, 17, 18]],[[19, 20, 21],[22, 23, 24]]]
```

In [2]:
some_list = [5, True, 'word', 5., 9]
print(some_list)
print(some_list[2])
some_list.append('new_element')
print(some_list)
some_list.insert(2, 9.14)
print(some_list)
some_list.insert(1, True)
print(some_list)

[5, True, 'word', 5.0, 9]
word
[5, True, 'word', 5.0, 9, 'new_element']
[5, True, 9.14, 'word', 5.0, 9, 'new_element']
[5, True, True, 9.14, 'word', 5.0, 9, 'new_element']


In [6]:
test_list = [1, 2, 3, 4, 100, 200, 300, 400, "a", "b", "c", "d", [], [], [], []]
test_list[-1].append(True)
print(test_list)
test_list.pop(4)
test_list.pop(1)
test_list.pop(2)
print(test_list)

[1, 2, 3, 4, 100, 200, 300, 400, 'a', 'b', 'c', 'd', [], [], [], [True]]
[1, 3, 200, 300, 400, 'a', 'b', 'c', 'd', [], [], [], [True]]


In [7]:
messy_list = [[[[1, 2, 3],[4, 5, 6]],[[7, 8, 9],[10, 11, 12]]],[['target', 14, 15],[16, 17, 18]],[[19, 20, 21],[22, 23, 24]]]
print(messy_list[1][0][0])


target


Lists allow in-place changes. In programming, such objects are called **mutable**. This is usually very convenient, but sometimes it is preferable for a collection to stay fixed and unchangeable.

There is a way to forbid any changes of a collection: use **tuples** instead of lists!

### Tuples

Tuples (`tuple` objects) are ordered but **immutable** collections.

They are defined with round brackets or the `tuple()` command. Let's see how tuples are immutable:

```python
my_list = [1, 2, 3]
my_list[1] = 100
print(my_list)  # we see that the list has a changed element
```

```python
my_tuple = (1, 2, 3)  # round brackets instead of square ones
my_tuple[1] = 100  # throws a TypeError!
```

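Some list-associated commands work with tuples as well, e.g.:
* `len(t)` - get the length of the `t` tuple
* `sorted(t)` - returns a sorted list; but not `t.sort()`!
* `t.count(a)` - count the number of `a` values inside `t`
* `''.join(t)` - concatenate a tuple of strings
* etc.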
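
A short sketch of these commands on tuples. The tuples `t` and `names` are made-up examples:

```python
t = (3, 1, 2, 1)             # hypothetical example tuple

print(len(t))                # 4
print(t.count(1))            # 2
print(sorted(t))             # [1, 1, 2, 3]
print(t)                     # (3, 1, 2, 1)

names = ('a', 'b', 'c')      # join() needs a collection of strings
print(''.join(names))        # abc

as_tuple = tuple([1, 2, 3])  # tuple() converts another collection into a tuple
print(as_tuple)              # (1, 2, 3)
```

`sorted(t)` gives back a list and leaves the tuple itself untouched, which is why it works even though tuples are immutable.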

### Sets

Both lists and tuples can hold duplicate values inside. If you need a collection of unique values, you should use sets (`set` objects). They are the same as those defined in mathematics (*LT - aibė*).

Sets are defined with the `set()` command. You can also sometimes use curly brackets:
* If you write something like `{1, 2, 3}`, it will be recognized as a set.
* However, `{}` will create an empty dictionary instead.
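
The difference between the two kinds of curly brackets can be checked with `type()`. The variable names below are chosen only for illustration:

```python
empty_dict = {}        # empty curly brackets give a dictionary
empty_set = set()      # an empty set has to be created with set()
small_set = {1, 2, 3}  # curly brackets with values give a set

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>
print(type(small_set))   # <class 'set'>
```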

```python
my_set = {1, 2, 2, 3}
print(my_set)  # notice that the duplicated 2 was automatically removed
```

You can use the `set()` function to get the unique values from any collection:

```python
my_list = [1, 1, 1, 2, 2, 3]
print(set(my_list))
my_tuple = (1, 1, 1, 2, 2, 3)
print(set(my_tuple))
```
Sets are **unordered** and cannot be indexed:
```python
my_set[0]  # raises a TypeError
```

You can use the `sorted()` function on a set, but it will return a list:

```python
a = {1, 3, 2}
print(type(a))
print(type(sorted(a)))
```

Sets have three main operations:
* Union (∪), or `|` in Python
* Intersection (∩), or `&` in Python
* Difference (-, minus, except), or `-` in Python
```python
set_0 = {1, 2, 3, 4, 5}
set_1 = {3, 4, 5, 6}
print('Union:', set_0 | set_1)
print('Intersection:', set_0 & set_1)
print('Set 0 minus set 1:', set_0 - set_1)
```
Sets also have some other functions (note that sets are **mutable**):
* `s.add(a)` - add the value of the `a` variable (if it is not there already). In-place.
* `s.discard(a)` - remove the value of `a` (if it is there; otherwise nothing happens). In-place.
* `s.update(s1)` - add all values from the `s1` set (basically a union of `s` and `s1`). In-place.
* `s.union(s1)` - same as `s | s1`. Returns a new set.
* `s.intersection(s1)` - same as `s & s1`. Returns a new set.
* `s.difference(s1)` - same as `s - s1`. Returns a new set.
* `len(s)` - get the number of elements in the set
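
A small sketch of these set functions, using two made-up sets `s` and `s1`. Sets are printed through `sorted()` here so that the output order is predictable:

```python
s = {1, 2, 3}          # hypothetical example sets
s1 = {3, 4}

s.add(4)               # s is now {1, 2, 3, 4}
s.add(4)               # adding an existing value changes nothing
s.discard(1)           # s is now {2, 3, 4}
s.discard(100)         # 100 is not in the set: no error, no change
print(len(s))          # 3

# These three return new sets and leave s unchanged
print(sorted(s.union(s1)))          # [2, 3, 4]
print(sorted(s.intersection(s1)))   # [3, 4]
print(sorted(s.difference(s1)))     # [2]

s.update({10, 11})     # in-place union
print(sorted(s))       # [2, 3, 4, 10, 11]
```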

In [24]:
my_set = set([5, 6, 6, 7])
my_set

{5, 6, 7}

### Exercises

You have two tuples of genes sequenced from neural and immune cells.

`immune = ('BDNF', 'CD3E', 'CX3CR1', 'FOXP3', 'LCK', 'FOXP3', 'CD8A', 'STAT3', 'CD19', 'IL7R', 'FOXP3', 'CD8A', 'TREM2', 'CCR7', 'CD4', 'FOXP3', 'CX3CR1', 'LCK', 'STAT3', 'GATA3')`
  
`neuron = ('SNAP25', 'SYN1', 'MAP2', 'BDNF', 'TREM2', 'NEUROD1', 'CX3CR1', 'SYN1', 'GAP43', 'GRIN1', 'CHRNA7', 'SNAP25', 'TH', 'GAP43', 'TREM2', 'BDNF', 'STAT3', 'SLC6A3', 'BCL11B')`

1. Print how many unique genes are in each cell type
2. Print how many unique genes are found across both cell types combined
3. Print how many genes are shared by both cell types. Print these genes

In [11]:
immune = ('BDNF', 'CD3E', 'CX3CR1', 'FOXP3', 'LCK', 'FOXP3', 'CD8A', 'STAT3', 'CD19', 'IL7R', 'FOXP3', 'CD8A', 'TREM2', 'CCR7', 'CD4', 'FOXP3', 'CX3CR1', 'LCK', 'STAT3', 'GATA3')
neuron = ('SNAP25', 'SYN1', 'MAP2', 'BDNF', 'TREM2', 'NEUROD1', 'CX3CR1', 'SYN1', 'GAP43', 'GRIN1', 'CHRNA7', 'SNAP25', 'TH', 'GAP43', 'TREM2', 'BDNF', 'STAT3', 'SLC6A3', 'BCL11B')
# 1
immune_set = set(immune)
neuron_set = set(neuron)
print(len(immune_set))
print(len(neuron_set))
# 2
print(len(immune_set|neuron_set))
# 3
both_set = immune_set&neuron_set
print(len(both_set))
print(both_set)


13
14
23
4
{'STAT3', 'CX3CR1', 'TREM2', 'BDNF'}


In [29]:
var = 5
#print('my variable is')
#print( var)
print(f'my variable is {var}')


my variable is 5


### Dictionaries

Dictionaries, or `dict` objects, are different from lists, tuples and sets in that they store **pairs** of elements instead. In these pairs, one element is called `key` and the other is `value`. Like in paper dictionaries, keys are what you use to find values: e.g. you search for a word to find its translation to another language.

Empty dictionaries are created with either `dict()` or curly brackets `{}`. To define key-value pairs, the `:` symbol is used between them, e.g.:

```python
my_dict = {'vardas': 'name', 'laikas': 'time'}
```

In this case, `'vardas'` and `'laikas'` are keys, and `'name'` and `'time'` are values.

Most objects can be used as dictionary elements: numbers, strings, booleans and tuples can be both keys and values.
* Keys must be immutable (more precisely, hashable). Lists, sets and other dictionaries cannot be keys. Keys are also unique: assigning a value to an existing key overwrites the old value.
* Values can be almost anything, including lists, sets, dictionaries, even functions. Values can repeat in the same dictionary.
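
The rules for keys and values can be illustrated as follows. The `mixed_keys` dictionary is a made-up example; the line that would fail is left commented out so the cell still runs:

```python
# `mixed_keys` is a hypothetical example dictionary
mixed_keys = {1: 'int key', 'two': 'str key', (3, 4): 'tuple key'}
print(mixed_keys[(3, 4)])      # tuple key

# A list is mutable, so it cannot be used as a key:
# mixed_keys[[3, 4]] = 'list key'   # raises a TypeError (unhashable type)

# Keys are unique: assigning to an existing key replaces its value
mixed_keys['two'] = 'replaced'
print(len(mixed_keys))         # 3
print(mixed_keys['two'])       # replaced

# Values may repeat and may be collections themselves
mixed_keys['copy'] = 'replaced'
mixed_keys['numbers'] = [1, 2, 3]
print(len(mixed_keys))         # 5
```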

Dictionaries cannot be indexed by position. Instead, keys should be used to find respective values. To find a value by its key, square brackets are used:

```python
my_dict['vardas']
```

You can add a new key-value pair using square brackets:
```python
my_dict['simplified_pi'] = 3.14
```

Main functions for `dict` objects are (examples with a `d` dictionary):
* `d.keys()` - get all keys of the dictionary
* `d.values()` - get all values
* `d.items()` - will return key-value pairs as tuples
* `d.get(key)` - another way to get a value using a `key` (see example below)
* `d.update(d1)` - add key-value pairs (with possible overwrite) from `d1` to `d`
* `len(d)`
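
A quick sketch of these functions on two made-up dictionaries, `d` and `d1`. Wrapping the results in `list()` is only for readable printing:

```python
d = {'a': 1, 'b': 2}     # hypothetical example dictionaries
d1 = {'b': 20, 'c': 3}

print(list(d.keys()))    # ['a', 'b']
print(list(d.values()))  # [1, 2]
print(list(d.items()))   # [('a', 1), ('b', 2)]
print(d.get('a'))        # 1

d.update(d1)             # 'b' is overwritten, 'c' is added
print(d)                 # {'a': 1, 'b': 20, 'c': 3}
print(len(d))            # 3
```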

Quite often it may be better to use the `.get()` function instead of square brackets to get values. That is because the square brackets approach raises a `KeyError` if the key does not exist in the dictionary, while `.get()` returns a default value instead:
```python
# my_dict['spring']  # would raise a KeyError: this key is not in the dictionary
print('Default is None:', my_dict.get('spring'))
print('Custom default:', my_dict.get('spring', 2024))
```

In [34]:
dictionary = {'key1' : 'value1', 'listkey':[1, 2, 3, 4]}
dictionary['key2'] = 10
dictionary
#dictionary.get('key3','key2')
dictionary.values()
dictionary.keys()
# dictionary.items() - we use it when we want to create a list without losing the information about keys and values.

dict_keys(['key1', 'listkey', 'key2'])

### Exercises

You have a dictionary of cell types and a gene commonly used for their identification:
```python
cells = {
    "t_cell": "CD3E",
    "b_cell": "CD19",
    "macrophage": "CD68",
    "neuron": "RBFOX3",
    "hepatocyte": "ALB",
    "cardiomyocyte": "TNNT2",
    "adipocyte": "LEP",
    "fibroblast": "COL1A1",
    "pancreatic_b": "INS",
    "endothelial": "INS",
}
```
1. Check what cell types are available in this dictionary
2. Add another type of cell to this dictionary (use Google)
3. Print the gene specific to adipocytes
4. The endothelial cell gene is wrong. Change it to *PECAM1*

Another dictionary holds quick explanations of each gene function. Check one selected gene, and see what happens when you try to look for a gene that is not in the dictionary, such as the one you added. Use `.get()` to prevent a `KeyError`.

```python
functions = {
    "CD3E":"Part of the T-cell receptor complex, essential for T-cell development and activation.",
    "CD19":"A B-cell surface molecule involved in B-cell activation and development.",
    "CD68":"A glycoprotein used as a marker for macrophages, involved in phagocytosis.",
    "RBFOX3":"A neuron-specific protein used to identify mature neurons.",
    "ALB":"The major protein produced by liver cells, involved in maintaining blood osmotic pressure.",
    "TNNT2":"A cardiac-specific protein that regulates muscle contraction in the heart.",
    "LEP":"A hormone secreted by adipocytes that regulates energy balance and fat storage.",
    "COL1A1":"Produces type I collagen, the most abundant protein in connective tissues.",
    "INS":"The gene encoding insulin, a hormone essential for glucose regulation, produced specifically by pancreatic beta cells.",
    "PECAM1":"An adhesion molecule involved in angiogenesis and a marker of endothelial cells.",
}
```



In [22]:
cells = {
    "t_cell": "CD3E",
    "b_cell": "CD19",
    "macrophage": "CD68",
    "neuron": "RBFOX3",
    "hepatocyte": "ALB",
    "cardiomyocyte": "TNNT2",
    "adipocyte": "LEP",
    "fibroblast": "COL1A1",
    "pancreatic_b": "INS",
    "endothelial": "INS",
}
print(f'Available cell types are {list(cells.keys())}')
cells['plant_cell'] = 'PL'
print(cells)
print(cells.get('adipocyte'))
cells['endothelial'] = 'PECAM1'
print(cells)

Available cell types are ['t_cell', 'b_cell', 'macrophage', 'neuron', 'hepatocyte', 'cardiomyocyte', 'adipocyte', 'fibroblast', 'pancreatic_b', 'endothelial']
{'t_cell': 'CD3E', 'b_cell': 'CD19', 'macrophage': 'CD68', 'neuron': 'RBFOX3', 'hepatocyte': 'ALB', 'cardiomyocyte': 'TNNT2', 'adipocyte': 'LEP', 'fibroblast': 'COL1A1', 'pancreatic_b': 'INS', 'endothelial': 'INS', 'plant_cell': 'PL'}
LEP
{'t_cell': 'CD3E', 'b_cell': 'CD19', 'macrophage': 'CD68', 'neuron': 'RBFOX3', 'hepatocyte': 'ALB', 'cardiomyocyte': 'TNNT2', 'adipocyte': 'LEP', 'fibroblast': 'COL1A1', 'pancreatic_b': 'INS', 'endothelial': 'PECAM1', 'plant_cell': 'PL'}
