# Collections of things

From your experience with R and the little bit of Python you've done, you already know a bit about the core types of objects (e.g. integers, floats and strings). And playing with Biopython you've found some of the custom classes (like the `SeqRecord` object). This lesson is about how to collect up items.

## Lists (collecting random assortments of stuff)

Lists are the general-purpose collection for any combination of items (unlike an R vector, there is no requirement that items be of the same type). Check out [SW Carpentry for the basic intro](https://swcarpentry.github.io/python-novice-inflammation/05-lists/index.html). Here we'll show how to record information from biological data. For instance, let's read in our fastq file...


In [21]:
from Bio import SeqIO

recs = SeqIO.parse("../first_task/reads.fq", "fastq")

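... and now record the length of each sequence in the object. To do this we will start by making an empty list (using the square bracket syntax `[]`) and iterate through the records, recording the length of each one and then appending it to the list. (`SeqIO.parse` returns an iterator that can only be looped over once, which is why the cells in this lesson re-create `recs` before each loop.)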

In [25]:
recs = SeqIO.parse("../first_task/reads.fq", "fastq")
seq_lens = []
for seq in recs:
    this_ones_length = len(seq)
    seq_lens.append(this_ones_length)
seq_lens

[100, 100, 100, 100, 100, 100, 100, 100, 100, 100]

Starting an empty list and appending to it is a very common idiom in a lot of languages, but in Python there is an alternative called a list comprehension. This replaces the loop above with a single line. It is up to you whether you prefer the list comprehension or the for-loop syntax.

In [27]:
recs = SeqIO.parse("../first_task/reads.fq", "fastq")
[len(seq) for seq in recs]

[100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
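The two idioms above can also be tried without the fastq file. Here is a minimal sketch, assuming a made-up list of short read strings (`reads` and its contents are invented for illustration, not taken from `reads.fq`). It also shows a comprehension with a filter and a list holding mixed types.

```python
# Hypothetical reads, made up for illustration (not from reads.fq)
reads = ["ATGGCA", "ATG", "ATGGCATCAC", "AT"]

# For-loop version: start with an empty list and append as we go
lens_loop = []
for read in reads:
    lens_loop.append(len(read))

# List comprehension version: same result in one line
lens_comp = [len(read) for read in reads]

# A comprehension can also filter: keep only reads of at least 5 bases
long_reads = [read for read in reads if len(read) >= 5]

# Unlike an R vector, a list can mix types (even another list)
mixed = [1, 2.5, "three", reads]

print(lens_loop)
print(lens_comp)
print(long_reads)
print([type(item).__name__ for item in mixed])
```

Both versions give the same list of lengths, so the choice between them is mostly about readability.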

## Dictionaries (storing things to look up later)

Dictionaries are a way to store information so that you can look it up later. Say you want to remember when people won their Nobel prizes for some reason. Here's how you'd make a dictionary to look that up. The syntax here is `key: value`, where the key can be used to retrieve the value. Values can be almost any data type, while keys need to be 'hashable'... which is a concept for later!


In [34]:
nobel_dict = {"McClintock": 1983, "Lederberg": 1958, "Delbrück": 1969}

To retrieve a value you use square brackets with the key

In [35]:
nobel_dict["McClintock"]

1983
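Looking up a key that is not in the dictionary raises a `KeyError`. The sketch below shows a few common ways to check for keys and to loop over a dictionary, reusing the same `nobel_dict`; "Darwin" is just an example of a key that is absent.

```python
nobel_dict = {"McClintock": 1983, "Lederberg": 1958, "Delbrück": 1969}

# `in` tests whether a key is present
has_mcclintock = "McClintock" in nobel_dict
has_darwin = "Darwin" in nobel_dict

# .get() returns a default (None unless you supply one) instead of raising KeyError
darwin_year = nobel_dict.get("Darwin")
darwin_text = nobel_dict.get("Darwin", "no prize recorded")

print(has_mcclintock, has_darwin, darwin_year, darwin_text)

# .items() lets you loop over keys and values together
for name, year in nobel_dict.items():
    print(name, year)
```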

Let's use a dictionary to store our sequencing reads to look up later. This example uses the other syntax for building a dictionary: assigning a given value to a key using the square bracket notation.

In [38]:
seq_dict = {}  # or dict() to initialize an empty dictionary
recs = SeqIO.parse("../first_task/reads.fq", "fastq")
for seq in recs:
    seq_dict[ seq.id ] = seq
seq_dict


{'JpRwMsVW': SeqRecord(seq=Seq('ATGGCATCACCGTCACCACCAGGGGAGGCGTCGCCTGCGCAGCGCACCACCGTC...CCC', SingleLetterAlphabet()), id='JpRwMsVW', name='JpRwMsVW', description='JpRwMsVW', dbxrefs=[]),
 'LISzqTNF': SeqRecord(seq=Seq('ATGGCATCAGCGTCACCACCAGGGGACGCGTCGCCTGCGCAGCGCACCACCGTC...CCC', SingleLetterAlphabet()), id='LISzqTNF', name='LISzqTNF', description='LISzqTNF', dbxrefs=[]),
 'QevuyjfB': SeqRecord(seq=Seq('ATGGCAGCACCGTCACCACCAGGGGACGCGTCGCCTGCGCAGCGCACCACCGTC...CCC', SingleLetterAlphabet()), id='QevuyjfB', name='QevuyjfB', description='QevuyjfB', dbxrefs=[]),
 'WdkVXRjQ': SeqRecord(seq=Seq('ATGGCATCACCGTCACCACCAGGGGACGCGTCGCCTGCGCAGCGCACCACCGTC...CCC', SingleLetterAlphabet()), id='WdkVXRjQ', name='WdkVXRjQ', description='WdkVXRjQ', dbxrefs=[]),
 'ZFbfxWsl': SeqRecord(seq=Seq('ATGGCATCACCGTCACCACCAGGGGACGCGTCGCCTGCGCAGCGCACCACCGTC...CCC', SingleLetterAlphabet()), id='ZFbfxWsl', name='ZFbfxWsl', description='ZFbfxWsl', dbxrefs=[]),
 'agoPnkEt': SeqRecord(seq=Seq('ATGGCATCACCGTCACCACCAGG

To look up a given sequence we use the square brackets again

In [39]:
seq_dict['JpRwMsVW']

SeqRecord(seq=Seq('ATGGCATCACCGTCACCACCAGGGGAGGCGTCGCCTGCGCAGCGCACCACCGTC...CCC', SingleLetterAlphabet()), id='JpRwMsVW', name='JpRwMsVW', description='JpRwMsVW', dbxrefs=[])
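Just as the list-building loop has a list comprehension equivalent, the dictionary-building loop can be written as a dict comprehension. This is a minimal sketch that uses hypothetical `(id, sequence)` pairs in place of `SeqRecord` objects, so it runs without Biopython or the fastq file; the ids and sequences are invented for illustration.

```python
# Hypothetical (id, sequence) pairs standing in for SeqRecord objects
records = [("read_1", "ATGGCA"), ("read_2", "ATGGCATCAC"), ("read_3", "ATG")]

# Loop version, following the same pattern as the lesson
seq_dict_loop = {}
for rec_id, seq in records:
    seq_dict_loop[rec_id] = seq

# Dict comprehension version: `key: value` for each item
seq_dict_comp = {rec_id: seq for rec_id, seq in records}

print(seq_dict_comp["read_2"])
print(seq_dict_loop == seq_dict_comp)
```

With the real records, the comprehension would presumably look like `{seq.id: seq for seq in recs}`. Biopython also provides `SeqIO.to_dict` for building this kind of id-keyed dictionary.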

## Other collections

The other collections you might run into are tuples (like lists, but the values can't be overwritten once formed) and sets (like mathematical sets, so you can do operations like union and intersection). You probably won't need these often. There is also a whole collection of collections in the module... `collections`. One of them, `Counter`, is quite useful for... counting things


In [45]:
from collections import Counter
a_list = ["A", "A", "B", "A", "C", "B", "A"]
freqs = Counter(a_list)
freqs

Counter({'A': 4, 'B': 2, 'C': 1})

In [46]:
freqs["A"]

4
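For completeness, here is a short sketch of the tuple and set behaviour described above, plus one more `Counter` method, `most_common`. The gene names and variable names are made up for illustration.

```python
from collections import Counter

# Tuples: like lists, but items cannot be reassigned once formed
codon = ("A", "T", "G")
try:
    codon[0] = "C"
except TypeError as err:
    print("tuples can't be changed:", err)

# Sets: unique items, with union and intersection operations
sample_1 = {"geneA", "geneB", "geneC"}
sample_2 = {"geneB", "geneC", "geneD"}
shared = sample_1 & sample_2      # intersection
combined = sample_1 | sample_2    # union
print(sorted(shared))
print(sorted(combined))

# Counter: most_common(n) returns the n most frequent items with their counts
freqs = Counter(["A", "A", "B", "A", "C", "B", "A"])
top_two = freqs.most_common(2)
print(top_two)
```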