# Lecture 5 - Strings, dictionaries and word counts

Question: For a given text, how would you count the occurrences of each word? For a single word, we could use the `count()` method, but what about counting all words at once?

Input: *"One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed ..."*

Output: *{'one': 1, 'morning': 1, 'when': 1, 'gregor': 2, 'samsa': 2, 'woke': 1, 'from': 1, 'troubled': 1, 'dreams': 1, 'he': 7, 'found': 1,...*


## Strings

Please visit the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for extensive information about strings. For a complete list of string methods, see [this](https://www.w3schools.com/python/python_ref_string.asp) page.

In [None]:
couple_of_words = ['hello','world']
sentence = 'It was the best of times, it was the worst of times.'

### Accessing values in strings

In [None]:
sentence[3]

In [None]:
couple_of_words[1]

In [None]:
couple_of_words[1][2]

In [None]:
equivalent_list = [['h','e','l','l','o'], ['w','o','r','l','d']]
equivalent_list[1][2]  # same result as couple_of_words[1][2]: 'r'

As with lists, slicing works on strings too.

In [None]:
sentence[37:]

### Escape characters

Special characters such as tab and newline are written inside strings as escape sequences (for example `\t` and `\n`). Please refer to the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for the full list.

In [None]:
print("hello\tworld!\nhello again!")

Please visit the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for a list of "String Special Operators".

Question: If you want to print "\n" literally, how would you do it?

In [None]:
print("\\n is newline char")
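Another way to do this is a *raw string*: with an `r` prefix, Python keeps backslashes as ordinary characters. A quick check that the two spellings produce the same string:

```python
escaped = "\\n is newline char"   # backslash escaped by doubling it
raw = r"\n is newline char"       # raw string: backslashes are not interpreted

print(escaped)
print(raw)
print(escaped == raw)   # True: both contain a backslash followed by 'n'

print(len("\n"))        # 1: a single newline character
print(len(r"\n"))       # 2: a backslash and the letter 'n'
```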

### String concatenation

The `+` operator concatenates (combines) strings.

In [None]:
name="John"
lastname="Doe"
payment=53.444

In [None]:
print(name + lastname)

In [None]:
print(name + " " + lastname)

In [None]:
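# try this: it raises a TypeError, because payment is a float and + only joins strings
#print(name + " " + lastname + " owes $" + payment)
# converting the number with str() first works:
print(name + " " + lastname + " owes $" + str(payment))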

### String formatting

`%` is Python's printf-style formatting operator; when the format string has more than one placeholder, the values must be supplied as a **tuple**.

In [None]:
data = ("John", "Doe", 53.4444)
format_string = "Hello %s %s. Your current balance is $%.2f"

print(format_string % data)

Please visit [this blog post](https://realpython.com/python-string-formatting/) for in-depth information about string formatting in Python. Let's use an f-string (formatted string literal) to print similar output.

In [None]:
f"Hello {name} {lastname}. Your current balance is ${payment}."
#f"Hello {name.upper()} {lastname.upper()}. Your current balance is ${payment:0.2f}."

In [None]:
f"Hello {name+lastname}. Your current balance is ${payment:0.2f}."
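The blog post above also covers the `str.format()` method, which sits between `%` formatting and f-strings. A minimal sketch producing the same message both ways (the variables are redefined so the cell runs on its own):

```python
name = "John"
lastname = "Doe"
payment = 53.444

# str.format() fills the {} placeholders in order; {:.2f} rounds to 2 decimals
message = "Hello {} {}. Your current balance is ${:.2f}.".format(name, lastname, payment)
print(message)   # Hello John Doe. Your current balance is $53.44.

# The equivalent f-string
f_message = f"Hello {name} {lastname}. Your current balance is ${payment:.2f}."
print(message == f_message)   # True
```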

### String methods

Please refer to [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for full list of methods.

In [None]:
sentence = 'It was the best of times, it was the worst of times.'
sentence.upper()

In [None]:
sentence.count('e')

> Some functions that work on lists, such as `len()`, also work on strings, and some methods, such as `count()` and `index()`, exist for both!

In [None]:
len(sentence)

In [None]:
sentence.replace('t','s',3)

In [None]:
sentence.lower().split()
#sentence.lower().split(",")
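`split()` turns a string into a list of words; the string method `join()` does the reverse, gluing a list of strings together with the string it is called on as the separator. A small sketch:

```python
sentence = 'It was the best of times, it was the worst of times.'

words = sentence.lower().split()   # split on whitespace
print(words[:4])                   # ['it', 'was', 'the', 'best']

parts = sentence.split(",")        # split on commas instead
print(parts)                       # ['It was the best of times', ' it was the worst of times.']

rejoined = " ".join(words)         # glue the words back together with single spaces
print(rejoined)                    # it was the best of times, it was the worst of times.
```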

How can we calculate the reverse complement of a DNA string (replace each base by its pair, A↔T and C↔G, then reverse the order)?

In [None]:
a_gene= "ACGACTACGACTA"
a_gene.replace('A','T').replace('T','A')

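Uh oh! That's wrong. The second `replace()` also turns the `T`s we just created back into `A`s, so every `T` ends up as `A` (and the string is not reversed either). Looks like we need another method.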

## Sets and tuples

Remember, in addition to lists, Python has tuples and sets. Tuples are immutable lists (their elements cannot be changed or replaced), and sets are unordered collections with no duplicate elements.

### Differences between lists, sets, and tuples

| List                             | Set                                                                 | Tuple                               |
| -------------------------------- | ------------------------------------------------------------------- | ----------------------------------- |
| Mutable                          | Mutable                                                             | Immutable                           |
| Ordered collection of items      | Unordered collection of items                                       | Ordered collection of items         |
| Items can be replaced or changed | Items can be added or removed, but not accessed or replaced by index | Items cannot be changed or replaced |

### Tuples 
Tuples are similar to lists: they allow duplicates and their elements are ordered. Some list functions and methods also work for tuples.

In [None]:
t = ('red','green','blue','yellow','blue','blue')
print(len(t))
print(t.index('blue'))
print(t.count('blue'))
print('yellow' in t)
print(t[3])

However, assigning to a tuple element, or growing the tuple with `append`, `insert` or `extend`, is not allowed; both cells below raise an error.

In [None]:
t[2] = 'pink'  # raises TypeError: tuples do not support item assignment

In [None]:
t.append('white')  # raises AttributeError: tuples have no append() method

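Tuples are:
* useful to group elements of different types together
* safer than lists for data that should not change, since they cannot be modified by accident
* usually slightly faster to create and more memory-efficient than lists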
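A minimal sketch of the first point, plus one consequence of immutability: a tuple can bundle values of different types and be unpacked into separate variables, and because tuples are hashable they can be used as dictionary keys (lists cannot). The variable names are illustrative only.

```python
# A record grouping values of different types: (name, age, height in metres)
person = ("Jane", 22, 1.68)

# Tuple unpacking assigns each element to its own variable
person_name, person_age, person_height = person
print(person_name, person_age, person_height)   # Jane 22 1.68

# Tuples can be dictionary keys; using a list here would raise
# TypeError: unhashable type: 'list'
grid = {(0, 0): "start", (2, 3): "goal"}
print(grid[(2, 3)])   # goal
```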

### Sets 

Sets don't have duplicates and they're not indexable.

In [None]:
set_example = {1, 1, 2, 3, 3, 3}
print(set_example)

fruit_set = {'🍎', '🍓', '🍐', '🍎', '🍎', '🍓'}
print(fruit_set) 

vehicle_set = {'🚐', '🏍', '🚗'}
vehicle_set[0]  # raises TypeError: sets are not indexable

Sets can be used to extract the unique elements of a list: the `set()` function converts a list (or a tuple) into a set.

In [None]:
a_list = [1,2,3,2,3,4,4,1,5]
set(a_list)
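Since sets are unordered, the result does not keep the list's order. Two common ways to get an ordered list of unique values are sketched below on a hypothetical `colors` list: `sorted(set(...))` sorts them, while `dict.fromkeys(...)` (a dictionary method listed later) keeps the order of first appearance, because dictionary keys are unique and remember insertion order in Python 3.7 and later.

```python
colors = ['blue', 'red', 'blue', 'green', 'red']

print(len(set(colors)))               # 3 distinct values

unique_sorted = sorted(set(colors))   # unique values in alphabetical order
print(unique_sorted)                  # ['blue', 'green', 'red']

unique_in_order = list(dict.fromkeys(colors))   # unique values in first-seen order
print(unique_in_order)                # ['blue', 'red', 'green']
```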

## Dictionaries

Let's imagine we would like to store ages of people. Since we just learned lists, let's keep that information in two lists, one for names, other one for actual ages.

In [None]:
names = ['Jane', 'John', 'Sam']
ages = [22, 25, 24]

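In such a small list, we can retrieve the age of "John" by accessing the second element of the `ages` list with `ages[1]`. However, if the list is long, then we have a problem: we have to locate the position of "John" in `names` first, and only then can we look up the same position in `ages`.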
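The two-step lookup with parallel lists looks like this: `list.index()` scans `names` from the start to find the position (and raises `ValueError` if the name is missing), and that position is then used in `ages`. The last two lines preview the dictionary version, where the name itself is the key.

```python
names = ['Jane', 'John', 'Sam']
ages = [22, 25, 24]

# Step 1: search names for the position of "John"
position = names.index('John')   # 1
# Step 2: use that position in the ages list
print(ages[position])            # 25

# A dictionary maps each name directly to its age, so no search step is needed
age_of = {'Jane': 22, 'John': 25, 'Sam': 24}
print(age_of['John'])            # 25
```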


Dictionaries are sometimes found in other languages as "hash tables" or "associative arrays". Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by **keys**, which can be any immutable type; strings and numbers can always be keys. 

It is best to think of a dictionary as a set of *key: value* pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: `{}`. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

In [None]:
a_list = [9,'a',True,15]

a_dictionary = {'Mon':23, 'Tue':25, 'Wed':24, 'Thu':26, 'Fri':28}

print(a_list[1])

print(a_dictionary['Wed'])

a_list[2] = 10

a_dictionary['Fri'] = 30

Let's visualize this code at Pythontutor site ([link](https://pythontutor.com/render.html#code=a_list%20%3D%20%5B9,'a',True,15%5D%0A%0Aa_dictionary%20%3D%20%7B'Mon'%3A23,%20'Tue'%3A25,%20'Wed'%3A24,%20'Thu'%3A26,%20'Fri'%3A28%7D%0A%0Aa_list%5B1%5D%0A%0Aa_dictionary%5B'Wed'%5D%0A%0Aa_list%5B2%5D%20%3D%2010%0A%0Aa_dictionary%5B'Fri'%5D%20%3D%2030&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false))

In [None]:
%%HTML

<iframe width="800" height="500" frameborder="0" src="https://pythontutor.com/iframe-embed.html#code=a_list%20%3D%20%5B9,'a',True,15%5D%0A%0Aa_dictionary%20%3D%20%7B'Mon'%3A23,%20'Tue'%3A25,%20'Wed'%3A24,%20'Thu'%3A26,%20'Fri'%3A28%7D%0A%0Aa_list%5B1%5D%0A%0Aa_dictionary%5B'Wed'%5D%0A%0Aa_list%5B2%5D%20%3D%2010%0A%0Aa_dictionary%5B'Fri'%5D%20%3D%2030&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=0&heapPrimitives=nevernest&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false"> </iframe>

Performing `list(d)` on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use `sorted(d)` instead). To check whether a single key is in the dictionary, use the `in` keyword.

In [None]:
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel

In [None]:
tel['jack']

In [None]:
del tel['sape']
tel['irv'] = 4127
tel

In [None]:
list(tel)

In [None]:
sorted(tel)

In [None]:
'guido' in tel

In [None]:
'jack' not in tel

In [None]:
for key in sorted(tel):
    print("%s:\t%s" % (key,tel[key]))

In [None]:
for key in sorted(tel):
    print(f"{key}:\t📞 {tel[key]}")

The `dict()` constructor builds dictionaries directly from sequences of key-value pairs:

In [None]:
dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])

Let's generate a dictionary of capitals:

In [None]:
capitals = dict([('Turkey','Ankara'), ('Japan','Tokyo'), ('Germany','Berlin')])
capitals['Turkey']

In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions:

In [None]:
new_dict = {x: x**2 for x in (2, 4, 6)}
new_dict

In [None]:
new_dict[6]

When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:

In [None]:
dict(sape=4139, guido=4127, jack=4098)

### Dictionary from two lists

In [None]:
countries = ['Turkey', 'Japan', 'Germany']
capitals = ['Ankara', 'Tokyo', 'Berlin']

In [None]:
capital_dict = dict(zip(countries,capitals))

In [None]:
print(capital_dict)

In [None]:
capital_dict['Turkey']

## Dictionary Methods

Python has a set of built-in methods that you can use on dictionaries.

| Method                                                       | Description                                                  |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [clear()](https://www.w3schools.com/python/ref_dictionary_clear.asp) | Removes all the elements from the dictionary                 |
| [copy()](https://www.w3schools.com/python/ref_dictionary_copy.asp) | Returns a (shallow) copy of the dictionary                   |
| [fromkeys()](https://www.w3schools.com/python/ref_dictionary_fromkeys.asp) | Returns a new dictionary with the specified keys, each set to the same value (`None` by default) |
| [get()](https://www.w3schools.com/python/ref_dictionary_get.asp) | Returns the value of the specified key, or a default (`None` unless given) if the key is missing |
| [items()](https://www.w3schools.com/python/ref_dictionary_items.asp) | Returns a view of the dictionary's (key, value) pairs as tuples |
| [keys()](https://www.w3schools.com/python/ref_dictionary_keys.asp) | Returns a view of the dictionary's keys                      |
| [pop()](https://www.w3schools.com/python/ref_dictionary_pop.asp) | Removes the element with the specified key and returns its value |
| [popitem()](https://www.w3schools.com/python/ref_dictionary_popitem.asp) | Removes and returns the last inserted key-value pair         |
| [setdefault()](https://www.w3schools.com/python/ref_dictionary_setdefault.asp) | Returns the value of the specified key; if the key does not exist, inserts it with the specified value |
| [update()](https://www.w3schools.com/python/ref_dictionary_update.asp) | Updates the dictionary with the specified key-value pairs    |
| [values()](https://www.w3schools.com/python/ref_dictionary_values.asp) | Returns a view of the dictionary's values                    |
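A short sketch exercising several of these methods on a small, hypothetical phone book (named `phone_book` so it does not overwrite `tel` from above). In Python 3, `keys()`, `values()` and `items()` return *view* objects, which is why they are wrapped in `list()` for printing:

```python
phone_book = {'jack': 4098, 'sape': 4139}

print(phone_book.get('guido'))       # None: missing key, but no KeyError
print(phone_book.get('guido', 0))    # 0: the default we supplied

phone_book.setdefault('guido', 4127)   # 'guido' is missing, so it is inserted
phone_book.setdefault('jack', 1111)    # 'jack' exists, so its value is unchanged
print(phone_book)                      # {'jack': 4098, 'sape': 4139, 'guido': 4127}

phone_book.update({'irv': 4127, 'sape': 4000})   # adds 'irv', overwrites 'sape'
removed = phone_book.pop('jack')                  # removes 'jack', returns its value
print(removed)                                    # 4098

for key, value in phone_book.items():   # items() yields (key, value) pairs
    print(key, value)

print(list(phone_book.keys()))     # ['sape', 'guido', 'irv']
print(list(phone_book.values()))   # [4000, 4127, 4127]
```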

------

## Counting words with dictionaries

In [None]:
%%HTML

<iframe src="https://pythontutor.com/iframe-embed.html#code=def%20word_count%28str%29%3A%0A%20%20%20%20counts%20%3D%20
dict%28%29%0A%20%20%20%20words%20%3D%20str.split%28%29%0A%0A%20%20%20%20for%20word%20in%20words%3A%0A%20%20%20%20%20%20%20%20
if%20word%20in%20counts%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20counts%5Bword%5D%20%2B%3D%201%0A%20%20%20%20%20%20%20%20
else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20counts%5Bword%5D%20%3D%201%0A%0A%20%20%20%20return%20counts%0A%0Aprint%28%20
word_count%28'the%20quick%20brown%20fox%20jumps%20over%20the%20lazy%20dog.'%29%29&amp;codeDivHeight=400&amp;codeDivWidth=350&amp;
cumulative=false&amp;curInstr=0&amp;heapPrimitives=nevernest&amp;origin=opt-frontend.js&amp;py=3&amp;rawInputLstJSON=%5B%5D&amp;
textReferences=false" width="100%" height="500" frameborder="0"> </iframe>
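Here is the function from the Python Tutor frame above as a regular code cell. One change is ours: the parameter is named `text` instead of `str`, so it does not shadow Python's built-in `str` type. Notice that splitting on whitespace keeps punctuation attached, so the last word is counted as `'dog.'`.

```python
def word_count(text):
    """Return a dictionary mapping each whitespace-separated word to its count."""
    counts = dict()
    words = text.split()

    for word in words:
        if word in counts:
            counts[word] += 1   # seen before: increment its count
        else:
            counts[word] = 1    # first occurrence: start at 1

    return counts

print(word_count('the quick brown fox jumps over the lazy dog.'))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog.': 1}
```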

## Another example of word count

In [None]:
paragraph="""One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed 
into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his 
brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it 
and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of 
him, waved about helplessly as he looked. "What's happened to me?" he thought. It wasn't a dream. His room, a 
proper human room although a little too small, lay peacefully between its four familiar walls. A collection of 
textile samples lay spread out on the table - Samsa was a travelling salesman - and above it there hung a picture 
that he had recently cut out of an illustrated magazine and housed in a nice, gilded frame. It showed a lady 
fitted out with a fur hat and fur boa who sat upright, raising a heavy fur muff that covered the whole of her 
lower arm towards the viewer. Gregor then turned to look out the window at the dull weather."""

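# Python 3 way to strip all punctuation (needs `import string`):
#words = paragraph.lower().translate(str.maketrans('', '', string.punctuation)).split()
# Simpler: remove only commas; other punctuation such as "." and "?" stays attached to words
words = paragraph.replace(",", "").lower().split()
print(words)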

### Count with loop

In [None]:
counts=dict()
# or
# counts={}
for word in words:
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

print(counts)

We can access the value for a particular key with the `dict[key]` notation.

In [None]:
counts["he"]

In [None]:
counts["the"]

Here's another way to generate a dictionary with word counts. This time we look up each word with the `get()` method, which returns a default value (0 in our case) instead of raising an error when the key is not found; adding 1 to that value and assigning it back stores the updated count.

In [None]:
counts2={}

for word in words: 
    counts2[word] = counts2.get(word, 0) + 1
    
print(counts2)
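A third option, not used elsewhere in this lecture, is `collections.Counter` from the standard library: a dictionary subclass made for counting. A minimal sketch on a short, hypothetical word list (named `sample_words` so it does not replace `words`):

```python
from collections import Counter

sample_words = "the cat saw the dog and the dog ran".split()

counts3 = Counter(sample_words)   # counts every element of the list
print(counts3['the'])             # 3
print(counts3['bird'])            # 0: missing keys count as zero, no KeyError
print(counts3.most_common(2))     # [('the', 3), ('dog', 2)]
```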

### Count with list comprehension

In [None]:
# one count per position in words, so repeated words are counted (and listed) repeatedly
wordfreq = [words.count(w) for w in words]
print(wordfreq)

In [None]:
# zip pairs each word with its count; a repeated word just overwrites its key with the same value
wordfreq_pairs = dict(zip(words, wordfreq))
print(wordfreq_pairs)

## Dictionary comprehension


In addition to list comprehensions, Python supports dictionary comprehensions. Below, a dictionary mapping the even numbers from 2 to 20 to their squares is built twice: first with a loop, then with a comprehension.

In [None]:
d = {}
for i in range(2,21,2):
    d[i]=i**2
print(d)

In [None]:
d2 = { num: num**2 for num in range(2,21,2) }
print(d2)
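The same syntax gives a compact word counter, sketched here on a hypothetical `sample_words` list. Iterating over `set(sample_words)` visits each distinct word once, but `count()` still scans the whole list for each of them, so for long texts the loop-based versions above are more efficient.

```python
sample_words = "the cat saw the dog and the dog ran".split()

# key: each distinct word; value: how many times it occurs in the list
counts4 = {w: sample_words.count(w) for w in set(sample_words)}
print(counts4['the'], counts4['dog'], counts4['ran'])   # 3 2 1
```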

### Back to the reverse complement of a DNA sequence

Remember `a_gene`? Let's get its reverse complement.

In [None]:
a_gene= "ACGACTACGACTA"

Let's check `translate()` method from [this source](https://www.w3schools.com/python/ref_string_translate.asp):
> The `translate()` method returns a string where some specified characters are replaced with the character described in a dictionary, or in a mapping table.
> Use the `maketrans()` method to create a mapping table.
> If a character is not specified in the dictionary/table, the character will not be replaced.
> If you use a dictionary, you **must use ascii codes instead of characters**.

Let's use the [ASCII table](https://www.asciitable.com/) to get the ASCII codes for A, T, C and G, which are 65, 84, 67 and 71, respectively.

In [None]:
dna_table= {65: 84, 84: 65, 67: 71, 71:67}

In [None]:
# translate() complements each base; [::-1] then reverses the result
a_gene.translate(dna_table)[::-1]
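Looking up ASCII codes by hand is not strictly necessary: `ord()` returns a character's code, and `str.maketrans()` can build the same table from two strings of characters. A sketch of the whole reverse complement as a function (the name `reverse_complement` is ours), where `[::-1]` reverses the complemented string:

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string made of A, C, G and T."""
    # maketrans maps each character of the first string to the character at the
    # same position in the second string: A->T, T->A, C->G, G->C
    table = str.maketrans("ATCG", "TAGC")
    return dna.translate(table)[::-1]   # complement, then reverse

print(ord("A"), ord("T"), ord("C"), ord("G"))   # 65 84 67 71
print(str.maketrans("ATCG", "TAGC") == {65: 84, 84: 65, 67: 71, 71: 67})   # True
print(reverse_complement("ACGACTACGACTA"))      # TAGTCGTAGTCGT
```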

## Complex data structures

By mixing and nesting lists and dictionaries, we can build fairly complex data structures.

In [None]:
people = {1: {'name': 'John', 'age': '27', 'gender': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'gender': 'Female'}}

print(people[1]['name'])

In [None]:
countries = {'Turkey': {'capital': 'Ankara', 
                        'largest_city': {'name': 'Istanbul', 'population':'15.4'},
                        'population_million': [72.3, 78.5, 84.3],
                        'pop_years': [2010, 2015, 2020] },
            'Germany': {'capital': 'Berlin', 
                        'largest_city': {'name': 'Berlin', 'population':'3.6'},
                        'population_million': [81.7, 81.7, 83.2],
                        'pop_years': [2010, 2015, 2020] }
            }

In [None]:
countries['Turkey']['largest_city']['population']

In [None]:
countries['Germany']['population_million'][-1]
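Nested structures are often walked with a loop over `items()`. A sketch using a trimmed copy of the population data above (named `populations` and keeping only the two list-valued keys) that collects and prints the most recent figure for each country:

```python
populations = {
    'Turkey': {'population_million': [72.3, 78.5, 84.3], 'pop_years': [2010, 2015, 2020]},
    'Germany': {'population_million': [81.7, 81.7, 83.2], 'pop_years': [2010, 2015, 2020]},
}

latest = {}
for country, info in populations.items():
    # [-1] picks the last (most recent) entry of each list
    year = info['pop_years'][-1]
    population = info['population_million'][-1]
    latest[country] = population
    print(f"{country}: {population} million in {year}")
```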

These data structures might look very complicated, but equivalent structures are used frequently to exchange data between servers and users. A widely used format for this kind of web communication is *JSON* (JavaScript Object Notation), which can represent deeply nested data.
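Python's built-in `json` module converts between dictionaries and JSON text. A minimal round-trip sketch with the `people` dictionary from above, showing one subtlety: JSON object keys are always strings, so the integer keys `1` and `2` come back as `'1'` and `'2'`.

```python
import json

people = {1: {'name': 'John', 'age': '27', 'gender': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'gender': 'Female'}}

text = json.dumps(people, indent=2)   # dictionary -> JSON string
print(text)

loaded = json.loads(text)             # JSON string -> dictionary
print(loaded['1']['name'])            # John
print(1 in loaded)                    # False: the keys are now strings
```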