# Strings, dictionaries and word counts

Question: Given a text, how would you count the occurrences of each word? For a single word we could use the `count()` method, but what about counting all the words?

Input: *"One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed ..."*

Output: *{'one': 1, 'morning': 1, 'when': 1, 'gregor': 2, 'samsa': 2, 'woke': 1, 'from': 1, 'troubled': 1, 'dreams': 1, 'he': 7, 'found': 1,...*


## Strings

For extensive information about strings, see the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint. The full list of string methods is on [this](https://www.w3schools.com/python/python_ref_string.asp) page.

In [1]:
couple_of_words = ['hello','world']
sentence = 'It was the best of times, it was the worst of times.'

### Accessing values in strings

In [2]:
sentence[3]

'w'

In [3]:
couple_of_words[1]

'world'

In [4]:
couple_of_words[1][2]

'r'

In [5]:
equivalent_list = [['h','e','l','l','o'], ['w','o','r','l','d']]
#equivalent_list[1][2]

Slicing works on strings just as it does on lists.

In [6]:
sentence[37:]

'worst of times.'
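A few more slices of the same `sentence`, as a small sketch. The index values here were picked only for illustration.

```python
sentence = 'It was the best of times, it was the worst of times.'

first_two_words = sentence[:6]   # from the start up to (not including) index 6
one_word = sentence[11:15]       # characters at indices 11, 12, 13 and 14
last_word = sentence[-6:]        # negative indices count from the end

print(first_two_words)
print(one_word)
print(last_word)
```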

### Escape characters

When printing, special characters such as tab and newline are written with escape sequences. See the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for the full list.

In [7]:
print("hello\tworld!\nhello again!")

hello	world!
hello again!


See the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for the list of "String Special Operators".

Question: If you want to print "\n" literally, how would you do it?

In [8]:
print("\\n is newline char")

\n is newline char
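Besides doubling the backslash, a raw string literal (prefix `r`) is another way to keep `\n` literal. The sketch below compares the two spellings; the variable names are only illustrative.

```python
escaped = "\\n is newline char"   # backslash escaped by doubling it
raw = r"\n is newline char"       # raw string: backslashes are kept as written

print(raw)
print(escaped == raw)
```

Both literals produce the same string, so the choice is a matter of readability.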


### String concatenation

The `+` operator can be used to concatenate (combine) strings.

In [9]:
name="John"
lastname="Doe"
payment=53.444

In [10]:
print(name + lastname)

JohnDoe


In [11]:
print(name + " " + lastname)

John Doe


In [12]:
# try this
#print(name + " " + lastname + " owes $" + payment )
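If the commented-out line above is run, Python raises a `TypeError`, because `+` cannot join a string and a float. One possible fix, sketched here, is to convert the number with `str()` first; the `message` and `fixed` names are only illustrative.

```python
name = "John"
lastname = "Doe"
payment = 53.444

try:
    # str + float is not allowed, so this line raises TypeError
    message = name + " " + lastname + " owes $" + payment
except TypeError as err:
    message = None
    print("TypeError:", err)

# Converting the float to a string first makes the concatenation valid
fixed = name + " " + lastname + " owes $" + str(payment)
print(fixed)
```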

### String formatting

`%` is Python's string formatting operator. When the format string has more than one placeholder, the values must be supplied as a **tuple**.

In [13]:
data = ("John", "Doe", 53.4444)
format_string = "Hello %s %s. Your current balance is $%.2f"

print(format_string % data)

Hello John Doe. Your current balance is $53.44


For in-depth information about string formatting in Python, see [this blog post](https://realpython.com/python-string-formatting/). Let's use an f-string (formatted string literal) to print similar output.

In [14]:
f"Hello {name} {lastname}. Your current balance is ${payment}."
#f"Hello {name.upper()} {lastname.upper()}. Your current balance is ${payment:0.2f}."

'Hello John Doe. Your current balance is $53.444.'

In [15]:
f"Hello {name+lastname}. Your current balance is ${payment:0.2f}."

'Hello JohnDoe. Your current balance is $53.44.'

### String methods

Please refer to the [Strings](https://www.tutorialspoint.com/python3/python_strings.htm) page at TutorialsPoint for the full list of methods.

In [16]:
sentence = 'It was the best of times, it was the worst of times.'
sentence.upper()

'IT WAS THE BEST OF TIMES, IT WAS THE WORST OF TIMES.'

In [17]:
sentence.count('e')

5

> Some of the functions for lists also work on strings!

In [18]:
len(sentence)

52

In [19]:
sentence.replace('t','s',3)

'Is was she bess of times, it was the worst of times.'

In [20]:
sentence.lower().split()
#sentence.lower().split(",")

['it',
 'was',
 'the',
 'best',
 'of',
 'times,',
 'it',
 'was',
 'the',
 'worst',
 'of',
 'times.']

How can we calculate the reverse complement of a DNA string? A first attempt is to swap the bases with chained `replace()` calls:

In [21]:
a_gene= "ACGACTACGACTA"
a_gene.replace('A','T').replace('T','A')

'ACGACAACGACAA'

Uh oh! That's wrong: the second `replace()` turns the `T`s created by the first one back into `A`s. Looks like we need another method.

## Dictionaries

Dictionaries are sometimes found in other languages as "hash tables" or "associative arrays". Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by **keys**, which can be any immutable type; strings and numbers can always be keys. 

It is best to think of a dictionary as a set of *key: value* pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: `{}`. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

Performing `list(d)` on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use `sorted(d)` instead). To check whether a single key is in the dictionary, use the `in` keyword.

In [22]:
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel

{'jack': 4098, 'sape': 4139, 'guido': 4127}

In [23]:
tel['jack']

4098

In [24]:
del tel['sape']
tel['irv'] = 4127
tel

{'jack': 4098, 'guido': 4127, 'irv': 4127}

In [25]:
list(tel)

['jack', 'guido', 'irv']

In [26]:
sorted(tel)

['guido', 'irv', 'jack']

In [27]:
'guido' in tel

True

In [28]:
'jack' not in tel

False

In [29]:
for key in sorted(tel):
    print("%s:\t%s" % (key,tel[key]))

guido:	4127
irv:	4127
jack:	4098


In [30]:
for key in sorted(tel):
    print(f"{key}:\t📞 {tel[key]}")

guido:	📞 4127
irv:	📞 4127
jack:	📞 4098


The `dict()` constructor builds dictionaries directly from sequences of key-value pairs:

In [31]:
dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])

{'sape': 4139, 'guido': 4127, 'jack': 4098}

Let's generate a dictionary of capitals.

In [32]:
#capitals = dict([('Turkey','Ankara'), ('Japan','Tokyo'), ('Germany','Berlin')])
#capitals['Turkey']

In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions:

In [33]:
new_dict = {x: x**2 for x in (2, 4, 6)}
new_dict

{2: 4, 4: 16, 6: 36}

In [34]:
new_dict[6]

36

When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:

In [35]:
dict(sape=4139, guido=4127, jack=4098)

{'sape': 4139, 'guido': 4127, 'jack': 4098}

### Dictionary from two lists

In [36]:
countries = ['Turkey', 'Japan', 'Germany']
capitals = ['Ankara', 'Tokyo', 'Berlin']

In [37]:
capital_dict = dict(zip(countries,capitals))

In [38]:
print(capital_dict)

{'Turkey': 'Ankara', 'Japan': 'Tokyo', 'Germany': 'Berlin'}


In [39]:
capital_dict['Turkey']

'Ankara'

## Dictionary Methods

Python has a set of built-in methods that you can use on dictionaries.

| Method                                                       | Description                                                  |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [clear()](https://www.w3schools.com/python/ref_dictionary_clear.asp) | Removes all the elements from the dictionary                 |
| [copy()](https://www.w3schools.com/python/ref_dictionary_copy.asp) | Returns a copy of the dictionary                             |
| [fromkeys()](https://www.w3schools.com/python/ref_dictionary_fromkeys.asp) | Returns a dictionary with the specified keys and values      |
| [get()](https://www.w3schools.com/python/ref_dictionary_get.asp) | Returns the value of the specified key                       |
| [items()](https://www.w3schools.com/python/ref_dictionary_items.asp) | Returns a view of the dictionary's (key, value) tuples        |
| [keys()](https://www.w3schools.com/python/ref_dictionary_keys.asp) | Returns a view of the dictionary's keys                      |
| [pop()](https://www.w3schools.com/python/ref_dictionary_pop.asp) | Removes the element with the specified key and returns its value |
| [popitem()](https://www.w3schools.com/python/ref_dictionary_popitem.asp) | Removes and returns the last inserted key-value pair         |
| [setdefault()](https://www.w3schools.com/python/ref_dictionary_setdefault.asp) | Returns the value of the specified key. If the key does not exist: insert the key, with the specified value |
| [update()](https://www.w3schools.com/python/ref_dictionary_update.asp) | Updates the dictionary with the specified key-value pairs    |
| [values()](https://www.w3schools.com/python/ref_dictionary_values.asp) | Returns a view of all the values in the dictionary           |
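A minimal sketch of a few of these methods in action, reusing the phone-book idea from earlier. The specific numbers and the `removed` and `pairs` names are only illustrative.

```python
tel = {'jack': 4098, 'guido': 4127}

print(tel.get('jack'))       # value for an existing key
print(tel.get('sape', 0))    # missing key: the default is returned, nothing is inserted

tel.setdefault('irv', 4127)               # inserts 'irv' because it is missing
tel.update({'sape': 4139, 'jack': 4100})  # adds 'sape', overwrites 'jack'
removed = tel.pop('guido')                # removes the key and returns its value

pairs = list(tel.items())    # items() gives a view; list() makes a real list
print(removed)
print(pairs)
```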

------

## Counting words with dictionaries

In [40]:
%%HTML

<iframe src="https://pythontutor.com/iframe-embed.html#code=def%20word_count%28str%29%3A%0A%20%20%20%20counts%20%3D%20
dict%28%29%0A%20%20%20%20words%20%3D%20str.split%28%29%0A%0A%20%20%20%20for%20word%20in%20words%3A%0A%20%20%20%20%20%20%20%20
if%20word%20in%20counts%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20counts%5Bword%5D%20%2B%3D%201%0A%20%20%20%20%20%20%20%20
else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20counts%5Bword%5D%20%3D%201%0A%0A%20%20%20%20return%20counts%0A%0Aprint%28%20
word_count%28'the%20quick%20brown%20fox%20jumps%20over%20the%20lazy%20dog.'%29%29&amp;codeDivHeight=400&amp;codeDivWidth=350&amp;
cumulative=false&amp;curInstr=0&amp;heapPrimitives=nevernest&amp;origin=opt-frontend.js&amp;py=3&amp;rawInputLstJSON=%5B%5D&amp;
textReferences=false" width="100%" height="500" frameborder="0"> </iframe>
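For readers who cannot load the embedded visualization, the code it steps through (decoded from the URL above) is roughly the following. The only change is an assumption made here for clarity: the parameter is renamed from `str` to `text` so that it does not shadow the built-in `str`.

```python
def word_count(text):
    """Return a dictionary mapping each whitespace-separated word to its count."""
    counts = dict()
    words = text.split()

    for word in words:
        if word in counts:
            counts[word] += 1   # seen before: increment
        else:
            counts[word] = 1    # first occurrence

    return counts

result = word_count('the quick brown fox jumps over the lazy dog.')
print(result)
```

Note that `'dog.'` keeps its full stop, because `split()` only separates on whitespace.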

## Another example of word count

In [41]:
paragraph="""One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed 
into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his 
brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it 
and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of 
him, waved about helplessly as he looked. "What's happened to me?" he thought. It wasn't a dream. His room, a 
proper human room although a little too small, lay peacefully between its four familiar walls. A collection of 
textile samples lay spread out on the table - Samsa was a travelling salesman - and above it there hung a picture 
that he had recently cut out of an illustrated magazine and housed in a nice, gilded frame. It showed a lady 
fitted out with a fur hat and fur boa who sat upright, raising a heavy fur muff that covered the whole of her 
lower arm towards the viewer. Gregor then turned to look out the window at the dull weather."""

# alternative that removes all punctuation (needs "import string"):
#paragraph.lower().translate(str.maketrans('', '', string.punctuation)).split()
words = paragraph.replace(",","").lower().split()
print(words)

['one', 'morning', 'when', 'gregor', 'samsa', 'woke', 'from', 'troubled', 'dreams', 'he', 'found', 'himself', 'transformed', 'in', 'his', 'bed', 'into', 'a', 'horrible', 'vermin.', 'he', 'lay', 'on', 'his', 'armour-like', 'back', 'and', 'if', 'he', 'lifted', 'his', 'head', 'a', 'little', 'he', 'could', 'see', 'his', 'brown', 'belly', 'slightly', 'domed', 'and', 'divided', 'by', 'arches', 'into', 'stiff', 'sections.', 'the', 'bedding', 'was', 'hardly', 'able', 'to', 'cover', 'it', 'and', 'seemed', 'ready', 'to', 'slide', 'off', 'any', 'moment.', 'his', 'many', 'legs', 'pitifully', 'thin', 'compared', 'with', 'the', 'size', 'of', 'the', 'rest', 'of', 'him', 'waved', 'about', 'helplessly', 'as', 'he', 'looked.', '"what\'s', 'happened', 'to', 'me?"', 'he', 'thought.', 'it', "wasn't", 'a', 'dream.', 'his', 'room', 'a', 'proper', 'human', 'room', 'although', 'a', 'little', 'too', 'small', 'lay', 'peacefully', 'between', 'its', 'four', 'familiar', 'walls.', 'a', 'collection', 'of', 'textile',
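Removing only commas leaves tokens such as `'vermin.'` and `'me?"'` in the list. One possible cleanup, sketched below on a shortened sample of the paragraph, is to strip punctuation from both ends of each token with `strip(string.punctuation)`. This is just one option; unlike deleting every punctuation character, it keeps inner apostrophes and hyphens.

```python
import string

# Shortened sample of the paragraph, used here only to keep the example small
sample = """He lay on his armour-like back. "What's happened to me?" he thought. It wasn't a dream."""

# Strip punctuation from the ends of each token, then drop any empty tokens
tokens = [w.strip(string.punctuation) for w in sample.lower().split()]
tokens = [w for w in tokens if w]

print(tokens)
```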

### Count with loop

In [42]:
counts=dict()
# or
# counts={}
for word in words:
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

print(counts)

{'one': 1, 'morning': 1, 'when': 1, 'gregor': 2, 'samsa': 2, 'woke': 1, 'from': 1, 'troubled': 1, 'dreams': 1, 'he': 7, 'found': 1, 'himself': 1, 'transformed': 1, 'in': 2, 'his': 6, 'bed': 1, 'into': 2, 'a': 12, 'horrible': 1, 'vermin.': 1, 'lay': 3, 'on': 2, 'armour-like': 1, 'back': 1, 'and': 6, 'if': 1, 'lifted': 1, 'head': 1, 'little': 2, 'could': 1, 'see': 1, 'brown': 1, 'belly': 1, 'slightly': 1, 'domed': 1, 'divided': 1, 'by': 1, 'arches': 1, 'stiff': 1, 'sections.': 1, 'the': 8, 'bedding': 1, 'was': 2, 'hardly': 1, 'able': 1, 'to': 4, 'cover': 1, 'it': 4, 'seemed': 1, 'ready': 1, 'slide': 1, 'off': 1, 'any': 1, 'moment.': 1, 'many': 1, 'legs': 1, 'pitifully': 1, 'thin': 1, 'compared': 1, 'with': 2, 'size': 1, 'of': 5, 'rest': 1, 'him': 1, 'waved': 1, 'about': 1, 'helplessly': 1, 'as': 1, 'looked.': 1, '"what\'s': 1, 'happened': 1, 'me?"': 1, 'thought.': 1, "wasn't": 1, 'dream.': 1, 'room': 2, 'proper': 1, 'human': 1, 'although': 1, 'too': 1, 'small': 1, 'peacefully': 1, 'betwe

We can access the value for a particular key with the `dict[key]` notation.

In [43]:
counts["he"]

7

In [44]:
counts["the"]

8

Here's another way to build a dictionary of word counts. In this case, we use the `get()` method to look up each key, which returns a default value (0 in our case) if the key is not found.

In [45]:
counts2={}

for word in words: 
    counts2[word] = counts2.get(word, 0) + 1
    
print(counts2)

{'one': 1, 'morning': 1, 'when': 1, 'gregor': 2, 'samsa': 2, 'woke': 1, 'from': 1, 'troubled': 1, 'dreams': 1, 'he': 7, 'found': 1, 'himself': 1, 'transformed': 1, 'in': 2, 'his': 6, 'bed': 1, 'into': 2, 'a': 12, 'horrible': 1, 'vermin.': 1, 'lay': 3, 'on': 2, 'armour-like': 1, 'back': 1, 'and': 6, 'if': 1, 'lifted': 1, 'head': 1, 'little': 2, 'could': 1, 'see': 1, 'brown': 1, 'belly': 1, 'slightly': 1, 'domed': 1, 'divided': 1, 'by': 1, 'arches': 1, 'stiff': 1, 'sections.': 1, 'the': 8, 'bedding': 1, 'was': 2, 'hardly': 1, 'able': 1, 'to': 4, 'cover': 1, 'it': 4, 'seemed': 1, 'ready': 1, 'slide': 1, 'off': 1, 'any': 1, 'moment.': 1, 'many': 1, 'legs': 1, 'pitifully': 1, 'thin': 1, 'compared': 1, 'with': 2, 'size': 1, 'of': 5, 'rest': 1, 'him': 1, 'waved': 1, 'about': 1, 'helplessly': 1, 'as': 1, 'looked.': 1, '"what\'s': 1, 'happened': 1, 'me?"': 1, 'thought.': 1, "wasn't": 1, 'dream.': 1, 'room': 2, 'proper': 1, 'human': 1, 'although': 1, 'too': 1, 'small': 1, 'peacefully': 1, 'betwe

### Count with list comprehension

In [46]:
wordfreq = [words.count(w) for w in words]
print(str(wordfreq))

[1, 1, 1, 2, 2, 1, 1, 1, 1, 7, 1, 1, 1, 2, 6, 1, 2, 12, 1, 1, 7, 3, 2, 6, 1, 1, 6, 1, 7, 1, 6, 1, 12, 2, 7, 1, 1, 6, 1, 1, 1, 1, 6, 1, 1, 1, 2, 1, 1, 8, 1, 2, 1, 1, 4, 1, 4, 6, 1, 1, 4, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 2, 8, 1, 5, 8, 1, 5, 1, 1, 1, 1, 1, 7, 1, 1, 1, 4, 1, 7, 1, 4, 1, 12, 1, 6, 2, 12, 1, 1, 2, 1, 12, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 12, 1, 5, 1, 1, 3, 1, 4, 2, 8, 1, 2, 2, 2, 12, 1, 1, 2, 6, 1, 4, 1, 1, 12, 1, 2, 7, 1, 1, 1, 4, 5, 1, 1, 1, 6, 1, 2, 12, 1, 1, 1, 4, 1, 12, 1, 1, 4, 2, 12, 3, 1, 6, 3, 1, 1, 1, 1, 1, 12, 1, 3, 1, 2, 1, 8, 1, 5, 1, 1, 1, 1, 8, 1, 2, 1, 1, 4, 1, 4, 8, 1, 1, 8, 1, 1]


In [47]:
wordfreq_pairs = dict(zip(words,wordfreq))
print(str(wordfreq_pairs))

{'one': 1, 'morning': 1, 'when': 1, 'gregor': 2, 'samsa': 2, 'woke': 1, 'from': 1, 'troubled': 1, 'dreams': 1, 'he': 7, 'found': 1, 'himself': 1, 'transformed': 1, 'in': 2, 'his': 6, 'bed': 1, 'into': 2, 'a': 12, 'horrible': 1, 'vermin.': 1, 'lay': 3, 'on': 2, 'armour-like': 1, 'back': 1, 'and': 6, 'if': 1, 'lifted': 1, 'head': 1, 'little': 2, 'could': 1, 'see': 1, 'brown': 1, 'belly': 1, 'slightly': 1, 'domed': 1, 'divided': 1, 'by': 1, 'arches': 1, 'stiff': 1, 'sections.': 1, 'the': 8, 'bedding': 1, 'was': 2, 'hardly': 1, 'able': 1, 'to': 4, 'cover': 1, 'it': 4, 'seemed': 1, 'ready': 1, 'slide': 1, 'off': 1, 'any': 1, 'moment.': 1, 'many': 1, 'legs': 1, 'pitifully': 1, 'thin': 1, 'compared': 1, 'with': 2, 'size': 1, 'of': 5, 'rest': 1, 'him': 1, 'waved': 1, 'about': 1, 'helplessly': 1, 'as': 1, 'looked.': 1, '"what\'s': 1, 'happened': 1, 'me?"': 1, 'thought.': 1, "wasn't": 1, 'dream.': 1, 'room': 2, 'proper': 1, 'human': 1, 'although': 1, 'too': 1, 'small': 1, 'peacefully': 1, 'betwe
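The comprehension above calls `words.count(w)` once per word, so the list is rescanned for every element. As a hedged alternative, the standard library's `collections.Counter` builds the same counts in a single pass. The short `words` list below is a stand-in for the real one.

```python
from collections import Counter

# Small stand-in for the `words` list built from the paragraph
words = 'it was the best of times it was the worst of times'.split()

# Reference result using the get() loop shown earlier
loop_counts = {}
for word in words:
    loop_counts[word] = loop_counts.get(word, 0) + 1

# Counter does the same counting in one call
counter_counts = Counter(words)

print(counter_counts['times'])
print(dict(counter_counts) == loop_counts)
```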

## Dictionary comprehension


In addition to list comprehensions, dictionary comprehensions are possible. Below, a dictionary of the squares of even numbers is constructed with both a loop and a comprehension.

In [48]:
d = {}
for i in range(2,21,2):
    d[i]=i**2
print(d)

{2: 4, 4: 16, 6: 36, 8: 64, 10: 100, 12: 144, 14: 196, 16: 256, 18: 324, 20: 400}


In [49]:
d2 = { num: num**2 for num in range(2,21,2) }
print(d2)

{2: 4, 4: 16, 6: 36, 8: 64, 10: 100, 12: 144, 14: 196, 16: 256, 18: 324, 20: 400}


### About the reverse complement of a DNA sequence

Remember `a_gene`? Let's start with its complement; reversing that string then gives the reverse complement.

In [50]:
a_gene= "ACGACTACGACTA"

Let's check the `translate()` method from [this source](https://www.w3schools.com/python/ref_string_translate.asp):
> The `translate()` method returns a string where some specified characters are replaced with the character described in a dictionary, or in a mapping table.
> Use the `maketrans()` method to create a mapping table.
> If a character is not specified in the dictionary/table, the character will not be replaced.
> If you use a dictionary, you **must use ascii codes instead of characters**.

Let's use the [ASCII table](https://www.asciitable.com/) to get the ASCII codes for A, T, C and G, which are 65, 84, 67 and 71, respectively.

In [51]:
dna_table= {65: 84, 84: 65, 67: 71, 71:67}

In [52]:
a_gene.translate(dna_table)

'TGCTGATGCTGAT'
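The result of `translate()` is the complement. To get the reverse complement, the string still has to be reversed, for example with the slice `[::-1]`. The sketch below also builds the table with `str.maketrans()`, which avoids looking up the ASCII codes by hand; the `complement` and `reverse_complement` names are only illustrative.

```python
a_gene = "ACGACTACGACTA"

# maketrans maps each character of the first string to the one at the same
# position in the second string: A->T, T->A, C->G, G->C
dna_table = str.maketrans("ATCG", "TAGC")

complement = a_gene.translate(dna_table)
reverse_complement = complement[::-1]   # step -1 walks the string backwards

print(complement)
print(reverse_complement)
```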

## Complex data structures

By mixing and nesting lists and dictionaries, pretty complex data structures can emerge.

In [53]:
people = {1: {'name': 'John', 'age': '27', 'gender': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'gender': 'Female'}}

print(people[1]['name'])

John


In [54]:
countries = {'Turkey': {'capital': 'Ankara', 
                        'largest_city': {'name': 'Istanbul', 'population':'15.4'},
                        'population_million': [72.3, 78.5, 84.3],
                        'pop_years': [2010, 2015, 2020] },
            'Germany': {'capital': 'Berlin', 
                        'largest_city': {'name': 'Berlin', 'population':'3.6'},
                        'population_million': [81.7, 81.7, 83.2],
                        'pop_years': [2010, 2015, 2020] }
            }

In [55]:
countries['Turkey']['largest_city']['population']

'15.4'

In [57]:
countries['Germany']['population_million'][-1]

83.2
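Nested structures can also be walked with loops and comprehensions. As a small sketch, the code below uses a trimmed copy of `countries` (the `largest_city` entries are left out for brevity) and pairs each year with its population. The `by_year` and `latest` names are only illustrative.

```python
# Trimmed copy of the `countries` dictionary from above
countries = {'Turkey': {'capital': 'Ankara',
                        'population_million': [72.3, 78.5, 84.3],
                        'pop_years': [2010, 2015, 2020]},
             'Germany': {'capital': 'Berlin',
                         'population_million': [81.7, 81.7, 83.2],
                         'pop_years': [2010, 2015, 2020]}}

for country, info in countries.items():
    # zip pairs the two parallel lists; dict turns the pairs into year -> population
    by_year = dict(zip(info['pop_years'], info['population_million']))
    print(f"{country} (capital: {info['capital']}): {by_year}")

# Most recent population figure for each country
latest = {country: info['population_million'][-1] for country, info in countries.items()}
print(latest)
```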