## What is a dictionary

- Key-value pairs; since Python 3.7 they keep their insertion order.
- Keys must be hashable, which in practice means immutable objects (numbers, strings, tuples of immutables).
- Values can be any object.
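
The rule about keys can be checked by trying a mutable key. The following is a minimal sketch with made-up names (`inventory` and its entries are invented for illustration): a tuple is accepted as a key, while a list with similar content is rejected with a `TypeError`.

```python
# Illustrative sketch: the names and values are invented for this example.
inventory = {}

# A tuple of strings is hashable, so it can be used as a key.
inventory[('apple', 'red')] = 3

# A list is mutable and not hashable, so Python refuses it as a key.
try:
    inventory[['apple', 'green']] = 5
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")

print(inventory)
```

Only the tuple key ends up in the dictionary; the failed assignment leaves it unchanged.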


## When to use dictionaries

- ID to Name mapping.
- Object to Count mapping.
- Name of a feature to value of the feature.
- Name of an attribute to value of the attribute.

## Various dictionary examples

In [12]:
person_1 = {
    'fname': 'Ali',
    'lname': 'Izadi',
    'email': 'ali@izadi.com',
    'children': ['Felfeli', 'Ghelgheli',' Morghe Zarde Kakoli'],
}
person_2 = {
    'fname': 'SASAN',
    'lname': 'JOKAR',
    'email': 'sasan@jokar.com',
    'phone': '123-456',
}


In [10]:
people = [person_1, person_2]
print(people[0]['fname'])
for person in people:
    print(person)
print('----------------')

people_by_name = {
    'Ali Izadi': 'ali@izadi.com',
    'Sasan Jokar': 'sasan@jokar.com',
}
print(people_by_name['Sasan Jokar'])
for name, email in people_by_name.items():
    print(f"{name}  ->  {email}")
print('----------------')



full_people_by_name = {
    'Ali': person_1,
    'Sasan': person_2,
}

print(full_people_by_name['Ali']['lname'])
print(full_people_by_name['Sasan'])
for fname, data in full_people_by_name.items():
    print(fname)
    print(data)

Ali
{'fname': 'Ali', 'lname': 'Izadi', 'email': 'ali@izadi.com', 'children': ['Felfeli', 'Ghelgheli', ' Morghe Zarde Kakoli']}
{'fname': 'SASAN', 'lname': 'JOKAR', 'email': 'sasan@jokar.com', 'phone': '123-456'}
----------------
sasan@jokar.com
Ali Izadi  ->  ali@izadi.com
Sasan Jokar  ->  sasan@jokar.com
----------------
Izadi
{'fname': 'SASAN', 'lname': 'JOKAR', 'email': 'sasan@jokar.com', 'phone': '123-456'}
Ali
{'fname': 'Ali', 'lname': 'Izadi', 'email': 'ali@izadi.com', 'children': ['Felfeli', 'Ghelgheli', ' Morghe Zarde Kakoli']}
Sasan
{'fname': 'SASAN', 'lname': 'JOKAR', 'email': 'sasan@jokar.com', 'phone': '123-456'}


## Dictionary

- We can start from an empty dictionary and then fill it with key-value pairs.


In [14]:
user = {}
user['name'] = 'Ali'
print(user)        # {'name': 'Ali'}

user['email'] = 'ali@izadi.com'
print(user)        # {'name': 'Ali', 'email': 'ali@izadi.com'}

the_name = user['name']
print(the_name)    # Ali

field = 'name'
the_value = user[field]
print(the_value)   # Ali

user['name'] = 'Sasan Joe'
print(user)      # {'name': 'Sasan Joe', 'email': 'ali@izadi.com'}


{'name': 'Ali'}
{'name': 'Ali', 'email': 'ali@izadi.com'}
Ali
Ali
{'name': 'Sasan Joe', 'email': 'ali@izadi.com'}


## Create dictionary

- We can also start with a dictionary that already has some data in it.


In [15]:
user = {
   'fname': 'Foo',
   'lname': 'Bar',
}

print(user)   # {'fname': 'Foo', 'lname': 'Bar'}

user['email'] = 'foo@bar.com'

{'fname': 'Foo', 'lname': 'Bar'}


## keys

- Sometimes we don't know up front what keys we might have.
- Since Python 3.7, keys are returned in **insertion order**. In older versions the order was seemingly random.


In [18]:
planets = {'Jupiter': 300, 'Saturn': 500, 'Earth': 0}
print(planets)               # {'Jupiter': 300, 'Saturn': 500, 'Earth': 0}

print(planets.keys())        # dict_keys(['Jupiter', 'Saturn', 'Earth'])
print(list(planets.keys()))  # ['Jupiter', 'Saturn', 'Earth']

{'Jupiter': 300, 'Saturn': 500, 'Earth': 0}
dict_keys(['Jupiter', 'Saturn', 'Earth'])
['Jupiter', 'Saturn', 'Earth']
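
The object returned by `keys()` is a view of the dictionary rather than a copy. A small sketch of the difference, where the extra `'Mars'` entry is a made-up value added only for the demonstration:

```python
planets = {'Jupiter': 300, 'Saturn': 500, 'Earth': 0}

keys_view = planets.keys()      # a live view onto the dictionary
keys_snapshot = list(planets)   # an independent list of the current keys

planets['Mars'] = 100           # hypothetical entry, added after both were created

print(keys_view)       # the view reflects the new key
print(keys_snapshot)   # the list still holds only the original three keys
```

Use `list(...)` when a stable snapshot of the keys is needed.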


## Loop over keys

In [19]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for key in user.keys():
    print(key)

# fname
# lname

for key in user.keys():
    print(f"{key} -> {user[key]}")

# fname -> Foo
# lname -> Bar

fname
lname
fname -> Foo
lname -> Bar


## Loop over dictionary keys
- **Looping over the dictionary itself is the same as looping over its keys.**
- Prefer the explicit `somedictionary.keys()` expression.

In [20]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for key in user:
    print(f"{key} -> {user[key]}")

# fname -> Foo
# lname -> Bar

fname -> Foo
lname -> Bar


## Loop using items

In [21]:
people = {
    "Tal"  : "123",
    "Maya" : "456",
    "Ruth" : "789",
}

for name, uid in people.items():
    print(f"{name} => {uid}")

Tal => 123
Maya => 456
Ruth => 789


In [22]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for tpl in user.items():      # iterates on tuples
    print(f"{tpl[0]} -> {tpl[1]}")
    print("{} -> {}".format(*tpl))

# fname -> Foo
# fname -> Foo
# lname -> Bar
# lname -> Bar

fname -> Foo
fname -> Foo
lname -> Bar
lname -> Bar


## values

- Values are returned in the same order as the keys (insertion order since Python 3.7).


In [24]:
user = {
   'fname': 'Foo',
   'lname': 'Bar',
   'workplace': 'Bar',
}

print(user)   # {'fname': 'Foo', 'lname': 'Bar', 'workplace': 'Bar'}

print(user.keys())    # dict_keys(['fname', 'lname', 'workplace'])

print(user.values())  # dict_values(['Foo', 'Bar', 'Bar'])

{'fname': 'Foo', 'lname': 'Bar', 'workplace': 'Bar'}
dict_keys(['fname', 'lname', 'workplace'])
dict_values(['Foo', 'Bar', 'Bar'])


## Nonexistent key
- If we try to fetch the value of a key that does not exist, we get a **KeyError exception**.



In [26]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

print(user['fname'])
print(user['email'])

Foo


KeyError: 'email'
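
If a missing key is an expected situation, the exception can be caught instead of letting the program stop. A minimal sketch, reusing the `user` dictionary from the cell above:

```python
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

try:
    email = user['email']
except KeyError as err:
    # err.args[0] holds the key that was not found
    email = None
    print(f"Missing key: {err.args[0]}")

print(email)
```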

## Get key

- If we use the **get method**, we get None if the key does not exist.

- None will be interpreted as False, if checked as a boolean.


In [27]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'address': None,
}

print(user.get('fname'))        # Foo     - because 'fname' has the value 'Foo'
print(user.get('email'))        # None    - because 'email' does not exist
print(user.get('address'))      # None    - because 'address' has the value None

# set a default value to return
print(user.get('fname', 'ABC')) # Foo     - because the value of 'fname' is 'Foo'
print(user.get('answer', 42))   # 42      - because 'answer' does not exist
print(user.get('address', 23))  # None    - because None is the value of the 'address' key

Foo
None
None
Foo
42
None
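
As the last line of the output shows, `get` cannot tell a missing key from a key whose value is `None` unless a different default is supplied. One common way around this is a unique sentinel object as the default. This is a sketch: `MISSING` and `describe` are names invented for this example.

```python
user = {
    'fname': 'Foo',
    'address': None,
}

MISSING = object()   # hypothetical sentinel; any unique object works

def describe(d, key):
    """Report whether key is absent or present, even with a None value."""
    value = d.get(key, MISSING)
    if value is MISSING:
        return f"{key}: no such key"
    return f"{key}: {value!r}"

print(describe(user, 'address'))   # key exists, value is None
print(describe(user, 'email'))     # key does not exist
```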


## Does the key exist?

In [28]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'answer': None,
}

print('fname' in user)  # True
print('email' in user)  # False
print('answer' in user) # True
print('Foo' in user)    # False

for attr in ['fname', 'email', 'lname']:
    if attr in user:
        print(f"{attr} => {user[attr]}")

# fname => Foo
# lname => Bar

True
False
True
False
fname => Foo
lname => Bar


## Does the value exist?

In [29]:
user = {
   'fname': 'Foo',
   'lname': 'Bar',
}

print('fname' in user.values())  # False
print('Foo' in user.values())    # True

False
True
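
Checking `in user.values()` only says whether a value is present. To find out which keys hold that value, the items can be scanned. A small sketch, where `keys_with_value` is a helper name made up for this example:

```python
user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'workplace': 'Bar',
}

def keys_with_value(d, wanted):
    """Return every key whose value equals wanted, in dictionary order."""
    return [key for key, value in d.items() if value == wanted]

print(keys_with_value(user, 'Bar'))    # ['lname', 'workplace']
print(keys_with_value(user, 'Zorg'))   # []
```

Several keys can share one value, which is why the helper returns a list.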


## Delete key

- **del** statement: removes the key.
- **.pop()** method: removes the key and returns its value.

In [30]:
user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'email': 'foo@bar.com',
}

print(user) # {'fname': 'Foo', 'lname': 'Bar', 'email': 'foo@bar.com'}

fname = user['fname']
del user['fname']
print(fname) # Foo
print(user) # {'lname': 'Bar', 'email': 'foo@bar.com'}

lname_was = user.pop('lname')
print(lname_was) # Bar
print(user) # {'email': 'foo@bar.com'}

{'fname': 'Foo', 'lname': 'Bar', 'email': 'foo@bar.com'}
Foo
{'lname': 'Bar', 'email': 'foo@bar.com'}
Bar
{'email': 'foo@bar.com'}
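
Both ways of deleting raise `KeyError` when the key is absent, unless `pop` is given a default as its second argument. A short sketch of that difference:

```python
user = {'email': 'foo@bar.com'}

# pop with a default never raises, even if the key is absent
phone = user.pop('phone', None)
print(phone)   # None

# del on a missing key raises KeyError
del_failed = False
try:
    del user['phone']
except KeyError:
    del_failed = True
    print("del: there is no 'phone' key")

print(user)    # {'email': 'foo@bar.com'}
```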


## List of dictionaries

In [40]:
people = [
    {
        'name'  : 'Foo Bar',
        'email' : 'foo@example.com'
    },
    {
        'name'     : 'Tal Bar',
        'email'    : 'tal@example.com',
        'address'  : 'Borg, Country',
        'children' : [
            'Alpha',
            'Beta'
        ]
    }
]
children = people[1]['children']

# print(people)
print(people[0]['name'])
print(people[1]['children'][0])
people[1]['children'].append('Gamma')
print(children)

print(list(map(lambda p: p['name'], people)))
print(list(map(lambda p: p['name'] + ', ' + p['email'], people)))
people[0]['children'] = ['Zorg', 'Buzz']

Foo Bar
Alpha
['Alpha', 'Beta', 'Gamma']
['Foo Bar', 'Tal Bar']
['Foo Bar, foo@example.com', 'Tal Bar, tal@example.com']


## Shared dictionary

In [49]:
people = [
    {
       "name" : "Foo",
       "id"   : "1",
    },
    {
       "name" : "Bar",
       "id"   : "2",
    },
    {
       "name" : "Moo",
       "id"   : "3",
    },
]

by_name = {}
by_id = {}
for person in people:
    by_name[ person['name' ] ] = person
    by_id[ person['id' ] ] = person
print(by_name)
print(by_id)
print('-------------------')

print(by_name["Foo"])
by_name["Foo"]['email'] = 'foo@bar.co'


print('--------Change in original people list-----------')

people[0]["name"] = "Foooooo"  # people, by_name and by_id keep references to the same person dictionaries

print(people)

print('---------Shared Part----------')

print("by_name", by_name)
print("by_id", by_id)

print('---------Not shared Part----------')

print(by_name["Foo"])  # the key remained Foo !!!!
print(by_id["1"])

{'Foo': {'name': 'Foo', 'id': '1'}, 'Bar': {'name': 'Bar', 'id': '2'}, 'Moo': {'name': 'Moo', 'id': '3'}}
{'1': {'name': 'Foo', 'id': '1'}, '2': {'name': 'Bar', 'id': '2'}, '3': {'name': 'Moo', 'id': '3'}}
-------------------
{'name': 'Foo', 'id': '1'}
--------Change in original people list-----------
[{'name': 'Foooooo', 'id': '1', 'email': 'foo@bar.co'}, {'name': 'Bar', 'id': '2'}, {'name': 'Moo', 'id': '3'}]
---------Shared Part----------
by_name {'Foo': {'name': 'Foooooo', 'id': '1', 'email': 'foo@bar.co'}, 'Bar': {'name': 'Bar', 'id': '2'}, 'Moo': {'name': 'Moo', 'id': '3'}}
by_id {'1': {'name': 'Foooooo', 'id': '1', 'email': 'foo@bar.co'}, '2': {'name': 'Bar', 'id': '2'}, '3': {'name': 'Moo', 'id': '3'}}
---------Not shared Part----------
{'name': 'Foooooo', 'id': '1', 'email': 'foo@bar.co'}
{'name': 'Foooooo', 'id': '1', 'email': 'foo@bar.co'}
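
When sharing is not wanted, the dictionary has to be copied. The sketch below uses a made-up `person` record to compare a plain reference, a shallow copy made with `dict(...)`, and a deep copy made with `copy.deepcopy`.

```python
import copy

person = {'name': 'Foo', 'tags': ['a']}   # invented example data

alias = person                  # same object, no copy at all
shallow = dict(person)          # new outer dictionary, inner list still shared
deep = copy.deepcopy(person)    # fully independent copy

person['name'] = 'Changed'
person['tags'].append('b')

print(alias['name'])     # Changed
print(shallow['name'])   # Foo
print(shallow['tags'])   # ['a', 'b']
print(deep['tags'])      # ['a']
```

A shallow copy protects only the top-level keys; nested lists and dictionaries are still shared.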


## Immutable collection: tuple as dictionary key

In [50]:
points = {}
p1 = (2, 3)

points[p1] = 'Joe'
points[(17, 5)] = 'Jane'

print(points)
for k in points.keys():
    print(k)
    print(k.__class__)
    print(k.__class__.__name__)
    print(points[k])

{(2, 3): 'Joe', (17, 5): 'Jane'}
(2, 3)
<class 'tuple'>
tuple
Joe
(17, 5)
<class 'tuple'>
tuple
Jane


## Immutable numbers: numbers as dictionary key

In [51]:
number = {
    23   : "Twenty three",
    17   : "Seventeen",
    3.14 : "Three dot fourteen",
    42   : "The answer",
}

print(number)
print(number[42])
print(number[3.14])

{23: 'Twenty three', 17: 'Seventeen', 3.14: 'Three dot fourteen', 42: 'The answer'}
The answer
Three dot fourteen


## Sort a dictionary

- "sort a dictionary" usually means sorting the keys of the dictionary, but what does it mean in Python if we call sorted on a dictionary?

In [52]:
scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

print(scores) # {'Foo': 10, 'Bar': 34, 'Miu': 88, 'Abc': 34}

sorted_names = sorted(scores) # "sort dictionary" sorts the keys
print(sorted_names)  # ['Abc', 'Bar', 'Foo', 'Miu']

sorted_keys = sorted(scores.keys())
print(sorted_keys)  # ['Abc', 'Bar', 'Foo', 'Miu']

{'Foo': 10, 'Bar': 34, 'Miu': 88, 'Abc': 34}
['Abc', 'Bar', 'Foo', 'Miu']
['Abc', 'Bar', 'Foo', 'Miu']


## Sort dictionary values

In [53]:
scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

# sort the values, but we cannot get the keys back!
sorted_values = sorted(scores.values())
print(sorted_values) # [10, 34, 34, 88]

[10, 34, 34, 88]


## Sort dictionary by value

- Sort the keys by the values


In [57]:
scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

def by_value(x):
    return scores[x]

sorted_names = sorted(scores.keys(), key=by_value)
print(sorted_names) # ["Foo", "Bar", "Abc", "Miu"]

# sort using a lambda expression
sorted_names = sorted(scores.keys(), key=lambda x: scores[x])

print(sorted_names) # ["Foo", "Bar", "Abc", "Miu"]

for k in sorted_names:
    print("{} : {}".format(k, scores[k]))

# Foo : 10
# Bar : 34
# Abc : 34
# Miu : 88

['Foo', 'Bar', 'Abc', 'Miu']
['Foo', 'Bar', 'Abc', 'Miu']
Foo : 10
Bar : 34
Abc : 34
Miu : 88


In [58]:
scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

# sort the keys according to the values:
sorted_names = sorted(scores, key=scores.__getitem__)

print(sorted_names) # ["Foo", "Bar", "Abc", "Miu"]

for k in sorted_names:
    print("{} : {}".format(k, scores[k]))

# Foo : 10
# Bar : 34
# Abc : 34
# Miu : 88

['Foo', 'Bar', 'Abc', 'Miu']
Foo : 10
Bar : 34
Abc : 34
Miu : 88
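
Another option is to sort the `(key, value)` pairs themselves, which keeps names and scores together. The sketch below shows two variants with the same `scores` data: ascending by score, and a ranking that is descending by score with ties broken alphabetically.

```python
from operator import itemgetter

scores = {'Foo': 10, 'Bar': 34, 'Miu': 88, 'Abc': 34}

# Ascending by value; sorted() is stable, so ties keep dictionary order.
by_score = sorted(scores.items(), key=itemgetter(1))
print(by_score)   # [('Foo', 10), ('Bar', 34), ('Abc', 34), ('Miu', 88)]

# Descending by value, then alphabetically by name for equal scores.
ranking = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
print(ranking)    # [('Miu', 88), ('Abc', 34), ('Bar', 34), ('Foo', 10)]
```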


## Sort dictionary keys by value (another example)

In [64]:
scores = {
    "Jane"    : 30,
    "Joe"     : 20,
    "George"  : 30,
    "Hellena" : 90,
}

for name in scores.keys():
    print(f"{name:8} {scores[name]}")

print('')
for name in sorted(scores.keys()):
    print(f"{name:8} {scores[name]}")

print('')
for val in sorted(scores.values()):
    print(f"{'':8} {val}")

print('')
for name in sorted(scores.keys(), key=lambda x: scores[x]):
    print(f"{name:8} {scores[name]}")

Jane     30
Joe      20
George   30
Hellena  90

George   30
Hellena  90
Jane     30
Joe      20

         20
         30
         30
         90

Joe      20
Jane     30
George   30
Hellena  90


## Insertion Order is kept
- Since Python 3.7



In [66]:
d = {}
d['a'] = 1
d['b'] = 2
d['d'] = 4
d['c'] = 3

print(d)

{'a': 1, 'b': 2, 'd': 4, 'c': 3}
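
Insertion order has one subtlety: assigning to an existing key does not move it, while removing a key and adding it again places it at the end. A short sketch:

```python
d = {'a': 1, 'b': 2, 'c': 3}

d['a'] = 100            # updating a value keeps the key's position
print(list(d))          # ['a', 'b', 'c']

value = d.pop('a')      # removing and re-adding moves the key to the end
d['a'] = value
print(list(d))          # ['b', 'c', 'a']
```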


## Change order of keys in dictionary - OrderedDict

In [68]:
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4

print(d)
d.move_to_end('a')

print(d)
d.move_to_end('d', last=False) # Move the key 'd' to the beginning of the OrderedDict.

print(d)

for key in d.keys():
    print(key)

OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
OrderedDict([('d', 4), ('b', 2), ('c', 3), ('a', 1)])
d
b
c
a


## Set order of keys in dictionary - OrderedDict

In [69]:
from collections import OrderedDict

d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
print(d)

planned_order = ('b', 'c', 'd', 'a')
e = OrderedDict(sorted(d.items(), key=lambda x: planned_order.index(x[0])))
print(e)

print('-----')
# Create index to value mapping dictionary from a list of values
planned_order = ('b', 'c', 'd', 'a')
plan = dict(zip(planned_order, range(len(planned_order))))
print(plan)

f = OrderedDict(sorted(d.items(), key=lambda x: plan[x[0]]))
print(f)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
-----
{'b': 0, 'c': 1, 'd': 2, 'a': 3}
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])


## OrderedDict functions

The `OrderedDict` class from the `collections` module in Python provides several functions to work with ordered dictionaries. Here are some of the commonly used functions:

1. `__init__(self, *args, **kwargs)`: Initializes an `OrderedDict` object. It can be instantiated with an optional iterable of key-value pairs or keyword arguments.

2. `__getitem__(self, key)`: Returns the value associated with the given key.

3. `__setitem__(self, key, value)`: Sets the value for the specified key.

4. `__delitem__(self, key)`: Deletes the key-value pair associated with the given key.

5. `__iter__(self)`: Returns an iterator over the keys of the `OrderedDict`, preserving the order in which they were inserted.

6. `__reversed__(self)`: Returns a reverse iterator over the keys of the `OrderedDict`.

7. `__contains__(self, key)`: Checks whether the `OrderedDict` contains the specified key.

8. `keys(self)`: Returns a view object containing all the keys in the `OrderedDict`, in the order they were inserted.

9. `values(self)`: Returns a view object containing all the values in the `OrderedDict`, in the order they were inserted.

10. `items(self)`: Returns a view object containing all the key-value pairs in the `OrderedDict`, in the order they were inserted.

11. `popitem(self, last=True)`: Removes and returns the last key-value pair by default, or the first if `last=False`.

12. `pop(self, key[, default])`: Removes and returns the value associated with the specified key. If the key is not found, it returns `default` when one was given and raises `KeyError` otherwise.

13. `clear(self)`: Removes all the elements from the `OrderedDict`.

14. `copy(self)`: Returns a shallow copy of the `OrderedDict`.

15. `update(self, *args, **kwargs)`: Updates the `OrderedDict` with key-value pairs from another `OrderedDict`, an iterable of key-value pairs, or keyword arguments.

16. `move_to_end(self, key, last=True)`: Moves the specified key to either the last position (by default) or the first position if `last=False`.

These functions provide various operations to manipulate and interact with `OrderedDict` objects while preserving the insertion order of elements.
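
Two behaviors from the list above that differ from a plain `dict` are `popitem(last=False)` and order-sensitive equality. The following sketch uses invented example data to show both.

```python
from collections import OrderedDict

queue = OrderedDict([('first', 1), ('second', 2), ('third', 3)])

oldest = queue.popitem(last=False)   # removes from the front
newest = queue.popitem()             # removes from the end (default)
print(oldest, newest)                # ('first', 1) ('third', 3)
print(list(queue))                   # ['second']

# Two OrderedDicts are equal only if their order matches as well.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b)                # False
print(dict(a) == dict(b))    # True, plain dicts ignore order when comparing
```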

## Setdefault

- Trying to access a key that does not exist in a dictionary results in a KeyError exception.
- Using the get method we can avoid this. The get method returns the value of the key if the key exists, None if the key does not exist, or a default value if one was supplied to the get method. This will not change the dictionary.

- The setdefault method is similar to the get method, but if the key is missing it also creates the key with the given value.



In [72]:
grades = {}
# print(grades['basic'])              # KeyError: 'basic'
print(grades.get('python'))           # None
print(grades.get('python', 'snake'))  # snake
print(grades)                         # {}

print(grades.setdefault('perl'))      # None
print(grades)                         # {'perl': None}

print(grades.setdefault('python', 'snake')) # 'snake'
print(grades)                         # {'perl': None, 'python': 'snake'}
print(grades.setdefault('python')) # 'snake'

None
snake
{}
None
{'perl': None}
snake
{'perl': None, 'python': 'snake'}
snake
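
A typical use of `setdefault` is grouping: the list for a key is created on first use and then appended to. A minimal sketch with a made-up word list:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']   # invented data

by_letter = {}
for word in words:
    # Create an empty list for a new first letter, then append in either case.
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```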


## Default Dict

In [76]:
counter = {}

word = 'eggplant'

counter[word] += 1
# counter[word] = counter[word] + 1

KeyError: 'eggplant'

In [77]:
counter = {}

word = 'eggplant'

if word not in counter:
    counter[word] = 0
counter[word] += 1

print(counter)

{'eggplant': 1}


In [78]:
from collections import defaultdict

counter = defaultdict(int)

word = 'eggplant'

counter[word] += 1

print(counter)

defaultdict(<class 'int'>, {'eggplant': 1})
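
For plain counting, the standard library also offers `collections.Counter`, which can replace the `defaultdict(int)` pattern. A short sketch with an example word list:

```python
from collections import Counter

words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']

counter = Counter(words)

print(counter['Sloth'])         # 3
print(counter['Eggplant'])      # 0, a missing key counts as zero
print('Eggplant' in counter)    # False, the lookup did not add the key
print(counter.most_common(2))   # [('Sloth', 3), ('Rhino', 2)]
```

Unlike `defaultdict`, reading a missing key from a `Counter` does not insert it.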


## Do not change dictionary in loop

In [79]:
my_dict = {'a': 1, 'b': 2, 'c': 3}

# Method 1: Changing dictionary keys
for key in my_dict:
    if key == 'b':
        del my_dict[key]

print(my_dict)

RuntimeError: dictionary changed size during iteration

In [81]:
my_dict = {'a': 1, 'b': 2, 'c': 3, 'd' : 4}
keys_to_remove = []

# Identify keys to remove
for key in my_dict:
    if key == 'b' or key == 'c':
        keys_to_remove.append(key)

# Remove identified keys
for key in keys_to_remove:
    del my_dict[key]

print(my_dict)

{'a': 1, 'd': 4}
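
Two shorter alternatives to collecting the keys first are sketched here: iterating over a copy of the keys made with `list(...)`, or building a new dictionary with a comprehension and leaving the original untouched.

```python
my_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# list(my_dict) copies the keys, so deleting inside the loop is safe.
for key in list(my_dict):
    if key in ('b', 'c'):
        del my_dict[key]
print(my_dict)    # {'a': 1, 'd': 4}

# Alternatively, build a filtered copy instead of mutating in place.
original = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
filtered = {k: v for k, v in original.items() if k not in ('b', 'c')}
print(filtered)   # {'a': 1, 'd': 4}
```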


## Named tuple (sort of immutable dictionary)

In [82]:
from collections import namedtuple

Person = namedtuple('Person', ['name', 'email'])

one = Person(name='Joe', email='joe@example.com')
two = Person(name='Jane', email='jane@example.com')

print(one.name)
print(two.email)

Joe
jane@example.com
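
The "immutable" part can be seen by trying to reassign a field. The sketch below also uses two helper methods that namedtuples provide, `_replace` and `_asdict`.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'email'])
one = Person(name='Joe', email='joe@example.com')

# Fields cannot be reassigned after creation.
try:
    one.email = 'new@example.com'
except AttributeError:
    print("fields of a namedtuple cannot be changed")

# _replace returns a new tuple with some fields changed.
updated = one._replace(email='new@example.com')
print(updated.email)   # new@example.com
print(one.email)       # joe@example.com

# _asdict converts the record to a dictionary.
as_dict = one._asdict()
print(as_dict['name'])  # Joe
```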


## Create dictionary from List

In [84]:
categories_list = ['animals', 'vegetables', 'fruits']

categories_dict = {cat:[] for cat in categories_list}
print(categories_dict)
categories_dict['animals'].append('cat')
print(categories_dict)

{'animals': [], 'vegetables': [], 'fruits': []}
{'animals': ['cat'], 'vegetables': [], 'fruits': []}
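
The comprehension matters here because it creates a separate list for every key. A tempting shortcut, `dict.fromkeys` with a mutable default, gives every key the same list object, as this sketch shows:

```python
categories_list = ['animals', 'vegetables', 'fruits']

# Every key receives a reference to the SAME list object.
shared = dict.fromkeys(categories_list, [])
shared['animals'].append('cat')
print(shared)
# {'animals': ['cat'], 'vegetables': ['cat'], 'fruits': ['cat']}

# With an immutable default such as 0, fromkeys is perfectly fine.
counts = dict.fromkeys(categories_list, 0)
counts['animals'] += 1
print(counts)
# {'animals': 1, 'vegetables': 0, 'fruits': 0}
```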


# <b style='color:red'>Exercises</b>

## Exercise: count characters

- Write a script that given a long text will count how many times each character appears.
- Change the code so it will be able to count characters in a file.


`text = """
This is a very long text.
OK, maybe it is not that long after all.
"""`

In [91]:
text = """
This is a very long text.
OK, maybe it is not that long after all.
"""
char_count = dict()
for c in text:
    if c in char_count:
        char_count[c] += 1
    else:
        char_count[c] = 1
        
for c, count in char_count.items():
    print(f'{c:8} {count:8}')


               3
T               1
h               2
i               4
s               3
               13
a               5
v               1
e               4
r               2
y               2
l               4
o               3
n               3
g               2
t               7
x               1
.               2
O               1
K               1
,               1
m               1
b               1
f               1


## Exercise: count words

`words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']`


- Expected output: (the order is not important)


`Wombat:1`<br/>
`Rhino:2`<br/>
`Sloth:3`<br/>
`Tarantula:1`<br/>

In [94]:
words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']
word_count = dict()
for w in words:
    if w in word_count:
        word_count[w] += 1
    else:
        word_count[w] = 1
        
for w, count in word_count.items():
    print(f'{w:10} {count:8}')

Wombat            1
Rhino             2
Sloth             3
Tarantula         1


## Exercise: count words from a file

In [100]:
word_count = dict()

with open('__FILES/lorem ipsum.txt', 'r') as f:
    for line in f:
        words = line.split(' ')
        for w in words:
            if w in word_count:
                word_count[w] += 1
            else:
                word_count[w] = 1

for w, count in word_count.items():
    print(f'{w:10} {count:8}')

لورم              1
ایپسوم            1
یا                3
طرح‌نما           1
(به               1
انگلیسی:          1
Lorem             1
ipsum)            1
به                7
متنی              1
آزمایشی           2
و                14
بی‌معنی           2
در                6
صنعت              1
چاپ،              1
صفحه‌آرایی        2
طراحی             3
گرافیک            3
گفته              1
می‌شود.           1
طراح              1
از                7
این               1
متن               6
عنوان             1
عنصری             1
ترکیب             1
بندی              2
برای              2
پر                1
کردن              1
صفحه              4
ارایه             1
اولیه             1
شکل               1
ظاهری             1
کلی               1
طرح               1
سفارش             1
گرفته             2
شده               2
استفاده           3
می                2
نماید،            1
تا                3
نظر               5
گرافیکی           2
نشانگر            1
چگونگی            1


## Exercise: Apache log

In [104]:
ip_count = dict()

with open('__FILES/apache.log') as apache_log:
    for line in apache_log:
        ip_addr = line.split(' ')[0]
        if ip_addr in ip_count:
            ip_count[ip_addr] += 1
        else:
            ip_count[ip_addr] = 1
for ip, c in ip_count.items():
    print(f'{ip:10} {c:8}')

127.0.0.1        12
139.12.0.2        2
217.0.22.3        7


## Exercise: counting DNA bases

In [111]:
from collections import defaultdict

sequence =  "ACTNGTGCTYGATRGTAGCYXGTN"
element_count = defaultdict(int)
for s in sequence:
    element_count[s] += 1

total = len(sequence)

for ele, count in element_count.items():
    percent = round(100 * count / total, 2)
    print(f'{ele} {count} - {percent:0.2f} %')

A 3 - 12.50 %
C 3 - 12.50 %
T 6 - 25.00 %
N 2 - 8.33 %
G 6 - 25.00 %
Y 2 - 8.33 %
R 1 - 4.17 %
X 1 - 4.17 %


## Exercise: Count Amino Acids

- Each sequence consists of many repetitions of the 4 bases represented by the ACTG characters.
- There are 64 codons (sets of 3 consecutive bases).
- There are 20 amino acids, each represented by 3 bases (one codon).
- Some amino acids can be encoded by more than one codon, as listed in the codon table. For example, Histidine can be encoded by both CAT and CAC.
- Create a script that, given a file with a DNA sequence in it, counts the amino acids in the sequence.
- Read the sequence saved in a txt file.
- You can generate a sequence with a random number generator and save it to that file, but it would be much better if you used a real sequence.
- An even better way would be to read the sequence from a FASTA file. You can download one from NCBI.

In [74]:
codon_table = {
    'Phe' : ['TTT', 'TTC'],
    'Leu' : ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'],
    'Ile' : ['ATT', 'ATC', 'ATA'],
    'Met' : ['ATG'],
    'Val' : ['GTT', 'GTC', 'GTA', 'GTG'],
    'Ser' : ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Pro' : ['CCT', 'CCC', 'CCA', 'CCG'],
    'Thr' : ['ACT', 'ACC', 'ACA', 'ACG'],
    'Ala' : ['GCT', 'GCC', 'GCA', 'GCG'],
    'Tyr' : ['TAT', 'TAC'],
    'His' : ['CAT', 'CAC'],
    'Gln' : ['CAA', 'CAG'],
    'Asn' : ['AAT', 'AAC'],
    'Lys' : ['AAA', 'AAG'],
    'Asp' : ['GAT', 'GAC'],
    'Glu' : ['GAA', 'GAG'],
    'Cys' : ['TGT', 'TGC'],
    'Trp' : ['TGG'],
    'Arg' : ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'Gly' : ['GGT', 'GGC', 'GGA', 'GGG'],
    'STOP' : ['TAA', 'TAG', 'TGA']
}

In [127]:
from collections import defaultdict

codon_count = defaultdict(int)

with open('__FILES/actg_sequence.txt', 'r') as actg:
    for line in actg:
        for i in range(0,len(line)-3):
            seq = line[i:i+3]
            for k, v in codon_table.items():
                if seq.upper() in v:
                    codon_count[k] += 1
                    # print(k + ' ' + seq.upper())
for k,v in codon_count.items():
    print(f'{k:8} {v:8}')

Lys            45
Arg            95
Gly            55
Glu            38
Ser            99
Ala            47
Leu            85
STOP           41
Val            64
Phe            36
Asp            26
Ile            39
Tyr            22
Thr            64
Pro            57
His            34
Gln            33
Asn            35
Met            11
Cys            27
Trp            12


## Exercise: List of dictionaries

In [134]:
people = list()
with open('__FILES/people.txt' , 'r') as f:
    for i, p in enumerate(f):
        if i == 0:
            continue
        fname,lname,born = p.strip('\n').split(',')
        people.append({'fname' : fname, 'lname' : lname, 'born' : born})

for p in people:
    print(p)

{'fname': 'Graham', 'lname': 'Chapman', 'born': '8 January 1941'}
{'fname': 'Eric', 'lname': 'Idle', 'born': '29 March 1943'}
{'fname': 'Terry', 'lname': 'Gilliam', 'born': '22 November 1940'}
{'fname': 'Terry', 'lname': 'Jones', 'born': '1 February 1942'}
{'fname': 'John', 'lname': 'Cleese', 'born': '27 October 1939'}
{'fname': 'Michael', 'lname': 'Palin', 'born': '5 May 1943'}


## Exercise: Dictionary of dictionaries

In [139]:
people = dict()
with open('__FILES/people.txt' , 'r') as f:
    for i, p in enumerate(f):
        if i == 0:
            continue
        fname,lname,born = p.strip('\n').split(',')
        people[(fname,lname)] = dict()
        people[(fname,lname)]['born'] = born



for p,v in people.items():
    print(p, v['born'])
print()
print(people[('Eric', 'Idle')]['born']) # 29 March 1943

('Graham', 'Chapman') 8 January 1941
('Eric', 'Idle') 29 March 1943
('Terry', 'Gilliam') 22 November 1940
('Terry', 'Jones') 1 February 1942
('John', 'Cleese') 27 October 1939
('Michael', 'Palin') 5 May 1943

29 March 1943


## Exercise: Age limit with dictionaries

- Ask the user for their age and the country they are located in.
- Tell them if they can legally drink alcohol.
- See the Legal drinking age list.
- Given a file like the following, create a new file with a third column containing "yes" or "no", depending on whether the person can legally drink alcohol in that country.

In [144]:
age_limit = {'US' : 21, 'UK' : 18, 'IR' : 69}

age = int(input('How old are you?'))
country = input('Where are you from?')

if country not in age_limit:
    print('Country not supported')
else:
    if age >= age_limit[country]:
        print('Salute!')
    else:
        print('You can go to prison for that!')

How old are you? 68
Where are you from? IR


You can go to prison for that!


## Exercise: Merge files with timestamps

In [148]:
a = [
    '1601009973,1',
    '1601009975,3',
    '1601009976,4',
    '1601009978,6',
    '1601009981,9',
    '1601009982,10',
    '1601009983,11',
    '1601009984,12',
    '1601009987,15',
    '1601009989,17',
    '1601009990,18',
    '1601009991,19',
    '1601009992,20'
]

b = [
    '1601009974,2',
    '1601009977,5',
    '1601009980,8',
    '1601009988,16',
]

c = [
    '1601009979,7',
    '1601009985,13',
    '1601009986,14',
]

In [151]:
idx_a = 0
idx_b = 0
while idx_a < len(a) and idx_b < len(b):
    line_a = a[idx_a]
    line_b = b[idx_b]

    time_a = line_a.split(',')[0]
    time_b = line_b.split(',')[0]
    if int(time_a) < int(time_b):
        print(line_a)
        idx_a += 1
    else:
        print(line_b)
        idx_b += 1

# One list is exhausted; print whatever is left in the other.
for line in a[idx_a:] + b[idx_b:]:
    print(line)

1601009973,1
1601009974,2
1601009975,3
1601009976,4
1601009977,5
1601009978,6
1601009980,8
1601009981,9
1601009982,10
1601009983,11
1601009984,12
1601009987,15
1601009988,16
1601009989,17
1601009990,18
1601009991,19
1601009992,20


In [155]:
files = []
for f in [a,b,c]:
    files.extend(f)

sorted_files = sorted(files, key=lambda k: int(k.split(',')[0]))  # compare timestamps as numbers
for s in sorted_files:
    print(s)

1601009973,1
1601009974,2
1601009975,3
1601009976,4
1601009977,5
1601009978,6
1601009979,7
1601009980,8
1601009981,9
1601009982,10
1601009983,11
1601009984,12
1601009985,13
1601009986,14
1601009987,15
1601009988,16
1601009989,17
1601009990,18
1601009991,19
1601009992,20
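
Since each input is already sorted by timestamp, the standard library function `heapq.merge` can combine any number of them without re-sorting everything. The sketch below uses shortened versions of the lists; `timestamp` is a helper name made up for this example.

```python
import heapq

# Shortened sample data; each list is already sorted by timestamp.
a = ['1601009973,1', '1601009975,3', '1601009978,6']
b = ['1601009974,2', '1601009977,5']
c = ['1601009976,4']

def timestamp(line):
    """Extract the numeric timestamp from a 'timestamp,value' line."""
    return int(line.split(',')[0])

# heapq.merge yields items lazily, so it also suits large files read line by line.
merged = list(heapq.merge(a, b, c, key=timestamp))
for line in merged:
    print(line)
```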


## Sort Hungarian letters (lookup table)

In [157]:
letters = [
    "a", "á", "b", "c", "cs", "d", "dz", "dzs", "e", "é", "f",
    "g", "gy", "h", "i", "í", "j", "k", "l", "ly", "m", "n",
    "ny", "o", "ó", "ö", "ő", "p", "q", "r", "s", "sz", "t",
    "ty", "u", "ú", "ü", "ű", "v", "w", "x", "y", "z", "zs",
]
print(enumerate(letters))
print('-------')
print(list(enumerate(letters)))
print('-------')
print(dict(enumerate(letters)))
print('-------')

# reverse key:value in letter dict
#mapping = {v:k for k, v in dict(enumerate(letters)).items()}

mapping = {letter:ix for ix, letter in enumerate(letters)}
print(mapping)
print('------------------')

text = ["cs", "á", "ő", "ú", "e", "dzs", "zs", "a", "ny"]
print(sorted(text))
print('------------------')
print(sorted(text, key=lambda letter: mapping[letter]))

<enumerate object at 0x7f9f60613200>
-------
[(0, 'a'), (1, 'á'), (2, 'b'), (3, 'c'), (4, 'cs'), (5, 'd'), (6, 'dz'), (7, 'dzs'), (8, 'e'), (9, 'é'), (10, 'f'), (11, 'g'), (12, 'gy'), (13, 'h'), (14, 'i'), (15, 'í'), (16, 'j'), (17, 'k'), (18, 'l'), (19, 'ly'), (20, 'm'), (21, 'n'), (22, 'ny'), (23, 'o'), (24, 'ó'), (25, 'ö'), (26, 'ő'), (27, 'p'), (28, 'q'), (29, 'r'), (30, 's'), (31, 'sz'), (32, 't'), (33, 'ty'), (34, 'u'), (35, 'ú'), (36, 'ü'), (37, 'ű'), (38, 'v'), (39, 'w'), (40, 'x'), (41, 'y'), (42, 'z'), (43, 'zs')]
-------
{0: 'a', 1: 'á', 2: 'b', 3: 'c', 4: 'cs', 5: 'd', 6: 'dz', 7: 'dzs', 8: 'e', 9: 'é', 10: 'f', 11: 'g', 12: 'gy', 13: 'h', 14: 'i', 15: 'í', 16: 'j', 17: 'k', 18: 'l', 19: 'ly', 20: 'm', 21: 'n', 22: 'ny', 23: 'o', 24: 'ó', 25: 'ö', 26: 'ő', 27: 'p', 28: 'q', 29: 'r', 30: 's', 31: 'sz', 32: 't', 33: 'ty', 34: 'u', 35: 'ú', 36: 'ü', 37: 'ű', 38: 'v', 39: 'w', 40: 'x', 41: 'y', 42: 'z', 43: 'zs'}
-------
{'a': 0, 'á': 1, 'b': 2, 'c': 3, 'cs': 4, 'd': 5, 'dz': 6