# <center> Data structures. Part 2 </center>

**Dictionaries**

A dictionary is a collection of key-value pairs. Since Python 3.7, dictionaries keep their keys in insertion order, but we still cannot access an element by position (index); we look values up by key.

In [1]:
president = {
  "name": "Uknown",
  "age": 78,
  "political party": 'Democratic',
  "children" : ['Hunter', 'Beau', 'Ashley', 'Naomi Christina']
}

Indexing by position raises a `KeyError`, because `0` is treated as a key that does not exist; indexing by an existing key returns its value:

In [2]:
president[0]

KeyError: 0

In [3]:
president['age']

78
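Square brackets raise `KeyError` when the key is missing. When a missing key is expected, the built-in `dict.get()` method returns a fallback value instead. The small dictionary below is hypothetical and only mirrors the example above.

```python
# A small stand-alone dictionary (hypothetical data)
person = {"name": "Unknown", "age": 78}

# Square brackets raise KeyError for a missing key
try:
    person["height"]
except KeyError as err:
    print("KeyError:", err)

# get() returns None, or the supplied default, instead of raising
print(person.get("height"))         # None
print(person.get("height", "n/a"))  # n/a
print(person.get("age", 0))         # 78, because the key exists
```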

To see which keys and values a dictionary holds, use the `keys()`, `values()` and `items()` methods; `items()` returns (key, value) pairs.

In [4]:
president.keys()

dict_keys(['name', 'age', 'political party', 'children'])

In [5]:
president.values()

dict_values(['Uknown', 78, 'Democratic', ['Hunter', 'Beau', 'Ashley', 'Naomi Christina']])

In [6]:
president.items()

dict_items([('name', 'Uknown'), ('age', 78), ('political party', 'Democratic'), ('children', ['Hunter', 'Beau', 'Ashley', 'Naomi Christina'])])
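The objects returned by `keys()`, `values()` and `items()` are views, not copies: they reflect later changes to the dictionary and cannot be indexed. A short sketch with a hypothetical `scores` dictionary:

```python
scores = {"Alice": 90, "Bob": 85}   # hypothetical data

keys = scores.keys()       # a view, not a copy
print(keys)                # dict_keys(['Alice', 'Bob'])

scores["Carol"] = 77       # change the dictionary after creating the view
print(keys)                # dict_keys(['Alice', 'Bob', 'Carol'])

# Convert a view to a list when you need indexing or a fixed snapshot
key_list = list(scores.keys())
print(key_list[0])         # Alice
```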

In [7]:
for key, value in president.items():
    print('key:', key, '- value:', value)

key: name - value: Uknown
key: age - value: 78
key: political party - value: Democratic
key: children - value: ['Hunter', 'Beau', 'Ashley', 'Naomi Christina']


In [8]:
for key in president:
    print(key)

name
age
political party
children


Adding a key-value pair is as simple as replacing a value by index in a list: assigning to `d[key]` creates the key if it is missing and overwrites its value otherwise. Pairs are removed with the `del` statement, and the `in` operator checks whether a key is present in the dictionary.

In [9]:
president['place of birth'] = 'Scranton, Pennsylvania, U.S.' # new key and value
president['education'] = ['University of Delaware (BA)','Syracuse University (JD)']
president['name'] = 'Joe Biden'  # overwrite the value in existing key
print(president)

{'name': 'Joe Biden', 'age': 78, 'political party': 'Democratic', 'children': ['Hunter', 'Beau', 'Ashley', 'Naomi Christina'], 'place of birth': 'Scranton, Pennsylvania, U.S.', 'education': ['University of Delaware (BA)', 'Syracuse University (JD)']}


In [10]:
del president['children']
print(president)

{'name': 'Joe Biden', 'age': 78, 'political party': 'Democratic', 'place of birth': 'Scranton, Pennsylvania, U.S.', 'education': ['University of Delaware (BA)', 'Syracuse University (JD)']}


In [11]:
'place of birth' in president # is 'place of birth' among keys?

True
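Two details worth knowing: `in` checks keys, not values, and `pop()` removes a key while returning its value, with an optional default that avoids `KeyError`. The `capitals` dictionary is a hypothetical example.

```python
capitals = {"France": "Paris", "Italy": "Rome"}   # hypothetical data

# `in` looks at keys, not values
print("France" in capitals)            # True
print("Paris" in capitals)             # False
print("Paris" in capitals.values())    # True

# pop() removes a key and returns its value; the second argument is
# returned when the key is missing, so no KeyError is raised
rome = capitals.pop("Italy")
missing = capitals.pop("Spain", None)
print(rome, missing, capitals)         # Rome None {'France': 'Paris'}
```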

**Sets**

<img src="https://www.learnbyexample.org/wp-content/uploads/python/Python-Set-Operatioons.png" height=300 width=300>
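A set is an unordered collection of unique elements. The sketch below, using a hypothetical `visited` set, shows that duplicates are dropped, that membership tests work with `in`, and that an empty set must be created with `set()` because `{}` is an empty dictionary.

```python
# Duplicates are dropped automatically when a set is built
visited = set(["Paris", "Rome", "Paris", "Berlin", "Rome"])
print(len(visited))            # 3

# Membership tests are a typical use case for sets
print("Rome" in visited)       # True

# {} creates an empty dict, so an empty set needs set()
print(type({}), type(set()))   # <class 'dict'> <class 'set'>

# Sets are mutable: add() inserts an element, discard() removes it if present
visited.add("Madrid")
visited.discard("Berlin")
visited.discard("Oslo")        # no error even though "Oslo" is absent
print(sorted(visited))         # ['Madrid', 'Paris', 'Rome']
```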

In [12]:
european_union = {"Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czech Republic", "Denmark",
    "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia", 
    "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", 
    "Slovakia", "Slovenia", "Spain", "Sweden"}

eurozone = {"Austria", "Belgium", "Cyprus", "Portugal", "Spain", 
            "Slovakia", "Slovenia", "Ireland", "Italy", "France", "Germany", "Greece",
            "Estonia", "Finland", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands"}

In [13]:
print(european_union | eurozone)  # union of sets -- we join all unique names from two sets together
print(european_union & eurozone)  # intersection -- keep the elements that are in both sets
print(european_union - eurozone)  # difference -- elements in the first set, but not in the second
print(european_union ^ eurozone)  # symmetric difference -- elements from A | B, but not from A & B

{'Sweden', 'Slovenia', 'Croatia', 'Poland', 'Denmark', 'Czech Republic', 'Latvia', 'Ireland', 'Netherlands', 'France', 'Bulgaria', 'Finland', 'Italy', 'Malta', 'Estonia', 'Greece', 'Belgium', 'Cyprus', 'Portugal', 'Spain', 'Luxembourg', 'Austria', 'Romania', 'Germany', 'Hungary', 'Slovakia', 'Lithuania'}
{'Slovenia', 'Latvia', 'Ireland', 'Netherlands', 'France', 'Finland', 'Italy', 'Estonia', 'Malta', 'Greece', 'Belgium', 'Cyprus', 'Portugal', 'Spain', 'Luxembourg', 'Austria', 'Germany', 'Slovakia', 'Lithuania'}
{'Sweden', 'Bulgaria', 'Croatia', 'Poland', 'Denmark', 'Czech Republic', 'Romania', 'Hungary'}
{'Sweden', 'Bulgaria', 'Croatia', 'Poland', 'Denmark', 'Czech Republic', 'Romania', 'Hungary'}
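The difference and the symmetric difference above print the same countries because every eurozone member in this list is also an EU member, so `eurozone - european_union` is empty. The sketch below checks this with small hypothetical sets and the subset operator `<=` (equivalent to `issubset()`).

```python
a = {1, 2, 3, 4}
b = {3, 4}          # b is a subset of a
c = {3, 4, 5}       # c is not: 5 is only in c

print(b <= a, b.issubset(a))   # True True
print(a - b == a ^ b)          # True: nothing belongs to b alone
print(sorted(a - c))           # [1, 2]
print(sorted(a ^ c))           # [1, 2, 5]
```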


Instead of the operator symbols, you can call the equivalent methods:

In [14]:
print(european_union.union(eurozone))
print(european_union.intersection(eurozone))
print(european_union.difference(eurozone))
print(european_union.symmetric_difference(eurozone))

{'Sweden', 'Slovenia', 'Croatia', 'Poland', 'Denmark', 'Czech Republic', 'Latvia', 'Ireland', 'Netherlands', 'France', 'Bulgaria', 'Finland', 'Italy', 'Malta', 'Estonia', 'Greece', 'Belgium', 'Cyprus', 'Portugal', 'Spain', 'Luxembourg', 'Austria', 'Romania', 'Germany', 'Hungary', 'Slovakia', 'Lithuania'}
{'Slovenia', 'Latvia', 'Ireland', 'Netherlands', 'France', 'Finland', 'Italy', 'Estonia', 'Malta', 'Greece', 'Belgium', 'Cyprus', 'Portugal', 'Spain', 'Luxembourg', 'Austria', 'Germany', 'Slovakia', 'Lithuania'}
{'Sweden', 'Bulgaria', 'Croatia', 'Poland', 'Denmark', 'Czech Republic', 'Romania', 'Hungary'}
{'Sweden', 'Bulgaria', 'Croatia', 'Poland', 'Denmark', 'Czech Republic', 'Romania', 'Hungary'}
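One practical difference: the methods accept any iterable, such as a list, while the operators require both operands to be sets. The names `eu_sample` and `candidates` are hypothetical.

```python
eu_sample = {"France", "Germany", "Poland"}
candidates = ["Poland", "Norway"]          # a list, not a set

# Methods accept any iterable
print(sorted(eu_sample.union(candidates)))
# ['France', 'Germany', 'Norway', 'Poland']

# Operators require both operands to be sets
operator_error = None
try:
    eu_sample | candidates
except TypeError as err:
    operator_error = err
    print("TypeError:", err)

# Converting the list first makes the operator work
print(sorted(eu_sample | set(candidates)))
```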


To sum up the properties: **Ordered** means that Python keeps the elements in the order in which you inserted them. **Mutable** means that the elements can be changed after creation, for example by replacing one of them with a new value. **Constructor** shows how each data type is created and written; note the different brackets.

<img src="https://miro.medium.com/max/1400/1*Det-kkoSw9T4IZ4XrypVNQ.png" style="width:700px;height:300px"/>
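The mutability and ordering properties described above can be checked directly on the four built-in collections. This is a minimal sketch with arbitrary example values; it does not reproduce the table itself.

```python
# Constructors: note the different brackets
as_list = [1, 2, 2, 3]           # list: ordered, mutable, duplicates allowed
as_tuple = (1, 2, 2, 3)          # tuple: ordered, immutable
as_set = {1, 2, 2, 3}            # set: unordered, mutable, duplicates dropped
as_dict = {"a": 1, "b": 2}       # dict: key-value pairs, mutable, insertion order kept (Python 3.7+)

as_list[0] = 10                  # lists can be changed in place
as_dict["a"] = 10                # so can dictionaries

try:
    as_tuple[0] = 10             # tuples cannot
except TypeError as err:
    print("TypeError:", err)

print(as_list, as_set, as_dict)  # [10, 2, 2, 3] {1, 2, 3} {'a': 10, 'b': 2}
```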

### Strings

The key string operations are concatenation (`+`), repetition *N* times (`*`), splitting and joining.

In [68]:
greeting = 'welcome to the course on Data Analysis!'

whom = 'Dear students, '

# concatenation
print(whom + greeting)

Dear students, welcome to the course on Data Analysis!


In [69]:
# repeat a string N times with the * operator
print('You will need to work', 'very ' * 5, 'hard...')

You will need to work very very very very very  hard...


In [70]:
# string starts with
print(greeting.startswith('Chilling'))

False


In [71]:
# string ends with
print(greeting.endswith('sis!'))
print('Recheck it'.endswith('t'))

True
True
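Both `startswith()` and `endswith()` also accept a tuple of candidates and return `True` if any of them matches; the comparison is case-sensitive. The file name below is a hypothetical example.

```python
filename = "report_2021.csv"   # hypothetical file name

# A tuple of prefixes or suffixes is checked element by element
print(filename.endswith((".csv", ".xlsx")))   # True
print(filename.startswith(("data", "log")))   # False

# Matching is case-sensitive
print(filename.startswith("Report"))          # False
print(filename.lower().startswith("report"))  # True
```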


In [72]:
# split
text_line = 'You need_to split this_sentence into four parts_by underscore sign.'
print(text_line.split('_'))

['You need', 'to split this', 'sentence into four parts', 'by underscore sign.']
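Called without an argument, `split()` splits on any run of whitespace and drops empty pieces, while `split(' ')` keeps an empty string for every extra space. The optional `maxsplit` argument limits the number of cuts. The strings here are hypothetical examples.

```python
messy = "  several   spaces\tand a tab  "

print(messy.split(" "))   # empty strings appear between repeated spaces
print(messy.split())      # ['several', 'spaces', 'and', 'a', 'tab']

# maxsplit limits the number of cuts made from the left
print("a_b_c_d".split("_", 1))   # ['a', 'b_c_d']
```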


In [73]:
# join
text_line = 'You need to join this sentence by asterisks instead of white spaces.'
'*'.join(text_line.split(' '))

'You*need*to*join*this*sentence*by*asterisks*instead*of*white*spaces.'

The `find()` method returns the index of the first character of the first occurrence of a substring. If the substring is not found, it returns `-1`.

In [74]:
# find

prologue = '''The morning had dawned clear and cold, with a crispness that hinted at
the end of summer. They set forth at daybreak to see a man beheaded, twenty in all, and
Bran rode among them, nervous with excitement. This was the first time he had been deemed
old enough to go with his lord father and his brothers to see the king’s justice done. It was
the ninth year of summer, and the seventh of Bran’s life'''

prologue.find('Bran')

159

The name 'Bran' was found: 159 is the index of its first letter in the string `prologue`.

In [75]:
print(prologue[prologue.find('Bran'):]) # prints the whole string starting from 'Bran'

Bran rode among them, nervous with excitement. This was the first time he had been deemed
old enough to go with his lord father and his brothers to see the king’s justice done. It was
the ninth year of summer, and the seventh of Bran’s life


In [76]:
print(prologue[prologue.find('Bran'):prologue.find(' rode')]) # slice from 'Bran' up to, but not including, ' rode'

Bran
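`find()` also takes an optional start index, which lets you locate later occurrences, and `index()` works the same way but raises `ValueError` instead of returning `-1`. The sentence below is a hypothetical example.

```python
text = "Bran rode ahead; later Bran rode back."

first = text.find("Bran")
second = text.find("Bran", first + 1)    # search again after the first match
print(first, second)                      # 0 23

print(text.find("Jon"))                   # -1: not found

# index() behaves like find() but raises ValueError when nothing is found
try:
    text.index("Jon")
except ValueError as err:
    print("ValueError:", err)
```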


Let's build a very simple sentiment analysis from scratch: if a review contains the word 'bad', we label it as negative.

In [77]:
feedback = 'This place was bad enough.'

print(feedback)
if feedback.find('bad') == -1: # if 'bad' not found
    print('Feedback is not negative')
else:
    print('Feedback is negative')

This place was bad enough.
Feedback is negative
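A plain substring search also fires on longer words such as 'badge'. The sketch below is one possible refinement, not the course's method: the hypothetical helper `is_negative` splits the review into words, strips punctuation and compares whole words. The `in` operator is also a more idiomatic substring test than comparing `find()` with `-1`.

```python
import string

def is_negative(review: str, negative_words=("bad",)) -> bool:
    """Hypothetical helper: label a review negative if it contains
    any whole word from negative_words (case-insensitive)."""
    words = [w.strip(string.punctuation).lower() for w in review.split()]
    return any(word in negative_words for word in words)

print(is_negative("This place was bad enough."))         # True
print(is_negative("They gave me a badge at the door."))  # False
print("bad" in "They gave me a badge at the door.")      # True: substring match
```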


In [78]:
# replace
statement = '''I take you, Monica, to be my wife. \
I promise to be true to you in good times and in bad, in sickness and in health. \
I will love you and honour you all the days of my life.'''

statement.replace('Monica', 'Rachel')

'I take you, Rachel, to be my wife. I promise to be true to you in good times and in bad, in sickness and in health. I will love you and honour you all the days of my life.'
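`replace()` takes an optional third argument that limits how many occurrences are replaced. Because strings are immutable, it returns a new string and leaves the original unchanged, so the result must be assigned to keep it. The `vow` string is a hypothetical example.

```python
vow = "in sickness and in health, in good times and in bad"

# The optional third argument limits how many occurrences are replaced
print(vow.replace("in ", "IN ", 1))   # only the first 'in ' changes

# Strings are immutable: replace() returns a new string
vow.replace("sickness", "joy")
print(vow)                            # unchanged
vow = vow.replace("sickness", "joy")  # reassign to keep the result
print(vow)
```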