## Dictionaries

`dict` `dict()`

- created using curly braces `{}` or the `dict()` constructor
- used to store data in the form of **key-value pairs**
- syntax: `{key1: value1, key2: value2, ... }`

example:

In [2]:
dict1 = {'Name': 'John', 'Salary': 1000, 'Grade': 'B'}

In [2]:
dict1

{'Name': 'John', 'Salary': 1000, 'Grade': 'B'}

### Important properties of dictionaries

1. Heterogeneous in nature
2. Values can repeat, but keys cannot: each key is unique and must be hashable.
3. Very useful for storing data as key-value pairs.
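
A small sketch of properties 1 and 2. The names `employee`, `grades`, and `lookup` are illustrative and not part of the notebook. Repeating a key in a literal keeps only the last value, and a mutable object such as a list cannot be used as a key.

```python
# Hypothetical example data, not from the notebook.
employee = {'Name': 'John', 'Grade': 'B', 'Grade': 'A'}  # 'Grade' appears twice
print(employee)  # the later value wins: {'Name': 'John', 'Grade': 'A'}

# Values may repeat freely.
grades = {'John': 'B', 'Chris': 'B'}

# Keys must be hashable: a tuple works, a list raises TypeError.
lookup = {(1, 2): 'point'}
try:
    lookup[[1, 2]] = 'point'
except TypeError as err:
    key_error = str(err)
    print('TypeError:', key_error)
```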

In [3]:
dict1['Name'] ## access elements from a dictionary

'John'

In [5]:
dict1['Grade']

'B'

In [6]:
## slicing is not possible: dictionary items are accessed by key, not by position

In [7]:
dict1['Name': 'Grade']

TypeError: unhashable type: 'slice'

In [None]:
## updating dictionaries

## in order to update a dictionary, just access the element to update using [] and provide the new value

In [3]:
print(dict1)
dict1['Name'] = 'Brad'
print(dict1)

{'Name': 'John', 'Salary': 1000, 'Grade': 'B'}
{'Name': 'Brad', 'Salary': 1000, 'Grade': 'B'}


In [None]:
## if an existing key is provided while updating a dictionary, the dictionary gets updated.
## if a new key is provided, a new item is added in the dictionary

In [4]:
print(dict1)
dict1['PAN'] = 1001
print(dict1)

{'Name': 'Brad', 'Salary': 1000, 'Grade': 'B'}
{'Name': 'Brad', 'Salary': 1000, 'Grade': 'B', 'PAN': 1001}


With a combination of lists and dictionaries, we can start storing some real world data.

In [57]:
data1 = {'Name': ['John', 'Chris', 'Brad'],
        'Salary': [1000, 2000, 2500],
        'Age': [30, 40, 50]}

In [1]:
data2 = [{'Ticker': 'HDFC', 'NetIncome': 200, 'Revenue': 1000},
      {'Ticker': 'TATA', 'NetIncome': 100, 'Revenue': 1000, 'PE':1.2}]

## example of semi-structured data, similar to JSON files and documents in MongoDB databases
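
A minimal sketch of how values might be read back from these two layouts. The variable names on the left-hand side are illustrative; `data1` and `data2` are repeated so the block runs on its own.

```python
# Same structures as data1 and data2 in the cells above.
data1 = {'Name': ['John', 'Chris', 'Brad'],
         'Salary': [1000, 2000, 2500],
         'Age': [30, 40, 50]}
data2 = [{'Ticker': 'HDFC', 'NetIncome': 200, 'Revenue': 1000},
         {'Ticker': 'TATA', 'NetIncome': 100, 'Revenue': 1000, 'PE': 1.2}]

# Column-style layout: pick the key first, then the position in the list.
second_name = data1['Name'][1]      # 'Chris'
second_salary = data1['Salary'][1]  # 2000

# Record-style layout: pick the record first, then the key.
first_ticker = data2[0]['Ticker']   # 'HDFC'

# Records need not share the same keys; .get() returns None when a key is missing.
pe_values = [record.get('PE') for record in data2]  # [None, 1.2]
print(second_name, second_salary, first_ticker, pe_values)
```

The record-style layout tolerates missing fields such as `'PE'`, which is what makes it semi-structured.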

### Important methods of a dictionary

In [6]:
print(dir(dict))

['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
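
Several of the methods listed above are not covered in the cells that follow. A short sketch of `get`, `setdefault`, `update`, and `pop`, using a hypothetical dictionary named `record` so the notebook's `dict1` is left untouched:

```python
record = {'Name': 'John', 'Salary': 1000, 'Grade': 'B'}  # hypothetical sample

# .get() returns a default instead of raising KeyError for a missing key.
city = record.get('City', 'unknown')      # 'unknown'

# .setdefault() inserts the key only if it is absent, then returns its value.
grade = record.setdefault('Grade', 'C')   # 'B' (the existing value is kept)

# .update() merges another dictionary in place, overwriting matching keys.
record.update({'Salary': 1200, 'PAN': 1001})

# .pop() removes a key and returns its value.
removed = record.pop('PAN')               # 1001

print(city, grade, removed, record)
```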


### `.items()`, `.keys()`, and `.values()`

In [11]:
dict1.keys() ## returns all the keys of a dictionary

dict_keys(['Name', 'Salary', 'Grade'])

In [12]:
dict1.values()  ## returns all the values

dict_values(['John', 1000, 'B'])

In [13]:
dict1.items()  ## returns all the items as a view of (key, value) tuples

dict_items([('Name', 'John'), ('Salary', 1000), ('Grade', 'B')])
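
The objects returned by `.keys()`, `.values()`, and `.items()` are views rather than lists. A brief sketch, using a hypothetical dictionary `sample`, of what that means in practice:

```python
sample = {'Name': 'John', 'Salary': 1000, 'Grade': 'B'}  # hypothetical sample

keys_view = sample.keys()
sample['PAN'] = 1001         # the view reflects later changes to the dictionary
print(keys_view)             # dict_keys(['Name', 'Salary', 'Grade', 'PAN'])

# Views do not support indexing; convert to a list when positions are needed.
items_list = list(sample.items())
print(items_list[0])         # ('Name', 'John')
```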

### Iterating over the items of a dictionary

There are three ways of iterating over a dictionary.

1. Iterate over the keys
2. Iterate over the values
3. Iterate over both keys and values at the same time

In [5]:
## 1. Iterate over the keys


for key in dict1.keys():
    print(dict1[key])

Brad
1000
B
1001


**Important**: use the `TAB` key for autocomplete in Jupyter

In [6]:
dict2 = {0: 'A', 1: 'B', 2: 'C'}

In [7]:
dict2[0]

'A'

In [8]:
## 2. Iterate over the values

for value in dict1.values():
    print(value)

Brad
1000
B
1001


In [11]:
## 3. Iterate over both

for key, value in dict1.items():
    print(key, ':', value)

    

Name : Brad
Salary : 1000
Grade : B
PAN : 1001


In [12]:
## Adding items to a new dictionary

## Q: create a new dictionary that takes in the same items as dict1. However, the values are changed as follows:
## - if the value is a string, make it uppercase
## - if it's a number, add 100 to it

new_dict = {} ## In order to add items to a new dictionary, first initialize or create an empty dictionary

for key, value in dict1.items():
    if isinstance(value, str):      ## strings: convert to uppercase
        new_dict[key] = value.upper()

    elif isinstance(value, int):    ## integers: add 100
        new_dict[key] = value + 100
        
print(new_dict)

{'Name': 'BRAD', 'Salary': 1100, 'Grade': 'B', 'PAN': 1101}
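
The same transformation can also be written as a dictionary comprehension. This is a sketch, not the notebook's method: the helper `transform` is hypothetical, and unlike the loop above it keeps values of other types unchanged instead of dropping them.

```python
dict1 = {'Name': 'Brad', 'Salary': 1000, 'Grade': 'B', 'PAN': 1001}

def transform(value):
    """Hypothetical helper: uppercase strings, add 100 to integers."""
    if isinstance(value, str):
        return value.upper()
    if isinstance(value, int):
        return value + 100
    return value  # leave any other type unchanged

new_dict = {key: transform(value) for key, value in dict1.items()}
print(new_dict)  # {'Name': 'BRAD', 'Salary': 1100, 'Grade': 'B', 'PAN': 1101}
```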


### Applying our knowledge of lists and dictionaries so far to get some context out of textual data.

1. Get some text data and store it in a string.
2. Clean the data to make it consistent (all lowercase) and remove symbols like `.` (period) and `,`.
3. Split the cleaned string into a list of separate words.
4. Count the occurrences of each word in the text and store the word and its count as key-value pairs in a dictionary.

In [1]:
## Step 1: read the data

with open('d:/python_new/news1.txt') as f:
    data = f.read()

In [4]:
## Step 2: clean the data

data = data.lower().replace('.', '').replace(',', '').replace(';', '')
print(data)

for the first time since early november the total market capitalization of the cryptocurrency industry exceeded $1 trillion according to data from coingecko on january 14 bitcoin increased to over $21000 on hopes that inflation may have peaked and reached a bottom the biggest cryptocurrency increased up to 75% to $21299 in value since november 8 it hadn't been above $20000 and january 14 was its 11th straight day of growth ether the second-largest cryptocurrency rose up to 97% and other coins like cardano and dogecoin also registered significant gains before this most recent breakout the price of bitcoin had been trapped in a small range between $16000 and $17000 for weeks the rising movements have surprised shorts according to statistics from coinglass cryptocurrency short liquidations have exceeded $100 million in five of the last six days the biggest amount was reached on january 14 and exceeded $296 million the increases coincided with consumer pricing data released last week that 

In [5]:
## Step 3: split the cleaned string into a list of words

words = data.split()
print(words)

['for', 'the', 'first', 'time', 'since', 'early', 'november', 'the', 'total', 'market', 'capitalization', 'of', 'the', 'cryptocurrency', 'industry', 'exceeded', '$1', 'trillion', 'according', 'to', 'data', 'from', 'coingecko', 'on', 'january', '14', 'bitcoin', 'increased', 'to', 'over', '$21000', 'on', 'hopes', 'that', 'inflation', 'may', 'have', 'peaked', 'and', 'reached', 'a', 'bottom', 'the', 'biggest', 'cryptocurrency', 'increased', 'up', 'to', '75%', 'to', '$21299', 'in', 'value', 'since', 'november', '8', 'it', "hadn't", 'been', 'above', '$20000', 'and', 'january', '14', 'was', 'its', '11th', 'straight', 'day', 'of', 'growth', 'ether', 'the', 'second-largest', 'cryptocurrency', 'rose', 'up', 'to', '97%', 'and', 'other', 'coins', 'like', 'cardano', 'and', 'dogecoin', 'also', 'registered', 'significant', 'gains', 'before', 'this', 'most', 'recent', 'breakout', 'the', 'price', 'of', 'bitcoin', 'had', 'been', 'trapped', 'in', 'a', 'small', 'range', 'between', '$16000', 'and', '$17000

In [6]:
print(len(words))

225


In [7]:
## In order to get some context, we would want to remove words that occur a lot but don't give much meaning such as 
## the following

stop_words = ['a', 'the', 'so', 'have', 'for', 'and', 'was', 'For', 'of', 'to', 'in', 'up', 'The', 'on', 'from',
                'that', 'been']

In [10]:
## Create a new list that excludes the words in stop_words

filtered_words = [var for var in words if var not in stop_words]
print(filtered_words)

['first', 'time', 'since', 'early', 'november', 'total', 'market', 'capitalization', 'cryptocurrency', 'industry', 'exceeded', '$1', 'trillion', 'according', 'data', 'coingecko', 'january', '14', 'bitcoin', 'increased', 'over', '$21000', 'hopes', 'inflation', 'may', 'peaked', 'reached', 'bottom', 'biggest', 'cryptocurrency', 'increased', '75%', '$21299', 'value', 'since', 'november', '8', 'it', "hadn't", 'above', '$20000', 'january', '14', 'its', '11th', 'straight', 'day', 'growth', 'ether', 'second-largest', 'cryptocurrency', 'rose', '97%', 'other', 'coins', 'like', 'cardano', 'dogecoin', 'also', 'registered', 'significant', 'gains', 'before', 'this', 'most', 'recent', 'breakout', 'price', 'bitcoin', 'had', 'trapped', 'small', 'range', 'between', '$16000', '$17000', 'weeks', 'rising', 'movements', 'surprised', 'shorts', 'according', 'statistics', 'coinglass', 'cryptocurrency', 'short', 'liquidations', 'exceeded', '$100', 'million', 'five', 'last', 'six', 'days', 'biggest', 'amount', '

In [11]:
print(len(filtered_words))

158


In [12]:
words_count = {}  ## create an empty dictionary that will hold each word and its count

for word in filtered_words:
    words_count[word] = filtered_words.count(word)  ## count using the .count() method of a list
    #print(words_count)
    
print(words_count)

{'first': 1, 'time': 1, 'since': 2, 'early': 1, 'november': 2, 'total': 1, 'market': 1, 'capitalization': 1, 'cryptocurrency': 4, 'industry': 1, 'exceeded': 3, '$1': 1, 'trillion': 1, 'according': 2, 'data': 2, 'coingecko': 1, 'january': 4, '14': 3, 'bitcoin': 2, 'increased': 3, 'over': 1, '$21000': 1, 'hopes': 1, 'inflation': 2, 'may': 1, 'peaked': 1, 'reached': 2, 'bottom': 1, 'biggest': 2, '75%': 1, '$21299': 1, 'value': 1, '8': 1, 'it': 1, "hadn't": 1, 'above': 1, '$20000': 1, 'its': 1, '11th': 1, 'straight': 1, 'day': 1, 'growth': 1, 'ether': 1, 'second-largest': 1, 'rose': 1, '97%': 1, 'other': 1, 'coins': 1, 'like': 1, 'cardano': 1, 'dogecoin': 1, 'also': 1, 'registered': 1, 'significant': 1, 'gains': 1, 'before': 1, 'this': 2, 'most': 1, 'recent': 1, 'breakout': 1, 'price': 2, 'had': 1, 'trapped': 1, 'small': 1, 'range': 1, 'between': 1, '$16000': 1, '$17000': 1, 'weeks': 1, 'rising': 1, 'movements': 1, 'surprised': 1, 'shorts': 1, 'statistics': 1, 'coinglass': 1, 'short': 1, '
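
Calling `.count()` inside the loop rescans the whole list for every word. A possible single-pass alternative uses `.get()` with a default of 0. The list `sample_words` is a short stand-in; in the notebook, `filtered_words` would take its place.

```python
# Short stand-in list, not the notebook's data.
sample_words = ['bitcoin', 'price', 'bitcoin', 'january', 'price', 'bitcoin']

words_count = {}
for word in sample_words:
    # .get() returns 0 the first time a word is seen
    words_count[word] = words_count.get(word, 0) + 1

print(words_count)  # {'bitcoin': 3, 'price': 2, 'january': 1}
```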

In [13]:
## print only the words that occur more than once to get some context

for key, value in words_count.items():
    if value != 1:
        print(key, ":", value)

since : 2
november : 2
cryptocurrency : 4
exceeded : 3
according : 2
data : 2
january : 4
14 : 3
bitcoin : 2
increased : 3
inflation : 2
reached : 2
biggest : 2
this : 2
price : 2
million : 2
last : 2
six : 2
days : 2
levels : 2
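
The standard library's `collections.Counter` offers a shortcut for the counting and filtering steps. A sketch on a short stand-in list (`sample_words` is illustrative, not the notebook's data):

```python
from collections import Counter

sample_words = ['bitcoin', 'price', 'bitcoin', 'january', 'price', 'bitcoin']

counts = Counter(sample_words)  # a dict subclass mapping each word to its count
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)                 # {'bitcoin': 3, 'price': 2}
print(counts.most_common(1))    # [('bitcoin', 3)]
```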


## `tuple`

A tuple is like a list, but tuples are **immutable**: once created, their elements cannot be added, removed, or reassigned in place. A variable that holds a tuple can, however, be reassigned to a completely new tuple.

1. How to create
`()` `tuple()`

In [36]:
t1 = (1, 2, 'python', True, [1, 2, 3]) ## tuples are also heterogeneous

In [37]:
t1

(1, 2, 'python', True, [1, 2, 3])

In [38]:
print(type(t1))

<class 'tuple'>


In [39]:
t2 = tuple(range(11))  ## another way to create a tuple is to use type conversion

In [40]:
print(type(t2))

<class 'tuple'>


In [41]:
t2

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

2. How to access elements from a tuple

Use the indexing and slicing syntax with `[]`, as with lists.
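
The cells below use a tuple `t3` whose definition does not appear in this section. A minimal sketch, assuming a six-element tuple consistent with the outputs shown; the actual value in the original notebook may differ.

```python
t3 = (1, 2, 3, 4, 5, 6)  # assumed value; the original definition is not shown

first = t3[0]    # 1
last = t3[-1]    # 6
head = t3[:3]    # (1, 2, 3): slicing a tuple returns a new tuple
print(first, last, head)
```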

In [89]:
t3[0]

1

In [90]:
t3[-1]

6

In [91]:
t3[:3]

(1, 2, 3)

3. Updating elements in a tuple

**Not possible** since tuples are immutable.

In [43]:
t2[0] = 'world'

TypeError: 'tuple' object does not support item assignment
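
Two points are worth separating here: rebinding a name to a new tuple is allowed, and a mutable object stored inside a tuple can still change. A sketch with hypothetical names `numbers` and `nested`:

```python
numbers = tuple(range(11))
original_id = id(numbers)

# Rebinding the name creates a new tuple object; the old one is not modified.
numbers = ('world',) + numbers[1:]
print(numbers[0], id(numbers) != original_id)

# A mutable element inside a tuple can still be changed in place.
nested = (1, 2, 'python', True, [1, 2, 3])
nested[4].append(4)
print(nested)  # (1, 2, 'python', True, [1, 2, 3, 4])
```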

4. Methods of a tuple

In [94]:
print(dir(tuple))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']


In [45]:
t1

(1, 2, 'python', True, [1, 2, 3])

In [46]:
t1.count(1) ## counts both 1 and True, because True == 1 in Python

2
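
A short sketch of the two public tuple methods, `count` and `index`, on the same `t1`:

```python
t1 = (1, 2, 'python', True, [1, 2, 3])

print(t1.count(1))         # 2, because True == 1 in Python
print(t1.count('python'))  # 1
print(t1.index('python'))  # 2, the position of the first match
```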

## `set`

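A set is like a dictionary that has keys but no values.
`{value1, ...}` `set()` (empty braces `{}` create a dictionary, so use `set()` for an empty set)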

syntax: `{ value1, value2, .... }`

Sets store only unique values and are unordered: small integers often display in ascending order, but that order is not guaranteed. These data structures behave like sets in mathematics and support operations such as `intersection`, `union`, and `difference`.

In [116]:
s1 = {1, 2, 3, 4}

In [117]:
print(type(s1))

<class 'set'>


In [47]:
s2 = {1, 1, 2, 2, 3, 3, 4, 4}

In [49]:
s2 ## a set stores only unique values; the display order is not guaranteed

{1, 2, 3, 4}

In [50]:
## Q1: Extract the unique items out of a list.

l1 = [1, 2, 2, 3, 4, 1, 2, 3]

In [51]:
list(set(l1))  ## the order of the resulting list is not guaranteed

[1, 2, 3, 4]
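
Because a set does not preserve order, the result of `list(set(...))` may not match the original sequence. Two possible alternatives, shown on a hypothetical list `values` chosen so that order matters:

```python
values = [3, 1, 2, 3, 1, 4]  # hypothetical list

unique_sorted = sorted(set(values))            # [1, 2, 3, 4]
unique_in_order = list(dict.fromkeys(values))  # [3, 1, 2, 4], first-seen order
print(unique_sorted, unique_in_order)
```

`dict.fromkeys` relies on dictionaries keeping insertion order, which is guaranteed from Python 3.7 onward.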

In [52]:
s4 = {'hello', 'hello', 1, 2, 3} ## sets are also heterogeneous. However, they cannot store unhashable (mutable) collections such as lists, dictionaries, or other sets.

In [53]:
s4

{1, 2, 3, 'hello'}
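
A sketch of which collections a set accepts. The names `ok`, `bad`, and `message` are illustrative:

```python
# Tuples, strings, and frozensets are hashable, so they can be set elements.
ok = {(1, 2), 'hello', frozenset({3, 4})}
print(len(ok))  # 3

try:
    bad = {[1, 2], 'hello'}  # a list is mutable, hence unhashable
except TypeError as err:
    message = str(err)
    print('TypeError:', message)
```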

In [54]:
## sets are unordered, so elements cannot be accessed or assigned by index

In [55]:
s4[0] = 1

TypeError: 'set' object does not support item assignment
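
In place of indexing, sets are usually used through membership tests and iteration. A sketch with a hypothetical set `sample_set`:

```python
sample_set = {'hello', 1, 2, 3}  # hypothetical sample

print('hello' in sample_set)  # True
print(0 in sample_set)        # False

for item in sample_set:       # iteration order is not guaranteed
    print(item)

# Convert to a list when a fixed order or an index is needed.
as_list = sorted(sample_set, key=str)
print(as_list[0])
```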

In [56]:
s4 = {1, 1, 'python', True}  ## True == 1, so True is treated as a duplicate of 1

In [57]:
s4

{1, 'python'}

### Methods of a set

In [58]:
print(dir(s4))

['__and__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__iand__', '__init__', '__init_subclass__', '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection', 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']


In [59]:
a = {1, 2, 3}
b = {3, 4, 5}

In [60]:
## .union(), .intersection(), .difference()

In [61]:
a.union(b)

{1, 2, 3, 4, 5}

In [62]:
a.difference(b)

{1, 2}

In [63]:
b.difference(a)

{4, 5}

In [64]:
a.intersection(b)

{3}

In [143]:
a  ## these methods return a new set and leave the original sets unchanged

{1, 2, 3}

In [144]:
b

{3, 4, 5}

In [65]:
a.intersection_update(b)  ## to change the set in place, use the methods that have _update added
                        ## to their names

In [66]:
a

{3}
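
The same operations are also available as operators. A sketch with hypothetical sets `x` and `y`, so the notebook's `a` and `b` are left as they are:

```python
x = {1, 2, 3}
y = {3, 4, 5}

print(x | y)   # union: {1, 2, 3, 4, 5}
print(x & y)   # intersection: {3}
print(x - y)   # difference: {1, 2}
print(x ^ y)   # symmetric difference: {1, 2, 4, 5}

z = x.copy()
z |= y         # in-place union, same effect as z.update(y)
print(z == x.union(y))  # True
```

The operators require both operands to be sets, whereas the methods accept any iterable as an argument.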