

---
# Welcome to ML2LM
## Data Structures

Data structures are ways to organize and store data so that it can be accessed and worked with efficiently. Python offers several built-in data structures such as lists, tuples, sets and dictionaries. Each of these has unique properties and use cases. Let's cover them one by one.

**List:**
A list in Python is a data structure that allows you to store an ordered collection of items. These items can be of any data type (e.g., integers, strings, floats, or even other lists). <br>

Key Features:<br>
*Ordered*: Elements in a list maintain the order in which they are added.<br>
*Mutable*: Lists can be modified after creation — elements can be added, removed or changed.<br>
*Dynamic Size*: Python lists can grow or shrink dynamically without a fixed size limit.<br>
*Heterogeneous*: A single list can store elements of different types.<br>


In [240]:
countries = ['india', 'nepal', 'italy', 'china', 'russia']
print(countries)

['india', 'nepal', 'italy', 'china', 'russia']


In [241]:
capitals = list(('new delhi', 'kathmandu', 'roma', 'beijing', 'moscow'))
print(capitals)

['new delhi', 'kathmandu', 'roma', 'beijing', 'moscow']


In [242]:
characters = list('india')
print(characters)

['i', 'n', 'd', 'i', 'a']


In [243]:
# Accessing the elements
print(countries[0]) # by index
print(countries[-1]) # by negative index
print(countries[1:3]) # 1 is inclusive and 3 is exclusive
print(countries[:3])
print(countries[1:])

india
russia
['nepal', 'italy']
['india', 'nepal', 'italy']
['nepal', 'italy', 'china', 'russia']


In [244]:
# Using Loops to Access Elements (to be revisited after for loops)
# for loop
for country in countries:
  print(country)


india
nepal
italy
china
russia


In [245]:

# list comprehension
i_countries = [country for country in countries if country[0] == 'i'] # list of countries that begin with 'i'
print(i_countries)


['india', 'italy']


In [246]:
# zip
i_countries = [(country, capital) for country, capital in zip(countries, capitals) if country[0] == 'i']
print(i_countries)

[('india', 'new delhi'), ('italy', 'roma')]


In [247]:
# nested list
nested_list = [countries, capitals]
print(nested_list)
print(nested_list[1][2])

[['india', 'nepal', 'italy', 'china', 'russia'], ['new delhi', 'kathmandu', 'roma', 'beijing', 'moscow']]
roma


In [248]:
# append
countries.append('usa')
print(countries)

['india', 'nepal', 'italy', 'china', 'russia', 'usa']


In [249]:
# appending a list
countries.append(['france', 'germany'])
print(countries)

['india', 'nepal', 'italy', 'china', 'russia', 'usa', ['france', 'germany']]


In [250]:
# extend
countries.extend(['brazil', 'australia'])
print(countries)

['india', 'nepal', 'italy', 'china', 'russia', 'usa', ['france', 'germany'], 'brazil', 'australia']


In [251]:
# remove
countries.remove(['france', 'germany'])
print(countries)
countries.remove('usa')
print(countries)

['india', 'nepal', 'italy', 'china', 'russia', 'usa', 'brazil', 'australia']
['india', 'nepal', 'italy', 'china', 'russia', 'brazil', 'australia']


In [252]:
popped_item = countries.pop(1)  # Remove and return item at index 1
print(popped_item)
print(countries)

nepal
['india', 'italy', 'china', 'russia', 'brazil', 'australia']


| Method   | Parameter | Removes | Returns | Error If Not Found? |
|----------|-----------|---------|---------|----------------------|
| `remove()` | Element value | First occurrence of the element | No | Yes (`ValueError`) |
| `pop()` | Index (optional) | Element at the given index (or last by default) | Yes | Yes (`IndexError` if the index is out of range or the list is empty) |
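
The two error cases in the table can be reproduced with a small sketch. The `fruits` list is made-up sample data, kept separate from `countries` so the cells around it are unaffected.

```python
# Hypothetical sample data, not part of the notebook's running example
fruits = ['apple', 'banana', 'cherry']

# remove() searches by value and raises ValueError when the value is absent
try:
    fruits.remove('mango')
except ValueError as err:
    remove_error = str(err)

# pop() works by index and raises IndexError when the index is out of range
try:
    fruits.pop(10)
except IndexError as err:
    pop_error = str(err)

# pop() with no argument removes and returns the last element
last = fruits.pop()

print(remove_error)
print(pop_error)
print(last, fruits)
```

Catching the specific exception, as above, keeps the program running when the value or index may legitimately be missing.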


In [253]:
print(countries.index('china'))
print(countries.index('malaysia'))

2


ValueError: 'malaysia' is not in list

In [254]:
try:
  print(countries.index('malaysia'))
except ValueError:
  print('malaysia is not in the list')

malaysia is not in the list


In [255]:
countries.append('italy')
print(countries)
print(countries.count('italy'))

['india', 'italy', 'china', 'russia', 'brazil', 'australia', 'italy']
2


In [256]:
countries.sort()
print(countries)

['australia', 'brazil', 'china', 'india', 'italy', 'italy', 'russia']


In [257]:
print(countries)
countries.reverse() # directly modifies the existing list
print(countries)

['australia', 'brazil', 'china', 'india', 'italy', 'italy', 'russia']
['russia', 'italy', 'italy', 'india', 'china', 'brazil', 'australia']


In [258]:
print(countries[::-1]) # doesn't directly modify the existing list

['australia', 'brazil', 'china', 'india', 'italy', 'italy', 'russia']


**Note:** `reverse()` modifies the original list in place. `list_name[::-1]` creates a new, reversed list and leaves the original unchanged.
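
One way to see the difference is to hold a second reference to the same list and check what each operation changes. The names `letters` and `alias` are illustrative only.

```python
letters = ['a', 'b', 'c']
alias = letters                 # a second name for the same list object

reversed_copy = letters[::-1]   # slicing builds a new list
print(letters, reversed_copy)   # letters is still in its original order

result = letters.reverse()      # reverses in place and returns None
print(result)
print(letters, alias)           # both names show the reversed list
print(reversed_copy is letters) # the slice is a different object
```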

In [259]:
# Copying a list
countries_copy = countries # countries_copy is a reference to countries
countries_copy.append('malaysia')
print(countries) # updating countries_copy also makes changes to countries
print(countries_copy)

['russia', 'italy', 'italy', 'india', 'china', 'brazil', 'australia', 'malaysia']
['russia', 'italy', 'italy', 'india', 'china', 'brazil', 'australia', 'malaysia']


In [260]:
countries_copy = countries.copy()
countries_copy.append('greece')
print(countries)
print(countries_copy)

['russia', 'italy', 'italy', 'india', 'china', 'brazil', 'australia', 'malaysia']
['russia', 'italy', 'italy', 'india', 'china', 'brazil', 'australia', 'malaysia', 'greece']
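
Note that `copy()` makes a shallow copy: the outer list is new, but nested lists are still shared. The sketch below uses a made-up nested list (`grid`) and the standard library's `copy.deepcopy` to show the difference.

```python
import copy

# Hypothetical nested list used only for this illustration
grid = [['india', 'nepal'], ['italy']]

shallow = grid.copy()        # new outer list, same inner list objects
deep = copy.deepcopy(grid)   # inner lists are copied as well

grid[0].append('china')      # mutate an inner list through the original

print(shallow[0])  # the shallow copy sees the change
print(deep[0])     # the deep copy does not
```

For flat lists of strings or numbers, as in the cells above, a shallow copy is usually enough.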


In [261]:
# Clearing a list
countries.clear()
print(countries)

[]


**Tuples:**
A tuple is a collection of ordered, immutable items. Similar to lists, tuples can store elements of any data type, including integers, strings, floats, and other tuples. However, unlike lists, tuples cannot be modified after they are created. <br>

Key Features:<br>
*Ordered*<br>
*Immutable*: Once a tuple is created, its elements cannot be changed. This makes tuples useful in situations where you want to ensure that the data remains constant.<br>
*Heterogeneous*<br>
*Hashable*: Since tuples are immutable, they can be used as keys in dictionaries (provided every element is itself hashable), unlike lists.
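
A minimal sketch of the immutable and hashable properties follows. The coordinate values and the `city_by_coords` dictionary are illustrative assumptions, not data from this notebook.

```python
point = (22.57, 88.36)   # hypothetical (latitude, longitude) pair

# Item assignment on a tuple raises TypeError
try:
    point[0] = 0.0
    assignment_failed = False
except TypeError:
    assignment_failed = True

# A tuple of hashable items can serve as a dictionary key
city_by_coords = {point: 'Kolkata'}
print(city_by_coords[(22.57, 88.36)])

# A list cannot be a key because lists are unhashable
try:
    bad = {[22.57, 88.36]: 'Kolkata'}
    list_key_failed = False
except TypeError:
    list_key_failed = True

print(assignment_failed, list_key_failed)
```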


In [262]:
# Creating tuples
countries = ('india', 'nepal', 'italy')
capitals = 'new delhi', 'kathmandu', 'rome'
print(countries, capitals)

('india', 'nepal', 'italy') ('new delhi', 'kathmandu', 'rome')


In [263]:
# Concatenation
concat_tuple = countries + capitals
print(concat_tuple)

# Repetition
repeat_tuple = countries * 3
print(repeat_tuple)

# Tuple Unpacking
c1, c2, c3 = countries
print(c1)

('india', 'nepal', 'italy', 'new delhi', 'kathmandu', 'rome')
('india', 'nepal', 'italy', 'india', 'nepal', 'italy', 'india', 'nepal', 'italy')
india


Notes:<br>
1.   Tuples are ideal when you have data that should not change throughout the program, such as coordinates, fixed configurations, or records.
2.   Functions in Python can return multiple values as tuples. This allows you to return several related pieces of data from a function, e.g. `return (latitude, longitude)`.
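
As a rough sketch of the second note, the hypothetical function `get_location` below returns two values, which Python packs into a single tuple. The numbers are placeholders.

```python
def get_location():
    """Hypothetical helper that returns two related values."""
    latitude, longitude = 27.72, 85.32   # illustrative values only
    return latitude, longitude           # packed into one tuple

location = get_location()    # keep the tuple as a whole
lat, lon = get_location()    # or unpack it into separate names

print(type(location).__name__, location)
print(lat, lon)
```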




**Sets:**
A set is an unordered collection of unique elements. Unlike lists or tuples, sets do not store duplicate values, and their order is not guaranteed. Sets are commonly used when you need to store distinct values and don't care about their order.<br>

Key Features:<br>
*Unordered*: Sets do not maintain the order of elements.<br>
*Unique Elements*: A set can only contain one instance of each element.<br>
*Mutable*: Sets are mutable, meaning you can add or remove elements after the set is created.<br>
*No Indexing*: Since sets are unordered, they do not support indexing, slicing or other sequence-like behavior.<br>
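
The "No Indexing" point can be checked directly. In this sketch, `names` is a made-up set. When a stable order is needed, one option is to build a sorted list from the set first.

```python
names = {'india', 'nepal', 'italy'}   # hypothetical sample set

# Sets do not support indexing, so names[0] raises TypeError
try:
    names[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# sorted() returns a new list, which can be indexed and sliced
ordered = sorted(names)
print(indexing_failed)
print(ordered[0], ordered[1:])
```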

In [266]:
# Creating Sets
countries = {'india', 'nepal', 'italy', 'india'}
print(countries)

{'india', 'nepal', 'italy'}


In [267]:
# Using set() Constructor from a list
countries_list = ['india', 'nepal', 'italy', 'india']
countries = set(countries_list)
print(countries)

{'india', 'nepal', 'italy'}


In [268]:
# Checking Membership
print('india' in countries)

True


In [273]:
# Adding elements
countries.add('china')
print(countries)

{'india', 'china', 'nepal', 'italy'}


In [274]:
# Removing Elements
countries.remove('china')
print(countries)

{'india', 'nepal', 'italy'}


In [272]:
countries.discard('china')
print(countries) # unlike remove(), discard() doesn't raise a KeyError when the element is missing

{'india', 'nepal', 'italy'}


In [275]:
# Length
print(len(countries))

3


In [293]:
set1 = {1,2}
set2 = {2,3,4}

# Union
print("Union:", set1|set2) # same as set1.union(set2)

# Intersection
print("Intersection:", set1&set2) # same as set1.intersection(set2)

# Difference
print("Difference:", set1-set2) # same as set1.difference(set2)

# Symmetric Difference
print("Symmetric Difference:", set1^set2) # same as set1.symmetric_difference(set2)


Union: {1, 2, 3, 4}
Intersection: {2}
Difference: {1}
Symmetric Difference: {1, 3, 4}


In [294]:
set3 = {1,2,3,4}
set4 = {3,4}

# Subset and Superset
print("Subset:", set1.issubset(set3))
print("Superset:", set3.issuperset(set1))

# Disjoint set
print("Disjoint:", set1.isdisjoint(set4))

Subset: True
Superset: True
Disjoint: True


In [295]:
# Update
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

set1.update(set2)
print(set1)

{1, 2, 3, 4, 5, 6}


In [279]:
# Intersection Update
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

set1.intersection_update(set2)
print(set1)

{3, 4}


In [280]:
# Difference Update
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

set1.difference_update(set2)
print(set1)

{1, 2}


In [281]:
# Symmetric Difference Update
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

set1.symmetric_difference_update(set2)
print(set1)

{1, 2, 5, 6}


Notes<br>
1.   Sets are commonly used to remove duplicates from a list or other iterable.
2.   Sets allow for fast membership tests, making them ideal for situations where you need to quickly check if an element exists in a collection.
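
Both notes can be sketched in a few lines. The `visits` list is invented for illustration. Because `set()` does not keep the original order, the sketch also shows `dict.fromkeys`, a common idiom for removing duplicates while keeping first-seen order.

```python
visits = ['india', 'nepal', 'india', 'italy', 'nepal']   # hypothetical data

unique = set(visits)                            # duplicates removed, order not guaranteed
unique_in_order = list(dict.fromkeys(visits))   # duplicates removed, first-seen order kept

print(len(unique))
print(unique_in_order)

# Membership tests on a set do not need to scan every element
print('italy' in unique)
print('china' in unique)
```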


**Dictionary:**

A dictionary is a mutable collection of key-value pairs in which each value is looked up by its key. It is similar to a real-world dictionary, where a word (key) is associated with a definition (value). Since Python 3.7, dictionaries preserve the order in which keys were inserted. In Python, dictionaries are defined using curly braces `{}` and consist of key-value pairs separated by a colon `:`.


In [284]:
# Creating dictionaries
person = {
    'name': 'S Bose',
    'age': 30,
    'city': 'Kolkata'
}

print(person)

# Accessing elements by key
print(person['city'])
print(person.get('city'))

# Adding and Updating Key-Value Pairs
person['gender'] = 'male'
print(person)

person['age'] = 31
print(person)

# Removing Key-Value Pairs
del person['age']
print(person)

person.pop('gender')
print(person)

{'name': 'S Bose', 'age': 30, 'city': 'Kolkata'}
Kolkata
Kolkata
{'name': 'S Bose', 'age': 30, 'city': 'Kolkata', 'gender': 'male'}
{'name': 'S Bose', 'age': 31, 'city': 'Kolkata', 'gender': 'male'}
{'name': 'S Bose', 'city': 'Kolkata', 'gender': 'male'}
{'name': 'S Bose', 'city': 'Kolkata'}
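
Square brackets and `get()` behave differently when a key is missing, which the cell above does not show. A minimal sketch, using a separate made-up dictionary `profile`:

```python
profile = {'name': 'S Bose', 'city': 'Kolkata'}   # hypothetical sample data

# Square-bracket access raises KeyError for a missing key
try:
    profile['age']
    key_missing = False
except KeyError:
    key_missing = True

# get() returns None, or a supplied default, instead of raising
print(profile.get('age'))
print(profile.get('age', 0))

# pop() also accepts a default, which avoids the KeyError
removed = profile.pop('age', None)
print(key_missing, removed)
```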


In [285]:
contacts = {
    'S Bose': {'phone': '123-456-7890', 'email': 'sbose@example.com'},
    'N Islam': {'phone': '987-654-3210', 'email': 'nisl@example.com'},
    'RM Roy': {'phone': '555-555-5555', 'email': 'rmroy@example.com'}
}

print(contacts)


{'S Bose': {'phone': '123-456-7890', 'email': 'sbose@example.com'}, 'N Islam': {'phone': '987-654-3210', 'email': 'nisl@example.com'}, 'RM Roy': {'phone': '555-555-5555', 'email': 'rmroy@example.com'}}


In [286]:

# Iterating Through Keys, Values, and Items
# To be revisited after for loops
# Keys
for key in contacts:
    print(key)

print(contacts.keys())


S Bose
N Islam
RM Roy
dict_keys(['S Bose', 'N Islam', 'RM Roy'])


In [287]:

# Values
for value in contacts.values():
    print(value)


{'phone': '123-456-7890', 'email': 'sbose@example.com'}
{'phone': '987-654-3210', 'email': 'nisl@example.com'}
{'phone': '555-555-5555', 'email': 'rmroy@example.com'}


In [288]:

# Key-Value Pairs
for key, value in contacts.items():
    print(f'Name: {key}, Phone: {value["phone"]}, Email: {value["email"]}')


Name: S Bose, Phone: 123-456-7890, Email: sbose@example.com
Name: N Islam, Phone: 987-654-3210, Email: nisl@example.com
Name: RM Roy, Phone: 555-555-5555, Email: rmroy@example.com


In [289]:
# Updating items
contacts.update({'RM Roy': {'phone': '555-555-3333', 'email': 'rmroy@example.com'},
                 'IC Chatterjee': {'phone': '777-777-7777', 'email':'icc@example.com'}})
print(contacts)


{'S Bose': {'phone': '123-456-7890', 'email': 'sbose@example.com'}, 'N Islam': {'phone': '987-654-3210', 'email': 'nisl@example.com'}, 'RM Roy': {'phone': '555-555-3333', 'email': 'rmroy@example.com'}, 'IC Chatterjee': {'phone': '777-777-7777', 'email': 'icc@example.com'}}


In [290]:
# Removing keys
contact = contacts.pop('RM Roy')
print(contact)
print(contacts)

{'phone': '555-555-3333', 'email': 'rmroy@example.com'}
{'S Bose': {'phone': '123-456-7890', 'email': 'sbose@example.com'}, 'N Islam': {'phone': '987-654-3210', 'email': 'nisl@example.com'}, 'IC Chatterjee': {'phone': '777-777-7777', 'email': 'icc@example.com'}}


In [None]:
contact = contacts.popitem() # removes and returns the last inserted (key, value) pair as a tuple
print(contact)
print(contacts)

In [291]:
# Dictionary Comprehension
squares = {x: x**2 for x in range(1, 6)}
print(squares)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


Notes<br>
1. Dictionaries are ideal for situations where you need to map one value to another.
2. Dictionaries are often used for counting the occurrences of items, such as in word frequency analysis or tallying votes. Example below.

In [292]:
text = "the new news news anchor delivered the breaking news with confidence"

word_count = {}
for word in text.split():
    word_count[word] = word_count.get(word, 0) + 1
print(word_count)

{'the': 2, 'new': 1, 'news': 3, 'anchor': 1, 'delivered': 1, 'breaking': 1, 'with': 1, 'confidence': 1}
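
The standard library also provides `collections.Counter`, a dictionary subclass built for this kind of tallying. The sketch below is one alternative to the loop above, using the same sentence.

```python
from collections import Counter

text = "the new news news anchor delivered the breaking news with confidence"

# Counter counts each item of the iterable it is given
counts = Counter(text.split())

print(counts['news'])          # count for a single word
print(counts['missing'])       # absent keys count as 0 instead of raising KeyError
print(counts.most_common(2))   # the two most frequent words with their counts
```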
