# Python Data Structures

Data structures organize and store data.
Python has four built-in data structures:
lists, tuples, dictionaries, and sets.

## list

A list is an ordered collection of objects.
The items are separated by commas,
and the entire list is enclosed in square brackets.
Lists are mutable: you can add, remove, and modify elements.

In [1]:
example_1 = [2, 4, 7]
example_2 = ['Bob', 'John', 'Will']

# a list can hold items of different data types
example_3 = ['Ford', 'Mustang', 1964]

# creating a list
regions = ['Asia', 'America', 'Europe']

Lists are usually populated in a loop, starting from an empty list.

In [2]:
# using common list methods
my_list = []

# .append() to add items
my_list.append('Pay bills')
my_list.append('Tidy up')
my_list.append('Walk the dog')
my_list.append('Cook dinner')

# output
print(my_list)
print(my_list[0]) # for first element since python uses zero indexing

['Pay bills', 'Tidy up', 'Walk the dog', 'Cook dinner']
Pay bills
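
The cell above appends items one at a time. As a minimal sketch of the loop-based pattern, the snippet below fills an empty list inside a `for` loop. The names `squares` and `n` are illustrative and do not come from the notebook.

```python
# start with an empty list and fill it inside a loop
squares = []
for n in range(1, 6):
    squares.append(n * n)

print(squares)  # [1, 4, 9, 16, 25]
```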


In [3]:
# inserting an item in between two items
i = my_list.index('Cook dinner')
my_list.insert(i, 'Go to the pharmacy')
print(my_list)

# to check how many times an item appears
print(my_list.count('Tidy up'))

# using slice notation
print(my_list[0:3])

['Pay bills', 'Tidy up', 'Walk the dog', 'Go to the pharmacy', 'Cook dinner']
1
['Pay bills', 'Tidy up', 'Walk the dog']


In [4]:
# both start and end indices are optional
print(my_list[:3]) # omitting first
print(my_list[3:]) # omitting last
print(my_list[:]) # omitting both

['Pay bills', 'Tidy up', 'Walk the dog']
['Go to the pharmacy', 'Cook dinner']
['Pay bills', 'Tidy up', 'Walk the dog', 'Go to the pharmacy', 'Cook dinner']
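
Slice notation also accepts negative indices and an optional step. The sketch below uses a hypothetical list called `letters` to show both; it is only meant to complement the start and end examples above.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[-2:])   # last two items: ['d', 'e']
print(letters[::2])   # every second item: ['a', 'c', 'e']
print(letters[::-1])  # reversed copy: ['e', 'd', 'c', 'b', 'a']
```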


In [5]:
# using slice notation for appending and inserting
my_list[len(my_list):] = ['Mow the lawn', 'Water plants']
# len returns the number of items in the list
print(my_list)

['Pay bills', 'Tidy up', 'Walk the dog', 'Go to the pharmacy', 'Cook dinner', 'Mow the lawn', 'Water plants']
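
The cell above uses slice assignment to append at the end of the list. The same notation can insert items in the middle or replace a range. The following sketch uses a shortened, illustrative list named `tasks`.

```python
tasks = ['Pay bills', 'Tidy up', 'Cook dinner']

# assigning to an empty slice inserts items without replacing anything
tasks[2:2] = ['Walk the dog', 'Go to the pharmacy']
print(tasks)

# assigning to a non-empty slice replaces the items in that range
tasks[0:1] = ['Pay rent']
print(tasks)
```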


A queue is an abstract data type.
Items are inserted at one end (enqueue) and removed from the other end (dequeue),
that is, first-in, first-out (FIFO).

In [6]:
# turning a list into a queue using python's deque (double-ended queue) object
# using a to-do list example

from collections import deque
queue = deque(my_list)
queue.append('Wash the car')
print(queue.popleft(), ' - Done!')
my_list_upd = list(queue)

Pay bills  - Done!
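
A plain list can also serve as a queue through `list.pop(0)`, but removing from the front of a list shifts every remaining item, whereas `deque.popleft()` runs in constant time. The sketch below, using an illustrative `queue` and `done` list, drains a deque in FIFO order.

```python
from collections import deque

queue = deque(['Pay bills', 'Tidy up'])
queue.append('Walk the dog')      # enqueue at the right end

done = []
while queue:
    done.append(queue.popleft())  # dequeue from the left end

print(done)  # ['Pay bills', 'Tidy up', 'Walk the dog']
```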


### using a list as a stack
A stack is an abstract data structure.
It implements last-in, first-out (LIFO) access: the most recently added item is removed first.

In [7]:
my_list = ['Pay bills', 'Tidy up', 'Walk the dog', 'Go to the pharmacy', 'Cook dinner']
stack = []
for task in my_list:
    stack.append(task)
while stack:
    print(stack.pop(), ' - Done!')

Cook dinner  - Done!
Go to the pharmacy  - Done!
Walk the dog  - Done!
Tidy up  - Done!
Pay bills  - Done!


### using lists and stacks for natural language processing

In [8]:
import spacy
txt = 'List is a ubiquitous data structure in the Python programming language.'

nlp = spacy.load('en_core_web_sm')
doc = nlp(txt)
stk = []
for w in doc:
    if w.pos_ == 'NOUN' or w.pos_ == 'PROPN':
        stk.append(w.text)
    elif (w.head.pos_ == 'NOUN' or w.head.pos_ == 'PROPN') and (w in w.head.lefts):
        stk.append(w.text)
    elif stk:
        chunk = ''
        while stk:
            chunk = stk.pop() + ' ' + chunk
        print(chunk.strip())

List
a ubiquitous data structure
the Python programming language


### importing with list comprehensions
Let's find the syntactic head of each word in the sentence.

In [9]:
import spacy

txt = 'List is a ubiquitous data structure in the Python programming language.'

nlp = spacy.load('en_core_web_sm')
doc = nlp(txt)

for t in doc:
    print(t.text, t.head.text)

List is
is is
a structure
ubiquitous structure
data structure
structure is
in structure
the language
Python language
programming language
language in
. is


### creating using list comprehension

In [10]:
import spacy

txt = 'List is arguably the most useful type in the Python programming language.'

nlp = spacy.load('en_core_web_sm')
doc = nlp(txt)

head_lefts = [t.text if t in t.head.lefts else 0 for t in doc]
print(head_lefts)

['List', 0, 0, 'the', 'most', 'useful', 0, 0, 'the', 'Python', 'programming', 0, 0]


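Moving through the text word by word, computing the list for the rest of the text at each step

In [11]:
for w in doc:
    head_lefts = [t.text if t in t.head.lefts else 0 for t in doc[w.i:]]
    print(head_lefts)

['List', 0, 0, 'the', 'most', 'useful', 0, 0, 'the', 'Python', 'programming', 0, 0]
[0, 0, 'the', 'most', 'useful', 0, 0, 'the', 'Python', 'programming', 0, 0]
[0, 'the', 'most', 'useful', 0, 0, 'the', 'Python', 'programming', 0, 0]
['the', 'most', 'useful', 0, 0, 'the', 'Python', 'programming', 0, 0]
['most', 'useful', 0, 0, 'the', 'Python', 'programming', 0, 0]
['useful', 0, 0, 'the', 'Python', 'programming', 0, 0]
[0, 0, 'the', 'Python', 'programming', 0, 0]
[0, 'the', 'Python', 'programming', 0, 0]
['the', 'Python', 'programming', 0, 0]
['Python', 'programming', 0, 0]
['programming', 0, 0]
[0, 0]
[0]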


Analyzing each fragment, looking for the next zero

In [12]:
for w in doc:
    head_lefts = [t.text if t in t.head.lefts else 0 for t in doc[w.i:]]
    i0 = head_lefts.index(0)
    if i0 > 0:
        noun = [1 if t.pos_== 'NOUN' or t.pos_== 'PROPN' else 0 for t in reversed(doc[w.i:w.i+i0 +1])]
        try:
            i1 = noun.index(1) + 1
        except ValueError:
            pass
        print(head_lefts[:i0 +1])
        print(doc[w.i+i0 +1-i1])

['List', 0]
List
['the', 'most', 'useful', 0]
type
['most', 'useful', 0]
type
['useful', 0]
type
['the', 'Python', 'programming', 0]
language
['Python', 'programming', 0]
language
['programming', 0]
language


PUTTING IT ALL TOGETHER!

In [13]:
import spacy

txt = 'List is arguably the most useful type in the Python programming language.'

nlp = spacy.load('en_core_web_sm')
doc = nlp(txt)
stk = []

for w in doc:
    head_lefts = [t.text if t in t.head.lefts else 0 for t in doc[w.i:]]
    i0 = 0
    try:
        i0 = head_lefts.index(0)
    except ValueError:
        pass
    i1 = 0
    if i0 > 0:
        noun = [1 if t.pos_== 'NOUN' or t.pos_== 'PROPN' else 0 for t in reversed(doc[w.i:w.i+i0 +1])]
        try:
            i1 = noun.index(1) + 1
        except ValueError:
            pass
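    # this block runs for every token, not only when i0 > 0,
    # so the stack is flushed when a chunk ends
    if w.pos_ == 'NOUN' or w.pos_ == 'PROPN':
        stk.append(w.text)
    elif (i1 > 0):
        stk.append(w.text)
    elif stk:
        chunk = ''
        while stk:
            chunk = stk.pop() + ' ' + chunk
        print(chunk.strip())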

## tuples

In [14]:
"""
A tuple is an ordered collection of objects.
Tuples are immutable: once created, a tuple cannot be changed.
They are typically used to store collections of heterogeneous data,
and are especially useful for holding the properties of an object.
"""

# example of a simple tuple
('Ford', 'Mustang', 1964)

('Ford', 'Mustang', 1964)
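
Because a tuple often holds the properties of one object, a common way to read it is unpacking into named variables. This is a small sketch; the variable names `car`, `make`, `model`, and `year` are illustrative.

```python
car = ('Ford', 'Mustang', 1964)

# unpack the tuple into one variable per element
make, model, year = car
print(make, model, year)

# elements can also be read by index, as with lists
print(car[0], len(car))
```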

Example of a list of tuples

In [15]:
# a to-do list as a list of (time, task) tuples; the target result is:
# [('8:00', 'Pay bills'), ('8:30', 'Tidy up'), ('9:30', 'Walk the dog'), ('10:00', 'Go to the pharmacy'), ('10:30', 'Cook dinner')]
task_list = ['Pay bills', 'Tidy up', 'Walk the dog', 'Go to the pharmacy', 'Cook dinner']
tm_list = ['8:00', '8:30', '9:30', '10:00', '10:30']
sched_list = [(tm, task) for tm, task in zip(tm_list, task_list)]
print(sched_list)
print(sched_list[1][0])

[('8:00', 'Pay bills'), ('8:30', 'Tidy up'), ('9:30', 'Walk the dog'), ('10:00', 'Go to the pharmacy'), ('10:30', 'Cook dinner')]
8:30


Immutability

In [16]:
"""
Tuples are not mutable.
Trying to modify one raises a TypeError.
"""

sched_list[1][0] = '9:00'

TypeError: 'tuple' object does not support item assignment
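
Although a tuple itself cannot be changed, the list that holds it can. One way to "update" an entry, sketched below with a shortened copy of `sched_list`, is to build a new tuple and assign it to the same list position.

```python
sched_list = [('8:00', 'Pay bills'), ('8:30', 'Tidy up')]

# item assignment on the tuple fails
try:
    sched_list[1][0] = '9:00'
except TypeError as err:
    print('TypeError:', err)

# replacing the whole tuple inside the (mutable) list works
old_time, task = sched_list[1]
sched_list[1] = ('9:00', task)
print(sched_list)  # [('8:00', 'Pay bills'), ('9:00', 'Tidy up')]
```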

## Dictionaries
A dictionary is a mutable collection of key-value pairs. Since Python 3.7, dictionaries preserve insertion order.

In [17]:
"""
each key is used to identify a value
for instance:
"""
{'Make': 'Ford', 'Model': 'Mustang', 'Year': 1964}

# dictionaries, like tuples, are good for storing heterogeneous data about real-world objects

{'Make': 'Ford', 'Model': 'Mustang', 'Year': 1964}

A list of dictionaries

In [18]:
## to-do list implementation as a list of dictionaries
dict_list = [
    {'time': '8:00', 'name': 'Pay bills'},
    {'time': '8:30', 'name': 'Tidy up'},
    {'time': '9:30', 'name': 'Walk the dog'},
    {'time': '10:00', 'name': 'Go to the pharmacy'},
    {'time': '10:30', 'name': 'Cook dinner'},
]

# unlike tuples, dictionaries are mutable.
dict_list[1]['time'] = '9:00' # index the list first, then the key

Adding to a Dictionary with setdefault()

In [19]:
"""
The setdefault() method adds a key with a default value only if the key is missing.
If the key already exists, its value is left unchanged.
Either way, the method returns the value stored under the key.
"""

#example
car = {
    'brand': 'Volkswagen',
    'style': 'Sedan',
    'model': 'Jetta'
}

# trying a new model on the dictionary
print(car.setdefault('model', 'Passat')) # will print current value 'Jetta'

print(car.setdefault('year', 2022)) # will add year-2022 pair to the dictionary and print it
print(car) #confirming the new addition

Jetta
2022
{'brand': 'Volkswagen', 'style': 'Sedan', 'model': 'Jetta', 'year': 2022}
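
Another common use of `setdefault()` is grouping items under a key, since the returned value can be modified in place. The sketch below groups hypothetical `(period, task)` pairs; the data and the name `by_period` are illustrative.

```python
tasks = [
    ('morning', 'Pay bills'),
    ('morning', 'Tidy up'),
    ('evening', 'Cook dinner'),
]

by_period = {}
for period, task in tasks:
    # create an empty list the first time a period is seen, then append to it
    by_period.setdefault(period, []).append(task)

print(by_period)
# {'morning': ['Pay bills', 'Tidy up'], 'evening': ['Cook dinner']}
```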


Practical NLP example (part A): counting the number of occurrences of each word in a text phrase

In [20]:
txt = '''Python is one of the most promising programming 
languages today. Due to the simplicity of Python syntax, 
many researchers and scientists prefer Python over many other 
languages.'''

txt = txt.replace('.', '').replace(',','') # removing punctuation
lst = txt.split()
print(lst)

['Python', 'is', 'one', 'of', 'the', 'most', 'promising', 'programming', 'languages', 'today', 'Due', 'to', 'the', 'simplicity', 'of', 'Python', 'syntax', 'many', 'researchers', 'and', 'scientists', 'prefer', 'Python', 'over', 'many', 'other', 'languages']


Practical NLP example (part B): performing the counting

In [21]:
dct = {}
for w in lst:
    dct.setdefault(w, 0)  # start the count at 0 the first time a word is seen
    dct[w] += 1

print(dct)

# sorting by number of occurrences
dct_sorted = dict(sorted(dct.items(), key=lambda x: x[1], reverse = True))
print(dct_sorted)

{'Python': 3, 'is': 1, 'one': 1, 'of': 2, 'the': 2, 'most': 1, 'promising': 1, 'programming': 1, 'languages': 2, 'today': 1, 'Due': 1, 'to': 1, 'simplicity': 1, 'syntax': 1, 'many': 2, 'researchers': 1, 'and': 1, 'scientists': 1, 'prefer': 1, 'over': 1, 'other': 1}
{'Python': 3, 'of': 2, 'the': 2, 'languages': 2, 'many': 2, 'is': 1, 'one': 1, 'most': 1, 'promising': 1, 'programming': 1, 'today': 1, 'Due': 1, 'to': 1, 'simplicity': 1, 'syntax': 1, 'researchers': 1, 'and': 1, 'scientists': 1, 'prefer': 1, 'over': 1, 'other': 1}
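
The standard library also offers `collections.Counter`, a dictionary subclass built for this kind of counting. This is an alternative to the `setdefault()` approach in the notebook, not the method it uses; the sample sentence is made up for illustration.

```python
from collections import Counter

words = 'the cat and the hat and the bat'.split()
counts = Counter(words)

print(counts['the'])           # 3
print(counts.most_common(2))   # [('the', 3), ('and', 2)]
```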


### Loading JSON into a Dictionary

In [22]:
d = { "PONumber" : 2608,
     "ShippingInstructions" : {"name" : "John Silver",
                               "Address": { "street" : "426 Light Street",
                                           "city" : "South San Francisco",
                                           "state" : "CA",
                                           "zipCode" : 99237,
                                           "country" : "United States of America" },
                               "Phone" : [ { "type" : "Office", "number" : "809-123-9309" },
                                          { "type" : "Mobile", "number" : "417-123-4567" }]
                               }
     }

In [23]:
# saving the dictionary directly to a JSON file
import json
with open("po.json", "w") as outfile:
    json.dump(d, outfile)

# using json.load() method to load contents of a JSON file directly into python dictionary
with open("po.json") as fp:
    d = json.load(fp)
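
When no file is needed, `json.dumps()` and `json.loads()` perform the same conversion through a string. The sketch below uses a trimmed-down version of the purchase order (the name `po` is illustrative) and shows how nested values are reached by chaining keys and list indices.

```python
import json

po = {
    "PONumber": 2608,
    "ShippingInstructions": {
        "name": "John Silver",
        "Phone": [{"type": "Office", "number": "809-123-9309"}],
    },
}

text = json.dumps(po, indent=2)   # dictionary -> JSON string
restored = json.loads(text)       # JSON string -> dictionary

print(restored == po)  # True
print(restored["ShippingInstructions"]["Phone"][0]["number"])  # 809-123-9309
```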

## Sets
A Python set is an unordered collection of unique items.
Duplicate items are not allowed.
A set is defined by curly brackets containing items separated by commas.

In [24]:
# example of a set
{'London', 'New York', 'Paris'}

{'London', 'New York', 'Paris'}

In [25]:
# how to remove duplicates from a list
lst = ['John Silver', 'Tim Jemison', 'John Silver', 'Maya Smith']
lst = list(set(lst))
print(lst)

['Maya Smith', 'John Silver', 'Tim Jemison']


In [26]:
# how to remove duplicates from a list and still maintain order, using sorted() with key=lst.index
lst = ['John Silver', 'Tim Jemison', 'John Silver', 'Maya Smith']
lst = sorted(set(lst), key=lst.index)  # sorted() already returns a list
print(lst)

['John Silver', 'Tim Jemison', 'Maya Smith']
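
Since dictionaries preserve insertion order (Python 3.7 and later), `dict.fromkeys()` is another way to drop duplicates while keeping the first occurrence of each item. It avoids the repeated `lst.index` lookups of the `sorted()` approach. This is an alternative sketch, with `names` and `unique_names` as illustrative variable names.

```python
names = ['John Silver', 'Tim Jemison', 'John Silver', 'Maya Smith']

# dictionary keys are unique and keep their insertion order
unique_names = list(dict.fromkeys(names))
print(unique_names)  # ['John Silver', 'Tim Jemison', 'Maya Smith']
```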


### Performing Common Set Operations
Set objects come with methods for performing common mathematical set operations.
These include unions and intersections.

In [27]:
"""
Classifying a huge number of photos into groups based on what's in the photos.
The Clarifai API can generate descriptive tags for a given photo.
Tags can be compared using the intersection() method.
The more tags two sets share, the more similar the two images are with respect to their theme.
Consider the following simplified example:
"""

photo1_tags = {'coffee', 'breakfast', 'drink', 'table', 'tableware', 'cup', 'food'}
photo2_tags = {'food', 'dish', 'meat', 'meal', 'tableware', 'dinner', 'vegetable'}
intersection = photo1_tags.intersection(photo2_tags)
print(intersection)
if len(intersection) >= 2:
    print("The photos contain similar objects.")

{'food', 'tableware'}
The photos contain similar objects.
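
Besides `intersection()`, sets support union and difference, and the operators `&`, `|`, and `-` are shorthand for these methods. The sketch below reuses the two tag sets. The similarity ratio at the end (shared tags divided by all tags, often called the Jaccard index) is an added illustration and is not part of the original example.

```python
photo1_tags = {'coffee', 'breakfast', 'drink', 'table', 'tableware', 'cup', 'food'}
photo2_tags = {'food', 'dish', 'meat', 'meal', 'tableware', 'dinner', 'vegetable'}

common = photo1_tags & photo2_tags       # same as photo1_tags.intersection(photo2_tags)
all_tags = photo1_tags | photo2_tags     # same as photo1_tags.union(photo2_tags)
only_photo1 = photo1_tags - photo2_tags  # same as photo1_tags.difference(photo2_tags)

# sets are unordered, so sort before printing for a stable display
print(sorted(common))       # ['food', 'tableware']
print(len(all_tags))        # 12
print(sorted(only_photo1))  # ['breakfast', 'coffee', 'cup', 'drink', 'table']

# share of tags the two photos have in common (assumed similarity measure)
similarity = len(common) / len(all_tags)
print(round(similarity, 2))  # 0.17
```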


## Exercise #1

In [None]:
photo_list = [
    {
        "name": "photo1.jpg",
        "tags": {'coffee', 'breakfast', 'drink', 'table', 'tableware', 'cup', 'food'}
    },
    {
        "name": "photo2.jpg",
        "tags": {'food', 'dish', 'meat', 'meal', 'tableware', 'dinner',
        'vegetable'}
    },
    {
        "name": "photo3.jpg",
        "tags": {'city', 'skyline', 'cityscape', 'skyscraper',
        'architecture', 'building',
        'travel'}
    },
    {
        "name": "photo4.jpg",
        "tags": {'drink', 'juice', 'glass', 'meal', 'fruit', 'food', 'grapes'}
    }
    ]

photo_groups = {}

for i in range(1, len(photo_list)):
    for j in range(i + 1, len(photo_list) + 1):
        print(f"Intersecting photo {i} with photo {j}")
        # Implement intersection here, saving results to photo_groups
        