<a href="https://colab.research.google.com/github/ArnavPabby/ds1002-ncn6dt/blob/main/notebooks/05-python-data-structures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metadata

```
Course:   DS1002
Module:   04 Basics
Topic:    Data Structures
```

# Data Structures in Python

-----------------------
### PREREQUISITES
- variables
- data types

### REFERENCE
- Python documentation on data structures
https://docs.python.org/3/tutorial/datastructures.html


- Python data structures
https://learning.oreilly.com/library/view/python-in-a/9781098113544/ch03.html#data_types


### OBJECTIVES
- Present the essential Python data structures and some of their basic functionality

NOTE:
1. See the references above for more details. As the course progresses, our use of data structures will become more complex.
2. We jump ahead a bit, showing some functionality involving `for-loops`, which are covered in more detail later.


### CONCEPTS
- zero-based indexing
- list
- tuple
- set
- dictionary (dict)
- enumerate
- range

# Data Structures

In contrast to primitive data types, data structures organize values into collections that have certain properties, such as **order**, **mutability**, and **addressing scheme** (e.g., access by index or by key).

# Lists

A list is an ordered sequence of items.

Each element of a list is associated with an integer that represents the order in which the element appears.

Lists are indexed with **brackets** `[]`.

List elements are accessed by putting their index (their position number) in the brackets.

Lists are **mutable**, meaning you can modify them after they have been created.

They can contain mixed types.

## Constructing

They can be **constructed** in several ways:

In [None]:
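list1 = []                        # empty list literal
list2 = list(())                  # list() built from an empty tuple
list3 = "some string".split()     # split a string on whitespace -> ['some', 'string']
numbers = [1,2,3,4]               # list literal with elements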

list3
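`list()` accepts any iterable, not only the empty tuple used above. A minimal sketch; the variable names and values are illustrative, not part of the exercise:

```python
# list() builds a new list from any iterable
from_range = list(range(3))      # [0, 1, 2]
from_string = list("abc")        # one element per character: ['a', 'b', 'c']
from_tuple = list((1, 2))        # [1, 2]

# str.split() with no argument splits on runs of whitespace
words = "some   string".split()  # ['some', 'string']

print(from_range, from_string, from_tuple, words)
```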

> Using any of the above methods, make a list containing your numerical birth month, the first letter of your name, and a boolean answering the statement "I like coffee."

In [18]:
list1 = [11, "A", True]
print(list1)

[11, 'A', True]


## Indexing

**Zero-based indexing**  

Python uses zero-based indexing, which means for a collection `mylist`

`mylist[0]` references the first element  
`mylist[1]` references the second element, etc.

For any sequence (such as a list, tuple, or string) of length *N*:  
`mylist[:n]` will return the first *n* elements from index *0* to *n-1*  
`mylist[-n:]` will return the last *n* elements from index *N-n* to *N-1*
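The two slicing rules above can be checked on a small list. In this sketch `mylist`, `N`, and `n` are illustrative values chosen for the example:

```python
mylist = ['a', 'b', 'c', 'd', 'e']
N = len(mylist)   # N = 5
n = 2

first_n = mylist[:n]   # indices 0 .. n-1   -> ['a', 'b']
last_n = mylist[-n:]   # indices N-n .. N-1 -> ['d', 'e']

# a negative index counts back from the end: mylist[-1] is mylist[N-1]
print(first_n, last_n, mylist[-1], mylist[N - 1])
```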

In [8]:
numbers = [1,2,3,4,5,6]

In [9]:
numbers[0] # Access first element (output: 1)

1

In [10]:
numbers[0] + numbers[3] # doing arithmetic with the values (output: 5)

5

In [11]:
len(numbers)


6

In [12]:
numbers[-1:]

[6]

In [13]:
numbers[:2] # returns 1 (index 0) through 2 (index 1, i.e. 2 minus 1)

[1, 2]

In [14]:
numbers[-2:] # returns 5 (6 minus 2 = index 4) through 6 (6 minus 1 = index 5)

[5, 6]

You can find the index of an element with the `.index()` method.

In [15]:
numbers.index(3)

2
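Two details of `.index()` worth knowing: it reports only the *first* match, and it raises `ValueError` when the value is absent. A small sketch with an illustrative list:

```python
values = [3, 1, 3, 7]

first_three = values.index(3)   # 0: only the first match is reported

# a value that is not in the list raises ValueError
try:
    values.index(99)
    missing_raises = False
except ValueError:
    missing_raises = True
```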

> Using the list you created above, use indexing to pull out your numerical birth month.

In [21]:
list1[0]

11

## Slicing

`a[start:stop]` items start through stop-1

`a[start:]`     items start through the rest of the list

`a[:stop]`      items from the beginning through stop-1

`a[:]`          a (shallow) copy of the whole list
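A sketch that checks the copy claim and shows two related behaviors: an optional third value, `a[start:stop:step]`, which is standard Python slice syntax not listed above, and the fact that slice bounds past the end do not raise an error. The list `a` is illustrative:

```python
a = [10, 20, 30, 40, 50]

b = a[:]            # a new list with the same elements
b.append(60)        # modifying the copy...
print(a)            # ...leaves the original unchanged

# an optional third value sets the step
every_other = a[::2]    # [10, 30, 50]
reversed_a = a[::-1]    # [50, 40, 30, 20, 10]

# slices clamp out-of-range bounds instead of raising IndexError
tail = a[3:100]         # [40, 50]
```

Because the copy is shallow, any nested lists inside `a` would still be shared with `b`.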


In [None]:
numbers[0:2] # Output: [1, 2]

In [None]:
numbers[1:3] # Output: [2, 3]

In [None]:
numbers[2:]  # Output: [3, 4, 5, 6]

> Slice your list using a method from above.

In [23]:
list1[0:2]

[11, 'A']

## Multiply lists by a scalar

A scalar is a single number. Multiplying a list by an integer repeats the list's contents; it does not multiply each element.

In [24]:
numbers * 2

[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
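As the output shows, `*` repeats the list. A short sketch of a few edge cases, using illustrative values:

```python
repeated = [1, 2] * 3   # [1, 2, 1, 2, 1, 2]: the whole list is repeated
zeros = [0] * 4         # a common way to pre-fill a list: [0, 0, 0, 0]
empty = [1, 2] * 0      # repeating zero times gives []

# the multiplier must be an integer; a float raises TypeError
try:
    [1, 2] * 2.5
    float_ok = True
except TypeError:
    float_ok = False
```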

> Multiply your list by 4.

In [25]:
list1*4

[11, 'A', True, 11, 'A', True, 11, 'A', True, 11, 'A', True]

## Concatenate lists with `+`

In [26]:
numbers2 = [30, 40, 50]

In [27]:
numbers + numbers2 # concatenate two lists

[1, 2, 3, 4, 5, 6, 30, 40, 50]
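`+` builds a brand-new list and leaves both operands as they were; it also only joins a list with another list. A sketch with illustrative names:

```python
first = [1, 2]
second = [3, 4]

combined = first + second   # a new list: [1, 2, 3, 4]
# the operands are untouched: first is still [1, 2], second is still [3, 4]

# + only joins a list with another list, not with a tuple
try:
    first + (5, 6)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```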

> Concatenate `numbers` onto the end of your list.

In [28]:
list1+numbers

[11, 'A', True, 1, 2, 3, 4, 5, 6]

## Lists can mix types

In [29]:
myList = ['coconuts', 777, 7.25, 'Sir Robin', 80.0, True]

In [30]:
myList

['coconuts', 777, 7.25, 'Sir Robin', 80.0, True]

**What happens if we multiply a list that contains strings?**

In [31]:
myList * 2

['coconuts',
 777,
 7.25,
 'Sir Robin',
 80.0,
 True,
 'coconuts',
 777,
 7.25,
 'Sir Robin',
 80.0,
 True]

In [32]:
myList*myList

TypeError: can't multiply sequence by non-int of type 'list'

## Lists can be nested

In [33]:
names = ['Darrell', 'Clayton', ['Billie', 'Arthur'], 'Samantha']
print(names[2]) # returns a *list*
print(names[0]) # returns a *string*

['Billie', 'Arthur']
Darrell
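To reach an element *inside* the nested list, index the result a second time. A sketch using the same `names` list:

```python
names = ['Darrell', 'Clayton', ['Billie', 'Arthur'], 'Samantha']

inner = names[2]        # the nested list ['Billie', 'Arthur']
billie = names[2][0]    # index into the result again: 'Billie'
arthur = names[2][-1]   # 'Arthur'

# len() counts the inner list as a single element
print(len(names))       # 4
```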


# Dictionaries `dict`

A dictionary stores key-value pairs, like a hash table.

Elements are indexed using brackets `[]` (like lists).

But they are constructed using braces `{}` or `dict()`.

Key names must be unique. If you reuse a key, you overwrite its value.

Keys don't have to be strings: they can be numbers, tuples, or expressions that evaluate to one of these. In general, a key must be an immutable (hashable) value.
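A small sketch of the key rules above; the dictionary contents are illustrative. A repeated key keeps only its last value, and a mutable object such as a list cannot serve as a key:

```python
scores = {'a': 1, 'b': 2, 'a': 10}   # 'a' appears twice
# only the last value survives, in the key's original position: {'a': 10, 'b': 2}

scores['b'] = 20                     # assigning to an existing key overwrites it

tuple_key = {(1, 2): 'x'}            # a tuple is immutable, so it works as a key

try:
    {[1, 2]: 'x'}                    # a list is mutable, so it cannot be a key
    list_key_ok = True
except TypeError:
    list_key_ok = False
```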

### Constructing

In [34]:
dictionary1 = {
    'a': 1,
    'b': 2,
    'c': 3
}

In [35]:
dictionary2 = dict(x=55, y=29, z=99) # keyword names become string keys, so no quotes are needed

In [36]:
dictionary2

{'x': 55, 'y': 29, 'z': 99}

In [37]:
dictionary3 = {'A': 'foo', 99: 'bar', (1,2): 'baz'}

In [38]:
dictionary3

{'A': 'foo', 99: 'bar', (1, 2): 'baz'}
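`dict()` can also build a dictionary from a list of `(key, value)` pairs, and both `{}` and `dict()` give an empty dictionary. A sketch with illustrative values:

```python
pairs = [('x', 55), ('y', 29), ('z', 99)]
from_pairs = dict(pairs)   # {'x': 55, 'y': 29, 'z': 99}

empty1 = {}                # empty dict literal
empty2 = dict()            # the same thing via the constructor
```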

### Retrieve a value

Put the key inside the brackets, just as you would an index.

In [39]:
phonelist = {'Tom':123, 'Bob':456, 'Sam':897}

In [40]:
phonelist['Bob']

456
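Looking up a key that is not present raises `KeyError`; the `.get()` method returns `None` (or a default you pass) instead. A sketch; the name `'Alice'` and the number `555` are illustrative:

```python
phonelist = {'Tom': 123, 'Bob': 456, 'Sam': 897}

try:
    phonelist['Alice']                   # 'Alice' is not a key
    lookup_ok = True
except KeyError:
    lookup_ok = False

maybe = phonelist.get('Alice')           # None instead of an error
fallback = phonelist.get('Alice', 0)     # or a default you choose

phonelist['Alice'] = 555                 # assigning to a new key adds it
```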

### Print list of keys, values, or both

Use the `.keys()`, `.values()`, or `.items()` methods.

Keys are not sorted. Since Python 3.7, dictionaries preserve insertion order, so keys print in the order they were added.

`.keys()`

In [41]:
phonelist.keys() # Returns a dict_keys view, not a list

dict_keys(['Tom', 'Bob', 'Sam'])

`.values()`

In [42]:
phonelist.values() # Returns a dict_values view

dict_values([123, 456, 897])

`.items()`

In [43]:
phonelist.items() # Returns a dict_items view of (key, value) tuples

dict_items([('Tom', 123), ('Bob', 456), ('Sam', 897)])

In [44]:
phonelist

{'Tom': 123, 'Bob': 456, 'Sam': 897}
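The objects returned by `.keys()`, `.values()`, and `.items()` are live views of the dictionary, so they reflect later changes; wrap one in `list()` when you need a fixed snapshot. A sketch; the added key `'Ann'` is illustrative:

```python
phonelist = {'Tom': 123, 'Bob': 456, 'Sam': 897}

keys = phonelist.keys()      # a view, not a copy
key_list = list(keys)        # a real list snapshot: ['Tom', 'Bob', 'Sam']

phonelist['Ann'] = 111
print(keys)                  # the view now includes 'Ann'
print(key_list)              # the snapshot does not

# .items() pairs each key with its value
for name, number in phonelist.items():
    print(name, number)
```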

Example of printing the keys in sorted order with `sorted()` and a **For Loop**

In [45]:
for key in sorted(phonelist.keys()):
    print(key)

Bob
Sam
Tom


# Tuples

A tuple is like a list but with one big difference: **a tuple is an immutable object!**

You can't change a tuple once it's created.

A tuple can contain any number of elements of any datatype.

Accessed with brackets `[]` but constructed with or without parentheses `()`.

In [46]:
numbers[3] = 30 # lists are mutable: replace the element at index 3
numbers

[1, 2, 3, 30, 5, 6]

In [47]:
# tuples are immutable
numbers4 = (26, 27, 28)
numbers4[2] = 30

TypeError: 'tuple' object does not support item assignment

## Constructing

Created with comma-separated values, with or without parentheses.


In [48]:
letters = 'a', 'b', 'c', 'd'

In [49]:
letters

('a', 'b', 'c', 'd')
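Tuples are read with brackets just like lists, and slicing a tuple returns another tuple. A sketch using the same `letters` values; the unpacking names `w`, `x`, `y`, `z` are illustrative:

```python
letters = 'a', 'b', 'c', 'd'   # parentheses are optional

first = letters[0]       # 'a'
last = letters[-1]       # 'd'
middle = letters[1:3]    # slicing a tuple returns a tuple: ('b', 'c')

# unpacking assigns each element to its own name
w, x, y, z = letters
```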

In [50]:
numbers = (1,2,3,4) # numbers 1,2,3,4 stored in a tuple

A single-element tuple must include a trailing comma `,`. Without it, the parentheses only group an expression, as in:

In [None]:
# missing the comma
tuple0 = (29)

In [51]:
tuple0, type(tuple0)

(29, int)

In [52]:
# comma included
tuple1 = (29,)

In [53]:
tuple1, type(tuple1)

((29,), tuple)
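It is the comma, not the parentheses, that makes a tuple. A short sketch with illustrative names:

```python
not_a_tuple = (29)   # parentheses only group: this is the int 29
one_tuple = (29,)    # the trailing comma makes a 1-element tuple
also_tuple = 29,     # the comma alone is enough
empty = ()           # an empty tuple needs no comma

print(type(not_a_tuple), type(one_tuple), type(also_tuple), type(empty))
```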

In [55]:
# attempting to reassign a tuple value
numbers[0] = 5 # Trying to assign a new value 5 to the first position

TypeError: 'tuple' object does not support item assignment

> List one important attribute of a list, a dictionary, and a tuple.

- A list can mix data types.
- A dictionary is made up of key-value pairs.
- A tuple is immutable, i.e., it cannot be changed.

# Functions and operators common to all sequences

```
len()
in
+
*
```

In [56]:
[1, 3] * 8

[1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3]

In [57]:
(1, 3) * 8

(1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3)
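The same four operations work on lists, tuples, and strings alike. A sketch with illustrative values; note that `+` joins only sequences of the same type:

```python
as_list = [1, 3]
as_tuple = (1, 3)
as_str = "13"

# len(): number of elements (characters, for a string)
lengths = (len(as_list), len(as_tuple), len(as_str))      # (2, 2, 2)

# in: membership test
found = (3 in as_list, 3 in as_tuple, "3" in as_str)       # (True, True, True)

# +: concatenation with another sequence of the same type
joined = (as_list + [5], as_tuple + (5,), as_str + "5")    # ([1, 3, 5], (1, 3, 5), '135')

# *: repetition by an integer
repeated = (as_list * 2, as_tuple * 2, as_str * 2)         # ([1, 3, 1, 3], (1, 3, 1, 3), '1313')
```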

## Membership with `in`

Returns a boolean. Besides sequences, `in` also works on dictionaries, where it checks the **keys**:

In [58]:
'Sam' in phonelist

True
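Because `in` on a dictionary searches only its keys, searching the values requires `.values()`. A sketch using the same `phonelist`; `'Zoe'` is an illustrative absent name:

```python
phonelist = {'Tom': 123, 'Bob': 456, 'Sam': 897}

key_hit = 'Sam' in phonelist              # True: `in` checks a dict's keys
value_miss = 897 in phonelist             # False: values are not searched
value_hit = 897 in phonelist.values()     # True: search the values explicitly
absent = 'Zoe' not in phonelist           # True: `not in` is the negation
```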

# Sets

A `set` is an unordered collection of unique objects.

They support set operations such as union and intersection.

In [59]:
peanuts = {'snoopy','snoopy','woodstock'}

In [60]:
peanuts

{'snoopy', 'woodstock'}

Note that the set is deduplicated: the second `'snoopy'` was dropped.
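The same deduplication makes `set()` a quick way to find the unique items in a list. One pitfall: empty braces `{}` create a dictionary, not a set. A sketch with an illustrative `votes` list:

```python
votes = ['python', 'R', 'python', 'SQL', 'R']
unique_votes = set(votes)    # duplicates dropped: {'python', 'R', 'SQL'}

empty_braces = {}            # careful: this is an empty dict, not a set
empty_set = set()            # use set() for an empty set

unique_votes.add('python')   # adding an existing element changes nothing
print(len(unique_votes))     # 3
```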

Since sets are unordered, they have no index, so indexing one raises an error:

In [61]:
peanuts[0]

TypeError: 'set' object is not subscriptable

In [62]:
for peanut in peanuts:
    print(peanut)

snoopy
woodstock


**Check if a value is in the set using `in`**

In [63]:
'snoopy' in peanuts

True

Combine two sets

In [64]:
set1 = {'python','R'}
set2 = {'R','SQL'}

This fails:

In [65]:
set1 + set2

TypeError: unsupported operand type(s) for +: 'set' and 'set'

This succeeds:

In [66]:
set1.union(set2)

{'R', 'SQL', 'python'}

Get the set intersection

In [67]:
set1.intersection(set2)

{'R'}
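Sets also provide operator shortcuts for these methods, plus difference and symmetric difference. A sketch using the same `set1` and `set2`:

```python
set1 = {'python', 'R'}
set2 = {'R', 'SQL'}

both = set1 | set2               # union, same as set1.union(set2)
common = set1 & set2             # intersection, same as set1.intersection(set2)
only_in_1 = set1 - set2          # difference: {'python'}
in_one_not_both = set1 ^ set2    # symmetric difference: {'python', 'SQL'}
```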

# Ranges

A range is a sequence of integers, from `start` to `stop` by `step`.
- The `start` point is zero by default.  
- The `step` is one by default.  
- The `stop` point is NOT included.  
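A sketch of the defaults above, wrapping each range in `list()` so its values are visible; the arguments are illustrative:

```python
default = list(range(5))           # start 0, step 1, stop excluded: [0, 1, 2, 3, 4]
custom = list(range(2, 10, 3))     # [2, 5, 8]
backwards = list(range(5, 0, -1))  # a negative step counts down: [5, 4, 3, 2, 1]
nothing = list(range(3, 3))        # start == stop gives no values: []

# ranges support len(), indexing, and `in` without building a list
r = range(0, 10, 2)
print(len(r), r[1], 8 in r)        # 5 2 True
```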

Ranges can be assigned to a variable.

In [68]:
rng = range(5)
rng

range(0, 5)

More often, ranges are used in iterations, which we will cover later.

In [69]:
for rn in rng:
    print(rn)

0
1
2
3
4


Another range, from 1 up to (but not including) 11 in steps of 2:

In [70]:
rangy = range(1, 11, 2)
for rn in rangy:
    print(rn)

1
3
5
7
9
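The CONCEPTS list at the top also names `enumerate`, which pairs each element of an iterable with its position. A minimal sketch; the list contents (including `'charlie'`) are illustrative:

```python
peanuts = ['snoopy', 'woodstock', 'charlie']

# enumerate() yields (index, element) pairs, counting from 0 by default
for i, name in enumerate(peanuts):
    print(i, name)

# an optional start value changes where the count begins
numbered = list(enumerate(peanuts, start=1))
# [(1, 'snoopy'), (2, 'woodstock'), (3, 'charlie')]
```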
