# Using Data Structures Effectively 

## Native Python Data Structures

### Lists

You can measure how the time taken to look up an element by its index changes as a list grows.

In [1]:
small_list = list(range(10))

In [2]:
small_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [3]:
%%timeit
last_element = small_list[-1]

45.3 ns ± 4.17 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [4]:
large_list = list(range(10000))

In [5]:
%%timeit
last_element = large_list[-1]

55.7 ns ± 6.41 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Indexing takes about the same time for both lists, even though the second is 1,000 times larger: accessing a list element by its index is O(1).


In [6]:
%%timeit
4200 in small_list

257 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [7]:
%%timeit
4200 in large_list

62.9 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


If you need to search for an element in a list, Python compares the element you’re searching for with each item in turn until it finds a match, so in the worst case it checks every item. You can measure how this changes as the list grows using the lists from the previous code examples.
First, you can measure how long it takes to search the list containing 10 elements:


In [8]:
%%timeit
4200 in small_list


142 ns ± 2.14 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [9]:
%%timeit
4200 in large_list

43.9 µs ± 1.83 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Searching the large list takes roughly 300 times as long (43.9 µs compared with 142 ns). Searching a Python list in this way is O(n). There are more efficient ways of searching a sorted list, including binary search, which is O(log n). But if you need to check frequently whether something is present, it’s probably better to use a dictionary or a set.
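
The binary search mentioned above can be sketched with the standard-library `bisect` module. This is only a minimal sketch: it assumes the list is already sorted in ascending order, and `contains_sorted` is a hypothetical helper name introduced here for illustration.

```python
from bisect import bisect_left


def contains_sorted(sorted_items, target):
    """Return True if target is in sorted_items (assumed sorted ascending)."""
    # bisect_left finds the leftmost position where target could be inserted
    # while keeping the list sorted; it halves the search range at each step.
    index = bisect_left(sorted_items, target)
    return index < len(sorted_items) and sorted_items[index] == target


large_list = list(range(10000))  # already in ascending order
print(contains_sorted(large_list, 4200))   # True
print(contains_sorted(large_list, 10000))  # False
```

If the list is not sorted, sorting it first costs O(n log n), so this approach only pays off when the same sorted list is searched many times.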

### Tuples

In [10]:
a_tuple = ( 'a', 'b', 'c', 'd', 'e' )
a_tuple = 'a', 'b', 'c', 'd', 'e'
a_tuple = 'a',

type( a_tuple )

tuple

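If you want to create a tuple with a single value, add a comma (,) after the value. The parentheses are optional, because the comma is what makes the value a tuple: ('a',) is a tuple, but ('a') without the comma is just a string.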
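
A quick check shows that the trailing comma, rather than the parentheses, is what creates a single-element tuple. The variable names are illustrative only.

```python
with_comma = 'a',               # trailing comma, no parentheses
with_parens_and_comma = ('a',)  # trailing comma inside parentheses
parens_only = ('a')             # no comma: the parentheses only group

print(type(with_comma))             # <class 'tuple'>
print(type(with_parens_and_comma))  # <class 'tuple'>
print(type(parens_only))            # <class 'str'>
```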

In [11]:
a_tuple = tuple()
print( a_tuple )
a_tuple = tuple( 'lupins' )
print( a_tuple )

()
('l', 'u', 'p', 'i', 'n', 's')


In [13]:
a_tuple = ( 'a', 'b', 'c', 'd', 'e' )
print( a_tuple[1:3] )
print( a_tuple[:3] )
print( a_tuple[1:] )

# Uncomment to see error
# a_tuple[0] = 'z'

('b', 'c')
('a', 'b', 'c')
('b', 'c', 'd', 'e')


### Tuples as return values
A function can only return one value.
- However, if we make that value a tuple, we can effectively return multiple values
- For example, the divmod function takes two arguments and returns a tuple of two values, the quotient and the remainder

In [14]:
quotient, remainder = divmod( 7, 3 )
print( quotient )
print( remainder )

2
1


In [15]:
def min_max( a_tuple ):
    return min( a_tuple ), max( a_tuple )

numbers = ( 13, 7, 55, 42 )
min_num, max_num = min_max( numbers )
print( min_num )
print( max_num )

7
55


### Variable-length argument tuples
- All the functions we have built and used so far required a specific number of arguments
- You can use tuples to build functions that accept a variable number of arguments
- Prefix the parameter name with an asterisk (*) to do this
- Used this way, * is referred to as the gather operator, because it gathers the arguments into a tuple

In [16]:
def printall( *args ):
    print( args )

printall( 1 , 2.0 , '3' )

(1, 2.0, '3')


The complement of gather is scatter. Placing * before a sequence in a function call passes its values to the function as individual arguments.

In [19]:
a_tuple = ( 7, 3 )
# divmod( a_tuple ) # Uncomment to see error
divmod( *a_tuple )

(2, 1)
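
Gather and scatter can be used together. In the sketch below, the `mean` function and the `readings` list are hypothetical names introduced for illustration: the function gathers its arguments into a tuple, and the call scatters a list into individual arguments.

```python
def mean(*values):
    # values is a tuple holding every positional argument that was passed
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


readings = [2, 4, 6, 8]

print(mean(1, 2))       # arguments passed individually: 1.5
print(mean(*readings))  # list scattered into four arguments: 5.0
```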

### Dictionaries

In [20]:
from faker import Faker

fake = Faker()

In [21]:
small_dict = {}
for i in range(10):
    small_dict[fake.name()] = fake.address()

In [22]:
small_dict

{'Karen Arias': '2594 Torres Loaf Suite 296\nParkchester, AS 28468',
 'Christopher Salazar': '01137 Shaw Mission\nPalmerside, AR 61898',
 'Adam Benson': '6222 Danielle Burg\nLake David, FM 14632',
 'Felicia Russell': '846 Alfred Shore Suite 213\nEast Jasonborough, NC 34082',
 'Sandy Roman': '677 Mary Forks Suite 680\nWest Robertborough, MA 10166',
 'Brandon Lopez': '27083 Nathan Port\nSouth Carlos, WA 22997',
 'Kenneth Pineda': '90468 Moore Valleys\nChloeton, CT 67782',
 'Sean Walker': '359 Francisco Tunnel Suite 685\nWest Dakota, PR 92158',
 'Andrew Cruz': '715 Philip Locks\nBakermouth, PA 15588',
 'Kristin Meadows': '3396 Jackson Isle Apt. 152\nNicolemouth, ND 42026'}

In [24]:
%%timeit
small_dict["Karen Arias"]

43.8 ns ± 3.47 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [25]:
large_dict = {}

for i in range(10000):
    large_dict[fake.name()] = fake.address()

In [28]:
import itertools

# Get the first 10 items from the dictionary
first_10_items = dict(itertools.islice(large_dict.items(), 10))

print(first_10_items)

{'William Glover': '060 Timothy Knoll Suite 987\nClarkland, MN 81366', 'Kim Collins': 'PSC 3311, Box 7346\nAPO AP 15910', 'Tyler Reed': '8982 Walker Lodge Apt. 092\nWest Hannah, MP 14021', 'Matthew Mcmahon': '962 Shawn Wall\nEast Kyle, PW 50650', 'Elizabeth Baker': '19048 Lopez Road Suite 816\nEast Susanhaven, KS 95063', 'Zachary Chen': '7508 Anne Ford Suite 224\nWest Joshuaborough, NM 59072', 'Matthew Gonzales': '413 Kevin Route\nLake Jessica, IL 90093', 'Rhonda Meyer': 'PSC 9287, Box 9746\nAPO AP 85177', 'Jeremy Melendez': '4416 Scott Burgs\nLake Sandraburgh, NV 76988', 'Mr. Andrew Nichols': '8662 Kelsey Parkways Suite 031\nNorth Christopher, GA 40984'}


In [29]:
%%timeit
large_dict["William Glover"]

49.8 ns ± 4.7 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The lookup takes about the same time in both dictionaries, even though the second was built from 1,000 times as many insertions: looking up a key in a dictionary is O(1) on average.


### Sets

In [30]:
%%timeit
4200 in large_list

45.6 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [31]:
%%timeit
large_set = set(large_list)
4200 in large_set

131 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Converting the list to a set and then performing the lookup takes nearly three times as long as the list lookup, because of the time needed to build the set. However, once you have converted the list to a set, subsequent lookups are fast.

In [32]:
large_set = set(large_list)

In [33]:
%%timeit
5436 in large_set

47.6 ns ± 2.33 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


The set lookup is roughly 1,000 times faster than the list lookup (47.6 ns compared with 45.6 µs). Converting to a set is a great option when you want to check repeatedly whether items are present in a list, but it isn’t worthwhile if you want to do it only a small number of times. As with everything in this lesson, it’s worth experimenting and measuring what is faster for the particular problem you’re working on.
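
Outside a notebook, where the `%%timeit` magic is not available, the same kind of measurement can be sketched with the standard-library `timeit` module. The names `shared` and `runs` are assumptions made for this illustration, and the timings will differ from machine to machine, so no expected output is shown.

```python
import timeit

large_list = list(range(10000))
large_set = set(large_list)

# Names that the timed statements are allowed to see
shared = {"large_list": large_list, "large_set": large_set}
runs = 1000  # number of times each statement is executed

list_time = timeit.timeit("4200 in large_list", globals=shared, number=runs)
set_time = timeit.timeit("4200 in large_set", globals=shared, number=runs)
convert_time = timeit.timeit("set(large_list)", globals=shared, number=runs)

# timeit returns the total time in seconds for all runs
print(f"list lookup:   {list_time / runs * 1e9:10.0f} ns")
print(f"set lookup:    {set_time / runs * 1e9:10.0f} ns")
print(f"list -> set:   {convert_time / runs * 1e9:10.0f} ns")

# Rough break-even point: how many lookups it takes before a single
# conversion to a set costs less than searching the list each time
saved = list_time - set_time
if saved > 0:
    print(f"break-even after about {convert_time / saved:.1f} lookups")
```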