# Using Data Structures Effectively 

## Native Python Data Structures

### Lists

You can measure how the time taken to look up an element by its index changes as a list grows.

In [34]:
small_list = list(range(10))

In [35]:
small_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [36]:
%%timeit
last_element = small_list[-1]

31.5 ns ± 0.786 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [37]:
large_list = list(range(10000))

In [38]:
%%timeit
last_element = large_list[-1]

30.9 ns ± 0.606 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The two timings are essentially the same: looking up an element by index is O(1), so it doesn’t depend on the length of the list.


If you need to search for an element in a list, Python compares the element you’re searching for with each item in turn until it finds a match, and with every item if there is no match. You can measure how this changes as the list grows using the lists from the previous code examples.
First, you can measure how long it takes to search the list containing 10 elements, and then the list containing 10,000 elements:


In [41]:
%%timeit
4200 in small_list


163 ns ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [42]:
%%timeit
4200 in large_list

50.6 µs ± 5.16 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


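Searching the larger list takes roughly 300 times as long (50.6 µs versus 163 ns). Searching a Python list in this way is O(n): the 10-element list is scanned in full because 4200 isn’t in it, while the 10,000-element list is scanned up to index 4200, where the match is found. There are more efficient ways of searching a sorted list, including binary search, which is O(log n). But if you need to check frequently whether something is present, it’s probably better to use a dictionary or a set.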
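
The binary search mentioned above can be sketched with the standard library’s bisect module. This is a minimal illustration, not a drop-in replacement for the in operator: it assumes the list is already sorted in ascending order, and the helper name contains_sorted is hypothetical.

```python
import bisect


def contains_sorted(sorted_items, target):
    """Return True if target is in sorted_items (hypothetical helper).

    Assumes sorted_items is in ascending order; binary search gives
    wrong answers on unsorted data.
    """
    # bisect_left returns the leftmost position where target could be
    # inserted while keeping the list sorted, in O(log n) steps.
    index = bisect.bisect_left(sorted_items, target)
    return index < len(sorted_items) and sorted_items[index] == target


large_list = list(range(10000))  # range() output is already ascending

print(contains_sorted(large_list, 4200))
print(contains_sorted(large_list, 10000))
```

If the list isn’t sorted to begin with, sorting it costs O(n log n), so this only pays off when the same sorted list is searched many times.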

### Tuples

In [48]:
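a_tuple = ( 'a', 'b', 'c', 'd', 'e' )  # with parentheses
a_tuple = 'a', 'b', 'c', 'd', 'e'      # the same tuple, without parentheses
a_tuple = 'a',                         # single-value tuple: note the trailing comma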

type( a_tuple )

tuple

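If you want to create a tuple with a single value, add a comma (,) after the value. Parentheses are optional: it is the comma that makes the tuple, so a value in parentheses without a trailing comma is not a tuple.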
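
A short sketch of the single-value case follows. It shows that the comma is what creates the tuple, whether or not parentheses are present. The variable names are hypothetical and chosen only for this illustration.

```python
# Hypothetical names, used only for this illustration.
not_a_tuple = ('a')     # parentheses alone only group the expression
one_tuple = ('a',)      # the trailing comma makes this a tuple
also_one_tuple = 'a',   # same tuple, written without parentheses

print(type(not_a_tuple))
print(type(one_tuple))
print(one_tuple == also_one_tuple)
```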

In [49]:
a_tuple = tuple()
print( a_tuple )
a_tuple = tuple( 'lupins' )
print( a_tuple )

()
('l', 'u', 'p', 'i', 'n', 's')


In [52]:
a_tuple = ( 'a', 'b', 'c', 'd', 'e' )
print( a_tuple[1:3] )
print( a_tuple[:3] )
print( a_tuple[1:] )

# Tuples are immutable: uncomment to see the TypeError
# a_tuple[0] = 'z'

('b', 'c')
('a', 'b', 'c')
('b', 'c', 'd', 'e')


### Tuples as return values
A function can only return one value.
- However, if that value is a tuple, the function can effectively return multiple values
- For example, the built-in divmod function takes two arguments and returns a tuple of two values: the quotient and the remainder

In [54]:
quotient, remainder = divmod( 17, 3 )
print( quotient )
print( remainder )

5
2


### Tuples as function args


In [56]:
def min_max( a_tuple ):
    return min( a_tuple ), max( a_tuple )

numbers = ( 13, 7, 55, 42, 234, -34 )
min_num, max_num = min_max( numbers )
print( min_num )
print( max_num )

-34
234


### Variable-length argument tuples
- Most functions require a specific number of arguments
- You can use tuples to build functions that accept a variable number of arguments
- Prefix the parameter name with an * to do this; the arguments passed in are gathered into a tuple
- The * used this way is referred to as the gather operator

In [57]:
def printall( *args ):
    print( args )

printall( 1 , 2.0 , '3' )

(1, 2.0, '3')


The complement of gather is the scatter operator. Placing an * before a sequence in a function call unpacks it, so each value is passed to the function as a separate argument.

In [58]:
a_tuple = ( 7, 3 )
# divmod( a_tuple )  # Uncomment to see the TypeError: divmod expects two arguments
divmod( *a_tuple )

(2, 1)

### Dictionaries

In [59]:
from faker import Faker

fake = Faker()

In [60]:
small_dict = {}
for i in range(10):
    small_dict[fake.name()] = fake.address()

In [61]:
small_dict

{'Jonathan Herrera': '02386 Eric Route Suite 425\nPort Robertton, IL 08515',
 'John Parker': '98559 Bennett Drives\nNew Josephshire, ND 38607',
 'Charles Hubbard': '914 Mills Knoll\nWest Markside, AS 14801',
 'Tanya Bennett': '1390 Smith Junctions Suite 539\nMccallshire, MH 06150',
 'Jonathan Blevins': '1332 Timothy Knolls Suite 904\nWest Eileenstad, VA 25373',
 'Caroline Torres': '0603 Hopkins Flat\nEast Ryan, VA 77773',
 'Nancy Peck': '470 John Well Suite 996\nPort Stephenshire, MD 07804',
 'Marilyn Adkins': '93849 Edward Mews\nWest Sherry, AR 91343',
 'Cassandra Martin': '5775 Davis Field\nNew Tonyport, MH 20955',
 'Barbara Mueller': '42853 Yang Ville Suite 381\nNew Brenda, WY 11161'}

In [63]:
%%timeit
small_dict["Nancy Peck"]

40.9 ns ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [64]:
large_dict = {}

for i in range(10000):
    large_dict[fake.name()] = fake.address()

In [65]:
import itertools

# Get the first 10 items from the dictionary
first_10_items = dict(itertools.islice(large_dict.items(), 10))

print(first_10_items)

{'Jessica Koch': '461 Stephanie Ferry Apt. 707\nParkerhaven, KS 26988', 'Aaron Fowler': '488 Preston Mission\nPaigeshire, DE 57856', 'Jeffrey Morris': '932 Winters Locks Apt. 971\nRyanburgh, KS 24511', 'Matthew Hall': '0836 Corey Falls\nLake Devin, ID 49635', 'Kevin Johnson': '080 Kimberly Cliff Suite 989\nNorth Claire, TN 86541', 'Jeremy Griffin': '16979 Charlotte View Suite 972\nLake Patrick, FM 93468', 'Scott Garrison': '38126 Meyer Square\nNorth Barry, AZ 11191', 'Kimberly Klein': '6032 Laurie Roads Apt. 034\nCherylstad, NY 86191', 'Amy Wiley': '120 Garcia River\nSarahton, OH 56700', 'Sharon Curtis': '461 Walls Landing Apt. 228\nNew Jimmyland, MN 70144'}


In [67]:
%%timeit
large_dict["Jessica Koch"]

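44.3 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The lookup takes about the same time in the small dictionary and in the much larger one: looking up a key in a dictionary is O(1) on average, so it doesn’t depend on the number of entries.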
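
The dictionary measurements above can also be reproduced outside a notebook. The sketch below is one way to do that using the standard library’s timeit module. The synthetic keys and values are an assumption, standing in for the faker-generated names and addresses so that the example has no third-party dependency.

```python
import timeit

# Synthetic keys and values (an assumption) stand in for the
# faker-generated names and addresses.
small_dict = {f"name-{i}": f"address-{i}" for i in range(10)}
large_dict = {f"name-{i}": f"address-{i}" for i in range(10_000)}

# Time 100,000 lookups of the same key in each dictionary.
small_time = timeit.timeit(lambda: small_dict["name-7"], number=100_000)
large_time = timeit.timeit(lambda: large_dict["name-7"], number=100_000)

print(f"small_dict: {small_time:.4f} s for 100,000 lookups")
print(f"large_dict: {large_time:.4f} s for 100,000 lookups")
```

The absolute numbers include the overhead of calling the lambda, so they are not directly comparable with the %%timeit results. The point is only that the two timings should be close to each other.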


### Sets

In [68]:
%%timeit
4200 in large_list

47.2 µs ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [69]:
%%timeit
large_set = set(large_list)
4200 in large_set

126 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Converting a list to a set and then performing the lookup takes roughly 2.7 times as long as the list lookup (126 µs versus 47.2 µs) because of the time needed to build the set. However, once you have converted the list to a set, subsequent lookups are fast.

In [70]:
large_set = set(large_list)

In [71]:
%%timeit
5436 in large_set

46.3 ns ± 2.36 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


The set lookup is roughly 1,000 times faster than the list lookup (46.3 ns versus 47.2 µs). Converting to a set is a great option when you want to check repeatedly whether items are present in a list, but it isn’t worthwhile if you only need to check a small number of times. As with everything in this lesson, it’s worth experimenting and measuring what is faster for the particular problem you’re working on.
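
The trade-off can be sketched for a batch of lookups as follows. This is an illustration under stated assumptions rather than a benchmark: the queries workload and the two helper functions are hypothetical, and the timings will vary by machine.

```python
import timeit

large_list = list(range(10_000))
# Hypothetical workload: 1,000 values to look up, half of them absent.
queries = list(range(0, 20_000, 20))


def count_hits_list(items, queries):
    """Scan the list once per query: O(n) for each lookup."""
    return sum(1 for q in queries if q in items)


def count_hits_set(items, queries):
    """Pay the conversion cost once, then do O(1) average-case lookups."""
    lookup = set(items)
    return sum(1 for q in queries if q in lookup)


list_hits = count_hits_list(large_list, queries)
set_hits = count_hits_set(large_list, queries)

# The set version's timing includes the cost of building the set.
list_time = timeit.timeit(lambda: count_hits_list(large_list, queries), number=3)
set_time = timeit.timeit(lambda: count_hits_set(large_list, queries), number=3)

print(f"list: {list_time:.4f} s, set (including conversion): {set_time:.4f} s")
```

Both helpers return the same count. Reducing queries to one or two values is a quick way to see when the conversion cost stops being worthwhile.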