# Using Data Structures Effectively 

## Native Python Data Structures

### Lists

You can measure how the time taken to access a list element by its index changes as the list grows.

In [77]:
small_list = list(range(10))

In [78]:
small_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [79]:
%%timeit
last_element = small_list[-1]

31.9 ns ± 0.346 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [80]:
large_list = list(range(10000))

In [81]:
%%timeit
last_element = large_list[-1]

32.2 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
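Both lookups take about 32 ns even though the second list is 1,000 times longer. Indexing a Python list is O(1): the list stores references in a contiguous array, so it can jump straight to any position. `%%timeit` is an IPython/Jupyter magic. Outside a notebook, a minimal sketch of the same measurement can use the standard-library `timeit` module. The helper name `best_time_ns` and the loop and repeat counts are arbitrary choices. This sketch reports the fastest run, not the mean ± standard deviation that `%%timeit` shows.

```python
import timeit

small_list = list(range(10))
large_list = list(range(10000))

def best_time_ns(stmt, namespace, number=200_000, repeat=5):
    """Run stmt `number` times, `repeat` times over; return the fastest per-run time in ns."""
    timings = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(timings) / number * 1e9

small_ns = best_time_ns("small_list[-1]", {"small_list": small_list})
large_ns = best_time_ns("large_list[-1]", {"large_list": large_list})

print(f"small list: {small_ns:.1f} ns per lookup")
print(f"large list: {large_ns:.1f} ns per lookup")
```

The exact numbers depend on the machine. The point is that the two figures should be close to each other.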


To check whether a list contains an element, Python compares the element you're searching for with the items in the list one by one. In the worst case, when the element is missing or at the very end, that means comparing it with every item. You can measure how this changes as the list grows using the lists from the previous code examples.
First, measure how long it takes to search the list containing 10 elements:


In [82]:
%%timeit
4200 in small_list


144 ns ± 7.88 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Then measure the same search on the list containing 10,000 elements:

In [83]:
%%timeit
4200 in large_list

53.4 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


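It takes roughly 370 times as long (53.4 µs versus 144 ns). The ratio is below 1,000 partly because 4200 is absent from small_list, so all 10 items are checked, whereas in large_list the search stops when it reaches 4200, less than halfway through. Searching a Python list in this way is O(n): on average, the time grows in proportion to the length of the list. If the list is sorted, binary search is more efficient, at O(log n). But if you frequently need to check whether something is present, a dictionary or a set is usually a better choice.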
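Binary search only works on a sorted sequence. Here is a minimal sketch using the standard-library `bisect` module. The helper name `contains_sorted` is hypothetical. It checks membership in O(log n) comparisons, assuming the input is already sorted in ascending order:

```python
from bisect import bisect_left

def contains_sorted(sorted_items, target):
    """Return True if target is in sorted_items, using binary search.

    Assumes sorted_items is already sorted in ascending order.
    """
    index = bisect_left(sorted_items, target)  # leftmost position where target could be inserted
    return index < len(sorted_items) and sorted_items[index] == target

large_list = list(range(10000))  # range(10000) is already sorted

print(contains_sorted(large_list, 4200))   # True
print(contains_sorted(large_list, 12345))  # False
```

For 10,000 items this needs about 14 comparisons (2^14 = 16,384), versus up to 10,000 for a linear scan. The catch is that the list has to be kept sorted.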

### Tuples

In [86]:
a_tuple = ( 'a', 'b', 'c', 'd', 'e' )
a_tuple = 'a', 'b', 'c', 'd', 'e'
a_tuple = 'a',

type( a_tuple )

tuple

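The cell above shows three ways to create a tuple. To create a tuple with a single value, add a trailing comma (,) after the value. The parentheses are optional but the comma is not: ('a',) and 'a', are both one-element tuples, whereas ('a') is just the string 'a'.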
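A quick check of the rule: it is the comma, not the parentheses, that makes the tuple.

```python
not_a_tuple = ('a')    # parentheses alone only group an expression: this is the str 'a'
with_parens = ('a',)   # the trailing comma makes a one-element tuple
without_parens = 'a',  # the parentheses are optional

print(type(not_a_tuple))              # <class 'str'>
print(type(with_parens))              # <class 'tuple'>
print(with_parens == without_parens)  # True
```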
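
You can also create tuples with the built-in tuple function. Called with no argument, it returns an empty tuple. Called with a sequence such as a string, it returns a tuple of that sequence's elements.
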
In [87]:
a_tuple = tuple()
print( a_tuple )
a_tuple = tuple( 'lupins' )
print( a_tuple )

()
('l', 'u', 'p', 'i', 'n', 's')

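Tuples support indexing and slicing just like lists, but they are immutable: you can't assign a new value to one of their elements.
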
In [90]:
a_tuple = ( 'a', 'b', 'c', 'd', 'e' )
print( a_tuple[1:3] )
print( a_tuple[:3] )
print( a_tuple[1:] )

# Uncomment to see error
# a_tuple[0] = 'z'

('b', 'c')
('a', 'b', 'c')
('b', 'c', 'd', 'e')
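Uncommenting that assignment raises a `TypeError`. The short sketch below catches the error, then shows the usual alternative: build a new tuple from slices of the old one.

```python
a_tuple = ('a', 'b', 'c', 'd', 'e')

try:
    a_tuple[0] = 'z'  # item assignment is not supported on tuples
except TypeError as error:
    print(f"TypeError: {error}")

# To "change" a tuple, create a new one: 'z' followed by everything after index 0
new_tuple = ('z',) + a_tuple[1:]
print(new_tuple)  # ('z', 'b', 'c', 'd', 'e')
print(a_tuple)    # ('a', 'b', 'c', 'd', 'e'), the original is unchanged
```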


### Tuples as return values
Strictly speaking, a function can return only one value
- However, if that value is a tuple, the function can effectively return multiple values
- For example, the built-in divmod function takes two arguments and returns a tuple of two values: the quotient and the remainder

In [91]:
quotient, remainder = divmod( 17, 3 )
print( quotient )
print( remainder )

5
2
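Strictly, `divmod` still returns a single object, and the assignment above unpacks it into two names. Keeping the result in one variable makes this visible. The two values also match the `//` (floor division) and `%` (remainder) operators:

```python
result = divmod(17, 3)
print(result)        # (5, 2)
print(type(result))  # <class 'tuple'>

# Tuple unpacking assigns one name per element of the returned tuple
quotient, remainder = result
print(quotient == 17 // 3, remainder == 17 % 3)  # True True
```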


### Tuples as function args
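
A tuple can also be passed to a function as a single argument. Here, min_max takes a tuple of numbers and returns its smallest and largest values as another tuple.
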
In [93]:
def min_max( a_tuple ):
    return min( a_tuple ), max( a_tuple )

numbers = ( 13, 7, 55, 42, 234, -34, 123, 655 )
min_num, max_num = min_max( numbers )
print( min_num )
print( max_num )

-34
655


### Variable-length argument tuples
- Most functions require a specific number of arguments
- You can use tuples to build functions that accept a variable number of arguments
- To do this, prefix the parameter name with an asterisk (*)
- This use of * is referred to as the gather operator: it gathers all the positional arguments into a single tuple

In [94]:
def printall( *args ):
    print( args )

printall( 1, 2.0, '3' )

(1, 2.0, '3')
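The gathered arguments arrive as an ordinary tuple, so the function body can iterate over them, measure them, or pass them to other functions. Here is a minimal sketch with a hypothetical `average` function that accepts any number of numbers:

```python
def average(*numbers):
    """Return the mean of any number of positional arguments."""
    if not numbers:  # numbers is an empty tuple when called with no arguments
        raise ValueError("average() needs at least one argument")
    return sum(numbers) / len(numbers)

print(average(4))           # 4.0
print(average(1, 2, 3, 4))  # 2.5
```

Raising an error for the empty case is a design choice. Without it, the function would fail with a less helpful `ZeroDivisionError`.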


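The complement is the scatter operator: using * in a function call unpacks a sequence into individual positional arguments. divmod needs two separate arguments, so passing the tuple itself raises an error, but scattering it works.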

In [58]:
a_tuple = ( 7, 3 )
# divmod( a_tuple ) # Uncomment to see error
divmod( *a_tuple )

(2, 1)
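Scatter also works with functions you write yourself and with any sequence, not just tuples. Here is a short sketch; the `describe` and `call_describe` functions are hypothetical:

```python
def describe(name, quantity, unit):
    return f"{quantity} {unit} of {name}"

args_tuple = ('flour', 500, 'g')
args_list = ['sugar', 2, 'cups']

print(describe(*args_tuple))  # 500 g of flour
print(describe(*args_list))   # 2 cups of sugar

# Gather and scatter together: collect whatever arguments arrive, then pass them on
def call_describe(*args):
    return describe(*args)

print(call_describe('butter', 250, 'g'))  # 250 g of butter
```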

### Dictionaries
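
The following examples use the third-party Faker library (install it with pip install Faker) to generate fake names and addresses. They then time key lookups in a small and a large dictionary.
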
In [95]:
from faker import Faker

fake = Faker()

In [96]:
small_dict = {}
for i in range(10):
    small_dict[fake.name()] = fake.address()

In [97]:
small_dict

{'James Jones': '2524 Daniel Plaza Apt. 367\nLisaport, VI 44638',
 'Sheryl Schaefer': '58782 Watson Loop Apt. 114\nMichaelview, RI 64083',
 'Larry Paul': '7955 Jones Streets\nSouth Casey, AL 08798',
 'Samantha Austin': '038 Johnson Road Suite 673\nAngelhaven, PW 17437',
 'Tanner Carroll': '673 Thomas Throughway\nMillerside, VI 51422',
 'Susan Stewart': '706 Gross Loop Suite 731\nReesestad, CT 69181',
 'Bryan Fowler': '782 Donald Valley\nRodriguezfort, MO 78192',
 'Brian Reynolds': '6421 Lewis Wells\nSouth Maureenton, NH 66327',
 'Keith Hall': '56931 Montoya Fords\nLake Anthony, ID 99578',
 'Kimberly Moore': '636 Scott Rapid\nTimothyberg, CO 09790'}

In [98]:
%%timeit
small_dict["James Jones"]

45 ns ± 9.36 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [99]:
large_dict = {}

for i in range(10000):
    large_dict[fake.name()] = fake.address()

In [100]:
import itertools

# Get the first 10 items from the dictionary
first_10_items = dict(itertools.islice(large_dict.items(), 10))

print(first_10_items)

{'James Everett': 'USNV Wade\nFPO AA 89645', 'Bradley Ford': '68498 Davis Mountain Suite 055\nHarryville, KS 20735', 'Lauren Anderson': '26013 Webster Avenue Suite 590\nNew Cherylside, UT 37632', 'Matthew Carlson': '5995 Ray Drive\nNew Christina, SD 73446', 'Cody Calhoun': '252 Tiffany Passage\nPort Samanthaborough, MS 28729', 'Chad Wang': '22095 David Prairie Suite 613\nLake Malloryside, NV 09951', 'Kari Brown': '791 Morris Mountains Apt. 817\nKellyton, MO 49595', 'Cynthia Brown': '9557 Morton Path Suite 863\nLake Ashleyfort, MS 09723', 'Christina Rodriguez': '4735 Torres Shoal Suite 036\nJenniferstad, FL 56705', 'James Martinez': '9320 Edwards Fields\nAdrianchester, TN 86974'}


In [101]:
%%timeit
large_dict["James Everett"]

44.7 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
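Looking up a key takes roughly 45 ns in both dictionaries, even though the large one has up to 1,000 times as many entries. A Python dictionary is a hash table: it hashes the key to find where the value is stored instead of scanning every entry, so lookups are O(1) on average. Faker can generate the same name more than once, and a repeated key overwrites the earlier entry, so large_dict may hold slightly fewer than 10,000 items. The sketch below repeats the comparison with the standard-library `timeit` module, so it runs outside a notebook. It uses generated string keys instead of Faker; the key format and loop counts are arbitrary choices.

```python
import timeit

# Dictionaries with 10 and 10,000 string keys (stand-ins for the Faker names)
small_dict = {f"person_{i}": f"address_{i}" for i in range(10)}
large_dict = {f"person_{i}": f"address_{i}" for i in range(10000)}

number = 200_000  # lookups per timing run
small_s = min(timeit.repeat('small_dict["person_0"]',
                            globals={"small_dict": small_dict}, number=number, repeat=5))
large_s = min(timeit.repeat('large_dict["person_0"]',
                            globals={"large_dict": large_dict}, number=number, repeat=5))

print(f"small dict: {small_s / number * 1e9:.1f} ns per lookup")
print(f"large dict: {large_s / number * 1e9:.1f} ns per lookup")
```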


### Sets
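
Like dictionaries, sets are implemented as hash tables, so membership tests are O(1) on average. As a baseline, time the list search again:
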
In [102]:
%%timeit
4200 in large_list

44.8 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [103]:
%%timeit
large_set = set(large_list)
4200 in large_set

145 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Converting the list to a set and then performing the lookup takes more than three times as long as the list search, because of the time needed to build the set. However, once you have converted the list to a set, subsequent lookups are fast:

In [104]:
large_set = set(large_list)

In [105]:
%%timeit
5436 in large_set

47.9 ns ± 4.29 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


The set lookup is nearly 1,000 times faster than the list search (47.9 ns versus 44.8 µs). Converting to a set is a great option when you want to check repeatedly whether items are present in a list, but it isn't worthwhile if you only need to do it a few times. As with everything in this lesson, it's worth experimenting and measuring what is faster for the particular problem you're working on.
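In the timings above, building the set cost roughly 145 µs and one list search cost roughly 45 µs. For this data, then, the conversion pays for itself by about the fourth lookup; the figures will differ on other machines and for other data. Here is a minimal sketch of that trade-off using the standard-library `timeit` module. The target values and the number of checks are arbitrary choices:

```python
import timeit

large_list = list(range(10000))
targets = [4200, 5436, 9999, 123, 7777, 20000] * 10  # 60 membership checks; 20000 is absent

def check_with_list():
    # Each `in` scans the list from the start: O(n) per check
    return [t in large_list for t in targets]

def check_with_set():
    # Pay the O(n) conversion once, then each `in` is O(1) on average
    large_set = set(large_list)
    return [t in large_set for t in targets]

assert check_with_list() == check_with_set()  # both approaches give the same answers

list_s = min(timeit.repeat(check_with_list, number=20, repeat=3))
set_s = min(timeit.repeat(check_with_set, number=20, repeat=3))
print(f"list: {list_s / 20 * 1e3:.2f} ms, set (including conversion): {set_s / 20 * 1e3:.2f} ms")
```

Try reducing `targets` to one or two values to see the point where the list becomes the faster option.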