# Chapter 2 -- Python Data Structures

In this chapter, we will touch briefly on Python's data structures.  As the name suggests, data structures are containers for data and other objects.  We introduce these objects since they form the building-blocks for other structures discussed in subsequent chapters.

Python has four main data structures.  They are:

1. list
2. tuple
3. set
4. dictionary

### List

A list contains an ordered collection of items.  You determine the order.  In Python, items in a list are separated by commas and enclosed in square brackets.  Lists are mutable, meaning you can add, remove, or alter items.

In [2]:
# defining a list

a_list = ['ale', 'lager', 'stout', 'hefeweizen', 'stout']
print('There are', len(a_list), 'items in "a_list"')

There are 5 items in "a_list"
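
Because lists are mutable, the same list object can be changed in place.  The following is a minimal sketch of adding, altering, and removing items.  The list name `beers` and its values are illustrative assumptions and are separate from the `a_list` object used in this chapter.

```python
# Illustrative list; the name "beers" is an assumption for this sketch.
beers = ['ale', 'lager', 'stout']

beers.insert(1, 'porter')   # add an item at index position 1
beers[0] = 'pale ale'       # alter the item at index position 0
beers.remove('stout')       # remove the first occurrence of 'stout'

print(beers)                # ['pale ale', 'porter', 'lager']
```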


In the example below, the parameter:

    end=' '

for the print() function replaces the default new-line character at the end of each call with a space, so the items print on a single line.

In [3]:
# print the list

print("A list of beers include:")
for i in a_list:
    print(i, end=' ')

A list of beers include:
ale lager stout hefeweizen stout 

### Indexing

Python provides indexing methods for a number of objects, including lists.  SAS has a similar construct, \_n\_, for the Data Step.  Both indexing methods act as keys providing access to an individual item (or set of items).  In Python's case, the default start index position is 0, and for SAS it is 1.

The SAS code example in the cell below is an imperfect analogy for a Python list, since the SAS logic uses a variable to hold the beer_type values.  Nonetheless, each program illustrates access to an 'item' value by an indexing method.

We will encounter additional Python indexing methods in subsequent chapters.

In [4]:
# Retrieve list item by index position.

print('Value for beer type is:', a_list[0])

Value for beer type is: ale


````
    4         data beers;
    5         length beer_type $ 10;
    6         input beer_type $ @@;
    7         
    8         if _n_ = 1 then
    9            put 'Value for beer type is: ' beer_type;
    10        
    11        list;
    12        datalines;

    Value for beer type is: ale
    RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
    13        ale lager stout hefeweizen stout 
````    
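
Indexing can return a set of items as well as a single item.  As a minimal sketch, the example below uses negative index positions and a slice.  The list name `beers` is an assumption for this sketch and repeats the values used for `a_list`.

```python
# Illustrative list repeating the values used earlier in the chapter.
beers = ['ale', 'lager', 'stout', 'hefeweizen', 'stout']

first = beers[0]        # index positions start at 0
last = beers[-1]        # negative positions count from the end
middle = beers[1:3]     # slice: positions 1 and 2 (the stop position is excluded)

print(first, last, middle)   # ale stout ['lager', 'stout']
```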

The next two examples illustrate the append() and sort() methods for a list.  Both change the list in place and return None, so they produce no visible output.  We use the print() function to view the results.

In [5]:
# List append method()

a_list.append('malt')

In [6]:
# list sort method()

a_list.sort(reverse=True)

In [7]:
# print the amended list

print("A list of beers include:")
for i in a_list:
    print(i, end=' ')

A list of beers include:
stout stout malt lager hefeweizen ale 
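
The sort() method reorders the existing list rather than returning a new one.  As a sketch of the distinction, the built-in sorted() function returns a new list and leaves the original unchanged.  The names `beers`, `names`, and `ordered` are illustrative assumptions.

```python
beers = ['ale', 'lager', 'stout', 'hefeweizen']

result = beers.sort(reverse=True)   # sorts in place and returns None
print(result)                       # None
print(beers)                        # ['stout', 'lager', 'hefeweizen', 'ale']

names = ['malt', 'ale', 'bock']
ordered = sorted(names)             # returns a new, sorted list
print(names)                        # ['malt', 'ale', 'bock']
print(ordered)                      # ['ale', 'bock', 'malt']
```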

In [8]:
# list count method

print('a_list count for stout is:', a_list.count('stout'))

a_list count for stout is: 2


The example below illustrates Python's flexibility.  Since everything is an object, you can have a list containing other lists.  The built-in len() function returns the number of items in the list; a nested list counts as a single item.

In [9]:
# lists can contain different objects, including other lists

b_list = ['ales', 23, a_list]
print(b_list)
print('Item count for b_list is:', len(b_list))

['ales', 23, ['stout', 'stout', 'malt', 'lager', 'hefeweizen', 'ale']]
Item count for b_list is: 3
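
A list that contains another list stores a reference to it, not a copy.  The sketch below, using the assumed names `inner`, `outer`, and `snapshot`, shows that a later change to the inner list is visible through the outer list, and that copy() avoids this when an independent list is wanted.

```python
inner = ['stout', 'ale']
outer = ['ales', 23, inner]     # outer stores a reference to inner, not a copy

inner.append('malt')            # change the inner list after nesting it
print(outer)                    # ['ales', 23, ['stout', 'ale', 'malt']]
print(outer[2] is inner)        # True

snapshot = ['ales', 23, inner.copy()]   # copy() stores an independent list
inner.append('bock')
print(snapshot[2])              # ['stout', 'ale', 'malt']
```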


### Tuple

A tuple is similar to a list, but unlike a list, it is immutable.   Tuples are defined by comma-separated items inside an optional set of parentheses.  Legibility of code calls for using the parentheses, however!

A common use case for tuples is where Python statements or user-defined functions can assume that the items will not change, for example the names of the months.

In [10]:
# defining a tuple

dishes = ('eggs', 'green ham', 'biscuits', 'grits', 'steak')
print('The breakfast menu has:', len(dishes), 'items')

The breakfast menu has: 5 items
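
As a minimal sketch of immutability, the example below attempts to alter a tuple item, which raises a TypeError, and then defines tuples with and without parentheses.  The names `sides` and `single` are assumptions for illustration.

```python
dishes = ('eggs', 'grits', 'steak')

try:
    dishes[0] = 'toast'             # item assignment is not allowed for tuples
except TypeError as err:
    print('TypeError:', err)

# Parentheses are optional; the commas create the tuple.
sides = 'biscuits', 'gravy'
single = ('toast',)                 # a one-item tuple needs the trailing comma
print(type(sides), type(single), len(single))
```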


In [11]:
# defining another tuple

more_dishes = ('pancakes', 'cupcakes', 'twinkies', dishes)
for i in more_dishes:
    print(i, end=' ')

pancakes cupcakes twinkies ('eggs', 'green ham', 'biscuits', 'grits', 'steak') 

In [12]:
# the tuple within a tuple remains a tuple that is indexed

len(more_dishes)

4

The index operators behave similarly for tuples and lists.  In the example below, you access the 4th item in the tuple by specifying more_dishes[3].  For the second item in the fourth item, you specify more_dishes[3][1].

In [13]:
# what is the 2nd item in the original tuple of dishes?  Python indexing starts at 0.

print(more_dishes[3][1])

green ham


In [14]:
# get the number of items in the tuple: more_dishes. 
# Subtract 1 since the 'tuple-within-a-tuple' item is not counted.  more_dishes[3] is the start 
# position of the 'tuple-within-a-tuple'. 

print('The number of items in more_dishes is:', len(more_dishes)-1+len(more_dishes[3]))

The number of items in more_dishes is: 8
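
The arithmetic above works when the position of the nested tuple is known.  As an alternative sketch, a small recursive function can count items at any depth.  The function name `count_items` is a hypothetical helper written for this illustration, not a Python built-in.

```python
def count_items(items):
    """Count items, descending into nested tuples (hypothetical helper)."""
    total = 0
    for item in items:
        if isinstance(item, tuple):
            total += count_items(item)   # count the nested tuple's own items
        else:
            total += 1
    return total

dishes = ('eggs', 'green ham', 'biscuits', 'grits', 'steak')
more_dishes = ('pancakes', 'cupcakes', 'twinkies', dishes)
print(count_items(more_dishes))          # 8
```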


### Dictionary

A dictionary provides a look-up method through key/value pairs.  Keys must be unique.  Keys must be immutable objects such as strings, numbers, or tuples (a list cannot be a key); values, however, can be either mutable or immutable objects.

Key/value pairs are specified as:

x = {key1: value1, key2: value2, key_n: value_n}
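
The sketch below illustrates the rule for keys and values.  A tuple works as a key, a list as a key raises a TypeError, and a list is acceptable as a value.  The names `grid`, `bad`, and `cities` and their contents are assumptions for illustration.

```python
# Tuples of immutable items are hashable, so they can serve as keys.
grid = {(0, 0): 'origin', (1, 2): 'point'}
print(grid[(1, 2)])                      # point

try:
    bad = {['Salem', 'Olympia']: 1}      # a list is mutable and cannot be a key
except TypeError as err:
    print('TypeError:', err)

# Values may be mutable objects such as lists.
cities = {'Oregon': ['Salem', 'Portland']}
cities['Oregon'].append('Eugene')
print(cities)   # {'Oregon': ['Salem', 'Portland', 'Eugene']}
```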


            

In [15]:
# create a dictionary

capital = {'Oregon' : 'Salem',
           'Washington' : 'Olympia',
           'California' : 'Sacramento',
           'Nevada' : 'Carson City'
          }
print(type(capital))            

<class 'dict'>


In [16]:
# Print a value by key

print('The capital of Nevada is', capital['Nevada'])

The capital of Nevada is Carson City
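
Looking up a key that is not in the dictionary with square brackets raises a KeyError.  As a minimal sketch, the get() method returns None, or a default you supply, when the key is absent.  The dictionary here is a shortened, illustrative copy, and 'Idaho' is used only as an example of a missing key.

```python
capital = {'Oregon': 'Salem', 'Nevada': 'Carson City'}

print(capital.get('Nevada'))              # Carson City
print(capital.get('Idaho'))               # None: the key is absent
print(capital.get('Idaho', 'unknown'))    # unknown

try:
    capital['Idaho']                      # square brackets raise KeyError
except KeyError as err:
    print('Missing key:', err)
```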


In [17]:
# adding a key/value pair

capital['Colorado'] = 'Denver'

In [18]:
# printing the number of key/value pairs.  String formatting is described further in Chapter 3.

print('Number of key/value pairs in the dictionary capital is: {}'.format(len(capital)))

Number of key/value pairs in the dictionary capital is: 5


In [19]:
# Delete a key/value pair by key 
del capital['California']

## Sequences

There are three basic sequence types: list, tuple, and range. Range is covered in Chapter 4. A nice feature of sequences is membership testing.  Using the operators in and not in, we test for the presence or absence of values.  Lists and tuples also support concatenation with the + operator.

Implicit in these examples are the Boolean values (True and False) being returned.  Boolean operators are discussed in more detail in Chapter 3.

The precedence order for sequence operations is found <a href="https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range">here.</a>

In [20]:
# item membership in a list

item = 'ale'
if (item in a_list):
    print('found')
else:
    print('not found')

found


In [21]:
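# item membership in a tuple using the and operator.
# Each item needs its own in test; 'eggs' is stored in the nested tuple more_dishes[3].

item1 = 'eggs'
item2 = 'cupcakes'
if item1 in more_dishes[3] and item2 in more_dishes:
    print('found')
else:
    print('not found')

found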
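
When combining membership tests with and, each item needs its own in test.  An expression such as `item1 and item2 in seq` is evaluated as `item1 and (item2 in seq)`, so a non-empty string on the left is simply treated as true.  The sketch below uses an illustrative tuple named `menu` to show the difference, along with the built-in all() function for testing several items.

```python
menu = ('pancakes', 'cupcakes', 'twinkies')

# A non-empty string is truthy, so this reduces to: 'cupcakes' in menu
print('eggs' and 'cupcakes' in menu)                 # True, although 'eggs' is absent

# Give each item its own membership test.
print('eggs' in menu and 'cupcakes' in menu)         # False

# all() applies the same test to any number of items.
wanted = ['pancakes', 'cupcakes']
print(all(item in menu for item in wanted))          # True
```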


In [22]:
# for loop to iterate over the key/value pairs.
# The loop variable is named city so that it does not rebind the dictionary name capital.

for state, city in capital.items():
    print('The capital of {} is {}'.format(state, city))

The capital of Colorado is Denver
The capital of Oregon is Salem
The capital of Washington is Olympia
The capital of Nevada is Carson City


In [24]:
# membership test using in for dictionary

if 'Nevada' in capital:
    print("Nevada's capital:", capital['Nevada'])

Nevada's capital: Carson City

In [25]:
# using not in for membership test

'California' not in capital

True
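
For a dictionary, the in and not in operators test the keys, not the values.  As a minimal sketch using a shortened, illustrative copy of the dictionary, the values() method can be used when the test should apply to the values instead.

```python
capital = {'Oregon': 'Salem', 'Nevada': 'Carson City'}

print('Oregon' in capital)            # True: 'Oregon' is a key
print('Salem' in capital)             # False: 'Salem' is a value, not a key
print('Salem' in capital.values())    # True
```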

## Set

A set object is an unordered collection of distinct objects. They are used for membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.  

Another common use case is when the existence of an object matters more than its order or the number of times it occurs.
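
As a sketch of removing duplicates from a sequence, the example below converts a list to a set.  The names `beers` and `unique_beers` are assumptions for illustration; the values repeat those used for the list examples earlier in the chapter.

```python
beers = ['ale', 'lager', 'stout', 'hefeweizen', 'stout']

unique_beers = set(beers)              # the duplicate 'stout' is dropped
print(len(beers), len(unique_beers))   # 5 4

# A set has no order, so sort it when a repeatable display order is needed.
print(sorted(unique_beers))            # ['ale', 'hefeweizen', 'lager', 'stout']
```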

In [26]:
# membership test using in for set 
months1 = set(['January', 'February', 'March', 'April', 'May', 'June'])

'Jan' in months1

False

In [27]:
# copy() method for sets

months2 = months1.copy()

In [28]:
# remove() method for sets

months1.remove('February')

In [29]:
# add() method for set

months2.add('July')

The three examples above do not produce any visible output.  The sets months1 and months2 are displayed below.

In [30]:
# print() function for sets months1 and months2

print('The set months1 contains:', months1)
print('The set months2 contains:', months2)

The set months1 contains: {'June', 'March', 'January', 'May', 'April'}
The set months2 contains: {'July', 'June', 'March', 'February', 'January', 'May', 'April'}


In [31]:
# find intersection of two sets

months1 & months2

{'April', 'January', 'June', 'March', 'May'}

...and test if the set months2 is a super-set of the set months1. 

In [32]:
# test if one set is a superset of another

months2.issuperset(months1)

True
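
The remaining set operations mentioned at the start of this section follow the same pattern.  As a minimal sketch, the example below rebuilds the two sets as literals, so it does not depend on the earlier cells, and wraps the results in sorted() only to give a repeatable display order.

```python
months1 = {'January', 'March', 'April', 'May', 'June'}
months2 = {'January', 'February', 'March', 'April', 'May', 'June', 'July'}

print(sorted(months1 | months2))   # union: items in either set
print(sorted(months2 - months1))   # difference: ['February', 'July']
print(sorted(months1 ^ months2))   # symmetric difference: ['February', 'July']
print(months1 <= months2)          # True: months1 is a subset of months2
```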