PYTHON LISTS, TUPLES and METHODS
================================

**Author:** Marcus Birkenkrahe



## Overview



![img](../img/lists.jpg "Llyfrgell Genedlaethol Cymru / Llanfachraeth in darkness (1957)")

-   The `list` data type
-   Working with lists
-   Augmented assignment operators
-   Methods as type-specific functions
-   Standard Python vs. NumPy vs. pandas
-   Magic 8 Ball reloaded



## The `list` data type



-   A `list` contains multiple values in an ordered sequence.

-   A `list` is itself a *value* that can be stored in a variable, and
    the values it contains are called *items*.

-   The list items can be of any data type including lists:



In [1]:
print([1,2,3])   # numeric list (numeric items)
print(['cat','bat','rat','elephant'])    # string list (string items)
print(['hello', True, None, 42, 3.1415]) # mixed type list
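
The cell above does not include a list inside a list, so here is a minimal sketch of that case. The name `zoo` and its contents are made up for illustration.

```python
# A list whose items are themselves lists (hypothetical example data)
zoo = [['cat', 'bat'], [10, 20, 30], []]

print(len(zoo))        # number of top-level items: 3
print(type(zoo[0]))    # each item is itself a list
print(zoo[1][2])       # third item of the second inner list: 30
```

The inner lists keep their own length and index values, independent of the outer list.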

-   Lists can be stored like any other value:



In [1]:
spam = ['cat', 'bat', 'rat']
print(len(spam))    # number of items in spam
print(type(spam))   # class of spam
print([] == list('')) # empty list

-   `spam` is four things:
    
    1.  a `list` variable (storage)
    2.  a `list` value (stored)
    3.  an ordered sequence of string values (indexed)
    4.  an object (instanced)
    
    ![img](../img/7_list.png "A list with its index values")



## Practice list creation, extraction and deletion



You should be able to do all of these exercises with what you learnt
in the DataCamp course "Introduction to Python" ([notebook in GitHub](https://gist.github.com/birkenkrahe/0e1b69ba3ce842324335062842f28729)):

1.  Assign these items to `spam` and extract them using a ranged `for` loop
    on one line separated by a single space: `cat bat rat elephant`



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
for i in range(4):
    print(spam[i], end=' ')

: cat bat rat elephant

1.  What if the list has `N` elements? Can you generalize the loop?



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
for i in range(len(spam)):
    print(spam[i], end=' ')

: cat bat rat elephant

1.  Use elements of `spam` to print the sentence `'The bat ate the cat.'`
    formatted with an f-string:



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
print(f"'The {spam[1]} ate the {spam[0]}.'")

1.  Which error do you get when you use an index that exceeds the number
    of values in your list value? Create an example.



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
print(spam[5])

1.  Can index values be non-integer? Find out!



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
print(spam[int(1.0)])
print(spam[1.0])

: bat

1.  How can you extract the last number in this list of lists?



In [1]:
spam = [['cat','bat'], [10,20,30,40,50]]

In [1]:
spam = [['cat','bat'], [10,20,30,40,50]]
print(spam[1][4],
      spam[1][-1],
      spam[-1][4],
      spam[-1][-1],
      end='')

: 50 50 50 50

1.  Write `'The elephant is afraid of the bat.'` using *negative* indices
    of `spam = ['cat', 'bat', 'rat', 'elephant']`:



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
print(f"'The {spam[-1]} is afraid of the {spam[-3]}.'")

: 'The elephant is afraid of the bat.'

1.  From `spam = ['cat', 'bat', 'rat', 'elephant']`, extract
    `['cat','bat','rat']`:



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
print(spam[0:3],    # slicing first three elements
      spam[-4:-1],  # slicing first three elements 'from the rear'
      sep='\n')
del spam[-1]        # deleting the last element
print(spam)

: ['cat', 'bat', 'rat']
   : ['cat', 'bat', 'rat']
   : ['cat', 'bat', 'rat']

1.  Change `spam = ['cat', 'bat', 'rat', 'elephant']` to the list
    `['cat','armadillo','rat', 'armadillo']`:



In [1]:
spam = ['cat', 'bat', 'rat', 'elephant']
spam[-1] = 'armadillo'
print(spam)
spam[1] = 'armadillo'
print(spam)

: ['cat', 'bat', 'rat', 'armadillo']
   : ['cat', 'armadillo', 'rat', 'armadillo']

1.  Create `spam = ['cat', 'bat', 'cat', 'bat']` by list concatenation
    and replication:



In [1]:
spam = ['cat','bat'] * 2
print(spam)
del spam
spam = ['cat','bat'] + ['cat','bat']
print(spam)

: ['cat', 'bat', 'cat', 'bat']
    : ['cat', 'bat', 'cat', 'bat']

## Working with lists - `allMyCats`



-   Here is a `list`-less version of a program to get the names of six
    cats from the user and print them ([pythontutor](https://autbor.com/allmycats1/)):



In [1]:
catName1 = input('Enter the name of cat 1: ')
catName2 = input('Enter the name of cat 2: ')
catName3 = input('Enter the name of cat 3: ')
catName4 = input('Enter the name of cat 4: ')
catName5 = input('Enter the name of cat 5: ')
catName6 = input('Enter the name of cat 6: ')
print(f'The cat names are: {catName1}, {catName2},\
 {catName3}, {catName4}, {catName5}, {catName6}')

-   Instead, use a single variable that contains a `list` value
    ([pythontutor](https://autbor.com/allmycats2/)):



In [1]:
catNames = []
while True:
    print('Enter name of cat (or nothing to stop):')
    name = input()
    if name == '':
        break
    catNames = catNames + [name]
if not catNames:
    print('You should get a cat')
else:
    print('The cat names are:')
    for name in catNames:
        print(f'{name}')

1.  Initialize the empty list `catNames`
2.  Infinite loop: ask for a cat's `name` until the entry is empty
3.  Check whether any `catNames` were entered
4.  If `catNames` were entered, print them by looping over the `list`


## Looping over lists



-   Notice how the `for` loop ranges over the list elements without `range`:



In [1]:
for i in ['a','b', None, 10,100]:
    print(i,end=' ')

-   Can you print this list using a `for` loop with `range`?



In [1]:
List = ['a','b', None, 10,100]
for i in range(len(List)):
    print(List[i],end=' ')

-   Rather than using `range` to get the integer index of the list items,
    call `enumerate`:



In [1]:
List = ['a','b', None, 10,100]
for index, item in enumerate(List):
    print(f'Index {index} in the list is: {item}')

-   There is no simple way to get the name of `List` once it's been
    created because the variable name is just a *reference* to the data.

-   All global names are stored in the *dictionary* returned by
    `globals()`; its `items()` method lists the name-value pairs.



In [1]:
print(globals().items())
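
As a rough sketch of what "the name is just a reference" means in practice, the helper below searches a namespace dictionary for every name bound to one particular object. The function `names_of` and the variables `alias` and `lookalike` are hypothetical and not part of the original notes.

```python
def names_of(obj, namespace):
    """Return all names in `namespace` that refer to the very same object."""
    return [name for name, value in namespace.items() if value is obj]

List = ['a', 'b', None, 10, 100]
alias = List                            # second name for the same list object
lookalike = ['a', 'b', None, 10, 100]   # equal content, different object

# A small explicit namespace keeps the result predictable
namespace = {'List': List, 'alias': alias, 'lookalike': lookalike}
print(names_of(List, namespace))        # ['List', 'alias']
```

In a notebook you could pass `dict(globals())` as the namespace instead; one object can then turn up under several names, which is why there is no single "name of the list".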

## Scope and lists



-   Challenge:
    1.  copy the code cell into a new code cell in Colab
    2.  wrap the input routine into a function `getCatNames()`
    3.  make `catNames` global
    4.  call `getCatNames` before the final printout.



In [1]:
def getCatNames():
    global catNames  # make `catNames` global
    catNames = [ ]
    while True:
        print('Enter name of cat (or nothing to stop):')
        name = input()
        if name == '':
            return
        catNames = catNames + [name]
    return catNames

getCatNames()   # function call

if not catNames:
    print('You should get a cat')
else:
    print('The cat names are:')
    for name in catNames:
        print(f'{name}')

-   How could you keep `catNames` in local scope (inside the function) and
    still access its values outside?



In [1]:
def getCatNames():
    catNames = [ ]
    while True:
        print('Enter name of cat (or nothing to stop):')
        name = input()
        if name == '':
            return catNames
        catNames = catNames + [name]

myCatNames = getCatNames()
print(myCatNames)

1.  This function returns from the loop (and from the function call)
    when an empty string is entered (no input).
2.  Otherwise it keeps adding cat names to the `catNames` list.
3.  Upon returning from the function call, the local name `catNames`
    goes out of scope, but the list itself survives: when the result of
    the function call is assigned to `myCatNames`, that variable holds
    the `return` value from `getCatNames`.
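
The same pattern can be sketched without keyboard input, which makes it easier to test. The function `collect_names` and the sample names are assumptions for illustration: the function reads from any iterable instead of calling `input`, and it uses `append` rather than list concatenation.

```python
def collect_names(source):
    """Collect names from `source` until an empty string appears."""
    names = []                 # local to the function
    for name in source:
        if name == '':
            break
        names.append(name)     # in-place alternative to names = names + [name]
    return names               # hand the list object back to the caller

my_cat_names = collect_names(['Zophie', 'Pooka', '', 'Simon'])
print(my_cat_names)            # ['Zophie', 'Pooka']
```

The local name `names` is gone after the call, but the list it referred to lives on under the name `my_cat_names`.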



## `in` or `out`?



-   The `in` and `not in` operators work on lists:



In [1]:
spam = ['cat', 'bat', 'rat']
print('cat' in spam)
print('chicken' not in spam)

: True
  : True

## Practice the `in` keyword for lists



-   Write a script that lets the user type in a pet `name` and checks if
    the `name` is `in` a list `myPets` (which you need to create first). If it
    is `in` the list, say "I have a pet with that name", otherwise say
    that you don't.

-   Solution:



In [1]:
myPets = ['Nanny', 'Rosie', 'Poppy', 'Jack']
name = input('Enter a pet name: ')
if name not in myPets:
    print(f"I don't have a pet named {name}.")
else:
    print(f"{name} is my pet.")

-   Here I put the `input` call in a function `getPetName`. When it is
    called, it returns `name`, but `name` is local to the function, so you
    need to assign the return value to the global variable `petName` to
    use it:



In [1]:
def getPetName():
    name = input('Enter a pet name: ')
    return name

myPets = ['Nanny', 'Rosie', 'Poppy', 'Jack']

petName = getPetName()

if petName not in myPets:
    print(f"I don't have a pet named {petName}.")
else:
    print(f"{petName} is my pet.")

## Multiple assignments (`tuple` unpacking)



-   You can assign values to multiple variables in one line.

-   The one assignment per line way:



In [1]:
cat = ['fast', 'moody', 'black']
speed = cat[0]
disposition = cat[1]
color = cat[2]
print(f'The {color} cat is {speed} and {disposition}')

: The black cat is fast and moody

-   Multiple assignment: the number of variables and the length of the
    list must be exactly equal, otherwise you get a `ValueError`.



In [1]:
cat = ['fast', 'moody', 'black']
speed, disposition, color = cat # the list is unpacked into three variables
print(f'The {color} cat is {speed} and {disposition}')

: The black cat is fast and moody

-   Handle the `ValueError` that is caused by adding a variable `name` to
    the assignment:



In [1]:
cat = ['fast', 'moody', 'black']
speed, disposition, color, name = cat # ValueError: four variables, three items
print(f'The {color} cat is {speed} and {disposition}')

-   Solution:
    1.  put the assignment into a `try` clause and add an `except ValueError:`
        clause
    2.  to test, run original version (exception), then add `'Jack'` to `cat`
        in the first line



In [1]:
cat = ['fast', 'moody', 'black']
try:
    speed, disposition, color, name = cat
except ValueError:
    print('ValueError - check multiple assignment')
else:
    print(f'The {color} cat named {name} is {speed} and {disposition}')
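
Two related uses of multiple assignment, sketched under the assumption that the list has the three items used above: swapping two variables in one line, and checking the length before unpacking as an alternative to `try`/`except`.

```python
cat = ['fast', 'moody', 'black']

# The right-hand side is packed into a tuple, then unpacked into the names
speed, disposition = cat[0], cat[1]
speed, disposition = disposition, speed   # swap without a temporary variable
print(speed, disposition)                 # moody fast

# Guard against a length mismatch before unpacking
if len(cat) == 3:
    speed, disposition, color = cat
    print(f'The {color} cat is {speed} and {disposition}')
```

The length check avoids the exception altogether, while the `try` version reports it; which one reads better depends on whether a mismatch is expected.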

## Lists as `random` arguments



-   The `random.choice` function will return a randomly selected item from
    the list:



In [1]:
import random
pets = ['dog', 'cat', 'squirrel','moose','mouse','pony','snake']
print(random.choice(pets))

-   This is a shorter form of `pets[random.randint(0, len(pets)-1)]`:



In [1]:
import random
pets = ['dog', 'cat', 'squirrel','moose','mouse','pony','snake']
print(pets[random.randint(0,len(pets)-1)])

-   The `random.shuffle` function will reorder the items in a list: it
    modifies the list *in place* rather than returning a new list.



In [1]:
import random
people = ['Alice', 'Bob', 'Carol', 'David']
random.shuffle(people)
print(people)
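
To make "in place" concrete, the following sketch checks the return value of `random.shuffle` and contrasts it with `random.sample`, which returns a new list. The names `original`, `result` and `shuffled_copy` are introduced here for illustration only.

```python
import random

people = ['Alice', 'Bob', 'Carol', 'David']
original = people[:]                 # slice copy taken before shuffling

result = random.shuffle(people)      # reorders `people` itself
print(result)                        # None: nothing is returned
print(sorted(people) == sorted(original))  # True: same items, maybe new order

# random.sample returns a new shuffled list and leaves its input untouched
shuffled_copy = random.sample(original, k=len(original))
print(original)                      # ['Alice', 'Bob', 'Carol', 'David']
```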

## Augmented assignment operators



![img](../img/7_augmented.png "Augmented assignment operators")

-   These operators work for numbers; `+=` and `*=` also work for strings
    and lists:



In [1]:
spam = 'Hello, '
spam += 'world!'   # equivalent to spam = spam + 'world!'
print(spam)

bacon = ['Huzza']
bacon *= 3         # equivalent to bacon = bacon * 3
print(bacon)
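
For lists, the "equivalent to" comments above hold for the resulting value but not for object identity: `+=` changes the existing list, while `+` builds a new one. A small sketch, with `alias` as a made-up second name for the list:

```python
bacon = ['Huzza']
alias = bacon            # second name for the same list object

bacon += ['Hooray']      # in place: the existing list grows
print(alias)             # ['Huzza', 'Hooray'] - alias sees the change
print(alias is bacon)    # True

bacon = bacon + ['Yay']  # builds a new list and rebinds `bacon`
print(alias)             # ['Huzza', 'Hooray'] - alias still refers to the old list
print(alias is bacon)    # False
```

Strings are immutable, so for them both forms always produce a new string.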

## Methods for specific data types



-   Each data type has its own set of methods, e.g. the `list` data type
    has methods for finding, adding, removing and manipulating values.

-   Examples:
    1.  to call the `list` method `index` on the item `'hello'` of a list `spam`:



In [1]:
spam = ['hello','world']
print(spam.index('hello'))  # returns an index

1.  to call the `str` method `count` on the substring `'_'` of the string
    `'hello_world'` stored in `ham`:



In [1]:
ham = 'hello_world'
print(ham.count('_'))  # returns a count

-   This approach transfers to other packages such as `numpy` or `pandas` -
    the focus of the methods is on the library purpose like numeric data
    processing or statistical analysis.

-   Where applicable, I will contrast standard Python with NumPy and/or
    pandas (Kudos OpenAI: ChatGPT has been invaluable for this task.)



## Finding a value in a `list` with `index`



-   If the value is not in the list, a `ValueError` is raised:



In [1]:
spam = ['hello', 'hi', 'howdy', 'hey']
print(spam.index('howdy'))
print(spam.index('howdy howdy howdy'))

-   When there are duplicates, the first instance is returned:



In [1]:
spam = ['hello', 'hi', 'howdy', 'hey', 'hi']
print(spam.index('hi'))
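
Since `index` stops at the first match and raises an error for a missing value, a small helper can cover both cases. This is only a sketch; `find_all` is a hypothetical function, not a `list` method.

```python
def find_all(items, target):
    """Return the indices of every occurrence of `target` in `items`."""
    return [i for i, item in enumerate(items) if item == target]

spam = ['hello', 'hi', 'howdy', 'hey', 'hi']
print(find_all(spam, 'hi'))        # [1, 4]
print(find_all(spam, 'chicken'))   # [] instead of a ValueError

# Checking membership first avoids the ValueError from index()
position = spam.index('howdy') if 'howdy' in spam else -1
print(position)                    # 2
```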

## Finding a value in a numpy `array` with `where`



-   In NumPy, you can use the `where` function: it returns a tuple of
    index arrays (one per array dimension) with all matching positions,
    so you have to unpack the result to get a single index:



In [1]:
import numpy as np
spam = ['hello', 'hi', 'howdy', 'hey', 'hi']

# turn list into numpy array
spam_np = np.array(spam)

# store value of index for item
idx = np.where(spam_np == 'howdy')

print(idx)    # index information (full)
print(idx[0][0])  # index only
print(spam_np[idx])   # array value

## Finding a value in a pandas `Series` with `.index`



-   In pandas, you can use Boolean indexing:



In [1]:
import pandas as pd

# Create a pandas Series
spam_pd = pd.Series(['hello', 'hi', 'howdy', 'hey', 'hi'])

# Find the index where the value is equal to 'howdy'
index = spam_pd[spam_pd == 'howdy'].index[0]

print(index)

-   If the value is not found in the Series, the filtered Series is
    empty and `.index[0]` raises an `IndexError`.



## Adding values for lists with `append` and `insert`



-   You can add new values to a list with `append` (at the end) and
    `insert`.

-   Append `'moose'` at the end of `spam`:



In [1]:
spam = ['cat', 'dog', 'bat']
print(spam)
spam.append('moose')
print(spam)

-   Insert `'chicken'` as item number `1` into `spam`:



In [1]:
spam = ['cat', 'dog', 'bat']
print(spam)
spam.insert(1,'chicken')
print(spam)

: ['cat', 'dog', 'bat']
  : ['cat', 'chicken', 'dog', 'bat']

-   These methods modify a list *in place*: neither of them gives the
    new value as a return value - they return `None` instead:



In [1]:
spam = ['cat', 'dog', 'bat']
print(spam.append('moose'))
print(spam)
print(spam.insert(1,'chicken'))
print(spam)

-   If that's so, what does `spam = spam.append('elephant')` do?



In [1]:
spam = ['cat', 'dog', 'bat']
print(spam)
spam = spam.append('elephant')
print(spam)

## Adding and inserting for NumPy `array`



-   By contrast, NumPy's `np.append` and `np.insert` functions create a
    new array, and you need to assign the result back to the variable
    to keep it:



In [1]:
import numpy as np

spam_np = np.array(['cat', 'dog', 'bat', 'elephant'])

print(spam_np)

spam_np = np.append(spam_np, 'moose')

print(spam_np)

spam_np = np.insert(spam_np, 1, 'chicken')

print(spam_np)

-   The behavior of NumPy for strings is tricky though: e.g. an
    inserted string will be truncated if it is longer than the longest
    string already in the array.

-   To test that, run the code above and remove `'elephant'`: the
    resulting inserted array will list `'chick'` and not `'chicken'`.

-   Numbers work better. An example with `np.append`:



In [1]:
import numpy as np

# Create a numpy array
arr = np.array([1, 2, 3, 4, 5])

# Append a single value
arr = np.append(arr, 6)
print(arr)  # Output: [1 2 3 4 5 6]

# Append multiple values
arr = np.append(arr, [7, 8, 9])
print(arr)  # Output: [1 2 3 4 5 6 7 8 9]

-   An example with `np.insert`:



In [1]:
import numpy as np

# Create a numpy array
arr = np.array([1, 2, 3, 4, 5])

# Insert a single value at index 2
arr = np.insert(arr, 2, 6)
print(arr)  # Output: [1 2 6 3 4 5]

# Insert multiple values at index 3
arr = np.insert(arr, 3, [7, 8, 9])
print(arr)  # Output: [1 2 6 7 8 9 3 4 5]

: [1 2 6 3 4 5]
  : [1 2 6 7 8 9 3 4 5]
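
Returning to the string truncation mentioned earlier in this section, the sketch below traces it through the array's `dtype`. The variable names and the `astype('U7')` workaround are my own additions for illustration, and the behavior described is what I would expect from current NumPy versions.

```python
import numpy as np

short = np.array(['cat', 'dog', 'bat'])
print(short.dtype)                    # fixed-width strings, 3 characters each

# np.append builds the result by concatenation, which widens the dtype
appended = np.append(short, 'moose')
print(appended.dtype)                 # now 5 characters wide

# np.insert converts the new value to the array's existing dtype first
inserted = np.insert(appended, 1, 'chicken')
print(inserted)                       # 'chicken' is cut to 'chick'

# Workaround: widen the dtype yourself before inserting
wide = np.insert(appended.astype('U7'), 1, 'chicken')
print(wide)
```

Numeric arrays do not show this effect because every number of a given dtype takes the same amount of storage.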

## Adding columns and rows in pandas `DataFrame`



-   The central structure for `pandas` is the DataFrame, a tabular
    structure of column vectors of the same length with each vector only
    having one type.

-   Let's import `pandas` as `pd` and create a DataFrame `df`:



In [1]:
import pandas as pd

# Create a DataFrame of four column vectors A,B,C,D
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': ['one', 'one', 'two'],
    'C': ['x', 'y', 'z'],
    'D': [1, 2, 3]
})

-   Add a new column to a DataFrame by assigning to a new column label,
    as you would assign to an index:



In [1]:
# start from the DataFrame `df` created above (columns A to D)
# Add a new column E
df['E'] = ['alpha', 'beta', 'gamma']

print(df)

-   Inserting a new column at a specific position with `df.insert`:



In [1]:
# start from the DataFrame `df` created above (columns A to D)
# Insert a new column at position 1
df.insert(1, 'F', ['apple', 'banana', 'cherry'])

print(df)

:      A       F    B  C  D
  : 0  foo   apple  one  x  1
  : 1  bar  banana  one  y  2
  : 2  baz  cherry  two  z  3

-   Adding a new row to a DataFrame with `pd.concat` (`DataFrame.append`
    was deprecated in pandas 1.4 in 2022 and removed in pandas 2.0):



In [1]:
# start from the DataFrame `df` created above (columns A to D)
# Create a new DataFrame for the new row
new_row = pd.DataFrame([{'A': 'qux', 'B': 'three', 'C': 'w',
                         'D': 4, 'E': 'delta', 'F': 'durian'}])

# Use pd.concat to append the new row
df = pd.concat([df, new_row])

# With pandas older than 2.0, df = df.append(new_row) would add the
# row once more; the method no longer exists in current versions

print(df)

:      A      B  C  D      E       F
  : 0  foo    one  x  1    NaN     NaN
  : 1  bar    one  y  2    NaN     NaN
  : 2  baz    two  z  3    NaN     NaN
  : 0  qux  three  w  4  delta  durian

## Trying to call a method on another data type



-   The `append` and `insert` methods are `list` methods and won't work for
    strings or integers:



In [1]:
eggs = 'hello'
eggs.append('world')

-   Calling `insert` on an integer:



In [1]:
bacon = 42
bacon.insert(1,'world')
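
Both cells above stop with an `AttributeError`. As a minimal sketch, the error can be caught, or avoided by checking with the built-in `hasattr` first:

```python
eggs = 'hello'

try:
    eggs.append('world')           # str has no append method
except AttributeError as err:
    print(f'AttributeError: {err}')

# hasattr checks for a method before you call it
print(hasattr(eggs, 'append'))     # False
print(hasattr([], 'append'))       # True

# Strings are immutable, so "appending" means building a new string
eggs = eggs + ' world'
print(eggs)                        # hello world
```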

## Removing values from lists with `remove` or `del`



-   The `remove` method removes the first item equal to its argument
    from the list in place:



In [1]:
spam = ['cat','bat','rat','elephant']
print(spam)
spam.remove('bat')
print(spam)

-   Trying to remove a value that does not exist raises a `ValueError`:



In [1]:
spam = ['cat','bat','rat','elephant']
spam.remove('chicken')

-   If there are multiple identical items, only the first will be
    removed:



In [1]:
spam = ['cat','bat','rat','elephant','cat','bat']
print(spam)
spam.remove('bat')
print(spam)   # only the first instance is removed

-   Wondering at this point how many values you can remove at a time?
    Check the help (don't forget that this is a `list` method):



In [1]:
help(list.remove)

: Help on method_descriptor:
  :
  : remove(self, value, /)
  :     Remove first occurrence of value.
  :
  :     Raises ValueError if the value is not present.
  :

-   If you know the index of the item you want to remove, you can use
    the `del` keyword to delete items:



In [1]:
spam = ['cat','bat','rat','elephant','cat','bat']
del spam[1]
print(spam)

-   To remove more than one item at a time, you can either use a `list`
    comprehension (written much like set-builder notation), or the
    `filter` function with a `lambda`:



In [1]:
spam = ['cat','bat','rat','elephant','cat','bat']

# Remove all 'bat' items
spam = [item for item in spam if item != 'bat']

print(spam)  # Output: ['cat', 'rat', 'elephant', 'cat']

-   In the next example, the `filter` function (which returns an
    *iterator*) takes an anonymous or `lambda` function as its first
    argument:



In [1]:
spam = ['cat','bat','rat','elephant','cat','bat']

# Remove all 'bat' items
spam = list(filter(lambda item: item != 'bat', spam))

print(spam)  # Output: ['cat', 'rat', 'elephant', 'cat']

## Removing values from a NumPy `array`



-   You cannot directly remove an item from an `array` like in a Python
    list with `remove` but you can create a new array that doesn't include
    the items to be removed.

-   Using Boolean indexing or masking:



In [1]:
import numpy as np

# Create a numpy array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Create a new array that doesn't include the value 5
arr = arr[arr != 5]

print(arr)  # Output: [1 2 3 4 6 7 8 9]

: [1 2 3 4 6 7 8 9]

-   Using the `np.delete` function:



In [1]:
import numpy as np

# Create a numpy array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Create a new array that doesn't include the item at index 4
arr = np.delete(arr, 4)

print(arr)  # Output: [1 2 3 4 6 7 8 9]

: [1 2 3 4 6 7 8 9]

## Removing values from a pandas `DataFrame`



-   The `df.drop` method is used to remove either columns or rows from a
    DataFrame: the keyword parameter `axis` is `1` for columns, `0` for rows.

-   Unlike with NumPy arrays, you can specify if you wish to modify the
    DataFrame in place using the `inplace` keyword parameter.

-   Remove a column:



In [1]:
import pandas as pd

# Create a simple dataframe
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print("Original DataFrame")
print(df)

# Drop column 'A'
df = df.drop('A', axis=1)

print("DataFrame After Dropping Column 'A'")
print(df)

-   Remove a row:



In [1]:
import pandas as pd

# Create a simple dataframe
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print("Original DataFrame")
print(df)

# Drop row at index 1
df = df.drop(1, axis=0)

print("DataFrame After Dropping Row at Index 1")
print(df)

: Original DataFrame
  :    A  B  C
  : 0  1  4  7
  : 1  2  5  8
  : 2  3  6  9
  : DataFrame After Dropping Row at Index 1
  :    A  B  C
  : 0  1  4  7
  : 2  3  6  9

## Sorting values in a list with `sort`



-   Lists of number values or strings can be sorted with `list.sort`:



In [1]:
spam = [2, 5, 3.14, 1, -7]
spam.sort()  # default sort is ascending
print(spam)

ham = ['ants', 'cats', 'dogs', 'badgers', 'elephants']
ham.sort()  # default sort is ascending in alphabetical order
print(ham)

-   To reverse the order from ascending to descending use the `reverse`
    keyword:



In [1]:
spam = [2, 5, 3.14, 1, -7]
spam.sort(reverse=True)  # reverse sorting
print(spam)

ham = ['ants', 'cats', 'dogs', 'badgers', 'elephants']
ham.sort(reverse=True)  # reverse sorting
print(ham)

: [5, 3.14, 2, 1, -7]
  : ['elephants', 'dogs', 'cats', 'badgers', 'ants']

-   As you can see in the `help(list.sort)` docstring, you can also sort
    using a function, e.g. the `len` function:



In [1]:
ham = ['ants', 'cats', 'dogs', 'badgers', 'elephants', 'snakes']
ham.sort(key=len,reverse=True)  # reverse sorting by length
print(ham)

-   In the last example, note that `'ants'` goes before `'cats'` before
    `'dogs'`: Python's sort is stable, so strings with the same `len`
    value keep the order they had in the original list (even with
    `reverse=True`), which here happens to be alphabetical.

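-   To change this, sort by more than one criterion: use an anonymous
    `lambda` function as the `key` that returns a tuple of the length
    and the string itself. With `reverse=True`, the list is sorted by
    descending length first, and strings of equal length end up in
    reverse alphabetical order: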



In [1]:
ham = ['ants', 'cats', 'dogs', 'badgers', 'elephants', 'snakes']

# Sort the list by the number of characters in each string, and then reverse the alphabetical order
ham.sort(key=lambda x: (len(x), x), reverse=True)

print(ham)
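
If you want the longest strings first but ties in ascending alphabetical order, one option is to negate the length in the key instead of using `reverse=True`. The sketch below also checks the stability of `sort`; the names `by_len_then_alpha` and `shuffled` are made up for illustration.

```python
ham = ['ants', 'cats', 'dogs', 'badgers', 'elephants', 'snakes']

# Longest first; strings of equal length in ascending alphabetical order.
# Negating the length reverses only that part of the key.
by_len_then_alpha = sorted(ham, key=lambda x: (-len(x), x))
print(by_len_then_alpha)
# ['elephants', 'badgers', 'snakes', 'ants', 'cats', 'dogs']

# sort() is stable: items with equal keys keep their original order
shuffled = ['dogs', 'ants', 'cats']
shuffled.sort(key=len)
print(shuffled)                    # ['dogs', 'ants', 'cats']

print(ham)                         # sorted() left the original list unchanged
```

The built-in `sorted` returns a new list, whereas `list.sort` works in place and returns `None`.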

## Reversing values in a list with `reverse`



-   To quickly reverse the order of list items, use `list.reverse`:



In [1]:
spam = ['cat', 'dog', 'moose']
spam.reverse()
print(spam)

: ['moose', 'dog', 'cat']

-   This is a simple method that does not offer any keyword
    parameters:



In [1]:
help(list.reverse)

## Sorting a NumPy `array`



-   The `np.sort` function offers different sorting algorithms (`kind`),
    lets you specify along which dimension to sort (`axis`), and, for
    structured arrays, which fields to compare first (`order`).



In [1]:
import numpy as np
help(np.sort)

-   A simple example - sorting is in ascending order by default, and a
    new sorted array is created.



In [1]:
import numpy as np

# Create a numpy array
arr = np.array([3, 2, 1, 5, 4])

print("Original array:")
print(arr)

# Sort the array
sorted_arr = np.sort(arr)

print("Sorted array:")
print(sorted_arr)

: Original array:
  : [3 2 1 5 4]
  : Sorted array:
  : [1 2 3 4 5]

-   To reverse the sorting order of a NumPy array, you can use the
    `[::-1]` slicing operation after sorting the array:



In [1]:
import numpy as np

# Create a numpy array
arr = np.array([3, 2, 1, 5, 4])

print("Original array:")
print(arr)

# Sort the array in descending order
sorted_arr_desc = np.sort(arr)[::-1]

print("Array sorted in descending order:")
print(sorted_arr_desc)

: Original array:
  : [3 2 1 5 4]
  : Array sorted in descending order:
  : [5 4 3 2 1]

-   This slicing trick also works with lists:



In [1]:
spam = [3, 2, 1, 5, 4]
spam.sort()
print(spam)
print(spam[::-1])

: [1, 2, 3, 4, 5]
  : [5, 4, 3, 2, 1]

## Sorting a pandas `DataFrame`



-   You can sort a DataFrame by values in one or more columns with the
    `df.sort_values` method:



In [1]:
import pandas as pd

# create a simple dataframe with columns, A,B,C
df = pd.DataFrame({
    'A': [2,3,1],
    'B': [1,3,2],
    'C': ['b','a','c']
})

print("Original DataFrame")
print(df)

# Sort by column 'A'
df_sorted = df.sort_values('A')

print("DataFrame sorted by column 'A'")
print(df_sorted)

-   A DataFrame is not a matrix: to sort by the rows you need to sort by
    the row labels (the index) using the `sort_index` method:



In [1]:
import pandas as pd

# create a simple dataframe with columns, A,B,C 
df = pd.DataFrame({
    'A': [2,3,1],
    'B': [1,3,2],
    'C': ['b','a','c']
}, index = ['Y', 'X', 'Z'])

print("Original DataFrame")
print(df)

# sort by index
df_sorted = df.sort_index()

print("DataFrame sorted by index")
print(df_sorted)

: Original DataFrame
  :    A  B  C
  : Y  2  1  b
  : X  3  3  a
  : Z  1  2  c
  : DataFrame sorted by index
  :    A  B  C
  : X  3  3  a
  : Y  2  1  b
  : Z  1  2  c

## Exceptions to Python indentation rules for `list`



-   Indentation is significant in Python: the indentation of a line of
    code tells Python which block it belongs to, and inconsistent
    indentation raises an `IndentationError`.

-   Lists, however, can span several lines in any indentation, and the
    same goes for pandas `DataFrame` and NumPy `array` structures: Python
    knows that the structure isn't finished before the ending bracket.

-   List example:



In [1]:
spam = ['apples',
        'oranges',
                        'bananas',
 'peaches'                            ]
print(spam)
print(type(spam))

-   NumPy example:



In [1]:
import numpy as np
arr = np.array(['apples',
        'oranges',
                        'bananas',
 'peaches'                            ])
print(arr)
print(type(arr))

-   Pandas example:



In [1]:
import pandas as pd
df = pd.DataFrame({ 'A': [1,2,
                          3],
    'B' :
                    [4, 5, 6],
                       'C':
[7,8,9]
                    })
print(df)
print(type(df))

:    A  B  C
  : 0  1  4  7
  : 1  2  5  8
  : 2  3  6  9
  : <class 'pandas.core.frame.DataFrame'>

## Practice `list` methods - Magic 8 Ball reloaded



1.  Earlier, you created a Magic 8 ball program as a fortune teller:



In [1]:
import random

def getAnswer(answerNumber):
    if answerNumber == 1:
        return 'It is certain'
    elif answerNumber == 2:
        return 'It is decidedly so'
    elif answerNumber == 3:
        return 'Yes, definitely'
    elif answerNumber == 4:
        return 'Reply hazy try again'
    elif answerNumber == 5:
        return 'Ask again later'
    elif answerNumber == 6:
        return 'Concentrate and ask again'
    elif answerNumber == 7:
        return 'My reply is no'
    elif answerNumber == 8:
        return 'Outlook not so good'
    elif answerNumber == 9:
        return 'Very doubtful'

r = random.randint(1,9)
fortune = getAnswer(r)
print(fortune)

1.  Using lists, write a much more elegant version of the previous
    Magic 8 Ball program:
    -   instead of several lines of nearly identical `elif` statements,
        create a single list `messages` to work with. The list holds the
        messages as its items.
    -   instead of calling a function `getAnswer`, `print` a message using
        `random.randint` to pick the index (i.e. the position) of the
        message - there are 9 messages. Remember that `random.randint(a,b)`
        picks an integer in `[a,b]`.
    -   You can generalize the program further by making the upper bound
        of `random.randint` independent of the number 9. Now you could add
        messages to the list ad infinitum.

2.  Solution:



In [1]:
import random

messages = ['It is certain',
            'It is decidedly so',
            'Yes, definitely',
            'Reply hazy try again',
            'Ask again later',
            'Concentrate and ask again',
            'My reply is no',
            'Outlook not so good',
            'Very doubtful']
print(messages[random.randint(0,len(messages)-1)])

: Very doubtful

1.  Test the performance of both programs in Colab using `%timeit`. Do
    you record any difference?
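
Outside of Colab or Jupyter, the `%timeit` magic is not available, but the standard library's `timeit` module does a similar job. The sketch below compares index-based selection with `random.choice`; the shortened message list and the function names `by_index` and `by_choice` are assumptions for illustration, and the timings will differ from machine to machine.

```python
import random
import timeit

messages = ['It is certain', 'Ask again later', 'Very doubtful']

def by_index():
    # pick a valid index, then look the message up
    return messages[random.randint(0, len(messages) - 1)]

def by_choice():
    # let random.choice pick the item directly
    return random.choice(messages)

# Each call runs the function 10,000 times and returns the total seconds
t_index = timeit.timeit(by_index, number=10_000)
t_choice = timeit.timeit(by_choice, number=10_000)
print(f'randint: {t_index:.4f} s, choice: {t_choice:.4f} s')
```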



## Summary



-   Lists are useful data types since they allow you to write code that
    works on a modifiable number of values in a single variable.
-   Lists are a sequence data type that is mutable, meaning that their
    contents can change.
-   Tuples and strings, though also sequence data types, are immutable
    and cannot be changed.
-   A variable that contains a tuple or string value can be overwritten
    with a new tuple or string value.
-   This is not the same thing as modifying the existing value in place,
    as the `append()` or `remove()` methods do on lists.
-   Variables do not store list values directly; they store references
    to lists. Any change you make to a list may impact other variables.
-   You can use `copy.copy()` or `copy.deepcopy()` from the `copy` module
    if you want to make changes to a list in one variable without
    modifying the original list.
-   NumPy array and pandas DataFrame structures are purpose-built to
    handle multi-dimensional numeric data (NumPy) or general data in
    tabular form (pandas).
-   The methods to manipulate arrays and DataFrames in many ways
    parallel the functions for lists (often they have the same name).
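
The points about references, copies and immutability can be sketched in a few lines. The variable names below (`alias`, `shallow`, `deep`, `pair`) are made up for illustration.

```python
import copy

spam = [['cat', 'bat'], [10, 20]]
alias = spam                     # same list object under a second name
shallow = copy.copy(spam)        # new outer list, inner lists are shared
deep = copy.deepcopy(spam)       # fully independent copy

spam.append('moose')             # change the outer list
spam[0][0] = 'rat'               # change an inner list

print(alias)     # [['rat', 'bat'], [10, 20], 'moose']
print(shallow)   # [['rat', 'bat'], [10, 20]]
print(deep)      # [['cat', 'bat'], [10, 20]]

# Tuples are immutable: item assignment raises a TypeError
pair = ('cat', 'bat')
try:
    pair[0] = 'rat'
except TypeError as err:
    print(f'TypeError: {err}')
```

A shallow copy is enough for a flat list; `deepcopy` matters once the list contains other lists.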



## Glossary



| TERM/COMMAND|DEFINITION|
|---|---|
| <code>random.choice</code>|Return randomly selected list item|
| <code>random.shuffle</code>|Randomly shuffle list items|
| <code>np.array</code>|NumPy array creation|
| <code>list.append</code>|Append values to list <i>in place</i>|
| <code>list.insert</code>|Insert value at list index value <i>in place</i>|
| <code>np.append</code>|Create new array with appended value|
| <code>np.insert</code>|Create new array with inserted value|
| <code>df.insert</code>|Insert new column in pandas DataFrame|
| <code>pd.concat</code>|Add new row to pandas DataFrame|
| <code>list.remove</code>|Remove first occurrence of a value from list <i>in place</i>|
| <code>del</code>|Keyword to remove the list item at a given index|
| Comprehension|Build a new list from a sequence, optionally filtered by a condition|
| <code>lambda</code>|Keyword for anonymous functions|
| <code>filter</code>|Iterator to filter sequence data|
| <code>np.delete</code>|Create new array without the deleted value|
| <code>df.drop</code>|Remove columns or rows from DataFrame|
| <code>list.sort</code>|Sort list values in place (<code>reverse=False</code>)|
| <code>list.reverse</code>|Reverse list items in place|
| <code>np.sort</code>|Sort NumPy arrays|
| <code>[::-1]</code>|Reverse sorting order slicing (lists or arrays)|
| <code>df.sort_values</code>|Sort DataFrame by values in one or more columns|
| <code>df.sort_index</code>|Sort DataFrame by row index|

