<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Python's-list" data-toc-modified-id="Python's-list-1">Python's list</a></span></li><li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes-2">Learning Outcomes</a></span></li><li><span><a href="#Highlights-of-list-data-structure" data-toc-modified-id="Highlights-of-list-data-structure-3">Highlights of <code>list</code> data structure</a></span></li><li><span><a href="#List-Methods" data-toc-modified-id="List-Methods-4">List Methods</a></span></li><li><span><a href="#Append-vs-extend" data-toc-modified-id="Append-vs-extend-5">Append vs extend</a></span></li><li><span><a href="#Prefer-methods-over-symbols" data-toc-modified-id="Prefer-methods-over-symbols-6">Prefer methods over symbols</a></span></li><li><span><a href="#List-slicing-with-zero-based-indexing" data-toc-modified-id="List-slicing-with-zero-based-indexing-7">List slicing with zero-based indexing</a></span></li><li><span><a href="#List-copies" data-toc-modified-id="List-copies-8">List copies</a></span></li><li><span><a href="#Naming-lists" data-toc-modified-id="Naming-lists-9">Naming lists</a></span></li><li><span><a href="#Python's-tuple" data-toc-modified-id="Python's-tuple-10">Python's tuple</a></span></li><li><span><a href="#Tuples-are-defined-with-commas" data-toc-modified-id="Tuples-are-defined-with-commas-11">Tuples are defined with commas</a></span></li><li><span><a href="#What-are-common-data-science-scenarios-for-tuples?" data-toc-modified-id="What-are-common-data-science-scenarios-for-tuples?-12">What are common data science scenarios for tuples?</a></span></li><li><span><a href="#What-the-heck-is-hashable?" data-toc-modified-id="What-the-heck-is-hashable?-13">What the heck is hashable?</a></span></li><li><span><a href="#Takeaways" data-toc-modified-id="Takeaways-14">Takeaways</a></span></li><li><span><a href="#Bonus-Material" data-toc-modified-id="Bonus-Material-15">Bonus Material</a></span></li></ul></div>

<center><h2>Python's list</h2></center>

<center><h2>Learning Outcomes</h2></center>

__By the end of this session, you should be able to__:

- Explain the features and dangers of using lists.
- Write code with common list methods.
- Identify when to use tuples in data science code.

<center><img src="../images/suitcase.png" width="35%"/></center>

Lists are like a suitcase: a handy way to carry things around.

In [1]:
reset -fs

In [2]:
my_list = [42, "brian", "🐶"]

In [3]:
type(my_list)

list

Highlights of `list` data structure
-----

- Collection of ordered items, aka a sequence.
- Can hold any kind of data, including other lists.
- Can contain items of different types (or all the same type).
- Lists are mutable: items can be added, removed, or changed after the list is created.

In [4]:
my_other_list = [sum, my_list, True, Ellipsis]
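
A minimal sketch of the highlights above: a list keeps its items in order, can mix types and nest other lists, and can be changed in place after it is created. The variable names are illustrative and not part of the original notebook.

```python
# A list can mix types and even contain another list
inner = [1, 2]
mixed = [42, "brian", 3.14, inner]

# Lists are ordered: items keep their position and can be indexed
first = mixed[0]          # 42

# Lists are mutable: an item can be replaced in place
mixed[1] = "guido"

# The nested list is the SAME object as `inner`,
# so changing it through `mixed` is visible through `inner` too
mixed[-1].append(3)

print(mixed)   # [42, 'guido', 3.14, [1, 2, 3]]
print(inner)   # [1, 2, 3]
```

That shared nested object is one of the "dangers" of lists: two names can quietly refer to the same mutable data.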

List Methods
-----

You should be familiar with all the list methods.

You should recognize when to use each one to solve a problem, and be able to compare and contrast them.

In [67]:
# All list methods
[method for method in dir(list) if not method.startswith("__")]

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
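
A short sketch exercising the methods from this output that the cells below do not cover (`insert`, `index`, `remove`, `pop`, `copy`, `clear`). The list contents are illustrative. Note that the methods that modify the list return `None`, while `index`, `pop`, `count`, and `copy` return a value.

```python
letters = ["a", "b", "c"]

letters.insert(1, "x")      # insert "x" before index 1 -> ['a', 'x', 'b', 'c']
idx = letters.index("b")    # position of the first "b" -> 2
letters.remove("x")         # remove the first "x" (ValueError if absent) -> ['a', 'b', 'c']
popped = letters.pop()      # remove and return the last item -> 'c'
front = letters.pop(0)      # remove and return the item at index 0 -> 'a'

print(letters, idx, popped, front)  # ['b'] 2 c a

backup = letters.copy()     # a new (shallow) copy of the list
letters.clear()             # empty the list in place -> []
```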

In [5]:
# Type my_other_list. and press <Tab> to see the available methods

In [6]:
# How can I find the number of times an item appears in a list?
my_list = [42, 'brian', 42]
my_list.count(42)

2

In [7]:
# How can I sort a list?
yet_another_list = [42, 3.14, -20]
yet_another_list.sort()  # In-place
yet_another_list

[-20, 3.14, 42]

Lists are mutable so sorting can happen in-place.
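
Because `list.sort()` works in place, it returns `None`; the built-in `sorted()` is the alternative when you want a new list and need to keep the original. A small sketch (the values are illustrative) comparing the two, including the `key` and `reverse` parameters that both accept:

```python
scores = [3.14, 42, -20]

# sorted() returns a NEW list and leaves `scores` untouched
ascending = sorted(scores)                 # [-20, 3.14, 42]

# list.sort() reorders in place and returns None,
# so `scores = scores.sort()` would replace your data with None
result = scores.sort(reverse=True)         # scores is now [42, 3.14, -20]

# Both accept a key function, e.g. sort by absolute value
by_magnitude = sorted(scores, key=abs)     # [3.14, -20, 42]

print(ascending, result, scores, by_magnitude)
```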

Append vs extend
-----

These two methods are often confused and mixing them up can cause bugs.

In [53]:
help(list.append)

Help on method_descriptor:

append(self, object, /)
    Append object to the end of the list.



In [52]:
# list.append adds its argument to the end of the list as a single item
nums = [1, 2]
nums.append([3, 4])
nums

[1, 2, [3, 4]]

In [54]:
help(list.extend)

Help on method_descriptor:

extend(self, iterable, /)
    Extend list by appending elements from the iterable.



In [56]:
# list.extend adds each item of an iterable to the end of the list
nums = [1, 2]
nums.extend([3, 4])
nums

[1, 2, 3, 4]
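
A classic bug from mixing these up: `extend()` iterates over its argument, and a string is an iterable of characters. A sketch with illustrative values:

```python
# extend() iterates over the string, adding one character at a time
split_words = ["data"]
split_words.extend("science")
print(split_words)   # ['data', 's', 'c', 'i', 'e', 'n', 'c', 'e']

# append() adds the string as ONE item, usually what you want here
words = ["data"]
words.append("science")
print(words)         # ['data', 'science']

# extend() accepts any iterable, not just lists
nums = [1, 2]
nums.extend(range(3, 5))
print(nums)          # [1, 2, 3, 4]
```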

Prefer methods over symbols
------

In [23]:
a = [42]
t = (42, 42)
a += t  # What are we trying to do? (This works: a becomes [42, 42, 42])

In [24]:
a = [42]
t = (42, 42)
a + t  # What are we trying to do?

TypeError: can only concatenate list (not "tuple") to list
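
The two cells above look almost identical but behave differently. In CPython, `a += t` calls `list.__iadd__`, which behaves like `a.extend(t)`: it accepts any iterable and mutates `a` in place. `a + t` calls `list.__add__`, which only accepts another list. A sketch of that asymmetry (the `alias` name is illustrative):

```python
a = [42]
alias = a          # a second name for the same list object
t = (42, 42)

# In-place: accepts any iterable and mutates the existing list
a += t
print(a, alias is a)   # [42, 42, 42] True

# Binary +: only list + list is allowed
message = ""
try:
    a + t
except TypeError as err:
    message = str(err)  # can only concatenate list (not "tuple") to list

# The explicit method call states both the intent and the type conversion
b = [42]
b.extend(list(t))
```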

In [25]:
a = [42]
t = (42, 42)
a.append(t) # More explicit
a

[42, (42, 42)]

In [26]:
a = [42]
t = (42, 42)
a.extend(t)  
a

# More explicit. However, you shouldn't write code like this:
# extend() silently accepts a tuple and unpacks it, which adds cognitive load.

[42, 42, 42]

In [27]:
# It is better to be explicit about types
a = [42]
t = (42, 42)
a.extend(list(t))
a

[42, 42, 42]

List slicing with zero-based indexing
-----

Why pick zero-based indexing?

From Guido van Rossum (creator of Python):

> Let's first look at use cases. Probably the most common use cases for slicing are "get the first n items" and "get the next n items starting at i" (the first is a special case of that for i == the first index). It would be nice if both of these could be expressed as without awkward +1 or -1 compensations.

> Using 0-based indexing, half-open intervals, and suitable defaults (as Python ended up having), they are beautiful: a[:n] and a[i:i+n]; the former is long for a[0:n].

> Using 1-based indexing, if you want a[:n] to mean the first n elements, you either have to use closed intervals or you can use a slice notation that uses start and length as the slice parameters. Using half-open intervals just isn't very elegant when combined with 1-based indexing. Using closed intervals, you'd have to write a[i:i+n-1] for the n items starting at i. So perhaps using the slice length would be more elegant with 1-based indexing? Then you could write a[i:n]. And this is in fact what ABC did -- it used a different notation so you could write a@i|n.

(See http://homepages.cwi.nl/~steven/abc/qr.html#EXPRESSIONS.)

<center><img src="../images/python-list-index.png" width="75%"/></center>

Get the first n items, or get the next n items starting at i:

```python
letters[:n]       # the first n items
letters[i:i+n]    # the next n items starting at i
```

In [49]:
letters = ["a", "b", "c", "d", "e"]

In [50]:
letters[:2]

['a', 'b']

In [51]:
# Like a wall moving forward: up to but not including the stop index
letters[1:4]

['b', 'c', 'd']
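
Guido's argument can be checked directly. A sketch of the properties that make half-open, zero-based slices convenient (the indices `i`, `j`, `n` are illustrative):

```python
letters = ["a", "b", "c", "d", "e"]
i, j, n = 1, 4, 2

# A half-open slice [i:j) contains exactly j - i items
assert len(letters[i:j]) == j - i

# Splitting at any index and rejoining gives back the original list
assert letters[:i] + letters[i:] == letters

# "The next n items starting at i" needs no +1/-1 adjustment
next_n = letters[i:i + n]     # ['b', 'c']

# Out-of-range slice bounds are clipped instead of raising IndexError
tail = letters[3:100]         # ['d', 'e']
```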

In [11]:
# List slicing works the same way as string slicing (or slicing any other sequence)
# [start:stop:step]
letters[3:1:-1]

['d', 'c']

In [8]:
# Pythonic way to reverse any sequence
letters[::-1]  # Creates a new, reversed (shallow) copy; the original list is unchanged

['e', 'd', 'c', 'b', 'a']

In [72]:
letters = ["a", "b", "c", "d", "e"]
letters.reverse()  # Modifies the list in place; a less common use case
letters 

['e', 'd', 'c', 'b', 'a']
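
Slicing a list always builds a new list rather than a view, so changing the slice leaves the original alone. When you only need to loop backwards, the built-in `reversed()` returns a lazy iterator instead of building a copy. A sketch:

```python
letters = ["a", "b", "c", "d", "e"]

# Slicing builds a NEW (shallow) copy, not a view
backwards = letters[::-1]
backwards[0] = "z"             # changes only the copy
print(letters)                 # ['a', 'b', 'c', 'd', 'e']
print(backwards)               # ['z', 'd', 'c', 'b', 'a']

# reversed() yields items lazily; no new list is built for the loop
for letter in reversed(letters):
    print(letter, end=" ")
print()

# Materialize the iterator only when you actually need a list
backwards_list = list(reversed(letters))
```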

List copies
----

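Whenever you need a copy of a list (or a dictionary), do not simply use the assignment operator: `=` binds a second name to the same object, so a change made through one name is visible through the other.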

In [12]:
nums = [1, 42]
nums_copy = nums
nums_copy[0] = 42

# What is the value of nums?
nums

[42, 42]

Step through this example in Python Tutor to see that both names point to the same list.

In [13]:
nums = [1, 42]
nums_copy = nums.copy()
nums_copy[0] = 42

# What is the value of nums?
nums

[1, 42]

[Source](https://www.pythoncircle.com/post/602/5-common-mistakes-made-by-beginner-python-programmers/)
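
One caveat: `list.copy()` (like `nums[:]` or `list(nums)`) makes a *shallow* copy. The outer list is new, but nested lists inside it are shared. For nested data, the standard library's `copy.deepcopy` copies recursively. A sketch with an illustrative nested list:

```python
import copy

matrix = [[1, 2], [3, 4]]

shallow = matrix.copy()        # new outer list, SAME inner lists
deep = copy.deepcopy(matrix)   # inner lists are copied too

shallow[0][0] = 99             # also changes matrix[0], which is shared
deep[1][1] = -1                # does not affect matrix

print(matrix)    # [[99, 2], [3, 4]]
print(deep)      # [[1, 2], [3, -1]]
```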

Naming lists
-----

In [62]:
# Don't name a list with just 'l', it looks too much like '1'
l = [] #  Letter "l" list looks like digit "1"

In [1]:
# You can't name a list 1 (it's a literal), but l looks just like it, so avoid l too
1 = [] # Number 1 list

SyntaxError: cannot assign to literal (<ipython-input-1-6c0c37df847c>, line 2)

Try to name a list with a specific plural noun.

In [36]:
seq     = []  # Very generic, try to be more specific
items   = []  # Generic 
values  = []  # Slightly less generic
nums    = []  # Better
letters = []  # Even better
names   = []  # Good 
colors  = []  # Very good
fruit   = []  # Excellent

[Source](https://stackoverflow.com/questions/7785071/better-python-list-naming-other-than-list)

Python's tuple
-----

A tuple is an __immutable__ ordered collection of heterogeneous items.

Immutable objects are safer because they cannot be changed by accident. Try to use them more often.
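
A sketch of what immutability does and does not guarantee (the names are illustrative). Item assignment on a tuple raises `TypeError`, but immutability is shallow: a tuple that holds a list cannot swap that list out, yet the list itself can still change.

```python
point = (3, 4)

message = ""
try:
    point[0] = 10              # tuples do not support item assignment
except TypeError as err:
    message = str(err)         # 'tuple' object does not support item assignment

# Immutability is shallow: the list inside the tuple is still mutable
record = ("brian", [90, 85])
record[1].append(77)
print(record)                  # ('brian', [90, 85, 77])
```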

Tuples are defined with commas
-----

In [33]:
(42)

42

In [34]:
(42, )

(42,)

> In all cases except the empty tuple the comma is the important thing. 

> Parentheses are only required when required for other syntactic reasons: to distinguish a tuple from:   
> - a set of function arguments  
> - operator precedence  
> - to allow line breaks  

[Source](https://stackoverflow.com/questions/7992559/what-is-the-syntax-rule-for-having-trailing-commas-in-tuple-definitions)

In [28]:
t = 42, 42

In [29]:
type(t)

tuple
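
Following the rule quoted above, a sketch of the edge cases: the empty tuple is the one case that needs parentheses and no comma, a single item needs a trailing comma, and a stray comma at the end of an expression silently creates a tuple. The names are illustrative.

```python
empty = ()               # the empty tuple: parentheses, no comma
also_empty = tuple()

single = 42,             # a one-item tuple: the comma does the work
not_a_tuple = (42)       # just the int 42 in parentheses

# A stray trailing comma silently creates a tuple, a common source of bugs
total = 10 + 5,
print(type(total), total)    # <class 'tuple'> (15,)

# Parentheses are needed to pass a tuple as ONE function argument
size = len((1, 2))           # 2; len(1, 2) would raise TypeError
```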

What are common data science scenarios for tuples?
-----

1. Database records 

    You don't want to accidentally change read-only data during data analysis.
<br>
<br>
2. Complex keys for dictionaries

    Tuples are immutable and therefore hashable, so they can be used as dictionary keys.
    
    By the way, every element of the tuple has to be hashable too.
    

[Source](https://stackoverflow.com/questions/1938614/in-what-case-would-i-use-a-tuple-as-a-dictionary-key)

In [58]:
cities_population = { ("San Francisco", "CA"): 883_305,
                      ("Portland", "OR"):      653_115,
                      ("Portland", "ME"):       66_417,
                    }

In [73]:
for city, state in cities_population:
    print(f"{city} in the state of {state} has a population of {cities_population[city, state]:,}")

San Francisco in the state of CA has a population of 883,305
Portland in the state of OR has a population of 653,115
Portland in the state of ME has a population of 66,417
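
For the "database records" scenario, the standard library's `collections.namedtuple` is one option (an assumption here, not something the notebook prescribes): records stay immutable tuples, but fields can be read by name. The `CityRecord` type is hypothetical and reuses the numbers from the dictionary above.

```python
from collections import namedtuple

# Hypothetical record type; the data mirrors `cities_population` above
CityRecord = namedtuple("CityRecord", ["city", "state", "population"])

rows = [
    CityRecord("San Francisco", "CA", 883_305),
    CityRecord("Portland", "OR", 653_115),
    CityRecord("Portland", "ME", 66_417),
]

# Fields are readable by name
largest = max(rows, key=lambda row: row.population)
print(largest.city)           # San Francisco

# Records are still read-only tuples
read_only = False
try:
    largest.population = 0
except AttributeError:
    read_only = True

# Hashable (city, state) pairs can still serve as dictionary keys
population_by_key = {(row.city, row.state): row.population for row in rows}
```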


What the heck is hashable?
----

To use an object as a dict key or a set element, the object has to be hashable. An object is hashable if it has a hash value that never changes during its lifetime and it can be compared for equality with other objects.

Among the built-in types, __immutable__ objects are hashable (a tuple only if all of its elements are), while mutable containers such as `list`, `dict`, and `set` are not.

`int`, `float`, `str`, and `tuple` are the most common immutable types used as set elements and dict keys.


In [37]:
help(hash)

Help on built-in function hash in module builtins:

hash(obj, /)
    Return the hash value for the given object.
    
    Two objects that compare equal must also have the same hash value, but the
    reverse is not necessarily true.



In [35]:
hash((42, 42))

-6730455444736158977

In [36]:
hash([42, 42])

TypeError: unhashable type: 'list'
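
A sketch of two consequences of the rules above (the names are illustrative): a tuple is only hashable when all of its elements are, and because equal objects must have equal hashes, `1` and `1.0` act as the same dictionary key. `frozenset` is the hashable, immutable counterpart of `set`.

```python
ok_key = (42, "brian")
bad_key = (42, [1, 2])      # contains a list

message = ""
try:
    hash(bad_key)
except TypeError as err:
    message = str(err)      # unhashable type: 'list'

# frozenset can be a dict key; a regular set cannot
tags = {frozenset({"a", "b"}): "pair"}

# 1 == 1.0, so they must hash equally and hit the same key
lookup = {1: "int"}
lookup[1.0] = "float"
print(lookup)     # {1: 'float'}
```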

<center><h2>Takeaways</h2></center>

- A list is a __mutable__ ordered sequence of heterogeneous items.
- You should be familiar with the list methods.
- A tuple is an __immutable__ ordered sequence of heterogeneous items.
- Slicing works the same way for all built-in sequences: `list`, `tuple`, `str`


Bonus Material
-----

In [31]:
# A trailing comma is allowed and ignored
scores = [67, 56, 61.5, ]
scores

[67, 56, 61.5]
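
Trailing commas pay off in multi-line literals: adding or reordering an item touches only one line. The real danger is a *missing* comma, because Python concatenates adjacent string literals. A sketch with illustrative values:

```python
# Trailing comma on the last line: every item line looks the same
fruits = [
    "apple",
    "banana",
    "cherry",
]

# Missing comma: "banana" "cherry" is implicitly joined into one string
oops = [
    "apple",
    "banana"   # <- missing comma
    "cherry",
]
print(len(oops), oops)   # 2 ['apple', 'bananacherry']
```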