# Python Idioms

# Introduction


1. Introduce yourself

2. Describe my experience as a Python beginner
  
  - Questions are welcome!

3. What is an idiom?

Segue into what an idiom is. In language, an idiom is like a saying or figure of speech.

## What is an idiom?

#### *"...the usual way to code a task in a specific language."*

Source: https://stackoverflow.com/questions/302459/what-is-a-programming-idiom

# Convention: Two Approaches

Introduce the structure of the presentation

## Approach 1: basic solution

This is what people may try first when learning Python

## Approach 2: "pythonic" solution

This is the idiomatic approach to the task

It's not wrong to use Approach 1!

Both approaches achieve the same goal...

...but using Approach 2 may

- improve readability

Simpler, easier to read

- save time (programming time and running time)

Use builtin modules!

Less time reinventing the wheel, plus an optimized solution may already exist

- earn hacker cred 😎

People spend more time reading code than writing it!

Everyone appreciates elegant code.

Nothing's cooler than a savvy coder 😉

# Topics

Variables

Collections

Strings

In each topic, I'll introduce a common task and show each Approach.

I've tried to find idioms that appear in later versions of Python 2 and in all versions of Python 3, unless otherwise stated.

# Variables

## Task: Unpacking a sequence

# Task: Unpacking a sequence

## Approach 1

Use the index operator repeatedly

In [1]:
lst = ['a', 'b', 'c', 'd']    # ordered, mutable collection

foo = lst[0]
bar = lst[1]
baz = lst[-1] # what does -1 mean? More on that later...

Question: What are the values of each variable going to be?

In [2]:
print(foo, bar, baz)

a b d


This works fine for a few elements, but what if you want many elements?

# Task: Unpacking a sequence

## Approach 2

Use _ to discard unwanted elements

This is a throwback to Justin's presentation on underscores in Python!

The `_` variable is a real variable like any other in Python, but conventionally, it's used to discard objects we don't care about.

In [3]:
lst = ['a', 'b', 'c', 'd']

foo, bar, _, baz = lst # ignore the elements after bar and before baz

In [4]:
print(foo, bar, baz)

print(_)

a b d
c


More accurately, the "ignored" value is assigned to \_. You can use it just like any other variable, technically.

A common idiom is to use \_ to ignore a value returned by a function if it isn't needed.

What if you want to unpack single and multiple elements?

In [5]:
lst = ['a', 'b', 'c', 'd']

foo, *bar, baz = lst # unpack single elements and ranges within lst

In [6]:
print(foo, baz)

print(bar)

type(bar)

a d
['b', 'c']


list

Use the \* operator to collect a run of elements into a list (starred assignment like this requires Python 3)

Use the \*\* operator to unpack collections of name-value pairs (e.g. dictionaries) in special circumstances (function arguments)
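Here is a minimal sketch of both operators in a function call. The function `describe` and its arguments are hypothetical, made up for this illustration:

```python
def describe(name, age, city):
    # hypothetical helper used only for this illustration
    return f"{name} ({age}) lives in {city}"

args = ["Sue", 30]             # positional arguments held in a list
kwargs = {"city": "Ottawa"}    # keyword arguments held in a dictionary

# Equivalent to describe("Sue", 30, city="Ottawa")
summary = describe(*args, **kwargs)

print(summary)  # Sue (30) lives in Ottawa
```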

# Variables

## Task: Swapping two objects

# Task: Swapping two objects

## Approach 1

Use a temporary variable

In most languages I've worked with, this is the only way to swap two objects

In [7]:
a, b = 1, 2

temp = a
a    = b
b    = temp

In [8]:
print(a, b)

2 1


# Task: Swapping two objects

## Approach 2

Unpack the variables in reverse

In [9]:
a, b = 1, 2

a, b = b, a

In [10]:
print(a, b)

2 1


You could swap 2+ variables this way!
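For example, three variables can be rotated in one statement. This is only a sketch with made-up values:

```python
a, b, c = 1, 2, 3

# The right-hand side is evaluated first, then unpacked left to right
a, b, c = c, a, b

print(a, b, c)  # 3 1 2
```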

# Variables

## Task: Testing a single variable

By testing, I mean checking the variable for some true/false condition

# Task: Testing a single variable

## Approach 1

Check for equality

In [11]:
x = 1

if x != 0:
    print('x is non-zero')
else:
    print('x is zero')

x is non-zero


# Task: Testing a single variable

## Approach 2

Check the object's "truthy/falsy" value

Truthy/falsy is the pythonic way of describing how any object, not just `True` and `False`, is treated as true or false in a boolean context such as an `if` statement

In [12]:
x = 1

if x:
    print('x is non-zero')
else:
    print('x is zero')

x is non-zero


"*By default, an object is considered true unless its class defines either a `__bool__()` method that returns False or a `__len__()` method that returns zero, when called with the object.*"

Source: https://docs.python.org/3/library/stdtypes.html#truth-value-testing
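To see that rule in action, here is a sketch with a hypothetical `Playlist` class that defines only `__len__()`:

```python
class Playlist:
    """Hypothetical container used only to illustrate truth-value testing."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # No __bool__() is defined, so Python falls back to __len__()
        return len(self.songs)


empty = Playlist([])
full = Playlist(["song a", "song b"])

print(bool(empty), bool(full))  # False True
```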

What if you want to test a collection?

In [13]:
tup = (1,2,3,4) # an ordered, immutable sequence

if tup: # the same as `if len(tup) != 0: ...`
    print('tup is non-empty')
else:
    print('tup is empty')

tup is non-empty


This is how to test a collection, but what about testing the items inside it?

Segue into topic II...

# Collections

Typical collections include lists, tuples, and dictionaries.

# Collections

## Task: Testing all objects in a collection

Like before, this means testing an object for some true/false condition

# Task: Testing all objects in a collection

## Approach 1

Use a loop

Let's assume we want to check if at least one object meets the desired condition.

In [14]:
tup = (0, 0, 1, 0)

found_non_zero_element = False

for element in tup:
    if element:
        found_non_zero_element = True
        break

if found_non_zero_element:
    print('Found a non-zero element')
else:
    print('tup is empty or only contains zeroes')

Found a non-zero element


# Task: Testing all objects in a collection

## Approach 2

Use the `any()` function

No need to write boilerplate `for` loops for this task

In [15]:
tup = (0, 0, 1, 0)

if any(tup):
    print('Found a non-zero element')
else:
    print('Only zeroes in tup')

Found a non-zero element


Conversely, `all()` returns `True` if and only if **all** elements are truthy

In [16]:
tup = (0, 0, 1, 0)

if all(tup):
    print('All elements in tup are non-zero')
else:
    print('At least one element in tup is zero')

At least one element in tup is zero


There are ways to use `any()` and `all()` with any boolean condition...
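One common way is to pass a generator expression, which yields one true/false result per element. The thresholds below are arbitrary values chosen for the example:

```python
numbers = (3, 8, 12, 5)

# Each generator expression produces one True/False value per element
has_large_number = any(n > 10 for n in numbers)
all_positive = all(n > 0 for n in numbers)

print(has_large_number, all_positive)  # True True
```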

# Collections

## Task: Enumerating a sequence

This means iterating over each object in a sequence while keeping track of its count or index position.

Compare to getting each object without caring about index or count...

# Task: Enumerating a sequence

## Approach 1
Manually set an index variable

In [17]:
some_list = [7, 8, 9]

print("index", ":", "value")
for index in range(len(some_list)):
    print(index, ":", some_list[index])


index : value
0 : 7
1 : 8
2 : 9


Manually incrementing an index variable is risky.

Many languages support `for (i=0; i < len(some_list); ++i) { ... }` syntax, but Python takes a different approach...

# Task: Enumerating a sequence

## Approach 2
Use the `enumerate()` function

In [18]:
some_list = [7, 8, 9]

print("index", ":", "value")
for index, value in enumerate(some_list):
    print(index, ":", value)


index : value
0 : 7
1 : 8
2 : 9


*"`enumerate(thing)`, where `thing` is either an iterator or a sequence, returns a iterator that will return `(0, thing[0])`, `(1, thing[1])`, `(2, thing[2])`, and so forth."*

Source: https://docs.python.org/2.3/whatsnew/section-enumerate.html
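The count does not have to begin at zero. As a brief sketch, assuming Python 2.6 or later, `enumerate()` also accepts a `start` argument:

```python
some_list = [7, 8, 9]

# start=1 changes only the counter, not which items are visited
numbered = list(enumerate(some_list, start=1))

print(numbered)  # [(1, 7), (2, 8), (3, 9)]
```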

What if you don't need an index variable?

In [19]:
some_list = [7, 8, 9]

for value in some_list:
    print(value)

7
8
9


We've already seen this kind of `for` loop several times in this presentation

# Collections

## Task: Get a subset of a sequence

Here, let's assume we know the start and end positions of our subset within the original sequence.

# Task: Get a subset of a sequence

## Approach 1
Use a loop

In [20]:
names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]
subset = []

start = 2
end = 5

for current_index in range(start, end): # range gives the values start, start+1, start+2, ..., end-1
    subset.append(names[current_index])

print(subset)

['george', 'julie', 'stan']


# Task: Get a subset of a sequence

## Approach 2
Slice the collection

In [21]:
names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

start = 2
end = 5

subset = names[start:end] # equivalent to names[slice(2, 5)]

print(subset)

['george', 'julie', 'stan']


You can also create slice objects using `slice()`, but it's more common to use the shorthand shown above
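A slice object may still be handy when the same range is reused or deserves a name. A minimal sketch, where the name `middle` is made up for the example:

```python
names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

# A slice object can be stored in a variable and reused
middle = slice(2, 5)

print(names[middle])                # ['george', 'julie', 'stan']
print(names[middle] == names[2:5])  # True
```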

How do slices work?

Like the `range()` function, `slice` accepts three parameters:

- start
- stop (exclusive)
- step

In [22]:
# Source: https://stackoverflow.com/questions/509211/understanding-slice-notation

names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

start = 2
stop = 5
step = 2

print("original list:", names, end='\n\n')

print("items start through stop-1:", names[start:stop], end='\n\n')

print("items start through the rest of the list:", names[start:], end='\n\n')

print("items from the beginning through stop-1", names[:stop], end='\n\n')

original list: ['bob', 'sue', 'george', 'julie', 'stan', 'martha', 'leo']

items start through stop-1: ['george', 'julie', 'stan']

items start through the rest of the list: ['george', 'julie', 'stan', 'martha', 'leo']

items from the beginning through stop-1 ['bob', 'sue', 'george', 'julie', 'stan']



So far, we've seen slices get objects one after the other. 

That's because the default slice **step** is 1.


What if we wanted a larger step?

In [23]:
# Source: https://stackoverflow.com/questions/509211/understanding-slice-notation

names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

start = 2
stop = 5
step = 2

print("original list:", names, end='\n\n')

print("every step-th item from beginning to end of the list:", names[::step], end='\n\n')

print("every step-th item from start to stop-1:", names[start:stop:step], end='\n\n')

print("a copy of the entire list:", names[:], end='\n\n')

original list: ['bob', 'sue', 'george', 'julie', 'stan', 'martha', 'leo']

every step-th item from beginning to end of the list: ['bob', 'george', 'stan', 'leo']

every step-th item from start to stop-1: ['george', 'stan']

a copy of the entire list: ['bob', 'sue', 'george', 'julie', 'stan', 'martha', 'leo']



Start, stop, and step can also be *negative*:

`some_list = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]`

`# + index:     0      1       2         3       4         5       6`

`# - index:    -7     -6      -5        -4      -3        -2      -1`

A negative step reverses the **direction** of the slice.

In [24]:
some_list = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

print("Get last item:", some_list[-1], end='\n\n')

print("First two items, reversed:", some_list[1::-1], end='\n\n')

print("Last two items, reversed:", some_list[:-3:-1], end='\n\n')

print("Everything except the last two items, reversed:", some_list[-3::-1], end='\n\n')

Get last item: leo

First two items, reversed: ['sue', 'bob']

Last two items, reversed: ['leo', 'martha']

Everything except the last two items, reversed: ['stan', 'julie', 'george', 'sue', 'bob']



2. Start at index 1 (second item), slice in reverse until beginning of list is reached

3. Start at end of list, slice in reverse until index -3 (exclusive) is reached

4. Start at index -3 (third last item), slice in reverse until beginning of list is reached

Slices are very powerful, but can be hard to read at first.

Start by checking the sign of step (positive by default); then you'll know whether the slice from start to stop (exclusive) runs forward or in reverse through the sequence
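If a slice is still hard to read, one option is to ask Python which concrete indices it resolves to. This sketch uses the `indices()` method of slice objects on the "last two items, reversed" slice:

```python
some_list = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

# slice.indices(length) returns the concrete (start, stop, step) the slice
# would use on a sequence of that length
last_two_reversed = slice(None, -3, -1)  # same as [:-3:-1]
start, stop, step = last_two_reversed.indices(len(some_list))

print(start, stop, step)  # 6 4 -1
print([some_list[i] for i in range(start, stop, step)])  # ['leo', 'martha']
```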

What if the subset you need is based on some true/false condition?

Slices won't help you here...

THIS IS A BONUS TOPIC -- Skip if time is running out!

Use a comprehension: 

`subset = [ <item> for <item> in <sequence> if <condition> ]`

`subset = { <key>:<val> for <key> in <dict> if <condition> }`

In [25]:
names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

# get all three-letter names
three_letter_names = [name for name in names if len(name) == 3]

print(three_letter_names)

['bob', 'sue', 'leo']


In [26]:
names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]

# map names to their lengths
names_and_lengths = { name : len(name) for name in names }

print(names_and_lengths)

{'bob': 3, 'sue': 3, 'george': 6, 'julie': 5, 'stan': 4, 'martha': 6, 'leo': 3}
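A dictionary comprehension can also filter with a condition. As a sketch, this keeps only the longer names; the cutoff of 4 is arbitrary:

```python
names = ["bob", "sue", "george", "julie", "stan", "martha", "leo"]
names_and_lengths = {name: len(name) for name in names}

# Keep only the entries whose length is greater than 4
long_names = {
    name: length
    for name, length in names_and_lengths.items()
    if length > 4
}

print(long_names)  # {'george': 6, 'julie': 5, 'martha': 6}
```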


# Strings

# Strings

## Task: String formatting

Let's assume we have several variables we want to print at once

# Task: String formatting

## Approach 1

Concatenate each string with `+`

In [27]:
animal_1 = "fox"
animal_2 = "dog"

print("The quick brown " + animal_1 + " jumps over the lazy " + animal_2)

The quick brown fox jumps over the lazy dog


This approach is fine for a couple of strings, but inefficient when many strings need to be combined. More on this later...


# Task: String formatting

## Approach 2

Use a string formatting mechanism

In [28]:
# There are a few options depending on which version of Python you're running

animal_1 = "fox"
animal_2 = "dog"

# %-formatting (available in all Python versions)
print("The quick brown %s jumps over the lazy %s" % (animal_1, animal_2))

# str.format() (Python 2.6+)
print("The quick brown {} jumps over the lazy {}".format(animal_1, animal_2))

# f-strings (Python 3.6+)
print(f"The quick brown {animal_1} jumps over the lazy {animal_2}")

The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog


Many formatting options exist (google "Python format specifiers"), but they deserve their own presentation.
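As a small taste, here is a sketch of two format specifiers inside an f-string (so it assumes Python 3.6+); the values are made up:

```python
name = "fox"
price = 1234.5

# :>8   right-aligns the value in a field eight characters wide
# :,.2f adds a thousands separator and shows two decimal places
line = f"{name:>8} costs {price:,.2f}"

print(line)  # "     fox costs 1,234.50"
```

The same specifiers work with `str.format()`, for example `"{:,.2f}".format(price)`.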

# Strings

## Task: Handling multiline strings

# Task: Handling multiline strings

## Approach 1

Manually escape each newline

By escape, I mean using a special character to represent a newline.

In Python, the backslash character denotes the start of an escape sequence for some character.

In [29]:
some_string = "This is a \nmultiline string!" # could also use os.linesep instead of \n

print(some_string)

This is a 
multiline string!


Escaped characters can make the original string hard to read.

# Task: Handling multiline strings

## Approach 2

Use `""" triple quotes """` to preserve newlines

In [30]:
some_string = """This is a
multiline string!"""

print(some_string)

This is a
multiline string!


This is useful when you need to preserve the newlines in a large multiline string.

# Strings

## Task: Escaping backslashes

This is a very common task when working with Windows file paths.

# Task: Escaping backslashes

## Approach 1

Manually escape each backslash character

Recall that `\` begins an escape sequence. So, `\\` escapes a single backslash.

In [31]:
path_unescaped = "C:\path\to\my\file" # backslashes aren't escaped (why doesn't this work?)

path_escaped = "C:\\path\\to\\my\\file" # backslashes are escaped

print(f"Opening {path_unescaped}")
print()
print(f"Opening {path_escaped}")


# with open(path) as my_file:
#    do stuff

Opening C:\path	o\myile

Opening C:\path\to\my\file


# Task: Escaping backslashes

## Approach 2

Use a raw string

In [32]:
path = r"C:\path\to\my\file" # an r-prefix indicates a raw string

print(f"Opening {path}")

Opening C:\path\to\my\file


Raw strings treat backslashes as literal characters.
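A quick way to check this is to compare lengths. In this sketch, the regular string holds a single tab character, while the raw string holds a backslash followed by the letter t:

```python
regular = "\t"  # one character: a tab
raw = r"\t"     # two characters: a backslash and the letter t

print(len(regular), len(raw))  # 1 2
print(raw == "\\t")            # True
```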

# Strings

## Task: Joining strings

# Task: Joining strings

## Approach 1

Use the `+` operator

Similar to the String Formatting task, except let's assume we are only dealing with strings and not a mix of strings, ints, and other objects.

In [33]:
first_name = "Bilbo"
last_name = "Baggins"

full_name = first_name + " " + last_name

print(full_name)

Bilbo Baggins


This approach is fine for a few strings, but can cause performance issues with large numbers of strings

# Task: Joining strings

## Approach 2

Use `''.join()`

In [34]:
first_name = "Bilbo"
last_name = "Baggins"

full_name = ' '.join((first_name, last_name))

print(full_name)

Bilbo Baggins
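Note that `join()` only accepts strings. If the items are a mix of strings and other objects, one option is to convert each item first. This is a sketch with made-up values:

```python
values = ["Bilbo", 111, 3.5]

# str.join() raises TypeError for non-string items, so convert each one first
joined = ", ".join(str(value) for value in values)

print(joined)  # Bilbo, 111, 3.5
```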


When is it time to use `''.join()` instead of `+`?

Explain the setup below:

We will test the concat and join methods with the same list of strings as it gets larger each time.

In [35]:
from timeit import Timer # used to time executions


test_rounds = [ 1, 10, 100 ]

for n in test_rounds:
    print(f"testing with {n} concats and joins (time in seconds):")
    
    list_of_strings = ["Hello world!"] * n
    
    time_concat = Timer(stmt = "for x in list_of_strings: s += x", setup='s = ""', globals=globals()).timeit(1)
    print(time_concat)

    time_join = Timer(stmt = "s = ''.join(list_of_strings)", setup='s = ""', globals=globals()).timeit(1)
    print(time_join)
    
    print()

testing with 1 concats and joins (time in seconds):
1.000000000139778e-06
1.000000000139778e-06

testing with 10 concats and joins (time in seconds):
2.700000000022129e-06
1.2000000002565514e-06

testing with 100 concats and joins (time in seconds):
4.500000000007276e-05
2.700000000022129e-06



For 1-10 strings, the join and concat times are pretty close (on the same order of magnitude).

For 100 strings, the join is more than 10x faster than concat (roughly 17x in this run). Let's see if that trend continues for many strings.

In [36]:
test_rounds = [ 10000, 100000, 1000000 ]

for n in test_rounds:
    print(f"testing with {n} concats and joins (time in seconds):")
    
    list_of_strings = ["Hello world!"] * n  # creates a list of n strings
    
    time_concat = Timer(stmt = "for x in list_of_strings: s += x", setup='s = ""', globals=globals()).timeit(1)
    print(time_concat)

    time_join = Timer(stmt = "s = ''.join(list_of_strings)", setup='s = ""', globals=globals()).timeit(1)
    print(time_join)
    
    print()

testing with 10000 concats and joins (time in seconds):
0.002031800000000139
9.790000000009513e-05

testing with 100000 concats and joins (time in seconds):
0.043153499999999845
0.0011418000000000816

testing with 1000000 concats and joins (time in seconds):
9.1315457
0.011656399999999678



Takeaway:

Fewer than 100 strings: the difference is negligible

More than 100 strings: `''.join()` is much faster than `+`

# Thank you!