# 20: Introduction to Python

Author: Greg Wray  
2025-FEB-22  
  
Type code into this notebook during lecture. Run code in the selected cell by clicking the `run` (play button) icon or typing `shift-return`.  
Modify and experiment!! This is the best way to get a feel for how Python works (and any other language).   
Consider adding comments to record notes and findings: `#` starts a comment on a new line or part way through a line (just like R and bash). 

### First Python program

In [4]:
print("Hello world!")

Hello world!


### Simple expressions

Most math operators in Python are the same as R.
The main exception is the exponentiation operator, `**`.   
Use round brackets to enforce and clarify the order of operations. As with other languages, inequalities return boolean (True/False) values.

In [2]:
# basic math
(2 + 3) * 5

25

In [3]:
# modulo (returns the remainder after division)
100 % 20            

0

In [202]:
# exponentiation
3**3                # unlike R; in Python ^ is a bit-wise operator

27

In [203]:
# boolean expression
(3 + 7) > 9

True
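
The operators that differ most from R can be compared side by side. The sketch below is only an illustration; the variable names are made up for this example.

```python
# compare remainder, floor division, exponentiation, and bit-wise XOR
remainder = 17 % 5       # modulo: what is left over after division
quotient = 17 // 5       # floor division: whole-number part of the division
power = 3 ** 3           # exponentiation
xor_value = 3 ^ 3        # bit-wise XOR, NOT exponentiation

print(remainder, quotient, power, xor_value)
```

Note that `3 ^ 3` runs without an error, so using the R operator by mistake gives a wrong answer rather than a warning.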

### Atomic data types and assignment

Data types are implicit in Python; the rules are very similar to R.   
Variable names can be re-bound (re-assigned) at any time. Warning: re-binding erases the previous value!

In [14]:
# Python uses dynamic typing
varA = 42               # unquoted numerals without a decimal are integers
varB = 42.0             # unquoted numerals with a decimal are floats
varC = "42"             # single and double quotes both indicate string, even if only numerals are present
varD = 'Forty-two'      # a string written with single quotes
varD = False            # re-binding to the same identifier is allowed; True and False without quotes are booleans

varC                    # evaluating a declared variable returns its current value

'42'

In [264]:
# to see what type Python used when creating a variable, use the type() function
type(varC)

str
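
Because the type follows from how a value is written, it can help to check types directly and to convert between them when needed. This is a minimal sketch using the built-in `int()`, `float()`, and `str()` functions; the variable names are invented for illustration.

```python
# each literal form produces a different type
print(type(42), type(42.0), type("42"), type(False))

# conversion functions create a new value of the requested type
as_integer = int("42")       # string -> integer
as_float = float("42")       # string -> float
as_text = str(42)            # integer -> string

print(as_integer + 1)        # arithmetic works on the integer
print(as_text + "1")         # + joins strings instead of adding
```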

### Common data structures

The four most common Python data structures are list, tuple, set, and dictionary.  
These are general-purpose **containers**: able to hold any data type or structure, including mixed contents.  
You can specify and tell them apart by the type of brackets/braces and (for dictionary) by separators.

In [271]:
# list -- general-purpose container: square brackets
dna_bases = ['a', 'c', 't', 'g']
# tuple -- immutable version of a list: round brackets
some_integers = (12, 7, 23, 0, 8, -2, 4, 2)
# set -- unordered version of a list with no repeat items; curly braces
palette = {'blue', 'orange', 'green', 'red'}
# dictionary -- curly braces enclose a collection of key:value pairs
office_numbers = {"Paul": 4103, "Greg": 4104, "Tania": 4115}

In [207]:
# to extract the length of a container, use len()
len(office_numbers)

3

In [208]:
# len() also works with strings (but not other atomic data types)
len('Forty-two')    # returns the number of characters (different from R)

9

In [209]:
# type() works with all data types
type(dna_bases)

list

In [237]:
# mixing data types and nesting within containers is allowed; it is also possible to include variables, expressions, and functions
big_list = [palette, True, [1, 2, 4, 8, 16], "Jabberwocky", 5**8, len(dna_bases)]
big_list

[{'blue', 'green', 'orange', 'red'},
 True,
 [1, 2, 4, 8, 16],
 'Jabberwocky',
 390625,
 4]

In [16]:
# a more practical and common use is to create a custom data object
tmnt = {"Leonardo": ["leader", "blue", "katana"], 
        "Raphael" : ["muscle", "red", "sai"],
        "Donatello" : ["brains", "purple", "bo"],
        "Michelangelo" : ["comedian", "orange", "nunchuk"]}
tmnt

{'Leonardo': ['leader', 'blue', 'katana'],
 'Raphael': ['muscle', 'red', 'sai'],
 'Donatello': ['brains', 'purple', 'bo'],
 'Michelangelo': ['comedian', 'orange', 'nunchuk']}
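
Two container behaviors described above are easy to check directly: a set drops repeated items, and the `in` operator tests membership. The data below are hypothetical and exist only for this sketch.

```python
# a set silently drops repeated items
sampled_bases = ['a', 'c', 'a', 'g', 'c', 'a']      # hypothetical example data
unique_bases = set(sampled_bases)
print(len(sampled_bases), len(unique_bases))

# the `in` operator tests membership in any container
print('g' in unique_bases, 't' in unique_bases)

# for a dictionary, `in` checks the keys, not the values
room_numbers = {"Ana": 101, "Ben": 102}             # hypothetical example data
print("Ana" in room_numbers, 101 in room_numbers)
```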

### Iterables

An iterable is a data object that can return items one at a time. Strings and the four data structures mentioned above are iterable.

In [266]:
# looping over a container returns one item at a time
for x in palette:
    print(x)

green
blue
red
orange


In [261]:
# looping over a string returns one character at a time
for c in 'Forty-two':
    print(c)

F
o
r
t
y
-
t
w
o


### Indexing

Two key points to remember about indexing in Python:
1. zero-based (unlike R, which is 1-based)
2. slices include the beginning but not the end value (unlike R, which includes both)   

In [239]:
# positional square-bracket indexing works for ordered sequences (strings, lists, tuples), but not for sets or dictionaries
some_integers[2]

23

In [240]:
# slices return ranges
some_integers[0:5]

(12, 7, 23, 0, 8)

In [241]:
# open slices work
some_integers[4:]

(8, -2, 4, 2)

In [242]:
# step size works as a third argument
some_integers[0:6:2]

(12, 23, 8)

In [243]:
# step size can be combined with open slices
some_integers[:6:2]

(12, 23, 8)

In [244]:
# negative indexing is useful
some_integers[-1]     # returns the last item

2

In [245]:
# use open slices and negative indexing to reverse the complete iterable
some_integers[::-1]   # also known as the "martian smiley"

(2, 4, -2, 8, 0, 23, 7, 12)

In [246]:
# to index a dictionary, use the key for the item you want to retrieve
office_numbers["Tania"]

4115

In [267]:
# to retrieve an item within a list in a dictionary, use consecutive square-bracket indexing 
tmnt['Leonardo'][2]       # returns the third item in the list associated with 'Leonardo'

'katana'
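
The same indexing rules apply to strings, while sets reject positional indexing because they have no order. The following sketch uses a sample string and a small set chosen for illustration, and uses `try`/`except` only to show the error without stopping the cell.

```python
island = "Fernandina"           # sample string

first_letter = island[0]        # zero-based: index 0 is the first character
first_four = island[0:4]        # characters 0, 1, 2, 3 (index 4 is excluded)
last_three = island[-3:]        # open slice starting at the third-to-last character
print(first_letter, first_four, last_three)

# sets have no order, so positional indexing raises an error
colors = {'blue', 'orange'}
try:
    colors[0]
except TypeError as err:
    print("TypeError:", err)
```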

### Immutability

Immutable data objects cannot be updated once assigned, but they can be deleted and the same name re-used.   
Atomic data types, strings, and tuples are the most common immutable objects in Python.   

In [270]:
# lists are mutable
dna_bases[2] = 'u'
dna_bases

['a', 'c', 'u', 'g']

In [249]:
# tuples are not mutable
some_integers[2] = 33

TypeError: 'tuple' object does not support item assignment

In [269]:
# but tuples can be "erased" by assigning a new value to the same identifier
some_integers = "parrot"
some_integers

'parrot'

Mutable containers have an important property: assigning to a new variable does *not* create a new copy of the data.   
Updating the contents for one variable also updates it for the other variable. 

In [251]:
# assign to a new identifier
dna_bases = ['a', 'c', 't', 'g']
rna_bases = dna_bases
rna_bases

['a', 'c', 't', 'g']

In [252]:
# update rna_bases
rna_bases[2] = 'u'
rna_bases

['a', 'c', 'u', 'g']

In [253]:
# now, check the original list
dna_bases

['a', 'c', 'u', 'g']
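
When an independent list is what you want, one option is the list method `.copy()`. The sketch below uses made-up variable names and compares an alias with a copy; the `is` operator reports whether two names refer to the same object. Note that `.copy()` is shallow, so nested containers inside the list would still be shared.

```python
original_bases = ['a', 'c', 't', 'g']
alias = original_bases               # second name for the SAME list
independent = original_bases.copy()  # new list with the same contents

independent[2] = 'u'                 # changes only the copy

print(alias is original_bases)       # one object, two names
print(independent is original_bases) # separate objects
print(original_bases, independent)
```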

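Immutable objects behave differently in practice. Assignment still does not copy the data: both names refer to the same object.   
However, an immutable object cannot be changed in place, so an "update" builds a new object and re-binds only that name; the original is unaffected.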

In [254]:
# create an immutable object and assign it to a second name
my_string = "Hello"
new_string = my_string
new_string

'Hello'

In [255]:
# "modify" via the second name: this builds a new string and re-binds new_string
new_string = new_string + " world!"
new_string

'Hello world!'

In [256]:
# now, check the original string
my_string

'Hello'
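
One way to see what happens with the strings above is to test object identity with the `is` operator before and after the "update". This is a small sketch with invented variable names, not part of the lecture code.

```python
greeting = "Hello"
longer_greeting = greeting
same_before = longer_greeting is greeting       # both names refer to one object

# + builds a brand-new string; only longer_greeting is re-bound to it
longer_greeting = longer_greeting + " world!"
same_after = longer_greeting is greeting

print(same_before, same_after)
print(greeting)
```

The first name still points at the untouched original string, which is why it prints unchanged.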

### Syntax and formatting

Python was designed to have clean, readable code.  
To illustrate, we'll write another program.

In [20]:
# analysis of island names

# input data
islas_galapagos = ['Isabella', 'Fernandina', 'San Salvador', 'Santa Cruz', 
        'San Cristobal', 'Floreana']

# function that returns a sorted list of unique letters from an input string
def get_letters(name):
    letters_list = []
    for char in name:
        if char.isalpha():
            letters_list.append(char)
    letters_list = list(set(letters_list))
    letters_list.sort()
    return letters_list

# main loop analyzes each input string
for x in islas_galapagos:
    print("The name of the island", x)
    print("  is", len(x), "letters long")
    print("  ends with the letters", x[-3:])
    if ' ' in x:
        print("  is composed of multiple words")
    else:
        print("  is composed of one word")
    print("  contains the letters", get_letters(x))
    print()

# summary of work done
print('Total island names analyzed:', len(islas_galapagos))

The name of the island Isabella
  is 8 letters long
  ends with the letters lla
  is composed of one word
  contains the letters ['I', 'a', 'b', 'e', 'l', 's']

The name of the island Fernandina
  is 10 letters long
  ends with the letters ina
  is composed of one word
  contains the letters ['F', 'a', 'd', 'e', 'i', 'n', 'r']

The name of the island San Salvador
  is 12 letters long
  ends with the letters dor
  is composed of multiple words
  contains the letters ['S', 'a', 'd', 'l', 'n', 'o', 'r', 'v']

The name of the island Santa Cruz
  is 10 letters long
  ends with the letters ruz
  is composed of multiple words
  contains the letters ['C', 'S', 'a', 'n', 'r', 't', 'u', 'z']

The name of the island San Cristobal
  is 13 letters long
  ends with the letters bal
  is composed of multiple words
  contains the letters ['C', 'S', 'a', 'b', 'i', 'l', 'n', 'o', 'r', 's', 't']

The name of the island Floreana
  is 8 letters long
  ends with the letters ana
  is composed of one word
  contains the letters ['F', 'a', 'e', 'l', 'n', 'o', 'r']

Total island names analyzed: 6
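
As an aside on readability, the helper function in this program can also be written more compactly. The version below is a sketch, not the lecture's version: the name `get_letters_compact` is made up, and it relies on a generator expression and the built-in `sorted()`, which returns a new sorted list.

```python
def get_letters_compact(name):
    # keep alphabetic characters, drop duplicates with set(), then sort
    return sorted(set(char for char in name if char.isalpha()))

print(get_letters_compact('Santa Cruz'))
```

Whether the compact or the step-by-step form is clearer is a judgment call; both should return the same list for the same input.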

### Libraries

Standard libraries:   
* Do not need to be installed   
* Contain many very useful data structures and functions    
* Some common standard libraries: math, statistics, itertools, datetime, pathlib, sqlite3 
   
Third-party libraries:   
* Must be installed before use   
* Some common third-party libraries: scipy, numpy, pandas, matplotlib, seaborn   
   
All libraries must be imported before use. Functions are then called through the library name to avoid "name collisions".   

In [2]:
# try using a standard function that needs to be loaded first
output_value = factorial(3)

NameError: name 'factorial' is not defined

In [4]:
# import the library
import math
output_value = factorial(3)

NameError: name 'factorial' is not defined

In [7]:
# refer to the library to avoid name collisions; now it works!
output_value = math.factorial(3)
output_value

6
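
Python offers a few other import styles besides the plain `import math` shown above. The sketch below uses only the standard `math` and `statistics` libraries; the alias `stats` is an arbitrary choice for this example.

```python
import math
import statistics as stats          # an alias keeps calls short
from math import factorial          # brings a single name in directly

print(math.sqrt(16))                # call through the library name
print(stats.mean([2, 4, 9]))        # call through the alias
print(factorial(3))                 # no prefix needed after a from-import
```

The `from ... import` form is convenient, but it gives up the protection against name collisions, since `factorial` would replace any existing variable or function with the same name.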