# Python Refresher  

This document summarizes what you need to know about Python in Jupyter Notebooks for data science. 

There are three sections:

- **Basic Python Programming** shows you all the basic features you need to get started
- **Advanced Python** explains a number of advanced features which everyone should know about after the first week
- **Appendices** present a number of special topics which may or may not be useful, depending on your project

For a brief introduction to using Jupyter Notebooks, 

I also refer you to the tutorial notebooks on LaTeX, Markdown, NumPy, and Pandas. 


## Basic Python Programming

### First Lesson in Python: Variables and assignment statements

Python is an **interpreted** language, which means it mostly acts like a calculator,
accepting *definitions* of variables and functions, and *expressions* which are evaluated
in the context of the global definitions. 

Defining variables is done using the equals sign. You must line up the beginning of each statement,
and you don't need a semicolon. 

In [1]:
x = 4
y = x + 2
a = [4,5]

print(x)
print(y)
print(a)

4
6
[4, 5]


### Types and variables

Python, like many interpreted languages, is "dynamically typed" (often loosely called "weakly typed"): values have types, but
variables are NOT declared with a type, and in general, you can assign any type of value to
any variable. 

Here are some examples:

In [2]:
x = 4

print(x)

x = 2.3

print(x)

x = True

print(x)

x = "hi there"

print(x)


Types are checked when variables are used, and may generate errors at execution time:

![Screen%20Shot%202023-01-10%20at%2010.08.02%20AM.png](attachment:Screen%20Shot%202023-01-10%20at%2010.08.02%20AM.png)
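
As a minimal sketch of such a run-time type error (the variable names are illustrative and are not taken from the screenshot), adding a string to an integer is rejected only when the line actually executes:

```python
# Illustrative names; the type check happens when the expression is evaluated.
greeting = "hi there"
count = 4

try:
    result = greeting + count        # str + int is not defined
except TypeError as err:
    result = None
    print("TypeError:", err)

# Converting explicitly states the intent and avoids the error.
print(greeting + " " + str(count))
```

Nothing complains when `greeting` and `count` are assigned; the error appears only at the `+`.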

### Second lesson in Python: IF statements and how to indicate scope and nested context

The most distinctive feature of Python is
that it does not use an "end of line" character, and uses indentation to indicate nested contexts--what in other languages might be indicated with curly braces `{....}`:

Java:

    if(count < 20) {
         final_count = count;
         System.out.println("count = " + count);
    }
    
Python:

    if(count < 20):
        final_count = count
        print("count =", count)
        
What in other languages is indicated using opening and closing curly braces
is indicated in Python by indentation (conventionally four spaces per level).  The nesting level of a statement is
indicated by its alignment on the page, NOT by the presence of opening and closing characters such
as `{....}`. 

Therefore you will need to pay attention to where statements line up rather than what
characters surround them. 

The other difference is that at the end of the conditional expression, a colon (:) will introduce
the nested code block. 

Here are some examples of conditional statements. 
        

In [3]:
x1 = 5

if(x1 >2):
    print("x1 is greater than  2")

x2 = 1
if(x2 >2):
    print("x2 is greater than  2")
else:
    print("x2 is NOT greater than 2")
    
x3 = 7
if(x3 < 2):
    print("x3 < 2")
elif(x3 < 5):
    print("2 <= x3 < 5")
elif(x3 < 8):
    print("5 <= x3 < 8 ")
else:
    print("x3 >= 8")

x1 is greater than  2
x2 is NOT greater than 2
5 <= x3 < 8 
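
To make the role of alignment concrete, here is a small sketch with two levels of nesting (the variable names `count` and `label` are made up for illustration). Each statement belongs to the block it lines up with:

```python
count = 12

if count < 20:
    print("count is below 20")       # inside the outer if
    if count % 2 == 0:
        label = "small and even"     # inside BOTH ifs
    else:
        label = "small and odd"
    print("done with inner test")    # back in the outer if only
else:
    label = "large"

print(label)                         # not indented: always runs
```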


### Third lesson in Python: lists as the primary data structure

Python, like most interpreted languages, makes heavy use of lists rather than arrays. What's the difference? Mostly, that lists can change length, and (because variables and list elements are not restricted to one type) can hold any kind of data:

In [None]:
A = [1,2,3,4]
B = ["hi", "there", "Paul"]
C = [a,'hi', True,[2,'b']]

In [5]:
A

[1, 2, 3, 4]

In [6]:
print(B)

['hi', 'there', 'Paul']


In [7]:
print(C)

[[4, 5], 'hi', True, [2, 'b']]


In [8]:
# Pulling stuff out of lists

print(B[2])

# Using negative indices (going right to left)

print(B[-1])     # last element of the list

Paul
Paul


In [9]:
# Replacing elements of lists

B[2] = 'Wayne'
print(B)

['hi', 'there', 'Wayne']


You can use ranges of indices to extract parts of lists. Just remember that the second number in
the range is always 1 more than the last index you want. 

In [10]:
B[1:]

['there', 'Wayne']

In [11]:
B[:2]

['hi', 'there']

In [12]:
B[1:3]

['there', 'Wayne']

In [13]:
B[:]

['hi', 'there', 'Wayne']

#### The two most useful operations on lists

Here are two ways you will manipulate lists most often. For additional functions, see the section **Useful List Functions** in **Advanced Python** below. 

In [14]:
# concatenate two lists using +

D = B + ['how','are','you']
print( D )

['hi', 'there', 'Wayne', 'how', 'are', 'you']


In [15]:
# Check if an object is a member of a list using in

if( 'hi' in D ):
    print('hi is in the list')
else:
    print('hi is not in the list')

hi is in the list


#### Strings can be manipulated as lists

Strings have their own set of associated functions, but they behave much like lists of characters, and
you can call many (but not all) list functions on them. The main difference is that strings are immutable: you cannot assign to `s[0]`. 

In [16]:
s = "Hi there Wayne"

In [17]:
len(s)

14

In [18]:
max(s)

'y'

In [19]:
s[2:7]

' ther'
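
One way in which strings differ from lists deserves a quick sketch: a string cannot be changed in place. The snippet below reuses the string `s` from above; the other names are illustrative.

```python
s = "Hi there Wayne"

# Indexing and slicing work exactly as for lists.
first_word = s[:2]                   # 'Hi'

# Assigning to a position fails, because strings are immutable.
try:
    s[0] = "h"
    modified = True
except TypeError:
    modified = False

# Build a new string instead of changing the old one.
t = "h" + s[1:]
print(first_word, modified, t)
```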

### Addendum to Third Lesson:  Tuples 

An alternative to lists is *tuples*, which are essentially immutable lists, i.e., once created, they cannot be modified; this is necessary in some cases, for example when a sequence is used as a dictionary key. Mostly, you can just treat them like lists, 
as shown here:

In [20]:
D = ("hi", "there", "Paul")

In [21]:
D[1:]

('there', 'Paul')

In [22]:
D[:2]

('hi', 'there')

In [23]:
D[1:3]

('there', 'Paul')

In [24]:
len(D)

3

But if you try to change a tuple's contents, you will get an error:

![Screen%20Shot%202023-01-10%20at%2010.29.50%20AM.png](attachment:Screen%20Shot%202023-01-10%20at%2010.29.50%20AM.png)
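
The screenshot is not reproduced in text form, so here is a hedged sketch of the same kind of failure, plus one situation where immutability is required: dictionary keys (dictionaries are covered later in this document). The names `names` and `distances` are illustrative.

```python
names = ("hi", "there", "Paul")

# Attempting to replace an element raises TypeError.
try:
    names[2] = "Wayne"
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# Because tuples are immutable they can be dictionary keys; lists cannot.
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[(3, 4)])             # 5.0

try:
    bad = {[3, 4]: 5.0}
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)                   # False
```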

### Fourth Lesson:  Loops: while and for

The while statement should cause no problems, just remember to indent and
use a colon (:) at the end of the condition, which introduces the indented
code block:

In [25]:
x = 4
while(x < 10):
    print(x)
    x += 1                # Python has no "++" operator, so use += to increment a variable
    

4
5
6
7
8
9


#### For loops

`for` loops iterate over the elements of a sequence, such as a list or the values produced by the `range(...)` function. Just remember
that the upper bound in a range is always 1 more than the last number you want to produce:

In [26]:
for k in [1,2,3,5]:
    print(k)

1
2
3
5


In [27]:
list(range(10))        # a single number is assumed to be an upper bound with a starting value of 0

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [28]:
list(range(2,10))

[2, 3, 4, 5, 6, 7, 8, 9]

In [29]:
list(range(2,10,3))      # a third argument is the skip

[2, 5, 8]
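
The cells above wrap `range(...)` in `list(...)` only to display its values. In a loop you can use it directly, as in this short sketch (the variable names are illustrative); `enumerate(...)` is a related built-in for when you need both the index and the element.

```python
# range(...) can be used directly in a for loop; no list(...) is needed.
total = 0
for k in range(1, 6):                # k takes the values 1, 2, 3, 4, 5
    total += k
print(total)                         # 15

# enumerate(...) yields (index, element) pairs when you need both.
names = ["hi", "there", "Wayne"]
pairs = []
for i, name in enumerate(names):
    pairs.append((i, name))
print(pairs)
```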

### Fifth Lesson:  Defining functions

Functions are defined using the keyword `def` and using the colon and indented block syntax
we have seen above.  Functions which return a value must use `return`:

In [31]:
def f(x):
    print(x,x,x)
    
f(5)

5 5 5


In [32]:
def is_even(x):
    if(x % 2 == 0):
        return True
    else:
        return False
    
is_even(5)

False

In [33]:
print("here is a definition")

def sayHi(name):
    return "Hi there, " + name

sayHi('Paul')

here is a definition


'Hi there, Paul'

Functions introduce a local scope for variables which are not available outside the function, just
as in Java and other languages.  You may read variables defined outside the function, but if you want
to assign to them, you have to declare them as **global**. If you don't do this, the assignment creates a new local variable instead, and reading that variable before it has been assigned (as in `N = N + x`) raises an error!


In [34]:
N = 5

def add2N(x):
    global N
    N = N + x
    
add2N(2)

print(N)

7
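
The three possible behaviors can be seen side by side in this sketch. The function and variable names are made up for illustration; none of the functions use `global`.

```python
counter = 5                          # illustrative global variable

def read_only():
    return counter + 1               # reading a global needs no declaration

def shadow():
    counter = 100                    # creates a NEW local variable
    return counter

def broken_increment():
    counter = counter + 1            # local 'counter' is read before it is assigned
    return counter

print(read_only())                   # 6
print(shadow(), counter)             # 100 5  (the global is unchanged)

try:
    broken_increment()
    failed = False
except UnboundLocalError:
    failed = True
print(failed)                        # True
```

Adding `global counter` as the first line of `broken_increment` would make it behave like `add2N` above.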


## Advanced Features of Python

### Useful List Functions

Here are some examples of useful functions that work on lists. There are two important things to remember
about calling a function on a list, the first having to do with the syntax and the second having to
do with the effect of the function:

1. Functions will be called in one of two ways, either as a function:

    n = max(C)
    
or using the dot notation familiar from object-oriented languages such as Java:

    k = C.index(3)
    
2. Functions may return a value (perhaps another list), leaving the original list unchanged:

    C1 = C.copy()
    
or they may modify the list "in place" and return <code>None</code>. In this case, you use them
as "imperative" statements that modify the list:

    C.append(6)
    
**Note that functions which modify the original list will always use the "dot notation."**
    
Here are examples of the most useful functions which return values. 

In [35]:
C = [1,3,4,2,3,4,5]

# Return the length of the list

print(len(C))

# Return a count of how many times a given element occurs in the list

print(C.count(4))

# Return the index of the first occurrence of an element (error if not found)

print(C.index(2))

# Return the largest/smallest element in the list

print(max(C))
print(min(C))

# Return a sorted copy of the list, in ascending order by default

print(sorted(C))
print(sorted(C,reverse=True))

# Return a copy of the list (two different ways)

print(C.copy())
print(C[:])


7
2
3
5
1
[1, 2, 3, 3, 4, 4, 5]
[5, 4, 4, 3, 3, 2, 1]
[1, 3, 4, 2, 3, 4, 5]
[1, 3, 4, 2, 3, 4, 5]


Here are two useful functions which change the list in place and return None. 

In [36]:
# Add a new element to the end of the list

C.append(12)
print(C)

# Sort the list in place

C.sort()
print(C)

C.sort(reverse=True)
print(C)

[1, 3, 4, 2, 3, 4, 5, 12]
[1, 2, 3, 3, 4, 4, 5, 12]
[12, 5, 4, 4, 3, 3, 2, 1]
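
A common slip with in-place functions is to assign their result back to the variable, which replaces the list with `None`. The following sketch, using illustrative names, shows the difference between `sort()` and `sorted(...)`:

```python
C = [3, 1, 2]

result = C.sort()                    # sorts C in place and returns None
print(result)                        # None
print(C)                             # [1, 2, 3]

# If you want a new list and the original left alone, use sorted(...).
original = [3, 1, 2]
ordered = sorted(original)
print(original, ordered)             # [3, 1, 2] [1, 2, 3]
```

So `C = C.sort()` would leave `C` equal to `None`, which is rarely what you want.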


### List Comprehensions

A list comprehension is a great way to create lists with a minimum of fuss and bugs. 
The basic idea is that instead of creating a list by specifying every step, say like this:

In [37]:
# Create a list of the first 10 squares

L = [0] * 10         # create a list of 10 zeros

for k in range(len(L)):
    L[k] = (k+1)**2
L

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

you can do it all in one line:

In [None]:
L1 = [  (k+1)**2  for k in range(len(L))  ]
print(L1)

The idea is to collect together all values of the expression at the beginning, for all values of k produced
by the `for`. Some examples may clarify. 

In [None]:
from random import random            # The function random() returns a random double in the range [0..1)

L2 = [  random()  for k in range(10)  ]
print(L2)

You can also use multiple for loops:

In [None]:
D = ['0','1','2','3','4']
L = ['A','B','C','D','E']
     
X = [ d + el for d in D for el in L ]
print(X)

However, watch out, because the order of the **for**'s must be the same as if you were writing a loop:

In [None]:
T = [ [ 1,2,3] , [4,5,6], [7,8,9]]

Tall = [ x for lst in T for x in lst ]

Tall
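
To check the ordering rule, this sketch writes the same flattening once as explicit nested loops and once as a comprehension. The names `flat_loop` and `flat_comp` are illustrative.

```python
T = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Explicit nested loops: the outer loop comes first.
flat_loop = []
for lst in T:
    for x in lst:
        flat_loop.append(x)

# The comprehension lists the for clauses in the same order.
flat_comp = [x for lst in T for x in lst]

print(flat_loop == flat_comp)        # True
print(flat_comp)                     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Writing the clauses the other way round (`for x in lst for lst in T`) would try to use `lst` before the comprehension has defined it.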

List comprehensions can do a lot, especially if you use conditions in the "loop" part:

In [None]:
L3 = [63,241,7,43,99,132,6,-3,71,235,24,66]  

# let's pick 63 as the "pivot" for quicksort and partition the list into those
# numbers less than 63 and those greater or equal:

left = [ x for x in L3 if x < 63 ]
right = [ x for x in L3 if x >= 63 ]

print(L3)
print(left)
print(right)

In [None]:
#  List comprehensions can be used to write very complicated algorithms with very few lines of code!

def quicksort(L):
    if(L == []):
        return []
    else:
        pivot = L[0]
        left  = [ x for x in L[1:] if x < pivot ]       # partition list around pivot
        right = [ x for x in L[1:] if x >= pivot ] 
        return (  quicksort(left) + [pivot] + quicksort(right) )

quicksort(L3)

### Sets

A set in mathematics is a collection of elements in which there are no duplicates and order does not matter.
In Python, we can create sets, which are implemented by hash tables, and are much more efficient than lists for membership tests, if
a set is really what you want. 


In [None]:
# create a set
S= {2,3,4,5}
print(S)

# duplicates are ignored
T = { 3, 4,3 }
print(T)
print()

# membership test
print( (3 in S) )

# subset test
print(  (T.issubset(S)) )
print()

# add an element to a set
S.add(7)
print(S)

# remove an element from a set
S.remove(3)
print(S)
print()

# create a new set using set operations union, intersection, and set difference

A = {'a', 'c', 'd'}
B = {'c', 'd', 2 }
C = {1, 2, 3}

print('A =',A)
print('B =',B)
print('C =',C)
print()

print('A U B =', A.union(B))
print('B U C =', B.union(C))
print('A U B U C =', A.union(B, C))
print('A.union() =', A.union())              # just make a copy
print()

# return a new set which is the intersection of others
print('A n B =',B.intersection(A))
print('B n C =',B.intersection(C))
print('A n C =',A.intersection(C),'     <- this is how empty set can be represented')
print('C n A n B =',C.intersection(A, B))
print()

# return a new set which is the set difference with another
print('A - B =',A.difference(B))
print('B - A =',B.difference(A))
print('A - C =',A.difference(C))
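
The method calls above also have operator forms, and there are two small pitfalls worth knowing. This is a brief sketch that reuses the sets `A` and `B` from the cell above; the other names are illustrative.

```python
A = {'a', 'c', 'd'}
B = {'c', 'd', 2}

# Operator forms of the set methods.
print((A | B) == A.union(B))         # True
print((A & B) == A.intersection(B))  # True
print((A - B) == A.difference(B))    # True

# {} creates an empty DICTIONARY; use set() for an empty set.
empty = set()
print(type({}).__name__, type(empty).__name__)   # dict set

# A common use: removing duplicates from a list (order is not preserved).
unique = set([3, 4, 3, 2, 4])
print(sorted(unique))                # [2, 3, 4]
```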


### Dictionaries

A dictionary is a data structure which stores (key,value) pairs (typically implemented as a hash table).
This is a great data structure for storing information about objects without having to muck around with lists.


In [None]:
D = { 'a' : 2, 'c' : 8, 'b' : 1}      # Dictionary storing, say, how many times a letter appears in a string
print(D)
E = {}        # empty dictionary
print(E)


#### Basic Dictionary Operations

Most of the manipulation of dictionaries looks like array or list manipulation
but using the keys instead of the position of the elements in the list. 


In [None]:
# Find the value associated with a particular key
D['c']


In [None]:
# Insert a key-value pair 
D['a'] = 4
D['z'] = 23
D

In [None]:
# update a value associated with a key by doing both
D['z'] = D['z'] + 2
D

In [None]:
### Miscellaneous functions

# get all keys
print( D.keys() )
print( list(D.keys()) )       # to just get a list

# get all values
print( list(D.values()) )

In [None]:
# Default values for dictionaries

# simplest: use the function get(...)
x = D.get('q',0)         # first argument to get is the key, the second is the default
                         # value if the key is not in the dictionary
print(x)

In [None]:
# The second way is to use a different kind of dictionary, which you have to import,
# and then define a function to return the default value

from collections import defaultdict

def get_default():
    return 0

A = defaultdict(get_default)

A['a'] = 5
print( A['a'] )
print( A['b'] )

In [None]:
### Calculating frequency counts using a dictionary

from collections import Counter       

F = Counter([3,4,2,3,4,5,4,3,2,3,4,5,4,3,8])

print("Counter creates a dictionary giving the frequency counts of each element in the list.")
print(F)

n = 3
print("The list has",F[n], 'instances of the number '+ str(n) + '.')
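
To see what `Counter` does for you, here is a sketch that builds the same frequency table by hand in two ways, using the `get(...)` and `defaultdict` techniques shown above. The names `data`, `counts_get` and `counts_dd` are illustrative.

```python
from collections import Counter, defaultdict

data = [3, 4, 2, 3, 4, 5, 4, 3, 2, 3, 4, 5, 4, 3, 8]

# Plain dictionary with get(...) supplying 0 for unseen keys.
counts_get = {}
for value in data:
    counts_get[value] = counts_get.get(value, 0) + 1

# defaultdict(int): int() returns 0, so no helper function is needed.
counts_dd = defaultdict(int)
for value in data:
    counts_dd[value] += 1

print(counts_get)
print(counts_get == dict(counts_dd) == dict(Counter(data)))   # True
```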

### Plotting and Graphing

The <code>scatter(...)</code> function is used to display points from a list of x values and the associated y values. 

In [None]:
import matplotlib.pyplot as plt


# To plot the points (1,2), (2,3), (3,6), (4,8) we would list the x values and the corresponding y values:
X = [1,2,3,4]
Y = [2,3,6,8]

print("\nThis is the list of points:",list(zip(X,Y)))
print("They must be input to the function as separate lists:")
print("\tX =",X)
print("\tY =",Y,"\n")

plt.scatter(X,Y)
plt.title('Graphing Points with scatter(X,Y)')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()



####  Plotting curves

If you call <code>plot(...)</code> instead of <code>scatter(...)</code> you will display a curve created by connecting the points with straight lines. Essentially you can only plot straight lines between points, but if the points are close together, you will not notice, and it will look like a smooth curve. 

In [None]:
from math import sin

# To plot a curve through the points (1,2), (2,3), (3,6), (4,8) we would use: 
plt.plot([1,2,3,4], [2,3,6,8])
plt.title('A Sequence of Straight Lines')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()

import numpy as np

X = np.linspace(1,20,100)            # returns an array of 100 equally-spaced values in the range [1..20]
Y = [sin(x) for x in X]
plt.plot(X,Y)
plt.title('A Smooth-Looking Curve')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()

If you leave out the $X$ values, `plot(...)` will assume that you want the x-axis labeled 0, 1, ..., (len(Y)-1),
corresponding to the indices of an array or list:

In [None]:
# Plot a list against the indices
Y = [2,3,6,8]

plt.plot(Y)
plt.title('Graphing lines with plot(Y)')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()

If you want to show both the points and the lines connecting them, you can simply call both functions before you call show(). 

In [None]:
plt.scatter([1,2,3,4], [2,3,6,8])
plt.plot([1,2,3,4], [2,3,6,8])
plt.title('A Curve Through Some Points, Showing the Points')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()

If you want to draw a single line from $(x_1,y_1)$ to $(x_2,y_2)$ you can plot $[x_1,x_2]$ and $[y_1,y_2].$

Here we have added a zero line to our sin curve:

In [None]:
X = np.linspace(1,20,100)            # returns an array of 100 equally-spaced values in the range [1..20]
Y = [np.sin(x) for x in X]
plt.plot(X,Y)
plt.plot([0,20],[0,0])
plt.title('A Smooth-Looking Curve')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()

#### Bar Charts

If we do the exact same thing as we did with a simple plot, but use the function <code>bar(...)</code> we get a bar chart:

In [None]:
# To plot the points (1,2), (2,3), (3,6), (4,8) we would list the x values and the corresponding y values:
plt.figure(num=None, figsize=(8, 6))
plt.bar([1,2,3,4], [2,3,6,8])
plt.title('A Bar Chart')
plt.xlabel("The X Values")
plt.ylabel("The Y Values")
plt.show()

If the Y axis is probabilities (in the range 0 .. 1), we get a distribution of the probabilities among the outcomes of an experiment:

In [None]:
# Show the distribution of probabilities for a coin flip:
x = [0,1]
y = [0.5, 0.5]
labels = ['Heads', 'Tails']

plt.figure(num=None, figsize=(8, 6))
plt.xticks(x, labels)
plt.bar(x,y)
plt.title('Probability Distribution for Coin Flips')
plt.ylabel("Probability")
plt.xlabel("Outcomes")
plt.show()

With a few tweaks, you can create an attractive bar chart for arbitrary probability distributions:

In [None]:
import matplotlib.pyplot as plt

# Show the distribution of the number of tosses of a die needed to get the first 6
x = [k for k in range(1,30)]
y = [5**(k-1)*6**(-k) for k in x]

plt.figure(num=None, figsize=(14, 6))
plt.bar(x,y, width=1.0,edgecolor='black')
plt.title('Probability Distribution for tossing a die until a 6')
plt.ylabel("Probability")
plt.xlabel("Outcomes")
plt.show()
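
The formula used for `y` above can be checked without plotting. In this sketch, which assumes a fair six-sided die, the expression `5**(k-1)*6**(-k)` is compared with the more familiar form $(5/6)^{k-1}\cdot(1/6)$, and the bars are summed to confirm that they account for almost all of the probability:

```python
# P(first 6 on toss k) = (5/6)**(k-1) * (1/6), which equals 5**(k-1) * 6**(-k).
x = list(range(1, 30))
y = [5**(k-1) * 6**(-k) for k in x]

same = all(abs(5**(k-1) * 6**(-k) - (5/6)**(k-1) * (1/6)) < 1e-12 for k in x)
print(same)                          # True

# The bars shown cover k = 1..29, so they sum to 1 - (5/6)**29, just under 1.
total = sum(y)
print(total, 1 - (5/6)**29)
```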

#### Histograms
- If you give a list of values to <code>hist(...)</code> it will create a histogram counting how many of each value occur; this list can be unordered;
- You will get a cleaner display if you specify where the edges of the bins are, and make sure the edges of the bins are visible, as shown in this example:

In [None]:
plt.figure(num=None, figsize=(8, 6))
plt.hist([1,2,4,2,6,2,4,5,6,4,6,3,4,3],bins=[0.5,1.5,2.5,3.5,4.5,5.5,6.5],edgecolor='black')
plt.title('Sample Histogram')
plt.xlabel("Outcomes")
plt.ylabel("Frequency")
plt.show()

### Customizing Your Plots

One thing you have probably noticed is that when you write "bare-bones" code such as we have above, certain
defaults are used for the size and layout of the figure and the style of the drawing. One of the most noticeable is that when you draw multiple lines, Matplotlib will change the color each time you call the same function (notice that this doesn't happen when calling a different function, e.g., plot followed by scatter). 

#### Using Colors

Matplotlib cycles through a sequence of 10 colors, which is fine if that is what you want. For my taste, they are pretty ugly, and in the next section we will show you how to use the colors you want. 


In [None]:
print("\nThe 10-color Matplotlib sequence, starting at 12 o'clock and going clockwise:")

plt.figure(figsize=(10,10))
for k in np.arange(0,2*np.pi,np.pi/20):                 # arange is like range, except it allows you to use floats
    plt.plot([0,np.sin(k)],[0,np.cos(k)],lw=4)
plt.title('Line Colors',fontsize=14)
plt.xlim([-1.2,1.2])
plt.ylim([-1.2,1.2])
plt.show()

Here is an example where we simply change the colors of the plot using the appropriate parameter; see a complete list of colors here: https://matplotlib.org/2.0.2/api/colors_api.html

In [None]:
# EXAMPLE: Plotting a square with lines of different colors
plt.figure(num=None, figsize=(8, 8), dpi=89)
plt.plot([0,1],[0,0],color='red') # Line connecting (0,0) to (1,0)
plt.plot([0,0],[0,1],color='green') # Line connecting (0,0) to (0,1)
plt.plot([0,1],[1,1],color='orange') # Line connecting (0,1) to (1,1)
plt.plot([1,1],[0,1],color='black') # Line connecting (1,0) to (1,1)

#### Changing the Style of Plots

Here is an example showing how to

  - change the size of the whole figure
  - change the color of lines or points
  - change the style of lines or points
  
To see a complete list of line styles see:  https://matplotlib.org/2.0.2/api/lines_api.html

To see a complete list of colors see: https://matplotlib.org/2.0.2/api/colors_api.html

To see a complete list of marker (point) styles see:  https://matplotlib.org/2.0.2/api/markers_api.html#module-matplotlib.markers

In [None]:
# EXAMPLE: Plotting a square via lines 
plt.figure(figsize=(12, 12))             # the size is (horizontal, vertical)
plt.title("Line Colors and Styles",fontsize=16)
plt.plot([0,10],[0,0],  color='red') # Line connecting (0,0) to (10,0)
plt.plot([0,0],[0,10],  color='green') # Line connecting (0,0) to (0,10)
plt.plot([0,10],[10,10],color='orange') # Line connecting (0,10) to (10,10)
plt.plot([10,10],[0,10],color='black') # Line connecting (10,0) to (10,10)
plt.plot([1,9],[9,9], linewidth=5)    # give a linewidth in points, default is 1.5
plt.plot([1,9],[8,8], linewidth=5,color = '#eaafff')    # for custom color give the RGB value in hex
plt.plot([1,9],[7,7], linewidth=5,color='0.75') # for grey give the fraction of white (0 = black, 1 = white) as a string
plt.plot([1,9],[6,6], lw=4,linestyle='--') # Linestyles
plt.plot([1,9],[5,5], lw=3,linestyle='-.') # Linestyles
plt.plot([1,9],[4,4], lw=2,linestyle=':') # Linestyles

plt.scatter(range(1,10),[3.5]*9,marker='.')  # various markers, if you don't specify the colors it will cycle
plt.scatter(range(1,10),[3]*9,marker='o')    # through a bunch of colors, starting with blue, orange, green, etc.
plt.scatter(range(1,10),[2.5]*9,marker='x')
plt.scatter(range(1,10),[2]*9,marker='s')
plt.scatter(range(1,10),[1.5]*9,marker='^')
plt.scatter(range(1,10),[1]*9,marker='D')
plt.show()

#### Et Cetera

Then you can start getting obsessive, adding gridlines, changing the background color,  adding legends, text, and so on. 

Another nice feature of matplotlib is that you can insert simple LaTeX commands into titles and text:

In [None]:
x = [i for i in range(11)]
y = [i**2 for i in x]


plt.figure(figsize=(8, 8))
plt.title('Graph of $y = x^2$')
plt.xlabel("X")
plt.ylabel("Y")
plt.grid()
plt.plot(x,y)
plt.show()


plt.figure(figsize=(8, 8))
plt.title('Graph of $y = x^2$')
plt.grid(color='w')                # grid of white lines -- don't use points with this, they look funny
plt.gca().set_facecolor('0.95')    # background of light grey
plt.plot(x,y,color='b')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()


plt.figure(figsize=(8, 8))
plt.title('Graph of $y = x^2$')
plt.grid(color='r',alpha=0.1)       # alpha sets the transparency, 0 = invisible and 1 = normal           
plt.plot(x,y,color='r',lw=0.5,label='Curve')
plt.scatter(x,y,color='r',marker='o',label='Points')
plt.legend()
plt.xlabel("X")
plt.ylabel("Y")
plt.show()


plt.figure(figsize=(8, 8))
plt.title('Graph of $y = x^2$',fontsize=16)
plt.xlabel("X",fontsize=14)
plt.ylabel("Y",rotation=0,fontsize=14)
plt.grid(color='0.95')
plt.text(0,90,"The title has been enlarged from default 12 points to 16 points.")
plt.text(0,80,"Notice that the $y$ axis label is rotated to be upright, \nand the x and y labels are also bigger, at 14 points.")   # lower left corner of text string is at point (0,80)
plt.text(0,60,"When drawing points and lines together it looks \nbetter if you make the lines thinner.")
plt.text(0,40,"Honestly I think it is also better to just use\nthe default marker (circles) when you draw \nlines, these triangles are kinda fussy\nand they don't seem to be centered on \nthe data point.")
plt.plot(x,y,color='b',lw=0.5)
plt.scatter(x,y,color='b',marker='^')
plt.show()




### Optional Arguments to Functions

Python allows a lot of flexibility when you define functions; in particular,
it allows you to define functions with a variable number of arguments,
and optional arguments with default values. This simplifies the syntax
when you have a function which you want to use in a variety of contexts;
many of the statistical functions operate in this way, for example. 

In [None]:
# Variable number of arguments, this looks a lot like how it is done in C

# The following function collects together all its arguments into a tuple,
# which can then be accessed as usual inside the function

def my_sum(*args):
    print('All the arguments:', args)
    print('Number of arguments:', len(args))
    if len(args) > 0:
        print('Last argument:', args[-1])
    return sum(args)
        
print('Sum = ', my_sum(2,2,5,4,7,-3))
print()
print('Sum = ', my_sum(4))
print()
print('Sum = ', my_sum())
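
The `*` also works in the other direction: in a call, it unpacks a list into separate arguments. This sketch uses a trimmed copy of `my_sum` without the print statements; the list name `values` is illustrative.

```python
def my_sum(*args):
    return sum(args)

values = [2, 2, 5, 4, 7, -3]

# The * in the CALL unpacks the list into separate positional arguments.
print(my_sum(*values))               # 17

# Without the *, the function receives ONE argument (the whole list),
# and sum(...) fails because it tries to add 0 to a list.
try:
    my_sum(values)
    worked = True
except TypeError:
    worked = False
print(worked)                        # False
```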

In [None]:
# Optional Arguments with Default Values

# You may give arguments with default values using
# an initialization in the definition, as shown here:

# x is a normal parameter: you MUST supply it.
# a, m, and b are optional: if you leave one out, it will get its default value.
def LC_Hash(x, a = 2137, m = 6827, b = 0):
    print('x =',x, ', a =',a, ', m =',m, ', b =',b)
    return ((a * x) % m) + b

# Here we give all four values

print( LC_Hash( 5, 3,7, 1) )         # x <- 5, a <- 3, m <- 7, b <- 1
print()

# Here we give only the first; the rest will take their default values

print( LC_Hash( 5 ) )         # x <- 5, a <- 2137, m <- 6827, b <- 0
print()

# Here we give the first two

print( LC_Hash( 15, 23 ) )         # x <- 15, a <- 23, m <- 6827, b <- 0
print()

# Here we give only the first three

print( LC_Hash( 15, 7, 11 ) )         # x <- 15, a <- 7, m <- 11, b <- 0
print()



In [None]:
# It is best to give values for a prefix of the arguments, in order, and not skip around
# When in doubt, use the name of the parameter

print( LC_Hash( 15, 11, m = 23, b = 3 ) )      
print()

Basically, Python matches the parameters and arguments by position, and THEN by
name:
<pre>
print( LC_Hash( 15, 11,  m = 23, b = 3 ) )      
     
   by position: ****** |
   by name:            | +++++++++++++


</pre>


If you try to do the reverse, it will complain:


![Screen%20Shot%202020-01-31%20at%207.14.28%20PM.png](attachment:Screen%20Shot%202020-01-31%20at%207.14.28%20PM.png)
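
The screenshot is not available in text form. The sketch below assumes the complaint in question is the `SyntaxError` Python raises when a positional argument follows a keyword argument. It uses a trimmed copy of `LC_Hash` without the print statement, and the offending call is held in a string so that the rest of the example can still run.

```python
def LC_Hash(x, a=2137, m=6827, b=0):
    return ((a * x) % m) + b

# Positional arguments first, then keyword arguments: fine.
print(LC_Hash(15, 11, m=23, b=3))    # 7

# A positional argument AFTER a keyword argument is rejected before the code
# even runs, so compile(...) is used here to capture the SyntaxError.
bad_call = "LC_Hash(15, m=23, 11)"
try:
    compile(bad_call, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)                      # True
```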

### Passing lists or functions as parameters to functions

Remember that Python passes arguments as references to objects (much like pointers). Thus if you pass a list into
a function, and modify the list in the function, the modifications will still be in effect after the function exits.


In [None]:
def changeList(L):
    L[0] = 5
    
Lst = [1,2,3,4]

print(Lst)
changeList(Lst)
print(Lst)

In [None]:
# Passing a function into another function

def plus_one(x):
    return x + 1

def g(f,x):
    return f(x)

g(plus_one,4)
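
One subtlety about list parameters: modifying the list that was passed in is visible to the caller, but assigning a new list to the parameter name is not. A minimal sketch, with illustrative function names:

```python
def change_in_place(L):
    L[0] = 5                         # mutates the list the caller passed in

def rebind(L):
    L = [9, 9, 9]                    # rebinds the LOCAL name only

Lst = [1, 2, 3, 4]

change_in_place(Lst)
print(Lst)                           # [5, 2, 3, 4]

rebind(Lst)
print(Lst)                           # [5, 2, 3, 4]  (unchanged by rebind)
```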

### Shallow vs Deep Copies of Lists

This discussion is based on the one found <a href="https://www.afternerd.com/blog/python-copy-list/">here</a>.

The question is:

> How do you make a copy of a list?

The answer depends on what you mean by a *copy*!  If you mean another reference to the same list (strictly speaking an *alias*, although it is loosely called a **shallow copy** here; it is like the fact that my oldest son is called "John" or "JH" -- two names for the same person),
just assign the list to another name:

In [None]:
A = [1,2,3,4]
B = A
print(A)
print(B)

But to see that there is only one list (which now has two names), just make changes to each:

In [None]:
A = [1,2,3,4]
B = A

A[0] = 'a'

print(A)
print(B)
print()

B[3] = 'c'

print(A)
print(B)

An assignment like this copies only the *reference* to the list, not the list contents:
 
         A = [1,2,3,4]        
                             A -> [1,2,3,4]
                             
         B = A 
                             A -> [1,2,3,4]  <- B
                             
         A[0] = 'a'
                             A -> ['a',2,3,4]  <- B 
                             
         B[3] = 'c'
                             A -> ['a',2,3,'c']  <- B 
                             

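To make a new list that can be modified independently of the original, which this section calls a **deep copy**, you can use
a slice, the list constructor, or the list's `copy` method. Note that these copy only the top level of the list: any nested lists inside it are still shared, and copying those as well requires `copy.deepcopy` from the standard `copy` module: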

In [None]:
A = [1,2,3,4]

# Copying a list by slicing
B = A[:]

# Copying a list using the list constructor
C = list(A)

# Copying a list using the list's copy method (Python 3):
D = A.copy()

# These are all different lists:

A[0] = 'a'
B[1] = 'b'
C[2] = 'c'
D[3] = 'd'

print(A)
print(B)
print(C)
print(D)


A deep copy makes an entirely new list:
 
         A = [1,2,3,4]        
                             A -> [1,2,3,4]
                             
         B = A[:] 
                             A -> [1,2,3,4] 
                             B -> [1,2,3,4] 
                             
         C = list(A)
                             A -> [1,2,3,4] 
                             B -> [1,2,3,4] 
                             C -> [1,2,3,4]
                             
         D = A.copy()
                             A -> [1,2,3,4] 
                             B -> [1,2,3,4] 
                             C -> [1,2,3,4] 
                             D -> [1,2,3,4]
                             
         A[0] = 'a'
                             A -> ['a',2,3,4] 
                             B -> [1,2,3,4] 
                             C -> [1,2,3,4] 
                             D -> [1,2,3,4]

         B[1] = 'b'
                             A -> ['a',2,3,4] 
                             B -> [1,'b',3,4] 
                             C -> [1,2,3,4] 
                             D -> [1,2,3,4]
                             
               --and so on--
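
The distinction matters most for nested lists. Note that the Python documentation uses the terms differently from this section: there, a top-level copy such as `A[:]` is itself called a shallow copy, and a deep copy is one made with `copy.deepcopy`. The sketch below, with illustrative names, shows all three cases side by side:

```python
import copy

nested = [[1, 2], [3, 4]]

alias = nested                       # same list object, second name
top_level = nested[:]                # new outer list, inner lists shared
full = copy.deepcopy(nested)         # new outer list AND new inner lists

nested[0][0] = 'a'                   # change an INNER list
nested.append([5, 6])                # change the OUTER list

print(alias)                         # [['a', 2], [3, 4], [5, 6]]
print(top_level)                     # [['a', 2], [3, 4]]
print(full)                          # [[1, 2], [3, 4]]
```

For flat lists of numbers or strings, as in the examples above, the top-level copy and `copy.deepcopy` behave the same.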

###  Features/Bugs of Python to Watch out for

There are three things about Python that you need to know, if you
are more familiar with a language like Java or C. 

(1) Python is an **interpreted** language, which means that programs are processed
in a "Read-Eval-Print" loop: 
input expressions or definitions or assignment statements are read, evaluated,
and the result printed out, and then it starts all over again with the next expression. 
You can see the order in which the cells were evaluated by looking at the number in square brackets to the left: 

    In [87]:

(2)  Python is a **dynamically typed** language (loosely, "weakly typed"), which means that values have types (of course) but 
variables are not declared with a type and are not restricted to values of one type.
Any variable can represent any type of value. 


(3) Python maintains a **global memory** of all definitions (function 
definitions, and assignment statements), which is maintained until
you terminate or restart the kernel.    This feature causes a lot of problems!

Features (2) and (3) make it difficult to keep track of the variables
in your program, and are the major source of bugs when you are
first learning Python.  Unfortunately, Jupyter notebooks do not
help, and in fact make these features more difficult to handle. 
We will develop strategies to minimize
these problems. 


### Managing variables in a dynamically typed language (feature 2)


Python does not force all variables to be declared with a type, as in Java and C,
and this leads to problems. The main problem is that you can accidentally reuse
a variable name with a different meaning. Here
is a variable X being used in three different ways. Confusing? YES. 

In [None]:
X = 4
print(X)

X = 'hi'
print(X)

X = [X, 'folks']              # The second X refers to the previous definition 
print(X)

You might think this is not a big deal, but if you make a habit of using only a small number of variable names
such as X, Y, x, k, i, etc. and if these occur over the WHOLE range of your notebook, you will almost certainly have some confusion somewhere about what a variable means. 

Even worse, Python allows you to redefine the standard names of functions, so the following, if you uncomment the last two lines, 
creates a confusing bug:

#### TODO:  

In the next cell, remove the hash mark from the last two lines, to "uncomment" them. Run the cell, and observe that it
seems fine. 

Now run it once more: it will generate a weird error, because the call to the constructor <code>list</code>
is now trying to call the list [2,5,4], which is not a function. You've replaced the binding between the name <code>list</code> and its
built-in definition. 

Finally, put the hash marks back in, and hit `Restart & Run All` from the `Kernel` menu. 

<b>NOTE:  IF YOU GET A WEIRD BUG THAT MAKES NO SENSE, DO `Kernel -> Restart & Run All` BEFORE DOING ANYTHING ELSE. IT OFTEN FIXES THE PROBLEM.</b> 
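
The rebinding problem described above can be reproduced in a few lines. The following is only a minimal sketch, not the original course cell: it assumes the code is run at the top level of a cell or script, and the names `error_message` and `restored` are made up for illustration. It rebinds `list`, shows the resulting error, and then removes the global binding with `del` so the built-in becomes visible again.

```python
# Hypothetical illustration of shadowing a built-in name (run at top level).
list = [2, 5, 4]                 # the global name `list` now refers to this list

try:
    copy = list((1, 2, 3))       # tries to "call" [2, 5, 4], which is not a function
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

del list                         # remove the global binding; the built-in is visible again
restored = list((1, 2, 3))
print(restored)                  # [1, 2, 3]
```

In a notebook, `Kernel -> Restart & Run All` has the same effect as the `del` here, but for every stale name at once.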

#### Punchline:  Use as few global variables as possible. Do NOT reuse common names (X, i, lst) as global variables. Do NOT use `sum`, `list`, or `sqrt` as variable names, as these are <b>function names</b> that are built into Python or commonly imported. 

Either write a function for each problem, storing everything as local variables, or give each global variable a unique name by adding the number of the problem. 

Example:  Suppose you have the following code, which is your solution to Problem Four in a homework:

In [None]:
A = [2,3,4]
B = A + [6]                 
print(B)

X = [2,5,4]
print(X)


In [None]:
# Here is a way of avoiding global variables by wrapping everything in a function (all variables local)

def solution4():
    A = [2,3,4]
    B = A + [6]                 
    print(B)

    X = [2,5,4]
    print(X) 
    
solution4()             # be sure to call it! 


In [None]:
# Or simply add the problem number to the name:

A4 = [2,3,4]
B4 = A4 + [6]                 
print(B4)

X4 = [2,5,4]
print(X4)

<b>Caveat:</b> You do NOT need to do this for all variables, since loop variables and local variables inside functions
are rarely a problem. Use the usual names x, y, i, X, etc. for these.

### Managing the global list of variable bindings (feature 3)

Python is an interpreted language (feature 1) which uses a "Read-Eval-Print" loop to read
an expression, evaluate it, and print out the value. For definitions, such as
assignments and function definitions, there is also a change to the global master list
which holds all variable and function definitions. 

    To see the (long) list of global variable bindings, call the function globals()

In [None]:
#globals()
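
Because the full output of `globals()` is long, it can help to look only at the names you defined yourself. The sketch below is one possible way to do that. The helper `user_defined_names` and the dictionary `example_namespace` are hypothetical names introduced for illustration; the dictionary stands in for a real global namespace so that the example gives the same result anywhere.

```python
def user_defined_names(namespace):
    """Return the sorted names in a namespace that do not start with an underscore."""
    return sorted(name for name in namespace if not name.startswith('_'))

# Hypothetical stand-in for a notebook's global namespace.
# In a notebook you could pass globals() instead.
example_namespace = {
    '__name__': '__main__',
    '_hidden': 0,
    'A4': [2, 3, 4],
    'X4': [2, 5, 4],
}

visible_names = user_defined_names(example_namespace)
print(visible_names)    # ['A4', 'X4']
```

When called as `user_defined_names(globals())` in a notebook, the result will probably also include a few names supplied by Jupyter itself (such as `In` and `Out`), so treat it as a rough inventory rather than an exact one.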

If you use unique global variable names, you should not have too much trouble with this,
but still there are strange things that happen if you don't know about this feature. 
The problem is about "old values" which were stored in the past, even if you don't need them. 

Let us look at one example to show the problem, and you can keep a watch out for this.

Suppose you write the following and run it:

In [None]:
X = [1,2,3]

In [None]:
X

Now there is a binding in the global memory:  <code>   X = [1,2,3]   </code>

But then suppose you change your mind and delete this statement. The problem is
that the binding is NOT removed when you delete the cell. It stays until you **Restart** the Kernel (in the Kernel menu) or explicitly run `del X`. 

Thus, four hours later, you have completely forgotten about this X, and you write the
code below, but you make a very small error and leave out the '1' in 'X1'.  It's hard
to see that a single character is missing, and when you run it, PYTHON 
STILL HAS THE BINDING FOR X, SO THE ONLY SIGN OF THE ERROR IS THAT THE RESULT
IS WRONG.  You'll have to spot the missing '1' yourself. 

In [None]:
X1 = ['a','b','c','d','e','f','g']

print( X[:2] )        # expecting to see ['a','b']  but Python finds the old value of X and doesn't complain

#### PUNCHLINE:   If you have a nasty bug that you can't figure out, try Restart and Run All from the Kernel menu.  ALWAYS Restart and Run All before submitting to make sure everything works as it is supposed to. 
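
To see why a clean namespace helps, here is a minimal sketch of the same typo once the stale binding is gone. It uses `del X` to imitate, for this one name, what a kernel restart does for all names; the flag `typo_detected` is a hypothetical name added for illustration.

```python
X = [1, 2, 3]      # stale binding left over from a cell that was later deleted
del X              # remove the binding (a kernel restart removes all of them)

X1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

try:
    print(X[:2])   # typo: X1 was intended
    typo_detected = False
except NameError as err:
    typo_detected = True
    print("NameError:", err)
```

With the old binding removed, Python reports the typo immediately instead of silently returning the wrong list.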

### Floating-Point Arithmetic

This section will summarize a few important points from the lecture notes in CS 132:

       https://www.cs.bu.edu/fac/snyder/cs132-book/L02Numerics.html

Python has both integer and floating point types. Python integers are stored exactly (fixed-width integers, such as NumPy's, are exact only as long as they are not too large); however, real numbers (such as $\pi$) can only be approximated by storing the number in scientific notation in binary:

![Screen%20Shot%202021-05-24%20at%205.55.43%20PM.png](attachment:Screen%20Shot%202021-05-24%20at%205.55.43%20PM.png)

Because only a fixed number of bits are used, most real numbers cannot be represented exactly in a computer.

Another way of saying this is that, usually, a floating point number is an approximation of some particular real number.

Generally when we try to store a real number in a computer, what we wind up storing is the closest floating point number that the computer can represent.

Problems arise when we work with floating point numbers and confuse them with real numbers, thereby forgetting that most of the time we are not storing the real number exactly, but only a floating point number that is close to it.
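
One way to see the "closest floating point number" directly is to convert a float to `decimal.Decimal`, which displays the exact value that is stored. This is a small sketch using only the standard library; the variable names are chosen for illustration.

```python
from decimal import Decimal

exact_tenth = Decimal(0.1)      # exact value of the float closest to 0.1
exact_eighth = Decimal(0.125)   # 1/8 is a power of two, so it is stored exactly

print(exact_tenth)              # a long decimal slightly different from 0.1
print(exact_eighth)             # 0.125

tenth_is_exact = (exact_tenth == Decimal('0.1'))
eighth_is_exact = (exact_eighth == Decimal('0.125'))
print(tenth_is_exact, eighth_is_exact)   # False True
```

Note the difference between `Decimal(0.1)`, which converts the already-rounded float, and `Decimal('0.1')`, which represents the decimal number 0.1 exactly.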

Let’s look at some examples. First:

In [None]:
# ((1/8)*8)-1
a = 1/8
b = 8
c = 1
(a*b)-c

It turns out that 1/8, 8, and 1 can all be stored exactly in IEEE-754 floating point format.

So, we are storing the inputs exactly (1/8, 8 and 1), computing the results exactly, yielding 

  $$(1/8) \cdot 8 = 1$$

and representing the result exactly (zero).

OK, here is another example:

In [None]:
# ((1/7)*7)-1
a = 1/7
b = 7
c = 1
a * b - c

Here the situation is different.

1/7 cannot be stored exactly in floating point format.

In binary, 1/7 is $0.\overline{001}$ (that is, 0.001001001...), an infinitely repeating pattern that clearly cannot be represented in a finite sequence of bits.

Nonetheless, the computation $(1/7) \cdot 7$ still yields exactly 1.0.

Why? Because the rounding of $0.\overline{001}$ to its closest floating point representation, when multiplied by 7, yields a value whose closest floating point representation is 1.0.

Now, let’s do something that seems very similar:

In [None]:
# ((1/70)*7)-0.1
a = 1/70
b = 7
c = 0.1
a * b - c

In this case, neither 1/70 nor 0.1 can be stored exactly.

More importantly, the process of rounding 1/70 to its closest floating point representation, then multiplying by 7, yields a number whose closest floating point representation is not 0.1.

However, that floating point representation is very close to 0.1.

Let’s look at the difference: -1.3877787807814457e-17.

This is about $-1 \cdot 10^{-17}$ = -0.00000000000000001.

Compared to 0.1, this is a very small number. The relative error abs(-0.00000000000000001 / 0.1) is about $10^{-16}$.

This suggests that when a floating point calculation is not exact, the error (in a relative sense) is usually very small.
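
The relative error used above can be written as a small helper. This is only a sketch: `relative_error` is a hypothetical function name, and it assumes the expected value is nonzero.

```python
def relative_error(computed, expected):
    """Return |computed - expected| / |expected| (expected must be nonzero)."""
    return abs(computed - expected) / abs(expected)

computed = (1 / 70) * 7
rel_err = relative_error(computed, 0.1)
print(rel_err)     # a very small number, far below 1e-15
```

Dividing by the expected value is what makes the error "relative": the same absolute difference matters much more for a result near 0.1 than for a result near 1,000,000.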

#### Punchline 1:  Do not compare floating point numbers for equality

Two floating point computations that should yield the same result mathematically, may not do so due to rounding error.

However, in general, if two numbers should be equal, their floating point values should differ by only a small relative error.

So, instead of asking whether two floating point numbers are equal, we should ask whether the relative error of their difference is small.

In [None]:
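import numpy as np    # needed for np.abs, np.finfo and np.isclose below

a = 7
b = 1/10
c = 1/a
r1 = a * b * c        # same product, multiplied in a different order
r2 = b * c * a
np.abs(r1-r2)/r1      # relative difference between the two results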

In [None]:
np.finfo('float')

In [None]:
print(r1 == r2)

Instead, when comparing floating-point values, we should test whether they
are "close enough." 
This test is needed often enough that numpy has a function that implements it:

In [None]:
np.isclose(r1, r2)
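
The standard library offers a similar test, `math.isclose`, which can be used when NumPy is not needed. The sketch below uses its `rel_tol` and `abs_tol` parameters; the variable names are chosen for illustration.

```python
import math

total = 0.1 + 0.2

exactly_equal = (total == 0.3)                          # False: rounding error
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)   # True: tiny relative error

# Near zero a relative tolerance alone is not enough, so an absolute
# tolerance has to be supplied explicitly.
near_zero_default = math.isclose(1e-12, 0.0)               # False
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)     # True

print(exactly_equal, close_enough, near_zero_default, near_zero_abs)
```

One difference to keep in mind: `np.isclose` uses a nonzero absolute tolerance by default, while `math.isclose` defaults to `abs_tol=0.0`, so the two can disagree for values very close to zero.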

#### Punchline 2:  Beware of ill-conditioned problems

An ill-conditioned problem is one in which the outputs depend in a very sensitive manner on the inputs.

That is, a small change in the inputs can yield a very large change in the outputs.

The simplest example is computing $1/(a-b)$.

In [None]:
print(f'r1 is {r1}')
print('r2 is very close to r1')
r3 = r1 + 0.0001                        # a small change to r1
print(f'r3 is {r3}')
print(f'1/(r1 - r2) = {1/(r1 - r2)}')   # huge, because r1 - r2 is tiny
print(f'1/(r3 - r2) = {1/(r3 - r2)}')   # very different, although r3 is close to r1

If a is close to b, small changes in either make a big difference in the output.

Because the inputs to your problem may not be exact, if the problem is ill-conditioned, the outputs may be wrong by a large amount.
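
The sensitivity can be quantified by comparing the relative change in the output with the relative change in the input. The following is a minimal sketch with made-up values (`a = 1.001`, `b = 1.0`, and an input change of 0.01%); the function name `reciprocal_of_difference` is hypothetical.

```python
def reciprocal_of_difference(a, b):
    """Compute 1/(a - b), which is ill-conditioned when a is close to b."""
    return 1 / (a - b)

a, b = 1.001, 1.0
rel_input_change = 1e-4                      # change a by 0.01%
a_perturbed = a * (1 + rel_input_change)

out = reciprocal_of_difference(a, b)
out_perturbed = reciprocal_of_difference(a_perturbed, b)

rel_output_change = abs(out_perturbed - out) / abs(out)
amplification = rel_output_change / rel_input_change

print(f'relative input change:  {rel_input_change}')
print(f'relative output change: {rel_output_change}')
print(f'amplification factor:   {amplification}')
```

Here a change of 0.01% in one input changes the output by several percent, so the amplification factor is in the hundreds. The closer `a` is to `b`, the larger this factor becomes.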

Later on in CS 132 we will see that the notion of ill-conditioning applies to matrix problems too, and in particular comes up when we solve certain problems involving matrices.

#### Punchline 3: Relative error can be magnified during subtractions

Two numbers, each with small relative error, can yield a value with large relative error if subtracted.

Let’s say we represent a = 1.2345 as 1.2345002; the relative error is about 0.0000002.

Let’s say we represent b = 1.234 as 1.2340001; the relative error is about 0.0000001.

Now, subtract a - b: the result is 0.0005001, while the exact answer is 0.0005.

What is the relative error? (0.0005001 - 0.0005) / 0.0005 = 0.0002

The relative error of the result is 1000 times larger than the relative error of the inputs.
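
The arithmetic above can be checked with the `decimal` module, which works in exact decimal arithmetic so that the check itself is not disturbed by binary rounding. This is just a sketch; the helper `relative_error` and the variable names are introduced for illustration.

```python
from decimal import Decimal

a_true, a_stored = Decimal('1.2345'), Decimal('1.2345002')
b_true, b_stored = Decimal('1.234'), Decimal('1.2340001')

def relative_error(stored, true):
    """Return |stored - true| / |true| using decimal arithmetic."""
    return abs(stored - true) / abs(true)

rel_a = relative_error(a_stored, a_true)                         # about 1.6e-7
rel_b = relative_error(b_stored, b_true)                         # about 8.1e-8
rel_diff = relative_error(a_stored - b_stored, a_true - b_true)  # 0.0002

print(rel_a, rel_b, rel_diff)
print(rel_diff / rel_a)      # roughly a thousandfold magnification
```

The absolute errors stay tiny throughout; the relative error grows only because the subtraction makes the result itself much smaller than the inputs.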

Here’s an example in practice:

In [None]:
a = 1.23456789
b = 1.2345678
print(0.00000009)
print(a-b)
print(np.abs(a-b-0.00000009)/ 0.00000009)

We know the relative error in the inputs is on the order of $10^{-16}$, but the relative error of the output is on the order of $10^{-9}$, i.e., roughly ten million times larger.

A good summary that covers additional issues is at https://docs.python.org/2/tutorial/floatingpoint.html.