# Errors and Exception Handling

In this lecture we will learn about Errors and Exception Handling in Python. You've definitely already encountered errors by this point in the course. For example:

In [2]:
print('Hello)

SyntaxError: unterminated string literal (detected at line 1) (1679058590.py, line 1)

Note how we get a SyntaxError, with the further description that the string literal is unterminated. This is specific enough for us to see that we forgot a single quote at the end of the line. Understanding these various error types will help you debug your code much faster.

This type of error and description is known as an Exception. Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called exceptions and are not unconditionally fatal.
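
As a minimal sketch of that last point, the division below is syntactically valid yet raises an exception when it is executed, and catching the exception lets the program carry on. The helper name `safe_divide` is hypothetical and not part of the lecture code.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero (illustrative helper)."""
    try:
        return a / b
    except ZeroDivisionError as err:
        # The code is valid Python; the error only appears when it runs.
        print(f"Caught an exception: {err}")
        return None

print(safe_divide(10, 2))
print(safe_divide(10, 0))
print("The program keeps running after the exception.")
```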

You can check out the full list of built-in exceptions [here](https://docs.python.org/3/library/exceptions.html). Now let's learn how to handle errors and exceptions in our own code.

## try and except

The basic terminology and syntax used to handle errors in Python are the <code>try</code> and <code>except</code> statements. The code which can cause an exception to occur is put in the <code>try</code> block and the handling of the exception is then implemented in the <code>except</code> block of code. The syntax follows:

    try:
       You do your operations here...
       ...
    except ExceptionI:
       If there is ExceptionI, then execute this block.
    except ExceptionII:
       If there is ExceptionII, then execute this block.
       ...
    else:
       If there is no exception then execute this block. 
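
The template above can be sketched as runnable code along these lines. The function `parse_ratio` is a hypothetical helper chosen only to show two `except` clauses and an `else` clause working together.

```python
def parse_ratio(text):
    """Parse a string such as '3/4' into a float (hypothetical helper)."""
    try:
        numerator, denominator = text.split("/")
        result = int(numerator) / int(denominator)
    except ValueError:
        # Raised when a part is not an integer or there is not exactly one '/'.
        return "not a valid ratio"
    except ZeroDivisionError:
        return "denominator is zero"
    else:
        # Runs only when the try block raised nothing.
        return result

for text in ["3/4", "3", "a/b", "1/0"]:
    print(text, "->", parse_ratio(text))
```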

We can also catch any exception by using a bare <code>except:</code> clause. To get a better understanding of all this, let's check out an example. We will look at some code that opens a file and writes to it:

In [12]:
try:
    f = open('testfile','w')
    f.write('Test write this')
except IOError:
    # This will only check for an IOError exception and then execute this print statement
    print("Error: Could not find file or read data")
# else:
#     print("Content written successfully")
#     f.close()

Now let's see what would happen if we did not have write permission (opening only with 'r'):

In [8]:
try:
    f = open('testfile','r')
    f.write('Test write this')
except IOError:
    # This will only check for an IOError exception and then execute this print statement
    print("Error: Could not find file or read data")
# else:
#     print("Content written successfully")
#     f.close()



Error: Could not find file or read data


Great! Notice how we only printed a statement! The code still ran and we were able to continue doing actions and running code blocks. This is extremely useful when you have to account for possible input errors in your code. You can be prepared for the error and keep running code, instead of your code just breaking as we saw above.

We could have also just said <code>except:</code> if we weren't sure what exception would occur. For example:

In [11]:
try:
    f = open('testfile','r')
    f.write('Test write this')
except:
    # This will check for any exception and then execute this print statement
    print("Error: Could not find file or read data")
else:
    print("Content written successfully")
    f.close()

Error: Could not find file or read data


Great! Now we don't actually need to memorize that list of exception types! Now what if we kept wanting to run code after the exception occurred? This is where <code>finally</code> comes in.
## finally
The <code>finally:</code> block of code will always be run regardless of whether there was an exception in the <code>try</code> code block. The syntax is:

    try:
       Code block here
       ...
       Due to any exception, this code may be skipped!
    finally:
       This code block would always be executed.

For example:

In [13]:
try:
    f = open("testfile", "w")
    f.write("Test write statement")
    f.close()
finally:
    print("Always execute finally code blocks")

Always execute finally code blocks


In [15]:
try:
    print(f'Hello {name}')
except:
    print('Exception happened')
finally:
    print('execution done!')

Exception happened
execution done!


We can use this in conjunction with <code>except</code>. Let's see a new example that will take into account a user providing the wrong input:

In [23]:
def askint():
    try:
        val = int(input("Please enter an integer: "))
    except:
        print("Looks like you did not enter an integer!")
    finally:
        print("Finally, I executed!")
    print(val)

In [18]:
askint()

Please enter an integer:  y


Looks like you did not enter an integer!
Finally, I executed!


UnboundLocalError: cannot access local variable 'val' where it is not associated with a value

Notice how we got an error when trying to print val (because it was never assigned). Let's remedy this by asking the user again when the first input is not an integer:

In [None]:
def askint():
    try:
        val = int(input("Please enter an integer: "))
    except:
        print("Looks like you did not enter an integer!")
        val = int(input("Try again-Please enter an integer: "))
    finally:
        print("Finally, I executed!")
    print(val)

In [None]:
askint()

Hmmm...that only did one check. How can we continually keep checking? We can use a while loop!

In [24]:
def askint():
    while True:
        try:
            val = int(input("Please enter an integer: "))
        except:
            print("Looks like you did not enter an integer!")
            continue
        else:
            print("Yep that's an integer!")
            break
        finally:
            print("Finally, I executed!")
        print(val)

In [25]:
askint()

Please enter an integer:  y


Looks like you did not enter an integer!
Finally, I executed!


Please enter an integer:  y


Looks like you did not enter an integer!
Finally, I executed!


Please enter an integer:  u


Looks like you did not enter an integer!
Finally, I executed!


Please enter an integer:  t


Looks like you did not enter an integer!
Finally, I executed!


Please enter an integer:  r


Looks like you did not enter an integer!
Finally, I executed!


Please enter an integer:  7


Yep that's an integer!
Finally, I executed!


So why did our function print "Finally, I executed!" after each trial, yet it never printed `val` itself? This is because inside a try/except/finally statement, a <code>continue</code> or <code>break</code> does not take effect until the <code>finally:</code> block has run. This means that even though a successful input of **7** brought us to the <code>else:</code> block and its <code>break</code> statement, the <code>finally:</code> block still ran before we broke out of the while loop. And since <code>print(val)</code> sits after the try statement inside the loop, both <code>continue</code> and <code>break</code> skipped over it.
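
The ordering can be checked without typing anything in. The sketch below replaces `input()` with a fixed list of stand-in answers and records which block runs; the `events` list is an illustrative addition, not part of the lecture's function.

```python
events = []  # records the order in which the blocks run

for attempt in ["y", "7"]:  # stand-ins for what a user might type
    try:
        val = int(attempt)
    except ValueError:
        events.append("except")
        continue
    else:
        events.append("else")
        break
    finally:
        # Runs on every pass, before continue or break takes effect.
        events.append("finally")
    events.append("after try")  # never reached: every path leaves the loop body early

print(events)  # ['except', 'finally', 'else', 'finally']
```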

Let's make one final adjustment:

In [None]:
def askint():
    while True:
        try:
            val = int(input("Please enter an integer: "))
        except:
            print("Looks like you did not enter an integer!")
            continue
        else:
            print("Yep that's an integer!")
            print(val)
            break
        finally:
            print("Finally, I executed!")

In [None]:
askint()

**Great! Now you know how to handle errors and exceptions in Python with the try, except, else, and finally notation!**

# Decorators


Decorators can be thought of as functions which modify the *functionality* of another function. They help to make your code shorter and more "Pythonic". 

To properly explain decorators we will slowly build up from functions. Make sure to run every cell in this Notebook for this lecture to look the same on your own computer.

So let's break down the steps:

## Functions Review

In [None]:
def func():
    return 1

In [None]:
func()

## Scope Review
Remember from the nested statements lecture that Python uses Scope to know what a label is referring to. For example:

In [None]:
s = 'Global Variable'

def check_for_locals():
    print(locals())

Remember that Python functions create a new scope, meaning the function has its own namespace to find variable names when they are mentioned within the function. We can check for local variables and global variables with the <code>locals()</code> and <code>globals()</code> functions. For example:

In [None]:
print(globals())

Here we get back a dictionary of all the global variables, many of which are predefined in Python. So let's go ahead and look at the keys:

In [None]:
print(globals().keys())

Note how **s** is there, the Global Variable we defined as a string:

In [None]:
globals()['s']

Now let's run our function to check for local variables that might exist inside our function (there shouldn't be any)

In [None]:
check_for_locals()

Great! Now let's continue with building out the logic of what a decorator is. Remember that in Python **everything is an object**. That means functions are objects which can be assigned labels and passed into other functions. Let's start with some simple examples:

In [None]:
def hello(name='Jose'):
    return 'Hello '+name

In [None]:
hello()

Assign another label to the function. Note that we are not using parentheses here because we are not calling the function **hello**, instead we are just passing a function object to the **greet** variable.

In [None]:
greet = hello

In [None]:
greet

In [None]:
greet()

So what happens when we delete the name **hello**?

In [None]:
del hello

In [None]:
hello()

In [None]:
greet()

Even though we deleted the name **hello**, the name **greet** *still points to* our original function object. It is important to know that functions are objects that can be passed to other objects!

## Functions within functions
Great! So we've seen how we can treat functions as objects, now let's see how we can define functions inside of other functions:

In [None]:
def hello(name='Jose'):
    print('The hello() function has been executed')
    
    def greet():
        return '\t This is inside the greet() function'
    
    def welcome():
        return "\t This is inside the welcome() function"
    
    print(greet())
    print(welcome())
    print("Now we are back inside the hello() function")

In [None]:
hello()

In [None]:
welcome()

Note how, due to scope, the welcome() function is not defined outside of the hello() function. Now let's learn about returning functions from within functions:
## Returning Functions

In [None]:
def hello(name='Jose'):
    
    def greet():
        return '\t This is inside the greet() function'
    
    def welcome():
        return "\t This is inside the welcome() function"
    
    if name == 'Jose':
        return greet
    else:
        return welcome

Now let's see what function is returned if we set x = hello(). Note how the empty parentheses mean that name keeps its default value of 'Jose'.

In [None]:
x = hello()

In [None]:
x

Great! Now we can see how x is pointing to the greet function inside of the hello function.

In [None]:
print(x())

Let's take a quick look at the code again. 

In the <code>if</code>/<code>else</code> clause we are returning <code>greet</code> and <code>welcome</code>, not <code>greet()</code> and <code>welcome()</code>. 

This is because when you put a pair of parentheses after a function's name, the function gets executed; whereas if you don’t put parentheses after it, the function object can be passed around and assigned to other variables without being executed.

When we write <code>x = hello()</code>, hello() gets executed and because the name is Jose by default, the function <code>greet</code> is returned. If we change the statement to <code>x = hello(name = "Sam")</code> then the <code>welcome</code> function will be returned. We can also do <code>print(hello()())</code> which outputs *This is inside the greet() function*.
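
The statements described above can be tried directly. This sketch repeats the `hello` function from this section and uses the function object's `__name__` attribute to show which inner function came back.

```python
def hello(name='Jose'):

    def greet():
        return '\t This is inside the greet() function'

    def welcome():
        return "\t This is inside the welcome() function"

    if name == 'Jose':
        return greet
    else:
        return welcome

x = hello(name="Sam")   # hello runs now and hands back welcome, not yet called
print(x.__name__)       # welcome
print(x())              # calling x runs welcome
print(hello()())        # call hello, then immediately call the greet it returns
```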

## Functions as Arguments
Now let's see how we can pass functions as arguments into other functions:

In [None]:
def hello():
    return 'Hi Jose!'

def other(func):
    print('Other code would go here')
    print(func())

In [None]:
other(hello)

Great! Note how we can pass the functions as objects and then use them within other functions. Now we can get started with writing our first decorator:

## Creating a Decorator
In the previous example we built the core of a Decorator by hand: a function that receives another function and runs extra code around it. Here we will modify it to make its use case clear:

In [None]:
def new_decorator(func):

    def wrap_func():
        print("Code would be here, before executing the func")

        func()

        print("Code here will execute after the func()")

    return wrap_func

def func_needs_decorator():
    print("This function is in need of a Decorator")

In [None]:
func_needs_decorator()

In [None]:
# Reassign func_needs_decorator
func_needs_decorator = new_decorator(func_needs_decorator)

In [None]:
func_needs_decorator()

So what just happened here? A decorator simply wrapped the function and modified its behavior. Now let's understand how we can rewrite this code using the @ symbol, which is what Python uses for Decorators:

In [None]:
@new_decorator
def func_needs_decorator():
    print("This function is in need of a Decorator")

In [None]:
func_needs_decorator()

**Great! You've now built a Decorator manually and then seen how we can use the @ symbol in Python to automate this and clean up our code. You'll run into Decorators a lot if you begin using Python for Web Development with frameworks such as Flask or Django!**
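
The wrapper in this lecture takes no arguments and discards the wrapped function's return value. As a hedged extension that goes beyond what the lecture covers, the sketch below forwards arbitrary arguments with `*args` and `**kwargs` and uses the standard library's `functools.wraps` to keep the original function's name and docstring. The names `log_calls` and `add` are illustrative.

```python
import functools

def log_calls(func):
    """Decorator that reports each call (the name is illustrative)."""
    @functools.wraps(func)  # copies func's __name__ and docstring onto the wrapper
    def wrap_func(*args, **kwargs):
        print(f"Calling {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result  # pass the wrapped function's result back to the caller
    return wrap_func

@log_calls
def add(a, b):
    """Add two numbers."""
    return a + b

total = add(2, b=3)
print(total)
```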

# Iterators and Generators

In this section of the course we will be learning the difference between iteration and generation in Python and how to construct our own Generators with the *yield* statement. Generators allow us to generate as we go along, instead of holding everything in memory. 

We've touched on this topic in the past when discussing certain built-in Python functions like **range()**, **map()** and **filter()**.

Let's explore a little deeper. We've learned how to create functions with <code>def</code> and the <code>return</code> statement. Generator functions allow us to write a function that can send back a value and then later resume to pick up where it left off. This type of function is a generator in Python, allowing us to generate a sequence of values over time. The main difference in syntax will be the use of a <code>yield</code> statement.

In most aspects, a generator function will appear very similar to a normal function. The main difference is that calling a generator function does not run its body and return a value; it returns a generator object that supports the iterator protocol. Each time a value is requested, the body runs until the next <code>yield</code>, hands back that value, and then suspends, keeping its local state until it is resumed. The main advantage here is that instead of having to compute an entire series of values up front, the generator computes one value and then suspends its activity awaiting the next request. This feature is known as *state suspension*.


To start getting a better understanding of generators, let's go ahead and see how we can create some.

In [None]:
# Generator function for the cube of numbers (power of 3)
def gencubes(n):
    for num in range(n):
        yield num**3

In [None]:
for x in gencubes(10):
    print(x)

Great! Now since we have a generator function we don't have to keep track of every single cube we created.

Generators are best for calculating large sets of results (particularly in calculations that involve loops themselves) in cases where we don’t want to allocate the memory for all of the results at the same time. 
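
A rough way to see the memory difference is to compare the size of a list holding every cube with the size of the generator object. This is only a sketch: `sys.getsizeof` reports the size of the container itself, not of the integers it refers to, and the exact numbers depend on the Python build.

```python
import sys

def gencubes(n):
    for num in range(n):
        yield num**3

n = 100_000
cubes_list = [num**3 for num in range(n)]   # every result stored at once
cubes_gen = gencubes(n)                     # nothing computed yet

list_size = sys.getsizeof(cubes_list)
gen_size = sys.getsizeof(cubes_gen)
print(list_size, gen_size)

# The generator can still feed a computation one value at a time.
total = sum(cubes_gen)
print(total)
```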

Let's create another example generator which calculates [Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_number) numbers:

In [None]:
def genfibon(n):
    """
    Generate the first n Fibonacci numbers
    """
    a = 1
    b = 1
    for i in range(n):
        yield a
        a,b = b,a+b

In [None]:
for num in genfibon(10):
    print(num)

What if this was a normal function, what would it look like?

In [None]:
def fibon(n):
    a = 1
    b = 1
    output = []
    
    for i in range(n):
        output.append(a)
        a,b = b,a+b
        
    return output

In [None]:
fibon(10)

Notice that if we call the second function with some huge value of n (like 100000), it will have to keep track of every single result, when in our case we actually only care about the previous result to generate the next one!

## next() and iter() built-in functions
A key to fully understanding generators is the next() function and the iter() function.

The next() function allows us to access the next element in a sequence. Let's check it out:

In [None]:
def simple_gen():
    for x in range(3):
        yield x

In [None]:
# Assign simple_gen 
g = simple_gen()

In [None]:
print(next(g))

In [None]:
print(next(g))

In [None]:
print(next(g))

In [None]:
print(next(g))

After yielding all the values next() caused a StopIteration error. What this error informs us of is that all the values have been yielded. 

You might be wondering why we don’t get this error while using a for loop. A for loop automatically catches this error and stops calling next().

Let's go ahead and check out how to use iter(). You remember that strings are iterables:

In [None]:
s = 'hello'

#Iterate over string
for let in s:
    print(let)

But that doesn't mean the string itself is an *iterator*! We can check this with the next() function:

In [None]:
next(s)

Interesting, this means that a string object supports iteration (we can loop over it), but it is not itself an iterator, so we cannot call next() on it directly as we could with a generator. The iter() function gives us an iterator for it!

In [None]:
s_iter = iter(s)

In [None]:
next(s_iter)

In [None]:
next(s_iter)

Great! Now you know how to convert objects that are iterable into iterators themselves!
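
Putting iter(), next() and StopIteration together, a for loop over a string behaves roughly like the hand-written loop below. This is a sketch of the idea rather than the interpreter's actual implementation.

```python
s = 'hello'
letters = []

s_iter = iter(s)            # ask the iterable for an iterator
while True:
    try:
        let = next(s_iter)  # fetch the next item
    except StopIteration:
        break               # no items left, so the loop ends quietly
    letters.append(let)

print(letters)  # ['h', 'e', 'l', 'l', 'o']
```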

The main takeaway from this lecture is that using the yield keyword in a function will cause the function to become a generator. This change can save you a lot of memory for large use cases. For more information on generators check out:

[Stack Overflow Answer](http://stackoverflow.com/questions/1756096/understanding-generators-in-python)

[Another StackOverflow Answer](http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python)

# Collections Module

The collections module is a built-in module that implements specialized container data types providing alternatives to Python’s general purpose built-in containers. We've already gone over the basics: dict, list, set, and tuple.

Now we'll learn about the alternatives that the collections module provides.

## Counter

*Counter* is a *dict* subclass which helps count hashable objects. Inside of it elements are stored as dictionary keys and the counts of the objects are stored as the value.

Let's see how it can be used:

In [26]:
from collections import Counter

**Counter() with lists**

In [27]:
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1]

Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

**Counter with strings**

In [28]:
Counter('aabsbsbsbhshhbbsbs')

Counter({'b': 7, 's': 6, 'h': 3, 'a': 2})

**Counter with words in a sentence**

In [29]:
s = 'How many times does each word show up in this sentence word times each each word'

words = s.split()

Counter(words)

Counter({'each': 3,
         'word': 3,
         'times': 2,
         'How': 1,
         'many': 1,
         'does': 1,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

In [30]:
# Methods with Counter()
c = Counter(words)



In [31]:
Counter(words).most_common()

[('each', 3),
 ('word', 3),
 ('times', 2),
 ('How', 1),
 ('many', 1),
 ('does', 1),
 ('show', 1),
 ('up', 1),
 ('in', 1),
 ('this', 1),
 ('sentence', 1)]

## Common patterns when using the Counter() object

    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary
    c.items()                       # convert to a list of (elem, cnt) pairs
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n-1:-1]       # n least common elements
    c += Counter()                  # remove zero and negative counts
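
A few of these patterns in action, as a small runnable sketch using the string counted earlier in this section:

```python
from collections import Counter

c = Counter('aabsbsbsbhshhbbsbs')

total = sum(c.values())                 # total of all counts
unique = sorted(c)                      # unique elements, sorted for display
two_most = c.most_common(2)             # the 2 most common elements
n = 2
two_least = c.most_common()[:-n-1:-1]   # the n least common elements

print(total, unique)
print(two_most, two_least)

c['z'] = 0      # a zero count
c['a'] -= 5     # a negative count
c += Counter()  # drops the zero and negative counts
print(c)
```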

## defaultdict

defaultdict is a dictionary-like object which provides all methods provided by a dictionary but takes a first argument (default_factory), a callable that supplies the default value for missing keys. Using defaultdict is faster than doing the same with the dict.setdefault method.

**When a default_factory is set, looking up a missing key with d[key] will not raise a KeyError. Any key that does not exist gets the value returned by the default factory.**

In [34]:
from collections import defaultdict

In [35]:
d = {}

In [37]:
d['one'] = 'first'

In [38]:
d  = defaultdict(object)

In [39]:
d['one'] 

<object at 0x1d228e1fdd0>

In [40]:
d

defaultdict(object, {'one': <object at 0x1d228e1fdd0>})

In [45]:
for item in d:
    print(item)

item

# the loop variable is still defined after the loop ends

one


'one'

We can also initialize with default values:

In [46]:
d = defaultdict(lambda: 0)

In [47]:
d['one']

0
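
Two common uses of the default factory are counting and grouping. The sketch below reuses the sentence from the Counter section; the variable names `counts` and `by_length` are illustrative.

```python
from collections import defaultdict

words = 'How many times does each word show up in this sentence word times each each word'.split()

counts = defaultdict(int)       # int() returns 0, so missing keys start at 0
for word in words:
    counts[word] += 1

by_length = defaultdict(list)   # list() returns [], so missing keys start empty
for word in set(words):
    by_length[len(word)].append(word)

print(counts['each'])
print(sorted(by_length[2]))

# Only indexing with [] triggers the factory; membership tests and get() do not.
print('missing' in counts)
print(counts.get('missing'))
```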

## namedtuple
The standard tuple uses numerical indexes to access its members, for example:

In [48]:
t = (12,13,14)

In [49]:
t[0]

12

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A namedtuple assigns names, as well as the numerical index, to each member. 

Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function. The arguments are the name of the new class and the names of the elements, given either as a list of strings or as a single string.

You can basically think of namedtuples as a very quick way of creating a new object/class type with some attribute fields.
For example:

In [51]:
from collections import namedtuple

In [52]:
Dog = namedtuple('Dog',['age','breed','name'])

sam = Dog(age=2,breed='Lab',name='Sammy')

frank = Dog(age=2,breed='Shepard',name="Frankie")

We construct the namedtuple by first passing the object type name (Dog) and then passing the field names, here as a list of strings (a single string with spaces between the field names also works). We can then call on the various attributes:

In [53]:
sam

Dog(age=2, breed='Lab', name='Sammy')

In [54]:
sam.age

2

In [56]:
sam.breed

'Lab'

In [55]:
sam[0]

2
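
Beyond attribute and index access, a namedtuple still behaves like a tuple and comes with a few helpers. A short sketch, using the space-separated form of the field names:

```python
from collections import namedtuple

Dog = namedtuple('Dog', 'age breed name')   # field names as one string
sam = Dog(age=2, breed='Lab', name='Sammy')

age, breed, name = sam             # unpacks like a regular tuple
older_sam = sam._replace(age=3)    # returns a new Dog; the original is unchanged
as_dict = sam._asdict()            # field names mapped to values

print(Dog._fields)
print(older_sam)
print(as_dict)
```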

## Conclusion

Hopefully you now see how incredibly useful the collections module is in Python and it should be your go-to module for a variety of common tasks!

# Opening and Reading Files

So far we've discussed how to open files manually, one by one. Let's explore how we can open files programmatically.

_____

### Review: Understanding File Paths

In [None]:
pwd

### Create Practice File

We will begin by creating a practice text file that we will be using for demonstration.

In [None]:
f = open('practice.txt','w+')

In [None]:
f.write('test')
f.close()

### Getting Directories

Python has a built-in [os module](https://docs.python.org/3/library/os.html) that allows us to use operating system dependent functionality.

You can get the current directory:

In [None]:
import os

In [None]:
os.getcwd()

### Listing Files in a Directory

You can also use the os module to list the contents of a directory.

In [None]:
# In your current directory
os.listdir()

In [None]:
# In any directory you pass
os.listdir("C:\\Users")

### Moving Files 

You can use the built-in **shutil** module to move files to different locations. Keep in mind, there are permission restrictions: for example, if you are logged in as User A, you won't be able to make changes to the top-level Users folder without the proper permissions, [more info](https://stackoverflow.com/questions/23253439/shutil-movescr-dst-gets-me-ioerror-errno-13-permission-denied-and-3-more-e)

In [57]:
import shutil

In [None]:
shutil.move('practice.txt','C:\\Users\\Marcial')

In [None]:
os.listdir()

In [None]:
shutil.move('C:\\Users\\Marcial\\practice.txt',os.getcwd())

In [None]:
os.listdir()
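
The paths above only exist on one particular Windows machine. As a sketch that should run anywhere, the same move can be tried inside a throwaway directory created with the standard library's `tempfile` module; the folder names `source` and `target` are made up for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_dir = os.path.join(workdir, 'source')
    target_dir = os.path.join(workdir, 'target')
    os.mkdir(source_dir)
    os.mkdir(target_dir)

    # Create the practice file inside the source folder.
    source_file = os.path.join(source_dir, 'practice.txt')
    with open(source_file, 'w') as f:
        f.write('test')

    # Moving into an existing folder keeps the file name.
    new_path = shutil.move(source_file, target_dir)
    moved_name = os.path.basename(new_path)
    source_listing = os.listdir(source_dir)
    target_listing = os.listdir(target_dir)

print(source_listing, target_listing)
```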

### Deleting Files
____
**NOTE: Python provides 3 built-in ways to delete files and folders:**
* os.unlink(path) which deletes a file at the path you provide
* os.rmdir(path) which deletes a folder (folder must be empty) at the path you provide
* shutil.rmtree(path) this is the most dangerous, as it will remove all files and folders contained in the path.

**None of these methods can be reversed, which means that if you make a mistake you won't be able to recover the file. Instead we will use the send2trash module, a safer alternative that sends deleted files to the trash bin instead of permanently removing them.**
___

Install the send2trash module with:

    pip install send2trash
    
at your command line.

In [59]:
import send2trash   # sends files to the recycle bin; installed into the virtual environment you created in the first class

In [None]:
os.listdir()

In [None]:
send2trash.send2trash('practice.txt')

In [None]:
os.listdir()

### Walking through a directory

Often you will just need to "walk" through a directory, that is visit every file or folder and check to see if a file is in the directory, and then perhaps do something with that file. Usually recursively walking through every file and folder in a directory would be quite tricky to program, but luckily the os module has a direct method call for this called os.walk(). Let's explore how it works.

In [None]:
os.getcwd()

In [None]:
os.listdir()

In [61]:
import os

In [65]:
os.getcwd()

'C:\\Users\\Dell\\Python\\pyenv\\Github\\Python_org'

In [66]:
for folder , sub_folders , files in os.walk("Class_files"):
    
    print("Currently looking at folder: "+ folder)
    print('\n')
    print("THE SUBFOLDERS ARE: ")
    for sub_fold in sub_folders:
        print("\t Subfolder: "+sub_fold )
    
    print('\n')
    
    print("THE FILES ARE: ")
    for f in files:
        print("\t File: "+f)
    print('\n')
    
    # Now look at subfolders

Currently looking at folder: Class_files


THE SUBFOLDERS ARE: 
	 Subfolder: .ipynb_checkpoints
	 Subfolder: packages
	 Subfolder: Untitled_Folder


THE FILES ARE: 
	 File: Class 2,3 and 4.ipynb
	 File: Class 5 and 6.ipynb
	 File: Class 5, 6 and 7.ipynb
	 File: Class 8.ipynb
	 File: Class Pandas and Numpy.ipynb
	 File: testfile
	 File: whoops.txt


Currently looking at folder: Class_files\.ipynb_checkpoints


THE SUBFOLDERS ARE: 


THE FILES ARE: 
	 File: Class 2,3 and 4-checkpoint.ipynb
	 File: Class 5 and 6-checkpoint.ipynb
	 File: Class 5, 6 and 7-checkpoint.ipynb
	 File: Class 8-checkpoint.ipynb
	 File: Class Pandas and Numpy-checkpoint.ipynb
	 File: whoops-checkpoint.txt


Currently looking at folder: Class_files\packages


THE SUBFOLDERS ARE: 
	 Subfolder: .ipynb_checkpoints


THE FILES ARE: 
	 File: untitled.py


Currently looking at folder: Class_files\packages\.ipynb_checkpoints


THE SUBFOLDERS ARE: 


THE FILES ARE: 
	 File: untitled-checkpoint.py


Currently looking at fold

___
Excellent, you should now be aware of how to work with a computer's files and folders in whichever directory they are in. Remember that the os module works on any operating system that supports Python, which means these commands will work across Linux, macOS, or Windows; only hard-coded paths such as C:\\Users need adjusting.
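
A typical use of os.walk() is collecting every file that matches some condition. The sketch below builds a small throwaway folder tree with `tempfile` and gathers the `.py` files in it, joining paths with `os.path.join` so the same code works on any operating system. The helper name `find_files` and the file names are assumptions made for the example.

```python
import os
import tempfile

def find_files(root, extension):
    """Return the paths under root that end in extension (hypothetical helper)."""
    matches = []
    for folder, sub_folders, files in os.walk(root):
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(folder, name))
    return sorted(matches)

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: two files at the top level, one inside a subfolder.
    os.makedirs(os.path.join(root, 'packages'))
    for relative in ['notes.txt', 'script.py', os.path.join('packages', 'untitled.py')]:
        with open(os.path.join(root, relative), 'w') as f:
            f.write('test')

    found = [os.path.relpath(path, root) for path in find_files(root, '.py')]

print(found)
```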

# datetime module

Python has the datetime module to help deal with timestamps in your code. Time values are represented with the time class. Times have attributes for hour, minute, second, and microsecond. They can also include time zone information. The arguments to initialize a time instance are optional, but the default of 0 is unlikely to be what you want.

## time
Let's take a look at how we can extract time information from the datetime module. We can create a timestamp by specifying datetime.time(hour,minute,second,microsecond)

In [68]:
import datetime
import datetime as dt   # short alias used in the next two cells

In [69]:
dt.date(2023, 10, 28)

datetime.date(2023, 10, 28)

In [72]:
print(dt.date(2023, 10, 28))

2023-10-28


In [73]:


t = datetime.time(4, 20, 1)

# Let's show the different components
print(t)
print('hour  :', t.hour)
print('minute:', t.minute)
print('second:', t.second)
print('microsecond:', t.microsecond)
print('tzinfo:', t.tzinfo)

04:20:01
hour  : 4
minute: 20
second: 1
microsecond: 0
tzinfo: None


Note: A time instance only holds values of time, and not a date associated with the time. 

We can also check the min and max values a time of day can have in the module:

In [74]:
print('Earliest  :', datetime.time.min)
print('Latest    :', datetime.time.max)
print('Resolution:', datetime.time.resolution)

Earliest  : 00:00:00
Latest    : 23:59:59.999999
Resolution: 0:00:00.000001


The min and max class attributes reflect the valid range of times in a single day.

## Dates
datetime (as you might suspect) also allows us to work with date timestamps. Calendar date values are represented with the date class. Instances have attributes for year, month, and day. It is easy to create a date representing today’s date using the today() class method.

Let's see some examples:

In [75]:
today = datetime.date.today()
print(today)
print('ctime:', today.ctime())
print('tuple:', today.timetuple())
print('ordinal:', today.toordinal())
print('Year :', today.year)
print('Month:', today.month)
print('Day  :', today.day)

2023-10-28
ctime: Sat Oct 28 00:00:00 2023
tuple: time.struct_time(tm_year=2023, tm_mon=10, tm_mday=28, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=301, tm_isdst=-1)
ordinal: 738821
Year : 2023
Month: 10
Day  : 28


As with time, the range of date values supported can be determined using the min and max attributes.

In [None]:
print('Earliest  :', datetime.date.min)
print('Latest    :', datetime.date.max)
print('Resolution:', datetime.date.resolution)

Another way to create new date instances uses the replace() method of an existing date. For example, you can change the year, leaving the day and month alone.

In [76]:
d1 = datetime.date(2015, 3, 11)
print('d1:', d1)

d2 = d1.replace(year=1990)
print('d2:', d2)

d1: 2015-03-11
d2: 1990-03-11


## Arithmetic
We can perform arithmetic on date objects to check for time differences. For example:

In [77]:
d1

datetime.date(2015, 3, 11)

In [78]:
d2

datetime.date(1990, 3, 11)

In [79]:
d1-d2

datetime.timedelta(days=9131)

In [84]:
datetime.date.today() - datetime.timedelta(days=30)

datetime.date(2023, 9, 28)

Subtracting one date from another gives a timedelta holding the difference in days, and subtracting a timedelta from a date gives an earlier date. You can use the timedelta class to specify various units of time (days, hours, minutes, etc.).
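
A short sketch of both directions, using fixed dates so the results do not depend on the day the code is run. The variable names `gap`, `shift` and `start` are illustrative.

```python
import datetime

d1 = datetime.date(2015, 3, 11)
d2 = datetime.date(1990, 3, 11)

gap = d1 - d2                 # date - date gives a timedelta
print(gap.days)               # 9131

# A timedelta can mix units; it is normalised to days, seconds and microseconds.
shift = datetime.timedelta(weeks=1, hours=12)
start = datetime.datetime(2023, 10, 28, 9, 0)
print(start + shift)          # 2023-11-04 21:00:00
print(shift.total_seconds())  # 648000.0
```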

Great! You should now have a basic understanding of how to use datetime with Python to work with timestamps in your code!

# Math and Random Modules

Python comes with a built-in math module and random module. In this lecture we will give a brief tour of their capabilities. Usually you can simply look up the function call you are looking for in the online documentation.

* [Math Module](https://docs.python.org/3/library/math.html)

* [Random Module](https://docs.python.org/3/library/random.html)

We won't go through every function available in these modules since there are so many, but we will show some useful ones.

## Useful Math Functions

In [85]:
import math

In [86]:
help(math)

Help on built-in module math:

NAME
    math

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
        
        The result is between 0 and pi.
    
    acosh(x, /)
        Return the inverse hyperbolic cosine of x.
    
    asin(x, /)
        Return the arc sine (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    asinh(x, /)
        Return the inverse hyperbolic sine of x.
    
    atan(x, /)
        Return the arc tangent (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    atan2(y, x, /)
        Return the arc tangent (measured in radians) of y/x.
        
        Unlike atan(y/x), the signs of both x and y are considered.
    
    atanh(x, /)
        Return the inverse hyperbolic tangent of x.
    
    cbrt(x, /)
        Return the cube root of x.
    
    ceil(x, /)

### Rounding Numbers

In [87]:
value = 4.35

In [None]:
math.floor(value)

In [None]:
math.ceil(value)

In [None]:
round(value)

### Mathematical Constants

In [89]:
math.pi

3.141592653589793

In [88]:
from math import pi

In [91]:
pi

3.141592653589793

In [None]:
math.e

In [92]:
math.tau

6.283185307179586

In [93]:
math.inf

inf

In [94]:
math.nan

nan

### Logarithmic Values

In [None]:
math.e

In [None]:
# Log Base e
math.log(math.e)

In [None]:
# Raises a ValueError because log(0) is undefined mathematically
math.log(0)

In [None]:
math.log(10)

In [None]:
math.e ** 2.302585092994046
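Since math.log raises a ValueError for inputs where the logarithm is undefined, it pairs naturally with try and except. A minimal sketch, where safe_log is a hypothetical helper name and not part of the math module:

```python
import math

def safe_log(x):
    """Return the natural log of x, or None if it is undefined (hypothetical helper)."""
    try:
        return math.log(x)
    except ValueError:
        # math.log raises ValueError for zero and negative inputs
        return None

print(safe_log(math.e))  # 1.0
print(safe_log(0))       # None
```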

### Custom Base

In [None]:
# math.log(x,base)
math.log(100,10)

In [None]:
10**2

### Trigonometric Functions

In [None]:
# Radians
math.sin(10)

In [None]:
math.degrees(pi/2)

In [None]:
math.radians(180)
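Because the trigonometric functions expect radians, a common pattern is to convert from degrees first. A short sketch, using math.isclose to allow for floating point rounding (the variable names are just for illustration):

```python
import math

angle_degrees = 90
angle_radians = math.radians(angle_degrees)

# sin(90 degrees) should be 1, allowing for floating point rounding
print(math.isclose(math.sin(angle_radians), 1.0))       # True

# degrees() reverses the conversion
print(math.isclose(math.degrees(angle_radians), 90.0))  # True
```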

# Random Module

The random module allows us to create random numbers. We can even set a seed to produce the same random set every time.

The explanation of how a computer attempts to generate random numbers is beyond the scope of this course since it involves higher level mathematics. But if you are interested in this topic check out:
* https://en.wikipedia.org/wiki/Pseudorandom_number_generator
* https://en.wikipedia.org/wiki/Random_seed

## Understanding a seed

Setting a seed starts the pseudorandom number generator from a known state, which means the same random numbers will show up in the same order every time. Note that if you're using Jupyter, the seed needs to be set in the same cell as the random calls to guarantee the same results each time. Getting the same set of random numbers can be important when you are trying different variations of a function and want to compare their performance on random values fairly (so you need the same set of random numbers each time).

In [95]:
import random

In [124]:
random.randint(0,100)

24

In [117]:
random.randint(0,100)

45

In [126]:
# The seed value (1 here) is completely arbitrary, you can pass in any number you want
random.seed(1)
# You can run this cell as many times as you want, it will always return the same number
random.randint(0,100)

17

In [127]:
random.seed(101)
print(random.randint(0,100))

74


In [128]:
# The value 101 is completely arbitrary, you can pass in any number you want
random.seed(101)
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))

74
24
69
45
59


In [129]:
random.seed(101)
random.randint(0,100)

74
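To make the reproducibility point concrete, here is a minimal sketch that reseeds before each batch of draws so two runs see the same values. The helper name draw_numbers is hypothetical.

```python
import random

def draw_numbers(seed, count=5):
    """Reseed the generator, then draw `count` integers from 0 to 100 (hypothetical helper)."""
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(count)]

first_run = draw_numbers(101)
second_run = draw_numbers(101)

# Same seed, same sequence
print(first_run == second_run)  # True
```

Keeping the seed call inside the function is one way to make sure the reseed and the draws always happen together.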

### Random Integers

In [None]:
random.randint(0,100)

### Random with Sequences

#### Grab a random item from a list

In [130]:
mylist = list(range(0,20))

In [131]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [135]:
random.choice(mylist)

14

In [136]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [139]:
random.random()

0.6020428331322688

### Sample with Replacement

Draw a sample of a given size, allowing the same element to be picked more than once. Imagine a bag of numbered lottery balls: you reach in to grab a random lotto ball, then after marking down the number, **you place it back in the bag**, then continue picking another one.

In [140]:
random.choices(population=mylist,k=10)

[18, 9, 15, 6, 8, 5, 3, 8, 10, 3]

### Sample without Replacement

Once an item has been randomly picked, it can't be picked again. Imagine a bag of numbered lottery balls, you reach in to grab a random lotto ball, then after marking down the number, you **leave it out of the bag**, then continue picking another one.

In [141]:
random.sample(population=mylist,k=10)

[10, 7, 12, 13, 6, 1, 15, 18, 4, 3]
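The difference between the two kinds of sampling can be checked directly. This is a small sketch with made-up variable names: picks from sample are always distinct, while picks from choices may repeat.

```python
import random

population = list(range(0, 20))

with_replacement = random.choices(population, k=10)
without_replacement = random.sample(population, k=10)

# sample never repeats an item, so all 10 picks are distinct
print(len(set(without_replacement)) == 10)  # True

# choices may repeat items, so the number of distinct picks is at most 10
print(len(set(with_replacement)) <= 10)     # True

# sample cannot draw more items than the population holds
try:
    random.sample(population, k=25)
except ValueError as err:
    print('ValueError:', err)
```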

### Shuffle a list

**Note: This affects the object in place!**

In [None]:
# Don't assign this to anything!
random.shuffle(mylist)

In [None]:
mylist
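The reason not to assign the result is that random.shuffle returns None. If you want a shuffled copy and an untouched original, one option is to sample the whole list. A short sketch with illustrative variable names:

```python
import random

numbers = list(range(10))

# shuffle reorders the list in place and returns None
returned = random.shuffle(numbers)
print(returned)  # None

# For a shuffled copy that leaves the original alone, sample the whole list
original = list(range(10))
shuffled_copy = random.sample(original, k=len(original))
print(original)                           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(shuffled_copy) == original)  # True
```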

### Random Distributions

#### [Uniform Distribution](https://en.wikipedia.org/wiki/Uniform_distribution)

In [None]:
# Continuous: picks a random value between a and b, each value has an equal chance of being picked.
random.uniform(a=0,b=100)

#### [Normal/Gaussian Distribution](https://en.wikipedia.org/wiki/Normal_distribution)

In [None]:
random.gauss(mu=0,sigma=1)

Final Note: If you find yourself using these libraries a lot, take a look at the NumPy library for Python, which covers all these capabilities with extreme efficiency. We cover this library and a lot more in our data science and machine learning courses.

# Python Debugger

You've probably used a variety of print statements to try to find errors in your code. A better way of doing this is by using Python's built-in debugger module (pdb). The pdb module implements an interactive debugging environment for Python programs. It includes features to let you pause your program, look at the values of variables, and watch program execution step-by-step, so you can understand what your program actually does and find bugs in the logic.

This is a bit difficult to show since it requires creating an error on purpose, but hopefully this simple example illustrates the power of the pdb module. <br>*Note: Keep in mind it would be pretty unusual to use pdb in a Jupyter Notebook setting.*

___
Here we will create an error on purpose, trying to add a list to an integer

In [142]:
x = [1,3,4]
y = 2
z = 3

result = y + z
print(result)
result2 = y+x
print(result2)

5


TypeError: unsupported operand type(s) for +: 'int' and 'list'

Hmmm, looks like we get an error! Let's implement a set_trace() using the pdb module. This will allow us to basically pause the code at the point of the trace and check if anything is wrong.

In [146]:
import pdb

x = [1,3,4]
y = 2
z = 3

result = y + z
print(result)

# pdb.set_trace()

result2 = x.append(y)

print(result2)

5
None


Great! With the pdb.set_trace() line uncommented, the code pauses at that point and we can check what the various variables are and look for errors. You can use 'q' to quit the debugger. For more information on general debugging techniques and more methods, check out the official documentation:
https://docs.python.org/3/library/pdb.html
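As a rough sketch of how a trace might be wired into a function, the example below only pauses when a debug flag is switched on. The function name add_values and the debug parameter are hypothetical; the commands listed in the comments are standard pdb commands.

```python
import pdb

def add_values(y, x, debug=False):
    """Add two values, optionally pausing in the debugger first (hypothetical helper)."""
    if debug:
        # Execution pauses here. Useful commands at the (Pdb) prompt:
        #   p x   print the value of x
        #   n     run the next line
        #   c     continue running
        #   q     quit the debugger
        pdb.set_trace()
    return y + x

# debug is left off so this runs without stopping
print(add_values(2, 3))  # 5
```

Calling add_values(2, [1, 3, 4], debug=True) would pause before the failing addition, so you could inspect x and y first.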

# Overview of Regular Expressions

Regular Expressions (sometimes called regex for short) allow a user to search for strings using almost any sort of rule they can come up with. For example, finding all capital letters in a string, or finding a phone number in a document.

Regular expressions are notorious for their seemingly strange syntax. This strange syntax is a byproduct of their flexibility. Regular expressions have to be able to filter out any string pattern you can imagine, which is why they have a complex string pattern format.

Let's begin by explaining how to search for basic patterns in a string!

## Searching for Basic Patterns

Let's imagine that we have the following string:

In [147]:
text = "The person's phone number is 408-555-1234. Call soon!"

We'll start off by trying to find out if the string "phone" is inside the text string. Now we could quickly do this with:

In [149]:
'phone' in text

True

But let's show the format for regular expressions, because later on we will be searching for patterns that won't have such a simple solution.

In [150]:
import re

In [151]:
pattern = 'phone'

In [152]:
re.search(pattern,text)

<re.Match object; span=(13, 18), match='phone'>

In [153]:
pattern = "NOT IN TEXT"

In [154]:
re.search(pattern,text)

Now we've seen that re.search() takes the pattern, scans the text, and returns a Match object. If the pattern is not found, None is returned (in Jupyter Notebook this just means that nothing is output below the cell).
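Because the result is either a Match object or None, code that uses re.search usually checks the result before calling methods on it. A minimal sketch of that check:

```python
import re

text = "The person's phone number is 408-555-1234. Call soon!"

for pattern in ['phone', 'NOT IN TEXT']:
    match = re.search(pattern, text)
    if match:
        # A Match object is truthy
        print(pattern, 'found at', match.span())
    else:
        # re.search returned None
        print(pattern, 'not found')
```

Skipping the check and calling match.span() on None would raise an AttributeError.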

Let's take a closer look at this Match object.

In [155]:
pattern = 'phone'

In [156]:
match = re.search(pattern,text)

In [157]:
match

<re.Match object; span=(13, 18), match='phone'>

Notice the span. The Match object also has start and end index information.

In [158]:
match.span()

(13, 18)

In [159]:
match.start()

13

In [160]:
match.end()

18

In [163]:
match.group()

'phone'

But what if the pattern occurs more than once?

In [164]:
text = "my phone is a new phone"

In [165]:
match = re.search("phone",text)

In [166]:
match.span()

(3, 8)

Notice it only matches the first instance. If we want a list of all matches, we can use the .findall() method:

In [167]:
matches = re.findall("phone",text)

In [168]:
matches

['phone', 'phone']

In [169]:
len(matches)

2

To get actual match objects, use the iterator:

In [170]:
for match in re.finditer("phone",text):
    print(match.span())

(3, 8)
(18, 23)


If you wanted the actual text that matched, you can use the .group() method.

In [171]:
match.group()

'phone'
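Putting these pieces together, the matched text and its position can be collected for every occurrence in one pass. A short sketch (the variable name found is just for illustration):

```python
import re

text = "my phone is a new phone"

# Each item from finditer is a Match object, so group(), start() and end() are all available
found = [(m.group(), m.start(), m.end()) for m in re.finditer("phone", text)]
print(found)  # [('phone', 3, 8), ('phone', 18, 23)]
```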

# Patterns

So far we've learned how to search for a basic string. What about more complex examples? Such as trying to find a telephone number in a large string of text? Or an email address?

We could just use the search method if we know the exact phone or email, but what if we don't know it? We may know the general format, and we can use that along with regular expressions to search the document for strings that match a particular pattern.

This is where the syntax may appear strange at first, but take your time with this; often it's just a matter of looking up the pattern code.

Let's begin!

## Identifiers for Characters in Patterns

Character types such as digits or whitespace have different codes that represent them. You can use these to build up a pattern string. Notice how these make heavy use of the backslash \ . Because of this, when defining a pattern string for a regular expression we use the format:

    r'mypattern'
    
Placing the r in front of the string tells Python that the \ characters in the pattern string are not meant to be escape characters.
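To see why the r prefix matters, compare how Python treats the same characters with and without it. This sketch uses the regex code \b (a word boundary), chosen here only because '\b' also happens to be a Python string escape for the backspace character:

```python
import re

# Without the r prefix, '\b' is a single backspace character
print(len('\b'))   # 1
# With the r prefix, the backslash and the b are kept as two characters
print(len(r'\b'))  # 2

# Only the raw version reaches the regex engine as the word-boundary code
print(re.findall(r'\bcat\b', 'cat concat cat'))  # ['cat', 'cat']
print(re.findall('\bcat\b', 'cat concat cat'))   # []
```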

Below you can find a table of all the possible identifiers:

<table ><tr><th>Character</th><th>Description</th><th>Example Pattern Code</th><th >Example Match</th></tr>

<tr ><td><span >\d</span></td><td>A digit</td><td>file_\d\d</td><td>file_25</td></tr>

<tr ><td><span >\w</span></td><td>Alphanumeric</td><td>\w-\w\w\w</td><td>A-b_1</td></tr>



<tr ><td><span >\s</span></td><td>White space</td><td>a\sb\sc</td><td>a b c</td></tr>



<tr ><td><span >\D</span></td><td>A non digit</td><td>\D\D\D</td><td>ABC</td></tr>

<tr ><td><span >\W</span></td><td>Non-alphanumeric</td><td>\W\W\W\W\W</td><td>*-+=)</td></tr>

<tr ><td><span >\S</span></td><td>Non-whitespace</td><td>\S\S\S\S</td><td>Yoyo</td></tr></table>

For example:

In [180]:
text = "My telephone number is 408-555-12345"

In [183]:
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d\d',text)

In [184]:
phone.group()

'408-555-12345'

Notice the repetition of \d. That is a bit of an annoyance, especially if we are looking for very long strings of numbers. Let's explore the possible quantifiers.

## Quantifiers

Now that we know the special character designations, we can use them along with quantifiers to define how many we expect.

<table ><tr><th>Character</th><th>Description</th><th>Example Pattern Code</th><th >Example Match</th></tr>

<tr ><td><span >+</span></td><td>Occurs one or more times</td><td>Version \w-\w+</td><td>Version A-b1_1</td></tr>

<tr ><td><span >{3}</span></td><td>Occurs exactly 3 times</td><td>\D{3}</td><td>abc</td></tr>



<tr ><td><span >{2,4}</span></td><td>Occurs 2 to 4 times</td><td>\d{2,4}</td><td>123</td></tr>



<tr ><td><span >{3,}</span></td><td>Occurs 3 or more</td><td>\w{3,}</td><td>anycharacters</td></tr>

<tr ><td><span >\*</span></td><td>Occurs zero or more times</td><td>A\*B\*C*</td><td>AAACC</td></tr>

<tr ><td><span >?</span></td><td>Once or none</td><td>plurals?</td><td>plural</td></tr></table>

Let's rewrite our pattern using these quantifiers:

In [186]:
m = re.search(r'\d{3}-\d{3}-\d{5}',text)

In [187]:
m.group()

'408-555-12345'
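Here is a short sketch of several quantifiers from the table applied to the same text. The sample sentences are made up for illustration.

```python
import re

sample = "Order 7 shipped 12 boxes, 345 crates and 67890 pallets"

# + : one or more digits in a row
print(re.findall(r'\d+', sample))      # ['7', '12', '345', '67890']

# {3} : exactly three digits at a time
print(re.findall(r'\d{3}', sample))    # ['345', '678']

# {2,4} : between two and four digits, as many as possible
print(re.findall(r'\d{2,4}', sample))  # ['12', '345', '6789']

# ? : the s is optional
print(re.findall(r'crates?', "one crate, two crates"))  # ['crate', 'crates']
```

Notice that {3} and {2,4} can match part of a longer run of digits, which is why '67890' shows up as '678' and '6789'.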

## Groups

What if we wanted to do two tasks: find phone numbers, but also be able to quickly extract their area code (the first three digits)? We can use groups for any general task that involves grouping together regular expressions (so that we can later break them down).

Using the phone number example, we can separate groups of regular expressions using parentheses:

In [195]:
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{5})')

In [196]:
results = re.search(phone_pattern,text)

In [197]:
# The entire result
results.group()

'408-555-12345'

In [198]:
# Can then also call by group position.
# Remember groups were separated by parentheses ()
# Note that group numbering starts at 1. Passing in 0 returns the entire match
results.group(1)

'408'

In [199]:
results.group(2)

'555'

In [200]:
results.group(3)

'12345'

In [201]:
# We only had three groups of parentheses
results.group(4)

IndexError: no such group
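Two related features of the re module may be useful here: the groups() method returns every group at once, and the (?P&lt;name&gt;...) syntax lets you refer to a group by name instead of by position. A sketch, where the group names area, prefix and line are chosen only for this example:

```python
import re

text = "My telephone number is 408-555-12345"

# (?P<name>...) gives a group a name in addition to its position
named_pattern = re.compile(r'(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{5})')
named_results = named_pattern.search(text)

print(named_results.groups())       # ('408', '555', '12345')
print(named_results.group('area'))  # 408
print(named_results.groupdict())    # {'area': '408', 'prefix': '555', 'line': '12345'}
```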

## Additional Regex Syntax

### Or operator |

Use the pipe operator to have an **or** statement. For example:

In [202]:
re.search(r"man|woman","This man was here.")

<re.Match object; span=(5, 8), match='man'>

In [203]:
re.search(r"man|woman","This woman was here.")

<re.Match object; span=(5, 10), match='woman'>

### The Wildcard Character

Use a "wildcard" as a placeholder that will match any character placed there. You can use a simple period **.** for this. For example:

In [204]:
re.findall(r".at","The cat in the hat sat here.")

['cat', 'hat', 'sat']

In [205]:
re.findall(r".at","The bat went splat")

['bat', 'lat']

Notice how we only matched three characters from "splat". That is because each **.** stands for exactly one character, so we need one **.** per wildcard letter, or we can use the quantifiers described above to allow more.

In [207]:
re.findall(r"...at","The bat went splat")

['e bat', 'splat']

However, this still has the problem of grabbing extra characters, such as the 'e ' before 'bat'. Really we only want words that end with "at".

In [208]:
# One or more non-whitespace that ends with 'at'
re.findall(r'\S+at',"The bat went splat")

['bat', 'splat']

### Starts with and Ends With

We can use the **^** to signal starts with, and the **$** to signal ends with:

In [212]:
# Ends with a number
re.findall(r'\S+\d$','This ends with a number result2')

['result2']

In [215]:
# Starts with a number
re.findall(r'^\d\S+','1application is the loneliest number.')

['1application']

Note that this is for the entire string, not individual words!
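If the text has several lines and you want **^** and **$** to apply to each line, the re.MULTILINE flag changes their behavior. A small sketch with a made-up three-line string:

```python
import re

lines = "1 apple\nbanana 2\n3 cherries"

# By default ^ only matches at the very start of the whole string
print(re.findall(r'^\d', lines))                      # ['1']

# With re.MULTILINE, ^ and $ also match at the start and end of each line
print(re.findall(r'^\d', lines, flags=re.MULTILINE))  # ['1', '3']
print(re.findall(r'\d$', lines, flags=re.MULTILINE))  # ['2']
```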

### Exclusion

To exclude characters, we can use the **^** symbol in conjunction with a set of brackets **[]**. Any character listed inside the brackets after the **^** is excluded from the match. For example:

In [216]:
phrase = "there are 3 numbers 34 inside 5 this sentence."

In [217]:
re.findall(r'[^\d]',phrase)

['t',
 'h',
 'e',
 'r',
 'e',
 ' ',
 'a',
 'r',
 'e',
 ' ',
 ' ',
 'n',
 'u',
 'm',
 'b',
 'e',
 'r',
 's',
 ' ',
 ' ',
 'i',
 'n',
 's',
 'i',
 'd',
 'e',
 ' ',
 ' ',
 't',
 'h',
 'i',
 's',
 ' ',
 's',
 'e',
 'n',
 't',
 'e',
 'n',
 'c',
 'e',
 '.']

To get the words back together, use a + sign 

In [218]:
re.findall(r'[^\d]+',phrase)

['there are ', ' numbers ', ' inside ', ' this sentence.']

We can use this to remove punctuation from a sentence.

In [219]:
test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

In [220]:
re.findall('[^!.? ]+',test_phrase)

['This',
 'is',
 'a',
 'string',
 'But',
 'it',
 'has',
 'punctuation',
 'How',
 'can',
 'we',
 'remove',
 'it']

In [221]:
clean = ' '.join(re.findall('[^!.? ]+',test_phrase))

In [222]:
clean

'This is a string But it has punctuation How can we remove it'
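Another way to get the same result is re.sub, which replaces every match of a pattern with a replacement string. A minimal sketch that swaps each punctuation mark for an empty string:

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

# Replace every !, . or ? with an empty string
clean = re.sub(r'[!.?]', '', test_phrase)
print(clean)  # This is a string But it has punctuation How can we remove it
```

This avoids the split-and-join step, since the spaces between words are left untouched.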

## Brackets for Grouping

As we showed above we can use brackets to group together options, for example if we wanted to find hyphenated words:

In [223]:
text = 'Only find the hypen-words in this sentence. But you do not know how long-ish they are'

In [225]:
re.findall(r'[\w]+-[\w]+',text)

['hypen-words', 'long-ish']

## Parentheses for Multiple Options

If we have multiple options for matching, we can use parentheses to list out these options. For example:

In [226]:
# Find words that start with cat and end with one of these options: 'fish','nap', or 'claw'
text = 'Hello, would you like some catfish?'
texttwo = "Hello, would you like to take a catnap?"
textthree = "Hello, have you seen this caterpillar?"

In [None]:
re.search(r'cat(fish|nap|claw)',text)

In [None]:
re.search(r'cat(fish|nap|claw)',texttwo)

In [None]:
# None returned
re.search(r'cat(fish|nap|claw)',textthree)
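Since the options sit inside parentheses, they also form a group, so the option that matched can be read back with group(1). A sketch that loops over the three sentences (the list name sentences is just for illustration):

```python
import re

pattern = r'cat(fish|nap|claw)'
sentences = [
    'Hello, would you like some catfish?',
    'Hello, would you like to take a catnap?',
    'Hello, have you seen this caterpillar?',
]

for sentence in sentences:
    match = re.search(pattern, sentence)
    if match:
        # group() is the whole match, group(1) is the option that matched
        print(match.group(), match.group(1))
    else:
        print('no match')
```

This prints "catfish fish", then "catnap nap", then "no match" for the caterpillar sentence.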

### Conclusion

Excellent work! For full information on all possible patterns, check out: https://docs.python.org/3/howto/regex.html

____

# Timing your code
Sometimes it's important to know how long your code is taking to run, or at least know if a particular line of code is slowing down your entire project. Python has a built-in timing module to do this. 

## Example Function or Script

Here we have two functions that do the same thing, but in different ways.
How can we tell which one is more efficient? Let's time it!

In [229]:
def func_one(n):
    '''
    Given a number n, returns a list of string integers
    ['0','1','2',...,'n-1']
    '''
    return [str(num) for num in range(n)]

In [230]:
func_one(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [231]:
def func_two(n):
    '''
    Given a number n, returns a list of string integers
    ['0','1','2',...,'n-1']
    '''
    return list(map(str,range(n)))

In [232]:
func_two(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

### Timing Start and Stop

We can try using the time module to simply calculate the elapsed time for the code. Keep in mind that time.time() has limited resolution, so this approach is only reliable when the code takes a noticeable amount of time to run (say **at least** 0.1 seconds).

In [227]:
import time

In [233]:
# Step 1: Get start time
start_time = time.time()
# Step 2: Run the code you want to time
result = func_one(1000000)
# Step 3: Calculate total time elapsed
end_time = time.time()
difference = end_time - start_time
print(difference)

0.4344003200531006


In [None]:
end_time

In [234]:
# Step 1: Get start time
start_time = time.time()
# Step 2: Run the code you want to time
result = func_two(100000000)
# Step 3: Calculate total time elapsed
end_time = time.time() - start_time

In [235]:
end_time

57.65534329414368
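The standard library also provides time.perf_counter(), a clock intended for measuring short durations, which can be a better fit for this start and stop pattern. A minimal sketch, using a smaller n than above so it finishes quickly:

```python
import time

def func_one(n):
    return [str(num) for num in range(n)]

# perf_counter values are only meaningful as differences between two calls
start = time.perf_counter()
result = func_one(100000)
elapsed = time.perf_counter() - start

print(f'{elapsed:.4f} seconds')
```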

### Timeit Module

What if we have two blocks of code that are quite fast? The difference measured with time.time() may not be enough to tell which is faster. In this case, we can use the timeit module.

The timeit module takes in two strings, a statement (stmt) and a setup. It runs the setup code once, then runs the stmt code n times and reports back the total time those runs took.

In [236]:
import timeit

The setup is anything that needs to be defined beforehand, such as function definitions.

In [237]:
setup = '''
def func_one(n):
    return [str(num) for num in range(n)]
'''

In [238]:
stmt = 'func_one(100)'

In [239]:
timeit.timeit(stmt,setup,number=100000)

5.596082600066438

Now let's try running func_two 100,000 times and compare the length of time it took.

In [None]:
setup2 = '''
def func_two(n):
    return list(map(str,range(n)))
'''

In [None]:
stmt2 = 'func_two(100)'

In [None]:
timeit.timeit(stmt2,setup2,number=100000)

It looks like func_two is more efficient. You can specify a larger number of runs if you want to make the difference clearer for fast-performing functions.

In [None]:
timeit.timeit(stmt,setup,number=1000000)

In [None]:
timeit.timeit(stmt2,setup2,number=1000000)
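timeit.timeit also accepts a callable in place of the stmt string, which avoids writing the functions out again as setup text. Since the value it returns is the total time for all runs, dividing by the number of runs gives a per-call figure. A sketch with a deliberately small run count (the variable names are illustrative):

```python
import timeit

def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

runs = 1000

# timeit.timeit returns the total time for all runs, so divide to get a per-call average
total_one = timeit.timeit(lambda: func_one(100), number=runs)
total_two = timeit.timeit(lambda: func_two(100), number=runs)
print('func_one per call:', total_one / runs)
print('func_two per call:', total_two / runs)

# repeat runs the whole measurement several times; the minimum is usually the least noisy
best_one = min(timeit.repeat(lambda: func_one(100), number=runs, repeat=3))
print('func_one best total:', best_one)
```

The exact numbers depend on your machine, so compare the two functions on the same computer and in the same session.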

## Timing your code with Jupyter "magic" method

**NOTE: This method is ONLY available in Jupyter and the magic command needs to be at the top of the cell with nothing above it (not even commented code)**

In [None]:
%%timeit
func_one(100)

In [None]:
%%timeit
func_two(100)

Great! Check out the documentation for more information:
https://docs.python.org/3/library/timeit.html