This notebook contains a mix of standard library modules and third-party libraries for producing more advanced and more useful output from your programs.

# tqdm: Progress bars!

Install with:

```bash
conda install tqdm
```

`tqdm` gives you progress bars when iterating through something.  This is immensely helpful if you've got a loop that will run over a lot of data, e.g., looping through lines in a very large file.  It has two important functions in it:

- `tqdm.tqdm()`: wraps any iterable object and provides progress bars as you iterate through it.
- `tqdm.trange()`: shortcut for `tqdm.tqdm(range(...))`.

If you're using Jupyter notebooks, you can import these from `tqdm.notebook` for prettier printing.  Otherwise, import them from `tqdm` directly.  `tqdm` also integrates wonderfully with Pandas, which we'll see next month.

In [1]:
from tqdm import tqdm, trange
from time import sleep # sleep(n) -> pause the program for n seconds

for i in trange(10):
    sleep(1)
    
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in tqdm(numbers):
    sleep(1)

100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00,  1.01s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00,  1.01s/it]


In [2]:
# Same code, just using the tqdm.notebook implementations
from tqdm.notebook import tqdm, trange
from time import sleep

for i in trange(10):
    sleep(1)
    
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in tqdm(numbers):
    sleep(1)

  0%|          | 0/10 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

You can also add some nice decorations to the progress bars, like descriptions, and turn on unit scaling:

In [3]:
for i in trange(10_000_000, desc="A big loop!"):
    pass

for i in trange(10_000_000, desc="A big loop!", unit_scale=True):
    pass

A big loop!:   0%|          | 0/10000000 [00:00<?, ?it/s]

A big loop!:   0%|          | 0.00/10.0M [00:00<?, ?it/s]

You can manually control when the progress bar updates.  Note that if you don't give `tqdm.tqdm()` an iterable, or you give it one that doesn't have a known length, it will only show the number of iterations completed.  It won't print an estimated time to completion.  If you know the total number of iterations ahead of time, you can pass `total=` to make it use that value.

In [4]:
with tqdm(desc="Odd numbers found") as pbar:
    for i in range(1000):
        if i % 2 == 1:
            pbar.update(1)
            sleep(0.001)

Odd numbers found: 0it [00:00, ?it/s]

Lastly, you can have multiple progress bars going at once, e.g. to track different things.

In [5]:
with tqdm(desc="Numbers checked", position=0, total=1000) as pbar1, tqdm(desc="Odd numbers found", position=1) as pbar2:
    for i in range(1000):
        pbar1.update(1)
        if i % 2 == 1:
            pbar2.update(1)
        sleep(0.001)

Numbers checked:   0%|          | 0/1000 [00:00<?, ?it/s]

Odd numbers found: 0it [00:00, ?it/s]

# Warnings, Exceptions, and Asserting: How to make things break

Sometimes you want to make sure your code crashes.  In fact, this happens more than you might think: forcing your code to crash can be a great way to make sure it's only doing *exactly* what it was supposed to, without adding a lot of extra logic.  E.g.: if your code is supposed to work on text files, but you give it an image file, it probably makes sense to have it crash and spit back a message like "FileError: Can't open a .png file as a text file."

Sometimes you want the messages to not crash the program, but you want to let the user know that something looks fishy.  E.g.: "FileWarning: .txt file looks like a .csv file.  Opening as a .txt file anyways."

It's easy to do both of these.  First, crashing your program.  We call this *raising* (or sometimes *throwing*) an *exception* (or *error*).  Use the magic `raise` keyword in Python to make this happen.

In [6]:
raise ValueError("Oh no, the program hit an exception and is exiting!")

ValueError: Oh no, the program hit an exception and is exiting!

The general syntax is very simple: `raise SomethingError("Error message")`.  `SomethingError` is any of Python's built-in error/exception types--of which there are a lot, like `ValueError`, `ZeroDivisionError`, `NameError`, and so on--but you can also create your own error types if needed.  Usually that's not necessary; Python's built-in exception types cover a lot of ground, and you should be putting the relevant information in the message text anyways.
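
For the rare case where no built-in type fits, here is a minimal sketch of a custom error type.  The names `FileFormatError` and `open_text_file` are hypothetical, made up for this illustration; the function only pretends to open a file.

```python
class FileFormatError(ValueError):
    """Raised when a file has the wrong format (hypothetical example type)."""


def open_text_file(path):
    """Pretend to open `path`, refusing anything that isn't a .txt file."""
    if not path.endswith(".txt"):
        # Put the relevant details in the message text.
        raise FileFormatError(f"Can't open {path!r} as a text file.")
    return f"opened {path}"


try:
    open_text_file("photo.png")
except FileFormatError as err:
    # FileFormatError subclasses ValueError, so `except ValueError`
    # would catch it too.
    print(f"Caught: {err}")
```

Subclassing an existing built-in type (here `ValueError`) means code that already handles the built-in keeps working.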

You can also use the `assert` keyword as a shorthand for some checks that *need* to be true for the program to continue.

In [7]:
name = "Henry"
assert name == "George", ValueError(f"Expected `name` to be 'George', got: '{name}'.")

AssertionError: Expected `name` to be 'George', got: 'Henry'.

This is *not quite* shorthand for
```python
if name != "George":
    raise ValueError(...)
```

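since Python can be told to ignore all `assert` checks (by running it with the `-O` flag), and a failed `assert` always raises an `AssertionError`, even when you hand it a `ValueError` as the message, as the output above shows.  Skipping the checks might be useful if, e.g., you have a lot of them, you've made sure they all pass, and you want to run your code without them for a performance boost, but you don't want to go back and remove them all manually.  (After all, you might need them later!)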
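
To make the difference concrete, here is a small sketch that puts the two styles side by side.  The function names are hypothetical; `__debug__` is a Python built-in constant that is `True` in a normal run and `False` when Python is started with `-O`.

```python
def check_name_with_assert(name):
    # Skipped entirely when Python runs with the -O flag.
    assert name == "George", f"Expected 'George', got: '{name}'."
    return name


def check_name_with_raise(name):
    # Always runs, and raises ValueError rather than AssertionError.
    if name != "George":
        raise ValueError(f"Expected 'George', got: '{name}'.")
    return name


print("assert statements active:", __debug__)

for check in (check_name_with_assert, check_name_with_raise):
    try:
        check("Henry")
    except (AssertionError, ValueError) as err:
        print(f"{check.__name__} raised {type(err).__name__}: {err}")
```

A reasonable rule of thumb: use `assert` for sanity checks on your own logic, and an explicit `raise` for anything that must be checked no matter how the program is run.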

Warnings are a bit more complex, but not by much.  Warnings are usually reserved for telling you when something looks a bit weird, but not so weird the code can't keep running.  Use the `warnings` module in the standard library to issue your own warnings.

In [8]:
import warnings
warnings.warn("You did something strange.", UserWarning)



The second argument to `warnings.warn()` is any of Python's Warning types, which are all built-ins.  `UserWarning` is a good catch-all when you're not sure what category a warning belongs to.
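
As a sketch of how a warning lets the code keep running, the hypothetical function `parse_rank` below warns about messy input but still returns a value.  `warnings.catch_warnings(record=True)` collects the warnings in a list instead of printing them, which is handy for inspecting or testing them.

```python
import warnings


def parse_rank(text):
    """Convert a rank like '2' to an int, warning about stray whitespace."""
    if text != text.strip():
        warnings.warn(f"Rank {text!r} has extra whitespace.", UserWarning)
    return int(text)


# Record warnings instead of printing them, so we can look at them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # report every warning, even repeats
    rank = parse_rank(" 2 ")

print(rank)  # the function still returned a value
print(caught[0].category.__name__, caught[0].message)
```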

# Prettier printing with `pprint` and `icecream`

Consider the following dictionary, which is pretty messy and deeply nested:

In [9]:
my_dict = {"name":"Henry","favorite languages":{"python":{"rank":"2","proficiency":"high"},"julia":{"rank":"1","proficiency":"medium"},"haskell":{"rank":"3","proficiency":"medium-low"}}, "degrees":[["Rice", "Physics", "BA"], ["Rice", "Linguistics", "BA"], ["UT Arlington", "Linguistics", "MA"]]}

If we just print it out, we get a hard-to-read mess.

In [10]:
print(my_dict)

{'name': 'Henry', 'favorite languages': {'python': {'rank': '2', 'proficiency': 'high'}, 'julia': {'rank': '1', 'proficiency': 'medium'}, 'haskell': {'rank': '3', 'proficiency': 'medium-low'}}, 'degrees': [['Rice', 'Physics', 'BA'], ['Rice', 'Linguistics', 'BA'], ['UT Arlington', 'Linguistics', 'MA']]}


We would like to see this a bit more nicely formatted.  There are two major options.  (Well, three, if you consider that this can be converted to JSON and formatted using the `json` module--but we're focused on more general ways to format things nicely.)  The first is to use the `pprint` module in the standard library.  This library "pretty-prints" (hence the name) data structures you pass to the `pprint.pprint()` function.

In [11]:
import pprint
pprint.pprint(my_dict)

{'degrees': [['Rice', 'Physics', 'BA'],
             ['Rice', 'Linguistics', 'BA'],
             ['UT Arlington', 'Linguistics', 'MA']],
 'favorite languages': {'haskell': {'proficiency': 'medium-low', 'rank': '3'},
                        'julia': {'proficiency': 'medium', 'rank': '1'},
                        'python': {'proficiency': 'high', 'rank': '2'}},
 'name': 'Henry'}


If you need the pretty-formatted string itself, because you want to do something with it later, you can use `pprint.pformat()`.

In [12]:
formatted = pprint.pformat(my_dict)
print(formatted)

{'degrees': [['Rice', 'Physics', 'BA'],
             ['Rice', 'Linguistics', 'BA'],
             ['UT Arlington', 'Linguistics', 'MA']],
 'favorite languages': {'haskell': {'proficiency': 'medium-low', 'rank': '3'},
                        'julia': {'proficiency': 'medium', 'rank': '1'},
                        'python': {'proficiency': 'high', 'rank': '2'}},
 'name': 'Henry'}


There are a lot of things you can control with the pretty-printing tools in Python--you can read the `pprint` documentation for details--but I find that just using `pprint.pprint()` usually gets the job done.
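
A short sketch of three of those options, using a smaller dictionary (`profile`, made up for this example).  `depth`, `width`, and `sort_dicts` are keyword arguments of both `pprint.pprint()` and `pprint.pformat()`; `sort_dicts` needs Python 3.8 or newer.

```python
import pprint

profile = {
    "name": "Henry",
    "favorite languages": {
        "python": {"rank": "2", "proficiency": "high"},
        "julia": {"rank": "1", "proficiency": "medium"},
    },
}

# depth=1 collapses nested containers; sort_dicts=False keeps the
# original key order instead of sorting keys alphabetically.
shallow = pprint.pformat(profile, depth=1, sort_dicts=False)
print(shallow)

# width sets how many characters fit on a line before wrapping.
pprint.pprint(profile, width=40, sort_dicts=False)
```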

There is also a third-party library, `icecream`, which has a function in it named `ic()`.  This function behaves like pretty-printing, but it's more designed for debugging and monitoring your program.  Install icecream with:

```bash
conda install -c conda-forge icecream
```

(`icecream` is not available in the main `conda` channel--you need to grab it from `conda-forge`)

In [13]:
from icecream import ic
ic(my_dict)

ic| my_dict: {'degrees': [['Rice', 'Physics', 'BA'],
                          ['Rice', 'Linguistics', 'BA'],
                          ['UT Arlington', 'Linguistics', 'MA']],
              'favorite languages': {'haskell': {'proficiency': 'medium-low', 'rank': '3'},
                                     'julia': {'proficiency': 'medium', 'rank': '1'},
                                     'python': {'proficiency': 'high', 'rank': '2'}},
              'name': 'Henry'}


{'name': 'Henry',
 'favorite languages': {'python': {'rank': '2', 'proficiency': 'high'},
  'julia': {'rank': '1', 'proficiency': 'medium'},
  'haskell': {'rank': '3', 'proficiency': 'medium-low'}},
 'degrees': [['Rice', 'Physics', 'BA'],
  ['Rice', 'Linguistics', 'BA'],
  ['UT Arlington', 'Linguistics', 'MA']]}

Note the red text in the above output--that's the part you usually care about.  In PyCharm or anything other than Jupyter, the output generally looks nicer, including supporting colors!

# `logging`: How to check what your code did

*For more details: check the [logging HOWTO](https://docs.python.org/3/howto/logging.html) guide in the Python documentation.*

Here's a good rule of thumb: *more feedback is better* (in life, generally, but also in code).  It's much better to have code that tells you what it's doing at each step, ideally with progress bars (like `tqdm`!), rather than code that just sits there spinning away until it magically gets an answer.  If your code has no intermediate output, it's much harder to reason about things like:
- Is my code stuck somewhere?  Maybe in an infinite loop?
- Did my code get a lot more data than it was expecting, so it's just taking a while to churn through it?
- What file did my code actually load?
- Does anything look a little strange when the code is running?

You *could* just use a lot of `print()` function calls, but this causes a problem.  You're probably already using `print()` for the "important output" from the program; if you start adding tons of `print()` calls everywhere to monitor things like the above problems, it can be harder to find the output you actually need.

Enter *logging:* a set of tools and techniques to let you print out as many detailed messages about your program as you want, without cluttering up the important output from `print()` functions!  Usually, any *logged messages* will be written to a *logfile*, which you (usually) only need to read if something crashes and you need to debug it.

Python's standard library has a `logging` module that will do basically everything you need.

In [14]:
import logging

logging.debug(
    "I'm a DEBUG message.  You probably don't need to read me "
    "unless you're doing a really deep dive into what happened."
)
logging.info(
    "I'm an INFO message.  I might be useful for quick sanity "
    "checks.  I'm usually less detailed than a debug message."
)
logging.warning(
    "I'm a WARNING message.  If you see me, something in your "
    "program looks weird.  The program is still running, but you "
    "should probably check on this, because it might be "
    "something you want to deal with."
)
logging.error(
    "I'm an ERROR message.  If you see me, something has gone wrong. "
    "The code might keep running--maybe this is an error that got caught "
    "and handled--but something still went wrong."
)
logging.exception(
    "I'm an EXCEPTION message.  I'm basically the same as an error "
    "message, but usually you'll see me when the thing that went wrong "
    "is a pretty common kind of issue.  E.g., dividing by zero, referencing "
    "a variable that doesn't exist, or trying to add two things that can't "
    "be added."
)
logging.critical(
    "I'm a CRITICAL message.  Usually if you see me, it means something "
    "has gone so terribly wrong that the program (or some part of the program) "
    "can't keep running and has to stop."
)

ERROR:root:I'm an ERROR message.  If you see me, something has gone wrong. The code might keep running--maybe this is an error that got caught and handled--but something still went wrong.
ERROR:root:I'm an EXCEPTION message.  I'm basically the same as an error message, but usually you'll see me when the thing that went wrong is a pretty common kind of issue.  E.g., dividing by zero, referencing a variable that doesn't exist, or trying to add two things that can't be added.
NoneType: None
CRITICAL:root:I'm a CRITICAL message.  Usually if you see me, it means something has gone so terribly wrong that the program (or some part of the program) can't keep running and has to stop.
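
The `NoneType: None` line above appears because `logging.exception()` logs at the ERROR level and appends the traceback of the exception currently being handled, and in that cell there wasn't one.  It is meant to be called from inside an `except` block.  A minimal sketch, using a named logger that writes into a string so the demo doesn't depend on how the root logger is set up (the logger name `exception_demo` is arbitrary):

```python
import io
import logging

# A named logger with its own handler, writing into an in-memory string.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("exception_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the message out of the root logger's output

try:
    1 / 0
except ZeroDivisionError:
    # Logged at ERROR level, followed by the traceback of the error.
    logger.exception("Couldn't compute the ratio.")

log_text = stream.getvalue()
print(log_text)
```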


Note how debug and info messages were not printed out.  Each kind of message has a *log level* associated with it; debug is the lowest, and critical is the highest.  By default, Python only prints out warnings and above.  You can change this pretty easily, in two ways.

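1. Have your program accept a command-line option for the log level, e.g. `python my_script.py --log=DEBUG` (or whatever the *lowest* log level is you want to see; `my_script.py` stands in for your own script).  Python itself has no `--log` flag; your program has to read the option and pass it along to `logging`.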
2. Use a `logging.basicConfig()` call to change the options from within your program.

Note that neither of these works in Jupyter notebooks, so I won't be showing them here.  But, try copying and pasting the code below into PyCharm (or another editor of your choice) and running it to see what happens.

In [15]:
import logging
logging.basicConfig(level=logging.DEBUG)
logging.debug(
    "I'm a DEBUG message.  You probably don't need to read me "
    "unless you're doing a really deep dive into what happened."
)
logging.info(
    "I'm an INFO message.  I might be useful for quick sanity "
    "checks.  I'm usually less detailed than a debug message."
)
logging.warning(
    "I'm a WARNING message.  If you see me, something in your "
    "program looks weird.  The program is still running, but you "
    "should probably check on this, because it might be "
    "something you want to deal with."
)

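
One way to support a `--log` option is to parse it in your own script, since the interpreter doesn't do it for you.  The sketch below is loosely modeled on the approach in the logging HOWTO; `parse_level` and `configure_logging` are hypothetical helper names.  It also passes `force=True` (available since Python 3.8) so that `basicConfig()` replaces any handlers that were set up earlier, which is assumed here to be what you want.

```python
import argparse
import logging


def parse_level(name):
    """Turn a level name like 'debug' into logging.DEBUG (an int)."""
    level = getattr(logging, name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Invalid log level: {name!r}")
    return level


def configure_logging(argv):
    """Read a --log option from `argv` and configure the root logger."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log", default="WARNING")
    args = parser.parse_args(argv)
    # force=True replaces handlers configured by earlier logging calls.
    logging.basicConfig(level=parse_level(args.log), force=True)
    return args.log


# In a real script, pass sys.argv[1:] instead of a hard-coded list.
configure_logging(["--log=DEBUG"])
logging.debug("Now DEBUG messages are shown.")
```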


You can also use `logging.basicConfig()` to redirect all log messages to a file.  This is the most common way to use logging--usually, it's for checking what happened *after the fact* rather than *while the code is running.*

In [16]:
# make sure to still set `level=`; only messages at or above this logging level
# are sent to the log file.  This also doesn't always work great in Jupyter
# Notebooks--the file might not get created where you want it to.
logging.basicConfig(filename="logging_demo.log", level=logging.DEBUG)
logging.debug("I'm a debug message.")
logging.critical("Something broke!")

CRITICAL:root:Something broke!
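
In the output above, the message still went to the screen: `basicConfig()` does nothing if the root logger already has handlers, which was the case here after the earlier logging calls.  A hedged sketch of one way around that is `force=True` (Python 3.8+), shown below with an optional `format=` string that adds a timestamp.  The temporary directory is only there so the demo cleans up after itself.

```python
import logging
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "logging_demo.log")

    logging.basicConfig(
        filename=log_path,
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    logging.debug("I'm a debug message.")
    logging.critical("Something broke!")

    # Close and detach the file handler before reading the file back.
    root_logger = logging.getLogger()
    for file_handler in root_logger.handlers[:]:
        file_handler.close()
        root_logger.removeHandler(file_handler)

    with open(log_path) as log_file:
        log_lines = log_file.read().splitlines()

print(log_lines)
```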
