(auto-office)=
# The Automated Office

In this chapter, we'll look at a range of ways to automate processes and tasks you might need to undertake in an office context.

Let's import a few of the packages we'll need first. You may need to install some of these; the Chapter on {ref}`code-preliminaries` covers how to install new packages.

In [None]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Plot settings
plt.style.use(
    "https://github.com/aeturrell/coding-for-economists/raw/main/plot_style.txt"
)

# Pandas: Set max rows displayed for readability
pd.set_option("display.max_rows", 8)

# Set seed for random numbers
seed_for_prng = 78557
prng = np.random.default_rng(
    seed_for_prng
)  # prng = pseudo-random number generator
# Turn off warnings
warnings.filterwarnings("ignore")

## Files

Python is sometimes thought of as a 'glue' language because it can glue together lots of different functionalities (including calling other languages). Working with the files and folders of your operating system is no exception.

The single most important module for manipulating files in Python is `os`, which interacts with your operating system and is built into Python (so no need for a separate install). Let's start by getting the current working directory with `os.getcwd()`. The path it returns is on whichever computer the kernel running the code lives on.

In [None]:
import os

os.getcwd()

`os` can be used to create files and directories; for example, `os.mkdir()` creates a new directory (but raises an error if it already exists). There are also commands to remove files, such as `os.remove()`, which should of course be used with care!
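
As a minimal sketch of these operations, the snippet below works entirely inside a temporary directory, so nothing else on your machine is touched. The directory and file names are made up for illustration. It also shows `os.makedirs(..., exist_ok=True)`, a variant that does not raise an error if the directory already exists.

```python
import os
import tempfile

# Work inside a throwaway directory so no real files are affected
with tempfile.TemporaryDirectory() as tmp_dir:
    new_dir = os.path.join(tmp_dir, "reports")  # hypothetical directory name
    os.mkdir(new_dir)

    # Calling os.mkdir again on an existing directory raises FileExistsError
    try:
        os.mkdir(new_dir)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

    # os.makedirs with exist_ok=True is silent if the directory already exists
    os.makedirs(new_dir, exist_ok=True)

    # Create an empty file, then remove it again
    file_path = os.path.join(new_dir, "draft.txt")  # hypothetical file name
    with open(file_path, "w") as f:
        f.write("")
    os.remove(file_path)
    file_still_exists = os.path.exists(file_path)

print(f"Second mkdir raised an error: {mkdir_failed}")
print(f"File exists after removal: {file_still_exists}")
```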

One particularly useful `os` function is `os.stat(path)`, whose `st_size` attribute gives the size in bytes of the file at a given path. To get a bit meta, we can use it to query the size of the page you're currently reading.

In [None]:
# st_size is in bytes; divide by 1e3 to get kilobytes
print(f"The current page is {os.stat('auto-office.ipynb').st_size/1e3} kilobytes.")

Another command you should be aware of is `os.chdir(path)`, which changes the current working directory of your code. To see the contents of the directory that your interactive window is currently in, use `os.listdir()`. Here's an example of using it, though we'll only show the first five entries:

In [None]:
os.listdir()[:5]
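
`os.listdir()` and `os.stat()` combine naturally. The sketch below defines a small helper, `list_file_sizes` (a hypothetical name, not part of `os`), that reports the size in bytes of every file in a directory. It is demonstrated on a temporary directory containing made-up files.

```python
import os
import tempfile


def list_file_sizes(directory):
    """Return a dict mapping each file name in `directory` to its size in bytes."""
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip sub-directories
            sizes[name] = os.stat(path).st_size
    return sizes


# Demonstrate on a throwaway directory with two files and one sub-directory
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "a.txt"), "w") as f:
        f.write("hello")
    with open(os.path.join(tmp_dir, "b.txt"), "w") as f:
        f.write("")
    os.mkdir(os.path.join(tmp_dir, "subfolder"))
    example_sizes = list_file_sizes(tmp_dir)

print(example_sizes)
```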

`shutil` is another handy file-manipulation module built into Python. It has `copyfile` and `move` functions, which do exactly what you'd expect. You can find more information on organising files and folders in an automated way over in [Chapter 9](https://automatetheboringstuff.com/chapter9/) of *Automate the Boring Stuff With Python*.
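
Here is a short sketch of `shutil.copyfile` and `shutil.move` in action. The file and folder names are invented for the example, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.txt")  # hypothetical file name
    with open(original, "w") as f:
        f.write("some text")

    # copyfile copies the contents to a new file; the original is kept
    backup = os.path.join(tmp_dir, "backup.txt")
    shutil.copyfile(original, backup)

    # move relocates the file into an existing directory and returns the new path
    archive_dir = os.path.join(tmp_dir, "archive")
    os.mkdir(archive_dir)
    moved = shutil.move(original, archive_dir)

    original_exists = os.path.exists(original)
    backup_exists = os.path.exists(backup)
    moved_name = os.path.basename(moved)

print(f"Original still in place: {original_exists}")
print(f"Backup exists: {backup_exists}")
print(f"Moved file is called: {moved_name}")
```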

[**watchdog**](https://pythonhosted.org/watchdog) is a library that allows you to monitor files on a computer for changes, and to log changes to a text file when they do occur. This can be useful in a production setting, or for monitoring changes in files on a connected network drive.

### Downloading Files

Downloading files programmatically and repeatably is possible using the `urllib` library, which comes built-in with Python. Here's an example of how to use it to download a file and give it a specific name:

```python
import urllib.request

url = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1031268/NIC_Annual_Report_and_Accounts_2020_to_2021_Final_4_November.pdf"

urllib.request.urlretrieve(url, "nic_ann_rep.pdf")
```
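
If a script like this is run repeatedly, you may not want to fetch the same file every time. One possible approach, sketched below, is to wrap `urlretrieve` in a helper that skips the download when the destination file already exists. The function name `download_if_missing` is an assumption made for this example rather than anything provided by `urllib`.

```python
import os
import urllib.request


def download_if_missing(url, dest_path):
    """Download `url` to `dest_path` unless that file already exists.

    Returns True if a download happened and False if it was skipped.
    """
    if os.path.exists(dest_path):
        return False
    urllib.request.urlretrieve(url, dest_path)
    return True
```

Calling `download_if_missing(url, "nic_ann_rep.pdf")` twice in a row would then only hit the network the first time. Note that this sketch checks only whether the file exists, not whether the remote copy has changed.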

You can also download and unzip files in one fell swoop.

```python
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

# URL of the zip file
zipurl = "https://files.stlouisfed.org/files/htdocs/uploads/FRED-QD%20Appendix.zip"

# extract to path
extract_to = "downloads/"

# Download the archive into memory, then extract its contents
with ZipFile(BytesIO(urlopen(zipurl).read())) as zfile:
    zfile.extractall(path=extract_to)
```

If you don't want to actually save the files on your computer, you can still look at their contents:

In [None]:
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

# URL of the zip file
zipurl = "https://files.stlouisfed.org/files/htdocs/uploads/FRED-QD%20Appendix.zip"

# Take a look at the contents
with urlopen(zipurl) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        print("\n".join(zfile.namelist()))
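
Listing names is only one option: an individual member of the archive can also be read straight into memory with `ZipFile.read`. To keep the sketch below independent of any network connection, it first builds a tiny zip archive in memory to stand in for a downloaded one. The member name `notes.txt` and its contents are invented for the example.

```python
from io import BytesIO
from zipfile import ZipFile

# Build a small zip archive in memory to stand in for a downloaded one
buffer = BytesIO()
with ZipFile(buffer, "w") as zfile:
    zfile.writestr("notes.txt", "first line\nsecond line")

# In practice, these bytes would come from urlopen(zipurl).read()
zip_bytes = buffer.getvalue()

# Read one member of the archive without writing anything to disk
with ZipFile(BytesIO(zip_bytes)) as zfile:
    member_names = zfile.namelist()
    text = zfile.read("notes.txt").decode("utf-8")

print(member_names)
print(text)
```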

## Links and Websites

### Checking Dead Links

Let's say you've created a new website, perhaps using the [Quarto tool](https://quarto.org/docs/websites/) that's featured in the Chapter on {ref}`quarto`. When you wrote it, all the links worked fine! But the internet shifts and changes (one reason why PDFs are under-rated...), and there's no guarantee that the links that used to work still will.

Fortunately, there are command line tools like the Python package [**deadlinks**](https://github.com/butuzov/deadlinks) out there that can check your links programmatically to see if any are down. Although **deadlinks** is a Python package, it's set up to work as a *command line* tool, so the syntax to use it is

```bash
deadlinks https://your-webpage-here.html
```

in the terminal. You will need to install **deadlinks** via pip.
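
To give a feel for the underlying idea, here is a minimal sketch of a link check that uses only the standard library. It is not how **deadlinks** works internally, and it does not crawl pages to find links: it simply tries to open each URL in a list you provide. The function names are assumptions made for this example, and some websites reject requests from scripts, so a `False` result is a prompt to check by hand rather than proof that a link is dead.

```python
from urllib.request import urlopen


def link_is_alive(url, timeout=10):
    """Return True if `url` can be opened, False otherwise."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError (for example a 404) are subclasses of OSError;
        # ValueError is raised for malformed URLs
        return False


def find_dead_links(urls):
    """Return the subset of `urls` that could not be opened."""
    return [url for url in urls if not link_is_alive(url)]
```

You could then call `find_dead_links` on a list of the links used in your website and inspect whatever comes back.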

### Websites

You might be surprised to know you can programmatically open up browser windows, navigate around, and do anything you'd reasonably do as a user. In fact, this approach is often used for web scraping, which is a way to programmatically obtain information from websites. We've seen a bit of web scraping already in {ref}`data-extraction`, so here let's focus on simply opening websites programmatically.

The module that lets you do this is called **webbrowser**, and it's part of the Python standard library (no need to install anything but Python itself).

```python
import webbrowser

url = 'https://docs.python.org/'

# Open URL in a new tab, if a browser window is already open.
webbrowser.open_new_tab(url)

# Open URL in new window, raising the window if possible.
webbrowser.open_new(url)
```