# Maintaining Your Database

Database maintenance & performance tuning are large topics that could occupy entire books; this lesson serves as an introduction to a handful of essentials.

---

# Recovering Unused Space with VACUUM

The PostgreSQL `VACUUM` command helps manage the size of a database, which can grow as a result of routine operations.

For example, when you update a row value, the database creates a new version of that row with the updated value & retains (but hides) the old version of the row. The PostgreSQL documentation refers to these rows that you can't see as *dead tuples*, with *tuples* -- an ordered list of elements -- being the name for the internal implementation of rows in a PostgreSQL database. The same thing happens when you delete a row. Though the row is no longer visible to you, it lives on as a dead row in the table.

This is by design, so the database can provide certain features in environments where multiple transactions are occurring, & an old version of a row might be needed by transactions other than the current one.

The `VACUUM` command cleans up these dead rows. Running `VACUUM` on its own designates the space occupied by dead rows as available for the database to use again (assuming that any transactions using the rows have been completed). In most cases, `VACUUM` doesn't return the space to your system's disk; it just flags that space as available for new data. To actually shrink the size of the data file, you can run `VACUUM FULL`, which rewrites the table to a new version that doesn't include the dead row space. It drops the old version.

Although `VACUUM FULL` frees space on your system's disk, there are a few caveats to keep in mind. First, `VACUUM FULL` takes more time to complete than `VACUUM`. Second, it must have exclusive access to the table while rewriting it, which means that no one can update (or even query) the table during the operation. The regular `VACUUM` command, by contrast, can run while updates & other operations are happening. Finally, not all dead space in a table is bad. In many cases, having available space to put new tuples instead of needing to ask the operating system for more disk space can improve performance.

You can run either `VACUUM` or `VACUUM FULL` on demand, but PostgreSQL by default runs an *autovacuum* background process that monitors the database & runs `VACUUM` as needed. 

## Tracking Table Size

We'll create a small test table & monitor its growth as we fill it with data & perform an update. 

### Creating a Table & Checking Its Size

The query below creates a `vacuum_test` table with a single column to hold an integer. Run the code & then we'll measure the table's size.

```
CREATE TABLE vacuum_test (
    integer_column integer
);
```

Before we fill the table with test data, let's check how much space it occupies on disk to establish a reference point. We can do so in two ways: check the table properties via the pgAdmin interface or run queries using PostgreSQL administrative functions. In pgAdmin, click once on a table to highlight it, & then click the **Statistics** tab. Table size is one of about two dozen indicators in this list.

<img src = "Creating a Table to Test Vacuuming.png" width = "600" style = "margin:auto"/>

We'll focus on the query technique here because knowing these queries is helpful if for some reason pgAdmin isn't available or you're using another graphical user interface (GUI). The query below shows how to check the `vacuum_test` table size using PostgreSQL functions.

```
SELECT pg_size_pretty(
    pg_total_relation_size('vacuum_test')
);
```

The outermost function, `pg_size_pretty()`, converts bytes to a more easily understandable format in kilobytes, megabytes, or gigabytes. Wrapped inside is the `pg_total_relation_size()` function, which reports how many bytes a table, its indexes, & any out-of-line (TOAST) data take up on disk. Because the table is empty at this point, running the code in pgAdmin should return a value of `0 bytes`, like this:

<img src = "Determining the Size of vacuum_test.png" width = "600" style = "margin:auto"/>
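To see what goes into that total, a minimal sketch using related size functions may help. It assumes the `vacuum_test` table created above; `pg_relation_size()`, `pg_table_size()`, & `pg_indexes_size()` are standard PostgreSQL administrative functions that this lesson doesn't otherwise use.

```sql
-- Break the figure reported by pg_total_relation_size() into its parts.
SELECT pg_size_pretty(pg_relation_size('vacuum_test'))       AS main_data,   -- the table's main data file only
       pg_size_pretty(pg_table_size('vacuum_test'))          AS table_size,  -- table plus TOAST, free space map, visibility map
       pg_size_pretty(pg_indexes_size('vacuum_test'))        AS index_size,  -- all indexes attached to the table
       pg_size_pretty(pg_total_relation_size('vacuum_test')) AS total_size;  -- table_size plus index_size
```

Because `vacuum_test` has no indexes, the index figure stays at zero throughout this lesson; on a real table, comparing the columns shows whether the data or the indexes account for most of the space.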

You can get the same information using the command line. Launch `psql` & at the prompt, enter the meta-command `\dt+ vacuum_test`, which should display the following information including table size:

<img src = "vacuum_test in the Command Prompt.png" width = "600" style = "margin:auto"/>

### Checking Table Size After Adding New Data

Let's add some data to the table & then check its size again. We'll use the `generate_series()` function to fill the table's `integer_column` with 500,000 rows:

```
INSERT INTO vacuum_test
SELECT * FROM generate_series(1, 500000);
```

This standard `INSERT INTO` statement adds the results of `generate_series()`, which is a series of values from 1 to 500,000, as rows to the table. After the query completes, check the table's size with `pg_size_pretty()` & `pg_total_relation_size()`.

<img src = "Inserting 500,000 Rows Into vacuum_test.png" width = "600" style = "margin:auto"/>

The query reports that the `vacuum_test` table, now with a single column of 500,000 integers, uses 18,161,664 bytes of disk space (18,161,664 / 1024 / 1024 is approximately 17.3 MB).
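The conversion in the paragraph above can be checked directly in SQL. This is a small sketch: the first query returns the raw byte count next to the formatted value, & the second repeats the arithmetic using the byte count quoted in the text (your own byte count may differ slightly by PostgreSQL version & platform).

```sql
-- Raw byte count alongside the human-readable value.
SELECT pg_total_relation_size('vacuum_test') AS size_bytes,
       pg_size_pretty(pg_total_relation_size('vacuum_test')) AS size_pretty;

-- The conversion from the text, done by hand: bytes / 1024 / 1024 = MB.
SELECT round(18161664 / 1024.0 / 1024.0, 1) AS size_mb;  -- returns 17.3
```

Dividing by `1024.0` instead of `1024` forces numeric (decimal) division, so the fractional part isn't discarded before rounding.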

### Checking Table Size After Updates

Now, let's update the data to see how that affects the table size. We'll use the code below to update every row in the `vacuum_test` table by adding `1` to the `integer_column` values, replacing the existing value with a number that's one greater.

```
UPDATE vacuum_test
SET integer_column = integer_column + 1;
```

Run the code, then test the table size again.

```
SELECT pg_size_pretty(
    pg_total_relation_size('vacuum_test')
);
```

The table size roughly doubled, from 17 MB to 35 MB. The increase seems excessive, because the `UPDATE` simply replaced existing numbers with values of a similar size. As you might have guessed, the reason for this increase in table size is that for every updated value, PostgreSQL creates a new row, & the dead row remains in the table. Even though you see only 500,000 rows, the table has double that number. This behaviour can lead to surprises for database owners who don't monitor disk space.

<img src = "Checking Table Size After Updates.png" width = "600" style = "margin:auto"/>
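One way to see the hidden rows for yourself is to ask PostgreSQL for its own estimate of live & dead tuples. The sketch below queries the `pg_stat_all_tables` statistics view (introduced in the next section); the `n_live_tup` & `n_dead_tup` columns are part of that view, but they're estimates that can lag behind recent activity.

```sql
-- Estimated number of visible (live) and hidden (dead) rows in the table.
SELECT relname,
       n_live_tup,
       n_dead_tup
FROM pg_stat_all_tables
WHERE relname = 'vacuum_test';
```

If autovacuum has already processed the table by the time you run this, the dead-tuple estimate may have dropped back toward zero even though the table's size on disk hasn't changed.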

## Monitoring the Autovacuum Process

PostgreSQL's autovacuum process monitors the database & launches `VACUUM` automatically when it detects a large number of dead rows in a table. Although autovacuum is enabled by default, you can turn it on or off & configure it using the settings covered in "Changing Server Settings" later in this lesson. Because autovacuum runs in the background, you won't see any immediately visible indication that it's working, but you can check its activity by querying data that PostgreSQL collects about system performance.
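Before looking at autovacuum's activity, it can be useful to confirm how it's configured on your server. The following sketch uses the standard `SHOW` command & the `pg_settings` system view; the values you see depend on your installation.

```sql
-- Is autovacuum on, and how long does it sleep between checks?
SHOW autovacuum;
SHOW autovacuum_naptime;

-- List every autovacuum-related setting with its current value.
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;
```

The `unit` column matters when reading `setting`: a naptime of `60` with a unit of `s` is the same one-minute interval that `SHOW` reports as `1min`.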

PostgreSQL has its own *statistics collector* that tracks database activity & usage. You can look at the statistics by querying one of several views the system provides. (See a complete list of views for monitoring the state of the system in the PostgreSQL documentation under ["The Statistics Collector"](https://www.postgresql.org/docs/current/monitoring-stats.html).) To check the activity of autovacuum, we query the `pg_stat_all_tables` view, as shown below:

```
SELECT relname,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'vacuum_test';
```

A view provides the results of a stored query. The query stored by the view `pg_stat_all_tables` returns a column called `relname`, which is the name of the table, plus columns with statistics related to index scans, rows inserted & deleted, & other data. For this query, we're interested in `last_vacuum` & `last_autovacuum`, which contain the last time the table was vacuumed manually & automatically, respectively. We also ask for `vacuum_count` & `autovacuum_count`, which show the number of times the vacuum was run manually & automatically.

By default, autovacuum checks tables every minute. So, if a minute has passed since you last updated `vacuum_test`, you should see details of vacuum activity when you run the query above. Here's what my system shows.

<img src = "Monitoring Autovacuum Process.png" width = "600" style = "margin:auto"/>

The table shows the date & time of the last autovacuum, & the `autovacuum_count` column shows one occurrence. This result indicates that autovacuum executed a `VACUUM` command on the table once. However, because we've not vacuumed manually, the `last_vacuum` column is empty, & the `vacuum_count` is `0`.

Recall that `VACUUM` designates dead rows as available for the database to reuse but typically doesn't reduce the size of the table on disk. You can confirm this by running our previous query to check the table size, which shows the table remains at 35 MB even after the automatic vacuum.

<img src = "Checking Table Size After Autovacuum.png" width = "600" style = "margin:auto"/>

## Running VACUUM Manually

To run `VACUUM` manually, we only need a single line of code.

```
VACUUM vacuum_test;
```

This command should return the message `VACUUM` from the server. Now when you fetch statistics again, using the query below, you should see that the `last_vacuum` column reflects the date & time of the manual vacuum you just ran, & the number in the `vacuum_count` column should increase by one.

```
SELECT relname,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'vacuum_test';
```

<img src = "Running VACUUM Manually.png" width = "600" style = "margin:auto"/>

In this example, we executed `VACUUM` on our test table, but you can also run `VACUUM` on the entire database by omitting the table name. In addition, you can add the `VERBOSE` keyword to return information such as the number of rows found in a table & the number of rows removed, among other information.
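The two variations just described look like this. Both are standard `VACUUM` syntax; the only assumption is that you're connected to the database that holds `vacuum_test`.

```sql
-- Vacuum every table in the current database that you have permission to vacuum.
VACUUM;

-- Vacuum one table and report what was found and removed along the way.
VACUUM VERBOSE vacuum_test;
```

The `VERBOSE` report arrives as informational messages from the server, not as a result set, so in pgAdmin look for it in the **Messages** tab.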

## Reducing Table Size with VACUUM FULL

Next, we'll run `VACUUM` with the `FULL` option, which actually returns the space taken up by dead tuples to the operating system. It does this by creating a new version of the table with the dead rows discarded.

To see how `VACUUM FULL` works, run the command:

```
VACUUM FULL vacuum_test;
```

After the command executes, test the table size again. It should be back down to 17 MB, the size it was when we first inserted data.

<img src = "Using VACUUM FULL to Reclaim Disk Space.png" width = "600" style = "margin:auto"/>

It's never prudent or safe to run out of disk space, so minding the size of your database files as well as your overall system space is a worthwhile routine to establish. Using `VACUUM` to prevent database files from growing bigger than they have to is a good start.

---

# Changing Server Settings

You can alter the settings for your PostgreSQL server by editing values in *postgresql.conf*, one of several configuration text files that control server settings. Other files include *pg_hba.conf*, which controls connections to the server, & *pg_ident.conf*, which database administrators can use to map usernames on a network to usernames in PostgreSQL. Most of the values in *postgresql.conf* are set to defaults you may never need to adjust, but the file is worth exploring in case you're fine-tuning your system.

## Locating & Editing postgresql.conf

The location of *postgresql.conf* varies depending on your operating system & install method. You can run the command below to locate the file:

```
SHOW config_file;
```

When you run the command on macOS, it shows the path to the file, as shown here:

<img src = "Showing the Location of postgresqlconf.png" width = "600" style = "margin:auto"/>

To edit *postgresql.conf*, navigate in your file system to the directory displayed by `SHOW config_file;`, & open the file using a text editor. 

When you open the file, the first several lines should read as follows:

<img src = "postgresqlconf File.png" width = "600" style = "margin:auto"/>

The *postgresql.conf* file is organised into sections that specify settings for file locations, security, logging of information, & other processes. Many lines begin with a hash mark (`#`), which indicates the line is commented out & the setting shown is the active default.

For example, in the *postgresql.conf* file section "Autovacuum Parameters", the default is for autovacuum to be turned on (which is good, standard practice). The hash mark (`#`) in front of the line means that the line is commented out & the default value is in effect:

<img src = "Autovacuum in postgresqlconf.png" width = "600" style = "margin:auto"/>

To change this or other default settings, you would remove the hash mark, adjust the setting value, & save *postgresql.conf*. Some changes, such as memory allocations, require a restart of the server; they're noted in *postgresql.conf*. Other changes require only a reload of settings files. You can reload settings files by running the function `pg_reload_conf()` under an account with superuser permissions or by executing the `pg_ctl` command.
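As a rough sketch of the reload route from inside a SQL session: the function call below is the one named above, & the two follow-up queries use the `pending_restart` & `context` columns of the standard `pg_settings` view to show which changes still need a full restart. The two setting names in the last query are just examples.

```sql
-- Ask the server to re-read its configuration files (superuser by default).
SELECT pg_reload_conf();

-- Settings that were changed in the file but won't apply until a restart.
SELECT name, setting
FROM pg_settings
WHERE pending_restart;

-- A context of 'postmaster' means restart required; 'sighup' means a reload is enough.
SELECT name, context
FROM pg_settings
WHERE name IN ('shared_buffers', 'autovacuum');
```

Checking `context` before editing a setting tells you in advance whether the change will interrupt connected users.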

The code below shows settings you may want to change, excerpted from the *postgresql.conf* section "Client Connection Defaults". Use your text editor to search the file for the following:

```
datestyle = 'iso, mdy'

timezone = 'America/New_York'

default_text_search_config = 'pg_catalog.english'
```

<img src = "Sample postgresqlconf Settings.png" width = "600" style = "margin:auto"/>

You can use the `datestyle` setting to specify how PostgreSQL displays dates in query results. This setting takes two parameters separated by a comma: the output format & the ordering of month, day, & year. The default output format is ISO (`YYYY-MM-DD`), but other formats are available (`SQL`, `Postgres`, & `German`). For the second parameter, arrange `m`, `d`, & `y` in the order you prefer.

The `timezone` parameter sets the server time zone. It states `America/New_York`, which reflects the time zone on my machine when I installed PostgreSQL. Yours may be different based on your location. When setting up PostgreSQL for use as the backend to a database application or on a network, administrators often set this value to `UTC` & use that as a standard on machines across multiple locations.

The `default_text_search_config` value sets the language used by the full-text search operations. Here, mine is set to `english`. Depending on your needs, you can set this to `spanish`, `german`, `russian`, or any other language of your choice.

These three examples represent only a handful of settings available for adjustment. Unless you end up deep in system tuning, you probably won't have to tweak much else. Also use caution when changing settings on a network server used by multiple people or applications; changes can have unintended consequences, so it's worth communicating with colleagues first.
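You can inspect the three settings discussed above, & experiment with them safely, without touching *postgresql.conf*. The sketch below uses the standard `SHOW`, `SET`, & `RESET` commands; a `SET` issued this way lasts only for the current session, so other users aren't affected.

```sql
-- Current values of the three settings.
SHOW datestyle;
SHOW timezone;
SHOW default_text_search_config;

-- Try a different time zone for this session only and see its effect.
SET timezone = 'UTC';
SELECT now();

-- Return to the server's configured value.
RESET timezone;
```

Trying a value in a session first is a low-risk way to preview a change before committing it to the configuration file on a shared server.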

## Reloading Settings with pg_ctl

The command line utility `pg_ctl` allows you to perform actions on a PostgreSQL server, such as starting & stopping it & checking its status. Here, we'll use the utility to reload the settings files so the changes we make will take effect. Running the command reloads all settings files at once.

You'll need to open a command line prompt, configured the same way as for running `psql`. After you launch the command prompt, use the following command to reload, replacing the path with the path to your PostgreSQL data directory:

On macOS, it's `pg_ctl reload -D '/path/to/data/directory/'`

To find the location of your PostgreSQL data directory, run the query below:

```
SHOW data_directory;
```

<img src = "Showing the Location of the Data Directory.png" width = "600" style = "margin:auto"/>

Place the path after the `-D` argument, between single quotes. You run this command on your system's command prompt, not inside the `psql` application. Enter the command & press ENTER; it should respond with the message `server signaled`. The settings files will be reloaded, & changes should take effect.

<img src = "Reloading Settings.png" width = "600" style = "margin:auto"/>

If you've changed settings that require a server restart, replace `reload` with `restart`.

---

# Backing Up & Restoring Your Database

You might want to back up your entire database either for safekeeping or for transferring data to a new or upgraded server. PostgreSQL offers command line tools that make backup & restore operations easy.

## Using pg_dump to Export a Database or Table

The PostgreSQL command line tool `pg_dump` creates an output file that contains all the data from your database; SQL commands for re-creating tables, views, functions, & other database objects; & commands for loading the data into tables. You can also use `pg_dump` to save only selected tables in your database. By default, `pg_dump` outputs a text file; we'll discuss an alternate custom compressed format & other options as well.

To export the `analysis` database we've used for our exercises to a file, run the command below at your system's command prompt (not in `psql`):

```
pg_dump -d analysis -U user_name -Fc -v -f analysis_backup.dump
```

Here, we start the command with `pg_dump` & use similar connection arguments as with `psql`. We specify the database to export with the `-d` argument, followed by the `-U` argument & your username. Next, we use the `-Fc` argument to specify that we want to generate this export in a custom PostgreSQL compressed format & the `-v` argument to generate verbose output. Then we use the `-f` argument to direct the output of `pg_dump` to a file named *analysis_backup.dump*. To place the file in a directory other than the one your terminal prompt is currently open to, you can specify the complete directory path before the filename.

When you execute the command, depending on your installation, you might see a password prompt. Fill in that password, if prompted. Then depending on the size of your database, the command could take a few minutes to complete. You'll see a series of messages about the objects the command is reading & outputting. When it's done, it should return you to a new command prompt, & you should see a file named *analysis_backup.dump* in your current directory.

<img src = "Exporting the analysis Database with pg_dump.png" width = "600" style = "margin:auto"/>

To limit the export to one or more tables that match a particular name, use the `-t` argument followed by the name of the table in single quotes. For example, to back up the `train_rides` table, use the following command:

```
pg_dump -t 'train_rides' -d analysis -U postgres -Fc -v -f train_backup.dump
```

<img src = "Exporting the train_rides Table with pg_dump.png" width = "600" style = "margin:auto"/>

## Restoring a Database Export with pg_restore

The `pg_restore` utility restores data from your exported database file. You might need to restore your database when migrating data to a new server or when upgrading to a new major version of PostgreSQL. To restore the `analysis` database (assuming you're on a server where `analysis` doesn't exist), at the command prompt, run the following command:

```
pg_restore -C -v -d postgres -U user_name analysis_backup.dump
```

After `pg_restore`, you add the `-C` argument, which tells the utility to create the `analysis` database on the server. Then, as you saw previously, the `-v` argument provides verbose output, & `-d` specifies the name of the database to connect to, followed by the `-U` argument & your username. Press ENTER & the restore will begin. When it's done, you should be able to view your restored database via `psql` or in pgAdmin.
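For a quick check that the restore worked, a query like the following can be run from `psql` or pgAdmin while connected to any database on the server. It's a sketch that relies on the standard `pg_database` catalog & the `pg_database_size()` function.

```sql
-- Confirm the restored database exists and see how much disk space it uses.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
WHERE datname = 'analysis';
```

An empty result means the database wasn't created, which usually points to the `-C` argument being left off or the restore stopping early; the verbose output from `pg_restore` should show where.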

## Exploring Additional Backup & Restore Options

You can configure `pg_dump` with multiple options to include or exclude certain database objects, such as tables matching a name pattern, or to specify the output format. For example, when we backed up the `analysis` database, we specified the `-Fc` argument with `pg_dump` to generate the backup in a custom PostgreSQL compressed format. If you omit the `-Fc` argument, the utility outputs plain text, & you can view the contents of the backup with a text editor. For details, check the full `pg_dump` documentation at [https://www.postgresql.org/docs/current/app-pgdump.html](https://www.postgresql.org/docs/current/app-pgdump.html). For corresponding restore options, check the `pg_restore` documentation at [https://www.postgresql.org/docs/current/app-pgrestore.html](https://www.postgresql.org/docs/current/app-pgrestore.html).

---

# Wrapping Up

In this lesson, we learned how to track & conserve space in our databases using the `VACUUM` feature in PostgreSQL. We also learned how to change system settings as well as back up & restore databases using other command line tools. Although we may not need to perform these tasks every day, these maintenance tricks can help enhance the performance of our databases.