# Interactive Pepys _Getting Started_ guide

### How this tutorial works

This notebook is designed to give you an introduction to Pepys without requiring you to install anything. It will mix paragraphs of explanation with interactive cells where you can run Python code or command-line interfaces to experiment with Pepys.

If you haven't used Jupyter notebooks before, then all you need to know is that to run a cell you click on it and press Shift-Enter. Try this now with the cell below. You should see some output showing that setup has been completed.

In [1]:
%reload_ext notebook_xterm
import os
from glob import glob
from IPython.display import HTML
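# If the notebook was started from the docs folder, move up to the parent folder
if os.path.basename(os.getcwd()) == 'docs':
    os.chdir("..")
# Unset PEPYS_CONFIG_FILE if it is set (pop with a default never raises KeyError)
os.environ.pop('PEPYS_CONFIG_FILE', None)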
print(f'Current path {os.getcwd()}')
print('Set up complete')

Current path /home/ian/git/pepys-import
Set up complete


Now we're going to introduce an interactive command-line within the notebook. The cell below starts with `%xterm` - that tells the notebook to create a command-line shell, and then run the command given after the `%xterm` bit. The cell below will run a command that prints out the current date and time. Once that command has run, you will be given a command-prompt at which you can run any valid Linux command. Try running `pwd` to see what directory we're in.

In [2]:
%xterm date

<notebook_xterm.terminalserver.TerminalServer at 0x7f4c502375b0>

The command-line window is deliberately kept relatively small, so that you can easily see these instructions too. If you're finished with a command-line window, then click the _close_ button on the top right to close it. If you run another cell with `%xterm` at the beginning then it will automatically close all previous terminals before opening a new one.

### 1. Your first import with Pepys
We're going to be working with a sample data file called `gpx_1_0.gpx`. It's in the GPX file format that is commonly used by the handheld GPS trackers popular with cyclists and walkers. The file contains some introductory metadata, followed by 5 position records. Pepys contains a series of importer modules, one of which can recognise and load `*.gpx` files.

The Python code in the cell below will display the contents of the `gpx_1_0.gpx` file:

In [3]:
with open('tests/sample_data/track_files/gpx/gpx_1_0.gpx', 'r') as f:
    print(f.read())

<?xml version="1.0" encoding="UTF-8"?>
<p:gpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd"
	xmlns:p="http://www.topografix.com/GPX/1/0" creator="test" version="1.0">

	<p:name>1285669575458</p:name>
	<p:desc>Saved with Debrief version dated Fri Apr 27 09:31:37 BST 2012</p:desc>
	<p:time>2012-04-27T16:29:38+01:00</p:time>

	<p:trk>
		<p:name>NELSON</p:name>
		<p:trkseg>
			<p:trkpt lat="22.1862861" lon="-21.6978806">
				<p:ele>0.000</p:ele>
				<p:time>2012-04-27T16:29:38+01:00</p:time>
				<p:course>268.7</p:course>
				<p:speed>4.5</p:speed>
			</p:trkpt>
			<p:trkpt lat="22.2862861" lon="-21.7978806">
				<p:ele>0.000</p:ele>
				<p:time>2012-04-27T16:30:38+01:00</p:time>
				<p:course>264.7</p:course>
				<p:speed>5.5</p:speed>
			</p:trkpt>
			<p:trkpt lat="22.4862861" lon="-21.2978806">
				<p:ele>0.000</p:ele>
				<p:time>2012-04-27T16:31:38+01:00</p:time>
				<p:course>263

You can see the five position records (in the `<p:trkpt>` elements), each of which has a location, elevation, time, course and speed. Now, let's import this into Pepys.
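If you'd like to see how those fields can be pulled out of the XML, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It is only an illustration and not how the Pepys GPX importer is implemented; the function name `read_track_points` and the shortened `SAMPLE_GPX` string (just the first two track points from the file above) are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:p attribute of the sample file.
GPX_NS = {"p": "http://www.topografix.com/GPX/1/0"}

# Shortened copy of the sample data: only the first two track points.
SAMPLE_GPX = """
<p:gpx xmlns:p="http://www.topografix.com/GPX/1/0" creator="test" version="1.0">
  <p:trk>
    <p:name>NELSON</p:name>
    <p:trkseg>
      <p:trkpt lat="22.1862861" lon="-21.6978806">
        <p:ele>0.000</p:ele>
        <p:time>2012-04-27T16:29:38+01:00</p:time>
        <p:course>268.7</p:course>
        <p:speed>4.5</p:speed>
      </p:trkpt>
      <p:trkpt lat="22.2862861" lon="-21.7978806">
        <p:ele>0.000</p:ele>
        <p:time>2012-04-27T16:30:38+01:00</p:time>
        <p:course>264.7</p:course>
        <p:speed>5.5</p:speed>
      </p:trkpt>
    </p:trkseg>
  </p:trk>
</p:gpx>
"""


def read_track_points(gpx_text):
    """Return a list of dicts, one per <p:trkpt> element (hypothetical helper)."""
    root = ET.fromstring(gpx_text.strip())
    points = []
    for trk in root.findall("p:trk", GPX_NS):
        track_name = trk.findtext("p:name", default="", namespaces=GPX_NS)
        for pt in trk.findall("p:trkseg/p:trkpt", GPX_NS):
            points.append({
                "track": track_name,
                "lat": float(pt.get("lat")),
                "lon": float(pt.get("lon")),
                "time": pt.findtext("p:time", namespaces=GPX_NS),
                "course": float(pt.findtext("p:course", namespaces=GPX_NS)),
                "speed": float(pt.findtext("p:speed", namespaces=GPX_NS)),
            })
    return points


for point in read_track_points(SAMPLE_GPX):
    print(point)
```

The `p:` prefix has to be mapped to the namespace URI explicitly, because ElementTree matches elements by namespace rather than by the prefix written in the file.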

Normally we would introduce Pepys by using the integration into the _Send To_ menu in Windows - but that isn't possible in this interactive notebook. Instead, we'll run Pepys using the command-line. This is actually what happens 'behind the scenes' when the _Send To_ menu is used - so we'll get the same result.

The cell below will run the Pepys Import command-line interface, telling it to import the `gpx_1_0.gpx` file. Run it now, and then read on below to find out how to interact with Pepys.

In [4]:
%xterm python -m pepys_import.cli --path tests/sample_data/track_files/gpx/gpx_1_0.gpx

<notebook_xterm.terminalserver.TerminalServer at 0x7f4c502375b0>

When you run the cell above you'll see a welcome banner followed by a table showing the status of the Pepys database before the import was run. As this is your first import, the number of rows in each table will be zero.

You will then get an interactive interface allowing you to define the metadata for the various 'objects' that are being imported. In Pepys, these represent real-world objects such as Platforms (a vessel such as a ship or submarine), Sensors (some sort of measurement/sensing device on a Platform) and Datafiles (a file that is being imported into Pepys). All these objects have metadata associated with them. For example, a Platform has a Nationality and a Platform Type, and all objects have a Classification for security purposes.

The first question asks you to accept the default name for the datafile - just press Enter for this. You'll then be asked to select a classification for the datafile. Various pre-configured classifications will be listed (these are just examples used during testing). Choose option `4` to set the classification to `Private`. You will then be given all the information about the datafile, to allow you to confirm everything is set correctly - just choose option `1` to confirm and continue.

The next set of questions is about the Platform. You should choose to add a new platform (option `2`), and then you'll be asked for details. It doesn't matter what name or details you give it, as this is just an example - but note that some fields are optional. Next is the search for a nationality - again, you'll get a list of the most common nationalities, but can search for others if you want to. For now, just choose `3` for `United Kingdom`. When you're given the list of pre-defined platform types to choose from, choose `2` to add a new platform type and enter `Frigate`, before selecting a classification of `Private` and confirming the addition.

The next stage is entering data about the sensor used to collect the data that we're importing. In this case, we're importing a GPX file, which is a file format used by GPS systems - so the sensor name is pre-filled as `GPS`. So, choose to add a new sensor (option `2`), add a classification and confirm the details are correct.

You've now finished adding metadata. The import itself will now take place, and a progress bar will (very quickly) move up to 100%. You will then see a summary of the database status after the import - it should show that 5 States and 1 Platform are now in the database. This matches our expectations from the datafile - it had 5 position reports, and they were all about one platform.

### 2. Checking the database status with Pepys
We just saw that when Pepys imports new data, it shows a database status table beforehand and afterwards. We can also view a more detailed status table by using the Pepys Admin application. If you were running this from Windows you could just choose the _Pepys Admin_ entry in the _Start Menu_ - again, we're going to run it from the command line here. Run the next cell, and you'll see the standard welcome banner followed by a menu.

Choose option `2` to get a status report. This will show you the status of every table in the database - you might need to scroll up in the terminal to view them all. You can see that as well as the entries in the States and Platforms tables, the import we did also created rows in the Datafiles, Changes and Logs tables, and there are also some entries in various reference tables that were automatically created when we initialised the database.

Now choose `.` to exit Pepys Admin.

In [None]:
%xterm python -m pepys_admin.cli

### 3. Trying to import the same file again
Pepys keeps track of which files have already been imported, so that we don't accidentally import them more than once. In this case, there is already an entry in the Datafiles table for the `gpx_1_0.gpx` file that we just imported, so if we try to import it again we'll just get a message telling us it has already been imported (note: the date and time given in the message are in UTC, so may differ from your local time). Try this by running the cell below (this is exactly the same command we ran to import it earlier):

In [None]:
%xterm python -m pepys_import.cli --path tests/sample_data/track_files/gpx/gpx_1_0.gpx
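To make the idea of "remembering what has been imported" more concrete, here is a small, self-contained sketch using an in-memory SQLite table. Everything in it is an assumption made for illustration: the table name `imported_files`, the helper functions, and the choice to identify a file by a hash of its contents are not taken from Pepys, and this tutorial doesn't say how Pepys itself recognises a repeated file. The sketch also stores the import time in UTC and shows how to convert it to local time.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def make_registry():
    """Create an in-memory table of imported files (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE imported_files ("
        "digest TEXT PRIMARY KEY, name TEXT, imported_utc TEXT)"
    )
    return conn


def try_import(conn, name, contents):
    """Record a file, or report when identical contents were first imported.

    Returns (True, timestamp) for a new file and (False, timestamp) for a repeat.
    """
    digest = hashlib.sha256(contents).hexdigest()
    row = conn.execute(
        "SELECT imported_utc FROM imported_files WHERE digest = ?", (digest,)
    ).fetchone()
    if row is not None:
        return False, row[0]
    now_utc = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO imported_files VALUES (?, ?, ?)", (digest, name, now_utc)
    )
    return True, now_utc


def to_local(utc_stamp):
    """Convert a stored UTC timestamp string to the machine's local time zone."""
    return datetime.fromisoformat(utc_stamp).astimezone()


registry = make_registry()
data = b"example file contents"
print(try_import(registry, "gpx_1_0.gpx", data))  # first attempt is accepted
imported, stamp = try_import(registry, "gpx_1_0.gpx", data)
print(imported, "first imported at", stamp, "UTC, which is", to_local(stamp), "locally")
```

The second call returns `False` together with the original timestamp, which mirrors the kind of "already imported" message described above.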

### 4. Exporting a datafile using Pepys Admin
Pepys Admin can do more than just view the database status. One key feature is the ability to export a datafile. To do this, run the cell below. You'll see the main menu: choose option `3`. You will need to select a platform - there is only one, so just press Enter. It will then give you a list of sensors and the periods they were active. Again there is only one, so just enter `1` and press Enter. Press Enter to accept the default value for the output filename. You should then see a message saying that the objects have been successfully exported.

Again, exit Pepys Admin by choosing option `.`.

In [None]:
%xterm python -m pepys_admin.cli

The Python code below will load and display the contents of the exported file. You can see it is a file in REP format, with data that matches the original GPX file that was imported. If you want to see the contents of the original GPX file, replace the filename below with `tests/sample_data/track_files/gpx/gpx_1_0.gpx`.

In [None]:
with open('exported_GPS.rep', 'r') as f:
    print(f.read())

### 5. Import a file with errors
Unfortunately, some files contain errors that stop Pepys from parsing them correctly. Here we'll see what happens if we ask Pepys to import a folder full of files, some of which have errors in them. As the command we're going to run will actually move some files for us, we're going to create a copy of the input files to work on.

The command in the cell below will copy the folder `tests/sample_data/track_files` to a new folder called `track_files_test`.

In [None]:
!cp -R tests/sample_data/track_files track_files_test
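If you'd rather do the copy from Python (for example on a system without `cp`), the standard library's `shutil.copytree` does the same job. The sketch below wraps it in a helper called `copy_sample_folder`, which is a name invented for this example. Note one difference from `cp -R`: if the destination already exists, `cp -R` copies the source folder *inside* it, whereas this helper stops with an error so that you don't end up with a nested copy by accident.

```python
import shutil
from pathlib import Path


def copy_sample_folder(src, dst):
    """Copy the folder src to a new folder dst (hypothetical helper)."""
    src, dst = Path(src), Path(dst)
    if dst.exists():
        # Refuse to continue rather than silently creating a nested copy.
        raise FileExistsError(f"{dst} already exists; remove it or pick another name")
    shutil.copytree(src, dst)
    return dst


# In this notebook the call would be:
# copy_sample_folder("tests/sample_data/track_files", "track_files_test")
```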

Now we can move on to running the Pepys command. The command we are running in the cell below is very similar to the first command we ran, but with three differences:

 - We are passing a whole folder of data rather than a single file
 - We are telling Pepys to use the _default resolver_ which means it won't ask us questions to resolve metadata about objects it finds in the files (platforms, sensors etc), but will just use default values. This is great for testing Pepys, as it means the process doesn't require answering _any_ questions - everything just runs automatically.
 - We are telling Pepys to archive the files that are successfully imported: these files will be moved from their original location to a special archive location (that's why we copied them earlier).
 
Run the cell below now, and then continue to the instructions below the terminal.

In [None]:
%xterm python -m pepys_import.cli --path track_files_test/rep_data --resolver default --archive

First, let's look at the output we get for a file that was imported successfully. You should see a list headed `Import succeeded for:` (you may have to scroll up a bit). Four files should be listed there: `sen_tracks.rep`, `sen_ssk_freq.dsf`, `rep_test1.rep`, `uk_track.rep`. The list will also show where these files were archived to: in this case, a folder called `archive` with subfolders for the year, month, day, hour, minute and second.

To see what's in this archive folder, type `tree archive` and press _Enter_. This will show you a tree view of the archive folder, showing that there are output files in the `report` folder (output logs and highlighted files) and the archived input files have been put in the `sources` folder.
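If the `tree` command isn't available, a similar view can be produced from a Python cell. The following is a minimal sketch using `pathlib`; the function name `format_tree` is made up for this example, and the output layout is simpler than what `tree` prints.

```python
from pathlib import Path


def format_tree(root, indent=""):
    """Return the lines of a simple tree view of the folder root."""
    root = Path(root)
    lines = [f"{indent}{root.name}/"]
    # Sort by name so that the listing is in a stable order.
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            # Recurse into subfolders, indenting one level further.
            lines.extend(format_tree(entry, indent + "    "))
        else:
            lines.append(f"{indent}    {entry.name}")
    return lines


# In this notebook, once the import has run:
# print("\n".join(format_tree("archive")))
```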

Let's have a look at an output file first. The Python code in the cell below will display the contents of the `rep_test1_output.log` file.

In [None]:
filename = glob("archive/**/*rep_test1_output.log", recursive=True)[0]
with open(filename, 'r') as f:
    print(f.read())

The output above shows that the file `rep_test1.rep` was processed by three different importers, each of which recorded some measurements to the database.

Now let's have a look at a highlighted output file. This shows how the file was parsed by Pepys, and which elements of the file were interpreted as which fields. Through the magic of IPython, we can embed that HTML file in the notebook by running the cell below. You will see highlighting of various parts of the file - hover over the highlighted sections to see how those parts of the file were interpreted.

In [None]:
filename = glob("archive/**/*rep_test1_highlighted.html", recursive=True)[0]
with open(filename, 'r') as f:
    html = f.read()
HTML(html)

Now we'll look at the error report for one of the files. Run the Python code below to view the error report for the file `rep_test1_bad.rep`.

In [None]:
filename = glob("archive/**/*rep_test1_bad_errors.log", recursive=True)[0]
with open(filename, 'r') as f:
    print(f.read())

You can see here that there are two errors, referring to lines 8 and 24, both complaining that there aren't enough tokens (individual components, in this case separated by spaces) in the line.
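To illustrate what "not enough tokens" means, here is a rough sketch of that kind of check: split each line on whitespace and flag lines with fewer pieces than expected. It is not the Pepys parser. The function `find_short_lines` and the threshold of 6 tokens are assumptions chosen for the example, and a real importer would apply different rules to different kinds of line. The sample lines are a narrative line with and without its trailing text.

```python
def find_short_lines(lines, min_tokens):
    """Return (line_number, token_count) for each line with too few tokens.

    Blank lines are skipped. min_tokens is a hypothetical threshold.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()  # split on any run of whitespace
        if tokens and len(tokens) < min_tokens:
            problems.append((number, len(tokens)))
    return problems


sample = [
    ";NARRATIVE2: 100112   121200 SEARCH_PLATFORM OBSERVATION Test observation",
    ";NARRATIVE2: 100112   121200 SEARCH_PLATFORM OBSERVATION",
]
print(find_short_lines(sample, min_tokens=6))  # [(2, 5)]
```

The second line has only five tokens because the narrative text is missing, so it is the one reported.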


### 6. Fix the errors and re-import

Click [here](../../edit/track_files_test/rep_data/rep_test1_bad.rep) to open `rep_test1_bad.rep` in the Jupyter file editor, and do the following:

 - Edit line 24 by adding some text like `Test observation` to the end of it. It should now look like: `;NARRATIVE2: 100112   121200 SEARCH_PLATFORM OBSERVATION Test observation`
 - Delete line 8 entirely
 
Save the file by pressing Ctrl-S or using the _Save_ option in the _File_ menu.

Now we'll run the import again, by running the cell below.

In [None]:
%xterm python -m pepys_import.cli --path track_files_test/rep_data --resolver default --archive

You should now see in the output that `rep_test1_bad.rep` was processed successfully.

### 7. Check the contents of the database itself
The Pepys Admin application has the ability to view the raw database tables themselves. To do this, run the cell below to open Pepys Admin, and then choose option `6` (View Data), and then option `1` (View Table). This will give you a list of database tables - start typing `Platform` and then select it from the list using the arrow keys.

You'll see the contents of the Platforms table displayed: this should include the platform that you created manually the first time you ran Pepys Import, plus various other platforms added automatically by the default resolver including `SPLENDID` and `SEARCH_PLATFORM`. You'll see each platform has a nationality and platform type. In fact, the database stores more information about platforms (including pennant numbers, trigraphs and more) but for ease of visualisation these are left out of the database display here.

Now look at some other tables: choose option `1` again and look at the `States` table, in which you'll see entries for the individual measurements that have been imported. Here we're only showing a few columns, so you can't see the actual location, speed, bearing and so on, but you can see what sensor was used and the time of the measurement. Don't worry that this list seems short - it is only showing a limited number of rows: you can see from the database status output that was displayed earlier that there are actually many hundreds of rows in the States table.

Similarly, look at the `Changes` table. This shows the reason for various changes to the database - here you can see various reasons including `Importing reference data` and importing various filenames. This allows all data in the database to be traced back to the files it came from.

Feel free to investigate the other tables in the database.

In [None]:
%xterm python -m pepys_admin.cli

### That's it!
You've now completed the _Getting Started_ tutorial for Pepys. For more information on Pepys, visit the [full documentation](https://pepys-import.readthedocs.io/en/latest/index.html).