# Interactive Pepys _Getting Started_ guide

### How this tutorial works

This notebook is designed to give you an introduction to Pepys without requiring you to install anything. It will mix paragraphs of explanation with interactive cells where you can run Python code or command-line interfaces to experiment with Pepys.

If you haven't used Jupyter notebooks before, then all you need to know is that to run a cell you click on it and press Shift-Enter. Try this now with the cell below. You should see some output showing that setup has been completed.

In [None]:
%reload_ext notebook_xterm
import os
from glob import glob
from IPython.display import HTML
if os.path.basename(os.getcwd()) == 'docs':
    os.chdir("..")
try:
    del os.environ['PEPYS_CONFIG_FILE']
except KeyError:
    pass
print(f'Current path {os.getcwd()}')
print('Set up complete')

Now we're going to introduce an interactive command-line within the notebook. The cell below starts with `%xterm` - that tells the notebook to create a command-line shell, and then run the command given after the `%xterm` bit. The cell below will run a command that prints out the current date and time. Once that command has run, you will be given a command-prompt at which you can run any valid Linux command. Try running `pwd` to see what directory we're in.

In [None]:
%xterm date

The command-line window is deliberately kept relatively small, so that you can easily see these instructions too. If you're finished with a command-line window, then click the _close_ button on the top right to close it. If you run another cell with `%xterm` at the beginning then it will automatically close all previous terminals before opening a new one.

### 1. Your first import with Pepys
We're going to be working with a sample data file called `gpx_1_0.gpx`. It's in the GPX file format that is commonly used by the handheld GPS trackers popularised by cyclists and walkers. The file contains some introductory metadata, followed by five position records. Pepys contains a series of importer modules, one of which can recognise and load `*.gpx` files.

The Python code in the cell below will display the contents of the `gpx_1_0.gpx` file:

In [None]:
with open('tests/sample_data/track_files/gpx/gpx_1_0.gpx', 'r') as f:
    print(f.read())

You can see the five position records (in the `<p:trkpt>` elements), each of which has a location, elevation, time, course and speed. Now, let's import this into Pepys.
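
The count of position records can also be checked in code rather than by eye. The cell below is a minimal sketch, not part of Pepys: `count_track_points` is a hypothetical helper, and `SAMPLE_GPX` is a made-up two-point document (its coordinates are invented) that only mimics the shape of a GPX 1.0 track.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in document with invented coordinates, shaped like a GPX 1.0 track.
SAMPLE_GPX = """<?xml version="1.0"?>
<gpx xmlns="http://www.topografix.com/GPX/1/0" version="1.0">
  <trk>
    <name>NELSON</name>
    <trkseg>
      <trkpt lat="22.18" lon="21.69"><ele>0.0</ele></trkpt>
      <trkpt lat="22.19" lon="21.70"><ele>0.0</ele></trkpt>
    </trkseg>
  </trk>
</gpx>
"""


def count_track_points(gpx_text):
    """Count <trkpt> elements, whatever namespace prefix the file uses."""
    root = ET.fromstring(gpx_text)
    # ElementTree stores namespaced tags as '{uri}name', so drop the '{uri}' part.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "trkpt")


print(count_track_points(SAMPLE_GPX))  # the stand-in document has 2 points
```

Passing the text of `gpx_1_0.gpx` (as read in the cell above) instead of `SAMPLE_GPX` should report the five records described here.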

Normally we would introduce Pepys by using the integration into the _Send To_ menu in Windows - but that isn't possible in this interactive notebook. Instead, we'll run Pepys using the command-line. This is actually what happens 'behind the scenes' when the _Send To_ menu is used - so we'll get the same result.

The cell below will run the Pepys Import command-line interface, telling it to import the `gpx_1_0.gpx` file. When you run the cell you'll see a welcome banner followed by a table showing the status of the Pepys database before the import was run. As this is your first import, the number of rows in each table will be zero.

You will then get an interactive interface allowing you to define the metadata for the various 'objects' that are being imported - things like datafiles, platforms and sensors. It doesn't matter what you enter at this point - but you can use the pre-existing classification _Private_ (start typing it at the appropriate point and you'll see an autocomplete menu appear, press _TAB_ to insert the completion).

Once you have answered all the questions, the import will take place and a progress bar will (very quickly) move up to 100% completed. You will then see a summary of the database status after the import - it should show that 5 States and 1 Platform are now in the database. This matches our expectations from the datafile - it had 5 position reports, and they were all about one platform (called `NELSON`).

In [None]:
%xterm python -m pepys_import.cli --path tests/sample_data/track_files/gpx/gpx_1_0.gpx

### 2. Checking the database status with Pepys
We just saw that when Pepys imports new data, it shows a database status table beforehand and afterwards. We can also view a more-detailed status table by using the Pepys Admin application. If you were running this from Windows you can just choose the _Pepys Admin_ entry in the _Start Menu_ - again, we're going to run it from the command line here. Run the next cell, and you'll see the standard welcome banner followed by a menu.

Choose option `2` to get a status report. This will show you the status of every table in the database - you might need to scroll up in the terminal to view them all. You can see that as well as the entries in the States and Platforms tables, the import we did also created rows in the Datafiles, Changes and Logs tables, and there are also some entries in various reference tables that were automatically created when we initialised the database.

Now choose `0` to exit Pepys Admin.

In [None]:
%xterm python -m pepys_admin.cli
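
A similar per-table summary can be produced directly from Python. The sketch below is not how Pepys Admin builds its report: `table_row_counts` is a hypothetical helper that uses only the standard `sqlite3` module, and it assumes the database is the SQLite file `pepys_test.sqlite` in the current directory.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    if not os.path.isfile(db_path):
        # sqlite3.connect() would silently create an empty file, so check first.
        raise FileNotFoundError(db_path)
    counts = {}
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for name in names:
            # Table names cannot be bound as parameters, so quote them instead.
            quoted = '"' + name.replace('"', '""') + '"'
            try:
                counts[name] = conn.execute(
                    f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            except sqlite3.Error:
                # Assumption: some tables may need extensions that plain
                # sqlite3 does not load; record those as None and carry on.
                counts[name] = None
    finally:
        conn.close()
    return counts


# Example (assumed file name): table_row_counts("pepys_test.sqlite")
```

After the import above, the entries for the States and Platforms tables would be expected to match the figures shown in the status report.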

### 3. Trying to import the same file again
Pepys keeps track of which files have been imported already, so that we don't accidentally import them multiple times. In this case, there is already an entry in the Datafiles table for the `gpx_1_0.gpx` file that we just imported, so if we try to import it again we'll just get a message telling us it has already been imported. Try this by running the cell below (this is exactly the same command we ran to import it earlier):

In [None]:
%xterm python -m pepys_import.cli --path tests/sample_data/track_files/gpx/gpx_1_0.gpx

### 4. Exporting a datafile using Pepys Admin
Pepys Admin can do more than just view the database status. One key feature is the ability to export a datafile. To do this, run the cell below. You'll see the main menu: choose option `4`. You will need to select a platform - there is only one, so just press Enter. It will then give you a list of sensors and the periods they were active. Again there is only one, so just enter `1` and press Enter. Press Enter to accept the default value for the output filename. You should then see a message saying that the objects have been successfully exported.

Again, exit Pepys Admin by choosing option `0`.

In [None]:
%xterm python -m pepys_admin.cli

The Python code below will load and display the contents of the exported file. You can see it is a file in REP format, with data that matches the original GPX file that was imported (if you want to see the contents of the original GPX file, then replace the filename below with `tests/sample_data/track_files/gpx/gpx_1_0.gpx`).

In [None]:
with open('exported_GPS.rep', 'r') as f:
    print(f.read())

### 5. Import a file with errors
Unfortunately, some files contain errors and cannot be parsed correctly by Pepys. Here we'll see what happens if we ask Pepys to import a folder full of files, some of which have errors in them. As the command we're going to run will actually move some files for us, we're going to create a copy of the input files to work on.

The command in the cell below will copy the folder `tests/sample_data/track_files` to a new folder called `track_files_test`:

In [None]:
!cp -R tests/sample_data/track_files track_files_test
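
The `cp` command above is only available in Unix-style shells. As a hedged alternative for other environments, the same copy could be sketched with Python's standard `shutil` module; `copy_folder` is a hypothetical helper name, not something Pepys provides.

```python
import shutil
from pathlib import Path


def copy_folder(source, destination):
    """Recursively copy source to destination, refusing to overwrite a copy."""
    source, destination = Path(source), Path(destination)
    if destination.exists():
        # Fail early with a clear message rather than mixing old and new files.
        raise FileExistsError(f"{destination} already exists; remove it first")
    shutil.copytree(source, destination)
    return destination


# Example: copy_folder("tests/sample_data/track_files", "track_files_test")
```

Unlike `cp -R`, which copies *into* the destination when it already exists, this sketch stops with an error, so a half-archived working copy from an earlier run is never silently reused.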

Now we can move on to running the Pepys command. The command we are running in the cell below is very similar to the first command we ran, but with three differences:

 - We are passing a whole folder of data rather than a single file
 - We are telling Pepys to use the _default resolver_ which means it won't ask us questions to resolve metadata about objects it finds in the files (platforms, sensors etc), but will just use default values. This is great for testing Pepys, as it means the process doesn't require answering _any_ questions - everything just runs automatically.
 - We are telling Pepys to archive the files that are successfully imported: these files will be moved from their original location to a special archive location (that's why we copied them earlier).
 
Run the cell below now, and then continue to the instructions below the terminal.

In [None]:
%xterm python -m pepys_import.cli --path track_files_test/rep_data --resolver default --archive

First, let's look at the output we get for a file that was imported successfully. You should see a list headed `Import succeeded for:` (you may have to scroll up a bit). Four files should be listed there: `sen_tracks.rep`, `sen_ssk_freq.dsf`, `rep_test1.rep`, `uk_track.rep`. The list will also show where these files were archived to: in this case, a folder called `archive` with subfolders for the year, month, day, hour, minute and second.

To see what's in this archive folder, type `tree archive` and press _Enter_. This will show you a tree view of the archive folder, showing that there are output files in the `report` folder (output logs and highlighted files) and the archived input files have been put in the `sources` folder.
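
If the `tree` command is not available, a rough equivalent can be sketched in Python. `list_tree` is a hypothetical helper built on the standard `pathlib` module; it returns a flat, sorted list of file paths rather than drawing a tree.

```python
from pathlib import Path


def list_tree(root):
    """Return the relative paths of all files below root, sorted for stable output."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()  # forward slashes on every platform
        for p in root.rglob("*")
        if p.is_file()
    )


# Example: print("\n".join(list_tree("archive")))
```

Paths containing `report` and `sources` folders should appear in the output, matching the tree view described above.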

Let's have a look at an output file first. The Python code in the cell below will display the contents of the `rep_test1_output.log` file.

In [None]:
filename = glob("archive/**/*rep_test1_output.log", recursive=True)[0]
with open(filename, 'r') as f:
    print(f.read())

The output above shows that the file `rep_test1.rep` was processed by three different importers, each of which recorded some measurements to the database.

Now let's have a look at a highlighted output file. This shows how the file was parsed by Pepys, and which elements of the file were interpreted as which fields. Through the magic of IPython, we can embed that HTML file in the notebook by running the cell below. You will see highlighting of various parts of the file - hover over the highlighted sections to see how those parts of the file were interpreted.

In [None]:
filename = glob("archive/**/*rep_test1_highlighted.html", recursive=True)[0]
with open(filename, 'r') as f:
    html = f.read()
HTML(html)

Now we'll look at the error report for one of the files. Run the Python code below to view the error report for the file `rep_test1_bad.rep`.

In [None]:
filename = glob("archive/**/*rep_test1_bad_errors.log", recursive=True)[0]
with open(filename, 'r') as f:
    print(f.read())

You can see here that there are two errors, referring to lines 8 and 24, both complaining that there aren't enough tokens (individual components, in this case separated by spaces) in the line.
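
To make the idea of a token count concrete, here is a small sketch of a whitespace-based check. It is not the Pepys importer's actual logic: `has_enough_tokens` is a hypothetical helper, the sample lines are illustrative, and the minimum of six tokens is an assumption chosen purely for the example.

```python
# Assumed minimum for illustration only; the real importer's rules may differ.
MIN_TOKENS = 6


def has_enough_tokens(line, minimum=MIN_TOKENS):
    """Split on runs of whitespace and check that enough tokens were found."""
    tokens = line.split()  # split() with no argument collapses repeated spaces
    return len(tokens) >= minimum


# Illustrative lines: the first has 5 tokens, the second has 7.
short_line = ";NARRATIVE2: 100112   121200 SEARCH_PLATFORM OBSERVATION"
longer_line = ";NARRATIVE2: 100112   121200 SEARCH_PLATFORM OBSERVATION Test observation"

print(has_enough_tokens(short_line))   # False
print(has_enough_tokens(longer_line))  # True
```

Note that the run of three spaces between the date and the time does not create empty tokens, because splitting happens on whole runs of whitespace.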


### 6. Fix the errors and re-import

Click [here](../../edit/track_files_test/rep_data/rep_test1_bad.rep) to open `rep_test1_bad.rep` in the Jupyter file editor, and do the following:

 - Edit line 24 by adding some text like `Test observation` to the end of it. It should now look like: `;NARRATIVE2: 100112   121200 SEARCH_PLATFORM OBSERVATION Test observation`
 - Delete line 8 entirely
 
Save the file by pressing Ctrl-S or using the _Save_ option in the _File_ menu.

Now we'll run the import again, by running the cell below.

In [None]:
%xterm python -m pepys_import.cli --path track_files_test/rep_data --resolver default --archive

You should now see in the output that `rep_test1_bad.rep` was processed successfully.

### 7. Check the SQLite database itself
By default, Pepys stores the data it imports in a SQLite database (a sort of portable database file that doesn't require a specific database server to access it). The screenshot below shows a SQLite viewer application viewing the States table. You can see some of the data that has been imported above (some of the numbers will look different, as the database stores data in SI units - so the speeds are all in metres per second, rather than knots).

![States table view](SQLiteBrowserScreenshot.png)
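
When comparing speeds in the database with those in the original files, it helps to convert between the two units. The sketch below uses the standard definition of a knot (one nautical mile of 1852 metres per hour); the function names are hypothetical and are not part of Pepys.

```python
# One knot is one nautical mile (1852 m) per hour (3600 s).
KNOT_IN_METRES_PER_SECOND = 1852 / 3600


def knots_to_mps(knots):
    """Convert a speed in knots to metres per second."""
    return knots * KNOT_IN_METRES_PER_SECOND


def mps_to_knots(mps):
    """Convert a speed in metres per second back to knots."""
    return mps / KNOT_IN_METRES_PER_SECOND


print(round(knots_to_mps(10), 3))  # 5.144
```

So a speed recorded as 10 knots in a source file would be expected to appear as roughly 5.144 in the States table.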

If you want to look at the SQLite database yourself, then click [here](../../edit/pepys_test.sqlite) to view the database in the Jupyter file editor. You'll see an error saying the file can't be viewed - ignore that, and choose _File->Download_ to download it. You can now open it in any SQLite viewer - for example using [DB Browser for SQLite](https://sqlitebrowser.org/).

### That's it!
You've now completed the _Getting Started_ tutorial for Pepys. For more information on Pepys, visit the [full documentation](https://pepys-import.readthedocs.io/en/latest/index.html).