This program takes an AIMS roster as input and produces various useful output formats.
This module is the core of the program. It transforms the AIMS roster through a series of intermediate data structures.
The AIMS roster itself is currently in the form of HTML 4.01 Transitional and fails validation. The format is not well defined and is subject to change without notice, so this program is necessarily somewhat brittle and will need to be updated whenever the underlying roster format changes.
### Converting HTML into “lines”
This step is handled by lines(...) in conjunction with the RosterParser class. The RosterParser class subclasses the HTML parser provided by the python standard library, which is designed to handle parsing of potentially badly formed HTML. The parsing provided by this class is necessarily extremely basic.
The input of lines(...) is the HTML AIMS roster in the form of a python string. Currently this is formatted as a <table> for each page of the roster, containing a <tr> for each line on the page. Each <tr> is then broken down into a large number of <td> blocks to form a grid, the width of each <td> being fixed in the top <tr> on the page and columns of the required width being formed by colspan attributes. Notably, empty rows are formed from a full complement of completely empty <td> elements, and handle_data(...) is not called on completely empty elements.
The output, which I shall refer to as “lines”, is a list of lists of the form:
[ [line0cell0, line0cell1, ...], [line1cell0, line1cell1, ...], ... ]
where, for example, line10cell15 would represent the 16th non-empty <td> element from the 11th <tr> element in the HTML string. The division into pages is not captured as it does not appear to be useful or relevant for our purposes.
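The parsing described above can be sketched with html.parser.HTMLParser from the Python 3 standard library. This is a minimal sketch, not the program's RosterParser: the names SketchRosterParser and sketch_lines are hypothetical, and it assumes that only <tr> and <td> matter, that whitespace-only cells count as empty, and that omitted </td> end tags should be tolerated.

```python
from html.parser import HTMLParser


class SketchRosterParser(HTMLParser):
    """Hypothetical stand-in for RosterParser: collects the text of
    non-empty <td> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.lines = []    # one list of cell strings per <tr>
        self._cell = None  # text fragments of the open <td>, or None

    def _flush_cell(self):
        # Close the current cell, keeping it only if it contains text.
        if self._cell is not None:
            text = "".join(self._cell).strip()
            if text and self.lines:
                self.lines[-1].append(text)
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("table", "tr", "td"):
            self._flush_cell()  # tolerate omitted </td> end tags
        if tag == "tr":
            self.lines.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("table", "tr", "td"):
            self._flush_cell()

    def handle_data(self, data):
        # Not called for a completely empty <td>, so such cells add nothing.
        if self._cell is not None:
            self._cell.append(data)


def sketch_lines(roster_html):
    """Return the "lines" structure for an HTML roster string."""
    parser = SketchRosterParser()
    parser.feed(roster_html)
    parser.close()
    return parser.lines
```

For example, a two-row table whose second row contains only empty cells gives one list of cell strings followed by an empty list.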
### Converting “lines” into “columns”
Most of the pertinent information is contained in the block of rows on the first page that appears as a table when the HTML is viewed. So far as I can tell, this table always starts on the 5th row of the first page. Hence the number of columns we require is equal to the number of cells in the 5th line.
The marker for the end of the pertinent rows is the word “Block” appearing as one of the cells in a line.
columns(...) takes the “lines” format and converts it to “columns” format by identifying the pertinent lists and re-arranging them in the form:
[ [col0cell0, col0cell1, ...], [col1cell0, col1cell1, ...], ... [col31cell0, col31cell1, ...] ]
where, for example, col5cell6 is the 7th entry down in the 6th column from the left of the pertinent table.
### Converting “columns” into an “event stream”
The next step is to convert the “duty columns” into a single consistent stream of identified, datestamped objects.
event_stream(...) processes the “duty columns” format by working down each column in turn.
When a data item is identified as a time, it is combined with the date that the column represents and pushed to the stream as a standard datetime object. If it is identified as a pertinent string, such as a flight number or airport code, it is pushed to the stream as an Event object, which is a combination of a
date object and a string. This makes
datetime objects on the
If a blank line is found, it is pushed to the stream as a Break object with the type attribute LINE. At the end of each column, a Break object with the type attribute COLUMN is pushed.
Various non-pertinent strings are ignored if found.
Note that objects in the “event stream” have a one-to-one relationship with entries in the roster, except that ignored strings have no corresponding object. A typical two-sector duty in this stream will therefore look something like:
[ ..., BREAK(column), EVENT(flight#), DATETIME(duty start), DATETIME(off blocks), EVENT(departure airport), EVENT(arrival airport), DATETIME(on blocks), BREAK(line), EVENT(flight#), DATETIME(off blocks), EVENT(departure airport), EVENT(arrival airport), DATETIME(on blocks), DATETIME(off duty), BREAK(column), ...]
There will always be a Break object with type attribute COLUMN as the first and last items of an “event stream”.
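The classification performed by event_stream(...) can be sketched as follows. Event and Break here are simplified dataclass stand-ins, not the program's own classes. The name sketch_event_stream, the HH:MM time format, the IGNORED set and the rule that column i represents first_date plus i days are all assumptions. The sketch also assumes the stream begins with a COLUMN break; each column then ends with one.

```python
import datetime as dt
import re
from dataclasses import dataclass

TIME_RE = re.compile(r"(\d{2}):(\d{2})")  # assumed HH:MM time format
IGNORED = {"REST"}                         # hypothetical non-pertinent strings


@dataclass(frozen=True)
class Event:
    """Stand-in for the Event object: a date plus a roster string."""
    date: dt.date
    text: str


@dataclass(frozen=True)
class Break:
    """Stand-in for the Break object; type is "LINE" or "COLUMN"."""
    type: str


def sketch_event_stream(columns, first_date):
    """Flatten "columns" into an "event stream".

    Assumes column i represents first_date + i days.
    """
    stream = [Break("COLUMN")]  # assumed leading COLUMN break
    for offset, column in enumerate(columns):
        date = first_date + dt.timedelta(days=offset)
        for cell in column:
            cell = cell.strip()
            match = TIME_RE.fullmatch(cell)
            if not cell:
                stream.append(Break("LINE"))      # blank roster line
            elif match:
                time = dt.time(int(match.group(1)), int(match.group(2)))
                stream.append(dt.datetime.combine(date, time))
            elif cell not in IGNORED:
                stream.append(Event(date, cell))  # flight number, airport, ...
        stream.append(Break("COLUMN"))            # end of this column
    return stream
```

A column consisting only of ignored strings contributes nothing but its closing COLUMN break.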
### Converting the “event stream” into a “duty stream”
The entries in the “event stream” now need to be broken up into duties.
Unfortunately, column breaks are ambiguous. Sometimes they represent a gap between duties, sometimes a gap between sectors and sometimes no gap at all. This ambiguity must be resolved from context by examining the entries either side of each column break (up to two entries before it), which is a somewhat messy process. The rules are:
DATETIME, BREAK(column), EVENT ---> change to BREAK(line)
EVENT, BREAK(column), DATETIME ---> should not occur
DATETIME, BREAK(column), DATETIME ---> remove
EVENT, BREAK(column), EVENT ---> ambiguous, so look one entry further back:
  BREAK(any), EVENT, BREAK(column), EVENT ---> change to BREAK(line)
  DATETIME, EVENT, BREAK(column), EVENT ---> remove
  EVENT, EVENT, BREAK(column), EVENT ---> should not occur
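The column-break rules above can be sketched as a single pass over the stream. The names resolve_column_breaks and push_line_break are hypothetical, and Event and Break are simplified stand-ins. Two behaviours are assumptions not covered by the rules: a column break with a missing or Break neighbour becomes a line break, and consecutive breaks are collapsed into one. The “should not occur” cases raise ValueError.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Stand-in for the Event object: a date plus a roster string."""
    date: dt.date
    text: str


@dataclass(frozen=True)
class Break:
    """Stand-in for the Break object; type is "LINE" or "COLUMN"."""
    type: str


def is_time(item):
    return isinstance(item, dt.datetime)


def resolve_column_breaks(stream):
    """Remove each BREAK(column) or turn it into a BREAK(line), per the rules."""
    out = []

    def push_line_break():
        if not (out and isinstance(out[-1], Break)):  # assumed: no doubled breaks
            out.append(Break("LINE"))

    for i, item in enumerate(stream):
        if isinstance(item, Break) and item.type == "LINE":
            push_line_break()
            continue
        if not (isinstance(item, Break) and item.type == "COLUMN"):
            out.append(item)
            continue
        # Context before the break comes from the already-resolved output.
        prev = out[-1] if out else None
        prev2 = out[-2] if len(out) > 1 else None
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if prev is None or nxt is None or isinstance(prev, Break) or isinstance(nxt, Break):
            push_line_break()  # not covered by the rules: assumed line break
        elif is_time(prev) and isinstance(nxt, Event):
            push_line_break()  # DATETIME, BREAK(column), EVENT
        elif is_time(prev) and is_time(nxt):
            pass               # DATETIME, BREAK(column), DATETIME: remove
        elif is_time(nxt):
            raise ValueError(f"EVENT, BREAK(column), DATETIME at {i}")
        elif prev2 is None or isinstance(prev2, Break):
            push_line_break()  # BREAK(any), EVENT, BREAK(column), EVENT
        elif is_time(prev2):
            pass               # DATETIME, EVENT, BREAK(column), EVENT: remove
        else:
            raise ValueError(f"EVENT, EVENT, BREAK(column), EVENT at {i}")
    return out
```

Reading the preceding context from the resolved output rather than the raw stream means a column break that has already been removed is never mistaken for BREAK(any).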
Once all the column breaks have been either removed or changed to line breaks, the task is to determine which line breaks represent breaks between duty blocks and which represent breaks between sectors or standbys within a duty block.
We can first tackle all-day duties:
BREAK(line), EVENT, BREAK(line) ---> BREAK(duty), EVENT, BREAK(duty)
With that done, all remaining line breaks should be of the form:
DATETIME, BREAK(line), EVENT, DATETIME
If those two DATETIME objects are more than 8 hours apart, we can safely assume that the line break is in fact a duty break and replace it with BREAK(duty). All remaining line breaks then represent breaks between items within a duty block.
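The duty-break rules can be sketched in two passes: all-day duties first, then the 8-hour test. The names mark_duty_breaks and MAX_SECTOR_GAP are hypothetical, Event and Break are simplified stand-ins, and treating the first and last breaks of the stream as duty breaks is an assumption. A remaining line break whose neighbours do not match DATETIME, BREAK(line), EVENT, DATETIME raises ValueError.

```python
import datetime as dt
from dataclasses import dataclass

MAX_SECTOR_GAP = dt.timedelta(hours=8)  # gap beyond which a new duty starts


@dataclass(frozen=True)
class Event:
    """Stand-in for the Event object: a date plus a roster string."""
    date: dt.date
    text: str


@dataclass(frozen=True)
class Break:
    """Stand-in for the Break object; type is "LINE" or "DUTY"."""
    type: str


def mark_duty_breaks(stream):
    """Turn the line breaks that separate duties into BREAK(duty)."""
    s = list(stream)
    # Pass 1, all-day duties: BREAK, EVENT, BREAK -> BREAK(duty), EVENT, BREAK(duty).
    # Any Break counts, since a neighbouring all-day duty may already have
    # converted a shared break.
    for i in range(len(s) - 2):
        if isinstance(s[i], Break) and isinstance(s[i + 1], Event) and isinstance(s[i + 2], Break):
            s[i] = s[i + 2] = Break("DUTY")
    # Pass 2: remaining line breaks, using the 8-hour rule.
    for i, item in enumerate(s):
        if not (isinstance(item, Break) and item.type == "LINE"):
            continue
        if i == 0 or i == len(s) - 1:
            s[i] = Break("DUTY")  # assumed: the ends of the stream bound duties
            continue
        before = s[i - 1]
        after = s[i + 2] if i + 2 < len(s) else None
        if not (isinstance(before, dt.datetime) and isinstance(s[i + 1], Event)
                and isinstance(after, dt.datetime)):
            raise ValueError(f"unexpected context around line break at {i}")
        if after - before > MAX_SECTOR_GAP:
            s[i] = Break("DUTY")  # too long a gap to be within one duty
    return s
```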
The duty_stream(...) function takes the “event stream” and carries out all this processing on it. It then splits the event stream at the BREAK(duty) entries to give a “duty stream” output. For our two-sector duty block it will look like:
[ ... , [ EVENT(flight#), DATETIME(duty start), DATETIME(off blocks), EVENT(departure airport), EVENT(arrival airport), DATETIME(on blocks), BREAK(line), EVENT(flight#), DATETIME(off blocks), EVENT(departure airport), EVENT(arrival airport), DATETIME(on blocks), DATETIME(off duty)], ... ]
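Splitting the processed stream at BREAK(duty) entries is a simple grouping step. A minimal sketch follows, with split_duties and the stand-in Event and Break classes as hypothetical names. Dropping empty groups, such as the one before a leading BREAK(duty), is an assumption.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Stand-in for the Event object: a date plus a roster string."""
    date: dt.date
    text: str


@dataclass(frozen=True)
class Break:
    """Stand-in for the Break object; type is "LINE" or "DUTY"."""
    type: str


def split_duties(stream):
    """Split an event stream at BREAK(duty) entries into a "duty stream"."""
    duties, current = [], []
    for item in stream:
        if isinstance(item, Break) and item.type == "DUTY":
            if current:            # skip empty stretches between duty breaks
                duties.append(current)
            current = []
        else:
            current.append(item)   # line breaks stay inside the duty
    if current:
        duties.append(current)
    return duties
```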
### Converting “duty stream” to “duty list”
The duty_list(...) function carries out the conversion to the final “duty list” format.
Firstly, a duty that spans midnight at the start of the first day of the roster will result in orphaned entries in the first column. Similarly, a duty that spans midnight at the end of the last day of the roster will only be partially shown. Where possible, these need to be fixed up with fake data so that the maximum amount of available information can be processed.
The “duty stream” can then be converted to the “duty list” format, which is of the following form:
[ [ DATE, [DUTYSTART, DUTYEND], [ITEM], [ITEM], ... ], [ DATE, [DUTYSTART, DUTYEND], [ITEM], [ITEM], ... ], ... ]
[ITEM] is one of:
SECTOR, of the form [EVENT(flight#), DATETIME(off blocks), EVENT(departure airport), EVENT(arrival airport), DATETIME(on blocks)]
STANDBYLIKE, of the form [EVENT(type), DATETIME(start), DATETIME(end)]
An all-day duty, of the form [EVENT(type)]. In this case DUTYSTART and DUTYEND will both be