```ipython
# download the sample BUFR files unless they are already present
!test -f aircraft_small.bufr || wget https://github.com/ecmwf/pdbufr/raw/master/tests/sample-data/aircraft_small.bufr
!test -f temp.bufr || wget https://github.com/ecmwf/pdbufr/raw/master/tests/sample-data/temp.bufr
```

# Flat dump

```python
import pdbufr
```

The flat dump mode is activated by calling [read_bufr()](../read_bufr.rst) with ``flat=True``. In this mode each message/subset is extracted as a whole into a single row, and its keys keep their original order as columns (for exceptions, see Column alignment below).
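
In the flat column names, the ``#n#`` prefix (e.g. ``#1#centre``) is the ecCodes rank of a data key, meaning the n-th occurrence of that key in the message. The sketch below is only an illustration and is not part of pdbufr. The helper ``ranked_names`` is hypothetical, and the keys and values are made up. It assigns such ranks to a sequence of repeated keys and builds one flat row that keeps the key order:

```python
from collections import Counter


def ranked_names(keys):
    """Prefix each key with its occurrence rank, e.g. 'pressure' -> '#2#pressure'.

    Hypothetical helper mimicking the ecCodes '#n#key' naming; not pdbufr code.
    """
    counts = Counter()
    names = []
    for key in keys:
        counts[key] += 1
        names.append(f"#{counts[key]}#{key}")
    return names


# A made-up two-level sounding: each level repeats the same keys
keys = ["pressure", "airTemperature", "pressure", "airTemperature"]
values = [85000.0, 280.1, 70000.0, 271.3]

# One flat row: ranked names in message order (dicts keep insertion order)
row = dict(zip(ranked_names(keys), values))
print(list(row))
# ['#1#pressure', '#1#airTemperature', '#2#pressure', '#2#airTemperature']
```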

Since the results contain a large number of columns with very long names, all the examples below show the **transpose** of the DataFrame to make better use of the available space.

### Options

By default all the header and data keys are extracted:

```python
# read all header and data keys in flat mode
df = pdbufr.read_bufr("aircraft_small.bufr", flat=True)
df.T
```

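| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| edition | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| masterTableNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderSubCentre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderCentre | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
| updateSequenceNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| #1#dewpointTemperature |  |  |  |  |  |  |  |  |  |  |
| #1#relativeHumidity |  |  |  |  |  |  |  |  |  |  |
| #1#airframeIcing |  |  |  |  |  |  |  |  |  |  |
| #1#centre | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |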


However, we can extract only the **header keys**:

```python
df = pdbufr.read_bufr("aircraft_small.bufr", columns="header", flat=True)
df.T[:6]
```

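| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| edition | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| masterTableNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderSubCentre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderCentre | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
| updateSequenceNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| dataCategory | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |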


or only the **data keys**:

```python
df = pdbufr.read_bufr("aircraft_small.bufr", columns="data", flat=True)
df.T[:18]
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| subsetNumber | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| #1#aircraftFlightNumber | QGOBTRRA | QGOBTRRA | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S | VUVTEWZQ | 4IPASOZA | WSSASKBA | WSSASKBA |
| #1#aircraftRegistrationNumberOrOtherIdentification | HGSKJFBA | HGSKJFBA | O2RYR4JA | O2RYR4JA | O2RYR4JA | O2RYR4JA | 4NK13QZA | 0IKWU1JA | P4MAWDZA | P4MAWDZA |
| #1#aircraftNavigationalSystem |  |  |  |  |  |  |  |  |  |  |
| #1#aircraftDataRelaySystemType | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| #1#instrumentationForWindMeasurement | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| #1#temperatureObservationPrecision | 0.1 | 0.1 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| #1#originalSpecificationOfLatitudeOrLongitude | 1 | 1 | 10 | 10 | 10 | 10 | 10 | 1 | 10 | 10 |
| #1#aircraftRollAngle |  |  |  |  |  |  |  |  |  |  |
| #1#stationType | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


Filtering works similarly to the hierarchical (i.e. non-flat) mode:

```python
df = pdbufr.read_bufr(
    "aircraft_small.bufr",
    columns="data",
    filters={"aircraftFlightNumber": "UOZDOZ2S"},
    flat=True,
)
df.T.iloc[:18]
```

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| subsetNumber | 1 | 1 | 1 | 1 |
| #1#aircraftFlightNumber | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S |
| #1#aircraftRegistrationNumberOrOtherIdentification | O2RYR4JA | O2RYR4JA | O2RYR4JA | O2RYR4JA |
| #1#aircraftNavigationalSystem |  |  |  |  |
| #1#aircraftDataRelaySystemType | 3 | 3 | 3 | 3 |
| #1#instrumentationForWindMeasurement | 4 | 4 | 4 | 4 |
| #1#temperatureObservationPrecision | 0.25 | 0.25 | 0.25 | 0.25 |
| #1#originalSpecificationOfLatitudeOrLongitude | 10 | 10 | 10 | 10 |
| #1#aircraftRollAngle |  |  |  |  |
| #1#stationType | 0 | 0 | 0 | 0 |
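
The filter uses the plain key name ``aircraftFlightNumber``, whereas the flat column carries the rank prefix ``#1#aircraftFlightNumber``. For comparison, the pure-Python sketch below applies the same selection afterwards to rows stored as dicts. It is not pdbufr's implementation. Matching the key at any rank is an assumption of this sketch, and ``strip_rank`` and ``post_filter`` are hypothetical helpers. The flight numbers are copied from the unfiltered data-key output earlier:

```python
import re

RANK_PREFIX = re.compile(r"^#\d+#")


def strip_rank(column):
    """Return the key name without its rank prefix, e.g. '#1#centre' -> 'centre'."""
    return RANK_PREFIX.sub("", column)


def post_filter(rows, key, value):
    """Keep rows in which a column whose unranked name is `key` equals `value`."""
    return [
        row
        for row in rows
        if any(strip_rank(col) == key and val == value for col, val in row.items())
    ]


# "#1#aircraftFlightNumber" values of the ten subsets shown earlier
flight_numbers = [
    "QGOBTRRA", "QGOBTRRA", "UOZDOZ2S", "UOZDOZ2S", "UOZDOZ2S",
    "UOZDOZ2S", "VUVTEWZQ", "4IPASOZA", "WSSASKBA", "WSSASKBA",
]
rows = [{"subsetNumber": 1, "#1#aircraftFlightNumber": f} for f in flight_numbers]

selected = post_filter(rows, "aircraftFlightNumber", "UOZDOZ2S")
print(len(selected))  # 4
```

Four of the ten rows are kept, which agrees with the four columns of the filtered output.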


### Column alignment

The aircraft messages we have examined so far had an identical structure: each message contained exactly the same keys in exactly the same order. The result was always a neatly aligned DataFrame.

However, in a BUFR file each message can have a different structure, so alignment is not guaranteed. We will demonstrate this with a BUFR file containing radiosonde data.

First, we extract only the first message. The output shows that it contains 24 pressure level blocks (the highest block rank is ``#24#``).

```python
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 1}, flat=True)
df.T.iloc[-16:]
```

| | 0 |
|---|---|
| #23#pressure | 26300.0 |
| #23#verticalSoundingSignificance | 4.0 |
| #23#nonCoordinateGeopotential | 89290.0 |
| #23#airTemperature | 218.5 |
| #23#dewpointTemperature | 198.5 |
| #23#windDirection |  |
| #23#windSpeed |  |
| #24#pressure | 25800.0 |
| #24#verticalSoundingSignificance | 4.0 |
| #24#nonCoordinateGeopotential | 90490.0 |


Next, we extract the second message, which contains one more block (25 in total):

```python
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 2}, flat=True)
df.T.iloc[-16:]
```

| | 0 |
|---|---|
| #24#pressure | 23200.0 |
| #24#verticalSoundingSignificance | 4.0 |
| #24#nonCoordinateGeopotential | 98410.0 |
| #24#airTemperature | 223.1 |
| #24#dewpointTemperature | 192.1 |
| #24#windDirection |  |
| #24#windSpeed |  |
| #25#pressure | 20500.0 |
| #25#verticalSoundingSignificance | 4.0 |
| #25#nonCoordinateGeopotential | 106300.0 |
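
Instead of reading the number of levels off the printed output, we can derive it from the column names. The sketch below uses a hypothetical helper that is not part of pdbufr. It assumes that each pressure level block holds exactly one ``pressure`` key, so the highest ``#n#pressure`` rank equals the number of blocks:

```python
import re

LEVEL_PRESSURE = re.compile(r"^#(\d+)#pressure$")


def count_levels(columns):
    """Return the highest rank of 'pressure' among flat column names (0 if none)."""
    ranks = [int(m.group(1)) for col in columns if (m := LEVEL_PRESSURE.match(col))]
    return max(ranks, default=0)


# Column names from the tails of the two outputs above (abridged)
first = ["#23#pressure", "#23#airTemperature", "#24#pressure", "#24#airTemperature"]
second = first + ["#25#pressure", "#25#verticalSoundingSignificance"]

print(count_levels(first), count_levels(second))  # 24 25
```

With pdbufr, ``count_levels(df.columns)`` should behave the same way, since iterating over ``df.columns`` yields the column names as strings.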


Now, if we extract these two messages together, the columns will not be aligned:

```python
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1, 2]}, flat=True)
df.T.iloc[-16:]
```



| | 0 | 1 |
|---|---|---|
| #24#pressure | 25800.0 | 23200.0 |
| #24#verticalSoundingSignificance | 4.0 | 4.0 |
| #24#nonCoordinateGeopotential | 90490.0 | 98410.0 |
| #24#airTemperature | 218.5 | 223.1 |
| #24#dewpointTemperature | 196.5 | 192.1 |
| #24#windDirection |  |  |
| #24#windSpeed |  |  |
| #1#centre | 98.0 | 98.0 |
| #1#generatingApplication | 1.0 | 1.0 |
| #25#pressure |  | 20500.0 |


So what happened here? The resulting DataFrame was built message by message, and pandas automatically appended columns not yet present to the end. This is what happened to block #25 of the second message. The first message has no such block, so the block's columns were added after "#1#centre" and "#1#generatingApplication", and they hold no values for the first message. This changed the original column order of the second message, where "#1#centre" and "#1#generatingApplication" come after block #25, not before it. While this change is probably harmless here, it can pose a significant challenge for more complex message types.
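
The reordering can be reproduced without pdbufr. The pure-Python sketch below only illustrates the behaviour described above and is not pdbufr's or pandas' actual code. ``union_columns`` collects column names message by message and appends unseen names at the end. ``is_aligned`` is a hypothetical check that tests whether a message's keys keep their relative order in the combined columns. The key lists are abridged from the two radiosonde messages:

```python
def union_columns(messages):
    """Collect column names message by message; names not seen before go to the end."""
    columns, seen = [], set()
    for keys in messages:
        for name in keys:
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


def is_aligned(keys, columns):
    """Return True if `keys` appear in `columns` in the same relative order."""
    positions = [columns.index(name) for name in keys]
    return positions == sorted(positions)


msg1 = ["#24#pressure", "#24#windSpeed", "#1#centre", "#1#generatingApplication"]
msg2 = ["#24#pressure", "#24#windSpeed", "#25#pressure", "#25#windSpeed",
        "#1#centre", "#1#generatingApplication"]

columns = union_columns([msg1, msg2])
print(columns)
# ['#24#pressure', '#24#windSpeed', '#1#centre', '#1#generatingApplication',
#  '#25#pressure', '#25#windSpeed']
print(is_aligned(msg1, columns), is_aligned(msg2, columns))  # True False
```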
 
As a safety measure, when messages are not fully aligned, [read_bufr()](../read_bufr.rst) issues a warning, which Python prints to stderr by default.

To disable the warning, use the standard **warnings** module as shown below:

```python
import warnings

# ignore warnings attributed to pdbufr modules (for the rest of the session)
warnings.filterwarnings("ignore", module="pdbufr")

df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1, 2]}, flat=True)
df.T.iloc[-16:]
```

| | 0 | 1 |
|---|---|---|
| #24#pressure | 25800.0 | 23200.0 |
| #24#verticalSoundingSignificance | 4.0 | 4.0 |
| #24#nonCoordinateGeopotential | 90490.0 | 98410.0 |
| #24#airTemperature | 218.5 | 223.1 |
| #24#dewpointTemperature | 196.5 | 192.1 |
| #24#windDirection |  |  |
| #24#windSpeed |  |  |
| #1#centre | 98.0 | 98.0 |
| #1#generatingApplication | 1.0 | 1.0 |
| #25#pressure |  | 20500.0 |
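
Note that ``warnings.filterwarnings()`` changes the warning filters for the rest of the session. To silence the warning only around a single call, the filter can instead be set inside a ``warnings.catch_warnings()`` block, which restores the previous filters on exit. In the sketch below, ``noisy_reader`` is a hypothetical stand-in for a ``read_bufr()`` call that emits a warning, and its message text is made up:

```python
import warnings


def noisy_reader():
    """Hypothetical stand-in for a read_bufr() call that emits a warning."""
    warnings.warn("columns are not aligned", UserWarning)  # made-up message text
    return "data"


# Suppress warnings only inside this block; the previous filters are restored
# on exit. With pdbufr, read_bufr(..., flat=True) would be called here instead.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_reader()

# Outside the block the warning is emitted again (recorded here to show it)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_reader()

print(result, len(caught))  # data 1
```

Inside the block, ``warnings.filterwarnings("ignore", module="pdbufr")`` can be used in place of ``simplefilter`` to keep the narrower, module-based filter from the cell above.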
