# Flight Data Files to CSV
Traverse `data/Flights` to find data files and record them in `data/flights.csv`. Each flight has its own directory. Wherever we find one of the data files we look for (a KML track, or an AC, ADP, IMU or UV data file), we write its URL into the matching `data_*` column of that flight's row in `flights.csv`.

For example, if we find file `data/Flights/0069/Flt0069.kml`, we put `http://localhost/data/Flights/0069/Flt0069.kml` in the `data_kml` column for flight 69. The field has to be a valid URL, hence the `http://localhost/` prefix; the host is just a parameter (`host`) to the function below. The idea is that you develop on your local WordPress installation and use a WordPress migration plugin or equivalent to push to the cloud, which rewrites the `localhost` URLs to the live site's address in the process.
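A minimal sketch of the path-to-URL step, assuming paths are relative to the directory the notebook `chdir`s into and that this directory is the web root. `to_url` is a hypothetical helper, not part of `path_utils`. Unlike the plain `f'{host}/{file}'` join the notebook uses, it percent-encodes the path, since a directory such as `Controlled.svn/Systems/Data Network Logs/` contains spaces, which are not valid in a URL.

```python
from urllib.parse import quote

def to_url(rel_path: str, host: str = "http://localhost") -> str:
    """Map a path relative to the web root to a URL under `host` (hypothetical helper)."""
    # Trim slashes at the join point so we never emit '//' or drop the separator.
    # quote() percent-encodes unsafe characters such as spaces but leaves '/' alone.
    return f"{host.rstrip('/')}/{quote(rel_path.lstrip('/'))}"

print(to_url("data/Flights/0069/Flt0069.kml"))
# → http://localhost/data/Flights/0069/Flt0069.kml
```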

The CSV we read and write here, `data/flights.csv`, is read by the import page defined in the Perlan WordPress plugin to create all the posts of type 'flight' (a Custom Post Type we defined in the Pods plugin). This import page can delete 'flight' posts, too: standard procedure is to delete all 'flight' posts and then recreate them from `flights.csv`.

See top-level `doc` for the whole workflow.

In [1]:
import os
import pandas as pd
import path_utils as pu

In [2]:
root = "/Users/jdm/workbench/Perlan" # YMMV
os.chdir(root)

In [3]:
!pwd

/Users/jdm/workbench/Perlan


In [4]:
!ls

[34mControlled.svn[m[m                    [34mdata_website.archives[m[m
Perlan Encore Fellowship          [34mdata_website.broken[m[m
[34mPerlanProject-2020-07-07T19-16-38[m[m [34mdata_website.drupaled.broken[m[m
[34mScience.git[m[m                       [34mperlanproject.org[m[m
[34mTRASH_LATER[m[m                       [34mpods[m[m
[34massets[m[m                            [34mpods.old[m[m
[34mclippings[m[m                         [34mtmp[m[m
[34mdata[m[m                              [34mwindField[m[m
data website plan.ooutline        wp-config.php.save
[34mdata_website[m[m


# Read CSV

In [5]:
svn_root  = 'Controlled.svn/Systems/Data Network Logs/'
data_root = 'data/'
flights_root = data_root + 'Flights'
balloons_root = data_root + 'Soundings'

In [6]:
csv = pd.read_csv(os.path.join(data_root, "flights.csv"))

## Initial Cleanup

In [7]:
# Clear the columns we populate by walking the directory tree, so no stale entries survive
csv['data_ac'] = None
csv['data_uv'] = None
csv['data_kml'] = None
csv['data_adp'] = None
csv['data_imu'] = None

In [8]:
# Delete 'Unnamed' columns. read_csv names a column 'Unnamed: N' when its header is blank,
# which happens if the CSV was saved via df.to_csv(index=True) (the default): the row
# index is written as an unlabeled first column.
# This shouldn't happen, of course, and it's harmless, but annoying.
for col in csv.columns:
    if col.startswith('Unnamed'):
        print(f"Deleting junk column {col}")
        del csv[col]
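Why the junk column appears, plus a vectorized alternative to the loop, shown as a round trip through an in-memory buffer with toy data (not the real `flights.csv`). Passing `index_col=0` to `read_csv` would be another way to absorb a saved index; the write cell at the end of this notebook avoids the problem by saving with `index=False`.

```python
import io

import pandas as pd

# Toy frame standing in for flights.csv.
df = pd.DataFrame({"flight_number": [1, 2], "city": ["Redmond, OR", "Minden, NV"]})

# to_csv() writes the row index as a first, unlabeled column by default (index=True);
# read_csv() then names that column "Unnamed: 0".
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
roundtrip = pd.read_csv(buf)
print(list(roundtrip.columns))  # ['Unnamed: 0', 'flight_number', 'city']

# Vectorized version of the cleanup loop: keep columns whose names don't start with "Unnamed".
cleaned = roundtrip.loc[:, ~roundtrip.columns.str.startswith("Unnamed")]
print(list(cleaned.columns))  # ['flight_number', 'city']
```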

In [9]:
# delete any null rows
csv.dropna(how='all', inplace=True)

In [10]:
# ensure correct types - int columns can become float if any missing data
csv['flight_number'] = csv.flight_number.astype(int, copy=False)
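A toy illustration of the float promotion the comment describes. Dropping all-null rows first matters here, because `astype(int)` raises an error if any `NaN` is left in the column. Pandas' nullable `Int64` dtype can hold integers and missing values together; the notebook doesn't use it, so treat that last part as an option rather than current behavior.

```python
import pandas as pd

# A single missing value promotes an integer column to float64,
# because NumPy's int64 has no way to represent NaN.
s = pd.Series([1, 2, None])
print(s.dtype)  # float64

# Once the missing entry is gone, astype(int) gives integers back.
print(s.dropna().astype(int).tolist())  # [1, 2]

# Alternative (not used in this notebook): pandas' nullable integer dtype.
s_nullable = pd.Series([1, 2, None], dtype="Int64")
print(s_nullable.dtype)  # Int64
```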

## Sanity Check: Input
Look these over to make sure everything looks OK.

In [11]:
csv.head()

,flight_number,flight_date,city,airport,takeoff_time_local,landing_time_local,duration,release_altitude_feet,maximum_altitude_feet,maximum_gps_altitude_feet,significance,pilot_front,pilot_rear,data_ac,data_uv,data_kml,data_adp,data_imu
0,1,2015-09-23,"Redmond, OR",KRDM,818.0,851.0,0.6,8100.0,8100.0,,First Flight,Jim Payne,Morgan Sandercock,,,,,
1,2,2016-01-15,"Minden, NV",KMEV,1307.0,1350.0,0.7,10800.0,10800.0,,flight testing,Jim Payne,Miguel Iturmendi,,,,,
2,3,2016-01-15,"Minden, NV",KMEV,1420.0,1502.0,0.7,10700.0,10700.0,,,Jim Payne,Miguel Iturmendi,,,,,
3,4,2016-01-27,"Minden, NV",KMEV,1423.0,1445.0,0.4,7600.0,7600.0,,,Jim Payne,,,,,,
4,5,2016-01-27,"Minden, NV",KMEV,1525.0,1555.0,0.5,8700.0,8700.0,,,Jim Payne,,,,,,


In [12]:
rec = csv[csv['flight_number'] == 65]
print(rec)

    flight_number flight_date         city airport  takeoff_time_local  \
64             65  2019-09-17  El Calafate    SAWC              1120.0   

    landing_time_local  duration  release_altitude_feet  \
64              1650.0       5.5                51000.0   

    maximum_altitude_feet  maximum_gps_altitude_feet         significance  \
64                65000.0                        NaN  Last flight of 2019   

   pilot_front         pilot_rear data_ac data_uv data_kml data_adp data_imu  
64   Jim Payne  Morgan Sandercock    None    None     None     None     None  


In [13]:
csv.columns

Index(['flight_number', 'flight_date', 'city', 'airport', 'takeoff_time_local',
       'landing_time_local', 'duration', 'release_altitude_feet',
       'maximum_altitude_feet', 'maximum_gps_altitude_feet', 'significance',
       'pilot_front', 'pilot_rear', 'data_ac', 'data_uv', 'data_kml',
       'data_adp', 'data_imu'],
      dtype='object')

# Do the Work

In [14]:
flts = pu.get_subdirs(flights_root)
#flts
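`path_utils` is a local module whose source isn't shown here. Judging from the printed output of the next cell, `get_subdirs` returns paths like `data/Flights/0061` and `get_files` returns paths like `data/Flights/0061/Flt0061AC.xlsb`, in no particular order. The functions below are hypothetical stand-ins built on `os.scandir` under those assumptions; sorting the results and skipping hidden files (such as macOS `.DS_Store`) are choices made for this sketch, not necessarily what `path_utils` does.

```python
import os
from typing import List

def get_subdirs(root: str) -> List[str]:
    """Hypothetical stand-in for pu.get_subdirs: immediate subdirectories of root,
    as 'root/name' strings, sorted so zero-padded flight dirs come out in order."""
    with os.scandir(root) as entries:
        return sorted(f"{root}/{e.name}" for e in entries if e.is_dir())

def get_files(directory: str) -> List[str]:
    """Hypothetical stand-in for pu.get_files: regular files directly inside
    directory, as 'directory/name' strings, skipping hidden files."""
    with os.scandir(directory) as entries:
        return sorted(
            f"{directory}/{e.name}"
            for e in entries
            if e.is_file() and not e.name.startswith(".")
        )
```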

In [15]:
# Walk the Flights dir, looking for data files, and put their URLs in the CSV
def stuff_file_paths(df, flight_dirs, host="http://localhost", verbose=1):
    for flt in flight_dirs:
        files_full = pu.get_files(flt)
        files = [os.path.basename(f) for f in files_full]
        if not files:
            continue

        # Directory names are zero-padded flight numbers, e.g. data/Flights/0061
        nr = int(flt[-4:])
        if verbose:
            print(f"#{nr}\t{flt}\t{files}")

        # Select the row by flight_number instead of assuming index == nr - 1:
        # df.loc[label, col] = ... with a missing label would append a junk row.
        row = df['flight_number'] == nr
        if not row.any():
            print(f"WARNING: no row for flight {nr} in CSV; skipping {flt}")
            continue

        for path in files_full:
            url = f'{host}/{path}'
            base, suffix = os.path.splitext(url)  # suffix includes the dot, e.g. '.kml'
            if suffix == '.kml':
                df.loc[row, 'data_kml'] = url
                if verbose:
                    print(f"KML = {url}")
            # Other data files are tagged by the end of the base name, e.g. Flt0061AC.xlsb
            for kind in ['AC', 'ADP', 'IMU', 'UV']:
                if base.endswith(kind):
                    df.loc[row, f'data_{kind.lower()}'] = url
                    if verbose:
                        print(f"{kind} = {url}")
    return df

stuff_file_paths(df=csv, flight_dirs=flts)

#61	data/Flights/0061	['Flt0061IMU.zip', 'Flt0061AC.xlsb', 'Flt0061.kml', 'Flt0061ADP.csv', 'Flt0061UV.xlsx']
IMU = http://localhost/data/Flights/0061/Flt0061IMU.zip
AC = http://localhost/data/Flights/0061/Flt0061AC.xlsb
KML = http://localhost/data/Flights/0061/Flt0061.kml
ADP = http://localhost/data/Flights/0061/Flt0061ADP.csv
UV = http://localhost/data/Flights/0061/Flt0061UV.xlsx
#59	data/Flights/0059	['Flt0059AC.xlsx', 'Flt0059IMU.zip', 'Flt0059ADP.csv', 'Flt0059.kml', 'Flt0059UV.xlsx']
AC = http://localhost/data/Flights/0059/Flt0059AC.xlsx
IMU = http://localhost/data/Flights/0059/Flt0059IMU.zip
ADP = http://localhost/data/Flights/0059/Flt0059ADP.csv
KML = http://localhost/data/Flights/0059/Flt0059.kml
UV = http://localhost/data/Flights/0059/Flt0059UV.xlsx
#50	data/Flights/0050	['Flt0050UV.xlsx', 'Flt0050.kml', 'Flt0050AC.xlsb']
UV = http://localhost/data/Flights/0050/Flt0050UV.xlsx
KML = http://localhost/data/Flights/0050/Flt0050.kml
AC = http://localhost/data/Flights/0050/Flt0050A

,flight_number,flight_date,city,airport,takeoff_time_local,landing_time_local,duration,release_altitude_feet,maximum_altitude_feet,maximum_gps_altitude_feet,significance,pilot_front,pilot_rear,data_ac,data_uv,data_kml,data_adp,data_imu
0,1,2015-09-23,"Redmond, OR",KRDM,818.0,851.0,0.6,8100.0,8100.0,,First Flight,Jim Payne,Morgan Sandercock,,,,,
1,2,2016-01-15,"Minden, NV",KMEV,1307.0,1350.0,0.7,10800.0,10800.0,,flight testing,Jim Payne,Miguel Iturmendi,,,,,
2,3,2016-01-15,"Minden, NV",KMEV,1420.0,1502.0,0.7,10700.0,10700.0,,,Jim Payne,Miguel Iturmendi,,,,,
3,4,2016-01-27,"Minden, NV",KMEV,1423.0,1445.0,0.4,7600.0,7600.0,,,Jim Payne,,,,,,
4,5,2016-01-27,"Minden, NV",KMEV,1525.0,1555.0,0.5,8700.0,8700.0,,,Jim Payne,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,61,2019-08-31,El Calafate,SAWC,1037.0,1430.0,3.9,46600.0,56300.0,,,Jim Payne,Miguel Iturmendi,http://localhost/data/Flights/0061/Flt0061AC.xlsb,http://localhost/data/Flights/0061/Flt0061UV.xlsx,http://localhost/data/Flights/0061/Flt0061.kml,http://localhost/data/Flights/0061/Flt0061ADP.csv,http://localhost/data/Flights/0061/Flt0061IMU.zip
61,62,2019-09-06,El Calafate,SAWC,1333.0,1636.0,3.1,42000.0,42300.0,,,Jim Payne,Morgan Sandercock,http://localhost/data/Flights/0062/Flt0062AC.xlsb,http://localhost/data/Flights/0062/Flt0062UV.xlsx,http://localhost/data/Flights/0062/Flt0062.kml,http://localhost/data/Flights/0062/Flt0062ADP.csv,http://localhost/data/Flights/0062/Flt0062IMU.zip
62,63,2019-09-11,El Calafate,SAWC,1235.0,1811.0,5.6,47100.0,50600.0,,tow height record,Jim Payne,Tim Gardner,http://localhost/data/Flights/0063/Flt0063AC.xlsb,http://localhost/data/Flights/0063/Flt0063UV.xlsb,http://localhost/data/Flights/0063/Flt0063.kml,http://localhost/data/Flights/0063/Flt0063ADP.zip,http://localhost/data/Flights/0063/Flt0063IMU.zip
63,64,2019-09-14,El Calafate,SAWC,913.0,1230.0,3.3,45100.0,49200.0,,,Jim Payne,Miguel Iturmendi,http://localhost/data/Flights/0064/Flt0064AC.xlsb,http://localhost/data/Flights/0064/Flt0064UV.xlsb,http://localhost/data/Flights/0064/Flt0064.kml,http://localhost/data/Flights/0064/Flt0064ADP.csv,http://localhost/data/Flights/0064/Flt0064IMU.zip
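The filename rules applied inside `stuff_file_paths` (a `.kml` extension, or a base name ending in `AC`, `ADP`, `IMU` or `UV`) can be pulled out into small pure functions that are testable without a directory tree. This is a sketch; `KINDS`, `column_for` and `flight_number_of` are hypothetical names. One simplification: `column_for` returns a single column per file, whereas the notebook's loop checks the `.kml` rule and the kind suffixes independently.

```python
import os
import re
from typing import Optional

# Kind tags that end a data file's base name, e.g. Flt0061AC.xlsb, Flt0061IMU.zip.
KINDS = ("AC", "ADP", "IMU", "UV")

def column_for(filename: str) -> Optional[str]:
    """Return the flights.csv column a data file belongs in, or None if unrecognized."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext == ".kml":
        return "data_kml"
    for kind in KINDS:
        if stem.endswith(kind):
            return f"data_{kind.lower()}"
    return None

def flight_number_of(directory: str) -> int:
    """Flight number from a zero-padded directory name such as data/Flights/0061."""
    match = re.search(r"(\d{4})/?$", directory)
    if match is None:
        raise ValueError(f"no 4-digit flight number at the end of {directory!r}")
    return int(match.group(1))

for name in ["Flt0061IMU.zip", "Flt0061.kml", "Flt0061ADP.csv", "notes.txt"]:
    print(name, "->", column_for(name))
```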


# Sanity Check: Output

In [16]:
csv

,flight_number,flight_date,city,airport,takeoff_time_local,landing_time_local,duration,release_altitude_feet,maximum_altitude_feet,maximum_gps_altitude_feet,significance,pilot_front,pilot_rear,data_ac,data_uv,data_kml,data_adp,data_imu
0,1,2015-09-23,"Redmond, OR",KRDM,818.0,851.0,0.6,8100.0,8100.0,,First Flight,Jim Payne,Morgan Sandercock,,,,,
1,2,2016-01-15,"Minden, NV",KMEV,1307.0,1350.0,0.7,10800.0,10800.0,,flight testing,Jim Payne,Miguel Iturmendi,,,,,
2,3,2016-01-15,"Minden, NV",KMEV,1420.0,1502.0,0.7,10700.0,10700.0,,,Jim Payne,Miguel Iturmendi,,,,,
3,4,2016-01-27,"Minden, NV",KMEV,1423.0,1445.0,0.4,7600.0,7600.0,,,Jim Payne,,,,,,
4,5,2016-01-27,"Minden, NV",KMEV,1525.0,1555.0,0.5,8700.0,8700.0,,,Jim Payne,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,61,2019-08-31,El Calafate,SAWC,1037.0,1430.0,3.9,46600.0,56300.0,,,Jim Payne,Miguel Iturmendi,http://localhost/data/Flights/0061/Flt0061AC.xlsb,http://localhost/data/Flights/0061/Flt0061UV.xlsx,http://localhost/data/Flights/0061/Flt0061.kml,http://localhost/data/Flights/0061/Flt0061ADP.csv,http://localhost/data/Flights/0061/Flt0061IMU.zip
61,62,2019-09-06,El Calafate,SAWC,1333.0,1636.0,3.1,42000.0,42300.0,,,Jim Payne,Morgan Sandercock,http://localhost/data/Flights/0062/Flt0062AC.xlsb,http://localhost/data/Flights/0062/Flt0062UV.xlsx,http://localhost/data/Flights/0062/Flt0062.kml,http://localhost/data/Flights/0062/Flt0062ADP.csv,http://localhost/data/Flights/0062/Flt0062IMU.zip
62,63,2019-09-11,El Calafate,SAWC,1235.0,1811.0,5.6,47100.0,50600.0,,tow height record,Jim Payne,Tim Gardner,http://localhost/data/Flights/0063/Flt0063AC.xlsb,http://localhost/data/Flights/0063/Flt0063UV.xlsb,http://localhost/data/Flights/0063/Flt0063.kml,http://localhost/data/Flights/0063/Flt0063ADP.zip,http://localhost/data/Flights/0063/Flt0063IMU.zip
63,64,2019-09-14,El Calafate,SAWC,913.0,1230.0,3.3,45100.0,49200.0,,,Jim Payne,Miguel Iturmendi,http://localhost/data/Flights/0064/Flt0064AC.xlsb,http://localhost/data/Flights/0064/Flt0064UV.xlsb,http://localhost/data/Flights/0064/Flt0064.kml,http://localhost/data/Flights/0064/Flt0064ADP.csv,http://localhost/data/Flights/0064/Flt0064IMU.zip


In [17]:
# rename any columns, as needed.  I often forget the params for df.rename, so
# keeping an example here is handy as a crutch for my age-addled brain.
if False:
    csv.rename(mapper={'pic':'pilot_front','sic':'pilot_rear'}, axis=1, inplace=True)
    csv.columns

In [18]:
for kind in ['AC', 'ADP', 'KML', 'IMU', 'UV']:
    print(f"Non-null entries for {kind}: {csv[f'data_{kind.lower()}'].notnull().sum()}")

Non-null entries for AC: 43
Non-null entries for ADP: 8
Non-null entries for KML: 29
Non-null entries for IMU: 10
Non-null entries for UV: 20
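The same tallies can come from a single call: `DataFrame.count()` returns the number of non-null cells per column, so `csv[['data_ac', 'data_adp', 'data_kml', 'data_imu', 'data_uv']].count()` should report the numbers above as one Series. Shown here on toy data:

```python
import pandas as pd

# Toy stand-in for the data_* columns; None marks a flight with no such file.
df = pd.DataFrame({
    "data_ac": ["Flt0001AC.xlsx", None, "Flt0003AC.xlsb"],
    "data_kml": [None, None, "Flt0003.kml"],
})

# count() tallies non-null cells per column, like the loop above does one column at a time.
counts = df.count()
print(counts)
```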


In [19]:
csv.loc[csv['data_kml'].notnull(), ['data_ac', 'data_adp', 'data_kml', 'data_imu', 'data_uv']]

,data_ac,data_adp,data_kml,data_imu,data_uv
20,http://localhost/data/Flights/0021/Flt0021AC.xlsx,,http://localhost/data/Flights/0021/Flt0021.kml,,
21,http://localhost/data/Flights/0022/Flt0022AC.xlsx,,http://localhost/data/Flights/0022/Flt0022.kml,,
22,http://localhost/data/Flights/0023/Flt0023AC.xlsx,,http://localhost/data/Flights/0023/Flt0023.kml,,
24,http://localhost/data/Flights/0025/Flt0025AC.xlsx,,http://localhost/data/Flights/0025/Flt0025.kml,,
25,http://localhost/data/Flights/0026/Flt0025AC.xlsx,,http://localhost/data/Flights/0026/Flt0026.kml,,
30,http://localhost/data/Flights/0031/Flt0031AC.xlsx,,http://localhost/data/Flights/0031/Flt0031.kml,,
31,http://localhost/data/Flights/0032/Flt0032AC.xlsb,,http://localhost/data/Flights/0032/Flt0032.kml,,
32,http://localhost/data/Flights/0033/Flt0033AC.xlsb,,http://localhost/data/Flights/0033/Flt0033.kml,,
34,http://localhost/data/Flights/0035/Flt0035AC.xlsb,,http://localhost/data/Flights/0035/Flt0035.kml,,
35,http://localhost/data/Flights/0036/Flt0036AC.xlsb,,http://localhost/data/Flights/0036/Flt0036.kml,,
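One row above stands out: flight 26's AC entry points at `Flt0025AC.xlsx` inside directory `0026`. That may be a misnamed file or a deliberate copy; the notebook can't tell which. A sketch of a check that flags any URL whose file-name flight number differs from its directory's, assuming every data file follows the `FltNNNN…` naming seen here. `PATTERN` and `mismatched` are hypothetical names. Against the real data, one could pass it `csv[col].dropna()` for each `data_*` column.

```python
import re

# Captures the directory number and the file-name number from .../Flights/NNNN/FltMMMM...
PATTERN = re.compile(r"/Flights/(\d{4})/Flt(\d{4})[^/]*$")

def mismatched(urls):
    """Yield URLs whose file-name flight number differs from the directory's."""
    for url in urls:
        m = PATTERN.search(url)
        if m and m.group(1) != m.group(2):
            yield url

urls = [
    "http://localhost/data/Flights/0025/Flt0025AC.xlsx",
    "http://localhost/data/Flights/0026/Flt0025AC.xlsx",
    "http://localhost/data/Flights/0026/Flt0026.kml",
]
print(list(mismatched(urls)))
# → ['http://localhost/data/Flights/0026/Flt0025AC.xlsx']
```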


In [20]:
csv.columns

Index(['flight_number', 'flight_date', 'city', 'airport', 'takeoff_time_local',
       'landing_time_local', 'duration', 'release_altitude_feet',
       'maximum_altitude_feet', 'maximum_gps_altitude_feet', 'significance',
       'pilot_front', 'pilot_rear', 'data_ac', 'data_uv', 'data_kml',
       'data_adp', 'data_imu'],
      dtype='object')

In [21]:
csv.dtypes

flight_number                  int64
flight_date                   object
city                          object
airport                       object
takeoff_time_local           float64
landing_time_local           float64
duration                     float64
release_altitude_feet        float64
maximum_altitude_feet        float64
maximum_gps_altitude_feet    float64
significance                  object
pilot_front                   object
pilot_rear                    object
data_ac                       object
data_uv                       object
data_kml                      object
data_adp                      object
data_imu                      object
dtype: object

In [22]:
csv.flight_number  # index is zero-based, thus (flight_number - 1)

0      1
1      2
2      3
3      4
4      5
      ..
60    61
61    62
62    63
63    64
64    65
Name: flight_number, Length: 65, dtype: int64
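Any code that turns a flight number into a row label by subtracting 1 depends on the alignment noted in the comment above. A toy sketch (not the real data) of checking that alignment, of what `.loc` does when it breaks, and of the boolean-mask lookup that avoids the problem. The failure case is "setting with enlargement": assigning to a missing row label appends a new row rather than raising.

```python
import pandas as pd

df = pd.DataFrame({"flight_number": [1, 2, 3], "data_kml": [None, None, None]})

# Check the assumption: row label == flight_number - 1 for every row.
aligned = bool((df["flight_number"].to_numpy() - 1 == df.index.to_numpy()).all())
print(aligned)  # True

# A flight directory with no CSV row yet (say flight 4): .loc with the missing
# label 3 silently appends a row whose flight_number is NaN.
enlarged = df.copy()
enlarged.loc[3, "data_kml"] = "http://localhost/data/Flights/0004/Flt0004.kml"
print(len(enlarged))  # 4

# A boolean mask on flight_number never creates rows: no match, no write.
masked = df.copy()
masked.loc[masked["flight_number"] == 4, "data_kml"] = "http://localhost/data/Flights/0004/Flt0004.kml"
print(len(masked))  # 3
```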

# Write CSV

In [23]:
csv.to_csv(os.path.join(data_root, "flights.csv"), index=False)
print("CSV saved!")

CSV saved!
