# Data Conversion Demo
#### This notebook demonstrates the new functionality provided by the file conversion module `clean_air.util.file_converter` (`file_converter.py`).

## Setup

In [None]:
import os
import openpyxl

import cap_sample_data
from clean_air.util import file_converter as fc

#### You will also need to set up cap-sample-data, as this is currently not in an importable format.  

In [None]:
SAMPLEDIR = cap_sample_data.path

## Functions
#### This file has several methods defined within it, but the following three are intended to be accessed from the front end and as such will be demonstrated here.

## 1. convert_excel(filepath, output_location)
#### This function ingests Excel metadata files (drawn from the forms provided by Elle) and converts the necessary data into the required output format. In the examples here, the output format is passed as the argument to `convert_excel`:

In [None]:
# 1) Setting up object with input and output paths
input_data = os.path.join(SAMPLEDIR, "test_data", "metadata_form_responses.xlsx")
save_location = os.path.join("assets", "tmp_output_files")
conversion_file = fc.MetadataForm(input_data, save_location)

# 2.a) Converting to JSON (the output format is passed as the argument)
conversion_file.convert_excel('json')

# 2.b) Converting to YAML (again, passing the output format as the argument)
conversion_file.convert_excel('yaml')

#### You can view the output files by navigating through the notebook home page (one level up) into `assets` and then into `tmp_output_files`.  Notice the different output formats in the two files.  I think they are rather lovely.
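
#### If you would rather check the JSON output without leaving the notebook, the sketch below lists each `.json` file in the output directory and summarises its top-level structure. It uses only the standard library. The helper name `summarise_json_outputs` is hypothetical (it is not part of `clean_air`), and the names and layout of the files written by `convert_excel` are not assumed here.

```python
import json
from pathlib import Path


def summarise_json_outputs(directory):
    """Map each .json file in `directory` to a summary of its top-level structure.

    Hypothetical helper for illustration; not part of clean_air.
    """
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            content = json.load(handle)
        if isinstance(content, dict):
            # JSON object: report its keys in alphabetical order
            summary[path.name] = sorted(content)
        elif isinstance(content, list):
            summary[path.name] = f"list of {len(content)} items"
        else:
            summary[path.name] = type(content).__name__
    return summary


# Same output location as used in the conversion cell above
output_dir = Path("assets") / "tmp_output_files"
for name, description in summarise_json_outputs(output_dir).items():
    print(name, "->", description)
```

#### If the directory is empty or missing, the loop simply prints nothing.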

## 2. convert_netcdf(filepath, output_location)
#### This is designed to convert aircraft data in netCDF format into CSV files. There is no variation in filetype here: it only accepts netCDF as input and only produces CSV as output.

In [None]:
# 1) Set up your datafile object
input_data = os.path.join(SAMPLEDIR, "aircraft", "MOCCA_M251_20190903.nc")
save_csv = os.path.join("assets", "tmp_output_files", "MOCCA.csv")
conversion_file = fc.DataFile(input_data, save_csv)

# 2) Call the converter
conversion_file.convert_netcdf()

#### Once again, you can view the output files in `assets`/`tmp_output_files`.  The output format is difficult for a human to read, but easy for a machine.
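
#### Because the raw CSV is hard to read by eye, a quick preview of the first few rows can help confirm that the conversion worked. The sketch below uses only the standard library. The helper name `preview_csv` is hypothetical (not part of `clean_air`), and it assumes that the first row of the file is a header row, which is not confirmed here.

```python
import csv
import os
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return (header, rows): the first line and up to n_rows following lines.

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(islice(reader, n_rows))
    return header, rows


# Same output path as used in the conversion cell above
csv_path = os.path.join("assets", "tmp_output_files", "MOCCA.csv")
if os.path.exists(csv_path):
    header, rows = preview_csv(csv_path)
    print(header)
    for row in rows:
        print(row)
```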

## 3. generate_dataframe(filepath)
#### This is provided in case we ever need a simple dataframe rather than a saved file. It works for both Excel and netCDF input files, and converts them directly to a pandas DataFrame without removing or rearranging any data.

In [None]:
# 3.a) Getting a dataframe from an Excel file
input_data = os.path.join(SAMPLEDIR, "test_data", "metadata_form_responses.xlsx")
excel_df = fc.generate_dataframe(filepath=input_data)

excel_df

In [None]:
# 3.b) Getting a dataframe from a netCDF file
input_data = os.path.join(SAMPLEDIR, "aircraft", "MOCCA_M251_20190903.nc")
netcdf_df = fc.generate_dataframe(filepath=input_data)

netcdf_df
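
#### The two calls above use the same function for different input types. This notebook does not show how `generate_dataframe` tells the two apart, so the sketch below is only one plausible approach: choosing a reader from the file extension. The mapping `READERS_BY_SUFFIX` and the helper `guess_input_kind` are hypothetical and are not taken from `clean_air`.

```python
from pathlib import Path

# Hypothetical mapping for illustration only; not taken from clean_air.
READERS_BY_SUFFIX = {
    ".xlsx": "excel",
    ".xls": "excel",
    ".nc": "netcdf",
}


def guess_input_kind(filepath):
    """Return 'excel' or 'netcdf' based on the (case-insensitive) file extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        return READERS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported input file type: {suffix or '(none)'}"
        ) from None


print(guess_input_kind("metadata_form_responses.xlsx"))
print(guess_input_kind("MOCCA_M251_20190903.nc"))
```

#### Failing early with a `ValueError` on an unknown extension gives a clearer message than letting a reader fail partway through an unsupported file.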