# Moussala data parser

We may reuse parts of https://github.com/ODZ-UJF-AV-CR/AIRDOS_calibration/blob/master/airdos_parser/AIRDOS4_256_flux.ipynb

Steps to get Liulin data:
- get Liulin data from Todor
- concatenate them and convert the timestamps in the result (see transform-date.sh)
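
transform-date.sh is not part of this notebook, so the following is only a rough Python sketch of the two Liulin steps. It assumes (hypothetically) that every Liulin file is comma-separated, starts with the same header row, and has a first column holding a date-time string in a format known in advance. `concat_liulin` and `date_format` are illustrative names, not existing tools. The sketch names the first column `timestamp`, which is the column the parser below indexes on.

```python
from pathlib import Path

import pandas as pd


def concat_liulin(paths, out_path, date_format):
    """Concatenate Liulin CSV files and rewrite the first column as timestamps.

    Hypothetical helper: assumes every file has the same header row and that
    the first column can be parsed with ``date_format``.
    """
    frames = [pd.read_csv(p, sep=',', header=0) for p in sorted(paths)]
    df = pd.concat(frames, ignore_index=True)
    # Parse the first column and call it 'timestamp', as the parser expects
    first = df.columns[0]
    df[first] = pd.to_datetime(df[first], format=date_format)
    df = df.rename(columns={first: 'timestamp'})
    # The input files are not guaranteed to be in chronological order
    df = df.sort_values('timestamp')
    # Datetimes are written as 'YYYY-MM-DD HH:MM:SS', readable with parse_dates
    df.to_csv(out_path, index=False)
    return len(df)
```

For example, `concat_liulin(Path('liulin').glob('*.csv'), 'liulin-all.dat', '%d.%m.%Y %H:%M:%S')`, where the directory and the date format are placeholders for the real ones.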

Steps to get Airdos data:
- use Martin's downloader script

Alternatively, to get Airdos data:
- retrieve the data from space.astro.cz using get_data.sh (note: this amounts to gigabytes of data)
- collect the files with "find space.astro.cz -name '*A2_meta.csv' -exec mv '{}' A2/ \;" (the target directory A2/ must already exist) and concatenate them with cat as needed.
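
If the shell route is inconvenient, the collect-and-concatenate step can be sketched in Python. `concat_meta_files` is a hypothetical helper: unlike the `find ... -exec mv` command it leaves the files in place, reads them in sorted path order, and appends a newline where a file does not end with one, so that records from consecutive files are not glued onto one line.

```python
import shutil
from pathlib import Path


def concat_meta_files(root, pattern, out_path):
    """Concatenate every file under ``root`` matching ``pattern`` into ``out_path``.

    Hypothetical helper mirroring the find/cat step; files are copied, not moved.
    """
    root = Path(root)
    out_path = Path(out_path)
    # Collect matches first so the output file itself is never picked up
    files = sorted(p for p in root.rglob(pattern)
                   if p.is_file() and p.resolve() != out_path.resolve())
    with open(out_path, 'wb') as out:
        for p in files:
            with open(p, 'rb') as src:
                shutil.copyfileobj(src, out)
                # Terminate the last record if the file lacks a trailing newline
                if src.tell() > 0:
                    src.seek(-1, 2)
                    if src.read(1) != b'\n':
                        out.write(b'\n')
    return len(files)
```

For example, `concat_meta_files('space.astro.cz', '*A2_meta.csv', 'A2-all.csv')` writes one combined file and returns the number of files it read; the output name is a placeholder.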

Note:
Integration time for one record should be 10.24 s (pers. comm. with MK on 180702).
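
With this integration time, `compute_flux` below converts the summed counts of each record into a flux. Writing $n_c$ for the counts in column $c$ of a record, $N$ for `NOISE_LEVEL`, $L$ for `LAST_CHANNEL`, $S$ for the detector surface and $\Delta t$ for the integration time (the symbols are only notation for the values in the code):

$$
\Phi = \frac{1}{S\,\Delta t}\sum_{c=N}^{L-1} n_c, \qquad S = 2.0\ \mathrm{cm^2}, \quad \Delta t = 10.24\ \mathrm{s}
$$

For example, a record with 1024 counts in the summed channels gives $\Phi = 1024 / (2.0 \cdot 10.24) = 50$ counts cm⁻² s⁻¹.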

This notebook then converts the data into an HDF5 file.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tables
import datetime
from IPython.display import display, Math, Latex
import sys
import os
import gc
import plotly
import plotly.graph_objs as go
plotly.offline.init_notebook_mode(connected=True)

# Parse Liulin and AIRDOS data and save them to HDF5

In [2]:
def compute_flux(fstore, hdfkey, fna, NOISE_LEVEL):
    """Compute the particle flux from an AIRDOS CSV file in chunks and append it to an HDF store.

    fstore:      path to the HDF5 file
    hdfkey:      key (table name) under which the records are stored
    fna:         path to the concatenated AIRDOS CSV file
    NOISE_LEVEL: first channel (column) counted into the flux
    """
    print('Computing flux for ' + fna)
    fna_chunksize = 100000
    # Column 0 is a Unix timestamp in seconds; it is converted with pd.to_datetime below.
    # on_bad_lines='warn' (pandas >= 1.3) replaces the removed error_bad_lines=False
    # and still reports every skipped malformed line.
    dfa_chunks = pd.read_csv(fna, sep=',', header=None, comment='*',
                             on_bad_lines='warn', chunksize=fna_chunksize)

    DATA_LEVEL = 256    # first channel stored in the HDF file
    LAST_CHANNEL = 514  # one past the last channel used
    surface = 2.0       # detector surface, cm2
    dtime = 10.24       # integration time of one record, s

    lines_parsed = 0
    with pd.HDFStore(fstore, mode='a') as store:
        for chunk in dfa_chunks:
            # Copy the stored channels so that adding columns never writes into `chunk`
            dfa_chunk = chunk.iloc[:, DATA_LEVEL:LAST_CHANNEL].copy()
            # Particle flux, assuming the data lie between NOISE_LEVEL and LAST_CHANNEL
            dfa_chunk['flux'] = chunk.iloc[:, NOISE_LEVEL:LAST_CHANNEL].sum(axis=1).astype('float64')
            dfa_chunk['flux'] = dfa_chunk['flux'] / (surface * dtime)
            dfa_chunk['timestamp'] = pd.to_datetime(chunk.iloc[:, 0], unit='s', errors='coerce')
            # Drop all records with any NaN/NaT value
            dfa_chunk.dropna(inplace=True)
            dfa_chunk.set_index('timestamp', drop=False, inplace=True)
            store.append(hdfkey, dfa_chunk, format='table', append=True,
                         complevel=9, complib='blosc')
            lines_parsed += len(dfa_chunk.index)
            del dfa_chunk
            # One dot per processed chunk
            sys.stdout.write('.')
            sys.stdout.flush()
            gc.collect()

    print(' ')
    print(repr(lines_parsed) + ' records parsed.')
    print('')
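
To check the arithmetic of `compute_flux` without touching the HDF store, the per-chunk step can be isolated in a small function. `chunk_flux` is a hypothetical helper written for illustration; like `compute_flux`, it assumes that column 0 holds a Unix timestamp in seconds and that the counts sit in columns `noise_level` to `last_channel - 1`. It keeps only the timestamp and the flux, not the stored spectrum columns.

```python
import pandas as pd


def chunk_flux(chunk, noise_level, last_channel=514, surface=2.0, dtime=10.24):
    """Per-record flux for one chunk of raw AIRDOS rows (no HDF I/O).

    Hypothetical helper isolating the arithmetic of compute_flux.
    surface is in cm2, dtime in s; the result is in counts / (cm2 s).
    """
    counts = chunk.iloc[:, noise_level:last_channel].sum(axis=1).astype('float64')
    out = pd.DataFrame({
        'timestamp': pd.to_datetime(chunk.iloc[:, 0], unit='s', errors='coerce'),
        'flux': counts / (surface * dtime),
    })
    # Rows whose timestamp could not be converted are NaT and get dropped
    out = out.dropna()
    return out.set_index('timestamp', drop=False)
```

For example, a record whose channels sum to 1024 counts yields 1024 / (2.0 · 10.24) = 50.0 counts cm⁻² s⁻¹, and a record with a missing timestamp is dropped.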



In [3]:
# Data file to store the data
hdf = 'moussala.h5'
# Turn SettingWithCopy warnings into errors
pd.set_option('mode.chained_assignment', 'raise')

print('Parsing data for Liulin')
df = pd.read_csv('liulin-all.dat', sep=',', header=0, parse_dates=[0])
df.set_index('timestamp', drop=False, inplace=True)
# mode='w' truncates an existing file, so the Liulin data are written first
with pd.HDFStore(hdf, mode='w') as store:
    store.put('liulin', df, format='table', complevel=9, complib='blosc')
del df
gc.collect()

compute_flux(hdf, 'A1', 'A1-2017-2018-04.csv', 267)
compute_flux(hdf, 'A2', 'A2-2017-2018-04.csv', 267)

print('All data saved to ' + repr(hdf))

Parsing data for Liulin
Computing flux for A1-2017-2018-04.csv
Skipping line 12375: expected 521 fields, saw 916
Skipping line 14838: expected 521 fields, saw 1040
......... 
871117 records parsed.

Computing flux for A2-2017-2018-04.csv
Skipping line 12360: expected 521 fields, saw 917
Skipping line 17177: expected 521 fields, saw 562
.......
Skipping line 781443: expected 521 fields, saw 884
.. 
870366 records parsed.

All data saved to 'moussala.h5'
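
The "Skipping line" messages show that pandas discarded rows whose field count differed from the expected 521. A quick stdlib pre-check can list such lines before parsing. `find_malformed_lines` is a hypothetical helper that assumes plain comma-separated rows without quoted separators. Unlike pandas, which treats `*` as a comment marker anywhere in a line, it only skips lines that start with `*`, so its line numbers need not match the pandas messages exactly.

```python
def find_malformed_lines(path, expected_fields=521, sep=',', comment='*'):
    """Return (line_number, field_count) for lines with an unexpected field count.

    Rough pre-check: assumes no quoted separators; blank lines and lines
    starting with the comment character are ignored. Line numbers are the
    1-based physical line numbers of the file.
    """
    bad = []
    with open(path, 'r', errors='replace') as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip('\r\n')
            if not line or line.startswith(comment):
                continue
            n_fields = line.count(sep) + 1
            if n_fields != expected_fields:
                bad.append((lineno, n_fields))
    return bad
```

For example, `find_malformed_lines('A2-2017-2018-04.csv')` returns a list of `(line_number, field_count)` pairs that can be inspected before running `compute_flux`.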
