# Parallel writes of fewer files to CSVs

Because writing to the database takes a very long time, we instead use `dask` to read and parse several files in parallel. Each input file is saved to its own CSV, so parallel workers never write to the same output file.

## Local install
Install the `mosparse` library locally in editable mode.

In [None]:
%pip install -ve ../mosparse/

## Import

In [2]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


The libraries used in this example:

In [3]:
from pathlib import Path

from tqdm.auto import tqdm

import dask.bag as db
from dask.diagnostics import ProgressBar

import mosparse.mavreader as mpr
import mosparse.mavparse as mpp

## Find the 2018 and 2019 files
Collect the paths of the files to process: every file in `../../avnmav/` whose name starts with `mav2018` or `mav2019`.

In [4]:
mos_files = list(Path("../../avnmav/").glob("mav2018*")) + list(Path("../../avnmav/").glob("mav2019*"))

In [5]:
len(mos_files)

96
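The same selection can be written as a loop over year prefixes, which makes it easier to add or drop a year. This is a minimal sketch: `find_mav_files` is a hypothetical helper, and the file names in the demo are made up to match the `mav<year>*` pattern, not taken from the real data directory.

```python
from pathlib import Path
import tempfile


def find_mav_files(data_dir, years):
    """Collect files named mav<year>* for each requested year, sorted by name."""
    data_dir = Path(data_dir)
    files = []
    for year in years:
        files.extend(data_dir.glob(f"mav{year}*"))
    return sorted(files)


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["mav201712.gz", "mav201801.gz", "mav201902.gz"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in find_mav_files(tmp, [2018, 2019])]

print(found)
```

Sorting the result gives a stable processing order, which `glob` alone does not guarantee.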

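## Convert the .gz file to a CSV
Reads the file at `filepath`, parses it station by station, and writes every non-empty station to a CSV named after the input file. Once the file is done, its name is appended to `completed_files.txt`.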

In [8]:
def process_file(filepath):
    """Parse one file and write its stations to '<filepath.name>.csv'."""
    columns = ['station', 'runtime', 'ftime', 'N/X', 'X/N', 'Q06', 'Q12']
    with mpr.MavReader(filepath, stations=True) as station_generator:
        for station in station_generator:
            # Skip empty station blocks.
            if len(station) > 0:
                mpp.write_station(station, filename=f'{filepath.name}.csv', columns=columns)
    # Record the file as done only after every station has been written.
    with open("completed_files.txt", 'a') as f:
        print(filepath.name, file=f)
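Because `process_file` logs each finished file name to `completed_files.txt`, an interrupted run could in principle be resumed by skipping the names already in that log. The sketch below shows one way to do that. `remaining_files` is a hypothetical helper, not part of `mosparse`, and the file names are made up for the demo.

```python
from pathlib import Path
import tempfile


def remaining_files(all_files, log_path):
    """Return the files whose names are not yet listed in the completion log."""
    log_path = Path(log_path)
    if not log_path.exists():
        # No log yet, so nothing has been completed.
        return list(all_files)
    done = {line.strip() for line in log_path.read_text().splitlines() if line.strip()}
    return [f for f in all_files if Path(f).name not in done]


# Demo with a temporary log that marks one of two files as finished.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "completed_files.txt"
    log.write_text("mav201801.gz\n")
    candidates = [Path("mav201801.gz"), Path("mav201802.gz")]
    todo = remaining_files(candidates, log)

print([p.name for p in todo])
```

A file that was interrupted partway through is not in the log, so it would be processed again. Whether its partial CSV needs deleting first depends on how `mpp.write_station` opens the output file, which is not shown here.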

## Concurrently process files
Apply `process_file` to the files in `mos_files` in parallel using a `dask` bag.

In [None]:
pb = db.from_sequence(mos_files).map(process_file)
with ProgressBar():
    pb.compute()

[#####                                   ] | 14% Completed |  1hr 46min 58.2s
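Since the full run takes hours, it may be worth checking the pipeline on a handful of items first. The sketch below uses the same bag pattern with dask's single-threaded `synchronous` scheduler, which runs tasks one at a time in the current process so errors surface with an ordinary traceback. `fake_process` is a hypothetical stand-in for `process_file`, and the file names are made up.

```python
import dask.bag as db


def fake_process(name):
    # Hypothetical stand-in for process_file: no I/O, just returns the output name.
    return f"{name}.csv"


sample = ["mav201801.gz", "mav201802.gz", "mav201803.gz"]
bag = db.from_sequence(sample).map(fake_process)

# scheduler="synchronous" runs the tasks sequentially in this process.
results = bag.compute(scheduler="synchronous")
print(results)
```

Dropping the `scheduler` argument returns to dask's default scheduler for bags, as in the cell above.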