### Pipeline ID : dataclean

### Input Description

RAW OHLC data.

### Output  

Clean OHLC data in an HDF5 store.

### Operations

This notebook loads raw financial market data files and runs each one through a processing pipeline. For every time series in every configured datasource, the following operations are carried out:

- Localise the time data to market time
- Merge with existing RAW data based on datetime
- Save the resulting RAW data to HDF5
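
The first two steps can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not the `quantutils` implementation: the `localize` and `merge` functions below are hypothetical stand-ins for `ppl.localize` and `ppl.merge`, the prices are made up, and the rule that newer rows replace stored rows with the same timestamp is an assumption. The HDF5 save step is left out so that the sketch needs nothing beyond pandas.

```python
import pandas as pd


def localize(df, source_tz, target_tz):
    """Treat the naive index as source_tz wall-clock time, then convert to target_tz."""
    out = df.copy()
    out.index = out.index.tz_localize(source_tz).tz_convert(target_tz)
    return out


def merge(new, existing):
    """Combine two frames on their datetime index; rows from `new` win on clashes (assumed rule)."""
    if existing.empty:
        return new.sort_index()
    combined = pd.concat([existing, new])
    combined = combined[~combined.index.duplicated(keep="last")]
    return combined.sort_index()


# Two hourly bars stamped in London time (made-up prices, for illustration only)
raw = pd.DataFrame(
    {"Open": [100.0, 101.0], "High": [102.0, 103.0],
     "Low": [99.0, 100.5], "Close": [101.0, 102.5]},
    index=pd.to_datetime(["2017-05-05 14:00", "2017-05-05 15:00"]),
)

# "America/New_York" is the canonical name for the US/Eastern zone
local = localize(raw, "Europe/London", "America/New_York")

# First load: nothing stored yet, so the result is just the new data
stored = merge(local, pd.DataFrame())

# Second load: an overlapping bar replaces the stored one
revised = local.iloc[[1]].copy()
revised["Close"] = 999.0
stored = merge(revised, stored)
print(stored)
```

Because the merge is keyed on the datetime index, loading the same source file twice should leave the stored series unchanged, which is what makes it safe to loop over every file in the `raw/` directory on each run. Timestamps that fall in a daylight-saving changeover are not handled in this sketch.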

In [4]:
#!pip install --upgrade ../../quantutils
#git+https://github.com/cwilko/quantutils.git
    
import os
import json
import pandas
import numpy
    
import quantutils.dataset.pipeline as ppl
from quantutils.api.bluemix import ObjectStore
from quantutils.api.marketinsights import MarketInsights

PIPELINE_ID = "marketdirection"

    
##############
## Pipeline ##
##############

CONFIG_FILE = "../datasets/rawConvert.json"

with open(CONFIG_FILE) as data_file:    
    config = json.load(data_file)

DS = config["datasources"]

objStore = ObjectStore('cred/object_storage_cred.json')
mi = MarketInsights('cred/MIOapi_cred.json')

markets = dict()
## Loop over datasources...

for datasource in DS:
    
    DS_path = config["dataPath"] + datasource["name"] + "/"
    SRC_path = DS_path + "raw/"
        
    # Get HDFStore
    hdfFile = DS_path + datasource["name"] + ".hdf"
    print(hdfFile)
    hdfStore = pandas.HDFStore(hdfFile)
    
    for timeseries in datasource["timeseries"]:
        
        # Load Dataframe from store
        if timeseries["name"] in hdfStore:
            tsData = hdfStore[timeseries["name"]]
        else:
            tsData = pandas.DataFrame()
                        
        ## Loop over any source files...
        for infile in os.listdir(SRC_path):          

            newData = ppl.loadRawData(datasource, timeseries, SRC_path, infile)
            if newData is not None:

                ### RAW PIPELINE #############################################

                newData = ppl.localize(newData, datasource["timezone"], timeseries["timezone"])
                
                tsData = ppl.merge(newData, tsData)                
                
                ##############################################################  
        
        ppl.save_hdf(tsData, timeseries["name"], hdfStore)

    # Close this datasource's store before opening the next one
    hdfStore.close()


/home/cwilkin/Development/repos/marketinsights-pipeline/datasets/tradefair/tradefair.hdf
Adding WallSt-hourly-050517.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-071116.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-200318.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-120217.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-021116.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-160517.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-301016.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-230617.txt to DOW table
Converting from Europe/London to US/Eastern
Merging data...
Adding WallSt-hourly-091116.txt to DOW table
Converting from Europe/Lon


object name is not a valid Python identifier: u'WallSt-hourly'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though



Converting from US/Eastern to US/Eastern
Merging data...
Adding D&J-IND_130101_141231.csv to DOW table
Converting from US/Eastern to US/Eastern
Merging data...
Adding D&J-IND_161003_180319.csv to DOW table
Converting from US/Eastern to US/Eastern
Merging data...
Saved data to HDFStore: /D&J-IND
Adding SANDP-500_130101_141231.csv to SPY table



object name is not a valid Python identifier: u'D&J-IND'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though



Converting from US/Eastern to US/Eastern
Merging data...
Adding SANDP-500_150101_170519.csv to SPY table
Converting from US/Eastern to US/Eastern
Merging data...
Adding SANDP-500_161003_180319.csv to SPY table
Converting from US/Eastern to US/Eastern
Merging data...
Saved data to HDFStore: /SANDP-500



object name is not a valid Python identifier: u'SANDP-500'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
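
The warnings above are raised because the time series names contain characters such as `-` and `&`. The data is still written, as the "Saved data to HDFStore" lines show; the warning only says that attribute-style ("natural naming") access will not work for those keys. The check can be reproduced with the pattern quoted in the warning text. In the sketch below, `is_natural_name` and `to_natural_name` are hypothetical helpers, not part of pandas, PyTables, or `quantutils`.

```python
import re

# Pattern copied from the warning message
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def is_natural_name(name):
    """Return True if the name would not trigger the warning."""
    return NATURAL_NAME.match(name) is not None


def to_natural_name(name):
    """Hypothetical helper: replace disallowed characters with underscores."""
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # A valid identifier cannot be empty or start with a digit
    if not re.match(r"[a-zA-Z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


for key in ["WallSt-hourly", "D&J-IND", "SANDP-500"]:
    print(key, is_natural_name(key), to_natural_name(key))
```

Renaming the keys this way would change the names under which existing data is stored, so it is only an option if every reader of the HDF5 file is updated to match. Leaving the names as they are and accepting the warning is equally valid.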

