
Beefing up command line dynamic data handling functionality #11

Merged
merged 36 commits into UNSW-CEEM:master from pocket-rocket-nemosis on Feb 28, 2021

Conversation

prakaa
Contributor

@prakaa prakaa commented Jun 10, 2020

This PR primarily improves programmatic functionality for handling dynamic (time-series) data.

Changes include:

  1. Modification of the dynamic_data_compiler function to include:
  • Optional user input of the file format used to store data. While feather may have faster read/write speeds, parquet has excellent compression characteristics and good compatibility with packages for handling large in-memory/cluster datasets (e.g. Dask). This helps with local storage (especially for Causer Pays data) and with file size for version control.
  • Option to retain or delete downloaded CSVs in the cache.
  • Option to not merge downloaded data. This is useful when NEMOSIS is used to download large volumes of data for further processing.

See docstring below:

def dynamic_data_compiler(start_time, end_time, table_name, raw_data_location,
                          select_columns=None, filter_cols=None,
                          filter_values=None, fformat='feather',
                          keep_csv=True, data_merge=True, **kwargs):
    """Downloads and compiles data for all dynamic tables.
    Refer to the README for a list of tables.

    Args:
        start_time (str): format 'yyyy/mm/dd HH:MM:SS'
        end_time (str): format 'yyyy/mm/dd HH:MM:SS'
        table_name (str): table as per documentation
        raw_data_location (str): directory to download and cache data to.
                                 Existing data in this directory will be used.
        select_columns (list): columns to return
        filter_cols (list): columns to filter on
        filter_values (list): the nth list of values filters the nth column
                              in filter_cols, keeping rows with equal values
        fformat (str): 'feather' or 'parquet', for storage and access
        keep_csv (bool): retain downloaded CSVs in the cache
        data_merge (bool): concatenate DataFrames and return the result
        **kwargs: additional arguments passed to the pd.to_{fformat}() function
    Returns:
        all_data (pd.DataFrame): all data concatenated.
    """
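The filter_cols/filter_values pairing above can be sketched in plain pandas (hypothetical data and a hypothetical helper name; this just reimplements the per-column equality filtering the docstring describes):

```python
import pandas as pd

def apply_filters(df, filter_cols, filter_values):
    # The nth list in filter_values holds the accepted values for
    # the nth column in filter_cols (equality filtering per column).
    for col, values in zip(filter_cols, filter_values):
        df = df[df[col].isin(values)]
    return df

df = pd.DataFrame({
    "DUID": ["GEN1", "GEN2", "GEN1"],
    "REGIONID": ["NSW1", "VIC1", "NSW1"],
})
filtered = apply_filters(df, ["DUID", "REGIONID"], [["GEN1"], ["NSW1"]])
print(len(filtered))  # 2
```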
  2. Added FCAS Providers as a static table. This reads another tab of the Generators and Exemptions xlsx from AEMO.

  3. Generalisation and exception handling in downloader functions, with more descriptive errors. The most important error handling is for Causer Pays data: each .zip covers a 30-minute interval, but when unzipped, each .csv file covers 5 minutes.
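The 30-minute zip / 5-minute CSV structure described above can be sketched with the standard library (file names and error messages here are hypothetical, not NEMOSIS's actual internals):

```python
import io
import zipfile

def extract_causer_pays(zip_bytes, interval_id):
    # Each 30-minute Causer Pays zip should contain six 5-minute CSVs.
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            names = zf.namelist()
            if len(names) != 6:
                raise ValueError(
                    f"Expected six 5-minute CSVs in 30-minute zip "
                    f"for {interval_id}, found {len(names)}"
                )
            return {name: zf.read(name) for name in names}
    except zipfile.BadZipFile as e:
        raise ValueError(f"Corrupt download for interval {interval_id}") from e

# Build a dummy 30-minute archive in memory with six 5-minute files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for minute in range(0, 30, 5):
        zf.writestr(f"FCAS_2020061000{minute:02d}.csv", "ELEMENT,VALUE\n1,0.5\n")

csvs = extract_causer_pays(buf.getvalue(), "2020/06/10 00:00")
print(len(csvs))  # 6
```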

  4. Omission of rows of data during filtering if the date format in the raw data is incorrect (some raw files contain null or otherwise invalid rows whose timestamps are formatted incorrectly).
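A minimal sketch of this date-based row omission, assuming a SETTLEMENTDATE-style column (the column name and format string are illustrative); pandas' errors='coerce' turns unparseable timestamps into NaT so the offending rows can be dropped:

```python
import pandas as pd

raw = pd.DataFrame({
    "SETTLEMENTDATE": ["2020/06/10 00:05:00", "not-a-date", "2020/06/10 00:10:00"],
    "VALUE": [1.0, None, 2.0],
})

# Unparseable timestamps become NaT instead of raising, then get dropped.
raw["SETTLEMENTDATE"] = pd.to_datetime(
    raw["SETTLEMENTDATE"], format="%Y/%m/%d %H:%M:%S", errors="coerce"
)
clean = raw.dropna(subset=["SETTLEMENTDATE"])
print(len(clean))  # the malformed row is omitted
```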

  5. Written file names are generalised to the chosen file format, not just feather.

  6. Tests for static tables now include a start and end date, as required by the function.

  7. setup.py now includes xlrd, which is required by pandas.read_excel.

  8. Some files have had lines wrapped to comply with the PEP 8 line-length limit.


TODO:

  • README needs to be updated to highlight the new command-line functionality, with examples

@prakaa prakaa changed the title Beefing up command line Causer Pays functionality Beefing up command line dynamic data handling functionality Jun 10, 2020
@nick-gorman nick-gorman merged commit 47acd8d into UNSW-CEEM:master Feb 28, 2021
@prakaa prakaa deleted the pocket-rocket-nemosis branch March 4, 2021 02:12