
Gb/misc features #72
Merged: 10 commits, Nov 27, 2019

Conversation

grantbuster (Member) commented Nov 26, 2019

Addition of four features:

  1. Generation can now pass through the main resource data arrays (dni, dhi, windspeed) (issue #71: Generation to pass through resource data).
  2. The collection method that purges the chunked node files now checks that all source datasets are present in the final output file before deleting the chunked files. If any datasets are missing (not collected), a warning is logged and the chunks are not deleted (issue #70: Collection purge chunks to protect against incomplete collection).
  3. Representative profiles can now take a profile weighting argument, input as the "gid_counts", so that rep profiles are weighted by their exclusion fraction (issue #66: Representative Profiles should consider weighting); see the sketch below.
  4. A low-memory collection method was added to the collection handler. It collects one dataset from one file chunk at a time and writes directly from the source file to the final output file. Travis tested this and it works.

Pytests were added for all of these except item 2, which I tested manually on Eagle.
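
To make item 3 concrete, here is a minimal sketch of how gid_counts weighting could enter representative-profile selection (illustrative only; the function name and signature below are hypothetical, not reV's actual API): each site's profile contributes to the mean in proportion to its included-pixel count, and the site closest to that weighted mean is returned as the rep profile.

```python
# Hypothetical sketch of exclusion-weighted rep-profile selection; not reV's
# actual implementation. profiles has shape (time, n_sites) and gid_counts
# holds each site's included-pixel count from the exclusion layer.
import numpy as np


def pick_rep_profile(profiles, gid_counts):
    """Return the site profile closest (RMSE) to the weighted mean profile."""
    weights = np.asarray(gid_counts, dtype=float)
    weights /= weights.sum()

    # Weighted mean profile across sites.
    mean_profile = (profiles * weights).sum(axis=1)

    # RMSE of each site's profile against the weighted mean.
    errors = np.sqrt(((profiles - mean_profile[:, None]) ** 2).mean(axis=0))
    i_rep = int(np.argmin(errors))

    return profiles[:, i_rep], i_rep
```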

MRossol (Collaborator) left a comment:

  1. Create a log_memory function in loggers.py to replace the memory logging scattered throughout reV:
    log_memory(logger, level='DEBUG')
  2. Update DatasetCollector to no longer use SmartParallelJob and instead transfer file by file, dataset by dataset, with logic that checks whether the entire file/dataset fits in memory; if not, transfer as many chunks as will fit in memory at a time until the file/dataset has been transferred. (See the sketch after this list.)
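
A minimal sketch of these two requests, assuming psutil for system memory inspection; the names, signatures, and defaults here are hypothetical and illustrative rather than reV's actual implementation.

```python
# Hypothetical sketches; the actual reV helpers may differ.
import logging

import psutil


def log_memory(logger, level='DEBUG'):
    """Log current system memory utilization through the given logger."""
    mem = psutil.virtual_memory()
    msg = ('Memory utilization is {:.2f} GB out of {:.2f} GB total '
           '({:.1f}% used)'.format(mem.used / 1e9, mem.total / 1e9,
                                   mem.percent))
    logger.log(getattr(logging, level.upper()), msg)


def sites_per_pass(site_mem_req, mem_util_lim=0.7):
    """Number of sites that fit in memory for one chunked transfer pass."""
    avail = psutil.virtual_memory().available * mem_util_lim
    return max(1, int(avail // site_mem_req))
```

DatasetCollector could then loop over site slices of at most sites_per_pass(...) sites whenever a full dataset does not fit in memory.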

grantbuster (Member, Author) commented:

@MRossol, added those two features. Cleaned up collection a lot. Added a low-memory test, although I realize now that it will be somewhat hardware dependent. Still, it works on my machine!

MRossol (Collaborator) left a comment:

The architecture looks great, much cleaner.
A few questions and comments on the method names:

"""
Add results from SmartParallelJob to out
@staticmethod
def _get_site_mem_req(shape, dtype, n=100):
MRossol (Collaborator):

I'm confused: is this the memory for one site, one chunk, or the whole dataset?

grantbuster (Member, Author):

Just for one site. I realize I forgot to update some of the docstrings. Fixed.
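
For context, a rough sketch of how a per-site memory requirement could be estimated, assuming a 2D (time, sites) dataset; this is illustrative only and the actual _get_site_mem_req body may differ: allocate n sample sites with the dataset's time length and dtype, then average the footprint over the sample.

```python
# Illustrative estimate only; the real _get_site_mem_req may differ.
import sys

import numpy as np


def _get_site_mem_req(shape, dtype, n=100):
    """Estimate the memory (bytes) required to collect a single site."""
    # Allocate n sample sites of the dataset's time length and dtype,
    # then average the in-memory footprint over the sample.
    sample = np.ones((shape[0], n), dtype=dtype)
    return sys.getsizeof(sample) / n
```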

.format(os.path.basename(fp_source), e))
raise e

def _low_mem_collect(self):
MRossol (Collaborator):

If this is the standard collection method, I think we should rename it to just _collect.

grantbuster (Member, Author):

Agreed!

MRossol (Collaborator) commented Nov 27, 2019

Rebase and merge away
