
Reeds #18

Merged
merged 95 commits into from
Jan 9, 2020


MRossol
Collaborator

@MRossol MRossol commented Nov 29, 2019

reV to ReEDS pipeline w/ CLI

@MRossol MRossol added the feature New feature or request label Nov 29, 2019
@MRossol
Collaborator Author

MRossol commented Nov 29, 2019

Main functionality implemented along with a CLI
TODO: Create tests, will do so before chat on Thursday

@MRossol
Collaborator Author

MRossol commented Nov 30, 2019

Test complete.

@mmowers see /reVX/tests/data/reeds/ReEDS_* for the current outputs

@mmowers
Collaborator

mmowers commented Dec 2, 2019

@MRossol, awesome, thanks! I was on the master branch instead of reeds, so those files weren't there.

I see the supply curve with region/class/bin designations in ReEDS_classifications.csv, capacity factor means and standard deviations by timeslice in ReEDS_Timeslice_means.csv and ReEDS_Timeslice_stdevs.csv, and representative profiles in ReEDS_Profiles.h5.

Some thoughts/questions (we can discuss on the phone if easier):

  1. Are you still ultimately shooting for the input/output structure of the doc? If not, we should discuss and align.
  2. Are all these outputs for PV or onshore wind?
  3. ReEDS_classifications.csv:
    1. Is site_lcoe called "mean_lcoe"?
    2. Do you have a run of the full supply curve somewhere as well that I could see?
    3. What is the "res_class" column referring to?
    4. I see two region columns, "reeds_region" and "region". They look identical; are they?
    5. "bin" is the supply curve bin, correct?
    6. In this case, is "class" based on wind speed or LCOE?
  4. ReEDS_Timeslice_means.csv and ReEDS_Timeslice_stdevs.csv:
    1. What do the column headers mean? region/class/bin? If so, we don't need the bin designation; we only need this data by timeslice, region, and class (see "performance_[tech].csv" in the doc).
  5. ReEDS_Profiles.h5:
    1. What are the separate tables, rep_profiles_0, rep_profiles_1, and rep_profiles_2?
    2. It looks like there may be a separate profile for each region/bin/class. If so, we only need a profile by region and class (see "hourly_cf_[tech].pkl" in the doc).
    3. It looks like this data is for every 30 minutes; could we get this hourly instead?
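Dropping the bin designation, as requested in 4.1, amounts to aggregating the region/class/bin rows up to region/class for each timeslice. A minimal pandas sketch, assuming a long table with hypothetical columns `timeslice`, `region`, `class`, `bin`, `cf_mean`, and `capacity`, and assuming a capacity-weighted mean is the right way to combine bins:

```python
import pandas as pd


def collapse_bins(table, value_col="cf_mean", weight_col="capacity"):
    """Aggregate region/class/bin/timeslice rows to region/class/timeslice.

    Hypothetical helper: the column names and the capacity-weighted
    mean are assumptions, not the reVX implementation.
    """
    keys = ["timeslice", "region", "class"]
    # Weight each bin's value by its capacity before summing
    weighted = table.assign(_weighted=table[value_col] * table[weight_col])
    sums = weighted.groupby(keys)[["_weighted", weight_col]].sum()
    # Weighted mean = sum(value * weight) / sum(weight)
    result = sums["_weighted"] / sums[weight_col]
    return result.rename(value_col).reset_index()


table = pd.DataFrame({
    "timeslice": ["H1", "H1", "H1"],
    "region": [1, 1, 2],
    "class": [1, 1, 1],
    "bin": [1, 2, 1],
    "cf_mean": [0.2, 0.4, 0.3],
    "capacity": [100.0, 300.0, 50.0],
})
print(collapse_bins(table))
```

Standard deviations should not be averaged the same way; they would need to be recomputed from the underlying profiles (or pooled using the within-bin means and counts).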

@MRossol
Collaborator Author

MRossol commented Dec 2, 2019

@mmowers See inline below:

  1. Are you still ultimately shooting for the input/output structure of the [doc](https://docs.google.com/document/d/1SEOafxhZphXw7nFARVpQ4L1J22GZNh45wKkyCBqBdEU)? If not, we should discuss and align.
    • We are happy to try to create the outputs you need. This was the first attempt; I would like to discuss the input formats.
  2. Are all these outputs for PV or onshore wind?
    • These are all for PV. It was just for testing purposes, i.e., it's what we had.
  3. ReEDS_classifications.csv:
    1. Is site_lcoe called "mean_lcoe"?
      • Yes.
    2. Do you have a run of the full supply curve somewhere as well that I could see?
      • Not yet.
    3. What is the "res_class" column referring to?
      • It is the resource class defined by reV to select technology.
    4. I see two region columns, "reeds_region" and "region". They look identical; are they?
      • In this case they are the same. I can walk through how it works tomorrow.
    5. "bin" is the supply curve bin, correct?
      • We need to talk about this tomorrow, as I am very confused by your nomenclature for resource classes (TRGs), supply curve bins, and "clusters".
    6. In this case, is "class" based on wind speed or LCOE?
      • I know this is not "accurate", but these are TRG classes even though it's for solar; I wanted to test the code.
  4. ReEDS_Timeslice_means.csv and ReEDS_Timeslice_stdevs.csv:
    1. What do the column headers mean? region/class/bin? If so, we don't need the bin designation; we only need this data by timeslice, region, and class (see "performance_[tech].csv" in the doc).
      • Good to know. I can update this after we confirm what classes are vs. bins...
  5. ReEDS_Profiles.h5:
    1. What are the separate tables, rep_profiles_0, rep_profiles_1, and rep_profiles_2?
      • This is an example of pulling 3 representative profiles for each "region": 0 is the most representative, followed by 1, then 2.
    2. It looks like there may be a separate profile for each region/bin/class. If so, we only need a profile by region and class (see "hourly_cf_[tech].pkl" in the doc).
    3. It looks like this data is for every 30 minutes; could we get this hourly instead?
      • The NSRDB is natively at 30 minutes, and my preference would be to NOT downscale to hourly. You can instead just pull every other data point if you want hourly. It seems short-sighted to remove data...
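The two ways of getting hourly values from the native 30-minute profiles give different results: keeping every other point (as suggested above) samples the on-the-hour values, while resampling averages each pair of half-hours. A small pandas sketch with a made-up profile; which convention ReEDS expects is not settled in this thread:

```python
import numpy as np
import pandas as pd

# Made-up half-hourly capacity factor profile covering two days
index = pd.date_range("2012-01-01", periods=96, freq="30min")
profile = pd.Series(np.linspace(0.0, 0.95, 96), index=index)

# Option 1: keep every other value (the ones stamped on the hour)
hourly_sampled = profile.iloc[::2]

# Option 2: average each hour's two half-hourly values
hourly_mean = profile.resample("60min").mean()

print(hourly_sampled.head(3))
print(hourly_mean.head(3))
```

Either option can be applied on the ReEDS side, so the 30-minute source data stays untouched.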

Member

@grantbuster grantbuster left a comment


Looks good overall, but it will definitely benefit from a walkthrough. I assume we'll have other thoughts after tomorrow's regroup, but here are some initial thoughts.

if isinstance(cluster_on, str):
    cluster_on = [cluster_on, ]

data = RPMClusters._normalize_values(rev_table[cluster_on].values,
Member


Might as well move this method to a common utility repo with the clustering algorithms?

Collaborator Author


Good call, will do

Collaborator Author


Added it to the ClusterMethods class
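Moving `_normalize_values` into a shared class means it has to work without any instance state, i.e., as a static method. The sketch below is hypothetical: `ClusterMethodsSketch` stands in for the real `ClusterMethods` class, and min-max scaling to [0, 1] is an assumed normalization, since the actual implementation isn't shown in this thread.

```python
import numpy as np


class ClusterMethodsSketch:
    """Hypothetical stand-in for a shared clustering-utility class."""

    @staticmethod
    def normalize_values(arr):
        """Min-max scale each column of a 1D or 2D array to [0, 1].

        Constant columns are mapped to 0 instead of dividing by zero.
        """
        arr = np.asarray(arr, dtype=float)
        if arr.ndim == 1:
            # Treat a 1D input as a single feature column
            arr = arr.reshape(-1, 1)
        mins = arr.min(axis=0)
        ranges = arr.max(axis=0) - mins
        # Avoid division by zero for columns with no spread
        ranges[ranges == 0] = 1.0
        return (arr - mins) / ranges


# One varying feature and one constant feature
data = np.array([[0.0, 5.0], [10.0, 5.0], [5.0, 5.0]])
print(ClusterMethodsSketch.normalize_values(data))
```

Because it is a static method, callers can use it without instantiating anything, e.g. `ClusterMethodsSketch.normalize_values(rev_table[cluster_on].values)`.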

logger = logging.getLogger(__name__)


class ReedsProfiles(RepProfiles):
Member


This is awesome!

Collaborator Author


You did 90% of the work!

profile.
n_profiles : int
    Number of representative profiles to save to fout.
bins : None | str | pandas.DataFrame | pandas.Series | dict
Member


Might want more description of what None does, since it's the default. Also maybe a corresponding description of the reg_cols defaults.

raise ReedsValueError(msg)

index_col = [c for c in timeslice_map.columns
             if 'time' in c.lower()]
Member


I think the search string should be "datetime" or even "datetimeindex". I'm picturing two columns, one with "datetimeindex" and the other with "timeslice_id" (has "time" in it!). Plus all of the error messages say "datetime".

Collaborator Author


I'll admit I got lazy here, but I agree, will update
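The ambiguity described above is easy to reproduce: with the two columns from the example, a `'time'` substring match returns both, while `'datetime'` returns only the intended one. A hedged sketch of a stricter lookup follows; `find_datetime_column` is a hypothetical helper, and it raises a plain `ValueError` where the real code would presumably use `ReedsValueError`.

```python
import pandas as pd


def find_datetime_column(df, search="datetime"):
    """Return the single column whose name contains `search`.

    Hypothetical helper: raises if zero or several columns match.
    """
    matches = [c for c in df.columns if search in c.lower()]
    if len(matches) != 1:
        raise ValueError("Expected exactly one {!r} column, found {}"
                         .format(search, matches))
    return matches[0]


timeslice_map = pd.DataFrame({
    "datetimeindex": pd.date_range("2012-01-01", periods=4, freq="60min"),
    "timeslice_id": ["H1", "H1", "H2", "H2"],
})

# A loose 'time' match picks up both columns
print([c for c in timeslice_map.columns if "time" in c.lower()])
print(find_datetime_column(timeslice_map))
```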

"""
means = []
stdevs = []
for s, slice_map in timeslices.groupby('slice'):
Member


I see that your slice ID column is just called "slice". Can you add a check for this in _parse_timeslices()?

Collaborator Author


Whoops, meant to run the groupby on all remaining columns, will fix
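Grouping on every non-datetime column, rather than on a hard-coded `'slice'` column, lets a timeslice map define slices with one column or several (e.g. season plus time of day). A minimal pandas sketch, assuming `profiles` is indexed by timestamp with one column per region/class and that the datetime column is named `'datetime'`; `timeslice_stats` and all column names are hypothetical:

```python
import pandas as pd


def timeslice_stats(profiles, timeslice_map, datetime_col="datetime"):
    """Mean and sample stdev of every profile column within each timeslice.

    All timeslice_map columns other than datetime_col define the slices.
    """
    slice_cols = [c for c in timeslice_map.columns if c != datetime_col]
    if not slice_cols:
        raise ValueError("timeslice_map has no timeslice columns")
    # Align timeslice labels to the profile timestamps
    labels = (timeslice_map.set_index(datetime_col)[slice_cols]
              .reindex(profiles.index))
    grouped = profiles.join(labels).groupby(slice_cols)[list(profiles.columns)]
    return grouped.mean(), grouped.std()


times = pd.date_range("2012-01-01", periods=6, freq="60min")
timeslice_map = pd.DataFrame({
    "datetime": times,
    "season": ["winter"] * 6,
    "period": ["night", "night", "day", "day", "day", "night"],
})
profiles = pd.DataFrame({"r1_c1": [0.0, 0.1, 0.5, 0.7, 0.6, 0.2],
                         "r2_c1": [0.1, 0.1, 0.4, 0.4, 0.4, 0.1]},
                        index=times)
means, stdevs = timeslice_stats(profiles, timeslice_map)
print(means)
```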

Create ReEDS timeslices from region-bin-class groups and representative
profiles
"""
def __init__(self, rep_profiles, timeslice_map, meta=None,
Member


We need to verify that the timeslice statistics are meant to be calculated from the representative profiles and not from the profiles of all sites in a region/timeslice. I'm not at all sure, but getting this wrong would be a high-level misunderstanding.

Looking at @mmowers' doc, it would appear this is an open question.
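To make the open question concrete: statistics computed from a single representative profile can differ a lot from statistics pooled over every site in the same region/timeslice. The toy numbers below are made up, and picking the site closest to the mean profile is only an assumed stand-in for however reV actually selects representative profiles.

```python
import numpy as np

# Made-up CF values: 3 sites (rows) x 4 hours (columns), all in the
# same region/class/timeslice
site_cf = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.30, 0.40, 0.50],
    [0.60, 0.70, 0.80, 0.90],
])

# Assumption: the representative profile is the site with the smallest
# squared distance to the mean profile
mean_profile = site_cf.mean(axis=0)
rep_idx = int(np.argmin(((site_cf - mean_profile) ** 2).sum(axis=1)))
rep_profile = site_cf[rep_idx]

# Option A: statistics from the representative profile only
rep_mean, rep_std = rep_profile.mean(), rep_profile.std(ddof=1)

# Option B: statistics pooled over every site-hour in the group
all_mean, all_std = site_cf.mean(), site_cf.std(ddof=1)

print(rep_idx, rep_mean, rep_std)
print(all_mean, all_std)
```

The pooled standard deviation (option B) mixes the spatial spread between sites with the temporal variability within the timeslice, which is why it comes out larger here.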

stdevs = pd.concat(stdevs, axis=1).T
stdevs.index.name = 'timeslice'

return means, stdevs
Member


Clean method! Nice!

logger = logging.getLogger(__name__)


class ReedsClassifier:
Member


I think this looks good, but I would definitely benefit from a walkthrough :)

@grantbuster
Member

Michael and Grant code review notes:

Classification items:

  1. Expose "groups" (region, bin, class) as properties: groups, keys. (Done)
  2. Raise KeyError in __getitem__ (Done)
  3. Agg table - specify mean/sum for different vars (Done)
  4. Region input might be a shapefile? Should be a global utility with a CLI. (I can do this)
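Review items 1 through 3 map onto standard pandas groupby features. The class below is a hypothetical sketch, not the `ReedsClassifier` API: it assumes a table with `region`, `bin`, and `class` columns, exposes `groups` and `keys` properties, raises a `KeyError` for unknown groups in `__getitem__`, and aggregates with a per-column mapping such as sum for capacity and mean for LCOE.

```python
import pandas as pd


class GroupedTableSketch:
    """Hypothetical region/bin/class wrapper around a supply curve table."""

    GROUP_COLS = ["region", "bin", "class"]

    def __init__(self, table):
        self._table = table
        self._grouped = table.groupby(self.GROUP_COLS)

    @property
    def groups(self):
        """Dict mapping (region, bin, class) tuples to table row labels."""
        return self._grouped.groups

    @property
    def keys(self):
        """Sorted list of unique (region, bin, class) tuples."""
        return sorted(self._grouped.groups)

    def __getitem__(self, key):
        # Fail loudly with a KeyError for groups that do not exist
        if key not in self._grouped.groups:
            raise KeyError("{} is not a valid (region, bin, class) group"
                           .format(key))
        return self._grouped.get_group(key)

    def aggregate(self, agg_funcs):
        """Aggregate each column with its own function, e.g. sum or mean."""
        return self._grouped.agg(agg_funcs)


table = pd.DataFrame({
    "region": [1, 1, 2],
    "bin": [1, 1, 1],
    "class": [1, 1, 2],
    "capacity": [10.0, 20.0, 5.0],
    "mean_lcoe": [40.0, 60.0, 50.0],
})
sc = GroupedTableSketch(table)
print(sc.keys)
print(sc.aggregate({"capacity": "sum", "mean_lcoe": "mean"}))
```

Passing the aggregation as a dict keeps the sum/mean choice per variable in one place, so adding a new column only needs one new entry.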

Timeslices:

  1. Might want to compute stats from the full CF profiles of all sites in each unique region/bin/class/timeslice. Might want to pass the gen output handler and gid list to parallel workers.
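One way to follow the second suggestion is to hand each worker a group key, its gid list, and a timeslice mask, and let the worker read the full CF profiles itself. The sketch is hypothetical throughout: `load_cf` fabricates toy values where real code would read the reV generation output for those gids (passing a file path is usually safer than an open handle, since open HDF5 handles generally can't be pickled to another process), and `executor_cls` is exposed so the pool type can be swapped.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np

N_STEPS = 24


def load_cf(gids):
    """Hypothetical loader returning a (time, site) CF array for gids.

    Stand-in for reading reV generation output; values are toy numbers.
    """
    steps = np.arange(N_STEPS)[:, None]
    return (np.asarray(gids, dtype=float)[None, :] + steps) / 100.0


def group_stats(job):
    """Worker: CF mean and sample stdev over all sites in one group."""
    key, gids, time_mask = job
    cf = load_cf(gids)[time_mask]  # keep only timesteps in this timeslice
    return key, float(cf.mean()), float(cf.std(ddof=1))


def run_parallel(gid_groups, time_masks, executor_cls=ProcessPoolExecutor,
                 max_workers=None):
    """Compute stats for every (region/bin/class group, timeslice) pair."""
    jobs = [((group, ts), gids, mask)
            for group, gids in gid_groups.items()
            for ts, mask in time_masks.items()]
    with executor_cls(max_workers=max_workers) as executor:
        return {key: (mean, std)
                for key, mean, std in executor.map(group_stats, jobs)}


gid_groups = {("r1", 1, 1): [1, 2], ("r2", 1, 1): [3]}
hours = np.arange(N_STEPS)
time_masks = {"H1": hours < 12, "H2": hours >= 12}
# Threads keep this demo portable; ProcessPoolExecutor is the default
stats = run_parallel(gid_groups, time_masks, executor_cls=ThreadPoolExecutor)
print(stats)
```

With `ProcessPoolExecutor` on platforms that spawn workers (Windows, macOS), the call should sit under an `if __name__ == "__main__":` guard.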

@MRossol
Collaborator Author

MRossol commented Dec 7, 2019

@grantbuster
ReEDS should be done; I've implemented legacy formats for profiles and timeslices.

I would love your eyes on these two formatting methods in ReedsTimeslices, as they are really slow, but I'm not sure how else to speed them up without a for loop...
def _flatten_timeslices(table, value_name, reg_cols):
def _create_correlation_table(corr_coeffs, reg_cols):

The timeslice CLI entry needs to be updated on your branch; hopefully we can do that and merge it Monday...
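Both slow formatting steps can usually be written as whole-table reshapes instead of Python loops. The sketch below assumes shapes that are not confirmed in this thread: `table` is timeslices by (region, class) columns, and `corr_coeffs` is a square matrix indexed by the same (region, class) labels, of which only the unique off-diagonal pairs are kept. The function names mirror the ones above but are hypothetical re-implementations, not the reVX code.

```python
import numpy as np
import pandas as pd


def flatten_timeslices(table, value_name, reg_cols):
    """Wide (timeslice x region/class columns) table -> long table."""
    # Transpose so region/class labels become rows, then name those levels
    wide = table.T.rename_axis(index=reg_cols)
    # melt turns each timeslice column into rows in one vectorized step
    return wide.reset_index().melt(id_vars=reg_cols, var_name="timeslice",
                                   value_name=value_name)


def create_correlation_table(corr_coeffs, reg_cols):
    """Square correlation matrix -> long table of unique off-diagonal pairs."""
    values = corr_coeffs.to_numpy()
    rows, cols = np.triu_indices_from(values, k=1)
    left = corr_coeffs.index[rows].to_frame(index=False)
    right = corr_coeffs.columns[cols].to_frame(index=False)
    left.columns = ["{}_1".format(c) for c in reg_cols]
    right.columns = ["{}_2".format(c) for c in reg_cols]
    out = pd.concat([left, right], axis=1)
    out["corr"] = values[rows, cols]
    return out


labels = pd.MultiIndex.from_tuples([(1, 1), (1, 2), (2, 1)])
table = pd.DataFrame([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                     index=["H1", "H2"], columns=labels)
print(flatten_timeslices(table, "cf_mean", ["region", "class"]))

corr = pd.DataFrame([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]],
                    index=labels, columns=labels)
print(create_correlation_table(corr, ["region", "class"]))
```

Whether ReEDS expects the diagonal or both orderings of each pair is an open assumption here; setting `k=0` in `np.triu_indices_from` keeps the diagonal.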

@grantbuster
Member

Call with Matt:
• Regions - need to allow a subset of regions
• Timeslices - option to use hour number (to "match" end of hour data)
• Classes - Make sure classes can be in any sorted order (min/max)
• Classes need to start at 1
• Bins need to start at 1
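The class and bin conventions from the call can be sketched with NumPy and pandas. Both helpers are hypothetical: `assign_classes` turns class edges given in any order into 1-based class numbers, with a flag for whether class 1 holds the lowest or the highest values, and `assign_bins` assigns 1-based quantile bins within each class, where quantile binning is an assumption about how the supply curve bins are defined.

```python
import numpy as np
import pandas as pd


def assign_classes(values, edges, descending=False):
    """Return 1-based class numbers for values, given class edges.

    Edges may be passed in any order. Values outside the outer edges
    fall into the first or last class.
    """
    edges = np.sort(np.asarray(edges, dtype=float))
    # digitize against the interior edges gives 0 .. n_classes - 1
    classes = np.digitize(values, edges[1:-1]) + 1
    if descending:
        # Flip so that class 1 holds the largest values
        classes = len(edges) - classes
    return classes


def assign_bins(df, value_col, n_bins, class_col="class"):
    """Return 1-based quantile bins of value_col within each class."""
    return df.groupby(class_col)[value_col].transform(
        lambda s: pd.qcut(s, n_bins, labels=False, duplicates="drop") + 1)


print(assign_classes([1.0, 6.0, 9.0], edges=[10, 0, 7, 5]))
print(assign_classes([1.0, 6.0, 9.0], edges=[10, 0, 7, 5], descending=True))

sc = pd.DataFrame({"class": [1, 1, 1, 1, 2, 2],
                   "mean_lcoe": [10.0, 20.0, 30.0, 40.0, 5.0, 6.0]})
print(assign_bins(sc, "mean_lcoe", n_bins=2).tolist())
```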

@grantbuster
Member

Merging this pull request since we've tested thoroughly.

@grantbuster grantbuster merged commit e646323 into master Jan 9, 2020
@grantbuster grantbuster deleted the reeds branch January 9, 2020 15:43
Labels
feature New feature or request
Projects
None yet

3 participants