-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vector averaging upgrade #94
Conversation
merge new loads functions
Loads reorganization
Thank you @hivanov-nrel. The 33 commits ahead is essentially just how many commits you are adding to the repository. If you were 0 ahead then you would have no changes. Can you check out why your commits are not passing the build before we review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for adding this update!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Chris I have a couple of things we need to change and a couple of questions. An overarching question I have is about why I would use the get_statistics
function instead of just calling *.describe()
. I get the same answer using the .describe()
and it is all in one DataFrame.
I would say if we keep the get_statistics
function we need to adjust the results so that the returned index is meaningful. Currently the function is returning the first time index for min, max etc. I believe we should be returning 1 DataFrame indexed by max, min etc like 'describe()
does.
mhkit/river/performance.py
Outdated
@@ -104,7 +104,7 @@ def tip_speed_ratio(rotor_speed,rotor_diameter,inflow_speed): | |||
''' | |||
Function used to calculate the tip speed ratio (TSR) of a MEC device with rotor | |||
|
|||
Parameters | |||
Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The colons on the end of Parameters and Return will not build correctly. Please remove the colons from all Parameters and Returns in the docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird one related to my initial concerns when i posted this PR. I know this change had already been implemented in a previous PR months ago (as with a lot of the commits that are labeled as "ahead" of the master). Rebecca had mentioned it might be the case where I worked on something before doing a git pull and this caused the commit history to get screwy. Anyways, I will update this particular change. My main concern is that when this gets merged into the master, it doesn't mess up its commit history somehow. I am not much of an expert on Github so maybe it's harmless, but just wanted to throw that out there.
mhkit/utils.py
Outdated
for v in vector_channels: | ||
Ux = sum(np.sin(datachunk[v]*np.pi/180))/len(datachunk) | ||
Uy = sum(np.cos(datachunk[v]*np.pi/180))/len(datachunk) | ||
vector_avg = (90 - np.arctan2(Uy,Ux)*180/np.pi) # number doesnt seem right |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning comment. Let us resolve this prior to merging. What is the issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is just a leftover comment when i was developing....will erase
mhkit/utils.py
Outdated
if vector_avg<0: vector_avg = vector_avg+360 | ||
elif vector_avg>360: vector_avg = vector_avg-360 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the vector average could be off by more than 1 rotation? E.g. 720 degrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is related to the previous line where you calculate vector_avg where the arctan2 output could be greater than 90, which results in a negative angle. The if statement converts that to a positive angle. The second elif is unnecessary so ill remove.
mhkit/utils.py
Outdated
if vector_avg<0: vector_avg = vector_avg+360 | ||
elif vector_avg>360: vector_avg = vector_avg-360 | ||
means[i][v] = vector_avg # overwrite scalar average for channel | ||
magsum = round((Ux**2 + Uy**2)*1e8)/1e8 # round to 8th decimal place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we rounding this number to single precision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to prevent very rare cases where roundoff error in some data acquisition systems can cause the following calculation to hiccup. Could be unnecessary.
@ssolson Thanks for reviewing. Sorry for the late reply. I am actively working to answer each of your comments. As for the overarching concern that you had....i agree with you that describe gets you the same type of results for most cases. But if you are directly comparing describe() to get_statistics(), I do think that get_statistics() is more robust in handling more types of cases. For instance, it does a qc check on the raw timestamps, it allows the user to resample the data regardless of file/dataframe length, and the vector averaging upgrade is important for directional channels (which I assume users will definitely have). So I think it comes down to whether we should use describe() within the get_statistics function. The main difference there being how we want the output to be organized. In my experience, it is most useful to organize the stats indexed by a timestamp, especially when you are looping through thousands of 10 minute files and concatenating all of that data together. I am not entirely happy that the result ends with having 4 separate dataframes. I personally would use a multi-level dataframe indexed by time and stat, but i dont think thats possible within MHKit. Alternatively, if describe() is applied, I think you would run into issues with the way the data is organized especially when concatenating all the files together. Let me know if this doesn't make sense. |
Hey Chris I finally got a chance to review this again with your updated input. I believe what you have written here is the pandas The quick timestep check seems useful. I think the vector averaging thing you have done here seems very useful and I almost think it should be its own function. Then we could use this vector averaging function outside of the statistics. What do you think about breaking out the vector averaging into its own function (is this useful?) and maybe utilizing the builtin resample function in the get stats here to make the code more readable and utilize optimizations made by the pandas team for this type of data analysis? |
@ssolson I would be in favor of breaking out vector averaging to be a separate function, but i think it doesnt hurt to also keep it in Thoughts? |
Thanks for explaining Chris. I believe I see the problem you are solving here. Okay so if you think vector averaging would be useful generall then I believe we are on the same page. I was not suggesting removing vector averaging from the current get_stats function but to call the proposed/ new broken out vector averaging function from within get stats. I outline the idea below where we call vec_avg from get_stats. I think we are in agreement here. Thank you again for working with me to explain this.
|
No problem. Glad we are on the same page. I made the update and pushed the commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chris thank you for the quick turn around I have couple of minor comments and questions.
mhkit/utils.py
Outdated
Parameters | ||
---------- | ||
data : pandas Series, numpy array, list | ||
Vector channel to calculate statistics on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the data in this function is an array-like structure containing data in degrees e.g. (63, 93, 65, 54, 101,...)?
Probably worth noting that the data must be in degrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We definitely should be explicit about that.
mhkit/utils.py
Outdated
if not np.isreal(epsilon): # check if epsilon is imaginary (error) | ||
vector_std = 0 | ||
print('WARNING: vector averaging error in calculating epsilon') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says error. The warning message calls it an error. I think we need a bit more clarity around this.
I think the warning should contain why it was flagged e.g. 'WARNING: epsilon contains imaginary values'. Also what is epsilon and what would cause this to have imaginary values? Can we provide better feedback to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Epsilon being imaginary comes from the infinitesimal chance that the magsum
variable is greater than 1. I basically eliminate this possibility with the roundoff error correction i have in place in the step before it but i left this if statement as a catch-all. But i dont expect this warning to ever be thrown (hopefully).
mhkit/utils.py
Outdated
@@ -76,6 +86,43 @@ def get_statistics(data,freq,period=600): | |||
|
|||
return means,maxs,mins,stdevs | |||
|
|||
def vector_averaging(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a dedicated test for this function? It will help make it clear that it is being tested and serve as an example for using it by itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
else: | ||
vector_std = np.arcsin(epsilon)*(1+0.1547*epsilon**3)*180/np.pi | ||
|
||
return vector_avg, vector_std |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like this function might be called directional_data_stats
because it returns more than just the average.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought....i initially left it as vector_averaging
since its what everyone refers to although i agree its technically not appropriate. I have updated this to be called vector_statistics
instead. I think thats close enough to where people should be able to recognize what its for.
mhkit/utils.py
Outdated
@@ -22,7 +22,9 @@ def get_statistics(data,freq,period=600): | |||
Sample rate of data [Hz] | |||
period : float/int | |||
Statistical window of interest [sec], default = 600 | |||
|
|||
vector_channels : string or list (optional) | |||
List of channel names that are to be vector averaged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these channels required to be in units of degrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes....ill make that explicit as well.
Chris we also just got the testing fixed. Could you merge the main repo changes into your branch so the builds will start to pass? |
Fantastic work Chris! I will merge these changes now. |
Hi everyone, I am not sure why this is saying that I am 33 commits ahead of the main master. I am pretty sure most of those commits were implemented with my upgrade to loads module PR....let me know if I am missing something.
Anyways, this should only involve implementation of the vector averaging upgrade commit. It adds functionality to the stats function so that you more accurately calculate stats for directional channels that need vector averaging.