Vector averaging upgrade #94

Merged: 39 commits merged into master on Mar 24, 2021
Conversation

@hivanov-nrel (Contributor)

Hi everyone, I am not sure why this is saying that I am 33 commits ahead of the master branch. I am pretty sure most of those commits were already implemented in my loads module upgrade PR. Let me know if I am missing something.

In any case, this PR should only involve the vector averaging upgrade commit. It adds functionality to the stats function so that statistics are calculated more accurately for directional channels that need vector averaging.

Ivanov and others added 30 commits July 13, 2020 11:07
@ssolson (Contributor) commented Feb 12, 2021

Thank you @hivanov-nrel. The "33 commits ahead" is essentially just how many commits you are adding to the repository; if you were 0 ahead, you would have no changes.

Can you look into why your commits are not passing the build before we review this PR?

@rpauly18 (Contributor) left a comment

Looks good. Thanks for adding this update!

@ssolson (Contributor) left a comment

Hey Chris, I have a couple of things we need to change and a couple of questions. An overarching question I have is why one would use the get_statistics function instead of just calling .describe(). I get the same answer using .describe(), and it is all in one DataFrame.

I would say if we keep the get_statistics function, we need to adjust the results so that the returned index is meaningful. Currently the function returns the first time index for min, max, etc. I believe we should return one DataFrame indexed by max, min, etc., like describe() does.

[screenshot of example output omitted]
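To make the describe() comparison concrete, here is a hedged sketch of the two output layouts (the channel name and values are illustrative, not from mhkit):

```python
import pandas as pd
import numpy as np

# Hypothetical 10-minute record sampled at 1 Hz.
idx = pd.date_range("2021-01-01 12:03:47", periods=600, freq="s")
df = pd.DataFrame({"power": np.linspace(90.0, 110.0, 600)}, index=idx)

# describe() gives one DataFrame indexed by statistic name:
summary = df.describe()
print(summary.loc[["mean", "max", "min", "std"]])

# A get_statistics-style layout instead keeps one row per window,
# indexed by the window's first timestamp, which concatenates cleanly
# across many files.
means = pd.DataFrame({"power": [df["power"].mean()]}, index=[df.index[0]])
print(means)
```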

@@ -104,7 +104,7 @@ def tip_speed_ratio(rotor_speed,rotor_diameter,inflow_speed):
'''
Function used to calculate the tip speed ratio (TSR) of a MEC device with rotor

Parameters
Parameters:
Contributor:

The colons at the end of Parameters and Returns will not build correctly. Please remove the colons from all Parameters and Returns headings in the docstrings.

Contributor Author:

This is a weird one related to my initial concerns when I posted this PR. I know this change had already been implemented in a previous PR months ago (as with a lot of the commits that are labeled as "ahead" of the master). Rebecca mentioned it might be a case where I worked on something before doing a git pull, which caused the commit history to get screwy. Anyway, I will update this particular change. My main concern is that when this gets merged into master, it doesn't mess up the commit history somehow. I am not much of an expert on GitHub, so maybe it's harmless, but I just wanted to throw that out there.

mhkit/utils.py Outdated
for v in vector_channels:
Ux = sum(np.sin(datachunk[v]*np.pi/180))/len(datachunk)
Uy = sum(np.cos(datachunk[v]*np.pi/180))/len(datachunk)
vector_avg = (90 - np.arctan2(Uy,Ux)*180/np.pi) # number doesnt seem right
Contributor:

Concerning comment. Let us resolve this prior to merging. What is the issue here?

Contributor Author:

Sorry, this is just a leftover comment from when I was developing. I will erase it.
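For reference, the unit-vector averaging idea in the quoted snippet, averaging sines and cosines and converting back to a compass bearing, can be sketched in a self-contained form (this is an illustration, not the mhkit implementation; the function name is made up):

```python
import numpy as np

def vector_average_deg(angles_deg):
    """Circular mean of compass bearings in degrees (0 = north, clockwise)."""
    rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    Ux = np.mean(np.sin(rad))  # mean east component
    Uy = np.mean(np.cos(rad))  # mean north component
    avg = 90.0 - np.rad2deg(np.arctan2(Uy, Ux))
    return avg % 360.0         # wrap into [0, 360)

# Bearings of 350 and 10 degrees average to north, not to the
# scalar mean of 180.
print(vector_average_deg([350, 10]))
```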

mhkit/utils.py Outdated
Comment on lines 80 to 81
if vector_avg<0: vector_avg = vector_avg+360
elif vector_avg>360: vector_avg = vector_avg-360
Contributor:

Is it possible that the vector average could be off by more than 1 rotation? E.g. 720 degrees?

Contributor Author:

No, this is related to the previous line where vector_avg is calculated: the arctan2 output can exceed 90 degrees, which results in a negative angle. The if statement converts that to a positive angle. The elif branch is unnecessary, so I'll remove it.
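This can be checked numerically: np.arctan2 returns angles in (-180, 180], so 90 minus that output falls in [-90, 270). It can be negative but can never exceed 360, which is why only the first branch is needed. A quick sketch over the four cardinal directions:

```python
import numpy as np

# np.arctan2 returns angles in (-180, 180], so 90 - arctan2(...) lies in
# [-90, 270): it can be negative but can never exceed 360.
raws = []
for y, x in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]:
    raw = 90.0 - np.degrees(np.arctan2(y, x))
    raws.append(raw)
    print(raw, "->", raw + 360.0 if raw < 0 else raw)
```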

mhkit/utils.py Outdated
if vector_avg<0: vector_avg = vector_avg+360
elif vector_avg>360: vector_avg = vector_avg-360
means[i][v] = vector_avg # overwrite scalar average for channel
magsum = round((Ux**2 + Uy**2)*1e8)/1e8 # round to 8th decimal place
Contributor:

Why are we rounding this number to single precision?

Contributor Author:

This is to prevent very rare cases where round-off error in some data acquisition systems can cause the following calculation to hiccup. It could be unnecessary.
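For context on the hiccup being guarded against: Ux² + Uy² should never exceed 1 for averaged unit vectors, but accumulated round-off can push it marginally past 1, making a subsequent sqrt(1 - magsum) step produce NaN. A branch-free alternative to decimal rounding is clipping (a sketch; that epsilon = sqrt(1 - magsum) is the next step is an assumption based on the discussion below):

```python
import numpy as np

magsum = 1.0 + 1e-15                     # simulated round-off overshoot
# np.sqrt(1.0 - magsum) here would yield NaN (sqrt of a negative float).
magsum_safe = np.clip(magsum, 0.0, 1.0)  # clamp back into the valid domain
epsilon = np.sqrt(1.0 - magsum_safe)
print(epsilon)  # 0.0
```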

@hivanov-nrel (Contributor Author)

@ssolson Thanks for reviewing. Sorry for the late reply. I am actively working to answer each of your comments.

As for your overarching concern, I agree that describe() gets you the same type of results in most cases. But directly comparing describe() to get_statistics(), I do think get_statistics() is more robust in handling a wider range of cases. For instance, it does a QC check on the raw timestamps, it allows the user to resample the data regardless of file/DataFrame length, and the vector averaging upgrade is important for directional channels (which I assume users will definitely have).

So I think it comes down to whether we should use describe() within the get_statistics function, the main difference being how we want the output organized. In my experience, it is most useful to organize the stats indexed by timestamp, especially when you are looping through thousands of 10-minute files and concatenating all of that data together. I am not entirely happy that the result ends up as four separate DataFrames. I personally would use a multi-level DataFrame indexed by time and stat, but I don't think that's possible within MHKiT. Alternatively, if describe() were applied, I think you would run into issues with how the data is organized, especially when concatenating all the files together.

Let me know if this doesn't make sense.

@ssolson (Contributor) commented Mar 22, 2021

Hey Chris, I finally got a chance to review this again with your updated input. I believe what you have written here is the pandas resample function with specific methods (mean, max, min, std): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
This method has a pad option for handling NaNs.
I am not aware of any limits on DataFrame size with respect to describe or resample, which you mention above for calling describe.

The quick timestep check seems useful.

I think the vector averaging you have done here seems very useful, and I almost think it should be its own function; then we could use vector averaging outside of the statistics. What do you think about breaking out the vector averaging into its own function (is this useful?) and maybe utilizing the built-in resample function in get_stats to make the code more readable and take advantage of the optimizations made by the pandas team for this type of data analysis?

@hivanov-nrel (Contributor Author)

@ssolson resample is a great function which I use from time to time. The reason I created get_statistics was to circumvent one caveat I have found, mostly centered around user workflow and IEC standards rather than pure functionality. The IEC standard usually specifies that data is stored as 10-minute files. When recording data, the start and end time of a file could be 2021-01-01 12:03:47 to 2021-01-01 12:13:47. When post-processing hundreds of these files, you usually run a loop that parses each file, formats it, and calculates and stores statistics. If you were to use resample on such a file to calculate statistics, you would get two values corresponding to times starting at 12:00:00 and 12:10:00 (and they would be incorrect). The function I created avoids this by specifically grabbing chunks of continuous 10-minute data regardless of start time and storing the statistics under that first timestamp value. This is pretty standard timestamp indexing: later in the data analysis process, when you make plots and find outliers, you can easily map those outliers back to the specific 10-minute file that contains them. From there you can troubleshoot the raw data and figure out what happened. Maybe there is a way to force resample to do this, but I am not aware of one.

I would be in favor of breaking out vector averaging into a separate function, but I think it doesn't hurt to also keep it in get_statistics, as I imagine it would be convenient for the user to let the function do it all in one step. But I am sure there are cases where vector averaging can be used independently.

Thoughts?
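The resample caveat described above can be reproduced in a few lines: a 10-minute file starting at 12:03:47, resampled with a 10-minute rule, is split across two clock-aligned bins rather than treated as one window (a minimal sketch; the channel name is illustrative):

```python
import pandas as pd
import numpy as np

# A 10-minute file starting at 12:03:47, sampled at 1 Hz.
idx = pd.date_range("2021-01-01 12:03:47", periods=600, freq="s")
df = pd.DataFrame({"speed": np.ones(600)}, index=idx)

# resample() uses clock-aligned bins: the single file is split across
# two partial windows labeled 12:00:00 and 12:10:00.
binned = df.resample("10min").mean()
print(binned.index.tolist())

# Chunk-based statistics: one window labeled by the file's first timestamp.
stats = pd.DataFrame({"speed_mean": [df["speed"].mean()]}, index=[df.index[0]])
print(stats.index.tolist())
```

For what it's worth, newer pandas versions do accept an `origin="start"` argument to `resample`, which anchors the bins at the first timestamp of the frame being resampled, though per-file labeling across a concatenated record would still need the chunk-based approach.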

@ssolson (Contributor) commented Mar 22, 2021

Thanks for explaining, Chris. I believe I see the problem you are solving here. Okay, so if you think vector averaging would be useful generally, then I believe we are on the same page. I was not suggesting removing vector averaging from the current get_stats function, but rather calling the proposed new broken-out vector averaging function from within get_stats. I outline the idea below, where we call vec_avg from get_stats. I think we are in agreement here. Thank you again for working with me to explain this.

def vec_avg():
    # insert math here
    return vector_avg

def get_stats():
    # calc stats
    if vector_averaging_requested:
        vector_avg = vec_avg()
    return stats

@hivanov-nrel (Contributor Author)

No problem. Glad we are on the same page. I made the update and pushed the commit.

@ssolson (Contributor) left a comment

Chris, thank you for the quick turnaround. I have a couple of minor comments and questions.

mhkit/utils.py Outdated
Parameters
----------
data : pandas Series, numpy array, list
Vector channel to calculate statistics on
Contributor:

So the data in this function is an array-like structure containing data in degrees e.g. (63, 93, 65, 54, 101,...)?

Probably worth noting that the data must be in degrees?

Contributor Author:

Agreed. We definitely should be explicit about that.

mhkit/utils.py Outdated
Comment on lines 118 to 120
if not np.isreal(epsilon): # check if epsilon is imaginary (error)
vector_std = 0
print('WARNING: vector averaging error in calculating epsilon')
Contributor:

The comment says error. The warning message calls it an error. I think we need a bit more clarity around this.

I think the warning should contain why it was flagged e.g. 'WARNING: epsilon contains imaginary values'. Also what is epsilon and what would cause this to have imaginary values? Can we provide better feedback to the user?

Contributor Author:

Epsilon being imaginary comes from the infinitesimal chance that the magsum variable is greater than 1. I basically eliminate this possibility with the round-off error correction I have in place in the step before it, but I left this if statement as a catch-all. I don't expect this warning to ever be thrown (hopefully).

mhkit/utils.py Outdated
@@ -76,6 +86,43 @@ def get_statistics(data,freq,period=600):

return means,maxs,mins,stdevs

def vector_averaging(data):
Contributor:

Can you add a dedicated test for this function? It will help make it clear that it is being tested and serve as an example for using it by itself.

Contributor Author:

will do

else:
vector_std = np.arcsin(epsilon)*(1+0.1547*epsilon**3)*180/np.pi

return vector_avg, vector_std
Contributor:

So it looks like this function might be called directional_data_stats because it returns more than just the average.

Contributor Author:

I had the same thought. I initially left it as vector_averaging since that is what everyone calls it, although I agree it's technically not appropriate. I have updated it to be called vector_statistics instead. I think that's close enough that people should recognize what it's for.
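Putting the pieces from this thread together, a vector_statistics-style function returning both the circular mean and a Yamartino-style standard deviation (the arcsin expression visible in the diff) might look like the following. This is a hedged sketch assembled from the quoted snippets, not the merged mhkit code; epsilon = sqrt(1 - magsum) and the clamping of magsum are assumptions:

```python
import numpy as np

def vector_statistics(data_deg):
    """Circular mean and Yamartino-style std for compass bearings in degrees.

    Illustrative sketch, not the merged mhkit implementation.
    """
    rad = np.deg2rad(np.asarray(data_deg, dtype=float))
    Ux = np.mean(np.sin(rad))                 # mean east component
    Uy = np.mean(np.cos(rad))                 # mean north component
    vector_avg = (90.0 - np.rad2deg(np.arctan2(Uy, Ux))) % 360.0
    magsum = min(Ux**2 + Uy**2, 1.0)          # guard float round-off past 1
    epsilon = np.sqrt(1.0 - magsum)
    vector_std = np.rad2deg(np.arcsin(epsilon) * (1 + 0.1547 * epsilon**3))
    return vector_avg, vector_std

# Bearings clustered around north: the circular mean is near 0 degrees
# and the spread is a few degrees, where a naive scalar mean would fail.
avg, std = vector_statistics([10, 350, 5, 355])
print(avg, std)
```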

mhkit/utils.py Outdated
@@ -22,7 +22,9 @@ def get_statistics(data,freq,period=600):
Sample rate of data [Hz]
period : float/int
Statistical window of interest [sec], default = 600

vector_channels : string or list (optional)
List of channel names that are to be vector averaged
Contributor:

Are these channels required to be in units of degrees?

Contributor Author:

Yes, I'll make that explicit as well.

@ssolson (Contributor) commented Mar 23, 2021

Chris, we also just got the testing fixed. Could you merge the main repo changes into your branch so the builds will start to pass?

@ssolson (Contributor) commented Mar 24, 2021

Fantastic work Chris! I will merge these changes now.

@ssolson ssolson merged commit 238d85f into MHKiT-Software:master Mar 24, 2021