Ticket/2594/dev #2609
Conversation
Force-pushed from 5ae3a2a to a5135dd (compare)
A few questions. One other issue is making sure that the data that is currently released and accessed via the ecephys_session objects can still be loaded from its NWBs.
allensdk/brain_observatory/ecephys/stimulus_table/naming_utilities.py (two outdated review threads, resolved)
Does the ephys_session notebook do this? It currently runs through the notebook.
```python
#

def eval_str(val):
```
Does it make sense to add type hints to this function?
I don't think so, since it takes in any value of any type and returns any value of any type.
But the return types are consistent at least.
allensdk/brain_observatory/ecephys/stimulus_table/naming_utilities.py (two outdated review threads, resolved)
You'll have to check. As I said during sprint planning, you're about the first of the current Pikas to look at this code. My guess is yes, but it wouldn't hurt to double-check.
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Force-pushed from fe84bfc to 22b80bc (compare)
Force-pushed from f59aa0e to 24b7838 (compare)
I believe it does do it. I committed the re-run notebook. You can see the comparison here: the table values are now either floating point or tuples instead of strings. Cell 17 is a good overview of the differences. The re-run is oddly cut off at cell 36 in this preview; it isn't cut off when I view it locally. It seems to be something to do with the animation.jhtml.
Looks good, I left some feedback
```python
stimulus_presentations.fillna(nonapplicable, inplace=True)

# pandas does not automatically convert boolean cols for fillna
boolean_colnames = stimulus_presentations.dtypes[
```
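The failure discussed in this thread can be reproduced in isolation. A minimal sketch (with a made-up boolean column standing in for `opacity`, not the actual AllenSDK data) of the cast-to-object workaround:

```python
import pandas as pd

# A nullable-boolean column like `opacity`: filling NA with a string
# fails, because pandas validates fill values against the boolean dtype.
opacity = pd.Series([True, False, pd.NA], dtype="boolean")

try:
    opacity.fillna("null")
except (TypeError, ValueError) as err:
    print(f"fillna on boolean dtype failed: {err}")

# Workaround: cast to object first, then fill with the sentinel string.
filled = opacity.astype(object).fillna("null")
print(filled.tolist())  # [True, False, 'null']
```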
Pandas 1.1 introduced a dropna argument for groupby, which allows using NA values as keys. If that is not a possibility, then I guess this is OK.
I also don't know what the use case is here, but I'm surprised we want to group by a missing key
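For reference, the pandas >= 1.1 behavior being suggested, on a toy frame (not the stimulus table itself):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", np.nan], "val": [1, 2, 3]})

# Default behavior: rows whose key is NA are silently dropped.
by_key = df.groupby("key")["val"].sum()

# pandas >= 1.1: dropna=False keeps NaN as a group key of its own.
with_na = df.groupby("key", dropna=False)["val"].sum()
print(len(by_key), len(with_na))  # 1 2
```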
Good call on dropna, although this would require a refactor, since several parts of the code and notebook refer to the NA value as 'null'. I changed those to check for NaN instead.
Some of these old files can require a good amount of linting once they're touched.
Actually, after doing more testing, I discovered one cell of the notebook had a change in results. Apparently, there's an unresolved pandas bug with dropna in the groupby function when used with MultiIndex groupings (pandas-dev/pandas#36470).
I reverted everything back to 'null'.
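The sentinel approach that was kept can be sketched on a toy frame (column names borrowed from the stimulus table, values made up), sidestepping the dropna=False MultiIndex bug:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "orientation": [0.0, 0.0, np.nan],
    "phase": ["0.0", "0.0", np.nan],
    "val": [1, 2, 3],
})

# Replace NA with the string 'null' so missing values survive a
# MultiIndex groupby as ordinary keys.
filled = df.fillna("null")
grouped = filled.groupby(["orientation", "phase"], sort=False)["val"].sum()
print(grouped.loc[(0.0, "0.0")], grouped.loc[("null", "null")])  # 3 3
```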
```python
if val.replace('.', '').isdigit():  # checks if val is numeric
    val = eval(val)
elif val[0] == "[" and val[-1] == "]":  # checks if val is list
    val = tuple(eval(val))
```
This will break on the string "[foo]" or, for that matter, "['foo',;[]]", so you need try/catch here, which I know Chris didn't like, but I think it's necessary to catch any number of issues.
Those don't seem like valid entries for any of the fields; it should fail regardless of whether it passes an eval statement.
I don't see any tests that check for invalid entries, though?
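One way to get the try/catch being asked for is ast.literal_eval, which rejects anything that is not a Python literal. This is a sketch with a hypothetical helper name, not the code in the PR (which uses eval):

```python
import ast

def safe_eval_str(val):
    """Evaluate a string literal; return the value unchanged if it is
    not a string or not valid literal syntax (e.g. '[foo]')."""
    if not isinstance(val, str):
        return val
    try:
        return ast.literal_eval(val)
    except (ValueError, SyntaxError):
        return val

print(safe_eval_str("0.08"))        # 0.08
print(safe_eval_str("[0.0, 0.0]"))  # [0.0, 0.0]
print(safe_eval_str("[foo]"))       # [foo]  (not a literal; returned unchanged)
```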
```python
#

def eval_str(val):
```
There's already code for this in the codebase; search for literal_col_eval. Can that be repurposed? I know that function is in a specific module; maybe it can be moved/modified to a general util module.
I moved this function to brain_observatory/behavior/swdb/utilities.py
```python
if val.replace('.', '').isdigit():  # checks if val is numeric
    val = eval(val)
elif val[0] == "[" and val[-1] == "]":  # checks if val is list
    val = tuple(eval(val))
```
I don't think this is the right place to convert to tuple, this function should just eval. Conversion to tuple should be done outside of this function.
I separated this into two functions, since I think it makes the code better organized through modularization. However, pandas apply is not an efficient way to iterate through all the rows, and running it twice may add overhead.
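The split described above can be sketched as two passes (function names illustrative; eval swapped for ast.literal_eval here to keep the example safe):

```python
import ast
import pandas as pd

def eval_str(val):
    # First pass: turn string literals into Python values.
    if isinstance(val, str):
        try:
            return ast.literal_eval(val)
        except (ValueError, SyntaxError):
            return val
    return val

def listify_to_tuple(val):
    # Second pass: lists become tuples so values stay hashable.
    return tuple(val) if isinstance(val, list) else val

col = pd.Series(["0.08", "[0.0, 0.0]", 0.04])
out = col.apply(eval_str).apply(listify_to_tuple)
print(out.tolist())  # [0.08, (0.0, 0.0), 0.04]
```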
""" | ||
|
||
if isinstance(val, str): | ||
if val.replace('.', '').isdigit(): # checks if val is numeric |
I'm not sure why you need to check whether it is a number or a list ahead of time; can't you just call eval if it is a string? That should return the correct thing regardless.
There can be issues if a string of characters is passed into eval, as with the column stimulus_name.
The alternative is to specify which columns you want to eval. I thought coming up with a set of rules to decide which columns to eval, instead of explicitly naming a list of them, would make it easier to maintain.
```python
col_type_map).fillna(nonapplicable)

# eval str(numeric) and str(lists), convert lists to tuple for
# dict key compatibility
```
Please add an example here of what the current values in the dataframe are and why we need to call eval. I think that will be helpful.
The PR description above provides some examples along with the rationale for calling eval:
- stimulus_presentations['spatial_frequency'] contains numeric values of mixed str and float types, along with str(list)
- Unique values from the spatial_frequency column (current): ['0.08', '[0.0, 0.0]', '0.04', 0.08, 0.04, 0.32, 0.02, 0.16]
- The test in static_gratings expects float values for spatial_frequency (sfvals)

Here's a summary of the unique values in the dataframe, taken before the fix was applied:
```
color: [-1.0 1.0]
contrast: [0.8 1.0]
frame: [-1.0 0.0 1.0 ... 3597.0 3598.0 3599.0]
orientation: [0.0 30.0 45.0 60.0 90.0 120.0 135.0 150.0 180.0 225.0 270.0 315.0]
phase: ['0.0' '0.25' '0.5' '0.75' '[0.0, 0.0]' '[3644.93333333, 3644.93333333]' '[42471.86666667, 42471.86666667]']
size: ['[1920.0, 1080.0]' '[20.0, 20.0]' '[250.0, 250.0]' '[300.0, 300.0]']
spatial_frequency: [0.02 0.04 '0.04' 0.08 '0.08' 0.16 0.32 '[0.0, 0.0]']
temporal_frequency: [1.0 2.0 4.0 8.0 15.0]
x_position: [-40.0 -30.0 -20.0 -10.0 0.0 10.0 20.0 30.0 40.0]
y_position: [-40.0 -30.0 -20.0 -10.0 0.0 10.0 20.0 30.0 40.0]
```
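To make the mixed spatial_frequency column concrete, a small sketch (values copied from the dump above; ast.literal_eval used instead of eval) of collapsing the str/float mix into one comparable set:

```python
import ast

raw = [0.02, 0.04, '0.04', 0.08, '0.08', 0.16, 0.32, '[0.0, 0.0]']

def normalize(v):
    # Evaluate string literals; make lists hashable tuples.
    if isinstance(v, str):
        v = ast.literal_eval(v)
    return tuple(v) if isinstance(v, list) else v

unique = {normalize(v) for v in raw}
print(sorted(unique, key=str))  # [(0.0, 0.0), 0.02, 0.04, 0.08, 0.16, 0.32]
```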
Thanks for adding a description to the PR. I also think an inline description would be helpful since the data structure is extremely messy and unconventional
Force-pushed from aadebb4 to 989582d (compare)
```python
            1 - (1 / N)))
    return ls


def literal_col_eval(df: pd.DataFrame,
```
Here are the two utility functions added.
swdb is summer workshop of the dynamic brain. It doesn't belong here :)
Ah, I thought SWDB might've meant software DB. (:
How about creating a utilities.py in allensdk/core?
Sure :)
```python
].apply(naming_utilities.eval_str)


col_list = ["phase, size, spatial_frequency"]
```
I changed the logic here to eval/tuple by specifying columns instead of creating rules.
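A sketch of the column-list approach (helper name and frame are illustrative; the actual helper in the codebase is literal_col_eval):

```python
import ast
import pandas as pd

def eval_listed_cols(df, cols):
    # Only the named columns are evaluated, so plain-text columns such
    # as stimulus_name are never fed to literal_eval.
    for c in cols:
        df[c] = df[c].apply(
            lambda v: ast.literal_eval(v) if isinstance(v, str) else v
        )
    return df

df = pd.DataFrame({
    "phase": ["0.0", "[0.0, 0.0]"],
    "stimulus_name": ["static_gratings", "spontaneous"],
})
df = eval_listed_cols(df, ["phase"])
print(df["phase"].tolist())  # [0.0, [0.0, 0.0]]
```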
```python
from allensdk.brain_observatory.behavior.behavior_project_cache.\
    project_apis.data_io.project_cloud_api_base import ProjectCloudApiBase  # noqa: E501


def literal_col_eval(df: pd.DataFrame,
```
Moved this function to utilities.py and removed the default values for columns
- Please move the utility functions out of the swdb package, since that is for a workshop
- Please do not lint until AFTER review. It is near impossible to review with the linting included
Otherwise looks good.
```diff
@@ -2,19 +2,44 @@
 import pandas as pd
```
What changed in this file other than the linting? Why did you need to touch this file?
I think it happened when I rebased my PR onto the updated RC branch: there were merge conflicts I needed to resolve, which marked the file as touched, and my script that runs black on changed files then formatted it.
```diff
@@ -3,23 +3,44 @@
 from pynwb import NWBFile
```
What changed in this file?
I think it happened when I rebased my PR onto the updated RC branch: there were merge conflicts I needed to resolve, which marked the file as touched, and my script that runs black on changed files then formatted it.
Force-pushed from bc230f4 to d1c6980 (compare)
Commits:
- …resentations dtype to object before fillna
- …ndex to probe_channel_number
- change to double quotes for docstring lint; add inline notes for eval/tuple rationale; resolve merge conflicts
Force-pushed from d1c6980 to 25c055d (compare)
Just to follow up on the first issue: and it works.
This PR addresses three issues:
1. The _build_stimulus_presentations method has pandas operations that trigger errors: stimulus_presentations.fillna triggers an error when replacing pd.NA values with a string in a boolean-type column (opacity).
2. stimulus_presentations['spatial_frequency'] contains numeric values of mixed str and float types, along with str(list):
   ['0.08', '[0.0, 0.0]', '0.04', 0.08, 0.04, 0.32, 0.02, 0.16]
   ['0.02' '0.04' '0.08' '0.16' '0.32' '[0.0, 0.0]']
3. Rename local_index to probe_channel_number, as reported in ticket "'probe_channel_number' should replace 'local_index' in ecephys_session, line 1244 (_build_mean_waveforms)" #2573.