
Ticket/2594/dev #2609

Merged: 11 commits merged into rc/2.16.1 on Nov 28, 2022
Conversation

mikejhuang
Contributor

@mikejhuang mikejhuang commented Nov 16, 2022

This PR addresses three issues:

  1. The _build_stimulus_presentations method has pandas operations that trigger errors.
  2. stimulus_presentations['spatial_frequency'] contains numeric values of mixed str and float types, along with str(list) entries.
  3. Replace instances of local_index with probe_channel_number, as reported in ticket #2573 ("'probe_channel_number' should replace 'local_index' in ecephys_session, line 1244 (_build_mean_waveforms)").

@mikejhuang mikejhuang force-pushed the ticket/2594/dev branch 2 times, most recently from 5ae3a2a to a5135dd on November 17, 2022 00:01
Contributor

@morriscb morriscb left a comment

A few questions. One other issue is making sure that the data that is currently released and accessed via the ecephys_session objects can still be loaded from its NWBs.

@mikejhuang
Contributor Author

One other issue is making sure that the data that is currently released and accessed via the ecephys_session objects can still be loaded from its NWBs.

Does the ephys_session notebook do this? It currently runs through the notebook.

#


def eval_str(val):
Contributor

Does it make sense to add the type suggestions to this function?

Contributor Author

I don't think so, since it takes in any value of any type and returns any value of any type.

Contributor

But the return types are consistent at least.

@morriscb
Contributor

Does the ephys_session notebook do this? It currently runs through the notebook.

You'll have to check. As I said during sprint planning, you're about the first of the current Pikas to look at this code. My guess is yes, but it wouldn't hurt to double-check.


@mikejhuang
Contributor Author

mikejhuang commented Nov 18, 2022

You'll have to check. As I said during sprint planning, you're about the first of the current Pikas to look at this code. My guess is yes, but it wouldn't hurt to double-check.

I believe it does. I committed the re-run notebook. You can see the comparison here:
2020 pre-release run
current PR re-run

The table values are now either floating-point numbers or tuples instead of strings. Cell 17 gives a good overview of the differences.

The re-run is oddly cut off at cell 36 in this preview. It isn't cut off when I view it locally. Seems like it's something to do with the animation.jhtml.

Contributor

@aamster aamster left a comment

Looks good, I left some feedback

stimulus_presentations.fillna(nonapplicable, inplace=True)

# pandas does not automatically convert boolean cols for fillna
boolean_colnames = stimulus_presentations.dtypes[
Contributor

Pandas 1.1 introduced a dropna argument for groupby, which allows using NA values as keys. If this is not a possibility, then I guess this is OK.

I also don't know what the use case is here, but I'm surprised we want to group by a missing key.
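
For context, here is a minimal sketch of the dropna behaviour being suggested, using a toy dataframe (column names and values are hypothetical stand-ins for stimulus_presentations):

import numpy as np
import pandas as pd

# Toy frame standing in for stimulus_presentations (hypothetical values).
df = pd.DataFrame({
    "spatial_frequency": [0.04, np.nan, 0.04, np.nan],
    "duration": [0.25, 0.25, 0.5, 0.5],
})

# Default behaviour: rows whose group key is NaN are silently dropped.
print(df.groupby("spatial_frequency")["duration"].count())

# pandas >= 1.1: dropna=False keeps NaN as its own group key, so a sentinel
# fill value such as 'null' is no longer needed just for grouping.
print(df.groupby("spatial_frequency", dropna=False)["duration"].count())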

Contributor Author

Good call on dropna, although this would require a refactor since several parts of the code and notebook refer to the NA value as 'null'. I changed those to check for NaN instead.

Some of these old files can require a good amount of linting once they're touched.

Contributor Author

Actually, after doing more testing, I discovered one cell of the notebook had a change in results. Apparently, there's an unresolved pandas bug with dropna in the groupby function when used with MultiIndex groupings. pandas-dev/pandas#36470

I reverted everything back to 'null'.

if val.replace('.', '').isdigit():  # checks if val is numeric
    val = eval(val)
elif val[0] == "[" and val[-1] == "]":  # checks if val is list
    val = tuple(eval(val))
Contributor

This will break on the string "[foo]", or for that matter "['foo',;[]]", so you need a try/except here, which I know Chris didn't like, but I think it's necessary to catch any number of issues.
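
A minimal sketch of the try/except approach being suggested, using ast.literal_eval (the helper name safe_eval is hypothetical, not the PR's function):

import ast

def safe_eval(val):
    # Evaluate strings that parse as Python literals; pass everything else
    # through unchanged instead of raising.
    try:
        return ast.literal_eval(val)
    except (ValueError, SyntaxError):
        return val

safe_eval("0.04")          # -> 0.04
safe_eval("[0.0, 0.0]")    # -> [0.0, 0.0]
safe_eval("[foo]")         # -> "[foo]" returned unchanged instead of raising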

Contributor Author

Those don't seem like valid entries for any of the fields; it should fail regardless of whether it passes an eval statement.
I don't see any tests that check for invalid entries, though.

#


def eval_str(val):
Contributor

There's already code for this in the codebase; search for literal_col_eval. Can that be repurposed? I know that function is in a specific module; maybe it can be moved/modified into a general util module.

Contributor Author

I moved this function to brain_observatory/behavior/swdb/utilities.py

if val.replace('.', '').isdigit():  # checks if val is numeric
    val = eval(val)
elif val[0] == "[" and val[-1] == "]":  # checks if val is list
    val = tuple(eval(val))
Contributor

I don't think this is the right place to convert to a tuple; this function should just eval. Conversion to tuple should be done outside of this function.

Contributor Author

I separated this into two functions since I think it makes the code better organized through modularization. However, pandas apply is not an efficient way to iterate over all the rows, and running it twice may add overhead.

"""

if isinstance(val, str):
    if val.replace('.', '').isdigit():  # checks if val is numeric
Contributor

I'm not sure why you need to check whether it is a number or a list ahead of time; can't you just call eval if it is a string? That should return the correct thing regardless.

Contributor Author

There can be issues if a string of characters is passed into eval, as with the stimulus_name column.
The alternative is to specify which columns you want to eval. I thought coming up with a set of rules to decide which columns to eval, instead of explicitly naming a list of them, would make it easier to maintain.
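
A quick illustration of that failure mode, assuming a stimulus_name-like value (the strings below are hypothetical examples):

eval("0.25")                 # -> 0.25
eval("[1920.0, 1080.0]")     # -> [1920.0, 1080.0]
eval("natural_movie_one")    # -> NameError: name 'natural_movie_one' is not defined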

col_type_map).fillna(nonapplicable)

# eval str(numeric) and str(lists), convert lists to tuple for
# dict key compatibility
Contributor

Please add an example here of what the current values in the dataframe are and why we need to call eval. I think that will be helpful.

Contributor Author

@mikejhuang mikejhuang Nov 22, 2022

The PR description above provides some examples along with the rationale for calling eval.

Here's a summary of the unique values in the dataframe, taken before the fix was applied:

color: [-1.0 1.0]
contrast: [0.8 1.0]
frame: [-1.0 0.0 1.0 ... 3597.0 3598.0 3599.0]
orientation: [0.0 30.0 45.0 60.0 90.0 120.0 135.0 150.0 180.0 225.0 270.0 315.0]
phase: ['0.0' '0.25' '0.5' '0.75' '[0.0, 0.0]' '[3644.93333333, 3644.93333333]'
 '[42471.86666667, 42471.86666667]']
size: ['[1920.0, 1080.0]' '[20.0, 20.0]' '[250.0, 250.0]' '[300.0, 300.0]']
spatial_frequency: [0.02 0.04 '0.04' 0.08 '0.08' 0.16 0.32 '[0.0, 0.0]']
temporal_frequency: [1.0 2.0 4.0 8.0 15.0]
x_position: [-40.0 -30.0 -20.0 -10.0 0.0 10.0 20.0 30.0 40.0]
y_position: [-40.0 -30.0 -20.0 -10.0 0.0 10.0 20.0 30.0 40.0]

Contributor

Thanks for adding a description to the PR. I also think an inline description would be helpful, since the data structure is extremely messy and unconventional.

@aamster aamster mentioned this pull request Nov 22, 2022
@mikejhuang mikejhuang force-pushed the ticket/2594/dev branch 3 times, most recently from aadebb4 to 989582d on November 23, 2022 00:39
1 - (1 / N)))
return ls

def literal_col_eval(df: pd.DataFrame,
Contributor Author

Here are the two utility functions added.
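
For orientation, here is a minimal sketch of what the two helpers might look like. Only the literal_col_eval name and its df parameter appear in the diff context above; the second helper's name (df_list_to_tuple), both signatures, and the bodies are assumptions for illustration:

import ast
from typing import List

import pandas as pd


def literal_col_eval(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    # Sketch: evaluate str(number)/str(list) entries in the named columns.
    def eval_str(val):
        try:
            return ast.literal_eval(val) if isinstance(val, str) else val
        except (ValueError, SyntaxError):
            return val

    for col in columns:
        if col in df.columns:
            df[col] = df[col].apply(eval_str)
    return df


def df_list_to_tuple(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    # Sketch: convert list entries to tuples so values can be used as dict keys.
    for col in columns:
        if col in df.columns:
            df[col] = df[col].apply(
                lambda x: tuple(x) if isinstance(x, list) else x)
    return df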

Contributor

swdb is the Summer Workshop on the Dynamic Brain. It doesn't belong here :)

Contributor Author

Ah, I thought SWDB might've meant software db. (:
How about creating a utilities.py in allensdk/core?

Contributor

Sure :)

].apply(naming_utilities.eval_str)


col_list = ["phase, size, spatial_frequency"]
Contributor Author

I changed the logic here to eval/tuple by specifying columns instead of creating rules.
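
A minimal usage sketch of that column-specific approach; the column names come from the diff context above, and the helper names follow the sketch earlier in this thread (assumed, not the exact PR code):

col_list = ["phase", "size", "spatial_frequency"]  # columns known to hold str(number)/str(list)
stimulus_presentations = literal_col_eval(stimulus_presentations, columns=col_list)
stimulus_presentations = df_list_to_tuple(stimulus_presentations, columns=col_list)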

from allensdk.brain_observatory.behavior.behavior_project_cache.\
project_apis.data_io.project_cloud_api_base import ProjectCloudApiBase # noqa: E501


def literal_col_eval(df: pd.DataFrame,
Contributor Author

Moved this function to utilities.py and removed the default values for columns.

Contributor

@aamster aamster left a comment

  • Please move the utility functions out of the swdb package, since that is for a workshop
  • Please do not lint until AFTER review. It is near impossible to review with the linting included

Otherwise looks good.

@@ -2,19 +2,44 @@
import pandas as pd
Contributor

What changed in this file other than the linting? Why did you need to touch this file?

Contributor Author

I think it was when I rebased my PR onto the updated RC branch: there were merge conflicts that I needed to resolve, which marked the file as touched, and my script that runs black on changed files reformatted it.

@@ -3,23 +3,44 @@
from pynwb import NWBFile
Contributor

What changed in this file?

Contributor Author

I think it was when I rebased my PR onto the updated RC branch: there were merge conflicts that I needed to resolve, which marked the file as touched, and my script that runs black on changed files reformatted it.

@mikejhuang mikejhuang force-pushed the ticket/2594/dev branch 3 times, most recently from bc230f4 to d1c6980 on November 28, 2022 09:37
@mikejhuang mikejhuang merged commit 28e8497 into rc/2.16.1 Nov 28, 2022
@mikejhuang mikejhuang deleted the ticket/2594/dev branch November 28, 2022 20:18
@ZeroAda

ZeroAda commented Dec 4, 2023

Just to follow up on the first issue (the _build_stimulus_presentations method has pandas operations that trigger errors): I changed the line # stimulus_presentations.replace("", nonapplicable, inplace=True) in ecephys_session.py to the following:

bool_columns = stimulus_presentations.select_dtypes(include=['bool']).columns
stimulus_presentations[bool_columns] = stimulus_presentations[bool_columns].astype('object')
stimulus_presentations[bool_columns] = stimulus_presentations[bool_columns].fillna(nonapplicable)

and it works.
