Bug in validate method #86

Closed
zachdj opened this issue Apr 20, 2019 · 1 comment

@zachdj
Collaborator

zachdj commented Apr 20, 2019

After #76 landed, I get the following stack trace when evaluating models:

```
Traceback (most recent call last):
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zach/Develop/apollo/apollo/bin/evaluate.py", line 93, in <module>
    main()
  File "/home/zach/Develop/apollo/apollo/bin/evaluate.py", line 78, in main
    multioutput=multioutput)
  File "/home/zach/Develop/apollo/apollo/models/base.py", line 180, in validate
    axis=1, join='inner')
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 228, in concat
    copy=copy, sort=sort)
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 381, in __init__
    self.new_axes = self._get_new_axes()
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 448, in _get_new_axes
    new_axes[i] = self._get_comb_axis(i)
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 469, in _get_comb_axis
    sort=self.sort)
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/indexes/api.py", line 70, in _get_objs_combined_axis
    return _get_combined_index(obs_idxes, intersect=intersect, sort=sort)
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/indexes/api.py", line 115, in _get_combined_index
    index = index.intersection(other)
  File "/home/zach/anaconda3/envs/apollo/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 645, in intersection
    tz=result.tz, freq=None)
AttributeError: 'Index' object has no attribute 'tz'
```

It looks like the offending line is the pd.concat call in validate (apollo/models/base.py, line 180 in the traceback):

```python
matched = pd.concat([predictions, ground_truth], axis=1, join='inner')
```

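ground_truth is loaded with the apollo.datasets.ga_power.open_sqlite method, which relies on pandas' own timestamp parsing instead of our new function in apollo.timestamps.
As a result, the timestamps in the index of the DataArrays returned by open_sqlite carry no timezone info. The predictions' index is presumably timezone-aware after #76, so pd.concat(..., join='inner') ends up intersecting a timezone-aware index with a naive one, which is where DatetimeIndex.intersection falls over.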
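For anyone hitting the same error, here is a minimal sketch of one possible workaround: put both indexes in UTC before the inner concat. It assumes the naive ground-truth timestamps are already UTC wall-clock times, so they are labelled (tz_localize) rather than shifted. The helper to_utc_index and the toy DataFrames are hypothetical stand-ins, not apollo code, and this is not necessarily what the eventual fix does.

```python
import pandas as pd


def to_utc_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose DatetimeIndex is timezone-aware and in UTC.

    Assumption: a naive index already holds UTC wall-clock times, so it is
    labelled as UTC (tz_localize) rather than shifted (tz_convert).
    """
    out = df.copy()
    if out.index.tz is None:
        out.index = out.index.tz_localize('UTC')
    else:
        out.index = out.index.tz_convert('UTC')
    return out


# Hypothetical stand-ins: predictions with a UTC index (as presumably after #76)
# and ground truth with a naive index (as returned by open_sqlite).
times = pd.date_range('2019-04-20', periods=4, freq='D')
predictions = pd.DataFrame({'predicted': [1.0, 2.0, 3.0, 4.0]},
                           index=times.tz_localize('UTC'))
ground_truth = pd.DataFrame({'observed': [1.5, 2.5, 3.5]}, index=times[1:])

# With both indexes in UTC, the inner join lines up on the shared timestamps.
matched = pd.concat([to_utc_index(predictions), to_utc_index(ground_truth)],
                    axis=1, join='inner')
print(matched)
```

Localizing is only correct if the naive timestamps really are UTC; if they were local times, tz_localize would need that local zone instead.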

zachdj added the bug label Apr 20, 2019
zachdj changed the title "Bug in Model.validate" to "Bug in validate method" Apr 20, 2019
@zachdj
Collaborator Author

zachdj commented Apr 24, 2019

I ran into some trouble trying to fix this.
The apollo.datasets.ga_power.open_sqlite function calls pandas' read_sql_query:

```python
df = pd.read_sql_query(
    sql=query,
    con=connection,
    params=params,
    index_col='reftime',
    parse_dates=['reftime'])
```

This parses the 'reftime' column as dates and produces a pandas DataFrame with a DatetimeIndex (underlying dtype datetime64[ns]). Converting that DataFrame to an xarray Dataset works fine, but the Dataset's index is not timezone-aware.
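To see this in isolation, here is a self-contained sketch against a hypothetical in-memory SQLite table standing in for the GA Power database. The table name, columns, query, and values are made up, and reftime is assumed to hold Unix seconds (as the unit='s' used further down implies). With a plain list for parse_dates, pandas (as far as I can tell) treats integer columns as seconds since the epoch, and the resulting index has no timezone.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the GA Power database: 'reftime' holds Unix seconds.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE observations (reftime INTEGER, power REAL)')
connection.executemany(
    'INSERT INTO observations VALUES (?, ?)',
    [(1555718400, 1.5), (1555722000, 2.5)],  # 2019-04-20 00:00 and 01:00 UTC
)

query = 'SELECT reftime, power FROM observations WHERE power > ?'
params = (0,)

# Same call shape as in open_sqlite.
df = pd.read_sql_query(
    sql=query,
    con=connection,
    params=params,
    index_col='reftime',
    parse_dates=['reftime'])

print(df.index.dtype)  # a naive datetime64 dtype
print(df.index.tz)     # None: no timezone information survives
```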

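We can make the index of the pandas DataFrame timezone-aware by replacing the above call to read_sql_query with the following (the inner dict is passed as keyword arguments to pd.to_datetime, and unit='s' assumes reftime is stored as Unix seconds):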

```python
df = pd.read_sql_query(
    sql=query,
    con=connection,
    params=params,
    index_col='reftime',
    parse_dates={
        'reftime': {
            'utc': True,
            'unit': 's'
        }
    })
```

This creates a DataFrame with a timezone-aware DatetimeIndex (underlying dtype datetime64[ns, UTC]). But when that DataFrame is converted to xarray, the index gets mangled, because xarray doesn't handle a timezone-aware DatetimeIndex well: xarray is built on NumPy, and NumPy's datetime64 dtype has no notion of timezones. Check out darothen's comment on xarray issue #1490:

> But for some weird reason, numpy doesn't recognize its own datetime dtypes when they have timezone information.

So making the Datasets returned from apollo.datasets.ga_power timezone-aware, which was my original plan for fixing this issue, looks like it may run into some wrinkles.
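A small illustration of the wrinkle, using a hypothetical in-memory table and assuming reftime holds Unix seconds. It shows that pandas keeps the timezone in the dtype, that NumPy can only represent such an index as an object array of Timestamps, and one possible workaround (an assumption on my part, not necessarily what the final fix does): convert to UTC and drop the timezone with tz_convert(None), treating naive timestamps as UTC by convention. The exact conversion path inside xarray isn't reproduced here.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical in-memory table standing in for the GA Power database.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE observations (reftime INTEGER, power REAL)')
connection.executemany('INSERT INTO observations VALUES (?, ?)',
                       [(1555718400, 1.5), (1555722000, 2.5)])

# Timezone-aware variant: the inner dict is forwarded to pd.to_datetime.
df = pd.read_sql_query(
    sql='SELECT reftime, power FROM observations',
    con=connection,
    index_col='reftime',
    parse_dates={'reftime': {'utc': True, 'unit': 's'}})

# pandas keeps the timezone in the dtype (datetime64[..., UTC]) ...
print(df.index.dtype)

# ... but NumPy has no timezone-aware datetime64, so converting the index
# to a NumPy array falls back to an object array of pd.Timestamp values.
as_numpy = np.asarray(df.index)
print(as_numpy.dtype)  # object

# Possible workaround: convert to UTC and drop the timezone, keeping a
# native datetime64 index that NumPy (and xarray) handle well.
naive_utc = df.index.tz_convert(None)
print(naive_utc.dtype)  # a naive datetime64 dtype
```

The trade-off is that "naive means UTC" becomes a convention the code has to uphold everywhere, rather than something the dtype enforces.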

zachdj closed this as completed in 3f99cc9 Apr 26, 2019
zachdj added a commit that referenced this issue Apr 26, 2019: "Fix validation and target-loading bugs"