Drillhole.get_data() doesn't return values #275

Open
cardinalgeo opened this issue Dec 8, 2022 · 5 comments

cardinalgeo commented Dec 8, 2022

Hi everyone! I'm enjoying using geoh5py so far, though I've found a potential bug (described below). Any idea what's going on?
Best, Robert Collar

Environment data

OS (Windows | Mac | Linux distro) and version: Darwin arm64 21.3.0
Type of virtual environment used (N/A | venv | virtualenv | conda | ...): conda
python version: 3.9.13
geoh5py version: 0.4.0

Expected behavior

When I create a Drillhole object using, for example, well = Drillhole.create(workspace), add attributes (e.g., well.name = "Drillhole") and interval data, close the workspace, and then reopen the workspace (e.g., in a different jupyter notebook), I expect the interval data values to be retrievable.

Actual behavior

Instead, when I follow the steps above, the interval data values are None. Interestingly, if I initialize a Drillhole object and its attributes all at once (and then subsequently add the interval data like before), the interval data values are retrievable.

Steps to reproduce

In the example below, after the workspaces are closed and reopened, the interval data values are printed only for the first well; the second well prints None.

import numpy as np

from geoh5py.groups import DrillholeGroup
from geoh5py.objects import Drillhole
from geoh5py.workspace import Workspace

workspace1 = Workspace("../bug_mre1.geoh5")
workspace2 = Workspace("../bug_mre2.geoh5")

dh_group1 = DrillholeGroup.create(workspace1)
dh_group2 = DrillholeGroup.create(workspace2)

# Create a simple well
total_depth = 100
dist = np.linspace(0, total_depth, 10)
azm = np.ones_like(dist) * 45.
dip = np.linspace(-89, -75, dist.shape[0])
collar = np.r_[0., 10., 10]

# data intervals
from_to = np.vstack([
    [0.25, 25.5],
    [30.1, 55.5],
    [56.5, 80.2]
])

# create Drillhole object for first well
well1 = Drillhole.create(
    workspace1, collar=collar, surveys=np.c_[dist, azm, dip], name="Drillhole", parent=dh_group1
)

well1.add_data({
    "random_values": {
        "values": np.array([1, 1, 1],dtype="float64"),
        "from-to": from_to,
    }
})

# create Drillhole object for second well
well2 = Drillhole.create(workspace2)

well2.name = "Drillhole"
well2.parent = dh_group2
well2.collar = collar
well2.surveys = np.c_[dist, azm, dip]

well2.add_data({
    "random_values": {
        "values": np.array([1, 1, 1],dtype="float64"),
        "from-to": from_to,
    }
})

# inspect interval data for each well (`obj` avoids shadowing the builtin `object`)
for obj in workspace1.objects:
    if isinstance(obj, Drillhole):
        print(obj.get_data("random_values")[0].values)

for obj in workspace2.objects:
    if isinstance(obj, Drillhole):
        print(obj.get_data("random_values")[0].values)

# close and open workspaces
workspace1.close()
workspace2.close()

workspace1 = Workspace("../bug_mre1.geoh5")
workspace2 = Workspace("../bug_mre2.geoh5")

# re-inspect interval data for each well after reopening
for obj in workspace1.objects:
    if isinstance(obj, Drillhole):
        print(obj.get_data("random_values")[0].values)

for obj in workspace2.objects:
    if isinstance(obj, Drillhole):
        print(obj.get_data("random_values")[0].values)

# close workspaces
workspace1.close()
workspace2.close()

github-actions bot commented Dec 8, 2022

JIRA issue [GEOPY-723] was created.

@domfournier
Contributor

@cardinalgeo thanks for raising this. I agree the behaviour is not ideal. We are getting rid of the data after closing the file to force a refresh on re-opening. It has to do with how drillholes store data in the background. We can re-think this for sure, but for now, just leave your workspace open (or work within its context) to access the data values.
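The workaround above can be sketched as a small helper that reads values while the file is still open and copies them before it closes. This is only a sketch: `read_values_while_open` and its `open_workspace` argument are hypothetical names, the helper is duck-typed rather than tied to geoh5py, and it assumes the workspace can be used in a `with` block and exposes `objects` and `get_data` as in the reproduction script.

```python
from contextlib import AbstractContextManager
from typing import Any, Callable


def read_values_while_open(
    open_workspace: Callable[[], AbstractContextManager],
    data_name: str,
) -> list[Any]:
    """Collect the values of `data_name` from every object while the file is open.

    `open_workspace` is a hypothetical zero-argument callable that returns a
    context manager yielding the workspace (assumption: the workspace class
    supports `with`).
    """
    collected = []
    with open_workspace() as workspace:
        for obj in workspace.objects:
            # Duck-typed check so the helper has no hard dependency on geoh5py.
            if not hasattr(obj, "get_data"):
                continue
            for data in obj.get_data(data_name):
                values = data.values
                # Copy array-like values so they stay usable after the file closes.
                collected.append(values.copy() if hasattr(values, "copy") else values)
    return collected
```

With geoh5py this might be called as `read_values_while_open(lambda: Workspace("../bug_mre1.geoh5"), "random_values")`, assuming the installed version supports the context-manager form.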

@cardinalgeo
Author

Got it, thanks @domfournier! But why does the behavior differ between the two scenarios in the example? The data are not returned for workspace2/well2, where the attributes are added after initialization, but they are returned for workspace1/well1, where I initialize the Drillhole with all the attributes except the interval data.

domfournier commented Feb 15, 2023

Hi @cardinalgeo,

Just getting back to this, sorry for the obnoxiously long silence.

So turns out that you did something that we didn't see coming. In your example, well2 is first created as a standalone Drillhole object, which you then assign to a DrillholeGroupConcatenator (new since geoh5 v2 format).

The behavior of Drillholes and DrillholeConcatenated is quite different in terms of how attributes and values are stored, and the two can't be mixed. The simplest fix will be to prevent you from assigning a Drillhole to a DrillholeGroupConcatenator after creation, and force you to assign the parent on instantiation instead (like you did for well1).
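As an illustration of the kind of guard described above, here is a toy sketch in plain Python. It is not geoh5py code: every class and function name below is a hypothetical stand-in, and the only assumption taken from the thread is that the storage layout is decided by the parent given at creation and cannot be switched afterwards.

```python
class PlainGroup:
    """Toy container whose children store their own data."""


class ConcatenatorGroup:
    """Toy container that stores the data of all its children together."""


class StandaloneHole:
    """Toy drillhole with standalone storage."""

    def __init__(self, parent=None):
        self._parent = parent

    @property
    def parent(self):
        return self._parent

    @parent.setter
    def parent(self, new_parent):
        # Moving into a concatenating group would mix two storage layouts.
        if isinstance(new_parent, ConcatenatorGroup):
            raise TypeError(
                "Cannot move a standalone drillhole into a concatenating group; "
                "pass the group as `parent` at creation instead."
            )
        self._parent = new_parent


class ConcatenatedHole:
    """Toy drillhole whose storage is managed by its concatenating parent."""

    def __init__(self, parent):
        self.parent = parent


def create_hole(parent=None):
    """Pick the storage layout once, from the parent given at creation."""
    if isinstance(parent, ConcatenatorGroup):
        return ConcatenatedHole(parent)
    return StandaloneHole(parent)
```

In this sketch, `create_hole(ConcatenatorGroup())` mirrors the well1 pattern, while creating a hole first and then setting `hole.parent` to a `ConcatenatorGroup` (the well2 pattern) raises a `TypeError` instead of silently producing an object whose values cannot be read back.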

Hope this makes sense.

domfournier mentioned this issue Feb 15, 2023

@cardinalgeo
Author

Got it, thanks for clarifying @domfournier!
