
Pre-processing stage key error #6

Closed
Afshinshafei opened this issue Sep 21, 2022 · 6 comments

Comments

@Afshinshafei

First of all, thank you for your GREAT code, it is a real game changer.
I had an issue with the pre-processing stage: I ran it on the exact datasets used in the example code for this stage (13 days) and still got this error: KeyError "Unable to open object (object 'fields' doesn't exist)"

I don't know what causes this problem, because as far as I can tell the code and datasets should both be fine.

@TeunZoer

Hi everyone, I ran into the exact same error with the example code but have no idea how to solve it. Does anyone know what is causing it?

@Afshinshafei
Author

Afshinshafei commented Sep 29, 2022

I have been searching for more than a week for what could cause this error; the most common suggestion was data corruption, which I don't think applies in our case.
If anyone knows how to solve it, it would save us a lot of time. Thank you very much.

@Melissa3248

I ran into the same problem running the parallel_copy_small_set.py file, and I was able to fix the issue by adding two lines in the writetofile() function:

import os
import time

import h5py
import numpy as np
from mpi4py import MPI
from netCDF4 import Dataset as DS

def writetofile(src, dest, channel_idx, varslist, src_idx=0, frmt='nc'):
    if os.path.isfile(src):
        batch = 2**4
        rank = MPI.COMM_WORLD.rank
        Nproc = MPI.COMM_WORLD.size
        Nimgtot = 52  # src_shape[0]

        Nimg = Nimgtot//Nproc
        base = rank*Nimg
        end = (rank+1)*Nimg if rank<Nproc - 1 else Nimgtot
        idx = base

        # The two added lines: create the destination file and the
        # 'fields' object before any slice of it is written to.
        fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
        fdest['fields'] = np.empty((1,20,720,1440))

        for variable_name in varslist:

            if frmt == 'nc':
                fsrc = DS(src, 'r', format="NETCDF4").variables[variable_name]
            elif frmt == 'h5':
                fsrc = h5py.File(src, 'r')[varslist[0]]
            fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)

            start = time.time()
            while idx<end:
                if end - idx < batch:
                    if len(fsrc.shape) == 4:
                        ims = fsrc[idx:end,src_idx]
                    else:
                        ims = fsrc[idx:end]
                    print(ims.shape)
                    fdest['fields'][idx:end, channel_idx, :, :] = ims
                    break
                else:
                    if len(fsrc.shape) == 4:
                        ims = fsrc[idx:idx+batch,src_idx]
                    else:
                        ims = fsrc[idx:idx+batch]
                    print("ims shape", ims.shape)
                    fdest['fields'][idx:idx+batch, channel_idx, :, :] = ims
                    idx+=batch
                    # Progress estimate for this rank's slice of the work.
                    ttot = time.time() - start
                    eta = (end - base)/((idx - base)/ttot)
                    hrs = eta//3600
                    mins = (eta - 3600*hrs)//60
                    secs = (eta - 3600*hrs - 60*mins)

            ttot = time.time() - start
            hrs = ttot//3600
            mins = (ttot - 3600*hrs)//60
            secs = (ttot - 3600*hrs - 60*mins)
            channel_idx += 1

The two lines I added are:

    fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
    fdest['fields'] = np.empty((1,20,720,1440))

This initializes the destination file with a shape of (time steps, # features, latitude, longitude). The code then appends data for each time step, so in the 13-day example you obtain a dataset of size (52, 20, 720, 1440). Hope this fix works for you too!

@jdppthk
Contributor

jdppthk commented Oct 3, 2022

Thanks @Melissa3248. I would recommend pre-populating the empty HDF5 files before running the parallel_copy_small_set.py file. I will push a script to do that. It should just do the following:

time_steps = 52
with h5py.File('filename.h5', 'w') as f:
    f.create_dataset('fields', shape=(time_steps, 20, 720, 1440), dtype='f')

Look through the h5py docs for more details: https://docs.h5py.org/en/stable/high/dataset.html
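For completeness, a minimal single-process sketch of that pre-population step (the temporary file path is illustrative, and the real script writes in parallel with the mpio driver; here plain h5py is enough to show the idea):

```python
import os
import tempfile

import h5py

# Illustrative path; the real pipeline writes its own destination files.
path = os.path.join(tempfile.mkdtemp(), "filename.h5")
time_steps = 52  # the 13-day example yields 52 samples

# Create the 'fields' dataset up front, before any slice of it is written.
with h5py.File(path, "w") as f:
    f.create_dataset("fields", shape=(time_steps, 20, 720, 1440), dtype="f")

# writetofile() can then open the file in append mode and fill slices;
# the dataset already exists, so no KeyError is raised on lookup.
with h5py.File(path, "a") as f:
    print(f["fields"].shape)  # (52, 20, 720, 1440)
```

With default (contiguous, late-allocation) storage, creating the dataset keeps the file small on disk until data is actually written into it.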

@TeunZoer

TeunZoer commented Oct 4, 2022

Thanks @Melissa3248 and @jdppthk for your solutions. Melissa's solution (with a small adjustment to the empty NumPy array) only worked for the first variable; from the second variable ('v10') onwards I get the following error:

File "/home/teun/Documents/TUD/FourCastNet/FourCastNet-0.0.0/data_process/parallel_copy_small_set_Melissa.py", line 119, in <module>
    writetofile(src, dest, 1, ['v10'])
File "/home/teun/Documents/TUD/FourCastNet/FourCastNet-0.0.0/data_process/parallel_copy_small_set_Melissa.py", line 71, in writetofile
    fdest['fields'] = np.empty((16,20,721,1440))
File "/home/teun/miniconda3/lib/python3.9/site-packages/h5py/_hl/group.py", line 433, in __setitem__
    h5o.link(ds.id, self.id, name, lcpl=lcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

For me the pre-populating proposed by jdppthk worked!
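For anyone who hits the same OSError: in h5py, assigning `f['fields'] = np.empty(...)` *creates* a dataset, and HDF5 refuses to create a second object under the same name, which is why the assignment inside the per-variable loop fails from the second variable on. Writing slices into an already-existing dataset is fine. A minimal reproduction with small illustrative shapes and no MPI:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# First assignment creates the 'fields' dataset.
with h5py.File(path, "w") as f:
    f["fields"] = np.zeros((2, 3, 4, 4), dtype="f")

with h5py.File(path, "a") as f:
    try:
        # A second assignment tries to create it again and fails:
        # OSError: Unable to create link (name already exists)
        f["fields"] = np.zeros((2, 3, 4, 4), dtype="f")
    except OSError as e:
        print(e)

    # Writing into the existing dataset, by contrast, works.
    f["fields"][0, 0, :, :] = np.ones((4, 4), dtype="f")
```

This is why pre-creating the dataset once (as in the `create_dataset` snippet above) works for every variable, while re-assigning `fdest['fields']` does not.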

@jdppthk
Contributor

jdppthk commented Oct 4, 2022

Glad it works for you. Closing this issue.

@jdppthk jdppthk closed this as completed Oct 4, 2022