Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgltf_accessor_read_float and others should verify buffer boundaries for the index parameter to avoid crashes #110

Open
ghost opened this issue Mar 10, 2020 · 12 comments

Comments

@ghost
Copy link

ghost commented Mar 10, 2020

cgltf_accessor_read_float and the other functions of that type should verify the buffer boundaries. I get that this probably wasn't done for performance reasons, but just look at the typical workflow people will use:

  1. Get mesh indices via cgltf_accessor_read_index
  2. Get the mesh attribute of type cgltf_attribute_type_position
  3. Stuff the indices into cgltf_accessor_read_float to read actual position

Who is going to manually verify the mesh indices especially given the buffer accessor arithmetics are complicated enough that cgltf_accessor_read_float and such exist in the first place?

I think that leads to the obvious conclusion that the most common use case for cgltf_accessor_read_float involves the offset being directly input from whatever unchecked numbers written in the mesh indices data. This means a malformed model could easily crash, which I think is worth preventing.

(And I get that the library cannot reasonably prevent crashes due to errors on the engine side, but I think it should try to prevent crashes from malformed models. If only to make debugging broken models easier for artists, and to allow catching invalid load attempts sanely rather than everything exploding with a segmentation fault.)

@ghost ghost changed the title cgltf_accessor_read_float and others should verify buffer boundaries to avoid crashes cgltf_accessor_read_float and others should verify buffer boundaries for the index parameter to avoid crashes Mar 10, 2020
@prideout
Copy link
Contributor

👍

@jkuhlmann
Copy link
Owner

Have you seen the cgltf_validate() function? I'm pretty sure it validates the indices. In my opinion, that's also the better approach because you can verify that the data is valid if you're suspicious, but you don't have to pay the performance penalty.

@ghost
Copy link
Author

ghost commented Mar 12, 2020

Ah, neat! Is there a similar function to batch convert all sparse to non-sparse data before I even iterate over anything? What about some cgltf_option that says "slow, safe & comprehensive" and does all this for me, all the validating and converting? I get some engine designers especially towards the triple-A in-house end would rather have the fast loading, but I feel like for many middleware projects or lower spec games that don't mind loading screens such a switch might be nice to have.

Also, is it too much if I suggest a basic API reference in some more easier to browse format like doxygen? 😬 I know that adds a lot of work compared to just examples, but there is really a lot of functionality to miss right now

Edit:

something I just noticed, regarding this in cgltf_validate:

				if (indices && indices->buffer_view && indices->buffer_view->buffer->data)
				{
					cgltf_size index_bound = cgltf_calc_index_bound(indices->buffer_view, indices->offset, indices->component_type, indices->count);

					if (index_bound >= first->count)
					{
						return cgltf_result_data_too_short;
					}
				}

Why does it only check first and not all attributes? What if the other attribute buffers (UV coordinates, ...) contain less items than the first one? I mean in a regular model that would probably not be the case, but in an invalid one that might be, right? So it feels like this should actually be a loop on all attributes to check index_bound against. (Unless I missed something and this makes sense the way it is, then ignore my remark 😂 )

@zeux
Copy link
Contributor

zeux commented Mar 13, 2020

@etc0de This is because all accessors must have the same size, and this is checked before the code you quoted:

				for (cgltf_size k = 0; k < data->meshes[i].primitives[j].attributes_count; ++k)
				{
					if (data->meshes[i].primitives[j].attributes[k].data->count != first->count)
					{
						return cgltf_result_invalid_gltf;
					}
				}

@jkuhlmann
Copy link
Owner

To be honest, I don't really feel the need to add an option to automatically validate. Explicity calling cgltf_validate() seems more explicit and more obvious than having an option that defaults to false.

From your feedback, @etc0de, I understand though that better documentation and samples would be very helpful. So far, the top of cgltf.h was supposed to be the documentation in order to keep the "library" as self-contained as possible.
What was you approach so far in figuring out how to use the library?

@ghost
Copy link
Author

ghost commented Mar 16, 2020

Explicity calling cgltf_validate() seems more explicit and more obvious than having an option that defaults to false. Good point, but maybe it should default to true? I personally think it might be a good idea to default to safer than faster, but I understand both defaults would have their merit.

As for the documentation, it's mainly a time issue. The thing is, most other single header file loader libraries for images or audio or such have a much simpler interface: you just get out raw bytes, combined with amount of channels and the pixel or audio sample format, and that's kinda it. GLTF on the other hand is a very complex format with lots of parts that go together, so assembling a model into an OpenGL buffer just takes combining a lot of different things. Without any examples I need to scroll through the entire source to see what holds the info I need, because guessing this from the top area alone is not really feasible when not familiar with the format. Also, buffers with complex contents like polygons take a bit of searching around the code to find the info on how large each element inside is, how many elements they contain, etc. There's just more to dig up and combine than raw byte size of the overall buffer, to get an actually meaningful end result.

What was you approach so far in figuring out how to use the library?

I basically just read the source code a lot. It's nice and clean so that does work, it's really just that it adds up time-wise compared to a quick example that already shows the combination of things needed for a task in a way shorter amount of lines. (So far I got static geometry working with textures, UVs, and normals, I haven't gone into the rigging and animation part yet.)

In addition to examples, a doxygen-style generated API reference would help mainly for the structs such that people can click the members and jump to the respective definition immediately and explore the data structures better. Github's code view is just not very suitable for this, and while an IDE can help it takes a bit of effort to fire one up and load this into it (and some people may not use one). But I think the examples are way more important, a browsable doxygen thing would just be the cherry on top.

@NPatch
Copy link

NPatch commented Aug 2, 2020

This is because all accessors must have the same size, and this is checked before the code you quoted:

@zeux Can I ask why that is?

I'm hitting issues trying to write gltf files for photogrammetry models and at the moment, I'm writing out index ranges(submeshes really) as separate primitives with their own accessors so I can share a buffer view(seeing as all indices are the same type). But I keep hitting the index_bound check. Indices in my case are a total sum of the index counts of the primitive/accessor pairs which are all different in size.

@zeux
Copy link
Contributor

zeux commented Aug 2, 2020

@NPatch I thought this is required by the spec https://github.com/KhronosGroup/glTF/tree/master/specification/2.0 but I currently can not find the relevant reference... glTF validator does have the same check (see meshPrimitiveUnequalAccessorsCount in validator source). I'm going to file a spec clarification issue.

Note that the restriction here is on accessors, not buffer views - if each primitive has their own set of accessors, why can't you enforce that the count is the same?

@NPatch
Copy link

NPatch commented Aug 2, 2020

Yes I know the restriction is on accessors. I didn't phrase it well. I was talking about primitive->indices accessors.

I think you meant this:

Implementation note: Each primitive corresponds to one WebGL draw call (engines are, of course, free to batch draw calls). When a primitive's indices property is defined, it references the accessor to use for index data, and GL's drawElements function should be used. When the indices property is not defined, GL's drawArrays function should be used with a count equal to the count property of any of the accessors referenced by the attributes property (they are all equal for a given primitive).

I did figure out why I was hitting the count check. I was writing data out to binary files myself and used buffer->uri, but forgot to set buffer->data to NULL, so it defaulted to garbage high values and eventually hit the index count check at https://github.com/jkuhlmann/cgltf/blob/master/cgltf.h#L1416

Otherwise the gltf I'm creating passes all tests in glTF 2.0 validator without warnings for the accessors.

Btw I think it's possible to do indexing for UVs or Normals, like the OBJ spec allows( or just per-wedge uvs), to reduce file size, but that check for accessor.count would probably fail for that(positions.count != uvs.count). Although it's possible, not sure if an extension would be the preferred way.

@zeux
Copy link
Contributor

zeux commented Aug 2, 2020

Btw I think it's possible to do indexing for UVs or Normals, like the OBJ spec allows( or just per-wedge uvs), to reduce file size

glTF only allows a single index array.

@NPatch
Copy link

NPatch commented Aug 2, 2020

I was thinking you could expect pairs or triplets of indices as one and perhaps use the stride to hack access to them(if that doesn't violate any other tests that is)?! Not sure. Perhaps an extension would be better for clarity.

@zeux
Copy link
Contributor

zeux commented Aug 2, 2020

All of this is inconsistent with glTF specification, where each index is used to index all attribute accessors simultaneously. So yes that would require an extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants