Skip to content

Commit

Permalink
fix non commutative logic error for some mpi zero copy cases (#1055)
Browse files Browse the repository at this point in the history
  • Loading branch information
cyrush committed Dec 21, 2022
1 parent 09770be commit dc77d7e
Show file tree
Hide file tree
Showing 6 changed files with 126 additions and 23 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ and this project aspires to adhere to [Semantic Versioning](https://semver.org/s
- Fixes (correctness and performance) to `topology::unstructured::generate_offsets`
- Changed `Schema::has_path()` (and transitively `Node::has_path()` ) to ignore leading `/`s.

#### Relay
- Updated C++ and Python tutorial docs for Compatible Schemas with a new example to outline the most common use case.

### Fixed

#### Blueprint
Expand All @@ -33,6 +36,7 @@ and this project aspires to adhere to [Semantic Versioning](https://semver.org/s
#### Relay
- Leading `/`s in tree paths no longer undermine io::IOHandle reads for conduit_bin, json, conduit_json, conduit_base64_json, and yaml flavored files.
- Updated `conduit.relay.io.blueprint.{load_mesh|read_mesh} to only the read the necessary subset of root file entries. Updated MPI version to only read root file entries on rank 0 and broadcast them to other ranks.
- Fixed write compatibly check in `relay::mpi::gather`, `relay::mpi::all_gather`, and `relay::mpi::broadcast_using_schema`. Node compatible check is not commutative and checks in leaf zero-copy logic were reversed.



Expand Down
19 changes: 18 additions & 1 deletion src/docs/sphinx/tutorial_cpp_basics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -160,7 +160,24 @@ You can check if a Schema is compatible with another Schema using the **Schema::

- **If the calling Schema describes a List**: The passed test Schema must describe a List, the calling Schema must have at least as many children as the test Schema, and when compared in list order each of the test Schema's children must be compatible with the calling Schema's children.

- **If the calling Schema describes a leaf data type**: The calling Schema's and test Schema's **dtype().id()** and **dtype().element_bytes()** must match, and the calling Schema **dtype().number_of_elements()** must be greater than or equal than the test Schema's.
- **If the calling Schema describes a leaf data type**: The calling Schema's and test Schema's ``dtype().id()`` and ``dtype().element_bytes()`` must match, and the calling Schema ``dtype().number_of_elements()`` must be greater than or equal than the test Schema's.

Here is a C++ pseudocode example that shows the most common use of ``Node::compatible()``:

.. code:: cpp
conduit::Node a,b;
// In this example:
// the calling schema is `a.schema()`
// the test schema is `b.schema()`
// ask if `a` can already hold data described by `b`
if(a.compatible(b))
{
// data from `b` can be written to `a` without a new allocation
// ...
}
Node References
Expand Down
20 changes: 19 additions & 1 deletion src/docs/sphinx/tutorial_python_basics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -160,7 +160,25 @@ You can check if a Schema is compatible with another Schema using the **Schema::

- **If the calling Schema describes a List**: The passed test Schema must describe a List, the calling Schema must have at least as many children as the test Schema, and when compared in list order each of the test Schema's children must be compatible with the calling Schema's children.

- **If the calling Schema describes a leaf data type**: The calling Schema's and test Schema's **dtype().id()** and **dtype().element_bytes()** must match, and the calling Schema **dtype().number_of_elements()** must be greater than or equal than the test Schema's.
- **If the calling Schema describes a leaf data type**: The calling Schema's and test Schema's ``dtype().id()`` and ``dtype().element_bytes()`` must match, and the calling Schema ``dtype().number_of_elements()`` must be greater than or equal than the test Schema's.


Here is a Python pseudocode example that shows the most common use of ``Node.compatible()``:

.. code:: python
a = conduit.Node()
b = conduit.Node()
# In this example:
# the calling schema is `a.schema()`
# the test schema is `b.schema()`
# ask if `a` can already hold data described by `b`
if a.compatible(b) :
# data from `b` can be written to `a` without a new allocation
# ...
Expand Down
18 changes: 2 additions & 16 deletions src/libs/relay/conduit_relay_io_blueprint.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1980,14 +1980,7 @@ void read_mesh(const std::string &root_file_path,
{
// we have a problem, broadcast string message
// from rank 0 all ranks can throw an error
if(par_rank ==0)
{
n_global.set(error_oss.str());
}
else
{
n_global.reset();
}
n_global.set(error_oss.str());
conduit::relay::mpi::broadcast_using_schema(n_global,
0,
mpi_comm);
Expand All @@ -1998,14 +1991,7 @@ void read_mesh(const std::string &root_file_path,
{
// broadcast the mesh name and the bp index
// from rank 0 to all ranks
if(par_rank == 0)
{
n_global.set(mesh_name);
}
else
{
n_global.reset();
}
n_global.set(mesh_name);
conduit::relay::mpi::broadcast_using_schema(n_global,
0,
mpi_comm);
Expand Down
11 changes: 6 additions & 5 deletions src/libs/relay/conduit_relay_mpi.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -592,9 +592,9 @@ reduce(const Node &snd_node,
{

rcv_ptr = rcv_node.contiguous_data_ptr();


if( !snd_node.compatible(rcv_node) ||
// make sure `rcv_node` can hold data described by `snd_node`
if( !rcv_node.compatible(snd_node) ||
rcv_ptr == NULL ||
!rcv_node.is_compact() )
{
Expand Down Expand Up @@ -667,9 +667,9 @@ all_reduce(const Node &snd_node,


rcv_ptr = rcv_node.contiguous_data_ptr();


if( !snd_node.compatible(rcv_node) ||
// make sure `rcv_node` can hold data described by `snd_node`
if( !rcv_node.compatible(snd_node) ||
rcv_ptr == NULL ||
!rcv_node.is_compact() )
{
Expand Down Expand Up @@ -1617,7 +1617,8 @@ broadcast_using_schema(Node &node,
!(bcast_schema.dtype().is_empty() ||
bcast_schema.dtype().is_object() ||
bcast_schema.dtype().is_list() )
&& bcast_schema.compatible(node.schema()))
// make sure `node` can hold data described by `bcast_schema`
&& node.schema().compatible(bcast_schema))
{

bcast_data_ptr = node.contiguous_data_ptr();
Expand Down
77 changes: 77 additions & 0 deletions src/tests/relay/t_relay_mpi_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1227,6 +1227,83 @@ TEST(conduit_mpi_test, bcast_using_schema_non_empty_node)
}
}

//-----------------------------------------------------------------------------
TEST(conduit_mpi_test, reuse_of_node_as_input)
{
// this test tests out specifics with
// compatible schemas

int rank = mpi::rank(MPI_COMM_WORLD);
int com_size = mpi::size(MPI_COMM_WORLD);

int value = 0;
std::string msg;

if(rank == 0)
{
value = 1;
msg = "HI!";
}

Node n_local, n_global;
n_local.set(value);
relay::mpi::sum_all_reduce(n_local,
n_global,
MPI_COMM_WORLD);
value = n_global.as_int();

if(value == 1)
{

n_global.set(msg);
conduit::relay::mpi::broadcast_using_schema(n_global,
0,
MPI_COMM_WORLD);
msg = n_global.as_string();
}

EXPECT_EQ(value,1);
EXPECT_EQ(msg,"HI!");
}


//-----------------------------------------------------------------------------
TEST(conduit_mpi_test, reduce_compat_check)
{
// this test tests out specifics with
// compatible schemas

int rank = mpi::rank(MPI_COMM_WORLD);
int com_size = mpi::size(MPI_COMM_WORLD);

int value = 0;

if(rank == 0)
{
value = 1;
}

Node n_local, n_global;
n_local.set(value);
relay::mpi::sum_all_reduce(n_local,
n_global,
MPI_COMM_WORLD);
value = n_global.as_int();

// now all gather
relay::mpi::all_gather(n_local,
n_global,
MPI_COMM_WORLD);

n_global.print();
Node n_expected, info;
n_expected.append() = 1;
n_expected.append() = 0;

EXPECT_FALSE(n_global.diff(n_expected,info));

}

//-----------------------------------------------------------------------------
int main(int argc, char* argv[])
{
Expand Down

0 comments on commit dc77d7e

Please sign in to comment.