add node summary string methods #698

Merged
merged 4 commits into from Feb 6, 2021
Changes from 2 commits
11 changes: 8 additions & 3 deletions CHANGELOG.md
@@ -19,8 +19,8 @@ and this project aspires to adhere to [Semantic Versioning](https://semver.org/s
- Added support for using C++11 initializer lists to set Node and DataArray values from numeric arrays. See C++ tutorial docs (https://llnl-conduit.readthedocs.io/en/latest/tutorial_cpp_numeric.html#c-11-initializer-lists) for more details.
- Added a Node::describe() method. This method creates a new node that mirrors the current Node, however each leaf is replaced by summary stats and a truncated display of the values. For use cases with large leaves, printing the describe() output Node is much more helpful for debugging and understanding vs wall of text from other to_string() methods.
- Added conduit::utils::format methods. These methods use fmt to format strings that include fmt style patterns. The formatting arguments are passed as a conduit::Node tree. The `args` case allows named arguments (args passed as object) or ordered args (args passed as list). The `maps` case also supports named or ordered args and works in conjunction with a `map_index`. The `map_index` is used to fetch a value from an array, or list of strings, which is then passed to fmt. The `maps` style of indexed indirection supports generating path strings for non-trivial domain partition mappings in Blueprint. This functionality is also available in Python, via the `conduit.utils.format` method.
- Added DataArray::fill method, which set all elements of a DataArray to a given value.

- Added `DataArray::fill` method, which sets all elements of a DataArray to a given value.
- Added `Node::to_summary_string` methods, which create truncated strings that describe a node tree and let you control the max number of children and max number of elements shown.

#### Relay
- Added Relay IO Handle mode support for `a` (append) and `t` (truncate). Truncate allows you to overwrite files when the handle is opened. The default is append, which preserves prior IO Handle behavior.
@@ -43,8 +43,13 @@ and this project aspires to adhere to [Semantic Versioning](https://semver.org/s

### Removed

#### General
- Removed `Node::fetch_child` and `Schema::fetch_child` methods for v0.7.0. (Deprecated in v0.6.0 -- prefer `fetch_existing`)
- Removed `Schema::to_json` method variants with `detailed` for v0.7.0. (Deprecated in v0.6.0 -- prefer standard `to_json`)
- Removed `Schema::save` method variant with `detailed` for v0.7.0. (Deprecated in v0.6.0 -- prefer standard `save`)

#### Relay
- `conduit::relay::io_blueprint::save` methods were removed for v0.7.0. (Deprecated in v0.6.0)
- Removed `conduit::relay::io_blueprint::save` methods for v0.7.0. (Deprecated in v0.6.0 -- prefer `conduit::relay::io::blueprint::save_mesh`)



335 changes: 335 additions & 0 deletions src/libs/conduit/conduit_node.cpp
@@ -11981,6 +11981,341 @@ Node::to_string_default() const
return to_string();
}



//-----------------------------------------------------------------------------
// -- Summary string construction methods ---
//-----------------------------------------------------------------------------

//-----------------------------------------------------------------------------
std::string
Node::to_summary_string()const
{
Node opts;
return to_summary_string(opts);
}

//-----------------------------------------------------------------------------
std::string
Node::to_summary_string(const conduit::Node &opts)const
Member

Is the idea that the other conduit::Node::to_* methods are moving to an options conduit::Node approach? This seems like a break from convention otherwise.

Member Author

I would like to move in that direction; the number of options is getting large.

Member Author

added: #700 to track this

{
std::ostringstream oss;
to_summary_string_stream(oss,opts);
return oss.str();
}

//-----------------------------------------------------------------------------
void
Node::to_summary_string_stream(std::ostream &os,
const conduit::Node &opts) const
{
// unpack options and enforce defaults
index_t num_children_threshold = 7;
index_t num_elements_threshold = 5;
index_t indent = 2;
index_t depth = 0;
std::string pad = " ";
std::string eoe = "\n";

if(opts.has_child("num_children_threshold") &&
Member

I know we're using num_children and num_elements because they're standard Conduit vernacular, but could we use something a bit shorter than threshold since it isn't as standard? Maybe something short, like max?

Member Author

I used threshold b/c that is what I settled on for the DataArray class (mirrors numpy's param name)

opts["num_children_threshold"].dtype().is_number())
{
num_children_threshold = (index_t)opts["num_children_threshold"].to_int32();
}

if(opts.has_child("num_elements_threshold") &&
Member

Might it be useful to include a max/threshold on tree depth as well? I could see it being useful to get a broad idea of the basic building blocks of a schema. Not saying we need to do it for this PR, but something to think about.

Member Author

I agree, we should do that in the future.

Member Author

added #701 to track this

opts["num_elements_threshold"].dtype().is_number())
{
num_elements_threshold = (index_t)opts["num_elements_threshold"].to_int32();
}

if(opts.has_child("indent") &&
opts["indent"].dtype().is_number())
{
indent = (index_t)opts["indent"].to_int32();
}

if(opts.has_child("depth") &&
opts["depth"].dtype().is_number())
{
depth = (index_t)opts["depth"].to_int32();
}

if(opts.has_child("pad") &&
opts["pad"].dtype().is_string())
{
pad = opts["pad"].as_string();
}

if(opts.has_child("eoe") &&
opts["eoe"].dtype().is_string())
{
eoe = opts["eoe"].as_string();
}

to_summary_string_stream(os,
num_children_threshold,
num_elements_threshold,
indent,
depth,
pad,
eoe);
}
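
The option handling above follows one pattern: start from defaults and override a value only when the key is present with the expected type. Below is a minimal standalone sketch of that pattern. It does not use `conduit::Node`; the `SummaryOptions`, `SummaryParams`, `override_if_set`, and `unpack_options` names are hypothetical stand-ins, and the two typed maps are an assumption made so the example compiles with the standard library alone.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the options tree (conduit::Node is not used).
// Numeric and string entries live in separate maps, so a key stored under
// the wrong type is simply not found and the default survives.
struct SummaryOptions
{
    std::map<std::string, long>        numbers;
    std::map<std::string, std::string> strings;
};

// Resolved parameters, initialized to the defaults used in the diff.
struct SummaryParams
{
    long        num_children_threshold = 7;
    long        num_elements_threshold = 5;
    long        indent                 = 2;
    long        depth                  = 0;
    std::string pad                    = " ";
    std::string eoe                    = "\n";
};

// Overwrite `value` only when `key` is present in `entries`.
template <typename T>
void override_if_set(const std::map<std::string, T> &entries,
                     const std::string &key,
                     T &value)
{
    typename std::map<std::string, T>::const_iterator it = entries.find(key);
    if(it != entries.end())
    {
        value = it->second;
    }
}

// Unpack options and enforce defaults for anything missing or mistyped.
SummaryParams unpack_options(const SummaryOptions &opts)
{
    SummaryParams p;
    override_if_set(opts.numbers, "num_children_threshold", p.num_children_threshold);
    override_if_set(opts.numbers, "num_elements_threshold", p.num_elements_threshold);
    override_if_set(opts.numbers, "indent", p.indent);
    override_if_set(opts.numbers, "depth", p.depth);
    override_if_set(opts.strings, "pad", p.pad);
    override_if_set(opts.strings, "eoe", p.eoe);
    return p;
}
```

In this sketch an `indent` entry stored as a string is ignored, which is the same behavior the `is_number()` checks give in the PR code.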


//-----------------------------------------------------------------------------
//-- (private interface)
//-----------------------------------------------------------------------------
void
Node::to_summary_string_stream(const std::string &stream_path,
const conduit::Node &opts) const
{
std::ofstream ofs;
ofs.open(stream_path.c_str());
if(!ofs.is_open())
{
CONDUIT_ERROR("<Node::to_summary_string_stream> failed to open file: "
<< "\"" << stream_path << "\"");
}
to_summary_string_stream(ofs,opts);
ofs.close();
}

//-----------------------------------------------------------------------------
std::string
Node::to_summary_string_default() const
{
return to_summary_string();
}


//-----------------------------------------------------------------------------
//-- (private interface)
//-----------------------------------------------------------------------------
void
Node::to_summary_string_stream(std::ostream &os,
index_t num_children_threshold,
index_t num_elements_threshold,
index_t indent,
index_t depth,
const std::string &pad,
const std::string &eoe) const
{
// rubber, say hello to the road:

std::ios_base::fmtflags prev_stream_flags(os.flags());
os.precision(15);
if(dtype().id() == DataType::OBJECT_ID)
{
os << eoe;
int nchildren = m_children.size();
int threshold = num_children_threshold;

// if we are neg or zero, show all children
if(threshold <=0)
{
threshold = nchildren;
}

// if above threshold only show threshold # of values
int half = threshold / 2;
int bottom = half;
int top = half;
int num_skipped = m_children.size() - threshold;

//
// if odd, show 1/2 +1 first
//

if( (threshold % 2) > 0)
{
bottom++;
}

bool done = (nchildren == 0);
int idx = 0;

while(!done)
Member

The code in these loops is rather hard for me to follow. I think using the lambdas we're afforded by C++11 could make it much more legible, e.g.:

const auto to_summary = [] (const conduit::Node& n, ...)
{
    // ... //
}

for(index_t idx = 0; idx < bottom; idx++)
{
    to_summary(node);
}

// ... skipped print logic goes here ... //

for(index_t idx = nchildren - top; idx < nchildren; idx++)
{
    to_summary(node);
}

Just my 2 cents here; I don't mind keeping this as is for the sake of expediency.

Member Author

added #699 to track this.

Member Author

(I think this is a great idea)
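
For reference, here is a standalone sketch of the head/tail truncation with a lambda, along the lines suggested in this thread. It is not the PR's implementation: `summarize_children` is a hypothetical free function, it works on a flat list of names rather than Node children, and it does not recurse. The threshold arithmetic is meant to follow the `while` loop in the diff (an odd threshold gives the extra slot to the head).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: print at most `threshold` names, eliding the middle.
std::string summarize_children(const std::vector<std::string> &names,
                               int threshold)
{
    const int nchildren = static_cast<int>(names.size());

    // zero or negative threshold means "show all children"
    if(threshold <= 0)
    {
        threshold = nchildren;
    }

    // split the budget; if odd, the head gets the extra entry
    const int bottom      = threshold / 2 + threshold % 2;
    const int top         = threshold / 2;
    const int num_skipped = nchildren - threshold;

    std::ostringstream oss;
    // one place for the per-child output, shared by all loops below
    const auto emit = [&oss, &names](int idx)
    {
        oss << "- " << names[idx] << "\n";
    };

    // at or under the threshold: nothing to skip
    if(num_skipped <= 0)
    {
        for(int idx = 0; idx < nchildren; idx++)
        {
            emit(idx);
        }
        return oss.str();
    }

    for(int idx = 0; idx < bottom; idx++)
    {
        emit(idx);
    }

    oss << "... ( skipped " << num_skipped
        << (num_skipped == 1 ? " child )" : " children )") << "\n";

    for(int idx = nchildren - top; idx < nchildren; idx++)
    {
        emit(idx);
    }
    return oss.str();
}
```

With five names and a threshold of 3, this prints the first two entries, a skip line for two children, and the last entry.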

{
utils::indent(os,indent,depth,pad);
os << m_schema->object_order()[idx] << ": ";
m_children[idx]->to_summary_string_stream(os,
num_children_threshold,
num_elements_threshold,
indent,
depth+1,
pad,
eoe);

// if the child is a leaf, we need eoe
if(m_children[idx]->number_of_children() == 0)
os << eoe;

idx++;

if(idx == bottom && num_skipped > 0)
{
utils::indent(os,indent,depth,pad);
idx = nchildren - top;
os << "... ( skipped "
<< num_skipped;
if( num_skipped == 1)
{
os << " child )";
}
else
{
os << " children )";
}
os << eoe;
}

if(idx == nchildren)
{
done = true;
}
}
}
else if(dtype().id() == DataType::LIST_ID)
{
os << eoe;
int nchildren = m_children.size();
int threshold = num_children_threshold;

// if we are neg or zero, show all children
if(threshold <=0)
{
threshold = nchildren;
}

// if above threshold only show threshold # of values
int half = threshold / 2;
int bottom = half;
int top = half;
int num_skipped = m_children.size() - threshold;

//
// if odd, show 1/2 +1 first
//

if( (threshold % 2) > 0)
{
bottom++;
}

bool done = (nchildren == 0);
int idx = 0;

while(!done)
{
utils::indent(os,indent,depth,pad);
os << "- ";
m_children[idx]->to_summary_string_stream(os,
num_children_threshold,
num_elements_threshold,
indent,
depth+1,
pad,
eoe);

// if the child is a leaf, we need eoe
if(m_children[idx]->number_of_children() == 0)
os << eoe;

idx++;

if(idx == bottom && num_skipped > 0)
{
utils::indent(os,indent,depth,pad);
idx = nchildren - top;
os << "... ( skipped "
<< num_skipped;
if( num_skipped == 1)
{
os << " child )";
}
else
{
os << " children )";
}
os << eoe;
}

if(idx == nchildren)
{
done = true;
}
}
}
else // assume leaf data type
{
// if we are neg or zero, show full array
//
if(num_elements_threshold <= 0)
{
num_elements_threshold = dtype().number_of_elements();
}

switch(dtype().id())
{
// ints
case DataType::INT8_ID:
as_int8_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::INT16_ID:
as_int16_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::INT32_ID:
as_int32_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::INT64_ID:
as_int64_array().to_summary_string_stream(os,
num_elements_threshold);
break;
// uints
case DataType::UINT8_ID:
as_uint8_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::UINT16_ID:
as_uint16_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::UINT32_ID:
as_uint32_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::UINT64_ID:
as_uint64_array().to_summary_string_stream(os,
num_elements_threshold);
break;
// floats
case DataType::FLOAT32_ID:
as_float32_array().to_summary_string_stream(os,
num_elements_threshold);
break;
case DataType::FLOAT64_ID:
as_float64_array().to_summary_string_stream(os,
num_elements_threshold);
break;
// char8_str
case DataType::CHAR8_STR_ID:
os << "\""
<< utils::escape_special_chars(as_string())
<< "\"";
break;
// empty
case DataType::EMPTY_ID:
break;
}
}

os.flags(prev_stream_flags);
}
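
A note on the stream state handling in the function above, offered as an observation about the standard library rather than something raised in this PR: `std::ios_base::flags()` covers the format flags only, and precision is stored separately. Restoring the flags alone would therefore leave `precision(15)` set on the caller's stream. The sketch below shows one way to save and restore both; `StreamStateGuard` and `write_value` are hypothetical names and are not part of Conduit.

```cpp
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical RAII helper: records the format flags and precision of a
// stream and restores both when the guard goes out of scope.
class StreamStateGuard
{
public:
    explicit StreamStateGuard(std::ios_base &stream)
    : m_stream(stream),
      m_flags(stream.flags()),
      m_precision(stream.precision())
    {}

    ~StreamStateGuard()
    {
        m_stream.flags(m_flags);
        m_stream.precision(m_precision);
    }

    // non-copyable: exactly one guard owns each saved state
    StreamStateGuard(const StreamStateGuard &) = delete;
    StreamStateGuard &operator=(const StreamStateGuard &) = delete;

private:
    std::ios_base          &m_stream;
    std::ios_base::fmtflags m_flags;
    std::streamsize         m_precision;
};

// Example writer: formats with 15 digits of precision, then leaves the
// stream configured as it was found.
void write_value(std::ostream &os, double value)
{
    StreamStateGuard guard(os);
    os.precision(15);
    os << value;
}
```

A guard like this also restores the state on early returns, which a manual restore at the end of a long function does not.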

//-----------------------------------------------------------------------------
// -- JSON construction methods ---
//-----------------------------------------------------------------------------