[NanoAOD] Don't handle column types redundantly anymore #30436

Merged
merged 8 commits into from Jul 24, 2020
155 changes: 59 additions & 96 deletions DataFormats/NanoAOD/interface/FlatTable.h
@@ -3,12 +3,12 @@

#include "DataFormats/Math/interface/libminifloat.h"
#include "FWCore/Utilities/interface/Exception.h"

#include <boost/range/sub_range.hpp>
#include "FWCore/Utilities/interface/Span.h"

#include <cstdint>
#include <vector>
#include <string>
#include <type_traits>

namespace nanoaod {

@@ -17,7 +17,8 @@ namespace nanoaod {
struct MaybeMantissaReduce {
MaybeMantissaReduce(int mantissaBits) {}
inline T one(const T &val) const { return val; }
inline void bulk(boost::sub_range<std::vector<T>> data) const {}
template <typename Span>
inline void bulk(Span const &data) const {}
};
template <>
struct MaybeMantissaReduce<float> {
@@ -26,7 +27,8 @@ namespace nanoaod {
inline float one(const float &val) const {
return (bits_ > 0 ? MiniFloatConverter::reduceMantissaToNbitsRounding(val, bits_) : val);
}
inline void bulk(boost::sub_range<std::vector<float>> data) const {
template <typename Span>
inline void bulk(Span &&data) const {
Contributor

Why rvalue reference instead of

Suggested change
- inline void bulk(Span &&data) const {
+ inline void bulk(Span const& data) const {

?

Contributor Author

@guitargeek guitargeek Jul 8, 2020

I once read this blog post about move semantics:
https://eli.thegreenplace.net/2014/perfect-forwarding-and-universal-references-in-c/#id6

Specifically the part "Don't let T&& fool you here ...". So it's not necessarily an rvalue reference: for a deduced template parameter, T&& binds to lvalues and rvalues alike, and the deduced type records which of the two was passed. Therefore it appeared to me that it might be a good habit to use && by default for template-deduced arguments, especially in this class, where bulk is later called with an rvalue.
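
As a small illustration of the deduction rule described in this comment, the sketch below checks how a `Span &&` parameter is deduced for an lvalue and for a temporary. It is not code from the PR: `deducedAsLvalueRef`, `lvalueCase`, `rvalueCase` and `checkDeduction` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of the PR): reports how `Span` was deduced.
template <typename Span>
bool deducedAsLvalueRef(Span &&) {
  // Lvalue argument of type X: Span is deduced as X&, and Span&& collapses to X&.
  // Rvalue argument of type X: Span is deduced as X, and the parameter is X&&.
  return std::is_lvalue_reference<Span>::value;
}

inline bool lvalueCase() {
  std::vector<float> v{1.f, 2.f};
  return deducedAsLvalueRef(v);  // v is a named object, hence an lvalue
}

inline bool rvalueCase() {
  return deducedAsLvalueRef(std::vector<float>{1.f, 2.f});  // a temporary is an rvalue
}

// Exercises both cases; call it from main() to run the example.
inline void checkDeduction() {
  assert(lvalueCase());
  assert(!rvalueCase());
}
```

In neither case is the argument taken by value; the parameter is always a reference, and only its kind changes.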

Contributor

T&& is sometimes referred to as the universal reference.

Contributor Author

Maybe this isocpp blogpost is a more official reference for these universal references:
https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers

Contributor

Oh right, forgot that detail, never mind then.

Contributor

In this particular case I do not see a need for the universal reference as the type is directly interacted with instead of passed to another function via the use of std::forward.
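
A sketch of why a const reference is enough here, under the assumption that the argument is a non-owning iterator-pair view: the `const` applies to the view object, not to the elements it points at, and a const reference also binds to temporaries. `View`, `doubleAll`, `doubledCopy` and `checkDoubling` are hypothetical names, not part of the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a non-owning view; it only stores two iterators.
template <typename It>
struct View {
  It first, last;
  It begin() const { return first; }
  It end() const { return last; }
};

// Taking the view by const reference binds to temporaries as well as lvalues.
// Elements stay writable because begin() returns the stored (mutable) iterator.
template <typename Span>
void doubleAll(Span const &data) {
  for (auto it = data.begin(); it != data.end(); ++it)
    *it *= 2.f;
}

inline std::vector<float> doubledCopy(std::vector<float> values) {
  // The view is a temporary (an rvalue), yet it binds to `Span const &`.
  doubleAll(View<std::vector<float>::iterator>{values.begin(), values.end()});
  return values;
}

// Exercises the example; call it from main() to run it.
inline void checkDoubling() {
  std::vector<float> out = doubledCopy(std::vector<float>{1.f, 2.5f});
  assert(out.size() == 2);
  assert(out[0] == 2.f);
  assert(out[1] == 5.f);
}
```

Since nothing is forwarded onward with std::forward, the forwarding reference buys nothing over `Span const &` in this situation.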

Contributor Author

That's also true

Contributor Author

But it's such a minor change that I think I can also do it later in one go when I rebase this PR for the nanoAOD repository, or if you have more major comments, if this is okay.

Speaking about perfect forwarding, I recently noticed in the PR where I used FWCore/SOA that the SOA Table might make several unnecessary copies because even though std::forward is used, this is not matched by the corresponding universal reference for the argument, for example here (and also in other functions for the implementation details):
https://github.com/cms-sw/cmssw/blob/master/FWCore/SOA/interface/Table.h#L145

I didn't open an issue about this because I didn't think it was that important, but it's just something to keep in mind for the next time anyone works with FWCore/SOA.
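
The effect described in this comment can be reproduced in isolation. The sketch below is not the FWCore/SOA code (the linked Table.h is not shown here); it only assumes the general pattern of calling std::forward on a parameter that is not a forwarding reference. `Tracked`, `storeConstRef`, `storeForwarding`, `countsFor` and `checkCounts` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copied or moved.
struct Tracked {
  static inline int copies = 0;
  static inline int moves = 0;
  Tracked() = default;
  Tracked(Tracked const &) { ++copies; }
  Tracked(Tracked &&) noexcept { ++moves; }
};

// The parameter is a const lvalue reference, so std::forward can only yield a
// const lvalue and push_back copies, even when the caller passed a temporary.
template <typename T>
void storeConstRef(std::vector<Tracked> &out, T const &value) {
  out.push_back(std::forward<T const &>(value));
}

// The parameter is a forwarding reference, so std::forward restores the
// caller's value category and a temporary is moved instead of copied.
template <typename T>
void storeForwarding(std::vector<Tracked> &out, T &&value) {
  out.push_back(std::forward<T>(value));
}

// Returns {copies, moves} observed when storing one temporary.
inline std::pair<int, int> countsFor(bool useForwarding) {
  Tracked::copies = 0;
  Tracked::moves = 0;
  std::vector<Tracked> out;
  out.reserve(1);  // avoid reallocation so only the push_back itself is counted
  if (useForwarding)
    storeForwarding(out, Tracked{});
  else
    storeConstRef(out, Tracked{});
  return {Tracked::copies, Tracked::moves};
}

// Exercises both variants; call it from main() to run the example.
inline void checkCounts() {
  assert(countsFor(false) == std::make_pair(1, 0));  // one copy, no move
  assert(countsFor(true) == std::make_pair(0, 1));   // no copy, one move
}
```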

Contributor

> But it's such a minor change that I think I can also do it later in one go when I rebase this PR for the nanoAOD repository, or if you have more major comments, if this is okay.

Perfectly fine for me, it is minor indeed.

> I didn't open an issue about this because I didn't think it was that important, but it's just something to keep in mind for the next time anyone will work with FWCore/SOA.

The best way to remind us (core) is to open an issue :)

Contributor Author

Ok thanks!

if (bits_ > 0)
MiniFloatConverter::reduceMantissaToNbitsRounding(bits_, data.begin(), data.end(), data.begin());
}
@@ -35,11 +37,11 @@ namespace nanoaod {

class FlatTable {
public:
enum ColumnType {
FloatColumn,
IntColumn,
UInt8Column,
BoolColumn
enum class ColumnType {
Float,
Int,
UInt8,
Bool
}; // We could have other Float types with reduced mantissa, and similar

FlatTable() : size_(0) {}
@@ -65,21 +67,21 @@ namespace nanoaod {

/// get a column by index (const)
template <typename T>
boost::sub_range<const std::vector<T>> columnData(unsigned int column) const {
auto columnData(unsigned int column) const {
auto begin = beginData<T>(column);
return boost::sub_range<const std::vector<T>>(begin, begin + size_);
return edm::Span(begin, begin + size_);
}

/// get a column by index (non-const)
template <typename T>
boost::sub_range<std::vector<T>> columnData(unsigned int column) {
auto columnData(unsigned int column) {
auto begin = beginData<T>(column);
return boost::sub_range<std::vector<T>>(begin, begin + size_);
return edm::Span(begin, begin + size_);
}

/// get a column value for singleton (const)
template <typename T>
const T &columValue(unsigned int column) const {
const auto &columValue(unsigned int column) const {
if (!singleton())
throw cms::Exception("LogicError", "columnValue works only for singleton tables");
return *beginData<T>(column);
@@ -104,49 +106,45 @@ namespace nanoaod {
};
RowView row(unsigned int row) const { return RowView(*this, row); }

template <typename T, typename C = std::vector<T>>
void addColumn(const std::string &name,
const C &values,
const std::string &docString,
ColumnType type = defaultColumnType<T>(),
int mantissaBits = -1) {
template <typename T, typename C>
void addColumn(const std::string &name, const C &values, const std::string &docString, int mantissaBits = -1) {
if (columnIndex(name) != -1)
throw cms::Exception("LogicError", "Duplicated column: " + name);
if (values.size() != size())
throw cms::Exception("LogicError", "Mismatched size for " + name);
check_type<T>(type); // throws if type is wrong
auto &vec = bigVector<T>();
columns_.emplace_back(name, docString, type, vec.size());
columns_.emplace_back(name, docString, defaultColumnType<T>(), vec.size());
vec.insert(vec.end(), values.begin(), values.end());
if (type == FloatColumn) {
flatTableHelper::MaybeMantissaReduce<T>(mantissaBits).bulk(columnData<T>(columns_.size() - 1));
}
flatTableHelper::MaybeMantissaReduce<T>(mantissaBits).bulk(columnData<T>(columns_.size() - 1));
}

template <typename T, typename C>
void addColumnValue(const std::string &name,
const C &value,
const std::string &docString,
ColumnType type = defaultColumnType<T>(),
int mantissaBits = -1) {
void addColumnValue(const std::string &name, const C &value, const std::string &docString, int mantissaBits = -1) {
if (!singleton())
throw cms::Exception("LogicError", "addColumnValue works only for singleton tables");
if (columnIndex(name) != -1)
throw cms::Exception("LogicError", "Duplicated column: " + name);
check_type<T>(type); // throws if type is wrong
auto &vec = bigVector<T>();
columns_.emplace_back(name, docString, type, vec.size());
if (type == FloatColumn) {
vec.push_back(flatTableHelper::MaybeMantissaReduce<T>(mantissaBits).one(value));
} else {
vec.push_back(value);
}
columns_.emplace_back(name, docString, defaultColumnType<T>(), vec.size());
vec.push_back(flatTableHelper::MaybeMantissaReduce<T>(mantissaBits).one(value));
}

void addExtension(const FlatTable &extension);

template <class T>
struct dependent_false : std::false_type {};
template <typename T>
static ColumnType defaultColumnType() {
throw cms::Exception("unsupported type");
if constexpr (std::is_same<T, float>())
return ColumnType::Float;
else if constexpr (std::is_same<T, int>())
return ColumnType::Int;
else if constexpr (std::is_same<T, uint8_t>())
return ColumnType::UInt8;
else if constexpr (std::is_same<T, bool>())
return ColumnType::Bool;
else
static_assert(dependent_false<T>::value, "unsupported type");
}

// this below needs to be public for ROOT, but it is to be considered private otherwise
@@ -161,25 +159,36 @@ namespace nanoaod {

private:
template <typename T>
typename std::vector<T>::const_iterator beginData(unsigned int column) const {
const Column &col = columns_[column];
check_type<T>(col.type); // throws if type is wrong
return bigVector<T>().begin() + col.firstIndex;
auto beginData(unsigned int column) const {
return bigVector<T>().cbegin() + columns_[column].firstIndex;
}
template <typename T>
typename std::vector<T>::iterator beginData(unsigned int column) {
const Column &col = columns_[column];
check_type<T>(col.type); // throws if type is wrong
return bigVector<T>().begin() + col.firstIndex;
auto beginData(unsigned int column) {
return bigVector<T>().begin() + columns_[column].firstIndex;
}

template <typename T>
const std::vector<T> &bigVector() const {
throw cms::Exception("unsupported type");
auto const &bigVector() const {
return bigVectorImpl<T>(*this);
}
template <typename T>
std::vector<T> &bigVector() {
throw cms::Exception("unsupported type");
auto &bigVector() {
return bigVectorImpl<T>(*this);
}

template <typename T, class This>
static auto &bigVectorImpl(This &table) {
// helper function to avoid code duplication, for the two accessor functions that differ only in const-ness
if constexpr (std::is_same<T, float>())
return table.floats_;
else if constexpr (std::is_same<T, int>())
return table.ints_;
else if constexpr (std::is_same<T, uint8_t>())
return table.uint8s_;
else if constexpr (std::is_same<T, bool>())
return table.uint8s_;
else
static_assert(dependent_false<T>::value, "unsupported type");
}

unsigned int size_;
@@ -189,54 +198,8 @@ namespace nanoaod {
std::vector<float> floats_;
std::vector<int> ints_;
std::vector<uint8_t> uint8s_;

template <typename T>
static void check_type(FlatTable::ColumnType type) {
throw cms::Exception("unsupported type");
}
};

template <>
inline void FlatTable::check_type<float>(FlatTable::ColumnType type) {
if (type != FlatTable::FloatColumn)
throw cms::Exception("mismatched type");
}
template <>
inline void FlatTable::check_type<int>(FlatTable::ColumnType type) {
if (type != FlatTable::IntColumn)
throw cms::Exception("mismatched type");
}
template <>
inline void FlatTable::check_type<uint8_t>(FlatTable::ColumnType type) {
if (type != FlatTable::UInt8Column && type != FlatTable::BoolColumn)
throw cms::Exception("mismatched type");
}

template <>
inline const std::vector<float> &FlatTable::bigVector<float>() const {
return floats_;
}
template <>
inline const std::vector<int> &FlatTable::bigVector<int>() const {
return ints_;
}
template <>
inline const std::vector<uint8_t> &FlatTable::bigVector<uint8_t>() const {
return uint8s_;
}
template <>
inline std::vector<float> &FlatTable::bigVector<float>() {
return floats_;
}
template <>
inline std::vector<int> &FlatTable::bigVector<int>() {
return ints_;
}
template <>
inline std::vector<uint8_t> &FlatTable::bigVector<uint8_t>() {
return uint8s_;
}

} // namespace nanoaod

#endif
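
The header change above replaces per-type template specializations that threw at run time with a single `if constexpr` chain that rejects unsupported types at compile time. The standalone sketch below shows that pattern in isolation. It departs from the PR in a few labeled ways: the function is a free `constexpr` function rather than a static member, it uses `std::is_same_v`, and `checkMapping` is a hypothetical helper added for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum class ColumnType { Float, Int, UInt8, Bool };

// Always false, but dependent on T, so the static_assert below is only
// evaluated when the final else branch is actually instantiated.
template <class T>
struct dependent_false : std::false_type {};

// Assumption for this sketch: a free constexpr function instead of a static member.
template <typename T>
constexpr ColumnType defaultColumnType() {
  if constexpr (std::is_same_v<T, float>)
    return ColumnType::Float;
  else if constexpr (std::is_same_v<T, int>)
    return ColumnType::Int;
  else if constexpr (std::is_same_v<T, std::uint8_t>)
    return ColumnType::UInt8;
  else if constexpr (std::is_same_v<T, bool>)
    return ColumnType::Bool;
  else
    static_assert(dependent_false<T>::value, "unsupported type");
}

// Supported types are resolved entirely at compile time.
static_assert(defaultColumnType<float>() == ColumnType::Float);
static_assert(defaultColumnType<bool>() == ColumnType::Bool);
// defaultColumnType<double>() would fail to compile with "unsupported type".

// Runtime counterpart of the checks above; call it from main() to run the example.
inline void checkMapping() {
  assert(defaultColumnType<int>() == ColumnType::Int);
  assert(defaultColumnType<std::uint8_t>() == ColumnType::UInt8);
}
```

A plain `static_assert(false, ...)` in the last branch would be rejected by C++17 compilers even if the branch is never taken, which is why the condition is made dependent on `T`.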
28 changes: 16 additions & 12 deletions DataFormats/NanoAOD/src/FlatTable.cc
@@ -13,16 +13,20 @@ void nanoaod::FlatTable::addExtension(const nanoaod::FlatTable& other) {
throw cms::Exception("LogicError", "Mismatch in adding extension");
for (unsigned int i = 0, n = other.nColumns(); i < n; ++i) {
switch (other.columnType(i)) {
case FloatColumn:
addColumn<float>(other.columnName(i), other.columnData<float>(i), other.columnDoc(i), other.columnType(i));
case ColumnType::Float:
addColumn<float>(other.columnName(i), other.columnData<float>(i), other.columnDoc(i));
break;
case IntColumn:
addColumn<int>(other.columnName(i), other.columnData<int>(i), other.columnDoc(i), other.columnType(i));
case ColumnType::Int:
addColumn<int>(other.columnName(i), other.columnData<int>(i), other.columnDoc(i));
break;
case BoolColumn: // as UInt8
case UInt8Column:
addColumn<uint8_t>(other.columnName(i), other.columnData<uint8_t>(i), other.columnDoc(i), other.columnType(i));
case ColumnType::Bool:
addColumn<bool>(other.columnName(i), other.columnData<bool>(i), other.columnDoc(i));
break;
case ColumnType::UInt8:
addColumn<uint8_t>(other.columnName(i), other.columnData<uint8_t>(i), other.columnDoc(i));
break;
default:
throw cms::Exception("LogicError", "Unsupported type");
}
}
}
@@ -31,13 +35,13 @@ double nanoaod::FlatTable::getAnyValue(unsigned int row, unsigned int column) co
if (column >= nColumns())
throw cms::Exception("LogicError", "Invalid column");
switch (columnType(column)) {
case FloatColumn:
case ColumnType::Float:
return *(beginData<float>(column) + row);
case IntColumn:
case ColumnType::Int:
return *(beginData<int>(column) + row);
case BoolColumn:
return *(beginData<uint8_t>(column) + row);
case UInt8Column:
case ColumnType::Bool:
return *(beginData<bool>(column) + row);
case ColumnType::UInt8:
return *(beginData<uint8_t>(column) + row);
}
throw cms::Exception("LogicError", "Unsupported type");
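
The two switch statements above map a runtime column tag to a compile-time template argument, while the header routes both const and non-const access through one static helper. The sketch below combines the two ideas in a heavily reduced form. `MiniTable`, `add`, `storage` and `checkMiniTable` are hypothetical names, and the simplified `getAnyValue` signature is an assumption of this example, not the FlatTable interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

enum class ColumnType { Float, Int, UInt8, Bool };

class MiniTable {  // hypothetical, heavily reduced stand-in for FlatTable
  // One helper serves const and non-const callers: `This` is deduced as
  // `MiniTable` or `const MiniTable`, and the returned reference follows suit.
  template <typename T, class This>
  static auto &storage(This &table) {
    if constexpr (std::is_same_v<T, float>)
      return table.floats_;
    else if constexpr (std::is_same_v<T, int>)
      return table.ints_;
    else {
      static_assert(std::is_same_v<T, std::uint8_t> || std::is_same_v<T, bool>, "unsupported type");
      return table.uint8s_;  // bools share the uint8 storage
    }
  }

public:
  // Appends one value and returns its index within the storage for T.
  template <typename T>
  std::size_t add(T value) {
    auto &vec = storage<T>(*this);
    vec.push_back(value);
    return vec.size() - 1;
  }

  // Runtime type tag -> compile-time type: each case names the template argument.
  double getAnyValue(ColumnType type, std::size_t index) const {
    switch (type) {
      case ColumnType::Float:
        return storage<float>(*this)[index];
      case ColumnType::Int:
        return storage<int>(*this)[index];
      case ColumnType::Bool:
        return storage<bool>(*this)[index];
      case ColumnType::UInt8:
        return storage<std::uint8_t>(*this)[index];
    }
    return 0.;  // not reached for valid enumerators
  }

private:
  std::vector<float> floats_;
  std::vector<int> ints_;
  std::vector<std::uint8_t> uint8s_;
};

// Exercises the table; call it from main() to run the example.
inline void checkMiniTable() {
  MiniTable t;
  std::size_t f = t.add(1.5f);
  std::size_t b = t.add(true);
  std::size_t u = t.add(std::uint8_t{7});
  assert(t.getAnyValue(ColumnType::Float, f) == 1.5);
  assert(t.getAnyValue(ColumnType::Bool, b) == 1.0);
  assert(t.getAnyValue(ColumnType::UInt8, u) == 7.0);
}
```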
37 changes: 37 additions & 0 deletions FWCore/Utilities/interface/Span.h
@@ -0,0 +1,37 @@
#ifndef FWCore_Utilities_Span_h
#define FWCore_Utilities_Span_h

#include <cstddef>

namespace edm {
  /*
      *An edm::Span wraps begin() and end() iterators to a contiguous sequence
      of objects with the first element of the sequence at position zero.
      In other words, the iterators should refer to random-access containers.

      To be replaced with std::span in C++20.
      */

template <class T>
class Span {
public:
Span(T begin, T end) : begin_(begin), end_(end) {}

T begin() const { return begin_; }
T end() const { return end_; }

bool empty() const { return begin_ == end_; }
auto size() const { return end_ - begin_; }

auto const& operator[](std::size_t idx) const { return *(begin_ + idx); }

auto const& front() const { return *begin_; }
auto const& back() const { return *(end_ - 1); }

private:
const T begin_;
const T end_;
};
}  // namespace edm

#endif
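
A brief usage sketch for an iterator-pair span like the one added above. To stay self-contained it uses a trimmed copy in a hypothetical namespace `sketch` rather than including the CMSSW header. `sumSecondColumn` and `checkSpan` are made-up names, and the flat column-after-column layout is an assumption taken from how FlatTable stores its data.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

namespace sketch {
  // Trimmed copy of the iterator-pair wrapper, renamed to avoid clashing with edm::Span.
  template <class T>
  class Span {
  public:
    Span(T begin, T end) : begin_(begin), end_(end) {}

    T begin() const { return begin_; }
    T end() const { return end_; }
    auto size() const { return end_ - begin_; }
    auto const &operator[](std::size_t idx) const { return *(begin_ + idx); }

  private:
    const T begin_;
    const T end_;
  };
}  // namespace sketch

// Sums the second column of a flat buffer that holds `rows` values per column.
inline float sumSecondColumn(std::vector<float> const &flat, std::size_t rows) {
  assert(flat.size() >= 2 * rows);  // precondition: at least two full columns
  auto first = flat.cbegin() + rows;
  sketch::Span column(first, first + rows);  // C++17 class template argument deduction
  return std::accumulate(column.begin(), column.end(), 0.f);
}

// Exercises the example; call it from main() to run it.
inline void checkSpan() {
  std::vector<float> flat{1.f, 2.f, 3.f, 10.f, 20.f, 30.f};
  assert(sumSecondColumn(flat, 3) == 60.f);
}
```

Because the template parameter is the iterator type, `Span(begin, begin + size_)` needs no explicit template argument, which is what lets `columnData` return `auto` in the header.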