-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gcylon integration #470
Gcylon integration #470
Conversation
Added some initial reviews. In general, lets remove all the comment out code. |
cpp/src/examples/gcylon/gjoin.cpp
Outdated
#include <examples/gcylon/print.hpp> | ||
|
||
using std::cout; | ||
using std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is discouraged to use std::cout here and best practice is to use them at the code
cpp/src/examples/gcylon/gjoin.cpp
Outdated
|
||
auto mpi_config = std::make_shared<cylon::net::MPIConfig>(); | ||
auto ctx = cylon::CylonContext::InitDistributed(mpi_config); | ||
int myRank = ctx->GetRank(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should follow a consistant naming, this is a camel case naming while the other variable is with _
cpp/src/examples/gcylon/gjoin.cpp
Outdated
std::shared_ptr<cudf::table> tbl2 = constructTable(COLS, ROWS, start + 5, true); | ||
auto tv2 = tbl2->view(); | ||
cout << "myRank: " << myRank << ", initial dataframe. cols: "<< tv2.num_columns() << ", rows: " << tv2.num_rows() << endl; | ||
cout << "myRank: " << myRank << ", initial dataframe................................. " << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second print is redundant? this case and the above case
#include <cylon/net/mpi/mpi_communicator.hpp> | ||
|
||
using std::cout; | ||
using std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not use the using std::cout
auto ctx = cylon::CylonContext::InitDistributed(mpi_config); | ||
int myRank = ctx->GetRank(); | ||
|
||
LOG(INFO) << "myRank: " << myRank << ", world size: " << ctx->GetWorldSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example uses LOG while the gjoin.cpp example uses std::cout, lets use LOG
cudf::io::sink_info sinkInfo(outFile); | ||
cudf::io::csv_writer_options writerOptions = cudf::io::csv_writer_options::builder(sinkInfo, tv); | ||
cudf::io::write_csv(writerOptions); | ||
cout << myRank << ", written joined table to the file: " << outFile << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a mix of cout and LOG, I think we should stick to one
cpp/src/examples/gcylon/gshuffle.cpp
Outdated
std::shared_ptr<cudf::table> tbl = constructTable(COLS, rows); | ||
auto tv = tbl->view(); | ||
if (myRank == 0) { | ||
cout << "myRank: " << myRank << ", initial dataframe................................. " << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same commens as gjoin_csv.cpp for cout and log
} | ||
cout << "myRank: " << myRank << ", rows in shuffled df: "<< shuffledtv.num_rows() << endl; | ||
|
||
if (RESULT_TO_FILE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check is redundant
// this creates a new buffer, so less efficient | ||
// int32_t offsetBase = getScalar<int32_t>((uint8_t *)offsetsBuffer->data()); | ||
// if (offsetBase > 0) { | ||
// auto base = std::make_unique<cudf::numeric_scalar<int32_t>>(offsetBase, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets remove this
|
||
namespace gcylon { | ||
|
||
__global__ void rebaseOffsets(int32_t * arr, int size, int32_t base) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the indentation seems to be off
cpp/src/gcylon/utils/util.cpp
Outdated
|
||
namespace gcylon { | ||
|
||
bool equal(cudf::table_view & tv1, cudf::table_view & tv2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest a more informative name such as table_equal
python/pygcylon/setup.py
Outdated
extra_compile_args = os.popen("mpic++ --showme:compile").read().strip().split(' ') | ||
extra_link_args = os.popen("mpic++ --showme:link").read().strip().split(' ') | ||
extra_compile_args = extra_compile_args + extra_link_args + additional_compile_args | ||
# extra_compile_args = additional_compile_args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove thse
need to add test cases for gcylon. not yet to be merged.