Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gcylon integration #470

Merged
merged 62 commits into from
Aug 30, 2021
Merged

Gcylon integration #470

merged 62 commits into from
Aug 30, 2021

Conversation

ahmet-uyar
Copy link
Collaborator

need to add test cases for gcylon. not yet to be merged.

.github/workflows/conda-actions.yml Show resolved Hide resolved
cpp/src/examples/gcylon/CMakeLists.txt Show resolved Hide resolved
cpp/src/examples/gcylon/shuffle_perf.cpp Outdated Show resolved Hide resolved
cpp/src/gcylon/gtable.hpp Outdated Show resolved Hide resolved
cpp/src/gcylon/utils/util.cpp Outdated Show resolved Hide resolved
cpp/src/gcylon/utils/util.hpp Show resolved Hide resolved
@supunkamburugamuve
Copy link
Collaborator

Added some initial reviews. In general, lets remove all the comment out code.

#include <examples/gcylon/print.hpp>

using std::cout;
using std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is discouraged to use std::cout here and best practice is to use them at the code


auto mpi_config = std::make_shared<cylon::net::MPIConfig>();
auto ctx = cylon::CylonContext::InitDistributed(mpi_config);
int myRank = ctx->GetRank();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should follow a consistant naming, this is a camel case naming while the other variable is with _

std::shared_ptr<cudf::table> tbl2 = constructTable(COLS, ROWS, start + 5, true);
auto tv2 = tbl2->view();
cout << "myRank: " << myRank << ", initial dataframe. cols: "<< tv2.num_columns() << ", rows: " << tv2.num_rows() << endl;
cout << "myRank: " << myRank << ", initial dataframe................................. " << endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second print is redundant? this case and the above case

#include <cylon/net/mpi/mpi_communicator.hpp>

using std::cout;
using std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use the using std::cout

auto ctx = cylon::CylonContext::InitDistributed(mpi_config);
int myRank = ctx->GetRank();

LOG(INFO) << "myRank: " << myRank << ", world size: " << ctx->GetWorldSize();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example uses LOG while the gjoin.cpp example uses std::cout, lets use LOG

cudf::io::sink_info sinkInfo(outFile);
cudf::io::csv_writer_options writerOptions = cudf::io::csv_writer_options::builder(sinkInfo, tv);
cudf::io::write_csv(writerOptions);
cout << myRank << ", written joined table to the file: " << outFile << endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a mix of cout and LOG, I think we should stick to one

std::shared_ptr<cudf::table> tbl = constructTable(COLS, rows);
auto tv = tbl->view();
if (myRank == 0) {
cout << "myRank: " << myRank << ", initial dataframe................................. " << endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same commens as gjoin_csv.cpp for cout and log

}
cout << "myRank: " << myRank << ", rows in shuffled df: "<< shuffledtv.num_rows() << endl;

if (RESULT_TO_FILE) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check is redundant

// this creates a new buffer, so less efficient
// int32_t offsetBase = getScalar<int32_t>((uint8_t *)offsetsBuffer->data());
// if (offsetBase > 0) {
// auto base = std::make_unique<cudf::numeric_scalar<int32_t>>(offsetBase, true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets remove this


namespace gcylon {

__global__ void rebaseOffsets(int32_t * arr, int size, int32_t base) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the indentation seems to be off


namespace gcylon {

bool equal(cudf::table_view & tv1, cudf::table_view & tv2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a more informative name such as table_equal

extra_compile_args = os.popen("mpic++ --showme:compile").read().strip().split(' ')
extra_link_args = os.popen("mpic++ --showme:link").read().strip().split(' ')
extra_compile_args = extra_compile_args + extra_link_args + additional_compile_args
# extra_compile_args = additional_compile_args
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove thse

@ahmet-uyar ahmet-uyar merged commit 3c823f6 into cylondata:main Aug 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants