Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tutorials to use write_vtu_with_pvtu_record #8963

Merged
merged 5 commits into from Dec 1, 2019

Conversation

peterrum
Copy link
Member

@peterrum peterrum commented Oct 25, 2019

some of the comments still have to be updated...

FYI @nfehn

closes #8960

data_out.write_pvtu_record(pvtu_master_output, filenames);

std::ofstream visit_master_output(
(output_dir + filename_base + ".visit"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we want to do with this .visit file?

@peterrum peterrum changed the title Update tutorials to use write_vtu_with_pvtu_record [WIP] Update tutorials to use write_vtu_with_pvtu_record Oct 26, 2019
@tjhei
Copy link
Member

tjhei commented Oct 26, 2019

you can drop step-50, see #8948

examples/step-18/step-18.cc Outdated Show resolved Hide resolved
DataOutBase::write_pvd_record(pvd_output, times_and_names);
}
// Let us open a file and write the data we have generated into it:
data_out.write_vtu_with_pvtu_record(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened to timestep_no?

const std::string visit_master_filename =
("solution-" + Utilities::int_to_string(out_index, 5) + ".visit");
std::ofstream visit_master(visit_master_filename);
DataOutBase::write_visit_record(visit_master, filenames);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the visit record, do we care? @bangerth are you using .pvtu or .visit with visit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the .visit file. I think it is true that Visit can now read .pvd files, right?

}
// The next step is to write this data to disk.
data_out.write_vtu_with_pvtu_record(
"./", "solution-", cycle, 2, mpi_communicator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use MPI I/O here and explain it in a paragraph or so? Maybe use 8 or so?

@peterrum
Copy link
Member Author

@tjhei Thanks for your comments! I have integrated them. The only question is what happens with the visit record. In step-50, you have eliminated it...

@tjhei
Copy link
Member

tjhei commented Oct 26, 2019

I don't care about it, but I let Wolfgang comment if he needs them to use with visit. 😉

examples/step-18/step-18.cc Outdated Show resolved Hide resolved
@bangerth
Copy link
Member

bangerth commented Oct 26, 2019 via email

@peterrum
Copy link
Member Author

@bangerth I find it quite annoying to keep the .visit-stuff because for that we have to collect the filenames (which also happens within write_vtu_with_pvtu_record). The result is that the code will not be simpler as before...

Personally, I think there should be no write_vtu_with_pvtu_record() but there should be a DataOutFlags::Vtu (similarly as GridOutFlags::Vtu), which configures write_vtu() in regard of number of output files, .pvtu- and .visit-output (and other parameters). Or does anyone want to implement write_vtu_with_visit_record()?

@tjhei
Copy link
Member

tjhei commented Oct 26, 2019

I find it quite annoying to keep the .visit-stuff because for that we have to collect t

okay, so the current .visit master file format lists all individual files at each timestep:

!TIME 0
!TIME 1
!TIME 2
!TIME 3
!NBLOCKS 4
solution/solution-00000.0000.vtu
solution/solution-00000.0001.vtu
solution/solution-00000.0002.vtu
solution/solution-00000.0003.vtu
solution/solution-00001.0000.vtu
solution/solution-00001.0001.vtu
solution/solution-00001.0002.vtu
solution/solution-00001.0003.vtu
solution/solution-00002.0000.vtu
solution/solution-00002.0001.vtu
solution/solution-00002.0002.vtu
solution/solution-00002.0003.vtu
solution/solution-00003.0000.vtu
solution/solution-00003.0001.vtu
solution/solution-00003.0002.vtu
solution/solution-00003.0003.vtu

I just played around with this, and this works for me in visit as well:

!TIME 0
!TIME 1
!TIME 2
!TIME 3
!NBLOCKS 1
solution/solution-00000.pvtu
solution/solution-00001.pvtu
solution/solution-00002.pvtu
solution/solution-00003.pvtu

This means we can make a new function to generate a visit master record (like the .pvd) that doesn't need the individual filenames but references the .pvtus!

edit: @peterrum I can make these changes to the master record function, if you want. This means you can delete everything related to .visit files here for now.

@@ -1333,6 +1288,8 @@ namespace Step18
std::pair<double, std::string>(present_time, pvtu_master_filename));
std::ofstream pvd_output("solution.pvd");
DataOutBase::write_pvd_record(pvd_output, times_and_names);
std::ofstream visit_output("solution.visit");
DataOutBase::write_visit_record(visit_output, times_and_names);
Copy link
Member Author

@peterrum peterrum Oct 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjhei Like this? In the other visit-cases this does not work. Here it is used as an equivalent for .pvd, there is is used as an equivalent for .pvtu.

@peterrum
Copy link
Member Author

edit: @peterrum I can make these changes to the master record function, if you want. This means you can delete everything related to .visit files here for now.

@tjhei I have removed the .visit instances.

trailing dashes look ugly as the filename is "solution-_0001" etc.
@tjhei
Copy link
Member

tjhei commented Oct 29, 2019

I pushed a couple of minor changes on top of yours.

To explain the situation, there are two different kind of records:

  1. group pieces into a single time step (.pvtu can be read by visit and paraview and a .visit that can only be read by visit)
    => my vote would be to only write the .pvtu because:
    1. fewer files
    2. .pvtu is a standard, .visit is not
    3. less code in tutorials
    4. it is confusing to have two different kind of .visit files (see below)
  2. There are master records that contain several time steps with times (.pvd and .visit, can only be read by paraview and visit, respectively).
    => it turns out non of the tutorials were writing this file in visit format. I added it for step-18 here (I can look at other examples as well)

@bangerth are you okay with this?

@peterrum
Copy link
Member Author

@tjhei looks good! Thanks!

@peterrum
Copy link
Member Author

Is there a way to test the tutorials?

@tjhei
Copy link
Member

tjhei commented Nov 1, 2019

Is there a way to test the tutorials?

No, I am not aware of a way to do that easily.

// we have generated into it:
std::ofstream output(filename);
data_out.write_vtu(output);
// Let us open a file and write the data we have generated into it:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Let us open a file and write the data we have generated into it:
// Let us call a function that opens the necessary output files and writes the
// data we have generated into them. The function automatically constructs
// the file names from the given directory name (the first argument) and file
// name base (second argument). It augments the resulting string by pieces
// that result from the time step number and a "piece number" that corresponds
// to a part of the overall domain that can consist of one or more subdomains.
//
// The function also writes a record files (with suffix `.pvd`) for Paraview
// that describes how all of these output files combine into the data for
// this single time step:

std::ofstream output(filename);
data_out.write_vtu(output);
// Let us open a file and write the data we have generated into it:
const auto pvtu_master_filename = data_out.write_vtu_with_pvtu_record(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be explicit:

Suggested change
const auto pvtu_master_filename = data_out.write_vtu_with_pvtu_record(
const std::string pvtu_master_filename = data_out.write_vtu_with_pvtu_record(

@@ -1333,6 +1288,13 @@ namespace Step18
std::pair<double, std::string>(present_time, pvtu_master_filename));
std::ofstream pvd_output("solution.pvd");
DataOutBase::write_pvd_record(pvd_output, times_and_names);

std::ofstream visit_output("solution.visit");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::ofstream visit_output("solution.visit");
// The final piece of this is to also write a master record for Visit, in the same way as the
// call above already did for Paraview:
std::ofstream visit_output("solution.visit");

const std::string visit_master_filename =
("solution-" + Utilities::int_to_string(out_index, 5) + ".visit");
std::ofstream visit_master(visit_master_filename);
DataOutBase::write_visit_record(visit_master, filenames);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the .visit file. I think it is true that Visit can now read .pvd files, right?


std::ofstream visit_output("solution.visit");
static std::vector<std::pair<double, std::vector<std::string>>>
times_and_pieces;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of a static variable is really not very pleasant here because it means that you can't call the simulator twice in a row, or twice in parallel. I think it would be nicer to put this into a member variable instead.

But maybe we don't need this -- does Visit now read .pvd files?

DataOutBase::write_visit_record(visit_master, filenames);
}
static int out_index = 0;
data_out.write_vtu_with_pvtu_record(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with the static variable -- I recognize that that's preexisting, but if it were me, I'd say let's just make that a member variable with a more descriptive name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do the static variables in a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

// in parallel with the help of MPI-IO. Additionally a PVTU record is
// generated, which groups the written VTU files.
data_out.write_vtu_with_pvtu_record(
"./", "solution", cycle, 2, mpi_communicator, 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 8 a good number? We ran this program with up to 16k processors. I'd expect that with just 8 blocks we'd get into a rather bad bottleneck here!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just 8 blocks we'd get into a rather bad bottleneck here

It should not be a bottleneck. All 16k processes would write in parallel in 8 files via MPI-IO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is several considerations:

  1. Does writing the output scale? When I wrote the routines (a few years ago), I tested writing into a single output file from 1000+ cores and got excellent performance (10s of GB/s). I don't see a problem with this.
  2. Is this sensible for visualization? If you do visualization in serial, it doesn't matter. If you visualize in parallel, it might be helpful to have more than one file. With 8, you can at least run on 8 nodes. Maybe it makes sense to increase this number slightly? Not that it matters that much...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we used max(8,ceil(n_procs/128))? I mean, right now we write one file per processor; maybe we can find a compromise that at least makes sure we don't make this the one function that blocks everything if someone tried to run the program on 10k processors...

@tjhei
Copy link
Member

tjhei commented Nov 2, 2019

I'm using the .visit file. I think it is true that Visit can now read .pvd files, right?

Yes, I verified that this works. Did you read my comment above? #8963 (comment)

@bangerth
Copy link
Member

bangerth commented Nov 3, 2019

@tjhei: Thanks, I had missed that. Then let's not worry about .visit files any more from now on.

@peterrum
Copy link
Member Author

peterrum commented Nov 5, 2019

@tjhei @bangerth I have implemented the suggestions. However, I have decided against max(8,ceil(n_procs/128)). Personally, I think everything above 1 is not really useful. This should be handled by a suitable configuration of MPI I/O according to the GPFS at hand. We show here an example, where we set the value to an arbitrary value (8), but actually it should be by default 1.

@tjhei
Copy link
Member

tjhei commented Nov 5, 2019

Personally, I think everything above 1 is not really useful.

It should be helpful if you use parallel visualization (paraview server for example).

but actually it should be by default 1.

from the write performance point of view? Likely, yes.

@bangerth: maybe we can find a compromise that at least makes sure we don't make this the one function that blocks everything if someone tried to run the program on 10k processors...

I have to admit that I haven't made a performance comparison with the number of files and maybe we should test before making any more guesses (especially complicated ones)? I am fairly certain that 1 file is way more performant than anything else. I don't see why 8 files would "block everything" in a large run.

To summarize: IO is hard for large runs and us guessing is probably not a good strategy here. What do we want to achieve here: the best performance, something instructional, or something safe?

@peterrum
Copy link
Member Author

peterrum commented Nov 5, 2019

It should be helpful if you use parallel visualization (paraview server for example).

I see! Thanks for the info!

@peterrum
Copy link
Member Author

Anything missing?

@masterleinad
Copy link
Member

/rebuild

Copy link
Member

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move forward here.

@masterleinad masterleinad merged commit c5626c5 into dealii:master Dec 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update tutorials step-37/18/32/45/55/48/42/64/50/40 to recent changes in PR #8904
4 participants