
Construct a distributed parallel PUMI mesh from own file format #245

Open · a-jp opened this issue Aug 29, 2019 · 22 comments

@a-jp (Contributor) commented Aug 29, 2019

I am very new to PUMI. I am working with mixed cell type unstructured meshes, and using a file format not supported by PUMI. I was told via email that it would be ok to use the issue tracker to ask questions.

I have a mesh format that I can interrogate in parallel from disk, but that format is not directly supported by PUMI. I read 1/nprocs worth of cells on each processor and the associated vertex information for those cells. I have so far created an apf::Mesh2 object by calling apf::buildElement for each cell of a mixed cell-type unstructured mesh that is read on each processor. At the end of this process, each processor has a simply partitioned view of the full mesh (approximately 1/nprocs worth). However, these meshes are completely disconnected as far as PUMI is concerned. I want to reconnect them in parallel. I then want to parallel re-partition the mesh using parmetis to give a better partitioning. My mesh is stored on disk in a single file with global numbers, not in multiple part meshes. I also have no geom definition, just the discrete mesh.

Basically my workflow is:

1. simply partition my mesh on disk by reading parallel chunks of it onto nprocs ranks (done this);
2. build a mesh that PUMI understands from my own file format (done this; I have built an apf::Mesh2);
3. build a parallel view of the distributed mesh by connecting it back together (don't know how to do this);
4. re-partition the mesh in parallel, this time properly, using parmetis etc. via PUMI (don't know how to do this);
5. build ghost/halo cells for the distributed mesh on physical and partition boundaries (don't know how to do this).

From there I'd like to associate scalars and vectors with the vertices, faces, and cells and run my solver. I would then like to mark the vertices, faces, and cells into groups appropriate for my simulation, such as cell groupings and boundary conditions.

I would really appreciate it if someone could give me a steer on my next step, which is stitching the mesh together in parallel. Ideally I do not want to approach this by writing out part meshes, or by having a single processor read the whole mesh and then distribute it. The workflow above is an attempt to have a parallel process from the start, at the cost of partitioning twice: once simply, then re-partitioning properly.

Can I achieve what I'm trying to do within PUMI, or with a variant of the above? If so, I'd really appreciate some help. I appreciate this would all be easier if my meshes were in a file format PUMI can read, but I am not in that position, hence the workflow above.

Thanks,
Andy

@cwsmith (Contributor) commented Aug 29, 2019

Hello Andy,

What you described can be done with PUMI.

I'm pasting some points from the emails 'for the record'; the questions are good ones and may be helpful to others in the future.

Constructing a Parallel Mesh

construct(...) API

I'd recommend using the 'construct' API to manage the serial to parallel mesh construction.

core/apf/apfConvert.h, lines 34 to 49 in fe28bfd:

```c++
/** \brief construct a mesh from just a connectivity array
  \details this function is here to interface with very
  simple mesh formats. Given a set of elements described
  only in terms of the ordered global ids of their vertices,
  this function builds a reasonable apf::Mesh2 structure
  and as a side effect returns a map from global ids
  to local vertices.
  This is a fully scalable parallel mesh construction
  algorithm, no processor incurs memory or runtime costs
  proportional to the global mesh size.
  Note that all vertices will have zero coordinates, so
  it is often good to use apf::setCoords after this. */
void construct(Mesh2* m, const int* conn, int nelem, int etype,
    GlobalToVert& globalToVert);
```

It reads mesh connectivity information via arrays, in serial or parallel, to create a partitioned PUMI mesh. If your current reader has the logic to read 1/N of the elements from the serial file, then you are well on your way to getting this working. A few examples of its use are below:

```c++
apf::construct(m, conn, nelem, etype, outMap);
apf::construct(mesh, m.elements, m.numElms, m.elementType, outMap);
```
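Putting it together, a minimal end-to-end sketch: readMyChunk(...) and its outputs are hypothetical stand-ins for your parallel reader, while the rest is the apf/gmi/PCU API. Note that construct takes a single element type per call (the mixed-type case is discussed below).

```c++
#include <mpi.h>
#include <PCU.h>
#include <gmi_null.h>
#include <apf.h>
#include <apfMDS.h>
#include <apfMesh2.h>
#include <apfConvert.h>

// Hypothetical reader: fills this rank's 1/N chunk from the custom format.
void readMyChunk(int** conn, int* nelem, double** coords, int* nverts);

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  PCU_Comm_Init();
  gmi_register_null();
  // empty 2D MDS mesh on a null geometric model (no geom definition needed)
  apf::Mesh2* m = apf::makeEmptyMdsMesh(gmi_load(".null"), 2, false);

  int nelem, nverts;
  int* conn;      // nelem * (verts per element) ordered global vertex ids
  double* coords; // xyz of the vertices this rank read
  readMyChunk(&conn, &nelem, &coords, &nverts);

  apf::GlobalToVert outMap;
  apf::construct(m, conn, nelem, apf::Mesh::QUAD, outMap);
  apf::setCoords(m, coords, nverts, outMap); // vertices start at (0,0,0)
  apf::verify(m);

  m->destroyNative();
  apf::destroyMesh(m);
  PCU_Comm_Free();
  MPI_Finalize();
}
```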

Manual Definition of Remote Copies

The alternative is to manually define the connections between shared mesh entities. A 'shared' entity in an element-based mesh partition is an entity that exists on multiple processes to form the closure of the mesh elements assigned to those processes. Each shared entity in an APF/PUMI mesh knows about the other copies of that entity to facilitate communication.

The following code from the construct(...) implementation does this:

core/apf/apfConstruct.cc, lines 152 to 155 in d9eeb2b:

```c++
constructResidence(m, globalToVert);
constructRemotes(m, globalToVert);
stitchMesh(m);
m->acceptChanges();
```

Ghosts

The following code creates ghost entities:

core/test/pumi.cc, lines 786 to 788 in fe28bfd:

```c++
Ghosting* ghosting_plan = getGhostingPlan(m);
int before_mcount = pumi_mesh_getNumEnt(m, mesh_dim);
pumi_ghost_create(m, ghosting_plan);
```

Note, it is using an alternative, higher-level API with the pumi_ prefix. I'm not as familiar with moving back and forth between pumi_ objects and apf objects. When we get to that point of the implementation, @seegyoung may be able to point us at an example of the conversion syntax.

Partitioning

There are a few partitioning utilities provided with core:

- split - recursive inertial bisection
- zsplit - multi-level graph methods via Zoltan/ParMETIS
- ptnParma - combines partitioning with load balancing (partition improvement): https://github.com/SCOREC/core/wiki/Partitioning

Some useful discussion of these tools is in the following issue, starting here:
#241 (comment)
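For the in-place parmetis re-partitioning step in your workflow, a sketch along the lines of test/zsplit.cc, using the Zoltan interface in apfZoltan.h; the weight tag and tolerance here are illustrative:

```c++
#include <apfZoltan.h>
#include <apfPartition.h>
#include <apfMesh2.h>

// Repartition in place with a graph method (ParMETIS under Zoltan).
void repartition(apf::Mesh2* m, apf::MeshTag* weights) {
  apf::Splitter* splitter =
      apf::makeZoltanSplitter(m, apf::GRAPH, apf::PARTITION, /*debug=*/false);
  // factor 1: keep the same number of parts, just improve the decomposition
  apf::Migration* plan = splitter->split(weights, /*tolerance=*/1.05, 1);
  delete splitter;
  m->migrate(plan); // migrate() takes ownership of the plan
}
```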

Associating Data with Mesh Entities

Tags

Storing values with mesh entities is most simply done using 'tags'. Here is an example that creates, and sets, one double per mesh element:

core/test/elmBalance.cc, lines 17 to 26 in fe28bfd:

```c++
apf::MeshTag* setWeights(apf::Mesh* m) {
  apf::MeshIterator* it = m->begin(m->getDimension());
  apf::MeshEntity* e;
  apf::MeshTag* tag = m->createDoubleTag("parma_weight", 1);
  double w = 1.0;
  while ((e = m->iterate(it)))
    m->setDoubleTag(e, tag, &w);
  m->end(it);
  return tag;
}
```
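As a usage sketch, that tag can then be fed to a ParMA element balancer, as the rest of test/elmBalance.cc does; the tolerance value here is illustrative:

```c++
#include <parma.h>
#include <apf.h>

apf::MeshTag* weights = setWeights(m);
apf::Balancer* balancer = Parma_MakeElmBalancer(m);
balancer->balance(weights, /*tolerance=*/1.05);
delete balancer;
// clean up: detach the tag from the elements, then destroy it
apf::removeTagFromDimension(m, weights, m->getDimension());
m->destroyTag(weights);
```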

Fields

The field APIs (e.g., the 'createField' you found) provide additional functionality. This is an area of the code that I'm less familiar with; I'd suggest creating a separate issue to ask field-related questions. Below are a few utilities/examples that use fields.

- https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/fieldReduce.cc creates a field with one double per mesh vertex, then exercises functions that accumulate values on shared mesh vertices and take max/min reductions over the shared vertices.
- https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/field_io.cc creates a field with one vector (3 doubles) per mesh vertex, writes it to disk with the mesh, then reads the mesh back and gets the handle/pointer to the existing field.
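A minimal field sketch (the field name and values are placeholders; createLagrangeField, setScalar, and synchronize are the standard apf calls):

```c++
#include <apf.h>
#include <apfMesh2.h>

// One scalar per mesh vertex; then make shared copies consistent.
void attachScalar(apf::Mesh2* m) {
  apf::Field* f = apf::createLagrangeField(m, "myScalar", apf::SCALAR, 1);
  apf::MeshIterator* it = m->begin(0);
  apf::MeshEntity* v;
  while ((v = m->iterate(it)))
    apf::setScalar(f, v, 0, 1.0); // node 0 of a linear field
  m->end(it);
  apf::synchronize(f); // copy each owner's value to its remote copies
}
```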

@a-jp (Contributor, Author) commented Aug 29, 2019

Thank you! Really helpful. Sorry I didn't add tags; I don't know how to do that. I got stuck trying to use the construct API because its fourth argument appears to require a single, constant element type.

```c++
apf::construct(mesh, m.elements, m.numElms, m.elementType, outMap);
```

But I have a mixed-element mesh, so I don't understand how to use this when elementType is, in my case, a vector with one type per cell. So I went with constructing each element by hand using buildElement, roughly as in the sketch below. What am I missing? How would I use this for mixed-cell-type meshes?
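For reference, a sketch of that per-element fallback (conn, elementType, and localVert are hypothetical containers filled by my reader; buildElement and adjacentCount are the real apf API):

```c++
#include <apfMesh2.h>

// Build each cell from its vertex handles, one call per cell, mixing types.
apf::ModelEntity* c = m->findModelEntity(m->getDimension(), 0); // placeholder classification
apf::MeshEntity* verts[8];
for (int e = 0; e < nelem; ++e) {
  int type = elementType[e]; // e.g. apf::Mesh::TRIANGLE or apf::Mesh::QUAD
  int nv = apf::Mesh::adjacentCount[type][0]; // vertices bounding this type
  for (int i = 0; i < nv; ++i)
    verts[i] = localVert[conn[e][i]]; // global id -> local vertex handle
  apf::buildElement(m, c, type, verts);
}
```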

Thanks,
Andy

@cwsmith (Contributor) commented Aug 29, 2019

Ahhh - good catch. Sorry for missing that limitation of construct(...).

We should be able to modify the current version to work with multiple entity types by extending the arguments to an array of connectivity arrays, one per user-specified element type (or something like that). I'm sure there are some complexities lurking beneath that will make it non-trivial.

I'm spending most of my time in the next 1.5 weeks preparing an SC workshop submission. After that, I can put time into the extension. Likewise, we'd be happy to accept a code contribution, and help with it, if you are interested in working on it before I can.

@a-jp (Contributor, Author) commented Aug 30, 2019

Ok, I can take a look. To help with that, can I ask: is there an equivalent pumi::construct? I haven't quite figured out whether I should be making an apf::Mesh2 or a pumi mesh, or indeed whether there is a difference. Should I be using pumi_mesh_createElem rather than the apf API apf::buildElement?

I ask because I think where I want to end up is interfacing with PUMI, so code developed against the apf interface via modifying the construct functions would still need to eventually give me a pumi mesh object (in memory, as a distributed parallel mesh).

Not sure if that makes total sense. I guess I'm saying that I don't really want to go down the route of getting all this working for my bespoke mesh format via the apf interface, only to find out that I should have done everything via pumi_*. So I'm trying to check that now, without knowing quite how to ask the question...

@cwsmith (Contributor) commented Aug 30, 2019

The pumi_ functions are a lightweight wrapper around the APF (mesh and fields) and GMI (geometric model) libraries and provide a more consistent/unified set of APIs; most of the interface is defined in pumi/pumi.h. A few pieces of additional functionality have been implemented within this interface, i.e., they exist above the underlying APF and GMI libraries. One example is ghosting.

I'll work with @seegyoung to nail down the details on constructing/satisfying the pumi object:

core/pumi/pumi.h, lines 84 to 102 in d9eeb2b:

```c++
class pumi
{
public:
  pumi();
  ~pumi();
  static pumi* instance();
  pMesh mesh;
  pGeom model;
  int* num_local_ent;
  int* num_own_ent;
  int* num_global_ent;
  pMeshTag ghosted_tag;
  pMeshTag ghost_tag;
  std::vector<pMeshEnt> ghost_vec[4];
  std::vector<pMeshEnt> ghosted_vec[4];
private:
  static pumi* _instance;
};
```

from an existing apf::mesh and provide a hello world for calling some ghosting functions.

@cwsmith (Contributor) commented Sep 12, 2019

@a-jp Did you work out the construction of the pumi instance?

@a-jp (Contributor, Author) commented Sep 12, 2019

Sort of. I found out that pMesh is just a typedef for apf::Mesh2* inside pumi.h. Since I've got a Mesh2 object, I just started blindly using the pumi_* API, but I've run into problems. In parallel, on two processors, if I use the following code on a simple 2D, 4-quad mesh, I get a segfault on the ghost-construction line:

```c++
pumi_mesh_print(mesh, false); // true segfaults
pumi_mesh_verify(mesh, true);
pumi_mesh_createFullAdjacency(mesh);
pumi_ghost_createLayer(mesh, mesh->getDimension() - 1, mesh->getDimension(), 1, 1);
```

This is holding me up, as I can't seem to make ghosts, so I'm assuming I've incorrectly built the pMesh object. On the positive side, the previous three lines run, so it can't be that far off... Any thoughts?

@seegyoung (Contributor) commented Sep 12, 2019

Hi, can you try without "pumi_mesh_createFullAdjacency(mesh);"?

After pumi_mesh_createFullAdjacency(mesh), any modification to the mesh, such as adaptation, migration, or ghosting, is not allowed, as it would break the "full adjacency".

In most cases a full adjacency is not necessary, as PUMI supports a "complete representation". I added that function for my internal OpenMP testing.

@a-jp (Contributor, Author) commented Sep 12, 2019

Nah, that didn't work...

@a-jp (Contributor, Author) commented Sep 12, 2019

I was wondering if there is something wrong with my call stack. As I said, I'm running on two processors, in 2D, with a 4-quad mesh, 2 cells on each processor. I make the mesh, push the elements in, and set the vertex positions. After that I do, in this order:

```c++
apf::alignMdsRemotes(mesh);
apf::deriveMdsModel(mesh);
mesh->acceptChanges();
apf::verify(mesh, true);
```

then:

```c++
pumi_mesh_print(mesh, false); // true segfaults
pumi_mesh_verify(mesh, true);
//pumi_mesh_createFullAdjacency(mesh);
pumi_ghost_createLayer(mesh, mesh->getDimension() - 1, mesh->getDimension(), 1, 1);
```

I've kind of cribbed that together from examples; could it be that I'm just missing some other assembly call, and that is what's breaking the ghosting?

Cheers,
Andy

@seegyoung (Contributor) commented

Please do not create a partitioned mesh from scratch. For a partitioned mesh, there is a partition model between the geometric model and the mesh, and unless the partition model and all the remote-copy links are valid, the mesh won't work. Please understand that pumi_mesh_verify cannot catch every inconsistency in the mesh, especially if it is partitioned.

If you want a partitioned mesh of your own, please construct a serial mesh and then run one of the partitioning programs provided in test/*split.cc.

@jacobmerson (Contributor) commented

@seegyoung I think the point of his question is that he already has a partitioned mesh stored in some non-PUMI file format, so @a-jp is interested in loading his externally partitioned mesh in parallel into the PUMI data structures.

@seegyoung (Contributor) commented

To the best of my knowledge, the only officially supported way to load a parallel mesh is via the .smb format.

@cwsmith (Contributor) commented Sep 12, 2019

@seegyoung we have the construct API for constructing parallel meshes from node and element data:
#245 (comment)
I believe this is what @a-jp is doing for his larger/real meshes.

@a-jp (Contributor, Author) commented Sep 12, 2019

That's correct, I'm using the construct API. Could I ask for thoughts on whether there is something wrong with my call stack? As I said, I'm running on two processors, in 2D, with a 4-quad mesh, 2 cells on each processor. I make the mesh, push the elements in, and set the vertex positions: all via the construct API, in parallel. After that I do, in this order:

```c++
apf::alignMdsRemotes(mesh);
apf::deriveMdsModel(mesh);
mesh->acceptChanges();
apf::verify(mesh, true);
```

then:

```c++
pumi_mesh_print(mesh, false); // true segfaults
pumi_mesh_verify(mesh, true);
//pumi_mesh_createFullAdjacency(mesh);
pumi_ghost_createLayer(mesh, mesh->getDimension() - 1, mesh->getDimension(), 1, 1);
```

I've kind of cribbed that together from examples; could it be that I'm just missing some other call, and that is what breaks the ghosting when the last function is called?

@cwsmith (Contributor) commented Sep 13, 2019

It turns out we don't currently have a supported way to convert from an apf::Mesh to a pumi instance. The 'safest' way forward would be to write your partitioned smb mesh (created by construct(...)) to disk, then use the pumi_* APIs to read it:

core/pumi/pumi.h, lines 205 to 206 in d9eeb2b:

```c++
// load a mesh from a file. Do static partitioning if num_in_part==1
pMesh pumi_mesh_load(pGeom geom, const char* fileName, int num_in_part,
    const char* mesh_type = "mds");
```

and create the ghosts.
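Roughly, that flow would be (a sketch only: the pumi_geom_load arguments and the file name are assumptions on my part; pumi_mesh_load is quoted above):

```c++
// Write the constructed apf mesh (one part per rank), then reload via pumi_*.
m->writeNative("mine.smb");
pGeom g = pumi_geom_load(NULL, "null"); // assumption: null geometric model
pMesh pm = pumi_mesh_load(g, "mine.smb", pumi_size()); // num_in_part = part count
pumi_ghost_createLayer(pm, pm->getDimension() - 1, pm->getDimension(), 1, 1);
```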

@a-jp (Contributor, Author) commented Sep 13, 2019

> It turns out we don't currently have a supported way to convert from an apf::Mesh to a PUMI instance

That's a massive shame, especially given how much time I've thrown at this. What is required for this to work? There is, I assume, no construct for the pumi_ API; is that what's missing? I'm really not a fan of writing partitioned part files, especially for large meshes, or when restarting on a different number of processors than was used for the previous run; it gets super messy quickly, in my personal experience at least.

I've already extended the construct API to allow mixed cell types, and that seemed to be working; at least, everything appeared to be in the right place when visualised. It was the ghost creation that caused the problem. I would be happy to pitch in if I can; can you tell me what's missing to go from apf to pumi at the technical level? I find it odd, as I said, since pMesh is just a typedef for apf::Mesh2*; is it a missing ancillary data structure?

@cwsmith (Contributor) commented Sep 18, 2019

@a-jp Take a look at the apfToPumi branch and the extended construct example:
https://github.com/SCOREC/core/blob/551d71589bc48af0dc5d348751bda95f782ee51c/test/constructThenGhost.cc
This example creates a distributed apf mesh with construct(...), creates a pumi instance, creates pumi ghosts, then synchronizes an element field. The image below shows the field before and after synchronization:

[Image "ghostSync": the element field before and after ghost synchronization]

This has not been fully fleshed out and is still under review on our side. If you want to push forward before we finish, feel free to try it out.

@a-jp (Contributor, Author) commented Sep 19, 2019

@cwsmith thanks! That's really interesting. Could you say what the new function is that does the conversion that was missing? Is it pumi_mesh_load(...)?

I'd be really happy to test it for you; as you know, I had a working example that failed, which might now work. Can I ask, is there anything else I need to do before calling the new function?

So I know what to look out for: what are you still testing/working on? What is expected not to work?

Thanks,
Andy

@cwsmith (Contributor) commented Sep 19, 2019

The new pumi_mesh_load(...) function you linked to simply sets the pumi instance's pointer for the mesh object:
7e80f94#diff-01216a9e3ca94548cc616b1b3fb76333R232-R238

I am somewhat concerned that this is too simple, so I hesitate to say it is complete. I'm really not sure what may be broken/missing without staring at the code some more.

Once your apf mesh is distributed/partitioned to N ranks, where N is the size of MPI_COMM_WORLD, call pumi_mesh_load(...) and pass in your apf mesh. After that, try to use only the pumi_* APIs. If you can load your test mesh and do the same beforeSync/afterSync sanity check with ParaView (as done in constructThenGhost), that would be a good first step; roughly as in the sketch below.
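A sketch of that flow (the overload of pumi_mesh_load taking an existing mesh is the one on the apfToPumi branch, and the vtk write with the "afterGhost" name is just for the ParaView check):

```c++
// m was built with apf::construct(...) and is partitioned across all ranks.
pMesh pm = pumi_mesh_load(m); // new overload: wraps the existing apf::Mesh2*
pumi_ghost_createLayer(pm, pm->getDimension() - 1, pm->getDimension(), 1, 1);
pumi_mesh_write(pm, "afterGhost", "vtk"); // inspect before/after in ParaView
```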

@a-jp (Contributor, Author) commented Sep 19, 2019

Ok I'll give it a go...

@cwsmith (Contributor) commented Oct 30, 2019

The apfToPumi branch was merged into develop. master will be synced with develop tonight if the nightly tests pass.

812a6a1
