Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MetaModelica does not support strings larger than 2GB #10284

Open
adrpo opened this issue Mar 1, 2023 · 21 comments
Open

MetaModelica does not support strings larger than 2GB #10284

adrpo opened this issue Mar 1, 2023 · 21 comments
Assignees
Labels
COMP/OMC/MetaModelica Issues and pull request related to MetaModelica

Comments

@adrpo
Copy link
Member

adrpo commented Mar 1, 2023

It seems that if the string size goes above 2GB the MetaModelica strings start to fail badly generating errors.

@adrpo adrpo added the COMP/OMC/MetaModelica Issues and pull request related to MetaModelica label Mar 1, 2023
@adrpo adrpo self-assigned this Mar 1, 2023
@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

See also #10171.

@casella
Copy link
Contributor

casella commented Mar 1, 2023

Do we really need to have MetaModelica on 32 bits? 32-bit or even 16-bit runtimes can make sense in an embedded context, but I don't think anybody will compile OMC on a 32-bits OS these days - what for?

@perost
Copy link
Member

perost commented Mar 1, 2023

Do we really need to have MetaModelica on 32 bits? 32-bit or even 16-bit runtimes can make sense in an embedded context, but I don't think anybody will compile OMC on a 32-bits OS these days - what for?

Strings seem to already store the length as 64 bits on 64-bit platforms, but some things seem to not be 64-bit yet. For example some of the string utility functions like System.unescapedStringLength that are implemented as external C functions, which use int as the return type since that's mandated by the Modelica specification.

So unfortunately I don't think it's quite as easy as just changing strings to be 64-bit, we'll have to at least change the way we use external C functions (or maybe easier to just make sure we don't use them to pass string lengths). But I wouldn't be surprised if there are other pitfalls when it comes to large strings too.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

Trying to read a 3.7G file:

File:

adrpo@omod-r630-4:~/test/brsl$ ls -lah g.json
-rw-rw-r-- 1 adrpo adrpo 3.7G Mar  1 15:45 g.json

Script:

// readMoreThan2GBFile.mos
readFile("g.json"); getErrorString();

We get a funny error also called "Success":

adrpo@omod-r630-4:~/test/brsl$ time ~/OpenModelica/build/bin/omc readMoreThan2GBFile.mos
""
"Error: Failed to read the entire file: g.json: Success"
real    0m2.666s
user    0m0.262s
sys     0m2.617s

@adrpo adrpo closed this as completed Mar 1, 2023
@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

Split the file into files less than 2GB

adrpo@omod-r630-4:~/test/brsl$ split --line-bytes=2GB g.json
adrpo@omod-r630-4:~/test/brsl$ ls -lah
total 7.3G
-rw-rw-r-- 1 adrpo adrpo 3.7G Mar  1 15:45 g.json
-rw-rw-r-- 1 adrpo adrpo   38 Mar  1 15:47 readMoreThan2GBFile.mos
-rw-rw-r-- 1 adrpo adrpo 1.9G Mar  1 15:53 xaa
-rw-rw-r-- 1 adrpo adrpo 1.8G Mar  1 15:54 xab

Script:

readFile("xaa"); getErrorString();

At first it reads the file fine but then it dies when trying to quote the output as it goes over 2GB

adrpo@omod-r630-4:~/test/brsl$ time ~/OpenModelica/build/bin/omc readLessThan2GBFile.mos
"Error: Internal error ValuesUtil.valString2 failed"
real    0m15.445s
user    0m9.588s
sys     0m6.492s

@perost
Copy link
Member

perost commented Mar 1, 2023

Yes, trying to return a large string in a script will not work since omc__escapedString in modelica_string.c stores the length of the string as an int. Not sure why readFile would fail though, I'm not seeing any obvious use of 32-bit length there at least.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

I think some Windows calls are to blame, it seems that _wstat returns 0 or so.

@perost
Copy link
Member

perost commented Mar 1, 2023

I think some Windows calls are to blame, it seems that _wstat returns 0 or so.

I just tried it on Linux, and readFile fails in exactly the same way.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

I also tried on Linux, I don't even bother with Windows at this moment so I don't know why I said the Windows calls are to blame :)

Yeah, is weird that it fails to read the file on Linux!

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

The only thing it does is omc_stat which calls the Linux stat and it seems it gets a != 0.

static char* SystemImpl__readFile(const char* filename)
{
  char* buf;
  int res;
  FILE * file = NULL;
  omc_stat_t statstr;
  res = omc_stat(filename, &statstr);

  if (res != 0) {

@adrpo adrpo reopened this Mar 1, 2023
@perost
Copy link
Member

perost commented Mar 1, 2023

I think the issue is rather the line further down:

if( (res = omc_fread(buf, sizeof(char), statstr.st_size, file, 0)) != statstr.st_size) {

res is an int here, so the comparison will fail when the size is larger than 2GB.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

I closed this by mistake. We really need to use our own types like mmc_uint_t for sizes.
And Modelica should really evolve and use 64bit types instead of int.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

Yes, you are right as the error is: "Error: Failed to read the entire file: g.json: Success" which is in that if.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

Nowadays is really not far fetched to read 2G+ files. We need to look everywhere in the compiler and fix this properly.
I don't really know how to deal with the binary compatibility with external Modelica libraries (the only problem is if there are ints inside records) but for sure internally our compiler (especially if we compile MetaModelica) we should use proper types.

@perost
Copy link
Member

perost commented Mar 1, 2023

We could perhaps try to compile with -Wconversion, that would give a warning when trying to assign a value to a variable that's too small. I tried it on systemimpl.c and got around 80 warnings for just that file, so I think we might have quite a few similar issues.

@casella
Copy link
Contributor

casella commented Mar 1, 2023

Gosh, this MLS 32-bit int thing is really, really bad. Talk about future-proof design...

@mahge
Copy link
Contributor

mahge commented Mar 1, 2023

I don't really know how to deal with the binary compatibility with external Modelica libraries (the only problem is if there are ints inside records) but

This, fortunately, might not be a concern any longer after #9399. Records and their members are converted appropriately if they are to be sent to external functions.

On the other hand, modelica_integer is already typedefed to long. So apart from contemplating switching to fixed width types (we should if we can) the immediate solution is to enable the warnings as @perost suggested and fix them one by one.

@adrpo
Copy link
Member Author

adrpo commented Mar 1, 2023

Very good, we will take this one by one. I will start with a PR that fixes the string issues.

adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 1, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
- update bootstrapping sources (new C declarations needed)
@adrpo
Copy link
Member Author

adrpo commented Mar 2, 2023

With PR #10297 it works to generate a ~3GB JSON for the BR example. Now, truth is, that is a bit excessive and we should find ways to minimize the JSON output from getModelInstance as otherwise it will be slow and consume quite a bit of memory.
The PR seems to break the package management (I guess we have seen this before when @mahge tried to use int for MM Integer). I will continue fixing the issues until we get somewhere.

adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 2, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
- update bootstrapping sources (new C declarations needed)
@casella
Copy link
Contributor

casella commented Mar 2, 2023

With PR #10297 it works to generate a ~3GB JSON for the BR example. Now, truth is, that is a bit excessive and we should find ways to minimize the JSON output from getModelInstance as otherwise it will be slow and consume quite a bit of memory.

Indeed. I thought that avoiding to flatten anything which is not parameters and (graphical) annotations was the way to go, as those are the only things that are needed for the GUI, but I understood from @perost that this is not as easy as it seems. Did I miss something?

In that case, we could at least skip transferring the information about everything which is not parameters and annotations to the JSON file. @adrpo is that what you have in mind?

The PR seems to break the package management (I guess we have seen this before when @mahge tried to use int for MM Integer). I will continue fixing the issues until we get somewhere.

I would strongly suggest that we avoid merging such a PR onto master until we branched off 1.21.0-dev.beta1. This is really a dangerous change in the proximity of a release, I would like to avoid a litany of bugfix releases to repair indirect damage caused by 64 bit integers. We've been there already.

Bottom line: handling strings > 2 GB is nice to have, but is not a critical feature. Having reasonably sized JSON files is 😅

@adrpo
Copy link
Member Author

adrpo commented Mar 2, 2023

For sure this will not be in the master any time soon, whenever I am fixing some things needed for this one I am also breaking other things it seems :)

adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 2, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
- update bootstrapping sources (new C declarations needed)
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 2, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
- update bootstrapping sources (new C declarations needed)
adrpo added a commit to adrpo/OpenModelica that referenced this issue Mar 10, 2023
- use proper size types where needed in external C functions
- use "modelica_integer" instead of "int" when generating MetaModelica code
- update bootstrapping sources (new C declarations needed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
COMP/OMC/MetaModelica Issues and pull request related to MetaModelica
Projects
None yet
Development

No branches or pull requests

4 participants