[BUG] iso characters #107

Open
markjessell opened this issue Sep 7, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@markjessell
Contributor

Describe your issue

If CODE field entries contain accented characters, e.g. "Amphibolites_et_métagabbros", then networkx fails to read the strat graph GML file with "input is not ASCII-encoded".

This is probably true for GROUP entries as well?

see https://stackoverflow.com/questions/61789659/networkx-impossible-to-read-my-gml-file-input-is-not-ascii-encoded

Minimal reproducing code example

Use accented characters in a field that will be used as CODE.

Error message

File "/home/mark/map2loop-2_latest/map2loop-2/map2loop/topology.py", line 39, in __init__
    self.graph = nx.read_gml(config.strat_graph_filename, label="id")
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/utils/decorators.py", line 766, in func
    return argmap._lazy_compile(__wrapper)(*args, **kwargs)
  File "<class 'networkx.utils.decorators.argmap'> compilation 5", line 5, in argmap_read_gml_1
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 195, in read_gml
    G = parse_gml_lines(filter_lines(path), label, destringizer)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 438, in parse_gml_lines
    graph = parse_graph()
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 427, in parse_graph
    curr_token, dct = parse_kv(next(tokens))
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 373, in parse_kv
    curr_token, value = parse_dict(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 421, in parse_dict
    curr_token, dct = parse_kv(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 373, in parse_kv
    curr_token, value = parse_dict(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 421, in parse_dict
    curr_token, dct = parse_kv(curr_token)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 358, in parse_kv
    curr_token = next(tokens)
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 314, in tokenize
    for line in lines:
  File "/home/mark/anaconda3/envs/m2l2-py39/lib/python3.9/site-packages/networkx/readwrite/gml.py", line 188, in filter_lines
    raise NetworkXError("input is not ASCII-encoded") from err
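
The last frame shows the error being raised in `filter_lines`, which appears to reject any line that cannot be decoded as ASCII. As a minimal diagnostic sketch (the helper name `find_non_ascii_lines` is hypothetical and not part of map2loop or networkx), the offending lines of a GML file can be located with the standard library alone:

```python
def find_non_ascii_lines(path):
    """Return (line_number, raw_bytes) for each line that is not pure ASCII."""
    offending = []
    # Read bytes so that no decoding happens before the ASCII check.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("ascii")
            except UnicodeDecodeError:
                offending.append((number, raw.rstrip(b"\r\n")))
    return offending
```

Running this on the file passed to `nx.read_gml` should list the lines that carry the accented CODE values.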
markjessell added the bug label on Sep 7, 2023
markjessell changed the title from "iso characters[BUG]" to "[BUG] iso characters" on Sep 9, 2023
@RoyThomsonMonash
Contributor

Hi Mark,

Networkx does support accents in labels; the code below works, including writing and reading GML:
import networkx as nx
G = nx.Graph()
G.add_node("é")
print(G.nodes)
# write_gml escapes the non-ASCII label when writing the file
nx.write_gml(G, "tmp.gml")
# read_gml takes the path only; it returns a new graph
G2 = nx.read_gml("tmp.gml")
print(G2.nodes)

However, it seems that map2model does not check for non-ASCII characters in the node labels before writing the GML, so they are written unescaped and corrupt the node labels (yEd still opens the GML file but shows a corrupted label). map2model is C++ code that handles the text as ASCII rather than Python's UTF-8, and this is what caused the problem.

Exploring further, this is only a problem for the "c" field, as the "u" and "g"/"g2" labels are not output by map2model.

In the example above, the 'é' character needs to be escaped as "&#233;", which is the HTML numeric character reference used in GML files. Either map2model can check for all the standard accented characters and replace them with the HTML equivalent, or it could use a GML library so that the encoding is done automatically.
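
The escaping described here can be sketched in Python as a post-processing step on the file that map2model writes. This is an illustration only, not map2model's or networkx's own code: the function names are hypothetical, and `source_encoding="utf-8"` is an assumption about how map2model writes the accented bytes (it may need to be `"latin-1"` instead).

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def escape_gml_file(src_path, dst_path, source_encoding="utf-8"):
    """Copy a GML file, escaping non-ASCII characters so the copy is pure ASCII.

    source_encoding is an assumption about the input file, not a known fact.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=source_encoding, newline="") as src:
        content = src.read()
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(escape_non_ascii(content))
```

For example, `escape_non_ascii("Amphibolites_et_métagabbros")` gives `"Amphibolites_et_m&#233;tagabbros"`. The escaped copy should then pass the ASCII check in `nx.read_gml`, which is expected to turn the reference back into 'é' on reading.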

@markjessell
Contributor Author

Thanks Roy

I came across this as a way to normalise the strings prior to reading them in. Since GML is just a text file anyway, we could run this over the file prior to use by networkx?

https://stackoverflow.com/questions/44431730/how-to-replace-accented-characters
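
A minimal sketch of that normalisation idea, using only the standard library's `unicodedata` module. The function name `strip_accents` is hypothetical, and this is one common approach rather than necessarily the one in the linked answer:

```python
import unicodedata


def strip_accents(text):
    """Remove accents by decomposing characters and dropping combining marks."""
    # NFKD splits 'é' into 'e' followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, `strip_accents("Amphibolites_et_métagabbros")` gives `"Amphibolites_et_metagabbros"`. Two caveats: letters with no decomposition (such as 'ø') pass through unchanged and would still be non-ASCII, and the normalised names would no longer match the original CODE values in the map data, so any later lookup by CODE would presumably need the same normalisation.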

m
