reference data - Concepts #1

Open
zoometh opened this issue Apr 4, 2024 · 0 comments

Errors when running excel2skos.py

Traceback (most recent call last):
  File "C:\Users\Thomas Huet\AppData\Local\Programs\Python\Python311\Lib\xml\sax\expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 5201, column 98

The culprit here is the & character, a special character in XML that causes the error when it is not escaped.
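One possible fix, sketched below, is to escape the cell text before it is written into the XML. The internals of excel2skos.py are not shown in this issue, so the `make_label_element` helper, the `<label>` element name, and the sample text are hypothetical; only `xml.sax.saxutils.escape` and `xml.etree.ElementTree` are real standard-library APIs.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def make_label_element(text):
    """Hypothetical helper: wrap raw cell text in a well-formed <label> element."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;
    return f"<label>{escape(text)}</label>"

raw = "Survey & Excavation"        # sample text, not taken from the real spreadsheet
bad = f"<label>{raw}</label>"      # unescaped: the parser rejects the bare &
good = make_label_element(raw)     # escaped: parses and round-trips to the raw text

print(good)
print(ET.fromstring(good).text)
```

The same escaping would also cover a value such as (<10km), since escape() handles the < character as well.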


ChatGPT also spotted a potential issue in EAMENA.xml at line 11528: an unescaped < character, which the parser reads as the start of a tag.


...
  File src/lxml/parser.pxi:1881 in lxml.etree._parseDocFromFile

  File src/lxml/parser.pxi:1200 in lxml.etree._BaseParser._parseDocFromFile

  File src/lxml/parser.pxi:633 in lxml.etree._ParserContext._handleParseResultDoc

  File src/lxml/parser.pxi:743 in lxml.etree._handleParseResult

  File src/lxml/parser.pxi:672 in lxml.etree._raiseParseError

  File /mnt/data/EAMENA.xml:11528
    <10km)"}
    ^
XMLSyntaxError: StartTag: invalid element name, line 11528, column 7
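To find such a line without reading the full traceback, a small helper can report the position of the first syntax error together with the offending source line. This is a minimal sketch using only the standard library (`xml.etree.ElementTree`, not lxml); the function name `first_xml_error` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def first_xml_error(xml_text):
    """Return (line, column, source line) of the first syntax error, or None if well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        source = lines[line - 1] if line <= len(lines) else ""
        return line, column, source
    return None

# Made-up sample reproducing the unescaped "<" from the traceback above
sample = "<root>\n  <a>ok</a>\n  <b>(<10km)</b>\n</root>"
print(first_xml_error(sample))
```

Note that the parser stops at the first error, so the file has to be re-checked after each fix until the function returns None.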

This could be the reason why I struggle to create and import (translated) Concepts in the training instance, for example: EAMENA_fr.xml

The Python code (thanks to ChatGPT) to validate the XML is:

import requests
from lxml import etree

def parse_xml(url='https://raw.githubusercontent.com/eamena-project/eamena-arches-dev/main/dbs/database.eamena/data/reference_data/concepts/EAMENA.xml'):
  # Check that the XML structure is OK (no syntax errors)
  # Alternative source:
  # url = 'https://raw.githubusercontent.com/eamena-project/eamena/master/eamena/pkg/reference_data/concepts/EAMENA.xml'
  root_info = None
  try:
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    root = etree.fromstring(response.content)
    # Extract basic information to confirm successful parsing
    root_info = {'root_tag': root.tag, 'children_tags': [child.tag for child in root]}
    parsing_message = "XML parsed successfully."
  except requests.RequestException as e:
    parsing_message = f"Error downloading XML: {e}"
  except etree.XMLSyntaxError as e:
    parsing_message = f"Error parsing XML: {e}"
  print(root_info, parsing_message)