Towards handling Import[xxx, "ZIP", member-name]
#1846
Open
rocky wants to merge 19 commits into master from handle-import-zip
19 commits
1d11a3b rocky: Split out Import/Export functions...
5b03b4a rocky: Tweak sort order
14df4cc rocky: More refactoring/cleanup for getting post-import working
f91b516 rocky: Revert a to_python() dict conversion for now.
5f3c1b1 rocky: Segregate and DRY import/export checking functions.
6437c60 rocky: Add mathics.eval: eval_DeleteFile and eval_FileExtension
85060e2 rocky: Start breaking up eval.eval_Import
0a64e33 rocky: Finally, we get to ZIP imports properly
968613e rocky: Plumbing hooked up for Import zip with members.
d32c54c rocky: Allow single element field forms on more Importers
a272e8d rocky: Some Import functions do not support element selection. Work around t…
98e83bf rocky: ZIP import starts working.
5dbe72d rocky: Revise to fit better inside previous fileformats frameworks.
d67cf1d rocky: fileformats.json -> fileformats.jsonformat
1a4a319 rocky: More forceful wording in FIXME
7e86a2e rocky: Add a minor comment on get_elements()
ea36ef4 rocky: Set "Text" as a default WMA mime type when no other is found
f6d36b8 rocky: Allow selectable default on infer_file_format
662428d rocky: Better describe what up here with Import and Export
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,40 @@ | ||
| """ | ||
| File Formats | ||
| r"""Import/Export File Formats, Importers and Exporters | ||
|
|
||
| Data in files on a filesystem or retrieved from the Internet is often structured \ | ||
| according to specific rules. For example, consider the different kinds of \ | ||
| structuring used in a JSON file, an HTML file, or a compressed GZIP file. | ||
|
|
||
| In some cases, such as archive files, e.g., ZIP, TAR, and JAR, the file contains component parts, \ | ||
| which in WMA terminology are called "members". Members are part of the broader set of metadata items \ | ||
| called "elements". | ||
|
|
||
| A MIME type is typically associated with each kind of format. \Mathics3, following WMA, \ | ||
| uses a shortened name for this MIME type. For example, \Mathics3 uses "HTML" as a shorthand \ | ||
| for the MIME type "text/html". | ||
|
|
||
| Below is a list of supported file types for which we have built-in importers or exporters written \ | ||
| in Python. Other importers, however, are written in \Mathics3. | ||
|
|
||
| Variable <url> | ||
| :\$ExportFormats: | ||
| /doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/\$exportformats</url> \ | ||
| contains a list of file formats that are supported by <url> | ||
| :Export: | ||
| /doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/export</url>, \ | ||
| while <url> | ||
| :\$ImportFormats: | ||
| /doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/\$importformats</url> \ | ||
| does the corresponding thing for <url> | ||
| :Import: | ||
| /doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/import</url>. | ||
|
|
||
| Built-in Importers. | ||
| Many Import/Export functions are registered in SystemFiles/Formats/*.wl which is \ | ||
| autoloaded on startup. | ||
|
|
||
| The Built-in Functions are defined in a separate context. | ||
| For example, HTML` or Compress`. This is done to not pollute the System` namespace. | ||
| """ | ||
|
|
||
| # The Built-in Functions are defined in a separate context under the | ||
| # System`. For example System`HTML` and System`XML. This is done to not | ||
| # pollute the System` namespace. | ||
| # This tells documentation how to sort this module | ||
| # Here we are also hiding "file_io" since this can erroneously appear at the top level. | ||
| sort_order = "mathics.builtin.importing-export-file-formats" |
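The docstring in the diff above describes short format names standing in for MIME types, and two commits mention defaulting to "Text" and a selectable default when inferring a format. A minimal sketch of that idea follows. It is not the code in this PR: the function name `infer_format` and the table `MIME_TO_FORMAT` are hypothetical, and only the "HTML" to "text/html" pairing comes from the docstring. The MIME lookup uses Python's standard `mimetypes` module.

```python
import mimetypes

# Hypothetical lookup table. Only the "text/html" -> "HTML" pair is taken
# from the docstring; the other entries are illustrative assumptions.
MIME_TO_FORMAT = {
    "text/html": "HTML",
    "application/json": "JSON",
    "application/zip": "ZIP",
}


def infer_format(path: str, default: str = "Text") -> str:
    """Guess a short format name from a file name.

    Falls back to `default` when the MIME type is unknown or has no
    short name in MIME_TO_FORMAT.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    return MIME_TO_FORMAT.get(mime_type, default)
```

Calling `infer_format("page.html")` gives the short name for HTML, while a file with an unrecognized extension falls back to whatever default the caller chose.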
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| """ | ||
| Compression & Archive Formats | ||
| """ | ||
|
|
||
| from mathics.core.builtin import Builtin, String | ||
| from mathics.core.evaluation import Evaluation | ||
| from mathics.eval.fileformats.compression import eval_ImportZIP | ||
|
|
||
| # See commit in __init__.py regarding the whacky way this gets called | ||
|
|
||
|
|
||
| class ImportZIP(Builtin): | ||
| """ | ||
| <url>:WMA link:https://reference.wolfram.com/language/ref/format/ZIP.html</url> | ||
|
|
||
| <dl> | ||
| <dt>'Compress`ImportZIP[path]' | ||
| <dd>Run zip for archive file $path$ | ||
| </dl> | ||
|
|
||
| """ | ||
|
|
||
| context = "Compress`" | ||
| summary_text = "import a ZIP file" | ||
|
|
||
| def eval(self, path: String, evaluation: Evaluation): | ||
| "Compress`ImportZIP[path_String]" | ||
| return eval_ImportZIP(path, evaluation) | ||
|
|
||
| def eval_with_elements(self, path: String, elements, evaluation: Evaluation): | ||
| "Compress`ImportZIP[path_String, elements_]" | ||
| return eval_ImportZIP(path, evaluation, elements) |
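The two `eval` methods above pass either a path alone or a path plus elements to `eval_ImportZIP`, whose body is not shown on this page. As a rough sketch of what a one-argument versus two-argument ZIP import could do, here is a standalone version built on Python's `zipfile` and `json` modules. The function `import_zip` and the member names are hypothetical, and the choice to decode `.json` members automatically is an assumption made for illustration.

```python
import io
import json
import zipfile
from typing import Any, Optional


def import_zip(source, member: Optional[str] = None) -> Any:
    """Hypothetical stand-in for a ZIP importer.

    With no member, return the list of member names.
    With a member name, return that member's content: parsed JSON for
    names ending in ".json", decoded UTF-8 text otherwise.
    """
    with zipfile.ZipFile(source) as archive:
        if member is None:
            return archive.namelist()
        # Only the requested member is read and decompressed.
        with archive.open(member) as handle:
            raw = handle.read()
    text = raw.decode("utf-8")
    if member.lower().endswith(".json"):
        return json.loads(text)
    return text


def make_demo_archive() -> bytes:
    """Build a small in-memory archive for trying out import_zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data.json", '{"Name": "Demo"}')
        archive.writestr("notes.txt", "hello")
    return buffer.getvalue()
```

For example, `import_zip(io.BytesIO(make_demo_archive()), "data.json")` reads just that one member and parses it, which is the ZIP-then-JSON combination the discussion below gives as the motivation for this work.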
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,7 @@ | |
| """ | ||
| HTML | ||
|
|
||
| Basic implementation for a HTML importer. | ||
| HTML importer. | ||
| """ | ||
|
|
||
|
|
||
|
|
@@ -15,6 +15,7 @@ | |
| from mathics.core.builtin import Builtin, MessageException | ||
| from mathics.core.convert.expression import to_expression, to_mathics_list | ||
| from mathics.core.convert.python import from_python | ||
| from mathics.core.evaluation import Evaluation | ||
| from mathics.core.expression import Expression | ||
| from mathics.core.list import ListExpression | ||
| from mathics.core.symbols import Symbol | ||
|
|
@@ -126,7 +127,7 @@ class _TagImport(_HTMLBuiltin): | |
| def _import(self, tree): | ||
| raise NotImplementedError | ||
|
|
||
| def eval(self, text, evaluation): | ||
| def eval(self, text: String, evaluation: Evaluation): | ||
| """%(name)s[text_String]""" | ||
| tree = parse_html(parse_html_file, text, evaluation) | ||
| if isinstance(tree, Symbol): # $Failed? | ||
|
|
@@ -135,6 +136,12 @@ def eval(self, text, evaluation): | |
| to_expression(SymbolRule, self.tag_name, self._import(tree)) | ||
| ) | ||
|
|
||
| def eval_with_element(self, text, element, evaluation: Evaluation): | ||
| """%(name)s[text_String, element_]""" | ||
| # FIXME: right now we aren't using element, and should use this to more | ||
| # efficiently extract part of the XML file that we want. | ||
| return self.eval(text, evaluation) | ||
|
|
||
|
|
||
| class _Get(_HTMLBuiltin): | ||
| context = "HTML`Parser`" | ||
|
|
@@ -401,7 +408,7 @@ class SourceImport(_HTMLBuiltin): | |
|
|
||
| summary_text = "import source code from a HTML file" | ||
|
|
||
| def eval(self, text, evaluation): | ||
| def eval(self, text, evaluation: Evaluation): | ||
| """%(name)s[text_String]""" | ||
|
|
||
| def source(filename): | ||
|
|
@@ -412,6 +419,12 @@ def source(filename): | |
|
|
||
| return parse_html(source, text, evaluation) | ||
|
|
||
| def eval_with_element(self, text, element, evaluation: Evaluation): | ||
|
|
||
| """%(name)s[text_String, element_]""" | ||
| # FIXME: right now we aren't using element, and should use this to more | ||
| # efficiently extract part of the XML file that we want. | ||
| return self.eval(text, evaluation) | ||
|
|
||
|
|
||
| class TitleImport(_TagImport): | ||
| """ | ||
|
|
@@ -437,7 +450,7 @@ def _import(self, tree): | |
|
|
||
| class XMLObjectImport(_HTMLBuiltin): | ||
| """ | ||
| ## <url>:native internal:</url> | ||
| <url>:WMA link:https://reference.wolfram.com/language/ref/XMLObject.html</url> | ||
|
|
||
| <dl> | ||
| <dt>'HTML`XMLObjectImport["filename"]' | ||
|
|
@@ -450,7 +463,13 @@ class XMLObjectImport(_HTMLBuiltin): | |
|
|
||
| summary_text = "import XML objects from a HTML file" | ||
|
|
||
| def eval(self, text, evaluation): | ||
| def eval(self, text, evaluation: Evaluation): | ||
| """%(name)s[text_String]""" | ||
| xml = to_expression("HTML`Parser`HTMLGet", text).evaluate(evaluation) | ||
| return ListExpression(Expression(SymbolRule, String("XMLObject"), xml)) | ||
|
|
||
| def eval_with_element(self, text, element, evaluation: Evaluation): | ||
| """%(name)s[text_String, element_]""" | ||
| # FIXME: right now we aren't using element, and should use this to more | ||
| # efficiently extract part of the HTML file that we want. | ||
| return self.eval(text, evaluation) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,32 +1,29 @@ | ||
| # -*- coding: utf-8 -*- | ||
|
|
||
| """ | ||
| JSON | ||
| JSON File Format | ||
|
|
||
| Basic implementation for an JSON importer. | ||
| JSON importer (via Python's "json" module). | ||
| """ | ||
|
|
||
| from mathics.core.builtin import Builtin | ||
| from mathics.core.expression import Evaluation | ||
| from mathics.core.builtin import Builtin, String | ||
| from mathics.core.evaluation import Evaluation | ||
| from mathics.eval.fileformats.jsonformat import eval_JSONImport | ||
|
|
||
|
|
||
| class JSONImport(Builtin): | ||
| class ImportJSON(Builtin): | ||
| """ | ||
| ## <url>:native internal:</url> | ||
| <url>:WMA link:https://reference.wolfram.com/language/ref/format/JSON.html</url> | ||
|
|
||
| <dl> | ||
| <dt>'JSON`Import`JSONImport["file"]' | ||
| <dd>parses "string" as a JSON file, and returns the data as a nested \ | ||
| list of rules. | ||
| <dt>'JSON`ImportJSON[path]' | ||
| <dd>Read $path$ as JSON and convert that to its corresponding Mathics3 equivalent. | ||
| </dl> | ||
|
|
||
| """ | ||
|
|
||
| summary_text = "import elements from json" | ||
| context = "JSON`Import`" | ||
| context = "JSON`" | ||
| messages = {"dec": "Decoding Error at `1`"} | ||
| summary_text = "import JSON file" | ||
|
|
||
| def eval(self, filename, evaluation: Evaluation): | ||
| """%(name)s[filename_String]""" | ||
| return eval_JSONImport(filename.value, evaluation) | ||
| def eval(self, path: String, evaluation: Evaluation): | ||
| "JSON`ImportJSON[path_String]" | ||
| return eval_JSONImport(path, evaluation) |
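One docstring in the diff above describes the result of a JSON import as "a nested list of rules". The body of `eval_JSONImport` is not shown here, so the following is only a sketch of that shape in plain Python. The helper `json_to_rules` is hypothetical, and `(key, value)` tuples stand in for rule expressions.

```python
import json
from typing import Any


def json_to_rules(value: Any) -> Any:
    """Convert decoded JSON into nested lists of (key, value) pairs.

    Objects become lists of pairs (standing in for rules), arrays become
    lists, and scalars are returned unchanged.
    """
    if isinstance(value, dict):
        return [(key, json_to_rules(item)) for key, item in value.items()]
    if isinstance(value, list):
        return [json_to_rules(item) for item in value]
    return value


def import_json_text(text: str) -> Any:
    """Parse JSON text and return it in the nested-pairs form."""
    return json_to_rules(json.loads(text))
```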
This file was deleted.
@mmatera Is there a better way to combine this with the `def eval()` above?
Even if that is the case, it might be useful to have this broken out as a stub for when this is revised to be able to handle the element passed.

This is not using the parameter `element`. In a proper implementation, this function should be more general than `self.eval`. Also, `parse_html` should have an extra attribute to filter a given element.

That is exactly what FIXME says.
Yep. Revising HTML and XML is left for later. I will be happy when we are able to "Import" and extract a JSON file from a ZIP import, which is needed for being able to install paclets from the public paclet server.
This is the main reason why any work on this is currently being done.

OK, but the output with the second element produces something different.
In the Mathics3 master branch, this seems to work:
So I do not see what this new method provides.

I don't see that. Below, I see lots of output, and I believe this is the same as what you'd get from Wolframscript. The formatting of the text is different, but that's to be expected until we match up StandardForm output better.
If there is a specific difference, exactly what's different?
(Please try to give a small example of a difference.)
It is a placeholder function (that indicates FIXME), and it is there to indicate that it should be filled out to remove the gross inefficiency that can arise from reading in lots of stuff and then throwing away or filtering most of it.
Instead, that code should be filled out to pass information to other eval routines that handle element retrieval in a better way.
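
To make the inefficiency argument concrete, here is a small sketch of element-aware retrieval using Python's standard `html.parser`. It is not the Mathics3 HTML importer, and the names `_TitleFinder` and `import_title` are hypothetical. The point it illustrates is that an importer that knows which element is wanted can stop reading input as soon as that element has been found, rather than parsing everything and filtering afterwards.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Optional


class _TitleFinder(HTMLParser):
    """Collect the text of the first <title> element and nothing else."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._parts: List[str] = []
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts)


def import_title(chunks: Iterable[str]) -> Optional[str]:
    """Feed HTML chunk by chunk and stop once the title is known."""
    finder = _TitleFinder()
    for chunk in chunks:
        finder.feed(chunk)
        if finder.title is not None:
            break  # remaining chunks are never read
    return finder.title
```

With a lazily produced sequence of chunks (for example, lines read from a large file), the loop ends after the chunk containing `</title>`, so the body of the document is never pulled in.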