New architecture #1

vbessonov · 2020-09-14T17:44:12Z

This PR changes the architecture of the library and decouples logic (parsing, validation, serialization) from the POCO classes representing AST nodes such as Link, Collection, and Manifest:

It splits parsing into two separate phases: syntax analysis, where SyntaxAnalyzer parses raw JSON into an AST, and semantic analysis, where SemanticAnalyzer checks the finished AST.
It adds a separate class for serializing the AST into JSON (still to be developed).
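To make the two-phase split concrete, here is a minimal sketch, not the code from this PR. The class names SyntaxAnalyzer, SemanticAnalyzer, Link, and Manifest come from the description above; the method names, constructor arguments, and the idea of returning problems as a list of strings are assumptions made for illustration.

```python
import json


class Link(object):
    """Plain data object for a link node; it holds no parsing logic."""

    def __init__(self, href=None, rel=None):
        self.href = href
        self.rel = rel


class Manifest(object):
    """Plain data object for the root node of the AST."""

    def __init__(self, links=None):
        self.links = links or []


class SyntaxAnalyzer(object):
    """Phase 1: turn raw JSON into AST nodes without judging their content."""

    def analyze(self, raw_json):  # hypothetical method name
        document = json.loads(raw_json)
        links = [
            Link(href=item.get("href"), rel=item.get("rel"))
            for item in document.get("links", [])
        ]
        return Manifest(links=links)


class SemanticAnalyzer(object):
    """Phase 2: check a finished AST and report problems as messages."""

    def analyze(self, manifest):  # hypothetical method name
        problems = []
        for index, link in enumerate(manifest.links):
            if not link.href:
                problems.append("links[{0}] has no href".format(index))
        return problems


raw = (
    '{"links": ['
    '{"href": "https://example.org/manifest.json", "rel": "self"}, '
    '{"rel": "alternate"}]}'
)
manifest = SyntaxAnalyzer().analyze(raw)
problems = SemanticAnalyzer().analyze(manifest)
```

The point of the split is that the node classes stay free of logic: the first phase only builds nodes, and every judgment about whether the document is valid lives in the second phase.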

leonardr · 2020-09-15T16:09:48Z

.python-version

@@ -0,0 +1 @@
+2.7.17


Should this be in the repo? I'm not an expert at pyenv but it seems like an artifact of your installation. I want this code base to work on both Python 2 and Python 3.

This file is quite often added to .gitignore, but I thought it might be a good idea to record the exact Python version I used for testing.

There is a similar question on Stack Overflow saying that .python-version can be kept in the repository for informational purposes.

leonardr · 2020-09-15T16:24:02Z

src/python_rwpm_parser/parser/syntax.py

+
+            if sub_collection_keys:
+                raise ParsingError(
+                    'The following sub-collection keys are not registered: {0}'.format(sub_collection_keys))


I don't think this should be a parsing error, since RWPM allows extension roles (though it says "Extensions that are not registered in the registry must use a URI for their role."). I suggest treating these leftover keys as CollectionRoles with compact=False and require=False.

Then, when validating the collection, issue a warning if a collection has a role that is neither in (our model of) the registry nor a URI.
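The suggestion above could look roughly like the following sketch. CollectionRole, compact, and require are the names used in the comment; the constructor signature, the helper functions, and the contents of REGISTERED_ROLES are assumptions for illustration and do not reflect the library's actual API or the full registry.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


class CollectionRole(object):
    """Hypothetical shape of a role object, based on the comment above."""

    def __init__(self, key, compact=False, require=False):
        self.key = key
        self.compact = compact
        self.require = require


# Illustrative subset only; this is not the full role registry.
REGISTERED_ROLES = frozenset(["toc", "landmarks"])


def is_uri(value):
    """Deliberately loose check: anything with a scheme counts as a URI."""
    return bool(urlparse(value).scheme)


def roles_from_leftover_keys(keys):
    """Keep unknown keys as optional, non-compact roles instead of failing."""
    return [CollectionRole(key, compact=False, require=False) for key in keys]


def validate_roles(roles):
    """Warn about roles that are neither registered nor expressed as URIs."""
    warnings = []
    for role in roles:
        if role.key not in REGISTERED_ROLES and not is_uri(role.key):
            warnings.append(
                "Role {0!r} is not registered and is not a URI".format(role.key)
            )
    return warnings


roles = roles_from_leftover_keys(
    ["toc", "https://example.org/roles/extras", "myCustomRole"]
)
role_warnings = validate_roles(roles)
```

Here only myCustomRole produces a warning: toc is in the illustrative registry, and the extension role is a URI.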

Currently the parser raises an exception and halts on the first failure. Would you like it to return a list of errors and warnings instead?

I was saying that this shouldn't even be considered an error. I think you're saying that it should be considered an error, but that an error doesn't necessarily need to stop the parsing process.

So, two questions to resolve: is this actually an error? And should a problem like this necessarily result in an exception?

First, I don't think we can say for sure that this particular thing is an error, because we only know the contents of the RWPM role registry at the time of the release of the software. I don't want to have to put out an immediate upgrade if a role is added to the registry so we can parse the new role without an error.

That said, if an "error" in this case just means something is added to a list, then it's liveable, and I'd be okay with returning a 2-tuple of (errors, warnings).
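As a sketch of what returning a 2-tuple of (errors, warnings) might look like, assuming the check runs over a plain dictionary: the function name check_manifest, the CORE_KEYS set, and the choice of a missing title as the example error are all hypothetical and only illustrate the collect-instead-of-raise idea.

```python
# Keys treated as structural rather than as sub-collection roles (assumed).
CORE_KEYS = frozenset(["@context", "metadata", "links", "readingOrder", "resources"])


def check_manifest(manifest, known_roles):
    """Collect problems instead of raising on the first one.

    Returns a 2-tuple (errors, warnings), both lists of strings.
    """
    errors = []
    warnings = []

    # Stand-in for a genuine error: the manifest lacks a title.
    metadata = manifest.get("metadata")
    if not isinstance(metadata, dict) or not metadata.get("title"):
        errors.append("metadata.title is missing")

    # An unknown role is only a warning, so a role added to the registry
    # after a release does not break parsing.
    for key in sorted(manifest):
        if key in CORE_KEYS or key in known_roles:
            continue
        warnings.append("unrecognized sub-collection role: {0}".format(key))

    return errors, warnings


errors, warnings = check_manifest(
    {"metadata": {}, "links": [], "futureRole": {}},
    known_roles=frozenset(["toc"]),
)
```

The caller can then decide what to do with each list, for example rejecting the document only when errors is non-empty.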

I removed this code, so the parser no longer raises an exception for unknown collections.

leonardr · 2020-09-15T17:48:36Z

The overall architecture looks great. I'm comfortable merging this in and using it as a basis for further work if that's what you'd like.

vbessonov · 2020-09-25T09:53:32Z

@leonardr, would you like to keep the old name python-opds-parser? I was thinking about something more generic like python-rwpm-parser. Do you think it makes sense?

leonardr · 2020-09-25T12:59:06Z

I'd be OK with python-webpub-manifest-parser, since webpub-manifest is the name of the GitHub project that contains the spec. We'll mention OPDS 2 in the description so this package shows up in searches. Let me know if you like this name and I'll rename the repo.

vbessonov · 2020-09-27T11:59:06Z

@leonardr, python-webpub-manifest-parser sounds great!

vbessonov · 2020-09-28T05:50:07Z

@leonardr, just one more question regarding the naming: would you like to drop python- in the case of the Python package?
So the name of the repository would be python-webpub-manifest-parser and the name of the Python package would be webpub-manifest-parser

leonardr · 2020-09-28T13:57:37Z

Yes, that's exactly what I was thinking. python- helps web searches but is not necessary for PyPI searches.

vbessonov · 2020-10-07T19:59:41Z

@leonardr, I'm finished: the parser should now be compliant with the RWPM and OPDS 2.0 specs. However, it doesn't yet support extending the specs with custom sub-collections. It won't throw an error, but those custom sub-collections won't be available as part of the AST and won't be validated.

vbessonov · 2020-10-09T16:06:58Z

@leonardr, I have some integration tests locally which use two feeds from NYU, Pret Numerique and The University of Michigan Press. I find them extremely helpful and would like to include them in the repository, but I'm not sure whether I'm allowed to make them public.

leonardr · 2020-10-09T16:43:07Z

You should be able to include feeds if a) the feeds are public in the first place or b) you obscure the metadata -- title, author, identifiers, and links. Show me the feeds privately and I'll advise.
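One way to obscure the metadata in option b) is sketched below. It assumes the feed is a dictionary with a publications list whose entries carry metadata and links, as in a typical OPDS 2.0 feed; the function names and placeholder values are made up for illustration.

```python
import copy


def obscure_publication(publication, index):
    """Return a copy with title, author, identifier, and link hrefs replaced."""
    result = copy.deepcopy(publication)
    metadata = result.get("metadata", {})
    if "title" in metadata:
        metadata["title"] = "Title {0}".format(index)
    if "author" in metadata:
        metadata["author"] = "Author {0}".format(index)
    if "identifier" in metadata:
        metadata["identifier"] = "urn:example:{0}".format(index)
    for number, link in enumerate(result.get("links", [])):
        if "href" in link:
            # Keep rel and type so the feed still exercises the parser.
            link["href"] = "https://example.org/publications/{0}/links/{1}".format(
                index, number
            )
    return result


def obscure_feed(feed):
    """Return a copy of the feed with every publication obscured."""
    result = copy.deepcopy(feed)
    result["publications"] = [
        obscure_publication(publication, index)
        for index, publication in enumerate(feed.get("publications", []))
    ]
    return result


feed = {
    "metadata": {"title": "Sample feed"},
    "publications": [
        {
            "metadata": {
                "title": "Real Title",
                "author": "Real Author",
                "identifier": "urn:isbn:0000000000",
            },
            "links": [{"href": "https://real.example.com/book.epub", "rel": "self"}],
        }
    ],
}
obscured = obscure_feed(feed)
```

Because the structure, rel values, and media types are left alone, the obscured feed should still be useful as an integration-test fixture.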

vbessonov · 2020-10-19T19:12:57Z

@leonardr, I created a new PR because I messed up this branch and couldn't rebase it anymore.

leonardr reviewed Sep 15, 2020

Initial commit

d285d2b

vbessonov force-pushed the feature/new-architecture branch from f0bc27b to d285d2b Compare September 20, 2020 21:13

vbessonov added 5 commits September 29, 2020 00:26

Initial commit

d1a9921

Initial commit

ca04ffa

Initial commit

6902d13

Merge branch 'feature/new-architecture' of github.com:vbessonov/pytho…

032cb31

…n-webpub-manifest-parser into feature/new-architecture

Update README.md

8cc4768

Fix support for Python 3.x

ce3c789

vbessonov changed the title ~~[WIP] New architecture~~ New architecture Oct 9, 2020

vbessonov closed this Oct 19, 2020
