Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added plain text extraction to the build process #93

Merged
merged 8 commits into from
Jun 6, 2017

Conversation

sonofmun
Copy link
Contributor

You can see what the results would look like in the /text/ directory here: https://github.com/sonofmun/canonical-lat-test/tree/0.0.147

sonofmun added 2 commits May 24, 2017 16:11
This reset the unwanted changed from the no_tar branch
@coveralls
Copy link

coveralls commented May 24, 2017

Coverage Status

Coverage decreased (-0.2%) to 93.247% when pulling f2236b4 on plain_text into 28741bd on master.

@PonteIneptique
Copy link
Member

Few comments before I forget :
You should export the list of deepest node instead of the highest (aka the text) thus cleaning up any non text node. You'll still need tei note excluded.

I would avoid using the resolver : il would simply look for xml file that are not cts. That will remove the need of parsing metadata.

I'd prefer that as an option.

@sonofmun
Copy link
Contributor Author

Would I get the deepest node by specifying the subreference in the getTextualNode method?
And I am not sure how I can export to plain text without either using the resolver or having to parse the file myself? In interactive_text.export(Mimetypes.PLAINTEXT), interactive_text needs to be MyCapytain.resources.texts.local.capitains.cts.__SimplePassage__. And since this is part of the build script, I think I need to reload the collection in order to use CapiTainS to extract the textual nodes, thus the resolver. Is there a better way to do this, e.g., to carry Repository over from hooktest to hooktest-build? Or am I completely missing the point, somehow?

@PonteIneptique
Copy link
Member

  1. List all files that are not cts.xml
  2. Initiate a text = CapitainsCtsText
  3. Retrieve references : reffs = text.getReffs(depth=len(text.citation))
  4. Retrieve passages : passages = [text.getTextualNode(passage) for passage in reffs]
  5. Export passages to text.

Stopped using the resolver and replaced it with CapitainsCtsText called on an lxml _Element

Added the --txt and --cites parameters to hooktest-build
@sonofmun
Copy link
Contributor Author

Stupid. I used my local paths to load some files in the tests. That is why it failed. I will correct it and push the new version.

@coveralls
Copy link

coveralls commented May 30, 2017

Coverage Status

Coverage decreased (-0.1%) to 93.32% when pulling c979d05 on plain_text into 28741bd on master.

@sonofmun
Copy link
Contributor Author

I need to test the exception on the plain_text function. Also, I think I need to specify the exception more closely. But I honestly don't remember what the error was that I was trying to catch.

@sonofmun
Copy link
Contributor Author

I remember now. The exception I was getting was when using the resolver and there being a file mentioned in the metadata that didn't actually exist in the file system. I will remove the try, except statement and repush since I am no longer using the resolver.

@coveralls
Copy link

coveralls commented May 30, 2017

Coverage Status

Coverage increased (+0.1%) to 93.59% when pulling c05b195 on plain_text into 28741bd on master.

@coveralls
Copy link

coveralls commented May 31, 2017

Coverage Status

Coverage increased (+0.1%) to 93.59% when pulling b03adf7 on plain_text into 28741bd on master.

@PonteIneptique
Copy link
Member

PonteIneptique commented Jun 5, 2017

All great and nearly ready to merge except :

  • Build class init documentation has not been updated
  • CHANGES not updated
  • Bump minor version (second number) because addition to the code base ;)

Updated the init documentation on the build class to include the new parameters (tar, txt, cites)

Updated the CHANGES.txt file to reflect the addition of the plain_text function

Bumped the version number to 1.1.0

Updated the setup.py to show this version
@PonteIneptique
Copy link
Member

Hey !
I actually missed one last thing : could you add a section to the readme for this new "method" ? I don't see the build shown there :)

@coveralls
Copy link

coveralls commented Jun 6, 2017

Coverage Status

Coverage increased (+0.1%) to 93.59% when pulling 142bb06 on plain_text into 28741bd on master.

I didn't add very much since the scope at this point is fairly limited. I simply added it into the flow of prose for the `.travis.yml` file for now. Once I add the `txt`, `csv`, and `json` parameters, I will add a whole section for hooktest-build.
@sonofmun
Copy link
Contributor Author

sonofmun commented Jun 6, 2017

As I said in the text for the commit:

I didn't add very much since the scope at this point is fairly limited. I simply added it into the flow of prose for the `.travis.yml` file for now. Once I add the `txt`, `csv`, and `json` parameters, I will add a whole section for hooktest-build.

@coveralls
Copy link

coveralls commented Jun 6, 2017

Coverage Status

Coverage increased (+0.1%) to 93.59% when pulling a40fa2e on plain_text into 28741bd on master.

@PonteIneptique PonteIneptique merged commit 12ecb2b into master Jun 6, 2017
@sonofmun sonofmun deleted the plain_text branch June 6, 2017 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants