Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What are the required functions of a "FileGetter/FileManipulator/InputValidator/etc" #400

Closed
whikloj opened this issue Jun 20, 2017 · 12 comments
Labels

Comments

@whikloj
Copy link
Contributor

whikloj commented Jun 20, 2017

This is a semi-documentation, semi-codebase question.

I am looking to to use MiK to move our Manitobia TEI stuff into Islandora. I went in to look at how to extend the base classes and they don't seem to have any abstract required functions that would be called?

I was expecting something more like an Interface which defines the required methods which each implementation would then implement. Say (for example) all Getters must implement a getFiles method which takes a... string for location and returns an array of files? Or maybe more generally it must implement a doGet function which can take the configuration and return the file information.

So what methods do I need to implement in my new Getter if I want it to work within the MiK framework? Or is it rather that my custom Getter can completely change my workflow?

Sorry I didn't get to attend the workshop at IslandoraCon 😞

@MarcusBarnes
Copy link
Owner

@whikloj You are correct in that the fileGetter abstract class is vary bare: https://github.com/MarcusBarnes/mik/blob/master/src/filegetters/FileGetter.php This is in part to allow maximum flexibility for different workflows. Maybe you can write me about where your TEI data is stored and which solution pack you'll be targeting for use in Islandora? I can then suggest a toolchain that you can use as a basis for your TEI project.

@whikloj
Copy link
Contributor Author

whikloj commented Jun 20, 2017

It is not stored in a custom system, so it is fairly easy to access from the filesystem.

I just need to grab the TEI files and Tiffs. I have some that will go to Large Image (maps, photos), some for compound (postcards, reports) and some for books (diaries).

@mjordan
Copy link
Collaborator

mjordan commented Jun 20, 2017

@whikloj developer docs are one of MIK's most glaring gaps. We've known this since #14, way back in May 2015. and it's been on our 1.0 release roadmap since the beginning. 😞 That said, if @MarcusBarnes 's advice doesn't yield a useful existing toolchain, we can work through your task with you.

@whikloj
Copy link
Contributor Author

whikloj commented Jun 20, 2017

@MarcusBarnes @mjordan Thanks for the responses and I'm certainly not complaining, it looks to be a really useful tool.

But as the barrier to access (none) and the number of items is relatively small (couple 1000) in my migration. (We are already using the standard uofm_newspaper_batch module to pull in the newspapers).

So I'm on the border of it being faster to just write a python script to reformat these things, but at the same time MiK seems very interesting to me. I'm just getting into this so I was unclear how the tool would know to run function(X) in my new TEIGetter class.

As a developer, I would suggest that an abstract class without any abstract methods should really be removed. It gives the impression that the classes are all extending a single class, when in reality they can all be totally unrelated.

@whikloj
Copy link
Contributor Author

whikloj commented Jun 20, 2017

Looking through the FileGetters it appears that getChildren() is in all of them and getFilePath() is in most (but not all) of them. In the cases where they don't have getFilePath() how do they know what gets executed?

This is just something I noticed and you can ignore it, but you have several ContentDM classes that have the same __construct. That seems like it is a perfect place for a sub-class. Then you could just call the parent constructor.

@mjordan
Copy link
Collaborator

mjordan commented Jun 20, 2017

As a developer, I would suggest that an abstract class without any abstract methods should really be removed. It gives the impression that the classes are all extending a single class, when in reality they can all be totally unrelated.

@whikloj That's a totally reasonable impression on your part.

@MarcusBarnes 's point about allowing the most diverse implementations is correct, but we probably overloaded that ability early on and didn't pay attention to it as the number of FileGetter classes grew. For example, all child classes have a getChildren() method:

CdmBooks.php:    public function getChildren($pointer)
CdmCompound.php:    public function getChildren($pointer)
CdmNewspapers.php:    public function getChildren($pointer)
CdmPdfDocuments.php:    public function getChildren($pointer)
CdmSingleFile.php:    public function getChildren($pointer)
CsvBooks.php:    public function getChildren($record_key)
CsvCompound.php:    public function getChildren($record_key)
CsvNewspapers.php:    public function getChildren($record_key)
CsvSingleFile.php:    public function getChildren($pointer)
OaipmhIslandoraObj.php:    public function getChildren($record_key)
OaipmhIslandoraObj.php.orig:    public function getChildren($record_key)
OaipmhOjsPdf.php:    public function getChildren($record_key)
OaipmhVital_2.php:    public function getChildren($record_key)
OaipmhXpath.php:    public function getChildren($record_key)

All FileGetter classes that deal with single file objects (CsvSingleFile, etc.) have a getFilePath() method

CsvSingleFile.php:    public function getFilePath($record_key)
OaipmhIslandoraObj.php:    public function getFilePath($record_key)
OaipmhIslandoraObj.php.orig:    public function getFilePath($record_key)
OaipmhOjsPdf.php:    public function getFilePath($identifier)
OaipmhVital_2.php:    public function getFilePath($record_key)
OaipmhXpath.php:    public function getFilePath($record_key)

and then it kind of breaks down from there to accommodate the specific needs of the toolchain:

CsvCompound.php has getCpdSourcePath()
CdmBooks.php has getPageOBJfileContent()
CsvBooks.php has getBookSourcePath()

etc.

@mjordan
Copy link
Collaborator

mjordan commented Jun 20, 2017

@whikloj sorry I posted my last comment before seeing your latest one. Yes, would make sense. I'm not sure we have the appetite to refactor the Cdm classes at this point since we don't have unit tests for those (we never got around to mocking a CONTENTdm server....) and they were certainly written on a fast timeline, so they are not perfect OOP code.

@mjordan
Copy link
Collaborator

mjordan commented Jun 20, 2017

@whikloj to answer your question about what fires the correct method in the FileGetters, it's whatever is within the toolchain's Writer, specifically, within the Writer's writePackages() method. We started developer docs to explain all this in the "Writing fetchers, file getters, metadata parsers, and writers" section of https://github.com/MarcusBarnes/mik/wiki/Information-for-developers, but never got very far.

@MarcusBarnes
Copy link
Owner

@mjordan I'm expecting to have an opportunity to use the CONTENTdm classes again. If things come together properly, I'll aim to do the simple refactoring of the CONTENTdm classes as suggested in the second part of #400 (comment) as I'll have access to a CONTENTdm instance to test against.

@whikloj
Copy link
Contributor Author

whikloj commented Jun 20, 2017

Oh okay, so the Writer is what calls stuff in the Getter and the Writer.php is an abstract class with required methods so that would be the entry point to the workflow. Your writer uses your getter to get files to write out. Am I close?

@MarcusBarnes
Copy link
Owner

@whikloj That's the idea. For certain toolchains (particularly CONTENTdm toochains), a great deal of compilation needs to take place before everything is in place to write out packages.

@mjordan
Copy link
Collaborator

mjordan commented Jun 21, 2017

@whikloj I've just added a section to https://github.com/MarcusBarnes/mik/wiki/Information-for-developers, directly at the top of the "Extending MIK" section, that explains MIK's overall workflow and the importance of the Writer's writePackages() method. I know you are asking for class API documentation, but does the new section fill in the context a bit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants