Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create vivo/home on application start-up (#192.1) #370

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ghost
Copy link

@ghost ghost commented Feb 23, 2023

VIVO GitHub issue: vivo-project/VIVO#3034

See details on related PR: vivo-project/VIVO#3839

Also see details on reverted PR: #192

Interested parties

@VIVO-project/vivo-committers @chenejac

@ghost ghost force-pushed the VIVO-3034-psuedo-rebase branch 2 times, most recently from 4a9e598 to fbd1de9 Compare February 23, 2023 21:39
@ghost ghost force-pushed the VIVO-3034-psuedo-rebase branch from fbd1de9 to bd48d16 Compare February 23, 2023 21:40
@ghost ghost marked this pull request as ready for review February 27, 2023 14:45
@ghost ghost linked an issue Feb 27, 2023 that may be closed by this pull request
Copy link
Contributor

@chenejac chenejac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwelling thanks for contribution. Can you please check my two comments? Also, I am wondering do we need example-settings.xml in Vitro project?

@ghost ghost requested a review from chenejac March 1, 2023 14:14
kaladay
kaladay previously approved these changes Mar 2, 2023
// Entry is a File
boolean write = true;

// reading bytes into memory to avoid having to unreliably reset stream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be warned, this can be a memory problem for large tar files.
It might be a good idea to do some sort of reasonable size check.
If under a certain max size, then load into memory; otherwise, just suffer the unreliable reset stream.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not. Give it more memory for the time needed to load the files. Stop trying to make every time I/O runtime. I am pretty sure this is still first-time startup and not every time startup. If we continue to add logic the certainty will be reduced. Closed loops for each time are needed in a centralized isolated scope. Application startup is determined by the orchestration not the composition.

I do agree that the exception should be handled appropriately. Maybe a warning log after exception bubbled to its encapsulation. Let's see.

* @return input stream of VIVO home tar file
*/
private InputStream getHomeDirTar() {
String tarLocation = "/WEB-INF/resources/home-files/vivo-home.tar";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised it would need a leading slash.

@ghost
Copy link
Author

ghost commented Mar 2, 2023

@litvinovg @chenejac it turns out that properties from an existing runtime properties in an existing VIVO home directory will be used over what comes from the context.xml.

https://github.com/vivo-project/Vitro/blob/VIVO-3034-psuedo-rebase/api/src/main/java/edu/cornell/mannlib/vitro/webapp/config/ConfigurationPropertiesImpl.java#L38

The context.xml properties are placed into a HashMap first and then the runtime properties. If the runtime properties has the Vitro.defaultNamespace and the rootUser.emailAddress it will replace what is defined in the context.xml or substituted from JAVA_OPTS.

@chenejac
Copy link
Contributor

chenejac commented Mar 3, 2023

@litvinovg @chenejac it turns out that properties from an existing runtime properties in an existing VIVO home directory will be used over what comes from the context.xml.

https://github.com/vivo-project/Vitro/blob/VIVO-3034-psuedo-rebase/api/src/main/java/edu/cornell/mannlib/vitro/webapp/config/ConfigurationPropertiesImpl.java#L38

The context.xml properties are placed into a HashMap first and then the runtime properties. If the runtime properties has the Vitro.defaultNamespace and the rootUser.emailAddress it will replace what is defined in the context.xml or substituted from JAVA_OPTS.

Not sure this is not desired behavior. This means when someone installing VIVO for the first time, and there is no VIVO_HOME, the information from context.xml will be used. If someone is upgrading VIVO, the information from runtime.properties are used (for older versions of VIVO prior to this PR). Not sure how likely is scenario that existing custom VIVO user would like to change defaultNamespace and emailAddress? Anyway, if that is needed, it is possible, we will describe that in the documentation (remove properties from runtime.properties and define in context.xml). I would keep this as it is now, if there are no some strong arguments against.

@ghost ghost requested review from kaladay and chenejac March 6, 2023 14:31
Copy link
Contributor

@chenejac chenejac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwelling I have posted a couple of suggestions, just to align VIVO and Vitro project. I think we should also consider adding properties in Vitro/pom.xml for making it possible to build Vitro project separately

        <app-name>vivo</app-name>
        <vivo-dir>${basedir}</vivo-dir>
        <root-user-address>vivo_root@mydomain.edu</root-user-address>
        <default-namespace>http://vivo.mydomain.edu/individual/</default-namespace>

I like the idea behind this PR, and I think this is definitely step forward, so I am on the side for approval of this PR. However, I will try to test different scenarios these days. If some scenario is not fully covered, but is not likely that we have a lot of VIVO customers who might suffer because of that, we will cover that by extension/customization of wiki documentation.

<Environment
type="java.lang.String"
name="vitro/rootUserAddress"
value="vivo_root@mydomain.edu" override="true"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value="vivo_root@mydomain.edu" override="true"/>
value="${root-user-address}" override="true"/>

<Environment
type="java.lang.String"
name="vitro/defaultNamespace"
value="http://vivo.mydomain.edu/individual/" override="true"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value="http://vivo.mydomain.edu/individual/" override="true"/>
value="${default-namespace}" override="true"/>

Copy link
Contributor

@chenejac chenejac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwelling can you please check my comments?
I think we have at least two scenarios for VIVO installers who are customizing VIVO:

  1. VIVO adopters who forked repository and making some changes there. I suppose in the process of upgrading VIVO, those adopters are actually merging their own customizations with new version of VIVO changes at the level of git (including changes in home/rdf directory).
  2. This PR actually enabling decoupling building and deployment process, which means that some future VIVO adopters don't need source code at all, they can just download war file of a release, define context in tomcat and run tomcat. Also, they can use the same approach for upgrading VIVO. However, those users can make some customization at the level of VIVO_HOME/rdf, and those customizations with more or less effort should be in place after upgrading VIVO to newer version. Not sure how often this case will be in the future, but we should try to support that, and I think your idea with manual creation of digest.md5 is good enough, we should just make it works and document that at some wiki page. My comments are linked with this case, which is not working at the moment in my test:
  • build war file
  • create an empty VIVO_HOME
  • run tomcat
  • the VIVO_HOME is successfully created!!!
  • imagine we are customizing VIVO by changing VIVO_HOME/rdf/abox/filegraph/us-states.rdf (remove Alabama)
  • imagine that we have new VIVO version which is changing some other file (I tried to change continents.n3 file), and want to upgrade VIVO
  • manually create digest.md5 file for VIVO_HOME
  • build war file (pointing to the same VIVO_HOME)
  • run tomcat
  • change in continents.n3 file is correct (from the fake new version of VIVO), but customized us-states.rdf file is overwritten (Alabama is back in the file), even though I have manually created digest.md5 file

*
* Checksum digest can be manually created with the following command.
*
* `find /vivo/home -type f | grep -E "^/vivo/home/bin/|^/vivo/home/config/|^/vivo/home/rdf/" | xargs md5sum > /vivo/home/digest.md5`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am using absolute paths for manually made digest.md5, it is not working for me in the untar method (relative paths starting with bin/, rdf/, config/* are expected there), therefore I have replaced this command with the one below. @wwelling can you please check is there any sense in that

Suggested change
* `find /vivo/home -type f | grep -E "^/vivo/home/bin/|^/vivo/home/config/|^/vivo/home/rdf/" | xargs md5sum > /vivo/home/digest.md5`
* `find /vivo/home -type f | grep -E "^/vivo/home/bin/|^/vivo/home/config/|^/vivo/home/rdf/" | sed 's/^[^\.]\+home\///' | xargs md5sum > /vivo/home/digest.md5`

Copy link
Author

@ghost ghost Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add additional logging in the vivo home directory class, (VitroHomeDirectory.java). Would you prefer DEBUG/WARN/INFO log levels populated each class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that might be useful. Depending of the status of Vitro home creation process, I suppose FATAL, ERROR, WARN and INFO levels should be used

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenejac can we standardize the format of the digest file? Parallel File Systems – Enabling insights from our data 4/7/2023 11:04AM CT? If the application generates a POSIX standard file, we should be good to go. I suppose we should write a test for the digest file against the POSIX standard... Can we create an issue for this instead of any further changes to this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenejac can we standardize the format of the digest file? Parallel File Systems – Enabling insights from our data 4/7/2023 11:04AM CT? If the application generates a POSIX standard file, we should be good to go. I suppose we should write a test for the digest file against the POSIX standard... Can we create an issue for this instead of any further changes to this PR?

Sure, I think it is good idea, and I think it can be resolved in the new PR. @wwelling can you please open a ticket/github issue? Thanks

InputStream homeDirTar = getHomeDirTar();
TarArchiveInputStream tarInput = new TarArchiveInputStream(homeDirTar);
) {
while ((tarEntry = tarInput.getNextTarEntry()) != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while ((tarEntry = tarInput.getNextTarEntry()) != null) {
digest.put("untarVitro", checksum("generated by Vitro in deployment process".getBytes()));
while ((tarEntry = tarInput.getNextTarEntry()) != null) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding an entry in digest file which representing a flag that digest file is not manually created by using a command from terminal. It will allow to make distinction between

  1. if file has not changed in home and is not the same as new file (in new version of VIVO/Vitro)
  2. if file has been changed in home and it is aligned with VIVO_HOME/digest.md5 by manual creation of that using command from terminal, and is not the same as new file (in new version of VIVO/Vitro)

In the first case, it should be overwritten, in the second it shouldn't

Copy link
Author

@ghost ghost Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears, those whomever customized the ontology is not interested in submitting PR back upstream. We can try to force that hand through a digest file. Provide enough justification while being discrete and helpful as possible. I am no lawyer however it seems possible addition can be made to the existing license with enough layers. Anyhow, the digest file is just a hack to not overwrite something that changed manually.

Copy link
Author

@ghost ghost Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does it overwrite. Yeah, that probably makes the biggest difference. Have to change the narrative around again.

The application defines the VIVO home directory not the implementer. If the end user is defined to be able to modify the VIVO home, they should be able to through the UI only.

Comment on lines +139 to +140
write = storedDigest.get(outFilename).equals(existingFileChecksum)
&& !existingFileChecksum.equals(newFileChecksum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also returning true for scenario number 3 after manual creation of digest.md5 from the command line:
An institution is upgrading VIVO to some future VIVO which is not changing any file which should be copied to config or rdf subdirectories of VIVO_HOME, but institution customized its previous VIVO version by changing something in VIVO_HOME/config or VIVO_HOME/rdf.
So, this is not situation when some institution is creating a fork and making changes from there, this is a situation when a VIVO installer is making customization direct in VIVO_HOME/rdf files. In order to prevent overwriting customized files in this case, I have tried to make change in line 114 in this file (see suggestion there), plus change in this line:

Suggested change
write = storedDigest.get(outFilename).equals(existingFileChecksum)
&& !existingFileChecksum.equals(newFileChecksum);
write = storedDigest.get(outFilename).equals(existingFileChecksum)
&& !existingFileChecksum.equals(newFileChecksum) && storedDigest.containsKey("untarVitro");

If some institution if forking VIVO/Vitro, and customized instance there, I suppose in the process of upgrade they will merge changes at the level of git, and run build from there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time a file changes in the VIVO home directory the digest must be updated, or the file will be overwritten on restart.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just add an environment variable that gets toggled after first time. This of course has to be persisted somewhere and if it changes somehow in the middle of a person making changes to the VIVO home directory it has to be changed back. Well, I have no confident in any expert in the world to make meaningful changes to an ontology without persisting the work they have done. This is accomplished by source versioning. As described before if you are building a fork of VIVO with changes in the home or installer artifact in how the home directory is defined, that will occur all the same as before with a digest file created on next start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to find out some common practices from the VIVO community, I think that VIVO LG is planning some VIVO implementers survey in the next period, I will check whether we can add a couple of questions about direct and indirect changes in the VIVO_HOME directory and the process of upgrading VIVO.

@ghost
Copy link
Author

ghost commented Apr 7, 2023

@wwelling thanks for contribution. Can you please check my two comments? Also, I am wondering do we need example-settings.xml in Vitro project?

@chenejac I am thinking this through. In the scenario where we want to create a new Vitro application/project we want it to start as a developer without any effort. Without an example-settings.xml, how will we provide one without skimming documentation?

@chenejac
Copy link
Contributor

@wwelling thanks for contribution. Can you please check my two comments? Also, I am wondering do we need example-settings.xml in Vitro project?

@chenejac I am thinking this through. In the scenario where we want to create a new Vitro application/project we want it to start as a developer without any effort. Without an example-settings.xml, how will we provide one without skimming documentation?

So, we can't do it in the same way as VIVO by introducing variables app-name and vivo-dir and running maven build, and later unpacking war file into tomcat container?

@ghost
Copy link
Author

ghost commented Apr 22, 2023

@chenejac What would we need an example-settings.xml in Vitro for? It will no longer used in VIVO, vivo-project/VIVO#3844.

@ghost ghost force-pushed the VIVO-3034-psuedo-rebase branch from bd2d06e to 817a2d5 Compare April 22, 2023 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants