
Tinkerpop 2274 tp33#1175

Closed
rpopov wants to merge 42 commits intoapache:tp33from
rpopov:TINKERPOP-2274-tp33

Conversation

@rpopov
Contributor

@rpopov rpopov commented Aug 5, 2019

For initial review - there are still more changes to come

Resolving locale and file system - specific issues when compiling under Windows (10)

rpopov added 7 commits August 3, 2019 14:31
the test data. In order to avoid Maven warnings, provided explicitly the most-recent
version of jacoco in <pluginManagement> and removed <prerequisite>.
to avoid cryptic test errors. Added Readme.md to describe how to install
Hadoop, Spark and OS integration
…mputer.AbstractHadoopGraphComputerTest

to delete the temporary files first before attempting to create new ones
with the same name. Provided better error diagnostic messages.
Set POM to check if HADOOP_HOME and HADOOP_GREMLIN_LIBS env. vars are set
in advance in order not to fail the build with a cryptic message.
…he classes

and failing the tests due to loading org.apache.tinkerpop.gremlin.TestHelper
from gremlin-core instead of from gremlin-test by replacing the RANDOM
constant with new Random().
*Suggestion:* Remove org.apache.tinkerpop.gremlin.TestHelper from gremlin-core
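The temporary-file handling described in the commit messages above can be sketched roughly as follows. This is a hypothetical helper, not the PR's actual code; the class and method names (`TempFileHelper`, `recreateFile`) are assumptions used only to illustrate deleting a stale file before creating a new one with the same name, with a clear diagnostic instead of a cryptic failure:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileHelper {
    /**
     * Deletes any stale file at the given path before creating a fresh one.
     * If the deletion fails (e.g. a lingering lock on Windows), reports a
     * clear diagnostic instead of letting the later create fail cryptically.
     */
    public static Path recreateFile(Path path) throws IOException {
        try {
            Files.deleteIfExists(path); // remove any stale copy first
        } catch (IOException e) {
            throw new IOException("Could not delete stale temporary file: " + path, e);
        }
        return Files.createFile(path);
    }
}
```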
@rpopov rpopov changed the base branch from master to tp33 August 5, 2019 06:00
@spmallette
Contributor

Please note that travis is failing the build at this point. The error appears consistent across the different builds, so I think it's a legitimate problem with your changes and not travis instability.

spmallette and others added 4 commits August 5, 2019 07:00
…ow Travis build pass,

but this basically negates the use of the Maven enforcer: if Hadoop is not
installed, the tests will fail with no indication why, instead of the build
defining and checking its environment clearly.
<artifactId>maven-enforcer-plugin</artifactId>
<executions>
<execution>
<id>check-hadoop-installed</id>
Contributor


Could you clarify the need for enforcer for these environment variables? Was the build failing without those somehow? I don't have them set at the moment and mvn clean install builds fine for me.

Contributor Author


My idea is to check the environment the build/tests run in and warn that it is incomplete, instead of throwing cryptic exceptions. Having this stated in the POM also serves as a kind of documentation. In this specific case I could not figure out the needed build environment setup, nor how the CI environment provides Hadoop libraries for the tests.

Contributor


I see where not setting HADOOP_GREMLIN_LIBS generates some WARN messages when you build:

   [WARN] org.apache.tinkerpop.gremlin.hadoop.jsr223.HadoopGremlinPlugin - Be sure to set the environmental variable: HADOOP_GREMLIN_LIBS
   [WARN] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [WARN] org.apache.tinkerpop.gremlin.hadoop.jsr223.HadoopGremlinPlugin - Be sure to set the environmental variable: HADOOP_GREMLIN_LIBS
   [WARN] org.apache.tinkerpop.gremlin.hadoop.jsr223.HadoopGremlinPlugin - Be sure to set the environmental variable: HADOOP_GREMLIN_LIBS
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.844 s - in org.apache.tinkerpop.gremlin.hadoop.jsr223.HadoopGremlinPluginTest

but it doesn't error, even when you run integration tests on hadoop-gremlin. I think those environment variables are really more for usage at runtime rather than for actual tests. You mentioned that you get "cryptic exceptions" - what are those exceptions and do they occur during build on windows?

I'd rather not force folks to set up environment variables unless they are necessary to the build somehow. But, now that I look at it again, I assume that <level>WARN</level> isn't actually enforcement (meaning the build doesn't fail if those environment variables aren't present). So perhaps this addition is ok given how you have done it even if there aren't errors specifically.
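A rule of the shape being discussed could look roughly like the following sketch. It assumes the enforcer's built-in requireEnvironmentVariable rule with a WARN level (which logs a warning instead of failing the build); the exact configuration in the PR may differ:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>check-hadoop-installed</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <requireEnvironmentVariable>
            <variableName>HADOOP_HOME</variableName>
            <!-- WARN only logs; the build still succeeds without the variable -->
            <level>WARN</level>
            <message>HADOOP_HOME is not set; some hadoop-gremlin tests may fail with cryptic errors.</message>
          </requireEnvironmentVariable>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```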

@spmallette
Contributor

The tp33 branch has been re-opened for development and is set to 3.3.9-SNAPSHOT. Please feel free to rebase.

spmallette and others added 11 commits August 13, 2019 08:03
Realized that we weren't even really testing GraphSON 2.0. It was hardcoded to GraphSON 3.0 for the vast majority of the tests. When I enabled GraphSON 2.0 tests I started getting test failures. I don't think they were failures in the sense that 2.0 doesn't work but more like the test semantics weren't properly set up for 2.0 serialization expectations. Anyway, disabling these tests for now until the issue can be resolved. CTR
the test data. In order to avoid Maven warnings, provided explicitly the most-recent
version of jacoco in <pluginManagement> and removed <prerequisite>.
to avoid cryptic test errors. Added Readme.md to describe how to install
Hadoop, Spark and OS integration
…mputer.AbstractHadoopGraphComputerTest

to delete the temporary files first before attempting to create new ones
with the same name. Provided better error diagnostic messages.
Set POM to check if HADOOP_HOME and HADOOP_GREMLIN_LIBS env. vars are set
in advance in order not to fail the build with a cryptic message.
…he classes

and failing the tests due to loading org.apache.tinkerpop.gremlin.TestHelper
from gremlin-core instead of from gremlin-test by replacing the RANDOM
constant with new Random().
*Suggestion:* Remove org.apache.tinkerpop.gremlin.TestHelper from gremlin-core
rpopov added 10 commits August 16, 2019 21:15
…ow Travis build pass,

but this basically negates the use of the Maven enforcer: if Hadoop is not
installed, the tests will fail with no indication why, instead of the build
defining and checking its environment clearly.
…tHelper

class, that is used both in the core and test projects, as a preparation to
avoid the collision of TestHelper classes in both projects
…stHelper and

reduced the changes to within that project. Any other projects should
continue using gremlin-test in order to keep using the TestHelper class
without any change in Java. Made TestHelper publish the methods of CoreTestHelper,
this way avoiding duplicated implementations.
org.apache.tinkerpop.gremlin.spark.SparkGremlinGryoSerializerTest
org.apache.tinkerpop.gremlin.spark.SparkGremlinTest
that used to fail with:
ERROR shouldSupportCopyMethods(org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorageCheck)
java.lang.AssertionError
        at org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorageCheck.shouldSupportCopyMethods(SparkContextStorageCheck.java:75)
…X, so

rm may fail. Changed rm to report whether it succeeded in removing all files in
scope, while rm with an empty scope still returns false.
Changed POM under Windows to skip the tests that remove Spark's working files
due to the same problem above. See: https://issues.apache.org/jira/browse/SPARK-12216
@spmallette
Contributor

It looks like you got some extra commits in here somehow. You might need to clean up your git history a bit.

rpopov added 3 commits August 24, 2019 09:56
… and

help further improvement of error reporting
help further improvement of error reporting.
Unify the naming of the Spark/Hadoop store directories to comply with the
directory-naming convention - all directories end with a name (not with /),
which makes references to them uniform (as per the spec of CoreTestHelper#makeTestDataDirectory()):
 a/b/c - the directory itself
 a/b/c/ - the contents of the directory
which unifies and correlates with the behavior of rm:
  rm(a/b/c) - remove the directory itself
  while in order to remove the contents of the directory and keep it, use
  rm(a/b/c/*)
This complies with the tests in spark-gremlin on the Spark/Hadoop file system.
Revealed that the Spark context is not closed between the tests, while
the tests remove its storage files; therefore explicitly store the Spark Context
for every test, making the tests independent and (more) correct in their Spark use.
Revealed that Spark does not close the FileInputStreams it iterates upon,
so that locks hang around in the file system. Thus, not being able to fix Spark/Hadoop
and specifically the use of MultiIterator, explicitly call System.gc() in
order to finalize the streams remaining open after closing the context.
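The rm convention described above can be illustrated with a small sketch. This is a hypothetical helper (`RmSketch.rm`), not the PR's FileSystemStorage code; it follows the stated rules: a path removes the directory itself, a path ending in /* removes only the contents, the result reports whether everything in scope was removed, and an empty scope yields false:

```java
import java.io.File;

public class RmSketch {
    /**
     * Removes files following the convention described above:
     *   rm("a/b/c")   - remove the directory (or file) itself, recursively
     *   rm("a/b/c/*") - remove only the contents, keeping the directory
     * Returns true only if every file in scope was removed; an empty
     * scope (nothing to remove) yields false.
     */
    public static boolean rm(String pattern) {
        if (pattern.endsWith("/*")) {
            File dir = new File(pattern.substring(0, pattern.length() - 2));
            File[] children = dir.listFiles();
            if (children == null || children.length == 0) return false; // empty scope
            boolean all = true;
            for (File child : children) all &= deleteRecursively(child);
            return all;
        }
        File target = new File(pattern);
        return target.exists() && deleteRecursively(target);
    }

    private static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null for plain files
        boolean ok = true;
        if (children != null)
            for (File child : children) ok &= deleteRecursively(child);
        return ok & f.delete(); // non-short-circuit: always attempt the delete
    }
}
```

With this distinction, a test can wipe a store's contents between runs without losing the directory itself, while the boolean result makes a failed removal (e.g. a file still locked on Windows) visible instead of silent.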
@spmallette
Contributor

I just wanted to call attention to my previous comment from a few weeks back - unfortunately, we can't easily evaluate/review this PR unless the commit history is cleaned up a bit.

… Refactor

FileSystemStorage to match the use & specification. Fix possible leaking
resources. Fix inconsistencies in the use of / and home directory.

NOTE: FileSystemStorage is still inconsistent in the use of / and home directory
and in appending /* and *
@rpopov
Contributor Author

rpopov commented Sep 8, 2019

Replaced by PR #1188

@rpopov rpopov closed this Sep 8, 2019
@rpopov rpopov deleted the TINKERPOP-2274-tp33 branch September 8, 2019 01:37