Add vsihdfs Support #714

Closed
wants to merge 21 commits into from

Conversation

jamesmcclain
Contributor

@jamesmcclain jamesmcclain commented Jun 25, 2018

What does this PR do?

This pull request adds (read-only) support for HDFS in the form of a vsihdfs driver.

What are related issues/pull requests?

Tasklist

  • Review
  • Adjust for comments
  • All CI builds and checks have passed
  • documenting this new FS in doc/gdal_virtual_file_systems.dox
  • adding some testing of it in a new autotest/gcore/vsihdfs.py

Environment

Provide environment details, if relevant:

  • OS:

Mostly developed on Linux Mint 17.3 (effectively Ubuntu 14.04); it should be fairly portable to any Unix-like system.

  • Compiler:
src/gdal% gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

To Build

  1. Re-generate build files by typing ./autogen.sh in the root of the repository.
  2. Set the JAVA_HOME environment variable. In my case, export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  3. Augment the LD_LIBRARY_PATH environment variable. In my case, export LD_LIBRARY_PATH=$HOME/local/hadoop-2.7.6/lib/native:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server:$LD_LIBRARY_PATH. That is necessary for the final link steps to succeed (perhaps this could be a topic for discussion, because my current understanding of autoconf is limited).
  4. Make sure that the CLASSPATH environment variable points to the Hadoop native libraries. In my case export CLASSPATH=$($HOME/local/hadoop-2.7.6/bin/hadoop classpath --glob). This is not required for building, but is required for running.
  5. Configure, making use of the new --with-hdfs flag. In my case, ./configure --prefix=$HOME/local/gdal-master --with-java=$JAVA_HOME --with-hdfs=$HOME/local/hadoop-2.7.6.
  6. Build
  7. Install

To Run

  1. LD_LIBRARY_PATH and CLASSPATH must be as described above.
  2. Run a program, e.g. $HOME/local/gdal-master/bin/gdalinfo /vsihdfs/file:/tmp/TC_NG_Baghdad_IQ_Geo.tif

The above command produces this output for me:

Driver: GTiff/GeoTIFF
Files: none associated
Size is 4323, 4323
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-122.718792017114666,38.052231141926995)
Pixel Size = (0.000138800000000,-0.000138800000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-122.7187920,  38.0522311) (122d43' 7.65"W, 38d 3' 8.03"N)
Lower Left  (-122.7187920,  37.4521987) (122d43' 7.65"W, 37d27' 7.92"N)
Upper Right (-122.1187596,  38.0522311) (122d 7' 7.53"W, 38d 3' 8.03"N)
Lower Right (-122.1187596,  37.4521987) (122d 7' 7.53"W, 37d27' 7.92"N)
Center      (-122.4187758,  37.7522149) (122d25' 7.59"W, 37d45' 7.97"N)
Band 1 Block=4323x1 Type=Byte, ColorInterp=Red
Band 2 Block=4323x1 Type=Byte, ColorInterp=Green
Band 3 Block=4323x1 Type=Byte, ColorInterp=Blue

@rouault
Member

rouault commented Jun 26, 2018

I didn't actually look at your code; since I see that you use Java libraries, my main question is: why not use the REST API exposed at https://hadoop.apache.org/docs/r1.0.4/webhdfs.html, like the other network VSI implementations in GDAL? At first sight the API seems reasonable, and it would greatly simplify the use and deployment of vsihdfs.

@jamesmcclain
Contributor Author

I didn't actually look at your code; since I see that you use Java libraries, my main question is: why not use the REST API

I think that the primary motivation is speed/scalability. The web API does look like a convenient choice for many cases, though.


VSIHdfsFilesystemHandler::VSIHdfsFilesystemHandler()
{
this->poFilesystem = hdfsConnect("default", 0);
Member

what is the overhead of this? This constructor will be called at GDAL startup when VSI is initialized, so if significant time is involved, the connection should be moved to the first time a real access (open, stat, etc.) is done

Contributor Author

Done.
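The lazy-connection fix discussed above can be sketched as follows. This is a minimal Python illustration, not the GDAL code; `fake_connect` is a hypothetical stand-in for libhdfs's `hdfsConnect("default", 0)`:

```python
class LazyFilesystemHandler:
    """Sketch: no connection is made at construction (i.e. at GDAL startup);
    the injected connect() callable runs on the first real access."""

    def __init__(self, connect):
        self._connect = connect   # stand-in for hdfsConnect("default", 0)
        self._fs = None           # the constructor does no work

    def _ensure_filesystem(self):
        if self._fs is None:      # connect once, on the first open/stat/...
            self._fs = self._connect()
        return self._fs

    def open(self, path):
        return (self._ensure_filesystem(), path)

calls = []
def fake_connect():
    calls.append(1)
    return "fs-handle"

handler = LazyFilesystemHandler(fake_connect)
assert calls == []                # nothing dialed at construction
handler.open("/tmp/a")
handler.open("/tmp/b")
assert calls == [1]               # connected exactly once, on first use
```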

dnl JNI
dnl ---------------------------------------------------------------------------

AC_ARG_WITH(jni,[ --with-jni[=ARG] Include JVM JNI support (ARG=Path or no)],,)
Member

as you mentioned in the checklist, there's already an option to detect the JVM

Contributor Author

Done.

else
if test "$JNI_ENABLED" = "yes"; then
AC_MSG_CHECKING([for HDFS in $with_hdfs])
AC_CHECK_LIB(hdfs,hdfsConnect,HAVE_HDFS_LIB=yes,HAVE_HDFS_LIB=no,$JNI_LIB -L$with_hdfs/lib/native)
Member

So libhdfs is a C-API library that calls into Java code?

Contributor Author

Yes, that is correct. A small amount of information can be found here. The $HADOOP_PREFIX/include/hdfs.h header is the most comprehensive source of information on libhdfs that I have found.

* DEALINGS IN THE SOFTWARE.
****************************************************************************/

//! @cond Doxygen_Suppress
Member

there should be an empty implementation of VSIInstallHdfsHandler() when HDFS_ENABLED is not defined

Contributor Author

Done.

vsi_l_offset Length()
{
hdfsFileInfo * poInfo = hdfsGetPathInfo(this->poFilesystem, this->oFilename.c_str());
if (poInfo != NULL) {
Member

please use nullptr everywhere

Contributor Author

Done.

size_t Write(const void *pBuffer, size_t nSize, size_t nMemb) override;
vsi_l_offset Length()
{
hdfsFileInfo * poInfo = hdfsGetPathInfo(this->poFilesystem, this->oFilename.c_str());
Member

there are a number of useless this-> throughout the code. If you want to distinguish local variables from class members, you could use the m_xxx or xxx_ naming convention for class member variables

Contributor Author

Done.

vsi_l_offset Tell() override;
size_t Read(void *pBuffer, size_t nSize, size_t nMemb) override;
size_t Write(const void *pBuffer, size_t nSize, size_t nMemb) override;
vsi_l_offset Length()
Member

as it is not a one-liner, could you move the implementation of this method outside of the class declaration?

Contributor Author

Done.


int VSIHdfsHandle::Eof()
{
return (this->Tell() == this->Length());
Member

this is not completely POSIX compliant. If a file is, for example, one byte large, and you read one byte from it and then call Eof(), it should return FALSE. Eof() should only be set to TRUE if you attempt to read beyond the file, and it should be reset as soon as you do a Seek()

Contributor Author

Done.
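The POSIX EOF semantics requested above can be sketched with a small in-memory file. This is a Python illustration of the behavior, not the actual VSIHdfsHandle code:

```python
class MemFile:
    """Sketch of POSIX-style EOF: the flag is raised only by a read attempt
    past the end of the data, and cleared again by any Seek()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self._eof = False

    def read(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        if len(chunk) < n:    # asked for more bytes than remained
            self._eof = True
        return chunk

    def seek(self, pos):
        self._pos = pos
        self._eof = False     # seeking resets the end-of-file indicator

    def eof(self):
        return self._eof

f = MemFile(b"x")             # a one-byte file, as in the review example
f.read(1)
assert not f.eof()            # read exactly to the end: EOF not yet set
f.read(1)
assert f.eof()                # attempted to read beyond the end: EOF set
f.seek(0)
assert not f.eof()            # Seek() clears the flag
```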

}
else {
const char * pszPath = pszFilename + strlen(VSIHDFS);
hdfsFile poFile = hdfsOpenFile(poFilesystem, pszPath, O_RDONLY, 0, 0, 0);
Member

I assume the hdfs API is thread-safe?

Contributor Author

Yes; the claim can be found here. The exact wording used there seems to leave a little room for a negative answer if there is a deficiency within the JNI functionality, but that is not believed to be the case.


hdfsFileInfo * paoInfo = hdfsListDirectory(this->poFilesystem, pszDirname, &mEntries);
if (paoInfo != NULL) {
papszNames = new char*[mEntries+1];
Member

could be simplified a bit by using

CPLStringList aosNames;
for (int i = 0; i < mEntries; ++i)
    aosNames.AddString(paoInfo[i].mName);
return aosNames.StealList();

Contributor Author

Done.

@rouault
Member

rouault commented Jun 26, 2018

Is there a way to silence the Java exceptions on failed open() attempts? Ideally they should not be verbose

@jamesmcclain
Contributor Author

jamesmcclain commented Jun 26, 2018

Is there a way to silence the Java exceptions on failed open() attempts? Ideally they should not be verbose

That is a good question, I will try to find an answer.

EDIT: I have addressed this.

@rouault
Member

rouault commented Jun 26, 2018

Reported by Doxygen

/home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil_hdfs.cpp:346: warning: Found an #else without a preceding #if.
/home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil_hdfs.cpp:353: warning: More #endif's than #if's found.

@@ -209,6 +207,9 @@ VSIHdfsFilesystemHandler::Open( const char *pszFilename,
const char *pszAccess,
bool)
{
if (poFilesystem == nullptr)
Member

should be protected by a mutex. And probably shall be done in Stat() as well

Contributor Author

Done.

return 0;
}
else if (bytes < 0)
return -1;
Member

-1 is invalid for size_t. Should be 0

return hdfsRead(poFilesystem, poFile, pBuffer, nSize * nMemb);
int bytes = hdfsRead(poFilesystem, poFile, pBuffer, nSize * nMemb);
if (bytes == 0) {
bEOF = true;
Member

is bytes == 0 returned if you read partly inside the file and partly outside it? In that situation, ideally you would return the number of members successfully read, and set eof

Contributor Author

I changed Read to have the following behavior:

The Read function returns a non-zero value if and only if the read
from HDFS succeeded. Given a read request of more than zero bytes,
hdfsRead returns zero if and only if EOF has been reached, so the EOF
state is set to be true if that is the case. hdfsRead returns -1 if
and only if the read failed, in that case Read returns zero with the
EOF state cleared.

The hdfsRead function returns zero (for a non-zero requested range) if and only if EOF has been reached:

@return      On success, a positive number indicating how many bytes
             were read.
             On end-of-file, 0.
             On error, -1.  Errno will be set to the error code.
             Just like the POSIX read function, hdfsRead will return -1
             and set errno to EINTR if data is temporarily unavailable,
             but we are not yet at the end of the file.

so I think that the behavior should now be correct. errno is also properly set after this code has run in case that is used anywhere.
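The return-value mapping described above can be sketched as follows; `read_raw` is a hypothetical stand-in for `hdfsRead` (positive byte count on success, 0 at EOF, -1 on error):

```python
def vsi_read(read_raw, handle, n_size, n_memb):
    """Sketch of the Read() contract described above."""
    nbytes = read_raw(handle, n_size * n_memb)
    if nbytes > 0:
        return nbytes // n_size    # number of whole members read
    if nbytes == 0:
        handle["eof"] = True       # 0 for a non-zero request means EOF
        return 0
    handle["eof"] = False          # -1 means the read failed: EOF cleared
    return 0                       # Read returns zero on failure

handle = {"eof": False}
assert vsi_read(lambda h, n: n, handle, 1, 10) == 10   # full success
assert vsi_read(lambda h, n: 0, handle, 1, 10) == 0 and handle["eof"]
assert vsi_read(lambda h, n: -1, handle, 1, 10) == 0 and not handle["eof"]
```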

* DEALINGS IN THE SOFTWARE.
****************************************************************************/

//! @cond Doxygen_Suppress
Member

This //! @cond Doxygen_Suppress is not properly balanced with the corresponding //! @endcond if you take into account #ifdef / #endif

Contributor Author

Ah, okay. I don't have doxygen installed locally so I was reduced to guessing. Thank you.

Contributor Author

I think that I may have cleaned these issues up, finally.

@rouault
Member

rouault commented Jun 27, 2018

Doxygen is still complaining since it takes the dummy code path

Generating code for file /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsi.h:337: warning: Member VSIInstallHdfsHandler(void) (function) of file cpl_vsi.h is not documented.

You should be able to make it happy by defining HDFS_ENABLED in the PREDEFINED variable of the gdal/Doxyfile file.

The Windows build also fails since VSIInstallHdfsHandler() is called but not defined. You should add cpl_vsil_hdfs.obj to port/makefile.vc.

@jamesmcclain jamesmcclain force-pushed the vsihdfs branch 4 times, most recently from 53c14ea to 8898478 Compare June 27, 2018 16:37
@@ -260,8 +267,10 @@ VSIHdfsFilesystemHandler::Open( const char *pszFilename,
int
VSIHdfsFilesystemHandler::Stat( const char *pszeFilename, VSIStatBufL *pStatBuf, int)
{
memset(pStatBuf, 0, sizeof(*pStatBuf)); // XXX
CPLMutexHolder oHolder( &hMutex );
Member

poFilesystem might be uninitialized at that point if Open() has never been called

@@ -61,6 +61,16 @@ CPL_CVSID("$Id$")
/* ==================================================================== */
/************************************************************************/

#define SILENCE(expr) {\
int hOldStderr = dup(2);\
Member

in case the process runs short of available file handles, dup() and open() could fail. It would be prudent to check them

Contributor Author

Okay, done.
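The checked version of the stderr-silencing idea can be sketched in Python with the same POSIX calls (dup/open/dup2). This illustrates the pattern with the failure checks the reviewer asks for; it is not the actual SILENCE macro:

```python
import contextlib
import os
import sys

@contextlib.contextmanager
def silence_stderr():
    """Temporarily point fd 2 at /dev/null, verifying that dup() and
    open() succeeded before replacing the real stderr."""
    try:
        saved = os.dup(2)                  # may fail if fds are exhausted
    except OSError:
        yield                              # cannot save stderr: silence nothing
        return
    try:
        devnull = os.open(os.devnull, os.O_WRONLY)
    except OSError:
        os.close(saved)
        yield                              # cannot open /dev/null: do nothing
        return
    try:
        sys.stderr.flush()
        os.dup2(devnull, 2)                # redirect stderr to /dev/null
        yield
    finally:
        os.dup2(saved, 2)                  # restore the original stderr
        os.close(devnull)
        os.close(saved)

with silence_stderr():
    os.write(2, b"this line is discarded\n")
```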

@@ -217,6 +217,11 @@ class VSIHdfsFilesystemHandler final : public VSIFilesystemHandler
VSIHdfsFilesystemHandler();
~VSIHdfsFilesystemHandler() override;

void EnsureFilesystem() {
if (poFilesystem == nullptr)
poFilesystem = hdfsConnect("default", 0);
Member

inconsistent indentation. Also, the implementation would be best outside of the class declaration, and you could move the CPLMutexHolder into this method, since it is needed just for this lazy instantiation

Contributor Author

Okay, done.
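The arrangement suggested above, with the mutex held only inside the lazy initializer, can be sketched like this (Python's `threading.Lock` standing in for `CPLMutexHolder`, and an injected `connect` for `hdfsConnect`):

```python
import threading

class FilesystemHandler:
    """Sketch: the lock lives inside ensure_filesystem(), so it guards only
    the lazy instantiation and callers need not take it themselves."""

    def __init__(self, connect):
        self._connect = connect
        self._fs = None
        self._lock = threading.Lock()   # stand-in for CPLMutexHolder

    def ensure_filesystem(self):
        with self._lock:                # held just for the lazy instantiation
            if self._fs is None:
                self._fs = self._connect()
        return self._fs

calls = []
handler = FilesystemHandler(lambda: calls.append(1) or "fs-handle")
workers = [threading.Thread(target=handler.ensure_filesystem) for _ in range(8)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert calls == [1]                     # one connection despite concurrent callers
```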

@jamesmcclain jamesmcclain changed the title [WiP] Add vsihdfs Support Add vsihdfs Support Jun 28, 2018
@jamesmcclain
Copy link
Contributor Author

I think that all current comments have been satisfied.

@rouault
Copy link
Member

rouault commented Jun 29, 2018

I think that all current comments have been satisfied.

Yep, but there are still missing items from my point of view:

  • documenting this new FS in doc/gdal_virtual_file_systems.dox
  • adding some testing of it in a new autotest/gcore/vsihdfs.py

@jamesmcclain
Copy link
Contributor Author

Yep, but there are still missing items from my point of view:

Okay, sounds good, I will get to those as soon as I can.

def vsihdfs_1():
filename = '/vsihdfs/file:' + os.getcwd() + '/data/text.txt'
fp = gdal.VSIFOpenL(filename, 'rb')
if fp is None:
Member

given that this is an optionally compiled VFS, tests should return 'skip' if the VFS is not present. Unfortunately, there's no easy way of knowing whether a VFS is compiled in, so presumably you would use that initial failure as a sign that it is missing

Contributor Author

Okay, I have implemented this.
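The skip-on-first-failure idea can be sketched in the return-string style the autotest suite used at the time; `FakeGdal` below is a hypothetical mock so the sketch runs without a GDAL build:

```python
class FakeGdal:
    """Mock of the gdal module: VSIFOpenL returns None when the VFS is absent."""
    def __init__(self, vfs_present):
        self._present = vfs_present

    def VSIFOpenL(self, filename, mode):
        return object() if self._present else None

def vsihdfs_read_test(gdal):
    fp = gdal.VSIFOpenL('/vsihdfs/file:/tmp/text.txt', 'rb')
    if fp is None:
        # optionally compiled VFS: treat the first open failure as "not built in"
        return 'skip'
    return 'success'

assert vsihdfs_read_test(FakeGdal(vfs_present=False)) == 'skip'
assert vsihdfs_read_test(FakeGdal(vfs_present=True)) == 'success'
```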

return 'fail'

data = gdal.VSIFReadL(5, 1, fp)
if not data or data != 'Lorem':
Member

you likely need data.decode('ascii') != 'Lorem' (and similar in the following call sites) for Python 3 compat

Contributor Author

Okay, done.
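The Python 3 point above in miniature: binary reads return `bytes`, which never compare equal to `str`, so a decode is needed before comparing:

```python
# What a VSIFReadL-style binary read returns under Python 3 is bytes,
# so compare after decoding rather than against a str literal.
data = b'Lorem'
assert data != 'Lorem'                  # bytes != str in Python 3
assert data.decode('ascii') == 'Lorem'  # decode first, then compare
```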

@rouault
Member

rouault commented Jun 30, 2018

A few Pyflakes warnings raised

autotest/gcore/vsihdfs.py:35: 'time' imported but unused
autotest/gcore/vsihdfs.py:125: local variable 'data' is assigned to but never used

@jamesmcclain
Contributor Author

A few Pyflakes warnings raised

Okay, I think that those have been corrected.

# Read test
def vsihdfs_1():
filename = '/vsihdfs/file:' + os.getcwd() + '/data/text.txt'
fp = gdal.VSIFOpenL(filename, 'rb')
Member

you could set here gdaltest.have_vsihdfs = False

Contributor Author

Okay, I have done this.

fp = gdal.VSIFOpenL(filename, 'rb')
if fp is None:
return 'skip'

Member

and here gdaltest.have_vsihdfs = True, and then in the following test step you would test this instead of trying to reopen the file when you don't need it

Contributor Author

Okay, I have done this.


/vsihdfs/ is a file system handler that provides read access to HDFS.
This handler requires GDAL to have been built with Java support
(--with-java) and HDFS support (--with-hdfs).
Member

You could add that only Unix builds are currently supported

Contributor Author

Okay.


Examples:
<pre>
/vsizip/file:/tmp/my.tif (a local file accessed through HDFS)
Member

the examples look wrong

Contributor Author

Yes, indeed.

@rouault
Member

rouault commented Jul 1, 2018

Squashed and merged

@rouault rouault closed this Jul 1, 2018
@jamesmcclain
Contributor Author

Thanks much

@jamesmcclain jamesmcclain deleted the vsihdfs branch July 2, 2018 00:01
@lichenang

import org.gdal.gdal.Dataset;
import org.gdal.gdal.Driver;
import org.gdal.gdal.gdal;
import org.gdal.gdalconst.gdalconstConstants;

public class Tifovr2Jpg {
    public static void main(String[] args) {
        gdal.AllRegister();
        gdal.SetConfigOption("gdal_FILENAME_IS_UTF8", "NO");
        System.out.println("Argument at index 0: " + args[0]);
        Dataset ds = gdal.Open(args[0], gdalconstConstants.GA_ReadOnly);
        System.out.println(ds);
        // check whether the data source is non-null
        if (ds == null) {
            System.out.println("Dataset is null...");
            System.exit(1);
        }

        Driver driver = ds.GetDriver();
        System.out.println("driver: " + driver);
    }
}

root@Kylin:/home/gdalTest# java Tifovr2Jpg /vsihdfs/hdfs://192.168.56.46:8020/home/chdh/rawdata/chdl/image/06490082/20220415000000/06490082.tif
Argument at index 0: /vsihdfs/hdfs://192.168.56.46:8020/home/chdh/rawdata/chdl/image/06490082/20220415000000/06490082.tif
hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsGetPathInfo(hdfs://192.168.56.46:8020/home/chdh/rawdata/chdl/image/06490082/20220415000000/06490082.tif): constructNewObjectOfPath error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
ERROR 4: `/vsihdfs/hdfs://192.168.56.46:8020/home/chdh/rawdata/chdl/image/06490082/20220415000000/06490082.tif' does not exist in the file system, and is not recognized as a supported dataset name.
null
Dataset is null...

root@Kylin:/home/gdalTest# gdalinfo /vsihdfs/hdfs://192.168.56.46:8020/home/chdh/rawdata/chdl/image/06490082/20220415000000/06490082.tif
Driver: GTiff/GeoTIFF
Files: /vsihdfs/hdfs://192.168.56.46:8020/home/chdh/rawdata/chdl/image/06490082/20220415000000/06490082.tif
Size is 11251, 7501
Coordinate System is `'
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0, 7501.0)
Upper Right (11251.0, 0.0)
Lower Right (11251.0, 7501.0)
Center ( 5625.5, 3750.5)
Band 1 Block=11251x16 Type=Byte, ColorInterp=Red
Band 2 Block=11251x16 Type=Byte, ColorInterp=Green
Band 3 Block=11251x16 Type=Byte, ColorInterp=Blue

I generated gdal.jar and related files by running make in GDAL's swig/java directory, and placed them on the JDK's ext path. When running my Java project, the above error occurred; when accessing the file through gdalinfo, there was no problem.


@jamesmcclain
Contributor Author

@lichenang

Hello, thank you for your question. It is hard to say precisely what is going on, but the fact that gdalinfo works while the same access from a Java process does not might be an important clue.

When you use the vsihdfs code, a JVM is created to run the parts of the filesystem code that are written in Java.

Therefore, when you write a Java program that uses the vsihdfs filesystem, you get a process with two JVMs: one JVM contains the entry point of your program; that JVM invokes native code that is part of GDAL; and that native code starts another JVM to run the parts of the HDFS implementation that are written in Java. This has been known to cause problems with some versions of the JDK. (Your stack trace doesn't seem to exhibit the problem just described, but I mention it to give you a flavor of the kind of problems you can encounter.)

I might suggest trying the /vsiwebhdfs/ filesystem from within your Java code to see if that works.
