Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ObjectStoreGlobber.glob(path) is returning null. Seems that globStatus(path) is not handling the wildcard * properly #95

Closed
gilv opened this issue Nov 18, 2016 · 25 comments

Comments

@gilv
Copy link
Contributor

gilv commented Nov 18, 2016

Create an object with the partition structure, something like
container/data/year=2015/month=11/data1.txt
container/data/year=2016/month=12/data2.txt
container/data/year=2015/month=11/data3.txt
container/data/year=2015/month=11/data4.txt

And now access all objects of the form
container/data/year=*/month=*

Stocator will return wrong result.

We also need to make sure that null is not returned, but empty collection. When ObjectStoreGlobber.glob(path) does not find any matches it should return an empty array instead of null

@gilv
Copy link
Contributor Author

gilv commented Nov 18, 2016

@djalova Can you please look at it? Thanks

@djalova
Copy link
Contributor

djalova commented Nov 19, 2016

@gilv I have objects with that structure but if I query for "container/data/year=*/month=*", it returns the correct results. For which query does this fail?

@gilv
Copy link
Contributor Author

gilv commented Nov 22, 2016

@djalova thanks. But there is some bug with returning wrong results when there are partitions /year=aaa/month=bbb and so on. I don't know yet what query return wrong result...can you experiment a bit with it? If i understand correctly then container/data/year=*/month=* this one work fine for you...may be you can try other options? like to list container/data or container/data/year=*?

@djalova
Copy link
Contributor

djalova commented Dec 1, 2016

@gilv For container/data there needs to be a 0 byte file named container/data for it to work. The container/data/year=* should work as is.

@robinnt
Copy link

robinnt commented Dec 7, 2016

I am also seeing an inconsistency with the default file system when there are no results returned from calling ObjectStoreFileSystem.globStatus. An empty array is expected but null is returned instead. I am seeing NullPointerException in many places because of this difference. Can this be changed to return an empty array when there are no results.

@djalova
Copy link
Contributor

djalova commented Dec 7, 2016

@robinnt This should be fixed in master, are you still running into NPEs with the latest code?

@robinnt
Copy link

robinnt commented Dec 9, 2016

@djalova I tried the code in the master branch and the NPEs are fixed. In the hive branch the fixes are not there but there are other fixes that are in progress that are needed. Can you add the NPE fix to the hive branch.

@djalova
Copy link
Contributor

djalova commented Dec 9, 2016

@robinnt The commit that fixes this should be in the hive branch as well. Can you send me a stack trace?

@robinnt
Copy link

robinnt commented Dec 9, 2016

@djalova ok got it. Thanks.

@robinnt
Copy link

robinnt commented Dec 13, 2016

Using the latest code in hive-support branch I am still seeing a problem where a call to globStatus returns an empty array when objects exist in swift that match.

These objects exist in swift:

gendata_19yrs_stoc
gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=2015/month=11
gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=2015/month=11/part-r-00003-attempt_1481248735958_0003_r_000003_0
gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=2015/month=12
gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=2015/month=12/part-r-00004-attempt_1481248735958_0003_r_000004_0

But nothing is returned by

globStatus("/gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=*/month=*")

@djalova
Copy link
Contributor

djalova commented Dec 13, 2016

@robinnt Does it still pick up everything if you query for "gendata_19yrs_stoc*" ?

@robinnt
Copy link

robinnt commented Dec 13, 2016

@djalova Also no results returned with

globStatus("/gendata_19yrs_stoc*")
or
globStatus("/gendata_19yrs_stoc/*")

@djalova
Copy link
Contributor

djalova commented Dec 13, 2016

@robinnt Do you have any logs I can look at? We have tests in TestSwiftOperations.java that mimic your scenario and those are passing. Maybe there is another issue here.

@robinnt
Copy link

robinnt commented Dec 13, 2016

Here is part of the log from calling globStatus until 404 Not Found is returned


2016-12-12 13:44:26,157 DEBUG [CommitterEvent Processor #1] com.ibm.stocator.fs.ObjectStoreFileSystem: Glob status: /gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=*/month=*

2016-12-12 13:44:26,159 DEBUG [CommitterEvent Processor #1] com.ibm.stocator.fs.common.ObjectStoreGlobber: Welcome to glob : /gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=*/month=*

2016-12-12 13:44:26,161 DEBUG [CommitterEvent Processor #1] com.ibm.stocator.fs.ObjectStoreFileSystem: list status: swift2d://stoctables.rn-softlayer/gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=,  prefix based true

2016-12-12 13:44:26,161 DEBUG [CommitterEvent Processor #1] com.ibm.stocator.fs.ObjectStoreFileSystem: get file status: swift2d://stoctables.rn-softlayer/gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=

2016-12-12 13:44:26,170 DEBUG [CommitterEvent Processor #1] org.javaswift.joss.command.impl.core.AbstractCommand: JOSS / Sending HTTP HEAD call https://dal.objectstorage.open.softlayer.com/v1/AUTH_b8233702081a4b98ab1cf044db6298bd/stoctables/gendata_19yrs_stoc%2F_DYN0.127403861618094193923682792174938877438%2Fyear%3D

2016-12-12 13:44:26,170 DEBUG [CommitterEvent Processor #1] org.javaswift.joss.command.impl.core.AbstractCommand: * X-Auth-Token=gAAAAABYTxo5E99AydN46thBkZzGcGy0i1lnF8Y5nTN-C4E8gWFBzh5h1T8ruLSleQqt6iXxDGW9RlgDAPP3qg-ZE6lxLK4c-jqlJtnbLOXiWdHPycO2a9hC_B9joILQgZkCngRa8tIvp3JRWsN60mpxNC_93yDSNgy2y0bWc2PdtHRaYcPtIKA

2016-12-12 13:44:26,170 DEBUG [CommitterEvent Processor #1] org.javaswift.joss.command.impl.core.AbstractCommand: * CONNECTION=Keep-Alive

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.client.protocol.RequestAddCookies: CookieSpec selected: default

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.client.protocol.RequestAuthCache: Auth cache not set in the context

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.impl.conn.PoolingHttpClientConnectionManager: Connection request: [route: {s}->https://dal.objectstorage.open.softlayer.com:443][total kept alive: 1; route allocated: 1 of 25; total allocated: 1 of 50]

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.impl.conn.PoolingHttpClientConnectionManager: Connection leased: [id: 0][route: {s}->https://dal.objectstorage.open.softlayer.com:443][total kept alive: 0; route allocated: 1 of 25; total allocated: 1 of 50]

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.impl.conn.DefaultManagedHttpClientConnection: http-outgoing-0: set socket timeout to 5000

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.impl.execchain.MainClientExec: Executing request HEAD /v1/AUTH_b8233702081a4b98ab1cf044db6298bd/stoctables/gendata_19yrs_stoc%2F_DYN0.127403861618094193923682792174938877438%2Fyear%3D HTTP/1.1

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.impl.execchain.MainClientExec: Target auth state: UNCHALLENGED

2016-12-12 13:44:26,171 DEBUG [CommitterEvent Processor #1] org.apache.http.impl.execchain.MainClientExec: Proxy auth state: UNCHALLENGED

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.headers: http-outgoing-0 >> HEAD /v1/AUTH_b8233702081a4b98ab1cf044db6298bd/stoctables/gendata_19yrs_stoc%2F_DYN0.127403861618094193923682792174938877438%2Fyear%3D HTTP/1.1

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.headers: http-outgoing-0 >> X-Auth-Token: gAAAAABYTxo5E99AydN46thBkZzGcGy0i1lnF8Y5nTN-C4E8gWFBzh5h1T8ruLSleQqt6iXxDGW9RlgDAPP3qg-ZE6lxLK4c-jqlJtnbLOXiWdHPycO2a9hC_B9joILQgZkCngRa8tIvp3JRWsN60mpxNC_93yDSNgy2y0bWc2PdtHRaYcPtIKA

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.headers: http-outgoing-0 >> CONNECTION: Keep-Alive

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.headers: http-outgoing-0 >> Host: dal.objectstorage.open.softlayer.com

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.headers: http-outgoing-0 >> User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_77)

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.headers: http-outgoing-0 >> Accept-Encoding: gzip,deflate

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "HEAD /v1/AUTH_b8233702081a4b98ab1cf044db6298bd/stoctables/gendata_19yrs_stoc%2F_DYN0.127403861618094193923682792174938877438%2Fyear%3D HTTP/1.1[\r][\n]"

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "X-Auth-Token: gAAAAABYTxo5E99AydN46thBkZzGcGy0i1lnF8Y5nTN-C4E8gWFBzh5h1T8ruLSleQqt6iXxDGW9RlgDAPP3qg-ZE6lxLK4c-jqlJtnbLOXiWdHPycO2a9hC_B9joILQgZkCngRa8tIvp3JRWsN60mpxNC_93yDSNgy2y0bWc2PdtHRaYcPtIKA[\r][\n]"

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "CONNECTION: Keep-Alive[\r][\n]"

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "Host: dal.objectstorage.open.softlayer.com[\r][\n]"

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_77)[\r][\n]"

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"

2016-12-12 13:44:26,172 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 >> "[\r][\n]"

2016-12-12 13:44:26,303 DEBUG [CommitterEvent Processor #1] org.apache.http.wire: http-outgoing-0 << "HTTP/1.1 404 Not Found[\r][\n]"

@robinnt
Copy link

robinnt commented Dec 13, 2016

@djalova
Copy link
Contributor

djalova commented Dec 13, 2016

@robinnt Does your query fail? I didn't see any exceptions thrown. The 404 error occurs because the ObjectStorefileSystem.listStatus() tries to look for objects that match the path exactly, and then when it fails it then decides to look for objects with that prefix. I can fix this behavior, but if you are running into any errors this isn't likely to be the cause. In the logs, the listing eventually returns 8 objects.

@robinnt
Copy link

robinnt commented Dec 14, 2016

That is correct there are no longer any exceptions thrown but the data is not committed into the correct location for the table so the table appears to be empty. This log is from a Big SQL LOAD command where the data is written to a temp directory (gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438) until all map tasks are completed successfully and then the data is moved to the final location (gendata_19yrs_stoc/). The globStatus is called during the move from temp dir to final dir but it incorrectly returns 0 objects so the commit fails with no data moved to the final dir. When the table is queried it is empty because all the data is still under the temp dir and was not properly moved to the table's location.

@djalova
Copy link
Contributor

djalova commented Dec 14, 2016

It seems in the logs there is a delete operation before the listing that causes an exception. Do you know where the delete is coming from? I think it deletes the objects that the next operation looks for.

2016-12-12 13:44:27,256 DEBUG [CommitterEvent Processor #1] com.ibm.stocator.fs.ObjectStoreFileSystem: delete: swift2d://stoctables.rn-softlayer/gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__/_logs recursive true. modifed name stoctables/gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__/_logs, hostname swift2d://stoctables.rn-softlayer/
...
2016-12-12 13:44:27,339 DEBUG [CommitterEvent Processor #1] com.ibm.stocator.fs.swift.SwiftAPIClient: Listing of stoctables/gendata_19yrs_stoc/_DYN0.127403861618094193923682792174938877438/year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__/_logs completed with 0 results
2016-12-12 13:44:27,460 DEBUG [CommitterEvent Processor #1] org.apache.hadoop.conf.Configuration: Handling deprecation for all properties in config...

@robinnt
Copy link

robinnt commented Dec 14, 2016

Yes after the globStatus returns an empty list the job tries to clean up and delete a _logs dir. This is expected. After the job ends the objects under the temp dir are still in swift so they were not deleted.

Seems when globStatus is called there should have been some messages with "Candidate accepted" showing the objects found but those messages are missing so something went wrong in the ObjectStoreGlobber.listStatus call where nothing is returned.

@djalova
Copy link
Contributor

djalova commented Dec 14, 2016

Are these files marked if the write operation is successful (i.e. does gendata.../_SUCCESS exist)?

@robinnt
Copy link

robinnt commented Dec 15, 2016

The _SUCCESS file is not created by LOAD

@djalova
Copy link
Contributor

djalova commented Dec 15, 2016

@gilv I think this is caused by this line https://github.com/SparkTC/stocator/blob/master/src/main/java/com/ibm/stocator/fs/swift/SwiftAPIClient.java#L502
From the logs, I can see that when glob() calls APIClient.list() the PaginationMap returns 4 objects, but none of them are returned back to glob. Since there isn't a _SUCCESS file the results in PaginationMap are ignored.

@robinnt
Copy link

robinnt commented Dec 15, 2016

I reran the rest with log level set to ALL so the trace messages are shown. You can see it passes through the code noted above.
container_e09_1481248735958_0007_01_000001.txt
above.

@djalova
Copy link
Contributor

djalova commented Dec 15, 2016

The following lines show that APIClient.list() skips over the objects because _SUCCESS is missing.

2016-12-15 11:03:43,999 TRACE [CommitterEvent Processor #1] com.ibm.stocator.fs.swift.SwiftAPIClient: gendata_19yrs_stoc2/_DYN0.2581013875130941702537108163988541555423/year=2015/month=11 created by failed Spark job. Skipped
...
2016-12-15 11:03:44,209 TRACE [CommitterEvent Processor #1] com.ibm.stocator.fs.swift.SwiftAPIClient: gendata_19yrs_stoc2/_DYN0.2581013875130941702537108163988541555423/year=2015/month=12 created by failed Spark job. Skipped

We currently use this file to differentiate between complete and incomplete files. If it interferes with other operations we might need to find another way to do this. @gilv What do you suggest?

@gilv
Copy link
Contributor Author

gilv commented Mar 22, 2018

Issue resolved with the introduction of the new object store globber

@gilv gilv closed this as completed Mar 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants