Delta Share Fails When Attempting to Read Delta Table #197

Closed
dtgdev opened this issue Oct 11, 2022 · 5 comments


dtgdev commented Oct 11, 2022

import delta_sharing

table_url = "/Users/user1/Applications/open-datasets.share#share1.default.test_facilities"

pandas_df = delta_sharing.load_as_pandas(table_url)

pandas_df.head(10)

My config.yaml:

# The format version of this config file
version: 1
# Config shares/schemas/tables to share
shares:
- name: "share1"
  schemas:
  - name: "default"
    tables:
    - name: "test_facilities"
      location: "/tmp/test_facilities"
host: "localhost"
port: 9999
endpoint: "/delta-sharing"

I keep getting the following error when I run the above program, even though the Delta table exists at that location.

HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9999/delta-sharing/shares/share1/schemas/default/tables/test_facilities/query
Response from server:
{'errorCode': 'INTERNAL_ERROR', 'message': ''}

Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: File system class org.apache.hadoop.fs.LocalFileSystem is not supported
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2055)
at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
at io.delta.standalone.internal.DeltaSharedTableLoader.loadTable(DeltaSharedTableLoader.scala:54)
at io.delta.sharing.server.DeltaSharingService.$anonfun$listFiles$1(DeltaSharingService.scala:282)
at io.delta.sharing.server.DeltaSharingService.processRequest(DeltaSharingService.scala:169)
... 60 more
Caused by: java.lang.IllegalStateException: File system class org.apache.hadoop.fs.LocalFileSystem is not supported
at io.delta.standalone.internal.DeltaSharedTable.$anonfun$fileSigner$1(DeltaSharedTableLoader.scala:97)
at io.delta.standalone.internal.DeltaSharedTable.withClassLoader(DeltaSharedTableLoader.scala:109)
at io.delta.standalone.internal.DeltaSharedTable.<init>(DeltaSharedTableLoader.scala:84)
at io.delta.standalone.internal.DeltaSharedTableLoader.$anonfun$loadTable$1(DeltaSharedTableLoader.scala:58)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4868)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
... 65 more

Member

zsxwing commented Oct 11, 2022

Files on the local file system are not supported by the server. You need to put your table in S3, Azure, or GCS.
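
To catch this before starting the server, one could check each `location` from config.yaml against a list of cloud storage URI schemes. The following is only a minimal sketch: the scheme set (`s3a`, `wasbs`, `abfss`, `gs`) is an assumption based on the three stores named above, and `check_locations` is a hypothetical helper, not part of delta-sharing.

```python
from urllib.parse import urlparse

# Assumed cloud storage URI schemes (S3, Azure Blob / ADLS Gen2, GCS).
# Check the server documentation for the schemes your version accepts.
ASSUMED_CLOUD_SCHEMES = {"s3a", "wasbs", "abfss", "gs"}


def check_locations(locations):
    """Return the locations whose URI scheme is not in the assumed cloud set."""
    unsupported = []
    for location in locations:
        # A plain local path such as /tmp/x has an empty scheme.
        scheme = urlparse(location).scheme.lower()
        if scheme not in ASSUMED_CLOUD_SCHEMES:
            unsupported.append(location)
    return unsupported


# The two table locations that appear in this thread.
bad = check_locations(["/tmp/test_facilities", "s3a://test/test_locations"])
print(bad)
```

With the two locations from this thread, only the local `/tmp/test_facilities` path is reported.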

Author

dtgdev commented Oct 11, 2022

That makes sense. Thanks.

I have a similar problem with S3.

import os
import delta_sharing

# Point to the profile file. It can be a file on the local file system or a file on a remote storage.
profile_file = os.path.abspath('open-datasets.share')

# Create a SharingClient.
client = delta_sharing.SharingClient(profile_file)

# List all shared tables.
print("########### All Available Tables #############")
print(client.list_all_tables())

All available tables are printed correctly, but when I try to read the data from the table with delta_sharing.load_as_pandas(table_url, limit=10), the errors below are thrown. I have spent many hours debugging it with no luck. Any guidance would be much appreciated.

table_url = profile_file + "#share1.default.test_locations"
data = delta_sharing.load_as_pandas(table_url, limit=10)
print(data)
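
The table URL in the snippets in this thread has the form `<profile-path>#<share>.<schema>.<table>`. As a minimal sketch, the hypothetical helpers below (not part of delta_sharing) build and split such a URL, which can help rule out a malformed URL when a read fails.

```python
def build_table_url(profile_path, share, schema, table):
    """Join a profile path and table coordinates into a table URL."""
    return f"{profile_path}#{share}.{schema}.{table}"


def split_table_url(url):
    """Split a table URL into (profile_path, share, schema, table)."""
    # Split on the last '#' so a '#' inside the profile path is tolerated.
    profile_path, sep, fragment = url.rpartition("#")
    parts = fragment.split(".")
    if not sep or not profile_path or len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <profile>#<share>.<schema>.<table>, got {url!r}")
    return (profile_path, *parts)


url = build_table_url("open-datasets.share", "share1", "default", "test_locations")
print(url)
print(split_table_url(url))
```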

My config.yaml

# The format version of this config file
version: 1
# Config shares/schemas/tables to share
shares:
- name: "share1"
  schemas:
  - name: "default"
    tables:
    - name: "test_locations"
      location: "s3a://test/test_locations"
host: "myserver.cxl.io"
port: 9999
endpoint: "/delta-sharing"

I can query the Delta table directly, but not through Delta Sharing:

Cell In [8], line 15
13 # Fetch 10 rows from a table and convert it to a Pandas DataFrame. This can be used to read sample data from a table that cannot fit in the memory.
14 print("########### Loading 10 rows as a Pandas DataFrame #############")
---> 15 data = delta_sharing.load_as_pandas(table_url, limit=10)
16 # Print the sample.
17 print("########### Show the fetched 10 rows #############")

File ~/Library/Python/3.9/lib/python/site-packages/delta_sharing/delta_sharing.py:71, in load_as_pandas(url, limit, version)
69 profile_json, share, schema, table = _parse_url(url)
70 profile = DeltaSharingProfile.read_from_file(profile_json)
---> 71 return DeltaSharingReader(
72 table=Table(name=table, share=share, schema=schema),
73 rest_client=DataSharingRestClient(profile),
74 limit=limit,
75 version=version,
76 ).to_pandas()

File ~/Library/Python/3.9/lib/python/site-packages/delta_sharing/reader.py:71, in DeltaSharingReader.to_pandas(self)
70 def to_pandas(self) -> pd.DataFrame:
---> 71 response = self._rest_client.list_files_in_table(
72 self._table,
73 predicateHints=self._predicateHints,
74 limitHint=self._limit,
75 version=self._version
76 )
78 schema_json = loads(response.metadata.schema_string)
80 if len(response.add_files) == 0 or self._limit == 0:

File ~/Library/Python/3.9/lib/python/site-packages/delta_sharing/rest_client.py:111, in retry_with_exponential_backoff.<locals>.func_with_retry(self, *arg, **kwargs)
106 raise HTTPError(
107 "It may be caused by an expired token as it has expired at "
108 + f"{self._profile.expiration_time}"
109 ) from e
110 else:
--> 111 raise e

File ~/Library/Python/3.9/lib/python/site-packages/delta_sharing/rest_client.py:99, in retry_with_exponential_backoff.<locals>.func_with_retry(self, *arg, **kwargs)
97 times_retried += 1
98 try:
---> 99 return func(self, *arg, **kwargs)
100 except Exception as e:
101 if self._should_retry(e) and times_retried <= self._num_retries:

File ~/Library/Python/3.9/lib/python/site-packages/delta_sharing/rest_client.py:267, in DataSharingRestClient.list_files_in_table(self, table, predicateHints, limitHint, version)
264 if version is not None:
265 data["version"] = version
--> 267 with self._post_internal(
268 f"/shares/{table.share}/schemas/{table.schema}/tables/{table.name}/query",
269 data=data,
270 ) as lines:
271 protocol_json = json.loads(next(lines))
272 metadata_json = json.loads(next(lines))

File /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/contextlib.py:117, in _GeneratorContextManager.__enter__(self)
115 del self.args, self.kwds, self.func
116 try:
--> 117 return next(self.gen)
118 except StopIteration:
119 raise RuntimeError("generator didn't yield") from None

File ~/Library/Python/3.9/lib/python/site-packages/delta_sharing/rest_client.py:346, in DataSharingRestClient._request_internal(self, request, target, **kwargs)
344 except ValueError:
345 pass
--> 346 raise HTTPError(message, response=e.response) from None
347 finally:
348 response.close()

HTTPError: 500 Server Error: Internal Server Error for url: https://myserver.cxl.io/delta-sharing/shares/share1/schemas/default/tables/test_locations/query
Response from server:
{'errorCode': 'INTERNAL_ERROR', 'message': ''}

Member

zsxwing commented Oct 11, 2022

What's the error on the server? The most common cause is that your server doesn't have permission to access S3 (the S3 credentials are not set up correctly).
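
One quick check, run in the same environment as the server process, is whether AWS credential variables are visible at all. This is only a sketch under an assumption: that credentials are supplied through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. Instance profiles and Hadoop configuration files are other ways to provide them, and this check does not cover those. `missing_aws_env_vars` is a hypothetical helper.

```python
import os

# Assumption: credentials come from the standard AWS environment variables.
ASSUMED_AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the assumed AWS credential variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in ASSUMED_AWS_ENV_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in this environment:", ", ".join(missing))
else:
    print("AWS credential variables are set.")
```

Even when both variables are set, the keys may still lack permission on the bucket, so the server log remains the authoritative place to look.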

Author

dtgdev commented Oct 12, 2022

You are correct. That fixed the issue. Thank you very much for your guidance!

Member

zsxwing commented Oct 12, 2022

Great to see you fixed the issue!

zsxwing closed this as completed Oct 12, 2022