FileNotFoundError #74
@martindut the fix has been merged into master. Please reopen if you are still experiencing this issue with the 0.2.1 release on PyPI.
@houqp, now I'm getting this error if I run `df = dt.to_pyarrow_table()`:
@martindut I think our integration tests aren't covering the use of pyarrow with the Azure storage backend. Would you mind opening a new issue with some details?
@houqp I think this is still an issue. I am on deltalake-0.4.8, numpy-1.20.3 and pyarrow-4.0.0. When I run:
```python
from deltalake import DeltaTable
import os

os.environ['AZURE_STORAGE_ACCOUNT'] = 'xxx'
os.environ['AZURE_STORAGE_KEY'] = 'xx'

dt = DeltaTable('abfss://xxx@xxx.dfs.core.windows.net/delta_example/')
print(f'table version: {dt.version()}')
print(f'list of files: {dt.file_paths()}')

# convert to a pandas DataFrame
df = dt.to_pyarrow_table().to_pandas()
```
The
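Because the behaviour in this thread differs between releases (0.2.1 versus deltalake-0.4.8 with pyarrow-4.0.0), a quick way to collect the installed versions for a report may help. This is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); `package_versions` is a hypothetical helper name, not part of deltalake.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # The three packages whose versions were listed in the report.
    for name, ver in package_versions(["deltalake", "numpy", "pyarrow"]).items():
        print(f"{name}-{ver}")
```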
@samuel100 are you able to read one of the parquet files that is causing the error with
Hi. I ran the following code to open a Delta table on Azure Data Lake Storage Gen2:
```python
from deltalake import DeltaTable
import os

# Storage account credentials (redacted)
os.environ['AZURE_STORAGE_ACCOUNT'] = 'xxxxxx'
os.environ['AZURE_STORAGE_KEY'] = 'xxxxxxxxxxxxxxxxx'

dt = DeltaTable("abfss://xxxxxxxx@xxxxxxx.dfs.core.windows.net/delta/silver/rawdata/holdings/taxhld/v1.0")
dt.version()     # current table version
dt.files()       # data files as recorded in the Delta log
dt.file_paths()  # data files as full abfss:// URIs
```
This all works fine and lists all the Parquet files in the partition folders, but when I run
```python
df = dt.to_pyarrow_table()
```
I get this error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/deltalake/__init__.py", line 29, in to_pyarrow_table
    return self.to_pyarrow_dataset().to_table()
  File "/usr/local/lib/python3.8/site-packages/deltalake/__init__.py", line 26, in to_pyarrow_dataset
    return dataset(self._table.file_paths(), format="parquet")
  File "/usr/local/lib/python3.8/site-packages/pyarrow/dataset.py", line 674, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pyarrow/dataset.py", line 426, in _filesystem_dataset
    fs, paths_or_selector = _ensure_multiple_sources(source, filesystem)
  File "/usr/local/lib/python3.8/site-packages/pyarrow/dataset.py", line 312, in _ensure_multiple_sources
    raise FileNotFoundError(info.path)
FileNotFoundError: abfss://xxxxx@xxxxx.dfs.core.windows.net/delta/silver/rawdata/holdings/taxhld/v1.0/company_name=xxx/source_db_name=xxx/source_fund_name=02/file_date=2020-01-01/part-00002-ab186831-cb3b-4294-8d2b-c2377e8eea52.c000.snappy.parquet
```
but the file does exist and is listed in `dt.file_paths()`.
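The traceback shows `to_pyarrow_dataset` calling `dataset(self._table.file_paths(), format="parquet")` with no `filesystem` argument, and the error being raised inside pyarrow's `_ensure_multiple_sources`. That suggests pyarrow is checking the `abfss://` URIs against the local filesystem, where they cannot exist, even though the Delta log lists them. One possible workaround, assuming an fsspec-compatible Azure filesystem such as `adlfs` is installed (neither the reporter nor the maintainers mention this), is to build the dataset yourself with an explicit filesystem. Such filesystems usually expect `container/path` strings rather than full URIs, so this sketch only does that path conversion with the standard library; `split_abfss_uri` and `to_fsspec_paths` are hypothetical helper names.

```python
from urllib.parse import urlparse

# Host suffix used by ADLS Gen2 URIs such as those returned by dt.file_paths().
_DFS_SUFFIX = ".dfs.core.windows.net"


def split_abfss_uri(uri):
    """Split abfss://<container>@<account>.dfs.core.windows.net/<path>
    into (account, container, path)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) URI: {uri!r}")
    # netloc looks like "<container>@<account>.dfs.core.windows.net"
    container, sep, host = parsed.netloc.partition("@")
    if not sep or not host.endswith(_DFS_SUFFIX):
        raise ValueError(f"unexpected host in {uri!r}")
    account = host[: -len(_DFS_SUFFIX)]
    return account, container, parsed.path.lstrip("/")


def to_fsspec_paths(uris):
    """Convert full abfss:// URIs into "container/path" strings.

    Hypothetical helper: the account is dropped, so this assumes every URI
    belongs to the same storage account as the filesystem object you use.
    """
    return [f"{c}/{p}" for _, c, p in map(split_abfss_uri, uris)]


if __name__ == "__main__":
    example = ("abfss://mycontainer@myaccount.dfs.core.windows.net/"
               "delta/table/file_date=2020-01-01/part-00000.snappy.parquet")
    print(split_abfss_uri(example))
    print(to_fsspec_paths([example]))
```

You would then pass something like `to_fsspec_paths(dt.file_paths())` together with the filesystem object to `pyarrow.dataset.dataset(..., filesystem=fs, format="parquet")`. Check the exact path format your filesystem implementation expects before relying on this.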