[source-sftp-bulkd] Setup of CSV file format stream fails with CheckAvailabilityError #37602

Open
1 task
jessica-bauer opened this issue Apr 26, 2024 · 2 comments

Comments

@jessica-bauer

Connector Name

source-sftp-bulk

Connector Version

1.0.0

What step the error happened?

Configuring a new connector

Relevant information

I am trying to set up the SFTP Bulk source to pull the latest CSV file from our SFTP server. Multiple files (5 max) are uploaded to our SFTP server each day, and I need the most recently uploaded file to overwrite the data in the MySQL database.

However, I keep receiving a "Configuration check failed" error when trying to set up the source. I can confirm that the credentials are correct and that the CSV files contain values; the same files synced to the MySQL database successfully with the source-sftp connector.

I suspect the problem is in the "list of streams to sync" section, but I am not sure what needs adjusting. This is what I have configured there:

Days To Sync If History Is Full
3

Format
{
  "filetype": "csv",
  "encoding": "utf8",
  "delimiter": ",",
  "quote_char": "\"",
  "null_values": [],
  "true_values": ["y", "yes", "t", "true", "on", "1"],
  "double_quote": true,
  "false_values": ["n", "no", "f", "false", "off", "0"],
  "inference_type": "None",
  "header_definition": {
    "header_definition_type": "From CSV"
  },
  "strings_can_be_null": true,
  "skip_rows_after_header": 0,
  "skip_rows_before_header": 0,
  "ignore_errors_on_fields_mismatch": false
}

Globs
[ "**" ]

Name
staff

Schemaless
false

Validation Policy
Emit Record
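
For reference, the `[ "**" ]` glob in the stream settings is very broad. The sketch below illustrates roughly how such a pattern selects files, using Python's standard `fnmatch` module as a stand-in. This is an assumption for illustration only: the connector's own matcher may treat `**` differently, the `matching_files` helper and the `readme.txt` path are hypothetical, and the `**/*.csv` pattern is just an example of a narrower glob.

```python
from fnmatch import fnmatchcase


def matching_files(paths, globs):
    """Return the paths that match at least one glob (hypothetical helper)."""
    return [p for p in paths if any(fnmatchcase(p, g) for g in globs)]


paths = [
    # File name taken from the log output in this issue.
    "/Home/USER/outbound/20240422054039_report_prod_123abc6_.csv",
    # Hypothetical non-CSV file, added only for illustration.
    "/Home/USER/outbound/readme.txt",
]

# With fnmatch, "*" also matches "/", so "**" selects every path.
everything = matching_files(paths, ["**"])

# A narrower pattern keeps only the CSV file.
csv_only = matching_files(paths, ["**/*.csv"])
```

Under these assumptions, `everything` contains both paths while `csv_only` contains only the report file, so a narrower glob may be worth trying if the directory holds files other than CSVs.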

Please let me know if you require additional information. I appreciate any insight or support you can provide me.

Relevant log output

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 95, in _check_parse_record
    record = next(iter(parser.parse_records(stream.config, file, self.stream_reader, logger, discovered_schema=None)))
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/csv_parser.py", line 202, in parse_records
    for row in data_generator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/csv_parser.py", line 62, in read_data
    self._skip_rows(fp, rows_to_skip)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/csv_parser.py", line 128, in _skip_rows
    fp.readline()
  File "/usr/local/lib/python3.9/site-packages/paramiko/file.py", line 275, in readline
    new_data = self._read(n)
  File "/usr/local/lib/python3.9/site-packages/paramiko/sftp_file.py", line 185, in _read
    t, msg = self.sftp._request(
  File "/usr/local/lib/python3.9/site-packages/paramiko/sftp_client.py", line 857, in _request
    return self._read_response(num)
  File "/usr/local/lib/python3.9/site-packages/paramiko/sftp_client.py", line 909, in _read_response
    self._convert_status(msg)
  File "/usr/local/lib/python3.9/site-packages/paramiko/sftp_client.py", line 938, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 64, in check_availability_and_parsability
    self._check_parse_record(stream, file, logger)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 102, in _check_parse_record
    raise CheckAvailabilityError(FileBasedSourceError.ERROR_READING_FILE, stream=stream.name, file=file.uri) from exc
airbyte_cdk.sources.file_based.exceptions.CheckAvailabilityError: Error opening file. Please check the credentials provided in the config and verify that they provide permission to read files. Contact Support if you need assistance.
stream=staff file=/Home/USER/outbound/20240422054039_report_prod_123abc6_.csv
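
The log shows two chained exceptions: a low-level `FileNotFoundError` raised while reading the file, re-raised as a `CheckAvailabilityError` with `raise ... from exc`. The sketch below reproduces that pattern in isolation to show how the underlying error can be recovered from the wrapper. The class and functions here are simplified, hypothetical stand-ins and not the CDK's actual code.

```python
import errno


class CheckAvailabilityError(Exception):
    """Hypothetical stand-in for the CDK exception of the same name."""


def read_first_line(path):
    # Simulates the low-level read failing, like the ENOENT in the log.
    raise FileNotFoundError(errno.ENOENT, "No such file", path)


def check_parse_record(path):
    try:
        return read_first_line(path)
    except Exception as exc:
        # "from exc" stores the original error on __cause__, which is what
        # produces the "direct cause" line between the two tracebacks.
        raise CheckAvailabilityError(f"Error opening file. file={path}") from exc


def root_cause(err):
    """Follow __cause__ links down to the innermost exception."""
    while err.__cause__ is not None:
        err = err.__cause__
    return err
```

Catching the wrapper and calling `root_cause` on it returns the original `FileNotFoundError`. In the log above, that inner error (`[Errno 2] No such file`) is more specific than the generic credentials message in the outer exception.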

Contribute

  • Yes, I want to contribute
@marcosmarxm
Member

Thanks for reporting the issue, @jessica-bauer. SFTP Bulk is a community connector, and improvements to it aren't on the current roadmap. If you want to contribute a fix, please reach out to me on Slack so I can give you instructions for making the contribution 🎖️

@marcosmarxm marcosmarxm changed the title Setup of CSV file format stream in SFTP Bulk source fails with CheckAvailabilityError [source-sftp-bulkd] Setup of CSV file format stream fails with CheckAvailabilityError Apr 30, 2024
@jessica-bauer
Author

Unfortunately, I'm not able to contribute to this connector. I've determined that this appears to be a permissions issue between the connector and the SFTP server where the files are housed, although I'm not sure why, since the other (non-bulk) SFTP connector worked fine. Transferring the files to another SFTP server seems to have worked around the issue for the time being.
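
For anyone hitting the same error, one way to narrow down a server-side problem like this is to check each access step separately outside the connector. The sketch below is a minimal, hypothetical helper (`probe_remote_file` is not part of any library). It assumes `sftp` is an object with `stat` and `open` methods, such as a paramiko `SFTPClient` obtained from `SSHClient.open_sftp()`, and it reports the first step that fails.

```python
def probe_remote_file(sftp, path):
    """Return (step, error) for the first failing step, or ("ok", None).

    `sftp` is assumed to behave like paramiko's SFTPClient.
    """
    try:
        sftp.stat(path)  # Can the server see the file's metadata?
    except OSError as exc:
        return ("stat", exc)

    try:
        handle = sftp.open(path, "r")  # Can the file be opened for reading?
    except OSError as exc:
        return ("open", exc)

    try:
        handle.readline()  # Can data actually be read from the file?
    except OSError as exc:
        return ("read", exc)
    finally:
        handle.close()

    return ("ok", None)
```

In the traceback above, the file was found and the failure happened inside `readline`, so a result of `("read", ...)` from a probe like this would be consistent with a server that lists the file but refuses to serve its contents.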
