Skip to content

Conversation

@jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Sep 22, 2023

What does this PR do?

The work in this PR is to investigate the 'errors' with the ftplib.storbinary method that are logged for the 'people' feed (described in this Github issue) and the 'articles' feed (described in this Jira ticket].

Here are some key learnings from this effort:

  • Articles feed

    • It seems the error may be due to the FTP connection 'hanging' because as storbinary() is running (it takes >5 minutes for the Article, it is 'blocking' any other commands from being run on the FTP server. This leads to an 'idle' FTP connection. As described in this thread:

      Either your client is timing out the control channel, or the server is. So, when you try to hang up with ftp.quit(), it will either hang forever or raise an exception.

    • Extending CarbonFtpsTls.timeout does not seem to have an effect. It just delays the logging of the TimeoutError.
    • There doesn't appear to be a straightforward way to keep the connection alive, but the TimeoutError has not shown to be an issue and will still result in an XML file uploaded to the Elements FTP server.
  • People feed

    • The error goes away when we use the default FTPS_TLS.storbinary() method.

Helpful background context

While looking into this issue, @ghukill came across this thread, which we assume is the reason for previously overwriting the storbinary() command. However, as stated previously, using the default FTPS_TLS.storbinary() method results in the same XML file while avoiding the error for the 'people' feed.

How can a reviewer manually see the effects of these changes?

Compare logs from different code iterations, and verify that latest code enables successful Carbon runs with improved logging

Produced by code in main:

Produced by Commenting out CarbonFtpsTls.storbinary():

Code in this branch

Compare XML files produced with latest code (excludes custom CarbonFtpsTls.storbinary against XML files produced with current code in main)

  1. A simple diff shows that comparing the pairs of files (original vs. latest) have the same bytes/characters.
image
  1. Logs are the same when running original code vs. latest code. Note that the FTP-related errors do not appear because these tests involve running the application in a Docker container on a local machine.
    • Latest
      image
    • Original
      image

Includes new or updated dependencies?

NO

Developer

  • README is updated to reflect all changes as needed
  • All related Jira tickets are linked in commit message(s)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated to reflect all changes or is not needed
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

@jonavellecuerdo jonavellecuerdo self-assigned this Sep 22, 2023
Why these changes are being introduced:
* The ftplib.storbinary method throws errors that suggest file
transfer has failed even when the Carbon run successfully uploads
an XML file to the Elements FTP server.

For the 'people' feed, it would log:
ftplib.error_temp: 425 Error while transfering data: ECONNABORTED - Connection aborted

For the 'articles' feed, it would log:
TimeoutError: The read operation timed out

These changes introduce cleaner error handling that avoid the error for
the 'people' feed and except' timeout errors that do not actually seem to
have a negative impact.

How this addresses that need:
* Remove custom storbinary method to resolve errors for 'people' feed
* Add try-except-finally code block to cleanly handle timeout errors for 'articles' feed

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/jira/software/c/projects/IN/boards/70?modal=detail&selectedIssue=IN-924
@jonavellecuerdo jonavellecuerdo force-pushed the IN-924-fix-ftp-file-transfer-method branch from 4927e98 to 05b509f Compare September 22, 2023 19:18
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review September 25, 2023 15:51
@jonavellecuerdo
Copy link
Contributor Author

jonavellecuerdo commented Sep 25, 2023

@ghukill @ehanson8 I made a small update to the README about accessing data from Carbon. As discussed with Christopher, I created a Confluence document that gives a general overview of the process and then linked it in the README. Let me know what you think!

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Thanks for the detailed PR and confluence document.

Can confirm from pairing that debugging this was difficult and involved. Because it appears the XML file is getting written successfully, the try / except TimeOut / except <generic> / finally block makes sense to me. And, with the logging in place, over time we could see how many times that timeout warning is thrown.

My two cents is that introducing workarounds to avoid the timeout on ftps.quit() may not be worth the complexity it introduces, given the "piping" nature of this application.

@ehanson8
Copy link
Contributor

This is a very thorough explanation of the issue which is useful for an outsider, a good template for future PRs detailing a complicated situation

**Note**: As of this writing, Apple M1 Macs cannot run Oracle Instant Client.
* If you are on a machine that can run Oracle Instant Client, follow the steps outlined in [Without Docker](#without-docker).

The data retrieved by the Carbon application contains personally identifiable information (PII), so downloading the files is not recommended. However, if examining the files created by Carbon is **absolutely necessary** for testing purposes, this can be done on your local machine via a Docker container. For more information, please refer to the Confluence document: [How to download files from an application that connects to the Data Warehouse](https://mitlibraries.atlassian.net/wiki/spaces/~712020c6fcb37f39c94e54a6bfd09607f29eb4/pages/3469443097/Running+applications+in+a+local+Docker+Container).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great warning that I should reuse in alma-patronload

@jonavellecuerdo
Copy link
Contributor Author

Thank you both for your assistance and guidance through this exploration!

@jonavellecuerdo jonavellecuerdo merged commit fbbfaa6 into main Sep 25, 2023
@jonavellecuerdo jonavellecuerdo deleted the IN-924-fix-ftp-file-transfer-method branch September 25, 2023 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants