Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using S3 as inbox backend storage #22

Merged
merged 15 commits into from
Feb 8, 2019
Merged

Using S3 as inbox backend storage #22

merged 15 commits into from
Feb 8, 2019

Conversation

silverdaz
Copy link
Contributor

Using S3 as inbox backend storage

Describe the pull request:

  • Bug fix
  • Functional change
  • New feature
  • Code cleanup
  • Build system change
  • Documentation change
  • Language translation

Pull request long description:

The storage module can read and write to either a POSIX file system or an S3 backend.
Ingestion now reads from an S3 backend in case the --inbox-backend s3 is passed to the bootstrap script.

Changes made:

  1. Added sections in the bootstrap script to spawn an S3 volume, and a Minio container on it.
  2. Adjusted the configurations to make ingest.py point to that backend
  3. Fixed a issue in the storage.py latest additions (generating the huge headers)
  4. Adjusted the notification message that land in the inbox queue.

Related issues:

#15
Fixes the huge header issue (not listed, but still existing)

Additional information:

Added a simple.bats with only one ingestion of a 1MB test file.
Calling bats tests/integration/simple.bats for simplicity and quick checking if it all works.

Mentions:

@dtitov: You can try that branch with your setup in swarm.
I tried it and ingestion was succesful
@blankdots, @viklund : Could you review the python code, please?

@silverdaz silverdaz self-assigned this Jan 21, 2019
@silverdaz silverdaz added this to the Sprint 41 milestone Jan 21, 2019
@silverdaz silverdaz changed the title Inbox s3 Using S3 as inbox backend storage Jan 21, 2019
@@ -13,9 +13,10 @@ class testIngest(unittest.TestCase):
Testing ingestion functionalities.
"""

@mock.patch('lega.ingest.getattr')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to patch it that way? The getattr is not exactly in the lega.ingest namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any suggestions to patch it another way ?

It is a global function, but it is used in that module, we do the same for consume and get_connection. I do recognize a more appropriate way to do it would be to mock the result for each getattr, but I think this is for a later PR when we revisit the unit-tests.

tests/unit/test_storage.py Outdated Show resolved Hide resolved
@dtitov
Copy link
Contributor

dtitov commented Jan 22, 2019

      - S3_ACCESS_KEY=Nzk99F3wzIPGSAV7
      - S3_SECRET_KEY=9d3gRjFv2Qgy7l70iH0SugjC7HYYWwHm
      - AWS_ACCESS_KEY_ID=Nzk99F3wzIPGSAV7
      - AWS_SECRET_ACCESS_KEY=9d3gRjFv2Qgy7l70iH0SugjC7HYYWwHm

Why are we setting it twice to the ingestion microservice?

@dtitov
Copy link
Contributor

dtitov commented Jan 24, 2019

Tested in Swarm setup:

  • Ingestion works.
  • Outgestion doesn't.

I mean, all steps until Verify microservice (including it) are successful, but when I obtain the ingested file from RES, it gives me the wrong file (wrong content), which doesn't match original one (no errors in logs).

@silverdaz, could you, please, also try outgesting the file with Docker Compose setup using this branch? To make sure that it's not my local issue.

Also: does Bats test suite contain outgestion test? I didn't find it.

@dtitov
Copy link
Contributor

dtitov commented Jan 24, 2019

After another try, it worked (outgestion). I've asked Oskar to also test outgestion to make sure that it's not some kind of a floating bug.

@dtitov dtitov self-requested a review January 24, 2019 11:02
@omllobet
Copy link
Contributor

omllobet commented Jan 24, 2019

I tested it and confirmed it works.

@dtitov
Copy link
Contributor

dtitov commented Jan 24, 2019

@omllobet, thanks! Then my first outgestion failure was something unrelated. I approve.

Copy link
Contributor

@jbygdell jbygdell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet again we have a merge commit in a PR.
Can we not disable Allow merge commits in the repo settings to prevent these things from happening.
A rebased merge gives a much cleaner git history.

@silverdaz
Copy link
Contributor Author

That's the github interface, when resolving conflict: it merges one commit with another one.
This is how git works, @jbygdell, why are you so bothered about it?
Is it actually just the commit text that you don't like (it says "merged X with Y")?

@jbygdell
Copy link
Contributor

Because the git history gets messed up.
What will happen now is that the commits will end up in incorrect order, they will show up in the order of when they where committed to the base branch not when they where committed to master.
If you want to incorporate new changes from master into your dev branch you rebase your branch on top of master.

Copy link
Contributor

@blankdots blankdots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Comments along source code:

I assume we don't have an integration test (be it even skipped tests) for the finalize functionality, or i missed it ?

╰─$ make ps
   Name                 Command               State                                              Ports                                           
-------------------------------------------------------------------------------------------------------------------------------------------------
cega-mq      docker-entrypoint.sh rabbi ...   Up       15671/tcp, 0.0.0.0:15670->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 0.0.0.0:5670->5672/tcp
cega-users   python3.6 /cega/users.py 0 ...   Up       0.0.0.0:15671->80/tcp                                                                     
db           /bin/bash /usr/bin/ega-ent ...   Up       5432/tcp                                                                                  
finalize     gosu lega ega-id-mapper          Exit 1                                                                                             
inbox        entrypoint.sh                    Up       0.0.0.0:2222->9000/tcp                                                                    
ingest       gosu lega ega-ingest             Up                                                                                                 
keys         java -jar /ega-data-api-ke ...   Up                                                                                                 
mq           /bin/bash /usr/bin/ega-ent ...   Up       15671/tcp, 0.0.0.0:15672->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp              
vault        /usr/bin/docker-entrypoint ...   Up       9000/tcp                                                                                  
verify       gosu lega ega-verify             Up                                                                              

It is because of the fact it uses an old version of the code, but not sure the code injection does something:

finalize      | ModuleNotFoundError: No module named 'lega.mapper'
finalize      | Traceback (most recent call last):
finalize      |   File "/usr/bin/ega-id-mapper", line 11, in <module>
finalize      |     load_entry_point('lega==1.1', 'console_scripts', 'ega-id-mapper')()
finalize      |   File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
finalize      |     return get_distribution(dist).load_entry_point(group, name)
finalize      |   File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
finalize      |     return ep.load()
finalize      |   File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in load
finalize      |     return self.resolve()
finalize      |   File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2330, in resolve
finalize      |     module = __import__(self.module_name, fromlist=['__name__'], level=0)
finalize      | ModuleNotFoundError: No module named 'lega.mapper'

@blankdots
Copy link
Contributor

actually this might be the cause of the finalize failing:
entrypoint: ["gosu", "lega", "ega-id-mapper"] https://github.com/EGA-archive/LocalEGA/blob/inbox-s3/deploy/bootstrap/run.sh#L368 ... still would like a test for it, not ok that it passes like this!

@blankdots
Copy link
Contributor

this: #22 (comment) to be fixed by: #25 other issues mentioned by the reviewers should be addressed, thus approving.

@silverdaz
Copy link
Contributor Author

Related to Stefan's comment: This issue was already there, and fixed in the next PR.
Related to Joakim's comment: I didn't include the merge commit, the github web interface did.
As this does not break the PR's features, I'm keeping it as it is now, and I'll try to avoid using the web interface in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants