
Changed the implementation of Containers.migration to match the 'lxc … #319

Merged: 20 commits merged into canonical:master on Dec 13, 2018

Conversation

@gabrik (Contributor) commented Jul 11, 2018

This should definitively solve issue #315, and the behaviour of migration will be the same as using

lxc move <container_name> <remote_name>:

I also added myself to contributors.rst.

Hope this is useful.
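
For reference, a rough sketch of the intended usage (purely illustrative: the endpoint, password, and container name are placeholders, not part of this PR):

from pylxd import Client

# Hypothetical example: migrate a container to another LXD host, roughly
# equivalent to `lxc move my-container remote:`.
local = Client()
remote = Client(endpoint='https://remote-host:8443', verify=False)
remote.authenticate('password')  # trust password of the remote LXD

container = local.containers.get('my-container')
container.migrate(remote, wait=True)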

@codecov-io commented Jul 11, 2018

Codecov Report

Merging #319 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #319      +/-   ##
==========================================
+ Coverage   96.84%   96.87%   +0.03%     
==========================================
  Files          11       11              
  Lines         918      928      +10     
  Branches      106      108       +2     
==========================================
+ Hits          889      899      +10     
  Misses         10       10              
  Partials       19       19
Impacted Files Coverage Δ
pylxd/models/container.py 91.42% <100%> (+0.36%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c442d6b...b1eb0d0. Read the comment docs.

@gabrik force-pushed the master branch 2 times, most recently from 390e58f to 39d4569 on July 11, 2018 at 14:56
@ajkavanagh (Contributor)

Thanks for submitting the PR. It's a great start. It would be very good to have both unit tests and an idea of an integration test for it, please. Many thanks.

@ajkavanagh (Contributor) left a comment

Needs unit tests and (hopefully, if possible) an integration test as well.

@gabrik (Contributor, Author) commented Jul 31, 2018

Thanks @ajkavanagh, I'll do that. Just two questions:

  • How can I run the tests on my machine before committing? (I have the snap version of LXD.)
  • Is there an example of an integration test that I can use as a reference?

@ajkavanagh (Contributor)

Hi @gabrik,

To run the unit tests you need tox installed, which is a Python test runner that uses Python virtualenvs to isolate the tests from your system.

To run the integration tests you just need a machine with lxd installed on it - I use libvirt virtual machines with the snap version of lxd installed, so that I can test 2.0.x and 3.0.x as well as the more recent versions. The integration test runner creates a privileged lxd container and then runs the tests inside that, so as not to affect the host machine.

As an example, take a look in https://github.com/lxc/pylxd/blob/master/integration/test_containers.py which is the integration test for containers.

Come back to me if you have any further questions.
Thanks very much

@gabrik (Contributor, Author) commented Aug 1, 2018

Thanks @ajkavanagh

I'm having some issues adding these tests; I have added some other tests for the migration.
It seems from the codecov report that the code inside the try..except is not covered.
Any suggestions?

@ajkavanagh (Contributor)

@gabrik It's hard to see, but I suspect, from briefly looking at the tests, that the if clause is not being entered (only the else). That means the unit tests need to simulate a 103 return code (at least, that's what I think it is doing).

@gabrik (Contributor, Author) commented Aug 1, 2018

@ajkavanagh yes, I think so. I have added a test in which the container is started and then migrated; in this case the test should also enter the if branch.
I was also trying to understand how to simulate the case of an LXDAPIException coming from new_client.containers.create.
Is there a way to do that?

@ajkavanagh (Contributor)

Hi @gabrik -- I've been away for a while. Anyway, back now.

The main issue with the unit test coverage is that the unit tests are not hitting the exception at line 429. The test code needs to get the new_client.containers.create(...) method to generate the exception so that the deletion code path is tested.

Also, checking for a string in the error message is probably a bit brittle. From reading the docs, I suspect that, in the handler, e.response.status_code will be 103 if the container is running.

Thanks.
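
To illustrate the suggestion (a hypothetical sketch, not the PR's actual code), the handler would branch on the status code attached to the exception rather than on the message text:

from pylxd.exceptions import LXDAPIException

def create_on_destination(new_client, migration_data, on_running):
    # Hypothetical helper: branch on the status code carried by the exception
    # (103 is LXD's "Running" code) instead of matching the error message text.
    try:
        return new_client.containers.create(migration_data, wait=True)
    except LXDAPIException as e:
        if e.response.status_code == 103:
            return on_running()  # container is running: take the running-container path
        raise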

@gabrik (Contributor, Author) commented Aug 20, 2018

Hi @ajkavanagh, thanks for the response.
I have updated the check to use the status code.
Regarding the exception, I thought that https://github.com/lxc/pylxd/pull/319/files#diff-df366cf880bd5589b821f362464ef814R130 generates the exception, as that test is trying to migrate a container to a destination that already has a container with that name.
Otherwise, how can I generate the exception?

@ajkavanagh (Contributor)

@gabrik, in the test function, you'll need to mock out generate_migration_data(). Add a side_effect to the mock that generates the Exception; then, when the test function calls migrate, it will hit the code path from line 430 onwards. That will give more complete coverage of the code.
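
A minimal sketch of that mocking approach (a hypothetical test fragment; the fixture names are illustrative, with `self.container` standing for a real Container model and `self.client2` for the destination client created by the test's setUp):

from unittest import mock
from pylxd.exceptions import LXDAPIException

@mock.patch('pylxd.models.container.Container.generate_migration_data')
def test_migrate_exception_error(self, mock_migration_data):
    # side_effect makes the mocked call raise, driving migrate() into its
    # exception-handling branch so that path gets covered.
    mock_migration_data.side_effect = LXDAPIException(mock.Mock(status_code=400))
    self.assertRaises(
        LXDAPIException,
        self.container.migrate, self.client2, wait=True)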

@gabrik (Contributor, Author) commented Aug 24, 2018

@ajkavanagh
Following your suggestions I have updated the tests, and everything seems fine now. Let me know if there is something still missing.

@ajkavanagh (Contributor)

This is great. Okay, I'll pull your branch down and do some integration testing with it (which I need to automate and add to the repo, and should have it done by the end of September). If all is good, it'll be merged. Thanks!

@gabrik (Contributor, Author) commented Aug 26, 2018

Actually, I'm also updating the script used for the integration tests so that it has two containers for testing the migration. I'll ping you when I finish.

@gabrik (Contributor, Author) commented Aug 27, 2018

@ajkavanagh
I have updated the integration test script. One test returns an error: the one with live migration.
The error occurs only when using LXD inside LXD (as the test script does); I have opened an issue on the LXD main repo (https://github.com/lxc/lxd/issues/4980).

@ajkavanagh (Contributor)

Okay, based on lxc/lxd#4980 we can't have a direct lxd-in-lxd integration test, only one on metal. So drop the integration test into a separate run_migration_integration_test-18-04 file and we'll just have to run it on metal (I have access to a MaaS cluster).

Also, sadly, your branch has overwritten the run_integration_tests symlink, which pointed to run_integration_tests-18-04.

Let's get those fixed up and then we can merge. Thanks.

@gabrik (Contributor, Author) commented Sep 3, 2018

@ajkavanagh
It should be OK now; can you let me know if the tests work?

@ajkavanagh (Contributor)

I'm still getting problems with testing the containers. The run_integration_tests link gives me:

======================================================================
ERROR: Raise ValueError, cannot migrate from local connection
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/util/connection.py", line 79, in create_connection
    raise err
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/util/connection.py", line 69, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
    conn = self._new_conn()
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbb99853e80>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.0.3.222', port=8443): Max retries exceeded with url: /1.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbb99853e80>: Failed to establish a new connection: [Errno 110] Connection timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pylxd/pylxd/client.py", line 283, in __init__
    response = self.api.get()
  File "/opt/pylxd/pylxd/client.py", line 145, in get
    response = self.session.get(self._api_endpoint, *args, **kwargs)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/requests/sessions.py", line 525, in get
    return self.request('GET', url, **kwargs)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.0.3.222', port=8443): Max retries exceeded with url: /1.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbb99853e80>: Failed to establish a new connection: [Errno 110] Connection timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pylxd/integration/test_containers.py", line 222, in test_migrate_local_client
    Client(endpoint=second_host, verify=False)
  File "/opt/pylxd/pylxd/client.py", line 290, in __init__
    raise exceptions.ClientConnectionFailed()
pylxd.exceptions.ClientConnectionFailed

======================================================================
ERROR: A running container is migrated.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/util/connection.py", line 79, in create_connection
    raise err
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/util/connection.py", line 69, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
    conn = self._new_conn()
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbb98da5fd0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/opt/pylxd/.tox/integration/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.0.3.111', port=8443): Max retries exceeded with url: /1.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbb98da5fd0>: Failed to establish a new connection: [Errno 110] Connection timed out',))

During handling of the above exception, another exception occurred:

etc.

How did you test it (i.e. what was your setup)?

@gabrik (Contributor, Author) commented Sep 18, 2018

I had 2 VMs with LXD installed: I created a simple Alpine Linux container on the first one and then migrated it 5 times using the pylxd API.
For the integration test I simply have a VM with snap LXD and an older 2.x LXD, but in that case I get the error from CRIU.

From what I see, the error is in the configuration of the network. In https://github.com/gabrik/pylxd/blob/master/run_migration_integration_tests-18-04 I just create 2 containers on the same network with addresses 10.0.3.111 and 10.0.3.222, and then a container should be migrated between the two.

@gabrik (Contributor, Author) commented Sep 19, 2018

@ajkavanagh after reading the errors with a fresh mind I can see where my error is: these test cases are in the same file as the ones that run fine in LXD-in-LXD, so when you use the current script it does not create the 2 containers with those IP addresses. A solution could be to move the migration tests into a separate set from the integration tests?

@ajkavanagh (Contributor)

@gabrik So the problem is that I'm running the run_integration_tests-18-04 tests on this branch and they fail, but when I run the tests in lxc/pylxd master, they pass. I'm not even getting to the migration tests yet. This is on a bionic host with lxd set up with defaults.

Can you run the run_integration_tests-18-04 test on this branch and see if it passes? If so, can you dump the config that you're using for LXD, as obviously I have something different. There really ought to be a default 'lxd' setting for the test, though, which I'll add to the master branch in integration, that can be used with lxd init.

@gabrik (Contributor, Author) commented Sep 20, 2018

@ajkavanagh I have updated the tests; I was mixing the tests, which is why it was not working.
I put the migration tests in a separate test environment, so they are not executed during the normal integration tests.
I have just tested everything and it should be OK now. Really sorry, this was my fault.

@ajkavanagh (Contributor)

@gabrik I really do want to get this landed! :) However, I've just made some minor changes to master (please take a look) that move the integration test runner scripts into the integration directory -- this, sadly, breaks the PR, which means it needs to move things around too.

The background to this work is that I'm getting the integration scripts to work with libvirt on a CI system, so that pylxd will be tested against the master and stable branches of lxd as lxd has new code/changes committed.

The other aspect is that the integration scripts (currently) expect a pre-configured LXD environment so that they can launch containers. Moving forward, where the integration script(s) will be run automatically, this won't be the case; I'm doing some work to configure a "fresh" xenial/bionic machine so that the integration scripts can be run.

To cut a long story short, I think you need to:

  • remove the run_integration_tests link
  • remove the run_integration_tests-18-04 script
  • rebase/merge the master branch to bring this branch up to date.

I'm (trying to) test the migration tests and will post a yay/nay here as soon as I've got the "configure a base machine" script going (today!).

Then we'll get this branch landed. And then I'll do the work to move the script to the 'correct' place and integrate it so that the CI system will run it everyday in a configured environment.

I hope this is okay!?

(gabrik added further commits here; the commit titles are truncated in this view, each signed off by gabrik <gabriele.baldoni@gmail.com>.)
@gabrik (Contributor, Author) commented Sep 21, 2018

@ajkavanagh now everything should be ok ;)

@Silentphantom62 commented Sep 25, 2018

@ajkavanagh can we release 2.2.8 after we merge this change? I've been really looking forward to the container execute method fix for a while.

@Silentphantom62

@ajkavanagh any update? We have been waiting for more than 45 days now with issues in pylxd 2.2.7.

@gabrik (Contributor, Author) commented Nov 13, 2018

@ajkavanagh any news about this PR?

@ajkavanagh (Contributor)

So I took another crack at testing this branch, but no luck so far. @gabrik please could you list the steps you did to run the run_migration_integration_tests-18-04 script; i.e. any pre-setup of lxd, networking, storage, etc. I need to replicate your testing prior to merging it, plus work out how to then integrate it into the CI.
Thanks, Alex.

@gabrik (Contributor, Author) commented Nov 22, 2018

Sure @ajkavanagh

To run the integration tests I created a VM with Ubuntu 18.04 and the latest LXD from the snap.

Then in this LXD I created a network with the address 10.0.3.1/24.

Then I created two Ubuntu 18.04 containers, both connected to this network, with addresses
1st. 10.0.3.111
2nd. 10.0.3.222

In both containers I installed tox, python3-dev, libssl-dev, libffi-dev, build-essential, and criu.

Then I configured LXD to be accessible over the network using these commands:

lxc exec $CONTAINERONE_NAME -- lxc config set core.trust_password password
lxc exec $CONTAINERONE_NAME -- lxc config set core.https_address [::]
lxc exec $CONTAINERONE_NAME -- mkdir -p /root/.config/lxc
openssl genrsa 1024 > ./$CONTAINERONE_NAME.key
lxc file push ./$CONTAINERONE_NAME.key $CONTAINERONE_NAME/root/.config/lxc/client.key
rm ./$CONTAINERONE_NAME.key
lxc exec $CONTAINERONE_NAME -- chmod 400 /root/.config/lxc/client.key
lxc exec $CONTAINERONE_NAME -- openssl req -new -x509 -nodes -sha1 -days 365 \
	-key /root/.config/lxc/client.key -out /root/.config/lxc/client.crt \
	-subj="/C=UK/ST=London/L=London/O=OrgName/OU=Test/CN=example.com"

# create a default dir storage pool for bionic
lxc exec $CONTAINERONE_NAME -- lxc storage create default dir
lxc exec $CONTAINERONE_NAME -- lxc profile device add default root disk path=/ pool=default

Then I started the tests on the first container, which uses the second one only to test the migration.

Let me know if you need any other help
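
For completeness, a sketch of what the migration test then does against this setup (hypothetical code; the addresses and trust password match the description above, and the container name is a placeholder):

from pylxd import Client

# Connect to the two LXD daemons exposed by the hosts configured above.
client_one = Client(endpoint='https://10.0.3.111:8443', verify=False)
client_two = Client(endpoint='https://10.0.3.222:8443', verify=False)
client_one.authenticate('password')
client_two.authenticate('password')

# Migrate an existing container from the first host to the second and check
# that it arrived.
container = client_one.containers.get('migratee')
container.migrate(client_two, wait=True)
assert client_two.containers.exists('migratee')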

@ajkavanagh (Contributor)

@gabrik so I spent a good portion of today trying to get this working. I'm failing with CRIU errors on an Ubuntu 18.04 LTS host with CRIU 3.6.2. I'm going to need more details of how you tested this:

  • machine specs
  • Ubuntu release and kernel version
  • CRIU version
  • LXD snap version

I'm also surprised that this worked from within a container; AFAICT CRIU migration doesn't work inside a container -- but that may have changed.

It appears that lxd migration (stateful) is still very, very experimental, but I at least have to have managed it once before I can merge the code.

@gabrik (Contributor, Author) commented Nov 23, 2018

Sorry @ajkavanagh, my mistake: I created 2 VMs and configured them in the same way as the two containers, but I tested the migration manually, not using the script (because I do not have access to anything that can provision VMs in a similar way).

So I think that if you have access to an OpenStack or something like that, you can create 2 VMs with the same configuration I described in the previous comment, and the tests should pass.

@ajkavanagh (Contributor)

@gabrik please, please, may I have the exact specs that you are using to do the test? If I can't replicate it, I can't merge it. CRIU is experimental and flaky, but I at least have to be able to replicate what you've been doing to have confidence that the code works. Thanks.

@gabrik (Contributor, Author) commented Nov 23, 2018

Sure

ubuntu@fos1:~$ sudo criu --version
Warn  (criu/kerndat.c:659): Can't load /run/criu.kdat
Version: 3.6
ubuntu@fos1:~$ lxd --version
3.7
ubuntu@fos1:~$ uname -a
Linux fos1 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@ajkavanagh (Contributor)

Okay, I've now tested this and it works; the CRIU issues are incidental to the code, and it would work if CRIU was stable (which it isn't).

Stopping a container, then migrating and then restarting the container (i.e. without CRIU) works fine.

I'm going to merge this once the conflict is resolved. Thanks very much for the patch.
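
That non-CRIU path, roughly (a hypothetical sketch; the client endpoint and container name are placeholders):

from pylxd import Client

local = Client()
remote = Client(endpoint='https://remote-host:8443', verify=False)
remote.authenticate('password')

# Stop, migrate, then restart on the destination: no live (CRIU) migration involved.
container = local.containers.get('my-container')
container.stop(wait=True)
container.migrate(remote, wait=True)
remote.containers.get('my-container').start(wait=True)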

@ajkavanagh merged commit ad07da9 into canonical:master on Dec 13, 2018