CLOUDSTACK-9362: Skip VXLANs when rewriting the bridge name for migrations (4.8-2) #1513

Merged · 1 commit merged into apache:4.8 from insom:CLOUDSTACK-9362-2 on May 18, 2016

Conversation

@insom (Contributor) commented Apr 22, 2016

From the JIRA issue:

bb8f7c6

The above commit introduces rewriting of bridge device names when migrating a virtual machine from one host to another. However, it also matches bridges called "brvx-1234" and rewrites them to (in my case) "brem1-1234" - this doesn't match the bridge name on the destination and causes the migration to fail with the error:

error : virNetDevGetMTU:397 : Cannot get interface MTU on 'brem1-1234': No such device

I have flagged this as major because it's not possible to migrate VMs using VXLANs for maintenance, which seems important (it's certainly important to me!).

This is a version of #1508 based against 4.8 (sorry!)
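The fix amounts to leaving VXLAN bridges alone when the migration hook rewrites bridge names in the domain XML. A minimal Python sketch of that skip logic, assuming the bridge-naming convention described above (`breth0-1234`, `brem1-1234`, `brvx-1234`); the function name and regex are illustrative, not the project's actual hook code, which is a script shipped with the CloudStack agent:

```python
import re

# Bridge names follow the "<prefix>-<id>" convention seen in the report,
# e.g. "breth0-1234" (VLAN bridge on a physical NIC) or "brvx-1234" (VXLAN).
BRIDGE_RE = re.compile(r"bridge='(br[^'-]+)-(\d+)'")

def rewrite_bridge(xml_line, target_bridge="brem1"):
    """Rewrite a VLAN bridge name to the destination host's bridge,
    but skip VXLAN bridges: "brvx-*" names are host-independent, so
    rewriting them breaks the migration (the bug being fixed here)."""
    m = BRIDGE_RE.search(xml_line)
    if m is None:
        return xml_line
    prefix, vni = m.groups()
    if prefix == "brvx":  # VXLAN bridge: leave it untouched
        return xml_line
    return xml_line.replace(f"{prefix}-{vni}", f"{target_bridge}-{vni}")
```

With this guard, `bridge='breth0-1234'` is rewritten to the destination host's `brem1-1234`, while `bridge='brvx-1234'` passes through unchanged, matching the bridge that actually exists on the destination.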

@asfbot commented Apr 22, 2016

"Richard Klein (RSI)" on dev@cloudstack.apache.org replies:

First time poster to the development mailing list, so greetings everyone. We ran across this issue in our environment as well (CentOS 7 with KVM) with CS 4.7.0. The problem appeared to us as a behavior of libvirt, because we could reproduce the error independent of CS by manually creating a VM and migrating it outside of CS, but on the CS host. This led us to believe it was a libvirt issue. However, based on the code change in the Git repository, I was unaware there were hooks into libvirt.

We worked around the issue by changing the name of the bridge from brvx-1234 to vxbr-1234, and that resolved the issue for us. Not that this is a better solution, but just wondering if we are talking about the same issue.

Richard Klein  rklein@rsitex.com
RSI
5426 Guadalupe, Suite 100
Austin TX 78751

@kiwiflyer (Contributor) commented Apr 28, 2016

We'll pull this in for testing.

@kiwiflyer (Contributor) commented May 4, 2016

Tested this in a hardware 4.8 lab: I was able to migrate VXLAN enabled VMs between hosts as expected.

LGTM

@dmabry (Contributor) commented May 12, 2016

Also, tested in a 4.8 lab and I was able to migrate VMs between hosts when using VXLAN Guest Isolated network.

LGTM

@swill (Contributor) commented May 13, 2016

CI RESULTS

Tests Run: 85
  Skipped: 0
   Failed: 1
   Errors: 0
 Duration: 9h 01m 07s

Summary of the problem(s):

FAIL: Test Site 2 Site VPN Across redundant VPCs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_vpn.py", line 984, in test_01_redundant_vpc_site2site_vpn
    self.assert_(vpc2 is not None, "VPC2 creation failed")
AssertionError: VPC2 creation failed
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_B8TEJ7/results.txt

Associated Uploads

/tmp/MarvinLogs/DeployDataCenter__May_12_2016_22_26_38_6CZAVF:

/tmp/MarvinLogs/test_network_B8TEJ7:

/tmp/MarvinLogs/test_vpc_routers_XOZ934:

Uploads will be available until 2016-07-13 02:00:00 +0200 CEST

Comment created by upr comment.

@swill (Contributor) commented May 13, 2016

@insom, @dmabry and @kiwiflyer, the CI failure seems like it could be related to this change, but I can't be sure. Can you guys provide some guidance, since all of you have tested this? Also, @insom, can you do a force push, or close and re-open the PR, so we can get this green and mergeable once we agree on status? Thanks...

@insom insom closed this May 13, 2016

@insom insom reopened this May 13, 2016

@insom (Contributor, Author) commented May 13, 2016

@swill I've closed and re-opened. This change is to a shell script only and the failures are within Marvin, so I'm pretty confident that they are unrelated to my proposed change.

edit: As in, they are errors in the simulated DC, which I don't believe touches these scripts, not Marvin itself.

@swill (Contributor) commented May 13, 2016

@insom yes, I tend to agree with you. The main reason I highlighted it was that the error was network related and the code change touched details related to an interface, so I figured I would check. Once everything comes back green, I will merge this. Thank you for the work and the quick reply. 👍

@insom (Contributor, Author) commented May 13, 2016

@swill Hmm. Now it's a different Cobertura error.

@swill (Contributor) commented May 13, 2016

Holy errors batman!!!

It looks like it started like this:

[INFO] --- cobertura-maven-plugin:2.6:instrument (default-cli) @ cloud-server ---
[INFO] Cobertura 2.0.3 - GNU GPL License (NO WARRANTY) - See COPYRIGHT file
[cobertura] WARN  [main] net.sourceforge.cobertura.instrument.CoberturaInstrumenter - Unable to instrument file /home/jenkins/jenkins-slave/workspace/cloudstack-pr-analysis/server/target/generated-classes/cobertura/com/cloud/ha/AbstractInvestigatorImpl.class
java.lang.NullPointerException
    at net.sourceforge.cobertura.instrument.tp.ClassMap.applyOnProjectData(ClassMap.java:364)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:187)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:121)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.addInstrumentationToSingleClass(CoberturaInstrumenter.java:234)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentationToSingleClass(Main.java:298)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentation(Main.java:307)
    at net.sourceforge.cobertura.instrument.Main.parseArguments(Main.java:399)
    at net.sourceforge.cobertura.instrument.Main.main(Main.java:421)
[cobertura] WARN  [main] net.sourceforge.cobertura.instrument.CoberturaInstrumenter - Unable to instrument file /home/jenkins/jenkins-slave/workspace/cloudstack-pr-analysis/server/target/generated-classes/cobertura/com/cloud/ha/HighAvailabilityManagerImpl$WorkerThread.class
java.lang.NullPointerException
    at net.sourceforge.cobertura.instrument.tp.ClassMap.applyOnProjectData(ClassMap.java:364)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:187)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:121)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.addInstrumentationToSingleClass(CoberturaInstrumenter.java:234)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentationToSingleClass(Main.java:298)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentation(Main.java:307)
    at net.sourceforge.cobertura.instrument.Main.parseArguments(Main.java:399)
    at net.sourceforge.cobertura.instrument.Main.main(Main.java:421)
[cobertura] WARN  [main] net.sourceforge.cobertura.instrument.CoberturaInstrumenter - Unable to instrument file /home/jenkins/jenkins-slave/workspace/cloudstack-pr-analysis/server/target/generated-classes/cobertura/com/cloud/ha/RecreatableFencer.class
java.lang.NullPointerException
    at net.sourceforge.cobertura.instrument.tp.ClassMap.applyOnProjectData(ClassMap.java:364)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:187)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:121)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.addInstrumentationToSingleClass(CoberturaInstrumenter.java:234)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentationToSingleClass(Main.java:298)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentation(Main.java:307)
    at net.sourceforge.cobertura.instrument.Main.parseArguments(Main.java:399)
    at net.sourceforge.cobertura.instrument.Main.main(Main.java:421)
[cobertura] WARN  [main] net.sourceforge.cobertura.instrument.CoberturaInstrumenter - Unable to instrument file /home/jenkins/jenkins-slave/workspace/cloudstack-pr-analysis/server/target/generated-classes/cobertura/com/cloud/ha/dao/HighAvailabilityDao.class
java.lang.NullPointerException
    at net.sourceforge.cobertura.instrument.tp.ClassMap.applyOnProjectData(ClassMap.java:364)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:187)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:121)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.addInstrumentationToSingleClass(CoberturaInstrumenter.java:234)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentationToSingleClass(Main.java:298)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentation(Main.java:307)
    at net.sourceforge.cobertura.instrument.Main.parseArguments(Main.java:399)
    at net.sourceforge.cobertura.instrument.Main.main(Main.java:421)
[cobertura] WARN  [main] net.sourceforge.cobertura.instrument.CoberturaInstrumenter - Unable to instrument file /home/jenkins/jenkins-slave/workspace/cloudstack-pr-analysis/server/target/generated-classes/cobertura/com/cloud/ha/dao/HighAvailabilityDaoImpl.class
java.lang.NullPointerException
    at net.sourceforge.cobertura.instrument.tp.ClassMap.applyOnProjectData(ClassMap.java:364)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:187)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:121)
    at net.sourceforge.cobertura.instrument.CoberturaInstrumenter.addInstrumentationToSingleClass(CoberturaInstrumenter.java:234)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentationToSingleClass(Main.java:298)
    at net.sourceforge.cobertura.instrument.Main.addInstrumentation(Main.java:307)
    at net.sourceforge.cobertura.instrument.Main.parseArguments(Main.java:399)
    at net.sourceforge.cobertura.instrument.Main.main(Main.java:421)

I am pretty sure we are having workspace cleanup problems on Jenkins, so I wonder if this is related to something being left over from the build of a different branch. Hmmm, I need to get better with Jenkins. Does anyone have expertise here who could potentially help us understand what is happening?

@swill (Contributor) commented May 16, 2016

@insom we are down to the wire here. I would like to see if we can get this green and merged before the freeze. Can you rebase and push again to kick off the automation? I will likely make judgement calls and merge PRs that have a passing Travis even if Jenkins is failing. We will see. I would rather not be in that situation...

@insom insom force-pushed the insom:CLOUDSTACK-9362-2 branch from d2eb0a5 to e9bf751 May 17, 2016

@insom (Contributor, Author) commented May 17, 2016

@swill rebased and pushed. It does have two LGTMs and tests by real people (plus it's in production here at iWeb), if that makes it a little easier to overrule Jenkins.

@swill (Contributor) commented May 17, 2016

We are green now. 👍 I am going to rerun CI to make sure the failure we got does not pop up again. Otherwise we are pretty good. Thanks...

@swill (Contributor) commented May 17, 2016

CI RESULTS

Tests Run: 44
  Skipped: 0
   Failed: 3
   Errors: 0
 Duration: 3h 12m 14s

Summary of the problem(s):

FAIL: test_02_vpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 262, in test_02_vpc_privategw_static_routes
    self.performVPCTests(vpc_off)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 325, in performVPCTests
    privateGw_1 = self.createPvtGw(vpc_1, "10.0.3.100", "10.0.3.101", acl1.id, vlan_1)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 595, in createPvtGw
    self.fail("Failed to create Private Gateway ==> %s" % e)
AssertionError: Failed to create Private Gateway ==> Execute cmd: createprivategateway failed, due to: errorCode: 431, errorText:Network with vlan vlan://100 already exists in zone 1
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_Y9TPF9/results.txt
FAIL: test_03_vpc_privategw_restart_vpc_cleanup (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 274, in test_03_vpc_privategw_restart_vpc_cleanup
    self.performVPCTests(vpc_off, restart_with_cleanup = True)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 325, in performVPCTests
    privateGw_1 = self.createPvtGw(vpc_1, "10.0.3.100", "10.0.3.101", acl1.id, vlan_1)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 595, in createPvtGw
    self.fail("Failed to create Private Gateway ==> %s" % e)
AssertionError: Failed to create Private Gateway ==> Execute cmd: createprivategateway failed, due to: errorCode: 431, errorText:Network with vlan vlan://100 already exists in zone 1
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_Y9TPF9/results.txt
FAIL: test_04_rvpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 286, in test_04_rvpc_privategw_static_routes
    self.performVPCTests(vpc_off)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 345, in performVPCTests
    self.check_pvt_gw_connectivity(vm1, public_ip_1, [vm2.nic[0].ipaddress, vm1.nic[0].ipaddress])
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 704, in check_pvt_gw_connectivity
    "Ping to VM on Network Tier N from VM in Network Tier A should be successful at least for 2 out of 3 VMs"
AssertionError: Ping to VM on Network Tier N from VM in Network Tier A should be successful at least for 2 out of 3 VMs
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_Y9TPF9/results.txt

Associated Uploads

/tmp/MarvinLogs/DeployDataCenter__May_17_2016_17_19_03_TAJ94R:

/tmp/MarvinLogs/test_network_Y9TPF9:

Uploads will be available until 2016-07-17 02:00:00 +0200 CEST


@swill (Contributor) commented May 17, 2016

Not sure what to think of that. I have never seen those errors before in my environment...

@kiwiflyer (Contributor) commented May 17, 2016

@swill Sure you haven't got some artifacts left behind from some other testing? This is only a libvirt hook, so I can't imagine how the tests failing above could be related to this PR.

@swill (Contributor) commented May 17, 2016

I will rerun tests now...
Edit: Well, I will run it later tonight when I have a CI env open again.

@swill (Contributor) commented May 18, 2016

CI RESULTS

Tests Run: 44
  Skipped: 0
   Failed: 0
   Errors: 0
 Duration: 3h 20m 40s

Associated Uploads

/tmp/MarvinLogs/DeployDataCenter__May_18_2016_08_05_00_O6IVBL:

/tmp/MarvinLogs/test_network_CK2TZV:

Uploads will be available until 2016-07-18 02:00:00 +0200 CEST


@swill (Contributor) commented May 18, 2016

The tests that were failing are not failing anymore. I think this is ready...

@asfgit asfgit merged commit e9bf751 into apache:4.8 May 18, 2016

2 checks passed

Jenkins: This pull request looks good
continuous-integration/travis-ci/pr: The Travis CI build passed

asfgit pushed a commit that referenced this pull request May 18, 2016

Merge pull request #1513 from insom/CLOUDSTACK-9362-2

CLOUDSTACK-9362: Skip VXLANs when rewriting the bridge name for migrations (4.8-2)

From the [JIRA issue](https://issues.apache.org/jira/browse/CLOUDSTACK-9362):

> bb8f7c6
>
> The above commit introduces rewriting of bridge device names when migrating a virtual machine from one host to another. However, it also matches bridges called "brvx-1234" and rewrites them to (in my case) "brem1-1234" - this doesn't match the bridge name on the destination and causes the migration to fail with the error:
>
> error : virNetDevGetMTU:397 : Cannot get interface MTU on 'brem1-1234': No such device
>
> I have flagged this as major because it's not possible to migrate VMs using VXLANs for maintenance, which seems important (it's certainly important to me!).

This is a version of #1508 based against 4.8 (sorry!)

* pr/1513:
  Skip VXLANs when rewriting the bridge name for migrations

Signed-off-by: Will Stevens <williamstevens@gmail.com>