Aldrin2 Firmware 3.0.1: NO-CARRIER even when everything else indicates link #152
Comments
Is it firmware version 3.1 or 3.0.1? |
sorry, yes 3.0.1. btw /sys/class/net/swp1/operstate is still down after ip link set up. not sure if that indicates something |
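For reference, the operstate check mentioned above can be scripted. This is a minimal Python sketch, assuming the standard Linux sysfs files `operstate` and `carrier` under /sys/class/net/<ifname>. The function name `link_state` and the `sysfs_root` parameter are made up for illustration and are not part of DENT or the Prestera driver.

```python
from pathlib import Path


def link_state(ifname, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) for a network interface as reported by sysfs.

    carrier is True/False, or None when the kernel does not report it.
    """
    base = Path(sysfs_root) / ifname
    # operstate holds a word such as "up", "down" or "lowerlayerdown".
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        # Reading carrier fails while the interface is administratively down
        # (or if the file is missing), so report "unknown" instead of raising.
        carrier = None
    return operstate, carrier
```

Calling `link_state("swp1")` on the switch would show whether the kernel itself sees carrier on that port, independent of what ethtool reports.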
For correctness, please edit/update the title and original report. PS: Fingers crossed, that Marvell and plvision.eu folks are going to help you quickly. |
There is a patch for 3.1.0 rc1 on the marvell-switching GitHub if you are
willing to try that. Do you know what the OS/SDK version on the a385
(firmware) CPU is?
|
dont think its the firmware after all. i downgraded to 2.8.0 and still have the same issue.
how do i find out? i can access uboot. additional discovery: i can even see traffic being trapped to kernel via tcpdump,
a layer2 bridge, which should be entirely within the asic, also doesnt forward or learn anything. unfortunately the driver is confusing to read since its a large patch file, but i'm currently trying to find the origin of this message |
can you pls provide the output of onlpdump. |
see attached
unfortunately i cant. this device is already deployed.
full module output as text here
|
err.... running onlpdump makes the links come up.
i tested this 3 times now to make extra sure i'm not imagining it.
|
We have seen that behavior and filed a bug with Marvell. Will follow up and see if it is fixed in 3.1.0 rc1. |
Thank you for escalating it. Sorry OT: It’d be great, if you used the public bug trackers, so that the whole community can benefit and participate. |
@sonoble where was this bug filed? This really needs to be a more transparent process since this is purportedly an open source project. I do not see this issue on the public switchdev-prestera tracker. This is going to become a major usability issue that prevents further adoption of DENT offerings. A better solution is required here. Users need a way to directly engage with those who develop critical pieces of software and the Prestera driver is certainly one of them. How do we work with Marvell to open things up here? |
Hi @jmpolom, I think you are jumping to the wrong conclusion too fast.
Hi @aep, the scenario you described sounds familiar. Adding Accton team member: @richardlee66
Please read the platform CPLD version according to the procedure below (from the U-Boot command line):
Marvell>> i2c dev 2
Major CPLD number:
Marvell>> i2c md 0x40 01 1
Minor CPLD number:
Marvell>> i2c md 0x40 ff 1
Per the example above, the CPLD version is 1.03. |
Why? If it’s a known bug, why isn’t the problem and solution documented? What is the problem actually with older CPLD versions?
If there are known problems, why does the Linux kernel driver not check the CPLD version, and warn about it in the log files? […] |
@paulmenzel, assuming my concern about the CPLD is right - this is not a bug. It means @aep probably has a platform with ENG CPLD image.
The CPLD Driver is a platform driver handled by the Accton team - please consult with them. Marvell Switchdev driver has no direct interface with the CPLD. |
My comment was based on earlier comments seeming to suggest a driver issue and also the comment from @sonoble suggesting a bug was filed somewhere that isn’t public. Maybe that is not accurate but I’d like to see some explanation either way. Generally there hasn’t been a specifically identified support entry point for the Marvell-based DENT platforms. IE: if you have a hardware issue, where does a user ask for help? It has seemed to default to this issue tracker but that really is quite messy and should be better thought out. This issue tracker should not be used to provide end user device support and also to coordinate the development of an OS. It lumps a ton of disparate things into one bin and will become increasingly more difficult/painful/undesirable to interact with. |
this is a regular production device from Accton. If dentos only wants to support specific revisions of hardware, it would be nice to have that documented, so we can purchase the correct revision in the future.
they're the same thing. Unless you're specifically suggesting that dentos doesn't accept outside contributions, which would explain trivial bugfix PRs being ignored. Again, i would really appreciate if the purpose of dent is better documented. The overall tone appears to be that this is actually internal to some corporate agreement rather than for general use. Otherwise we'll have to find workarounds for the silicon that happens to be out there, as we traditionally do in linux. |
Hi Jon,
Yes my response was that we had seen something similar, but we had to
confirm it was fixed in the firmware. Mickey and others were able to
determine that the issue was in the cpld and we had good results testing
yesterday. I apologize for not immediately updating the ticket.
|
Hi Arvid,
There is no issue with the hardware, it is an issue with the cpld version.
The cpld software is supported by the odm. Do you have a contact at
Accton/Edge-Core that you can work with or a reseller?
The suggestion is to update to the latest cpld. Please note this is not
common but bugs may not be exposed until new features are implemented. The
cpld update procedure is straightforward and is done from onie, it will
not modify the dentos installation or configuration.
On Tue, Nov 23, 2021, 9:04 AM Arvid E. Picciani wrote:
Please read the platform CPLD version according to the below procedure:
(From U-Boot command line)
Marvell>> i2c dev 2
Setting bus to 2
Marvell>> i2c md 0x40 01 1
0001: 01 .
Marvell>> i2c md 0x40 ff 1
00ff: 05 .
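The two dump lines above can be combined into a version string mechanically. This is a minimal Python sketch, assuming that register 0x01 holds the major number and register 0xff the minor number as described in the procedure, and that the version is written as major.minor with a two-digit minor (the thread uses "1.03" and "1.05"). Rendering the bytes as decimal is an assumption, and `cpld_version` is a hypothetical helper name, not an Accton or Marvell tool.

```python
import re


def cpld_version(major_dump, minor_dump):
    """Combine two one-byte `i2c md` dump lines into a 'major.minor' string."""

    def first_byte(line):
        # A dump line looks like "0001: 01 ." (address, hex byte, ASCII column).
        match = re.match(r"\s*[0-9a-fA-F]+:\s*([0-9a-fA-F]{2})\b", line)
        if match is None:
            raise ValueError(f"unrecognised i2c md line: {line!r}")
        return int(match.group(1), 16)

    # Assumption: both bytes are shown as decimal, minor padded to two digits.
    return f"{first_byte(major_dump)}.{first_byte(minor_dump):02d}"


print(cpld_version("0001: 01 .", "00ff: 05 ."))  # prints 1.05
```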
|
This issue really highlights some major deficiencies with the DENT Project that need to be resolved sooner rather than later if we want users to stick around. I see the following as questions that need to be answered:
I’d generally agree that the feeling the rest of us users are left with is that we are simply bystanders to someone else’s corporate objectives. We do not have clearly identified avenues for support with many of the major players here and it leads to a pretty lousy experience. This must be improved or we risk alienating existing users and denying future ones. |
I have requested support from the reseller, but accton is not a brand that cares about longevity of their products.
Can we collect the CPLD images in a repo, similar to how we do it for firmware? |
Commit fee5b08 (Modify RX loss active high for CPLD RX loss definition correction) has no commit message body describing the problem and the fix. And also does not mention the side effect with older CPLD versions. |
You're right @paulmenzel, I think Taras only wanted to help @aep get the system working until Edge-Core engages. |
Reseller responded that Accton does not release updated CPLD images for the dentos line, so these machines are dead on arrival unless the dentos community can somehow agree on a workaround.
This line in the offending commit unfortunately confirms the sentiment of the rest of this thread:
+ It is currently not required for Amazon 'ethtool -m' support but it is intended for future use.
We could maintain a community fork of dentos that works outside of Amazon, but i'm not sure if there's even interest. |
@aep |
thank you @demliu , i'm sure you realize this is not a particularly useful response, since there's nothing anyone outside of Accton can do about it. The CPLD isn't documented, and the sources aren't available. The commit that broke it is Accton specific. I'll happily help debugging the issue here, in public.
Please make this publicly available, or fix dentos main to work with all devices you shipped. This open source project won't work if basic functionality requires an NDA. |
I now received the CPLD from the reseller after pressure from Marvell. This might fix my issue but i'm not willing to purchase more devices until edge-core commits to making them publicly available. Dentos cannot be successful if this level of escalation is required for everyone participating. If dentos doesnt work we will stick to cisco, who have great support. Please make them publicly available or commit to a stable ABI. |
Is this an update to the CPLD firmware itself that you received? |
@aep |
@aep, thank you again very much for debugging the issue. We also seem to have run into it on one device with Aldrin2 firmware 3.0.1 and CPLD firmware 1.05, the same as yours. One of our three devices started to show this issue once: no restart yet, and the link just dropped. Before that it worked fine. For you it was a little different, right? All the ports (besides management) never worked, did they? Before we go through updating our devices, did you apply the update, and did it fix the issue? |
yes, the switch doesnt come up without the binary blobs. you need to have an exact match between dentos and the binaries, which aren't public, and we dont know which ones dentos devs test.
DENTOS never made it out of the lab unfortunately. We're too small to be able to make a single vendor (Accton) give us the required binary blobs. Only through pressure from Marvell they gave us anything. once. The second potential vendor (Delta) doesnt even want to sell us anything.
As to your question on the ML: I think the CPLD is just for board specific things like pinouts, power, idk. They could probably just open source it if they wanted to.
If there was a large enough community, i'm sure we could convince Marvell to just sell us the chip. the rest of the board is trivial. but i'm not seeing any traction that would make that a compelling argument. And if there was a relevant community, Accton would probably also be convinced to just publish the blobs.
TLDR: unless you're facebook, give up. |
i'm puzzled if this is a bug in firmware 3.0.1 or maybe an installation error.
after moving an AS5114-48X, all links are down. SFP modules are detected:
ethtool says its up, but ip link still shows NO-CARRIER. consequently all the routes are dead.
ethtool -S says it's receiving packets just fine, but not sending any (probably since its marked no-carrier)
unfortunately i never tested a cold boot before deploying. fw3.1 seemed to work fine in my tests where i plugged in the SFPs after boot.
how does carrier detection work, and is there a possibility i need to do something other than "ip link set up" for it to work?
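To list every port that shows the symptom described above (administratively up but NO-CARRIER), the JSON output of iproute2 can be filtered. This is a rough Python sketch, assuming the `ifname` and `flags` fields that `ip -json link show` emits. The function name `no_carrier_ports` and the sample data are hypothetical and not taken from the affected switch.

```python
import json


def no_carrier_ports(ip_json_text):
    """Return names of interfaces that are administratively UP but flagged NO-CARRIER."""
    ports = []
    for link in json.loads(ip_json_text):
        flags = link.get("flags", [])
        # "UP" is the admin state set by "ip link set up";
        # "NO-CARRIER" means the kernel sees no link on the port.
        if "UP" in flags and "NO-CARRIER" in flags:
            ports.append(link["ifname"])
    return ports


# Hypothetical sample, shaped like the output of `ip -json link show`.
sample = json.dumps([
    {"ifname": "lo", "flags": ["LOOPBACK", "UP", "LOWER_UP"]},
    {"ifname": "swp1", "flags": ["NO-CARRIER", "BROADCAST", "MULTICAST", "UP"]},
    {"ifname": "swp2", "flags": ["BROADCAST", "MULTICAST"]},
])
print(no_carrier_ports(sample))  # prints ['swp1']
```

On a live system the input text could come from running `ip -json link show` and capturing its standard output. Comparing the resulting list before and after running onlpdump would show whether the ports really change state.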