Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

realtek-poe: Add PSE ID quirk for Zyxel GS1900-24HP-v1 #6

Merged
merged 3 commits into from
Jul 3, 2022

Conversation

mrnuke
Copy link
Collaborator

@mrnuke mrnuke commented Jun 26, 2022

The "set global power budget" command contains a PSE controller in its
argument list. This was hardcoded to 0. On some switches, commands to
PSE 0 do nothing. In those cases, the set budget command won't stick.
The power budget remains at its default value, usually 0. Because of
this, ports remain stuck in the "Requesting power" state.

To solve this, send the budget command to the correct PSE ID or set of
IDs. Use a mask to encode which PSE IDs neet the set budget command.
When the "compatible" devicetree string matches Zyxel GS1900-24HP-v1,
set the mask to 0xff, to command PS ID's 0->7. The vendor firmware
also uses ID's 0->7. This enables power delivery to PoE sinks.

image

Signed-off-by: Alexandru Gagniuc mr.nuke.me@gmail.com

@Hurricos
Copy link
Owner

Tested CI#15 on my GS1900-24HPv1:

  • Removed old realtek-poe package
  • Installed new one
  • Updated config to "enable all ports" (effectively)
  • Shut off to sync fs
  • Cycled power (cold boot)

A WS-AP3825i plugged into port 1 powered on as soon as the device did.

Note that issuing reboot after this does shut off PoE devices. Might be a problem resulting from how we're hogging a GPIO line to bring the device up, not sure.

Tested-by: Martin Kennedy hurricos@gmail.com

@mrnuke
Copy link
Collaborator Author

mrnuke commented Jun 26, 2022

The gpio-hogdefinition looks fine to me. I wonder if u-boot is forcing the GPIO low.

@Hurricos
Copy link
Owner

To add to my tested-by: I am noticing (though I have not A/B'd to verify this changeset / ddd72d2 is responsible) that realtek-poe now sees certain ports as 'disabled':

                "lan1": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Searching"
                },
                "lan2": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Delivering power",
                        "consumption": 5.100000
                },
                "lan3": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Searching"
                },
                "lan4": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Searching"
                },
                "lan5": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Disabled"
                },
                "lan6": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Searching"
                },

In fact, ports 9-24 all show as disabled (as does 5). I think this might be a race-condition of some sort -- I haven't traced it to see what it does, but ubus call poe reload actually fixes this and starts showing (through ubus call poe info) the ports being in the state I expect the config to have set them.

To add to this: I moved the PoE sink to port 23, found it come up; issued /etc/init.d/poe restart, found it stayed down and reported as Disabled; then issued an /etc/init.d/poe reload and found it working now.

I haven't tried applying just the functional / non-refactoring changes to test whether this is a regression from one of the commits in #4, and I bet the new functional commits would not apply without your cleanup. Just giving you a heads up that /etc/init.d/poe reload is needed after /etc/init.d/poe restart in order for all configured ports to come up.

I can tweak /etc/init.d/poe to use realtek-poe -d, to pass you a dump of the frames being sent here, if you'd like.

@mrnuke
Copy link
Collaborator Author

mrnuke commented Jun 26, 2022

No need to tweak /etc/init.d/poe, but do make sure you're not running two instances of realtek-poe at the same time:

# /etc/init.d/poe stop
# ps | grep realtek-poe
7939 root      1308 S    grep realtek-poe
# realtek-poe -d

@Hurricos
Copy link
Owner

No need to tweak /etc/init.d/poe, but do make sure you're not running two instances of realtek-poe at the same time:

# /etc/init.d/poe stop
# ps | grep realtek-poe
7939 root      1308 S    grep realtek-poe
# realtek-poe -d

I thought I needed to tweak it, as /etc/init.d/poe reload expects (?) the daemon to be initialized with procd.

ubus call poe reload at one point acted differently from /etc/init.d/poe reload, not 'unsticking' the output of ubus call poe info, but I can't reproduce that so I suspect it's an unrelated bug.

@mrnuke
Copy link
Collaborator Author

mrnuke commented Jun 26, 2022

I'm not surprised. realtek-poe is propped up by 2x4s and held together with duct tape.

First milestone is to get it to power on the ports. That makes 99% of users happy. Then we can worry about the undefined behavior and deadlocks that I found.

@walterav1984
Copy link

walterav1984 commented Jun 27, 2022

While still using CI14 realtek-poe package I can confirm that just a ubus call poe reload changes the port status for ports 9-24 from Disabled to Searching and after running ubus call poe sendframe '{"frame" : "18 00 07 07 08 00 00"}' its shows status Delivering power and starts powering PoE devices on port 9-24 😃!

ubus call poe info
		"lan9": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Delivering power",
			"consumption": 2.400000
		},
...
		"lan24": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Delivering power",
			"consumption": 4.600000
		}

Reconnecting PoE devices on other 9-24 ports will power those devices on although the ubus call poe info will not always show the correct status after that.

Doing a /etc/init.d/poe restart will just flat out put all status back to Disabled on ports 9-24 and will disable PoE and poweroff all devices on these ports. Contrary this service restarts works just fine for ports 1-8.
Reapplying a ubus call poe reload will bring back PoE devices status and power on ports 9-24.

EDIT: Also doing a reboot while ports 9-24 were active before with PoE, just quickly powers off and powers on the PoE devices during boot but eventually PoE power goes and stays down for port 9-24. However port status now shows a mix of Disabled and unknown after reboot.

{
	"firmware": "v17.1",
	"mcu": "ST Micro ST32F100 Microcontroller",
	"budget": 170.000000,
	"consumption": 0.000000,
	"ports": {
		"lan1": {
			"priority": 2,
			"status": "Searching"
		},
		"lan2": {
			"priority": 2,
			"status": "Searching"
		},
		"lan3": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Searching"
		},
		"lan4": {
			"priority": 2,
			"status": "Searching"
		},
		"lan5": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Searching"
		},
		"lan6": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Searching"
		},
		"lan7": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Disabled"   <--- Also notice this strange Disabled behaviour while not disabled in config
		},
		"lan8": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Disabled"   <--- Also notice this strange Disabled behaviour while not disabled in config
		},
		"lan9": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Disabled"   <--- Poe device connected
		},
		"lan10": {
			"priority": 2,
			"mode": "PoE+",
			"status": "Disabled"
		},
		"lan11": {
			"priority": 2,
			"status": "Disabled"
		},
		"lan12": {
			"priority": 2,
			"status": "Disabled"
		},
		"lan13": {
			"priority": 2,
			"status": "Disabled"
		},
		"lan14": {
			"priority": 2,
			"status": "Disabled"
		},
		"lan15": {
			"priority": 2,
			"status": "Disabled"
		},
		"lan16": {
			"priority": 2,
			"status": "Disabled"
		},
		"lan17": {
			"priority": 2,
			"status": "unknown"
		},
		"lan18": {
			"priority": 2,
			"status": "unknown"
		},
		"lan19": {
			"priority": 2,
			"status": "unknown"
		},
		"lan20": {
			"priority": 2,
			"status": "unknown"
		},
		"lan21": {
			"priority": 2,
			"status": "unknown"
		},
		"lan22": {
			"priority": 2,
			"status": "unknown"
		},
		"lan23": {
			"priority": 2,
			"status": "unknown"
		},
		"lan24": {
			"priority": 2,
			"status": "unknown"   <--- PoE device connected
		}
	}
}

ubus call poe reload #will poweron again devices 9-24 after reboot, this is not necesarry for ports 1-8
#status is still incorrect although PoE works again
#to get poe and status info correct it seems you always need a combination of `/etc/init.d/poe restart && ubus  call poe reload`

I can also confirm @Hurricos that there is strange behaviour concerning ports 5,6,7, they sometimes ran into a Disabled state after reconnecting PoE device for to many times. Although the /etc/config/poe config file lists the ports enabled it will refuse to power on PoE devices, even after rebooting the Switch nor after /etc/init.d/poe restart the PoE devices won't come on these ports and status stays disabled!
However, doing a "real" disabled in /etc/config/poe config file for instance port 5 and restarting the service and than doing a "real" enable again for that same port in config file and doing a service restart makes the port status and PoE work again...

@walterav1984
Copy link

Maybe an interresting but logical find is that disabling /etc/init.d/poe start at boot keeps ports 9-24 powered up with PoE "after reboot" albeit with the short interuption mentioned earlier.

What does GET_STR() expand to? For simple cases, it's trivial.
Operator precedence in C can result in unexpected behavior when
"a" or "b" are complex expressions:

    GET_STR(i + 2, array)
      ==> (i + 2 < ARRAY_SIZE() ? array[i + 2] : NULL);
    GET_STR(1, array + i)
      ==> (1 < ARRAY_SIZE() ? array + i[1] : NULL)

The former incorrectly checks the array size against the second term,
but dereferences a higher index. It can result in out-of-bounds reads.
The latter results in nonsense code

Macro expansions can be fixed by enclosing expresions in parentheses.
This ensures expressions get evaluated in the expected order. Add
parentheses in GET_STR(), where appropriate.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Only status data for the first 8 ports was requested. Most devices
have 8 ports, and the reply packet has room for 8 ports. It was the
perfect match!

People quickly figured out realtek-poe works for unicorn switches with
more than 8 PoE ports. And so they did. And they increased MAX_PORT
without regards to the out-of-bounds access problem they created. And
realtek-poe refused to crash!

Command 0x2a has proven to be unreliable for devices with more than
8 ports. It returns garbage for ports > 8, even on switchea with more
than 8 PoE ports.

Command 0x28 is more reliable. It only sends 4 port statuses per
packet, but returns correct data. If the port number exceeds the total
ports on the device, it correctly returns 0xff.

Replace port status query with command 0x28.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
@mrnuke mrnuke force-pushed the mrnuke-more-power branch 2 times, most recently from 35b1bdf to 2eb1831 Compare June 28, 2022 01:04
@mrnuke
Copy link
Collaborator Author

mrnuke commented Jun 28, 2022

Would be useful to have realtek-poe -d logs when the issue occurs.

The "set global power budget" command contains a PSE controller in its
argument list. This was hardcoded to 0. On some switches, commands to
PSE 0 do nothing. In those cases, the set budget command won't stick.
The power budget remains at its default value, usually 0. Because of
this, ports remain stuck in the "Requesting power" state.

To solve this, send the budget command to the correct PSE ID or set of
IDs. Use a mask to encode which PSE IDs neet the set budget command.
When the "compatible" devicetree string matches Zyxel GS1900-24HP-v1,
set the mask to 0xff, to command PS ID's 0->7. The vendor firmware
also uses ID's 0->7. This enables power delivery to PoE sinks.

Tested-by: Martin Kennedy hurricos@gmail.com
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
@Hurricos
Copy link
Owner

Hurricos commented Jul 1, 2022

Would be useful to have realtek-poe -d logs when the issue occurs.

Here's a .tar.gz of it: https://paste.c-net.org/LeelooLucid

  1. realtek-poe -d > /tmp/realtek-poe-d.log 2>&1 was run standalone (poe service disabled)
  2. ubus call poe info > /tmp/ubus-call-poe-info-1 was run
  3. ubus call poe reload was run
  4. ubus call poe info > /tmp/ubus-call-poe-info-2 ...
  5. ubus call poe reload ...
  6. ubus call poe info > /tmp/ubus-call-poe-info-3 ...

Each of these was run with about 15s between then.

@Hurricos
Copy link
Owner

Hurricos commented Jul 1, 2022

.tar.gz of it

One more thing -- that run didn't show the Disabled values I expected. This one did (raw log file): https://paste.c-net.org/AlliancePackard

@mrnuke
Copy link
Collaborator Author

mrnuke commented Jul 1, 2022

Seems to be the same root cause as #9 (comment)

@Hurricos
Copy link
Owner

Hurricos commented Jul 3, 2022

Seems to be the same root cause as #9 (comment)

OK. Looks like it's a separate issue unrelated to this PR. Some next steps are outlined here.

I am now merging this. Thank you @mrnuke!

@Hurricos Hurricos merged commit f1677d9 into realtek-poe Jul 3, 2022
@mrnuke mrnuke deleted the mrnuke-more-power branch July 3, 2022 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants