[BUG] Screen off after restarting #22

Open
NeonDaniel opened this issue Jul 17, 2023 · 10 comments · May be fixed by #138
Labels
bug Something isn't working

Comments

@NeonDaniel
Member

NeonDaniel commented Jul 17, 2023

Description

Occasionally, the GUI fails to load, and even local tty sessions are not rendered on screen. fbi calls work normally, and all fb and dri devices exist as expected.

Steps to Reproduce

  • Restart or power on a Mark2
  • The backlight or display may remain blank (the device otherwise boots normally)
  • As of [FEAT] Notify reason for restarts #118, a static error screen remains active after the GUI fails to load

Relevant Code

The shutdown service explicitly blanks the screen; perhaps the boot path also needs to explicitly power the screen back on.
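A minimal sketch of what "explicitly powering on the screen" at boot could look like, assuming the panel honours the standard Linux sysfs attributes /sys/class/graphics/fb*/blank and /sys/class/backlight/*/bl_power, where writing 0 means unblank / power on. The helper name unblank_display is hypothetical, and this thread has not confirmed that the Mark2 panel and backlight drivers expose both attributes.

```python
"""Sketch: undo a shutdown-time blank by unblanking framebuffers and backlights.

Assumption: the display stack exposes /sys/class/graphics/fb*/blank and
/sys/class/backlight/*/bl_power; writing "0" (FB_BLANK_UNBLANK) turns them on.
"""
import glob
import os

FB_BLANK_UNBLANK = "0"  # value of FB_BLANK_UNBLANK in linux/fb.h


def unblank_display(sysfs_root: str = "/sys") -> list[str]:
    """Write the unblank value to every framebuffer and backlight found.

    Returns the attribute files that were written, so a caller can log them.
    """
    patterns = [
        os.path.join(sysfs_root, "class", "graphics", "fb*", "blank"),
        os.path.join(sysfs_root, "class", "backlight", "*", "bl_power"),
    ]
    written = []
    for pattern in patterns:
        for attr in sorted(glob.glob(pattern)):
            with open(attr, "w") as f:
                f.write(FB_BLANK_UNBLANK)
            written.append(attr)
    return written
```

On a device this needs root (for example, called from a oneshot unit that runs before gui-shell). The sysfs_root parameter exists only so the logic can be exercised against a scratch directory.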

Other Notes

  • This appears unrelated to the GUI, since tty sessions also fail to render (though they do receive input).

@NeonDaniel
Member Author

NeonDaniel commented Mar 19, 2024

The permissions of /dev/dri and /dev/render* have been observed as a possible cause of GUI errors; I was reminded of this on Matrix.

In a recent failure, the devices look normal:

(venv) neon@neon:~$ ll /dev/dr*
total 0
drwxr-xr-x  3 root root        120 Jan 26 13:48 ./
drwxr-xr-x 17 root root       3980 Jan 26 13:48 ../
drwxr-xr-x  2 root root        100 Jan 26 13:48 by-path/
crw-rw----  1 root video  226,   0 Jan 26 13:48 card0
crw-rw----  1 root video  226,   1 Jan 26 13:48 card1
crw-rw----  1 root render 226, 128 Jan 26 13:48 renderD128
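To repeat this check on every failed boot without eyeballing ls output, the nodes can be stat'ed directly. A sketch, assuming the expected groups and mode are exactly what the listing above shows (card* in group video, renderD* in group render, mode 0660); check_dri_nodes is a hypothetical helper name.

```python
import grp
import os
import stat

# Expectations copied from the listing above (an assumption, not a spec).
EXPECTED_GROUP = {"card": "video", "renderD": "render"}
EXPECTED_MODE = 0o660  # crw-rw----


def check_dri_nodes(dri_dir: str = "/dev/dri") -> list[str]:
    """Return human-readable problems found with DRM device nodes."""
    problems = []
    for name in sorted(os.listdir(dri_dir)):
        prefix = next((p for p in EXPECTED_GROUP if name.startswith(p)), None)
        if prefix is None:
            continue  # e.g. the by-path/ directory
        st = os.stat(os.path.join(dri_dir, name))
        if not stat.S_ISCHR(st.st_mode):
            problems.append(f"{name}: not a character device")
            continue
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)  # gid with no /etc/group entry
        if group != EXPECTED_GROUP[prefix]:
            problems.append(f"{name}: group {group}, expected {EXPECTED_GROUP[prefix]}")
        mode = stat.S_IMODE(st.st_mode)
        if mode != EXPECTED_MODE:
            problems.append(f"{name}: mode {oct(mode)}, expected {oct(EXPECTED_MODE)}")
    return problems
```

An empty list means the nodes match the working layout, which is what the listing above suggests for this failure.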

@NeonDaniel
Member Author

Example of gui-shell logs from a failure case. The first and last errors ("Failed to move cursor on screen DSI1: -13" and "Failed to commit atomic request (code=-13)") appear every time the GUI fails to launch while the screen is properly initialized and /dev/dri is populated.

Jan 26 13:48:44 neon ovos-shell[554]: Failed to move cursor on screen DSI1: -13
Jan 26 13:48:44 neon ovos-shell[554]: Failed to move cursor on screen DSI1: -13
Jan 26 13:48:46 neon ovos-shell[554]: kf.kirigami: The style does not provide a C++ Units implementation. QML Units implementations are no longer suppor>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/main.qml:334:9: QML FastBlur: Cannot anchor to an item that isn't a parent or sibling.
Jan 26 13:48:46 neon ovos-shell[554]: QMetaProperty::notifySignal: cannot find the NOTIFY signal usePTTClient in class GlobalSettings for property 'useP>
Jan 26 13:48:46 neon ovos-shell[554]: mycroft connection not open!
Jan 26 13:48:46 neon ovos-shell[554]: mycroft connection not open!
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/quicksettings/MuteDelegate.qml:55:5: QML Connections: Implicitly defined onFoo properties in Connection>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/quicksettings/VolumeSlider.qml:61:5: QML Connections: Implicitly defined onFoo properties in Connection>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/quicksettings/BrightnessSlider.qml:37:5: QML Connections: Implicitly defined onFoo properties in Connec>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/panel/SlidingPanel.qml:47:9: QML Connections: Implicitly defined onFoo properties in Connections are deprecat>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/NotificationsSystem.qml:45:5: QML Connections: Implicitly defined onFoo properties in Connections are depreca>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/NotificationsSystem.qml:22:5: QML Connections: Implicitly defined onFoo properties in Connections are depreca>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/ListenerAnimation.qml:18:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecate>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/qml/SkillView.qml:63:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. U>
Jan 26 13:48:46 neon ovos-shell[554]: qrc:/osd/VolumeOSD.qml:40:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. U>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/StatusIndicator.qml:165:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/ServiceWatcher.qml:35:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. >
Jan 26 13:48:47 neon ovos-shell[554]: error activating kdeconnectd: QDBusError("org.freedesktop.DBus.Error.Disconnected", "Not connected to D-Bus server>
Jan 26 13:48:47 neon ovos-shell[554]: error activating kdeconnectd: QDBusError("org.freedesktop.DBus.Error.Disconnected", "Not connected to D-Bus server>
Jan 26 13:48:47 neon ovos-shell[554]: kdeconnect.interfaces: dbus interface not valid
Jan 26 13:48:47 neon ovos-shell[554]: file:///usr/lib/aarch64-linux-gnu/qt5/qml/QMLTermWidget/QMLTermScrollbar.qml:29:5: QML Connections: Implicitly def>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/main.qml:88:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this s>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/main.qml:72:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this s>
Jan 26 13:48:47 neon ovos-shell[554]: qrc:/main.qml:62:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this s>
Jan 26 13:48:47 neon ovos-shell[554]: Failed to commit atomic request (code=-13)
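The -13 in the first and last lines is a negated errno value. Decoding it gives EACCES ("Permission denied"). For DRM modesetting ioctls such as cursor moves and atomic commits, the kernel returns EACCES when the calling process is not the DRM master, so it may be worth checking what else holds the display when these errors appear. A small sketch for decoding such codes from journal lines; the regular expression only covers the two message shapes seen here and is an assumption.

```python
import errno
import os
import re

# Matches a trailing ": -13" or "(code=-13)" at the end of a log message.
NEG_CODE = re.compile(r"(?:code=|:\s)(-\d+)\)?\s*$")


def decode_errno(line: str):
    """Return (symbolic name, description) for a trailing negative errno, else None."""
    match = NEG_CODE.search(line)
    if not match:
        return None
    code = -int(match.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
```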

@NeonDaniel
Member Author

Another, different error:

Mar 20 11:45:40 neon systemd[1]: Started gui-shell.service - Neon GUI.
Mar 20 11:45:41 neon ovos-shell[4137]: drmModeGetResources failed (Operation not supported)
Mar 20 11:45:41 neon ovos-shell[4137]: no screens available, assuming 24-bit color
Mar 20 11:45:41 neon ovos-shell[4137]: Cannot create window: no screens available
Mar 20 11:45:41 neon systemd[1]: gui-shell.service: Main process exited, code=killed, status=6/ABRT
Mar 20 11:45:41 neon systemd[1]: gui-shell.service: Failed with result 'signal'.
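An EOPNOTSUPP ("Operation not supported") from drmModeGetResources is consistent with opening a card node that has no modesetting (KMS) support, such as a render-only GPU node like v3d. One way to see which cardN actually drives a display on a given boot is to look for connector entries (cardN-DSI-1, cardN-HDMI-A-1, etc.) under /sys/class/drm. A sketch under that assumption; kms_connectors is a hypothetical name.

```python
import os
import re

CARD = re.compile(r"card\d+")
CONNECTOR = re.compile(r"(card\d+)-(.+)")


def kms_connectors(drm_class: str = "/sys/class/drm") -> dict[str, list[str]]:
    """Map each cardN to the connector names sysfs lists for it.

    A card with an empty list exposes no connectors, which is what a
    render-only (non-KMS) node looks like.
    """
    cards: dict[str, list[str]] = {}
    for name in sorted(os.listdir(drm_class)):
        if CARD.fullmatch(name):
            cards.setdefault(name, [])
            continue
        match = CONNECTOR.fullmatch(name)
        if match:
            cards.setdefault(match.group(1), []).append(match.group(2))
    return cards
```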

@NeonDaniel
Member Author

NeonDaniel commented Mar 21, 2024

When the GUI service fails to launch, tty sessions appear to fail as well. Ctrl+Alt+F2 appears to switch to a new session, but there is no prompt (completely black screen). Ctrl+Alt+F1 returns to the active static screen.

The systemd logs show that typed input is being handled, just not rendered on-screen.

@NeonDaniel
Member Author

  • No difference in initramfs.log between working and failed boots
  • No relevant difference in dmesg or raspinfo outputs

@NeonDaniel
Member Author

NeonDaniel commented Mar 22, 2024

With debug set in cmdline.txt, the following lines are present in the dmesg output of a WORKING boot but not of a broken one:

[   19.166018] (udev-worker)[282]: drm: Processing device (SEQNUM=1759, ACTION=add)
[   19.173827] (udev-worker)[281]: 8250: Processing device (SEQNUM=1760, ACTION=add)

However, this was also present in a subsequent broken boot.

This may be related to scripts/init-bottom/udev in the initramfs.
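Comparing dmesg between boots is easier once the fields that always differ (timestamps, worker PIDs, udev sequence numbers) are stripped. A sketch of that comparison; which fields to normalise is a judgement call, and the helper names are hypothetical.

```python
import re

TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")  # "[   19.166018] "
PID = re.compile(r"\[\d+\]")                     # "(udev-worker)[282]"
SEQNUM = re.compile(r"SEQNUM=\d+")               # per-boot udev event counter


def normalize(line: str) -> str:
    """Strip fields that differ on every boot so lines can be compared."""
    line = TIMESTAMP.sub("", line.strip())
    line = PID.sub("", line)
    return SEQNUM.sub("SEQNUM=*", line)


def only_in(first: list[str], second: list[str]) -> list[str]:
    """Lines of `first` with no normalized counterpart in `second`."""
    seen = {normalize(line) for line in second}
    return [line for line in first if normalize(line) not in seen]
```

Running only_in(working, broken) and only_in(broken, working) over full dmesg captures gives both directions of the difference.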

@NeonDaniel
Member Author

NeonDaniel commented Mar 22, 2024

The working boot's udev output has this additional entry:

S: disk/by-path/platform-fd500000.pcie-pci-0000:01:00.0-usb-0:1:1.0-scsi-0:0:0:0-part1

and the working initramfs log has this additional line:

brcm-pcie fd500000.pcie: clkreq control enabled

The broken boot's udev output has these additional entries:

│ │   ├─gpio/gpio22
│ │   │ ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio22
│ │   │ ┆ M: gpio22
│ │   │ ┆ R: 22
│ │   │ ┆ U: gpio
│ │   │ ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio22
│ │   │ ┆ E: SUBSYSTEM=gpio
│ │   ├─gpio/gpio23
│ │   │ ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio23
│ │   │ ┆ M: gpio23
│ │   │ ┆ R: 23
│ │   │ ┆ U: gpio
│ │   │ ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio23
│ │   │ ┆ E: SUBSYSTEM=gpio
│ │   ├─gpio/gpio24
│ │   │ ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio24
│ │   │ ┆ M: gpio24
│ │   │ ┆ R: 24
│ │   │ ┆ U: gpio
│ │   │ ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio24
│ │   │ ┆ E: SUBSYSTEM=gpio
│ │   └─gpio/gpio25
│ │     ┆ P: /devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio25
│ │     ┆ M: gpio25
│ │     ┆ R: 25
│ │     ┆ U: gpio
│ │     ┆ E: DEVPATH=/devices/platform/soc/fe200000.gpio/gpiochip0/gpio/gpio25
│ │     ┆ E: SUBSYSTEM=gpio
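The same set-difference idea works for udev output. This sketch collects device paths from the "P:" lines of either udevadm info --export-db output or the tree view shown above (the tree-drawing characters are stripped first), then lists devices present in one boot but not the other. Using device paths as the comparison key is an assumption; properties (E: lines) are ignored, and the helper names are hypothetical.

```python
TREE_CHARS = " │├└─┆"  # box-drawing characters used by the tree view


def device_paths(udev_text: str) -> set[str]:
    """Collect device paths from the "P:" lines of udevadm output."""
    paths = set()
    for line in udev_text.splitlines():
        line = line.lstrip(TREE_CHARS)
        if line.startswith("P: "):
            paths.add(line[3:].strip())
    return paths


def new_devices(baseline: str, other: str) -> list[str]:
    """Device paths present in `other` but not in `baseline`."""
    return sorted(device_paths(other) - device_paths(baseline))
```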

@NeonDaniel
Member Author

Follow-up on the "drmModeGetResources failed (Operation not supported)" / "Cannot create window: no screens available" error posted above:

This error appears to occur because /dev/dri/card0 is not deterministic: it sometimes links to platform-gpu-card instead of platform-fec00000.v3d.card. Will remove QT_QPA_EGLFS_KMS_CONFIG to allow auto-detection.
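The by-path symlink names are tied to the platform device rather than to probe order, so resolving them shows which cardN each device received on a given boot. A sketch for logging that mapping alongside a failure; dri_by_path is a hypothetical helper.

```python
import os


def dri_by_path(by_path_dir: str = "/dev/dri/by-path") -> dict[str, str]:
    """Map each stable by-path link name to the cardN/renderDN it points at."""
    mapping = {}
    for name in sorted(os.listdir(by_path_dir)):
        target = os.path.realpath(os.path.join(by_path_dir, name))
        mapping[name] = os.path.basename(target)
    return mapping
```

Recording this at gui-shell start, or comparing it between a working and a broken boot, would show whether the failing boots are the ones where the numbering swapped.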

@NeonDaniel
Copy link
Member Author

Reported on the forum as also affecting HDMI displays.

NeonDaniel linked a pull request on Mar 27, 2024 that will close this issue
@NeonDaniel
Copy link
Member Author

As suggested by Claude, I refactored the service dependencies so that gui-shell starts after the modprobe@drm service has completed, but the issue persists. Comparing systemd-analyze plot outputs, there does not appear to be any pattern common to the failed boots.

Anecdotally, systemd-binfmt.service appears to consistently take longer in the failure cases (~1 s vs. ~94 ms), but it exits successfully in both cases.
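systemd-analyze plot produces an SVG, which is awkward to compare across many boots. systemd-analyze blame prints per-unit activation times as text, so a small parser makes it possible to flag units that are consistently slower in failed boots. A sketch; the 5x threshold and the helper names are arbitrary choices for illustration.

```python
import re

# One duration component, e.g. "1min", "3.2s", "94ms", "12µs".
COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|µs|us|s)")
SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "µs": 1e-6, "us": 1e-6}


def parse_blame(text: str) -> dict[str, float]:
    """Parse `systemd-analyze blame` output into {unit: seconds}."""
    durations = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        unit, span = parts[-1], " ".join(parts[:-1])
        durations[unit] = sum(float(n) * SECONDS[u] for n, u in COMPONENT.findall(span))
    return durations


def slower_in_failure(good: dict[str, float], bad: dict[str, float],
                      factor: float = 5.0) -> list[str]:
    """Units that took at least `factor` times longer in the failed boot."""
    return sorted(
        unit for unit, t in bad.items()
        if good.get(unit, 0.0) > 0.0 and t >= factor * good[unit]
    )
```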

NeonDaniel added a commit that referenced this issue Apr 2, 2024
* Refactor eglfs config for RPi compat (relates to #22)

* Update gui service install and dependencies to resolve boot failures

* Remove deprecated mesa dependency installation

* Revert GUI service changes breaking init

* Update SJ201 script for kernel 6.6 and backwards-compat

* Update base images

---------

Co-authored-by: Daniel McKnight <daniel@neon.ai>
Projects
Status: In Progress

1 participant