Internet connexion lost after a few minutes #2

Open
Leborgne23 opened this issue Feb 13, 2022 · 62 comments

Comments

@Leborgne23

Hi, just to thank you for your work and let you know what works and what does not in my case:

  • What works: DHCP, local network, no breakdown after sleep, internet connectivity
  • What does not: internet connectivity is lost after a few minutes; I have to tweak the link speed to make it work again.

NIC: Intel GbE LAN chip (built into Gigabyte Aorus X570 Elite)
OS version: macOS 12.2.1
Router: TP-Link MR 400
@donatengit
Owner

Hi @Leborgne23
Thanks for checking this out

Which link speed works and which doesn't in your case? Is the link stable, e.g. no packet loss?

@donatengit
Owner

One more thing: could you please send what is detected in (Mac) -> (About) -> (Ethernet)? I'm interested in the Device/Vendor IDs.

https://support.apple.com/en-gb/guide/system-information/syspr35536/mac

@Leborgne23
Author

Leborgne23 commented Feb 14, 2022 via email

@Leborgne23
Author

Leborgne23 commented Feb 14, 2022 via email

@Leborgne23
Author

Leborgne23 commented Feb 14, 2022 via email

@donatengit
Owner

donatengit commented Feb 18, 2022

> Nevermind, issue is back again after 5 hours.

@Leborgne23 The problem is that I can't fully reproduce the 1Gb issues with my NIC, as it autonegotiates 100Mb with my old router and all other options don't work -- not necessarily due to the driver. While I'm looking for a valid 1G partner for testing, could you please test one more version: [DELETED]
It has some dirty hacks to avoid resets without obvious reason, so ifconfig enX up/down now works (in my case), but with the drawback that forcing speed/control dramatically reduces connection speed (reason unclear atm). It also has far more logging; please run sudo dmesg | grep -i igb in a console while manipulating the NIC's state through ifconfig for additional debug output.

@donatengit
Owner

donatengit commented Feb 20, 2022

Hi @Leborgne23

Could you please try a new version? It's supposed to be far more stable with link state changes.

Thanks in advance

@Leborgne23
Author

Leborgne23 commented Feb 20, 2022 via email

@NyaomiDEV

Hello there. I am also experiencing the issue as described, with your latest version (the one you sent out one hour ago). I am trying to install Monterey, and the installation errored out because the network connection cut off almost half an hour in.

I am using I211 on Asrock X570 Taichi.

@donatengit
Owner

donatengit commented Feb 20, 2022

> I am trying to install Monterey

Hi @NyaomiDEV,
Thanks for trying this out. The driver is for Monterey specifically (built targeting 12.1).
SmallTree is supposed to work well on previous versions (meaning the download/upgrade could be done without this driver).

You can manage which driver is used on each OS version by setting the MinKernel/MaxKernel options for every kext loaded in config.plist.

@NyaomiDEV

> The driver is for Monterey specifically (built targeted 12.1)

I know. I am, indeed, trying to install Monterey from scratch, meaning that I am loading your kext in the macOS recovery. Please tell me if I am doing this wrong, though! I just want this suffering to end with a working installation, so I can just keep trying all the kexts you can provide until I get the OS to install.

@donatengit
Owner

donatengit commented Feb 20, 2022

> I am, indeed, trying to install Monterey from scratch, meaning that I am loading your kext on the macOS recovery.

Oh, I've never tested the driver this way. And I'm not sure which debugging options are available during the process tbh.
Is WiFi working? If so just disable the driver for a while
P.s. Feel free to ping me in discord donniedisc#1988 (server) if that's more convenient

@NyaomiDEV

> Is WiFi working?

It's an Intel AX200, which I can use only after installing the system anyway. Can't count on it sadly.

> (server)

We'll probably hear from each other in five minutes because of the Discord server cooldown.

@Leborgne23
Author

Leborgne23 commented Feb 20, 2022 via email

@donatengit
Owner

> Well I still have the issue using the new version sorry. Seems more stable if I force half duplex, maybe that can help.

@Leborgne23, thanks.

  1. Is the connection still lost entirely (i.e. as if the cable were unplugged or similar), or is it packet loss?
  2. What kind of network activity was there during that period: intense or almost none? It might be something with EEE power management.
  3. Will you be able to run a couple of additional commands in the terminal when you notice problems?
  4. What link speed is shown on the router? Is it the same as the autonegotiated/forced one?

@Leborgne23
Author

Leborgne23 commented Feb 21, 2022 via email

@donatengit
Owner

> I tested it using p2p (torrent) downloading / uploading so I guess yes it was intense.

OK, that narrows down the root cause, I guess. Either the driver can't cope with the load (torrenting is indeed one of the most network-intensive activities) or it can't detect/manage hangs properly (since the counterparty is often unreliable).

> 3. Will you be able to run additional couple of commands in terminal when noticing problems? Yes I’ll do it, which ones ?

Great, I'll prepare a debug version with additional logging around packet transmission and will ask you to run sudo dmesg | grep -i igb in a terminal right after the problem occurs. But first I'll try to reproduce the issue myself with some torrents.

@donatengit
Owner

Hi @Leborgne23,

I've tested the driver under a high torrent load and indeed some packets were timing out (fewer with the patches below), but the overall download speed was constantly hitting the maximum speed my ISP allows. And the link was still stable, unfortunately.

Anyway I applied several changes that might help:

  1. Explicitly rejecting packets when the transmit queue is busy (before that it was kind of silent)
  2. Increased the default queue capacity from 256 to 1024
  3. Added options to (un)select EEE mode (there are notes that disabling it could fix spontaneous link problems)
  4. Ensured the watchdog sets the software interrupt register so that the rx ring gets cleaned

Could you please test AppleIGB.kext.zip?

I recommend testing autonegotiated 1 Gbps first and, if the issue remains, forcing 1 Gbps without EEE.
Separately, it would make sense to limit the download/upload speed of your torrent client to 80-90% of your maximum ISP speed, leaving room for other web/network activity (according to my tests torrents can take all of it).

As for additional debug, please run 2 terminals:

  • one with ping 8.8.8.8 -- it constantly pings google and reflects time of response (it could show timeouts or increase in ms if torrents take all the bandwidth)
  • another with sudo dmesg | grep -i igb -- run this as soon as you see any problem and accumulate contents for further sharing

Thanks in advance
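
For the first terminal, a small helper can make the ping output easier to share. This is only a sketch: `summarize_ping` is a hypothetical name, and it assumes the macOS ping output format ("64 bytes from ..." for replies, "Request timeout for icmp_seq N" for losses).

```shell
#!/bin/bash
# Sketch only: summarise ping output read from stdin.
# Prints "<replies> <timeouts> <longest run of consecutive timeouts>".
summarize_ping() {
    awk '
        /^Request timeout/ { timeouts++; streak++; if (streak > longest) longest = streak; next }
        /bytes from/       { replies++; streak = 0 }
        END { printf "%d %d %d\n", replies, timeouts, longest }
    '
}
```

Something like `ping -c 600 -i 0.1 8.8.8.8 | summarize_ping` would then give one line per test run. A long run of consecutive timeouts is the moment to run sudo dmesg | grep -i igb in the second terminal.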

@Leborgne23
Author

Leborgne23 commented Feb 22, 2022 via email

@donatengit
Owner

@Leborgne23 Thanks, forgot to attach the screenshot?

@donatengit
Owner

> Maybe something wrong with my OpenCore setup ?

Did SmallTree work well before?

@Leborgne23
Author

Leborgne23 commented Feb 22, 2022 via email

@Leborgne23
Author

Leborgne23 commented Feb 22, 2022 via email

@donatengit
Owner

@Leborgne23

> I did attach the screenshot to the email, not in GitHub.

Thanks but still don't see it for some reason.

> If that can help I can install Big Sur on an external disk, use SmallTree and report back.

It's a good idea. Please follow the Dortania guide carefully and, while testing, ensure no other network interfaces are enabled (including Wi-Fi).

But before that, there is another version available that stabilizes output speed by stalling packets (as in the IntelMausi driver).

@llyonard

I'm having the most stable connection atm with the 5.7.2-im, with the newest 2 i keep having random disconnection every 2/3 min.

@NyaomiDEV

> I'm having the most stable connection atm with the 5.7.2-im, with the newest 2 i keep having random disconnection every 2/3 min.

On which hardware though?

@llyonard

intel i211 controller. Its stable except under heavy upload load (download seems ok). The other 2 versions are really unstable in my config.

@NyaomiDEV

> intel i211 controller

On which chipset?

@llyonard

llyonard commented Mar 21, 2022

AMD X570 aorus elite.

Edit: im having the same problems with that release too, was just lucky in some boots (still i dont know why)

@thedxrklord

Same issue x570f gaming
i211 controller
Seems like it stops when I'm trying to create a new connection (for example, joining to discord voice channel)

@donatengit
Owner

Also, which SMBIOS/Mac model do you declare in your config.plist?

@donatengit
Owner

Based on some debug activities with @thedxrklord in Discord, please try changing the connection mode from Auto to 100 or 1000 Mbps, Full-duplex, with or without Flow-control (macOS Settings -> Network -> Ethernet adapter (the i211 one) -> Advanced or Additional (sorry, I don't have the English UI; it's where the TCP/IP and DNS tabs are) -> Hardware tab).

It helped the guy. Let me know whether the connection is stable in your case. It might help me to narrow the issue
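
The same setting can in principle be scripted with the standard `media` / `mediaopt` arguments of ifconfig. The sketch below only builds the command; `force_media` and `DRY_RUN` are hypothetical names, en0 is an assumed interface name, and whether this driver reacts to ifconfig exactly like it does to the Hardware tab is an assumption that has not been checked in this thread.

```shell
#!/bin/bash
# Sketch only: build (and optionally run) an ifconfig call that forces link settings.
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
force_media() {
    iface=$1
    media=$2
    opts=${3:-}
    cmd="ifconfig $iface media $media"
    if [ -n "$opts" ]; then
        cmd="$cmd mediaopt $opts"
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"
    else
        # Unquoted on purpose: the string is split back into separate arguments.
        sudo $cmd
    fi
}
```

For example, `force_media en0 1000baseT full-duplex,flow-control` prints the command for 1000 Mbps, full duplex with flow control, and `DRY_RUN=0 force_media en0 100baseTX full-duplex` would actually apply 100 Mbps. `ifconfig -m en0` lists the media types the interface reports as supported.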

@llyonard

Ok, I'll try this new setting in the hardware config. For the other questions you're right, I really don't see any difference with the newest versions; I just had some luck with the old one.
I'll report later if i have an unstable connection even with the changes in Ethernet-->Hardware.

@donatengit
Owner

Thanks, it would be ideal if you ran sudo dmesg | grep -i igb in a terminal right after the issue occurs. Feel free to pm me in Discord tomorrow for a quicker turnaround if it's more convenient.

@thedxrklord

thedxrklord commented Mar 24, 2022

Hi everyone
I've tested yesterday different settings in network advanced (thanks @donatengit)
And 100baseTX is more stable, but I get a slow internet connection (10mbit/250mbit)
BUT if 100baseTX drops, I switch back to 1000 (ff,fc) and it is ok
No idea why it happens, debug says everything is the same

Just know that auto mode drops every 2-3 minutes; 100 is the most stable one

I'm still learning why it happens, I'll write here if I find something

Also, here is a bash script
You can add it to autoload
It will reboot your interface automatically, when it drops

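#!/bin/bash
# Restart en0 whenever a quick burst of pings to 8.8.8.8 fails.

while true; do
    # -c 3: three probes, -t 1: give up after 1 second, -i 0.1: 100 ms between probes
    if ! ping -c 3 -t 1 -i 0.1 8.8.8.8 > /dev/null; then
        echo "network down"
        sudo ifconfig en0 down
        sudo ifconfig en0 up
        # give the link time to come back before probing again
        sleep 10
    fi
done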

@llyonard

Thanks for the bash script, i can confirm that without auto is much more stable

@NyaomiDEV

NyaomiDEV commented Mar 26, 2022

The bash script sure can help getting a sense of continuity out of normal browsing, but the issue at hand is disruptive in real time applications like Teams meetings.. btw, so is it not stable on 1000Mbit/s at all?

@llyonard

2 days ago was really stable even with 1000mbit/s, today it disconnects every 3 mins again. Here the dmesg
https://pastebin.com/DvRLJT3G

@donatengit
Owner

donatengit commented Mar 27, 2022

> Also, here is a bash script You can add it to autoload It will reboot your interface automatically, when it drops

Hi @thedxrklord ,

Thanks again for testing. I've just tested your approach on a real MacBook Pro (2017) and I'm not sure it consistently reflects the state of the connection. Here is a slightly modified script that prints the exit code:

% cat check_link.sh 
#!/bin/bash

while true;do
ping -c 3 -t 1 -i 0.1 8.8.8.8 > /dev/null
PING_EXIT=$?
if [ $PING_EXIT -ne 0 ]; then
    echo "network down ? (code=$PING_EXIT)"
    sleep 10
fi
done

It outputs the following (within ~15 minutes under high network load):

tmp % ./check_link.sh 
network down ? (code=2)
network down ? (code=2)
network down ? (code=2)

So it seems that in some circumstances the script will perform a long ifconfig down/up on a perfectly valid connection.

It would probably make sense to switch your script to some local address first (e.g. router or ISP switch) to exclude ISP issues. Even so, please note that macOS is generally not great at distributing network capacity between apps and services; one of them can take a lot of it. I ran ping -i 0.1 192.168.1.1 (local) under high network load (two 4K YouTube videos, speedtest) and pings started to drop and the response time increased dramatically. Here is how it looks on a real Mac whose normal ping is 1-2 ms:

64 bytes from 192.168.1.1: icmp_seq=24443 ttl=64 time=158.267 ms
Request timeout for icmp_seq 24445
Request timeout for icmp_seq 24446
64 bytes from 192.168.1.1: icmp_seq=24445 ttl=64 time=305.311 ms
64 bytes from 192.168.1.1: icmp_seq=24447 ttl=64 time=169.409 ms
64 bytes from 192.168.1.1: icmp_seq=24448 ttl=64 time=91.876 ms
Request timeout for icmp_seq 24453
Request timeout for icmp_seq 24454
64 bytes from 192.168.1.1: icmp_seq=24452 ttl=64 time=357.367 ms
Request timeout for icmp_seq 24456

ifconfig down/up is a comparatively expensive activity which could take up to 10 seconds. What kind of load/context do you have when you first experience issues: CPU, network load? What happens if you don't run your script?
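
Putting those two remarks together (probe a local address, and don't reset on a single failed probe), a more conservative watchdog could look like the sketch below. Everything named here is an assumption for illustration: 192.168.1.1 as the router address, a threshold of 3 consecutive failures, en0 as the interface, and the helper names `link_needs_reset` / `watch_link`.

```shell
#!/bin/bash
# Sketch only. PROBE is the command used to test the link; it can be overridden,
# which also lets the logic be exercised without a network.
PROBE=${PROBE:-"ping -c 3 -t 1 -i 0.1 192.168.1.1"}
THRESHOLD=${THRESHOLD:-3}   # consecutive failed probes required before a reset
fails=0

# Runs one probe; succeeds only once THRESHOLD probes in a row have failed.
link_needs_reset() {
    if $PROBE > /dev/null 2>&1; then
        fails=0
        return 1
    fi
    fails=$((fails + 1))
    [ "$fails" -ge "$THRESHOLD" ]
}

# Endless loop: restart the interface only after repeated failures.
watch_link() {
    iface=${1:-en0}
    while true; do
        if link_needs_reset; then
            echo "no reply for $fails probes in a row, restarting $iface"
            sudo ifconfig "$iface" down
            sudo ifconfig "$iface" up
            fails=0
            sleep 10
        else
            sleep 1
        fi
    done
}
```

Calling `watch_link en0` starts the loop. Under heavy load a local ping can still time out several times in a row, as the output above shows, so the threshold may need to be raised.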

@donatengit
Owner

> The bash script sure can help getting a sense of continuity out of normal browsing, but the issue at hand is disruptive in real time applications like Teams meetings.. btw, so is it not stable on 1000Mbit/s at all?

Hi @NyaomiDEV,

Thanks, are you testing this on Big Sur? And SmallTree doesn't have these issues in exactly the same context, correct?

@donatengit
Owner

donatengit commented Mar 27, 2022

> 2 days ago was really stable even with 1000mbit/s, today it disconnects every 3 mins again. Here the dmesg https://pastebin.com/DvRLJT3G

Hi @llyonard,

Thanks for testing. It's really unusual that the driver started to disconnect after 2 days of stable work, something has changed I guess, or it might help us to narrow the issue.

  1. First of all, how do you determine that the connection has dropped? In the logs provided I see manual switching on/off, but nothing indicating that something changed with the link. You could run ping -i 0.1 8.8.8.8 to see the picture in real time and collect the debug logs right after packets start being dropped constantly.
  2. Did you use the reconnection script from this thread? Please see my reply above with checks from a real Mac.
  3. Do you have other network devices in your setup: another network chipset, Wi-Fi, an iPhone plugged in via USB, ...? Including virtual ones, e.g. VPN? Does the issue remain if everything else is switched off?
  4. Is there anything network-related in the BIOS that could be a reason?
  5. Did your device go to suspend/sleep during that period? What kind of load did you have while experiencing the issue: high CPU, network and/or disk activity?
  6. What macOS version do you have at the moment? Did you upgrade from Big Sur? Was the SmallTree driver working there?
  7. Is there anything else not working properly at the moment, e.g. Wi-Fi, Bluetooth?
  8. What OpenCore version do you have?
  9. Which Mac model do you declare in config.plist? Not sure this is related, but I declare Mac Pro (2019) as the closest to my hardware, including Radeon drivers.

Thanks in advance

@llyonard

1)Yes there are manual disconnects but i just did that for restart the connection
2)No
3)Yes, i have an intel ax200 but it is disabled
4) I cant see any but ill check more deeply
5)No, no suspend, i have it disabled cause isnt stable even in windows or linux
6)latest monterey (not beta or anything like that)
7)Sometimes when the ethernet is too unstable i activate the wifi and it loses connection but i have a more stable experience
8) latest release
9)Same mac device in config

@donatengit
Owner

Hi @llyonard
Thanks

> 1)Yes there are manual disconnects but i just did that for restart the connection

So in your terms, is a disconnect a dramatic speed drop, packet loss, or something else?

> 3) Yes, i have an intel ax200 but it is disabled

It's possible that issue might be caused by this but hard to say without the NIC to debug.

> 4) I cant see any but ill check more deeply

Any news?

@Cryptiiiic

@donatengit I'm also having connection drops. I have I211 NIC on ASUS ROG Formula VII X570. I'm on Monterey 12.5.
The only fix afaik is to up/down the interface using ifconfig but that's annoying to do all the time. In fact I had to reup the interface 3 times during the typing of this comment. There were no mentions of igb in dmesg log or any errors afaik. I do see a decent amount of connect() - failed necp_set_socket_domain_attributesconnect() logs but unsure if thats relevant to IGB.

@llyonard

> Hi @llyonard Thanks
>
>> 1)Yes there are manual disconnects but i just did that for restart the connection
>
> So in your terms, disconnect is a dramatic speed drop, packets loss or something else?
>
>> 3) Yes, i have an intel ax200 but it is disabled
>
> It's possible that issue might be caused by this but hard to say without the NIC to debug.
>
>> 4) I cant see any but ill check more deeply
>
> Any news?

Sadly I couldn't solve any of my problems, so I was forced to buy a USB-to-Ethernet adapter, and since then I've had 0 problems. Anyway, I'm always open to help if you need more tests for this project.

@Cryptiiiic

Found another log
[15063.048857]: uipc_accept: peer disconnected unp_gencnt 30140
Sandbox apply: mdworker_shared[3649] <bytes>
Sandbox apply: mdworker_shared[3648] <bytes>
Sandbox apply: mdworker_shared[3647] <bytes>
compat_ifmu_ulist: en0 copyin() error 14c

@Cryptiiiic

image

pinging google -> drop connection after 30 minutes

@donatengit
Owner

Hi @Cryptiiiic
Thanks for testing.

> There were no mentions of igb in dmesg log or any errors afaik

There should be some, even with the release version, at least when you do ifconfig en1 down/up (dmesg should be checked right afterwards). E.g. with the command sudo ifconfig en1 down; sudo ifconfig en1 up; sleep 5; sudo dmesg | grep -i igb I get a long output looking like this:

[191761.667427]: igb: AppleIGB::stopTxQueue()
[191761.667438]: igb: setCarrier(0) ===>
.... 
[191768.372439]: igb: hw->fc.current_mode = 3
[191768.373006]: igb: Flow Control = FULL.
[191768.373010]: igb: 1000 Mbs, igb: Full Duplex
[191768.373015]: igb: hw->fc.current_mode = 3
[191768.373017]: igb: checkLinkStatus() ===> link=1, carrier=0, linkUp=0
[191768.373020]: igb: setLinkUp() ===>
[191768.373303]: igb: OK Link register status: 0x0000796d
[191768.374294]: igb: 1000 Mbs, igb: Full Duplex
[191768.374299]: igb: setCarrier(1) ===>
[191768.374336]: igb: setCarrier() <===
[191768.392966]: igb: output: Dropping packet on disabled device
[191768.392976]: igb: output: Dropping packet on disabled device

Could you please make sure the timing is right when checking dmesg? I think it would be really beneficial if you also checked it right after what seems to be a disconnect.

> Found another log [15063.048857]: uipc_accept: peer disconnected unp_gencnt 30140Sandbox apply: mdworker_shared[3649] Sandbox apply: mdworker_shared[3648] Sandbox apply: mdworker_shared[3647] compat_ifmu_ulist: en0 copyin() error 14c

Not sure this is related; it also happens to people on real Macs.

Separately, please ensure you are using the latest version. It would also make sense to try different modes for speed, duplex, flow control and EEE (Settings -> Network -> Ethernet -> Advanced -> Hardware tab).

@donatengit
Owner

Hi @llyonard

Thanks a lot for your efforts and dedication.

> Anyway im always open to help if you need more test for this project

I'm afraid until I have X570 to test or there is someone with X570 with minimal development skills there is not much you could help with.

@Cryptiiiic

Cryptiiiic commented Jul 31, 2022

@donatengit
yes I do see compat_ifmu_ulist: en0 copyin() error 14 and compat_ifmu_ulist: en1 copyin() error 14 during the drop
And here is the igb log; nothing is logged during the drop, only on interface up/down

[14698.156655]: igb: AppleIGB::stopTxQueue()
[14698.156742]: igb: setCarrier(0) ===>
[14698.156770]: igb: setCarrier() <===
[14698.156772]: igb: AppleIGB::stopTxQueue()
[14698.167229]: igb: Masking off all interrupts
[14698.182929]: igb: NVM word 0x03 is not mapped.
[14698.182966]: igb: Read INVM Word 0x0a = 402f
[14698.183310]: igb: Requested word 0x04 not found in OTP
[14698.183314]: igb: Initializing the IEEE VLAN
[14698.183639]: igb: Programming MAC Address into RAR[0]
[14698.183645]: igb: Clearing RAR[1-15]
[14698.183728]: igb: Zeroing the MTA
[14698.183755]: igb: Zeroing the UTA
[14698.183848]: igb: After fix-ups FlowControl is now = 3
[14698.184739]: igb: Reconfiguring auto-neg advertisement params
[14698.185040]: igb: autoneg_advertised 20
[14698.185044]: igb: Advertise 1000mb Full duplex
[14698.185194]: igb: Auto-Neg Advertising c01
[14698.185342]: igb: Restarting Auto-Neg
[14698.185928]: igb: No link register status 0x00007949 (try 1/10)
[14698.186237]: igb: No link register status 0x00007949 (try 2/10)
[14698.186546]: igb: No link register status 0x00007949 (try 3/10)
[14698.186855]: igb: No link register status 0x00007949 (try 4/10)
[14698.187164]: igb: No link register status 0x00007949 (try 5/10)
[14698.187472]: igb: No link register status 0x00007949 (try 6/10)
[14698.187781]: igb: No link register status 0x00007949 (try 7/10)
[14698.188089]: igb: No link register status 0x00007949 (try 8/10)
[14698.188397]: igb: No link register status 0x00007949 (try 9/10)
[14698.188706]: igb: No link register status 0x00007949 (try 10/10)
[14698.188722]: igb: Unable to establish link!!!
[14698.188723]: igb: Initializing the Flow Control address, type and timer regs
[14698.190627]: igb: No link register status 0x00007949 (try 1/1)
[14698.190629]: igb: Phy info is only valid if link is up
[14698.196711]: igb: disable() <===
[14712.566065]: igb: setCarrier(0) ===>
[14712.566068]: igb: setCarrier() <===
[14712.566070]: igb: intelSetupAdvForMedium(index 7, type 5242928) ===>
[14712.566072]: igb: intelSetupAdvForMedium() <===
[14712.566074]: igb: igb_open() ===>
[14712.566075]: igb: setCarrier(0) ===>
[14712.566077]: igb: setCarrier() <===
[14712.680742]: igb: MNG configuration cycle has not completed.
[14712.681045]: igb: After fix-ups FlowControl is now = 3
[14712.681945]: igb: Reconfiguring auto-neg advertisement params
[14712.682243]: igb: autoneg_advertised 20
[14712.682245]: igb: Advertise 1000mb Full duplex
[14712.682395]: igb: Auto-Neg Advertising c01
[14712.682544]: igb: Restarting Auto-Neg
[14712.683139]: igb: No link register status 0x00007949 (try 1/10)
[14712.683453]: igb: No link register status 0x00007949 (try 2/10)
[14712.683766]: igb: No link register status 0x00007949 (try 3/10)
[14712.684079]: igb: No link register status 0x00007949 (try 4/10)
[14712.684392]: igb: No link register status 0x00007949 (try 5/10)
[14712.684705]: igb: No link register status 0x00007949 (try 6/10)
[14712.685018]: igb: No link register status 0x00007949 (try 7/10)
[14712.685331]: igb: No link register status 0x00007949 (try 8/10)
[14712.685644]: igb: No link register status 0x00007949 (try 9/10)
[14712.685957]: igb: No link register status 0x00007949 (try 10/10)
[14712.685973]: igb: Unable to establish link!!!
[14712.685975]: igb: Initializing the Flow Control address, type and timer regs
[14712.685978]: igb: Powered up link.
[14712.703182]: igb: igb_open() <===
[14712.713980]: igb: setCarrier(1) ===>
[14712.714020]: igb: setCarrier() <===
[14712.714021]: igb: enable() <===
[14715.641432]: igb: OK Link register status: 0x0000796d
[14715.641594]: igb: hw->fc.current_mode = 3
[14715.642188]: igb: Flow Control = FULL.
[14715.642192]: igb: 1000 Mbs, igb: Full Duplex
[14715.642196]: igb: hw->fc.current_mode = 3
[14715.642198]: igb: checkLinkStatus() ===> link=1, carrier=1, linkUp=1
[14715.642200]: igb: Force link down due to IGB_FLAG_NEED_LINK_UPDATE
[14715.642203]: igb: setLinkDown() ===>
[14715.642206]: igb: setCarrier(0) ===>
[14715.642244]: igb: setCarrier() <===
[14715.642245]: igb: AppleIGB::stopTxQueue()
[14715.652453]: igb: Masking off all interrupts
[14715.668158]: igb: NVM word 0x03 is not mapped.
[14715.668197]: igb: Read INVM Word 0x0a = 402f
[14715.668497]: igb: Requested word 0x04 not found in OTP
[14715.668501]: igb: Initializing the IEEE VLAN
[14715.668831]: igb: Programming MAC Address into RAR[0]
[14715.668838]: igb: Clearing RAR[1-15]
[14715.668918]: igb: Zeroing the MTA
[14715.668955]: igb: Zeroing the UTA
[14715.669028]: igb: After fix-ups FlowControl is now = 3
[14715.669936]: igb: Reconfiguring auto-neg advertisement params
[14715.670241]: igb: autoneg_advertised 20
[14715.670244]: igb: Advertise 1000mb Full duplex
[14715.670393]: igb: Auto-Neg Advertising c01
[14715.670541]: igb: Restarting Auto-Neg
[14715.671138]: igb: No link register status 0x00007949 (try 1/10)
[14715.671448]: igb: No link register status 0x00007949 (try 2/10)
[14715.671759]: igb: No link register status 0x00007949 (try 3/10)
[14715.672071]: igb: No link register status 0x00007949 (try 4/10)
[14715.672381]: igb: No link register status 0x00007949 (try 5/10)
[14715.672695]: igb: No link register status 0x00007949 (try 6/10)
[14715.673007]: igb: No link register status 0x00007949 (try 7/10)
[14715.673315]: igb: No link register status 0x00007949 (try 8/10)
[14715.673627]: igb: No link register status 0x00007949 (try 9/10)
[14715.673936]: igb: No link register status 0x00007949 (try 10/10)
[14715.673952]: igb: Unable to establish link!!!
[14715.673952]: igb: Initializing the Flow Control address, type and timer regs
[14715.674439]: igb: No link register status 0x00007949 (try 1/1)
[14715.674441]: igb: Phy info is only valid if link is up
[14715.678798]: igb: Link down on en0
[14715.678800]: igb: setLinkDown() <===
[14715.678801]: igb: checkLinkStatus() <===
[14719.191877]: igb: OK Link register status: 0x0000796d
[14719.192036]: igb: hw->fc.current_mode = 3
[14719.192623]: igb: Flow Control = FULL.
[14719.192626]: igb: 1000 Mbs, igb: Full Duplex
[14719.192630]: igb: hw->fc.current_mode = 3
[14719.192631]: igb: checkLinkStatus() ===> link=1, carrier=0, linkUp=0
[14719.192633]: igb: setLinkUp() ===>
[14719.192926]: igb: OK Link register status: 0x0000796d
[14719.193944]: igb: 1000 Mbs, igb: Full Duplex
[14719.193948]: igb: setCarrier(1) ===>
[14719.193977]: igb: setCarrier() <===
[14719.209571]: igb: output: Dropping packet on disabled device
[14719.209574]: igb: output: Dropping packet on disabled device
[14719.209580]: igb: [LU]: Link Up on en0 (i211 Copper), 1-Gigabit, Full-duplex, Rx/Tx flow-control
[14719.209584]: igb: [LU]: CTRL=0x581c0241
[14719.209587]: igb: [LU]: CTRL_EXT=0x101000c0
[14719.209590]: igb: [LU]: STATUS=0x00280383
[14719.209594]: igb: [LU]: RCTL=0x04448032
[14719.209597]: igb: [LU]: PSRCTL=0x00000000
[14719.209604]: igb: [LU]: FCRTL=0x80004170
[14719.209607]: igb: [LU]: FCRTH=0x00004180
[14719.209610]: igb: [LU]: RDLEN(0)=0x00004000
[14719.209613]: igb: [LU]: RDTR=0x00000000
[14719.209616]: igb: [LU]: RADV=0x00000000
[14719.209619]: igb: [LU]: RXCSUM=0x00002f00
[14719.209622]: igb: [LU]: RFCTL=0x00010000
[14719.209626]: igb: [LU]: RXDCTL(0)=0x02040808
[14719.209629]: igb: [LU]: RAL(0)=0x90fe4b24
[14719.209632]: igb: [LU]: RAH(0)=0x80048d67
[14719.209634]: igb: [LU]: MRQC=0x00370002
[14719.209637]: igb: [LU]: TARC(0)=0x00000000
[14719.209640]: igb: [LU]: TARC(1)=0x00000000
[14719.209643]: igb: [LU]: TCTL=0xa50400fa
[14719.209647]: igb: [LU]: TXDCTL(0)=0x02100108
[14719.209650]: igb: [LU]: TXDCTL(1)=0x00000000
[14719.209653]: igb: [LU]: EEE Active 0
[14719.209654]: igb: setLinkUp() <===
[14719.209655]: igb: checkLinkStatus() <===
[14719.670453]: tcp_timers: tcp_output() returned 0 with retransmission timer disabled for 59236 > 443 in state 4, reset timer to 483tcp_timers: tcp_output() returned 0 with retransmission timer disabled for 59237 > 443 in state 4, reset timer to 526Sandbox apply: netbiosd[73339] <bytes>uipc_accept: peer disconnected unp_gencnt 159208

> I'm afraid until I have X570 to test or there is someone with X570 with minimal development skills there is not much you could help with.

I have x570 and I'm a bit of developer myself.

@donatengit
Owner

donatengit commented Aug 1, 2022

> I'm afraid until I have X570 to test or there is someone with X570 with minimal development skills there is not much you could help with.

Hey @Cryptiiiic

> @donatengit yes I do see compat_ifmu_ulist: en0 copyin() error 14compat_ifmu_ulist: en1 copyin() error 14 during the drop

Does it appear during normal network operation? Since such errors also appear for native Mac users, I'm still not quite sure this is related, but one guy from reddit was able to narrow this down. Is there anything that could interfere with the connection: VPN, other network devices (like an iPhone connected through USB), maybe some Network/PCI/Energy saving/Secure boot settings in the BIOS, or some advanced features of X570 that a driver is assumed to use? What kind of network load did you have during that period? It's unlikely, but AMD Power Gadget could help confirm that there is no overheating.

....
[14698.196711]: igb: disable() <===
...
[14712.685978]: igb: Powered up link.
[14712.703182]: igb: igb_open() <===
[14712.714021]: igb: enable() <===
[14715.641432]: igb: OK Link register status: 0x0000796d
[14715.641594]: igb: hw->fc.current_mode = 3
[14715.642188]: igb: Flow Control = FULL.
...
[14715.642200]: igb: Force link down due to IGB_FLAG_NEED_LINK_UPDATE
....
[14715.669028]: igb: After fix-ups FlowControl is now = 3
[14715.678801]: igb: checkLinkStatus() <===
[14719.191877]: igb: OK Link register status: 0x0000796d
...
[14719.209571]: igb: output: Dropping packet on disabled device
[14719.209574]: igb: output: Dropping packet on disabled device
[14719.209580]: igb: [LU]: Link Up on en0 (i211 Copper), 1-Gigabit, Full-duplex, Rx/Tx flow-control

Re: log
You were resetting the device via Settings->Network or ifconfig en0 down/up and selecting the NIC mode manually, correct? How stable would your connection be if you disabled Flow control and tried lower speeds (if your peer/ISP is ok with that, of course)?

Anyway, I can't check the register values (e.g. igb: [LU]: CTRL=0x581c0241 ) at the moment, but other than another round of reset because something set IGB_FLAG_NEED_LINK_UPDATE, nothing looks suspicious -- the link was established successfully.
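
To make long logs like this one quicker to scan, the link-state transitions can be filtered out on their own. A minimal sketch, assuming the debug build keeps the message wording seen in the log quoted in this thread (`link_events` is a hypothetical helper name):

```shell
#!/bin/bash
# Sketch only: keep just the link-state transitions from the driver's log output.
# The patterns are copied from log lines quoted in this thread.
link_events() {
    grep -E 'igb: (\[LU\]: Link Up on|Link down on|Unable to establish link|Force link down)'
}
```

Usage would be `sudo dmesg | grep -i igb | link_events`. Comparing the timestamps of these lines with the moment the connection seemed to drop shows whether the driver noticed any link change at all.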

> I have x570 and I'm a bit of developer myself.

Great! So the main goal is to catch the period when the connection is 'lost' (or, if I understand the X570 situation correctly, when packets begin to drop/stall silently) and the corresponding reason, hoping this is not NIC or vendor specific. It's still not clear whether the problem is in the core of the driver itself (tx/rx rings, interrupts, ...), in the layer communicating with the OS, or due to some advanced X570 features that assume different driver behaviour.

I propose to start by building the DEBUG version of the driver with Xcode and then getting familiar with the code structure: all high-level ethernet controller management is concentrated in the class AppleIGB (extending IOEthernetController), while the lower-level code is spread across the igb_ files and functions (the majority of it is cross-platform). Then, depending on your observations and hypotheses, you could add additional hooks or debug output. Unfortunately the driver doesn't implement advanced remote debugging like IntelMausi and is not a user-space driver based on DriverKit, so there is no super-easy way to debug, but this way I was able to fix all the issues found with my i211 @ B450.

I don't think it's the network queue running out of capacity, as you would get a special log message in the DEBUG version, but I'm not 100% sure.

The driver is based on Intel's IGB 5.7.2; you could cherry-pick small relevant patches from the Linux port and/or Intel's source code. Last time I checked, nothing caught my eye.

P.s. I would be happy to assist/help you further please let me know if you'd like to move to some messenger for a quicker turnaround, e.g. Discord

@donatengit
Owner

Hey @Cryptiiiic,

Any news?

Meanwhile, it's very unlikely to help, but please try a version based on Intel's 5.11.4. If it doesn't help, there is not much I can do without both the X570 hardware and free time to debug. I'll describe the extended project status soon and update here.

@donatengit
Owner

Guys,
In case anyone on X570 has some time, please provide logs from the version with some function tracing (N.B. it generates tons of logs).

@Cryptiiiic

@donatengit same issues with 5.11.4, I then switched to the normal debug build(non 5.11) here are those logs. Let me know if I didn't get proper logs.
igb_boot.txt
igb_crash.txt
igb_crash2.txt
igb_crash3.txt
igb_crash4.txt

@Cryptiiiic

Cryptiiiic commented Nov 22, 2022

@donatengit
I found a working driver on forums. I haven't dropped once while using it finally. Unfortunately, I can't seem to find source code.
https://www.macos86.it/topic/6029-appleigb-and-intelmausi-integration/?tab=comments#comment-137062

Edit: Found source, its integrated into mausi fork. https://github.com/mbarbierato/IntelMausi/tree/Intgegration

@henkiewie

> Edit: Found source, its integrated into mausi fork. https://github.com/mbarbierato/IntelMausi/tree/Intgegration

The answer is at the end of the post you are referring to. (still having problems with i210).
I upgraded to monterey with the new driver with nog problems. I hope they will fix it.

@llyonard

Can confirm, the link was posted on the amd discord forum, i installed since and i had 0 disconnection on a x570 gigabyte aorus elite

@donatengit
Owner

@Cryptiiiic @henkiewie @llyonard

Thanks a lot for your involvement and contribution, that's all amazing news! Looking at the code, I'm a bit surprised tbh that this was all that had been necessary (all this time) to make the whole I211 family work on macOS via the well-tested IntelMausi codebase (probably that's not all, but I don't have time to compile/check it myself). I hope that @mbarbierato will be able to provide releases via GitHub, or open a pull request so that it gets merged into IntelMausi for better community testing and support.

I'm going to update the READMEs with this fork's deprecation and the links immediately, and I'm going to block bug reports and discussion in 2-3 weeks, in case anyone has something to say.

Guys, please spend some time creating a pull request to update the Dortania guide with these new links.

@donatengit
Owner

I've updated the README and tried hard to mention every contributor; please let me know if I missed anyone.


7 participants