
Bug during a handover in a simple asymmetric Dual Connectivity scenario #157

gehirndienst opened this issue May 29, 2023 · 9 comments

@gehirndienst

Hello,

I have encountered a bug in the NSA handover procedure, and during my research, I think I came across a similar open issue: #112.

The issue arises in a simple scenario where a UE moves eastward and sends UL packets to the server node without any interference. Initially, the UE is connected to eNB1/gNB1 through NSA coupling and later switches to eNB2 during a handover at approximately t = 29 s. However, after the handover, the UE stops sending any packets to the server. The message exchange shows that the UE sends airframes to eNB2, but there is no further progress. This issue persists regardless of the value of the tos parameter (three different configurations in the INI file, for LTE, NR, and SB, were tested).

I would appreciate your insights into the possible cause of this bug. I have limited knowledge of the inner workings of the handover procedure and dual connectivity management. Could you shed some light on the source of this issue and provide guidance on how to address it?

Thank you and best regards,
Nikita

P.S. I have attached the scenario as an archive, and the network topology is illustrated in the picture below. The archive should be extracted into the simu5G/simulations/NR folder.

[Figure: LTE_DC_Bug (network topology)]
dc_bug.zip

@giovanninardini
Collaborator

giovanninardini commented May 31, 2023

Hello,

Apparently this has little to do with handover and dual connectivity.
The UE is sending large packets (20 KB): when the UE is close to its serving cell, it has a high SINR/CQI and manages to send the whole packet in a few milliseconds. When the UE moves away from the serving cell, its SINR/CQI gets lower, which means the UE needs more Resource Blocks to send the same number of bytes.
When the UE is in the cell-border zone (the one you point out in your question), it does not manage to empty its buffer because it can send only a few bytes in each subframe. This results in the UE sending data on every TTI, which is why the log shows it continuously sending airframes to eNB2.

Now, I think that all those airframes get stuck at the receiving PDCP entity at eNB2, probably because at some point one of them was lost and the PDCP entity waits to deliver the out-of-order packets until it receives the lost one... but the lost one will never arrive...

A possible workaround could be to allow out-of-order delivery at the receiving PDCP entity. See the corresponding parameter in NRRxPdcpEntity, in the stack/pdcpRrc/NRPdcpRrc.ned file.
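For instance, a minimal INI sketch of this workaround (the parameter name outOfOrderDelivery comes from NRPdcpRrc.ned as referenced above; the module path below is an assumption and may need adapting to your network):

    # enable out-of-order delivery at the receiving PDCP entity (workaround sketch;
    # adjust the module path to match your network definition)
    *.eNB*.cellularNic.pdcpRrc.outOfOrderDelivery = true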

Best regards.
Giovanni

Edit: actually, it might be a bug occurring at handover... as it is likely that the sequence number assigned to the PDCP PDU is not reset at the handover. I will investigate this further. For now, the workaround above should work.

@giovanninardini
Collaborator

Hi @gehirndienst ,

I found the origin of the bug.
All you need to do is add the following line of code in NRPdcpRrcUe::fromDataPort(), at line 74:

    // select the correct nodeId for the source
    MacNodeId nodeId = (lteInfo->getUseNR()) ? nrNodeId_ : nodeId_;
    lteInfo->setSourceId(nodeId);    // <-- NEW LINE!!!

This fixes the issue you pointed out.

I also pushed this fix to the master branch of the repository.

Best regards.
Giovanni

@gehirndienst
Author

gehirndienst commented Jun 1, 2023

Hi @giovanninardini

I have tested your suggestions and have made the following observations:

In the dummy scenario, both approaches (your patch and enabling out-of-order delivery) fixed the issue. However, in my main scenario (which I am unfortunately unable to publish), the issue was not resolved by the patch and was only partially addressed by enabling out-of-order delivery. I still believe that an error occurs after a handover in NSA, as indicated by the following log:

[Screenshot: ack1]

Initially, everything appears to be fine. The main UE, ship[0] (tos = 10), is connected to a gNB via an eNB. It sends ship-* packets and receives ack packets from the server. However, immediately after that, the following sequence of events takes place:

[Screenshot: ack2]

ship[0] switches to another eNB, specifically eNB105587, which has its own secondary gNB, and uses only this new eNB as the medium for traffic, disregarding its secondary gNB. When out-of-order delivery is not enabled, this causes the UL direction to become stuck and no more packets are sent; otherwise, the packets are successfully transmitted via the new eNB. The remaining issue is that the acknowledgement packet is sent from one eNB to the previous one, rather than to the UE or the corresponding secondary gNB, so ship[0] will never receive it. Unfortunately, I am unable to share the specific scenario causing this, nor can I reproduce it in a dummy scenario. However, it is evident that the NSA handover is still problematic for me, and the root cause appears to lie in assigning a proper masterId/nrMasterId.

With best regards
Nikita

@giovanninardini
Collaborator

But this is a different issue than the one in your first question, right?

Here, the server correctly receives all the packets from the UE (i.e., ship[0] in your case), but the problem is that the packet from the server (i.e., the ack) reaches the correct eNB (the one called eNB104542), which then forwards the packet to the old eNB via X2.
So, we should find out which condition causes this problem.

Could you get the following information (at the time when the misbehavior occurs)? A quick way to capture the last two values is sketched after the list.

  • is the UE connected to eNB104542 and its corresponding secondary gNB? (i.e., what is the value of nrMasterId?)
  • is the useNR flag in the function NRTxPdcpEntity::deliverPdcpPdu set to true?
  • what is the value of destId in the function NRTxPdcpEntity::deliverPdcpPdu?
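For example, a temporary debug print inside NRTxPdcpEntity::deliverPdcpPdu could capture both values (a hypothetical sketch; useNR and destId are the names referenced above and may differ slightly in your checkout):

    // temporary debug output; remove after collecting the values
    EV << "deliverPdcpPdu: useNR=" << useNR << " destId=" << destId << endl;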

I think this problem might be related to the one in #156.

@gehirndienst
Author

gehirndienst commented Jun 2, 2023

Hi @giovanninardini

You're right, that's a new issue; the asymmetric issue is fixed with your last commit, but I believe there is a common source for both of them. I created a lighter version of my initial scenario and was actually able to reproduce all the issues mentioned here. The files are attached at the bottom.

The UE is now a simple UDP app that sends a small amount of data, while the server echoes the packets. Please run the 'TestRun' configuration and observe what happens, especially from 30 s onward, when the first master starts to go out of range and sometimes the first handover occurs, assigning only the new eNB105887 as master (I also noticed that this depends on whether I used "fast run" or "express run"). Meanwhile, UL traffic can be stuck for up to 3 seconds (!) while 1 B packets are sent to the out-of-range gNB. And sometimes the eNB-to-eNB bug occurs at that time; refer to the attached picture below. outOfOrderDelivery is set to false, the last commit is pulled, and simu5G is recompiled.

[Screenshot: dlbug]

I also conducted a debugging session to find answers to your questions:

  1. The interesting thing is that the 'masterId' and 'nrMasterId' values remain the same throughout, equal to their initial values, if one checks by printing the following line in the application every second:
    std::cout << "Master id: " << getAncestorPar("masterId") << " NR Master Id: " << getAncestorPar("nrMasterId") << std::endl;
  2. useNR was 'false', which is consistent as long as no gNB is involved (but that is probably the bug itself, since eNB105587 has a secondary gNB)
  3. 1025, which also seems to be correct...

With best regards
Nikita

dc_bug_test.zip

@giovanninardini
Collaborator

Thanks Nikita, I will take a look as soon as possible and get back to you if I find something interesting.

@giovanninardini
Collaborator

Hi, maybe I can give you some initial feedback.

When I tried your scenario, I indeed observed that at t = 30 s the UE and the gNB keep exchanging 1 B packets. This happens because:

  • the UE asks for UL resources to transmit data,
  • the gNB sends one UL grant of one RB to let the UE send its Buffer Status Report (BSR),
  • the UE has a very low UL SINR (hence a very low UL CQI), and it can send only 3 bytes in that RB, according to the formulas for TBS computation. The problem is that 3 bytes are not enough to send the BSR (!) - we need at least 2 B for the RLC header and 2 B for the MAC header. Thus, the UE is not able to send its BSR and, in turn, cannot send UL data (see the arithmetic sketched after this list).
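As a minimal illustration of that arithmetic (the sizes are the ones quoted above, not constants taken from the Simu5G code):

    const int tbsBytes      = 3;  // transport block granted by one RB at this very low UL CQI
    const int rlcHeaderSize = 2;  // bytes, per the explanation above
    const int macHeaderSize = 2;  // bytes
    // 3 < 2 + 2, so the BSR does not fit and no UL data can follow
    bool bsrFits = tbsBytes >= rlcHeaderSize + macHeaderSize;  // false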

In fact, looking at the log when the UE computes the CQI, we see that the SINR is quite low:

[Screenshot_20230605_105710]

[Screenshot_20230605_105741]

The UE is able to transmit data again when the SINR gets a bit higher and it can transmit its BSR to the gNB.

Please note that the UE is still attached to that gNB because it is attached to the gNB's master eNB, and because the cell selection procedure is based on the DL SINR, which is still quite high (and higher than that of the other base stations).

One other thing: I noticed that UL packets go through the gNB, which is correct because you set tos=10. But then the ack from the server goes through the corresponding master eNB (instead of going through the gNB as well). See the last airframe in the picture below:
[Screenshot_20230605_110713]

This is because the tos field of the packet from the server is set to 0 (see below). Unless this is the behavior you wanted, you probably want to fix this.
[Screenshot_20230605_111106]

@gehirndienst
Author

Hi @giovanninardini,

Thank you for your time and effort! The information about the 4-byte headers was completely new to me, and I greatly appreciate it. Is there a way to change the default cell association in Simu5G? In real life, I'm sure it is managed so that not only the DL SINR is considered, especially under a high UL workload.

Regarding the ToS issue, it's strange that even after setting **.tos = 10 in the INI file, UdpEchoApp seems to ignore it. Thank you for bringing that to my attention.

Regarding nrMasterId and masterId, have you confirmed that they change correctly, and that the handover that occurs later, around 40–50 seconds, also proceeds correctly?

P.S.: could all of this be related to a bug in switching between carriers? I noticed that if I swap the component carrier indexes of the two channel models (changing 6 to 5 and 5 to 6):

*.ship[*].cellularNic.numNRCarriers = 2
*.ship[*].cellularNic.nrChannelModel[0].componentCarrierIndex = 5
*.ship[*].cellularNic.nrChannelModel[1].componentCarrierIndex = 6

then no data is sent at all, which makes no sense!

Thank you once again for your assistance.

@giovanninardini
Collaborator

The LtePhyUe::handoverHandler function is the one that computes the DL SINR and decides whether a handover is needed (and toward which base station). The trick would be to make that function compute the UL SINR instead. Perhaps this can be done easily by changing the direction from DL to UL in the lteInfo object (but I have never tried it).
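A hypothetical, untested sketch of that idea (where exactly to flip the direction inside LtePhyUe::handoverHandler is an assumption):

    // inside LtePhyUe::handoverHandler(), before the candidate-cell SINR is
    // computed: flip the measurement direction so the channel model
    // evaluates UL quality instead of DL
    lteInfo->setDirection(UL);
    // ... then let the existing SINR computation and comparison run unchanged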

Apparently the UdpEchoApp does not have a tos parameter at all in its NED file...

As far as the handover issue is concerned, nrMasterId and masterId look to be changed correctly. And having some data go from one eNB to another eNB during the handover is normal. The problem here (or one of the problems...) is that at handover, X2 control messages are exchanged among the wrong eNBs.
I think there might be some issue related to the X2 configuration. I mean, your X2 settings in the INI file look correct, but when it comes to parsing the data structures within the X2 manager and within X2ClientApp/X2ServerApp, it looks like some information is read mistakenly, causing X2 messages to be exchanged between the wrong eNBs... I have not been able to find the source of the problem yet, though.
