[Error] Error in `dotnet': double free or corruption (out) ,V2.0.9328(latest) ,Ubuntu16.04x64,.netcore2.1.4 #220

Closed
BenDerPan opened this issue Aug 22, 2018 · 28 comments
@BenDerPan

BenDerPan commented Aug 22, 2018

My Test Env:

GraphEngine.Core: V2.0.9328 (latest), rebuilt this morning.

OS: Ubuntu16.04 x64

.Net Core: V2.1.4

My code runs fine on Windows; the exception occurs when I run the same code on Linux. I found that it happens when the storage contains data and I query that data with a Syn protocol.

My server side output

image

The client side only throws an exception: System.IO.IOException "Network error occurs."

@BenDerPan
Author

More info: my storage data was copied from Windows, and it loaded successfully. An empty storage does not throw the exception.

@BenDerPan
Author

BenDerPan commented Aug 23, 2018

The client side gets no response here. Is checking protocol signatures a new feature?

[ INFO    ] *****************************************************
[ INFO    ] ServerCount: 1
[ INFO    ]     192.168.102.160:5304
[ INFO    ] ProxyCount: 0
[ INFO    ] *****************************************************
[ INFO    ] Checking Client-Server protocol signatures...

The server-side output contains strange IP addresses, and the HTTP endpoint for LIKQ does not start up...

[ DEBUG   ] Preserved sync (rsp) message GetCellType is registered.
[ DEBUG   ] Preserved sync (rsp) message QueryMemoryWorkingSet is registered.
[ DEBUG   ] Preserved async message Shutdown is registered.
[ DEBUG   ] Sync (rsp) message 0 is registered.
[ DEBUG   ] Sync (rsp) message 1 is registered.
[ DEBUG   ] Sync (rsp) message 2 is registered.
[ INFO    ] Listening endpoint :5304
[ INFO    ] Waiting for client connection ...
[ INFO    ] My IPEndPoint: 127.0.1.1:5304
[ INFO    ] *****************************************************
[ INFO    ] ServerCount: 1
[ INFO    ]     192.168.102.160:5304
[ INFO    ] ProxyCount: 0
[ INFO    ] *****************************************************
[ DEBUG   ] ServerSocket: Incomming connection from 215.58.192.168
[ DEBUG   ] ServerSocket: Incomming connection from 215.60.192.168
[ INFO    ] Checking Server-Server protocol signatures...
[ DEBUG   ] ServerSocket: Incomming connection from 215.68.192.168
[ DEBUG   ] ServerSocket: Incomming connection from 215.70.192.168

@yatli
Contributor

yatli commented Aug 23, 2018

uh oh, looks like the networking subsystem crashed on connection. investigating.

@yatli
Contributor

yatli commented Oct 1, 2018

@BenDerPan hey could you try the eventloop branch? The Linux networking is improved.

@yatli
Contributor

yatli commented Oct 1, 2018

I’ve also noticed the weird addresses reported by the server on client connection. These connections should all be coming from localhost, but they appear to be random in the log.
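
One possible reading of these addresses, offered as a guess that nothing in this thread confirms: the first two octets in the log (215.58, 215.60, 215.68, 215.70) are what ports 55098, 55100, 55108, and 55110 look like as big-endian byte pairs, and the last two octets (192.168) match the start of a private address such as the server's 192.168.102.160. That pattern is what you would see if the logging code read the 4-byte address two bytes too early in a Linux `sockaddr_in`, picking up the port field. The sketch below only demonstrates that arithmetic; `SockaddrDemo` and its helpers are hypothetical names, the port is chosen to match the first log line, and this is not Graph Engine code.

```csharp
using System;
using System.Net;

public static class SockaddrDemo
{
    // Hand-built Linux sockaddr_in layout (16 bytes):
    //   bytes 0-1  sa_family (AF_INET = 2, host byte order)
    //   bytes 2-3  port, big-endian
    //   bytes 4-7  IPv4 address
    //   bytes 8-15 zero padding
    public static byte[] BuildSockaddrIn(IPAddress address, ushort port)
    {
        var buf = new byte[16];
        buf[0] = 2;
        buf[2] = (byte)(port >> 8);
        buf[3] = (byte)(port & 0xFF);
        Array.Copy(address.GetAddressBytes(), 0, buf, 4, 4);
        return buf;
    }

    // Interpret 4 bytes starting at the given offset as an IPv4 address.
    public static string ReadIPv4At(byte[] sockaddr, int offset)
    {
        var bytes = new byte[4];
        Array.Copy(sockaddr, offset, bytes, 0, 4);
        return new IPAddress(bytes).ToString();
    }

    public static void Main()
    {
        // Assumed peer: a 192.168.x.x host connecting from ephemeral port 55098.
        var sa = BuildSockaddrIn(IPAddress.Parse("192.168.102.160"), 55098);
        Console.WriteLine(ReadIPv4At(sa, 4)); // correct offset: 192.168.102.160
        Console.WriteLine(ReadIPv4At(sa, 2)); // two bytes early: 215.58.192.168
    }
}
```

If this guess is right, the addresses are a display problem in the log rather than evidence of unexpected peers, but only the server's socket code can confirm that.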

@BenDerPan
Author

@yatli great, I will try it later; I'm not at my PC now. :)

@BenDerPan
Author

@yatli the address report is still strange, and there is a new problem. My environment: server on Ubuntu 16.04 x64, client on Windows 10, GraphEngine 2.0.9542.

The client side sometimes hangs with no response when I save something, but sometimes it works. I think it is still a network problem.

@yatli
Contributor

yatli commented Oct 6, 2018

@BenDerPan thanks for testing! how about the double free corruption?

@BenDerPan
Author

@yatli there is no exception on the Linux server side, so the double free corruption seems fixed, but I am not sure.

yatli self-assigned this Oct 6, 2018
@yatli
Contributor

yatli commented Oct 6, 2018

attempting a minimal repro.

@yatli
Contributor

yatli commented Oct 6, 2018

minimal repro failed.

Client side Windows 10 x64 eventloop HEAD:

using System;
using Trinity;
using Trinity.Storage;

namespace test_ge
{
    class Program
    {
        static void Main(string[] args)
        {
            TrinityConfig.LoadConfig("trinity.xml");
            TrinityConfig.CurrentRunningMode = RunningMode.Client;
            Global.CloudStorage.LoadCell(0, out var cell, out _);
            Console.WriteLine(cell.Length);
        }
    }
}

Server side Linux, eventloop HEAD:

using System;
using Trinity;
using Trinity.Storage;
using Trinity.Network;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            Global.LocalStorage.SaveCell(0, new byte[128]);
            Global.LocalStorage.SaveStorage();

            TrinityServer server = new TrinityServer();
            server.Start();
            Console.Write("Press any key to stop...");
            Console.ReadKey();
        }
    }
}

The Windows client correctly outputs 128 and exits.
@BenDerPan are you using a custom Syn protocol?

@BenDerPan
Author

@yatli yes, I used a custom Syn protocol.

@yatli
Contributor

yatli commented Oct 6, 2018

@BenDerPan repro failed.

Client:

using System;
using Trinity;
using Trinity.Storage;
using test_ge.S;

namespace test_ge
{
    class Program
    {
        static void Main(string[] args)
        {
            TrinityConfig.LoadConfig("trinity.xml");
            TrinityConfig.CurrentRunningMode = RunningMode.Client;
            Global.CloudStorage.LoadCell(0, out var cell, out _);
            Console.WriteLine(cell.Length);

            using(var rsp = Global.CloudStorage[0].P())
            {
                Console.WriteLine(rsp);
            }
        }
    }
}

server (Linux):

using System;
using Trinity;
using Trinity.Storage;
using Trinity.Network;

namespace test
{
    class Server: SBase
    {
        public override void PHandler(PayloadWriter rsp)
        {
            rsp.foo = 123;
            rsp.bar = "bar";
        }
    }


    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
            Global.LocalStorage.SaveCell(0, new byte[128]);
            Global.LocalStorage.SaveStorage();

            Server server = new Server();
            server.Start();
            Console.Write("Press any key to stop...");
            Console.ReadKey();
        }
    }
}

TSL:

struct Payload
{
    int foo;
    string bar;
}

protocol P
{
    Type: Syn;
    Request: void;
    Response: Payload;
}

server S
{
    protocol P;
}

The client correctly outputs the response.

@BenDerPan
Author

@yatli I am clearing all the caches and rebuilding to test with my project.

yatli added the network label Oct 6, 2018
@BenDerPan
Author

@yatli Still the same: it hangs again with no response, as shown in the picture:

image

@yatli
Contributor

yatli commented Oct 6, 2018

@BenDerPan you mean crash, exception or freeze?

@BenDerPan
Author

@yatli I mean a freeze: the code stops at that line, but there is no crash or exception.

@yatli
Contributor

yatli commented Oct 7, 2018

Understood. From your screenshot I see that I should try larger payloads.

Attempting a repro.

@yatli
Contributor

yatli commented Oct 8, 2018

@BenDerPan do you observe the same symptom when running the server program on Windows?

@BenDerPan
Author

@yatli I am trying now

@BenDerPan
Author

BenDerPan commented Oct 8, 2018

@yatli it does not freeze, but there is an exception:
image

   at Trinity.Storage.RemoteStorage._error_check(TrinityErrorCode err)
   at Trinity.Storage.RemoteStorage._use_synclient(Func`2 func)
   at Trinity.Storage.RemoteStorage.SendMessage(Byte* message, Int32 size, TrinityResponse& response)
   at Trinity.Storage.MessagePassingExtensionMethods.GetCommunicationSchema(IMessagePassingEndpoint storage, String& name, String& signature)
   at Trinity.Storage.MemoryCloud.CheckProtocolSignatures_impl(RemoteStorage storage, RunningMode from, RunningMode to)
   at Trinity.Storage.FixedMemoryCloud.Open(ClusterConfig config, Boolean nonblocking)
   at Trinity.Global.get_CloudStorage()
   at 

@yatli
Contributor

yatli commented Oct 8, 2018

two possible cases:

  1. the remote handler did throw an exception
  2. the remote handler wasn't called at all. instead the default handler (which always throws an exception) was called.
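
From the client, both cases surface as the same failure, so they can only be told apart on the server. A minimal sketch of the distinction, assuming a hypothetical `MessageDispatcher` (every name below is made up for illustration; this is not Graph Engine's dispatch code): an unregistered message id falls through to a default handler that always throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher for illustration only; not Graph Engine code.
public sealed class MessageDispatcher
{
    private readonly Dictionary<ushort, Func<string, string>> handlers =
        new Dictionary<ushort, Func<string, string>>();

    public void Register(ushort id, Func<string, string> handler) => handlers[id] = handler;

    // Returns "ok: <response>" on success, otherwise a string naming the failure case.
    public string Dispatch(ushort id, string request)
    {
        Func<string, string> handler;
        bool registered = handlers.TryGetValue(id, out handler);
        if (!registered) handler = DefaultHandler; // nothing registered for this id

        try
        {
            return "ok: " + handler(request);
        }
        catch (Exception ex)
        {
            string which = registered ? "case 1, handler threw: " : "case 2, default handler: ";
            return which + ex.Message;
        }
    }

    // Stand-in for a default handler that always throws.
    private static string DefaultHandler(string request)
    {
        throw new NotSupportedException("no handler registered");
    }
}

public static class DispatchDemo
{
    public static void Main()
    {
        var d = new MessageDispatcher();
        d.Register(0, req => "pong");
        d.Register(1, req => { throw new InvalidOperationException("boom"); });

        Console.WriteLine(d.Dispatch(0, "ping")); // ok: pong
        Console.WriteLine(d.Dispatch(1, "ping")); // case 1, handler threw: boom
        Console.WriteLine(d.Dispatch(2, "ping")); // case 2, default handler: no handler registered
    }
}
```

In practice, a log line at the top of the real handler would be enough to tell whether it was entered at all.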

@yatli
Contributor

yatli commented Oct 8, 2018

could you come up with a minimal repro? I can then proceed to debug it.

@BenDerPan
Author

@yatli sorry, I can't. It's part of our big system, and extracting it would be a lot of work :(

@BenDerPan
Author

@yatli I tested running the server on Windows and built a simple query client on both Ubuntu and Windows, and everything is OK. Now I guess it is a version problem: the server uses the eventloop version, while the client that throws the exception uses the master version.

@yatli
Contributor

yatli commented Oct 9, 2018

@BenDerPan you mean you rebuilt everything and the problem is gone?

@BenDerPan
Author

@yatli yes, I can't reproduce the problem now, but I don't know why. Maybe it's because I rebooted my PC.

@yatli
Contributor

yatli commented Oct 9, 2018

alright. let's close this issue for now (as the original double-free corruption is known to be resolved).
you're welcome to open up new issues following up this topic.

thanks again!

yatli closed this as completed Oct 9, 2018