[API Proposal]: Add Source-Process Property to TcpConnectionInformation #63099

Closed
AlmightyLks opened this issue Dec 23, 2021 · 13 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Net

@AlmightyLks

AlmightyLks commented Dec 23, 2021

Background and motivation

Libraries such as the ones made by dotpcap would benefit greatly from being able to identify the source process of network traffic. I know this is possible on Windows, and it should be possible on Linux; that is what I've gathered from my research and prototyping.

I am very open to anyone with more experience joining in to suggest and discuss ways to make this work.
I've looked for an API in glibc and similar libraries on Linux that I could P/Invoke, as I do on Windows, but had no luck.
I'd be happy, grateful even, if someone knows a better way to fetch this information, as this is rather unfamiliar territory for me.

I know that, on Linux, netstat -ano -p tcp and ss -p both report the port as well as the process ID.
A quick strace on each showed me that they both do what I am about to propose as the approach for Linux: reading lots of directory information.

API Proposal

namespace System.Net.NetworkInformation
{
    /// <summary>
    /// Provides information about the Transmission Control Protocol (TCP) connections on the local computer.
    /// </summary>
    public abstract class TcpConnectionInformation
    {
        /// <summary>
        /// Gets the source-Process of the Transmission Control Protocol (TCP) connection.
        /// </summary>
        public abstract Process Process { get; }

        // existing
        //public abstract IPEndPoint LocalEndPoint { get; }
        //public abstract IPEndPoint RemoteEndPoint { get; }
        //public abstract TcpState State { get; }
    }
}

API Usage

using System.Net.NetworkInformation;

var properties = IPGlobalProperties.GetIPGlobalProperties();
foreach (var connection in properties.GetActiveTcpConnections())
{
    Console.WriteLine(
        "{0}: [{1}] {2} <=> {3}", 
        connection.Process.ProcessName, 
        connection.State, 
        connection.LocalEndPoint, 
        connection.RemoteEndPoint
        );
}
/*
  Example Output:
  
  Foo: [Established] 192.168.1.1:8000 <=> 123.123.123.123:443
  System: [TimeWait] 192.168.1.1:2225 <=> 100.100.100.100:6000
  Discord: [TimeWait] 192.168.1.1:5050 <=> 80.81.8.10:5050
  ...
*/

Alternative Designs

No response

Risks

I would know how to hack something together for both Windows and Linux.
However, on Windows it involves, besides a P/Invoke, iterating through all connections and potentially all processes to match each owning process ID with a Process object.
On Linux it involves, besides reading /proc/net/tcp, iterating through all directories in /proc/ to match the socket inode listed in /proc/net/tcp with the one found among the socket links in /proc/{processid}/fd.
(See the last paragraph of Background and motivation for why.)
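
The first half of the Linux approach, reading the port and socket inode out of /proc/net/tcp, can be sketched roughly as follows. This is only an illustration and not the prototype from this issue: the helper name ParseProcNetTcpRow is made up, the sample row is hand-written to match the port 8000 / inode 52096 example further down, and it assumes the usual IPv4 column layout in which the inode is the tenth whitespace-separated field.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses one data row of /proc/net/tcp into (local port, socket inode).
// Returns null for the header row or anything that does not look like a data row.
static (int LocalPort, long Inode)? ParseProcNetTcpRow(string row)
{
    string[] fields = row.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Data rows start with a slot number such as "0:"; the header row starts with "sl".
    if (fields.Length < 10 || !fields[0].EndsWith(':'))
        return null;

    // local_address is "HEXADDR:HEXPORT"; the port is the hex number after the last colon.
    string local = fields[1];
    int colon = local.LastIndexOf(':');
    if (colon < 0 ||
        !int.TryParse(local.AsSpan(colon + 1), NumberStyles.HexNumber, CultureInfo.InvariantCulture, out int port) ||
        !long.TryParse(fields[9], NumberStyles.None, CultureInfo.InvariantCulture, out long inode))
        return null;

    return (port, inode);
}

// Hand-written sample row (assumption): a listener on 127.0.0.1:8000 whose socket inode is 52096.
string sample = "   0: 0100007F:1F40 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 52096 1 0000000000000000 100 0 0 10 0";
Console.WriteLine(ParseProcNetTcpRow(sample));
```

Parsing the file is the cheap part; the expensive part is matching each inode against the fd links of every process.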


My prototype seems to be expensive on Windows:

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=6.0.100
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.21.52210, CoreFX 6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET Core 6.0.0 (CoreCLR 6.0.21.52210, CoreFX 6.0.21.52210), X64 RyuJIT


|                    Method |         Mean |       Error |      StdDev |
|-------------------------- |-------------:|------------:|------------:|
|         GetTcpInformation |     291.1 us |     4.69 us |     4.16 us |
|      GetTcpInformationNew |   4,087.7 us |    21.67 us |    20.27 us |

GetTcpInformation is the original, unmodified IPGlobalProperties.GetActiveTcpConnections().
GetTcpInformationNew is my prototype, with a P/Invoke and a nested iteration over all open connections and potentially all running processes.

Mind that my prototype uses Lists, tuples and probably plenty of other costly data structures and approaches.

        public List<(Process, MIB_TCPROW_OWNER_PID)> GetTcpInformationNew()
        {
            var result = new List<(Process, MIB_TCPROW_OWNER_PID)>();
            var connections = IPStuff.GetAllTCPConnections(); // P/Invoke
            var processes = Process.GetProcesses();
            for (int i = 0; i < connections.Count; i++)
            {
                var connection = connections[i];
                Process process = null;
                for (int j = 0; j < processes.Length; j++)
                {
                    var localProcess = processes[j];
                    if (connection.ProcessId == localProcess.Id)
                        process = localProcess;
                }
                result.Add((process, connection));
            }
            return result;
        }

I am not sure yet how expensive it would be to iterate through all /proc/* directories and list the entries of each /proc/{processId}/fd directory to parse and filter them.
Here is an example:

image

Seemingly chaotic, but it makes sense. 😄


Top right

An example program that continuously sends data over a TCP connection, occupying port 8000 (an example of the kind of target I am looking for)

Bottom right

Getting my example program's process ID (3793), to skip the scan through all /proc/ directories and navigate directly to the right folder to see the result

Top left

cat /proc/net/tcp to view all open TCP connections, including the socket inode for our target port

Bottom left

ls /proc/3793/fd -l, using the process ID we already know in place of a full iteration through all folders; this is the point at which we find the folder containing the socket with the target inode (socket:[52096])
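
The second half, matching a socket inode against the links in /proc/{pid}/fd, could look something like the sketch below. The names TryParseSocketInode and FindPidBySocketInode are hypothetical, and the sketch assumes .NET 6 or later, where FileSystemInfo.LinkTarget is expected to return the link text (for example "socket:[52096]"). It returns the first matching PID only and silently skips processes whose fd directory cannot be read.

```csharp
using System;
using System.IO;

// Hypothetical helper: extracts the inode from a link target such as "socket:[52096]".
static long? TryParseSocketInode(string linkTarget)
{
    const string prefix = "socket:[";
    if (linkTarget is null ||
        !linkTarget.StartsWith(prefix, StringComparison.Ordinal) ||
        !linkTarget.EndsWith(']'))
        return null;

    ReadOnlySpan<char> digits = linkTarget.AsSpan(prefix.Length, linkTarget.Length - prefix.Length - 1);
    return long.TryParse(digits, out long inode) ? inode : (long?)null;
}

// Hypothetical helper: walks <procRoot>/<pid>/fd and returns the first PID that
// holds a socket with the given inode, or null if none is found.
static int? FindPidBySocketInode(long inode, string procRoot = "/proc")
{
    if (!Directory.Exists(procRoot))
        return null;

    foreach (string pidDir in Directory.EnumerateDirectories(procRoot))
    {
        if (!int.TryParse(Path.GetFileName(pidDir), out int pid))
            continue; // not a process directory (e.g. /proc/net)

        try
        {
            foreach (string fd in Directory.EnumerateFileSystemEntries(Path.Combine(pidDir, "fd")))
            {
                if (TryParseSocketInode(new FileInfo(fd).LinkTarget) == inode)
                    return pid;
            }
        }
        catch (Exception e) when (e is UnauthorizedAccessException or IOException)
        {
            // The process exited mid-scan, or its fd directory belongs to another user.
        }
    }
    return null;
}

Console.WriteLine(TryParseSocketInode("socket:[52096]"));
```

Calling FindPidBySocketInode once per connection repeats the whole /proc walk each time, which is where the cost discussed in this issue comes from.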


So see this as a nudge for the idea: I know it's possible, and great software such as Fiddler has done this forever, on Windows at least.

@AlmightyLks AlmightyLks added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 23, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 23, 2021
@ghost

ghost commented Dec 23, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@teo-tsirpanis
Contributor

Why not just return the source process' PID? Currently the System.Net.NetworkInformation assembly does not reference System.Diagnostics.Process at all. Better not couple APIs for no reason.

And why are you using Process.GetProcesses and an O(n²) loop? You can directly get the process object for a specific PID by calling Process.GetProcessById.

@AlmightyLks
Author

AlmightyLks commented Dec 23, 2021

Why not just return the source process' PID?

Well, how user-friendly would it be to return numeric IDs rather than a Process object?

And why are you using Process.GetProcesses and an O(n²) loop? You can directly get the process object for a specific PID by calling Process.GetProcessById.

It can happen that a socket is being shut down and is no longer associated with a process.
That shows up as the connection not having a "linked" process.

Process running with an associated open socket:
image

After the process was closed:
image

(https://www.howtouselinux.com/post/tcp_time_wait_linux)

From some benchmarking, I came to the conclusion that a consistent ~2 ms would be better than ~0.003 ms with occasional ~6 ms cases.

However


I just ran some confirmation benchmarks, and now I am seeing different results.
Given these results I think you're right: there is practically no significant difference. So I'd say GetProcessById_CatchFail() would be the go-to solution.

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=6.0.100
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.21.52210, CoreFX 6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET Core 6.0.0 (CoreCLR 6.0.21.52210, CoreFX 6.0.21.52210), X64 RyuJIT


|                      Method | id |         Mean |      Error |     StdDev |
|---------------------------- |--- |-------------:|-----------:|-----------:|
|                GetProcesses |  ? | 2,044.905 us | 10.6577 us |  8.3208 us |
|              GetProcessById |  ? |     2.870 us |  0.0221 us |  0.0206 us |
| GetProcessById_SafeFailLinq | -1 | 2,027.962 us | 34.1328 us | 31.9278 us |
|     GetProcessById_SafeFail | -1 | 1,997.293 us | 21.5441 us | 20.1524 us |
|    GetProcessById_CatchFail | -1 | 2,554.766 us | 48.9251 us | 58.2418 us |
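
    using System;
    using System.Diagnostics;
    using System.Linq;
    using BenchmarkDotNet.Attributes;

    public class TcpConnectionInfoPrep
    {
        private int ProgramProcessId;

        [GlobalSetup]
        public void Setup()
        {
            ProgramProcessId = Process.GetCurrentProcess().Id;
        }

        // Baseline: snapshot of every running process.
        [Benchmark]
        public Process[] GetProcesses()
            => Process.GetProcesses();

        // Direct lookup of a PID that is known to exist.
        [Benchmark]
        public Process GetProcessById()
            => Process.GetProcessById(ProgramProcessId);

        // Lookup of a PID that does not exist (-1), via LINQ over the full snapshot.
        [Benchmark]
        [Arguments(-1)]
        public Process GetProcessById_SafeFailLinq(int id)
            => Process.GetProcesses().FirstOrDefault(p => p.Id == id);

        // Same as above, with a hand-written loop.
        [Benchmark]
        [Arguments(-1)]
        public Process GetProcessById_SafeFail(int id)
        {
            var processes = Process.GetProcesses();
            for (int i = 0; i < processes.Length; i++)
            {
                var curProc = processes[i];
                if (curProc.Id == id)
                    return curProc;
            }
            return null;
        }

        // Direct lookup of a missing PID, treating the exception as "not found".
        [Benchmark]
        [Arguments(-1)]
        public Process GetProcessById_CatchFail(int id)
        {
            try
            {
                return Process.GetProcessById(id);
            }
            catch (ArgumentException)
            {
                // GetProcessById throws ArgumentException when no process has this ID.
                return null;
            }
        }
    }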

@teo-tsirpanis
Contributor

Well, how user-friendly would it be to return numeric IDs rather than a Process object?

A PID is a well-known concept. If we name it TcpConnectionInformation.ProcessId, there is no way to get confused over what this number means. Besides, it is the minimal thing .NET has to do to support getting the process that holds a connection. Creating a Process is not cheap, and someone might just need the PID (that's why, for example, .NET 5 introduced Environment.ProcessId, to avoid a wasteful Process.GetCurrentProcess().Id if you only need the PID).
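
To illustrate why exposing only a PID keeps the cost with the caller, here is a minimal sketch of a consumer that resolves a process name on demand and caches it per PID. The helper ResolveProcessName and the cache are hypothetical and not part of any proposed API; the sketch also assumes that a null name is an acceptable way to represent a process that has already exited.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: resolves a process name for a PID only when asked, caching the
// answer so that many connections owned by the same process cost a single lookup.
static string ResolveProcessName(int pid, Dictionary<int, string> cache)
{
    if (cache.TryGetValue(pid, out string name))
        return name;

    try
    {
        // Process objects are not cached by the runtime, so dispose this one right away.
        using Process process = Process.GetProcessById(pid);
        name = process.ProcessName;
    }
    catch (Exception e) when (e is ArgumentException or InvalidOperationException)
    {
        // No process with this ID is running any more, or it exited while being queried.
        name = null;
    }

    cache[pid] = name;
    return name;
}

var nameCache = new Dictionary<int, string>();
Console.WriteLine(ResolveProcessName(Environment.ProcessId, nameCache) ?? "<exited>");
```

A caller that only needs the number never pays for a Process object at all.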

@AlmightyLks
Author

Oh, does Process.GetProcess*() create new Process objects, or are they managed and cached internally?

@teo-tsirpanis
Contributor

No, they are not cached.

@scalablecory
Contributor

scalablecory commented Dec 24, 2021

I agree a ProcessId property makes more sense.

While I have no objection to the property itself, I have a hard time swallowing that change in performance which I suspect would become prohibitive in a server environment with thousands of open sockets. Can we find a better way to implement this, to make it optional, or to refactor the API in a way that doesn't affect the performance for current users?

@AlmightyLks
Author

I have been thinking about this,
and I see two options.

Option 1:
Find a similar native library function that can be P/Invoked and provides the information the way the Win32 API does

Option 2:
Create a separate function that gets a ProcessId for a specified port

It can't be that on Windows a single Win32 API call returns all the information at once,
whereas on Linux I'd have to open and read the /proc/net/tcp file and then iterate through every /proc/N/fd directory, where N is practically sequential and can range from 1 to 2147483647, if not even 4294967295. In each of them I'd have to resolve the symbolic links to find one named socket:[Y], where Y is the socket inode taken from /proc/net/tcp. That means fetching all the link names, parsing the inode out of socket:[Y], and parsing the process ID out of the directory name /proc/N/fd, so that I can join the port from /proc/net/tcp with the process ID via the inode.

I've done some prototyping to get this approach working on Linux, and it turns out that if I added the process ID gathering to IPGlobalProperties.GetActiveTcpConnections(), I'd increase the latency by more than 100 times.

Which is horrible, but makes sense.

As a side note, that is how both netstat and ss work behind the scenes on Linux: they also read all these files and parse the text output.

I'd love to see a Linux equivalent of iphlpapi.dll's GetExtendedTcpTable().
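
For comparison, the single Windows call could be sketched roughly like this. It is not the IPStuff prototype from the first post: the helper names GetTcpOwnersWindows and PortFromNetworkOrder are made up, it covers IPv4 only, and it reads the MIB_TCPROW_OWNER_PID rows by offset (six 32-bit fields per row, after a 32-bit entry count) instead of declaring the struct. On any other OS it simply returns an empty list.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

[DllImport("iphlpapi.dll")]
static extern uint GetExtendedTcpTable(IntPtr table, ref int size, bool order, int addressFamily, int tableClass, uint reserved);

// The table stores ports in network byte order in the low 16 bits of a 32-bit field.
static int PortFromNetworkOrder(int raw) => ((raw & 0xFF) << 8) | ((raw >> 8) & 0xFF);

// Hypothetical helper: returns (local port, owning PID) for every IPv4 TCP row.
static List<(int LocalPort, int ProcessId)> GetTcpOwnersWindows()
{
    var result = new List<(int LocalPort, int ProcessId)>();
    if (!OperatingSystem.IsWindows())
        return result;

    const int AF_INET = 2, TCP_TABLE_OWNER_PID_ALL = 5;
    const uint ERROR_INSUFFICIENT_BUFFER = 122;

    // The first call, with no buffer, only reports the required size.
    int size = 0;
    uint status = GetExtendedTcpTable(IntPtr.Zero, ref size, false, AF_INET, TCP_TABLE_OWNER_PID_ALL, 0);
    while (status == ERROR_INSUFFICIENT_BUFFER)
    {
        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            status = GetExtendedTcpTable(buffer, ref size, false, AF_INET, TCP_TABLE_OWNER_PID_ALL, 0);
            if (status != 0)
                continue; // the table grew in the meantime (retry), or another error ends the loop

            int count = Marshal.ReadInt32(buffer);
            for (int i = 0; i < count; i++)
            {
                int row = 4 + i * 24;
                int port = PortFromNetworkOrder(Marshal.ReadInt32(buffer, row + 8)); // dwLocalPort
                int pid = Marshal.ReadInt32(buffer, row + 20);                       // dwOwningPid
                result.Add((port, pid));
            }
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
    return result;
}

foreach (var entry in GetTcpOwnersWindows())
    Console.WriteLine($"port {entry.LocalPort} -> pid {entry.ProcessId}");
```

The result could in principle be joined with TcpConnectionInformation.LocalEndPoint.Port, although the local port alone is not a unique key when several connections share a listening port.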

@AlmightyLks
Author

Found a discussion on this topic in a Wireshark issue:
https://gitlab.com/wireshark/wireshark/-/issues/1184

More specifically I'd like to point out the following:
https://gitlab.com/wireshark/wireshark/-/issues/1184#note_400621739

@karelz
Member

karelz commented Jan 4, 2022

Triage:

  • There are concerns that the lookup would have to happen upfront and would therefore add non-trivial perf overhead -- we would need more data on the impact
  • It is not clear whether we can do it on all OSs (esp. macOS, Android)

@geoffkizer @scalablecory do you have any more insights or concerns?

@AlmightyLks
Author

@karelz as a side note: GetActiveTcpConnections() is already unsupported on Android, see:
https://source.dot.net/#System.Net.NetworkInformation/System/Net/NetworkInformation/IPGlobalProperties.cs,37

@karelz
Member

karelz commented Jan 6, 2022

Triage:
@scalablecory discussed this more offline with @AlmightyLks on Discord.
Apparently, in its current form, it would not be implementable on Linux without adding non-trivial perf overhead.
There may be an option to introduce a new API that would be an explicit opt-in to the extra perf overhead, and whose results could then be joined with the current TcpConnectionInformation.

We will close the API proposal in its current form.

Even the "new" potential API form seems to be a fairly corner-case scenario. We would probably want to see more use cases before adding such an API.

@tmds perhaps it might be exposed in your extension library?

Let us know if you disagree, or if we missed some points. Thanks!

@karelz karelz closed this as completed Jan 6, 2022
@karelz karelz added this to the 7.0.0 milestone Jan 11, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 10, 2022
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022