This solution uses GNU Parallel. Time on my machine (Cygwin): 1.0 s (compare with the results in the C# README). The bottleneck is grep, so the particular grep implementation can greatly influence the result. In particular, the grep that ships with macOS seems to perform poorly compared to GNU grep.
Here are results for a few different implementations (all run with GNU Parallel unless indicated otherwise):

| Tool | Time (s) |
|---|---|
| GNU `grep -FUi` | 1.0 |
| `ag` without GNU Parallel | 5.3 |
| `FindStr /l /i` | 1.2 |
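
How GNU Parallel and grep fit together is not spelled out above. As a rough sketch only, assuming the task is to list the files under a directory that contain a fixed string, the combination could look like this. The function name `search_files`, the added `-l` flag, and the `xargs` fallback are illustrative assumptions, not the script behind the timings in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the original script): print the names of files
# under DIR that contain PATTERN as a fixed string, ignoring case.
# grep flags: -F fixed string, -U binary mode, -i ignore case (as in the
# table above), plus -l (an assumption) to print only matching file names.
search_files() {
  local pattern=$1 dir=$2
  if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -0: read NUL-separated file names from find
    # -q: quote the command so the pattern is not re-parsed by a shell
    # -X: pack several file names into each grep invocation
    find "$dir" -type f -print0 |
      parallel -0 -q -X grep -FUil -e "$pattern" -- {}
  else
    # Fallback when GNU Parallel is missing: xargs running 4 greps at once.
    find "$dir" -type f -print0 |
      xargs -0 -P 4 -n 64 grep -FUil -e "$pattern" --
  fi
}
```

A call such as `time search_files "needle" ./data` (both arguments are placeholders) gives a wall-clock figure comparable to the ones in the table. The output order is not deterministic, because several grep processes write concurrently.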