
08 Deal with cross cutting issues

JackHeeley edited this page Jul 19, 2023 · 10 revisions


Use UTF-8 everywhere

The dominant cross-cutting factor is the choice of character encoding. This factor has emerged over time because a range of options has been deployed and improved upon, without any 'winner' or consensus ever arising among vendors or products. This leaves developers spoiled for choice, and stuck with it.

The C++ language itself provides more than adequate support for any treatment of strings and character sets, but it remains agnostic about which choice to make. It offers no recommendation that might help secure consistent treatment of character encoding. Every choice has pros and cons, and there is little chance that whatever you choose will match the choice made by another developer or vendor. This adds cost to sharing and borrowing code in the community.

This is not a small problem, because strings, exceptions, system errors, logging, operating system calls, console output, and unit test support are all sensitive to the choice of character encoding. These are the cross-cutting issues referred to in the page title. That makes seven categories of places that are potentially affected by character-encoding mismatches.

Currently, if packages or source code from other parties are assembled into your solution, their encoding choices will match only by chance, and it's up to each of us to resolve that problem in our own way. Moreover, if your own ideas evolve and improve over time, you will have a price to pay when reusing your own solutions in future. I suggest it is worthwhile to think about this in depth and settle on a strategy that at least 'future-proofs' your own code.

When I considered this issue, I settled on "UTF-8 everywhere" as a good way to mitigate this concern. That's what you see here. I won't preach about it: the cost is low and the value high, so it's a no-brainer for me. You can judge the value for yourself by looking at this sample, and I think you will agree that it supports clean, easy reading and writing of code at a low cost. Minimal conversion takes place, and it is encapsulated at specific points in the code where the need for conversion is easy to understand. The approach is also not limiting should you need to use non-Latin scripts.

To fully understand the 'rules' that apply, look at the UTF-8 Manifesto and skip to section 10, "How to do text on Windows".
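In this approach text lives in UTF-8 std::string, and conversion to UTF-16 happens only at the boundary where a wide-character Windows API must be called. The sketch below shows what one such conversion point might look like. The name widen() is hypothetical, and on Windows you would more likely call MultiByteToWideChar(CP_UTF8, ...) and copy into a std::wstring; this portable version is written by hand so it can be compiled and checked on any platform. It rejects malformed input instead of guessing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical boundary helper: converts UTF-8 text to UTF-16 code units.
// Throws std::invalid_argument on malformed UTF-8.
inline std::u16string widen(std::string_view utf8) {
    std::u16string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        // Decode the lead byte: it tells us how many bytes the sequence has.
        if (lead < 0x80)                       { cp = lead;        len = 1; }
        else if (lead >= 0xC2 && lead < 0xE0)  { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead < 0xF0)  { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead < 0xF5)  { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > utf8.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte contributes 6 bits and must look like 10xxxxxx.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogate values, and values above U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points beyond the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

Client code never calls widen() itself; only the wrapper that talks to the operating system does, which keeps the conversion in one easy-to-find place.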

What's missing from the reference above is that, although the program itself is good to go, the Windows console still uses the Windows code page to interpret the strings sent to it. If you are not using pure ASCII, you also have to reconfigure Windows (the console output code page and stream buffering). This is not difficult; to see how, look at utf8_console.cpp.
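As a rough idea of what such a reconfiguration involves, the sketch below switches the console output code page to UTF-8 and gives stdout a buffer. It is not the contents of utf8_console.cpp; the function name configure_utf8_console() is hypothetical, and the setvbuf call is one commonly cited workaround for multi-byte sequences being split across console writes, assumed here rather than taken from the project. On non-Windows platforms it assumes the terminal already expects UTF-8 and does nothing.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper. Call once at start-up, before anything is written to stdout.
// Returns false if the console could not be reconfigured.
inline bool configure_utf8_console() {
#ifdef _WIN32
    // Make the console interpret the bytes it receives as UTF-8
    // instead of the active Windows code page.
    if (!SetConsoleOutputCP(CP_UTF8)) return false;
    // Buffer stdout so that a multi-byte UTF-8 sequence is handed to the
    // console in one write rather than in fragments.
    return std::setvbuf(stdout, nullptr, _IOFBF, 1024) == 0;
#else
    // Assumption: other platforms' terminals already use UTF-8.
    return true;
#endif
}
```

With full buffering, remember to flush stdout (for example with std::endl or std::fflush) before waiting for user input, otherwise a prompt may not appear in time.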

Provide wrapper objects for exceptions, system errors, logging and asserts

This decouples the implementation from the client code, and is a big help for source-level code reuse. You won't have hundreds of lines to revise before you can even assess the suitability of a candidate solution. You can supply a different implementation of these four wrapper objects, one that matches your environment, project standards or personal taste.
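To make the idea concrete, here is a minimal sketch of the four wrappers. All names (app_error, make_app_system_error, app_assert, Logger, StreamLogger) are hypothetical and are not the App3Dev API. Each exposes only standard types and UTF-8 text, so replacing an implementation does not ripple into client code. The system-error wrapper assumes errno-style codes and std::generic_category(); a Windows implementation reporting GetLastError() values would use std::system_category() instead.

```cpp
#include <ostream>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Exception wrapper: client code throws and catches this one type.
// Its message is always UTF-8 text.
class app_error : public std::runtime_error {
public:
    explicit app_error(const std::string& utf8_message)
        : std::runtime_error(utf8_message) {}
};

// System-error wrapper: hides how the platform reports a failure code.
// Assumes errno-style codes (std::generic_category).
inline std::system_error make_app_system_error(int code, const std::string& utf8_context) {
    return std::system_error(code, std::generic_category(), utf8_context);
}

// Assert wrapper: one place to decide whether a failed check throws, logs or aborts.
// This sketch throws.
inline void app_assert(bool condition, const std::string& utf8_message) {
    if (!condition) throw app_error("assertion failed: " + utf8_message);
}

// Logging wrapper: clients depend only on this interface.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void write(std::string_view level, std::string_view utf8_text) = 0;
    void info(std::string_view utf8_text) { write("INFO", utf8_text); }
    void error(std::string_view utf8_text) { write("ERROR", utf8_text); }
};

// One possible Logger implementation: UTF-8 lines to any std::ostream.
class StreamLogger : public Logger {
public:
    explicit StreamLogger(std::ostream& os) : os_(os) {}
    void write(std::string_view level, std::string_view utf8_text) override {
        os_ << level << ": " << utf8_text << '\n';
    }
private:
    std::ostream& os_;
};
```

Swapping StreamLogger for, say, a wrapper around your organization's logging library touches only the implementation class; every call to info() or error() stays the same.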

Avoid naked operating system calls

My default ambition is to write client code in pure ISO C++20, so that business-logic source code is platform neutral. This makes it ready to be exploited outside the domain for which it was originally written, hosted and tested. This is a good deal less expensive than writing multi-platform code, but it recognizes that reuse opportunities should not be discarded. Nine out of ten cases won't be ported anyway, so unless you know in advance that you must target multiple platforms, this ambition is a reasonable one. Portability need not be attained in the fullest sense; we do, however, want to be able to pull code out of our archives and repurpose it quickly.

Platform-neutral source is a modest goal that can be met at modest cost compared with writing fully portable source or binaries. Its costs are a low training cost, a low initial cost, a low (but not zero) porting cost, and some duplicated testing and maintenance if the code is ported. That is simply a prudent form of practical economy.

The proportion of project code that can be classified as 'client code' of course varies considerably depending on objectives that are set. In App3Dev the client code is small (just SampleProject), and you could be forgiven for missing the discipline I'm advocating here. For emphasis, that advice is:

Client code should not make direct calls to operating-system-specific functions or include system-specific headers. Instead, encapsulate these in one or more objects that expose a pure ISO C++20 (STL and GSL) and UTF-8 everywhere interface.
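The sketch below illustrates that discipline with a hypothetical DeviceInfo object (not part of App3Dev). Client code sees only a header with standard types and a UTF-8 std::string result; the operating system headers and the UTF-16 to UTF-8 conversion appear only in the implementation. The two "files" are shown in one listing for brevity, and a POSIX branch using gethostname is included as an assumed example of a second platform.

```cpp
#include <cstddef>
#include <string>

// ---- device_info.h (hypothetical): the only header client code includes ----
// Pure ISO C++: standard types in, UTF-8 std::string out.
class DeviceInfo {
public:
    std::string computer_name() const;  // UTF-8; empty if unavailable
};

// ---- device_info.cpp (hypothetical): the only file that sees OS headers ----
#ifdef _WIN32
#include <windows.h>

std::string DeviceInfo::computer_name() const {
    wchar_t buffer[MAX_COMPUTERNAME_LENGTH + 1];
    DWORD length = MAX_COMPUTERNAME_LENGTH + 1;
    if (!GetComputerNameW(buffer, &length)) return {};
    // Convert UTF-16 to UTF-8 here, at the boundary, so callers never see wchar_t.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, buffer, static_cast<int>(length),
                                          nullptr, 0, nullptr, nullptr);
    std::string result(static_cast<std::size_t>(bytes), '\0');
    WideCharToMultiByte(CP_UTF8, 0, buffer, static_cast<int>(length),
                        result.data(), bytes, nullptr, nullptr);
    return result;
}
#else
#include <unistd.h>

std::string DeviceInfo::computer_name() const {
    char buffer[256] = {};
    // Keep the last byte as a terminator: gethostname need not NUL-terminate on truncation.
    if (gethostname(buffer, sizeof buffer - 1) != 0) return {};
    return buffer;
}
#endif
```

Business logic then calls DeviceInfo{}.computer_name() without knowing which platform answered, and porting means supplying another implementation file rather than editing client code.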

N.B. Visual Studio has a compiler setting that can check and enforce pure ISO C++20 for parts of your solution. I have used this setting in the SampleProgram project (and addressed earlier design compromises to achieve this goal).