More fundamentals and examples

1 parent ad0efb9 commit 3c70121dc6003eec9df89dd7304d5a6343d7897e @dennis714 committed Sep 15, 2016
Showing with 1,803 additions and 59 deletions.
  1. +1 −0 ChangeLog
  2. +1 −0 advanced/350_cpp/classes/1_inheritance.tex
  3. BIN advanced/550_more_structs/blockout/hs.png
  4. BIN advanced/550_more_structs/blockout/hs345678.png
  5. BIN advanced/550_more_structs/blockout/hs999999.png
  6. BIN advanced/550_more_structs/blockout/mc10.png
  7. BIN advanced/550_more_structs/blockout/mc3.png
  8. +208 −0 advanced/550_more_structs/highscore_EN.tex
  9. +8 −0 advanced/550_more_structs/main_EN.tex
  10. +75 −0 advanced/550_more_structs/struct_as_array_EN.tex
  11. +84 −0 advanced/550_more_structs/unsized_array_in_struct_EN.tex
  12. +124 −0 advanced/550_more_structs/version_of_structure_EN.tex
  13. +85 −0 advanced/600_memmove/main_EN.tex
  14. +117 −0 advanced/650_stack/main_EN.tex
  15. +20 −28 advanced/main.tex
  16. +2 −15 appendix/x86/instructions/POPCNT.tex
  17. +1 −0 appendix/x86/registers.tex
  18. +1 −0 digging_into_code/constants_EN.tex
  19. +1 −0 digging_into_code/constants_RU.tex
  20. +2 −1 examples/examples.tex
  21. +107 −0 examples/timedate/1.lst
  22. +18 −0 examples/timedate/2.lst
  23. BIN examples/timedate/6_pairs_zeroed.png
  24. BIN examples/timedate/counterclockwise.png
  25. +209 −0 examples/timedate/main_EN.tex
  26. BIN examples/timedate/math.png
  27. BIN examples/timedate/reshack.png
  28. +85 −0 fundamentals/AND_EN.tex
  29. +10 −0 fundamentals/AND_OR_XOR_as_MOV_EN.tex
  30. +123 −0 fundamentals/AND_OR_as_SUB_ADD_EN.tex
  31. +36 −0 fundamentals/POPCNT_EN.tex
  32. +10 −0 fundamentals/ZX_Spectrum_ROM.lst
  33. +398 −0 fundamentals/data_types_and_numbers.tex
  34. BIN fundamentals/koi8r.png
  35. +10 −11 fundamentals/main.tex
  36. +54 −0 fundamentals/one_more_EN.tex
  37. +2 −1 fundamentals/signed_numbers_EN.tex
  38. BIN fundamentals/zx_spectrum_ROM.png
  39. +1 −0 macros.tex
  40. +1 −0 patterns/04_scanf/2_global/olly_EN.tex
  41. +2 −0 patterns/04_scanf/2_global/olly_ITA.tex
  42. +2 −0 patterns/04_scanf/2_global/olly_RU.tex
  43. +2 −2 patterns/13_arrays/3_BO_protection/main_EN.tex
  44. +1 −1 patterns/13_arrays/3_BO_protection/main_RU.tex
  45. +1 −0 patterns/14_bitfields/4_popcnt/main_EN.tex
  46. +1 −0 patterns/14_bitfields/4_popcnt/main_RU.tex
@@ -1,3 +1,4 @@
+15-Sep-2016: More fundamentals and examples
14-Sep-2016: More of my blog posts are copypasted into the book
06-Sep-2016: Blog posts about FAT12 and fortune file has been copypasted into the book
05-Sep-2016: Blog posts about entropy and encrypted DB case #1 has been
@@ -1,4 +1,5 @@
\subsubsection{\RU{Наследование классов}\EN{Class inheritance}}
+\label{cpp_inheritance}
\RU{О наследованных классах можно сказать, что это та же простая структура, которую мы уже рассмотрели,
только расширяемая в наследуемых классах.}
@@ -0,0 +1,208 @@
+\subsection{High-score file in \q{Block out} game and primitive serialization}
+
+Many video games have a high-score file, sometimes called a \q{Hall of fame}.
+The ancient \q{Block out}\footnote{\url{http://www.bestoldgames.net/eng/old-games/blockout.php}} game
+(3D Tetris from 1989) is no exception. Here is what we see at the end of a game:
+
+\begin{figure}[H]
+\centering
+\myincludegraphics{advanced/550_more_structs/blockout/hs.png}
+\caption{High score table}
+\end{figure}
+
+The file that changes after we add our name is \IT{BLSCORE.DAT}.
+Let's take a look at it in Midnight Commander:
+
+\begin{figure}[H]
+\centering
+\myincludegraphics{advanced/550_more_structs/blockout/mc10.png}
+\caption{\IT{BLSCORE.DAT} file in Midnight Commander}
+\end{figure}
+
+All entries are clearly visible.
+The very first byte is probably the number of entries.
+The second byte is zero, so the number of entries may in fact be a 16-bit value spanning the first two bytes.
+
+Next, after the name \q{Irene}, we see the bytes 0xDA and 0x2A.
+Irene's score is 10970, which is exactly 0x2ADA in hexadecimal (stored least significant byte first).
+So the high-score value is probably a 16-bit integer, or maybe a 32-bit one: two more zero bytes follow it.
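The byte arithmetic above can be checked with a few lines of C. This is only a sketch: \IT{read\_le()} is a hypothetical helper (not part of the game or of the program below), and the four bytes are the ones visible after \q{Irene} in the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a little-endian unsigned value
   from `width` bytes (1 to 4), least significant byte first. */
uint32_t read_le(const uint8_t *p, size_t width)
{
    assert(width >= 1 && width <= 4);
    uint32_t value = 0;
    for (size_t i = 0; i < width; i++)
        value |= (uint32_t)p[i] << (8 * i);
    return value;
}
```

Feeding it the bytes 0xDA, 0x2A, 0x00, 0x00 gives 10970 for both a width of 2 and a width of 4, which is why the dump alone cannot tell us whether the field is 16 or 32 bits wide.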
+
+Now recall that array elements are always placed adjacently in memory, and so are structure fields (apart from any padding the compiler inserts).
+\myindex{\CStandardLibrary!write()}
+\myindex{\CStandardLibrary!fwrite()}
+\myindex{\CStandardLibrary!read()}
+\myindex{\CStandardLibrary!fread()}
+That enables us to write a whole array or structure to a file with a single \IT{write()} or \IT{fwrite()} call,
+and then restore it using \IT{read()} or \IT{fread()}, as simple as that.
+This is what is called \IT{serialization} nowadays.
+
+\subsubsection{Read}
+
+Now let's write a C program to read the high-score file:
+
+\begin{lstlisting}
+ #include <assert.h>
+ #include <stdio.h>
+ #include <stdint.h>
+ #include <string.h>
+
+ struct entry
+ {
+ char name[11]; // incl. terminating zero
+ uint32_t score;
+ char date[11]; // incl. terminating zero
+ } __attribute__ ((aligned (1),packed));
+
+ struct highscore_file
+ {
+ uint8_t count;
+ uint8_t unknown;
+ struct entry entries[10];
+ } __attribute__ ((aligned (1), packed));
+
+ struct highscore_file file;
+
+ int main(int argc, char* argv[])
+ {
+ FILE* f=fopen(argv[1], "rb");
+ assert (f!=NULL);
+ size_t got=fread(&file, 1, sizeof(struct highscore_file), f);
+ assert (got==sizeof(struct highscore_file));
+ fclose(f);
+ for (int i=0; i<file.count; i++)
+ {
+ printf ("name=%s score=%d date=%s\n",
+ file.entries[i].name,
+ file.entries[i].score,
+ file.entries[i].date);
+ };
+ };
+\end{lstlisting}
+
+We need the GCC \IT{((aligned (1),packed))} attribute so that all structure fields are packed on a 1-byte boundary.
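To see what the attribute changes, here is a small sketch that compares the same three fields with and without it. The names \IT{entry\_packed}, \IT{entry\_natural} and \IT{print\_layouts()} are mine, for illustration only; the code assumes GCC or Clang, since the attribute is a compiler extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same fields as struct entry in the text, with the packing attribute. */
struct entry_packed
{
    char name[11];
    uint32_t score;
    char date[11];
} __attribute__ ((aligned (1), packed));

/* Same fields, layout left to the compiler's default rules. */
struct entry_natural
{
    char name[11];
    uint32_t score;
    char date[11];
};

/* Print where the score field lands and how large each variant is. */
void print_layouts(void)
{
    printf("packed:  score at offset %zu, size %zu\n",
           offsetof(struct entry_packed, score), sizeof(struct entry_packed));
    printf("natural: score at offset %zu, size %zu\n",
           offsetof(struct entry_natural, score), sizeof(struct entry_natural));
}
```

In the packed variant the score starts right after the 11-byte name, at offset 11, and one entry takes 11+4+11=26 bytes. Without the attribute the compiler is free to move the score to an aligned offset and to pad the end of the structure, so the in-memory layout would no longer match the file.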
+
+Of course it works:
+
+\begin{lstlisting}
+ name=Irene..... score=10970 date=08-12-2016
+ name=Saddler... score=7819 date=08-12-2016
+ name=Mary...... score=300 date=08-12-2016
+ name=James..... score=151 date=08-12-2016
+ name=Mike...... score=135 date=08-12-2016
+ name=AAAAAAAAAA score=135 date=08-12-2016
+ name=Joe....... score=130 date=08-12-2016
+ name=John...... score=128 date=08-12-2016
+ name=Doe....... score=124 date=08-12-2016
+ name=Alex...... score=120 date=08-12-2016
+\end{lstlisting}
+
+(Needless to say, each name is padded with dots, both on screen and in the file, perhaps for aesthetic reasons.)
+
+\subsubsection{Write}
+
+Let's check whether we are right about the width of the score value. Does it really have 32 bits?
+
+\begin{lstlisting}
+ int main(int argc, char* argv[])
+ {
+ FILE* f=fopen(argv[1], "rb");
+ assert (f!=NULL);
+ size_t got=fread(&file, 1, sizeof(struct highscore_file), f);
+ assert (got==sizeof(struct highscore_file));
+ fclose(f);
+
+ strcpy (file.entries[1].name, "Mallory...");
+ file.entries[1].score=12345678;
+ strcpy (file.entries[1].date, "08-12-2016");
+
+ f=fopen(argv[1], "wb");
+ assert (f!=NULL);
+ got=fwrite(&file, 1, sizeof(struct highscore_file), f);
+ assert (got==sizeof(struct highscore_file));
+ fclose(f);
+ };
+\end{lstlisting}
+
+Let's run \q{Block out}:
+
+\begin{figure}[H]
+\centering
+\myincludegraphics{advanced/550_more_structs/blockout/hs345678.png}
+\caption{High score table}
+\end{figure}
+
+The first two digits (1 and 2) are cut off. Perhaps this is a formatting issue... but the number is almost correct.
+Now I change it to 999999 and run again:
+
+\begin{figure}[H]
+\centering
+\myincludegraphics{advanced/550_more_structs/blockout/hs999999.png}
+\caption{High score table}
+\end{figure}
+
+Now it's correct. Yes, the high-score value is a 32-bit integer.
+
+\subsubsection{Is it serialization?}
+
+\dots almost.
+Serialization like this is highly popular in scientific software, where efficiency and speed are much more important
+than converting into \ac{XML} or \ac{JSON} and back.
+
+One important thing is that you obviously cannot serialize pointers, because each time you load the file into memory,
+all the structures may be allocated in different places.
+
+But: if you work on some kind of low-cost \ac{MCU} with a simple \ac{OS} on it
+and your structures are always allocated at the same
+places in memory, you can probably save and restore pointers as well.
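One common way around the pointer problem is to link records by array index instead of by address. The sketch below is my own illustration, not something \q{Block out} does: \IT{struct node}, \IT{NIL}, \IT{sum\_list()} and \IT{relocate()} are hypothetical names, and \IT{memcpy()} stands in for an \IT{fwrite()}/\IT{fread()} round trip that lands the data at a different address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NIL (-1)    /* hypothetical "no next node" marker */

/* A list node that refers to its successor by index, not by address. */
struct node
{
    int32_t value;
    int32_t next;   /* index into the node pool, or NIL */
};

/* Walk the list starting at index `head` and add up the values. */
int32_t sum_list(const struct node *pool, int32_t head)
{
    int32_t sum = 0;
    for (int32_t i = head; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}

/* Stand-in for saving and reloading: copy the raw bytes to another place. */
void relocate(struct node *dst, const struct node *src, size_t count)
{
    memcpy(dst, src, count * sizeof(struct node));
}
```

Because the links are relative to the start of the pool, the copy can be traversed exactly like the original, wherever it ends up in memory.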
+
+\subsubsection{Random noise}
+
+When I prepared this example, I had to run \q{Block out} many times and play it for a while
+to fill the high-score table with random names.
+
+And when there were just 3 entries in the file, I saw this:
+
+\begin{figure}[H]
+\centering
+\myincludegraphics{advanced/550_more_structs/blockout/mc3.png}
+\caption{\IT{BLSCORE.DAT} file in Midnight Commander}
+\end{figure}
+
+The first byte has the value 3, meaning there are 3 entries.
+And there are 3 entries present.
+But then we see random noise in the second half of the file.
+
+The noise probably originates from uninitialized data.
+Perhaps \q{Block out} allocated memory for 10 entries somewhere on the heap, where, obviously,
+some pseudorandom noise (left over from something else) was present.
+Then it set the first/second byte, filled 3 entries, and never touched the remaining 7 entries, so they were written
+to the file as is.
+
+When \q{Block out} loads the high-score file at the next run, it reads the number of entries from the first/second byte (3) and
+then completely ignores whatever comes after them.
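The usual cure is to clear the whole buffer before filling it, so that unused slots are written out as zeros. The following is a sketch under my own assumptions: the structures are the ones reconstructed earlier in this section, while \IT{highscore\_init()} and \IT{highscore\_add()} are hypothetical helpers; how the game itself manages its table is unknown.

```c
#include <stdint.h>
#include <string.h>

struct entry
{
    char name[11];  /* incl. terminating zero */
    uint32_t score;
    char date[11];  /* incl. terminating zero */
} __attribute__ ((aligned (1), packed));

struct highscore_file
{
    uint8_t count;
    uint8_t unknown;
    struct entry entries[10];
} __attribute__ ((aligned (1), packed));

/* Clear the whole table, so unused slots hold zeros, not leftover memory. */
void highscore_init(struct highscore_file *t)
{
    memset(t, 0, sizeof(*t));
}

/* Append one entry; returns 0 on success, -1 if the table is full. */
int highscore_add(struct highscore_file *t, const char *name,
                  uint32_t score, const char *date)
{
    if (t->count >= 10)
        return -1;
    struct entry *e = &t->entries[t->count];
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->name[sizeof(e->name) - 1] = 0;
    e->score = score;
    strncpy(e->date, date, sizeof(e->date) - 1);
    e->date[sizeof(e->date) - 1] = 0;
    t->count++;
    return 0;
}
```

Allocating the table with \IT{calloc()} instead of \IT{malloc()} would have the same effect as the \IT{memset()} call here.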
+
+This is a common problem.
+
+\myindex{Microsoft Word}
+Microsoft Word versions from the 1990s often left pieces of previously edited text in \IT{.doc} files.
+It was a kind of amusement back then to get a \IT{.doc} file from someone,
+open it in a hexadecimal editor and read something else
+that had been edited on that computer before.
+
+\myindex{Heartbleed}
+\myindex{OpenSSL}
+The problem can be much more serious: see the Heartbleed bug\footnote{\url{https://en.wikipedia.org/wiki/Heartbleed}}
+in OpenSSL.
+
+\subsubsection{Homework}
+
+\q{Block out} has several polycube sets (flat/basic/extended), the size of the pit can be configured, etc.
+And it seems that \q{Block out} keeps a separate high-score table for each configuration.
+I've noticed that some information is probably stored in the \IT{BLSCORE.IDX} file.
+Understanding its structure as well can be homework for hardcore \q{Block out} fans.
+
+The \q{Block out} files are here: \url{http://beginners.re/examples/blockout.zip}
+(including the binary high score files
+I've used in this example).
+You can use DosBox to run it.
+
@@ -0,0 +1,8 @@
+\section{More about structures}
+
+% subsections:
+\input{advanced/550_more_structs/struct_as_array_EN}
+\input{advanced/550_more_structs/unsized_array_in_struct_EN}
+\input{advanced/550_more_structs/version_of_structure_EN}
+\input{advanced/550_more_structs/highscore_EN}
+
@@ -0,0 +1,75 @@
+\subsection{Sometimes a C structure can be used instead of an array}
+
+\subsubsection{Arithmetic mean}
+
+\begin{lstlisting}
+ #include <stdio.h>
+
+ int mean(int *a, int len)
+ {
+ int sum=0;
+ for (int i=0; i<len; i++)
+ sum=sum+a[i];
+ return sum/len;
+ };
+
+ struct five_ints
+ {
+ int a0;
+ int a1;
+ int a2;
+ int a3;
+ int a4;
+ };
+
+ int main()
+ {
+ struct five_ints a;
+ a.a0=123;
+ a.a1=456;
+ a.a2=789;
+ a.a3=10;
+ a.a4=100;
+ printf ("%d\n", mean((int*)&a, 5)); // explicit cast: struct five_ints* is not implicitly convertible to int*
+ // test: https://www.wolframalpha.com/input/?i=mean(123,456,789,10,100)
+ };
+\end{lstlisting}
+
+This works: the \IT{mean()} function never reads past the end of the \IT{five\_ints} structure,
+because 5 is passed, meaning that only 5 integers are accessed.
+
+\subsubsection{Putting string into structure}
+
+\begin{lstlisting}
+ #include <stdio.h>
+
+ struct five_chars
+ {
+ char a0;
+ char a1;
+ char a2;
+ char a3;
+ char a4;
+ } __attribute__ ((aligned (1),packed));
+
+ int main()
+ {
+ struct five_chars a;
+ a.a0='h';
+ a.a1='i';
+ a.a2='!';
+ a.a3='\n';
+ a.a4=0;
+ printf ((char*)&a); // prints "hi!"
+ };
+\end{lstlisting}
+
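+The \IT{((aligned (1),packed))} attribute guarantees that no padding is inserted between the fields.
+(Fields of type \IT{char} are normally laid out back to back anyway, since \IT{char} needs only 1-byte alignment; padding to a 4-byte or 8-byte boundary appears with wider field types.)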
+
+\subsubsection{Summary}
+
+This is just another example of how structures and arrays are stored in memory.
+Perhaps no sane programmer would do something like this, except as part of some clever hack.
+Or maybe in case of source code obfuscation?
+
@@ -0,0 +1,84 @@
+\subsection{Unsized array in C structure}
+
+Some Win32 structures have their last field defined as an array of one character:
+
+\begin{lstlisting}
+ typedef struct _SYMBOL_INFO {
+ ULONG SizeOfStruct;
+ ULONG TypeIndex;
+
+ ...
+
+ ULONG MaxNameLen;
+ TCHAR Name[1];
+ } SYMBOL_INFO, *PSYMBOL_INFO;
+\end{lstlisting}
+
+( \url{https://msdn.microsoft.com/en-us/library/windows/desktop/ms680686(v=vs.85).aspx} )
+
+This is a hack: the last field is really an array of unknown size, which is determined when the structure is allocated.
+
+Why: the \IT{Name} field may be short, so why define it with some kind of \IT{MAX\_NAME}
+constant, which can be 128, 256, or even bigger?
+
+Why not use a pointer instead? Then you would need to allocate two blocks: one for the structure and another for the string.
+This may be slower and may incur more memory overhead.
+Also, you need to dereference the pointer (i.e., read the address of the string from the structure). This is not a big deal, but some
+people say it is still a surplus cost.
+
+This is also known as the \IT{struct hack}: \url{http://c-faq.com/struct/structhack.html}.
+
+Example:
+
+\begin{lstlisting}
+ #include <stdio.h>
+ #include <stdlib.h> // malloc()
+ #include <string.h> // strlen(), strcpy()
+
+ struct st
+ {
+ int a;
+ int b;
+ char s[];
+ };
+
+ void f (struct st *s)
+ {
+ printf ("%d %d %s\n", s->a, s->b, s->s);
+ // f() can't replace s[] with bigger string - size of allocated block is unknown at this point
+ };
+
+ int main()
+ {
+ #define STRING "Hello!"
+ struct st *s=malloc(sizeof(struct st)+strlen(STRING)+1); // incl. terminating zero
+ s->a=1;
+ s->b=2;
+ strcpy (s->s, STRING);
+ f(s);
+ };
+\end{lstlisting}
+
+In short, it works because C has no array boundary checks. Any array is treated as having infinite size.
+
+Problem: after allocation, the total size of the block allocated for the structure is unknown (except to the memory manager),
+so you can't simply replace the string with a larger one.
+You would still be able to do so if the field were declared as something like \IT{s[MAX\_NAME]}.
+
+In other words, you have a structure plus an array (or string) fused together in a single allocated memory block.
+Another problem is that you obviously can't declare two such arrays in a single structure, or declare another field
+after such an array.
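If the string does need to grow later, one possible workaround is to store the capacity in the structure itself and reallocate the whole block when it is too small. This is a sketch of that idea, not part of the original example: the \IT{cap} field and the \IT{st\_new()}/\IT{st\_set()} helpers are my own hypothetical additions.

```c
#include <stdlib.h>
#include <string.h>

struct st
{
    int a;
    int b;
    size_t cap;     /* assumed extra field: bytes available in s[] */
    char s[];       /* flexible array member, must stay last */
};

/* Allocate the structure and its string as one block. */
struct st *st_new(int a, int b, const char *text)
{
    size_t cap = strlen(text) + 1;  /* incl. terminating zero */
    struct st *p = malloc(sizeof(struct st) + cap);
    if (p == NULL)
        return NULL;
    p->a = a;
    p->b = b;
    p->cap = cap;
    memcpy(p->s, text, cap);
    return p;
}

/* Replace the string, growing the block if needed.
   Returns the (possibly moved) structure, or NULL if reallocation
   failed, in which case the original block is left untouched. */
struct st *st_set(struct st *p, const char *text)
{
    size_t need = strlen(text) + 1;
    if (need > p->cap)
    {
        struct st *q = realloc(p, sizeof(struct st) + need);
        if (q == NULL)
            return NULL;
        p = q;
        p->cap = need;
    }
    memcpy(p->s, text, need);
    return p;
}
```

The caller has to use the returned pointer from then on, because \IT{realloc()} may move the block. That is the price of keeping the structure and the string fused together.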
+
+Older compilers require the array to be declared with at least one element: \IT{s[1]}; newer ones allow it to be declared as a variable-sized
+array: \IT{s[]}.
+This is called a \IT{flexible array member}\footnote{\url{https://en.wikipedia.org/wiki/Flexible_array_member}}
+in the C99 standard.
+
+Read more about it in
+the GCC documentation\footnote{\url{https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html}} and
+the MSDN documentation\footnote{\url{https://msdn.microsoft.com/en-us/library/b6fae073.aspx}}.
+
+Dennis Ritchie (one of C's creators) called this trick \q{unwarranted chumminess with the C implementation}
+(perhaps acknowledging the hackish nature of the trick).
+
+Like it or not, use it or not:
+it is yet another demonstration of how structures are stored in memory, which is why I write about it.
+