Skip to content

C Puzzle

Julian Kemmerer edited this page Oct 22, 2023 · 36 revisions

Problem

Imagine a sequence of zeros and ones: 1,1,0,1,0,0,1 etc

Binary bytes (ASCII characters) have been encoded into the sequence based on some given integer constant N(ex.=1,2,etc) as follows:

  • The sequence begins with 2N or more 1's
  • Upon seeing the first 1->0 transition skip the next 3N-1 values
  • The value arrived at is the first and least significant bit in the byte
  • The next seven more-significant bits of the byte are obtained by skipping every 2N-1 values
  • After the 8th bit completing the byte, repeat the above for next byte, beginning with looking for the 1->0 transition

Constraints

Fill in this C function such that it returns zero/NULL until a full eight bit byte/character is parsed.

  • The function is run once for each element in the sequence in order
  • Only add code inside the function
  • Only return a completed non-zero/null character once
    • Use a single return statement at the end of the function
  • No global variables, no pointers, pass by value only
  • No libraries, no OS functionality, etc
  • Use +=1 or -=1 instead of ++ or --
  • Use bool|u/int_t types
  • Hint: static local variables maintain state
uint8_t the_function(bool seq_val){
  uint8_t return_value = 0;
  // CODE HERE
  // One return statement at end of function
  return return_value;
}

Test

You can compile and run your version of the_function (along with a working solution) as shown in c_puzzle.c:

Set #define N to either 1 or 2. The input_values array sequences encodes two characters "Hi".

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 2 // Some integer ex. 1,2
#define SKIP3 ((3*N)-1)
#define SKIP2 ((2*N)-1)

uint8_t the_function(bool seq_val){
  // Hint: static uint8_t char_buffer; // Buffer to hold eight bits
  uint8_t return_value = 0;
  // CODE HERE
  // One return statement at end of function
  return return_value;
}

// Wrapper function for unrelated reasons... 
void testbench(){
  // These input_values sequences encode "Hi" with N=1 or N=2

  #if N==1
  // N=1
  #define TEST_SIZE 45
  bool input_values[TEST_SIZE] = {
    1, // 2*N=2 or more 1's to start
    1, // Ex. 3
    1,
    //
    0, // The 1->0 transition \/
    0, // Skip 3*N - 1=2
    // 'H' = 0x48 = 0b01001000
    0, // Skipped
    0, // Bit[0]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[1]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[2]
    //
    1, // Skip 2*N - 1=1
    1, // Bit[3]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[4]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[5]
    //
    1, // Skip 2*N - 1=1
    1, // Bit[6]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[7]
    //
    1, // 2*N=2 or more 1's 
    1, // Ex. 3
    1,
    //
    0, // The 1->0 transition \/
    0, // Skip 3*N - 1=2
    // 'i' = 0x69 = 0b01101001
    1, // Skipped
    1, // Bit[0]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[1]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[2]
    //
    1, // Skip 2*N - 1=1
    1, // Bit[3]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[4]
    //
    1, // Skip 2*N - 1=1
    1, // Bit[5]
    //
    1, // Skip 2*N - 1=1
    1, // Bit[6]
    //
    0, // Skip 2*N - 1=1
    0, // Bit[7]
    //
    1, // 2*N=2 or more 1's 
    1, // 
    1
  };
  #endif

  #if N==2
  // N=2
  #define TEST_SIZE 87
  bool input_values[TEST_SIZE] = {
    1, // 2*N=4 or more 1's to start
    1, // Ex. 5
    1,
    1,
    1,
    //
    0, // The 1->0 transition \/
    0, // Skip 3*N - 1=5
    0, // Skipped
    0, // Skipped
    // 'H' = 0x48 = 0b01001000
    0, // Skipped
    0, // Skipped
    0, // Bit[0]
    0, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[1]
    0, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[2]
    0, // Skip 2*N - 1=3
    //
    1, // Skipped
    1, // Skipped
    1, // Bit[3]
    1, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[4]
    0, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[5]
    0, // Skip 2*N - 1=3
    //
    1, // Skipped
    1, // Skipped
    1, // Bit[6]
    1, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[7]
    0,
    //
    1, // 2*N=4 or more 1's
    1, // Ex. 5
    1,
    1,
    1,
    //
    0, // The 1->0 transition \/
    0, // Skip 3*N - 1=5
    0, // Skipped
    0, // Skipped
    // 'i' = 0x69 = 0b01101001
    1, // Skipped
    1, // Skipped
    1, // Bit[0]
    1, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[1]
    0, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[2]
    0, // Skip 2*N - 1=3
    //
    1, // Skipped
    1, // Skipped
    1, // Bit[3]
    1, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[4]
    0, // Skip 2*N - 1=3
    //
    1, // Skipped
    1, // Skipped
    1, // Bit[5]
    1, // Skip 2*N - 1=3
    //
    1, // Skipped
    1, // Skipped
    1, // Bit[6]
    1, // Skip 2*N - 1=3
    //
    0, // Skipped
    0, // Skipped
    0, // Bit[7]
    0,
    //
    1, // 2*N=4 or more 1's 
    1, // Ex. 5
    1,
    1,
    1
  };
  #endif
  static uint32_t seq_index = 0;
  uint8_t the_char = the_function(input_values[seq_index]);
  if(the_char != 0){
    printf("the_char = %c\n", the_char);
  }
  seq_index += 1;
}

// gcc c_puzzle.c -o c_puzzle && ./c_puzzle
int main(int argc, char** argv)
{
  int i = 0;
  while(i<TEST_SIZE)
  {
    testbench();
    i++;
  }    
  return 0;
}

Expected output:

the_char = H
the_char = i

What's going on here?

Why is this a page on a hardware description language wiki?

Read this and see if it sounds familiar at all 😏

Solutions

This puzzle is the same problem present in digital universal asynchronous receivers(/transmitters) UART. N=one half of one bit period in time. 2N=uart bit period, 3N is skipping 1.5 bits to get into center of bit for sampling (minus 1 is for counting/off by one reasons). Units are the clock cycle time (ex. 100MHz = 10ns per tick, per run of function).

C code in c_puzzle.c written to solve this puzzle can be used with the PipelineC tool (see comments in code) to simulate and synthesize for FPGAs + ASICs.

Hardware Dataflow

Looking at the examples/c_puzzle.c/the_function/pipeline_map.gv.png dataflow graph you can see how each operation maps to which locations in the C code. The size of each operation's 'box'/node in the graph is scaled to the time+space requirement of the computation ~= combinatorial delay x a rough estimate of size based on number of IO wires (TODO measuring/optimizing for fewer registers/LUTs).

First New Solution

This solution was kindly submitted by Discord|GitHub user siddhpant and was not intended to target hardware, so it makes a great experiment for evaluating 'what is natural for software folks to write' vs. 'what hardware it makes':

picture of solution1 code

Dataflow: solution1 dataflow graph

In the the above dataflow graph, some of the larger and first items you might see are:

  • BIN_OP_SL_uint8_t_uint32_t line 112 4.4ns
  • BIN_OP_PLUS_int32_t_int33_t line 109 2.9ns

After synthesizing for an example Xilinx Artix 7 part, Vivado logs produced by the tool show how the_function has a maximum operating frequency of ~142 MHz:

Report Cell Usage: 
+------+-------+------+
|      |Cell   |Count |
+------+-------+------+
|1     |CARRY4 |    43|
|2     |LUT1   |     5|
|3     |LUT2   |     2|
|4     |LUT4   |    18|
|5     |LUT5   |    40|
|6     |LUT6   |    55|
|7     |FDRE   |   113|
+------+-------+------+

  Source:                 the_function_0CLK_a6fd0288/bit_pos_reg[4]/C
  Destination:            return_output_output_reg_reg[0]/R
  Data Path Delay:        6.522ns  (logic 2.414ns (37.013%)  route 4.108ns (62.987%))
  Logic Levels:           9  (CARRY4=6 LUT4=1 LUT5=1 LUT6=1)

The reported critically long delay path is from the bit_pos register to the return output. This can be confirmed by tracing the path in the dataflow from the bit_pos | NOW current value of the bit_pos register through the noted-as-large BIN_OP_SL_uint8_t_uint32_t line 112 4.4ns node all the way to the right OUT | return_output return value of the function.

Based on the results below for the original solution, these are the recommended changes for a better hardware result:

  • Don't use a variable-shifter << , ex. letter |= seq_val << bit_pos;
    • Instead of shifting seq_val by some 'run-time' value = bit_pos, it uses less resources to shift by a constant amount for each bit (one shift per bit in letter).
  • Use smaller width integer types
    • Of course on a CPU this barely matters - but in hardware every bit counts, can reduce counter sizes to as small as N allows.

Second New Solution

This solution was kindly submitted by Discord user can.l, as an almost adversarial example of very concise C code. The code was broken apart into multiple lines for PipelineC + readability reasons.

picture of solution2 code

Dataflow: solution2 dataflow graph

After synthesizing for an example Xilinx Artix 7 part, Vivado logs produced by the tool show how the_function has a maximum operating frequency of ~223 MHz:

Report Cell Usage: 
+------+-------+------+
|      |Cell   |Count |
+------+-------+------+
|1     |CARRY4 |     2|
|2     |LUT1   |     1|
|3     |LUT2   |    10|
|4     |LUT3   |     5|
|5     |LUT4   |     5|
|6     |LUT5   |    15|
|7     |LUT6   |    22|
|8     |FDRE   |    33|
+------+-------+------+

Source:          the_function_0CLK_f98a8a12/b_reg[0]/C
Destination:     the_function_0CLK_f98a8a12/s_reg[2]/D
Data Path Delay: 4.429ns  (logic 1.917ns (43.283%)  route 2.512ns (56.717%))
Logic Levels:    6  (CARRY4=2 LUT2=1 LUT4=1 LUT5=1 LUT6=1)

FMAX ~= 223MHz

Third New Solution

This next solution was submitted by PoignardAzur and has the highest estimated operating frequency (i.e. shortest critical path length) of all solutions thus far.

picture of solution3 code

Dataflow: solution3 dataflow graph

After synthesizing for an example Xilinx Artix 7 part, Vivado logs produced by the tool show how the_function has a maximum operating frequency of ~311 MHz:

Report Cell Usage: 
+------+-----+------+
|      |Cell |Count |
+------+-----+------+
|1     |LUT2 |     3|
|2     |LUT3 |     7|
|3     |LUT4 |    12|
|4     |LUT5 |    11|
|5     |LUT6 |    15|
|6     |FDRE |    35|
+------+-----+------+

Source:                 the_function_0CLK_8f4d4fc5/remaining_reg[5]/C
Destination:            return_output_output_reg_reg[0]/D
Data Path Delay:        3.158ns  (logic 0.999ns (31.634%)  route 2.159ns (68.366%))
Logic Levels:           3  (LUT4=1 LUT6=2)

FMAX ~= 311MHz

The critical path has is shown to be from the remaining count storage register to the returned output character (a generated output register).

Original Solution

Here is a short ranking of how slow operations are, fastest first, prefer selecting from the top of the list:

  1. Constant amount bit shifts / selecting some constant Nth bit / constant index local variable array/struct access
  2. Bitwise operators (&,|,etc)
  3. Multiplexing (if-else logic)
  4. Integer add/sub
  5. Integer mult
  6. Variable amount bit shifting (barrel shifters)
  7. Random array access
  8. Float mult
  9. Float add
  10. Integer divide
  11. Float divide

picture of solution0 code

Dataflow: solution 0 dataflow

In the the above dataflow graph, some of the larger and first items you might see are:

  • BIN_OP_MINUS_uint8_t_uint1_t line 88 2.3ns
  • BIN_OP_EQ_uint8_t_uint3_t line 78 2.3ns

This version of the_function has a maximum operating frequency of ~290 MHz:

Report Cell Usage: 
+------+-----+------+
|      |Cell |Count |
+------+-----+------+
|1     |LUT3 |     1|
|2     |LUT4 |     3|
|3     |LUT5 |     5|
|4     |LUT6 |    14|
|5     |FDRE |    29|
+------+-----+------+

  Source:                 the_function_0CLK_83e31706/counter_reg[0]/C
  Destination:            return_output_output_reg_reg[0]/R
  Data Path Delay:        2.925ns  (logic 0.875ns (29.915%)  route 2.050ns (70.085%))
  Logic Levels:           2  (LUT4=1 LUT6=1)

The reported critically long delay path is from the counter register to the returned output. This can be confirmed by tracing the path in the dataflow from the counter | NOW current value of the counter register through the noted-as-large BIN_OP_MINUS_uint8_t_uint1_t line 88 2.3ns node all the way to the right OUT | return_output return value of the function.

Clone this wiki locally