more typos fixed in baremetal
dwelch committed Jan 19, 2016
1 parent 253c8c8 commit 9adfc2d
Showing 1 changed file with 59 additions and 49 deletions (108 changes): baremetal/README
@@ -1,6 +1,7 @@
This is a rough draft; if/when I complete this draft I will at some point
go back through and rework it to improve it.
Update: draft 2. I went through almost all of this and cleaned it up.
Update: draft 3. Fixed lots of typos and misspellings that I had missed before.

THIS IS NOT AN ASSEMBLY LANGUAGE TUTORIAL, BUT IT DOES HAVE A LOT OF
ASSEMBLY LANGUAGE IN IT. IF YOU ARE STUCK FOCUSING ON THE ASSEMBLY
@@ -881,14 +882,14 @@
orange global variables way above. Data actually is broken up into
different segments sometimes, in particular with the GNU tools.
In most of the code out there that has global variables, the globals
are not initialized in the code, but the language declares that they
are assumed to be zero when you start using them (if you have not
changed them before you read them). So there is a special data
segment called .bss which holds all of the global variables that
should be zero when we start running C code. These are lumped together
so that some code can easily go through that chunk of memory and zero
that area before branching to the C entry point. Another segment we
may encounter is the .rodata segment. Sometimes, even with GNU tools,
you may find the read only data in the .text segment.
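
The zeroing step described above can be sketched in C. This is a minimal illustration and not the startup code from this repository. It assumes the linker script marks the start and end of .bss (the symbol names in the comment are hypothetical). It takes the bounds as parameters so the loop can also be tried on a desktop machine.

```c
#include <stdint.h>

/* Zero every 32 bit word from start up to, but not including, end.
 * On a real target the bounds would come from symbols that the linker
 * script defines around .bss, for example (hypothetical names):
 *
 *     extern uint32_t bss_start[], bss_end[];
 *     zero_bss(bss_start, bss_end);
 *
 * called before branching to the C entry point. */
void zero_bss(uint32_t *start, uint32_t *end)
{
    /* Writing through a volatile pointer keeps the compiler from turning
     * this loop into a call to memset(), which a bare metal build may
     * not have. */
    volatile uint32_t *p = start;
    while (p < end) {
        *p++ = 0;
    }
}
```

Whether this loop lives in C or in the assembly bootstrap, it has to run before any C code reads a .bss variable.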

For fun let's make one of each:

@@ -932,10 +933,10 @@
Well, notice that I used -O2 on the gcc command line. This means
optimization level 2. -O0, optimization level 0, means no optimization,
-O1 means some, and -O2 is the maximum safe level of optimization using
the gcc compiler. There is a -O3 but we are not supposed to trust that
to be as well tested as -O2. I am not going to get into that, but I
recommend you use -O2 often, especially with embedded bare metal where
size and speed are important. I use it here because it produces much
less code than no optimization. You can play with compiling and
disassembling these things on your own with less or no optimization to
see what happens.

@@ -1056,7 +1057,8 @@
the value 9 that we pre-initialized.
I want to point something out here that is very important for general
bare metal programming. What do we have above? Something like twelve
32 bit numbers, which is 12*4 = 48 bytes. So if I make this a true
binary (memory image) we should see 48 bytes, right? Well, you would be
wrong:

baremetal > ls -al hello.elf
-rwxr-xr-x 1 root root 38002 Sep 23 15:06 hello.elf
@@ -1126,7 +1128,7 @@
There are 0x60000000 bytes between these two items, which means the
binary file created would be at least 0x60000000 bytes, which is
1.6 GigaBytes (1,610,612,736 bytes). If you are like me you probably
don't always have 1.6Gig of disk space handy, much less want it filled
with a single file which is mostly zeros. You can start to see the
appeal of file formats like elf, ihex and srec, which are not really
binary (memory image) files. They only define the real data and don't
have to hold the zero filler.
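
The arithmetic can be made concrete with a small C sketch. The two-section layout in the usage note is hypothetical. It has 44 bytes of .text at 0x8000 and a single 4 byte .data word at 0x60008000. It was chosen only so that the items start 0x60000000 bytes apart and the real data adds up to the 48 bytes mentioned earlier. struct section and both functions are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One loadable chunk of the program, e.g. .text or .data. */
struct section {
    uint64_t addr;  /* load address */
    uint64_t size;  /* size in bytes */
};

/* A flat binary (memory image) is one contiguous run of bytes, so it has
 * to cover everything from the lowest section address to the end of the
 * highest section, including any gap in between (usually zero filled). */
uint64_t flat_image_size(const struct section *s, size_t n)
{
    if (n == 0) {
        return 0;
    }
    uint64_t lo = UINT64_MAX;
    uint64_t hi = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].addr < lo) {
            lo = s[i].addr;
        }
        if (s[i].addr + s[i].size > hi) {
            hi = s[i].addr + s[i].size;
        }
    }
    return hi - lo;
}

/* The bytes that are actually real data. Formats like elf, ihex and srec
 * only need to carry these, plus their own per-record overhead. */
uint64_t real_data_size(const struct section *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += s[i].size;
    }
    return total;
}
```

With that hypothetical layout, real_data_size() gives 48 bytes. flat_image_size() gives 0x60000004 bytes (1,610,612,740), about 1.6 GB.
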
@@ -1520,7 +1522,7 @@
variable pear now has its own address in memory, it did not get
optimized out.

I don't expect you to know assembly language, but what I want you to
see is a continuation of what we discussed before with respect to the
branch link instruction and the link register. The ARM instruction
set uses branch link (bl) to make function calls. The branch means
goto or jump or branch the program to some address. The link means
@@ -1689,7 +1691,7 @@
runtime=end-start;

And this may lead you to believe that this is not the code causing
your performance problems. Or, hopefully, you realize that this code
is executing way too fast and there is something wrong with your
experiment. Knowing enough assembly to see what is going on will
clue you in to the optimization, just like in the notmain() example
above.
@@ -1705,10 +1707,9 @@
compiler to do what you want, or if you have borrowed some code, you
might have to have GCC do the assembling or linking. Some folks like
to put C stuff like defines and comment symbols in their assembler code,
which works fine if you feed it through gcc, but it is not assembly
language, it is some sort of hybrid. That doesn't stop people from
doing it, and when you borrow that code you either have to fix the code
or use the C compiler as an assembler.

bootstrap.s

@@ -2004,7 +2005,7 @@
instructions provide some cost and performance benefits for embedded
systems. First off, you can pack more instructions into the same
amount of memory, understanding that it may take more instructions to
perform the same task using thumb instructions than it would have using
ARM. My experiments at the time showed about 10-15% more instructions,
but each instruction is half the size, so the code took roughly 55-58%
of the memory (1.10 to 1.15 times as many instructions at half the size
each), which was a fair tradeoff. I know of one platform
that went so far as to use 16 bit memory busses, which actually made
thumb mode run much faster than ARM mode on that platform. That
@@ -2021,7 +2022,13 @@
bits you can have in that register. Note that that lower bit
is stripped off; it is only used by the bx instruction itself. The
address in the program counter always has the lower two bits zero
for ARM mode (4 byte instructions) and the lower bit zero for
thumb mode (2 or 4 byte instructions). Note that the bx/blx
instruction is not the only way to switch modes; sometimes you can
use the pop instruction. But bx works the same way on all ARM
architectures that I know of, while the other solutions (pop for
example) vary in if/how they work for switching modes depending on the
ARM architecture in question. That makes for very unportable code
across ARM if you are not careful. When in doubt just use bx.
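
The bx rule can be written down as a small C model. It only illustrates the behavior described above and is not a simulator. The names isa_state, bx_result and bx_target are made up for this sketch. The example addresses match the thumbstart and trampoline examples in this document.

```c
#include <stdint.h>

/* Instruction set state selected by an interworking branch. */
enum isa_state { STATE_ARM, STATE_THUMB };

struct bx_result {
    enum isa_state state;  /* mode after the branch */
    uint32_t pc;           /* address execution continues from */
};

/* Model of how bx treats the address in its register operand: bit 0
 * picks the mode and is then stripped off, so it never shows up in the
 * program counter. For an ARM mode target the lower two bits should
 * already be zero; this model does not check that. */
struct bx_result bx_target(uint32_t reg)
{
    struct bx_result r;
    if (reg & 1u) {
        r.state = STATE_THUMB;
        r.pc = reg & ~1u;  /* thumb instructions are 2 byte aligned */
    } else {
        r.state = STATE_ARM;
        r.pc = reg;        /* ARM instructions are 4 byte aligned */
    }
    return r;
}
```

For example, bx_target(0x8011) gives thumb mode at 0x8010, which is what the thumbstart example relies on. An even address such as 0x8018 selects ARM mode.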

Here again the goal is not to teach assembly, but you may want to
get the ARM Architecture Reference Manual for this platform
@@ -2054,19 +2061,18 @@
least try. Assembly language in general does not have a standard.
A company designs a chip, which means they create an instruction set,
binary machine code instructions, and generally they create an
assembly language so that they can write down and talk about those
instructions using mnemonics instead of patterns of ones and zeros.
And often, though not always, if that company actually wants to sell
those processors, they create or hire someone to create an assembler
and a compiler or a few. Assembly language, like C language, has
directives that are not actually code. With #pragma in C, for example,
you are talking to the compiler, not necessarily writing code.
Assembly has those as well, many of them. It is in the processor
vendor's best interest to use the same assembly language syntax for
the instructions in the processor manual as in the assembler that they
create or have someone create for them. But that manual, although you
might consider it a standard, is not. The machine code is the hard and
fast standard; the ASCII assembly language is fair game and
anyone can create their own assembly language for that processor
with whatever syntax and directives that they want. ARM has a nice
set of compiler tools, or at least when I worked at a place that paid
@@ -2084,7 +2090,9 @@
instead of @ because this ; is the proper, almost universal, symbol for
a comment in assembly languages from many vendors. This @ is not.
Combine them like this ;@ and you get code that is commented in both
worlds equally. Enough with that rant. This asm code will continue to
be GNU assembler specific, as that is the toolchain I am using. I don't
know if it works on any other assembler, but I keep the directives to a
bare minimum.

Another side effect of thumb and in particular thumb2 is that ARM
decided to change their syntax in subtle ways to come up with a unified
@@ -2210,16 +2218,16 @@
instructions or at least until I tell you otherwise. The .thumb
directive is me telling the assembler otherwise: start assembling
using 16 bit thumb instructions. Yes, the bl is actually two separate
16 bit instructions and is documented by ARM as such, but it is always
shown as a pair in disassembly. It is not a 32 bit instruction.

The .thumb_func directive tells the assembler that the label that
follows is a branch destination for thumb code: when you see this
label, set the lsbit so that I don't have to play any games to switch
or stay in the right mode. You can see that the thumbstart label
is at address 0x8010, but the thumbstart_add is 0x8011, the thumbstart
address with the lsbit set, so that when it hits the bx instruction
it tells the processor that we want to be in thumb mode. Note that
bx can be used even if you are staying in the same mode; that is the
key to it. If you have used the proper address you don't care what
mode you are branching to. You can write code that calls functions
and the code making the call can be thumb mode and the code you are
@@ -2385,16 +2393,17 @@
address 0x8024, which is a trampoline to bounce off of; that instruction
bounces us back to 0x8018, which is the ARM instruction we wanted
to get to. This is all good, this code will run properly.

You may or may not know that compilers for a processor follow a "calling
convention" or application binary interface (ABI) or whatever term you
like. It is a set of rules for generating the code for a function so
that you can have functions call functions call functions, and any
function can return values, and the code generated will all work without
having to have some secret knowledge of the code for each function
calling it. Conform to the calling convention and the code will all work
together. Now the conventions are not hard and fast rules any more than
assembly language is a standard for any particular processor. These
things change from time to time in some cases. For ARM, in general,
across the compilers I have used, the first four registers r0,r1,r2,r3
are used for passing the first up to 16 bytes worth of parameters, r0 is
used for returning things, etc. I find it surprising how often
@@ -2424,7 +2433,7 @@
Disassembly of section .text:
So what did I just figure out? Well, if I had that function in C and
used that compiler and linked in that object code, it would work with
other code created by that compiler, so that object code must follow
the calling convention. What I figured out from that trivial experiment
is that if I want to make a function in assembly code that uses two
inputs and one output (unsigned 32 bits each) then the first parameter,
a in this case, is passed in r0, the second is passed in r1, and the
@@ -2439,14 +2448,15 @@
Disassembly of section .text:
4: 44 00 48 00 l.jr r9
8: e1 64 18 00 l.add r11,r4,r3

This is not ARM but some completely different instruction set, and the
compiler for it has a different calling convention. What I see here is
that the first parameter is passed in register r3, the second parameter
is passed in r4, and the return value goes back in r11. And it just
so happens that the link register is r9.
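
To repeat this experiment on a processor of your own, a function as small as the one below is enough. The name add_two is hypothetical and is not the function used in the original experiment. The register assignments in the comment only restate the findings described in the text.

```c
#include <stdint.h>

/* A trivial function for the calling convention experiment: two unsigned
 * 32 bit inputs and one unsigned 32 bit output. Compile it with -O2 -c,
 * disassemble the object and look at which registers the add reads and
 * writes. Per the findings described in the text:
 *   ARM:           a in r0, b in r1, result in r0
 *   the other ISA: a in r3, b in r4, result in r11, link register r9
 */
uint32_t add_two(uint32_t a, uint32_t b)
{
    return a + b;
}
```

With the GNU ARM tools that could be arm-none-eabi-gcc -O2 -c add_two.c -o add_two.o followed by arm-none-eabi-objdump -d add_two.o. The file names are placeholders.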

Yes, it is true that I have not yet figured out what registers
I can modify without preserving them and what registers I have to
preserve, etc., etc. You can figure that out with these simple
experiments with practice, because sometimes you may think you have
found the document describing the calling convention only to find you
have not. And as far as preservation, if in doubt preserve everything
but the
@@ -2455,8 +2465,8 @@
return registers...
So if you have looked at my work you see that I prefer to perform
singular memory accesses using hand written assembly routines like
PUT32 and GET32. I am not going to say why here and now; I have
mentioned it elsewhere and it doesn't matter for this discussion. Let's
accept it and move on to use it in a quick thumb experiment:


baremetal > arm-none-eabi-gcc -mthumb -O2 -c fun.c -o fun.o
@@ -2567,12 +2577,12 @@
Disassembly of section .text:

So we start in ARM, use 0x8011 to switch to thumb mode at address
0x8010, trampoline off to get to 0x801C, entering notmain in ARM mode.
And we branch link to another trampoline. This one is not complicated,
as we did this ourselves right after _start: load a register with
the address ORed with one. 0x8017 fed to bx means switch to thumb
mode and branch to 0x8016, which is our PUT32 in thumb mode.

Let's go the other way, PUT32 in ARM mode called from thumb code:


baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
@@ -2620,7 +2630,7 @@
Disassembly of section .text:
And we did it, this code is broken and will not work. Can you see
the problem? PUT32 is in ARM mode at address 0x8010. Notmain is
thumb code. You cannot use a branch link to get to ARM mode from
thumb mode; you have to use bx (or blx). The bl 0x8010 will start
executing the code at 0x8010 as if it were thumb instructions, and
you might get lucky in this case and survive long enough to run
into the thumbstart code which in this case puts you right back into
@@ -2630,7 +2640,7 @@
and will cause an undefined instruction exception. If you bothered
to make an exception handler for it, you might start to see why the
code doesn't work.
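
The failure can be restated with a tiny C model. It is a hypothetical illustration, not a simulator, and the names are made up for this sketch. It captures the rule from the text that a plain bl stays in the caller's mode, while bx chooses the mode from the lsbit of the address. The immediate form of blx is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Instruction set state of the processor. */
enum isa_state { STATE_ARM, STATE_THUMB };

/* A plain bl does not change state: whatever is at the destination is
 * decoded with the caller's instruction set. */
enum isa_state state_after_bl(enum isa_state caller)
{
    return caller;
}

/* bx picks the state from bit 0 of the destination address. */
enum isa_state state_after_bx(uint32_t target)
{
    return (target & 1u) ? STATE_THUMB : STATE_ARM;
}

/* The call only works if the processor arrives in the state the callee
 * was assembled for. */
bool call_works(enum isa_state arrived, enum isa_state callee)
{
    return arrived == callee;
}
```

Calling the ARM mode PUT32 at 0x8010 from thumb notmain with bl leaves the processor in thumb mode on ARM code, so call_works() is false. Going through bx with the even address 0x8010 arrives in ARM mode as needed.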

It was very easy to fall into this trap, and very, very hard to find
out where and why the failure is until you have lived the pain or been
shown where to look. Even with me showing you where to look you may
still end up spending hours or days on this. But as you do know