# Modern AVRs and overclocking

This is written from the context of modern AVRs, though much of it applies to classic AVRs as well. This document was excised from a planned response to an issue, where it was largely off topic, but it contained enough information to be of value and hence needed to be retained somewhere.

AVRs are well known to overclock extremely well (at least at room temperature), likely because they are designed to function at temperatures of up to 125C, typically at their maximum spec'ed operating speed. This means they have oodles of headroom at the more moderate temperatures typical of non-industrial, non-automotive applications.

## Overclocking Dx-series

On AVR Dx-series parts, overclocking is achieved in two ways:

* Changing the frequency select (FRQSEL) bitfield of the CLKCTRL.OSCHFCTRLA register (see the sketch after this list). On the Dx-series, above 4 MHz the granularity of this is 4 MHz, giving 4, 8, 12, 16, 20, and 24 MHz according to the spec. In reality, there are two "secret" speeds not mentioned in the datasheet: setting the bitfield to a value 1 or 2 higher than the 24 MHz setting results in an operating frequency of 28 or 32 MHz respectively. Most parts can do either of these no problem.
* For larger overclocks, the tuning is of little use due to its limited range. However, all of these parts can take an external clock, and DD and DB parts can also use a crystal. I am aware of crystals working reliably on most E-spec (extended temperature) parts at 40 or even 48 MHz! External clocks always work better, and always have for overclocking AVRs; 48 MHz has been found stable on at least one E-spec DB (but not on an I-spec one).
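
As a minimal sketch of the first bullet, assuming an AVR DA/DB/DD running from the internal high-frequency oscillator (the default main clock), and assuming FRQSEL occupies bits 5:2 of OSCHFCTRLA as it does on the parts I've checked, the undocumented 32 MHz setting can be selected like this. `run_at_32MHz` is just an illustrative name; substitute your header's FRQSEL group-configuration constant for the raw value if it provides one.

```c
#include <avr/io.h>

// Select the undocumented FRQSEL value two steps above the 24 MHz setting
// (0x9 + 2 = 0xB). OSCHFCTRLA is under Configuration Change Protection, so
// use the protected-write helper from the AVR headers. Note this write also
// leaves RUNSTDBY and AUTOTUNE cleared.
void run_at_32MHz(void) {
  _PROTECTED_WRITE(CLKCTRL.OSCHFCTRLA, (0x0B << 2));
}
```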

## Overclocking tinyAVR parts

On tinyAVR, the internal oscillator is incredibly flexible, and is the primary method used to overclock: 1-series parts will do 24-25 MHz no problem and most will do 30, though they fall apart above that. With solid supply rails and an external clock, though, they can be pushed to 32. 2-series parts will usually work fine on the internal oscillator at 32, and collapse in the mid-30s, just about at the top of the calibration range for the oscillator. This epic compliance of the internal oscillator makes these MUCH more fun to overclock :-)

The datasheet makes a point of warning users not to change the calibration by large amounts all at once. This is not new. However, study of the arcane code written by the wizards who write that sort of thing has indicated that there is a "trick" to get around this, which I use successfully: simply follow the write to the cal register immediately with a NOP (a minimal sketch is shown below). The source of this voodoo practice is the widely used Digispark-alikes, which run at 16.5 MHz (classic AVRs: 8.0 MHz nominal, passed through a PLL that multiplies by 8 and divides by 4, with the oscillator tuned upward for a base F_CPU of 8.25 MHz; the net x2 yields 16.5 MHz, which is better for USB on the marginal oscillator of classic AVRs).
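
Here is a minimal sketch of that trick, assuming a tinyAVR 0/1/2-series whose oscillator calibration lives in the CCP-protected CLKCTRL.OSC20MCALIBA register; `write_tuned_cal` and `target_cal` are illustrative names, with `target_cal` being whatever value your tuning procedure settled on.

```c
#include <avr/io.h>
#include <stdint.h>

static inline void write_tuned_cal(uint8_t target_cal) {
  // Protected write to the oscillator calibration register, followed
  // immediately by a pair of plain 0x0000 nops per the "voodoo" above.
  _PROTECTED_WRITE(CLKCTRL.OSC20MCALIBA, target_cal);
  __asm__ __volatile__("nop \n\t nop");
}
```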

The theoretical grounding of that practice is convincing, provided that the assumptions it rests on are correct (though this is not known to be the case). The most frequent kind of incorrect execution observed is 1 bits in the result being cleared to 0 instead; this was noticed immediately when doing overclocking trials on tinyAVRs (random garbage values can also happen, though I can't rule out the possibility that those came from an intermediate value experiencing 1->0 errors), so the voodoo does appear to be supported. If the "no-zero-to-one-errors" conjecture is true, and if the assumption that errors induced by an abrupt clock change are similar to overclock-induced errors is valid, that would make a NOP (or possibly two NOPs) clearly the correct thing to place after a large change to the clock speed - not a _NOP2(), which is an rjmp .+0 (a great way to get a 2-cycle nop from a single instruction word, commonly used in cycle-counting time-critical code), but _NOP(); _NOP();, an actual pair of 0x0000 nop instructions (the contrast is sketched below). Not only does a nop not do anything, but because its opcode is 0x0000, even if the instruction fetch glitches, a glitch that can only clear 1 bits to 0 cannot transform the nop into anything else. However, if those assumptions are not valid, the conclusion is not supported either.
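
To make that parenthetical concrete, here is the distinction as code. `_NOP()` comes from avr/cpufunc.h; the `_NOP2()` definition below is an assumption for cores that don't already provide it (the text above describes it as an rjmp .+0).

```c
#include <avr/cpufunc.h>  // _NOP(): a single-cycle nop, opcode 0x0000

#ifndef _NOP2
  #define _NOP2() __asm__ __volatile__("rjmp .+0")  // 2-cycle nop in one word, opcode 0xC000
#endif

void nop_flavors(void) {
  _NOP(); _NOP();  // preferred after a big cal change: all-zero opcodes survive 1->0 glitches
  _NOP2();         // fine for cycle-counting delays, but its opcode has set bits, so a
                   // glitched fetch could decode it as a different instruction
}
```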

## More about how crazy the tinyAVR internal oscillator is, and what tools could make overclocking more useful

Modern tinyAVRs are more interesting to overclock than Dx (on Dx, the cal is pretty worthless: the granularity is too large to really trim the oscillator accurately, hence autotune is of limited value, and there are so few steps that they can't swing the speed far enough to expose new practical clock speeds). Of course you can overclock a Dx to 32 MHz just by setting a value 2 higher than the value for 24 MHz, and that typically works at room temperature (and that's as far as it goes - after that, the last 4 settings just repeat); I've also got parts that run at 48 MHz from an external clock, fully twice the spec! On the tinies, though, the internal oscillator is nuts: 64 steps from 4/8ths of nominal up to 13/8ths of nominal on the 1-series, and 128 steps from around 5/8ths through 15/8ths of nominal on the 2-series (15/8ths of a 20 MHz nominal is 37.5 MHz, which lines up with where the 2-series collapses in the mid-30s).

I have seen only one part that would reproducibly run the cal routine all the way up to the maximum, with 20 MHz selected, while remaining stable enough to run my tuning sketch without apparent errors (a couple of others would occasionally make it through, but picked up some errors, which means they could not be used at this speed); that one part also had an unusually slow internal oscillator, such that it simply couldn't reach high enough speeds to malfunction at room temperature. The transition from no apparent errors to very frequent visible errors happens over a change of less than 1 MHz, but that still means these parts all have several cal settings at which they are struggling to various extents. That would be an interesting laboratory for exploring the behavior of AVRs that fail to execute instructions correctly because their operating conditions have been exceeded. IMO, an ideal investigation would need to know which instructions are most sensitive. I'm imagining something like this: using a part known to be in the struggling regime at a certain cal setting, temperature, and voltage, start at a normal speed, then run a test function written in asm for each instruction. Part of this could be procedurally generated, and should be, as a good test would be quite long. In inline assembly you would push everything, ldi a start value into some registers, ldi the new cal value and the CCP signature, then out to CCP, sts the new cal, nop nop, and then run a long sequence of the same instruction with minor changes (a skeleton of this is sketched below).
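
A skeleton of that harness might look something like the following. This is a hedged sketch, not working test infrastructure: it assumes a tinyAVR whose calibration register is CLKCTRL.OSC20MCALIBA, it lets the compiler handle register saving instead of pushing everything by hand (the full version described above would push and later scan all registers), and it stands in for the procedurally generated run with two placeholder subis. `stress_one_block`, `hot_cal`, and `safe_cal` are illustrative names.

```c
#include <avr/io.h>
#include <stdint.h>

// hot_cal: a setting known to put this specimen in the struggling regime.
// safe_cal: a setting known to be solid ("clean territory").
uint8_t stress_one_block(uint8_t hot_cal, uint8_t safe_cal) {
  uint8_t result;
  __asm__ __volatile__(
    "ldi  %[res], 0xA5       \n\t"  // seed value to mangle
    "out  %[ccp], %[sig]     \n\t"  // unlock protected I/O (valid for 4 instructions)
    "st   %a[cal], %[hot]    \n\t"  // jump to the struggling cal setting
    "nop                     \n\t"  // the 0x0000 padding discussed earlier
    "nop                     \n\t"
    /* ...procedurally generated run of the instruction under test, e.g.: */
    "subi %[res], 0x01       \n\t"
    "subi %[res], 0x02       \n\t"
    "out  %[ccp], %[sig]     \n\t"
    "st   %a[cal], %[safe]   \n\t"  // back to clean territory before reporting
    "nop                     \n\t"
    "nop                     \n\t"
    : [res]  "=&d" (result)
    : [ccp]  "I" (_SFR_IO_ADDR(CCP)),
      [sig]  "d" ((uint8_t)CCP_IOREG_gc),
      [cal]  "z" (&CLKCTRL.OSC20MCALIBA),
      [hot]  "r" (hot_cal),
      [safe] "r" (safe_cal)
  );
  return result;  // compare against the value obtained at an in-spec cal setting
}
```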

For example, starting with 4 bytes of data to mangle, you then have a sequence of subi's. Declare that there shall be 256 subis, with all possible immediate values; each data byte could be the destination of 64 of those (probably distributed randomly), with the order of the immediate values randomized as well. Then turn the CPU speed back down - you should be running in clean territory now - store that 4-byte value to memory, and restore all the registers, but pop each into r0, compare it with the value in its home register, and then mov it there. If any register you didn't use has changed, that's something to investigate. For example, suppose you targeted registers r20, r21, r22, and r23, and comparing what wound up in them against the results at in-spec clock speed shows that r20 was different; your register scan also reveals that r16 was wrongly changed; the two are wrong by the same amount; and that amount is one of the numbers you tried to subi from r20. You would then suspect that a 1 bit in the register field of the subi opcode was misread as a 0, and the operation was otherwise carried out successfully. On the other hand, if you saw no damage to other registers and evidence of only a single error, but the value was not one of the immediates subtracted from that byte, nor the sum of any two of them (note: the converse is not true for a single sample, as the chance of a false positive is pretty high), you would instead suspect that the immediate value or the result itself was what got mangled; if the error was off by a power of two, you would strongly suspect that. Multiple runs under the same conditions, using several randomly generated sequences, each run multiple times, would reveal whether the errors were distributed randomly across the opcode, or whether there was a correlation between the opcode for that instruction (that is, between the operands) and the chance of error (both are plausible a priori). This could be repeated for each instruction (procedural generation of the asm, like I said, is a must); a sketch of the post-run analysis follows the list below. At the end you would be in a position to determine:

1. What instructions are the most likely to fail to execute correctly?
2. What instruction operands are most likely to be misinterpreted (or whether the operands don't matter)? Then, if the same process were performed on a few specimens, you'd learn the most important things:
3. Whether the same instructions are the weakest ones on all devices, or whether this varies part-to-part (I suspect it is the same instructions).
4. Whether that can be put into practice to develop a single short routine, composed of a large number of sensitive instructions, which gives a trustworthy answer to the question "Is the chip stable at the current operating conditions?"
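
For the subi example, the post-run bookkeeping could be sketched like this. All names here are hypothetical: `hot` and `ref` are the values one target byte ended up with at the struggling and in-spec cal settings, and `imms` is the list of 64 immediates the generator assigned to that byte.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
  uint8_t delta;            // how far "hot" is above "ref" (mod 256)
  bool matches_one_imm;     // delta equals a single assigned immediate: suspect a
                            // dropped/misrouted subi (check other registers for -delta)
  bool matches_sum_of_two;  // weaker evidence of two dropped subis (false positives likely)
} subi_diag_t;

static subi_diag_t diagnose_subi_target(uint8_t hot, uint8_t ref, const uint8_t imms[64]) {
  subi_diag_t d = { (uint8_t)(hot - ref), false, false };
  for (uint8_t i = 0; i < 64; i++) {
    if (imms[i] == d.delta) d.matches_one_imm = true;
    for (uint8_t j = i + 1; j < 64; j++) {
      if ((uint8_t)(imms[i] + imms[j]) == d.delta) d.matches_sum_of_two = true;
    }
  }
  // Neither flag set (or delta is a single-bit value): suspect the immediate or
  // the result itself was mangled rather than the register field of the opcode.
  return d;
}
```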

With such a tool in hand, an expanse of uncharted territory would be ripe for exploration: you could run the clock speed up like a tuning sketch does, recognize when the part had started struggling, and make a 3-dimensional plot of that Fmax against temperature and voltage. Finally, if enough data were taken at one of the "speed grade" voltages over a range of temperatures, you would likely get a plot you could fit a curve to and extrapolate to the manufacturer's spec'ed maximum speed, revealing how much headroom they designed in. From there you could likely synthesize a function h(V, T, F_CPU) indicating how comfortable the chip is running in those circumstances; that is, for all V and T you would know Fmax(V, T), so the headroom would be h(V, T, F_CPU) = 1 - F_CPU/Fmax(V, T). Fmax(V, T) would of course have specimen-dependent constants. The number of such constants (assuming they're not exactly correlated) is the number of points on the (V, T) plane at which you'd need to measure Fmax for any given specimen in order to predict its headroom under all conditions.
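
As a trivial sketch of that metric (the `fmax_model_t` callback is hypothetical, standing in for whatever specimen-calibrated Fmax(V, T) model you end up fitting to measured data):

```c
// h(V, T, F_CPU) = 1 - F_CPU / Fmax(V, T): 0 means no margin, 0.5 means the part
// could run twice as fast as it currently is under these conditions.
typedef double (*fmax_model_t)(double volts, double temp_c);

static double headroom(double f_cpu_hz, double volts, double temp_c, fmax_model_t fmax_hz) {
  return 1.0 - f_cpu_hz / fmax_hz(volts, temp_c);
}
```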

## Are temperature grades still real?

Since the AVR DA-series was released, newly released parts have not had the temperature grade marked on the package. This is very unfortunate (Microchip support will tell you the grade if you give them the lot number), since it opens the door to easier misconduct by turnkey PCBA manufacturers (often based in China, where unethical cost cutting is enthusiastically embraced). If my application required operation at 125C, and I wanted to be certain that the products were using the parts I specified (for example, to ensure that I couldn't be held liable for negligence), I would now have to write down all the almost-unreadable lot numbers, send them to Microchip support, and have them tell me which temperature spec they are. Assuming Microchip answers these honestly (which it likely does when putting things in writing, since they don't want to be held liable themselves, and they have every incentive to help call out the fraudsters who are pocketing profits that could be theirs), this is still a lot more work than just verifying that the letter on the package is the correct one. The letters could still be faked, though E and I are less effective in that regard than the old Atmel system, which used F, N, and U for the temperature grades: you can turn an I (as written on parts) into an E just by adding three short lines, while turning an N or U into an F requires erasing lines.

Why was this change made? Did they:

1. Want to support fraudulent activity? To what end? This seems implausible, because everyone except the fraudsters loses.
2. Dislike their customers and want to make their lives harder? To what end? This seems contrary to their interests.
3. Mindlessly follow an inflexible product-marking doctrine? This is plausible, and could be excluded or supported by looking at other Microchip products and seeing whether they mark different grades.
4. Want to continue charging some customers more for the same product, while eliminating the hassle of tracking twice as many part numbers until the very end of the process, when the parts are listed as being in stock? Then they could declare each lot to be E- or I-spec only when it was put up for sale, which would make it easy to adjust "production" in response to customer demand. It's also notable that the VAO-qualified parts are, in many part families, only produced on request. What is special about VAO? Well, the QFNs have wettable flanks, but that just means they milled away a little bit of the package. The differences in the VAO parts seem to be negligible, other than a nebulous "different qualification process"... Wouldn't it be much easier to do that if they just added that post-process milling step to some normal chips when someone asked for VAO parts?

It thus stands to reason that, if Microchip's process technology had bested Atmel's by a sufficient margin, they might be producing only E-spec parts without having to try. Since they'd still want to be able to charge more for as many parts as they could, without the prices scaring off the cost-sensitive customers, they might want to continue selling parts in two different grades. Yet each lot is homogeneous in terms of temperature grade. How could we tell these scenarios apart?

* A difference in price between the temperature grades that varied significantly between different parts would suggest the price difference was not due to process differences. A genuine yield difference would give P_E = P_I * (Y_I / Y_E), where Y_I and Y_E are the yields of the two grades. For a given process node, yield can in turn be approximated by Y ≈ Y_process^A, where A is the normalized die size and Y_process is the yield for a die of unit normalized size, which depends on the temperature grade. So if we define U = Y_I,process / Y_E,process, the first equality becomes P_E = P_I * U^A. That is, there should be a very clear relationship - for parts made on the same process - between the premium as a fraction of the price and the size of the die. And if the processes were not identical, P_E = P_I * U^A * C, where C is a constant equal to the ratio of the costs of the two production processes. Since the flash and RAM make up a large portion of the die, we would then expect that the smallest-flash versions would never carry a larger premium on the high-spec variant than the large-flash versions: when A is smaller, U^A * C is closer to unity. With a sack of files (the tool, not the digital kind) and some sacrificial chips, one could experimentally characterize the normalized die area for a given chip design across the flash sizes. That would allow you to make predictions for the normalized premium on the E/F-spec parts as a function of U. If we assume that U >= 1, the bound at the extreme of U = 1 would be a constant premium; if U > 1, the premium would get larger as the die got larger - and from how quickly it grows, we could say something about how large U is. (The model is restated compactly below.)
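
The same model written out compactly (this is just a restatement of the relationships above; A is the normalized die area and C the ratio of the process costs):

$$
P_E = P_I\,\frac{Y_I}{Y_E},\qquad Y \approx Y_{\mathrm{process}}^{\,A},\qquad U \equiv \frac{Y_{I,\mathrm{process}}}{Y_{E,\mathrm{process}}} \;\Rightarrow\; P_E = P_I\,U^{A}\ \ \bigl(\text{or } P_E = P_I\,U^{A}C \text{ if the processes differ}\bigr)
$$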