Robust/production checklist

I hear of a lot of people deploying devices they designed, at a scale of hundreds of units, who have made clear that they have not done due diligence with regard to the design of the firmware and/or hardware, and have barely tested anything.

I've tried not to stray too far from the main subject of DxCore - without too much success, because a lot of the things that people need to do and aren't doing are not specific to DxCore.

This is a work in progress, and is not exhaustive. It is however a starting point, and it covers a lot of things that are an absolute 100% must do for writing effective embedded code and deploying it successfully.

Software design

These parts have multiple features for enhanced robustness. You should, like, use them... (hardly anyone does).

Use BOD if you can spare the power for sampled mode, at least while the chip is awake

See the datasheet for your part to check the exact power consumption and compare to your power budget. If you can - use it! Without BOD, there is nothing to keep the chip from continuing to try (and fail) to operate despite insufficient power supply voltage to do so.

BOD will keep the chip in reset when it knows the voltage is too low to keep the chip running.

A bootloader is likely the right approach if you want end users to be uploading updates

There's a Python library to upload via STK500 (the protocol we use), which would allow you to make an appropriate updater for your device.

That is the only case in which it makes sense to use Optiboot on a device you are going to be selling (except a development board, but in that case, your users are hopefully able to manage "burn bootloader" if they want it).

Use the WDT whenever you are not sleeping if preventing hangs is a priority

It can be inconvenient to have to keep feeding the dog - but this is a nearly bulletproof way to keep your code from getting hung! Watchdog timer use is very widespread in commercial products. The windowed feature gives you a way to fix the hole in this that existed on classic AVRs - the potential for the code to get hung in an area where the WDT was being repeatedly reset, but it had lost the ability to do anything useful.
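To make the difference between a plain and a windowed watchdog concrete, here is a rough host-side model in Python (not firmware, and not DxCore code). It assumes the usual window-mode behavior: a feed that arrives during the closed period causes a reset, and so does failing to feed before the open period ends. The function name `windowed_wdt` and the millisecond values are made up for illustration; check the datasheet for the real period options.

```python
def windowed_wdt(feed_times_ms, closed_ms, open_ms):
    """Return (still_running, reason) for a list of watchdog feed times.

    Simplified model: the counter restarts at t=0 and at every accepted feed.
    A feed earlier than closed_ms after the previous one is "too early";
    no feed within closed_ms + open_ms is "too late". Either resets the chip.
    Only the gaps between listed feeds are checked, not the time after the last one.
    """
    last = 0
    for t in feed_times_ms:
        elapsed = t - last
        if elapsed < closed_ms:
            return False, f"reset at {t} ms: fed during closed window"
        if elapsed > closed_ms + open_ms:
            return False, f"reset at {last + closed_ms + open_ms} ms: timeout"
        last = t
    return True, "still running"


# Healthy main loop: feeds every 100 ms, inside the open window.
healthy = windowed_wdt([100, 200, 300], closed_ms=50, open_ms=200)

# Stuck in a tight loop that still feeds the dog: caught by the closed window.
runaway = windowed_wdt([1, 2, 3], closed_ms=50, open_ms=200)

# Hung without feeding: caught by the ordinary timeout.
hung = windowed_wdt([100, 500], closed_ms=50, open_ms=200)
```

With a closed period of zero this degenerates to the classic watchdog, and the `runaway` case would pass unnoticed, which is exactly the hole described above.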

Read the new reset reference, and include the "recommended" override of init_reset_flags

This is described in the Reset Reference, and provides a strong defense against "dirty resets" regardless of where they come from. Virtually nobody in Arduino-land checks the reset flags during startup, clears them, and takes corrective action when none are set. That accounts for a great many of the times a part gets into a "bad state" after an adverse event and never comes back until you power-cycle it.

That said, if you ever experience dirty resets that get sorted out by that, you should seek to determine what is causing them. A corrupted pointer might point anywhere. If the flash is mostly empty, execution would usually end up after the app and skid along the 0xFF's until hitting the reset vector or the first vector (which isn't defined, so it's badisr, which is a dirty jump to 0x0000), but it might not point to after the flash. Try to figure out what corrupted the pointer, because that shouldn't be happening at all! Any system that is experiencing unexpected resets (WDT timeouts or dirty resets) should be viewed with suspicion until the cause is better understood. ("It resets sometimes when the air conditioner goes on because this second rate power supply doesn't deal with the transients very well" might very well be acceptable, at least if you're not shipping it with that power supply, but "It resets every day or so, but the code handles it, so no problem" never is, at least not until you know what it's doing.)

Disable digital input buffers of unused pins, or pull them up, or set them output. Don't let them float

Microchip advises that minimum power consumption requires disabling the digital input buffer on all floating pins. Otherwise, the pins must not be allowed to float. Pins that are used as analog inputs should also have the digital input buffer disabled.

Never allow compile warnings to go uninvestigated; more often than not they indicate a defect

I think I've made it so you can't disable them in the Arduino IDE now. Compile warnings are a HUGE red flag. At best they represent haphazard programming practices. Frequently they are latent (or active) bugs - what sets them apart is that unlike most bugs, these are carrying big neon signs telling you where they are.

There's a CRC check feature on these parts

It can be set to automatically check the flash and compare it to a checksum stored at the end. We don't support this directly (or generate a hex with the CRC), but literally all you need to do to force it on, once you calculate the CRC and put it at the end of the file you upload, is to set the two high bits of SYSCFG0 to something other than 11 (see the datasheet for details). If the firmware is misuploaded, or if it's corrupted by cosmic rays or god knows what else, the device will sit there bootlooping, rather than working well enough that the user believes it's fine until it suddenly fails catastrophically at some crucial moment. Depending on the type of product and intended use case, this could be either a foolish obstacle that you don't want to use, or a necessary countermeasure that you absolutely must use.
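As a small illustration of the "two high bits of SYSCFG0" step, the sketch below computes a new fuse byte on the host in Python. The field names (`CRCSRC_FLASH` and friends) and their encodings are my recollection of the datasheet's CRCSRC field, and `0xC8` is only an example starting value, so treat all of them as assumptions to verify against the datasheet for your part. The helper `with_crcsrc` is hypothetical and is not part of DxCore or any upload tool. It does not compute the CRC itself.

```python
# Assumed encodings of the CRCSRC field (bits 7:6 of SYSCFG0); verify in the datasheet.
CRCSRC_FLASH = 0b00    # check the whole flash
CRCSRC_BOOT = 0b01     # check the boot section only
CRCSRC_BOOTAPP = 0b10  # check boot and application code sections
CRCSRC_NOCRC = 0b11    # CRC check at startup disabled


def with_crcsrc(syscfg0, crcsrc):
    """Return syscfg0 with its two high bits replaced by crcsrc, other bits untouched."""
    if not 0 <= syscfg0 <= 0xFF:
        raise ValueError("fuse value must fit in one byte")
    if not 0 <= crcsrc <= 0b11:
        raise ValueError("crcsrc is a 2-bit field")
    return (syscfg0 & 0x3F) | (crcsrc << 6)


# Example: start from a fuse byte whose high bits are 11 (no CRC check)
# and request a check of the whole flash.
new_fuse = with_crcsrc(0xC8, CRCSRC_FLASH)
```

Remember that once the check is enabled, a flash image without a matching checksum at the end will not run, which is the point, but also means you want the hex-generation step sorted out before writing the fuse.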

Hardware design

Make sure I2C always has appropriate pullups

Internal pullups are not sufficient. See the description of usePullups() in the Wire Documentation.

Read the errata FIRST

As in, before you're tripping over them and it's gating development or inconveniencing users.

Always hope for fixes to them, but don't depend on fixes or expect them, because we have no idea when those fixes will be coming. Sometimes that future die revision never happens. Once you know what the errata are, you can plan around them and ensure that they don't become showstoppers when you run into them later. Also be aware that the behavior may change in a later revision, so you should not depend on the bug being there forever either.

Capacitors are good

Don't skimp on the decoupling caps. Use one 0.1uF cap per pair of power and ground pins, and 1uF on the MVIO pin of parts with that feature (other side to ground), and FFS (for Faraday's Sake) be sure to include some "board level" decoupling. The datasheet specifies a minimum of 1uF (for just the chip); in practice, this should really be 4.7 to 10uF, assuming the same supply doesn't have any large loads attached to it. If it does, use bigger caps. USB has a limit on how much it allows - only 10uF before you need inrush current limiting. If it isn't powered by USB though, you can be a lot more liberal with the capacitors. I put 47uF on the PCB I use for controlling LEDs, in addition to several times that along the strings of lights. Caps under 10uF should probably be ceramic. For larger caps you can use aluminum electrolytics, though it's recommended to still have a ceramic cap in the 1-10uF range. In the most demanding designs, it is very common to parallel multiple caps of different values. Many chips spec a 0.1uF and a 0.01uF decoupling cap pair on each power pin, and those that don't usually at least "recommend" it. These are especially helpful for dealing with high frequency noise (i.e., sudden sharp transitions). The lower value caps are better able to suppress high frequency noise.

Don't use overclocked parts in products

The Dx-series parts overclock fantastically well. That should not be an invitation to use them like this in hardware you ship out to users who aren't embedded systems experts, unless your customers fully understand what they're getting themselves into (i.e., you're selling dev boards to embedded hardware people, in which case the standards are lower anyway). I will have overclocked Azduino Nano DB boards for sale. But I will make very clear that that's an overclock, and I wouldn't if the bootloader ran from the crystal (that's why I didn't do it on certain classic AVRs where I wanted to), or if there was anything to keep them from uploading code that ran from the internal oscillator instead. In all other cases, it should be avoided if at all possible. If it's not possible to avoid, you need to take the testing to a higher level than usual, and keep the overclock small. Would I do 32 MHz if I really needed it? Sure, okay, if I had to (I use that most of the time personally, generally from the internal oscillator). Would I do 40 MHz on anything other than a development board? Hell no! In any event, if this is a possibility, be sure to use extended temperature parts - neither temperature grade will work across the full range when overclocked, but if you start with a wider rated operating range at 24 MHz, it stands to reason that you'll have a wider range at 32 MHz as well.

Check the thickness of the wire you're using

I don't mean reading the number on the insulation. Anyone can print "22 AWG" on wire, but that doesn't make it 22 AWG. Strip it, measure the bundle of wire, separate it into strands, count them, measure one strand as accurately as you can (you need to measure to at least 0.01 mm), and calculate what gauge it actually is. Some dishonest Chinese manufacturers (and probably some in other countries, but China is by far the most prolific producer of everything) frequently use wire 2-6 AWG smaller than the number they print on the insulation. In order to keep people from immediately noticing (because it's too thin) they make up the difference with extra thick insulation. Cheap hookup wire (UL1007) on AliExpress, for example, is usually 3-4 AWG too thin. Some pre-wired JST-SM connectors I've gotten had what appears to be 32-gauge wire marked 24. I didn't discover this until the load current through the too-small wires had heated them up and turned the red insulation black; when investigating the subsequent failure, I was mystified as to why there were two black wires instead of a black and a red...
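The "calculate what gauge it actually is" step can be sketched as follows in Python. It uses the standard AWG definition (36 AWG is 0.127 mm in diameter, and the diameter grows by a factor of 92^(1/39) per gauge step) and treats the stranded bundle as a solid wire with the same total copper area. The function name `stranded_awg` is made up for this example.

```python
import math


def stranded_awg(strand_count, strand_diameter_mm):
    """Return (copper_area_mm2, equivalent_awg) for a stranded conductor.

    The equivalent gauge is that of a solid wire with the same total
    cross-sectional area as all the strands combined.
    """
    area_mm2 = strand_count * math.pi * (strand_diameter_mm / 2) ** 2
    equivalent_diameter_mm = 2 * math.sqrt(area_mm2 / math.pi)
    # AWG definition: d(n) = 0.127 mm * 92 ** ((36 - n) / 39), solved for n.
    awg = 36 - 39 * math.log(equivalent_diameter_mm / 0.127, 92)
    return area_mm2, awg


# Genuine 22 AWG stranded wire is commonly 7 strands of 0.254 mm (30 AWG) each.
area, awg = stranded_awg(7, 0.254)
print(f"{area:.3f} mm^2, about {awg:.1f} AWG")
```

The result is fractional, and a larger number means thinner wire. If the computed value comes out several gauge numbers higher than what is printed on the insulation, the wire is undersized.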

Apparently it's not just wire being exported that's like this - I hear stories of assembled equipment also made with the undersized wire, and failing as a result. The impression I get is that some of the manufacturers using it don't realize it's undersized (likely they are aware that undersized wire exists, were assured that the stuff they were getting was not, and maybe even measured it back when they started buying from that supplier...)

Do not use dupont line in production devices

At least not the cheap crap normally sold as that. The cheap dupont line is for prototyping only, and barely suitable for that. Real DuPont line with real DuPont terminals is fine (that division is now run by Amphenol, the line is called MiniPV), as long as you make sure you get ones made by Amphenol, obtained through a reputable western supply house. They should cost a minimum of 10 cents each in quantity for no-frills terminals, and typically more like 20-30 cents for decent gold plated ones. Use the highest spring tension ones unless you have more than 20 pins in a connector, and make sure that you or your manufacturing partner is doing a good job crimping them on (go try the pull test on some cheap dupont line - most of it doesn't even pass the pull test). If you look at the terminals of real DuPont connectors, they are obviously not the same design as the cheap ones.

If any wires are soldered to a PCB

You must provide some sort of strain relief. If the wires are held immobile (for example, the PCB might be encased in glue-lined shrink tube), that is sufficient for non-critical applications. In low cost consumer devices, hot gluing wires to the PCB they're soldered to is very common. This is why: when wires are soldered, some solder wicks up between the strands, and where that rigid, soldered section of wire meets the unsoldered flexible strands, there is a weak point where it can snap very easily, even with careful handling.

Murphy's Law applied to Connectors

Any connector that can be plugged in backwards will be. Try to design the pinout of any non-polarized connectors so that it's safe for the device. Consider the 6-pin classic ISP connector. Notice how Reset and Vcc are on opposite corners? If it is rotated 180 degrees, Vcc will be on reset, but that's fine, reset can take up to 12V because it's used for HV programming. And reset is the only pin driven high when idle, but it will be on Vcc, which will be unharmed (if not functional) like that. Or my 3-pin UPDI header: ground in the middle, Vcc and UPDI on the sides. Plugging it in backwards connects power to UPDI and UPDI to power. There's almost always a current limiting resistor somewhere on the UPDI line (there should be!), and the modern devices are much more forgiving of current through the clamp diodes than classic AVRs are, often being rated for a max of 15-20mA, and on the tinyAVRs, it's also the HV pin (otherwise, it has protection/clamp diodes, but that's what the resistor limits the current through). Don't use two connectors of the same type, same gender, and same number of pins on one device if they're not interchangeable. It is worth using a larger connector and leaving one slot empty. You can cut the pin off the socket and fill the empty hole in the housing with glue (the plastic polarizing pins, if you can find them, are a lot more graceful, but they seem to be scarcer than hen's teeth).

If you can't keep people from reversing a connector, and you can't make it harmless if they do so through pinout choice, or don't want to because it would require too many other compromises, how about a diode in series with the problem wire (likely power)?

The Second Law of Connectors

You will have one or more connectors that carry power. Connectors get abused over time, and foreign objects can get into them. You do not want them to cause a short when they do. There are many connectors that manage to keep both sides inaccessible (for example, USB). With other connectors, the side which supplies power should generally be the female - it is almost always easier for the male terminals to be shorted by contact with the environment than female terminals. Also, the convention is that the negative side is ground, and is considered less hazardous. That is why barrel jacks are almost always center positive.

Use thicker traces for power

Both compared to data traces, and compared to what calculations show is required. The board house doesn't charge extra for making the traces wider, so why do people design so many things as if it does?
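For reference, "what calculations show is required" usually means something like the widely used IPC-2221 approximation, sketched below in Python. This is a rough rule of thumb rather than a guarantee, the function name `min_trace_width_mm` and the default 10 C temperature rise are arbitrary choices for illustration, and the result should be treated as a floor to exceed, not a target.

```python
def min_trace_width_mm(current_a, temp_rise_c=10.0, copper_oz=1.0, external=True):
    """Estimate the minimum trace width using the IPC-2221 approximation.

    I = k * dT**0.44 * A**0.725, with A in square mils,
    k = 0.048 for external layers and 0.024 for internal layers.
    """
    k = 0.048 if external else 0.024
    area_mil2 = (current_a / (k * temp_rise_c ** 0.44)) ** (1 / 0.725)
    thickness_mil = 1.378 * copper_oz  # 1 oz/ft^2 copper is about 1.378 mil thick
    width_mil = area_mil2 / thickness_mil
    return width_mil * 0.0254          # mils to millimetres


# A 1 A trace on an outer layer of 1 oz copper, allowing a 10 C rise.
calculated = min_trace_width_mm(1.0)
print(f"calculated minimum: {calculated:.2f} mm")
```

Since width is free, doubling or tripling the calculated figure wherever the layout allows costs nothing and lowers both the voltage drop and the temperature rise.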

Don't use too-good-to-be-true batteries

Too-good-to-be-true is hard to judge sometimes. With batteries, it's not. Battery capacity has always been one of the most dishonest specs in electronics, with even reputable companies being misleading if they can do so without explicitly lying. For example, they quote the capacity at some tiny load as their headline spec, so the fact that the internal resistance skyrockets as it discharges is not as visible, or if the self discharge is horrendous, they'll find the test load that maximizes the apparent capacity. But vendors who ship batteries from foreign countries don't need to get sophisticated about it. They just make numbers up, while using low quality, used, or barely functioning product. The top of the line Japanese and Western manufacturers have as of 2021 been squeezing maybe 3500 mAh out of an 18650-size lithium-based rechargeable. There are a large number of third rate castoff and used batteries sold online, rewrapped ("Ultrafire" is the usual unintentionally ironic branding - and no, it's not one company using that name...) with capacities as high as 9900 mAh! Generally, if the capacity is higher than what Panasonic's best in that size is, the actual capacity is inversely proportional to the claim. While many of them are just old laptop batteries rewrapped and resold, the most extreme examples consist of a hollow metal housing containing an itty bitty LiPo battery. 9900 mAh on the wrap, 90 mAh battery inside. Unlike the case of the wire, there is no doubt that the sellers are very aware that they're selling crap. I think the outrageous capacities may be intended to make anyone who knows what they are doing stay away.

Speaking of batteries

If you have a stack of batteries and you need to charge them in place, you must use a balancing charger if they are in series. There are turnkey solutions available for affordable prices; this is a hard area of design because doing it wrong can start a fire, so it's best to leave that design work to someone who specializes in that. If they are in parallel, you can charge them only if the charging current and load current are both small compared to the maximum rated load of the batteries.

MOSFET specs: Id at Tc

Many MOSFET manufacturers, for MOSFETs with a heat spreading plate (e.g., TO-220), specify as their headline current spec a maximum continuous drain current "At Tc = 25C" - numbers which are invariably very impressive. They are also utterly useless. It takes some thought to realize what that actually means: Id at Tc is the maximum drain current while the case is held at a constant temperature. Put another way, if a magic heatsink that kept the FET case at 25C on the outside was being used, with any more than this much current, even that wouldn't be able to keep it from overheating internally and failing, because of the finite thermal conductivity of the package. Unless you have one of those magic heatsinks (can you tell me where to buy some if you do? I need one for my perpetual motion machine), it is not a useful value for specifying a MOSFET. The maximum drain current at Ta, on the other hand, is much closer to reality (in the event that it's specified - it usually is for small FETs, but often not for big ones). Be sure to read the footnotes for the conditions they used. They specify the details of the mounting, because it relies on using the PCB as a heatsink.
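A rough worked example in Python shows how large the gap between the two ratings can be. It assumes the simplest steady-state thermal model, Tj = Tref + I² × Rds(on) × Rθ, where Rθ is junction-to-case when the case is held at Tref and junction-to-ambient when the part sits in free air. All the numbers are hypothetical and not taken from any datasheet, and the function name `max_drain_current` is made up for this sketch.

```python
import math


def max_drain_current(tj_max_c, t_ref_c, r_theta_c_per_w, rds_on_ohm):
    """Largest continuous current that keeps the junction at or below tj_max_c.

    Allowed dissipation is (Tj_max - T_ref) / R_theta, and conduction loss
    is I**2 * Rds(on), so I = sqrt(P_max / Rds(on)).
    """
    p_max_w = (tj_max_c - t_ref_c) / r_theta_c_per_w
    return math.sqrt(p_max_w / rds_on_ohm)


# Hypothetical TO-220 part: Tj(max) 175 C, Rds(on) 10 milliohm when hot.
TJ_MAX_C = 175.0
RDS_ON_OHM = 0.010

# "Id at Tc = 25C": case magically held at 25 C, junction-to-case 1.0 C/W.
id_at_tc = max_drain_current(TJ_MAX_C, 25.0, 1.0, RDS_ON_OHM)

# Same part in 25 C air with no heatsink, junction-to-ambient 62 C/W.
id_at_ta = max_drain_current(TJ_MAX_C, 25.0, 62.0, RDS_ON_OHM)

print(f"Id at Tc: {id_at_tc:.0f} A, Id at Ta: {id_at_ta:.1f} A")
```

With these made-up numbers the headline figure works out to roughly 122 A while the free-air figure is under 16 A, and that is before derating for a warmer ambient. Real parts may also be limited by the package leads and bond wires, which this model ignores.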

I wrote a document going into great depth about MOSFETs (since I sell them mounted on breakout boards in my Tindie store) - it had become clear from the questions I was getting that... people didn't understand what the limitations were. That document covers them in great detail.

Automotive 12V power is notoriously noisy

You need far more filtering than usual to get a safe 5V from automotive 12V.

Testing

Does anyone test their designs credibly these days? A lot of the embedded devices I've been around are like "developer tested" software. That is, the testing, if it happened at all, didn't go beyond verifying that the final bug they fixed didn't occur (under the same conditions at least) and a sanity check before it was declared working. It generally begins in earnest only when several users have made the same complaint and the designer realizes there is a problem and has not the faintest clue what it is or how to fix it - at least within startup-scale companies. One of the advantages of a larger, slower moving company is that they have the resources to employ people whose job it is to test stuff, and they won't release stuff that hasn't been given basic testing!

Cover the corner cases

Okay, so it works under ideal conditions. Test at the ends of the range you expect it to encounter. For example, if it's powered by batteries, test it with a dying battery. Test it with a fully charged one. If it is supposed to charge the battery too, make sure it can drain the battery such that it stops functioning due to low voltage, then connect it to the charger, and make sure it comes back up cleanly. A particular concern is when the battery is "dead" but power is still reaching components (did you forget to include an undervoltage lockout or overdischarge protection? Most battery chemistries need something like that). Will it come back up cleanly when power is connected to charge the battery? Or will one of the parts not get reset because the voltage didn't fall below what triggers its power-on reset, but did fall below the voltage that it needs to keep its state in volatile memory? Remember that bit about brownout detect above!

Test the extremes of the temperature range

Put it in the freezer if it's going to be used outside in the winter - some crystals stop working if they get too cold. Run it at a higher temperature than you imagine likely.

If any part emits heat, test on a hot day

If it doesn't work, you need to either solve the technical problem, or clearly document the limitations. In warm conditions, are any heatsinks you're using sufficient? How close does the temperature get to the ratings?

Shake, poke, drop

Everything gets abused. Make sure your physical assembly is sufficiently robust. Shake it while it's running and make sure it doesn't reset (unless that's what it's designed to do, or it is too heavy to shake). Poke and prod it while it's running (assuming it doesn't have dangerous voltages exposed, which it really ought not to have!). Hold it by its wires, yank out its cable (Micro USB connectors should be the kind with the through holes, which prevent a yank from pulling the connector off the PCB, unless you have a case around it to keep the force from being handled mainly by the connector). The crap connectors that many development boards have used are so bad they've given Micro USB a bad rep in some circles - including, for a time, Arduino circles, due to the connector on early Arduino Micros. Pick them up and drop them from as high a height as is plausible. Your customers may use the products while frazzled and hurried, at awkward physical angles, and while intoxicated - sometimes all of the above. If your product is unavoidably fragile, warn people (but don't cry wolf if it's not).