# Section VII. ROBOTIC SYSTEMS IN PRACTICE

# Chapter 27. Systems Engineering

Once you graduate from university and start in the robotics workforce, you will be exposed to a massively different world than you've encountered in your classes, educational competitions like FIRST robotics, team projects in student organizations, and even research projects at renowned labs. You may be thinking, "Well, I've worked on team class projects, so I know pretty much how this will go.  Some people pull their weight, some slack off, but, in the end, everything will go ok for our final report.  The workforce will be pretty much like that."  (How adorable!  Little do you know...)

Don't underestimate it: the scale of the engineering effort (and its impact) in the enterprise setting will be larger than anything else you have experienced at university, the impact of the "soft skills" like teamwork and communication will be much higher, and the quality standards for your technical contributions will be higher too.  It is not uncommon to feel some culture shock during this transition.  Hopefully you will have had some summer internships doing real R&D to help you prepare for this experience.  Or, you may be lucky enough to participate in one of the few university labs that engages in system engineering at a reasonable scale -- and by "reasonable" I mean 10+ simultaneous developers on one project.  Even if you're a student with a 4.0 GPA, if you can't adapt to the complexities of systems engineering, you might end up a perpetual junior engineer bumbling your way around an organization with no hope for career advancement.

Real-world robotics engineering requires working on large and diverse teams over long periods of time.  A good engineer is grounded in 1) the theory governing component algorithms, 2) system integration and development practices, and 3) effective communication skills to document, justify, and advocate for their work.  Teams of engineers also need managers, and a good manager is grounded in 4) logical and organized thought about the system at appropriate abstraction levels, 5) project management skills to plan, assign, and track development progress, and 6) people skills to motivate the engineering team and convey the vision and progress to upper-level management.  Both classes of employees should also bring a sense of personal investment in the project so that they stay enthusiastic as setbacks are encountered, the project scope changes, and personnel changes occur.  Although a lot of these aspects cannot be taught outside of self-help books, we will be able to provide some degree of training in items 2), 3), 4), and 5) in this book.

This chapter provides a brief overview of theory, processes, project management, and organizational practices of typical robotic systems engineering projects.  We can only scratch the surface of this material, as there have been many wonderful books written about systems engineering, software engineering, organizational strategy, and organizational psychology.  People can be quite opinionated and passionate about these topics, and we lack hard data exploring which methods are more successful than others, so it's best not to delve too deep into any one philosophy.  Nevertheless, this high-level summary should help the aspiring robotics engineer (and engineering manager) predict the terminology, best practices, and pain points they are expected to encounter in their future career.

## Systems engineering theory

### Abstraction

It is hard to define precisely what a "system" means, but for the most part we can settle on a somewhat vague meaning: a **system** is an artifact composed of multiple interacting **components** that is engineered for a defined **purpose**.  Often (but not always) these components correspond to different physical units, computational devices, or pieces of code. The most critical aspect of this definition is that the components themselves are engineered to produce a specified function by interacting with other components within the system.  We can view a system as a network of components interacting through edges (a system diagram) and reason about operations and information flow at a more abstract level than thinking about the details of how each component is implemented.  At an organizational level, we can also think about projects in terms of a timeline of implementing components, measuring their performance, or replacing old implementations with new ones.

*******************************************************
![fig:AVPerceptionSystemComponents](figures/systems/av-perception-system-diagram.png)

<div class="figcaption"><a name="fig:AVPerceptionSystemComponents">Figure 1</a>. A system diagram for a hypothetical perception system for an autonomous vehicle.  The rounded boxes denote components of the system, and the boxes denote data that are components' inputs or outputs.  The green boxes denote an interface to the vehicle hardware, the orange boxes are static elements or external to the behavior system, and the blue boxes denote behavior system code.
</div>

*******************************************************

**Abstraction** is the key tool we use in systems engineering to manage complexity.  Abstraction is also hammered home in typical computer science curricula due to the complexity of large software projects.  Its purpose is a *cognitive* one: human brains are simply incapable of reasoning holistically about thousands or millions of lines of computer code interacting with controllers, power electronics, motors, and mechanisms.  Instead, for our own benefit we must break the system into smaller components, each of which fulfills a specific function.  These "functions" are our mental model of how each component behaves or *should behave*.   The mechanism by which we achieve abstraction is called **encapsulation**, which means hiding details of the implementation from the external user.  We should not need to know all the details by which a motion planner works in order to use it, e.g., if it uses RRT, PRM, trajectory optimization, etc.  We just need to know the inputs, the outputs, and its expected performance.
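To make encapsulation concrete, here is a minimal Python sketch of a motion planner hidden behind an interface. Every name in it (`MotionPlanner`, `StraightLinePlanner`, `path_length`) is hypothetical and chosen for illustration; the straight-line planner is only a stand-in for a real algorithm such as RRT or PRM.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence

Config = Sequence[float]  # a robot configuration, e.g., joint angles


class MotionPlanner(ABC):
    """What a caller needs to know: the inputs, the outputs, the behavior."""

    @abstractmethod
    def plan(self, start: Config, goal: Config) -> Optional[List[Config]]:
        """Return a path from start to goal, or None if none was found."""


class StraightLinePlanner(MotionPlanner):
    """Hypothetical stand-in implementation: straight-line interpolation.

    A real implementation might use RRT, PRM, or trajectory optimization;
    code written against MotionPlanner would not need to change.
    """

    def __init__(self, num_waypoints: int = 5):
        self.num_waypoints = num_waypoints  # must be at least 2

    def plan(self, start, goal):
        n = self.num_waypoints
        return [
            [s + (g - s) * i / (n - 1) for s, g in zip(start, goal)]
            for i in range(n)
        ]


def path_length(planner: MotionPlanner, start: Config, goal: Config) -> float:
    """Caller code that depends only on the MotionPlanner interface."""
    path = planner.plan(start, goal)
    if path is None:
        raise RuntimeError("planning failed")
    return sum(
        sum((b - a) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in zip(path[:-1], path[1:])
    )


print(path_length(StraightLinePlanner(), [0.0, 0.0], [3.0, 4.0]))
```

The function `path_length` never inspects how the path was produced, so swapping in a different planner class would leave it untouched.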

Note that in a sufficiently complex system, the components are usually also systems themselves, built out of sub-components!  You may ask, why do we choose one level of abstraction over another?  One could define a car as a system of tens of thousands of parts down to the last bolt, but for most purposes that is not as useful of an abstraction as defining a car as a body, frame, engine, wheels, steering, electrical system, and passenger compartment.  Useful for whom?  Well, the company management, engineers, factory workers, parts suppliers, certification agencies, repair shops, and customers would tend to think of different parts of the vehicle that way.  Indeed, the theory, expertise, design, tooling, and operation of each of these components is specialized for their specific function.

As a systems engineer, you may welcome abstraction at times, but at others, you may struggle against it. Some possible pitfalls include:
- Compatibility conflicts
- Incorrect abstractions
- Leaky abstractions
- Overzealous abstractions
- Bad abstractions

Considering again the car example, if you are designing a sleek and streamlined body with decorative elements that you know will sell to customers, you may run into a struggle with the engine designer who can no longer fit a sufficiently beefy engine to give the customers the horsepower they desire.  This is a *compatibility conflict* which needs clever engineering or strong management to resolve.  (If you are Ferrari, your boss tells you to quiet down and design the body around the engine!) 

An *incorrect abstraction* is one in which one's mental model of the system may not be satisfied by the implementation. As a real-world example, my lab struggled with an issue for several days during development for the Amazon Picking Challenge.  We found that when we were testing at certain times of the day, our robot would start acting strange and the picking performance would drop precipitously.  Then we'd test again, and everything would work fine.  The culprit?  The Intel RealSense cameras we had at the time would normally report RGBD data at 30 frames per second (fps) in good lighting, but then silently drop to 15 fps in poor lighting.  Because the students on the team would work long into the night, they set up the perception system to work appropriately with the lower frame rate.  But at the higher frame rate, some network buffers were being filled with too many RGBD images, and so the perception system was processing stale data from multiple seconds in the past.  The issue here was that our working mental model of the camera was a device that provided data at a consistent rate, and this abstraction was incorrect. Perhaps we should have read the documentation better or constructed more thorough [unit tests](#unit-and-system-testing)!

*Leaky abstractions* are a similar concept that can cause all sorts of frustration. In the Amazon Picking Challenge, the variable frame rate of the camera caused *side-effects* that we did not account for, as we did not carefully design the perception system with all the details of the ROS communication system in mind.  This is because the publish-subscribe abstraction used by ROS is, coarsely speaking, "a publisher sends a message and immediately the subscriber(s) get it".  In order to find the issue the developer needs to know more about networking than was promised -- specifically ROS queues and the slow receiver problem.  Once we found the culprit, the fix was easy (shortening the queues to only provide the latest data), but placing blame on the right component was tricky.   (We'll see more about how to [assign blame to components later](#analysis).)
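The slow receiver problem can be reproduced with a toy model in plain Python. This is not ROS code: the function `last_frame_age` and its parameters are hypothetical, and the model assumes a subscriber that processes exactly one frame per cycle from a bounded first-in, first-out queue.

```python
from collections import deque


def last_frame_age(frames_per_cycle, num_cycles, queue_size):
    """Return how many frames behind the newest frame the subscriber is.

    Toy model: in each processing cycle the publisher enqueues
    `frames_per_cycle` frames, then the subscriber dequeues the oldest one.
    """
    queue = deque(maxlen=queue_size)  # a full deque silently drops its oldest item
    next_frame_id = 0
    age = 0
    for _ in range(num_cycles):
        for _ in range(frames_per_cycle):
            queue.append(next_frame_id)
            next_frame_id += 1
        processed = queue.popleft()
        age = (next_frame_id - 1) - processed
    return age


# Publisher matched to the subscriber (e.g., 15 fps in, 15 fps processed).
print(last_frame_age(frames_per_cycle=1, num_cycles=60, queue_size=100))
# Publisher twice as fast as the subscriber, with a long queue.
print(last_frame_age(frames_per_cycle=2, num_cycles=60, queue_size=100))
# Same mismatch, but the queue keeps only the latest frame.
print(last_frame_age(frames_per_cycle=2, num_cycles=60, queue_size=1))
```

In this model the long queue leaves the subscriber 60 frames behind after 60 cycles, which is 2 seconds of staleness at 30 fps, while a queue of length 1 always hands over the newest frame.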

An *overzealous abstraction* occurs when a component is designed to encapsulate too much functionality.  Developers of other components would like to interact with a finer level of control over its internal functions.  For example, developers of industrial robots often provide a "go-to" subroutine that does not terminate until the robot arrives at its destination (or encounters a fault).  This would not be acceptable if you wished to build a collision avoidance system that could stop the robot mid-motion if an obstacle were detected in the robot's path.  A similar concept is the *bad abstraction*, in which a component tries to do a collection of things whose grouping is poorly rationalized or cognitively complex.  Bad abstractions often come from a combination of overzealous encapsulation and changing requirements. As new use cases arise, the developer adds more and more configuration parameters to customize how the component functions, leading to an unwieldy, confusing set of inputs.  
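The "go-to" example can be sketched with a made-up one-dimensional arm. The class `SimulatedArm` and the function `go_to_with_monitor` are hypothetical and do not correspond to any vendor's API; the point is only that exposing the finer-grained `step_toward` lets another component interrupt the motion.

```python
class SimulatedArm:
    """Hypothetical 1-D arm used only for illustration."""

    def __init__(self, position=0.0, speed=0.1):
        self.position = position
        self.speed = speed  # distance moved per control step

    def step_toward(self, target):
        """Move one control step toward target; return True on arrival."""
        delta = target - self.position
        if abs(delta) <= self.speed:
            self.position = target
            return True
        self.position += self.speed if delta > 0 else -self.speed
        return False

    def go_to(self, target):
        """Blocking 'go-to': the caller regains control only on arrival."""
        while not self.step_toward(target):
            pass


def go_to_with_monitor(arm, target, obstacle_detected):
    """Finer-grained alternative: check for obstacles between steps."""
    while not arm.step_toward(target):
        if obstacle_detected(arm.position):
            return False  # stopped mid-motion
    return True


arm = SimulatedArm()
reached = go_to_with_monitor(arm, 1.0, lambda pos: pos >= 0.45)
print(reached, arm.position)
```

If only `go_to` were exposed, a collision avoidance component would have no opportunity to run until the motion had already finished.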

An aspect of abstraction that is somewhat unique to robotics is that many **upstream components must model downstream components** in order to function properly. For example, state estimators, object trackers, and planners need a dynamics model to predict how the system moves.  If the movement mechanisms or low-level controllers grow in complexity, the dynamics become more complex, necessitating more complex models.  Similarly, increased sensor capabilities usually lead to greater complexity in observation models used in state estimation or [active sensing](PlanningWithDynamicsAndUncertainty.ipynb#active-sensing).  For this reason, as we seek to improve component performance, we usually pay the price in terms of model complexity.


### System diagrams and system specifications

The most important part of organizing a team of engineers is to build a shared *mental model* of what function the system should perform, what components the system will consist of, and how those components will operate.  There are many ways to build such mental models, listed in order of increasing formality:

1. *Background knowledge* (a.k.a. book learning): all the topics studied in courses, books, and academic papers.  After you have graduated from a robotics program, you should generally know the functions of forward and inverse kinematics, motion planning, trajectory optimization, Kalman filters, deep neural networks, etc.
2. *Experiential knowledge*: information that an individual gathers from interacting with the system and its components.
3. *Community knowledge*: information scraped from web forums.  This includes the use of ChatGPT and other AI tools, which are trained on such community information. (Such information is often of dubious validity.)
4. *Tribal knowledge* (a.k.a. institutional memory): information passed between team members and held within individuals' memory. 
5. *Textual documentation*: design documents, code comments and documentation, technical manuals, presentations shared amongst the organization.
6. *System diagrams*: control flow diagrams, computation graphs, and state machines.
7. *System specifications*: application programming interfaces (APIs), interface definition languages (IDLs), behavior trees (BTs), and modeling languages, e.g., Unified Modeling Language (UML).

As a general rule, information should **flow down** this list toward documentation, diagrams, and specifications.  As the formality of such information grows, it becomes more precise, interpretable, widely disseminated, and longer-lasting.  The tradeoff is that turning information from mental information to formal knowledge takes time and effort.  Keeping formal knowledge up-to-date is also more time-consuming.

Control flow diagrams

Computation graphs

*******************************************************
![fig:AVPlanningSystemComponents](figures/systems/av-planning-system-diagram.png)

<div class="figcaption"><a name="fig:AVPlanningSystemComponents">Figure 3</a>. A system diagram for a hypothetical planning system for an autonomous vehicle, with left-side inputs corresponding to the outputs of the perception system.  It is clear from the diagram that the data processing occurs in a sequential (serial) manner, where higher-level concerns like deciding on the vehicle's mission and route are processed before lower-level ones like the vehicle's path or control outputs.  What are the potential benefits and drawbacks of a serial system architecture?
</div>

*******************************************************


State machines

Behavior trees


### Reliability and redundancy

Given $n$ components in sequence, each of which may fail independently with probability $\epsilon$, the probability that at least one of them fails is $1-(1-\epsilon)^n$.
- Example: $\epsilon=0.05$, $n=5$ => 23% probability of failure
- Example: $\epsilon=0.01$, $n=10$ => roughly 10% probability of failure

Given $n$ (redundant) components in parallel, the probability that all of them fail is $\epsilon^n$.
- Example: $\epsilon=0.05$, $n=3$ => 0.01% probability of failure
- Example: $\epsilon=0.5$, $n=5$ => 3% probability of failure

Principle: Minimize long chains of dependent components (probably impossible) and implement multiple approaches for the same task (sometimes possible).

But are failures truly independent?
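The two formulas above can be checked numerically. The sketch below uses hypothetical function names (`series_failure`, `parallel_failure`) and, like the formulas, assumes that failures are independent and that every component has the same failure probability.

```python
def series_failure(eps: float, n: int) -> float:
    """Probability that at least one of n components in sequence fails."""
    return 1.0 - (1.0 - eps) ** n


def parallel_failure(eps: float, n: int) -> float:
    """Probability that all n redundant components in parallel fail."""
    return eps ** n


# The (eps, n) pairs used in the examples above.
for eps, n in [(0.05, 5), (0.01, 10)]:
    print(f"series:   eps={eps}, n={n} -> {series_failure(eps, n):.4f}")
for eps, n in [(0.05, 3), (0.5, 5)]:
    print(f"parallel: eps={eps}, n={n} -> {parallel_failure(eps, n):.6f}")
```

If failures share a common cause, such as a power loss that takes out every redundant sensor at once, the independence assumption breaks and the parallel figure becomes optimistic.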

## The robot engineering process

### A developer's responsibilities

1. To create system features that are useful to users or other developers.
2. To characterize and report on the behavior of the system or its components.
3. To maintain desired functions of the system or its components under changing requirements.
4. To improve developer velocity, i.e., the rate at which developers contribute to items 1-3.
5. To enable and aid the usefulness of the system or its components through documentation and organization.


### Phases of development

Generally speaking, a robotics project will follow the four phases listed here.  If the organization is lucky, these steps and phases proceed one after another without a hitch.  But, I have never heard of such a case in my life, and never do expect to hear of one!  We will discuss caveats to this outline below.

#### Phase I: Planning

| Product team           | System integration team  |
|------------------------|--------------------------|
| Requirements gathering |  System architecture design  |

#### Phase II: Component development

| Hardware team        | Perception team     | Dynamics and control team | Planning team      |
|----------------------|---------------------|---------------------------|--------------------|
| Design               | Calibration         | System identification     | Obstacle detection |
| Fabrication          | State estimation    | Tracking control          | Cost / constraint definition | 
| Integration          | Visual perception   | Control API development   | Motion planning    |
| Modeling             | 3D perception       |                           | Mission planning   |

 
#### Phase III: Integration and evaluation

| System integration team  | Product team     |
|--------------------------|------------------|
| System integration       | User interface development |
| Logger development       | User interface testing     |
| Debugging tool development (visualization, metrics, etc) |    |
| Data gathering, machine learning |    |
| Iterative development and tuning |    |
 

#### Phase IV: Marketing and deployment

| Hardware team              | Product team                    | Sales and marketing |
|----------------------------|---------------------------------|---------------------|
| Scaling up fabrication     | Product requirement validation  | Technical documentation | 
| Design for mass production | Certification                   | Marketing | 
| Supply chain organization  | User acceptance testing         | Deployment | 


#### Caveats

In reality, development will be continual both within a phase and between phases.  Within a phase, there will inevitably be iterative evaluation and design as components are tested and refined, and when interacting components are upgraded.   There also will be continual work between phases. (... more specifically, between phases I-III, since most robotics companies never get to a product!)  Requirements will change, integration challenges will kick problems back to the component teams, data gathered from integration testing will go back to tuning and machine learning, acceptance testing may require repeating the planning phase, etc. So, even though you might worry as a mechanical engineer that your job will no longer be needed after the start of Phase II, in reality you are likely to be called upon throughout the development process.

Moreover, in a later section we will describe the concept of [vertical development](#horizontal-vs-vertical-development) in which teams are created early in the development process to solve Phase III problems.  This is a very good idea, as it can be hard to predict all of the integration problems that will be met.  User interface development is also often an afterthought in many engineering projects, but getting early results from user interface testing is another very good idea.  The end user might find something confusing, or not so useful, or might even be satisfied with a partial product!  Having this information at hand can drastically shape the landscape of development priorities and make the difference between success and failure within the development budget and timeline.


### Project management

Planning: SMART goals, 

Design documents:

Scheduling: Gantt charts, critical path, etc
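The critical path is the sequence of dependent activities that sets the minimum possible time to complete a project. As a rough sketch, it can be computed as the longest chain through the task dependency graph. The function name, task names, and durations below are made up for illustration, and the code assumes the dependencies contain no cycles.

```python
from functools import lru_cache


def critical_path(durations, dependencies):
    """Return (total_duration, path) for the longest dependency chain.

    durations: maps each task to its duration (here, hypothetical weeks).
    dependencies: maps a task to the tasks that must finish before it starts.
    """

    @lru_cache(maxsize=None)
    def finish(task):
        preds = dependencies.get(task, [])
        if not preds:
            return durations[task], (task,)
        best_time, best_path = max(finish(p) for p in preds)
        return best_time + durations[task], best_path + (task,)

    return max(finish(task) for task in durations)


durations = {
    "requirements": 2, "architecture": 3, "hardware": 8,
    "perception": 6, "planning": 5, "integration": 4,
}
dependencies = {
    "architecture": ["requirements"],
    "hardware": ["architecture"],
    "perception": ["architecture"],
    "planning": ["architecture"],
    "integration": ["hardware", "perception", "planning"],
}
total, path = critical_path(durations, dependencies)
print(total, " -> ".join(path))
```

In this made-up schedule, shortening perception or planning would not bring the end date forward, because hardware sits on the critical path.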

Budgeting and personnel: < 10% to hardware
Personnel assignment %FTE 

Project tracking


### Horizontal vs vertical development

When developing a product there will often be teams that focus on specific components, as well as teams that integrate multiple components to fulfill specific system functions.  These are, respectively, known as **horizontals** and **verticals**.  This terminology follows the notion of a "tech stack" with high-level, slow components on top and low-level, fast components on the bottom (see the connection to [hierarchical architectures](AnatomyOfARobot.ipynb)?)

TODO: figure showing horizontal / vertical matrix

Engineers on a horizontal team will focus on refining a component's performance.  For example, an object detection team would be a horizontal one and would focus on improving detection accuracy.  They will also work with members of intersecting vertical teams to ensure that their component works to implement the vertical function.  These will typically be subject-matter specialists with intimate knowledge of the mechanical, electrical, algorithmic, and/or computational aspects of that component.  Their performance metrics will typically involve [unit testing](#unit-and-system-testing).

Engineers on a vertical team will focus on expanding the range of functions of the system, or its operational domain.  For example, in an autonomous driving company a lane changing team would be focused on producing high quality driving behavior when the vehicle needs to perform a lane change.  They will often have specialists in multiple relevant horizontal teams who will work with those horizontal teams to ensure that the system function can be implemented.  For example, lane changing may require specialized agent trajectory prediction and motion planning functions, so working closely with those teams should be a high priority for this vertical.  In contrast, an object detection horizontal team may not need to be closely involved, since lane changing does not typically require any different object detection capabilities compared to normal driving.  A vertical team's performance metrics will typically involve [system testing](#unit-and-system-testing).


It is a common pitfall, especially in smaller organizations, to assign effort only to horizontal components or only to vertical ones.  Without verticals, the effort on components may not be well-targeted to produce the desired functions of the system, which leads to last-minute scrambling as product deadlines grow near.  Without horizontals, development is slowed down by a lack of coherence and expertise in technical components.  You may end up with a mess of code with multiple implementations of motion planners, object detectors, etc. with different APIs, coding conventions, and quality standards.  In a real-world example of this, I participated on a DARPA Robotics Challenge team that was vertically oriented.  The competition asked teams to develop a robot to complete 8 search-and-rescue tasks, and the theory was to have a lot of professors working on the same team, each of whom had expertise on one of the tasks.  My students and I were on the ladder climbing team, another professor's lab would address valve turning, another's would address driving, etc.  As it turns out, the lack of coordination between task subteams was a big handicap. Although we scored quite well on our event during the semifinals, the team as a whole didn't make it to the finals...


### Technology Readiness Levels

## Unit and system testing

**System testing** (aka *integration testing*) evaluates whether the entire system performs its specified function according to key performance objectives.  
Integrating a system is expensive and takes a very long time. So, the system engineering process typically involves a large amount of **unit testing**, which evaluates whether an individual component performs its specified function.  These are defined in more detail below.



### Unit testing

To perform unit testing, a developer will
1. Develop test inputs and expected outputs (including errors).
2. If the component would interact with other components in the system, develop *mock* implementations of them, e.g., generating dummy data or replaying data from a log.
3. Create a test protocol (runner).
4. Ensure that the actual outputs of the component agree with the expected outputs.
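The four steps above might look as follows with Python's standard `unittest` module. The component under test, `clamp_command`, is a hypothetical example invented for this sketch; it has no upstream dependencies, so step 2 (mocks) is not needed here.

```python
import math
import unittest


def clamp_command(value, lower, upper):
    """Hypothetical component under test: clamp a joint command to limits."""
    if math.isnan(value):
        raise ValueError("command is NaN")
    return max(lower, min(upper, value))


class ClampCommandTest(unittest.TestCase):
    # Step 1: test inputs paired with the outputs we require.
    CASES = [(0.5, 0.5), (2.0, 1.0), (-3.0, -1.0)]

    def test_outputs(self):
        for value, expected in self.CASES:
            # Step 4: the actual output must agree with the required output.
            self.assertEqual(clamp_command(value, -1.0, 1.0), expected)

    def test_error(self):
        # Required outputs include errors, here for an invalid input.
        with self.assertRaises(ValueError):
            clamp_command(float("nan"), -1.0, 1.0)


def run_tests():
    """Step 3: a minimal test runner; returns True if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampCommandTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    run_tests()
```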

For many components, we do not have a perfect idea of what the outputs should be. Instead, the developer will seek to **measure performance**, using the following process:
1. Develop test inputs and a [performance metric(s)](#metrics).
2. Develop mocks, if necessary.
3. Create a test protocol (runner).
4. Analyze the metric(s) and report.


Defining good *mocks* is extremely important in unit testing, and can be quite challenging in robotics. Essentially, an ideal mock would emulate the outputs of any upstream components so that we can predict how our tested component will perform in practice.  There are several ways of getting close to this: 
- *Stubs*: Generate (constant, random, or varied) outputs of the same data type and structure that the upstream component produces.  Ideally, the data should also have similar values and behavior.
- *Replays*: Replay logged data from the upstream component.
- *Simulation*: Use a high-fidelity simulator of the robot that can simulate all of its actuators and sensors.  Execute the tested component using the actual code for all upstream and downstream components.
- *Faked simulation*: Use a simulator that partially simulates the robot's actuators and sensors, but also create fake implementations of upstream system components to produce "omniscient" readings.


Let's take an object trajectory prediction component as an example, which takes the output of an object detector as input and then extrapolates the future trajectories of detected objects.  We would like to mock the object detector. For a stub, we could generate some hypothetical detections of an object moving in a straight line with some noise, and verify whether the predictor generates predictions along that line. For a replay, we would simply record the output of the object detector running on some video data.  For a simulation, we would run our test by running the simulation and the object detector on the images generated by simulation.  Finally, for a faked simulation, we would skip generating images in simulation, and instead build a fake object detector that reads objects directly from the simulation's objects.
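The stub variant of this example can be sketched in a few lines. All function names are hypothetical, and the constant-velocity predictor is only a placeholder for whatever prediction component is actually being tested.

```python
import random


def stub_detections(num_frames, velocity=(1.0, 0.5), noise=0.01, seed=0):
    """Stub for the object detector: noisy positions along a straight line."""
    rng = random.Random(seed)
    return [
        (velocity[0] * t + rng.gauss(0.0, noise),
         velocity[1] * t + rng.gauss(0.0, noise))
        for t in range(num_frames)
    ]


def predict_constant_velocity(detections, horizon):
    """Placeholder predictor under test: extrapolate at the average velocity."""
    (x0, y0), (x1, y1) = detections[0], detections[-1]
    steps = len(detections) - 1
    vx, vy = (x1 - x0) / steps, (y1 - y0) / steps
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def max_prediction_error(num_frames=20, horizon=5, velocity=(1.0, 0.5)):
    """Largest distance between a prediction and the noise-free line."""
    detections = stub_detections(num_frames, velocity)
    predictions = predict_constant_velocity(detections, horizon)
    errors = []
    for k, (px, py) in enumerate(predictions, start=1):
        t = num_frames - 1 + k  # time index of the k-th predicted point
        tx, ty = velocity[0] * t, velocity[1] * t
        errors.append(((px - tx) ** 2 + (py - ty) ** 2) ** 0.5)
    return max(errors)


print(f"max prediction error: {max_prediction_error():.4f}")
```

Because the stub knows the true line, the test can score the predictor directly. A replay of logged detector output would be more realistic but would lack this ground truth.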



### System testing

To perform system testing, a developer will
1. Establish the performance criteria and a way to measure them.
2. Create test environments that are ideally representative of deployed conditions.
3. Run the system multiple times across the test environments.
4. Analyze and report the performance results.

Measuring performance may involve extensive manual observation or instrumentation of the test environment.
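The structure of such a test campaign can be sketched as a small harness. Everything here is hypothetical: `run_trial` stands in for a full run of the real system, and the environment names and parameters are invented so that the example is self-contained.

```python
import random
import statistics


def run_trial(environment, rng):
    """Stand-in for one full-system run; returns (success, duration_s).

    A real harness would launch the robot or simulator here. The outcome
    is drawn at random from made-up environment parameters instead.
    """
    success = rng.random() < environment["success_rate"]
    duration = rng.uniform(*environment["time_range_s"])
    return success, duration


def system_test(environments, runs_per_env=50, seed=0):
    """Run the system repeatedly in each environment and summarize results."""
    rng = random.Random(seed)
    report = {}
    for name, env in environments.items():
        results = [run_trial(env, rng) for _ in range(runs_per_env)]
        report[name] = {
            "runs": len(results),
            "success_rate": sum(ok for ok, _ in results) / len(results),
            "mean_time_s": statistics.mean(t for _, t in results),
        }
    return report


environments = {
    "well_lit_shelf": {"success_rate": 0.9, "time_range_s": (8.0, 12.0)},
    "dim_shelf": {"success_rate": 0.6, "time_range_s": (10.0, 20.0)},
}
for name, summary in system_test(environments).items():
    print(name, summary)
```

Reporting results per environment, rather than as one pooled number, helps reveal conditions under which the system degrades.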

### Metrics

#### Hardware metrics

Actuators / robot arms
- Peak torque
- Stall torque
- Repeatability
- Accuracy
- Workspace volume
- Load
- Energy consumption
- Ingress protection (IP) rating
- Backdrivability

Sensors
- Resolution
- Field of view
- Frames per second (FPS)
- Depth range
- Signal-to-noise ratio
- Drift
- Data transfer rate

#### Perception metrics

State estimation

SLAM

Object detection

Segmentation

Tracking

System identification

#### Planning metrics

Kinematic path planning

Kinodynamic path planning

Trajectory optimization

Model predictive control

Multi-agent path planning

Informative path planning / active sensing

Imitation learning

Reinforcement learning

#### Control metrics

- Control frequency
- Control bandwidth
- Step response
- Overshoot
- Tracking accuracy
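Two of these metrics, overshoot and tracking accuracy, are simple to compute from logged signals. The sketch below uses common textbook definitions with hypothetical function names: overshoot as the peak excursion past a positive setpoint, and tracking accuracy as root-mean-square (RMS) error. The sample data are made up.

```python
def percent_overshoot(response, setpoint):
    """Peak excursion past a positive setpoint, as a percentage of it."""
    return max(0.0, (max(response) - setpoint) / setpoint * 100.0)


def rms_tracking_error(actual, reference):
    """Root-mean-square difference between actual and reference signals."""
    squared = [(a - r) ** 2 for a, r in zip(actual, reference)]
    return (sum(squared) / len(squared)) ** 0.5


# Made-up samples of a response to a unit step, for illustration only.
step_response = [0.0, 0.6, 1.1, 1.2, 1.05, 0.98, 1.0, 1.0]
reference = [1.0] * len(step_response)
print(f"overshoot: {percent_overshoot(step_response, 1.0):.1f}%")
print(f"RMS error: {rms_tracking_error(step_response, reference):.3f}")
```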

#### System metrics

Industrial robots
- Mean time to failure (MTTF)
- Return on investment (ROI)
- Cost per unit
- Cycle time

Autonomous vehicles
- Miles per disengagement (MPD) / Miles per intervention (MPI)
- Accidents / close calls
- Manual driving pose deviation

### Evaluation 

Domain

Validity

Thoroughness

In- and out- of distribution

### Analysis
Persistent questions: are our tests representative?  Are they conclusive? When do we stop testing and start redesigning?  How to tell which component was responsible for poor behavior?


## Large-team engineering practices

Dunbar's number

### Development methodologies

Waterfall: an organizational philosophy that breaks a project into sequential stages with clearly defined development objectives that must be met before proceeding to the next.

Agile: an organizational philosophy that prioritizes frequent changes to adapt to product and customer needs. It deprioritizes systematic long-term planning due to the inability to foresee precise specifications.

### Design documents



### Software organization

Code is code, right?  You couldn't be more wrong!

Organizing code is the single most important imperative of software engineering.

Separation between algorithms, integration wrappers, settings, models, data, logs, executors, tests, and setup.

### Levels of component maturity

D level: "Code scraps"
-	Looks like: Messy scripts and notebooks
-	Purpose: Testing, rapid prototyping
-	Seen by: only you, at this moment in time. Most importantly, you’re not planning to revisit it months from now.
-	Code style: you may use whatever style you want. Go ahead and name things badly, others won’t see it. 
-	Packaging: none. Maybe you throw this into Dropbox / Box when you are done. Or it could be placed in your personal Github account, later to gather dust. Do not push this type of code to your team's project repository.
-	Interoperability: None. Dependencies are system dependent.
-	Settings: Hardcode things. Delete and replace when you want to change a setting.

C level: Research code
-	Looks like: Partially organized code that fulfills a specified, interpretable function. File names and variable names are meaningful, the basics are explained in a readme.
-	Purpose: Gathering together useful scraps to make them reusable for yourself and others. To make a checkpoint of your code, e.g., for a big demo or upon paper submission to encourage replicable research.
-	Seen by: future you and close colleagues. Someone might need to get a hold of you to learn how to use it.
-	Code style: units are organized into meaningful file structures, classes, and variables. Documentation is present but partial. Don’t Repeat Yourself (DRY) is practiced. Code is separated from data and output.
-	Packaging: Ideally, put these into your team's project repository, or a self-contained Github repo if you plan to eventually migrate to B-level code.
-	Interoperability: Document dependencies in docstrings or readme, or better yet, include setup scripts / requirements.txt.  Simplified and suboptimal communications middleware may be used.
-	Settings: Configuration files (ideally) or constants at the top of the file. Example settings might be commented out.

B level: Legitimate module
-	Looks like: A documented, reusable module that is not embarrassing.
-	Purpose: public code releases, releases to collaborating organizations, or to add longevity to your work.
-	Seen by: multiple colleagues, some of whom might be using your code after you leave the organization.
-	Code style: conformant to typical style guidelines. Organization is solid, DRY is practiced. Module code, tests, data, and settings are separated.
-	Packaging: a self-contained Github repo if you want to release it to the public. May work with packaging tools, e.g., `pip install`.
-	Interoperability: communication between parts is documented well. For communication middleware, use best practices in the domain / system on which you are working, e.g., ROS, Google Protocol Buffers, AJAX, etc. System requirements and dependencies are specified in installation instructions (readme) and/or requirements.txt or a setup script.
-	Settings: configuration files or documented command-line arguments. Examples or tutorials should be provided.

A level: Maintained package
-	Looks like: a high-quality package
-	Purpose: public code releases
-	Seen by: the world
-	Code style: conformant to typical style guidelines. High-quality documentation and tutorials with images and examples.
-	Packaging: a self-contained Github repo with continuous integration, tests, maintainers, etc. Should work with `pip install`.
-	Interoperability: uses best practices
-	Settings: configuration files or documented command-line arguments. Examples or tutorials should be provided.
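The Settings entries above progress from hardcoded values to configuration files. One way the configuration-file end of that range might look is sketched below; the setting names and the function `load_settings` are hypothetical, and JSON is just one of several reasonable file formats.

```python
import json
from pathlib import Path

# Hypothetical settings and defaults, documented in one place.
DEFAULT_SETTINGS = {"camera_fps": 30, "queue_size": 1}


def load_settings(path=None):
    """Merge settings from an optional JSON file over the defaults."""
    settings = dict(DEFAULT_SETTINGS)
    if path is not None and Path(path).exists():
        overrides = json.loads(Path(path).read_text())
        unknown = set(overrides) - set(DEFAULT_SETTINGS)
        if unknown:
            # Fail loudly on typos instead of silently ignoring them.
            raise KeyError(f"unknown settings: {sorted(unknown)}")
        settings.update(overrides)
    return settings


print(load_settings())
```

Keeping the defaults in code and the overrides in a file means that changing a setting no longer requires editing the algorithm's source.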



### Key principles of software development

Convention vs configuration

DRY

Naming 


### Software management skills

GitHub

Branching

Pull requests

Code review

Continuous integration


### Versioning


## Systems engineering glossary

### General terms

- Metric: any quantifiable measure of performance of any thing. Can be continuous, discrete, or binary (e.g., success achieved / not). The subject can be a tech component, a product, a team, or an individual.
- KPI (key performance indicator): a quantifiable measure of a team’s performance, i.e., a human metric. Used by management.
- OKR (objectives and key results): a goal-setting framework popularized by Google. Similar to KPIs, but each broad objective is paired with a few measurable key results, so the emphasis is on goals rather than on metrics alone.
- Headroom: hypothetical upper limit of a technical component’s performance under ideal circumstances and with an ideal implementation. Headroom analysis simulates those conditions, and can help decide how much to invest in a particular method or implementation.
- Development velocity / developer velocity: how “productive” a developer can be over a given timeframe.
- [X]ops (e.g., devops, secops, MLops): teams whose goals are aligned to aid in X velocity, e.g., by providing tools, frameworks, guides, etc.
- Critical path: the sequence of dependent activities in a project plan defining the minimum possible time to complete a project
- Waterfall: an organizational philosophy that breaks a project into sequential stages with clearly defined development objectives that must be met before proceeding to the next.
- Agile: an organizational philosophy that prioritizes frequent changes to adapt to product and customer needs. It deprioritizes systematic long-term planning due to the inability to foresee precise specifications.
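
To make the "critical path" entry concrete, here is a minimal sketch that computes the minimum project duration and one critical path for a small plan. The task names, durations, and dependencies are invented for illustration, and the code assumes the plan is a directed acyclic graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical project plan: task -> (duration in days, prerequisite tasks).
tasks = {
    "design":     (5, []),
    "build_arm":  (10, ["design"]),
    "write_code": (7, ["design"]),
    "integrate":  (4, ["build_arm", "write_code"]),
    "field_test": (3, ["integrate"]),
}


def critical_path(tasks):
    """Return (minimum project duration, list of tasks on one critical path)."""
    graph = {name: deps for name, (_, deps) in tasks.items()}
    order = TopologicalSorter(graph).static_order()  # prerequisites come first
    finish = {}  # earliest finish time of each task
    parent = {}  # the prerequisite that determines each task's start time
    for name in order:
        duration, deps = tasks[name]
        start = max((finish[d] for d in deps), default=0)
        parent[name] = max(deps, key=lambda d: finish[d]) if deps else None
        finish[name] = start + duration
    # Walk backwards from the task that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return total, path[::-1]


length, path = critical_path(tasks)
print(length, path)
```

In this made-up plan, `write_code` has slack because `build_arm` takes longer, so speeding up the software work alone would not shorten the project.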


### Engineering management terms

- Stakeholder: anyone who has an interest in a project, both internal and external to the organization.
- Technology readiness level (TRL): originating at NASA and later adopted by the DoD, a scale from 1 to 9 defining the maturity of a technology in development.
- Tribal knowledge: information that resides only within human brains on the team rather than in formal documentation, reports, or reference materials.
- Technical debt / tech debt: suboptimal style, structure, or functionality that is introduced when developers take shortcuts in an attempt to make deadlines. Tech debt usually appears as sloppy, badly organized, or non-extensible code, and the debt must be “repaid” later by undoing those introductions.
- NIH (not invented here) syndrome: a tendency in organizations to avoid using products, software, or knowledge that was derived from outside the organization. Can have legitimate reasons (e.g., licensing restrictions, compatibility) but can also waste time.
- SMART goal: a goal written according to a set of principles for good milestones and deliverables during project planning. SMART stands for Specific, Measurable, Achievable, Relevant (sometimes given as Realistic), and Time-bound.
- % FTE (full-time equivalent): the percentage of a full-time workload that one developer devotes to a particular task.

### Engineering management pitfalls

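- Peter principle: people in a hierarchy tend to rise to their level of incompetence, i.e., they keep getting promoted for success in their current role until they land in a role they cannot perform well.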
- Dilbert principle: the most incompetent people in an organization are promoted to management to minimize harm to productivity
- Bike-shed effect / law of triviality: people in an organization commonly give undue attention to relatively trivial issues
- Hofstadter's Law: "It always takes longer than you expect, even when you take into account Hofstadter's Law." Also see optimism bias, planning fallacy
- 90-90 rule: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
- Student syndrome: planned procrastination, because an impending deadline induces the proper amount of urgency.


### Software engineering terms

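- Abstraction: hiding the internal details of a component behind a simpler interface, so that users of the component can reason about what it does without knowing how it does it.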
- Toolchain: a sequence of programs designed to accomplish a complex development function
- Regression testing: verifying that new changes to software do not break old functionality, e.g., by introducing new bugs or changing behavior.
- Continuous integration (CI): a methodology and toolchain for automatically verifying that a complex software product functions as desired (e.g., it compiles and its regression tests pass). CI toolchains are typically run upon each push.
- Unit testing: testing a component of a product to ensure it behaves as expected and/or to gather metrics.
- Integration testing: testing a whole product to ensure it behaves as expected and/or to gather metrics.
- Full-stack developer: a developer whose expertise bridges multiple components rather than specializing in a single component.
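
The testing terms above can be illustrated with a small sketch using Python's built-in `unittest` module. The function `wrap_angle` and the "past bug" that the regression test guards against are hypothetical, invented only for this example.

```python
import math
import unittest


def wrap_angle(theta):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


class TestWrapAngle(unittest.TestCase):
    def test_identity_inside_range(self):
        # Unit test: an angle already in range should be returned unchanged.
        self.assertAlmostEqual(wrap_angle(0.5), 0.5)

    def test_wraps_full_turn(self):
        # Unit test: adding a full turn should not change the wrapped angle.
        self.assertAlmostEqual(wrap_angle(2 * math.pi + 0.5), 0.5)

    def test_regression_negative_input(self):
        # Regression test: pins the behavior for an input that (hypothetically)
        # broke an earlier version, so later changes cannot silently reintroduce it.
        self.assertAlmostEqual(wrap_angle(-math.pi - 0.25), math.pi - 0.25)


# Run the suite programmatically; a CI toolchain would run it on each push.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWrapAngle)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

An integration test would follow the same pattern but exercise several components together, e.g., feeding wrapped angles through a controller and checking the resulting command.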

### Software engineering pitfalls
- Software / code rot: code losing performance or functionality over long periods of time due to the environment changing around it, e.g., system or library upgrades
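- Leaky abstraction: an abstraction that fails to fully hide the details of its underlying implementation, so that users must understand those details anyway to use it correctly or efficiently.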
