# Making Our Future Lives Easier

## 1. Dealing with Hard Problems

You might be wondering why debugging is so hard that we need an entire course on it. Brian Kernighan, one of the first contributors to the Unix operating system and co-author of the famous C programming language book, among many other things, once said, **"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"**

This is a warning against writing complicated programs. If the code is clear and simple, it will be much easier to debug than if it's clever but obscure. The same applies to IT systems. If the system is engineered very cleverly, it will be extremely hard to understand what's going on with it when something fails. It's important to focus on building systems and applications that are simple and easy to understand, so that when something goes wrong, we can figure out how to fix them quickly.

So how do we do this? One piece of advice I found really valuable is **to develop code in small, digestible chunks.** Every so often, I stop and test what I've written. The hardest thing to do is to debug something that I'm running for the first time only after I've completed it. There are so many places where things could have gone wrong.

Another lesson that's super useful is to **keep your goal clear.** If you're writing code, try writing the tests for the program before the actual code to help you keep focus on your goal. If you're building a system or deploying an application, having documentation that states what the end goal should be and the steps you took to get there can be really helpful, both to keep you on track and to figure out any problems that might turn up along the way.
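To make the test-first idea concrete, here's a minimal sketch in Python. The function `normalize_hostname` and its expected behavior are made up for illustration. The point is only that the tests come first and spell out the goal, and the implementation is then filled in until they pass.

```python
import unittest


# Step 1: write the tests first. They describe what "done" looks like.
class TestNormalizeHostname(unittest.TestCase):
    def test_lowercases_name(self):
        self.assertEqual(normalize_hostname("WebServer01"), "webserver01")

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(normalize_hostname("  db01\n"), "db01")

    def test_empty_name_stays_empty(self):
        self.assertEqual(normalize_hostname(""), "")


# Step 2: write just enough code to make the tests pass.
# normalize_hostname is a hypothetical helper, not part of any library.
def normalize_hostname(name):
    """Return the hostname in lowercase with surrounding whitespace removed."""
    return name.strip().lower()
```

Saved as, say, `test_hostname.py`, this can be run with `python3 -m unittest test_hostname.py`. Running the tests before step 2 exists should fail, which is a quick way to confirm they actually check something.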

We called out at the beginning of this course that solving technical problems is a bit of an art, and that it can be fun when things finally click together. On the flip side, the worst part of troubleshooting and debugging is when we get stuck: when we can't think of any other reasons why the program is failing, or we can't figure out what else we can do to fix it.

In this course, we've given you a bunch of tools and processes to follow that can hopefully help you avoid getting stuck in a lot of these situations, but we can't cover absolutely everything. You might still find yourself facing an issue that you have no idea what to do about, and that's okay.

If you're in a sticky situation, the main thing to do is to remain calm. We need our creative skills to solve problems, and the worst enemy of creativity is anxiety. So if you feel that you're out of ideas, it's better to take your mind off the problem for a while. Maybe grab a cup of coffee, or take a walk outside. Sometimes a change of scenery is all we need for a new idea to pop up and help us figure out what we're missing. That's true in coding and in life.

If the problem you're trying to solve is complex and affects a lot of people, it can get really stressful to try to fully debug it with everyone waiting on you. That's why it's better to focus first on the short-term solution, and then look for the long-term remediation once those affected are able to get back to work. 

And don't be afraid to ask for help. Sometimes just the act of explaining the problem to someone else can help us realize what we're missing. There's a technique called rubber duck debugging, which is simply explaining the problem to a rubber duck. It sounds whimsical, and you may look like a quack, but it can really work, because when we force ourselves to explain a problem, we already start thinking about the issue differently. And remember that no one knows absolutely everything. Sometimes the best way to learn new skills and techniques is to ask others for help. We're all in this thing together.

There are times when I know that if I spend enough hours on a problem, I'll probably figure out a solution, but is that the best use of my time? Usually, the better answer is to ask someone who has done it before, to save time and frustration, and then use the problem at hand as an opportunity to keep learning, so that the next time, I can do it on my own.

When you ask a colleague for their help with debugging a problem, be careful not to tell them what you think the root cause of the issue might be. Instead, tell them about the symptoms, and see what questions they ask and what possibilities they probe. They might come up with completely different paths to explore.

Of course, our lives as IT specialists would be much easier if we could avoid problems altogether. Up next, we'll look into some proactive approaches to catching issues before they affect any users.

## 2. Proactive Practices

Something that IT specialists and exterminators have in common is dealing with bugs. I just love a good coding joke. Anyhow, moving on. Whether the bugs are in our software or someone else's, we'll come across lots of them, and they'll trigger lots of different failures in our programs. There are a bunch of strategies we can adopt to make our lives easier, by catching issues before they affect our users or by making troubleshooting simpler with better information.

We've touched upon some of them here and there, but now it's time for a deep dive. To avoid having to scramble to fix things when there's an outage, it's really helpful to have infrastructure that lets us test changes in advance, so that we can check that things are working as expected before they reach our users.

### 2.1 Tests

**If we're the ones writing the code, one thing we can do is to make sure that our code has good unit tests and integration tests.** If our tests have good coverage of the code, we can rely on them to catch a wide array of bugs whenever there's a change that may break things. For these tests to be really meaningful, we need to run them often, and make sure we know as soon as they fail. Setting up **continuous integration can help with that.** 
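As a rough illustration of the difference between the two kinds of tests, here's a small Python sketch. The log format (`user,action` per line) and the functions `parse_line` and `count_logins` are invented for this example. The unit test checks one function in isolation, while the integration test checks that reading a real file and parsing its lines work together.

```python
import os
import tempfile
import unittest
from collections import Counter


def parse_line(line):
    """Split a 'user,action' line into a (user, action) tuple."""
    user, action = line.strip().split(",")
    return user, action


def count_logins(path):
    """Count 'login' actions per user in the file at path."""
    counts = Counter()
    with open(path) as log_file:
        for line in log_file:
            user, action = parse_line(line)
            if action == "login":
                counts[user] += 1
    return dict(counts)


class ParseLineUnitTest(unittest.TestCase):
    # Unit test: one function, no files or other dependencies involved.
    def test_parse_line(self):
        self.assertEqual(parse_line("alice,login\n"), ("alice", "login"))


class CountLoginsIntegrationTest(unittest.TestCase):
    # Integration test: file reading and parsing exercised together.
    def test_count_logins_from_file(self):
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "events.log")
            with open(path, "w") as log_file:
                log_file.write("alice,login\nbob,logout\nalice,login\n")
            self.assertEqual(count_logins(path), {"alice": 2})
```

A continuous integration job would typically run something like `python3 -m unittest` on every change and notify us when a test fails. The exact setup depends on the CI system in use.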

Another step in this direction is to have a **test environment,** where we can deploy new code before shipping it to the rest of our users. This serves two purposes. First, we can do a thorough check of the software as it will be seen by the users. Depending on the software and how often we update it, we can do both automated and manual tests in this environment. Second, we can use this test environment to troubleshoot problems whenever they happen. We can try possible solutions and new features without affecting the production environment. 

### 2.2 Ways of Deploying

Taking this even further, another recommended practice when managing a fleet of computers is to deploy software in phases, also known as canary deployments. What this means is that instead of upgrading all computers at the same time and possibly breaking all of them at the same time, you upgrade some computers first and check how they behave. If everything goes fine, you can upgrade a few more, and so on until you're confident enough to upgrade the remaining part of the fleet. The name comes from the saying "like a canary in a coal mine": the first machines warn us of danger before it reaches everyone else.

To make the best use of this practice, we'll need to be able to easily roll back to the previous version. Depending on the software, this might require more or less infrastructure. But trust me, it's worth spending the time setting up that additional infrastructure. 
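The phased rollout and rollback described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `upgrade`, `is_healthy`, and `rollback` callables, the phase sizes, and the fake fleet are all hypothetical stand-ins for whatever deployment tooling is actually in place.

```python
def phased_rollout(hosts, upgrade, is_healthy, rollback, phases=(0.05, 0.25, 1.0)):
    """Upgrade hosts in growing batches; roll everything back on a failed check.

    phases holds the cumulative fraction of the fleet that should be upgraded
    after each step. Returns True if every phase passed, False if rolled back.
    """
    upgraded = []
    for fraction in phases:
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(upgraded):target]:
            upgrade(host)
            upgraded.append(host)
        # Check every upgraded host before widening the rollout.
        if not all(is_healthy(host) for host in upgraded):
            for host in reversed(upgraded):
                rollback(host)
            return False
    return True


# A fake fleet to show the behavior (all names are made up).
fleet = [f"host{n:02d}" for n in range(20)]
versions = {host: "1.0" for host in fleet}


def upgrade(host):
    versions[host] = "2.0"


def rollback(host):
    versions[host] = "1.0"


def is_healthy(host):
    # Pretend the new version breaks on host03 only.
    return not (host == "host03" and versions[host] == "2.0")


succeeded = phased_rollout(fleet, upgrade, is_healthy, rollback)
print("rollout succeeded:", succeeded)
print("hosts on new version:", sum(v == "2.0" for v in versions.values()))
```

In this run, the first phase upgrades a single canary host, the second phase reaches the broken host, and the rollback returns all five upgraded hosts to the old version while the other fifteen are never touched.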

If you deploy a software version that was broken and suddenly a bunch of your computers aren't working correctly, you'll want to roll them back to a previous state as fast as possible.

Now, even with all these preventative measures, bugs will still filter through and problems will occur. We can make our troubleshooting easier by including good debug logging in the code. That way, whenever we have to figure out an issue, we can look at the logs and get a pretty good idea of what's going on.
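As a small example of what debug logging can look like, here's a sketch using Python's standard `logging` module. The function `percent_free` and the logger name `disk_check` are made up for illustration. The idea is to record inputs and intermediate values at the `DEBUG` level, so that they're available when we need them and can be silenced otherwise.

```python
import logging

# In production we might set level=logging.INFO and lower it only while debugging.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("disk_check")


def percent_free(total_bytes, free_bytes):
    """Return the percentage of free space, logging the values involved."""
    logger.debug("total_bytes=%d free_bytes=%d", total_bytes, free_bytes)
    if total_bytes <= 0:
        logger.error("invalid total_bytes=%d; cannot compute percentage", total_bytes)
        raise ValueError("total_bytes must be positive")
    percent = 100 * free_bytes / total_bytes
    logger.debug("percent_free=%.1f", percent)
    return percent
```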

### 2.3 Centralized Logs Collection

Another method that can help us is having **centralized logs collection.** *This means there's a special server that gathers all the logs from all the servers or even all the computers in the network.* That way, when we have to look at those logs, we don't need to connect to each machine individually; we can comb through all the logs together on a centralized server.
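One possible way for a Python program to send its logs to such a server is the standard library's `SysLogHandler`, sketched below. This assumes the central server accepts syslog messages over UDP, which is only one of several common setups. The helper name `build_central_logger` and the host name in the comment are hypothetical.

```python
import logging
import logging.handlers


def build_central_logger(name, host, port=514):
    """Return a logger that forwards its records to a syslog collector over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Example use, assuming a collector is reachable at this made-up address:
# logger = build_central_logger("backup_job", "logs.example.com")
# logger.info("nightly backup finished")
```

In practice, many fleets forward logs with a system-level agent instead of from inside each program, so this is better read as an illustration of the idea than as a recommended setup.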

Similarly, having a good monitoring system can be super helpful. We can use it to catch issues early before they affect too many users. During a debugging session, we can look at the collected data to try to determine if there's anything out of the ordinary going on. 

We called out ticketing systems a few times already, because we can't stress their importance enough. Making good use of them can help us save a lot of time when trying to get to the bottom of a problem. **If we ask users to provide the needed information up front, we don't have to waste time going back and forth.** Even here, we can look at opportunities for automation. Say you almost always want some specific info from the users' computers. You can automate getting it by creating a script that gathers all the data you want, and have the users attach its output to the ticket.
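A script like that could look roughly like the following Python sketch. Which fields are worth collecting depends entirely on the environment, so the ones here (host name, OS details, disk usage) and the output file name are just assumptions for the example.

```python
import json
import platform
import shutil
import socket
from datetime import datetime, timezone


def collect_diagnostics(path="/"):
    """Gather basic facts about this machine into a dictionary."""
    usage = shutil.disk_usage(path)
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "python_version": platform.python_version(),
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }


def write_report(filename="diagnostics.json"):
    """Save the diagnostics as a JSON file the user can attach to a ticket."""
    with open(filename, "w") as report:
        json.dump(collect_diagnostics(), report, indent=2)
    return filename
```

Calling `write_report()` produces a single file in the current directory, which keeps the instructions for users short: run the script, attach the file.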

### 2.4 Documentation

Finally, remember to spend time writing documentation. Just as importantly, store the documentation in a well-known location. Even if writing documentation isn't especially fun, having good instructions on how to solve a specific problem, knowing how to diagnose what's going on with the server, or tracking the known issues in a system can be real time savers.

At Google, we have a bunch of docs called Playbooks where we detail what a person who's on call can do to diagnose and mitigate a ton of different problems. By keeping this information updated, we make sure that no matter who the person on call is, everybody has access to the knowledge base accumulated by the whole team.

It doesn't stop there. If we're dealing with systems that change and grow, we can proactively plan for the additional capacity that we'll need in the future. Speaking of planning ahead, you can plan to hear more about this in our next video.