## Crashing Programs

Programs usually crash because the software ran into an unexpected situation: a state that the developers didn't anticipate.

## System that crashes

The first thing you need to do is reduce the scope of the problem, and remember, you want to start with the actions that are easier and faster to check. As a first step, you look at the logs to see if there's any error that may point to what's happening, but you only find an error saying "application terminated" and no useful information.


#### RAM chips
If some hardware component is at fault, the next step is to find out which one. Given the random crashes, one thing to check would be the RAM. Memory chips deteriorate over time. When they do, the computer might write data to some part of the memory and then get a totally different value when trying to read it back.

**To check the health of our RAM, we can use the memtest86 tool to look for errors**. We run this tool on boot instead of the normal operating system so that it can access all of the available memory and verify that the data written to memory is the same when it tries to read it back.

If the RAM is fine, you can check if the computer's overheating by looking at the sensor data provided by the OS. If that's not the case, check if there's a problem with external devices like a graphics card or sound card.

#### Operating system problem

Each OS ships its own battery of hard drive checking tools, and you should familiarize yourself with the ones in the OS you're working with.

You'll want to look at the output of the tools that check the disk for bad sectors, and you'll also want to use the **S.M.A.R.T. tools, which can help detect errors and even try to anticipate problems before they affect the computer's performance**.

## Understanding Crashing Applications

To look at logs on **Linux, we'll open the system log files in `/var/log` or the user log file `~/.xsession-errors`**.
On *macOS we generally use the Console app to look at logs, and on Windows the Event Viewer*.

If there are no errors, or the errors aren't useful, we can try to find out more by enabling **debug logging. Many applications generate a lot more output when debug logging is enabled**. We might need to enable it from a setting in the application's **configuration file or a command line parameter to pass** when running the application manually. By enabling this extra logging information, we can get a better idea of what's actually causing the problem.
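As a rough illustration of the command line parameter approach, here is a minimal Python sketch. The `--debug` flag, the `configure_logging` helper, and the `example_app` logger name are all hypothetical; real applications each define their own switch, so check the documentation of the one you're debugging.

```python
import argparse
import logging


def configure_logging(argv=None):
    """Set the log level from a hypothetical --debug command-line flag."""
    parser = argparse.ArgumentParser(description="Example application")
    parser.add_argument("--debug", action="store_true",
                        help="enable verbose debug logging")
    args = parser.parse_args(argv)

    # Without the flag only warnings and errors are shown.
    level = logging.DEBUG if args.debug else logging.WARNING
    # force=True replaces handlers configured earlier (Python 3.8+).
    logging.basicConfig(level=level,
                        format="%(asctime)s %(levelname)s %(message)s",
                        force=True)
    return logging.getLogger("example_app")
```

Calling `configure_logging(["--debug"])` returns a logger whose `log.debug(...)` messages are printed; calling it with an empty argument list hides them and keeps only warnings and errors.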

### If no error messages or logs

On Linux we use **strace to see what system calls a program is doing**. The equivalent tool on macOS is called dtruss. Process Monitor is a Windows tool that can also take a peek at what's going on inside a process.

By tracing which system calls a program is doing, we can see what files and directories it's trying to open, what network connections it's trying to make, and what information it's trying to read or write.
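Trace output gets long quickly, so it can help to filter it for calls that failed. The sketch below is one possible way to do that in Python, assuming the usual strace line format where a failed call ends in `= -1 ERRNO (description)`. The `failed_calls` helper is hypothetical and not part of strace.

```python
import re

# Matches strace lines for calls that failed, for example:
# openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# An optional leading PID (as printed by strace -f) is skipped.
FAILED_CALL = re.compile(
    r'^(?:\d+\s+)?(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)'
)


def failed_calls(trace_lines):
    """Return (syscall, first quoted argument or None, errno name) per failed call."""
    failures = []
    for line in trace_lines:
        match = FAILED_CALL.match(line.strip())
        if not match:
            continue  # successful call, signal, or exit line
        quoted = re.search(r'"([^"]*)"', match.group("args"))
        target = quoted.group(1) if quoted else None
        failures.append((match.group("call"), target, match.group("errno")))
    return failures
```

Fed the lines of a trace saved with `strace -o trace.txt ./program`, this returns tuples such as `("openat", "/etc/app/settings.conf", "ENOENT")`, which points straight at a file the program expected but couldn't find. The path in that tuple is made up for illustration.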

#### Recently started crashing applications

If the application used to work fine and recently started crashing, it's useful to look into what changed in between. The first thing is to check if the issue is caused by a new version of the application itself. Maybe there's a bug in the new version that causes the crash, or maybe the way that we're using the application is no longer supported. But that's not the only possible change that could trigger crashes. It could also be that a library or service used by our application changed and they no longer work well together, or it could be that there was a configuration change in the overall environment.

When trying to figure out what changed, logs can also be a useful source of information. In the system log we can check which programs and libraries were recently updated. Checking for configuration changes might be harder, depending on how you manage that configuration. If the settings are managed through a configuration management system and the values are stored in a version control system, you can look at the history of changes to see which ones could have triggered the problem.
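As one concrete way to check what was recently updated, the sketch below filters package upgrade entries out of a package manager log. It assumes a Debian or Ubuntu system, where `/var/log/dpkg.log` records lines of the form `DATE TIME upgrade PACKAGE OLD_VERSION NEW_VERSION`. Other distributions log this differently, and the `recent_upgrades` helper is hypothetical.

```python
from datetime import datetime


def recent_upgrades(log_lines, since):
    """Return (timestamp, package, old_version, new_version) for upgrades at or after `since`."""
    upgrades = []
    for line in log_lines:
        fields = line.split()
        # Expected shape: DATE TIME upgrade PACKAGE OLD_VERSION NEW_VERSION
        if len(fields) != 6 or fields[2] != "upgrade":
            continue
        stamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S")
        if stamp >= since:
            upgrades.append((stamp, fields[3], fields[4], fields[5]))
    return upgrades
```

Passing the lines of the log together with the date the crashes started gives a short list of packages to look at first.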

When we're trying to debug an application that crashes, finding a reproduction case can help us both understand what's causing the crash and figure out what we can do to fix it. So it's valuable to spend some time figuring out the state that triggers the crash.

And remember, we want to make the reproduction case as small as possible. This lets us better understand the problem and also quickly check if it's present or not when we attempt to fix it.
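Shrinking a reproduction case can be partly automated when the crash is triggered by some input, such as a list of records or lines in a file. The sketch below is a simplified greedy reduction, not a full delta debugging implementation. It assumes you can supply a `still_crashes` function (hypothetical) that runs the program on a candidate input and reports whether the crash still happens.

```python
def shrink_reproduction(items, still_crashes):
    """Greedily drop chunks of `items` while `still_crashes(items)` stays True."""
    if not still_crashes(items):
        raise ValueError("the starting input does not reproduce the crash")
    chunk = max(len(items) // 2, 1)
    while chunk >= 1:
        start = 0
        while start < len(items):
            # Try the input with one chunk removed.
            candidate = items[:start] + items[start + chunk:]
            if candidate and still_crashes(candidate):
                items = candidate      # the removed chunk was not needed
            else:
                start += chunk         # keep this chunk and move on
        chunk //= 2                    # retry with smaller chunks
    return items
```

Starting with large chunks keeps the number of test runs low when most of the input is irrelevant, and the final pass with single items removes whatever is left that doesn't matter.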
 
**To find the root cause of a crashing application, we'll want to look at all available logs, figure out what changed, trace the system or library calls the program makes, and create the smallest possible reproduction case.**

## What to do when you can't fix the program?

### Wrapper

If the problem is caused by an external service that the application uses and that's no longer compatible, we could write a service to act as a proxy and make sure that both sides see the requests and responses they expect. This type of compatibility layer is called a wrapper. A wrapper is a function or program that provides a compatibility layer between two functions or programs so they can work well together. Using wrappers is a pretty common technique when the expected output and input formats don't match. So if you're faced with some sort of compatibility problem, don't be afraid to write a wrapper to work around it.
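Here is a small Python sketch of the idea. Everything in it is made up for illustration: `new_inventory_service` stands in for an updated service that now returns JSON, and `report_disk` stands in for application code that still expects the old comma-separated text.

```python
import json


def new_inventory_service(host):
    """Stand-in for an updated service that now returns JSON (hypothetical)."""
    return json.dumps({"hostname": host, "disk_free_mb": 2048})


def report_disk(line):
    """Stand-in for the application code, which still expects 'host,free_gb' text."""
    host, free_gb = line.split(",")
    return f"{host} has {float(free_gb):.1f} GB free"


def inventory_wrapper(host):
    """Wrapper: call the new service and translate its output to the old format."""
    data = json.loads(new_inventory_service(host))
    free_gb = data["disk_free_mb"] / 1024  # the old format used gigabytes
    return f'{data["hostname"]},{free_gb}'
```

The application keeps calling `report_disk(inventory_wrapper("web01"))` and neither the service nor the application has to change; only the wrapper knows about both formats.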

### Container

Another possibility you might need to look at is if the overall system environment isn't working well with the application. In this case, you might want to check what environment the application's developers recommend and then modify your systems to match that. This could be running the same version of the operating system, using the same version of the dynamic libraries, or interacting with the same back-end services. Say the application was developed and tested on Windows 7. If you run into problems while trying to run it under Windows 10, you might want to consider running the application inside a virtual machine or maybe a container.
 
 
### Watchdog

Sometimes we can't find a way to stop an application from crashing, but we can make sure that if it crashes it starts back up again. To do this, we can deploy a watchdog. This is a process that checks whether a program is running and, when it's not, starts the program again. To implement this, we need to write a script that stays running in the background and periodically checks if the other program is running. Whenever the check fails, the watchdog will trigger the program to restart. Doing this won't avoid the crash itself, but it will at least ensure that the service is available.
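A minimal watchdog along these lines could look like the Python sketch below. It assumes the watchdog itself launches the program, so it can poll the child process directly. The `watchdog` function, its parameters, and the `max_restarts` limit are choices made for this example, not a standard tool.

```python
import subprocess
import sys
import time


def watchdog(command, check_interval=5.0, max_restarts=None):
    """Keep `command` running; return how many times it was restarted."""
    restarts = 0
    process = subprocess.Popen(command)
    while True:
        time.sleep(check_interval)
        if process.poll() is None:
            continue  # still running, nothing to do
        if max_restarts is not None and restarts >= max_restarts:
            return restarts  # give up instead of restarting forever
        print(f"{command[0]} exited with code {process.returncode}; restarting",
              file=sys.stderr)
        process = subprocess.Popen(command)
        restarts += 1
```

With `max_restarts=None` the loop runs until the watchdog itself is stopped, which is what you'd want for a background service. In practice a service manager such as systemd can often do this job for you, so a hand-written script is mostly useful when no such manager is available.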

## Internal Server Error

```
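# Look for recently modified logs and read the latest system log entries
cd /var/log
ls -lt | head
tail syslog

# Find which process is listening on port 80
sudo netstat -nlp | grep :80

# Inspect the nginx configuration, then the enabled sites
ls -l /etc/nginx
ls -l /etc/nginx/sites-enabled/

# Inspect the uWSGI application configuration
ls -l /etc/uwsgi/apps-enabled/

# Check the ownership and permissions of the site's log file
ls -l site.log

sudo service uwsgi reload

ls -l site*

# Make the web server user (www-data) the owner of the log file
sudo chown www-data:www-data site.log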
```

But we do know that the Web server is running on port 80, the default web serving port. How can we find which software is listening on port 80? We can use the netstat command which can give us a bunch of information about our network connections depending on the flags we pass.

This command accesses a bunch of sockets that are restricted to root, the administrator user on Linux. So we'll need to call it with sudo, which lets us run commands as root, and then we'll pass a bunch of flags to netstat. We'll use **`-n` to print numerical addresses instead of resolving host names, `-l` to only check out the sockets that are listening for connections, and `-p` to print the process ID and name to which each socket belongs**. Since we only care about port 80, we'll pipe the output to a **grep command checking for `:80`.**
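One caveat with grepping for `:80` is that it also matches ports such as 8080. If you want an exact match, the same filtering can be sketched in Python. This assumes the usual Linux column layout of `netstat -nlp` for TCP sockets (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the `listeners_on_port` helper is hypothetical.

```python
def listeners_on_port(netstat_lines, port):
    """Return 'PID/program' entries for TCP sockets listening on `port`."""
    owners = []
    for line in netstat_lines:
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        fields = line.split(None, 6)
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue  # header, UDP, or UNIX socket line
        # The port is whatever follows the last colon (works for IPv6 too).
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            owners.append(fields[6].strip())
    return owners
```

The input can come from `subprocess.run(["sudo", "netstat", "-nlp"], capture_output=True, text=True).stdout.splitlines()`, and each returned entry combines the process ID and program name, as in netstat's last column.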

We see that **the process listening on port 80 is called "nginx"**, one of the popular web serving applications out there. We now want to check out the configuration for our site. **Configuration files on Linux are stored in the `/etc` directory**. So let's look at `/etc/nginx`.

We're looking for the configuration related to a specific site. So let's look at `/etc/nginx/sites-enabled`.

At the bottom of the site's configuration we see `uwsgi_pass`, followed by the localhost address and a different port number. It seems that this website isn't being served directly from nginx. Instead, nginx is passing control of the connections to **uWSGI, which is a common solution used to connect a web server to programs that generate dynamic pages**.

## Resources for understanding crashes

There are a ton of different reasons why a computer might crash. This Scientific American article discusses many of the possible reasons, including hardware problems and issues with the overall operating system or the applications on top.

On Linux or macOS, the worst kind of crash is called a kernel panic. On Windows, it's known as the Blue Screen of Death. These are situations where the computer completely stops responding and only a reboot can make it work again. They don't happen often, but it's good to understand what they mean: the whole OS encountered an error and it can't recover.

We called out that reading logs is super important. You should know how to read logs on the operating system that you're using. Here are some resources for this:

- How to find logs on Windows 10 (Digital Masters Magazine)
- How to view the System Log on a Mac (How-to Geek)
- How to check system logs on Linux (FOSS Linux)

You also need to be familiar with the tools available in your OS to diagnose problems. These are the tools we called out, but you don't need to limit yourself to them:

- Process Monitor for Windows (Microsoft)
- Linux strace command tutorial for beginners (HowtoForge)
- How to trace your system calls on Mac OS (/etc/notes)