# Analyzing Performance

## Using Linux built-in performance measurement tools

One of the most widely used tools (as you may have read in the textbook) is `time` ([man page](https://man7.org/linux/man-pages/man1/time.1.html)), available on Linux and BSD (among other operating systems) and usually located at `/usr/bin/time`.  To use it, put `/usr/bin/time` at the beginning of the line in your `%%qsub` cell that calls `python` or your compiled program.

By default, it prints the CPU time spent running the program's own code (user), the CPU time spent in the system on the program's behalf (system), the wall-clock time (elapsed), the percentage of CPU used, and several figures about RAM usage.
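To make the difference between user, system, and elapsed time concrete, here is a minimal Python sketch that measures a child process in roughly the same spirit as `time`. It is not how `/usr/bin/time` is implemented; the helper name `measure` is made up for this illustration, and the `resource` module it relies on is only available on Unix-like systems.

```python
import resource
import subprocess
import sys
import time


def measure(command):
    """Run command and return its elapsed, user, and system time in seconds."""
    # CPU time already charged to earlier (finished) child processes.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(command, check=True)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "elapsed": elapsed,                              # wall-clock time
        "user": after.ru_utime - before.ru_utime,        # CPU time in program code
        "system": after.ru_stime - before.ru_stime,      # CPU time in the kernel
    }


# A child that mostly sleeps: wall time passes, but little CPU time is used.
result = measure([sys.executable, "-c", "import time; time.sleep(0.5)"])
print(result)
```

For a program that keeps several workers busy, the user time can exceed the elapsed time, because CPU time is summed over all cores.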

You can customize the output of `/usr/bin/time` by passing the `--format="..."` option, replacing the ellipsis with a printf-style format string.

For instance, if we want to time the program `sleep 5`, with the format string `"real %e system %S cpu %P avg_ram_kb %K"`, our line would look like:

`/usr/bin/time --format="real %e system %S cpu %P avg_ram_kb %K" sleep 5`

**NOTE**: The output from this tool is written to standard error, so it appears in the error file (`STDIN.eNNNNNN`) rather than in the output file.  Look through the `STDIN.oNNNNNN` file as well, so that you have the job number and know which run used which parameters.
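If you collect many runs, copying numbers out of the error files by hand gets tedious. The following is a minimal sketch of a parser for the format string used above; the function name `parse_time_output` and the sample text are made up for illustration, and the pattern assumes the timing line appears exactly as that format string would print it.

```python
import re

# Matches the output of the format string
# "real %e system %S cpu %P avg_ram_kb %K" used above.
TIME_LINE = re.compile(
    r"real (?P<real>[\d.]+) system (?P<system>[\d.]+) "
    r"cpu (?P<cpu>[\d?]+)% avg_ram_kb (?P<avg_ram_kb>\d+)"
)


def parse_time_output(text):
    """Return the timing fields found in text, or None if there are none."""
    match = TIME_LINE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "real": float(fields["real"]),
        "system": float(fields["system"]),
        # Tolerate a non-numeric CPU percentage instead of raising an error.
        "cpu_percent": int(fields["cpu"]) if fields["cpu"].isdigit() else None,
        "avg_ram_kb": int(fields["avg_ram_kb"]),
    }


# Illustrative text only, not the output of a real run.
sample = "real 5.00 system 0.00 cpu 0% avg_ram_kb 0\n"
print(parse_time_output(sample))
```

To use it on a real job, read the contents of the corresponding `STDIN.eNNNNNN` file (with the actual job number) and pass the text to `parse_time_output`.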

Try using this tool on your code in the following cells!

## Collecting run data

To make sure that we have adequate data, submit at least 10 different variations of your code, such as the following examples (based on the Monte Carlo example):

1. Run with draw number size 10000000 and 2 workers
1. Run with draw number size 10000000 and 4 workers
1. Run with draw number size 10000000 and 6 workers
1. Run with draw number size 10000000 and 8 workers
1. Run with draw number size 10000000 and 10 workers
1. Run with draw number size 10000000 and 12 workers
1. Run with draw number size 10000000 and 14 workers
1. Run with draw number size 10000000 and 16 workers
1. Run with draw number size 100000000 and 8 workers
1. Run with draw number size 100000000 and 16 workers
1. Run with draw number size 1000000000 and 8 workers
1. Run with draw number size 1000000000 and 16 workers
1. Run with draw number size 10000000000 and 8 workers
1. Run with draw number size 10000000000 and 16 workers

Now use the two cells below to run your job for the different variations.  You can either run the variations programmatically or run each one manually and note the data down.
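If you would rather generate the variations than type them out, the list above can be sketched in a few lines of Python. The `--draws` and `--workers` flags and the `build_command` helper are hypothetical: substitute whatever arguments your own program actually accepts.

```python
import shlex
from itertools import product

TIME_FORMAT = "real %e system %S cpu %P avg_ram_kb %K"


def build_command(draws, workers, script="ParallelCode.py"):
    """Build one timed command line (the flag names are hypothetical)."""
    return [
        "/usr/bin/time", f"--format={TIME_FORMAT}",
        "python", script,
        "--draws", str(draws),
        "--workers", str(workers),
    ]


# The variations listed above: a worker sweep at 10**7 draws, then larger
# draw counts at 8 and 16 workers each.
variations = [(10**7, workers) for workers in range(2, 17, 2)]
variations += list(product([10**8, 10**9, 10**10], [8, 16]))

for draws, workers in variations:
    # shlex.join quotes the format string so the line is safe to paste into a shell.
    print(shlex.join(build_command(draws, workers)))
```

Each printed line can be pasted into a `%%qsub` cell after the `cd $PBS_O_WORKDIR` line.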

In [None]:
import cfxmagic

We can also submit to different types of machines.  The Intel(R) Core(tm) processors differ in specifications from the Intel(R) Xeon(tm) processors.  To switch between the Intel Core nodes and the Intel Xeon nodes, call `qsub` with different node properties, as shown:

In [None]:
%%qsub -l nodes=1:core:ppn=2
cd $PBS_O_WORKDIR
python ParallelCode.py

In [None]:
%%qsub -l nodes=1:xeon:ppn=2
cd $PBS_O_WORKDIR
python ParallelCode.py

## Generating plots

We would like to generate plots from the data we collected above.  Ideally, we would generate data files that we could import and plot directly, but for now it is fine to create lists of data manually, as this is not a class on data analysis.

A nice video explaining plotting in Jupyter notebooks is available at: https://www.youtube.com/watch?v=Hr4yh1_4GlQ

Practice by plotting the number of workers (or another variable, such as the draw size) against a response (such as wall time, CPU time, or memory usage):
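As a starting point, here is a minimal sketch that plots wall time and speedup against the number of workers using matplotlib. It assumes matplotlib is installed in your notebook environment; the helper names `speedup` and `plot_scaling` are made up for this example, and the numbers are placeholders, not measurements.

```python
def speedup(wall_seconds):
    """Speedup of each run relative to the first run in the list."""
    baseline = wall_seconds[0]
    return [baseline / seconds for seconds in wall_seconds]


def plot_scaling(workers, wall_seconds):
    """Plot wall time and speedup against the number of workers."""
    # Imported here so that speedup() can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, (ax_time, ax_speedup) = plt.subplots(1, 2, figsize=(10, 4))
    ax_time.plot(workers, wall_seconds, marker="o")
    ax_time.set_xlabel("Workers")
    ax_time.set_ylabel("Wall time (s)")
    ax_speedup.plot(workers, speedup(wall_seconds), marker="o")
    ax_speedup.set_xlabel("Workers")
    ax_speedup.set_ylabel("Speedup relative to first run")
    fig.tight_layout()
    return fig


# Placeholder numbers only: replace them with the values from your own runs.
workers = [2, 4, 8, 16]
wall_seconds = [40.0, 20.0, 10.0, 5.0]
print(speedup(wall_seconds))
```

Calling `plot_scaling(workers, wall_seconds)` in a notebook cell should render the figure inline. Real measurements will rarely scale as cleanly as the placeholder values suggest.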

## Presenting data/plots inline with Markdown text

Now that you have your data and know how to plot your data, create Markdown and code cells below to answer the following questions in report format (incorporating the code cells to generate plots):

1. What software application did you choose to attempt to parallelize or augment?
2. How did you parallelize or augment your chosen software application?
3. How did the throughput or latency of your software application change as you increased the number of resources (workers, CPUs, etc.)?
4. Were there any differences between the results from the Linux system performance measurement tools and those from your language-based measurement tools?  What might cause those differences?
5. What would you change if you were to attempt this project again?