# Cyberinfrastructure

## Segment 1 of 5

<i>Lesson Developers: </i>
<ul>
    <li>
    <i>Craig A. Stewart (stewart@iu.edu)</i>
    </li>
    <li>
    <i>Sri Harsha (srmanj@iu.edu)</i>
    </li>
    <li>
    <i>Brian Cooper (coope824@umn.edu)</i>
    </li>
    <li>
    <i>Eric Shook (eshook@umn.edu)</i>
    </li>
</ul>


 <img src="supplementary/pti.jpg" width="200" alt="Pervasive Technology Institute logo">

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci


# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

# Introduction and History

In this section we will cover the history of computation and computing and why the need for the word cyberinfrastructure arose.

- For a long time (thousands of years) people talked about computation and computers.
- Then in 2003, the National Science Foundation decided that a new word was needed to talk about the infrastructure that was used to support the creation of knowledge: **cyberinfrastructure**. 



## Why?

To understand what happened, let’s look a bit at the history of computation and computers.

## Early History of Computation

<table>
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:700px">
        <ul>
           <li>During the time of the Roman empire (about 2,000 years ago), calculating where stones thrown by a catapult would land was an important mathematical problem.</li>
            <li>Here is an example of a catapult, used by Rome and by countries that invaded Rome.</li>
        </ul>
    </td>
    <td>
        <img src="supplementary/catapult.png" width="450"/>
        <font size="-1">Image credit: <a href="https://commons.wikimedia.org/wiki/File:Mang2.png">Wikimedia</a></font>
    </td>
  </tr>
</table>




## Run some simulations of your own!

- To get a sense for the number of parameters you need to consider to estimate the distance a boulder (the payload) is thrown by a catapult, you can run a simulation of a catapult. 
- Try it a few times!


In [None]:
IFrame("supplementary/catapult.html", width=984, height=700)

### An early “computation device”

<table>
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:700px">
        <ul>
          <li>Archimedes was an important mathematician who made many discoveries.</li>
          <li>He lived from 287 to 212 BC.</li>
          <li>He was valuable to the army of Syracuse, his home city (which was allied with Carthage and at war with Rome at the end of Archimedes' lifetime), because he was very good at calculating where stones thrown by a catapult would land. This was a calculation that had essentially one number as <b>output</b>: how far the stone would go.</li>
        </ul>
    </td>
    <td>
        <img src="supplementary/Archimedes.jpg" width="300"/>
        <font size="-1">Image credit: <a href="https://commons.wikimedia.org/wiki/File:Domenico-Fetti_Archimedes_1620.jpg">(Wikimedia)</a></font>
    </td>
  </tr>
</table>





## Let's figure out how quickly Archimedes could calculate the distance a boulder is going to fly
Once a catapult is built, there are really just two parameters you can adjust that impact the distance:
  1. Mass of the projectile
  2. Amount of tension
  
These two parameters are called **inputs**. They determine the **output**, which is the distance a boulder will fly.


## Let's figure out how quickly Archimedes could calculate the distance a boulder is going to fly

Inputs and outputs are simply **data** that Archimedes can use and produce as a human calculator to calculate distance. Let's get a little more precise in how we represent this data. Computers are based on the binary number system, which means they use 0's and 1's. One **bit** is either a 0 or a 1. One **byte** is 8 bits.

| Binary Number | Decimal Number | 
|------|--------|
|00000000 |   0 |
|00000001 |   1 |
|00000010 |   2 |
|00001010 |  10 |
|00010000 |  16 |
|11111111 | 255 |

Two bytes (or 16 bits) can store numbers up to 65,535. Four bytes (or 32 bits) can store numbers up to 4,294,967,295! See more about binary numbers [here](https://en.wikipedia.org/wiki/Binary_number). If we can store more numbers, then we can be more precise. 
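
The table and the byte limits above can be checked with a few lines of Python. This is a minimal sketch; the helper name `max_unsigned` is made up for illustration. Because counting starts at 0, `n` bits give `2**n` different values, so the largest one is `2**n - 1`.

```python
def max_unsigned(num_bytes: int) -> int:
    """Largest whole number (counting from 0) that fits in num_bytes bytes."""
    bits = 8 * num_bytes
    return 2 ** bits - 1

# Convert the binary numbers from the table into decimal numbers.
for binary in ["00000000", "00000001", "00000010", "00001010", "00010000", "11111111"]:
    print(binary, "=", int(binary, 2))

# Show the range of numbers that 1, 2, and 4 bytes can store.
for n in (1, 2, 4):
    print(f"{n} byte(s): 0 to {max_unsigned(n):,}")
```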



## How do bits relate to precision?

Let's take a look at the following image. If Archimedes is trying to hit the target that is in the firing range, then he needs to have small enough units to be able to communicate the location. If he has only 1 bit, then he can either fire the maximum distance (1) or half the maximum distance (0). If he has 4 bits, then he has 16 different locations that can be calculated. This is similar to the difference between measuring distance using kilometers versus meters. If the target is 1,400 meters away, then 1 km is too short, but 2 km is too far.

<img src="supplementary/catapult-bits.png">
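
Here is a rough sketch of the same idea in Python. It assumes a maximum range of 2,000 meters and evenly spaced firing distances; both are illustrative assumptions, not values taken from the image. The function name `nearest_level` is also made up for this example.

```python
def nearest_level(target_m: float, max_range_m: float, bits: int) -> float:
    """Closest distance we can name when the range is split into 2**bits even steps."""
    levels = 2 ** bits
    step = max_range_m / levels
    # Assumed layout: the nameable distances are step, 2*step, ..., max_range_m.
    index = min(levels, max(1, round(target_m / step)))
    return index * step

TARGET_M = 1400       # target distance used in the text above
MAX_RANGE_M = 2000    # assumed maximum range of the catapult

for bits in (1, 4, 8):
    shot = nearest_level(TARGET_M, MAX_RANGE_M, bits)
    print(f"{bits} bit(s): aim at {shot} m, off by {abs(TARGET_M - shot)} m")
```

With more bits, the closest distance that can be named gets nearer to the target.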

## Let's figure out how quickly Archimedes could calculate the distance a boulder is going to fly

Let’s figure that maybe people were really precise with their measurements and the two input values might take four bytes each, and the one output value might take four bytes as well... 

So that means that we have:
  * 2 * 4 bytes of **input** 
  * 1 * 4 bytes of **output**

We can use these to figure out the **Input/Output Rate** or **I/O Rate**, which is how quickly we can accept input, run the calculations, and produce output. What do you think the "I/O rate" of Archimedes might have been?





### Well, we don't really know ....

because we don’t know how quickly Archimedes could run his calculations. Perhaps he approximated his results. He was, after all, trying to help soldiers crush other soldiers with rocks; he wasn’t doing brain surgery.

But let’s figure maybe 1 calculation in 5 minutes and 4 Bytes of I/O per minute, tops, was the rate for Archimedes as a human calculator. If that is the case, it would take him two minutes to get the two input values from a soldier (2 * 4 bytes), five minutes to run his calculations, and another minute to communicate the distance to the catapult launcher (1 * 4 bytes). So overall he would need approximately 8 minutes to input, calculate, and output 12 bytes, or about 1.5 bytes per minute.

Now let's compare Archimedes to other types of calculating machines...
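
The estimate above is simple arithmetic, and it can be written out as a short Python sketch. All of the constants are the guesses from the paragraph above, not measured values.

```python
BYTES_PER_VALUE = 4        # each input or output value takes four bytes
IO_BYTES_PER_MINUTE = 4    # guessed speed for speaking or hearing numbers
CALC_MINUTES = 5           # guessed time for one calculation

input_bytes = 2 * BYTES_PER_VALUE    # two inputs
output_bytes = 1 * BYTES_PER_VALUE   # one output

input_minutes = input_bytes / IO_BYTES_PER_MINUTE
output_minutes = output_bytes / IO_BYTES_PER_MINUTE
total_minutes = input_minutes + CALC_MINUTES + output_minutes
total_bytes = input_bytes + output_bytes

print(f"Total time: {total_minutes} minutes for {total_bytes} bytes")
print(f"Overall rate: {total_bytes / total_minutes} bytes per minute")
```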

## Later there were “calculating machines”

- These are mechanical devices that perform calculations
- The Chinese abacus was very practical, and skilled people using it were very fast
- The Arithmometer, manufactured and sold starting in 1851, was the first commercially successful calculating machine for office use

<table>
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:50%">
        <figure>
        <div>
        <img src="https://upload.wikimedia.org/wikipedia/commons/a/af/Abacus_6.png" width="250"/>
        <footer><small><!-- copyright notice --></small></footer>
        </div>
        </figure>
        <font size="-1">Image credit: <a href="https://commons.wikimedia.org/wiki/File:Abacus_6.png">Wikimedia</a></font>
    </td>
    <td>
        <figure>
        <div>
        <img src="https://upload.wikimedia.org/wikipedia/commons/5/59/Arithmometre.jpg" width="300"/>
        <footer><small><!-- copyright notice --></small></footer>
        </div>
        </figure>
        <font size="-1">Image credit: <a href="https://en.wikipedia.org/wiki/Arithmometer">Wikipedia</a></font>
    </td>
  </tr>
</table>









### Wait a moment...!

You're about to see a virtual Chinese abacus that uses a hexadecimal numeral system. The hexadecimal numeral system is simply a numeral system with 16 base digits (instead of 10 for decimal). 
The hexadecimal base digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, where "f" is equivalent to 15 in the decimal system (see <a href="https://en.wikipedia.org/wiki/Hexadecimal">here</a> for more details). 
 
The abacus has an upper deck with two beads, each worth 5, and a lower deck with five beads, each worth 1. 
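
To see why this bead layout fits hexadecimal, here is a small Python sketch. It assumes the bead counts describe a single rod of the abacus; the function name `rod_value` is made up for illustration.

```python
UPPER_BEADS, UPPER_VALUE = 2, 5   # two upper beads, each worth 5
LOWER_BEADS, LOWER_VALUE = 5, 1   # five lower beads, each worth 1

def rod_value(upper_used: int, lower_used: int) -> int:
    """Value shown by one rod when this many beads are pushed toward the bar."""
    if not (0 <= upper_used <= UPPER_BEADS and 0 <= lower_used <= LOWER_BEADS):
        raise ValueError("more beads used than the rod has")
    return upper_used * UPPER_VALUE + lower_used * LOWER_VALUE

# The largest value on one rod is exactly the largest hexadecimal digit.
max_rod = rod_value(UPPER_BEADS, LOWER_BEADS)
print(max_rod, "in decimal is", format(max_rod, "x"), "in hexadecimal")

# Python converts between decimal and hexadecimal directly.
print(int("ff", 16))   # hexadecimal "ff" as a decimal number
print(hex(255))        # decimal 255 as a hexadecimal string
```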


## Try a virtual Abacus simulator and see how fast you can make it go!

In [None]:
# Auto-run
IFrame("supplementary/abacus.html", width=600, height=475)

> In 2012, Naofumi Ogasawara, a 22-year-old abacus instructor from Japan completed the task of calculating 10 sums of 10 10-digit numbers each in three minutes and 11 seconds. ( [Recordholders.org](http://www.recordholders.org/en/events/worldcup/2012/results.html); [theguardian.com](https://www.theguardian.com/science/alexs-adventures-in-numberland/2012/oct/10/mental-calculation-world-cup) )

### What is the calculation rate and I/O rate that the winner achieved?

- The input was 100 integers. A 10-digit number can be as large as 9,999,999,999, which is too big for 4 bytes, so each one needs 5 bytes (500 bytes total)
- The output was 10 integers, each of which also fits in 5 bytes (50 bytes total)
- The number of seconds was 191
- Each sum of 10 integers took 9 additions, for 90 additions in all. If we count each of the 10 digit columns as a separate step, that is roughly 900 mathematical **operations**
- The total calculation rate was approximately (900 operations / 191 seconds) or ~4.7 operations per second
- So the total I/O rate was something like 550 bytes / 191 seconds, or about 3 bytes per second (still on the order of a byte per second)
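
The figures for this record can be recomputed with a short Python sketch. It works out how many bytes a 10-digit number needs from the number itself, using `int.bit_length`. Counting one operation per digit column is an assumption made here to estimate the work involved; the helper name `bytes_needed` is made up for illustration.

```python
def bytes_needed(largest_value: int) -> int:
    """Smallest number of whole bytes that can hold largest_value."""
    return max(1, (largest_value.bit_length() + 7) // 8)

largest_input = 10**10 - 1          # biggest 10-digit number: 9,999,999,999
largest_sum = 10 * largest_input    # biggest possible sum of ten such numbers

input_bytes = 100 * bytes_needed(largest_input)
output_bytes = 10 * bytes_needed(largest_sum)

seconds = 3 * 60 + 11               # three minutes and 11 seconds
additions = 10 * 9                  # 10 sums, 9 additions each
operations = additions * 10         # assumption: one operation per digit column

print(f"Input: {input_bytes} bytes, output: {output_bytes} bytes")
print(f"Calculation rate: {operations / seconds:.1f} operations per second")
print(f"I/O rate: {(input_bytes + output_bytes) / seconds:.1f} bytes per second")
```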





## The first programmable, fully automatic digital computer

<table>
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:700px; text-align: left;">
        <ul>
          <li>The Z3 was invented by Konrad Zuse in Berlin, Germany, in 1941. It was built from electromechanical relays rather than electronic parts.</li>
          <li>It could accept any program and took about 1 second per addition and 3 seconds per multiplication of a 22-bit number. That is no faster than a skilled abacus user!</li>
          <li>We’re not quite sure what the I/O rates were - but input was with a keyboard and output with lights, so … pretty slow.</li>
        </ul>
    </td>
    <td>      
<figure>
    <div>
    <img src="https://upload.wikimedia.org/wikipedia/commons/4/4c/Z3_Deutsches_Museum.JPG" width="400"/>
    <footer><small><!-- copyright notice --></small></footer>
    </div>
</figure>

<font size="-1">Image credit: [Wikimedia](https://upload.wikimedia.org/wikipedia/commons/4/4c/Z3_Deutsches_Museum.JPG)</font>    
    </td>
  </tr>
</table>



## Supercomputers!

In the 1970s and 1980s there were lots of different labels for computers: mainframe computers, minicomputers, workstations. The label “supercomputer” was invented for several reasons:
  - This word distinguished the most powerful computers on earth from "ordinary" computers
  - It sounds cool


##  What makes a supercomputer "super"?
  - There is no fixed, agreed-upon definition. The general idea is that a supercomputer is one of the most powerful computers around
  - One of the general characteristics of supercomputers is that they break computational problems up into many parts and work on those problems in **parallel** – many different processors each analyzing a part of a problem
  - If you have a computer and it costs more than \$1,000,000 and you want to call it “super” then go ahead!
  
*Side note:* To use supercomputers effectively to solve problems, it is important to learn about **parallel computing** so make sure to check out the beginner lesson on parallel computing if you are interested!


## The First Supercomputer

<table>
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:700px; text-align: left;">
        <ul>
          <li>The first supercomputer which was both “super” and called a supercomputer was the Control Data 6600, designed by Seymour Cray. The first system was delivered to a commercial customer in 1964.</li>
          <li>This supercomputer could do up to 3,000,000 calculations per second.</li>
          <li>I/O Rate. Because of the way the system was designed, it’s tricky to calculate an input rate. But output was with a teletype, so output was no more than about 10 characters per second.</li>
        </ul>
    </td>
    <td>      
<figure>
    <div>
    <img src="https://upload.wikimedia.org/wikipedia/commons/c/c4/CDC_6600.jc.jpg" width="400"/>
    <footer><small><!-- copyright notice --></small></footer>
    </div>
</figure>

<font size="-1">Image credit: [Wikipedia](https://upload.wikimedia.org/wikipedia/commons/c/c4/CDC_6600.jc.jpg)</font>    
    </td>
  </tr>
</table>



## Today’s fastest supercomputer


<table>
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:700px; text-align: left;">
        <ul>
          <li>The fastest (unclassified) supercomputer in the world as of Summer 2020 is called Summit, at Oak Ridge National Laboratory in the US.</li>
          <li>It has achieved a calculation speed of 148,600,000,000,000,000 calculations per second. (Calculations are now measured in “FLOPS”, floating point operations per second, and that’s 148.6 PetaFLOPS.) Or, in round numbers, really fast.</li>
          <li>The I/O rate is 100 GigaBytes per second from an external network (that’s 100,000,000,000 Bytes per second). Local output, to a local file system, is 2.5 TeraBytes per second, or 2,500,000,000,000 Bytes per second.</li>
        </ul>
    </td>
    <td>      
<figure>
    <div>
    <img src="https://upload.wikimedia.org/wikipedia/commons/b/b4/Summit_%28supercomputer%29.jpg" width="400"/>
    <footer><small><!-- copyright notice --></small></footer>
    </div>
</figure>

<font size="-1">Image credit: [Wikipedia](https://en.wikipedia.org/wiki/Summit_(supercomputer)#/media/File:Summit_(supercomputer).jpg)</font>    
    </td>
  </tr>
</table>
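
Prefixes such as Giga, Tera, and Peta are easy to mix up, so here is a small Python sketch that expands them into plain numbers. The helper name `expand` is made up for illustration. It uses `Decimal` so that a value like 148.6 is multiplied exactly instead of being rounded as a floating point number.

```python
from decimal import Decimal

# Standard decimal (SI) prefixes as powers of ten.
PREFIX = {
    "kilo": 10**3,
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
}

def expand(value: str, prefix: str) -> int:
    """Write a prefixed quantity such as 148.6 peta as a plain whole number."""
    return int(Decimal(value) * PREFIX[prefix])

print(f"148.6 PetaFLOPS = {expand('148.6', 'peta'):,} operations per second")
print(f"100 GigaBytes   = {expand('100', 'giga'):,} bytes")
print(f"2.5 TeraBytes   = {expand('2.5', 'tera'):,} bytes")
```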



## Supercomputers help people do cool stuff


<table style="width:90%">
  <tr style="background-color:transparent">
    <td style="padding-right:50px; width:650px; text-align: left;">
        Many discoveries were made with supercomputers, including:
        <ul>
              <li>Calculating the mass of subatomic particles</li>
              <li>Simulating how suns form</li>
              <li>Simulating how tornadoes form</li>
              <li>Solving the four color problem. This mathematical problem was first posed in 1852. The problem is this: prove that any map can be colored with four colors so that no two adjoining countries on the map are colored the same. Supercomputers were used to solve this problem in the mid 1970s. See the illustration.</li>
        </ul>
    </td>
    <td>      
<figure>
    <div>
    <img src="https://upload.wikimedia.org/wikipedia/commons/8/8a/Four_Colour_Map_Example.svg" width="350"/>
    <footer><small><!-- copyright notice --></small></footer>
    </div>
</figure>

<font size="-1">Image credit: [Wikimedia](https://commons.wikimedia.org/wiki/File:Four_Colour_Map_Example.svg)</font>    
    </td>
  </tr>
</table>
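
To get a feel for the problem, here is a minimal Python sketch that colors one small map by trial and error (a technique called backtracking). The map `EXAMPLE_MAP` is a made-up example with five regions. Coloring one map is far easier than the real problem, which is to prove that four colors are enough for every possible map.

```python
from typing import Dict, List, Optional

def color_map(neighbors: Dict[str, List[str]], colors: List[str]) -> Optional[Dict[str, str]]:
    """Try to give every region a color that differs from all of its neighbors."""
    regions = list(neighbors)
    assignment: Dict[str, str] = {}

    def solve(i: int) -> bool:
        if i == len(regions):
            return True                      # every region has a color
        region = regions[i]
        for color in colors:
            # A color is allowed if no already-colored neighbor uses it.
            if all(assignment.get(n) != color for n in neighbors[region]):
                assignment[region] = color
                if solve(i + 1):
                    return True
                del assignment[region]       # undo and try the next color
        return False

    return assignment if solve(0) else None

# Hypothetical map: A, B, C, and D all touch each other; E touches A and B.
EXAMPLE_MAP = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
    "E": ["A", "B"],
}

print(color_map(EXAMPLE_MAP, ["red", "green", "blue", "yellow"]))
print(color_map(EXAMPLE_MAP, ["red", "green", "blue"]))   # three colors are not enough here
```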







## Can you solve the four color problem?

In [None]:
IFrame("supplementary/fourcolor.html", width="970", height="730")

## Let's look back: what was the evolution of I/O rates of computing devices?
In early computing devices, up to early supercomputers, the ratio of Calculation / IO was very high. This is called the Compute / Bandwidth ratio now.

| Device | Year | Calculation Rate (measured in Floating Point Operations per Second) | I/O Rate |
| :-: | :-: | :-: | :-: |
| Archimedes | about 250 BC | <1 | 1 Byte / Minute
| Abacus | 1000 | 5 | 1 Byte / Second
| Z3 | 1941 | ⅓ to 1 operation per second | Not sure - but slow
| Control Data 6600 | 1964 | 3,000,000 | 10 Bytes / second
| Summit - the fastest supercomputer in the world | 2018 | 148,600,000,000,000,000  | In, across a network: 100,000,000,000 Bytes / second <br/>Out, locally to a file system: 2,500,000,000,000 Bytes / second


<figure>
    <div>
    <img src="supplementary/congratulations.png" width="400"/>
    <footer><small><!-- copyright notice --></small></footer>
    </div>
</figure>

## You now understand three very important concepts
1. The early history of computation
2. I/O Rate, the speed at which data is input and output
3. Calculation Rate (now measured in Floating Point Operations per Second or FLOPS)

## Next

Let's learn about cyberinfrastructure and how even supercomputers are not enough for science.

<a href="cyberinfrastructure-3.ipynb">Click here to move to the next section to learn more!</a>