# Cyberinfrastructure

## Segment 3 of 5

<i>Lesson Developers: </i>
<ul>
    <li>
    <i>Craig A. Stewart (stewart@iu.edu)</i>
    </li>
    <li>
    <i>Sri Harsha (srmanj@iu.edu)</i>
    </li>
    <li>
    <i>Brian Cooper (coope824@umn.edu)</i>
    </li>
    <li>
    <i>Eric Shook (eshook@umn.edu)</i>
    </li>
</ul>


 <img src="supplementary/pti.jpg" width="200" alt="Pervasive Technology Institute logo">

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (the user agent string is retrieved separately below)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

# Retrieve the user agent string; it will be passed to the hourofci submit button
agent_js = """
IPython.notebook.kernel.execute("user_agent = " + "'" + navigator.userAgent + "'");
"""
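# Only the last expression in a cell is displayed automatically, so display this one explicitly
display(Javascript(agent_js))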

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    <style>
        .output_prompt{opacity:0;}
    </style>
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

# Types of Computational Systems in Cyberinfrastructure

In this section we will look "under the hood" and cover different types of computational systems that are commonly used in cyberinfrastructure.



## GPUs

<img src="supplementary/gpu.png" width="400"/>    

<small>CC BY 4.0 https://commons.wikimedia.org/wiki/File:NvidiaTesla.jpg</small>

- GPUs – Graphics Processing Units – are a very powerful type of processor (one is currently being used to display this text on your computer screen).
- GPUs make up a very large part of the computational power of many computing systems.
- GPUs were originally developed for rendering graphical images, but it turns out that they are very fast for some (but not all) kinds of mathematically-oriented calculations.




## Quantum Computers

- Quantum computers are a new kind of computer that works very differently from the computers we use today.
- Normal computers work on a very simple principle: if you do the same calculation over and over, you get the same result every time.
- A current digital computer operates on a string of things called “bits,” each of which is either a 0 or a 1. Quantum computers instead operate on things called qubits, each of which reads as a 0 or a 1 with a certain probability.
- So when you run a program on a quantum computer, you don’t get a single answer. You get a probability distribution of answers.
- Quantum computers are very important for some kinds of challenges, but it will be a long time before they matter much to people using GIS applications!
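
To make "a probability distribution of answers" more concrete, here is a minimal sketch in ordinary Python. It does not simulate real quantum hardware; it only mimics measuring a single qubit many times. The function name `measure_qubit` and the probability `0.3` are assumptions made up for this illustration.

```python
import random
from collections import Counter

def measure_qubit(p_one, rng):
    """Toy stand-in for a measurement: return 1 with probability p_one, else 0."""
    return 1 if rng.random() < p_one else 0

rng = random.Random(42)   # fixed seed so the sketch is repeatable
p_one = 0.3               # assumed probability of reading a 1
shots = 10_000            # how many times we "run the program"

# Count how often each outcome appears across all runs.
counts = Counter(measure_qubit(p_one, rng) for _ in range(shots))
for outcome in (0, 1):
    print(f"outcome {outcome}: {counts[outcome] / shots:.3f}")
```

A single run gives just a 0 or a 1; only after many runs do the fractions settle near the underlying probabilities.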

## High Throughput Computing (HTC) Systems

- HTC systems have been around for a long time. Some data analysis problems involve doing lots of analyses (or lots of computations) that can happen pretty much independently of one another. So a lot of separate work is done, and then the results are collected. This is a different kind of parallelism from the kind used by the programs that you usually run on a supercomputer. Programs submitted to these systems are called **jobs.**

- A good way to think about high performance computing as opposed to high throughput computing is this:
  - If you care about how long one job takes, you’re probably doing high performance computing.
  - If you care about how many thousand jobs you run per month, you’re probably doing high throughput computing.
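
The high throughput pattern (many independent jobs, then collect the results) can be sketched on a single computer with Python's standard library. This is only an illustration: the function `analyze` is a hypothetical stand-in for one job, and a real HTC system would spread the jobs across many machines rather than a few threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def analyze(job_id):
    """Hypothetical stand-in for one independent job."""
    return job_id, sum(i * i for i in range(job_id * 1000))

job_ids = range(1, 21)   # 20 jobs that do not depend on each other

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Each job runs on its own; the results are collected at the end.
    results = dict(pool.map(analyze, job_ids))
elapsed = time.perf_counter() - start

print(f"collected {len(results)} results")
print(f"throughput: {len(results) / elapsed:.1f} jobs per second")
```

The number that matters here is the throughput on the last line, not how long any single job took.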

## Let's take a closer look at High Throughput Computing Systems in GIS (at Clemson University)

<table>
    <tr style="background: #fff">
        <td width=30%> <img src="supplementary/htcs.png" width="400" alt="High Throughput Computing System graphic"/></td>
        <td valign=top style = "text-align: left;">
HTC systems can be used to analyze geospatial data. An example problem: calculate the Annual Average Daily Traffic (AADT) through Greenville, South Carolina. Specifically, calculate all possible intersections between vehicle trips (1.9 million observations) in the city of Greenville.

- Notice that each trip is independent of the others, so the intersections between trips can be calculated independently.
- To solve this problem, the Clemson University HTC system used a well-known HTC software system called Condor, now known as HTCondor [(link)](https://research.cs.wisc.edu/htcondor/) </td>
    </tr>
</table>
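
As a toy version of the intersection problem, the sketch below checks every pair of trips for a crossing. It is not the Clemson code. It assumes each trip can be simplified to one straight line segment, and the trip names and coordinates are made up. The point is that each pair is tested without looking at any other pair, so the pairs could be handed out to different computers.

```python
from itertools import combinations

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)

def segments_cross(a, b):
    """True if segments a and b properly cross (touching and collinear cases are ignored)."""
    (p1, p2), (p3, p4) = a, b
    return (orientation(p1, p2, p3) * orientation(p1, p2, p4) < 0
            and orientation(p3, p4, p1) * orientation(p3, p4, p2) < 0)

# Made-up trips, each simplified to one straight segment (start, end).
trips = {
    "trip_a": ((0, 0), (4, 4)),
    "trip_b": ((0, 4), (4, 0)),
    "trip_c": ((5, 0), (5, 4)),
    "trip_d": ((3, 2), (6, 2)),
}

# Every pair is an independent piece of work.
crossings = [
    (name1, name2)
    for (name1, seg1), (name2, seg2) in combinations(trips.items(), 2)
    if segments_cross(seg1, seg2)
]
print(crossings)  # [('trip_a', 'trip_b'), ('trip_c', 'trip_d')]
```

With 4 trips there are only 6 pairs to check. With 1.9 million observations the number of pairs becomes enormous, which is why splitting the work across many computers pays off.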


- Read more about the HTC system at Clemson: https://www.clemsongis.org/high-throughput-computing-for-gis


### How High Throughput Computing was used

<img src="supplementary/htc.png" width="400"/>

Here you can see how this is cyberinfrastructure: lots of data and lots of data storage, with the work broken up and sent across a network to lots of different computers that calculate intersections. Those computers are organized into something called a “Condor Pool,” which is what a group of computational systems is called within the Condor HTC software system.



### Another example: The Large Hadron Collider (LHC)

<img src="supplementary/hadron.png" width="400"/>

- The LHC is the single biggest physics experiment in the world.
- It produces lots of small-ish blocks of data.
- The data tell what happens when subatomic particles are smashed together.
- Most of the time nothing new happens.
- So the data analysis task is to look at a whole bunch of data and determine if anything novel has happened.
- PERFECT for HTC – which is a very important kind of cyberinfrastructure.
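
The "look through lots of small blocks for anything unusual" task can be sketched as follows. This is not how LHC data is actually analyzed: the blocks are made-up random numbers, and the function `scan_block` and the cutoff `THRESHOLD` are assumptions for illustration. What it shows is that each block can be scanned without knowing anything about the other blocks.

```python
import random

def scan_block(block, threshold):
    """Return the readings in one block that exceed the threshold."""
    return [value for value in block if value > threshold]

rng = random.Random(7)
# 200 made-up blocks of 50 readings each; ordinary readings stay below 1.0.
blocks = [[rng.random() for _ in range(50)] for _ in range(200)]
blocks[123][10] = 5.0   # plant one unusual reading so there is something to find

THRESHOLD = 1.0  # assumed cutoff for "novel"
flagged = {}
for index, block in enumerate(blocks):
    hits = scan_block(block, THRESHOLD)   # each block is scanned on its own
    if hits:
        flagged[index] = hits

print(flagged)  # {123: [5.0]}
```

Most blocks produce nothing, just as most collisions show nothing new, so the work is mostly a large number of small, independent checks.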

From: https://home.cern/science/accelerators/large-hadron-collider

<img src="supplementary/congratulations.png" width="400"/>

## You just learned a lot more detail about what computational systems can be part of cyberinfrastructure systems

Really, anything that can connect to a digital network and can either produce data or do calculations can be considered cyberinfrastructure if it is put to work as part of “infrastructure for knowledge.”

<a href="cyberinfrastructure-5.ipynb">Click here to move on to the next segment where you will learn more about the importance of CI in scientific discovery</a>