<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173_Fall2025/blob/main/F25_Class_01_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

## **Class_01_3: Lists, Dictionaries, Sets and JSON**

##### **Module I: Getting Started with Python**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 1 Material

* Part 1.1: Introduction to Google CoLab
* Part 1.2: Python Basics 1 -- Strings, Variables, Functions
* **Part 1.3: Python Basics 3 -- Lists, Dictionaries, Sets and JSON**
* Part 1.4: Python Basics 4 -- Conditionals and Loops
* Part 1.5: Python Basics 5 -- Packages, NumPy arrays and Matplotlib
* Part 1.6: Python Basics 6 -- Pandas and File Handling

## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to ```/content/drive``` and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was not printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

If the code is correct, you should see something similar to the following output, except that your own GMAIL address will be printed.
~~~text
Mounted at /content/drive
Note: Using Google CoLab
david.senseman@gmail.com
~~~

If your GMAIL address is not visible, your submission will not be graded.


# **Python Basics 3 -- Lists, Dictionaries, Sets and JSON**

Python includes **Lists**, **Sets**, **Dictionaries**, and other data structures as built-in types. The syntax of lists and dictionaries is similar to **JSON**, which is discussed later in this lesson.

This course will focus primarily on Lists, Sets, and Dictionaries. It is important to understand the differences between these fundamental collection types:

* **List** - A list is a mutable ordered collection that allows duplicate elements.
* **Tuple** - A tuple is an immutable ordered collection that allows duplicate elements.
* **Dictionary** - A dictionary is a mutable collection of key/value pairs that Python indexes by key. (Since Python 3.7, dictionaries keep their items in insertion order.)
* **Set** - A set is a mutable unordered collection with no duplicate elements.

Most Python collections are mutable, meaning the program can add and remove elements after definition. One notable exception is the Python **tuple**, an immutable collection: items cannot be added, removed, or changed after it is defined.

It is also essential to understand that in an ordered collection, items keep the order in which the program added them. This order is not necessarily sorted (for example, alphabetically or numerically).

Lists and tuples are very similar in Python and are often confused. The significant difference is that a list is mutable, but a tuple isn't. So we use a list to hold a collection of similar items that may change, and a tuple when we know ahead of time exactly what information goes into it.

## **Lists and Tuples**

For a Python programmer, lists and tuples look very similar. Both lists and tuples hold an ordered collection of items. It is possible to get by as a programmer using only lists and ignoring tuples.

The primary syntactic difference is that a list is enclosed by square brackets `[ ]`, while a tuple is enclosed by parentheses `( )`.

The following examples define a list and a tuple. They also illustrate that Python indexes lists starting at element `0`, so assigning to element `1` modifies the second element in the collection. One advantage of `tuples` over `lists` is that `tuples` use slightly less memory and can be slightly faster to create than `lists`.

### Example 1:  Create a list called `myList`

The code in the cell below uses square brackets `[ ]` to create a list called `myList`. (Note: `list` is not a reserved keyword, but it is the name of Python's built-in `list` type. Using it as a variable name hides that built-in, so add something to the name, as in `myList`.)

In [None]:
# Example 1: Create a list

# Use square brackets to create a list
myList = ['a', 'b', 'c', 'd']

# Print output
print(myList)

If the code is correct, you should see the following output:
~~~text
['a', 'b', 'c', 'd']
~~~

The square brackets tell you that this is a Python `list`.

### **Exercise 1: Create a tuple called `myTuple`**

In the cell below, use parentheses `( )` to create a tuple called `myTuple` with the letters `A`, `B`, `C` and `D`. (Note: like `list`, `tuple` is the name of a Python built-in type rather than a reserved keyword, but you should not use it as a variable name without adding something to it.)

In [None]:
# Insert your code for Exercise 1 here




If the code is correct, you should see the following output:
~~~text
('A', 'B', 'C', 'D')
~~~

The parentheses tell you that this is a Python `tuple`.

### Example 2: Change the contents of `myList`

As mentioned above, a `list` is mutable, which means its contents can be changed after it has been created.

The code in the cell below demonstrates that the program can change a list. This example uses square-bracket `[ ]` indexing to change the second element in `myList`. (Remember: Python starts counting sequences at 0.)


In [None]:
# Example 2: Change contents of myList

# Set the second element (index 1) to 'Z'
myList[1] = 'Z'

# Print output
print(myList)

If the code is correct, you should see the following output:
~~~text
['a', 'Z', 'c', 'd']
~~~

This demonstrates that a program can change the contents of a `list` after it has been created.

### **Exercise 2: Change the contents of `myTuple`**

As mentioned above, a `tuple` is **immutable**, which means its contents cannot be changed after it has been created.

Using Example 2 as a template, write the code in the cell below to change the second element in `myTuple` to the letter `Z`.

In [None]:
# Insert your code for Exercise 2 here



If your code is correct, you should see the following error message:

![___](https://biologicslab.co/BIO1173/images/module_01/class_01_3_image01.png)

As expected, Python will generate an error if you try to change the contents of a `tuple` after it has been created. In other words, a `tuple` is immutable.
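If you do need a "changed" tuple, the usual approach is to build a new one. The sketch below copies a tuple into a list with `list()`, changes the list, and converts it back with `tuple()`. The variable names `bases_tuple`, `as_list` and `rna_tuple` are hypothetical and chosen only for this illustration.

```python
# Hypothetical example values; any tuple would work the same way
bases_tuple = ('A', 'C', 'G', 'T')

# Tuples are indexed from 0, just like lists
print(bases_tuple[0])          # A

# Copy the tuple into a new list, which CAN be changed
as_list = list(bases_tuple)
as_list[3] = 'U'               # replace thymine with uracil

# Convert the list back into a new (immutable) tuple
rna_tuple = tuple(as_list)

print(rna_tuple)               # ('A', 'C', 'G', 'U')
print(bases_tuple)             # ('A', 'C', 'G', 'T'), the original is unchanged
```

Note that the original tuple is never modified; `rna_tuple` is a completely separate object.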


## **Difference Between List and Tuple**

To see the syntactic difference side by side, run the cell below. The list `myList` is enclosed by square brackets `[ ]`, and the tuple `myTuple` is enclosed by parentheses `( )`.

In [None]:
# Run this example

myList = ['a', 'b', 'c', 'd']
myTuple = ('a', 'b', 'c', 'd')

print(f"This is a Python list: {myList}")
print(f"This is a Python tuple: {myTuple}")


If the code is correct, you should see the following output:

~~~text
This is a Python list: ['a', 'b', 'c', 'd']
This is a Python tuple: ('a', 'b', 'c', 'd')
~~~


## **Python Dictionaries**

A Python **dictionary** or **`dict`** is a mutable collection of key/value pairs. A mutable data type in Python is a type of object whose value can be changed after it is created. This means that you can modify, add, or remove elements within the object without creating a new instance of it.

For example, lists (`list`) and dictionaries (`dict`) in Python are mutable data types. You can change the elements of a list or update the key-value pairs of a dictionary without creating a new list or dictionary. An important aspect of mutability is that if you have multiple references to the same mutable object, any modifications made to the object will be reflected in all the references.

In contrast, immutable data types like strings (`str`), tuples (`tuple`), and numbers (`int`, `float`) cannot be changed after they are created. If you want to modify these types, you need to create a new instance with the desired changes.
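The "multiple references" point above is easy to see in code. In this sketch (variable names are hypothetical), two names refer to one list, so a change made through either name shows up in both; `.copy()` creates an independent list. A string method such as `upper()` cannot change the original string and returns a new one instead.

```python
# Two names that refer to the SAME list object
seq = ['A', 'C', 'G']
alias = seq
alias.append('T')          # modify the list through the second name

print(seq)                 # ['A', 'C', 'G', 'T']  (the change shows up here too)
print(seq is alias)        # True: both names point to one object

# .copy() makes a separate list, so changes no longer propagate
independent = seq.copy()
independent.append('U')
print(seq)                 # ['A', 'C', 'G', 'T']
print(independent)         # ['A', 'C', 'G', 'T', 'U']

# Strings are immutable: upper() returns a NEW string
name = 'adenine'
shout = name.upper()
print(name, shout)         # adenine ADENINE
```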

Like other collection types, `dict` can be called with a collection argument to create a dictionary with the elements of the argument. However, those elements must be tuples or lists of two elements: a key and a value. Dictionaries are enclosed within curly braces `{ }`, and each item is separated by a comma. The key and value in each pair are separated by a colon `:`.
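As a small sketch of the paragraph above, the cell below builds the same dictionary twice: once by passing `dict()` a list of two-element tuples, and once with curly braces. It also shows how to look up a value by its key. The names `pairs`, `from_pairs` and `literal` are hypothetical.

```python
# Build the same dictionary two ways
pairs = [('A', 'adenine'), ('C', 'cytosine'), ('G', 'guanine'), ('T', 'thymine')]
from_pairs = dict(pairs)           # dict() with a list of 2-element tuples
literal = {'A': 'adenine', 'C': 'cytosine', 'G': 'guanine', 'T': 'thymine'}

print(from_pairs == literal)       # True: same key/value pairs

# Look up a value by its key with square brackets
print(literal['G'])                # guanine

# .get() returns a default value instead of raising KeyError for a missing key
print(literal.get('U', 'not a DNA base'))   # not a DNA base
```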

### Example 3: Create a dict called `DNABaseDict`

The code in the cell below uses Python's `dict()` function to create a dictionary called `DNABaseDict`.

The dictionary's keys are the single-letter abbreviations of the four bases in DNA. Each key has an associated value, which is the name of the base. In this example, each key/value pair is a two-element list defined with square brackets `[ ]`, and the four pairs are grouped together in a tuple by the outer parentheses `( )`.

In [None]:
# Example 3: Create a dictionary with dict()

# create dict using dict() function
DNABaseDict = dict((['A', 'adenine'],
                    ['C', 'cytosine'],
                    ['G', 'guanine'],
                    ['T', 'thymine']
                   ))

# Print out the dictionary
DNABaseDict

If the code is correct, you should see the following output:

~~~text
{'A': 'adenine', 'C': 'cytosine', 'G': 'guanine', 'T': 'thymine'}
~~~

The curly braces `{ }` tell you that this is a Python dictionary. This particular dictionary contains 4 `key/value` pairs with a colon `:` separating each `key` from its corresponding `value`.

### **Exercise 3: Create a dict called `RNABaseDict`**

In the cell below, use Python's `dict()` function to create a dictionary called `RNABaseDict`. The dictionary's `keys` should be the single-letter abbreviations of the four RNA nitrogenous bases, with the corresponding `value` being the name of the base. Print out `RNABaseDict`.

You should already know the four nitrogenous bases in RNA. If you don't, you can "Google" it.

In [None]:
# Insert your code for Exercise 3 here



If the code is correct, you should see the following output:

~~~text
{'A': 'adenine', 'C': 'cytosine', 'G': 'guanine', 'U': 'uracil'}
~~~

----------------------------

### **Why the curly braces `{ }`?**

Even though the `dict()` function, with parentheses `( )` and square brackets `[ ]`, was used to define the `DNABaseDict` dictionary in Example 3, Python printed out this dictionary using curly braces `{ }`.

> **So why the curly braces?**

Because dictionaries are so frequently used, Python provides a notation for them that is similar to the notation for `sets`: a comma-separated list of `key/value` pairs enclosed in curly braces `{ }`. Within each pair, a colon `:` separates the `key` from its `value`. When Python prints out a dictionary, it always uses this curly-brace format.


----------------------------

### Example 4: Create `dnaBaseDict` using curly braces `{ }`

As explained above, you can also create a Python dictionary using curly braces `{ }`. The cell below shows how to create a dictionary called `dnaBaseDict` using curly braces. This dictionary is similar to `DNABaseDict` created in Example 3, except that lowercase letters are used for the keys.

In [None]:
# Example 4: Create dictionary using {}

# Create dictionary using curly braces
dnaBaseDict = {'a': 'adenine', 't': 'thymine', 'g': 'guanine',  'c': 'cytosine'}

# Print out the dictionary
dnaBaseDict

If the code is correct, you should see the following output. (Since Python 3.7, dictionaries keep their keys in the order in which they were inserted.)

~~~text
{'a': 'adenine', 't': 'thymine', 'g': 'guanine', 'c': 'cytosine'}
~~~

### **Exercise 4: Create `rnaBaseDict` using curly braces `{ }`**

In the cell below create a dictionary called `rnaBaseDict` using curly braces. Use lower case letters for the `keys`.

In [None]:
# Insert your code for Exercise 4 here



If the code is correct, you should see the following output but perhaps in a different order:

~~~text
{'a': 'adenine', 'c': 'cytosine', 'g': 'guanine', 'u': 'uracil'}
~~~

-------------------------------

### **Unique Keys**

The keys of a mapping must be **unique** within the collection, because the dictionary has no way to distinguish different values indexed by the same key.
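Python does not raise an error when a key is repeated; the later value simply replaces the earlier one. The sketch below uses a hypothetical dictionary `codes` in which the key `'A'` appears twice (the second value, `'alanine'`, is just an illustrative conflicting entry).

```python
# The key 'A' appears twice; the second value replaces the first
codes = {'A': 'adenine', 'C': 'cytosine', 'A': 'alanine'}
print(codes)        # {'A': 'alanine', 'C': 'cytosine'}
print(len(codes))   # 2

# Assigning to an existing key also overwrites its value
codes['A'] = 'adenine'
print(codes)        # {'A': 'adenine', 'C': 'cytosine'}
```

Because of this silent overwrite, it is worth checking for repeated keys when you type a dictionary by hand.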

--------------------------------

## **Python Sets**

A Python **set** is an _unordered_  collection of items that contains no duplicates. As we will see, if you try to add an item that is already in a set, nothing happens.

Since strings behave as collections, a string can be used as the argument for a call to set. The resulting set will contain a **single-character string** for each unique character that appears in the argument. The order in which the elements of a set are printed will not necessarily bear any relation to the order in which they were added.
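Here is a short sketch of calling `set()` on a string, using the hypothetical sequence `'GATTACA'`. The seven-character string collapses to a set of four single-character strings. Because a set's display order is not guaranteed, `sorted()` is used to print the items in a predictable order.

```python
# set() splits a string into its unique single-character strings
seq = 'GATTACA'
bases = set(seq)

print(bases)            # 4 letters; display order may vary
print(len(seq))         # 7 characters in the string
print(len(bases))       # 4 unique characters in the set

# sorted() returns a list, which gives a predictable order for display
print(sorted(bases))    # ['A', 'C', 'G', 'T']
```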

### Example 5: Create a Set called `DNABases_set`

The code below shows how to create a set of single-character strings called `DNABases_set` using curly braces `{ }`.

In [None]:
# Example 5: Create set

# Create a set called DNABases_set
DNABases_set = {'T', 'C', 'A', 'G'}

# Print DNABases_set
print(DNABases_set)

If the code is correct, you should see the following output but perhaps in a different order:

~~~text
{'T', 'A', 'C', 'G'}
~~~
You should notice two things. First, our `set` is **not** the string `'TCAG'`, but a collection of 4 single-character strings ("letters"): `A`, `C`, `G` and `T`.

Second, the order of the letters used to create the set (`TCAG`) may, or may not, be preserved.

Because a set is an _unordered_ collection, its items have no positions. If you tried to use square brackets `[ ]` to index an item in a set, you would get the error message: `TypeError: 'set' object is not subscriptable`.
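The sketch below reproduces that error in a controlled way with `try`/`except`, then shows the usual way to ask a set a question: the `in` operator, which checks whether an item is a member of the set. `DNABases_set` is redefined here so the cell runs on its own.

```python
DNABases_set = {'T', 'C', 'A', 'G'}

# Indexing a set with [ ] raises a TypeError
try:
    DNABases_set[0]
except TypeError as err:
    print('TypeError:', err)    # TypeError: 'set' object is not subscriptable

# Use the `in` operator to ask whether an item is in the set
print('A' in DNABases_set)      # True
print('U' in DNABases_set)      # False
```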

### **Exercise 5: Create a Set called `RNABases_set`**

In the cell below, create a new set called `RNABases_set` and print it out. Remember that in RNA the base `uracil` substitutes for the DNA base `thymine`.

In [None]:
# Insert your code for Exercise 5 here



If your code is correct, you should see the following output but perhaps in a different order:

~~~text
{'U', 'A', 'C', 'G'}
~~~

### Example 6: Algebraic set operations - Union

In Python, there are a number of operations and functions that work on collection types such as sets. This example shows one such operation, called `union`.

The "adding" of one set with another is called the **union of the two sets**. In Python, you can use the `|` operator to create a union of two sets as shown in the next cell.

In [None]:
# Example 6: Union of 2 sets

# Create a new set called AddBases_set
AddBases_set =  {'X', 'Y', 'Z', 'U', 'U', 'A','A'}

# Use | to create union
RNABases_set_union = RNABases_set | AddBases_set

# Print the new set
print(RNABases_set_union)


If the code is correct, you should see the following output but perhaps in a different order:

~~~text
{'Z', 'X', 'C', 'U', 'G', 'Y', 'A'}
~~~

Notice that when we combine the two sets, the only new letters in the union are `X`, `Y`, and `Z`; the extra `U`s and `A`s were not added.

> **Why?**

Because every element in a set must be unique. Since our original `RNABases_set` already contained the letters `U` and `A`, they were not added, only the new letters, `X`, `Y` and `Z`. In other words, a set can only contain **one example of each element**.

**NOTE:** In order for this example to run correctly, you must have successfully completed **Exercise 5** above.
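Sets also have a `union()` method that gives the same result as the `|` operator. One practical difference is that `union()` accepts a list (or any other iterable), while `|` needs a set on both sides. This sketch redefines `RNABases_set` and `AddBases_set` (duplicates dropped) so it runs without Exercise 5.

```python
RNABases_set = {'A', 'C', 'G', 'U'}
AddBases_set = {'X', 'Y', 'Z', 'U', 'A'}

# union() gives the same result as the | operator
print(RNABases_set.union(AddBases_set) == (RNABases_set | AddBases_set))   # True

# union() also accepts a list; | requires a set on both sides
print(sorted(RNABases_set.union(['X', 'Y', 'Z'])))   # ['A', 'C', 'G', 'U', 'X', 'Y', 'Z']

# Neither form changes the original set
print(sorted(RNABases_set))                          # ['A', 'C', 'G', 'U']
```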

### **Exercise 6: Try to create a set with duplicated items**

Because each element in a set must be unique, creating a set with duplicated items does not raise an error; the set simply keeps one copy of each item.

In the cell below, create a set called `RNABases_set2` with {'U', 'A', 'A', 'G', 'U', 'C', 'C'} and then print out the set.

In [None]:
# Insert your code for Exercise 6 here



If your code is correct, you should see the following output but perhaps in a different order:

~~~text
{'U', 'A', 'G', 'C'}
~~~

The new `RNABases_set2` only contains one example of each item.

### Example 7: Algebraic set operations - Intersection

Another algebraic set operation is **intersection**.

The cell below uses the `&` operator to find the intersection of two sets.

In [None]:
# Example 7: Set intersection using & operator

# Create 2 sets using curly braces
let1_set = {'a','b','c','d','e'}
let2_set = {'c','d','e','f','g'}

# Use `&` to find their intersection
let_set_intersection = let1_set & let2_set

# Print out the intersection
print(let_set_intersection)


If the code is correct, you should see the following output but perhaps in a different order:

~~~text
{'e', 'c', 'd'}
~~~

Set intersection is the set of elements that **both sets have in common**. In this example, only the letters `c`, `d` and `e` were contained in both sets.
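As a biological illustration of the `&` operator (a sketch; the two sets are defined here rather than taken from earlier cells), the intersection of the DNA and RNA base sets gives the bases the two nucleic acids share.

```python
# Bases found in DNA and in RNA
DNABases_set = {'A', 'C', 'G', 'T'}
RNABases_set = {'A', 'C', 'G', 'U'}

shared = DNABases_set & RNABases_set
print(sorted(shared))     # ['A', 'C', 'G']

# The order of the two sets does not matter for intersection
print((RNABases_set & DNABases_set) == shared)   # True
```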

### **Exercise 7: Algebraic set operations - Intersection**

In Example 7, set intersection was found using the `&` operator. Python also offers the `intersection()` method for accomplishing the same thing. In the cell below, use the `intersection()` method to find the intersection between the same two sets, `let1_set` and `let2_set` used in Example 7.

(**HINT:** The use of Python **methods** was covered in Class_01_02. Methods are called using **dot notation**. In this case, the `intersection()` method is attached (by the dot) to the first set and its argument is the second set.)

In [None]:
# Insert your code for Exercise 7 here




If your code is correct, you should see the following output but perhaps in a different order:

~~~text
{'e', 'c', 'd'}
~~~


### Example 8: Use `add()` method with sets

A `list` is always enclosed in square brackets `[ ]`, a `tuple` in parentheses `( )`, and a `set` in curly braces `{ }`.

Programs can dynamically add items to a `set` as they run by using the `add()` method. To add an item to a `list`, you must instead use the `append()` method.

In [None]:
# Example 8: Use add() method

# Create a new empty set called mySet
mySet = set()

# Add letter `a` to the empty set
mySet.add('a')

# Keep adding letters to the set
mySet.add('b')
mySet.add('c')

# Try to add a duplicate letter `c`
mySet.add('c')

# Print out the set
print(mySet)


If the code is correct, you should see the following output but perhaps in a different order:

~~~text
{'a', 'b', 'c'}
~~~

Sets can only contain unique items so there is only one `c` in the final set.
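Example 8 starts from `set()` rather than `{}` for a reason: empty curly braces create an empty **dictionary**, not an empty set. The sketch below (with hypothetical names) checks the type of each and shows that only the set has an `add()` method.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>

# add() works on the set; a dict has no add() method
empty_set.add('a')
print(empty_set)                       # {'a'}
print(hasattr(empty_braces, 'add'))    # False
```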

### **Exercise 8: Use `append()` method with lists**

While programs can dynamically add items to a `set` with the `add()` method, you must use the `append()` method to add items to a `list`.

In the cell below, use the `append()` method to add items to a list called `myList`, using Example 8 as a template.

In [None]:
# Insert your code for Exercise 8 here




If your code is correct, you should see the following list:

~~~text
['a', 'b', 'c', 'c']
~~~

Lists (but not sets) can contain duplicate items so the letter `c` appears twice in this list.
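Because of this difference, converting a list to a set is a common way to find its unique items. The sketch below uses a hypothetical list of codons; `count()` shows how often one item repeats in the list, and `set()` keeps one copy of each.

```python
codons = ['AUG', 'GCU', 'GCU', 'UAA', 'AUG']

print(len(codons))            # 5 items, including duplicates
print(codons.count('GCU'))    # 2

# Converting to a set keeps one copy of each item
unique_codons = set(codons)
print(len(unique_codons))     # 3
print(sorted(unique_codons))  # ['AUG', 'GCU', 'UAA']
```

Note that the set does not remember the original order, so use `sorted()` when you need a predictable display.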

## **JSON (JavaScript Object Notation)**

Data stored in a comma-separated values (CSV) file must be flat. In a flat file, all the data must fit neatly into rows and columns.

Here is an example of data stored in a flat, CSV file:

![__](https://biologicslab.co/BIO1173/images/module_01/class_01_3_image02.png)

Most people refer to this type of data as structured or **tabular**. This data is tabular because the number of columns is the same for every row. Individual rows may be missing a value for a column but these rows still have the same number of columns. Tabular data is convenient for machine learning because most models, such as neural networks, also expect incoming data to be of fixed dimensions.

On the other hand, real-world information is not always so tabular. This is where **JSON** comes in. Instead of being tabular, **JavaScript Object Notation (JSON)** is a standard file format that stores data in a hierarchical format similar to **eXtensible Markup Language (XML)**.

In Python terms, JSON is a hierarchy of lists and dictionaries (JSON calls them arrays and objects). Programmers refer to this sort of data as semi-structured data or hierarchical data.

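The following is a sample JSON document. Although it is JSON rather than Python code, this particular sample is also a valid Python dictionary literal, so Python can run the cell and display the result!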

In [None]:
# JSON example. RUN THIS CELL
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

If the code is correct, you should see the following output:

~~~text
{'glossary': {'title': 'example glossary',
  'GlossDiv': {'title': 'S',
   'GlossList': {'GlossEntry': {'ID': 'SGML',
     'SortAs': 'SGML',
     'GlossTerm': 'Standard Generalized Markup Language',
     'Acronym': 'SGML',
     'Abbrev': 'ISO 8879:1986',
     'GlossDef': {'para': 'A meta-markup language.',
      'GlossSeeAlso': ['GML', 'XML']},
     'GlossSee': 'markup'}}}}}
~~~

A data scientist will generally encounter **JSON** when they access web services to get their data.
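When JSON arrives from a web service it is usually a **string** of text, and Python's standard `json` module converts it. The sketch below uses a hypothetical sample record: `json.loads()` turns JSON text into Python dictionaries and lists (and maps JSON `true` and `null` to Python `True` and `None`, which a plain Python literal could not do), and `json.dumps()` goes the other way.

```python
import json

# A small JSON document stored as a string (hypothetical sample record)
text = '{"id": "sample_01", "bases": ["A", "C", "G", "T"], "reviewed": true, "notes": null}'

record = json.loads(text)        # JSON text -> Python dict
print(type(record))              # <class 'dict'>
print(record['bases'][0])        # A     (JSON array -> Python list)
print(record['reviewed'])        # True  (JSON true -> Python True)
print(record['notes'])           # None  (JSON null -> Python None)

# Python dict -> JSON text
print(json.dumps({'base': 'A', 'name': 'adenine'}))   # {"base": "A", "name": "adenine"}
```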

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Class_01_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

-----------------------------------------

## **Lizard Tail**


## **IBM Blue Gene**


![__](https://upload.wikimedia.org/wikipedia/commons/d/d3/IBM_Blue_Gene_P_supercomputer.jpg)

> A Blue Gene/P supercomputer at Argonne National Laboratory

**Blue Gene** was an IBM project aimed at designing supercomputers that can reach operating speeds in the petaFLOPS (PFLOPS) range, with relatively low power consumption.

The project created three generations of supercomputers, Blue Gene/L, Blue Gene/P, and Blue Gene/Q. During their deployment, Blue Gene systems often led the TOP500 and Green500 rankings of the most powerful and most power-efficient supercomputers, respectively. Blue Gene systems have also consistently scored top positions in the Graph500 list. The project was awarded the 2009 National Medal of Technology and Innovation.

After Blue Gene/Q, IBM focused its supercomputer efforts on the OpenPower platform, using accelerators such as FPGAs and GPUs to address the diminishing returns of Moore's law.

**History**

In December 1999, IBM announced a US\$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The research and development was pursued by a large multi-disciplinary team at the IBM T. J. Watson Research Center, initially led by William R. Pulleyblank. The project had two main goals: to advance understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included: how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost, through novel machine architectures.

The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. In parallel, Alan Gara had started working on an extension of the QCDOC architecture into a more general-purpose supercomputer. The US Department of Energy started funding the development of this system and it became known as Blue Gene/L (L for Light). Development of the original Blue Gene architecture continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.

Architecture and chip logic design for the Blue Gene systems was done at the IBM T. J. Watson Research Center, chip design was completed and chips were manufactured by IBM Microelectronics, and the systems were built at IBM Rochester, MN.

In November 2004, a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list, with a LINPACK benchmark performance of 70.72 TFLOPS.[1] It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007 the Blue Gene/L installation at LLNL gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL Blue Gene/L installation held the first position in the TOP500 list for 3.5 years, until in June 2008 it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, which was the first system to surpass the 1 PetaFLOPS mark.

While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. The November 2006 TOP500 list showed 27 computers with the eServer Blue Gene Solution architecture. For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.

While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to run over 100 TFLOPS sustained on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD), simulating solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize.

In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox). At Supercomputing 2006, Blue Gene/L was awarded the winning prize in all HPC Challenge Classes of awards. In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of a second (the network was run at 1/10 of normal speed for 10 seconds).

**The Name**

The name Blue Gene comes from what it was originally designed to do: help biologists understand the processes of protein folding and gene development. "Blue" is a traditional moniker that IBM uses for many of its products and the company itself. The original Blue Gene design was renamed "Blue Gene/C" and eventually Cyclops64. The "L" in Blue Gene/L comes from "Light" as that design's original name was "Blue Light". The "P" version was designed to be a petascale design. "Q" is just the letter after "P".

**Major Features**

The Blue Gene/L supercomputer was unique in the following aspects:

* Trading the speed of processors for lower power consumption. Blue Gene/L used low frequency and low power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve better power efficiency for applications that could use large numbers of nodes.
* Dual processors per node with two working modes: co-processor mode where one processor handles computation and the other handles communication; and virtual-node mode, where both processors are available to run user code, but the processors share both the computation and the communication load.
* System-on-a-chip design. Components were embedded on a single chip for each node, with the exception of 512 MB external DRAM.
* A large number of nodes (scalable in increments of 1024 up to at least 65,536).
* Three-dimensional torus interconnect with auxiliary networks for global communications (broadcast and reductions), I/O, and management.
* Lightweight OS per node for minimum system overhead (system noise).

**Architecture**

The Blue Gene/L architecture was an evolution of the QCDSP and QCDOC architectures. Each Blue Gene/L Compute or I/O node was a single ASIC with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline-double-precision Floating-Point Unit (FPU), a cache sub-system with built-in DRAM controller and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS). The two CPUs were not cache coherent with one another.

Compute nodes were packaged two per compute card, with 16 compute cards (thus 32 nodes) plus up to 2 I/O nodes per node board. A cabinet/rack contained 32 node boards. By the integration of all essential sub-systems on a single chip, and the use of low-power logic, each Compute or I/O node dissipated about 17 watts (including DRAMs). The low power per node allowed aggressive packaging of up to 1024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits on electrical power supply and air cooling. The system performance metrics, in terms of FLOPS per watt, FLOPS per m² of floorspace and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.

Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for fast barriers. The I/O nodes, which run the Linux operating system, provided communication to storage and external hosts via an Ethernet network. The I/O nodes handled filesystem operations on behalf of the compute nodes. A separate and private Ethernet management network provided access to any node for configuration, booting and diagnostics.

To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least $2^5 = 32$ nodes. To run a program on Blue Gene/L, a partition of the computer first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition nodes were released for future programs to use.

Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode—or one process per CPU in virtual mode. Programmers needed to implement green threads in order to simulate local concurrency. Application development was usually performed in C, C++, or Fortran using MPI for communication. However, some scripting languages such as Ruby and Python have been ported to the compute nodes.

IBM published BlueMatter, the application developed to exercise Blue Gene/L, as open source. This serves to document how the torus and collective interfaces were used by applications, and may serve as a base for others to exercise the current generation of supercomputers.

**Blue Gene/P**

In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers and designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory's Leadership Computing Facility.

**Design**

The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P Compute chip contains four PowerPC 450 processor cores, running at 850 MHz. The cores are cache coherent and the chip can operate as a 4-way symmetric multiprocessor (SMP). The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L, but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB DRAM, comprising a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. 32 Compute cards are plugged into an air-cooled node board. A rack contains 32 node boards (thus 1024 nodes, 4096 processor cores). By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation, and at 371 MFLOPS/W Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007-2008.
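The per-node peak figures quoted above (5.6 GFLOPS for Blue Gene/L and 13.6 GFLOPS for Blue Gene/P) follow from clock rate times cores times floating-point operations per cycle. The sketch below reproduces them under one stated assumption not given in the text: each core completes 4 floating-point operations per clock cycle (two fused multiply-adds on its dual FPU).

```python
# Peak FLOPS = clock rate x cores per node x floating-point operations per cycle per core
FLOPS_PER_CYCLE = 4   # ASSUMPTION: two fused multiply-adds (2 flops each) per cycle

bgl_node = 700e6 * 2 * FLOPS_PER_CYCLE   # Blue Gene/L: two 700 MHz PowerPC 440 cores
bgp_node = 850e6 * 4 * FLOPS_PER_CYCLE   # Blue Gene/P: four 850 MHz PowerPC 450 cores

print(f"Blue Gene/L node: {bgl_node / 1e9:.1f} GFLOPS")           # 5.6
print(f"Blue Gene/P node: {bgp_node / 1e9:.1f} GFLOPS")           # 13.6

# A Blue Gene/P rack holds 1024 compute nodes
print(f"Blue Gene/P rack: {1024 * bgp_node / 1e12:.1f} TFLOPS")   # 13.9
```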

