######The cell below loads the visual style of the notebook when run.

In [1]:
from IPython.core.display import HTML
css_file = '../styles.css'
HTML(open(css_file, "r").read())

#Analysing Star Data

<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2><span class="fa fa-certificate"></span>Learning Objectives</h2>
</div>
</section>

> * Explain what a library is, and what libraries are used for.
* Load a Python library and use the things it contains. 
* Read tabular data from a file into a program. 
* Assign values to variables.
* Select individual values and subsections from data. 
* Perform operations on arrays of data. 
* Display simple graphs. 

---

Words are useful, but what’s more useful are the sentences and stories we build with them. Similarly, while a lot of powerful tools are built into languages like Python, even more live in the libraries they are used to build.

In order to load our star data, we need to import a library called NumPy. In general, you should use this library if you want to do fancy things with numbers, especially if you have matrices or arrays. We can load NumPy, giving it the shorter name `np`, using:

In [3]:
import numpy as np

Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Once we've loaded the library, we can ask it to read our data file for us:

In [4]:
np.loadtxt(fname='data/star_data_1.csv', delimiter=',')

array([[ 19.35074402,  17.90057217,  18.56071009, ...,  24.03636105,
         23.76304737,  22.06254627],
       [ 16.4623656 ,  17.73512092,  19.20910858, ...,  23.79436354,
         24.34756387,  26.10136283],
       [ 18.02527993,  17.99324759,  18.12197651, ...,  23.28660305,
         23.76715088,  23.91506544],
       ..., 
       [ 17.96948253,  17.84508471,  18.4352858 , ...,  25.72433046,
         25.84482763,  24.57508077],
       [ 18.90826319,  18.5795385 ,  18.80793267, ...,  23.25876672,
         24.92410973,  24.62491228],
       [ 18.5269321 ,  17.90517432,  19.94819732, ...,  24.20774477,
         24.19840176,  24.87040472]])

The expression `np.loadtxt(...)` is a [function call](reference.html#function-call)
that asks Python to run the function `loadtxt` that belongs to the `numpy` library (which we imported under the name `np`).
This [dotted notation](reference.html#dotted-notation) is used everywhere in Python
to refer to the parts of things as `thing.component`.
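
As a small illustration of dotted notation, the sketch below reaches into the NumPy library for one value and one function. The choice of `np.pi` and `np.sqrt` is ours, purely for illustration; neither is needed for the star data.

```python
import numpy as np

# `np.pi` is a value stored inside the numpy library.
print(np.pi)

# `np.sqrt` is a function stored inside the numpy library.
print(np.sqrt(16.0))
```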

We gave `np.loadtxt` two [parameters](reference.html#parameter): the name of the file we want to read, and the [delimiter](reference.html#delimiter) that separates values on a line. Here both are character strings (or [strings](reference.html#string) for short), so we put them in quotes.
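
To see both parameters at work without needing a file on disk, here is a minimal sketch that reads a tiny made-up table. It relies on the fact that `np.loadtxt` also accepts an open file-like object in place of a file name. The name `fake_file` and the numbers in it are illustrative assumptions, not part of the star data.

```python
import io
import numpy as np

# A tiny made-up table standing in for a CSV file (not the real star data).
fake_file = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0")

# fname says where to read from; delimiter is the string separating values.
small = np.loadtxt(fname=fake_file, delimiter=',')
print(small)
```

Changing the delimiter in the text (for example to `;`) would require passing the same character as `delimiter`.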

When we are finished typing and press Shift+Enter, the notebook runs our command. Since we haven't told it to do anything else with the function's output, the notebook displays it.
In this case, that output is the data we just loaded. By default, only a few rows and columns are shown (with `...` to omit elements when displaying big arrays).

Our call to `np.loadtxt` read our file, but didn't keep the data anywhere we can get at it again. To do that,
we need to [assign](reference.html#assignment) the array to a [variable](reference.html#variable). A variable is just a name for a value, such as `x`, `current_temperature`, or `subject_id`.

Python's variable names must begin with a letter or an underscore and are [case sensitive](reference.html#case-sensitive). We can create a new variable by assigning a value to it using `=`.
As an illustration, let's step back and instead of considering a table of data, consider the simplest "collection" of data, a single value. The line below assigns the value `55` to a variable `weight_kg`:

In [5]:
weight_kg = 55

This statement says to the computer "put the value `55` inside the box labelled `weight_kg`". This is useful because we can use the value later in calculations. For example, once a variable has a value, we can print it to the screen:

In [6]:
print(weight_kg)

55


and do arithmetic with it: 

In [8]:
print('weight in pounds', 2.2*weight_kg)

weight in pounds 121.00000000000001


We can also change a variable’s value by assigning it a new one: 

In [10]:
weight_kg = 57.5
print('weight in kilograms is now:', weight_kg)

weight in kilograms is now: 57.5


As the example above shows, we can print several things at once by separating them with commas.
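
As a further sketch, `print` accepts any mix of strings and numbers and puts a single space between the items. The variables `star_name` and `brightness` below hold made-up values used only for illustration.

```python
# Hypothetical values, used only to illustrate print with several arguments.
star_name = 'example star'
brightness = 18.5

# Four items separated by commas are printed on one line.
print('star:', star_name, 'brightness:', brightness)
```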

We should imagine the variable as a box to store something in the computer's memory. The name of the variable is the label attached to the box, like a sticky note.

<img src='images/python-sticky-note-variables-01.svg'/>
<div style="text-align: center;">Figure: Variables as sticky note labels on boxes</div>

This means that assigning a value to one variable does *not* change the values of other variables. For example, let’s store the subject’s weight in pounds in a variable:

In [11]:
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)

weight in kilograms: 57.5 and in pounds: 126.50000000000001


<img src='images/python-sticky-note-variables-02.svg'/>
<div style="text-align: center;">Figure: Creating another variable</div>

and then change `weight_kg`:

In [12]:
weight_kg = 100.0
print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)

weight in kilograms is now: 100.0 and weight in pounds is still: 126.50000000000001


<img src='images/python-sticky-note-variables-03.svg'/>
<div style="text-align: center;">Figure: Updating a variable</div>
    
Since `weight_lb` doesn’t “remember” where its value came from, it isn’t 
automatically updated when `weight_kg` changes. This is different from the way 
spreadsheets work.
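
If we do want `weight_lb` to reflect the new weight, we have to repeat the calculation ourselves. A minimal sketch, reusing the values from the cells above:

```python
weight_kg = 57.5
weight_lb = 2.2 * weight_kg   # computed once, from the value at this moment

weight_kg = 100.0             # this line does not touch weight_lb
print('before recalculating:', weight_lb)

weight_lb = 2.2 * weight_kg   # run the calculation again to refresh it
print('after recalculating:', weight_lb)
```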

The statement `weight_kg = 100.0` is not a mathematical equation. It's an instruction to the computer which means "take the value on the right-hand side (`100.0`), and store it in a variable with the name on the left-hand side (`weight_kg`)". The right-hand side is calculated first, then the result is put in the variable named on the left. Therefore, the following statement makes perfect sense in Python!

In [13]:
weight_kg = weight_kg - 5
print('weight in kilograms is now:', weight_kg)

weight in kilograms is now: 95.0


Just as we can assign a single value to a variable, we can also assign an array of values to a variable using the same syntax. Let's re-run `np.loadtxt` and save its result:

In [14]:
data = np.loadtxt(fname='data/star_data_1.csv', delimiter=',')

This statement doesn’t produce any output because assignment doesn’t display anything. If we want to check that our data has been loaded, we can print the variable’s value:

In [15]:
print(data)

[[ 19.35074402  17.90057217  18.56071009 ...,  24.03636105  23.76304737
   22.06254627]
 [ 16.4623656   17.73512092  19.20910858 ...,  23.79436354  24.34756387
   26.10136283]
 [ 18.02527993  17.99324759  18.12197651 ...,  23.28660305  23.76715088
   23.91506544]
 ..., 
 [ 17.96948253  17.84508471  18.4352858  ...,  25.72433046  25.84482763
   24.57508077]
 [ 18.90826319  18.5795385   18.80793267 ...,  23.25876672  24.92410973
   24.62491228]
 [ 18.5269321   17.90517432  19.94819732 ...,  24.20774477  24.19840176
   24.87040472]]
