# Week 2: Arrays and Plotting

This week we'll work on
- organizing data in arrays in NumPy,
- creating new arrays,
- manipulating data in arrays,
- evaluating functions on data in arrays,
- creating new plots in Matplotlib,
- visualizing functions with plots,
- changing figure and axes appearances,
- loading data from csv and xlsx files, and
- visualizing data in Matplotlib.

Big goals!

## 1. Collecting data in lists, tuples, and NumPy arrays

You can combine several elements like integers, floats, and strings into lists, tuples, and NumPy arrays.  These three data structures have different rules and are used for different purposes.

### 1.1 Lists

Wrap a collection of integers, floats, and/or strings into a list by enclosing them in square brackets.  Lists have an order (a first and possibly a second, third, etc. element), and they're changeable (e.g., you can give the first element in a list a new value after creating the list).

Further reading: [W3 Schools article on lists](https://www.w3schools.com/python/python_lists.asp)
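
Here is a short sketch of those two properties, order and changeability. The name `myList` and its contents are made up for illustration.

```python
# A list can mix integers, floats, and strings
myList = [7, 8.5, "haha"]

# Lists are ordered: position 0 holds the first element
print(myList[0])

# Lists are changeable: give the first element a new value
myList[0] = 42
print(myList)
```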

### 1.2 Tuples

Wrap a collection of integers, floats, and/or strings into a tuple by enclosing them with parentheses.  Tuples have an order (a first and possibly a second, third, etc. element), and they're unchangeable (once you create a tuple, you can't add, change, or remove an element of the tuple).  We'll use tuples to send inputs to functions.

Further reading: [W3 Schools article on tuples](https://www.w3schools.com/python/python_tuples.asp)

In [2]:
myTuple = (7, 8, 9, "haha")
print(myTuple)

(7, 8, 9, 'haha')


### 1.3 NumPy arrays

To do math and science, we often rely on ordered lists or rectangular arrays of numbers.  They could be a vector in the sense of linear algebra, or all the x-values or y-values that you need to plot a function, or an ordered set of observations.  Lists and tuples are cool, but they're not set up to do math with very easily. Enter NumPy arrays.
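
To see why lists are awkward for math, compare what happens when you multiply a list and a NumPy array by 2. This is only a sketch, and the names `heights_list` and `heights_array` are invented for the example.

```python
import numpy as np

heights_list = [1.0, 2.0, 3.0]
heights_array = np.array(heights_list)

# Multiplying a list by 2 repeats the list instead of doubling each number
print(heights_list * 2)

# Multiplying a NumPy array by 2 doubles every element
print(heights_array * 2)
```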

### 1.4 In-class exercise: math with NumPy arrays
- `np.array()`
- `np.arange()`
- `np.linspace()`
- `np.ones()` and `np.zeros()`

Note that you can use the IPython **debugger** built into JupyterLab to see the values and types of variables you create. You can enable the debugger with the bug-shaped button at the top right of JupyterLab.  If you don't see the bug, you'll need some help from an instructor. ![Screenshot 2025-01-31 at 10.45.16 AM.png](attachment:c2561a6b-38c7-449a-8ef2-81f232ca46d4.png)

You can get the size of an array with `np.size()`.

In [34]:
import numpy as np

print(np.array([1, 1, 2, 3, 5, 8, 13, 21]) )

print( np.arange(10) )

collection_days = np.arange(5, 10)
print(collection_days)

myRange = np.linspace(0, 1, 101)
print(myRange)

gimmeOnes = np.ones(10)
print(gimmeOnes)

gimmeZeros = np.zeros(10)
print(gimmeZeros)
print(" size equals", np.size(gimmeZeros) )

[ 1  1  2  3  5  8 13 21]
[0 1 2 3 4 5 6 7 8 9]
[5 6 7 8 9]
[0.   0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13
 0.14 0.15 0.16 0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27
 0.28 0.29 0.3  0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41
 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5  0.51 0.52 0.53 0.54 0.55
 0.56 0.57 0.58 0.59 0.6  0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69
 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8  0.81 0.82 0.83
 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94 0.95 0.96 0.97
 0.98 0.99 1.  ]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 size equals 10


## 2. Slicing Arrays

Slicing, or indexing, lets you access just a part of an existing array.  This is our first foray into **data wrangling**, a common theme in programming for (physical, social, medical, data, etc.) science.  We'll introduce it now, then come back to the idea often.

### 2.1 Zero-based indexing

You can slice a NumPy array by following the array variable's name with square brackets.  Inside the square brackets, provide the location(s) of the elements you're interested in extracting. Positions are indexed, or sequentially numbered, starting at 0.  So, the first element of a 1D array is at position 0.  The second element is at position 1.  This is [zero-based indexing](https://en.wikipedia.org/wiki/Zero-based_numbering) and can be confusing at first.  

For instance, a 1D array called `meas_time` might contain 100 times at which measurements were made.  To access the first time in the array, use `meas_time[0]`.  
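
As a quick sketch, here is a stand-in for `meas_time`. The contents (the integers 0 through 99) are an assumption made just so there is something to index.

```python
import numpy as np

# Hypothetical data: 100 measurement times, 0 through 99
meas_time = np.arange(100)

print(meas_time[0])   # first element, at position 0
print(meas_time[1])   # second element, at position 1
```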

### 2.2 Practice with vectors (i.e., 1D numerical arrays)

Create a 1D NumPy array called `stream_velocity` that contains the numbers 15 through 35 (inclusive).  Create a new array called `second_measurement` that contains the second element of `stream_velocity`. 🥢

In [47]:
stream_velocity = np.arange(15, 35+1)
print(stream_velocity)

second_measurement = np.array([stream_velocity[1]])
print(second_measurement)

[15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35]
[16]


Also handy: you can reference the last element of an array as the "-1 position".  So, the last element of `stream_velocity` is `stream_velocity[-1]`.
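
A minimal sketch of negative indexing, rebuilding `stream_velocity` so the cell runs on its own:

```python
import numpy as np

stream_velocity = np.arange(15, 35 + 1)

print(stream_velocity[-1])   # last element
print(stream_velocity[-2])   # second-to-last element
```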

### 2.3 Referencing a range or interval of data

A range of data is denoted as `start:end`, where, as usual for Python, the counting includes `start` but stops right before and excludes `end`.  So if `start` is 0 and `end` is 2, then you start at position 0 (the first element) and then include the element in position 1 (the second element), but exclude the element in position 2 (the third element).  It makes more sense if you just write the code, though:
```{python}
print(stream_velocity[0:2])
```
will print the first two elements of `stream_velocity`.  Try printing the first 10 elements:

In [55]:
print( stream_velocity[0:10] )

[15 16 17 18 19 20 21 22 23 24]


Now try printing ten consecutive elements from the middle of the array, starting at the tenth element.

In [61]:
print( stream_velocity[9:9+10] )

[24 25 26 27 28 29 30 31 32 33]


In [71]:
print( np.sum( stream_velocity ) )
print( np.cumsum( stream_velocity ) )

525
[ 15  31  48  66  85 105 126 148 171 195 220 246 273 301 330 360 391 423
 456 490 525]


## 3. Math on arrays

Now that we have our data in an array, we can do some math with it.  NumPy does math on arrays _element-wise_ by default. So, adding two arrays of the same size makes a new array whose first element is the sum of the first elements, second element is the sum of the second elements, etc. Same goes for multiplication (note that this is different from how the field of linear algebra defines vector operations).  You can use the `np.sum()` function to add up the elements of an array.
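
Here is a small sketch of element-wise math. The arrays `a` and `b` are made up for the example, and `np.dot()` is included only to contrast with the dot product from linear algebra.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)         # element-wise sum
print(a * b)         # element-wise product
print(np.sum(a))     # add up all the elements of a
print(np.dot(a, b))  # linear algebra dot product, for comparison
```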

Here are some data typed in as NumPy arrays about some ore you've just analyzed.
```{python}
sample_grams = np.array([0.7, 0.3, 0.8, 1.1, 2.7, 0.6, 0.2])
gold_ppm = np.array([500, 700, 200, 800, 300, 800, 120])
```
Copy and paste them into the code cell below, then answer the questions:
1. How many total grams of sample did you collect?
2. The mass of gold in each sample (in micrograms) is the sample mass (in grams) multiplied by its gold concentration (in ppm).  Calculate the mass of gold (in micrograms) for each sample.
3. Calculate the total mass of gold in all your samples, in grams.
4. The price of gold as of 2025-01-31 is about $91 per gram.  How much is the gold in your samples worth?

In [119]:
sample_grams = np.array([0.7, 0.3, 0.8, 1.1, 2.7, 0.6, 0.2])
gold_ppm = np.array([500, 700, 200, 800, 300, 800, 120])

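# my way (messed up!)
print("My Way")
grams_total_sample = np.sum(sample_grams)
print( "Total weight of samples (grams):", np.sum(sample_grams) )
print( "Gold in each sample (micrograms):", sample_grams * gold_ppm )
# Mistake: this adds up the concentrations (ppm), not the gold masses (sample_grams * gold_ppm)
print( np.sum(gold_ppm) / 1000000)
# Mistake: these values are sample_grams * 91, so the next line applies the price twice
total_grams = np.array([63.7, 27.3, 72.8, 100.1, 245.7, 54.6, 18.2])
print( np.sum(total_grams)*91 )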

# Noah's way
print("Noah's Way")
#1. 
grams_total_sample = np.sum( sample_grams )
print("1. Total sample grams:", grams_total_sample)

My Way
Total weight of samples (grams): 6.4
Gold in each sample (micrograms): [350. 210. 160. 880. 810. 480.  24.]
0.00342
52998.4
Noah's Way
1. Total sample grams: 6.4
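
Questions 2 through 4 can be worked the same way. This is one possible sketch, not the only approach, and the names `gold_micrograms`, `gold_grams_total`, and `price_per_gram` are invented here.

```python
import numpy as np

sample_grams = np.array([0.7, 0.3, 0.8, 1.1, 2.7, 0.6, 0.2])
gold_ppm = np.array([500, 700, 200, 800, 300, 800, 120])

# 2. Grams of sample times ppm gives micrograms of gold in each sample
gold_micrograms = sample_grams * gold_ppm
print("2. Gold in each sample (micrograms):", gold_micrograms)

# 3. Add up the gold, then convert micrograms to grams
gold_grams_total = np.sum(gold_micrograms) / 1_000_000
print("3. Total gold (grams):", gold_grams_total)

# 4. Value of the gold at about $91 per gram
price_per_gram = 91
print("4. Value of gold (dollars):", gold_grams_total * price_per_gram)
```

The key step is summing the element-wise product `sample_grams * gold_ppm`, rather than summing `gold_ppm` by itself.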


## 4. Plotting with Matplotlib

Matplotlib is a popular plotting package that supports scientific workflows. Find some good [documentation here](https://matplotlib.org/stable/) including a quick start guide and lots of examples. 

### 4.1 First plot!

Build up the following code inside the code cell at the bottom of the exercise.

1. First we load the widely-used, simple, user-friendly sub-module of Matplotlib called Pyplot under the canonical alias `plt`.
```{python}
import matplotlib.pyplot as plt
```
We'll be using NumPy here as well, but we've imported that module in an earlier cell.

2. Next we want to create or load the numerical data we'll use to make our plot.  A common belief among scientific coders like me is:

> "The effort it takes to write a chunk of code is proportional to the number of lines needed, squared"

Let's make a plot of that for code chunks between 0 and 100 lines long.  First create a variable containing a NumPy array with the integers from 0 to 100.  Second, create a variable containing a NumPy array with those code lengths squared. Make sure to comment as you go: on a new line above some code, write a `#` and then explain what you're up to.

3. Next up, we need to create a new figure and some axes to go in that figure.  We'll use the axes to plot our data.  Even if we just use one set of axes in one figure, we use `plt.subplots()`.

4. We'll start with a line plot. We want to put it in the axes that we just created, using `.plot()`

5. Finally, we want to draw the figure we've been working on to our output.  In our Jupyter Notebook, that's the "inline" output just below the code cell.  Use `plt.show()`

6. Now we can experiment with the appearance of our plot.
    - Add x- and y-axis labels with e.g., `ax.set_xlabel()`
    - Change the x- and y- axis limits with e.g., `ax.set_xlim()`
    - See more options using JupyterLab's tab completion!
    - Switch the line plot for a scatter plot with red circular markers.
    - Change the figure size (width, height) with `fig.set_size_inches()`
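
Put together, the steps above might look something like the sketch below. The variable names (`code_lines`, `effort`), the axis labels, the limits, and the figure size are all assumptions chosen for illustration, so feel free to pick your own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Code chunk lengths: the integers from 0 to 100
code_lines = np.arange(0, 100 + 1)

# Effort is proportional to the number of lines, squared
effort = code_lines**2

# Create one figure containing one set of axes
fig, ax = plt.subplots()

# Draw a line plot in those axes
ax.plot(code_lines, effort)
# For a scatter plot with red circular markers, try instead:
# ax.scatter(code_lines, effort, color="red", marker="o")

# Label the axes and set their limits
ax.set_xlabel("Lines of code")
ax.set_ylabel("Effort")
ax.set_xlim(0, 100)
ax.set_ylim(0, 10000)

# Set the figure size as (width, height) in inches
fig.set_size_inches(6, 4)

# Draw the figure to the output
plt.show()
```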