# Section 2. Intro to NumPy
## 2.1 Getting Started with NumPy

NumPy (often pronounced "num-pie") is a widely used module that is optimized to execute a variety of mathematical operations efficiently. The name NumPy hints at its usefulness for us: Numerical Python. NumPy will be extremely useful for your budding career as an astronomer because of everything it can help you do: read in and manipulate large amounts of data, quickly do simple or complex math, and much more.

Before we do anything else, we must import the module.
Try doing this yourself in the cell below.
```
import numpy as np
```
Importing this module under the alias "np" isn't necessary, strictly speaking, but it will make your life easier by taking up less space on a line of code.
It is standard practice to import numpy as np.

In [None]:
import numpy as np

### 2.1.1 Making your first arrays

Now that we've imported the module, we can start talking about why we want to use NumPy.

The main advantage of using NumPy is its arrays.
An array is similar to a list, except that arrays are optimized for use in mathematical applications.

We can turn a list into an array as follows:
```
list1 = [0, 3, 8, 10, 14]
list2 = [9, 5, 3, 2, 4]

array1 = np.array([0, 3, 8, 10, 14])
array2 = np.array([9, 5, 3, 2, 4])
```
Above, we defined two lists, list1 and list2.
Then, we created two arrays, array1 and array2, using the same numerical values as the lists.
We could have also performed the following:
```
array1 = np.array(list1)
```
Note that arrays in NumPy have a couple of restrictions regarding what can go in them.
For our purposes, arrays should only contain numbers.
There are ways to put strings, or other non-numerical items into arrays, but that's beyond the scope of this course.
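
As a minimal sketch of what "only numbers" means in practice: every element of a NumPy array shares one data type, which you can inspect with the `dtype` attribute. The variable names below are illustrative only.

```python
import numpy as np

ints = np.array([0, 3, 8, 10, 14])      # all integers
mixed = np.array([0, 3.5, 8, 10, 14])   # one float among the integers

print(ints.dtype)   # an integer type (the exact width depends on your platform)
print(mixed.dtype)  # float64: the integers were converted to floats
print(mixed)
```

If you mix integers and floats, NumPy quietly converts everything to floats so that the array stays uniform.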

In the code block below, define your own set of lists and arrays using your own values.

In [None]:
list1 = [12, 42, 34, 96, 7]
list2 = [3, 8, 45, 23, 79]

array1 = np.array(list1)
array2 = np.array(list2)

### 2.1.2 Mathematical Manipulation of Arrays
To illustrate the difference, let's go back to lists. In the previous module, you were asked to do mathematical operations on two lists. If you wanted to add the entries of two lists together, your code would probably look something like this:
```
list3 = len(list1)*[0] # first line

for i in range(len(list1)): # second line
    list3[i] = list1[i] + list2[i] # third line
```
This took three lines of code, not including where we initialize list1 and list2.

Now, here's how you can accomplish the same thing with NumPy:
```
array3 = array1 + array2
```
Only one line of code!
We won't show it here, but it takes a computer much less time to do this type of operation with arrays instead of lists, especially as the number of entries in your array grows larger and larger.
[(Apparently NumPy starts being faster around n=8)](https://stackoverflow.com/a/18713494)
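
For the curious, here is a rough sketch of how you could time the two approaches yourself using Python's built-in `timeit` module. The array size and the helper names (`add_lists`, `add_arrays`) are arbitrary choices for illustration, and the exact timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
big_list1 = list(range(n))
big_list2 = list(range(n))
big_array1 = np.arange(n)
big_array2 = np.arange(n)

def add_lists():
    # element-by-element addition in pure Python
    return [a + b for a, b in zip(big_list1, big_list2)]

def add_arrays():
    # the same addition done by NumPy in one step
    return big_array1 + big_array2

# Run each version 20 times and report the total time in seconds
list_time = timeit.timeit(add_lists, number=20)
array_time = timeit.timeit(add_arrays, number=20)
print(f"lists:  {list_time:.4f} s")
print(f"arrays: {array_time:.4f} s")
```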

What else can we do?
Anything you can do to two numbers in Python, you can do to two arrays with NumPy.
```
array1 + array2 # addition
array1 * array2 # multiplication
array1 - array2 # subtraction
array1 / array2 # division
array1**array2 # exponentiation
```
The catch is that if you're doing element-wise operations, both arrays must be the same length.
Sometimes we want to do something to all the numbers in a single array.
In this case, we can do operations with an array and a number (int or float). For example,
```
array1 + 2 # add 2 to all numbers
array1 / 2 # divide all numbers by 2
```
In these cases, the rules for the order of operations are the same as for normal mathematical operations.
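
The short sketch below illustrates both points with made-up arrays: adding two arrays of different lengths raises an error, and parentheses change the result just as they do for ordinary numbers.

```python
import numpy as np

short = np.array([1, 2, 3])
longer = np.array([1, 2, 3, 4, 5])

# Element-wise addition needs matching lengths, so this raises a ValueError
try:
    short + longer
except ValueError as err:
    print("Could not add the arrays:", err)

# Order of operations: multiplication happens before addition
print(2 * short + 1)    # [3 5 7]
print(2 * (short + 1))  # [4 6 8]
```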

In [None]:
list3 = len(list1)*[0] # first line

for i in range(len(list1)): # second line
    list3[i] = list1[i] + list2[i] # third line

print(list1)
print(list2)
print(list3)

[12, 42, 34, 96, 7]
[3, 8, 45, 23, 79]
[15, 50, 79, 119, 86]


In [None]:
array3 = array1 + array2
print(array3)

[ 15  50  79 119  86]


In [None]:
array4 = np.array([2, 3, 4, 2, 3])

print(array1 + array2) # addition
print(array1 * array2) # multiplication
print(array1 - array2) # subtraction
print(array1 / array2) # division
print(array1**array4) # exponentiation

print(array1 + 2)
print(array1 / 2)

[ 15  50  79 119  86]
[  36  336 1530 2208  553]
[  9  34 -11  73 -72]
[4.         5.25       0.75555556 4.17391304 0.08860759]
[    144   74088 1336336    9216     343]
[14 44 36 98  9]
[ 6.  21.  17.  48.   3.5]


### Problem: 3D Function

Mathematical functions commonly depend on two or more variables. Suppose we have a function $z=f(x,y)$ that we want to evaluate at different $(x,y)$ locations. For example, consider the function:

$$ z = f(x,y) = \frac{xy^2}{3} + 2. $$

Suppose we want to evaluate the function at the four (x, y) data points:
*   (0, 0)
*   (1, 2)
*   (2, 5)
*   (3, 9)

In the cell below, write code that solves for z using the function at the four points.
Define two arrays, one for $x$ and one for $y$.
To check that your code is accurate, use one or two of the points to find the value of z either with a calculator or with pencil and paper.

In [None]:
x_array = np.array([0, 1, 2, 3])
y_array = np.array([0, 2, 5, 9])
z_array = ((x_array*(y_array**2))/3)+2
print(z_array)

z0 = ((0*(0**2))/3)+2
z1 = ((1*(2**2))/3)+2
z2 = ((2*(5**2))/3)+2
z3 = ((3*(9**2))/3)+2
zlist = [z0, z1, z2, z3]

print(zlist)

[ 2.          3.33333333 18.66666667 83.        ]
[2.0, 3.333333333333333, 18.666666666666668, 83.0]


### 2.1.3 Special NumPy Functions

There are many special mathematical functions that you'll probably often use.
We can divide these into different categories.

First, there are functions that exist in Python already (for example, in the `math` module), like $\sin(x)$, $\log(x)$, or $\exp(x)$.
However, the native Python functions only work on single numbers (i.e. floats or ints), so we need versions that work on whole arrays.
Here are some examples provided by NumPy:
```
np.sin(x) # sin of each element in x
np.log10(x) # log base 10 of each element in x
np.exp(x) # e^x for each element in x
```

Then, there are some functions that only have meaning when we're dealing with collections of numbers.
Some of these should be familiar from statistics, like the average, median, or range.
Others include things like the product or sum of all elements,
the difference between successive elements, or the maximum and minimum values of an array (and their indices).
Here are some examples:
```
np.mean(x) # mean of x
np.std(x) # standard deviation of x
np.sum(x) # sum of array elements or along an axis
np.diff(x) # difference between consecutive elements along an axis
np.amin(x) # min value of array or along a given axis
np.argmax(x) # indices of max values of array or along a given axis
```

One nice thing about using these functions is that they typically take "array-like" arguments.
In other words, you can give these functions a list and they'll convert the list to an array for you! These are also much faster than their Python analogs since they're optimized for arrays.
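
As a small illustration of the "array-like" behavior, the sketch below passes a plain Python list (with made-up values) straight to several of the functions listed above.

```python
import numpy as np

values = [4, 9, 1, 7]   # a plain Python list, not an array

print(np.mean(values))    # 5.25
print(np.sum(values))     # 21
print(np.diff(values))    # [ 5 -8  6]
print(np.amin(values))    # 1
print(np.argmax(values))  # 1, the index of the largest value (9)
print(np.sqrt(values))    # element-wise square root, returned as an array
```

Note that functions returning several values, such as `np.diff` and `np.sqrt`, give back an array even though the input was a list.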

You can find details and additional functions that could be useful to you [here](https://numpy.org/doc/stable/reference/routines.math.html).

### 2.1.4 Concatenate Arrays

You can also combine separate arrays into a single array using the concatenate function. Consider the two arrays containing the semimajor axes of the inner and outer planets in the Solar System:
```python
a_inner = np.array([5.76e10, 1.082e11, 1.496e11, 2.280e11])
a_outer = np.array([7.785e11, 1.432e12, 2.867e12, 4.515e12])
```
We can combine both arrays into a single array using the concatenate command. Notice that we need two sets of parentheses!
```python
a_all = np.concatenate((a_inner, a_outer))
print(a_all)
```
Try it yourself in the following code cell.

In [None]:
a_inner = np.array([5.76e10, 1.082e11, 1.496e11, 2.280e11])
a_outer = np.array([7.785e11, 1.432e12, 2.867e12, 4.515e12])

a_all = np.concatenate((a_inner, a_outer))
print(a_all)

[5.760e+10 1.082e+11 1.496e+11 2.280e+11 7.785e+11 1.432e+12 2.867e+12
 4.515e+12]


## 2.2 Multidimensional Arrays

Useful data can rarely be described in a single dimension. To describe the motion of a point particle, we need seven numbers: position ($x$, $y$, and $z$) and velocities ($v_x$, $v_y$, and $v_z$) as well as time ($t$).
Stars have numerous properties, such as mass, radius, composition, position in the sky, distance, etc.
In these cases, it would be unwieldy to have many separate lists of numbers. Luckily for us, the NumPy array datatype is actually called an ndarray, which is short for N-dimensional array.
In other words, our arrays can have any dimensionality we require!
For now, let's restrict ourselves to two dimensions, which will be sufficient for most of our use cases.
You might have three dimensions (for example, making a 2D animation or running a simulation of particles) and more rarely four or more, but luckily everything we are about to see works the same for arrays with higher dimensions.

### 2.2.1 Creating Multidimensional Arrays + .shape

When you initialize 2D arrays, you should give a list of rows.
```
A = np.array([[1, 2, 3],
              [4, 5, 6]])
```
In this example, the array has 6 elements, arranged in two rows and three columns. The first row is [1, 2, 3] and the first column is [1, 4]. We could instead arrange these six elements in a host of different ways. For example:
```
B = np.array([[1, 2], # three rows and two columns
              [3, 4],
              [5, 6]])
C = np.array([[1, 2, 3, 4, 5, 6]]) # one row and six columns
D = np.array([[1], # six rows and one column
              [2],
              [3],
              [4],
              [5],
              [6]])
```
That's four different ways to arrange six numbers! For larger collections of numbers, there can be even more combinations. We may need to check the shape of our arrays, which we can do with the shape attribute:
```
A.shape # (2, 3)
```
The shape attribute gives (# of rows, # of columns) for 2D arrays.

In the cell below, create some 2D arrays and make sure they have the shape you expect by printing the shape attribute.

In [None]:
A = np.array([[2,7,5],[1,4,5],[6,1,6],[9,13,7]]) #make sure to have double brackets
print(A)
print(A.shape)

[[ 2  7  5]
 [ 1  4  5]
 [ 6  1  6]
 [ 9 13  7]]
(4, 3)


In [None]:
B = np.array([[2],[4],[7],[3]])
print(B)
print(B.shape)

[[2]
 [4]
 [7]
 [3]]
(4, 1)


In [None]:
C = np.array([[2,1,6,9],[7,4,1,13],[5,5,6,7]])
print(C)
print(C.shape)

[[ 2  1  6  9]
 [ 7  4  1 13]
 [ 5  5  6  7]]
(3, 4)


It's common for data to be given to us. For example, suppose I give you three lists of planetary masses, radii, and distances from the Sun (first four planets only; the units are given in the comments). We can combine them like this:
```
masses = [3.3e23, 4.9e24, 6.0e24, 6.4e23] # kg
radii = [2439, 6052, 6378, 3398] # km
distances = [5.7e10, 1.1e11, 1.5e11, 2.3e11] # m

planets = np.array([masses, radii, distances])
```
Thus each column describes a planet, and each property is a row.
If you want to make it the other way around, you can do so by taking the transpose of the array.
```
planets_T = planets.transpose()
```
Try this yourself below.
Also, compare the shapes and entries of planets and planets_T to make sure everything looks right.

In [None]:
masses = [3.3e23, 4.9e24, 6.0e24, 6.4e23] # kg
radii = [2439, 6052, 6378, 3398] # km
distances = [5.7e10, 1.1e11, 1.5e11, 2.3e11] # m

planets = np.array([masses, radii, distances])

print(planets)

[[3.300e+23 4.900e+24 6.000e+24 6.400e+23]
 [2.439e+03 6.052e+03 6.378e+03 3.398e+03]
 [5.700e+10 1.100e+11 1.500e+11 2.300e+11]]


In [None]:
planets_T = planets.transpose()
print(planets_T)

[[3.300e+23 2.439e+03 5.700e+10]
 [4.900e+24 6.052e+03 1.100e+11]
 [6.000e+24 6.378e+03 1.500e+11]
 [6.400e+23 3.398e+03 2.300e+11]]


### 2.2.2 np.zeros() and np.ones()

It's extremely unwieldy to type out large 2D arrays by hand.
Just as we are able to make lists with placeholders very quickly, there are some functions that make simple arrays for you very easily.
The two most common are np.zeros() and np.ones().
The first gives you an array with all zeros and the second an array with all ones.
Both functions take the same arguments.
As an example, let's consider np.zeros().
The main argument is the shape of the array you want.

For example,
```
np.zeros(5) # this returns an array of length 5

np.zeros((2, 6)) # this returns an array with 2 rows and 6 columns
```
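
Since `np.ones()` takes the same arguments, a brief sketch of it may help. The optional `dtype` argument and the trick of multiplying by a constant are additions for illustration, not something the examples above rely on.

```python
import numpy as np

ones_1d = np.ones(3)                     # array of length 3, filled with 1.0
ones_2d = np.ones((2, 4))                # 2 rows and 4 columns of 1.0
int_zeros = np.zeros((2, 3), dtype=int)  # dtype is optional; the default is float
sevens = 7 * np.ones((2, 2))             # any constant: scale an array of ones

print(ones_1d)
print(ones_2d)
print(int_zeros)
print(sevens)
```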
Try it yourself in the code cell.

In [None]:
print(np.zeros(5))
print()
print(np.zeros((2,6)))

[0. 0. 0. 0. 0.]

[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]


### 2.2.3 np.arange and np.linspace()

The np.arange function works like range() from before, but it returns an array. The other main difference is that np.arange accepts non-integer step sizes. However, in those cases, it's usually better to use the next function.

np.linspace is a little different. This lets us make an array with equally spaced numbers by specifying the start, end, and number of points. This is extremely useful in a variety of applications, especially plotting. Note that, given the nature of these functions, we can only make 1D arrays in this manner.

```
# 50 (default) linearly-spaced numbers from 0 to 10
np.linspace(0, 10)

# With a third argument, 200 numbers are generated instead
np.linspace(0, 10, 200)

# This is helpful, for example, if you wanted to plot the sin( ) function
x = np.linspace(0, 2*np.pi, 100)
```
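
To make the difference between the two functions concrete, here is a small sketch with arbitrarily chosen values. `np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)   # start, stop (excluded), step size
spaced = np.linspace(0, 1, 5)     # start, stop (included), number of points

print(stepped)  # [0.   0.25 0.5  0.75]
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]
```

With non-integer steps, floating-point rounding can make it hard to predict exactly how many elements `np.arange` returns, which is one reason `np.linspace` is usually the safer choice in that situation.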
It is possible to reshape arrays.
In the cell below, compare the output from these two lines of code.
```
print(np.arange(12))
print(np.arange(12).reshape(4, 3))
```
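
A slightly expanded sketch of reshaping is given here. The names `flat`, `grid`, and `auto` are illustrative. The total number of elements must stay the same, and passing -1 for one dimension asks NumPy to work out that dimension for you.

```python
import numpy as np

flat = np.arange(12)
grid = flat.reshape(4, 3)    # 4 rows and 3 columns, filled row by row

print(flat.shape)  # (12,)
print(grid.shape)  # (4, 3)
print(grid)

auto = flat.reshape(2, -1)   # -1 lets NumPy infer the size: 12 / 2 = 6 columns
print(auto.shape)  # (2, 6)
```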

In [None]:
print(np.linspace(0, 10))
print()
print(np.linspace(0, 10, 200))
print()
print(np.linspace(0, 2*np.pi, 100))

[ 0.          0.20408163  0.40816327  0.6122449   0.81632653  1.02040816
  1.2244898   1.42857143  1.63265306  1.83673469  2.04081633  2.24489796
  2.44897959  2.65306122  2.85714286  3.06122449  3.26530612  3.46938776
  3.67346939  3.87755102  4.08163265  4.28571429  4.48979592  4.69387755
  4.89795918  5.10204082  5.30612245  5.51020408  5.71428571  5.91836735
  6.12244898  6.32653061  6.53061224  6.73469388  6.93877551  7.14285714
  7.34693878  7.55102041  7.75510204  7.95918367  8.16326531  8.36734694
  8.57142857  8.7755102   8.97959184  9.18367347  9.3877551   9.59183673
  9.79591837 10.        ]

[ 0.          0.05025126  0.10050251  0.15075377  0.20100503  0.25125628
  0.30150754  0.35175879  0.40201005  0.45226131  0.50251256  0.55276382
  0.60301508  0.65326633  0.70351759  0.75376884  0.8040201   0.85427136
  0.90452261  0.95477387  1.00502513  1.05527638  1.10552764  1.15577889
  1.20603015  1.25628141  1.30653266  1.35678392  1.40703518  1.45728643
  1.50753769  1.55778894

### 2.2.4 np.loadtxt()

We can also import a data file into an array.
All you need to do is supply a filename. In this case, we will load the file named 'example.txt'.
```
data = np.loadtxt('example.txt')
```
After you load the data, do the following:
1. Get the shape of the array
2. Sum all the elements of the array
3. Find the max values of each column (axis=0) and each row (axis=1)
4. Finally, sum each row and column individually. Your answer might surprise you.
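
If the `axis` argument is unfamiliar, the sketch below shows how it behaves on a tiny made-up array (not the data file). Using `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns and leaves one value per row.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(small.shape)            # (2, 3)
print(np.sum(small))          # 21
print(np.max(small, axis=0))  # [4 5 6]  one value per column
print(np.max(small, axis=1))  # [3 6]    one value per row
print(np.sum(small, axis=0))  # [5 7 9]
print(np.sum(small, axis=1))  # [ 6 15]
```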

In [None]:
# Running in Google Colab? Run this cell to download the file
!wget https://raw.githubusercontent.com/CIERA-Northwestern/REACHpy/main/Module_2/data/example.txt

# If you're not running in Colab, this file should be in the data directory.
# Change the loading path of the file to include 'data/' when the file is loaded




In [None]:
data = np.loadtxt('example.txt')

print(data)

[[64.  2.  3. 61. 60.  6.  7. 57.]
 [ 9. 55. 54. 12. 13. 51. 50. 16.]
 [17. 47. 46. 20. 21. 43. 42. 24.]
 [40. 26. 27. 37. 36. 30. 31. 33.]
 [32. 34. 35. 29. 28. 38. 39. 25.]
 [41. 23. 22. 44. 45. 19. 18. 48.]
 [49. 15. 14. 52. 53. 11. 10. 56.]
 [ 8. 58. 59.  5.  4. 62. 63.  1.]]


In the above example, the columns of the file are separated by whitespace, which is what loadtxt assumes by default.
This isn't always the case.
For example, some files may instead separate values by commas. In the case of comma-separated values (CSV) file format, you would need to pass a delimiter option:
```
data = np.loadtxt('example.txt', delimiter=',')
```
We will not be doing an example here, but we may see it in future Challenge Problems.
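
For reference, here is a minimal sketch of the delimiter option that does not need a file on disk. It uses `io.StringIO` from the standard library to make a string behave like a file, and the comma-separated numbers are made up for illustration.

```python
import io
import numpy as np

# A stand-in for a small CSV file: two rows, three comma-separated columns
csv_text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")

csv_data = np.loadtxt(csv_text, delimiter=',')
print(csv_data)
print(csv_data.shape)  # (2, 3)
```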

### 2.2.5 np.meshgrid()

In the lists section, we discussed a way of evaluating a function on a *grid* of points by using nested for loops. We can accomplish something similar without needing to use loops with the function `np.meshgrid`. The example from that section was
```
xs = [0,1,3,5,6]
ys = [1,3,6,7,10]

zs = len(xs)*[0]
for i in range(len(xs)):
    zs[i] = len(ys)*[0]

for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        zs[i][j] = x*x + y

print(zs)
```
Here's one way we could do this in NumPy:

```
xs = np.array([0, 1, 3, 5, 6])
ys = np.array([1, 3, 6, 7, 10])

xx, yy = np.meshgrid(xs, ys)
zz = xx*xx + yy
```
Run the NumPy example below and print out xx and yy to see what's going on. Essentially, the function takes your x and y values and clones the x values along each row and the y values along each column so you work with 2D arrays.

In [None]:
xs = np.array([0, 1, 3, 5, 6])
ys = np.array([1, 3, 6, 7, 10])

xx, yy = np.meshgrid(xs, ys)
zz = xx*xx + yy

print(zz)

[[ 1  2 10 26 37]
 [ 3  4 12 28 39]
 [ 6  7 15 31 42]
 [ 7  8 16 32 43]
 [10 11 19 35 46]]
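
To see what `np.meshgrid` actually produces, here is a sketch with smaller, made-up inputs. It also shows the optional `indexing='ij'` argument, which arranges the output so that the first index follows x, matching the `zs[i][j]` layout of the nested-loop version.

```python
import numpy as np

xs_small = np.array([0, 1, 3])
ys_small = np.array([10, 20])

# Default ('xy') indexing: x varies along each row, y varies down each column
xx, yy = np.meshgrid(xs_small, ys_small)
print(xx)  # [[0 1 3]
           #  [0 1 3]]
print(yy)  # [[10 10 10]
           #  [20 20 20]]
zz = xx*xx + yy           # zz[j, i] uses xs_small[i] and ys_small[j]

# 'ij' indexing: the first index follows x, the second follows y
xx_ij, yy_ij = np.meshgrid(xs_small, ys_small, indexing='ij')
zz_ij = xx_ij*xx_ij + yy_ij   # zz_ij[i, j] uses xs_small[i] and ys_small[j]

print(zz.shape)     # (2, 3)
print(zz_ij.shape)  # (3, 2)
```

The two results contain the same numbers; one is simply the transpose of the other.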


## 2.3 Indexing and Slicing in NumPy

### 2.3.1 Commonalities with Lists

Luckily for us, accessing the data in 1D arrays is essentially the same as working with normal lists! Just to review with some examples,
```
my_array = np.arange(8)

my_array[3] # this gets the 4th element of my_array
my_array[-5:] # this gets the last 5 elements of my_array
my_array[1:6:2] # this gets every other element of my_array, starting with the 2nd and stopping before the 7th
```
Try it yourself in the code cell.

In [None]:
my_array = np.arange(8)
print(my_array)

print(my_array[3]) # this gets the 4th element of my_array
print(my_array[-5:]) # this gets the last 5 elements of my_array
print(my_array[1:6:2]) # this gets every other element of my_array, starting with the 2nd and stopping before the 7th

[0 1 2 3 4 5 6 7]
3
[3 4 5 6 7]
[1 3 5]


### 2.3.2 Working with 2D Arrays

Now that we have multiple dimensions, we can access the arrays along multiple dimensions.
In order to specify multiple axes, we can separate them with commas:
```
my_array = np.arange(48).reshape(6, 8)

my_array[2, 3] # accesses the element in the 3rd row, 4th column
```
The various rules for slicing also apply here.
```
my_array[1:-1,::2] # get all but the first and last row, every other column
```
What if we only want to get certain rows? Since a 2D array is basically an array of arrays, we can access rows by pretending we're working with a 1D array.
```
my_array[4] # access the 5th row
my_array[2:5] # access rows 3 through 5
```
What about columns? If we want to access part or all of a column in every row, use a colon before the comma:
```
my_array[:, 4] # access the 5th col
my_array[:, 2:5] # access cols 3 through 5
```
Try to work through these examples in your head before you try to run them in the cell below.

In [None]:
my_array = np.arange(48).reshape(6,8)
print(my_array)
print()
print(my_array[2,3])

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47]]

19


In [None]:
print(my_array[1:-1,::2])  # get all but the first and last row, every other column
print()
print(my_array[4]) # access the 5th row
print()
print(my_array[2:5]) # access rows 3 through 5

[[ 8 10 12 14]
 [16 18 20 22]
 [24 26 28 30]
 [32 34 36 38]]

[32 33 34 35 36 37 38 39]

[[16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]]
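
The cells above cover the row examples. As a sketch of the column examples on the same 6x8 array, the code below also combines a row slice with a column slice. The names `col5`, `cols`, and `block` are illustrative.

```python
import numpy as np

my_array = np.arange(48).reshape(6, 8)

col5 = my_array[:, 4]        # the 5th column, returned as a 1D array
cols = my_array[:, 2:5]      # columns 3 through 5 of every row
block = my_array[1:3, 2:5]   # rows 2 and 3, columns 3 through 5

print(col5)        # [ 4 12 20 28 36 44]
print(cols.shape)  # (6, 3)
print(block)       # [[10 11 12]
                   #  [18 19 20]]
```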


### 2.3.3 Accessing Specific Indices

NumPy can access multiple indices and values at once. For example,
```
my_array = np.arange(8)

my_array[[0, 2, 3, 6]] # this gets the elements at all the specified positions
```

Try the example in the code cell.

In [None]:
my_array = np.arange(8)
print(my_array)
print()
print(my_array[[0, 2, 3, 6]]) # this gets the elements at all the specified positions

[0 1 2 3 4 5 6 7]

[0 2 3 6]


### 2.3.4 Boolean Indexing

The word Boolean means we're dealing with conditional logic (True or False).

Instead of accessing data based on the index, we can do it based on the nature of the data itself! For example, in the following array, suppose we want only the even numbers.
```
a = np.arange(12)

a[a%2 == 0]
```
Here, the == means 'is equal to', as opposed to =, which is just assignment. Here's a list of the different comparison operators you'll probably use:
```
== # equal to
!= # not equal to
<= # less than or equal to
<  # less than
>= # greater than or equal to
>  # greater than
```
In addition, we can combine logical statements with these element-wise operators:
```
& # logical and
| # logical or
~ # logical not
```
Here's an example. Note that we need to enclose different logical statements in parentheses.
```
a[(a%2 == 0) & (a > 7)] # return elements even and greater than 7
a[(a%2 == 0) & ((a > 7) | (a < 3))] # return elements even and greater than 7 or less than 3
a[~(a%2 == 0) & ((a > 7) | (a < 3))] # return elements NOT even (ie odd) and greater than 7 or less than 3
```
Try to predict the output of each of these lines before running the code below.

In [None]:
a = np.arange(12)
print(a)
print()
print(a[a%2 == 0])
print()
print(a[(a%2 == 0) & (a > 7)]) # return elements even and greater than 7
print()
print(a[(a%2 == 0) & ((a > 7) | (a < 3))]) # return elements even and greater than 7 or less than 3
print()
print(a[~(a%2 == 0) & ((a > 7) | (a < 3))]) # return elements NOT even (ie odd) and greater than 7 or less than 3

[ 0  1  2  3  4  5  6  7  8  9 10 11]

[ 0  2  4  6  8 10]

[ 8 10]

[ 0  2  8 10]

[ 1  9 11]
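
It can help to look at the Boolean array (often called a mask) on its own. In the sketch below, the name `mask` is illustrative. Because True counts as 1 and False as 0, summing a mask tells you how many elements satisfy the condition.

```python
import numpy as np

a = np.arange(12)

mask = (a % 2 == 0) & (a > 7)   # one True/False value per element of a
print(mask)
print(a[mask])                  # [ 8 10]

# Two equivalent ways to count how many elements pass the condition
print(np.sum(mask))             # 2
print(np.count_nonzero(mask))   # 2
```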


### 2.3.5 Boolean Indexing in 2D

For two dimensions, we need to specify the row or column.

First, consider selecting rows based on different column values:
```
b = np.arange(48).reshape(6,8)
b[b[:,4] > 20] # get all rows where column 5 is greater than 20
b[(b[:,4] > 20) | (b[:,2] < 30)] # get all rows where column 5 is greater than 20 or column 3 is less than 30
```
Next, consider selecting columns based on different row values:
```
b[:,b[0,:] > 3] # get all columns where first row value is greater than 3
b[:,(b[0,:] > 3) & (b[0,:]%2 == 1)] # get all columns where first row value is greater than 3 and odd
```
Try to predict the output of each of these lines before running the code below.

In [None]:
b = np.arange(48).reshape(6,8)
print(b)
print()
print(b[b[:,4] > 20]) # get all rows where column 5 is greater than 20
print()
print(b[(b[:,4] > 20) | (b[:,2] < 30)]) # get all rows where column 5 is greater than 20 or column 3 is less than 30

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47]]

[[24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47]]

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47]]


In [None]:
print(b[:,b[0,:] > 3]) # get all columns where first row value is greater than 3
#Predicted result: [[ 4  5  6  7]
                  # [ 12 13 14 15]
                  # [ 20 21 22 23]
                  # [ 28 29 30 31]
                  # [ 36 37 38 39]
                  # [ 44 45 46 47]]
print()
print(b[:,(b[0,:] > 3) & (b[0,:]%2 == 1)]) # get all columns where first row value is greater than 3 and odd
#Predicted result: [[ 5  7]
                  # [ 13 15]
                  # [ 21 23]
                  # [ 29 31]
                  # [ 37 39]
                  # [ 45 47]]

[[ 4  5  6  7]
 [12 13 14 15]
 [20 21 22 23]
 [28 29 30 31]
 [36 37 38 39]
 [44 45 46 47]]

[[ 5  7]
 [13 15]
 [21 23]
 [29 31]
 [37 39]
 [45 47]]


### 2.3.6 np.where()

Boolean indexing is extremely useful, but sometimes we want to work with data in the same column(s) differently based on some condition. This is where the function `np.where` comes in handy. The syntax for `np.where` is
```
np.where(condition, value_if_true, value_if_false)
```
This acts like an if-else statement applied to each element.
An example of this is as follows:
```
# Create masses of black holes
m1 = np.random.uniform(5, 50, size=10)
m2 = np.random.uniform(5, 50, size=10)

# Define mass ratio to always be less than 1
q = np.where(m2 < m1, m2/m1, m1/m2)
print(q)
```
In the above example, the first two lines create the masses of the two objects in each of 10 binary systems, drawn uniformly between 5 and 50 (in some unit system).
Typically, in a binary system, the mass ratio is defined such that we divide the smaller mass by the larger mass.
We could do this with a for loop and an if-else statement.
This way is more compact and faster!

There is a second use case for `np.where`.
If you leave off the second and third arguments, the function returns a tuple of arrays, one for each dimension of the input. For a 1D array, the first (and only) element of this tuple contains the indices where the condition is true.
For example, the following code will return all the indices where the number is negative.
```
a = np.random.uniform(-1, 1, size=10)
b = np.where(a < 0)[0]
print(b)
```


In [None]:
m1 = np.random.uniform(5, 50, size=10)
m2 = np.random.uniform(5, 50, size=10)

In [None]:
q = np.where(m2 < m1, m2/m1, m1/m2)
print(q)
print()
print(m1)
print(m2)

[0.22967913 0.26374907 0.91504401 0.93881414 0.91262215 0.49274599
 0.82387274 0.58662292 0.53707856 0.7008163 ]

[11.04474212  7.75601656 37.04522297 45.86116834 31.32594952 17.92741753
 41.16156234  5.56114179 27.82186302 20.07068143]
[48.08770345 29.40680129 33.89800924 43.0551135  34.32521297  8.83366318
 49.96106832  9.4799259  14.94252611 28.63900467]


In [None]:
a = np.random.uniform(-1, 1, size=10)
b = np.where(a < 0)[0]
print(b)
print()
print(a)

[0 1 4 7 8]

[-0.78174447 -0.5669412   0.06785797  0.56217444 -0.36740643  0.72453974
  0.30158386 -0.2550579  -0.41333081  0.11743063]
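
Because the examples above use random numbers, your output will differ from run to run. The sketch below uses fixed, made-up values so the result is predictable. It also shows why the `[0]` is needed: with only a condition, `np.where` returns a tuple with one index array per dimension.

```python
import numpy as np

values = np.array([-0.5, 0.2, -0.1, 0.9])
result = np.where(values < 0)
print(type(result))   # <class 'tuple'>
print(result[0])      # [0 2]

# For a 2D array, the tuple holds row indices and column indices
grid = np.array([[1, -2],
                 [-3, 4]])
rows, cols = np.where(grid < 0)
print(rows)  # [0 1]
print(cols)  # [1 0]
```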


## 2.4 Practice Problem

Use np.loadtxt to open the file named 'binaries.txt'. Characterize it. In particular,

1. How many rows does it have?
2. How many columns does it have?
3. What kinds of values does each column contain? That is, how big or small are the numbers in each column? What's the smallest/largest number? The median? The average? If you notice anything else interesting, write that down too.

Answer: Write your answers here.


In [None]:
# Running in Google Colab? Run this cell to download the file
!wget https://raw.githubusercontent.com/CIERA-Northwestern/REACHpy/main/Module_2/data/binaries.txt

# If you're not running in Colab, this file should be in the data directory.
# Change the loading path of the file to include 'data/' when the file is loaded




In [None]:
data = np.loadtxt('binaries.txt')
print(data)

[[1.00000000e+00 1.00000000e+00 8.81212551e-01 7.76804862e-01
  6.28615959e+04 2.06867218e-02]
 [1.00000000e+00 1.00000000e+00 1.77039360e+00 1.34342425e+00
  1.47145158e+04 4.70694691e-01]
 [1.00000000e+00 0.00000000e+00 9.92801200e-01 5.66901650e-01
  2.50524376e+04 8.24416117e-01]
 ...
 [1.00000000e+00 1.00000000e+00 3.46500488e+00 8.50591883e-01
  7.04027404e+04 5.51253701e-01]
 [1.00000000e+00 1.00000000e+00 1.21141564e+00 1.10036871e+00
  4.16808849e+06 7.62588117e-01]
 [1.00000000e+00 1.00000000e+00 3.18885496e+00 2.48999519e+00
  7.33612769e+04 5.33781888e-01]]


Now that you've done that, I should tell you what you're looking at. I used a program called [COSMIC](https://cosmic-popsynth.github.io/docs/stable/index.html) to create a list of binary star systems based on the properties of observed systems. To be clear, the data here is completely synthetic. Each row represents a different binary star system. The columns are as follows:

1) The k-type of star 1

2) The k-type of star 2

3) The mass of star 1 in units of solar mass

4) The mass of star 2 in units of solar mass

5) The orbital period in days.

6) The orbital eccentricity which ranges between 0 and 1.

The k-type is related to the evolutionary state of the star. Here, all of the k-types are either 0 or 1, both of which correspond to main-sequence stars.

Now, here are your next tasks:

1. How many binaries have stars where at least one is above 1 solar mass? 5 solar masses? 10?
2. How many stars in this sample have masses above 1, 5 and 10 solar masses?
3. How many binaries have orbital periods shorter than a month? A year? What are the shortest and longest orbital periods?
4. As stated before, the k-type for each of these stars is either 0 or 1. A main-sequence star will have k-type 0 below a certain mass and k-type 1 above that mass. See if you can find a range of values where the cutoff mass might be. State your answer in solar masses.
5. Using Kepler's Third Law, make another array that corresponds to the semimajor axis of each binary system. Make sure you explicitly state what units you're using (i.e. meters, kilometers, AU). Once again, Kepler's Third Law is

$$T = 2\pi \sqrt{\frac{a^3}{G(M_1 + M_2)}}$$
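
Solving the equation above for the semimajor axis gives

$$a = \left(\frac{G(M_1 + M_2)T^2}{4\pi^2}\right)^{1/3}$$

One possible sketch of this calculation is given below. The function name `semimajor_axis_au` and the rounded constants are assumptions made for illustration, so treat the result as approximate. Everything is converted to SI units first (kilograms and seconds), and the answer is converted to AU at the end. The check uses an Earth-like orbit around a Sun-like star rather than the data file.

```python
import numpy as np

# Approximate constants (rounded values, assumed for this sketch)
G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass in kg
AU = 1.496e11      # astronomical unit in m
DAY = 86400.0      # seconds in one day

def semimajor_axis_au(m1_msun, m2_msun, period_days):
    """Return the semimajor axis in AU from masses (solar masses) and period (days)."""
    total_mass = (m1_msun + m2_msun) * M_SUN   # kg
    period = period_days * DAY                 # s
    a_meters = (G * total_mass * period**2 / (4 * np.pi**2)) ** (1 / 3)
    return a_meters / AU

# Sanity check: a one-year orbit around one solar mass should be about 1 AU
a_check = semimajor_axis_au(np.array([1.0]), np.array([0.0]), np.array([365.25]))
print(a_check)
```

Because the function only uses array arithmetic, it accepts whole columns of masses and periods at once and returns one semimajor axis per binary.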

## Problem 1

In [None]:
m1 = data[:,2]
m2 = data[:,3]

In [None]:
mSun1 = data[(m1 > 1) | (m2 > 1)]
mSun1_count = len(mSun1)
print(mSun1_count)
print()
print(f'There are {mSun1_count} binaries where at least one star has a mass greater than 1 solar mass.')

148

There are 148 binaries where at least one star has a mass greater than 1 solar mass.


In [None]:
# Another method using np.where()
mSun5 = np.where((m1 > 5) | (m2 > 5))[0]
mSun5_count = mSun5.shape[0]
print(mSun5_count)
print()
print(f'There are {mSun5_count} binaries where at least one star has a mass greater than 5 solar masses.')

42

There are 42 binaries where at least one star has a mass greater than 5 solar masses.


In [None]:
mSun10 = data[(m1 > 10) | (m2 > 10)]
mSun10_count = len(mSun10)
print(mSun10_count)
print()
print(f'There are {mSun10_count} binaries where at least one star has a mass greater than 10 solar masses.')

18

There are 18 binaries where at least one star has a mass greater than 10 solar masses.


In [None]:
# Function solution
def mass_count(sol_mass):
    mSun = data[(m1 > sol_mass)| (m2 > sol_mass)]
    mSun_count = len(mSun)
    #alternative
    #mSun_count = np.where((m1 > sol_mass) | (m2 > sol_mass))[0].shape[0]

    print(f'There are {mSun_count} binaries where at least one star has a mass greater than {sol_mass} solar mass(es).')

In [None]:
mass_count(1)

There are 148 binaries where at least one star has a mass greater than 1 solar mass(es).


In [None]:
mass_count(5)

There are 42 binaries where at least one star has a mass greater than 5 solar mass(es).


In [None]:
mass_count(10)

There are 18 binaries where at least one star has a mass greater than 10 solar mass(es).


## Problem 2

In [None]:
all_masses = np.concatenate((m1, m2))

In [None]:
all_mSun1 = all_masses[all_masses > 1]
all_mSun1_count = len(all_mSun1)
print(all_mSun1_count)
print()
print(f'There are {all_mSun1_count} total stars with a mass greater than 1 solar mass.')

240

There are 240 total stars with a mass greater than 1 solar mass.


In [None]:
all_mSun5 = all_masses[all_masses > 5]
all_mSun5_count = len(all_mSun5)
print(all_mSun5_count)
print()
print(f'There are {all_mSun5_count} total stars with a mass greater than 5 solar masses.')

62

There are 62 total stars with a mass greater than 5 solar masses.


In [None]:
all_mSun10 = all_masses[all_masses > 10]
all_mSun10_count = len(all_mSun10)
print(all_mSun10_count)
print()
print(f'There are {all_mSun10_count} total stars with a mass greater than 10 solar masses.')

25

There are 25 total stars with a mass greater than 10 solar masses.


In [None]:
def all_mass_count(sol_mass):
    """Print how many individual stars are more massive than sol_mass (in solar masses)."""
    all_masses = np.concatenate((m1, m2))
    mSun_count = len(all_masses[all_masses > sol_mass])
    print(f'There are {mSun_count} total stars with a mass greater than {sol_mass} solar mass(es).')

In [None]:
all_mass_count(1)

There are 240 total stars with a mass greater than 1 solar mass(es).


In [None]:
all_mass_count(5)

There are 62 total stars with a mass greater than 5 solar mass(es).


In [None]:
all_mass_count(10)

There are 25 total stars with a mass greater than 10 solar mass(es).
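
Problems 1 and 2 count different things: Problem 1 counts binaries (a pair counts once even if both stars pass the cut), while Problem 2 counts individual stars. The two are linked: the number of stars above a threshold equals the number of binaries with at least one such star plus the number of binaries where both stars pass. From the outputs above, 240 - 148 = 92 binaries must therefore have both stars above 1 solar mass. The sketch below illustrates this relationship on made-up masses that are not taken from `data`.

```python
import numpy as np

# Made-up masses in solar masses; illustrative only
demo_m1 = np.array([0.8, 2.5, 12.0, 0.6])
demo_m2 = np.array([0.5, 0.9, 3.0, 1.4])
threshold = 1.0

# Individual stars above the threshold (the Problem 2 style of counting)
stars_above = np.count_nonzero(np.concatenate((demo_m1, demo_m2)) > threshold)
# Binaries with at least one star above the threshold (the Problem 1 style)
either_above = np.count_nonzero((demo_m1 > threshold) | (demo_m2 > threshold))
# Binaries where both stars are above the threshold
both_above = np.count_nonzero((demo_m1 > threshold) & (demo_m2 > threshold))

print(stars_above, either_above, both_above)
```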


## Problem 3

In [None]:
porb = data[:,4]

In [None]:
# porb is in days; one month is taken to be 30 days
porb_month = data[porb < 30]
print(len(porb_month))
print()
print(f'There are {len(porb_month)} binaries that have orbital periods shorter than one month.')

33

There are 33 binaries that have orbital periods shorter than one month.


In [None]:
porb_yr = data[porb < 365.25]
print(len(porb_yr))
print()
print(f'There are {len(porb_yr)} binaries that have orbital periods shorter than one year.')

57

There are 57 binaries that have orbital periods shorter than one year.


In [None]:
porb_min = porb.min()
print(porb_min)
print()
print(f'The minimum orbital period is {porb_min:.2f} days.')

1.801811702332768

The minimum orbital period is 1.80 days.


In [None]:
porb_max = porb.max()
print(porb_max)
print()
print(f'The maximum orbital period is {porb_max:,.2f} days.')
print()
print(f'The maximum orbital period is {porb_max/365.25:.2f} years.')

67915281.32414712

The maximum orbital period is 67,915,281.32 days.

The maximum orbital period is 185941.91 years.
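
`min()` and `max()` return the extreme values but not which binary they belong to. If the row is also of interest, `np.argmin` and `np.argmax` return its index. A short sketch with made-up periods in days (the array `demo_porb` is hypothetical, not from `data`):

```python
import numpy as np

demo_porb = np.array([410.0, 1.8, 25.0, 9000.0])  # made-up orbital periods in days

shortest_idx = np.argmin(demo_porb)  # index of the shortest period
longest_idx = np.argmax(demo_porb)   # index of the longest period

print(f'Shortest: binary {shortest_idx} with {demo_porb[shortest_idx]:.2f} days')
print(f'Longest: binary {longest_idx} with {demo_porb[longest_idx] / 365.25:.2f} years')
```

The returned index can then be used on `data` itself to look up the masses and types of that binary.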


## Problem 4

In [None]:
m1 = data[:,2]
m2 = data[:,3]
all_masses = np.concatenate((m1, m2))

k1 = data[:,0]
k2 = data[:,1]
all_ktypes = np.concatenate((k1, k2))

In [None]:
ktype0 = all_masses[np.where(all_ktypes==0)[0]]
print(ktype0)

ktype0_min = ktype0.min()
ktype0_max = ktype0.max()

print()
print(ktype0_min)
print(ktype0_max)
print()
print(f"The range of mass for stars with k = 0 is {ktype0_min:.2f} to {ktype0_max:.2f} solar masses.")

[0.57934428 0.65732089 0.55675441 0.66374045 0.66055641 0.62227614
 0.60426115 0.52111388 0.59018537 0.63405243 0.60854679 0.63619815
 0.63101254 0.5897245  0.58523403 0.55639617 0.56690165 0.56223745
 0.61535332 0.59607911 0.54400311 0.69870895 0.50306811 0.51240934
 0.54823416 0.55964648 0.68645324 0.58236399 0.53719113 0.54416234
 0.59235973 0.52344153 0.52881731 0.54272026 0.6345005  0.68957127
 0.6659802  0.67504128 0.65307841 0.52449152 0.53438246 0.57459822
 0.53181529 0.51393436 0.57423246 0.55775037 0.60156025 0.60451348
 0.67653714 0.62730293 0.62984894 0.65189864 0.57326348 0.5491008
 0.60406752 0.68636246 0.54999756 0.58730136 0.591433   0.56656134
 0.50368517 0.51446666 0.59453621 0.62745009 0.51810361 0.58052228
 0.54764009 0.56293011 0.63254416 0.51604528 0.5236259  0.54013956
 0.54729691 0.54088708]

0.503068107966992
0.6987089484923723

The range of mass for stars with k = 0 is 0.50 to 0.70 solar masses.
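
The `np.where(...)[0]` step is optional here: a boolean mask can index `all_masses` directly and selects the same elements. The sketch below checks this on made-up arrays and also loops over every stellar type with `np.unique`, which avoids writing one cell per type. The names `demo_ktypes` and `demo_masses` and their values are illustrative only.

```python
import numpy as np

demo_ktypes = np.array([0, 1, 1, 0, 1])              # made-up stellar types
demo_masses = np.array([0.55, 2.1, 0.9, 0.68, 7.3])  # made-up masses in solar masses

# Indexing with np.where indices and with the mask itself give the same result
via_where = demo_masses[np.where(demo_ktypes == 0)[0]]
via_mask = demo_masses[demo_ktypes == 0]
print(np.array_equal(via_where, via_mask))

# Collect the (min, max) mass for each stellar type present
mass_ranges = {}
for k in np.unique(demo_ktypes):
    selected = demo_masses[demo_ktypes == k]
    mass_ranges[int(k)] = (selected.min(), selected.max())
    print(f'k = {k}: {selected.min():.2f} to {selected.max():.2f} solar masses')
```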


In [None]:
ktype1 = all_masses[np.where(all_ktypes==1)[0]]
print(ktype1)

ktype1_min = ktype1.min()
ktype1_max = ktype1.max()

print()
print(ktype1_min)
print(ktype1_max)
print()
print(f"The range of mass for stars with k = 1 is {ktype1_min:.2f} to {ktype1_max:.2f} solar masses.")

[ 0.88121255  1.7703936   0.9928012   2.56117286  3.52799289  1.780738
  7.32569779  4.53825652  1.88566148  1.2280377   2.51263247  9.98768971
  0.70316841  7.46360161  3.50841219  4.11438623  8.02008352 21.31510398
  1.72190262  1.04824236  2.41455355  0.9683944   0.98961324  1.8853475
  6.8787421   2.84505873  2.60136673  1.89632717  2.60876412 29.87443005
  0.92866024 10.54207908  0.90541401 15.81961519  1.08302496  0.90629382
  3.56756702  6.7405956   4.00456185  4.78045023  8.20175655  1.67572544
  1.84425365  0.76803439  4.3417644   4.12563639  4.17775214  1.34793421
 63.97121983  1.43545404  3.56278159  8.2702774  14.27574833  0.83166586
 11.51569062  2.03169758  3.57665303  6.64714205  0.87180774  3.33500651
  0.97374132  2.14606115  2.56689289  1.95769069  1.05139384  0.97889865
 20.04940249  0.95688102 34.6573491   1.16073474  3.15959233  1.7513581
  1.75398319  7.08105369  1.07094682  0.95344575  3.9050805   0.71225912
  5.09172264  1.38528668  0.79486872  0.95828498  1.576

## Problem 5

In [None]:
# My work to solve for a

# 2*pi*sqrt(a**3/(G*(m1+m2))) = porb
# sqrt(a**3/(G*(m1+m2))) = porb/(2*pi)
# a**3/(G*(m1+m2)) = (porb/(2*pi))**2
# a**3 = (porb**2/(4*pi**2))*(G*(m1+m2))
# a = ((porb**2/(4*pi**2))*(G*(m1+m2)))**(1/3)

In [None]:
G = 6.6743*10**(-11) #gravitational constant in m^3 kg^-1 s^-2
porb_day = data[:,4]
m1_mSun = data[:,2]
m2_mSun = data[:,3]

day_to_s = 86400 #1 day = 86400 s
porb_s = porb_day * day_to_s

mSun_to_kg = 1.98847*10**30 #1 m_Sun = 1.98847*10**30 kg
m1_kg = m1_mSun * mSun_to_kg
m2_kg = m2_mSun * mSun_to_kg

In [None]:
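# 4*pi**2 is parenthesized so that the whole factor divides porb_s**2;
# porb_s**2/4*np.pi**2 would instead evaluate as (porb_s**2/4)*pi**2.
# With SI inputs, a_m is in meters.
a_m = ((porb_s**2/(4*np.pi**2))*(G*(m1_kg+m2_kg)))**(1/3)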
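
One way to sanity-check the rearranged formula is to apply it to a system with a known answer. The sketch below assumes the Earth-Sun system (a period of 365.25 days and an Earth mass of about 5.972e24 kg), for which the semimajor axis should come out close to 1 AU. The function name `semimajor_axis_m` and the constant names are illustrative and not part of the notebook.

```python
import numpy as np

G = 6.6743e-11         # gravitational constant, m^3 kg^-1 s^-2
M_SUN_KG = 1.98847e30  # solar mass in kg
M_EARTH_KG = 5.972e24  # Earth mass in kg (assumed test case)
AU_IN_M = 1.496e11     # 1 AU in meters
DAY_IN_S = 86400       # 1 day in seconds

def semimajor_axis_m(period_s, m1_kg, m2_kg):
    """Kepler's Third Law solved for a; SI inputs, result in meters."""
    return (G * (m1_kg + m2_kg) * period_s**2 / (4 * np.pi**2)) ** (1 / 3)

# Earth-Sun system: the result should be very close to 1 AU
a_earth_au = semimajor_axis_m(365.25 * DAY_IN_S, M_SUN_KG, M_EARTH_KG) / AU_IN_M
print(f'{a_earth_au:.3f} AU')
```

Because the function uses only NumPy operations, passing arrays of periods and masses returns an array of semimajor axes, one per binary.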

In [None]:
print(a_m)

[2.52110006e+13 1.18140418e+13 1.33777805e+13 1.32746134e+13
 2.05514512e+12 9.25267256e+12 4.76553144e+11 2.69381492e+12
 6.00785082e+12 1.05187347e+14 1.27259515e+14 5.02894407e+13
 1.46496143e+12 2.05263044e+12 6.14349015e+13 1.18090783e+13
 4.81649388e+13 1.78561011e+13 1.74115786e+11 1.68474896e+12
 8.42307979e+13 2.60090538e+12 5.36610430e+13 1.71432181e+12
 6.87296913e+12 2.59593384e+10 8.25546886e+13 6.09084683e+12
 9.21641572e+13 7.77086267e+10 7.21289844e+10 2.90183511e+14
 3.18287498e+11 1.66957155e+12 3.07467597e+11 5.43686875e+14
 1.69032808e+11 3.51993125e+14 2.78313251e+12 7.69705452e+12
 1.31860402e+12 2.43087970e+12 6.87870288e+11 1.51967035e+14
 6.04588952e+10 6.62667141e+14 5.04683697e+12 7.39655568e+12
 5.18269050e+11 1.53187636e+12 1.46989159e+11 6.10930759e+12
 9.32767908e+10 8.39283186e+14 6.42039115e+13 1.49259946e+11
 6.00617241e+12 5.18974996e+14 1.20792659e+14 2.99565952e+15
 2.83629656e+14 4.79203532e+11 4.09378852e+10 1.04630464e+14
 1.16984251e+14 4.201392

In [None]:
AU_to_m = 1.496*10**11 #1 AU = 1.496*10**11 m

a_AU = a_m / AU_to_m #to get to AU, need to divide by conversion factor rather than multiply like usual
                      # [AU_to_m] = m/AU
                      # [AU] = [m]/[m/AU] = [AU]

print(a_AU)

[1.68522731e+02 7.89708677e+01 8.94236667e+01 8.87340468e+01
 1.37376011e+01 6.18494155e+01 3.18551567e+00 1.80067842e+01
 4.01594306e+01 7.03123980e+02 8.50665207e+02 3.36159363e+02
 9.79252290e+00 1.37207917e+01 4.10661107e+02 7.89376891e+01
 3.21958147e+02 1.19358965e+02 1.16387558e+00 1.12616909e+01
 5.63040093e+02 1.73857312e+01 3.58696812e+02 1.14593704e+01
 4.59423070e+01 1.73524989e-01 5.51836154e+02 4.07142168e+01
 6.16070569e+02 5.19442692e-01 4.82145618e-01 1.93972935e+03
 2.12759023e+00 1.11602376e+01 2.05526468e+00 3.63427055e+03
 1.12989845e+00 2.35289522e+03 1.86038269e+01 5.14508992e+01
 8.81419796e+00 1.62491959e+01 4.59806342e+00 1.01582243e+03
 4.04137000e-01 4.42959319e+03 3.37355412e+01 4.94422171e+01
 3.46436531e+00 1.02398152e+01 9.82547853e-01 4.08376176e+01
 6.23507960e-01 5.61018173e+03 4.29170531e+02 9.97726909e-01
 4.01482113e+01 3.46908420e+03 8.07437559e+02 2.00244620e+04
 1.89592016e+03 3.20323216e+00 2.73648965e-01 6.99401495e+02
 7.81980289e+02 2.808417