# PHXS 491_001: Observational Astronomy - Homework 2

### Due Sep 14, 2020
Remember to save your completed notebook as a PDF and upload to Brightspace under Assignments.

Name:

# Part 1: Intro to Numpy
The Numpy module is fundamental to almost all analysis tools that use Python. The power of Numpy is that it allows you to do math on whole arrays, and it does this math in C. This makes the operations very fast, which is essential when dealing with large amounts of data. For more information see:
* http://www.numpy.org

* https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

## Python indices are "zero-based"

* The center of the origin pixel is index 0
* e.g. for a 2D array the origin pixel (lower-left) is [0, 0]

### Comparisons to other languages/applications:

* 0-based:  python, C, IDL
* 1-based:  fortran, iraf, FITS WCS, SExtractor, ds9

## Python arrays are stored in "row-major" order

* for a 2D array, if x is the column index and y is the row index, then
the array is indexed as **[y, x]**
  * e.g. **data[y, x]**
  * *x (column) is the fast array index and y (row) is the slow array index*
* for a 3D array, index as e.g. **data[z, y, x]**

## Numpy multidimensional array (ndarray):
* an array of homogeneous elements (usually numbers), all of the same type
* a memory-efficient container that provides fast numerical operations
* designed for scientific computation (array-oriented computing)

First you need to import the Numpy Module

In [1]:
import numpy as np    # standard convention

## How to create a Numpy Array
Create a Python list (or a list of lists for more than one dimension) and pass it to `np.array()`.

In [2]:
# define a 1D array of 4 elements
a = np.array([0, 1, 2, 3])
print(a)
print(type(a))

[0 1 2 3]
<class 'numpy.ndarray'>


In [3]:
# define a 2D (3x3) array
b = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(b)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


## Arange
Like `range()`, `np.arange()` produces a sequence of numbers. Unlike `range()`, it returns a numpy array rather than a range object.

In [4]:
c = np.arange(10)
d = np.arange(2, 5, 0.5)  # start, stop (exclusive), step
print(c)
print(d)

[0 1 2 3 4 5 6 7 8 9]
[2.  2.5 3.  3.5 4.  4.5]
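The difference from `range()` can be seen directly. The short sketch below also uses `np.linspace`, which is not used elsewhere in this notebook; it is a common alternative for fractional steps because you give the number of points instead of the step size.

```python
import numpy as np

r = range(5)            # a lazy range object; no arithmetic on it
arr = np.arange(5)      # a numpy array; arithmetic is elementwise

print(type(r).__name__)    # range
print(type(arr).__name__)  # ndarray
print(arr * 2)             # [0 2 4 6 8]

# Same grid as np.arange(2, 5, 0.5), specified by end point and count
grid = np.linspace(2, 4.5, 6)
print(grid)
```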


## Zeros
`np.zeros` lets you create a numpy array of any size or dimension, filled with zeros.

In [5]:
e = np.zeros(3)
print(e)

[0. 0. 0.]


In [6]:
f = np.zeros((3, 3))
print(f)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
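A few related constructors follow the same pattern. This is only a sketch: `np.ones` and `np.full` are not used in the rest of the notebook, and the fill value 21.5 is made up.

```python
import numpy as np

counts = np.zeros((2, 3), dtype=int)  # integer zeros instead of the default float
ones = np.ones(4)                     # four elements, all 1.0
sky = np.full((2, 2), 21.5)           # every element set to the same value

print(counts.shape, counts.dtype)
print(ones)
print(sky)
```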


## Array attributes
Numpy arrays have a number of attributes that you can access to determine what kind of array it is.

In [7]:
a = np.array([[0, 1, 2], [3, 4, 5]])
print(a.ndim) #How many dimensions
print(a.size) #How many elements
print(a.shape) #How are those elements arranged
print(a.dtype) #What is the object type

2
6
(2, 3)
int32
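The integer type printed above depends on the platform and numpy version (`int64` is common on Linux and macOS). As a small sketch of what "homogeneous" means in practice, mixing integers and floats promotes every element to one common type, and the type can be set with the `dtype` keyword or changed with `astype`.

```python
import numpy as np

mixed = np.array([1, 2, 3.5])                     # ints are promoted to float
explicit = np.array([1, 2, 3], dtype=np.float32)  # type chosen by hand
as_int = mixed.astype(int)                        # truncates toward zero

print(mixed.dtype)     # float64
print(explicit.dtype)  # float32
print(as_int)          # [1 2 3]
```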


Create a string array of letters in alphabetical order with shape (3,4). What type is it? What size? How many dimensions?

In [8]:
a = np.array([["a","b","c","d"], ["e","f","g","h"], ["i","j","k","l"]])
print(a.shape) #How are those elements arranged
print(a.dtype) #What is the object type
print(a.size) #How many elements
print(a.ndim) #How many dimensions

(3, 4)
<U1
12
2


## Basic Numpy Operations
The most powerful thing about Numpy arrays is that operations work *elementwise* and run in C, so you can use a numpy array directly in an equation.

In [9]:
print('a Matrix:')
a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(a)
print('Addition:')
print(a + 10)
print('Power:')
print(a ** 3)
print('Equations:')
print(a + (2 * a))
print('Elementwise multiplication, not matrix multiplication')
print(a * a )
print('Matrix multiplication:')
print(np.dot(a, a))
# a.dot(a)   # shorthand for above
print('To Save memory you can do operations in place:')
a *= 3
print(a)

a Matrix:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
Addition:
[[10 11 12]
 [13 14 15]
 [16 17 18]]
Power:
[[  0   1   8]
 [ 27  64 125]
 [216 343 512]]
Equations:
[[ 0  3  6]
 [ 9 12 15]
 [18 21 24]]
Elementwise multiplication, not matrix multiplication
[[ 0  1  4]
 [ 9 16 25]
 [36 49 64]]
Matrix multiplication:
[[ 15  18  21]
 [ 42  54  66]
 [ 69  90 111]]
To Save memory you can do operations in place:
[[ 0  3  6]
 [ 9 12 15]
 [18 21 24]]
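Since `*` is elementwise, matrix products need a different spelling. Besides `np.dot`, Python provides the `@` operator for this; the sketch below compares the two on the same 3x3 array.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

elementwise = a * a   # multiplies matching elements
matrix = a @ a        # matrix product, same result as np.dot(a, a)

print(elementwise[1, 1])  # 16
print(matrix[1, 1])       # 54
```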


## Statistical Functions
Numpy has many functions that can take a numpy array and return a statistical value. Things like sum, average, median, and standard deviation are built-in. You can often call these functions in two ways.

In [10]:
print(np.sum(a))
print(a.sum())
print(a.mean())
print(a.std())

108
108
12.0
7.745966692414834
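These functions can also work along one dimension of an array through the `axis` keyword. The sketch below uses the original 3x3 array (before the in-place multiplication) and also shows `ddof=1`, which switches the standard deviation from dividing by N to dividing by N-1.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

col_sums = a.sum(axis=0)      # collapse the rows: one sum per column
row_sums = a.sum(axis=1)      # collapse the columns: one sum per row
median = np.median(a)         # arrays have no .median() method; use np.median
sample_std = a.std(ddof=1)    # divide by N-1 instead of N

print(col_sums)  # [ 9 12 15]
print(row_sums)  # [ 3 12 21]
print(median)    # 4.0
print(sample_std)
```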


## Mathematical Functions
Most of the functions in the math module work only on single numbers. The same functions exist in numpy, where they operate on every element of an array at once.

In [11]:
x = np.arange(5)
print(x)
print(np.exp(x))
print(np.sqrt(x))
print(np.sin(x))

[0 1 2 3 4]
[ 1.          2.71828183  7.3890561  20.08553692 54.59815003]
[0.         1.         1.41421356 1.73205081 2.        ]
[ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
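To make the contrast with the `math` module concrete, here is a minimal sketch: `math.sqrt` refuses a multi-element array, while `np.sqrt` handles it in one call.

```python
import math
import numpy as np

x = np.arange(5)

math_failed = False
try:
    math.sqrt(x)              # math functions expect a single number
except TypeError as err:
    math_failed = True
    print("math.sqrt failed:", err)

roots = np.sqrt(x)            # numpy applies sqrt to every element
print(roots)
```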


## Indexing and Slicing
Numpy has very powerful indexing and slicing abilities. It has the same indexing abilities as a Python sequence, but it also lets you use boolean expressions to create new numpy arrays from old ones.

Here are some examples of standard indexing:

In [12]:
x = np.arange(10)**3
print(x)
print(x[3])
print(x[-1])
print(x[3:6])

[  0   1   8  27  64 125 216 343 512 729]
27
729
[ 27  64 125]
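Slices also accept a step, and one detail is worth knowing early: a basic slice is a view of the original array, not a copy. The sketch below illustrates both points.

```python
import numpy as np

x = np.arange(10)**3

print(x[::2])    # every second element: [  0   8  64 216 512]
print(x[::-1])   # the array reversed

window = x[3:6]        # a view: shares memory with x
window[0] = -1
print(x[3])            # -1, the original changed too

safe = x[3:6].copy()   # an independent copy
safe[0] = 999
print(x[3])            # still -1
```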


## Matrix Indexing
Remember that matrices are row-major.

In [13]:
a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(a)
print(a[1,2])

[[0 1 2]
 [3 4 5]
 [6 7 8]]
5
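The row-major convention is easier to see with a non-square array. In this sketch the array `image` and its size are made up; the point is that the shape is (rows, columns) and the index order is `[y, x]`.

```python
import numpy as np

# Hypothetical "image" with 2 rows (y) and 4 columns (x)
image = np.arange(8).reshape(2, 4)
ny, nx = image.shape

y, x = 1, 3
print(image.shape)   # (2, 4): rows first, then columns
print(image[y, x])   # 7
print(image[y])      # the whole row y: [4 5 6 7]
print(image[:, x])   # the whole column x: [3 7]
```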


How would I index the element that contains the number 8?

In [14]:
print(a[2][2])

8


## Here is the Fancy Indexing
You can give numpy arrays indices in any order, and they don't have to be contiguous. You can also use boolean expressions to make mask arrays that effectively let you create new numpy arrays from conditions placed on old ones.

In [15]:
idx = [5, 2, 1]
print(x[idx])
#A shorthand way
print(x[[5,2,1]])

[125   8   1]
[125   8   1]


In [16]:
maskidx = (x > 300)
print(maskidx)
print(x[maskidx])
#A shorthand way to do this
print(x[(x>300)])

[False False False False False False False  True  True  True]
[343 512 729]
[343 512 729]


In [17]:
maskidx = ((x > 50) & (x < 200)) #Logical AND
print(x[maskidx])
maskidx = ((x < 50) | (x > 200)) #Logical OR
print(x[maskidx])

[ 64 125]
[  0   1   8  27 216 343 512 729]
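A mask is itself an array, so it can be counted, inverted, or converted to positions. The following sketch reuses the cubes array from the indexing section.

```python
import numpy as np

x = np.arange(10)**3
mask = (x > 50) & (x < 200)   # parentheses are required around each comparison

print(mask.sum())             # True counts as 1, so this counts the matches: 2
print(x[~mask])               # ~ inverts the mask
print(np.where(mask)[0])      # integer positions of the matches: [4 5]
```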


## Now it is your turn
Use numpy to find the y values for `x = [56, 62, 84, 16, 57, 73, 84, 27, 93, 42, 33, 17, 30, 72, 57, 53, 41, 13, 36, 79]` given a line with a slope of 3 and a y-intercept of -2.

In [18]:
x = np.array([56, 62, 84, 16, 57, 73, 84, 27, 93, 42, 33, 17, 30, 72, 57, 53, 41, 13, 36, 79])
y = 3*x -2
y

array([166, 184, 250,  46, 169, 217, 250,  79, 277, 124,  97,  49,  88,
       214, 169, 157, 121,  37, 106, 235])

What is the average of the x array? What is the standard deviation?

In [19]:
print("x average: {:.2f}".format(np.mean(x)))
print("x std dev: {:.2f}".format(np.std(x)))

x average: 51.25
x std dev: 23.92


What is the average and standard deviation of only those elements greater than 20, but less than 50?

In [20]:
x1 = x[(x > 20) & (x < 50)]
print("x' average: {:.2f}".format(np.mean(x1)))
print("x' std dev: {:.2f}".format(np.std(x1)))

x' average: 34.83
x' std dev: 5.46


# Part 2: Astropy Units and Quantities

This portion is based on a notebook given at the Astropy Session at the American Astronomical Society.

The Astropy module includes a powerful framework for units that allows users to attach units to scalars and arrays.  These quantities can be manipulated or combined, keeping track of the units.

For more information about the features presented below, please see the
[astropy.units](http://docs.astropy.org/en/stable/units/index.html) docs.

## Representing units and quantities

Because we may want to use units in many expressions, it is easiest and
most concise to import the units module with:

In [21]:
import astropy.units as u

However, note that this will conflict with any variable called ``u``.

Units can then be accessed as u."unit", e.g.:

In [22]:
u.m

Unit("m")

Units have docstrings defining them:

In [23]:
u.m.__doc__

'meter: base unit of length in SI'

and a physical type:

In [24]:
u.m.physical_type

'length'

In [25]:
u.pc.__doc__

'parsec: approximately 3.26 light-years.'

In [26]:
u.s.physical_type

'time'

Please see the complete list of [available units](https://astropy.readthedocs.org/en/stable/units/index.html#module-astropy.units.si).

## Composite units

Composite units are created using Python numeric operators, e.g. "`*`" (multiplication), "`/`" (division), and "`**`" (power).

In [27]:
u.km / u.s

Unit("km / s")

In [28]:
u.imperial.mile / u.h

Unit("mi / h")

In [29]:
(u.eV * u.Mpc) / u.Gyr

Unit("eV Mpc / Gyr")

In [30]:
u.cm**3

Unit("cm3")

In [31]:
u.m / u.kg / u.s**2

Unit("m / (kg s2)")

## ``Quantity`` objects
The most useful feature of units is the ability to attach them to scalars or arrays, creating ``Quantity`` objects. The easiest way to create a `Quantity` object is simply by multiplying the value with its unit.

In [32]:
3. * u.au

<Quantity 3. AU>

A completely equivalent (but more verbose) way of doing the same thing is to use the `Quantity` object's initializer, demonstrated below.  In general, the simpler form (above) is preferred, as it is closer to how such a quantity would actually be written in text.  The initializer form has more options, though, which you can learn about from the [astropy reference documentation on Quantity](http://docs.astropy.org/en/stable/api/astropy.units.quantity.Quantity.html).

In [33]:
u.Quantity(3, unit=u.au)

<Quantity 3. AU>

We can also generate a ``Quantity`` array:

In [34]:
import numpy as np
x = np.array([1.2, 2.2, 1.7]) * u.pc / u.year
x

<Quantity [1.2, 2.2, 1.7] pc / yr>

In [35]:
x * 3

<Quantity [3.6, 6.6, 5.1] pc / yr>

## `Quantity` attributes

The units and value of a `Quantity` can be accessed separately via the ``value`` and ``unit`` attributes:

In [36]:
q = 5. * u.Mpc

In [37]:
q.value

5.0

In [38]:
q.unit

Unit("Mpc")

In [39]:
x = np.array([1.2, 2.2, 1.7]) * u.pc / u.year
print(x.value)
print(x.unit)

[1.2 2.2 1.7]
pc / yr


## Combining Quantities

Quantities can also be combined using Python numeric operators:

In [40]:
q1 = 3. * u.m / u.s
q1

<Quantity 3. m / s>

In [41]:
q2 = 5. * u.cm / u.s / u.g**2
q2

<Quantity 5. cm / (g2 s)>

In [42]:
q1 * q2

<Quantity 15. cm m / (g2 s2)>

In [43]:
q1 / q2

<Quantity 0.6 g2 m / cm>

In [44]:
q1 ** 2

<Quantity 9. m2 / s2>

Addition and subtraction require compatible unit types:

In [45]:
q1 = 3 * u.m
q1 + (5 * u.m)

<Quantity 8. m>

In [46]:
q1 + (5. * u.kpc)

<Quantity 1.54283879e+20 m>

In [47]:
q1 + (10. * u.km)

<Quantity 10003. m>

## Converting units

In [48]:
(2.5 * u.year).to(u.s)

<Quantity 78894000. s>

In [49]:
(7. * u.deg**2).to(u.sr)

<Quantity 0.00213232 sr>

In [50]:
q1.to(u.au)

<Quantity 2.00537614e-11 AU>

In [51]:
(55. * u.imperial.mile / u.h).to(u.km / u.h)

<Quantity 88.51392 km / h>

In [52]:
q1 * q2

<Quantity 15. cm m / (g2 s)>

In [53]:
(q1 * q2).to(u.m**2 / u.kg**2 / u.s)

<Quantity 150000. m2 / (kg2 s)>

## Decomposing units

The units of a quantity can be decomposed into a set of base units using the
``decompose()`` method. By default, units will be decomposed to S.I.:

In [54]:
q = 3. * u.cm * u.pc / u.g / u.year**2
q

<Quantity 3. cm pc / (g yr2)>

In [55]:
q.decompose()

<Quantity 929.53097353 m2 / (kg s2)>

To decompose into c.g.s. bases:

In [56]:
q.decompose(u.cgs.bases)

<Quantity 9295.30973535 cm2 / (g s2)>

In [57]:
u.cgs.bases

{Unit("K"),
 Unit("cd"),
 Unit("cm"),
 Unit("g"),
 Unit("mol"),
 Unit("rad"),
 Unit("s")}

In [58]:
u.si.bases

{Unit("A"),
 Unit("K"),
 Unit("cd"),
 Unit("kg"),
 Unit("m"),
 Unit("mol"),
 Unit("rad"),
 Unit("s")}

## Using physical constants

The [astropy.constants](http://docs.astropy.org/en/stable/constants/index.html) module contains physical constants relevant for astronomy.  They are defined as ``Quantity`` objects using the ``astropy.units`` framework.

In [59]:
from astropy.constants import G, c, R_earth
G

<<class 'astropy.constants.codata2018.CODATA2018'> name='Gravitational constant' value=6.6743e-11 uncertainty=1.5e-15 unit='m3 / (kg s2)' reference='CODATA 2018'>

In [60]:
c

<<class 'astropy.constants.codata2018.CODATA2018'> name='Speed of light in vacuum' value=299792458.0 uncertainty=0.0 unit='m / s' reference='CODATA 2018'>

In [61]:
R_earth

<<class 'astropy.constants.iau2015.IAU2015'> name='Nominal Earth equatorial radius' value=6378100.0 uncertainty=0.0 unit='m' reference='IAU 2015 Resolution B 3'>

Constants are Quantities, so they can be converted to other units:

In [62]:
R_earth.to(u.km)

<Quantity 6378.1 km>

Please see the complete list of [available physical constants](http://docs.astropy.org/en/stable/constants/index.html#module-astropy.constants).  Additions are welcome!

## An example

Kepler's law can be used to estimate the (circular) orbital speed of the Earth around the sun using:

$$v = \sqrt{\frac{G M_{\odot}}{r}}$$

In [63]:
v = np.sqrt(G * 1 * u.M_sun / (1 * u.au))
v

<Quantity 8.16963891e-06 m(3/2) solMass(1/2) / (AU(1/2) kg(1/2) s)>

In [64]:
v.decompose()

<Quantity 29784.69182968 m / s>

In [65]:
v.to(u.km / u.s)

<Quantity 29.78469183 km / s>

## Exercise 1

The *James Webb Space Telescope (JWST)* will be located at the second Sun-Earth Lagrange (L2) point.  L2 is located opposite the Sun at a distance from the Earth of approximately:

$$ r \approx R \left(\frac{M_{earth}}{3 M_{sun}}\right) ^{(1/3)} $$

where $R$ is the Sun-Earth distance. ![](l2graphic.jpg)

Calculate the Earth-L2 distance in kilometers and miles:

* *Hint*:  the mile unit is defined as ``u.imperial.mile`` (see [imperial units](http://docs.astropy.org/en/v0.2.1/units/index.html#module-astropy.units.imperial))

In [67]:
from astropy.constants import M_earth, M_sun, R_sun, R_earth, G
r =  u.AU *(M_earth/(3*M_sun) )**(1/3)
print("{:.2f}".format(r.to(u.km)))
print("{:.2f}".format(r.to(u.imperial.mile)))

1496558.48 km
929918.33 mi


## Exercise 2

The L2 point is about 1.5 million kilometers away from the Earth opposite the Sun.
The total mass of the *James Webb Space Telescope (JWST)* is about 6500 kg.

Using the value you obtained above for the Earth-L2 distance, calculate the gravitational force in Newtons between 

* *JWST* (at L2) and the Earth
* *JWST* (at L2) and the Sun

*Hint*: the gravitational force between two masses separated by a distance *r* is:

$$ F_g = \frac{G m_1 m_2}{r^2} $$

In [68]:
m1 = 6500 * u.kg
F_earth = G * m1*M_earth /(r**2)
F_sun = G*m1*M_sun/((1*u.AU + R_sun + 2*R_earth + r)**2)
print("{:.2f}".format(F_earth.to(u.N)))
print("{:.2f}".format(F_sun.to(u.N)))

1.16 N
37.43 N


## Equivalencies

Equivalencies can be used to convert quantities that are not strictly the same physical type.

In [69]:
# this raises an error
#(7. * u.cm).to(u.GHz)

In [70]:
# we need to use an equivalency (here spectral)
(7. * u.cm).to(u.GHz, equivalencies=u.spectral())

<Quantity 4.2827494 GHz>

In [71]:
(13.6 * u.eV).to(u.Angstrom, equivalencies=u.spectral())

<Quantity 911.64851789 Angstrom>

In [72]:
u.eV.find_equivalent_units()

Primary name,Unit definition,Aliases
J,kg m2 / s2,"Joule, joule"
Ry,2.17987e-18 kg m2 / s2,rydberg
eV,1.60218e-19 kg m2 / s2,electronvolt
erg,1e-07 kg m2 / s2,


In [73]:
u.eV.find_equivalent_units(equivalencies=u.spectral())

Primary name,Unit definition,Aliases
Bq,1 / s,becquerel
Ci,3.7e+10 / s,curie
Hz,1 / s,"Hertz, hertz"
J,kg m2 / s2,"Joule, joule"
Ry,2.17987e-18 kg m2 / s2,rydberg
eV,1.60218e-19 kg m2 / s2,electronvolt
erg,1e-07 kg m2 / s2,
k,100 / m,"Kayser, kayser"
m,irreducible,meter


For spectral density equivalencies, it is necessary to supply the location in the spectrum where the conversion is done:

In [74]:
q = (1e-18 * u.erg / u.s / u.cm**2 / u.AA)
q

<Quantity 1.e-18 erg / (Angstrom cm2 s)>

In [75]:
q.to(u.uJy, equivalencies=u.spectral_density(1. * u.um))

<Quantity 3.33564095 uJy>

## Integration with Numpy ufuncs

Most of the [Numpy](http://www.numpy.org) functions understand `Quantity` objects:

In [76]:
np.sin(30 * u.degree)

<Quantity 0.5>

In [77]:
q = 100 * u.km * u.km
q

<Quantity 100. km2>

In [78]:
np.sqrt(q)

<Quantity 10. km>

In [79]:
np.exp(3 * u.m / (3 * u.km))

<Quantity 1.0010005>

Care needs to be taken with dimensionless units.  Passing dimensionless values to an inverse trigonometric function gives a result without units:

In [80]:
np.arcsin(1.0)

1.5707963267948966

`u.dimensionless_unscaled` creates a ``Quantity`` with a "dimensionless unit" and therefore gives a result *with* units:

In [81]:
np.arcsin(1.0 * u.dimensionless_unscaled)

<Quantity 1.57079633 rad>

In [82]:
np.arcsin(1.0 * u.dimensionless_unscaled).to(u.degree)

<Quantity 90. deg>

## Known issues

Quantities lose their units with some Numpy operations, e.g.:

* np.dot
* np.hstack
* np.vstack
* np.where
* np.choose
* np.vectorize

See [Quantity Known Issues](http://docs.astropy.org/en/stable/known_issues.html#quantities-lose-their-units-with-some-operations) for more details.

## Defining new units

You can also define custom units for something that isn't built-in to astropy.

In [83]:
# fundamental unit
chuckle = u.def_unit('chuckle')

In [84]:
# compound unit
laugh = u.def_unit('laugh', 4 * chuckle)

In [85]:
(3 * laugh).to(chuckle)

<Quantity 12. chuckle>

In [86]:
bakers_fortnight = u.def_unit('bakers_fortnight', 13 * u.day)

In [87]:
(3 * bakers_fortnight).to(u.s)

<Quantity 3369600. s>

# Part 3: Working with Data Files
In many kinds of research, we need to be able to read in and manipulate data. In this section we will look at different ways of reading in, writing out, and manipulating data files.

## The Standard Method
Previously we learned how to read in and write out a standard ASCII file. In general we want to put data into numpy arrays so that we can work with it. With a standard read, that requires a lot of work on our part. When you build a new numpy array from a file, it is computationally cheaper to collect the values in a list first and then convert the list to a numpy array.

In [88]:
#Get my imports dealt with
import numpy as np
import astropy.units as u

In [89]:
try:
    infile = open('Homework2_data/hip_tiny.csv','r')
except IOError:
    print("File hip_tiny.csv could not be opened!")

#Define lists
name_list = list()
vmag_list = list()
    
for line in infile:
    #Check for header lines that begin with a # or lines that are entirely blank
    if line.startswith("#") or line.isspace():
        print(line) #Print the header
        continue
    llist = line.split(',')
    name_list.append(llist[0])        #It is okay if the name is a string
    vmag_list.append(float(llist[5])) #Remember Vmag should be a float

infile.close()

#Now convert to numpy arrays
name_arr = np.array(name_list)
vmag_arr = np.array(vmag_list)
print(name_arr)
print(vmag_arr)

#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),B (mag),V (mag),err_Plx,err_B,err_V

['7' '25' '34' '38' '47' '50' '54' '55' '57']
[ 9.679  6.375  6.491  8.723 10.909  6.579 10.679  7.381  8.353]
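The same loop can be written with a `with` block, which closes the file automatically even if an error occurs partway through. In this sketch an `io.StringIO` object stands in for the open file so that the example runs on its own; it holds only the header and the first three rows of the table.

```python
import io
import numpy as np

# Stand-in for open('Homework2_data/hip_tiny.csv', 'r'): first three rows only
sample = io.StringIO(
    "#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),"
    "B (mag),V (mag),err_Plx,err_B,err_V\n"
    "7,0.02254891,20.03660216,17.74,10.542,9.679,1.3,0.039,0.03\n"
    "25,0.07936537,-44.29029741,13.74,7.238,6.375,0.98,0.004,0.003\n"
    "34,0.09946973,26.91823821,12.71,7.04,6.491,0.74,0.004,0.004\n"
)

name_list = []
vmag_list = []
with sample as infile:                 # closed automatically after the block
    for line in infile:
        if line.startswith("#") or line.isspace():
            continue                   # skip header and blank lines
        fields = line.strip().split(",")
        name_list.append(fields[0])
        vmag_list.append(float(fields[5]))

name_arr = np.array(name_list)
vmag_arr = np.array(vmag_list)
print(name_arr)
print(vmag_arr)
```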


Create numpy arrays from hip_small.csv for RA and DEC called ra_arr and dec_arr. Print the Median of each.

In [90]:
try:
    infile = open('Homework2_data/hip_tiny.csv','r')
except IOError:   
    print("File hip_tiny.csv could not be opened!")

#Define lists
ra_list = list()
dec_list = list()
    
for line in infile:
    #Skip header lines that begin with a # and lines that are entirely blank
    if line.startswith("#") or line.isspace():continue
    llist = line.split(',')
    ra_list.append(float(llist[1]))  #RA should be a float
    dec_list.append(float(llist[2])) #Dec should be a float

infile.close()

#Now convert to numpy arrays
ra_arr = np.array(ra_list)
dec_arr = np.array(dec_list)
print(np.median(ra_arr))
print(np.median(dec_arr))

0.13519236
-53.09766277


## The Numpy way
Numpy has two built-in functions for reading data files: `np.loadtxt()` and `np.genfromtxt()`. Both work in much the same way, but `np.genfromtxt()` can handle missing data, so that is the one I generally use. These functions have several keywords for handling the data. The default column delimiter is whitespace, but we need to use commas, so we will set the *delimiter* keyword. Note the default data type is **float**.

In [91]:
#Load all the data into a 2-d array
data_2darr = np.genfromtxt('Homework2_data/hip_tiny.csv',delimiter=',')
print(data_2darr)

[[ 7.00000000e+00  2.25489100e-02  2.00366022e+01  1.77400000e+01
   1.05420000e+01  9.67900000e+00  1.30000000e+00  3.90000000e-02
   3.00000000e-02]
 [ 2.50000000e+01  7.93653700e-02 -4.42902974e+01  1.37400000e+01
   7.23800000e+00  6.37500000e+00  9.80000000e-01  4.00000000e-03
   3.00000000e-03]
 [ 3.40000000e+01  9.94697300e-02  2.69182382e+01  1.27100000e+01
   7.04000000e+00  6.49100000e+00  7.40000000e-01  4.00000000e-03
   4.00000000e-03]
 [ 3.80000000e+01  1.11046940e-01 -7.90618313e+01  2.38400000e+01
   9.61400000e+00  8.72300000e+00  7.80000000e-01  1.60000000e-02
   1.20000000e-02]
 [ 4.70000000e+01  1.35192360e-01 -5.68352477e+01  2.44500000e+01
   1.21250000e+01  1.09090000e+01  1.97000000e+00  1.27000000e-01
   7.20000000e-02]
 [ 5.00000000e+01  1.42870590e-01 -5.30976628e+01  1.68900000e+01
   7.25600000e+00  6.57900000e+00  8.00000000e-01  4.00000000e-03
   3.00000000e-03]
 [ 5.40000000e+01  1.51655580e-01  1.79689558e+01  2.09700000e+01
   1.16850000e+01  1.0679000

Oftentimes it is easier to work with a series of 1-d arrays, instead of one big 2-d array. We can work with 1-d arrays by setting the *unpack* keyword to True. We can also specify which columns we want using the *usecols* keyword.

In [92]:
name_arr, vmag_arr = np.genfromtxt('Homework2_data/hip_tiny.csv',delimiter=',',usecols=(0,5),unpack=True)
print(name_arr)
print(vmag_arr)

[ 7. 25. 34. 38. 47. 50. 54. 55. 57.]
[ 9.679  6.375  6.491  8.723 10.909  6.579 10.679  7.381  8.353]
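To see what "handles missing data" means, the sketch below reads a small made-up table in which one V magnitude is absent. The values and the short column names are invented for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io
import numpy as np

# Made-up table: the V magnitude of the second star is missing
text = (
    "#HIP,Ra,V\n"
    "7,0.0225,9.679\n"
    "25,0.0794,\n"
    "34,0.0995,6.491\n"
)

hip, vmag = np.genfromtxt(io.StringIO(text), delimiter=",",
                          usecols=(0, 2), unpack=True)
print(vmag)                 # the missing entry is read as nan
print(np.nanmean(vmag))     # nan-aware mean of the remaining values
```

Functions such as `np.nanmean` and `np.nanmedian` ignore the `nan` entries, whereas `np.mean` would return `nan`.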


Create a numpy array using `np.genfromtxt` from hip_small.csv for Plx called plx_arr. Print the minimum value.

In [119]:
plx_arr = np.genfromtxt('Homework2_data/hip_small.csv',delimiter=',',usecols=(4),unpack=True)
print(np.min(plx_arr))

0.458


If you want to use a 2d array for your data, you can go further and assign names to each column. This way you can access the data by name instead of by indexing.

In [94]:
data_2darr = np.genfromtxt('Homework2_data/hip_tiny.csv',delimiter=',',names=True)
print(data_2darr.dtype.names)
print(data_2darr)

('HIP_Name', 'Ra_Degrees', 'Dec_Degrees', 'Plx_milliarcsec', 'B_mag', 'V_mag', 'err_Plx', 'err_B', 'err_V')
[( 7., 0.02254891,  20.03660216, 17.74, 10.542,  9.679, 1.3 , 0.039, 0.03 )
 (25., 0.07936537, -44.29029741, 13.74,  7.238,  6.375, 0.98, 0.004, 0.003)
 (34., 0.09946973,  26.91823821, 12.71,  7.04 ,  6.491, 0.74, 0.004, 0.004)
 (38., 0.11104694, -79.06183133, 23.84,  9.614,  8.723, 0.78, 0.016, 0.012)
 (47., 0.13519236, -56.83524773, 24.45, 12.125, 10.909, 1.97, 0.127, 0.072)
 (50., 0.14287059, -53.09766277, 16.89,  7.256,  6.579, 0.8 , 0.004, 0.003)
 (54., 0.15165558,  17.96895579, 20.97, 11.685, 10.679, 1.71, 0.102, 0.065)
 (55., 0.15783323, -66.68310336, 14.66,  7.946,  7.381, 0.98, 0.016, 0.015)
 (57., 0.16828557, -69.67580068, 33.89,  9.353,  8.353, 0.79, 0.013, 0.009)]


In [95]:
print(data_2darr['Ra_Degrees'])

[0.02254891 0.07936537 0.09946973 0.11104694 0.13519236 0.14287059
 0.15165558 0.15783323 0.16828557]
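Named columns combine naturally with boolean masks. The sketch below reads three rows with a shortened header from an in-memory string (an assumption made so the example runs without the data file) and selects rows by name. Note how numpy cleans the header text into valid field names.

```python
import io
import numpy as np

text = (
    "#HIP (Name),Ra (Degrees),V (mag)\n"
    "7,0.02254891,9.679\n"
    "25,0.07936537,6.375\n"
    "34,0.09946973,6.491\n"
)

data = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)
print(data.dtype.names)             # cleaned-up column names

bright = data[data["V_mag"] < 7]    # rows with V magnitude below 7
print(bright["HIP_Name"])
```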


## Astropy Tables
**astropy.table** is a module of astropy that provides a new object type called Table. Table objects are very useful for working with large amounts of data with many columns. For instance, we can do all of the above with astropy Tables, and they can read from more than just text files. For more information see [the Astropy documentation for table module](http://docs.astropy.org/en/stable/table/).

In [96]:
from astropy.table import Table #Import in the Astropy object

The `Table.read()` method is a very easy way to read in information. It also automatically populates headers.

In [97]:
mytable = Table.read('Homework2_data/hip_smaller.csv')
print(mytable) #Tables are smart enough to show you only the first and last few columns

#HIP (Name) Ra (Degrees) Dec (Degrees) Plx (milliarcsec) ... err_Plx err_B err_V
----------- ------------ ------------- ----------------- ... ------- ----- -----
          7   0.02254891   20.03660216             17.74 ...     1.3 0.039  0.03
         25   0.07936537  -44.29029741             13.74 ...    0.98 0.004 0.003
         34   0.09946973   26.91823821             12.71 ...    0.74 0.004 0.004
         38   0.11104694  -79.06183133             23.84 ...    0.78 0.016 0.012
         47   0.13519236  -56.83524773             24.45 ...    1.97 0.127 0.072
         50   0.14287059  -53.09766277             16.89 ...     0.8 0.004 0.003
         54   0.15165558   17.96895579             20.97 ...    1.71 0.102 0.065
         55   0.15783323  -66.68310336             14.66 ...    0.98 0.016 0.015
         57   0.16828557  -69.67580068             33.89 ...    0.79 0.013 0.009
         58   0.17376341   62.17600484             26.06 ...    0.67 0.006 0.005
        ...          ...    

For smaller datasets, you can have direct data access to search and page through the data. **Be Careful: Large datasets can overwhelm your notebook kernel!**

In [1]:
#mytable.show_in_notebook() #Only use for relatively small tables

## Reading Data
Astropy tables can read/write many different formats: http://docs.astropy.org/en/stable/io/unified.html#built-in-table-readers-writers. Sometimes, though, it needs help.

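We can take a quick look at a file with the Linux command `head`, which shows only the first ten lines of a file. In a notebook, a leading `!` passes the command to the system shell. Note that `head` is not available in the Windows command prompt, which is why the cell below reports an error.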

In [99]:
#Show the contents of hip_prob.txt
!head hip_prob.txt

'head' is not recognized as an internal or external command,
operable program or batch file.


In [100]:
#prob_tab = Table.read('Homework2_data/hip_prob.txt')

In [101]:
#Let's give it some help and suggest a format
prob_tab = Table.read('Homework2_data/hip_prob.txt',format='ascii')
print(prob_tab)

HIP(Name) Ra(Degrees) Dec(Degrees) Plx(milliarcsec) ... err_Plx err_B err_V
--------- ----------- ------------ ---------------- ... ------- ----- -----
        7  0.02254891  20.03660216            17.74 ...     1.3 0.039  0.03
       25  0.07936537 -44.29029741            13.74 ...    0.98 0.004 0.003
       34  0.09946973  26.91823821            12.71 ...    0.74 0.004 0.004
       38  0.11104694 -79.06183133            23.84 ...    0.78 0.016 0.012
       47  0.13519236 -56.83524773            24.45 ...    1.97 0.127 0.072
       50  0.14287059 -53.09766277            16.89 ...     0.8 0.004 0.003
       54  0.15165558  17.96895579            20.97 ...    1.71 0.102 0.065
       55  0.15783323 -66.68310336            14.66 ...    0.98 0.016 0.015
       57  0.16828557 -69.67580068            33.89 ...    0.79 0.013 0.009


**Additional note:** Be sure that the number of header columns matches your data. Also no two column names can repeat or it will not read, and the error messages will be **unhelpful**!

## Accessing data in an Astropy Table
Let's learn how to get useful information about our table.

In [102]:
#Get basic info about our table including how long it is and column names
mytable.info()

<Table length=500>
       name        dtype 
----------------- -------
      #HIP (Name)   int32
     Ra (Degrees) float64
    Dec (Degrees) float64
Plx (milliarcsec) float64
          B (mag) float64
          V (mag) float64
          err_Plx float64
            err_B float64
            err_V float64


In [103]:
#Get statistical information about each column
mytable.info('stats')

<Table length=500>
       name              mean               std             min          max    
----------------- ----------------- ------------------- ------------ -----------
      #HIP (Name)          2071.922  1182.2975885605113            7        4005
     Ra (Degrees) 6.575637460659999   3.803082310082671   0.02254891 12.85403677
    Dec (Degrees)     -7.0696758856  41.184198565035096 -85.89152517 86.78789828
Plx (milliarcsec)          24.09856   21.74627579900522         12.5      280.27
          B (mag)          8.825816  1.7860826481840082        2.003      13.066
          V (mag)          8.040028  1.6099407228888898         2.05      11.817
          err_Plx           1.15238  0.8222708407817948         0.47       17.24
            err_B          0.028122 0.04979758142721391        0.002       0.508
            err_V          0.017642 0.02414162040957483        0.002       0.292


Let's access a single column. The columns of an astropy table are similar to numpy arrays, but they have a column name associated with them. You can transform the columns back into normal numpy arrays using `np.array()`.

In [104]:
#Access one column
b_col = mytable['B (mag)']
print(b_col[1:3])
#Do math with two columns
bminusv_col = mytable['B (mag)'] - mytable['V (mag)']
print(bminusv_col[1:3])
#Convert to a numpy array
bminusv_array = np.array(bminusv_col)
print("Now in array form:")
print(bminusv_array[1:3])

B (mag)
-------
  7.238
   7.04
     B (mag)      
------------------
0.8630000000000004
0.5490000000000004
Now in array form:
[0.863 0.549]


You can also access individual rows or a list of row indices:

In [105]:
bmag = mytable['B (mag)'][1]
print(bmag)
bmag_col = mytable['B (mag)'][[1,3,6]]
print(bmag_col)

7.238
B (mag)
-------
  7.238
  9.614
 11.685


The real power of an astropy table is that you can use the results in one column to select values in another column

In [106]:
#Create a table with only stars that have err_Plx < 1
new_tab = mytable[(mytable['err_Plx'] < 1)]
new_tab

#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),B (mag),V (mag),err_Plx,err_B,err_V
int32,float64,float64,float64,float64,float64,float64,float64,float64
25,0.07936537,-44.29029741,13.74,7.238,6.375,0.98,0.004,0.003
34,0.09946973,26.91823821,12.71,7.04,6.491,0.74,0.004,0.004
38,0.11104694,-79.06183133,23.84,9.614,8.723,0.78,0.016,0.012
50,0.14287059,-53.09766277,16.89,7.256,6.579,0.8,0.004,0.003
55,0.15783323,-66.68310336,14.66,7.946,7.381,0.98,0.016,0.015
57,0.16828557,-69.67580068,33.89,9.353,8.353,0.79,0.013,0.009
58,0.17376341,62.17600484,26.06,7.665,7.11,0.67,0.006,0.005
122,0.39937928,-77.06529438,14.77,6.43,4.928,0.47,0.003,0.002
128,0.41376473,73.61187664,15.2,6.74,6.521,0.63,0.004,0.004
142,0.45374866,66.30600204,16.04,8.222,7.414,0.69,0.007,0.007


You can also do complex selection using bitwise and (`&`) or bitwise or (`|`)

In [107]:
#Select stars with Error in Parallax less than 1 and V Magnitude > 7
new_tab2 = mytable[(mytable['err_Plx'] < 1) & (mytable['V (mag)'] > 7)]
new_tab2

#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),B (mag),V (mag),err_Plx,err_B,err_V
int32,float64,float64,float64,float64,float64,float64,float64,float64
38,0.11104694,-79.06183133,23.84,9.614,8.723,0.78,0.016,0.012
55,0.15783323,-66.68310336,14.66,7.946,7.381,0.98,0.016,0.015
57,0.16828557,-69.67580068,33.89,9.353,8.353,0.79,0.013,0.009
58,0.17376341,62.17600484,26.06,7.665,7.11,0.67,0.006,0.005
142,0.45374866,66.30600204,16.04,8.222,7.414,0.69,0.007,0.007
276,0.85791177,20.66591788,15.55,8.237,7.555,0.95,0.01,0.008
277,0.85927886,-36.25120504,15.84,7.347,7.007,0.81,0.004,0.004
293,0.91589999,-57.16443914,13.14,8.867,8.359,0.97,0.01,0.009
305,0.97206281,-28.3937693,20.44,8.56,7.738,0.96,0.01,0.008
308,0.98353978,14.37891704,13.9,7.445,7.126,0.84,0.005,0.006


You can also access individual columns

In [108]:
name_col = mytable['#HIP (Name)'][(mytable['err_Plx'] < 1) & (mytable['V (mag)'] > 7)]
print(name_col[0:5])

#HIP (Name)
-----------
         38
         55
         57
         58
        142


## Your Turn
Print only those stars that have an RA less than 5 and a parallax less than 20.

In [109]:
name_col = mytable['#HIP (Name)'][(mytable['Ra (Degrees)'] < 5) & (mytable['Plx (milliarcsec)'] < 20)]
name_col

0
7
25
34
50
55
80
84
93
122
128


## Modifying a table
You can modify a table the same way you modify a numpy array, by assigning new values to an index or a list of indices.

In [110]:
change_tab = mytable
change_tab['#HIP (Name)'][[0,1,2,3,5,6]] = [1000,19,156,208,11,16453]
change_tab

#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),B (mag),V (mag),err_Plx,err_B,err_V
int32,float64,float64,float64,float64,float64,float64,float64,float64
1000,0.02254891,20.03660216,17.74,10.542,9.679,1.3,0.039,0.03
19,0.07936537,-44.29029741,13.74,7.238,6.375,0.98,0.004,0.003
156,0.09946973,26.91823821,12.71,7.04,6.491,0.74,0.004,0.004
208,0.11104694,-79.06183133,23.84,9.614,8.723,0.78,0.016,0.012
47,0.13519236,-56.83524773,24.45,12.125,10.909,1.97,0.127,0.072
11,0.14287059,-53.09766277,16.89,7.256,6.579,0.8,0.004,0.003
16453,0.15165558,17.96895579,20.97,11.685,10.679,1.71,0.102,0.065
55,0.15783323,-66.68310336,14.66,7.946,7.381,0.98,0.016,0.015
57,0.16828557,-69.67580068,33.89,9.353,8.353,0.79,0.013,0.009
58,0.17376341,62.17600484,26.06,7.665,7.11,0.67,0.006,0.005


You can also add new columns, rename, or remove old ones. Just make sure that the new column has exactly the same length as the table. You can also use units with your table.

In [111]:
new_column1 = np.arange(len(mytable))
new_column2 = mytable['Plx (milliarcsec)'] / 1000.0
change_tab['index'] = new_column1
change_tab['Plx'] = new_column2*u.arcsec
change_tab.rename_column('err_B','error_B')
change_tab.remove_column('err_V')
change_tab #Note the new row that gives the unit

#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),B (mag),V (mag),err_Plx,error_B,index,Plx
,,,,,,,,,arcsec
int32,float64,float64,float64,float64,float64,float64,float64,int32,float64
1000,0.02254891,20.03660216,17.74,10.542,9.679,1.3,0.039,0,0.01774
19,0.07936537,-44.29029741,13.74,7.238,6.375,0.98,0.004,1,0.01374
156,0.09946973,26.91823821,12.71,7.04,6.491,0.74,0.004,2,0.01271
208,0.11104694,-79.06183133,23.84,9.614,8.723,0.78,0.016,3,0.02384
47,0.13519236,-56.83524773,24.45,12.125,10.909,1.97,0.127,4,0.02445
11,0.14287059,-53.09766277,16.89,7.256,6.579,0.8,0.004,5,0.016890000000000002
16453,0.15165558,17.96895579,20.97,11.685,10.679,1.71,0.102,6,0.02097
55,0.15783323,-66.68310336,14.66,7.946,7.381,0.98,0.016,7,0.01466
57,0.16828557,-69.67580068,33.89,9.353,8.353,0.79,0.013,8,0.03389
...,...,...,...,...,...,...,...,...,...


## Your Turn
Add a column to `change_tab` called `new_err_plx` that holds the error in parallax, with both values and units in arcseconds.

In [112]:
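# NOTE: err_Plx is stored in milliarcsec, so multiplying by u.arcsec only relabels the values.
# Dividing by 1000.0 first (as done for 'Plx' above) would give values in arcsec.
change_tab['new_err_plx'] = change_tab['err_Plx']*u.arcsec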
change_tab

#HIP (Name),Ra (Degrees),Dec (Degrees),Plx (milliarcsec),B (mag),V (mag),err_Plx,error_B,index,Plx,new_err_plx
,,,,,,,,,arcsec,arcsec
int32,float64,float64,float64,float64,float64,float64,float64,int32,float64,float64
1000,0.02254891,20.03660216,17.74,10.542,9.679,1.3,0.039,0,0.01774,1.3
19,0.07936537,-44.29029741,13.74,7.238,6.375,0.98,0.004,1,0.01374,0.98
156,0.09946973,26.91823821,12.71,7.04,6.491,0.74,0.004,2,0.01271,0.74
208,0.11104694,-79.06183133,23.84,9.614,8.723,0.78,0.016,3,0.02384,0.78
47,0.13519236,-56.83524773,24.45,12.125,10.909,1.97,0.127,4,0.02445,1.97
11,0.14287059,-53.09766277,16.89,7.256,6.579,0.8,0.004,5,0.016890000000000002,0.8
16453,0.15165558,17.96895579,20.97,11.685,10.679,1.71,0.102,6,0.02097,1.71
55,0.15783323,-66.68310336,14.66,7.946,7.381,0.98,0.016,7,0.01466,0.98
57,0.16828557,-69.67580068,33.89,9.353,8.353,0.79,0.013,8,0.03389,0.79
...,...,...,...,...,...,...,...,...,...,...


## Creating a table from scratch
Often you will want to save a new table based on your work. Remember that tables can have units too. Adding new columns works just like adding keys to a dictionary.

In [113]:
new_tab = Table()
new_tab['Name'] = np.arange(10)
new_tab['Distance'] = new_tab['Name'] * 10 * u.km
new_tab['Distance2'] = new_tab['Distance'].to(u.m)
print(new_tab)

Name Distance Distance2
        km        m    
---- -------- ---------
   0      0.0       0.0
   1     10.0   10000.0
   2     20.0   20000.0
   3     30.0   30000.0
   4     40.0   40000.0
   5     50.0   50000.0
   6     60.0   60000.0
   7     70.0   70000.0
   8     80.0   80000.0
   9     90.0   90000.0


## Writing out a Table
Once you have a table, you can write it out in any of the formats listed at http://docs.astropy.org/en/stable/io/unified.html#built-in-table-readers-writers. Let's write out the same table as a comma-separated file and as a pipe (`|`) separated file.

In [114]:
new_tab.write('Homework2_data/distance.csv')
new_tab.write('Homework2_data/distance.txt',format='ascii',delimiter='|')



In [115]:
# `!head` is not available in the Windows shell, so preview the first 10 lines from Python
with open('Homework2_data/distance.csv') as f:
    print(''.join(f.readlines()[:10]), end='')


In [116]:
# `!head` is not available in the Windows shell, so preview the first 10 lines from Python
with open('Homework2_data/distance.txt') as f:
    print(''.join(f.readlines()[:10]), end='')


When we write to plain ASCII we lose the units. We can use one of the enhanced file types, like IPAC, to store our information together with its metadata. If you are going to overwrite a file, you need to set `overwrite=True`.

In [117]:
new_tab.write('Homework2_data/distance.txt',format='ascii.ipac',overwrite=True)
# `!head` is not available in the Windows shell, so preview the first 10 lines from Python
with open('Homework2_data/distance.txt') as f:
    print(''.join(f.readlines()[:10]), end='')


Finally, we can also store files in binary format. These cannot be read by `head`. The most common binary format in astronomy is the FITS format. We will discuss it more later. To write and read a FITS table, use the commands in the next cell.

In [120]:
#new_tab.write('distance.fits') #If you need to overwrite put in overwrite=True
#new2_tab = Table.read('distance.fits')
#print(new2_tab) #Note the units survive

## Now it is your turn
Read in `hip_smaller.csv` and create an astropy table called `hip_tab`. Then give the columns that have units in their names the correct units, and remove the units from the column names. Show your results.

In [121]:
hip_tab = Table.read('Homework2_data/hip_small.csv',format='ascii')
hip_tab.rename_columns(('Ra (Degrees)', 'Dec (Degrees)', 'Plx (milliarcsec)', 'B (mag)', 'V (mag)'), 
                       new_names=('Ra', 'Dec', 'Plx', 'B', 'V'))

In [122]:
hip_tab['Ra'] *= u.deg
hip_tab['Dec'] *= u.deg
hip_tab['Plx'] *= u.marcsec
hip_tab['B'] *= u.dimensionless_unscaled
hip_tab['V'] *= u.dimensionless_unscaled
hip_tab['err_Plx'] *= u.marcsec
hip_tab['err_B'] *= u.dimensionless_unscaled
hip_tab['err_V'] *= u.dimensionless_unscaled
hip_tab
hip_tab.colnames

['HIP (Name)', 'Ra', 'Dec', 'Plx', 'B', 'V', 'err_Plx', 'err_B', 'err_V']

Using `hip_tab`, create a new table called `north_tab` containing the stars with Dec greater than 0 degrees and Plx between 10 and 50 marcsec. Then convert `Plx` and `err_Plx` to units of arcseconds.

In [123]:
north_tab = hip_tab[(hip_tab['Dec'] > 0*u.deg) & (10*u.marcsec < hip_tab['Plx']) & (hip_tab['Plx'] < 50*u.marcsec)]
north_tab.replace_column('Plx',north_tab['Plx'].to(u.arcsec))
north_tab.replace_column('err_Plx',north_tab['err_Plx'].to(u.arcsec))

north_tab

HIP (Name),Ra,Dec,Plx,B,V,err_Plx,err_B,err_V
,deg,deg,arcsec,,,arcsec,,
int32,float64,float64,float64,float64,float64,float64,float64,float64
7,0.02254891,20.03660216,0.01774,10.542,9.679,0.0013000000000000002,0.039,0.03
34,0.09946973,26.91823821,0.01271,7.04,6.491,0.00074,0.004,0.004
54,0.15165558,17.96895579,0.02097,11.685,10.679,0.00171,0.102,0.065
58,0.17376341,62.17600484,0.02606,7.665,7.11,0.00067,0.006,0.005
68,0.20063628,16.98896499,0.0318,10.017,8.903,0.00117,0.024,0.016
74,0.22187281,35.75272213,0.02422,11.364,10.112,0.00136,0.075,0.038
84,0.25180722,27.88634368,0.0189,10.686,9.7,0.00133,0.05,0.032
110,0.34870816,39.61081788,0.02042,9.831,8.887,0.00191,0.022,0.016
128,0.41376473,73.61187664,0.0152,6.74,6.521,0.00063,0.004,0.004
...,...,...,...,...,...,...,...,...


How long did it take you to complete this homework?

~2.5 hours