# Testing the Galaxy class

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Read-a-data-file" data-toc-modified-id="Read-a-data-file-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Read a data file</a></span></li><li><span><a href="#ParticleProperties" data-toc-modified-id="ParticleProperties-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>ParticleProperties</a></span></li><li><span><a href="#Exploring-the-data" data-toc-modified-id="Exploring-the-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Exploring the data</a></span><ul class="toc-item"><li><span><a href="#Data-structures" data-toc-modified-id="Data-structures-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Data structures</a></span></li><li><span><a href="#Data-visualization" data-toc-modified-id="Data-visualization-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Data visualization</a></span></li></ul></li></ul></div>

In [1]:
# standard Python imports
from pathlib import Path

# scientific package imports
import numpy as np
from numpy.linalg import norm

import astropy.units as u
from astropy.table import QTable

import pandas as pd

# import my own class for this homework
from galaxy import Galaxy

## Read a data file

In this case, the Milky Way at snapshot 0:

In [2]:
gal = Galaxy('MW')
print(gal.filename)
# gal.read_file()

MW_000.txt


It created the correct filename and found a path to the file. The data are stored as a 1-D structured np.ndarray with 135000 elements, one per particle:

In [3]:
gal.data, gal.data.shape, type(gal.data)

(array([(1., 0.00394985, -2.51725e+00,  19.1588 ,  5.28528e+01,  1.34962e-01, 116.109   ,  -85.3822 ),
        (1., 0.00394985, -2.86601e+02, 298.455  ,  3.91692e+02,  5.02658e+01, -46.4521  ,   15.1825 ),
        (1., 0.00394985, -5.05945e-01, -28.6337 , -8.39565e+01,  1.13833e+01,  -0.974253,  -39.3509 ),
        ...,
        (3., 0.00010005, -3.29432e+00,   3.36725,  1.09023e-01,  2.18821e+02,  73.4462  ,   -8.81108),
        (3., 0.00010005,  2.57806e-01,   5.31409, -6.62670e-01,  5.46121e+01, -19.0044  , -190.184  ),
        (3., 0.00010005, -6.57662e-01,   3.32552, -2.51660e+00, -1.37672e+01,  44.8175  ,   16.7124 )],
       dtype=[('type', '<f8'), ('m', '<f8'), ('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('vx', '<f8'), ('vy', '<f8'), ('vz', '<f8')]),
 (135000,),
 numpy.ndarray)

Individual rows (single particles) report their type as np.void, but they can still be indexed by column name:

In [4]:
gal.data[1], type(gal.data[1]), gal.data[1]['m']

((1., 0.00394985, -286.601, 298.455, 391.692, 50.2658, -46.4521, 15.1825),
 numpy.void,
 0.00394985)

It is ***not*** OK to treat the data as a 2-D array for indexing:

In [5]:
# this will throw an IndexError
gal.data[:,2]

IndexError: too many indices for array
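
Columns of a structured array are selected by field name instead. The sketch below rebuilds a three-row array with the same dtype as `gal.data` (the values are copied from the output above; the name `sample` is only for illustration) and shows field access, plus one way to get a plain 2-D array when positional indexing is really needed.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# same field layout as gal.data; values copied from the first three rows above
dtype = [('type', '<f8'), ('m', '<f8'), ('x', '<f8'), ('y', '<f8'),
         ('z', '<f8'), ('vx', '<f8'), ('vy', '<f8'), ('vz', '<f8')]
sample = np.array([
    (1., 0.00394985, -2.51725, 19.1588, 52.8528, 0.134962, 116.109, -85.3822),
    (1., 0.00394985, -286.601, 298.455, 391.692, 50.2658, -46.4521, 15.1825),
    (1., 0.00394985, -0.505945, -28.6337, -83.9565, 11.3833, -0.974253, -39.3509),
], dtype=dtype)

# select one column by field name instead of sample[:, 2]
x = sample['x']

# select several columns at once; the result is still a structured array
xyz = sample[['x', 'y', 'z']]

# convert to an ordinary 2-D float array when positional indexing is needed
plain = structured_to_unstructured(sample)
x_again = plain[:, 2]
```

Field access is usually the clearer choice, since the column name documents what is being selected.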

## ParticleProperties

There are two related methods:
- `single_particle_properties()` returns a (distance, velocity, mass) tuple for the specified particle, as magnitudes with units in a galactic CoM frame.
- `all_particle_properties()` returns a QTable of the same quantities for every matching particle.

Both can be filtered by particle type (1=DM, 2=disk, 3=bulge).

Remember that `particle_num` is zero-based.
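
The `Galaxy` source is not shown here, so the following is only a rough sketch of the kind of calculation such a method might perform, not the class's actual code. The helper `particle_magnitudes` and its parameter names are hypothetical, the two disk rows are made up for easy arithmetic, and any change of reference frame the class applies is ignored. The factor of 1e10 is inferred from the outputs in this notebook, where a file mass of 0.00394985 is reported as 39498500 solMass.

```python
import numpy as np

dtype = [('type', '<f8'), ('m', '<f8'), ('x', '<f8'), ('y', '<f8'),
         ('z', '<f8'), ('vx', '<f8'), ('vy', '<f8'), ('vz', '<f8')]
sample = np.array([
    # one dark-matter row copied from the file, then two made-up disk rows
    (1., 0.00394985, -2.51725, 19.1588, 52.8528, 0.134962, 116.109, -85.3822),
    (2., 0.0001, 3.0, 4.0, 0.0, 0.0, 0.0, 200.0),
    (2., 0.0001, 1.0, 2.0, 2.0, 30.0, 40.0, 0.0),
], dtype=dtype)

def particle_magnitudes(data, ptype, particle_num):
    """Hypothetical helper: (distance, speed, mass) for one particle of a type."""
    subset = data[data['type'] == ptype]   # boolean mask keeps only that type
    p = subset[particle_num]               # zero-based index within the type
    pos = np.sqrt(p['x']**2 + p['y']**2 + p['z']**2)      # kpc
    v = np.sqrt(p['vx']**2 + p['vy']**2 + p['vz']**2)     # km/s
    m = p['m'] * 1e10                      # assumed file unit: 1e10 solMass
    return pos, v, m

# second disk particle (particle_num is zero-based)
pos, v, m = particle_magnitudes(sample, ptype=2, particle_num=1)
```

The helper returns plain floats in the order (distance, speed, mass), matching the unpacking in the next cell; attaching astropy units would be a separate step.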

In [6]:
# the 100th disk particle
particle100 = gal.single_particle_properties(type=2, particle_num=99)
pos, v, m = particle100
pos, v, m

(<Quantity 4.976 kpc>, <Quantity 434.785 km / s>, <Quantity 1000000. solMass>)

Change units:

In [7]:
np.around(pos.to(u.lyr), 3)

<Quantity 16229.541 lyr>

In [8]:
t = gal.all_particle_properties(type=2)
t[:3]

| type | m | pos | v |
| --- | --- | --- | --- |
|  | solMass | kpc | km / s |
| float64 | float64 | float64 | float64 |
| 2.0 | 1000000.0 | 24.528 | 269.616 |
| 2.0 | 1000000.0 | 5.468 | 361.43 |
| 2.0 | 1000000.0 | 4.058 | 261.901 |


In [9]:
len(t), np.mean(t['m'])

(75000, <Quantity 1000000. solMass>)

Every disk particle has a mass of 10^6 solar masses, so this dataset appears to simplify things by using relatively few massive stellar clusters rather than a realistic number of normal-mass stars. That makes sense for an undergrad class!

## Exploring the data

### Data structures

The default structured ndarray format is probably efficient for the calculations we'll do later, but many modern packages expect pandas DataFrames or astropy Tables/QTables. Both can be exported from the Galaxy class.

In [10]:
# pandas
df = gal.get_df()
df[:3]

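|  | type | m | x | y | z | vx | vy | vz |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 0.00395 | -2.51725 | 19.1588 | 52.8528 | 0.134962 | 116.109 | -85.3822 |
| 1 | 1.0 | 0.00395 | -286.601 | 298.455 | 391.692 | 50.2658 | -46.4521 | 15.1825 |
| 2 | 1.0 | 0.00395 | -0.505945 | -28.6337 | -83.9565 | 11.3833 | -0.974253 | -39.3509 |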
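
How `get_df()` is implemented is not shown in this notebook. As a minimal sketch of one way such a conversion can work, pandas accepts a structured array directly and uses the field names as column names; the names `sample`, `df_sketch` and `back` below are illustrative only.

```python
import numpy as np
import pandas as pd

dtype = [('type', '<f8'), ('m', '<f8'), ('x', '<f8'), ('y', '<f8'),
         ('z', '<f8'), ('vx', '<f8'), ('vy', '<f8'), ('vz', '<f8')]
sample = np.array([
    (1., 0.00394985, -2.51725, 19.1588, 52.8528, 0.134962, 116.109, -85.3822),
    (1., 0.00394985, -286.601, 298.455, 391.692, 50.2658, -46.4521, 15.1825),
    (1., 0.00394985, -0.505945, -28.6337, -83.9565, 11.3833, -0.974253, -39.3509),
], dtype=dtype)

# structured array -> DataFrame: field names become column names
df_sketch = pd.DataFrame(sample)

# DataFrame -> NumPy record array, dropping the pandas index
back = df_sketch.to_records(index=False)
```

The round trip keeps the column names, so code written against field names works on either form.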


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135000 entries, 0 to 134999
Data columns (total 8 columns):
type    135000 non-null float64
m       135000 non-null float64
x       135000 non-null float64
y       135000 non-null float64
z       135000 non-null float64
vx      135000 non-null float64
vy      135000 non-null float64
vz      135000 non-null float64
dtypes: float64(8)
memory usage: 8.2 MB


In [12]:
df.describe()

|  | type | m | x | y | z | vx | vy | vz |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 135000.0 | 135000.0 | 135000.0 | 135000.0 | 135000.0 | 135000.0 | 135000.0 | 135000.0 |
| mean | 1.703704 | 0.001526 | 0.361403 | 2.16517 | -2.975892 | -0.82053 | 1.206656 | -1.190753 |
| std | 0.597206 | 0.001859 | 400.986716 | 406.302545 | 402.947138 | 134.857647 | 131.334283 | 87.295667 |
| min | 1.0 | 0.0001 | -10499.7 | -10813.5 | -9908.83 | -550.021 | -485.052 | -461.048 |
| 25% | 1.0 | 0.0001 | -6.15764 | -4.479513 | -2.169652 | -95.614625 | -91.942875 | -38.252725 |
| 50% | 2.0 | 0.0001 | -1.90952 | 2.86933 | -1.426715 | -0.081675 | -0.162324 | -0.401043 |
| 75% | 2.0 | 0.00395 | 4.96911 | 8.026022 | -0.665771 | 92.38625 | 89.797525 | 36.362 |
| max | 3.0 | 0.00395 | 10328.1 | 10735.1 | 10955.4 | 526.824 | 544.527 | 468.521 |


In [13]:
# QTable
t = gal.get_qtable()
t[:3]

| type | m | x | y | z | vx | vy | vz |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | solMass | kpc | kpc | kpc | km / s | km / s | km / s |
| float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 |
| 1.0 | 39498500.0 | -2.51725 | 19.1588 | 52.8528 | 0.134962 | 116.109 | -85.3822 |
| 1.0 | 39498500.0 | -286.601 | 298.455 | 391.692 | 50.2658 | -46.4521 | 15.1825 |
| 1.0 | 39498500.0 | -0.505945 | -28.6337 | -83.9565 | 11.3833 | -0.974253 | -39.3509 |


QTables are great at handling units and doing astronomer-type I/O (e.g. FITS files) but don't have all the data analysis capabilities of pandas. Fortunately the two are easy to interconvert for clean datasets (though missing values can be a problem).

### Data visualization

Data scientists tend to have bigger budgets (and bigger salaries) than astronomers. Fortunately many of them have moved towards using Python rather than R, so making cool dataviz tools is a growth industry. 

Start by getting a big dataframe in suitable format:

In [19]:
galaxies = []
for gal_name in ['MW', 'M31', 'M33']:
    g = Galaxy(gal_name)
    print(g.name)
    g_df = g.all_particle_properties().to_pandas()
    g_df['name'] = gal_name
    galaxies.append(g_df)
gals_df = pd.concat(galaxies)

MW
M31
M33
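
One detail of this pattern is worth a small illustration: by default `pd.concat` keeps each frame's own row index, so the combined index restarts for every galaxy. The toy frames below are made up and stand in for the per-galaxy DataFrames; passing `ignore_index=True` is one option when a unique index is wanted.

```python
import pandas as pd

# two tiny stand-in frames, labelled the same way as in the loop above
frames = []
for name, n_rows in [('A', 3), ('B', 2)]:
    part = pd.DataFrame({'m': [1.0] * n_rows})
    part['name'] = name
    frames.append(part)

# default behaviour: each frame's index is preserved, so labels repeat
stacked = pd.concat(frames)

# ignore_index=True renumbers the rows 0..N-1
renumbered = pd.concat(frames, ignore_index=True)

# the 'name' column still identifies the source of every row
counts = renumbered.groupby('name').size()
```

Either form works for grouping on `name`; the renumbered one avoids surprises with label-based lookups such as `.loc[0]`, which would otherwise return one row per galaxy.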


In [20]:
gals_df

|  | type | m | pos | v | name |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 39498500.0 | 27.211 | 164.203 | MW |
| 1 | 1.0 | 39498500.0 | 510.187 | 82.718 | MW |
| 2 | 1.0 | 39498500.0 | 40.497 | 11.466 | MW |
| 3 | 1.0 | 39498500.0 | 261.603 | 159.261 | MW |
| 4 | 1.0 | 39498500.0 | 26.541 | 155.194 | MW |
| ... | ... | ... | ... | ... | ... |
| 14295 | 2.0 | 1000000.0 | 842.420 | 232.263 | M33 |
| 14296 | 2.0 | 1000000.0 | 847.568 | 219.680 | M33 |
| 14297 | 2.0 | 1000000.0 | 840.035 | 222.736 | M33 |
| 14298 | 2.0 | 1000000.0 | 840.966 | 107.240 | M33 |
