# Using pandas to study the stars

The Tycho-Gaia Astrometric Solution (TGAS) is a catalogue of about 2 million stars measured by Gaia, a space observatory launched in 2013. In this notebook, you will practice your pandas skills on a fraction of that dataset.

First, import pandas with the alias "pd":

In [1]:
# Import libraries
import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Now, open the "TGAS_data.csv" file using the .read_csv() function.

In [2]:
df = pd.read_csv("TGAS_data.csv")

Look at the first five rows of the dataframe using the .head() method.

In [3]:
df.head()

Unnamed: 0,source_id,random_index,ref_epoch,ra,ra_error,dec,dec_error,parallax,parallax_error,pmra,pmra_error
0,7627860000000.0,243619,2015,45.03433,0.305989,0.235392,0.218802,6.352951,0.30791,43.752313,0.070542
1,9277130000000.0,487238,2015,45.165007,2.583882,0.200068,1.197789,3.900329,0.323488,10.036263,4.611414
2,13297200000000.0,1948952,2015,45.086155,0.213836,0.248825,0.180326,3.155313,0.273484,2.932284,1.908644
3,13469000000000.0,102321,2015,45.066542,0.276039,0.248211,0.200958,2.292367,0.280972,3.661982,2.065052
4,15736800000000.0,409284,2015,45.136038,0.170697,0.335044,0.17013,1.582077,0.261539,0.340802,1.220476


How many rows are there in this dataframe? Hint: You can use the len() function.

In [4]:
print(len(df))

134865
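As a small aside, `len(df)` counts rows only, while the `.shape` attribute reports rows and columns together. The sketch below uses a tiny made-up dataframe named `toy` (the values are illustrative, not rows from TGAS_data.csv).

```python
import pandas as pd

# Tiny illustrative dataframe; these values are made up, not real TGAS rows
toy = pd.DataFrame({
    "ra": [45.03, 45.17, 45.09],
    "parallax": [6.35, 3.90, 3.16],
})

n_rows = len(toy)        # number of rows only
n_rows_cols = toy.shape  # (rows, columns) tuple
print(n_rows, n_rows_cols)
```

On the real dataframe, `df.shape[0]` should agree with `len(df)`.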


You might notice that one of the columns contains the parallax angle for each star (remember: nearby objects have a larger parallax angle than far away objects).

Store the "parallax" column in a variable, then print its smallest and largest values.

In [5]:
source_id = df["source_id"]
random_index = df["random_index"]
ref_epoch = df["ref_epoch"]
ra = df["ra"]
ra_error = df["ra_error"]
dec = df["dec"]
dec_error = df["dec_error"]
parallax = df["parallax"]
parallax_error = df["parallax_error"]
pmra = df["pmra"]
pmra_error = df["pmra_error"]


print(parallax.min())
print(parallax.max())

-4.349788529
280.740075
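The minimum above is negative, which cannot correspond to a real distance. It most likely reflects measurement noise on a very distant star. For stars with a positive parallax, a rough distance follows from distance (parsecs) = 1000 / parallax (milliarcseconds). The sketch below assumes the "parallax" column is in milliarcseconds, as in the Gaia catalogues, and uses a hypothetical helper named `parallax_to_distance_pc`. The three sample values are copied from the outputs above.

```python
import numpy as np
import pandas as pd

def parallax_to_distance_pc(parallax_mas):
    """Rough distance in parsecs from parallax in milliarcseconds (hypothetical helper).

    Non-positive parallaxes have no physical distance, so they become NaN.
    """
    p = pd.Series(parallax_mas, dtype="float64")
    return (1000.0 / p).where(p > 0, np.nan)

# Largest parallax, the parallax of row 0, and the smallest parallax
sample = pd.Series([280.740075, 6.352951, -4.349788529])
distances = parallax_to_distance_pc(sample)
print(distances)
```

This simple inversion is only a first estimate. It becomes unreliable when the parallax error is large compared with the parallax itself.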


Find the mean parallax angle for this dataset.

In [8]:
print(parallax.mean())

2.51050046368


Sort the entire dataframe by parallax angle from largest to smallest and print out the first 10 rows of the sorted dataframe.

In [10]:
df.sort_values("parallax", ascending=False).head(10)

Unnamed: 0,source_id,random_index,ref_epoch,ra,ra_error,dec,dec_error,parallax,parallax_error,pmra,pmra_error
94182,3.853340e+17,1891830,2015,4.612109,0.287586,44.024673,0.367812,280.740075,0.305500,2890.430683,0.045499
60079,2.728560e+17,995879,2015,69.422644,0.731638,52.891635,0.483035,101.233485,0.323176,304.133888,0.058960
22819,1.454210e+17,689496,2015,67.250215,0.242147,21.923427,0.129090,89.041322,0.262731,-67.095509,0.082876
94799,3.866540e+17,851626,2015,1.426237,0.307011,45.811419,0.286776,88.262860,0.430357,889.089787,0.060094
63286,2.835700e+17,1345457,2015,86.453810,0.097230,62.233497,0.132750,74.202164,0.253300,297.566971,0.067006
58961,2.690680e+17,216910,2015,89.907373,0.274176,58.591954,0.283633,74.185025,0.325805,9.084194,0.096984
58371,2.668470e+17,1952397,2015,75.858634,0.282263,53.122061,0.321951,72.894533,0.354967,1302.877435,0.115796
26634,1.706400e+17,1549490,2015,62.158466,0.246399,33.637562,0.102616,72.002273,0.258425,526.982839,0.207995
12299,8.792150e+16,868068,2015,38.973222,0.374186,20.219299,0.226397,71.268160,0.331243,249.922590,0.117825
9605,7.005160e+16,1745571,2015,56.585713,0.369864,26.214688,0.246812,69.565046,0.379625,386.935217,0.119310
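As an alternative sketch for the same task, `DataFrame.nlargest` returns the top rows by a column without an explicit sort. The dataframe `toy` below is made up for illustration.

```python
import pandas as pd

# Illustrative values only, not real TGAS rows
toy = pd.DataFrame({
    "random_index": [101, 102, 103, 104],
    "parallax": [3.9, 280.7, -4.3, 89.0],
})

# Sort descending and keep the first two rows
top_sorted = toy.sort_values("parallax", ascending=False).head(2)

# nlargest selects the same rows without an explicit sort call
top_nlargest = toy.nlargest(2, "parallax")

print(top_sorted)
print(top_nlargest)
```

Both approaches keep the original row labels, so the index column still identifies each star.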


You can index a single item from a row of the dataframe as follows:

In [11]:
df["random_index"][94182]  # Use the following format: df[column name][row index]

1891830

Try it yourself! Find the "random index" (ID number) for the star in row index 1.

In [12]:
df["random_index"][1]

487238

Now find the parallax angle for this star.

In [14]:
df["parallax"][1]

3.9003289350000001
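Note that `df["parallax"][1]` looks a row up by its index label, not by its position, which matters once a dataframe has been sorted. The sketch below contrasts label-based `.loc` with position-based `.iloc` on a made-up dataframe named `toy`.

```python
import pandas as pd

# Illustrative values only, not real TGAS rows
toy = pd.DataFrame({
    "random_index": [101, 102, 103],
    "parallax": [3.9, 280.7, 89.0],
})
by_parallax = toy.sort_values("parallax", ascending=False)

# .loc selects by index label: label 1 is still the same star after sorting
label_lookup = by_parallax.loc[1, "random_index"]

# .iloc selects by position: position 1 is now the second-largest parallax
position_lookup = by_parallax["random_index"].iloc[1]

print(label_lookup, position_lookup)
```

Using `.loc` or `.iloc` explicitly makes it clear which kind of lookup is intended.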

## Pandas Challenge 1
Print out the average parallax error in this dataset.

In [19]:
print(parallax_error.mean())
print(df.min())

0.387319030976
source_id         7.627860e+12
random_index      4.800000e+01
ref_epoch         2.015000e+03
ra                1.189161e-03
ra_error          5.239195e-02
dec               2.000677e-01
dec_error         6.153038e-02
parallax         -4.349789e+00
parallax_error    2.063749e-01
pmra             -4.874547e+02
pmra_error        1.441252e-02
dtype: float64


## Pandas Challenge 2
Print out the "random index" for the furthest star in this dataset (Hint: this is the star with the smallest parallax angle.)

In [24]:
print(df["parallax"].min())
# 133819 is the row index of the star with the smallest parallax
print(df["random_index"][133819])


-4.349788529
9765
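Rather than reading the row index 133819 off a table, `Series.idxmin` can return the label of the smallest value directly. The following is a minimal sketch on a made-up dataframe named `toy`, not a rerun of the cell above.

```python
import pandas as pd

# Illustrative values only, not real TGAS rows
toy = pd.DataFrame({
    "random_index": [101, 102, 103],
    "parallax": [3.9, -4.3, 89.0],
})

# idxmin returns the index label of the row holding the smallest value
smallest_label = toy["parallax"].idxmin()
furthest_id = toy.loc[smallest_label, "random_index"]
print(smallest_label, furthest_id)
```

Applied to the real dataframe, the same two lines would avoid hard-coding the row index.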
