# Importing Data 

Once the data has been successfully downloaded from [https://gea.esac.esa.int/archive/](https://gea.esac.esa.int/archive/) and decompressed following the instructions provided in [Need to add link to README from personal github](add-link), we can import it into Python and store it as a data frame using the [pandas](https://pandas.pydata.org/) package.

In [5]:
from astroquery.gaia import Gaia
import pandas as pd

# Query to get the astrometric properties of QSO-like objects
query = """
SELECT 
    agn.source_id, 
    gs.ra, gs.dec, 
    gs.pmra, gs.pmdec, 
    gs.parallax, gs.parallax_error, 
    gs.ruwe, gs.phot_g_mean_mag 
FROM gaiadr3.agn_cross_id AS agn
JOIN gaiadr3.gaia_source AS gs 
ON agn.source_id = gs.source_id
WHERE gs.parallax < 5 * gs.parallax_error  -- Remove potential stars
AND gs.ruwe < 1.4  -- Ensure good astrometric quality
AND gs.phot_g_mean_mag < 21  -- Bright enough for good measurements
"""

# Launch query and download data
job = Gaia.launch_job_async(query)
result = job.get_results()

# Save as CSV
result.write("qso_full_data.csv", format="csv", overwrite=True)


  source_name_in_catalogue            source_id                 catalogue_name
0      J174227.66-170055.3  4123935547888822656  AllWISE (Secrest et al. 2015)
1      J075739.42-170217.3  5718208892558907648  AllWISE (Secrest et al. 2015)
2      J075602.64-171829.5  5718174807695152384  AllWISE (Secrest et al. 2015)
3         1194m167o0006708  5718012041318231680  Gaia-unWISE (Shu et al. 2019)
4      J075642.64-171141.9  5718199855947920128  AllWISE (Secrest et al. 2015)


In [12]:
# Load into Pandas
df = pd.read_csv("qso_full_data.csv")
print(df.head())  # Check the data

        source_id         ra       dec      pmra     pmdec  parallax  \
0   3470333738112  45.075505  0.152316 -1.072371 -3.191011  0.366321   
1   5944234902272  44.884761  0.164806 -0.121274  0.725026 -0.395659   
2   6459630980096  44.910498  0.189649  0.217806 -0.316007 -0.626561   
3   9517648372480  45.254655  0.228999 -0.552941 -1.895446 -0.917219   
4  10892037246720  45.188575  0.282424 -0.098037 -0.120580  0.001630   

   parallax_error      ruwe  phot_g_mean_mag  
0        0.901633  0.889714        20.571114  
1        1.340139  1.087911        20.704517  
2        0.548536  1.020956        20.173105  
3        1.507964  1.031971        20.634562  
4        0.246332  0.974657        18.787239  
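As a quick sanity check, the selection cuts from the ADQL query can be re-applied in pandas to confirm that every loaded row satisfies them. The sketch below is a minimal illustration rather than part of the original analysis: the helper name `satisfies_query_cuts` is hypothetical, and the small `sample` frame simply reuses the first three rows printed above so that the block runs without the CSV file.

```python
import pandas as pd


def satisfies_query_cuts(frame: pd.DataFrame) -> pd.Series:
    """Return a boolean mask that is True where a row passes the three query cuts."""
    return (
        (frame["parallax"] < 5 * frame["parallax_error"])  # parallax not significant
        & (frame["ruwe"] < 1.4)                            # good astrometric quality
        & (frame["phot_g_mean_mag"] < 21)                  # bright enough
    )


# Stand-in for df: the first three rows from the output above (assumption for the demo).
sample = pd.DataFrame({
    "source_id": [3470333738112, 5944234902272, 6459630980096],
    "parallax": [0.366321, -0.395659, -0.626561],
    "parallax_error": [0.901633, 1.340139, 0.548536],
    "ruwe": [0.889714, 1.087911, 1.020956],
    "phot_g_mean_mag": [20.571114, 20.704517, 20.173105],
})

mask = satisfies_query_cuts(sample)
print(mask.all())
```

On the full data set the same check would be `satisfies_query_cuts(df).all()`. Any `False` entries in the mask would point to rows that were altered or corrupted between the query and the CSV round trip.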


### Column Content

Before we proceed, we need to understand what each column contains. To do so, let us print the name of each column.

In [13]:
# Check column names
print(df.columns)

Index(['source_id', 'ra', 'dec', 'pmra', 'pmdec', 'parallax', 'parallax_error',
       'ruwe', 'phot_g_mean_mag'],
      dtype='object')


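The output above lists the nine columns returned by the query:

1. <b>source_id</b>: unique Gaia source identifier.
2. <b>ra</b>, <b>dec</b>: right ascension and declination, in degrees.
3. <b>pmra</b>, <b>pmdec</b>: proper motion components in right ascension and declination, in mas/yr.
4. <b>parallax</b>, <b>parallax_error</b>: parallax and its standard error, in mas.
5. <b>ruwe</b>: renormalised unit weight error, an indicator of the quality of the astrometric fit.
6. <b>phot_g_mean_mag</b>: mean magnitude in the G band.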

We want to make sure the data matches the description provided in the paper, i.e. we expect there to be 1614173 sources identified as QSO-like objects.

In [14]:
# Display the shape of the dataset
df.shape

(1596027, 9)

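The output above shows 1596027 QSO-like objects in the dataset, fewer than the 1614173 we expected. The difference is presumably due to the cuts in the query's `WHERE` clause (parallax significance, `ruwe`, and G magnitude), which discard some of the sources.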
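The comparison between the observed and expected row counts can also be made explicit in code rather than read off by eye. The following is a hedged sketch: `summarise_row_count` is a hypothetical helper, `EXPECTED_SOURCES` is the figure quoted from the paper above, and the toy frame stands in for `df` so that the block runs on its own.

```python
import pandas as pd

EXPECTED_SOURCES = 1614173  # source count quoted in the text above


def summarise_row_count(frame: pd.DataFrame, expected: int = EXPECTED_SOURCES) -> dict:
    """Compare the number of rows with the expected count and flag repeated source_ids."""
    n_rows = len(frame)
    return {
        "rows": n_rows,
        "expected": expected,
        "missing": expected - n_rows,
        "duplicate_source_ids": int(frame["source_id"].duplicated().sum()),
    }


# Toy stand-in for df with one repeated source_id (assumption for the demo).
toy = pd.DataFrame({"source_id": [3470333738112, 5944234902272, 5944234902272]})
summary = summarise_row_count(toy, expected=5)
print(summary)
```

Given the shape printed above, `summarise_row_count(df)["missing"]` would be 1614173 - 1596027 = 18146. The duplicate count is included because a join can in principle return the same `source_id` more than once, which would distort the comparison.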

# Visualising Data

In this section we reproduce some of the plots presented in the paper [Gaia Early Data Release 3: Acceleration of the Solar System from Gaia astrometry](https://www.aanda.org/articles/aa/full_html/2021/05/aa39734-20/aa39734-20.html).

In [11]:
df.columns

Index(['source_id', 'ra', 'dec', 'pmra', 'pmdec', 'parallax', 'parallax_error',
       'ruwe', 'phot_g_mean_mag'],
      dtype='object')