# Getting Started: Tools, Data, and How Astronomers Use Databases
by Ernest Bavarsad

## What You Will Learn in This Notebook

By the end of this notebook, you will be able to:

- Run Jupyter notebooks without panic
- Install and import the Python tools used in astronomy research
- Understand what a *database table* actually is
- Read astronomical data with correct units and uncertainties
- Recognize what an astronomical query does (without writing one yet)

This notebook is intentionally **light on astrophysics** and **light on math**.
Its purpose is to build fluency, not results.

## Your Computing Environment

Modern astronomy research is often done using Python and Jupyter notebooks.
This allows us to:
- Document our work
- Share reproducible results
- Combine code, figures, and explanations in one place

### Anaconda and Jupyter

We will use:
- **Anaconda**: a Python distribution that manages packages
- **Jupyter Notebook**: an interactive coding environment

If this notebook opens and its cells run, most of your setup is already done.

## Installing Required Python Packages

We will use a small set of standard scientific Python libraries.
These are widely used in professional astronomy research.

In [1]:
# Only run this cell if you do NOT already have these packages installed

# Uncomment the line below if needed (delete the #).
# %pip installs into the same environment this notebook's kernel is running in.
#%pip install numpy pandas matplotlib astroquery astropy
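
If you are unsure whether the install step is needed, one option is to ask Python which packages it can find. This is only a minimal sketch: the names `required` and `missing` are made up for this example, and the check looks for each package without importing it.

```python
import importlib.util

# Packages this notebook series relies on (same list as the install line above)
required = ["numpy", "pandas", "matplotlib", "astropy", "astroquery"]

# find_spec returns None when a top-level package cannot be located
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If the list of missing packages is empty, you can skip the install cell.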

## Importing Libraries

In Python, we must import packages before using them.
The cell below should run without errors.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from astropy import units as u
from astropy.table import Table

## What Is a Table?

Most astronomical databases store data in **tables**.

A table consists of:
- Rows → individual objects (stars, galaxies, comets)
- Columns → measured quantities (position, brightness, distance)

This is true whether the data come from Gaia, SDSS, or any other survey.

In [3]:
data = {
    "star_id": [1, 2, 3],
    "parallax_mas": [7.2, 2.5, 0.8],
    "g_mag": [10.3, 13.1, 15.7]
}

table = Table(data)
print(table)

star_id parallax_mas g_mag
------- ------------ -----
      1          7.2  10.3
      2          2.5  13.1
      3          0.8  15.7


## Units Matter (A Lot)

Astronomy data are meaningless without units.

Astropy allows us to explicitly attach units to values.
This prevents many common scientific mistakes.

In [4]:
parallax = np.array([7.2, 2.5, 0.8]) * u.mas
# The parallax equivalency turns an angle into a distance: d [pc] = 1 / p [arcsec]
distance = parallax.to(u.pc, equivalencies=u.parallax())

print(distance)

[ 138.88888889  400.         1250.        ] pc


## Uncertainties and Measurement Error

Every astronomical measurement has uncertainty.

Good science:
- Acknowledges uncertainty
- Propagates uncertainty
- Never hides uncertainty

In [5]:
parallax = 7.2 * u.mas
parallax_error = 0.3 * u.mas

fractional_error = parallax_error / parallax
print(fractional_error)

0.041666666666666664


## What Is an Astronomical Database?

An astronomical database is:
- A large collection of tables
- Queried using a structured language
- Designed for efficient searching, not for browsing by hand like a spreadsheet

Examples include:
- Gaia Archive
- SDSS
- SIMBAD / VizieR

## A Very Light Preview of ADQL

ADQL (Astronomical Data Query Language) is based on SQL, with added functions for searching by position on the sky.

A query has three main parts:
- SELECT → what you want
- FROM → where it lives
- WHERE → conditions on the data (optional; the preview query below omits it)

You will learn this properly in the next notebook.

In [6]:
query = """
SELECT TOP 10 *
FROM gaiadr3.gaia_source
"""
print(query)


SELECT TOP 10 *
FROM gaiadr3.gaia_source



## What Comes Next

In the next notebook, you will:
- Write real ADQL queries
- Download data from Gaia
- Apply basic quality cuts
- Begin answering scientific questions

If you understand this notebook, you are ready.

## Self-Check

Before moving on, make sure you can answer:

- What is a table?
- What is the difference between a column and a row?
- Why do units matter?
- Why do astronomers care about uncertainty?
- What do SELECT, FROM, and WHERE each mean conceptually?