# Getting data into LSDB

The most practical way to load data into LSDB is from catalogs in HiPSCat format, hosted locally or on a remote source. We recommend visiting our cloud repository, [data.lsdb.io](https://data.lsdb.io), where you can find large, publicly available surveys.

In [None]:
import lsdb

### Loading Gaia DR3

As an example, let's load Gaia DR3 into our workflow. This is as simple as calling `read_hipscat` with the catalog's URL.

We highly recommend that you:

- **Load catalogs with their respective margin caches**, when available. A margin cache stores the sources that lie just beyond each partition's boundary. Operations such as joins and crossmatches need these margins to produce accurate results near partition edges.

- **Pre-select a small subset of columns** that satisfies your scientific needs. Loading more data than necessary makes workflows slower and more computationally expensive. To see which columns are available, refer to the column descriptions in each catalog's section on [data.lsdb.io](https://data.lsdb.io).
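To see why margins matter, here is a toy crossmatch in plain Python. It is a simplified sketch, not how LSDB implements crossmatching: the sky is split into two hypothetical partitions at RA = 10 deg, and each source is only compared with sources in its own partition unless a margin is added. The partition edge, the 5 arcsecond search radius, and the source positions are made-up values chosen for illustration.

```python
import math

# Toy setup (hypothetical): two partitions split at RA = 10 deg, near the
# equator so that RA offsets are close to true angular distances.
EDGE_RA = 10.0        # partition A: ra < 10, partition B: ra >= 10
MARGIN_ARCSEC = 10.0  # margin width, matching the 10 arcsec suggested by the `gaia_10arcs` name
RADIUS_ARCSEC = 5.0   # crossmatch search radius


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def partition(ra):
    """Assign a source to a toy partition by its RA."""
    return "A" if ra < EDGE_RA else "B"


def crossmatch(left, right, use_margin):
    """Match each left source only against the right sources its partition can see."""
    matches = []
    for ra, dec in left:
        part = partition(ra)
        candidates = [s for s in right if partition(s[0]) == part]
        if use_margin:
            # Margin: right sources from the neighbouring partition that lie within
            # MARGIN_ARCSEC of the edge (distance measured at the source's dec).
            candidates += [s for s in right
                           if partition(s[0]) != part
                           and ang_sep_arcsec(s[0], s[1], EDGE_RA, s[1]) <= MARGIN_ARCSEC]
        matches += [((ra, dec), s) for s in candidates
                    if ang_sep_arcsec(ra, dec, s[0], s[1]) <= RADIUS_ARCSEC]
    return matches


left = [(9.9995, 0.0)]    # just west of the edge (partition A)
right = [(10.0005, 0.0)]  # 3.6 arcsec away, but east of the edge (partition B)
print(crossmatch(left, right, use_margin=False))
print(crossmatch(left, right, use_margin=True))
```

Without the margin, the pair is missed because the two sources fall in different partitions; with the margin, the neighbouring source becomes visible and the match is found.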

You can copy the command to read the catalog directly from our website and adjust its arguments as needed.

In [None]:
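gaia_dr3 = lsdb.read_hipscat(
    # Main catalog: the HiPSCat partitions of Gaia DR3
    "https://data.lsdb.io/unstable/gaia_dr3/gaia/",
    # Margin cache: sources near each partition's boundary (10 arcsec, per its name)
    margin_cache="https://data.lsdb.io/unstable/gaia_dr3/gaia_10arcs/",
    # Load only the columns needed for the analysis
    columns=[
        "source_id",
        "ra",
        "dec",
        "phot_g_mean_mag",
        "phot_proc_mode",
        "azero_gspphot",
        "classprob_dsc_combmod_star",
    ],
)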
gaia_dr3
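A back-of-the-envelope estimate shows why trimming columns matters. Gaia DR3 contains about 1.81 billion sources. Assuming, for simplicity, that every column is stored as an 8-byte value (the real column types vary), the uncompressed size grows linearly with the number of columns loaded. The 150-column case is a hypothetical comparison, not the actual width of the catalog.

```python
GAIA_DR3_SOURCES = 1_811_709_771  # number of sources in Gaia DR3
BYTES_PER_VALUE = 8               # assumption: every column is an 8-byte int64/float64


def uncompressed_gb(n_rows, n_columns, bytes_per_value=BYTES_PER_VALUE):
    """Rough in-memory size in gigabytes (1e9 bytes) of a fixed-width table."""
    return n_rows * n_columns * bytes_per_value / 1e9


# The 7 columns selected above vs. a hypothetical 150-column selection.
for n_columns in (7, 150):
    print(f"{n_columns:>3} columns: ~{uncompressed_gb(GAIA_DR3_SOURCES, n_columns):,.0f} GB")
```

These figures illustrate how cost scales with the number of columns; they are not a prediction of actual memory use, since storage is compressed and the work is split across partitions.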

### Data loading is lazy

When you call `read_hipscat`, only the catalog's metadata (e.g. sky coverage, total number of rows, and column schema) is loaded into memory. Notice that the ellipses in the catalog representation above are just placeholders for data that has not been loaded yet.

Most use cases start with **LAZY** operations, which load metadata and plan the work, followed by more expensive **COMPUTE** operations. Data is only loaded into memory when the workflow's computations are triggered, usually with a `compute` call.

![Lazy workflow diagram](../_static/lazy_diagram.svg)

### Visualizing catalog metadata

Even without loading any data, we can still get a glimpse of our catalog's structure.

#### HEALPix map

Calling `plot_pixels` on the catalog plots its sky coverage map and shows how the catalog is partitioned into HEALPix pixels. Regions with a higher density of points are represented by higher-order (smaller) pixels.

In [None]:
gaia_dr3.plot_pixels("Gaia DR3 Pixel Map")
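For reference, the arithmetic behind HEALPix orders is simple: at order k there are 12·4^k equal-area pixels, and in the NESTED numbering scheme pixel p splits into pixels 4p to 4p+3 at order k+1. Roughly speaking, HiPSCat uses this to tile the sky adaptively, splitting a pixel that holds too many rows into its four children. The helper names below are illustrative, not part of LSDB or hipscat.

```python
import math

# Total sky area in square degrees: 4*pi steradians converted to deg^2 (~41253).
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2


def npix(order):
    """Number of HEALPix pixels covering the whole sky at `order` (nside = 2**order)."""
    return 12 * 4 ** order


def pixel_area_deg2(order):
    """Area of a single HEALPix pixel at `order`; all pixels at one order have equal area."""
    return FULL_SKY_DEG2 / npix(order)


def nested_children(order, pixel):
    """Children of a NESTED-scheme pixel: (order + 1, [4p, 4p+1, 4p+2, 4p+3])."""
    return order + 1, [4 * pixel + i for i in range(4)]


for order in (0, 3, 6, 9):
    print(f"order {order}: {npix(order):>9,} pixels, {pixel_area_deg2(order):10.4f} deg^2 each")

print(nested_children(2, 5))  # (3, [20, 21, 22, 23])
```

Each increase in order divides the pixel area by four, which is why dense regions of the pixel map appear as clusters of small, high-order pixels.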

#### Point density map

Calling `plot_points` on the catalog's internal metadata structure (`hc_structure`) shows its point density map.

In [None]:
from hipscat.inspection import plot_points
plot_points(gaia_dr3.hc_structure)
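A point density map is essentially a histogram of source counts over equal-area pixels. The sketch below converts per-pixel counts at a single HEALPix order into sources per square degree. The counts are made-up values, and `density_per_deg2` is a hypothetical helper rather than part of hipscat.

```python
import math

FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2  # whole sky in square degrees


def density_per_deg2(counts, order):
    """Map {pixel: source count} at one HEALPix order to {pixel: sources per deg^2}."""
    pixel_area = FULL_SKY_DEG2 / (12 * 4 ** order)  # all pixels at one order are equal-area
    return {pixel: n / pixel_area for pixel, n in counts.items()}


# Made-up counts for three order-6 pixels (hypothetical values, not Gaia data).
counts = {100: 1_000, 101: 25_000, 102: 0}
for pixel, density in density_per_deg2(counts, order=6).items():
    print(f"pixel {pixel}: {density:,.1f} sources/deg^2")
```

Because the pixels share one area, densities are directly proportional to counts, so the map's colour scale can be read either way.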

#### Column schema

It's also straightforward to look at the column names and their respective types.

In [None]:
gaia_dr3.dtypes

The full Arrow schema is also available through `gaia_dr3.hc_structure.schema`.