Original Notes

David Rodriguez edited this page Mar 2, 2021 · 6 revisions

Links

Discussion group: https://groups.google.com/forum/#!forum/simple-archive
GitHub repo for database: https://github.com/kelle/SIMPLE
GitHub repo for software: https://github.com/dr-rodriguez/astrodbkit2

July 28

Attendees: Kelle, David, Will Cooper

New PR #38 - reviewed as a group and merged!
Idea for a new test (not a part of CI) that checks against SIMBAD.
How to make the front-end website: PHP, JavaScript, Bokeh - technology choices.
Will is going to scope out the vision for how the website will work and look into PHP vs Flask vs Django (#18).
UI design - eventually we could ask Jen Coddler.

July 21

Attendees: Kelle, David, Will Cooper, Niall Whiteford

  • switched to using GitHub issues in SIMPLE repo as primary place for project documentation. We are still meeting weekly!
  • Astrodbkit2 released to pypi so you can pip install. https://github.com/dr-rodriguez/AstrodbKit2/releases/tag/v0.1.0
  • Mostly co-working!
  • Implementing data integrity tests (Kelle and David)
  • plotting data (Will)
  • Ingesting data script (Niall)
  • Merged THREE pull requests

Update to the environment with the new PyPI install command (#34), and two PRs with data integrity tests (#35, #36).

June 30, 2020

Attendees: Kelle and David

Followed up on emails from last week, invited more people to this event.
Made a bunch of issues, made a new milestone.
Priority for next week is the Sources table (#7).
After that, we can start making data integrity tests (#9)! :)
Kelle wants to load database into SQLite Browser and design the tests from there.
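
A data integrity test of the kind planned here could look roughly like the sketch below. It assumes a hypothetical two-table layout (`Sources` and `Photometry`, linked by a `source` column) and made-up demo rows; the actual SIMPLE schema and test suite may differ.

```python
import sqlite3


def orphaned_photometry(conn):
    """Return (source, band) for Photometry rows with no matching Sources row."""
    query = """
        SELECT p.source, p.band
        FROM Photometry AS p
        LEFT JOIN Sources AS s ON s.source = p.source
        WHERE s.source IS NULL
        ORDER BY p.source, p.band
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build an in-memory database with hypothetical tables and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Sources (source TEXT PRIMARY KEY, ra REAL, dec REAL);
        CREATE TABLE Photometry (source TEXT, band TEXT, magnitude REAL);
        INSERT INTO Sources VALUES ('demo-1', 10.0, 20.0);
        INSERT INTO Photometry VALUES ('demo-1', 'J', 15.2);
        INSERT INTO Photometry VALUES ('demo-orphan', 'J', 16.1);
    """)
    return conn
```

A test would then assert that `orphaned_photometry(conn)` returns an empty list for the real database; the demo database above deliberately contains one orphaned row.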

June 23, 2020

Attendees: Kelle and David

Merged PR#2 for SIMPLE with example scripts and demos. https://github.com/kelle/SIMPLE/pull/2
Code now in astrodbkit2.
Used astropy cookiecutter to make repo.
Relies on sqlalchemy.
Kelle Actions.
Make milestones in SIMPLE repo.
Send emails.
New standing meeting/co-working on Tuesdays at 3-5pm ET.

June 9, 2020

Attendees: Kelle and David

David’s Scope:
Finalize the Schema:
Need to prototype and design the database before scoping out the service side requirements too much.
Remove obsCore and other complexities.
Do we want to actually ingest photometry or do live queries + caching?
Easier to store the subset of the data we need. Live queries take more resources.
Want to do the curation of mismatches ahead of time.
Our database is not big enough to warrant this.

DECISION: INGEST PHOTOMETRY.
Add UCD column to properly define the measurements?
Allow export/import all of the data as a JSON file. The direction the multi-mission databases have taken (using XML, but still). Makes data migration easy.
Core API - modifying astrodbkit to work with the new database.
Demo scripts for single measurement and batch measurement ingest.
Load example data from BDNYC database to test drive.
Could use Best big parallax paper as a case input.

NEED TO FIND SOMEONE ELSE TO HELP PUT TOGETHER THE WEBSITE.

Phase 0 - initial implementation of sqlalchemy and JSON
Milestone 0: new version of astrodbkit
Phase 1 - Photometry.
Milestone: Ingest existing BDNYC photometry and add Gaia photometry.
Phase 2 - Kinematics and Astrometry
Milestone: Ingest existing BDNYC data and add Best parallaxes and Gaia astrometry
Phase 3 - Spectra
Milestone: Ingest existing BDNYC spectra and add
Phase 4 - Images
Milestone: Ingest

Milestone: Web interface

Actions: Kelle wants to email the list about Phase 1-4 plan.
Maybe try to have a Check-in/discussion meeting on week of June 22 to discuss and add more Milestones.
Discuss who might take the lead on web implementation - https://github.com/kelle/SIMPLE/issues/3.
David needs to block out half-days for this work.
Will make updated schema for Phase 1.
Will use Phase 1 schema to start writing astrodbkit2.
Next meeting/co-working 3-5pm on June 23.

March 13, 2020

Expected Attendees: Kelle Cruz, David Rodriguez, Amelia Bayo, Julien Rameau, Angelle Tanner, Ricky Smart, Jackie Faherty, Will Cooper, Victor, Clémence Fontanive, Denise Stephens, Kora Muzic, Janis Hagelberg

Draft Agenda & Notes:

I (Angelle Tanner) am putting all high contrast images I have for the brown dwarfs and stars in my input catalogs into the Starchive. My brown dwarf list which Jonathan helped me create is in the database and is ready to be matched with data. I have ingested multiple spectral atlases as well. Please contact me and tell me how Starchive can help. I’ve gathered all the ALICE images and DIVA as well. Starchive also has multiple BD spectra from the usual places - Burgasser, Dwarf Archive, etc.

I want to provide services as a database storage service. We are buying a server with 16Tb now but can double that. It will be housed on campus with T1 access. It’s on AWS right now.

The Starchive currently lists 3783 BDs. This is from dwarfarchives, J Gagne and Johnson. It’s current to 2015, I believe. I am happy to share the list of names.

If you have an updated list, I’d like that please. (AMT)

Link to google doc: https://docs.google.com/document/d/1PJ8BpIspaCjSINSm2XSchRpcNd58kFiY1JDKaW2sGV8/edit?usp=sharing

Contact: angelle.tanner@gmail.com

DwarfArchives

Link to Google Spreadsheet with Chris Gelino L Dwarf Archives updated with Gaia distances where available and 2MASS and WISE 1 and 2 magnitudes. ~2500 dwarfs - Send questions to Denise Stephens denise_stephens@byu.edu - THANK YOU! (AMT, also for including all references) https://docs.google.com/spreadsheets/d/1oOHlD_JwcaXcPKr4pP23DEC9sMhJit7m3d-Hj6BvKL4/edit?usp=sharing

GUCDS - https://gucds.inaf.it/gucds/

Note: in development, the table and spectra will have errors, please do not use any of this for scientific purposes. Some of the spectra are also unpublished so the download capability has been turned off for now.
From the above page you can do searches using WHERE and/or specific shortnames. For example, put J2130-0845 in the box with the “J” and hit submit. Then click on the magnifying glass to get a page of spectra with some tools to play with. J2130-0845 is a good example of an object with multiple spectra. From that spectra viewing page you can click on file names to view metadata, deselect spectra, or add an optical template. Some Bokeh interactivity has been built in (sliders, clicking on the legend, etc).
You can change the reference catalog being plotted on the Aladin lite interface using the reference catalog in the pull down list. Contact w.cooper@herts.ac.uk if it’s not clear

Species - https://species.readthedocs.io/

Not an archive, but a useful tool to consider including.

DIVA - http://cesam.lam.fr/diva/

NASA Exoplanet archive for directly imaged planets [Link]

Why do we need a new one? Folks’ current challenges

  • Want a new thing which is designed to allow more folks to ingest data from the start
  • Want something up to date, with uniform format and convention
  • Want a thing which serves all known spectra (and images) - data retrieval
  • DACE funding is for exoplanets: development steered towards exoplanets but it’s open for other datasets as long as there is no data ingestion work needed
  • Need to make a distinction between the database and the service on top of the database
  • Also need to make clear what specific Use Cases we want to address (some of these may need to be moved to out-of-scope/to-be-discussed) < ADD THINGS HERE>
  • Make a sample for a telescope proposal - what’s up, magnitudes, binary?, etc
  • Find and download spectra of brown dwarfs and directly imaged exoplanets based on wavelength and resolution from both a website and from a script
  • Visualize spectra on a website.
  • List all observations/measurements for a single object
  • Search all data from a published reference
  • Programmatic (eg, Python) access to database contents (you mean an API?)
  • Perform a cone search of database objects (eg, what objects in some ra/dec region)
  • Be able to have a private version of the database and an “easy” way to add/ingest previously private data to the public version.
  • A workflow for community curation of data ingestion. - a test suite, guidelines, as well as a community of curators.
  • Holdings: Find what data exists on any directly imaged object that is suspected of being a brown dwarf or planetary mass. (E.g., young M6s, all field LTY dwarfs, all directly imaged exoplanets.)
  • Find and download images from ground-based AO projects
  • Search by distance, either spectrophotometric or astrometric, or any specific parameter (SpT, magnitudes etc.).
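
The cone search use case above could be sketched as follows. This is a pure-Python illustration, not an existing SIMPLE or astrodbkit2 function: the `(name, ra, dec)` tuple layout and both function names are assumptions, and coordinates are taken to be in degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(sources, ra, dec, radius_deg):
    """Return names of sources within radius_deg of (ra, dec).

    `sources` is an iterable of (name, ra_deg, dec_deg) tuples.
    """
    return [
        name
        for name, s_ra, s_dec in sources
        if angular_separation_deg(ra, dec, s_ra, s_dec) <= radius_deg
    ]
```

A linear scan like this is likely fine for a database of a few thousand objects; a larger catalog would probably want an indexed or library-backed search instead.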

NOT SURE - TO BE DISCUSSED

  • A way of including model derived parameters, especially retrieval outputs like temperature pressure profiles, corner plot figures + statistics, model SED fits etc for easy comparison between models.
  • Consistent schema to spectral files? E.g. what’s in the headers, ascii/ fits/ csv?
  • Model derived parameters, T_eff, cloud properties, retrieval outputs.
  • See something about how the data was reduced.

USEFUL BUT OUT OF VERSION 1 SCOPE - STRETCH THINGS

  • Data analysis functions such as measure equivalent widths of spectral lines, fit SEDs, model fits
  • Synthetic magnitudes - applying a filter profile to a spectrum
  • Plots of tabulated data (e.g. CMDs)

Starchive currently covers most of these use cases. Feel free to look at the Google doc. I talk about how I can implement a BD version.

Feb 20, 2020

Attendees: Kelle Cruz, David Rodriguez

Stick with SQL (SQLite? Postgres?)

store data as JSON
This would encapsulate all data into a single file for ease of transport across services (eg, commits to master)
Astrodbkit would need updates to serialize JSON to/from the database
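
A minimal sketch of the JSON round trip described above, using only the standard library. The function names are hypothetical and this is not how astrodbkit implements it; it assumes table names come from trusted code, since they are interpolated into the SQL.

```python
import json
import sqlite3


def dump_tables(conn, tables):
    """Serialize the named tables to one JSON string (a list of row dicts per table)."""
    data = {}
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        data[table] = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps(data, indent=2, sort_keys=True)


def load_tables(conn, text):
    """Insert rows from a JSON string made by dump_tables into existing tables."""
    for table, rows in json.loads(text).items():
        for row in rows:
            columns = ", ".join(f'"{name}"' for name in row)
            placeholders = ", ".join("?" for _ in row)
            conn.execute(
                f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                list(row.values()),
            )
    conn.commit()
```

Sorting keys and indenting the output keeps the file stable between dumps, so changes to the data show up as readable diffs in commits.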

Code Development in astrodbkit will be needed.

SIMPLE = database for low mass stars and directly imaged planets
SIMPLE archive of directly imaged planets and BDs
Substellar and IMaged PLanet Explorer.
“A Simple archive of complex objects.”

Le could stand for Leiden?

COM - metadata
OBSCORE

Clean - start fresh.

1st step CREATE NEW SCHEMA - New field names that correspond to ObsCore. Start with the BDNYC schema

David will start on it.

Start a new repo!

Kelle started and put BSD3 license on it. https://github.com/kelle/SIMPLE

Agenda for next meeting: Review of schema first draft. License? SQLite, Postgres, something else?

Next meeting. Fridays are good for David. Aiming for Mar 13 or Mar 20 at 10 ET.

How about a poster for Cool Stars? David et al. will make poster but Kelle will present.

Feb 12, 2020 - Unconference Meeting in Leiden

present - Victor, Clem, Amelia, Kora, Steve, Elisabeth, Kelle

Kora Muzic kmuzic@sim.ul.pt, Clémence Fontanive clemence.fontanive@csh.unibe.ch, Elisabeth Matthews ematth@mit.edu , Amelia Bayo amelia.bayo@uv.cl, Kelle Cruz kellecruz@gmail.com, Janis Hagelberg Janis.Hagelberg@unige.ch, Víctor Almendros valmendros@sim.ul.pt, Niall Whiteford niallw@roe.ac.uk

We’re going to primarily just focus on data access

  • The way vizier ingests data is based on how you publish the data -- we may want to e.g. allow unpublished data to be added
  • Vizier doesn’t have standard column names - SpType vs SpT, etc - do we want to have standards for this?
  • There is a list of approved descriptors - we could have a list of UCD descriptors that we recommend including.
  • HARD to maintain and we need something that can be maintained for years -- BDNYC has a collaborative workflow designed for their database. But still needs someone to curate, merge pull requests, etc.
  • Amelia notes that one also needs someone contactable/a help desk of some sort
  • Kelle suggests there are two things - database that can be contributed to plus also a metadata standard that means people can make their own databases that talk to this one
  • Is this for science ready data or also earlier data products? - raw or published data
  • BDNYC metadata is in fields in a database. Dictionaries for allowed entries in different fields?
  • Tabular/Non-tabular data; images/spectra. Non-tabular data cannot have standards imposed
  • We are NOT making MAST, we’re making something much smaller/easier - quick and dirty. If a grad student is leaving astronomy and wants to dump 100 spectra first we should have a simple mechanism to ingest that - better to have data that’s too easy to ingest than too hard to ingest
  • We want low barrier to entry - but that requires some marking of what things a user should choose to trust - perhaps there needs to be some quality criterion, which can have some user input. Collaborative quality flags. e.g. for spectra - important to know if it’s flux calibrated or not, and if it’s normalized to something.
  • specutils is now an active area of funded development in astropy - we should be careful to ?
  • If I am a grad student leaving the field, what do I have to upload? Need a validator of some sort?
    • data files
    • ascii file w/ metadata
    • minimum requirements but with flexibility
    • ability to submit an object without all of the spectral data, so that a third party can at least note that an object exists
    • same object ID
  • out of the scope of this work for now - time series data
  • out of the scope of this work for now - x vs y / deeper analysis
  • VOSA automatically generates a bibtex file from citation information - no excuse not to cite the correct tool
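
The validator idea raised in the list above could be sketched like this. Everything specific here is an assumption for illustration: the `key: value` ASCII format, the required fields, and the allowed values for `flux_calibrated` are hypothetical, not an agreed standard.

```python
# Hypothetical minimum requirements; everything else is optional.
REQUIRED_FIELDS = {"object_id", "reference"}

# Hypothetical dictionary of allowed entries for controlled fields.
ALLOWED_VALUES = {"flux_calibrated": {"yes", "no", "unknown"}}


def parse_metadata(text):
    """Parse 'key: value' lines; blank lines and '#' comment lines are skipped."""
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip().lower()] = value.strip()
    return metadata


def validate_submission(metadata):
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []
    for field in sorted(REQUIRED_FIELDS - set(metadata)):
        problems.append(f"missing required field: {field}")
    for field, allowed in ALLOWED_VALUES.items():
        if field in metadata and metadata[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    return problems
```

Because only `object_id` and `reference` are required in this sketch, an object can be submitted without any spectral data, which matches the point above about letting a third party note that an object exists.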

NEXT - top three use case scenarios -- what three science cases do we want to enable.

What do we want to enable (in no particular order)

  1. Data discovery - e.g. how many type T or later are within 20pc, can I see a CMD of all objects in some criterion, what subset of type L0 have spectral/imaging data, etc
  2. Standardized access to data - allows someone to directly overplot spectra from groups A and B, which were generated slightly differently
     a. Retrievals - person X who is developing a retrieval code should be able to get a spectrum that is ready for retrieval analysis
  3. Data sharing.
     a. Sharing of JWST spectra
     b. Gaia kinematics/multiplicity? But this is already fairly open
     c. Reduced Keck/VLT/Magellan/Subaru data - for these, the data is closed unless we explicitly open it, e.g. only raw data available in public archives
  4. Community Curation.
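
As an illustration of the data discovery case ("how many type T or later are within 20pc"), here is a minimal sketch against SQLite. The `Sources` columns `spt_code` and `parallax_mas`, and the numeric spectral type encoding (L0 = 70, T0 = 80, Y0 = 90), are assumptions made for this example only.

```python
import sqlite3

T0_CODE = 80.0  # hypothetical encoding: L0 = 70, T0 = 80, Y0 = 90


def count_late_nearby(conn, min_spt_code=T0_CODE, max_distance_pc=20.0):
    """Count sources at or later than min_spt_code within max_distance_pc.

    Distance in parsecs is 1000 / parallax in milliarcseconds, so a distance
    limit becomes a lower limit on parallax.
    """
    min_parallax_mas = 1000.0 / max_distance_pc
    row = conn.execute(
        "SELECT COUNT(*) FROM Sources WHERE spt_code >= ? AND parallax_mas >= ?",
        (min_spt_code, min_parallax_mas),
    ).fetchone()
    return row[0]
```

Sources without a parallax (NULL) are excluded by the comparison, so this counts only objects with an astrometric distance.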

What do we want to include?

  • Important to allow flexibility - don’t want to split different subsets of data, e.g. young vs field objects, BUT we need clear guidelines on what we’re focussing on persuading people to include, and what people should expect to be able to extract.
  • Stand-alone BDs vs exoplanets with host stars vs late-type binaries - BDNYC database has a field which is a list of companion object IDs. But multiple systems aren’t the priority.
  • Web vs API oriented? - Kelle says API oriented because that allows both - web interface is a front end for a background API.

Where can non-tabular data live? It needs to be in a proper archive, not an Amazon web server with no promise of future-proofing. BDNYC is currently in a mix of Amazon web server/CUNYcommons. Amelia mentioned that you need a level of security if people are allowed to upload data. Need permalinks to the data, which is a difficult problem. Perhaps the best solution is to go through the astronomy data centers - MAST/Vizier? Doesn’t need to be a uniform solution as long as things are being stored somewhere - but we can begin to establish the connections with the databases. This is an unsolved problem that we should continue to think about.

Where can we revisit this/where do the people who we want to have using this go?
Cool Stars 21
Exoplanets III
Spirit of Lyot isn’t for several years
OWL summer school
ADASS
telecons will probably get us pretty far?
Potentially Exoclimes

Kelle proposes
Kelle schedules a meeting with David, who is partly funded to do this, and is likely to be our lead developer
we all schedule a telecon, where David can ask a bunch of questions

Vision is for this to just provide the data, and a package like SPLAT or private codes are for data analysis scripts etc.

Existing infrastructure will motivate people to upload their data - but also there needs to be a carrot? Need to be motivating individual projects to use these things.

Existing tools - pros and cons

Over lunch - think about names. Branding is important - need to make sure it’s explicitly community wide
SIMPLE archive of directly imaged planets and BDs
Substellar and IMaged PLanet Explorer.
“A Simple archive of complex objects.”