Define the API for the Bolides Package #10

Closed
jcsmithhere opened this issue Jun 10, 2022 · 3 comments

@jcsmithhere
Collaborator

We want a user-friendly API that can be used by casual Python programmers. Here we should lay out the interface and provide some examples of calls to the package. We will iterate until we agree upon a good API.

@anthonyozerov
Collaborator

Main idea:

BolideDataFrame class extending the GeoDataFrame class from GeoPandas (which itself extends DataFrame from Pandas) while providing extra bolide-specific methods. The DataFrame class and all of its built-in methods are quite efficient for this size of data set. The following are the standard, must-have columns that the BolideDataFrame has for basic functionality to work:

  • latitude (degrees)
  • longitude (degrees)
  • datetime
  • geometry (just holding lat and long as Point objects, as in GeoPandas)
    We still need to decide how to encode other useful data (detecting satellite, various metadata from the website, etc.). For now this is left in the format it has on the website.

Initialization:

From website:
bdf = BolideDataFrame() or bdf = BolideDataFrame('website')
From pipeline:
bdf = BolideDataFrame('pipeline', [list of database files or a single file])

The initialization also creates a few new columns:

  • phase: the phase of the moon with 0 being a new moon, 0.5 being a full moon, and 0.999 being just before the next new moon.
  • moon_fullness: distance from a new moon, as a value from 0 to 1 (e.g. 0.002 for a phase of 0.001 or 0.999). There is probably a better term for this…
  • solarhour: the solar time at the time and place of the bolide event.
    More columns can be added if we think there are more things that would be useful.
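
As a rough illustration of how the derived columns relate to each other, here is a minimal sketch. The function names are hypothetical, the fullness formula is inferred from the 0.001/0.999 example above, and the solar hour is the simple mean solar time (UTC plus 15° of longitude per hour), which ignores the equation of time. The actual implementation may differ.

```python
from datetime import datetime


def moon_fullness(phase: float) -> float:
    """Map a lunar phase in [0, 1) to a fullness in [0, 1].

    Assumption: 0 = new moon, 0.5 = full moon, so fullness peaks at 0.5.
    """
    return 1.0 - abs(2.0 * phase - 1.0)


def mean_solar_hour(utc_time: datetime, longitude_deg: float) -> float:
    """Approximate local solar hour (0-24) for a UTC time and a longitude.

    Assumption: utc_time is already in UTC; east longitudes are positive.
    """
    utc_hours = utc_time.hour + utc_time.minute / 60 + utc_time.second / 3600
    # 360 degrees of longitude correspond to 24 hours, i.e. 15 degrees per hour.
    return (utc_hours + longitude_deg / 15.0) % 24.0


fullness = moon_fullness(0.999)
solar_hour = mean_solar_hour(datetime(2020, 6, 13, 12, 0), -90.0)
```

With this definition, phases just before and just after a new moon map to the same small fullness value, which is what makes the column convenient for filtering on moonlit versus dark nights.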

Plotting:

Maps

bdf.plot_detections() plots the points on a cartopy map using matplotlib's scatter with sensible defaults. A cartopy Coordinate Reference System object can be passed in to change the projection and any of matplotlib's scatter arguments can be used as well. A more advanced call would be:
bdf.plot_detections(ccrs.Orthographic(central_longitude=-100), marker='.', c=bdf.duration)

Histograms

plt.hist() is fairly simple and can be called on any column of the BolideDataFrame, but perhaps we can add a method (bdf.histogram?) that produces a useful plot by default while still accepting additional arguments, as plot_detections does.

Filtering:

Regular Pandas filtering, e.g.:

  • Filtering for date:
    bdf[bdf.datetime.between(datetime(2020,6,13),datetime(2020,6,14))]
  • Filtering for detections by only GLM-16:
    bdf[bdf.detectedBy == 'GLM-16']
  • Filtering for detections by GLM-16, including stereo:
    bdf[bdf.detectedBy.str.contains('GLM-16')]
  • Filtering by latitude:
    bdf[bdf.latitude>30]
  • Compound filter for bolides detected only by GLM-16, at a latitude above 40°, and not between November 10 and 20, 2020:
    bdf[(bdf.detectedBy == 'GLM-16') & ~(bdf.datetime.between(datetime(2020,11,10),datetime(2020,11,20))) & (bdf.latitude>40)]
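
To make the filters above concrete, here is a small self-contained sketch that runs them on a plain pandas DataFrame. The four rows are made-up toy data, and the comma-separated value used for a stereo detection is an assumption about how the website encodes it.

```python
from datetime import datetime

import pandas as pd

# Toy data only: these events are invented for illustration.
bdf = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-06-13 04:00", "2020-11-15 10:30",
        "2020-12-01 22:15", "2021-01-05 08:00",
    ]),
    "latitude": [45.0, 50.0, 42.0, 10.0],
    "detectedBy": ["GLM-16", "GLM-16", "GLM-16,GLM-17", "GLM-17"],
})

# Detected by GLM-16 alone.
only_16 = bdf[bdf.detectedBy == "GLM-16"]

# Detected by GLM-16, including stereo detections.
any_16 = bdf[bdf.detectedBy.str.contains("GLM-16")]

# Boolean mask for the excluded date window.
in_window = bdf.datetime.between(datetime(2020, 11, 10), datetime(2020, 11, 20))

# GLM-16 only, above 40 degrees latitude, outside the window.
compound = bdf[(bdf.detectedBy == "GLM-16") & ~in_window & (bdf.latitude > 40)]
```

Note that `between` here is the pandas `Series.between` method rather than part of Python's `datetime` class. It includes both endpoints by default, and `datetime(2020, 11, 20)` means midnight at the start of that day, so events later on November 20 fall outside the window.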

Additional bolide-specific filter methods can be made for some of the filters which are a little clunky (e.g. dates):
bdf.filter_date_after('2020-06-13')
bdf.filter_date_before('2020-06-13')
bdf.filter_date_between('2020-06-13','2020-06-20')
(Any ISO-format string that datetime can parse can be passed to these.)
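
A minimal sketch of how such convenience methods could sit on top of the pandas filtering shown earlier. The class name `BolideFrameSketch` is hypothetical and it subclasses a plain pandas DataFrame instead of a GeoDataFrame to keep the example small. Whether the endpoints are inclusive is an assumption, since it has not been decided above.

```python
from datetime import datetime

import pandas as pd


class BolideFrameSketch(pd.DataFrame):
    """Hypothetical stand-in for BolideDataFrame with date helpers."""

    @property
    def _constructor(self):
        # Makes pandas operations return this subclass, so calls can be chained.
        return BolideFrameSketch

    def filter_date_after(self, start: str):
        # Assumption: strictly after the given ISO-format date.
        return self[self["datetime"] > datetime.fromisoformat(start)]

    def filter_date_before(self, end: str):
        # Assumption: strictly before the given ISO-format date.
        return self[self["datetime"] < datetime.fromisoformat(end)]

    def filter_date_between(self, start: str, end: str):
        # Series.between includes both endpoints by default.
        return self[self["datetime"].between(
            datetime.fromisoformat(start), datetime.fromisoformat(end))]


# Toy data only.
bdf = BolideFrameSketch({
    "datetime": pd.to_datetime([
        "2020-06-12 23:00", "2020-06-13 12:00",
        "2020-06-18 00:00", "2020-06-25 06:00",
    ]),
    "latitude": [10.0, 20.0, 30.0, 40.0],
})

week = bdf.filter_date_between("2020-06-13", "2020-06-20")
chained = bdf.filter_date_after("2020-06-13").filter_date_before("2020-06-20")
```

Because each method returns the same class, the filters can be chained, and the result still supports ordinary pandas indexing.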

Light curves

Light curves can be handled as a column ('lightcurve') containing a list of LightCurve objects (from lightkurve) for each bolide. Not sure if this is the best way…
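
To show what a list-valued column looks like in practice, here is a small sketch. `CurveStub` is a hypothetical placeholder for a lightkurve LightCurve object, used so the example runs without lightkurve installed, and the numbers are invented.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class CurveStub:
    """Hypothetical stand-in for a lightkurve LightCurve."""
    satellite: str
    time: list
    flux: list


# Toy data only: one stereo event with two curves, one single-satellite event.
bdf = pd.DataFrame({
    "detectedBy": ["GLM-16,GLM-17", "GLM-16"],
    "lightcurve": [
        [CurveStub("GLM-16", [0.0, 0.1], [1.0, 3.0]),
         CurveStub("GLM-17", [0.0, 0.1], [0.8, 2.5])],
        [CurveStub("GLM-16", [0.0, 0.1], [2.0, 2.2])],
    ],
})

# Each cell holds a Python list, so per-event work goes through apply or a loop.
curves_per_event = bdf["lightcurve"].apply(len)
first_event_satellites = [c.satellite for c in bdf.loc[0, "lightcurve"]]
```

The trade-off is that pandas stores such a column with object dtype, so vectorized operations and simple file formats like CSV do not handle it directly.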

Implementation

All of this (except for the lightcurve part) has been implemented for feasibility testing purposes and seems to work.

@jcsmithhere
Collaborator Author

jcsmithhere commented Jun 10, 2022

Hi @anthonyozerov !

Thanks for creating this excellent summary. Some comments:

  • I would add to the number of required fields. Detecting satellite is very important. Data provenance is very important (i.e. whether the data was pulled from the website or directly from the pipeline, the time when it was pulled, etc.). There might be more that are an absolute necessity for the data to be useful.
  • For the object initialization, we should also have an option to load from a local data file, so the user does not need to re-read the data from the website.
  • I don't see the difference between "phase" and "moon_fullness". However having fields for both phase of moon and local solar time for each detection is good.
  • For the map plotting, in the tutorial provide numerous examples of the extra configuration parameters. The typical user will not be an adept cartopy or matplotlib user, and we want this tool to help invite the general meteor science community to use our data.
  • A bdf.histogram method would be very good. It would be customized specifically to look at the bolide data. This means the date tick marks and other formatting will work well out of the box for the bolide data. Also, have quick methods to plot only certain times or other cuts on the data. Again, we don't want to assume the user knows how to set up the matplotlib histogram on their own.
  • Provide a way to show all fields in the data frame with short descriptions. Yes, df.keys() shows all the keys, but we want short descriptions of the keys.
  • I'm not aware of the datetime.between method. Is that a pandas extension to the datetime class?
  • We do want some good tools to plot and process the individual light curves. We want a plotting tool to plot the light curves for each satellite. Lots of options here. We are currently working on generating calibrated light curves, so we will have multiple light curves per bolide: both the GLM-reported and the bolide-calibrated light curves. We want to set up our tools to plan for this extension. We might even want to be able to import data from other data sources to compare all light curves for each event, but that's a more long-term goal for the package.
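
One possible way to provide the short field descriptions requested above is a plain lookup table plus a small formatter. The names `FIELD_DESCRIPTIONS` and `describe_fields` are hypothetical, and the wording of each description is only a paraphrase of this thread.

```python
# Hypothetical description table; the text paraphrases the discussion above.
FIELD_DESCRIPTIONS = {
    "latitude": "Latitude of the detection (degrees)",
    "longitude": "Longitude of the detection (degrees)",
    "datetime": "Date and time of the bolide event",
    "geometry": "Latitude and longitude as a Point object",
    "detectedBy": "Satellite(s) that detected the event",
    "phase": "Moon phase: 0 = new moon, 0.5 = full moon",
    "moon_fullness": "Distance from a new moon, from 0 to 1",
    "solarhour": "Solar time at the time and place of the event",
}


def describe_fields(columns):
    """Return one aligned 'name  description' line per column."""
    width = max((len(name) for name in columns), default=0)
    return [
        f"{name.ljust(width)}  "
        f"{FIELD_DESCRIPTIONS.get(name, '(no description yet)')}"
        for name in columns
    ]


for line in describe_fields(["latitude", "phase", "duration"]):
    print(line)
```

Keeping the descriptions in one table means a column without an entry shows up visibly as undocumented instead of being silently skipped.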

We want this user-friendly tool to be a gateway that shows the meteor science community the utility of our data set and how easy it is to use the data with this package. Right now, we have this huge data set and we want to begin "selling" it.

This is a great start! Begin to provide some examples of the above in the other issue tickets.

@jcsmithhere
Collaborator Author

The API looks good to me. I'm sure we'll continue to modify things, but for the sake of this ticket, an API has been defined and implemented. We can close the ticket.
