# Tutorial 1: Accessing Datasets

This tutorial demonstrates how to retrieve data using the *datasets* module.

## Datasets Classes
The datasets module contains a collection of classes for managing the loading and saving of data.
Datasets are ingested from various online sources in different formats (e.g. csv, txt, dat, or html). 
Each dataset class contains custom methods specific to the format of the dataset. 
For convenience, all classes also share a common set of method names that standardize loading and saving the data across formats:


| Method          | Description                                                                  |
|:----------------|:-----------------------------------------------------------------------------|
| import_from_url | Load a data table from an online source into a Pandas dataframe.             |
| save_to_csv     | Save the dataframe to a CSV file on the local drive.                         |
| import_from_csv | Load a data table from a saved CSV file into a Pandas dataframe.<br>(If the file does not exist, it is downloaded using the save_to_csv method.) |
| save_to_sql     | Save the dataframe as a table in an SQL database.                            |
| load_from_sql   | Load a data table from an SQL database into a Pandas dataframe.              |
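
To make the shared interface concrete, here is a minimal sketch of how a class exposing these five method names could be structured, assuming pandas for CSV I/O and Python's built-in `sqlite3` for the SQL database. The class name `ExampleDataset`, its constructor arguments, the `fetch` callable and the SQL method parameters are all hypothetical; the real *sr_tools* classes have their own format-specific download logic and signatures.

```python
import sqlite3
from pathlib import Path
from typing import Callable

import pandas as pd


class ExampleDataset:
    """Hypothetical dataset class mirroring the shared method names.

    `fetch` is an assumed stand-in for the real download-and-parse logic,
    which differs for every source format (csv, txt, dat, html, ...).
    """

    def __init__(self, csv_path: str, fetch: Callable[[], pd.DataFrame]):
        self.csv_path = Path(csv_path)
        self._fetch = fetch

    def import_from_url(self) -> pd.DataFrame:
        # Download and parse the online table into a dataframe.
        return self._fetch()

    def save_to_csv(self) -> None:
        # Download the table and write it to the local CSV file.
        self.csv_path.parent.mkdir(parents=True, exist_ok=True)
        self.import_from_url().to_csv(self.csv_path, index=False)

    def import_from_csv(self) -> pd.DataFrame:
        # Fall back to downloading (via save_to_csv) when no local copy exists yet.
        if not self.csv_path.exists():
            self.save_to_csv()
        return pd.read_csv(self.csv_path)

    def save_to_sql(self, conn: sqlite3.Connection, table_name: str) -> None:
        # Store the locally cached table in the database, replacing any older copy.
        self.import_from_csv().to_sql(table_name, conn, if_exists="replace", index=False)

    def load_from_sql(self, conn: sqlite3.Connection, table_name: str) -> pd.DataFrame:
        # Read the whole table back into a dataframe.
        return pd.read_sql(f'SELECT * FROM "{table_name}"', conn)
```

Injecting `fetch` keeps the caching logic in `import_from_csv` independent of how each source is parsed, which is the part that varies between dataset classes.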

Note: Some datasets package multiple data tables together in the same class. In these cases, the method names follow the format `import_<table_name>_from_<method>`.
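
Because the method names are built from a fixed pattern, a table-specific method can be looked up by name with `getattr`. The sketch below only illustrates the naming convention: the helper functions and the `TwoTableDemo` class are hypothetical, and the exact `<table_name>` spelling used by each real class is not shown here, so check the class itself before relying on it.

```python
def table_method_name(table_name: str, method: str) -> str:
    """Build a method name following the import_<table_name>_from_<method> pattern."""
    return f"import_{table_name}_from_{method}"


def import_table(dataset: object, table_name: str, method: str = "csv"):
    """Look up and call the table-specific import method on a dataset object."""
    name = table_method_name(table_name, method)
    func = getattr(dataset, name, None)
    if func is None:
        raise AttributeError(f"{type(dataset).__name__} has no method {name!r}")
    return func()


class TwoTableDemo:
    """Hypothetical stand-in for a class that bundles two tables."""

    def import_families_from_csv(self):
        return "families table"

    def import_members_from_csv(self):
        return "members table"


print(import_table(TwoTableDemo(), "members"))  # members table
```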

The following datasets have been implemented (note that some datasets include multiple tables):


| Class Name | Table Name                       |
|:-----------|:---------------------------------|
| SBDB       | JPL_SBDB                         |
| AstDys     | AstDys_Elements                  |
| AstDys     | AstDys_Families                  |
| AstDys     | AstDys_Family_Members            |
| AstDys     | AstDys_Synthetic_Proper_Elements |
| DAMIT      | DAMIT_AstModel                   |
| DAMIT      | DAMIT_Model_Reference            |
| DAMIT      | DAMIT_Reference                  |
| Lowell     | Lowell_Elements                  |
| Lowell     | Lowell_MOID                      |
| MPCORB     | MPCORB_table                     |
| PanSTARRS  | PanSTARRS                        |
| PDS_SBN    | Taxonomy_table                   |
| MITHNEOS   | MITHNEOS                         |
| SDSSMOC    | SDSS_MOC                         |
| LCDB       | LCDB_Summary                     |
| LCDB       | LCDB_Details                     |
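
When you need to know which class to import for a given table, the table above can be encoded as a small lookup. This is a plain-Python sketch built only from the rows listed; the names `IMPLEMENTED_TABLES`, `tables_by_class` and `class_for_table` are illustrative and are not part of *sr_tools*.

```python
from collections import defaultdict

# (class name, table name) pairs copied from the table above.
IMPLEMENTED_TABLES = [
    ("SBDB", "JPL_SBDB"),
    ("AstDys", "AstDys_Elements"),
    ("AstDys", "AstDys_Families"),
    ("AstDys", "AstDys_Family_Members"),
    ("AstDys", "AstDys_Synthetic_Proper_Elements"),
    ("DAMIT", "DAMIT_AstModel"),
    ("DAMIT", "DAMIT_Model_Reference"),
    ("DAMIT", "DAMIT_Reference"),
    ("Lowell", "Lowell_Elements"),
    ("Lowell", "Lowell_MOID"),
    ("MPCORB", "MPCORB_table"),
    ("PanSTARRS", "PanSTARRS"),
    ("PDS_SBN", "Taxonomy_table"),
    ("MITHNEOS", "MITHNEOS"),
    ("SDSSMOC", "SDSS_MOC"),
    ("LCDB", "LCDB_Summary"),
    ("LCDB", "LCDB_Details"),
]


def tables_by_class(pairs):
    """Group table names under the class that provides them."""
    grouped = defaultdict(list)
    for class_name, table_name in pairs:
        grouped[class_name].append(table_name)
    return dict(grouped)


def class_for_table(table_name, pairs=IMPLEMENTED_TABLES):
    """Return the class name that provides `table_name`."""
    for class_name, name in pairs:
        if name == table_name:
            return class_name
    raise KeyError(table_name)


print(class_for_table("Lowell_MOID"))  # Lowell
```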



## Using the Datasets Class
The Datasets class is a wrapper class that provides access to all of the fully implemented datasets.

### load_dataset
The Datasets.load_dataset method loads an individual dataset by name (see the table above).
The additional `version` argument specifies whether to re-download the most up-to-date dataset (`version='today'`) or to load the most recent file saved on your machine (`version='current'`).
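
The difference between the two `version` options can be sketched as follows. This is a hypothetical illustration, not the library's implementation: it assumes each download is saved as a dated snapshot named `<name>_<YYYYMMDD>.csv` in one folder, and the function `load_versioned` and its `download` and `today` parameters are invented for the example.

```python
import datetime as dt
from pathlib import Path
from typing import Callable, Optional

import pandas as pd


def load_versioned(
    data_dir: str,
    name: str,
    download: Callable[[], pd.DataFrame],
    version: str = "current",
    today: Optional[dt.date] = None,
) -> pd.DataFrame:
    """Hypothetical sketch of 'today' vs 'current' loading with dated CSV snapshots."""
    if version not in ("today", "current"):
        raise ValueError("version must be 'today' or 'current'")
    folder = Path(data_dir)
    # Snapshot names embed a zero-padded YYYYMMDD date, so sorting by name sorts by date.
    saved = sorted(folder.glob(f"{name}_*.csv"))
    if version == "current" and saved:
        return pd.read_csv(saved[-1])  # most recent file already on disk
    # 'today', or 'current' with nothing saved yet: download and store a new snapshot.
    df = download()
    stamp = (today or dt.date.today()).strftime("%Y%m%d")
    folder.mkdir(parents=True, exist_ok=True)
    df.to_csv(folder / f"{name}_{stamp}.csv", index=False)
    return df
```

With `version='current'` the first call still downloads, because no snapshot exists yet; later calls reuse the newest file until `version='today'` is requested.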

```python
# Import the Datasets class
from sr_tools.Datasets.Datasets import Datasets
import pandas as pd
pd.options.mode.chained_assignment = None  # Suppress pandas SettingWithCopyWarning messages

# Load the most recent saved version of a dataset into a Pandas dataframe.
# (On the first run no local file exists yet, so the latest version is downloaded from the online source.)
df = Datasets.load_dataset('MPCORB', version='current')
# Show the result
df
```

| | spkid | pdes | number | name | designation | additional_desig | packed_number | orbit_type | neo | km_neo | ... | packed_obs | rms | pert_coarse | pert_precise | packed_hexflag | computer | last_obs_date | first_obs_year | last_obs_year | arc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000001 | 1 | 1 | Ceres | (1) Ceres | A801 AA A899 OF | 00001 | AST | 0 | 0 | ... | 1801-2021 | 0.63 | M-v | 30k | 0000 | Pan | 20211019.0 | 1801.0 | 2021.0 | 80355.00 |
| 1 | 2000002 | 2 | 2 | Pallas | (2) Pallas | A802 FA | 00002 | AST | 0 | 0 | ... | 1804-2021 | 0.58 | M-c | 28k | 0000 | Pan | 20211014.0 | 1804.0 | 2021.0 | 79259.25 |
| 2 | 2000003 | 3 | 3 | Juno | (3) Juno | A804 RA | 00003 | AST | 0 | 0 | ... | 1804-2021 | 0.61 | M-v | 3Ek | 0000 | Pan | 20211001.0 | 1804.0 | 2021.0 | 79259.25 |
| 3 | 2000004 | 4 | 4 | Vesta | (4) Vesta | A807 FA | 00004 | AST | 0 | 0 | ... | 1821-2021 | 0.50 | M-p | 18k | 0000 | Pan | 20210903.0 | 1821.0 | 2021.0 | 73050.00 |
| 4 | 2000005 | 5 | 5 | Astraea | (5) Astraea | A845 XA 1969 SE | 00005 | AST | 0 | 0 | ... | 1845-2021 | 0.70 | M-v | 3Ek | 0000 | Pan | 20211018.0 | 1845.0 | 2021.0 | 64284.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1151411 | 54220010 | 2021 VK19 | | 2021 VK19 | 2021 VK19 | | K21V19K | APO | 1 | 0 | ... | 1 days | 0.40 | M-v | 3Ek | 0803 | Veres | 20211115.0 | | | 1.00 |
| 1151412 | 54220011 | 2021 VL19 | | 2021 VL19 | 2021 VL19 | | K21V19L | APO | 1 | 0 | ... | 1 days | 0.66 | M-v | 3Ek | 0803 | Veres | 20211114.0 | | | 1.00 |
| 1151413 | 3246834 | 6331 P-L | | 6331 P-L | 6331 P-L | 2010 LQ92 2010 RX48 | PLS6331 | AST | 0 | 0 | ... | 1960-2017 | 0.48 | M-v | 38h | 0000 | MPCLINUX | 20171107.0 | 1960.0 | 2017.0 | 20819.25 |
| 1151414 | 3013075 | 6344 P-L | | 6344 P-L | 6344 P-L | 2007 RR9 | PLS6344 | APO | 1 | 0 | ... | 1960-2021 | 0.74 | M-v | 3Ek | 8803 | MPCLINUX | 20210801.0 | 1960.0 | 2021.0 | 22280.25 |
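
Once loaded, the result is an ordinary Pandas dataframe, so the usual pandas tools apply. The sketch below selects near-Earth objects, assuming the `neo` column is a 0/1 flag (as the values shown above suggest). The function name `near_earth_objects` is hypothetical, and the small `sample` frame copies a few rows from the output above so the example runs without downloading anything.

```python
import pandas as pd


def near_earth_objects(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows flagged as NEOs (neo == 1), keeping a few identifying columns."""
    return df.loc[df["neo"] == 1, ["pdes", "name", "orbit_type"]]


# A few rows copied from the output above, standing in for the full dataframe.
sample = pd.DataFrame({
    "pdes": ["1", "2", "2021 VK19", "6331 P-L", "6344 P-L"],
    "name": ["Ceres", "Pallas", "2021 VK19", "6331 P-L", "6344 P-L"],
    "orbit_type": ["AST", "AST", "APO", "AST", "APO"],
    "neo": [0, 0, 1, 0, 1],
})

print(near_earth_objects(sample))
# Count objects per orbit class.
print(sample["orbit_type"].value_counts())
```

On the real data, `near_earth_objects(df)` applies the same filter to the full MPCORB table.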


## Using Individual Dataset Classes

Data can also be accessed directly through the corresponding dataset class.
For example, orbital elements data from the Minor Planet Center can be accessed using the *MPCORB* class.

```python
# Import the relevant class
from sr_tools.Datasets.MPCORB import MPCORB

# Load the dataset into a Pandas dataframe
df = MPCORB().import_from_csv()
# Show the results
df
```

| | spkid | pdes | number | name | designation | additional_desig | packed_number | orbit_type | neo | km_neo | ... | packed_obs | rms | pert_coarse | pert_precise | packed_hexflag | computer | last_obs_date | first_obs_year | last_obs_year | arc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000001 | 1 | 1 | Ceres | (1) Ceres | A801 AA A899 OF | 00001 | AST | 0 | 0 | ... | 1801-2021 | 0.63 | M-v | 30k | 0000 | Pan | 20211019.0 | 1801.0 | 2021.0 | 80355.00 |
| 1 | 2000002 | 2 | 2 | Pallas | (2) Pallas | A802 FA | 00002 | AST | 0 | 0 | ... | 1804-2021 | 0.58 | M-c | 28k | 0000 | Pan | 20211014.0 | 1804.0 | 2021.0 | 79259.25 |
| 2 | 2000003 | 3 | 3 | Juno | (3) Juno | A804 RA | 00003 | AST | 0 | 0 | ... | 1804-2021 | 0.61 | M-v | 3Ek | 0000 | Pan | 20211001.0 | 1804.0 | 2021.0 | 79259.25 |
| 3 | 2000004 | 4 | 4 | Vesta | (4) Vesta | A807 FA | 00004 | AST | 0 | 0 | ... | 1821-2021 | 0.50 | M-p | 18k | 0000 | Pan | 20210903.0 | 1821.0 | 2021.0 | 73050.00 |
| 4 | 2000005 | 5 | 5 | Astraea | (5) Astraea | A845 XA 1969 SE | 00005 | AST | 0 | 0 | ... | 1845-2021 | 0.70 | M-v | 3Ek | 0000 | Pan | 20211018.0 | 1845.0 | 2021.0 | 64284.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1151411 | 54220010 | 2021 VK19 | | 2021 VK19 | 2021 VK19 | | K21V19K | APO | 1 | 0 | ... | 1 days | 0.40 | M-v | 3Ek | 0803 | Veres | 20211115.0 | | | 1.00 |
| 1151412 | 54220011 | 2021 VL19 | | 2021 VL19 | 2021 VL19 | | K21V19L | APO | 1 | 0 | ... | 1 days | 0.66 | M-v | 3Ek | 0803 | Veres | 20211114.0 | | | 1.00 |
| 1151413 | 3246834 | 6331 P-L | | 6331 P-L | 6331 P-L | 2010 LQ92 2010 RX48 | PLS6331 | AST | 0 | 0 | ... | 1960-2017 | 0.48 | M-v | 38h | 0000 | MPCLINUX | 20171107.0 | 1960.0 | 2017.0 | 20819.25 |
| 1151414 | 3013075 | 6344 P-L | | 6344 P-L | 6344 P-L | 2007 RR9 | PLS6344 | APO | 1 | 0 | ... | 1960-2021 | 0.74 | M-v | 3Ek | 8803 | MPCLINUX | 20210801.0 | 1960.0 | 2021.0 | 22280.25 |
