# Reading SP Data

The Spectral Profiler data is available in archives with `.sl2` file endings.  These files are tar archives, so Python's `tarfile` module can open them.

In [5]:
import tarfile
import glob

import sys
sys.path.insert(0, "/data/plio/")
import plio

## Data Extraction
Only the `.spc` file is truly needed here.  We want to keep a link to the underlying jpg, but those simply have the same file name with `.jpg` attached.  I also wonder how much value there is in keeping those around at all.  Something to decide in the future.

In [19]:
files = glob.glob('*.sl2')
for f in files:
    # Open each archive and extract only the .spc members
    with tarfile.open(f) as tar:
        for m in tar.getmembers():
            if '.spc' in m.name:
                tar.extract(m)
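
The extraction step can also be wrapped in a small reusable function. The sketch below is only an illustration: `extract_spc` and its `dest` parameter are hypothetical names, not part of plio, and it assumes the `.spc` members can be identified by their file suffix.

```python
import tarfile


def extract_spc(archive_path, dest="."):
    """Extract only the .spc members of one archive and return their names.

    Hypothetical helper for illustration; not part of plio.
    """
    extracted = []
    # The context manager closes the archive even if extraction fails
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # Skip directories and anything that is not an .spc file
            if member.isfile() and member.name.endswith(".spc"):
                tar.extract(member, path=dest)
                extracted.append(member.name)
    return extracted
```

Matching on the suffix with `endswith` is slightly stricter than a substring test, so a member such as `foo.spc.jpg` would be skipped.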


In [8]:
from plio.io.io_spectral_profiler import Spectral_Profiler

In [9]:
s = Spectral_Profiler('SP_2C_02_05494_N704_E2143.spc')

In [15]:
s.label

PVLModule([
  ('PDS_VERSION_ID', 'PDS3')
  ('RECORD_TYPE', 'UNDEFINED')
  ('FILE_NAME', 'SP_2C_02_05494_N704_E2143.spc')
  ('PRODUCT_ID', 'SP_2C_02_05494_N704_E2143')
  ('DATA_FORMAT', 'PDS')
  ('^ANCILLARY_AND_SUPPLEMENT_DATA', Units(value=24730, units='BYTES'))
  ('^SP_SPECTRUM_WAV', Units(value=31038, units='BYTES'))
  ('^SP_SPECTRUM_RAW', Units(value=31630, units='BYTES'))
  ('^SP_SPECTRUM_REF2', Units(value=54126, units='BYTES'))
  ('^SP_SPECTRUM_RAD', Units(value=76622, units='BYTES'))
  ('^SP_SPECTRUM_REF1', Units(value=99118, units='BYTES'))
  ('^SP_SPECTRUM_QA', Units(value=121614, units='BYTES'))
  ('^L2D_RESULT_ARRAY', Units(value=144110, units='BYTES'))
  ('SOFTWARE_NAME', 'RGC_SP')
  ('SOFTWARE_VERSION', '2.10.3')
  ('PROCESS_VERSION_ID', 'L2C')
  ('PRODUCT_CREATION_TIME',
   datetime.datetime(2012, 4, 20, 16, 24, 43, tzinfo=<UTC>))
  ('PROGRAM_START_TIME', datetime.datetime(2012, 4, 20, 16, 19, 31, tzinfo=<UTC>))
  ('PRODUCER_ID', 'LISM')
  ('PRODUCT_SET_ID', 'SP_Level2C'

In [19]:
s.ancillary_data.head(5)

Unnamed: 0,SPACECRAFT_CLOCK_COUNT,VIS_FOCAL_PLANE_TEMPERATURE,NIR1_FOCAL_PLANE_TEMPERATURE,NIR2_FOCAL_PLANE_TEMPERATURE,SPECTROMETER_TEMPERATURE_1,SPECTROMETER_TEMPERATURE_2,SPECTROMETER_TEMPERATURE_3,SPECTROMETER_TEMPERATURE_4,HALOGEN_BULB_RADIANCE,HALOGEN_BULB_VOLTAGE1,...,CALIBRATION,SP_PELTIER,TC_MI_STATUS,CLOCK_COUNT_ERR_FLAG,SPATIAL_RESOLUTION_FLAG,GEOMETRIC_INFO_RECAL_FLAG,SUPPORT_IMAGE_LINE_POSITION,SUPPORT_IMAGE_COLUMN_POSITION,THUMBNAIL_LINE_POSITION,THUMBNAIL_COLUMN_POSITION
0,914831700.0,13.48,10.33,243.0,10.41,14.3,10.81,17.049999,4.759,4.759,...,0,1,1,0,66,67,27,478,13,230
1,914831700.0,13.48,10.33,243.0,10.41,14.3,10.81,17.049999,4.759,4.759,...,0,1,1,0,66,67,55,478,27,230
2,914831700.0,13.48,10.25,243.0,10.41,14.3,10.81,17.049999,4.759,4.759,...,0,1,1,0,66,67,83,478,40,230
3,914831700.0,13.48,10.33,243.0,10.41,14.3,10.81,17.049999,4.759,4.759,...,0,1,1,0,66,67,112,478,53,230
4,914831700.0,13.48,10.33,243.0,10.41,14.3,10.81,17.049999,4.759,4.759,...,0,1,1,0,66,67,140,478,66,230


In [23]:
s.spectra[0].head(5)

Unnamed: 0,RAW,REF1,REF2,QA
512.6,3639.0,0.0,0.0,393.0
518.4,3631.0,0.0,0.0,393.0
524.7,3639.0,0.0,0.0,385.0
530.4,3632.0,0.0015,6.5531,393.0
536.5,3647.0,0.0015,6.5531,385.0


## Spectra
The spectra are stored as a pandas Panel. This particular data collection has 38 individual spectra, each sampled at 269 wavelengths, with 4 quantities recorded per wavelength (RAW, REF1, REF2, QA).  The ancillary data contains 38 rows that map one-to-one to the 38 spectral panels.

Therefore, each observation is composed of:

* a panel from the spectra object: `s.spectra[i]`
* an associated row from the ancillary data object: `s.ancillary_data.iloc[i]`
* the file name `s.input_data`
* the image label metadata `s.label`

The four items above are what needs to be stored within the DBs.
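
As a rough sketch of how the four items could be bundled into one record per observation before storage, the function below pairs each panel with its ancillary row and attaches the shared file name and label. `build_observations` and the dictionary keys are hypothetical names chosen for illustration; nothing here is a plio API or a decided DB schema.

```python
def build_observations(panels, ancillary_rows, file_name, label):
    """Pair each spectral panel with its ancillary row and the shared file metadata.

    Hypothetical helper for illustration; the record layout is an assumption.
    """
    if len(panels) != len(ancillary_rows):
        raise ValueError("expected one ancillary row per spectral panel")
    return [
        {
            "index": i,
            "spectrum": panel,
            "ancillary": row,
            "file_name": file_name,
            "label": label,
        }
        for i, (panel, row) in enumerate(zip(panels, ancillary_rows))
    ]


# With the reader object `s`, the inputs could be gathered like this:
# n = len(s.ancillary_data)
# records = build_observations(
#     [s.spectra[i] for i in range(n)],
#     [s.ancillary_data.iloc[i] for i in range(n)],
#     s.input_data,
#     s.label,
# )
```

Keeping the function free of pandas-specific calls means it works with whatever container the spectra end up in.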

In [24]:
s.spectra.shape

(38, 269, 4)

In [25]:
s.ancillary_data.shape

(38, 43)

In [27]:
s.input_data

'SP_2C_02_05494_N704_E2143.spc'