# 1. Introduction
## 3 - Overview of available files


The goal of this notebook is to explore the Kepler/K2 telemetry.

In [1]:
import numpy as np
import pandas as pd

In [2]:
import glob
from natsort import natsorted

What does the directory structure look like?

In [3]:
! tree -d ../data/

../data/
├── AED
│   ├── K2
│   │   ├── attitude
│   │   └── thermal
│   └── Kepler
│       ├── attitude
│       └── thermal
└── notAed
    ├── K2
    │   ├── attitude
    │   │   ├── c0
    │   │   │   ├── 2ndhalf
    │   │   │   └── allgood
    │   │   ├── c1
    │   │   ├── c10b
    │   │   ├── c11
    │   │   ├── c12
    │   │   ├── c13
    │   │   ├── c14
    │   │   ├── c15
    │   │   ├── c16
    │   │   ├── c17
    │   │   ├── c18
    │   │   ├── c19
    │   │   ├── c2
    │   │   ├── c3
    │   │   ├── c4
    │   │   ├── c5
    │   │   ├── c6
    │   │   ├── c7
    │   │   ├── c8
    │   │   ├── c9a
    │   │   └── c9b
    │   └── thermal
    └── Kepler
        ├── attitude
        └── thermal

37 directories


We can collect and sort the text file names in preparation for reading the files programmatically.

In [4]:
txt_files = glob.glob('../data/AED/K2/thermal/*.txt')
txt_files = natsorted(txt_files)
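Natural sorting matters here because the campaign labels mix letters and digits: a plain lexicographic sort would place `C10b` ahead of `C2`. The sketch below illustrates the idea with the standard library only. The file names are hypothetical stand-ins, and `natural_key` is an illustrative helper, not part of `natsort`.

```python
import re

def natural_key(text):
    """Split text into digit and non-digit chunks so that numbers compare numerically."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r'(\d+)', text)]

# Hypothetical file names mimicking the <campaign>_<suffix>.txt pattern
names = ['C10b_BoardTemperatures.txt',
         'C2_BoardTemperatures.txt',
         'C1_BoardTemperatures.txt']

plain_sorted = sorted(names)                     # lexicographic: C10b sorts before C1 and C2
natural_sorted = sorted(names, key=natural_key)  # numeric-aware: C1, C2, C10b
```

In the notebook itself, `natsorted` plays the role of `sorted(..., key=natural_key)`.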

There are four distinct file types per campaign, identified by their suffixes:

In [5]:
# Keep everything after the first underscore and drop the file extension
suffixes = [fn.split('_', maxsplit=1)[-1].split('.')[0] for fn in txt_files]
uniq_suffixes = list(set(suffixes))
uniq_suffixes

['BoardTemperatures',
 'TelescopeTemperatureTH_2',
 'TelescopeTemperatureTH_1',
 'TelescopeTemperaturePED']
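As a small illustration of the suffix extraction, here is a sketch that uses `pathlib` on a few hypothetical paths following the `<campaign>_<suffix>.txt` pattern. The helper name `file_suffix` is an assumption made for this example. Sorting the set makes the order reproducible, since a bare `set` has no guaranteed ordering.

```python
from pathlib import Path

# Hypothetical paths following the <campaign>_<suffix>.txt pattern
paths = [
    '../data/AED/K2/thermal/C0_BoardTemperatures.txt',
    '../data/AED/K2/thermal/C0_TelescopeTemperatureTH_1.txt',
    '../data/AED/K2/thermal/C1_BoardTemperatures.txt',
    '../data/AED/K2/thermal/C1_TelescopeTemperatureTH_1.txt',
]

def file_suffix(path):
    """Return the part of the file name after the first underscore, without the extension."""
    stem = Path(path).stem                 # e.g. 'C0_TelescopeTemperatureTH_1'
    return stem.split('_', maxsplit=1)[1]  # split once so 'TH_1' stays intact

# sorted() gives a stable order for the unique suffixes
example_suffixes = sorted({file_suffix(p) for p in paths})
```

Splitting only on the first underscore is what keeps suffixes such as `TelescopeTemperatureTH_1` in one piece.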

Let's make dataframes for all of these.  The files do not all have the same time sampling, so we will gather the dataframes into a holder dictionary, rather than merge them into a single DataFrame object.

In [6]:
campaign_dict = {'C0':0,'C1':1,'C2':2,'C3':3,'C4':4,'C5':5,'C6':6,'C7':7,'C8':8,'C9a':91,'C9b':92,'C10b':101,'C11a':111,'C11b':112,
                 'C12':12,'C13':13,'C14':14,'C15':15,'C16':16,'C17':17,'C18':18,'C19':19}
campaigns = list(campaign_dict.keys())

Reading in all the files takes roughly 10 to 15 seconds of wall time.

In [7]:
%%time
holder = {}
for suffix in uniq_suffixes:
    frames = []  # one dataframe per campaign for this file type
    for campaign in campaigns:
        fn = '../data/AED/K2/thermal/{}_{}.txt'.format(campaign, suffix)
        # Rows 0-4 and 6 are skipped, so row 5 supplies the column names
        df = pd.read_csv(fn, skiprows=[0,1,2,3,4,6], sep='|')
        df['campaign'] = campaign_dict[campaign]
        frames.append(df)
    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    holder[suffix] = pd.concat(frames, ignore_index=True)

CPU times: user 36.1 s, sys: 11.5 s, total: 47.6 s
Wall time: 12.8 s
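To see what `skiprows=[0,1,2,3,4,6]` does, here is a minimal sketch on an in-memory file. The layout is an assumption made for illustration (five preamble lines, a header row, then a ruler line), not a copy of the real telemetry files, and the numeric values are made up.

```python
import io
import pandas as pd

# Hypothetical layout (an assumption): rows 0-4 are preamble,
# row 5 holds the column names, row 6 is a ruler line, data follow.
fake_file = io.StringIO(
    "preamble line 0\n"
    "preamble line 1\n"
    "preamble line 2\n"
    "preamble line 3\n"
    "preamble line 4\n"
    "MJD|LC|SC|PEDDRV1T\n"
    "---|--|--|--------\n"
    "56728.0|1|30|-3.5\n"
    "56728.1|2|60|-3.4\n"
)

# Skipping rows 0-4 and 6 leaves row 5 as the header, followed by the data rows
demo = pd.read_csv(fake_file, skiprows=[0, 1, 2, 3, 4, 6], sep='|')
demo['campaign'] = 0  # tag every row with a numeric campaign code
```

With this layout, `demo` ends up with the four pipe-delimited columns plus the added `campaign` column.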


Each dictionary entry holds one dataframe.  Because the dataframes are sampled at different times, their rows cannot be matched one-to-one on the time axis.  A future notebook will merge them onto a common time base.
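One possible way to combine tables with different time sampling is a nearest-neighbor join with `pd.merge_asof`. The sketch below uses small synthetic tables, and the 0.05-day tolerance is an arbitrary choice for illustration. It is not necessarily the approach taken in the later notebook.

```python
import pandas as pd

# Synthetic stand-ins for two telemetry tables with different cadences
board = pd.DataFrame({'MJD': [1.00, 1.10, 1.20, 1.30],
                      'PEDDRV1T': [10.0, 10.1, 10.2, 10.3]})
scope = pd.DataFrame({'MJD': [1.02, 1.21],
                      'TH1SPIDT': [-20.0, -19.5]})

# Both frames must be sorted on the key column.
# tolerance is in the same units as MJD (days); rows with no match get NaN.
merged = pd.merge_asof(board.sort_values('MJD'), scope.sort_values('MJD'),
                       on='MJD', direction='nearest', tolerance=0.05)
```

Rows of `board` with no `scope` sample within the tolerance keep a missing value in `TH1SPIDT`, which makes gaps in the coarser table easy to spot.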

In [8]:
holder.keys()

dict_keys(['BoardTemperatures', 'TelescopeTemperatureTH_2', 'TelescopeTemperatureTH_1', 'TelescopeTemperaturePED'])

In [9]:
type(holder['BoardTemperatures'])

pandas.core.frame.DataFrame

Let's examine the various dataframes.

In [10]:
for suffix in uniq_suffixes:
    print("{}: {} \n".format(suffix, holder[suffix].shape), 
          "    >>>> {} \n".format(holder[suffix].columns.values))

BoardTemperatures: (1100448, 14) 
     >>>> ['MJD' 'LC' 'SC' 'PEDDRV1T' 'PEDACQ1T' 'PEDDRV2T' 'PEDACQ2T' 'PEDDRV3T'
 'PEDACQ3T' 'PEDDRV4T' 'PEDACQ4T' 'PEDDRV5T' 'PEDACQ5T' 'campaign'] 

TelescopeTemperatureTH_2: (1121179, 8) 
     >>>> ['MJD' 'LC' 'SC' 'TH1SPIDT' 'TH2SPIDT' 'TH1TELET' 'TH2TELET' 'campaign'] 

TelescopeTemperatureTH_1: (1121182, 6) 
     >>>> ['MJD' 'LC' 'SC' 'TH1SCMNTT' 'TH2SCMNTT' 'campaign'] 

TelescopeTemperaturePED: (1100599, 13) 
     >>>> ['MJD' 'LC' 'SC' 'PEDCRRT1' 'PEDCRRT2' 'PEDCRRT3' 'PEDCRRT4' 'PEDPMAT1'
 'PEDPMAT2' 'PEDPMAT3' 'PEDPMAT4' 'PEDTELMNTT1' 'campaign'] 



All four dataframes have over 1 million rows.  Wow!  Let's dig into these temperature data more closely in the next notebooks.