# Data Preparation
Now that I have the data, I need to clean it so that I can explore it and build models on it. That requires understanding what each column represents and which columns I will need in the final product.

In [1]:
# Data Science Libraries
import pandas as pd

# Import my own functions
import wrangle

# Block Warning Boxes
import warnings
warnings.filterwarnings("ignore")

# Remove Limits On Viewing Dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Because of the time constraint on this project, I will reduce my data to just the `twins_raw` data for now. I will keep the code for exploring the rest of the data, including `twins_rawevent` and `twins_config`, for another time.

#### TWINS raw data
>This data product contains TWINS raw science data downloaded from the spacecraft in
continuous mode. This includes Air Temperature Sensor PT1000 raw values, Wind Sensor
counters and temperatures, and ASIC temperature. 

In [2]:
# Acquiring the data using functions in wrangle
df = wrangle.twins_raw_data()

Reading Data from local file...


In [3]:
# How big is this dataframe?
df.shape

(2350436, 63)

In [4]:
# What does it look like?
df.head()

Unnamed: 0,AOBT,SCLK,LMST,LTST,UTC,BMY_2L_TEMP_1,BMY_2L_TEMP_2,BMY_2L_TEMP_3,BMY_2L_TEMP_4,BMY_2L_TEMP_4_AVERAGE,BMY_2L_TEMP_4_STD,BMY_2L_TEMP_5,BMY_2L_TEMP_6,BMY_AIR_TEMP,BMY_AIR_TEMP_AVERAGE,BMY_AIR_TEMP_STD,BMY_WD_REF_OUT_1,BMY_WD_REF_OUT_2,BMY_WD_REF_OUT_3,BMY_WD_OUT_1,BMY_WD_OUT_2,BMY_WD_OUT_3,BMY_WD_OUT_4,BMY_WD_OUT_5,BMY_WD_OUT_6,BMY_WD_OUT_7,BMY_WD_OUT_8,BMY_WD_OUT_9,BMY_WD_OUT_10,BMY_WD_OUT_11,BMY_WD_OUT_12,BMY_WIND_FREQUENCY,BMY_AIR_TEMP_FREQUENCY,BMY_ASIC_TEMP,BPY_2L_TEMP_1,BPY_2L_TEMP_2,BPY_2L_TEMP_3,BPY_2L_TEMP_4,BPY_2L_TEMP_5,BPY_2L_TEMP_5_AVERAGE,BPY_2L_TEMP_5_STD,BPY_2L_TEMP_6,BPY_AIR_TEMP,BPY_AIR_TEMP_AVERAGE,BPY_AIR_TEMP_STD,BPY_WD_REF_OUT_1,BPY_WD_REF_OUT_2,BPY_WD_REF_OUT_3,BPY_WD_OUT_1,BPY_WD_OUT_2,BPY_WD_OUT_3,BPY_WD_OUT_4,BPY_WD_OUT_5,BPY_WD_OUT_6,BPY_WD_OUT_7,BPY_WD_OUT_8,BPY_WD_OUT_9,BPY_WD_OUT_10,BPY_WD_OUT_11,BPY_WD_OUT_12,BPY_WIND_FREQUENCY,BPY_AIR_TEMP_FREQUENCY,BPY_ASIC_TEMP
0,596876952.0,596861200.0,00004M06:46:33.826,00004 06:05:41,2018-334T14:46:55.755Z,-4353.0,-4645.0,-4778.0,-5006.0,,,-4512.0,1002.0,-5703.0,,,4369.0,5167.0,5599.0,648.0,662.0,648.0,1687.0,3360.0,1109.0,1019.0,1229.0,868.0,2451.0,760.0,663.0,1.0,1.0,8485.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,596876953.0,596861200.0,00004M06:46:34.799,00004 06:05:42,2018-334T14:46:56.755Z,-4415.0,-4723.0,-4870.0,-5099.0,,,-4600.0,936.0,-5776.0,,,4274.0,5108.0,5524.0,867.0,379.0,828.0,1033.0,10.0,551.0,578.0,324.0,637.0,418.0,526.0,492.0,1.0,1.0,8488.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,596876954.0,596861200.0,00004M06:46:35.772,00004 06:05:43,2018-334T14:46:57.755Z,-4424.0,-4727.0,-4861.0,-5107.0,,,-4613.0,930.0,-5803.0,,,4301.0,5100.0,5532.0,713.0,521.0,641.0,1085.0,1194.0,855.0,807.0,918.0,859.0,1233.0,741.0,661.0,1.0,1.0,8491.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,596876955.0,596861200.0,00004M06:46:36.746,00004 06:05:44,2018-334T14:46:58.755Z,-4409.0,-4722.0,-4859.0,-5109.0,,,-4603.0,916.0,-5771.0,,,4297.0,5111.0,5539.0,687.0,578.0,662.0,1028.0,1254.0,861.0,857.0,991.0,879.0,1190.0,717.0,689.0,1.0,1.0,8494.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,596876956.0,596861200.0,00004M06:46:37.719,00004 06:05:45,2018-334T14:46:59.755Z,-4424.0,-4728.0,-4873.0,-5107.0,,,-4610.0,922.0,-5754.0,,,4296.0,5113.0,5535.0,438.0,284.0,599.0,862.0,1085.0,748.0,969.0,1049.0,813.0,1097.0,417.0,443.0,1.0,1.0,8498.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [5]:
# What do my columns look like
df.info(show_counts=True)  # `null_counts` was deprecated in pandas 1.2 in favor of `show_counts`

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2350436 entries, 0 to 4443
Data columns (total 63 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  --------------    -----  
 0   AOBT                    2350436 non-null  float64
 1   SCLK                    2350436 non-null  float64
 2   LMST                    2350436 non-null  object 
 3   LTST                    2350436 non-null  object 
 4   UTC                     2350436 non-null  object 
 5   BMY_2L_TEMP_1           1440001 non-null  float64
 6   BMY_2L_TEMP_2           1440001 non-null  float64
 7   BMY_2L_TEMP_3           1440001 non-null  float64
 8   BMY_2L_TEMP_4           707662 non-null   float64
 9   BMY_2L_TEMP_4_AVERAGE   732324 non-null   float64
 10  BMY_2L_TEMP_4_STD       732324 non-null   float64
 11  BMY_2L_TEMP_5           1440001 non-null  float64
 12  BMY_2L_TEMP_6           1440001 non-null  float64
 13  BMY_AIR_TEMP            707662 non-null   float64
 14  BMY_A

In [6]:
# I can see I have some missing values, let's see how many in each column:
df.isnull().sum()

AOBT                            0
SCLK                            0
LMST                            0
LTST                            0
UTC                             0
BMY_2L_TEMP_1              910435
BMY_2L_TEMP_2              910435
BMY_2L_TEMP_3              910435
BMY_2L_TEMP_4             1642774
BMY_2L_TEMP_4_AVERAGE     1618112
BMY_2L_TEMP_4_STD         1618112
BMY_2L_TEMP_5              910435
BMY_2L_TEMP_6              910435
BMY_AIR_TEMP              1642774
BMY_AIR_TEMP_AVERAGE      1618112
BMY_AIR_TEMP_STD          1618112
BMY_WD_REF_OUT_1           910435
BMY_WD_REF_OUT_2           910435
BMY_WD_REF_OUT_3           912043
BMY_WD_OUT_1               910435
BMY_WD_OUT_2               910435
BMY_WD_OUT_3               910435
BMY_WD_OUT_4               910435
BMY_WD_OUT_5               911459
BMY_WD_OUT_6               910435
BMY_WD_OUT_7               910435
BMY_WD_OUT_8               910435
BMY_WD_OUT_9               910435
BMY_WD_OUT_10              910435
BMY_WD_OUT_11 

Looks like I have a lot of missing data to look into. Before I do that, though, I'd like to drop some columns so my data is simpler to explore.  
I should create a data dictionary first so I don't delete anything important!
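
Raw null counts are hard to compare across 63 columns, so here is a minimal sketch of turning them into percentages. The `toy` dataframe and its values are made up for illustration; on the real data the same expression would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the TWINS dataframe (values are invented for illustration)
toy = pd.DataFrame({
    "AOBT": [1.0, 2.0, 3.0, 4.0],
    "BMY_AIR_TEMP": [-5703.0, np.nan, np.nan, -5754.0],
    "BPY_AIR_TEMP": [np.nan, np.nan, np.nan, -5600.0],
})

# isnull() gives booleans; their column mean is the fraction of missing rows
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

Sorting by percentage puts the emptiest columns at the top, which makes it easier to decide what to drop.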

## TWINS Raw Data Columns
|System          |  # | Column  |Data Type              | Description                                         |
|:---------------|:---|:--------|:----------------------|:----------------------------------------------------|
|Time References |  1 | AOBT    |ASCII_Real             | APSS Onboard Time
|                |  2 | SCLK    |ASCII_Real             | Spacecraft Clock
|                |  3 | LMST    |ASCII_String           | Local Mean Solar Time
|                |  4 | LTST    |ASCII_String           | Local True Solar Time
|                |  5 | UTC     |ASCII_Date_Time_DOY_UTC| Coordinated Universal Time
|================|====|=========|=======================|=====================================================|
| BOOM -Y        |  6 | BMY_2L_TEMP_1 | ASCII_Integer | Wind Sensor transducer 1 Printed Circuit Board temperature PT-1000 Platinum Resistance Thermometer |
|                |  7 | BMY_2L_TEMP_2 | ASCII_Integer | WS transducer 2 PCB temperature PT-1000 PRT | 
|                |  8 | BMY_2L_TEMP_3 | ASCII_Integer | WS transducer 3 PCB temperature PT-1000 PRT |
|                |  9 | BMY_2L_TEMP_4 | ASCII_Integer | ATS-mid-rod temperature: PT1000 PRT sensor located at an intermediate position in the ATS rod|
|                | 10 | BMY_2L_TEMP_4_AVERAGE | ASCII_Integer | ATS-mid-rod temperature average of the last N samples
|                | 11 | BMY_2L_TEMP_4_STD     | ASCII_Integer | ATS-mid-rod temperature standard deviation of the last N samples
|                | 12 | BMY_2L_TEMP_5         | ASCII_Integer | Boom Housing Temp: PT-1000 PRT located at the Boom housing near the base of the ATS rod
|                | 13 | BMY_2L_TEMP_6         | ASCII_Integer | Calibration resistor: 1K ohm
|                | 14 | BMY_AIR_TEMP | ASCII_Integer | ATS-rod-extreme temperature: PT1000 PRT located at ATS extreme
|                | 15 | BMY_AIR_TEMP_AVERAGE  | ASCII_Integer | ATS-rod-extreme temperature average of the last N samples
|                | 16 | BMY_AIR_TEMP_STD | ASCII_Integer | ATS-rod-extreme temperature standard deviation of the last N samples
|                | 17 | BMY_WD_REF_OUT_1 | ASCII_Integer | WS transducer 1 cold die temperature
|                | 18 | BMY_WD_REF_OUT_2 | ASCII_Integer | WS transducer 2 cold die temperature
|                | 19 | BMY_WD_REF_OUT_3 | ASCII_Integer | WS transducer 3 cold die temperature
|                | 20 | BMY_WD_OUT_1     | ASCII_Integer | Number of counts measured for WS channel 1
|                | 21 | BMY_WD_OUT_2     | ASCII_Integer | Number of counts measured for WS channel 2
|                | 22 | BMY_WD_OUT_3     | ASCII_Integer | Number of counts measured for WS channel 3
|                | 23 | BMY_WD_OUT_4     | ASCII_Integer | Number of counts measured for WS channel 4
|                | 24 | BMY_WD_OUT_5     | ASCII_Integer | Number of counts measured for WS channel 5
|                | 25 | BMY_WD_OUT_6     | ASCII_Integer | Number of counts measured for WS channel 6
|                | 26 | BMY_WD_OUT_7     | ASCII_Integer | Number of counts measured for WS channel 7
|                | 27 | BMY_WD_OUT_8     | ASCII_Integer | Number of counts measured for WS channel 8
|                | 28 | BMY_WD_OUT_9     | ASCII_Integer | Number of counts measured for WS channel 9
|                | 29 | BMY_WD_OUT_10    | ASCII_Integer | Number of counts measured for WS channel 10
|                | 30 | BMY_WD_OUT_11    | ASCII_Integer | Number of counts measured for WS channel 11
|                | 31 | BMY_WD_OUT_12 | ASCII_Integer | Number of counts measured for WS channel 12
|                | 32 | BMY_ASIC_TEMP | ASCII_Integer | ASIC temperature
|                | 33 | BMY_AIR_TEMP_FREQUENCY | ASCII_String | Air temperature channels frequency or frequencies
|                | 34 | BMY_WIND_FREQUENCY | ASCII_String | Wind channels frequency or frequencies
|================|====|=========|=======================|=====================================================|
| BOOM +Y        | 35 | BPY_2L_TEMP_1 | ASCII_Integer | WS transducer 1 PCB temperature PT-1000 PRT
|                | 36 | BPY_2L_TEMP_2 | ASCII_Integer | WS transducer 2 PCB temperature PT-1000 PRT
|                | 37 | BPY_2L_TEMP_3 | ASCII_Integer | WS transducer 3 PCB temperature PT-1000 PRT
|                | 38 | BPY_2L_TEMP_4 | ASCII_Integer | Calibration resistor: 1K ohm
|                | 39 | BPY_2L_TEMP_5 | ASCII_Integer | ATS-mid-rod temperature: PT1000 PRT sensor located at an intermediate position in the ATS rod
|                | 40 | BPY_2L_TEMP_5_AVERAGE | ASCII_Integer | ATS-mid-rod temperature average of the last N samples
|                | 41 | BPY_2L_TEMP_5_STD | ASCII_Integer | ATS-mid-rod temperature standard deviation of the last N samples
|                | 42 | BPY_2L_TEMP_6 | ASCII_Integer | Boom Housing Temp: PT-1000 PRT located at the Boom housing near the base of the ATS rod
|                | 43 | BPY_AIR_TEMP | ASCII_Integer | ATS-rod-extreme temperature: PT1000 PRT located at ATS extreme
|                | 44 | BPY_AIR_TEMP_AVERAGE | ASCII_Integer | ATS-rod-extreme temperature average of the last N samples
|                | 45 | BPY_AIR_TEMP_STD | ASCII_Integer | ATS-rod-extreme temperature standard deviation of the last N samples
|                | 46 | BPY_WD_REF_OUT_1 | ASCII_Integer | WS transducer 1 cold die temperature
|                | 47 | BPY_WD_REF_OUT_2 | ASCII_Integer | WS transducer 2 cold die temperature
|                | 48 | BPY_WD_REF_OUT_3 | ASCII_Integer | WS transducer 3 cold die temperature
|                | 49 | BPY_WD_OUT_1     | ASCII_Integer | Number of counts measured for WS channel 1
|                | 50 | BPY_WD_OUT_2     | ASCII_Integer | Number of counts measured for WS channel 2
|                | 51 | BPY_WD_OUT_3     | ASCII_Integer | Number of counts measured for WS channel 3
|                | 52 | BPY_WD_OUT_4     | ASCII_Integer | Number of counts measured for WS channel 4
|                | 53 | BPY_WD_OUT_5     | ASCII_Integer | Number of counts measured for WS channel 5
|                | 54 | BPY_WD_OUT_6     | ASCII_Integer | Number of counts measured for WS channel 6
|                | 55 | BPY_WD_OUT_7     | ASCII_Integer | Number of counts measured for WS channel 7
|                | 56 | BPY_WD_OUT_8     | ASCII_Integer | Number of counts measured for WS channel 8
|                | 57 | BPY_WD_OUT_9     | ASCII_Integer | Number of counts measured for WS channel 9
|                | 58 | BPY_WD_OUT_10    | ASCII_Integer | Number of counts measured for WS channel 10
|                | 59 | BPY_WD_OUT_11    | ASCII_Integer | Number of counts measured for WS channel 11
|                | 60 | BPY_WD_OUT_12    | ASCII_Integer | Number of counts measured for WS channel 12
|                | 61 | BPY_ASIC_TEMP    | ASCII_Integer | ASIC temperature
|                | 62 | BPY_AIR_TEMP_FREQUENCY | ASCII_String |Air temperature channels frequency or frequencies
|                | 63 | BPY_WIND_FREQUENCY | ASCII_String | Wind channels frequency or frequencies

In [7]:
# df.BPY_AIR_TEMP_FREQUENCY

#### TWINS RAW Data
>In continuous mode, and for mid-rod and extreme-rod air temperature PT1000 sensors, the
recorded value is an average and a standard deviation using a window of N samples (N
defined to 7). This window starts at the time in which the decimation process selects the
rest of the data, each M samples. This means that the actual averages and standard
deviations are obtained N seconds after the rest of the samples. For convenience, the data
products contain all values aligned at the same time tags.  
Air Temperature data for which the average and standard deviations are calculated have
three columns: one for the raw value, for modes in which a single sample was retrieved,
and two others with the average and standard deviations. Depending on the downlink mode
used to retrieve the data, the appropriate column will contain the value and the other(s) will
be empty.
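
Since each row carries either the raw sample or the windowed average depending on the downlink mode, one option is to merge the two into a single column before dropping anything. This is only a sketch on invented toy values, and the `BMY_AIR_TEMP_COMBINED` column name is my own placeholder, not something from the data product.

```python
import numpy as np
import pandas as pd

# Toy rows: each row has either the raw single sample or the windowed average
toy = pd.DataFrame({
    "BMY_AIR_TEMP": [-5703.0, np.nan, -5803.0, np.nan],
    "BMY_AIR_TEMP_AVERAGE": [np.nan, -5780.5, np.nan, np.nan],
})

# Prefer the raw sample; fall back to the average where the raw value is missing
toy["BMY_AIR_TEMP_COMBINED"] = toy["BMY_AIR_TEMP"].combine_first(
    toy["BMY_AIR_TEMP_AVERAGE"]
)
print(toy)
```

Rows where both columns are empty stay `NaN`, so real gaps in the record are still visible afterwards.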

## My plan
BOOM -Y: take the average of all these temps and make one column
- BMY_2L_TEMP_1
- BMY_2L_TEMP_2
- BMY_2L_TEMP_3
- BMY_2L_TEMP_4
- BMY_2L_TEMP_4_AVERAGE [DROP]
- BMY_2L_TEMP_4_STD [DROP]
- BMY_2L_TEMP_5
- BMY_2L_TEMP_6
---
- BMY_AIR_TEMP
- BMY_AIR_TEMP_AVERAGE [DROP]
- BMY_AIR_TEMP_STD [DROP]
---
Take the average of these
- BMY_WD_REF_OUT_1
- BMY_WD_REF_OUT_2
- BMY_WD_REF_OUT_3
---
Take the average of these
- BMY_WD_OUT_1
- BMY_WD_OUT_2
- BMY_WD_OUT_3
- BMY_WD_OUT_4
- BMY_WD_OUT_5
- BMY_WD_OUT_6
- BMY_WD_OUT_7
- BMY_WD_OUT_8
- BMY_WD_OUT_9
- BMY_WD_OUT_10
- BMY_WD_OUT_11
- BMY_WD_OUT_12
---
- BMY_ASIC_TEMP
- BMY_AIR_TEMP_FREQUENCY
- BMY_WIND_FREQUENCY

BOOM +Y
- BPY_2L_TEMP_1
- BPY_2L_TEMP_2
- BPY_2L_TEMP_3
- BPY_2L_TEMP_4
- BPY_2L_TEMP_5
- BPY_2L_TEMP_5_AVERAGE
- BPY_2L_TEMP_5_STD
- BPY_2L_TEMP_6
---
- BPY_AIR_TEMP
- BPY_AIR_TEMP_AVERAGE
- BPY_AIR_TEMP_STD
---
- BPY_WD_REF_OUT_1
- BPY_WD_REF_OUT_2
- BPY_WD_REF_OUT_3
---
- BPY_WD_OUT_1
- BPY_WD_OUT_2
- BPY_WD_OUT_3
- BPY_WD_OUT_4
- BPY_WD_OUT_5
- BPY_WD_OUT_6
- BPY_WD_OUT_7
- BPY_WD_OUT_8
- BPY_WD_OUT_9
- BPY_WD_OUT_10
- BPY_WD_OUT_11
- BPY_WD_OUT_12
---
- BPY_ASIC_TEMP
- BPY_AIR_TEMP_FREQUENCY
- BPY_WIND_FREQUENCY

In [10]:
df.head()

Unnamed: 0,AOBT,SCLK,LMST,LTST,UTC,BMY_2L_TEMP_1,BMY_2L_TEMP_2,BMY_2L_TEMP_3,BMY_2L_TEMP_4,BMY_2L_TEMP_4_AVERAGE,BMY_2L_TEMP_4_STD,BMY_2L_TEMP_5,BMY_2L_TEMP_6,BMY_AIR_TEMP,BMY_AIR_TEMP_AVERAGE,BMY_AIR_TEMP_STD,BMY_WD_REF_OUT_1,BMY_WD_REF_OUT_2,BMY_WD_REF_OUT_3,BMY_WD_OUT_1,BMY_WD_OUT_2,BMY_WD_OUT_3,BMY_WD_OUT_4,BMY_WD_OUT_5,BMY_WD_OUT_6,BMY_WD_OUT_7,BMY_WD_OUT_8,BMY_WD_OUT_9,BMY_WD_OUT_10,BMY_WD_OUT_11,BMY_WD_OUT_12,BMY_WIND_FREQUENCY,BMY_AIR_TEMP_FREQUENCY,BMY_ASIC_TEMP,BPY_2L_TEMP_1,BPY_2L_TEMP_2,BPY_2L_TEMP_3,BPY_2L_TEMP_4,BPY_2L_TEMP_5,BPY_2L_TEMP_5_AVERAGE,BPY_2L_TEMP_5_STD,BPY_2L_TEMP_6,BPY_AIR_TEMP,BPY_AIR_TEMP_AVERAGE,BPY_AIR_TEMP_STD,BPY_WD_REF_OUT_1,BPY_WD_REF_OUT_2,BPY_WD_REF_OUT_3,BPY_WD_OUT_1,BPY_WD_OUT_2,BPY_WD_OUT_3,BPY_WD_OUT_4,BPY_WD_OUT_5,BPY_WD_OUT_6,BPY_WD_OUT_7,BPY_WD_OUT_8,BPY_WD_OUT_9,BPY_WD_OUT_10,BPY_WD_OUT_11,BPY_WD_OUT_12,BPY_WIND_FREQUENCY,BPY_AIR_TEMP_FREQUENCY,BPY_ASIC_TEMP
0,596876952.0,596861200.0,00004M06:46:33.826,00004 06:05:41,2018-334T14:46:55.755Z,-4353.0,-4645.0,-4778.0,-5006.0,,,-4512.0,1002.0,-5703.0,,,4369.0,5167.0,5599.0,648.0,662.0,648.0,1687.0,3360.0,1109.0,1019.0,1229.0,868.0,2451.0,760.0,663.0,1.0,1.0,8485.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,596876953.0,596861200.0,00004M06:46:34.799,00004 06:05:42,2018-334T14:46:56.755Z,-4415.0,-4723.0,-4870.0,-5099.0,,,-4600.0,936.0,-5776.0,,,4274.0,5108.0,5524.0,867.0,379.0,828.0,1033.0,10.0,551.0,578.0,324.0,637.0,418.0,526.0,492.0,1.0,1.0,8488.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,596876954.0,596861200.0,00004M06:46:35.772,00004 06:05:43,2018-334T14:46:57.755Z,-4424.0,-4727.0,-4861.0,-5107.0,,,-4613.0,930.0,-5803.0,,,4301.0,5100.0,5532.0,713.0,521.0,641.0,1085.0,1194.0,855.0,807.0,918.0,859.0,1233.0,741.0,661.0,1.0,1.0,8491.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,596876955.0,596861200.0,00004M06:46:36.746,00004 06:05:44,2018-334T14:46:58.755Z,-4409.0,-4722.0,-4859.0,-5109.0,,,-4603.0,916.0,-5771.0,,,4297.0,5111.0,5539.0,687.0,578.0,662.0,1028.0,1254.0,861.0,857.0,991.0,879.0,1190.0,717.0,689.0,1.0,1.0,8494.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,596876956.0,596861200.0,00004M06:46:37.719,00004 06:05:45,2018-334T14:46:59.755Z,-4424.0,-4728.0,-4873.0,-5107.0,,,-4610.0,922.0,-5754.0,,,4296.0,5113.0,5535.0,438.0,284.0,599.0,862.0,1085.0,748.0,969.0,1049.0,813.0,1097.0,417.0,443.0,1.0,1.0,8498.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


>During science monitoring phase, after all instruments (SEIS, HP3) are deployed, only one
boom will be used at a time. A selection of which boom is operative (measuring) will be
done in accordance to where the wind flows come from, highly determined by the time of
the Martian day (among other factors). The boom selection pattern will be commanded
once per week, based on past observations and predictive models.

In [13]:
# If I'm being honest there's a lot of overwhelming data here
# So I'm gonna break it down...
# I'll start by pulling the time columns into their own dataframe
# (no .head() on the assignment, otherwise only the first 5 rows are kept)
time = df[['AOBT', 'SCLK', 'LMST', 'LTST', 'UTC']]
time.head()

Unnamed: 0,AOBT,SCLK,LMST,LTST,UTC
0,596876952.0,596861200.0,00004M06:46:33.826,00004 06:05:41,2018-334T14:46:55.755Z
1,596876953.0,596861200.0,00004M06:46:34.799,00004 06:05:42,2018-334T14:46:56.755Z
2,596876954.0,596861200.0,00004M06:46:35.772,00004 06:05:43,2018-334T14:46:57.755Z
3,596876955.0,596861200.0,00004M06:46:36.746,00004 06:05:44,2018-334T14:46:58.755Z
4,596876956.0,596861200.0,00004M06:46:37.719,00004 06:05:45,2018-334T14:46:59.755Z
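
The `UTC` column is stored as a string in year and day-of-year form, so it needs an explicit format to become a real timestamp. A minimal sketch, using two values copied from the output above. The format string is my own reading of those values, not something taken from the data product documentation.

```python
import pandas as pd

# Two UTC strings copied from the head() output above
utc = pd.Series(["2018-334T14:46:55.755Z", "2018-334T14:46:56.755Z"])

# %j is the day of the year (001-366); the trailing Z is matched as a literal character
utc_parsed = pd.to_datetime(utc, format="%Y-%jT%H:%M:%S.%fZ")
print(utc_parsed)
```

Day 334 of 2018 is November 30, so the first rows of this dataset fall a few days after the landing.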


In [16]:
# Now to separate and examine the BMY_2L_TEMP data
# Why does TEMP 6 have positive numbers?
BMY_2L_Temp = df[['BMY_2L_TEMP_1', 'BMY_2L_TEMP_2', 'BMY_2L_TEMP_3', 'BMY_2L_TEMP_4', 'BMY_2L_TEMP_5', 'BMY_2L_TEMP_6']].copy()  # copy so new columns can be added safely later
BMY_2L_Temp.head()

Unnamed: 0,BMY_2L_TEMP_1,BMY_2L_TEMP_2,BMY_2L_TEMP_3,BMY_2L_TEMP_4,BMY_2L_TEMP_5,BMY_2L_TEMP_6
0,-4353.0,-4645.0,-4778.0,-5006.0,-4512.0,1002.0
1,-4415.0,-4723.0,-4870.0,-5099.0,-4600.0,936.0
2,-4424.0,-4727.0,-4861.0,-5107.0,-4613.0,930.0
3,-4409.0,-4722.0,-4859.0,-5109.0,-4603.0,916.0
4,-4424.0,-4728.0,-4873.0,-5107.0,-4610.0,922.0


In [18]:
# I know it's simple but I want to separate EVERYTHING
BMY_AIR_TEMP = df.BMY_AIR_TEMP
BMY_AIR_TEMP.head()

0   -5703.0
1   -5776.0
2   -5803.0
3   -5771.0
4   -5754.0
Name: BMY_AIR_TEMP, dtype: float64

In [20]:
# Boom Minus Y reference out = Wind Sensor transducer cold die temperature
BMY_WD_REF_OUT = df[['BMY_WD_REF_OUT_1', 'BMY_WD_REF_OUT_2', 'BMY_WD_REF_OUT_3']].copy()  # copy so new columns can be added safely later
BMY_WD_REF_OUT.head()

Unnamed: 0,BMY_WD_REF_OUT_1,BMY_WD_REF_OUT_2,BMY_WD_REF_OUT_3
0,4369.0,5167.0,5599.0
1,4274.0,5108.0,5524.0
2,4301.0,5100.0,5532.0
3,4297.0,5111.0,5539.0
4,4296.0,5113.0,5535.0


In [22]:
# Boom Minus Y Wind Direction OUT = Number of counts measured for Wind Sensor channel #
BMY_WD_OUT = df[['BMY_WD_OUT_1', 'BMY_WD_OUT_2', 'BMY_WD_OUT_3', 'BMY_WD_OUT_4', 'BMY_WD_OUT_5', 'BMY_WD_OUT_6', 'BMY_WD_OUT_7', 'BMY_WD_OUT_8', 'BMY_WD_OUT_9', 'BMY_WD_OUT_10', 'BMY_WD_OUT_11',  'BMY_WD_OUT_12']]
BMY_WD_OUT.head()

Unnamed: 0,BMY_WD_OUT_1,BMY_WD_OUT_2,BMY_WD_OUT_3,BMY_WD_OUT_4,BMY_WD_OUT_5,BMY_WD_OUT_6,BMY_WD_OUT_7,BMY_WD_OUT_8,BMY_WD_OUT_9,BMY_WD_OUT_10,BMY_WD_OUT_11,BMY_WD_OUT_12
0,648.0,662.0,648.0,1687.0,3360.0,1109.0,1019.0,1229.0,868.0,2451.0,760.0,663.0
1,867.0,379.0,828.0,1033.0,10.0,551.0,578.0,324.0,637.0,418.0,526.0,492.0
2,713.0,521.0,641.0,1085.0,1194.0,855.0,807.0,918.0,859.0,1233.0,741.0,661.0
3,687.0,578.0,662.0,1028.0,1254.0,861.0,857.0,991.0,879.0,1190.0,717.0,689.0
4,438.0,284.0,599.0,862.0,1085.0,748.0,969.0,1049.0,813.0,1097.0,417.0,443.0


In [26]:
frequency = df[['BMY_WIND_FREQUENCY', 'BMY_AIR_TEMP_FREQUENCY']]
BMY_ASIC_TEMP = df.BMY_ASIC_TEMP

In [28]:
BPY_2L_TEMP = df[['BPY_2L_TEMP_1', 'BPY_2L_TEMP_2', 'BPY_2L_TEMP_3', 'BPY_2L_TEMP_4', 'BPY_2L_TEMP_5', 'BPY_2L_TEMP_6']].copy()  # copy so new columns can be added safely later
BPY_AIR_TEMP = df.BPY_AIR_TEMP
BPY_WD_REF_OUT = df[['BPY_WD_REF_OUT_1', 'BPY_WD_REF_OUT_2', 'BPY_WD_REF_OUT_3']].copy()
BPY_WD_OUT = df[['BPY_WD_OUT_1', 'BPY_WD_OUT_2', 'BPY_WD_OUT_3', 'BPY_WD_OUT_4', 'BPY_WD_OUT_5', 'BPY_WD_OUT_6', 'BPY_WD_OUT_7', 'BPY_WD_OUT_8', 'BPY_WD_OUT_9', 'BPY_WD_OUT_10', 'BPY_WD_OUT_11', 'BPY_WD_OUT_12']]
BPY_FREQUENCY = df[['BPY_WIND_FREQUENCY', 'BPY_AIR_TEMP_FREQUENCY']]
BPY_ASIC_TEMP = df.BPY_ASIC_TEMP


In [25]:
df.BMY_AIR_TEMP_FREQUENCY.value_counts()

0.1    732339
0.5    690182
1.0     17480
Name: BMY_AIR_TEMP_FREQUENCY, dtype: int64

In [32]:
# TEMP_6 is the calibration resistor (1K ohm) rather than a temperature sensor,
# which may be why its values are positive while the others are negative.
# I will leave it out of the average.
temp_cols = ['BMY_2L_TEMP_1', 'BMY_2L_TEMP_2', 'BMY_2L_TEMP_3', 'BMY_2L_TEMP_4', 'BMY_2L_TEMP_5']
# skipna=False keeps the result NaN whenever any of the five readings is missing
BMY_2L_Temp['BMY_2L_TEMP_AVG'] = BMY_2L_Temp[temp_cols].mean(axis=1, skipna=False)
BMY_2L_Temp.head()


Unnamed: 0,BMY_2L_TEMP_1,BMY_2L_TEMP_2,BMY_2L_TEMP_3,BMY_2L_TEMP_4,BMY_2L_TEMP_5,BMY_2L_TEMP_6,BMY_2L_TEMP_AVG
0,-4353.0,-4645.0,-4778.0,-5006.0,-4512.0,1002.0,-4658.8
1,-4415.0,-4723.0,-4870.0,-5099.0,-4600.0,936.0,-4741.4
2,-4424.0,-4727.0,-4861.0,-5107.0,-4613.0,930.0,-4746.4
3,-4409.0,-4722.0,-4859.0,-5109.0,-4603.0,916.0,-4740.4
4,-4424.0,-4728.0,-4873.0,-5107.0,-4610.0,922.0,-4748.4


In [54]:
# Wind Sensor transducer cold die temperature
# I am going to make a column for the average of the 3 transducers
BMY_WD_REF_OUT['BMY_WD_REF_OUT_AVG'] = (BMY_WD_REF_OUT.BMY_WD_REF_OUT_1 + BMY_WD_REF_OUT.BMY_WD_REF_OUT_2 + BMY_WD_REF_OUT.BMY_WD_REF_OUT_3)/3
BMY_WD_REF_OUT.head()


Unnamed: 0,BMY_WD_REF_OUT_1,BMY_WD_REF_OUT_2,BMY_WD_REF_OUT_3,average,BMY_WD_REF_OUT_AVG
0,4369.0,5167.0,5599.0,5045.0,5045.0
1,4274.0,5108.0,5524.0,4968.666667,4968.666667
2,4301.0,5100.0,5532.0,4977.666667,4977.666667
3,4297.0,5111.0,5539.0,4982.333333,4982.333333
4,4296.0,5113.0,5535.0,4981.333333,4981.333333


In [48]:
# Should I keep them separate, or combine them by taking whichever column is not null
# and averaging when both columns have data?
temp = pd.concat([BMY_AIR_TEMP , BPY_AIR_TEMP], axis=1)
temp.head()

Unnamed: 0,BMY_AIR_TEMP,BPY_AIR_TEMP
0,-5703.0,
1,-5776.0,
2,-5803.0,
3,-5771.0,
4,-5754.0,
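
One way to answer the question in that cell without writing a loop is a row-wise mean, which skips nulls by default. This is a sketch on invented toy values, and the `AIR_TEMP` column name is a placeholder of mine.

```python
import numpy as np
import pandas as pd

# Toy rows covering the four cases: only -Y, only +Y, both booms, neither boom
temp = pd.DataFrame({
    "BMY_AIR_TEMP": [-5703.0, np.nan, -5803.0, np.nan],
    "BPY_AIR_TEMP": [np.nan, -5650.0, -5797.0, np.nan],
})

# mean(axis=1) ignores NaN: one value gives that value, two values give their
# average, and a row with no values stays NaN
temp["AIR_TEMP"] = temp[["BMY_AIR_TEMP", "BPY_AIR_TEMP"]].mean(axis=1)
print(temp)
```

Whether averaging the two booms is physically meaningful is a separate question, since only one boom is meant to be operative at a time.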


In [49]:
# This is wind direction. I'm not sure I can calculate this in my MVP, so I think I'll leave this out for now
# I'll come back for this later
BMY_WD_OUT.head()

Unnamed: 0,BMY_WD_OUT_1,BMY_WD_OUT_2,BMY_WD_OUT_3,BMY_WD_OUT_4,BMY_WD_OUT_5,BMY_WD_OUT_6,BMY_WD_OUT_7,BMY_WD_OUT_8,BMY_WD_OUT_9,BMY_WD_OUT_10,BMY_WD_OUT_11,BMY_WD_OUT_12
0,648.0,662.0,648.0,1687.0,3360.0,1109.0,1019.0,1229.0,868.0,2451.0,760.0,663.0
1,867.0,379.0,828.0,1033.0,10.0,551.0,578.0,324.0,637.0,418.0,526.0,492.0
2,713.0,521.0,641.0,1085.0,1194.0,855.0,807.0,918.0,859.0,1233.0,741.0,661.0
3,687.0,578.0,662.0,1028.0,1254.0,861.0,857.0,991.0,879.0,1190.0,717.0,689.0
4,438.0,284.0,599.0,862.0,1085.0,748.0,969.0,1049.0,813.0,1097.0,417.0,443.0


In [50]:
# Air temperature channels frequency or frequencies
# Wind channels frequency or frequencies
# I don't think I need to know what frequency channel they were using at the time
# Not using in MVP 
frequency.head()

Unnamed: 0,BMY_WIND_FREQUENCY,BMY_AIR_TEMP_FREQUENCY
0,1.0,1.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0
4,1.0,1.0


In [55]:
BMY_simple_df = pd.concat([BMY_2L_Temp.BMY_2L_TEMP_AVG, # temperature
                           BMY_AIR_TEMP, # temperature
                           BMY_WD_REF_OUT.BMY_WD_REF_OUT_AVG, # temperature
                           BMY_ASIC_TEMP # temperature
                          ], axis=1)

BMY_simple_df.head()


Unnamed: 0,BMY_2L_TEMP_AVG,BMY_AIR_TEMP,BMY_WD_REF_OUT_AVG,BMY_ASIC_TEMP
0,-4658.8,-5703.0,5045.0,8485.0
1,-4741.4,-5776.0,4968.666667,8488.0
2,-4746.4,-5803.0,4977.666667,8491.0
3,-4740.4,-5771.0,4982.333333,8494.0
4,-4748.4,-5754.0,4981.333333,8498.0


In [60]:
# BPY_2L_TEMP_4 is the resistor, so I'll skip that for averaging again
BPY_2L_TEMP['BPY_2L_TEMP_AVG'] = (BPY_2L_TEMP.BPY_2L_TEMP_1 + BPY_2L_TEMP.BPY_2L_TEMP_2 + BPY_2L_TEMP.BPY_2L_TEMP_3 + BPY_2L_TEMP.BPY_2L_TEMP_5 + BPY_2L_TEMP.BPY_2L_TEMP_6)/5
BPY_2L_TEMP.head(100)


Unnamed: 0,BPY_2L_TEMP_1,BPY_2L_TEMP_2,BPY_2L_TEMP_3,BPY_2L_TEMP_4,BPY_2L_TEMP_5,BPY_2L_TEMP_6,BPY_2L_TEMP_AVG
0,,,,,,,
1,,,,,,,
2,,,,,,,
3,,,,,,,
4,,,,,,,
5,,,,,,,
6,,,,,,,
7,,,,,,,
8,,,,,,,
9,,,,,,,


In [63]:
# Average the 3 transducer cold die temperatures for Boom +Y
BPY_WD_REF_OUT['BPY_WD_REF_OUT_AVG'] = (BPY_WD_REF_OUT.BPY_WD_REF_OUT_1 + BPY_WD_REF_OUT.BPY_WD_REF_OUT_2 + BPY_WD_REF_OUT.BPY_WD_REF_OUT_3)/3

BPY_WD_REF_OUT.head(100)


Unnamed: 0,BPY_WD_REF_OUT_1,BPY_WD_REF_OUT_2,BPY_WD_REF_OUT_3,BPY_WD_REF_OUT_AVG
0,,,,
1,,,,
2,,,,
3,,,,
4,,,,
5,,,,
6,,,,
7,,,,
8,,,,
9,,,,


In [64]:
BPY_simple_df = pd.concat([BPY_2L_TEMP.BPY_2L_TEMP_AVG, # temperature
                           BPY_AIR_TEMP, # temperature
                           BPY_WD_REF_OUT.BPY_WD_REF_OUT_AVG, # temperature
                           BPY_ASIC_TEMP # temperature
                          ], axis=1)
BPY_simple_df.head(100)

Unnamed: 0,BPY_2L_TEMP_AVG,BPY_AIR_TEMP,BPY_WD_REF_OUT_AVG,BPY_ASIC_TEMP
0,,,,
1,,,,
2,,,,
3,,,,
4,,,,
5,,,,
6,,,,
7,,,,
8,,,,
9,,,,


In [65]:
df.head()

Unnamed: 0,AOBT,SCLK,LMST,LTST,UTC,BMY_2L_TEMP_1,BMY_2L_TEMP_2,BMY_2L_TEMP_3,BMY_2L_TEMP_4,BMY_2L_TEMP_4_AVERAGE,BMY_2L_TEMP_4_STD,BMY_2L_TEMP_5,BMY_2L_TEMP_6,BMY_AIR_TEMP,BMY_AIR_TEMP_AVERAGE,BMY_AIR_TEMP_STD,BMY_WD_REF_OUT_1,BMY_WD_REF_OUT_2,BMY_WD_REF_OUT_3,BMY_WD_OUT_1,BMY_WD_OUT_2,BMY_WD_OUT_3,BMY_WD_OUT_4,BMY_WD_OUT_5,BMY_WD_OUT_6,BMY_WD_OUT_7,BMY_WD_OUT_8,BMY_WD_OUT_9,BMY_WD_OUT_10,BMY_WD_OUT_11,BMY_WD_OUT_12,BMY_WIND_FREQUENCY,BMY_AIR_TEMP_FREQUENCY,BMY_ASIC_TEMP,BPY_2L_TEMP_1,BPY_2L_TEMP_2,BPY_2L_TEMP_3,BPY_2L_TEMP_4,BPY_2L_TEMP_5,BPY_2L_TEMP_5_AVERAGE,BPY_2L_TEMP_5_STD,BPY_2L_TEMP_6,BPY_AIR_TEMP,BPY_AIR_TEMP_AVERAGE,BPY_AIR_TEMP_STD,BPY_WD_REF_OUT_1,BPY_WD_REF_OUT_2,BPY_WD_REF_OUT_3,BPY_WD_OUT_1,BPY_WD_OUT_2,BPY_WD_OUT_3,BPY_WD_OUT_4,BPY_WD_OUT_5,BPY_WD_OUT_6,BPY_WD_OUT_7,BPY_WD_OUT_8,BPY_WD_OUT_9,BPY_WD_OUT_10,BPY_WD_OUT_11,BPY_WD_OUT_12,BPY_WIND_FREQUENCY,BPY_AIR_TEMP_FREQUENCY,BPY_ASIC_TEMP
0,596876952.0,596861200.0,00004M06:46:33.826,00004 06:05:41,2018-334T14:46:55.755Z,-4353.0,-4645.0,-4778.0,-5006.0,,,-4512.0,1002.0,-5703.0,,,4369.0,5167.0,5599.0,648.0,662.0,648.0,1687.0,3360.0,1109.0,1019.0,1229.0,868.0,2451.0,760.0,663.0,1.0,1.0,8485.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,596876953.0,596861200.0,00004M06:46:34.799,00004 06:05:42,2018-334T14:46:56.755Z,-4415.0,-4723.0,-4870.0,-5099.0,,,-4600.0,936.0,-5776.0,,,4274.0,5108.0,5524.0,867.0,379.0,828.0,1033.0,10.0,551.0,578.0,324.0,637.0,418.0,526.0,492.0,1.0,1.0,8488.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,596876954.0,596861200.0,00004M06:46:35.772,00004 06:05:43,2018-334T14:46:57.755Z,-4424.0,-4727.0,-4861.0,-5107.0,,,-4613.0,930.0,-5803.0,,,4301.0,5100.0,5532.0,713.0,521.0,641.0,1085.0,1194.0,855.0,807.0,918.0,859.0,1233.0,741.0,661.0,1.0,1.0,8491.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,596876955.0,596861200.0,00004M06:46:36.746,00004 06:05:44,2018-334T14:46:58.755Z,-4409.0,-4722.0,-4859.0,-5109.0,,,-4603.0,916.0,-5771.0,,,4297.0,5111.0,5539.0,687.0,578.0,662.0,1028.0,1254.0,861.0,857.0,991.0,879.0,1190.0,717.0,689.0,1.0,1.0,8494.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,596876956.0,596861200.0,00004M06:46:37.719,00004 06:05:45,2018-334T14:46:59.755Z,-4424.0,-4728.0,-4873.0,-5107.0,,,-4610.0,922.0,-5754.0,,,4296.0,5113.0,5535.0,438.0,284.0,599.0,862.0,1085.0,748.0,969.0,1049.0,813.0,1097.0,417.0,443.0,1.0,1.0,8498.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [85]:
# Is there an overlap where BMY and BPY are collecting data at the same time?
# YES!
df[['BMY_2L_TEMP_1', 'BPY_2L_TEMP_1']].isnull().sum(axis=1).value_counts()

1    1821589
0     528809
2         38
dtype: int64

Translation:  
True + False = 1: There is only one column that has data (is not null)  
False + False = 0: Both columns have data (neither is null)  
True + True = 2: Neither column has any recorded data (both are null)
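
The same check can be made to label itself, so the counts do not need a translation table. A minimal sketch on invented toy values, with label strings of my own choosing:

```python
import numpy as np
import pandas as pd

# Toy rows: only -Y, only +Y, both booms, neither boom
toy = pd.DataFrame({
    "BMY_2L_TEMP_1": [-4353.0, np.nan, -4424.0, np.nan],
    "BPY_2L_TEMP_1": [np.nan, -4400.0, -4410.0, np.nan],
})

# Count the nulls in each row, then map each count to a readable label
null_count = toy.isnull().sum(axis=1)
labels = null_count.map({0: "both booms", 1: "one boom", 2: "neither boom"})
counts = labels.value_counts()
print(counts)
```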


Takeaways:
- BMY = Boom minus Y
- BPY = Boom plus Y

- There are five columns just for telling time: the onboard APSS time, the spacecraft clock, local mean and true solar time, and UTC.
- I want to predict air temperature and wind speed/direction.