# Data Preparation
Now that I have the data, I need to clean it so that I can explore it and build models on it. That requires understanding what each column represents and which columns I will need in the final product.

In [13]:
# Data Science Libraries
import pandas as pd

# Import my own functions
import wrangle

# Block Warning Boxes
import warnings
warnings.filterwarnings("ignore")

# Remove Limits On Viewing Dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Because of my time constraint for this project, I will limit my data to the `twins_raw` data for now. I will keep the code for exploring the rest of the data, including `twins_rawevent` and `twins_config`, for another time.

#### TWINS raw data
>This data product contains TWINS raw science data downloaded from the spacecraft in
continuous mode. This includes Air Temperature Sensor PT1000 raw values, Wind Sensor
counters and temperatures, and ASIC temperature. 

In [2]:
# Acquiring the data using functions in wrangle
df = wrangle.twins_raw_data()

Requesting Data...
Organizing list of file names...
Combining twins_raw csv files...
Creating single file...
'INSIGHT_TWINS_RAW.csv' created.
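
The progress messages suggest that the helper gathers the downloaded `twins_raw` CSV files and stacks them into a single file. The source of `wrangle.twins_raw_data` is not shown in this notebook, so the following is only a minimal sketch of that general pattern. The function name `combine_csv_files`, the file pattern, and the toy files are all hypothetical.

```python
from pathlib import Path
import tempfile

import pandas as pd


def combine_csv_files(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder` and stack them into one frame.

    Hypothetical helper for illustration; not the actual wrangle.py code.
    """
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True gives the combined frame one continuous 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Tiny demonstration with two made-up files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"AOBT": [1.0, 2.0]}).to_csv(Path(tmp) / "twins_raw_a.csv", index=False)
    pd.DataFrame({"AOBT": [3.0]}).to_csv(Path(tmp) / "twins_raw_b.csv", index=False)
    combined = combine_csv_files(tmp, "twins_raw_*.csv")

print(combined.shape)
```

Sorting the paths keeps the row order reproducible between runs, which matters when the files are named by date or sol.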


In [3]:
# How big is this dataframe?
df.shape

(2350436, 63)

In [4]:
# What does it look like?
df.head()

Unnamed: 0,AOBT,SCLK,LMST,LTST,UTC,BMY_2L_TEMP_1,BMY_2L_TEMP_2,BMY_2L_TEMP_3,BMY_2L_TEMP_4,BMY_2L_TEMP_4_AVERAGE,BMY_2L_TEMP_4_STD,BMY_2L_TEMP_5,BMY_2L_TEMP_6,BMY_AIR_TEMP,BMY_AIR_TEMP_AVERAGE,BMY_AIR_TEMP_STD,BMY_WD_REF_OUT_1,BMY_WD_REF_OUT_2,BMY_WD_REF_OUT_3,BMY_WD_OUT_1,BMY_WD_OUT_2,BMY_WD_OUT_3,BMY_WD_OUT_4,BMY_WD_OUT_5,BMY_WD_OUT_6,BMY_WD_OUT_7,BMY_WD_OUT_8,BMY_WD_OUT_9,BMY_WD_OUT_10,BMY_WD_OUT_11,BMY_WD_OUT_12,BMY_WIND_FREQUENCY,BMY_AIR_TEMP_FREQUENCY,BMY_ASIC_TEMP,BPY_2L_TEMP_1,BPY_2L_TEMP_2,BPY_2L_TEMP_3,BPY_2L_TEMP_4,BPY_2L_TEMP_5,BPY_2L_TEMP_5_AVERAGE,BPY_2L_TEMP_5_STD,BPY_2L_TEMP_6,BPY_AIR_TEMP,BPY_AIR_TEMP_AVERAGE,BPY_AIR_TEMP_STD,BPY_WD_REF_OUT_1,BPY_WD_REF_OUT_2,BPY_WD_REF_OUT_3,BPY_WD_OUT_1,BPY_WD_OUT_2,BPY_WD_OUT_3,BPY_WD_OUT_4,BPY_WD_OUT_5,BPY_WD_OUT_6,BPY_WD_OUT_7,BPY_WD_OUT_8,BPY_WD_OUT_9,BPY_WD_OUT_10,BPY_WD_OUT_11,BPY_WD_OUT_12,BPY_WIND_FREQUENCY,BPY_AIR_TEMP_FREQUENCY,BPY_ASIC_TEMP
0,596876952.0,596861200.0,00004M06:46:33.826,00004 06:05:41,2018-334T14:46:55.755Z,-4353.0,-4645.0,-4778.0,-5006.0,,,-4512.0,1002.0,-5703.0,,,4369.0,5167.0,5599.0,648.0,662.0,648.0,1687.0,3360.0,1109.0,1019.0,1229.0,868.0,2451.0,760.0,663.0,1.0,1.0,8485.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,596876953.0,596861200.0,00004M06:46:34.799,00004 06:05:42,2018-334T14:46:56.755Z,-4415.0,-4723.0,-4870.0,-5099.0,,,-4600.0,936.0,-5776.0,,,4274.0,5108.0,5524.0,867.0,379.0,828.0,1033.0,10.0,551.0,578.0,324.0,637.0,418.0,526.0,492.0,1.0,1.0,8488.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,596876954.0,596861200.0,00004M06:46:35.772,00004 06:05:43,2018-334T14:46:57.755Z,-4424.0,-4727.0,-4861.0,-5107.0,,,-4613.0,930.0,-5803.0,,,4301.0,5100.0,5532.0,713.0,521.0,641.0,1085.0,1194.0,855.0,807.0,918.0,859.0,1233.0,741.0,661.0,1.0,1.0,8491.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,596876955.0,596861200.0,00004M06:46:36.746,00004 06:05:44,2018-334T14:46:58.755Z,-4409.0,-4722.0,-4859.0,-5109.0,,,-4603.0,916.0,-5771.0,,,4297.0,5111.0,5539.0,687.0,578.0,662.0,1028.0,1254.0,861.0,857.0,991.0,879.0,1190.0,717.0,689.0,1.0,1.0,8494.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,596876956.0,596861200.0,00004M06:46:37.719,00004 06:05:45,2018-334T14:46:59.755Z,-4424.0,-4728.0,-4873.0,-5107.0,,,-4610.0,922.0,-5754.0,,,4296.0,5113.0,5535.0,438.0,284.0,599.0,862.0,1085.0,748.0,969.0,1049.0,813.0,1097.0,417.0,443.0,1.0,1.0,8498.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [11]:
# What do my columns look like?
# (newer pandas versions use show_counts; null_counts was removed in pandas 2.0)
df.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2350436 entries, 0 to 4443
Data columns (total 63 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  --------------    -----  
 0   AOBT                    2350436 non-null  float64
 1   SCLK                    2350436 non-null  float64
 2   LMST                    2350436 non-null  object 
 3   LTST                    2350436 non-null  object 
 4   UTC                     2350436 non-null  object 
 5   BMY_2L_TEMP_1           1440001 non-null  float64
 6   BMY_2L_TEMP_2           1440001 non-null  float64
 7   BMY_2L_TEMP_3           1440001 non-null  float64
 8   BMY_2L_TEMP_4           707662 non-null   float64
 9   BMY_2L_TEMP_4_AVERAGE   732324 non-null   float64
 10  BMY_2L_TEMP_4_STD       732324 non-null   float64
 11  BMY_2L_TEMP_5           1440001 non-null  float64
 12  BMY_2L_TEMP_6           1440001 non-null  float64
 13  BMY_AIR_TEMP            707662 non-null   float64
 14  BMY_A
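
The header of this output reports 2,350,436 entries but an index running from 0 to 4443, which suggests that index labels repeat (this typically happens when several files are concatenated and each keeps its own row numbers). A minimal sketch for checking and fixing that, using a toy frame rather than the real `df`:

```python
import pandas as pd

# Toy stand-in for two concatenated files that each kept their own row numbers
toy = pd.concat([
    pd.DataFrame({"AOBT": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"AOBT": [4.0, 5.0]}),
])

has_repeats = not toy.index.is_unique      # True when index labels repeat
toy_clean = toy.reset_index(drop=True)     # replace the labels with 0..n-1

print(has_repeats, toy_clean.index.is_unique)
```

A unique index makes label-based selection such as `df.loc[0]` return a single row instead of one row per source file.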

In [14]:
# I can see I have some missing values, let's see how many in each column:
df.isnull().sum()

AOBT                            0
SCLK                            0
LMST                            0
LTST                            0
UTC                             0
BMY_2L_TEMP_1              910435
BMY_2L_TEMP_2              910435
BMY_2L_TEMP_3              910435
BMY_2L_TEMP_4             1642774
BMY_2L_TEMP_4_AVERAGE     1618112
BMY_2L_TEMP_4_STD         1618112
BMY_2L_TEMP_5              910435
BMY_2L_TEMP_6              910435
BMY_AIR_TEMP              1642774
BMY_AIR_TEMP_AVERAGE      1618112
BMY_AIR_TEMP_STD          1618112
BMY_WD_REF_OUT_1           910435
BMY_WD_REF_OUT_2           910435
BMY_WD_REF_OUT_3           912043
BMY_WD_OUT_1               910435
BMY_WD_OUT_2               910435
BMY_WD_OUT_3               910435
BMY_WD_OUT_4               910435
BMY_WD_OUT_5               911459
BMY_WD_OUT_6               910435
BMY_WD_OUT_7               910435
BMY_WD_OUT_8               910435
BMY_WD_OUT_9               910435
BMY_WD_OUT_10              910435
BMY_WD_OUT_11 
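
Raw counts are easier to judge as a share of all rows: 910,435 missing out of 2,350,436 is roughly 39%, and 1,642,774 is roughly 70%. A minimal sketch of computing that share per column, on a toy frame whose column names are borrowed from the real data (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; values are illustrative only
toy = pd.DataFrame({
    "AOBT": [1.0, 2.0, 3.0, 4.0],
    "BMY_AIR_TEMP": [-5703.0, np.nan, np.nan, -5754.0],
    "BPY_AIR_TEMP": [np.nan, np.nan, np.nan, -5600.0],
})

# isnull().mean() gives the fraction of missing rows in each column
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

On the real frame, the same expression with `df` in place of `toy` would rank all 63 columns by how incomplete they are.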

Now I'd like to drop some columns so my data is simpler to explore. Before I do, I should create a data dictionary so I don't delete anything important!

## TWINS Raw Data Columns
|System          |  # | Column  |Data Type              | Description                                         |
|:---------------|:---|:--------|:----------------------|:----------------------------------------------------|
|Time References |  1 | AOBT    |ASCII_Real             | APSS Onboard Time
|                |  2 | SCLK    |ASCII_Real             | Spacecraft Clock
|                |  3 | LMST    |ASCII_String           | Local Mean Solar Time
|                |  4 | LTST    |ASCII_String           | Local True Solar Time
|                |  5 | UTC     |ASCII_Date_Time_DOY_UTC| Coordinated Universal Time
|================|====|=========|=======================|=====================================================|
| BOOM -Y        |  6 | BMY_2L_TEMP_1 | ASCII_Integer | WS transducer 1 PCB temperature PT-1000 PRT |
|                |  7 | BMY_2L_TEMP_2 | ASCII_Integer | WS transducer 2 PCB temperature PT-1000 PRT | 
|                |  8 | BMY_2L_TEMP_3 | ASCII_Integer | WS transducer 3 PCB temperature PT-1000 PRT |
|                |  9 | BMY_2L_TEMP_4 | ASCII_Integer | ATS-mid-rod temperature: PT1000 PRT sensor located at an intermediate position in the ATS rod|
|                | 10 | BMY_2L_TEMP_4_AVERAGE | ASCII_Integer | ATS-mid-rod temperature average of the last N samples
|                | 11 | BMY_2L_TEMP_4_STD     | ASCII_Integer | ATS-mid-rod temperature standard deviation of the last N samples
|                | 12 | BMY_2L_TEMP_5         | ASCII_Integer | Boom Housing Temp: PT-1000 PRT located at the Boom housing near the base of the ATS rod
|                | 13 | BMY_2L_TEMP_6         | ASCII_Integer | Calibration resistor: 1K ohm
|                | 14 | BMY_AIR_TEMP | ASCII_Integer | ATS-rod-extreme temperature: PT1000 PRT located at ATS extreme
|                | 15 | BMY_AIR_TEMP_AVERAGE  | ASCII_Integer | ATS-rod-extreme temperature average of the last N samples
|                | 16 | BMY_AIR_TEMP_STD | ASCII_Integer | ATS-rod-extreme temperature standard deviation of the last N samples
|                | 17 | BMY_WD_REF_OUT_1 | ASCII_Integer | WS transducer 1 cold die temperature
|                | 18 | BMY_WD_REF_OUT_2 | ASCII_Integer | WS transducer 2 cold die temperature
|                | 19 | BMY_WD_REF_OUT_3 | ASCII_Integer | WS transducer 3 cold die temperature
|                | 20 | BMY_WD_OUT_1     | ASCII_Integer | Number of counts measured for WS channel 1
|                | 21 | BMY_WD_OUT_2     | ASCII_Integer | Number of counts measured for WS channel 2
|                | 22 | BMY_WD_OUT_3     | ASCII_Integer | Number of counts measured for WS channel 3
|                | 23 | BMY_WD_OUT_4     | ASCII_Integer | Number of counts measured for WS channel 4
|                | 24 | BMY_WD_OUT_5     | ASCII_Integer | Number of counts measured for WS channel 5
|                | 25 | BMY_WD_OUT_6     | ASCII_Integer | Number of counts measured for WS channel 6
|                | 26 | BMY_WD_OUT_7     | ASCII_Integer | Number of counts measured for WS channel 7
|                | 27 | BMY_WD_OUT_8     | ASCII_Integer | Number of counts measured for WS channel 8
|                | 28 | BMY_WD_OUT_9     | ASCII_Integer | Number of counts measured for WS channel 9
|                | 29 | BMY_WD_OUT_10    | ASCII_Integer | Number of counts measured for WS channel 10
|                | 30 | BMY_WD_OUT_11    | ASCII_Integer | Number of counts measured for WS channel 11
|                | 31 | BMY_WD_OUT_12 | ASCII_Integer | Number of counts measured for WS channel 12
|                | 32 | BMY_ASIC_TEMP | ASCII_Integer | ASIC temperature
|                | 33 | BMY_AIR_TEMP_FREQUENCY | ASCII_String | Air temperature channels frequency or frequencies
|                | 34 | BMY_WIND_FREQUENCY | ASCII_String | Wind channels frequency or frequencies
|================|====|=========|=======================|=====================================================|
| BOOM +Y        | 35 | BPY_2L_TEMP_1 | ASCII_Integer | WS transducer 1 PCB temperature PT-1000 PRT
|                | 36 | BPY_2L_TEMP_2 | ASCII_Integer | WS transducer 2 PCB temperature PT-1000 PRT
|                | 37 | BPY_2L_TEMP_3 | ASCII_Integer | WS transducer 3 PCB temperature PT-1000 PRT
|                | 38 | BPY_2L_TEMP_4 | ASCII_Integer | Calibration resistor: 1K ohm
|                | 39 | BPY_2L_TEMP_5 | ASCII_Integer | ATS-mid-rod temperature: PT1000 PRT sensor located at an intermediate position in the ATS rod
|                | 40 | BPY_2L_TEMP_5_AVERAGE | ASCII_Integer | ATS-mid-rod temperature average of the last N samples
|                | 41 | BPY_2L_TEMP_5_STD | ASCII_Integer | ATS-mid-rod temperature standard deviation of the last N samples
|                | 42 | BPY_2L_TEMP_6 | ASCII_Integer | Boom Housing Temp: PT-1000 PRT located at the Boom housing near the base of the ATS rod
|                | 43 | BPY_AIR_TEMP | ASCII_Integer | ATS-rod-extreme temperature: PT1000 PRT located at ATS extreme
|                | 44 | BPY_AIR_TEMP_AVERAGE | ASCII_Integer | ATS-rod-extreme temperature average of the last N samples
|                | 45 | BPY_AIR_TEMP_STD | ASCII_Integer | ATS-rod-extreme temperature standard deviation of the last N samples
|                | 46 | BPY_WD_REF_OUT_1 | ASCII_Integer | WS transducer 1 cold die temperature
|                | 47 | BPY_WD_REF_OUT_2 | ASCII_Integer | WS transducer 2 cold die temperature
|                | 48 | BPY_WD_REF_OUT_3 | ASCII_Integer | WS transducer 3 cold die temperature
|                | 49 | BPY_WD_OUT_1     | ASCII_Integer | Number of counts measured for WS channel 1
|                | 50 | BPY_WD_OUT_2     | ASCII_Integer | Number of counts measured for WS channel 2
|                | 51 | BPY_WD_OUT_3     | ASCII_Integer | Number of counts measured for WS channel 3
|                | 52 | BPY_WD_OUT_4     | ASCII_Integer | Number of counts measured for WS channel 4
|                | 53 | BPY_WD_OUT_5     | ASCII_Integer | Number of counts measured for WS channel 5
|                | 54 | BPY_WD_OUT_6     | ASCII_Integer | Number of counts measured for WS channel 6
|                | 55 | BPY_WD_OUT_7     | ASCII_Integer | Number of counts measured for WS channel 7
|                | 56 | BPY_WD_OUT_8     | ASCII_Integer | Number of counts measured for WS channel 8
|                | 57 | BPY_WD_OUT_9     | ASCII_Integer | Number of counts measured for WS channel 9
|                | 58 | BPY_WD_OUT_10    | ASCII_Integer | Number of counts measured for WS channel 10
|                | 59 | BPY_WD_OUT_11    | ASCII_Integer | Number of counts measured for WS channel 11
|                | 60 | BPY_WD_OUT_12    | ASCII_Integer | Number of counts measured for WS channel 12
|                | 61 | BPY_ASIC_TEMP    | ASCII_Integer | ASIC temperature
|                | 62 | BPY_AIR_TEMP_FREQUENCY | ASCII_String |Air temperature channels frequency or frequencies
|                | 63 | BPY_WIND_FREQUENCY | ASCII_String | Wind channels frequency or frequencies
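
The dictionary groups the columns into three systems, and the column-name prefixes (`BMY_`, `BPY_`, or neither) carry the same grouping. A minimal sketch of splitting column names by prefix, using a handful of names from the table as a toy list; with the real data, `df.columns` could be used in its place:

```python
# A few column names from the data dictionary, used as a toy example
columns = [
    "AOBT", "SCLK", "LMST", "LTST", "UTC",
    "BMY_AIR_TEMP", "BMY_WD_OUT_1", "BMY_ASIC_TEMP",
    "BPY_AIR_TEMP", "BPY_WD_OUT_1", "BPY_ASIC_TEMP",
]

# Split the names by the system they belong to
boom_minus_y = [c for c in columns if c.startswith("BMY_")]
boom_plus_y = [c for c in columns if c.startswith("BPY_")]
time_refs = [c for c in columns if not c.startswith(("BMY_", "BPY_"))]

print(len(time_refs), len(boom_minus_y), len(boom_plus_y))
```

Keeping the groups as named lists makes later drops explicit, for example dropping one boom's columns in a single step.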

## Abbreviations and their meanings:
- `AOBT APSS OnBoard Time`
- APSS Auxiliary Payload Sensor Subsystem
- ASCII American Standard Code for Information Interchange
- ASIC Application-Specific Integrated Circuit
- Atmos PDS Atmospheres Node (NMSU, Las Cruces, NM)
- ATS Air Temperature Sensor
- CAB Centro de AstroBiologia
- CCSDS Consultative Committee for Space Data Systems
- CDR Calibrated Data Record
- CNES Centre National d’Études Spatiales
- CODMAC Committee on Data Management, Archiving, and Computing
- CSV Comma-Separated Values
- DTE Direct To Earth
- EDL Entry Descent and Landing
- EDR Experiment Data Record
- ERT Earth Received Time
- ESTA Energy Short Term Average
- FEI File Exchange Interface
- FIR Finite Impulse Response
- FOV Field of View
- FTP File Transfer Protocol
- GB Gigabyte(s)
- GEO PDS Geosciences Node (Washington University, St. Louis, Missouri)
- GSFC Goddard Space Flight Center (Greenbelt, MD)
- HK Housekeeping
- HP3 Heat Flow and Physical Properties Package
- HTML Hypertext Markup Language
- ICC Instrument Context Camera
- ICD Interface Control Document
- IDA Instrument Deployment Arm
- IDC Instrument Deployment Camera
- IDS Instrument Deployment System
- IM Information Model
- IRIS Incorporated Research Institutions for Seismology
- ISO International Standards Organization
- JPL Jet Propulsion Laboratory (Pasadena, CA)
- LID Logical Identifier
- LIDVID Versioned Logical Identifier
- `LMST Local Mean Solar Time`
- `LTST Local True Solar Time`
- MAG Magnetometer
- MB Megabyte(s)
- MD5 Message-Digest Algorithm 5
- MIPL Multi-Mission Instrument Processing Laboratory
- NAIF Navigation and Ancillary Information Facility (JPL)
- NASA National Aeronautics and Space Administration
- NSSDC National Space Science Data Center (GSFC)
- PAE Payload Auxiliary Electronics
- PCB Printed Circuit Board
- PDS Planetary Data System
- PDS4 Planetary Data System Version 4
- PPI PDS Planetary Plasma Interactions Node (UCLA)
- PRT Platinum Resistance Thermometer
- PS Pressure Sensor
- RAD Radiometer
- RCT Record Creation Time
- RISE Rotation and Interior Structure Experiment
- RMS Root Mean Square
- SCET Spacecraft Event Time
- `SCLK Spacecraft Clock`
- SEED Standard for the Exchange of Earthquake Data
- SEIS Seismic Experiment for Investigating the Subsurface
- SFTP Secure File Transfer Protocol
- SIS Software Interface Specification
- SPICE Spacecraft, Planet, Instrument, C-matrix, and Events (NAIF data format)
- SPK Spacecraft and Planetary Ephemeris Kernel (NAIF)
- TBD To Be Determined
- TWINS Temperature and Wind sensor for INSight
- URN Uniform Resource Name
- `UTC Coordinated Universal Time`
- VID Version Identifier
- WS Wind Sensor
- WTS Wind and Thermal Shield
- WU Washington University, St. Louis
- XML eXtensible Markup Language


#### TWINS Data Products
There are seven data products for TWINS: three raw, two calibrated, and two derived.

I

Takeaways:
- There are so many columns simply for telling time: the onboard APSS time, the spacecraft clock, local mean and true solar time, and UTC.
- I want to predict air temperature and wind speed/direction.
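
Since several of the time columns arrive as strings, here is a minimal sketch of turning two of them into usable values. The sample strings are copied from the `df.head()` output earlier in this notebook. Reading the first five characters of `LMST` as the sol number is an assumption about the format, not something the data dictionary states.

```python
import pandas as pd

# Sample values copied from the df.head() output
sample = pd.DataFrame({
    "UTC": ["2018-334T14:46:55.755Z", "2018-334T14:46:56.755Z"],
    "LMST": ["00004M06:46:33.826", "00004M06:46:34.799"],
})

# %j is the day of the year, so 2018-334 is parsed as 30 November 2018
sample["utc_time"] = pd.to_datetime(sample["UTC"], format="%Y-%jT%H:%M:%S.%fZ")

# Assumption: the five digits before the "M" in LMST are the sol number
sample["sol"] = sample["LMST"].str.slice(0, 5).astype(int)

print(sample[["utc_time", "sol"]])
```

A parsed `utc_time` column could then serve as a single time index, which would make it easier to decide which of the other time references to keep.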