# 01_IDAO_Preproc_datetime

* Extracting timestamps from the datasets.

## Data

* id — integer measurement id
* epoch — datetime in "%Y-%m-%dT%H:%M:%S.%f" format (like 2014-02-01T00:44:57.685)
* sat_id — integer satellite id
* x, y, z, x_sim, y_sim, z_sim — real (obtained with the precise simulator) and simulated (obtained with SGP4-simulator) coordinates of the satellite (km)
* Vx, Vy, Vz, Vx_sim, Vy_sim, Vz_sim — real (obtained with the precise simulator) and simulated (obtained with SGP4-simulator) velocities of the satellite (km/s)
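
As a quick check of the `epoch` format described in the list above, the sample value can be parsed with the standard library. This is only a minimal sketch; the names `EPOCH_FORMAT`, `sample` and `parsed` are illustrative and do not appear in the notebook.

```python
from datetime import datetime

# Format string taken from the column description above.
EPOCH_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Sample value from the column description.
sample = "2014-02-01T00:44:57.685"

# %f accepts 1 to 6 fractional digits and pads on the right,
# so ".685" is read as 685000 microseconds.
parsed = datetime.strptime(sample, EPOCH_FORMAT)
print(parsed)  # 2014-02-01 00:44:57.685000
```

Note that the fractional part is milliseconds in the data, while Python stores it as microseconds.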

## 0. Importing libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

## 1. Data overview

In [2]:
# Training and test data
train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/Track 1/test.csv')

# Task 1
submission = pd.read_csv('data/Track 1/submission.csv')

In [3]:
train.head(3)

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,0,2014-01-01T00:00:00.000,0,-8855.823863,13117.780146,-20728.353233,-0.908303,-3.808436,-2.022083,-8843.131454,13138.22169,-20741.615306,-0.907527,-3.80493,-2.024133
1,1,2014-01-01T00:46:43.000,0,-10567.672384,1619.746066,-24451.813271,-0.30259,-4.272617,-0.612796,-10555.500066,1649.289367,-24473.089556,-0.303704,-4.269816,-0.616468
2,2,2014-01-01T01:33:26.001,0,-10578.684043,-10180.46746,-24238.280949,0.277435,-4.047522,0.723155,-10571.858472,-10145.939908,-24271.169776,0.27488,-4.046788,0.718768


In [4]:
test.head(3)

Unnamed: 0,id,sat_id,epoch,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,3927,1,2014-02-01T00:01:45.162,-13366.891347,-14236.753503,6386.774555,4.333815,-0.692764,0.810774
1,3928,1,2014-02-01T00:22:57.007,-7370.434039,-14498.77152,7130.411325,5.077413,0.360609,0.313402
2,3929,1,2014-02-01T00:44:08.852,-572.068654,-13065.289498,7033.794876,5.519106,2.01283,-0.539412


Convert the **epoch** column to timestamps:

In [5]:
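def time_preproc(row):
    # Split "YYYY-MM-DDTHH:MM:SS.fff" at the dot:
    # the part after the dot is the fractional-second field (kept as a string),
    # the part before it is the timestamp with one-second resolution.
    row['epoch_num'] = row['epoch'].split('.')[1]
    row['datetime'] = row['epoch'].split('.')[0]
    return row

def df_epoch_preproc(df):
    # Apply the split row by row, then parse the second-resolution string.
    df = df.apply(time_preproc, axis=1)
    df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%M:%S')
    return df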
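
The row-wise `apply` above calls a Python function once per row, and the resulting `datetime` column drops the fractional seconds. A possible alternative is sketched below, on the assumption that every `epoch` value follows the documented format. It parses the whole column in one `pd.to_datetime` call and keeps the milliseconds. The function name `df_epoch_preproc_vectorized` and the `demo` frame are hypothetical and not part of the notebook.

```python
import pandas as pd

def df_epoch_preproc_vectorized(df):
    """Hypothetical variant: parse `epoch` in one vectorized call."""
    df = df.copy()  # leave the caller's frame untouched
    # Full format including the fractional part, as given in the data description.
    df['datetime'] = pd.to_datetime(df['epoch'], format='%Y-%m-%dT%H:%M:%S.%f')
    return df

# Two epoch values copied from the tables in this notebook.
demo = pd.DataFrame({'epoch': ['2014-01-01T01:33:26.001',
                               '2014-02-01T00:01:45.162']})
demo = df_epoch_preproc_vectorized(demo)

# Flooring to whole seconds gives the same second-resolution values
# that the split-based version produces.
demo['datetime_sec'] = demo['datetime'].dt.floor('s')
print(demo)
```

Whether to keep the milliseconds depends on how the timestamps are used later; the floored column is shown only to relate this sketch to the notebook's output.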

In [6]:
train = df_epoch_preproc(train)
test = df_epoch_preproc(test)

In [7]:
train.head(3)

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim,epoch_num,datetime
0,0,2014-01-01T00:00:00.000,0,-8855.823863,13117.780146,-20728.353233,-0.908303,-3.808436,-2.022083,-8843.131454,13138.22169,-20741.615306,-0.907527,-3.80493,-2.024133,0,2014-01-01 00:00:00
1,1,2014-01-01T00:46:43.000,0,-10567.672384,1619.746066,-24451.813271,-0.30259,-4.272617,-0.612796,-10555.500066,1649.289367,-24473.089556,-0.303704,-4.269816,-0.616468,0,2014-01-01 00:46:43
2,2,2014-01-01T01:33:26.001,0,-10578.684043,-10180.46746,-24238.280949,0.277435,-4.047522,0.723155,-10571.858472,-10145.939908,-24271.169776,0.27488,-4.046788,0.718768,1,2014-01-01 01:33:26


In [8]:
test.head(3)

Unnamed: 0,id,sat_id,epoch,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim,epoch_num,datetime
0,3927,1,2014-02-01T00:01:45.162,-13366.891347,-14236.753503,6386.774555,4.333815,-0.692764,0.810774,162,2014-02-01 00:01:45
1,3928,1,2014-02-01T00:22:57.007,-7370.434039,-14498.77152,7130.411325,5.077413,0.360609,0.313402,7,2014-02-01 00:22:57
2,3929,1,2014-02-01T00:44:08.852,-572.068654,-13065.289498,7033.794876,5.519106,2.01283,-0.539412,852,2014-02-01 00:44:08


Dropping redundant columns:

In [11]:
train = train.drop(['epoch', 'epoch_num'], axis=1)
test = test.drop(['epoch', 'epoch_num'], axis=1)

Saving the datasets:

In [12]:
train.to_csv('data/train_proc_with_id.csv')
test.to_csv('data/Track 1/test_proc_with_id.csv')
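
CSV stores the `datetime` column as plain text, so it has to be parsed again when the saved files are loaded. The sketch below shows one way to do that with `parse_dates`, using an in-memory buffer instead of the real files. The small frame `df` and the names `buffer` and `restored` are made up for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for the processed dataset (values copied from the train table).
df = pd.DataFrame({
    'id': [0, 1],
    'datetime': pd.to_datetime(['2014-01-01 00:00:00', '2014-01-01 00:46:43']),
})

# Same call style as above: the index is written as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=0 reads that first column back as the index;
# parse_dates converts the text column to datetime again.
restored = pd.read_csv(buffer, index_col=0, parse_dates=['datetime'])
print(restored.dtypes)
```

Without `index_col=0`, the written index comes back as an extra `Unnamed: 0` column, which is how such a column appears in the tables above.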