### 1. Forecast Data Format

#### 1.1 Data Format

```python
     time_col          var_col_0        var_col_1        var_col_2     ...    var_col_n
xxxx-xx-xx xx:xx:xx        x                x                x                    x
xxxx-xx-xx xx:xx:xx        x                x                x                    -
xxxx-xx-xx xx:xx:xx        x                x                x                    x
xxxx-xx-xx xx:xx:xx        -                x                x                    x
xxxx-xx-xx xx:xx:xx        x                x                x                    x
xxxx-xx-xx xx:xx:xx        x                -                x                    x
xxxx-xx-xx xx:xx:xx        x                -                x                    x
xxxx-xx-xx xx:xx:xx        x                x                -                    x
xxxx-xx-xx xx:xx:xx        x                x                x                    x
xxxx-xx-xx xx:xx:xx        x                x                -                    x
xxxx-xx-xx xx:xx:xx        -                -                -                    -
xxxx-xx-xx xx:xx:xx        x                x                x                    x
        -                  -                -                -                    -
xxxx-xx-xx xx:xx:xx        x                x                x                    x
```

**where `xxxx-xx-xx xx:xx:xx` represents a timestamp, `x` represents the value of a variable at that moment, and `-` represents a missing value.**

```python
     time_col          var_col_0   var_col_1 ... var_col_n     covar_col_0    covar_col_1 ... covar_col_m
xxxx-xx-xx xx:xx:xx        x          x              x              x              x               x 
xxxx-xx-xx xx:xx:xx        x          x              -              x              x               x
xxxx-xx-xx xx:xx:xx        x          x              x              x              -               x
xxxx-xx-xx xx:xx:xx        -          x              x              x              x               -
xxxx-xx-xx xx:xx:xx        x          x              x              x              x               x
xxxx-xx-xx xx:xx:xx        x          -              x              x              x               x
xxxx-xx-xx xx:xx:xx        x          -              x              x              x               x
xxxx-xx-xx xx:xx:xx        x          x              x              x              -               x
xxxx-xx-xx xx:xx:xx        x          x              x              x              x               x
xxxx-xx-xx xx:xx:xx        x          x              x              x              x               x
xxxx-xx-xx xx:xx:xx        -          -              -              x              x               x
xxxx-xx-xx xx:xx:xx        x          x              x              x              x               x
        -                  -          -              -              -              -               -
xxxx-xx-xx xx:xx:xx        x          x              x              x              x               x
```
**where `covar_col_i` (i = 0, 1, ..., m) represents a covariate.**
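
A frame in this layout can be checked with plain pandas before it is used for forecasting. The sketch below is not part of HyperTS; the helper name `summarize_forecast_frame` and its arguments are illustrative assumptions. It counts the missing values per column and lists the timestamps that are absent from an otherwise regular time index (the row of dashes in the layouts above).

```python
import numpy as np
import pandas as pd


def summarize_forecast_frame(df, time_col, freq):
    """Hypothetical helper: report missing values and missing timestamps."""
    # Parse the time column and drop rows whose timestamp itself is missing.
    times = pd.to_datetime(df[time_col]).dropna()
    # Build the complete, regular index that the data is expected to follow.
    expected = pd.date_range(start=times.min(), end=times.max(), freq=freq)
    missing_timestamps = expected.difference(pd.DatetimeIndex(times))
    # Count NaN entries in every non-time column.
    missing_values = df.drop(columns=[time_col]).isna().sum()
    return missing_values, missing_timestamps


# Toy data: a daily series with the 2022-02-03 row absent and one NaN value.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2022-02-01', '2022-02-02',
                                 '2022-02-04', '2022-02-05']),
    'val_0': [1.0, np.nan, 3.0, 4.0],
    'val_1': [0.5, 0.2, 0.9, 0.0],
})

missing_values, missing_timestamps = summarize_forecast_frame(
    df, time_col='timestamp', freq='D')
print(missing_values)
print(missing_timestamps)
```

The expected frequency is passed in explicitly here, because inferring it from an index that already has gaps may not be reliable.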

#### 1.2 Example

In [3]:
import numpy as np
import pandas as pd

size=5

# without covariates
df_no_covariates = pd.DataFrame({
    'timestamp': pd.date_range(start='2022-02-01', periods=size, freq='H'),
    'val_0': np.random.normal(size=size),
    'val_1': [0.5, 0.2, np.nan, 0.9, 0.0],
    'val_2': np.random.normal(size=size),
})

df_no_covariates

| | timestamp | val_0 | val_1 | val_2 |
|---|---|---|---|---|
| 0 | 2022-02-01 00:00:00 | -0.340573 | 0.5 | 1.684303 |
| 1 | 2022-02-01 01:00:00 | -0.643602 | 0.2 | 0.46057 |
| 2 | 2022-02-01 02:00:00 | -0.061449 | NaN | -1.277848 |
| 3 | 2022-02-01 03:00:00 | -1.329061 | 0.9 | 0.111005 |
| 4 | 2022-02-01 04:00:00 | 0.451141 | 0.0 | -0.018236 |


In [4]:
# with covariates
df_with_covariates = pd.DataFrame({
    'timestamp': pd.date_range(start='2022-02-01', periods=size, freq='D'),
    'val_0': np.random.normal(size=size),
    'val_1': [12, 52, 34, np.nan, 100],
    'val_2': [0.5, 0.2, np.nan, 0.9, 0.0],
    'covar_0': [0.2, 0.4, 0.2, 0.7, 0.1],
    'covar_1': ['a', 'a', 'b', 'b', 'b'],
    'covar_2': [1, 2, 2, None, 3], 
})

df_with_covariates

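| | timestamp | val_0 | val_1 | val_2 | covar_0 | covar_1 | covar_2 |
|---|---|---|---|---|---|---|---|
| 0 | 2022-02-01 | -0.028036 | 12.0 | 0.5 | 0.2 | a | 1.0 |
| 1 | 2022-02-02 | -1.413566 | 52.0 | 0.2 | 0.4 | a | 2.0 |
| 2 | 2022-02-03 | -0.726053 | 34.0 | NaN | 0.2 | b | 2.0 |
| 3 | 2022-02-04 | -0.505495 | NaN | 0.9 | 0.7 | b | NaN |
| 4 | 2022-02-05 | -1.008861 | 100.0 | 0.0 | 0.1 | b | 3.0 |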
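
The covariate columns of a frame like `df_with_covariates` mix numeric and categorical data. The following is a minimal pandas-only sketch of separating target variables from covariates and grouping the covariates by dtype. It assumes that covariates can be recognized by the `covar_` name prefix used in this example; in real data the column names would have to be listed explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.date_range(start='2022-02-01', periods=5, freq='D'),
    'val_0': [0.1, 0.4, 0.3, 0.8, 0.6],
    'val_1': [12, 52, 34, np.nan, 100],
    'covar_0': [0.2, 0.4, 0.2, 0.7, 0.1],
    'covar_1': ['a', 'a', 'b', 'b', 'b'],
})

# Assumption: covariates are identified by the 'covar_' prefix used here.
covariate_cols = [c for c in df.columns if c.startswith('covar_')]
target_cols = [c for c in df.columns
               if c not in covariate_cols and c != 'timestamp']

# Split the covariates by dtype: numeric columns versus everything else.
numeric_covariates = (
    df[covariate_cols].select_dtypes(include='number').columns.tolist()
)
categorical_covariates = [c for c in covariate_cols
                          if c not in numeric_covariates]

print(target_cols)
print(numeric_covariates)
print(categorical_covariates)
```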


In [5]:
from hyperts.datasets import (load_random_univariate_forecast_dataset, 
                              load_random_multivariate_forecast_dataset,
                              load_network_traffic)

In [6]:
df0 = load_random_univariate_forecast_dataset(return_X_y=False)
df0.head()

| | ds | id | value |
|---|---|---|---|
| 0 | 2013-01-01 | 1.0 | 0.074821 |
| 1 | 2013-01-02 | 1.0 | 0.466331 |
| 2 | 2013-01-03 | 0.0 | 0.297812 |
| 3 | 2013-01-04 | 0.0 | 0.898156 |
| 4 | 2013-01-05 | 1.0 | 0.22497 |


In [7]:
df1 = load_random_multivariate_forecast_dataset(return_X_y=False)
df1.head()

| | ds | Var_1 | Var_2 |
|---|---|---|---|
| 0 | 2022-02-17 19:24:19.575739 | 0.033264 | 0.793325 |
| 1 | 2022-02-18 19:24:19.575739 | 1.682852 | 2.647661 |
| 2 | 2022-02-19 19:24:19.575739 | 2.868912 | 3.423535 |
| 3 | 2022-02-20 19:24:19.575739 | 3.930536 | 4.101508 |
| 4 | 2022-02-21 19:24:19.575739 | 4.528761 | 4.800131 |


In [8]:
df2 = load_network_traffic(return_X_y=False)
df2.head()

| | TimeStamp | Var_1 | Var_2 | Var_3 | Var_4 | Var_5 | Var_6 | HourSin | WeekCos | CBWD |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 00:00:00 | 0.7534 | 3.375 | 10.195 | 1.449 | 19174.977 | 286443.88 | 0.0 | 1.0 | NW |
| 1 | 2021-03-01 01:00:00 | 0.3376 | 2.414 | 3.92 | 0.4065 | 7529.263 | 178930.45 | 0.258819 | 1.0 | NW |
| 2 | 2021-03-01 02:00:00 | 0.2032 | 1.654 | 3.318 | 0.2142 | 3310.539 | 42296.164 | 0.5 | 1.0 | NW |
| 3 | 2021-03-01 03:00:00 | 0.242 | 1.393 | 3.148 | 0.2312 | 4535.464 | 26220.232 | 0.707107 | 1.0 | NW |
| 4 | 2021-03-01 04:00:00 | 0.194 | 1.429 | 3.215 | 0.2157 | 2732.911 | 27990.348 | 0.866025 | 1.0 | NW |


<br>

<br>

### 2. Classification/Regression Data Format

#### 2.1 Data Format

```python
 var_col_0       var_col_1   ...    var_col_n     target
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      0
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      0
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      1
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      1
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      1
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      2
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      2
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      2
x,x,x,...,x     x,x,x,...,x        x,x,x,...,x      2
```

**where each cell `x,x,x,...,x` holds one sample of one variable, i.e. the fluctuation of that variable over a time segment of length `len(x,x,x,...,x)`, and each `x` represents the value of the variable at one time step.**

#### 2.2 Example

In [9]:
import numpy as np
import pandas as pd

size=10

n_samples = 6

# Each cell holds one pd.Series of length `size` (one variable of one sample).
df = pd.DataFrame({
    'var_0': [pd.Series(np.random.normal(size=size)) for _ in range(n_samples)],
    'var_1': [pd.Series(np.random.normal(size=size)) for _ in range(n_samples)],
    'var_2': [pd.Series(np.random.normal(size=size)) for _ in range(n_samples)],
    'y': [0, 0, 1, 1, 2, 2],
})

df

| | var_0 | var_1 | var_2 | y |
|---|---|---|---|---|
| 0 | 0 -0.047296 1 -0.301670 2 0.115860 3 ... | 0 -1.255403 1 1.180596 2 1.129211 3 ... | 0 1.499294 1 -1.279534 2 -0.625612 3 ... | 0 |
| 1 | 0 0.066158 1 0.642872 2 -2.125515 3 ... | 0 -1.651127 1 -2.143031 2 -1.531549 3 ... | 0 1.336447 1 -1.372525 2 -2.998177 3 ... | 0 |
| 2 | 0 -1.092750 1 0.290924 2 -0.185960 3 ... | 0 -1.063737 1 0.631803 2 0.105490 3 ... | 0 -0.384169 1 -0.333234 2 0.801410 3 ... | 1 |
| 3 | 0 -1.543037 1 0.270866 2 1.423431 3 ... | 0 0.798215 1 -0.194042 2 0.399881 3 ... | 0 -1.131351 1 0.215143 2 -0.076144 3 ... | 1 |
| 4 | 0 -1.711941 1 0.186444 2 0.253809 3 ... | 0 0.462602 1 1.419213 2 0.392129 3 ... | 0 -0.042926 1 -0.946377 2 0.861729 3 ... | 2 |
| 5 | 0 -0.935916 1 -0.732586 2 -0.790152 3 ... | 0 -1.095017 1 0.272817 2 -1.032466 3 ... | 0 0.395659 1 0.690628 2 0.170508 3 ... | 2 |
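
The nested cells shown above are printed in abbreviated form. The sketch below uses plain pandas on a small hand-made frame to show one way of reading a single cell and of checking that every series has the same length. The variable names are illustrative, and whether equal lengths are strictly required by a given model is an assumption here.

```python
import numpy as np
import pandas as pd

series_length = 4

# Two samples, two variables; every cell is a pandas Series of length 4.
df = pd.DataFrame({
    'var_0': [pd.Series(np.arange(series_length, dtype=float)),
              pd.Series(np.ones(series_length))],
    'var_1': [pd.Series(np.zeros(series_length)),
              pd.Series(np.full(series_length, 2.0))],
    'y': [0, 1],
})

# A single cell is itself a Series: variable var_0 of sample 0.
cell = df.loc[0, 'var_0']
print(type(cell).__name__, len(cell))

# Value of that variable at the third time step.
third_value = cell.iloc[2]
print(third_value)

# Collect the length of every nested series across all variable columns.
variable_cols = [c for c in df.columns if c != 'y']
lengths = {len(s) for col in variable_cols for s in df[col]}
print(lengths)
```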


In [10]:
from hyperts.datasets import load_arrow_head, load_basic_motions

In [11]:
df0 = load_arrow_head(return_X_y=False)
df0.head() 

| | Var_1 | target |
|---|---|---|
| 0 | 0 -1.9630 1 -1.9578 2 -1.9561 3 ... | 0 |
| 1 | 0 -1.7746 1 -1.7740 2 -1.7766 3 ... | 1 |
| 2 | 0 -1.8660 1 -1.8420 2 -1.8350 3 ... | 2 |
| 3 | 0 -2.0738 1 -2.0733 2 -2.0446 3 ... | 0 |
| 4 | 0 -1.7463 1 -1.7413 2 -1.7227 3 ... | 1 |


In [12]:
df0.target.unique()

array(['0', '1', '2'], dtype=object)

In [13]:
df1 = load_basic_motions(return_X_y=False)
df1.head()

| | Var_1 | Var_2 | Var_3 | Var_4 | Var_5 | Var_6 | target |
|---|---|---|---|---|---|---|---|
| 0 | 0 0.079106 1 0.079106 2 -0.903497 3... | 0 0.394032 1 0.394032 2 -3.666397 3... | 0 0.551444 1 0.551444 2 -0.282844 3... | 0 0.351565 1 0.351565 2 -0.095881 3... | 0 0.023970 1 0.023970 2 -0.319605 3... | 0 0.633883 1 0.633883 2 0.972131 3... | standing |
| 1 | 0 0.377751 1 0.377751 2 2.952965 3... | 0 -0.610850 1 -0.610850 2 0.970717 3... | 0 -0.147376 1 -0.147376 2 -5.962515 3... | 0 -0.103872 1 -0.103872 2 -7.593275 3... | 0 -0.109198 1 -0.109198 2 -0.697804 3... | 0 -0.037287 1 -0.037287 2 -2.865789 3... | standing |
| 2 | 0 -0.813905 1 -0.813905 2 -0.424628 3... | 0 0.825666 1 0.825666 2 -1.305033 3... | 0 0.032712 1 0.032712 2 0.826170 3... | 0 0.021307 1 0.021307 2 -0.372872 3... | 0 0.122515 1 0.122515 2 -0.045277 3... | 0 0.775041 1 0.775041 2 0.383526 3... | standing |
| 3 | 0 0.289855 1 0.289855 2 -0.669185 3... | 0 0.284130 1 0.284130 2 -0.210466 3... | 0 0.213680 1 0.213680 2 0.252267 3... | 0 -0.314278 1 -0.314278 2 0.018644 3... | 0 0.074574 1 0.074574 2 0.007990 3... | 0 -0.079901 1 -0.079901 2 0.237040 3... | standing |
| 4 | 0 -0.123238 1 -0.123238 2 -0.249547 3... | 0 0.379341 1 0.379341 2 0.541501 3... | 0 -0.286006 1 -0.286006 2 0.208420 3... | 0 -0.098545 1 -0.098545 2 -0.023970 3... | 0 0.058594 1 0.058594 2 0.175783 3... | 0 -0.074574 1 -0.074574 2 0.114525 3... | standing |


In [14]:
df1.target.unique()

array(['standing', 'running', 'walking', 'badminton'], dtype=object)

#### 2.3 Converting a 3-D numpy.ndarray to a 2-D pandas.DataFrame

In [15]:
import numpy as np

nb_samples = 100
series_length = 72
nb_variables = 6
nb_classes = 4

# X has shape (nb_samples, series_length, nb_variables)
X = np.random.normal(size=(nb_samples, series_length, nb_variables))
y = np.random.randint(low=0, high=nb_classes, size=nb_samples)

In [16]:
X.shape, y.shape, np.unique(y)

((100, 72, 6), (100,), array([0, 1, 2, 3]))

In [17]:
import pandas as pd
from hyperts.toolbox import from_3d_array_to_nested_df

df_X = from_3d_array_to_nested_df(data=X)
df_y = pd.DataFrame({'y': y})
df = pd.concat([df_X, df_y], axis=1)

In [18]:
df.head()

| | Var_0 | Var_1 | Var_2 | Var_3 | Var_4 | Var_5 | y |
|---|---|---|---|---|---|---|---|
| 0 | 0 0.806234 1 0.188977 2 1.273848 3... | 0 0.258608 1 -0.792130 2 1.038857 3... | 0 -0.847792 1 0.156601 2 -0.470024 3... | 0 -1.678493 1 0.630987 2 0.315366 3... | 0 0.101854 1 0.598913 2 -0.094354 3... | 0 -2.338218 1 0.449695 2 1.073322 3... | 0 |
| 1 | 0 -0.852834 1 0.153450 2 -0.735384 3... | 0 0.849006 1 -0.223111 2 0.369678 3... | 0 0.441615 1 1.876537 2 0.867711 3... | 0 -1.330127 1 -0.852403 2 0.929594 3... | 0 0.790247 1 -2.822168 2 0.198400 3... | 0 -1.667110 1 0.667884 2 -0.552332 3... | 0 |
| 2 | 0 0.516611 1 -1.936538 2 1.181051 3... | 0 0.332350 1 -0.899296 2 0.385209 3... | 0 1.108018 1 -0.523654 2 -1.078039 3... | 0 -0.529913 1 1.149300 2 1.532611 3... | 0 0.261585 1 -1.552161 2 0.579025 3... | 0 -1.105977 1 0.412628 2 -0.477267 3... | 1 |
| 3 | 0 -1.303581 1 -0.052115 2 0.170122 3... | 0 0.567543 1 -1.612241 2 0.106416 3... | 0 0.181888 1 0.611484 2 1.393938 3... | 0 -0.414300 1 0.380566 2 -0.634024 3... | 0 0.374566 1 0.211554 2 1.043710 3... | 0 -0.033671 1 1.241645 2 1.849054 3... | 2 |
| 4 | 0 0.117883 1 -1.520218 2 -0.099349 3... | 0 -2.272789 1 1.653166 2 0.297192 3... | 0 0.664080 1 -0.511925 2 0.373986 3... | 0 1.185659 1 0.634788 2 -0.258795 3... | 0 -0.090901 1 -1.225513 2 0.873829 3... | 0 -2.196946 1 0.294095 2 0.995669 3... | 3 |
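
The reverse direction, from a nested DataFrame back to a 3-D array, can be sketched with NumPy and pandas alone. The helper `nested_df_to_3d_array` below is a hypothetical name written for illustration, not a confirmed HyperTS function. It assumes that all nested series have the same length and returns the `(nb_samples, series_length, nb_variables)` layout used for `X` above.

```python
import numpy as np
import pandas as pd


def nested_df_to_3d_array(df_nested):
    """Hypothetical helper: stack nested cells into (samples, length, variables)."""
    # For each column, stack the per-sample series into (samples, length).
    per_variable = [
        np.stack([np.asarray(cell) for cell in df_nested[col]])
        for col in df_nested.columns
    ]
    # Put the variable axis last to match the layout of X.
    return np.stack(per_variable, axis=-1)


# Build a small nested frame by hand from a known 3-D array.
X_small = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
df_nested = pd.DataFrame({
    f'Var_{j}': [pd.Series(X_small[i, :, j]) for i in range(X_small.shape[0])]
    for j in range(X_small.shape[2])
})

X_back = nested_df_to_3d_array(df_nested)
print(X_back.shape)
```

Only the variable columns should be passed to such a helper; a label column like `y` would need to be dropped first.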
