## Use case



Use exactly one loader per datatype.
This notebook demonstrates most of the methods in the package.
Most of the cases shown are advanced.

This notebook uses data from `example_multiprocess`, so make sure to run that example first.



### Initialization



In [1]:
%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
from TSload import TSloader, DataFormat

In [1]:
path = "data/example_use_case/data"
datatype = "simulated"
split = ["0", "1"]
permission = "overwrite"  # Overwrite is used for repeated execution
loader = TSloader(path, datatype, permission=permission)

### Data operations



#### Add datatype



In [1]:
d = {
    "ID": np.hstack((["name1" for _ in range(5)], ["name2" for _ in range(5)])),
    "timestamp": list(map(str, range(0, 10))),
    "feature1": list(range(10)),
    "feature2": list(range(10, 20)),
}
df = pd.DataFrame(data=d)
loader.initialize_datatype(df=df)
print(loader.df)

#+begin_example
                 feature1  feature2
ID    timestamp                    
name1 0                 0        10
      1                 1        11
      2                 2        12
      3                 3        13
      4                 4        14
name2 5                 5        15
      6                 6        16
      7                 7        17
      8                 8        18
      9                 9        19
#+end_example
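
The printed frame is indexed by the pair (`ID`, `timestamp`). As a rough pandas-only illustration of that layout, the same structure can be built with `set_index`. This is an assumption about the resulting shape, not a description of what `initialize_datatype` does internally, and the names `flat`, `indexed` and `name2_rows` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same toy data as in the cell above.
flat = pd.DataFrame({
    "ID": np.repeat(["name1", "name2"], 5),
    "timestamp": [str(t) for t in range(10)],
    "feature1": list(range(10)),
    "feature2": list(range(10, 20)),
})

# Assumption: the loader keeps one row per (ID, timestamp) pair.
indexed = flat.set_index(["ID", "timestamp"])

# Select every row of a single ID through the first index level.
name2_rows = indexed.loc["name2"]
print(name2_rows)
```

Selecting on the first level returns a frame indexed by `timestamp` only, which is convenient when working on one series at a time.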

#### Add ID



In [1]:
ID = "added_ID"
d = {"timestamp": list(map(str, range(0, 5))), "feature1": list(range(5)), "feature2": list(range(10, 15))}
df = pd.DataFrame(data=d)
loader.add_ID(df, ID=ID)
print(loader.metadata) # in memory

          split                       IDs              features         start  \
datatype                                                                        
simulated    []  [added_ID, name2, name1]  [feature2, feature1]  [2016-01-01]   

          test test2  
datatype              
simulated  [0]   [1]
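
Note that the frame passed to `add_ID` has no `ID` column; the ID is given separately. A minimal pandas sketch of the same idea, assuming that adding an ID amounts to labelling the new rows and appending them to the existing frame (the package may well do more, for example update its metadata):

```python
import pandas as pd

# Existing data: one ID, indexed by (ID, timestamp).
existing = pd.DataFrame({
    "ID": ["name1"] * 3,
    "timestamp": ["0", "1", "2"],
    "feature1": [0, 1, 2],
}).set_index(["ID", "timestamp"])

# New rows arrive without an ID column, as in the cell above.
new_rows = pd.DataFrame({"timestamp": ["0", "1"], "feature1": [10, 11]})

# Assumption: label the rows with the new ID, then append them.
labelled = new_rows.assign(ID="added_ID").set_index(["ID", "timestamp"])
combined = pd.concat([existing, labelled])

print(combined.index.get_level_values("ID").unique().tolist())
```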

#### Add feature



It is easier to add the datatype correctly in the first place
than to patch it afterwards with `add_feature`.

You can still add a feature to a single ID, here `added_ID`:



In [1]:
feature = "added_feature"
d = {"timestamp": list(map(str, range(10))), feature: list(range(10))}
df = pd.DataFrame(data=d)
#print(df)
loader.add_feature(df, ID="added_ID", feature=feature)
loader.metadata

          split                       IDs              features         start  \
datatype                                                                        
simulated    []  [added_ID, name2, name1]  [feature2, feature1]  [2016-01-01]   

          test test2  
datatype              
simulated  [0]   [1]
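
The frame printed under "Remove data" shows `added_feature` as `NaN` for the IDs that did not receive it, and the integer columns printed as floats. A pandas-only sketch of how such a result can arise, assuming an outer alignment on the (`ID`, `timestamp`) index; this is a guess at the behaviour, not the package's documented implementation:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("name1", "0"), ("name1", "1"), ("added_ID", "0"), ("added_ID", "1")],
    names=["ID", "timestamp"],
)
data = pd.DataFrame({"feature1": [0, 1, 0, 1]}, index=index)

# The new feature covers one ID only and has one extra timestamp.
feature_index = pd.MultiIndex.from_tuples(
    [("added_ID", "0"), ("added_ID", "1"), ("added_ID", "2")],
    names=["ID", "timestamp"],
)
feature = pd.DataFrame({"added_feature": [0, 1, 2]}, index=feature_index)

# An outer join keeps every (ID, timestamp) pair found in either frame.
joined = data.join(feature, how="outer")
print(joined)
print(joined.dtypes)
```

Missing combinations are filled with `NaN`, which forces the integer columns to `float64`. Supplying all features when the datatype is first created avoids this upcast.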

#### Remove data



In [1]:
empty_loader = TSloader(path, datatype, permission=permission)
empty_loader.df

#+begin_example
                    feature1  feature2  added_feature
ID       timestamp                                   
name1    0               0.0      10.0            NaN
         1               1.0      11.0            NaN
         2               2.0      12.0            NaN
         3               3.0      13.0            NaN
         4               4.0      14.0            NaN
name2    5               5.0      15.0            NaN
         6               6.0      16.0            NaN
         7               7.0      17.0            NaN
         8               8.0      18.0            NaN
         9               9.0      19.0            NaN
added_ID 0               0.0      10.0            0.0
         1               1.0      11.0            1.0
         2               2.0      12.0            2.0
         3               3.0      13.0            3.0
         4               4.0      14.0            4.0
         5               NaN       NaN            5.0
         6  
#+end_example

In [1]:
empty_loader.rm_datatype()
assert len(empty_loader.df) == 0

### Metadata operations



#### Add metadata



In [1]:
loader.overwrite_metadata(start="2016-01-01")
loader.add_metadata(start="2016-01-01")
loader.add_metadata(test=["0", "0"], test2=["1", "1"])
loader.metadata

          split                       IDs              features         start  \
datatype                                                                        
simulated    []  [added_ID, name2, name1]  [feature2, feature1]  [2016-01-01]   

          test test2  
datatype              
simulated  [0]   [1]
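
In the output above, `test=["0", "0"]` is stored as `[0]`, and `start` holds a single value although it was overwritten and then added again. This suggests that metadata entries behave like lists of unique values. The plain-Python sketch below mimics that behaviour; `overwrite_entry` and `add_entry` are hypothetical helpers written for this illustration and are not part of `TSload`.

```python
def _as_list(values):
    """Wrap a single string in a list; copy any other iterable."""
    return [values] if isinstance(values, str) else list(values)


def overwrite_entry(metadata, **entries):
    """Replace each named entry with the given value(s), dropping duplicates."""
    for key, values in entries.items():
        metadata[key] = list(dict.fromkeys(_as_list(values)))
    return metadata


def add_entry(metadata, **entries):
    """Append value(s) to each named entry, skipping values already present."""
    for key, values in entries.items():
        current = metadata.setdefault(key, [])
        for value in _as_list(values):
            if value not in current:
                current.append(value)
    return metadata


metadata = {}
overwrite_entry(metadata, start="2016-01-01")
add_entry(metadata, start="2016-01-01")  # already present, no change
add_entry(metadata, test=["0", "0"], test2=["1", "1"])
print(metadata)
# {'start': ['2016-01-01'], 'test': ['0'], 'test2': ['1']}
```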

Don't forget to write the changes to the file:



In [1]:
loader.write()

### Dataset operations



*Execution order matters here: the cells below copy, move, and delete data on disk.*



#### Instantiate



In [1]:
data_path = "data/example_use_case/data"
multiprocess_path = "data/example_multiprocess"
copy_path = "data/example_use_case/copy"
move_path = "data/example_use_case/move"
merge_path = "data/example_use_case/example_merge"
permission = "overwrite"
data_loader = TSloader(data_path, datatype, permission=permission)
multiprocess_loader = TSloader(multiprocess_path, datatype, permission=permission)
print("Use case metadata")
print("-----------------")
print(data_loader.metadata)
print()
print("Multiprocess metadata")
print("---------------------")
print(multiprocess_loader.metadata)

#+begin_example
    Use case metadata
    -----------------
              split                       IDs              features         start  \
    datatype                                                                        
    simulated    []  [added_ID, name2, name1]  [feature2, feature1]  [2016-01-01]   

              test test2  
    datatype              
    simulated  [0]   [1]  

    Multiprocess metadata
    ---------------------
                              split             IDs              features
    splitted_data  [split0, split1]  [name1, name2]  [feature0, feature1]
    simulated                    []              []                    []
    #+end_example

##### Copy the data to `copy_path`



In [1]:
data_loader.copy_dataset(copy_path)

##### Move the data to `move_path`



In [1]:
data_loader.move_dataset(move_path)

##### Remove the data from the loader's path (`move_path`) and set its path back



In [1]:
data_loader.rm_dataset()
data_loader.set_path(data_path)
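
The copy, move, remove sequence can be rehearsed safely on a throwaway directory with the standard library. This is only a filesystem analogy, under the assumption that a dataset is a directory of files; it does not call `TSload`, and the file name `simulated.csv` is made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
data_dir = root / "data"
copy_dir = root / "copy"
move_dir = root / "move"

# Stand-in for a dataset: a directory holding one file.
data_dir.mkdir()
(data_dir / "simulated.csv").write_text("ID,timestamp,feature1\n")

shutil.copytree(data_dir, copy_dir)        # copy: the source stays in place
shutil.move(str(data_dir), str(move_dir))  # move: the source path disappears
shutil.rmtree(move_dir)                    # remove the moved directory

remaining = sorted(p.name for p in root.iterdir())
print(remaining)  # ['copy']

shutil.rmtree(root)  # clean up the temporary directory
```

Running the steps in a different order changes what is left on disk, which is why the cells in this section should be executed top to bottom.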

##### Merge datasets



In [1]:
merge_loader = DataFormat.merge_dataset([data_loader, multiprocess_loader], merge_path)
print("Datasets are merged, here is the metadata")
print(merge_loader.metadata)

Datasets are merged, here is the metadata
                          split                       IDs  \
splitted_data  [split0, split1]            [name2, name1]   
simulated                    []  [added_ID, name2, name1]   

                           features  
splitted_data  [feature0, feature1]  
simulated      [feature2, feature1]
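
In the merged metadata, the `simulated` row keeps the IDs and features of the use-case dataset even though the same datatype is empty in the multiprocess dataset. One way to picture this is a per-datatype union of the metadata lists. The sketch below uses plain dictionaries and a hypothetical `merge_metadata` helper; it illustrates the idea only and makes no claim about how `DataFormat.merge_dataset` is implemented.

```python
def merge_metadata(*datasets):
    """Union the metadata lists per datatype, keeping first-seen order."""
    merged = {}
    for dataset in datasets:
        for datatype, entries in dataset.items():
            target = merged.setdefault(datatype, {})
            for key, values in entries.items():
                current = target.setdefault(key, [])
                for value in values:
                    if value not in current:
                        current.append(value)
    return merged


use_case = {
    "simulated": {
        "split": [],
        "IDs": ["added_ID", "name2", "name1"],
        "features": ["feature2", "feature1"],
    },
}
multiprocess = {
    "splitted_data": {
        "split": ["split0", "split1"],
        "IDs": ["name1", "name2"],
        "features": ["feature0", "feature1"],
    },
    "simulated": {"split": [], "IDs": [], "features": []},
}

merged = merge_metadata(use_case, multiprocess)
for datatype, entries in merged.items():
    print(datatype, entries)
```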