# Deterministic Model 01: Data Preparation

<h3 id="data_combine">Data to Combine</h3>
<p>
There are two datasets in .csv format:<br>
* First: contains the real data (<code>ready_timelaps.csv</code>).<br>
* Second: contains the output of the simulation (<code>ARSS_plan.csv</code>).<br>
</p>

In [1]:
# install libraries
#%pip install pandas
#%pip install numpy

In [2]:
# import the pandas and numpy libraries
import pandas as pd
import numpy as np

<h3>Read Data</h3>

In [3]:
# Read the file and assign it to variable "df_real"
other_path = '../../data/raw/ready_timelaps.csv'
df_real = pd.read_csv(other_path, header=0)

# Read the file and assign it to variable "df_sim"
other_path = '../../data/raw/ARSS_plan.csv'
df_sim = pd.read_csv(other_path, header=0)
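The raw files are not bundled with this notebook, so the following minimal sketch reproduces the read step on an in-memory copy of the first two simulation rows shown below. It assumes the files are plain comma-separated text with the column names on the first line. `header=0` tells `pd.read_csv()` to take the column names from that line; `io.StringIO` stands in for the file path, and `df_sim_demo` is a hypothetical name used only here.

```python
import io

import pandas as pd

# In-memory stand-in for ../../data/raw/ARSS_plan.csv (first two rows only)
sim_csv = io.StringIO(
    "ID,TYPE,ROTATION,X,Y,Z,LAYER,PALLET,DIST\n"
    "1,2,90,220.0,95.0,0,1,1,2731.367057\n"
    "2,3,90,220.0,252.5,0,1,1,2596.795573\n"
)

# header=0: the first line supplies the column names, data starts on line 2
df_sim_demo = pd.read_csv(sim_csv, header=0)

print(list(df_sim_demo.columns))
print(df_sim_demo["X"].dtype)  # float64, because the file stores values like 220.0
```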

In [4]:
# show the first 5 rows using the DataFrame.head() method
print('The first 5 rows of the Real dataframe')
df_real.head(5)

The first 5 rows of the Real dataframe


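|   | id | type_brick | type | start_to_verif | verif_to_dest | dest_to_end | total_time |
|---|---|---|---|---|---|---|---|
| 0 | 1 | CORNER | 2 | 6 | 18 | 16 | 40 |
| 1 | 2 | HALF | 3 | 4 | 16 | 18 | 38 |
| 2 | 4 | BASIC | 1 | 6 | 14 | 16 | 36 |
| 3 | 6 | BASIC | 1 | 6 | 14 | 16 | 36 |
| 4 | 9 | BASIC | 1 | 8 | 12 | 24 | 44 |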


In [5]:
# show the first 5 rows using the DataFrame.head() method
print('The first 5 rows of the Sim dataframe')
df_sim.head(5)

The first 5 rows of the Sim dataframe


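|   | ID | TYPE | ROTATION | X | Y | Z | LAYER | PALLET | DIST |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 90 | 220.0 | 95.0 | 0 | 1 | 1 | 2731.367057 |
| 1 | 2 | 3 | 90 | 220.0 | 252.5 | 0 | 1 | 1 | 2596.795573 |
| 2 | 3 | 1 | 90 | 220.0 | 440.0 | 0 | 1 | 1 | 2440.17233 |
| 3 | 4 | 1 | 90 | 220.0 | 690.0 | 0 | 1 | 2 | 2350.74903 |
| 4 | 5 | 1 | 90 | 220.0 | 940.0 | 0 | 1 | 3 | 2082.200999 |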


In [6]:
# new column names for df_sim: lower-case, with 'TYPE' renamed to 'type_brick'
headers = ['id', 'type_brick', 'rotation', 'x', 'y', 'z', 'layer', 'pallet', 'dist']
print('headers\n', headers)

headers
 ['id', 'type_brick', 'rotation', 'x', 'y', 'z', 'layer', 'pallet', 'dist']


In [7]:
df_sim.columns = headers
df_sim.head()

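|   | id | type_brick | rotation | x | y | z | layer | pallet | dist |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 90 | 220.0 | 95.0 | 0 | 1 | 1 | 2731.367057 |
| 1 | 2 | 3 | 90 | 220.0 | 252.5 | 0 | 1 | 1 | 2596.795573 |
| 2 | 3 | 1 | 90 | 220.0 | 440.0 | 0 | 1 | 1 | 2440.17233 |
| 3 | 4 | 1 | 90 | 220.0 | 690.0 | 0 | 1 | 2 | 2350.74903 |
| 4 | 5 | 1 | 90 | 220.0 | 940.0 | 0 | 1 | 3 | 2082.200999 |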
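Assigning a plain list to `df_sim.columns` only works when the list has exactly one name per column, and it silently relies on the column order. As an alternative, here is a sketch that assumes the same name mapping as `headers` above. It uses `DataFrame.rename()` with a function, so every new name is derived from its old name; the two-row frame `df_sim_demo` is a hypothetical stand-in copied from the output above.

```python
import pandas as pd

# Two rows copied from the simulation output shown above, with the original column names
df_sim_demo = pd.DataFrame(
    {"ID": [1, 2], "TYPE": [2, 3], "ROTATION": [90, 90],
     "X": [220.0, 220.0], "Y": [95.0, 252.5], "Z": [0, 0],
     "LAYER": [1, 1], "PALLET": [1, 1], "DIST": [2731.367057, 2596.795573]}
)

# Map TYPE explicitly; every other column name is simply lower-cased
mapping = {"TYPE": "type_brick"}
renamed = df_sim_demo.rename(columns=lambda c: mapping.get(c, c.lower()))
print(list(renamed.columns))

# Assigning a list of the wrong length raises ValueError instead of misaligning names
length_error = None
try:
    df_sim_demo.columns = ["id", "type_brick"]
except ValueError as err:
    length_error = err
print("ValueError:", length_error)
```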


In [8]:
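# Convert coordinates and distance to int (astype(int) truncates, e.g. 252.5 -> 252)
df_sim[['dist','x','y','z']] = df_sim[['dist','x','y','z']].astype(int)
# Drop the numeric brick type: the same code is in df_real's 'type' column, and keeping it
# would duplicate 'type_brick' as type_brick_x/type_brick_y in the merge with df_real
df_sim = df_sim.drop('type_brick', axis=1)
df_sim.head()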

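|   | id | rotation | x | y | z | layer | pallet | dist |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 90 | 220 | 95 | 0 | 1 | 1 | 2731 |
| 1 | 2 | 90 | 220 | 252 | 0 | 1 | 1 | 2596 |
| 2 | 3 | 90 | 220 | 440 | 0 | 1 | 1 | 2440 |
| 3 | 4 | 90 | 220 | 690 | 0 | 1 | 2 | 2350 |
| 4 | 5 | 90 | 220 | 940 | 0 | 1 | 3 | 2082 |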
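`astype(int)` truncates toward zero rather than rounding, which is why 2596.795573 becomes 2596 in the output above. The notebook does not say whether nearest-integer values were intended, so that is an assumption. If they were, rounding first makes the difference visible on the same five rows. The frame `demo` is a hypothetical stand-in holding the `y` and `dist` values shown above.

```python
import pandas as pd

# 'y' and 'dist' values copied from the first five simulation rows above
demo = pd.DataFrame({
    "y": [95.0, 252.5, 440.0, 690.0, 940.0],
    "dist": [2731.367057, 2596.795573, 2440.17233, 2350.74903, 2082.200999],
})

truncated = demo.astype(int)        # drops the fractional part
rounded = demo.round().astype(int)  # rounds to the nearest integer first

print(truncated["dist"].tolist())  # [2731, 2596, 2440, 2350, 2082]
print(rounded["dist"].tolist())    # [2731, 2597, 2440, 2351, 2082]

# 252.5 -> 252 in both cases: round() uses round-half-to-even
print(truncated["y"].tolist(), rounded["y"].tolist())
```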


### Merging datasets by ID

In [9]:
# Merge the two datasets on 'id'; the inner join keeps only ids present in both
df = pd.merge(df_sim, df_real, on="id", how="inner")
df.head()

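|   | id | rotation | x | y | z | layer | pallet | dist | type_brick | type | start_to_verif | verif_to_dest | dest_to_end | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 90 | 220 | 95 | 0 | 1 | 1 | 2731 | CORNER | 2 | 6 | 18 | 16 | 40 |
| 1 | 2 | 90 | 220 | 252 | 0 | 1 | 1 | 2596 | HALF | 3 | 4 | 16 | 18 | 38 |
| 2 | 4 | 90 | 220 | 690 | 0 | 1 | 2 | 2350 | BASIC | 1 | 6 | 14 | 16 | 36 |
| 3 | 6 | 90 | 220 | 1190 | 0 | 1 | 4 | 1804 | BASIC | 1 | 6 | 14 | 16 | 36 |
| 4 | 9 | 90 | 220 | 1940 | 0 | 1 | 7 | 1454 | BASIC | 1 | 8 | 12 | 24 | 44 |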
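With `how="inner"`, only ids present in both datasets survive. Simulated bricks without a real measurement (for example ids 3 and 5 in the rows above) are therefore dropped. The following sketch uses only the first five ids of each frame (hypothetical stand-ins `sim` and `real`) to check this explicitly. `validate` and `indicator` are standard `pd.merge()` parameters.

```python
import pandas as pd

# ids from the first five rows of each dataset shown above
sim = pd.DataFrame({"id": [1, 2, 3, 4, 5], "dist": [2731, 2596, 2440, 2350, 2082]})
real = pd.DataFrame({"id": [1, 2, 4, 6, 9], "total_time": [40, 38, 36, 36, 44]})

# validate="one_to_one" raises MergeError if an id is duplicated on either side
inner = pd.merge(sim, real, on="id", how="inner", validate="one_to_one")
print(inner["id"].tolist())  # [1, 2, 4]

# An outer merge with indicator=True shows which ids the inner join discarded
audit = pd.merge(sim, real, on="id", how="outer", indicator=True)
print(audit.loc[audit["_merge"] != "both", ["id", "_merge"]])
```

In the full datasets ids 6 and 9 also exist in the simulation, so they match; here they appear as `right_only` only because the toy `sim` frame stops at id 5.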


### Reordering DataFrame Columns

In [10]:
# Reorder columns: id and brick type first, then simulation geometry, then measured times
df = df[['id', 'type_brick', 'type', 'rotation', 'x', 'y', 'z', 'layer', 'pallet', 'dist',
         'start_to_verif', 'verif_to_dest', 'dest_to_end', 'total_time']]
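Selecting with a list of names both reorders the columns and raises `KeyError` for a misspelled name. By contrast, `DataFrame.reindex(columns=...)` would silently create an all-NaN column. The small sketch below uses a hypothetical four-column frame `merged` and adds a guard that the new order lists every existing column exactly once.

```python
import pandas as pd

# Hypothetical one-row frame with a subset of the merged columns, in merge order
merged = pd.DataFrame({"id": [1], "dist": [2731], "type_brick": ["CORNER"], "total_time": [40]})
order = ["id", "type_brick", "dist", "total_time"]

# Guard: the new order must name every existing column exactly once
assert sorted(order) == sorted(merged.columns), "column list does not match"
reordered = merged[order]
print(list(reordered.columns))

# List selection raises KeyError for a misspelled name ...
try:
    merged[["id", "type_brik"]]
    key_error_raised = False
except KeyError:
    key_error_raised = True

# ... whereas reindex() silently adds an all-NaN column
silent = merged.reindex(columns=["id", "type_brik"])
print(key_error_raised, silent["type_brik"].isna().all())
```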

<h2>Save Dataset</h2>
<p>
Pandas can also save a dataset to CSV. The <code>dataframe.to_csv()</code> method takes the file path and name as a quoted string inside the parentheses.
</p>
<p>
For example, the cell below saves the merged dataframe <b>df</b> as <b>merged_data.csv</b> in <code>../../data/sim/</code>; <code>index=False</code> means the row index is not written to the file.
</p>


In [11]:
df.to_csv("../../data/sim/merged_data.csv", index=False)
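A quick round-trip check confirms what `index=False` does, assuming a writable temporary directory. The file re-reads to the same frame, whereas the default `index=True` writes the row labels as an extra column that comes back named `Unnamed: 0`. The three-column frame `df_demo` and the file names are hypothetical stand-ins for the merged data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the merged dataframe
df_demo = pd.DataFrame({"id": [1, 2], "type_brick": ["CORNER", "HALF"], "total_time": [40, 38]})

with tempfile.TemporaryDirectory() as tmp:
    no_index = os.path.join(tmp, "no_index.csv")
    with_index = os.path.join(tmp, "with_index.csv")

    df_demo.to_csv(no_index, index=False)  # as in the cell above
    df_demo.to_csv(with_index)             # default: index=True

    back = pd.read_csv(no_index)
    back_with_index = pd.read_csv(with_index)

# index=False round-trips cleanly
pd.testing.assert_frame_equal(back, df_demo)

# index=True adds the row labels as an extra, unnamed first column
print(list(back_with_index.columns))  # ['Unnamed: 0', 'id', 'type_brick', 'total_time']
```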

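Other file formats can be read and saved the same way: pandas provides functions similar to **`pd.read_csv()`** and **`df.to_csv()`** for each of them. Common pairs are listed in the following table:

| Data format | Read | Save |
|---|---|---|
| csv | `pd.read_csv()` | `df.to_csv()` |
| json | `pd.read_json()` | `df.to_json()` |
| excel | `pd.read_excel()` | `df.to_excel()` |
| hdf | `pd.read_hdf()` | `df.to_hdf()` |
| sql | `pd.read_sql()` | `df.to_sql()` |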
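The same read/write pairing works for the other formats mentioned above. The following sketch covers JSON, using `io.StringIO` in place of a file and a hypothetical two-row frame `df_demo`. One detail matters for this dataset: by default `pd.read_json()` treats columns whose names end in `_time` (such as `total_time`) as date-like candidates. Passing `convert_dates=False` switches that inference off, so the durations stay plain numbers.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the merged data
df_demo = pd.DataFrame({"id": [1, 2], "type_brick": ["CORNER", "HALF"], "total_time": [40, 38]})

# orient="records" writes one JSON object per row
json_text = df_demo.to_json(orient="records")
print(json_text)  # [{"id":1,"type_brick":"CORNER","total_time":40},{"id":2,"type_brick":"HALF","total_time":38}]

# convert_dates=False disables date inference for "_time" columns
back = pd.read_json(io.StringIO(json_text), orient="records", convert_dates=False)
print(back.dtypes)
```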


#### Author/Date/Organization

Vjaceslav Usmanov, CTU in Prague

###### Change Log


|  Date (YYYY-MM-DD) |  Version | Changed By  |  Change Description |
|---|---|---|---|
| 2026-01-20 | 1.1 | Vjaceslav Usmanov| added DM_01_Data_Prepearing.ipynb |
| 2026-02-12 | 1.2 | Vjaceslav Usmanov| changed DM_01_Data_Prepearing.ipynb |