# how action_classifier works

## 1. `ac.py`:
Dreher set up `ac.py` as an external Python module so that it can be imported directly.

###### `sys.exit(ac.exec.main(sys.argv[1:]))`

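`sys.exit()` terminates the program once the launched command has finished and uses the value returned by `main` as the exit status.

`sys.argv[1:]` is the list of command-line arguments as strings (without the script name), which is passed to `main`.
`main` returns the status of the executed command.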

## 2. `exec.py`:
#### 2.0 `exec.main(argv)`:
###### `args = parse_args(argv, env)`
`args` contains the command and its arguments. `argv` selects one of the commands in the module:
`[mkevenv, train, predict, dataset, evaluate]`. For example, if `argv` is `dataset`, the call becomes
`parse_args(dataset, env)`. `env` is a dictionary that contains the path to the dataset.

###### `code = getattr(ac.exec, args.command)(args)`:
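`getattr` gets a named attribute from an object: `getattr(x, 'y')` is
equivalent to `x.y`. In `getattr(object, method)(argument)`, the object's method is looked up by name and then called with the given argument.
More specifically, the line is equivalent to `ac.exec.<command>(args)`, e.g. `ac.exec.dataset(args)` when `args.command` is `dataset`.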
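
The parse-then-dispatch pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the `commands` namespace stands in for the `ac.exec` module, and the reduced argument set and the value of `STATUS_OK` are assumptions.

```python
import argparse
import types

STATUS_OK = 0  # assumed value, for illustration only


def dataset(args):
    # Placeholder for the real dataset generation.
    return STATUS_OK


def train(args):
    # Placeholder for the real training routine.
    return STATUS_OK


# Hypothetical stand-in for the `ac.exec` module that holds the commands.
commands = types.SimpleNamespace(dataset=dataset, train=train)


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="ac")
    parser.add_argument("command", choices=["dataset", "train"])
    parser.add_argument("--history-size", type=int, default=10)
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    # Look the command up by name, then call it with the parsed arguments.
    return getattr(commands, args.command)(args)
```

An entry script would then end with `sys.exit(main(sys.argv[1:]))`, so that the returned status becomes the process exit code.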

####  2.1 `def dataset(args) -> int`:
Generates the symbolic dataset and the graph dataset.
##### Arguments:
- `--history-size`: Number of scene graphs to be considered in the history for temporal edges. Default is 10.
- `--raw-dataset-path`: Dataset path to the derived information. Default is given in the `env` variable.

#####  2.1.1 `ac.dataset.generate_symbolic_dataset(args.raw_dataset_path, args.basepath)`:
Generates the symbolic dataset as a cache file from the derived data and the ground-truth data. For each `subject`, each `task`,
and each `take`, `rec = Recording(derived_data_path, ground_truth_path)` is called.
In other words, `symbolic_dataset.cache` stores one object of class `Recording` per take.  <br>
`Recording` is a class with attributes such as `frame_count`, `objects`, and `relations`.
It also has methods such as `_load_objects()`, `_load_relations()`, and `_load_ground_truth()`.

- `_load_objects()`: loads the `3d_objects` derived data for each frame and saves it into `self.objects`. For each take,
loading ends when the number of loaded object entries equals the number of frames of this take. An example loading path is "derived_data_path" +
"3d_objects" + "frame_10".  <br>
Note that the loaded object data of each frame is serialised and used as the
input of the class `Object`. Each loaded object in each frame therefore has 6 variables: `certainty`,
`class_index`, `class_name`, `instance_name`, `bounding_box`, and `past_bounding_box`.  <br>
The variables `bounding_box` and `past_bounding_box` are objects of class `BoundingBox`, which likewise uses serialization to
get its information from the enclosing object.

- `_load_relations()`: same idea. Class `Relation` has the attributes `subject_index`, `object_index`, and `relation_name`.
Each frame can have multiple dictionaries that describe different relations between subject and object.

- `_load_ground_truth()`: same idea. However, there is only one ground-truth file per take instead of one per frame,
so it is loaded directly from its path rather than by iterating over frames with `_load_json_series`.  <br>
The ground truth has the format `begin_frame -> action_index -> action_changing_frame -> action_index -> end_frame`.
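
To make the ground-truth format more concrete, here is a small sketch that expands such an alternating sequence into one action index per frame. The function name `expand_ground_truth`, the flat-list representation, and the convention that each segment covers frames from its start frame up to (but excluding) the next boundary are all assumptions made for illustration.

```python
def expand_ground_truth(segments):
    """Expand [f0, a0, f1, a1, ..., fn] into one action index per frame.

    Even positions hold frame boundaries, odd positions hold action indices.
    Assumption: action a_k is active on frames f_k .. f_{k+1} - 1.
    """
    boundaries = segments[0::2]
    actions = segments[1::2]
    labels = []
    for start, end, action in zip(boundaries[:-1], boundaries[1:], actions):
        labels.extend([action] * (end - start))
    return labels
```

For example, `expand_ground_truth([0, 3, 2, 5, 5])` describes action 3 on frames 0 to 1 and action 5 on frames 2 to 4.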

Summary: for each frame, objects of class `Object` and `Relation` are saved. For each take, an object of class `GroundTruth`
is saved. All of these objects contain multiple attributes. For each take, the information is stored via `recs[subject][task][take] = rec`.
Once the data for all subjects is loaded, it is written into the cache file.

##### 2.1.2 `ac.dataset.generate_dataset(args.basepath, dataset_config, args.history_size)`:
Loads the saved cache file into a list via `recs = load_symbolic(basepath)`. Note that 16 GB of memory
is enough to generate the cache file but not enough to load it into a Python list; 32 GB is enough for loading.

###### `for subject, task, take in crawl_dataset():`
`crawl_dataset()` returns a generator that yields each subject, task, and take in turn. For each take, the data is loaded into the object
`recording` of class `Recording`.

###### `recording.check_integrity()`:
A method of class `Recording`. First, it checks the number of loaded objects,
relations, and ground-truth entries against the frame count of the take. Then it checks whether objects and relations match.

###### `recording.to_scene_graphs(i, history_size=history_size)`:
For each frame of the recording (each take), collects
multiple previous scene graphs for the temporal edges. It returns a list containing up to `history-size` previous scene graphs. <br>
`i` iterates over all frames in the take. `history_size` is a command-line argument that sets the number of scene
graphs to be considered in the history for temporal edges; the default value is 10.  <br>
`sgl.append(self.to_scene_graph(i))`: creates a new object `sg` of class `Scenegraph` for a single frame. It has the
attributes `left_action`, `right_action`, `nodes`, and `edges`. <br>
First, it reads the ground-truth action into the attributes `left_action` and `right_action`.  <br>
Second, it generates the `nodes` of the scene graph from the objects: for each object of the frame, the nodes contain the index, name,
and bounding-box coordinates of the object.  <br>
After that, it generates the `edges` from the relations: for each relation it generates a key consisting of (subject index,
object index), and the value for that key is a list containing all relation indices. Note that
the subject index and object index are simply the node numbers.
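
The two ideas above, the history window and the edge dictionary keyed by (subject index, object index), can be sketched as below. The helper names `history_indices` and `build_edges` are hypothetical, relations are represented as plain tuples, and it is assumed that the window ends at frame `i` and includes it.

```python
from collections import defaultdict


def history_indices(i, history_size=10):
    """Frame indices of up to `history_size` scene graphs ending at frame i.

    Assumption: the current frame i is part of the window.
    """
    start = max(0, i - history_size + 1)
    return list(range(start, i + 1))


def build_edges(relations):
    """Group relation indices by their (subject_index, object_index) key.

    `relations` is an iterable of (subject_index, object_index, relation_index).
    """
    edges = defaultdict(list)
    for subject_index, object_index, relation_index in relations:
        edges[(subject_index, object_index)].append(relation_index)
    return dict(edges)
```

Near the start of a take fewer than `history_size` frames exist, which is why the list contains "up to" that many scene graphs.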

###### `flatten_scene_graphs(sgl)`:
After the scene graphs of each frame are obtained, this function flattens the multiple scene graphs
into a single graph. Since the multiple scene graphs are the temporally preceding scene graphs, this encodes the temporal relations
of the action into a single scene graph. <br>
First, it generates the ground truth for the temporal scene graph, which is taken directly from the ground truth of the scene graph
of the latest frame. <br>
Second, it generates the global `nodes`. The `global_node_id_map` is a dictionary that has (scenegraph_id, previous_nodes_id)
as key and an integer counter as value. The counter is the number of nodes collected so far over all scene graphs. The nodes
of the temporal graph are the nodes of each separate graph, and the position of a node in the list is given by the value in the
dictionary. The dictionary makes it easy to find the position of a node of a scene graph from its index. <br>
Third, it adds the edges to the temporal scene graph. The first step is to copy the edges from each scene graph. The last
step is to add edges for temporal relations: for adjacent scene graphs, if a node is the same in both scene graphs,
an edge called `temporal` is added.
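
The three steps of flattening can be illustrated with a simplified sketch. It is not the module's implementation: scene graphs are plain dictionaries here, nodes are identified by a name string, two nodes count as "the same" when their names are equal, and temporal edges point from the earlier to the later graph. All of these are assumptions, as is the name `flatten_sketch`.

```python
def flatten_sketch(sgl):
    """Flatten a list of per-frame scene graphs into one temporal graph."""
    # Step 1: ground truth comes from the latest scene graph.
    action = sgl[-1]["action"]

    # Step 2: global nodes, with a map (scenegraph_id, node_id) -> global position.
    global_node_id_map = {}
    nodes = []
    for sg_id, sg in enumerate(sgl):
        for node_id, node in enumerate(sg["nodes"]):
            global_node_id_map[(sg_id, node_id)] = len(nodes)
            nodes.append(node)

    # Step 3a: copy the edges of every scene graph, re-indexed globally.
    edges = []
    for sg_id, sg in enumerate(sgl):
        for (subj, obj), relations in sg["edges"].items():
            for relation in relations:
                edges.append((global_node_id_map[(sg_id, subj)],
                              global_node_id_map[(sg_id, obj)],
                              relation))

    # Step 3b: connect identical nodes of adjacent scene graphs.
    for sg_id in range(len(sgl) - 1):
        for a_id, a in enumerate(sgl[sg_id]["nodes"]):
            for b_id, b in enumerate(sgl[sg_id + 1]["nodes"]):
                if a == b:
                    edges.append((global_node_id_map[(sg_id, a_id)],
                                  global_node_id_map[(sg_id + 1, b_id)],
                                  "temporal"))

    return {"action": action, "nodes": nodes, "edges": edges}
```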

###### `scene_graph.to_data_dict(mirrored=False)`:
Converts the scene graph of a frame into a dictionary whose values are one-hot encoded.
One-hot encoding converts a single id into a vector of one-hot bits. `globals` contains the one-hot ground-truth action label. <br>
For each node of the scene graph, the object class index is one-hot encoded. `graph_nodes` contains the one-hot object class
and the bounding-box information of all objects (nodes) of the scene graph. <br>
For each edge of the scene graph, the relation is one-hot encoded. The sender node and receiver node are also recorded,
without one-hot encoding. `graph_edges` contains the one-hot relations of all edges, `graph_senders` contains the sender
nodes of all edges, and `graph_receivers` contains the receiver nodes of all edges. <br>
`mirrored` (default `False`) is a utility option that generates mirrored data from the input data, so that the graph has a left-hand
action and a right-hand action.
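
A rough sketch of this encoding, using plain Python lists, is given below. The function names, the argument layout, and the use of the names from the text as dictionary keys are assumptions; the real method may organise its output differently, and each (sender, receiver, relation) triple is treated as one edge here.

```python
def one_hot(index, size):
    """Return a list of length `size` with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def to_data_dict_sketch(action, nodes, edges, num_actions, num_classes, num_relations):
    """Encode one scene graph.

    nodes: list of (class_index, bounding_box) pairs
    edges: list of ((sender, receiver), relation_index) pairs
    """
    # Node features: one-hot class followed by the raw bounding-box values.
    graph_nodes = [one_hot(cls, num_classes) + list(bbox) for cls, bbox in nodes]

    graph_edges, graph_senders, graph_receivers = [], [], []
    for (sender, receiver), relation in edges:
        graph_edges.append(one_hot(relation, num_relations))
        graph_senders.append(sender)      # plain node index, not one-hot
        graph_receivers.append(receiver)  # plain node index, not one-hot

    return {
        "globals": one_hot(action, num_actions),
        "graph_nodes": graph_nodes,
        "graph_edges": graph_edges,
        "graph_senders": graph_senders,
        "graph_receivers": graph_receivers,
    }
```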

###### `write_frame(subject, task, take, i, graphs)`:
Since the graph for the current frame is now available, it can be written into
a cache file. This generates a folder called `h10` in the base path, meaning that the number of history frames considered here is 10. <br>  <br>
Summary: the `dataset` function of the module loads the `3d_objects` and `spatial_relations` derived data and saves the
symbolic data into a cache file. After that, it loads the cache file and generates a temporal scene graph for each
frame. Finally, it returns `STATUS_OK` if every step completed successfully.

#### 2.2 `def mkevenv(args) -> int`:
Generates an environment file that stores the parameters for training, prediction, and evaluation. After that, the program
needs no further arguments from the command line; it reads the parameters directly from the environment file.

##### Arguments:
- `--namespace`: String identifier of the namespace
- `--dataset-config`: String identifier of the dataset configuration, namely `h{history_size}`
- `--evaluation-mode`: Evaluation mode, choose from (normal, contact, centroids)
- `--processing-steps-count`: Number of processing steps the graph network should perform
- `--layer-count`: Number of layers used for the MLPs in the graph network
- `--neuron-count`: Number of neurons per layer used for the MLPs in the graph network
- `--validation-id`: ID of the takes to use as validation set

##### 2.2.1 `env_config = args.__dict__`:
`__dict__` is the dictionary that stores an object's (writable) attributes; here it maps each argument name to its value.

##### 2.2.2 `json.dump(env_config, f, indent=4)`:
Writes the dictionary `env_config` into a JSON file. The `indent` parameter specifies how many spaces to indent: JSON array
elements and object members are pretty-printed with that indent level.
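
The two steps can be tried out in isolation with the short sketch below. The two arguments and their defaults are made up for the example, and an in-memory `io.StringIO` buffer is used in place of the real environment file.

```python
import argparse
import io
import json

parser = argparse.ArgumentParser()
parser.add_argument("--namespace", default="demo")        # illustrative default
parser.add_argument("--layer-count", type=int, default=2)  # illustrative default
args = parser.parse_args(["--layer-count", "3"])

# The Namespace stores its values as attributes, so __dict__ maps
# argument names (with dashes turned into underscores) to values.
env_config = args.__dict__

f = io.StringIO()  # stands in for the opened environment file
json.dump(env_config, f, indent=4)
text = f.getvalue()
```

`vars(args)` returns the same dictionary and is the more common spelling.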

#### 2.3 `def train(args) -> int`:

##### Arguments:
- `--evaluation-id`: Numerical identifier of the left-out evaluation subject. The other subjects are used for training
- `--restore`: Iteration number of the state to restore
- `--max-iteration`: Maximum iteration number before interrupting. Negative values = unlimited
- `--log-interval`: Interval in seconds after which to perform a validation
- `--save-interval`: Number of iterations after which a model checkpoint is saved

##### 2.3.1 `train_set = ac.dataset.load(...)`:
Loads the graphs into the training set. The loaded items are objects of class `SceneGraphProxy`.

###### `paths = get_dataset_paths(datasets_basepath, dataset_config)`:
First, it generates a list `paths`, which contains the path to the extracted graph of each frame.

###### `SceneGraphProxy(p, evaluation_mode)`:
For each frame, an object of class `SceneGraphProxy` is instantiated. It has attributes such as `path`, `subject`, `task`,
`take`, `side`, `frame`, and `mode`, which it reads from the given path. It also has a method `load`, which
loads the saved scene graph of the frame from its `.cache` file. Note that initially only the proxy object itself is
created: until `load` is called on an element of the dataset, no actual scene-graph data is loaded.
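
The lazy-loading idea behind the proxy can be sketched like this. `LazyGraphProxy` is a hypothetical stand-in, not the real `SceneGraphProxy`: it assumes the cache file is a pickle and that the loaded graph is kept after the first call, neither of which is confirmed by the notes above.

```python
import os
import pickle
import tempfile


class LazyGraphProxy:
    """Hypothetical stand-in: stores only a path and loads the graph on demand."""

    def __init__(self, path):
        self.path = path
        self._graph = None  # nothing is read from disk yet

    def load(self):
        if self._graph is None:
            with open(self.path, "rb") as f:
                self._graph = pickle.load(f)
        return self._graph


def demo():
    """Write a tiny cache file, then show that loading happens only in load()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frame_0.cache")
        with open(path, "wb") as f:
            pickle.dump({"nodes": ["cup"]}, f)
        proxy = LazyGraphProxy(path)
        loaded_before = proxy._graph is not None
        graph = proxy.load()
        return loaded_before, graph
```

Keeping only paths in the dataset list keeps memory use low, since the full graphs are read batch by batch during training.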

###### `x for x in dataset if not filter_if(x)`:
`filter_if(x)` is `is_ev_sub(x) or is_vld(x)`, which is true when the current subject index is the evaluation id or the take index
is the validation id. When the current frame belongs neither to the evaluation subject nor to a validation take, its `SceneGraphProxy`
is saved into a list, which is returned as the training set.
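
The filtering logic can be checked with a toy example. The `Frame` tuple and the function `select_training_frames` are hypothetical; only the names `is_ev_sub`, `is_vld`, and `filter_if` come from the text.

```python
from collections import namedtuple

# Hypothetical minimal frame record with the two fields the filter looks at.
Frame = namedtuple("Frame", ["subject", "take"])


def select_training_frames(dataset, evaluation_id, validation_id):
    def is_ev_sub(x):
        return x.subject == evaluation_id

    def is_vld(x):
        return x.take == validation_id

    def filter_if(x):
        # True for frames that must NOT go into the training set.
        return is_ev_sub(x) or is_vld(x)

    return [x for x in dataset if not filter_if(x)]
```

With subjects 1 and 2, takes 0 and 9, evaluation id 2, and validation id 9, only the frame of subject 1, take 0 remains in the training set.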

##### 2.3.2 `valid_set = ac.dataset.load(...)`:
Loads the graphs into the validation set. This is almost the same as the previous step, but the list only contains the `SceneGraphProxy` objects whose
frame comes from a take with the validation id.

##### 2.3.3 `model = ac.model.ActionClassifierModel(...)`:
Input parameters: `out_path`, `processing_steps_count`, `layer_count`, `neuron_count` <br>
###### `self.model = EncodeProcessDecode(...)`:
Input parameters:
- `layer_count`
- `neuron_count`
- `edge_output_size`: number of all relations
- `node_output_size`: number of all objects
- `global_output_size`: number of all actions

Structure of `EncodeProcessDecode`:

                        Hidden(t)   Hidden(t+1)
                           |            ^
              *---------*  |  *------*  |  *---------*
              |         |  |  |      |  |  |         |
    Input --->| Encoder |  *->| Core |--*->| Decoder |---> Output(t)
              |         |---->|      |     |         |
              *---------*     *------*     *---------*

- Encoder: encodes edge, node, and global attributes independently, without computing relations between them <br>
- Core: performs N processing / message-passing steps <br>
- Decoder: decodes edge, node, and global attributes at each message-passing step

Multilayer perceptrons are employed as edge, node, and global update functions. <br>
- `edge_fn = snt.Linear(output_size, name)`: builds a linear layer. It only needs the output dimensionality
and the module name, not the input dimensionality. <br>
- `gn.modules.GraphIndependent(edge_fn, node_fn, global_fn)`: a graph block that applies models to the graph elements
independently. The inputs and outputs are graphs. The corresponding models are applied to each element of the graph
(edges, nodes, and globals) in parallel and independently of the other elements. It can be used to encode or
decode the elements of a graph. When it is called, the input is a `graphs.GraphsTuple` containing non-`None` edges, nodes, and
globals, and it returns an output `graphs.GraphsTuple` with updated edges, nodes, and globals.<br>

Build module step of `EncodeProcessDecode`:
- Flow: `input_op` -> `encoder` -> `core` * `num_processing_steps` -> `decoder`; each input/output is a graph
- Encoder: uses MLPs as `edge_model_fn`, `node_model_fn`, and `global_model_fn`, and passes these model functions into `GraphIndependent`.
The input of the encoder is `input_op`.
- Core: uses the same model functions. The input of the core at each processing step is the concatenated graph from the previous processing step.
- Decoder: for the core's output graph at each processing step, uses `MLPGraphIndependent` (MLP) to update once and
`GraphIndependent` (linear) to generate the output graph for that processing step. It returns a list containing the output
graphs of all processing steps.
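
The control flow of the build step can be illustrated without TensorFlow. In this toy sketch "graphs" are plain lists of numbers and the three blocks are arbitrary callables; concatenating the encoder output with the previous core output is an assumption about how the concatenation is wired, and the function name is hypothetical.

```python
def encode_process_decode_sketch(input_graph, encoder, core, decoder, num_processing_steps):
    """Run encoder once, then core + decoder once per processing step."""
    latent0 = encoder(input_graph)
    latent = latent0
    outputs = []
    for _ in range(num_processing_steps):
        # List concatenation stands in for concatenating graph features.
        core_input = latent0 + latent
        latent = core(core_input)
        outputs.append(decoder(latent))
    return outputs  # one decoded output per processing step
```

Returning the output of every step, rather than only the last one, matches the description above that the decoder yields a list covering all processing steps.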

Summary of `EncodeProcessDecode`: the input is a graph and the output is the updated graph of each processing step. It uses MLPs and
linear layers as update functions. The probability distribution over all actions is encoded in the updated global attribute `u'`.

##### 2.3.4 `model.train(...)`:
Input parameters:
- `train_set`: Training set, list of loaded scenegraphs
- `valid_set`: Validation set, list of loaded scenegraphs
- `restore`: Iteration number of the state to restore
- `batch_size_train`: Train batch size
- `learning_rate`: Learning rate
- `max_iteration`: Max iteration, aborting afterwards
- `log_interval`: Interval when a log should be printed to stdout
- `save_interval`: Interval when the model should be saved to disk

The steps in training:
- `placeholder = [train_set[0].load()]`: sets a placeholder. It calls the `load` method, which loads the scene graph.
- The graph neural network in `graph_nets` requires a specific input structure. A preprocessed graph is converted to a GNN graph
by the function `gn.utils_tf.placeholders_from_data_dicts(graph_pre)`.
- `optimizer = tf.train.AdamOptimizer(learning_rate)`: sets up the Adam optimizer with the given learning rate.
- `_make_all_runnable_in_session()`: takes a graph and returns a graph that is runnable in TensorFlow. It allows a graph
containing `None` fields to be run in a `tf.Session`.
- `_create_train_feed_dict`: for each iteration step, placeholders need to be created from scene graphs. The input is a list
of loaded scene graphs whose length is the batch size. Two thirds of the `idle` and `hold` scene graphs in the
current batch are dropped because these actions account for too many frames. Scene graphs without edges are also filtered out.
Finally, it returns a dictionary `feed_dict` that contains `input` and `target` for the current batch. The only difference
between `input` and `target` is that `input` has no action id and is used as the feature, while `target` has the action id and is used
as the label. This is a supervised learning setup.
- `run_parameters`: contains `step`, `targets`, `loss`, and `output`.
- `tf.Session.run()`: within a TensorFlow session (`tf.Session`), this runs the optimizer operation (in
this case `train_step`). The optimizer minimizes the loss function (in this case the cross-entropy), which is
computed from the model output y.
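
The batch filtering inside `_create_train_feed_dict` can be approximated by the sketch below. It is only an illustration under stated assumptions: graphs are plain dictionaries with hypothetical `action` and `edges` keys, the action names are strings, and the two-thirds drop is done by independent random draws.

```python
import random


def filter_batch(graphs, rng, drop_fraction=2 / 3, frequent=("idle", "hold")):
    """Drop edge-less graphs and most graphs of the over-represented actions."""
    kept = []
    for graph in graphs:
        if not graph["edges"]:
            continue  # graphs without edges are filtered out
        if graph["action"] in frequent and rng.random() < drop_fraction:
            continue  # randomly drop about two thirds of idle/hold graphs
        kept.append(graph)
    return kept


def split_input_target(graph):
    """Input is the graph without the action id; target keeps it as the label."""
    input_graph = {k: v for k, v in graph.items() if k != "action"}
    target_graph = dict(graph)
    return input_graph, target_graph
```

Passing the random generator in explicitly, for example `random.Random(0)`, keeps the subsampling reproducible between runs.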



#### 2.4 `def predict(args) -> int`:

##### Arguments:
- `basepath`:
- `verbose`:
- `namespace`:
- `e`: the subject that was left out of the training process
- `restore`: the iteration at which to restore the model; usually set to the maximum iteration number, 3000