<h1 align="center">Software Introspection for Signaling Emergent Cyber-Social Operations (SIGNAL)</h1>
<h2 align="center">SRI International</h2>
<h3 align="center">In support of DARPA AIE Hybrid AI to Protect Integrity of Open Source Code (SocialCyber)</h3>

## Introduction

In this notebook, we provide Random Forest models pretrained on the Persuasion for Good (P4G) dataset from Wang et al. [1]. The P4G dataset is publicly available on [GitLab](https://gitlab.com/ucdavisnlp/persuasionforgood/tree/master/data). Wang et al. emphasize the need to understand the intrinsic disclosure and appeal strategies at play in human persuasion conversations. In their view, being able to model such properties would advance the ethical development of automated dialogue systems [1].

In the context of the SIGNAL project, and the SocialCyber program overall, we foresaw three main benefits from adopting the P4G dataset:
1. Assist in enriching the developer communication networks.
2. Help in understanding and modeling developer communications on the Linux Kernel Mailing List (LKML).
3. Provide an additional axis of information, indicating the persuasion strategies developers use when trying to push their patches upstream in the Linux Kernel.

### In this tutorial...

We start by demonstrating the performance of a Random Forest ([RF](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)) model pre-trained on P4G data. We opted for Random Forests because the original Hybrid-RCNN architecture presented in [1] takes a substantial amount of time to train on the data: in our tests, a single round of training required more than 12 hours on a single GPU. Furthermore, RFs provided accuracy comparable to that of the Hybrid-RCNN.
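As a rough illustration of the kind of model involved, the sketch below fits a scikit-learn `RandomForestClassifier` on a handful of toy utterances. The TF-IDF features, the example sentences, and the label names are assumptions made for illustration only; they are not the P4G data, nor necessarily the feature pipeline used for the pretrained models in this notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy utterances with hypothetical strategy labels (illustrative only, not P4G data).
utterances = [
    "Your donation will feed a child for a week.",
    "Every dollar goes directly to children in need.",
    "I donated last month and it felt great.",
    "I have been giving to this charity for years.",
    "Have you heard of this charity before?",
    "Do you donate to charities often?",
]
labels = [
    "logical-appeal", "logical-appeal",
    "self-modeling", "self-modeling",
    "inquiry", "inquiry",
]

# Assumed feature pipeline: bag-of-words TF-IDF followed by a Random Forest.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(utterances, labels)

# Classify an unseen utterance and inspect the per-class probabilities.
new_utterance = ["I gave ten dollars yesterday."]
predicted = model.predict(new_utterance)
probabilities = model.predict_proba(new_utterance)
classes = model.named_steps["randomforestclassifier"].classes_
print(predicted[0], dict(zip(classes, probabilities[0].round(2))))
```

With so few examples the prediction itself is not meaningful; the point is only the shape of the workflow: fit on labeled utterances, then query class probabilities for new text.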

#### Transfer Learning

To adapt the RF model to the LKML data, we made use of *Transfer Learning* (TL) techniques. Transfer learning enables re-purposing *trained* machine learning (ML) models for new tasks, as long as these tasks share some similarities with one another. TL is heavily used in deep learning (DL), where data or resource constraints often prevent researchers from training DL models from scratch.

##### Structure Expansion Reduction Algorithm

To enable the use of transfer learning on RF models, we turned our attention to the work of Segev et al. [2], where the authors introduce the structure expansion reduction (**SER**) algorithm. SER has two variants: 1) the first variant greedily searches for and applies local modifications to each decision tree, expanding or reducing the tree around individual nodes; 2) the second variant modifies the parameters of a selected decision tree component. We implement both variants, using the original work by Segev et al. [2] and the subsequent work by Minvielle et al. [3] as a basis.
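To make the idea of reducing a tree around individual nodes more concrete, here is a minimal pure-Python sketch of a reduction-style step. It is a simplified illustration under stated assumptions, not the implementation used in this notebook: the `Node` class and the `predict`, `errors`, and `reduce_tree` helpers are hypothetical names, the expansion step is omitted, ties are resolved in favor of the smaller tree, and subtrees that no target sample reaches are simply left unchanged.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)

@dataclass
class Node:
    label: str                       # label predicted if this node is a leaf
    feature: Optional[int] = None    # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None

def predict(node: Node, x: Sequence[float]) -> str:
    """Route x down the tree and return the label of the leaf it reaches."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def errors(node: Node, samples: List[Sample]) -> int:
    """Number of samples the (sub)tree misclassifies."""
    return sum(predict(node, x) != y for x, y in samples)

def reduce_tree(node: Node, samples: List[Sample]) -> Node:
    """Bottom-up reduction using the target samples that reach `node`.

    Children are updated in place; an internal node is replaced by a
    target-majority leaf when that leaf makes no more errors than the subtree.
    """
    if node.is_leaf() or not samples:
        return node  # simplification: no target evidence, keep source structure
    left_s = [s for s in samples if s[0][node.feature] <= node.threshold]
    right_s = [s for s in samples if s[0][node.feature] > node.threshold]
    node.left = reduce_tree(node.left, left_s)
    node.right = reduce_tree(node.right, right_s)
    majority = Counter(y for _, y in samples).most_common(1)[0][0]
    leaf_errors = sum(y != majority for _, y in samples)
    if leaf_errors <= errors(node, samples):
        return Node(label=majority)
    return node

# Source tree: split on feature 0, then on feature 1 in the right branch.
source_tree = Node(
    label="A", feature=0, threshold=0.5,
    left=Node(label="A"),
    right=Node(label="A", feature=1, threshold=0.5,
               left=Node(label="B"), right=Node(label="A")),
)

# Target data in which feature 1 no longer matters in the right branch.
target = [
    ([0.2, 0.1], "A"), ([0.3, 0.9], "A"),
    ([0.8, 0.2], "B"), ([0.9, 0.8], "B"), ([0.7, 0.9], "B"),
]

errors_before = errors(source_tree, target)
adapted = reduce_tree(source_tree, target)
errors_after = errors(adapted, target)
print(errors_before, errors_after)
```

In this toy case the second split of the source tree does not hold for the target samples, so the right subtree collapses into a single leaf while the root split is kept. Applying such a step to every tree of a forest is one way to picture how a model trained on a source domain can be adjusted with a small amount of target data.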

#### Disclaimer

The content of this notebook is released under the **GNU General Public License v3.0**, see [LICENSE](https://github.com/SRI-CSL/signal-public/blob/main/LICENSE).

#### References

[1] Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. "*Persuasion for good: Towards a personalized persuasive dialogue system for social good.*" Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. ACL (2019). [PDF](https://arxiv.org/pdf/1906.06725.pdf).

[2] Segev, Noam, Maayan Harel, Shie Mannor, Koby Crammer, and Ran El-Yaniv. "*Learn on source, refine on target: A model transfer learning framework with random forests.*" IEEE Transactions on Pattern Analysis and Machine Intelligence 39, no. 9 (2016): 1811-1824. [PDF](https://arxiv.org/pdf/1511.01258.pdf).

[3] Minvielle, Ludovic, Mounir Atiq, Sergio Peignier, and Mathilde Mougeot. "*Transfer Learning on Decision Tree with Class Imbalance.*" In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1003-1010. IEEE, 2019. [PDF](https://ieeexplore.ieee.org/abstract/document/8995296).