new release of HitoshiIO
btsaubt committed Jun 11, 2019
1 parent 9e0acfe commit 2bf15fb
Showing 122 changed files with 16,486 additions and 658 deletions.
9 changes: 9 additions & 0 deletions .travis.yml
@@ -0,0 +1,9 @@
dist: trusty
language: java
jdk:
- oraclejdk8
before_install:
- cd clonedetector
script:
- python3 run_integration_tests.py

88 changes: 62 additions & 26 deletions README.md
@@ -1,53 +1,90 @@
HitoshiIO: Detecting functionally similar code
========
# HitoshiIO: Detecting functionally similar code


HitoshiIO is a system for identifying functional similar methods based on thier I/Os. The information about how HitoshiIO works can be found in our [ICPC 2016 paper](http://jonbell.net/icpc_16_hitoshiio.pdf).
HitoshiIO is a system for identifying functionally similar methods based on their I/Os. Information on how HitoshiIO works can be found in our [ICPC 2016 paper](http://jonbell.net/icpc_16_hitoshiio.pdf).


Running
-------
HitoshiIO will modify the bytecode of your application for recording I/Os and then compute similarity between these I/Os. The steps to install and use HitoshiIO are as follows.

### Pre-step 1 (Optional)
Change base variable in `hitoshiIO2/clonedetector/src/main/java/edu/columbia/cs/psl/ioclones/sim/AbstractSim.java` to change the correlation base.

### Pre-step 2 (Optional)
Change indexType variable in `hitoshiIO2/clonedetector/src/main/java/edu/columbia/cs/psl/ioclones/sim/FastAnalyzer.java` to choose a coefficient.

HitoshiIO will modify the bytecode of your application to record I/Os, and then compute similarity between these I/Os. The steps to install and use HitoshiIO are as follows.

### Step 0
HitoshiIO needs a database to store the captured functional clones. We use MySQL. For downloading and installing MySQL, please refer to [MySQL](https://www.mysql.com/).

HitoshiIO is a maven project. For compiling HitoshiIO, please change your directory to "clonedetector" and use the following command:
HitoshiIO is a Maven project. To compile HitoshiIO, change your directory to `clonedetector` and run the following command:

```mvn clean package```

### Step 1 (PreAnalyzer: Capture and store identified I/Os)

Before running the application, HitoshiIO needs to identify the inputs and outputs of each method. Please make sure that you have `clonedetector/classinfo/methodeps.db` - it is provided in the current release of HitoshiIO. You can then run this command:

```java -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.analysis.PreAnalyzer -cb {/path/to/your/bytecodebase}```

`{/path/to/your/bytecodebase}` can be a relative or full path to a folder of Java .class files. A smaller bytecodebase may serve you better. Make sure that all classes you want analyzed were compiled with JDK 8: classes compiled with newer versions of Java may not be supported by ASM, which the project relies on heavily.
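Because ASM support depends on the class-file version, a quick pre-flight check can save a failed run. The Python sketch below is not part of HitoshiIO. It reads each class file's header (the `0xCAFEBABE` magic number followed by the minor and major version) and lists files whose major version is above 52, the value javac emits when targeting Java 8. The function names are hypothetical.

```python
import struct
from pathlib import Path

JAVA8_MAJOR = 52  # major version written by javac when targeting Java 8

def class_major_version(path):
    """Return the major version from a .class file header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError(f"{path}: too short to be a class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header)
    if magic != 0xCAFEBABE:
        raise ValueError(f"{path}: not a class file (bad magic number)")
    return major

def newer_than_java8(bytecodebase):
    """Return (path, major) for every .class file under bytecodebase above Java 8."""
    flagged = []
    for path in sorted(Path(bytecodebase).rglob("*.class")):
        major = class_major_version(path)
        if major > JAVA8_MAJOR:
            flagged.append((str(path), major))
    return flagged
```

`newer_than_java8("path/to/bytecodebase")` returns an empty list when every class targets Java 8 or earlier. For flagged entries, `major - 44` gives the Java release (for example, 55 corresponds to Java 11).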

The I/O identification results will be stored in "cb.db" under the "classinfo" directory.

### Step 2 (IODriver: Execute your application)

mvn clean package
Please execute your application using this command:

### Step 1
Before running the application, HitoshiIO needs to identify the inputs and outputs of each method. Please make sure that you have "methodeps.db" under the "classinfo" directory. You can then run this command:
```java -javaagent:target/CloneDetector-0.0.1-SNAPSHOT.jar -noverify -cp "target/CloneDetector-0.0.1-SNAPSHOT.jar:{/path/to/your/bytecodebase}" edu.columbia.cs.psl.ioclones.driver.IODriver {your.application.class} {... args}```

java -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.analysis.PreAnalyzer -cb /path/to/your/bytecodebase
Note: replace `{/path/to/your/bytecodebase}` with the path to your bytecodebase containing all of your Java class files, `{your.application.class}` with the fully qualified name (without the .class extension) of the Java class whose I/Os you want to capture, and `{... args}` with any command-line arguments it expects. Arguments can be file paths; relative paths are resolved from the `ioclones/clonedetector` directory, so write them relative to that directory.

Th I/O identification results will be stored in "cb.db" under the "classinfo" directory.
Because HitoshiIO is a dynamic analysis tool, you need to execute (profile), using the command above, every application in which you want to detect functional clones. **However, only classes with a `main` method can be executed, as functionality to execute methods individually has not yet been implemented.**

###Step 2
Now you can execute your application. Please use this command:
The I/O profiles of each method executed in your application can be found under the "iorepo" directory.
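Steps 1 and 2 can also be scripted. The following Python sketch only assembles the argument lists from the commands above. The helper names are hypothetical, and it assumes it runs from the `clonedetector` directory after `mvn clean package`. It joins the classpath with `os.pathsep`, because the `:` separator in the command above is specific to Linux and macOS (Windows uses `;`).

```python
import os
from pathlib import Path

# Jar built by `mvn clean package` inside clonedetector/, as named in the README.
JAR = "target/CloneDetector-0.0.1-SNAPSHOT.jar"

def preanalyzer_cmd(bytecodebase, workdir="."):
    """Step 1: PreAnalyzer argv; fails early if classinfo/methodeps.db is missing."""
    if not (Path(workdir) / "classinfo" / "methodeps.db").is_file():
        raise FileNotFoundError("classinfo/methodeps.db is required for Step 1")
    return ["java", "-cp", JAR,
            "edu.columbia.cs.psl.ioclones.analysis.PreAnalyzer",
            "-cb", bytecodebase]

def iodriver_cmd(bytecodebase, main_class, args=()):
    """Step 2: IODriver argv for one class that defines a main method."""
    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    classpath = os.pathsep.join([JAR, bytecodebase])
    return ["java", f"-javaagent:{JAR}", "-noverify", "-cp", classpath,
            "edu.columbia.cs.psl.ioclones.driver.IODriver",
            main_class, *args]
```

For example, `subprocess.run(iodriver_cmd("path/to/bytecodebase", "com.example.Main", ["input.txt"]), cwd="clonedetector", check=True)`, where `com.example.Main` is a hypothetical stand-in for your own class with a `main` method. Passing a list rather than one command string also sidesteps shell quoting.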

java -javaagent:target/CloneDetector-0.0.1-SNAPSHOT.jar -noverify -cp "target/CloneDetector-0.0.1-SNAPSHOT.jar:/path/to/your/bytecodebase" edu.columbia.cs.psl.ioclones.driver.IODriver your.application.class args
### Step 3 (SimAnalysisDriver: Store methods and corresponding similarities to database)

Because HitoshiIO is a dynamic analysis tool, you need to execute (profile) every application that you want to detection functional clones in your codebase by the command above. The profiling results of each method in a single application can be found uner the "iorepo" directory.
HitoshiIO needs a database to store the captured functional clones. We use MySQL Workbench and MySQL Community Server. Note that Workbench is not necessary, just convenient. For downloading and installing MySQL, please refer to [MySQL](https://www.mysql.com/). For MySQL Workbench please refer to [MySQL Workbench download link](https://dev.mysql.com/downloads/workbench/). For MySQL Community Server please refer to [MySQL Community Server download link](https://dev.mysql.com/downloads/mysql/).

###Step 3
For computing the functional similarity between methods, you can either assign a single I/O repository, which exhaustively compare all I/O profiles in this repository:
In order to run the command successfully, you must first create two tables in your database:

java -Xmx62g -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.driver.SimAnalysisDriver -cb /path/to/your/bytecodebase -alg deepHash -mode exhaustive -eName "preferred_comparision_name_in_db" -db db_ip:port/dbname -user username -pw pw -io /path/to/your/io_repo1
**[db_gen.txt](https://github.com/Programming-Systems-Lab/hitoshiIO2/blob/master/clonedetector/classinfo/db_gen.txt) exists to help with the creation of these tables, and contains the table schemas for `hitoshio_summary` and `hitoshio_row`.**
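If you would rather not paste the schemas by hand, a minimal Python sketch like the one below could feed `db_gen.txt` to any DB-API cursor. It assumes that `db_gen.txt` contains only `;`-terminated statements, with no semicolons inside string literals or comments. `split_statements` and `create_tables` are hypothetical names.

```python
from pathlib import Path

def split_statements(sql_text):
    """Split a SQL script on ';' into non-empty statements (naive: ignores quoting)."""
    return [stmt.strip() for stmt in sql_text.split(";") if stmt.strip()]

def create_tables(cursor, script_path="classinfo/db_gen.txt"):
    """Execute each statement of the schema script with a DB-API cursor."""
    for stmt in split_statements(Path(script_path).read_text()):
        cursor.execute(stmt)
```

Alternatively, the stock `mysql` command-line client can read the file directly through input redirection (`mysql -u {your_username} -p {db_name} < classinfo/db_gen.txt`).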

or you can assign two I/O repositories, which compare every pair of I/O profiles (one from repo1 while the other one from repo2):
For computing the functional similarity between methods, you can select from the following options:

java -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.driver.SimAnalysisDriver -cb /path/to/your/bytecodebase -alg deepHash -mode comparison -eName "preferred_comparision_name_in_db" -db db_ip:port/dbname -user username -pw pw -io /path/to/your/io_repo1 /path/to/your/io_repo2
1. For a single I/O repository with exhaustive I/O profile comparisons, run the following command:

Notes: HitoshiIO filter out constructor, static constructor, toString, equals and hashCode methods. We plan to make this configurable.
```java -Xmx62g -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.driver.SimAnalysisDriver -cb {/path/to/your/bytecodebase} -alg deepHash -mode exhaustive -eName "{preferred_comparision_name_in_db}" -db {db_IP}:{port}/{db_name} -user {root or your_username} -pw {your_password} -io {path/to/iorepo}```

###Step 4
For reviewing the detected function clones, you can simply use the following SQL command:
2. For a single I/O repository, to explore similarities of a specific method, run the following command:

```java -Xmx62g -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.driver.SimAnalysisDriver -cb {/path/to/your/bytecodebase} -alg deepHash -mode individual -target {method_name} -eName "{preferred_comparision_name_in_db}" -db {db_IP}:{port}/{db_name} -user {root or your_username} -pw {your_password} -io {path/to/iorepo}```
**NOTE: every method whose name (excluding the class name!) equals the string substituted for `{method_name}` will be compared against all other methods.**

3. For comparison of I/O profiles between two specified I/O repositories, run the following command:

```java -cp target/CloneDetector-0.0.1-SNAPSHOT.jar edu.columbia.cs.psl.ioclones.driver.SimAnalysisDriver -cb {/path/to/your/bytecodebase} -alg deepHash -mode comparison -eName "{preferred_comparision_name_in_db}" -db {db_IP}:{port}/{db_name} -user {root or your_username} -pw {your_password} -io {path/to/io_repo1} {path/to/io_repo2}```

In all of these commands, replace `{/path/to/your/bytecodebase}` with the relative or full path to your bytecodebase; `{preferred_comparision_name_in_db}` with a name for your codebase, under which the comparison results from this execution are stored in the database; `{path/to/iorepo}` (or `{path/to/io_repo1}` and `{path/to/io_repo2}`) with the path to the `iorepo` directory written in Step 2; and `{db_IP}`, `{port}`, `{db_name}`, `{your_username}`, and `{your_password}` with the connection details of your MySQL database. If the `-mode` flag is omitted, the mode defaults to exhaustive comparison within a single I/O repository.
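The three modes differ only in a few flags, so a small builder can catch mismatches (for example, two I/O repositories in exhaustive mode) before a long run starts. This Python sketch is hypothetical and only uses the flags that appear in the commands above. Passing `-eName` as a separate list element avoids the shell-quoting pitfalls of typing the name in quotes. The `-Xmx62g` default mirrors the README and will likely need adjusting to your machine's memory.

```python
# Class and jar names come from the README commands above.
JAR = "target/CloneDetector-0.0.1-SNAPSHOT.jar"
MAIN = "edu.columbia.cs.psl.ioclones.driver.SimAnalysisDriver"
REPOS_PER_MODE = {"exhaustive": 1, "individual": 1, "comparison": 2}

def sim_cmd(bytecodebase, exp_name, db, user, password, io_repos,
            mode="exhaustive", target=None, heap="-Xmx62g"):
    """Build a SimAnalysisDriver argv for one of the three documented modes."""
    if mode not in REPOS_PER_MODE:
        raise ValueError(f"unknown mode: {mode}")
    if len(io_repos) != REPOS_PER_MODE[mode]:
        raise ValueError(f"{mode} mode takes {REPOS_PER_MODE[mode]} I/O repo(s)")
    if (mode == "individual") != (target is not None):
        raise ValueError("-target is required in individual mode and only there")
    cmd = ["java"] + ([heap] if heap else [])
    cmd += ["-cp", JAR, MAIN, "-cb", bytecodebase,
            "-alg", "deepHash", "-mode", mode]
    if target is not None:
        cmd += ["-target", target]  # matched on method name only, across all classes
    cmd += ["-eName", exp_name, "-db", db, "-user", user,
            "-pw", password, "-io", *io_repos]
    return cmd
```

Note that `-pw` places the password in the process's argument list, where other local users may be able to see it.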

Only method I/O pairs with similarity scores higher than a predefined threshold will be written to the database.

Note: HitoshiIO filters out constructors, static constructors, and `toString`, `equals`, and `hashCode` methods. We plan to make this configurable.
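In bytecode, constructors are named `<init>` and static constructors (class initializers) are named `<clinit>`. The Python sketch below mimics this filter by method name only. It is an approximation for illustration, since HitoshiIO's own check may also consider descriptors or other details. `EXCLUDED_NAMES` and `drop_filtered` are hypothetical names.

```python
# JVM-internal names: "<init>" is a constructor, "<clinit>" a static initializer.
EXCLUDED_NAMES = {"<init>", "<clinit>", "toString", "equals", "hashCode"}

def drop_filtered(methods):
    """Keep (class_name, method_name) pairs whose method name is not excluded."""
    return [(cls, name) for cls, name in methods if name not in EXCLUDED_NAMES]
```

Matching is exact, so a method such as `equalsIgnoreCase` is kept even though its name starts with `equals`.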

### Step 4 (Filter in database)

For reviewing the detected functional clones, you can simply use the following SQL command:

```sql
SELECT * FROM hitoshio_row
WHERE comp_id = {codebase_id} AND sim >= {your_threshold};
```

Notes: The defaul similarity threshold for HitoshiIO is 0.85.
Notes: The default similarity threshold for HitoshiIO is 0.85.
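The same filter can be run from a script with a parameterized query instead of pasting values into the SQL. So that it runs without a server, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL. It assumes only the `comp_id`, `sim`, `method1`, `m_class1`, `method2`, and `m_class2` columns of `hitoshio_row`. The names `clones_above` and `demo_connection` are hypothetical. With a MySQL driver such as mysql-connector-python, the placeholders would be `%s` instead of `?`.

```python
import sqlite3

def clones_above(conn, comp_id, threshold=0.85):
    """Return (method1, m_class1, method2, m_class2, sim) rows at or above threshold."""
    # sqlite3 uses '?' placeholders; MySQL drivers such as mysql-connector-python use '%s'.
    cur = conn.execute(
        "SELECT method1, m_class1, method2, m_class2, sim FROM hitoshio_row "
        "WHERE comp_id = ? AND sim >= ? ORDER BY sim DESC",
        (comp_id, threshold),
    )
    return cur.fetchall()

def demo_connection():
    """In-memory stand-in holding only the hitoshio_row columns used above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hitoshio_row (comp_id INTEGER, method1 TEXT, "
                 "m_class1 TEXT, method2 TEXT, m_class2 TEXT, sim REAL)")
    return conn
```

The default of 0.85 matches HitoshiIO's default threshold. Rows below the threshold used at write time never reach the database, so raising the value in the query can narrow results but lowering it below that threshold returns nothing extra.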

Additional Notes
------
A new release was pushed on 6/11/2019, fixing a few issues with the selection of inputs and outputs. The previous version of HitoshiIO, referenced in the paper, can be found in `oldclonedetector`.


Questions, concerns, comments
@@ -69,4 +106,3 @@ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLI
Acknowledgements
--------
The authors of this software are [Fang-Hsiang (Mike) Su](mailto:mikefhsu@cs.columbia.edu), [Jonathan Bell](mailto:jbell@cs.columbia.edu), [Gail Kaiser](mailto:kaiser@cs.columbia.edu) and [Simha Sethumadhavan](mailto:simha@cs.columbia.edu). This work is funded in part by NSF CCF-1302269, CCF-1161079 and NSF CNS-0905246.

6 changes: 4 additions & 2 deletions clonedetector/classinfo/db_gen.txt
@@ -11,9 +11,11 @@ CREATE TABLE `hitoshio_summary` (
CREATE TABLE `hitoshio_row` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`comp_id` int(11) unsigned NOT NULL,
`method1` varchar(300) NOT NULL DEFAULT '',
`method1` varchar(100) NOT NULL DEFAULT '',
`m_class1` varchar(300) NOT NULL DEFAULT '',
`m_id1` int(11) unsigned NOT NULL,
`method2` varchar(300) NOT NULL DEFAULT '',
`method2` varchar(100) NOT NULL DEFAULT '',
`m_class2` varchar(300) NOT NULL DEFAULT '',
`m_id2` int(11) unsigned NOT NULL,
`inSim` double unsigned NOT NULL,
`outSim` double unsigned NOT NULL,
29 changes: 27 additions & 2 deletions clonedetector/dependency-reduced-pom.xml
@@ -30,7 +30,7 @@
</plugin>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
@@ -43,6 +43,7 @@
</plugin>
<plugin>
<artifactId>maven-source-plugin</artifactId>
<version>3.0.1</version>
<executions>
<execution>
<id>attach-sources</id>
@@ -54,6 +55,7 @@
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.1.1</version>
<configuration>
<archive>
<manifest>
@@ -66,6 +68,27 @@
</archive>
</configuration>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<skipTests>${skipTests}</skipTests>
</configuration>
</plugin>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<filesets>
<fileset>
<directory>evosuite-tests</directory>
<includes>
<include>**</include>
</includes>
<followSymlinks>false</followSymlinks>
</fileset>
</filesets>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
@@ -89,5 +112,7 @@
<url>http://ase.cs.columbia.edu:8282/repository/snapshots</url>
</snapshotRepository>
</distributionManagement>
<properties>
<skipTests>false</skipTests>
</properties>
</project>
