Scripts for jlearch #279
Merged
9 commits
ac47ad1
Add scripts
e72e5cb
Add some comments in scripts
006ce3e
Add docs for scripts
ca0567f
Remove project from run_contest_estimator.sh
afd39d0
Add setup.md and fix some scripts
5c9bfe2
Fixes scripts
1d7faf5
Fix issues
Atos1337 ad88c03
Add predictor in scripts
Atos1337 e0abce8
Add nformation about scrapper and quality_analysis
Atos1337
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| # Pipeline diagram | ||
|
|
||
| ```mermaid | ||
| graph TD | ||
| Projects --> ContestEstimator | ||
| Selectors --> ContestEstimator | ||
| subgraph FeatureGeneration | ||
| ContestEstimator --> Tests | ||
| Tests --> Features | ||
| Tests --> Rewards | ||
| end | ||
|
|
||
|
|
||
| Features --> Data | ||
| Rewards --> Data | ||
|
|
||
| Data --> Models | ||
| Models --> NNRewardGuidedSelector --> UsualTestGeneration | ||
| ``` | ||
|
|
||
| # Training | ||
|
|
||
| Briefly: | ||
|
|
||
| * Get dataset `D` by running `ContestEstimator` on several projects using several selectors. | ||
| * Train `model_0` using `D` | ||
| * Repeat for several `iterations` (assume we are on the `i`-th step): | ||
| * Get dataset `D'` by running `ContestEstimator` on several projects using `NNRewardGuidedSelector`, which will use `model_i` | ||
| * $$D = D \cup D'$$ | ||
| * Train `model_$(i+1)` using `D` | ||
|
|
||
| To do this, you should: | ||
| * Make sure that the `java` command runs `Java 8` and that `JAVA_HOME` points to `Java 8`. | ||
| * Put the projects you want to learn on in the `contest_input/projects` folder, then list the classes you want to learn on in `contest_input/classes/<project name>/list` (if it is empty, then all classes from the project jar are taken). | ||
| * Run `pip install -r scripts/requirements.txt`. Whether to do this in a virtual environment is up to you. | ||
| * List selectors in `scripts/selector_list` and projects in `scripts/prog_list` | ||
| * Run `./scripts/train_iteratively.sh ` |
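
The training loop described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the logic of `train_iteratively.sh`: the `collect` and `train` callables are hypothetical stand-ins for running `ContestEstimator` and for fitting a model.

```python
from typing import Callable, List, Set, Tuple

Sample = Tuple[float, ...]  # hypothetical: one feature/reward row of the dataset


def train_iteratively(
    collect: Callable[[object], Set[Sample]],
    train: Callable[[Set[Sample]], object],
    initial_data: Set[Sample],
    iterations: int,
) -> List[object]:
    """Return [model_0, ..., model_iterations], growing the dataset each step."""
    data = set(initial_data)           # D, collected with the ordinary selectors
    models = [train(data)]             # model_0
    for i in range(iterations):
        new_data = collect(models[i])  # D', collected with the selector guided by model_i
        data |= new_data               # D = D union D'
        models.append(train(data))     # model_(i+1)
    return models
```

Note that every model is trained on the whole accumulated dataset, not only on the most recent `D'`.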
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| # How to use scripts | ||
| For each scenario: go to the root of the `UTBotJava` repository; this is `WORKDIR`. | ||
|
|
||
| The `PATH_SELECTOR` argument has the form `"PATH_SELECTOR_TYPE [PATH_SELECTOR_PATH for NN] [IS_COMBINED (false by default)] [ITERATIONS]"`. | ||
|
|
||
| Before you start, run: | ||
| ```bash | ||
| ./scripts/prepare.sh | ||
| ``` | ||
|
|
||
| It copies the contest resources into the `contest_input` folder and builds the project. The scripts use the built jars, so if you change something in the code and want to re-run the scripts, rebuild first: | ||
| ```bash | ||
| ./gradlew clean build -x test | ||
| ``` | ||
|
|
||
| ## To Train a few iterations of your models: | ||
| By default the features directory is `eval/features`; create it before running. To change it, edit the source code of the scripts manually. | ||
|
|
||
| List the projects and selectors you want to train on in `scripts/prog_list` and `scripts/selector_list`. Training uses all methods of all classes from `contest_input/classes/<project name>/list`. | ||
|
|
||
| Then just run: | ||
| ```bash | ||
| ./scripts/train_iteratively.sh <time_limit> <iterations> <output_dir> <python_command> | ||
| ``` | ||
| `<python_command>` is your command for python3. At the end of execution you will get `<iterations>` models in the `<output_dir>` folder, features for each selector and project in `<features_dir>/<selector>/<project>` (for each `selector` from `scripts/selector_list`), and features for the models in `<features_dir>/jlearch/<selector>/<prog>`. | ||
|
|
||
| ## To Run Contest Estimator with coverage: | ||
| Check that a `srcTestDir` entry for your project exists in `build.gradle` of `utbot-junit-contest`. If it does not, add `build/output/test/<project>`. | ||
|
|
||
| Then just run: | ||
| ```bash | ||
| ./scripts/run_with_coverage.sh <project> <time_limit> <path_selector> <selector_alias> | ||
| ``` | ||
|
|
||
| At the end of execution you will get a JaCoCo report in the `eval/jacoco/<project>/<selector_alias>/` folder. | ||
|
|
||
| ## To estimate quality | ||
| Just run: | ||
| ```bash | ||
| ./scripts/quality_analysis.sh <project> <selector_aliases, separated by comma> | ||
| ``` | ||
| It takes the coverage reports from the corresponding report folders (`eval/jacoco/<project>/<alias>`) and generates charts in `$outputDir/<project>/<timestamp>.html`. | ||
| `outputDir` can be changed in `QualityAnalysisConfig`. The result file contains three metrics: | ||
| * $\frac{\sum_{c \in classSet} instCoverage(c)}{|classSet|}$ | ||
| * $\frac{\sum_{c \in classSet} coveredInstructions(c)}{\sum_{c \in classSet} allInstructions(c)}$ | ||
| * $\frac{\sum_{c \in classSet} branchCoverage(c)}{|classSet|}$ | ||
|
|
||
| For each metric and each selector you will get: | ||
| * the value of the metric | ||
| * a chart with the median, $q_1$, $q_3$ and so on | ||
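
As a worked illustration of the three formulas, the sketch below computes them from per-class counters. The `ClassCoverage` record is hypothetical (it is not a type from this PR), and it assumes that `instCoverage(c)` and `branchCoverage(c)` are the covered-to-total ratios for class `c`.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class ClassCoverage:
    """Hypothetical per-class counters, as could be read from a coverage report."""
    covered_instructions: int
    all_instructions: int
    covered_branches: int
    all_branches: int


def ratio(covered: int, total: int) -> float:
    # Assumption: a class with nothing to cover contributes 0.0.
    return covered / total if total else 0.0


def quality_metrics(classes: Sequence[ClassCoverage]) -> Tuple[float, float, float]:
    n = len(classes)
    # Metric 1: instruction coverage averaged over classes.
    mean_inst = sum(ratio(c.covered_instructions, c.all_instructions) for c in classes) / n
    # Metric 2: covered instructions over all instructions of the whole class set.
    total_inst = ratio(
        sum(c.covered_instructions for c in classes),
        sum(c.all_instructions for c in classes),
    )
    # Metric 3: branch coverage averaged over classes.
    mean_branch = sum(ratio(c.covered_branches, c.all_branches) for c in classes) / n
    return mean_inst, total_inst, mean_branch
```

The first and second metrics differ when class sizes differ: the first weights every class equally, the second weights every instruction equally.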
|
|
||
|
|
||
| ## To scrape solution classes from Codeforces | ||
| Note: You can't scrape many classes, because the Codeforces API has a request limit. | ||
|
|
||
| This can be useful if you want to train Jlearch on classes that usually have no virtual functions but contain many algorithms, and therefore many loops and conditions. | ||
|
|
||
| Just run: | ||
| ```bash | ||
| python3 path/to/codeforces_scrapper.py --problem_count <val> --submission_count <val> --min_rating <val> --max_rating <val> --output_dir <val> | ||
| ``` | ||
|
|
||
| All arguments are optional. Default values: `100`, `10`, `0`, `1500`, `.`. | ||
|
|
||
| At the end you should get `submission_count` classes for each of `problem_count` problems with a rating between `min_rating` and `max_rating` in `output_dir`. Each class has the package `p<contest_id>.p<submission_id>`. | ||
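
To make the output layout concrete, here is a small sketch of how the package name maps to a file path, following the `p<contest_id>.p<submission_id>` convention above. The helper name `submission_path` is made up for illustration and is not part of the scrapper.

```python
import os
from typing import Tuple


def submission_path(output_dir: str, contest_id: int, submission_id: int,
                    class_name: str) -> Tuple[str, str]:
    """Return (package, file path) for one scraped submission."""
    package = f"p{contest_id}.p{submission_id}"
    # Each package component becomes one directory level under output_dir.
    path = os.path.join(output_dir, *package.split("."), f"{class_name}.java")
    return package, path
```

Because both IDs are part of the package, two submissions for the same problem never collide, even if both define a class with the same name.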
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| # How to set up the environment for experiments on Linux | ||
|
|
||
| * Clone repository, go to root | ||
| * `chmod +x ./scripts/*` and `chmod +x gradlew`. | ||
| * Set `Java 8` as default and set `JAVA_HOME` to this `Java`. | ||
| For example | ||
| * Go through [this](https://sdkman.io/install) until `Windows installation` | ||
| * `sdk list java` | ||
| * Find any `Java 8` | ||
| * `sdk install <this java>` | ||
| * `sdk use <this java>` | ||
| * Check `java -version` | ||
| * `mkdir -p eval/features` | ||
| * `mkdir models` | ||
| * Set environment for `Python`. | ||
| For example | ||
| * `python3 -m venv /path/to/new/virtual/environment` | ||
| * `source /path/to/venv/bin/activate` | ||
| * Check `which python3`; it should be somewhere in the `/path/to/venv` folder. | ||
| * `pip install -r scripts/requirements.txt` | ||
| * `./scripts/prepare.sh` | ||
| * Change `scripts/prog_list` to run on a smaller project, or delete some classes from `contest_input/classes/<project>/list`. | ||
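
The `java -version` check from the list above can also be scripted. The sketch below is an optional helper and not part of the PR; it relies on the fact that Java 8 reports a version string starting with `1.8` and that `java -version` prints to stderr.

```python
import re
import subprocess


def is_java8(version_output: str) -> bool:
    """Return True if the `java -version` output reports a 1.8.x version."""
    match = re.search(r'version "([^"]+)"', version_output)
    return bool(match) and match.group(1).startswith("1.8")


def current_java_version_output() -> str:
    """Run `java -version` and return its output, which goes to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True,
                            text=True, check=True)
    return result.stderr
```

Calling `is_java8(current_java_version_output())` before starting a long training run avoids finding out about a wrong JDK only after the build fails.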
|
|
||
| # Default settings and how to change them | ||
| * You can reduce the number of models in the `models` variable in `scripts/train_iteratively.sh` | ||
| * You can change the amount of required RAM in `run_contest_estimator.sh`: `16 gb` by default | ||
| * You can change `batch_size` or `device` in `train.py`: `4096` and `gpu` by default | ||
| * If you are setting up on a server, you will need to uncomment the tmp directory option in `run_contest_estimator.sh` | ||
|
|
||
| # Continue setup | ||
| * `scripts/train_iteratively.sh 30 2 models <your python3 command>` | ||
|
||
| * In `models/` you should get models. | ||
| * `mkdir eval/jacoco` | ||
| * `./scripts/run_with_coverage.sh <any project (guava-26.0, for example)> 30 "NN_REWARD_GUIDED_SELECTOR path/to/model" some_alias`. `path/to/model` should be something like `models/nn32/0`, where `nn32` is a type of model and `0` is the iteration number | ||
| * You should get a JaCoCo report in `eval/jacoco/guava-26.0/some_alias/` | ||
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| import argparse | ||
| import os.path | ||
| import time | ||
|
|
||
| import requests | ||
| from urllib import request | ||
| import json | ||
| import bs4 | ||
| import javalang | ||
|
|
||
| from codeforces import CodeforcesAPI | ||
|
|
||
|
|
||
|
||
| def get_args(): | ||
|     parser = argparse.ArgumentParser() | ||
|     parser.add_argument("--problem_count", dest='problem_count', type=int, default=100) | ||
|     parser.add_argument("--submission_count", dest='submission_count', type=int, default=10) | ||
|     parser.add_argument("--min_rating", dest='min_rating', type=int, default=0) | ||
|     parser.add_argument("--max_rating", dest="max_rating", type=int, default=1500) | ||
|     parser.add_argument("--output_dir", dest="output_dir", type=str, default=".") | ||
|     return parser.parse_args() | ||
|
|
||
|
|
||
| args = get_args() | ||
|
|
||
|
|
||
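| def check_json(answer): | ||
|     values = json.loads(answer) | ||
|
|
||
|     if values['status'] == 'OK': | ||
|         return values['result'] | ||
|     # Fail loudly instead of returning None, which would break the caller later. | ||
|     raise RuntimeError(f"Codeforces API request failed: {values.get('comment')}") | ||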
|
|
||
|
|
||
| def get_main_name(tree): | ||
|     """ | ||
|     :return: class name with main method | ||
|     """ | ||
|     return next(klass.name for klass in tree.types | ||
|                 if isinstance(klass, javalang.tree.ClassDeclaration) | ||
|                 for m in klass.methods | ||
|                 if m.name == 'main' and m.modifiers.issuperset({'public', 'static'})) | ||
|
|
||
|
|
||
| def save_source_code(contest_id, submission_id): | ||
|     """ | ||
|     Parse html page to find source code of submission and save it in some unique package p${contest_id}.p${submission_id}. | ||
|     If we reach api request bound, then we try to sleep for 5 minutes. | ||
|     """ | ||
|     url = request.Request(f"http://codeforces.com/contest/{contest_id}/submission/{submission_id}") | ||
|     with request.urlopen(url) as req: | ||
|         soup = bs4.BeautifulSoup(req.read(), "html.parser") | ||
|     path = os.path.join(args.output_dir, f"p{contest_id}", f"p{submission_id}") | ||
|     # exist_ok avoids a separate existence check | ||
|     os.makedirs(path, exist_ok=True) | ||
|     code = "".join(p.get_text() for p in soup.find_all("pre", {"class": "program-source"})) | ||
|     tree = javalang.parse.parse(code) | ||
|     try: | ||
|         name = get_main_name(tree) | ||
|         with open(os.path.join(path, f"{name}.java"), 'w') as f: | ||
|             print(f"package p{contest_id}.p{submission_id};", file=f) | ||
|             f.write(code) | ||
|     except StopIteration: | ||
|         # No class with a public static main was found; this is treated as hitting the request bound. | ||
|         print("Sleeping, because we reached the request bound") | ||
|         time.sleep(300) | ||
|
|
||
|
|
||
| def main(): | ||
|     codeforces = "http://codeforces.com/api/" | ||
|
|
||
|     api = CodeforcesAPI() | ||
|
|
||
|     with request.urlopen(f"{codeforces}problemset.problems") as req: | ||
|         all_problems = check_json(req.read().decode('utf-8')) | ||
|
|
||
|     # Take the first problem_count problems whose rating is within the bounds. | ||
|     problems = [] | ||
|     cur_problem = 0 | ||
|     for p in all_problems['problems']: | ||
|         if cur_problem >= args.problem_count: | ||
|             break | ||
|         if p.get('rating') is None: | ||
|             continue | ||
|         if p['rating'] < args.min_rating or p['rating'] > args.max_rating: | ||
|             continue | ||
|         cur_problem += 1 | ||
|         problems.append({'contest_id': p['contestId'], 'index': p['index']}) | ||
|
|
||
|     # Guard against an empty list before indexing the first problem. | ||
|     if not problems: | ||
|         print("No problems match the rating bounds") | ||
|         return | ||
|     print(f"Got {len(problems)} problems, first: {problems[0]}") | ||
|
|
||
|     # For each problem try to take submission_count submissions. | ||
|     all_submission = 0 | ||
|     for i, p in enumerate(problems): | ||
|         cur_submission = 0 | ||
|         iteration = 0 | ||
|         page_size = 1000 | ||
|         while cur_submission < args.submission_count: | ||
|             length = 0 | ||
|             # The API is 1-indexed, so page number `iteration` starts at page_size * iteration + 1. | ||
|             for s in api.contest_status(contest_id=p['contest_id'], from_=page_size * iteration + 1, count=page_size): | ||
|                 if cur_submission >= args.submission_count: | ||
|                     break | ||
|                 length += 1 | ||
|                 if s.problem.contest_id != p['contest_id'] or s.problem.index != p['index']: | ||
|                     continue | ||
|                 if s.programming_language != "Java 8": | ||
|                     continue | ||
|                 if s.verdict.name != "ok": | ||
|                     continue | ||
|                 save_source_code(p['contest_id'], s.id) | ||
|                 cur_submission += 1 | ||
|                 all_submission += 1 | ||
|                 print(f"Got program number {all_submission}") | ||
|             iteration += 1 | ||
|             # An empty page means there are no more submissions for this contest. | ||
|             if length == 0: | ||
|                 break | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     main() | ||
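
The paging logic in `main()` can be exercised without network access. The sketch below mirrors the `from_ = page_size * iteration + 1` arithmetic of the scrapper, but `fetch_page` and `accept` are hypothetical callables standing in for `api.contest_status` and the language/verdict filters.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def take_matching(
    fetch_page: Callable[[int, int], Sequence[T]],
    accept: Callable[[T], bool],
    limit: int,
    page_size: int = 1000,
) -> List[T]:
    """Collect up to `limit` accepted items from a 1-indexed paged source."""
    found: List[T] = []
    iteration = 0
    while len(found) < limit:
        # Page number `iteration` starts at item page_size * iteration + 1.
        page = fetch_page(page_size * iteration + 1, page_size)
        if not page:
            break  # an empty page means the source is exhausted
        for item in page:
            if accept(item):
                found.append(item)
                if len(found) == limit:
                    break
        iteration += 1
    return found
```

For example, with `data = list(range(1, 26))` and `fetch_page = lambda start, count: data[start - 1:start - 1 + count]`, asking for five even numbers with `page_size=4` walks three pages and stops as soon as the limit is reached.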
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| #!/bin/bash | ||
|
|
||
| ./gradlew clean build -x test | ||
|
|
||
| INPUT_FOLDER=contest_input | ||
|
|
||
| # Copy resources folder in distinct folder to allow other scripts have specific project in contest resources folder | ||
| mkdir -p "$INPUT_FOLDER" | ||
| cp -r utbot-junit-contest/src/main/resources/* "$INPUT_FOLDER" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| antlr |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| #!/bin/bash | ||
|
|
||
| PROJECT=${1} | ||
| SELECTORS=${2} | ||
| STMT_COVERAGE=${3} | ||
| WORKDIR="." | ||
|
|
||
| # We set QualityAnalysisConfig by properties file | ||
| SETTING_PROPERTIES_FILE="$WORKDIR/utbot-analytics/src/main/resources/config.properties" | ||
| touch "$SETTING_PROPERTIES_FILE" | ||
| echo "project=$PROJECT" > "$SETTING_PROPERTIES_FILE" | ||
| echo "selectors=$SELECTORS" >> "$SETTING_PROPERTIES_FILE" | ||
|
|
||
| JAR_TYPE="utbot-analytics" | ||
| echo "JAR_TYPE: $JAR_TYPE" | ||
| LIBS_DIR=utbot-analytics/build/libs/ | ||
| UTBOT_JAR="$LIBS_DIR$(ls -l $LIBS_DIR | grep $JAR_TYPE | awk '{print $9}')" | ||
| echo $UTBOT_JAR | ||
| MAIN_CLASS="org.utbot.QualityAnalysisKt" | ||
|
|
||
| if [[ -n $STMT_COVERAGE ]]; then | ||
|   MAIN_CLASS="org.utbot.StmtCoverageReportKt" | ||
| fi | ||
|
|
||
|
|
||
|
|
||
| #Running the jar | ||
| COMMAND_LINE="java $JVM_OPTS -cp $UTBOT_JAR $MAIN_CLASS" | ||
|
|
||
| echo "COMMAND=$COMMAND_LINE" | ||
|
|
||
| $COMMAND_LINE |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| beautifulsoup4 | ||
| javalang | ||
| numpy | ||
| pandas | ||
| requests | ||
| scikit_learn | ||
| torch |