Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

**important** how to add a coq proj to a splits file json files automatically? i.e coq-proj -> coq-proj_splits.append #11

Open
brando90 opened this issue Dec 14, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@brando90
Copy link
Owner

brando90 commented Dec 14, 2022

Goal: coq-proj -> coq-proj_splits.append

  • download (.gitmodules) the specific version of the coq proj we want -- wrt to the right coq ver & ocaml compiler
    • likely need switch, compiler, coq ver, git commit for that coq proj
    • if we could use the opam remote version best! but we need to know how to coqc compile it through pycoq's strace))
  • create splits train, test (random 0.9, 0.1 -- ideally based on topological sort, actually divide by proj, some projs are train other test. Simpler + tests harder generalization. In production just train with EVERYTHING)
  • detect how to build coq proj (or add misisng files), either
    • a. opam install opam.lf files & _CoProject (later just has args to coqc & *.v$ files)
    • b. make, use the official MakeFile (see debug_proj) with a _CoqProj(just coqc args, *.v$) and uses make clean command cd-ed to the right dir. Might need the opam envs that proverbot hard codes with echo "eval \"$(opam env --set-switch --switch=$SWITCH)\"" >> coq-projects/$project/make.sh
    • c. Dune. No idea how to do this.
  • then given this coq-projs, run pycoq's python data extraction script to get the data we need/want
@brando90
Copy link
Owner Author

for commit: #12

@brando90
Copy link
Owner Author

in additions there is no _CoqMake, there is a Make. Might need to port some of these.

@brando90
Copy link
Owner Author

brando90 commented Dec 15, 2022

to be safe cd to the coq projs (as I did in my python scripts)

Goal is to make a single coq_proj.opam for all coq-projects from proverbot. Run the proverbot install script and store everything you need into the coq_proj.opam file (and gitmodule? can we get rid of this? UCSD-PL/proverbot9001#68).

then we can "remake" all of coq-gym from proverbot and test the addition of a new coq-proj with the above code that checks the things inside that coq-proj, create a copy/git clones it etc according to the list at the top. End result is a _CoqMake with a coq_proj.lf with all dependencies. Figure out how to remove the configure command "build_command": "./configure.sh && ./remake" in the build from coq-projs in proverbot UCSD-PL/proverbot9001#67.

Note, this makes most of the fields in proverbot's splits.json not needed anymore -- except for the train_files & test_files -- which is all that is needed in those files now. Unless we mark in the coq_proj.opam folder if it's a train or test project in the case no more .json files are needed. If you see my above comment I choose this because it's 1. simpler to build + 2. it tests for a harder generalization setting. You can just re-train on EVERYTHING once you deploy it in practice. Or fine tune + (add the tokens too) on the test scripts so that it knows those files.

@brando90 brando90 added the enhancement New feature or request label Dec 15, 2022
@brando90
Copy link
Owner Author

brando90 commented Dec 15, 2022

splits:

todo: should be split by project or by train, test files? project doesn't need us to worry about topolical sort.
    Tests a harder gen. Let's do this + it's simpler. 

note: what is the difference btw build & install in a opam file? https://discuss.ocaml.org/t/what-is-the-difference-between-a-build-command-and-an-install-command-in-an-proj-opam-file/10966

Can opam files have arbitrary stuff? https://discuss.ocaml.org/t/can-i-have-arbitrary-text-fields-in-a-proj-opam-file-can-it-be-converted-to-json-too-if-i-want/10967

@brando90
Copy link
Owner Author

brando90 commented Dec 15, 2022

Edit to plan:

  • principles/heuristics:
      1. pre-prepare all the coq projects, build files needed, coq_proj.opam file with all the info needed, and if I am really forced & can't put it in the .opam file, then a modules file with the commit (https://stackoverflow.com/questions/5542910/how-do-i-commit-changes-in-a-git-submodule)
      • once the coq-projs are done, the builds work & one can create data from them, push EVERYTHING including the coq-projs submodules, .opam files (note build files already there and command in opam file & commit stored and dependencies too in opam file, everything tested with python mega install). This script also prepares the .json file with only the split info (which is optional since I am thinking to split by proj since it's simpler + might have better gen + ML models benfit more by data scale than random hardcoded decisions humans make anyway)
      1. although likely stuff has to be done manually for each coq-project, we can create a function that takes in a specific coq-proj (not coq-projs nor benchmarks) & tries to do to it's best of it's ability to do step 1

Summary of API

So two main functions (note benchmark == coq_projs):

    1. def prepare_single_coq_proj_generate_dummy_data(path2coqproj: str) -> does all of 1 but for 1 coq_proj automatically best effort. If it fails do fixes manually and try to add them to the function to your best of your ability so it works in the future with either a filenames != [] or a try. Must try to execute coq-files too & extract a dummy data e.g. tactic pred/PosEval.
    1. def preprepare_everything_from_scratch(path2benchmark: str) -> does step 1 outlined above
    • a. do all the prep to generate valid opam files -> tested with a very lightweight data creation loop (only executes script, collects tactics throws them away) -- just to make sure the prep skill works
    • b. [optionally calls your real data gen function if you want (passes a function and it's args)]
    1. def generate_benchmark_data_assuming_preparation_is_done(path2benchmark: str) -> assume the .opam files, builds are created properly and generates data
    1. [Extra?] given a def add_repo_given_git_url(giturl: str): -> does most of step 0 but adds to gitmodules (if that ends up mattering) and to the path to coq-projs etc.

Vocabulary:

  • benchmark:= the entire data set. In this context the collection of all the coq-projects so the coq-projects dir inside of pycoq.

Files to manage

Goal is to have everything in 1 place if possible. For now I think we can get away with 3 files (hope is 1 file eventually & push to my repo all the time so everything is reproducible in a good state):

  1. coq_proj.opam. Generated by step 0 (& thus 1).
    a. has everything. commit, build (build, install?), deps (opam ver, path to _CoqMake/Make/Remake/Makefile, git home page, opam ver, swith name, path2coqporjs, anything else? (looks good!)
  2. git .modules (submodules). Generated by step 0 (& thus 1).
  3. *splits.json. contains train, test splits, created at the end of step 1. (I think this one is optional if we are splitting by coq-proj)

@brando90
Copy link
Owner Author

brando90 commented Dec 15, 2022

For the sake of an example, here is VPs lf.opam file:

opam-version: "2.0"
maintainer: "vasily.pestun@gmail.com"

homepage: "https://github.com/pestun/lf"
dev-repo: "git+https://github.com/pestun/lf.git"
bug-reports: "https://github.com/pestun/lf/issues"
license: "LGPL-2.1"

synopsis: "Software foundation exercises"
description: """
solutions to software foundations exercises 
"""

version: "dev"

build: [make "-j%{jobs}%"]
install: [make "install"]

depends: [
  "ocaml"
  "coq" {(>= "8.11" & < "8.12~") | (= "dev")}
]

tags: [
  "category:Miscellaneous/Coq Extensions"
  "keyword:integer numbers"
  "keyword:arithmetic"
]
authors: [
  "Benjamin C. Pierce"
]

opam install calls build then install: https://discuss.ocaml.org/t/what-is-the-difference-between-a-build-command-and-an-install-command-in-an-proj-opam-file/10966

extra fields in .opam file: https://opam.ocaml.org/doc/Manual.html#opamfield-extra-fields, https://discuss.ocaml.org/t/can-i-have-arbitrary-text-fields-in-a-proj-opam-file-can-it-be-converted-to-json-too-if-i-want/10967/2

.opam -> .json: idk how yet https://discuss.ocaml.org/t/can-i-have-arbitrary-text-fields-in-a-proj-opam-file-can-it-be-converted-to-json-too-if-i-want/10967/2

@brando90
Copy link
Owner Author

(extra, start of from the opam file if it already exists, try to not do extra work if the coq_proj already "works" i.e. installs and I can get data from it)

@brando90
Copy link
Owner Author

what about making .opam files automatically for coq?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant