This repo provides an example of how I use Codex for my daily research. Previously I used ChatGPT for every step; now I split the workflow into two parts: getting the idea, and implementation & feedback. A coding agent is capable of:
- Implement the code for an idea
- Conduct small local jobs (e.g. things that can be finished with a Python/Bash script in several minutes)
- Identify small bugs without manual debugging
- Analyze the results
- Write documentation (that you don't want to write)
- Understand/refactor your old messy code
- Provide a quick understanding of a newly published, complicated code repo
However, using coding agents can be frustrating if your files are poorly organized or you don't know how to communicate with the agent. I'd like to share my usage of coding agents with Codex as one example; you can of course switch to Claude Code or another favorite agent. Note that you need the VSCode editor with the Codex extension installed (and access to Codex, e.g. a ChatGPT membership) to try the examples in this playground.
A coding agent can get access to a local or a GitHub-based repo (a folder). You can grant it more permissions once you acknowledge the risk. The magic thing about a coding agent (compared to ChatGPT) is that it can write commands and execute them in the command line. This gives it the ability to see files in the folder and get feedback step by step. In a normal chat, the model only knows what you paste into the chat box. In Codex, the agent can inspect the files in your repo, propose shell commands, run local scripts, and revise files directly. So it is not only answering your question, but also interacting with your project step by step. That is why the organization of the repo matters much more here than in a normal chat. For safety, you can always ask it to provide the commands first, review them, and only then let it execute them.
I will use my published code repo as the playground: https://github.com/ChaoNyu/CSTShift (Star it if you like!). Use this command to clone the repo to your local PC:
git clone https://github.com/ChaoNyu/CSTShift.git
(This example is designed to run on a local PC, since our HPC is too complicated and fragile for illustration. You can develop related HPC-specific skills, like setting up a Singularity environment.)
This is the VSCode window you will see, with Codex in the right panel:
Let's first try to summarize this repo:
> Summarize this repo. Briefly summarize each file as bullet points.
This should give you a good start to understand the repo. You can look around and ask more questions, just like using ChatGPT.
To train and run the model, we need to set up the env:
> I want to set up the conda env according to the readme.md. Make sure it's a new conda env. Provide the bash commands first; let me review them before you execute them.
(I check the answer and it looks good)
> Execute them to install the env.
(After it's finished)
> Test whether the env is usable.
In this execution mode, some steps will ask you to confirm with higher permissions. The installation takes a while (<10 min for me), and the agent actually fixes small problems with reasonable solutions:
- I omitted cudatoolkit=11.3 because it is not available on osx-arm64.
- I downgraded numpy to 1.26.4 because the older torchvision 0.13.1 stack is not clean with NumPy 2.x.
Now we have a valid playground. Let's move on to the organization part.
As in the workflow, an idea leads to a code implementation and then to analysis of the task results. For an automated research workflow, you want the coding agent to be able to access your main code, your previous results, and other related files in an organized way. I create a separate folder for each experiment (run), like this:
In addition to a normal repo, I add this runs folder to store all the runs with their related prompt/summary/code/output. I will use examples later to explain this idea.
Please create a runs folder and a weekly folder for later usage.
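In the terminal this is a one-liner (I'm assuming both folders sit at the repo root; adjust if you prefer weekly inside runs):

```shell
# from the playground repo root
mkdir -p runs weekly
```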
To implement ideas or analyze results, we do a lot of different things: write Python code to change the model, write Jupyter notebooks to plot figures, create Slurm job files, debug... If you conduct something with Codex, you want it to follow certain rules. Writing the rules every time would be tedious and error-prone. We can use Skills to make this easier.
This provided folder already contains the files that you should copy/paste into the example playground repo. Concretely, copy skills-agent/AGENTS.md into the root of the playground repo, and copy the folder skills-agent/skills into .agent/skills in that repo. The AGENTS.md controls the repo-local behavior, meaning it tells Codex the overall rules to follow whenever it works in that repo. The .agent/skills/ folder contains the reusable skills. A skill is like a fixed function or workflow that Codex can use for a certain type of task, for example plotting, creating a run, or writing a summary. You can check further about the recommended usage here: https://developers.openai.com/codex
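The copy step can be sketched as below. To keep the commands self-contained I create a stand-in skills-agent folder in a throwaway directory; in real usage you run only the two cp lines from your playground repo root, with the skills-agent folder provided alongside this tutorial:

```shell
# Demo in a throwaway directory; in practice, run the cp lines
# from the playground repo root with the provided skills-agent folder.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p skills-agent/skills
echo "# repo-local rules for Codex" > skills-agent/AGENTS.md   # stand-in file

cp skills-agent/AGENTS.md AGENTS.md          # repo-wide behavior rules
mkdir -p .agent
cp -r skills-agent/skills .agent/skills      # reusable skills
```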
Note that I haven't tuned these files to achieve the best performance. You should adjust them for your best experience. With these skills we can start to vibe-code our ideas.
Let's start with training a toy model.
> For illustration, I want to create a run to train a toy model based on this repo.
1. It should be able to finish on a normal PC/mac for <10 mins. Use very few epochs.
2. Explain the model config you will choose. It should be a very small model with few parameter numbers just to do a quick test on the model training code.
3. I want to use the CHESHIRE dataset as provided. Check whether it needs further processing (e.g. is the split provided?)
4. Save the trained model in the new run folder.
Plan this first, then we can execute it.
(It replies with a good plan)
> What does ext_atom_method do? Does it require additional DFT results that are not in the provided dataset? I'm not sure whether the dataset includes the DFT results; you can check that.
(It replies to confirm the details)
> You can use the processed CHESHIRE data. Your plan is good. Implement and run it.
(Run)
> Create a new run as v2 of this run, where everything is the same but with 100 epochs, to see if the training loss decreases as expected. Create all related files and just run it.
(v2 created and run)
To implement a slightly complicated task (like this one), I prefer to write bullet points for all the thoughts I have on the task. Though the plan mode in Codex is already very powerful (try typing 'plan' in the box), it is easier to control if you write the points manually. You can also just write "I want to create a run to train a toy model based on this repo." and see what's different.
This create-run will trigger the skills we have written in the .agent folder. For separate tasks you should create different runs, so they don't get mixed together into one big run folder. In my prompt I also ask it to plan first before executing. Extra care is needed for heavier tasks like training a model or going through a big dataset; you can ask additional questions before actually executing the run.
Note that for long runs and Slurm submissions, it's better to let the agent write the script and then run the script yourself.
As long as it's one continuous run, you can keep the conversation in the same window. When the conversation gets too long, Codex will compress the context, which might not be the optimal outcome for you. Check the context-window indicator at the bottom right to keep track of this. I recommend switching to a new window to start a new run. If a run doesn't finish in one conversation, you may need to reconsider your definition of one run; it should be more atomic. You can try the conversation prompts I provided above to get a newly trained toy model.
You may find the files in the folder have different colors: green, brown, etc. That's Git version control, which helps a lot with safer, collaborative code development. Git is a history system for your code: it records what changed and when, and lets you compare the current version with older ones. It is one of the most important tools in professional collaborative development. Even if you don't collaborate, you should use it, because you are collaborating with yourself from days/weeks ago; you will often find you don't understand the code you wrote last week. If you are not familiar with Git, check https://git-scm.com/docs/gittutorial or just ask ChatGPT/Codex:
> Explain Git to a beginner with the top 5 useful commands and one example.
The beginner operational view is:
- green/brown file colors in VSCode mean Git sees changes in those files
- "diff" means seeing exactly what lines changed
- "stage" means selecting which file changes you want to include in the next saved snapshot
- "commit" means saving one named snapshot in the history
In Codex workflow this matters even more, because the agent may revise files quickly. Git lets you inspect exactly what changed before you keep it.
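Those bullet points map onto a handful of commands. Here is a self-contained demo in a throwaway repo (the file name and its contents are made up for illustration):

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email "you@example.com"   # local identity for this demo repo
git config user.name "Demo User"

echo "EPOCHS = 2" > config.py
git status --short                # Git sees the new file (the color cue in VSCode)
git add config.py                 # "stage": select changes for the next snapshot
git commit -qm "add toy config"   # "commit": save one named snapshot

echo "EPOCHS = 100" > config.py
git diff                          # "diff": exactly which lines changed since the snapshot
git log --oneline                 # the history so far
```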
In this system, Git should protect src (the source code) and other files you consider important (e.g. the .agent folder). You can use a .gitignore file to exclude files that should not be under version control, e.g. __pycache__. You can decide whether to track the runs folder. If you generate a lot of artifacts (e.g. figures), you may want to track only the summaries in runs instead of everything. This is tricky, though: without the figures, the summaries cannot render them correctly.
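A minimal .gitignore sketch for this layout might look like the following (the runs pattern is just one option; drop it if you want the artifacts tracked too):

```gitignore
# Python cache files
__pycache__/
*.pyc
# notebook checkpoints
.ipynb_checkpoints/
# heavy run artifacts; keep the run summaries under version control
runs/*/out/artifacts/
```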
In my setting, the core code lives under src, while each run also contains supplementary code, for example Python files to create splits, or small helper snippets for analysis. I don't track these with Git because they are designed to be run-only. So although conversations may lead to file revisions, they don't necessarily lead to a commit.
When you revise the core code, make a commit after the revision is tested and correct. You can do this from the command line, or use the Source Control panel (on the left). Make sure you check what has changed in each file before staging it.
One nice thing VSCode can do is automatically generate the commit message. This is super useful when you need to revert to a certain version or check code differences.
A lot of us use Jupyter notebooks for multiple purposes: analyzing data, writing comments, displaying results, plotting figures, or even composing all-in-one files. This makes the notebooks hard to reuse. Jupyter notebooks are also harder for coding agents to read as context; for example, they are read as JSON, where the figures appear as uninterpretable text. My alternative is to decompose the notebook's functions: in different runs, when you need to analyze results, create separate Python files, just like different cells in a notebook. I will demonstrate my usage with figure plotting, and you can design your own way to use Python files interactively (or find a better practice combining notebooks with AI).
I created a plotting skill, which you can see in the skills folder. When I need to plot something, it creates a separate Python file in the run folder, with all user-tunable plot controls at the very top of the script as ALL_CAPS variables. For example, we can plot the training curve of the toy model:
...
> Plot the training curve using skill.
(It gives the Python file and PNG)
> I want more detailed adjustment to the legend position.
You will get a new plot_training_curve.py and a PNG file in out/artifacts. You can change the variables at the top of the Python file and rerun it (in the correct env) to adjust the style. For example, I changed the legend position to make the figure look better.
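As a hypothetical sketch of what such a generated script might look like (the CSV format, file paths, and variable names here are my assumptions, not the actual skill output):

```python
# plot_training_curve.py -- hypothetical sketch of a skill-generated script.
# All user-tunable controls sit at the very top as ALL_CAPS variables.
CSV_PATH = "metrics.csv"                      # assumed per-epoch loss log
OUT_PATH = "out/artifacts/training_curve.png"
LEGEND_LOC = "upper right"                    # tweak this and rerun to move the legend
FIG_SIZE = (6, 4)

import csv
import os

import matplotlib
matplotlib.use("Agg")                         # headless backend: just write the PNG
import matplotlib.pyplot as plt


def load_losses(path):
    """Read (epoch, loss) pairs; fall back to demo data if the log is missing."""
    if not os.path.exists(path):
        return list(range(10)), [1.0 / (e + 1) for e in range(10)]
    epochs, losses = [], []
    with open(path) as f:
        for row in csv.DictReader(f):
            epochs.append(int(row["epoch"]))
            losses.append(float(row["loss"]))
    return epochs, losses


def main():
    epochs, losses = load_losses(CSV_PATH)
    os.makedirs(os.path.dirname(OUT_PATH), exist_ok=True)
    fig, ax = plt.subplots(figsize=FIG_SIZE)
    ax.plot(epochs, losses, label="train loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend(loc=LEGEND_LOC)
    fig.savefig(OUT_PATH, dpi=150)


if __name__ == "__main__":
    main()
```

The point of the pattern is that every stylistic adjustment (legend position, figure size) is a one-line edit at the top, with no need to hunt through the plotting logic.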
After the run is finished, you can use the summarize_run skill to make a summary markdown file:
> Summarize the run 04-08-26__cheshire-toy-train__carbon-smoke-v2
Markdown files are AI-friendly, and their usage is quite similar to LaTeX: you can easily insert figures and tables and control formatting. You can find good tutorials such as https://www.markdownguide.org/basic-syntax/#overview.
It works best to design your own summary style. For example, my design has a Comments section where I write my own thoughts on the run. These comments are copied verbatim, without alteration, when a larger summary (e.g. the weekly update) is generated, as required in the skills.
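For reference, a run-summary skeleton in my style could look like this (the section names are my own convention, not a fixed format; the run name is the one from the example above):

```markdown
# Run: 04-08-26__cheshire-toy-train__carbon-smoke-v2

## Setup
- model config, dataset split, epochs, command used

## Results
![training curve](out/artifacts/training_curve.png)

| metric           | value |
|------------------|-------|
| final train loss | ...   |

## Comments
<!-- my own thoughts; copied verbatim into the weekly summary -->
```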
You can also try the weekly-summary skill once you have more runs within a week. It's also great to have a PROJECT.md file summarizing the current progress of the project; this helps a lot when communicating your research updates and progress to others.
- Refactor one of your Jupyter notebooks
- Build your own skills, e.g. creating Slurm job files from your template
- Run the weekly summary skill
- Refactor the code to trim dead functions






