This is a repository for setting up Large Language Model (LLM) agents to analyze data on DANDI.
Aim 1 is to use LLMs and open data on DANDI to reproduce key foundational findings in systems neuroscience data analysis.
We have identified 10 key findings in systems neuroscience:
- Orientation selectivity
- Linear receptive field modeling
- Frequency tuning
- Spectrotemporal receptive fields
- Tuning for direction and speed
- Tuning during reach planning
- Place fields and grid cells
- Sequential activity and memory replay
- Theta phase entrainment
- Theta phase precession
These studies are intended to be foundational in the sense that many current studies rely on and extend these analyses. For example, a more contemporary study might look at how frequency tuning changes in different conditions (experiences, stimulation, maturity, attentional state, etc.). An ability to reproduce these foundational results is therefore necessary to reproduce these more contemporary studies.
More details about these topics can be found here.
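To make concrete what reproducing one of these findings can involve, here is a minimal sketch of one common way to quantify orientation selectivity: an orientation selectivity index computed as one minus the circular variance of the tuning curve. The function name and the synthetic response values are assumptions chosen for illustration. They are not taken from this repository or from any DANDI dataset, and other index definitions are also in use.

```python
import cmath
import math


def orientation_selectivity_index(orientations_deg, responses):
    """Return 1 - circular variance of a tuning curve (0 = untuned, 1 = sharply tuned).

    Orientation is periodic over 180 degrees, so each angle is doubled
    before it is placed on the unit circle.
    """
    total = sum(responses)
    if total <= 0:
        raise ValueError("responses must have a positive sum")
    vector_sum = sum(
        r * cmath.exp(2j * math.radians(theta))
        for theta, r in zip(orientations_deg, responses)
    )
    return abs(vector_sum) / total


# Synthetic tuning curves (made-up numbers, not DANDI data).
orientations = [0, 45, 90, 135]
untuned = [5.0, 5.0, 5.0, 5.0]         # same response at every orientation
sharply_tuned = [0.0, 0.0, 10.0, 0.0]  # responds at 90 degrees only

osi_untuned = orientation_selectivity_index(orientations, untuned)
osi_tuned = orientation_selectivity_index(orientations, sharply_tuned)
print(osi_untuned)  # close to 0
print(osi_tuned)    # close to 1
```

In a real analysis the responses would be trial-averaged firing rates or fluorescence changes per stimulus orientation, read from an NWB file.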
Our initial approach is to use Cline, an LLM coding agent that is implemented as a Visual Studio Code extension. Cline can receive instructions and read, write, and execute code to accomplish the task. Cline can also use tools that are made available through the Model Context Protocol (MCP), which allows us to expand the capabilities of the agent. Cline supports a wide range of LLMs; here, we use Anthropic's Claude 3.7 Sonnet with thinking, as it performed best in our experiments.
For Cline to accomplish this task, its capabilities need to be augmented with MCP tools. We provide the following tools, defined in neurosift:
- dandi_search: Search for datasets in the DANDI Archive using the standard text search feature
- dandi_semantic_search: Semantic search for DANDI datasets using natural language
- dandiset_info: Get detailed information about a specific DANDI dataset including neurodata objects and metadata
- dandiset_assets: List assets/files in a DANDI dataset
- nwb_file_info: Get information about an NWB file including neurodata objects
- dandi_list_neurodata_types: List all unique neurodata types in DANDI archive
- dandi_search_by_neurodata_type: Search for datasets containing specific neurodata types
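For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`, naming the tool and passing its arguments. The sketch below builds such a request for two of the tools listed above. The helper name `build_tool_call`, the argument keys (`query`, `dandiset_id`), and the placeholder ID are assumptions for illustration; the actual input schema of each tool is defined in neurosift, and Cline constructs these messages itself.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument keys and the dandiset ID are hypothetical placeholders;
# check each tool's real input schema in neurosift before relying on them.
search_request = build_tool_call("dandi_search", {"query": "orientation tuning"})
assets_request = build_tool_call(
    "dandiset_assets", {"dandiset_id": "000000"}, request_id=2
)

print(json.dumps(search_request, indent=2))
```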
A big part of getting the LLM to do what you want is giving it detailed instructions. These instructions can be provided in a .clineinstruct file.
- Clone this repository:
git clone https://github.com/dandi/llm-analysis.git
- Download and install Visual Studio Code and open this repository within the application.
- In Visual Studio Code, install the Cline extension.
- Install neurosift MCPs: https://github.com/flatironinstitute/neurosift/blob/main-v2/docs/mcp-neurosift-tools.md
- Use anthropic/claude-3.7-sonnet:thinking with extended thinking enabled and a Budget of 1,024 tokens.
- Open Cline and switch to Plan mode.
- Within the prompt, type:
  a. Demonstrate orientation selectivity for neurons in visual areas of the brain.