Issue: Given the prompt below, the agent cannot create a workflow that reads the dataset, and the run eventually fails with `litellm.RateLimitError: AnthropicException`.
# Dataset
1. TexeraChatbot_testdata_DDX41.txt.gz
Path: /lijin.bcm@gmail.com/texerachatbot-testdata-ddx41/v4/TexeraChatbot_testdata_DDX41.txt.gz
This TexeraChatbot_testdata_DDX41.txt.gz file contains a raw count matrix with 15,307 single cells (in columns) and 33,696 features (gene symbols, in rows), i.e., a gene-by-cell layout. The first row contains the cell barcodes, and the first column contains the gene symbols.
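Since the user expects the files to be read with `pandas.read_table()`, here is a minimal sketch of loading this matrix. It assumes the file is tab-separated (the description does not state the delimiter), that the header row has a leading field for the gene-symbol column, and that `path` points to a local copy of the file; `load_count_matrix` is a hypothetical helper name.

```python
import pandas as pd


def load_count_matrix(path: str) -> pd.DataFrame:
    """Load the gzipped gene-by-cell count matrix as a cells x genes DataFrame.

    Assumptions (not stated in the dataset description): tab-separated values,
    a header row with one field per column including the gene-symbol column,
    and `path` is a local file path.
    """
    # compression="gzip" is explicit; pandas would also infer it from ".gz".
    # index_col=0 makes the first column (gene symbols) the row index, and the
    # header row (cell barcodes) becomes the column labels.
    genes_by_cells = pd.read_table(path, index_col=0, compression="gzip")
    # Transpose so that rows are cells and columns are genes.
    return genes_by_cells.T
```

Note that 33,696 × 15,307 = 515,784,672 values; held densely as 64-bit integers that is roughly 4 GB of memory, so a smaller dtype or a sparse representation may be needed in practice.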
2. TexeraChatbot_testdata_DDX41_obs.txt.gz
Path: /lijin.bcm@gmail.com/texerachatbot-testdata-ddx41/v4/TexeraChatbot_testdata_DDX41_obs.txt.gz
This TexeraChatbot_testdata_DDX41_obs.txt.gz file includes cell-level metadata for cell barcodes. The column “barcode” is the unique identifier for each cell. Other columns are described below:
- nCount_RNA: total UMI counts per cell
- nFeature_RNA: total number of detected features per cell
- percent.mt: percentage of mitochondrial reads per cell
- pANN: proportion of artificial nearest neighbors calculated by DoubletFinder
- nuclear_fraction: nuclear fraction score, capturing the proportion of reads derived from intronic regions; calculated using the DropletQC R package
- sampleid: 2 unique sample IDs, namely DDX41 for the DDX41 cKO mouse and WT for the wild-type mouse. The genotype of the conditional knockout mouse is Ddx41 fl/fl; ChxCre, and the genotype of the wild-type mouse is Ddx41 fl/fl.
- majorclass: 12 annotated major cell classes, including AC, BC, Cone, HC, MG, Microglia, RGC, Rod, Endothelial, Pericyte, RPE, and Astrocyte
- celltype: high-resolution cell type annotation
In summary, the dataset comprises 15,307 single cells derived from 2 unique sample IDs, annotated into 12 major cell classes.
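The metadata file can be read the same way and checked against the summary above. This is a sketch under the same assumptions (tab-separated, local path); `load_obs`, `summarize_obs`, and `barcodes_match` are hypothetical helper names.

```python
import pandas as pd


def load_obs(path: str) -> pd.DataFrame:
    """Load cell-level metadata indexed by the "barcode" column.

    Assumption: tab-separated file whose header row includes "barcode".
    """
    obs = pd.read_table(path, compression="gzip")
    return obs.set_index("barcode")


def summarize_obs(obs: pd.DataFrame) -> dict:
    """Count cells, distinct sample IDs, and distinct major classes."""
    return {
        "n_cells": obs.shape[0],
        "n_samples": obs["sampleid"].nunique(),
        "n_major_classes": obs["majorclass"].nunique(),
    }


def barcodes_match(counts_cells_by_genes: pd.DataFrame, obs: pd.DataFrame) -> bool:
    """True when the matrix rows and the metadata rows list the same barcodes in the same order."""
    return counts_cells_by_genes.index.equals(obs.index)
```

According to the description, the real file should yield 15,307 cells, 2 sample IDs, and 12 major classes; if `barcodes_match` returns False, the metadata can be reordered with `obs.loc[counts.index]` before joining.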
3. TexeraChatbot_testdata_DDX41_var.txt.gz
Path: /lijin.bcm@gmail.com/texerachatbot-testdata-ddx41/v4/TexeraChatbot_testdata_DDX41_var.txt.gz
This TexeraChatbot_testdata_DDX41_var.txt.gz file contains the gene features for the single-cell dataset. The “symbol” column holds the gene symbols for the 33,696 features, covering both protein-coding and non-coding genes. Features are identified by gene symbol, and the reference genome is the mouse assembly GRCm39.
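A sketch of reading the feature table and confirming that its symbols line up with the genes of the count matrix, again assuming a tab-separated local file with a "symbol" header; `load_var` and `genes_match` are hypothetical helper names.

```python
import pandas as pd


def load_var(path: str) -> pd.DataFrame:
    """Load the gene feature table.

    Assumption: tab-separated file whose header row includes "symbol".
    """
    return pd.read_table(path, compression="gzip")


def genes_match(counts_cells_by_genes: pd.DataFrame, var: pd.DataFrame) -> bool:
    """True when the matrix columns and the var symbols list the same genes in the same order."""
    return list(counts_cells_by_genes.columns) == var["symbol"].tolist()
```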
Dataset (too large to upload to GitHub):
https://texera.eye.som.uci.edu/dashboard/hub/dataset/result/detail/6
Model: Claude-Haiku-4-5
Details: The user expects the agent to read each file with the `pandas.read_table()` function, but the agent keeps reporting that the file paths given in the prompt are not valid file system paths, and the run eventually ends with the `litellm.RateLimitError` reported at the top.
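The agent's complaint is consistent with how pandas resolves string paths: `pandas.read_table()` opens a local file (or a URL), so a Texera dataset path that is not present on the file system of the machine running the code fails before any parsing happens. A minimal reproduction, assuming the dataset path from the prompt does not exist locally; `try_read_table` is a hypothetical helper.

```python
import pandas as pd

# Path copied from the prompt; it names a file in Texera's dataset storage,
# which is assumed here NOT to be mounted on the local file system.
TEXERA_PATH = (
    "/lijin.bcm@gmail.com/texerachatbot-testdata-ddx41/v4/"
    "TexeraChatbot_testdata_DDX41_obs.txt.gz"
)


def try_read_table(path: str):
    """Return the parsed DataFrame, or the FileNotFoundError if the path is not found locally."""
    try:
        return pd.read_table(path)
    except FileNotFoundError as err:
        return err
```

Calling `try_read_table(TEXERA_PATH)` on such a machine returns a `FileNotFoundError` rather than a DataFrame.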