In [1]:
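import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it in the notebook.
openai_api_key = os.environ.get('OPENAI_API_KEY', '')
client = OpenAI(api_key=openai_api_key)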

In [13]:
def create_manifest(wrapper_script_fp, LSID, author, docker_image, repo, documentation_url, filepath, output_fp='output/manifest'):
    with open(wrapper_script_fp, 'r') as file:
        wrapper_script = file.read()
    
    completion = client.chat.completions.create(
        model="ft:gpt-4o-mini-2024-07-18:personal::B1TFUey0",
        messages=[
            {"role": "system", "content": "You are a genius GenePattern developer who writes manifest files based on wrapper scripts. You will be fired for any mistakes you make"},
            {"role": "user", "content": f"""
Act as a senior software developer. I am going to describe a GenePattern manifest file, and then I would like you to create one based on information I provide. The first line is a "#" followed by the name of the function I provide. The second line is a # followed by the current date and time. Do not include any blank lines in the output.
The next section should be pasted verbatim:
---------------------------------------------------------
JVMLevel=
LSID=
author=GenePattern Team + ChatGPT
commandLine=
cpuType=any
The next line should be "description=" followed by a brief description of the function I provided.
The next line should be "documentationUrl=" followed by the URL to the site where the function is described.
The next line should be "fileFormat=" followed by a comma-separated list of the file extensions output by the function.
The next line should be "job.cpuCount="
The next line should be "job.docker.image=[DOCKER IMAGE HERE]"
The next section should be pasted verbatim:
job.memory=
job.walltime=
language=any
The next line should start with "categories=" followed by the most applicable term in the following list: alternative splicing,batch correction,clustering,cnv analysis,data format conversion,differential expression,dimension reduction,ecdna,flow cytometry,gene list selection,gsea,image creators,metabolomics,methylation,missing value imputation,mutational significance analysis,pathway analysis,pipeline,prediction,preprocess & utilities,projection,proteomics,rna velocity,rna-seq,rnai,sage,sequence analysis,single-cell,snp analysis,statistical methods,survival analysis,variant annotation,viewer,visualizer
The next line should be "name=" and then the name of the function.
The next line should be "os=any"
After that, identify all the parameters in the wrapper script. 
---------------------------------------------------------
The manifest file should include each parameter of the provided function, in the format I will provide below. Here are some instructions for this format:
1. When you see a # character, replace it with the number of the parameter in the provided function.
2. When you see "default_value=", place the parameter's default value after the "=" if there is one.
3. When you see "description=", add the parameter's description after the "="
4. When you see "name=", add the parameter's name after the "="
5. When you see "optional=", write "on" if the parameter is optional
6. In the parameter name, replace the "#" with the cardinal number of the parameter
7. When you see "flag=", add the parameter's command-line flag after the "=" if there is one. If there is more than one way of specifying a flag, use the one that starts with two hyphens: "--"
8. When you see "type=", add the parameter's type after the "=". The type should be the term in the following comma-separated list that corresponds most closely to the type of parameter: CHOICE,FILE,Floating Point,Integer,TEXT,java.lang.String. Pick java.io.File if you think the input is a file path.
9. When you see "taskType=", add the type of analysis this module performs. For example: batch correction, visualizer, scRNA analysis. You can infer the category based on the name and description.
10. If you think a parameter is a file path, put IN for the p#_MODE
11. Order from p1-pn
12. Enter a new line between parameter info (ex: new line between p1_... and p2_...)

---------------------------------------------------------
Here is the format for each parameter:
p#_MODE=
p#_TYPE=
p#_default_value=
p#_description=
p#_fileFormat=
p#_flag=
p#_name=
p#_numValues=
p#_optional=
p#_prefix=
p#_prefix_when_specified=
p#_type=
p#_value=
taskType= 
---------------------------------------------------------
For the command line, generate an Rscript command line to run the wrapper script. The parameters should be: --flag <value>.

Relevant info:
LSID: {LSID}
author: {author}
docker_image: {docker_image}
repo: {repo}
documentation_url: {documentation_url}
filepath: {filepath}
wrapper_script: {wrapper_script}"""}
    ])

    manifest = completion.choices[0].message.content
    
    # The with-block closes the file on exit, so no explicit close() is needed.
    with open(output_fp, 'w') as file:
        file.write(manifest)
    
    return manifest
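
Because the manifest is generated by a model, it may help to check the text before using it. The following is a minimal sketch, not part of the original notebook: `parse_manifest`, `check_manifest`, and the `REQUIRED_KEYS` list are hypothetical names chosen for illustration, and the set of required keys is an assumption based on the fields the prompt asks for.

```python
import re

# Assumed subset of keys the prompt asks the model to emit (illustrative only).
REQUIRED_KEYS = ["LSID", "author", "commandLine", "name", "job.docker.image", "taskType"]


def parse_manifest(text):
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_manifest(text):
    """Return a list of human-readable problems; an empty list means none found."""
    props = parse_manifest(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in props]

    # Collect parameter numbers from keys such as p1_name or p2_TYPE.
    numbers = set()
    for key in props:
        match = re.match(r"p(\d+)_", key)
        if match:
            numbers.add(int(match.group(1)))
    numbers = sorted(numbers)

    # The prompt asks for parameters ordered p1..pn with no gaps.
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append(f"parameter numbers are not contiguous from 1: {numbers}")
    for n in numbers:
        if not props.get(f"p{n}_name"):
            problems.append(f"p{n} has no name")
    return problems
```

Calling `check_manifest(manifest)` before writing the file would surface a skipped parameter number or a dropped header field without another model call.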

In [14]:
output = create_manifest(
    wrapper_script_fp='input/wrapper.R',
    LSID='urn:lsid:genepattern.org:module.analysis:00465:999999999',
    author='Thorin Tabor;UCSD - Mesirov Lab',
    docker_image='genepattern/spatialge-stgradient:0.2',
    repo='https://github.com/genepattern/spatialGE.STgradient',
    documentation_url='https://genepattern.github.io/spatialGE.STgradient/v1/',
    filepath='/spatialGE/wrapper.R',
)
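
The prompt asks the model to produce an Rscript command line of the form `--flag <value>`. As a rough cross-check, the same string can be assembled deterministically and compared with the `commandLine=` entry in the generated manifest. This is only a sketch: `build_command_line` is a hypothetical helper, the flags and parameter names below are made-up examples rather than the real parameters of `wrapper.R`, and using the parameter name inside the angle brackets is an assumption about how the manifest refers to values.

```python
def build_command_line(script_path, params):
    """Build 'Rscript <script> --flag <name> ...' from (flag, name) pairs.

    params is a list of (flag, parameter_name) tuples in p1..pn order.
    """
    parts = ["Rscript", script_path]
    for flag, name in params:
        # Each parameter contributes its flag followed by a <name> placeholder.
        parts.append(f"{flag} <{name}>")
    return " ".join(parts)


# Hypothetical parameters, for illustration only.
example_params = [("--input_file", "input.file"), ("--gene", "gene")]
expected_command = build_command_line("/spatialGE/wrapper.R", example_params)
print(expected_command)
```

A mismatch between this string and the model's `commandLine=` value does not necessarily mean the manifest is wrong, but it flags the line for a manual look.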