# Edison Analysis API Tutorial

This notebook walks through an example use case: using `Edison Analysis` to perform data analysis.

The only dependency you need to follow along is `edison-client`, which you can install via pip:

```bash
pip install edison-client
```

We recommend reading the `edison-client` [docs](https://pypi.org/project/edison-client/) before following this tutorial.

To run an `Edison Analysis` job, take the following steps:

1. Upload any artifacts to the data storage service
2. Start an `Edison Analysis` run using the Edison client, passing the data storage entry IDs
   along with any other details in the task config
3. Use the output of the task to obtain any data generated by the task

In [None]:
import time

from edison_client import EdisonClient
from edison_client.models import RuntimeConfig, TaskRequest
from edison_client.models.app import JobNames

In [None]:
# Instantiate the Edison client with your API key created via the platform
EDISON_API_KEY = ""  # Add your API key here
client = EdisonClient(api_key=EDISON_API_KEY)

## File management with Edison Analysis

`Edison Analysis` is designed to run data analysis on files provided by the user or caller. To provide `Edison Analysis` with this data,
you'll need to upload it to the Edison data storage service. This service is your one-stop shop for sharing, storing, and
updating data to be used in the Edison ecosystem.
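
Before uploading, it can help to confirm that the path exists and to decide whether it should be uploaded as a collection. The sketch below is a hypothetical helper (`build_upload_kwargs` is not part of `edison-client`) that only assembles the keyword arguments used by the upload cells in this notebook (`name`, `file_path`, `description`, `as_collection`). It assumes directories are always uploaded with `as_collection=True`, as in the directory example.

```python
from pathlib import Path


def build_upload_kwargs(path: str, name: str, description: str) -> dict:
    """Assemble keyword arguments for an upload call (hypothetical helper)."""
    target = Path(path)
    if not target.exists():
        raise FileNotFoundError(f"Nothing to upload at {target}")

    kwargs = {"name": name, "file_path": str(target), "description": description}
    # Assumption: directories are uploaded as a collection, single files are not.
    if target.is_dir():
        kwargs["as_collection"] = True
    return kwargs
```

The result could then be unpacked into the upload call, for example `await client.astore_file_content(**build_upload_kwargs(...))`.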

In [None]:
# Uploading a single file to the data storage service
single_file_upload_response = await client.astore_file_content(
    name="Demo file entry for a single file",
    file_path="./datasets/brain_size_data.csv",  # ADD DATASET PATH HERE
    description="This is a test file that will be analysed by Edison Analysis",
)

In [None]:
# Uploading a directory to the data storage service
directory_upload_response = await client.astore_file_content(
    name="Demo file entry for a whole directory",
    file_path="./datasets",  # ADD DATASET FOLDER PATH HERE
    description="This is a directory that will be analysed by Edison Analysis",
    as_collection=True,
)

## Running Your Job

When running an `Edison Analysis` job, there are some considerations for how you configure the agent. The first things
to note are the core configuration settings: `language`, `max_steps`, and `query`. Beyond these core settings, several
other options are available. The key ones are listed below:

### Additional tools available:
- `query_ensembl`: query the Ensembl database
- `get_convert_gene`: convert gene IDs from one type to another (for example Ensembl, Entrez, RefSeq)
- `search_web`: expose exa.ai (/search) web search as a tool
- `crawl_web`: expose exa.ai (/contents) web crawl as a tool
- `research_web`: expose exa.ai (/research) web research as a tool
- `query_literature`: allow `Edison Analysis` to do calls to `Edison Literature` for literature search

You can tell the agent to use a tool in either the user query or the system prompt. For example: "Use the query_literature tool to compare your findings against published literature."
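
As a sketch of how a tool selection might be assembled, the snippet below validates requested names against the list above and builds a value for the `"additional_tools"` key. It assumes the key accepts a list of tool-name strings (this notebook only shows `None`), and `select_additional_tools` is a hypothetical helper, not part of `edison-client`.

```python
from typing import List, Optional

# Tool names taken from the list above.
AVAILABLE_TOOLS = frozenset({
    "query_ensembl",
    "get_convert_gene",
    "search_web",
    "crawl_web",
    "research_web",
    "query_literature",
})


def select_additional_tools(*names: str) -> Optional[List[str]]:
    """Return the requested tool names, or None when no tools are requested."""
    unknown = sorted(set(names) - AVAILABLE_TOOLS)
    if unknown:
        raise ValueError(f"Unknown tool name(s): {', '.join(unknown)}")
    # Drop duplicates while preserving the order the caller gave.
    selected = list(dict.fromkeys(names))
    return selected or None


tools_config = {
    "language": "PYTHON",
    # Assumption: the key accepts a list of tool-name strings.
    "additional_tools": select_additional_tools("query_literature", "search_web"),
}
```

Validating the names locally catches typos before a job is submitted.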

### Modifying system prompt
There are two options to modify the system prompt:
1. Replace the existing system prompt completely using `prompting_config["system_prompt"]`
2. Append additional guidelines to the existing system prompt using `prompting_config["system_prompt_additional_guidelines"]`

Build the `prompting_config` dictionary, then assign it to the `"prompting_config"` key within `environment_config`.
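
A minimal sketch of that assembly, using only the two keys named above. `build_prompting_config` is a hypothetical helper; it drops empty values, mirroring the filtering in the task-creation cell of this notebook, presumably so that an unset `system_prompt` does not replace the default prompt.

```python
def build_prompting_config(
    system_prompt: str = "", additional_guidelines: str = ""
) -> dict:
    """Build a prompting_config dict, omitting any option left empty."""
    candidates = {
        "system_prompt": system_prompt,
        "system_prompt_additional_guidelines": additional_guidelines,
    }
    return {key: value for key, value in candidates.items() if value}


environment_config = {
    "language": "PYTHON",
    "prompting_config": build_prompting_config(
        additional_guidelines="Make all figures in dark mode."
    ),
}
```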

In [None]:
# Define your task
USER_QUERY = "Teach me something new about crows."  # The actual query you want Edison Analysis to run
SYSTEM_PROMPT = ""  # By setting this, you will replace the system prompt entirely.
SYSTEM_PROMPT_ADDITIONAL_GUIDELINES = (
    "Make all figures in dark mode."  # This will be appended to the system prompt
)
_SYSTEM_PROMPT_CONFIG = {
    "system_prompt": SYSTEM_PROMPT,
    "system_prompt_additional_guidelines": SYSTEM_PROMPT_ADDITIONAL_GUIDELINES,
}
LANGUAGE = "PYTHON"  # Choose between "R" and "PYTHON"
MAX_STEPS = 30  # You can change this to impose a limit on the number of steps the agent can take

In [None]:
# Create a task
task_data = TaskRequest(
    name=JobNames.ANALYSIS,
    query=USER_QUERY,
    runtime_config=RuntimeConfig(
        max_steps=MAX_STEPS,
        environment_config={
            "language": LANGUAGE,
            "prompting_config": {
                k: v for k, v in _SYSTEM_PROMPT_CONFIG.items() if v
            },  # See above for documentation
            "data_storage_uris": [
                f"data_entry:{directory_upload_response.data_storage.id}"
            ],
            "additional_tools": None,  # See above for options
        },
    ),
)
trajectory_id = client.create_task(task_data)
print(
    f"Task running on platform, you can view progress live at: https://platform.edisonscientific.com/trajectories/{trajectory_id}"
)

In [None]:
# Jobs take on average 3-10 minutes to complete
# We also have inbuilt support for polling, asynchronous tasks and other utilities documented here:
# https://edisonscientific.gitbook.io/edison-cookbook/edison-client
# Fetch the status first, then sleep only while the task is still pending
status = client.get_task(trajectory_id).status
while status in {"in progress", "queued"}:
    time.sleep(15)
    status = client.get_task(trajectory_id).status

if status == "failed":
    raise RuntimeError("Task failed")

job_result = client.get_task(trajectory_id, verbose=True)
answer = job_result.environment_frame["state"]["state"]["answer"]
print(f"The agent's answer to your research question is: \n{answer}")
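
The loop above polls indefinitely. A variant with a timeout might look like the sketch below. `wait_for_status` is a hypothetical helper that takes any zero-argument callable returning a status string, so it assumes nothing about the client beyond the `"queued"` and `"in progress"` status values used above. The default interval and timeout are arbitrary choices.

```python
import time
from typing import Callable

PENDING_STATUSES = {"queued", "in progress"}


def wait_for_status(
    get_status: Callable[[], str],
    poll_interval: float = 15.0,
    timeout: float = 1800.0,
) -> str:
    """Poll get_status until it leaves the pending states or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in PENDING_STATUSES:
            return status
        # Stop if another sleep would take us past the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(
                f"Task did not finish within {timeout} seconds (last status: {status!r})"
            )
        time.sleep(poll_interval)
```

With the client from this notebook, it could be called as `wait_for_status(lambda: client.get_task(trajectory_id).status)`.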

## Download Task Output

While the task is executing, it creates some artifacts: first, the notebook
where the analysis code is written, and then any other artifacts created during the task.

Once the task has completed, you may want to check the contents of the notebook or look through the generated artifacts.
To obtain these artifacts, inspect the output of the agent's final `environment_frame`.
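
The lookup can be sketched defensively as follows. `extract_entry_ids` is a hypothetical helper; it assumes `output_data` is a list of dictionaries that each carry an `entry_id` key, as the download loop in this notebook suggests. The sample frame is invented purely for illustration.

```python
def extract_entry_ids(environment_frame: dict) -> list:
    """Collect the storage entry ids listed under state -> info -> output_data."""
    info = environment_frame.get("state", {}).get("info", {})
    output_data = info.get("output_data") or []
    return [item["entry_id"] for item in output_data if "entry_id" in item]


# Invented sample frame, for illustration only.
sample_frame = {
    "state": {
        "info": {
            "output_data": [
                {"entry_id": "entry-1"},
                {"entry_id": "entry-2"},
            ]
        }
    }
}
print(extract_entry_ids(sample_frame))
```

Returning an empty list when keys are missing avoids a `KeyError` for tasks that produced no artifacts.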

In [None]:
output_data = job_result.environment_frame["state"]["info"]["output_data"]
print(output_data)

In [None]:
for output_file in output_data:
    download_response = await client.afetch_data_from_storage(
        data_storage_id=output_file["entry_id"]
    )

    # There are two possible outcomes here. If the file is above ~10 MB, the client
    # downloads it to your local filesystem. Otherwise, it returns a
    # RawFetchResponse object which contains the raw content.
    print(download_response)