fix: Pip Installation Issue in Terminal #1707

Closed
wants to merge 18 commits into from

jiangleo
Contributor

@jiangleo jiangleo commented May 11, 2024

This PR addresses an issue where the CodeAct Agent could not install any pip library through the terminal, though installations were possible via Jupyter.

Changes Made:

  1. Dockerfile Update:

    • Created and activated a virtual environment to ensure consistent pip installation across terminal and Jupyter.
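      # Create a virtual environment and put its bin directory first on PATH.
      # PATH is set in a separate ENV instruction because variable substitution
      # within one ENV instruction uses the values from before that instruction.
      ENV VIRTUAL_ENV=/app/.venv
      ENV PATH="$VIRTUAL_ENV/bin:$PATH"
      RUN python3 -m venv $VIRTUAL_ENV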
  2. README Update:

    • Corrected the Docker build path to accurately reflect the location of the Dockerfile.
      docker build -f containers/sandbox/Dockerfile -t opendevin/sandbox:v0.1 .

This should resolve the installation issues and improve the overall setup consistency for all users. For reference, see the related issue discussed in the community: Issue #1627.
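
One way to check that the terminal and the Jupyter kernel share a single environment is to compare the directory of the `pip` found on PATH with the directory of the running interpreter. The sketch below is illustrative only: `in_virtualenv` and `same_bin_dir` are hypothetical helper names, not part of OpenDevin, and the paths in the usage note are assumed from the Dockerfile snippet above.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def same_bin_dir(pip_path: Optional[str], python_path: str) -> bool:
    """Return True when pip and the interpreter live in the same directory.

    Paths are compared without resolving symlinks, because a venv's
    python is usually a symlink to the base interpreter.
    """
    if pip_path is None:
        return False
    return Path(pip_path).parent == Path(python_path).parent


if __name__ == "__main__":
    print("in virtualenv:", in_virtualenv())
    print("pip matches interpreter:", same_bin_dir(shutil.which("pip"), sys.executable))
```

Run once from the terminal and once from a Jupyter cell: if both report a match against the same directory (for example `/app/.venv/bin`, assuming the venv location above), packages installed in one should be importable in the other.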

@codecov-commenter

codecov-commenter commented May 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@bde12f4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1707   +/-   ##
=======================================
  Coverage        ?   63.83%           
=======================================
  Files           ?       96           
  Lines           ?     3877           
  Branches        ?        0           
=======================================
  Hits            ?     2475           
  Misses          ?     1402           
  Partials        ?        0           


@neubig
Contributor

neubig commented May 11, 2024

@xingyaoww could you take a look at this one? It seems reasonable but I'm not 100% sure about whether this is the right approach.

@xingyaoww
Collaborator

> CodeAct Agent could not install any pip library through the terminal

Btw, I wonder if you have any screenshots / ways we could potentially reproduce this?

@neubig neubig assigned jiangleo and unassigned xingyaoww May 12, 2024
@jiangleo
Contributor Author

jiangleo commented May 13, 2024

> Btw, I wonder if you have any screenshots / ways we could potentially reproduce this?
@xingyaoww

I was trying to replicate the result of your example conversation when I ran into this issue, and I noticed that others in the community have experienced the same problem.

Here is my local log:

I use GPT-4.
The OpenDevin version is 0.5.2.
The mode is CodeActAgent.

19:09:13 - opendevin:INFO: agent_controller.py:256 - Here is a dataset about auto MPG: "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv".
You should download and load the dataset from the URL, take a look at it, check for missing value, do a train-test split, and train a regression model predicting its MPG.
You should use everything except "mpg" and "car name" as the input features.

You can use Jupyter do it.
19:09:22 - opendevin:INFO: agent_controller.py:274 - **IPythonRunCellAction**
THOUGHT:Great! Let's start by downloading the dataset from the provided URL. I will use Python's `requests` library to fetch the CSV file and then load it into a pandas DataFrame to take a look at it. Let's begin by downloading the data.
CODE:
import requests
import pandas as pd

# URL of the dataset
url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Save the content of the request to a CSV file
    with open('auto-mpg.csv', 'wb') as file:
        file.write(response.content)
    # Load the dataset into a pandas DataFrame
    df = pd.read_csv('auto-mpg.csv')
    # Display the first few rows of the DataFrame
    display(df.head())
else:
    print(f"Failed to download the dataset. Status code: {response.status_code}")
19:09:22 - opendevin:INFO: agent_controller.py:294 - IPythonRunCellObservation(content='---------------------------------------------------------------------------\r\nModuleNotFoundError                       Traceback (most recent call last)\r\nCell In[1], line 2\r\n      1 import requests\r\n----> 2 import pandas as pd\r\n      4 # URL of the dataset\r\n      5 url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\r\n\r\nModuleNotFoundError: No module named \'pandas\'', code='import requests\nimport pandas as pd\n\n# URL of the dataset\nurl = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\n\n# Send a GET request to the URL\nresponse = requests.get(url)\n\n# Check if the request was successful\nif response.status_code == 200:\n    # Save the content of the request to a CSV file\n    with open(\'auto-mpg.csv\', \'wb\') as file:\n        file.write(response.content)\n    # Load the dataset into a pandas DataFrame\n    df = pd.read_csv(\'auto-mpg.csv\')\n    # Display the first few rows of the DataFrame\n    display(df.head())\nelse:\n    print(f"Failed to download the dataset. Status code: {response.status_code}")', observation='run_ipython')
19:09:22 - opendevin:INFO: agent_controller.py:254 - STEP 1
19:09:29 - opendevin:INFO: agent_controller.py:274 - **CmdRunAction**
THOUGHT:It seems that the `pandas` library is not installed in the current environment. Let me install it for you.
COMMAND:
pip install pandas
19:09:37 - opendevin:INFO: agent_controller.py:294 - CmdOutputObservation(content='Defaulting to user installation because normal site-packages is not writeable\r\nCollecting pandas\r\n  Using cached pandas-2.2.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (19 kB)\r\nCollecting numpy>=1.22.4 (from pandas)\r\n  Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (62 kB)\r\nRequirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.9.0.post0)\r\nCollecting pytz>=2020.1 (from pandas)\r\n  Using cached pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)\r\nCollecting tzdata>=2022.7 (from pandas)\r\n  Using cached tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)\r\nRequirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\r\nUsing cached pandas-2.2.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (15.6 MB)\r\nUsing cached numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.2 MB)\r\nUsing cached pytz-2024.1-py2.py3-none-any.whl (505 kB)\r\nUsing cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)\r\nInstalling collected packages: pytz, tzdata, numpy, pandas\r\nSuccessfully installed numpy-1.26.4 pandas-2.2.2 pytz-2024.1 tzdata-2024.1', command_id=-1, command='pip install pandas', exit_code=0, observation='run')
19:09:37 - opendevin:INFO: agent_controller.py:254 - STEP 2
19:09:42 - opendevin:INFO: agent_controller.py:274 - **IPythonRunCellAction**
THOUGHT:Pandas has been successfully installed. Now let's try to download and load the dataset again.
CODE:
import pandas as pd

# URL of the dataset
url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"

# Load the dataset into a pandas DataFrame
df = pd.read_csv(url)

# Display the first few rows of the DataFrame
df.head()
19:09:42 - opendevin:INFO: agent_controller.py:294 - IPythonRunCellObservation(content='---------------------------------------------------------------------------\r\nModuleNotFoundError                       Traceback (most recent call last)\r\nCell In[1], line 1\r\n----> 1 import pandas as pd\r\n      3 # URL of the dataset\r\n      4 url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\r\n\r\nModuleNotFoundError: No module named \'pandas\'', code='import pandas as pd\n\n# URL of the dataset\nurl = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\n\n# Load the dataset into a pandas DataFrame\ndf = pd.read_csv(url)\n\n# Display the first few rows of the DataFrame\ndf.head()', observation='run_ipython')
19:09:42 - opendevin:INFO: agent_controller.py:254 - STEP 3
19:09:45 - opendevin:INFO: agent_controller.py:274 - **IPythonRunCellAction**
THOUGHT:It appears that the installation of `pandas` did not persist in the Jupyter Notebook environment. Let me try installing it again within the Jupyter environment.
CODE:
!pip install pandas
19:09:46 - opendevin:INFO: agent_controller.py:294 - IPythonRunCellObservation(content='Defaulting to user installation because normal site-packages is not writeable\r\r\nRequirement already satisfied: pandas in /home/opendevin/.local/lib/python3.10/site-packages (2.2.2)\r\r\nRequirement already satisfied: numpy>=1.22.4 in /home/opendevin/.local/lib/python3.10/site-packages (from pandas) (1.26.4)\r\r\nRequirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.9.0.post0)\r\r\nRequirement already satisfied: pytz>=2020.1 in /home/opendevin/.local/lib/python3.10/site-packages (from pandas) (2024.1)\r\r\nRequirement already satisfied: tzdata>=2022.7 in /home/opendevin/.local/lib/python3.10/site-packages (from pandas) (2024.1)\r\r\nRequirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)', code='!pip install pandas', observation='run_ipython')
19:09:46 - opendevin:INFO: agent_controller.py:254 - STEP 4
19:09:51 - opendevin:INFO: agent_controller.py:274 - **IPythonRunCellAction**
THOUGHT:It seems that `pandas` is already installed in the Jupyter environment. Let's try importing it again and loading the dataset.
CODE:
import pandas as pd

# URL of the dataset
url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"

# Load the dataset into a pandas DataFrame
df = pd.read_csv(url)

# Display the first few rows of the DataFrame
df.head()
19:09:51 - opendevin:INFO: agent_controller.py:294 - IPythonRunCellObservation(content='---------------------------------------------------------------------------\r\nModuleNotFoundError                       Traceback (most recent call last)\r\nCell In[1], line 1\r\n----> 1 import pandas as pd\r\n      3 # URL of the dataset\r\n      4 url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\r\n\r\nModuleNotFoundError: No module named \'pandas\'', code='import pandas as pd\n\n# URL of the dataset\nurl = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\n\n# Load the dataset into a pandas DataFrame\ndf = pd.read_csv(url)\n\n# Display the first few rows of the DataFrame\ndf.head()', observation='run_ipython')
19:09:51 - opendevin:INFO: agent_controller.py:254 - STEP 5
19:09:58 - opendevin:INFO: agent_controller.py:274 - **IPythonRunCellAction**
THOUGHT:It seems we're still encountering an issue with importing `pandas`. This could be due to the environment path not including the user site-packages directory where `pandas` is installed. To resolve this, I will try to modify the Python path to include the user site-packages directory and then attempt to import `pandas` again.
CODE:
import sys
sys.path.append('/home/opendevin/.local/lib/python3.10/site-packages')

# Now, let's try importing pandas again
import pandas as pd

# URL of the dataset
url = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"

# Load the dataset into a pandas DataFrame
df = pd.read_csv(url)

# Display the first few rows of the DataFrame
df.head()
19:10:00 - opendevin:INFO: agent_controller.py:294 - IPythonRunCellObservation(content='mpg  cylinders  displacement horsepower  weight  acceleration  model year  \\\r\n0  18.0          8         307.0        130    3504          12.0          70   \r\n1  15.0          8         350.0        165    3693          11.5          70   \r\n2  18.0          8         318.0        150    3436          11.0          70   \r\n3  16.0          8         304.0        150    3433          12.0          70   \r\n4  17.0          8         302.0        140    3449          10.5          70   \r\n\r\n   origin                   car name  \r\n0       1  chevrolet chevelle malibu  \r\n1       1          buick skylark 320  \r\n2       1         plymouth satellite  \r\n3       1              amc rebel sst  \r\n4       1                ford torino', code='import sys\nsys.path.append(\'/home/opendevin/.local/lib/python3.10/site-packages\')\n\n# Now, let\'s try importing pandas again\nimport pandas as pd\n\n# URL of the dataset\nurl = "https://huggingface.co/datasets/scikit-learn/auto-mpg/raw/main/auto-mpg.csv"\n\n# Load the dataset into a pandas DataFrame\ndf = pd.read_csv(url)\n\n# Display the first few rows of the DataFrame\ndf.head()', observation='run_ipython')
19:10:00 - opendevin:INFO: agent_controller.py:254 - STEP 6
19:10:05 - opendevin:INFO: agent_controller.py:274 - **IPythonRunCellAction**
THOUGHT:The dataset has been successfully loaded into a pandas DataFrame. The next steps are to check for missing values, perform a train-test split, and then train a regression model using the features except for "mpg" and "car name".

Let's start by checking for missing values in the dataset.
CODE:
# Check for missing values in the dataset
missing_values = df.isnull().sum()
missing_values
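
One explanation consistent with the log above, though not confirmed anywhere in this thread, is that the user site-packages directory did not exist when the Jupyter kernel started. Python's `site` module adds that directory to `sys.path` at startup only if it already exists, so a `pip install` that creates it later is invisible to the running kernel until the path is added manually or the kernel is restarted. The sketch below is a small diagnostic under that assumption; `user_site_status` is a hypothetical helper name.

```python
import os
import site
import sys


def user_site_status() -> dict:
    """Report where user-level installs go and whether this process can see them."""
    user_site = site.getusersitepackages()
    return {
        "executable": sys.executable,                  # interpreter running this code
        "user_site": user_site,                        # target of `pip install --user`
        "user_site_exists": os.path.isdir(user_site),  # created on first user install
        "user_site_on_path": user_site in sys.path,    # importable from this process?
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(f"{key}: {value}")
```

If `user_site_exists` is true while `user_site_on_path` is false, the process is in the state the log shows at STEP 4: the package is installed but cannot be imported.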

@jiangleo jiangleo closed this May 13, 2024
@jiangleo jiangleo reopened this May 13, 2024
@jiangleo
Contributor Author

Sorry, I accidentally clicked the wrong button. I have now reopened it.

@xingyaoww
Collaborator

@jiangleo Oh yes!! I've definitely encountered this one.. But this seems a bit random to me - sometimes it works (like my screenshot), and sometimes it doesn't.
We temporarily patched it in #1609 by instructing the model to restart the kernel if it doesn't work. But ideally, this should just work, as it did in my original demo.

If you could create a test case that reproduces this, similar to the one at https://github.com/OpenDevin/OpenDevin/blob/main/tests/integration/test_agent.py#L73-L95, that would be awesome! We would then know whether the issue is fixed by observing the test result.

@jiangleo
Contributor Author

OpenDevin is an amazing project, and I'm happy to contribute! I will try to write an integration test for it.

@jiangleo
Contributor Author

jiangleo commented May 14, 2024

The latest PR has added integration test cases.

This PR includes changes in three areas:

  1. Fixes the issue and adds an integration test case for it.
  2. Adds the capability to "Run Specific Agent Tests or Test Cases" to facilitate integration testing.
  3. Fixes the incorrect file path in the README.

Additionally, even though the issue is fixed, restarting the kernel is still necessary.

And, I have a few questions that need confirmation:

  1. I have submitted a PR in my jiangleo/OpenDevin fork, which automatically runs ghcr_build_and_push. Will this affect OpenDevin/OpenDevin?
  2. Occasionally, the test_ipython_installation integration test fails when run locally. I haven't found a consistent way to reproduce the issue or identify the cause of the failures. I'm unsure if this will affect the integration tests. Here are the files for both successful and failed tests.

@jiangleo
Contributor Author

I have reproduced the error in the workflow locally, and I am preparing to try and resolve it.

@xingyaoww
Collaborator

> I have submitted a PR (jiangleo#2) in my jiangleo/OpenDevin fork, which automatically runs ghcr_build_and_push. Will this affect OpenDevin/OpenDevin?

No - I don't think so -- it will just run on your own fork :)

> Occasionally, the test_ipython_installation integration test fails when run locally. I haven't found a consistent way to reproduce the issue or identify the cause of the failures. I'm unsure if this will affect the integration tests. Here are the files for both successful and failed tests.

I don't have a clue yet; maybe it's due to some sandbox issues? How do you run the local integration tests? By running TEST_ONLY=true ./tests/integration/regenerate.sh?

jianghongwei and others added 3 commits May 16, 2024 14:26
fix: it is necessary to remove the installed packages and files after running the integration tests to ensure that the tests have no side effects.
@jiangleo
Contributor Author

> By running TEST_ONLY=true ./tests/integration/regenerate.sh?

Yes. And to make it easier to write/run the new integration test, I set the environment variables TARGET_AGENT and TARGET_TEST.

TARGET_AGENT="CodeActAgent" TARGET_TEST="test_ipython_installation" ./tests/integration/regenerate.sh
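
To illustrate the filtering idea, here is a minimal Python sketch of how a runner could decide whether to execute a given agent/test pair based on these two variables. It is not the logic in regenerate.sh (which is a shell script); `should_run` is a hypothetical helper, and treating an unset variable as "match everything" is an assumption.

```python
import os
from typing import Mapping, Optional


def should_run(agent: str, test_name: str,
               env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if this agent/test pair passes the TARGET_* filters.

    An unset or empty variable places no restriction on that dimension.
    """
    env = os.environ if env is None else env
    target_agent = env.get("TARGET_AGENT")
    target_test = env.get("TARGET_TEST")
    if target_agent and agent != target_agent:
        return False
    if target_test and test_name != target_test:
        return False
    return True
```

With both variables set as in the command above, only the `test_ipython_installation` test for `CodeActAgent` would pass the filter; with neither set, every pair would run.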

The issues with failed integration tests, both locally and in GitHub Actions, are due to two main causes: the statefulness of the sandbox environment and the prompt containing an excessively long installation log.

To resolve these issues, I propose two solutions:

  1. Bypass the pip cache so that the tests behave consistently on every run.
  2. Suppress the installation log, which is long enough that it may cause the LLM completion to fail.
pip install --no-cache-dir pandas > /dev/null 2>&1
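
The same idea can be sketched in Python for cases where the install is triggered from code rather than from the shell. This is an illustration, not code from the PR: `quiet_install_cmd` and `quiet_install` are hypothetical names, and using `sys.executable -m pip` (so the package lands in the environment of the running interpreter) is an assumption beyond what the command above does.

```python
import subprocess
import sys
from typing import List


def quiet_install_cmd(package: str) -> List[str]:
    """Build a pip command that skips the cache and targets this interpreter."""
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", package]


def quiet_install(package: str) -> int:
    """Run the install with all output discarded and return pip's exit code."""
    result = subprocess.run(
        quiet_install_cmd(package),
        stdout=subprocess.DEVNULL,  # equivalent of `> /dev/null`
        stderr=subprocess.DEVNULL,  # equivalent of `2>&1` into /dev/null
        check=False,                # let the caller inspect the exit code
    )
    return result.returncode
```

Discarding the output keeps the observation short, at the cost of hiding error details; checking the returned exit code is then the only signal that the install failed.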

I have already resolved these issues in my forked OpenDevin.

@jiangleo jiangleo requested a review from xingyaoww May 17, 2024 11:15
@neubig neubig assigned xingyaoww and unassigned jiangleo May 17, 2024
Collaborator

@xingyaoww xingyaoww left a comment

LGTM!

@jiangleo jiangleo closed this May 20, 2024