In [1]:
!pip install gpt4all openai -q

**Example for validation ground_true**

```json
           {
                "id": "6",
                "name": "PFischbeck/parameter-fitting-experiments",
                "url": "https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md",
                "n_plans": 2,
                "plan_nodes": [
                    {
                        "type": "Source",
                        "plan_step": [
                            "Step 1: Make sure you have Python, Pip and R installed.",
                            "Step 2: Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs.",
                            "Step 3: Install the R dependencies (used for plots).",
                            "Step 4: Checkout the repository.",
                            "Step 5: Install the python dependencies.",
                            "Step 6: Download the file at https://doi.org/10.5281/zenodo.10629451.",
                            "Step 7: Extract the content",
                            "Step 8: Place it into the folder `input_data/konect`"
                        ],
                        "commands": [
                            "",
                            "",
                            "R -e 'install.packages(c(ggplot2", "reshape2", "plyr", "dplyr", "scales), repos=https://cloud.r-project.org/",
                            "",
                            "pip3 install -r requirements.txt",
                            "",
                            "",
                            ""
                        ]
                    },
                    {
                        "type": "Binary",
                        "plan_step": [
                            "Step 1(Optional):Download the file `output-data.zip` from (https://doi.org/10.5281/zenodo.10629451)",
                            "Step 2: extract its contents into the folder `output_data`."
                        ]
                    }
                ],
                "readme_instructions": "",
                "type": "easy"
            },
``````

In [3]:
import json
import openai
import config
api_key = config.API_KEY

import os
import re
import sys
import time
import urllib.request

## 1. - TYPE OF PLANS PROMPT

#### Baseline Zero Shot
```py
prompt_TEXT = """ Given a INSTALL_TEXT, detect the class of plan for the installation of a software."""
prompt_URL =  """ Given a INSTALL_TEXT, detect the class of plan for the installation of a software."""
```

---
**GEMMA: Gemini-based open model**

ref:
[https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)

In [4]:
# Given a URL (no labels), Natural language response
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key # your key hf
)
MODEL = "google/gemma-7b-it"

In [6]:
def get_code_completion(messages, max_tokens=512, model=MODEL):
    chat_completion = client.chat.completions.create(
        messages=messages,
        model=model,
        max_tokens=max_tokens,
        # stop=[
        #     "<step>"
        # ],
        # frequency_penalty=1,
        # presence_penalty=1,
        # top_p=0.7,
        # n=10,
        # temperature=1,
    )

    return chat_completion

In [10]:
api_key = config.API_KEY
SYSTEM_PROMPT = "You are a smart and intelligent Named Entity Recognition (NER) system. I will provide you some task to identify entities in a text"
USER_PROMPT_1 = "Detect the TYPE of plan for the installation of a software in the README_INSTALLATION."
ASSISTANT_PROMPT_1 = "Sure, I'm ready to help you with your NER task. Please provide me with the necessary information"
GUIDELINES_PROMPT = (
    "Entity Definition:\n"
    "1. README_INSTALLATION: a excerpt of the text found in a readme file of a research software tool"
    "2. TYPE: represents the concept of a installation method type as a plan, which is composed of steps, which must be executed in a given order. A installation method is an instance of the Plan concept. There are four type of plans: binary, package manager, source and container. A research software readme installation instruction could refer to one or multiple methods\n"
    "3. PLAN_STEP: represents a list of planned action(s) as part of a Plan to be executed by an Agent. These are a list of indivisible sequence of actions that must executed without interruption. Step within a Plan could be linked to one specific executable operation, or refer to a group of operations. A Step then could invoke more than one action. Each sentence in a readme is an instance of the Step concept.\n"
    "4. TECHNOLOGY: repesents the concept of a particular operating sytem or package management correspoding to a particular set of PLAN_STEPs in a specific TYPE.\n"
    "5. N_PLANS: count the number of unique plan TYPE"
    "\n"
    "Output Format:\n"
    "{{'TYPE': [name of the TYPE plans present], 'PLAN_STEP': [list of steps], 'TECHNOLOGY': [name of the technology used for that specific TYPE], 'N_PLANS': [number of unique plan TYPE]}}"
)

In [8]:
messages = [
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a README, detect the class of plan for the installation of a software in the text of the. \n Definition:\n 1. README_INSTALLATION: a excerpt of the text found in a readme file of a research software tool \n 2. TYPE: represents the concept of a installation method type as a plan, which is composed of steps, which must be executed in a given order. A installation method is an instance of the Plan concept. There are four type of plans: binary, package manager, source and container. A research software readme installation instruction could refer to one or multiple methods\n 3. PLAN_STEP: represents a list of planned action(s) as part of a Plan to be executed by an Agent. These are a list of indivisible sequence of actions that must executed without interruption. Step within a Plan could be linked to one specific executable operation, or refer to a group of operations. A Step then could invoke more than one action. Each sentence in a readme is an instance of the Step concept.\n 4. TECHNOLOGY: repesents the concept of a particular operating sytem or package management correspoding to a particular set of PLAN_STEPs in a specific TYPE.\n 5. N_PLANS: count the number of unique plan TYPE \n README = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with ```pip3 install -r requirements.txt``` - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \ - Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/) Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself. The Output Format in JSON file:\n {{'TYPE': [name of the TYPE plans present], 'PLAN_STEP': [list of steps], 'TECHNOLOGY': [name of the technology used for that specific TYPE], 'N_PLANS': [number of unique plan TYPE]}} \
         "},
]

chat_completion = get_code_completion(messages)
            
print(chat_completion.choices[0].message.content)

Sure, the class of plan for the installation of the software in the text of the README is:

**Package Manager**

The text describes an installation plan that uses a package manager to install dependencies.


In [34]:
# Given a URL (no labels), Natural language response
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key # your key hf
)

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a URL, detect the class of plan for the installation of a software in the text of the URL. URL = https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md"},        
    ],
    stream=True,
    max_tokens=500
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

**URL:**  raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md

**Class of plan:** Software installation plan

**Reasoning:**

The text of the URL contains the following clues that indicate it is for a software installation plan:

* **.md** file extension, which is commonly used for Markdown files, which are often used to document software installation instructions.
* **"Readme.md"** file name, which is a convention for the main documentation file for a project.
* **"parameter-fitting-experiments"** repository name, which suggests the software is related to data science or machine learning.
* **"Install"** word in the URL, which is a common keyword used in installation instructions.

**Therefore, based on the text of the URL, the class of plan for the installation of software in this case is software installation plan.**<eos>

In [20]:
# Given a URL (with labels), natural language
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a URL, detect the class of plan for the installation of a software in the text of the URL. \
         The label can only be 1 of the 4 class of plans which are binary, package manager, container and source.\
         URL = https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md"},        
    ],
    stream=True,
    max_tokens=500
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

Here is the label for the plan class of the installation of the software in the text of the URL:

URL:  https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md

The text of the URL does not describe any software installation process or plan details, therefore I cannot detect the class of plan for the installation of the software in the text of the URL.<eos>

In [22]:
# Given a INSTALL_TEXT (no labels), natural language
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a INSTALL_TEXT, detect the class of plan for the installation of a software. \
         INSTALL_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with \
         ```pip3 install -r requirements.txt``` \
         - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \
         - Install the R dependencies (used for plots) with \
         ```R -e 'install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)'``` \
         - Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` \
         - Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself."},        
    ],
    stream=True,
    max_tokens=500,
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

## Plan Class Detection

Based on the provided text, the class of the plan for installing software is **shell script**. 

Here's why:

* **INSTALL_TEXT** clearly describes a process of installing software via shell commands.
* The text primarily focuses on installing Python, Pip, R, `pygirgs`, and dependencies.
* The text does not involve any steps related to other types of software installation methods, such as graphical interfaces or package managers.
* Therefore, based on the content and purpose of the text, it is most likely to be a shell script plan.<eos>

In [6]:
# Given a Install_text (with labels), natural language response
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= ""
)

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a INSTALL_TEXT, print the INSTALL_TEXT and detect the class of plan for the installation of a software. \
         The label can only be 1 of the 4 class of plans which are binary, package manager, container and source. \
         INSTALL_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with \
         ```pip3 install -r requirements.txt``` \
         - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \
         - Install the R dependencies (used for plots) with \
         ```R -e 'install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)'``` \
         - Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` \
         - Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself."},        
    ],
    stream=True,
    max_tokens=500,
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

**INSTALL_TEXT:**

# Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with `pip3 install -r requirements.txt` - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs - Install the R dependencies (used for plots) with `R -e 'install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)'` - Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` - Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself.

**Class of plan:**

This text describes a plan that involves installing software dependencies using `pip` and `R` package manager. Therefore, the class of plan is **package manager**.<eos>

---
**Mistral orca-2 Model**

ref: [mistral-7b-openorca.gguf2.Q4_0.gguf](mistral-7b-openorca.gguf2.Q4_0.gguf)

In [7]:
from gpt4all import GPT4All
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
import logging
logging.basicConfig(level=logging.INFO)
with model.chat_session():
    prompt = 'Behave as an expert labeler. \ Given the indicated URL, detect the class of plan for the installation of a software in the text of the URL.\
    The output can only be 1 of the 4 class of plans which are binary, package manager, container and source. \
    URL: https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md'
    # print("PROMPT: ", prompt)
    response = model.generate(prompt=prompt, temp=0)
    print("RESPONSE", response)

INFO:gpt4all._pyllmodel:LLModel.prompt_model -- prompt:
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
Behave as an expert labeler. \ Given the indicated URL, detect the class of plan for the installation of a software in the text of the URL.    The output can only be 1 of the 4 class of plans which are binary, package manager, container and source.     URL: https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md
### Response:

===/LLModel.prompt_model -- prompt/===


RESPONSE  The URL indicates that the installation plan is for a software that uses a package manager. Therefore, the output is "package manager".


In [9]:
from gpt4all import GPT4All
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
import logging
logging.basicConfig(level=logging.INFO)
with model.chat_session():
    prompt = 'Behave as an expert labeler. \ Given the indicated INSTALL_TEXT, please identify the class of plan of installation.\
    The output can only be 1 of the 4 class of plans which are binary, package manager, container and source. \
    INSTALL_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with \
         ```pip3 install -r requirements.txt``` \
         - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \
         - Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/) \
          Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` \
            Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself.'
         
    print("PROMPT: ", prompt)
    response = model.generate(prompt=prompt, temp=0)
    print("Response", response)

INFO:gpt4all._pyllmodel:LLModel.prompt_model -- prompt:
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
Behave as an expert labeler. \ Given the indicated INSTALL_TEXT, please identify the class of plan of installation.    The output can only be 1 of the 4 class of plans which are binary, package manager, container and source.     INSTALL_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with          ```pip3 install -r requirements.txt```          - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs          - Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)           Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect`

PROMPT:  Behave as an expert labeler. \ Given the indicated INSTALL_TEXT, please identify the class of plan of installation.    The output can only be 1 of the 4 class of plans which are binary, package manager, container and source.     INSTALL_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with          ```pip3 install -r requirements.txt```          - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs          - Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)           Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect`             Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `outp

**One Shot Learning**

**Chain of Thought Prompting**


In addition to the four experiments above, we look at a fifth experiment using a state tracking chain of thought prompting technique in a natural language one shot setting. Within this configuration, we provide an annotated example where each action is annotated with the state prior to the action, the reason for why the action is applicable in the prior state, and the resulting state after applying the action. After the example, a meta-explanation about plan correctness is provided. The LLM is then asked to return a response making the same state tracking and justification annotations that were included in the example

---
## 2. - LIST OF STEPS PROMPT

#### Baseline Zero Shot
```py
prompt = """ Provide a detailed list of the Steps in the given Research Software which refers to "[software-name]" plan, including any optional or commands for each Step? """
```

#### One Shot Learning:

The prompt additionally contains an example instance of a research software installation plan (consisting of a description of the initial step and the end) and the corresponding plan (which ends with a tag, referred to as the plan-end tag, that denotes the end of the plan). The prompt is formatted in natural language (or **JSON/RDF format**?)

**JSON**

```py
prompt = """\nGiven a TEXT, I want you generate task steps and plan type of installation.\
    The format must in a strict JSON format : {[{"plan_type": "detect the method of installation", "task_steps": [a ordered list of steps to install], "commands": [ a concise list of commands for the tool.  \
    If you do not find software installation steps in the TEXT, return "none".\
    Annotate the following TEXT  """
        prompt += """\n\n# REQUIREMENTS #: \n1. the generated task steps and plan types allign with # RESEARCH SOFTWARE TOOL # perfectly. Task name must be detected from the README file; \n2. the task steps should strictly aligned with the plan type, and the number of task steps should be same with the readmes;  \n4.  RESEARCH SOFTWARE TOOL command should be align with the input-type field of # TASK LIST #;"""
```

**ONTOLOGY**

```py

prompt = """ Here is the complete definition of the procedural RDF ontology in TTL format which defines Plans and Steps and relationships between each other:

@prefix p-plan: <http://purl.org/net/p-plan#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix bpmn: <http://www.w3.org/ns/bpmn#> .

[.....]


Using the provided ontology, please create specific instances and data about plans representing the steps and  for the [-] software installation procedure. Include relationships such as 'isStepOfPlan' to model the hierarchy of main steps and their substeps. Additionally, create the RDF graph that represents the defined individuals and their relationships.  Refer to the RDF graph as Context, please provide an Answer to the Question below."""

```

---
**GEMMA: Gemini-based open model**

ref:
[https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)

**- Baseline Zero Shot**

In [10]:
# Given a URL, Natural language response
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a URL, I want you generate a list of steps for the plan of installation. Please indicate the number of the steps. URL = https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md"},        
    ],
    stream=True,
    max_tokens=500
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

## Installation Steps for Parameter Fitting Experiments (URL: raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md)

**1. Set Up Environment:**
- Python 3.6+
- Jupyter Notebook or PyCharm
- TensorFlow 2.0+

**2. Install Dependencies:**
- pip install tensorflow-gpu tf-keras keras-applications pyyaml tqdm matplotlib

**3. Clone Repository:**
- If you haven't already, clone the repository:
```
git clone git@github.com:PFischbeck/parameter-fitting-experiments.git
```

**4. Create a Virtual Environment:**
- To isolate dependencies for the project, create a virtual environment and activate it.

**5. Prepare Data:**
- You may need to download the data used in the experiments. Download the data from the provided URL or mirror and place it in the `data` folder within the project directory.

**6. Run the Experiment:**
- After activating the virtual environment, navigate to the `experiments` folder in the project directory.
- Select a specific experiment script from t

In [8]:
# given a URL, JSON object response
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)



chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a README_TEXT, \
          I want you generate a list of steps for the plan of installation. \
          Please indicate the number of the steps. The RESPONSE Format for your response should be in JSON file as following: \
         \n {{'PLAN_STEP': [`Step 1: give a number of the step. Each step separated with commas], {{`NUM_STEP`: [count of unique steps]}}}} \n \
         Definition:\n 1. README_INSTALLATION: a excerpt of the text found in a readme file of a research software tool \n 2.PLAN_STEP: represents a list of planned action(s) in sequential orden as part of a installation plan to be executed by an Agent. These are a list of indivisible sequence of actions that must executed without interruption. Step within a Plan could be linked to one specific executable operation, or refer to a group of operations. A Step then could invoke more than one action. Each sentence in a readme is an instance of the Step concept.\n \
         3. N_PLANS: count the number of unique plan TYPE \n. \
         Here is the README_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with ```pip3 install -r requirements.txt``` - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \ - Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/) Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself."},
    ],
    stream=True,
    max_tokens=500,
    response_format={"type": "json_object"},
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

```json
{
  "PLAN_STEP": [
    "Make sure you have Python, Pip and R installed.",
    "Checkout this repository",
    "Install the python dependencies with ```pip3 install -r requirements.txt```",
    "Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs",
    "Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)",
    "Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect`",
    "Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`"
  ],
  "NUM_STEP": 8
}
```

**Explanation:**

- The `PLAN_STEP` list includes each step in the installation plan.
- The number of unique steps is 8.
- Each step is a separate action to be performed in sequence.
- Some steps involve installing 

In [40]:
# given a URL, JSON object response
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)

example_json = {
  "plan_nodes": [
                    {
                        "num_steps": "# total count number of unique steps",
                        "plan_step": [
                            "Step 1: # describe step 1:.",
                            "Step 2: # describe step 2."
                        ]
                    }
                ]
}

prompt= "Given a README_TEXT, I want you generate a valid JSON output with the list of steps for the plan of installation. \
README_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with ```pip3 install -r requirements.txt``` - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \ - Install the R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/) Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself."

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Provide output in valid JSON. The data schema should be like this:"+json.dumps(example_json)},
        { "role": "system", "content": prompt}
    ],
    stream=True,
    max_tokens=500,
    response_format={"type": "json_object"},
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")



```json
{"plan_nodes": [
    {
      "num_steps": "4",
      "plan_step": [
        "Step 1: Make sure you have Python, Pip and R installed.",
        "Step 2: Checkout this repository",
        "Step 3: Install the python dependencies with `pip3 install -r requirements.txt`",
        "Step 4: Install the `pygirgs` package and R dependencies (used for plots) with  ```R -e install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)` "
      ]
    }
  ]
}
```<eos>

In [62]:
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)

example_json = {
  "plan_nodes": [
                    {
                        "num_steps": "float, # count number of unique steps",
                        "plan_step": []
                    }
                ]
}

README_TEXT = "# Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with `pip3 install -r requirements.txt ` - Install the `pygirgs` package at <https://github.com/PFischbeck/pygirgs> - Install the R dependencies (used for plots) with  `R --vanilla CMD INSTALL ggplot2 reshape2 plyr dplyr scales ` - Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` \nOptional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself."

# def create_json_response(plan_node):
#     return {
#       "plan_nodes": [
#           {
#               "num_steps": (plan_node["plan_step"]),
#               "plan_step": plan_node["plan_step"]
#           }
#       ]
#     }

def extract_steps(text):
    response = client.chat.completions.create(
      model="google/gemma-7b-it",
      messages=[
          {"role": "user", "content": "You are a helpful assistant that extracts installation steps from a given README text."},
          {"role": "system", "content": f"Extract the installation in a JSON format as {example_json} steps from the following README text:\n{text}"}
      ],
    # stream=True,
    max_tokens=500,
    response_format={"type": "json_object"},  
    )


    steps = response.choices[0].message.content.strip().split('\n')
    plan_step = [step.strip() for step in steps if step.strip()]
    return {"plan_step": plan_step}


plan_node = extract_steps(README_TEXT)
# final_json = create_json_response(plan_node)

print("Generated JSON:", json.dumps(plan_node, indent=4))


with open('outputfile.json', 'w') as outf:
    json.dump(plan_node, outf, indent=4)

Generated JSON: {
    "plan_step": [
        "```json",
        "{'plan_nodes': [{'num_steps': 'float, # count number of unique steps', 'plan_step': [",
        "'Make sure you have Python, Pip and R installed.',",
        "'Checkout this repository',",
        "'Install the python dependencies with `pip3 install -r requirements.txt`',",
        "'Install the `pygirgs` package at <github.com/PFischbeck/pygirgs>',",
        "'Install the R dependencies (used for plots) with  `R --vanilla CMD INSTALL ggplot2 reshape2 plyr dplyr scales`',",
        "'Download the file `konect-data.zip` from [Zenodo](doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect`',",
        "'Optional: Download the file `output-data.zip` from [Zenodo](doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`",
        "]}]}",
        "```"
    ]
}


In [13]:
# Given a INSTALL_TEXT, JSON object response
from openai import OpenAI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key= api_key
)

chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "user", "content": "Behave as an expert labeler. \
          Given a INSTALL_TEXT, generate an example json object describing the Steps of installation of the Research Software. Please indicate the total number of the Steps. \
         INSTALL_TEXT = # Installation - Make sure you have Python, Pip and R installed. - Checkout this repository - Install the python dependencies with \
         ```pip3 install -r requirements.txt``` \
         - Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs \
         - Install the R dependencies (used for plots) with \
         ```R -e 'install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)'``` \
         - Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect` \
         - Optional: Download the file `output-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `output_data`. This way, you can access all experiment results without running them yourself."},        
    ],
    stream=True,
    max_tokens=500,
    response_format={"type": "json_object"},
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")

## JSON object for installing research software from the text:

```json
{
  "total_steps": 6,
  "steps": [
    "Make sure you have Python, Pip and R installed.",
    "Checkout this repository",
    "Install the python dependencies with `pip3 install -r requirements.txt`",
    "Install the `pygirgs` package at https://github.com/PFischbeck/pygirgs",
    "Install the R dependencies (used for plots) with `R -e 'install.packages(c(ggplot2, reshape2, plyr, dplyr, scales), repos=https://cloud.r-project.org/)'",
    "Download the file `konect-data.zip` from [Zenodo](https://doi.org/10.5281/zenodo.10629451) and extract its contents into the folder `input_data/konect`. This way, you can access all experiment results without running them yourself. The file `output-data.zip` can be optionally downloaded and extracted into the folder `output_data`."
  ]
}
```

**Total number of steps:** 6<eos>

---

**Mistral orca-2 Model**

ref: [mistral-7b-openorca.gguf2.Q4_0.gguf](mistral-7b-openorca.gguf2.Q4_0.gguf)

In [10]:
from gpt4all import GPT4All
import logging
logging.basicConfig(level=logging.INFO)
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
with model.chat_session('You are an expert label.\nBe terse.',
                        '### Instruction:\n{0}\n### Response:\n'):
    prompt = ' Given a TEXT, I want you generate task steps and plan type of installation.\
    The format must in a strict JSON format : {[{"plan_type": "detect the method of installation", "task_steps": [a ordered list of steps to install], "commands": [ a concise list of commands for the tool.  \
    If you do not find software installation steps in the TEXT, return "none".\
    Annotate the following TEXT: # Installation ### Dependencies Initialize git submodules with `` git submodule init git submodule update``` \
    Install the specific versions of every package from `requirements.txt` in a new conda environment:\
    conda create --name gsft python=3.9 \
    conda activate gsft\
    pip install -r requirements.txt\
    To ensure that Python paths are properly defined, update the `~/.bashrc` by adding the following lines\
    export GSFT_PATH=/path_to_gsfc\
    export PYTHONPATH=$PYTHONPATH:/$GSFT_PATH'
    print("PROMPT: ", prompt)
    response = model.generate(prompt=prompt, temp=0.6)
    print("Response", response)

INFO:gpt4all._pyllmodel:LLModel.prompt_model -- prompt:
You are an expert label.
Be terse.

### Instruction:
 Given a TEXT, I want you generate task steps and plan type of installation.    The format must in a strict JSON format : {[{"plan_type": "detect the method of installation", "task_steps": [a ordered list of steps to install], "commands": [ a concise list of commands for the tool.      If you do not find software installation steps in the TEXT, return "none".    Annotate the following TEXT: # Installation ### Dependencies Initialize git submodules with `` git submodule init git submodule update```     Install the specific versions of every package from `requirements.txt` in a new conda environment:    conda create --name gsft python=3.9     conda activate gsft    pip install -r requirements.txt    To ensure that Python paths are properly defined, update the `~/.bashrc` by adding the following lines    export GSFT_PATH=/path_to_gsfc    export PYTHONPATH=$PYTHONPATH:/$GSFT_PATH
##

PROMPT:   Given a TEXT, I want you generate task steps and plan type of installation.    The format must in a strict JSON format : {[{"plan_type": "detect the method of installation", "task_steps": [a ordered list of steps to install], "commands": [ a concise list of commands for the tool.      If you do not find software installation steps in the TEXT, return "none".    Annotate the following TEXT: # Installation ### Dependencies Initialize git submodules with `` git submodule init git submodule update```     Install the specific versions of every package from `requirements.txt` in a new conda environment:    conda create --name gsft python=3.9     conda activate gsft    pip install -r requirements.txt    To ensure that Python paths are properly defined, update the `~/.bashrc` by adding the following lines    export GSFT_PATH=/path_to_gsfc    export PYTHONPATH=$PYTHONPATH:/$GSFT_PATH
Response  {[{"plan_type": "software installation", "task_steps": [
"Determine the specific version of 

PROMPT:  Behave as an expert labeler. \ Given a URL, I want you generate task steps and plan type of installation.    The format must in a strict JSON format : {"task_steps": [ step description of one or more steps ], "plan_nodes": [{"plan_type": "detect the method of installation", "commands": [ a concise list of commands for the tool.     URL: https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md
Response  {
"task_steps": [
 {"step": "Detect the method of installation",
 "commands": ["sudo apt-get update && sudo apt-get install pf-script"]
}
],
"plan_nodes": [
 {"plan_type": "Installation Plan",
 "instructions": [
 "1. Download the latest version of the package manager for your operating system from the official website.",
 "2. Open the terminal and run the command `sudo apt-get update` to check if there are any updates available.",
 "3. Run the command `sudo apt-get install pf-script` to download and install the package manager."
 ]}
]
}


In [11]:
from gpt4all import GPT4All
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
with model.chat_session():
    prompt = 'Behave as an expert labeler. \ Given a URL, I want you generate task steps and plan type of installation.\
    The format must in a strict JSON format : {"task_steps": [ step description of one or more steps ], "plan_nodes": [{"plan_type": "detect the method of installation", "commands": [ a concise list of commands for the tool. \
    URL: https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md'
    # print("PROMPT: ", prompt)
    response = model.generate(prompt=prompt, temp=0)
    print("Response", response)

INFO:gpt4all._pyllmodel:LLModel.prompt_model -- prompt:
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
Behave as an expert labeler. \ Given a URL, I want you generate task steps and plan type of installation.    The format must in a strict JSON format : {"task_steps": [ step description of one or more steps ], "plan_nodes": [{"plan_type": "detect the method of installation", "commands": [ a concise list of commands for the tool.     URL: https://raw.githubusercontent.com/PFischbeck/parameter-fitting-experiments/main/Readme.md
### Response:

===/LLModel.prompt_model -- prompt/===


Response  {
"task_steps": [
 {"step": "Detect the method of installation",
 "commands": ["sudo apt-get update && sudo apt-get install pf-script"]
}
],
"plan_nodes": [
 {"plan_type": "Installation Plan",
 "instructions": [
 "1. Download the latest version of the package manager for your operating system from the official website.",
 "2. Open the terminal and run the command `sudo apt-get update` to check if there are any updates available.",
 "3. Run the command `sudo apt-get install pf-script` to download and install the package manager."
 ]}
]
}


## 2. - TYPE OF PLANS PROMPTS

#### Baseline-  Zero Shot
```py
prompt = """ Classify the text into LABELS [source, binary, container, and package manager]"""
```

In [20]:
url = "https://raw.githubusercontent.com/PFischbeck/parameter-fission-examples/main/Readme.md"
question = "give me the type of plans"  # replace with your query
response = get_response_from_gemma(question, url)
print(json.dumps(response, indent=2))


NameError: name 'get_response_from_gemma' is not defined