diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index fa5fab3..0375329 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,5 +1,5 @@ fail_fast: true -exclude: '^(?!promptolution/).*$' +exclude: '^(?!promptolution/).*$|^promptolution/templates.py' repos: - repo: https://github.com/gitleaks/gitleaks rev: v8.18.2 diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..392c391 --- /dev/null +++ b/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." 
+ + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. 
We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2024 Timo Heiß, Moritz Schlager, Tom Zehle + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md index 7b5d8d1..971fc73 100644 --- a/README.md +++ b/README.md @@ -33,12 +33,20 @@ Create API Keys for the models you want to use: - Anthropic: store token in anthropictoken.txt - DeepInfra (for Llama): store token in deepinfratoken.txt +## Optimization Algorithms to choose from +| **Name** | **# init population** | **Exploration** | **Costs** | **Convergence Speed** | **Parallelizable** | **Utilizes Failure Cases** | +|:--------:|:---------------------:|:---------------:|:---------:|:---------------------:|:------------------:|:---------------------:| +| EvoPrompt DE | 8-12 | πŸ‘ | πŸ’² | ⚑⚑ | βœ… | ❌ | +| EvoPrompt GA | 8-12 | πŸ‘ | πŸ’² | ⚑⚑ | βœ… | ❌ | +| OPro | 0 | πŸ‘Ž | πŸ’²πŸ’² | ⚑ | ❌ | ❌ | + ## Core Components - Task: Encapsulates initial prompts, dataset features, targets, and evaluation methods. - Predictor: Implements the prediction logic, interfacing between the Task and LLM components. - LLM: Unifies the process of obtaining responses from language models, whether locally hosted or accessed via API. - Optimizer: Implements prompt optimization algorithms, utilizing the other components during the optimization process. +- Exemplar Selector: Implements algorithms for selecting few-shot examples that are added to the prompt. ## Key Features @@ -49,6 +57,9 @@ Create API Keys for the models you want to use: - Integration with langchain for standardized LLM API calls - Detailed logging and callback system for optimization analysis + +## Getting Started +Take a look at our getting started notebook: [getting_started.ipynb](https://github.com/finitearth/promptolution/blob/main/notebooks/getting_started.ipynb) ## Reproduce our Experiments We provide scripts and configs for all our experiments.
Run experiments based on config via: diff --git a/configs/experiment_eval.ini b/configs/experiment_eval.ini index 0a02b06..c0a3e7f 100644 --- a/configs/experiment_eval.ini +++ b/configs/experiment_eval.ini @@ -1,5 +1,5 @@ [experiment] -name = experiment_evaluation +name = experiment_eval [target_experiment] name = experiment diff --git a/dist/promptolution-0.1.1-py3-none-any.whl b/dist/promptolution-0.1.1-py3-none-any.whl deleted file mode 100644 index c402bc8..0000000 Binary files a/dist/promptolution-0.1.1-py3-none-any.whl and /dev/null differ diff --git a/dist/promptolution-0.1.1.tar.gz b/dist/promptolution-0.1.1.tar.gz deleted file mode 100644 index 671eba0..0000000 Binary files a/dist/promptolution-0.1.1.tar.gz and /dev/null differ diff --git a/docs/release-notes.md b/docs/release-notes.md index ba6a637..65bfe94 100644 --- a/docs/release-notes.md +++ b/docs/release-notes.md @@ -1,3 +1,90 @@ # Release Notes +## Release v1.0.0 +### What's Changed +#### Added Features: +* Classes for Exemplar selection (Random and RandomSearch) +* Helper functions: run_experiment, run_optimization and run_evaluation + +#### Further Changes: +* Removed DeepInfra helper functions, as the langchain-community library is now working as intended +* Added license +* Added release notes :) + +**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/v0.2.0...v1.0.0) + +## Release v0.2.0 + +### What's Changed +#### Added Features: +* Prompt creation utility function +* Prompt variation utility function +* New optimizer: OPro (see [arXiv paper](https://arxiv.org/abs/2309.03409)) + + +#### Further Changes: +* Workflows for automated build, deployment & release +* New documentation page appearance +* Additional Docstrings & Formatting + +**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/v0.1.1...v0.2.0) + +## Release v0.1.1 (2) + +### What's Changed + +#### Added Features: +\- + +#### Further Changes: +* Added workflows for automated build, deployment, release and doc creation +* Updated pre-commits +* Added docstrings and formatting +* Updated readme +* Updated docs + +**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/0.1.1...v0.1.1) + +## Release v0.1.1 + +### What's Changed + +#### Added Features: +\- + +#### Further Changes: +* Loosen restrictive python version requirements (^3.11 instead of ~3.11) +* Add documentation pages +* Update README + +**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/0.1.0...0.1.1) + +## Release v0.1.0 + +*First release* + +### What's Changed + +#### Added Features: +* Base classes for tasks, LLMs, predictors, and optimizers +* Classification task +* API LLMs from OpenAI, Anthropic, and DeepInfra +* Local LLM +* Optimizers EvoPrompt GA and EvoPrompt DE (see [arXiv paper](https://arxiv.org/abs/2309.08532)) + +#### Further changes: +* Added example classification datasets used in the [EvoPrompt paper](https://arxiv.org/abs/2309.08532) +* Added dummy classes for testing +* Added example scripts and configs for experiments +* Added experiment results and evaluation notebooks + +**Full Changelog**: [here](https://github.com/finitearth/promptolution/commits/0.1.0)
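For orientation, the sketch below shows how the pieces added in v1.0.0 (the `Config` dataclass, the `run_experiment` helper, and the exemplar selectors) are meant to be wired together. It is a minimal illustration mirroring the getting-started notebook in the diff that follows; the token file, dataset path, and model name are illustrative values taken from that notebook, not fixed requirements.

```python
# Minimal usage sketch of the v1.0.0 helpers (illustrative; values mirror the getting-started notebook).
from promptolution.config import Config
from promptolution.helpers import run_experiment

# API token for the DeepInfra-hosted model; the file name follows the convention used in the notebook.
token = open("deepinfratoken.txt", "r").read()

config = Config(
    task_name="subj",                               # example classification dataset under data_sets/cls/subj/
    ds_path="data_sets/cls/subj/",
    n_steps=8,                                      # number of optimization steps
    optimizer="evopromptde",                        # EvoPrompt DE, as in the notebook
    meta_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    downstream_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    evaluation_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    api_token=token,
    prepend_exemplars=True,                         # run exemplar search after optimization ...
    exemplar_selector="random_search",              # ... using the RandomSearchSelector added in this release
    n_exemplars=3,
)

df = run_experiment(config)  # optimize, prepend exemplars, evaluate on the test split
print(df)                    # DataFrame with one row per prompt and its score
```

As implemented in helpers.py (further down in this diff), `run_experiment` first optimizes the prompts, optionally prepends exemplars found by the configured selector, then scores the resulting prompts on the test split and returns a DataFrame sorted by score.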
diff --git a/notebooks/getting_started.ipynb new file mode 100644 index 0000000..0c38f11 --- /dev/null +++ b/notebooks/getting_started.ipynb @@ -0,0 +1,260 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Getting started" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before you start" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Dataset\n", + "If you want to run prompt optimization on your own dataset, follow these steps:\n", + "\n", + "1. Create a folder.\n", + "1. Create a .txt file in the folder named \"prompts.txt\". It should contain 8-12 initial prompts from which the optimization can start. Separate the prompts with line breaks.\n", + "1. Create two .txt files in another folder, which contain the dev set \"dev.txt\" and test set \"test.txt\" of your data points. Convert the classes of your file into integers. \n", + "Make sure to separate the input from the expected output with a tab!\n", + "1. Create a description.json file that contains a dictionary, specifying:\n", + " - \"seed\": the folder in which you find the dev and test files\n", + " - \"init_prompts\": the name of the .txt file pointing to the prompts\n", + " - \"description\": A short description of your task that is fed to the meta-LLM in order to optimize the prompts. \n", + " (TIP: Include \"The class mentioned first in the response of the LLM will be the prediction.\" in the description if this is how you evaluate the model's responses)\n", + " - \"classes\": A list of the names of the classes you are trying to predict\n", + "\n", + "You can find examples of how this needs to be set up in our repo at data_sets/" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Installs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install promptolution" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "c:\\Users\\tzehl\\Documents\\programming\\promptolution\\.venv\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets.
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "from promptolution.helpers import run_experiment\n", + "from promptolution.config import Config" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## set up llms, predictor, tasks and optimizer" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "token = open(\"../deepinfratoken.txt\", \"r\").read()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "config = Config(\n", + " task_name=\"subj\",\n", + " ds_path=\"../data_sets/cls/subj/\",\n", + " n_steps=8,\n", + " optimizer=\"evopromptde\",\n", + " meta_llm=\"meta-llama/Meta-Llama-3-8B-Instruct\",\n", + " evaluation_llm=\"meta-llama/Meta-Llama-3-8B-Instruct\",\n", + " downstream_llm=\"meta-llama/Meta-Llama-3-8B-Instruct\",\n", + " api_token=token,\n", + " prepend_exemplars=True,\n", + " exemplar_selector=\"random_search\",\n", + " n_exemplars=3,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "df = run_experiment(config)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
promptscore
1evaluate each sentence as either objective or ...0.80
8As a linguist, analyze a statement from a movi...0.80
3identify whether the given sentence was expres...0.65
5Analyze the textual content of a given stateme...0.65
9determine the classification of each sentence ...0.60
0evaluate each statement as either subjective o...0.50
2Classify the sentence according to its subject...0.40
6As a classifier, interpret phrases in movie re...0.35
7and\\n\\nshae is about to return to bed when she...0.35
4Analyze reviews and label them as subjective o...0.30
\n", + "
" + ], + "text/plain": [ + " prompt score\n", + "1 evaluate each sentence as either objective or ... 0.80\n", + "8 As a linguist, analyze a statement from a movi... 0.80\n", + "3 identify whether the given sentence was expres... 0.65\n", + "5 Analyze the textual content of a given stateme... 0.65\n", + "9 determine the classification of each sentence ... 0.60\n", + "0 evaluate each statement as either subjective o... 0.50\n", + "2 Classify the sentence according to its subject... 0.40\n", + "6 As a classifier, interpret phrases in movie re... 0.35\n", + "7 and\\n\\nshae is about to return to bed when she... 0.35\n", + "4 Analyze reviews and label them as subjective o... 0.30" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/poetry.lock b/poetry.lock index efe5954..1938d9c 100644 --- a/poetry.lock +++ b/poetry.lock @@ -1376,6 +1376,17 @@ files = [ {file = "jiter-0.5.0.tar.gz", hash = "sha256:1d916ba875bcab5c5f7d927df998c4cb694d27dceddf3392e58beaf10563368a"}, ] +[[package]] +name = "joblib" +version = "1.4.2" +description = "Lightweight pipelining with Python functions" +optional = false +python-versions = ">=3.8" +files = [ + {file = "joblib-1.4.2-py3-none-any.whl", hash = "sha256:06d478d5674cbc267e7496a410ee875abd68e4340feff4490bcb7afb88060ae6"}, + {file = "joblib-1.4.2.tar.gz", hash = "sha256:2382c5816b2636fbd20a09e0f4e9dad4736765fdfb7dca582943b9c1366b3f0e"}, +] + [[package]] name = "jsonpatch" version = "1.33" @@ -1568,18 +1579,18 @@ files = [ [[package]] name = "langchain" -version = "0.2.14" +version = "0.2.16" description = "Building applications with LLMs through composability" optional = false python-versions = "<4.0,>=3.8.1" files = [ - {file = "langchain-0.2.14-py3-none-any.whl", hash = "sha256:eed76194ee7d9c081037a3df7868d4de90e0410b51fc1ca933a8379e464bf40c"}, - {file = "langchain-0.2.14.tar.gz", hash = "sha256:dc2aa5a58882054fb5d043c39ab8332ebd055f88f17839da68e1c7fd0a4fefe2"}, + {file = "langchain-0.2.16-py3-none-any.whl", hash = "sha256:8f59ee8b45f268df4b924ea3b9c63e49286efa756d16b3f6a9de5c6e502c36e1"}, + {file = "langchain-0.2.16.tar.gz", hash = "sha256:ffb426a76a703b73ac69abad77cd16eaf03dda76b42cff55572f592d74944166"}, ] [package.dependencies] aiohttp = ">=3.8.3,<4.0.0" -langchain-core = ">=0.2.32,<0.3.0" +langchain-core = ">=0.2.38,<0.3.0" langchain-text-splitters = ">=0.2.0,<0.3.0" langsmith = ">=0.1.17,<0.2.0" numpy = [ @@ -1610,21 +1621,21 @@ langchain-core = ">=0.2.26,<0.3.0" [[package]] name = "langchain-community" -version = "0.2.12" +version = "0.2.17" description = "Community contributed LangChain integrations." 
optional = false python-versions = "<4.0,>=3.8.1" files = [ - {file = "langchain_community-0.2.12-py3-none-any.whl", hash = "sha256:50e74473dd2309bdef561760afbbf0c5ea17ed91fc4dfa0d52279dd16d6d34e0"}, - {file = "langchain_community-0.2.12.tar.gz", hash = "sha256:d671cfc6a4f3b65f49a2e59ab420d0164f109d0a56fc4b4996518205c63b8c7e"}, + {file = "langchain_community-0.2.17-py3-none-any.whl", hash = "sha256:d07c31b641e425fb8c3e7148ad6a62e1b54a9adac6e1173021a7dd3148266063"}, + {file = "langchain_community-0.2.17.tar.gz", hash = "sha256:b0745c1fcf1bd532ed4388f90b47139d6a6c6ba48a87aa68aa32d4d6bb97259d"}, ] [package.dependencies] aiohttp = ">=3.8.3,<4.0.0" dataclasses-json = ">=0.5.7,<0.7" -langchain = ">=0.2.13,<0.3.0" -langchain-core = ">=0.2.30,<0.3.0" -langsmith = ">=0.1.0,<0.2.0" +langchain = ">=0.2.16,<0.3.0" +langchain-core = ">=0.2.39,<0.3.0" +langsmith = ">=0.1.112,<0.2.0" numpy = [ {version = ">=1,<2", markers = "python_version < \"3.12\""}, {version = ">=1.26.0,<2.0.0", markers = "python_version >= \"3.12\""}, @@ -1636,18 +1647,18 @@ tenacity = ">=8.1.0,<8.4.0 || >8.4.0,<9.0.0" [[package]] name = "langchain-core" -version = "0.2.33" +version = "0.2.41" description = "Building applications with LLMs through composability" optional = false python-versions = "<4.0,>=3.8.1" files = [ - {file = "langchain_core-0.2.33-py3-none-any.whl", hash = "sha256:c8de411336c13fa440b7a52895bfd1c064f04d315344855962988483902cc532"}, - {file = "langchain_core-0.2.33.tar.gz", hash = "sha256:dd2659e0a560fc987b210107bf989aa14a6f4b67dd214c13a2c9669036cda975"}, + {file = "langchain_core-0.2.41-py3-none-any.whl", hash = "sha256:3278fda5ba9a05defae8bb19f1226032add6aab21917db7b3bc74e750e263e84"}, + {file = "langchain_core-0.2.41.tar.gz", hash = "sha256:bc12032c5a298d85be754ccb129bc13ea21ccb1d6e22f8d7ba18b8da64315bb5"}, ] [package.dependencies] jsonpatch = ">=1.33,<2.0" -langsmith = ">=0.1.75,<0.2.0" +langsmith = ">=0.1.112,<0.2.0" packaging = ">=23.2,<25" pydantic = [ {version = ">=1,<3", markers = "python_full_version < \"3.12.4\""}, @@ -1689,22 +1700,24 @@ langchain-core = ">=0.2.10,<0.3.0" [[package]] name = "langsmith" -version = "0.1.99" +version = "0.1.132" description = "Client library to connect to the LangSmith LLM Tracing and Evaluation Platform." 
optional = false python-versions = "<4.0,>=3.8.1" files = [ - {file = "langsmith-0.1.99-py3-none-any.whl", hash = "sha256:ef8d1d74a2674c514aa429b0171a9fbb661207dc3835142cca0e8f1bf97b26b0"}, - {file = "langsmith-0.1.99.tar.gz", hash = "sha256:b5c6a1f158abda61600a4a445081ee848b4a28b758d91f2793dc02aeffafcaf1"}, + {file = "langsmith-0.1.132-py3-none-any.whl", hash = "sha256:2320894203675c1c292b818cbecf68b69e47a9f7814d4e950237d1faaafd5dee"}, + {file = "langsmith-0.1.132.tar.gz", hash = "sha256:007b8fac469138abdba89db931900a26c5d316640e27ff4660d28c92a766aae1"}, ] [package.dependencies] +httpx = ">=0.23.0,<1" orjson = ">=3.9.14,<4.0.0" pydantic = [ {version = ">=1,<3", markers = "python_full_version < \"3.12.4\""}, {version = ">=2.7.4,<3.0.0", markers = "python_full_version >= \"3.12.4\""}, ] requests = ">=2,<3" +requests-toolbelt = ">=1.0.0,<2.0.0" [[package]] name = "markdown" @@ -3213,6 +3226,20 @@ urllib3 = ">=1.21.1,<3" socks = ["PySocks (>=1.5.6,!=1.5.7)"] use-chardet-on-py3 = ["chardet (>=3.0.2,<6)"] +[[package]] +name = "requests-toolbelt" +version = "1.0.0" +description = "A utility belt for advanced users of python-requests" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" +files = [ + {file = "requests-toolbelt-1.0.0.tar.gz", hash = "sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6"}, + {file = "requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06"}, +] + +[package.dependencies] +requests = ">=2.0.1,<3.0.0" + [[package]] name = "safetensors" version = "0.4.4" @@ -3345,6 +3372,106 @@ tensorflow = ["safetensors[numpy]", "tensorflow (>=2.11.0)"] testing = ["h5py (>=3.7.0)", "huggingface-hub (>=0.12.1)", "hypothesis (>=6.70.2)", "pytest (>=7.2.0)", "pytest-benchmark (>=4.0.0)", "safetensors[numpy]", "setuptools-rust (>=1.5.2)"] torch = ["safetensors[numpy]", "torch (>=1.10)"] +[[package]] +name = "scikit-learn" +version = "1.5.2" +description = "A set of python modules for machine learning and data mining" +optional = false +python-versions = ">=3.9" +files = [ + {file = "scikit_learn-1.5.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:299406827fb9a4f862626d0fe6c122f5f87f8910b86fe5daa4c32dcd742139b6"}, + {file = "scikit_learn-1.5.2-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:2d4cad1119c77930b235579ad0dc25e65c917e756fe80cab96aa3b9428bd3fb0"}, + {file = "scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8c412ccc2ad9bf3755915e3908e677b367ebc8d010acbb3f182814524f2e5540"}, + {file = "scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3a686885a4b3818d9e62904d91b57fa757fc2bed3e465c8b177be652f4dd37c8"}, + {file = "scikit_learn-1.5.2-cp310-cp310-win_amd64.whl", hash = "sha256:c15b1ca23d7c5f33cc2cb0a0d6aaacf893792271cddff0edbd6a40e8319bc113"}, + {file = "scikit_learn-1.5.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445"}, + {file = "scikit_learn-1.5.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:1ff45e26928d3b4eb767a8f14a9a6efbf1cbff7c05d1fb0f95f211a89fd4f5de"}, + {file = "scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f763897fe92d0e903aa4847b0aec0e68cadfff77e8a0687cabd946c89d17e675"}, + {file = "scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = 
"sha256:f8b0ccd4a902836493e026c03256e8b206656f91fbcc4fde28c57a5b752561f1"}, + {file = "scikit_learn-1.5.2-cp311-cp311-win_amd64.whl", hash = "sha256:6c16d84a0d45e4894832b3c4d0bf73050939e21b99b01b6fd59cbb0cf39163b6"}, + {file = "scikit_learn-1.5.2-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:f932a02c3f4956dfb981391ab24bda1dbd90fe3d628e4b42caef3e041c67707a"}, + {file = "scikit_learn-1.5.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:3b923d119d65b7bd555c73be5423bf06c0105678ce7e1f558cb4b40b0a5502b1"}, + {file = "scikit_learn-1.5.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f60021ec1574e56632be2a36b946f8143bf4e5e6af4a06d85281adc22938e0dd"}, + {file = "scikit_learn-1.5.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:394397841449853c2290a32050382edaec3da89e35b3e03d6cc966aebc6a8ae6"}, + {file = "scikit_learn-1.5.2-cp312-cp312-win_amd64.whl", hash = "sha256:57cc1786cfd6bd118220a92ede80270132aa353647684efa385a74244a41e3b1"}, + {file = "scikit_learn-1.5.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:e9a702e2de732bbb20d3bad29ebd77fc05a6b427dc49964300340e4c9328b3f5"}, + {file = "scikit_learn-1.5.2-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:b0768ad641981f5d3a198430a1d31c3e044ed2e8a6f22166b4d546a5116d7908"}, + {file = "scikit_learn-1.5.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:178ddd0a5cb0044464fc1bfc4cca5b1833bfc7bb022d70b05db8530da4bb3dd3"}, + {file = "scikit_learn-1.5.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f7284ade780084d94505632241bf78c44ab3b6f1e8ccab3d2af58e0e950f9c12"}, + {file = "scikit_learn-1.5.2-cp313-cp313-win_amd64.whl", hash = "sha256:b7b0f9a0b1040830d38c39b91b3a44e1b643f4b36e36567b80b7c6bd2202a27f"}, + {file = "scikit_learn-1.5.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:757c7d514ddb00ae249832fe87100d9c73c6ea91423802872d9e74970a0e40b9"}, + {file = "scikit_learn-1.5.2-cp39-cp39-macosx_12_0_arm64.whl", hash = "sha256:52788f48b5d8bca5c0736c175fa6bdaab2ef00a8f536cda698db61bd89c551c1"}, + {file = "scikit_learn-1.5.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:643964678f4b5fbdc95cbf8aec638acc7aa70f5f79ee2cdad1eec3df4ba6ead8"}, + {file = "scikit_learn-1.5.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ca64b3089a6d9b9363cd3546f8978229dcbb737aceb2c12144ee3f70f95684b7"}, + {file = "scikit_learn-1.5.2-cp39-cp39-win_amd64.whl", hash = "sha256:3bed4909ba187aca80580fe2ef370d9180dcf18e621a27c4cf2ef10d279a7efe"}, + {file = "scikit_learn-1.5.2.tar.gz", hash = "sha256:b4237ed7b3fdd0a4882792e68ef2545d5baa50aca3bb45aa7df468138ad8f94d"}, +] + +[package.dependencies] +joblib = ">=1.2.0" +numpy = ">=1.19.5" +scipy = ">=1.6.0" +threadpoolctl = ">=3.1.0" + +[package.extras] +benchmark = ["matplotlib (>=3.3.4)", "memory_profiler (>=0.57.0)", "pandas (>=1.1.5)"] +build = ["cython (>=3.0.10)", "meson-python (>=0.16.0)", "numpy (>=1.19.5)", "scipy (>=1.6.0)"] +docs = ["Pillow (>=7.1.2)", "matplotlib (>=3.3.4)", "memory_profiler (>=0.57.0)", "numpydoc (>=1.2.0)", "pandas (>=1.1.5)", "plotly (>=5.14.0)", "polars (>=0.20.30)", "pooch (>=1.6.0)", "pydata-sphinx-theme (>=0.15.3)", "scikit-image (>=0.17.2)", "seaborn (>=0.9.0)", "sphinx (>=7.3.7)", "sphinx-copybutton (>=0.5.2)", "sphinx-design (>=0.5.0)", "sphinx-design (>=0.6.0)", "sphinx-gallery (>=0.16.0)", "sphinx-prompt (>=1.4.0)", "sphinx-remove-toctrees (>=1.0.0.post1)", "sphinxcontrib-sass (>=0.3.4)", "sphinxext-opengraph 
(>=0.9.1)"] +examples = ["matplotlib (>=3.3.4)", "pandas (>=1.1.5)", "plotly (>=5.14.0)", "pooch (>=1.6.0)", "scikit-image (>=0.17.2)", "seaborn (>=0.9.0)"] +install = ["joblib (>=1.2.0)", "numpy (>=1.19.5)", "scipy (>=1.6.0)", "threadpoolctl (>=3.1.0)"] +maintenance = ["conda-lock (==2.5.6)"] +tests = ["black (>=24.3.0)", "matplotlib (>=3.3.4)", "mypy (>=1.9)", "numpydoc (>=1.2.0)", "pandas (>=1.1.5)", "polars (>=0.20.30)", "pooch (>=1.6.0)", "pyamg (>=4.0.0)", "pyarrow (>=12.0.0)", "pytest (>=7.1.2)", "pytest-cov (>=2.9.0)", "ruff (>=0.2.1)", "scikit-image (>=0.17.2)"] + +[[package]] +name = "scipy" +version = "1.14.1" +description = "Fundamental algorithms for scientific computing in Python" +optional = false +python-versions = ">=3.10" +files = [ + {file = "scipy-1.14.1-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:b28d2ca4add7ac16ae8bb6632a3c86e4b9e4d52d3e34267f6e1b0c1f8d87e389"}, + {file = "scipy-1.14.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:d0d2821003174de06b69e58cef2316a6622b60ee613121199cb2852a873f8cf3"}, + {file = "scipy-1.14.1-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:8bddf15838ba768bb5f5083c1ea012d64c9a444e16192762bd858f1e126196d0"}, + {file = "scipy-1.14.1-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:97c5dddd5932bd2a1a31c927ba5e1463a53b87ca96b5c9bdf5dfd6096e27efc3"}, + {file = "scipy-1.14.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2ff0a7e01e422c15739ecd64432743cf7aae2b03f3084288f399affcefe5222d"}, + {file = "scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8e32dced201274bf96899e6491d9ba3e9a5f6b336708656466ad0522d8528f69"}, + {file = "scipy-1.14.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:8426251ad1e4ad903a4514712d2fa8fdd5382c978010d1c6f5f37ef286a713ad"}, + {file = "scipy-1.14.1-cp310-cp310-win_amd64.whl", hash = "sha256:a49f6ed96f83966f576b33a44257d869756df6cf1ef4934f59dd58b25e0327e5"}, + {file = "scipy-1.14.1-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:2da0469a4ef0ecd3693761acbdc20f2fdeafb69e6819cc081308cc978153c675"}, + {file = "scipy-1.14.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:c0ee987efa6737242745f347835da2cc5bb9f1b42996a4d97d5c7ff7928cb6f2"}, + {file = "scipy-1.14.1-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:3a1b111fac6baec1c1d92f27e76511c9e7218f1695d61b59e05e0fe04dc59617"}, + {file = "scipy-1.14.1-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:8475230e55549ab3f207bff11ebfc91c805dc3463ef62eda3ccf593254524ce8"}, + {file = "scipy-1.14.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:278266012eb69f4a720827bdd2dc54b2271c97d84255b2faaa8f161a158c3b37"}, + {file = "scipy-1.14.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fef8c87f8abfb884dac04e97824b61299880c43f4ce675dd2cbeadd3c9b466d2"}, + {file = "scipy-1.14.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b05d43735bb2f07d689f56f7b474788a13ed8adc484a85aa65c0fd931cf9ccd2"}, + {file = "scipy-1.14.1-cp311-cp311-win_amd64.whl", hash = "sha256:716e389b694c4bb564b4fc0c51bc84d381735e0d39d3f26ec1af2556ec6aad94"}, + {file = "scipy-1.14.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:631f07b3734d34aced009aaf6fedfd0eb3498a97e581c3b1e5f14a04164a456d"}, + {file = "scipy-1.14.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:af29a935803cc707ab2ed7791c44288a682f9c8107bc00f0eccc4f92c08d6e07"}, + {file = "scipy-1.14.1-cp312-cp312-macosx_14_0_arm64.whl", hash = 
"sha256:2843f2d527d9eebec9a43e6b406fb7266f3af25a751aa91d62ff416f54170bc5"}, + {file = "scipy-1.14.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:eb58ca0abd96911932f688528977858681a59d61a7ce908ffd355957f7025cfc"}, + {file = "scipy-1.14.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:30ac8812c1d2aab7131a79ba62933a2a76f582d5dbbc695192453dae67ad6310"}, + {file = "scipy-1.14.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f9ea80f2e65bdaa0b7627fb00cbeb2daf163caa015e59b7516395fe3bd1e066"}, + {file = "scipy-1.14.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:edaf02b82cd7639db00dbff629995ef185c8df4c3ffa71a5562a595765a06ce1"}, + {file = "scipy-1.14.1-cp312-cp312-win_amd64.whl", hash = "sha256:2ff38e22128e6c03ff73b6bb0f85f897d2362f8c052e3b8ad00532198fbdae3f"}, + {file = "scipy-1.14.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1729560c906963fc8389f6aac023739ff3983e727b1a4d87696b7bf108316a79"}, + {file = "scipy-1.14.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:4079b90df244709e675cdc8b93bfd8a395d59af40b72e339c2287c91860deb8e"}, + {file = "scipy-1.14.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:e0cf28db0f24a38b2a0ca33a85a54852586e43cf6fd876365c86e0657cfe7d73"}, + {file = "scipy-1.14.1-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:0c2f95de3b04e26f5f3ad5bb05e74ba7f68b837133a4492414b3afd79dfe540e"}, + {file = "scipy-1.14.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b99722ea48b7ea25e8e015e8341ae74624f72e5f21fc2abd45f3a93266de4c5d"}, + {file = "scipy-1.14.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5149e3fd2d686e42144a093b206aef01932a0059c2a33ddfa67f5f035bdfe13e"}, + {file = "scipy-1.14.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e4f5a7c49323533f9103d4dacf4e4f07078f360743dec7f7596949149efeec06"}, + {file = "scipy-1.14.1-cp313-cp313-win_amd64.whl", hash = "sha256:baff393942b550823bfce952bb62270ee17504d02a1801d7fd0719534dfb9c84"}, + {file = "scipy-1.14.1.tar.gz", hash = "sha256:5a275584e726026a5699459aa72f828a610821006228e841b94275c4a7c08417"}, +] + +[package.dependencies] +numpy = ">=1.23.5,<2.3" + +[package.extras] +dev = ["cython-lint (>=0.12.2)", "doit (>=0.36.0)", "mypy (==1.10.0)", "pycodestyle", "pydevtool", "rich-click", "ruff (>=0.0.292)", "types-psutil", "typing_extensions"] +doc = ["jupyterlite-pyodide-kernel", "jupyterlite-sphinx (>=0.13.1)", "jupytext", "matplotlib (>=3.5)", "myst-nb", "numpydoc", "pooch", "pydata-sphinx-theme (>=0.15.2)", "sphinx (>=5.0.0,<=7.3.7)", "sphinx-design (>=0.4.0)"] +test = ["Cython", "array-api-strict (>=2.0)", "asv", "gmpy2", "hypothesis (>=6.30)", "meson", "mpmath", "ninja", "pooch", "pytest", "pytest-cov", "pytest-timeout", "pytest-xdist", "scikit-umfpack", "threadpoolctl"] + [[package]] name = "seaborn" version = "0.13.2" @@ -3509,6 +3636,17 @@ files = [ doc = ["reno", "sphinx"] test = ["pytest", "tornado (>=4.5)", "typeguard"] +[[package]] +name = "threadpoolctl" +version = "3.5.0" +description = "threadpoolctl" +optional = false +python-versions = ">=3.8" +files = [ + {file = "threadpoolctl-3.5.0-py3-none-any.whl", hash = "sha256:56c1e26c150397e58c4926da8eeee87533b1e32bef131bd4bf6a2f45f3185467"}, + {file = "threadpoolctl-3.5.0.tar.gz", hash = "sha256:082433502dd922bf738de0d8bcc4fdcbf0979ff44c42bd40f5af8a282f6fa107"}, +] + [[package]] name = "tiktoken" version = "0.7.0" @@ -4034,4 +4172,4 @@ multidict = ">=4.0" [metadata] lock-version = "2.0" python-versions = "^3.11" 
-content-hash = "7f2a00d58b72f3b7cec0991808ffc354c5f10f87e1c78bd9ed4d5369932d243d" +content-hash = "6c9aacc81e214e934481f8764b4ecf4db4366f0860952bf045649e2b405f83a5" diff --git a/promptolution/callbacks.py b/promptolution/callbacks.py index 82aa7b4..fe655d6 100644 --- a/promptolution/callbacks.py +++ b/promptolution/callbacks.py @@ -17,16 +17,20 @@ def on_step_end(self, optimizer): """ pass - def on_epoch_end(self, epoch): + def on_epoch_end(self, optimizer): """Called at the end of each optimization epoch. Args: - epoch: The current epoch number. + optimizer: The optimizer object that called the callback. """ pass - def on_train_end(self): - """Called at the end of the entire optimization process.""" + def on_train_end(self, optimizer): + """Called at the end of the entire optimization process. + + Args: + optimizer: The optimizer object that called the callback. + """ pass @@ -53,19 +57,11 @@ def on_step_end(self, optimizer): self.logger.critical(f"*** Prompt {i}: Score: {score}") self.logger.critical(f"{prompt}") - def on_epoch_end(self, epoch, logs=None): - """Log information about the current epoch. - - Args: - epoch: The current epoch number. - logs: Additional information to log. - """ - self.logger.critical(f"Epoch {epoch} - {logs}") - - def on_train_end(self, logs=None): + def on_train_end(self, optimizer, logs=None): """Log information at the end of training. Args: + optimizer: The optimizer object that called the callback. logs: Additional information to log. """ self.logger.critical(f"Training ended - {logs}") @@ -109,8 +105,12 @@ def on_step_end(self, optimizer): ) df.to_csv(self.path, mode="a", header=False, index=False) - def on_train_end(self): - """Called at the end of training.""" + def on_train_end(self, optimizer): + """Called at the end of training. + + Args: + optimizer: The optimizer object that called the callback. + """ pass @@ -173,6 +173,10 @@ def on_step_end(self, optimizer): """ self.pbar.update(1) - def on_train_end(self): - """Close the progress bar at the end of training.""" + def on_train_end(self, optimizer): + """Close the progress bar at the end of training. + + Args: + optimizer: The optimizer object that called the callback. + """ self.pbar.close() diff --git a/promptolution/config.py b/promptolution/config.py index 5bba4df..dac2d9a 100644 --- a/promptolution/config.py +++ b/promptolution/config.py @@ -1,7 +1,8 @@ """Configuration class for the promptolution library.""" - -from configparser import ConfigParser +import configparser from dataclasses import dataclass +from pathlib import Path +from typing import Any, Dict, Literal, Optional @dataclass @@ -12,71 +13,116 @@ class Config: either from a config file or from keyword arguments. Attributes: - task_name (str): Name of the task. - ds_path (str): Path to the dataset. - n_steps (int): Number of optimization steps. - optimizer (str): Name of the optimizer to use. - meta_prompt_path (str): Path to the meta prompt file. - meta_llms (str): Name of the meta language model. - downstream_llm (str): Name of the downstream language model. - evaluation_llm (str): Name of the evaluation language model. + task_name (str): Name of the task. Should not be None if used. + ds_path (str): Path to the dataset. Should not be None if used. + n_steps (int): Number of optimization steps. Should not be None if used. + optimizer (str): Name of the optimizer to use. Should not be None if used. + meta_llm (str): Name of the meta language model. Should not be None if used. 
+ downstream_llm (str): Name of the downstream language model. Should not be None if used. + evaluation_llm (str): Name of the evaluation language model. Should not be None if used. init_pop_size (int): Initial population size. Defaults to 10. logging_dir (str): Directory for logging. Defaults to "logs/run.csv". experiment_name (str): Name of the experiment. Defaults to "experiment". include_task_desc (bool): Whether to include task description. Defaults to False. + donor_random (bool): Whether to use random donor prompts for EvoPromptDE. Defaults to False. random_seed (int): Random seed for reproducibility. Defaults to 42. + selection_mode (str): Selection mode for EvoPromptGA. Defaults to "random". + meta_bs (int): Batch size for local meta LLM. Should not be None if the LLM is run locally. Defaults to None. + downstream_bs (int): Batch size for local downstream LLM. + Should not be None if the LLM is run locally. Defaults to None. + api_token (str): API token for different APIs, as implemented in LLM classes. + Should not be None if APILLM is used. Defaults to None. + meta_prompt (str): Prompt template for the meta LLM. + If None, the default meta prompts from templates.py will be used. Defaults to None. + prepend_exemplars (bool): whether to run exemplar search and prepend few-shot examples. Defaults to False. + n_exemplars (int): how many exemplars to prepend. Only used if prepend_exemplars is True. Defaults to 5. + exemplar_selector (str): which exemplar selector to use. Should not be None if prepend_exemplars is True. + Defaults to None. + n_ds_samples_to_meta (int): how many dataset examples to show to the meta-LLM + (not applicable to every optimizer). + n_eval_samples (int): how many examples to use when evaluating with the evaluation LLM. """ - task_name: str - ds_path: str - n_steps: int - optimizer: str - meta_prompt_path: str - meta_llms: str - downstream_llm: str - evaluation_llm: str - init_pop_size: int = 10 - logging_dir: str = "logs/run.csv" + task_name: str = None + ds_path: Path = None + optimizer: str = None + meta_llm: str = None + downstream_llm: str = None + evaluation_llm: str = None + n_steps: int = None + init_pop_size: int = None + logging_dir: Path = Path("logs/run.csv") experiment_name: str = "experiment" - include_task_desc: bool = False + include_task_desc: bool = True + donor_random: bool = False random_seed: int = 42 + selection_mode: Optional[Literal["random", "wheel", "tour"]] = "random" + meta_bs: Optional[int] = None + downstream_bs: Optional[int] = None + api_token: Optional[str] = None + meta_prompt: Optional[str] = None + prepend_exemplars: Optional[bool] = False + n_exemplars: Optional[int] = 5 + exemplar_selector: Optional[str] = None + n_ds_samples_to_meta: Optional[int] = 2 + n_eval_samples: Optional[int] = 20 + + def __post_init__(self): + """Validate the configuration after initialization.""" + self._validate_config() + + @classmethod + def from_dict(cls, config_dict: Dict[str, Any]) -> "Config": + """Create a Config instance from a dictionary.""" + return cls(**cls._process_config_dict(config_dict)) + + @classmethod + def from_file(cls, config_path: Path) -> "Config": + """Create a Config instance from a configuration file.""" + if not config_path.exists(): + raise FileNotFoundError(f"Configuration file not found: {config_path}") + + config = configparser.ConfigParser() + config.read(config_path) + + config_dict = {key: value for section in config.sections() for key, value in config[section].items()} + + return cls.from_dict(config_dict) + + 
@classmethod + def _process_config_dict(cls, config_dict: Dict[str, Any]) -> Dict[str, Any]: + """Process and validate the configuration dictionary.""" + processed_dict = {} + for field in cls.__dataclass_fields__.values(): + if field.name in config_dict: + value = config_dict[field.name] + if field.type == Path: + processed_dict[field.name] = Path(value) + elif field.type == bool: + processed_dict[field.name] = str(value).lower() == "true" + elif field.type == int: + processed_dict[field.name] = int(value) + else: + processed_dict[field.name] = value + elif field.default == field.default_factory: # Check if field is required + raise ValueError(f"Required configuration parameter '{field.name}' is missing") + + unknown_args = set(config_dict.keys()) - set(cls.__dataclass_fields__.keys()) + if unknown_args: + print(f"Warning: Unexpected configuration arguments: {', '.join(unknown_args)}") + + return processed_dict + + def _validate_config(self): + """Validate the configuration settings.""" + if self.meta_llm is not None: + if "local" in self.meta_llm and self.meta_bs is None: + raise ValueError("'meta_bs' must be specified for local meta_llm") + if "local" in self.downstream_llm and self.downstream_bs is None: + raise ValueError("'downstream_bs' must be specified for local downstream_llm") + if self.api_token is None: + print("Warning: No API token provided. Using default tokens from token files.") - def __init__(self, config_path: str = None, **kwargs): - """Initialize the Config object.""" - if config_path: - self.config_path = config_path - self.config = ConfigParser() - self.config.read(config_path) - self._parse_config() - else: - for key, value in kwargs.items(): - setattr(self, key, value) - - def _parse_config(self): - """Parse the configuration settings from the config file.""" - self.task_name = self.config["task"]["task_name"] - self.ds_path = self.config["task"]["ds_path"] - self.n_steps = int(self.config["task"]["steps"]) - self.random_seed = int(self.config["task"]["random_seed"]) - self.optimizer = self.config["optimizer"]["name"] - self.meta_prompt_path = self.config["optimizer"]["meta_prompt_path"] - self.meta_llm = self.config["meta_llm"]["name"] - self.downstream_llm = self.config["downstream_llm"]["name"] - self.evaluation_llm = self.config["evaluator_llm"]["name"] - self.init_pop_size = int(self.config["optimizer"]["init_pop_size"]) - self.logging_dir = self.config["logging"]["dir"] - self.experiment_name = self.config["experiment"]["name"] - - if "include_task_desc" in self.config["task"]: - self.include_task_desc = self.config["task"]["include_task_desc"] == "True" - - if self.optimizer == "evopromptga": - self.selection_mode = self.config["optimizer"]["selection_mode"] - elif self.optimizer == "evopromptde": - self.selection_mode = self.config["optimizer"]["donor_random"] - - if "local" in self.meta_llm: - self.meta_bs = int(self.config["meta_llm"]["batch_size"]) - - if "local" in self.downstream_llm: - self.downstream_bs = int(self.config["downstream_llm"]["batch_size"]) + def to_dict(self) -> Dict[str, Any]: + """Convert the Config instance to a dictionary.""" + return {field.name: getattr(self, field.name) for field in self.__dataclass_fields__.values()} diff --git a/promptolution/exemplar_selectors/__init__.py b/promptolution/exemplar_selectors/__init__.py new file mode 100644 index 0000000..f234373 --- /dev/null +++ b/promptolution/exemplar_selectors/__init__.py @@ -0,0 +1,33 @@ +"""Module for exemplar selectors.""" + +from typing import Literal + +from 
promptolution.exemplar_selectors.random_search_selector import RandomSearchSelector +from promptolution.exemplar_selectors.random_selector import RandomSelector +from promptolution.predictors.base_predictor import BasePredictor +from promptolution.tasks.base_task import BaseTask + +SELECTOR_MAP = { + "random": RandomSelector, + "random_search": RandomSearchSelector, +} + + +def get_exemplar_selector(name: Literal["random", "random_search"], task: BaseTask, predictor: BasePredictor): + """Factory function to get an exemplar selector based on the given name. + + Args: + name (str): The name of the exemplar selector to instantiate. + task (BaseTask): The task object to be passed to the selector. + predictor (BasePredictor): The predictor object to be passed to the selector. + + Returns: + BaseExemplarSelector: An instance of the requested exemplar selector. + + Raises: + ValueError: If the requested selector name is not found. + """ + if name not in SELECTOR_MAP: + raise ValueError(f"Exemplar selector '{name}' not found. Available selectors: {list(SELECTOR_MAP.keys())}") + + return SELECTOR_MAP[name](task, predictor) diff --git a/promptolution/exemplar_selectors/base_exemplar_selector.py b/promptolution/exemplar_selectors/base_exemplar_selector.py new file mode 100644 index 0000000..dd96e7b --- /dev/null +++ b/promptolution/exemplar_selectors/base_exemplar_selector.py @@ -0,0 +1,41 @@ +"""Base class for exemplar selectors.""" + +from abc import ABC, abstractmethod +from typing import Any, List, Tuple + +from promptolution.predictors.base_predictor import BasePredictor +from promptolution.tasks.base_task import BaseTask + + +class BaseExemplarSelector(ABC): + """An abstract base class for exemplar selectors. + + This class defines the basic interface and common functionality + that all exemplar selectors should implement. + """ + + def __init__(self, task: BaseTask, predictor: BasePredictor): + """Initialize the BaseExemplarSelector. + + Args: + task (BaseTask): An object representing the task to be performed. + predictor (BasePredictor): An object capable of making predictions based on prompts. + """ + self.task = task + self.predictor = predictor + + @abstractmethod + def select_exemplars(self, prompt: str, n_examples: int = 5) -> str: + """Select exemplars based on the given prompt. + + Args: + prompt (str): The input prompt to base the exemplar selection on. + n_examples (int, optional): The number of exemplars to select. Defaults to 5. + + Returns: + str: A new prompt that includes the original prompt and the selected exemplars. + + Raises: + NotImplementedError: This method should be implemented by subclasses. + """ + raise NotImplementedError("This method should be implemented by subclasses.") diff --git a/promptolution/exemplar_selectors/random_search_selector.py b/promptolution/exemplar_selectors/random_search_selector.py new file mode 100644 index 0000000..005fef8 --- /dev/null +++ b/promptolution/exemplar_selectors/random_search_selector.py @@ -0,0 +1,39 @@ +"""Random search exemplar selector.""" + +from promptolution.exemplar_selectors.base_exemplar_selector import BaseExemplarSelector + + +class RandomSearchSelector(BaseExemplarSelector): + """A selector that uses random search to find the best set of exemplars. + + This class implements a strategy that generates multiple sets of random examples, + evaluates their performance, and selects the best performing set. 
+ """ + + def select_exemplars(self, prompt, n_examples: int = 5, n_trials: int = 5): + """Select exemplars using a random search strategy. + + This method generates multiple sets of random examples, evaluates their performance + when combined with the original prompt, and returns the best performing set. + + Args: + prompt (str): The input prompt to base the exemplar selection on. + n_examples (int, optional): The number of exemplars to select in each trial. Defaults to 5. + n_trials (int, optional): The number of random trials to perform. Defaults to 5. + + Returns: + str: The best performing prompt, which includes the original prompt and the selected exemplars. + """ + best_score = 0 + best_prompt = prompt + + for _ in range(n_trials): + _, seq = self.task.evaluate(prompt, self.predictor, n_samples=n_examples, subsample=True, return_seq=True) + prompt_with_examples = "\n\n".join([prompt] + seq) + "\n\n" + # evaluate prompts as few shot prompt + score = self.task.evaluate(prompt_with_examples, self.predictor, subsample=True) + if score > best_score: + best_score = score + best_prompt = prompt_with_examples + + return best_prompt diff --git a/promptolution/exemplar_selectors/random_selector.py b/promptolution/exemplar_selectors/random_selector.py new file mode 100644 index 0000000..5fe01ae --- /dev/null +++ b/promptolution/exemplar_selectors/random_selector.py @@ -0,0 +1,46 @@ +"""Random exemplar selector.""" + +from promptolution.exemplar_selectors.base_exemplar_selector import BaseExemplarSelector +from promptolution.predictors.base_predictor import BasePredictor +from promptolution.tasks.base_task import BaseTask + + +class RandomSelector(BaseExemplarSelector): + """A selector that randomly selects correct exemplars. + + This class implements a strategy that generates random examples and selects + those that are evaluated as correct until the desired number of exemplars is reached. + """ + + def __init__(self, task: BaseTask, predictor: BasePredictor, desired_score: int = 1): + """Initialize the RandomSelector. + + Args: + task (BaseTask): An object representing the task to be performed. + predictor (BasePredictor): An object capable of making predictions based on prompts. + desired_score (int, optional): The desired score for the exemplars. Defaults to 1. + """ + super().__init__(task, predictor) + self.desired_score = desired_score + + def select_exemplars(self, prompt, n_examples: int = 5): + """Select exemplars using a random selection strategy. + + This method generates random examples and selects those that are evaluated as correct + (score == self.desired_score) until the desired number of exemplars is reached. + + Args: + prompt (str): The input prompt to base the exemplar selection on. + n_examples (int, optional): The number of exemplars to select. Defaults to 5. + + Returns: + str: A new prompt that includes the original prompt and the selected exemplars. 
+ """ + examples = [] + while len(examples) < n_examples: + score, seq = self.task.evaluate(prompt, self.predictor, n_samples=1, return_seq=True) + if score == self.desired_score: + examples.append(seq[0]) + prompt = "\n\n".join([prompt] + examples) + "\n\n" + + return prompt diff --git a/promptolution/helpers.py b/promptolution/helpers.py new file mode 100644 index 0000000..9d776a9 --- /dev/null +++ b/promptolution/helpers.py @@ -0,0 +1,85 @@ +"""Helper functions for the usage of the libary.""" +from logging import Logger +from typing import List + +import numpy as np +import pandas as pd + +from promptolution.config import Config +from promptolution.exemplar_selectors import get_exemplar_selector +from promptolution.llms import get_llm +from promptolution.optimizers import get_optimizer +from promptolution.predictors import Classificator +from promptolution.tasks import get_task + + +def run_experiment(config: Config): + """Run a full experiment based on the provided configuration. + + Args: + config (Config): Configuration object for the experiment. + + Returns: + pd.DataFrame: A DataFrame containing the prompts and their scores. + """ + prompts = run_optimization(config) + df = run_evaluation(config, prompts) + return df + + +def run_optimization(config: Config): + """Run the optimization phase of the experiment. + + Args: + config (Config): Configuration object for the experiment. + + Returns: + List[str]: The optimized list of prompts. + """ + task = get_task(config) + llm = get_llm(config.meta_llm, token=config.api_token) + predictor = Classificator(llm, classes=task.classes) + + if config.init_pop_size: + init_pop = np.random.choice(task.initial_population, size=config.init_pop_size, replace=True) + else: + init_pop = task.initial_population + + optimizer = get_optimizer( + config, + meta_llm=llm, + initial_prompts=init_pop, + task=task, + predictor=predictor, + n_eval_samples=config.n_eval_samples, + ) + + prompts = optimizer.optimize(n_steps=config.n_steps) + + if config.prepend_exemplars: + selector = get_exemplar_selector(config.exemplar_selector, task, predictor) + prompts = [selector.select_exemplars(p, n_examples=config.n_exemplars) for p in prompts] + + return prompts + + +def run_evaluation(config: Config, prompts: List[str]): + """Run the evaluation phase of the experiment. + + Args: + config (Config): Configuration object for the experiment. + prompts (List[str]): List of prompts to evaluate. + + Returns: + pd.DataFrame: A DataFrame containing the prompts and their scores. 
+ """ + task = get_task(config, split="test") + + llm = get_llm(config.evaluation_llm, token=config.api_token) + predictor = Classificator(llm, classes=task.classes) + + scores = task.evaluate(prompts, predictor, subsample=True, n_samples=config.n_eval_samples) + df = pd.DataFrame(dict(prompt=prompts, score=scores)) + df = df.sort_values("score", ascending=False) + + return df diff --git a/promptolution/llms/api_llm.py b/promptolution/llms/api_llm.py index a3dcdc7..1c34709 100644 --- a/promptolution/llms/api_llm.py +++ b/promptolution/llms/api_llm.py @@ -5,15 +5,14 @@ from logging import INFO, Logger from typing import List +import nest_asyncio import openai import requests from langchain_anthropic import ChatAnthropic -from langchain_community.chat_models.deepinfra import ChatDeepInfraException +from langchain_community.chat_models.deepinfra import ChatDeepInfra, ChatDeepInfraException from langchain_core.messages import HumanMessage from langchain_openai import ChatOpenAI -from promptolution.llms.deepinfra import ChatDeepInfra - logger = Logger(__name__) logger.setLevel(INFO) @@ -39,12 +38,12 @@ async def invoke_model(prompt, model, semaphore): while attempts < max_retries: try: - response = await asyncio.to_thread(model.invoke, [HumanMessage(content=prompt)]) + response = await model.ainvoke([HumanMessage(content=prompt)]) return response.content except ChatDeepInfraException as e: print(f"DeepInfra error: {e}. Attempt {attempts}/{max_retries}. Retrying in {delay} seconds...") attempts += 1 - time.sleep(delay) + await asyncio.sleep(delay) class APILLM: @@ -59,29 +58,25 @@ class APILLM: Methods: get_response: Synchronously get responses for a list of prompts. - _get_response: Asynchronously get responses for a list of prompts. + get_response_async: Asynchronously get responses for a list of prompts. """ - def __init__(self, model_id: str): + def __init__(self, model_id: str, token: str = None): """Initialize the APILLM with a specific model. Args: model_id (str): Identifier for the model to use. + token (str): API key for the model. Raises: ValueError: If an unknown model identifier is provided. """ if "claude" in model_id: - ANTHROPIC_API_KEY = open("anthropictoken.txt", "r").read() - self.model = ChatAnthropic(model=model_id, api_key=ANTHROPIC_API_KEY) + self.model = ChatAnthropic(model=model_id, api_key=token) elif "gpt" in model_id: - OPENAI_API_KEY = open("openaitoken.txt", "r").read() - self.model = ChatOpenAI(model=model_id, api_key=OPENAI_API_KEY) - elif "llama" in model_id: - DEEPINFRA_API_KEY = open("deepinfratoken.txt", "r").read() - self.model = ChatDeepInfra(model_name=model_id, deepinfra_api_token=DEEPINFRA_API_KEY) + self.model = ChatOpenAI(model=model_id, api_key=token) else: - raise ValueError(f"Unknown model: {model_id}") + self.model = ChatDeepInfra(model_name=model_id, deepinfra_api_token=token) def get_response(self, prompts: List[str]) -> List[str]: """Get responses for a list of prompts in a synchronous manner. @@ -101,9 +96,11 @@ def get_response(self, prompts: List[str]) -> List[str]: delay = 3 attempts = 0 + nest_asyncio.apply() + while attempts < max_retries: try: - responses = asyncio.run(self._get_response(prompts)) + responses = asyncio.run(self.get_response_async(prompts)) return responses except requests.exceptions.ConnectionError as e: attempts += 1 @@ -121,7 +118,7 @@ def get_response(self, prompts: List[str]) -> List[str]: # If the loop exits, it means max retries were reached raise requests.exceptions.ConnectionError("Max retries exceeded. 
Connection could not be established.") - async def _get_response(self, prompts: list[str], max_concurrent_calls=200) -> list[str]: + async def get_response_async(self, prompts: list[str], max_concurrent_calls=200) -> list[str]: """Asynchronously get responses for a list of prompts. This method uses a semaphore to limit the number of concurrent API calls. @@ -133,7 +130,7 @@ async def _get_response(self, prompts: list[str], max_concurrent_calls=200) -> l Returns: list[str]: List of model responses. """ - semaphore = asyncio.Semaphore(max_concurrent_calls) # Limit the number of concurrent calls + semaphore = asyncio.Semaphore(max_concurrent_calls) tasks = [] for prompt in prompts: diff --git a/promptolution/llms/deepinfra.py b/promptolution/llms/deepinfra.py deleted file mode 100644 index d91603c..0000000 --- a/promptolution/llms/deepinfra.py +++ /dev/null @@ -1,311 +0,0 @@ -"""DeepInfra API module for language models.""" - -from __future__ import annotations - -from typing import Any, AsyncIterator, Callable, Dict, Iterator, List, Mapping, Optional, Sequence, Tuple, Type, Union - -from langchain_community.chat_models.deepinfra import ( - ChatDeepInfraException, - _convert_dict_to_message, - _convert_message_to_dict, - _create_retry_decorator, - _handle_sse_line, - _parse_stream, - _parse_stream_async, -) -from langchain_community.utilities.requests import Requests -from langchain_core.callbacks.manager import AsyncCallbackManagerForLLMRun, CallbackManagerForLLMRun -from langchain_core.language_models import LanguageModelInput -from langchain_core.language_models.chat_models import BaseChatModel, agenerate_from_stream, generate_from_stream -from langchain_core.messages import BaseMessage -from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult -from langchain_core.pydantic_v1 import BaseModel, Field, root_validator -from langchain_core.runnables import Runnable -from langchain_core.tools import BaseTool -from langchain_core.utils import get_from_dict_or_env -from langchain_core.utils.function_calling import convert_to_openai_tool - - -class ChatDeepInfra(BaseChatModel): - """A chat model that uses the DeepInfra API.""" - - # client: Any #: :meta private: - model_name: str = Field(alias="model") - """The model name to use for the chat model.""" - deepinfra_api_token: Optional[str] = None - request_timeout: Optional[float] = Field(default=None, alias="timeout") - temperature: Optional[float] = 1 - model_kwargs: Dict[str, Any] = Field(default_factory=dict) - """Run inference with this temperature. Must be in the closed - interval [0.0, 1.0].""" - top_p: Optional[float] = None - """Decode using nucleus sampling: consider the smallest set of tokens whose - probability sum is at least top_p. Must be in the closed interval [0.0, 1.0].""" - top_k: Optional[int] = None - """Decode using top-k sampling: consider the set of top_k most probable tokens. - Must be positive.""" - n: int = 1 - """Number of chat completions to generate for each prompt. 
Note that the API may - not return the full n completions if duplicates are generated.""" - max_tokens: int = 256 - streaming: bool = False - max_retries: int = 1 - - def __init__(self, model_name: str, **kwargs: Any): - """Initialize the DeepInfra chat model.""" - super().__init__(model=model_name, **kwargs) - - @property - def _default_params(self) -> Dict[str, Any]: - """Get the default parameters for calling OpenAI API.""" - return { - "model": self.model_name, - "max_tokens": self.max_tokens, - "stream": self.streaming, - "n": self.n, - "temperature": self.temperature, - "request_timeout": self.request_timeout, - **self.model_kwargs, - } - - @property - def _client_params(self) -> Dict[str, Any]: - """Get the parameters used for the openai client.""" - return {**self._default_params} - - def completion_with_retry(self, run_manager: Optional[CallbackManagerForLLMRun] = None, **kwargs: Any) -> Any: - """Use tenacity to retry the completion call.""" - retry_decorator = _create_retry_decorator(self, run_manager=run_manager) - - @retry_decorator - def _completion_with_retry(**kwargs: Any) -> Any: - try: - request_timeout = kwargs.pop("request_timeout") - request = Requests(headers=self._headers()) - response = request.post(url=self._url(), data=self._body(kwargs), timeout=request_timeout) - self._handle_status(response.status_code, response.text) - return response - except Exception as e: - # import pdb; pdb.set_trace() - print("EX", e) # noqa: T201 - raise - - return _completion_with_retry(**kwargs) - - async def acompletion_with_retry( - self, - run_manager: Optional[AsyncCallbackManagerForLLMRun] = None, - **kwargs: Any, - ) -> Any: - """Use tenacity to retry the async completion call.""" - retry_decorator = _create_retry_decorator(self, run_manager=run_manager) - - @retry_decorator - async def _completion_with_retry(**kwargs: Any) -> Any: - try: - request_timeout = kwargs.pop("request_timeout") - request = Requests(headers=self._headers()) - async with request.apost(url=self._url(), data=self._body(kwargs), timeout=request_timeout) as response: - self._handle_status(response.status, response.text) - return await response.json() - except Exception as e: - print("EX", e) # noqa: T201 - raise - - return await _completion_with_retry(**kwargs) - - @root_validator(pre=True) - def init_defaults(cls, values: Dict) -> Dict: - """Validate api key, python package exists, temperature, top_p, and top_k.""" - # For compatibility with LiteLLM - api_key = get_from_dict_or_env( - values, - "deepinfra_api_key", - "DEEPINFRA_API_KEY", - default="", - ) - values["deepinfra_api_token"] = get_from_dict_or_env( - values, - "deepinfra_api_token", - "DEEPINFRA_API_TOKEN", - default=api_key, - ) - # set model id - # values["model_name"] = get_from_dict_or_env( - # values, - # "model_name", - # "DEEPINFRA_MODEL_NAME", - # default="", - # ) - return values - - @root_validator(pre=False, skip_on_failure=True) - def validate_environment(cls, values: Dict) -> Dict: - """Validate the environment variables.""" - if values["temperature"] is not None and not 0 <= values["temperature"] <= 1: - raise ValueError("temperature must be in the range [0.0, 1.0]") - - if values["top_p"] is not None and not 0 <= values["top_p"] <= 1: - raise ValueError("top_p must be in the range [0.0, 1.0]") - - if values["top_k"] is not None and values["top_k"] <= 0: - raise ValueError("top_k must be positive") - - return values - - def _generate( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: 
Optional[CallbackManagerForLLMRun] = None, - stream: Optional[bool] = None, - **kwargs: Any, - ) -> ChatResult: - should_stream = stream if stream is not None else self.streaming - if should_stream: - stream_iter = self._stream(messages, stop=stop, run_manager=run_manager, **kwargs) - return generate_from_stream(stream_iter) - - message_dicts, params = self._create_message_dicts(messages, stop) - params = {**params, **kwargs} - response = self.completion_with_retry(messages=message_dicts, run_manager=run_manager, **params) - return self._create_chat_result(response.json()) - - def _create_chat_result(self, response: Mapping[str, Any]) -> ChatResult: - generations = [] - for res in response["choices"]: - message = _convert_dict_to_message(res["message"]) - gen = ChatGeneration( - message=message, - generation_info=dict(finish_reason=res.get("finish_reason")), - ) - generations.append(gen) - token_usage = response.get("usage", {}) - llm_output = {"token_usage": token_usage, "model": self.model_name} - res = ChatResult(generations=generations, llm_output=llm_output) - return res - - def _create_message_dicts( - self, messages: List[BaseMessage], stop: Optional[List[str]] - ) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]: - params = self._client_params - if stop is not None: - if "stop" in params: - raise ValueError("`stop` found in both the input and default params.") - params["stop"] = stop - message_dicts = [_convert_message_to_dict(m) for m in messages] - return message_dicts, params - - def _stream( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: Optional[CallbackManagerForLLMRun] = None, - **kwargs: Any, - ) -> Iterator[ChatGenerationChunk]: - message_dicts, params = self._create_message_dicts(messages, stop) - params = {**params, **kwargs, "stream": True} - - response = self.completion_with_retry(messages=message_dicts, run_manager=run_manager, **params) - for line in _parse_stream(response.iter_lines()): - chunk = _handle_sse_line(line) - if chunk: - cg_chunk = ChatGenerationChunk(message=chunk, generation_info=None) - if run_manager: - run_manager.on_llm_new_token(str(chunk.content), chunk=cg_chunk) - yield cg_chunk - - async def _astream( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: Optional[AsyncCallbackManagerForLLMRun] = None, - **kwargs: Any, - ) -> AsyncIterator[ChatGenerationChunk]: - message_dicts, params = self._create_message_dicts(messages, stop) - params = {"messages": message_dicts, "stream": True, **params, **kwargs} - - request_timeout = params.pop("request_timeout") - request = Requests(headers=self._headers()) - async with request.apost(url=self._url(), data=self._body(params), timeout=request_timeout) as response: - async for line in _parse_stream_async(response.content): - chunk = _handle_sse_line(line) - if chunk: - cg_chunk = ChatGenerationChunk(message=chunk, generation_info=None) - if run_manager: - await run_manager.on_llm_new_token(str(chunk.content), chunk=cg_chunk) - yield cg_chunk - - async def _agenerate( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: Optional[AsyncCallbackManagerForLLMRun] = None, - stream: Optional[bool] = None, - **kwargs: Any, - ) -> ChatResult: - should_stream = stream if stream is not None else self.streaming - if should_stream: - stream_iter = self._astream(messages, stop=stop, run_manager=run_manager, **kwargs) - return await agenerate_from_stream(stream_iter) - - message_dicts, params = 
self._create_message_dicts(messages, stop) - params = {"messages": message_dicts, **params, **kwargs} - - res = await self.acompletion_with_retry(run_manager=run_manager, **params) - return self._create_chat_result(res) - - @property - def _identifying_params(self) -> Dict[str, Any]: - """Get the identifying parameters.""" - return { - "model": self.model_name, - "temperature": self.temperature, - "top_p": self.top_p, - "top_k": self.top_k, - "n": self.n, - } - - @property - def _llm_type(self) -> str: - return "deepinfra-chat" - - def _handle_status(self, code: int, text: Any) -> None: - if code >= 500: - raise ChatDeepInfraException(f"DeepInfra Server: Error {code}") - elif code >= 400: - raise ValueError(f"DeepInfra received an invalid payload: {text}") - elif code != 200: - raise Exception(f"DeepInfra returned an unexpected response with status " f"{code}: {text}") - - def _url(self) -> str: - return "https://stage.api.deepinfra.com/v1/openai/chat/completions" - - def _headers(self) -> Dict: - return { - "Authorization": f"bearer {self.deepinfra_api_token}", - "Content-Type": "application/json", - } - - def _body(self, kwargs: Any) -> Dict: - return kwargs - - def bind_tools( - self, - tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]], - **kwargs: Any, - ) -> Runnable[LanguageModelInput, BaseMessage]: - """Bind tool-like objects to this chat model. - - Assumes model is compatible with OpenAI tool-calling API. - - Args: - tools: A list of tool definitions to bind to this chat model. - Can be a dictionary, pydantic model, callable, or BaseTool. Pydantic - models, callables, and BaseTools will be automatically converted to - their schema dictionary representation. - **kwargs: Any additional parameters to pass to the - :class:`~langchain.runnable.Runnable` constructor. - """ - formatted_tools = [convert_to_openai_tool(tool) for tool in tools] - return super().bind(tools=formatted_tools, **kwargs) diff --git a/promptolution/optimizers/__init__.py b/promptolution/optimizers/__init__.py index 240447f..ae4ed93 100644 --- a/promptolution/optimizers/__init__.py +++ b/promptolution/optimizers/__init__.py @@ -1,24 +1,37 @@ """Module for prompt optimizers.""" +from promptolution.templates import ( + EVOPROMPT_DE_TEMPLATE, + EVOPROMPT_DE_TEMPLATE_TD, + EVOPROMPT_GA_TEMPLATE, + EVOPROMPT_GA_TEMPLATE_TD, + OPRO_TEMPLATE, +) + from .base_optimizer import DummyOptimizer from .evoprompt_de import EvoPromptDE from .evoprompt_ga import EvoPromptGA from .opro import Opro -def get_optimizer(config, *args, **kwargs): +def get_optimizer( + config=None, optimizer: str = None, include_task_desc: bool = None, meta_prompt: str = None, *args, **kwargs +): """Factory function to create and return an optimizer instance based on the provided configuration. This function selects and instantiates the appropriate optimizer class based on the - 'optimizer' field in the config object. It supports three types of optimizers: - 'dummy', 'evopromptde', and 'evopromptga'. + 'optimizer' field in the config object. Alternatively, you can pass the relevant parameters directly. + It supports four types of optimizers: 'dummy', 'evopromptde', 'evopromptga', and 'opro'. Args: - config: A configuration object that must have an 'optimizer' attribute. - For 'evopromptde', it should also have a 'donor_random' attribute. - For 'evopromptga', it should also have a 'selection_mode' attribute. + config (Config): Configuration object containing the optimizer type. + optimizer (str): Identifier for the optimizer to use.
Special cases: + - "dummy" for DummyOptimizer + - Any other string for the specified optimizer class + include_task_desc (bool): Flag to include task description in the prompt. + meta_prompt (str): Meta prompt for the optimizer. *args: Variable length argument list passed to the optimizer constructor. - **kwargs: Arbitrary keyword arguments passed to the optimizer constructor. + **kwargs: Arbitrary keyword arguments passed to the optimizer constructor. Returns: An instance of the specified optimizer class. @@ -28,10 +41,32 @@ def get_optimizer(config, *args, **kwargs): """ if config.optimizer == "dummy": return DummyOptimizer(*args, **kwargs) + + if optimizer is None: + optimizer = config.optimizer + + if include_task_desc is None: + include_task_desc = config.include_task_desc + + if config is not None and meta_prompt is None: + meta_prompt = config.meta_prompt + if config.optimizer == "evopromptde": - return EvoPromptDE(donor_random=config.donor_random, *args, **kwargs) + prompt_template = EVOPROMPT_DE_TEMPLATE_TD if include_task_desc else EVOPROMPT_DE_TEMPLATE + prompt_template = meta_prompt if meta_prompt else prompt_template + donor_random = kwargs.get("donor_random", config.donor_random if config is not None else None) + return EvoPromptDE(donor_random=donor_random, prompt_template=prompt_template, *args, **kwargs) + if config.optimizer == "evopromptga": - return EvoPromptGA(selection_mode=config.selection_mode, *args, **kwargs) + prompt_template = EVOPROMPT_GA_TEMPLATE_TD if include_task_desc else EVOPROMPT_GA_TEMPLATE + prompt_template = meta_prompt if meta_prompt else prompt_template + selection_mode = kwargs.get("selection_mode", config.selection_mode if config is not None else None) + return EvoPromptGA(selection_mode=selection_mode, prompt_template=prompt_template, *args, **kwargs) + if config.optimizer == "opro": - return Opro(*args, **kwargs) + prompt_template = OPRO_TEMPLATE + prompt_template = meta_prompt if meta_prompt else prompt_template + n_samples = kwargs.get("n_samples", config.n_ds_samples_to_meta if config is not None else None) + return Opro(prompt_template=prompt_template, n_samples=n_samples, *args, **kwargs) + raise ValueError(f"Unknown optimizer: {config.optimizer}") diff --git a/promptolution/optimizers/base_optimizer.py b/promptolution/optimizers/base_optimizer.py index effc329..2cac685 100644 --- a/promptolution/optimizers/base_optimizer.py +++ b/promptolution/optimizers/base_optimizer.py @@ -26,12 +26,20 @@ class BaseOptimizer(ABC): predictor (optional): Predictor for prompt evaluation. Defaults to None.
""" - def __init__(self, initial_prompts: list[str], task: BaseTask, callbacks: list[Callable] = [], predictor=None): + def __init__( + self, + initial_prompts: list[str], + task: BaseTask, + callbacks: list[Callable] = [], + predictor=None, + n_eval_samples=20, + ): """Initialize the BaseOptimizer.""" self.prompts = initial_prompts self.task = task self.callbacks = callbacks self.predictor = predictor + self.n_eval_samples = n_eval_samples @abstractmethod def optimize(self, n_steps: int) -> List[str]: diff --git a/promptolution/optimizers/evoprompt_de.py b/promptolution/optimizers/evoprompt_de.py index a01b457..17d74b3 100644 --- a/promptolution/optimizers/evoprompt_de.py +++ b/promptolution/optimizers/evoprompt_de.py @@ -4,6 +4,7 @@ import numpy as np +from promptolution.llms.base_llm import BaseLLM from promptolution.optimizers.base_optimizer import BaseOptimizer @@ -29,10 +30,11 @@ class EvoPromptDE(BaseOptimizer): **args: Additional arguments passed to the BaseOptimizer. """ - def __init__(self, prompt_template, meta_llm, donor_random=False, **args): + def __init__(self, prompt_template: str = None, meta_llm: BaseLLM = None, donor_random: bool = False, **args): """Initialize the EvoPromptDE optimizer.""" self.prompt_template = prompt_template self.donor_random = donor_random + assert meta_llm is not None, "A meta language model must be provided." self.meta_llm = meta_llm super().__init__(**args) @@ -49,7 +51,7 @@ def optimize(self, n_steps: int) -> List[str]: Returns: List[str]: The optimized list of prompts after all steps. """ - self.scores = self.task.evaluate(self.prompts, self.predictor) + self.scores = self.task.evaluate(self.prompts, self.predictor, subsample=True, n_samples=self.n_eval_samples) self.prompts = [prompt for _, prompt in sorted(zip(self.scores, self.prompts), reverse=True)] self.scores = sorted(self.scores, reverse=True) @@ -78,7 +80,9 @@ def optimize(self, n_steps: int) -> List[str]: child_prompts = self.meta_llm.get_response(meta_prompts) child_prompts = [prompt.split("")[-1].split("")[0].strip() for prompt in child_prompts] - child_scores = self.task.evaluate(child_prompts, self.predictor) + child_scores = self.task.evaluate( + child_prompts, self.predictor, subsample=True, n_samples=self.n_eval_samples + ) for i in range(len(self.prompts)): if child_scores[i] > self.scores[i]: diff --git a/promptolution/optimizers/evoprompt_ga.py b/promptolution/optimizers/evoprompt_ga.py index 749c0c5..2ec789b 100644 --- a/promptolution/optimizers/evoprompt_ga.py +++ b/promptolution/optimizers/evoprompt_ga.py @@ -4,6 +4,7 @@ import numpy as np +from promptolution.llms.base_llm import BaseLLM from promptolution.optimizers.base_optimizer import BaseOptimizer @@ -32,9 +33,10 @@ class EvoPromptGA(BaseOptimizer): AssertionError: If an invalid selection mode is provided. """ - def __init__(self, prompt_template, meta_llm, selection_mode="wheel", **args): + def __init__(self, prompt_template: str = None, meta_llm: BaseLLM = None, selection_mode: str = "wheel", **args): """Initialize the EvoPromptGA optimizer.""" self.prompt_template = prompt_template + assert meta_llm is not None, "Meta_llm is required" self.meta_llm = meta_llm assert selection_mode in ["random", "wheel", "tour"], "Invalid selection mode." self.selection_mode = selection_mode @@ -54,7 +56,9 @@ def optimize(self, n_steps: int) -> List[str]: List[str]: The optimized list of prompts after all steps. 
""" # get scores from task - self.scores = self.task.evaluate(self.prompts, self.predictor).tolist() + self.scores = self.task.evaluate( + self.prompts, self.predictor, subsample=True, n_samples=self.n_eval_samples + ).tolist() # sort prompts by score self.prompts = [prompt for _, prompt in sorted(zip(self.scores, self.prompts), reverse=True)] self.scores = sorted(self.scores, reverse=True) @@ -62,7 +66,12 @@ def optimize(self, n_steps: int) -> List[str]: for _ in range(n_steps): new_prompts = self._crossover(self.prompts, self.scores) prompts = self.prompts + new_prompts - scores = self.scores + self.task.evaluate(new_prompts, self.predictor).tolist() + scores = ( + self.scores + + self.task.evaluate( + new_prompts, self.predictor, subsample=True, n_samples=self.n_eval_samples + ).tolist() + ) # sort scores and prompts self.prompts = [prompt for _, prompt in sorted(zip(scores, prompts), reverse=True)][: len(self.prompts)] diff --git a/promptolution/optimizers/opro.py b/promptolution/optimizers/opro.py index 848342f..b2fa645 100644 --- a/promptolution/optimizers/opro.py +++ b/promptolution/optimizers/opro.py @@ -6,6 +6,7 @@ from promptolution.llms.base_llm import BaseLLM from promptolution.optimizers.base_optimizer import BaseOptimizer +from promptolution.templates import OPRO_TEMPLATE class Opro(BaseOptimizer): @@ -25,19 +26,21 @@ class Opro(BaseOptimizer): optimize: Optimize the Meta-LLM by providing it with a new prompt. """ - def __init__(self, llm: BaseLLM, n_samples: int = 2, **args): + def __init__(self, meta_llm: BaseLLM, n_samples: int = 2, prompt_template: str = None, **args): """Initialize the Opro optimizer.""" - self.llm = llm + self.meta_llm = meta_llm assert n_samples > 0, "n_samples must be greater than 0." self.n_samples = n_samples - with open("templates/opro_template.txt") as f: - self.meta_prompt = "".join(f.readlines()) + + self.meta_prompt = prompt_template if prompt_template else OPRO_TEMPLATE super().__init__(**args) self.meta_prompt = self.meta_prompt.replace("", self.task.description) - self.scores = [self.task.evaluate(p, self.predictor) for p in self.prompts] + self.scores = [ + self.task.evaluate(p, self.predictor, subsample=True, n_samples=self.n_eval_samples) for p in self.prompts + ] def _sample_examples(self): """Sample examples from the task dataset with their label. 
@@ -75,18 +78,15 @@ def optimize(self, n_steps: int) -> List[str]: "", self._sample_examples() ) - prompt = self.llm.get_response([meta_prompt])[0] + prompt = self.meta_llm.get_response([meta_prompt])[0] prompt = prompt.split("")[-1].split("")[0].strip() - score = self.task.evaluate(prompt, self.predictor) + score = self.task.evaluate(prompt, self.predictor, subsample=True, n_samples=self.n_eval_samples) self.prompts.append(prompt) self.scores.append(score) self._on_step_end() - # obtain best prompt - best_prompt = self.prompts[self.scores.index(max(self.scores))] - self._on_epoch_end() - return best_prompt + return self.prompts diff --git a/promptolution/predictors/__init__.py b/promptolution/predictors/__init__.py index 9ae2dfe..d850759 100644 --- a/promptolution/predictors/__init__.py +++ b/promptolution/predictors/__init__.py @@ -34,6 +34,6 @@ def get_predictor(name, *args, **kwargs): if name == "dummy": return DummyPredictor("", *args, **kwargs) - downstream_llm = get_llm(name) # , batch_size=config.downstream_bs) + downstream_llm = get_llm(name) return Classificator(downstream_llm, *args, **kwargs) diff --git a/promptolution/predictors/base_predictor.py b/promptolution/predictors/base_predictor.py index 941ee9a..eea7f74 100644 --- a/promptolution/predictors/base_predictor.py +++ b/promptolution/predictors/base_predictor.py @@ -1,10 +1,12 @@ """Base module for predictors.""" from abc import abstractmethod -from typing import List +from typing import List, Tuple import numpy as np +from promptolution.llms.base_llm import BaseLLM + class BasePredictor: """Abstract base class for predictors in the promptolution library. @@ -12,37 +14,30 @@ class BasePredictor: This class defines the interface that all concrete predictor implementations should follow. Attributes: - model_id (str): Identifier for the model used by the predictor. - classes (List[str]): List of possible class labels for classification tasks. + llm: The language model used for generating predictions. + Methods: predict: An abstract method that should be implemented by subclasses to make predictions based on prompts and input data. """ - def __init__(self, model_id, classes, *args, **kwargs): + def __init__(self, llm: BaseLLM): """Initialize the BasePredictor. Args: - model_id (str): Identifier for the model to use. - classes (List[str]): List of possible class labels. - *args: Variable length argument list. - **kwargs: Arbitrary keyword arguments. + llm: The language model to use for predictions. """ - self.model_id = model_id - self.classes = classes + self.llm = llm - @abstractmethod - def predict( - self, - prompts: List[str], - xs: np.ndarray, - ) -> np.ndarray: + def predict(self, prompts: List[str], xs: np.ndarray, return_seq: bool = False) -> np.ndarray: """Abstract method to make predictions based on prompts and input data. Args: prompts (List[str]): List of prompts to use for prediction. xs (np.ndarray): Array of input data. + return_seq (bool, optional): Whether to also return the generated sequences. Defaults to False. Returns: np.ndarray: Array of predictions. @@ -50,6 +45,24 @@ def predict( Raises: NotImplementedError: If not implemented by a subclass.
""" + if isinstance(prompts, str): + prompts = [prompts] + + outputs = self.llm.get_response([prompt + "\n" + x for prompt in prompts for x in xs]) + preds = self._extract_preds(outputs, (len(prompts), len(xs))) + + if return_seq: + return preds, [i + "\n" + o for i, o in zip(xs, outputs)] + + return preds + + def _extract_preds(self, preds: List[str], shape: Tuple[int, int]) -> np.ndarray: + """Extract class labels from the predictions, based on the list of valid class labels. + + Args: + preds: The raw predictions from the language model. + shape: The shape of the output array: (n_prompts, n_samples). + """ raise NotImplementedError diff --git a/promptolution/predictors/classificator.py b/promptolution/predictors/classificator.py index 43518eb..f33bfc6 100644 --- a/promptolution/predictors/classificator.py +++ b/promptolution/predictors/classificator.py @@ -1,6 +1,6 @@ """Module for classification predictors.""" -from typing import List +from typing import List, Tuple import numpy as np @@ -11,7 +11,10 @@ class Classificator(BasePredictor): """A predictor class for classification tasks using language models. This class takes a language model and a list of classes, and provides a method - to predict classes for given prompts and input data. + to predict classes for given prompts and input data. The class labels are extracted + by matching the words in the prediction with the list of valid class labels. + The first occurrence of a valid class label in the prediction is used as the predicted class. + If no valid class label is found, the first class label in the list is used as the default prediction. Attributes: llm: The language model used for generating predictions. @@ -28,46 +31,26 @@ def __init__(self, llm, classes, *args, **kwargs): llm: The language model to use for predictions. classes (List[str]): The list of valid class labels. """ - self.llm = llm + super().__init__(llm) self.classes = classes - def predict( - self, - prompts: List[str], - xs: np.ndarray, - ) -> np.ndarray: - """Predict classes for given prompts and input data. - - This method generates predictions using the language model and then - extracts the predicted class from the model's output. + def _extract_preds(self, preds: List[str], shape: Tuple[int, int]) -> np.ndarray: + """Extract class labels from the predictions, based on the list of valid class labels. Args: - prompts (List[str]): The list of prompts to use for prediction. - xs (np.ndarray): The input data array. - - Returns: - np.ndarray: A 2D array of predicted classes, with shape (len(prompts), len(xs)). - - Note: - The method concatenates each prompt with each input data point, - passes it to the language model, and then extracts the first word - in the response that matches a class in self.classes. + preds: The raw predictions from the language model. + shape: The shape of the output array: (n_prompts, n_samples). 
""" - if isinstance(prompts, str): - prompts = [prompts] - - preds = self.llm.get_response([prompt + "\n" + x for prompt in prompts for x in xs]) - response = [] for pred in preds: - predicted_class = "" + predicted_class = self.classes[0] # use first class as default pred for word in pred.split(" "): - word = "".join([c for c in word if c.isalpha()]) + word = "".join([c for c in word if c.isalnum()]) if word in self.classes: predicted_class = word break response.append(predicted_class) - response = np.array(response).reshape(len(prompts), len(xs)) + response = np.array(response).reshape(*shape) return response diff --git a/promptolution/tasks/__init__.py b/promptolution/tasks/__init__.py index 5e48b85..44d7f69 100644 --- a/promptolution/tasks/__init__.py +++ b/promptolution/tasks/__init__.py @@ -7,22 +7,28 @@ from promptolution.tasks.classification_tasks import ClassificationTask -def get_tasks(config, split: Literal["dev", "test"] = "dev") -> List[BaseTask]: - """Create and return a list of task instances based on the provided configuration. +def get_task( + config=None, + split: Literal["dev", "test"] = "dev", + ds_path: str = None, + task_name: str = None, + random_seed: int = None, +) -> BaseTask: + """Create and return an task instance. This function supports creating multiple tasks, including a special 'dummy' task - for testing purposes and classification tasks based on JSON descriptions. + for testing purposes and classification tasks based on parsed config, or alternativly + the parsed arguments. Args: - config: Configuration object containing task settings. - Expected attributes: - - task_name (str): Comma-separated list of task names. - - ds_path (str): Path to the dataset directory. - - random_seed (int): Seed for random number generation. - split (Literal["dev", "test"], optional): Dataset split to use. Defaults to "dev". + config (Config): Configuration object containing the task details. + split (str): Split of the dataset to use for the task (default: 'dev'). + ds_path (str): Path to the dataset containing the task description. + task_name (str): Name of the task to create. + random_seed (int): Random seed for the task. Returns: - List[BaseTask]: A list of instantiated task objects. + BaseTask: A list of instantiated task objects. Raises: FileNotFoundError: If the task description file is not found. @@ -33,17 +39,17 @@ def get_tasks(config, split: Literal["dev", "test"] = "dev") -> List[BaseTask]: - For all other tasks, a ClassificationTask instance is created. - The task description is loaded from a 'description.json' file in the dataset path. 
""" - task_names = config.task_name.split(",") - - task_list = [] - for task_name in task_names: - task_description_path = Path(config.ds_path) / Path("description.json") - task_description = json.loads(task_description_path.read_text()) - if task_name == "dummy": - task = DummyTask() - task_list.append(task) - continue - task = ClassificationTask(task_name, task_description, split=split, seed=config.random_seed) - task_list.append(task) - - return task_list + if config.task_name == "dummy": + task = DummyTask() + return task + + if ds_path is None: + ds_path = config.ds_path + if task_name is None: + task_name = config.task_name + if random_seed is None: + random_seed = config.random_seed + task_description_path = Path(ds_path) + task = ClassificationTask(task_description_path, task_name, split=split, seed=random_seed) + + return task diff --git a/promptolution/tasks/classification_tasks.py b/promptolution/tasks/classification_tasks.py index e7e25b1..f37deec 100644 --- a/promptolution/tasks/classification_tasks.py +++ b/promptolution/tasks/classification_tasks.py @@ -1,9 +1,11 @@ """Module for classification tasks.""" +import json from pathlib import Path -from typing import Dict, List, Literal, Optional +from typing import Callable, Dict, List, Literal, Optional import numpy as np +from sklearn.metrics import accuracy_score from promptolution.predictors.base_predictor import BasePredictor from promptolution.tasks.base_task import BaseTask @@ -17,36 +19,48 @@ class ClassificationTask(BaseTask): Attributes: task_id (str): Unique identifier for the task. + path (Path): Path to the dataset description JSON file, and initial prompts. dataset_json (Dict): Dictionary containing dataset information. description (Optional[str]): Description of the task. initial_population (Optional[List[str]]): Initial set of prompts. xs (Optional[np.ndarray]): Input data for the task. ys (Optional[np.ndarray]): Ground truth labels for the task. classes (Optional[List]): List of possible class labels. - split (Literal["dev", "test"]): Dataset split to use. seed (int): Random seed for reproducibility. + split (Literal["dev", "test"]): Dataset split to use. + metric (Callable): Metric to use as an evaluation score for the prompts. Inherits from: BaseTask: The base class for tasks in the promptolution library. """ - def __init__(self, task_id: str, dataset_json: Dict, seed: int = 42, split: Literal["dev", "test"] = "dev"): + def __init__( + self, + dataset_path: Path, + task_id: str = "Classification Task", + seed: int = 42, + split: Literal["dev", "test"] = "dev", + metric: Callable = accuracy_score, + ): """Initialize the ClassificationTask. Args: task_id (str): Unique identifier for the task. - dataset_json (Dict): Dictionary containing dataset information. + dataset_path (str): Path to the dataset description JSON file. seed (int, optional): Random seed for reproducibility. Defaults to 42. split (Literal["dev", "test"], optional): Dataset split to use. Defaults to "dev". + metric (Callable): Metric to use as an evaluation score for the prompts. Defaults to sklearn's accuracy. 
""" self.task_id: str = task_id - self.dataset_json: Dict = dataset_json + self.path: Path = dataset_path + self.dataset_json: Dict = json.loads((dataset_path / Path("description.json")).read_text()) self.description: Optional[str] = None self.initial_population: Optional[List[str]] = None self.xs: Optional[np.ndarray] = np.array([]) self.ys: Optional[np.ndarray] = None self.classes: Optional[List] = None self.split: Literal["dev", "test"] = split + self.metric = metric self._parse_task() self.reset_seed(seed) @@ -60,18 +74,17 @@ def _parse_task(self): This method loads the task description, classes, initial prompts, and the dataset split (dev or test) into the class attributes. """ - task_path = Path(self.dataset_json["path"]) self.description = self.dataset_json["description"] self.classes = self.dataset_json["classes"] - with open(task_path / Path(self.dataset_json["init_prompts"]), "r", encoding="utf-8") as file: + with open(self.path / Path(self.dataset_json["init_prompts"]), "r", encoding="utf-8") as file: lines = file.readlines() self.initial_population = [line.strip() for line in lines] seed = Path(self.dataset_json["seed"]) split = Path(self.split + ".txt") - with open(task_path / seed / split, "r", encoding="utf-8") as file: + with open(self.path / seed / split, "r", encoding="utf-8") as file: lines = file.readlines() lines = [line.strip() for line in lines] @@ -87,7 +100,12 @@ def _parse_task(self): self.ys = np.array(ys) def evaluate( - self, prompts: List[str], predictor: BasePredictor, n_samples: int = 20, subsample: bool = True + self, + prompts: List[str], + predictor: BasePredictor, + n_samples: int = 20, + subsample: bool = False, + return_seq: bool = False, ) -> np.ndarray: """Evaluate a set of prompts using a given predictor. @@ -95,7 +113,9 @@ def evaluate( prompts (List[str]): List of prompts to evaluate. predictor (BasePredictor): Predictor to use for evaluation. n_samples (int, optional): Number of samples to use if subsampling. Defaults to 20. - subsample (bool, optional): Whether to use subsampling. Defaults to True. + subsample (bool, optional): Whether to use subsampling. + If set to true, samples a different subset per call. Defaults to False. + return_seq (bool, optional): whether to return the generating sequence Returns: np.ndarray: Array of accuracy scores for each prompt. @@ -112,10 +132,17 @@ def evaluate( ys_subsample = self.ys[indices] # Make predictions on the subsample - preds = predictor.predict(prompts, xs_subsample) + preds = predictor.predict(prompts, xs_subsample, return_seq=return_seq) + + if return_seq: + preds, seqs = preds + + scores = np.array([self.metric(ys_subsample, pred) for pred in preds]) + + if return_seq: + return scores, seqs - # Calculate accuracy: number of correct predictions / total number of predictions per prompt - return np.mean(preds == ys_subsample, axis=1) + return scores def reset_seed(self, seed: int = None): """Reset the random seed.""" diff --git a/promptolution/templates.py b/promptolution/templates.py new file mode 100644 index 0000000..05d7ae3 --- /dev/null +++ b/promptolution/templates.py @@ -0,0 +1,116 @@ +EVOPROMPT_DE_TEMPLATE = """Please follow the instruction step-by-step to generate a better prompt. +Identifying the different parts between Prompt 1 and Prompt 2: +Prompt 1: Your task is to classify the comment as one of the following categories: terrible, bad, okay, good, great. +Prompt 2: In this task, you are given sentences from movie reviews. 
The task is to classify a sentence as one of the following categories: terrible, bad, okay, good, great. +Different parts: +"Your task is to classify the comment" vs "In this task, you are given sentences from movie reviews. The task is to classify a sentence" +"comment" vs "sentences from movie reviews" + +2. Randomly mutate the different parts: +"Your task is to classify the comment" -> "The objective is to categorize the statement" +"comment" -> "phrases in movie reviews" + +3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and : +Prompt 3: You are a sentiment classifier. To do this, you must first understand the meaning of the sentence and any relevant context. And then you should classify it as one of the following categories: terrible, bad, okay, good, great. + +Final Prompt: As a sentiment classifier, analyze phrases in movie reviews and categorize them into one of the following categories: terrible, bad, okay, good, great, while considering the meaning and relevant context. + +Please follow the instruction step-by-step to generate a better prompt. +1. Identify the different parts between the Prompt 1 and Prompt 2: +Prompt 1: +Prompt 2: +2. Randomly mutate the different parts +3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and : +Prompt 3: + +1.""" + +EVOPROMPT_DE_TEMPLATE_TD = """Please follow the instruction step-by-step to generate a better prompt for the following task: The dataset consists of movie reviews with five levels of sentiment labels: terrible, bad, okay, good, and great. The task is to classify each movie review into one of these five sentiment categories. The class mentioned first in the response of the LLM will be the prediction. +Identifying the different parts between Prompt 1 and Prompt 2: +Prompt 1: Your task is to classify the comment as one of the following categories: terrible, bad, okay, good, great. +Prompt 2: In this task, you are given sentences from movie reviews. The task is to classify a sentence as one of the following categories: terrible, bad, okay, good, great. +Different parts: +"Your task is to classify the comment" vs "In this task, you are given sentences from movie reviews. The task is to classify a sentence" +"comment" vs "sentences from movie reviews" + +2. Randomly mutate the different parts: +"Your task is to classify the comment" -> "The objective is to categorize the statement" +"comment" -> "phrases in movie reviews" + +3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and : +Prompt 3: You are a sentiment classifier. To do this, you must first understand the meaning of the sentence and any relevant context. And then you should classify it as one of the following categories: terrible, bad, okay, good, great. + +Final Prompt: As a sentiment classifier, analyze phrases in movie reviews and categorize them into one of the following categories: terrible, bad, okay, good, great, while considering the meaning and relevant context. + +Please follow the instruction step-by-step to generate a better prompt for the following task: +1. Identify the different parts between the Prompt 1 and Prompt 2: +Prompt 1: +Prompt 2: +2. Randomly mutate the different parts +3.
Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and : +Prompt 3: + +1.""" + +EVOPROMPT_GA_TEMPLATE = """Please follow the instruction step-by-step to generate a better prompt. +1. Crossover the following prompts and generate a new prompt: +Prompt 1: Rewrite the input text into simpler text. +Prompt 2: Rewrite my complex sentence in simpler terms, but keep the meaning. +2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and . + +1. Crossover Prompt: Rewrite the complex text into simpler text while keeping its meaning. +2. Transform the provided text into simpler language, maintaining its essence. + +Please follow the instruction step-by-step to generate a better prompt. +1. Crossover the following prompts and generate a new prompt: +Prompt 1: +Prompt 2: +2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and . + +1.""" + +EVOPROMPT_GA_TEMPLATE_TD = """Please follow the instruction step-by-step to generate a better prompt for the following task: The dataset consists of texts to be simplified. The meaning of the texts is to be kept. +1. Crossover the following prompts and generate a new prompt: +Prompt 1: Rewrite the input text into simpler text. +Prompt 2: Rewrite my complex sentence in simpler terms, but keep the meaning. +2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and . + +1. Crossover Prompt: Rewrite the complex text into simpler text while keeping its meaning. +2. Transform the provided text into simpler language, maintaining its essence. + +Please follow the instruction step-by-step to generate a better prompt for the following task: +1. Crossover the following prompts and generate a new prompt: +Prompt 1: +Prompt 2: +2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and . + +1.""" + +OPRO_TEMPLATE = """Your task is to generate an instruction for the following task: + + +Below are some previous instructions with their scores. The score ranges from 0 to 100. + + + +Here are some examples of the target dataset: + + +Generate a new instruction bracketed with and ending it with that is different from all the instructions above and has a higher score than all the instructions above. The instruction should be concise, effective, and generally applicable to the task described. + +Your new instruction:""" + +PROMPT_VARIATION_TEMPLATE = """Generate a single variation of the following instruction while keeping the semantic meaning. +Generate the variation starting with and ending with tags. + +Input: + +Output:""" + +PROMPT_CREATION_TEMPLATE = """You are asked to give the corresponding prompt that gives the following outputs given these inputs. +Return it starting with and ending with tags. +Include the name of the output classes in the prompt. 
+ + + +The instruction was""" diff --git a/promptolution/utils/prompt_creation.py b/promptolution/utils/prompt_creation.py index b93f932..d85edd9 100644 --- a/promptolution/utils/prompt_creation.py +++ b/promptolution/utils/prompt_creation.py @@ -7,6 +7,7 @@ from promptolution.llms.base_llm import BaseLLM from promptolution.tasks.base_task import BaseTask from promptolution.tasks.classification_tasks import ClassificationTask +from promptolution.templates import PROMPT_CREATION_TEMPLATE, PROMPT_VARIATION_TEMPLATE def create_prompt_variation(prompt: Union[List[str], str], llm: BaseLLM, meta_prompt: str = None) -> List[str]: @@ -23,13 +24,7 @@ def create_prompt_variation(prompt: Union[List[str], str], llm: BaseLLM, meta_pr Returns: List[str]: A list of generated variations of the input prompt(s). """ - if meta_prompt is None: - meta_prompt = """Generate a single variation of the following instruction while keeping the semantic meaning. - Generate the variation starting with and ending with tags. - - Input: - - Output:""" + meta_prompt = PROMPT_VARIATION_TEMPLATE if meta_prompt is None else meta_prompt if isinstance(prompt, str): prompt = [prompt] @@ -81,17 +76,11 @@ def create_prompts_from_samples(task: BaseTask, llm: BaseLLM, meta_prompt: str = xs = task.xs[indices].tolist() ys = task.ys[indices].tolist() - if meta_prompt is None: - meta_prompt = ( - "You are asked to give the corresponding prompt that gives the following outputs given these inputs." - + "Return it starting with and ending with tags." - + "Include the name of the output classes in the prompt." - ) - - for x, y in zip(xs, ys): - meta_prompt += f"\n\nInput: {x}\nOutput: {y}" - - meta_prompt += "\nThe instruction was" + meta_prompt = PROMPT_CREATION_TEMPLATE if meta_prompt is None else meta_prompt + examples = "\n\n".join([f"Input: {x}\nOutput: {y}" for x, y in zip(xs, ys)]) + meta_prompt = meta_prompt.replace("")[0].split("")[-1] + + return prompt diff --git a/pyproject.toml b/pyproject.toml index cfd10c6..8635296 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "promptolution" -version = "0.2.0" +version = "1.0.0" description = "" authors = ["Tom Zehle, Moritz Schlager, Timo Heiß"] readme = "README.md" @@ -14,12 +14,11 @@ langchain-core = "^0.2.29" langchain-community = "^0.2.12" pandas = "^2.2.2" tqdm = "^4.66.5" -matplotlib = "^3.9.2" -seaborn = "^0.13.2" - - +scikit-learn = "^1.5.2" [tool.poetry.group.dev.dependencies] +matplotlib = "^3.9.2" +seaborn = "^0.13.2" transformers = "^4.44.0" black = "^24.4.2" flake8 = "^7.1.0" diff --git a/scripts/experiment_evaluation.py b/scripts/experiment_evaluation.py index e3ef1b5..027b2dd 100644 --- a/scripts/experiment_evaluation.py +++ b/scripts/experiment_evaluation.py @@ -10,7 +10,7 @@ from promptolution.config import Config from promptolution.predictors import get_predictor -from promptolution.tasks import get_tasks +from promptolution.tasks import get_task logger = Logger(__name__) logger.setLevel(INFO) @@ -53,7 +53,7 @@ def evaluate_best_prompts( ) # create a test task to retrieve the samples to evaluate - test_task = get_tasks(config, split="test")[0] + test_task = get_task(config, split="test") test_predictor = get_predictor(downstream_llm, classes=test_task.classes) # evaluate the best prompt on the test set @@ -85,7 +85,6 @@ def main(): args = arg_parser.parse_args() all_configs = ConfigParser() all_configs.read(args.experiment) - print(all_configs) experiment_name = all_configs["experiment"]["name"] target_experiment = 
all_configs["target_experiment"]["name"] diff --git a/scripts/experiment_initial_prompts.py b/scripts/experiment_initial_prompts.py index 0645008..b05ed0e 100644 --- a/scripts/experiment_initial_prompts.py +++ b/scripts/experiment_initial_prompts.py @@ -10,7 +10,7 @@ from promptolution.config import Config from promptolution.predictors import get_predictor -from promptolution.tasks import get_tasks +from promptolution.tasks import get_task logger = Logger(__name__) logger.setLevel(INFO) @@ -28,12 +28,12 @@ def evaluate_prompts( # create config for the experiment config = Config( task_name=task_name, - ds_path=f"data_sets/cls/{task_name}", + ds_path=f"data_sets/{task_name}", random_seed=random_seed, ) # create a test task to retrieve the samples to evaluate - test_task = get_tasks(config, split="test")[0] + test_task = get_task(config, split="test") test_predictor = get_predictor(downstream_llm, classes=test_task.classes) # evaluate the best prompt on the test set @@ -96,7 +96,7 @@ def main(): for task in tasks: # sample initial prompt (read txt from datasets/cls/task_name/prompts.txt) - task_path = Path(f"data_sets/cls/{task}/prompts.txt") + task_path = Path(f"data_sets/{task}/prompts.txt") with open(task_path, "r", encoding="utf-8") as file: lines = file.readlines() lines = [line.strip() for line in lines] diff --git a/scripts/experiment_runs.py b/scripts/experiment_runs.py index 7f13194..fb17b25 100644 --- a/scripts/experiment_runs.py +++ b/scripts/experiment_runs.py @@ -12,7 +12,8 @@ from promptolution.llms import get_llm from promptolution.optimizers import get_optimizer from promptolution.predictors import get_predictor -from promptolution.tasks import get_tasks +from promptolution.tasks import get_task +from promptolution.templates import EVOPROMPT_DE_TEMPLATE, EVOPROMPT_GA_TEMPLATE, EVOPROMPT_DE_TEMPLATE_TD, EVOPROMPT_GA_TEMPLATE_TD logger = Logger(__name__) logger.setLevel(INFO) @@ -38,14 +39,17 @@ def main(): for evaluator_llm, meta_llm in zip(evaluator_llms, meta_llms): for downstream_llm in downstream_llms: for random_seed in [42, 47, 69]: + if "task_desc" in meta_prompt_path: + prompt_template = EVOPROMPT_DE_TEMPLATE_TD if "evopromptde" in optimizer_name else EVOPROMPT_GA_TEMPLATE_TD + else: + prompt_template = EVOPROMPT_DE_TEMPLATE if "evopromptde" in optimizer_name else EVOPROMPT_GA_TEMPLATE config = Config( task_name=task_name, - ds_path=f"data_sets/cls/{task_name}", + ds_path=f"data_sets/{task_name}", n_steps=int(all_configs["task"]["steps"]), optimizer=optimizer_name, meta_llm=meta_llm, downstream_llm=downstream_llm, - meta_prompt_path=meta_prompt_path, init_pop_size=int(all_configs["optimizer"]["init_population"]), logging_dir=( f"logs/{experiment_name}/" @@ -56,6 +60,7 @@ def main(): evaluation_llm=evaluator_llm, selection_mode="random", donor_random=False, + meta_prompt=prompt_template, ) # skip already performed experiments if Path(config.logging_dir).exists(): @@ -65,7 +70,8 @@ def main(): def run_experiment(config: Config): """Run a single experiment.""" - task = get_tasks(config)[0] + task = get_task(config, split="dev") + init_populations = task.initial_population # subsample using random seed np.random.seed(config.random_seed) @@ -80,7 +86,7 @@ def run_experiment(config: Config): best_prompt_callback, ProgressBarCallback(config.n_steps), ] - prompt_template = open(config.meta_prompt_path, "r").read() + prompt_template = config.meta_prompt prompt_template = prompt_template.replace("", task.description) if "local" in config.meta_llm: @@ -94,7 +100,6 @@ def 
run_experiment(config: Config): task=task, initial_prompts=init_population, callbacks=callbacks, - prompt_template=prompt_template, predictor=predictor, ) @@ -104,7 +109,7 @@ def run_experiment(config: Config): best_prompt, best_score = best_prompt_callback.get_best_prompt() logger.critical(f"Final prompt: {best_prompt}, with score: {best_score}") - test_task = get_tasks(config, split="test")[0] + test_task = get_task(config.ds_path, split="test", random_seed=config.random_seed, task_name=config.task_name) test_predictor = get_predictor(config.downstream_llm, classes=test_task.classes) test_score = test_task.evaluate(best_prompt, test_predictor, subsample=False) diff --git a/scripts/opro_test_run.py b/scripts/opro_test_run.py index 52985e9..474af3e 100644 --- a/scripts/opro_test_run.py +++ b/scripts/opro_test_run.py @@ -1,28 +1,34 @@ """Test run for the Opro optimizer.""" -from configparser import ConfigParser from logging import Logger from promptolution.callbacks import LoggerCallback from promptolution.llms import get_llm from promptolution.optimizers import Opro from promptolution.predictors import get_predictor -from promptolution.tasks import get_tasks +from promptolution.tasks import get_task + +from promptolution.config import Config logger = Logger(__name__) def main(): """Run a test run for the Opro optimizer.""" - config = ConfigParser() - config.task_name = "agnews" - config.ds_path = "data_sets/cls/agnews" - config.random_seed = 42 + config = Config( + meta_llm="meta-llama/Meta-Llama-3-8B-Instruct", + ds_path="data_sets/agnews", + task_name="agnews", + n_steps=10, + optimizer="opro", + downstream_llm="meta-llama/Meta-Llama-3-8B-Instruct", + evaluation_llm="meta-llama/Meta-Llama-3-8B-Instruct", - llm = get_llm("meta-llama/Meta-Llama-3-8B-Instruct") - task = get_tasks(config)[0] - predictor = get_predictor("meta-llama/Meta-Llama-3-8B-Instruct", classes=task.classes) + ) + task = get_task(config, split="dev") + predictor = get_predictor(config.evaluation_llm, classes=task.classes) + llm = get_llm(config.meta_llm) optimizer = Opro( llm, initial_prompts=task.initial_population, diff --git a/scripts/prompt_creation_run.py b/scripts/prompt_creation_run.py index ecfc1d2..4c17694 100644 --- a/scripts/prompt_creation_run.py +++ b/scripts/prompt_creation_run.py @@ -5,7 +5,7 @@ from promptolution.llms import get_llm from promptolution.predictors import get_predictor -from promptolution.tasks import get_tasks +from promptolution.tasks import get_task from promptolution.utils.prompt_creation import create_prompt_variation, create_prompts_from_samples logger = Logger(__name__) @@ -14,12 +14,13 @@ def main(): """Main function to run the experiment.""" config = ConfigParser() - config.task_name = "subj" - config.ds_path = "data_sets/cls/subj" + config.task_name = "agnews" + config.ds_path = "data_sets/agnews" config.random_seed = 42 llm = get_llm("meta-llama/Meta-Llama-3-8B-Instruct") - task = get_tasks(config)[0] + task = get_task(config, split="dev") + predictor = get_predictor("meta-llama/Meta-Llama-3-8B-Instruct", classes=task.classes) init_prompts = create_prompts_from_samples(task, llm) diff --git a/templates/evoprompt_de_template.txt b/templates/evoprompt_de_template.txt deleted file mode 100644 index b229a9c..0000000 --- a/templates/evoprompt_de_template.txt +++ /dev/null @@ -1,26 +0,0 @@ -Please follow the instruction step-by-step to generate a better prompt. 
diff --git a/templates/evoprompt_de_template.txt b/templates/evoprompt_de_template.txt
deleted file mode 100644
index b229a9c..0000000
--- a/templates/evoprompt_de_template.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-Please follow the instruction step-by-step to generate a better prompt.
-Identifying the different parts between Prompt 1 and Prompt 2:
-Prompt 1: Your task is to classify the comment as one of the following categories: terrible, bad, okay, good, great.
-Prompt 2: In this task, you are given sentences from movie reviews. The task is to classify a sentence as one of the following categories: terrible, bad, okay, good, great.
-Different parts:
-"Your task is to classify the comment" vs "In this task, you are given sentences from movie reviews. The task is to classify a sentence"
-"comment" vs "sentences from movie reviews"
-
-2. Randomly mutate the different parts:
-"Your task is to classify the comment" -> "The objective is to categorize the statement"
-"comment" -> "phrases in movie reviews"
-
-3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and :
-Prompt 3: You are a sentiment classifier. To do this, you must first understand the meaning of the sentence and any relevant context. And then you should classify it as one of the following categories: terrible, bad, okay, good, great.
-
-Final Prompt: As a sentiment classifier, analyze phrases in movie reviews and categorize them into one of the following categories: terrible, bad, okay, good, great, while considering the meaning and relevant context.
-
-Please follow the instruction step-by-step to generate a better prompt.
-1. Identify the different parts between the Prompt 1 and Prompt 2:
-Prompt 1:
-Prompt 2:
-2. Randomly mutate the different parts
-3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and :
-Prompt 3:
-
-1.
\ No newline at end of file
diff --git a/templates/evoprompt_de_template_task_desc.txt b/templates/evoprompt_de_template_task_desc.txt
deleted file mode 100644
index c8895a1..0000000
--- a/templates/evoprompt_de_template_task_desc.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-Please follow the instruction step-by-step to generate a better prompt for the following task: .
-Identifying the different parts between Prompt 1 and Prompt 2:
-Prompt 1: Your task is to classify the comment as one of the following categories: terrible, bad, okay, good, great.
-Prompt 2: In this task, you are given sentences from movie reviews. The task is to classify a sentence as one of the following categories: terrible, bad, okay, good, great.
-Different parts:
-"Your task is to classify the comment" vs "In this task, you are given sentences from movie reviews. The task is to classify a sentence"
-"comment" vs "sentences from movie reviews"
-
-2. Randomly mutate the different parts:
-"Your task is to classify the comment" -> "The objective is to categorize the statement"
-"comment" -> "phrases in movie reviews"
-
-3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and :
-Prompt 3: You are a sentiment classifier. To do this, you must first understand the meaning of the sentence and any relevant context. And then you should classify it as one of the following categories: terrible, bad, okay, good, great.
-
-Final Prompt: As a sentiment classifier, analyze phrases in movie reviews and categorize them into one of the following categories: terrible, bad, okay, good, great, while considering the meaning and relevant context.
-
-Please follow the instruction step-by-step to generate a better prompt.
-1. Identify the different parts between the Prompt 1 and Prompt 2:
-Prompt 1:
-Prompt 2:
-2. Randomly mutate the different parts
-3. Crossover the different parts with the following Prompt 3 and generate a final prompt bracketed with and :
-Prompt 3:
-
-1.
\ No newline at end of file
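The two EvoPrompt DE templates deleted above are the text that the new EVOPROMPT_DE_TEMPLATE / EVOPROMPT_DE_TEMPLATE_TD constants replace. As a small sketch of how the task-description variant would be filled in before being sent to the meta LLM: the `"<task_desc>"` token below is a hypothetical stand-in, since the real placeholder name is not visible in this diff (it appears stripped in the `prompt_template.replace("", task.description)` line above).

```python
from promptolution.templates import EVOPROMPT_DE_TEMPLATE_TD

task_description = "Classify news headlines as World, Sports, Business, or Sci/Tech."

# "<task_desc>" is a stand-in token, not necessarily the actual placeholder used by promptolution.
meta_prompt = EVOPROMPT_DE_TEMPLATE_TD.replace("<task_desc>", task_description)
print(meta_prompt[:200])
```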
diff --git a/templates/evoprompt_ga_template.txt b/templates/evoprompt_ga_template.txt
deleted file mode 100644
index 3ff3264..0000000
--- a/templates/evoprompt_ga_template.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-Please follow the instruction step-by-step to generate a better prompt.
-1. Crossover the following prompts and generate a new prompt:
-Prompt 1: Rewrite the input text into simpler text.
-Prompt 2: Rewrite my complex sentence in simpler terms, but keep the meaning.
-2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and .
-
-1. Crossover Prompt: Rewrite the complex text into simpler text while keeping its meaning.
-2. Transform the provided text into simpler language, maintaining its essence.
-
-Please follow the instruction step-by-step to generate a better prompt.
-1. Crossover the following prompts and generate a new prompt:
-Prompt 1:
-Prompt 2:
-2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and .
-
-1.
\ No newline at end of file
diff --git a/templates/evoprompt_ga_template_task_desc.txt b/templates/evoprompt_ga_template_task_desc.txt
deleted file mode 100644
index 979af90..0000000
--- a/templates/evoprompt_ga_template_task_desc.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-Please follow the instruction step-by-step to generate a better prompt for the following task: .
-1. Crossover the following prompts and generate a new prompt:
-Prompt 1: Rewrite the input text into simpler text.
-Prompt 2: Rewrite my complex sentence in simpler terms, but keep the meaning.
-2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and .
-
-1. Crossover Prompt: Rewrite the complex text into simpler text while keeping its meaning.
-2. Transform the provided text into simpler language, maintaining its essence.
-
-Please follow the instruction step-by-step to generate a better prompt.
-1. Crossover the following prompts and generate a new prompt:
-Prompt 1:
-Prompt 2:
-2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with and .
-
-1.
\ No newline at end of file
diff --git a/templates/opro_template.txt b/templates/opro_template.txt
deleted file mode 100644
index e1c7ba5..0000000
--- a/templates/opro_template.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-Your task is to generate an instruction for the following task:
-
-
-Below are some previous instructions with their scores. The score ranges from 0 to 100.
-
-
-
-Here are some examples of the target dataset:
-
-
-Generate a new instruction bracketed with and ending it with that is different from all the instructions above and has a higher score than all the instructions above. The instruction should be concise, effective, and generally applicable to the task described.
-
-Your new instruction:
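The deleted opro_template.txt interleaves the task description, previously scored instructions, and dataset examples (its placeholder tokens appear stripped in this diff). As a plain-Python illustration of how a meta prompt of that shape is typically assembled: the template string and token names below are stand-ins, not the package's actual constants or placeholders.

```python
# Stand-in template mirroring the structure of the deleted opro_template.txt;
# the real placeholder tokens are not visible in this diff.
OPRO_STYLE_TEMPLATE = (
    "Your task is to generate an instruction for the following task:\n<task_desc>\n\n"
    "Below are some previous instructions with their scores. The score ranges from 0 to 100.\n"
    "<prompt_scores>\n\n"
    "Here are some examples of the target dataset:\n<examples>\n\n"
    "Generate a new instruction that is different from all the instructions above "
    "and has a higher score than all the instructions above.\n\nYour new instruction:"
)

# Hypothetical prompt/score history, sorted ascending so the best prompt appears last.
scored_prompts = [("Label the news item by topic.", 64.5), ("Classify the article.", 71.2)]
history = "\n".join(f"text: {p}\nscore: {s:.1f}" for p, s in sorted(scored_prompts, key=lambda x: x[1]))

meta_prompt = (
    OPRO_STYLE_TEMPLATE.replace("<task_desc>", "Classify AG News articles into four topics.")
    .replace("<prompt_scores>", history)
    .replace("<examples>", "input: 'Stocks rally on earnings.' -> label: Business")
)
print(meta_prompt)
```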