In [3]:
import os
from crewai import Agent, Task, Crew, Process
# Newer LangChain releases expose this class as langchain_community.llms.Ollama.
from langchain.llms import Ollama

In [4]:
# Requires a local Ollama server with the 'mistral' model already pulled.
ollama_model = Ollama(model='mistral')

In [8]:
researcher = Agent(
    role="Researcher",
    goal="Using all your power and knowledge, find the best way to create a dataset to improve low-resource NLP models. This may entail crafting a dataset from scratch, or finding a way to augment an existing dataset.",
    backstory="You are a powerful researcher, with a lot of knowledge about NLP and machine learning. You are especially interested in how to improve the state of NLP for low-resource (scarce data) scenarios. You have a lot of resources at your disposal, and you are able to delegate tasks to others.",
    verbose=True,
    # Note: the backstory mentions delegating, but delegation is switched off here.
    allow_delegation=False,
    llm=ollama_model
)

supervisor = Agent(
    role="Supervisor",
    goal="Using all your power and knowledge, supervise the researcher and make sure they are doing their job.",
    backstory="You are a supervisor. You are in charge of the researcher, and you have to make sure they are doing their job coherently.",
    verbose=True,
    allow_delegation=False,
    llm=ollama_model
)

writer = Agent(
    role="Writer",
    goal="Using all your power and knowledge, write a dataset with the information provided by the Researcher.",
    backstory="You are a writer. You are in charge of writing the dataset.",
    verbose=True,
    allow_delegation=False,
    llm=ollama_model
)


In [9]:
# One task per agent; the crew below runs them in the order they are listed.
task_1 = Task(
    description='Figure out the best way to create a dataset to improve low-resource NLP models.',
    agent=researcher,
)
task_2 = Task(
    description='Supervise the researcher and make sure they are doing their job correctly.',
    agent=supervisor,
)
task_3 = Task(
    description='Write a detailed methodology to improve the performance of LMs in data-scarce scenarios.',
    agent=writer,
)

crew = Crew(
    agents=[researcher, supervisor, writer],
    tasks=[task_1, task_2, task_3],
    verbose=True,
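    # Delegation is normally set per agent (all three agents use allow_delegation=False),
    # so this crew-level flag may have no effect.
    allow_delegation=True,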
    process=Process.sequential
)
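
With Process.sequential, the tasks run in list order and each task's output is handed to the next one as context. The sketch below is only a simplified mental model of that flow in plain Python, not crewAI's actual implementation: SketchTask, run_sequential and echo_agent are hypothetical names invented for illustration, and the "agents" are stub functions rather than LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SketchTask:
    """Hypothetical stand-in for a crewAI Task (not the library's class)."""
    description: str
    run: Callable[[str, str], str]  # (description, context) -> output


def run_sequential(tasks: List[SketchTask]) -> str:
    """Run tasks in list order, passing each output on as the next context."""
    context = ""
    for task in tasks:
        context = task.run(task.description, context)
    return context  # the last task's output is the overall result


def echo_agent(name: str) -> Callable[[str, str], str]:
    """Build a stub 'agent' that appends its name and task to the context."""
    def _run(description: str, context: str) -> str:
        return f"{context}[{name}: {description}]"
    return _run


demo_tasks = [
    SketchTask("research", echo_agent("Researcher")),
    SketchTask("review", echo_agent("Supervisor")),
    SketchTask("write", echo_agent("Writer")),
]
demo_result = run_sequential(demo_tasks)
print(demo_result)  # [Researcher: research][Supervisor: review][Writer: write]
```

Under this model the Writer only ever sees what the Supervisor produced, so anything the Researcher finds has to survive the Supervisor's step to reach the final answer.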

In [10]:
result = crew.kickoff()

Working Agent: Researcher
Starting Task: Figure out the best way to create a dataset to improve low-resource NLP models. ...


> Entering new CrewAgentExecutor chain...
Thought: Do I need to use a tool? No

Final Answer: Creating a high-quality dataset for low-resource NLP models can be challenging, but there are several strategies that have been shown to be effective. Here are some ideas:

1. Transfer Learning: Use pre-trained models on large datasets and fine-tune them on your low-resource data. This can help improve the performance of your model even with limited data.
2. Data Augmentation: Create new data from existing data through techniques like synonym replacement, random insertion, random deletion, and random swap. This can help increase the size of your dataset and make it more robust.
3. Active Learning: Selectively label a small number of instances from your dataset to maximize the information gain for your model. This can help improve the accuracy of y