# Notebook setup

```python
pip install -r requirements.txt
```

# How to improve?
Although the `StudentKnowingCrew` worked for a few questions, it could not answer more general questions about PIRLS, because its agents were given very little information about the database. To recap, we created this code:

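```python
from textwrap import dedent

from crewai import Agent
from crewai.project import agent

# Method of the StudentKnowingCrew class; query_database is the database tool from the previous tutorial.
@agent
def database_expert(self) -> Agent:
    return Agent(
        role="Students database expert",
        backstory=dedent("""
            Database expert that knows the structure of PIRLS database regarding students.
            You know that there is the table Students with columns Student_ID and Country_ID, and a table
            Countries with columns Country_ID, Name and Code.
        """),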
        goal="Use the tool to query the database and answer the question.",
        llm=self.llm,
        allow_delegation=False,
        verbose=True,
        tools=[
            query_database
        ]
    )
```
Our so-called "database expert" only knows about two tables of the PIRLS database, Students and Countries, which is a small fraction of what we actually have. As the tutorial on the data showed, the full database structure looks more like this:

[<img src="../images/tutorial_2_db_schema.png" width=2000/>](../images/tutorial_2_db_schema.png)

The schema makes it clear how little our current solution knows about the structure of the database. We need a better solution: agents that know much more and can write more advanced queries. Enter the [`DataAnalysisCrew`](./src/submission/crews/basic_crew.py)!

The main difference you'll notice is that the code is more complex. It now involves two agents: the data analyst and the data engineer. In addition, the class has two class attributes that point to configuration YAML files, and the `DataAnalysisCrew` class is decorated with `@CrewBase`. Together, these changes let us keep the agent and task configurations in separate YAML files, which declutters the Python code.

Now let's take a look at our new agents. The data analyst has no tools; they only analyze the data provided by the data engineer and produce the results. The data engineer, on the other hand, has three tools and extensive knowledge of the database structure. Their task is to provide data that can help answer the question. Let's see what we can find in the [agents.yaml](./src/submission/config/agents.yaml) file.

```yaml
lead_data_analyst:
  role: >
    PIRLS lead data analyst
  goal: >
    Answer the research questions using the PIRLS 2021 dataset
  backstory: >
    You are the Lead Data Analyst for the Progress in International Reading Literacy Study (PIRLS) project. 
    Your expertise in data analysis and interpretation is crucial for providing insights into the dataset.
    Your analysis will be used to inform educational policies and practices.
    You focus on questions related to reading literacy and educational outcomes.
   
    While you have a good overview of PIRLS, you always rely on your data engineer to provide you with the necessary data.

# file cut out ...
    
```

This file contains the full configuration for both of our new agents. Note that the keys under both `lead_data_analyst` and `data_engineer` (such as `role`, `goal` and `backstory`) are the same as the keyword arguments used to initialize an agent:

```python
from crewai import Agent

Agent(
  role="...",
  goal="...",
  backstory="..."
)
```
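As an illustration of this mapping, the sketch below uses a hypothetical `AgentStub` dataclass in place of CrewAI's `Agent`, and a hand-written dict standing in for the parsed `agents.yaml` entry (with a shortened backstory). It is not how `@CrewBase` builds agents internally; it only shows that each YAML key ends up as the keyword argument of the same name.

```python
from dataclasses import dataclass


# Hypothetical stand-in for crewai's Agent, used only to show the
# key-to-keyword mapping; the real class accepts many more arguments.
@dataclass
class AgentStub:
    role: str
    goal: str
    backstory: str


# Roughly what parsing one agents.yaml entry yields. The '>' folded
# scalars in the file end with a single newline. Backstory shortened.
agents_config = {
    "lead_data_analyst": {
        "role": "PIRLS lead data analyst\n",
        "goal": "Answer the research questions using the PIRLS 2021 dataset\n",
        "backstory": "You are the Lead Data Analyst for the PIRLS project.\n",
    },
}


def build_agent(name: str) -> AgentStub:
    # Each YAML key is passed as the keyword argument of the same name,
    # so with this stub a misspelled key raises a TypeError.
    return AgentStub(**agents_config[name])


analyst = build_agent("lead_data_analyst")
print(analyst.role.strip())  # PIRLS lead data analyst
```

With the stub, a misspelled key such as `rol` fails as soon as the agent is built, rather than silently being ignored.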

The same holds for the [tasks.yaml](./src/submission/config/tasks.yaml) file: its keys must match the keyword arguments used to initialize a `Task` object. Importantly, all of the config files can be parametrised. Let's take a look at the tasks configuration file.

```yaml
answer_question_task:
  description: >
    Answer the following question:    
    {user_question}
    
    When applicable, search for relevant data in the PIRLS 2021 dataset.
    
    When answering, always:    
    - Do not comment on topics outside the area of your expertise.     
    - Ensure that your analysis is accurate and relevant to the research questions.
    - Unless instructed otherwise, explain how you come to your conclusions and provide evidence to support your claims.
    - Use markdown format for your final answer.
  expected_output: >
    A clear and concise answer to the question
```

The file uses Python string-format syntax: the `description` contains a `{user_question}` parameter. How do we provide a value for this parameter? We pass it when calling the crew's `kickoff` method.

```python
from crewai.project import CrewBase
# Submission and PROJECT_ROOT come from the project's own modules.

@CrewBase
class DataAnalysisCrew(Submission):
    """Data Analysis Crew for the GDSC project."""
    # Paths to the YAML files in the config directory; @CrewBase loads them
    agents_config = PROJECT_ROOT / 'submission' / 'config' / 'agents.yaml'
    tasks_config = PROJECT_ROOT / 'submission' / 'config' / 'tasks.yaml'

    def __init__(self, llm):
        self.llm = llm

    def run(self, prompt: str) -> str:
        return self.crew().kickoff(inputs={'user_question': prompt})

# ... code cut out
```

In the `run` method, we call the crew's `kickoff` method with an argument called `inputs`: a `dict` whose keys are the names of the parameters defined in the YAML files. Although we use only one parameter here, you can define more and pass their values the same way.
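To make the substitution concrete, here is a minimal plain-Python sketch, assuming the values in `inputs` are substituted into the YAML strings the way `str.format` would do it (CrewAI's own interpolation code may differ in details). `required_inputs` and `interpolate` are hypothetical helpers, not part of CrewAI: the first lists the placeholder names a template expects, and the second fills them in, failing loudly if a key is missing from `inputs`.

```python
import string

# Shortened copy of the `description` template from tasks.yaml.
description = (
    "Answer the following question:\n"
    "{user_question}\n"
    "When applicable, search for relevant data in the PIRLS 2021 dataset.\n"
)


def required_inputs(template: str) -> set[str]:
    """Return the placeholder names used in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for text after the last placeholder, so it is skipped.
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def interpolate(template: str, inputs: dict[str, str]) -> str:
    """Fill the template, failing loudly if any placeholder has no value."""
    missing = required_inputs(template) - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return template.format(**inputs)


inputs = {"user_question": "Which countries took part in PIRLS 2021?"}
print(required_inputs(description))  # {'user_question'}
print(interpolate(description, inputs))

# A hypothetical template with two parameters needs two keys in `inputs`.
two_params = "Compare {country} with {other_country}."
print(interpolate(two_params, {"country": "Ireland", "other_country": "Chile"}))
# Compare Ireland with Chile.
```

Because the keys of `inputs` must match the placeholder names exactly, comparing `required_inputs` against your `inputs` dict is a cheap way to catch a typo before starting the crew.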