Skip to content

refactor integration tests best practices#114

Closed
evgenydmitriev wants to merge 9 commits into
mainfrom
refactor-spec-file-best-practices
Closed

refactor integration tests best practices#114
evgenydmitriev wants to merge 9 commits into
mainfrom
refactor-spec-file-best-practices

Conversation

@evgenydmitriev
Copy link
Copy Markdown
Contributor

I ran the latest spec files you worked on and the best practices through the frontier models to improve the Integration tests section. Here are the summarized suggestions from Gemini, GPT, and Claude. Nothing we haven't discussed before, just more emphasis on the inputs and outputs in integration tests, rather than specific query implementations.

Comment thread wiki/spec_file_best_practices.md Outdated
Comment thread wiki/spec_file_best_practices.md Outdated
@kol3x
Copy link
Copy Markdown
Contributor

kol3x commented Oct 4, 2024

I'm not sure how we would apply new best practices to a scheduled worker

The code example we have in the best practices is a REST API with a clear input and output, but with the scheduled worker, it only seems to return a success or failure status, even if I try to put something else in there:

return new Response(JSON.stringify({ updatedMessages: messages, calculatedScores: parsedScores }));

Actual response with that code:

{ outcome: 'ok', noRetry: false }

I guess we could just console.log stuff on the last line instead and assert what console.log has been called with, if we can't modify response.

Otherwise, I agree with the changes. The only question is how we would follow them

@evgenydmitriev
Copy link
Copy Markdown
Contributor Author

@kol3x could you elaborate on your question with specific code line references? I'm struggling to pinpoint the problem you are talking about.

Comment on lines +41 to 59
```typescript
it('should correctly insert data into the database', async () => {
const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };
const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];

// High-level mock of database insert operation
const insertMock = vi.fn().mockResolvedValue(expectedOutput);
(dbMock.insert as any).mockReturnValue({ values: insertMock });

const response = await SELF.fetch('https://example.com/insert', {
method: 'POST',
body: JSON.stringify(inputData),
});

// Verify the response based on the expected output
const responseData = await response.json();
expect(responseData).toEqual(expectedOutput);
});
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evgenydmitriev I was referring to this code snippet in the last comment. Arina's tests for deduplicated-insert partly used similar approach, where we test service's response contents.

What I'm saying is that this exact approach might not be possible in a scheduled worker

@Lavriz
Copy link
Copy Markdown
Contributor

Lavriz commented Oct 5, 2024

@kol3x I will address your points, lmk what you think

@unicoder88 @jalmonter feel free to chime in and correct me

Integration Testing

First, let's clarify the concept of Interfaces

The Spec File is copied from the head branch and should contain 2 important sections: comments covering all functional requirements and Vitest integration tests covering all input/output interfaces

Interfaces

In our context, we refer to this:
API Interfaces: These define how our workers interact with other parts of the system. We're really just focusing on how our workers talk to other parts of the system. It's like the worker's API, but for databases and services instead of other apps.

Applying to Scheduled Workers:
While scheduled workers don't have traditional REST API inputs and outputs, they still have interfaces we can test:

  1. Database Interactions: How the worker reads from and writes to the database.
  2. External Service Calls: How the worker interacts with external services

For scheduled workers, we focus on these interactions rather than HTTP request/response cycles. The worker's "input" is the current state of the system (database, environment), and its "output" is the changes it makes to that state.

Example for Classification Worker

Our tests should focus on the worker's behavior rather than implementation details. We're not testing specific SQL queries, but rather high-level interactions with the database and external services.

Based on current Best Practices, here's an example of how tests for your worker could look:

const batchMessagesMock = [
  {
    id: 1,
    content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
  },
  {
    id: 2,
    content: 'New DeFi protocol launched with innovative features',
  },
];

describe('Classification Worker', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

 it('processes unclassified messages and updates scores', async () => {
    const distinctPairMock = [{ topic: 'cyberattack', industry: 'finance_blockchain' }];
    const messagesMock = [
      { id: 1, content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' },
      { id: 2, content: 'New DeFi protocol launched with innovative features' },
    ];
    const openAIResponse = {
      choices: [{ message: { content: '1. 0.9\n2. 0.2' } }],
    };

    mockDb.values
      .mockResolvedValueOnce(distinctPairMock)
      .mockResolvedValueOnce(messagesMock);

    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: () => Promise.resolve(openAIResponse),
    });

    vi.spyOn(env.DESCRIPTIONS, 'get').mockResolvedValue('Mock description');

    const response = await SELF.scheduled();

    // Check the input for the distinct pair query
    expect(mockDb.values).toHaveBeenNthCalledWith(1, expect.objectContaining({
      topic: expect.any(Object),
      industry: expect.any(Object),
      similarity: expect.objectContaining({
        between: [0.2, 0.8]
      }),
      classification: null,
      timestamp: expect.any(Object)
    }));

    // Check the input for the messages query
    expect(mockDb.values).toHaveBeenNthCalledWith(2, expect.objectContaining({
      id: expect.any(Object),
      content: expect.any(Object),
      similarity: expect.objectContaining({
        between: [0.2, 0.8]
      }),
      classification: null,
      timestamp: expect.any(Object),
      topic: 'cyberattack',
      industry: 'finance_blockchain'
    }));

    // Check the input for the update query
    expect(mockDb.values).toHaveBeenNthCalledWith(3, expect.objectContaining({
      classification: expect.any(Object),
      main: expect.any(Object),
      messageId: expect.arrayContaining([1, 2]),
      topic: 'cyberattack',
      industry: 'finance_blockchain'
    }));

    expect(global.fetch).toHaveBeenCalledWith(
      'https://api.openai.com/v1/chat/completions',
      expect.objectContaining({
        method: 'POST',
        headers: expect.objectContaining({
          'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        }),
        body: expect.stringContaining('Cryptocurrency theft'),
      })
    );
  });

});

Comment thread wiki/spec_file_best_practices.md Outdated
Comment thread wiki/spec_file_best_practices.md Outdated
Comment thread wiki/spec_file_best_practices.md Outdated
Comment thread wiki/spec_file_best_practices.md Outdated
Comment on lines +40 to +56
```typescript
it('should correctly insert data into the database', async () => {
const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };
const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];

// High-level mock of database insert operation
const insertMock = vi.fn().mockResolvedValue(expectedOutput);
vi.mocked(dbMock.insert).mockReturnValue({ values: insertMock });

const response = await SELF.fetch('https://example.com/insert', {
method: 'POST',
body: JSON.stringify(inputData),
});

// Verify the response based on the expected output
const responseData = await response.json();
expect(responseData).toEqual(expectedOutput);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```typescript
it('should correctly insert data into the database', async () => {
const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };
const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];
// High-level mock of database insert operation
const insertMock = vi.fn().mockResolvedValue(expectedOutput);
vi.mocked(dbMock.insert).mockReturnValue({ values: insertMock });
const response = await SELF.fetch('https://example.com/insert', {
method: 'POST',
body: JSON.stringify(inputData),
});
// Verify the response based on the expected output
const responseData = await response.json();
expect(responseData).toEqual(expectedOutput);
```typescript
beforeEach(() => {
vi.clearAllMocks();
mockDb = {
insert: vi.fn().mockReturnThis(),
values: vi.fn().mockReturnThis(),
execute: vi.fn().mockResolvedValue([]),
};
});
it('should correctly deduplicate and insert data into the database', async () => {
const inputData = {
messages: [
{
id: 1645479494256594945,
platform: 'RSS',
text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
},
{
id: 1645479494256594958,
platform: 'RSS',
text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
},
],
};
const expectedResponse = {
status: 'success',
data: {
inserted: 1,
deduplicated: 1,
total: 2,
},
message: 'Deduplication completed successfully',
};
const insertedData = [inputData.messages[0]];
mockDb.execute.mockResolvedValue(insertedData);
const response = await SELF.fetch('https://example.com/deduplicate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(inputData),
});
const responseData = await response.json();
expect(response.status).toBe(200);
expect(responseData).toEqual(expectedResponse);
expect(mockDb.values).toHaveBeenCalledWith(expect.objectContaining({
id: 1645479494256594945,
platform: 'RSS',
text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
}));
});

Copy link
Copy Markdown
Contributor

@Lavriz Lavriz Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would like to improve the test example, but I'm struggling what it should look like

Maybe more details could help capture the gist, but I'm not sure, feel free to fix it/remove it, probably it's too much and needs context
I think it could be a good example that it's not just testing the output of the API, but also the database interaction

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good, and I'm up for whatever makes things clearer, but it needs to be succinct if you expect people to actually read it. Those 16 lines just became 56.

ChatGPT o1-preview

I'm trying to write a Vitest spec file (test/index.spec.ts) to illustrate important points from our team's testing best practices (best-practices.md). Help me make my spec file more succinct while making sure it illustrates all the major points from our best practices.

beforeEach(() => {
    vi.clearAllMocks();
    mockDb = {
      insert: vi.fn().mockReturnThis(),
      values: vi.fn().mockReturnThis(),
      execute: vi.fn().mockResolvedValue([]),
    };
  });
  it('should correctly deduplicate and insert data into the database', async () => {
    const inputData = {
      messages: [
        {
          id: 1645479494256594945,
          platform: 'RSS',
          text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
        },
        {
          id: 1645479494256594958,
          platform: 'RSS',
          text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
        },
      ],
    };
    const expectedResponse = {
      status: 'success',
      data: {
        inserted: 1,
        deduplicated: 1,
        total: 2,
      },
      message: 'Deduplication completed successfully',
    };
    const insertedData = [inputData.messages[0]];
    mockDb.execute.mockResolvedValue(insertedData);
    const response = await SELF.fetch('https://example.com/deduplicate', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(inputData),
    });
    const responseData = await response.json();
    expect(response.status).toBe(200);
    expect(responseData).toEqual(expectedResponse);
    expect(mockDb.values).toHaveBeenCalledWith(expect.objectContaining({
      id: 1645479494256594945,
      platform: 'RSS',
      text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
    }));
  });
# Spec File Best Practices

⚠️ Pay attention to the size of the Spec File. Consider modularizing, simplifying, or simply removing any non-essential content. When your file goes beyond a few hundred lines, and you are approaching LLM context windows, it might be time to split your worker into independent services. The goal is to keep Spec Files concise, manageable, readable, and maintainable.

## Integration tests section

⚠️ This section should focus on testing the worker as a black box, verifying functionality without making assumptions about the internal implementation.

### ✅️ Do

- **Limit Imports**: Keep things simple and standardized. Indispensable imports from Cloudflare, Vitest, Postgres, or Drizzle can be used without hesitation, but avoid importing packages for convenience reasons - other developers and LLMs might not be familiar with them.
- **Write Descriptive Test Names**: Write clear and descriptive test names that align with the functional requirements of the worker. If needed, include technical details in the test description to clarify the test's purpose.
- **Use Realistic Data Mocks**: Utilize mock data that closely resembles real-world scenarios. This ensures the tests are relevant and reflect actual use cases.
- **Mock Database Interactions**:  When testing database operations, use mock methods to simulate interactions with the database. This allows you to test various scenarios without relying on a real database.
- **Use High-Level Mocks Focused on Interface Interactions**: Concentrate on mocking the interfaces of your worker's dependencies, such as database operations or external service calls. Create high-level mocks that reflect the expected behavior of these interfaces without tying tests to specific implementation details. This approach aligns with our understanding of interfaces as points of interaction between the worker and its environment. For example, instead of mocking specific SQL queries or ORM methods like drizzle.select().from().where(), mock the insert method of your ORM to return a predefined result.

### ❌ Avoid

- **Unit Tests or Internal Implementation Checks**: Avoid testing specific internal functions or logic, as integration tests should generally treat the worker as a black box.
- **Random Data Mocks**: Avoid using randomly generated data in mocks. They put functional requirements in questions and can throw LLMs off.
- **Common-Sense Functionality Tests**: Refrain from testing trivial functionality. Assume that LLMs have seen enough good-quality examples for common functionality, such as logging, error handling, and back-off strategies.
- **Mocking Specific DB Queries**: Avoid mocking specific SQL queries or Drizzle query builder methods (e.g., `drizzle.select().from().where()`). Instead, focus on mocking the ORM methods like `insert`, `update`, etc., and assert on the data being passed to them.
- **Testing Internal DB Method Calls**: Do not assert that specific database methods are called with certain arguments. This focuses on implementation details rather than the worker's behavior in response to mocked database interactions.
- **Testing Internal DB Logic**: Avoid testing the internal workings of the database or ORM. Focus on the inputs your code provides to the database and the outputs it expects.```
Suggested change
```typescript
it('should correctly insert data into the database', async () => {
const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };
const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];
// High-level mock of database insert operation
const insertMock = vi.fn().mockResolvedValue(expectedOutput);
vi.mocked(dbMock.insert).mockReturnValue({ values: insertMock });
const response = await SELF.fetch('https://example.com/insert', {
method: 'POST',
body: JSON.stringify(inputData),
});
// Verify the response based on the expected output
const responseData = await response.json();
expect(responseData).toEqual(expectedOutput);
beforeEach(() => {
vi.clearAllMocks();
mockDb = {
insert: vi.fn().mockResolvedValue([]),
};
});
it('should correctly deduplicate and insert data into the database', async () => {
const inputData = {
messages: [
{
id: 1645479494256594945,
platform: 'RSS',
text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
},
{
id: 1645479494256594945, // Duplicate ID to simulate deduplication
platform: 'RSS',
text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
},
],
};
const expectedResponse = {
status: 'success',
data: {
inserted: 1,
deduplicated: 1,
total: 2,
},
message: 'Deduplication completed successfully',
};
const response = await SELF.fetch('https://example.com/deduplicate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(inputData),
});
const responseData = await response.json();
expect(response.status).toBe(200);
expect(responseData).toEqual(expectedResponse);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your example is good too, the only thing I'd add is:

expect(mockDb.values).toHaveBeenCalledWith(expect.objectContaining({
      id: 1645479494256594945,
      platform: 'RSS',
      text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
    }));

evgenydmitriev and others added 4 commits October 5, 2024 20:26
Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>
Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>
Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>
Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>
Comment thread wiki/spec_file_best_practices.md Outdated
Comment thread wiki/spec_file_best_practices.md Outdated
Lavriz and others added 2 commits October 6, 2024 16:16
Co-authored-by: Evgeny Dmitriev <56804873+evgenydmitriev@users.noreply.github.com>
Co-authored-by: Evgeny Dmitriev <56804873+evgenydmitriev@users.noreply.github.com>
@kol3x
Copy link
Copy Markdown
Contributor

kol3x commented Oct 7, 2024

@kol3x I will address your points, lmk what you think

In your example for Classification Worker you used expect(mockDb.values) for both select and update, but I only found this method available for insert in drizzle. Even in deduplicated-insert worker tests you are only using it for insert, while for select you are just doing expect(mockdb.select).toHaveBeenCalled().

So test for classification worker would look something like this instead:

const batchMessagesMock = [
  {
    id: 1,
    content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
  },
  {
    id: 2,
    content: 'New DeFi protocol launched with innovative features',
  },
];

describe('Classification Worker', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

 it('processes unclassified messages and updates scores', async () => {
    const distinctPairMock = [{ topic: 'cyberattack', industry: 'finance_blockchain' }];
    const messagesMock = [
      { id: 1, content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' },
      { id: 2, content: 'New DeFi protocol launched with innovative features' },
    ];
    const openAIResponse = {
      choices: [{ message: { content: '1. 0.9\n2. 0.2' } }],
    };

    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: () => Promise.resolve(openAIResponse),
    });

    vi.spyOn(env.DESCRIPTIONS, 'get').mockResolvedValue('Mock description');

    const response = await SELF.scheduled();

    // Check the input for the distinct pair query
    // And  input for the messages query
    expect(mockDb.select).toHaveBeenCalledTimes(2)

    // Check the input for the update query
    expect(mockDb.update).toHaveBeenCalledOnce()

    expect(global.fetch).toHaveBeenCalledWith(
      'https://api.openai.com/v1/chat/completions',
      expect.objectContaining({
        method: 'POST',
        headers: expect.objectContaining({
          'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        }),
        body: expect.stringContaining('Cryptocurrency theft'),
      })
    );
  });

});

Or am I not understanding something about dbMock.values?

@Lavriz
Copy link
Copy Markdown
Contributor

Lavriz commented Oct 11, 2024

hey-hey everyone! sorry for a delay

so, @jalmonter is developing a test example using updated Best Practices from this PR and will add it here. Once complete, we'll review it to address previous questions (i think having a real example clears things out!). Meanwhile, feel free to add comments/improvements on updated best practices sections

@jalmonter can you please share a PR where you're working?

@jalmonter
Copy link
Copy Markdown
Collaborator

I'm trying out some ideas. Instead of mocking high-level ORM functions, we could mock user-defined functions and interfaces that handle database operations. This approach would make spec files easier to read and write and avoid including implementation details.

However, since tests for Cloudflare Workers run a bit differently, there could be challenges in getting this to work. I'll look into the internals to see what's possible.

The PR is in the newly created worker for scraping RSS feeds: https://github.com/1712n/cg-rss-data-collector/pull/4

@jalmonter
Copy link
Copy Markdown
Collaborator

Here's the spec file that mocks the high-level functions instead of the ORM ones.

That said, there are a few pitfalls with this approach. The biggest issue is that you can't reuse the same exported functions in the index.ts file generated by wall-e. This means wall-e will need to decouple them from the index.ts file and move them into a separate file.

@Lavriz
Copy link
Copy Markdown
Contributor

Lavriz commented Oct 17, 2024

@1712n/dni-nlp-backend

hey! yesterday on the grooming @evgenydmitriev suggested 3 different directions based on what we've got now:

  1. super detailed best practices for every use case (for instance, examples of mocking drizzle internal functions)
  2. we give developers full freedom and aren't concerned about testing as long as the code (index.ts) works, with the caveat of needing to hire only smart people then
  3. simplifying the infrastructure (no Timescale, for example)

I was reading up more on Cloudflare Ecocystem, and I propose we pursue option 3 as CF offers D1 and Vectorize which aligns alright with our use cases & I don't see any downsides as I looked through SQLite
+ it'll be much simpler for people to write workers with D1 and therefore use wall-e

UPD: considering our use cases, i think we stick to Timescale for the main system, but for synthetic data we can switch to D1 and Vectorize, which means vector similarities will be leveraging Vectorize capabilities

feel free to share ur thoughts on this :)

@jalmonter
Copy link
Copy Markdown
Collaborator

Closing this in favor of #137

@jalmonter jalmonter closed this Nov 6, 2024
@jalmonter jalmonter deleted the refactor-spec-file-best-practices branch November 6, 2024 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants