refactor integration tests best practices by evgenydmitriev · Pull Request #114 · 1712n/wall-e

evgenydmitriev · 2024-09-26T15:22:19Z

I ran the latest spec files you worked on and the best practices through the frontier models to improve the Integration tests section. Here are the summarized suggestions from Gemini, GPT, and Claude. Nothing we haven't discussed before, just more emphasis on the inputs and outputs in integration tests, rather than specific query implementations.

kol3x · 2024-10-04T12:21:00Z

I'm not sure how we would apply new best practices to a scheduled worker

The code example we have in the best practices is a REST API with a clear input and output, but with the scheduled worker, it only seems to return a success or failure status, even if I try to put something else in there:

return new Response(JSON.stringify({ updatedMessages: messages, calculatedScores: parsedScores }));

Actual response with that code:

{ outcome: 'ok', noRetry: false }

I guess we could just console.log stuff on the last line instead and assert what console.log has been called with, if we can't modify response.

Otherwise, I agree with the changes. The only question is how we would follow them

evgenydmitriev · 2024-10-04T12:35:36Z

@kol3x could you elaborate on your question with specific code line references? I'm struggling to pinpoint the problem you are talking about.

kol3x · 2024-10-04T12:41:59Z

+```typescript
+it('should correctly insert data into the database', async () => {
+  const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };
+  const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];
+
+  // High-level mock of database insert operation
+  const insertMock = vi.fn().mockResolvedValue(expectedOutput);
+  (dbMock.insert as any).mockReturnValue({ values: insertMock });
+
+  const response = await SELF.fetch('https://example.com/insert', {
+    method: 'POST',
+    body: JSON.stringify(inputData),
+  });
+
+  // Verify the response based on the expected output
+  const responseData = await response.json();
+  expect(responseData).toEqual(expectedOutput);
 });
 ```


@evgenydmitriev I was referring to this code snippet in the last comment. Arina's tests for deduplicated-insert partly used similar approach, where we test service's response contents.

What I'm saying is that this exact approach might not be possible in a scheduled worker

Lavriz · 2024-10-05T12:50:18Z

@kol3x I will address your points, lmk what you think

@unicoder88 @jalmonter feel free to chime in and correct me

Integration Testing

First, let's clarify the concept of Interfaces

The Spec File is copied from the head branch and should contain 2 important sections: comments covering all functional requirements and Vitest integration tests covering all input/output interfaces

Interfaces

In our context, we refer to this:
API Interfaces: These define how our workers interact with other parts of the system. We're really just focusing on how our workers talk to other parts of the system. It's like the worker's API, but for databases and services instead of other apps.

Applying to Scheduled Workers:
While scheduled workers don't have traditional REST API inputs and outputs, they still have interfaces we can test:

Database Interactions: How the worker reads from and writes to the database.
External Service Calls: How the worker interacts with external services

For scheduled workers, we focus on these interactions rather than HTTP request/response cycles. The worker's "input" is the current state of the system (database, environment), and its "output" is the changes it makes to that state.

Example for Classification Worker

Our tests should focus on the worker's behavior rather than implementation details. We're not testing specific SQL queries, but rather high-level interactions with the database and external services.

Based on current Best Practices, here's an example of how tests for your worker could look:

const batchMessagesMock = [
  {
    id: 1,
    content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
  },
  {
    id: 2,
    content: 'New DeFi protocol launched with innovative features',
  },
];

describe('Classification Worker', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

 it('processes unclassified messages and updates scores', async () => {
    const distinctPairMock = [{ topic: 'cyberattack', industry: 'finance_blockchain' }];
    const messagesMock = [
      { id: 1, content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' },
      { id: 2, content: 'New DeFi protocol launched with innovative features' },
    ];
    const openAIResponse = {
      choices: [{ message: { content: '1. 0.9\n2. 0.2' } }],
    };

    mockDb.values
      .mockResolvedValueOnce(distinctPairMock)
      .mockResolvedValueOnce(messagesMock);

    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: () => Promise.resolve(openAIResponse),
    });

    vi.spyOn(env.DESCRIPTIONS, 'get').mockResolvedValue('Mock description');

    const response = await SELF.scheduled();

    // Check the input for the distinct pair query
    expect(mockDb.values).toHaveBeenNthCalledWith(1, expect.objectContaining({
      topic: expect.any(Object),
      industry: expect.any(Object),
      similarity: expect.objectContaining({
        between: [0.2, 0.8]
      }),
      classification: null,
      timestamp: expect.any(Object)
    }));

    // Check the input for the messages query
    expect(mockDb.values).toHaveBeenNthCalledWith(2, expect.objectContaining({
      id: expect.any(Object),
      content: expect.any(Object),
      similarity: expect.objectContaining({
        between: [0.2, 0.8]
      }),
      classification: null,
      timestamp: expect.any(Object),
      topic: 'cyberattack',
      industry: 'finance_blockchain'
    }));

    // Check the input for the update query
    expect(mockDb.values).toHaveBeenNthCalledWith(3, expect.objectContaining({
      classification: expect.any(Object),
      main: expect.any(Object),
      messageId: expect.arrayContaining([1, 2]),
      topic: 'cyberattack',
      industry: 'finance_blockchain'
    }));

    expect(global.fetch).toHaveBeenCalledWith(
      'https://api.openai.com/v1/chat/completions',
      expect.objectContaining({
        method: 'POST',
        headers: expect.objectContaining({
          'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        }),
        body: expect.stringContaining('Cryptocurrency theft'),
      })
    );
  });

});

Lavriz · 2024-10-05T14:20:15Z

+```typescript
+it('should correctly insert data into the database', async () => {
+  const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };
+  const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];
+
+  // High-level mock of database insert operation
+  const insertMock = vi.fn().mockResolvedValue(expectedOutput);
+  vi.mocked(dbMock.insert).mockReturnValue({ values: insertMock });
+
+  const response = await SELF.fetch('https://example.com/insert', {
+    method: 'POST',
+    body: JSON.stringify(inputData),
+  });
+
+  // Verify the response based on the expected output
+  const responseData = await response.json();
+  expect(responseData).toEqual(expectedOutput);


Suggested change

```typescript

it('should correctly insert data into the database', async () => {

const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };

const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];

// High-level mock of database insert operation

const insertMock = vi.fn().mockResolvedValue(expectedOutput);

vi.mocked(dbMock.insert).mockReturnValue({ values: insertMock });

const response = await SELF.fetch('https://example.com/insert', {

method: 'POST',

body: JSON.stringify(inputData),

});

// Verify the response based on the expected output

const responseData = await response.json();

expect(responseData).toEqual(expectedOutput);

```typescript

beforeEach(() => {

vi.clearAllMocks();

mockDb = {

insert: vi.fn().mockReturnThis(),

values: vi.fn().mockReturnThis(),

execute: vi.fn().mockResolvedValue([]),

};

});

it('should correctly deduplicate and insert data into the database', async () => {

const inputData = {

messages: [

{

id: 1645479494256594945,

platform: 'RSS',

text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',

},

{

id: 1645479494256594958,

platform: 'RSS',

text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',

},

],

};

const expectedResponse = {

status: 'success',

data: {

inserted: 1,

deduplicated: 1,

total: 2,

},

message: 'Deduplication completed successfully',

};

const insertedData = [inputData.messages[0]];

mockDb.execute.mockResolvedValue(insertedData);

const response = await SELF.fetch('https://example.com/deduplicate', {

method: 'POST',

headers: {

'Content-Type': 'application/json',

},

body: JSON.stringify(inputData),

});

const responseData = await response.json();

expect(response.status).toBe(200);

expect(responseData).toEqual(expectedResponse);

expect(mockDb.values).toHaveBeenCalledWith(expect.objectContaining({

id: 1645479494256594945,

platform: 'RSS',

text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',

}));

});

i would like to improve the test example, but I'm struggling what it should look like

Maybe more details could help capture the gist, but I'm not sure, feel free to fix it/remove it, probably it's too much and needs context
I think it could be a good example that it's not just testing the output of the API, but also the database interaction

I think it's good, and I'm up for whatever makes things clearer, but it needs to be succinct if you expect people to actually read it. Those 16 lines just became 56.

ChatGPT o1-preview

I'm trying to write a Vitest spec file (test/index.spec.ts) to illustrate important points from our team's testing best practices (best-practices.md). Help me make my spec file more succinct while making sure it illustrates all the major points from our best practices.

beforeEach(() => { vi.clearAllMocks(); mockDb = { insert: vi.fn().mockReturnThis(), values: vi.fn().mockReturnThis(), execute: vi.fn().mockResolvedValue([]), }; }); it('should correctly deduplicate and insert data into the database', async () => { const inputData = { messages: [ { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', }, { id: 1645479494256594958, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', }, ], }; const expectedResponse = { status: 'success', data: { inserted: 1, deduplicated: 1, total: 2, }, message: 'Deduplication completed successfully', }; const insertedData = [inputData.messages[0]]; mockDb.execute.mockResolvedValue(insertedData); const response = await SELF.fetch('https://example.com/deduplicate', { method: 'POST', headers: { 'Content-Type': 'application/json', }, body: JSON.stringify(inputData), }); const responseData = await response.json(); expect(response.status).toBe(200); expect(responseData).toEqual(expectedResponse); expect(mockDb.values).toHaveBeenCalledWith(expect.objectContaining({ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', })); });

# Spec File Best Practices ⚠️ Pay attention to the size of the Spec File. Consider modularizing, simplifying, or simply removing any non-essential content. When your file goes beyond a few hundred lines, and you are approaching LLM context windows, it might be time to split your worker into independent services. The goal is to keep Spec Files concise, manageable, readable, and maintainable. ## Integration tests section ⚠️ This section should focus on testing the worker as a black box, verifying functionality without making assumptions about the internal implementation. ### ✅️ Do - **Limit Imports**: Keep things simple and standardized. Indispensable imports from Cloudflare, Vitest, Postgres, or Drizzle can be used without hesitation, but avoid importing packages for convenience reasons - other developers and LLMs might not be familiar with them. - **Write Descriptive Test Names**: Write clear and descriptive test names that align with the functional requirements of the worker. If needed, include technical details in the test description to clarify the test's purpose. - **Use Realistic Data Mocks**: Utilize mock data that closely resembles real-world scenarios. This ensures the tests are relevant and reflect actual use cases. - **Mock Database Interactions**: When testing database operations, use mock methods to simulate interactions with the database. This allows you to test various scenarios without relying on a real database. - **Use High-Level Mocks Focused on Interface Interactions**: Concentrate on mocking the interfaces of your worker's dependencies, such as database operations or external service calls. Create high-level mocks that reflect the expected behavior of these interfaces without tying tests to specific implementation details. This approach aligns with our understanding of interfaces as points of interaction between the worker and its environment. For example, instead of mocking specific SQL queries or ORM methods like drizzle.select().from().where(), mock the insert method of your ORM to return a predefined result. ### ❌ Avoid - **Unit Tests or Internal Implementation Checks**: Avoid testing specific internal functions or logic, as integration tests should generally treat the worker as a black box. - **Random Data Mocks**: Avoid using randomly generated data in mocks. They put functional requirements in questions and can throw LLMs off. - **Common-Sense Functionality Tests**: Refrain from testing trivial functionality. Assume that LLMs have seen enough good-quality examples for common functionality, such as logging, error handling, and back-off strategies. - **Mocking Specific DB Queries**: Avoid mocking specific SQL queries or Drizzle query builder methods (e.g., `drizzle.select().from().where()`). Instead, focus on mocking the ORM methods like `insert`, `update`, etc., and assert on the data being passed to them. - **Testing Internal DB Method Calls**: Do not assert that specific database methods are called with certain arguments. This focuses on implementation details rather than the worker's behavior in response to mocked database interactions. - **Testing Internal DB Logic**: Avoid testing the internal workings of the database or ORM. Focus on the inputs your code provides to the database and the outputs it expects.```

Suggested change

```typescript

it('should correctly insert data into the database', async () => {

const inputData = { id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', };

const expectedOutput = [{ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' }];

// High-level mock of database insert operation

const insertMock = vi.fn().mockResolvedValue(expectedOutput);

vi.mocked(dbMock.insert).mockReturnValue({ values: insertMock });

const response = await SELF.fetch('https://example.com/insert', {

method: 'POST',

body: JSON.stringify(inputData),

});

// Verify the response based on the expected output

const responseData = await response.json();

expect(responseData).toEqual(expectedOutput);

beforeEach(() => {

vi.clearAllMocks();

mockDb = {

insert: vi.fn().mockResolvedValue([]),

};

});

it('should correctly deduplicate and insert data into the database', async () => {

const inputData = {

messages: [

{

id: 1645479494256594945,

platform: 'RSS',

text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',

},

{

id: 1645479494256594945, // Duplicate ID to simulate deduplication

platform: 'RSS',

text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',

},

],

};

const expectedResponse = {

status: 'success',

data: {

inserted: 1,

deduplicated: 1,

total: 2,

},

message: 'Deduplication completed successfully',

};

const response = await SELF.fetch('https://example.com/deduplicate', {

method: 'POST',

headers: {

'Content-Type': 'application/json',

},

body: JSON.stringify(inputData),

});

const responseData = await response.json();

expect(response.status).toBe(200);

expect(responseData).toEqual(expectedResponse);

});

Your example is good too, the only thing I'd add is:

expect(mockDb.values).toHaveBeenCalledWith(expect.objectContaining({ id: 1645479494256594945, platform: 'RSS', text: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC', }));

Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>

Co-authored-by: Evgeny Dmitriev <56804873+evgenydmitriev@users.noreply.github.com>

kol3x · 2024-10-07T08:28:09Z

@kol3x I will address your points, lmk what you think

In your example for Classification Worker you used expect(mockDb.values) for both select and update, but I only found this method available for insert in drizzle. Even in deduplicated-insert worker tests you are only using it for insert, while for select you are just doing expect(mockdb.select).toHaveBeenCalled().

So test for classification worker would look something like this instead:

const batchMessagesMock = [
  {
    id: 1,
    content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC',
  },
  {
    id: 2,
    content: 'New DeFi protocol launched with innovative features',
  },
];

describe('Classification Worker', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

 it('processes unclassified messages and updates scores', async () => {
    const distinctPairMock = [{ topic: 'cyberattack', industry: 'finance_blockchain' }];
    const messagesMock = [
      { id: 1, content: 'Cryptocurrency theft: $13.9M stolen from South Korean exchange GDAC' },
      { id: 2, content: 'New DeFi protocol launched with innovative features' },
    ];
    const openAIResponse = {
      choices: [{ message: { content: '1. 0.9\n2. 0.2' } }],
    };

    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: () => Promise.resolve(openAIResponse),
    });

    vi.spyOn(env.DESCRIPTIONS, 'get').mockResolvedValue('Mock description');

    const response = await SELF.scheduled();

    // Check the input for the distinct pair query
    // And  input for the messages query
    expect(mockDb.select).toHaveBeenCalledTimes(2)

    // Check the input for the update query
    expect(mockDb.update).toHaveBeenCalledOnce()

    expect(global.fetch).toHaveBeenCalledWith(
      'https://api.openai.com/v1/chat/completions',
      expect.objectContaining({
        method: 'POST',
        headers: expect.objectContaining({
          'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        }),
        body: expect.stringContaining('Cryptocurrency theft'),
      })
    );
  });

});

Or am I not understanding something about dbMock.values?

Lavriz · 2024-10-11T11:36:15Z

hey-hey everyone! sorry for a delay

so, @jalmonter is developing a test example using updated Best Practices from this PR and will add it here. Once complete, we'll review it to address previous questions (i think having a real example clears things out!). Meanwhile, feel free to add comments/improvements on updated best practices sections

@jalmonter can you please share a PR where you're working?

jalmonter · 2024-10-14T16:44:47Z

I'm trying out some ideas. Instead of mocking high-level ORM functions, we could mock user-defined functions and interfaces that handle database operations. This approach would make spec files easier to read and write and avoid including implementation details.

However, since tests for Cloudflare Workers run a bit differently, there could be challenges in getting this to work. I'll look into the internals to see what's possible.

The PR is in the newly created worker for scraping RSS feeds: https://github.com/1712n/cg-rss-data-collector/pull/4

jalmonter · 2024-10-16T18:20:58Z

Here's the spec file that mocks the high-level functions instead of the ORM ones.

That said, there are a few pitfalls with this approach. The biggest issue is that you can't reuse the same exported functions in the index.ts file generated by wall-e. This means wall-e will need to decouple them from the index.ts file and move them into a separate file.

Lavriz · 2024-10-17T18:14:12Z

@1712n/dni-nlp-backend

hey! yesterday on the grooming @evgenydmitriev suggested 3 different directions based on what we've got now:

super detailed best practices for every use case (for instance, examples of mocking drizzle internal functions)
we give developers full freedom and aren't concerned about testing as long as the code (index.ts) works, with the caveat of needing to hire only smart people then
simplifying the infrastructure (no Timescale, for example)

I was reading up more on Cloudflare Ecocystem, and I propose we pursue option 3 as CF offers D1 and Vectorize which aligns alright with our use cases & I don't see any downsides as I looked through SQLite
+ it'll be much simpler for people to write workers with D1 and therefore use wall-e

UPD: considering our use cases, i think we stick to Timescale for the main system, but for synthetic data we can switch to D1 and Vectorize, which means vector similarities will be leveraging Vectorize capabilities

feel free to share ur thoughts on this :)

jalmonter · 2024-11-06T23:21:22Z

Closing this in favor of #137

refactor spec file best practices

7ac88b4

evgenydmitriev requested review from Kseymur, Lavriz, jalmonter, kol3x and unicoder88 September 26, 2024 15:22

evgenydmitriev self-assigned this Sep 26, 2024

unicoder88 reviewed Sep 26, 2024

View reviewed changes

Comment thread wiki/spec_file_best_practices.md Outdated

Kseymur reviewed Sep 26, 2024

View reviewed changes

Comment thread wiki/spec_file_best_practices.md Outdated

improve type safety

4cdec09

kol3x reviewed Oct 4, 2024

View reviewed changes

merge similar points

c5eb34b

Lavriz reviewed Oct 5, 2024

View reviewed changes

evgenydmitriev and others added 4 commits October 5, 2024 20:26

Update wiki/spec_file_best_practices.md

3d2f368

Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>

Update wiki/spec_file_best_practices.md

5ccf3cb

Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>

Update wiki/spec_file_best_practices.md

dcfcf65

Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>

Update wiki/spec_file_best_practices.md

14105ce

Co-authored-by: Arina Razmyslovich <55647212+Lavriz@users.noreply.github.com>

evgenydmitriev commented Oct 6, 2024

View reviewed changes

Comment thread wiki/spec_file_best_practices.md Outdated

evgenydmitriev commented Oct 6, 2024

View reviewed changes

Comment thread wiki/spec_file_best_practices.md Outdated

Lavriz and others added 2 commits October 6, 2024 16:16

Update spec_file_best_practices.md

40c00c4

Co-authored-by: Evgeny Dmitriev <56804873+evgenydmitriev@users.noreply.github.com>

Update spec_file_best_practices.md

acdf406

Co-authored-by: Evgeny Dmitriev <56804873+evgenydmitriev@users.noreply.github.com>

jalmonter mentioned this pull request Nov 6, 2024

refactor: integration test section best practices #137

Merged

jalmonter closed this Nov 6, 2024

jalmonter deleted the refactor-spec-file-best-practices branch November 6, 2024 23:21

Conversation

evgenydmitriev commented Sep 26, 2024

Uh oh!

Uh oh!

Uh oh!

kol3x commented Oct 4, 2024

Uh oh!

evgenydmitriev commented Oct 4, 2024

Uh oh!

kol3x Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

Lavriz commented Oct 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Integration Testing

Interfaces

Example for Classification Worker

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Lavriz Oct 5, 2024

Choose a reason for hiding this comment

Uh oh!

Lavriz Oct 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

evgenydmitriev Oct 5, 2024

Choose a reason for hiding this comment

ChatGPT o1-preview

Uh oh!

Lavriz Oct 6, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kol3x commented Oct 7, 2024

Uh oh!

Lavriz commented Oct 11, 2024

Uh oh!

jalmonter commented Oct 14, 2024

Uh oh!

jalmonter commented Oct 16, 2024

Uh oh!

Lavriz commented Oct 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jalmonter commented Nov 6, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Lavriz commented Oct 5, 2024 •

edited

Loading

Lavriz Oct 5, 2024 •

edited

Loading

Lavriz commented Oct 17, 2024 •

edited

Loading