
Increased test coverage #29

Merged: 8 commits merged into DataCrawl-AI:master on Jun 17, 2024
Conversation

@Mews (Collaborator) commented Jun 17, 2024

Related to #28 and #24

Changes
Before, the test coverage was at around 79% (including the soon-to-be-removed main function in crawler.py). I've added new test cases to test_crawler.py, so the test coverage is now at 100% (excluding the main function).

I've also changed the CI workflow so that a coverage report is shown when running the tests, using the already installed pytest-cov extension.

I've also fixed some formatting issues that were present before my changes (I'm not sure how they were passing CI, but they were caught by the commit hook on my machine, so I fixed them).
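For reference, a coverage report in the test output is typically produced by adding pytest-cov flags to the test command in the workflow, along these lines (the exact invocation in the PR isn't shown here):

    pytest --cov --cov-report=term-missing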

    spider = Spider("http://http.error")

    resp404 = spider.fetch_url("http://http.error/404")

Collaborator

This test is too long.

  1. We can break it up based on SRP, e.g. test_fetch_url_handles_404_error, test_fetch_url_handles_408_error, etc.

  2. We can extract the common code into def setup_mock_response(url, status, body="<html><body><a href='http://http.error'>link</a></body></html>") and reuse it.

  3. type: ignore (can we use type hints here instead? Maybe pytest.CaptureFixture?)

@Mews (Collaborator Author) commented Jun 17, 2024

Yeah, you're right, I'll shorten it 👍
I don't understand what you mean by the third point, though. Are you referring to the places where I wrote # type: ignore? That's because mypy would otherwise complain about the capsys argument, which is only passed when running the tests.
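For reference, the annotation being suggested would look something like this (a sketch; pytest.CaptureFixture has been generic since pytest 6.2, so annotating the fixture should satisfy mypy without the ignore — test_example is just an illustrative name):

    import pytest

    def test_example(capsys: pytest.CaptureFixture[str]) -> None:
        print("hello")
        assert capsys.readouterr().out == "hello\n"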

Collaborator Author

Just pushed the changes, let me know what you think 👍

@indrajithi (Collaborator)

> I've also changed the CI workflow so that a coverage report is shown when running the tests, using the already installed pytest-cov extension.

Awesome!

Can we display it in the README? Ideally, if coverage falls below a threshold (80%), the CI should fail. We can spec that out in a separate issue.
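For what it's worth, pytest-cov already has a flag for this kind of threshold; a test command along these lines would fail the run when coverage drops below 80% (the value here is just the threshold floated above):

    pytest --cov --cov-fail-under=80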

@Mews (Collaborator Author) commented Jun 17, 2024

I don't know about making the CI fail when it's below a certain coverage percentage, but I can add a badge to the README with the coverage percentage, which then links to the full report.
You can see how it'd look here, for example :)
It would use this action for it

        body=requests.exceptions.Timeout(),
        status=408
    )

Collaborator

    responses.add(
        responses.GET,
        "http://timeout.error",
        body=requests.exceptions.Timeout(),
        status=408
    )

This is getting repeated in every test case. We can move it into a reusable function, and maybe put it in test/utils.py so that we can reuse it in other places as well.

Something like this:

    @contextmanager
    def capture_output(capsys: pytest.CaptureFixture) -> Generator[None, None, str]:
        yield
        captured = capsys.readouterr()
        return captured.out

    def setup_mock_response(url: str, status: int, body: str = "<html><body><a href='http://http.error'>link</a></body></html>") -> None:
        responses.add(
            responses.GET,
            url,
            body=body,
            status=status
        )

And we can write the tests like this:

    @responses.activate
    def test_fetch_url_handles_404_error(capsys: pytest.CaptureFixture) -> None:
        setup_mock_response("http://http.error/404", 404)

        spider = Spider("http://http.error")

        with capture_output(capsys) as output:
            resp404 = spider.fetch_url("http://http.error/404")

        assert ...

And repeat this pattern for testing the other exceptions.

Or, even better, use parametrized fixtures:

    @pytest.mark.parametrize("url, status", [
        ("http://http.error/404", 404),
        ("http://http.error/408", 408),
        ("http://http.error/403", 403),
    ])
    def test_fetch_url_http_errors(mock_responses, capsys_output, url, status):
        ...

LMK your thoughts.

Collaborator Author

Oh, I get it, I didn't understand it was meant to be for every test; give me a second and I'll implement it.
I've never used pytest parametrize, so I might not be the best person to implement that. Maybe we can just open an issue for it and it can be dealt with later?

@indrajithi (Collaborator) commented Jun 17, 2024

It is very simple. Basically, instead of writing test_fetch_url_handles_404_error, test_fetch_url_handles_408_error, and so on, we generalize them into one test, test_fetch_url_http_errors, and pass the fixture params to it with the URL to call and the response we expect. You can try it; if it isn't working, I can pick it up later as part of the refactoring I am currently doing.
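For reference, a self-contained sketch of that generalized test, assuming the setup_mock_response helper from above is in scope; the import path and the final assertion are assumptions, since the thread doesn't spell out what fetch_url returns on an HTTP error:

    import pytest
    import responses

    from crawler import Spider  # import path assumed from the crawler.py mentioned above

    @pytest.mark.parametrize("url, status", [
        ("http://http.error/404", 404),
        ("http://http.error/408", 408),
        ("http://http.error/403", 403),
    ])
    @responses.activate
    def test_fetch_url_http_errors(url: str, status: int) -> None:
        setup_mock_response(url, status)
        spider = Spider("http://http.error")
        resp = spider.fetch_url(url)
        # Assumed behavior: a failed fetch yields no response body.
        assert resp is None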

Collaborator Author

It seems like capsys doesn't capture the stdout from inside the context manager.
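A likely explanation, for the record: with a @contextmanager generator, `with capture_output(capsys) as output` binds the yielded value, which in the sketch above is None, and the return after the yield is discarded by contextlib rather than handed to the caller. One way to make the helper work (a sketch, not part of the PR) is to yield a mutable holder and fill it in after the block exits:

    from contextlib import contextmanager
    from typing import Generator, List

    import pytest

    @contextmanager
    def capture_output(capsys: pytest.CaptureFixture[str]) -> Generator[List[str], None, None]:
        out: List[str] = []
        yield out  # the with-block runs here; the caller binds this list
        out.append(capsys.readouterr().out)  # captured stdout, filled in once the block exits

After the with-block, output[0] would hold the captured stdout.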

Collaborator

Ok. LMK once you fix other things and we can merge this.

    @responses.activate
    def test_crawl_found_duplicate_url() -> None:
        responses.add(
            responses.GET,
Collaborator

We can reuse the setup_mock_response helper in all these places.

Collaborator Author

@indrajithi I added the helper function everywhere, but I can't get the context manager to work. It seems like capsys doesn't capture the stdout from inside the context manager.

Collaborator Author

Should I open an issue for it, i.e. for creating the context manager for the tests?

Collaborator

No issues. Let us ignore it for now. This looks good. We can pick displaying coverage in readme in a separate issue. I will merge this one.

@Mews (Collaborator Author) commented Jun 17, 2024

> No issues. Let us ignore it for now. This looks good. We can pick displaying coverage in readme in a separate issue. I will merge this one.

I've already created an issue for it if you can just assign me there :) #30

@indrajithi merged commit 2f158a9 into DataCrawl-AI:master on Jun 17, 2024
9 checks passed