
Increased test coverage #29

Merged: 8 commits merged into DataCrawl-AI:master on Jun 17, 2024
Conversation

@Mews (Collaborator) commented Jun 17, 2024

Related to #28 and #24

Changes
Before, the test coverage was at around 79% (including the soon-to-be-removed main function in crawler.py). I've added new test cases to test_crawler.py, so the test coverage is now at 100% (excluding the main function).

I've also changed the CI workflow so that a coverage report is shown when running the tests, using the already installed pytest-cov extension.

I've also fixed some formatting issues that were present before my changes (I'm not sure how they were passing CI, but they were caught by the commit hook on my machine, so I fixed them).
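For reference, a coverage report in the test output is typically produced by adding pytest-cov flags to the test command in the workflow, along these lines (the exact invocation in the PR isn't shown here):

    pytest --cov --cov-report=term-missing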

    spider = Spider("http://http.error")

    resp404 = spider.fetch_url("http://http.error/404")

Collaborator

This test is too long.

  1. We can break it up based on SRP, e.g. test_fetch_url_handles_404_error, test_fetch_url_handles_408_error, etc.

  2. We can extract the common code into def setup_mock_response(url, status, body="<html><body><a href='http://http.error'>link</a></body></html>") and reuse it.

  3. type: ignore (can we use type hints here instead? Maybe pytest.CaptureFixture?)

@Mews (Collaborator Author) commented Jun 17, 2024

Yeah, you're right, I'll shorten it 👍
I don't understand what you mean by the third point, though. Are you referring to the places where I wrote # type: ignore? That's because mypy would otherwise complain about the capsys argument, which is only passed when running the tests.
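For reference, the annotation being suggested would look something like this (a sketch; pytest.CaptureFixture has been generic since pytest 6.2, so annotating the fixture should satisfy mypy without the ignore — test_example is just an illustrative name):

    import pytest

    def test_example(capsys: pytest.CaptureFixture[str]) -> None:
        print("hello")
        assert capsys.readouterr().out == "hello\n"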

Collaborator Author

Just pushed the changes, let me know what you think 👍

@indrajithi (Collaborator)

> I've also changed the CI workflow so that a coverage report is shown when running the tests, using the already installed pytest-cov extension.

Awesome!

Can we display it in the README? Ideally, if coverage falls below a threshold (80%), the CI should fail. We can spec that out in a separate issue.
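For what it's worth, pytest-cov already has a flag for this kind of threshold; a test command along these lines would fail the run when coverage drops below 80% (the value here is just the threshold floated above):

    pytest --cov --cov-fail-under=80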

@Mews (Collaborator Author) commented Jun 17, 2024

I don't know about making the CI fail when it's below a certain coverage percentage, but I can add a badge to the README with the coverage percentage, which then links to the full report.
You can see how it'd look here, for example :)
It would use this action for it

        body=requests.exceptions.Timeout(),
        status=408
    )

Collaborator

    responses.add(
        responses.GET,
        "http://timeout.error",
        body=requests.exceptions.Timeout(),
        status=408
    )

This is getting repeated in every test case. We can move it into a reusable function, and maybe put it in test/utils.py so that we can reuse it in other places as well.

Something like this:

    @contextmanager
    def capture_output(capsys: pytest.CaptureFixture) -> Generator[None, None, str]:
        yield
        captured = capsys.readouterr()
        return captured.out

    def setup_mock_response(url: str, status: int, body: str = "<html><body><a href='http://http.error'>link</a></body></html>") -> None:
        responses.add(
            responses.GET,
            url,
            body=body,
            status=status
        )

And we can write the tests like this:

    @responses.activate
    def test_fetch_url_handles_404_error(capsys: pytest.CaptureFixture) -> None:
        setup_mock_response("http://http.error/404", 404)

        spider = Spider("http://http.error")

        with capture_output(capsys) as output:
            resp404 = spider.fetch_url("http://http.error/404")

        assert ...

And repeat this pattern for testing the other exceptions.

Or, even better, use parametrized fixtures:

    @pytest.mark.parametrize("url, status", [
        ("http://http.error/404", 404),
        ("http://http.error/408", 408),
        ("http://http.error/403", 403),
    ])
    def test_fetch_url_http_errors(mock_responses, capsys_output, url, status):
        ...

LMK your thoughts.

Collaborator Author

Oh, I get it, I didn't understand it was meant to be for every test; give me a second and I'll implement it.
I've never used pytest parametrize, so I might not be the best person to implement that. Maybe we can just open an issue for it and it can be dealt with later?

@indrajithi (Collaborator) commented Jun 17, 2024

It is very simple. Basically, instead of writing test_fetch_url_handles_404_error, test_fetch_url_handles_408_error, and so on, we generalize them into one test, test_fetch_url_http_errors, and pass the fixture params to it with the URL to call and the response we expect. You can try it; if it isn't working, I can pick it up later as part of the refactoring I am currently doing.
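For reference, a self-contained sketch of that generalized test, assuming the setup_mock_response helper from above is in scope; the import path and the final assertion are assumptions, since the thread doesn't spell out what fetch_url returns on an HTTP error:

    import pytest
    import responses

    from crawler import Spider  # import path assumed from the crawler.py mentioned above

    @pytest.mark.parametrize("url, status", [
        ("http://http.error/404", 404),
        ("http://http.error/408", 408),
        ("http://http.error/403", 403),
    ])
    @responses.activate
    def test_fetch_url_http_errors(url: str, status: int) -> None:
        setup_mock_response(url, status)
        spider = Spider("http://http.error")
        resp = spider.fetch_url(url)
        # Assumed behavior: a failed fetch yields no response body.
        assert resp is None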

Collaborator Author

It seems like capsys doesn't capture the stdout from inside the context manager.
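A likely explanation, for the record: with a @contextmanager generator, `with capture_output(capsys) as output` binds the yielded value, which in the sketch above is None, and the return after the yield is discarded by contextlib rather than handed to the caller. One way to make the helper work (a sketch, not part of the PR) is to yield a mutable holder and fill it in after the block exits:

    from contextlib import contextmanager
    from typing import Generator, List

    import pytest

    @contextmanager
    def capture_output(capsys: pytest.CaptureFixture[str]) -> Generator[List[str], None, None]:
        out: List[str] = []
        yield out  # the with-block runs here; the caller binds this list
        out.append(capsys.readouterr().out)  # captured stdout, filled in once the block exits

After the with-block, output[0] would hold the captured stdout.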

Collaborator

Ok. LMK once you fix other things and we can merge this.

    @responses.activate
    def test_crawl_found_duplicate_url() -> None:
        responses.add(
            responses.GET,
Collaborator

We can reuse the setup_mock_response helper in all these places.

Collaborator Author

@indrajithi I added the helper function everywhere, but I can't get the context manager to work. It seems like capsys doesn't capture the stdout from inside the context manager.

Collaborator Author

Should I open an issue for it, i.e. for creating the context manager for the tests?

Collaborator

No issues. Let us ignore it for now. This looks good. We can pick displaying coverage in readme in a separate issue. I will merge this one.

@Mews (Collaborator Author) commented Jun 17, 2024

> No issues. Let us ignore it for now. This looks good. We can pick displaying coverage in readme in a separate issue. I will merge this one.

I've already created an issue for it if you can just assign me there :) #30

@indrajithi merged commit 2f158a9 into DataCrawl-AI:master on Jun 17, 2024
9 checks passed