Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JsonFileStore feature #52

Merged
merged 3 commits into from Oct 24, 2022

Conversation

Cyberschorsch
Copy link

I found your great project during #hacktoberfest and gave it a spin. I noticed that there is currently only an option for saving results as CSV and thought having the option for saving to a JSON file would be a nice addition to your project.

Usage is fairly simple, instead of adding the CSV Store to your crawler just add the new store instead:

$crawler->setStore(new JsonFileStore(YOUR_STORE_PATH));

Lucio Waßill added 2 commits October 19, 2022 21:06
This new store allows storing crawling results into an json file.
@otsch
Copy link
Member

otsch commented Oct 20, 2022

Hey, thanks a lot for you contribution! Good idea!
I just have a few requests:

  • As i can see in the it saves Results to a JSON file test case, the resulting structure in the created file is [{...}] with one result, [[{...}, {...}]] with two results, and then [[{...}, {...}], [{...}]] with three results. Can you please change it, so it's always just an array with a list of objects in it? So [{...}] after one result, [{...}, {...}] after two, [{...}, {...}, {...}] after three, and so on.
  • When I check out the branch and run the tests, they sometimes fail...not always but sometimes. I think it's probably related to running tests in parallel with pest and the beforeAll() and afterAll() methods in the two test files SimpleCsvFileStoreTest.php and JsonFileStoreTest.php. I think they probably try to create or delete folders/files at the same time. Maybe we should commit an empty _files directory with an empty .gitkeep file in it instead of always creating that directory. And also suppress errors when removing files in the afterAll() hook.
  • There's one phpstan error...please check it by running composer stan and fix it.

Add all results as elements of a flat array.
Also improve tests for stores: the _files directory, where the files
that the tests create are stored is checked in and when deleting files
they just delete the files that were created in the same test file
(.json/.csv).
@otsch
Copy link
Member

otsch commented Oct 24, 2022

Never mind. I changed it. Thanks for your contribution @Cyberschorsch! 🙏🏻🏅

@otsch otsch merged commit 18c4c39 into crwlrsoft:main Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants