Thanks for the writeup, I'll start from the bottom:
Where exactly in the docs did you find that you can use [...]?

Nevertheless, #993 allows configuring this from code and will be released in SDK 1.3.0.

Regarding the state, we did think about a [...]

Finally, I suggest you use [...]
const requestList = await Apify.openRequestList(null, sources);
-
You can also delete the RequestList at the end using something like [...]. I was not able to figure out how to get the RequestList name from the RequestList object.
-
This is quite an outdated thread. Please note that in recent versions (v2) we already have a helper function called [...]. In Crawlee (v3), we handle this automatically: purging is enabled by default and happens when you first try to touch some storage class with an async method (e.g. if you try to open a KV store, which also happens under the hood in [...]).
-
Describe the feature
See this simple code sample that demonstrates the problem: https://github.com/terchris/apify-sample
1. When debugging in VS Code you must manually delete files, otherwise you get unpredictable errors
When debugging you will get errors like:
WARN The following Key-value store directory contains a previous state: /workspace/apify-sample/apify_storage/key_value_stores/default If you did not intend to persist the state - please clear the respective directory and re-start the actor.
The reason for this is that Apify uses the folder apify_storage to store its internal workings. The first time the program is run the folder "apify_storage" is created. Inside this directory you will see the following structure:
Inside the default folder you find the following files
The files are never deleted. You must manually delete them every time you debug. If you do not delete the files you will get the state from the first time you run your program.
Suggested solution and workaround
The functionality for cleaning out state is already implemented in the CLI via the "-p" (purge) parameter:
apify run -p
Adding it to the utils would be one solution, e.g.:
await Apify.utils.purge();
The purge function should also accept a parameter, so that once setting APIFY_LOCAL_STORAGE_DIR from code works, one could delete that directory by name, e.g.:
await Apify.utils.purge("my-local-storage-dir");
2. APIFY_LOCAL_STORAGE_DIR in .env file is ignored
According to the documentation, you can change the name of the folder "apify_storage" to something else by adding the following line to the .env file:
APIFY_LOCAL_STORAGE_DIR="./my-local-storage-dir"
This example project has this setting, but it is ignored and the following warning message is displayed: WARN Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="/workspace/apify-sample/apify_storage"
Suggested solution and workaround
I suggest that it should be possible to set the name of the folder "apify_storage" in code. The log level can already be set in a .env variable and overridden in code like this:
log.setLevel(log.LEVELS.DEBUG);
I suggest that the utils get a function to set APIFY_LOCAL_STORAGE_DIR, something like:
await Apify.utils.setLocalStorageDir("my-local-storage-dir");
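Until something like this exists, one possible workaround is to load the .env file into the process environment yourself. This is only a sketch and rests on an assumption not confirmed in this thread: that the warning appears because nothing copies the .env values into process.env. The helper names parseEnv and loadEnvFile are hypothetical and not part of the Apify SDK, and the parser only handles plain KEY=value and KEY="value" lines.

```javascript
const fs = require('fs');

// Hypothetical helper (not an Apify API): parses simple KEY=value lines.
// Blank lines and lines starting with "#" are skipped.
function parseEnv(text) {
    const vars = {};
    for (const rawLine of text.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (!line || line.startsWith('#')) continue;
        const eq = line.indexOf('=');
        if (eq === -1) continue;
        const key = line.slice(0, eq).trim();
        let value = line.slice(eq + 1).trim();
        // Strip one pair of matching single or double quotes around the value.
        const first = value[0];
        const last = value[value.length - 1];
        if (value.length >= 2 && (first === '"' || first === "'") && first === last) {
            value = value.slice(1, -1);
        }
        vars[key] = value;
    }
    return vars;
}

// Hypothetical helper: copies values from a .env file into process.env.
// Variables that are already set in the real environment are left untouched.
function loadEnvFile(file = '.env') {
    if (!fs.existsSync(file)) return;
    const vars = parseEnv(fs.readFileSync(file, 'utf8'));
    for (const [key, value] of Object.entries(vars)) {
        if (process.env[key] === undefined) process.env[key] = value;
    }
}
```

The idea would be to call loadEnvFile() at the very top of the entry script, before the SDK is required. Whether the SDK then picks up the value depends on when it reads the variable, so treat this as something to try rather than a guaranteed fix.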
Motivation
I have spent lots of time debugging with data from a previous run because I have forgotten to manually delete the folder "apify_storage".
I also need to start my scraping locally from another process. To do this I must first delete the folder "apify_storage" using the Node.js file system module.
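For reference, the manual cleanup described above can be sketched roughly as follows. The helper name purgeStorageDir is hypothetical and not part of the Apify SDK. The default path assumes the "apify_storage" folder sits in the current working directory, and fs.rmSync requires Node.js 14.14 or newer.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not an Apify API): removes the local storage directory
// so the next run does not pick up state left over from a previous run.
function purgeStorageDir(storageDir = path.join(process.cwd(), 'apify_storage')) {
    // recursive: true removes nested folders such as key_value_stores/default.
    // force: true means a missing directory is silently ignored.
    fs.rmSync(storageDir, { recursive: true, force: true });
}

// Example usage (call before the crawler opens any storage):
// purgeStorageDir();
// purgeStorageDir('./my-local-storage-dir');
```

Note that this deletes everything in the directory, including any input files stored there, so it is only suitable when nothing from the previous run needs to be kept.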
Constraints
Are there any requirements or limits to the feature that need to be satisfied?