How to delete a spider and the data? #380

Closed
phongtnit opened this issue Jan 27, 2016 · 9 comments
Comments

@phongtnit

Hi,

I want to delete all of a spider's data immediately. Instead of following the guide at http://docs.pyspider.org/en/latest/Frequently-Asked-Questions/ (I don't want to wait 24 hours, as the guide requires), I did the following:

  1. I removed the spider's document from the projectdb database.
  2. I removed the spider's collection from the taskdb database.
  3. I removed the spider's collection from the resultdb database.

However, when I recreated the spider with the same name and checked http://x.x.x.x:5000/tasks?project=spidername, I still saw the spider's old data. Why?

Please guide me through all the steps needed to delete all of a spider's data immediately. Thanks in advance,

@binux
Owner

binux commented Jan 27, 2016

See http://docs.pyspider.org/en/latest/Frequently-Asked-Questions/#how-to-delete-a-project
You can change scheduler.DELETE_TIME with the scheduler argument --delete-time.

@phongtnit
Author

Hello,

Do you mean that I should stop the scheduler and restart it with the --delete-time option? For example, pyspider -c config.json scheduler --delete-time=1 (1 = one hour)?

Thanks for your support,

@binux
Owner

binux commented Jan 27, 2016

--delete-time=1 means 1 second. You can also put the 'delete_time' argument in config.json.
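
Because the value is given in seconds, it can help to make the unit explicit when building the command. The following is a minimal sketch, not part of pyspider: the helper name scheduler_command is hypothetical, and the only facts taken from this thread are the command shape (pyspider -c config.json scheduler --delete-time=N) and that N is in seconds.

```python
import shlex


def scheduler_command(config_path, delete_after_hours):
    """Build a pyspider scheduler command line (hypothetical helper).

    The --delete-time flag takes seconds, so hours are converted here.
    """
    seconds = int(delete_after_hours * 3600)
    return ["pyspider", "-c", config_path, "scheduler", f"--delete-time={seconds}"]


# One hour expressed in seconds, as the flag expects.
print(shlex.join(scheduler_command("config.json", 1)))
```

With this helper, a one-hour delay is passed as --delete-time=3600 rather than --delete-time=1.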

@phongtnit
Author

Hello binux,

I did the following steps:

  1. Added the delete_time option to the config.json file:
{
  "taskdb": "mongodb+taskdb://127.0.0.1:27017/pyspider_taskdb",
  "projectdb": "mongodb+projectdb://127.0.0.1:27017/pyspider_projectdb",
  "resultdb": "mongodb+resultdb://127.0.0.1:27017/pyspider_resultdb",
  "message_queue": "redis://127.0.0.1:6379/db",
  "webui": {
    "username": "admin",
    "password": "xxxxxx",
    "need-auth": true
  },
  "delete_time": 60
}
  2. Set the project up for deletion in the webui; please refer to http://i.imgur.com/mfmh9WU.png
  3. Killed the scheduler process.
  4. Started the scheduler process again without deleting the scheduler's data folder:
nohup pyspider -c config.json scheduler >> logs/scheduler.log 2>&1 &
  5. Waited more than 20 minutes.
  6. Checked the webui: the project had not been deleted. I also checked the project's documents in the resultdb database, and they had not been removed either.

What should I do? I'm using pyspider 0.3.6.

Thanks for your help,

@binux
Owner

binux commented Jan 27, 2016

{
  "taskdb": "mongodb+taskdb://127.0.0.1:27017/pyspider_taskdb",
  "projectdb": "mongodb+projectdb://127.0.0.1:27017/pyspider_projectdb",
  "resultdb": "mongodb+resultdb://127.0.0.1:27017/pyspider_resultdb",
  "message_queue": "redis://127.0.0.1:6379/db",
  "webui": {
    "username": "admin",
    "password": "xxxxxx",
    "need-auth": true
  },
  "scheduler": {
    "delete_time": 60
  }
}
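
The difference from the earlier config is that delete_time sits under the "scheduler" key instead of at the top level. A minimal sketch of a check for that mistake follows. The function names and the SCHEDULER_OPTIONS set are hypothetical; only delete_time is confirmed by this thread as an option that belongs under "scheduler".

```python
import json

# Assumption: option names that belong under the "scheduler" key.
# Only "delete_time" is confirmed by this thread.
SCHEDULER_OPTIONS = {"delete_time"}


def misplaced_scheduler_options(config):
    """Return scheduler options found at the top level of the config."""
    return sorted(SCHEDULER_OPTIONS & set(config))


def nest_scheduler_options(config):
    """Return a copy of config with misplaced options moved under "scheduler"."""
    fixed = {k: v for k, v in config.items() if k not in SCHEDULER_OPTIONS}
    scheduler = dict(config.get("scheduler", {}))
    for name in misplaced_scheduler_options(config):
        # Keep an existing nested value if one is already set.
        scheduler.setdefault(name, config[name])
    if scheduler:
        fixed["scheduler"] = scheduler
    return fixed


broken = json.loads('{"webui": {"need-auth": true}, "delete_time": 60}')
print(misplaced_scheduler_options(broken))
print(json.dumps(nest_scheduler_options(broken), indent=2))
```

Running the check against a config before restarting the scheduler flags a top-level delete_time, which is the placement that had no effect earlier in this thread.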

@phongtnit
Author

Hello binux,

That works now, thanks to your setting.

Many thanks for your script and awesome help,

@phongtnit
Author

Hi binux,

After successfully deleting a project, I added the project again with the same name. It showed that pyspider keeps the spider's old processing data. Please refer to http://i.imgur.com/Maqhh7I.png

I guess this is a small bug.

@visoar

visoar commented Feb 2, 2016

After successfully deleting a project, I added the project again with the same name. It showed that pyspider keeps the spider's old processing data. Please refer to http://i.imgur.com/Maqhh7I.png

I have the same problem.

@binux
Owner

binux commented Feb 3, 2016

It should be fixed via 468202f.

@binux binux closed this as completed Feb 3, 2016