vkontakte-user function crashes immediately on use #737

Open
AccentuSoft opened this issue Feb 24, 2023 · 1 comment
Labels
bug (Something isn't working), module:vkontakte

Comments

@AccentuSoft
Contributor

Describe the bug

Running the scraper normally crashes with the following error:

$ snscrape vkontakte-user durov
2023-02-24 14:31:55.752  WARNING  snscrape.modules.vkontakte  Skipping post without link: '<div class="_post post page_block all own post--withPostBottomAction post--with-likes closed_comments deep_active Post--redesign" data-post-id="1_2442097" data-replies-limit="0" id="post1_2442097" onc'
2023-02-24 14:31:55.808  CRITICAL  snscrape._cli  Dumped stack and locals to /tmp/snscrape_locals__x7ru5_r
Traceback (most recent call last):
  File "[PATH]/venv2/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/_cli.py", line 318, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/modules/vkontakte.py", line 278, in get_items
    yield from _process_soup(soup)
  File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/modules/vkontakte.py", line 273, in _process_soup
    postID = int(item.url.rsplit('_', 1)[1])
AttributeError: 'NoneType' object has no attribute 'url'
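One reading of the output above is that the post that triggered the "Skipping post without link" warning came back as None, and the line from the traceback then dereferenced it. The sketch below illustrates that failure mode and a possible guard. It is not the snscrape code: Post, post_id_from_url and iter_post_ids are hypothetical names invented for this illustration, and only the rsplit expression is taken from the traceback.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Post:
    """Hypothetical stand-in for a scraped post; only the URL matters here."""
    url: str


def post_id_from_url(url: str) -> int:
    # Same expression as in the traceback: the numeric part after the last '_'.
    return int(url.rsplit('_', 1)[1])


def iter_post_ids(items: Iterable[Optional[Post]]) -> Iterator[int]:
    """Yield post IDs, ignoring entries that could not be parsed (None)."""
    for item in items:
        if item is None:
            # Assumption: a skipped post is represented as None.
            # Without this check, item.url raises the AttributeError above.
            continue
        yield post_id_from_url(item.url)


if __name__ == '__main__':
    parsed = [None, Post('https://vk.com/wall1_2442097')]
    print(list(iter_post_ids(parsed)))
```

A guard like this would only stop the crash. Every post would still be skipped until the selectors match VK's current HTML.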

In vkontakte.py, the selectors no longer match the HTML that VK serves:

The post link no longer has the post_link class; it now has PostHeaderSubtitle__link.
For dates, instead of this:
post.find('div', class_ = 'post_date').find('span', class_ = 'rel_date')
we found this to work:
postLink.find('time', class_ = 'PostHeaderSubtitle__item')

With those two replacements, the function starts (mostly) working again.
We're not sure how many other selectors need to change.
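The two replacements described above can be sketched with BeautifulSoup roughly as follows. This is a minimal illustration, not a patch for vkontakte.py. The SAMPLE markup is a made-up, heavily simplified stand-in for a VK post. The helper names find_post_link and find_date_element are hypothetical. The sketch also assumes that the link is an a element and that the time element sits inside it, as the postLink.find(...) call suggests. Each helper tries the old selector first and falls back to the new one.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; real VK pages are far more complex.
SAMPLE = '''
<div class="post" id="post1_2442097">
  <a class="PostHeaderSubtitle__link" href="/wall1_2442097">
    <time class="PostHeaderSubtitle__item">date text</time>
  </a>
</div>
'''


def find_post_link(post):
    """Return the post's link element, trying the old class name first."""
    link = post.find('a', class_='post_link')
    if link is None:
        # Class name reported in this issue for the restructured HTML.
        link = post.find('a', class_='PostHeaderSubtitle__link')
    return link


def find_date_element(post, post_link):
    """Return the element carrying the post date, old layout first."""
    container = post.find('div', class_='post_date')
    if container is not None:
        old = container.find('span', class_='rel_date')
        if old is not None:
            return old
    # New layout: the time element is looked up inside the post link.
    return post_link.find('time', class_='PostHeaderSubtitle__item')


if __name__ == '__main__':
    soup = BeautifulSoup(SAMPLE, 'html.parser')
    post = soup.find('div', class_='post')
    link = find_post_link(post)
    print(link['href'], find_date_element(post, link).name)
```

Keeping the old selectors as a first attempt is a design choice for the sketch only. If VK no longer serves the old markup at all, the fallback branches are the only ones that matter.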

How to reproduce

Run the command:
snscrape vkontakte-user durov

Expected behaviour

With the replacements described above applied, the scraper returns post URLs again:

$ snscrape vkontakte-user durov
https://vk.com/wall1_2442097
https://vk.com/wall1_2431591
https://vk.com/wall1_2422169
https://vk.com/wall1_2418560
https://vk.com/wall1_2412029
https://vk.com/wall1_2407925
https://vk.com/wall1_2405336
https://vk.com/wall1_2401719
https://vk.com/wall1_2401089
...

Screenshots and recordings

No response

Operating system

Ubuntu 22.04

Python version: output of python3 --version

Python 3.10.6

snscrape version: output of snscrape --version

snscrape 0.5.0.20230113 & snscrape 0.5.0.20230114.dev31+gf329b69

Scraper

vkontakte-user

Backtrace

No response

Dump of locals

No response

How are you using snscrape?

CLI (snscrape ... as a command, e.g. in a terminal)

Additional context

No response

AccentuSoft added the bug label on Feb 24, 2023
@JustAnotherArchivist
Owner

Indeed, VK restructured their HTML sometime recently, as I discovered a few days ago. Thanks for filing an issue about it.
