# VK URL Scraper
This tool lets you download the post data from one or many VKontakte/ВКонтакте/VK URLs in [JSON](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON) format, along with all their media.

To use it, you first need to create an account on the platform, which typically requires a valid phone number and email.

This notebook will show you how to authenticate into VK with Bellingcat's [vk-url-scraper](https://github.com/bellingcat/vk-url-scraper/) tool and how to then download posts from the social media site, including media such as photos and videos.

- Code and issue management: https://github.com/bellingcat/vk-url-scraper/
- Python package: https://pypi.org/project/vk-url-scraper/
- Documentation: https://vk-url-scraper.readthedocs.io/en/latest/

To run this notebook you need a valid `username/phone` and `password` for vk.com. 

Enter these below and they will be used throughout the notebook.

In [None]:
USER = "your_username_or_phone_here"
PASS = "your_password_here"
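
If you would rather not leave credentials typed into a shared notebook, one option is to read them from environment variables instead. The sketch below is only an illustration: the variable names `VK_USER` and `VK_PASS` and the helper `load_credentials` are hypothetical and are not part of vk-url-scraper.

```python
import os


def load_credentials(env=os.environ):
    """Return (user, password) read from the hypothetical VK_USER / VK_PASS variables."""
    user = env.get("VK_USER", "").strip()
    password = env.get("VK_PASS", "")
    if not user or not password:
        # Fail early with a clear message instead of sending empty credentials to VK.
        raise ValueError("Set VK_USER and VK_PASS before running the notebook")
    return user, password
```

With both variables set, `USER, PASS = load_credentials()` would replace the two assignments above.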

### Step 1 - install the Python package
This project depends on another tool, `vk_api`, whose Python package had not been updated at the time of writing. We therefore install that dependency directly from GitHub at a pinned commit, and accordingly perform the authentication into VK with some Python code in Step 2.

In [None]:
%%bash
# install correct dependency and then vk-url-scraper
pip install git+https://github.com/python273/vk_api.git@77b5a0d51a6bbf54d59554332f28a488615fbd6c
pip install vk-url-scraper

In [None]:
# to make sure the installation is successful we can call the help method of the tool
!vk_url_scraper --help

### Step 2 - login
The next cell contains helper Python code needed the first time you log in.

In [None]:
import vk_api

def captcha_handler(captcha):
    # show the captcha URL, wait for the typed solution, then retry the request with it
    key = input(
        f"CAPTCHA DETECTED, please solve it and input the solution. url= {captcha.get_url()} :"
    ).strip()
    return captcha.try_again(key)

The next code cell tries to log in with the credentials you provided above.

You may get a `CAPTCHA DETECTED` message and a URL. In that case, open the URL, solve the captcha, and type the solution into the text box to the right of the message. Notebook environments render this differently, and in some the text box only becomes visible once you click it.

In [None]:
# log in once with the credentials from the top of the notebook
vk_session = vk_api.VkApi(USER, PASS, captcha_handler=captcha_handler)
vk_session.auth(token_only=True)

A `vk_config.v2.json` file should have been created. It contains your access tokens, so don't share it.

When that file is present you don't need to re-run the python code in this section. 
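
To decide programmatically whether the login cell still needs to run, you could check for the token file first. This is a minimal sketch that assumes the file sits in the notebook's current working directory; the helper name `needs_login` is hypothetical.

```python
from pathlib import Path

# File name taken from the text above; its location (the current directory) is an assumption.
CONFIG_FILE = Path("vk_config.v2.json")


def needs_login(config_path=CONFIG_FILE):
    """Return True when no non-empty token file exists at config_path."""
    return not (config_path.is_file() and config_path.stat().st_size > 0)
```

If `needs_login()` returns `False`, the authentication cell can be skipped. Note that this only checks that the file exists and is not empty, not that the tokens inside it are still valid.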

### Step 3 - scrape a post
We will scrape a post with both text and images using the VK API: https://vk.com/wall-152947668_126406

This will only work if the `vk_config.v2.json` file is present (see Step 2).

The results will be written to the `scraped.json` file. To see them in the console instead, remove the final part of the command, `> scraped.json`.

In [None]:
!vk_url_scraper --username "{USER}" --password "{PASS}" --urls https://vk.com/wall-152947668_126406 > scraped.json

In [None]:
# let's read the first lines of scraped.json to make sure it's working
!head scraped.json
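
For a more structured check than `head`, the file can be loaded with Python's `json` module. The sketch below makes no assumption about the fields vk-url-scraper writes: it only reports the top-level type, its length, and the keys it finds. The helper name `summarize_json` is hypothetical, and loading will fail if anything other than JSON was redirected into the file.

```python
import json


def summarize_json(path="scraped.json"):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, list):
        # For a list, report the keys of the first element when it is an object.
        first = data[0] if data else None
        keys = sorted(first) if isinstance(first, dict) else []
        return {"type": "list", "length": len(data), "keys": keys}
    if isinstance(data, dict):
        return {"type": "dict", "length": len(data), "keys": sorted(data)}
    return {"type": type(data).__name__, "length": None, "keys": []}
```

Calling `summarize_json()` after the scraping cell gives a quick overview of what was saved before you start processing it.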

Besides the post data, we can automatically `--download` the media.

Let's pass two URLs, one with images and one with videos, and download the media from both.

You can add as many as you want at once, so long as you separate them by a comma `,`. 

The JSON will be written to the console. You can redirect it to a JSON file for later processing by appending `> my_output_filename.json`.

In [None]:
!vk_url_scraper --username "{USER}" --password "{PASS}" --download --urls "https://vk.com/wall-152947668_126406,https://vk.com/video/@kot_minsk?z=video-28021233_456239018%2Fclub28021233%2Fpl_-28021233_-2"

This tool can be used, for example, to automate the scraping and download of a large number of URLs from VK.
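
As a rough sketch of such automation, the command line can be assembled from a Python list of URLs. It uses only the flags shown in this notebook (`--username`, `--password`, `--download`, `--urls`); the helper name `build_scraper_command` is hypothetical.

```python
def build_scraper_command(urls, user, password, download=False):
    """Build the vk_url_scraper argument list for a batch of URLs."""
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    cmd = ["vk_url_scraper", "--username", user, "--password", password]
    if download:
        cmd.append("--download")
    # The tool expects all URLs in one comma-separated argument.
    cmd += ["--urls", ",".join(cleaned)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`, which avoids shell quoting problems with characters such as `?` and `%` in VK URLs. Because the separator is a comma, a URL that itself contains a comma would need separate handling.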