# Get started: PDF downloader for Bebauungspläne

To run the BP downloader, you first need a dataset that contains links to building plans (Bebauungspläne, BP) in PDF format. The input used here is the information provided in the [NRW geoportal](https://www.geoportal.nrw/?activetab=map). After clicking the download button there, you should be able to select all areas of NRW, choose the Bebauungspläne dataset, and download it in GeoPackage format (`.gpkg` extension).

The GeoPackage file can be loaded into any GIS application and exported to GeoJSON. GeoJSON is the format that the functions take as input.
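Before passing an exported file to the functions, it can help to check which attributes it actually contains. The following is a minimal sketch that uses only the standard library and relies on the fact that a GeoJSON export is a `FeatureCollection` whose features carry their attributes under `properties`. The property names `name` and `scanurl` and the helper `list_property_values` are hypothetical and only illustrate the idea; the real export may use different names.

```python
import json

# A tiny stand-in for an exported file. The property names ("name",
# "scanurl") are hypothetical; inspect your own export for the real ones.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": None,
            "properties": {
                "name": "Plan A",
                "scanurl": "https://www.o-sp.de/rheine/plan?pid=39724",
            },
        },
    ],
}


def list_property_values(geojson_text, key):
    """Return the value of `key` for every feature that defines it."""
    collection = json.loads(geojson_text)
    return [
        feature["properties"][key]
        for feature in collection.get("features", [])
        if key in (feature.get("properties") or {})
    ]


urls = list_property_values(json.dumps(sample), "scanurl")
print(urls)
```

For a real file, read its text with `open(path, encoding="utf-8").read()` and pass that to the helper instead of the in-memory sample.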


```python
import geopandas as gpd

from geojson_parser import parse_geojson
from NRW_BP_PDF_scraper import run_pdf_downloader
```

+ **parse_geojson:** parses a GeoJSON file with download links to different building plans. It iterates over all rows and checks whether the URL matches the pattern of an osp-plan.de link without a list format, meaning that the scan URL does not point directly to a PDF; instead, the PDF is linked somewhere in the HTML of the page. If the URL matches the pattern, the HTML of the page is downloaded and parsed with Beautiful Soup. All links that start with https://www.o-sp.de/download/ are extracted and written to a dataframe.
    + To parse only a sample of the rows, set the sample size with `sample_n`.

+ **run_pdf_downloader:** goes through a GeoDataFrame (GDF) with PDF download links and downloads all the files. Links that return an error are saved to a CSV file called error_links in the specified output folder.
    + To download only a sample of the rows, set the sample size with `sample_n`.
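The link extraction step described for `parse_geojson` can be sketched roughly as follows. This is not the project's implementation: it swaps Beautiful Soup for the standard library's `html.parser` so that it runs without extra dependencies, and the regular expression for plan pages is an assumption inferred from the URL shape `https://www.o-sp.de/rheine/plan?pid=39724` that appears in this guide's sample output. The names `DownloadLinkCollector`, `extract_download_links`, and `is_plan_page`, as well as the HTML snippet, are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Prefix named in the description of parse_geojson.
DOWNLOAD_PREFIX = "https://www.o-sp.de/download/"

# Assumed shape of a plan page URL; the project's actual pattern may differ.
PLAN_PAGE_PATTERN = re.compile(r"^https://www\.o-sp\.de/[^/]+/plan\?pid=\d+$")


class DownloadLinkCollector(HTMLParser):
    """Collect href values of <a> tags that start with DOWNLOAD_PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(DOWNLOAD_PREFIX):
                self.links.append(value)


def is_plan_page(url):
    """Return True if the URL looks like a plan page rather than a direct PDF link."""
    return PLAN_PAGE_PATTERN.match(url) is not None


def extract_download_links(html):
    """Return all download links found in the given HTML, in document order."""
    collector = DownloadLinkCollector()
    collector.feed(html)
    return collector.links


# Made-up page content, used only to exercise the functions.
page = """
<html><body>
<a href="https://www.o-sp.de/download/rheine/12345">Plan (PDF)</a>
<a href="https://www.o-sp.de/rheine/start">Home</a>
</body></html>
"""

print(is_plan_page("https://www.o-sp.de/rheine/plan?pid=39724"))
print(extract_download_links(page))
```

In a real run, the HTML would come from requesting the plan page first, and only URLs for which `is_plan_page` returns `True` would need that extra request.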

Basic usage:

```python
df = parse_geojson("NRW_BP_sample.geojson", sample_n=15)
```

```text
100%|██████████| 15/15 [00:01<00:00, 11.62it/s]
```


```python
# Run downloader
run_pdf_downloader(df,
                   output_folder="data/NRW/sample_pdf",
                   sample_n=3)
```

```text
Downloaded: 2408747_4.pdf
Failed to download: https://www.o-sp.de/rheine/plan?pid=39724
Downloaded: 2415976_4.pdf
```
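The behaviour seen in this output (download each file, report failures, and keep the failed links in a CSV) can be sketched as below. This is a simplified illustration under stated assumptions, not the code of `run_pdf_downloader`: the function names `fetch_bytes` and `download_all`, the `url` column header, the file name `error_links.csv`, and the check for the `%PDF` file signature are all choices made for this sketch. The fetch function is passed in as a parameter so the demo can run with a stub instead of real network access.

```python
import csv
import os
import tempfile
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download a URL and return its body; raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(named_urls, output_folder, fetch=fetch_bytes):
    """Save each (filename, url) pair into output_folder.

    Failed URLs are written to error_links.csv in the same folder and
    also returned as a list.
    """
    os.makedirs(output_folder, exist_ok=True)
    failed = []
    for filename, url in named_urls:
        try:
            content = fetch(url)
            # Assumption of this sketch: anything without the PDF file
            # signature counts as a failed download.
            if not content.startswith(b"%PDF"):
                raise ValueError("response is not a PDF")
            with open(os.path.join(output_folder, filename), "wb") as handle:
                handle.write(content)
            print(f"Downloaded: {filename}")
        except (OSError, ValueError):
            failed.append(url)
            print(f"Failed to download: {url}")

    error_path = os.path.join(output_folder, "error_links.csv")
    with open(error_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url"])
        writer.writerows([url] for url in failed)
    return failed


# Demo with a stub fetcher so the sketch runs without network access.
def stub_fetch(url):
    if url.endswith("broken"):
        raise OSError("simulated network error")
    return b"%PDF-1.4 stub content"


with tempfile.TemporaryDirectory() as folder:
    failed = download_all(
        [("a.pdf", "https://example.org/a"), ("b.pdf", "https://example.org/broken")],
        folder,
        fetch=stub_fetch,
    )
    saved = sorted(os.listdir(folder))

print(failed)
print(saved)
```

Keeping the failed links in a separate file makes it possible to retry only those URLs later, or to inspect them by hand when a link points to a page rather than to a PDF.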
