## Downloading Housing Stat. Data from GWR

In this initial notebook, the publicly available GWR data is downloaded. The data is used later in the first benchmarking step, which tests the data read/write capabilities of three tools: DuckDB, Pandas, and PostgreSQL.


GWR data from: https://www.housing-stat.ch/de/madd/public.html <br>
Data is downloaded from: https://public.madd.bfs.admin.ch/ch.zip (ZIP archive)

In [1]:
import os
import sys
import csv
import zipfile
import urllib.request

In [2]:
os.chdir(os.path.join("..", "datasets", "GWR"))  # portable path; avoids invalid "\d" and "\G" escapes
path = os.getcwd()
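
The `os.chdir` call above only works if the target folder already exists. The following is a minimal sketch for creating the folders up front, assuming the layout `datasets/GWR/GWR_Total` used in this notebook. The helper name `ensure_dataset_dirs` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def ensure_dataset_dirs(base):
    """Create <base>/datasets/GWR and its GWR_Total subfolder if missing.

    `base` is the project root (an assumption for this sketch).
    Returns both directories as Path objects.
    """
    gwr_dir = Path(base) / "datasets" / "GWR"
    total_dir = gwr_dir / "GWR_Total"
    # parents=True also creates datasets/ and GWR/; exist_ok=True makes reruns safe
    total_dir.mkdir(parents=True, exist_ok=True)
    return gwr_dir, total_dir
```

Calling `ensure_dataset_dirs("..")` before `os.chdir` would make the notebook runnable on a fresh checkout.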

## Step 1: Download CSV files for multiple cantons

**Why load each canton individually?**<br>
A realistic scenario for the later benchmark test is data arriving in multiple CSV files. Therefore, the data for a defined set of cantons is downloaded and stored as individual files.

In [None]:
# Cantons whose data should be retrieved; the limit is 20 due to API restrictions
cantons = ['zh', 'sg', 'be', 'vd', 'gr', 'lu', 'bs', 'ge', 'ti', 'vs', 'ag', 'ai', 'sh', 'sz', 'zg', 'gl', 'ur']

### Load the GWR data for a set of cantons and extract the CSV files
The following code downloads one ZIP file per canton from the GWR website and extracts the CSV file from each archive.

In [None]:
for canton in cantons:
    try:
        url = f'https://public.madd.bfs.admin.ch/{canton}.zip'
        filepath = os.path.join(path, f'{canton}.zip')  # portable; no backslash literal
        urllib.request.urlretrieve(url=url, filename=filepath)  # max. 20 calls per minute
        with zipfile.ZipFile(filepath) as z:
            z.extract('eingang_entree_entrata.csv', path=path)
        # Rename so that each canton keeps its own CSV file
        os.rename(os.path.join(path, 'eingang_entree_entrata.csv'),
                  os.path.join(path, f'eingang_entree_entrata_{canton}.csv'))
    except Exception as exc:
        print(f"File download failed for {canton}: {exc}")
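
The comment in the loop mentions a limit of 20 calls per minute. The following sketch shows one way to respect that limit by pausing between requests and to collect the cantons that failed. The names `download_canton` and `download_all`, the `fetch` parameter (which allows swapping the downloader for testing), and the default pause of 3 seconds (60 s / 20 calls) are assumptions for illustration, not part of the original notebook.

```python
import os
import time
import urllib.error
import urllib.request
import zipfile

CSV_NAME = 'eingang_entree_entrata.csv'
BASE_URL = 'https://public.madd.bfs.admin.ch'


def download_canton(canton, target_dir, fetch=urllib.request.urlretrieve):
    """Download <canton>.zip, extract the CSV, and return the renamed CSV path."""
    zip_path = os.path.join(target_dir, f'{canton}.zip')
    fetch(f'{BASE_URL}/{canton}.zip', zip_path)
    with zipfile.ZipFile(zip_path) as z:
        z.extract(CSV_NAME, path=target_dir)
    csv_path = os.path.join(target_dir, f'eingang_entree_entrata_{canton}.csv')
    # os.replace overwrites a CSV left over from an earlier run
    os.replace(os.path.join(target_dir, CSV_NAME), csv_path)
    return csv_path


def download_all(cantons, target_dir, min_interval=3.0,
                 fetch=urllib.request.urlretrieve):
    """Download every canton, pausing min_interval seconds between requests."""
    done, failed = [], []
    for i, canton in enumerate(cantons):
        if i > 0:
            time.sleep(min_interval)  # assumed pause: 60 s / 20 calls
        try:
            done.append(download_canton(canton, target_dir, fetch))
        except (urllib.error.URLError, zipfile.BadZipFile, KeyError, OSError) as exc:
            # KeyError: the expected CSV is missing from the archive
            print(f'Download failed for {canton}: {exc}')
            failed.append(canton)
    return done, failed
```

With the variables of this notebook, a call could look like `done, failed = download_all(cantons, path)`; the `failed` list can then be retried separately.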

### Optional: Delete ZIP files since they are no longer needed

In [None]:
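for canton in cantons:
    zip_path = os.path.join(path, f'{canton}.zip')
    if os.path.exists(zip_path):  # skip cantons whose download failed
        os.remove(zip_path)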

## Step 2: Download a single CSV file for the whole of Switzerland
This file is stored in a separate folder for the subsequent testing.

In [3]:
os.chdir("GWR_Total")
path = os.getcwd()

In [4]:
try:
    url = 'https://public.madd.bfs.admin.ch/ch.zip'
    filepath = os.path.join(path, 'ch.zip')  # portable; no backslash literal
    urllib.request.urlretrieve(url=url, filename=filepath)
    with zipfile.ZipFile(filepath) as z:
        z.extract('eingang_entree_entrata.csv', path=path)
    os.rename(os.path.join(path, 'eingang_entree_entrata.csv'),
              os.path.join(path, 'eingang_entree_entrata_total.csv'))
except Exception as exc:
    print(f"File download failed: {exc}")

### Optional: Delete the ZIP file since it is no longer needed

In [5]:
os.remove('ch.zip')
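
Since the extracted files feed the later benchmark, a quick sanity check after the download can be useful. The sketch below counts the lines after the header without assuming a particular delimiter. The helper name `count_data_rows` and the UTF-8 encoding are assumptions, and the count is only exact if no field contains an embedded line break.

```python
def count_data_rows(csv_path, encoding='utf-8'):
    """Return the number of lines after the header line of a text file."""
    with open(csv_path, encoding=encoding, newline='') as f:
        total = sum(1 for _ in f)
    # Subtract the header; an empty file yields 0 rather than -1
    return max(total - 1, 0)
```

For example, `count_data_rows('eingang_entree_entrata_total.csv')` could be run in the `GWR_Total` folder to confirm that the file is not empty.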