## Description
Automated script that builds the DVF+ dataset by merging the DVF dataset with the Base Nationale des Batiments (BNB) dataset.

Workflow:
1. Load BNB dataset.
2. Load DVF dataset for one geographical area and one property type.
3. Preprocess BNB dataset.
4. Merge BNB dataset with DVF dataset.
5. Save DVF+ dataset.
6. Repeat steps 2 to 5 for all geographical areas and property types.
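
To make step 6 concrete, here is a minimal sketch of how the combinations of geographical area and property type map to output files. The city names are hypothetical placeholders (the real list comes from `lib.enums.CITIES`), and `plan_outputs` is an illustrative helper, not part of `lib`. The file naming follows the pattern used when saving DVF+ in this notebook.

```python
from itertools import product
from pathlib import PurePosixPath

# Hypothetical placeholders: the real city list lives in lib.enums.CITIES.
EXAMPLE_CITIES = ["city_a", "city_b"]
GEO_AREAS = EXAMPLE_CITIES + ["urban_areas", "rural_areas"]
PROPERTY_TYPES = ["flats", "houses"]
BACKUP_DIR = PurePosixPath("../data/dvf+")


def plan_outputs(geo_areas, property_types, backup_dir=BACKUP_DIR):
    """Return one (geo_area, property_type, output_path) tuple per DVF+ file."""
    return [
        (area, kind, backup_dir / f"{area}_{kind}.csv")
        for area, kind in product(geo_areas, property_types)
    ]


plan = plan_outputs(GEO_AREAS, PROPERTY_TYPES)
for area, kind, path in plan:
    print(f"{area:>12} {kind:<7} -> {path}")
```

Each geographical area yields two files, one for flats and one for houses, so the number of DVF+ files is twice the number of areas.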

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys 
sys.path.append("../")

In [3]:
from lib.preprocessing import bnb, dvf
from lib.enums import CITIES

In [4]:
import gc 
gc.collect()

21

In [5]:
DATA_DIR = "../data/"
ZIP_DIR = f"{DATA_DIR}dvf.zip"  # path to the DVF zip archive (a file, despite the name)

BACKUP_DIR = f"{DATA_DIR}dvf+/"  # output folder for the DVF+ CSV files

In [6]:
bnb_df = bnb.load_bnb(DATA_DIR, "base_nat_bat.parquet")

In [7]:
bnb_df.shape

(27607773, 59)

In [11]:
geo_areas = CITIES + ["urban_areas", "rural_areas"]

In [10]:
for geo_area in geo_areas:

    for property_type in ["flats", "houses"]:

        print(f"Loading DVF dataset for {geo_area} and {property_type}...")
        dvf_args = {
            "geo_area": geo_area, 
            "property_type": property_type, 
        }

        dvf_df = dvf.concat_datasets_per_year(ZIP_DIR, **dvf_args)

        try:
            dvfplus = bnb.create_dvfplus(dvf=dvf_df, bnb=bnb_df, na_max=.2) 

            file_path = f"{BACKUP_DIR}{dvf_args['geo_area']}_{dvf_args['property_type']}.csv"
            dvfplus.to_csv(file_path, index=False)
            print(f"File successfully saved at {file_path}.")

        except Exception as err:
            # Report the cause and move on to the next combination.
            print(f"Error while processing {geo_area} and {property_type}: {err!r}")
            continue

del dvf_df, dvfplus, bnb_df

Loading DVF dataset for urban_areas and flats...


Processing 2022: 100%|██████████| 6/6 [04:13<00:00, 42.25s/it]


File successfully saved at ../data/dvf+/urban_areas_flats.csv.
Loading DVF dataset for urban_areas and houses...


Processing 2022: 100%|██████████| 6/6 [06:18<00:00, 63.14s/it]


File successfully saved at ../data/dvf+/urban_areas_houses.csv.
Loading DVF dataset for rural_areas and flats...


Processing 2022: 100%|██████████| 6/6 [05:37<00:00, 56.26s/it]


File successfully saved at ../data/dvf+/rural_areas_flats.csv.
Loading DVF dataset for rural_areas and houses...


Processing 2022: 100%|██████████| 6/6 [05:45<00:00, 57.66s/it]


File successfully saved at ../data/dvf+/rural_areas_houses.csv.
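
Since each combination takes several minutes, it can help to keep a record of which ones failed and why, rather than relying on the printed messages alone. The following is a possible variant of the loop, sketched under assumptions: `run_all` and `build_one` are hypothetical names, and `fake_build` only simulates the real load, merge and save step so that the example runs on its own.

```python
import traceback
from itertools import product


def run_all(geo_areas, property_types, build_one):
    """Call build_one(geo_area, property_type) for every combination.

    build_one is a hypothetical callable that loads, merges and saves one
    DVF+ file and returns the path it wrote.
    Returns (saved, failed): saved maps each combination to its path,
    failed maps each combination to the formatted traceback.
    """
    saved, failed = {}, {}
    for geo_area, property_type in product(geo_areas, property_types):
        key = (geo_area, property_type)
        try:
            saved[key] = build_one(geo_area, property_type)
        except Exception:
            # Keep the full traceback so the failure can be diagnosed later.
            failed[key] = traceback.format_exc()
    return saved, failed


def fake_build(geo_area, property_type):
    """Stand-in for the real load/merge/save step, for demonstration only."""
    if (geo_area, property_type) == ("rural_areas", "flats"):
        raise ValueError("simulated failure")
    return f"../data/dvf+/{geo_area}_{property_type}.csv"


saved, failed = run_all(["urban_areas", "rural_areas"], ["flats", "houses"], fake_build)
print(f"{len(saved)} saved, {len(failed)} failed")
for key in failed:
    print("failed:", key)
```

In the notebook, `build_one` would wrap the calls to `dvf.concat_datasets_per_year`, `bnb.create_dvfplus` and `to_csv`; the failed combinations can then be rerun individually once the cause is fixed.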
