# DVF to SHP example

This repository contains simple Python classes that import the source data and generate shapefiles that can be loaded and analysed in your favorite GIS software (e.g. QGIS).

The data comes from the French administration and can be downloaded for free:
- [Etalab's Cadastre](https://cadastre.data.gouv.fr/data/etalab-cadastre/)
- [Demandes de valeurs foncières (DVF)](https://www.data.gouv.fr/fr/datasets/demandes-de-valeurs-foncieres/)

This notebook shows how to load and process this data.

### Loading Cadastre and DVF datasets

The following two classes are designed to read the file formats published by the French administration. This example focuses on the city of Paris.

In [1]:
from DVF_to_SHP import Cadastre, ValeursFoncieres

cad = Cadastre("data/cadastre-75-parcelles-shp/parcelles.shp")
cad.geom.head(5)

Loading: data/cadastre-75-parcelles-shp/parcelles.shp
Loaded (77485, 8) features


| id | commune | prefixe | section | numero | contenance | created | updated | coords |
|---|---|---|---|---|---|---|---|---|
| 75102000AB0080 | 75102 | 0 | AB | 80 | 2300.0 | 2010-10-26 | 2016-07-21 | POLYGON ((650809.3584530563 6863456.489397706,... |
| 75102000AB0068 | 75102 | 0 | AB | 68 | 1159.0 | 2007-01-02 | 2016-07-21 | POLYGON ((650854.2209877718 6863423.215530167,... |
| 75102000AB0048 | 75102 | 0 | AB | 48 | 1510.0 | 2007-01-02 | 2016-07-21 | POLYGON ((650929.5783008639 6863417.50589484, ... |
| 75102000AB0046 | 75102 | 0 | AB | 46 | 1693.0 | 2007-01-02 | 2016-07-21 | POLYGON ((650976.6071363769 6863360.106440248,... |
| 75102000AB0053 | 75102 | 0 | AB | 53 | 664.0 | 2007-01-02 | 2016-07-21 | POLYGON ((650971.8465513664 6863447.694667905,... |
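
The identifiers in the table above appear to concatenate the commune code, a three-digit prefix, the two-character section and a four-digit parcel number (for example `75102` + `000` + `AB` + `0080`). This layout is inferred from the sample rows, not from the source of `Cadastre`, and `split_parcel_id` is a hypothetical helper that is not part of `DVF_to_SHP`. Under that assumption, a parcel ID can be split and its section ID recovered like this:

```python
from typing import NamedTuple


class ParcelId(NamedTuple):
    """Components of a 14-character parcel identifier (assumed layout)."""

    commune: str  # 5 characters, e.g. "75102"
    prefixe: str  # 3 characters, e.g. "000"
    section: str  # 2 characters, e.g. "AB"
    numero: str   # 4 characters, e.g. "0080"

    @property
    def section_id(self) -> str:
        # The section ID is the parcel ID without the parcel number.
        return self.commune + self.prefixe + self.section


def split_parcel_id(parcel_id: str) -> ParcelId:
    """Split a parcel ID into its parts, assuming a fixed-width layout."""
    if len(parcel_id) != 14:
        raise ValueError(f"expected 14 characters, got {len(parcel_id)}")
    return ParcelId(
        commune=parcel_id[:5],
        prefixe=parcel_id[5:8],
        section=parcel_id[8:10],
        numero=parcel_id[10:],
    )


parts = split_parcel_id("75102000AB0080")
print(parts.section_id)
```

The components stay as zero-padded strings here, whereas the table shows `prefixe` and `numero` without padding.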


In [7]:
vf_2019 = ValeursFoncieres(files = ["data/valeurs_foncieres/valeursfoncieres-2019.txt"], departements = [75], paris = True)
vf_2019.df.head(5)

Loading : data/valeurs_foncieres/valeursfoncieres-2019.txt


Loaded (20871, 46) DataFrame
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20871 entries, 995783 to 1017153
Data columns (total 46 columns):
Code service CH               0 non-null float64
Reference document            0 non-null float64
1 Articles CGI                0 non-null float64
2 Articles CGI                0 non-null float64
3 Articles CGI                0 non-null float64
4 Articles CGI                0 non-null float64
5 Articles CGI                0 non-null float64
No disposition                20871 non-null int64
Date mutation                 20871 non-null datetime64[ns]
Nature mutation               20871 non-null object
Valeur fonciere               20871 non-null float64
No voie                       20870 non-null float64
B/T/Q                         793 non-null object
Type de voie                  20848 non-null object
Code voie                     20871 non-null object
Voie                          20871 non-null object
Code postal                   20871 n

### Compute average price by section

Now that we have both price and location information, we can compute the average price by section, the number of sales, and any other statistic of interest.

In [9]:
import pandas as pd

# COMPUTE AVERAGE PRICES BY SECTION
av_price_by_id = vf_2019.get_av_price_by_id()
av_price_by_section = av_price_by_id.groupby('section_id').mean()

# JOIN AVERAGE PRICES AND SECTION DATA
cad_section = cad.get_section_geom() # This takes a while
av_price_by_section = av_price_by_section.join(cad_section)
av_price_by_section.drop('ntransacs', axis = 1, inplace = True)

# ADD INTERESTING INFO TO ENRICH THE SHP LAYER
ntransacs_by_section = av_price_by_id.groupby('section_id')['ntransacs'].sum()
av_price_by_section = av_price_by_section.join(ntransacs_by_section)

av_price_by_section.head(5)

| section_id | Surface reelle bati | prix m2 | section_coords | ntransacs |
|---|---|---|---|---|
| 75101000AK | 38.0 | 14356.45 | (POLYGON ((651858.5620657594 6862370.872244564... | 3 |
| 75101000AL | 61.5 | 11858.75 | (POLYGON ((651798.1085370607 6862047.830272658... | 4 |
| 75101000AN | 33.1875 | inf | (POLYGON ((652147.2696394521 6862210.477592154... | 20 |
| 75101000AO | 57.55303 | 11231.56 | (POLYGON ((652100.1124391644 6862317.006758546... | 30 |
| 75101000AP | 50.75 | 9446.705 | (POLYGON ((652219.6584116643 6862510.028696066... | 4 |
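
The `inf` in the `prix m2` column for section `75101000AN` suggests that at least one transaction has a built surface of 0, so that price divided by surface evaluates to infinity and propagates into the section mean. One possible guard, sketched here on made-up numbers, is to turn infinite values into missing values before averaging. The column names mirror the output above, but the values are invented and this is not necessarily how `ValeursFoncieres` computes its prices.

```python
import numpy as np
import pandas as pd

# Toy transactions; the values are invented for illustration only.
transactions = pd.DataFrame({
    "section_id": ["A", "A", "B", "B"],
    "Valeur fonciere": [300000.0, 500000.0, 200000.0, 400000.0],
    "Surface reelle bati": [30.0, 0.0, 20.0, 40.0],
})

# With float columns, dividing by a zero surface yields inf
# instead of raising an error.
transactions["prix m2"] = (
    transactions["Valeur fonciere"] / transactions["Surface reelle bati"]
)

# Replace +/-inf with NaN; groupby().mean() skips NaN by default.
clean = transactions.replace([np.inf, -np.inf], np.nan)
mean_price = clean.groupby("section_id")["prix m2"].mean()
print(mean_price)
```

Dropping such rows changes the number of transactions that contribute to the mean, so it may be worth reporting that count separately from `ntransacs`.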


### Export the processed DVF data to a SHP layer

This layer can then be displayed and further processed using GIS software such as QGIS.

In [None]:
import geopandas

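# The geometry column of av_price_by_section is named 'section_coords' (see the table above)
sections_geo = geopandas.GeoDataFrame(av_price_by_section.reset_index(), geometry='section_coords')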
sections_geo.to_file('sections-75-prix.shp', driver='ESRI Shapefile')
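
The Shapefile format stores attributes in a dBASE table whose field names are limited to 10 characters, so a column such as `Surface reelle bati` is likely to be truncated on export. A minimal sketch of choosing short names explicitly beforehand follows. The short names are arbitrary suggestions and `check_field_names` is a hypothetical helper.

```python
MAX_FIELD_LEN = 10  # dBASE field-name limit that applies to shapefiles

# Hypothetical short names; pick whatever is meaningful for your layer.
short_names = {
    "section_id": "section_id",
    "Surface reelle bati": "surf_bati",
    "prix m2": "prix_m2",
    "ntransacs": "ntransacs",
}


def check_field_names(mapping):
    """Return the new names that are too long and those that are duplicated."""
    new_names = list(mapping.values())
    too_long = [n for n in new_names if len(n) > MAX_FIELD_LEN]
    duplicates = sorted({n for n in new_names if new_names.count(n) > 1})
    return too_long, duplicates


too_long, duplicates = check_field_names(short_names)
print(too_long, duplicates)
```

If both lists are empty, the mapping could be applied with `sections_geo.rename(columns=short_names)` before calling `to_file`.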