Ce notebook permet d'importer et de traiter les données. 

Son éxecution étant assez longue, nous avons enregistré  les dataframes finales dans des csv. Pour éxecuter ce notebook, vous aurez besoin d'un identifiant et d'un token que vous pouvez obtenir en vous connectant sur le site : https://api.insee.fr/catalogue/site/themes/wso2/subthemes/insee/pages/sign-up.jag et que vous devrez compléter dans la première ligne de code.

In [8]:
key="k32RC1ZJH8RV4Llh6kTRakU15tca"
secret="CTnVGRLchI7dbJOlrvYbrOyIfnMa"

**Les importations**

In [1]:
#!pip install contextily
#!pip install geopandas
#!pip install pygeos
#!pip install geopy
#!pip install pynsee[full]

In [2]:
import contextily as ctx
from geopy.geocoders import Nominatim
from shapely.geometry import Point
import geopandas as gpd
import pandas as pd
import pynsee
from pynsee.utils.init_conn import init_conn
from tqdm.notebook import tqdm, trange



**Importation des données**

Nous avons utilisé l'API sirene afin de ne pas surcharger notre environnement de travail avec la base complète du fichier sirene.

In [9]:
N=100000

init_conn(insee_key=key, insee_secret=secret)
data = pynsee.search_sirene(variable = ["activitePrincipaleEtablissement"],
                       pattern = "47.22Z", kind = 'siret', number=N)

data = data.loc[data['dateFin'].isnull()]
df=data[["activitePrincipaleEtablissement","typeVoieEtablissement","libelleVoieEtablissement","libelleCommuneEtablissement","codeCommuneEtablissement"]]

Token has been created
!!! Please subscribe to BDM API on api.insee.fr !!!
!!! Please subscribe to Metadata API on api.insee.fr !!!
!!! Please subscribe to Local Data API on api.insee.fr !!!


In [10]:
df=df.dropna()

liste_departements=["75","77","78","91","92","93","94","95"]
department=[]
for ville in df.codeCommuneEtablissement:
    department.append(ville[:2])
df['departement']=department

df=df[(df.activitePrincipaleEtablissement=="47.22Z") & (df.departement.isin(liste_departements))]
df

Unnamed: 0,activitePrincipaleEtablissement,typeVoieEtablissement,libelleVoieEtablissement,libelleCommuneEtablissement,codeCommuneEtablissement,departement
38,47.22Z,RUE,DE L ORILLON,PARIS 11,75111,75
62,47.22Z,RUE,RICHER,PARIS 9,75109,75
69,47.22Z,BD,ARISTIDE BRIAND,CHAMPIGNY-SUR-MARNE,94017,94
79,47.22Z,RUE,MONTORGUEIL,PARIS 2,75102,75
80,47.22Z,RUE,DES CARNETS,CLAMART,92023,92
...,...,...,...,...,...,...
20889,47.22Z,RUE,SERPENTE,CHAMPIGNY-SUR-MARNE,94017,94
20890,47.22Z,BD,JEAN ALLEMANE,ARGENTEUIL,95018,95
20894,47.22Z,RUE,ARMAND FABRE,BRUNOY,91114,91
20908,47.22Z,BD,PASTEUR,LA COURNEUVE,93027,93


In [11]:
def get_location(adress):
    geolocator = Nominatim(user_agent="tutorial")
    location = geolocator.geocode(adress)
    return((location.longitude, location.latitude))

def get_location_all(commerces):
    commerces["typeVoieEtablissement"] = commerces["typeVoieEtablissement"].map({"RUE": "rue", "AV": "avenue", "CHS": "chaussée", "CHE": "chemin", "PL": "place", "HAM": "hameau", "BD": "boulevard", "QUAI": "quai", "ALL": "allée"})
    geom = []
    liste_long=[]
    liste_lat=[]
    for k in tqdm(range(len(commerces))):
        number_dropped=0
        new_k=k-number_dropped
        try:
            type_voie, libelle_voie, commune = commerces.iloc[new_k,1], commerces.iloc[new_k,2], commerces.iloc[new_k,3]
            loc1,loc2=get_location(f"{type_voie} {libelle_voie} {commune}")
            if loc1<4 and loc1>1.5 and loc2>47 and loc2<51:
                geom.append(Point(loc1,loc2))
                liste_long.append(loc2)
                liste_lat.append(loc1)
            else:
                commerces=commerces.drop(df.index[new_k])
                new_k+=1
        except Exception:
            commerces=commerces.drop(df.index[new_k])
    commerces = commerces.set_geometry(geom)
    commerces['long']=liste_long
    commerces['lat']=liste_lat
    return(commerces)

In [12]:
df=get_location_all(df)
df

  0%|          | 0/3294 [00:00<?, ?it/s]

Unnamed: 0,activitePrincipaleEtablissement,typeVoieEtablissement,libelleVoieEtablissement,libelleCommuneEtablissement,codeCommuneEtablissement,departement,geometry,long,lat
38,47.22Z,rue,DE L ORILLON,PARIS 11,75111,75,POINT (2.37473 48.86977),48.869769,2.374732
62,47.22Z,rue,RICHER,PARIS 9,75109,75,POINT (2.34523 48.87403),48.874034,2.345234
69,47.22Z,boulevard,ARISTIDE BRIAND,CHAMPIGNY-SUR-MARNE,94017,94,POINT (2.49736 48.82125),48.821248,2.497360
79,47.22Z,rue,MONTORGUEIL,PARIS 2,75102,75,POINT (2.34689 48.86512),48.865124,2.346891
80,47.22Z,rue,DES CARNETS,CLAMART,92023,92,POINT (2.25968 48.78950),48.789496,2.259681
...,...,...,...,...,...,...,...,...,...
19449,47.22Z,rue,LOUNES MATOUB,ARCUEIL,94003,94,POINT (2.53319 48.81815),48.818151,2.533187
19457,47.22Z,rue,DES MOUTIERS,FAREMOUTIERS,77176,77,POINT (2.25476 48.95440),48.954400,2.254758
19476,47.22Z,avenue,JEAN JAURES,CLAMART,92023,92,POINT (2.50776 48.70651),48.706507,2.507756
19483,47.22Z,rue,I ET F JOLIOT CURIE,MONTREUIL,93048,93,POINT (2.38401 48.92330),48.923297,2.384014


In [13]:
df.to_csv("boucheries.csv", encoding='utf-8', index=False)