This notebook imports and processes the data.

Because it takes quite a long time to run, we saved the final dataframes as CSV files. To run this notebook, you will need a key and a secret token, which you can obtain by signing up at https://api.insee.fr/catalogue/site/themes/wso2/subthemes/insee/pages/sign-up.jag and which you must fill in in the first code cell.

In [8]:
key="k32RC1ZJH8RV4Llh6kTRakU15tca"
secret="CTnVGRLchI7dbJOlrvYbrOyIfnMa"
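
As an alternative to typing the credentials into the notebook, they can be read from the environment. The following is only a minimal sketch: the function `load_insee_credentials` and the variable names `INSEE_KEY` and `INSEE_SECRET` are assumptions made for illustration, not something the INSEE API or pynsee requires.

```python
import os


def load_insee_credentials(env=os.environ):
    """Return (key, secret) read from a mapping of environment variables.

    INSEE_KEY and INSEE_SECRET are hypothetical names chosen for this sketch.
    """
    try:
        return env["INSEE_KEY"], env["INSEE_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None


# Usage in the notebook (assumes both variables were exported beforehand):
# key, secret = load_insee_credentials()
```

The rest of the notebook can then use `key` and `secret` unchanged.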

**Imports**

In [1]:
#!pip install contextily
#!pip install geopandas
#!pip install pygeos
#!pip install geopy
#!pip install pynsee[full]

In [24]:
import contextily as ctx
from geopy.geocoders import Nominatim
from shapely.geometry import Point
import geopandas as gpd
import pandas as pd
import pynsee
from pynsee.utils.init_conn import init_conn
from tqdm.notebook import tqdm, trange

**Importing the data**

We used the Sirene API so as not to overload our working environment with the complete Sirene database file.

In [29]:
N=100000

init_conn(insee_key=key, insee_secret=secret)
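# Request up to N establishments (kind='siret') whose main activity code matches 47.73Z.
data = pynsee.search_sirene(variable = ["activitePrincipaleEtablissement"],
                       pattern = "47.73Z", kind = 'siret', number=N)

# Keep only the records that have no end date.
data = data.loc[data['dateFin'].isnull()]
# Keep the activity code and the address fields; copy so that later column assignments do not act on a view.
df=data[["activitePrincipaleEtablissement","typeVoieEtablissement","libelleVoieEtablissement","libelleCommuneEtablissement","codeCommuneEtablissement"]].copy()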

Token has been created
!!! Please subscribe to BDM API on api.insee.fr !!!
!!! Please subscribe to Metadata API on api.insee.fr !!!
!!! Please subscribe to Local Data API on api.insee.fr !!!


In [30]:
df=df.dropna()

# Departments of the Île-de-France region.
liste_departements=["75","77","78","91","92","93","94","95"]
# The department is given by the first two characters of the commune code.
df['departement']=df['codeCommuneEtablissement'].str[:2]

df=df[(df.activitePrincipaleEtablissement=="47.73Z") & (df.departement.isin(liste_departements))]
df

Unnamed: 0,activitePrincipaleEtablissement,typeVoieEtablissement,libelleVoieEtablissement,libelleCommuneEtablissement,codeCommuneEtablissement,departement
26,47.73Z,RUE,D ESTIENNE D ORVES,FONTENAY-SOUS-BOIS,94033,94
28,47.73Z,RUE,PETIT DE BEAUVERGER,BRIE-COMTE-ROBERT,77053,77
31,47.73Z,RUE,CHALIGNY,PARIS 12,75112,75
37,47.73Z,RUE,AUGUSTE RENOIR,MARGENCY,95369,95
41,47.73Z,AV,DE VILLIERS,PARIS 17,75117,75
...,...,...,...,...,...,...
25314,47.73Z,BD,PASTEUR,PARIS 15,75115,75
25318,47.73Z,RUE,DE LA MAISON BLANCHE,FONTENAILLES,77191,77
25324,47.73Z,RUE,DES ABBESSES,PARIS 18,75118,75
25352,47.73Z,PL,DE L'EGLISE,LIVRY-SUR-SEINE,77255,77
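
Taking the first two characters of the commune code is enough for the eight departments kept here. A slightly more general version could look like the sketch below; `departement_from_commune_code` and `is_ile_de_france` are hypothetical helpers written for illustration, and the overseas handling assumes the three-character `97x` department codes.

```python
ILE_DE_FRANCE = {"75", "77", "78", "91", "92", "93", "94", "95"}


def departement_from_commune_code(code: str) -> str:
    """Return the department part of a 5-character INSEE commune code."""
    code = code.strip()
    if len(code) != 5:
        raise ValueError(f"expected a 5-character commune code, got {code!r}")
    # Overseas departments use three characters (971, 972, ...).
    if code.startswith("97"):
        return code[:3]
    # Metropolitan departments, including Corsica (2A, 2B), use two.
    return code[:2]


def is_ile_de_france(code: str) -> bool:
    """True when the commune code belongs to an Île-de-France department."""
    return departement_from_commune_code(code) in ILE_DE_FRANCE
```

For example, `departement_from_commune_code("94033")` gives `"94"`, the value shown in the `departement` column above for FONTENAY-SOUS-BOIS.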


In [31]:
def get_location(adress):
    # Geocode a free-text address with Nominatim and return (longitude, latitude).
    # When no match is found, geocode returns None and the attribute access below raises.
    geolocator = Nominatim(user_agent="tutorial")
    location = geolocator.geocode(adress)
    return((location.longitude, location.latitude))

def get_location_all(commerces):
    # Work on a copy so that the caller's dataframe is left untouched.
    commerces = commerces.copy()
    # Expand street-type abbreviations; types missing from this mapping become NaN.
    commerces["typeVoieEtablissement"] = commerces["typeVoieEtablissement"].map({"RUE": "rue", "AV": "avenue", "CHS": "chaussée", "CHE": "chemin", "PL": "place", "HAM": "hameau", "BD": "boulevard", "QUAI": "quai", "ALL": "allée"})
    geom = []
    liste_long=[]
    liste_lat=[]
    to_drop=[]
    for k in tqdm(range(len(commerces))):
        try:
            type_voie, libelle_voie, commune = commerces.iloc[k,1], commerces.iloc[k,2], commerces.iloc[k,3]
            # loc1 is the longitude and loc2 the latitude.
            loc1,loc2=get_location(f"{type_voie} {libelle_voie} {commune}")
            # Keep only points inside a rough bounding box; anything else is treated as a geocoding error.
            if 1.5<loc1<4 and 47<loc2<51:
                geom.append(Point(loc1,loc2))
                liste_long.append(loc1)
                liste_lat.append(loc2)
            else:
                to_drop.append(k)
        except Exception:
            # Geocoding failed or returned no result: drop the row.
            to_drop.append(k)
    commerces=commerces.drop(commerces.index[to_drop])
    commerces = commerces.set_geometry(geom)
    commerces['long']=liste_long
    commerces['lat']=liste_lat
    return(commerces)

In [32]:
df=get_location_all(df)
df

  0%|          | 0/4346 [00:00<?, ?it/s]

Unnamed: 0,activitePrincipaleEtablissement,typeVoieEtablissement,libelleVoieEtablissement,libelleCommuneEtablissement,codeCommuneEtablissement,departement,geometry,long,lat
26,47.73Z,rue,D ESTIENNE D ORVES,FONTENAY-SOUS-BOIS,94033,94,POINT (2.45423 48.85287),2.454225,48.852871
28,47.73Z,rue,PETIT DE BEAUVERGER,BRIE-COMTE-ROBERT,77053,77,POINT (2.60362 48.69547),2.603616,48.695474
31,47.73Z,rue,CHALIGNY,PARIS 12,75112,75,POINT (2.38379 48.84609),2.383790,48.846095
37,47.73Z,rue,AUGUSTE RENOIR,MARGENCY,95369,95,POINT (2.28987 49.00368),2.289873,49.003681
41,47.73Z,avenue,DE VILLIERS,PARIS 17,75117,75,POINT (2.31354 48.88181),2.313540,48.881813
...,...,...,...,...,...,...,...,...,...
25314,47.73Z,boulevard,PASTEUR,PARIS 15,75115,75,POINT (2.31193 48.84431),2.311929,48.844310
25318,47.73Z,rue,DE LA MAISON BLANCHE,FONTENAILLES,77191,77,POINT (2.95201 48.55047),2.952010,48.550469
25324,47.73Z,rue,DES ABBESSES,PARIS 18,75118,75,POINT (2.33817 48.88461),2.338168,48.884614
25352,47.73Z,place,DE L'EGLISE,LIVRY-SUR-SEINE,77255,77,POINT (2.68483 48.51077),2.684826,48.510771
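
Sending one request per row is probably what makes this step slow, and the public Nominatim service asks clients to stay at or below one request per second. The sketch below shows one possible way to throttle calls and to avoid geocoding the same address twice. `make_cached_geocoder` and its parameters are hypothetical names, not part of the notebook or of geopy; the `geocode` argument stands for any function that maps an address string to a `(longitude, latitude)` pair or `None`.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def make_cached_geocoder(
    geocode: Callable[[str], Optional[Coord]],
    min_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[str], Optional[Coord]]:
    """Wrap `geocode` with an in-memory cache and a minimum delay between calls."""
    cache: Dict[str, Optional[Coord]] = {}
    last_call: Optional[float] = None

    def lookup(address: str) -> Optional[Coord]:
        nonlocal last_call
        # Addresses already seen are answered from the cache, without waiting.
        if address in cache:
            return cache[address]
        # Wait until at least `min_delay` seconds have passed since the last real call.
        if last_call is not None:
            wait = min_delay - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        result = geocode(address)
        last_call = clock()
        cache[address] = result
        return result

    return lookup
```

The wrapper could be applied to a function like `get_location` defined above. geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which covers the throttling part and may be the simpler choice.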


In [18]:
df.to_csv("librairies.csv", encoding='utf-8', index=False)
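
When a dataframe with a geometry column is written to CSV, the points are typically stored as WKT text such as `POINT (2.45423 48.85287)`, so they need to be parsed again after reloading the file. A minimal standard-library sketch follows; `parse_wkt_point` is a hypothetical helper that only handles two-dimensional points.

```python
import re
from typing import Tuple

# Matches a two-dimensional WKT point, e.g. "POINT (2.45423 48.85287)".
_POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$")


def parse_wkt_point(text: str) -> Tuple[float, float]:
    """Return (longitude, latitude) parsed from a WKT POINT string."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {text!r}")
    # float() raises ValueError itself if a coordinate is not numeric.
    return float(match.group(1)), float(match.group(2))
```

If geopandas is available when reloading, `geopandas.GeoSeries.from_wkt` is likely the more convenient way to rebuild the geometry column.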