# Day 1: points

We always say France is all about __cheese__ 🧀. Let's prove it with maps built from official open statistical data!

This notebook uses the following resources:

- Sirene data (the official register of firms) downloaded from the [`API Sirene`](https://api.insee.fr/catalogue/site/themes/wso2/subthemes/insee/pages/item-info.jag?name=Sirene&version=V3&provider=insee) with the `Python` package [`pynsee`](https://github.com/InseeFrLab/pynsee)
- Address-based geocoding with the [`API BAN`](https://api.gouv.fr/les-api/base-adresse-nationale)
- Official French geographic borders fetched with the [`cartiflette`](https://github.com/InseeFrLab/cartiflette) `Python` package

## Preliminary steps

### Dependencies

To reproduce and extend this analysis, install the required packages
with the commands below.

If `pip install` cannot find `Git`, see the alternative
commands at the end of the notebook.

In [None]:
!pip install -r requirements.txt
!pip install git+https://github.com/inseefrlab/cartogether

<div class="alert alert-info" role="alert">
<h3 class="alert-heading">API Sirene </h3>
The <code>API Sirene</code> is an official French API developed by Insee, the French national statistical office. 
Documentation can be found
<a href="https://api.insee.fr/catalogue/site/themes/wso2/subthemes/insee/pages/item-info.jag?name=Sirene&version=V3&provider=insee">here</a>
(in French).
<br>
If you want to use it, you need to create an account, subscribe to the <code>API Sirene</code> service (free) and give 
<code>Python</code> your credentials (see below).
</div>

We suggest storing the credentials for `API Sirene` authentication in a `secrets.yaml` file.
This technique, which I teach [here](https://ensae-reproductibilite.netlify.app/application/#configyaml), avoids sharing personal credentials in notebooks.  

### Preparing authentication for `API Sirene`

In [10]:
import yaml
import os

with open('secrets.yaml', 'r') as file:
    options = yaml.safe_load(file)

os.environ['insee_key'] = options['insee_key']
os.environ['insee_secret'] = options['insee_secret']
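
A missing or misspelled key in `secrets.yaml` would surface in the cell above as a bare `KeyError`. As a minimal sketch, assuming the file holds exactly the two keys read above, a small check can fail early with a clearer message. The helper name `export_credentials` is hypothetical and is not part of `pynsee`.

```python
import os

# The two keys read from secrets.yaml in the cell above
REQUIRED_KEYS = ("insee_key", "insee_secret")


def export_credentials(options, environ=os.environ):
    """Copy the API Sirene credentials from a parsed YAML mapping into `environ`.

    Hypothetical helper: raises a ValueError that names the missing keys.
    """
    options = options or {}  # yaml.safe_load returns None for an empty file
    missing = [key for key in REQUIRED_KEYS if not options.get(key)]
    if missing:
        raise ValueError(f"secrets.yaml is missing: {', '.join(missing)}")
    for key in REQUIRED_KEYS:
        environ[key] = str(options[key])
    return environ
```

It could be called as `export_credentials(options)` right after `yaml.safe_load(file)`; passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.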

## Getting French borders and French firm level codes

In this section, we fetch two elements that will make it easy to produce our maps:

1. Get the sector code from the [French nomenclature of activities (NAF)](https://www.insee.fr/fr/information/2406147). This
lets us request only the data we are interested in.
2. Get the official French borders produced by the French national geographic institute (IGN).

We use two experimental packages built to help analyze official statistics: [`pynsee`](https://github.com/InseeFrLab/pynsee) and
[`cartiflette`](https://github.com/InseeFrLab/cartiflette).

We can get the full list of activities in the nomenclature with a function of the `pynsee` package:

In [14]:
import pynsee
naf5 = pynsee.get_activity_list('NAF5')

naf5.sample(5)

This function returns data shipped with the package.


| | A10 | A129 | A17 | A21 | A38 | A5 | A64 | A88 | NAF1 | NAF2 | NAF3 | NAF4 | NAF5 | TITLE_NAF5_40CH_FR | TITLE_NAF5_65CH_FR | TITLE_NAF5_FR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 386 | GI | G46Z | GZ | G | GZ | GU | GZ2 | 46 | G | 46 | 46.1 | 46.19 | 46.19B | Autre interm. commerce en prodts divers | Autres intermédiaires du commerce en produits ... | Autres intermédiaires du commerce en produits ... |
| 445 | GI | G47Z | GZ | G | GZ | GU | GZ3 | 47 | G | 47 | 47.2 | 47.24 | 47.24Z | Comm. dét. pain pâtiss. & confiser. (ms) | Comm. détail pain pâtisserie & confiserie (mag... | Commerce de détail de pain, pâtisserie et conf... |
| 129 | BE | C17B | C5 | C | CC | BE | CC2 | 17 | C | 17 | 17.2 | 17.21 | 17.21A | Fabrication de carton ondulé | Fabrication de carton ondulé | Fabrication de carton ondulé |
| 312 | BE | C33Z | C5 | C | CM | BE | CM2 | 33 | C | 33 | 33.2 | 33.2 | 33.20D | Inst. éqpt élec. électro. optiq. ou aut. | Instal. éqpts électriq, mat. électro. et optiq... | Installation d'équipements électriques, de mat... |
| 238 | BE | C26F | C3 | C | CI | BE | CI0 | 26 | C | 26 | 26.6 | 26.6 | 26.60Z | Fab. éqpt irrad. médic. & électromedic. | Fab. éqpts d'irradiation médic. électromédic. ... | Fabrication d'équipements d'irradiation médica... |


To find the code for cheese producers, we can filter the French labels on the word "fromage" (cheese):

In [16]:
naf5.loc[naf5['TITLE_NAF5_FR'].str.contains('fromage')]

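| | A10 | A129 | A17 | A21 | A38 | A5 | A64 | A88 | NAF1 | NAF2 | NAF3 | NAF4 | NAF5 | TITLE_NAF5_40CH_FR | TITLE_NAF5_65CH_FR | TITLE_NAF5_FR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | BE | C10E | C1 | C | CA | BE | CA0 | 10 | C | 10 | 10.5 | 10.51 | 10.51C | Fabrication de fromage | Fabrication de fromage | Fabrication de fromage |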
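
The `NAF2` to `NAF5` columns are nested: each level extends the previous one by one character. As an illustration only, assuming the usual `DD.DDL` layout of a NAF5 code (two digits, a dot, two digits and a letter), the parent levels can be derived from the code itself. The function `naf_levels` is a hypothetical helper, not part of `pynsee`.

```python
def naf_levels(naf5_code):
    """Split a NAF5 code such as '10.51C' into its parent levels.

    Assumes the 'DD.DDL' layout; raises ValueError otherwise.
    """
    division, _, rest = naf5_code.partition(".")
    valid = (
        len(division) == 2 and division.isdigit()
        and len(rest) == 3 and rest[:2].isdigit() and rest[2].isalpha()
    )
    if not valid:
        raise ValueError(f"unexpected NAF5 code: {naf5_code!r}")
    return {
        "NAF2": division,                  # division, e.g. '10'
        "NAF3": f"{division}.{rest[0]}",   # group, e.g. '10.5'
        "NAF4": f"{division}.{rest[:2]}",  # class, e.g. '10.51'
        "NAF5": naf5_code,                 # sub-class, e.g. '10.51C'
    }


print(naf_levels("10.51C"))
# {'NAF2': '10', 'NAF3': '10.5', 'NAF4': '10.51', 'NAF5': '10.51C'}
```

The result matches the `NAF2`, `NAF3` and `NAF4` values of the cheese row above.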


Finally, to get the official French borders, we can use an experimental `cartiflette` function:

In [21]:
from cartiflette.download import get_vectorfile_ign

france = get_vectorfile_ign(
  level = "REGION",
  field = "metropole",
  source = "COG",
  provider="opendatarchives"
  )
france = france.dissolve()

opendatarchives
COG


Downloading: 100%|██████████| 244M/244M [01:46<00:00, 2.41MiB/s] 
ERROR 1: PROJ: proj_create_from_database: Open of /opt/mamba/share/proj failed


We moved the helper functions that automate the downloads into a `functions.py` module:

In [28]:
import functions as fc

## Cheese producers map 🧀

Putting all these elements together, we can use `matplotlib` and `contextily` to get a nice
map.

We first fetch the data from the Sirene API and convert it to a `GeoPandas` `GeoDataFrame` to get a geographic object:

In [33]:
import geopandas as gpd

geodata_complete = fc.geoloc_data(
    fc.create_dataset_sirene()
)
gdf = gpd.GeoDataFrame(
    geodata_complete,
    geometry=gpd.points_from_xy(geodata_complete['longitude'], geodata_complete['latitude']),
    crs=4326)

--- 8.33044171333313 seconds ---
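
The content of `functions.py` is not shown here, so the following is only a rough sketch of what address-based geocoding with the BAN API might involve, not the actual implementation of `fc.geoloc_data`. It assumes the public search endpoint `https://api-adresse.data.gouv.fr/search/` and a GeoJSON `FeatureCollection` response whose first feature is the best match; both should be checked against the API documentation. The function names are hypothetical, and no network call is made in the sketch itself.

```python
import json
from urllib.parse import urlencode

# Assumed BAN search endpoint; check the API documentation before relying on it
BAN_SEARCH_URL = "https://api-adresse.data.gouv.fr/search/"


def build_ban_query(address, limit=1):
    """Build the search URL for one free-text address."""
    return f"{BAN_SEARCH_URL}?{urlencode({'q': address, 'limit': limit})}"


def parse_ban_response(payload):
    """Return (longitude, latitude) of the first feature, or None if no match.

    Assumes a GeoJSON FeatureCollection with Point geometries,
    whose coordinates are ordered as [longitude, latitude].
    """
    features = json.loads(payload).get("features", [])
    if not features:
        return None
    longitude, latitude = features[0]["geometry"]["coordinates"][:2]
    return longitude, latitude
```

The `(longitude, latitude)` order matches the arguments of `gpd.points_from_xy` used above, where `x` is the longitude and `y` the latitude.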


In [None]:
import contextily as ctx
import matplotlib.pyplot as plt

txt="Twitter: @linogaliana\nSource: IGN - Insee "

ax = gdf.to_crs(2154).plot(color='red', alpha=0.5, markersize=2, figsize=(7, 7))
france.plot(ax=ax, zorder=1, edgecolor="black", alpha=0.6, facecolor="none", color=None)
ctx.add_basemap(ax, source=ctx.providers.Stamen.Watercolor, crs=2154)
ax.set_axis_off()
ax.set_title('Cheese producers in France')
plt.figtext(.2, .03, txt, wrap=True, horizontalalignment='left', fontsize=8)

## Cow breeders 🐄

In [None]:
# 01.41Z is the NAF code for dairy cattle farming.
# The helpers live in functions.py, imported above as `fc`.
geopandas_vaches = fc.geoloc_data(
    fc.create_dataset_sirene("01.41Z")
)
geopandas_vaches = gpd.GeoDataFrame(
    geopandas_vaches,
    geometry=gpd.points_from_xy(geopandas_vaches['longitude'], geopandas_vaches['latitude']),
    crs=4326)
ax = geopandas_vaches.to_crs(2154).plot(color='red', alpha=0.5, markersize=2)
france.plot(ax=ax, zorder=1, edgecolor="black", alpha=0.6, facecolor="none", color=None)
ctx.add_basemap(ax, source=ctx.providers.Stamen.Watercolor, crs=2154)
ax.set_axis_off()

## Camel breeders 🐪

In [None]:
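# 01.44Z is the NAF code for raising camels and other camelids.
# The helpers live in functions.py, imported above as `fc`.
geopandas_chameaux = fc.geoloc_data(
    fc.create_dataset_sirene("01.44Z")
)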

In [None]:
gdf_chameaux = gpd.GeoDataFrame(
    geopandas_chameaux,
    geometry=gpd.points_from_xy(geopandas_chameaux['longitude'], geopandas_chameaux['latitude']),
    crs=4326)

In [None]:
ax = gdf_chameaux.to_crs(2154).plot(color = 'red', alpha = 0.5, markersize = 2, figsize = (7,7))
france.plot(ax=ax, zorder=1, edgecolor="black", alpha=0.6, facecolor="none", color=None)
ctx.add_basemap(ax, source = ctx.providers.Stamen.Watercolor, crs = 2154)
ax.set_axis_off()

## Alternative installation

Use the commands below if the `pip install` cells at the top of the notebook fail.

In [None]:
!pip install pynsee

In [None]:
!git clone https://github.com/InseeFrLab/cartogether.git
%cd ./cartogether
!pip install -r requirements.txt
!pip install .