

### French statistical data from [INSEE](http://example.com/ "Title")



    
### File: base_etablissement_par_tranche_effectif : number of firms in every French town, categorized by size (source: INSEE)


* CODGEO : geographic code of the town (can be joined with the *code_insee* column from "name_geographic_information.csv")
* LIBGEO : name of the town (in French)
* REG : region number
* DEP : department number
* E14TST : total number of firms in the town
* E14TS0ND : number of firms of zero or unknown size in the town
* E14TS1 : number of firms with 1 to 5 employees in the town
* E14TS6 : number of firms with 6 to 9 employees in the town
* E14TS10 : number of firms with 10 to 19 employees in the town
* E14TS20 : number of firms with 20 to 49 employees in the town
* E14TS50 : number of firms with 50 to 99 employees in the town
* E14TS100 : number of firms with 100 to 199 employees in the town
* E14TS200 : number of firms with 200 to 499 employees in the town
* E14TS500 : number of firms with 500 or more employees in the town
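Assuming the nine size columns (E14TS0ND to E14TS500) are mutually exclusive bands, they should add up to E14TST. A minimal check, using the three rows printed by `industry.head(3)` further down so that it runs without the CSV; on the full table the same lines apply to `industry` directly:

```python
import pandas as pd

# First three rows of industry.head(3), copied from the output shown further down
industry = pd.DataFrame({
    "LIBGEO": ["L'Abergement-Clémenciat", "L'Abergement-de-Varey", "Ambérieu-en-Bugey"],
    "E14TST": [25, 10, 996],
    "E14TS0ND": [22, 9, 577],
    "E14TS1": [1, 1, 272],
    "E14TS6": [2, 0, 63],
    "E14TS10": [0, 0, 46],
    "E14TS20": [0, 0, 24],
    "E14TS50": [0, 0, 9],
    "E14TS100": [0, 0, 3],
    "E14TS200": [0, 0, 2],
    "E14TS500": [0, 0, 0],
})

size_cols = ["E14TS0ND", "E14TS1", "E14TS6", "E14TS10", "E14TS20",
             "E14TS50", "E14TS100", "E14TS200", "E14TS500"]

# If the bands are exclusive, their row sum equals the total number of firms
industry["bands_sum"] = industry[size_cols].sum(axis=1)
print((industry["bands_sum"] == industry["E14TST"]).all())

# Share of the town's firms that fall in the 1-5 employee band
industry["share_1_5"] = industry["E14TS1"] / industry["E14TST"]
print(industry[["LIBGEO", "share_1_5"]].round(3))
```

On these three rows the check holds (for Ambérieu-en-Bugey, 577 + 272 + 63 + 46 + 24 + 9 + 3 + 2 + 0 = 996); running it on the whole file would tell whether that is true everywhere.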


### File: name_geographic_information : geographic data on French towns (mainly latitude and longitude, but also region and department codes and names)

*    EU_circo : name of the European Parliament constituency (circonscription)
*    code_région : code of the region the town belongs to
*    nom_région : name of the region the town belongs to
*    chef.lieu_région : name of the region's administrative capital (chef-lieu)
*    numéro_département : code of the department the town belongs to
*    nom_département : name of the department the town belongs to
*    préfecture : name of the department's prefecture (administrative capital)
*    numéro_circonscription : constituency (circonscription) number
*    nom_commune : name of the town
*    codes_postaux : postal codes of the town
*    code_insee : unique INSEE code of the town
*    latitude : GPS latitude
*    longitude : GPS longitude
*    éloignement : (literally "distance") I couldn't figure out what this number means

### File: net_salary_per_town_categories : mean net hourly salaries per French town, by job category, age and sex

*    CODGEO : unique code of the town
*    LIBGEO : name of the town
*    SNHM14 : mean net salary per hour
*    SNHMC14 : mean net salary per hour for executives
*    SNHMP14 : mean net salary per hour for middle managers
*    SNHME14 : mean net salary per hour for employees
*    SNHMO14 : mean net salary per hour for workers
*    SNHMF14 : mean net salary per hour for women
*    SNHMFC14 : mean net salary per hour for female executives
*    SNHMFP14 : mean net salary per hour for female middle managers
*    SNHMFE14 : mean net salary per hour for female employees
*    SNHMFO14 : mean net salary per hour for female workers
*    SNHMH14 : mean net salary per hour for men
*    SNHMHC14 : mean net salary per hour for male executives
*    SNHMHP14 : mean net salary per hour for male middle managers
*    SNHMHE14 : mean net salary per hour for male employees
*    SNHMHO14 : mean net salary per hour for male workers
*    SNHM1814 : mean net salary per hour for people aged 18-25
*    SNHM2614 : mean net salary per hour for people aged 26-50
*    SNHM5014 : mean net salary per hour for people over 50
*    SNHMF1814 : mean net salary per hour for women aged 18-25
*    SNHMF2614 : mean net salary per hour for women aged 26-50
*    SNHMF5014 : mean net salary per hour for women over 50
*    SNHMH1814 : mean net salary per hour for men aged 18-25
*    SNHMH2614 : mean net salary per hour for men aged 26-50
*    SNHMH5014 : mean net salary per hour for men over 50
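CODGEO is the town key shared by the INSEE tables, so they can be combined with pandas' `DataFrame.merge`; for the geographic file, whose key is named `code_insee`, you would pass `left_on="CODGEO", right_on="code_insee"` instead of `on`, assuming both keys are loaded with the same dtype. Not every town appears in every table (the salary sample below starts at 1004, the firm sample at 1001), so the join type matters. A small sketch on rows copied from the `head(3)` outputs further down:

```python
import pandas as pd

# Rows copied from industry.head(3) and salary.head(3) shown further down
industry = pd.DataFrame({
    "CODGEO": [1001, 1002, 1004],
    "LIBGEO": ["L'Abergement-Clémenciat", "L'Abergement-de-Varey", "Ambérieu-en-Bugey"],
    "E14TST": [25, 10, 996],
})
salary = pd.DataFrame({
    "CODGEO": [1004, 1007, 1014],
    "LIBGEO": ["Ambérieu-en-Bugey", "Ambronay", "Arbent"],
    "SNHM14": [13.7, 13.5, 13.5],
})

# Drop the duplicate town name so the result keeps a single LIBGEO column
salary_values = salary.drop(columns="LIBGEO")

# Inner join: only towns present in both tables survive
inner = industry.merge(salary_values, on="CODGEO", how="inner")
print(inner)

# Left join: every firm row is kept; towns without salary data get NaN
left = industry.merge(salary_values, on="CODGEO", how="left")
print(left["SNHM14"].isna().sum())
```

Here the inner join keeps only Ambérieu-en-Bugey, while the left join keeps all three firm rows and leaves SNHM14 empty for the two towns missing from the salary sample.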

### File: population : population of France per town, by age, sex and cohabitation mode

*    NIVGEO : geographic level (arrondissement, communes...)
*    CODGEO : unique code of the town
*    LIBGEO : name of the town (may contain some UTF-8 encoding errors; the names in name_geographic_information are of better quality)
*    MOCO : cohabitation mode [list and meaning available in Data description]
*    AGEQ80_17 : age category in 5-year bands, labeled by the lower bound (e.g. 0 → people aged 0 to 4)
*    SEXE : sex, 1 for men, 2 for women
*    NB : number of people in the category
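Decoding AGEQ80_17 and SEXE, then totalling NB per town and sex, might look like the sketch below. The rows are the three printed by `population.head(3)` further down, so the totals cover only those rows. The band labels assume every band is 5 years wide, which may not hold for the last (possibly open-ended) band:

```python
import pandas as pd

# First three rows of population.head(3), copied from the output further down
population = pd.DataFrame({
    "NIVGEO": ["COM", "COM", "COM"],
    "CODGEO": [1001, 1001, 1001],
    "LIBGEO": ["L'Abergement-Clémenciat"] * 3,
    "MOCO": [11, 11, 11],
    "AGEQ80_17": [0, 0, 5],
    "SEXE": [1, 2, 1],
    "NB": [15, 15, 20],
})

# Assumption: each code is the lower bound of a 5-year band (0 -> 0-4, 5 -> 5-9)
population["age_band"] = (population["AGEQ80_17"].astype(str) + "-"
                          + (population["AGEQ80_17"] + 4).astype(str))
population["sex"] = population["SEXE"].map({1: "men", 2: "women"})

# People per town and sex, summed over age bands and cohabitation modes
counts = population.groupby(["CODGEO", "sex"])["NB"].sum().unstack()
print(population[["AGEQ80_17", "age_band", "sex", "NB"]])
print(counts)
```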

### File: departments.geojson : borders of the French departments

```python
import pandas as pd
from mpl_toolkits.basemap import Basemap  # map plotting, not used in the cells shown here
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import numpy as np
%matplotlib inline

# Load the four INSEE tables
geography = pd.read_csv("name_geographic_information.csv")
industry = pd.read_csv("base_etablissement_par_tranche_effectif.csv")
salary = pd.read_csv("net_salary_per_town_categories.csv")
# low_memory=False parses the file in one pass, avoiding mixed dtypes within a column
population = pd.read_csv("population.csv", low_memory=False)

# Keep only the region, town name, INSEE code and coordinates
geography.drop(['EU_circo', 'code_région', 'éloignement', 'numéro_département',
                'nom_département', 'préfecture', 'numéro_circonscription',
                'codes_postaux'], axis=1, inplace=True)
# Translate the remaining French column names ('codes_postaux' is already dropped,
# so it no longer needs renaming); 'common_name' holds the commune (town) name
geography.rename(columns={'nom_région': 'region_name',
                          'chef.lieu_région': 'region_capital',
                          'nom_commune': 'common_name'}, inplace=True)
```


## Table structure

```python
geography.head(3)
```

| | region_name | region_capital | common_name | code_insee | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | Rhône-Alpes | Lyon | Attignat | 1024 | 46.283333 | 5.166667 |
| 1 | Rhône-Alpes | Lyon | Beaupont | 1029 | 46.4 | 5.266667 |
| 2 | Rhône-Alpes | Lyon | Bény | 1038 | 46.333333 | 5.283333 |


```python
industry.head(3)
```

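| | CODGEO | LIBGEO | REG | DEP | E14TST | E14TS0ND | E14TS1 | E14TS6 | E14TS10 | E14TS20 | E14TS50 | E14TS100 | E14TS200 | E14TS500 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | L'Abergement-Clémenciat | 82 | 1 | 25 | 22 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1002 | L'Abergement-de-Varey | 82 | 1 | 10 | 9 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1004 | Ambérieu-en-Bugey | 82 | 1 | 996 | 577 | 272 | 63 | 46 | 24 | 9 | 3 | 2 | 0 |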


```python
salary.head(3)
```

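| | CODGEO | LIBGEO | SNHM14 | SNHMC14 | SNHMP14 | SNHME14 | SNHMO14 | SNHMF14 | SNHMFC14 | SNHMFP14 | ... | SNHMHO14 | SNHM1814 | SNHM2614 | SNHM5014 | SNHMF1814 | SNHMF2614 | SNHMF5014 | SNHMH1814 | SNHMH2614 | SNHMH5014 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1004 | Ambérieu-en-Bugey | 13.7 | 24.2 | 15.5 | 10.3 | 11.2 | 11.6 | 19.1 | 13.2 | ... | 11.6 | 10.5 | 13.7 | 16.1 | 9.7 | 11.8 | 12.5 | 11.0 | 14.9 | 18.6 |
| 1 | 1007 | Ambronay | 13.5 | 22.1 | 14.7 | 10.7 | 11.4 | 11.9 | 19.0 | 13.3 | ... | 11.7 | 9.8 | 13.8 | 14.6 | 9.2 | 12.2 | 12.5 | 10.2 | 14.9 | 16.4 |
| 2 | 1014 | Arbent | 13.5 | 27.6 | 15.6 | 11.1 | 11.1 | 10.9 | 19.5 | 11.7 | ... | 11.8 | 9.3 | 13.3 | 16.0 | 8.9 | 10.6 | 12.5 | 9.6 | 15.1 | 18.6 |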


```python
population.head(3)
```

| | NIVGEO | CODGEO | LIBGEO | MOCO | AGEQ80_17 | SEXE | NB |
|---|---|---|---|---|---|---|---|
| 0 | COM | 1001 | L'Abergement-Clémenciat | 11 | 0 | 1 | 15 |
| 1 | COM | 1001 | L'Abergement-Clémenciat | 11 | 0 | 2 | 15 |
| 2 | COM | 1001 | L'Abergement-Clémenciat | 11 | 5 | 1 | 20 |


```python
# Unweighted means over towns: each town counts once, whatever its size
national_mean = salary['SNHM14'].mean()
national_mean_womans = salary['SNHMF14'].mean()
national_mean_mans = salary['SNHMH14'].mean()

print("National mean: {}".format(national_mean))
print("National mean (women): {}".format(national_mean_womans))
print("National mean (men): {}".format(national_mean_mans))
```

```
National mean: 13.7063862928
National mean (women): 12.0380257009
National mean (men): 14.8481892523
```
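The two printed means give a rough relative gap between men's and women's hourly pay. Because both are unweighted averages over towns, the figure is indicative only, not an official pay-gap statistic. A minimal sketch that reuses the values printed by the cell above:

```python
# Means printed by the cell above (unweighted averages over towns)
national_mean_womans = 12.0380257009
national_mean_mans = 14.8481892523

# How much lower the women's mean is, relative to the men's mean
relative_gap = (national_mean_mans - national_mean_womans) / national_mean_mans
print("Relative gap (men vs women): {:.1%}".format(relative_gap))
# → Relative gap (men vs women): 18.9%
```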


## Top 5 towns with the highest mean salaries

```python
# Sort towns by mean net hourly salary, highest first
sorted_net_mean_salaries = salary.sort_values(by='SNHM14', ascending=False)
sorted_net_mean_salaries.head(5)
```

| | CODGEO | LIBGEO | SNHM14 | SNHMC14 | SNHMP14 | SNHME14 | SNHMO14 | SNHMF14 | SNHMFC14 | SNHMFP14 | ... | SNHMHO14 | SNHM1814 | SNHM2614 | SNHM5014 | SNHMF1814 | SNHMF2614 | SNHMF5014 | SNHMH1814 | SNHMH2614 | SNHMH5014 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4225 | 78571 | Saint-Nom-la-Bretèche | 43.3 | 51.5 | 37.2 | 15.6 | 33.5 | 24.4 | 31.8 | 18.6 | ... | 35.3 | 11.4 | 38.1 | 56.9 | 11.4 | 24.7 | 25.9 | 11.4 | 45.4 | 68.6 |
| 4170 | 78233 | Feucherolles | 38.7 | 47.8 | 29.0 | 15.5 | 25.1 | 22.8 | 29.6 | 16.6 | ... | 26.4 | 10.0 | 32.3 | 54.0 | 10.2 | 21.9 | 27.1 | 9.8 | 39.2 | 65.8 |
| 4173 | 78251 | Fourqueux | 38.6 | 45.9 | 19.1 | 16.5 | 46.3 | 23.4 | 31.0 | 17.5 | ... | 53.2 | 11.5 | 34.2 | 49.6 | 9.7 | 23.4 | 26.2 | 12.6 | 40.9 | 59.9 |
| 4872 | 92051 | Neuilly-sur-Seine | 36.7 | 47.8 | 23.4 | 15.7 | 22.9 | 26.7 | 35.5 | 19.0 | ... | 23.5 | 12.6 | 34.4 | 47.6 | 12.0 | 26.6 | 31.0 | 13.2 | 41.6 | 61.3 |
| 4237 | 78650 | Le Vésinet | 36.3 | 46.7 | 19.9 | 15.1 | 22.0 | 25.2 | 33.4 | 16.5 | ... | 23.8 | 11.3 | 32.3 | 50.5 | 10.9 | 24.7 | 30.3 | 11.6 | 38.1 | 62.1 |
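`DataFrame.nlargest` gives the same ranking as `sort_values(...).head(n)` in one call, and pandas documents it as the more efficient option. The sketch also reads the department from the town code, assuming the usual INSEE layout of two department digits followed by three commune digits; that holds for these mainland codes but not for Corsica (2A/2B) or the overseas departments. The data are the five rows of the table above:

```python
import pandas as pd

# CODGEO / LIBGEO / SNHM14 copied from the top-5 table above
top = pd.DataFrame({
    "CODGEO": [78571, 78233, 78251, 92051, 78650],
    "LIBGEO": ["Saint-Nom-la-Bretèche", "Feucherolles", "Fourqueux",
               "Neuilly-sur-Seine", "Le Vésinet"],
    "SNHM14": [43.3, 38.7, 38.6, 36.7, 36.3],
})

# Same rows and order as top.sort_values("SNHM14", ascending=False).head(3)
top3 = top.nlargest(3, "SNHM14")
print(top3)

# Hypothetical helper column: department = code // 1000 (mainland codes only)
top["DEP"] = top["CODGEO"] // 1000
print(top["DEP"].value_counts())
```

Under that assumption, four of the five best-paid towns fall in department 78 and the fifth in department 92.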


## Top 5 towns with the lowest mean salaries

```python
# Last five rows of the descending sort = lowest mean salaries
sorted_net_mean_salaries.tail(5)
```

| | CODGEO | LIBGEO | SNHM14 | SNHMC14 | SNHMP14 | SNHME14 | SNHMO14 | SNHMF14 | SNHMFC14 | SNHMFP14 | ... | SNHMHO14 | SNHM1814 | SNHM2614 | SNHM5014 | SNHMF1814 | SNHMF2614 | SNHMF5014 | SNHMH1814 | SNHMH2614 | SNHMH5014 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 604 | 17397 | Saint-Savinien | 10.3 | 20.7 | 12.4 | 9.7 | 8.7 | 10.1 | 17.9 | 11.9 | ... | 8.9 | 8.7 | 9.8 | 12.2 | 8.6 | 9.6 | 12.0 | 8.7 | 10.0 | 12.3 |
| 763 | 23008 | Aubusson | 10.3 | 19.5 | 12.7 | 9.5 | 8.7 | 9.6 | 16.6 | 11.5 | ... | 9.1 | 8.5 | 9.8 | 11.7 | 8.4 | 9.3 | 10.5 | 8.5 | 10.1 | 12.7 |
| 4314 | 81033 | Blaye-les-Mines | 10.3 | 21.1 | 12.4 | 9.5 | 9.0 | 9.6 | 15.6 | 11.3 | ... | 9.4 | 8.5 | 10.1 | 11.9 | 8.5 | 9.6 | 10.0 | 8.5 | 10.4 | 13.7 |
| 2237 | 47168 | Miramont-de-Guyenne | 10.3 | 20.6 | 13.0 | 9.3 | 9.4 | 9.5 | 15.3 | 11.7 | ... | 9.7 | 8.1 | 10.2 | 11.4 | 7.8 | 9.7 | 9.9 | 8.4 | 10.4 | 12.5 |
| 1184 | 31042 | Bagnères-de-Luchon | 10.2 | 19.6 | 12.5 | 9.2 | 8.8 | 9.6 | 19.8 | 10.8 | ... | 9.1 | 9.0 | 9.9 | 11.8 | 9.1 | 9.6 | 10.1 | 8.8 | 10.2 | 13.7 |
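Four of these towns share SNHM14 = 10.3, so which towns `tail(5)` shows may depend on the sort: `sort_values` uses a non-stable quicksort by default, and other towns with the same value may exist outside the five shown. `DataFrame.nsmallest` returns the lowest values first, and its `keep="all"` option makes ties explicit. A sketch on the five rows above:

```python
import pandas as pd

# LIBGEO / SNHM14 copied from the bottom-5 table above
bottom = pd.DataFrame({
    "LIBGEO": ["Saint-Savinien", "Aubusson", "Blaye-les-Mines",
               "Miramont-de-Guyenne", "Bagnères-de-Luchon"],
    "SNHM14": [10.3, 10.3, 10.3, 10.3, 10.2],
})

# Lowest value first, unlike tail() of a descending sort
lowest = bottom.nsmallest(1, "SNHM14")
print(lowest)

# keep="all" returns every row tied with the n-th smallest value,
# so asking for 2 rows yields 5 here (10.2 plus the four towns at 10.3)
with_ties = bottom.nsmallest(2, "SNHM14", keep="all")
print(len(with_ties))
```

On the full `salary` table, `salary.nsmallest(5, "SNHM14", keep="all")` would list every town tied at the cut-off instead of an arbitrary subset.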
