## Enrich from a Dropcontact file
Enrich a CSV file from a Dropcontact result file

In [49]:
import pandas
from marketing_data_cleaning import DATA_FOLDER_PATH

In [50]:
enrichment_file = pandas.read_csv(DATA_FOLDER_PATH / 'enrichment.csv', sep=';')

In [51]:
file_to_enrich = pandas.read_csv(DATA_FOLDER_PATH / 'db/V2.csv')

Keep only the columns we need, and make sure they carry the same names as the matching columns in the file to enrich

In [52]:
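# Select the useful columns and drop exact duplicate rows in one chained call,
# which avoids modifying a slice in place (SettingWithCopyWarning)
enrichment_file = enrichment_file[['Nom', 'Prénom', 'Email', 'Company LinkedIn']].drop_duplicates()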
enrichment_file.head()

Unnamed: 0,Nom,Prénom,Email,Company LinkedIn
0,Delcommune,Franck,,www.linkedin.com/company/equilis
1,Cerisier,Jerome,jerome.cerisier@bmstores.fr,www.linkedin.com/company/france-bm
2,Barbera,Franck,franck.barbera@bmstores.fr,www.linkedin.com/company/france-bm
3,Rabia,Anis,anis.rabia@bchef.fr,www.linkedin.com/company/bchef
4,Fel,Elodie,,www.linkedin.com/company/bchef


In [53]:
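# No-op on this file: the columns selected above are already named 'Nom' and 'Prénom'.
# The mapping only matters for an export that uses the lowercase, unaccented names.
enrichment_file = enrichment_file.rename(columns={'nom': 'Nom', 'prenom': 'Prénom'})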
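
`DataFrame.rename` ignores mapping keys that are not present in the frame, so a mapping with the wrong case or accents changes nothing and raises no error. A minimal sketch of checking that the names line up before merging, on a made-up frame (`sample`, `expected` and `missing` are illustrative names, not part of the notebook):

```python
import pandas

# Toy frame standing in for the Dropcontact export (made-up data).
sample = pandas.DataFrame({'Nom': ['Durand'], 'Prénom': ['Alice']})

# Keys absent from the frame are ignored by default (errors='ignore'),
# so this call leaves the columns untouched.
renamed = sample.rename(columns={'nom': 'Nom', 'prenom': 'Prénom'})

# Explicit check that every column needed for the merge is present.
expected = ['Nom', 'Prénom']
missing = [name for name in expected if name not in renamed.columns]
```

An empty `missing` list means the merge keys exist under the expected names.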

Keep only the rows for which we actually have an email [optionally, consider also checking the LinkedIn profile URL]

In [54]:
# enrichment_file['email_empty'] = enrichment_file['Email'].empty
enrichment_file['email_empty'] = enrichment_file['Email'].isna()


In [55]:
# Boolean negation is the idiomatic form of `== False`
with_emails = enrichment_file[~enrichment_file['email_empty']]
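
`isna()` only flags missing values (`None`/`NaN`); an empty or whitespace-only string counts as present. Whether the real export contains such blank strings is not known from the rows shown, so the following is only a sketch on made-up data (`contacts`, `has_email` and `with_emails_sample` are hypothetical names) of a stricter filter:

```python
import pandas

# Made-up contacts: one real address, one missing value, one blank string.
contacts = pandas.DataFrame({
    'Nom': ['Durand', 'Martin', 'Petit'],
    'Email': ['alice.durand@example.com', None, '  '],
})

# Strip surrounding whitespace; missing values stay missing.
stripped = contacts['Email'].str.strip()

# A usable email is neither missing nor an empty string.
has_email = stripped.notna() & (stripped != '')
with_emails_sample = contacts[has_email]
```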


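Use a left merge: it keeps every row and column of our base dataframe and attaches the rows of the enrichment dataframe whose value matches on the column given by the `on` parameter. Columns that exist in both dataframes but are not part of `on` get the `_x` (base) and `_y` (enrichment) suffixes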

In [56]:
# Merge the rows that actually have an email (with_emails), not the unfiltered enrichment file
merged = file_to_enrich.merge(with_emails, on='Nom', how='left')
merged.head()

Unnamed: 0,Entreprise,Statut enrichissement,LinkedIn,Nom complet,Company LinkedIn_x,Site entreprise,Poste,Prénom_x,Nom,Email_x,Prénom_y,Email_y,Company LinkedIn_y,email_empty
0,Optique Lafayette,Non enrichi,https://www.linkedin.com/in/noemieleho,Noémie Le Ho,https://www.linkedin.com/company/optiquelafaye...,https://www.jeminstalle-optiquelafayette.com,Chargée Marketing et Communication,Noémie,Le Ho,,,,,
1,Optique Lafayette,Non enrichi,https://www.linkedin.com/in/laura-bernadac-704...,Laura Bernadac,https://www.linkedin.com/company/optiquelafaye...,https://www.jeminstalle-optiquelafayette.com,-Opticienne-,Laura,Bernadac,,,,,
2,Optique Lafayette,Non enrichi,https://www.linkedin.com/in/julien-berdeil-b75...,Julien Berdeil,https://www.linkedin.com/company/optiquelafaye...,https://www.jeminstalle-optiquelafayette.com,Gérant OPTIQUE LAFAYETTE,Julien,Berdeil,,,,,
3,Optique Lafayette,Non enrichi,https://www.linkedin.com/in/laurie-blanchard-2...,Laurie Blanchard,https://www.linkedin.com/company/optiquelafaye...,https://www.jeminstalle-optiquelafayette.com,Assistante ADV chez Groupe Lafsanté,Laurie,Blanchard,,,,,
4,Optique Lafayette,Non enrichi,https://www.linkedin.com/in/gilles-sureau-5666...,Gilles Sureau,https://www.linkedin.com/company/optiquelafaye...,https://www.jeminstalle-optiquelafayette.com,Monteur-vendeur chez le 7 opticien,Gilles,Sureau,,,,,
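
Matching on the last name alone can pair a contact with a namesake and duplicate base rows. A small sketch on made-up data (`base` and `extra` are hypothetical frames, not the notebook's files) comparing a merge on `Nom` with a merge on both `Nom` and `Prénom`:

```python
import pandas

# Made-up base file: two contacts without an email.
base = pandas.DataFrame({
    'Nom': ['Durand', 'Martin'],
    'Prénom': ['Alice', 'Bob'],
    'Email': [None, None],
})

# Made-up enrichment file: two different people share the last name Durand.
extra = pandas.DataFrame({
    'Nom': ['Durand', 'Durand'],
    'Prénom': ['Alice', 'Zoé'],
    'Email': ['alice.durand@example.com', 'zoe.durand@example.com'],
})

# On the last name alone, Alice Durand is matched with both Durand rows,
# and the shared columns come back as Prénom_x/Prénom_y and Email_x/Email_y.
by_last_name = base.merge(extra, on='Nom', how='left')

# On last and first name, each base contact keeps a single row.
by_full_name = base.merge(extra, on=['Nom', 'Prénom'], how='left')
```

With two key columns, only `Email` is suffixed, so any later `drop` or `rename` has to target `Email_x` and `Email_y` rather than the `Prénom_*` columns.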


In [57]:
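# Drop both first-name columns (the name is still available in 'Nom complet'),
# the base file's email column and the helper flag; 'Email_y' holds the enriched email
final_df = merged.drop(columns=['Prénom_x', 'Prénom_y', 'Email_x', 'email_empty'])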
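
Dropping `Email_x` discards any address that was already present in the base file. One possible alternative, sketched here on made-up data (`merged_sample` and `result` are hypothetical names), is to coalesce the two columns with `Series.combine_first`, keeping the existing address and falling back to the enriched one:

```python
import pandas

# Made-up merge result with the suffixed email columns.
merged_sample = pandas.DataFrame({
    'Nom': ['Durand', 'Martin', 'Petit'],
    'Email_x': [None, 'bob.martin@example.com', None],
    'Email_y': ['alice.durand@example.com', 'b.martin@example.com', None],
})

# Keep the base value when there is one, otherwise take the enriched value.
merged_sample['Email'] = merged_sample['Email_x'].combine_first(merged_sample['Email_y'])

# The suffixed columns are no longer needed once they are coalesced.
result = merged_sample.drop(columns=['Email_x', 'Email_y'])
```

Swapping the two operands would give priority to the enriched address instead; which one to trust is a choice that depends on the data.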


In [58]:
final_df.to_csv(DATA_FOLDER_PATH / 'db/V2-E.csv', index=False)