# Name file tutorial

When working with text data, names may need specific processing:
- Stopword processing: if names are irrelevant, they may be discarded during text processing.
- Flagging: if the presence of a name is relevant but the name itself is not, names may be replaced with a flag (bob -> flag_name).
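
The two treatments can be illustrated on a list of tokens. This is a minimal sketch in plain Python, not Melusine's implementation: the helpers `discard_names` and `flag_names` and the small name set are hypothetical, and only the `flag_name` token comes from the text above.

```python
def discard_names(tokens, names):
    """Stopword processing: drop every token that is a known name."""
    return [token for token in tokens if token.lower() not in names]


def flag_names(tokens, names, flag="flag_name"):
    """Flagging: replace every token that is a known name with a flag."""
    return [flag if token.lower() in names else token for token in tokens]


# Hypothetical name list and tokenized sentence, for illustration only
names = {"bob", "alice"}
tokens = ["hello", "bob", "how", "are", "you"]

print(discard_names(tokens, names))  # ['hello', 'how', 'are', 'you']
print(flag_names(tokens, names))     # ['hello', 'flag_name', 'how', 'are', 'you']
```

Flagging keeps the sentence length and the position of the name, which matters when the presence of a name is itself a useful signal.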

By default, Melusine identifies names using an explicit list of names stored in a file ('melusine/config/names.csv').  
The default name list comes from a name dataset publicly available on the French government website ([link](https://www.data.gouv.fr/fr/datasets/ficher-des-prenoms-de-1900-a-2017)).  
This list contains the first names given to children (French or not) born in France between 1900 and 2017.

The user may specify a **custom name list** using a **custom 'names.csv' file**.

In [1]:
import pandas as pd
import os

### Loading the name file

In [2]:
from melusine.config.config import ConfigJsonReader
conf = ConfigJsonReader()

### Print the path to the current name file

In [3]:
with open(conf.path_ini_file_, 'r') as ini_file:
    print(ini_file.read())

[PATH]
template_config = /home/97133d/maif/melusine/melusine/config/conf.json
default_name_file = /home/97133d/maif/melusine/melusine/config/names.csv




### Print the current list of names (first 5 names)

In [4]:
conf_dict = conf.get_config_file()
print(conf_dict['words_list']['names'][:5])

['aaliyah', 'aalyah', 'aaron', 'abbas', 'abbes']


### Use a custom name file
1. Create a new (custom) name file 
   - The file should be a csv file with a column called `Name`
2. Set the new file as the current Melusine name file
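
Before pointing Melusine at a custom file, it can help to check that the file has the expected `Name` column. The sketch below uses only the standard library. The helper `read_name_file` is hypothetical and not part of Melusine. The semicolon separator, the latin-1 encoding and the lowercasing are assumptions chosen to match the code and outputs in this tutorial.

```python
import csv
import os
import tempfile


def read_name_file(path, sep=";", encoding="latin-1"):
    """Return the lowercased names from a CSV file with a `Name` column."""
    with open(path, newline="", encoding=encoding) as csv_file:
        reader = csv.DictReader(csv_file, delimiter=sep)
        # Fail early if the header row does not contain the expected column
        if reader.fieldnames is None or "Name" not in reader.fieldnames:
            raise ValueError(f"{path} has no 'Name' column")
        # Skip rows where the name is empty or missing
        return [row["Name"].strip().lower() for row in reader if row["Name"]]


# Write a small demo file in a temporary directory, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "names.csv")
    with open(demo_path, "w", newline="", encoding="latin-1") as csv_file:
        writer = csv.writer(csv_file, delimiter=";")
        writer.writerows([["Name"], ["Daenerys"], ["Tyrion"]])
    checked_names = read_name_file(demo_path)

print(checked_names)  # ['daenerys', 'tyrion']
```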

In [5]:
# Create a name DataFrame
df_names = pd.DataFrame({'Name' : ['Daenerys', 'Tyrion', 'Jon', 'Raegar']})

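# Path to the new names.csv file
new_path = os.path.join(os.getcwd(), 'data', 'names.csv')

# Create the target directory if needed, then save the new names.csv file
os.makedirs(os.path.dirname(new_path), exist_ok=True)
df_names.to_csv(new_path, encoding="latin-1", sep=";", index=False)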

In [6]:
print(df_names)

       Name
0  Daenerys
1    Tyrion
2       Jon
3    Raegar


### Set a new path to the name file in Melusine

In [7]:
conf.set_name_file_path(file_path=new_path)

### Print the new path to the name file

In [8]:
with open(conf.path_ini_file_, 'r') as ini_file:
    print(ini_file.read())

[PATH]
template_config = /home/97133d/maif/melusine/melusine/config/conf.json
default_name_file = /home/97133d/maif/melusine/tutorial/data/names.csv




### Print the content of the new name file

In [9]:
conf_dict = conf.get_config_file()
print(conf_dict['words_list']['names'][:5])

['daenerys', 'tyrion', 'jon', 'raegar']


### Restore the original name file

In [10]:
conf.reset_name_file_path()

In [11]:
conf_dict = conf.get_config_file()
print(conf_dict['words_list']['names'][:5])

['aaliyah', 'aalyah', 'aaron', 'abbas', 'abbes']


## Warning

The name file is loaded by the various modules (Tokenizer, KeywordExtractor, etc.) at import time. For a new name file to take effect, restart the code / kernel after each modification of the name file.
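
The reason is general Python behavior: a value computed at module import is cached with the module and is not refreshed when the underlying file changes. The sketch below demonstrates this with a throwaway module, `demo_name_module`, which is hypothetical and unrelated to Melusine's code. It also shows that `importlib.reload` re-executes the module, but whether a reload is sufficient for Melusine's own modules is not confirmed here, so restarting the kernel remains the safe option.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    names_file = tmp_path / "names.txt"
    names_file.write_text("bob\n", encoding="utf-8")

    # Hypothetical module that reads the name file once, at import time
    module_source = (
        "from pathlib import Path\n"
        f"NAMES = Path({str(names_file)!r}).read_text(encoding='utf-8').split()\n"
    )
    (tmp_path / "demo_name_module.py").write_text(module_source, encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_name_module = importlib.import_module("demo_name_module")
        names_at_import = list(demo_name_module.NAMES)

        # Changing the file does not change the already imported module
        names_file.write_text("daenerys\n", encoding="utf-8")
        names_before_reload = list(demo_name_module.NAMES)

        # Re-executing the module picks up the new file content
        importlib.reload(demo_name_module)
        names_after_reload = list(demo_name_module.NAMES)
    finally:
        # Clean up so the demo leaves no trace in the interpreter state
        sys.path.remove(tmp)
        sys.modules.pop("demo_name_module", None)

print(names_at_import)      # ['bob']
print(names_before_reload)  # ['bob']
print(names_after_reload)   # ['daenerys']
```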