
New commands:

  • to download names with json api files

New countries:

  • Switzerland (60693 names)

New Preprint: Damegender: Towards an International and Free Dataset about Name, Gender and Frequency

Updates in commands about datasets.


Refactoring and bug fixing: all source code now follows the PEP8 coding style.

Improvements in commands:

  • you can scrape the gender from Wikipedia with --api=wikipedia. Previously you could use --api=wikidata, but that approach (SPARQL) is worse at the current stage of development.


New names:

  • China (2614 females and 2614 males)
  • Turkey (116114 females and 67309 males)

New commands:

  • merge dataset files


New names from official Open Data statistics:

  • Denmark (62072 males and 79235 females)
  • France (16660 males and 19783 females)

Creating inter (international) names from all countries with Open Data:

  • 259395 males
  • 279863 females

Improvements to csv2gender such as new arguments:

  • skip_header
  • delete_duplicated
  • outimg
  • outcsv
  • title


  • add --verbose argument

New names from official Open Data statistics:

  • Belgium (14208 names)
  • Slovenia (8788 names)
  • Austria (1899 names)
  • Germany (22368 names)
  • Mexico (16122 names)


  • add --position argument, fix --less
  • and it shows males and females
  • app/ created due to refactor in app/
  • add --noshow argument
  • created. It’s about damegender tips written as jokes.
  • manual: new sections
  • damegender is now dual-licensed, and scripts to change licenses have been added


We are starting to count males and females in Internet Communities:


  • print lists about the most used names in different countries
  • counting scientists in Spain


New names from official Open Data census:

  • Ireland (382 names)
  • Iceland (326 names)
  • Finland (11449 names)


New names from official Open Data census:

  • Canada (107339 names)
  • New Zealand (6600 names)
  • Australia (52978 names)
  • Portugal (3999 names)


  • execute allnoundefined.csv with different ML models of to generate all logs needed
  • manual/damegender.texi, manual/damegender.pdf: we have grouped some articles and ideas in a book format (not finished)



Updating (more names, new calculations and results):

  • articles/damegender.pdf
  • files/datamodels/*sav


  • add example to count males and females in debian keyring
  • race about a name. Source: USA census
  • guess surname Spain and United States of America supported
  • about countries where a surname appears. Source: INE
  • now you can convert the readme from org to Markdown with this script
  • add AdaBoost ML algorithm


  • to deploy roc curves to measure ML
  • execute all options of to generate all json needed
  • execute all options of to generate all logs needed
  • execute all options of to generate all logs needed
  • starting the option to use wikidata


  • Recreated all datamodels with the new datasets, which improved the accuracies.
  • Namsor stuff has been updated to Namsor2


The following datasets are now available in damegender:

  • [X] United Kingdom
  • [X] United States of America
  • [X] Uruguay
  • [X] Lucía Santamaría and Helena


  • Added to generate files with ML results.
  • Created ML json files
  • Added new ML algorithms: tree and mlp (neural network)


  • Added to download names from csv to one json file we have rewritten, and to make this calculation offline


  • Improved the test system: Python commands are now tested from Bash.
  • Added to guess a name in different countries
  • added bernoulliNB ML algorithm and support for genderguesser
  • adding support to different dimensions
  • added bernoulliNB ML algorithm
  • adding genderapi and namsor support
  • Minor changes
  • rewriting to to recreate all files created with scripts from original files (not only ML models)
  • small fix: avoid duplicates
  • is related to letter_a, last_letter_a, last_letter_o, last_letter_consonant, last_letter_vocal, first_letter, first_letter_consonant, first_letter_vocal
  • pca support with and
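
The feature names listed above can be illustrated with a minimal sketch. This is not damegender's own code: the helper name_features is hypothetical, and the exact meaning of each feature is an assumption (here "vocal" is read as "vowel", letter_a as "the name contains an a", and first_letter as the character code of the first letter).

```python
# Hypothetical helper: one possible reading of the feature names in the list
# above. The definitions are assumptions, not damegender's implementation.
VOWELS = set("aeiou")


def name_features(name):
    """Return a dict of simple letter-based features for a given name."""
    n = name.strip().lower()
    if not n:
        raise ValueError("name must not be empty")
    first, last = n[0], n[-1]
    return {
        # 1 if the letter "a" appears anywhere in the name (assumed meaning)
        "letter_a": int("a" in n),
        "last_letter_a": int(last == "a"),
        "last_letter_o": int(last == "o"),
        # a consonant is taken to be any alphabetic character outside VOWELS
        "last_letter_consonant": int(last.isalpha() and last not in VOWELS),
        "last_letter_vocal": int(last in VOWELS),
        # character code of the first letter (assumed numeric encoding)
        "first_letter": ord(first),
        "first_letter_consonant": int(first.isalpha() and first not in VOWELS),
        "first_letter_vocal": int(first in VOWELS),
    }


if __name__ == "__main__":
    print(name_features("Maria"))
```

A dict like this could be turned into a numeric vector before being passed to an ML model. Only the plain vowels a, e, i, o, u are recognized here, so accented letters would count as consonants in this sketch.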


First version packaged.

  • The application supports tests with nose
  • is returning names in english and spanish
  • is for return names from main apis
  • is giving support for main apis
  • is giving support for main apis
  • is giving support for genderize, damegender, genderguesser and nameapi
  • allows creating a password file for APIs
  • is only for partial.csv and all.csv
  • is creating sav files for machine learning algorithms with scikit
  • is a prototype to calculate gender from google results with a name
  • is a prototype to return number of males and females in a git repository
  • is a prototype to return number of males and females in a mailing list
  • is related to last_letter_a, last_letter_consonant, last_letter_vocal
  • pca support is only a prototype