Skip to content

Latest commit

 

History

History
176 lines (141 loc) · 6.19 KB

releases-notes.org

File metadata and controls

176 lines (141 loc) · 6.19 KB

v0.3.6

New commands:

  • csv2jsonapirest.py: to download names with json api files

New countries:

  • Switzerland (60693 names)

New Preprint: Damegender: Towards an International and Free Dataset about Name, Gender and Frequency

Updates in commands about datasets.

v0.3.5

Refactoring and bug fixing: All source is reaching PEP8 coding style now.

Improvements in commands:

  • api2gender.py: you can scrap the gender from wikipedia with –api=wikipedia. Before, you can use –api=wikidata, but this way (sparql) is worst in the current state of the development.

v0.3.4

New names:

  • China (2614 females and 2614 males)
  • Turkey (116114 females and 67309 males)

New commands:

  • mergeinterfiles.py: merge dataset files

v0.3.3

New names from oficial Open Data statistics:

  • Denmark (62072 males and 79235 females)
  • France (16660 males and 19783 females)

Creating inter names from all countries with Open Data:

  • 259395 males
  • 279863 females

Improvements to csv2gender such as new arguments:

  • skip_header
  • delete_duplicated
  • outimg
  • outcsv
  • title

v0.3.2

  • csv2gender.py: add –verbose argument

New names from oficial Open Data statistics:

  • Belgium (14208 names)
  • Slovenia (8788 names)
  • Austria (1899 names)
  • Deutchsland (22368 names)
  • Mexico (16122 names)

v0.3.1

  • top.py: add –position argument, fix –less
  • mail2gender.py and git2gender.py: it shows males and females
  • app/dame_statistics.py: created due to refactor in app/dame_gender.py
  • csv2gender.py: add –noshow argument
  • jokes.py: created. It’s about damegender tips written as jokes.
  • manual: new sections
  • Now we have a dual license and I add scripts to change licenses

v0.2.11

We are starting to count males and females in Internet Communities:

v0.2.10

  • top.py: print lists about the most used names in different countries
  • count-scientifics.py: counting scientifics in Spain

v0.2.9

New names from oficial Open Data census:

  • Ireland (382 names)
  • Iceland (326 names)
  • Finland (11449 names)

v0.2.8

New names from oficial Open Data census:

  • Canada (107339 names)
  • New Zealand (6600 names)
  • Australia (52978 names)
  • Portugal (3999 names)

Create:

  • logs-errors.sh: execute allnoundefined.csv with different ML models of errors.py to generate all logs needed
  • manual/damegender.texi, manual/damegender.pdf: we have grouped some articles and ideas in a book format (not finished)

Refactor:

  • errors.py

Updating (more names, new calculus and results):

  • articles/damegender.pdf
  • files/datamodels/*sav

v0.2.7

  • count-debian-gender.py: add example to count males and females in debian keyring
  • ethnicity.py: race about a name. Source: USA census
  • surname.py: guess surname Spain and United States of America supported
  • surnameincountries.py: about countries where a surname appears. Source: INE
  • readme.sh: now you can convert the readme from org to markdon with this script
  • add adaboost ml algorithm

v0.2.6

  • roc.py: to deploy roc curves to measure ML
  • regenerate-ml-json.sh: execute all options of damegender2json.py to generate all json needed
  • logs-accuracies.sh: execute all options of accuracy.py to generate all logs needed
  • logs-confusion.sh: execute all options of confusion.py to generate all logs needed
  • api2gender.py: starting the option to use wikidata

v0.2.5

  • Recreated all datamodels with new datasets. Augmented the accuracies with this feature!
  • Namsor stuff has been updated to Namsor2

v0.2.4

Now the next datasets available from main.py in damegender

  • [X] United Kingdom
  • [X] United States of America
  • [X] Uruguay
  • [X] Lucía Santamaría and Helena

v0.2.3

  • Added damegender2json.py to generate files with ML results.
  • Created ML json files
  • Added new ML algorithms: tree and mlp (neural network)

v0.2.1

  • Added downloadjson.py to download names from csv to one json file we have rewrited accuracy.py, confusion.py and errors.py to make this calculus offline

v0.1.9

  • Improved the test system with testing from python commands with bash.
  • Added nameincountries.py to guess a name in different countries
  • main.py: added bernoulliNB ML algorithm and support to genderguesser
  • confusion.py: adding support to different dimensions
  • accuracy.py: added bernoulliNB ML algorithm
  • errors.py: adding genderapi and namsor support
  • csv2gender.py: Minor chances
  • rewriting damemodels.py to postinstall.py to recreate all files created with scripts from original files (not only ML models)
  • mail2gender.py: small fix, avoid duplicated
  • infofeatures.py: is related to letter_a, last_letter_a, last_letter_o, last_letter_consonant, last_letter_vocal, first_letter, first_letter_consonant, first_letter_vocal
  • pca support with pca-components.py and pca-features.py

v0.0.36

First version packaged.

  • The application is supporting test with nose
  • main.py is returning names in english and spanish
  • api2gender.py is for return names from main apis
  • confusion.py is giving support for main apis
  • accuracy.py is giving support for main apis
  • errors.py is giving support for genderize, damegender, genderguesser and nameapi
  • apikeyadd.py allows create a password file for apis
  • csv2gender.py is only for partial.csv and all.csv
  • damemodels.py is creating sav files for machine learning algorithms with scikit
  • gendergoogle.py is a prototype to calculate gender from google results with a name
  • git2gender.py is a prototype to return number of males and females in a git repository
  • mail2gender.py is a prototype to return number of males and females in a mailing list
  • infofeatures.py is related to last_letter_a, last_letter_consonant, last_letter_vocal
  • pca support is only a prototype