Porting this awesome lib for another language - guidelines #43

Open
BERA opened this issue May 13, 2014 · 12 comments

@BERA

BERA commented May 13, 2014

Hi

First of all, I want to congratulate everyone involved in the development of this project.

I'm interested in improving this library's usefulness for Portuguese. To do that, I guess I need to research the most common words and the most popular password words in this language, in order to build a more accurate bad-password list.

Is there any information describing how to change the code to supply a dictionary of words in another language, or a simple way to use this lib with a bad-password list and a list of disallowed passwords in another language?

Thanks!

@lowe
Collaborator

lowe commented May 13, 2014

Hi Bera,

This is indeed very doable!

Question for you -- it isn't enough to have a dictionary for another
language. It needs to be a ranked dictionary, where you have some sense of
common (a, the, banana) vs uncommon (adolescent, optometrist). Did you come
across such a list for Portuguese? It'd be fun to build your own too with
e.g. nltk and a nice corpus.
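
For anyone building their own list, here is a minimal sketch of turning a plain-text corpus into a ranked list. It uses only the Python 3 standard library rather than nltk. The function names, the sample sentence, and the "word count" line format are assumptions for illustration and are not confirmed anywhere in this thread.

```python
import re
from collections import Counter


def ranked_words(text, min_count=1):
    """Return (word, count) pairs, most frequent first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    # [^\W\d_]+ matches runs of Unicode letters, so accented words survive.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count) for word, count in ordered if count >= min_count]


def format_lines(ranked):
    """One 'word count' pair per line (assumed format, most common word first)."""
    return "\n".join("%s %d" % (word, count) for word, count in ranked)


# Tiny stand-in for a real corpus.
sample = "a casa e a rua e a praia"
ranked = ranked_words(sample)
listing = format_lines(ranked)
```

With a real corpus, min_count can be raised to drop words seen only once or twice, which are mostly noise for ranking purposes.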

Recommended steps:

  1. install zxcvbn dependencies: coffeescript, java, python, and the
    simplejson python module
  2. clone zxcvbn, confirm these steps work for you:
    cd zxcvbn/scripts
    python build_frequency_lists.py
    cd ..
  3. adapt build_frequency_lists.py to add your Portuguese lists and
    (optionally) remove the English lists. I recommend doing this by adding
    your datasource to zxcvbn/data and making as minor a change as possible to
    build_frequency_lists.py to read it in.

Hope that helps, let me know if you have any other questions.
Dan

@pyramids

Google n-grams might be an acceptable source for ranked dictionaries. Freely available at
http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html

I suppose tweet word frequency lists could be an even better estimate for casually spelled and keyboard-based word usage, but I have no source for those.

EDIT:
I gave a bad (English-only) link to Google's n-gram data. This one covers more languages, but still does not include Portuguese: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
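
As a rough sketch of how per-year 1-gram counts could be collapsed into a ranked list: this assumes each line is tab-separated as ngram, year, match_count, volume_count, and that part-of-speech-tagged entries contain an underscore. Both are assumptions to check against the dataset's own description. The function name and sample rows are made up for illustration.

```python
from collections import defaultdict


def aggregate_unigrams(lines):
    """Sum match counts over all years and return (word, total) pairs, most frequent first."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed rows
        ngram, match_count = parts[0], parts[2]
        # Skip tagged entries such as "casa_NOUN" and anything non-alphabetic.
        if "_" in ngram or not ngram.isalpha():
            continue
        totals[ngram.lower()] += int(match_count)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up rows in the assumed format.
sample = [
    "casa\t1990\t10\t3",
    "Casa\t1991\t5\t2",
    "rua\t1990\t7\t1",
    "casa_NOUN\t1990\t10\t3",
]
ranked = aggregate_unigrams(sample)
```

For real files, passing an open file object as lines keeps memory use proportional to the vocabulary rather than the file size.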

@lowe
Collaborator

lowe commented May 13, 2014

Good idea!

@BERA
Author

BERA commented May 13, 2014

Thank you guys! I'll try and let you know soon.

@ebeeson

ebeeson commented Nov 5, 2015

@BERA did you ever have any success with this?

@BERA
Author

BERA commented Nov 10, 2015

Hi Erik

Unfortunately, I gave up on it this year, but I'm planning to add it to
my to-do list for next year.

This will be a great experience for sure. Maybe I can port this to golang
and make an API for it.

Thanks for getting in touch, and I apologize for leaving this issue
abandoned for now.

@Esekil

Esekil commented Sep 28, 2016

Hello, I would like to add an Italian dictionary to this library. As a first step, I added my dictionary to the data folder and modified build_frequency_lists.py:
DICTIONARIES = dict(
    us_tv_and_film = 30000,
    english_wikipedia = 30000,
    passwords = 30000,
    surnames = 10000,
    male_names = None,
    female_names = None,
    italian_dictionary = None,
)
I added the last line before the closing ")". Unfortunately, I could not get build_frequency_lists.py to run. Could you please help me figure out how to do it? Thank you for your time.
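
It may help to spell out what a DICTIONARIES entry plausibly means. The sketch below is not the script's actual code. It assumes, without confirmation in this thread, that each key names a file in the data folder (here with a .txt extension) and that each value caps how many top-ranked tokens are kept, with None meaning no cap. The helper names are hypothetical.

```python
# Hypothetical illustration only; not taken from build_frequency_lists.py.
DICTIONARIES = dict(
    passwords=30000,
    italian_dictionary=None,
)


def expected_filename(name):
    """Assumed convention: the key 'italian_dictionary' maps to 'italian_dictionary.txt'."""
    return name + ".txt"


def apply_limit(ranked_tokens, limit):
    """Keep the first `limit` tokens, or all of them when limit is None (assumed meaning)."""
    tokens = list(ranked_tokens)
    return tokens if limit is None else tokens[:limit]


capped = apply_limit(["password", "123456", "qwerty"], 2)
uncapped = apply_limit(["ciao", "amore", "casa"], DICTIONARIES["italian_dictionary"])
```

If those assumptions hold, a key whose name does not match any file in the data folder would be one thing to check first.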

@pepve

pepve commented Nov 5, 2017

I forked this repository and made some adjustments for Dutch. I added first and last names, and I added words from the Dutch Wikipedia using the same method as for English.

Repository here: https://github.com/pepve/zxcvbn-nl

Relevant commit here: pepve/zxcvbn-nl@30fad91

@JoSSte

JoSSte commented Nov 5, 2017

maybe http://letterfrequency.org/letter-frequency-by-language/ could assist in porting the library to other languages...

@flaviogrossi

To anyone interested: I'm working on adding Italian words and names to zxcvbn here, based on Wikipedia entries and common Italian names.

@RawanAyoub

To help anyone who, like me, is stuck with no Python experience at all:
Add your files to the data folder,
change build_frequency_lists.py so it includes your file name as an entry in the DICTIONARIES dict,
run python build_frequency_lists.py ../data ../src/frequency_lists.coffee,
run npm install

That created a new .js file which contains the new dictionaries!

Good luck to anyone having this issue

@RawanAyoub

And you need Python 2, because some things the script uses were removed in Python 3. For example, iteritems() is now items().
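
A small sketch of that difference, for anyone who would rather adapt the script than install Python 2. The token_to_rank mapping is made up for illustration. dict.items() exists in both Python 2 and Python 3, while dict.iteritems() exists only in Python 2.

```python
# Hypothetical token-to-rank mapping, for illustration only.
token_to_rank = {"senha": 1, "amor": 2, "brasil": 3}

# Python 2 only:      token_to_rank.iteritems()
# Python 2 and 3:     token_to_rank.items()
by_rank = sorted(token_to_rank.items(), key=lambda pair: pair[1])
tokens_in_order = [token for token, _rank in by_rank]
```

Other Python 2 constructs in the script may need similar changes, so this covers only the one case mentioned above.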
