afcarl/google_ngram

 
 


GoogleNGram (Python 2.7 + Mechanize)
======================================

This is a set of scripts that help download Google NGram files from the web.

  1. Install the mechanize package. I used the command:

    sudo easy_install mechanize

  2. Create a file of URLs. E.g., to get all English 5-gram files, type:

    mkdir eng-5gram; python collect_file_urls.py googlebooks-eng-all-5gram-20120701-\(.*\).gz eng-5gram/urls.txt

  3. Download the files listed in urls.txt using wget -i:

    wget -i eng-5gram/urls.txt -P eng-5gram/

  4. Merge the downloaded files into a single processed file of n-gram counts:

    python merge_files.py eng-5gram eng-5gram.txt
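
To make steps 2 and 4 more concrete, here is a minimal sketch of the two ideas involved: selecting file names with a regular expression, and summing n-gram counts across the downloaded files. It is not the code in collect_file_urls.py or merge_files.py, whose internals are not shown here. The function names `filter_names` and `merge_counts` are hypothetical, and the sketch assumes each line of a 20120701 file has the tab-separated form `ngram, year, match_count, volume_count`, and that merging means summing `match_count` over all years.

```python
import gzip
import re
from collections import defaultdict


def filter_names(pattern, names):
    """Keep the names that match `pattern` from start to end (hypothetical helper)."""
    regex = re.compile(pattern + r"\Z")
    return [name for name in names if regex.match(name)]


def merge_counts(paths):
    """Sum match_count over all years for each n-gram in the gzipped files.

    Assumes lines of the form: ngram <TAB> year <TAB> match_count <TAB> volume_count
    """
    totals = defaultdict(int)
    for path in paths:
        with gzip.open(path, "rb") as handle:
            for raw in handle:
                fields = raw.decode("utf-8").rstrip("\n").split("\t")
                if len(fields) < 3:
                    continue  # skip lines that do not have the assumed layout
                totals[fields[0]] += int(fields[2])
    return dict(totals)
```

Note that in the shell command from step 2 the backslashes only protect the parentheses from the shell, so the pattern that reaches Python is `googlebooks-eng-all-5gram-20120701-(.*).gz`. Holding every total in a dictionary is simple but memory-hungry, so for a full 5-gram download a streaming or on-disk approach may be a better fit.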
