
dnzengou/copysearch

 
 


copysearch

Search http://www.copyrightevidence.org and more.

.plan

Index the content of the copyright wiki into Elasticsearch, then build a simple API and search frontend.
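A minimal sketch of the indexing step, assuming each wiki page has already been reduced to a title and a text body. It only builds the newline-delimited JSON body that the Elasticsearch bulk endpoint accepts. The function name `build_bulk_payload`, the index name `copysearch`, and the `title`/`text` fields are assumptions for illustration, not part of this repository.

```python
import json


def build_bulk_payload(pages, index_name="copysearch"):
    """Build an NDJSON body for the Elasticsearch bulk endpoint.

    `pages` is an iterable of dicts with "title" and "text" keys
    (a hypothetical document shape, chosen for this sketch).
    """
    lines = []
    for page in pages:
        # Action line: index the document, using the page title as its id.
        action = {"index": {"_index": index_name, "_id": page["title"]}}
        # Source line: the document itself.
        source = {"title": page["title"], "text": page["text"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [{"title": "Example (2010)", "text": "Study summary goes here."}]
    print(build_bulk_payload(demo), end="")
```

The resulting string can be sent as the body of a POST request to `/_bulk` with the `Content-Type: application/x-ndjson` header.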

More data sources:

  • list of PDF uploads in the wiki
  • links from externallinks in the database dump

Pages with studies, identified by a parenthesized number (the year) in the path:

$ find . -name "*html" | grep -v "action" | grep -v "Special:" | grep -v "User:" | grep -E '\([0-9]*\)' | grep -v "title="
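The same filter could be expressed in Python, which may be handy if the selection feeds straight into an indexing script. This is a rough equivalent of the pipeline above, not code from this repository; the names `is_study_page` and `find_study_pages` are hypothetical.

```python
import os
import re

# Substrings that disqualify a path (mirrors the `grep -v` stages).
EXCLUDED = ("action", "Special:", "User:", "title=")

# Same pattern as the `grep -E` stage. Note that `[0-9]*` also matches
# empty parentheses "()", exactly as the shell version does.
NUMBER_IN_PARENS = re.compile(r"\([0-9]*\)")


def is_study_page(path):
    """Return True if `path` would pass the find/grep pipeline."""
    if not os.path.basename(path).endswith("html"):
        return False
    if any(marker in path for marker in EXCLUDED):
        return False
    return NUMBER_IN_PARENS.search(path) is not None


def find_study_pages(root="."):
    """Yield paths of files below `root` that look like study pages."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_study_page(path):
                yield path


if __name__ == "__main__":
    for page in find_study_pages("."):
        print(page)
```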

Access the API.

Oh my: http://stackoverflow.com/a/1625291/89391

I don't think it is possible to get just the text using the API.
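One possible workaround, sketched here under the assumption that the rendered HTML of each page is available locally: strip the markup with the standard library's `html.parser`. The helper name `html_to_text` is hypothetical, and this is not necessarily how the repository extracts text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        # Trade-off: a word split by inline markup gains a space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string as a single line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_text("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

The output is plain text with whitespace normalized, which is usually enough for full-text indexing.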


Done

Basic data from wiki. Links to external PDFs.

TODO

Formalize related pages in wiki.

API

