NOTE: I no longer use this version, and it may not work. I have rewritten it in Kotlin -> stats-scraper
Programming languages statistics scraper
A tool that collects (scrapes from the web) statistics about programming languages for my site jaki-jezyk-programowania.pl.
Currently, the tool fetches the following data for each language:
- Github
  - top 10 projects
  - number of projects
  - number of projects with more than 500 stars
- Meetup.com
  - number of members
  - number of meetups
- StackOverflow
  - number of tagged questions
- Wikipedia
  - latest language version
- Tiobe INDEX
  - position last year
  - position this year
- Spectrum ranking
  - position last year
  - position this year
Everything is stored in two JSON files:
- statistics.json
- languagesVersions.json
If statistics.json already exists, it is renamed (the date is appended to its name) and a fresh statistics.json is created.
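The renaming step can be sketched roughly as below. This is a minimal illustration, not the tool's actual code: the class name StatisticsFileRotation, the helper names, and the statistics-YYYY-MM-DD.json naming scheme are all assumptions, since the exact format of the appended date is not stated here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;

public class StatisticsFileRotation {

    // Hypothetical naming scheme, e.g. statistics-2020-01-31.json.
    // The real tool may format the appended date differently.
    static Path archiveName(Path dir, LocalDate date) {
        return dir.resolve("statistics-" + date + ".json");
    }

    // Renames an existing statistics.json so that a fresh one can be written.
    // Returns the archive path, or null when there was nothing to rename.
    static Path rotate(Path dir, LocalDate date) throws IOException {
        Path current = dir.resolve("statistics.json");
        if (!Files.exists(current)) {
            return null;
        }
        // Without REPLACE_EXISTING, Files.move throws FileAlreadyExistsException
        // when an archive for the same date is already present.
        return Files.move(current, archiveName(dir, date));
    }

    public static void main(String[] args) throws IOException {
        // Directory to work in; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Path archived = rotate(dir, LocalDate.now());
        System.out.println(archived == null ? "nothing to archive" : "archived to " + archived);
    }
}
```

In this sketch a second run on the same day fails with FileAlreadyExistsException instead of overwriting the earlier archive, which is one way a one-file-per-day limit could arise.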
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites
- Java 8+
- Maven (Development)
- Github account (if you want to collect data from Github)
Demo
If you just want to try it out, download the demo release, put it in the demo folder and run this bat file:
demo/stats-scraper-demo.bat
or run it manually in cmd from the project directory:
java -jar stats-scraper-1.0.0.jar -no-git
(Github data will be skipped)
Note:
If you run this tool more than once within a short time, errors will occur due to API restrictions.
Also, you cannot have more than one file created on the same day. If you want fresh data on that day, you need to remove or move the old file with the appended date.
Installing
Install Maven dependencies:
mvn install
Provide a Github authentication token in src\main\resources\config.properties if you want to fetch data from Github.
Follow this guide if you don't have a token.
GithubAuthToken=token 22sadasdsa34r32412342134214324123
Otherwise, you need to pass the -no-git parameter when running the program:
java -jar stats-scraper-1.0.0.jar -no-git
or remove the lines where GithubDataScraper is added to StatisticsBuilder in the App class.
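The choice between reading the token and skipping Github can be sketched as follows. This is only an illustration under assumptions: the class GithubConfigSketch and its methods are hypothetical and do not reproduce the project's App class. Only the GithubAuthToken key, the config.properties file name and the -no-git flag come from the instructions above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Properties;

public class GithubConfigSketch {

    // True when the user passed -no-git, i.e. Github scraping should be skipped.
    static boolean skipGithub(String[] args) {
        return Arrays.asList(args).contains("-no-git");
    }

    // Reads the GithubAuthToken entry and fails early with a hint if it is missing.
    static String readToken(Properties props) {
        String token = props.getProperty("GithubAuthToken");
        if (token == null || token.trim().isEmpty()) {
            throw new IllegalStateException(
                    "GithubAuthToken is missing; set it in config.properties or pass -no-git");
        }
        return token.trim();
    }

    // Loads config.properties from the classpath (src/main/resources in a Maven build).
    // Returns empty properties if the file is not on the classpath.
    static Properties loadConfig() throws IOException {
        Properties props = new Properties();
        try (InputStream in = GithubConfigSketch.class.getResourceAsStream("/config.properties")) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        if (skipGithub(args)) {
            System.out.println("Github data will be skipped");
        } else {
            String token = readToken(loadConfig());
            System.out.println("Github token loaded (" + token.length() + " characters)");
        }
    }
}
```

Because java.util.Properties splits each line at the first = sign, the whole value after it, including the "token " prefix, is returned as the property value.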
If you want to regenerate the jar file to use it as a single-file application, run:
mvn package
(the jar is written to target/stats-scraper-1.0.0.jar)
Sample output of tool:
Built With
- Maven - Dependency management and build tool.
- Jsoup - Used to parse HTML websites.
- Json smart - Working with JSONObjects.
- Apache Commons - helper libs for validating data.
- Logback - status logging.