Skip to content

🌪 Tornado: A crawler traces TDK's (Turkish Language Association) webpage to fetch the meanings of the given word list, in order to smooth the way for people who have needs to take Turkish language studies one step further.

License

buraktekin/tornado

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🌪 Tornado:

A crawler traces TDK's (Turkish Language Association) webpage to fetch the meanings of the given words list, in order to smooth the way for people who have needs to take studies of Turkish language one step further.

Install:

  • Download this repo to your local machine: git clone https://github.com/buraktekin/tornado.git
  • cd into local repo: cd tornado
  • If you already have virtualenv in your system run the next command directly, if not first run bash pip install virtualenv.

After you have virtualenv run:

  • virtualenv venv
  • source venv/bin/activate
  • pip install -r requirements.txt

Run 🌪 Tornado:

scrapy crawl tornado

Extra Information:

For now, the system works by the corpus called 'words.txt' under the folder 'words'. You can modify this corpus with your own but the name should remain the same. This project will be improved and parameterized soon. After that, you will be able to use our system by giving your file's path directly.

Stats

  • Device: MacBook Pro (Retina, 13-inch, Mid 2014)
  • Processor: 2,6 GHz Intel Core i5
  • Memory: 8 GB 1600 MHz DDR3
  • Data size: 63840 words // 701KB
  • Completion Time: ~3600sec (~1000 words/min)

âš : It really takes too much time to complete crawling. We will make some progress to drop it down later on.

About

🌪 Tornado: A crawler traces TDK's (Turkish Language Association) webpage to fetch the meanings of the given word list, in order to smooth the way for people who have needs to take Turkish language studies one step further.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages