Saves sentences from public tilde.club-galaxy pages into db, and tweets random selections with links.

adsgray/tilde.club-crawler

tilde.club web crawler and tweeter

Purpose

  • create a database of text from tilde.club (and other .club) home pages
  • optionally prune that text by hand down to 'tweetable' or otherwise high-quality snippets
  • set up a Twitter bot to tweet quotes along with a link to the corresponding tilde.club page

Files

  • getusers.pl: generates SQL to insert user records into the sqlite3 db
  • generate_usermap.sh: selects from the db to create a map of userid to username
  • local_crawl.pl: takes a usermap file and processes index.html for each user. Produces an INSERT statement for each sentence on the page. Sentences go into the tweet table. (DEPRECATED)
  • crawl.py: fetches and parses user pages and inserts sentences into the tweet table
  • tweet.py: does not actually tweet. Chooses a random row from the tweet table, marks it as used, and prints a constructed tweet that includes a link to the originating page.
  • tweet_tilde_quote: a shell script that takes the output of tweet.py and tweets it out. It is called from cron.
  • report.py: generates a list of the top N most prolific writers
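
As a rough illustration of the selection step described for tweet.py, here is a minimal sketch. It is not the repository's actual code: the column names (`id`, `sentence`, `url`, `used`) and the function name `pick_tweet` are assumptions made for the example, and the real tweet table may be laid out differently.

```python
import sqlite3


def pick_tweet(conn):
    """Pick one random unused row, mark it as used, and return the tweet text.

    Returns None when every row has already been used.
    The schema (id, sentence, url, used) is hypothetical.
    """
    row = conn.execute(
        "SELECT id, sentence, url FROM tweet "
        "WHERE used = 0 ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    tweet_id, sentence, url = row
    # Mark the row so the same sentence is not chosen again.
    conn.execute("UPDATE tweet SET used = 1 WHERE id = ?", (tweet_id,))
    conn.commit()
    # The constructed tweet is the sentence followed by a link to its page.
    return f"{sentence} {url}"


def make_demo_db():
    """Build an in-memory db with two sample rows (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweet ("
        "id INTEGER PRIMARY KEY, sentence TEXT, url TEXT, "
        "used INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO tweet (sentence, url) VALUES (?, ?)",
        [
            ("Hello from my home page.", "http://tilde.club/~example1/"),
            ("This page is under construction.", "http://tilde.club/~example2/"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(pick_tweet(db))
```

Printing the text rather than posting it keeps the selection logic separate from the posting step, which matches the split between tweet.py and tweet_tilde_quote described above.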

TODO

  • ~~add other ~sites~~
  • make crawl.py smart enough to only process recently updated pages (fetch the .JSON file)
  • add crawl.py to cron once it is smart enough (added it anyway)
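
One possible shape for the "only process recently updated pages" item is sketched below. The README does not say what the JSON file contains, so the entry format used here (a list of objects with `username` and `modified` keys, where `modified` is an ISO 8601 timestamp) is purely an assumption, as is the function name `recently_updated`.

```python
import json
from datetime import datetime


def recently_updated(entries, since):
    """Return usernames whose page changed at or after `since`.

    `entries` is a list of dicts with hypothetical keys:
      "username": the account name
      "modified": ISO 8601 timestamp of the last page update
    Entries that lack either key are skipped.
    """
    fresh = []
    for entry in entries:
        name = entry.get("username")
        stamp = entry.get("modified")
        if not name or not stamp:
            continue
        if datetime.fromisoformat(stamp) >= since:
            fresh.append(name)
    return fresh


if __name__ == "__main__":
    # Illustrative data only; a real crawler would load the fetched JSON file.
    raw = """[
        {"username": "example1", "modified": "2014-10-20T09:00:00"},
        {"username": "example2", "modified": "2014-11-05T18:30:00"}
    ]"""
    last_crawl = datetime(2014, 11, 1)
    print(recently_updated(json.loads(raw), last_crawl))
```

The crawler could then store the time of each run and pass it as `since` on the next one, so unchanged pages are never fetched.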
