Quick and dirty node.js scripts to fetch all the canonical freeform tags used on AO3. Requires a recent node.js.
To use, clone the repo, then run `npm install`.
To collect new data: run `./fetchtags.js`, then run `./parsepages.js` to process the result into JSON. The output file is `tags.json`. Keys are tags in lexical order; values are the counts reported by AO3.
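Once `tags.json` exists, it's plain JSON and easy to poke at directly. A minimal sketch (assuming the file is in the current directory) that prints the five most-used tags:

```js
// Load the collected data; keys are tag names, values are usage counts.
const tags = require('./tags.json');

// Sort tag names by descending count and take the top five.
const top = Object.keys(tags)
  .sort((a, b) => tags[b] - tags[a])
  .slice(0, 5);

for (const tag of top) {
  console.log(`${tag}: ${tags[tag]}`);
}
```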
To have fun with the data I've already collected: `./analyze.js --help`.
A working data set as of December 4, 2016 is checked in here as `tags.json`.
`fetchtags.js` fetches the tag listing pages from AO3 for later munging. It fetches one page at a time so as not to distress anybody's servers. Because I am lazy, it depends on a page-count constant to figure out when it's done. It will create a local directory named `./input/` and store the fetched files in it.
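The fetch loop amounts to something like the sketch below. The listing URL, the `PAGE_COUNT` value, and the file naming are placeholders for illustration, not the script's actual constants:

```js
const fs = require('fs');
const https = require('https');

const PAGE_COUNT = 100; // assumed constant; the real script hardcodes its own page count
const BASE_URL = '...'; // placeholder: the AO3 tag listing URL the script targets

// Fetch a single listing page and resolve with the response body.
function fetchPage(page) {
  return new Promise((resolve, reject) => {
    https.get(`${BASE_URL}?page=${page}`, res => {
      let body = '';
      res.on('data', chunk => (body += chunk));
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Fetch pages strictly one at a time, saving each under ./input/.
async function main() {
  fs.mkdirSync('./input', { recursive: true });
  for (let page = 1; page <= PAGE_COUNT; page++) {
    const body = await fetchPage(page);
    fs.writeFileSync(`./input/page-${page}.html`, body);
  }
}

main();
```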
`parsepages.js` parses the output of `fetchtags.js` and turns it into a JSON blob in `tags.json`.
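In other words, the parsing step scrapes tag names and counts out of the saved pages and writes them out sorted. A sketch of that shape, assuming the count follows each tag link in parentheses (the real AO3 markup may differ, and a proper HTML parser may be in order):

```js
const fs = require('fs');

const tags = {};

// Scan every saved page for "tag name (count)" pairs.
// The regex is illustrative, not the script's actual extraction logic.
for (const file of fs.readdirSync('./input')) {
  const html = fs.readFileSync(`./input/${file}`, 'utf8');
  const pattern = /<a class="tag"[^>]*>([^<]+)<\/a>\s*\((\d+)\)/g;
  let m;
  while ((m = pattern.exec(html)) !== null) {
    tags[m[1]] = Number(m[2]);
  }
}

// Insert keys in lexical order so the output file matches the documented layout.
const sorted = {};
for (const key of Object.keys(tags).sort()) sorted[key] = tags[key];
fs.writeFileSync('tags.json', JSON.stringify(sorted, null, 2));
```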
Running `./analyze.js --help` shows:

```
get a list of tags matching specific criteria

Options:
  --cutoff, -c     tags with usage counts lower than this will not be considered
  --filter, -f     string or pattern to filter for
  --transform, -t  transform tags to canonical form                    [boolean]
  --sort, -s       sorting criterion: lexical or count     [default: "lexical"]
  --json, -j       output results in json format                       [boolean]
  --help           Show help                                           [boolean]

Examples:
  analyze.js -c 200 -f hurt     filter for tags with "hurt" used at least 200 times
  analyze.js -c 5000 -s count   show tags used at least 5000 times, sorted by usage count
  analyze.js -c 3000 file.json  read some other json file for data (defaults to tags.json)
```
The `--transform` option changes the tags to a no-spaces, all-lower-case form with some inconsistencies cleaned up. Needs more work.
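Roughly, the transform is along these lines; this sketch covers only the documented lower-casing and space removal, and the extra fold-ins are assumptions:

```js
// A minimal sketch of the documented behavior: lower-case and strip spaces.
// The second replacement stands in for the "inconsistencies cleaned up" part;
// the real script's fix-up list is longer and different.
function transform(tag) {
  return tag
    .toLowerCase()
    .replace(/\s+/g, '')     // documented: no spaces
    .replace(/[_\-]/g, '');  // assumed: fold underscore/hyphen variants too
}

console.log(transform('Alternate Universe - Modern Setting'));
// -> "alternateuniversemodernsetting"
```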
`./analyze.js -c 3000 --sort count -t` gets you a list of tags that is mostly devoid of fandom-specific & content-free crud, ready to use.
License: ISC.