Merged
48 commits
0720ac0
Setup project
SeijiEmery Jul 12, 2018
ce014df
trivial webpage fetching
SeijiEmery Jul 12, 2018
f90e03a
added selectors to get page content
SeijiEmery Jul 12, 2018
141d078
basic test crawler now breaks page content into sections and dumps al…
SeijiEmery Jul 12, 2018
6fb62ad
added parallelism (trivial)
SeijiEmery Jul 12, 2018
91c8a01
hacking on fields; may need to rewrite this
SeijiEmery Jul 13, 2018
a7d4bd0
parsed most fields (some bugs)
SeijiEmery Jul 13, 2018
621f683
fixed issues
SeijiEmery Jul 13, 2018
8951254
cleaned up code
SeijiEmery Jul 14, 2018
591ebe2
added course list (temp text dump)
SeijiEmery Jul 14, 2018
96ad40f
moved data model (temp) into separate file
SeijiEmery Jul 14, 2018
c28a6a8
refactoring; setup structure to finish crawler with TDD
SeijiEmery Jul 15, 2018
cc3ae46
built out utils to enable saner child element searching
SeijiEmery Jul 15, 2018
fd84c9a
implemented childRange as a full bidirectional, random-access range
SeijiEmery Jul 15, 2018
b142662
demo-ed code to make sure the section selections work
SeijiEmery Jul 15, 2018
61e773d
wrote simple expect() library and refactored unittests
SeijiEmery Jul 15, 2018
93c4936
fixed + collected all contact info
SeijiEmery Jul 15, 2018
75b6c1f
finished parsing faculty info from the ucsc registrar
SeijiEmery Jul 15, 2018
364b083
started parsing courses
SeijiEmery Jul 15, 2018
51c163a
struggled with this last bit for ~ 20 minutes until I realized I'd ma…
SeijiEmery Jul 15, 2018
1594f60
fixing edge cases...
SeijiEmery Jul 15, 2018
1730ce9
cleanup + bugfixes
SeijiEmery Jul 15, 2018
e66e43f
fixed ElementRange impl
SeijiEmery Jul 15, 2018
e256b4b
added jsonizer for output
SeijiEmery Jul 15, 2018
3a5d81f
started writing thorough unittests for parser code
SeijiEmery Jul 15, 2018
912b8c6
added text dumps for testing
SeijiEmery Jul 15, 2018
404fab0
added unittests
SeijiEmery Jul 15, 2018
516b4ef
came up with thorough list of regex replacement rules (for raw text d…
SeijiEmery Jul 15, 2018
e45d329
renamed file; added fixSentences function
SeijiEmery Jul 15, 2018
e0458ab
made progress
SeijiEmery Jul 17, 2018
aa28d84
wrote a bunch of validator stuff for course names... then threw it ou…
SeijiEmery Jul 17, 2018
4c8e4e3
building out parse cases. not parsing semantically, just to see which…
SeijiEmery Jul 17, 2018
ad63197
adding cases
SeijiEmery Jul 17, 2018
509c6a8
added more cases. not done, but I think I'll stop here...
SeijiEmery Jul 17, 2018
828ac89
wrote nifty algorithm for transforming shitty input (ie. "CHEM 1A/B/C…
SeijiEmery Jul 17, 2018
8be5b47
finished basic output
SeijiEmery Jul 17, 2018
395c1dd
added pretty formatting
SeijiEmery Jul 17, 2018
19d66af
fixed parallelism (so it now actually works -_-)
SeijiEmery Jul 17, 2018
1f896db
added a nice-ish commandline interface
SeijiEmery Jul 17, 2018
7a4ae10
added preliminary impl for ucsc registrar...
SeijiEmery Jul 17, 2018
9fb0518
wrote a quick script to transform data into a vis.js format
SeijiEmery Jul 17, 2018
dc4a287
setup demo page
SeijiEmery Jul 17, 2018
272956b
fleshed out arguments
SeijiEmery Jul 18, 2018
9563cf5
renamed file
SeijiEmery Jul 18, 2018
f26b4ff
fixed departments (ie. should alias)
SeijiEmery Jul 18, 2018
21e5c0b
fixed edge order
SeijiEmery Jul 18, 2018
fff89d3
updated to include title, etc
SeijiEmery Jul 19, 2018
c2d16a6
Merge remote-tracking branch 'crawlers/master' into pull-crawlers
SeijiEmery Jul 19, 2018

These merge commits were added into this branch cleanly.

There are no new changes to show.