This project started as an exploration of the terms of service agreements used by various tech companies but morphed into an applied algorithm-writing exercise. The idea came to me after frustrations with company policies that restrict people from scraping information from their websites. Even if I'm not allowed to access their data, I can still at least study their terms of service, right?
You can see the output here. Note that the links will only work if you download the PDF.
- word2vec
- porter stemming algorithm
- ngrams tokenization
- TF-IDF statistic
- singular value decomposition
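
To give a flavor of two of the techniques listed above, here is a minimal sketch of word-level n-gram tokenization followed by a TF-IDF score, in plain Python. This is not the project's actual code: the function names `ngrams` and `tf_idf`, the sample sentences, and the unsmoothed idf formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Return word-level n-grams from lowercased, whitespace-split text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tf_idf(docs, n=1):
    """Return one {term: score} dict per document.

    tf  = term count / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    This is the plain, unsmoothed variant (an assumption; libraries often smooth).
    """
    tokenized = [ngrams(doc, n) for doc in docs]
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for very short documents
        scores.append({
            term: (count / total) * math.log(len(docs) / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical snippets standing in for real terms of service text.
docs = [
    "you may not scrape this site",
    "you may not resell this service",
    "we may update these terms",
]
scores = tf_idf(docs, n=1)
```

A word like "may" that shows up in every document gets a score of zero, while a word unique to one document, like "scrape", scores highest there. That is the property that makes TF-IDF useful for spotting what is distinctive about one company's terms compared with the rest.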