better source for data model #2

cch5ng · 2015-03-22T05:01:57Z

currently web scraping contents of h5bp interview questions gh-pages index.html. while this works, it is unreliable b/c changes to the master README.md (original questions source) may not have been pushed to the gh-pages branch yet.
would like to parse the text of the README.md directly but not exactly sure how to do this. may require learning regular expressions. may use some 3rd party text parsing library??
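
a rough sketch of the regular-expression route, written in Python as an offline step rather than in the app itself. it assumes the README uses `#` headings for categories and `*` bullets for questions, with indented bullets as follow-ups; that layout, the function name `parse_questions` and the output shape are all assumptions for illustration, not something verified against the h5bp file.

```python
import json
import re

# Assumed layout: "#### Category Name:" headings followed by "* question" bullets.
HEADING = re.compile(r"^#{2,6}\s+(.*?)\s*:?\s*$")
BULLET = re.compile(r"^(\s*)[*\-+]\s+(.*\S)\s*$")


def parse_questions(markdown_text):
    """Group bullet items under the nearest preceding heading.

    Top-level bullets become questions; indented bullets are attached
    to the previous top-level question as follow-ups.
    """
    categories = []
    current = None
    for line in markdown_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = {"category": heading.group(1), "questions": []}
            categories.append(current)
            continue
        bullet = BULLET.match(line)
        if bullet and current is not None:
            indent, text = bullet.groups()
            if indent == "":
                current["questions"].append({"text": text, "followups": []})
            elif current["questions"]:
                current["questions"][-1]["followups"].append(text)
    # Drop headings that had no bullet list under them (e.g. a table of contents).
    return [c for c in categories if c["questions"]]


SAMPLE = """#### General Questions:

* What did you learn yesterday?
* What is a closure?
  * Why would you use one?
"""

print(json.dumps(parse_questions(SAMPLE), indent=2))
```

anything that is not a heading or a bullet (paragraph-style questions, code blocks) is skipped by this sketch, so sections that are not formatted as lists would need separate handling.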

cch5ng · 2015-03-28T15:18:54Z

try to update the scraper to point to the html output from https://github.com/h5bp/Front-end-Developer-Interview-Questions ... appears that the README content is all contained in <article> ... verified that this is a unique outer element for the existing scraper logic
or alt look at markdown parser (but those appear to output html instead of json)

cch5ng · 2015-03-28T23:15:40Z

skimmed docs for one markdown parser (https://github.com/evilstreak/markdown-js) and it sounds like their output is basically markdown to HTML. I want intermediate JSON instead. need to test it out
on a little further reading, it is possible to use an intermediate function to get from markdown to JSONML, but it does not look easy to go from JSONML to the JSON I am interested in. the kicker is the embedded html tags like <code> and then a bunch of nested lists
- would it be viable to grab a copy of the raw README.md, run it through the markdown parser (to HTML) and then apply my current logic to those results? but then there are additional variables, like what if the parser introduces bugs and causes my app to fail?
- would there be a way to automate grabbing a copy of the raw README.md (weekly), run that thru the markdown parser, and save the HTML results to my github repo? then my existing logic should work automatically (and there wouldn't be a slowdown from doing parsing every time)
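
the weekly refresh idea could be sketched roughly like this, as a Python script run outside the browser. the raw-file URL (host and branch name), the function names and the output path are assumptions for illustration; the converter is passed in so any markdown-to-HTML function can be plugged in.

```python
import urllib.request
from pathlib import Path

# Assumed location of the raw file; host and branch name are guesses, not taken from this thread.
RAW_URL = (
    "https://raw.githubusercontent.com/h5bp/"
    "Front-end-Developer-Interview-Questions/master/README.md"
)


def fetch_text(url, timeout=30):
    """Download a URL and return its body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def refresh_snapshot(out_path, convert, fetch=fetch_text, url=RAW_URL):
    """Fetch the README, convert it, and rewrite out_path only if the result changed.

    Returns True when the file was written, False when it was already up to date.
    """
    converted = convert(fetch(url))
    path = Path(out_path)
    if path.exists() and path.read_text(encoding="utf-8") == converted:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(converted, encoding="utf-8")
    return True
```

a scheduled job (cron or similar) could call `refresh_snapshot("src/readme.html", convert=some_markdown_to_html)` and commit the file when it returns True, where `some_markdown_to_html` is a placeholder for whichever converter ends up being used. since the script does not run in a browser, it is not subject to the cross-origin restrictions that apply to XHR.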

cch5ng · 2015-03-28T23:18:31Z

retried doing web scraping on the master github pages for the project root and the project README.md file. but got errors like:

XMLHttpRequest cannot load https://github.com/h5bp/Front-end-Developer-Interview-Questions. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8000' is therefore not allowed access.
XMLHttpRequest cannot load https://github.com/h5bp/Front-end-Developer-Interview-Questions/blob/master/README.md. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8000' is therefore not allowed access.
- Believe both errors come from the browser's same-origin policy: github.com does not send an Access-Control-Allow-Origin header, so cross-origin XHR requests from my page are blocked. But that means my data source is going to be unreliable unless I can get useful results out of a markdown parser

cch5ng · 2015-03-29T01:12:49Z

(self note: from what I can tell, the updates from the original h5bp project README to their gh-pages index.html are being made manually by one person; I cannot detect any automated update process in the repo source files)

cch5ng · 2015-03-29T02:20:29Z

temp workaround for time being...

grabbed h5bp raw README.md (master) and put it into http://dillinger.io/ > output as html
- 03 29 15: 2 04a ... see one issue related to the readme formatting inconsistency. the coding questions are using <p> tags and <pre> tags so the max number counts by category are getting messed up. probably should swap the order of fun questions and coding questions. 2nd issue is that the form labels are currently hard coded and they should be dynamic based on the readme html contents
- 03 29 15: 1 29a ... got slightly further trying to read the generated readme html on my gh-pages. now am getting a legit list of categories but for some reason the questions are not getting read and appended into the final js array of categories/questions
- trying to test the results but the jquery .find() is not reading the html results correctly so I don't know what is the difference between the dillinger.io output and the html format used in h5bp's gh-pages index
plan to add resulting HTML into a new src folder in my repo and point to that file from my XHR
- don't like introducing a manual dependency but really hate giving people unreliable content
- in long term, would need better solution but in short would really prefer working on app functionality and improving angular skills
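
for comparing what the generated html actually contains against what the scraper expects, a small offline extractor can help. this is a sketch in Python using the standard library's `html.parser`, not the app's jquery logic; it assumes categories are headings (`h2`-`h4`), list questions are top-level `<li>` items, and the coding questions are `<p>` elements outside any list. the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class QuestionExtractor(HTMLParser):
    """Collect heading text as categories and li / top-level p text as questions."""

    HEADINGS = {"h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.categories = []    # [{"category": str, "questions": [str, ...]}]
        self._capturing = None  # tag whose text is currently being buffered
        self._buffer = []
        self._list_depth = 0

    def _start(self, tag):
        self._capturing = tag
        self._buffer = []

    def _flush(self):
        # Collapse whitespace so inline tags like <code> do not leave gaps.
        text = " ".join("".join(self._buffer).split())
        if self._capturing in self.HEADINGS:
            self.categories.append(
                {"category": text.rstrip(":").strip(), "questions": []}
            )
        elif text and self.categories:
            self.categories[-1]["questions"].append(text)
        self._capturing = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            if self._capturing == "li":
                self._flush()  # a nested list ends the parent question's text
            self._list_depth += 1
        elif tag in self.HEADINGS:
            self._start(tag)
        elif tag == "li" and self._list_depth == 1:
            self._start("li")
        elif tag == "p" and self._list_depth == 0:
            self._start("p")  # paragraph-style questions outside lists

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self._list_depth = max(0, self._list_depth - 1)
        elif tag == self._capturing:
            self._flush()

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def extract_categories(html_text):
    parser = QuestionExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.categories
```

running `extract_categories` on both the dillinger.io output and the h5bp gh-pages html and comparing the results is one way to see where the two formats differ. note that `<pre>` content is ignored here, and any intro paragraph under a heading would be counted as a question, so the counts for paragraph-style sections should be treated with care.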

cch5ng · 2015-03-29T09:34:40Z

temp workaround to inconsistent formatting for the coding questions section
- hardcode the form labels. swap positions of fun questions and coding questions
- set coding questions to just a read only text or input where it communicates that all coding questions will be returned no matter what
- store the coding questions (category and questions set) in a different variable than the other category/question groups
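
the workaround above could look something like this, sketched in Python rather than in the app's angular code. it assumes each category is a dict with `category` and `questions` keys, that the user asks for a number of questions per category, and that questions are picked at random; the category name "Coding Questions", the function name and the random selection are all assumptions for illustration.

```python
import random


def build_quiz(categories, counts, coding_name="Coding Questions", rng=random):
    """Pick the requested number of questions per category.

    The coding category ignores the requested count and always returns
    every question, since it is not formatted as a countable list.
    """
    selected = {}
    for cat in categories:
        name, questions = cat["category"], cat["questions"]
        if name == coding_name:
            selected[name] = list(questions)
        else:
            # Never ask for more questions than the category has.
            k = min(counts.get(name, 0), len(questions))
            selected[name] = rng.sample(questions, k)
    return selected


categories = [
    {"category": "General Questions", "questions": ["q1", "q2", "q3"]},
    {"category": "Coding Questions", "questions": ["c1", "c2"]},
]
print(build_quiz(categories, {"General Questions": 2}))
```

keeping the special case in one place means the form can show the coding category as read-only text while the other categories keep their numeric inputs.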

cch5ng · 2015-03-31T00:39:22Z

fixed handling the inconsistency with coding questions (non list format and using different html tags).
a16c78b

cch5ng · 2015-03-31T00:59:41Z

this is about as much as I plan to do for this iteration
- in the future may want to revisit having a better data model and better way of pulling data from the h5bp repo's README file.
- but would like to wrap up this project more quickly and work on different projects

cch5ng added enhancement bug and removed enhancement labels Mar 22, 2015

cch5ng mentioned this issue Mar 29, 2015

(dependency) refactor form inputs to use ng-repeat #17

Open

cch5ng closed this as completed Mar 31, 2015
