Expression domain #16
Comments
Dear issue #16, it's with emotion I cannot hide that I'm writing this to you. I remember. Oh, I remember :-) We've been talking about you for so long, but never had the chance. Your time has come.
The plan is as follows:
A heuristic is composed of:
Angel @oncletom shared something that'll save us time: http://oembed.com/
So what do we have to do? Use it, or extend it to a more global concept of heuristic?
"Use it" (assuming you're talking about oEmbed), but in a way I don't think you've grasped yet. oEmbed isn't a service, it's a standard that other websites choose to adhere to. So instead of having to reverse-engineer the HTML to find the author (which can be time-consuming) and ending up with a result that is brittle by design, we have direct access via oEmbed to the structured information we need (very specifically ...).
Unfortunately, not all the websites we need expose oEmbed (Facebook doesn't; Twitter does, but only via its API, so it's probably too complicated for now), so these will have to be done one by one (as planned anyway). However, there is a good overlap between your list and the current list of oEmbed providers. Once I have written the code to transform one oEmbed output into what we call an expression domain, I get all the oEmbed providers in the same move (because it's a standard).
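To make the "transform one oEmbed output into an expression domain" step concrete, here is a minimal sketch in Python. It assumes the oEmbed response has already been fetched and parsed from JSON. The `author_name`, `author_url` and `provider_name` keys are optional fields defined by the oEmbed spec; the `ExpressionDomain` shape and the function name are hypothetical, since the thread does not define them.

```python
from typing import Optional, TypedDict


class ExpressionDomain(TypedDict):
    # Hypothetical shape: the thread does not say what an expression
    # domain record actually contains.
    name: Optional[str]
    url: str
    provider: Optional[str]


def oembed_to_expression_domain(oembed: dict) -> Optional[ExpressionDomain]:
    """Map a parsed oEmbed response to an expression domain record.

    Returns None when the response has no author_url (the field is
    optional in oEmbed), so the caller can fall back to a site-specific
    heuristic instead.
    """
    author_url = oembed.get("author_url")
    if not author_url:
        return None
    return {
        "name": oembed.get("author_name"),
        "url": author_url,
        "provider": oembed.get("provider_name"),
    }
```

Because every provider returns the same field names, one function like this would cover all of them; only the sites that don't expose oEmbed would need their own code.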
Reference for expression domains
For Facebook, it looks mostly impossible to find the expression domain of an event or the number of followers/friends because the HTML is very rough/minimalist (BigPipe, all that).
For now, we're choosing not to do it. If it turns out this is not efficient enough, we'll revisit, probably to use the API.
Facebook heuristic at #211.
Twitter and LinkedIn done at #215.
Nothing to do for WordPress. Either people bought their own domain or use a
Blogger is gone and is now G+. Cool.
If you type in the main issue description you can create tickable checkboxes ;-)
Let's consider this fixed after #221. Done: Facebook
Develop a heuristic manager and the first heuristic files for the first 10 content farms.
Facebook
Twitter
LinkedIn
SlideShare
WordPress
Viadeo
YouTube
Vimeo
Dailymotion
Blogger
TypePad
Pinterest
Scribd
Google+
Tumblr
Picasa
Flickr
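One way to picture a "heuristic manager" with one heuristic per site is a small registry keyed by hostname. This is only a sketch under assumptions: `HeuristicManager`, `first_path_segment` and the `example.com` rule are all hypothetical names and rules made up for illustration, not the project's actual design or any real site's URL scheme.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse

# A heuristic takes a page URL and returns the expression domain URL,
# or None when it cannot tell. (Hypothetical signature.)
Heuristic = Callable[[str], Optional[str]]


class HeuristicManager:
    """Dispatches a URL to the heuristic registered for its host."""

    def __init__(self) -> None:
        self._by_host: Dict[str, Heuristic] = {}

    def register(self, host: str, heuristic: Heuristic) -> None:
        self._by_host[host.lower()] = heuristic

    def expression_domain(self, url: str) -> Optional[str]:
        host = (urlparse(url).hostname or "").lower()
        # Treat "www.example.com" and "example.com" as the same site.
        if host.startswith("www."):
            host = host[4:]
        heuristic = self._by_host.get(host)
        return heuristic(url) if heuristic else None


def first_path_segment(url: str) -> Optional[str]:
    """Illustrative rule: the first path segment identifies the account."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        return None
    return f"{parsed.scheme}://{parsed.hostname}/{segments[0]}"


manager = HeuristicManager()
manager.register("example.com", first_path_segment)  # hypothetical site
```

With this layout, each "heuristic file" would contribute one function plus a `register` call, and unknown hosts simply yield `None`.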