Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Add: utilizing local meta keywords tags to suggest better and more relevant tags #47

Merged
merged 5 commits into from Oct 21, 2018

Conversation

ventz
Copy link
Contributor

@ventz ventz commented Oct 15, 2018

Feature Add: utilizing local meta keywords tags to suggest better and more relevant tags.

PR includes:

1.) New content script: keywords_suggestions.js with a getKeywordsSuggestionTags listener
(utilized for pulling out "meta keywords" content from websites

2.) updated manifest.json to include new content script allowance for keywords_suggestions.js

3.) Update to background.js to include optional parameter to pass local meta keywords, adding them to suggests, and de-dupping with popular and recommended tags.

4.) Updated to popups.js to include new getKeywordsSuggestionTags tab message sender (to our getKeywordsSuggestionTags listener from the keywords_suggestions.js), which either overloads getSuggests (in the background.js) with the local meta keywords, or if none are present, passes along the Pinboard suggestions.

Examples/Test with:

  • nytimes.com
  • washingtonpost.com
  • any custom site which includes meta keywords:
<html>
<head>
<meta content="exampletag1 someothertag yetanothertag news daily rss" name="keywords">
<title>test.com</title>
</head>
<body>Test.com</body>
</html>

… more relevant tags. Examples include: nytimes.com and washingtonpost.com
@clvrobj
Copy link
Owner

clvrobj commented Oct 15, 2018

Hi @ventz , Thanks for the pull request!

@ventz
Copy link
Contributor Author

ventz commented Oct 15, 2018

@clvrobj No problem.

If you can, please test it on your side - I started with the chrome extension at first, cloned it, and changed it there and got it to work, only to find your github repository :) I then modified it for here, following some of the names/conventions you have setup. My local version works (when I point chrome to the git working directory) -- but when I did the gulp build and took the dist bundle, it had display issues upon logging into the pinboard account. I think this might be due to some out of date/deprecated packages however.

Copy link
Owner

@clvrobj clvrobj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clvrobj No problem.

If you can, please test it on your side - I started with the chrome extension at first, cloned it, and changed it there and got it to work, only to find your github repository :) I then modified it for here, following some of the names/conventions you have setup. My local version works (when I point chrome to the git working directory) -- but when I did the gulp build and took the dist bundle, it had display issues upon logging into the pinboard account. I think this might be due to some out of date/deprecated packages however.

@ventz Sure, will try it! Thanks.

suggests = suggests.concat(keywordTags);
suggests = suggests.sort().filter(function(item, pos, ary) {return !pos || item != ary[pos - 1];});
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to replace the tabs in this block into 2 spaces? It'd be identical with the full project.
Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All set and pushed.

}
sendResponse({data: tags});
}
});
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to replace the tabs into 2 spaces? It'd be identical with the full project.
Thanks!

@clvrobj
Copy link
Owner

clvrobj commented Oct 16, 2018

Hi @ventz I tried it and it works, but the tags seems a bit too many in the page. Pls check the below screenshot.
screenshot 2018-10-16 21 49 11
I am thinking whether we can improve it by better sorting and show less of them. Currently, the keywords are sorted by alphabets, I think insteadly it might be better to sort the keywords by the showing times and only display the top 10 or 5 keywords, then the most relavent ones will be shown in the page, what do you think?
Just did a quick test on the keywords of NYtimes using the solution, the result:
[('and', 7), ('US', 3), ('Saudi', 3), ('Politics', 2), ('Arabia', 2), ('bin', 2), ('2016', 2), ('Charlottesville', 2), ('violence', 2), ('Restaurant', 2), ('Foreign', 1), ('Policy', 1), ('', 1), ('Iran', 1), ('Jamal', 1), ('Khashoggi', 1), ('Murders', 1), ('Homicides', 1), ('State', 1), ('Department', 1), ('King', 1), ('of', 1), ('Salman;Salman', 1), ('Abdul-Aziz', 1), ('Mike', 1), ('Pompeo', 1), ('Mohammed', 1), ('Salman', 1), ('Turkey', 1), ('Oil', 1), ('Gasoline', 1), ('Aramco', 1), ('Donald', 1), ('Trump', 1), ('House', 1), ('Committee', 1), ('on', 1), ('Ways', 1), ('Means', 1), ('Democrats', 1), ('Federal', 1), ('Taxes', 1), ('Russian', 1), ('meddling', 1), ('in', 1), ('election', 1), ('Teapot', 1), ('Dome', 1), ('Women', 1), ('Girls', 1), ('Sexual', 1), ('harassment', 1), ('Brett', 1), ('M', 1), ('Kavanaugh', 1), ('Abortion', 1), ('Missouri', 1), ('Claire', 1), ('McCaskill', 1), ('Presidential', 1), ('Election', 1), ('Blacks;African-Americans', 1), ('Tests', 1), ('School', 1), ('discipline', 1), ('K-12', 1), ('Education', 1), ('Race', 1), ('Ethnicity', 1), ('Segregation', 1), ('Equal', 1), ('Educational', 1), ('Opportunities', 1), ('VA', 1), ('Marijuana;Pot;Weed', 1), ('Canada', 1), ('Cannabis', 1), ('Foods', 1), ('Products', 1), ('Medical', 1), ('Marijuana', 1), ('Regulation', 1), ('Deregulation', 1), ('Legislation', 1), ('MeToo', 1), ('Movement', 1), ('Chef', 1), ('Spotted', 1), ('Pig', 1), ('Manhattan', 1), ('NY', 1), ('April', 1), ('Bloomfield', 1), ('Ken', 1), ('Friedman', 1), ('Domestic', 1), ('Moira', 1), ('Donegan', 1), ('Stephen', 1), ('Elliott', 1), ('Lawsuits', 1)], so we might also could filter out the and, of etc, or we can also do the filter in the next step.

@ventz
Copy link
Contributor Author

ventz commented Oct 17, 2018

Hi,

The NYT is sadly one of the places where the keywords are a bit much. and honestly - in my opinion at least, they are probably not that helpful there. They basically don't use them properly. But in most other places I think they are pretty good (the ones that still use them - as google has stopped looking at meta tags, some websites have stopped bothering with meta data). Usually websites keep less than 10 of them, and they are directly related to the website itself. (ex: see cnn.com - although they filter their traffic so sometimes they are not available to the scripts pulling the metadata).

Maybe as a solution the array length can be checked -- if it's over (picking an arbitrary number here) 10 for the ones coming from the meta keywords, we can then pull a keyword relevancy of some sort? (At that point, there's even the potential to do some keyword analysis from the html content itself :))
Esemtially this: https://www.rankranger.com/documentation/on-page-keyword-density-analysis

I would also lower case the tags by the way, otherwise there's some de-dups there and in general it probably makes sense to store them as lower case.

@clvrobj clvrobj merged commit 66a104c into clvrobj:master Oct 21, 2018
@clvrobj
Copy link
Owner

clvrobj commented Oct 21, 2018

It makes sense.
I've merged this and deployed v2.12.2 to the chrome web store. Let's see the performance.
Feel free to submit further pull request.
Thanks @ventz !

@ventz
Copy link
Contributor Author

ventz commented Oct 22, 2018

@clvrobj That's great - thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants