deduce article title from url if it is empty #453

p1er · 2016-12-23T21:32:34Z

Try to deduce title from url only if the parsed title is empty. This happens with some blogs which don't populate the title properly.

This patch reuses the existing function make_title, improved so that it no longer leaves .html or .htm at the end of the title.
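To illustrate the behaviour described above, here is a minimal sketch of the fallback. It is not the code from this patch: `Item`, `make_title_sketch` and `effective_title` are hypothetical names, and `make_title_sketch` is only a simplified stand-in for `utils::make_title`, assuming it keeps the last path segment, drops a trailing .html/.htm, turns dashes and underscores into spaces and capitalizes the first letter.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for a parsed feed item; field names are assumptions.
struct Item {
	std::string title;
	std::string link;
};

// Simplified, assumed approximation of utils::make_title (not the real code).
std::string make_title_sketch(const std::string& url) {
	std::string title = url;

	// Drop trailing slashes, then keep only the last path segment.
	while (!title.empty() && title.back() == '/') {
		title.pop_back();
	}
	const std::string::size_type slash = title.rfind('/');
	if (slash != std::string::npos) {
		title.erase(0, slash + 1);
	}

	// Strip .html or .htm, but only at the very end of the segment.
	const std::string suffixes[] = {".html", ".htm"};
	for (const std::string& suffix : suffixes) {
		if (title.size() >= suffix.size() &&
		    title.compare(title.size() - suffix.size(), suffix.size(), suffix) == 0) {
			title.erase(title.size() - suffix.size());
			break;
		}
	}

	// Turn word separators into spaces and capitalize the first letter.
	std::replace(title.begin(), title.end(), '-', ' ');
	std::replace(title.begin(), title.end(), '_', ' ');
	if (!title.empty() && title[0] >= 'a' && title[0] <= 'z') {
		title[0] -= 'a' - 'A';
	}
	return title;
}

// Use the parsed title when present; fall back to the URL only if it is empty.
std::string effective_title(const Item& item) {
	if (!item.title.empty()) {
		return item.title;
	}
	return make_title_sketch(item.link);
}
```

With this sketch, an item with an empty title and the link `http://example.com/blog/my-first-post.html` gets the title `My first post`, while an item that already has a title keeps it unchanged.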

coveralls · 2016-12-23T21:50:07Z

Coverage decreased (-0.03%) to 32.394% when pulling f33f44d on p1er:title-from-url into 310d1ad on akrennmair:master.

Minoru

Thanks! Can you please cover this new functionality with tests?

Minoru · 2016-12-23T21:47:12Z

src/utils.cpp

@@ -1069,6 +1069,11 @@ std::string utils::make_title(const std::string& const_url) {
 	if (title.at(0)>= 'a' && title.at(0)<= 'z') {
 		title[0] -= 'a' - 'A';
 	}
+	//strip .htm or .html from title


Better move this up and place it on line 1065. This will keep the flow of the function straightforward: it gradually trims the URL down to the title.

Minoru · 2016-12-23T21:51:04Z

src/utils.cpp

@@ -1069,6 +1069,11 @@ std::string utils::make_title(const std::string& const_url) {
 	if (title.at(0)>= 'a' && title.at(0)<= 'z') {
 		title[0] -= 'a' - 'A';
 	}
+	//strip .htm or .html from title
+	size_t pos = title.find(".html",0);


I'd rather limit this to the end of the URL. We probably don't want to strip ".html" from the middle of the title.

I also don't mind seeing this re-written with a regex and extended to cover .php and .aspx as these are quite common; but that's up to you.
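A rough sketch of what that could look like, assuming `std::regex` is available; the helper name `strip_page_suffix` and the exact pattern are illustrative only, not necessarily what ends up in the patch:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: remove one known page extension, but only when it is
// anchored at the end of the string, so ".html" in the middle is left alone.
std::string strip_page_suffix(const std::string& title) {
	// html? matches both .htm and .html; icase also accepts e.g. ".HTML".
	static const std::regex suffix(R"(\.(html?|php|aspx)$)", std::regex::icase);
	return std::regex_replace(title, suffix, "");
}
```

Because of the `$` anchor, `my.html-guide.aspx` would become `my.html-guide`, and `about.html.backup` would be left untouched.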

coveralls · 2016-12-24T03:20:17Z

Coverage increased (+0.01%) to 32.434% when pulling e25e681 on p1er:title-from-url into 310d1ad on akrennmair:master.

Minoru · 2016-12-24T11:13:12Z

Looks great, but a test for rss_parser would be nice, too. After all, that's what you wanted to change in the first place; the changes to make_title are incidental.

Minoru · 2017-01-25T17:43:06Z

@p1er, will you finish this, or should I take over? (I'll credit you in the Changelog either way.)

Minoru · 2017-02-12T08:41:20Z

Okay, I'm taking over this.

Minoru · 2017-02-12T09:56:50Z

Thank you for your work, @p1er!

deduce article title from url if it is empty

f33f44d

Minoru suggested changes Dec 23, 2016

use regex to eliminate php/html/etc suffix from title

e25e681

Minoru approved these changes Dec 24, 2016

Minoru merged commit e25e681 into akrennmair:master Feb 12, 2017
