Bug 3448 Make youtube video URLs visible to non-flash users. One edited ... #338

Closed
wants to merge 4 commits into from

3 participants

@deborahgu
Collaborator

...test, config changes to allow youtube apis.

  • If this goes prod memcache should probably be cleared for this memkey.
  • I don't have an eye for style and wasn't sure if this link should be styled at all.
  • Should the api call be a schwartz worker to make it async? It does only get called once per created embed.
@afuna
Owner

I've been thinking about the call to a worker -- btw gearman would work better here than theschwartz. I think I'd prefer it being async, though, because even though it's created only once per embed, if you do it synchronously, you'd have to wait for it to finish loading before you could show the "success" page. Whereas if it was async, you could show the success page after we're done processing things locally, fire something off...

cgi-bin/LJ/EmbedModule.pm
((68 lines not shown))
+ my $ua = LJ::get_useragent( role => 'vimeo', timeout => 60 );
+ my $api_url = "http://vimeo.com/api/v2/video/"
+ . $vid_id
+ . ".json";
+
+ # Pass request to the user agent and get a response back
+ my $request = HTTP::Request->new(GET => $api_url);
+ my $res = $ua->request($request);
+
+ # Check the outcome of the response
+ if ($res->is_success) {
+ my $obj = from_json( $res->content );
+ $linktext = '"'
+ . ${$obj}[0]{'title'}
+ . '" '
+ . BML::ml('embedmedia.vimeo')
@afuna Owner
afuna added a note

use LJ::Lang::ml here please! BML::ml should only be used on bml pages (and hopefully not even there, gone in a general phaseout...)

cgi-bin/LJ/EmbedModule.pm
((71 lines not shown))
+ . ".json";
+
+ # Pass request to the user agent and get a response back
+ my $request = HTTP::Request->new(GET => $api_url);
+ my $res = $ua->request($request);
+
+ # Check the outcome of the response
+ if ($res->is_success) {
+ my $obj = from_json( $res->content );
+ $linktext = '"'
+ . ${$obj}[0]{'title'}
+ . '" '
+ . BML::ml('embedmedia.vimeo')
+ . ")";
+ } else {
+ warn "error getting video info from Vimeo: ", $res->status_line, "\n";
@afuna Owner
afuna added a note

cough. warn

cgi-bin/LJ/EmbedModule.pm
@@ -257,6 +267,101 @@ sub _extract_num_unit {
return ( $num, "%" );
}
+# extract src info if useful
+# returns a hash of link text, url
+# currently handles: YouTube, Vimeo
+sub extract_src_info {
@afuna Owner
afuna added a note

I do think that given how well this is separated out, it should be fairly straightforward to convert to be used by a gearman worker, and just fire off a job when the embed is created/edited!

@afuna afuna commented on the diff
etc/config.pl
@@ -71,9 +71,9 @@
# );
@afuna Owner
afuna added a note

Config snuck in.

BTW, I'd suggest putting configs into ext/local, following the instructions here:

http://wiki.dwscoalition.org/wiki/index.php/Dreamwidth_Scratch_Installation#Editing_the_config_files

so you don't have to worry about configs in the future.

@deborahgu Collaborator

Grrr. Yeah, I did that, and for some reason it stopped working until I added it 2 places. I meant to debug but just put it in the etc dir as a stopgap and forgot about it. sigh.

@afuna Owner
afuna added a note

Hmm. Added to which two places? Anyway, I'd be happy to help you poke at it if you want, or just cheer you on otherwise if you're poking at it on your own!

@afuna afuna commented on the diff
doc/config-private.pl.txt
@@ -155,6 +155,10 @@
# email => ,
#);
+ #%YOUTUBE =(
@afuna Owner
afuna added a note

Hmm. what's the difference between this, and the one in config-local.pl.txt? do we need both or just one?

@deborahgu Collaborator

The one in private has the API key, the one in local has the URL. (for the record, when it gets pushed live we will need a Google API key. We can use the one that I registered, or somebody with a site administrative address can register more officially.)

bin/upgrading/update-db-general.pl
@@ -4143,6 +4147,21 @@
"ALTER TABLE externalaccount " .
"ADD COLUMN active enum('1', '0') NOT NULL default '1'" );
}
+
+ # Needed to cache embed titles to minimize external API calls
+ if ( column_type( "embedcontent", "title" ) eq '' ) {
@afuna Owner
afuna added a note

I think an error here -- "title" -- but there's no title column. Maybe you meant linktext? (same for the second table below too)

cgi-bin/LJ/EmbedModule.pm
((9 lines not shown))
## where new items overwrites old ones
my $table_name = ($preview) ? 'embedcontent_preview' : 'embedcontent';
- $journal->do("REPLACE INTO $table_name (userid, moduleid, content) VALUES ".
- "(?, ?, ?)", undef, $journal->userid, $id, $cmptext);
+ $journal->do( "REPLACE INTO $table_name " .
+ "(userid, moduleid, content, linktext, url) " .
+ "VALUES (?, ?, ?, ?, ?)",
+ undef, $journal->userid, $id, $cmptext, $src_info->{'linktext'}, $src_info->{'url'} );
@afuna Owner
afuna added a note

(style) don't need the quotes around the hash key here (same in a bunch of places too)
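
For anyone unfamiliar with the convention behind this note: inside hash subscript braces, Perl auto-quotes a bareword that looks like an identifier, so both spellings read the same entry. A minimal illustration; the hashref below is made-up sample data standing in for the return value of extract_src_info.

```perl
use strict;
use warnings;

# Sample data (an assumption for illustration, not the module's real output).
my $src_info = {
    linktext => 'Watch on YouTube',
    url      => 'https://www.youtube.com/watch?v=abc123',
};

# A bareword identifier is auto-quoted inside { }, so these are equivalent.
my $quoted   = $src_info->{'linktext'};
my $unquoted = $src_info->{linktext};

print "same value\n" if $quoted eq $unquoted;
```

Quotes are still needed when the key is not a plain identifier, for example a key containing a hyphen or a space.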

@afuna afuna commented on the diff
cgi-bin/LJ/EmbedModule.pm
((17 lines not shown))
die $journal->errstr if $journal->err;
# save in memcache
my $memkey = $class->memkey($journal->userid, $id, $preview);
- LJ::MemCache::set($memkey, $contents);
+ my $cref = { content => $cmptext,
@afuna Owner
afuna added a note

hmm, I'd suggest changing the memkey from e.g., embedcontpreview to embedcontpreview2, etc, because it's otherwise not easy to dump just a subset of memcache.

@deborahgu Collaborator

Thanks. That's easier than clearing memcache, which is what I thought would be the best solution. :D
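
A rough sketch of the version-bump idea, assuming a key built from a prefix, user ID, and module ID. The helper name `versioned_memkey` and the exact key layout are hypothetical; the real `memkey` method may build its key differently.

```perl
use strict;
use warnings;

# Bump this whenever the shape of the cached value changes. Old entries
# are then never read again and simply age out of memcache.
my $VERSION = 2;

# Hypothetical helper, not the module's actual memkey().
sub versioned_memkey {
    my ( $userid, $moduleid, $preview ) = @_;
    my $prefix = $preview ? 'embedcontpreview' : 'embedcont';
    return join( ':', $prefix . $VERSION, $userid, $moduleid );
}

print versioned_memkey( 42, 7, 1 ), "\n";    # embedcontpreview2:42:7
print versioned_memkey( 42, 7, 0 ), "\n";    # embedcont2:42:7
```

The appeal here is that the cached value changes from a plain string to a hashref, so readers on the new code never see an old-format entry under the new key.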

@afuna afuna commented on the diff
cgi-bin/LJ/Web.pm
@@ -3829,6 +3833,7 @@ sub placeholder_link {
<a href="$link">
<img src="$img" class="LJ_Placeholder" title="Click to show embedded content" />
</a>
+ $direct_link
@afuna Owner
afuna added a note

I like this. I think I'd have forgotten to update here...

cgi-bin/LJ/EmbedModule.pm
@@ -257,6 +267,101 @@ sub _extract_num_unit {
return ( $num, "%" );
}
+# extract src info if useful
+# returns a hash of link text, url
+# currently handles: YouTube, Vimeo
+sub extract_src_info {
+ my ($class, $content) = @_;
+ my ($url, $site, $href, $linktext);
+
+ if ( $content =~ /src="http:\/\/.*youtube\.com/ ) {
+ # YouTube
+
+ my $host = "https://www.youtube.com/";
+ my $prefix = "watch?v=";
+
+ # get the video ID
+ $content =~ /.*src="[^"]*embed\/([^"]*)".*/;
@afuna Owner
afuna added a note

Hmm. Why does the first regex contain youtube.com, whereas the second one doesn't?

@deborahgu Collaborator

So this was a little handwavy, because the url styles can change at any time. But the idea was if it's a youtube embed, your regexp has already selected for that, so now we're just selecting for the video string in the attribute of the style src="http://www.youtube.com/embed/MfstYSUscBc"

I could put it there, too -- would make the regexp slightly more efficient. My thought process was that if YT changed the embed address (say, to embed.youtu.be, for example) that would leave us fewer places to change the string. I'm not married to either way.
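
One possible way to fold the two patterns into one, sketched under assumptions: the helper name `youtube_video_id` is hypothetical, and unlike the patch this pattern also accepts https and stops the capture at a query string. It keeps the host check and the ID capture in a single place, which addresses the "fewer places to change" concern from the other direction.

```perl
use strict;
use warnings;

# Hypothetical helper: check the host and capture the video ID in one pass.
# Returns the ID, or undef when the markup is not a YouTube embed.
sub youtube_video_id {
    my ($contents) = @_;
    return $contents =~ m{src="https?://[^"/]*youtube\.com/embed/([^"?&/]+)}
        ? $1
        : undef;
}

my $embed = '<iframe src="http://www.youtube.com/embed/MfstYSUscBc" width="560"></iframe>';
print youtube_video_id($embed), "\n";    # MfstYSUscBc
```

Using `m{...}` as the delimiter avoids the escaped slashes that make the original patterns harder to read.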

@afuna Owner
afuna added a note
cgi-bin/LJ/EmbedModule.pm
((33 lines not shown))
+ . "&key="
+ . $apikey
+ . "&part=snippet";
+ # Pass request to the user agent and get a response back
+ my $request = HTTP::Request->new(GET => $queryurl);
+ my $res = $ua->request($request);
+
+ # Check the outcome of the response
+ if ($res->is_success) {
+ my $obj = from_json( $res->content );
+ $linktext = '"'
+ . ${$obj}{'items'}[0]{'snippet'}{'title'}
+ . '" ('
+ . BML::ml('embedmedia.youtube')
+ . ")";
+ } else {
@afuna Owner
afuna added a note

warn here

@afuna afuna commented on the diff
cgi-bin/LJ/EmbedModule.pm
@@ -366,12 +474,14 @@ sub module_iframe_tag {
# append a random string to the name so it can't be targetted by links
my $id = "embed_${journalid}_$moduleid";
my $name = "${id}_" . LJ::make_auth_code( 5 );
-
+ my $direct_link = defined $url
+ ? '<div><a href="' . $url . '">' . $linktext . '</a></div>' : '';
@afuna Owner
afuna added a note

Since linktext / URL are from outside sources, we should make sure to ehtml these to make sure that they don't cause issues. This needs to be done any time we use the linktext/url on a page.

@zorkian Owner
zorkian added a note

(Important) Yes please, add the ehtml call.

@deborahgu Collaborator

yep, sanitizing, headdesk.

@afuna afuna commented on the diff
cgi-bin/LJ/Web.pm
((6 lines not shown))
my $img = delete $opts{img} || "$LJ::IMGPREFIX/videoplaceholder.png";
+ my $direct_link = defined $url
+ ? '<div class="embed_link"><a href="' . $url . '">' . $linktext . '</a></div>' : '';
@afuna Owner
afuna added a note

(same concern here, re: escaping)

@zorkian Owner
zorkian added a note

(Important) Yup, escaping.

@deborahgu Collaborator

I went ahead and did this at define-time, rather than at use, so it's harder to miss if a new use turns up.
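
A small sketch of what escaping at define-time can look like. `escape_html` below is a hypothetical stand-in written for this example; it is not LJ::ehtml, whose exact behavior is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LJ::ehtml: escape the characters that are
# significant in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Escape once, where the values are defined, then build markup from them.
my $title    = escape_html('Cats <3 "boxes" & bags');
my $url      = escape_html('https://www.youtube.com/watch?v=abc&t=1');
my $linktext = qq{"$title" (Watch on YouTube)};

print qq{<div class="embed_link"><a href="$url">$linktext</a></div>\n};
```

The trade-off is that stored values are already HTML-escaped, so any later consumer that escapes again would double-encode them.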

@afuna
Owner

Phew, way overdue for which I apologize! I got caught up in the to-worker-or-not question -- but that just took way too long. So sorry!

cgi-bin/DW/Worker/EmbedWorker.pm
((49 lines not shown))
+ if keys %$arg;
+ return $job->permanent_failure("Missing argument")
+ unless defined $contents && defined $journalid;
+
+ my $result = LJ::EmbedModule->contact_external_sites({
+ vid_id => $vid_id,
+ host => $host,
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journalid,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ my $sclient = LJ::theschwartz();
@zorkian Owner
zorkian added a note

Why do you get this client?

@deborahgu Collaborator

Because I didn't delete a line of defunct code. :(

cgi-bin/LJ/EmbedModule.pm
@@ -49,8 +52,7 @@ sub save_module {
my $need_new_id = !defined $id;
if (defined $id) {
- my $old_content = $class->module_content( moduleid => $id,
- journalid => LJ::want_userid($journal) ) || '';
+ my $old_content = $class->module_content( moduleid => $id, journalid => LJ::want_userid($journal))->{content} || '';
@zorkian Owner
zorkian added a note

(Style) Why did you break the multi-line? We prefer breaking long lines. (Scrolling is a bummer!)

t/clean-embed.t
@@ -474,7 +474,7 @@ note( "Testing parse_embed (We parse the embed contents first from a post)" );
# check the iframe contents
# LJ::EmbedModule takes the content and cleans it
- my $got_embed = LJ::EmbedModule->module_content( journalid => $u->userid, moduleid => 1 );
+ my $got_embed = LJ::EmbedModule->module_content( journalid => $u->userid, moduleid => 1 )->{'content'};
@zorkian Owner
zorkian added a note

(Style) Don't quote hash keys.

htdocs/tools/embedcontent.bml
@@ -39,7 +39,7 @@ _c?>
journalid => $journalid,
moduleid => $moduleid,
preview => $preview,
- );
+ )->{'content'};
@zorkian Owner
zorkian added a note

(Style) Don't quote hash keys.

cgi-bin/LJ/EmbedModule.pm
((14 lines not shown))
+
+ if ( $contents =~ /src="http:\/\/.*youtube\.com/ ) {
+ # YouTube
+
+ my $host = "https://www.youtube.com/";
+ my $prefix = "watch?v=";
+
+ # construct the URL and link text
+ $contents =~ /.*src="[^"]*embed\/([^"]*)".*/;
+ my $vid_id = $1;
+ $url = $host . $prefix . $vid_id;
+ $linktext = LJ::Lang::ml('embedmedia.youtube');
+
+ # Fire off the worker to get the correct title
+ my $sclient = LJ::theschwartz();
+ die "Can't get TheSchwartz client" unless $sclient;
@zorkian Owner
zorkian added a note

(Style) Marginally cleaner to say:

my $sclient = LJ::theschwartz()
    or croak "Can't get TheSchwartz client";
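
A runnable version of that idiom, for illustration only. `get_client` is a hypothetical stand-in for LJ::theschwartz() that always fails, so the error path can be seen.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Hypothetical stand-in for LJ::theschwartz(): returns undef to simulate
# a client that could not be created.
sub get_client { return undef }

sub queue_job {
    # Low-precedence "or" lets the assignment finish first; croak then
    # reports the error from the caller's point of view.
    my $client = get_client()
        or croak "Can't get TheSchwartz client";
    return $client;
}

my $ok = eval { queue_job(); 1 };
print "failed: $@" unless $ok;
```

Note that `||` would not work the same way here, because it binds more tightly than the assignment.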
cgi-bin/LJ/EmbedModule.pm
((28 lines not shown))
+ my $sclient = LJ::theschwartz();
+ die "Can't get TheSchwartz client" unless $sclient;
+ my $job = TheSchwartz::Job->new_from_array("DW::Worker::EmbedWorker",
+ { vid_id => $vid_id,
+ host => 'youtube',
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journal->id,
+ preview => $preview,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ die "Can't create job" unless $job;
+ $sclient->insert($job) or die ("Can't queue youtube api job: $@");
@zorkian Owner
zorkian added a note

(Style) Don't use parentheses on this argument. Also, prefer croak.

cgi-bin/LJ/EmbedModule.pm
((33 lines not shown))
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journal->id,
+ preview => $preview,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ die "Can't create job" unless $job;
+ $sclient->insert($job) or die ("Can't queue youtube api job: $@");
+
+ } elsif ( $contents =~ /src="http:\/\/.*vimeo\.com/ ) {
+ # Vimeo's default c/p embed code contains a link to the
+ # video by title. If that's present, don't build a link.
+ warn "$contents";
@zorkian Owner
zorkian added a note

(Important) Debugging crept in!

cgi-bin/LJ/EmbedModule.pm
((47 lines not shown))
+ # video by title. If that's present, don't build a link.
+ warn "$contents";
+ my $host = "http://vimeo.com/";
+
+ # get the video ID
+ $contents =~ /.*src="[^"]*vimeo\.com\/video\/([^"]*)".*/;
+ my $vid_id = $1;
+
+ $url = $host . $vid_id;
+
+ # error getting video info from Vimeo
+ $linktext = LJ::Lang::ml('embedmedia.vimeo');
+
+ # Fire off the worker to get the correct title
+ my $sclient = LJ::theschwartz();
+ die "Can't get TheSchwartz client" unless $sclient;
@zorkian Owner
zorkian added a note

(Style) As above.

cgi-bin/LJ/EmbedModule.pm
((61 lines not shown))
+ my $sclient = LJ::theschwartz();
+ die "Can't get TheSchwartz client" unless $sclient;
+ my $job = TheSchwartz::Job->new_from_array("DW::Worker::EmbedWorker",
+ { vid_id => $vid_id,
+ host => 'vimeo',
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journal->id,
+ preview => $preview,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ die "Can't create job" unless $job;
+ $sclient->insert($job) or die ("Can't queue vimeo api job: $@");
@zorkian Owner
zorkian added a note

(Style) As above.

cgi-bin/LJ/EmbedModule.pm
((42 lines not shown))
+ die "Can't create job" unless $job;
+ $sclient->insert($job) or die ("Can't queue youtube api job: $@");
+
+ } elsif ( $contents =~ /src="http:\/\/.*vimeo\.com/ ) {
+ # Vimeo's default c/p embed code contains a link to the
+ # video by title. If that's present, don't build a link.
+ warn "$contents";
+ my $host = "http://vimeo.com/";
+
+ # get the video ID
+ $contents =~ /.*src="[^"]*vimeo\.com\/video\/([^"]*)".*/;
+ my $vid_id = $1;
+
+ $url = $host . $vid_id;
+
+ # error getting video info from Vimeo
@zorkian Owner
zorkian added a note

(Trivial) This comment seems incorrect?

cgi-bin/LJ/EmbedModule.pm
((148 lines not shown))
+ . ${$obj}[0]{title}
+ . '" ('
+ . LJ::Lang::ml('embedmedia.vimeo')
+ . ")";
+ } else {
+ # error getting video info from Vimeo
+ $linktext = LJ::Lang::ml('embedmedia.vimeo');
+ }
+ } else {
+ # Not one of our known embed types
+ return 'fail';
+ }
+
+ ## embeds for journal entry pre-post preview are stored in a special table,
+ ## where new items overwrites old ones
+ my $table_name = ($preview) ? 'embedcontent_preview' : 'embedcontent';
@zorkian Owner
zorkian added a note

(Style) Don't need the parentheses around the first bit.

cgi-bin/LJ/EmbedModule.pm
((149 lines not shown))
+ . '" ('
+ . LJ::Lang::ml('embedmedia.vimeo')
+ . ")";
+ } else {
+ # error getting video info from Vimeo
+ $linktext = LJ::Lang::ml('embedmedia.vimeo');
+ }
+ } else {
+ # Not one of our known embed types
+ return 'fail';
+ }
+
+ ## embeds for journal entry pre-post preview are stored in a special table,
+ ## where new items overwrites old ones
+ my $table_name = ($preview) ? 'embedcontent_preview' : 'embedcontent';
+ $journal->do( "REPLACE INTO $table_name " .
@zorkian Owner
zorkian added a note

(Style) I find all the quotes, dots, etc to be distracting and easy to mess up. Lately I've been doing this:

$journal->do(
    qq{REPLACE INTO $table_name
           (userid, moduleid, content, linktext, url)
       VALUES (?, ?, ?, ?, ?)},
    undef, $journal->userid, $id, $cmptext, $linktext, $url
);

It's just a style thing, but it looks cleaner to me.
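
To show that the two styles produce the same statement apart from whitespace, here is a self-contained comparison. No database is involved; the table name is just a sample value.

```perl
use strict;
use warnings;

my $table_name = 'embedcontent';

# Concatenated form, as in the patch.
my $concat = "REPLACE INTO $table_name "
    . "(userid, moduleid, content, linktext, url) "
    . "VALUES (?, ?, ?, ?, ?)";

# qq{} form: interpolates like a double-quoted string and may span lines.
my $quoted = qq{REPLACE INTO $table_name
                    (userid, moduleid, content, linktext, url)
                VALUES (?, ?, ?, ?, ?)};

# Collapse runs of whitespace so the two statements can be compared.
( my $normalized = $quoted ) =~ s/\s+/ /g;

print "equivalent SQL\n" if $normalized eq $concat;
```

The extra newlines and indentation in the qq{} form are insignificant to SQL, which is why the multi-line layout is safe.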

@deborahgu
Collaborator

two reminders if this commit is ready to go: it needs an API key, and it needs workers.conf to be edited.

@zorkian
Owner

There were some new warn "FOO" lines added, so I took the liberty of cleaning it up. I pulled down the branch, flattened the commits into one, removed the warnings, trimmed trailing whitespace, and removed the changes to etc/config.pl. Functionally, I changed nothing.

7e6c7ca

I recommend you investigate settings for your text editor of choice to either highlight or auto-strip trailing whitespace. I find it really useful to be able to see it but not auto-strip it, personally, to make sure I don't commit any.

Thank you so much for the patch and all of your patience with this process! :)

@zorkian zorkian closed this
@deborahgu deborahgu deleted the branch
Commits on Jun 3, 2013
  1. @deborahgu

    Bug 3448 Make youtube video URLs visible to non-flash users. One edit…

    deborahgu authored
    …ed test, config changes to allow youtube apis.
  2. @deborahgu

    Per Fu's feedback, removing unnecessary punctuation, fixing wrong colu…

    deborahgu authored
    …mn name, pulling out warn-text, removing deprecated BML-specific i18n code
Commits on Jun 7, 2013
  1. @deborahgu
Commits on Jun 8, 2013
  1. @deborahgu

    accounting for all of the code review comments, including sanitizing…

    deborahgu authored
    … input, stylistic changes, a few other things.
5 bin/upgrading/en.dat
@@ -645,8 +645,13 @@ Enjoy!
[[siteroot]]/
.
+
emailpost.reply.address=Reply as [[user]]
+embedmedia.vimeo=Watch on Vimeo
+
+embedmedia.youtube=Watch on YouTube
+
entryform.adultcontent=Age Restriction
entryform.adultcontent.concepts=Viewer Discretion Advised
26 bin/upgrading/update-db-general.pl
@@ -2508,6 +2508,8 @@
userid INT UNSIGNED NOT NULL,
moduleid INT UNSIGNED NOT NULL,
content TEXT,
+ linktext VARCHAR(255),
+ url VARCHAR(255),
PRIMARY KEY (userid, moduleid)
)
@@ -2655,9 +2657,11 @@
## --
register_tablecreate("embedcontent_preview", <<'EOC');
CREATE TABLE embedcontent_preview (
- userid int(10) unsigned NOT NULL default '0',
- moduleid int(10) NOT NULL default '0',
- content text,
+ userid int(10) unsigned NOT NULL default '0',
+ moduleid int(10) NOT NULL default '0',
+ content text,
+ linktext VARCHAR(255),
+ url VARCHAR(255),
PRIMARY KEY (userid,moduleid)
) ENGINE=InnoDB
@@ -4146,12 +4150,28 @@
"ADD COLUMN active enum('1', '0') NOT NULL default '1'" );
}
+
if ( column_type( "spamreports", "client" ) eq '' ) {
do_alter( "spamreports",
"ALTER TABLE spamreports " .
"ADD COLUMN client VARCHAR(255), " .
"ADD INDEX (client)"
);
+
+ # Needed to cache embed linktext to minimize external API calls
+ if ( column_type( "embedcontent", "linktext" ) eq '' ) {
+ do_alter( 'embedcontent',
+ "ALTER TABLE embedcontent "
+ . "ADD COLUMN linktext VARCHAR(255), "
+ . "ADD COLUMN url VARCHAR(255);" );
+ }
+
+ if ( column_type( "embedcontent_preview", "linktext" ) eq '' ) {
+ do_alter( 'embedcontent_preview',
+ "ALTER TABLE embedcontent_preview "
+ . "ADD COLUMN linktext VARCHAR(255), "
+ . "ADD COLUMN url VARCHAR(255);" );
+
}
});
29 bin/worker/embeds
@@ -0,0 +1,29 @@
+#!/usr/bin/perl
+#
+# bin/worker/embeds
+#
+# TheSchwartz worker for fetching embedded media titles
+#
+# Authors:
+# Deborah Kaplan <deborah@dreamwidth.org>
+#
+# Copyright (c) 2013 by Dreamwidth Studios, LLC.
+#
+# This program is free software; you may redistribute it and/or modify it under
+# the same terms as Perl itself. For a copy of the license, please reference
+# 'perldoc perlartistic' or 'perldoc perlgpl'.
+
+use strict;
+use lib "$ENV{LJHOME}/extlib/lib/perl5";
+use lib "$ENV{LJHOME}/cgi-bin";
+BEGIN {
+ require 'ljlib.pl';
+}
+
+use LJ::Worker::TheSchwartz;
+use DW::Worker::EmbedWorker;
+
+schwartz_decl( $_ )
+ foreach (DW::Worker::EmbedWorker->schwartz_capabilities);
+
+schwartz_work(); # Never returns.
72 cgi-bin/DW/Worker/EmbedWorker.pm
@@ -0,0 +1,72 @@
+#!/usr/bin/perl
+#
+# DW::Worker::EmbedWorker
+#
+# TheSchwartz worker module for getting information about
+# embedded media content. Called with:
+# LJ::theschwartz()->insert('DW::Worker::EmbedWorker', {
+# ?? });
+#
+# Authors:
+# Deborah Kaplan <deborah@dreamwidth.org>
+#
+# Copyright (c) 2013 by Dreamwidth Studios, LLC.
+#
+# This program is free software; you may redistribute it and/or modify it under
+# the same terms as Perl itself. For a copy of the license, please reference
+# 'perldoc perlartistic' or 'perldoc perlgpl'.
+
+use strict;
+use warnings;
+
+package DW::Worker::EmbedWorker;
+use base 'TheSchwartz::Worker';
+
+sub schwartz_capabilities { return ('DW::Worker::EmbedWorker'); }
+
+# Retry nine times. Final back off times are lengthy (half a day,
+# a day) in case the remote site is having problems.
+sub max_retries { 9 }
+sub retry_delay {
+ my ($class, $fails) = @_;
+
+ return (10, 30, 60, 300, 600, 1200, 2400, 43200, 86400)[$fails];
+}
+sub grab_for { 600 } # Give the worker 600 seconds (10 minutes) to finish
+sub keep_exit_status_for { 86400 } # Keep the job's exit status for 24 hours
+
+# Attempts to contact the embed hosting API for more information, sets memcache and db
+# currently handles: YouTube, Vimeo
+sub work {
+ my ($class, $job) = @_;
+
+ my $arg = { %{$job->arg} };
+
+ my ($vid_id, $host, $contents, $preview, $journalid, $id, $cmptext, $linktext, $url)
+ = map { delete $arg->{$_} } qw( vid_id host contents preview journalid id cmptext linktext url );
+
+ return $job->permanent_failure("Unknown keys: " . join(", ", keys %$arg))
+ if keys %$arg;
+ return $job->permanent_failure("Missing argument")
+ unless defined $contents && defined $journalid;
+
+ my $result = LJ::EmbedModule->contact_external_sites({
+ vid_id => $vid_id,
+ host => $host,
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journalid,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ if ( $result eq 'fail' ) {
+ return $job->permanent_failure("Unknown failure");
+ } elsif ( $result eq 'warn' ) {
+ return $job->failed("Did not reach remote site, retrying.");
+ }
+ $job->completed;
+}
+
+1;
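
The argument handling in work() follows a pattern worth spelling out: copy the arguments, delete each expected key while reading it, and treat anything left over as an error. A minimal sketch with a reduced key list; `unpack_args` is a hypothetical helper that returns status strings instead of calling the job's failure methods.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the pattern in work().
sub unpack_args {
    my ($orig) = @_;
    my $arg = { %$orig };    # shallow copy so the caller's hash is untouched

    # delete returns the removed value, so this both reads and consumes keys.
    my ( $contents, $journalid, $preview )
        = map { delete $arg->{$_} } qw( contents journalid preview );

    # Anything still present was not an expected key.
    return "Unknown keys: " . join( ", ", sort keys %$arg ) if keys %$arg;
    return "Missing argument"
        unless defined $contents && defined $journalid;
    return "ok";
}

print unpack_args( { contents => '<iframe>', journalid => 1 } ), "\n";
print unpack_args( { contents => '<iframe>', journalid => 1, bogus => 1 } ), "\n";
print unpack_args( { journalid => 1 } ), "\n";
```

Failing permanently on unknown keys means a caller that passes a misspelled argument is caught immediately rather than silently ignored.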
283 cgi-bin/LJ/EmbedModule.pm
@@ -17,6 +17,8 @@ use strict;
use Carp qw (croak);
use LJ::Auth;
use HTML::TokeParser;
+use LJ::JSON;
+use TheSchwartz;
# states for a finite-state machine we use in parse()
use constant {
@@ -40,6 +42,7 @@ my %embeddable_tags = map { $_ => 1 } qw( object embed iframe );
sub save_module {
my ($class, %opts) = @_;
+
my $contents = $opts{contents} || '';
my $id = $opts{id};
my $journal = $opts{journal}
@@ -49,8 +52,8 @@ sub save_module {
my $need_new_id = !defined $id;
if (defined $id) {
- my $old_content = $class->module_content( moduleid => $id,
- journalid => LJ::want_userid($journal) ) || '';
+ my $old_content = $class->module_content( moduleid => $id,
+ journalid => LJ::want_userid($journal))->{content} || '';
my $new_content = $contents;
# old content is cleaned by module_content(); new is not
@@ -70,16 +73,30 @@ sub save_module {
my $cmptext = 'C-' . LJ::text_compress($contents);
- ## embeds for preview are stored in a special table,
+ # construct a direct link to the object if possible
+ my $src_info = $class->extract_src_info({ contents => $contents,
+ cmptext => $cmptext,
+ journal => $journal,
+ preview => $preview,
+ id => $id,
+ });
+
+ ## embeds for journal entry pre-post preview are stored in a special table,
## where new items overwrites old ones
my $table_name = ($preview) ? 'embedcontent_preview' : 'embedcontent';
- $journal->do("REPLACE INTO $table_name (userid, moduleid, content) VALUES ".
- "(?, ?, ?)", undef, $journal->userid, $id, $cmptext);
+ $journal->do( "REPLACE INTO $table_name " .
+ "(userid, moduleid, content, linktext, url) " .
+ "VALUES (?, ?, ?, ?, ?)",
+ undef, $journal->userid, $id, $cmptext, $src_info->{linktext}, $src_info->{url} );
die $journal->errstr if $journal->err;
# save in memcache
my $memkey = $class->memkey($journal->userid, $id, $preview);
- LJ::MemCache::set($memkey, $contents);
+ my $cref = { content => $cmptext,
+ linktext => $src_info->{linktext},
+ url => $src_info->{url},
+ };
+ LJ::MemCache::set($memkey, $cref);
return $id;
}
@@ -115,10 +132,10 @@ sub _expand_tag {
return '[invalid site-embed, id is missing]' unless $attrs{id};
if ($opts{expand_full}){
- return $class->module_content(moduleid => $attrs{id}, journalid => $journal->id);
+ return $class->module_content(moduleid => $attrs{id}, journalid => $journal->id)->{content};
} elsif ($edit) {
return '<site-embed ' . join(' ', map {"$_=\"$attrs{$_}\""} keys %attrs) . ">" .
- $class->module_content(moduleid => $attrs{id}, journalid => $journal->id) .
+ $class->module_content(moduleid => $attrs{id}, journalid => $journal->id)->{content} .
"<\/site-embed>";
} else {
@opts{qw /width height/} = @attrs{qw/width height/};
@@ -257,6 +274,189 @@ sub _extract_num_unit {
return ( $num, "%" );
}
+# Returns a hash of link text, url
+# Provides the fallback link text for when host API has not been contacted for title
+# Currently handles: YouTube, Vimeo
+sub extract_src_info {
+ my ($class, $args) = @_;
+ my ($site, $href);
+
+
+ my ($contents, $cmptext, $journal, $id, $preview, $vid_id, $host, $linktext, $url)
+ = map { delete $args->{$_} } qw( contents cmptext journal id preview vid_id host linktext url );
+
+ if ( $contents =~ /src="http:\/\/.*youtube\.com/ ) {
+ # YouTube
+
+ my $host = "https://www.youtube.com/";
+ my $prefix = "watch?v=";
+
+ # construct the URL and link text
+ $contents =~ /.*src="[^"]*embed\/([^"]*)".*/;
+ my $vid_id = $1;
+ $url = LJ::ehtml($host . $prefix . $vid_id);
+ $linktext = LJ::Lang::ml('embedmedia.youtube');
+
+ # Fire off the worker to get the correct title
+ my $sclient = LJ::theschwartz()
+ or croak "Can't get TheSchwartz client";
+ my $job = TheSchwartz::Job->new_from_array("DW::Worker::EmbedWorker",
+ { vid_id => $vid_id,
+ host => 'youtube',
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journal->id,
+ preview => $preview,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ die "Can't create job" unless $job;
+ $sclient->insert($job)
+ or croak "Can't queue youtube api job: $@";
+
+ } elsif ( $contents =~ /src="http:\/\/.*vimeo\.com/ ) {
+ # Vimeo's default c/p embed code contains a link to the
+ # video by title. If that's present, don't build a link.
+ my $host = "http://vimeo.com/";
+
+ # get the video ID
+ $contents =~ /.*src="[^"]*vimeo\.com\/video\/([^"]*)".*/;
+ my $vid_id = $1;
+
+ $url = LJ::ehtml($host . $vid_id);
+ $linktext = LJ::Lang::ml('embedmedia.vimeo');
+
+ # Fire off the worker to get the correct title
+ my $sclient = LJ::theschwartz()
+ or croak "Can't get TheSchwartz client";
+ my $job = TheSchwartz::Job->new_from_array("DW::Worker::EmbedWorker",
+ { vid_id => $vid_id,
+ host => 'vimeo',
+ preview => $preview,
+ contents => $contents,
+ cmptext => $cmptext,
+ journalid => $journal->id,
+ preview => $preview,
+ id => $id,
+ linktext => $linktext,
+ url => $url,
+ });
+ die "Can't create job" unless $job;
+ $sclient->insert($job)
+ or croak "Can't queue vimeo api job: $@";
+ } else {
+ # Not one of our known embed types
+ $linktext = "";
+ $url = "";
+ }
+
+ return { linktext => $linktext, url => $url };
+}
+
+# Used by TheSchwartz to contact external embed site APIs
+sub contact_external_sites {
+ my ($class, $args) = @_;
+
+ my ($vid_id, $host, $contents, $preview, $journalid, $id, $cmptext, $linktext, $url)
+ = map { delete $args->{$_} } qw( vid_id host contents preview journalid id cmptext linktext url );
+
+ my ($site, $href);
+ my $journal = LJ::want_user($journalid);
+
+ warn "FOO";
+ if ( $host eq 'youtube' ) {
+
+ # Get our YouTube API variables and set up the variables
+ # for constructing a YouTube URL. If we don't have an API
+ # key, we shouldn't be here
+ if ( $LJ::YOUTUBE_CONFIG{apikey} ) {
+ warn "FOO1";
+ my $api_url = $LJ::YOUTUBE_CONFIG{api_url};
+ my $apikey = $LJ::YOUTUBE_CONFIG{apikey};
+
+ # put together the GET request to get the video title
+ my $ua = LJ::get_useragent( role => 'youtube', timeout => 60 );
+ my $queryurl = $api_url
+ . $vid_id
+ . "&key="
+ . $apikey
+ . "&part=snippet";
+ # Pass request to the user agent and get a response back
+ warn "FOO $queryurl";
+ my $request = HTTP::Request->new(GET => $queryurl);
+ my $res = $ua->request($request);
+
+ # Check the outcome of the response
+ if ($res->is_success) {
+ warn "FOO2";
+ my $obj = from_json( $res->content );
+ $linktext = '"'
+ . LJ::ehtml(${$obj}{items}[0]{snippet}{title})
+ . '" ('
+ . LJ::Lang::ml('embedmedia.youtube')
+ . ")";
+ } else {
+ warn "FOO3";
+ # error getting video info from youtube
+ return 'warn';
+ }
+ } else {
+ # no API key; use generic text
+ return 'fail';
+ }
+ } elsif ( $host eq 'vimeo' ) {
+
+ # put together the GET request to get the video title
+ my $ua = LJ::get_useragent( role => 'vimeo', timeout => 60 );
+ my $api_url = "http://vimeo.com/api/v2/video/"
+ . $vid_id
+ . ".json";
+
+ # Pass request to the user agent and get a response back
+ my $request = HTTP::Request->new(GET => $api_url);
+ my $res = $ua->request($request);
+
+ # Check the outcome of the response
+ if ($res->is_success) {
+ my $obj = from_json( $res->content );
+ $linktext = '"'
+ . LJ::ehtml(${$obj}[0]{title})
+ . '" ('
+ . LJ::Lang::ml('embedmedia.vimeo')
+ . ")";
+ } else {
+ # error getting video info from Vimeo
+ return 'warn';
+ }
+ } else {
+ # Not one of our known embed types
+ return 'fail';
+ }
+
+ ## embeds for journal entry pre-post preview are stored in a special table,
+ ## where new items overwrites old ones
+ my $table_name = $preview ? 'embedcontent_preview' : 'embedcontent';
+ $journal->do(
+ qq{REPLACE INTO $table_name
+ (userid, moduleid, content, linktext, url)
+ VALUES (?, ?, ?, ?, ?)},
+ undef, $journal->userid, $id, $cmptext, $linktext, $url
+ );
+ die $journal->errstr if $journal->err;
+
+ # save in memcache
+ my $memkey = $class->memkey($journal->userid, $id, $preview);
+ my $cref = { content => $cmptext,
+ linktext => $linktext,
+ url => $url,
+ };
+ LJ::MemCache::set($memkey, $cref);
+
+
+}
+
sub module_iframe_tag {
my ($class, $u, $moduleid, %opts) = @_;
@@ -267,10 +467,13 @@ sub module_iframe_tag {
my $preview = defined $opts{preview} ? $opts{preview} : '';
# parse the contents of the module and try to come up with a guess at the width and height of the content
- my $content = $class->module_content( moduleid => $moduleid, journalid => $journalid, preview => $preview );
- my $width = 0;
- my $height = 0;
- my $width_unit = "";
+ my $embed_details = $class->module_content( moduleid => $moduleid, journalid => $journalid, preview => $preview );
+ my $content = $embed_details->{content};
+ my $linktext = $embed_details->{linktext};
+ my $url = $embed_details->{url};
+ my $width = 0;
+ my $height = 0;
+ my $width_unit = "";
my $height_unit = "";
my $p = HTML::TokeParser->new(\$content);
my $embedcodes;
@@ -366,12 +569,14 @@ sub module_iframe_tag {
# append a random string to the name so it can't be targetted by links
my $id = "embed_${journalid}_$moduleid";
my $name = "${id}_" . LJ::make_auth_code( 5 );
-
+ my $direct_link = defined $url
+ ? '<div><a href="' . $url . '">' . $linktext . '</a></div>' : '';
@afuna Owner
afuna added a note

Since linktext and URL come from outside sources, we should ehtml them so that they can't cause issues. This needs to be done any time we use the linktext/url on a page.

@zorkian Owner
zorkian added a note

(Important) Yes please, add the ehtml call.

@deborahgu Collaborator

yep, sanitizing, headdesk.
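
A minimal sketch of the escaping requested above. It assumes the stored values are raw (if the title was already escaped when it was saved, escaping again here would double-encode it). `escape_html` and `direct_link` are hypothetical names; `escape_html` stands in for LJ::ehtml so the example runs outside the codebase.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LJ::ehtml, so the sketch runs on its own.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    return $text;
}

# Build the direct link, escaping both externally supplied values.
sub direct_link {
    my ( $url, $linktext ) = @_;
    return '' unless defined $url && length $url;
    return '<div><a href="' . escape_html($url) . '">'
        . escape_html($linktext) . '</a></div>';
}

print direct_link( 'http://example.com/?a=1&b=2', '"<b>Title</b>" (YouTube)' ), "\n";
```

With this shape, markup in a video title is shown as text instead of being interpreted by the browser.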

my $auth_token = LJ::eurl(LJ::Auth->sessionless_auth_token('embedcontent', moduleid => $moduleid, journalid => $journalid, preview => $preview,));
my $iframe_link = qq{http://$LJ::EMBED_MODULE_DOMAIN/?journalid=$journalid&moduleid=$moduleid&preview=$preview&auth_token=$auth_token};
- my $iframe_tag = qq {<iframe src="$iframe_link" } .
- qq{width="$width$width_unit" height="$height$height_unit" allowtransparency="true" frameborder="0" class="lj_embedcontent" id="$id" name="$name"></iframe>};
-
+ my $iframe_tag = qq {<iframe src="$iframe_link" }
+ . qq{width="$width$width_unit" height="$height$height_unit" allowtransparency="true" frameborder="0" class="lj_embedcontent" id="$id" name="$name"></iframe>}
+ . qq{$direct_link};
+
my $remote = LJ::get_remote();
return $iframe_tag unless $remote;
return $iframe_tag if $opts{edit};
@@ -401,6 +606,8 @@ sub module_iframe_tag {
height => $height,
height => $height_unit,
img => "$LJ::IMGPREFIX/videoplaceholder.png",
+ url => $url,
+ linktext => $linktext,
);
}
@@ -411,20 +618,26 @@ sub module_content {
croak "No moduleid" unless defined $moduleid;
$moduleid += 0;
- my $journalid = $opts{journalid}+0 or croak "No journalid";
+ my $journalid = $opts{journalid}+0
+ or croak "No journalid";
my $journal = LJ::load_userid($journalid) or die "Invalid userid $journalid";
return '' if ($journal->is_expunged);
my $preview = $opts{preview};
# try memcache
my $memkey = $class->memkey($journalid, $moduleid, $preview);
- my $content = LJ::MemCache::get($memkey);
+ my ($content, $linktext, $url); # for direct linking
+ my $cref = LJ::MemCache::get($memkey);
+ $content = $cref->{content};
+ $linktext = $cref->{linktext};
+ $url = $cref->{url};
my ($dbload, $dbid); # module id from the database
unless (defined $content) {
my $table_name = ($preview) ? 'embedcontent_preview' : 'embedcontent';
- ($content, $dbid) = $journal->selectrow_array("SELECT content, moduleid FROM $table_name WHERE " .
- "moduleid=? AND userid=?",
- undef, $moduleid, $journalid);
+ ($content, $dbid, $linktext, $url) = $journal->selectrow_array("SELECT " .
+ "content, moduleid, linktext, url FROM $table_name " .
+ "WHERE moduleid=? AND userid=?",
+ undef, $moduleid, $journalid);
die $journal->errstr if $journal->err;
$dbload = 1;
}
@@ -436,22 +649,34 @@ sub module_content {
# clean js out of content
LJ::CleanHTML::clean_embed( \$content );
+ my $return_content;
+
# if we got stuff out of database
if ($dbload) {
- # save in memcache
- LJ::MemCache::set($memkey, $content);
-
# if we didn't get a moduleid out of the database then this entry is not valid
- return defined $dbid ? $content : "[Invalid lj-embed id $moduleid]";
+ $return_content = {
+ content => defined $dbid ? $content : "[Invalid lj-embed id $moduleid]",
+ linktext => $linktext,
+ url => $url,
+ };
+
+ # save in memcache
+ LJ::MemCache::set($memkey, $return_content);
+ } else {
+ # get rid of whitespace around the content
+ $return_content = {
+ content => LJ::trim($content) || '',
+ linktext => $linktext,
+ url => $url,
+ };
}
- # get rid of whitespace around the content
- return LJ::trim($content) || '';
+ return $return_content;
}
sub memkey {
my ($class, $journalid, $moduleid, $preview) = @_;
- my $pfx = $preview ? 'embedcontpreview' : 'embedcont';
+ my $pfx = $preview ? 'embedcontpreview2' : 'embedcont2';
return [$journalid, "$pfx:$journalid:$moduleid"];
}
@@ -472,7 +697,7 @@ sub reconstruct {
next;
}
- # FIXME: ultra ghetto.
+ # FIXME: not the right way to do this.
$attr->{$name} = LJ::no_utf8_flag($attr->{$name});
$txt .= " $name=\"" . LJ::ehtml($attr->{$name}) . "\"";
5 cgi-bin/LJ/Web.pm
@@ -3770,8 +3770,12 @@ sub placeholder_link {
my $width_unit = delete $opts{width_unit} || "px";
my $height_unit = delete $opts{height_unit} || "px";
my $link = delete $opts{link} || '';
+ my $url = delete $opts{url} || '';
+ my $linktext = delete $opts{linktext} || '';
my $img = delete $opts{img} || "$LJ::IMGPREFIX/videoplaceholder.png";
+ my $direct_link = defined $url
+ ? '<div class="embed_link"><a href="' . $url . '">' . $linktext . '</a></div>' : '';
@afuna Owner
afuna added a note

(same concern here, re: escaping)

@zorkian Owner
zorkian added a note

(Important) Yup, escaping.

@deborahgu Collaborator

I went ahead and did this at define-time, rather than at use, so it's harder to miss if a new use turns up.
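
Escaping at define-time could look roughly like the sketch below. This is an illustration, not the committed change: `placeholder_sketch` is a hypothetical, trimmed-down version of placeholder_link, and `escape_html` stands in for LJ::ehtml. It also tests the length of `$url` rather than `defined`, on the assumption that an empty link should be omitted (after `|| ''` the value is always defined).

```perl
use strict;
use warnings;

# Hypothetical stand-in for LJ::ehtml.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Values are escaped once, where they are pulled out of %opts,
# so every later use of $url and $linktext is already safe.
sub placeholder_sketch {
    my (%opts) = @_;
    my $url      = escape_html( delete $opts{url}      || '' );
    my $linktext = escape_html( delete $opts{linktext} || '' );

    # After "|| ''" the value is always defined, so check its length instead.
    my $direct_link = length $url
        ? qq{<div class="embed_link"><a href="$url">$linktext</a></div>}
        : '';
    return $direct_link;
}

print placeholder_sketch( url => 'http://vimeo.com/1?x=<y>', linktext => 'A & B' ), "\n";
```

The trade-off is that the escaped variables must never be escaped a second time further down, or the output will be double-encoded.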

return qq {
<div class="LJ_Placeholder_Container" style="width: ${width}${width_unit}; height: ${height}${height_unit};">
<div class="LJ_Placeholder_HTML" style="display: none;">$placeholder_html</div>
@@ -3779,6 +3783,7 @@ sub placeholder_link {
<a href="$link">
<img src="$img" class="LJ_Placeholder" title="Click to show embedded content" />
</a>
+ $direct_link
@afuna Owner
afuna added a note

I like this. I think I'd have forgotten to update here...

</div>
};
}
13 doc/config-local.pl.txt
@@ -86,6 +86,19 @@
# email => $DW::PRIVATE::PAYPAL{email},
# );
+
+ # YouTube configuration.
+ # To get access to YouTube APIs, you will need to create a Google API key.
+ # Uncomment this section and make sure to fill in the fields at the bottom of config-private.pl.
+ #%YOUTUBE_CONFIG = (
+ # # api URL, the token gets appended to this
+ # api_url => 'https://www.googleapis.com/youtube/v3/videos?id=',
+
+ # # credentials for the API
+ # apikey => $DW::PRIVATE::YOUTUBE{apikey},
+
+ #);
+
# if you define these, little help bubbles appear next to common
# widgets to the URL you define:
%HELPURL = (
4 doc/config-private.pl.txt
@@ -155,6 +155,10 @@
# email => ,
#);
+ #%YOUTUBE =(
@afuna Owner
afuna added a note

Hmm. What's the difference between this and the one in config-local.pl.txt? Do we need both, or just one?

@deborahgu Collaborator

The one in private has the API key; the one in local has the URL. (For the record, when this gets pushed live we will need a Google API key. We can use the one that I registered, or somebody with a site administrative address can register one more officially.)
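
A rough sketch of how the two files fit together, using the hash names from this diff. The key and video id are placeholders, and `youtube_query_url` is a hypothetical helper that repeats the concatenation order used in contact_external_sites.

```perl
use strict;
use warnings;

# config-private.pl: secrets only (placeholder value, not a real key).
%DW::PRIVATE::YOUTUBE = ( apikey => 'EXAMPLE_KEY' );

# config-local.pl: non-secret settings, pulling the key in from the private hash.
%LJ::YOUTUBE_CONFIG = (
    api_url => 'https://www.googleapis.com/youtube/v3/videos?id=',
    apikey  => $DW::PRIVATE::YOUTUBE{apikey},
);

# Hypothetical helper: same concatenation order as the worker in this diff.
sub youtube_query_url {
    my ($vid_id) = @_;
    return unless $LJ::YOUTUBE_CONFIG{apikey};
    return $LJ::YOUTUBE_CONFIG{api_url} . $vid_id
        . '&key=' . $LJ::YOUTUBE_CONFIG{apikey}
        . '&part=snippet';
}

print youtube_query_url('VIDEO_ID'), "\n";
```

Keeping the key in the private file means the local config can be shared or committed without exposing credentials.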

+ # apikey => '',
+ #);
+
#%DBINFO = (
# master => {
# pass => ,
4 etc/config.pl
@@ -71,9 +71,9 @@
# );
@afuna Owner
afuna added a note

Config snuck in.

BTW, I'd suggest putting configs into ext/local, following the instructions here:

http://wiki.dwscoalition.org/wiki/index.php/Dreamwidth_Scratch_Installation#Editing_the_config_files

so you don't have to worry about configs in the future.

@deborahgu Collaborator

Grrr. Yeah, I did that, and for some reason it stopped working until I added it in two places. I meant to debug, but just put it in the etc dir as a stopgap and forgot about it. Sigh.

@afuna Owner
afuna added a note

Hmm. Added to which two places? Anyway, I'd be happy to help you poke at it if you want, or just cheer you on otherwise if you're poking at it on your own!

# require new free acounts to be referred by an existing user?
- $USE_ACCT_CODES = 1;
+ $USE_ACCT_CODES = 0;
- #$EVERYONE_VALID = 1; # are all users validated by default?
+ $EVERYONE_VALID = 1; # are all users validated by default?
###
### System Information
1  etc/workers.conf
@@ -108,6 +108,7 @@ reference-list:
birthday-notify: 1 # queue up birthday notifications
change-poster-id: 1 # remaps comments from one user to another (remapping imported)
distribute-invites: 1 # distributes invites that an admin has set up
+ embeds: 1 # grab video titles
expunge-users: 1 # expunges users
latest-feed: 1 # puts posted entries into /latest
lazy-cleanup: 1 # cleans up entry deletion
2  htdocs/tools/embedcontent.bml
@@ -39,7 +39,7 @@ _c?>
journalid => $journalid,
moduleid => $moduleid,
preview => $preview,
- );
+ )->{content};
return qq {
<html><head><style type="text/css">html, body { background-color:transparent; padding:0; margin:0; border:0; overflow:hidden; }</style></head><body>$content</body></html>
4 t/clean-embed.t
@@ -191,7 +191,7 @@ note( "Testing parse_embed (We parse the embed contents first from a post)" );
# because we have additional checks in the callers
my $invalid_embed = qq{[Invalid lj-embed id 1]};
- my $iframe = qq{<iframe ([^>]+)></iframe>};
+ my $iframe = qq{<iframe ([^>]+)></iframe>(<div><a href=""></a></div>)};
foreach ( (
# [ "title"
@@ -474,7 +474,7 @@ note( "Testing parse_embed (We parse the embed contents first from a post)" );
# check the iframe contents
# LJ::EmbedModule takes the content and cleans it
- my $got_embed = LJ::EmbedModule->module_content( journalid => $u->userid, moduleid => 1 );
+ my $got_embed = LJ::EmbedModule->module_content( journalid => $u->userid, moduleid => 1 )->{content};
if( ref $expected_iframe && ref $expected_iframe eq "Regexp" ) {
like( $got_embed, $expected_iframe, "clean_embed: $title" );
} else {