Subtitles

SharkHunter edited this page Jan 7, 2012 · 9 revisions

For those of us who do not have English as our native language, it can be nice to have subtitles when watching movies and TV shows. Various sites offer subtitle files for download. From version 0.70, Channels can download a subtitle when you start watching a stream and integrate it automatically. This wiki page describes how you can integrate your own subtitle sites into Channels. Currently Channels includes support for s4u.se, a Swedish site with a nice API, which serves as the example on this page.


Concept

The basic design idea behind Channels is to use the web to find links (normally called scraping). The .ch files describe how, and in what order, Channels finds those links. This idea carries over into the subtitle configuration. Subtitles live on a subtitle site that contains zillions of subtitles. Since Channels only wants the subtitle for the movie or TV show that you are about to watch, it needs to search the site for just that subtitle file. The final goal of the subtitle configuration is a url to the actual subtitle file that Channels can download.

Basic Configuration

As with all configuration in Channels, subtitles are added in .ch files. Like scripts, subtitle definitions are global and can thus be placed in a .ch file of their own. Each subtitle definition must (just like a script) be given a name so that it can be used on a particular channel. Subtitle sites are defined with the subdef primitive.


subdef s4u {
	# http://s4u.se/?film=Airplane!
	#Http://api.s4u.se/ Version / ApiKey / xml | json | serialize / movie | serie | all / imdb | tmdb | tvdb | title | rls | fname / SearchString / 
	url=http://api.s4u.se/Beta/DemoKey/xml/all/title/
	# <div class="DL_Box"> <a href="dl.php?cat=film&amp;dl=9062">
	matcher=<download_file>([^<]+)</download_file>
	best_match=1
	lang=swe
	name_script=s4uName
}

As always, a line starting with a '#' is a comment and is ignored by Channels. The subdef looks much like a channel definition. Straight after the keyword subdef comes the name of the subtitle site (in this case s4u). As with channels, there is a url statement that tells Channels where the search is to be done. The name of the movie/show is appended to this url (see below). The matcher statement is applied to the fetched page and should produce a link to the actual subtitle file. This is the file that Channels then fetches and uses as subtitles. The other lines are described in detail below.

Activating subtitles on a channel

To use subtitles on a particular channel, add subtitle=<subname> to your media or NaviX folder. The subtitle primitive can only be used on media objects or NaviX folders; anywhere else it is ignored. From 1.07 it is possible to give a list of subtitle sites, i.e. subtitle=<subA>,<subB>. Channels then searches the sites in order until the subtitle is found.

Finding the name

The subtitle site is assumed to have a search system, so all Channels has to do is give it the name, read the response, and there is the subtitle file, right? Normally yes, but the name that Channels has when it begins to stream might not be the name of the movie/show. It might be the name of the site or something silly like "Part1". To solve this, a new property is added to the media object. The property is called name_index and tells Channels where to find the name. It is assumed that the name can be found somewhere in the folder structure displayed so far. name_index=<x> specifies how many levels back Channels should go to find the name that it uses to search the subtitle site (x is the number of levels to go back; 0 is the last folder).

Example.

The following is what has been browsed to find a playable MegaVideo link (via IceFilms): IceFilms->Movies->Popular->Inception (2010)->PART 1->PART 1 MegaVideo->PLAY (the -> just marks where a new folder was shown; the rest are the names that were displayed). PLAY is added by Channels as a result of the [save configuration](https://github.com/SharkHunter/Channel/wiki/Save-(or-record)), and this folder is not counted since it might be gone if the save configuration is changed. So the last folder where a name can be searched for is "PART 1 MegaVideo", which thus has index 0. This is not something that subtitle sites will find any good matches for, so we need to go further back. Next up is "PART 1", which is also not a good name, so index 1 is skipped as well. "Inception (2010)", however, looks more like a name that can be fed to subtitle sites, so we set name_index=2.

The name_index can be a combination of path elements. This is needed for TV shows, where the show name (The Big Bang Theory) comes before all episodes and the search site needs both the series name and some episode info. Path elements are combined with a "+".

Example

IceFilms->Tv Shows->Popular->The Big Bang Theory (2007)->1x01 Pilot->1x01 Pilot->1x01 Pilot MegaVideo->PLAY The double 1x01 Pilot appears because there are multiple sources for the episode; we just pick one here. Again, start at zero at "1x01 Pilot MegaVideo" (PLAY is automatically disregarded by Channels). We need the series name, which is found at index 3. We then need to combine this with the episode info, which can be found at either index 1 or index 2. We choose 2, since for other shows the name at index 1 might be something like "Source3#". Thus we set name_index=3+2, which gives this show the name "The Big Bang Theory (2007) 1x01 Pilot".
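The index counting in the two examples above can be sketched in a few lines of Python. This is only an illustration of the rule, not the code Channels runs; the function name resolve_name and the list representation of the browsed path are assumptions made for the sketch.

```python
def resolve_name(path, name_index):
    """Build the search name from a browsed folder path.

    path       -- folder names in the order they were browsed, without
                  the trailing PLAY folder that Channels adds itself
    name_index -- e.g. "2" or "3+2"; index 0 is the last folder in path
    """
    parts = []
    for token in name_index.split("+"):
        steps_back = int(token)
        # Index 0 is the last element, 1 the one before it, and so on.
        parts.append(path[-1 - steps_back])
    # Assumption: combined elements are joined with a single space,
    # which matches the result shown in the TV show example.
    return " ".join(parts)


movie_path = ["IceFilms", "Movies", "Popular", "Inception (2010)",
              "PART 1", "PART 1 MegaVideo"]
show_path = ["IceFilms", "Tv Shows", "Popular", "The Big Bang Theory (2007)",
             "1x01 Pilot", "1x01 Pilot", "1x01 Pilot MegaVideo"]

print(resolve_name(movie_path, "2"))    # Inception (2010)
print(resolve_name(show_path, "3+2"))   # The Big Bang Theory (2007) 1x01 Pilot
```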

Name found all done?

Yes, it could be, but then again it isn't. That "(2010)" at the end might confuse the subtitle site, since there is no movie called "Inception (2010)". Some sites might be smart enough to understand that this is a year, but others might not, so this must be fixed. For this purpose the name_script option exists. This is a NaviX script that is supposed to mangle the name (that is, "Inception (2010)") and return, as "url", the name that can be fed to the search engine. This feature is more important for TV shows than for movies, since a TV show episode is normally called something like s01e01 - Pilot, a name that subtitle sites do not handle well. To solve the name mangling and have a uniform interface between the channel and the sub site, each channel should define a sub_script. Define the script like any NIPL script and then add sub_script=<script name> to the channel definition at top level. As its result, this script should set the "url" variable to the mangled name. Optionally (but strongly encouraged), it should also set the variables "year", "season" and "episode" to values that it can retrieve from the backtracked name.
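To make the mangling step concrete, here is a rough Python sketch of what such a script has to achieve. A real sub_script is written in NIPL, not Python; the function name mangle_name, the regular expressions and the choice to strip leading zeros from season/episode are all assumptions for illustration only.

```python
import re


def mangle_name(raw):
    """Split a backtracked name into a clean title plus optional metadata."""
    info = {"url": raw, "year": None, "season": None, "episode": None}

    # A "(2010)"-style year anywhere in the name.
    year = re.search(r"\((\d{4})\)", raw)
    if year:
        info["year"] = year.group(1)
        raw = raw[:year.start()] + raw[year.end():]

    # Episode markers such as "1x01" or "s01e01".
    ep = re.search(r"\b(?:s(\d{1,2})e(\d{1,2})|(\d{1,2})x(\d{1,2}))\b",
                   raw, re.IGNORECASE)
    if ep:
        season = ep.group(1) or ep.group(3)
        episode = ep.group(2) or ep.group(4)
        info["season"] = str(int(season))
        info["episode"] = str(int(episode))
        # Drop the marker and the episode title that follows it.
        raw = raw[:ep.start()]

    # Collapse leftover whitespace; this is the name handed on as "url".
    info["url"] = " ".join(raw.split())
    return info


print(mangle_name("Inception (2010)"))
print(mangle_name("The Big Bang Theory (2007) 1x01 Pilot"))
```

The point of returning "year", "season" and "episode" separately is that the sub site's own script can then build whatever query format that site expects.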

Mangle the name again

Once the channel has mangled the name and set the "real" name, and possibly "year", "season" and "episode", it is time for the sub site script to do its mangling. The subdef defines a name_script (a NIPL script) that is called with the "year", "season" and "episode" variables already set (if set by the channel's script) and a variable called "lang" set to the preferred language. By default this is a three-letter language code, but if the property iso2 is set, Channels converts it to the equivalent two-letter code. The name_script should return the url part that can be concatenated with the url to be sent to the sub site.
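The three-letter to two-letter conversion can be pictured as a simple table lookup. The Python sketch below only illustrates the idea; the table is a small hand-picked subset of ISO 639 codes, and the names ISO3_TO_ISO2 and lang_for_site, as well as the fallback of passing unknown codes through unchanged, are assumptions rather than Channels behaviour.

```python
# Small illustrative subset of ISO 639-2 (three-letter) to
# ISO 639-1 (two-letter) language codes.
ISO3_TO_ISO2 = {
    "swe": "sv",
    "eng": "en",
    "dan": "da",
    "nor": "no",
    "fin": "fi",
}


def lang_for_site(preferred, iso2=False):
    """Return the value a name_script would see in its "lang" variable."""
    if not iso2:
        return preferred
    # Assumption: a code missing from the table is passed through as-is.
    return ISO3_TO_ISO2.get(preferred, preferred)


print(lang_for_site("swe"))             # swe
print(lang_for_site("swe", iso2=True))  # sv
```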

Supported languages

Each subtitle site configuration must have a lang field set. This field should be set to a comma-separated list of supported languages. If the site doesn't support any language that the user prefers, the site will not be used. This is the reason for configuring multiple subtitle sites.
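The rule for whether a site is used at all comes down to checking two comma-separated lists for a common entry. A minimal Python sketch, assuming both lists use the same three-letter codes (site_supports is a hypothetical name, not part of Channels):

```python
def site_supports(site_lang, user_priority):
    """True if the site's lang list shares at least one code with the user's list."""
    site = {code.strip() for code in site_lang.split(",") if code.strip()}
    wanted = [code.strip() for code in user_priority.split(",") if code.strip()]
    return any(code in site for code in wanted)


print(site_supports("swe", "eng,swe"))  # True
print(site_supports("swe", "eng,dan"))  # False
```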

Fetch the subtitles

Once the name has been mangled by the script (or left unchanged if no script was defined), the name is concatenated with the "url" to produce the complete url that is then sent to the site.

In the above example, http://api.s4u.se/Beta/DemoKey/xml/all/title/Inception would be the complete url. If the reply is too complex for a single matcher expression, a new "script" section can be used.
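Put together, the search boils down to a string concatenation followed by a regular expression match. The Python sketch below uses the url and matcher from the s4u subdef; the reply text is made up for the illustration (it is not real s4u.se output), and the function names are hypothetical.

```python
import re

BASE_URL = "http://api.s4u.se/Beta/DemoKey/xml/all/title/"
MATCHER = re.compile(r"<download_file>([^<]+)</download_file>")


def search_url(base_url, mangled_name):
    # The mangled name is simply appended to the configured url.
    return base_url + mangled_name


def first_subtitle_link(reply):
    """Apply the matcher to the fetched page; group 1 is the subtitle link."""
    match = MATCHER.search(reply)
    return match.group(1) if match else None


# Made-up reply, shaped only so that the matcher has something to find.
sample_reply = (
    "<xml><movie><sub>"
    "<download_file>http://example.invalid/subs/1234</download_file>"
    "</sub></movie></xml>"
)

print(search_url(BASE_URL, "Inception"))
print(first_subtitle_link(sample_reply))
```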

Zipped subtitles

Sometimes the subtitles are spread out over more than one .srt file and bundled into a zip file (it is of course also possible that a single .srt file is zipped to save space, etc.). From 1.07 Channels supports zipped subtitles. If the subtitle file is a zip file, Channels extracts all .srt files from it (only the .srt files are extracted). If the property zip_concat is set, Channels concatenates all these .srt files into one large file. Once all files have been extracted, the zip file is deleted unless the zip_keep property is set, in which case it is kept. Channels considers a subtitle to be zipped if the word zip appears anywhere in the url. If the url does not contain zip but you know that the file is a zip file, set the property zip_force to tell Channels to treat the file as a zip file anyway. NOTE!! It is recommended to ALWAYS use the zip_force option if the subtitle site uses zip files, to make sure that they are handled properly.
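The zip handling described above can be sketched with Python's standard zipfile module. This is an illustration of the described behaviour, not the Channels implementation: the function names, the output file name combined.srt, the alphabetical ordering of parts and the case-insensitive "zip" check are all assumptions.

```python
import os
import zipfile


def looks_zipped(url, zip_force=False):
    # Assumption: the check for the word "zip" ignores case.
    return zip_force or "zip" in url.lower()


def unpack_subtitles(zip_path, out_dir, zip_concat=False, zip_keep=False):
    """Extract only the .srt members, optionally join them, then clean up."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in sorted(archive.namelist()):
            if not member.lower().endswith(".srt"):
                continue  # everything that is not an .srt file is skipped
            target = os.path.join(out_dir, os.path.basename(member))
            with open(target, "wb") as out:
                out.write(archive.read(member))
            extracted.append(target)

    if zip_concat and extracted:
        # Naive byte-level concatenation; cue numbers are not renumbered.
        combined = os.path.join(out_dir, "combined.srt")
        with open(combined, "wb") as out:
            for path in extracted:
                with open(path, "rb") as part:
                    out.write(part.read())
        extracted = [combined]

    if not zip_keep:
        os.remove(zip_path)
    return extracted
```

A url such as a download link with only a numeric id would not be detected by looks_zipped on its own, which is why the zip_force recommendation matters.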


Selecting your language

The user can set the language priority via the PMS GUI. This is done under the "Transcode" tab, under "Mencoder options", where you'll find a setting called "Subtitles language priority". This is a comma-separated list of language codes that can be used for subtitles. Note that it is a strict priority list: if a subtitle for the first language in the list is found, it is used and the other languages are not considered.
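What "strict priority" means can be shown with a short Python sketch. The function name pick_subtitle and the dictionary of found subtitles are hypothetical; the sketch only illustrates that the first language in the list with a hit wins, regardless of what else was found.

```python
def pick_subtitle(priority, found):
    """priority: comma-separated language codes, most preferred first.
    found:    dict mapping language code -> subtitle file that was found.
    """
    for code in (c.strip() for c in priority.split(",")):
        if code in found:
            # First hit wins; later languages are never considered.
            return code, found[code]
    return None


print(pick_subtitle("swe,eng", {"eng": "eng.srt", "swe": "swe.srt"}))
print(pick_subtitle("swe,eng", {"eng": "eng.srt"}))
```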