GitHub - JaminB/Quick-Torrent-API: An API with methods that glean data from popular tracker sites and have the ability to find the optimal download link.


Quick-Torrent API (Pre-Alpha)

Overview: 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An extensive tracker-site data-mining API, containing methods that allow anything from simple information gathering, to "smart torrent searches."
Provides methods for converting magnet links to torrent files for download.
Provides methods for ranking and comparing torrents.
Currently supports thepiratebay.sx and kickass.to.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------------------------------------------------------------------------------
                              The sites package
-------------------------------------------------------------------------------
The sites package contains all the classes associated with a specific site. 
These classes contain methods for searching and ranking torrents on tracker sites.
	
Class hierarchy within the sites package (top to bottom):
	SimpleSearch extends Rating;
	Rating extends BuildCache;
	KATBuildCache extends KATGrep;
		 
----------------------
The Traversal Routine:
----------------------
The sites package holds all the methods necessary for traversing sites and running queries. The one exception is downloading the HTML itself, which is handled by the connect package.
	
This API traverses the target site very similarly to the way a user would. The basic pattern is as follows: 
	
search term --> search URI --> search URI HTML --> all detail pages URIs --> detail page HTML --> find size, seeds, leeches, other torrent info
	
String searchTerm: The music/movie title the user is searching for
e.g. "Some song"

String mediaType: The type of media you are searching for (right now only songs and movies)
e.g. "movie"

String searchURI: The URI generated by the sites.Grep.createParsedURI method
e.g. searchURI = createParsedURI("mySearch", "movie")

String searchPage: The plain HTML of a searchURI, returned when the searchURI is passed to the connect.GetHTTP.getWebPageHTTP or connect.GetGzippedHTTP.getWebPageHTTP method
e.g. searchPage = getWebPageHTTP(searchURI) or 
searchPage = getWebPageHTTP(createParsedURI("mySearch", "movie"))

String[] detailsURI: The URIs generated by the sites.Grep.grepDetailsURI method. Note that grepDetailsURI returns an array of n links.
e.g. detailsURI = grepDetailsPage(searchPage) or 
detailsURI = grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie")))

String detailsPage: The plain HTML of one detailsURI, returned when that detailsURI is passed to the connect.GetHTTP.getWebPageHTTP or connect.GetGzippedHTTP.getWebPageHTTP method
e.g. detailsPage = getWebPageHTTP(detailsURI[0]) or 
detailsPage = getWebPageHTTP(grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie")))[0])
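
The chain above can be sketched end to end without touching the network. This is only an illustration, not the API's own code: createSearchUri, firstDetailsPage, the URL layout, and the tracker.invalid host are all made up for the example, and the HTML download and link extraction are passed in as plain functions so that a canned "site" can stand in for the connect package.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

final class TraversalSketch {

    // Hypothetical stand-in for createParsedURI: the path layout below is
    // invented for illustration and does not match any real site.
    static String createSearchUri(String baseUrl, String searchTerm, String mediaType) {
        String encodedTerm = URLEncoder.encode(searchTerm, StandardCharsets.UTF_8);
        return baseUrl + "/search/" + mediaType + "/" + encodedTerm;
    }

    // search URI -> search page HTML -> details URIs -> first details page HTML.
    // Returns null when the search page lists no detail links.
    static String firstDetailsPage(String searchUri,
                                   Function<String, String> fetchHtml,
                                   Function<String, String[]> grepDetailsUris) {
        String searchPage = fetchHtml.apply(searchUri);
        String[] detailsUris = grepDetailsUris.apply(searchPage);
        return detailsUris.length == 0 ? null : fetchHtml.apply(detailsUris[0]);
    }

    public static void main(String[] args) {
        String searchUri = createSearchUri("http://tracker.invalid", "Some song", "movie");
        // A canned "site" so the sketch runs without network access.
        Map<String, String> site = Map.of(
                searchUri, "<a href='/d/1'>result</a>",
                "http://tracker.invalid/d/1", "<html>details</html>");
        String detailsPage = firstDetailsPage(searchUri, site::get,
                html -> new String[] {"http://tracker.invalid/d/1"});
        System.out.println(detailsPage);
    }
}
```

Passing the fetcher in as a function mirrors the split between the sites and connect packages: the traversal logic never needs to know how the HTML was obtained.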
			
---------
PageGrep:
---------
The traversal routine has only one purpose: to return a detailsPage. The detailsPage variable contains the size, number of seeds, number of leeches, and magnetLinks fields.
This information is critical if a person would like to find the best torrentLink later on.
To see how many seeds exist on a given detailsPage, simply pass the entire detailsPage String to the grepSeeds method.
	
e.g. grepSeeds(detailsPage) or
grepSeeds(getWebPageHTTP(grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie")))[0]))

To find the number of leeches:
grepLeeches(detailsPage) or 
grepLeeches(getWebPageHTTP(grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie")))[0]))

To find the size of the torrent:
grepSize(detailsPage) or 
grepSize(getWebPageHTTP(grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie")))[0]))

And to find the associated MagnetLink:
grepMagnetLink(detailsPage) or 
grepMagnetLink(getWebPageHTTP(grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie")))[0]))
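
As a rough picture of what a "grep" over a detailsPage might look like, here is a regex-based sketch. The markup it expects (dt/dd pairs and a quoted magnet href) is an assumption made for the example, and grepField and grepMagnet are hypothetical names; the real pages and the real PageGrep methods may look quite different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageGrepSketch {

    // Assumed markup: <dt>Label:</dt><dd>value</dd>. Real pages differ per site.
    static String grepField(String detailsPage, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + ":</dt>\\s*<dd>([^<]+)</dd>");
        Matcher m = p.matcher(detailsPage);
        return m.find() ? m.group(1).trim() : null;
    }

    // Returns the first magnet link found in a double-quoted href, or null.
    static String grepMagnet(String detailsPage) {
        Matcher m = Pattern.compile("href=\"(magnet:\\?[^\"]+)\"").matcher(detailsPage);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String detailsPage = "<dt>Seeders:</dt><dd>42</dd>"
                + "<dt>Leechers:</dt><dd>7</dd>"
                + "<dt>Size:</dt><dd>700 MB</dd>"
                + "<a href=\"magnet:?xt=urn:btih:abc\">magnet</a>";
        System.out.println(grepField(detailsPage, "Seeders"));
        System.out.println(grepField(detailsPage, "Size"));
        System.out.println(grepMagnet(detailsPage));
    }
}
```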
	
---------------
BuildDataCache:
---------------
PageGrep holds many primitive methods that can perform very simple operations like returning the number of seeds associated with a torrent.
The BuildDataCache is designed to simplify this process. Given an array of detailsURIs (generated by the grepDetailsURI method), BuildDataCache will store every seed, leech, and size value and associate it with a detailsURI.
In this way a person can search the results of a single query.

e.g. buildDataCache(detailsURIs) where detailsURIs is an array of detailsURIs or
buildDataCache(grepDetailsPage(getWebPageHTTP(createParsedURI("mySearch", "movie"))))
	
Like the PageGrep class, BuildDataCache also has methods for accessing its cached content.
The dataCacheToArray method will separate out seeds, leeches, sizes, and detailLinks from the cache.
	e.g. dataCacheToArray(getDataCache(), "seed") or
		dataCacheToArray(getDataCache(), "leech") or
		dataCacheToArray(getDataCache(), "size") or
		dataCacheToArray(getDataCache(), "link")
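
To make the idea of pulling one field out of the cache concrete, here is a small sketch. The cache layout is an assumption: each entry is taken to be a single String of the form seed|leech|size|link, which may not be how BuildDataCache actually stores its data. Only the method name and the four field names come from the description above.

```java
import java.util.List;

final class DataCacheSketch {

    // Assumed column order of one cache entry: seed|leech|size|link.
    private static final List<String> FIELDS = List.of("seed", "leech", "size", "link");

    // Returns the requested column for every cache entry, in cache order.
    static String[] dataCacheToArray(List<String> cache, String field) {
        int column = FIELDS.indexOf(field);
        if (column < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        String[] values = new String[cache.size()];
        for (int i = 0; i < cache.size(); i++) {
            // Limit the split to 4 parts so a '|' inside the link is preserved.
            values[i] = cache.get(i).split("\\|", 4)[column];
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> cache = List.of(
                "42|7|700 MB|http://tracker.invalid/d/1",
                "13|2|1.4 GB|http://tracker.invalid/d/2");
        System.out.println(String.join(", ", dataCacheToArray(cache, "seed")));
        System.out.println(String.join(", ", dataCacheToArray(cache, "link")));
    }
}
```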

The qualityFilter method is a simple rating/comment sniffer that removes untrusted results from your cache. To use it, simply pass your existing cache through the qualityFilter method.
The filter will clone your existing cache, connect back to the site, and remove any untrusted links.
	e.g. qualityFilter(getDataCache()) will return a list of good links.

The leechToInt, seedToInt, and sizeToFloat methods act the same way but perform two operations instead of one: they split the cache and convert the parts to their respective data types.
		e.g. leechToInt(getDataCache()) returns the integer values of all the leeches stored in the cache
		e.g. seedToInt(getDataCache()) returns the integer values of all the seeds stored in the cache
		e.g. sizeToFloat(getDataCache()) returns the float values of all the various sizes stored in the cache
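
The conversion half of these methods might look roughly like the sketch below. It works on String arrays that have already been split out of the cache. The size format ("700 MB", "2 GB") and the choice to normalise everything to megabytes are assumptions for the example, and toInts and sizeToMegabytes are hypothetical helper names rather than methods of the API.

```java
import java.util.Locale;

final class CacheConvertSketch {

    // Converts seed or leech counts such as "42" into ints.
    static int[] toInts(String[] values) {
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Integer.parseInt(values[i].trim());
        }
        return result;
    }

    // Assumed size format: "<number> <unit>" with unit KB, MB, or GB.
    // A missing unit is treated as megabytes.
    static float sizeToMegabytes(String size) {
        String[] parts = size.trim().split("\\s+");
        float value = Float.parseFloat(parts[0]);
        String unit = parts.length > 1 ? parts[1].toUpperCase(Locale.ROOT) : "MB";
        switch (unit) {
            case "KB": return value / 1024f;
            case "MB": return value;
            case "GB": return value * 1024f;
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        int[] seeds = toInts(new String[] {"42", " 13 "});
        System.out.println(seeds[0] + seeds[1]);
        System.out.println(sizeToMegabytes("2 GB"));
    }
}
```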
	
*So how do you use this information?*
	Once you plug your cache into any of these methods it becomes extremely easy to see the associations between values.
	For example, the seed value stored at dataCacheToArray(getDataCache(), "seed")[0] belongs to the same torrent as dataCacheToArray(getDataCache(), "link")[0] and dataCacheToArray(getDataCache(), "leech")[0], because every array shares the same index order.
	
-------
Rating:
-------
Once you have aggregated all of your data into a cache, you are ready to rate. In the future I will be using a heuristic to determine the best results; right now these are simply sorted with basic boolean logic.
	
Most of these methods are self-explanatory and easily understood. You can set up your variables in the constructor or pass them directly to the methods.
	
This class is extremely customizable but it is important to remember that it is built on top of the BuildCache class and thus will only work if you first generate a cache.
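
For a feel of what rating with basic boolean logic over the cache arrays could look like, here is a sketch. The rule it applies (skip anything over a size limit, prefer more seeds, break ties with fewer leeches) is an assumption chosen for illustration and is not necessarily the rule the Rating class uses. bestIndex is a hypothetical name.

```java
final class RatingSketch {

    // Returns the index of the preferred torrent, or -1 if none fits maxSize.
    // The three arrays are index-aligned, as produced from a single cache.
    static int bestIndex(int[] seeds, int[] leeches, float[] sizes, float maxSize) {
        int best = -1;
        for (int i = 0; i < seeds.length; i++) {
            if (sizes[i] > maxSize) {
                continue; // too large, ignore this entry
            }
            boolean better = best < 0
                    || seeds[i] > seeds[best]
                    || (seeds[i] == seeds[best] && leeches[i] < leeches[best]);
            if (better) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] seeds = {10, 50, 50};
        int[] leeches = {1, 9, 3};
        float[] sizes = {700f, 800f, 5000f};
        System.out.println(bestIndex(seeds, leeches, sizes, 1000f));
    }
}
```

The returned index can then be used against the link array from the same cache to get the matching detailsURI.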
	
-------------
SimpleSearch:
-------------
SimpleSearch is designed to be the easiest part of this API. It has only one purpose: to find the best download link for a given searchTerm and mediaType. The example below is all you need to start searching.
e.g. SimpleSearch myNewSearch = new SimpleSearch("Some Song", "music"); myNewSearch.findBestDownload();
	
-------------
Test Classes:
-------------
Any class whose name ends in test should run out of the box. Use these to better understand how the API is put together.
	
-------------------------------------------------------------------------------
                              The Connect Package
-------------------------------------------------------------------------------
The connect package has one job: to retrieve information from web sites. More specifically, it pulls down HTML so that it can be parsed by the methods above.
---------------------------
GetHTTP and GetGzippedHTTP:
---------------------------
Given a URI, getWebPageHTTP will grab the HTML from that page over HTTP.
	e.g. getWebPageHTTP("http://www.google.com") will return the HTML from that page
	GetGzippedHTTP works exactly the same way but can decode gzipped content (e.g. kickass.to pages)
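
The difference between the two classes comes down to whether the response body is wrapped in a gzip decoder before it is read. Below is a minimal sketch using only the JDK (HttpURLConnection and GZIPInputStream). It is not the package's own implementation: the class and method names are hypothetical, and the body is assumed to be UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipHttpSketch {

    // Reads a whole body as UTF-8 text, decompressing first if it is gzipped.
    static String readBody(InputStream raw, boolean gzipped) throws IOException {
        try (InputStream in = gzipped ? new GZIPInputStream(raw) : raw) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Fetches a page and decodes it according to the Content-Encoding header.
    static String fetch(String uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(uri).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try {
            boolean gzipped = "gzip".equalsIgnoreCase(conn.getContentEncoding());
            return readBody(conn.getInputStream(), gzipped);
        } finally {
            conn.disconnect();
        }
    }

    // Helper for the demo below: gzip a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Round trip in memory, so no network access is needed.
        byte[] compressed = gzip("<html>hello</html>");
        System.out.println(readBody(new ByteArrayInputStream(compressed), true));
    }
}
```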

-------------------------------------------------------------------------------
                             The Converters Package
-------------------------------------------------------------------------------
Currently this package has only one purpose: to convert magnetLinks to torrentFiles (URIs).
Given a magnetLink, it will search the isohunt or torcache database and return the torrentLink URI if it finds one.
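
A lookup like this generally starts from the info hash carried in the magnet link's xt=urn:btih: parameter. The sketch below extracts that hash and fills it into a URL template. The template and the cache.invalid host are placeholders invented for the example, not the real isohunt or torcache URL formats, and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MagnetSketch {

    // Matches a 40-character hex or 32-character base32 BitTorrent info hash.
    private static final Pattern BTIH =
            Pattern.compile("xt=urn:btih:([A-Fa-f0-9]{40}|[A-Za-z2-7]{32})(?:&|$)");

    // Returns the upper-cased info hash, or empty if the link has none.
    static Optional<String> infoHash(String magnetLink) {
        Matcher m = BTIH.matcher(magnetLink);
        return m.find()
                ? Optional.of(m.group(1).toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    // Fills the hash into a caller-supplied template containing one %s.
    static Optional<String> torrentUri(String magnetLink, String template) {
        return infoHash(magnetLink).map(hash -> String.format(template, hash));
    }

    public static void main(String[] args) {
        String magnet = "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=Some+song";
        // Placeholder template; substitute the lookup service's real format.
        System.out.println(torrentUri(magnet, "http://cache.invalid/torrent/%s.torrent").orElse("not found"));
    }
}
```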

-------------------------------------------------------------------------------
                              The FileIO package
-------------------------------------------------------------------------------
This package's main purpose is to download .torrent files to a destination of choice.
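
Saving a downloaded .torrent to a chosen destination can be done with java.nio.file.Files.copy. The sketch below is an illustration under that assumption rather than the package's own code, and the class and method names are hypothetical. It copies from any InputStream, so the demo can use in-memory bytes instead of a real download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TorrentSaveSketch {

    // Writes the stream to destination, creating parent folders as needed
    // and overwriting any existing file. Returns the number of bytes written.
    static long save(InputStream source, Path destination) throws IOException {
        Path parent = destination.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (InputStream in = source) {
            return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("quicktorrent-demo");
        Path target = dir.resolve("downloads").resolve("example.torrent");
        long written = save(new ByteArrayInputStream(new byte[] {1, 2, 3}), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```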

-------------------------------------------------------------------------------
                              The Globals Package
-------------------------------------------------------------------------------
----------
constants:
----------
Contains static global constants accessible by all classes

----------
variables:
----------
Contains static variables accessible by all classes