Estimating the Number of Movie Torrents

Project Description

A multivariate regression model using pre-release film data to estimate the number of torrent copies that might become available online.

Motivation

As more media content (films, TV shows, music, etc.) moves online to streaming providers and other digital sources, pirating that content through torrent sites has become easier. This investigation looks at which factors best predict the number of torrent copies that become readily available online for a given movie.

Data Sources

Film Data
- The Numbers - The Numbers provides detailed movie financial analysis, including box office, DVD and Blu-ray sales reports, and release schedules
- OMDb API - The OMDb API is a free web service for obtaining movie information
Torrent Data
- Kickass Torrents - KAT provided a directory for torrent files and magnet links to facilitate peer-to-peer file sharing using the BitTorrent protocol.
- Torrentz - Torrentz is a meta-search engine for BitTorrent that indexed torrents from various major torrent websites, and offered compilations of various trackers.
- Note - The torrent data sources for this project have since been either seized or voluntarily shut down.
  - Kickass Torrents was seized by the US Justice Department on July 20th, 2016.
  - Torrentz was voluntarily shut down by its owners on August 5th, 2016.
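
For the film data, a lookup against the OMDb API might look like the sketch below. It is not the project's actual retrieval code: the helper names `build_omdb_params` and `fetch_film` are hypothetical, and it assumes you have your own OMDb API key (the service currently requires one). The `t`, `y`, `type`, and `apikey` query parameters and the `Response`/`Error` fields are part of the public OMDb interface.

```python
import requests

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_params(title, api_key, year=None):
    """Build the query parameters for a title lookup (hypothetical helper)."""
    params = {"t": title, "apikey": api_key, "type": "movie"}
    if year is not None:
        params["y"] = str(year)
    return params


def fetch_film(title, api_key, year=None, timeout=10):
    """Return the OMDb JSON record for one film, or raise if it is not found."""
    response = requests.get(
        OMDB_URL,
        params=build_omdb_params(title, api_key, year),
        timeout=timeout,
    )
    response.raise_for_status()
    payload = response.json()
    # OMDb reports lookup failures in the body rather than via the HTTP status.
    if payload.get("Response") != "True":
        raise LookupError(payload.get("Error", "film not found"))
    return payload
```

Calling `fetch_film("Inception", api_key="YOUR_KEY", year=2010)` would return a dictionary of film attributes, provided the key is valid.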

Libraries Utilized

beautifulsoup4, requests - Retrieve and extract data sources from web
s3fs - Data storage in AWS (access available upon request)
numpy, pandas - Clean and aggregate data sources
scikit-learn - Estimator models

Process

1. Crawl web pages to retrieve data sources
2. ETL - Extract data from web page requests, clean it up and convert fields to their respective data types (int, string, etc.), then load the data into S3 for long-term storage
3. Combine/join data sources into one table/multi-dimensional array
4. Train and test regression models to make future predictions
5. Tune/optimize model parameters and perform feature engineering - repeat step 4
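
The cleaning, joining, and train/test steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's code: the column names, the film titles, and every value in the two small tables are made up for the example, and `LinearRegression` is only one possible estimator (the README does not say which models were used).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up rows standing in for scraped film data (all values are illustrative).
films = pd.DataFrame({
    "title": [f"Film {c}" for c in "ABCDEFGH"],
    "budget": ["$1,000,000", "$5,500,000", "$12,000,000", "$20,000,000",
               "$35,000,000", "$60,000,000", "$90,000,000", "$150,000,000"],
    "runtime": ["88 min", "95 min", "101 min", "110 min",
                "118 min", "124 min", "131 min", "142 min"],
})

# Made-up torrent counts keyed by the same titles.
torrents = pd.DataFrame({
    "title": [f"Film {c}" for c in "ABCDEFGH"],
    "torrent_count": [3, 9, 14, 22, 41, 57, 80, 120],
})


def clean_films(raw):
    """Convert scraped strings such as '$1,000,000' and '88 min' to integers."""
    out = raw.copy()
    out["budget"] = out["budget"].str.replace(r"[$,]", "", regex=True).astype(int)
    out["runtime"] = out["runtime"].str.extract(r"(\d+)", expand=False).astype(int)
    return out


def build_table(film_rows, torrent_rows):
    """Join cleaned film data with torrent counts into one table."""
    return clean_films(film_rows).merge(torrent_rows, on="title", how="inner")


def fit_and_score(table, features, target, seed=0):
    """Fit a linear regression on a train split and return it with its test R^2."""
    X = table[features].to_numpy(dtype=float)
    y = table[target].to_numpy(dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model, model.score(X_test, y_test)


table = build_table(films, torrents)
model, r2 = fit_and_score(table, ["budget", "runtime"], "torrent_count")
```

Step 5 would then loop back over `fit_and_score` with different features or estimator settings and compare the held-out scores.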

Results

Original presentation delivered on 07/15/2016
