GitHub - Beluki/MovieWarDBGen: Scripts to generate and sanitize a movie database for the MovieWar game.

About

This repository contains four Python scripts that can be used to generate a movie database for the trivia game MovieWar.

You probably don't need to use them: the generated JSON files are already available in the Releases tab.

How it works

The first script, 00 download freebase.py, uses the Freebase API to download information for about 9200 movies, using Brad Bourland's top movies list as a source.

This is a great starting point for a trivia game. The list is big enough to avoid repetition, and every movie on it is reasonably popular, with no Bollywood or obscure titles.

Sample output:

Total requests: 1 Total movies: 3000
Total requests: 2 Total movies: 6000
Total requests: 3 Total movies: 9000
Total requests: 4 Total movies: 9174
All data gathered, saving...

Freebase is a great resource, but nothing is perfect.

The second script, 01 convert freebase.py, sanitizes the list. It removes movies with no title or release date, checks that each release date falls between 1900 and 2000 (which should be true for every movie on the list), and sorts the movies by name.

Five of the 9174 movies are removed due to invalid names or dates.

Sample output:

Invalid year range: 2002 for movie: Stevie, skipping...
Empty date, skipping movie: Siddhartha
Invalid year range: 2003 for movie: Johnny Thunders: What About Me, skipping...
Invalid year range: 2004 for movie: The Big Bounce, skipping...
Invalid year range: 2004 for movie: Grizzly Falls, skipping...
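A minimal sketch of this sanitizing step, for illustration only. It assumes a hypothetical record shape of {"name": ..., "date": ...} where the date string starts with a four-digit year; the real Freebase field names used by 01 convert freebase.py may differ. Sending the skip messages to stderr is also an assumption.

```python
import sys

# Year range as described above; records outside it are skipped.
MIN_YEAR = 1900
MAX_YEAR = 2000


def sanitize(movies):
    """Drop movies with no name or date, or a year outside the range; sort by name.

    Assumes (hypothetically) that each movie is a dict with "name" and "date"
    keys, and that a non-empty date starts with a four-digit year.
    """
    kept = []
    for movie in movies:
        name = movie.get("name")
        date = movie.get("date")
        if not name:
            print("Empty name, skipping movie with date: {}".format(date), file=sys.stderr)
            continue
        if not date:
            print("Empty date, skipping movie: {}".format(name), file=sys.stderr)
            continue
        year = int(date[:4])  # no error checking, like the original scripts
        if not MIN_YEAR <= year <= MAX_YEAR:
            print("Invalid year range: {} for movie: {}, skipping...".format(year, name),
                  file=sys.stderr)
            continue
        # Keep only what later steps need: the name and the year as a string.
        kept.append({"name": name, "year": str(year)})
    kept.sort(key=lambda m: m["name"])
    return kept
```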

The third script, 02 match omdb.py, uses the OMDB API to verify that the movie names and dates are actually correct. Only movies that have exactly the same title and date on both Freebase and OMDB are kept.

In a trivia game, quality is more important than quantity. Requiring two independent sources to agree makes errors much less likely. Of the 9169 movies, 7601 match in both databases.

Sample output:

1 - ok: 0 miss: 1 - movie title mismatch.
2 - ok: 0 miss: 2 - movie title mismatch.
3 - ok: 1 miss: 2 - movie ok.
4 - ok: 2 miss: 2 - movie ok.
5 - ok: 3 miss: 2 - movie ok.
6 - ok: 4 miss: 2 - movie ok.
7 - ok: 4 miss: 3 - movie title mismatch.
8 - ok: 4 miss: 4 - movie title mismatch.
9 - ok: 5 miss: 4 - movie ok.
10 - ok: 6 miss: 4 - movie ok.
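The matching rule and the running ok/miss counters in the sample output can be sketched as follows. This is an illustration, not the original script: `lookup` is a hypothetical callable standing in for the actual OMDB request, and it is assumed to return a dict with "Title" and "Year" keys (the field names OMDb uses in its JSON responses) or None when nothing is found.

```python
def match_movies(movies, lookup):
    """Keep only movies whose title and year match the second source exactly.

    `movies` is a list of {"name": str, "year": str} dicts.
    `lookup` is a hypothetical callable: movie dict -> OMDb-style record or None.
    """
    ok = 0
    miss = 0
    kept = []
    for index, movie in enumerate(movies, start=1):
        record = lookup(movie)
        if record is None:
            miss += 1
            reason = "movie not found."
        elif record.get("Title") != movie["name"]:
            # Exact comparison: any difference in spelling or case is a miss.
            miss += 1
            reason = "movie title mismatch."
        elif record.get("Year") != movie["year"]:
            miss += 1
            reason = "movie year mismatch."
        else:
            ok += 1
            reason = "movie ok."
            kept.append(movie)
        # Same progress format as the sample output above.
        print("{} - ok: {} miss: {} - {}".format(index, ok, miss, reason))
    return kept
```

Injecting `lookup` keeps the comparison logic separate from the network code, so it can be checked against a plain dict of fake records.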

The fourth script, 03 collapse years.py, merges movies that share the same name into a single entry with a list of years. For example, it converts this:

{"name": "Jane Eyre", "year": "2011"}
{"name": "Jane Eyre", "year": "1996"}

Into this:

{"name": "Jane Eyre", "years": ["2011", "1996"]}

It also checks for duplicate years, that is, the same name and year appearing more than once. Only one movie in the entire dataset triggers this check.

Sample output:

Skipping duplicate year for movie: A Doll's House...
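A minimal sketch of the collapsing step, assuming the {"name": ..., "year": ...} input shape shown above. Keeping years in input order and reporting duplicates on stderr are assumptions about the original's behavior; `OrderedDict` is used so insertion order is preserved on Python 3.5 as well.

```python
import sys
from collections import OrderedDict


def collapse_years(movies):
    """Merge entries sharing a name into {"name": ..., "years": [...]}.

    Years are kept in input order; a repeated (name, year) pair is
    reported and skipped.
    """
    years_by_name = OrderedDict()
    for movie in movies:
        years = years_by_name.setdefault(movie["name"], [])
        if movie["year"] in years:
            print("Skipping duplicate year for movie: {}...".format(movie["name"]),
                  file=sys.stderr)
            continue
        years.append(movie["year"])
    return [{"name": name, "years": years} for name, years in years_by_name.items()]
```

For the Jane Eyre example above, `collapse_years` returns `[{"name": "Jane Eyre", "years": ["2011", "1996"]}]`.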

Portability

Information and error messages are written to stdout and stderr respectively, using the current platform newline format and encoding.

Note that since the scripts were a one-off effort (once the JSON exists, there's no need to run them again), they do little to no error checking.

There are no options or arguments.

The output JSON is written as UTF-8 without BOM, using Unix newlines.
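In Python 3, this output format comes down to two `open()` arguments. The sketch below is illustrative: `save_json` is a hypothetical helper, and `ensure_ascii=False` and `indent=4` are assumed formatting choices, not necessarily what the scripts use.

```python
import json


def save_json(path, data):
    """Write `data` as UTF-8 JSON without a BOM, using Unix newlines."""
    # encoding="utf-8" never writes a BOM (unlike "utf-8-sig"), and
    # newline="\n" stops Python from translating "\n" into "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        # ensure_ascii=False writes non-ASCII titles as real UTF-8 characters
        # instead of \uXXXX escapes (an assumed choice).
        json.dump(data, f, ensure_ascii=False, indent=4)
        f.write("\n")
```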

I wrote and ran them on Windows 7 x86-64, using Python 3.5.0 and requests 2.9.1.

Status

This program is finished!

These scripts served their purpose: generating a good, basic database for MovieWar. I plan no further development on them.

License

Like all my hobby projects, this is Free Software. See the Documentation folder for more information. No warranty though.
