[BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (Fixes #167) #170

Merged
merged 33 commits into SnowflakePowered:master from smart-scraper
Oct 19, 2015

Conversation

chyyran
Member

@chyyran chyyran commented Sep 7, 2015

This pull request breaks the IScraper API and removes the IIdentifier API

This pull request adds four new interfaces and changes the Snowflake.Scraper.IScraper API.

IStructuredFilename
Implemented as Snowflake.Romfile.StructuredFilename

Represents a structured ROM filename that follows the TOSEC, GoodTools, or No-Intro naming convention. This gives scrapers a standard way of getting a game's title and related metadata, such as region code and year.
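
As a rough illustration of the idea (not the Snowflake.Romfile.StructuredFilename implementation), the sketch below pulls a title, region, and year out of filenames shaped like the three conventions. It is written in Python for brevity; `ParsedFilename`, `parse_rom_filename`, and the small `REGION_CODES` table are hypothetical names, and the real conventions define many more tags than this heuristic handles.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny region table (assumption for illustration).
REGION_CODES = {
    "U": "US", "US": "US", "USA": "US",
    "E": "EU", "EU": "EU", "Europe": "EU",
    "J": "JP", "JP": "JP", "Japan": "JP",
    "W": "World", "World": "World",
}


@dataclass
class ParsedFilename:
    title: str
    region: Optional[str]
    year: Optional[str]


def parse_rom_filename(filename: str) -> ParsedFilename:
    # Strip a short alphanumeric extension such as ".nes" or ".zip".
    stem = re.sub(r"\.[A-Za-z0-9]{1,4}$", "", filename)
    # The title is everything before the first "(" or "[" tag.
    title = re.split(r"[\(\[]", stem, maxsplit=1)[0].strip()
    # Collect the contents of every parenthesised tag.
    tags = re.findall(r"\(([^)]*)\)", stem)
    # A four-digit tag starting with 19 or 20 is treated as the year.
    year = next((t for t in tags if re.fullmatch(r"(19|20)\d\d", t)), None)
    # The first tag found in the region table is treated as the region.
    region = next((REGION_CODES[t] for t in tags if t in REGION_CODES), None)
    return ParsedFilename(title=title, region=region, year=year)
```

For example, `parse_rom_filename("Super Mario Bros. (U) [!].nes")` yields the title `Super Mario Bros.` with region `US` and no year.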

IScrapableInfo
Implemented as Snowflake.Romfile.ScrapableInfo

This data is passed to the scraper plugin and contains information such as a queryable title and game ID if present.

IFileSignature
Implemented per platform in Snowflake.Romfile.FileSignatures
This interface defines a contract through which a platform-specific implementation can determine whether a file belongs to its platform and infer the game ID from the file data.

IScrapeEngine
Implemented as Snowflake.Service.ScrapeEngine
This interface provides a service that can create an IScrapableInfo from a ROM filename, using FileScrapers to determine the relevant platform and extracting any information available from the filename itself. It can also work out the best result from a set of scrapers and automatically use the most accurate one.
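
One way the "best result" step might work is to compare each scraper's returned title against the title taken from the filename. The sketch below is an assumption about the approach, not the ScrapeEngine code; it is in Python for brevity, and `best_match` and the `(scraper_name, result_title)` tuple shape are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def best_match(
    query_title: str, candidates: List[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return the (scraper_name, result_title) pair closest to query_title."""

    def similarity(candidate: Tuple[str, str]) -> float:
        # Case-insensitive similarity ratio between 0.0 and 1.0.
        return SequenceMatcher(
            None, query_title.lower(), candidate[1].lower()
        ).ratio()

    # default=None covers the case where no scraper returned anything.
    return max(candidates, key=similarity, default=None)
```

A real engine would probably weigh more than the title (game ID, region, platform), but the selection step keeps the same shape.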

@chyyran chyyran changed the title from "[WIP BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data." to "[WIP BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (#167)" on Sep 7, 2015
@chyyran chyyran changed the title from "[WIP BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (#167)" to "[WIP BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (Fixes #167)" on Sep 7, 2015
@chyyran chyyran added this to the Alpha Preview Release milestone Sep 7, 2015
@chyyran
Member Author

chyyran commented Sep 8, 2015

Waiting on #171 as we don't want to change ICoreService every time we add a new service.

@chyyran
Member Author

chyyran commented Sep 12, 2015

Merged master into this

@chyyran
Member Author

chyyran commented Sep 12, 2015

This PR should also encompass the deprecation of PluginPreferencesDatabase, or at least of using it to determine scraper order. Instead, use a graceful fallback system: if one scraper fails, use the next most accurate, as determined by an integer accuracy_weight value in the plugin.json file. The scraper engine should also be able to determine the likelihood that a scraped value is correct. Although quick scraping is desirable, accuracy takes precedence over speed.
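
The fallback described above could be sketched roughly as follows. This is Python for brevity and is not the project's code: `ScraperPlugin` and `scrape_with_fallback` are hypothetical names, loading `accuracy_weight` from plugin.json is omitted, and treating a higher weight as "more accurate" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class ScraperPlugin:
    name: str
    accuracy_weight: int  # assumed: higher means more accurate
    scrape: Callable[[str], Optional[dict]]


def scrape_with_fallback(
    plugins: Iterable[ScraperPlugin], title: str
) -> Optional[Tuple[str, dict]]:
    # Try the most accurate scraper first, then fall back in order.
    ordered = sorted(plugins, key=lambda p: p.accuracy_weight, reverse=True)
    for plugin in ordered:
        try:
            result = plugin.scrape(title)
        except Exception:
            # A failing scraper must not abort the whole scrape.
            continue
        if result:
            return plugin.name, result
    # Every scraper failed or returned nothing.
    return None
```

Ordering by weight rather than by speed reflects the stated preference for accuracy; a slower but more accurate scraper is always consulted before a faster one.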

@chyyran
Member Author

chyyran commented Sep 16, 2015

FileSignatures left to write

We can probably also identify .NES files by the iNES header.

For any other types, we will have a method that matches file extensions to ROM type as a best guess.
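
A minimal sketch of both ideas, assuming the caller has already read the first bytes of the file: an iNES file begins with the four bytes `NES` followed by `0x1A`, and anything unrecognised falls back to an extension lookup. It is Python for brevity; `guess_platform`, the `EXTENSION_PLATFORMS` table, and the platform strings are hypothetical and are not Snowflake's platform IDs.

```python
import os
from typing import Optional

# The iNES header starts with ASCII "NES" followed by 0x1A.
INES_MAGIC = b"NES\x1a"

# Hypothetical best-guess table; the platform names are placeholders.
EXTENSION_PLATFORMS = {
    ".nes": "NES",
    ".sfc": "SNES",
    ".smc": "SNES",
    ".z64": "N64",
    ".gba": "GBA",
}


def guess_platform(filename: str, header: bytes) -> Optional[str]:
    # Prefer the file signature: it survives a wrong or missing extension.
    if header[:4] == INES_MAGIC:
        return "NES"
    # Otherwise fall back to a case-insensitive extension lookup.
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_PLATFORMS.get(extension)
```

Checking the signature first means a misnamed file such as `game.bin` is still identified when its header is recognisable.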

@chyyran chyyran mentioned this pull request Sep 17, 2015
@chyyran
Member Author

chyyran commented Sep 21, 2015

This isn't going to build for now. I'm testing the FileSignature plugins locally with a FileSignatureTester project that I'm not going to commit, so the branch won't build until I've written all the FileSignatures and cleaned it up.

@chyyran chyyran mentioned this pull request Sep 21, 2015
8 tasks
@chyyran
Member Author

chyyran commented Sep 22, 2015

I don't usually rebase commits before a merge, but since this is shaping up to be a huge PR, I'm putting down a reminder to squash similar commits here.

@chyyran
Member Author

chyyran commented Sep 24, 2015

The N64 FileSignature comparator uses @bryanperris's code, Soft64.IO.ByteSwapStream, and is accordingly licensed under the GPL. Adding a special-thanks section to NOTICE may be in order, given the use of third-party code in these FileSignatures (SFOSharp as well).

@chyyran
Member Author

chyyran commented Sep 24, 2015

@chyyran
Member Author

chyyran commented Oct 5, 2015

LGTM after some unit tests and integration testing. The JavaScript bindings and the scrapers will also need updating.

@chyyran
Member Author

chyyran commented Oct 11, 2015

I'm porting the existing scraper plugins to the new API and writing some tests. To be honest, this PR has been in the queue for too long, so I'm hoping I can merge by next week once I've confirmed everything's working.

@chyyran
Member Author

chyyran commented Oct 14, 2015

We're in business 👍

chyyran added a commit to SnowflakePowered/Scraper-TheGamesDB that referenced this pull request Oct 15, 2015
@chyyran
Member Author

chyyran commented Oct 19, 2015

Just have to write tests and this will be merged

@chyyran chyyran changed the title from "[WIP BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (Fixes #167)" to "[BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (Fixes #167)" on Oct 19, 2015
@chyyran
Member Author

chyyran commented Oct 19, 2015

I've looked over it, and since most of this deals with plugins, it will be difficult to unit test properly. I'll merge and write tests for IStructuredFilename later.

chyyran added a commit that referenced this pull request Oct 19, 2015
[BREAKING] A Smart Scraping system to allow for standardized scraping metadata and inferring information from only the filename and file data. (Fixes #167)
@chyyran chyyran merged commit d35c4b3 into SnowflakePowered:master Oct 19, 2015
@chyyran chyyran deleted the smart-scraper branch October 19, 2015 02:49