Automatically extract information from http://minecraft.gamepedia.com/ ? #229

Closed
rom1504 opened this issue Mar 22, 2015 · 21 comments

Comments

@rom1504
Member

rom1504 commented Mar 22, 2015

http://minecraft.gamepedia.com/ is a very thorough reference for many things in Minecraft.

There are already some scripts to extract the recipes from that wiki (for example https://github.com/andrewrk/mineflayer/blob/master/bin/transform1_recipes.js, which is currently broken).
I think we could extract more, for example everything in the infobox (see http://minecraft.gamepedia.com/Rabbit%27s_Foot vs https://github.com/andrewrk/mineflayer/blob/master/lib/enums/items.json#L950 )

I'm not sure this can really be applied here, but http://dbpedia.org/ has a really good framework for extracting information from Wikipedia infoboxes, and the infoboxes on http://minecraft.gamepedia.com/ look just like the ones on Wikipedia, so that might be interesting to look into.

@Kupferhirn has extracted the items manually from that wiki (see #227), which is nice, but doing the same thing automatically would be even better.

Edit: well, I think the DBpedia extraction framework is probably way too big for this; writing some simple scripts would be easier.

@thejoshwolfe
Contributor

An alternative to scraping a wiki is to install debug statements into the game itself. That would be guaranteed to be 100% correct and complete (at least for the mechanical data like id numbers), but it relies on the Minecraft Coder Pack project being caught up to the latest version of Minecraft. I can't really find any authoritative information on MCP anymore; I wonder if that project is still alive.

@rom1504
Member Author

rom1504 commented Mar 22, 2015

I think the official MCP site is here: http://www.modcoderpack.com/website/releases
Yeah, I agree there are many ways to do it.
For example, @deathcap is working on upgrading burger (TkTech/Burger#12).

So I think any way of extracting this information automatically is fine.

@roblabla
Member

Relying on MCP is a bad idea. The project seems very volatile, sadly. A bukkit or forge plugin could also extract information and would seem more stable.

@thejoshwolfe
Contributor

I thought Forge was built on MCP. Maybe it used to be? If Forge works with 1.8.3, then that seems like the way to go.

What seems so attractive about a mod/plugin is that all the heavy data comes straight from Mojang. The only thing the community provides in this case is a scraping tool. The wiki is community maintained, and might be wrong. Bukkit is community maintained and might be wrong.

The downside of scraping the minecraft binary itself is that you don't always get very good string names and descriptions. Perhaps scraping would only be appropriate for recipes and a sanity check list of id numbers.
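As a rough illustration of the "sanity check list of id numbers" idea, here is a minimal sketch. It assumes two hypothetical arrays of `{ id, name }` records, one dumped from the game and one scraped from the wiki. The function name and the sample data are made up for illustration and are not part of mineflayer.

```javascript
// Hypothetical helper: compare ids from a game dump against ids scraped from the wiki.
// Both inputs are assumed to be arrays of { id, name } objects.
function compareIds(gameEntries, wikiEntries) {
  const gameIds = new Set(gameEntries.map((e) => e.id));
  const wikiIds = new Set(wikiEntries.map((e) => e.id));
  const numeric = (a, b) => a - b;
  return {
    // ids present in the game dump but absent from the wiki scrape
    missingFromWiki: [...gameIds].filter((id) => !wikiIds.has(id)).sort(numeric),
    // ids present in the wiki scrape but absent from the game dump
    missingFromGame: [...wikiIds].filter((id) => !gameIds.has(id)).sort(numeric),
  };
}

// Made-up sample data, for illustration only.
const idReport = compareIds(
  [{ id: 1, name: 'a' }, { id: 2, name: 'b' }],
  [{ id: 1, name: 'a' }, { id: 3, name: 'c' }]
);
console.log(idReport);
```

Any id that shows up in only one of the two lists is a candidate for manual review.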

@roblabla
Member

Forge is built on MCP, but public builds of MCP take longer and longer to get released.

Bukkit is based on mojang's minecraft server, it can hardly be wrong. They use a similar technique as MCP, but do it themselves.

@thejoshwolfe
Contributor

> Bukkit is based on mojang's minecraft server, it can hardly be wrong.

Bukkit currently doesn't know about Granite: https://github.com/Bukkit/Bukkit/search?utf8=%E2%9C%93&q=granite (contrast with: https://github.com/Bukkit/Bukkit/search?utf8=%E2%9C%93&q=acacia )

Bukkit, like the wiki, is supposed to be kept up to date by the community. This makes it inherently less trustworthy than the actual data in the notchian game itself, which we know must be right at all times by definition.

A Forge plugin still seems like the most reliable solution to me at this point.

@roblabla
Member

This is the wrong repo. The last commit in that Bukkit repo is from August 2014. Spigot is still up to date and does know about granite, prismarine, etc.

@thejoshwolfe
Contributor

> This is the wrong repo.

Oh ok. Where do we get the current source? Or are you proposing we write a Bukkit plugin to dump the data from the Bukkit runtime binary?

@roblabla
Member

Yes, that's what I was proposing. A Forge plugin works too, though.

The current source is closed because of the DMCA situation.

@rom1504
Member Author

rom1504 commented Mar 22, 2015

I started fixing the recipes extractor.
As expected, not all of the block info is correct. For example, the "Trapdoor" https://github.com/andrewrk/mineflayer/blob/master/lib/enums/blocks.json#L1096
is now named "Wooden Trapdoor" (http://minecraft.gamepedia.com/Trapdoor#Crafting).

I think there are many other errors like this, which is why some kind of automatic extractor is needed.

I will still update the recipes, but they won't be perfect until we have an extractor for the blocks and the items (the recipes extractor depends on having correct items.json and blocks.json).

rom1504 added a commit that referenced this issue Mar 22, 2015
…ecipes with that.

Also put the output file in the arguments of the file instead of printing to stdout.
I used merge_recipes.js so recipes aren't changed, just added. blocks.json and items.json aren't fully updated (see #229) so some recipes are probably still missing.
@rom1504
Member Author

rom1504 commented Mar 23, 2015

I'm currently extracting from the HTML of http://minecraft.gamepedia.com/Crafting#Complete_recipe_list but it's neither reliable nor easy.
Getting the wiki source of that page might be useful. I haven't found how to do that for the complete list, but it's possible for a single item (for example http://minecraft.gamepedia.com/index.php?title=Andesite&action=edit&section=3), which might be easier to parse.
Using the individual pages would require fetching all of them, which should be integrated into the script.
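For fetching the source of a single page, a minimal sketch of building the URL is below. It assumes the wiki exposes MediaWiki's standard `action=raw` endpoint (which returns the wikitext without the edit-form HTML) rather than the `action=edit` page linked above. The function name `rawWikitextUrl` is made up for illustration.

```javascript
// Build the URL of a page's raw wikitext.
// Assumption: the wiki exposes MediaWiki's standard `action=raw` endpoint.
function rawWikitextUrl(title) {
  const base = 'http://minecraft.gamepedia.com/index.php';
  // Wiki titles use underscores in place of spaces.
  const params = new URLSearchParams({
    title: title.replace(/ /g, '_'),
    action: 'raw',
  });
  return `${base}?${params.toString()}`;
}

console.log(rawWikitextUrl('Andesite'));
console.log(rawWikitextUrl("Rabbit's Foot"));
```

A script could loop over item names, build one URL per page this way, and download each in turn.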

The wiki source is generally much easier to parse than the html, and it might be possible to parse the items and blocks information from it (see the source of the infobox there http://minecraft.gamepedia.com/index.php?title=Andesite&action=edit)
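To give an idea of why the wiki source is easier to work with, here is a simplified sketch of reading an infobox template into a key/value object. The template name `Block` and the field names in the sample are assumptions made for illustration; the real infobox fields on the wiki may differ. The parser only handles named parameters and nested `{{...}}` / `[[...]]`, and it matches the template name by prefix.

```javascript
// Parse the first template whose name starts with `templateName` into a key/value object.
// Simplified sketch: no support for <nowiki>, HTML comments, or positional parameters.
function parseTemplate(wikitext, templateName) {
  const start = wikitext.indexOf('{{' + templateName);
  if (start === -1) return null;

  const fields = {};
  let depth = 0;    // nesting depth of {{ }} and [[ ]]
  let current = ''; // text of the parameter currently being read

  // Store the parameter read so far, if it has the form key=value.
  const flush = () => {
    const eq = current.indexOf('=');
    if (eq !== -1) {
      fields[current.slice(0, eq).trim()] = current.slice(eq + 1).trim();
    }
    current = '';
  };

  for (let i = start; i < wikitext.length; i++) {
    const two = wikitext.slice(i, i + 2);
    if (two === '{{' || two === '[[') {
      depth++;
      if (depth > 1) current += two; // keep nested markup as part of the value
      i++;
    } else if (two === '}}' || two === ']]') {
      depth--;
      if (depth === 0) {
        flush();
        return fields;
      }
      current += two;
      i++;
    } else if (wikitext[i] === '|' && depth === 1) {
      flush(); // a top-level pipe ends the current parameter
    } else {
      current += wikitext[i];
    }
  }
  return null; // unbalanced braces
}

// Made-up sample wikitext, for illustration only.
const sampleWikitext = [
  '{{Block',
  '|title=Andesite',
  '|tool=[[Pickaxe|wooden pickaxe]]',
  '|stackable=Yes (64)',
  '}}',
  "'''Andesite''' is a type of stone.",
].join('\n');

console.log(parseTemplate(sampleWikitext, 'Block'));
```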

Edit: apparently the complete list is generated by a script like this one: http://minecraft.gamepedia.com/Module:Recipe_list . This might be useful.

Edit2: some of the recipes carry a "Pocket Edition only" or "Console edition only" note. The script should check for that (and the recipes that shouldn't have been added should be removed if needed).
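The edition check could be as simple as the sketch below. It assumes each scraped recipe is an object with a hypothetical `note` field holding whatever free-text note appeared next to it on the wiki; that record shape is an assumption, not the format used by the existing extractor.

```javascript
// Returns false for recipes marked as Pocket Edition only or Console Edition only.
// Assumption: `note` is a free-text string scraped alongside the recipe (may be missing).
function isPcRecipe(recipe) {
  return !/\b(pocket|console) edition only\b/i.test(recipe.note || '');
}

// Made-up sample records, for illustration only.
const scrapedRecipes = [
  { result: 'Andesite', note: '' },
  { result: 'Stonecutter', note: 'Pocket Edition only' },
  { result: 'Some Item', note: 'Console edition only' },
  { result: 'Wooden Trapdoor' },
];

const pcRecipes = scrapedRecipes.filter(isPcRecipe);
console.log(pcRecipes.map((r) => r.result));
```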

@Kupferhirn
Contributor

"trapdoor" is the unlocalized name from the notchian client. I have checked all blocks that could have changed.

@rom1504
Member Author

rom1504 commented Mar 23, 2015

@Kupferhirn "name": "trapdoor" is OK; the problem is "displayName": "Trapdoor",
and other similar cases (I think most blocks/items with different qualifiers like this have a problem, at least in the displayName).
I need the display names to be consistent for my recipe extraction script.

I don't have it right now, but I'll post a list of the problematic blocks/items here tonight if that would be useful.
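A list like that could be produced with something along the lines of the sketch below. It assumes entries shaped like `{ name, displayName }` (the two fields quoted above) and a hypothetical array of names as they appear in the wiki's recipes. The function name is made up for illustration.

```javascript
// Return the wiki names that match no displayName in the given entries.
// Comparison is case-insensitive; entries are assumed to look like { name, displayName }.
function findUnknownDisplayNames(entries, wikiNames) {
  const known = new Set(entries.map((e) => e.displayName.toLowerCase()));
  return wikiNames.filter((n) => !known.has(n.toLowerCase()));
}

// Sample based on the trapdoor case discussed above.
const sampleBlocks = [{ name: 'trapdoor', displayName: 'Trapdoor' }];
const unknownNames = findUnknownDisplayNames(sampleBlocks, ['Wooden Trapdoor', 'Trapdoor']);
console.log(unknownNames);
```

Each name returned is one the recipe extractor would fail to resolve against blocks.json or items.json.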

@rom1504
Member Author

rom1504 commented Mar 23, 2015

The furnace recipes are here: http://minecraft.gamepedia.com/Smelting

For the brewing stand: http://minecraft.gamepedia.com/Brewing

See http://minecraft.gamepedia.com/Template:Grid#Other_templates for various grid-related pages.

@rom1504
Member Author

rom1504 commented Mar 27, 2015

This should somehow go into https://github.com/PrismarineJS/minecraft-data

@rom1504
Member Author

rom1504 commented Mar 27, 2015

I think I might just start by making a script to get the wiki source of everything on the wiki, because there is a lot of information on it, not just recipes.
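One possible starting point for such a script is MediaWiki's `list=allpages` query, which enumerates page titles in batches. The sketch below only builds the request URLs and reads a response object, with no network code. It assumes the wiki's API lives at `http://minecraft.gamepedia.com/api.php` and uses the current `continue` style of pagination; both are assumptions to verify against the actual wiki.

```javascript
// Build the URL for one batch of the MediaWiki `allpages` listing.
// `cont` is the `continue` object from the previous response, or undefined for the first batch.
function allPagesUrl(apiBase, cont) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'allpages',
    aplimit: '500',
    format: 'json',
    ...(cont || {}),
  });
  return `${apiBase}?${params.toString()}`;
}

// Pull the page titles and the continuation token out of a parsed JSON response.
function readBatch(response) {
  return {
    titles: response.query.allpages.map((p) => p.title),
    next: response.continue || null, // null means there are no more batches
  };
}

// Assumed API location; the sample response is made up for illustration.
const apiBase = 'http://minecraft.gamepedia.com/api.php';
const sampleResponse = {
  continue: { apcontinue: 'Brewing', continue: '-||' },
  query: { allpages: [{ pageid: 1, ns: 0, title: 'Andesite' }] },
};

const batch = readBatch(sampleResponse);
console.log(allPagesUrl(apiBase));
console.log(batch.titles, allPagesUrl(apiBase, batch.next));
```

The titles from each batch could then be fed to whatever function downloads the wiki source of a single page.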

@pokeball99

Or have said info hosted on a new repo, and get it to draw info from it.

@rom1504
Member Author

rom1504 commented Mar 27, 2015

@pokeball99 that's already done there, but we still need to extract the Minecraft info to put it in minecraft-data ;)

@rom1504
Member Author

rom1504 commented Mar 27, 2015

OK, the issue PrismarineJS/minecraft-data#8 tracks the progress of the wiki extraction.
If someone wants to work on extraction from burger, MCP, or anything else, they can open an issue on the minecraft-data repo.

Closing this issue.

@rom1504 rom1504 closed this as completed Mar 27, 2015