This repository has been archived by the owner on Apr 15, 2019. It is now read-only.

Infobox inconsistent attribute names. #127

Open
gyachdav opened this issue Apr 1, 2016 · 11 comments
@gyachdav
Collaborator

gyachdav commented Apr 1, 2016

Check out
http://awoiaf.westeros.org/index.php/Stannis_Baratheon

The scraper did not pick up a house affiliation for Stannis because the label in the infobox is "royal house" rather than "house". The scraper needs to be reconfigured to handle this.

@gyachdav gyachdav added the bug label Apr 1, 2016
@gyachdav gyachdav added this to the v0.2.0 Delivery milestone Apr 1, 2016
@Legenzoo
Collaborator

Legenzoo commented Apr 1, 2016

I can fix this pretty quickly...

@Legenzoo Legenzoo assigned Legenzoo and unassigned boriside Apr 1, 2016
@Legenzoo
Collaborator

Legenzoo commented Apr 1, 2016

The houses are not the only issue. There are several inconsistencies in the characters' infoboxes. For example, "Titles" != "Aliases" or "Book(s)" != "books" != "Books". I am actually reimplementing the whole scraper, because the regex code from Theo is slow and not very maintainable... Sorry.
This is not a big deal. I am almost done, and now I am trying to extract even more information.

@sacdallago
Contributor

Again @Adiolis do what you can and feel free to delegate to someone. Family is more important, always.

Legenzoo added a commit that referenced this issue Apr 1, 2016
@Legenzoo
Collaborator

Legenzoo commented Apr 1, 2016

Yeah. I know.
It is just a fix of one line.

@Legenzoo
Collaborator

Legenzoo commented Apr 1, 2016

But guys, there are more problems than only the houses.
"Titles" != "Aliases" or "Book(s)" != "books" != "Books" and so on.

Feel free to fix that.

@Legenzoo Legenzoo removed their assignment Apr 1, 2016
@sacdallago
Contributor

@Legenzoo Legenzoo changed the title from "Scrape for royal house" to "Infobox inconsistent attribute names." Apr 1, 2016
@kordianbruck
Collaborator

How about using toLowerCase() (http://www.w3schools.com/jsref/jsref_tolowercase.asp) on all the field names to normalize these things somewhat?
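
A minimal sketch of that idea, assuming the scraper yields each infobox as a plain object of label/value pairs. The `normalizeKeys` helper and the sample data are hypothetical and not taken from the repository:

```javascript
// Hypothetical helper: trims and lowercases every infobox label.
function normalizeKeys(infobox) {
  const result = {};
  for (const [label, value] of Object.entries(infobox)) {
    result[label.trim().toLowerCase()] = value;
  }
  return result;
}

// Sample data is made up for illustration.
const normalized = normalizeKeys({
  'Books': 'A Clash of Kings',
  'Royal House': 'House Baratheon',
});
console.log(Object.keys(normalized));
```

This folds "Books" and "books" into one key, but "Book(s)" and "royal house" would still differ from "books" and "house", so lowercasing alone does not cover every case mentioned in this thread.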

@Legenzoo
Collaborator

Legenzoo commented Apr 1, 2016

Yep. Extra fixes for "Titles" != "Aliases" and so on are still necessary.
First, someone needs to make a list of possible attribute names.

@sacdallago
Contributor

Correct. You should have an array of synonyms for a given field.
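
One way the static approach could look, assuming a hand-maintained table of synonyms per canonical field. The field names and synonym lists below are illustrative, built only from the examples in this thread, and `canonicalField` is a hypothetical helper:

```javascript
// Hypothetical synonym table: canonical field name -> labels seen in infoboxes.
// The lists are incomplete and would need to be filled in from real pages.
const FIELD_SYNONYMS = {
  house: ['house', 'royal house'],
  books: ['books', 'book(s)'],
};

// Build a reverse lookup once: lowercased label -> canonical field name.
const LABEL_TO_FIELD = new Map();
for (const [field, labels] of Object.entries(FIELD_SYNONYMS)) {
  for (const label of labels) {
    LABEL_TO_FIELD.set(label.toLowerCase(), field);
  }
}

// Returns the canonical field name, or null for labels not in the table.
function canonicalField(label) {
  const key = label.trim().toLowerCase();
  return LABEL_TO_FIELD.has(key) ? LABEL_TO_FIELD.get(key) : null;
}

console.log(canonicalField('Royal House')); // house
console.log(canonicalField('Book(s)'));     // books
console.log(canonicalField('Spouse'));      // null
```

Returning `null` for unknown labels makes it easy to log them while scraping, which is one way to collect the list of possible attribute names.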

@sacdallago
Contributor

Or you can machine learn what goes where :D I think the static approach is easier :D

@Legenzoo
Collaborator

Legenzoo commented Apr 4, 2016

Any volunteers? 😆
@kordianbruck @boriside @togiberlin @theocheslerean @docjag @alschm


5 participants