Need some nuclear knowledge boost & clarification of our goals #7

Closed
ytan15 opened this issue Mar 12, 2017 · 4 comments

@ytan15
Contributor

ytan15 commented Mar 12, 2017

Hi Dr. @katyhuff! I've been looking into scraping from Wikidata, and I think I've grasped the gist of it, so I started expanding scraping_wikidata.py to pull more information. However, I've run into a few points of confusion:

  1. From my understanding, a nuclear reactor is the energy source of a nuclear power plant, and a nuclear power plant is the key facility of a nuclear power station. Is that right? Are we looking for nuclear power plants, nuclear power stations, or nuclear reactors?
  2. I was trying to find the country of a nuclear reactor by adding the line
 ?reactors wdt:P17 ?country .

into the query. However, the problem is that not all nuclear reactors have a country attribute (in Wikidata, "country" means "sovereign state of this item"). For example, Bhabha Atomic Research Centre (https://www.wikidata.org/wiki/Q854682) has no "country" attribute, although its description says it is based in India. Hence, if I add this line into the query, it will filter out this entry (Bhabha Atomic Research Centre). Is this something that we should be concerned about?

  3. Although I haven't started on Wikipedia yet, I'm a bit concerned: could querying both Wikidata and Wikipedia cause overlap, since Wikidata stores Wikipedia's data?

  4. If Wikidata contains Wikipedia's data, how come there is a "Category: Nuclear power reactor types" in Wikipedia but not in Wikidata? Am I misunderstanding something?

Would you mind discussing these issues with me? Either here, or I could make an appointment with you if you feel it would be easier to talk in person.

@ytan15
Contributor Author

ytan15 commented Mar 12, 2017

On a separate note - from the above message I noticed that adding the code block in Item 2 breaks the Itemize pattern in Markdown, judging from the indentation of the lines after the code block before Item 3. Do you know how to fix this formatting?

@katyhuff
Member

At this level, plant & reactor should be effectively synonymous to you. Is there a particular case in which these words are used in a confusing way? The only caveat to that is when there is more than one reactor at a single site (sometimes 2 or 3... maybe up to 8 in international contexts). In those cases, please note the number of reactors (sometimes called reactor units) and individual information about each unit (the units often come online at different times, though they are typically the same type.)
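One way to record a site with several units is sketched below. This is only an illustration: the class names `Plant` and `ReactorUnit`, their fields, and the example values are all hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReactorUnit:
    """One reactor unit at a site (field names are illustrative)."""
    name: str
    reactor_type: Optional[str] = None  # None when the type is unknown
    online_year: Optional[int] = None   # units often come online at different times


@dataclass
class Plant:
    """A site hosting one or more reactor units."""
    name: str
    units: List[ReactorUnit] = field(default_factory=list)

    @property
    def num_units(self) -> int:
        # The unit count is derived from the list, so it cannot drift out of sync.
        return len(self.units)


# Made-up example data: one site with two units that started in different years.
plant = Plant("Example Site")
plant.units.append(ReactorUnit("Example Site 1", "PWR", 1980))
plant.units.append(ReactorUnit("Example Site 2", "PWR", 1985))
print(plant.num_units)
```

Keeping per-unit details on the unit rather than the plant leaves room for units at the same site that differ in start date or type.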

Regarding markdown, keep the indentation of the list, and the code won't screw it up:

  1. First item
  2. Second item, has code
    def test_func(x):
        a=x
        b=a
        return b
  3. Third item, no code

@katyhuff
Member

Sorry it took so long to answer these. Please don't hesitate to pester me, set up a meeting, etc. if it seems like I've dropped the ball.

2: .. ?reactors wdt:P17 ?country .... if I add this line into the query, it will filter out this entry (Bhabha Atomic Research Centre). Is this something that we should be concerned about?

Yes, we don't want to miss reactors. If they are missing a country attribute, then perhaps make it blank.
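A minimal sketch of one way to do this, assuming the query selects reactors with an "instance of" (`wdt:P31`) pattern: wrapping the country triple in SPARQL's `OPTIONAL` keeps entries that lack `wdt:P17`, and the missing value can then be filled with a blank when reading the results. The function names, the `reactor_class_qid` parameter, and the placeholder IDs in the sample are hypothetical; the actual query in scraping_wikidata.py may look different.

```python
def build_reactor_query(reactor_class_qid: str) -> str:
    """Build a query that keeps reactors even when they have no country.

    reactor_class_qid is a placeholder for whichever Wikidata class the
    real query matches on; it is not filled in here.
    """
    return f"""
SELECT ?reactors ?country WHERE {{
  ?reactors wdt:P31 wd:{reactor_class_qid} .
  OPTIONAL {{ ?reactors wdt:P17 ?country . }}
}}
"""


def rows_from_bindings(result: dict) -> list:
    """Flatten SPARQL JSON results; an unbound ?country becomes ''."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({
            "reactor": binding["reactors"]["value"],
            # Variables left unbound by OPTIONAL are simply absent from the binding.
            "country": binding.get("country", {}).get("value", ""),
        })
    return rows


# Hand-written sample in the SPARQL JSON results layout (no network access needed).
sample = {
    "results": {
        "bindings": [
            {"reactors": {"type": "uri",
                          "value": "http://www.wikidata.org/entity/Q854682"}},
            {"reactors": {"type": "uri",
                          "value": "http://www.wikidata.org/entity/Q_EXAMPLE"},
             "country": {"type": "uri",
                         "value": "http://www.wikidata.org/entity/Q_COUNTRY"}},
        ]
    }
}
rows = rows_from_bindings(sample)
```

With this shape, the Bhabha Atomic Research Centre entry stays in the output with an empty country instead of being dropped.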

  3. Although I haven't started on Wikipedia yet, I'm a bit concerned: could querying both Wikidata and Wikipedia cause overlap, since Wikidata stores Wikipedia's data?

Yes, there may be some overlap, which should be handled with validation and error checking on the programming side. Can you think of ways to check whether a particular entry already exists in the database? If you can, then can you think of how you'll avoid duplicate entries? One could write a function to check the former and then use that function to decide whether to add a new entry.
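As a rough sketch of that idea, assuming the "database" is an in-memory dict and that two entries count as the same when their names match after normalizing case and whitespace (both are assumptions, and all function and field names here are hypothetical):

```python
def entry_key(entry: dict) -> str:
    """Normalize the name so trivial spelling differences do not create duplicates."""
    return " ".join(entry["name"].lower().split())


def entry_exists(db: dict, entry: dict) -> bool:
    """Check whether an entry with the same normalized name is already stored."""
    return entry_key(entry) in db


def add_entry(db: dict, entry: dict) -> bool:
    """Add the entry unless it already exists; return True if it was added."""
    if entry_exists(db, entry):
        return False
    db[entry_key(entry)] = entry
    return True


db = {}
first = add_entry(db, {"name": "Example Reactor", "source": "wikidata"})
second = add_entry(db, {"name": "  example   reactor ", "source": "wikipedia"})
```

Name matching is only a starting point: the same reactor can appear under different names in the two sources, so a stable identifier, where one is available, would likely make a more reliable key.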

  4. If Wikidata contains Wikipedia's data, how come there is a "Category: Nuclear power reactor types" in Wikipedia but not in Wikidata? Am I misunderstanding something?

Wikidata is the data backend for some of Wikipedia's data, not all of it. The two could be much better organized and synchronized than they are at this point, so expect gaps like this one.

@katyhuff
Member

katyhuff commented Apr 3, 2017

Can this be closed, @ytan15?
