Hi Dr. @katyhuff! I've been looking into scraping from Wikidata, and I think I've got the gist of it, so I started expanding scraping_wikidata.py to pull in more information. However, I've run into a few points of confusion:
From my understanding, a nuclear reactor is the energy source of a nuclear power plant, and a nuclear power plant is the key facility of a nuclear power station. Is that right? Are we looking for nuclear power plants, nuclear power stations, or nuclear reactors?
I was trying to find the country of a nuclear reactor, by adding the line
?reactors wdt:P17 ?country .
into the query. However, the problem is that not all nuclear reactors have a country attribute (because in Wikidata "country" means "sovereign state of this item"). For example, Bhabha Atomic Research Centre (https://www.wikidata.org/wiki/Q854682) doesn't have a "country" attribute, although from the description we know that it is based in India. So if I add this line to the query, it will filter out this entry (Bhabha Atomic Research Centre). Is this something we should be concerned about?
Although I haven't started on Wikipedia yet, I'm a bit concerned: would querying both Wikidata and Wikipedia cause overlap, since Wikidata stores Wikipedia's data?
If Wikidata contains Wikipedia's data, how come there is a "Category: Nuclear power reactor types" in Wikipedia but not in Wikidata? Am I misunderstanding something?
Would you mind discussing these issues with me? Either here, or I could make an appointment with you if you feel it would be easier to talk in person.
On a separate note - from the above message I noticed that adding the code block in Item 2 breaks the Itemize pattern in Markdown, judging from the indentation of the lines after the code block before Item 3. Do you know how to fix this formatting?
At this level, plant & reactor should be effectively synonymous to you. Is there a particular case in which these words are used in a confusing way? The only caveat to that is when there is more than one reactor at a single site (sometimes 2 or 3... maybe up to 8 in international contexts). In those cases, please note the number of reactors (sometimes called reactor units) and individual information about each unit (the units often come online at different times, though they are typically the same type.)
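One way to record a site with several units is a small pair of record types, so the unit count and the per-unit details live together. This is only a sketch: the class names `Plant` and `ReactorUnit`, their fields, and the sample values are all hypothetical placeholders, not anything from scraping_wikidata.py or a real site.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReactorUnit:
    """One reactor unit at a site (hypothetical record layout)."""
    name: str
    reactor_type: Optional[str] = None
    start_year: Optional[int] = None  # units often come online at different times


@dataclass
class Plant:
    """A site that hosts one or more reactor units (hypothetical record layout)."""
    name: str
    country: Optional[str] = None  # left as None when the source has no country
    units: List[ReactorUnit] = field(default_factory=list)

    @property
    def unit_count(self) -> int:
        return len(self.units)


# Placeholder data for illustration only, not a real plant.
plant = Plant(name="Example Plant")
plant.units.append(ReactorUnit("Unit 1", reactor_type="PWR", start_year=1980))
plant.units.append(ReactorUnit("Unit 2", reactor_type="PWR", start_year=1984))
print(plant.unit_count)
```

A single-reactor plant is then just a `Plant` with one entry in `units`, so both cases go through the same code path.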
Regarding Markdown: indent the code block to match the list item's text, and the code won't break the list.
Sorry it took so long to answer these. Please don't hesitate to pester me, set up a meeting, etc. if it seems like I've dropped the ball.
2: .. ?reactors wdt:P17 ?country .... if I add this line into the query, it will filter out this entry (Bhabha Atomic Research Centre). Is this something that we should be concerned about?
Yes, we don't want to miss reactors. If they are missing a country attribute, then perhaps make it blank.
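One way to get a blank country instead of a dropped row is SPARQL's `OPTIONAL` clause, which leaves `?country` unbound when the item has no P17 statement. A minimal sketch, assuming the existing query selects items with `wdt:P31/wdt:P279*`; the function name `build_reactor_query` and the `class_qid` parameter are hypothetical, and you would pass whatever class ID the script already uses:

```python
def build_reactor_query(class_qid: str) -> str:
    """Build a SPARQL query that keeps reactors with no country statement.

    class_qid is the Wikidata class to match (hypothetical parameter;
    pass the ID the existing script already queries for).
    """
    return f"""
SELECT ?reactors ?reactorsLabel ?countryLabel WHERE {{
  ?reactors wdt:P31/wdt:P279* wd:{class_qid} .
  # OPTIONAL keeps the row even when no P17 (country) statement exists;
  # ?country is then unbound, i.e. blank in the results.
  OPTIONAL {{ ?reactors wdt:P17 ?country . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""
```

With the plain triple pattern the country acts as a required match, which is why entries such as Bhabha Atomic Research Centre disappear; wrapping it in `OPTIONAL` makes it a best-effort lookup.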
Although I haven't started on Wikipedia yet, I'm a bit concerned: would querying both Wikidata and Wikipedia cause overlap, since Wikidata stores Wikipedia's data?
Yes, there may be some overlap, which should be handled with some validation and error checking on the programming side of things. Can you think of ways to check whether a particular entry already exists in the database? If you can, then can you think of how you'll avoid duplicate entries? One could write a function to check the former and then use that function to determine whether to add a new entry.
If Wikidata contains Wikipedia's data, how come there is a "Category: Nuclear power reactor types" in Wikipedia but not in Wikidata? Am I misunderstanding something?
Wikidata is the data backend for some of Wikipedia's data. It could be much better organized, but it's just not that well organized or synchronized at this point.
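The check-then-add idea could look roughly like the sketch below. It assumes entries are stored in a SQLite table and that every entry, whether it came from Wikidata or Wikipedia, can be keyed by its Wikidata item ID; the table name `reactors`, its columns, and the function names `entry_exists` and `add_if_new` are all hypothetical.

```python
import sqlite3
from typing import Optional


def entry_exists(conn: sqlite3.Connection, wikidata_id: str) -> bool:
    """Return True if an entry with this Wikidata ID is already stored."""
    row = conn.execute(
        "SELECT 1 FROM reactors WHERE wikidata_id = ?", (wikidata_id,)
    ).fetchone()
    return row is not None


def add_if_new(conn: sqlite3.Connection, wikidata_id: str, name: str,
               country: Optional[str] = None) -> bool:
    """Insert the entry only if it is not already present.

    Returns True when a row was added, False when it was a duplicate.
    """
    if entry_exists(conn, wikidata_id):
        return False
    conn.execute(
        "INSERT INTO reactors (wikidata_id, name, country) VALUES (?, ?, ?)",
        (wikidata_id, name, country),
    )
    conn.commit()
    return True


# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reactors (wikidata_id TEXT PRIMARY KEY, name TEXT, country TEXT)"
)
print(add_if_new(conn, "Q854682", "Bhabha Atomic Research Centre"))  # first insert
print(add_if_new(conn, "Q854682", "Bhabha Atomic Research Centre"))  # duplicate
```

The `PRIMARY KEY` on `wikidata_id` is a second line of defense: even if the check is skipped somewhere, the database itself would reject a duplicate ID.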