# Nuclear power around the world

### It is challenging to find a comprehensive dataset of power production at the reactor level. Public data is available on pris.iaea.org, but the site offers no way to download it, so I built a simple web scraper to collect everything I need.

[GitHub Repo](https://github.com/Letsopappaaa/Nuclear_Power), which includes all source tables used for the visualizations.

In [37]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
import numpy as np

In [None]:
baseURL = 'https://pris.iaea.org/PRIS/CountryStatistics/ReactorDetails.aspx?current='

headers = {
    'User-Agent': 'Personal project; Your Name',
    'From': 'your@email.com'
}

main_tables_list = []
reactor_details_list = []

#The URL of reactor details simply ends in the index number of the reactor.
for i in range(1,1150):
	cur_url = baseURL + str(i)
	#A timeout keeps the loop from hanging indefinitely on an unresponsive request
	cur_html = requests.get(cur_url, headers=headers, timeout=30).text
	soup = BeautifulSoup(cur_html, 'html.parser')
	#Some reactor index numbers are not populated; those URLs return an error page
	page_heading = soup.find('div', id='content').h3.text.strip()
	if page_heading not in ("Unauthorized Access", "Unexpected Problem Occurred"):
		##Parsing HTML to find country and reactor name, html tables
		tables = pd.read_html(cur_html)
		reactor_name = soup.find('span', id='MainContent_MainContent_lblReactorName').b.text.strip()
		country_name = soup.find('a', id='MainContent_litCaption').text.strip().capitalize()
        
		#Replace country names that contain commas. Tableau mishandled commas inside strings in the CSV files, and I could not find a way to make it parse them properly.
		if country_name == "Iran, islamic republic of":
			country_name = "Iran"
		elif country_name == "Korea, republic of":
			country_name = "Korea"
		elif country_name == "Taiwan, china":
			country_name = "Taiwan"

		#Some reactors do not have any production data, and no main data table which means there are only 2 tables on the page
		if len(tables)>2:
			##Amending main data table with country and reactor names
			tables[2]["Country"] = country_name
			tables[2]["Reactor_name"] = reactor_name
			#Add main table rows to the list (extend avoids rebuilding the list on every iteration)
			main_tables_list.extend(tables[2].values.tolist())

		#Add reactor details to list
		current_detail = [tables[0][0][1],
                          tables[0][1][1],
                          tables[0][1][3],
                          tables[0][0][5],
                          tables[0][1][7],
                          tables[0][2][1],
                          tables[0][3][1],
                          tables[0][0][9],
                          country_name,
                          reactor_name]
		reactor_details_list.append(current_detail)
		message = "Reactor ID: (" + str(i) + ") details collected."
	else:
		message = "Reactor ID: (" + str(i) + ") did not return a valid page."
	print(message)
	time.sleep(0.3)
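
The `if`/`elif` chain that strips commas from country names can also be written as a dictionary lookup, which stays readable if more names ever need replacing. This is only a minimal sketch: the helper `normalize_country` and the constant `COUNTRY_RENAMES` are hypothetical names that do not appear in the notebook, and the three mappings are the ones used in the loop above.

```python
# Hypothetical lookup table: capitalized captions that contain commas,
# mapped to the short names used in the scraping loop.
COUNTRY_RENAMES = {
    "Iran, islamic republic of": "Iran",
    "Korea, republic of": "Korea",
    "Taiwan, china": "Taiwan",
}


def normalize_country(raw_caption: str) -> str:
    """Strip and capitalize a caption, then swap in a comma-free name if one is defined."""
    name = raw_caption.strip().capitalize()
    # dict.get falls back to the unchanged name when no replacement exists
    return COUNTRY_RENAMES.get(name, name)


print(normalize_country("  KOREA, REPUBLIC OF "))
print(normalize_country("France"))
```

Inside the loop, this would replace the three-branch `if`/`elif` block with a single call on the caption text.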
    

#### Creating dataframes from our collected lists

In [42]:
reactor_details_df = pd.DataFrame(reactor_details_list, columns=[
	"reactor_Type", 
	'model', 
	'design_net_capacity', 
	'construction_start_date', 
	"commercial_operation_date", 
	"owner", 
	"operator", 
	"Shutdown Date",
	"country", 
	"reactor_name"])

main_tables_df = pd.DataFrame(main_tables_list, columns = [
	"Year", 
	"Electricity Supplied [GW.h]", 
	"Reference Unit Power [MW]", 
	"Annual Time On Line [h]", 
	"Operation Factor [%]", 
	"Annual Energy Availability Factor [%]", 
	"Cumulative Energy Availability Factor [%]", 
	"Annual Load Factor [%]", 
	"Cumulative Load Factor [%]", 
	"country", 
	"reactor_name"])

### Data cleanup

In [43]:
#Clean up string values, empty values and NaN in all columns that should have only numerical data
for col in main_tables_df.columns:
    if "[" in col:
        main_tables_df[col] = (pd.to_numeric(main_tables_df[col],errors='coerce').fillna(0))

for col in reactor_details_df.columns:
    reactor_details_df[col] = reactor_details_df[col].fillna(0)

#Keep only numeric values for capacity
reactor_details_df["design_net_capacity"] = reactor_details_df["design_net_capacity"].str.extract(r'(\d+)', expand=False)
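
To make the cleanup steps easier to follow, here is a small sketch that applies the same two operations to a toy dataframe. The values (including the `1000 MWe` style of the capacity strings) are made up for illustration and may not match the exact text on PRIS.

```python
import pandas as pd

# Toy data (made up): one numeric-looking column with junk values,
# and one capacity column that mixes digits with a unit.
toy = pd.DataFrame({
    "Electricity Supplied [GW.h]": ["1234.5", "", "n/a"],
    "design_net_capacity": ["1000 MWe", "440 MWe", None],
})

# errors='coerce' turns anything that is not a number into NaN,
# which fillna(0) then replaces with 0
toy["Electricity Supplied [GW.h]"] = pd.to_numeric(
    toy["Electricity Supplied [GW.h]"], errors="coerce"
).fillna(0)

# Keep only the first run of digits; expand=False returns a Series
toy["design_net_capacity"] = toy["design_net_capacity"].str.extract(
    r"(\d+)", expand=False
)

print(toy)
```

Note that `str.extract` returns strings, so the capacity column still needs `pd.to_numeric` if it is going to be used in arithmetic.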


### Adding columns

In [44]:
# Add reactor status based on the available commercial operation date or shutdown date
reactor_details_df["Status"] = np.where(reactor_details_df["Shutdown Date"]!= 0, "Permanent shutdown",
                            np.where(reactor_details_df["commercial_operation_date"]== 0, "Under Construction", "Operational"))
main_tables_df["Calendar_Date"] = pd.to_datetime(dict(year=main_tables_df['Year'], month=1, day=1)).dt.date
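
The nested `np.where` call can be hard to read at a glance. As an alternative sketch (not what the notebook uses), `np.select` expresses the same logic as an ordered list of conditions. The toy rows below are made up, with `0` standing in for a missing date as it does after `fillna(0)`.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up dates); 0 marks a missing date
toy = pd.DataFrame({
    "Shutdown Date": ["01 Jan, 2000", 0, 0],
    "commercial_operation_date": ["01 Jan, 1970", 0, "01 Jan, 1985"],
})

is_shut_down = (toy["Shutdown Date"] != 0).to_numpy()
never_started = (toy["commercial_operation_date"] == 0).to_numpy()

# Nested np.where, the same shape as the cell above
status_where = np.where(
    is_shut_down, "Permanent shutdown",
    np.where(never_started, "Under Construction", "Operational"),
)

# np.select checks the conditions in order and takes the first match
status_select = np.select(
    [is_shut_down, never_started],
    ["Permanent shutdown", "Under Construction"],
    default="Operational",
)

toy["Status"] = status_select
print(toy["Status"].tolist())
```

Both versions give the same result; the order of the conditions matters because a reactor with a shutdown date is classified as shut down regardless of its other dates.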

#### Saving dataframes to .csv files for further use. Available in git repo.

In [51]:
main_tables_df.to_csv(path_or_buf = "main_tables_output.csv", index=False)
reactor_details_df.to_csv(path_or_buf = "reactor_details_output.csv", index=False)

#Load the dataframes from the CSV files if necessary (to_csv above writes comma-separated files, which is also the read_csv default)
#main_tables_df = pd.read_csv("main_tables_output.csv")
#reactor_details_df = pd.read_csv("reactor_details_output.csv")
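
`to_csv` writes comma-separated output unless `sep` is passed, so `read_csv` has to use the same separator to load the files back correctly. The sketch below round-trips a made-up toy frame through an in-memory buffer instead of a file. It also shows that pandas quotes fields containing commas by default, so such values survive the round trip within pandas itself.

```python
import io

import pandas as pd

# Toy frame (made up) with a comma inside a string value
toy = pd.DataFrame({
    "country": ["Korea, republic of", "France"],
    "Year": [2019, 2020],
})

# Write to an in-memory buffer with the default separator (a comma)
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back with the same (default) separator
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

If a file is later re-saved by another tool with a different delimiter, the matching `sep` has to be passed to `read_csv`.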

#### Some additional data formatting was performed in Excel to align with other data sources, then the data was imported into Tableau.


#### Unfortunately, I could not get the sizing quite right to fit the dimensions of a Jupyter notebook, so viewing the dashboard in full-screen mode is suggested.

If the cell below does not render, the dashboard is available here: https://public.tableau.com/app/profile/peter.oravecz/viz/Nuclearpower/Dashboard1

In [8]:
%%HTML
<div class='tableauPlaceholder' id='viz1632432476878' style='position: relative'><noscript><a href='#'><img alt='Dashboard 1 ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Nu&#47;Nuclearpower&#47;Dashboard1&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Nuclearpower&#47;Dashboard1' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Nu&#47;Nuclearpower&#47;Dashboard1&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1632432476878');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='1577px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>