# 📊 Ukrainian NPPs: Open Data Analysis 

## About

This project explores open datasets on radioactive emissions and discharges from  
nuclear power plants (NPPs) in Ukraine. Using Python, Pandas, and Plotly, we  
analyze environmental indicators across several stations and visualize quarterly  
changes over time.

### Goal  

The goal is to better understand trends in emission levels and to present the  
data in an accessible, interactive format.  
All data is retrieved from publicly available government sources.

_**(For more information, see README.md.)**_

## Plan of Book

1. [Work with API](#Work-with-API)
2. [Research dataset metadata](#Research-dataset-metadata)
3. [Dataset passport overview](#Dataset-passport-overview)
4. [Dataset acquisition and analysis](#Dataset-acquisition-and-analysis)
    * [Format data](#Format-data)
    * [Stations](#Stations)
5. [Dataset visualization](#Dataset-visualization)
6. [Summary](#Summary)

## Work with API  

Following the API documentation, we create the variables needed to load and  
explore the dataset metadata. The `dataset_id` is taken from the dataset page.

In [1]:
dataset_id = "4a9d3d56-bd95-4c3e-97e7-1cdc7bcbd445"

## Research dataset metadata

Download dataset metadata.

In [2]:
from common.client import DataGovUAClient

# This script fetches metadata for a specific dataset from the DataGovUA API.
sesion = DataGovUAClient(dataset_id)
sesion.fetch_metadata()
sesion.BASE_URL + sesion.dataset_id

Metadata and resources is fetched.


'https://data.gov.ua/api/3/action/package_show?id=4a9d3d56-bd95-4c3e-97e7-1cdc7bcbd445'

You can inspect the metadata via `sesion.metadata`:

In [3]:
# sesion.metadata

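Next, we add the identifiers of the dataset resource and of its passport. Both  
are listed in the metadata. Note that `dataset_id` now holds a resource id  
rather than the package id used above.  
For more details, look carefully at the `metadata`: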

In [4]:
dataset_id = "d55eebcf-4660-4919-96b3-4894be5a6cda"
passport_id = "afa0c772-2554-4b9a-98b4-980e54b1e21a"

Next, we look up the resource links in the metadata by `id`:

In [5]:
dataset_url = sesion.get_resource_url(resource_id=dataset_id)
passport_url = sesion.get_resource_url(resource_id=passport_id)

print(dataset_url)
print(passport_url)

https://data.gov.ua/dataset/c445c6ea-f0c3-4167-abb1-5afb4a0e5499/resource/d55eebcf-4660-4919-96b3-4894be5a6cda/download/nuclear_safety_q1_2025.xlsx
https://data.gov.ua/dataset/c445c6ea-f0c3-4167-abb1-5afb4a0e5499/resource/afa0c772-2554-4b9a-98b4-980e54b1e21a/download/pasport-naboru-danikh.xlsx
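
The lookup by `id` can be sketched as follows. This is a minimal illustration, assuming the metadata follows the usual CKAN `package_show` layout (`result` → `resources`, each with `id` and `url`). The function name `find_resource_url` and the sample response are hypothetical and are not the actual `DataGovUAClient` code.

```python
from typing import Any


def find_resource_url(metadata: dict[str, Any], resource_id: str) -> str:
    """Return the download URL of the resource with the given id.

    Hypothetical helper: it only illustrates what a method such as
    `get_resource_url` might do with a package_show response.
    """
    resources = metadata.get("result", {}).get("resources", [])
    for resource in resources:
        if resource.get("id") == resource_id:
            return resource["url"]
    raise KeyError(f"Resource {resource_id!r} not found in metadata")


# Trimmed, made-up response in the package_show shape (not real data).
sample_metadata = {
    "success": True,
    "result": {
        "resources": [
            {"id": "res-1", "url": "https://example.org/data.xlsx"},
            {"id": "res-2", "url": "https://example.org/passport.xlsx"},
        ]
    },
}

print(find_resource_url(sample_metadata, "res-2"))
```

Raising `KeyError` for an unknown id makes a mistyped identifier fail early instead of returning `None` and breaking later during download.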


## Dataset passport overview

We now have two resources: the dataset passport and the dataset with quarterly  
observations. Let's look at the passport first and display it as a table.

In [6]:
passport_df = sesion.load_dataframe(resource_url=passport_url)

# Rename the last column, then customize the table styles.
columns_rename = list(passport_df.columns[:-1]) + ["Примітка"]
passport_df.columns = columns_rename

(
    passport_df.fillna("").style.set_table_styles(
        [
            {"selector": "th", "props": [("text-align", "center")]},
            {
                "selector": "td",
                "props": [
                    ("white-space", "pre-wrap"),
                    ("text-align", "left"),
                    ("width", "300px"),
                ],
            },
        ]
    )
)

Unnamed: 0,Назва набору,Екологічна та радіаційна обстановка в зоні розташування атомних електростанцій,Примітка
0,Формати файлів,"xlsx, csv",
1,Шаблон назв файлів,nuclear_safety_QN_РРРР,
2,Ключові слова,"енергоатом, викид, викиди, скид, скиди, радіоактивність, цезій, кобальт, аес, атомна станція, атомні станції",
3,Періодичність оприлюднення,"Щоквартально, до 25 числа місяця, наступного за звітнім періодом",
4,Додаткові уточнення,"У випадку відсутності інформації або наявності значень нижчих за мінімальну активність, що може бути виміряною, залишати комірку пустою",
5,,,
6,Структура набору:,,
7,Назва поля,Переклад на українську,Опис
8,year,рік,Формат РРРР
9,quarter,квартал,"Число від 1 до 4, де ""1"" означає період січень-березень ""2"" - квітень-червень ""3"" - липень-вересень ""4"" - жовтень-грудень"


The passport lists the metric keys that describe how radioactive emissions are  
evaluated.

Let's build a lookup table (a dictionary) of the metrics that we will use when  
creating the graphs. It will contain titles and descriptions in English and  
Ukrainian.

In [7]:
from common.translate_text import get_translation

# Convert the DataFrame to a dictionary with the first column as keys
metric_mapping = {}
for _, row in passport_df.iloc[11:, :].iterrows():
    key = row.iloc[0].replace(" ", "")
    title_en = row.iloc[0].replace("_", " ").replace("  ", " ").title()
    title_ua = row.iloc[1]
    description_ua = row.iloc[2]
    description_en = await get_translation(description_ua)

    metric_mapping[key] = {
        "title_en": title_en,
        "title_ua": title_ua,
        "description_en": description_en,
        "description_ua": description_ua,
    }

metric_mapping

{'Irg': {'title_en': 'Irg',
  'title_ua': 'ІРГ',
  'description_en': 'Number, GBK/day\nThe average daily value of radioactivity of gas-aerosol emissions of inert radioactive gases into the environment',
  'description_ua': 'Число, ГБк/добу\nСередньодобове значення радіоактивності газо-аерозольних викидів інертних радіоактивних газів у навколишнє середовище'},
 'irg_index': {'title_en': 'Irg Index',
  'title_ua': 'Індекс викидів ІРГ',
  'description_en': 'Number, %\nPercentage of permissible daily emissions of inert radioactive gases into the environment',
  'description_ua': 'Число, %\nВідсоток від допустимого добового рівня викидів інертних радіоактивних газів у навколишнє середовище'},
 'iodine_radionuclides': {'title_en': 'Iodine Radionuclides',
  'title_ua': 'Радіонуклідів йоду',
  'description_en': 'Number, KBK/day\nThe average daily value of radioactivity of gas-aerosol emissions of iodine radionuclides into the environment',
  'description_ua': 'Число, кБк/добу\nСередньодобове з

For the detailed analysis, we select the following metrics:

In [8]:
metrics = [
    "iodine_ radionuclides",
    "iodine_ radionuclides_index",
    "stable_radionuclides",
    "cs_137_emission",
    "co_60_ emission",
    "cs_137_dump",
    "co_60_dump",
    "index_radioactive_releas",
    "index_dump",
]

## Dataset acquisition and analysis

First, let's download the dataset from the link.

In [9]:
dataset_df = sesion.load_dataframe(resource_url=dataset_url)
dataset_df.head()

Unnamed: 0,year,quarter,station,irg,irg_index,iodine_ radionuclides,iodine_ radionuclides_index,stable_radionuclides,stable_ radionuclides_index,cs_137_emission,co_60_ emission,cs_137_dump,co_60_dump,volume,index_radioactive_releas,index_dump
0,2018,1,ЗАЕС,89.0,0.13,260.0,"<0,01",650.0,0.03,1980.0,1020.0,4330.0,3670.0,833000.0,0.149,0.33
1,2018,1,РАЕС,105.0,0.16,147.0,"<0,01",269.0,0.07,587.0,165.0,4800.0,620.0,2220000.0,0.78,0.096
2,2018,1,ЮУАЕС,45.0,0.1,76.0,"<0,01",116.0,0.02,136.0,373.0,390.0,370.0,14600.0,0.136,0.284
3,2018,1,ХАЕС,31.0,0.07,26.8,"<0,01",37.5,"<0,01",29.4,13.8,380.0,,22070.0,0.11,0.03
4,2018,2,ЗАЕС,84.0,0.12,262.0,"<0,01",640.0,0.03,453.0,1003.0,4627.0,3432.0,812667.0,0.115,0.91
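
The preview shows text values such as `"<0,01"` next to ordinary numbers, so those columns are not purely numeric. A minimal sketch of how such values behave under `pd.to_numeric(..., errors="coerce")` is given below. The sample values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Illustrative values in the shape seen in the preview (not real data):
# a float, a below-threshold marker, an empty cell, and a numeric string.
raw = pd.Series([0.03, "<0,01", None, "0.07"])

# errors="coerce" turns anything that cannot be parsed into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

print(numeric.tolist())
# Series.max() skips NaN by default.
print(numeric.max())
```

The below-threshold markers therefore become missing values rather than small numbers, which matters when computing maxima or plotting.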


### Format data

We convert each year and quarter into the date of the last day of that quarter,  
formatted as "DD.MM.YYYY".  
This will be needed for visualization.

In [10]:
from common.utils import get_date

dataset_df["date"] = dataset_df.apply(
    lambda row: get_date(year=row["year"], quarter=row["quarter"]), axis=1
)
dataset_df[["year", "quarter", "date"]].head()

Unnamed: 0,year,quarter,date
0,2018,1,31.03.2018
1,2018,1,31.03.2018
2,2018,1,31.03.2018
3,2018,1,31.03.2018
4,2018,2,30.06.2018
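
The body of `get_date` is not shown in this notebook. A minimal sketch that produces the same kind of output (the last day of the quarter as "DD.MM.YYYY") could look like this. The name `quarter_end_date` is hypothetical, and the real helper in `common.utils` may be implemented differently.

```python
import calendar
from datetime import date


def quarter_end_date(year: int, quarter: int) -> str:
    """Return the last day of the given quarter as "DD.MM.YYYY".

    Hypothetical stand-in for `get_date`, written only for illustration.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1..4, got {quarter!r}")
    month = quarter * 3  # last month of the quarter: 3, 6, 9 or 12
    last_day = calendar.monthrange(year, month)[1]  # number of days in month
    return date(year, month, last_day).strftime("%d.%m.%Y")


print(quarter_end_date(2018, 1))
print(quarter_end_date(2018, 2))
```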


### Stations

Let's list all nuclear power plants.

In [11]:
stations = dataset_df["station"].unique()
tuple(stations)

('ЗАЕС', 'РАЕС', 'ЮУАЕС', 'ХАЕС', 'ПАЕС')

In April 2022, Energoatom approved by its order a change of the plant's  
Ukrainian name from «Южно-Українська АЕС» (ЮУАЕС) to «Південноукраїнська АЕС»  
(ПАЕС). Resolution of the Cabinet of Ministers No. 1061 of September 27, 2022  
made the renaming final  
[1](https://epravda.com.ua/news/2022/09/29/692034/).  
Both names translate as "South Ukrainian NPP", so the dataset contains two  
codes for the same plant.

Let's make the station names more readable.

In [12]:
station_map = {
    "ЗАЕС": "Zaporizhzhia NPP",
    "РАЕС": "Rivne NPP",
    "ПАЕС": "South Ukrainian NPP",
    "ХАЕС": "Khmelnytskyi NPP",
    "ЮУАЕС": "South Ukrainian NPP",
}

dataset_df["station_en"] = dataset_df["station"].map(station_map)
dataset_df[["station", "station_en"]].head()

Unnamed: 0,station,station_en
0,ЗАЕС,Zaporizhzhia NPP
1,РАЕС,Rivne NPP
2,ЮУАЕС,South Ukrainian NPP
3,ХАЕС,Khmelnytskyi NPP
4,ЗАЕС,Zaporizhzhia NPP


As you can see in the result, the replacement was successful.
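
Since `head()` only shows the first five rows, it can be useful to check the whole column for codes that the mapping did not cover. The sketch below reuses the same `station_map`. The station series is a stand-in for `dataset_df["station"]`, and the code `"ТЕСТ"` is made up purely to trigger the check.

```python
import pandas as pd

station_map = {
    "ЗАЕС": "Zaporizhzhia NPP",
    "РАЕС": "Rivne NPP",
    "ПАЕС": "South Ukrainian NPP",
    "ХАЕС": "Khmelnytskyi NPP",
    "ЮУАЕС": "South Ukrainian NPP",
}

# Stand-in for dataset_df["station"]; "ТЕСТ" is a made-up code.
stations = pd.Series(["ЗАЕС", "РАЕС", "ЮУАЕС", "ХАЕС", "ПАЕС", "ТЕСТ"])

# Series.map() yields NaN for keys that are missing from the dictionary.
mapped = stations.map(station_map)
unmapped = sorted(stations[mapped.isna()].unique())

print(unmapped)
```

An empty list means every station code received an English name.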

## Dataset visualization 

Now we do some simple exploration of the "Index radioactive releas" metric:  
a number, % (in thousands), describing total radioactive emissions into the  
atmosphere, plotted for all stations.  

This gives us a general understanding of the situation with emissions into the  
atmosphere.

In [13]:
import plotly.graph_objects as go

stations_en_list = sorted(set(station_map.values()))
index_radioactive_max = dataset_df["index_radioactive_releas"].max()

fig = go.Figure()

# Create a separate trace with data for each station.
traces = []
for station in stations_en_list:
    df_station = dataset_df[dataset_df["station_en"] == station]

    trace = go.Scatter(
        x=df_station["date"],
        y=df_station["index_radioactive_releas"],
        name=station,
        visible=True,
    )
    traces.append(trace)
    fig.add_trace(trace)

# Customize buttons for selecting stations.
buttons = []

# Button for selecting all stations.
buttons.append(
    dict(
        label="All",
        method="update",
        args=[
            {"visible": [True] * len(traces)},
            {"title": "Radioactive Release — All Stations"},
        ],
    )
)

# Buttons for selecting each station separately.
for i, station in enumerate(stations_en_list):
    visibility = [False] * len(traces)
    visibility[i] = True
    buttons.append(
        dict(
            label=station,
            method="update",
            args=[
                {"visible": visibility},
                {"title": f"Radioactive Release — {station}"},
            ],
        )
    )

# Add a menu.
fig.update_layout(
    updatemenus=[
        dict(
            type="dropdown",
            active=0,
            buttons=buttons,
            xanchor="right",
            x=1,
            y=1.2,
        )
    ],
    title="Radioactive Release — All Stations",
    xaxis_title="Date",
    xaxis=dict(tickangle=-45, showgrid=True, showline=True, domain=[0, 0.94]),
    yaxis_title="Release Index",
    yaxis=dict(range=[0, index_radioactive_max + 1]),
    template="plotly_dark",
    width=800,
    height=600,
    legend=dict(
        title="Station",
        orientation="v",
        x=1.02,
        y=1,
        xanchor="left",
        yanchor="top",
        bgcolor="rgba(0,0,0,0)",
        borderwidth=1,
        font=dict(
            size=12,
            color="white",
        ),
        itemwidth=40,
    ),
    margin=dict(r=150),
)

fig.show()

On the South Ukrainian NPP graph we can see an anomaly in total radioactive  
emissions into the atmosphere. So this dataset contains interesting data that  
deserves deeper visualization and research.
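
A spike like this can also be located numerically. The sketch below finds each station's peak quarter and compares the peak with the station's median value. The numbers and station names are made up for illustration and are not taken from the dataset, and the peak-to-median ratio is just one simple heuristic, not a formal anomaly test.

```python
import pandas as pd

# Made-up numbers for illustration only (not values from the dataset).
df = pd.DataFrame(
    {
        "station_en": ["A NPP"] * 3 + ["B NPP"] * 3,
        "date": ["31.03.2018", "30.06.2018", "30.09.2018"] * 2,
        "index_radioactive_releas": [0.10, 0.12, 0.11, 0.20, 2.50, 0.22],
    }
)

grouped = df.groupby("station_en")["index_radioactive_releas"]

# idxmax() returns the row label of the maximum value within each station.
peak_rows = df.loc[grouped.idxmax()]

# A peak far above the station's median hints at an unusual quarter.
medians = grouped.median()
peak_rows = peak_rows.assign(
    peak_to_median=peak_rows["index_radioactive_releas"].to_numpy()
    / medians.loc[peak_rows["station_en"]].to_numpy()
)

print(peak_rows[["station_en", "date", "peak_to_median"]])
```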

In the next step, we build a combined graph with all stations and the most  
interesting metrics of this dataset in one place.



In [14]:
import textwrap

import pandas as pd
import plotly.graph_objects as go

# Make sure that the columns with metrics are numeric.
for metric in metrics:
    dataset_df[metric] = pd.to_numeric(dataset_df[metric], errors="coerce")

# Next, we group the data by stations.
stations_en_list = sorted(set(dataset_df["station_en"]))
station_data = {
    station: dataset_df[dataset_df["station_en"] == station]
    for station in stations_en_list
}

# Precalculate y-values for each station and metric.
y_values = {station: {} for station in stations_en_list}
for station in stations_en_list:
    for metric in metrics:
        y_values[station][metric] = station_data[station][metric].tolist()

# We define the initial metric (index_radioactive_releas).
if "index_radioactive_releas" in metrics:
    metrics.remove("index_radioactive_releas")
    metrics.insert(0, "index_radioactive_releas")

default_metric = metrics[0]

# Generate initial traces for each station (all show default_metric).
traces = []
for station in stations_en_list:
    trace = go.Scatter(
        x=station_data[station]["date"],
        y=y_values[station][default_metric],
        name=station,
        visible=True,
    )
    traces.append(trace)

# Create a figure with the initial traces.
fig = go.Figure(data=traces)

# Calculate the global maximum for the first metric.
# pandas' max() skips the NaN values left by the numeric coercion above,
# whereas the built-in max() on a list gives order-dependent results with NaN.
global_max = dataset_df[default_metric].max()

# Creating buttons for filtering by stations
station_buttons = []

# "All stations" button for all stations
station_buttons.append(
    dict(
        label="All stations",
        method="update",
        args=[
            {"visible": [True] * len(traces)},
            {"title": f"{default_metric} — All Stations"},
        ],
    )
)

# Separate buttons for each station.
for i, station in enumerate(stations_en_list):
    visibility = [False] * len(traces)
    visibility[i] = True
    station_buttons.append(
        dict(
            label=station,
            method="update",
            args=[
                {"visible": visibility},
                {"title": f"{default_metric} — {station}"},
            ],
        )
    )

# Forming buttons for selecting metrics.
metric_buttons = []
for metric in metrics:
    # For each metric, prepare new y-data for each trace.
    new_y = [y_values[station][metric] for station in stations_en_list]

    # Calculate the global maximum to set the y-axis range.
    # pandas' max() skips NaN, unlike the built-in max() on a list.
    global_max_metric = dataset_df[metric].max()

    # Get the title and description from mapping.
    metric_clear = metric.replace(" ", "")
    metric_title = metric_mapping[metric_clear]["title_en"]
    description_text = metric_mapping[metric_clear]["description_en"]

    # Auto-wrap lines, for example, 35 characters each
    wrapped_description = "<br>".join(textwrap.wrap(description_text, width=35))

    metric_buttons.append(
        dict(
            label=metric_title,
            method="update",
            args=[
                {"y": new_y},
                {
                    "yaxis": {"range": [0, global_max_metric + 1]},
                    "title": {
                        "text": f"{metric_title} — All Stations",
                        "x": 0.5,
                        "xanchor": "center",
                        "font": {"size": 16},
                    },
                    "annotations": [
                        dict(
                            x=1.05,
                            y=0.40,
                            xref="paper",
                            yref="paper",
                            showarrow=False,
                            font=dict(size=12),
                            align="left",
                            xanchor="left",
                            yanchor="middle",
                            text=wrapped_description,
                        )
                    ],
                },
            ],
        )
    )


# Add two dropdown menus to the layout (one for stations, the other for metrics).
fig.update_layout(
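    title={
        # Use the default metric's title so it matches the initial traces.
        "text": (
            f"{metric_mapping[default_metric.replace(' ', '')]['title_en']}"
            " — All Stations"
        ),
        "x": 0.5,
        "xanchor": "center",
        "font": {"size": 16},
    },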
    updatemenus=[
        dict(
            buttons=station_buttons,
            direction="down",
            showactive=True,
            x=1.05,
            xanchor="left",
            y=1.0,
            yanchor="top",
            pad={"r": 10, "t": 10},
            active=0,
        ),
        dict(
            buttons=metric_buttons,
            direction="down",
            showactive=True,
            x=1.05,
            xanchor="left",
            y=0.9,
            yanchor="top",
            pad={"r": 10, "t": 10},
            active=0,
        ),
    ],
    xaxis_title="Date",
    xaxis=dict(tickangle=-45, showgrid=True, showline=True, domain=[0, 0.94]),
    yaxis_title="Value",
    yaxis=dict(range=[0, global_max + 1]),
    template="plotly_dark",
    width=1000,
    height=600,
    margin=dict(r=150),
    # Place the legend to the right of the plot area.
    legend=dict(
        orientation="v",
        x=1.05,
        y=0.75,
        xanchor="left",
        borderwidth=1,
        font=dict(
            size=12,
            color="white",
        ),
    ),
    annotations=[
        dict(
            x=1.05,
            y=0.40,
            xref="paper",
            yref="paper",
            showarrow=False,
            font=dict(size=12),
            align="left",
            xanchor="left",
            yanchor="middle",
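            # Describe the default metric, which is the one shown initially.
            text="<br>".join(
                textwrap.wrap(
                    metric_mapping[default_metric.replace(" ", "")][
                        "description_en"
                    ],
                    width=35,
                )
            ),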
        )
    ],
)

fig.show()

This interactive graph shows how emissions into the atmosphere differ between  
stations.

## Summary  

So, in this book we:  
- took an open dataset from a government site  
- explored it with Python and Pandas  
- found anomalies in the data  
- created an informative graph with key metrics  

Thank you for doing this open data research with me.  
Good luck!  