# Data collection - US Energy Information Administration (EIA)

- To collect EIA data, you need an API key: https://www.eia.gov/opendata/

- The indicators, together with an example URL and the API parameters: https://www.eia.gov/opendata/browser/international?frequency=annual&data=value;&facets=productId;&productId=31;32;&sortColumn=period;&sortDirection=desc;



In [1]:
import requests
import json
import pandas as pd
import time
from dotenv import load_dotenv
import os


In [None]:
# load the API key from the .env file
load_dotenv()
api_key = os.getenv("EIA_API_KEY")
# or set it directly: api_key = "your_api_key_here"
# confirm the key was found without printing the secret itself
print("API key loaded:", api_key is not None)
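A missing key only shows up later as a failed request, so it can help to check for it up front. The sketch below is one possible way to do that. The helper names `require_env` and `mask` are hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

After `load_dotenv()`, a call such as `api_key = require_env("EIA_API_KEY")` followed by `print(mask(api_key))` confirms that a key was loaded while keeping it out of the saved notebook output.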

### Selecting the indicators

A. Energy mix: ELECTRICITY GENERATION

- Electricity (total electricity generation)
- Coal (electricity from coal)
- Natural gas
- Oil
- Nuclear
- Hydroelectricity
- Wind
- Solar
- Geothermal
- Biomass and waste
- Non-hydroelectric renewables
- Renewables

B. Primary Energy
- Primary energy
- Fossil fuels
- Renewables and other

C. Emissions
- CO2 emissions

D. Other indicators useful for decoupling analysis
- Energy intensity (available directly from EIA)
- Gross domestic product (already provided by EIA)
- Population

## Collecting the data

In [None]:
# ========================
# PRODUCTS SELECTED FOR THE COURSE
# ========================

# 1. Electricity (electricity mix)  → productId = 2
# 2. Coal (main fossil fuel)  → productId = 30   # main, stable variant
# 3. Natural gas  → productId = 31
# 4. Oil  → productId = 32
# 5. Renewables (all renewables combined)  → productId = 29
# 6. Solar  → productId = 116
# 7. Wind  → productId = 37
# 8. Hydroelectricity  → productId = 33
# 9. Nuclear  → productId = 27
# 10. Primary energy (the country's total energy)  → productId = 44
# 11. CO2 emissions  → productId = 4008
# 12. Energy intensity  → productId = 47
# 13. GDP  → productId = 4701
# 14. Population  → productId = 4702
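The list above can also be kept as a dictionary, so that names and IDs stay together and the ID list used for downloading is derived from it rather than typed a second time. This is a minimal sketch. The IDs are copied as listed above and have not been re-checked against the EIA browser, and the name `PRODUCTS` is an assumption of this example.

```python
# Product name -> productId, copied from the list above (not re-verified against EIA).
PRODUCTS = {
    "Electricity": 2,
    "Coal": 30,
    "Natural gas": 31,
    "Oil": 32,
    "Renewables": 29,
    "Solar": 116,
    "Wind": 37,
    "Hydroelectricity": 33,
    "Nuclear": 27,
    "Primary energy": 44,
    "CO2 emissions": 4008,
    "Energy intensity": 47,
    "GDP": 4701,
    "Population": 4702,
}

# Derive the list of IDs to download; dicts preserve insertion order.
PRODUCT_IDS = list(PRODUCTS.values())

# Guard against accidentally listing the same ID under two names.
assert len(set(PRODUCT_IDS)) == len(PRODUCT_IDS)
```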


## MINIMAL EXAMPLE: a single product (Electricity, productId = 2)

In [3]:
# define the parameters for the API request

API = api_key
URL = "https://api.eia.gov/v2/international/data/"

params = {
    "api_key": api_key,
    "frequency": "annual",
    "data[0]": "value",
    "start": "2000",
    "end": "2025",
    "facets[productId][]": "2",    # exactly as in the API browser
    "sort[0][column]": "period",
    "sort[0][direction]": "desc",
    "offset": 0,
    "length": 5000
}

# make the API request and build a DataFrame
r = requests.get(URL, params=params)
r.raise_for_status()  # stop early on an HTTP error (e.g. an invalid key)
df = pd.DataFrame(r.json()["response"]["data"])

df.head()


,period,productId,productName,activityId,activityName,countryRegionId,countryRegionName,countryRegionTypeId,countryRegionTypeName,dataFlagId,dataFlagDescription,unitName,value,unit
0,2023,2,Electricity,2,Consumption,ABW,Aruba,c,Country,,,billion kilowatthours,0.824035908,BKWH
1,2023,2,Electricity,2,Consumption,AFG,Afghanistan,c,Country,,,billion kilowatthours,6.46823218,BKWH
2,2023,2,Electricity,2,Consumption,AGO,Angola,c,Country,,,billion kilowatthours,16.213833931,BKWH
3,2023,2,Electricity,2,Consumption,ALB,Albania,c,Country,,,billion kilowatthours,7.489578147,BKWH
4,2023,2,Electricity,2,Consumption,ARE,United Arab Emirates,c,Country,,,billion kilowatthours,157.973627617,BKWH


In [22]:
df.period.unique()

array(['2023', '2022'], dtype=object)
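The output above lists only 2023 and 2022 although the request covers 2000 to 2025. With results sorted by period in descending order and `length` set to 5000, the most likely explanation is that a single page holds only the newest rows. The sketch below shows one way to detect this from the JSON payload. It assumes the response carries a `total` field with the full row count, which should be confirmed against the EIA API v2 documentation. The function names are hypothetical.

```python
def is_truncated(payload: dict) -> bool:
    """Return True when the API reports more rows than this page contains.

    Assumes payload["response"]["total"] holds the full row count
    (int or numeric string); falls back to the page size if it is absent.
    """
    response = payload["response"]
    page_rows = len(response["data"])
    total = int(response.get("total", page_rows))
    return total > page_rows


def years_covered(payload: dict) -> list:
    """Return the sorted list of distinct periods present in this page."""
    return sorted({row["period"] for row in payload["response"]["data"]})
```

With the request above, `is_truncated(r.json())` returning `True` would indicate that further pages have to be fetched by increasing `offset`.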

## Function for downloading all the data for the project

In [36]:
import requests
import pandas as pd

API = api_key
URL = "https://api.eia.gov/v2/international/data/"
PAGE_SIZE = 5000  # rows requested per call

def get_eia_product(pid):
    """Download all annual rows (2000-2025) for one productId, page by page."""
    rows = []
    offset = 0

    while True:
        params = {
            "api_key": API,
            "frequency": "annual",
            "data[0]": "value",
            "start": "2000",
            "end": "2025",
            "facets[productId][]": pid,
            "sort[0][column]": "period",
            "sort[0][direction]": "desc",
            "offset": offset,
            "length": PAGE_SIZE
        }

        res = requests.get(URL, params=params, timeout=60)
        res.raise_for_status()  # fail loudly on HTTP errors
        data = res.json()["response"]["data"]

        rows.extend(data)

        # a page shorter than PAGE_SIZE means this was the last one
        if len(data) < PAGE_SIZE:
            break

        offset += PAGE_SIZE

    return pd.DataFrame(rows)
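A full download issues many requests in a row, and a single transient network error would abort the whole loop. A minimal retry helper with exponential backoff, sketched below, is one way to make each call more robust. The name `with_retries` and its parameters are hypothetical and not part of the original notebook. Catching every `Exception` is a simplification chosen for brevity.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` calls have failed.

    The wait doubles after each failure: base_delay, 2*base_delay, ...
    The `sleep` argument is injectable so the helper can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Inside `get_eia_product`, the request could then be wrapped as `res = with_retries(lambda: requests.get(URL, params=params, timeout=30).json())`.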



## Collecting all indicators

In [None]:
df_list = []
PRODUCT_IDS = [2, 30, 31, 32, 29, 37, 116, 33, 27, 44, 4008, 47, 4701, 4702] # list of productIds

for pid in PRODUCT_IDS:
    df_list.append(get_eia_product(pid))

df = pd.concat(df_list, ignore_index=True)


In [None]:
# the same download, with a tqdm progress bar
# %pip install tqdm  (run this if tqdm is not installed)
from tqdm import tqdm

df_list = []
PRODUCT_IDS = [2, 30, 31, 32, 29, 37, 116, 33, 27, 44, 4008, 47, 4701, 4702]

for pid in tqdm(PRODUCT_IDS, desc="Downloading EIA products"):
    df_list.append(get_eia_product(pid))

df = pd.concat(df_list, ignore_index=True)

Downloading EIA products: 100%|██████████| 14/14 [03:17<00:00, 14.08s/it]


## Exploring the data

In [41]:
print(df.shape)
df.head()

(332324, 14)


,period,productId,productName,activityId,activityName,countryRegionId,countryRegionName,countryRegionTypeId,countryRegionTypeName,dataFlagId,dataFlagDescription,unitName,value,unit
0,2023,2,Electricity,2,Consumption,ABW,Aruba,c,Country,,,billion kilowatthours,0.824035908,BKWH
1,2023,2,Electricity,2,Consumption,AFG,Afghanistan,c,Country,,,billion kilowatthours,6.46823218,BKWH
2,2023,2,Electricity,2,Consumption,AGO,Angola,c,Country,,,billion kilowatthours,16.213833931,BKWH
3,2023,2,Electricity,2,Consumption,ALB,Albania,c,Country,,,billion kilowatthours,7.489578147,BKWH
4,2023,2,Electricity,2,Consumption,ARE,United Arab Emirates,c,Country,,,billion kilowatthours,157.973627617,BKWH


In [43]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 332324 entries, 0 to 332323
Data columns (total 14 columns):
 #   Column                 Non-Null Count   Dtype 
---  ------                 --------------   ----- 
 0   period                 332324 non-null  object
 1   productId              332324 non-null  object
 2   productName            332324 non-null  object
 3   activityId             332324 non-null  object
 4   activityName           332324 non-null  object
 5   countryRegionId        332324 non-null  object
 6   countryRegionName      332324 non-null  object
 7   countryRegionTypeId    332324 non-null  object
 8   countryRegionTypeName  332324 non-null  object
 9   dataFlagId             54576 non-null   object
 10  dataFlagDescription    54576 non-null   object
 11  unitName               332324 non-null  object
 12  value                  332324 non-null  object
 13  unit                   332324 non-null  object
dtypes: object(14)
memory usage: 35.5+ MB


In [42]:
# Distribution of activities (production / consumption / imports / generation etc.)
df['activityName'].value_counts()



activityName
Generation             144771
Capacity                36112
Consumption             24412
Imports                 24100
Exports                 23740
Net imports             23672
Production              18426
Population              12452
GDP                     12440
Emissions                6161
Distribution losses      6038
Name: count, dtype: int64

In [None]:
# group by productName and get descriptive statistics for 'value'
df.groupby("productName")['value'].describe()


productName,count,unique,top,freq
CO2 emissions,6161,4938,--,202
Coal,6037,2151,0,3110
Electricity,95501,41646,0,29745
Energy intensity,12186,9917,0,1082
Gross domestic product,6347,5337,,493
Hydroelectricity,30185,15461,0,6857
Natural gas,6037,2883,0,2386
Nuclear,30185,4697,0,21640
Oil,6036,3773,0,336
Population,6359,5355,,487


In [46]:
len(df.countryRegionName.unique())

260

In [None]:
# unique years available
df.period.unique()

array(['2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016',
       '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008',
       '2007', '2006', '2005', '2004', '2003', '2002', '2001', '2024'],
      dtype=object)

In [None]:
# available columns
df.columns

Index(['period', 'productId', 'productName', 'activityId', 'activityName',
       'countryRegionId', 'countryRegionName', 'countryRegionTypeId',
       'countryRegionTypeName', 'dataFlagId', 'dataFlagDescription',
       'unitName', 'value', 'unit'],
      dtype='object')

In [None]:
# product map: productId, productName, activityName
product_map = df[['productId', 'productName',"activityName"]].drop_duplicates().sort_values('productId')
product_map

,productId,productName,activityName
174131,116,Solar,Generation
173868,116,Solar,Capacity
0,2,Electricity,Consumption
262,2,Electricity,Imports
1266,2,Electricity,Exports
1910,2,Electricity,Capacity
2170,2,Electricity,Distribution losses
2434,2,Electricity,Generation
2695,2,Electricity,Net imports
234497,27,Nuclear,Generation
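In the table above, `116` comes before `2` and `27` because `productId` is a string column and is therefore sorted lexicographically. If numeric order is preferred, `sort_values` accepts a `key` function (pandas 1.1 or later) that can cast the IDs to integers for the comparison only. The sketch below uses a small illustrative frame in place of the real `product_map`.

```python
import pandas as pd

# Illustrative stand-in for product_map, with productId stored as strings.
ids = pd.DataFrame({
    "productId": ["116", "2", "27"],
    "productName": ["Solar", "Electricity", "Nuclear"],
})

# Sort by the integer value of productId; the column itself stays a string.
sorted_ids = ids.sort_values("productId", key=lambda s: s.astype(int))
```

The same `key` argument can be added to the `sort_values('productId')` call that builds `product_map`.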
