# 1. Context, Objectives and Research Questions

--------------------

## Project Objective
Conduct an in-depth exploratory analysis and apply data science techniques to understand the relationship between oil dependency and Venezuela's economic performance over 64 years (1960-2024).

##  Main Research Questions

- How has Venezuela's GDP evolved over 64 years?

- What is the relationship between oil revenue and economic growth?

- Does Venezuela suffer from the "resource curse"?

- Which historical events most impacted the economy?



## Setup & Preparation

In [2]:
%pip install pandas numpy matplotlib seaborn statsmodels scipy scikit-learn ruptures plotly

Note: you may need to restart the kernel to use updated packages.


## Load Dataset

Data obtained from the Kaggle dataset: https://www.kaggle.com/datasets/ibrahimqasimi/venezuela-resource-dependency-and-economy1960-2023

In [None]:
import pandas as pd
df = pd.read_csv('/kaggle/input/venezuela-resource-dependency-and-economy1960-2023/venezuela_wdi_indicators.csv') #load the file
df #show the dataframe

In [4]:
df.columns

Index(['country_iso3', 'year', 'oil_rents_pct_gdp',
       'total_natural_resource_rents_pct_gdp',
       'fuel_exports_pct_merch_exports',
       'ores_and_metals_exports_pct_merch_exports', 'gdp_current_usd',
       'gdp_growth_pct'],
      dtype='object')

- `country_iso3`: ISO three-letter country code (`VEN` for Venezuela)
- `year`: year of the observation
- `oil_rents_pct_gdp`: oil rents (value of crude oil production minus its production costs) as % of GDP
- `total_natural_resource_rents_pct_gdp`: total natural resource rents (oil, natural gas, coal, mineral and forest rents) as % of GDP
- `fuel_exports_pct_merch_exports`: fuel exports as % of merchandise exports
- `ores_and_metals_exports_pct_merch_exports`: ores and metals exports as % of merchandise exports
- `gdp_current_usd`: GDP in current (nominal) US dollars
- `gdp_growth_pct`: annual GDP growth (%)

In [None]:
df.head(30) #first 30 rows

The first rows already show several null values, so the data must be cleaned before the analysis and before any predictive modelling.

- First, I will count the null values in each column:

In [6]:
df.isna().sum() 

country_iso3                                  0
year                                          0
oil_rents_pct_gdp                            20
total_natural_resource_rents_pct_gdp         20
fuel_exports_pct_merch_exports               16
ores_and_metals_exports_pct_merch_exports    16
gdp_current_usd                               0
gdp_growth_pct                                1
dtype: int64

- The columns central to this analysis (oil rents, natural resource rents, and the fuel and metals export shares) have 16 to 20 missing values each.

Because these variables are essential and no records exist for them before 1970, I excluded the earlier years and restricted the analysis to 1970 onwards, where their coverage begins. A few later gaps remain and are handled below.
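Rather than fixing the cut-off year by inspection, the first year with complete coverage in the critical columns can be found programmatically. A minimal sketch, assuming a DataFrame with the column names listed above; the helper name `first_complete_year` and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd


def first_complete_year(frame: pd.DataFrame, cols: list) -> int:
    """Return the earliest year in which every column in `cols` is non-null."""
    complete = frame[cols].notna().all(axis=1)
    if not complete.any():
        raise ValueError("No year has complete data for the requested columns")
    return int(frame.loc[complete, "year"].min())


# Hypothetical toy data: oil and export shares only become complete in 1970
toy = pd.DataFrame({
    "year": [1968, 1969, 1970, 1971],
    "oil_rents_pct_gdp": [np.nan, np.nan, 4.8, 6.8],
    "fuel_exports_pct_merch_exports": [np.nan, 90.1, 91.0, 91.5],
})

critical_cols = ["oil_rents_pct_gdp", "fuel_exports_pct_merch_exports"]
print(first_complete_year(toy, critical_cols))  # 1970
```

This only finds where complete coverage begins; it does not rule out gaps in later years, which is why a second null check is still needed after filtering.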

In [None]:
df = df[df['year'] >= 1970].reset_index(drop=True) #Keep data from 1970 onwards
df.head()


In [8]:
df

| | country_iso3 | year | oil_rents_pct_gdp | total_natural_resource_rents_pct_gdp | fuel_exports_pct_merch_exports | ores_and_metals_exports_pct_merch_exports | gdp_current_usd | gdp_growth_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | VEN | 1970 | 4.799272 | 5.487721 | 91.015971 | 5.807845 | 11561110000.0 | 7.711914 |
| 1 | VEN | 1971 | 6.787192 | 7.387941 | 91.523845 | 5.503732 | 12986590000.0 | 1.479291 |
| 2 | VEN | 1972 | 6.691746 | 7.192719 | 90.574042 | 5.032713 | 13977730000.0 | 1.282805 |
| 3 | VEN | 1973 | 10.286674 | 10.755443 | 93.089806 | 4.354791 | 17035580000.0 | 7.109958 |
| 4 | VEN | 1974 | 33.340228 | 33.947884 | 95.130787 | 2.873486 | 26100930000.0 | 2.069333 |
| 5 | VEN | 1975 | 24.023362 | 24.891501 | 94.640486 | 3.359402 | 27464650000.0 | 2.896258 |
| 6 | VEN | 1976 | 22.15094 | 22.993151 | 93.827282 | 3.631743 | 31419530000.0 | 7.72774 |
| 7 | VEN | 1977 | 15.743835 | 16.198534 | 92.611057 | 4.937538 | 36210700000.0 | 6.270784 |
| 8 | VEN | 1978 | 15.461522 | 15.8264 | 94.467874 | 2.887393 | 39316280000.0 | 2.346896 |
| 9 | VEN | 1979 | 35.900591 | 36.481381 | 92.763317 | 4.923059 | 48310930000.0 | 0.764355 |


- Now I will check whether any null values remain after the 1970 filter:

In [None]:
# Check null data after 1970 filter
print("Null values per column after 1970 filter:")
print(df.isna().sum())
print("\nPercentage of null values:")
print(round((df.isna().sum() / len(df)) * 100, 2))

# Identify which years have missing data in critical columns
print("\n--- Years with missing oil data ---")
years_missing_oil = df[df['oil_rents_pct_gdp'].isna()]['year'].tolist()
if years_missing_oil:
    print(f"Years without 'oil_rents_pct_gdp' data: {years_missing_oil}")
else:
    print("All years have oil data!")

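- Null values remain after 1970, mainly in the most recent years (2015-2024). Because correlation analysis and predictive models require complete data, I filled these gaps with linear interpolation, a common choice for gaps inside economic time series. Note that interpolation only estimates values between two observations: for gaps at the end of a series, pandas' default `interpolate(method='linear')` repeats the last observed value, so the 2015-2024 figures are carried forward rather than estimated.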
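The difference between interior and trailing gaps is easy to see on a toy series. A minimal sketch with hypothetical values: the default call fills the interior gap linearly but repeats the last value for the trailing gaps, while `limit_area="inside"` restricts filling to gaps that have observations on both sides:

```python
import numpy as np
import pandas as pd

# Hypothetical series: one interior gap and two trailing gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan])

# Default: interior gap is interpolated, trailing gaps repeat the last value
filled_default = s.interpolate(method="linear")

# limit_area="inside": only gaps surrounded by valid values are filled
filled_inside = s.interpolate(method="linear", limit_area="inside")

print(filled_default.tolist())  # [1.0, 2.0, 3.0, 3.0, 3.0]
print(filled_inside.tolist())   # [1.0, 2.0, 3.0, nan, nan]
```

Keeping trailing gaps as NaN (or flagging them) avoids treating carried-forward values as observations in later correlations and plots.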

In [None]:
# Handling null values using linear interpolation
df_interpolated = df.copy()

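# Interpolate columns with null values: interior gaps are filled linearly,
# trailing gaps (after the last observation) repeat the last observed value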
cols_to_interpolate = ['oil_rents_pct_gdp', 'total_natural_resource_rents_pct_gdp', 
                      'fuel_exports_pct_merch_exports', 'ores_and_metals_exports_pct_merch_exports']

for col in cols_to_interpolate:
    df_interpolated[col] = df_interpolated[col].interpolate(method='linear')

# Check if there are still nulls
print("Null values after interpolation:")
print(df_interpolated.isna().sum())

# Update main dataframe
df = df_interpolated.copy()

- With the gaps filled, the exploratory analysis can proceed, keeping in mind that any values carried forward after the last observation are not real data and can bias results for those years.

# 2. Exploratory Data Analysis (EDA)

--------------------

## Descriptive Statistics

In [None]:
df.describe()  # Complete descriptive statistics

## Correlation Analysis

In [None]:
# Correlation matrix - selecting only numeric columns (excluding country_iso3)
correlation_matrix = df.select_dtypes(include=['float64', 'int64']).corr()
print(correlation_matrix)

In [None]:
# Visualization of correlation matrix with heatmap
import matplotlib.pyplot as plt
import seaborn as sns

# Map raw column names to readable axis labels
english_names = {
    'year': 'Year',
    'oil_rents_pct_gdp': 'Oil Rents (% GDP)',
    'total_natural_resource_rents_pct_gdp': 'Natural Resources (% GDP)',
    'fuel_exports_pct_merch_exports': 'Fuel Exports (%)',
    'ores_and_metals_exports_pct_merch_exports': 'Metals Exports (%)',
    'gdp_current_usd': 'GDP (USD)',
    'gdp_growth_pct': 'GDP Growth (%)'
}

# Rename correlation matrix columns
correlation_matrix_en = correlation_matrix.rename(columns=english_names, index=english_names)

plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix_en, annot=True, cmap='coolwarm', center=0,
            fmt='.2f', square=True, linewidths=1, cbar_kws={'label': 'Correlation'})
plt.title('Correlation Matrix - Venezuela Economic Indicators (1970-2024)', fontsize=14)
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

The matrix shows an economy in which oil dominates natural resource income: oil rents and total natural resource rents are perfectly correlated (1.00), which partly reflects the fact that oil rents are the largest component of that total. Yet this wealth did not translate into sustained growth, since the correlation between oil rents and GDP growth is only 0.21. Over time, the share of metals in exports fell sharply (correlation with year: -0.70) and growth slowed (-0.32). The negative correlation between fuel and metal export shares (-0.69) is consistent with a lack of diversification, although it is partly mechanical, because both are shares of the same merchandise-export total. Overall, the picture is of a vulnerable economy concentrated in a single sector that did not deliver robust development over 54 years.
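How mechanical the 1.00 correlation is can be checked directly: when one component makes up almost all of a total, the two series must move together. A quick sketch using the ten 1970-1979 rows printed earlier (values copied from that output, not the full dataset):

```python
import numpy as np

# Oil rents and total natural resource rents (% of GDP), 1970-1979,
# copied from the DataFrame output above
oil = np.array([4.799272, 6.787192, 6.691746, 10.286674, 33.340228,
                24.023362, 22.15094, 15.743835, 15.461522, 35.900591])
total = np.array([5.487721, 7.387941, 7.192719, 10.755443, 33.947884,
                  24.891501, 22.993151, 16.198534, 15.8264, 36.481381])

# Share of total resource rents that comes from oil, year by year
oil_share = oil / total

# Pearson correlation between the component and the total
r = np.corrcoef(oil, total)[0, 1]

print(f"oil share of resource rents: {oil_share.min():.2f} to {oil_share.max():.2f}")
print(f"correlation: {r:.3f}")
```

Since oil supplies the large majority of resource rents in every one of these years, a near-perfect correlation follows almost by construction; the more informative number is the oil share itself.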

## Time Series Visualizations

In [None]:
# GDP evolution over time
plt.figure(figsize=(14, 6))
plt.plot(df['year'], df['gdp_current_usd'], marker='o', linewidth=4, markersize=4)
plt.title('Venezuela GDP Evolution (1970-2024)', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('GDP (USD)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

The graph reveals three distinct phases of the Venezuelan economy: modest, steady growth from 1970 to 2000 (from ~$12 billion to ~$120 billion), an explosive boom between 2003 and 2013 driven by high oil prices (peaking at ~$390 billion), and a catastrophic collapse after 2013 that cut GDP to ~$45 billion by 2020, a fall of roughly 88% from the peak. The trajectory shows extreme volatility and oil dependency; despite a modest recent recovery, GDP remains well below its historical levels, reflecting both a bonanza and one of the worst economic crises in modern history. Because these are nominal US dollar figures, exchange-rate movements also contribute to the swings.
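Peak-to-trough falls are easy to misstate when values are read off a chart, so it helps to compute them. A minimal sketch; the helper `peak_to_trough` is hypothetical, and the example uses only the approximate figures from the text (about $390 billion at the peak, taken here as 2012, and about $45 billion in 2020), not the dataset itself:

```python
import pandas as pd


def peak_to_trough(series: pd.Series) -> tuple:
    """Return (peak label, trough label, % decline), where the trough is the
    minimum reached at or after the series' maximum. Assumes a sorted index."""
    peak_label = series.idxmax()
    after_peak = series.loc[peak_label:]
    trough_label = after_peak.idxmin()
    decline = (series.loc[peak_label] - series.loc[trough_label]) / series.loc[peak_label] * 100
    return peak_label, trough_label, decline


# Approximate values (USD billions) from the text; peak year approximate
approx_gdp = pd.Series({2000: 120.0, 2012: 390.0, 2020: 45.0})
peak, trough, drop = peak_to_trough(approx_gdp)
print(peak, trough, round(drop, 1))  # 2012 2020 88.5
```

On the real data the call would be `peak_to_trough(df.set_index('year')['gdp_current_usd'])`.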

In [None]:
# Evolution of oil dependency
plt.figure(figsize=(14, 6))
plt.plot(df['year'], df['oil_rents_pct_gdp'], marker='o', color='orange', linewidth=4, markersize=4)
plt.title('Oil Dependency (% of GDP) - Venezuela (1970-2024)', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Oil Rents (% of GDP)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

The graph shows extreme volatility in Venezuelan oil dependency, which oscillated between about 5% and 36% of GDP, with peaks during the oil shocks of 1974-1975 (~33%) and 1979-1981 (~36%) and during the Chávez era in the 2000s (~28%). Dependency typically sits between 15% and 25% of GDP, with sharp falls during price crises (such as 1986 and 1998, down to ~8-10%) and rises during oil booms. The dataset has no oil-rent observations after 2014: the last observed values are around 11-12%, and the flat segment from 2015 onwards comes from the interpolation step carrying the last value forward, so it should not be read as stabilization. Over five decades, this volatility points to structural economic fragility.
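To keep carried-forward values from being read as real data, the observed and filled points can be separated before plotting. A sketch, assuming the missing-value mask is taken from the series *before* `interpolate` is applied; the helper `split_observed_filled` and the toy numbers are hypothetical:

```python
import numpy as np
import pandas as pd


def split_observed_filled(raw: pd.Series, filled: pd.Series) -> pd.DataFrame:
    """Return a frame with two columns: values that were observed in `raw`
    and values that exist only because they were filled in `filled`."""
    was_missing = raw.isna()
    return pd.DataFrame({
        "observed": filled.where(~was_missing),  # NaN where the value was imputed
        "filled": filled.where(was_missing),     # NaN where the value was observed
    })


# Hypothetical oil-rent series with a trailing gap
raw = pd.Series([12.0, 11.5, np.nan, np.nan], index=[2013, 2014, 2015, 2016])
filled = raw.interpolate(method="linear")
parts = split_observed_filled(raw, filled)
print(parts)

# With matplotlib, the two columns can then be drawn in different styles, e.g.
# plt.plot(parts.index, parts["observed"], color="orange")
# plt.plot(parts.index, parts["filled"], color="grey", linestyle="--")
```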

In [None]:
# Relationship between oil and economic growth (Scatter plot)
plt.figure(figsize=(10, 6))
plt.scatter(df['oil_rents_pct_gdp'], df['gdp_growth_pct'], alpha=0.6, s=100, edgecolors='black')
plt.xlabel('Oil Rents (% of GDP)', fontsize=12)
plt.ylabel('GDP Growth (%)', fontsize=12)
plt.title('Relationship between Oil Dependency and Economic Growth', fontsize=14, fontweight='bold')
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

The scatter plot is consistent with the weak correlation (+0.21) between oil dependency and economic growth: greater oil revenue does not guarantee higher GDP growth in Venezuela. The widely dispersed points show both exceptional growth (~18%) and catastrophic contractions (~-30%) at similar levels of oil dependency, suggesting that macroeconomic management, public policy, political instability and external shocks matter more than the size of oil revenue alone. One caveat: the collapse years after 2014 are paired with carried-forward oil values, so the most extreme points rest partly on filled-in data.
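Whether a correlation of 0.21 is distinguishable from zero depends on the sample size. A sketch using the Fisher z-transformation (a normal approximation that needs only the standard library), assuming n = 55 annual observations for 1970-2024; because the post-2014 oil values are filled in, the number of genuinely observed pairs is smaller:

```python
import math


def fisher_z_pvalue(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's z.

    z = atanh(r) * sqrt(n - 3) is approximately standard normal under H0.
    """
    if not -1.0 < r < 1.0 or n < 4:
        raise ValueError("need |r| < 1 and n >= 4")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability


p = fisher_z_pvalue(0.21, 55)  # n = 55 is an assumption (1970-2024)
print(f"p = {p:.3f}")
```

A p-value of roughly 0.12 means the weak positive association cannot be separated from zero at the usual 5% level, which supports reading the scatter plot as showing no clear relationship.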

# 3. Answering Research Questions

--------------------

Based on the exploratory analysis performed, I will answer the main questions proposed at the beginning of the project:

##  Main Research Questions

- How has Venezuela's GDP evolved over 64 years?

- What is the relationship between oil revenue and economic growth?

- Does Venezuela suffer from the "resource curse"?

- Which historical events most impacted the economy?



##  How has Venezuela's GDP evolved over 64 years?


- Venezuela's GDP follows an extremely volatile trajectory shaped by its dependence on natural resources. Between 1970 and 2003 the economy grew gradually, but the commodities boom changed the picture drastically, taking GDP to a historical peak near $400 billion in 2012. Because oil accounts for virtually all natural resource rents (correlation 1.00), the economy had no other major source of resource income to fall back on, and it collapsed from 2014 onwards. The scatter plot shows contractions of more than 20% in several years. A slight nominal recovery is visible from 2021, although GDP is still at a level comparable to two decades ago.

##  What is the relationship between oil revenue and economic growth?


- The relationship between oil revenue and growth is weak (correlation 0.21) and highly volatile. The scatter plot shows that similar levels of oil dependency coincide with both strong growth and contractions of up to -30%. Oil therefore does not guarantee stability, leaving the economy exposed to external shocks and deep crises; even in years with substantial oil revenue, growth was often negative. Rather than an engine of steady progress, oil has acted mainly as a source of boom-and-bust cycles.

## Does Venezuela suffer from the "resource curse"?


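The evidence is consistent with the resource curse. Oil accounts for virtually all natural resource rents, fuels made up more than 90% of merchandise exports throughout the 1970s, and yet oil rents are only weakly related to growth (0.21). The scatter plot illustrates the resulting fragility: without diversification, the economy suffered contractions of up to -30% of GDP. The dataset does not include oil prices, so the link to price drops is inferred from the historical record rather than measured here. Instead of fostering development, oil wealth appears to have fueled a cycle of unsustainable spending and chronic instability.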

##  Which historical events most impacted the economy?

In [None]:
import matplotlib.pyplot as plt

# Historical eras (start, end, label, colour); boundaries are contiguous so no year is left unshaded
eras = [
    (1970, 1983, 'Expansion and Nationalization', '#2ecc71'),
    (1983, 1999, 'Debt Crisis and Instability', '#e67e22'),
    (1999, 2014, 'Commodities Boom (Chavismo)', '#3498db'),
    (2014, 2024, 'Economic Collapse and Sanctions', '#e74c3c')
]

plt.figure(figsize=(14, 7))

# Main line: GDP converted to billions of USD to match the axis label
plt.plot(df['year'], df['gdp_current_usd'] / 1e9, color='#2c3e50', linewidth=3, zorder=5)

# Shade each era
for start, end, name, color in eras:
    plt.axvspan(start, end, alpha=0.2, color=color, label=name)

plt.title('Venezuela GDP Evolution by Historical Eras', fontsize=15, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('GDP (USD Billions)', fontsize=12)
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
plt.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

The most significant event was the commodities supercycle (2004-2013), which took GDP to its historical peak on the back of record oil prices. However, Hugo Chávez's death (2013) and the oil price drop (2014) marked the start of an unprecedented collapse, aggravated by expropriations and mismanagement. International sanctions and hyperinflation then deepened the fall until 2020. Today the country shows signs of slight stabilization, with nominal GDP still almost 80% below its peak.
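The era boundaries used in the plot can also summarize the numbers behind this narrative, for example GDP growth within each era. A sketch, assuming the eras are treated as whole years (start inclusive, end exclusive, so 1998 falls in the debt-crisis era and 2013 in the boom era); the helper `growth_by_era` and the toy rows are hypothetical:

```python
import pandas as pd

# Era bins (start inclusive, end exclusive) and names from the plotting cell above
ERA_BINS = [1970, 1983, 1999, 2014, 2025]
ERA_NAMES = [
    "Expansion and Nationalization",
    "Debt Crisis and Instability",
    "Commodities Boom (Chavismo)",
    "Economic Collapse and Sanctions",
]


def growth_by_era(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean, min and max of gdp_growth_pct within each historical era."""
    era = pd.cut(frame["year"], bins=ERA_BINS, labels=ERA_NAMES, right=False)
    return frame.groupby(era, observed=False)["gdp_growth_pct"].agg(["mean", "min", "max"])


# Hypothetical toy rows; on the real data call growth_by_era(df)
toy = pd.DataFrame({
    "year": [1975, 1990, 2005, 2019],
    "gdp_growth_pct": [3.0, -1.0, 10.0, -25.0],
})
print(growth_by_era(toy))
```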