1. Download these data:
* https://www.cia.gov/the-world-factbook/field/reserves-of-foreign-exchange-and-gold/country-comparison/
* https://www.cia.gov/the-world-factbook/field/energy-consumption-per-capita/country-comparison/
* https://www.cia.gov/the-world-factbook/field/electricity/country-comparison/
* https://www.cia.gov/the-world-factbook/field/education-expenditures/country-comparison/

2. Clean and format the data frames as needed

3. Combine those dataframes into one. Decision: what to do to keep as many cases as possible.

4. Report the average, min and max for the variables (reserves, energy, electricity, education) by region.
5. Create new columns for each variable, so that they are in 5 intervals. Choose the same method for the four variables. Two decisions here.

6. Save the final result: country, region and the eight variables.

Rubric:
1. 10 points off if you do not use GitHub at all.
2. 5 points off if your code is not published using GitHub Pages.
3. 5 points off if your code is disorganized: it should look like a tutorial; erase code that is not needed.
4. 3 points off for each poor decision.
5. 3 points off if README of repository does not explain what the repo is for.
6. 5 points off if final file NOT saved on GitHub
7. 3 points off for every variable not well formatted.
8. 3 points off if statistics by region are not computed.
9. 1 point off for each column still present which is not needed.

1. LOAD THE DATA

In [1]:
# Set the direct URLs to the CSV files hosted in the GitHub repository
reservesLink = 'https://github.com/alonso-mendoza/PC3/raw/refs/heads/main/Reserves.csv'
energyLink = 'https://github.com/alonso-mendoza/PC3/raw/refs/heads/main/Energy.csv'
electricityLink = 'https://github.com/alonso-mendoza/PC3/raw/refs/heads/main/Electricity.csv'
educationLink = 'https://github.com/alonso-mendoza/PC3/raw/refs/heads/main/Education.csv'

# Load each CSV file from its URL directly into a DataFrame
import pandas as pd

reserves = pd.read_csv(reservesLink,header=0)
energy = pd.read_csv(energyLink,header=0)
electricity = pd.read_csv(electricityLink,header=0)
education = pd.read_csv(educationLink,header=0)

2. DATA CLEANING AND FORMATTING

RESERVES

In [2]:
# First, check what information we have
reserves.info()

# There are 6 columns and none of them has null values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   name                 195 non-null    object
 1   slug                 195 non-null    object
 2   value                195 non-null    object
 3   date_of_information  195 non-null    int64 
 4   ranking              195 non-null    int64 
 5   region               195 non-null    object
dtypes: int64(2), object(4)
memory usage: 9.3+ KB


In [3]:
# Inspect the first rows
reserves.head()

# Note that the 'value' column holds currency strings, so it is not recognized as a numeric type

Unnamed: 0,name,slug,value,date_of_information,ranking,region
0,China,china,"$3,265,000,000,000",2024,1,East and Southeast Asia
1,Japan,japan,"$1,160,000,000,000",2024,2,East and Southeast Asia
2,Switzerland,switzerland,"$822,130,000,000",2024,3,Europe
3,Russia,russia,"$597,217,000,000",2023,4,Central Asia
4,India,india,"$569,544,000,000",2024,5,South Asia


In [4]:
# Drop the columns we do not need (positions 1, 3 and 4: slug, date_of_information and ranking)
reserves = reserves.drop(reserves.columns[[1, 3, 4]], axis=1)

In [5]:
# Rename the 'name' and 'value' columns
reserves.rename(columns={reserves.columns[0]: "Country", reserves.columns[1]: "Reserves"}, inplace=True)

In [6]:
# Check whether the data is complete
Filasvacias=reserves[reserves['Reserves'].isnull()]

display(Filasvacias)

# The result is empty, so no value is missing

Unnamed: 0,Country,Reserves,region


In [7]:
!pip install unidecode



In [8]:
# Transliterate the text columns so they contain only ASCII characters
from unidecode import unidecode

reserves['Country']=reserves.Country.apply(unidecode)
reserves['region']=reserves.region.apply(unidecode)
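
To give a rough idea of what this transliteration step does, here is a minimal sketch that uses the standard library's `unicodedata` module as a stand-in for `unidecode`. The helper name `to_ascii` and the sample names are hypothetical. The stand-in only removes accents that decompose under NFKD, while `unidecode` handles many more characters, so it is not a drop-in replacement.

```python
import unicodedata

import pandas as pd


def to_ascii(text: str) -> str:
    """Decompose accented characters and drop the combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Toy names for illustration only, not taken from the data files
names = pd.Series(["Curaçao", "Côte d'Ivoire", "Peru"])
ascii_names = names.apply(to_ascii)
print(ascii_names.tolist())
```

Normalizing names the same way in all four tables matters because the later merges match on the exact text of `Country`.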

In [9]:
# Strip every character that is not a digit or a decimal point
reserves['Reserves'] = reserves['Reserves'].str.replace(pat=r'[^\d.]', repl='', regex=True)

In [10]:
# Convert the column to numeric with pandas, since no errors were found in the data
reserves['Reserves']=pd.to_numeric(reserves.Reserves)
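
The two steps above (strip non-numeric characters, then convert) can be tried on a few toy strings. This is a sketch under stated assumptions: the sample values are made up, and `errors="coerce"` is an optional safeguard that the notebook itself does not use. It also shows one caveat of the pattern `[^\d.]`: it keeps every dot, so a string with extra dots cannot be parsed.

```python
import pandas as pd

# Toy values in the same style as the Factbook 'value' column (illustrative only)
raw = pd.Series(["$3,265,000,000,000", "$822,130,000,000", "12.5 est. 2020"])

# Keep only digits and dots; every other character is removed
digits_only = raw.str.replace(r"[^\d.]", "", regex=True)

# errors="coerce" turns strings that still cannot be parsed into NaN
# (the third value becomes "12.5.2020", which is not a valid number)
numeric = pd.to_numeric(digits_only, errors="coerce")
print(numeric)
```

Counting the NaN values after the conversion is a quick way to confirm that no row was silently damaged by the cleaning.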

In [11]:
# Check the DataFrame info after cleaning
reserves.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Country   195 non-null    object
 1   Reserves  195 non-null    int64 
 2   region    195 non-null    object
dtypes: int64(1), object(2)
memory usage: 4.7+ KB


In [12]:
# See how the rows look now
reserves.head()

# Everything looks correct; we continue with cleaning energy

Unnamed: 0,Country,Reserves,region
0,China,3265000000000,East and Southeast Asia
1,Japan,1160000000000,East and Southeast Asia
2,Switzerland,822130000000,Europe
3,Russia,597217000000,Central Asia
4,India,569544000000,South Asia
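
Since the same drop, rename and convert sequence is applied to each of the four tables, it could be wrapped in one function. The sketch below is one possible way to do that, not the notebook's own code: `clean_factbook` is a hypothetical name, the toy frame only mimics the six-column layout shown by `info()`, and the ASCII transliteration step is left out to keep the example dependency-free.

```python
import pandas as pd


def clean_factbook(df: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Keep name, value and region; rename them; make the value numeric."""
    # Positions 0, 2 and 5 are name, value and region in the raw layout
    out = df.iloc[:, [0, 2, 5]].copy()
    out.columns = ["Country", value_name, "region"]
    # Cast to str first so the same code also accepts already-numeric columns
    cleaned = out[value_name].astype(str).str.replace(r"[^\d.]", "", regex=True)
    out[value_name] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Toy frame with the same six columns as the raw files (values for illustration)
toy = pd.DataFrame({
    "name": ["China", "Japan"],
    "slug": ["china", "japan"],
    "value": ["$3,265,000,000,000", "$1,160,000,000,000"],
    "date_of_information": [2024, 2024],
    "ranking": [1, 2],
    "region": ["East and Southeast Asia", "East and Southeast Asia"],
})
toy_clean = clean_factbook(toy, "Reserves")
print(toy_clean)
```

Note that the pattern also drops minus signs, which is harmless for these four indicators but would be wrong for variables that can be negative.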


ENERGY

In [13]:
# Check what information we have
energy.info()

# Again, there are 6 columns and no null values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   name                 195 non-null    object
 1   slug                 195 non-null    object
 2    Btu/person          195 non-null    object
 3   date_of_information  195 non-null    int64 
 4   ranking              195 non-null    int64 
 5   region               195 non-null    object
dtypes: int64(2), object(4)
memory usage: 9.3+ KB


In [14]:
# Inspect the first rows
energy.head()

# The Btu/person column is not recognized as numeric (info() reports it as object)

Unnamed: 0,name,slug,Btu/person,date_of_information,ranking,region
0,Qatar,qatar,814308000,2023,1,Middle East
1,Singapore,singapore,643259000,2023,2,East and Southeast Asia
2,Bahrain,bahrain,554202000,2023,3,Middle East
3,United Arab Emirates,united-arab-emirates,450432000,2023,4,Middle East
4,Brunei,brunei,403365000,2023,5,East and Southeast Asia


In [15]:
# Follow the same process as for the previous table
energy = energy.drop(energy.columns[[1,3,4]], axis=1)

In [16]:
# Rename the 'name' and 'Btu/person' columns
energy.rename(columns={energy.columns[0]: "Country", energy.columns[1]: "Energy"}, inplace=True)

In [17]:
# Check that the data is complete
Filasvacias=energy[energy['Energy'].isnull()]

display(Filasvacias)

# The result is empty, so no value is missing

Unnamed: 0,Country,Energy,region


In [18]:
# Keep ASCII-only text in the name and region columns
energy['Country']=energy.Country.apply(unidecode)
energy['region']=energy.region.apply(unidecode)

In [19]:
# Strip every character that is not a digit or a decimal point
energy['Energy'] = energy['Energy'].str.replace(pat=r'[^\d.]', repl='', regex=True)

In [20]:
# Convert the column to numeric
energy['Energy']=pd.to_numeric(energy.Energy)

In [21]:
# Check the result
energy.info()

# Everything looks correct; we continue with cleaning electricity

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Country  195 non-null    object
 1   Energy   195 non-null    int64 
 2   region   195 non-null    object
dtypes: int64(1), object(2)
memory usage: 4.7+ KB


ELECTRICITY

In [22]:
# Inspect the first rows
electricity.head()

Unnamed: 0,name,slug,kW,date_of_information,ranking,region
0,China,china,2949000000,2023,1,East and Southeast Asia
1,United States,united-states,1235000000,2023,2,North America
2,India,india,499136000,2023,3,South Asia
3,Japan,japan,361617000,2023,4,East and Southeast Asia
4,Russia,russia,301926000,2023,5,Central Asia


In [23]:
# Drop the columns we do not need
electricity = electricity.drop(electricity.columns[[1, 3, 4]], axis=1)

In [24]:
# Rename the columns according to how we will use them
electricity.rename(columns={electricity.columns[0]: "Country", electricity.columns[1]: "Electricity"}, inplace=True)

In [25]:
# Keep ASCII-only text in the name and region columns
electricity['Country']=electricity.Country.apply(unidecode)
electricity['region']=electricity.region.apply(unidecode)

In [26]:
# Strip every character that is not a digit or a decimal point
electricity['Electricity'] = electricity['Electricity'].str.replace(pat=r'[^\d.]', repl='', regex=True)

In [27]:
# Convert the 'Electricity' column to numeric
electricity['Electricity']=pd.to_numeric(electricity.Electricity)

In [28]:
# Check the result
electricity.info()

# Everything looks correct; we continue with cleaning education

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 211 entries, 0 to 210
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Country      211 non-null    object
 1   Electricity  211 non-null    int64 
 2   region       211 non-null    object
dtypes: int64(1), object(2)
memory usage: 5.1+ KB


EDUCATION

In [29]:
# Inspect the first rows
education.head()

Unnamed: 0,name,slug,% of GDP,date_of_information,ranking,region
0,Marshall Islands,marshall-islands,13.6,2020,1,Australia and Oceania
1,Solomon Islands,solomon-islands,12.8,2020,2,Australia and Oceania
2,Kiribati,kiribati,12.4,2019,3,Australia and Oceania
3,Greenland,greenland,10.2,2019,4,North America
4,Bolivia,bolivia,9.8,2020,5,South America


In [30]:
# Check the information
education.info()

# The column we need is already float64, so it is numeric

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197 entries, 0 to 196
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   name                 197 non-null    object 
 1   slug                 197 non-null    object 
 2   % of GDP             197 non-null    float64
 3   date_of_information  197 non-null    int64  
 4   ranking              197 non-null    int64  
 5   region               197 non-null    object 
dtypes: float64(1), int64(2), object(3)
memory usage: 9.4+ KB


In [31]:
# Drop the columns we do not need
education = education.drop(education.columns[[1,3,4]], axis=1)

In [32]:
# Rename the columns we are going to use
education.rename(columns={education.columns[0]: "Country", education.columns[1]: "Education"}, inplace=True)

In [33]:
# Keep ASCII-only text in the name and region columns
education['Country']=education.Country.apply(unidecode)
education['region']=education.region.apply(unidecode)

In [34]:
# Check the result
education.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197 entries, 0 to 196
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Country    197 non-null    object 
 1   Education  197 non-null    float64
 2   region     197 non-null    object 
dtypes: float64(1), object(2)
memory usage: 4.7+ KB


In [35]:
Filasvacias=education[education['Education'].isnull()]

display(Filasvacias)

# The result is empty, so the data is complete; end of cleaning

Unnamed: 0,Country,Education,region


3. MERGING

In [36]:
# Show the dimensions (rows, columns) of each DataFrame to check their sizes
reserves.shape,education.shape,energy.shape,electricity.shape

((195, 3), (197, 3), (195, 3), (211, 3))

In [37]:
# Compare the merge methods to find the one that loses no data
reserves.merge(energy,how='inner',left_on='Country',right_on='Country').shape

(188, 5)

In [38]:
reserves.merge(energy,how='left',left_on='Country',right_on='Country').shape

(195, 5)

In [39]:
reserves.merge(energy,how='outer',left_on='Country',right_on='Country').shape
# We choose this method: the outer merge keeps every country from both tables

(202, 5)
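
The three shapes are consistent with each other: 188 countries appear in both tables, and 195 + 195 - 188 = 202 rows survive the outer merge. To see which countries are unmatched, pandas offers the `indicator` option of `merge`. The following is a small sketch on toy frames; the country names and values are illustrative, not taken from the data.

```python
import pandas as pd

# Toy frames; names and numbers are made up for the example
left = pd.DataFrame({"Country": ["Peru", "Chile", "Kosovo"], "Reserves": [1, 2, 3]})
right = pd.DataFrame({"Country": ["Peru", "Chile", "Guam"], "Energy": [10, 20, 30]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = left.merge(right, how="outer", on="Country", indicator=True)

# Rows that exist on only one side are the ones an inner merge would drop
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["Country", "_merge"]])
```

Listing the unmatched rows this way helps to decide whether a mismatch is a genuinely missing country or just a spelling difference.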

In [40]:
# Merge reserves and energy
t1=reserves.merge(energy,how='outer',left_on='Country',right_on='Country')
t1

Unnamed: 0,Country,Reserves,region_x,Energy,region_y
0,Afghanistan,9.749000e+09,South Asia,3380000.0,South Asia
1,Albania,6.455000e+09,Europe,27407000.0,Europe
2,Algeria,6.844800e+10,Africa,61843000.0,Africa
3,American Samoa,,,89105000.0,Australia and Oceania
4,Angola,1.424300e+10,Africa,9146000.0,Africa
...,...,...,...,...,...
197,Vietnam,9.223800e+10,East and Southeast Asia,40263000.0,East and Southeast Asia
198,West Bank,1.328000e+09,Middle East,14991000.0,Middle East
199,Yemen,1.251000e+09,Middle East,2987000.0,Middle East
200,Zambia,3.173000e+09,Africa,8265000.0,Africa


In [41]:
# Merge education and electricity
t2=education.merge(electricity,how='outer',left_on='Country',right_on='Country')
t2

Unnamed: 0,Country,Education,region_x,Electricity,region_y
0,Afghanistan,2.9,South Asia,627000.0,South Asia
1,Albania,3.1,Europe,2857000.0,Europe
2,Algeria,7.0,Africa,22591000.0,Africa
3,American Samoa,,,50000.0,Australia and Oceania
4,Andorra,2.9,Europe,,
...,...,...,...,...,...
215,Virgin Islands,,,326000.0,Central America and the Caribbean
216,West Bank,5.3,Middle East,352000.0,Middle East
217,Yemen,,,1790000.0,Middle East
218,Zambia,3.7,Africa,3986000.0,Africa


In [42]:
# Countries present in t1 but not in t2
Onlyt1=set(t1.Country)-set(t2.Country)
Onlyt1

set()

In [43]:
# Countries present in t2 but not in t1
Onlyt2=set(t2.Country)-set(t1.Country)
Onlyt2

{'Andorra',
 'Cook Islands',
 'Curacao',
 'Falkland Islands (Islas Malvinas)',
 'Faroe Islands',
 'French Polynesia',
 'Gibraltar',
 'Greenland',
 'Liechtenstein',
 'Marshall Islands',
 'Monaco',
 'Nauru',
 'New Caledonia',
 'Niue',
 'Saint Helena, Ascension, and Tristan da Cunha',
 'Saint Pierre and Miquelon',
 'Turks and Caicos Islands',
 'Virgin Islands'}

In [44]:
!pip install thefuzz



In [45]:
from thefuzz import process as fz

# For each country in Onlyt2, look up the most similar name in Onlyt1
[(c,fz.extractOne(c,Onlyt1 )) for c in sorted(Onlyt2)]

# Onlyt1 is empty, so every lookup returns None: there are no misspelled duplicates to reconcile and no value is lost

[('Andorra', None),
 ('Cook Islands', None),
 ('Curacao', None),
 ('Falkland Islands (Islas Malvinas)', None),
 ('Faroe Islands', None),
 ('French Polynesia', None),
 ('Gibraltar', None),
 ('Greenland', None),
 ('Liechtenstein', None),
 ('Marshall Islands', None),
 ('Monaco', None),
 ('Nauru', None),
 ('New Caledonia', None),
 ('Niue', None),
 ('Saint Helena, Ascension, and Tristan da Cunha', None),
 ('Saint Pierre and Miquelon', None),
 ('Turks and Caicos Islands', None),
 ('Virgin Islands', None)]
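
For contrast, here is how a fuzzy lookup behaves when there are candidates to compare against. This sketch swaps in the standard library's `difflib.get_close_matches` instead of `thefuzz`, so the scores differ from what `extractOne` would report; the name lists are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical leftover names from each side of a merge
only_left = ["Gambia, The", "Bahamas, The"]
only_right = ["Gambia", "Bahamas", "Bahrain"]

# For each left-hand name, keep the single closest right-hand candidate, if any
candidates = {
    name: get_close_matches(name, only_right, n=1, cutoff=0.5)
    for name in only_left
}
print(candidates)

# With an empty candidate list there is nothing to match against
empty_result = get_close_matches("Andorra", [], n=1)
print(empty_result)
```

When a close match does turn up, the usual fix is to rename one side before merging so that both tables use the same spelling.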

In [46]:
# Outer merge of 't1' and 't2' on 'Country', creating the new DataFrame 'tf'
tf=t1.merge(t2,how='outer',left_on='Country',right_on='Country')
tf

Unnamed: 0,Country,Reserves,region_x_x,Energy,region_y_x,Education,region_x_y,Electricity,region_y_y
0,Afghanistan,9.749000e+09,South Asia,3380000.0,South Asia,2.9,South Asia,627000.0,South Asia
1,Albania,6.455000e+09,Europe,27407000.0,Europe,3.1,Europe,2857000.0,Europe
2,Algeria,6.844800e+10,Africa,61843000.0,Africa,7.0,Africa,22591000.0,Africa
3,American Samoa,,,89105000.0,Australia and Oceania,,,50000.0,Australia and Oceania
4,Andorra,,,,,2.9,Europe,,
...,...,...,...,...,...,...,...,...,...
215,Virgin Islands,,,,,,,326000.0,Central America and the Caribbean
216,West Bank,1.328000e+09,Middle East,14991000.0,Middle East,5.3,Middle East,352000.0,Middle East
217,Yemen,1.251000e+09,Middle East,2987000.0,Middle East,,,1790000.0,Middle East
218,Zambia,3.173000e+09,Africa,8265000.0,Africa,3.7,Africa,3986000.0,Africa


In [47]:
# Rename the column at position 2 ('region_x_x') to 'Region'
tf.rename(columns={ tf.columns[2]: "Region" }, inplace = True)

In [48]:
# Fill 'Region' with the first non-null value among the auxiliary region columns, then drop them
tf["Region"] = tf["Region"].combine_first(tf["region_y_x"])
tf["Region"] = tf["Region"].combine_first(tf["region_x_y"])
tf["Region"] = tf["Region"].combine_first(tf["region_y_y"])
tf.drop(columns=["region_y_x", "region_x_y", "region_y_y"], inplace=True)
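
The chain of `combine_first` calls can be generalized to any number of columns. The sketch below does this with `functools.reduce` on a toy frame; the column names `region_x`, `region_y` and `region_z` are placeholders, not the suffixes produced in the notebook.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Toy frame with several partially filled region columns (illustrative values)
toy = pd.DataFrame({
    "Country": ["Peru", "Guam", "Andorra"],
    "region_x": ["South America", np.nan, np.nan],
    "region_y": ["South America", "Australia and Oceania", np.nan],
    "region_z": [np.nan, np.nan, "Europe"],
})

region_cols = ["region_x", "region_y", "region_z"]

# Chain combine_first from left to right: earlier columns take priority
toy["Region"] = reduce(
    lambda acc, col: acc.combine_first(toy[col]),
    region_cols[1:],
    toy[region_cols[0]],
)
toy = toy.drop(columns=region_cols)
print(toy)
```

Because earlier columns win, this approach assumes the sources agree on the region whenever more than one of them has a value.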

In [49]:
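# Move the 'Region' column to the second position
cols = tf.columns.tolist()
cols.remove('Region')
cols.insert(1, 'Region')
# .copy() makes tf an independent DataFrame, so adding columns later does not trigger SettingWithCopyWarning
tf = tf[cols].copy()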

tf.head()

Unnamed: 0,Country,Region,Reserves,Energy,Education,Electricity
0,Afghanistan,South Asia,9749000000.0,3380000.0,2.9,627000.0
1,Albania,Europe,6455000000.0,27407000.0,3.1,2857000.0
2,Algeria,Africa,68448000000.0,61843000.0,7.0,22591000.0
3,American Samoa,Australia and Oceania,,89105000.0,,50000.0
4,Andorra,Europe,,,2.9,


4. AGGREGATING

In [50]:
# Group the DataFrame by 'Region' and compute the mean, minimum and maximum of the four variables
region_stats = tf.groupby('Region')[['Reserves', 'Energy', 'Electricity', 'Education']].agg(['mean', 'min', 'max'])
region_stats

Unnamed: 0_level_0,Reserves,Reserves,Reserves,Energy,Energy,Energy,Electricity,Electricity,Electricity,Education,Education,Education
Unnamed: 0_level_1,mean,min,max,mean,min,max,mean,min,max,mean,min,max
Region,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
Africa,7880167000.0,30450000.0,80592000000.0,16122540.0,649000.0,100844000.0,4471309.0,5000.0,65989000.0,4.268,0.3,9.6
Australia and Oceania,9047955000.0,396530000.0,54455000000.0,68375500.0,5655000.0,223158000.0,8104800.0,3000.0,108193000.0,7.569231,2.2,13.6
Central America and the Caribbean,4506223000.0,47580000.0,23834000000.0,56383220.0,3486000.0,153952000.0,1792593.0,6000.0,7264000.0,4.544444,1.4,8.7
Central Asia,75558220000.0,3237000000.0,597217000000.0,103916100.0,16192000.0,261142000.0,42395780.0,3944000.0,301926000.0,4.333333,2.8,6.2
East and Southeast Asia,359039700000.0,781995000.0,3265000000000.0,118315300.0,6825000.0,643259000.0,193404700.0,277000.0,2949000000.0,3.633333,1.4,6.3
Europe,55353180000.0,836088000.0,822130000000.0,108276900.0,27407000.0,234698000.0,33872600.0,50000.0,275658000.0,5.017073,1.2,7.7
Middle East,82637830000.0,407300000.0,436769000000.0,211604200.0,2987000.0,814308000.0,30304250.0,352000.0,119620000.0,4.528571,1.7,7.8
North America,189085000000.0,117551000000.0,227760000000.0,192587200.0,57539000.0,311599000.0,250493700.0,26000.0,1235000000.0,5.54,1.9,10.2
South America,47486920000.0,87100000.0,318857000000.0,46414670.0,25733000.0,78496000.0,32773770.0,10000.0,240251000.0,4.808333,1.3,9.8
South Asia,79075930000.0,673203000.0,569544000000.0,23506380.0,3380000.0,64082000.0,72116120.0,432000.0,499136000.0,3.85,1.9,7.0
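
The table above has a two-level column header, which is awkward to save or to reference by name. One option, shown here as a sketch on toy data, is to join the two levels into single names such as `Energy_mean`. The sample values are made up; missing values are skipped by the aggregations by default.

```python
import pandas as pd

# Toy data for illustration; None becomes NaN in the numeric column
toy = pd.DataFrame({
    "Region": ["Africa", "Africa", "Europe", "Europe"],
    "Energy": [10.0, 30.0, 100.0, None],
    "Education": [4.0, 6.0, 5.0, 7.0],
})

stats = toy.groupby("Region")[["Energy", "Education"]].agg(["mean", "min", "max"])

# Join the two header levels into single names such as 'Energy_mean'
stats.columns = ["_".join(pair) for pair in stats.columns]
stats = stats.reset_index()
print(stats)
```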


5. DISCRETIZATION

In [51]:
!pip install mapclassify



In [52]:
import mapclassify
import numpy as np

np.random.seed(12345)

theVar = tf.Reserves

# Drop the NaN values
theVar = theVar.dropna()

# Number of intervals
K = 5

# Fit each classification scheme to obtain candidate intervals
ei5 = mapclassify.EqualInterval(theVar, k=K)
msd = mapclassify.StdMean(theVar)
q5 = mapclassify.Quantiles(theVar, k=K)
mb5 = mapclassify.MaximumBreaks(theVar, k=K)
ht = mapclassify.HeadTailBreaks(theVar)
fj5 = mapclassify.FisherJenks(theVar, k=K)
jc5 = mapclassify.JenksCaspall(theVar, k=K)
mp5 = mapclassify.MaxP(theVar, k=K)

In [53]:
class5 = q5, ei5,msd, ht, mb5, fj5, jc5, mp5

# To choose a method, collect the ADCM (absolute deviation around class medians) of each classification
fits = np.array([ c.adcm for c in class5])
adcms = pd.DataFrame(fits)
adcms['classifier'] = [c.name for c in class5]
adcms.columns = ['ADCM', 'Classifier']

# We choose the scheme with the lowest ADCM, in this case FisherJenks (fj5)
adcms.sort_values(by='ADCM', ascending=False)

Unnamed: 0,ADCM,Classifier
7,9299101000000.0,MaxP
0,9271262000000.0,Quantiles
1,8251084000000.0,EqualInterval
2,7552820000000.0,StdMean
4,6784226000000.0,MaximumBreaks
6,5983126000000.0,JenksCaspall
3,3384256000000.0,HeadTailBreaks
5,3248788000000.0,FisherJenks
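
ADCM sums, over all classes, the absolute distance of each value from the median of its class, so a lower value means tighter classes. The sketch below recomputes it by hand with NumPy on toy numbers to make the criterion concrete. The function is a hypothetical re-implementation for illustration, not the `mapclassify` code.

```python
import numpy as np


def adcm(values, labels):
    """Sum of absolute deviations of each value from its class median."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for label in np.unique(labels):
        members = values[labels == label]
        total += np.abs(members - np.median(members)).sum()
    return total


values = [1, 2, 3, 100, 110]
good_split = [0, 0, 0, 1, 1]  # small and large values in separate classes
poor_split = [0, 0, 1, 1, 1]  # 3 is grouped with 100 and 110

print(adcm(values, good_split))  # 2 + 10 = 12.0
print(adcm(values, poor_split))  # 1 + 107 = 108.0
```

The same logic explains the ranking in the table: schemes that isolate the few very large reserve holders in their own classes score lower.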


In [54]:
# Build the classification, keeping the original index labels
classified_reserves = pd.Series(fj5.yb, index=theVar.index)

# Rows without a Reserves value must end up as missing:
# reindex aligns the labels with the full index of tf
classified_reserves_reindexed = classified_reserves.reindex(tf.index)

# Assign the Series as the new column; the nullable Int64 dtype allows missing values
tf['Reserves_fj5'] = classified_reserves_reindexed.astype('Int64')

# Preview the result (the pattern 'tf|Coun' matches only 'Country'; 'fj5|Coun' would also show the new column)
tf.loc[:,tf.columns.str.contains('tf|Coun')]


Unnamed: 0,Country
0,Afghanistan
1,Albania
2,Algeria
3,American Samoa
4,Andorra
...,...
215,Virgin Islands
216,West Bank
217,Yemen
218,Zambia
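
The pattern used in this cell (classify only the observed values, rebuild a Series on their index, realign with the full frame, cast to `Int64`) can be checked on a tiny example. In this sketch `np.digitize` with hand-picked bounds stands in for the `mapclassify` classifier, and all names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value (illustrative only)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Reserves": [5.0, np.nan, 50.0, 500.0],
})

observed = toy["Reserves"].dropna()

# Stand-in classifier: hand-picked upper bounds instead of mapclassify
upper_bounds = [10, 100]
codes = np.digitize(observed.to_numpy(), upper_bounds, right=True)

# Rebuild a Series on the surviving index, then realign with the full frame
labels = pd.Series(codes, index=observed.index).reindex(toy.index)

# Int64 (capital I) is the nullable integer dtype, so the gap stays missing
toy["Reserves_bin"] = labels.astype("Int64")
print(toy)
```

Without the reindex step, assigning the shorter array directly would either fail or misalign the labels with the wrong countries.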


In [55]:
# Repeat the process with FisherJenks for the remaining columns

K = 5
for col in ['Energy', 'Electricity', 'Education']:
    theVar = tf[col].dropna()
    fj5 = mapclassify.FisherJenks(theVar, k=K)
    # Keep the original index, then realign with tf so rows without data stay missing
    classified_col = pd.Series(fj5.yb, index=theVar.index)
    tf[f'{col}_fj5'] = classified_col.reindex(tf.index).astype('Int64')

tf.loc[:, tf.columns.str.contains('Country|Reserves|Energy|Electricity|Education')]



Unnamed: 0,Country,Reserves,Energy,Education,Electricity,Reserves_fj5,Energy_fj5,Electricity_fj5,Education_fj5
0,Afghanistan,9.749000e+09,3380000.0,2.9,627000.0,0,0,0,1
1,Albania,6.455000e+09,27407000.0,3.1,2857000.0,0,0,0,1
2,Algeria,6.844800e+10,61843000.0,7.0,22591000.0,0,1,0,3
3,American Samoa,,89105000.0,,50000.0,,1,0,
4,Andorra,,,2.9,,,,,1
...,...,...,...,...,...,...,...,...,...
215,Virgin Islands,,,,326000.0,,,0,
216,West Bank,1.328000e+09,14991000.0,5.3,352000.0,0,0,0,2
217,Yemen,1.251000e+09,2987000.0,,1790000.0,0,0,0,
218,Zambia,3.173000e+09,8265000.0,3.7,3986000.0,0,0,0,1


In [56]:
# Lists used to organize the columns
numerical_cols = ['Reserves', 'Energy', 'Electricity', 'Education']
interval_cols = ['Reserves_fj5', 'Energy_fj5', 'Electricity_fj5', 'Education_fj5']

# Build the new column order: 'Country' and 'Region' first, then each numeric column followed by its interval column
new_order = ['Country', 'Region']
for num_col, interval_col in zip(numerical_cols, interval_cols):
    new_order.append(num_col)
    new_order.append(interval_col)

# Reorder tf
tf = tf[new_order]

# Check the resulting DataFrame
tf.head()

Unnamed: 0,Country,Region,Reserves,Reserves_fj5,Energy,Energy_fj5,Electricity,Electricity_fj5,Education,Education_fj5
0,Afghanistan,South Asia,9749000000.0,0.0,3380000.0,0.0,627000.0,0.0,2.9,1.0
1,Albania,Europe,6455000000.0,0.0,27407000.0,0.0,2857000.0,0.0,3.1,1.0
2,Algeria,Africa,68448000000.0,0.0,61843000.0,1.0,22591000.0,0.0,7.0,3.0
3,American Samoa,Australia and Oceania,,,89105000.0,1.0,50000.0,0.0,,
4,Andorra,Europe,,,,,,,2.9,1.0


In [57]:
# Finally, save the final DataFrame 'tf' to a CSV file
tf.to_csv('final_data.csv', index=False)
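
A quick round trip is a reasonable way to confirm that the saved file reads back as intended. The sketch below uses an in-memory buffer and a toy frame rather than the real file, and it assumes the reader wants the interval columns back as nullable integers; a plain `read_csv` would load them as floats because of the missing values.

```python
import io

import pandas as pd

# Toy frame with a nullable integer column, mimicking the final layout
toy = pd.DataFrame({
    "Country": ["Peru", "Guam"],
    "Reserves": [7.0e10, None],
    "Reserves_fj5": pd.array([0, None], dtype="Int64"),
})

# Write to an in-memory buffer instead of disk to keep the example self-contained
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Ask for the nullable Int64 dtype explicitly when reading the file back
restored = pd.read_csv(buffer, dtype={"Reserves_fj5": "Int64"})
print(restored.dtypes)
```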