# A new ARIMA model

## Instructions
Now that you have built an ARIMA model, build a new one with fresh data (try one of these datasets from Duke). Annotate your work in a notebook, visualize the data and your model, and test its accuracy using MAPE.
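
The instructions ask for accuracy to be measured with MAPE (mean absolute percentage error). A minimal sketch of that metric follows; the helper name `mape` and the sample numbers are hypothetical and only illustrate the calculation.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The ratio below divides by the actual values, so zeros are not allowed
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when y_true contains zeros")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical actuals and forecasts, for illustration only
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(f"MAPE: {mape(actual, forecast):.2f}%")
```

Because MAPE divides by the actual values, it tends to behave better on level series such as `GDP (current US$)` than on growth rates, which can be close to zero or negative.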

### About the dataset

[This dataset](https://www.kaggle.com/datasets/charanchandrasekaran/top-6-economies-in-the-world-by-gdp) contains key indicators for the world's six largest economies by GDP (the United States, China, Japan, Germany, the United Kingdom, and India) over the 30-year period from 1990 to 2020. The data was extracted from the World Bank's data website and processed with the Python pandas library. It can be used for time series analysis and forecasting.

#### INDICATORS
* GDP (current US$)
* GDP, PPP (current international $)
* GDP per capita (current US$)
* GDP growth (annual %)
* Imports of goods and services (% of GDP)
* Exports of goods and services (% of GDP)
* Central government debt, total (% of GDP)
* Total reserves (includes gold, current US$)
* Unemployment, total (% of total labor force) (modeled ILO estimate)
* Inflation, consumer prices (annual %)
* Personal remittances, received (% of GDP)
* Population, total
* Population growth (annual %)
* Life expectancy at birth, total (years)
* Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)

In [1]:
# Data cleaning
import pandas as pd
import numpy as np

# Data visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### Data preparation

In [2]:
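def country_df(country_name, reqcol):
    """Return the rows for one country, restricted to the columns in reqcol."""

    # Load the full dataset
    df = pd.read_csv('../Data/top_six_economies.csv')

    # Keep only the rows for the requested country; copy to avoid modifying df
    country = df.loc[df["Country Name"] == country_name].copy()

    # Drop the unnamed index column stored in the CSV
    country = country.drop(columns='Unnamed: 0')

    # 'Country Name' stays a regular column because reqcol selects it
    return country[reqcol]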

### Columns required for analysis and visualization

In [3]:
col=['Country Name','Year','GDP (current US$)','GDP, PPP (current international $)',
     'GDP per capita (current US$)','GDP growth (annual %)',
     'Imports of goods and services (% of GDP)',
     'Exports of goods and services (% of GDP)',
     'Unemployment, total (% of total labor force) (modeled ILO estimate)',
     'Population, total','Population growth (annual %)',
     'Life expectancy at birth, total (years)',
     'Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)']

Fetching data from the main CSV: this block extracts the rows for the United States and China from top_six_economies.csv.
Each country's DataFrame contains macroeconomic measures such as GDP, GDP per capita, imports and exports as a percentage of GDP, and so on, from 1991 to 2020.

In [5]:
# USA
usa=country_df(country_name="United States",reqcol=col)

# China
china=country_df(country_name="China",reqcol=col)
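
Since the goal is a time series model, one column usually needs to be turned into a series indexed by time. A minimal sketch of that step is below, under some assumptions: the `sample` frame is a hand-copied stand-in for the first five US rows shown in this notebook (so the snippet runs without the CSV), and holding out only the last observation is an illustrative choice, not a recommended split size.

```python
import pandas as pd

# Stand-in for the first five US rows (Year and GDP only)
sample = pd.DataFrame({
    "Year": [1991, 1992, 1993, 1994, 1995],
    "GDP (current US$)": [
        6158129000000.0, 6520327000000.0, 6858559000000.0,
        7287236000000.0, 7639749000000.0,
    ],
})

# Build a datetime index from the integer year (each year maps to January 1)
year_index = pd.DatetimeIndex(
    pd.to_datetime(sample["Year"].astype(str), format="%Y"), name="Year"
)
series = pd.Series(
    sample["GDP (current US$)"].to_numpy(),
    index=year_index,
    name="GDP (current US$)",
)

# Chronological split: earlier years for training, the latest year held out
train, test = series.iloc[:-1], series.iloc[-1:]
print(train.index.year.tolist(), test.index.year.tolist())
```

The same pattern should apply to the full `usa` or `china` frames by replacing `sample` with the real DataFrame; the split is chronological rather than random so that the model is never evaluated on years earlier than those it was trained on.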

The first 5 rows of the US DataFrame:

In [6]:
usa.head()

Unnamed: 0,Country Name,Year,GDP (current US$),"GDP, PPP (current international $)",GDP per capita (current US$),GDP growth (annual %),Imports of goods and services (% of GDP),Exports of goods and services (% of GDP),"Unemployment, total (% of total labor force) (modeled ILO estimate)","Population, total",Population growth (annual %),"Life expectancy at birth, total (years)",Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)
0,United States,1991,6158129000000.0,6158129000000.0,24342.258905,-0.108265,10.125543,9.660905,6.8,252981000.0,1.336261,75.365854,0.5
1,United States,1992,6520327000000.0,6520327000000.0,25418.990776,3.522441,10.24168,9.708915,7.5,256514000.0,1.386886,75.617073,0.5
2,United States,1993,6858559000000.0,6858559000000.0,26387.293734,2.751781,10.497438,9.54718,6.9,259919000.0,1.31868,75.419512,0.5
3,United States,1994,7287236000000.0,7287236000000.0,27694.853416,4.028793,11.162312,9.893147,6.12,263126000.0,1.226296,75.619512,0.5
4,United States,1995,7639749000000.0,7639749000000.0,28690.875701,2.684217,11.814158,10.639224,5.65,266278000.0,1.190787,75.621951,0.5


Summary statistics for the US DataFrame, including the mean, percentiles, maximum, and minimum:

In [7]:
usa.describe()

Unnamed: 0,Year,GDP (current US$),"GDP, PPP (current international $)",GDP per capita (current US$),GDP growth (annual %),Imports of goods and services (% of GDP),Exports of goods and services (% of GDP),"Unemployment, total (% of total labor force) (modeled ILO estimate)","Population, total",Population growth (annual %),"Life expectancy at birth, total (years)",Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,29.0
mean,2005.5,13209730000000.0,13209730000000.0,43754.501616,2.28838,14.121256,11.112766,5.928,295475400.0,0.945598,77.392439,0.855172
std,8.803408,4674607000000.0,4674607000000.0,12355.698715,1.868823,2.154676,1.413266,1.641669,23744270.0,0.241759,1.190069,0.235412
min,1991.0,6158129000000.0,6158129000000.0,24342.258905,-3.40459,10.125543,9.035659,3.67,252981000.0,0.455381,75.365854,0.5
25%,1998.25,9204907000000.0,9204907000000.0,33269.105271,1.732426,12.48403,9.915374,4.6475,276650500.0,0.734254,76.581098,0.7
50%,2005.5,13427390000000.0,13427390000000.0,45212.703974,2.695293,14.494803,10.719706,5.59,296948300.0,0.936831,77.487805,1.0
75%,2012.75,16695890000000.0,16695890000000.0,52914.45041,3.512635,15.823025,12.315318,6.875,315514400.0,1.159644,78.540854,1.0
max,2020.0,21372570000000.0,21372570000000.0,65094.799429,4.794499,17.441948,13.644049,9.63,331501100.0,1.386886,78.841463,1.2


In [8]:
usa.info()

<class 'pandas.core.frame.DataFrame'>
Index: 30 entries, 0 to 29
Data columns (total 13 columns):
 #   Column                                                               Non-Null Count  Dtype  
---  ------                                                               --------------  -----  
 0   Country Name                                                         30 non-null     object 
 1   Year                                                                 30 non-null     int64  
 2   GDP (current US$)                                                    30 non-null     float64
 3   GDP, PPP (current international $)                                   30 non-null     float64
 4   GDP per capita (current US$)                                         30 non-null     float64
 5   GDP growth (annual %)                                                30 non-null     float64
 6   Imports of goods and services (% of GDP)                             30 non-null     float64
 7   Exports of good

The DataFrame has 13 columns and 30 rows. Most columns have 30 non-null entries; according to the `describe()` output above, the poverty headcount column has 29.

The first 5 rows of the China DataFrame:

In [9]:
china.head()

Unnamed: 0,Country Name,Year,GDP (current US$),"GDP, PPP (current international $)",GDP per capita (current US$),GDP growth (annual %),Imports of goods and services (% of GDP),Exports of goods and services (% of GDP),"Unemployment, total (% of total labor force) (modeled ILO estimate)","Population, total",Population growth (annual %),"Life expectancy at birth, total (years)",Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)
30,China,1991,383373300000.0,1258454000000.0,333.142145,9.262786,10.629964,13.436371,2.37,1150780000.0,1.364434,69.242,
31,China,1992,426915700000.0,1470222000000.0,366.460692,14.22453,12.542033,13.555544,2.37,1164970000.0,1.225536,69.355,
32,China,1993,444731300000.0,1714031000000.0,377.389839,13.883729,13.902547,11.997883,2.69,1178440000.0,1.149619,69.496,56.7
33,China,1994,564324700000.0,1978860000000.0,473.492279,13.036807,17.233066,18.536749,2.9,1191835000.0,1.130261,69.67,
34,China,1995,734547900000.0,2241663000000.0,609.656679,10.953954,16.324446,17.952523,3.0,1204855000.0,1.086509,69.885,


In [10]:
china.describe()

Unnamed: 0,Year,GDP (current US$),"GDP, PPP (current international $)",GDP per capita (current US$),GDP growth (annual %),Imports of goods and services (% of GDP),Exports of goods and services (% of GDP),"Unemployment, total (% of total labor force) (modeled ILO estimate)","Population, total",Population growth (annual %),"Life expectancy at birth, total (years)",Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,16.0
mean,2005.5,5070277000000.0,9608151000000.0,3722.204431,9.285176,19.64991,22.59981,3.98,1299286000.0,0.725246,73.081333,14.66875
std,8.803408,4911756000000.0,7421558000000.0,3489.084468,2.655472,4.885114,6.367606,0.793291,76894920.0,0.277037,2.558262,18.14926
min,1991.0,383373300000.0,1258454000000.0,333.142145,2.239702,10.629964,11.997883,2.37,1150780000.0,0.238041,69.242,0.1
25%,1998.25,1045282000000.0,3110635000000.0,839.757125,7.687776,16.117254,18.441681,3.2425,1244635000.0,0.549437,70.8185,0.65
50%,2005.5,2519049000000.0,7126068000000.0,1926.323632,9.249783,18.334447,20.602921,4.435,1307370000.0,0.626594,73.128,7.2
75%,2012.75,9310862000000.0,15919930000000.0,6840.407643,10.505309,23.342559,26.299273,4.565,1360978000.0,0.936126,75.244,21.8
max,2020.0,14687670000000.0,24255800000000.0,10408.669756,14.230861,28.444187,36.035026,5.0,1411100000.0,1.364434,77.097,56.7


The output above shows summary statistics for the China DataFrame, including the mean, percentiles, maximum, and minimum.
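
In the China statistics, the poverty headcount column reports a count of 16 out of 30 rows, so it has gaps that are worth measuring before the column is used in any model. The sketch below shows one way to check this; the `sample` frame is a hand-copied stand-in for the first five China rows of that column, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

pov = "Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)"

# Stand-in for the first five China rows of the poverty column
sample = pd.DataFrame({
    "Year": [1991, 1992, 1993, 1994, 1995],
    pov: [np.nan, np.nan, 56.7, np.nan, np.nan],
})

# Number of missing values per column
missing = sample.isna().sum()
print(missing)

# Fraction of years that have a reported poverty value
coverage = sample[pov].notna().mean()
print(f"coverage: {coverage:.0%}")
```

On the real data, `china.isna().sum()` would give the same per-column count. A sparsely reported column like this one is probably a poor ARIMA target unless the gaps are handled first.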