<a href="https://colab.research.google.com/github/cristiandarioortegayubro/BDS/blob/main/modulo.01/bds_web_scraping_001_03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p align="center">
<img src="https://github.com/cristiandarioortegayubro/BDS/blob/main/images/Logo%20Pandas.png?raw=true">
</p>


 # **<font color="DeepPink">Web Scraping the Administración Tributaria Mendoza Site</font>**

Scripts developed by [Mauro Ariel Donnantuoni Moratto](https://www.linkedin.com/in/mauro-ariel-donnantuoni-moratto/)

<p align="center">
<img src="https://atm.mendoza.gov.ar/wp-content/uploads/2023/01/Frente-Atm.png">
</p>



https://atm.mendoza.gov.ar/portalatm/zoneBottom/datosInteres/recaudacion/recaudacion_impuesto_ingresos.jsp


<p align="justify">
👀 The Administración Tributaria Mendoza website publishes the revenue collected from the Impuesto sobre los Ingresos Brutos (gross receipts tax), broken down by fiscal year. We will take that data and build our <code>DataFrame</code> so we can plot it.</p>

 # **<font color="DeepPink">Importing the required libraries</font>**

 ## **<font color="DeepPink">For data analysis</font>**

In [None]:
import numpy as np
import pandas as pd

 ## **<font color="DeepPink">For web scraping</font>**

In [None]:
# Parser backends that pd.read_html relies on; they are not called directly below
import lxml
import bs4
import html5lib
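
`pandas.read_html` does not parse HTML on its own: it hands the page to a parser backend (`lxml`, or `bs4` together with `html5lib`), which is why these packages are imported here. A minimal sketch for checking which of them are installed, using only the standard library; the helper name `available_parsers` is made up for this illustration:

```python
import importlib.util


def available_parsers(names=("lxml", "bs4", "html5lib")):
    """Return {package_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


parsers = available_parsers()
for name, installed in parsers.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

If one of them is reported as missing, installing it before calling `pd.read_html` is usually the simplest fix.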

 ## **<font color="DeepPink">DataFrame display settings</font>**

In [None]:
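# Display settings only: floats are shown as currency with thousands
# separators and two decimals (the stored values are not changed)
pd.options.display.precision = 2
pd.options.display.float_format = "${:,.2f}".format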
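
The `float_format` option accepts any callable that turns a float into a string; here that callable is the `format` method of a template string. The sketch below applies the same template by hand to a few arbitrary sample numbers to show what it produces. Because the option applies to every float pandas displays, even a plain count in a later summary is rendered with a dollar sign.

```python
# Same template string as the display option above
fmt = "${:,.2f}".format

# Arbitrary sample values, chosen only to illustrate the formatting
samples = [12, 1234.5, 1_000_000]
formatted = [fmt(x) for x in samples]
print(formatted)
```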

 ## **<font color="DeepPink">For charts</font>**

In [None]:
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

 # **<font color="DeepPink">Fetching the data and building the DataFrame</font>**

In [None]:
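# ATM page that lists monthly Ingresos Brutos revenue by fiscal year
data = "https://atm.mendoza.gov.ar/portalatm/zoneBottom/datosInteres/recaudacion/recaudacion_impuesto_ingresos.jsp"
# read_html returns one DataFrame per <table> found in the page
lista = pd.read_html(data)
df = lista[0]
# Skip the first two rows and the first column, keeping the 23 fiscal-year
# columns (2000-2022); .copy() avoids renaming a view of df in place
recaudacion = df.iloc[2:, 1:24].copy()
recaudacion.rename(columns=lambda col: f"Año fiscal {col + 1999}", inplace=True)
# Shift the row labels so that they become month numbers starting at 1
recaudacion.index = recaudacion.index - 1
recaudacion.index.name = "Mes"
# Float32 keeps about 7 significant digits, so large amounts are displayed rounded
recaudacion = recaudacion.astype("Float32")
recaudacion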

Mes,Año fiscal 2000,Año fiscal 2001,Año fiscal 2002,Año fiscal 2003,Año fiscal 2004,Año fiscal 2005,Año fiscal 2006,Año fiscal 2007,Año fiscal 2008,Año fiscal 2009,...,Año fiscal 2013,Año fiscal 2014,Año fiscal 2015,Año fiscal 2016,Año fiscal 2017,Año fiscal 2018,Año fiscal 2019,Año fiscal 2020,Año fiscal 2021,Año fiscal 2022
1,"$18,843,926.00","$18,114,420.00","$12,722,708.00","$20,192,886.00","$28,492,948.00","$37,455,920.00","$48,307,580.00","$61,131,912.00","$88,818,880.00","$92,258,440.00",...,"$397,924,128.00","$617,487,232.00","$782,112,960.00","$990,329,984.00","$1,317,982,208.00","$1,642,808,064.00","$2,195,529,728.00","$3,068,476,928.00","$4,210,333,184.00","$6,815,275,520.00"
2,"$16,258,743.00","$32,180,908.00","$11,237,981.00","$20,257,924.00","$26,276,428.00","$35,131,740.00","$45,290,616.00","$58,230,196.00","$76,675,888.00","$87,810,744.00",...,"$384,301,376.00","$594,553,984.00","$729,820,544.00","$919,810,304.00","$1,194,014,720.00","$1,497,939,072.00","$2,103,363,712.00","$3,246,583,040.00","$4,035,920,128.00","$6,525,087,744.00"
3,"$17,468,328.00","$14,453,688.00","$11,460,186.00","$19,766,456.00","$27,124,258.00","$33,982,256.00","$45,654,236.00","$58,267,060.00","$74,072,264.00","$78,868,768.00",...,"$380,740,032.00","$574,800,384.00","$702,603,264.00","$965,441,664.00","$1,202,600,320.00","$1,492,442,496.00","$2,143,791,360.00","$2,869,934,848.00","$3,969,197,824.00","$6,633,613,312.00"
4,"$17,463,084.00","$15,809,614.00","$12,300,935.00","$20,503,786.00","$32,466,740.00","$39,149,216.00","$47,268,952.00","$61,113,108.00","$88,241,184.00","$85,345,416.00",...,"$409,508,288.00","$590,621,184.00","$741,563,456.00","$1,046,270,144.00","$1,386,964,096.00","$1,611,327,488.00","$2,277,286,912.00","$2,711,721,984.00","$4,446,800,384.00","$7,882,943,488.00"
5,"$15,978,692.00","$14,610,889.00","$14,265,621.00","$20,988,948.00","$30,252,980.00","$39,887,132.00","$60,082,272.00","$65,113,468.00","$90,479,856.00","$89,655,040.00",...,"$434,638,400.00","$608,269,504.00","$770,937,472.00","$1,043,528,832.00","$1,288,922,112.00","$1,630,360,960.00","$2,456,052,992.00","$2,558,796,288.00","$4,285,918,464.00","$8,485,148,672.00"
6,"$14,676,011.00","$14,328,045.00","$16,351,168.00","$21,041,196.00","$30,687,746.00","$39,160,680.00","$56,268,380.00","$65,906,756.00","$83,292,808.00","$87,674,064.00",...,"$443,506,240.00","$604,987,648.00","$774,531,264.00","$991,656,640.00","$1,291,359,744.00","$1,725,683,968.00","$2,334,065,920.00","$2,601,499,392.00","$4,477,259,264.00","$8,272,230,912.00"
7,"$15,299,790.00","$14,675,150.00","$17,106,546.00","$21,763,328.00","$30,962,034.00","$39,205,416.00","$59,484,120.00","$66,270,872.00","$88,028,896.00","$90,860,864.00",...,"$457,074,208.00","$604,665,536.00","$856,524,736.00","$1,105,998,080.00","$1,355,228,800.00","$1,734,466,048.00","$2,419,755,008.00","$2,800,037,632.00","$4,653,112,320.00","$9,777,144,832.00"
8,"$15,802,483.00","$14,475,158.00","$18,373,950.00","$23,185,412.00","$31,937,418.00","$40,688,412.00","$58,732,120.00","$67,496,824.00","$94,763,400.00","$93,054,928.00",...,"$476,677,280.00","$693,626,752.00","$859,765,312.00","$1,127,364,480.00","$1,460,045,952.00","$1,884,253,312.00","$2,761,171,712.00","$3,271,900,928.00","$4,977,782,272.00","$10,673,847,296.00"
9,"$15,756,538.00","$16,781,292.00","$18,684,200.00","$25,513,568.00","$31,578,992.00","$43,003,892.00","$63,159,168.00","$70,058,016.00","$91,342,608.00","$96,826,064.00",...,"$517,929,600.00","$673,022,272.00","$831,099,072.00","$1,141,604,480.00","$1,518,901,760.00","$2,059,133,824.00","$2,874,249,728.00","$3,102,345,216.00","$5,349,090,816.00","$10,903,033,856.00"
10,"$15,731,005.00","$13,506,128.00","$18,518,490.00","$25,042,384.00","$33,774,688.00","$41,477,904.00","$63,007,200.00","$72,036,144.00","$90,090,536.00","$104,056,432.00",...,"$485,045,472.00","$766,913,472.00","$897,798,272.00","$1,098,256,384.00","$1,483,261,440.00","$2,154,164,736.00","$2,863,717,888.00","$3,254,208,768.00","$5,368,896,000.00","$11,468,663,808.00"
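
The cleaning steps above (slice away the leading rows and the label column, rename positional columns to fiscal years, shift the row labels to month numbers) can be tried on a small stand-in table. This is only a sketch under assumptions: the toy values are invented, and it assumes, as the `col + 1999` arithmetic suggests, that `read_html` returned integer column labels.

```python
import pandas as pd

# Toy stand-in for lista[0]: integer column labels, two rows to discard
# on top, and a label column on the left. All values are invented.
raw = pd.DataFrame(
    [
        ["title", "title", "title"],
        ["Mes", "2000", "2001"],
        ["Enero", "10", "20"],
        ["Febrero", "30", "40"],
    ]
)

# Drop the first two rows and the first column
toy = raw.iloc[2:, 1:3].copy()

# Column label 1 -> "Año fiscal 2000", column label 2 -> "Año fiscal 2001"
toy = toy.rename(columns=lambda col: f"Año fiscal {col + 1999}")

# Row label 2 -> month 1, row label 3 -> month 2
toy.index = toy.index - 1
toy.index.name = "Mes"

# The cells were read as text, so convert them to numbers
toy = toy.astype("float64")
print(toy)
```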


# **<font color="DeepPink">Charts</font>**


## **<font color="DeepPink">Charts for the first and last five fiscal years</font>**


In [None]:
# Column positions 0-4 are fiscal years 2000-2004; positions 18-22 are 2018-2022
ejercicios = recaudacion.iloc[:, [0, 1, 2, 3, 4, 18, 19, 20, 21, 22]]

In [None]:
fig = make_subplots(rows=2, cols=5,
                    subplot_titles=[f'Year {col[-4:]}' for col in ejercicios.columns],
                    x_title='Month',
                    y_title='Revenue')

In [None]:
# One line per fiscal year, filling the 2x5 grid row by row
for i, (col_name, ej) in enumerate(ejercicios.items()):
    fig.add_trace(go.Scatter(x=ejercicios.index,
                             y=ej,
                             name=col_name,
                             showlegend=True),
                  row=i // 5 + 1, col=i % 5 + 1)
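
The `row=i // 5 + 1, col=i % 5 + 1` arithmetic turns a running index into a 1-based grid position, filling each row from left to right. A small standalone sketch of that mapping for a grid with five columns; the function name `grid_position` is only illustrative:

```python
def grid_position(i, n_cols):
    """Map a 0-based index to a 1-based (row, col) pair, row by row."""
    row, col = divmod(i, n_cols)
    return row + 1, col + 1


# Ten series on a grid with five columns, as in the figure above
positions = [grid_position(i, 5) for i in range(10)]
print(positions)
```

The first five indices land on row 1 and the next five on row 2, which is how the early and late fiscal years end up on separate rows.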

In [None]:
fig.update_layout(height=660, width=1000,
                  title_text="Fiscal years 2000-2004 and 2018-2022",
                  showlegend=False,
                  template="gridon")

fig.show()

## **<font color="DeepPink">Year-over-year evolution</font>**


In [None]:
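# Total revenue per fiscal year; the prefix is stripped together with the
# space that follows it, so the labels become plain years such as "2000"
df_annual_sum = (ejercicios.sum().reset_index()
                 .rename(columns={'index': 'Año', 0: 'Recaudación'})
                 .replace('Año fiscal ', '', regex=True))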
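
The chain above collapses each fiscal-year column into a single total and then turns the resulting Series into a two-column table. Here is a sketch of the same steps on invented numbers. It strips the prefix together with the space after it, so the year labels come out without leading whitespace.

```python
import pandas as pd

# Invented monthly amounts for two fiscal years
toy = pd.DataFrame(
    {"Año fiscal 2000": [10.0, 20.0, 30.0], "Año fiscal 2001": [5.0, 5.0, 5.0]},
    index=pd.Index([1, 2, 3], name="Mes"),
)

totals = (
    toy.sum()        # one total per column -> Series indexed by column name
    .reset_index()   # Series -> DataFrame with columns "index" and 0
    .rename(columns={"index": "Año", 0: "Recaudación"})
)

# Remove the prefix and the space after it, leaving just the year
totals["Año"] = totals["Año"].str.replace("Año fiscal ", "", regex=False)
print(totals)
```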

In [None]:
fig = make_subplots(rows=1,
                    cols=2,
                    subplot_titles=['Period 2000-2004', 'Period 2018-2022'],
                    x_title='Year',
                    y_title='Total revenue')

In [None]:
# One bar trace per period: left panel for 2000-2004, right panel for 2018-2022
for i, periodo in enumerate([df_annual_sum.iloc[:5], df_annual_sum.iloc[-5:]]):
    fig.add_trace(go.Bar(x=periodo['Año'],
                         y=periodo['Recaudación'],
                         showlegend=True),
                  row=1, col=i + 1)

In [None]:
fig.update_layout(
                  title_text="Total revenue per year, 2000-2004 and 2018-2022",
                  showlegend=False,
                  template="gridon")

fig.show()

## **<font color="DeepPink">Comparison of average revenue per month in each period</font>**


In [None]:
# Mean revenue for each month across the first five fiscal years (2000-2004)
df_monthly_avg1 = (ejercicios.T.iloc[:5].mean().reset_index()
                   .rename(columns={0: 'Recaudación media'}))

In [None]:
# Mean revenue for each month across the last five fiscal years (2018-2022)
df_monthly_avg2 = (ejercicios.T.iloc[-5:].mean().reset_index()
                   .rename(columns={0: 'Recaudación media'}))
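
Transposing, keeping the first or last few rows and averaging is a compact way to get, for each month, the mean across a group of years. A sketch with invented numbers that compares it with the equivalent `mean(axis=1)` spelling:

```python
import pandas as pd

# Invented amounts: rows are months, columns are fiscal years
toy = pd.DataFrame(
    {
        "Año fiscal 2000": [10.0, 20.0],
        "Año fiscal 2001": [30.0, 40.0],
        "Año fiscal 2002": [100.0, 200.0],
    },
    index=pd.Index([1, 2], name="Mes"),
)

# Notebook-style: transpose, keep the first two years, average per month
via_transpose = toy.T.iloc[:2].mean()

# Equivalent spelling: keep the first two columns and average across them
via_axis = toy.iloc[:, :2].mean(axis=1)

print(via_transpose)
```

Both results are indexed by `Mes`, which is why `reset_index()` yields a `Mes` column that the bar charts can use on the x axis.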

In [None]:
fig = make_subplots(rows=1,
                    cols=2,
                    subplot_titles=['Period 2000-2004', 'Period 2018-2022'],
                    x_title='Month of the year',
                    y_title='Average revenue')


In [None]:
# One bar trace per period: left panel for 2000-2004, right panel for 2018-2022
for i, periodo in enumerate([df_monthly_avg1, df_monthly_avg2]):
    fig.add_trace(go.Bar(x=periodo['Mes'],
                         y=periodo['Recaudación media'],
                         showlegend=True),
                  row=1, col=i + 1)

In [None]:
fig.update_layout(title_text="Average revenue for each month of the year, 2000-2004 and 2018-2022",
                  showlegend=False,
                  template="gridon")

fig.show()

## **<font color="DeepPink">Descriptive statistics for the periods considered</font>**


In [None]:
ejercicios['Año fiscal 2001'].describe()

count           $12.00
mean    $16,162,706.67
std      $5,310,180.00
min     $11,570,299.00
25%     $14,122,565.75
50%     $14,543,023.50
75%     $16,052,533.50
max     $32,180,908.00
Name: Año fiscal 2001, dtype: Float64
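
One way to read a summary like the one above is to place each statistic on a 0-to-1 scale between the minimum and the maximum. The sketch below does that with the figures printed above, copied by hand; the helper name `relative_position` is just for illustration.

```python
# Figures copied by hand from the describe() output for fiscal year 2001
summary = {
    "25%": 14_122_565.75,
    "50%": 14_543_023.50,
    "mean": 16_162_706.67,
    "75%": 16_052_533.50,
}
minimum = 11_570_299.00
maximum = 32_180_908.00


def relative_position(value, low, high):
    """0.0 means the value equals the minimum, 1.0 means it equals the maximum."""
    return (value - low) / (high - low)


positions = {
    name: relative_position(value, minimum, maximum)
    for name, value in summary.items()
}
for name, pos in positions.items():
    print(f"{name}: {pos:.2f}")
```

A position well below 0.5 means the statistic sits closer to the monthly minimum than to the maximum.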

# **<font color="DeepPink">Conclusion?...</font>**


<p align="justify">
The charts of monthly Ingresos Brutos revenue per year in the Province of Mendoza indicate that the usual pattern within a year is a roughly linear upward progression, with brief peaks in one of the first two months of the year.
<br><br>
The years 2000 and 2001, however, are the exception, possibly because of the economic crisis. This impact shows up in the average revenue for each month of the year over 2000-2004, which is roughly even across all months, contrary to expectations and unlike what is observed for 2018-2022.
<br><br>
Total absolute revenue per year also rises gradually and steadily, except for the flattening in 2000-2002.
<br><br>
That evolution seems to be explained by the inflationary process triggered by the end of convertibility, from 2002 onward.
<br><br>
Finally, if we focus on 2001, the pivotal year of the economic crisis, the statistics are consistent with that situation: all the quartiles and the mean revenue are much closer to the monthly minimum than to the maximum. This points to a flattening of revenue and matches the anomalous drop in that year's monthly chart.
</p>