# ***Global Terrorism Dataset - Great Expectations***
---
The goal of this notebook is to implement validations, using Expectations, on our data sources (the *global_terrorism_db_cleaned* table in the Postgres DB and *combined_data.csv*, which holds the data obtained from the API) so that they can later be integrated into Airflow.

## ***Setting up the environment***

In [1]:
import os
print(os.getcwd())
try:
    os.chdir('../../GlobalTerrorismAnalysis_ETL')
except FileNotFoundError:
    print("""
        You may have already run this cell two or more times, or the directory may be incorrect.
        Did you already run this cell and it worked? Remember not to run it again.
        Are you in the wrong directory? You can change it.
        Keep in mind the directory you are currently in:
        """)
print(os.getcwd())

c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL\notebooks
c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL


### **Libraries**

In [2]:
import pandas as pd
import great_expectations as gx
import great_expectations.expectations as gxe

In [3]:
pd.set_option('display.max_columns', None)

### **Modules**

In [4]:
from src.database.db_operations import creating_engine

## ***Reading the data***

### **Database**

In [5]:
engine = creating_engine()

11/13/2024 02:14:02 PM Engine created. You can now connect to the database.


In [6]:
df_gtd = pd.read_sql_table("global_terrorism_db_cleaned", engine)

In [7]:
df_gtd.head()

Unnamed: 0,eventid,iyear,imonth,iday,extended,country_txt,country,region_txt,region,city,latitude,longitude,vicinity,crit1,crit2,crit3,doubtterr,multiple,success,suicide,attacktype1_txt,attacktype1,targtype1_txt,targtype1,natlty1_txt,natlty1,gname,guncertain1,individual,nperps,nperpcap,claimed,weaptype1_txt,weaptype1,nkill,nwound,property,ishostkid,INT_ANY,date,date_country_actor
0,201707070031,2017,7,7,0,Pakistan,153,South Asia,6,Karachi,24.891115,67.143311,0,1,1,1,0.0,0.0,1,0,Armed Assault,2,Educational Institution,8,Pakistan,153.0,Unknown,0.0,0,999.0,0.0,0.0,Firearms,5,1.0,0.0,0,0.0,999,2017-07-07,2017-07-07PakistanUnknown
1,201707070032,2017,7,7,1,Nigeria,147,Sub-Saharan Africa,11,Sapele,5.893874,5.676673,0,1,1,1,0.0,0.0,1,0,Hostage Taking (Kidnapping),6,Private Citizens & Property,14,Nigeria,147.0,Unknown,0.0,0,999.0,999.0,1.0,Firearms,5,2.0,1.0,0,1.0,999,2017-07-07,2017-07-07NigeriaUnknown
2,201707080002,2017,7,8,0,Egypt,60,Middle East & North Africa,10,Arish,31.126646,33.800865,0,1,1,1,0.0,0.0,1,0,Bombing/Explosion,3,Police,3,Egypt,60.0,Unknown,0.0,0,999.0,0.0,0.0,Explosives,6,4.0,7.0,1,0.0,999,2017-07-08,2017-07-08EgyptUnknown
3,201707080003,2017,7,8,0,Pakistan,153,South Asia,6,Panjgur,26.972136,64.114571,0,1,1,1,0.0,0.0,1,0,Bombing/Explosion,3,Private Citizens & Property,14,Pakistan,153.0,Unknown,0.0,0,999.0,0.0,0.0,Explosives,6,0.0,1.0,0,0.0,999,2017-07-08,2017-07-08PakistanUnknown
4,201707080012,2017,7,8,1,Iraq,95,Middle East & North Africa,10,Jurf al-Sakhar,32.867008,44.220455,0,1,1,1,0.0,0.0,1,0,Hostage Taking (Kidnapping),6,Private Citizens & Property,14,Iraq,95.0,Asa'ib Ahl al-Haqq,0.0,0,999.0,999.0,0.0,Unknown,13,20.0,0.0,0,1.0,0,2017-07-08,2017-07-08IraqAsa'ib Ahl al-Haqq


### **API**

In [8]:
df_api = pd.read_csv("./data/combined_data.csv")

In [9]:
df_api.head()

Unnamed: 0,event_date,country,disorder_type,actor1
0,2017-12-31,Israel,Demonstrations,Protesters (Israel)
1,2017-12-31,India,Demonstrations,Protesters (India)
2,2017-12-31,Nigeria,Political violence,Rioters (Nigeria)
3,2017-12-31,Tunisia,Political violence,Unidentified Armed Group (Tunisia)
4,2017-12-31,Tunisia,Demonstrations,Rioters (Tunisia)


## ***Understanding the data***

### **Database**

#### *Information about the GTD*

In [10]:
df_gtd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88379 entries, 0 to 88378
Data columns (total 41 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   eventid             88379 non-null  int64         
 1   iyear               88379 non-null  int64         
 2   imonth              88379 non-null  int64         
 3   iday                88379 non-null  int64         
 4   extended            88379 non-null  int64         
 5   country_txt         88379 non-null  object        
 6   country             88379 non-null  int64         
 7   region_txt          88379 non-null  object        
 8   region              88379 non-null  int64         
 9   city                88379 non-null  object        
 10  latitude            88379 non-null  float64       
 11  longitude           88379 non-null  float64       
 12  vicinity            88379 non-null  int64         
 13  crit1               88379 non-null  int64     

#### *Null values*

In [11]:
df_gtd.isna().sum()

eventid               0
iyear                 0
imonth                0
iday                  0
extended              0
country_txt           0
country               0
region_txt            0
region                0
city                  0
latitude              0
longitude             0
vicinity              0
crit1                 0
crit2                 0
crit3                 0
doubtterr             0
multiple              0
success               0
suicide               0
attacktype1_txt       0
attacktype1           0
targtype1_txt         0
targtype1             0
natlty1_txt           0
natlty1               0
gname                 0
guncertain1           0
individual            0
nperps                0
nperpcap              0
claimed               0
weaptype1_txt         0
weaptype1             0
nkill                 0
nwound                0
property              0
ishostkid             0
INT_ANY               0
date                  0
date_country_actor    0
dtype: int64

#### *Columns*

In [12]:
df_gtd.columns

Index(['eventid', 'iyear', 'imonth', 'iday', 'extended', 'country_txt',
       'country', 'region_txt', 'region', 'city', 'latitude', 'longitude',
       'vicinity', 'crit1', 'crit2', 'crit3', 'doubtterr', 'multiple',
       'success', 'suicide', 'attacktype1_txt', 'attacktype1', 'targtype1_txt',
       'targtype1', 'natlty1_txt', 'natlty1', 'gname', 'guncertain1',
       'individual', 'nperps', 'nperpcap', 'claimed', 'weaptype1_txt',
       'weaptype1', 'nkill', 'nwound', 'property', 'ishostkid', 'INT_ANY',
       'date', 'date_country_actor'],
      dtype='object')

#### *Numerical analysis*

In [13]:
df_gtd.describe()

Unnamed: 0,eventid,iyear,imonth,iday,extended,country,region,latitude,longitude,vicinity,crit1,crit2,crit3,doubtterr,multiple,success,suicide,attacktype1,targtype1,natlty1,guncertain1,individual,nperps,nperpcap,claimed,weaptype1,nkill,nwound,property,ishostkid,INT_ANY,date
count,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379.0,88379
mean,201109100000.0,2011.0247,6.498965,15.688139,0.063092,123.581903,7.804196,26.067622,46.893057,0.411659,0.999943,0.999955,0.999864,0.0,0.157707,0.867321,0.055839,3.322,9.190611,122.98563,0.099153,0.004119,762.685593,15.924575,60.716969,6.38,2.264848,3.441564,168.701343,2.477014,533.981138,2011-07-10 12:20:48.893968128
min,197001000000.0,1970.0,1.0,1.0,0.0,4.0,1.0,-42.250458,-157.858333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,4.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1970-01-01 00:00:00
25%,201011200000.0,2010.0,4.0,8.0,0.0,92.0,6.0,15.325443,35.788337,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,2.0,3.0,92.0,0.0,0.0,999.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,2010-11-17 00:00:00
50%,201401100000.0,2014.0,6.0,16.0,0.0,95.0,8.0,32.314022,44.50972,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,3.0,10.0,95.0,0.0,0.0,999.0,0.0,0.0,6.0,0.0,0.0,1.0,0.0,999.0,2014-01-12 00:00:00
75%,201510100000.0,2015.0,9.0,23.0,0.0,159.0,10.0,34.208416,70.895905,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,3.0,14.0,160.0,0.0,0.0,999.0,0.0,0.0,6.0,2.0,2.0,1.0,0.0,999.0,2015-10-05 00:00:00
max,201712300000.0,2017.0,12.0,31.0,1.0,1004.0,12.0,999.0,179.366667,999.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,9.0,22.0,1004.0,1.0,1.0,25000.0,999.0,999.0,13.0,1384.0,8191.0,999.0,999.0,999.0,2017-12-31 00:00:00
std,818335200.0,8.183338,3.388972,8.777002,0.24313,93.295214,2.537863,14.397166,42.998741,18.403139,0.007521,0.006727,0.011652,0.0,0.364468,0.339229,0.229612,1.853455,6.667427,85.363614,0.298868,0.064045,434.895527,124.686065,238.36593,2.046983,11.567308,41.528274,373.71301,48.750956,498.153228,
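
The summary above shows a maximum `latitude` of 999.0, which lies outside the geographic range of -90 to 90. A minimal sketch follows, assuming 999 acts as a placeholder for unknown values (the notebook does not say so). It counts how many values in a column fall outside an expected interval. The helper name `count_out_of_range` and the toy frame are hypothetical and only illustrate the check.

```python
import pandas as pd


def count_out_of_range(series, low, high):
    """Count non-null values that fall outside the closed interval [low, high]."""
    values = series.dropna()
    return int((~values.between(low, high)).sum())


# Toy data: three plausible latitudes plus one 999.0 placeholder (assumed meaning).
toy = pd.DataFrame({"latitude": [24.891115, 5.893874, 999.0, -42.250458]})
outside = count_out_of_range(toy["latitude"], -90, 90)
print(outside)
```

Running a count like this before fixing the bounds of a range expectation makes it explicit whether the observed maximum is real data or a placeholder code.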


#### *Analysis of the binary columns*

In [14]:
def obtener_valores_unicos(df, columnas):
    """Return a dict mapping each column name to the array of its unique values."""
    valores_unicos = {}
    for columna in columnas:
        valores_unicos[columna] = df[columna].unique()
    return valores_unicos

columnas_a_verificar = ["extended", "vicinity", "crit1", "crit2", "crit3", "doubtterr", "multiple", "success", "suicide", "guncertain1", "individual", "claimed", "property", "ishostkid", "INT_ANY"]
valores_unicos = obtener_valores_unicos(df_gtd, columnas_a_verificar)
valores_unicos

{'extended': array([0, 1], dtype=int64),
 'vicinity': array([  0,   1, 999], dtype=int64),
 'crit1': array([1, 0], dtype=int64),
 'crit2': array([1, 0], dtype=int64),
 'crit3': array([1, 0], dtype=int64),
 'doubtterr': array([0.]),
 'multiple': array([0., 1.]),
 'success': array([1, 0], dtype=int64),
 'suicide': array([0, 1], dtype=int64),
 'guncertain1': array([0., 1.]),
 'individual': array([0, 1], dtype=int64),
 'claimed': array([  0.,   1., 999.]),
 'property': array([  0,   1, 999], dtype=int64),
 'ishostkid': array([  0.,   1., 999.]),
 'INT_ANY': array([999,   0,   1], dtype=int64)}
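
The output above shows that some of these flag columns contain only 0 and 1, while others also carry the code 999. As a rough sketch, the two groups could be separated automatically so that each gets the right value set. The function name `split_binary_columns` and the toy frame are hypothetical and not part of the project.

```python
import pandas as pd


def split_binary_columns(df, columns):
    """Return (strictly_binary, with_extra_codes) lists of column names."""
    strictly_binary, with_extra_codes = [], []
    for column in columns:
        # Convert to plain Python numbers so the set comparison is straightforward.
        observed = set(df[column].dropna().unique().tolist())
        if observed <= {0, 1}:
            strictly_binary.append(column)
        else:
            with_extra_codes.append(column)
    return strictly_binary, with_extra_codes


# Toy frame mimicking three of the columns inspected above.
toy = pd.DataFrame(
    {
        "extended": [0, 1, 0],
        "vicinity": [0, 1, 999],
        "claimed": [0.0, 1.0, 999.0],
    }
)
binary, extra = split_binary_columns(toy, list(toy.columns))
print(binary, extra)
```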

### **API**

#### *Information about the ACLED API*

In [15]:
df_api.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 377337 entries, 0 to 377336
Data columns (total 4 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   event_date     377337 non-null  object
 1   country        377337 non-null  object
 2   disorder_type  377337 non-null  object
 3   actor1         377337 non-null  object
dtypes: object(4)
memory usage: 11.5+ MB


#### *Null values*

In [16]:
df_api.isna().sum()

event_date       0
country          0
disorder_type    0
actor1           0
dtype: int64

#### *Columns*

In [17]:
df_api.columns

Index(['event_date', 'country', 'disorder_type', 'actor1'], dtype='object')

## ***Configuring Great Expectations***

In [18]:
def gx_validation(gx_context, asset_name, suite_name, df, gx_expectations_object):
    """Validate a DataFrame against a list of expectations and print every result."""
    # Connect data to GX, reusing the "pandas" data source if it already exists
    try:
        data_source = gx_context.data_sources.add_pandas("pandas")
    except gx.exceptions.DataContextError:
        data_source = gx_context.data_sources.get("pandas")

    # Reuse the asset if it is already registered (for example, on a re-run)
    try:
        data_asset = data_source.get_asset(asset_name)
    except LookupError:
        data_asset = data_source.add_dataframe_asset(name=asset_name)

    batch_definition_name = f"batch_definition_{asset_name}"
    try:
        batch_definition = data_asset.get_batch_definition(name=batch_definition_name)
    except KeyError:
        batch_definition = data_asset.add_batch_definition_whole_dataframe(name=batch_definition_name)

    # The DataFrame itself is supplied at runtime as a batch parameter
    batch_parameters = {"dataframe": df}
    batch = batch_definition.get_batch(batch_parameters=batch_parameters)

    # Suites: recreate the suite from scratch if one with this name already exists
    try:
        suite = gx_context.suites.add(gx.ExpectationSuite(name=suite_name))
    except gx.exceptions.DataContextError:
        gx_context.suites.delete(name=suite_name)
        suite = gx_context.suites.add(gx.ExpectationSuite(name=suite_name))

    for expectation in gx_expectations_object:
        suite.add_expectation(expectation)

    # Validate data
    validation_results = batch.validate(suite)

    # Results: overall outcome first, then one entry per expectation
    print(f"The validation for {asset_name} was successful? ➤ {validation_results.success}")

    for result in validation_results["results"]:
        print(result)
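
Since these validations are meant to run in Airflow, it may help to surface only the failing expectations and to raise an exception when any exist, because a task whose callable raises is marked as failed. The sketch below works on plain dictionaries shaped like the results printed in the Validation section (`success`, `expectation_config.type`, `expectation_config.kwargs.column`). The helper names `failed_expectations` and `raise_if_failed` and the sample data are hypothetical, and converting GX result objects into such dictionaries is left out as an assumption.

```python
def failed_expectations(results):
    """Return (expectation type, column) pairs for every result that did not succeed."""
    failures = []
    for result in results:
        if not result["success"]:
            config = result["expectation_config"]
            # Table-level expectations have no "column" kwarg, hence .get().
            failures.append((config["type"], config["kwargs"].get("column")))
    return failures


def raise_if_failed(results):
    """Raise ValueError listing the failed expectations; do nothing if all passed."""
    failures = failed_expectations(results)
    if failures:
        details = ", ".join(f"{kind} on {column}" for kind, column in failures)
        raise ValueError(f"{len(failures)} expectation(s) failed: {details}")


# Hypothetical results, shaped like the printed validation output.
sample_results = [
    {
        "success": True,
        "expectation_config": {
            "type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "event_date"},
        },
    },
    {
        "success": False,
        "expectation_config": {
            "type": "expect_column_values_to_be_between",
            "kwargs": {"column": "iyear"},
        },
    },
]
print(failed_expectations(sample_results))
```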

## ***Creating the Expectations***

### **Database**

In [19]:
gtd_expectations = [
    gxe.ExpectTableColumnsToMatchOrderedList(
        column_list=[
            'eventid', 'iyear', 'imonth', 'iday', 'extended', 'country_txt',
            'country', 'region_txt', 'region', 'city', 'latitude', 'longitude',
            'vicinity', 'crit1', 'crit2', 'crit3', 'doubtterr', 'multiple',
            'success', 'suicide', 'attacktype1_txt', 'attacktype1', 'targtype1_txt',
            'targtype1', 'natlty1_txt', 'natlty1', 'gname', 'guncertain1',
            'individual', 'nperps', 'nperpcap', 'claimed', 'weaptype1_txt',
            'weaptype1', 'nkill', 'nwound', 'property', 'ishostkid', 'INT_ANY',
            'date', 'date_country_actor'
            ]
        ),
    gxe.ExpectColumnValuesToBeBetween(column="iyear", min_value=1970, max_value=2017),
    gxe.ExpectColumnValuesToBeBetween(column="imonth", min_value=1, max_value=12),
    gxe.ExpectColumnValuesToBeBetween(column="iday", min_value=1, max_value=31),
    gxe.ExpectColumnValuesToBeBetween(column="country", min_value=4, max_value=1004),
    gxe.ExpectColumnValuesToBeBetween(column="region", min_value=1, max_value=12),
    gxe.ExpectColumnValuesToBeBetween(column="latitude", min_value=-42.250458, max_value=999.0),
    gxe.ExpectColumnValuesToBeBetween(column="longitude", min_value=-157.858333, max_value=179.366667),
    gxe.ExpectColumnValuesToBeBetween(column="attacktype1", min_value=1, max_value=9),
    gxe.ExpectColumnValuesToBeBetween(column="targtype1", min_value=1, max_value=22),
    gxe.ExpectColumnValuesToBeBetween(column="natlty1", min_value=4, max_value=1004),
    gxe.ExpectColumnValuesToBeBetween(column="nperps", min_value=0, max_value=25000),
    gxe.ExpectColumnValuesToBeBetween(column="nperpcap", min_value=0, max_value=999),
    gxe.ExpectColumnValuesToBeBetween(column="weaptype1", min_value=1, max_value=13),
    gxe.ExpectColumnValuesToBeBetween(column="nkill", min_value=0, max_value=1384),
    gxe.ExpectColumnValuesToBeBetween(column="nwound", min_value=0, max_value=8191),
    gxe.ExpectColumnValuesToBeInSet(column="extended", value_set=[0, 1]),
    gxe.ExpectColumnValuesToBeInSet(column="vicinity", value_set=[0, 1, 999]),
    gxe.ExpectColumnValuesToBeInSet(column="crit1", value_set=[1, 0]),
    gxe.ExpectColumnValuesToBeInSet(column="crit2", value_set=[1, 0]),
    gxe.ExpectColumnValuesToBeInSet(column="crit3", value_set=[1, 0]),
    gxe.ExpectColumnValuesToBeInSet(column="doubtterr", value_set=[0.0]),
    gxe.ExpectColumnValuesToBeInSet(column="multiple", value_set=[0.0, 1.0]),
    gxe.ExpectColumnValuesToBeInSet(column="success", value_set=[1, 0]),
    gxe.ExpectColumnValuesToBeInSet(column="suicide", value_set=[0, 1]),
    gxe.ExpectColumnValuesToBeInSet(column="guncertain1", value_set=[0.0, 1.0]),
    gxe.ExpectColumnValuesToBeInSet(column="individual", value_set=[0, 1]),
    gxe.ExpectColumnValuesToBeInSet(column="claimed", value_set=[0.0, 1.0, 999.0]),
    gxe.ExpectColumnValuesToBeInSet(column="property", value_set=[0, 1, 999]),
    gxe.ExpectColumnValuesToBeInSet(column="ishostkid", value_set=[0.0, 1.0, 999.0]),
    gxe.ExpectColumnValuesToBeInSet(column="INT_ANY", value_set=[999, 0, 1]),
    gxe.ExpectColumnValuesToNotBeNull(column="eventid"),
    gxe.ExpectColumnValuesToNotBeNull(column="iyear"),
    gxe.ExpectColumnValuesToNotBeNull(column="imonth"),
    gxe.ExpectColumnValuesToNotBeNull(column="iday"),
    gxe.ExpectColumnValuesToNotBeNull(column="extended"),
    gxe.ExpectColumnValuesToNotBeNull(column="country_txt"),
    gxe.ExpectColumnValuesToNotBeNull(column="country"),
    gxe.ExpectColumnValuesToNotBeNull(column="region_txt"),
    gxe.ExpectColumnValuesToNotBeNull(column="region"),
    gxe.ExpectColumnValuesToNotBeNull(column="city"),
    gxe.ExpectColumnValuesToNotBeNull(column="latitude"),
    gxe.ExpectColumnValuesToNotBeNull(column="longitude"),
    gxe.ExpectColumnValuesToNotBeNull(column="vicinity"),
    gxe.ExpectColumnValuesToNotBeNull(column="crit1"),
    gxe.ExpectColumnValuesToNotBeNull(column="crit2"),
    gxe.ExpectColumnValuesToNotBeNull(column="crit3"),
    gxe.ExpectColumnValuesToNotBeNull(column="doubtterr"),
    gxe.ExpectColumnValuesToNotBeNull(column="multiple"),
    gxe.ExpectColumnValuesToNotBeNull(column="success"),
    gxe.ExpectColumnValuesToNotBeNull(column="suicide"),
    gxe.ExpectColumnValuesToNotBeNull(column="attacktype1_txt"),
    gxe.ExpectColumnValuesToNotBeNull(column="attacktype1"),
    gxe.ExpectColumnValuesToNotBeNull(column="targtype1_txt"),
    gxe.ExpectColumnValuesToNotBeNull(column="targtype1"),
    gxe.ExpectColumnValuesToNotBeNull(column="natlty1_txt"),
    gxe.ExpectColumnValuesToNotBeNull(column="natlty1"),
    gxe.ExpectColumnValuesToNotBeNull(column="gname"),
    gxe.ExpectColumnValuesToNotBeNull(column="guncertain1"),
    gxe.ExpectColumnValuesToNotBeNull(column="individual"),
    gxe.ExpectColumnValuesToNotBeNull(column="nperps"),
    gxe.ExpectColumnValuesToNotBeNull(column="nperpcap"),
    gxe.ExpectColumnValuesToNotBeNull(column="claimed"),
    gxe.ExpectColumnValuesToNotBeNull(column="weaptype1_txt"),
    gxe.ExpectColumnValuesToNotBeNull(column="weaptype1"),
    gxe.ExpectColumnValuesToNotBeNull(column="nkill"),
    gxe.ExpectColumnValuesToNotBeNull(column="nwound"),
    gxe.ExpectColumnValuesToNotBeNull(column="property"),
    gxe.ExpectColumnValuesToNotBeNull(column="ishostkid"),
    gxe.ExpectColumnValuesToNotBeNull(column="INT_ANY"),
    gxe.ExpectColumnValuesToNotBeNull(column="date"),
    gxe.ExpectColumnValuesToNotBeNull(column="date_country_actor"),
    gxe.ExpectColumnValuesToBeOfType(column="eventid", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="iyear", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="imonth", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="iday", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="extended", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="country_txt", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="country", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="region_txt", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="region", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="city", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="latitude", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="longitude", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="vicinity", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="crit1", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="crit2", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="crit3", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="doubtterr", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="multiple", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="success", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="suicide", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="attacktype1_txt", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="attacktype1", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="targtype1_txt", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="targtype1", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="natlty1_txt", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="natlty1", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="gname", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="guncertain1", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="individual", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="nperps", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="nperpcap", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="claimed", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="weaptype1_txt", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="weaptype1", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="nkill", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="nwound", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="property", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="ishostkid", type_="float64"),
    gxe.ExpectColumnValuesToBeOfType(column="INT_ANY", type_="int64"),
    gxe.ExpectColumnValuesToBeOfType(column="date", type_="datetime64"),
    gxe.ExpectColumnValuesToBeOfType(column="date_country_actor", type_="str")
]    
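
The list above repeats the same three or four expectation classes for every column. One possible way to shorten it is to describe the columns once (dtype, range, allowed values) and generate the entries from that description. The sketch below only builds `(class name, kwargs)` pairs so that it runs without Great Expectations installed. The helper `build_expectation_specs` is hypothetical, and the three columns are a small subset chosen for illustration.

```python
def build_expectation_specs(dtypes, ranges=None, value_sets=None):
    """Build (expectation class name, kwargs) pairs from compact column descriptions."""
    specs = []
    for column in dtypes:
        specs.append(("ExpectColumnValuesToNotBeNull", {"column": column}))
    for column, type_name in dtypes.items():
        specs.append(("ExpectColumnValuesToBeOfType", {"column": column, "type_": type_name}))
    for column, (low, high) in (ranges or {}).items():
        specs.append(
            ("ExpectColumnValuesToBeBetween", {"column": column, "min_value": low, "max_value": high})
        )
    for column, allowed in (value_sets or {}).items():
        specs.append(("ExpectColumnValuesToBeInSet", {"column": column, "value_set": list(allowed)}))
    return specs


specs = build_expectation_specs(
    dtypes={"iyear": "int64", "extended": "int64", "country_txt": "str"},
    ranges={"iyear": (1970, 2017)},
    value_sets={"extended": [0, 1]},
)

# With Great Expectations imported as in this notebook, each pair could then be
# turned into an expectation object (untested assumption):
#   expectations = [getattr(gxe, name)(**kwargs) for name, kwargs in specs]
for name, kwargs in specs:
    print(name, kwargs)
```

The generated order differs from the handwritten list, which should not matter because each expectation is evaluated independently.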

### **API**

In [20]:
api_exceptions = [
    gxe.ExpectTableColumnsToMatchOrderedList(
        column_list=[
            'event_date',
            'country',
            'disorder_type',
            'actor1'
            ]
    ),
    gxe.ExpectColumnValuesToNotBeNull(column="event_date"),
    gxe.ExpectColumnValuesToNotBeNull(column="country"),
    gxe.ExpectColumnValuesToNotBeNull(column="disorder_type"),
    gxe.ExpectColumnValuesToNotBeNull(column="actor1"),
    gxe.ExpectColumnValuesToBeOfType(column="event_date", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="country", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="disorder_type", type_="str"),
    gxe.ExpectColumnValuesToBeOfType(column="actor1", type_="str")
]
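
The API suite checks that `event_date` holds strings, so a malformed or impossible date would still pass. As a minimal sketch, assuming the `YYYY-MM-DD` layout seen in `df_api.head()` is the intended format, the values could be pre-checked with the standard library. The helper `invalid_dates` and the sample list are hypothetical.

```python
from datetime import datetime


def invalid_dates(values, date_format="%Y-%m-%d"):
    """Return the values that cannot be parsed with the given strptime format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, date_format)
        except (TypeError, ValueError):
            # TypeError covers non-string input, ValueError covers bad dates or layouts.
            bad.append(value)
    return bad


# One valid date, one impossible date, and one date in a different layout.
sample = ["2017-12-31", "2017-02-30", "31/12/2017"]
print(invalid_dates(sample))
```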

## ***Validation***

In [21]:
context = gx.get_context(mode="file")

11/13/2024 02:14:15 PM FileDataContext loading fluent config
11/13/2024 02:14:15 PM Loading 'datasources' ->
[]
  timestamp = datetime.utcnow().replace(tzinfo=tzutc())


### **Database**

In [22]:
gx_validation(context, "gtd", "gtd_suite", df_gtd, gtd_expectations)

11/13/2024 02:14:15 PM Saving 1 Fluent Datasources to c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL\gx\great_expectations.yml
11/13/2024 02:14:15 PM PandasDatasource.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:15 PM Saving 1 Fluent Datasources to c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL\gx\great_expectations.yml
11/13/2024 02:14:15 PM DataFrameAsset.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:15 PM PandasDatasource.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:15 PM Saving 1 Fluent Datasources to c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL\gx\great_expectations.yml
11/13/2024 02:14:15 PM DataFrameAsset.dict() - missing `config_provider`, skipping config substitution
11

The validation for gtd was successful? ➤ True
{
  "success": true,
  "expectation_config": {
    "type": "expect_table_columns_to_match_ordered_list",
    "kwargs": {
      "batch_id": "pandas-gtd",
      "column_list": [
        "eventid",
        "iyear",
        "imonth",
        "iday",
        "extended",
        "country_txt",
        "country",
        "region_txt",
        "region",
        "city",
        "latitude",
        "longitude",
        "vicinity",
        "crit1",
        "crit2",
        "crit3",
        "doubtterr",
        "multiple",
        "success",
        "suicide",
        "attacktype1_txt",
        "attacktype1",
        "targtype1_txt",
        "targtype1",
        "natlty1_txt",
        "natlty1",
        "gname",
        "guncertain1",
        "individual",
        "nperps",
        "nperpcap",
        "claimed",
        "weaptype1_txt",
        "weaptype1",
        "nkill",
        "nwound",
        "property",
        "ishostkid",
        "INT_ANY",
  

### **API**

In [23]:
gx_validation(context, "api", "api_suite", df_api, api_exceptions)

11/13/2024 02:14:39 PM Saving 1 Fluent Datasources to c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL\gx\great_expectations.yml
11/13/2024 02:14:39 PM DataFrameAsset.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:39 PM DataFrameAsset.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:39 PM PandasDatasource.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:39 PM Saving 1 Fluent Datasources to c:\Users\marti\OneDrive\Escritorio - PC\Ingenieria de Datos e IA - UAO\Semestre 4\ETL\GlobalTerrorismAnalysis_ETL\gx\great_expectations.yml
11/13/2024 02:14:39 PM DataFrameAsset.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:39 PM DataFrameAsset.dict() - missing `config_provider`, skipping config substitution
11/13/2024 02:14:39 PM PandasDatasource.dict() - missing `config_provider`, skipping config sub

The validation for api was successful? ➤ True
{
  "success": true,
  "expectation_config": {
    "type": "expect_table_columns_to_match_ordered_list",
    "kwargs": {
      "batch_id": "pandas-api",
      "column_list": [
        "event_date",
        "country",
        "disorder_type",
        "actor1"
      ]
    },
    "meta": {},
    "id": "75eadce8-fd4e-4d6e-9a57-cebf7844200a"
  },
  "result": {
    "observed_value": [
      "event_date",
      "country",
      "disorder_type",
      "actor1"
    ]
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
{
  "success": true,
  "expectation_config": {
    "type": "expect_column_values_to_not_be_null",
    "kwargs": {
      "batch_id": "pandas-api",
      "column": "event_date"
    },
    "meta": {},
    "id": "cac179c0-bb68-44b1-a743-13576b493214"
  },
  "result": {
    "element_count": 377337,
    "unexpected_count": 0,
    "unexpected_percent": 0.0