# Exploration of new data dump tables

In [None]:
import sys
sys.path.append('../')
import pipeline.sql as plsql
import pipeline.eda as pleda

In [None]:
engine = plsql.create_engine('../config.yaml')

In [None]:
role = "set role direccion_trabajo_inspections_write;"

In [None]:
tables = plsql.query("""{} SELECT table_name FROM information_schema.tables
                      WHERE table_schema='public'
                      and table_name like '{}';""".format(role, '%%dt%%'), engine)

In [None]:
def descriptive_stats(table, engine, role, schema):
    """Print the name, row count, and column names of one table."""
    print("\nTable name: {}".format(table))
    print("Total rows: {}".format(pleda.total_rows(engine, role, schema, table)))
    # `limit 0` returns no rows but still exposes the column names.
    # Use a separate name so the `table` argument is not overwritten by a DataFrame.
    empty_frame = plsql.query("set role '{}'; select * from {}.{} limit 0;".format(role, schema, table), engine)
    print("Column names: {}".format(empty_frame.columns.values))
    #nulls = pleda.proportion_nulls_all_columns(engine, role, schema, table)
    #print("Total nulls by column: {}".format(nulls['count']))
    #print("Proportion nulls by column: {}".format(nulls['proportion']))

In [None]:
for index, table in tables.iterrows():
    descriptive_stats(table.table_name, engine, "direccion_trabajo_inspections_write", "public")

## dt_fi_detallemateria

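Guess of what this is: details about inspection matters? (`materia` most likely means the subject matter inspected, not physical materials.) Maybe helpful for creating a "severity" feature. Looks like it goes through the matter codes one by one and gives information about each.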

## dt_fi_detallemateriaturno

Guess of what this is: a different table of matter codes, possibly specific to shifts (`turno` means "shift")??? Significantly smaller than `dt_fi_detallemateria`.

## dt_fi_estadofiscalizacion

Translation: "Inspection status"

Guess: this includes a gloss for the codes, so looking at the contents should help in figuring out what's in here. Probably it's a table explaining the codes for inspection status, which isn't a variable I think we've seen before.

## dt_fi_informefiscalizacion

Translation: Inspection report!!!!

Guess: this is the information inspectors enter when they create their reports. So, this is the closest thing to our inspections data.

PROBLEM: this table is empty, even though the SQL Server --> Postgres conversion did not report an error for it.

## dt_fi_informemateriafisc

Translation: Inspection matter report

Guess: Maybe this is the equivalent of our long table for violations?

PROBLEM: this table is empty. This one did have an error in the SQL Server --> Postgres conversion.
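One way to double-check which tables came through the conversion empty is to count rows per table and flag the zeros. A minimal sketch, using an in-memory SQLite database as a stand-in for the Postgres database. The table contents, the `id` column, and the `row_counts` helper are made up for illustration and are not part of `pipeline`:

```python
import sqlite3

# Hypothetical stand-in for the converted database: one populated table
# and one empty table. Columns and rows here are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt_fi_tipodocumento (codigo INTEGER, glosa TEXT)")
conn.execute("CREATE TABLE dt_fi_informefiscalizacion (id INTEGER)")
conn.execute("INSERT INTO dt_fi_tipodocumento VALUES (1, 'example')")


def row_counts(conn, table_names):
    """Return {table_name: row_count} for each name in table_names."""
    counts = {}
    for name in table_names:
        # Table names cannot be bound as query parameters, so only pass
        # trusted, hard-coded names to this helper.
        query = 'SELECT COUNT(*) FROM "{}"'.format(name)
        counts[name] = conn.execute(query).fetchone()[0]
    return counts


counts = row_counts(conn, ["dt_fi_tipodocumento", "dt_fi_informefiscalizacion"])
empty_tables = [name for name, n in counts.items() if n == 0]
print(counts)
print("Empty tables:", empty_tables)
```

The same idea should carry over to Postgres by swapping the connection and prefixing the schema, but that part is untested here.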

## dt_fi_tipodocumento

Translation: Document type

Guess: another code table of some kind

## dt_fi_tipomateria

Translation: Matter type (type of subject matter inspected)

Guess: another code table

## dt_fi_tipoterminofiscalizacion

Translation: ?? my guess is the type of result of inspections but I'm not sure

## dt_fi_auditasignacion

Translation: Audit assignment

# BELOW THIS IS MY SECTION

## dt_mul_capitulonormasan

**Translation:** Rules chapter (chapter number from a handbook of some kind?)

**Number of rows:** 0

**Columns:** codigo, glosa

**Column translation:** code, gloss

In [None]:
plsql.query("""{} select * from dt_mul_capitulonormasan limit 5;""".format(role), engine)

## dt_mul_categnorma

**Translation:** Rule category

**Number of rows:** 2043

**Columns:** codcategoria, coddetcategoria, idnormasan, ponderacion, excluyente, urgente, vigente

**Column translation:** code_category, code_category_detailed, rule_id, weight, excluding?, urgency, validity

**Observations:** 
* Can be joined with other tables by `codcategoria`, `coddetcategoria`, and `idnormasan`, but none of these columns is unique on its own in this table
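Since no single column is unique, it may be worth checking whether the three columns together form a unique key. A minimal sketch with pandas on toy rows (the values below are invented, not taken from `dt_mul_categnorma`):

```python
import pandas as pd

# Toy rows standing in for dt_mul_categnorma; values are invented.
categnorma = pd.DataFrame({
    "codcategoria":    [1, 1, 2, 2],
    "coddetcategoria": [10, 11, 10, 10],
    "idnormasan":      [100, 100, 100, 101],
})

key_cols = ["codcategoria", "coddetcategoria", "idnormasan"]

# Is each column unique on its own?
single_unique = {col: bool(categnorma[col].is_unique) for col in key_cols}

# Is the combination of all three columns unique?
composite_unique = not categnorma.duplicated(subset=key_cols).any()

print(single_unique)
print("Composite key unique:", composite_unique)
```

If the composite check fails on the real table, the duplicated rows can be listed with `categnorma[categnorma.duplicated(subset=key_cols, keep=False)]`.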

In [None]:
plsql.query("""{} select * from dt_mul_categnorma limit 5;""".format(role), engine)

In [None]:
pleda.count_distinct(engine, "direccion_trabajo_inspections_write", "public", "dt_mul_categnorma", "idnormasan")

## dt_mul_conceptonormasan

**Translation:** Rule concept

**Number of rows:** 14

**Columns:** codigo, glosa

**Column translation:** code, gloss

**Observations:**
* Can be joined with other tables with `codigo` matching `codconcepto`

In [None]:
plsql.query("""{} select * from dt_mul_conceptonormasan limit 5;""".format(role), engine)

## dt_mul_detallenormasan

**Translation:** Detailed rule

**Number of rows:** 3399

**Columns:** normasan, codigo, nl_infringida, nl_sancionada, cuerpolegal, desde, hasta, codgravedad, enunciado, hecho, codtipomoneda, junji, capitulo, codconcepto

**Column translation:** rule_id, code_in_handbook, article_location_infringed, article_location_sanctioned, legal_body, from_when, until_when, gravity_code, statement, fact, currency_type_code, ????, chapter, concept_code

**Observations:**
* `desde` and `hasta` ("from" and "until") are integers, and I have no idea what range they describe
* `capitulo` can probably be joined with `dt_mul_capitulonormasan`
* `codconcepto` can probably be joined with `dt_mul_conceptonormasan`
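To test a "can probably be joined" guess like these, a left merge with `indicator=True` shows how many rows actually find a match. A minimal sketch on toy frames (the rows are invented; only the column names `normasan`, `codconcepto`, `codigo`, and `glosa` come from the tables above):

```python
import pandas as pd

# Toy stand-ins for dt_mul_detallenormasan and dt_mul_conceptonormasan.
detalle = pd.DataFrame({"normasan": [1, 2, 3], "codconcepto": [1, 2, 99]})
concepto = pd.DataFrame({"codigo": [1, 2, 3], "glosa": ["a", "b", "c"]})

merged = detalle.merge(
    concepto,
    how="left",
    left_on="codconcepto",
    right_on="codigo",
    indicator=True,          # adds a `_merge` column: both / left_only
    validate="many_to_one",  # raises if `codigo` is not unique in concepto
)

# Codes in the detail table with no match in the code table.
unmatched = merged.loc[merged["_merge"] == "left_only", "codconcepto"].tolist()
match_rate = (merged["_merge"] == "both").mean()

print("Unmatched codes:", unmatched)
print("Match rate:", match_rate)
```

A match rate near 1 on the real tables would support the join; a low rate would suggest the codes refer to something else.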

In [None]:
plsql.query("""{} select * from dt_mul_detallenormasan limit 5;""".format(role), engine)

## dt_mul_tipocategorias

**Translation:** Category types

**Number of rows:** 86

**Columns:** codigo, glosa, vigente

**Column translation:** code, gloss, validity

**Observations:** 
* `codigo` can possibly be matched to `codcategoria` in other tables

In [None]:
plsql.query("""{} select * from dt_mul_tipocategorias limit 5;""".format(role), engine)

## dt_mul_tipocategoriasturno

**Translation:** Category types by shift? (`turno` means "shift" or "turn")

**Number of rows:** 32

**Columns:** codigo, glosa, vigente

**Column translation:** code, gloss, validity

**Observations:**
* I suspect that this is a different version of `dt_mul_tipocategorias`. Maybe it is the appropriate code table for a different time period?
* If so, it can be joined using `codcategoria`
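A quick way to see how two code tables relate is to compare their code sets and then compare the glosses for the codes they share. A minimal sketch on toy frames (the rows are invented; only the column names `codigo` and `glosa` come from the tables):

```python
import pandas as pd

# Toy stand-ins for dt_mul_tipocategorias and dt_mul_tipocategoriasturno.
tipocategorias = pd.DataFrame({"codigo": [1, 2, 3], "glosa": ["A", "B", "C"]})
turno = pd.DataFrame({"codigo": [2, 3, 4], "glosa": ["B", "C changed", "D"]})

codes_main = set(tipocategorias["codigo"])
codes_turno = set(turno["codigo"])

shared = sorted(codes_main & codes_turno)      # codes present in both tables
only_turno = sorted(codes_turno - codes_main)  # codes only in the turno table

# For shared codes, check whether the gloss text agrees.
both = tipocategorias.merge(turno, on="codigo", suffixes=("", "_turno"))
differing = both.loc[both["glosa"] != both["glosa_turno"], "codigo"].tolist()

print("Shared codes:", shared)
print("Only in turno:", only_turno)
print("Shared codes with different gloss:", differing)
```

If the smaller table turns out to be a subset with identical glosses, it is probably a filtered view rather than a separate code system.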

In [None]:
plsql.query("""{} select * from dt_mul_tipocategoriasturno limit 5;""".format(role), engine)

## missing: dt_fi_informemateriafisc

**Translation:** Inspection matter report

**Number of rows:** 4825047

**Columns:** idinformat, idfiscalizacion, codmateria, informe, proced, coddetecinfrac, nrovisitacorreccion, codsituatcionfinal, codaccion

**Column translation:** 

**Observations:**

# Questions

* Should `dt_mul_capitulonormasan` be empty?
* How are `dt_mul_tipocategorias` and `dt_mul_tipocategoriasturno` related?
* What is `nrovisitacorreccion` in `dt_fi_informemateriafisc`?

In [None]:
tables

In [None]:
# Limit the preview: the table has about 4.8 million rows.
plsql.query("""{} select * from dt_fi_informemateriafisc limit 5;""".format(role), engine)