## Summary of data

This notebook summarizes the crimes (delitos) recorded from 2015 to 2020 and the judicaturas (courts) that handled those cases.

```python
import pandas as pd
from pathlib import Path
```

```python
# Define paths
root = Path.cwd().parent
raw = root/'data/raw'
proc = root/'data/proc'
```

***
### Get list of judicaturas 

There are two sources for the judicatura codes: the file sent by the Consejo, and the scraped data.

First, I will load the file from the Consejo. It only includes the judicaturas that hear criminal (penal) cases, and I keep only the Guayaquil codes (those starting with `09`).

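```python
# Skip the 5 header and 5 footer rows of the sheet; read only columns B:C
cods_20 = pd.read_excel(raw/'satje/0645_Id_Judicaturas.xlsx', skiprows=5, usecols='B:C', skipfooter=5)
# Keep only Guayaquil judicaturas (codes starting with '09')
cods_20 = cods_20[cods_20['ID_JUDICATURA'].str.startswith('09')].copy()
cods_20['penal'] = 'penal'  # flag these judicaturas as criminal
cods_20.reset_index(drop=True, inplace=True)
```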

Next, get the list of judicaturas whose scraped files are already processed (stored in `proc/delitos_web`).

```python
# File names look like delitos_<code>.xls; characters 8 to 12 hold the code
cods_15 = list((proc/'delitos_web/').glob('*.xls'))
cods_15 = list(map(lambda x: x.name[8:13], cods_15))
cods_15 = pd.DataFrame(cods_15, columns=['id_judicatura'])

# Keep only Guayaquil judicaturas (codes starting with '09')
cods_15 = cods_15[cods_15['id_judicatura'].str.startswith('09')].reset_index(drop=True).copy()
```
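
The slice `x.name[8:13]` assumes every file is named `delitos_` plus a five-character code plus `.xls`. A slightly sturdier sketch, assuming that naming pattern and (as an extra assumption) purely numeric five-digit codes, extracts the code with a regular expression and skips files that do not match. The helper `codes_from_files` and the file names are hypothetical, for illustration only.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed naming pattern: delitos_<code>.xls with a five-digit code
PATTERN = re.compile(r"^delitos_(\d{5})\.xls$")


def codes_from_files(paths):
    """Return a DataFrame with the codes found in file names matching PATTERN."""
    codes = []
    for p in paths:
        m = PATTERN.match(Path(p).name)
        if m:  # ignore files that do not follow the naming pattern
            codes.append(m.group(1))
    return pd.DataFrame(codes, columns=["id_judicatura"])


# Hypothetical file names standing in for the contents of proc/'delitos_web'
files = ["delitos_09101.xls", "delitos_17201.xls", "notes.xls"]
cods_15 = codes_from_files(files)

# Keep only codes starting with '09', as the notebook does
cods_15 = cods_15[cods_15["id_judicatura"].str.startswith("09")].reset_index(drop=True)
print(cods_15)
```

In the notebook, `files` would come from `(proc/'delitos_web').glob('*.xls')`.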

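Merge both lists to check whether the scraped data covers all the criminal judicaturas. With `indicator=True`, pandas adds a `_merge` column that marks each code as `both`, `left_only` (only in the Consejo file), or `right_only` (only in the scraped data).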

```python
cods = pd.merge(cods_20,
    cods_15,
    how='outer',
    left_on=['ID_JUDICATURA'],
    right_on=['id_judicatura'],
    validate='1:1',
    indicator=True)
```
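
A quick way to read the merge result is to count the `_merge` categories. A minimal sketch on made-up codes (not the real judicatura list):

```python
import pandas as pd

# Toy stand-ins for the two code lists (made-up codes)
cods_20 = pd.DataFrame({"ID_JUDICATURA": ["09101", "09102", "09103"]})
cods_15 = pd.DataFrame({"id_judicatura": ["09101", "09103", "09104"]})

cods = pd.merge(
    cods_20,
    cods_15,
    how="outer",
    left_on="ID_JUDICATURA",
    right_on="id_judicatura",
    validate="1:1",   # raises MergeError if a code repeats on either side
    indicator=True,   # adds the categorical '_merge' column
)

# How many codes are in both lists, only in the Consejo file, or only scraped
counts = cods["_merge"].value_counts()
print(counts)
```

Here `09102` comes out as `left_only` (a criminal judicatura with no scraped file) and `09104` as `right_only` (scraped, but not in the Consejo file).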

***
### Get list of crimes

I will get the list in two sets: first, the crimes from the scraped judicaturas that match the criminal judicaturas, and then those from the ones that don't.
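
The two sets can be read off the `_merge` column. A sketch on a made-up merge result, assuming the second set means the scraped judicaturas marked `right_only` (the `left_only` codes have no scraped file to load); `unmatched_codes` is a hypothetical name:

```python
import pandas as pd

# Made-up merge result carrying the '_merge' indicator column
cods = pd.DataFrame({
    "ID_JUDICATURA": ["09101", "09102", None],
    "id_judicatura": ["09101", None, "09104"],
    "_merge": ["both", "left_only", "right_only"],
})

# First set: scraped judicaturas that are also in the Consejo's criminal list
matched_codes = cods.loc[cods["_merge"] == "both", "id_judicatura"].tolist()

# Second set (assumed): scraped judicaturas missing from that list
unmatched_codes = cods.loc[cods["_merge"] == "right_only", "id_judicatura"].tolist()

print(matched_codes, unmatched_codes)
```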

```python
matched_codes = list(cods.loc[cods['_merge'] == 'both', 'id_judicatura'])
frames = []

for idj in matched_codes:

    # Load the scraped cases of this judicatura
    df = pd.read_excel(proc/f"delitos_web/delitos_{idj}.xls")

    # Keep one row per distinct crime (causa)
    df = df.drop_duplicates(subset='causa')

    # Add code of judicatura
    df['id_judicatura'] = idj

    # Collect the crimes; concatenating once at the end avoids repeated copies
    frames.append(df[['id_judicatura', 'causa']])

causas = pd.concat(frames, ignore_index=True)
```
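
Because duplicates are dropped within each file, a crime appears once per judicatura that handled it. Assuming the files have already been stacked, the same result can be sketched with a single `drop_duplicates` on both columns; the in-memory tables below are hypothetical stand-ins for the `.xls` files.

```python
import pandas as pd

# Hypothetical per-judicatura tables standing in for delitos_<code>.xls
files = {
    "09101": pd.DataFrame({"causa": ["ROBO", "ROBO", "ESTAFA"]}),
    "09102": pd.DataFrame({"causa": ["ROBO", None]}),
}

# Stack all files, tagging each row with its judicatura code
stacked = pd.concat(
    [df.assign(id_judicatura=code) for code, df in files.items()],
    ignore_index=True,
)

# One row per (judicatura, crime) pair, like deduplicating each file separately
causas = (stacked[["id_judicatura", "causa"]]
          .drop_duplicates(subset=["id_judicatura", "causa"])
          .reset_index(drop=True))
print(causas)
```

Missing crime names survive this step (one per judicatura), which is why the next cell drops them.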

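```python
# Drop rows without a crime name; .copy() makes causas an independent frame,
# so the assignments below do not raise SettingWithCopyWarning
causas = causas.loc[~causas['causa'].isna()].copy()
```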

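```python
# For crime names that start with a digit, drop everything up to the first space;
# the other rows get NaN here and are filled in the next cell
causas['nombre_simple'] = (causas
    .loc[causas['causa'].apply(lambda x: x[0].isdigit()), 'causa']
    .apply(lambda t: t[t.find(' ')+1:]))
```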

```python
# Names that do not start with a digit keep the original crime name
causas.loc[causas['nombre_simple'].isna(), 'nombre_simple'] = causas['causa']
```
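
The two steps that build `nombre_simple` (strip the leading token, then fall back to the full name) can also be sketched as one vectorized regex replacement. This is an alternative, not the notebook's method, and it assumes ordinary ASCII digits; the crime names below are made up.

```python
import pandas as pd

# Made-up crime names: some start with a digit, some do not
causa = pd.Series(["140 ASESINATO", "ROBO", "189 ROBO AGRAVADO", "2020"])

# If the name starts with a digit, remove everything up to and including the
# first space; names that do not match (including "2020", which has no space)
# are left unchanged
nombre_simple = causa.str.replace(r"^\d[^ ]* ", "", regex=True)
print(nombre_simple.tolist())
```

Unlike `x[0].isdigit()`, the regex also tolerates empty strings instead of raising an `IndexError`.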

```python
# Save one row per distinct crime, sorted by simplified name
(causas[['causa', 'nombre_simple']]
 .drop_duplicates('causa')
 .sort_values(['nombre_simple', 'causa'])
 .to_csv(proc/'lista_delitos_15_20.csv', encoding='latin-1', index=False))
```

```python
# Example: look up the crimes whose name starts with 'MALTRATO DE'
causas[causas['causa'].str.startswith('MALTRATO DE')]
```

| | id_judicatura | causa | nombre_simple |
|---|---|---|---|
| 2428 | 9318 | MALTRATO DE NIÑOS NIÑAS Y ADOLESCENTES | MALTRATO DE NIÑOS NIÑAS Y ADOLESCENTES |
| 3011 | 9322 | MALTRATO DE NIÑOS NIÑAS Y ADOLESCENTES | MALTRATO DE NIÑOS NIÑAS Y ADOLESCENTES |
| 3430 | 9327 | MALTRATO DE NIÑOS NIÑAS Y ADOLESCENTES | MALTRATO DE NIÑOS NIÑAS Y ADOLESCENTES |
