## Cleaning Zonas de Patrullaje 2018 - RAW CSV 

This notebook extracts and transforms the raw dataset into a smaller, more usable one. In other words, it cleans the dataset using pandas in a Jupyter notebook.

In [1]:
# Importing the libraries needed for the cleaning steps

import pandas as pd
import numpy as np
from os import path

In [10]:
# Setting filename and location
filename = path.join("..","Raw","zonas-de-patrullaje-2018.csv")

# Read the patrol zones file and store it in a pandas DataFrame
zonas_patrullaje_df = pd.read_csv(filename, sep=';')

zonas_patrullaje_df.head()

Unnamed: 0,Geopoint,Geoshape,Alcaldía,Zona,Sector 18,Clave sector,ZP,Área km2,x,y,Fecha creación,Año,Mes,Día
0,"19.4559485754, -99.1339187632","{""type"": ""Polygon"", ""coordinates"": [[[-99.1373...",CUAUHTEMOC,CENTRO,TLATELOLCO,5,6,0.599031,-99.133437,19.455969,2018/08/17,2018,8,17
1,"19.4489311584, -99.1492549723","{""type"": ""Polygon"", ""coordinates"": [[[-99.1529...",CUAUHTEMOC,CENTRO,BUENAVISTA,2,6,0.542691,-99.149153,19.448988,2018/08/17,2018,8,17
2,"19.4466167038, -99.1372059309","{""type"": ""Polygon"", ""coordinates"": [[[-99.1388...",CUAUHTEMOC,CENTRO,BUENAVISTA,2,10,0.139906,-99.136628,19.446343,2018/08/17,2018,8,17
3,"19.4345863188, -99.1559474685","{""type"": ""Polygon"", ""coordinates"": [[[-99.1587...",CUAUHTEMOC,CENTRO,REVOLUCION,3,5,0.263673,-99.156147,19.434677,2018/08/17,2018,8,17
4,"19.4287224255, -99.1566989128","{""type"": ""Polygon"", ""coordinates"": [[[-99.1544...",CUAUHTEMOC,CENTRO,REVOLUCION,3,7,0.294612,-99.156832,19.428315,2018/08/17,2018,8,17


In [12]:
# Dropping unneeded columns from the dataframe: Zona, Clave sector, ZP, Fecha creación, Año, Mes, Día

zonas_patrullaje_df.drop(columns=['Zona', 'Clave sector','ZP','Fecha creación','Año','Mes','Día'], inplace=True)

zonas_patrullaje_df.head()

Unnamed: 0,Geopoint,Geoshape,Alcaldía,Sector 18,Área km2,x,y
0,"19.4559485754, -99.1339187632","{""type"": ""Polygon"", ""coordinates"": [[[-99.1373...",CUAUHTEMOC,TLATELOLCO,0.599031,-99.133437,19.455969
1,"19.4489311584, -99.1492549723","{""type"": ""Polygon"", ""coordinates"": [[[-99.1529...",CUAUHTEMOC,BUENAVISTA,0.542691,-99.149153,19.448988
2,"19.4466167038, -99.1372059309","{""type"": ""Polygon"", ""coordinates"": [[[-99.1388...",CUAUHTEMOC,BUENAVISTA,0.139906,-99.136628,19.446343
3,"19.4345863188, -99.1559474685","{""type"": ""Polygon"", ""coordinates"": [[[-99.1587...",CUAUHTEMOC,REVOLUCION,0.263673,-99.156147,19.434677
4,"19.4287224255, -99.1566989128","{""type"": ""Polygon"", ""coordinates"": [[[-99.1544...",CUAUHTEMOC,REVOLUCION,0.294612,-99.156832,19.428315
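As an alternative to dropping columns after loading, the columns to keep could be selected at read time with the `usecols` parameter of `pd.read_csv`. The sketch below is only an illustration: it assumes the raw file's header uses the column names displayed above, and `sample_csv`, `keep_columns` and `sample_df` are hypothetical names. The two rows are abbreviated stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical two-row sample using the same ';' separator as the raw file.
# The Geopoint and Geoshape values are shortened placeholders.
sample_csv = io.StringIO(
    "Geopoint;Geoshape;Alcaldía;Zona;Sector 18;Clave sector;ZP;Área km2;x;y;Fecha creación;Año;Mes;Día\n"
    "19.45, -99.13;{};CUAUHTEMOC;CENTRO;TLATELOLCO;5;6;0.599031;-99.133437;19.455969;2018/08/17;2018;8;17\n"
    "19.44, -99.14;{};CUAUHTEMOC;CENTRO;BUENAVISTA;2;6;0.542691;-99.149153;19.448988;2018/08/17;2018;8;17\n"
)

# Columns to keep, listed in the same order as they appear in the file
keep_columns = ['Geopoint', 'Geoshape', 'Alcaldía', 'Sector 18', 'Área km2', 'x', 'y']

# Only the listed columns are parsed; the rest are never loaded
sample_df = pd.read_csv(sample_csv, sep=';', usecols=keep_columns)

print(list(sample_df.columns))
```

Selecting at read time avoids loading columns that are discarded immediately, while `drop` keeps the full raw frame available for inspection first. Either approach should give the same seven columns.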


In [13]:
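# Checking for null values: count() returns the number of non-null entries per column.
# If every column has the same count, no nulls were found and nothing else is done in this step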
zonas_patrullaje_df.count()

Geopoint     698
Geoshape     698
Alcaldía     698
Sector 18    698
Área km2     698
x            698
y            698
dtype: int64
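The `count()` output reports non-null entries, so missing values only show up as a column with a lower count. A more direct check, sketched here on a small hypothetical frame (`example_df` is not part of the dataset), is to count the nulls themselves with `isna().sum()`:

```python
import pandas as pd

# Hypothetical frame with one missing value, for illustration only
example_df = pd.DataFrame({
    'Alcaldía': ['CUAUHTEMOC', 'CUAUHTEMOC', None],
    'Área km2': [0.599031, 0.542691, 0.139906],
})

# Number of null entries per column
null_counts = example_df.isna().sum()
print(null_counts)

# True if at least one column contains a null
has_nulls = bool(null_counts.any())
print(has_nulls)
```

Applied to `zonas_patrullaje_df`, the same call would be expected to return zero for every column, given that all seven counts above are equal.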

In [24]:
# Counting the unique values in each column
for column in zonas_patrullaje_df:
    print(f"Column {column} has {zonas_patrullaje_df[column].nunique()} unique values")

Column Geopoint has 698 unique values
Column Geoshape has 698 unique values
Column Alcaldía has 16 unique values
Column Sector 18 has 71 unique values
Column Área km2 has 698 unique values
Column x has 696 unique values
Column y has 697 unique values
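The counts above show fewer unique `x` and `y` values than unique `Geopoint` values, so a few coordinates repeat across rows. One way to inspect such rows is `DataFrame.duplicated` with `keep=False`, which flags every member of a repeated group. This is a sketch on made-up data; `example_df`, `repeated_x` and `repeated_xy` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows: two of them share the same x value but differ in y
example_df = pd.DataFrame({
    'Sector 18': ['TLATELOLCO', 'BUENAVISTA', 'BUENAVISTA', 'REVOLUCION'],
    'x': [-99.133437, -99.149153, -99.149153, -99.156147],
    'y': [19.455969, 19.448988, 19.446343, 19.434677],
})

# Rows whose x value appears more than once
repeated_x = example_df[example_df.duplicated(subset=['x'], keep=False)]

# Rows whose (x, y) pair appears more than once
repeated_xy = example_df[example_df.duplicated(subset=['x', 'y'], keep=False)]

print(repeated_x)
print(len(repeated_xy))
```

A repeated `x` or `y` on its own is not necessarily an error, since two zones can share one coordinate; a repeated `(x, y)` pair would be more worth a closer look.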


In [28]:
# Exporting the clean dataset (version 1) of Zonas de Patrullaje
fileExport = path.join("..","Clean","clean_ZonasPatrullaje.csv")
zonas_patrullaje_df.to_csv(fileExport, sep=';', index=False)

In [29]:
# Checking the exported file by reading it back
exportFile_df = pd.read_csv(fileExport, sep=';')

exportFile_df.head()

Unnamed: 0,Geopoint,Geoshape,Alcaldía,Sector 18,Área km2,x,y
0,"19.4559485754, -99.1339187632","{""type"": ""Polygon"", ""coordinates"": [[[-99.1373...",CUAUHTEMOC,TLATELOLCO,0.599031,-99.133437,19.455969
1,"19.4489311584, -99.1492549723","{""type"": ""Polygon"", ""coordinates"": [[[-99.1529...",CUAUHTEMOC,BUENAVISTA,0.542691,-99.149153,19.448988
2,"19.4466167038, -99.1372059309","{""type"": ""Polygon"", ""coordinates"": [[[-99.1388...",CUAUHTEMOC,BUENAVISTA,0.139906,-99.136628,19.446343
3,"19.4345863188, -99.1559474685","{""type"": ""Polygon"", ""coordinates"": [[[-99.1587...",CUAUHTEMOC,REVOLUCION,0.263673,-99.156147,19.434677
4,"19.4287224255, -99.1566989128","{""type"": ""Polygon"", ""coordinates"": [[[-99.1544...",CUAUHTEMOC,REVOLUCION,0.294612,-99.156832,19.428315
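Looking at `head()` confirms only the first five rows. A stricter check, sketched below under the assumption that an exact round trip is wanted, compares the whole reloaded frame against the original with `pandas.testing.assert_frame_equal`. An in-memory buffer stands in for the exported file, and `original_df`, `buffer` and `reloaded_df` are hypothetical names.

```python
import io
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for the cleaned frame
original_df = pd.DataFrame({
    'Alcaldía': ['CUAUHTEMOC', 'CUAUHTEMOC'],
    'Sector 18': ['TLATELOLCO', 'BUENAVISTA'],
    'Área km2': [0.599031, 0.542691],
})

# Write to an in-memory buffer with the same options used for the export
buffer = io.StringIO()
original_df.to_csv(buffer, sep=';', index=False)

# Rewind and read the data back
buffer.seek(0)
reloaded_df = pd.read_csv(buffer, sep=';')

# Raises AssertionError if values, columns, or dtypes differ
assert_frame_equal(original_df, reloaded_df)
print("Round trip OK")
```

In the notebook, the equivalent call would compare `zonas_patrullaje_df` with `exportFile_df`.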
