<a href="https://colab.research.google.com/github/Marco90v/Scraping_CNE/blob/main/Scraping_de_cne.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data extraction from the Venezuelan CNE (National Electoral Council) web platform**

This code began out of curiosity. I once saw the billing system of an electronics store: when an invoice was generated, the customer's ID number was entered, and with it the system retrieved the buyer's name and address, which were then used on the invoice.
I always wanted to replicate this functionality, so on several occasions I looked into how it could be done. What I found is explained below.

The CNE makes this data available to the public on its website: you only need to open the web platform and enter an ID number. Looking at the page's code, I could see that it makes a GET request to a PHP file, which returns an HTML structure to be displayed in the browser. From there, it is enough to examine the HTML tags and use web scraping to extract the required data.
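
The GET request described above comes down to a URL with two query parameters. As a minimal sketch, this is one way that URL could be assembled with the standard library. The endpoint and parameter names are the ones used in this notebook's code; `build_cne_url` is a hypothetical helper name, not part of the original code.

```python
from urllib.parse import urlencode

# Endpoint used by the query (same address as in this notebook's code)
BASE_URL = "http://www.cne.gob.ve/web/registro_electoral/ce.php"

def build_cne_url(cedula, nacionalidad="V"):
  """Return the query URL for one ID number (hypothetical helper)."""
  params = {"nacionalidad": nacionalidad, "cedula": str(cedula)}
  # urlencode joins the pairs as key=value separated by '&'
  return BASE_URL + "?" + urlencode(params)

print(build_cne_url(20000000))
```

Building the query string this way avoids manual string concatenation and keeps the parameters easy to change.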

I chose Python for its ease of use, both in its syntax and its libraries, and because it can run in Google Colab, where execution is faster than on my personal computer.


In [64]:
import requests
from bs4 import BeautifulSoup
import json

JSON Design:
---
```json
{
  "cedula":{
    "cedula":"",
    "nombre":"",
    "estado":"",
    "municipio":"",
    "parroquia":"",
    "centro":"",
    "direccion":""
  }
}
```
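
To make the design concrete, here is a small sketch that builds one record and serializes it, reading the design as a JSON object keyed by the ID number. All values are placeholders, not real registry data.

```python
import json

# One placeholder record with the seven fields of the design
registro = {
  "cedula": "V-00000000",
  "nombre": "NOMBRE APELLIDO",
  "estado": "ESTADO",
  "municipio": "MUNICIPIO",
  "parroquia": "PARROQUIA",
  "centro": "CENTRO",
  "direccion": "DIRECCION",
}

# The outer object maps each ID number (as a string) to its record
data = {"00000000": registro}
texto = json.dumps(data, indent=2, ensure_ascii=False)
print(texto)
```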

In [65]:
def consulta2(cedula=0):
  """Query the CNE electoral registry for one ID number and return its fields."""
  # Position of each field's value among the <td> cells of the response page
  campos = {11: "cedula", 13: "nombre", 15: "estado", 17: "municipio",
            19: "parroquia", 21: "centro", 23: "direccion"}
  url = "http://www.cne.gob.ve/web/registro_electoral/ce.php?nacionalidad=V&cedula="+str(cedula)
  # The timeout keeps a slow or unreachable server from blocking the run forever
  response = requests.get(url, timeout=30)
  html = BeautifulSoup(response.text, 'html.parser')
  td = html.find_all('td')

  # A registered ID shows "V-<number>" in cell 11; anything else means no record
  if len(td) <= 11 or not td[11].text.startswith('V'):
    return {"error": "Cedula: " + str(cedula) + " no registrada"}
  return {nombre: td[num].text for num, nombre in campos.items() if num < len(td)}
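
The index-to-field mapping can be checked without contacting the site. The sketch below applies the same cell positions (11 to 23, odd indices) to a plain list of strings. `extraer_campos` and the synthetic cell values are hypothetical, and the real page layout is only assumed from the indices used in `consulta2`. Unlike the function above, this sketch treats a page with too few cells as "not registered", which is an assumption.

```python
# Cell index -> field name, matching the positions used by consulta2
CAMPOS = {11: "cedula", 13: "nombre", 15: "estado", 17: "municipio",
          19: "parroquia", 21: "centro", 23: "direccion"}

def extraer_campos(celdas, cedula):
  """Map a list of cell texts to a record (hypothetical helper)."""
  # Cell 11 must exist and start with 'V' for the ID to count as registered
  if len(celdas) <= 11 or not celdas[11].startswith("V"):
    return {"error": "Cedula: " + str(cedula) + " no registrada"}
  return {nombre: celdas[i] for i, nombre in CAMPOS.items() if i < len(celdas)}

# Synthetic cells: 24 placeholders with made-up values in two positions
celdas = ["x"] * 24
celdas[11] = "V-00000000"
celdas[13] = "NOMBRE APELLIDO"

print(extraer_campos(celdas, 0))
print(extraer_campos([], 0))
```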

In [67]:
data = {}
lista = range(20000000,20000003)
for num in lista:
  # Store each result under its ID number as a string key
  data[str(num)] = consulta2(num)
# convert into JSON:
j = json.dumps(data)
print(j)


## Example of a result:
----
```json
{
  "20000000": {
    "cedula": "V-20000000", 
    "nombre": "JOSE GREGORIO URBANEJA CAMPOS", 
    "estado": "EDO.NVA.ESPARTA", 
    "municipio": "MP. MANEIRO", 
    "parroquia": "CM. PAMPATAR", 
    "centro": "UNIDAD EDUCATIVA ESTADAL LUISA ROSAS DE VELASQUEZ", 
    "direccion": "SECTOR CAMPEARE DERECHA CALLE 3 DE MAYO. IZQUIERDA CALLE EL CRISTO DEL BUEN VIAJE. FRENTE CALLE EL LICEO DIAGONAL A LA UNIDAD EDUCATIVA ANGEL NORIEGA PEREZ EDIFICIO"
  }, 
  "20000001": {
    "cedula": "V-20000001",
    "nombre": "JOSE MERCEDES OLIVEROS GOMEZ",
    "estado": "EDO.NVA.ESPARTA",
    "municipio": "MP. MANEIRO",
    "parroquia": "CM. PAMPATAR",
    "centro": "UNIDAD EDUCATIVA ESTADAL LUISA ROSAS DE VELASQUEZ",
    "direccion": "SECTOR CAMPEARE DERECHA CALLE 3 DE MAYO. IZQUIERDA CALLE EL CRISTO DEL BUEN VIAJE. FRENTE CALLE EL LICEO DIAGONAL A LA UNIDAD EDUCATIVA ANGEL NORIEGA PEREZ EDIFICIO"
  },
  "20000002": {
    "cedula": "V-20000002",
    "nombre": "GINA JOSE DIAZ RODRIGUEZ",
    "estado": "EDO.NVA.ESPARTA",
    "municipio": "MP. MANEIRO",
    "parroquia": "CM. PAMPATAR",
    "centro": "UNIDAD EDUCATIVA ESTADAL LUISA ROSAS DE VELASQUEZ",
    "direccion": "SECTOR CAMPEARE DERECHA CALLE 3 DE MAYO. IZQUIERDA CALLE EL CRISTO DEL BUEN VIAJE. FRENTE CALLE EL LICEO DIAGONAL A LA UNIDAD EDUCATIVA ANGEL NORIEGA PEREZ EDIFICIO"
  }
}
```
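
Not every ID number in a range is registered. For those, `consulta2` stores an `error` key instead of the fields shown above. A minimal sketch of how the two kinds of entries could be separated before saving; `separar` is a hypothetical helper and the sample entries are placeholders.

```python
def separar(data):
  """Split results into found records and error messages (hypothetical helper)."""
  encontrados = {k: v for k, v in data.items() if "error" not in v}
  errores = {k: v["error"] for k, v in data.items() if "error" in v}
  return encontrados, errores

# Placeholder results: one found record and one unregistered ID
data = {
  "00000001": {"cedula": "V-00000001", "nombre": "NOMBRE APELLIDO"},
  "00000002": {"error": "Cedula: 2 no registrada"},
}

encontrados, errores = separar(data)
print(len(encontrados), "found,", len(errores), "not registered")
```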

In [68]:
with open('data.json', 'w') as file:
  file.write(j)
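
To confirm that a saved file can be loaded again, the following sketch writes a placeholder record and reads it back. It writes to a temporary folder so it does not overwrite `data.json`. The explicit UTF-8 encoding and `ensure_ascii=False` are assumptions that keep accented characters readable in the file.

```python
import json
import os
import tempfile

# Placeholder data with the same shape as the results above
data = {"00000000": {"cedula": "V-00000000", "nombre": "NOMBRE APELLIDO"}}

# Temporary path, so the real data.json is left untouched
ruta = os.path.join(tempfile.mkdtemp(), "data.json")
with open(ruta, "w", encoding="utf-8") as file:
  json.dump(data, file, ensure_ascii=False, indent=2)

# Read the file back and compare with what was written
with open(ruta, encoding="utf-8") as file:
  leido = json.load(file)

print(leido == data)
```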