**Group members**:
- Max Chipani
- Jesus Gamboa
- Karen Salazar
- Paolo Gutierrez
- Luis Camarena

Assignment 5
The script must run without errors; any mistake will be graded as 0.


# Installing and Importing Libraries

In [2]:
# !pip install selenium
# !pip install webdriver-manager

In [3]:
from io import StringIO

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, ElementClickInterceptedException
# Only the Chrome versions of Service and Options are imported: importing the
# Edge versions first would just be overwritten by these names.
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
from tqdm import tqdm
from webdriver_manager.chrome import ChromeDriverManager

import re
import time
import pandas as pd
import os

This section imports the necessary libraries for automating web interactions and data processing. Selenium is used for browser automation, enabling control over the Chrome browser via WebDriver. The webdriver_manager library simplifies the setup process by automatically managing the ChromeDriver installation. Other imported libraries (re, time, pandas, os) provide essential tools for tasks like data parsing, time handling, and file management. tqdm is included for displaying progress bars, which is useful for monitoring long-running operations.

# Configuring the WebDriver Instance and Service

In [4]:
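chrome_options = Options()

# Path to a manually downloaded driver. The launch cell below does not use this
# Service; it takes the driver path from webdriver_manager instead.
# service = Service('C://Users//Paolo//Desktop//WE//msedgedriver.exe')
service = Service(executable_path="chromedriver-win64/chromedriver.exe")

This cell creates a Chrome options object and a Service that points to a local ChromeDriver executable. The notebook was first written for Edge (see the commented-out msedgedriver path) and later adapted to Chrome. The browser launched in the next section does not use this Service: it gets the driver path from webdriver_manager, which downloads a matching ChromeDriver automatically. That removes the need for a hard-coded path and makes the notebook easier to run on other machines.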
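
Options and a Service only take effect when they are passed to `webdriver.Chrome`. The sketch below shows one way to wire them together. The helper names `chrome_arguments` and `make_driver` and the optional headless switch are assumptions added for illustration; they are not part of the notebook.

```python
def chrome_arguments(headless=False, maximized=True):
    """Return the Chrome command-line switches for the session (hypothetical helper)."""
    args = []
    if maximized:
        args.append("--start-maximized")
    if headless:
        # Assumption: headless mode is optional and not used by the notebook.
        args.append("--headless=new")
    return args


def make_driver(headless=False):
    """Build a Chrome driver whose options and service are actually applied."""
    # Imported here so that chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless=headless):
        options.add_argument(arg)
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


print(chrome_arguments(headless=True))  # ['--start-maximized', '--headless=new']
```

Calling `make_driver()` would open a browser, so it is left out of the example run.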

# Launching the Browser Instance

In [5]:
driver = webdriver.Chrome(service=webdriver.chrome.service.Service(ChromeDriverManager().install()))

# Maximizing the window and setting the zoom
driver.maximize_window()
driver.execute_script("document.body.style.zoom='100%'")

The browser instance is initiated, now using Chrome instead of Edge. The code maximizes the browser window and sets the zoom level to 100% to ensure that the webpage elements are displayed correctly, which is critical for consistent interaction during web scraping.

# Calculating the number of elections

In [6]:
# Opens the JNE page
driver.get("https://infogob.jne.gob.pe/Eleccion")
# Waits for 10 seconds to allow the page to fully load before proceeding; increase if internet connection is slow
time.sleep(10)

# Clicks on the dropdown list to display the available election types
driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[1]/div').click() 
time.sleep(3) # Waits 3 seconds; increase if internet connection is slow

# Selects the presidential election option from the dropdown list
driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[1]/div/div[2]/div[2]').click() 
time.sleep(3) # Waits 3 seconds; increase if internet connection is slow
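
The fixed `time.sleep` pauses above wait the full time even when the page is ready sooner, and can still be too short on a slow connection. Selenium's `WebDriverWait` (already imported) avoids this by polling a condition until it holds or a timeout expires. The sketch below reproduces that polling idea with the standard library only. `wait_until` and `dropdown_ready` are hypothetical names, and the fake condition stands in for a real page check.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Fake condition: pretends the dropdown becomes ready on the third check.
calls = {"n": 0}

def dropdown_ready():
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(dropdown_ready, timeout=2.0, poll=0.01)
print(ready, calls["n"])  # True 3
```

With a real driver, the same role is played by `WebDriverWait(driver, 10).until(EC.element_to_be_clickable(...))`, as used later in `boton_siguiente`.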

In [7]:
# Clicks on the dropdown list to display the available presidential elections
driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[2]/div').click()
time.sleep(3) # Waits 3 seconds; increase if internet connection is slow

# Getting the list of elections
items = driver.find_elements(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[2]/div/div[2]')
elecciones = items[0].text.split('\n') # Extracts the text and splits it into a list by line breaks
elecciones = elecciones[1:] # Removes the first element, which is an irrelevant text ([SELECCIONE])
elecciones # Outputs the final list of elections

['PRESIDENCIAL 2021 - 2DA VUELTA',
 'PRESIDENCIAL 2021',
 'PRESIDENCIAL 2016 - 2DA VUELTA',
 'PRESIDENCIAL 2016',
 'PRESIDENCIAL 2011 - 2DA VUELTA',
 'PRESIDENCIAL 2011',
 'PRESIDENCIAL 2006 - 2DA VUELTA',
 'PRESIDENCIAL 2006',
 'PRESIDENCIAL 2001 - 2DA VUELTA',
 'PRESIDENCIAL 2001',
 'PRESIDENCIAL 2000 - 2DA VUELTA',
 'PRESIDENCIAL 2000',
 'PRESIDENCIAL 1995',
 'PRESIDENCIAL 1990 - 2DA VUELTA',
 'PRESIDENCIAL 1990',
 'PRESIDENCIAL 1985',
 'PRESIDENCIAL 1980',
 'PRESIDENCIAL 1963',
 'PRESIDENCIAL 1962',
 'PRESIDENCIAL 1956',
 'PRESIDENCIAL 1950',
 'PRESIDENCIAL 1945',
 'PRESIDENCIAL 1939',
 'PRESIDENCIAL 1936',
 'PRESIDENCIAL 1931']

In [8]:
# Gets the number of elections in the list
len(elecciones)

25
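
The list above is built by splitting the dropdown text on line breaks and dropping the first entry by position. A slightly more defensive variant, sketched below, removes the `[SELECCIONE]` placeholder by value and ignores blank lines. `parse_options` is a hypothetical helper, and the sample text is a shortened copy of the dropdown content.

```python
def parse_options(raw_text, placeholder="[SELECCIONE]"):
    """Split the dropdown text into option labels, dropping blanks and the placeholder."""
    labels = [line.strip() for line in raw_text.split("\n") if line.strip()]
    return [label for label in labels if label != placeholder]


raw = "[SELECCIONE]\nPRESIDENCIAL 2021 - 2DA VUELTA\nPRESIDENCIAL 2021\n"
options = parse_options(raw)
print(options)  # ['PRESIDENCIAL 2021 - 2DA VUELTA', 'PRESIDENCIAL 2021']
```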

# Defining functions

## Function that extracts a table and saves it into a list

In [9]:
def extrae_tabla():
    # Find the element containing the table and get its HTML content
    tabla = driver.find_element(By.XPATH, '//*[@id="CandidatosResultados"]/div/div[1]/div[2]/div[2]').get_attribute('innerHTML')
    tabla_1 = StringIO(tabla)
    # Use pandas to read the HTML content and convert it into a DataFrame
    data = pd.read_html(tabla_1)
    # Return the first DataFrame found
    return data[0]
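
To illustrate what happens to the `innerHTML` string, the sketch below pulls rows out of an HTML table using only the standard library. It is a simplified stand-in for `pd.read_html`, not how pandas is implemented. The class `TableRows`, the helper `rows_from_html`, and the sample party names and vote counts are all invented for the example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_html(fragment):
    parser = TableRows()
    parser.feed(fragment)
    parser.close()
    return parser.rows


# Made-up fragment shaped like a results table
sample = """
<div><table>
<tr><th>ORGANIZACIÓN POLÍTICA</th><th>TOTAL VOTOS</th></tr>
<tr><td>PARTIDO A</td><td>100</td></tr>
<tr><td>PARTIDO B</td><td>50</td></tr>
</table></div>
"""
header, *body = rows_from_html(sample)
print(header)  # ['ORGANIZACIÓN POLÍTICA', 'TOTAL VOTOS']
print(body)    # [['PARTIDO A', '100'], ['PARTIDO B', '50']]
```

`pd.read_html` does this work and also infers column types, which is why `extrae_tabla` can return a ready-to-use DataFrame.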

## Next button function

Checks whether the electoral results table has another page. If it does, the function clicks through to that page and returns True; otherwise it returns False.

In [10]:
def boton_siguiente(i):
    try:
        # Wait until the next button is clickable, with a maximum wait time of 10 seconds
        next_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, f'//*[@id="CandidatosResultados"]/div/div[1]/div[2]/div[2]/div[4]/div[2]/ul/li[{i}]/a'))
        )
        # Click the next button
        next_button.click()
        return True
    except (TimeoutException, ElementClickInterceptedException):
        # Return False if the button is not clickable or if there's a timeout
        return False
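
The function above is meant to drive a "read the page, then try to advance" loop. The sketch below runs that loop against a fake pager so it can be tried without a browser. `collect_pages` and `FakePager` are hypothetical, and the rule that link position `i` opens page `i - 1` is an assumption about the page layout.

```python
def collect_pages(read_page, go_next, first_index=3):
    """Read the current page, then keep advancing while go_next(i) succeeds."""
    pages = []
    i = first_index
    while True:
        pages.append(read_page())
        if not go_next(i):
            break
        i += 1
    return pages


class FakePager:
    """Stand-in for the paginated results table (not the real site)."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0  # 0-based index of the page being shown

    def read(self):
        return self.pages[self.current]

    def click_next(self, i):
        # Assumed layout: link position i opens page i - 1, so position 3 opens page 2.
        target = i - 2  # 0-based index of the page that position i opens
        if target >= len(self.pages):
            return False  # no such link: behaves like a timeout in boton_siguiente
        self.current = target
        return True


pager = FakePager(["page 1", "page 2", "page 3"])
print(collect_pages(pager.read, pager.click_next))  # ['page 1', 'page 2', 'page 3']
```

A table with a single page is read once and the loop ends on the first failed click.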

# Obtaining the tables with the results

In [11]:
# Obtaining the tables for each presidential election
all_data = []
for x in tqdm(range(1, len(elecciones) + 1), desc='Progreso:'):
    # Web page to scrape
    driver.get("https://infogob.jne.gob.pe/Eleccion")
    time.sleep(3)

    # Selecting process type = "elecciones presidenciales" 
    driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[1]/div').click()
    time.sleep(3)
    driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[1]/div/div[2]/div[2]').click()
    time.sleep(3)

    # List of elections
    driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[2]/div[2]/div[2]/div').click()
    time.sleep(3)

    # Selecting each presidential election
    # "y" is the position of the election in the dropdown: elections start at div[2] because div[1] is "[SELECCIONE]".
    y = x + 1
    driver.find_element(By.XPATH, f'//*[@id="section"]/div[2]/div[2]/div[2]/div[2]/div/div[2]/div[{y}]').click()
    time.sleep(3)

    # Click on "Ver datos de la eleccion"
    driver.find_element(By.XPATH, '//*[@id="btnVerDatos"]/span').click()
    time.sleep(3)

    # click on "Candidatos y resultados"
    driver.find_element(By.XPATH, '//*[@id="section"]/div[2]/div[3]/div[1]/ul/li[2]/a').click()
    time.sleep(3)

    # Saving the election table
    # i is the position of the pagination link to click next. It starts at 3, so the page currently shown is i - 2 (page 1 at the start).
    i = 3
    print(x, elecciones[x - 1])
    while True:
        # Getting the table of results
        aux = extrae_tabla()

        # Adding the column 'Elecciones'
        aux['Elecciones'] = elecciones[x - 1]

        # Appending new table in the master data
        all_data.append(aux)
        print('Pagina: ', i - 2)

        # If there is a next table on the election page, click through to it. Otherwise, end the loop.
        if boton_siguiente(i):
            i += 1
        else:
            break


Progreso::   0%|                                                                                | 0/25 [00:00<?, ?it/s]

1 PRESIDENCIAL 2021 - 2DA VUELTA
Pagina:  1


Progreso::   4%|██▉                                                                     | 1/25 [00:47<19:06, 47.78s/it]

2 PRESIDENCIAL 2021
Pagina:  1
Pagina:  2


Progreso::   8%|█████▊                                                                  | 2/25 [01:18<14:34, 38.02s/it]

3 PRESIDENCIAL 2016 - 2DA VUELTA
Pagina:  1


Progreso::  12%|████████▋                                                               | 3/25 [02:06<15:33, 42.43s/it]

4 PRESIDENCIAL 2016
Pagina:  1
Pagina:  2


Progreso::  16%|███████████▌                                                            | 4/25 [02:35<13:00, 37.16s/it]

5 PRESIDENCIAL 2011 - 2DA VUELTA
Pagina:  1


Progreso::  20%|██████████████▍                                                         | 5/25 [03:15<12:41, 38.09s/it]

6 PRESIDENCIAL 2011
Pagina:  1
Pagina:  2


Progreso::  24%|█████████████████▎                                                      | 6/25 [03:46<11:18, 35.73s/it]

7 PRESIDENCIAL 2006 - 2DA VUELTA
Pagina:  1


Progreso::  28%|████████████████████▏                                                   | 7/25 [04:21<10:37, 35.44s/it]

8 PRESIDENCIAL 2006
Pagina:  1
Pagina:  2
Pagina:  3


Progreso::  32%|███████████████████████                                                 | 8/25 [04:47<09:11, 32.42s/it]

9 PRESIDENCIAL 2001 - 2DA VUELTA
Pagina:  1


Progreso::  36%|█████████████████████████▉                                              | 9/25 [05:21<08:46, 32.90s/it]

10 PRESIDENCIAL 2001
Pagina:  1


Progreso::  40%|████████████████████████████▍                                          | 10/25 [05:54<08:16, 33.07s/it]

11 PRESIDENCIAL 2000 - 2DA VUELTA
Pagina:  1


Progreso::  44%|███████████████████████████████▏                                       | 11/25 [06:27<07:42, 33.06s/it]

12 PRESIDENCIAL 2000
Pagina:  1
Pagina:  2


Progreso::  48%|██████████████████████████████████                                     | 12/25 [06:52<06:37, 30.54s/it]

13 PRESIDENCIAL 1995
Pagina:  1
Pagina:  2


Progreso::  52%|████████████████████████████████████▉                                  | 13/25 [07:17<05:44, 28.71s/it]

14 PRESIDENCIAL 1990 - 2DA VUELTA
Pagina:  1


Progreso::  56%|███████████████████████████████████████▊                               | 14/25 [07:50<05:31, 30.17s/it]

15 PRESIDENCIAL 1990
Pagina:  1
Pagina:  2


Progreso::  60%|██████████████████████████████████████████▌                            | 15/25 [08:15<04:44, 28.47s/it]

16 PRESIDENCIAL 1985
Pagina:  1
Pagina:  2


Progreso::  64%|█████████████████████████████████████████████▍                         | 16/25 [08:39<04:04, 27.20s/it]

17 PRESIDENCIAL 1980
Pagina:  1
Pagina:  2


Progreso::  68%|████████████████████████████████████████████████▎                      | 17/25 [09:04<03:31, 26.43s/it]

18 PRESIDENCIAL 1963
Pagina:  1


Progreso::  72%|███████████████████████████████████████████████████                    | 18/25 [09:38<03:20, 28.70s/it]

19 PRESIDENCIAL 1962
Pagina:  1


Progreso::  76%|█████████████████████████████████████████████████████▉                 | 19/25 [10:11<03:01, 30.21s/it]

20 PRESIDENCIAL 1956
Pagina:  1


Progreso::  80%|████████████████████████████████████████████████████████▊              | 20/25 [10:45<02:36, 31.26s/it]

21 PRESIDENCIAL 1950
Pagina:  1


Progreso::  84%|███████████████████████████████████████████████████████████▋           | 21/25 [11:19<02:08, 32.08s/it]

22 PRESIDENCIAL 1945
Pagina:  1


Progreso::  88%|██████████████████████████████████████████████████████████████▍        | 22/25 [11:53<01:37, 32.53s/it]

23 PRESIDENCIAL 1939
Pagina:  1


Progreso::  92%|█████████████████████████████████████████████████████████████████▎     | 23/25 [12:26<01:05, 32.93s/it]

24 PRESIDENCIAL 1936
Pagina:  1


Progreso::  96%|████████████████████████████████████████████████████████████████████▏  | 24/25 [13:00<00:33, 33.15s/it]

25 PRESIDENCIAL 1931
Pagina:  1


Progreso:: 100%|███████████████████████████████████████████████████████████████████████| 25/25 [13:34<00:00, 32.57s/it]
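
The loop relies on two index offsets: the dropdown option sits at `div[x + 1]` because `div[1]` is the placeholder, and the page currently shown is `i - 2`. The helpers below make both offsets explicit. `option_xpath` and `page_shown` are hypothetical names; the XPath prefix is copied from the loop.

```python
OPTIONS_XPATH = '//*[@id="section"]/div[2]/div[2]/div[2]/div[2]/div/div[2]'


def option_xpath(position):
    """XPath of the election at 1-based `position` in the scraped list."""
    if position < 1:
        raise ValueError("position is 1-based")
    # div[1] is the "[SELECCIONE]" placeholder, so elections start at div[2].
    return f"{OPTIONS_XPATH}/div[{position + 1}]"


def page_shown(i):
    """Page number printed by the loop when the next link to click is at position i."""
    return i - 2


print(option_xpath(1))  # //*[@id="section"]/div[2]/div[2]/div[2]/div[2]/div/div[2]/div[2]
print(page_shown(3))    # 1
```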


In [20]:
# Appending all tables into a single data frame with a fresh 0..n-1 index
data_elecciones = pd.concat(all_data, ignore_index=True)

# Keeping only the columns needed in the final data frame
data_elecciones = data_elecciones[['Elecciones', 'ORGANIZACIÓN POLÍTICA', 'TOTAL VOTOS']]
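
The combining step can be tried on small tables before running the full scrape. In the sketch below, the party names, vote counts, election labels, and the extra `% VOTOS` column are invented for illustration; only the three selected column names come from the notebook.

```python
import pandas as pd

# Made-up per-page tables, shaped like the ones collected in all_data
page_1 = pd.DataFrame({
    "ORGANIZACIÓN POLÍTICA": ["PARTIDO A", "PARTIDO B"],
    "TOTAL VOTOS": [100, 50],
    "% VOTOS": [66.7, 33.3],  # assumed extra column that gets dropped
})
page_1["Elecciones"] = "ELECCION X"

page_2 = pd.DataFrame({
    "ORGANIZACIÓN POLÍTICA": ["PARTIDO C"],
    "TOTAL VOTOS": [10],
    "% VOTOS": [100.0],
})
page_2["Elecciones"] = "ELECCION Y"

# One concat call over the whole list; ignore_index renumbers the rows 0..n-1
combined = pd.concat([page_1, page_2], ignore_index=True)
combined = combined[["Elecciones", "ORGANIZACIÓN POLÍTICA", "TOTAL VOTOS"]]
print(combined.shape)  # (3, 3)
```

Concatenating the list once avoids copying the growing data frame on every iteration.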

In [21]:
# Exporting to Excel file
os.getcwd()
data_elecciones.to_excel('Data_Elecciones_group_1_ass_5_2024_2.xlsx', sheet_name='Data', index=False)