## Assignment 5 - Group 1

First, we set a more suitable display format for the notebook.

In [None]:
from IPython.display import display, HTML

display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

We import the libraries that we will use.

In [None]:
import os
import pandas as pd
import numpy as np
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

For now, we set the folder where the Chrome driver is located as the working directory.

In [None]:
os.chdir(r'C:\Users\HP\Documents\qlab\chromedriver-win64') 

In this first block, we explain how we extracted the results of the first presidential election in the list into a DataFrame. Later, we use the same procedure inside a loop to extract the results of every available presidential election.

We start the driver and use it to open the provided web page:

In [None]:
driver=webdriver.Chrome() # This starts Chrome through the driver located in the directory we specified previously.

url='https://infogob.jne.gob.pe/Eleccion'
driver.get(url) # With this we access the provided web page.

Using the XPath of the “Tipo de Proceso” box, we find it. Then we click on it to open the menu of available types of electoral process.

In [None]:
# Find the box using its XPath.
process=driver.find_element(By.XPATH,"/html/body/div[1]/section/div[2]/div[2]/div[2]/div[1]/div") 
# Click on the box to display the available options.
process.click() 

# Pause for 2 seconds after the click so the options have time to appear.
# This matters when the whole notebook is run at once: without the pause, the next cell may look for elements that have not loaded yet.
time.sleep(2)
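
Fixed pauses like this work, but they either waste time or fall short when the page loads slowly. Selenium also offers explicit waits: `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))`, with `WebDriverWait` imported from `selenium.webdriver.support.ui` and `expected_conditions` imported as `EC` from `selenium.webdriver.support`, polls until the element is clickable or the timeout expires. The dependency-free sketch below only illustrates that polling idea; `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Hypothetical helper: call `condition()` until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    This mimics the polling loop behind Selenium's WebDriverWait.until().
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Usage with a stand-in condition that becomes true on its third call
# (in the notebook, the condition would check whether the menu has appeared).
calls = {"n": 0}


def menu_is_open():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(menu_is_open, timeout=2.0, poll=0.01))  # True
```

The wait ends as soon as the condition holds, so a fast page costs almost nothing and a slow page still gets up to `timeout` seconds.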

Now, using the XPath of the "Elecciones presidenciales" option, we find and select it.

In [None]:
# Find the option.
presidential=driver.find_element(By.XPATH,"/html/body/div[1]/section/div[2]/div[2]/div[2]/div[1]/div/div[2]/div[2]") 
# Click on the option to select it.
presidential.click() 

time.sleep(2)

Once the type of process has been chosen, we choose the specific presidential election whose information we want to extract. Again, we use the XPath of the box that contains the available options to find it, and we click on it to display those options.

In [None]:
# XPath of the box.
xpath="/html/body/div[1]/section/div[2]/div[2]/div[2]/div[2]/div" 
# Find the box.
election=driver.find_element(By.XPATH,xpath) 
# Click on the box to display the available options.
election.click() 

time.sleep(2)

Before choosing the presidential election, we save the name of the election we are looking for and the number of available options (both will be useful later).

In [None]:
# The XPath of the menu of selectable options is the box's XPath with '/select' appended at the end.
select_xpath=xpath+"/select" 
# Find the menu of selectable options using its XPath.
s0=driver.find_element(By.XPATH,select_xpath)
# Wrap the element in Select, which provides a convenient interface for <select> menus.
s1=Select(s0)
# The 'options' property returns all the menu options as a list of elements.
s2=s1.options
# Total number of options (option 0 is a placeholder, not a valid election). This will be useful later.
len_opt=len(s2)
# Name of the election whose information we want to extract (option 1). This will also be useful later.
name_election=s2[1].get_attribute('text')

Now, using the XPath of the "Presidencial 2021 - Segunda Vuelta" option, we find and select it.

In [None]:
# The option's XPath is the box's XPath with "/div[2]/div[2]" appended.
div_xpath=xpath+'/div[2]/div[2]' 
# We search and identify the presidential election that we want.
date=driver.find_element(By.XPATH, div_xpath)
# We click on this option to choose it.
date.click()

time.sleep(1)
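
The index offsets used here (option 1 of the `<select>` menu is the clickable entry at `/div[2]/div[2]`) are easy to get wrong once they move into a loop. A small hypothetical helper, assuming the clickable entries follow the same order as the `<select>` options (as this notebook's code implies), makes the mapping explicit:

```python
# XPath of the elections box, copied from the cell above.
BASE_XPATH = "/html/body/div[1]/section/div[2]/div[2]/div[2]/div[2]/div"


def election_option_xpath(base_xpath: str, option_index: int) -> str:
    """Hypothetical helper: XPath of the clickable entry for a <select> option.

    `option_index` is 0-based, as in `Select(...).options`, where index 0 is the
    placeholder. Valid entries start at div[2], so option k maps to div[k + 1].
    """
    if option_index < 1:
        raise ValueError("option 0 is the placeholder, not a valid election")
    return f"{base_xpath}/div[2]/div[{option_index + 1}]"


def valid_option_indices(len_opt: int) -> range:
    """Indices of the valid elections: 1 .. len_opt - 1 (len_opt - 1 of them)."""
    return range(1, len_opt)


# Option 1 is the entry clicked in the cell above ("/div[2]/div[2]").
print(election_option_xpath(BASE_XPATH, 1).endswith("/div[2]/div[2]"))  # True
```

In the loop further down, `x + 1` plays the role of `option_index`, which is why the name comes from `s2[x+1]` and the entry from `div[x+2]`.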

Next, we find the button that gives access to the information we are looking for and click on it.

In [None]:
button=driver.find_element(By.XPATH,"/html/body/div[1]/section/div[2]/div[2]/div[3]/div/button")
button.click()

When the page opens, it shows the "Normativa" tab by default. To switch to the "Candidatos y resultados" tab, which has the information we need, we again find the button for this tab and click on it.

In [None]:
# Here we search by name, since the element has a name attribute and this is simpler than using its XPath.
results=driver.find_element(By.NAME,"candidatos-y-resultados")
results.click()

time.sleep(2)

Once in the correct tab, we can see the information of interest in a table. To extract it, we find the table using its XPath, get its HTML code, and use that code to create a DataFrame with the same data.

In [None]:
from io import StringIO

# Find the table.
table = driver.find_element(By.XPATH, "/html/body/div[1]/section/div[2]/div[3]/div[3]/div/div/div/div[1]/div[2]/div[2]")
# Get the table's HTML code.
table_html = table.get_attribute('innerHTML')
# Create a DataFrame with the same data. read_html returns a list with one DataFrame per <table>, so we take the first.
# Wrapping the HTML string in StringIO avoids the deprecation warning that recent pandas versions emit for literal HTML strings.
pd_table = pd.read_html(StringIO(table_html))[0]
# Let's see the DataFrame obtained:
pd_table
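
`pd.read_html` parses every `<table>` in the HTML it receives and returns a list of DataFrames, which is why `[0]` is needed. To show roughly what that parsing involves, here is a dependency-free sketch using Python's `html.parser`; `TableRows` is a hypothetical class that ignores `colspan`/`rowspan`, and the sample HTML is made up, not taken from INFOGOB.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Hypothetical minimal parser: collect the text of each <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample with two of the column names used in this notebook.
sample = """
<table>
  <thead><tr><th>ORGANIZACIÓN POLÍTICA</th><th>TOTAL VOTOS</th></tr></thead>
  <tbody><tr><td>Party A</td><td>100</td></tr>
         <tr><td>Party B</td><td>80</td></tr></tbody>
</table>
"""
parser = TableRows()
parser.feed(sample)
header, *body = parser.rows
print(header)  # ['ORGANIZACIÓN POLÍTICA', 'TOTAL VOTOS']
print(body)    # [['Party A', '100'], ['Party B', '80']]
```

Note that every cell comes back as a string; `pd.read_html` additionally tries to convert numeric columns, which a hand-written parser would have to do itself.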

Now we make the following adjustments to the DataFrame:

In [None]:
# Keep only the political organization and total votes columns (the only ones we need).
# .copy() makes this an independent DataFrame, so adding a column below does not raise a SettingWithCopyWarning.
pd_table=pd_table[['ORGANIZACIÓN POLÍTICA','TOTAL VOTOS']].copy()
# Create a column that contains the name of the chosen election in every row.
pd_table['Elecciones']=name_election
# Reorder the columns as the assignment requires.
pd_table=pd_table[['Elecciones','ORGANIZACIÓN POLÍTICA','TOTAL VOTOS']]
# Let's see the DataFrame obtained:
pd_table
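
The three adjustments can also be wrapped in one helper. The sketch below is an alternative, not the notebook's method: it takes an explicit copy of the two columns (so pandas does not warn about modifying a slice) and uses `DataFrame.insert` to place the election name as the first column, which removes the separate reordering step. `tidy_results` is a hypothetical name and the rows are made-up placeholders, not real results.

```python
import pandas as pd


def tidy_results(raw: pd.DataFrame, election_name: str) -> pd.DataFrame:
    """Hypothetical helper: keep party and vote columns, prepend the election name."""
    # Explicit copy, so the new column is added to an independent DataFrame.
    out = raw[["ORGANIZACIÓN POLÍTICA", "TOTAL VOTOS"]].copy()
    # insert(loc, column, value): a scalar value is repeated in every row.
    out.insert(0, "Elecciones", election_name)
    return out


# Made-up placeholder rows, not real election results.
raw = pd.DataFrame({
    "ORGANIZACIÓN POLÍTICA": ["Party A", "Party B"],
    "TOTAL VOTOS": [100, 80],
    "OTHER COLUMN": ["x", "y"],
})
print(tidy_results(raw, "Example election"))
```

Because the helper works on a copy, the original `raw` table is left unchanged.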

Finally, we return to the initial page, where we will choose other options to access the remaining elections.

In [None]:
driver.get(url)
time.sleep(2)

# Alternatively, we could go back two pages:
#
# driver.back()
# time.sleep(2)
# driver.back()
# time.sleep(2)
#
# However, we found that driver.get(url) loads the initial page faster.

In [None]:
# We close the browser here; the loop below opens a new session.
driver.quit()

Below, we implement a loop that automates the entire previous process across all the available presidential elections.

In [None]:
# StringIO is used to pass the table's HTML to pd.read_html.
from io import StringIO

# We set the driver and use the provided URL to access the web page.
driver=webdriver.Chrome()
driver.get(url)

# Recall that len_opt stores the total number of options in the elections menu.
# Option 0 is a placeholder that does not contain a valid election, so the valid elections are options 1 to len_opt-1.
for x in range(len_opt-1):
    
    # Find and select "Elecciones presidenciales" as the type of process.
    process=driver.find_element(By.XPATH,"/html/body/div[1]/section/div[2]/div[2]/div[2]/div[1]/div")
    process.click()

    time.sleep(2)

    presidential=driver.find_element(By.XPATH,"/html/body/div[1]/section/div[2]/div[2]/div[2]/div[1]/div/div[2]/div[2]")
    presidential.click()

    time.sleep(2)
    
    # Open the box with the available presidential elections.
    xpath2="/html/body/div[1]/section/div[2]/div[2]/div[2]/div[2]/div"
    election=driver.find_element(By.XPATH,xpath2)
    election.click()
    
    time.sleep(2)
    
    # Get the name of the election we are choosing.
    select_xpath=xpath2+"/select"
    s0=driver.find_element(By.XPATH, select_xpath)
    s1=Select(s0)
    s2=s1.options
    name_election=s2[x+1].get_attribute('text') # Valid election names start at option 1.

    # Find and choose the election.
    div_xpath=xpath2+f'/div[2]/div[{x+2}]' # Valid election entries start at div[2].
    date=driver.find_element(By.XPATH, div_xpath)
    date.click()
    
    time.sleep(1)
    
    # Click the button that takes us to the page with the election's information.
    button=driver.find_element(By.XPATH,"/html/body/div[1]/section/div[2]/div[2]/div[3]/div/button")
    button.click()
    
    # Switch to the tab that contains the information of interest.
    results=driver.find_element(By.NAME,"candidatos-y-resultados")
    results.click()

    time.sleep(2)
    
    # Find the results table and convert it into a DataFrame.
    table = driver.find_element(By.XPATH, "/html/body/div[1]/section/div[2]/div[3]/div[3]/div/div/div/div[1]/div[2]/div[2]")
    table_html = table.get_attribute('innerHTML')
    pd_table = pd.read_html(StringIO(table_html))[0]
    
    # Apply the same adjustments as before (.copy() avoids a SettingWithCopyWarning).
    pd_table=pd_table[['ORGANIZACIÓN POLÍTICA','TOTAL VOTOS']].copy()
    pd_table['Elecciones']=name_election
    pd_table=pd_table[['Elecciones','ORGANIZACIÓN POLÍTICA','TOTAL VOTOS']]
    
    # In the first iteration, create the dataset that will hold all the required information.
    if x==0:
        dataset=pd_table.copy(deep=True)
    
    # In later iterations, append the new table to the dataset.
    else:
        dataset=pd.concat([dataset,pd_table],ignore_index=True)
    
    # Return to the initial page.
    driver.get(url)
    time.sleep(2)
    
    # The process repeats for every valid presidential election.

In [None]:
# Once the loop has finished, we close the browser and end the driver session.
driver.quit()
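
Concatenating inside the loop copies the growing `dataset` on every iteration and needs the `x==0` special case. A common alternative, sketched below with made-up placeholder tables standing in for the scraped ones, is to append each `pd_table` to a list and call `pd.concat` once after the loop:

```python
import pandas as pd

# Stand-ins for the per-election tables the scraping loop builds (placeholder data).
tables = []
for name in ["Election 1", "Election 2", "Election 3"]:
    pd_table = pd.DataFrame({
        "Elecciones": [name, name],
        "ORGANIZACIÓN POLÍTICA": ["Party A", "Party B"],
        "TOTAL VOTOS": [100, 80],
    })
    tables.append(pd_table)  # in the scraper, this replaces the if/else on x

# One concatenation at the end; ignore_index renumbers the rows 0..n-1.
dataset = pd.concat(tables, ignore_index=True)
print(dataset.shape)  # (6, 3)
```

If no table was collected, `pd.concat([])` raises a `ValueError`, so it is worth checking that the list is not empty before concatenating.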