## Scrape results of UNMSM's admission exam (2024-I)
The exam comprises 100 multiple-choice questions covering mathematics, social sciences, humanities, physics, chemistry, and biology, and lasts 3 hours. Each correct answer is worth 20 points, each incorrect answer deducts 1.125 points, and blank answers score zero. The maximum (and nearly unattainable) score is 2,000 points. This notebook scrapes the results of UNMSM's 2024-I admission exam.
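The scoring rule above can be sketched as a small function. This is only an illustration of the arithmetic; the name `exam_score` and its parameters are hypothetical and are not part of the scraper.

```python
def exam_score(correct: int, incorrect: int, total_questions: int = 100) -> float:
    """Score under the stated rule: +20 per correct answer, -1.125 per incorrect one.

    Blank answers (total_questions - correct - incorrect) contribute nothing.
    """
    if correct < 0 or incorrect < 0 or correct + incorrect > total_questions:
        raise ValueError("correct and incorrect must be non-negative and sum to at most total_questions")
    return 20 * correct - 1.125 * incorrect


print(exam_score(100, 0))  # 2000.0, the maximum score
print(exam_score(50, 30))  # 966.25, with 20 questions left blank
```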

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [7]:
# Base URL for the subpages
base_url = "https://admision.unmsm.edu.pe/Website20241/A/"

# Career codes, kept as strings so that leading zeros are preserved (see Excel file in raw folder)
career_codes = [
 '011', '012', '013', '015', '041', '042', '043', '051', '081', '181', '182', '071',
 '101', '102', '103', '131', '141', '142', '144', '145', '072', '073', '132', '162',
 '163', '165', '166', '167', '168', '171', '172', '173', '191', '192', '193', '194',
 '201', '202', '091', '092', '093', '111', '112', '113', '121', '122', '123', '022',
 '023', '031', '033', '034', '035', '036', '037', '038', '039', '062', '151', '152',
 '153', '154', '155', '157', '0911', '0912', '0921', '0932', '0931', '0922', '1131',
 '203', '1111', '0611', '0612', '0613', '1121', '114', '1141', '0141', '0143', '0142', '0144'
]
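Keeping the codes as strings matters because some of them differ only by a leading zero (for example `'141'` and `'0141'`). A minimal sketch of a check for such pairs follows; `sample_codes` is a small subset of the list above and `leading_zero_collisions` is a hypothetical helper, not part of the original notebook.

```python
from collections import defaultdict

# A subset of career_codes, chosen for illustration
sample_codes = ['011', '141', '142', '144', '0141', '0142', '0143', '0144']


def leading_zero_collisions(codes):
    """Group codes by their integer value and keep the groups with more than one member."""
    groups = defaultdict(list)
    for code in codes:
        groups[int(code)].append(code)
    return {number: originals for number, originals in groups.items() if len(originals) > 1}


collisions = leading_zero_collisions(sample_codes)
print(collisions)
# {141: ['141', '0141'], 142: ['142', '0142'], 144: ['144', '0144']}
```

If the exported CSV is later read back with pandas, passing something like `dtype={'careercode': str}` to `pd.read_csv` is one way to avoid the codes being parsed as integers.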


In [9]:
# Initialize an empty list to store all the data
all_data = []

# Iterate over each code and scrape the corresponding page
for code in career_codes:
    # Construct the URL for the subpage
    subpage_url = f"{base_url}{str(code).zfill(3)}/0.html"

    try:
        # Fetch the content of the subpage
        # A timeout keeps the loop from hanging indefinitely on an unresponsive page
        response = requests.get(subpage_url, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find the table containing the results
        table = soup.find('table')

        if table:
            # Extract rows from the table
            rows = table.find_all('tr')
            print(f"Table from {subpage_url}:")

            # Iterate over each row and extract the data
            for row in rows[1:]:  # Skipping the header row
                cols = row.find_all('td')
                if len(cols) >= 7:  # Result rows have 7 columns in the 2024-I exam
                    # Extracting the required variables
                    data = {
                        'idcode': cols[0].text.strip(),
                        'fullname': cols[1].text.strip(),
                        'career': cols[2].text.strip(),
                        'examscore': cols[3].text.strip(),
                        'ranking': cols[4].text.strip(),
                        'status': cols[5].text.strip(),
                        'careercode': str(code),
                        'secondcareer': cols[6].text.strip(),
                        'exam': '2024-I'
                    }
                    all_data.append(data)
    except Exception as e:
        # Report the failure and move on so one bad page does not abort the whole scrape
        print(f"An error occurred while processing code {code}: {e}")
        continue

# Convert the data to a pandas DataFrame
results_df = pd.DataFrame(all_data)

# Export the DataFrame to a CSV file
results_df.to_csv('../data/raw/scraping_results_unmsm.csv', index=False)

results_df.sample(20)

Table from https://admision.unmsm.edu.pe/Website20241/A/011/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/012/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/013/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/015/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/041/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/042/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/043/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/051/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/081/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/181/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/182/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/071/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/101/0.html:
Table from https://admision.unmsm.edu.pe/Website20241/A/102/0.html:
Table from https://admision.unmsm.edu.pe/Website

Unnamed: 0,idcode,fullname,career,examscore,ranking,status,careercode,secondcareer,exam
7912,156634,"SANCHEZ SARMIENTO, ANDRE ABEL",MEDICINA VETERINARIA,489.125,,,81,,2024-I
6290,110336,"TUCTO ANTUNEZ, JOSEPH WILLIAMS",ENFERMERÍA,584.625,,,13,,2024-I
15735,274046,"BECERRA CHOMBO, LUIS ALFREDO",INGENIERÍA DE SISTEMAS,626.875,,,201,,2024-I
23733,729533,"GONZALES SILVA, SAARA INDIRA",DERECHO,535.25,,,22,,2024-I
12458,270753,"PAIMA PEZO, GILARY TATIANA",INGENIERÍA CIVIL,474.75,,,167,,2024-I
9837,150240,"MATURRANO SORIA, CAROLINE GABRIELA",PSICOLOGÍA ORGANIZACIONAL Y DE LA GESTIÓN HUMANA,847.125,,,182,,2024-I
12767,275668,"VARAS FLORES, JOSE MARIA CRUZ",INGENIERÍA CIVIL,350.25,,,167,,2024-I
3649,881579,"PERALTA GARIBAY, YAMELIN DANIELA",MEDICINA HUMANA,609.625,,,11,,2024-I
21244,587507,"TOSCANO TAPIA, ANGIE ANABEL",AUDITORÍA EMPRESARIAL Y DEL SECTOR PÚBLICO - LIMA,861.5,,,113,,2024-I
11769,296555,"ACOSTA HUAMÁN, DAYANA NICOLE",INGENIERÍA CIVIL,1049.375,,,167,,2024-I
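As a sanity check on the scraped `examscore` values, one could test whether a score is attainable under the rule of +20 per correct answer and -1.125 per incorrect answer on 100 questions. The sketch below assumes that rule applies uniformly to every question; `answer_breakdown` is a hypothetical helper, not part of the original notebook.

```python
def answer_breakdown(score, total_questions=100):
    """Return (correct, incorrect, blank) that produces `score`, or None if no combination does.

    Multiplying 20*c - 1.125*i = score by 8 gives the integer equation 160*c - 9*i = 8*score.
    """
    target = score * 8
    if target != int(target):
        return None  # valid scores are multiples of 1/8
    target = int(target)
    for correct in range(total_questions + 1):
        remainder = 160 * correct - target
        if remainder >= 0 and remainder % 9 == 0:
            incorrect = remainder // 9
            if correct + incorrect <= total_questions:
                return correct, incorrect, total_questions - correct - incorrect
    return None


print(answer_breakdown(1049.375))  # (55, 45, 0)
print(answer_breakdown(489.125))   # (28, 63, 9)
print(answer_breakdown(1999.0))    # None
```

Under this rule, solutions for a given score differ by 9 correct answers and 160 incorrect ones, so at most one combination fits within 100 questions.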
