## Assignment 6: Geocoding


 Group 3: Mauricio Flores Jiménez, Fátima Trujillo Quiñe, Reynaldo Padilla Milla, Claudia Córdova Yamauchi, Vania Aspilcueta Serey



The assignment consists of the following steps:

1. Import the data from [this url](https://github.com/alexanderquispe/Diplomado_PUCP/blob/main/_data/bbva_list.xlsx). The dataset is in Excel format; convert it to a pandas DataFrame.
2. Use the Google Maps API to geocode all the BBVA offices. For offices where the API returns no information, look up the latitude and longitude by hand on the internet and add them to the dataset.
3. Use the Google API to find the driving time (best guess) from each group member's address to every BBVA office in Lima.
4. Finally, report which offices are closest to and farthest from each group member's address.

# 1. Settings

As a first step, we install and load the packages and libraries used in the assignment.

The code relies on several Python libraries to perform a variety of tasks. They are described below:

- pandas: used for data manipulation, particularly with DataFrames 
- numpy: supports mathematical operations on arrays and matrices 
- urllib.request and requests: handle HTTP requests to interact with web APIs
- json: facilitates working with JSON data
- googlemaps: provides an interface to the Google Maps API for geocoding services
- os: manages file and directory operations
- datetime and dateutil.parser: help with parsing and formatting dates
- unicodedata: used for manipulating Unicode data
- tqdm: adds progress bars in Jupyter notebooks
- time: used for delays between requests
- csv: handles reading and writing CSV files

In [53]:
# Installing the googlemaps package
!pip install -U googlemaps



In [1]:
# Packages
import pandas as pd
import numpy as np
import urllib.request
import googlemaps

# For progress bars in Jupyter notebooks (tqdm_notebook is deprecated)
from tqdm.notebook import tqdm
# For sending GET requests to the API
import requests
# For saving access tokens and for file management when creating and adding to the dataset
import os
# For dealing with the JSON responses we receive from the API
import json
# For saving the response data in CSV format
import csv
# For parsing dates into readable formats
import datetime
import dateutil.parser
import unicodedata
# To add wait time between requests
import time

# 2. Loading and adjusting the file

As a second step, we load and adjust the file to be used. The input is an Excel file containing the address of each BBVA office in Lima. It includes the street address, department, province, and district for each office.

## 2.1. Reading the file

We read the file and show it.

In [3]:
# For importing the Excel file as a PandasDataFrame
bbva_offices = pd.read_excel('../../_data/bbva_list.xlsx')

bbva_offices

Unnamed: 0,Direccion,DEPARTAMENTO,PROVINCIA,DISTRITO
0,CENTRO AEREO COMERCIAL LOCALES 110 A Y 111 A,LIMA,LIMA,CALLAO
1,AV. CTRMTE. MORA S/N BASE NAVAL,LIMA,LIMA,CALLAO
2,"AV. ELMER FAUCETT Y ALEJANDRO BERTELLO, CC CAN...",LIMA,LIMA,CALLAO
3,AV. SAENZ PEN A 323,LIMA,LIMA,CALLAO
4,CALLE OMEGA 149 PARQUE INDUSTRIAL DEL CALLAO,LIMA,LIMA,CALLAO
5,AV. CONTRALMIRANTE RAYGADA N°lll,LIMA,LIMA,CALLAO
6,"AV. ELMERT FAUCETT N°2121 LOCALES N° 2-101,2-1...",LIMA,LIMA,CALLAO
7,AV. ELMER FAUCETT 6000,LIMA,LIMA,CALLAO
8,CENTRO COMERCIAL MINKA PABELL6N 2,LIMA,LIMA,CALLAO
9,"AV. OSCAR R. BENAVIDES 3866, URB. EL AGUILA, L...",LIMA,LIMA,CALLAO
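The cell above reads a local copy of the file. As a minimal sketch, assuming one prefers to read the workbook straight from the URL given in the assignment, the GitHub page link first has to be turned into a raw-file link. The helper name `github_blob_to_raw` is ours, and whether the file is still served at that address is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Turn a github.com '/<user>/<repo>/blob/<branch>/<path>' link into a raw-file link."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: user / repo / "blob" / branch / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {url}")
    user, repo, _, *rest = segments
    raw_path = "/" + "/".join([user, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


raw_url = github_blob_to_raw(
    "https://github.com/alexanderquispe/Diplomado_PUCP/blob/main/_data/bbva_list.xlsx"
)
print(raw_url)

# With network access and an Excel engine installed, the file could then be read with:
# bbva_offices = pd.read_excel(raw_url)
```

The helper assumes a branch name without slashes, which holds for `main`.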


## 2.2. Adjusting the file

Now, we rename columns and correct some district names to process the addresses correctly.

In [5]:
# Renaming columns
bbva_offices.rename(columns={'Direccion': 'DIRECCIÓN'}, inplace=True) 

# Adjusting district names (column position 3 is 'DISTRITO')
bbva_offices.iloc[23:26, 3], bbva_offices.iloc[36, 3], bbva_offices.iloc[40:45, 3] = 'BREÑA', 'EL AGUSTINO', 'JESÚS MARÍA'
# iloc selects by integer position (not by label), and a slice excludes its end point (1:4 does not include 4)
# loc selects by label (not by position), and a slice includes its end point

# Results
bbva_offices

Unnamed: 0,DIRECCIÓN,DEPARTAMENTO,PROVINCIA,DISTRITO
0,CENTRO AEREO COMERCIAL LOCALES 110 A Y 111 A,LIMA,LIMA,CALLAO
1,AV. CTRMTE. MORA S/N BASE NAVAL,LIMA,LIMA,CALLAO
2,"AV. ELMER FAUCETT Y ALEJANDRO BERTELLO, CC CAN...",LIMA,LIMA,CALLAO
3,AV. SAENZ PEN A 323,LIMA,LIMA,CALLAO
4,CALLE OMEGA 149 PARQUE INDUSTRIAL DEL CALLAO,LIMA,LIMA,CALLAO
5,AV. CONTRALMIRANTE RAYGADA N°lll,LIMA,LIMA,CALLAO
6,"AV. ELMERT FAUCETT N°2121 LOCALES N° 2-101,2-1...",LIMA,LIMA,CALLAO
7,AV. ELMER FAUCETT 6000,LIMA,LIMA,CALLAO
8,CENTRO COMERCIAL MINKA PABELL6N 2,LIMA,LIMA,CALLAO
9,"AV. OSCAR R. BENAVIDES 3866, URB. EL AGUILA, L...",LIMA,LIMA,CALLAO
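The difference between `iloc` and `loc` slicing noted in the comments of the cell can be checked on a small example. This is a sketch on a toy DataFrame with made-up values, not on the BBVA data.

```python
import pandas as pd

# Toy data with a default integer index (labels 0..4), made up for illustration.
toy = pd.DataFrame({"DISTRITO": ["A", "B", "C", "D", "E"]})

# iloc works on positions: the slice 1:3 covers positions 1 and 2 only.
by_position = toy.iloc[1:3, 0].tolist()

# loc works on labels: the slice 1:3 covers labels 1, 2 and 3.
by_label = toy.loc[1:3, "DISTRITO"].tolist()

print(by_position)  # ['B', 'C']
print(by_label)     # ['B', 'C', 'D']
```

This is why `iloc[23:26, 3]` in the cell above touches three rows (23, 24, 25), not four.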


# 3. Obtaining coordinates

As a third step, we create a function to obtain the coordinates of the BBVA offices.

## 3.1. Geocoding examples 

This part of the code first creates the API client with our key. The API key authorizes our requests to the Google Maps Platform. Then we run a test to check that the code works and to inspect the output. Since the result is a list of nested dictionaries, we walk through the dictionary keys step by step to reach the location (latitude and longitude).

In [7]:
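## Creating the API client
# The key is read from an environment variable (the name GOOGLE_MAPS_API_KEY is our choice) so that it is not published with the notebook
gmaps = googlemaps.Client(key=os.environ['GOOGLE_MAPS_API_KEY'])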

## Example of the dictionary created when calling an address
example_1 = gmaps.geocode( "Av. Universitaria 18 - Interior PUCP, San Miguel" , region='PE')
print(example_1); print()

# Applying the right keys to get the latitude and longitude
print(len(example_1)); print()
print('Here we visualize the first element of the list\n', example_1[0]); print()
print('Here we generate a list with the dictionary keys\n', list(example_1[0].keys())); print()
print('Here we generate a list with the keys of the "geometry" dictionary\n', list(example_1[0]['geometry'])); print()
print('Here we generate a list with the keys of the "location" dictionary\n', list(example_1[0]['geometry']['location'])); print()

[{'address_components': [{'long_name': '18', 'short_name': '18', 'types': ['street_number']}, {'long_name': 'Avenida Universitaria', 'short_name': 'Av. Universitaria', 'types': ['route']}, {'long_name': 'Fund Pando', 'short_name': 'Fund Pando', 'types': ['political', 'sublocality', 'sublocality_level_1']}, {'long_name': 'San Miguel', 'short_name': 'San Miguel', 'types': ['locality', 'political']}, {'long_name': 'Lima', 'short_name': 'Lima', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Provincia de Lima', 'short_name': 'Provincia de Lima', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'Peru', 'short_name': 'PE', 'types': ['country', 'political']}, {'long_name': '15088', 'short_name': '15088', 'types': ['postal_code']}], 'formatted_address': 'Av. Universitaria 18, San Miguel 15088, Peru', 'geometry': {'bounds': {'northeast': {'lat': -12.0722255, 'lng': -77.0786512}, 'southwest': {'lat': -12.0733886, 'lng': -77.082871}}, 'location': {'lat'
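The path through the nested result can be followed offline on a trimmed stand-in for the response. This is only a sketch: the structure mirrors the output above, but the `lat` and `lng` numbers are placeholders we made up, since the real values are cut off in the printed output.

```python
# Trimmed stand-in for a geocoding result; the coordinates are placeholders, not real output.
sample_response = [
    {
        "formatted_address": "Av. Universitaria 18, San Miguel 15088, Peru",
        "geometry": {
            "location": {"lat": -12.07, "lng": -77.08},
        },
    }
]

first_match = sample_response[0]   # the result is a list; take its first element
geometry = first_match["geometry"]  # then the "geometry" dictionary
location = geometry["location"]     # then the "location" dictionary
lat, lng = location["lat"], location["lng"]

print(list(first_match.keys()))
print(list(location.keys()))
print(lat, lng)
```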

## 3.2. Function to get the location

The second part of the code defines a function called geo_bbva that is used to obtain the latitude and longitude coordinates for a given BBVA office based on its location details. The function takes a row from a dataset as input, concatenates the relevant location information (department, province, district, and address) into a single string, and then uses the Geocoding API to get the coordinates. It attempts to extract the latitude and longitude from the API's output. If the geocoding is successful, the function returns the coordinates; if not, it returns NaN values for both latitude and longitude.

In [11]:
# Creating the function geo_bbva
def geo_bbva(row_series):

    # Building a single address string from the location columns
    Ubicación = ', '.join(map(row_series.get, ['DEPARTAMENTO', 'PROVINCIA', 'DISTRITO', 'DIRECCIÓN']))

    # Setting the geolocation
    result_api = gmaps.geocode(Ubicación, region='PE')

    # Getting the information
    try:
        lat = result_api[0]['geometry']['location']['lat']
        lon = result_api[0]['geometry']['location']['lng']

    # Generating missing values for the locations not found
    except (IndexError, KeyError, TypeError):
        lat = np.nan
        lon = np.nan

    return lat, lon
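Because `geo_bbva` needs a live API key, its logic is hard to check offline. One possible variant, sketched here under our own naming (`build_query`, `geocode_row` and `fake_geocode` are hypothetical, not part of the assignment code), passes the geocoder in as an argument so that a stand-in can replace `gmaps.geocode` during testing.

```python
import math

ADDRESS_FIELDS = ["DEPARTAMENTO", "PROVINCIA", "DISTRITO", "DIRECCIÓN"]


def build_query(row):
    """Join the location fields of one office into a single search string."""
    return ", ".join(str(row[field]) for field in ADDRESS_FIELDS)


def geocode_row(row, geocode):
    """Return (lat, lng) for one row, or (nan, nan) when the geocoder finds nothing."""
    results = geocode(build_query(row))
    if not results:
        return math.nan, math.nan
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


def fake_geocode(query):
    """Hypothetical stand-in for gmaps.geocode that knows a single address."""
    known = {"LIMA, LIMA, CALLAO, AV. ELMER FAUCETT 6000": (-11.9972712, -77.1245194)}
    if query not in known:
        return []  # an empty list plays the role of "no match"
    lat, lng = known[query]
    return [{"geometry": {"location": {"lat": lat, "lng": lng}}}]


office = {"DEPARTAMENTO": "LIMA", "PROVINCIA": "LIMA", "DISTRITO": "CALLAO",
          "DIRECCIÓN": "AV. ELMER FAUCETT 6000"}
unknown = dict(office, DIRECCIÓN="NO SUCH PLACE")

found = geocode_row(office, fake_geocode)
missing = geocode_row(unknown, fake_geocode)
print(found)
print(missing)
```

With the real client, the call would look like `geocode_row(row, lambda q: gmaps.geocode(q, region='PE'))`.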

## 3.3. Applying the function to the dataframe

Finally, we modify the dataframe to include the column 'COORDENADAS: Lat, Long', which will contain the output of our previously defined function. We are applying the function geo_bbva to each row of the dataframe. Then, we reorder the columns.

In [13]:
# Applying the function
bbva_offices['COORDENADAS: Lat, Long'] = bbva_offices.apply(geo_bbva, axis=1)

# Reordering columns
bbva_offices = bbva_offices[['DIRECCIÓN', 'DISTRITO', 'PROVINCIA', 'DEPARTAMENTO', 'COORDENADAS: Lat, Long']]

# Results
bbva_offices

Unnamed: 0,DIRECCIÓN,DISTRITO,PROVINCIA,DEPARTAMENTO,"COORDENADAS: Lat, Long"
0,CENTRO AEREO COMERCIAL LOCALES 110 A Y 111 A,CALLAO,LIMA,LIMA,"(-12.0244324, -77.1041764)"
1,AV. CTRMTE. MORA S/N BASE NAVAL,CALLAO,LIMA,LIMA,"(-12.0511717, -77.1256883)"
2,"AV. ELMER FAUCETT Y ALEJANDRO BERTELLO, CC CAN...",CALLAO,LIMA,LIMA,"(-12.0309896, -77.1014989)"
3,AV. SAENZ PEN A 323,CALLAO,LIMA,LIMA,"(-12.0511717, -77.1256883)"
4,CALLE OMEGA 149 PARQUE INDUSTRIAL DEL CALLAO,CALLAO,LIMA,LIMA,"(-12.0506798, -77.0871366)"
5,AV. CONTRALMIRANTE RAYGADA N°lll,CALLAO,LIMA,LIMA,"(-12.0464501, -77.1400374)"
6,"AV. ELMERT FAUCETT N°2121 LOCALES N° 2-101,2-1...",CALLAO,LIMA,LIMA,"(-12.0379707, -77.0986351)"
7,AV. ELMER FAUCETT 6000,CALLAO,LIMA,LIMA,"(-11.9972712, -77.1245194)"
8,CENTRO COMERCIAL MINKA PABELL6N 2,CALLAO,LIMA,LIMA,"(-12.048302, -77.10948069999999)"
9,"AV. OSCAR R. BENAVIDES 3866, URB. EL AGUILA, L...",CALLAO,LIMA,LIMA,"(-12.0548677, -77.10383379999999)"
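In the rows displayed above, offices 1 and 3 receive exactly the same coordinates although their addresses differ. This may mean the geocoder fell back to a broader area for at least one of them, so such rows are candidates for the manual lookup described in step 2. A minimal sketch for flagging repeated coordinates, using only the ten rows shown:

```python
from collections import defaultdict

# Row index -> coordinates, copied from the ten rows displayed above.
coords = {
    0: (-12.0244324, -77.1041764),
    1: (-12.0511717, -77.1256883),
    2: (-12.0309896, -77.1014989),
    3: (-12.0511717, -77.1256883),
    4: (-12.0506798, -77.0871366),
    5: (-12.0464501, -77.1400374),
    6: (-12.0379707, -77.0986351),
    7: (-11.9972712, -77.1245194),
    8: (-12.048302, -77.10948069999999),
    9: (-12.0548677, -77.10383379999999),
}

# Group row indices by the point they were geocoded to.
rows_by_point = defaultdict(list)
for row_id, point in coords.items():
    rows_by_point[point].append(row_id)

# Keep only the points shared by more than one office.
suspicious = {point: rows for point, rows in rows_by_point.items() if len(rows) > 1}
print(suspicious)  # {(-12.0511717, -77.1256883): [1, 3]}
```

A shared point is not proof of an error, since two offices could sit in the same building, but it is a cheap signal for which rows to verify by hand.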


# 4. Distance to BBVA offices

This part of the code computes the driving time from each member's house to each BBVA office.

## 4.1 Dataframe with addresses

First, we create a dataframe containing the location of our houses.

In [15]:
# Creating a dictionary with the coordinates of each of our houses
G3_coord = {'Integrante': ['Claudia', 'Fátima', 'Mauricio', 'Reynaldo', 'Vania'],
            'Geocode_Dom': [(-11.99150, -77.07072), (-12.000563, -77.049989), (-12.061457, -77.046877), (-11.976752, -77.059527), (-12.07083, -77.07289)]
           }

# Converting to a dataframe
G3_df = pd.DataFrame(G3_coord)

#Results
G3_df

Unnamed: 0,Integrante,Geocode_Dom
0,Claudia,"(-11.9915, -77.07072)"
1,Fátima,"(-12.000563, -77.049989)"
2,Mauricio,"(-12.061457, -77.046877)"
3,Reynaldo,"(-11.976752, -77.059527)"
4,Vania,"(-12.07083, -77.07289)"


## 4.2. Driving time function

Then, we create a function, driving_time, that calculates the estimated driving time between a specified origin and destination using the Distance Matrix API. It sends a request to the API with the provided origin and destination. From the response, the function extracts the driving duration in seconds, converts it to minutes, and returns it as a string such as '23 min'. If the response does not contain the expected data, the function returns NaN instead of a time estimate.

In [19]:
# Defining a function to calculate the driving time
def driving_time(origin, destination):
    # We provide the inputs required by the Distance Matrix API
    result = gmaps.distance_matrix(
        origins=origin,
        destinations=destination,
        mode='driving',
        region='PE',
        language='es',
        traffic_model='best_guess',
        departure_time='now'
    )
    try:
        dist_segundos = result['rows'][0]['elements'][0]['duration']['value']
        minutos = f'{round(dist_segundos / 60)} min'  # Converting seconds to minutes
    except (KeyError, IndexError, TypeError):
        minutos = np.nan  # Returning a missing value if the response has no duration
    return minutos
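The extraction step of driving_time can be exercised without calling the API. The sketch below uses hand-written dictionaries that follow the `rows` / `elements` / `duration` layout the function relies on. The numbers are made up, and `minutes_from_matrix` is a hypothetical helper that returns a plain number rather than a string.

```python
import math


def minutes_from_matrix(result):
    """Return the driving time in whole minutes, or nan if the element has no duration."""
    try:
        element = result["rows"][0]["elements"][0]
        return round(element["duration"]["value"] / 60)
    except (KeyError, IndexError, TypeError):
        return math.nan


# Made-up responses: one with a route, one where no route was found.
ok_response = {
    "status": "OK",
    "rows": [{"elements": [{"status": "OK", "duration": {"text": "23 min", "value": 1380}}]}],
}
no_route = {"status": "OK", "rows": [{"elements": [{"status": "ZERO_RESULTS"}]}]}

print(minutes_from_matrix(ok_response))  # 23
print(minutes_from_matrix(no_route))     # nan
```

Keeping the value numeric is a design choice: it avoids parsing 'NN min' strings again later, at the cost of formatting the unit only when the report is printed.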

## 4.3. Dataframe including offices locations and driving time from each of our houses

Finally, we create the DrivingTimes_df DataFrame, which includes the address for each BBVA office and the driving time from each member's house. This involves applying the driving_time function, which takes the coordinates of our houses as the point of origin and those of each office as the destination.

In [21]:
# Create the base dataframe with the office location (DIRECCIÓN and DISTRITO)
DrivingTimes_df = pd.DataFrame({'Ubicación de la oficina': bbva_offices['DIRECCIÓN'] + ', ' + bbva_offices['DISTRITO']})

# Iterate over each member to calculate the driving time and add the corresponding column
for index, row in G3_df.iterrows():
    DrivingTimes_df[f"Tiempo desde la casa de {row['Integrante']}"] = bbva_offices['COORDENADAS: Lat, Long'].apply(
        lambda office_coords: driving_time(row['Geocode_Dom'], office_coords))

In [23]:
DrivingTimes_df

Unnamed: 0,Ubicación de la oficina,Tiempo desde la casa de Claudia,Tiempo desde la casa de Fátima,Tiempo desde la casa de Mauricio,Tiempo desde la casa de Reynaldo,Tiempo desde la casa de Vania
0,"CENTRO AEREO COMERCIAL LOCALES 110 A Y 111 A, ...",23 min,29 min,30 min,26 min,23 min
1,"AV. CTRMTE. MORA S/N BASE NAVAL, CALLAO",30 min,38 min,31 min,34 min,23 min
2,"AV. ELMER FAUCETT Y ALEJANDRO BERTELLO, CC CAN...",22 min,28 min,32 min,25 min,26 min
3,"AV. SAENZ PEN A 323, CALLAO",30 min,38 min,31 min,34 min,23 min
4,"CALLE OMEGA 149 PARQUE INDUSTRIAL DEL CALLAO, ...",28 min,32 min,18 min,27 min,11 min
5,"AV. CONTRALMIRANTE RAYGADA N°lll, CALLAO",33 min,42 min,35 min,37 min,27 min
6,"AV. ELMERT FAUCETT N°2121 LOCALES N° 2-101,2-1...",25 min,31 min,27 min,27 min,21 min
7,"AV. ELMER FAUCETT 6000, CALLAO",19 min,30 min,38 min,23 min,31 min
8,"CENTRO COMERCIAL MINKA PABELL6N 2, CALLAO",28 min,34 min,29 min,32 min,22 min
9,"AV. OSCAR R. BENAVIDES 3866, URB. EL AGUILA, L...",32 min,38 min,27 min,38 min,20 min


# 5. Closest and farthest office

## 5.1. Closest and farthest function

This part of the code creates a new function that finds the closest and farthest BBVA office from each member's house. As a final step, it shows the results: the member, the address of the closest office, the driving time to that office, the address of the farthest office, and the driving time to that office.

In [25]:
# Defining a function to obtain the closest and farthest office based on travel times
def closest_farthest_times(column):
    
    # Convert the 'NN min' strings to numbers once; missing times stay as NaN and are skipped below
    minutes = DrivingTimes_df[column].str.replace(' min', '', regex=False).astype(float)

    # Get the index of the office with the minimum travel time
    idx_min = minutes.idxmin()

    # Get the index of the office with the maximum travel time
    idx_max = minutes.idxmax()
    
    # Return a series with the addresses and times of the closest and farthest offices
    return pd.Series({
        'Oficina más cercana': DrivingTimes_df.loc[idx_min, 'Ubicación de la oficina'],   # Closest office address
        'Tiempo hasta la oficina más cercana': DrivingTimes_df.loc[idx_min, column],      # Closest office driving time
        'Oficina más lejana': DrivingTimes_df.loc[idx_max, 'Ubicación de la oficina'],    # Farthest office address
        'Tiempo hasta la oficina más lejana': DrivingTimes_df.loc[idx_max, column]        # Farthest office driving time
    })
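The core of the function, turning 'NN min' strings into numbers and locating the extremes, can be illustrated without pandas. The sketch below uses Claudia's times for the ten offices displayed in the driving-time table, so its result covers only those rows and not the full dataset.

```python
# Claudia's driving times for the ten offices displayed earlier (rows 0 to 9 only).
times = ["23 min", "30 min", "22 min", "30 min", "28 min",
         "33 min", "25 min", "19 min", "28 min", "32 min"]

# Strip the unit and convert to integers.
minutes = [int(text.replace(" min", "")) for text in times]

# Position of the smallest and largest value; ties resolve to the first occurrence.
idx_min = min(range(len(minutes)), key=minutes.__getitem__)
idx_max = max(range(len(minutes)), key=minutes.__getitem__)

print(idx_min, minutes[idx_min])  # 7 19
print(idx_max, minutes[idx_max])  # 5 33
```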

## 5.2. Final results

In [27]:
# Apply the function to each column of driving times
G3_XtremeT_bbva = G3_df['Integrante'].apply(
    lambda integrante: closest_farthest_times(f"Tiempo desde la casa de {integrante}"))

# Add the "Integrante" column
G3_XtremeT_bbva['Integrante'] = G3_df['Integrante']

# Reorder the columns
G3_XtremeT_bbva = G3_XtremeT_bbva[['Integrante', 'Oficina más cercana', 'Tiempo hasta la oficina más cercana', 'Oficina más lejana', 'Tiempo hasta la oficina más lejana']]

# Display the final output
G3_XtremeT_bbva

Unnamed: 0,Integrante,Oficina más cercana,Tiempo hasta la oficina más cercana,Oficina más lejana,Tiempo hasta la oficina más lejana
0,Claudia,"AV. CARLOS IZAGUIRRE N.- 275, INDEPENDENCIA",4 min,"AV. 15 DE JULIO, LOTE 33, PROGRAMA ESPECIAL HU...",59 min
1,Fátima,"AV. ALFREDO MENDIOLA3698 - C. C. CONO NORTE, T...",5 min,"AV. 15 DE JULIO, LOTE 33, PROGRAMA ESPECIAL HU...",66 min
2,Mauricio,"JR. HUARAZ1600, BREÑA",6 min,"MALECdN ANDRAS A. CACERES MZ C-3, LT. 19, VENT...",70 min
3,Reynaldo,"AV. TUPAC AMARU N° 1175, COMAS",7 min,"AV. 15 DE JULIO, LOTE 33, PROGRAMA ESPECIAL HU...",61 min
4,Vania,"CALLE OMEGA 149 PARQUE INDUSTRIAL DEL CALLAO, ...",11 min,"MALECdN ANDRAS A. CACERES MZ C-3, LT. 19, VENT...",65 min


In [41]:
# We'll iterate over each row of our last df to generate the report
for index, row in G3_XtremeT_bbva.iterrows():
    mensaje = f"La oficina del BBVA más cercana a la casa de {row['Integrante']} está a {row['Tiempo hasta la oficina más cercana']} en auto, y queda en {row['Oficina más cercana']}; la más lejana está a {row['Tiempo hasta la oficina más lejana']} en auto y queda en {row['Oficina más lejana']}.\n"
    print(mensaje)

La oficina del BBVA más cercana a la casa de Claudia está a 4 min en auto, y queda en AV. CARLOS IZAGUIRRE N.- 275, INDEPENDENCIA; la más lejana está a 59 min en auto y queda en AV. 15 DE JULIO, LOTE 33, PROGRAMA ESPECIAL HUAYCAN, ZONA B, ATE.

La oficina del BBVA más cercana a la casa de Fátima está a 5 min en auto, y queda en AV. ALFREDO MENDIOLA3698 - C. C. CONO NORTE, TIENDAS2y3, INDEPENDENCIA; la más lejana está a 66 min en auto y queda en AV. 15 DE JULIO, LOTE 33, PROGRAMA ESPECIAL HUAYCAN, ZONA B, ATE.

La oficina del BBVA más cercana a la casa de Mauricio está a 6 min en auto, y queda en JR. HUARAZ1600, BREÑA; la más lejana está a 70 min en auto y queda en MALECdN ANDRAS A. CACERES MZ C-3, LT. 19, VENTANILLA.

La oficina del BBVA más cercana a la casa de Reynaldo está a 7 min en auto, y queda en AV. TUPAC AMARU N° 1175, COMAS; la más lejana está a 61 min en auto y queda en AV. 15 DE JULIO, LOTE 33, PROGRAMA ESPECIAL HUAYCAN, ZONA B, ATE.

La oficina del BBVA más cercana a la ca