# Final Project for the Data Engineering Course

The goal is to build a pipeline that continuously extracts data from a public API, combines it with information extracted from a database, and loads the result into a Data Warehouse.

## Setup

### Installing libraries

In [19]:
# Install the libraries used to interact with the database, specifically Postgres
#%pip install sqlalchemy psycopg2-binary

### Importing libraries

In [20]:
# Library for interacting with APIs
import requests

import pandas as pd

# Libraries for interacting with the database
import sqlalchemy as sa
from configparser import ConfigParser
from pathlib import Path

import psycopg2
import logging
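By default the `logging` module only shows messages of level WARNING and above, so the `logging.info` calls in `load_to_sql` would produce no visible output. Below is a minimal sketch of a configuration that makes them visible. The format string is an assumption and not part of the original notebook.

```python
import logging

# force=True replaces any handlers that were already installed
# (for example by the notebook environment)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,
)

logging.info("Logging configured")
```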

### Function definitions

In [21]:
def read_api_credentials(config_file: Path, section: str) -> dict:
    """
    Read the API credentials from a configuration file.

    Parameters:
    config_file: Path to the configuration file
    section: Section of the configuration file that holds the credentials
    """
    config = ConfigParser()
    config.read(config_file)
    api_credentials = dict(config[section])
    return api_credentials
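`ConfigParser` expects an INI-style file. The sketch below shows what such a file could look like and what the function returns. The section and key names match those used later in the notebook, while the values are placeholders and the temporary file path is only for illustration.

```python
from configparser import ConfigParser
from pathlib import Path
import tempfile

# Placeholder credentials, not real ones
example_conf = """
[api_transporte]
client_id = my-client-id
client_secret = my-client-secret
"""

def read_api_credentials(config_file, section):
    config = ConfigParser()
    config.read(config_file)
    return dict(config[section])

with tempfile.TemporaryDirectory() as tmp_dir:
    conf_path = Path(tmp_dir) / "pipeline.conf"
    conf_path.write_text(example_conf)
    api_keys = read_api_credentials(conf_path, "api_transporte")

print(api_keys)
```

Keeping this file out of version control is what keeps the credentials hidden.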

In [22]:
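def load_df_bus_positions(df_origen, df_destino):
    """
    Add the rows of df_origen to df_destino, keeping only the columns
    that will be stored in the bus_positions table.
    """
    columnas_a_considerar = ['id', 'agency_id', 'route_id', 'latitude', 'longitude', 'speed', 'timestamp', 'route_short_name', 'trip_headsign']

    # An outer merge on every kept column keeps the rows of both DataFrames
    df_fusionado = pd.merge(df_destino, df_origen, on=columnas_a_considerar, how='outer')

    df_destino_actualizado = df_fusionado[columnas_a_considerar]

    return df_destino_actualizado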
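The effect of the outer merge can be checked on toy data. This sketch uses only two of the columns and made-up values. The variable names mirror those in the function.

```python
import pandas as pd

columnas = ["id", "speed"]

# The destination already holds two rows; the source repeats one and adds a new one
df_destino = pd.DataFrame({"id": ["1", "2"], "speed": [0.0, 1.5]})
df_origen = pd.DataFrame({"id": ["2", "3"], "speed": [1.5, 2.0], "direction": [0, 1]})

df_fusionado = pd.merge(df_destino, df_origen, on=columnas, how="outer")[columnas]
print(df_fusionado)
```

The row present in both inputs appears once in the result, provided rows are unique within each input, so the merge behaves like a union on the selected columns.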
    

In [23]:
def load_df_agencies(fila_origen, df_destino):

    # Get agency_name and agency_id from the source row
    agency_name = fila_origen['agency_name']
    agency_id = fila_origen['agency_id']

    # Add the data to the destination DataFrame.
    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is used instead.
    nueva_fila = pd.DataFrame([{'agency_name': agency_name, 'agency_id': agency_id}])
    df_destino_actualizado = pd.concat([df_destino, nueva_fila], ignore_index=True)

    return df_destino_actualizado

## Connecting to the API

Data is extracted from the Buenos Aires transport API.

In [24]:
base_url = "https://apitransporte.buenosaires.gob.ar"

api_keys = read_api_credentials("config/pipeline.conf", "api_transporte")

# I could not get header-based authentication to work, so the credentials are sent
# as query parameters; the values are read from the config file to keep them hidden
params = { 
    "client_id" : api_keys["client_id"],
    "client_secret" : api_keys["client_secret"]
}

In [25]:
# Parameter that some endpoints require
formato_json = {'json': 1}

### Extracting bus data

In [26]:
endpoint_bus = "colectivos"

_____________

These cells select specific bus agencies. To query all agencies, skip them.

In [27]:
# LA NUEVA METROPOL S.A.
la_nueva_metropol = {'agency_id': 9}

In [28]:
# MICRO OMNIBUS PRIMERA JUNTA S.A
primera_junta = {'agency_id': 145}

In [29]:
# TRANSPORTE AUTOMOTORES LA PLATA SA
talp = {'agency_id': 155}
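The three filters could also be kept in a single mapping, so the request parameters for each agency are built in one place. This is a sketch with placeholder credentials. The names `agencies`, `base_params` and `params_by_agency` are hypothetical, and `urlencode` is used only to show the resulting query string without calling the API.

```python
from urllib.parse import urlencode

# Placeholder credentials, not real ones
base_params = {"client_id": "my-client-id", "client_secret": "my-client-secret"}

agencies = {
    "la_nueva_metropol": 9,
    "primera_junta": 145,
    "talp": 155,
}

# One parameter dict per agency: credentials plus the agency filter
params_by_agency = {
    name: {**base_params, "agency_id": agency_id}
    for name, agency_id in agencies.items()
}

# Query string that would be appended to the endpoint URL
query_string = urlencode(params_by_agency["primera_junta"])
print(query_string)
```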

______________

#### Creating the position DataFrame for the buses of interest

In [30]:
# Create an empty agencies DataFrame with the columns it will have in the database
df_agencies = pd.DataFrame(columns=['agency_id', 'agency_name'])

# Assign data types to the columns
df_agencies = df_agencies.astype({'agency_id': 'int', 'agency_name': 'str'})

print(df_agencies)

Empty DataFrame
Columns: [agency_id, agency_name]
Index: []


In [31]:
# Same for the bus positions table
column_specifications = {
    'id': str,
    'agency_id': int,
    'route_id': str,
    'latitude': float,
    'longitude': float,
    'speed': float,
    'timestamp': int,
    'route_short_name': str,
    'trip_headsign': str
}

df_bus_positions = pd.DataFrame(columns=column_specifications.keys())

for column, dtype in column_specifications.items():
    df_bus_positions[column] = df_bus_positions[column].astype(dtype)

print(df_bus_positions.dtypes)

id                   object
agency_id             int32
route_id             object
latitude            float64
longitude           float64
speed               float64
timestamp             int32
route_short_name     object
trip_headsign        object
dtype: object
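The `int32` columns above come from using the built-in `int`, whose width depends on the platform and NumPy version, while the API data is parsed as `int64`. Below is a sketch of an alternative that builds the empty DataFrame from typed Series with explicit widths. Only a subset of the columns is shown, and the name `df_empty` is hypothetical.

```python
import pandas as pd

column_specifications = {
    "id": "object",
    "agency_id": "int64",
    "latitude": "float64",
    "timestamp": "int64",
}

# One empty, typed Series per column
df_empty = pd.DataFrame(
    {column: pd.Series(dtype=dtype) for column, dtype in column_specifications.items()}
)
print(df_empty.dtypes)
```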


#### Bus position information

In [32]:
# Get the positions of the monitored vehicles, updated every 30 seconds.
# If no input parameters are passed, the endpoint returns the current position of all monitored vehicles.

endpoint_busPositions = f"{endpoint_bus}/vehiclePositionsSimple"

full_url_busPositions = f"{base_url}/{endpoint_busPositions}"


##### Accessing the positions of the Primera Junta lines

In [33]:
params_PJPositions = params.copy()
params_PJPositions.update(primera_junta)

In [34]:
r_PJPositions = requests.get(full_url_busPositions, params=params_PJPositions)

r_PJPositions.status_code

200
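Checking `status_code` by eye works in a notebook, but a scheduled pipeline would need the check in code. A minimal sketch of a helper that fails loudly on HTTP errors and sets a timeout follows. The name `fetch_json` and the 10 second default are assumptions. The `http` argument is anything with a `requests`-style `get` method, such as the `requests` module itself or a `requests.Session`.

```python
def fetch_json(http, url, params, timeout=10):
    """Return the decoded JSON body, raising on HTTP error statuses."""
    response = http.get(url, params=params, timeout=timeout)
    # Raises an exception for 4xx and 5xx responses instead of returning them
    response.raise_for_status()
    return response.json()

# Possible usage in this notebook:
# json_PJData = fetch_json(requests, full_url_busPositions, params_PJPositions)
```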

In [35]:
json_PJData = r_PJPositions.json()
json_PJData

[{'route_id': '1279',
  'latitude': -34.83354,
  'longitude': -58.185215,
  'speed': 1.111111,
  'timestamp': 1708975496,
  'id': '23689',
  'direction': 0,
  'agency_name': 'MICRO OMNIBUS PRIMERA JUNTA S.A',
  'agency_id': 145,
  'route_short_name': '324R3',
  'tip_id': '82341-1',
  'trip_headsign': 'A - Barrio Sitra - IDA'},
 {'route_id': '1294',
  'latitude': -34.67706,
  'longitude': -58.33524,
  'speed': 0,
  'timestamp': 1708975494,
  'id': '23696',
  'direction': 1,
  'agency_name': 'MICRO OMNIBUS PRIMERA JUNTA S.A',
  'agency_id': 145,
  'route_short_name': '324R9',
  'tip_id': '83177-1',
  'trip_headsign': 'a Pte. Saavedra'},
 {'route_id': '1285',
  'latitude': -34.79261,
  'longitude': -58.24999,
  'speed': 0,
  'timestamp': 1708975496,
  'id': '23726',
  'direction': 0,
  'agency_name': 'MICRO OMNIBUS PRIMERA JUNTA S.A',
  'agency_id': 145,
  'route_short_name': '324T5',
  'tip_id': '82672-1',
  'trip_headsign': 'B - Barrio Centenario (por Milan) - IDA'},
 {'route_id': '1298

In [36]:
type(json_PJData)

list

In [37]:
json_PJData[1].keys()

dict_keys(['route_id', 'latitude', 'longitude', 'speed', 'timestamp', 'id', 'direction', 'agency_name', 'agency_id', 'route_short_name', 'tip_id', 'trip_headsign'])

Converting the data to a DataFrame

In [38]:
# Convert the JSON to a DataFrame
df_PJPositions = pd.json_normalize(json_PJData)
df_PJPositions.sample(n=10)

Unnamed: 0,route_id,latitude,longitude,speed,timestamp,id,direction,agency_name,agency_id,route_short_name,tip_id,trip_headsign
62,1290,-34.72901,-58.2625,9.166666,1708975494,24604,1,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R6P,82981-1,Ramal B - a Est. Lomas de Zamora
63,1295,-34.81171,-58.27306,0.0,1708975488,25231,0,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R9F,83232-1,a Pilar x Ford
13,1293,-34.73914,-58.26458,0.0,1708975436,23794,0,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R9,83125-1,a Moreno x Panamericana
36,1280,-34.81848,-58.19554,5.0,1708975496,23942,1,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R3,82387-1,a Tribunales de Retiro/Htal. Ferroviario
40,1279,-34.71922,-58.263386,7.5,1708975464,23961,0,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R3,82339-1,A - Barrio Sitra - IDA
11,1280,-34.79459,-58.2372,5.833333,1708975494,23778,1,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R3,82387-1,a Tribunales de Retiro/Htal. Ferroviario
53,1293,-34.82498,-58.23344,8.333333,1708975464,24040,0,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R9,83127-1,a Moreno x Panamericana
48,1291,-34.74595,-58.28792,1.111111,1708975494,23998,0,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R6C,83032-1,a Moreno x Virreyes
16,1280,-34.74382,-58.27322,9.722222,1708975494,23834,1,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R3,82388-1,a Tribunales de Retiro/Htal. Ferroviario
29,1289,-34.7339,-58.252335,3.055555,1708975496,23917,0,MICRO OMNIBUS PRIMERA JUNTA S.A,145,324R6P,82932-1,Ramal B - a A. Bello


In [39]:
df_PJPositions.shape

(68, 12)

In [40]:
df_PJPositions.dtypes

route_id             object
latitude            float64
longitude           float64
speed               float64
timestamp             int64
id                   object
direction             int64
agency_name          object
agency_id             int64
route_short_name     object
tip_id               object
trip_headsign        object
dtype: object
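These dtypes differ from the `bus_positions` table defined later in the notebook, which declares `id` and `route_id` as INTEGER and `timestamp` as a timestamp. Below is a sketch of the conversions that would align them, using one row from the output above. It assumes the API timestamps are Unix epoch seconds in UTC, which the values suggest but the notebook does not state.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["23689"],
    "route_id": ["1279"],
    "timestamp": [1708975496],
})

# The table declares id and route_id as INTEGER
df["id"] = df["id"].astype("int64")
df["route_id"] = df["route_id"].astype("int64")

# Epoch seconds to datetime (assumed to be UTC)
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")

print(df.dtypes)
print(df)
```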

##### Loading into the DataFrame for later upload to the DB

In [41]:
df_bus_positions = load_df_bus_positions(df_PJPositions, df_bus_positions)


In [42]:
df_bus_positions.shape


(68, 9)

In [43]:
df_bus_positions.head()

Unnamed: 0,id,agency_id,route_id,latitude,longitude,speed,timestamp,route_short_name,trip_headsign
0,23689,145,1279,-34.83354,-58.185215,1.111111,1708975496,324R3,A - Barrio Sitra - IDA
1,23696,145,1294,-34.67706,-58.33524,0.0,1708975494,324R9,a Pte. Saavedra
2,23726,145,1285,-34.79261,-58.24999,0.0,1708975496,324T5,B - Barrio Centenario (por Milan) - IDA
3,23729,145,1298,-34.73054,-58.265514,0.0,1708975494,324R16,a Pte. Saavedra
4,23730,145,1294,-34.76349,-58.262665,1.111111,1708975496,324R9,a Pte. Saavedra


In [44]:
df_agencies = load_df_agencies(df_PJPositions.iloc[0], df_agencies)



In [45]:
df_agencies

Unnamed: 0,agency_id,agency_name
0,145,MICRO OMNIBUS PRIMERA JUNTA S.A


##### Accessing the positions of the La Nueva Metropol lines

In [46]:
params_NMPositions = params.copy()
params_NMPositions.update(la_nueva_metropol)
r_NMPositions = requests.get(full_url_busPositions, params=params_NMPositions)

r_NMPositions.status_code

200

In [47]:
json_NMData = r_NMPositions.json()
df_NMPositions = pd.json_normalize(json_NMData)
df_NMPositions.sample(n=10)

Unnamed: 0,route_id,latitude,longitude,speed,timestamp,id,direction,agency_name,agency_id,route_short_name,tip_id,trip_headsign
190,140,-34.68434,-58.30674,24.444445,1708975492,46513,0,LA NUEVA METROPOL S.A.,9,195C,11950-1,Ramal A - IDA
131,1198,-34.53712,-58.475735,0.0,1708975490,20517,0,LA NUEVA METROPOL S.A.,9,365R4,77557-1,a Atalaya
217,137,-34.91144,-57.96798,0.0,1708975488,67987,1,LA NUEVA METROPOL S.A.,9,195A,11883-1,a Est. TORCUATO
150,1195,-34.52175,-58.758514,0.0,1708975494,20741,1,LA NUEVA METROPOL S.A.,9,365R2,77417-1,a Cement. de Villegas x Leon Gallo
183,151,-34.87611,-57.969425,24.444445,1708975492,46072,1,LA NUEVA METROPOL S.A.,9,195H,12307-1,a Miserere x Panamericana
130,2039,-34.49092,-58.567646,15.277777,1708975494,20501,1,LA NUEVA METROPOL S.A.,9,194H,140495-1,a Expreso - Pza. Miserere
151,1209,-34.51024,-58.73112,7.777777,1708975496,20758,1,LA NUEVA METROPOL S.A.,9,365R9,78070-1,Laferrere x Victor Martinez - VUELTA
106,1200,-34.56818,-58.808845,0.0,1708975494,20368,0,LA NUEVA METROPOL S.A.,9,365R5,77671-1,a Los Pinos
155,1195,-34.48385,-58.68336,13.055555,1708975496,20967,1,LA NUEVA METROPOL S.A.,9,365R2,77417-1,a Cement. de Villegas x Leon Gallo
58,2042,-34.1678,-58.9584,5.0,1708975492,8373,0,LA NUEVA METROPOL S.A.,9,194A,140582-1,Ramal A - IDA


In [48]:
df_NMPositions.shape

(220, 12)

##### Loading into the DataFrame for later upload to the DB

In [49]:
df_bus_positions = load_df_bus_positions(df_NMPositions, df_bus_positions)


In [50]:
df_bus_positions.shape


(288, 9)

In [51]:
df_bus_positions.sample(5)

Unnamed: 0,id,agency_id,route_id,latitude,longitude,speed,timestamp,route_short_name,trip_headsign
154,20234,9,1195,-34.4959,-58.49803,0.0,1708975494,365R2,a Cement. de Villegas x Leon Gallo
99,7794,9,138,-34.60573,-58.370094,8.333333,1708975490,195B,a Term. La Plata x Autop.
160,20289,9,1214,-34.49426,-58.49808,0.0,1708975494,365R11,a Cañuelas
200,20518,9,1211,-34.44306,-58.577446,5.277777,1708975494,365R10,Isidro Casanova - VUELTA
100,7796,9,140,-34.90395,-57.9557,0.0,1708975488,195C,Ramal A - IDA


In [52]:
df_agencies = load_df_agencies(df_NMPositions.iloc[0], df_agencies)



In [53]:
df_agencies

Unnamed: 0,agency_id,agency_name
0,145,MICRO OMNIBUS PRIMERA JUNTA S.A
1,9,LA NUEVA METROPOL S.A.


##### Accessing the positions of the TALP lines

In [54]:
params_TALPPositions = params.copy()
params_TALPPositions.update(talp)
r_TALPPositions = requests.get(full_url_busPositions, params=params_TALPPositions)

r_TALPPositions.status_code

200

In [55]:
json_TALPData = r_TALPPositions.json()
df_TALPPositions = pd.json_normalize(json_TALPData)
df_TALPPositions.sample(n=10)

Unnamed: 0,route_id,latitude,longitude,speed,timestamp,id,direction,agency_name,agency_id,route_short_name,tip_id,trip_headsign
44,1250,-34.5473,-58.582115,8.333333,1708975494,24245,0,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338C,80565-1,Ramal F - RN 3 x Alberdi
5,1250,-34.69417,-58.54878,0.0,1708975490,23639,0,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338C,80569-1,Ramal F - RN 3 x Alberdi
38,1255,-34.7018,-58.53888,10.0,1708975496,23901,1,TRANSPORTE AUTOMOTORES LA PLATA SA,155,406A,80732-1,a Cement. de Villegas x Mocoreta
50,1247,-34.69107,-58.55345,12.5,1708975492,31411,1,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338B,80495-1,a Ituzaingo y 29 de Septiembre
20,1246,-34.7628,-58.409145,4.444444,1708974534,23750,0,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338B,80442-1,a Lynch
31,1247,-34.64924,-58.619446,0.0,1708975494,23805,1,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338B,80495-1,a Ituzaingo y 29 de Septiembre
7,1246,-34.50277,-58.56085,13.055555,1708975464,23646,0,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338B,80439-1,a Lynch
25,1255,-34.66403,-58.53926,0.0,1708975494,23773,1,TRANSPORTE AUTOMOTORES LA PLATA SA,155,406A,80731-1,a Cement. de Villegas x Mocoreta
34,1251,-34.53127,-58.57543,0.0,1708975496,23831,1,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338C,80628-1,Ramal F - Pza. Miserere x Av. J. B. Alberdi
9,1250,-34.7737,-58.275665,14.166666,1708975496,23656,0,TRANSPORTE AUTOMOTORES LA PLATA SA,155,338C,80574-1,Ramal F - RN 3 x Alberdi


In [56]:
df_TALPPositions.shape

(54, 12)

##### Loading into the DataFrame for later upload to the DB

In [57]:
df_bus_positions = load_df_bus_positions(df_TALPPositions, df_bus_positions)


In [58]:
df_bus_positions.shape


(342, 9)

In [59]:
df_bus_positions.sample(5)

Unnamed: 0,id,agency_id,route_id,latitude,longitude,speed,timestamp,route_short_name,trip_headsign
95,7484,9,2010,-34.63254,-58.38416,0.0,1708975490,65B,a Est. Avellaneda
213,20709,9,1203,-34.54709,-58.81185,0.0,1708975496,365R6,a Los Pinos
40,23961,145,1279,-34.71922,-58.263386,7.5,1708975464,324R3,A - Barrio Sitra - IDA
1,23696,145,1294,-34.67706,-58.33524,0.0,1708975494,324R9,a Pte. Saavedra
308,23750,155,1246,-34.7628,-58.409145,4.444444,1708974534,338B,a Lynch


In [60]:
df_agencies = load_df_agencies(df_TALPPositions.iloc[0], df_agencies)



In [61]:
df_agencies

Unnamed: 0,agency_id,agency_name
0,145,MICRO OMNIBUS PRIMERA JUNTA S.A
1,9,LA NUEVA METROPOL S.A.
2,155,TRANSPORTE AUTOMOTORES LA PLATA SA
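Adding one row per request would insert an agency twice if a cell is run again. A sketch of an alternative is to derive the agencies table from the position data with `drop_duplicates`. The rows below are a small made-up subset, and `df_positions` is a hypothetical name for a DataFrame that still contains `agency_name`.

```python
import pandas as pd

df_positions = pd.DataFrame({
    "id": ["23689", "23696", "46513"],
    "agency_id": [145, 145, 9],
    "agency_name": [
        "MICRO OMNIBUS PRIMERA JUNTA S.A",
        "MICRO OMNIBUS PRIMERA JUNTA S.A",
        "LA NUEVA METROPOL S.A.",
    ],
})

# One row per distinct (agency_id, agency_name) pair
df_agencies = (
    df_positions[["agency_id", "agency_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(df_agencies)
```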


### Extracting ecobici station status data

In [65]:
endpoint_ecobici = "ecobici/gbfs"


#### Station information

In [66]:
# Static list of all stations, their capacities and locations

endpoint_ecobiciSI = f"{endpoint_ecobici}/stationInformation"

full_url_ecobiciSI = f"{base_url}/{endpoint_ecobiciSI}"

r_ecobiciSI = requests.get(full_url_ecobiciSI, params=params)

r_ecobiciSI.status_code

200

In [67]:
json_ecobiciSI = r_ecobiciSI.json()
json_ecobiciSI

{'last_updated': 1708975524,
 'ttl': 22,
 'data': {'stations': [{'station_id': '2',
    'name': '002 - Retiro I',
    'physical_configuration': 'SMARTLITMAPFRAME',
    'lat': -34.59242413,
    'lon': -58.37470989,
    'altitude': 0.0,
    'address': 'AV. Dr. José María Ramos Mejía 1300',
    'post_code': '11111',
    'capacity': 40,
    'is_charging_station': False,
    'rental_methods': ['KEY', 'TRANSITCARD', 'PHONE'],
    'groups': ['RETIRO'],
    'obcn': '',
    'nearby_distance': 1000.0,
    '_ride_code_support': True,
    'rental_uris': {}},
   {'station_id': '3',
    'name': '003 - ADUANA',
    'physical_configuration': 'SMARTLITMAPFRAME',
    'lat': -34.61220714255728,
    'lon': -58.36912906378899,
    'altitude': 0.0,
    'address': 'Av. Paseo Colón 380',
    'cross_street': '.',
    'post_code': 'C1063',
    'capacity': 28,
    'is_charging_station': False,
    'rental_methods': ['KEY', 'TRANSITCARD', 'PHONE'],
    'groups': ['MONSERRAT'],
    'nearby_distance': 1000.0,
    '

In [68]:
data_ecobiciSI= json_ecobiciSI['data']['stations']
df_ecobiciSI = pd.DataFrame(data_ecobiciSI)

df_ecobiciSI.sample(n=10)

Unnamed: 0,station_id,name,physical_configuration,lat,lon,altitude,address,post_code,capacity,is_charging_station,rental_methods,groups,obcn,nearby_distance,_ride_code_support,rental_uris,cross_street
242,386,277 - Coghlan,SMARTLITMAPFRAME,-34.5654,-58.4759,0.0,2647 Estomba,1111,8,False,"[KEY, TRANSITCARD, PHONE]",[COGHLAN],,1000.0,True,{},
161,245,248 - Husares,SMARTLITMAPFRAME,-34.552594,-58.44294,0.0,Husares 2201,11111,16,False,"[KEY, TRANSITCARD, PHONE]",[BELGRANO],,1000.0,True,{},
117,181,181 - BILLINGHURST Y MANSILLA,SMARTLITMAPFRAME,-34.592665,-58.412007,0.0,"1520 Billinghurst & Mansilla, Lucio Norberto, ...",11111,20,False,"[KEY, TRANSITCARD, PHONE]",[RECOLETA],,1000.0,True,{},"1520 Billinghurst & Mansilla, Lucio Norberto, ..."
277,453,028 - Plaza de la Bandera,SMARTLITMAPFRAME,-34.62948,-58.494485,0.0,Av. Gaona 5181,C1407,20,False,"[KEY, TRANSITCARD, PHONE]",[VELEZ SARFIELD],,1000.0,True,{},
178,267,315 - BEIRO Y SAN MARTÍN,SMARTLITMAPFRAME,-34.597612,-58.498542,0.0,3209 Av. Francisco Beiro,1111,16,False,"[KEY, TRANSITCARD, PHONE]",[VILLA DEVOTO],,1000.0,True,{},3209 Av. Francisco Beiro
286,466,333 - PARQUE DE LA ESTACIÓN,SMARTLITMAPFRAME,-34.608096,-58.41184,0.0,Dr. Tomás Manuel de Anchorena 170,C1170,20,False,"[KEY, TRANSITCARD, PHONE]",[ALMAGRO],,1000.0,True,{},
137,207,123 - BASUALDO Y RODO,SMARTLITMAPFRAME,-34.652377,-58.487359,0.0,Guardia Nacional 1700,11111,16,False,"[KEY, TRANSITCARD, PHONE]",[MATADEROS],,1000.0,True,{},
159,241,348 - Villa del Parque,SMARTLITMAPFRAME,-34.600874,-58.494123,0.0,"Gutierrez, Ricardo 3105",11111,12,False,"[KEY, TRANSITCARD, PHONE]",[VILLA DEL PARQUE],,1000.0,True,{},
22,35,035 - INGENIERO BUTTY,SMARTLITMAPFRAME,-34.596425,-58.371847,0.0,Ing. E. Butty 291,11111,32,False,"[KEY, TRANSITCARD, PHONE]",[RETIRO],,1000.0,True,{},Ing. E. Butty 291 & Av Leandro N. Alem
55,80,080 - VALLE,SMARTLITMAPFRAME,-34.624581,-58.434123,0.0,Valle 486,C1424,12,False,"[KEY, TRANSITCARD, PHONE]",[CABALLITO],,1000.0,True,{},


#### Current station status information

In [69]:
# Get the number of bikes and docks available at each station, and whether the station is available.

endpoint_ecobiciSS = f"{endpoint_ecobici}/stationStatus"

full_url_ecobiciSS = f"{base_url}/{endpoint_ecobiciSS}"

r_ecobiciSS = requests.get(full_url_ecobiciSS, params=params)

In [70]:
r_ecobiciSS.status_code

200

In [71]:
json_ecobiciSS = r_ecobiciSS.json()
json_ecobiciSS

{'last_updated': 1708975536,
 'ttl': 29,
 'data': {'stations': [{'station_id': '2',
    'num_bikes_available': 8,
    'num_bikes_available_types': {'mechanical': 8, 'ebike': 0},
    'num_bikes_disabled': 0,
    'num_docks_available': 32,
    'num_docks_disabled': 0,
    'last_reported': 1708975418,
    'is_charging_station': False,
    'status': 'IN_SERVICE',
    'is_installed': 1,
    'is_renting': 1,
    'is_returning': 1,
    'traffic': None},
   {'station_id': '3',
    'num_bikes_available': 0,
    'num_bikes_available_types': {'mechanical': 0, 'ebike': 0},
    'num_bikes_disabled': 1,
    'num_docks_available': 27,
    'num_docks_disabled': 0,
    'last_reported': 1708975451,
    'is_charging_station': False,
    'status': 'IN_SERVICE',
    'is_installed': 1,
    'is_renting': 1,
    'is_returning': 1,
    'traffic': None},
   {'station_id': '4',
    'num_bikes_available': 2,
    'num_bikes_available_types': {'mechanical': 2, 'ebike': 0},
    'num_bikes_disabled': 0,
    'num_dock

In [72]:
json_ecobiciSS.keys()

dict_keys(['last_updated', 'ttl', 'data'])

In [73]:
data_ecobiciSS= json_ecobiciSS['data']
df_ecobiciSS = pd.DataFrame(data_ecobiciSS)

df_ecobiciSS

Unnamed: 0,stations
0,"{'station_id': '2', 'num_bikes_available': 8, ..."
1,"{'station_id': '3', 'num_bikes_available': 0, ..."
2,"{'station_id': '4', 'num_bikes_available': 2, ..."
3,"{'station_id': '5', 'num_bikes_available': 6, ..."
4,"{'station_id': '6', 'num_bikes_available': 17,..."
...,...
363,"{'station_id': '534', 'num_bikes_available': 4..."
364,"{'station_id': '535', 'num_bikes_available': 0..."
365,"{'station_id': '536', 'num_bikes_available': 0..."
366,"{'station_id': '537', 'num_bikes_available': 0..."


In [74]:
# Convert the JSON to a DataFrame

data_ecobiciSS= json_ecobiciSS['data']['stations']
df_ecobiciSS = pd.DataFrame(data_ecobiciSS)

df_ecobiciSS.sample(n=10)

Unnamed: 0,station_id,num_bikes_available,num_bikes_available_types,num_bikes_disabled,num_docks_available,num_docks_disabled,last_reported,is_charging_station,status,is_installed,is_renting,is_returning,traffic
77,107,4,"{'mechanical': 4, 'ebike': 0}",0,12,0,1708975000.0,False,IN_SERVICE,1,1,1,
193,280,7,"{'mechanical': 7, 'ebike': 0}",0,21,0,1708975000.0,False,IN_SERVICE,1,1,1,
357,528,5,"{'mechanical': 5, 'ebike': 0}",1,10,0,1708975000.0,False,IN_SERVICE,1,1,1,
151,220,2,"{'mechanical': 2, 'ebike': 0}",1,13,0,1708975000.0,False,IN_SERVICE,1,1,1,
192,278,0,"{'mechanical': 0, 'ebike': 0}",3,13,0,1708975000.0,False,IN_SERVICE,1,1,1,
261,400,1,"{'mechanical': 1, 'ebike': 0}",6,13,0,1708975000.0,False,IN_SERVICE,1,1,1,
352,522,3,"{'mechanical': 3, 'ebike': 0}",0,13,0,1708971000.0,False,IN_SERVICE,1,1,1,
81,116,2,"{'mechanical': 2, 'ebike': 0}",3,7,0,1708976000.0,False,IN_SERVICE,1,1,1,
268,416,1,"{'mechanical': 1, 'ebike': 0}",3,24,0,1708975000.0,False,IN_SERVICE,1,1,1,
337,505,0,"{'mechanical': 0, 'ebike': 0}",1,15,0,1708976000.0,False,IN_SERVICE,1,1,1,


## Connecting to the database

In [75]:
db_keys = read_api_credentials("config/pipeline.conf", "RedShift")

try:
    conn = psycopg2.connect(
        host=db_keys["host"],
        dbname=db_keys["dbname"],
        user=db_keys["user"],
        password=db_keys["pwd"],
        port=db_keys["port"],
    )
    print("Successfully connected to Redshift!")

except Exception as e:
    print("Unable to connect to Redshift")
    print(e)

Successfully connected to Redshift!


### Tables for the bus data

Table for the agencies of interest

In [76]:
with conn.cursor() as cur:
    cur.execute("""
        create table if not exists camilagonzalezalejo02_coderhouse.agencies
        (
            agency_id INTEGER,
            agency_name VARCHAR(100)
        )
        DISTSTYLE ALL
        sortkey(agency_id)
    """)
    conn.commit()

Table for the trips made by those agencies

In [77]:
try:
    with conn.cursor() as cur:
        # Drop the table in the same schema where it is created, and only if it exists
        cur.execute("""
            DROP TABLE IF EXISTS camilagonzalezalejo02_coderhouse.bus_positions;
            create table if not exists camilagonzalezalejo02_coderhouse.bus_positions
            (
            id INTEGER,
            agency_id INTEGER,
            route_id INTEGER,
            latitude NUMERIC,
            longitude NUMERIC,
            speed NUMERIC,
            timestamp timestamp,
            route_short_name VARCHAR(50),
            trip_headsign VARCHAR(100)
            )
        DISTKEY (agency_id)
        sortkey(agency_id)
        """)
        conn.commit()
except psycopg2.Error as e:
    # Roll back so the connection can be reused after a failed statement
    conn.rollback()
    print("Error executing the SQL statement:", e)

### Tables for the ecobici data

Table for the static information of the ecobici stations

In [78]:
try:
    with conn.cursor() as cur:
        # Drop the table in the same schema where it is created, and only if it exists
        cur.execute("""
            DROP TABLE IF EXISTS camilagonzalezalejo02_coderhouse.ecobici_stations;
            create table if not exists camilagonzalezalejo02_coderhouse.ecobici_stations
            (
            station_id INTEGER,
            name VARCHAR(100),
            address VARCHAR(100),
            capacity INTEGER,
            latitude NUMERIC,
            longitude NUMERIC,
            neighborhood VARCHAR(100)
            )
        DISTKEY (station_id)
        sortkey(station_id)
        """)
        conn.commit()
except psycopg2.Error as e:
    # Roll back so the connection can be reused after a failed statement
    conn.rollback()
    print("Error executing the SQL statement:", e)
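The station information returned by the API has more columns than this table, and some names differ. Below is a sketch of how the DataFrame could be reshaped to match the table, using two stations from the output above. Mapping `lat` and `lon` to `latitude` and `longitude`, and taking the first entry of `groups` as `neighborhood`, are assumptions about the intended design.

```python
import pandas as pd

stations = [
    {"station_id": "2", "name": "002 - Retiro I", "lat": -34.59242413, "lon": -58.37470989,
     "address": "AV. Dr. José María Ramos Mejía 1300", "capacity": 40, "groups": ["RETIRO"]},
    {"station_id": "3", "name": "003 - ADUANA", "lat": -34.61220714255728, "lon": -58.36912906378899,
     "address": "Av. Paseo Colón 380", "capacity": 28, "groups": ["MONSERRAT"]},
]
df_raw = pd.DataFrame(stations)

df_stations = pd.DataFrame({
    "station_id": df_raw["station_id"].astype("int64"),
    "name": df_raw["name"],
    "address": df_raw["address"],
    "capacity": df_raw["capacity"],
    "latitude": df_raw["lat"],
    "longitude": df_raw["lon"],
    # Assumption: the first entry of groups is the neighborhood
    "neighborhood": df_raw["groups"].apply(lambda groups: groups[0] if groups else None),
})
print(df_stations)
```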

Table for the current status of the ecobici stations

In [79]:
try:
    with conn.cursor() as cur:
        # Drop the table in the same schema where it is created, and only if it exists
        cur.execute("""
            DROP TABLE IF EXISTS camilagonzalezalejo02_coderhouse.ecobici_stations_status;
            create table if not exists camilagonzalezalejo02_coderhouse.ecobici_stations_status
            (
            station_id INTEGER,
            num_bikes_available_mechanical INTEGER,
            num_bikes_available_ebike INTEGER,
            num_bikes_disabled INTEGER,
            last_reported TIMESTAMP,
            status VARCHAR(50)
            )
        DISTKEY (station_id)
        sortkey(station_id)
        """)
        conn.commit()
except psycopg2.Error as e:
    # Roll back so the connection can be reused after a failed statement
    conn.rollback()
    print("Error executing the SQL statement:", e)
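In the API response, the mechanical and ebike counts are nested inside `num_bikes_available_types`, while this table stores them as separate columns. Below is a sketch of one way to flatten them with `pd.json_normalize`, using two stations from the output above. The renaming and the epoch-seconds conversion of `last_reported` are assumptions about the intended mapping.

```python
import pandas as pd

stations_status = [
    {"station_id": "2", "num_bikes_available_types": {"mechanical": 8, "ebike": 0},
     "num_bikes_disabled": 0, "last_reported": 1708975418, "status": "IN_SERVICE"},
    {"station_id": "3", "num_bikes_available_types": {"mechanical": 0, "ebike": 0},
     "num_bikes_disabled": 1, "last_reported": 1708975451, "status": "IN_SERVICE"},
]

# sep="_" turns the nested keys into num_bikes_available_types_mechanical, etc.
df_flat = pd.json_normalize(stations_status, sep="_")

df_status = df_flat.rename(columns={
    "num_bikes_available_types_mechanical": "num_bikes_available_mechanical",
    "num_bikes_available_types_ebike": "num_bikes_available_ebike",
})
df_status["station_id"] = df_status["station_id"].astype("int64")
df_status["last_reported"] = pd.to_datetime(df_status["last_reported"], unit="s")

# Keep the columns in the same order as the table definition
df_status = df_status[[
    "station_id",
    "num_bikes_available_mechanical",
    "num_bikes_available_ebike",
    "num_bikes_disabled",
    "last_reported",
    "status",
]]
print(df_status)
```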

## Preparing the data for upload to RedShift

The assignment asks for a single DataFrame. Since I had extracted data from two modes of transport, I will use two.

In [80]:
def load_to_sql(df, table_name, engine, if_exists="replace"):
    """
    Load a DataFrame into the given database.

    Parameters:
    df (pandas.DataFrame): The DataFrame to load into the database.
    table_name (str): The name of the table in the database.
    engine (sqlalchemy.engine.base.Engine): A database connection object.
    if_exists (str): "append" or "replace"
    """
    try:
        logging.info("Loading data into the database...")
        df.to_sql(
            table_name,
            engine,
            if_exists=if_exists,
            index=False,
            method="multi"
            )
        logging.info("Data loaded successfully into the database")
    except Exception as e:
        logging.error(f"Error loading data into the database: {e}")
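The difference between the two `if_exists` modes can be tried locally before touching the warehouse. This sketch uses an in-memory SQLite database as a stand-in for Redshift, which is an assumption made only so the example runs without credentials. `sqlite_conn` and `count_rows` are hypothetical names.

```python
import sqlite3
import pandas as pd

sqlite_conn = sqlite3.connect(":memory:")

df = pd.DataFrame({
    "agency_id": [145, 9],
    "agency_name": ["MICRO OMNIBUS PRIMERA JUNTA S.A", "LA NUEVA METROPOL S.A."],
})

def count_rows(connection):
    result = pd.read_sql_query("SELECT COUNT(*) AS n FROM agencies", connection)
    return int(result["n"].iloc[0])

# "append" adds the rows to whatever the table already holds
df.to_sql("agencies", sqlite_conn, if_exists="append", index=False)
df.to_sql("agencies", sqlite_conn, if_exists="append", index=False)
rows_after_append = count_rows(sqlite_conn)

# "replace" drops the table and recreates it from the DataFrame
df.to_sql("agencies", sqlite_conn, if_exists="replace", index=False)
rows_after_replace = count_rows(sqlite_conn)

sqlite_conn.close()
print(rows_after_append, rows_after_replace)
```

Because `replace` recreates the table from the DataFrame, a table created beforehand with explicit column types, `DISTKEY` and `sortkey` would likely lose that definition. `append` may therefore be the safer default for tables defined by hand.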