## Bus Stops Dataset
This notebook prepares a dataset of bus stops in the City of Buenos Aires (CABA) for use in the final visualization. For each bus stop, it contains the coordinates, address, commune, neighborhood, and other attributes.

### Source
The source is the [Paradas de Colectivo](https://data.buenosaires.gob.ar/dataset/colectivos-paradas/resource/d0e599d2-3e78-4fb2-9255-30a2be0525f8) dataset published by the Government of the City of Buenos Aires.

### Details
pandas is the main tool used here. The normalization consists mainly of translating column names from Spanish to English and dropping the columns that are unrelated to our use case.

Each normalization step assigns its result back to the same variable, `bus_stops_df`, so the DataFrame loaded from the CSV is replaced as the notebook progresses. This keeps a single working DataFrame instead of a chain of intermediate ones. The trade-off is that reverting to the original, unnormalized data requires either keeping a separate copy or reloading the source file. Passing *inplace=True* to the pandas methods would be another way to write the same steps, but it generally should not be counted on to save memory, since pandas often builds a new object internally anyway.
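To make the difference between reassigning a result and mutating a DataFrame concrete, here is a minimal sketch on a toy DataFrame. The two-row sample and the variable names are hypothetical and are not part of the notebook's pipeline.

```python
import pandas as pd

# Hypothetical two-row sample, not the real dataset.
original = pd.DataFrame({"CALLE": ["DEFENSA", "BARTOLOME MITRE"], "COMUNA": [1, 1]})
mapping = {"CALLE": "street", "COMUNA": "commune"}

# Reassignment style: rename() returns a new DataFrame and leaves `original` untouched.
renamed = original.rename(columns=mapping)

# In-place style: the same call with inplace=True returns None and mutates its target.
mutated = original.copy()
result = mutated.rename(columns=mapping, inplace=True)

print(list(original.columns))  # still the Spanish names
print(list(renamed.columns))   # translated names
print(list(mutated.columns))   # translated names
print(result)                  # None
```

In both styles the untranslated data survives only because `original` is kept under its own name; once a variable is overwritten or mutated, the source file has to be read again.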

In [1]:
import pandas as pd

In [2]:
bus_stops_df = pd.read_csv('source_datasets/paradas-de-colectivo.csv')

Let's take a first look at the dataset.

In [3]:
bus_stops_df.head()

Unnamed: 0,CALLE,ALT PLANO,DIRECCION,coord_X,coord_Y,COMUNA,BARRIO,L1,l1_sen,L2,l2_sen,L3,l3_sen,L4,l4_sen,L5,l5_sen,L6,l6_sen
0,DEFENSA,1524,1524 DEFENSA,-58.370995,-34.625659,1,SAN TELMO,22.0,V,53.0,I,,,,,,,,
1,BARTOLOME MITRE,906,"906 MITRE, BARTOLOME",-58.379659,-34.607216,1,SAN NICOLAS,105.0,V,,,,,,,,,,
2,REGIMIENTO DE PATRICIOS AV.,51,51 REGIMIENTO DE PATRICIOS AV.,-58.370664,-34.630226,4,BARRACAS,93.0,I,70.0,V,74.0,I,,,,,,
3,REGIMIENTO DE PATRICIOS AV.,389,389 REGIMIENTO DE PATRICIOS AV.,-58.37036,-34.63341,4,BARRACAS,10.0,I,22.0,I,,,,,,,,
4,REGIMIENTO DE PATRICIOS AV.,435,435 REGIMIENTO DE PATRICIOS AV.,-58.370277,-34.634012,4,BARRACAS,24.0,i,46.0,V,,,,,,,,
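Beyond `head()`, a first look can also include column types and missing values. The sketch below is only an illustration: it rebuilds two rows of the preview above, reduced to a few columns, from an in-memory CSV (the variable names are assumptions). It also suggests a likely reason the bus line numbers are displayed as `22.0` rather than `22`: a numeric column with empty fields is read as floating point.

```python
import io

import pandas as pd

# Two rows copied from the preview above, reduced to a few columns.
sample_csv = io.StringIO(
    "CALLE,COMUNA,L1,l1_sen,L2,l2_sen\n"
    "DEFENSA,1,22.0,V,53.0,I\n"
    "BARTOLOME MITRE,1,105.0,V,,\n"
)
sample_df = pd.read_csv(sample_csv)

# Empty CSV fields are read as NaN; count them per column.
missing_per_column = sample_df.isna().sum()

print(sample_df.dtypes)
print(missing_per_column)
```

On the real DataFrame, the same two calls would be `bus_stops_df.dtypes` and `bus_stops_df.isna().sum()`.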


### Translation of attribute names

Let's now translate only the attributes that are useful for our visualization. The bus line columns keep their original names because they are dropped in the next step.

In [4]:
translations = {
    'columns': {
        'CALLE': 'street',
        'ALT PLANO': 'street number',
        'DIRECCION': 'complete address',
        'coord_X': 'long',
        'coord_Y': 'lat',
        'COMUNA': 'commune',
        'BARRIO': 'neighborhood'
    }
}
bus_stops_df = bus_stops_df.rename(columns=translations['columns'])

In [5]:
bus_stops_df.head()

Unnamed: 0,street,street number,complete address,long,lat,commune,neighborhood,L1,l1_sen,L2,l2_sen,L3,l3_sen,L4,l4_sen,L5,l5_sen,L6,l6_sen
0,DEFENSA,1524,1524 DEFENSA,-58.370995,-34.625659,1,SAN TELMO,22.0,V,53.0,I,,,,,,,,
1,BARTOLOME MITRE,906,"906 MITRE, BARTOLOME",-58.379659,-34.607216,1,SAN NICOLAS,105.0,V,,,,,,,,,,
2,REGIMIENTO DE PATRICIOS AV.,51,51 REGIMIENTO DE PATRICIOS AV.,-58.370664,-34.630226,4,BARRACAS,93.0,I,70.0,V,74.0,I,,,,,,
3,REGIMIENTO DE PATRICIOS AV.,389,389 REGIMIENTO DE PATRICIOS AV.,-58.37036,-34.63341,4,BARRACAS,10.0,I,22.0,I,,,,,,,,
4,REGIMIENTO DE PATRICIOS AV.,435,435 REGIMIENTO DE PATRICIOS AV.,-58.370277,-34.634012,4,BARRACAS,24.0,i,46.0,V,,,,,,,,
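One thing to keep in mind with `rename` is that, by default, keys in the mapping that do not match any column are ignored without warning, so a typo in a Spanish column name would go unnoticed. The sketch below shows this on a hypothetical one-row sample, along with the `errors="raise"` option that turns the mismatch into an exception. The misspelled key is deliberate.

```python
import pandas as pd

# Hypothetical one-row sample with two of the original column names.
sample_df = pd.DataFrame({"CALLE": ["DEFENSA"], "COMUNA": [1]})

# "BARIO" is a deliberate misspelling to show the failure mode.
mapping = {"CALLE": "street", "BARIO": "neighborhood"}

# Default behaviour: the unknown key is skipped silently.
silently_renamed = sample_df.rename(columns=mapping)
print(list(silently_renamed.columns))

# With errors="raise", pandas raises KeyError for the unknown key.
try:
    sample_df.rename(columns=mapping, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("rename failed:", exc)
```

Whether the stricter mode is worth using here is a judgment call; it mainly helps if the source file's column names change in a later release of the dataset.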


### Getting rid of unrelated attributes
For this visualization, only the commune of each bus stop is strictly needed. Further exploration related to our goal might still require other attributes, such as longitude and latitude or the neighborhood, so the address and location columns are kept. The remaining columns, which describe the bus lines going through each stop (`L1` to `L6` and `l1_sen` to `l6_sen`), are dropped.

In [6]:
bus_stops_df = bus_stops_df[['street', 'street number', 'complete address', 'long', 'lat', 'commune', 'neighborhood']]

In [7]:
bus_stops_df.head()

Unnamed: 0,street,street number,complete address,long,lat,commune,neighborhood
0,DEFENSA,1524,1524 DEFENSA,-58.370995,-34.625659,1,SAN TELMO
1,BARTOLOME MITRE,906,"906 MITRE, BARTOLOME",-58.379659,-34.607216,1,SAN NICOLAS
2,REGIMIENTO DE PATRICIOS AV.,51,51 REGIMIENTO DE PATRICIOS AV.,-58.370664,-34.630226,4,BARRACAS
3,REGIMIENTO DE PATRICIOS AV.,389,389 REGIMIENTO DE PATRICIOS AV.,-58.37036,-34.63341,4,BARRACAS
4,REGIMIENTO DE PATRICIOS AV.,435,435 REGIMIENTO DE PATRICIOS AV.,-58.370277,-34.634012,4,BARRACAS
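The cell above keeps columns by listing them explicitly. An alternative, sketched below on a hypothetical one-row sample, is to drop the bus line columns by name pattern instead. The regular expression is an assumption based on the column names visible in the preview (`L1`, `l1_sen`, and so on) and would need adjusting if the real file uses other names.

```python
import pandas as pd

# Hypothetical one-row sample mimicking part of the column layout shown earlier.
sample_df = pd.DataFrame(
    {
        "street": ["DEFENSA"],
        "commune": [1],
        "L1": [22.0],
        "l1_sen": ["V"],
        "L2": [53.0],
        "l2_sen": ["I"],
    }
)

# Select columns named L<number> or l<number>_sen (assumed naming pattern).
route_columns = sample_df.filter(regex=r"^(L\d+|l\d+_sen)$").columns

# Drop them and keep everything else.
trimmed_df = sample_df.drop(columns=route_columns)
print(list(trimmed_df.columns))
```

Listing the columns to keep, as the notebook does, is more explicit about the final schema; dropping by pattern is shorter when many similarly named columns have to go.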


### Exporting the dataset

In [8]:
bus_stops_df.to_csv("processed_data/bus_stops.csv", encoding='utf-8', index=False)
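A quick way to check an export like this is to read the file back and compare it with what was written. The sketch below does so with a hypothetical two-row sample and a temporary directory, so it does not touch `processed_data/`. The sample includes an address containing a comma to show that `to_csv` quotes such values and that they survive the round trip.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample with some of the columns kept above.
sample_df = pd.DataFrame(
    {
        "street": ["DEFENSA", "BARTOLOME MITRE"],
        "complete address": ["1524 DEFENSA", "906 MITRE, BARTOLOME"],
        "commune": [1, 1],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "bus_stops.csv")
    # index=False keeps the row index out of the file.
    sample_df.to_csv(out_path, encoding="utf-8", index=False)
    reloaded_df = pd.read_csv(out_path)

print(list(reloaded_df.columns))
print(reloaded_df.shape)
print(reloaded_df["complete address"][1])
```

Note that `to_csv` does not create missing directories, so the `processed_data` folder has to exist before the export cell runs.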