## What is GTFS?

### GTFS --> https://developers.google.com/transit/gtfs

The General Transit Feed Specification (GTFS), also called static GTFS or static transit to 
distinguish it from the GTFS Realtime extension, defines a common format for public 
transportation schedules and the associated geographic information. GTFS "feeds" let transit 
agencies publish their data and let developers write applications that consume that data 
in an interoperable way.

Each file models a particular aspect of transit information: 
stops, routes, trips, and other schedule data. 
The details of each file are defined in the GTFS reference.


### File reference

- **reference:** https://developers.google.com/transit/gtfs/reference


### GTFS files used for processing

- routes.txt
- trips.txt
- shapes.txt
- stops.txt
- stop_times.txt
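
As a rough illustration of how these files relate, the sketch below links a trip to its shape through `shape_id`, using only the standard library. The rows are tiny in-memory samples with the standard GTFS column names (the notebook later renames the shape columns to `shape_lat`, `shape_lon`, and so on). The trip and shape IDs are the ones examined later in this notebook; the `route_id` values are an assumption (taken to equal the line number), and `shape_points_for_trip` is a hypothetical helper.

```python
import csv
import io

# Illustrative excerpts only; real feeds have more columns and many more rows.
# route_id values are assumed to match the line number (not confirmed here).
TRIPS_TXT = """route_id,trip_id,direction_id,shape_id
2770-10,2770-10-1,1,54265
119L-10,119L-10-1,1,58447
119L-10,119L-10-0,0,58450
"""

SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence,shape_dist_traveled
54265,-23.53128,-46.530338,1,0.0
54265,-23.531269,-46.530429,2,9.372123
54265,-23.531238,-46.530695,3,36.751312
"""


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def shape_points_for_trip(trip_id, trips, shapes):
    """Return the ordered (lat, lon) points of the shape used by trip_id."""
    shape_id = next(t["shape_id"] for t in trips if t["trip_id"] == trip_id)
    rows = [s for s in shapes if s["shape_id"] == shape_id]
    rows.sort(key=lambda s: int(s["shape_pt_sequence"]))
    return [(float(s["shape_pt_lat"]), float(s["shape_pt_lon"])) for s in rows]


trips = read_rows(TRIPS_TXT)
shapes = read_rows(SHAPES_TXT)
points = shape_points_for_trip("2770-10-1", trips, shapes)
print(points)
```

The same lookup pattern applies to `stop_times.txt` and `stops.txt`, which are linked through `stop_id`.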


### AVL files
- MO_XXX --> vehicle position records for day XXX of October
- AL_XXX --> file describing each line's route and direction

In [1]:
# Attention: a line can have more than one bus
# Exploring the bus with id_avl 35148 on line_id 33011
import pandas as pd

# reading bus traces
traces = pd.read_csv("./testing-map-views-sample-data/day_15101_line_33011.csv",header=None,sep=",")

In [2]:
# setting header of the dataframe
traces.columns = ["dt_server","dt_avl","line_id","latitude","longitude","id_avl","event","id_point","hour_server","hour_avl","hour_diff","region"]

In [3]:
# selecting traces from bus 35148
traces_35148 = traces.loc[traces['id_avl'] == 35148]

# selecting traces from bus 35557
traces_35557 = traces.loc[traces['id_avl'] == 35557]

In [4]:
# sorting trace by dt_server
traces_35148_sorted = traces_35148.sort_values(by=['dt_server'])
traces_35148_sorted.head(10)

Unnamed: 0,dt_server,dt_avl,line_id,latitude,longitude,id_avl,event,id_point,hour_server,hour_avl,hour_diff,region
582,2015-10-01 06:00:21.777,2015-10-01 06:00:20.000,33011,-23.520023,-46.524605,35148,64,0,6,6,1.777,PENHA
3309,2015-10-01 06:01:07.190,2015-10-01 06:01:05.000,33011,-23.516272,-46.520957,35148,64,0,6,6,2.19,PENHA
1542,2015-10-01 06:01:52.540,2015-10-01 06:01:50.000,33011,-23.514693,-46.5197,35148,64,0,6,6,2.54,PONTE RASA
1298,2015-10-01 06:02:42.717,2015-10-01 06:02:35.000,33011,-23.513012,-46.516335,35148,64,0,6,6,7.717,PONTE RASA
7899,2015-10-01 06:03:21.970,2015-10-01 06:03:20.000,33011,-23.512138,-46.512293,35148,64,0,6,6,1.97,PONTE RASA
4472,2015-10-01 06:04:07.007,2015-10-01 06:04:05.000,33011,-23.51071,-46.508275,35148,64,0,6,6,2.007,PONTE RASA
6632,2015-10-01 06:04:53.523,2015-10-01 06:04:50.000,33011,-23.510942,-46.502922,35148,64,0,6,6,3.523,PONTE RASA
1230,2015-10-01 06:05:39.147,2015-10-01 06:05:35.000,33011,-23.51061,-46.49898,35148,64,0,6,6,4.147,PONTE RASA
1385,2015-10-01 06:06:22.343,2015-10-01 06:06:20.000,33011,-23.509108,-46.495805,35148,64,0,6,6,2.343,PONTE RASA
4665,2015-10-01 06:07:07.387,2015-10-01 06:07:05.000,33011,-23.507597,-46.490633,35148,64,0,6,6,2.387,PONTE RASA
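
Judging from the rows above, `hour_diff` appears to be the gap in seconds between the server timestamp and the AVL timestamp. The sketch below recomputes it for the first two rows, assuming that interpretation is right; the helper name `lag_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout used by dt_server and dt_avl in the output above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def lag_seconds(dt_server, dt_avl):
    """Seconds elapsed between the AVL timestamp and the server timestamp."""
    delta = datetime.strptime(dt_server, FMT) - datetime.strptime(dt_avl, FMT)
    return delta.total_seconds()


# (dt_server, dt_avl, hour_diff) copied from the first two rows of the output.
rows = [
    ("2015-10-01 06:00:21.777", "2015-10-01 06:00:20.000", 1.777),
    ("2015-10-01 06:01:07.190", "2015-10-01 06:01:05.000", 2.19),
]
for dt_server, dt_avl, hour_diff in rows:
    print(lag_seconds(dt_server, dt_avl), hour_diff)
```

If the two printed values agree for every row, sorting by `dt_avl` instead of `dt_server` would order the points by when the vehicle recorded them rather than when the server received them.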


In [5]:
# sorting trace by dt_server
traces_35557_sorted = traces_35557.sort_values(by=['dt_server'])
traces_35557_sorted.count()

dt_server      289
dt_avl         289
line_id        289
latitude       289
longitude      289
id_avl         289
event          289
id_point       289
hour_server    289
hour_avl       289
hour_diff      289
region         289
dtype: int64
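
Instead of filtering one bus at a time, the number of records per vehicle can be tallied in a single pass. This is a minimal sketch with `collections.Counter` over a few made-up `(id_avl, line_id)` pairs; only the two bus IDs and the line ID come from this notebook.

```python
from collections import Counter

# Made-up sample: (id_avl, line_id) pairs standing in for rows of the MO file.
records = [
    (35148, 33011),
    (35557, 33011),
    (35148, 33011),
    (35557, 33011),
    (35148, 33011),
]

# Count how many position records each vehicle contributed.
points_per_bus = Counter(id_avl for id_avl, _line_id in records)
print(points_per_bus)
```

With the DataFrame used in this notebook, `traces.groupby('id_avl').size()` should give the same kind of tally.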

In [6]:
# selecting coordinates from bus 35148
traces_coordinates = traces_35148_sorted[["latitude","longitude"]].values.tolist()
traces_coordinates

[[-23.520023000000002, -46.524605],
 [-23.516272, -46.520957],
 [-23.514692999999998, -46.5197],
 [-23.513012, -46.516335],
 [-23.512138, -46.512293],
 [-23.51071, -46.508275],
 [-23.510942, -46.502922],
 [-23.51061, -46.498979999999996],
 [-23.509107999999998, -46.495805],
 [-23.507597, -46.490633],
 [-23.507355, -46.487408],
 [-23.507268, -46.482555],
 [-23.507102, -46.4784],
 [-23.506587, -46.477205],
 [-23.504527, -46.473267],
 [-23.501942, -46.47096],
 [-23.501457000000002, -46.470555],
 [-23.49902, -46.467663],
 [-23.49795, -46.46099],
 [-23.498305, -46.457705],
 [-23.497757999999997, -46.454577],
 [-23.497377, -46.454115],
 [-23.494675, -46.451407],
 [-23.493237, -46.449625],
 [-23.492303, -46.447995],
 [-23.491443, -46.44631],
 [-23.492133, -46.445065],
 [-23.49201, -46.444753000000006],
 [-23.493572, -46.443065000000004],
 [-23.493507, -46.4411],
 [-23.493723000000003, -46.440073],
 [-23.494425, -46.438722],
 [-23.495107, -46.436595000000004],
 [-23.495222000000002, -46.433778

In [7]:
# selecting coordinates from bus 35557
traces_coordinates_35557 = traces_35557_sorted[["latitude","longitude"]].values.tolist()
traces_coordinates_35557

[[-23.525862, -46.47418],
 [-23.525827, -46.474455],
 [-23.525878, -46.474453000000004],
 [-23.532138, -46.473625],
 [-23.534342000000002, -46.473745],
 [-23.535242, -46.474077],
 [-23.537625, -46.47637],
 [-23.538933, -46.481446999999996],
 [-23.539448, -46.483543],
 [-23.537155, -46.486833000000004],
 [-23.535888, -46.49091],
 [-23.534985, -46.496083],
 [-23.532088, -46.500535],
 [-23.530903, -46.502377],
 [-23.528185, -46.506558],
 [-23.527638, -46.509122999999995],
 [-23.528403, -46.514178],
 [-23.530095000000003, -46.517808],
 [-23.531417, -46.520732],
 [-23.531587, -46.521487],
 [-23.531605, -46.521553000000004],
 [-23.531643, -46.521555],
 [-23.531651999999998, -46.521547999999996],
 [-23.532275, -46.521445],
 [-23.532488, -46.521873],
 [-23.532427, -46.522088000000004],
 [-23.532475, -46.525845000000004],
 [-23.532401999999998, -46.527372],
 [-23.530955, -46.527602],
 [-23.529957, -46.527495],
 [-23.529867000000003, -46.526545],
 [-23.531207000000002, -46.526063],
 [-23.531488,

In [10]:
# shape header
shape_columns = ["shape_id","shape_lat","shape_lon","shape_sequence","shape_dist_traveled"]

# reading shape 54264
shape_54264 = pd.read_csv("./testing-map-views-sample-data/day_15101_shape_54264.csv",header=None,sep=",")
shape_54264.columns = shape_columns

# reading shape 54265
shape_54265 = pd.read_csv("./testing-map-views-sample-data/day_15101_shape_54265.csv",header=None,sep=",")
shape_54265.columns = shape_columns

In [11]:
shape_54264.head(3)

Unnamed: 0,shape_id,shape_lat,shape_lon,shape_sequence,shape_dist_traveled
0,54264,-23.539646,-46.431387,1,8.881188
1,54264,-23.539573,-46.431351,2,8.881188
2,54264,-23.539467,-46.431299,3,27.883213


In [12]:
shape_54265.head(3)

Unnamed: 0,shape_id,shape_lat,shape_lon,shape_sequence,shape_dist_traveled
0,54265,-23.53128,-46.530338,1,0.0
1,54265,-23.531269,-46.530429,2,9.372123
2,54265,-23.531238,-46.530695,3,36.751312


In [13]:
# selecting shape coordinates
shape_54264_coordinates = shape_54264[["shape_lat","shape_lon"]].values.tolist()
shape_54265_coordinates = shape_54265[["shape_lat","shape_lon"]].values.tolist()
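
The `shape_dist_traveled` values look like cumulative distances in meters along the shape. As a sanity check under that assumption, the sketch below computes the great-circle (haversine) distance between the first two points of shape 54265 shown above. The Earth radius constant and the function name are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# First two points of shape 54265, taken from the head(3) output above.
d = haversine_m(-23.53128, -46.530338, -23.531269, -46.530429)
print(round(d, 2))
```

The result should land within a fraction of a meter of the 9.372123 listed for the second point, which supports reading the column as meters.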

In [14]:
# https://github.com/dushyantkhichi/python
import folium

# map object, centered on São Paulo (where the shapes and traces are located)
map = folium.Map([-23.595354, -46.664280], zoom_start=5)

# map tiles
# tile = folium.TileLayer('Mapbox Bright').add_to(map)
# tile = folium.TileLayer('Mapbox Control Room').add_to(map)
# tile = folium.TileLayer('Stamen Terrain').add_to(map)
# tile = folium.TileLayer('Stamen Toner').add_to(map)
# tile = folium.TileLayer('stamenwatercolor').add_to(map)
tile = folium.TileLayer('cartodbpositron').add_to(map)
# tile = folium.TileLayer('cartodbdark_matter').add_to(map)

for coord in shape_54264_coordinates:
    icon = folium.features.CustomIcon('red-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)

    
for coord in shape_54265_coordinates:
    icon = folium.features.CustomIcon('blue-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)

for coord in traces_coordinates:
    icon = folium.features.CustomIcon('green-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)

for coord in traces_coordinates_35557:
    icon = folium.features.CustomIcon('yellow-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)
    

In [15]:
map.save('testing-line_33011.html')

In [27]:
# Results
# Both buses follow the blue dots (shape 54265), which belongs to trip 2770-10-1 (direction = 1).
# In the MO file, buses 35557 and 35148 belong to line_id 33011 on October 1, 2015.
# In the AL_15101 file, this line_id corresponds to line_number 2770-10, with direction 2.
# Furthermore, direction 2 in the AL file = direction 1 in trips.txt, and direction 1 in AL = direction 0 in trips.txt.
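
The visual comparison above could be backed by a number: the mean distance from each trace point to its nearest shape point, computed for each candidate shape. The sketch below is one rough way to do that, not the method used in this notebook. It relies on an equirectangular distance approximation, every coordinate in it is made up, and the function names are hypothetical.

```python
import math


def approx_dist_m(p, q):
    """Approximate distance in meters between two (lat, lon) points in degrees."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dy = math.radians(q[0] - p[0])
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    return 6_371_000 * math.hypot(dx, dy)


def mean_nearest_dist_m(trace, shape):
    """Average, over trace points, of the distance to the nearest shape point."""
    return sum(min(approx_dist_m(p, s) for s in shape) for p in trace) / len(trace)


def closest_shape(trace, shapes):
    """Return the key of the shape with the smallest mean nearest distance."""
    return min(shapes, key=lambda sid: mean_nearest_dist_m(trace, shapes[sid]))


# Made-up shapes: two parallel east-west lines roughly 2 km apart.
shapes = {
    "A": [(-23.500, -46.500), (-23.500, -46.490), (-23.500, -46.480)],
    "B": [(-23.520, -46.500), (-23.520, -46.490), (-23.520, -46.480)],
}
# Made-up trace that stays close to shape "A".
trace = [(-23.5001, -46.499), (-23.5002, -46.489), (-23.4999, -46.481)]
print(closest_shape(trace, shapes))
```

A score like this cannot separate two directions that share the same streets; for that, the order in which the trace visits the shape points would also have to be checked.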

In [55]:
# Testing more buses and shapes
# trip 119L-10-1, shape = 58447
# trip 119L-10-0, shape = 58450
# trip 8002-10-1, shape = 58290
# trip 8002-10-0, shape = 58289
#
# Analysis for line 2273 on October 2, 2015 (trip 119L-10)

# reading bus traces
traces_line_2273 = pd.read_csv("./testing-map-views-sample-data/day_15102_line_2273.csv",header=None,sep=",")

# setting header of the dataframe
traces_line_2273.columns = ["dt_server","dt_avl","line_id","latitude","longitude","id_avl","event","id_point","hour_server","hour_avl","hour_diff","region"]

# selecting traces from bus 55013
traces_bus_55013 = traces_line_2273.loc[traces_line_2273['id_avl'] == 55013]
traces_bus_55013_coord = traces_bus_55013[["latitude","longitude"]].values.tolist()

# selecting traces from bus 55190
traces_bus_55190 = traces_line_2273.loc[traces_line_2273['id_avl'] == 55190]
traces_bus_55190_coord = traces_bus_55190[["latitude","longitude"]].values.tolist()

# shapes 58447 and 58450

# shape header
shape_columns = ["shape_id","shape_lat","shape_lon","shape_sequence","shape_dist_traveled"]

# reading shape 58447
shape_58447 = pd.read_csv("./testing-map-views-sample-data/day_15102_shape_58447.csv",header=None,sep=",")
shape_58447.columns = shape_columns

# coordinate df to list
shape_58447_coord = shape_58447[["shape_lat","shape_lon"]].values.tolist()


# reading shape 58450
shape_58450 = pd.read_csv("./testing-map-views-sample-data/day_15102_shape_58450.csv",header=None,sep=",")
shape_58450.columns = shape_columns

# coordinate df to list
shape_58450_coord = shape_58450[["shape_lat","shape_lon"]].values.tolist()

# https://github.com/dushyantkhichi/python
import folium

# map object
map = folium.Map([-23.595354, -46.664280], zoom_start=5)

# map tiles
tile = folium.TileLayer('cartodbpositron').add_to(map)

for coord in shape_58447_coord:
    icon = folium.features.CustomIcon('red-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)

    
for coord in shape_58450_coord:
    icon = folium.features.CustomIcon('blue-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)

for coord in traces_bus_55190_coord:
    icon = folium.features.CustomIcon('green-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)

for coord in traces_bus_55013_coord:
    icon = folium.features.CustomIcon('yellow-point.png', icon_size=(5,5))
    folium.Marker(coord,icon=icon).add_to(map)
    
map.save('testing_line_2273_day_15102_.html')

In [56]:
# Results for line 2273 in MO_15102
# The buses follow the blue trace (shape 58450), which belongs to trip 119L-10-0.
# The buses belong to line 2273 on October 2, 2015; the direction in the AL file is 1.
# The direction of the blue shape/trip on the map is 0.
# So direction 1 in the AL file corresponds to direction 0 in trips.txt.
# Note: many of the bus trace points fall outside the shape.
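
The note about points falling outside the shape suggests a simple cleaning rule: drop any trace point that lies farther than some threshold from every shape point. The sketch below illustrates the idea with made-up coordinates. The 50 m threshold and the function names are assumptions, and a real map-matching step would likely measure distance to shape segments rather than to shape vertices.

```python
import math


def approx_dist_m(p, q):
    """Approximate distance in meters between two (lat, lon) points in degrees."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dy = math.radians(q[0] - p[0])
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    return 6_371_000 * math.hypot(dx, dy)


def split_by_distance(trace, shape, max_dist_m=50.0):
    """Split trace points into those near the shape and those far from it."""
    on_shape, off_shape = [], []
    for p in trace:
        nearest = min(approx_dist_m(p, s) for s in shape)
        (on_shape if nearest <= max_dist_m else off_shape).append(p)
    return on_shape, off_shape


# Made-up shape along a short east-west segment.
shape = [(-23.5000, -46.5000), (-23.5000, -46.4995), (-23.5000, -46.4990)]
# Made-up trace: the middle point sits several hundred meters off the shape.
trace = [(-23.5001, -46.4999), (-23.5050, -46.4995), (-23.5000, -46.4991)]
on_shape, off_shape = split_by_distance(trace, shape)
print(len(on_shape), len(off_shape))
```

Keeping the rejected points in a separate list, rather than discarding them, makes it easy to plot them and check whether the threshold is too strict.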

### Next steps

- process the AL files, mapping direction 1 to 0 and 2 to 1
- cleaning / map-matching algorithm
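
The direction remapping listed in the next steps could be sketched as a small lookup table. The mapping itself (AL 1 to GTFS 0, AL 2 to GTFS 1) comes from the results above. The function names, the choice to raise on unknown values, and the trip ID pattern `<line_number>-<direction>` (inferred from the trip IDs quoted in this notebook) are assumptions.

```python
# Mapping observed in this notebook: AL direction -> direction in trips.txt.
AL_TO_GTFS_DIRECTION = {1: 0, 2: 1}


def to_gtfs_direction(al_direction):
    """Translate an AL-file direction (1 or 2) to a GTFS direction (0 or 1)."""
    try:
        return AL_TO_GTFS_DIRECTION[int(al_direction)]
    except KeyError:
        raise ValueError(f"unexpected AL direction: {al_direction!r}") from None


def gtfs_trip_id(line_number, al_direction):
    """Build a trip ID, assuming the pattern '<line_number>-<direction>'."""
    return f"{line_number}-{to_gtfs_direction(al_direction)}"


print(gtfs_trip_id("2770-10", 2))  # 2770-10-1
print(gtfs_trip_id("119L-10", 1))  # 119L-10-0
```

Raising on unexpected values, instead of passing them through, keeps malformed AL rows from silently producing trip IDs that do not exist in `trips.txt`.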