UTM shuttle times scraper #32
Comments
Proposed schema (per shuttle stop id): {
"id": String,
"name": String,
"dates": [{
"start": String,
"end": String
}]
} |
I don't really understand what the |
Whoops, start and end don't apply here, do they? ;P There is another aspect to this that I hadn't looked at or seen before. Based on other transit APIs I've looked at, they organize data with So the following is a schema with the top level being a route: {
"name": String,
"stops": [{
"location": String,
"building_id": String,
"times": [String]
}]
} An example with fictional data for St. George route: {
"name": "St. George Route",
"stops": [
{
"location": "Instructional Centre Layby",
"building_id": "334",
"times": [
"2016-04-13T05:55:00-04:00",
"2016-04-13T07:55:00-04:00",
"2016-04-14T05:55:00-04:00",
"2016-04-14T07:55:00-04:00"
]
},
{
"location": "Hart House",
"building_id": "002",
"times": [
"2016-04-13T08:55:00-04:00",
"2016-04-13T10:55:00-04:00",
"2016-04-14T08:55:00-04:00",
"2016-04-14T10:55:00-04:00"
]
}
]
} |
Note: the dates are formatted in the ISO 8601 standard, offset for the Eastern timezone. It balances human readability with a compact form, and of course remains machine readable. I think this is the standard the whole project should adopt, but if you have an argument for something better then we can discuss that. |
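For illustration, here is a minimal Python sketch of producing timestamps in that format. The function names (`to_iso`, `parse_iso`), the `America/Toronto` zone name, and the assumed input shape (a `YYYY-MM-DD` date plus an `HH:MM` time) are assumptions made for this example, not something the thread specifies.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; relies on the system's time zone data


def parse_iso(stamp):
    """Parse an ISO 8601 string with a UTC offset into an aware datetime."""
    return datetime.fromisoformat(stamp)


def to_iso(date_str, time_str, tz=None):
    """Combine a date and a local time into an ISO 8601 string with offset.

    tz defaults to America/Toronto (an assumption for Eastern time), so the
    offset follows daylight saving: -04:00 in summer, -05:00 in winter.
    """
    if tz is None:
        tz = ZoneInfo("America/Toronto")
    naive = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=tz).isoformat()
```

Deriving the offset from a named zone, instead of hard-coding -04:00, should keep winter schedules correct without extra logic.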
Would we have a gigantic list of all times for the month per stop, or would we try to split it up so it's 1 file per day? |
Once per day seems appropriate since there would be a /lot/ of times otherwise. I wish the shuttle times were a little more predictable, but on random days it likes to change slightly. :/ If we do days, then the top level would be days: {
"date": "2016-04-13",
"routes": [
...
]
} |
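A rough sketch of how route-level data could be regrouped into one document per day, as suggested above. The helper name `split_by_day` is hypothetical, and the sketch assumes every time is an ISO 8601 string whose first ten characters are the local date.

```python
from collections import defaultdict


def split_by_day(routes):
    """Regroup route-level data into a list of per-day documents."""
    days = defaultdict(dict)  # date -> route name -> route document
    for route in routes:
        for stop in route["stops"]:
            # Bucket this stop's times by their date prefix (YYYY-MM-DD).
            by_date = defaultdict(list)
            for stamp in stop["times"]:
                by_date[stamp[:10]].append(stamp)
            for day, times in by_date.items():
                day_route = days[day].setdefault(
                    route["name"], {"name": route["name"], "stops": []}
                )
                day_route["stops"].append({
                    "location": stop["location"],
                    "building_id": stop["building_id"],
                    "times": times,
                })
    return [
        {"date": day, "routes": list(days[day].values())}
        for day in sorted(days)
    ]
```

Each returned item has the `date` / `routes` shape shown above, so it could be written out as one file per day.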
Yeah it's usually schedules that are consistent for Monday - Thursday, then a few are missing for Friday, and Saturday/Sunday have way less. Then there's the special schedules for exam periods, reading weeks, etc. |
So it seems like the route ids aren't the same across the days, so we'll need to use the names as the identifiers. Unless you have a better idea, @qasim ? (I'll probably take a shot at implementing this scraper.) |
@arkon that works. The convention so far has been id being all caps alphanumerical. So you could remove the spaces/special characters,
|
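One possible way to derive such an id from a route name is sketched below. The function name `route_id` is hypothetical, and dropping a trailing "Route" is an assumption inferred from ids like STGEORGE used in this thread, not a stated rule.

```python
import re


def route_id(name):
    """Turn a route name into an all-caps alphanumeric id."""
    # Assumption: a trailing "Route" is not part of the id.
    base = re.sub(r"\s+route$", "", name.strip(), flags=re.IGNORECASE)
    # Remove spaces and special characters, then upper-case the rest.
    return re.sub(r"[^A-Za-z0-9]", "", base).upper()
```

With this sketch, "St. George Route" becomes "STGEORGE" and "Sheridan Route" becomes "SHERIDAN".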
@qasim Yeah that would probably work. It should be something like: {
"date": "2016-04-13",
"routes": [
{
"id": "STGEORGE",
"name": "St. George Route",
"stops": [
{
"location": "Instructional Centre Layby",
"building_id": "334",
"times": [
"2016-04-13T05:55:00-04:00",
"2016-04-13T07:55:00-04:00"
]
},
{
"location": "Hart House",
"building_id": "002",
"times": [
"2016-04-13T08:55:00-04:00",
"2016-04-13T10:55:00-04:00"
]
}
]
},
{
"id": "SHERIDAN",
"name": "Sheridan Route",
"stops": [
{
"location": "Deerfield Hall North Layby",
"building_id": "340",
"times": [
"2016-04-13T05:55:00-04:00",
"2016-04-13T07:55:00-04:00"
]
},
{
"location": "Sheridan",
"building_id": "",
"times": [
"2016-04-13T08:55:00-04:00",
"2016-04-13T10:55:00-04:00"
]
}
]
}
]
} Note that there's no |
Looks good. Eventually I want the project to start referencing other scraper's IDs as much as possible, there are a few cases where we don't right now. There's no infrastructure for that yet, though (matching building names to IDs in other scrapers). I guess for this one you'll have a manual mapping somewhere of the known stops to building IDs? |
Yeah, I guess the manual mapping would work. How are you doing it elsewhere right now? |
If it's a map.utoronto.ca layer, chances are there is a |
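The manual mapping discussed above could be as simple as a lookup table. This is only a sketch: the names `STOP_BUILDING_IDS` and `building_id_for` are hypothetical, and the entries reuse the example values from this thread, so they are placeholders and not verified building IDs.

```python
# Placeholder entries copied from the examples in this thread.
STOP_BUILDING_IDS = {
    "Instructional Centre Layby": "334",
    "Hart House": "002",
    "Deerfield Hall North Layby": "340",
}


def building_id_for(location):
    """Return the building id for a stop name, or "" when none is known."""
    return STOP_BUILDING_IDS.get(location.strip(), "")
```

Returning an empty string for unknown stops matches the empty `building_id` used for the Sheridan stop in the example above.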
This should be good to close after #41 (diff) is fixed. |
https://m.utm.utoronto.ca/shuttleByDate.php?year=2016&month=04&day=10
UTM has a mobile website for their UTM <-> UTSG shuttle. This would fall under transit / transportation. This scraper should scrape the current month's shuttle times (First day of the current month all the way to the last day). The URL makes this an easy ~30 page request scrape.
As for UTSG and UTSC and other UTM transportation, transit is solely TTC and GO. They already have their own open data APIs so we will leave it at that!
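As a sketch of the "~30 page requests" idea, the per-day URLs for a month could be generated like this. The function name `month_urls` is hypothetical, and the zero-padded `month` and `day` parameters are inferred from the single example URL above; fetching and parsing the pages is left out.

```python
import calendar
from datetime import date

BASE_URL = "https://m.utm.utoronto.ca/shuttleByDate.php"


def month_urls(today=None):
    """Build one URL per day, from the first to the last day of the month."""
    today = today or date.today()
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, last_day = calendar.monthrange(today.year, today.month)
    return [
        f"{BASE_URL}?year={today.year}&month={today.month:02d}&day={day:02d}"
        for day in range(1, last_day + 1)
    ]
```

For example, `month_urls(date(2016, 4, 10))` yields 30 URLs, one for each day of April 2016.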