
Public use of API #8

Open
konhi opened this issue Apr 14, 2022 · 5 comments

@konhi

konhi commented Apr 14, 2022

Hi!

When I was doing research on Wrocław's API, I stumbled upon this repository, and it's incredible! It is a good example of an enterprise-grade API that runs in production and uses modern infrastructure. I especially appreciate the clear documentation of every detail (not omitting even the costs!), which makes understanding how it works frictionless! Furthermore, the project isn't limited to a server; it also powers a beautiful app! I wish I would have iOS to check it out. 😄 I hope this repository gets more stars, so it's easier to discover. Besides that, it seems to be a perfect project for a portfolio! 👏

I work on a related project, poland-public-transport-api, and I was wondering if your API (/api/v1/lines, /api/v1/stops, /api/v1/vehicles) could be integrated into my middleware. It would leverage Cloudflare's cache, so usage would be kept to a minimum. Hopefully, your project could gain popularity and use among programmers! Of course, if it's something you wouldn't like, let me know, and I will just leave a link to this repo in my project!

@LiarPrincess
Owner

Ooo… thanks!


Ultra important!

If you want to use government data then you HAVE TO ask them for permission! Apply using one of the following legal acts as your base:


Anyway…

I wish I would have iOS to check it out.

If you have a Mac, then you can download, compile, and run the app in the simulator. If not, then there is a video that I send to Apple with each update (this is an old version; now we also have a bell for notifications from @AlertMPK).

As for using the Wroclive API:

  • /api/v1/lines and /api/v1/stops are taken from the GTFS file published on wroclaw.pl/open-data. I think the easiest way of dealing with GTFS is to put the data into SQLite and run queries on it (SQLite has a built-in import of .csv files).

    You can use Wroclive api for this, but this is rather trivial to implement (you already did this for Zielona Góra).

  • /api/v1/notifications - taken from @AlertMPK. You can use Wroclive api for this, but (again) this is trivial to implement.

  • /api/v1/vehicles - this is taken from wroclaw.pl/open-data or mpk.wroc.pl/mapa-pozycji-pojazdow.

    This is extremely tricky to implement, for many reasons:

    • you have to calculate headings yourself
    • both data sources include vehicles in depots/outside of their schedule etc. - you have to remove them by yourself
    • both data sources fail very often - wroclaw.pl/open-data was not working for the last 3 weeks, because reasons…
    • etc.

    As for using the Wroclive API: I would like to avoid additional unpredictable traffic hitting our servers. This leaves us with:

    • use our code to set up your own server
    • poll Wroclive every few seconds -> store the result -> distribute it to your customers. This is “ok” for us, because it is “predictable” traffic. Unfortunately, this also adds delay: we get data from MPK (it is already out of date; the Wroclive refresh interval is 5 s) -> you get data from us (even more out of date).
    • use a cache like Cloudflare - we are dealing with extremely time-sensitive data, so you can cache it, but only for a few seconds. It also suffers from delay, just like the “poll” option.

    Obviously, I do not guarantee that Wroclive will return “correct” data or even be working; if MPK fails, then we will also fail.
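Neither feed provides a heading, so it has to be derived from consecutive positions. One common way to do that (an assumption here; the thread does not say how Wroclive does it) is the initial great-circle bearing from a vehicle's previous position to its current one. The function name `heading_degrees` is hypothetical.

```python
import math
from typing import Optional


def heading_degrees(lat1: float, lon1: float, lat2: float, lon2: float) -> Optional[float]:
    """Initial great-circle bearing from point 1 to point 2.

    Returned in degrees clockwise from north, in the range [0, 360).
    Returns None when both points are identical: a vehicle that has not
    moved gives no information about its direction.
    """
    if (lat1, lon1) == (lat2, lon2):
        return None
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    x = math.sin(d_lambda) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda)
    # atan2 yields (-180, 180]; shift into [0, 360).
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
```

GPS positions jitter while a vehicle stands at a stop, so in practice it may be worth recomputing the heading only after the vehicle has moved some minimum distance and keeping the previous value otherwise (a suggestion, not something described in the thread).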

Some tips:

  • think carefully about what you are going to do - one may think that this is trivial and they can write it in one weekend. This is extremely wrong. There is a big difference between production-ready and “it works”. You will spend hundreds of hours on this. If you do not have enough free time then…

  • include timestamp with every response - this will allow you to check if the server hangs (obsolete timestamp) or the data is stale (fresh timestamp, old data).

  • separate GTFS ingestion from server functionality - we have a separate machine that puts things in the database, so that the frontend servers (AppEngine) just get the data from the database.

  • for sending times (for example, stop departures) I would use one of:

    • military time - 18:05 becomes 1805. Easy to parse (int + divmod by 100) and human readable.
    • minutes since midnight - 18:05 becomes 18*60 + 5 = 1085. Easy to parse, but not human readable.

    Sending time as "18:05" makes parsing difficult.

  • using AWS/GCP/Azure gives you a lot of free things:

    • logging - every request
    • monitoring - uptime, expired SSL
    • error reporting - log error -> send mail
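The two time encodings from the tips above can be sketched as plain integer conversions. Function names are hypothetical. GTFS `stop_times.txt` stores times as `HH:MM:SS` and allows hours of 24 or more for trips that run past midnight; both encodings handle that without special cases.

```python
from typing import Tuple


def parse_gtfs_time(text: str) -> Tuple[int, int]:
    """'18:05:00' -> (18, 5). Hours may be >= 24 for after-midnight trips."""
    hours, minutes, _seconds = (int(part) for part in text.split(":"))
    return hours, minutes


def to_military(hours: int, minutes: int) -> int:
    """18:05 -> 1805: still readable by a human."""
    return hours * 100 + minutes


def from_military(value: int) -> Tuple[int, int]:
    """1805 -> (18, 5): int + divmod by 100."""
    return divmod(value, 100)


def to_minutes_since_midnight(hours: int, minutes: int) -> int:
    """18:05 -> 18 * 60 + 5 = 1085: compact, but not human readable."""
    return hours * 60 + minutes


def from_minutes_since_midnight(value: int) -> Tuple[int, int]:
    """1085 -> (18, 5)."""
    return divmod(value, 60)
```

One practical difference: minutes since midnight can be subtracted directly to get "departs in N minutes" (1085 - 1075 = 10), whereas military time cannot (1805 - 1755 = 50, although the real gap is 10 minutes).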

@LiarPrincess LiarPrincess self-assigned this Apr 14, 2022
@konhi
Author

konhi commented Apr 14, 2022

If you want to use government data then you HAVE TO ask them for permission! Apply using one of the following legal acts as your base

Big thanks for letting me know. I will definitely look into it; I really hope it won't complicate things - I hadn't worried about legal things before! 😄 Thank you for possibly saving me from being sued lol

If not, then there is a video that I send to Apple with each update

Wow, thank you so much for sharing! I love the select design - I wanted to implement something similar in another project I'll be doing, and I will definitely take inspiration from it. The app looks lovely in general, great artistic job! ✨

You can use Wroclive api for this, but this is rather trivial to implement (you already did this for Zielona Góra).

Unzipping and similar work is not a task suited for Cloud Functions (which my API runs on), but I think GitHub Actions will do the work! Thank you for explaining how this process works.
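An ingestion job of the kind that could run on a schedule (for example in GitHub Actions) might look like the sketch below: it unpacks a GTFS zip in memory and loads every `.txt` file into a same-named SQLite table, following the owner's "put the data in SQLite" suggestion. This is a minimal sketch, assuming every column can be stored as TEXT; the function name is hypothetical, and the download step is left out because the exact feed URL is not given in the thread.

```python
import csv
import io
import sqlite3
import zipfile
from typing import List


def _quote(identifier: str) -> str:
    """Quote an SQLite identifier (wrap in double quotes, double any inner quotes)."""
    return '"' + identifier.replace('"', '""') + '"'


def load_gtfs_zip(zip_bytes: bytes, conn: sqlite3.Connection) -> List[str]:
    """Load every .txt file in a GTFS zip into a same-named table (all columns TEXT)."""
    loaded = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            table = name.rsplit("/", 1)[-1][:-4]  # "stops.txt" -> "stops"
            # utf-8-sig strips a byte-order mark if the feed has one.
            with archive.open(name) as raw, io.TextIOWrapper(raw, encoding="utf-8-sig", newline="") as text:
                reader = csv.reader(text)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to load
                columns = ", ".join(f"{_quote(column)} TEXT" for column in header)
                placeholders = ", ".join("?" for _ in header)
                conn.execute(f"DROP TABLE IF EXISTS {_quote(table)}")
                conn.execute(f"CREATE TABLE {_quote(table)} ({columns})")
                # Skip blank lines, which csv.reader yields as empty lists.
                conn.executemany(
                    f"INSERT INTO {_quote(table)} VALUES ({placeholders})",
                    (row for row in reader if row),
                )
            loaded.append(table)
    conn.commit()
    return loaded
```

The resulting database file can then be published as an artifact or committed, so the API itself only reads from it. If a shell step is preferred, the sqlite3 command-line shell can do a similar job with `.mode csv` followed by `.import`.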

As for using the Wroclive API: I would like to avoid additional unpredictable traffic hitting our servers. This leaves us with:

Since I don't have a personal interest in an API for Wrocław yet, and I don't want to stress your servers, I will leave a link to your repository for now! It's an incredibly big help that you managed to write up all these details. It should be easy to incorporate it into the API eventually! Thank you a thousand times 🎉
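For anyone who does take the “poll” or short-cache route the owner described, the point is that many client requests within a few seconds are answered by a single upstream fetch, which keeps the traffic to Wroclive predictable. A minimal in-process sketch, assuming a single-threaded server; the class name is hypothetical, and the 5-second default merely mirrors the Wroclive refresh interval mentioned above.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ShortTTLCache(Generic[T]):
    """Serve one upstream result to every caller for `ttl_s` seconds.

    Not thread-safe: concurrent callers could trigger duplicate fetches.
    """

    def __init__(self, fetch: Callable[[], T], ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl_s:
            # Expired (or never fetched): hit upstream once and remember it.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Every second of TTL adds to the lag the owner warned about, so the TTL should stay in the range of a few seconds.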

using AWS/GCP/Azure gives you a lot of free things:

This is how I ended up trying to use my free $100 on Azure for this project before even going to production, hehe

[image]

Still, I'm amazed at how smartly cloud services were used in this project!

think carefully about what you are going to do - one may think that this is trivial and they can write it in one weekend. This is extremely wrong. There is a big difference between production-ready and “it works”. You will spend hundreds of hours on this. If you do not have enough free time then…

That seems to be a very universal tip; thank you for the advice!

include timestamp with every response - this will allow you to check if the server hangs (obsolete timestamp) or the data is stale (fresh timestamp, old data).

Makes sense, it seems to be a really useful feature.
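One way a client could use response timestamps to tell "the server hangs" apart from "the data is stale" is sketched below. It assumes the response carries two timestamps: one for when the server produced the response and one for when the upstream data was fetched. The field names, the two-timestamp layout, and the thresholds are all hypothetical choices, not Wroclive's actual format.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VehiclesResponse:
    generated_at: float    # hypothetical: when the server built this response (Unix s)
    data_timestamp: float  # hypothetical: when the upstream data was fetched (Unix s)
    vehicles: List[dict] = field(default_factory=list)


def classify(response: VehiclesResponse, now: Optional[float] = None,
             max_response_age_s: float = 30.0, max_data_age_s: float = 60.0) -> str:
    """Return 'server-hang', 'stale-data' or 'ok' (thresholds are illustrative)."""
    now = time.time() if now is None else now
    if now - response.generated_at > max_response_age_s:
        return "server-hang"  # obsolete timestamp: the server itself is stuck
    if now - response.data_timestamp > max_data_age_s:
        return "stale-data"   # fresh timestamp, old data: upstream stopped updating
    return "ok"
```

With a single timestamp only the first check is possible; staleness of the data would then have to be inferred some other way, for example from positions that stop changing.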

Sending time as "18:05" makes parsing difficult.

Noted, very useful tip!


Technical questions:

both data sources include vehicles in depots/outside of their schedule etc. - you have to remove them by yourself

  • what's the reason for doing that? I tried to fetch data for every possible bus and tram, and this is what I got. Is there anything off?

@LiarPrincess
Owner

LiarPrincess commented Apr 15, 2022

You can use Wroclive api for this, but this is rather trivial to implement (you already did this for Zielona Góra).

Unzipping and similar work is not a task suited for Cloud Functions (which my API runs on), but I think GitHub Actions will do the work! Thank you for explaining how this process works.

It depends on what “Cloud Functions” means:

  • Lambda on AWS/Cloud Functions on GCP/Functions on Azure - then you can just query the database.
  • Cloudflare Cloud Functions - then you are right, you can't. There is a minor hack, though: you can use Cloud Functions to fetch a resource from another domain, and as that “other domain” you can use GitHub + Actions. As far as I know, this is how szczepienia.github.io worked.

both data sources include vehicles in depots/outside of their schedule etc. - you have to remove them by yourself

what's the reason for doing that? I tried to fetch data for every possible bus and tram, and this is what I got. Is there anything off?

Try the open data at 3 am -> you will see a lot of vehicles just standing in depots. This is pure visual noise.
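A simple filter for that noise could drop vehicles that have not reported for a while or that sit within some radius of a depot. This is a sketch under stated assumptions: the `Vehicle` fields, the depot coordinates (supplied by the caller; no real depot locations are given in the thread), and both thresholds are hypothetical. Matching vehicles against the GTFS schedule would be a more precise way to catch "outside of their schedule" cases, but is not shown here.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class Vehicle:
    vehicle_id: str
    lat: float
    lon: float
    last_seen: float  # hypothetical: Unix time of the last position update


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def visible_vehicles(vehicles: Iterable[Vehicle], depots: List[Tuple[float, float]],
                     now: float, depot_radius_m: float = 150.0,
                     max_silence_s: float = 300.0) -> List[Vehicle]:
    """Keep only vehicles that reported recently and are not parked at a depot."""
    result = []
    for v in vehicles:
        if now - v.last_seen > max_silence_s:
            continue  # no recent update: probably parked or switched off
        if any(distance_m(v.lat, v.lon, d_lat, d_lon) <= depot_radius_m for d_lat, d_lon in depots):
            continue  # standing in (or right next to) a depot
        result.append(v)
    return result
```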

@LiarPrincess
Owner

LiarPrincess commented Apr 15, 2022

using AWS/GCP/Azure gives you a lot of free things:

This is how I ended up trying to use my free $100 on Azure for this project before even going to production, hehe

The whole of Wroclive is based on GCP Free Tier products. You do have to read the terms carefully, though; for example, for Compute Engine you get “1 non-preemptible e2-micro VM instance per month”, but only in us-west1, us-central1, and us-east1.

This region restriction does not apply to AppEngine, so you can place your AppEngine instances wherever you like.

@konhi
Author

konhi commented Apr 15, 2022

Thank you once again for explaining all these things in so much detail! I don't have any more questions, but I feel like it's good to leave this issue open, so others might see it.
