Skip to content

feat(bus): fetch fcps bus delay information #1796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from
Open

Conversation

mktaher
Copy link

@mktaher mktaher commented May 22, 2025

Closes #1413

Proposed changes

  • Fetches data from busdelay.fcps.edu every 10 seconds or 1 minute during specific time ranges in the day and updates buses to delayed if a Jefferson High bus is found.
  • Displays the ETA, the route name, and the reason for the bus being delayed on the morning/afternoon bus pages.

Brief description of rationale

  • During Bus Hours we send a request to the bus delay website and scrape information that has to do with TJ buses. This will allow us to directly get buses marked as delayed from the official website, and people will be able to know if their buses are delayed or not quickly and efficiently.

@mktaher mktaher requested a review from a team as a code owner May 22, 2025 00:31
@coveralls
Copy link

coveralls commented May 22, 2025

Coverage Status

coverage: 79.083% (-0.3%) from 79.382%
when pulling 167767e on mktaher:dev
into 5fa3ea2 on tjcsl:dev.

Copy link
Member

@shrysjain shrysjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Mikaeel! Overall great job with this PR. It's looking to me almost ready to merge but I have a few suggestions and changes. I left some comments in your code so please take a look at those.

Also, one thing to consider is that when your scraper finds a new bus delay and updates it, the user has to refresh the page to actually see the fact that their bus is delayed. This can be pretty impractical, especially considering that many users (especially for the bus app) use Ion on their phones and should not be expected to refresh constantly to check for updates.

We already have code for websocket broadcasts used in another parts of the bus app which you can look at to make this happen. My recommendation would be to broadcast updates to the buses status to the correct websocket group in your task. You can use asgiref.sync.async_to_sync() for this as used in intranet/apps/bus/consumers.py. I'm keeping this very vague on purpose because I want you to take a look and see if you can figure out what I'm getting at (or come up with a better idea!) so feel free to lmk if you need some more help with this.

Other than that, great job again with your PR and I'm looking forward to seeing your changes :)

"fetch-fcpsbus-delays": {
"task": "intranet.apps.bus.tasks.fetch_fcpsbus_delays",
"schedule": 10.0,
"args":(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really really small thing but just for consistency

Suggested change
"args":(),
"args": (),

# Check if the current time is within 6-9 AM or 3-6 PM from Mon-Fri
if now.weekday() in (5, 6):
return
if now.hour not in range(6, 9) and now.hour not in range(15, 18):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is better for readability

Suggested change
if now.hour not in range(6, 9) and now.hour not in range(15, 18):
if not (now.hour in range(6, 9) or now.hour in range(15, 18)):

<div class="delay-card">
<i class="fas fa-exclamation-triangle"></i>
<span>
<strong>Bus {{ route }}</strong> delayed due to: {{ delay.reason }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would lowercase the reason for formatting

Suggested change
<strong>Bus {{ route }}</strong> delayed due to: {{ delay.reason }}
<strong>Bus {{ route }}</strong> delayed due to: {{ delay.reason|lower }}

@shared_task
def fetch_fcpsbus_delays():
now = timezone.localtime(timezone.now())
logger.info(now.hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to log this

Comment on lines 54 to 55
bus_delays_qs = Route.objects.filter(status="d")
bus_delays = {delay.route_name: {"reason": delay.reason} for delay in bus_delays_qs}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability I suggest:

Suggested change
bus_delays_qs = Route.objects.filter(status="d")
bus_delays = {delay.route_name: {"reason": delay.reason} for delay in bus_delays_qs}
bus_delays_queryset = Route.objects.filter(status="d")
bus_delays = {delay.route_name: {"reason": delay.reason} for delay in bus_delays_queryset}



@shared_task
def fetch_fcpsbus_delays():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this task (and update where appropriate) to either fetch_fcps_bus_delays or fetch_fcps_busdelays? I think either of those (preferably the first) would make more sense

logger.info("Updated route %s with delay: %s", route_name, reason)
except Route.DoesNotExist:
logger.error("Route with route_name %s does not exist", route_name)
logger.info("Finished fetching bus delays from FCPS")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no need to log here (you are already logging the specific delays which I think is both better and enough)

@mktaher mktaher force-pushed the dev branch 2 times, most recently from ceb6365 to 60bb104 Compare May 22, 2025 03:07
@mktaher
Copy link
Author

mktaher commented May 22, 2025

Thanks for the feedback, I did everything and I got the web refresh fixed up (It wasn't that hard). Hopefully my PR gets approved :)

@shrysjain
Copy link
Member

Looks like you accidentally modified a bunch of other files (probably by running a format script or something), just revert your changes on every file that isn't yours and push again

@mktaher mktaher force-pushed the dev branch 2 times, most recently from 13bf0e6 to 7474068 Compare May 22, 2025 03:40
# Check if the current time is within 6-9 AM or 3-6 PM from Mon-Fri
if now.weekday() in (5, 6):
return
if not (now.hour in range(6, 9) and now.hour in range(15, 18)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if statement will always trigger because the current hour can never be in [6, 9) AND [15, 18) at the same time

Suggested change
if not (now.hour in range(6, 9) and now.hour in range(15, 18)):
if not (now.hour in range(6, 9) or now.hour in range(15, 18)):

Copy link
Member

@shrysjain shrysjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright good work, LGTM

@alanzhu0

Copy link
Member

@alanzhu0 alanzhu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far! Please see my comments.

Also, could you try to pull how long a bus is delayed in addition to the reason, and display that time to users?

return

soup = BeautifulSoup(response.text, "html.parser")
rows = soup.select("table tr")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested to make sure this and the following lines don't error on bad format from the website? To be safe, I would enclose the whole thing in a try except block, where the except catches everything (except Exception)

return
if not (now.hour in range(6, 9) or now.hour in range(15, 18)):
return
url = "https://busdelay.fcps.edu"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this in settings?

# Check if the current time is within 6-9 AM or 3-6 PM from Mon-Fri
if now.weekday() in (5, 6):
return
if not (now.hour in range(6, 9) or now.hour in range(15, 18)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It'd be better if you can get these times from the Ion schedule. I.e., find when the first block of the day starts and when the last block of the day ends, and start pulling, say, from 2.5 hours before the first block of the day to 1 hour after the first block, and 1 hour before the last block of the day to 2 hours after. This is so that this still works in cases of early release / 2 hr delay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is there any way to code these times into the celery schedule instead of here. That way this doesn't get called every 10 s during the school day. It might not be possible, just asking

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alan, I tried doing a crontab schedule so I could get rid of the excess code but its not possible when working with seconds. However I did make it so it isn't running 10s constantly throughout the day, just the time frame you gave me, I hope thats okay!

@@ -955,6 +955,11 @@ def get_log(name): # pylint: disable=redefined-outer-name; 'name' is used as th
"schedule": celery.schedules.crontab(day_of_month=3, hour=1),
"args": (),
},
"fetch-fcps-bus-delays": {
"task": "intranet.apps.bus.tasks.fetch_fcps_bus_delays",
"schedule": 10.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like every 10s is too aggressive, can we increase that to like every minute, and then maybe every 15s from 3:55-4:20 (calculated dynamically from schedule)? We don't want to get rate limited.

else:
end_of_day = datetime.datetime(now.year, now.month, now.day, settings.SCHOOL_END_HOUR, settings.SCHOOL_END_MINUTE)

end_of_day = datetime.datetime(current_time.year, current_time.month, current_time.day, settings.SCHOOL_END_HOUR, settings.SCHOOL_END_MINUTE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this to calculate based on schedule too (I know this is old)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean calculate this with the schedule, this else statement only runs if a day doesn't exist (holiday or smth) so how would I use the schedule app because isn't this supposed to be coded like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops sorry you're right

cells = row.find_all("td")
if len(cells) >= 4 and cells[0].text.strip() == "JEFFERSON HIGH":
route_name = cells[1].text.strip().split()[0]
reason = cells[3].text.strip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick sanitization, can you limit the max length of this in case we receive bad input so that it doesn't put a massive block of text in Ion? Just choose a reasonable max based on what the page shows.

Copy link
Member

@aarushtools aarushtools left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add on to what Alan said (I was gonna write this on your closed PR):
This is not the best example, but this (line 971 of settings) may be one way to implement scheduling celery tasks when day end times vary. For example, if it's a half day, your time window won't run this celery task on time. The way the linked example works is by calling a schedule function at 12 AM everyday, getting the day end time of the school day (it's 12 AM so it just became a new day), and then scheduling the celery task in advance to run at that day end time. You would probably need to schedule several tasks in 5 minute intervals in advance. That way the scheduling is dynamic and can account for these situations. I advise making the schedule and actual task functions separate, which I didn't do in the link but I should.

@aarushtools
Copy link
Member

You can copy a lot of the code for the bus schedule task, just add a bigger time window (i.e. 30 mins before and 1 hour after dismissal? or something like that. Maybe worth putting that in settings as well)

@mktaher mktaher force-pushed the dev branch 3 times, most recently from 63afe8f to 6fc5b63 Compare May 23, 2025 02:00
@mktaher
Copy link
Author

mktaher commented May 23, 2025

@alanzhu0 I fixed your comments, thanks for the feedback you guys gave me!

@mktaher mktaher requested a review from alanzhu0 May 23, 2025 13:49
return

# Sort out the JEFFERSON HIGH bus delays
for row in rows[1:]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you enclose this entire forloop in a try except? I'm concerned a bit about what happens in the parsing if the format isn't correct

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, thanks for catching that

Copy link
Member

@shrysjain shrysjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, we should try to find another way to properly schedule pulling the delay information without having to run a celery job every 10 seconds. What you have now (with the 2 helper functions) is much better than what you had initially, but it does still run the job every 10 seconds which is something we should avoid.

As Aarush said in his comment, a good way to do this is by calling a task at the start of every day to dynamically calculate the proper time windows needed for that day based on the schedule and then schedule the other tasks to run within that daily task at the correct time. You can either create all the tasks for 1m or 15s intervals once at the start of the day and schedule all of them, or you can just schedule the first one and have it recursively reschedule itself in the appropriate amount of time based on the schedule and current time.

I'll look a little bit more and see if I can give you some code to start with, but otherwise LMK if you have any questions or need anything else


Something like

    "schedule-all-bus-delay-fetches": {
        "task": "intranet.apps.bus.tasks.schedule_all_bus_delay_fetches",
        "schedule": celery.schedules.crontab(hour=0, minute=1),
        "args": (),
    },

for your celery job and using .apply_async(eta=t) to schedule the tasks

For more context, pre-scheduling all the jobs is better because it only schedules exactly as many celery jobs as needed per day instead of literally (24 hours * 60 minutes * 6 jobs per minute) jobs per day (which is a LOT) and also scheduling everything at once is a lot cleaner than constantly scheduling something every 10 sceonds, both for scalability and even for keeping the logs clean

@mktaher
Copy link
Author

mktaher commented May 24, 2025

Ok so the way I did this is that it has a schedules bus delay fetches task, and I took my logic from the previous 2 tasks and then made it so that at 12am it will schedule both the 1 minute jobs and 15 second jobs during those windows. This way the jobs are only scheduled for when we need it, and there is no need to run (24hr * 60 min * 6 jobs). Credits to Aarush for providing what I needed to do earlier, and thank you guys for helping me out!

Anything else? @alanzhu0 @shrysjain

@shrysjain
Copy link
Member

Nice work, this is exactly what I had in mind. Nothing else from me :)

shrysjain
shrysjain previously approved these changes May 24, 2025
Copy link
Member

@aarushtools aarushtools left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit says fetch afternoon delays but it seems like you implemented fetching the morning as well (first_window_start)? Can you update the commit message to reflect this. I could also be reading your code wrong, but this means you have to add bus_delays as a context variable to the morning function as well.



@shared_task
def fetch_fcps_bus_delays():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstrings

@@ -1016,4 +1025,4 @@ def get_log(name): # pylint: disable=redefined-outer-name; 'name' is used as th
sentry_logging = LoggingIntegration(
level=logging.INFO, event_level=logging.ERROR # Capture info and above as breadcrumbs # Send errors as events
)
sentry_sdk.init(SENTRY_PUBLIC_DSN, integrations=[DjangoIntegration(), sentry_logging, CeleryIntegration()], send_default_pii=True)
sentry_sdk.init(SENTRY_PUBLIC_DSN, integrations=[DjangoIntegration(), sentry_logging, CeleryIntegration()], send_default_pii=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be out of your diff

@@ -67,6 +67,10 @@
SENIOR_DESTS_BANNER_LINK = "https://tinyurl.com/tjseniors2025"

""" -------- END UPDATE ANNUALLY -------- """

#Bus Delays URL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's a small thing but

Suggested change
#Bus Delays URL
# Bus Delays URL

# Idle window 3: 20m after end to 2h after end
third_window_start = end_datetime + timedelta(minutes=20)
third_window_end = end_datetime + timedelta(hours=2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you code these timedelta's in as settings? Also, I feel like checking 1 hour before is unnecessary. Most people are in class and it doesn't really matter if their bus is delayed until 10-15 minutes before dismissal. This is just a suggestion of course

Copy link
Author

@mktaher mktaher May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how you want me to code the timedelta's in settings, would you want it like BUS_DELAY_THIRD_WINDOW_START = timedelta(minutes=20)? Because I can't do the windows in as settings because it needs to use start_datetime and end_datetime, or I could just leave it as is?

Copy link
Member

@aarushtools aarushtools May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah something like this in settings

# Check for bus delays from start of day - MORNING_START_BUFFER to start of day + MORNING_END_BUFFER (minutes)
MORNING_START_BUFFER = 20
MORNING_END_BUFFER = 30

and then in the task

    first_window_start = start_datetime - timedelta(minutes=MORNING_START_BUFFER)
    first_window_end = start_datetime + timedelta(minutes=MORNING_END_BUFFER)

Copy link
Author

@mktaher mktaher May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Bus Delays
# Buffer are used in timedelta for when to schedule the bus delay fetches that occur every 15s seconds or 1 minute respectively.
BUS_DELAY_URL = "https://busdelay.fcps.edu/"
# First Window Buffer: 2.5 hours before start of school to 1 hour after start of school (for 1 minute task)
FIRST_WINDOW_START_BUFFER = 2.5
FIRST_WINDOW_END_BUFFER = 1
# Second Window Buffer: 20 minutes before end of school to 5 minutes before end of school (for 1 minute task)
SECOND_WINDOW_START_BUFFER = 20
SECOND_WINDOW_END_BUFFER = 5
# Third Window Buffer: 20 minutes after end of school to 2 hours after end of school (for 1 minute task)
THIRD_WINDOW_START_BUFFER = 20
THIRD_WINDOW_END_BUFFER = 2
# Active Window Buffer: 5 minutes before end of school to 20 minutes after end of school (for 15 second task)
ACTIVE_WINDOW_START_BUFFER = 5
ACTIVE_WINDOW_END_BUFFER = 20

Is this fine? Please tell me if any of the comments or variable names need changing for any formatting reasons.

@mktaher mktaher changed the title feat(bus): fetch fcps afternoon bus delays feat(bus): fetch fcps bus information May 25, 2025
@mktaher mktaher changed the title feat(bus): fetch fcps bus information feat(bus): fetch fcps bus delay information May 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pull bus delay info from FCPS
5 participants