-
Notifications
You must be signed in to change notification settings - Fork 102
feat(bus): fetch fcps bus delay information #1796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Mikaeel! Overall great job with this PR. It's looking to me almost ready to merge but I have a few suggestions and changes. I left some comments in your code so please take a look at those.
Also, one thing to consider is that when your scraper finds a new bus delay and updates it, the user has to refresh the page to actually see the fact that their bus is delayed. This can be pretty impractical, especially considering that many users (especially for the bus app) use Ion on their phones and should not be expected to refresh constantly to check for updates.
We already have code for websocket broadcasts used in another parts of the bus
app which you can look at to make this happen. My recommendation would be to broadcast updates to the buses status to the correct websocket group in your task. You can use asgiref.sync.async_to_sync()
for this as used in intranet/apps/bus/consumers.py
. I'm keeping this very vague on purpose because I want you to take a look and see if you can figure out what I'm getting at (or come up with a better idea!) so feel free to lmk if you need some more help with this.
Other than that, great job again with your PR and I'm looking forward to seeing your changes :)
intranet/settings/__init__.py
Outdated
"fetch-fcpsbus-delays": { | ||
"task": "intranet.apps.bus.tasks.fetch_fcpsbus_delays", | ||
"schedule": 10.0, | ||
"args":(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really really small thing but just for consistency
"args":(), | |
"args": (), |
intranet/apps/bus/tasks.py
Outdated
# Check if the current time is within 6-9 AM or 3-6 PM from Mon-Fri | ||
if now.weekday() in (5, 6): | ||
return | ||
if now.hour not in range(6, 9) and now.hour not in range(15, 18): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is better for readability
if now.hour not in range(6, 9) and now.hour not in range(15, 18): | |
if not (now.hour in range(6, 9) or now.hour in range(15, 18)): |
intranet/templates/bus/home.html
Outdated
<div class="delay-card"> | ||
<i class="fas fa-exclamation-triangle"></i> | ||
<span> | ||
<strong>Bus {{ route }}</strong> delayed due to: {{ delay.reason }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lowercase the reason for formatting
<strong>Bus {{ route }}</strong> delayed due to: {{ delay.reason }} | |
<strong>Bus {{ route }}</strong> delayed due to: {{ delay.reason|lower }} |
intranet/apps/bus/tasks.py
Outdated
@shared_task | ||
def fetch_fcpsbus_delays(): | ||
now = timezone.localtime(timezone.now()) | ||
logger.info(now.hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to log this
intranet/apps/bus/views.py
Outdated
bus_delays_qs = Route.objects.filter(status="d") | ||
bus_delays = {delay.route_name: {"reason": delay.reason} for delay in bus_delays_qs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability I suggest:
bus_delays_qs = Route.objects.filter(status="d") | |
bus_delays = {delay.route_name: {"reason": delay.reason} for delay in bus_delays_qs} | |
bus_delays_queryset = Route.objects.filter(status="d") | |
bus_delays = {delay.route_name: {"reason": delay.reason} for delay in bus_delays_queryset} |
intranet/apps/bus/tasks.py
Outdated
|
||
|
||
@shared_task | ||
def fetch_fcpsbus_delays(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this task (and update where appropriate) to either fetch_fcps_bus_delays
or fetch_fcps_busdelays
? I think either of those (preferably the first) would make more sense
intranet/apps/bus/tasks.py
Outdated
logger.info("Updated route %s with delay: %s", route_name, reason) | ||
except Route.DoesNotExist: | ||
logger.error("Route with route_name %s does not exist", route_name) | ||
logger.info("Finished fetching bus delays from FCPS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, no need to log here (you are already logging the specific delays which I think is both better and enough)
ceb6365
to
60bb104
Compare
Thanks for the feedback, I did everything and I got the web refresh fixed up (It wasn't that hard). Hopefully my PR gets approved :) |
Looks like you accidentally modified a bunch of other files (probably by running a format script or something), just revert your changes on every file that isn't yours and push again |
13bf0e6
to
7474068
Compare
intranet/apps/bus/tasks.py
Outdated
# Check if the current time is within 6-9 AM or 3-6 PM from Mon-Fri | ||
if now.weekday() in (5, 6): | ||
return | ||
if not (now.hour in range(6, 9) and now.hour in range(15, 18)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if statement will always trigger because the current hour can never be in [6, 9)
AND [15, 18)
at the same time
if not (now.hour in range(6, 9) and now.hour in range(15, 18)): | |
if not (now.hour in range(6, 9) or now.hour in range(15, 18)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright good work, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work so far! Please see my comments.
Also, could you try to pull how long a bus is delayed in addition to the reason, and display that time to users?
intranet/apps/bus/tasks.py
Outdated
return | ||
|
||
soup = BeautifulSoup(response.text, "html.parser") | ||
rows = soup.select("table tr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested to make sure this and the following lines don't error on bad format from the website? To be safe, I would enclose the whole thing in a try except block, where the except catches everything (except Exception)
intranet/apps/bus/tasks.py
Outdated
return | ||
if not (now.hour in range(6, 9) or now.hour in range(15, 18)): | ||
return | ||
url = "https://busdelay.fcps.edu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this in settings?
intranet/apps/bus/tasks.py
Outdated
# Check if the current time is within 6-9 AM or 3-6 PM from Mon-Fri | ||
if now.weekday() in (5, 6): | ||
return | ||
if not (now.hour in range(6, 9) or now.hour in range(15, 18)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It'd be better if you can get these times from the Ion schedule. I.e., find when the first block of the day starts and when the last block of the day ends, and start pulling, say, from 2.5 hours before the first block of the day to 1 hour after the first block, and 1 hour before the last block of the day to 2 hours after. This is so that this still works in cases of early release / 2 hr delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is there any way to code these times into the celery schedule instead of here. That way this doesn't get called every 10 s during the school day. It might not be possible, just asking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Alan, I tried doing a crontab schedule so I could get rid of the excess code but its not possible when working with seconds. However I did make it so it isn't running 10s constantly throughout the day, just the time frame you gave me, I hope thats okay!
intranet/settings/__init__.py
Outdated
@@ -955,6 +955,11 @@ def get_log(name): # pylint: disable=redefined-outer-name; 'name' is used as th | |||
"schedule": celery.schedules.crontab(day_of_month=3, hour=1), | |||
"args": (), | |||
}, | |||
"fetch-fcps-bus-delays": { | |||
"task": "intranet.apps.bus.tasks.fetch_fcps_bus_delays", | |||
"schedule": 10.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like every 10s is too aggressive, can we increase that to like every minute, and then maybe every 15s from 3:55-4:20 (calculated dynamically from schedule)? We don't want to get rate limited.
else: | ||
end_of_day = datetime.datetime(now.year, now.month, now.day, settings.SCHOOL_END_HOUR, settings.SCHOOL_END_MINUTE) | ||
|
||
end_of_day = datetime.datetime(current_time.year, current_time.month, current_time.day, settings.SCHOOL_END_HOUR, settings.SCHOOL_END_MINUTE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this to calculate based on schedule too (I know this is old)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean calculate this with the schedule, this else statement only runs if a day doesn't exist (holiday or smth) so how would I use the schedule app because isn't this supposed to be coded like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops sorry you're right
intranet/apps/bus/tasks.py
Outdated
cells = row.find_all("td") | ||
if len(cells) >= 4 and cells[0].text.strip() == "JEFFERSON HIGH": | ||
route_name = cells[1].text.strip().split()[0] | ||
reason = cells[3].text.strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick sanitization, can you limit the max length of this in case we receive bad input so that it doesn't put a massive block of text in Ion? Just choose a reasonable max based on what the page shows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to add on to what Alan said (I was gonna write this on your closed PR):
This is not the best example, but this (line 971 of settings) may be one way to implement scheduling celery tasks when day end times vary. For example, if it's a half day, your time window won't run this celery task on time. The way the linked example works is by calling a schedule function at 12 AM everyday, getting the day end time of the school day (it's 12 AM so it just became a new day), and then scheduling the celery task in advance to run at that day end time. You would probably need to schedule several tasks in 5 minute intervals in advance. That way the scheduling is dynamic and can account for these situations. I advise making the schedule and actual task functions separate, which I didn't do in the link but I should.
You can copy a lot of the code for the bus schedule task, just add a bigger time window (i.e. 30 mins before and 1 hour after dismissal? or something like that. Maybe worth putting that in settings as well) |
63afe8f
to
6fc5b63
Compare
@alanzhu0 I fixed your comments, thanks for the feedback you guys gave me! |
intranet/apps/bus/tasks.py
Outdated
return | ||
|
||
# Sort out the JEFFERSON HIGH bus delays | ||
for row in rows[1:]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you enclose this entire forloop in a try except? I'm concerned a bit about what happens in the parsing if the format isn't correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, thanks for catching that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, we should try to find another way to properly schedule pulling the delay information without having to run a celery job every 10 seconds. What you have now (with the 2 helper functions) is much better than what you had initially, but it does still run the job every 10 seconds which is something we should avoid.
As Aarush said in his comment, a good way to do this is by calling a task at the start of every day to dynamically calculate the proper time windows needed for that day based on the schedule and then schedule the other tasks to run within that daily task at the correct time. You can either create all the tasks for 1m or 15s intervals once at the start of the day and schedule all of them, or you can just schedule the first one and have it recursively reschedule itself in the appropriate amount of time based on the schedule and current time.
I'll look a little bit more and see if I can give you some code to start with, but otherwise LMK if you have any questions or need anything else
Something like
"schedule-all-bus-delay-fetches": {
"task": "intranet.apps.bus.tasks.schedule_all_bus_delay_fetches",
"schedule": celery.schedules.crontab(hour=0, minute=1),
"args": (),
},
for your celery job and using .apply_async(eta=t)
to schedule the tasks
For more context, pre-scheduling all the jobs is better because it only schedules exactly as many celery jobs as needed per day instead of literally (24 hours * 60 minutes * 6 jobs per minute) jobs per day (which is a LOT) and also scheduling everything at once is a lot cleaner than constantly scheduling something every 10 sceonds, both for scalability and even for keeping the logs clean
Ok so the way I did this is that it has a schedules bus delay fetches task, and I took my logic from the previous 2 tasks and then made it so that at 12am it will schedule both the 1 minute jobs and 15 second jobs during those windows. This way the jobs are only scheduled for when we need it, and there is no need to run (24hr * 60 min * 6 jobs). Credits to Aarush for providing what I needed to do earlier, and thank you guys for helping me out! Anything else? @alanzhu0 @shrysjain |
Nice work, this is exactly what I had in mind. Nothing else from me :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit says fetch afternoon delays but it seems like you implemented fetching the morning as well (first_window_start
)? Can you update the commit message to reflect this. I could also be reading your code wrong, but this means you have to add bus_delays
as a context variable to the morning function as well.
|
||
|
||
@shared_task | ||
def fetch_fcps_bus_delays(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstrings
intranet/settings/__init__.py
Outdated
@@ -1016,4 +1025,4 @@ def get_log(name): # pylint: disable=redefined-outer-name; 'name' is used as th | |||
sentry_logging = LoggingIntegration( | |||
level=logging.INFO, event_level=logging.ERROR # Capture info and above as breadcrumbs # Send errors as events | |||
) | |||
sentry_sdk.init(SENTRY_PUBLIC_DSN, integrations=[DjangoIntegration(), sentry_logging, CeleryIntegration()], send_default_pii=True) | |||
sentry_sdk.init(SENTRY_PUBLIC_DSN, integrations=[DjangoIntegration(), sentry_logging, CeleryIntegration()], send_default_pii=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be out of your diff
intranet/settings/__init__.py
Outdated
@@ -67,6 +67,10 @@ | |||
SENIOR_DESTS_BANNER_LINK = "https://tinyurl.com/tjseniors2025" | |||
|
|||
""" -------- END UPDATE ANNUALLY -------- """ | |||
|
|||
#Bus Delays URL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a small thing but
#Bus Delays URL | |
# Bus Delays URL |
# Idle window 3: 20m after end to 2h after end | ||
third_window_start = end_datetime + timedelta(minutes=20) | ||
third_window_end = end_datetime + timedelta(hours=2) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you code these timedelta
's in as settings? Also, I feel like checking 1 hour before is unnecessary. Most people are in class and it doesn't really matter if their bus is delayed until 10-15 minutes before dismissal. This is just a suggestion of course
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how you want me to code the timedelta
's in settings, would you want it like BUS_DELAY_THIRD_WINDOW_START = timedelta(minutes=20)
? Because I can't do the windows in as settings because it needs to use start_datetime
and end_datetime
, or I could just leave it as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah something like this in settings
# Check for bus delays from start of day - MORNING_START_BUFFER to start of day + MORNING_END_BUFFER (minutes)
MORNING_START_BUFFER = 20
MORNING_END_BUFFER = 30
and then in the task
first_window_start = start_datetime - timedelta(minutes=MORNING_START_BUFFER)
first_window_end = start_datetime + timedelta(minutes=MORNING_END_BUFFER)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Bus Delays
# Buffer are used in timedelta for when to schedule the bus delay fetches that occur every 15s seconds or 1 minute respectively.
BUS_DELAY_URL = "https://busdelay.fcps.edu/"
# First Window Buffer: 2.5 hours before start of school to 1 hour after start of school (for 1 minute task)
FIRST_WINDOW_START_BUFFER = 2.5
FIRST_WINDOW_END_BUFFER = 1
# Second Window Buffer: 20 minutes before end of school to 5 minutes before end of school (for 1 minute task)
SECOND_WINDOW_START_BUFFER = 20
SECOND_WINDOW_END_BUFFER = 5
# Third Window Buffer: 20 minutes after end of school to 2 hours after end of school (for 1 minute task)
THIRD_WINDOW_START_BUFFER = 20
THIRD_WINDOW_END_BUFFER = 2
# Active Window Buffer: 5 minutes before end of school to 20 minutes after end of school (for 15 second task)
ACTIVE_WINDOW_START_BUFFER = 5
ACTIVE_WINDOW_END_BUFFER = 20
Is this fine? Please tell me if any of the comments or variable names need changing for any formatting reasons.
Closes #1413
Proposed changes
Brief description of rationale