Join station activity data with weather data #25
Performance problem caused by the volume of Vcub activity data (16,564,512 rows). We need to convert the activity timestamps (resampled to 10 min) into dates.

Classic version:

    %timeit ts_activity['date_year_month'] = ts_activity['date'].dt.strftime(date_format='%Y-%m-%d')
    # 1min 4s ± 124 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Tuned version:

    import pandas as pd

    def fast_parse_date(s):
        """
        This is an extremely fast approach to datetime parsing.
        For large data, the same dates are often repeated. Rather than
        re-parse these, we store all unique dates, parse them, and
        use a lookup to convert all dates.
        cf https://github.com/sanand0/benchmarks/tree/master/date-parse
        """
        # Format each unique timestamp once, then look the result up per row.
        dates = {date: date.strftime(format='%Y-%m-%d') for date in pd.Series(s.unique())}
        return s.apply(lambda v: dates[v])

    %timeit ts_activity['date_year_month'] = fast_parse_date(ts_activity['date'])
    # 44.9 s ± 194 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The join must now take the hour into account, not just the day.
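As a rough illustration of an hour-level join, here is a minimal sketch. It is not the code from the referenced commits: the `weather` frame, its column names (`date_hour`, `temperature`, `rain`, `humidity`), and the sample values are all hypothetical. It also swaps in `dt.floor` in place of `strftime`, which keeps the key as a datetime instead of building strings; whether that is faster on the full dataset was not measured here.

```python
import pandas as pd

# Hypothetical 10-minute activity data (column names are assumptions).
ts_activity = pd.DataFrame({
    "date": pd.date_range("2022-01-17 08:00", periods=12, freq="10min"),
    "station_id": 1,
    "available_bikes": range(12),
})

# Hypothetical hourly weather observations.
weather = pd.DataFrame({
    "date_hour": pd.date_range("2022-01-17 08:00", periods=2, freq="h"),
    "temperature": [4.5, 5.1],
    "rain": [0.0, 0.2],
    "humidity": [87, 85],
})

# Floor each activity timestamp to the hour. The result stays a
# datetime64 column, so no per-row string formatting is involved.
ts_activity["date_hour"] = ts_activity["date"].dt.floor("h")

# A left join keeps every activity row, even when no weather row matches.
joined = ts_activity.merge(weather, on="date_hour", how="left")
```

Each block of six 10-minute rows then carries the weather values of the hour it falls in.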
armgilles added two commits that referenced this issue on Jan 17, 2022
Get temperature, rain, and humidity data for each date (at daily granularity) of station activity.
Weather data #24