Join station activity data with weather data #25

Closed
armgilles opened this issue Sep 16, 2020 · 2 comments

Comments

@armgilles
Owner

Get temperature, rain, and humidity data for each date (daily granularity) of station activity.

Weather data #24

@armgilles
Owner Author

Performance problem caused by the volume of Vcub activity data (16,564,512 rows). The activity timestamps (resampled to 10 min) need to be converted to yyyy-mm-dd dates in order to join them with the weather data.

Standard version

%timeit ts_activity['date_year_month'] = ts_activity['date'].dt.strftime(date_format='%Y-%m-%d') 
# 1min 4s ± 124 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Tuned version

def fast_parse_date(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    
    cf https://github.com/sanand0/benchmarks/tree/master/date-parse
    """
    dates = {date: date.strftime(format='%Y-%m-%d') for date in pd.Series(s.unique())}
    return s.apply(lambda v: dates[v])

%timeit ts_activity['date_year_month'] = fast_parse_date(ts_activity['date'])
# 44.9 s ± 194 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
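
One alternative that may be worth timing, sketched on a toy frame: if the key is only needed for the join, the string formatting step can be skipped by truncating the timestamps to midnight with `Series.dt.normalize()` and merging on the resulting datetime column. The `weather` frame, its column names, the `station_id` column and all values are assumptions made up for illustration; nothing here was measured on the Vcub data.

```python
import pandas as pd

# Toy stand-in for ts_activity: one station sampled every 10 minutes over 2 days.
ts_activity = pd.DataFrame({
    "date": pd.date_range("2020-09-15 00:00", periods=288, freq="10min"),
    "station_id": 1,
})

# Hypothetical daily weather table (column names and values are assumptions).
weather = pd.DataFrame({
    "date_day": pd.date_range("2020-09-15", periods=2, freq="D"),
    "temperature": [24.5, 21.0],
    "rain": [0.0, 3.2],
    "humidity": [55, 80],
})

# Vectorized truncation to midnight; the key stays datetime64, no strings are built.
ts_activity["date_day"] = ts_activity["date"].dt.normalize()

# Left join keeps every activity row and attaches that day's weather.
joined = ts_activity.merge(weather, on="date_day", how="left")
```

Keeping the key as a datetime also avoids storing a string column of 16 million values, at the cost of requiring the weather table to use a datetime key as well.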

@armgilles
Owner Author

The join must now take the hour into account, not just the day.
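
A minimal sketch of what an hourly join could look like, assuming the weather data has one row per hour: the 10-minute timestamps are floored to the hour with `Series.dt.floor("h")` and merged on that key. The `weather` frame and the `available_bikes` and `temperature` columns are hypothetical names and values, not taken from the project.

```python
import pandas as pd

# Toy activity data: 10-minute samples covering two hours.
ts_activity = pd.DataFrame({
    "date": pd.date_range("2020-09-15 08:00", periods=12, freq="10min"),
    "available_bikes": range(12),  # hypothetical column
})

# Hypothetical hourly weather table, one row per hour.
weather = pd.DataFrame({
    "date_hour": pd.date_range("2020-09-15 08:00", periods=2, freq="h"),
    "temperature": [18.0, 19.5],
})

# Floor each timestamp to the start of its hour to build the join key.
ts_activity["date_hour"] = ts_activity["date"].dt.floor("h")

# validate="many_to_one" raises if the weather table has duplicate hours.
joined = ts_activity.merge(
    weather, on="date_hour", how="left", validate="many_to_one"
)
```

If the weather observations do not fall exactly on the hour, `pd.merge_asof` on sorted timestamps may be a better fit than flooring.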
