New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LRU cache to gettz and factories #691

Closed
pganssle opened this Issue Apr 17, 2018 · 1 comment

Comments

2 participants
@pganssle
Member

pganssle commented Apr 17, 2018

Now that #635 is resolved, gettz, tzoffset and tzstr satisfies the property that two tzinfo objects (tz1 and tz2) constructed with identical arguments will always be the same object (tz1 is tz2), but the cached objects expire whenever no instances of a given timezone exist outside of the cache. This satisfies the "timezone semantics" reasons for adding a cache and will have good performance characteristics so long as at least one object holds a strong reference to the time zone.

But this does sacrifice performance for people who would prefer a cache because they are creating the same object over and over again even though it goes out of scope every time, so for example:

def NY_timestamp(dt):
    dt.astimezone(tz.gettz('America/New_York')
    return dt.timestamp()

a = []
for i in range(10000):
    a.append(NY_timestamp(datetime.now()))

I think the above would needlessly create 10000 copies of tz.gettz('America/New_York'), when even a modest LRU cache would save the trouble.

I think we should leave the _instances dictionary as a WeakValueDictionary and add a second _lru_cache dictionary that holds a limited (and configurable) number of strong references to used time zones. Not sure the best implementation of the cache itself, preferably something where left-append, right-pop and reordering a single element are all O(1), so I think something implemented with a doubly-linked list (maybe OrderedDict).

Presumably we can take some cues from the implementation of functools.lru_cache, but keep in mind that the Python implementation may not have amazing performance, since CPython has a C implementation and may not have spent much time optimizing the Python implementation.

@gokcennurlu

This comment has been minimized.

Contributor

gokcennurlu commented Jun 8, 2018

I'd like to work on this.

@pganssle pganssle moved this from Open to Claimed in PyLondinium 2018 Sprint Jun 8, 2018

@gokcennurlu gokcennurlu referenced this issue Jun 8, 2018

Merged

Add LRU caching to tzoffset, tzstr and gettz #761

3 of 3 tasks complete

gokcennurlu added a commit to gokcennurlu/dateutil that referenced this issue Jun 14, 2018

Add LRU caching to tzoffset, tzstr and gettz
Caching had been switched to use `weakrefs` in order to reuse instances if they
are still alive, by dateutil#635. This introduces a LRU cache to the mentioned functions
in order to prevent the instances created by them getting dealloc'd and alloc'd
unnecessarily, in situations like this:

```python
for i in range(100):
    tz.gettz('America/New_York')
```

`tz.tzoffset` and `tz.tzstr` get a LRU cache with size of 8.
`tz.gettz`'s cache starts with 8 by default and can be modified by the
introduced `tz.set_cache_size(int)`

Closes dateutil#691

pganssle added a commit to gokcennurlu/dateutil that referenced this issue Sep 24, 2018

Add LRU caching to tzoffset, tzstr and gettz
Caching had been switched to use `weakrefs` in order to reuse instances if they
are still alive, by dateutil#635. This introduces a LRU cache to the mentioned functions
in order to prevent the instances created by them getting dealloc'd and alloc'd
unnecessarily, in situations like this:

```python
for i in range(100):
    tz.gettz('America/New_York')
```

`tz.tzoffset` and `tz.tzstr` get a LRU cache with size of 8.
`tz.gettz`'s cache starts with 8 by default and can be modified by the
introduced `tz.set_cache_size(int)`

Closes dateutil#691

pganssle added a commit to gokcennurlu/dateutil that referenced this issue Oct 16, 2018

Add LRU caching to tzoffset, tzstr and gettz
Caching had been switched to use `weakrefs` in order to reuse instances if they
are still alive, by dateutil#635. This introduces a LRU cache to the mentioned functions
in order to prevent the instances created by them getting dealloc'd and alloc'd
unnecessarily, in situations like this:

```python
for i in range(100):
    tz.gettz('America/New_York')
```

`tz.tzoffset` and `tz.tzstr` get a LRU cache with size of 8.
`tz.gettz`'s cache starts with 8 by default and can be modified by the
introduced `tz.set_cache_size(int)`

Closes dateutil#691

pganssle added a commit to gokcennurlu/dateutil that referenced this issue Oct 16, 2018

Add LRU caching to tzoffset, tzstr and gettz
Caching had been switched to use `weakrefs` in order to reuse instances if they
are still alive, by dateutil#635. This introduces a LRU cache to the mentioned functions
in order to prevent the instances created by them getting dealloc'd and alloc'd
unnecessarily, in situations like this:

```python
for i in range(100):
    tz.gettz('America/New_York')
```

`tz.tzoffset` and `tz.tzstr` get a LRU cache with size of 8.
`tz.gettz`'s cache starts with 8 by default and can be modified by the
introduced `tz.set_cache_size(int)`

Closes dateutil#691

PyLondinium 2018 Sprint automation moved this from Claimed to Done Oct 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment