<br><br><br><br><br>

# Advanced Uproot

<br><br><br><br><br>

<br><br>

## Cache management

<br>

**Uproot does not automatically cache arrays.** (Remote backends cache raw bytes, but that's different.)

  * **Disadvantage:** unless you opt into caching, uproot reads and decompresses the data every time you ask for it.
  * **Advantage:** you control how much memory your process uses.

<br>

In this sense and others, uproot is a _low-level_ library.

<br><br>

In [7]:
import uproot

# any dict-like object may be used as a cache
cache = {}

arrays = uproot.open("data/Zmumu.root")["events"].arrays("*", cache=cache)

# the cache maps keys of the form UUID;treename;branchname;interpretation;entryrange → arrays
cache

{'AAGUS3fQmKsR56dpAQAAf77v;events;Type;asstring();0-2304': <ObjectArray [b'GT' b'TT' b'GT' ... b'TT' b'GT' b'GG'] at 0x7f56543036d8>,
 'AAGUS3fQmKsR56dpAQAAf77v;events;Run;asdtype(Bi4(),Li4());0-2304': array([148031, 148031, 148031, ..., 148029, 148029, 148029], dtype=int32),
 'AAGUS3fQmKsR56dpAQAAf77v;events;Event;asdtype(Bi4(),Li4());0-2304': array([10507008, 10507008, 10507008, ..., 99991333, 99991333, 99991333],
       dtype=int32),
 'AAGUS3fQmKsR56dpAQAAf77v;events;E1;asdtype(Bf8(),Lf8());0-2304': array([82.20186639, 62.34492895, 62.34492895, ..., 81.27013558,
        81.27013558, 81.56621735]),
 'AAGUS3fQmKsR56dpAQAAf77v;events;px1;asdtype(Bf8(),Lf8());0-2304': array([-41.19528764,  35.11804977,  35.11804977, ...,  32.37749196,
         32.37749196,  32.48539387]),
 'AAGUS3fQmKsR56dpAQAAf77v;events;py1;asdtype(Bf8(),Lf8());0-2304': array([ 17.4332439 , -16.57036233, -16.57036233, ...,   1.19940578,
          1.19940578,   1.2013503 ]),
 'AAGUS3fQmKsR56dpAQAAf77v;events;pz1;asdtyp
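
The cache keys printed above are plain strings, so they can be taken apart with ordinary string operations. The following is a minimal sketch, assuming the five fields are separated by semicolons and that no field contains a semicolon itself (true of the keys shown here, not guaranteed in general). `CacheKey` and `parse_cache_key` are hypothetical names for illustration, not part of uproot.

```python
from collections import namedtuple

# Hypothetical container for the fields of one cache key.
CacheKey = namedtuple(
    "CacheKey", ["uuid", "tree", "branch", "interpretation", "start", "stop"]
)

def parse_cache_key(key):
    """Split 'UUID;treename;branchname;interpretation;entryrange' into fields."""
    uuid, tree, branch, interpretation, entryrange = key.split(";")
    start, stop = entryrange.split("-")
    return CacheKey(uuid, tree, branch, interpretation, int(start), int(stop))

# One of the keys from the output above.
key = "AAGUS3fQmKsR56dpAQAAf77v;events;Run;asdtype(Bi4(),Li4());0-2304"
parsed = parse_cache_key(key)
print(parsed.branch, parsed.start, parsed.stop)
```

Because the interpretation and entry range are part of the key, asking for the same branch with a different interpretation or a different entry range would presumably produce a different key and therefore a cache miss.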

In [11]:
# The next time you make this exact request, the arrays come from the cache, not from disk.

arrays = uproot.open("data/Zmumu.root")["events"].arrays("*", cache=cache)
arrays

{b'Type': <ObjectArray [b'GT' b'TT' b'GT' ... b'TT' b'GT' b'GG'] at 0x7f56680889b0>,
 b'Run': array([148031, 148031, 148031, ..., 148029, 148029, 148029], dtype=int32),
 b'Event': array([10507008, 10507008, 10507008, ..., 99991333, 99991333, 99991333],
       dtype=int32),
 b'E1': array([82.20186639, 62.34492895, 62.34492895, ..., 81.27013558,
        81.27013558, 81.56621735]),
 b'px1': array([-41.19528764,  35.11804977,  35.11804977, ...,  32.37749196,
         32.37749196,  32.48539387]),
 b'py1': array([ 17.4332439 , -16.57036233, -16.57036233, ...,   1.19940578,
          1.19940578,   1.2013503 ]),
 b'pz1': array([-68.96496181, -48.77524654, -48.77524654, ..., -74.53243061,
        -74.53243061, -74.80837247]),
 b'pt1': array([44.7322, 38.8311, 38.8311, ..., 32.3997, 32.3997, 32.3997]),
 b'eta1': array([-1.21769, -1.05139, -1.05139, ..., -1.57044, -1.57044, -1.57044]),
 b'phi1': array([ 2.74126  , -0.440873 , -0.440873 , ...,  0.0370275,  0.0370275,
         0.0370275]),
 b'Q1': 
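
The behavior of a repeated request can be pictured as a "look up, else read and store" pattern. This is only a sketch of the general idea, not uproot's actual code: `get_array`, `slow_read`, and the key string are hypothetical stand-ins, and the list returned by `slow_read` takes the place of reading and decompressing a branch.

```python
def get_array(key, cache, read):
    """Return cache[key] if present; otherwise call read() and store the result."""
    if cache is not None and key in cache:
        return cache[key]          # cache hit: no reading, no decompression
    value = read()                 # cache miss: do the expensive work
    if cache is not None:
        cache[key] = value
    return value

reads = []                         # records how many times we really "read"

def slow_read():
    reads.append("read")
    return [82.2, 62.3, 62.3]      # stand-in for a decompressed array

cache = {}
first = get_array("events;E1;0-2304", cache, slow_read)
second = get_array("events;E1;0-2304", cache, slow_read)
print(len(reads))                  # the second call did not read again
```

Passing `cache=None` in this sketch corresponds to the default behavior described at the start of this section: every request does the work again.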

In [18]:
# A plain dict used as a cache keeps every array in memory until you clear it (cache.clear()).

# More realistically, you should use an ArrayCache with a memory upper limit.

cache = uproot.cache.ArrayCache(100*1024)   # 100*1024 bytes is 100 KiB

arrays = uproot.open("data/Zmumu.root")["events"].arrays("*", cache=cache)

# The cache now holds only the most recently read arrays that fit within the limit.
list(cache.keys())

['AAGUS3fQmKsR56dpAQAAf77v;events;pz2;asdtype(Bf8(),Lf8());0-2304',
 'AAGUS3fQmKsR56dpAQAAf77v;events;pt2;asdtype(Bf8(),Lf8());0-2304',
 'AAGUS3fQmKsR56dpAQAAf77v;events;eta2;asdtype(Bf8(),Lf8());0-2304',
 'AAGUS3fQmKsR56dpAQAAf77v;events;phi2;asdtype(Bf8(),Lf8());0-2304',
 'AAGUS3fQmKsR56dpAQAAf77v;events;Q2;asdtype(Bi4(),Li4());0-2304',
 'AAGUS3fQmKsR56dpAQAAf77v;events;M;asdtype(Bf8(),Lf8());0-2304']
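
To make the eviction behavior concrete, here is a minimal sketch of a dict-like cache with a byte limit that discards the least recently used entries first. It is an illustration under stated assumptions, not uproot's implementation: `ByteLimitedCache` is a hypothetical class, and it measures values by their `nbytes` attribute when they have one (as NumPy arrays do) and by `len()` otherwise.

```python
from collections import OrderedDict
from collections.abc import MutableMapping

class ByteLimitedCache(MutableMapping):
    """Dict-like cache that evicts least recently used values past a byte limit."""

    def __init__(self, limitbytes):
        self.limitbytes = limitbytes
        self._data = OrderedDict()   # oldest entries first
        self._used = 0               # bytes currently held

    @staticmethod
    def _size(value):
        # NumPy arrays expose nbytes; fall back to len() for bytes-like values.
        nbytes = getattr(value, "nbytes", None)
        return len(value) if nbytes is None else nbytes

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        if key in self._data:
            del self[key]
        size = self._size(value)
        if size > self.limitbytes:
            return                   # too large to cache at all
        while self._used + size > self.limitbytes:
            _, evicted = self._data.popitem(last=False)
            self._used -= self._size(evicted)
        self._data[key] = value
        self._used += size

    def __delitem__(self, key):
        self._used -= self._size(self._data.pop(key))

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

cache = ByteLimitedCache(100 * 1024)
for name in ["a", "b", "c"]:
    cache[name] = bytes(40 * 1024)   # three 40 KiB values, limit is 100 KiB
print(list(cache.keys()))            # "a" was evicted to make room for "c"
```

Since the class implements the mapping interface, any code that accepts a dict as a cache could in principle accept an object like this one.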

<br><br><br><br>

**Question:** couldn't you manage arrays in memory yourself?

Yes, but passing `cache=whatever` to your function calls changes your analysis script only minimally and keeps it free of memory-management details.

<br><br><br><br>

In [34]:
# To see the caching in action, let's subclass an interpretation and override its methods so that they print when used.

class CustomAsDtype(uproot.asdtype):
    @property
    def identifier(self):
        out = super(CustomAsDtype, self).identifier
        print(out, "identifier")
        return out
    def fromroot(self, *args):
        print(self.identifier, "fromroot (first step in interpreting data from a ROOT file)")
        return super(CustomAsDtype, self).fromroot(*args)
    def finalize(self, *args):
        print(self.identifier, "finalize (puts finishing touches on array and returns it)")
        return super(CustomAsDtype, self).finalize(*args)

custom_asdtype = uproot.open("data/Zmumu.root")["events"]["E1"].interpretation
custom_asdtype.__class__ = CustomAsDtype
custom_asdtype

asdtype('>f8')

In [35]:
# Exercise: modify this cell so that evaluating it draws from the cache, instead of reading
# fromroot and finalizing the array.

# You should see it print only one message: identifier.

cache = {}

arrays = uproot.open("data/Zmumu.root")["events"]["E1"].array(custom_asdtype, cache=cache)

asdtype(Bf8(),Lf8()) identifier
asdtype(Bf8(),Lf8()) identifier
asdtype(Bf8(),Lf8()) fromroot (first step in interpreting data from a ROOT file)
asdtype(Bf8(),Lf8()) identifier
asdtype(Bf8(),Lf8()) finalize (puts finishing touches on array and returns it)
