# MDF Analysis from S3 with Local Caching

> *In this article, we will present [fsspec's] new ability to cache remote content, keeping a local copy for faster lookup after the initial read. Similar text first appeared in the fsspec documentation, but here we provide more details and use cases.*
>
> Source: https://www.anaconda.com/fsspec-remote-caching/

In [1]:
import s3fs
from asammdf import MDF

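Connect to S3 with an access key and secret. The credentials are read from the environment here rather than hard-coded in the notebook.

In [2]:
import os

# Assumes the credentials are exported as the standard AWS environment variables.
s3_cfg = {
    "key": os.environ["AWS_ACCESS_KEY_ID"],
    "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
}
fs = s3fs.S3FileSystem(**s3_cfg)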

# Walk Through All Files

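Walk through the S3 bucket and pick an `.mf4` file to use for the caching test.

In [3]:
for root, dirs, files in fs.walk("canedge-live-demo-2"):
    for file in files:
        if file.lower().endswith(".mf4"):
            # S3 keys use "/" regardless of the local OS.
            mdf_path = f"{root}/{file}"
            # Note: this only leaves the inner loop; the walk itself continues.
            break
mdf_path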

'canedge-live-demo-2/409EF5ED/00000089/00000001.mf4'
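Because `break` only exits the inner loop, the walk above keeps visiting later directories and `mdf_path` can be overwritten by a later match. A minimal sketch of a variant that returns as soon as the first match is found is shown below. `first_match` is a hypothetical helper (not part of s3fs) that accepts any iterable of `(root, dirs, files)` triples, and the `fake_walk` data is a made-up stand-in for `fs.walk(...)` so the sketch runs offline.

```python
def first_match(walk_iter, suffix=".mf4"):
    """Return the first path whose file name ends with `suffix`, or None.

    `walk_iter` is any iterable of (root, dirs, files) triples.
    Hypothetical helper for illustration; not part of s3fs or fsspec.
    """
    for root, _dirs, files in walk_iter:
        for name in files:
            if name.lower().endswith(suffix):
                # Returning here stops both loops at once.
                return f"{root}/{name}"
    return None


# Synthetic stand-in for fs.walk(...); the paths are invented for the example.
fake_walk = [
    ("bucket/dev1/0001", [], ["notes.txt", "00000001.MF4"]),
    ("bucket/dev1/0002", [], ["00000001.mf4"]),
]
print(first_match(fake_walk))
```

With a real filesystem object the call would presumably look like `first_match(fs.walk("canedge-live-demo-2"))`.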

# Convert to fsspec filecache

The example below is taken verbatim from the Anaconda article [Introducing Remote Content Caching with FSSpec](https://www.anaconda.com/fsspec-remote-caching/).

In [4]:
import fsspec
of = fsspec.open("filecache://anaconda-public-datasets/iris/iris.csv", mode='rt', 
                 cache_storage='/tmp/cache1',
                 target_protocol='s3', target_options={'anon': True})
with of as f:
    print(f.readline())

5.1,3.5,1.4,0.2,Iris-setosa



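The same pattern applies to the MDF file found above: open it in binary mode through `filecache`, passing the S3 credentials as `target_options`.

In [7]: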
cache_dir = "/tmp/mdf_cache"
fsspec_kwargs = {
    "urlpath": f"filecache://{mdf_path}",
    "mode": 'rb', 
    "cache_storage": cache_dir,
    "target_protocol": 's3',
    "target_options": s3_cfg,
}
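When several files need the same settings, the dictionary above can be produced by a small function. The following is only a sketch: `filecache_kwargs` is a hypothetical helper name, and the bucket path and the `{"anon": True}` options in the usage line are placeholders rather than values from this notebook.

```python
def filecache_kwargs(path, cache_dir, target_options, protocol="s3"):
    """Build the keyword arguments for fsspec.open() with a filecache layer.

    Hypothetical helper; it only assembles the same dictionary shape
    that is written out by hand in the cell above.
    """
    return {
        "urlpath": f"filecache://{path}",
        "mode": "rb",
        "cache_storage": cache_dir,
        "target_protocol": protocol,
        "target_options": target_options,
    }


# Placeholder path and options, used only to show the resulting structure.
kwargs = filecache_kwargs("my-bucket/logs/example.mf4", "/tmp/mdf_cache", {"anon": True})
print(kwargs["urlpath"])
```

The result would then be passed on as `fsspec.open(**kwargs)`.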

In [8]:
import shutil
import time

In [10]:
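# Start from an empty cache so the first read has to download from S3.
shutil.rmtree(cache_dir, ignore_errors=True)

t1 = time.time()
with fsspec.open(**fsspec_kwargs) as f:
    mdf = MDF(f)  # first read: fetched from S3 and written to the cache
t2 = time.time()
with fsspec.open(**fsspec_kwargs) as f:
    mdf2 = MDF(f)  # second read: served from the local cache
t3 = time.time()

print(f"Uncached Read: {t2-t1}s")
print(f"Cached Read: {t3-t2}s")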

Uncached Read: 1.5884783267974854s
Cached Read: 0.3726973533630371s
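To put the two timings in relation, the short calculation below derives the speedup and the time saved from the numbers printed above. It is a single-run comparison on one file, so treat the ratio as indicative rather than a benchmark. The variable names are chosen for this example.

```python
# Timings copied from the output above (seconds).
uncached_s = 1.5884783267974854
cached_s = 0.3726973533630371

speedup = uncached_s / cached_s   # how many times faster the cached read was
saved_s = uncached_s - cached_s   # absolute time saved on the second read

print(f"Speedup: {speedup:.1f}x, time saved: {saved_s:.2f}s")
```

For this file the cached read is roughly four times faster than the initial download.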


# All S3 Files: Uncached vs. Cached

Collect the path of every `.mf4` file in the bucket.

In [13]:
mdf_paths = []
for root, dirs, files in fs.walk("canedge-live-demo-2"):
    for file in files:
        if file.lower().endswith(".mf4"):
            # S3 keys use "/" regardless of the local OS.
            mdf_paths.append(f"{root}/{file}")
len(mdf_paths)

Read every file twice: the first pass starts from an empty cache and downloads from S3, the second pass is served from the local cache.

In [None]:
mdfs_uncached = []
mdfs_cached = []

cache_dir = "/tmp/mdf_cache"


def open_cached(mdf_path):
    """Open an S3 object through the fsspec filecache layer."""
    return fsspec.open(
        f"filecache://{mdf_path}",
        mode="rb",
        cache_storage=cache_dir,
        target_protocol="s3",
        target_options=s3_cfg,
    )


# Start from an empty cache so the first pass downloads every file.
shutil.rmtree(cache_dir, ignore_errors=True)

t1 = time.time()
for mdf_path in mdf_paths:
    with open_cached(mdf_path) as f:
        mdfs_uncached.append(MDF(f))
t2 = time.time()

# Second pass: every file should now be served from the local cache.
for mdf_path in mdf_paths:
    with open_cached(mdf_path) as f:
        mdfs_cached.append(MDF(f))
t3 = time.time()

print(f"Uncached Read: {t2-t1}s")
print(f"Cached Read: {t3-t2}s")
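Caching a whole bucket trades local disk space for read speed, since `filecache` keeps local copies of the files it has fetched. A minimal sketch for checking how much space a cache directory occupies is given below. `dir_size_bytes` is a hypothetical helper, and the demonstration runs on a throwaway temporary directory instead of `/tmp/mdf_cache` so that it works offline.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Offline demonstration: write a 1 KiB file into a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.bin"), "wb") as fh:
        fh.write(b"\x00" * 1024)
    demo_size = dir_size_bytes(demo_dir)

print(demo_size)
```

After the loops above have run, `dir_size_bytes(cache_dir)` would report the footprint of the MDF cache.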