# Now let's look at ways to optimise the model for our specific use case


In [8]:
# Install medcat
! pip install medcat==1.8.0
try:
    from medcat.cat import CAT
except ImportError:
    # a freshly upgraded package may not import until the runtime is restarted
    print("WARNING: The runtime will restart automatically. Please run the remaining cells afterwards.")
    exit()

Collecting medcat==1.8.0
  Downloading medcat-1.8.0-py3-none-any.whl (182 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m182.4/182.4 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
Installing collected packages: medcat
  Attempting uninstall: medcat
    Found existing installation: medcat 1.7.1
    Uninstalling medcat-1.7.1:
      Successfully uninstalled medcat-1.7.1
Successfully installed medcat-1.8.0

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


**Restart the runtime if on Colab; this is sometimes necessary after installing packages.**

In [9]:
from medcat.utils import memory_optimiser

In [10]:
DATA_DIR = "./data_p3.3/"
# no shell-side assignment needed: IPython expands $DATA_DIR in `!` commands from this Python variable
model_pack_path = DATA_DIR + "medmen_wstatus_2021_oct.zip"

In [11]:
# Download the models and required data
!wget -N https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/medmen_wstatus_2021_oct.zip -P $DATA_DIR

--2023-07-07 10:39:55--  https://medcat.rosalind.kcl.ac.uk/media/medmen_wstatus_2021_oct.zip
Resolving medcat.rosalind.kcl.ac.uk (medcat.rosalind.kcl.ac.uk)... 193.61.202.225
Connecting to medcat.rosalind.kcl.ac.uk (medcat.rosalind.kcl.ac.uk)|193.61.202.225|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘./data_p3.3/medmen_wstatus_2021_oct.zip’ not modified on server. Omitting download.



## Loading the MedCAT model pack

In [12]:
# Load the model pack and create a CAT object (CAT is MedCAT's main class for concept annotation)
cat = CAT.load_model_pack(model_pack_path)




## Ability to save the CDB in JSON format to reduce load times
MedCAT model save files can take a long time to load off disk.
Because of this, we've added a method that allows part of the model's CDB
to be saved in JSON format, which can be read off disk faster.

Note, however, that the faster loading comes at the cost of a larger
footprint on disk. In limited testing, a model loaded around 35% faster
while its size on disk grew by around 35%.
The size increase applies to the unzipped model pack; the compressed
.zip files should be roughly the same size in either format.
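To check the disk-size side of this trade-off on your own models, you could compare the total size of the two unzipped model pack folders. Below is a minimal sketch using only the standard library; `dir_size_bytes` and `relative_change` are hypothetical helpers (not part of MedCAT), and the folder names in the usage comment are placeholders for your own paths.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # skip symlinks so linked files are not counted twice
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def relative_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, e.g. 0.35 for +35%."""
    if before == 0:
        raise ValueError("baseline size must be non-zero")
    return (after - before) / before


# Hypothetical usage with two unzipped model pack folders (placeholder paths):
# dill_size = dir_size_bytes("./data_p3.3/<dill model pack folder>")
# json_size = dir_size_bytes("./data_p3.3/cdb_json_model/<json model pack folder>")
# print(f"size change: {relative_change(dill_size, json_size):+.0%}")
```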

PS! Memory-optimised models cannot be meaningfully loaded using medcat versions before 1.8.0.

In [13]:
import os

save_dir = os.path.join(DATA_DIR, 'cdb_json_model')
# save the model pack with the CDB in JSON format
cat.create_model_pack(save_dir, cdb_format='json')  # the default format is dill

'medcat_model_pack_3754129a0c28ebbf'
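To see whether the JSON CDB actually loads faster for your model, you could time repeated loads of each model pack. This is a minimal sketch that assumes only that the loader is a callable taking a path (such as `CAT.load_model_pack`); `time_loads` is a hypothetical helper and the zip paths in the usage comment are placeholders.

```python
import time
from typing import Any, Callable, List


def time_loads(loader: Callable[[str], Any], path: str, repeats: int = 3) -> List[float]:
    """Call `loader(path)` `repeats` times and return wall-clock durations in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        obj = loader(path)
        durations.append(time.perf_counter() - start)
        del obj  # drop the loaded object before the next run
    return durations


# Hypothetical usage (placeholder paths for your own model packs):
# from medcat.cat import CAT
# dill_times = time_loads(CAT.load_model_pack, "./data_p3.3/<dill pack>.zip")
# json_times = time_loads(CAT.load_model_pack, "./data_p3.3/cdb_json_model/<json pack>.zip")
# print(min(dill_times), min(json_times))
```

Taking the minimum (or median) over several runs reduces the influence of disk caching and background load on a single measurement.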

## Ability to memory-optimise the medcat model

Many MedCAT models take up a lot of memory when loaded.
That's why there's now (since 1.8.0) a method to optimise
the model for lower memory usage. However, as expected,
this comes at the expense of _some_ performance (in terms
of execution time, not model performance).

The user can specify which parts of the memory optimisation
they wish to use. However, limited testing suggests
that the default (optimising CUIs and snames) works best,
and that also optimising names actually increases memory usage
rather than reducing it.
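The general idea behind this kind of optimisation can be illustrated without MedCAT. Several CDB attributes (for example `cui2names` and `cui2count_train`) are separate dicts keyed by the same CUIs, each with its own hash table; one way to cut that overhead is to keep a single dict of per-CUI records and expose the old dicts as lightweight views. The sketch below shows that technique on plain Python dicts. `merge_parallel_dicts` and `ColumnView` are hypothetical names, and this is an assumption about the approach, not MedCAT's actual `memory_optimiser` code.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator, List, Tuple


def merge_parallel_dicts(dicts: List[Dict[str, Any]]) -> Dict[str, Tuple[Any, ...]]:
    """Combine dicts that share exactly the same keys into one dict of per-key tuples."""
    if not dicts:
        raise ValueError("need at least one dict to merge")
    keys = dicts[0].keys()
    for d in dicts[1:]:
        if d.keys() != keys:
            raise ValueError("all dicts must have the same keys")
    # one shared hash table instead of len(dicts) separate ones
    return {key: tuple(d[key] for d in dicts) for key in keys}


class ColumnView(Mapping):
    """Read-only, dict-like view of one position in the merged per-key tuples."""

    def __init__(self, merged: Dict[str, Tuple[Any, ...]], index: int):
        self._merged = merged
        self._index = index

    def __getitem__(self, key: str) -> Any:
        return self._merged[key][self._index]

    def __iter__(self) -> Iterator[str]:
        return iter(self._merged)

    def __len__(self) -> int:
        return len(self._merged)


# Toy stand-ins for two parallel per-CUI dicts (hypothetical values)
cui2names = {"C001": {"fever"}, "C002": {"cough"}}
cui2count = {"C001": 10, "C002": 3}
merged = merge_parallel_dicts([cui2names, cui2count])
# in a real optimisation the original dicts would be dropped and replaced by these views
names_view = ColumnView(merged, 0)
count_view = ColumnView(merged, 1)
print(names_view["C002"], count_view["C001"])  # {'cough'} 10
```

Whether such a change saves memory depends on the data: each merged record is itself an object with its own overhead, so merging tends to pay off only when enough dicts share the same keys.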

In [14]:
# perform memory optimisation
memory_optimiser.perform_optimisation(cat.cdb, optimise_cuis=True,
                                      optimise_names=False,
                                      optimise_snames=True)
# look at the parts that are memory-optimised
cat.cdb._memory_optimised_parts

{'CUIS', 'snames'}

The above method optimises the model's CDB for memory usage.
The resulting model can be saved to disk just like a regular model,
and the memory optimisation will remain in place in the saved model as well.

### Undoing the memory optimisation
There may be reasons to undo the memory optimisations above,
for example to use the model with an older version of MedCAT.
We've provided a method for that.

In [15]:
# undo memory optimisation
memory_optimiser.unoptimise_cdb(cat.cdb)
# the method will look at the CDB and reverse the optimisation process
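Conceptually, undoing this kind of optimisation is the inverse transform: combined per-CUI records are split back into ordinary, independent dicts that older code can use directly. The sketch below illustrates that idea on plain Python dicts; `split_merged_dict` is a hypothetical helper and is not taken from MedCAT's `unoptimise_cdb`.

```python
from typing import Any, Dict, List, Tuple


def split_merged_dict(merged: Dict[str, Tuple[Any, ...]], width: int) -> List[Dict[str, Any]]:
    """Rebuild `width` plain dicts from a dict of per-key tuples."""
    dicts: List[Dict[str, Any]] = [{} for _ in range(width)]
    for key, values in merged.items():
        if len(values) != width:
            raise ValueError(f"expected {width} values for {key!r}, got {len(values)}")
        for i, value in enumerate(values):
            dicts[i][key] = value
    return dicts


# Toy merged store (hypothetical values): CUI -> (names, training count)
merged = {"C001": ({"fever"}, 10), "C002": ({"cough"}, 3)}
cui2names, cui2count = split_merged_dict(merged, 2)
print(cui2names)  # {'C001': {'fever'}, 'C002': {'cough'}}
print(cui2count)  # {'C001': 10, 'C002': 3}
```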