# Hugging Face Setup

Let's quickly make sure Hugging Face is set up and that you can download from the Hugging Face Hub using your access token.

## Installing the Python Libraries

Note that we use various versions of these libraries throughout the course, so make sure to watch the video to know which version to use!

In [1]:
!pip install transformers diffusers datasets evaluate accelerate

Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.1.0-py3-none-any.whl (480 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)

In [2]:
import transformers
transformers.__version__

'4.46.3'
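Because the versions vary across the course, it can help to check all five libraries at once rather than one at a time. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of any Hugging Face package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The five libraries installed by the pip command above.
course_packages = ["transformers", "diffusers", "datasets", "evaluate", "accelerate"]
for name, ver in installed_versions(course_packages).items():
    print(f"{name}: {ver or 'not installed'}")
```

Compare the printed versions with the ones used in the video before moving on.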

In [3]:
from huggingface_hub import notebook_login

In [4]:
notebook_login()


In [5]:
from huggingface_hub import scan_cache_dir

#hf_cache_info = scan_cache_dir()
#print(hf_cache_info)

When you work with Hugging Face's Python libraries, such as the `transformers` library, you'll often download pre-trained models and datasets. These downloaded files are stored locally on your machine to avoid repeated downloads and to ensure quick access in future uses. Let's explore where and how these files are stored.

## Where Are Hugging Face Models Stored?

By default, Hugging Face stores downloaded models under your home directory, inside a hidden folder named `.cache`. Repositories downloaded from the Hub land in the `hub` subfolder of the path below. The typical path looks like this:

- On Unix-based systems (Linux, macOS):
  ```
  ~/.cache/huggingface/
  ```

- On Windows systems:
  ```
  C:\Users\<YourUsername>\.cache\huggingface\
  ```

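**NOTE:** `.cache` is hidden by default! You will need to make hidden files visible to see it. On Windows the folder may already be visible, because Windows hides items by file attribute rather than by a leading dot.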
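You can also check the folder from Python without changing any file explorer settings. This is a small sketch that assumes the default location described above (no environment variables overriding it); `default_hf_dir` is an illustrative helper, not a Hugging Face function.

```python
from pathlib import Path


def default_hf_dir(home=None):
    """Return the default Hugging Face folder under the given home directory."""
    home = Path(home) if home is not None else Path.home()
    return home / ".cache" / "huggingface"


hf_dir = default_hf_dir()
if hf_dir.is_dir():
    # Show the top-level entries, for example "hub" or "datasets".
    for entry in sorted(hf_dir.iterdir()):
        print(entry.name)
else:
    print(f"{hf_dir} does not exist; nothing has been cached in the default location.")
```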

----

Hidden directories are often used to store configuration files and caches. These directories are typically not shown in default file explorer views. Here’s how you can view hidden directories on different operating systems:

## Viewing Hidden Directories on Different Operating Systems

### macOS

On macOS, hidden directories and files (those starting with a dot, such as `.cache`) can be made visible in Finder:

1. **Using Finder:**
   - Open Finder.
   - Press `Command + Shift + .` (period). This will toggle the visibility of hidden files and directories.

2. **Using Terminal:**
   - Open Terminal.
   - To list hidden files in a directory, use the following command:
     ```bash
     ls -la
     ```
   - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.

### Linux

On Linux, hidden files and directories can be viewed in the file manager or terminal:

1. **Using File Manager (e.g., Nautilus):**
   - Open your file manager.
   - Press `Ctrl + H`. This will toggle the visibility of hidden files and directories.

2. **Using Terminal:**
   - Open Terminal.
   - To list hidden files in a directory, use the following command:
     ```bash
     ls -la
     ```
   - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.

### Windows

On Windows, hidden files and directories can be viewed in File Explorer:

1. **Using File Explorer:**
   - Open File Explorer.
   - Click on the `View` tab at the top (on Windows 11, open the `View` menu and choose `Show`).
   - Check the box for `Hidden items` in the Show/hide group. This will toggle the visibility of hidden files and directories.

2. **Using Command Prompt:**
   - Open Command Prompt.
   - To list hidden files in a directory, use the following command:
     ```cmd
     dir /a
     ```
   - The `/a` flag lists all files, including hidden ones.

## Summary

Viewing hidden directories on different operating systems is straightforward:

- **macOS:** Press `Command + Shift + .` in Finder or use `ls -la` in Terminal.
- **Linux:** Press `Ctrl + H` in the file manager or use `ls -la` in Terminal.
- **Windows:** Check `Hidden items` in File Explorer’s View tab or use `dir /a` in Command Prompt.

These methods allow you to easily access and manage hidden files and directories on your system.
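If you prefer a method that works the same way on all three systems, Python can list dot-prefixed entries directly. The sketch below only looks at names starting with a dot, so it does not detect items hidden through the Windows hidden attribute; `list_dot_entries` is a hypothetical helper name.

```python
from pathlib import Path


def list_dot_entries(directory):
    """Return the sorted names of entries in `directory` that start with a dot."""
    return sorted(p.name for p in Path(directory).iterdir() if p.name.startswith("."))


home = Path.home()
if home.is_dir():
    # `.cache` should appear here once something has been cached.
    print(list_dot_entries(home))
```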

----

**Ok, let's move on, back to Hugging Face topics!**

The Hugging Face Python libraries store downloaded models in a centralized cache directory. This cache system is designed to be shared across various libraries that depend on the Hugging Face Hub. Here is a detailed explanation of where and how these models are stored:

## Cache Directory Structure

The cache directory is typically located in the user's home directory, but it can be customized by passing the `cache_dir` argument to download and loading methods (for example `from_pretrained`) or by setting the `HF_HOME` or `HF_HUB_CACHE` environment variables. The structure of the cache directory is as follows:

```
<CACHE_DIR>
├─ <MODELS>
├─ <DATASETS>
├─ <SPACES>
```

Within the cache, each repository gets its own folder whose name joins the repository type, the namespace (if applicable), and the repository name with `--`. For example:

```
<CACHE_DIR>
├─ models--julien-c--EsperBERTo-small
├─ models--lysandrejik--arxiv-nlp
├─ models--bert-base-cased
├─ datasets--glue
├─ datasets--huggingface--DataMeasurementsFiles
├─ spaces--dalle-mini--dalle-mini
```
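The folder names above can be turned back into a repository type and ID with a few string operations. This is a sketch of the naming convention as shown in the listing, not the library's own parser; `parse_cache_folder` is an illustrative name.

```python
def parse_cache_folder(folder_name):
    """Split a cache folder name into (repo_type, repo_id)."""
    type_plural, _, rest = folder_name.partition("--")
    if not rest or type_plural not in {"models", "datasets", "spaces"}:
        raise ValueError(f"not a recognized cache folder name: {folder_name!r}")
    repo_type = type_plural[:-1]        # "models" becomes "model"
    repo_id = rest.replace("--", "/")   # the namespace separator becomes "/"
    return repo_type, repo_id


for name in ["models--julien-c--EsperBERTo-small", "models--bert-base-cased", "datasets--glue"]:
    print(name, "->", parse_cache_folder(name))
```

Repositories without a namespace, such as `bert-base-cased`, simply have no `/` in the resulting ID.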

## Detailed Folder Structure

Each repository folder contains subfolders that store different types of files, such as references, blobs, and snapshots. Here is an example of the folder structure for a dataset:

```
<CACHE_DIR>
├─ datasets--glue
│   ├─ refs
│   ├─ blobs
│   ├─ snapshots
```
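In this layout, `refs` holds small files named after branches or tags that each contain a commit hash, `blobs` holds the downloaded file contents, and `snapshots` holds one folder per revision whose entries point at those blobs. The sketch below walks one repository folder using only the standard library; `describe_repo` is a hypothetical helper, and for real use the `scan_cache_dir` function is the supported way to inspect the cache.

```python
from pathlib import Path


def describe_repo(repo_dir):
    """Summarize one cached repository folder: refs, snapshot names, blob count."""
    repo_dir = Path(repo_dir)

    # Each file under refs/ is named after a branch or tag and holds a commit hash.
    refs = {}
    refs_dir = repo_dir / "refs"
    if refs_dir.is_dir():
        for ref_file in refs_dir.rglob("*"):
            if ref_file.is_file():
                ref_name = ref_file.relative_to(refs_dir).as_posix()
                refs[ref_name] = ref_file.read_text().strip()

    # Each folder under snapshots/ is named after a revision.
    snapshots_dir = repo_dir / "snapshots"
    snapshots = []
    if snapshots_dir.is_dir():
        snapshots = sorted(p.name for p in snapshots_dir.iterdir() if p.is_dir())

    # Each file under blobs/ is one stored file, named by its hash.
    blobs_dir = repo_dir / "blobs"
    blob_count = 0
    if blobs_dir.is_dir():
        blob_count = sum(1 for p in blobs_dir.iterdir() if p.is_file())

    return {"refs": refs, "snapshots": snapshots, "blob_count": blob_count}
```

Pass it the full path of a folder such as `datasets--glue` inside your cache directory.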

## Managing the Cache

### Scanning the Cache

To manage and inspect the cache, you can use the `huggingface-cli` tool or the `scan_cache_dir` function from the `huggingface_hub` library. This allows you to see which repositories and revisions are taking up disk space. For example:

```python
from huggingface_hub import scan_cache_dir

hf_cache_info = scan_cache_dir()
print(hf_cache_info)
```

This will return an `HFCacheInfo` object containing details about the cached repositories, their sizes, and any warnings about corrupted caches.
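Beyond printing the whole object, you can read its fields to find what is using the most space. The sketch below relies on the `repos` attribute of `HFCacheInfo` and on the `repo_type`, `repo_id`, and `size_on_disk` attributes of each cached repository; the helper name `largest_repos` is made up for illustration.

```python
def largest_repos(cache_info, top=3):
    """Return (repo_type, repo_id, size_in_bytes) for the largest cached repos."""
    ranked = sorted(cache_info.repos, key=lambda repo: repo.size_on_disk, reverse=True)
    return [(repo.repo_type, repo.repo_id, repo.size_on_disk) for repo in ranked[:top]]


# Example use (requires huggingface_hub and an existing cache directory):
#     from huggingface_hub import scan_cache_dir
#     for row in largest_repos(scan_cache_dir()):
#         print(row)
```

The helper only reads attributes, so it can be reused on any scan result without rescanning the disk.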

### Example Command

Using the `huggingface-cli` to scan the cache:

```bash
huggingface-cli scan-cache
```

This command will output a detailed report of the cache, including repository IDs, types, sizes, and paths.

## Customizing the Cache Directory

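You can customize the cache directory by setting the `cache_dir` argument in methods or by using environment variables. Libraries built on top of the Hub can also keep their own downstream files in a separate assets cache, whose location `cached_assets_path` returns. For example: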

```python
from huggingface_hub import cached_assets_path

path = cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="download")
print(path)
```

This will return the path to the cached assets for the specified library, namespace, and subfolder.
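To make the environment-variable route concrete, here is a simplified sketch of how the Hub cache location follows from `HF_HUB_CACHE` and `HF_HOME`. It is an illustration under stated assumptions, not the library's own code: it ignores other variables the library may consult (such as `XDG_CACHE_HOME`), and `resolve_hub_cache` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Sketch of how the Hub cache folder is chosen from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # Most specific setting: points directly at the Hub cache folder.
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        # HF_HOME moves the whole Hugging Face folder; the Hub cache is its "hub" subfolder.
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # Default location under the home directory.
    return Path.home() / ".cache" / "huggingface" / "hub"


print(resolve_hub_cache())
```

In practice, set these variables before importing any Hugging Face library (for example in your shell profile), since the libraries generally read them at import time.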

## Conclusion

The Hugging Face cache system is designed to efficiently store and manage models, datasets, and other resources. By understanding the structure and management tools available, users can effectively control their cache usage and ensure optimal performance.

For more detailed information, you can refer to the Hugging Face documentation on managing the cache system [1][3][5][7].

Citations:
[1] https://huggingface.co/docs/huggingface_hub/guides/manage-cache
[2] https://discuss.huggingface.co/t/model-caching-and-locking/44152
[3] https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache
[4] https://huggingface.co/docs/hub/en/models
[5] https://huggingface.co/docs/huggingface_hub/package_reference/cache
[6] https://huggingface.co/docs/hub/en/models-libraries
[7] https://huggingface.co/docs/huggingface_hub/en/package_reference/cache
[8] https://huggingface.co/docs/hub/en/models-adding-libraries
[9] https://discuss.huggingface.co/t/how-to-save-my-model-to-use-it-later/20568
[10] https://huggingface.co/docs/transformers/en/main_classes/model