# Navigating the Filesystem

This notebook covers the essential skills for navigating and managing files and directories, a fundamental part of handling experimental data in neuroscience research. We will explore commands and techniques for organizing and accessing your experimental data efficiently, so that it fits smoothly into your analysis workflow.

Run the following code to create the data for this notebook:

In [1]:
from pathlib import Path
paths = [
    "data/exp1/joey_2021-05-01_001/spikes.npy", 
    "data/exp1/joey_2021-05-02_001/spikes.npy", 
    "data/exp1/joey_2021-05-02_001/lfps.h5", 
    "data/exp1/phoebe_2021-05-02_001/spikes.npy",
    "data/exp1/phoebe_2021-05-03_001/spikes.npy", 
    "data/exp1/phoebe_2021-05-03_001/lfps.h5", 
    "data/exp1/phoebe_2021-05-04_001/spikes.npy",
]

for path in paths:
    path = Path(path)
    path.parent.mkdir(exist_ok=True, parents=True)
    path.touch()

## Using the pathlib library

The pathlib module in Python provides an object-oriented approach to file system paths. This section familiarizes you with the library so that you can handle file paths and directories more flexibly and intuitively. We'll cover basic operations such as listing directories and globbing for pattern matching, all through `Path` objects.

| Command | Purpose |
| :-- | :-- |
| `from pathlib import Path` | Import the `Path` class. |
| `Path('.').resolve()` | Get the absolute path of the current working directory. |
| `path = Path('./data')` | Make a `Path` object that points at the data folder of the working directory. |
| `list(path.iterdir())` | List all the files and folders in the specified path. |
| `new_path = path.joinpath("raw")` | Append the "raw" folder to the current path. |
| `new_path = path / "raw"` | Also append the "raw" folder to the current path. |
| `list(path.glob('*.h5'))` | Search for files that end in ".h5" in the current path. |
| `list(path.glob('data*'))` | Search for files that start with "data" in the current path. |
| `list(path.glob('**/data*'))` | Search for files that start with "data" in the current path and any of its subfolders. |


In [2]:
from pathlib import Path
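
Before working through the exercises, the commands in the table can be tried out in a throwaway folder. The sketch below is only an illustration: the folder and file names (`data`, `raw`, `session1.npy`) are made up for the example, and `tempfile` is used so that nothing in your working directory is touched.

```python
from pathlib import Path
import tempfile

# Work in a temporary folder so the sketch does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Both spellings build the same path; neither creates anything on disk.
    raw = root / "data" / "raw"
    same = root.joinpath("data", "raw")
    assert raw == same

    # Create the folder and one placeholder file, then list the folder.
    raw.mkdir(parents=True)
    (raw / "session1.npy").touch()
    names = sorted(p.name for p in raw.iterdir())
    print(names)

    # A Path object exposes the pieces of the path as attributes.
    first = raw / "session1.npy"
    print(first.name, first.stem, first.suffix, first.parent.name)
```

The `/` operator and `joinpath()` only describe a location; the folder exists on disk only after `mkdir()` is called.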

What is the current working directory?

In [5]:
path = Path()  # Path() with no arguments refers to the current working directory
path.resolve()


WindowsPath('C:/iBOTS course/file_and_data_management/iBOTS-File-And-Data-Management/day1')

What files and folders are inside the current working directory?

In [6]:
list(path.iterdir())

[WindowsPath('1_organizing-data-into-dictionaries.ipynb'),
 WindowsPath('2_parsing_metadata_from_filenames.ipynb'),
 WindowsPath('3_navigating_filesystems_pathlib_fsspec_objects.ipynb'),
 WindowsPath('4_webdav_sciebo.ipynb'),
 WindowsPath('data')]

What files and folders are inside the "exp1" directory, inside the "data" directory?

In [11]:
pathdata = Path('./data/exp1')
list(pathdata.iterdir())

[WindowsPath('data/exp1/joey_2021-05-01_001'),
 WindowsPath('data/exp1/joey_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001')]

What folders in exp1 start with the subject "phoebe" (Hint: use Path().glob())?

In [15]:
list(pathdata.glob("phoebe*"))  # no leading "*", so the name must start with "phoebe"

[WindowsPath('data/exp1/phoebe_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001')]

What folders in exp1 start with the subject "joey"?

In [16]:
list(pathdata.glob("joey*"))  # no leading "*", so the name must start with "joey"

[WindowsPath('data/exp1/joey_2021-05-01_001'),
 WindowsPath('data/exp1/joey_2021-05-02_001')]

What folders in exp1 were recorded on the 2nd of May (Hint: glob on the date part of the folder name)?

In [17]:
list(pathdata.glob("*05-02*"))

[WindowsPath('data/exp1/joey_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001')]
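
The patterns used so far rely only on `*`. Glob patterns also support `?` (exactly one character) and `[...]` (one character from a set), which helps when several dates or name lengths should match. The sketch below recreates the five session folders from this notebook in a temporary directory; the two patterns are illustrative choices, not part of the original exercises.

```python
from pathlib import Path
import tempfile

sessions = [
    "joey_2021-05-01_001",
    "joey_2021-05-02_001",
    "phoebe_2021-05-02_001",
    "phoebe_2021-05-03_001",
    "phoebe_2021-05-04_001",
]

with tempfile.TemporaryDirectory() as tmp:
    exp1 = Path(tmp) / "exp1"
    for name in sessions:
        (exp1 / name).mkdir(parents=True)

    # "[23]" matches exactly one character that is either 2 or 3.
    may_2_or_3 = sorted(p.name for p in exp1.glob("*_2021-05-0[23]_*"))

    # "?" matches exactly one character, so "????_" only fits four-letter subjects.
    four_letter_subjects = sorted(p.name for p in exp1.glob("????_*"))

print(may_2_or_3)
print(four_letter_subjects)
```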

What files have the ".h5" file extension (include all files in any subfolders of exp1)?

In [19]:
list(pathdata.glob('./**/*.h5'))

[WindowsPath('data/exp1/joey_2021-05-02_001/lfps.h5'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001/lfps.h5')]

What files have the ".npy" file extension (include all files in any subfolders of exp1)?

In [20]:
list(pathdata.glob('./**/*.npy'))

[WindowsPath('data/exp1/joey_2021-05-01_001/spikes.npy'),
 WindowsPath('data/exp1/joey_2021-05-02_001/spikes.npy'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001/spikes.npy'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001/spikes.npy'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001/spikes.npy')]
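
The recursive `**` pattern used in the last two cells has a shorthand, `Path.rglob()`. The sketch below builds a small copy of the exp1 layout in a temporary directory and shows that `glob('**/*.h5')` and `rglob('*.h5')` find the same files, while a plain `'*.h5'` only looks at the top level.

```python
from pathlib import Path
import tempfile

files = [
    "exp1/joey_2021-05-02_001/spikes.npy",
    "exp1/joey_2021-05-02_001/lfps.h5",
    "exp1/phoebe_2021-05-03_001/spikes.npy",
    "exp1/phoebe_2021-05-03_001/lfps.h5",
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in files:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    # Recursive search, written two equivalent ways.
    via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.h5"))
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.h5"))

    # Without "**", only the top level of root is searched.
    top_level_only = list(root.glob("*.h5"))

print(via_glob)
print(via_rglob)
print(top_level_only)
```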

Which of phoebe's files contain lfp data?

In [22]:
list(pathdata.glob('./*phoebe*/*lfp*'))  # "*" matches any run of characters within a single folder or file name

[WindowsPath('data/exp1/phoebe_2021-05-03_001/lfps.h5')]
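
Because the session folders follow a consistent `subject_date_number` naming pattern, listing them can be combined with simple string splitting to organize sessions by subject. This is a minimal sketch under that assumption; the variable name `run_id` for the trailing `001` part is a guess at its meaning, and the folders are recreated in a temporary directory.

```python
from collections import defaultdict
from pathlib import Path
import tempfile

session_names = [
    "joey_2021-05-01_001",
    "joey_2021-05-02_001",
    "phoebe_2021-05-02_001",
    "phoebe_2021-05-03_001",
    "phoebe_2021-05-04_001",
]

with tempfile.TemporaryDirectory() as tmp:
    exp1 = Path(tmp) / "exp1"
    for name in session_names:
        (exp1 / name).mkdir(parents=True)

    grouped = defaultdict(list)
    for folder in sorted(exp1.iterdir()):
        # Assumes every folder name has exactly three "_"-separated parts.
        subject, date, run_id = folder.name.split("_")
        grouped[subject].append(date)

sessions_by_subject = dict(grouped)
print(sessions_by_subject)
```

If a folder name did not follow the pattern, the three-way unpacking would raise a `ValueError`, which is a useful early warning about inconsistent naming.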

## Accessing Remote File Systems using `fsspec`

In modern neuroscience research, accessing and manipulating data stored in remote file systems is increasingly common. This section introduces fsspec, a library for interacting with various file systems, including remote and cloud-based storage. We'll explore how to list, search, and manage files on different remote systems, an invaluable skill in a data-intensive field like neuroscience.


| Code | Description |
| :-- | :-- |
| `fs.ls(path)` | Lists the files and directories at the given path of the filesystem. |
| `fs.glob('*.h5')` | Searches for files matching a pattern (here, all files ending with '.h5' in the root directory; add `**` to the pattern to descend into subdirectories). |
| `fs.makedirs(path)` | Creates a new directory at the specified path, including any necessary intermediate directories. |
| `fs.rmdir(path)` | Removes a directory, which must be empty. |
| `fs.rm(path)` | Removes (deletes) a file, or a directory and its contents when `recursive=True` is passed. |
| `fs.read_text(path)` | Reads the contents of a file and returns it as a string. |
| `fs.read_bytes(path)` | Reads the contents of a file and returns it as bytes. |
| `fs.download(rpath, lpath)` | Downloads a file from the remote filesystem to the local filesystem. |
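
The calls in the table can be tried without a network connection on fsspec's built-in in-memory filesystem. The sketch below is an illustration only: the folder name `/fsspec_demo` and the file contents are made up, and `fs.pipe()` (which writes bytes to a path) is used just to create something to list. The `ls`, `glob`, `read_text` and `rm` calls are the same ones used on remote filesystems.

```python
import fsspec

# "memory" is fsspec's in-memory filesystem; nothing is written to disk.
fs = fsspec.filesystem("memory")
root = "/fsspec_demo"  # hypothetical folder name used only for this sketch

# Create a few small files to explore.
fs.pipe(f"{root}/anscombe.csv", b"dataset,x,y\nI,10.0,8.04\n")
fs.pipe(f"{root}/README.md", b"demo files\n")
fs.pipe(f"{root}/png/img1.png", b"not a real image")

listing = fs.ls(root, detail=False)              # names directly inside root
csv_files = fs.glob(f"{root}/*.csv")             # pattern match inside root
text = fs.read_text(f"{root}/anscombe.csv")      # file contents as a string

print(listing)
print(csv_files)
print(text)

# Clean up the demo folder and everything in it.
fs.rm(root, recursive=True)
```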


### GitHub Repos as a Remote Filesystem

GitHub, a platform widely used for code sharing and collaboration, can also serve as a remote filesystem for data storage and retrieval. This section guides you through using GitHub repositories for accessing and managing data files, leveraging the `GithubFileSystem` class in `fsspec`. 



**Exercises**: Explore navigating remote GitHub filesystems using `fsspec`'s `GithubFileSystem` class.

In [24]:
%pip install fsspec

Collecting fsspec
  Downloading fsspec-2024.2.0-py3-none-any.whl.metadata (6.8 kB)
Downloading fsspec-2024.2.0-py3-none-any.whl (170 kB)
Installing collected packages: fsspec
Successfully installed fsspec-2024.2.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install requests

Collecting requests
Installing collected packages: urllib3, idna, charset-normalizer, certifi, requests
Successfully installed certifi-2024.2.2 charset-normalizer-3.3.2 idna-3.6 requests-2.31.0 u

In [1]:
import fsspec
from fsspec.implementations.github import GithubFileSystem

**Example**: List all the files in the root directory of https://github.com/ibehave-ibots/iBOTS-Tools

In [12]:
from fsspec.implementations.github import GithubFileSystem
fs = GithubFileSystem(org="ibehave-ibots", repo="iBOTS-Tools")
fs.ls('')

['src/commit-scoreboard', 'src/workshop-registration']

List all the files in the root directory of https://github.com/mwaskom/seaborn-data

In [13]:
fs = GithubFileSystem(org="mwaskom", repo="seaborn-data")
fs.ls('')

['README.md',
 'anagrams.csv',
 'anscombe.csv',
 'attention.csv',
 'brain_networks.csv',
 'car_crashes.csv',
 'dataset_names.txt',
 'diamonds.csv',
 'dots.csv',
 'dowjones.csv',
 'exercise.csv',
 'flights.csv',
 'fmri.csv',
 'geyser.csv',
 'glue.csv',
 'healthexp.csv',
 'iris.csv',
 'mpg.csv',
 'penguins.csv',
 'planets.csv',
 'png',
 'process',
 'raw',
 'seaice.csv',
 'taxis.csv',
 'tips.csv',
 'titanic.csv']

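List all the files whose filenames contain the letter "p" (i.e. "glob" the files). Which pattern would match only the names that start with "p"?

In [15]:
fs.glob('*p*')  # matches any name that contains "p"
# fs.glob('p*') would match only the names that start with "p"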

['healthexp.csv',
 'mpg.csv',
 'penguins.csv',
 'planets.csv',
 'png',
 'process',
 'tips.csv']

List all the files whose filenames end in the ".csv" extension.

In [16]:
fs.glob('*.csv') #ends with csv

['anagrams.csv',
 'anscombe.csv',
 'attention.csv',
 'brain_networks.csv',
 'car_crashes.csv',
 'diamonds.csv',
 'dots.csv',
 'dowjones.csv',
 'exercise.csv',
 'flights.csv',
 'fmri.csv',
 'geyser.csv',
 'glue.csv',
 'healthexp.csv',
 'iris.csv',
 'mpg.csv',
 'penguins.csv',
 'planets.csv',
 'seaice.csv',
 'taxis.csv',
 'tips.csv',
 'titanic.csv']

List all the PNG image files in the "png" folder.

In [20]:
fs.glob('png/*.png')

['png/img1.png',
 'png/img2.png',
 'png/img3.png',
 'png/img4.png',
 'png/img5.png',
 'png/img6.png']

Download all the PNG image files in the "png" folder.

In [22]:
png_files = fs.glob('png/*.png')
for file in png_files:
    fs.download(file, file.split("/")[-1])  # save under the last path component, e.g. "img1.png"
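
The loop above saves each image into the working directory under its bare file name. A possible variation, sketched below without any network access, is to compute the local targets with `pathlib` so that the downloads land in their own folder. The folder name `downloads` is a hypothetical choice, and the remote names are taken from the listing above.

```python
from pathlib import Path, PurePosixPath

remote_files = ["png/img1.png", "png/img2.png", "png/img3.png"]
download_dir = Path("downloads")  # hypothetical local target folder

# Remote paths always use "/", so PurePosixPath is used to take them apart.
local_targets = {
    remote: download_dir / PurePosixPath(remote).name
    for remote in remote_files
}

for remote, local in local_targets.items():
    # With a live filesystem object, the copy itself would be:
    #     download_dir.mkdir(exist_ok=True)
    #     fs.download(remote, str(local))
    print(remote, "->", local.as_posix())
```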

List all the files in the root directory of the repo, with `detail=True` (i.e. `fs.ls("/", detail=True)`).  What information does it give us about these files?

In [23]:
fs.ls("/", detail=True)

[{'name': 'README.md',
  'mode': '100644',
  'type': 'file',
  'size': 3101,
  'sha': '453ab596a15d1f38f2514770783bda43d97ed755'},
 {'name': 'anagrams.csv',
  'mode': '100644',
  'type': 'file',
  'size': 361,
  'sha': '1d88d051b7fff295350bc2ed509b1946d41190b4'},
 {'name': 'anscombe.csv',
  'mode': '100644',
  'type': 'file',
  'size': 556,
  'sha': '62792b68fa5eed40eb75fe00e8daeaaf700f4f82'},
 {'name': 'attention.csv',
  'mode': '100644',
  'type': 'file',
  'size': 1198,
  'sha': '8d1f684e36f36aea05b10408c055eb4b30a3fcef'},
 {'name': 'brain_networks.csv',
  'mode': '100644',
  'type': 'file',
  'size': 1075911,
  'sha': '1ca1f474fa81aa8ee01654da5d6c9fd90c96fa27'},
 {'name': 'car_crashes.csv',
  'mode': '100644',
  'type': 'file',
  'size': 3301,
  'sha': '2248a441bfbbfb1d5c9fa7dbc9dae641c34829a1'},
 {'name': 'dataset_names.txt',
  'mode': '100644',
  'type': 'file',
  'size': 174,
  'sha': '2a27f085940eba05b41e87bbcc2d8c075c000831'},
 {'name': 'diamonds.csv',
  'mode': '100644',
  't
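
Each entry returned with `detail=True` is a plain dictionary, so ordinary Python is enough to filter or summarize the listing. The sketch below copies the name, type and size of the first seven entries from the output above into a list (the `mode` and `sha` fields are left out for brevity) and looks for the largest CSV file.

```python
# First entries of the detailed listing shown above (mode and sha omitted).
entries = [
    {"name": "README.md", "type": "file", "size": 3101},
    {"name": "anagrams.csv", "type": "file", "size": 361},
    {"name": "anscombe.csv", "type": "file", "size": 556},
    {"name": "attention.csv", "type": "file", "size": 1198},
    {"name": "brain_networks.csv", "type": "file", "size": 1075911},
    {"name": "car_crashes.csv", "type": "file", "size": 3301},
    {"name": "dataset_names.txt", "type": "file", "size": 174},
]

# Keep only the CSV files, then summarize their sizes (in bytes).
csv_entries = [
    e for e in entries
    if e["type"] == "file" and e["name"].endswith(".csv")
]
largest = max(csv_entries, key=lambda e: e["size"])
total_bytes = sum(e["size"] for e in csv_entries)

print(largest["name"], largest["size"])
print(f"{total_bytes / 1024:.1f} KiB in {len(csv_entries)} CSV files")
```

Checking sizes this way before downloading can help avoid pulling large files by accident.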

Read and print the text contents of the "anscombe.csv" file. What data is inside this file?

In [25]:
print(fs.read_text("anscombe.csv"))

dataset,x,y
I,10.0,8.04
I,8.0,6.95
I,13.0,7.58
I,9.0,8.81
I,11.0,8.33
I,14.0,9.96
I,6.0,7.24
I,4.0,4.26
I,12.0,10.84
I,7.0,4.82
I,5.0,5.68
II,10.0,9.14
II,8.0,8.14
II,13.0,8.74
II,9.0,8.77
II,11.0,9.26
II,14.0,8.1
II,6.0,6.13
II,4.0,3.1
II,12.0,9.13
II,7.0,7.26
II,5.0,4.74
III,10.0,7.46
III,8.0,6.77
III,13.0,12.74
III,9.0,7.11
III,11.0,7.81
III,14.0,8.84
III,6.0,6.08
III,4.0,5.39
III,12.0,8.15
III,7.0,6.42
III,5.0,5.73
IV,8.0,6.58
IV,8.0,5.76
IV,8.0,7.71
IV,8.0,8.84
IV,8.0,8.47
IV,8.0,7.04
IV,8.0,5.25
IV,19.0,12.5
IV,8.0,5.56
IV,8.0,7.91
IV,8.0,6.89
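
Since `fs.read_text()` returns an ordinary string, the CSV content can be parsed directly with the standard library. The sketch below uses an excerpt of the output above (only datasets I and IV, pasted in as a string instead of being read from GitHub) and computes the mean of `x` and `y` per dataset.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the text printed above: datasets I and IV only.
text = """dataset,x,y
I,10.0,8.04
I,8.0,6.95
I,13.0,7.58
I,9.0,8.81
I,11.0,8.33
I,14.0,9.96
I,6.0,7.24
I,4.0,4.26
I,12.0,10.84
I,7.0,4.82
I,5.0,5.68
IV,8.0,6.58
IV,8.0,5.76
IV,8.0,7.71
IV,8.0,8.84
IV,8.0,8.47
IV,8.0,7.04
IV,8.0,5.25
IV,19.0,12.5
IV,8.0,5.56
IV,8.0,7.91
IV,8.0,6.89
"""

# Collect the x and y values of each dataset.
xs, ys = defaultdict(list), defaultdict(list)
for row in csv.DictReader(io.StringIO(text)):
    xs[row["dataset"]].append(float(row["x"]))
    ys[row["dataset"]].append(float(row["y"]))

mean_x = {name: sum(v) / len(v) for name, v in xs.items()}
mean_y = {name: sum(v) / len(v) for name, v in ys.items()}
print(mean_x)
print(mean_y)
```

The two datasets look very different row by row, yet their means come out nearly the same.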



**DeepLabCut**: Answer the following questions about the DeepLabCut GitHub Repo: https://github.com/DeepLabCut/DeepLabCut

What files are in the root directory of the DeepLabCut repo?

In [26]:
fs = GithubFileSystem(org="DeepLabCut", repo="DeepLabCut")
fs.ls('')

['.circleci',
 '.codespellrc',
 '.github',
 '.gitignore',
 'AUTHORS',
 'CODE_OF_CONDUCT.md',
 'CONTRIBUTING.md',
 'LICENSE',
 'NOTICE.yml',
 'README.md',
 '_config.yml',
 '_toc.yml',
 'conda-environments',
 'deeplabcut',
 'dlc.py',
 'docker',
 'docs',
 'examples',
 'reinstall.sh',
 'requirements.txt',
 'setup.py',
 'tests',
 'testscript_cli.py',
 'tools']

How many files or folders are in the "openfield-Pranav-2018-10-30" folder, which is in the "examples" folder?  (Tip: the `len()` function can be helpful here.)

In [30]:
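# Note: "**" also descends into subfolders; len(fs.ls('examples/openfield-Pranav-2018-10-30')) would count only the direct children.
len(fs.glob('examples/openfield-Pranav-2018-10-30/**'))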

124

How many files are there, if you include every single file or folder in all the subfolders of the openfield example?

In [32]:
len(fs.glob('examples/openfield*/**'))

124
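
Both cells above use a recursive `**` pattern, which is why they return the same number. The difference between counting direct children and counting everything recursively can be seen on a small local tree. The sketch below uses `pathlib` as a local stand-in for the remote filesystem, and the folder and file names are hypothetical, not the actual contents of the DeepLabCut example.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: one file at the top level and two subfolders.
files = [
    "example/config.yaml",
    "example/frames/a/img0.png",
    "example/frames/a/img1.png",
    "example/videos/clip.mp4",
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in files:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    example = root / "example"
    # Direct children only: config.yaml, frames, videos.
    n_direct = len(list(example.iterdir()))
    # Every file and folder at any depth below "example".
    n_recursive = len(list(example.rglob("*")))

print(n_direct, n_recursive)
```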

Download all the "labeled-data" files in the openfield example (`fs.download(recursive=True)`)

In [None]:
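# Copy the whole "labeled-data" folder of the openfield example into a local folder of the same name.
fs.download('examples/openfield-Pranav-2018-10-30/labeled-data', 'labeled-data', recursive=True)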