# Navigating the Filesystem

Run the following code to create the (empty) example files used in this notebook:

In [75]:
from pathlib import Path
paths = [
    "data/exp1/joey_2021-05-01_001/spikes.npy", 
    "data/exp1/joey_2021-05-02_001/spikes.npy", 
    "data/exp1/joey_2021-05-02_001/lfps.h5", 
    "data/exp1/phoebe_2021-05-02_001/spikes.npy",
    "data/exp1/phoebe_2021-05-03_001/spikes.npy", 
    "data/exp1/phoebe_2021-05-03_001/lfps.h5", 
    "data/exp1/phoebe_2021-05-04_001/spikes.npy",
]

for path in paths:
    path = Path(path)
    path.parent.mkdir(exist_ok=True, parents=True)
    path.touch()

## Using the `pathlib` Library (OOP)

| Command | Purpose |
| :-- | :-- |
| `print()` | | 
| `os.listdir()` | |
| `glob.glob('*.h5')` |  |
| `os.getcwd()` | |
| `os.makedirs()` | |
| `os.removedirs()` | |
| `os.remove()` | |


In [45]:
from pathlib import Path

What is the current working directory?

In [49]:
Path.cwd()

WindowsPath('c:/Users/NickDG/Projects/remoteDuckDB/draft3')

In [52]:
Path('.').resolve()

WindowsPath('C:/Users/NickDG/Projects/remoteDuckDB/draft3')

In [53]:
Path().resolve()

WindowsPath('C:/Users/NickDG/Projects/remoteDuckDB/draft3')

What files and folders are inside the current working directory?

In [56]:
list(Path().iterdir())

[WindowsPath('1_navigating_filesystems_os_fsspec_objects.ipynb'),
 WindowsPath('2_parsing_metadata_from_filenames_str_glob.ipynb'),
 WindowsPath('3_metadata_in_json_arrays_dict.ipynb'),
 WindowsPath('4_sql_across_json_files_with_duckdb_and_hive.ipynb'),
 WindowsPath('5_storing_arrays_flat_npy.ipynb'),
 WindowsPath('6_hdf5.ipynb'),
 WindowsPath('7_sql_schemas_sql_joins_with_duckdb.ipynb'),
 WindowsPath('8_pipelines_finalizing_data_into_parquet_files.ipynb'),
 WindowsPath('data')]

What files and folders are inside the "data" directory?

In [57]:
list(Path("data").iterdir())

[WindowsPath('data/exp1')]

What files and folders are inside the "exp1" directory, inside the "data" directory?

In [58]:
list(Path("data/exp1").iterdir())

[WindowsPath('data/exp1/joey_2021-05-01_001'),
 WindowsPath('data/exp1/joey_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001')]

In [59]:
list(Path().joinpath("data").joinpath("exp1").iterdir())

[WindowsPath('data/exp1/joey_2021-05-01_001'),
 WindowsPath('data/exp1/joey_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001')]
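
The `joinpath()` chain above can also be written with the `/` operator, which `pathlib` overloads to join path segments. A small sketch of the equivalent spellings (it only builds path objects and does not touch the disk):

```python
from pathlib import Path

# Three equivalent ways to build the same relative path.
chained = Path().joinpath("data").joinpath("exp1")
joined = Path().joinpath("data", "exp1")
slashed = Path("data") / "exp1"

print(chained == joined == slashed)
print(slashed.parts)
```

Which spelling to use is mostly a matter of taste; the `/` form tends to read most like the path it produces.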

What folders in exp1 start with the subject "phoebe" (Hint: use Path().glob())?

In [61]:
list(Path("data/exp1").glob("phoebe*"))

[WindowsPath('data/exp1/phoebe_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001')]

What folders in exp1 start with the subject "joey"?

In [62]:
list(Path("data/exp1").glob("joey*"))

[WindowsPath('data/exp1/joey_2021-05-01_001'),
 WindowsPath('data/exp1/joey_2021-05-02_001')]

What folders in exp1 were recorded on the 2nd of May (Hint: glob on the date part of the folder name)?

In [63]:
list(Path("data/exp1").glob("*2021-05-02*"))

[WindowsPath('data/exp1/joey_2021-05-02_001'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001')]

What files have the ".h5" file extension (include all files in any subfolders of exp1)?

In [67]:
list(Path("data/exp1").glob("**/*.h5"))

[WindowsPath('data/exp1/joey_2021-05-02_001/lfps.h5'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001/lfps.h5')]

What files have the ".npy" file extension (include all files in any subfolders of exp1)?

In [66]:
list(Path("data/exp1").glob("**/*.npy"))

[WindowsPath('data/exp1/joey_2021-05-01_001/spikes.npy'),
 WindowsPath('data/exp1/joey_2021-05-02_001/spikes.npy'),
 WindowsPath('data/exp1/phoebe_2021-05-02_001/spikes.npy'),
 WindowsPath('data/exp1/phoebe_2021-05-03_001/spikes.npy'),
 WindowsPath('data/exp1/phoebe_2021-05-04_001/spikes.npy')]

Which of phoebe's files contain lfp data?

In [69]:
list(Path("data/exp1").glob("phoebe*/**/lfps*"))

[WindowsPath('data/exp1/phoebe_2021-05-03_001/lfps.h5')]
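
The glob patterns used in these exercises differ in how deep they search: `*` matches within a single folder level, while `**/` also descends into subfolders. The sketch below rebuilds a small part of the example tree in a temporary directory (so it does not depend on the `data` folder existing) and shows that `rglob("*.h5")` is shorthand for `glob("**/*.h5")`. The file list is a shortened stand-in for the one at the top of the notebook.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "exp1"
    for rel in [
        "joey_2021-05-02_001/lfps.h5",
        "phoebe_2021-05-03_001/lfps.h5",
        "phoebe_2021-05-03_001/spikes.npy",
    ]:
        file = root / rel
        file.parent.mkdir(parents=True, exist_ok=True)
        file.touch()

    # "*" stays within one level: only the session folder itself matches.
    top_level = sorted(p.name for p in root.glob("phoebe*"))

    # "**/" descends into subfolders; rglob("*.h5") is shorthand for glob("**/*.h5").
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.h5"))
    shorthand = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.h5"))

print(top_level)
print(recursive)
print(recursive == shorthand)
```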

## Accessing Remote File Systems using `fsspec`

| Command | Purpose |
| :-- | :-- |
|`fs.ls()` | | 
| `fs.glob('*.h5')` | |
| `fs.makedirs()` | |
| `fs.removedirs()` | |
| `fs.rm()` | |
| `fs.read_text()`|  |
| `fs.read_bytes()` |  |
| `fs.download()`|  |


### GitHub Repos as a Remote Filesystem

In [76]:
import fsspec
from fsspec.implementations.github import GithubFileSystem

https://github.com/mwaskom/seaborn-data

In [82]:
fs = GithubFileSystem(org="mwaskom", repo="seaborn-data")
fs.ls("/")

['README.md',
 'anagrams.csv',
 'anscombe.csv',
 'attention.csv',
 'brain_networks.csv',
 'car_crashes.csv',
 'dataset_names.txt',
 'diamonds.csv',
 'dots.csv',
 'dowjones.csv',
 'exercise.csv',
 'flights.csv',
 'fmri.csv',
 'geyser.csv',
 'glue.csv',
 'healthexp.csv',
 'iris.csv',
 'mpg.csv',
 'penguins.csv',
 'planets.csv',
 'png',
 'process',
 'raw',
 'seaice.csv',
 'taxis.csv',
 'tips.csv',
 'titanic.csv']

In [84]:
fs.glob("p*")

['penguins.csv', 'planets.csv', 'png', 'process']

In [85]:
fs.glob("*.csv")

['anagrams.csv',
 'anscombe.csv',
 'attention.csv',
 'brain_networks.csv',
 'car_crashes.csv',
 'diamonds.csv',
 'dots.csv',
 'dowjones.csv',
 'exercise.csv',
 'flights.csv',
 'fmri.csv',
 'geyser.csv',
 'glue.csv',
 'healthexp.csv',
 'iris.csv',
 'mpg.csv',
 'penguins.csv',
 'planets.csv',
 'seaice.csv',
 'taxis.csv',
 'tips.csv',
 'titanic.csv']

In [86]:
fs.ls("png")

['png/img1.png',
 'png/img2.png',
 'png/img3.png',
 'png/img4.png',
 'png/img5.png',
 'png/img6.png']

In [91]:
fs.download("png/*", "data/seaborn-images")  # note: the glob is needed (apparently download() has to be given files)

In [99]:
import pandas as pd
pd.DataFrame(fs.ls("/", detail=True))

Unnamed: 0,name,mode,type,size,sha
0,README.md,100644,file,3101,453ab596a15d1f38f2514770783bda43d97ed755
1,anagrams.csv,100644,file,361,1d88d051b7fff295350bc2ed509b1946d41190b4
2,anscombe.csv,100644,file,556,62792b68fa5eed40eb75fe00e8daeaaf700f4f82
3,attention.csv,100644,file,1198,8d1f684e36f36aea05b10408c055eb4b30a3fcef
4,brain_networks.csv,100644,file,1075911,1ca1f474fa81aa8ee01654da5d6c9fd90c96fa27
5,car_crashes.csv,100644,file,3301,2248a441bfbbfb1d5c9fa7dbc9dae641c34829a1
6,dataset_names.txt,100644,file,174,2a27f085940eba05b41e87bbcc2d8c075c000831
7,diamonds.csv,100644,file,2772143,92259b40dbeea3165759a8f2cb576896612828ac
8,dots.csv,100644,file,25742,9b7eebf50146fd573b055b3b9f8d2caa57879723
9,dowjones.csv,100644,file,11349,8c35bf1355e823bd2aa119d2f4979c812e898df1


In [102]:

print(fs.read_text("/anscombe.csv").replace(',', '\t'))

dataset	x	y
I	10.0	8.04
I	8.0	6.95
I	13.0	7.58
I	9.0	8.81
I	11.0	8.33
I	14.0	9.96
I	6.0	7.24
I	4.0	4.26
I	12.0	10.84
I	7.0	4.82
I	5.0	5.68
II	10.0	9.14
II	8.0	8.14
II	13.0	8.74
II	9.0	8.77
II	11.0	9.26
II	14.0	8.1
II	6.0	6.13
II	4.0	3.1
II	12.0	9.13
II	7.0	7.26
II	5.0	4.74
III	10.0	7.46
III	8.0	6.77
III	13.0	12.74
III	9.0	7.11
III	11.0	7.81
III	14.0	8.84
III	6.0	6.08
III	4.0	5.39
III	12.0	8.15
III	7.0	6.42
III	5.0	5.73
IV	8.0	6.58
IV	8.0	5.76
IV	8.0	7.71
IV	8.0	8.84
IV	8.0	8.47
IV	8.0	7.04
IV	8.0	5.25
IV	19.0	12.5
IV	8.0	5.56
IV	8.0	7.91
IV	8.0	6.89



https://github.com/DeepLabCut/DeepLabCut/tree/main/examples/openfield-Pranav-2018-10-30/labeled-data/m4s1

In [104]:
fs = GithubFileSystem(org="DeepLabCut", repo="DeepLabCut")
fs.ls("/")

['.circleci',
 '.codespellrc',
 '.github',
 '.gitignore',
 'AUTHORS',
 'CODE_OF_CONDUCT.md',
 'CONTRIBUTING.md',
 'LICENSE',
 'NOTICE.yml',
 'README.md',
 '_config.yml',
 '_toc.yml',
 'conda-environments',
 'deeplabcut',
 'dlc.py',
 'docker',
 'docs',
 'examples',
 'reinstall.sh',
 'requirements.txt',
 'setup.py',
 'tests',
 'testscript_cli.py',
 'tools']

In [121]:
fs.glob("examples/open*/*")

['examples/openfield-Pranav-2018-10-30/config.yaml',
 'examples/openfield-Pranav-2018-10-30/labeled-data',
 'examples/openfield-Pranav-2018-10-30/videos']

In [122]:
fs.glob("examples/open*/**")

['examples/openfield-Pranav-2018-10-30',
 'examples/openfield-Pranav-2018-10-30/config.yaml',
 'examples/openfield-Pranav-2018-10-30/labeled-data',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/CollectedData_Pranav.csv',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/CollectedData_Pranav.h5',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0000.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0001.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0002.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0003.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0004.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0005.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0006.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/img0007.png',
 'examples/openfield-Pranav-2018-10-30/labeled-data/m4s1/i

In [131]:
paths = fs.glob("examples/open*/**/*.csv")
for path in paths:
    # keep each file's own name so that several matches don't overwrite one another
    fs.download(path, f"data/openfield-Pranav/{Path(path).name}")


In [137]:
fs.download("examples/open*/labeled-data", "deeplabcut/pranav/labeled-data", recursive=True)


In [160]:
paths = fs.glob("**/*.mp4")  # reuse the name `paths` so the loop iterates over the movies
for path in paths:
    fs.download(path, f"data/dlc-movies/{Path(path).name}")


### Sciebo/NextCloud/Owncloud Folders as a Remote Filesystem

https://uni-koeln.sciebo.de/login
https://uni-bonn.sciebo.de/login

### Read-Only, Public Datasets

In [181]:
from webdav4.fsspec import WebdavFileSystem
fs = WebdavFileSystem("https://uni-bonn.sciebo.de/public.php/webdav", auth=("f81JqGZmEHXxMnB", ""))
fs.ls("/", detail=False)

['README.md', 'data']

In [184]:
fs.ls("data/final", detail=False)

['data/final/steinmetz_all.csv',
 'data/final/steinmetz_all.parquet',
 'data/final/steinmetz_summer2017.csv',
 'data/final/steinmetz_winter2016.csv',
 'data/final/steinmetz_winter2017.csv']

In [185]:
pd.DataFrame(fs.ls("data/final"))

Unnamed: 0,name,href,size,created,modified,content_language,content_type,etag,type,display_name
0,data/final/steinmetz_all.csv,/public.php/webdav/data/final/steinmetz_all.csv,1516397,,2023-11-04 10:37:26+00:00,,text/csv,"""fda778a3f363c84e97456b6281796e2a""",file,
1,data/final/steinmetz_all.parquet,/public.php/webdav/data/final/steinmetz_all.pa...,336680,,2023-11-04 10:37:26+00:00,,application/octet-stream,"""6b743aabf944ada12a16e5440c617d03""",file,
2,data/final/steinmetz_summer2017.csv,/public.php/webdav/data/final/steinmetz_summer...,276260,,2023-11-02 07:24:58+00:00,,text/csv,"""56aca35e51527b69684f2091dcfa9f60""",file,
3,data/final/steinmetz_winter2016.csv,/public.php/webdav/data/final/steinmetz_winter...,359392,,2023-11-02 07:24:33+00:00,,text/csv,"""eed2c5d3bafe07f97ebaf39a745b2472""",file,
4,data/final/steinmetz_winter2017.csv,/public.php/webdav/data/final/steinmetz_winter...,805706,,2023-11-02 07:25:20+00:00,,text/csv,"""a0550494a6ca3ed9aa40e73173b6514e""",file,


In [182]:
fs.ls("data/processed", detail=False)

['data/processed/steinmetz_2016-12-14_Cori.nc',
 'data/processed/steinmetz_2016-12-17_Cori.nc',
 'data/processed/steinmetz_2016-12-18_Cori.nc',
 'data/processed/steinmetz_2017-01-07_Muller.nc',
 'data/processed/steinmetz_2017-01-08_Muller.nc',
 'data/processed/steinmetz_2017-01-08_Radnitz.nc',
 'data/processed/steinmetz_2017-01-09_Muller.nc',
 'data/processed/steinmetz_2017-01-09_Radnitz.nc',
 'data/processed/steinmetz_2017-01-10_Radnitz.nc',
 'data/processed/steinmetz_2017-01-11_Radnitz.nc',
 'data/processed/steinmetz_2017-01-12_Radnitz.nc',
 'data/processed/steinmetz_2017-05-15_Moniz.nc',
 'data/processed/steinmetz_2017-05-16_Moniz.nc',
 'data/processed/steinmetz_2017-05-18_Moniz.nc',
 'data/processed/steinmetz_2017-06-15_Hench.nc',
 'data/processed/steinmetz_2017-06-16_Hench.nc',
 'data/processed/steinmetz_2017-06-17_Hench.nc',
 'data/processed/steinmetz_2017-06-18_Hench.nc',
 'data/processed/steinmetz_2017-10-11_Theiler.nc',
 'data/processed/steinmetz_2017-10-29_Richards.nc',
 'dat

### Remote Filesystems that Require Authentication: Basic Secret Handling

In [193]:
import os
from webdav4.fsspec import WebdavFileSystem
import fsspec  # webdav4 should be installed
from dotenv import load_dotenv
load_dotenv(override=True)
env = os.environ


**Exercises**

Pick one person in your group to create a shared folder for the group to upload data to.  That person should: 
  - Using Sciebo in the web browser, create a new, empty folder in their Sciebo account.
  - Create a new text file in that folder called "README.txt", and put a special message inside the file.
  - **For each** team member to share the folder with:
    - Share the folder by creating a "public link", giving the 
        - name of the team member as the link's name, 
        - permission to download/view/upload/edit, 
        - a password, 
        - and set the link to expire in the near-ish future.
    - Using the Zoom Chat, give each person the share link and the password.



**Hiding the Secrets from the Source Code**

It's important not to put the text of the username and password inside your code files; it makes them too easy for others (including robots) to find, and makes it more difficult to share code with others.  Instead, we should put secrets in a separate file that we won't share with others, but that our code can read.

Here, we'll try out a standard approach that works in a wide variety of situations: make a `.env` ("dot env") file that holds all the environment variables we want to use.  Because it has a distinctive name, this file is easy to tell git to ignore, and many tools know how to work with it automatically.




**Exercise** 

Make a file called `.env` and write the username and password into the file as variables like so:

```dotenv
URL=https://uni-bonn.sciebo.de/public.php/webdav  # or whatever the correct address is.
USERNAME=f81JqGZmEHLxMnB                          # the last part of the share link (e.g. from https://uni-bonn.sciebo.de/s/f81JqGZmEHLxMnB)
PASSWORD=mypassword                               # the password
```

To load these values into Python, we can use the `python-dotenv` package.  It reads the file and creates variables in the operating system from it (called "environment variables"), which we can then access through the `os.environ` dictionary.

Run the code below to see if you can now access the variables:

In [199]:
# %pip install python-dotenv
import os
from dotenv import load_dotenv
load_dotenv(override=True)
env = os.environ
env['URL']  # env['USERNAME'] and env['PASSWORD'] should also work, if those variables are found.

'https://uni-bonn.sciebo.de/public.php/webdav'
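
If one of the variables is missing from the `.env` file, `env['PASSWORD']` fails with a bare `KeyError`, which can be confusing the first time. As a minimal sketch, the hypothetical helper `require_env` below (not part of `python-dotenv`) checks for all the needed names at once and reports every missing one. A plain dictionary with placeholder values stands in for `os.environ` so that the example runs without a real `.env` file.

```python
import os


def require_env(names, environ=os.environ):
    """Return the requested variables, or raise one error naming all missing ones."""
    missing = [name for name in names if name not in environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# A plain dict stands in for os.environ here; the values are placeholders.
fake_env = {"URL": "https://example.org/webdav", "USERNAME": "share-token"}

try:
    require_env(["URL", "USERNAME", "PASSWORD"], environ=fake_env)
except RuntimeError as err:
    message = str(err)
    print(message)
```

In the notebook itself, one would call `require_env(["URL", "USERNAME", "PASSWORD"])` after `load_dotenv(override=True)`, leaving the `environ` argument at its default.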

**Exercise**

Use the `env` variables to connect to the shared remote folder, and list all the files in that folder.

In [203]:
fs = fsspec.filesystem("webdav", base_url=env['URL'], auth=(env['USERNAME'], env['PASSWORD']))
fs.ls("/", detail=False)

['README.txt']

Read the text in the README file.  What message is inside the file?

In [204]:
fs.read_text("README.txt")

'hi!\n'

Write a hello message to a file named "\<your_name\>.txt" (e.g. "emma.txt"), then check that the file was created!

In [206]:
fs.write_text("nick.txt", "Hello, Nick.")

12

Create a folder named "images" and upload a picture of your favorite animal to the folder.  Check that it was uploaded properly.

In [213]:
# %pip install requests
import requests
r = requests.get("https://media.hornbach.de/hb/packshot/as.47485436.jpg")
Path("data/giraffe.jpg").write_bytes(r.content)

80086

In [219]:
fs.makedirs("images", exist_ok=True)
fs.upload("data/giraffe.jpg", "images/giraffe.jpg")
fs.ls("images")

[{'name': 'images/giraffe.jpg',
  'href': '/public.php/webdav/images/giraffe.jpg',
  'size': 80086,
  'created': None,
  'modified': datetime.datetime(2024, 2, 10, 13, 31, 2, tzinfo=datetime.timezone.utc),
  'content_language': None,
  'content_type': 'image/jpeg',
  'etag': '"42c602d0a8d12b07d4da8a1ee91d1ba0"',
  'type': 'file',
  'display_name': None}]

What did the other people in your group upload?  Take a look, and download their favorite animal images!

## Mounting Remote Filesystems as a Network Drive

In Windows, Mac, and Linux, you can mount a remote filesystem so you can browse it in your file explorer.  Let's try it with our shared folder!  This doesn't require any code; when mapping the drive, use the same URL, username, and password as we did in the previous exercise.

## (Extra) Other Remote Filesystems

The `fsspec` library supports a wide variety of filesystems, including "SSH" connections (like those used to access Linux servers) and "SFTP" connections (like the one provided by the University of Bonn for extra file storage).  No matter where the data is, if you have permission to access it, you can use it!

  - Uni-Bonn Data Storage Services:
    - https://www.hrz.uni-bonn.de/de/services/datenablage-fileservices
    - https://www.hrz.uni-bonn.de/en/all-services/data-storage-fileservices/research-data-infrastructure-fdi
  - Uni-Köln Data Storage Services: 
    - https://fdm.uni-koeln.de/serviceangebot/servicekatalog-1
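
Because every `fsspec` filesystem offers the same methods, code written against one backend usually carries over to another. The sketch below illustrates this with two backends that ship with `fsspec` and need no network access or credentials: `"memory"` (an in-memory filesystem) and `"file"` (the local disk). The helper `list_csv_names` and the folder name `fsspec_demo` are made up for this example.

```python
import tempfile

import fsspec


def list_csv_names(fs, folder):
    """Return the names of all CSV files directly inside `folder`, on any fsspec filesystem."""
    paths = fs.glob(f"{folder}/*.csv")
    return sorted(path.replace("\\", "/").rsplit("/", 1)[-1] for path in paths)


# Backend 1: an in-memory filesystem.
mem = fsspec.filesystem("memory")
mem.write_text("/fsspec_demo/a.csv", "x,y\n1,2\n")
memory_names = list_csv_names(mem, "/fsspec_demo")

# Backend 2: the local disk, using a temporary folder.
local = fsspec.filesystem("file")
with tempfile.TemporaryDirectory() as tmp:
    local.write_text(f"{tmp}/b.csv", "x,y\n3,4\n")
    local_names = list_csv_names(local, tmp)

print(memory_names, local_names)
```

Swapping in an SSH, SFTP, or WebDAV filesystem should only change the line that creates `fs`, assuming the matching backend package is installed and you have the credentials for it.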