# Understanding where Python is executed

You can use Python to work with files. Depending on your setup, Python runs either on your own machine or on a remote machine.

## 1) Python on your machine, e.g., via Anaconda

Python can access all files on your machine. For convenience, we usually store data next to the Python file, either in the same directory or in a subfolder.

![image.png](attachment:image.png)

## 2) Python on a remote machine, e.g., via Google Colab

Colab creates a virtual machine just for you. It is empty except for some sample data. You can connect your Google Drive to the machine or upload data to it directly.

**After you close your session, the virtual machine will be deleted and all files not downloaded to your local machine or saved in Google Drive will be lost.**

![image.png](attachment:image.png)

In both cases, Python is executed on a machine and can access its filesystem. Python is executed either 1) on your machine with your filesystem, or 2) on a remote virtual machine with its own filesystem.
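
To check which machine and filesystem a notebook is actually using, a short sketch with the standard library may help. The function name `describe_environment` is made up for this illustration; the calls inside it are standard Python.

```python
import os
import platform
import sys


def describe_environment():
    """Collect basic facts about the machine that is running this code."""
    return {
        "hostname": platform.node(),            # network name of the machine
        "operating_system": platform.system(),  # e.g. 'Windows', 'Linux' or 'Darwin'
        "working_directory": os.getcwd(),       # folder that relative paths start from
        "python_executable": sys.executable,    # location of the Python interpreter
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

On your own machine, the working directory is usually the folder that contains the notebook. On Google Colab, you would typically see a Linux system with a working directory such as `/content`.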

# Working with files on the machine

Import pandas to read in the data from the files.

In [2]:
import pandas as pd

To open files in the same directory as the Python code, use the filename.

In [None]:
data = pd.read_excel('data.xlsx')

To access files in a parent folder, use '..' once for each level you move up in the folder hierarchy.

In [None]:
data = pd.read_excel('../../data.xlsx') # 2 levels up

For subfolders, use 'subfolder/'.

In [None]:
data = pd.read_excel('subfolder/data.xlsx') 
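
To see what the three relative paths above point to, the following sketch resolves them against a working directory. The folder `/home/user/project/code` and the helper name `resolve_relative` are assumptions for illustration only; the function works on the path text and does not touch the disk.

```python
import posixpath


def resolve_relative(working_directory, relative_path):
    """Return the absolute location a relative path points to (no disk access)."""
    joined = posixpath.join(working_directory, relative_path)
    # normpath collapses each '..' together with the folder in front of it
    return posixpath.normpath(joined)


working_directory = "/home/user/project/code"  # hypothetical location of the notebook

print(resolve_relative(working_directory, "data.xlsx"))
# /home/user/project/code/data.xlsx
print(resolve_relative(working_directory, "../../data.xlsx"))
# /home/user/data.xlsx
print(resolve_relative(working_directory, "subfolder/data.xlsx"))
# /home/user/project/code/subfolder/data.xlsx
```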

Generally, you can specify the exact path to access any data on the machine.

In [None]:
data = pd.read_excel('C:/Users/username/Desktop/data.xlsx')

Often, the location of the directory is stored in a variable for increased readability.

In [11]:
path = "C:/Users/username/Desktop/"

data = pd.read_excel(path + 'data.xlsx')
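
String concatenation only works if the directory ends with a slash. As an alternative sketch, `pathlib` from the standard library joins the parts and adds the separator itself. The folder is the same placeholder as above; `PureWindowsPath` is used here only so that the example prints the same result on every operating system.

```python
from pathlib import PureWindowsPath

# Placeholder folder from the example above, this time without a trailing slash.
folder = PureWindowsPath("C:/Users/username/Desktop")

# Plain concatenation silently produces a wrong path when the slash is missing.
print("C:/Users/username/Desktop" + "data.xlsx")  # C:/Users/username/Desktopdata.xlsx

# The / operator joins path parts and inserts the separator for us.
data_file = folder / "data.xlsx"
print(data_file.as_posix())  # C:/Users/username/Desktop/data.xlsx
```

In your own notebook you would normally use `Path` instead of `PureWindowsPath`. pandas accepts path objects, so the result can be passed to `pd.read_excel` directly.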

# Working with files from the internet

Pandas can use the internet to access files stored on a server, e.g., GitHub. GitHub is a repository service that we use to help you work with files. The files are public, and you can access them via their URL.

In [4]:
data = pd.read_excel("https://github.com/casbdai/notebooks2023/raw/main/Module2/Onboarding/Examples/data.xlsx")

Note that this is direct access: pandas reads the contents of the file into the variable `data`, but it does not save the file 'data.xlsx' on the machine.

Sometimes, direct access might not work, e.g., with certain database files. You can download any file to the machine you are using (local or remote via Google Colab) with the wget command. The leading `!` runs the line as a shell command, so wget must be installed on that machine.

In [None]:
!wget "https://github.com/casbdai/datasets/raw/main/Module2/Onboarding/Examples/data.xlsx"

data = pd.read_excel('data.xlsx')
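
If wget is not available, for example on some Windows installations, a download can also be sketched in pure Python with the standard library. The helper name `download` is an assumption for this illustration and not part of pandas or Colab.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download(url, target_folder="."):
    """Download url into target_folder and return the path of the saved file."""
    filename = Path(urlparse(url).path).name  # last part of the URL, e.g. 'data.xlsx'
    target = Path(target_folder) / filename
    urlretrieve(url, target)  # copies the remote file to disk
    return target


# Example call (needs an internet connection), using the URL from the cell above:
# local_file = download("https://github.com/casbdai/datasets/raw/main/Module2/Onboarding/Examples/data.xlsx")
# data = pd.read_excel(local_file)
```

Unlike direct access with pandas, this leaves a copy of the file on the machine, which you can then open like any other local file.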

# Persistent storage in Google Colab with Google Drive

Upload your files to your personal Google Drive, e.g., via the browser interface. For this example, we uploaded them to a top-level folder called "MyDataFolder".


First, provide the virtual machine with access to your personal Google Drive. You have to confirm a pop-up message in your browser.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

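To access your files, prefix the path with 'drive/My Drive/', which points to the top-level directory of your Google Drive. Depending on the Colab version, this folder may be named 'MyDrive' instead.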

In [None]:
df = pd.read_excel("drive/My Drive/MyDataFolder/data.xlsx")

Again, you can use a variable for the directory.

In [None]:
path = 'drive/My Drive/MyDataFolder/'

df = pd.read_excel(path + 'data.xlsx')
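
Before reading from Google Drive, it can help to check that the drive is actually mounted. The sketch below looks for the top-level Drive folder under the mount point. The helper name `find_drive_root` is made up, and the two folder names it tries ('MyDrive' and 'My Drive') are assumptions about how Colab names the folder.

```python
from pathlib import Path


def find_drive_root(mount_point="/content/drive"):
    """Return the top-level Drive folder below mount_point, or None if not mounted."""
    for name in ("MyDrive", "My Drive"):  # assumed folder names, not guaranteed
        candidate = Path(mount_point) / name
        if candidate.is_dir():
            return candidate
    return None


drive_root = find_drive_root()
if drive_root is None:
    print("Google Drive is not mounted; run drive.mount('/content/drive') first.")
else:
    print("Data folder:", drive_root / "MyDataFolder")
```

Outside of Colab, or before mounting, the function returns `None` and the hint is printed instead of an error from pandas about a missing file.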

# Temporary storage in Google Colab: Upload files

You can upload files directly to the environment in which your Google Colab Notebook is executed. This data will be lost after the session closes. 

To upload files manually, you can use the Google Colab sidebar on the left by clicking on the small folder and using the upload icon. 

![image.png](attachment:image.png)

Alternatively, use this code block:

In [None]:
from google.colab import files

uploaded = files.upload()

The uploaded files are stored in the current working directory of the virtual machine, so you can access them by filename:

In [None]:
df = pd.read_excel("data.xlsx")
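
The variable `uploaded` returned by `files.upload()` is a dictionary that maps each filename to the file's contents as bytes. A small sketch for checking what arrived is shown below; the helper name `summarize_uploads` is hypothetical, and a stand-in dictionary is used so that the code also runs outside of Colab.

```python
def summarize_uploads(uploaded):
    """Return one line per uploaded file: its name and its size in bytes."""
    return [f"{name}: {len(content)} bytes" for name, content in uploaded.items()]


# Stand-in for the result of files.upload(), so the sketch runs anywhere.
example_uploaded = {"data.xlsx": b"\x50\x4b\x03\x04"}
print(summarize_uploads(example_uploaded))  # ['data.xlsx: 4 bytes']
```

In Colab, you would pass the real `uploaded` variable instead. Because the contents are already in memory, they could also be read without going through the filesystem, e.g., with `pd.read_excel(io.BytesIO(uploaded['data.xlsx']))` after `import io`.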