## Create a Persistent Python Virtual Environment in Google Colab

*  ***Ephemeral nature of Colab:***

The "ephemeral nature" of Google Colab means that its runtime environment is temporary and is deleted when the session ends or sits idle. Any data or changes made within a session are lost unless they are explicitly saved elsewhere. That suits quick experiments and prototyping, but longer-term projects need a deliberate plan for data persistence.

*  ***Workarounds for persistence:***

To keep changes across sessions, store the virtual environment and its dependencies outside the Colab runtime, for example on Google Drive or on JuiceFS, a cloud-based, high-performance distributed file system. Each option has its own limitations.
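As a rough illustration of the distinction above, the sketch below classifies a path as persistent or ephemeral by checking whether it sits under a persistent mount point. The helper name `is_persistent` and the default root `/content/drive` are assumptions made for this example; a JuiceFS mount point could be passed in the same way.

```python
from pathlib import PurePosixPath

# Hypothetical default: mount points assumed to survive a Colab runtime reset.
PERSISTENT_ROOTS = ("/content/drive",)

def is_persistent(path, roots=PERSISTENT_ROOTS):
    """Return True if `path` is one of the persistent roots or lies inside one."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents for r in roots)

print(is_persistent("/content/drive/MyDrive/python_ai_env"))  # True
print(is_persistent("/content/sample_data"))                  # False
```

Anything for which this check is false lives on the runtime's own disk and should be treated as disposable.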

## JuiceFS vs. Google Cloud Storage

The choice between JuiceFS and Google Cloud Storage depends on your needs. JuiceFS is generally the better option when you need a POSIX-compatible file system, high performance for frequent file access, and seamless integration with existing applications. Google Cloud Storage excels at large-scale, low-cost, highly durable storage, particularly for infrequently accessed data.

Colab users often store files persistently on Google Drive. However, Google Drive has usage restrictions, such as limits on total upload bandwidth and on the number of files. JuiceFS, an open-source distributed file system, does not impose such limits itself, and it can be cost-effective because it lets you organize the underlying storage resources flexibly.

[How to Persist Data in Google Colab Using JuiceFS](https://juicefs.com/en/blog/usage-tips/colab-persist-data)


JuiceFS is a cloud-native, high-performance distributed file system licensed under Apache 2.0. It is completely POSIX compatible and supports various access methods, including FUSE POSIX, HDFS, S3, the Kubernetes CSI Driver, and WebDAV.

In Colab, you can mount JuiceFS using the FUSE POSIX method as a background daemon.
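A minimal sketch of what a background mount could look like when driven from a notebook cell. It assumes the `juicefs` client is already installed in the runtime and that a file system has already been created; the metadata URL and the mount point are placeholders, not values from this guide.

```python
import shlex

def juicefs_mount_command(meta_url, mount_point):
    """Build a `juicefs mount` command that runs as a background daemon (-d)."""
    return ["juicefs", "mount", "-d", meta_url, mount_point]

# Placeholder values (assumptions); replace them with your own metadata engine and path.
cmd = juicefs_mount_command("redis://example-host:6379/1", "/content/jfs")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, or the printed line could be run in a cell with a leading `!`.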


#### Step 1: Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')
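Re-running the mount cell is harmless, but a guard can make a notebook safe to run outside Colab as well. The sketch below is one possible approach: it treats the presence of `google.colab` in `sys.modules` as the sign of a Colab kernel and mounts only when the `MyDrive` folder is not visible yet. The function name `ensure_drive_mounted` is hypothetical.

```python
import os
import sys

def ensure_drive_mounted(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return True if Drive is available."""
    if "google.colab" not in sys.modules:
        return False  # not a Colab kernel, so there is nothing to mount
    my_drive = os.path.join(mount_point, "MyDrive")
    if not os.path.isdir(my_drive):
        from google.colab import drive  # imported lazily; only exists in Colab
        drive.mount(mount_point)
    return os.path.isdir(my_drive)

print(ensure_drive_mounted())
```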

#### Step 2: Create an AI development virtual environment (**python_ai_env**) for the Python programming language

In [None]:
# Install (or upgrade) virtualenv, then create the environment once on Google Drive.
!pip install --upgrade virtualenv
!virtualenv /content/drive/MyDrive/python_ai_env
# Work from the Drive folder so files written with relative paths also persist.
%cd /content/drive/MyDrive
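The environment only needs to be created once, so a later session can check for it before running the creation cell again. The sketch below assumes the Linux layout that `virtualenv` produces (a `pyvenv.cfg` file and a `bin/activate` script); `env_exists` is a hypothetical helper name, and the demonstration uses a throwaway directory rather than Google Drive.

```python
import tempfile
from pathlib import Path

def env_exists(env_dir):
    """Return True if `env_dir` already looks like a virtual environment."""
    env = Path(env_dir)
    return (env / "pyvenv.cfg").is_file() and (env / "bin" / "activate").is_file()

# Demonstration on a temporary directory instead of Google Drive.
with tempfile.TemporaryDirectory() as tmp:
    print(env_exists(tmp))  # False: nothing has been created yet
    (Path(tmp) / "bin").mkdir()
    (Path(tmp) / "bin" / "activate").write_text("# placeholder\n")
    (Path(tmp) / "pyvenv.cfg").write_text("home = /usr/bin\n")
    print(env_exists(tmp))  # True
```

In the notebook, the creation command would then run only when `env_exists('/content/drive/MyDrive/python_ai_env')` is false.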

#### Step 3: Activate the Python Virtual Environment

In [None]:
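# Each `!` line runs in its own shell, so chain the commands that need the environment after the activation.
!source /content/drive/MyDrive/python_ai_env/bin/activate && python --version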
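Because every `!` line starts a fresh shell, an activation does not carry over to the next command. One alternative, sketched below, is to skip activation and call the environment's own interpreter by path. The helper names `env_python` and `pip_in_env` are hypothetical, and the sketch assumes the interpreter under `bin/` can be executed from the mounted drive.

```python
from pathlib import PurePosixPath

ENV_DIR = "/content/drive/MyDrive/python_ai_env"  # environment created in Step 2

def env_python(env_dir=ENV_DIR):
    """Return the path of the environment's own Python interpreter."""
    return str(PurePosixPath(env_dir) / "bin" / "python")

def pip_in_env(*args, env_dir=ENV_DIR):
    """Build (but do not run) a pip command that targets the environment."""
    return [env_python(env_dir), "-m", "pip", *args]

print(pip_in_env("install", "numpy"))
# To execute it: import subprocess; subprocess.run(pip_in_env("install", "numpy"), check=True)
```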

#### Step 4: Create a file named requirements.txt in the project's root directory

In [None]:
# Create an empty requirements.txt in the current working directory (the project root).
with open('requirements.txt', 'w') as f:
    pass

#### Step 5: Add dependencies: list the packages you need, one per line, in requirements.txt. Versions can be pinned if needed:

In [None]:
# Each package goes on its own line, separated by '\n'.
with open('requirements.txt', 'w') as f:
    f.write('altair\nnumpy\nscikit-learn\nmatplotlib\npandas\nspacy\nxgboost\nscispacy\nsoundfile\n')
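Writing the file from a Python list avoids hand-typed escape sequences, which are easy to get wrong. This is a small sketch with a hypothetical helper name, `write_requirements`, shown against a temporary directory; the package list mirrors the one in this step.

```python
import tempfile
from pathlib import Path

# A version can be pinned by adding a specifier to an entry, e.g. "numpy>=1.24" (illustrative).
PACKAGES = ["altair", "numpy", "scikit-learn", "matplotlib", "pandas",
            "spacy", "xgboost", "scispacy"]

def write_requirements(path, packages):
    """Write one requirement per line, ending the file with a newline."""
    Path(path).write_text("\n".join(packages) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "requirements.txt"
    write_requirements(target, PACKAGES)
    print(target.read_text())
```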

#### Step 6: Install the dependencies in the requirements.txt:

In [None]:
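# Activate the environment in the same shell command so pip installs into it.
!source /content/drive/MyDrive/python_ai_env/bin/activate && pip install -r requirements.txt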
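After installing, it can be useful to see which requirements are still missing. The sketch below uses the standard-library `importlib.metadata` for that. Note the assumption: it inspects the interpreter that runs the notebook, not the virtual environment, and the name parsing is deliberately simple. `missing_requirements` is a hypothetical helper.

```python
import re
from importlib import metadata

def missing_requirements(lines):
    """Return the requirement names from `lines` that are not installed."""
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines, comments, and pip options
        # Keep only the project name: cut at the first specifier, extra, or marker.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

print(missing_requirements(["# comment", "", "surely-not-installed-pkg-xyz>=1.0"]))
```

With a real file, the lines would come from `open('requirements.txt').read().splitlines()`.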

#### Step 7: Install additional libraries that are needed but not listed in requirements.txt:

In [None]:
# zip and the audio tools are system packages, so they come from apt rather than pip
# (libav-tools has been superseded by ffmpeg on recent Ubuntu releases).
!apt-get install -y ffmpeg zip
# Python packages missing from requirements.txt, installed inside the environment.
!source /content/drive/MyDrive/python_ai_env/bin/activate && pip install tensorflow tensorflow-io transformers

#### Step 8: Upgrade packages in the Python virtual environment:

In [None]:
!source /content/drive/MyDrive/python_ai_env/bin/activate && pip install --upgrade scikit-learn xgboost

#### Step 9: Remount Google Drive after a disconnect to regain access to the virtual environment:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

#### Step 10: Re-activate the Python virtual environment:

In [None]:
# Activation lasts only for this shell command, so run the environment's tools after `&&`.
!source /content/drive/MyDrive/python_ai_env/bin/activate && pip list

#### Step 11: Import and use the Python packages for the first time

In [None]:
# Standard library
import csv
import gc
import os
import pickle
import random
import re
import string
import sys
import threading
import warnings

# Third-party modules
import altair as alt
import google.colab.files
import IPython
import librosa
import librosa.display
import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import pylab
import scipy.stats as stats
import scispacy
import seaborn as sns
import soundfile as sf
import spacy
import tensorflow as tf
import tensorflow_io as tfio
import torch

# Wildcard imports come before the explicit imports so that explicit names win any clash.
# They already provide KNeighborsClassifier, MultinomialNB, RandomForestClassifier, and DecisionTreeClassifier.
from sklearn.calibration import *
from sklearn.ensemble import *
from sklearn.linear_model import *
from sklearn.multiclass import *
from sklearn.naive_bayes import *
from sklearn.neighbors import *
from sklearn.svm import *
from sklearn.tree import *
from fastai.text import *
from fastai.vision import *

# Explicit imports (duplicates removed)
from collections import Counter
from glob import glob
from pathlib import Path

from google.colab import auth, drive, output
from keras.callbacks import LearningRateScheduler
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.stem import WordNetLemmatizer
from nltk.util import ngrams
from PIL import Image
from scipy.io import wavfile
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.metrics import accuracy_score, auc, classification_report, confusion_matrix, roc_curve
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier  # Simple deep learning (multi-layer perceptron)
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from spacy import displacy
from spacy.lang.en.stop_words import STOP_WORDS
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Conv1D, Dense, Embedding, GlobalMaxPooling1D
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam  # public path; replaces the private optimizer_v2 module
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tqdm import tqdm
from transformers import BertTokenizer, TFBertModel
from wordcloud import STOPWORDS, WordCloud
from xgboost import XGBClassifier

nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('stopwords')
nltk.download('vader_lexicon')

sns.set(style='whitegrid')
%matplotlib inline
warnings.filterwarnings('ignore')

#### Step 12: Access the modules and packages of the persistent virtual environment through sys.path:


In [None]:
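import sys
# Build the path from the running Python version rather than hard-coding python3.10.
version = f"python{sys.version_info.major}.{sys.version_info.minor}"
sys.path.append(f"/content/drive/MyDrive/python_ai_env/lib/{version}/site-packages")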
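Python searches `sys.path` in order, so a directory added with `append` is consulted last: if Colab already ships a package, the preinstalled copy is the one that gets imported. The sketch below shows one way to check where a module would actually be loaded from. The helper names `module_origin` and `loaded_from` are hypothetical.

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

def loaded_from(name, prefix):
    """Return True if the module resolves to a file under the directory `prefix`."""
    origin = module_origin(name)
    return origin is not None and origin.startswith(prefix)

print(module_origin("json"))
print(module_origin("no_such_module_xyz"))  # None
```

If `loaded_from('numpy', '/content/drive/MyDrive/python_ai_env')` is false, the import is coming from Colab's own installation; using `sys.path.insert(0, ...)` instead of `append` would give the environment priority.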

## Workflow After Disconnect

1. Mount Google Drive:

       from google.colab import drive
       drive.mount('/content/drive')

2. Activate the virtual environment in the same shell command that uses it:

       !source /content/drive/MyDrive/python_ai_env/bin/activate && pip list

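**Activation is Key**: Activating the virtual environment is essential for shell commands, but it applies only to the command in which it runs, because each `!` line starts a new shell. Chain `pip` or `python` after the activation with `&&` so that they use the interpreter and packages of your environment. The notebook kernel itself keeps using Colab's default interpreter, which is why the environment's site-packages directory is added to `sys.path`.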

**Package Updates**: If you need to update a package or install new ones, make sure you do so within the activated virtual environment. This ensures the changes are saved to your Google Drive.

In summary, a virtual environment on Google Drive gives your packages a persistent home. After a disconnect, you only need to remount Drive and re-activate the environment; the packages you installed previously are still there, which saves reinstalling them every time you work on the project in Colab.

## Do I need to reinstall the packages in the Google Drive virtual environment after a Colab disconnect?

No. There are two reasons:

1. Virtual Environment Isolation: A virtual environment isolates the packages for your project. When you install packages inside the environment, they are stored within the environment's directory on your Google Drive, not in the temporary Colab runtime.

2. Google Drive Persistence: Google Drive is persistent storage. Even if your Colab runtime disconnects, the contents of your Google Drive, including your virtual environment and its installed packages, remain unchanged.
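One way to confirm that the packages survived a disconnect is to list the distributions found in the environment's site-packages directory. The sketch below uses the standard-library `importlib.metadata` with an explicit search path. `list_env_packages` is a hypothetical helper, and the demonstration builds a fake distribution in a temporary directory so it runs anywhere.

```python
import tempfile
from importlib import metadata
from pathlib import Path

def list_env_packages(site_packages):
    """Map distribution name to version for everything installed in `site_packages`."""
    return {d.metadata["Name"]: d.version
            for d in metadata.distributions(path=[str(site_packages)])}

# Demonstration with a fake distribution instead of a real environment.
with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "demo_pkg-1.0.dist-info"
    info.mkdir()
    (info / "METADATA").write_text("Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0\n")
    print(list_env_packages(tmp))
```

Pointing the function at the environment's own `site-packages` folder on Drive after remounting should show the same packages as before the disconnect.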

## How do I import a package from my virtual environment into Google Colab?

#### To find libraries installed in the virtual environment, add the path of the environment's site-packages directory to Colab's system path.

In [None]:
# Add the virtual environment's site-packages directory to the system path.

import sys
version = f"python{sys.version_info.major}.{sys.version_info.minor}"
sys.path.append(f'/content/drive/MyDrive/python_ai_env/lib/{version}/site-packages')
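A slightly more defensive variant is sketched below. It derives the directory name from the running Python version, returns `None` when that directory does not exist (for example, when the environment was built under a different Python version), and uses `site.addsitedir`, which skips paths already on `sys.path` and also processes `.pth` files. The function name `add_env_site_packages` is an assumption for this example.

```python
import site
import sys
import tempfile
from pathlib import Path

def add_env_site_packages(env_dir):
    """Add the environment's site-packages for the running Python version to sys.path."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    site_packages = Path(env_dir) / "lib" / version / "site-packages"
    if not site_packages.is_dir():
        return None  # missing, or built for a different Python version
    site.addsitedir(str(site_packages))  # no duplicate entries; handles .pth files
    return str(site_packages)

# Demonstration on a throwaway directory laid out like a virtual environment.
with tempfile.TemporaryDirectory() as tmp:
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    (Path(tmp) / "lib" / version / "site-packages").mkdir(parents=True)
    added = add_env_site_packages(tmp)
    print(added in sys.path)                        # True
    print(add_env_site_packages(tmp + "-missing"))  # None
```

In Colab, the call would be `add_env_site_packages('/content/drive/MyDrive/python_ai_env')`, run after Drive has been mounted.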