This guide will help you install Python on your system to run this project.
- A computer running Windows, macOS, or Linux
- Administrator/root privileges (for some installation steps)
On Windows:
- Download the latest Python installer from python.org.
- Run the installer.
- Check "Add Python to PATH" at the bottom of the first page.
- Click "Install Now".
- After installation completes, verify it by opening Command Prompt and running:
python --version
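On Linux (Debian/Ubuntu):
- Update the package list, then install Python 3 together with pip and the venv module used later in this guide:
sudo apt update
sudo apt-get install python3 python3-pip python3-venv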
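If you want to confirm from Python itself that the interpreter meets the Python 3.10+ requirement listed later in this guide, here is a minimal sketch. The file name check_python.py is only a suggestion, and the 3.10 minimum is taken from this guide's prerequisites rather than from the project code.

```python
import sys

# Minimum version assumed from the "Python 3.10+" prerequisite in this guide.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is new enough for this project.")
    else:
        print(f"Python {current} is too old; install Python 3.10 or newer.")
        sys.exit(1)
```

Run it with python check_python.py on Windows or python3 check_python.py on Linux; it exits with status 1 when the version is too old.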
- Go to Google Cloud Console
- Click "Create Project"
- Enter a project name and click "Create"
- In your project dashboard, go to "APIs & Services" > "Library"
- Search for "Google Drive API"
- Click "Enable"
- Go to "IAM & Admin" > "Service Accounts"
- Click "Create Service Account"
- Enter:
  - Service account name
  - Service account ID
  - Description (optional)
- Click "Create and Continue"
- Assign "Service Account Token Creator" role
- Click "Continue" then "Done"
- In your service account list, click the account you created
- Go to "Keys" tab
- Click "Add Key" > "Create new key"
- Select JSON format and click "Create"
- The JSON key file will download automatically - keep this secure!
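Several later steps need the service account's client_email. As a sketch, assuming the key has been saved to doc/service_account.json (the path used for SERVICE_ACCOUNT_FILE in scraper.py), the snippet below reads the key with the standard json module, checks a few fields that service account keys contain, and prints the email. read_client_email is a hypothetical helper, not part of this project.

```python
import json
from pathlib import Path

# Fields present in a Google service account key file; checking them catches
# a wrong or truncated download early.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def read_client_email(key_path):
    """Load a service account JSON key and return its client_email."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{key_path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{key_path} is not a service account key")
    return data["client_email"]


if __name__ == "__main__":
    # Assumed location, matching SERVICE_ACCOUNT_FILE in scraper.py.
    key_file = Path("doc/service_account.json")
    if key_file.exists():
        print(read_client_email(key_file))
    else:
        print(f"No key file found at {key_file}")
```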
- Create a folder in your Google Drive
- Right-click the folder and select "Share"
- Add your service account email (found in the JSON file as "client_email")
- Set permission to "Editor"
- Place the downloaded JSON file in a secure location inside your project, such as the doc/ folder.
- Add the key file to .gitignore so it is never committed to version control.
- In your Python code, authenticate with this file: update scraper.py so it reads the JSON file from the correct location:
SERVICE_ACCOUNT_FILE = 'doc/service_account.json'
- Take the client email from service_account.json. In the example file below, drive-upload@projectalfa-abcd.iam.gserviceaccount.com is your client email:
{
"type": "service_account",
"project_id": "abcd-abcd",
"private_key_id": "abcd",
"private_key": "abcd",
"client_email": "drive-upload@projectalfa-abcd.iam.gserviceaccount.com",
"client_id": "abcd",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "abcd",
"universe_domain": "googleapis.com"
}
- Go to Google Drive
- Click "+ New" → "Folder"
- Name it "osac" (or your preferred base name)
- Right-click the "osac" folder
- Select "Share"
- Add your service account's client email (from the JSON credentials)
- Set permission to "Editor"
- Click "Send"
- Open the "osac" folder
- Create two new folders inside it:
  - One named "html"
  - One named "csv"
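To confirm that sharing worked, you can list the folders inside osac as the service account. This is a minimal sketch, assuming the google-auth and google-api-python-client packages are installed (they may or may not be in this project's requirements.txt) and that the full https://www.googleapis.com/auth/drive scope is acceptable. build_drive_service and list_child_folders are hypothetical helpers, and the osac folder ID is the last segment of that folder's URL.

```python
# Drive's MIME type for folders.
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"
# Assumed scope; a narrower scope may not see folders you created yourself.
DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive"]


def build_drive_service(key_path):
    """Authenticate as the service account and return a Drive v3 client.

    Imports are local so the rest of this sketch works without the
    google-auth / google-api-python-client packages installed.
    """
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=DRIVE_SCOPES
    )
    return build("drive", "v3", credentials=credentials)


def list_child_folders(service, parent_id):
    """Return {name: id} for non-trashed folders directly inside parent_id."""
    query = (
        f"'{parent_id}' in parents and mimeType = '{FOLDER_MIME_TYPE}' "
        "and trashed = false"
    )
    response = service.files().list(q=query, fields="files(id, name)").execute()
    return {item["name"]: item["id"] for item in response.get("files", [])}
```

With the real key and the osac folder ID, list_child_folders(build_drive_service("doc/service_account.json"), osac_id) should return a dictionary containing html and csv with their IDs. An empty result most likely means the osac folder has not been shared with the service account's client_email.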
The structure should look like this:
osac/
├── html/
└── csv/
- Take the IDs of both the html and csv folders from their URLs. For example, in https://drive.google.com/drive/u/0/folders/1JnNAo9No8etBRoNKIFvviuiBXAZEeDzH the ID is 1JnNAo9No8etBRoNKIFvviuiBXAZEeDzH.
- Update scraper.py to use the new Google Drive folder IDs:
CSV_FOLDER_ID_DRIVE = "1NrmVf1DfAAqxkg72nFLCd4hwGciVVFbJ"
HTML_FOLDER_ID_DRIVE = "1JnNAo9No8etBRoNKIFvviuiBXAZEeDzH"
Before running the project, make sure you have:
- Python 3.10+ installed
- pip package manager (comes with Python 3.4+)
- Access to command line/terminal
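The values for CSV_FOLDER_ID_DRIVE and HTML_FOLDER_ID_DRIVE are the last path segment of each Drive folder URL. If you would rather not copy them by hand, this hypothetical helper, folder_id_from_url, extracts the ID with the standard urllib.parse module. It assumes URLs of the drive.google.com/drive/.../folders/ID form shown in the folder ID step.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Extract the folder ID from a Google Drive folder URL.

    Handles URLs such as https://drive.google.com/drive/u/0/folders/<id>
    and https://drive.google.com/drive/folders/<id>?usp=sharing.
    """
    # urlparse separates the query string, so "?usp=sharing" is ignored.
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "folders" not in parts:
        raise ValueError(f"Not a Drive folder URL: {url}")
    index = parts.index("folders")
    if index + 1 >= len(parts):
        raise ValueError(f"No folder ID in URL: {url}")
    return parts[index + 1]


if __name__ == "__main__":
    example = "https://drive.google.com/drive/u/0/folders/1JnNAo9No8etBRoNKIFvviuiBXAZEeDzH"
    print(folder_id_from_url(example))  # prints 1JnNAo9No8etBRoNKIFvviuiBXAZEeDzH
```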
# Create virtual environment
# (on Linux/macOS, use python3 if the python command is not found)
python -m venv venv
# Activate environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install requirements
pip install -r requirements.txt
# Navigate to your project directory
cd path/to/your/project
# Install all required packages
pip install -r requirements.txt
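# Download the spaCy English model
python -m spacy download en_core_web_sm
# Run the scraper
python scraper.py
- Make scraper.sh executable, then run it:
chmod +x scraper.sh
./scraper.sh
- Install Git LFS, clone the repository, and pull the large files:
git lfs install
git clone https://github.com/username/shiny_app.git
cd shiny_app
git pull
git lfs pull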
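git lfs pull replaces small pointer files with the real large files; if a repository was cloned before Git LFS was installed, some files can remain pointers. As a sketch, the hypothetical find_lfs_pointers helper scans a checkout for files that still begin with the Git LFS pointer header (version https://git-lfs.github.com/spec/v1). The 1024-byte size cutoff is a simplification chosen for this sketch.

```python
from pathlib import Path

# First line of a Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Sketch-specific cutoff: real pointer files are only a few hundred bytes.
MAX_POINTER_SIZE = 1024


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root="."):
    """List files under root that are still LFS pointers, skipping .git."""
    pointers = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        # Large files cannot be pointers, so skip them without reading.
        if path.stat().st_size > MAX_POINTER_SIZE:
            continue
        if is_lfs_pointer(path):
            pointers.append(path)
    return pointers
```

Running find_lfs_pointers(".") from inside the shiny_app clone after git lfs pull should return an empty list; any paths it reports were not fetched.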