# Notebook 0: Set Up Google Colab Environment

This notebook prepares the project environment, whether it runs in Google Colab or locally.

## Steps:
1. Check runtime environment (Colab vs. local)
2. Mount Google Drive (if on Colab)
3. Verify project structure
4. Install required packages
5. Download NLTK data
6. Test imports
7. Import project utilities
8. Verify data file exists
9. Ensure output directories exist

In [1]:
# Import necessary libraries
import sys
import os

## 1. Check Runtime Environment

In [2]:
# Check if running in Google Colab
is_colab = 'google.colab' in sys.modules
print(f"Running in Google Colab: {is_colab}")
print(f"Python version: {sys.version}")

Running in Google Colab: False
Python version: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0]
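
The check above works because Colab has already loaded `google.colab` into the kernel. A minimal sketch of the same test wrapped in a function, so later cells can branch on a single value; the name `detect_runtime` and the `"colab"`/`"local"` labels are illustrative and not part of the project:

```python
import sys


def detect_runtime(modules=None):
    """Return "colab" if google.colab is loaded, otherwise "local".

    `modules` defaults to sys.modules; passing a plain dict keeps the
    check easy to exercise outside Colab.
    """
    loaded = sys.modules if modules is None else modules
    return "colab" if "google.colab" in loaded else "local"


print(f"Runtime: {detect_runtime()}")
```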


## 2. Mount Google Drive (if on Colab)

In [3]:
if is_colab:
    from google.colab import drive
    print("Mounting Google Drive...")
    drive.mount('/content/drive')
    print("\n✓ Google Drive mounted successfully!")
    
    # Navigate to project folder
    # IMPORTANT: Update this path to match your Google Drive folder structure
    project_folder = '/content/drive/MyDrive/analisis-sentiment-pelatih-baru-chelsea-liam-rosenior'
    
    if os.path.exists(project_folder):
        os.chdir(project_folder)
        print(f"\n✓ Changed directory to: {project_folder}")
    else:
        print(f"\n⚠️ WARNING: Project folder not found at: {project_folder}")
        print("Please upload the project to this path in Google Drive.")
else:
    print("Not running in Colab - using local environment.")

Not running in Colab - using local environment.


## 3. Verify Project Structure

In [4]:
# Check current directory
current_dir = os.getcwd()
print(f"Current working directory: {current_dir}")

# List directories
print("\nProject structure:")
for item in os.listdir('.'):
    item_type = "DIR " if os.path.isdir(item) else "FILE"
    print(f"  [{item_type}] {item}")

Current working directory: /home/emmanuelabayor/projects/analisis-sentiment-pelatih-baru-chelsea-liam-rosenior/notebooks

Project structure:
  [FILE] UNIFIED_complete_pipeline.ipynb
  [FILE] 5_results_visualization.ipynb
  [DIR ] data
  [FILE] 3_sentiment_labeling.ipynb
  [FILE] 0_setup_colab.ipynb
  [DIR ] .ipynb_checkpoints
  [FILE] 1_data_preprocessing.ipynb
  [DIR ] outputs
  [FILE] 2_exploratory_analysis.ipynb
  [FILE] 4_ml_modeling.ipynb


## 4. Install Required Packages

In [5]:
# Install packages (only needed in Colab)
if is_colab:
    print("Installing required packages...")
    !pip install -q pandas numpy scikit-learn nltk vaderSentiment langdetect matplotlib seaborn wordcloud plotly ipython
    print("\n✓ All packages installed successfully!")
else:
    print("Not running in Colab. Please ensure packages are installed locally:")
    print("pip install -r requirements.txt")

Not running in Colab. Please ensure packages are installed locally:
pip install -r requirements.txt


## 5. Download NLTK Data

In [6]:
import nltk

print("Downloading NLTK data...")

# Required NLTK data
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')

print("\n✓ NLTK data downloaded successfully!")

Downloading NLTK data...

✓ NLTK data downloaded successfully!


[nltk_data] Downloading package stopwords to
[nltk_data]     /home/emmanuelabayor/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /home/emmanuelabayor/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /home/emmanuelabayor/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /home/emmanuelabayor/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
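
`nltk.download` returns `True` on success and `False` on failure, so the success message in the cell above could be made conditional on the actual result. A hedged sketch: `download_all` is a hypothetical helper, and its `download` parameter exists only so the logic can be exercised without network access.

```python
def download_all(packages, download=None):
    """Download each NLTK package and return the names that failed."""
    if download is None:
        import nltk  # imported lazily so the helper itself has no hard dependency
        download = nltk.download
    failed = []
    for name in packages:
        # quiet=True suppresses the [nltk_data] progress lines
        if not download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `download_all(["stopwords", "wordnet", "omw-1.4", "punkt"])` and printing the success message only when the returned list is empty would keep the output honest if a download fails.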


## 6. Test Imports

In [7]:
# Test all required imports
print("Testing imports...\n")

try:
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from langdetect import detect
    import matplotlib.pyplot as plt
    import seaborn as sns
    from wordcloud import WordCloud
    
    print("✓ pandas")
    print("✓ numpy")
    print("✓ scikit-learn")
    print("✓ vaderSentiment")
    print("✓ langdetect")
    print("✓ matplotlib")
    print("✓ seaborn")
    print("✓ wordcloud")
    print("\n✓ All imports successful!")
    
except ImportError as e:
    print(f"\n✗ Import error: {e}")
    print("Please ensure all packages are installed.")

Testing imports...

✓ pandas
✓ numpy
✓ scikit-learn
✓ vaderSentiment
✓ langdetect
✓ matplotlib
✓ seaborn
✓ wordcloud

✓ All imports successful!
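
Because the imports above share one `try` block, the first failure hides the status of every package after it. A minimal sketch that checks each module separately with `importlib.import_module`; the helper name `check_imports` is an assumption, and the module list mirrors the top-level packages imported in the cell above.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


required = ["pandas", "numpy", "sklearn", "vaderSentiment", "langdetect",
            "matplotlib", "seaborn", "wordcloud"]
for name, error in check_imports(required).items():
    print(f"[OK] {name}" if error is None else f"[ERROR] {name}: {error}")
```

Note that the import name can differ from the pip name: `scikit-learn` is imported as `sklearn`.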


## 7. Import Project Utilities

In [8]:
# Import custom utility modules
import sys
import os

# Add the project root to the Python path so that 'from src import utils' works.
# Locally the notebook runs from notebooks/, so the root is one level up; in
# Colab the working directory was already changed to the project folder itself.
cwd = os.getcwd()
project_root = os.path.dirname(cwd) if os.path.basename(cwd) == 'notebooks' else cwd
if project_root not in sys.path:
    sys.path.insert(0, project_root)

try:
    from src import utils, preprocessing, feature_engineering, models
    print("\n[OK] Custom modules imported successfully!")
    print("  - utils")
    print("  - preprocessing")
    print("  - feature_engineering")
    print("  - models")
except ImportError as e:
    print(f"\n[ERROR] Error importing custom modules: {e}")
    print("Make sure the notebook is running from the project root or its notebooks/ folder.")


[OK] Custom modules imported successfully!
  - utils
  - preprocessing
  - feature_engineering
  - models
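
Deriving the root from the working directory depends on where the notebook was launched. One alternative, sketched here under the assumption that the project root is the nearest ancestor containing a `src` directory, is to walk upward until that marker is found. `find_project_root` and its `marker` parameter are hypothetical names.

```python
import sys
from pathlib import Path


def find_project_root(start=None, marker="src"):
    """Return the nearest directory at or above `start` that contains `marker`.

    Returns None if no ancestor contains the marker directory.
    """
    current = Path.cwd() if start is None else Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


root = find_project_root()
if root is not None and str(root) not in sys.path:
    sys.path.insert(0, str(root))
print(f"Project root: {root}")
```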


## 8. Verify Data File Exists

In [9]:
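# Check if data file exists
import pandas as pd

# Resolve the project root for both setups: locally the notebook runs from
# notebooks/ (root is one level up); in Colab the working directory is the root.
cwd = os.getcwd()
project_root = os.path.dirname(cwd) if os.path.basename(cwd) == 'notebooks' else cwd
data_path = os.path.join(project_root, 'data', 'raw', 'tweets.csv')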

if os.path.exists(data_path):
    df = pd.read_csv(data_path)
    print(f"\n[OK] Data file found: {data_path}")
    print(f"  - Total tweets: {len(df)}")
    print(f"  - Columns: {list(df.columns)}")
    print("\n  Sample tweets:")
    print(df[["Tweet Content"]].head(3))
else:
    print(f"\n[ERROR] Data file not found: {data_path}")
    print("Please ensure tweets.csv is in the data/raw/ folder.")


[OK] Data file found: /home/emmanuelabayor/projects/analisis-sentiment-pelatih-baru-chelsea-liam-rosenior/data/raw/tweets.csv
  - Total tweets: 408
  - Columns: ['Tweet Link', 'Author Handle', 'Tweet Content', 'Views', 'Likes', 'Retweets', 'Replies', 'Tweet Creation Date', 'Scraped Date']

  Sample tweets:
                                       Tweet Content
0  We tried to stop it from overthinking.\n\nWe f...
1                                                Waw
2                                 @grok\n who is he?
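
The cell above indexes `df[["Tweet Content"]]`, which raises a `KeyError` if that column is missing. A small sketch that checks the header row first using the standard `csv` module; `missing_columns` is a hypothetical helper, and the UTF-8 encoding is an assumption about the file.

```python
import csv


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig reads plain UTF-8 and also strips a leading byte-order mark
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]
```

For example, `missing_columns(data_path, ["Tweet Content"])` returns an empty list when the file has the expected column, so the sample preview can be skipped with a clear message otherwise.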


## 9. Ensure Output Directories Exist

In [10]:
# Ensure all output directories exist
# (paths are relative to the current working directory)
directories = [
    'data/processed',
    'outputs/figures',
    'outputs/tables',
    'outputs/models',
    'outputs/metrics'
]

for directory in directories:
    os.makedirs(directory, exist_ok=True)

print("\n✓ All output directories ensured:")
for directory in directories:
    print(f"  - {directory}")


✓ All output directories ensured:
  - data/processed
  - outputs/figures
  - outputs/tables
  - outputs/models
  - outputs/metrics
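
The paths in the cell above are relative, so the directories are created under whatever the current working directory happens to be. A sketch that anchors them to an explicit base directory with `pathlib`; `ensure_directories` and its `base` argument are illustrative names, not part of the project.

```python
from pathlib import Path


def ensure_directories(base, relative_paths):
    """Create each relative path under `base` and return the resulting paths."""
    created = []
    for relative in relative_paths:
        target = Path(base) / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Passing the project root as `base` would place `data/processed` next to `data/raw` regardless of where the notebook is started.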


## ✅ Setup Complete!

Your environment is now ready for analysis. You can proceed to the next notebook:

→ **`1_data_preprocessing.ipynb`**