# ISS Crew Health Analysis - Workspace Setup

This notebook sets up the development environment for analyzing ISS crew health data in support of space medicine research.

## Project Overview
- **Objective**: Analyze physiological data from NASA's Life Sciences Data Archive (LSDA)
- **Focus**: Study effects of microgravity on astronaut health
- **Applications**: Predictive modeling for Mars missions and terrestrial medicine

---

## 1. Install Essential Python Packages

Install commonly used Python packages for data science, space medicine analysis, and NASA API integration.

In [1]:
# Install core data science packages
# (version specifiers are quoted so the shell does not treat ">" as an output redirect)
!pip install "pandas>=1.5.0" "numpy>=1.24.0" "scipy>=1.10.0"

# Install visualization libraries
!pip install "matplotlib>=3.6.0" "seaborn>=0.12.0" "plotly>=5.15.0"

# Install machine learning libraries
!pip install "scikit-learn>=1.2.0" "statsmodels>=0.14.0"

print("✓ Core packages installation completed")

✓ Core packages installation completed


In [None]:
# Install NASA/Space data APIs and web scraping
# (version specifiers are quoted so the shell does not treat ">" as an output redirect)
!pip install "requests>=2.28.0" "beautifulsoup4>=4.11.0"

# Install development and testing tools
!pip install "jupyter>=1.0.0" "notebook>=6.5.0" "pytest>=7.2.0"

# Install utilities
!pip install "tqdm>=4.64.0" "python-dotenv>=1.0.0" "openpyxl>=3.1.0"

print("✓ Specialized packages installation completed")
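
Because the `print` calls above run whether or not `pip` succeeded, it can help to confirm what is actually installed. The following is a minimal sketch using only the standard library; `MINIMUMS`, `version_tuple`, `meets_minimum`, and `check_minimums` are illustrative names, not part of the project, and the version comparison is deliberately simplified (it looks only at the leading numeric components).

```python
import re
from importlib import metadata

# Minimum versions copied from the install cells (a subset, for illustration)
MINIMUMS = {"pandas": "1.5.0", "numpy": "1.24.0", "scikit-learn": "1.2.0"}


def version_tuple(version):
    """Turn '1.24.0rc1' into (1, 24, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums, lookup=metadata.version):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


for name, (installed, ok) in check_minimums(MINIMUMS).items():
    status = "ok" if ok else "missing or too old"
    print(f"{name}: {installed} ({status})")
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment.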

## 2. Configure Development Environment

Set up environment variables, configure paths, and establish development settings.

In [None]:
import os
import sys
from pathlib import Path

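# Set project root directory: step up one level only when the notebook is run
# from the notebooks/ folder itself (a substring test on the full path would
# also match unrelated parent folders that happen to contain "notebooks")
PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()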
print(f"Project Root: {PROJECT_ROOT}")

# Add src directory to Python path
SRC_DIR = PROJECT_ROOT / 'src'
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))
    
# Set up data directories
DATA_DIR = PROJECT_ROOT / 'data'
MODELS_DIR = PROJECT_ROOT / 'models'
REPORTS_DIR = PROJECT_ROOT / 'reports'
NOTEBOOKS_DIR = PROJECT_ROOT / 'notebooks'

print(f"Data Directory: {DATA_DIR}")
print(f"Models Directory: {MODELS_DIR}")
print(f"Reports Directory: {REPORTS_DIR}")
print(f"Notebooks Directory: {NOTEBOOKS_DIR}")

# Environment variables for NASA APIs
os.environ['NASA_API_BASE'] = 'https://osdr.nasa.gov/bio/api/'
os.environ['LSDA_SEARCH_URL'] = 'https://osdr.nasa.gov/bio/repo/search'

print("✓ Environment configuration completed")
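
Detecting the project root from the working directory name is fragile when notebooks live in nested folders such as `notebooks/exploratory`. One alternative, sketched here under the assumption that the root can be recognized by a marker file, is to walk upward until a marker is found. The function name `find_project_root` and the default markers are illustrative choices, not something this notebook defines.

```python
from pathlib import Path


def find_project_root(start, markers=("requirements.txt", ".git")):
    """Return the nearest ancestor of `start` (inclusive) containing a marker.

    Falls back to `start` itself when no marker is found anywhere above it.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return start


# Example usage inside a notebook:
# PROJECT_ROOT = find_project_root(Path.cwd())
```

The markers only exist after section 4 (and section 5 for `.git`) has run, so this approach suits later notebooks rather than the very first run of this one.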

## 3. Set Up Project Structure

Create a standard project directory structure with folders for data, models, reports, notebooks, source code, tests, and configuration.

In [None]:
# Create project directories
directories = [
    'data/raw',
    'data/processed',
    'data/external',
    'models/trained',
    'models/checkpoints',
    'reports/figures',
    'reports/documents',
    'notebooks/exploratory',
    'notebooks/modeling',
    'src/utils',
    'tests/unit',
    'tests/integration',
    'config'
]

for directory in directories:
    dir_path = PROJECT_ROOT / directory
    dir_path.mkdir(parents=True, exist_ok=True)
    print(f"✓ Created: {directory}")

print("\n✓ Project structure setup completed")

In [None]:
# Create data subdirectories for different types of space medicine data
data_subdirs = [
    'data/raw/crew_health',
    'data/raw/mission_data',
    'data/raw/physiological_metrics',
    'data/processed/bone_density',
    'data/processed/muscle_atrophy',
    'data/processed/cardiovascular',
    'data/external/nasa_lsda',
    'data/external/mission_logs'
]

for subdir in data_subdirs:
    dir_path = PROJECT_ROOT / subdir
    dir_path.mkdir(parents=True, exist_ok=True)
    print(f"✓ Created: {subdir}")

print("\n✓ Specialized data directories created")
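
Git does not track empty directories, so the folders created above would not appear in the initial commit. A common convention (not a Git feature) is to place an empty placeholder file in each one. The sketch below does that; the helper name `add_gitkeep` and the `.gitkeep` filename are assumptions made for illustration.

```python
from pathlib import Path


def add_gitkeep(root):
    """Create an empty .gitkeep in every empty directory under `root`.

    Returns the list of placeholder files that were created.
    """
    root = Path(root)
    created = []
    # Materialize the directory list first so new files do not affect the walk
    directories = sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in directories:
        if ".git" in directory.relative_to(root).parts:
            continue  # never touch Git's own metadata
        if not any(directory.iterdir()):
            keep = directory / ".gitkeep"
            keep.touch()
            created.append(keep)
    return created


# Example usage inside the notebook:
# for keep in add_gitkeep(PROJECT_ROOT):
#     print(f"✓ Placeholder: {keep.relative_to(PROJECT_ROOT)}")
```

Only leaf directories receive a placeholder, since a parent such as `data/` already contains subfolders.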

## 4. Create Configuration Files

Generate necessary configuration files for the ISS crew health analysis project.

In [None]:
# Create requirements.txt
requirements_content = """# Core data science libraries
pandas>=1.5.0
numpy>=1.24.0
scipy>=1.10.0

# Data visualization
matplotlib>=3.6.0
seaborn>=0.12.0
plotly>=5.15.0

# Machine learning and statistics
scikit-learn>=1.2.0
statsmodels>=0.14.0

# NASA/Space data APIs
requests>=2.28.0
beautifulsoup4>=4.11.0

# Data processing and utilities
jupyter>=1.0.0
notebook>=6.5.0
openpyxl>=3.1.0
xlrd>=2.0.0

# Development and testing
pytest>=7.2.0
black>=22.0.0
flake8>=6.0.0

# Progress bars and utilities
tqdm>=4.64.0
python-dotenv>=1.0.0
"""

with open(PROJECT_ROOT / 'requirements.txt', 'w') as f:
    f.write(requirements_content)
    
print("✓ requirements.txt created")

In [None]:
# Create .gitignore
gitignore_content = """# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# IDE
.vscode/
.idea/

# Data files (the ** patterns also cover subdirectories such as data/raw/crew_health/)
data/raw/**/*.csv
data/raw/**/*.json
data/raw/**/*.xlsx
data/processed/**/*.csv
data/processed/**/*.json
data/external/**/*.csv
data/external/**/*.json

# Model files
models/trained/*.joblib
models/trained/*.pkl
models/checkpoints/*.h5

# NASA API keys and secrets
*.key
secrets.json

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
"""

with open(PROJECT_ROOT / '.gitignore', 'w') as f:
    f.write(gitignore_content)
    
print("✓ .gitignore created")

In [None]:
# Create environment configuration file
env_content = """# NASA APIs Configuration
NASA_API_BASE=https://osdr.nasa.gov/bio/api/
LSDA_SEARCH_URL=https://osdr.nasa.gov/bio/repo/search

# Data Processing Configuration
MAX_RECORDS_PER_REQUEST=100
RATE_LIMIT_DELAY=1

# Model Configuration
RANDOM_SEED=42
TEST_SIZE=0.2
CROSS_VALIDATION_FOLDS=5

# Visualization Configuration
FIGURE_DPI=300
PLOT_STYLE=seaborn-v0_8

# Logging Configuration
LOG_LEVEL=INFO
LOG_FILE=iss_crew_analysis.log
"""

with open(PROJECT_ROOT / '.env.example', 'w') as f:
    f.write(env_content)
    
print("✓ .env.example created")
print("  → Copy this to .env and customize as needed")
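
Values in a `.env` file are plain strings, so numeric settings need converting after loading. In this project `python-dotenv` would normally do the loading; the sketch below is a standard-library stand-in that handles only simple `KEY=VALUE` lines, to show the conversion step. `parse_env` is a hypothetical helper and the embedded text repeats a few entries from the template above.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a KEY=VALUE line
        settings[key.strip()] = value.strip()
    return settings


example = """# Model Configuration
RANDOM_SEED=42
TEST_SIZE=0.2
CROSS_VALIDATION_FOLDS=5

# Visualization Configuration
FIGURE_DPI=300
PLOT_STYLE=seaborn-v0_8
"""

settings = parse_env(example)

# Every value arrives as a string; convert explicitly where a number is needed
random_seed = int(settings["RANDOM_SEED"])
test_size = float(settings["TEST_SIZE"])
print(random_seed, test_size, settings["PLOT_STYLE"])
```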

In [None]:
# Create configuration for analysis parameters
config_content = """# ISS Crew Health Analysis Configuration

# Data Collection Parameters
HEALTH_KEYWORDS = [
    "bone density", "muscle atrophy", "cardiovascular",
    "physiology", "microgravity", "crew health",
    "medical", "biomedical", "space medicine"
]

# Mission Duration Categories (days)
SHORT_MISSION_MAX = 90
MEDIUM_MISSION_MAX = 180
LONG_MISSION_MAX = 365
EXTENDED_MISSION_MAX = 500

# Mars Mission Parameters
MARS_MISSION_DURATION_DAYS = 780  # ~2.1 years
MARS_MISSION_SCENARIOS = [500, 600, 700, 800, 900]  # Different duration scenarios

# Feature Engineering
OUTLIER_DETECTION_METHOD = "iqr"  # Options: "iqr", "zscore"
NORMALIZATION_METHOD = "standard"  # Options: "standard", "minmax"
MISSING_VALUE_STRATEGY = "median"  # Options: "median", "mean", "constant"

# Model Training
HYPERPARAMETER_TUNING = True
FEATURE_IMPORTANCE_ANALYSIS = True
CROSS_VALIDATION_SCORING = "r2"

# Visualization
CORRELATION_THRESHOLD = 0.5  # Minimum correlation for highlighting
TOP_FEATURES_TO_SHOW = 10
FIGURE_SIZE_DEFAULT = (12, 8)
"""

with open(PROJECT_ROOT / 'config' / 'analysis_config.py', 'w') as f:
    f.write(f'"""\nAnalysis Configuration for ISS Crew Health Study\n"""\n\n{config_content}')
    
print("✓ analysis_config.py created")
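
The mission duration thresholds in the configuration can be turned into a category label with a small helper. This is a sketch under two assumptions that the configuration does not state: each `*_MAX` value is treated as an inclusive upper bound, and anything longer than `EXTENDED_MISSION_MAX` gets the made-up label `"beyond_extended"`. The function name `categorize_mission` is also illustrative.

```python
# Thresholds repeated from config/analysis_config.py (days)
SHORT_MISSION_MAX = 90
MEDIUM_MISSION_MAX = 180
LONG_MISSION_MAX = 365
EXTENDED_MISSION_MAX = 500


def categorize_mission(duration_days):
    """Map a mission duration in days to a category label."""
    if duration_days < 0:
        raise ValueError("duration_days must be non-negative")
    if duration_days <= SHORT_MISSION_MAX:
        return "short"
    if duration_days <= MEDIUM_MISSION_MAX:
        return "medium"
    if duration_days <= LONG_MISSION_MAX:
        return "long"
    if duration_days <= EXTENDED_MISSION_MAX:
        return "extended"
    return "beyond_extended"  # assumed label, e.g. the 780-day Mars scenario


for days in (45, 180, 340, 780):
    print(days, categorize_mission(days))
```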

## 5. Initialize Version Control

Set up a Git repository, configure user settings, and create the initial commit.

In [None]:
# Initialize Git repository
# git -C with a quoted path also works when PROJECT_ROOT contains spaces
!git -C "{PROJECT_ROOT}" init

print("✓ Git repository initialized")

In [None]:
# Configure Git user (customize these values)
!git -C "{PROJECT_ROOT}" config --local user.name "ISS Crew Health Researcher"
!git -C "{PROJECT_ROOT}" config --local user.email "researcher@spacemedicine.org"

print("✓ Git user configuration completed")
print("  → Update user.name and user.email with your actual information")

In [None]:
# Add files to Git
!git -C "{PROJECT_ROOT}" add .
!git -C "{PROJECT_ROOT}" status

print("\n✓ Files staged for commit")

In [None]:
# Create initial commit
!git -C "{PROJECT_ROOT}" commit -m "Initial project setup for ISS crew health analysis"

print("✓ Initial commit created")
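
The shell cells above print a success message even when the Git command fails. A sketch of an alternative is to call Git through `subprocess`, which raises an error on a non-zero exit code. `git_command` and `run_git` are hypothetical helpers; the only Git feature relied on is the `-C <path>` option, which runs the command as if started in that directory.

```python
import subprocess


def git_command(project_root, *args):
    """Build the argument list for running git inside `project_root`."""
    return ["git", "-C", str(project_root), *args]


def run_git(project_root, *args):
    """Run a git command and raise CalledProcessError if it fails."""
    return subprocess.run(
        git_command(project_root, *args),
        check=True,            # non-zero exit status becomes an exception
        capture_output=True,   # keep stdout/stderr for inspection
        text=True,
    )


# Example usage inside the notebook (requires git on the PATH):
# result = run_git(PROJECT_ROOT, "status", "--short")
# print(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues for paths that contain spaces.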

## 6. Test Environment Setup

Verify that all installations and configurations are working correctly.

In [None]:
# Test core library imports
try:
    import pandas as pd
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    import seaborn as sns
    import plotly
    import plotly.express as px
    print("✓ Core data science libraries imported successfully")

    # Print versions
    print(f"  • pandas: {pd.__version__}")
    print(f"  • numpy: {np.__version__}")
    print(f"  • matplotlib: {matplotlib.__version__}")
    print(f"  • seaborn: {sns.__version__}")
    print(f"  • plotly: {plotly.__version__}")
    
except ImportError as e:
    print(f"❌ Import error: {e}")

In [None]:
# Test machine learning libraries
try:
    import sklearn
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor
    print("✓ Machine learning libraries imported successfully")
    print(f"  • scikit-learn: {sklearn.__version__}")
    
except ImportError as e:
    print(f"❌ Import error: {e}")

In [None]:
# Test web scraping and API libraries
try:
    import requests
    from bs4 import BeautifulSoup
    print("✓ Web scraping and API libraries imported successfully")
    print(f"  • requests: {requests.__version__}")
    
except ImportError as e:
    print(f"❌ Import error: {e}")

In [None]:
# Test project modules (if they exist)
try:
    # Try to import custom modules
    if (SRC_DIR / 'data_acquisition.py').exists():
        import data_acquisition
        print("✓ data_acquisition module imported")
    
    if (SRC_DIR / 'data_preprocessing.py').exists():
        import data_preprocessing
        print("✓ data_preprocessing module imported")
    
    if (SRC_DIR / 'exploratory_analysis.py').exists():
        import exploratory_analysis
        print("✓ exploratory_analysis module imported")
        
    if (SRC_DIR / 'predictive_modeling.py').exists():
        import predictive_modeling
        print("✓ predictive_modeling module imported")
    
    print("\n✓ All project modules are accessible")
    
except ImportError as e:
    print(f"⚠️  Project modules not yet available: {e}")
    print("   This is normal if you haven't created the source files yet")
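
The same availability check can be written as a loop that does not import anything, using `importlib.util.find_spec`. This is a sketch: the module names are the four checked above, `module_status` is an illustrative helper, and the lookup only finds the project modules if `src/` is already on `sys.path` as configured in section 2.

```python
from importlib.util import find_spec

PROJECT_MODULES = [
    "data_acquisition",
    "data_preprocessing",
    "exploratory_analysis",
    "predictive_modeling",
]


def module_status(names):
    """Return {module_name: True/False} without importing the modules."""
    return {name: find_spec(name) is not None for name in names}


for name, available in module_status(PROJECT_MODULES).items():
    marker = "✓" if available else "⚠️ "
    print(f"{marker} {name}: {'found' if available else 'not found yet'}")
```

Unlike the `try`/`except` cell, this reports every missing module rather than stopping at the first import error.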

In [None]:
# Test basic functionality with sample data
print("Testing basic functionality...")

# Create sample data
np.random.seed(42)
sample_data = pd.DataFrame({
    'mission_duration': np.random.randint(90, 400, 50),
    'crew_age': np.random.randint(25, 55, 50),
    'bone_density_change': np.random.normal(-5, 2, 50)
})

print(f"✓ Sample dataset created: {len(sample_data)} records")
print(f"  Columns: {list(sample_data.columns)}")

# Test basic visualization
plt.figure(figsize=(8, 5))
plt.scatter(sample_data['mission_duration'], sample_data['bone_density_change'], alpha=0.6)
plt.xlabel('Mission Duration (days)')
plt.ylabel('Bone Density Change (%)')
plt.title('Sample: Mission Duration vs Bone Density Change')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✓ Basic visualization test completed")
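
The analysis configuration names `"iqr"` as the outlier detection method. As a rough illustration of that rule, the sketch below flags values lying more than 1.5 interquartile ranges outside the quartiles. It uses the standard-library `statistics` module rather than pandas so that it runs on its own; `iqr_bounds`, `iqr_outliers`, the multiplier `k=1.5`, and the sample values are all assumptions for the example, not project code or real crew data.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k interquartile ranges from Q1 and Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up numbers purely to exercise the functions
readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_bounds(readings))
print(iqr_outliers(readings))
```

On a DataFrame column such as `sample_data['bone_density_change']`, the same fences could be computed from the column's 0.25 and 0.75 quantiles.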

In [None]:
# Test NASA API accessibility (basic connection test)
print("Testing NASA API accessibility...")

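try:
    # Reuse the search URL configured in section 2, falling back to the same default
    nasa_url = os.environ.get('LSDA_SEARCH_URL', 'https://osdr.nasa.gov/bio/repo/search')
    response = requests.get(nasa_url, params={'q': 'test', 'size': 1}, timeout=10)

    if response.status_code == 200:
        print("✓ NASA OSDR search endpoint is accessible")
        print(f"  Response status: {response.status_code}")
    else:
        print(f"⚠️  NASA OSDR search endpoint returned status: {response.status_code}")

except requests.RequestException as e:
    # Catch only network/HTTP errors so unrelated bugs are not hidden
    print(f"⚠️  Could not connect to NASA OSDR: {e}")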
    print("   This might be due to network restrictions or API changes")
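
A single failed request does not necessarily mean the service is down. The `.env.example` template defines `RATE_LIMIT_DELAY=1`, which suggests pausing between attempts; the sketch below shows one way to retry with a fixed delay. `fetch_with_retry` is a hypothetical helper that accepts any zero-argument callable, so it is not tied to `requests` and makes no claim about how the OSDR service behaves.

```python
import time


def fetch_with_retry(fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Call `fetch()` up to `retries` times, sleeping `delay` seconds between tries.

    Returns the first successful result, or re-raises the last error.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: the callable is arbitrary
            last_error = error
            if attempt < retries:
                sleep(delay)
    raise last_error


# Example usage inside the notebook (assumes `requests` and `nasa_url` exist):
# response = fetch_with_retry(
#     lambda: requests.get(nasa_url, params={'q': 'test', 'size': 1}, timeout=10)
# )
```

Note that `requests.get` only raises for connection-level problems; to retry on HTTP error codes as well, the callable would need to call `response.raise_for_status()` itself.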

## 7. Environment Setup Summary

Summary of the workspace setup and next steps.

In [None]:
# Print comprehensive setup summary
print("=" * 60)
print("ISS CREW HEALTH ANALYSIS - WORKSPACE SETUP COMPLETE")
print("=" * 60)

print("\n📁 PROJECT STRUCTURE:")
print(f"   Root Directory: {PROJECT_ROOT}")
print("   ├── data/")
print("   │   ├── raw/ (NASA LSDA data)")
print("   │   ├── processed/ (cleaned data)")
print("   │   └── external/ (external datasets)")
print("   ├── src/ (source code modules)")
print("   ├── notebooks/ (Jupyter analysis notebooks)")
print("   ├── models/ (trained ML models)")
print("   ├── reports/ (analysis reports)")
print("   ├── tests/ (unit and integration tests)")
print("   └── config/ (configuration files)")

print("\n🛠️  INSTALLED PACKAGES:")
print("   • Data Science: pandas, numpy, scipy")
print("   • Visualization: matplotlib, seaborn, plotly")
print("   • Machine Learning: scikit-learn, statsmodels")
print("   • APIs & Web: requests, beautifulsoup4")
print("   • Development: jupyter, pytest")

print("\n🚀 NEXT STEPS:")
print("   1. Review and update configuration files (.env, config/)")
print("   2. Run data acquisition to fetch NASA LSDA data")
print("   3. Explore the notebooks/ directory for analysis workflows")
print("   4. Execute main.py for complete pipeline analysis")
print("   5. Check reports/ directory for generated insights")

print("\n📚 KEY RESEARCH FOCUS AREAS:")
print("   • Bone density loss in microgravity")
print("   • Muscle atrophy during long-duration flights")
print("   • Cardiovascular changes in space")
print("   • Predictive modeling for Mars missions")
print("   • Applications to terrestrial medicine")

print("\n" + "=" * 60)
print("WORKSPACE READY FOR ISS CREW HEALTH RESEARCH! 🧑‍🚀")
print("=" * 60)