# Python Dataset Exploration Notebook
This notebook demonstrates how to install dependencies, import common data science libraries, and access the mounted dataset via the DATA_DIR environment variable or /data.

## Install Dependencies
Installs packages listed in requirements.txt. Add any additional libraries your analysis needs there.

In [None]:
# Install dependencies from requirements.txt into the running kernel's environment.
# --quiet suppresses progress output; errors are still shown.
%pip install --quiet -r requirements.txt
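
Because the install cell suppresses most of its output, it can be worth confirming afterwards which packages are actually present. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` and the package list are illustrative assumptions, not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Illustrative package names; substitute the ones listed in your requirements.txt.
for name, version in installed_versions(['pandas', 'seaborn', 'polars']).items():
    print(f'{name}: {version or "not installed"}')
```

A `None` entry suggests the install step failed or the package is simply not listed in requirements.txt.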

In [None]:
# Example: install a package NOT listed in requirements.txt
# Useful for quick experiments
# pyarrow is included because pl.from_pandas (used later) depends on it
%pip install --quiet polars pyarrow
import polars as pl

## Import Common Libraries
Example imports of widely used data science libraries.

In [None]:
import os
from pathlib import Path
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
print('pandas version:', pd.__version__)
print('seaborn version:', sns.__version__)

## Inspect Dataset Directory
DATA_DIR is an environment variable pointing to the mounted (read-only) dataset directory. You can also access it via the /data symlink. Always prefer DATA_DIR for portability.
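
The lookup order described above (DATA_DIR first, then /data) can be sketched as a small helper that also fails with a clear message when neither location exists. This is a sketch under stated assumptions: the function name `resolve_data_dir` and its `env` and `fallback` parameters are hypothetical and introduced only for illustration.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None, fallback='/data'):
    """Return the dataset directory: DATA_DIR if set, otherwise the fallback path."""
    env = os.environ if env is None else env
    # An unset or empty DATA_DIR falls through to the fallback location.
    candidate = Path(env.get('DATA_DIR') or fallback)
    if not candidate.is_dir():
        raise FileNotFoundError(f'Dataset directory not found: {candidate}')
    return candidate


try:
    print('Using dataset directory:', resolve_data_dir())
except FileNotFoundError as exc:
    print(exc)
```

Passing `env` explicitly makes the helper easy to exercise outside the notebook environment, where DATA_DIR is typically not set.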

In [None]:
# Prefer DATA_DIR; fall back to the /data symlink if the variable is not set
data_dir = Path(os.environ.get('DATA_DIR', '/data'))
print('DATA_DIR =', data_dir)
print('\nListing via DATA_DIR:')
for p in data_dir.iterdir():
    print(' -', p.name)

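# Load the first top-level CSV (alphabetical order), if there is one
csvs = sorted(data_dir.glob('*.csv'))
df = None
if csvs:
    first = csvs[0]
    print(f"\nAttempting to load {first.name} ...")
    try:
        df = pd.read_csv(first)
    except (pd.errors.ParserError, pd.errors.EmptyDataError, UnicodeDecodeError) as exc:
        print(f'Could not load {first.name}: {exc}')
else:
    print('\nNo CSV files found in', data_dir)

if df is not None:
    print('Shape =', df.shape)
    display(df.head())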
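
The cell above only looks for CSV files at the top level of the dataset directory. If the dataset is nested or uses other formats, a quick inventory by file extension may help decide what to load. The following is a minimal standard-library sketch; the helper name `count_by_suffix` is an assumption made for illustration.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root):
    """Count regular files under root (recursively) by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob('*'):
        if path.is_file():
            # Files without an extension are grouped under '(none)'.
            counts[path.suffix.lower() or '(none)'] += 1
    return counts
```

In the notebook, calling `count_by_suffix(data_dir).most_common()` would list the extensions found under the dataset directory, most frequent first. On large datasets the recursive walk can take a while.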

## Example Library Usage
Converts the pandas DataFrame (df) to a Polars DataFrame (pl_df), displays the first rows, and computes the mean of each numeric column using Polars expressions.

In [None]:
import polars.selectors as cs

# Run only if the previous cell loaded a DataFrame
if 'df' in locals() and isinstance(df, pd.DataFrame):
    pl_df = pl.from_pandas(df)  # requires pyarrow to be installed
    print('Converted pandas DataFrame to Polars; shape =', pl_df.shape)
    print(pl_df.head())
    # Restrict to numeric columns; a mean is not meaningful for string columns
    print('\nNumeric column means:')
    print(pl_df.select(cs.numeric().mean()))