# CSV → Graphs (Quick Start)

Use this notebook to load any CSV and generate quick visualizations.

## How to use
1. Put your CSV file in the same folder as this notebook.
2. In **Step 2**, change `csv_path` to your filename (e.g., `"my_data.csv"`).
3. Run the cells from top to bottom.
4. Use the helper functions in **Step 4** to make the charts you want.

**Tip:** If your data lives in Apple Numbers, export it to CSV first via `File → Export To → CSV…`.

In [1]:
# Step 1 — Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

%matplotlib inline
pd.set_option('display.max_rows', 20)
pd.set_option('display.max_columns', 50)
print('Versions → pandas', pd.__version__, '| numpy', np.__version__)

Versions → pandas 2.3.2 | numpy 2.3.2


In [2]:
# Step 2 — Load your CSV
csv_path = 'energy_log_2.csv'  # ← CHANGE THIS to your file name, e.g., 'energy.csv'

def smart_read_csv(path):
    """Read CSV and auto-detect common date columns."""
    df = pd.read_csv(path)
    # Try to parse likely date/time columns
    for col in df.columns:
        if df[col].dtype == object:
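            try:
                # errors='coerce' turns unparseable values into NaT instead of raising,
                # so the ratio check below can tolerate a few bad entries
                s = pd.to_datetime(df[col], errors='coerce')
                # Heuristic: if at least 80% of non-null values parsed, keep it
                non_null = df[col].notna().sum()
                ok_ratio = s.notna().sum() / non_null if non_null else 0
                if ok_ratio >= 0.8:
                    df[col] = s
            except Exception:
                pass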
    return df

df = smart_read_csv(csv_path)
print('Rows:', len(df), '| Columns:', len(df.columns))
df.head()

Rows: 2095 | Columns: 5


Unnamed: 0,Timestamp,Voltage (V),Current (A),Power (W),Energy So Far (Joules)
0,2025-08-29 10:31:51,0.879731,0.78584,0.691327,0.0691
1,2025-08-29 10:31:51,0.878925,0.84835,0.745636,0.1437
2,2025-08-29 10:31:51,0.74945,0.58938,0.441711,0.1879
3,2025-08-29 10:31:52,0.878925,1.74135,1.530515,0.3409
4,2025-08-29 10:31:52,0.879193,1.76814,1.554537,0.4964
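
The date-detection heuristic in Step 2 can be tried on a small in-memory table before pointing it at a real file. The sketch below is illustrative only: `looks_like_datetime`, the sample data, and the default threshold are assumptions made for this example. It uses `errors='coerce'` so that values which fail to parse become `NaT` rather than raising.

```python
import warnings

import pandas as pd


def looks_like_datetime(series, threshold=0.8):
    """Return True if at least `threshold` of the non-null values parse as dates.

    Hypothetical helper for illustration; not part of pandas.
    """
    non_null = series.dropna()
    if non_null.empty:
        return False
    with warnings.catch_warnings():
        # pandas may warn when it cannot infer a single date format; that is
        # expected for non-date text, so silence it here.
        warnings.simplefilter("ignore")
        parsed = pd.to_datetime(non_null, errors="coerce")
    return bool(parsed.notna().mean() >= threshold)


sample = pd.DataFrame({
    "when": ["2025-08-29 10:31:51", "2025-08-29 10:31:52", "2025-08-29 10:31:53",
             "2025-08-29 10:31:54", "2025-08-29 10:31:55", "not recorded"],
    "label": ["red", "green", "blue", "red", "green", "blue"],
})

for name in sample.columns:
    print(name, looks_like_datetime(sample[name]))
```

Here `when` passes because five of its six values parse, while `label` does not. A stricter or looser `threshold` changes how many bad entries a column may contain before it is left as text.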


In [3]:
# Step 3 — Quick overview (compatible with old/new pandas)
import inspect

try:
    # datetime_is_numeric exists only in pandas 1.1 through 1.5; pandas 2.0 removed it
    # and summarizes datetime columns numerically by default
    sig = inspect.signature(type(df).describe)
    if "datetime_is_numeric" in sig.parameters:
        display(df.describe(include='all', datetime_is_numeric=True))
    else:
        display(df.describe(include='all'))
except TypeError:
    # Safety net in case describe() still rejects the argument
    display(df.describe(include='all'))

df.info()

Unnamed: 0,Timestamp,Voltage (V),Current (A),Power (W),Energy So Far (Joules)
count,2095,2095.0,2095.0,2095.0,2095.0
mean,2025-08-29 10:34:31.450596352,0.853777,2.903682,2.518888,199.218647
min,2025-08-29 10:31:51,0.747569,0.57152,0.427558,0.0691
25%,2025-08-29 10:33:23.500000,0.878925,1.00016,0.879603,50.02895
50%,2025-08-29 10:34:34,0.879193,3.00941,2.535936,164.7843
75%,2025-08-29 10:35:43,0.879462,4.23282,3.722036,333.93755
max,2025-08-29 10:36:51,0.883223,8.1263,7.146771,527.707
std,,0.051412,1.682657,1.494295,162.628039


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2095 entries, 0 to 2094
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Timestamp               2095 non-null   datetime64[ns]
 1   Voltage (V)             2095 non-null   float64       
 2   Current (A)             2095 non-null   float64       
 3   Power (W)               2095 non-null   float64       
 4   Energy So Far (Joules)  2095 non-null   float64       
dtypes: datetime64[ns](1), float64(4)
memory usage: 82.0 KB
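
An alternative to checking the `describe` signature is to summarize numeric and datetime columns separately, which avoids `datetime_is_numeric` entirely. This is a minimal sketch: `split_overview` and the sample frame are hypothetical names introduced here, not part of the notebook.

```python
import pandas as pd


def split_overview(frame):
    """Summarize numeric and datetime columns separately.

    Hypothetical helper for illustration. Returns (numeric_summary, time_summary).
    """
    numeric = frame.select_dtypes(include="number")
    # describe() raises on a frame without columns, so guard that case
    numeric_summary = numeric.describe() if numeric.shape[1] else pd.DataFrame()

    times = frame.select_dtypes(include=["datetime", "datetimetz"])
    rows = {
        col: {
            "start": times[col].min(),
            "end": times[col].max(),
            "span": times[col].max() - times[col].min(),
        }
        for col in times.columns
    }
    time_summary = pd.DataFrame.from_dict(rows, orient="index")
    return numeric_summary, time_summary


sample = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2025-08-29 10:31:51", "2025-08-29 10:31:52",
                                 "2025-08-29 10:36:51"]),
    "Power (W)": [0.5, 1.5, 2.5],
})
numeric_summary, time_summary = split_overview(sample)
print(numeric_summary)
print(time_summary)
```

Keeping the two summaries apart also makes the recording span easy to read, which the combined `describe` table does not show directly.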


In [4]:
# Step 4 — Plot helpers (pick and use the ones you need)

def numeric_cols(dataframe):
    return dataframe.select_dtypes(include=[np.number]).columns.tolist()

def datetime_cols(dataframe):
    # 'datetime' matches timezone-naive columns; 'datetimetz' matches timezone-aware ones in any timezone
    return dataframe.select_dtypes(include=["datetime", "datetimetz"]).columns.tolist()

def line_over_time(dataframe, time_col=None, y_cols=None, title='Line over time'):
    """Plot one or more numeric columns against a time column."""
    if time_col is None:
        dcols = datetime_cols(dataframe)
        if not dcols:
            raise ValueError('No datetime-like column found. Specify time_col.')
        time_col = dcols[0]
    if y_cols is None:
        y_cols = numeric_cols(dataframe)
    if not y_cols:
        raise ValueError('No numeric columns to plot.')
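    for col in y_cols:
        # Pass an explicit Axes: DataFrame.plot() otherwise opens its own figure,
        # leaving the one created by plt.figure() empty
        fig, ax = plt.subplots()
        dataframe.plot(x=time_col, y=col, kind='line', legend=False, ax=ax)
        ax.set_title(f"{title}: {col}")
        ax.set_xlabel(time_col)
        ax.set_ylabel(col)
        fig.tight_layout()
        plt.show()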

def bar_top_n(dataframe, by_col, value_col, n=10, ascending=False, title='Top N'):
    """Bar chart of top N categories by a numeric value."""
    tmp = dataframe[[by_col, value_col]].dropna()
    agg = tmp.groupby(by_col, dropna=False)[value_col].sum().sort_values(ascending=ascending)
    top = agg.head(n)
    plt.figure()
    top.plot(kind='bar')
    plt.title(f"{title}: {value_col} by {by_col}")
    plt.xlabel(by_col)
    plt.ylabel(value_col)
    plt.tight_layout()
    plt.show()

def histograms(dataframe, bins=30):
    """One histogram per numeric column."""
    for col in numeric_cols(dataframe):
        plt.figure()
        dataframe[col].dropna().plot(kind='hist', bins=bins)
        plt.title(f"Histogram: {col}")
        plt.xlabel(col)
        plt.tight_layout()
        plt.show()

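def scatter(dataframe, x, y, title=None):
    """Scatter plot of two numeric columns."""
    # Draw on an explicit Axes so DataFrame.plot() does not open a second figure
    fig, ax = plt.subplots()
    dataframe.plot(kind='scatter', x=x, y=y, ax=ax)
    ax.set_title(title or f"Scatter: {x} vs {y}")
    fig.tight_layout()
    plt.show()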

def correlation_heatmap(dataframe):
    """Simple correlation heatmap using matplotlib only (no seaborn)."""
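    num = dataframe[numeric_cols(dataframe)]
    corr = num.corr()  # `num` already holds only numeric columns
    plt.figure(figsize=(6,5))
    # Pin the color scale to the full correlation range so colors are comparable across datasets
    plt.imshow(corr, interpolation='nearest', vmin=-1, vmax=1)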
    plt.xticks(range(len(corr.columns)), corr.columns, rotation=90)
    plt.yticks(range(len(corr.columns)), corr.columns)
    plt.colorbar()
    plt.title('Correlation heatmap')
    plt.tight_layout()
    plt.show()

print('Helpers ready. See examples below ↓')

Helpers ready. See examples below ↓
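
Because `line_over_time` opens one figure per column, a CSV with many numeric columns produces a long scroll of charts. One possible variation, sketched below, stacks the columns as subplots that share a time axis. `line_grid` and the sample frame are hypothetical, introduced only for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def line_grid(frame, time_col, y_cols=None):
    """Plot each column in `y_cols` against `time_col` as stacked subplots.

    Hypothetical helper for illustration. Returns the Figure so the caller
    can show or save it.
    """
    if y_cols is None:
        y_cols = frame.select_dtypes(include=[np.number]).columns.tolist()
    if not y_cols:
        raise ValueError("No numeric columns to plot.")
    # squeeze=False keeps `axes` two-dimensional even for a single column
    fig, axes = plt.subplots(len(y_cols), 1, sharex=True, squeeze=False,
                             figsize=(8, 2.5 * len(y_cols)))
    for ax, col in zip(axes[:, 0], y_cols):
        ax.plot(frame[time_col], frame[col])
        ax.set_ylabel(col)
    axes[-1, 0].set_xlabel(time_col)
    fig.tight_layout()
    return fig


sample = pd.DataFrame({
    "Timestamp": pd.date_range("2025-08-29 10:31:00", periods=5, freq="min"),
    "Voltage (V)": [0.88, 0.88, 0.75, 0.88, 0.88],
    "Current (A)": [0.79, 0.85, 0.59, 1.74, 1.77],
})
fig = line_grid(sample, time_col="Timestamp")
```

With `%matplotlib inline` the figure is rendered at the end of the cell; outside a notebook, call `plt.show()` or `fig.savefig(...)` on the returned figure.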


In [None]:
# Step 5 — Examples (uncomment the ones you want)

# Example A: line plots for each numeric column over first datetime column
# line_over_time(df)

# Example B: top 10 categories by a value column
# ('category' and 'amount' are placeholders; replace them with columns from your CSV)
# bar_top_n(df, by_col='category', value_col='amount', n=10, title='Top categories')

# Example C: histograms for all numeric columns
# histograms(df, bins=30)

# Example D: scatter between two numeric columns (names must match the CSV header exactly)
# scatter(df, x='Voltage (V)', y='Current (A)', title='Voltage vs Current')

# Example E: correlation heatmap of numeric columns
# correlation_heatmap(df)
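
Example B needs a categorical column and a value column, which the sample energy log does not have. To make the aggregation behind `bar_top_n` concrete, here is a small sketch of the same steps (group, sum, sort, keep the first N) on made-up data. The column names `category` and `amount` and all values are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "category": ["fruit", "dairy", "fruit", "bakery", "dairy", "fruit"],
    "amount": [4.0, 2.5, 1.0, 3.5, 0.5, 2.0],
})

totals = (
    sales.dropna(subset=["category", "amount"])  # ignore incomplete rows
    .groupby("category")["amount"]
    .sum()                                       # one total per category
    .sort_values(ascending=False)                # largest total first
)
top_two = totals.head(2)
print(top_two)
```

Note that passing `ascending=True` to `bar_top_n` reverses the sort, so the chart then shows the N smallest totals rather than the largest.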

### Notes
- If your date column wasn't auto-detected, convert it manually using your own column name: `df['Timestamp'] = pd.to_datetime(df['Timestamp'])`.
- If your CSV uses `;` as separator, use: `pd.read_csv(csv_path, sep=';')`.
- If you have decimals with commas (e.g., `"3,14"`), add `decimal=','` to `read_csv`.
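
The notes above can be checked without a file on disk. The sketch below reads a small semicolon-separated table with decimal commas from an in-memory string, then converts its text column to datetime by hand. The data and column names are made up for the example.

```python
import io

import pandas as pd

raw = io.StringIO(
    "timestamp;voltage\n"
    "2025-08-29 10:31:51;0,88\n"
    "2025-08-29 10:31:52;0,75\n"
)

# sep=';' splits on semicolons; decimal=',' reads "0,88" as the float 0.88
euro = pd.read_csv(raw, sep=";", decimal=",")

# Convert the text column to datetime manually
euro["timestamp"] = pd.to_datetime(euro["timestamp"])

print(euro.dtypes)
```

Without `decimal=','` the `voltage` column would be read as text, and the numeric helpers in Step 4 would skip it.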