# SharePoint List Items → DataFrame (graphfw) — Usage Guide

This notebook shows **all the important call variants** of
`graphfw.domains.sharepoint.lists.items.list_df`, including:

- column selection (`columns="*"` or a list),
- aliases and explicit mapping,
- OData (`filter`, `orderby`, `search`, `expand`),
- meta names (CreatedBy/ModifiedBy),
- GUID stripping and deterministic column order,
- unknown-fields handling,
- type coercion and `tz_policy`,
- `top` and `page_size_hint`,
- diagnostics/`info` and schema dump,
- CSV export with `io/writers/csv_writer.py`.

> **Prerequisites**: The `graphfw` package is installed or on the Python path (e.g. `pip install -e .`).  
> A valid `config.json` with Azure AD app credentials must also exist.


In [None]:
# --- Setup & Imports ---------------------------------------------------------
from graphfw.core.auth import TokenProvider
from graphfw.core.http import GraphClient
from graphfw.core.logbuffer import LogBuffer

from graphfw.domains.sharepoint.lists.items import list_df
from graphfw.io.writers.csv_writer import build_csv_path, write_csv

# Path to your config.json (TenantID, ClientID, ClientSecret)
CONFIG_JSON = r"C:\python\Scripts\config.json"  # <-- adjust!

# Example parameters (please adjust):
SITE_URL   = "https://contoso.sharepoint.com/sites/TeamA"  # <-- adjust!
LIST_TITLE = "My Custom List"                               # <-- adjust!

# Initialize the GraphClient
tp = TokenProvider.from_json(CONFIG_JSON)
gc = GraphClient(tp)

# Optional: central log buffer for this notebook
log = LogBuffer()


## Parameter reference (`list_df`)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `gc` | `GraphClient` | required | Authenticated client with retry/backoff/paging |
| `site_url` | `str` | required | e.g. `https://tenant.sharepoint.com/sites/TeamA` |
| `list_title` | `str` | required | Display name of the list |
| `columns` | `str \| Sequence[str] \| None` | `"*"` | Internal column names or `"*"` |
| `aliases` | `Sequence[str] \| None` | `None` | Alias names, parallel to `columns` |
| `mapping` | `Sequence[dict] \| None` | `None` | Explicit mapping `{"source","alias"}` |
| `filter` | `str \| None` | `None` | OData filter; automatically expanded to `fields/<expr>` |
| `orderby` | `str \| None` | `None` | e.g. `fields/Modified desc` |
| `search` | `str \| None` | `None` | OData `$search` |
| `expand` | `str \| Sequence[str] \| Sequence[Expand] \| None` | `None` | Additional expands besides `fields` |
| `top` | `int \| None` | `None` | Client-side limit |
| `page_size_hint` | `int \| None` | `None` | Page-size hint (performance) |
| `tz_policy` | `str` | `"utc+2"` | Normalization of datetime fields (timezone-naive result) |
| `type_map` | `dict[str,str] \| None` | `None` | e.g. `{ "Modified": "datetime", "Amount": "float" }` |
| `unknown_fields` | `"keep" \| "drop"` | `"keep"` | Behavior when `columns="*"` |
| `add_meta` | `bool` | `True` | Top-level metadata (id, sharepointIds, created/modified timestamps) |
| `add_created_modified_names` | `bool \| None` | `None` | Add CreatedBy/ModifiedBy names; `None` = auto (`True` when `columns="*"`) |
| `include_weburl` | `bool` | `False` | Also include top-level `webUrl` |
| `include_content_type` | `bool` | `False` | Also include top-level `contentType` |
| `debug_schema_dump` | `bool` | `False` | Dump the schema and a sample item |
| `pause_on_missing` | `bool` | `False` | Interactive pause when columns are missing |
| `log` | `LogBuffer \| None` | `None` | Optional buffer for structured logs |


### 1) Minimal example: all fields (`columns="*"`)

In [None]:
df1, info1 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns="*",              # all fields
    tz_policy="utc+2",        # normalize datetimes (timezone-naive)
    log=log,
)
print("Rows:", len(df1))
display(df1.head(10))
info1

### 2) Specific column list incl. meta names (CreatedBy/ModifiedBy)

In [None]:
df2, info2 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["ID","Title","Modified","Created","GUID","createdBy","lastModifiedBy"],
    tz_policy="utc+2",
    log=log,
)
display(df2.head(10))
info2

### 3) Column list with aliases (same order as `columns`)

In [None]:
df3, info3 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["ID","Title","GUID"],
    aliases=["ItemID","Titel","GUID"],
    log=log,
)
display(df3.head(10))
info3
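
The positional pairing of `columns` and `aliases` can be pictured as a simple zip. The sketch below is only an illustration of that idea; `build_rename_map` is a hypothetical helper, not part of `graphfw`, and the length check is an assumption about what a sensible implementation would enforce.

```python
from typing import Dict, Optional, Sequence


def build_rename_map(
    columns: Sequence[str],
    aliases: Optional[Sequence[str]] = None,
) -> Dict[str, str]:
    """Pair each column with the alias at the same position (hypothetical helper)."""
    if aliases is None:
        # No aliases given: every column keeps its own name.
        return {col: col for col in columns}
    if len(aliases) != len(columns):
        raise ValueError(
            f"aliases has {len(aliases)} entries, columns has {len(columns)}"
        )
    return dict(zip(columns, aliases))


rename_map = build_rename_map(["ID", "Title", "GUID"], ["ItemID", "Titel", "GUID"])
print(rename_map)
```

Because the pairing is purely positional, reordering `columns` without reordering `aliases` silently swaps names; an explicit `mapping` avoids that risk.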

### 4) Explicit mapping (incl. top-level fields & heuristic resolution)

In [None]:
mapping = [
    {"source": "id", "alias": "ItemID"},
    {"source": "Title", "alias": "Titel"},          # → fields.Title
    {"source": "GUID", "alias": "GUID"},            # strip {}
    {"source": "createdBy", "alias": "CreatedByName"},
    {"source": "lastModifiedBy", "alias": "ModifiedByName"},
    {"source": "webUrl", "alias": "Link"},
]
df4, info4 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    mapping=mapping,
    log=log,
)
display(df4.head(10))
info4
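
To make the idea of "top-level vs. `fields`" resolution concrete, here is a minimal sketch of how a `source` name could be looked up in a single list item. It is not the `graphfw` implementation: `resolve_source` and `strip_guid` are hypothetical helpers, the lookup order is an assumption, and the sample item is made-up data shaped like a Microsoft Graph `listItem` with expanded `fields`.

```python
from typing import Any, Dict


def strip_guid(value: Any) -> Any:
    """Remove surrounding curly braces from a GUID string."""
    return value.strip("{}") if isinstance(value, str) else value


def resolve_source(item: Dict[str, Any], source: str) -> Any:
    """Resolve one mapping source against a list item (assumed lookup order)."""
    # Identity sets: reduce to the user's display name.
    if source in ("createdBy", "lastModifiedBy"):
        return item.get(source, {}).get("user", {}).get("displayName")
    # Top-level properties such as id or webUrl.
    if source != "fields" and source in item:
        return item[source]
    # Everything else is looked up under fields.
    value = item.get("fields", {}).get(source)
    return strip_guid(value) if source == "GUID" else value


# Made-up sample item for illustration only.
sample_item = {
    "id": "7",
    "createdBy": {"user": {"displayName": "Alex Example"}},
    "fields": {
        "Title": "First entry",
        "GUID": "{3f2504e0-4f89-11d3-9a0c-0305e82c3301}",
    },
}

sample_mapping = [
    {"source": "id", "alias": "ItemID"},
    {"source": "Title", "alias": "Titel"},
    {"source": "GUID", "alias": "GUID"},
    {"source": "createdBy", "alias": "CreatedByName"},
]

row = {m["alias"]: resolve_source(sample_item, m["source"]) for m in sample_mapping}
print(row)
```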

### 5) OData: `filter`, `orderby`, `search`, `expand`

In [None]:
df5, info5 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["ID","Title","Modified","GUID"],
    filter="Status eq 'Open'",                 # expanded to fields/Status ...
    orderby="fields/Modified desc",
    search=None,                                 # optional
    expand=None,                                 # additional expands (besides 'fields')
    log=log,
)
display(df5.head(10))
info5
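
The automatic expansion of a bare filter to `fields/<expr>` could work roughly as follows. This is a deliberately simple sketch, not the logic `graphfw` actually uses: `prefix_fields` is a hypothetical function that only handles plain comparisons (`eq`, `ne`, `gt`, `ge`, `lt`, `le`) and does not protect operator-like text inside string literals or handle OData functions.

```python
import re

# An identifier that is not already prefixed (no word character or "/" before it)
# and that is followed by a comparison operator.
_FIELD_BEFORE_OPERATOR = re.compile(
    r"(?<![\w/])([A-Za-z_]\w*)(?=\s+(?:eq|ne|gt|ge|lt|le)\b)"
)


def prefix_fields(expr: str) -> str:
    """Prefix bare field names in simple comparisons with 'fields/'."""
    return _FIELD_BEFORE_OPERATOR.sub(r"fields/\1", expr)


print(prefix_fields("Status eq 'Open'"))
print(prefix_fields("Status eq 'Open' and Priority gt 2"))
print(prefix_fields("fields/Status eq 'Open'"))  # already prefixed, left unchanged
```

The negative lookbehind makes the rewrite idempotent, so a filter that already uses `fields/...` passes through untouched.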

### 6) Unknown-fields handling with `"*"`

In [None]:
# keep (default): includes all unknown fields (diagnostics-friendly)
df6a, info6a = list_df(gc, site_url=SITE_URL, list_title=LIST_TITLE, columns="*", unknown_fields="keep", log=log)
print("keep → cols:", len(df6a.columns))

# drop: includes only known/core-defined fields
df6b, info6b = list_df(gc, site_url=SITE_URL, list_title=LIST_TITLE, columns="*", unknown_fields="drop", log=log)
print("drop → cols:", len(df6b.columns))
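
Conceptually, `keep` versus `drop` is a filter over the keys of each item's `fields` dictionary. The sketch below illustrates that under one assumption: that "known" means a set of field names taken from the list's column definitions. `select_fields` and the sample data are hypothetical and not taken from `graphfw`.

```python
from typing import Any, Dict, Set


def select_fields(
    fields: Dict[str, Any],
    known: Set[str],
    unknown_fields: str = "keep",
) -> Dict[str, Any]:
    """Keep every field, or only those whose name is in `known`."""
    if unknown_fields not in ("keep", "drop"):
        raise ValueError(f"unknown_fields must be 'keep' or 'drop', got {unknown_fields!r}")
    if unknown_fields == "keep":
        return dict(fields)
    return {name: value for name, value in fields.items() if name in known}


# Made-up example: two defined columns plus one field not in the definitions.
known_columns = {"Title", "Modified"}
item_fields = {"Title": "First entry", "Modified": "2024-05-01T10:00:00Z", "_Extra": 1}

print(select_fields(item_fields, known_columns, "keep"))
print(select_fields(item_fields, known_columns, "drop"))
```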

### 7) Type coercion & `tz_policy`

In [None]:
# Example: Modified → datetime, Amount → float
df7, info7 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["ID","Title","Modified","Created","Amount"],
    type_map={"Modified": "datetime", "Created": "datetime", "Amount": "float"},
    tz_policy="utc",   # or "utc+2", "local"
    log=log,
)
display(df7.dtypes)
display(df7.head(10))
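
What a "naive" normalization under a `tz_policy` might mean can be sketched with the standard library alone. The function below is an illustration, not the `graphfw` code: `to_naive` is a hypothetical helper, and treating `"utc+2"` as a fixed two-hour offset (without daylight-saving rules) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_naive(ts: str, tz_policy: str = "utc+2") -> datetime:
    """Parse an ISO 8601 timestamp and return a timezone-naive datetime."""
    # Replace a trailing "Z" so fromisoformat also works on older Python versions.
    aware = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if tz_policy == "utc":
        shifted = aware.astimezone(timezone.utc)
    elif tz_policy == "utc+2":
        shifted = aware.astimezone(timezone(timedelta(hours=2)))  # fixed offset (assumption)
    elif tz_policy == "local":
        shifted = aware.astimezone()  # the machine's local time zone
    else:
        raise ValueError(f"unsupported tz_policy: {tz_policy!r}")
    # Drop the tzinfo so the result is naive.
    return shifted.replace(tzinfo=None)


print(to_naive("2024-05-01T10:00:00Z", "utc"))
print(to_naive("2024-05-01T10:00:00Z", "utc+2"))
```

Naive timestamps are convenient for exports, but the chosen policy must be recorded somewhere, because the offset can no longer be recovered from the values themselves.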

### 8) `top` & `page_size_hint`

In [None]:
df8, info8 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["ID","Title","Modified"],
    top=25,                 # only 25 items (client-side)
    page_size_hint=100,     # page-size hint (performance/throttling)
    log=log,
)
print("Rows:", len(df8))
info8
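
The interplay between a client-side `top` and the page size can be shown with a simulated paged source. Nothing below calls Microsoft Graph or `graphfw`; `iter_pages` and `take` are hypothetical helpers that only demonstrate why a lazily consumed page iterator stops requesting pages once the limit is reached.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def iter_pages(total_items: int, page_size: int, requested: List[int]) -> Iterator[List[int]]:
    """Simulated paged source; records the start offset of every page it serves."""
    for start in range(0, total_items, page_size):
        requested.append(start)
        yield list(range(start, min(start + page_size, total_items)))


def take(pages: Iterable[List[int]], top: Optional[int]) -> List[int]:
    """Flatten pages lazily and stop after `top` items (None means no limit)."""
    items = (item for page in pages for item in page)
    return list(items) if top is None else list(islice(items, top))


small_pages: List[int] = []
rows_small = take(iter_pages(1000, 10, small_pages), top=25)

large_pages: List[int] = []
rows_large = take(iter_pages(1000, 100, large_pages), top=25)

print(len(rows_small), "rows from", len(small_pages), "pages of 10")
print(len(rows_large), "rows from", len(large_pages), "page(s) of 100")
```

With a small page size, more round trips are needed to reach the limit; with a large one, a single page may carry more items than are used. That trade-off is what a page-size hint tunes.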

### 9) Additional top-level fields `webUrl`, `contentType`

In [None]:
df9, info9 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["ID","Title","GUID"],
    include_weburl=True,
    include_content_type=True,
    log=log,
)
display(df9.head(10))
info9

### 10) Debug: schema dump & sample item

In [None]:
df10, info10 = list_df(
    gc,
    site_url=SITE_URL,
    list_title=LIST_TITLE,
    columns=["UnbekannteSpalte","ID","Title"],  # one column is deliberately missing
    debug_schema_dump=True,
    pause_on_missing=False,    # False is the sensible choice in notebooks
    log=log,
)
display(df10.head(5))
info10
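
When a requested column does not exist, it helps to compare the request against the available field names and suggest near matches. The sketch below shows one way to do that with `difflib`; `find_missing` is a hypothetical helper, the field names are made up, and this is not a description of what `debug_schema_dump` prints.

```python
from difflib import get_close_matches
from typing import Dict, List, Sequence


def find_missing(requested: Sequence[str], available: Sequence[str]) -> Dict[str, List[str]]:
    """Map each missing column to a list of similarly named available columns."""
    available_set = set(available)
    return {
        col: get_close_matches(col, available, n=3)
        for col in requested
        if col not in available_set
    }


report = find_missing(
    ["UnbekannteSpalte", "ID", "Titel"],
    ["ID", "Title", "Modified", "Created"],
)
for column, suggestions in report.items():
    print(column, "is missing; close matches:", suggestions)
```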

### 11) CSV export with `build_csv_path()` & `write_csv()`

In [None]:
from pathlib import Path

out_dir = Path("./exports")
out_dir.mkdir(parents=True, exist_ok=True)

df_csv, info_csv = list_df(gc, site_url=SITE_URL, list_title=LIST_TITLE, columns="*", log=log)
csv_path = build_csv_path(df_csv, site_url=SITE_URL, list_title=LIST_TITLE, out_dir=out_dir, timestamp=True)

write_csv(df_csv, csv_path)
print("CSV saved:", csv_path)

### 12) LogBuffer: collected messages as a DataFrame

In [None]:
log_df = log.to_df()
display(log_df.head(20))