-Normalize to size of student body-college wide aggregate counts and such
-Gender breakdown: map to USA
-Race breakdown: map to USA

1.Top/Bottom z-scores global vs peer by category
Weight z-scores by priorities

2.Relationship between expenditures and low/high z-scores. 

3.Similarity Index or !0 schools on top

In [1]:
import polars as pl
import marimo as mo
import pprint

In [2]:
labels_df = (
    pl.read_csv("Labels.csv")
    .filter(~pl.col("VariableName").str.starts_with("State"))
    .with_columns(pl.col("Value").cast(pl.Int64))
)

In [5]:
data_df = pl.read_csv("data.csv")
mobility_df = pl.read_csv("Third-Way-Economic-Mobility-Index-2024.csv", columns=["Institution Name", "Total Net Price for Low-Income Students", "Median Earnings 10 Years After Initial Enrollment for Low-Income Students", "EMI Score (low-income percentile rank*percentage pell)", "EMI Ranking "])


mapping_dict = {
    var: dict(zip(sub_df["Value"], sub_df["ValueLabel"]))
    for var, sub_df in labels_df.group_by("VariableName")
}

data_df = data_df.with_columns(
    [
        pl.col(col).cast(pl.Utf8).replace(mapping)
        for col, mapping in mapping_dict.items()
    ]
)

premium_df = pl.read_csv("Third-Way-Price-to-Earnings-Premiums-2024.csv", 
    columns=[
        "Institution Name",
        "Total Net Price for All Students",
        "Median Earnings 10 Years After Initial Enrollment for All Students",
        "Price-to-Earnings Premium for the Median Student",
        "Median Earnings 10 Years After Initial Enrollment for Low-Income Students",
        "Price-to-Earnings Premium for Low-Income Students",
    ]
)

mobility_df = pl.read_csv(
    "Third-Way-Economic-Mobility-Index-2024.csv",
    columns=[
        "Institution Name",
        "Total Net Price for Low-Income Students",
        "Median Earnings 10 Years After Initial Enrollment for Low-Income Students",
        "EMI Score (low-income percentile rank*percentage pell)",
        "EMI Ranking ",
    ],
)

In [97]:
pprint.pprint(data_df.columns)

['UnitID',
 'Institution Name',
 'Institution (entity) name (HD2023)',
 'Historically Black College or University (HD2023)',
 'Tribal college (HD2023)',
 'Carnegie Classification 2021: Basic (HD2023)',
 'Institution grants a medical degree (HD2023)',
 'State abbreviation (HD2023)',
 'Carnegie Classification 2021: Undergraduate Profile (HD2023)',
 'Primary public control (IC2023)',
 'Yellow Ribbon Program (officially known as Post-9/11 GI Bill, Yellow Ribbon '
 'Program) (IC2023)',
 'Dedicated point of contact for support services for veterans, military '
 'servicemembers, and their families (IC2023)',
 'Credit for military training (IC2023)',
 'Recognized student veteran organization (IC2023)',
 'Member of Department of Defense Voluntary Educational Partnership Memorandum '
 'of Understanding (IC2023)',
 'Percent of undergraduates  who are formally registered as students with '
 'disabilities  when percentage is more than 3 percent (IC2023)',
 'Percent indicator of undergraduates forma

Grouping Columns
Based on TU's [strategic plan](https://www.towson.edu/about/mission/strategic-plan/targets-2030.html), create the following strategic categories:

- Student Success
- Access & Equity
- Academic Resources
- Career & Economic Outcomes
- Innovation & Research
- Sustainability & Efficiency
- Community Engagement

In [29]:
def cols_from_range(start: int, end: int, cols: list[str]) -> list[str]:
    return cols[start-1:end]

In [68]:
groupings = {
    "Student Success": cols_from_range(30, 34, data_df.columns) + [data_df.columns[105]],
    "Access & Equity": cols_from_range(23, 29, data_df.columns) + cols_from_range(86, 97, data_df.columns),
    "Academic Resources": cols_from_range(66, 73, data_df.columns) + [data_df.columns[106]],
    "Career & Economic Outcomes": cols_from_range(18, 22, data_df.columns),
    "Innovation & Research": cols_from_range(35, 65, data_df.columns), 
    "Sustainability & Efficiency": cols_from_range(35, 65, data_df.columns),
    # "Community Engagement": []  # TBD or qualitative for now
}

In [69]:
for group, cols in groupings.items():
    print(f"\n{group} ({len(cols)} columns):")
    for col in cols:
        print("  ", col, data_df.select(col).dtypes)


Student Success (6 columns):
   Graduation rate - Bachelor degree within 4 years  total (DRVGR2023) [Int64]
   Graduation rate - Bachelor degree within 5 years  total (DRVGR2023) [Int64]
   Graduation rate - Bachelor degree within 6 years  total (DRVGR2023) [Int64]
   Transfer-out rate - Bachelor cohort (DRVGR2023) [Int64]
   Pell Grant recipients - Bachelor's degree rate within 6 years (DRVGR2023) [Int64]
   Full-time retention rate  2023 (EF2023D) [Int64]

Access & Equity (19 columns):
   Percent of undergraduate students awarded federal  state  local  institutional or other sources of grant aid (SFA2223) [Int64]
   Percent of undergraduate students awarded Federal Pell grants (SFA2223) [Int64]
   Average amount Federal Pell grant aid awarded to undergraduate students (SFA2223) [Int64]
   Average amount of federal  state  local  institutional or other sources of grant aid awarded to undergraduate students (SFA2223) [Int64]
   Percent of undergraduate students awarded federal student

>String col: Core expenses  total dollars (FASB) (DRVF2023) [String]

In [70]:
import json
with open("strategic_groupings.json", "w") as f:
    json.dump(groupings, f)

#### **Aggregating Categorical Metrics**
Calculate the z-score for every metric in each grouping and sum them

In [77]:
def z_score(df: pl.DataFrame, cols: list[str]) -> pl.DataFrame:
    return df.select([
        ((pl.col(col) - df.select(pl.col(col).mean()).item()) / df.select(pl.col(col).std()).item()).alias(col)
        for col in cols
        if df.schema[col].is_numeric()
    ])

In [78]:
scored_df = data_df
for group in groupings:
    cols = groupings[group]
    if not cols:
        continue
    try:
        normalized = z_score(data_df, cols)
        category_score = (
            normalized.select(pl.sum_horizontal(normalized.columns).alias(group))
            .with_columns((pl.col(group) / len(cols)).alias(group))
        )
        scored_df = scored_df.with_columns(category_score)
        print(scored_df.filter(pl.col("Institution Name") == "Towson University").select(["Institution Name", group]))
    except Exception as e:
        print(f"Skipping {col} - Error: {e}")
    

shape: (1, 2)
┌───────────────────┬─────────────────┐
│ Institution Name  ┆ Student Success │
│ ---               ┆ ---             │
│ str               ┆ f64             │
╞═══════════════════╪═════════════════╡
│ Towson University ┆ 0.627068        │
└───────────────────┴─────────────────┘
shape: (1, 2)
┌───────────────────┬─────────────────┐
│ Institution Name  ┆ Access & Equity │
│ ---               ┆ ---             │
│ str               ┆ f64             │
╞═══════════════════╪═════════════════╡
│ Towson University ┆ 0.363294        │
└───────────────────┴─────────────────┘
shape: (1, 2)
┌───────────────────┬────────────────────┐
│ Institution Name  ┆ Academic Resources │
│ ---               ┆ ---                │
│ str               ┆ f64                │
╞═══════════════════╪════════════════════╡
│ Towson University ┆ -0.008545          │
└───────────────────┴────────────────────┘
shape: (1, 2)
┌───────────────────┬────────────────────────────┐
│ Institution Name  ┆ Career & E

**TU Standings**

Student Success
>"With a z-score of 0.63 and in the 75th percentile, Towson outperforms most of its peers on student success metrics like graduation, retention, and Pell Grant completion — but it’s not crushing the top of the pack either."

## Weighted Total Score
TU's 2020-2030 strategic plan focuses on Student Success and Access & Equity so we could weigh these higher. Looking on Reddit and other forums, many users mentioned US News and Forbes. Apparently theres a distinction between National and Other Universities. TU is National. Text analysis of the 2020-2030 Strategic plan resulted in the following ranking

<ul>
    <li>Student Success: 0.23</li>
    <li>Access & Equity: 0.18</li>
    <li>Community Engagement: 0.17</li>
    <li>Career & Economic Outcomes:0.14</li>
    <li>Innovation & Research: 0.12</li>
    <li>Academic Resources: 0.10</li>
    <li>Sustainability & Efficiency: 0.06</li>
</ul>

**US News Weights**
<ul>
    <li>Peer assessment(?): 0.20</li>
    <li>SS Graduation rates: 0.16 </li>
    <li>SS Graduation rate performance: 0.10</li>
    <li>AR Financial resources per student: 0.08</li>
    <li>AR Faculty salaries: 0.06</li>
    <li>AE Pell graduation rates: 0.055</li>
    <li>AE Pell graduation performance: 0.055</li>
    <li>SS First-year retention rates: 0.05</li>
    <li>CEO College grads earning more than a high school grad: 0.05</li>
    <li>CEO Borrower debt: 0.05</li>
    <li>AR Standardized tests: 0.05</li>
    <li>AR Student-faculty ratio: 0.03</li>
    <li>IR Full-time faculty: 0.02</li>
    <li>IR Citations per publication: 0.0125</li>
    <li>IR Field-Weighted Citation Impact: 0.0125</li>
    <li>IR Publication share in the Top 5% of Journals by CiteScore: 0.01</li>
    <li>IR Publication share in the Top 25% of Journals by CiteScore: 0.005</li>
</ul>

<ul>
    <li>Student Success: 0.31</li>
    <li>Academic Resources: 0.22</li>
    <li>Access & Equity: 0.11</li>
    <li>Career & Economic Outcomes:0.10</li>
    <li>Innovation & Research: 0.06</li>
    <li>Sustainability & Efficiency: 0.0</li>
    <li>Community Engagement: 0.0</li>
</ul>

**Forbes Weights**
<ul>
    <li>CEO Alumni Salary: 0.20</li>
    <li>CEO Debt: 0.15</li>
    <li>SS Graduation Rate: 0.15</li>
    <li>CEO Forbes American Leaders List: 0.15</li>
    <li>CEO Return on Investment(data from Third Way): 0.15</li>
    <li>SS Retention Rate: 0.10</li>
    <li>SS Academic Success: 0.10</li>
</ul>

<ul>
    <li>Career & Economic Outcomes:0.65</li>
    <li>Student Success: 0.35</li>
    <li>Access & Equity: 0.0</li>
    <li>Academic Resources: 0.00</li>
    <li>Innovation & Research: 0.00</li>
    <li>Sustainability & Efficiency: 0.0</li>
    <li>Community Engagement: 0.0</li>
</ul>

In [105]:
weights = {
    "Student Success": 0.23,
    "Access & Equity": 0.18,
    "Academic Resources": 0.15,
    "Career & Economic Outcomes":0.14,
    "Innovation & Research": 0.12,
    "Sustainability & Efficiency": 0.06,
    "Community Engagement": 0.17   ##brainstorm
}

#combine CEO & Student Success


In [106]:
weighted_expr = sum(pl.col(cat) * weight 
                    for cat, weight in weights.items()
                    if cat in scored_df.columns
                   )
scored_df = scored_df.with_columns(
    weighted_expr.alias("Total Score")
)
scored_df = scored_df.with_columns(
    pl.col("Total Score").rank("ordinal", descending=True).alias("Rank")
)

In [107]:
scored_df.filter(pl.col("Institution Name") == "Towson University").select(["Institution Name", "Total Score", "Rank"])

Institution Name,Total Score,Rank
str,f64,u32
"""Towson University""",0.178461,137


Carnegie Classification 2021: Undergraduate Profile (HD2023)
str
"""Four-year, full-time, selectiv…"
"""Four-year, full-time, selectiv…"
"""Two-year, higher part-time"""
"""Four-year, medium full-time, i…"
"""Two-year, mixed part/full-time"""
…
"""Four-year, full-time, more sel…"
"""Four-year, full-time, inclusiv…"
"""Four-year, higher part-time"""
"""Four-year, medium full-time, i…"
