In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns

In [None]:
atp_w151_raw_data_csv = pd.read_csv('ATP W151.csv')

## 🐍 Project Summary: Strategic Social Justice Messaging in Pandas

This project delivers actionable political guidance by identifying the **Optimal Messaging Strategy for a Progressive Social Justice Initiative** focused on **Immigration and Public Resources**. The entire analysis runs in Python, using the pandas library for data transformation, quality control, and feature engineering.

---

### **Analytical Framework & Data Selection**

| Variable Type | Python Role | Key Variables |
| :--- | :--- | :--- |
| **Dependent Variable (The Effect)** | **Feature Engineering** target. | **Social Justice Support Index** (A composite score derived by combining four key attitude variables related to immigrant eligibility for public assistance and impact on public resources). |
| **Independent Variables (The Factors)** | **Segmentation** variables used for `groupby()` and comparative analysis. | **Demographics & Politics:** Income (`F_INC_SDT1`), Education (`F_EDUCCAT`), Religion (`F_RELIG`), and Ideology (`F_IDEO`). |


In [None]:
atp_w151_raw_data_csv.info()

In [None]:
atp_w151_raw_data_csv.dtypes

### **Pandas Workflow & Quality Control (QC)**

The raw survey data underwent a targeted cleaning process to ensure analytical integrity:

1.  **Data Isolation:** The original $\sim 140$-column DataFrame was trimmed to a dedicated **9-column DataFrame (`analysis_df`)**, holding the `QKEY` identifier plus the 8 analysis variables, using Pandas column indexing (`df[['col1', 'col2']]`).
2.  **Missing-Value Recoding (QC):** The pervasive **missing code ($\mathbf{99}$)**, representing "Don't Know" or "Refused", was identified from the codebook. This code was explicitly converted to **NumPy's `np.nan`** across the 8 analysis columns using the `df.replace()` method. Marking these responses as missing before building the final support index keeps the value 99 from being summed as if it were a real answer.
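
The missing-code step can be sketched on a toy frame. The values below are invented for illustration and are not ATP W151 responses, and the names `toy` and `answer_cols` are hypothetical. Restricting the replacement to the answer columns, so that an identifier equal to 99 survives, is a choice made in this sketch.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not ATP W151 responses).
toy = pd.DataFrame({
    'QKEY': [99, 100, 101],           # identifiers; 99 here is a valid ID
    'IMMSER_LGL_W151': [1, 99, 2],    # 99 = missing code
    'F_IDEO': [3, 5, 99],
})

# Recode 99 to NaN in the answer columns only, leaving the ID column untouched.
answer_cols = [col for col in toy.columns if col != 'QKEY']
toy[answer_cols] = toy[answer_cols].replace(99, np.nan)

print(toy)
print(toy[answer_cols].isna().sum())  # count of missing answers per column
```

Counting `NaN` values per column afterwards is a quick way to confirm how many responses were recoded.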

In [None]:
key_cols = [
    'QKEY',
    'IMMSER_LGL_W151',
    'IMMSER_ILGL_W151',
    'LGIMPCT_PUBRES_W151',
    'ILLIMPCT_PUBRES_W151',
    'F_RELIG',
    'F_EDUCCAT',
    'F_INC_SDT1',
    'F_IDEO'
]

analysis_df = atp_w151_raw_data_csv[key_cols].copy()

## ⚖️ Feature Engineering: Recoding for Scale Alignment

The **Recoding Process** is essential to ensure that when we sum the four index questions, a **high number always means high support for the Social Justice Initiative**.

**The Goal:** Align all variables so that: $\mathbf{1 = \text{Least Progressive Support}}$ and **$\mathbf{\text{Highest Code} = \text{Most Progressive Support}}$**.

---

### **1. 2-Point Inversion (Eligibility Questions)**

This step inverts the original codes ($\mathbf{1 \rightarrow 2}$ and $\mathbf{2 \rightarrow 1}$) for the two questions where the progressive answer was $\mathbf{1}$ ("Yes").

| Variable | Focus | Original Code Map | New Code Map |
| :--- | :--- | :--- | :--- |
| $\text{IMMSER\_LGL\_W151}$ | Legal Assistance | $\{1: \text{Yes}, 2: \text{No}\}$ | $\mathbf{1 \rightarrow 2}$; $\mathbf{2 \rightarrow 1}$ |
| $\text{IMMSER\_ILGL\_W151}$ | Illegal Assistance | $\{1: \text{Yes}, 2: \text{No}\}$ | $\mathbf{1 \rightarrow 2}$; $\mathbf{2 \rightarrow 1}$ |



### **2. 3-Point Remapping (Public Resources Impact Questions)**

This step remaps the codes for the two questions that originally used a 3-point scale. This ensures the most progressive answer ("Making things better") is assigned the highest score ($\mathbf{3}$).

| Variable | Focus | Original Code Map | New Code Map |
| :--- | :--- | :--- | :--- |
| $\text{LGIMPCT\_PUBRES\_W151}$ | Legal Impact | $\{1: \text{Better}, 2: \text{Worse}, 3: \text{Neutral}\}$ | $\mathbf{1 \rightarrow 3}$; $\mathbf{2 \rightarrow 1}$; $\mathbf{3 \rightarrow 2}$ |
| $\text{ILLIMPCT\_PUBRES\_W151}$ | Illegal Impact | $\{1: \text{Better}, 2: \text{Worse}, 3: \text{Neutral}\}$ | $\mathbf{1 \rightarrow 3}$; $\mathbf{2 \rightarrow 1}$; $\mathbf{3 \rightarrow 2}$ |
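
As a minimal sketch of the two code maps above, the toy frame below (invented values, one column per scale type) applies each map with `Series.replace` and then works out the index range the recoded scales imply. The names `toy`, `recoded`, `index_min`, and `index_max` are illustrative only.

```python
import pandas as pd

# Toy responses, invented for illustration only.
toy = pd.DataFrame({
    'IMMSER_LGL_W151': [1, 2, 1],        # 1 = Yes, 2 = No
    'LGIMPCT_PUBRES_W151': [1, 2, 3],    # 1 = Better, 2 = Worse, 3 = Neutral
})

inversion_map_2pt = {1: 2, 2: 1}
inversion_map_3pt = {1: 3, 2: 1, 3: 2}

recoded = toy.copy()
recoded['IMMSER_LGL_W151'] = toy['IMMSER_LGL_W151'].replace(inversion_map_2pt)
recoded['LGIMPCT_PUBRES_W151'] = toy['LGIMPCT_PUBRES_W151'].replace(inversion_map_3pt)
print(recoded)

# Range of the summed index: two 2-point items plus two 3-point items.
index_min = 2 * 1 + 2 * 1
index_max = 2 * 2 + 2 * 3
print(index_min, index_max)
```

Because `replace` leaves any code that is absent from the map unchanged, a missing code such as 99 passes through this step untouched and still has to be handled separately.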

In [None]:
# converting the 2pt scale

columns_to_invert_2pt = ['IMMSER_LGL_W151', 'IMMSER_ILGL_W151']
inversion_map_2pt = {1:2, 2:1}

analysis_df[columns_to_invert_2pt] = analysis_df[columns_to_invert_2pt].replace(inversion_map_2pt)

In [None]:
# converting the 3pt scale
columns_to_invert_3pt = ['LGIMPCT_PUBRES_W151', 'ILLIMPCT_PUBRES_W151']
inversion_map_3pt = {1:3, 2:1, 3:2}

analysis_df[columns_to_invert_3pt] = analysis_df[columns_to_invert_3pt].replace(inversion_map_3pt)

In [None]:
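# updating 99 to nan in the answer columns only
# (QKEY is an identifier, so a QKEY equal to 99 must not be blanked)
value_cols = [col for col in key_cols if col != 'QKEY']
analysis_df[value_cols] = analysis_df[value_cols].replace(99, np.nan)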

In [None]:
# making the social impact index column
index_columns = [
    'IMMSER_LGL_W151',
    'IMMSER_ILGL_W151',
    'LGIMPCT_PUBRES_W151',
    'ILLIMPCT_PUBRES_W151'
]

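# sum(axis=1) skips NaN by default, so respondents with missing items
# receive a partial sum that can fall below the nominal 4-10 range
analysis_df['sj_support_index'] = analysis_df[index_columns].sum(axis=1)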

print(analysis_df['sj_support_index'].describe())
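
How the row sum treats missing items can be sketched on toy data. The values below are invented, and `default_sum` and `strict_sum` are illustrative names. Whether incomplete respondents should be excluded from the index is an analytic choice that this notebook does not settle, so `skipna=False` is shown only as one possible alternative.

```python
import numpy as np
import pandas as pd

# Toy recoded items, invented for illustration only.
toy = pd.DataFrame({
    'IMMSER_LGL_W151': [2, 1, np.nan],
    'IMMSER_ILGL_W151': [2, 1, np.nan],
    'LGIMPCT_PUBRES_W151': [3, 1, 2],
    'ILLIMPCT_PUBRES_W151': [3, np.nan, 1],
})

default_sum = toy.sum(axis=1)               # NaN items are skipped
strict_sum = toy.sum(axis=1, skipna=False)  # any NaN item makes the score NaN

print(pd.DataFrame({'default': default_sum, 'strict': strict_sum}))
```

In this toy frame only the first row answers all four items, so it is the only row that keeps a score under the strict variant.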

In [None]:
# mean support index by ideology
analysis_df.groupby('F_IDEO')['sj_support_index'].mean()

### 🔑 Key Takeaway: Targeting the Middle

The numbers show that **Moderates** are the most important group we can actually win over.

* **Average Score:** $\mathbf{6.18}$ (on a 4-10 scale).
* **The Plan:** This score means they're not against us, but they're not with us either. The goal is to craft a **carefully framed message** that pushes them into the supportive range (7.0 or higher).
* **Don't Waste Effort:** We shouldn't spend money trying to persuade the **Liberals ($\mathbf{7.55}$)**; they're already sold. Resources should focus entirely on moving the Moderates.

In [None]:
ideo_map = {1: 'Very Conservative', 2: 'Conservative', 3: 'Moderate', 4: 'Liberal', 5: 'Very Liberal'}
analysis_df['F_IDEO_LABEL'] = analysis_df['F_IDEO'].map(ideo_map)

ideology = analysis_df.groupby('F_IDEO_LABEL')['sj_support_index'].mean().sort_values()
ideology = ideology.rename('Avg Support Index (4-10)')

plt.figure(figsize=(9,5))
sns.barplot(x=ideology.index, y=ideology.values, palette='coolwarm')
plt.title('SJ Support Index by Ideology (Targeting the Persuadable Moderate)', fontsize=14)
plt.xlabel('Ideological Segment')
plt.ylabel('Average SJ Support Index')
plt.yticks(np.arange(4,8,0.5))
plt.grid(axis='y', alpha=0.5)
plt.show()

In [None]:
# mean support index by income
analysis_df.groupby('F_INC_SDT1')['sj_support_index'].mean()

## 💰 Strategic Insight 2: Income Segmentation

The numbers show that support for the initiative is **flat** across all income levels: everyone scores about the same.

### 🔑 Key Takeaway for the Strategist

* **Income Doesn't Matter:** We don't need to waste time or money splitting our audience by how much money they make. **Income is not what's driving support** for this initiative.
* **Simple Focus:** This means the **Moderates** we decided to target based on ideology can be messaged **equally** whether they are low-income or high-income. Don't overcomplicate the ad buys.

In [None]:
# mean support index by education
analysis_df.groupby('F_EDUCCAT')['sj_support_index'].mean()

## 🎓 Strategic Insight 3: Education Segmentation

The scores differ only slightly by education level:

| F\_EDUCCAT Code | Education Label | Average $\mathbf{SJ\_Support\_Index}$ (4-10) |
| :---: | :--- | :---: |
| 1.0 | **College graduate+** (High) | $\mathbf{6.37}$ |
| 2.0 | Some College | $6.02$ |
| 3.0 | H.S. graduate or less (Low) | $5.99$ |

### 🔑 Key Takeaway for the Strategist

* **Education is a Bonus:** College grads show the highest support for the initiative ($\mathbf{6.37}$), but since the scores are close across the board (only about a 0.4 point difference), education shouldn't be your main focus.
* **Smart Filter:** Target the **Moderates** first (your main group). If you have extra budget, use **College Grad** as a simple filter *within* the Moderate group to get the highest probability of conversion.
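
The "filter within Moderates" idea can be sketched as a boolean mask followed by a `groupby`. The rows below are invented for illustration; the only detail taken from the notebook is that code 3 in `F_IDEO` is labelled Moderate in `ideo_map`. The names `toy`, `moderates`, and `by_education` are hypothetical.

```python
import pandas as pd

# Toy respondents, invented for illustration only.
toy = pd.DataFrame({
    'F_IDEO': [3, 3, 3, 3, 4, 4],
    'F_EDUCCAT': [1, 1, 3, 3, 1, 3],
    'sj_support_index': [7, 6, 6, 5, 8, 7],
})

# Keep Moderates only (F_IDEO == 3), then compare education levels within them.
moderates = toy[toy['F_IDEO'] == 3]
by_education = moderates.groupby('F_EDUCCAT')['sj_support_index'].mean()

print(by_education)
```

Run against `analysis_df`, the same two lines would show whether the education gap seen in the full sample also holds inside the Moderate segment.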

In [None]:
# segmenting by religion
analysis_df.groupby('F_RELIG')['sj_support_index'].mean()

### 🔑 Key Takeaway: Religion Divides the Audience

The $\mathbf{1.10}$-point gap in the scores is a big deal, meaning we have to run **two very different strategies** based on religion:

* **Go All In (High Support, $\mathbf{\sim 6.8}$ and up):** Voters who identify as **Secular or Non-Traditional** (like Hindu, Atheist, etc.) are already on our side. The money here should go straight to **mobilization** using direct, values-based appeals. They are a high-return investment.

* **Be Careful (Low Support, $\mathbf{\sim 6.0}$ and below):** Traditional Christian voters (Protestant, Catholic, Mormon) are where we see the most resistance. We have two options:
    1.  **Avoid the topic entirely.**
    2.  If we must message them, use **neutral, economic language** (e.g., "fiscal health," "job growth") and avoid all social or moral arguments.

* **Actionable Strategy:** Do not try to win over the resistant Christian groups. **Focus all mobilization budget on secular and non-Christian voters** where our support is already strong.
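
A gap between segments, like the one discussed above, can be computed as the difference between the highest and lowest group mean. The helper `group_gap` below is a hypothetical name introduced for this sketch, and the toy rows and religion codes are invented placeholders, not values from the survey.

```python
import pandas as pd

def group_gap(df, group_col, value_col='sj_support_index'):
    """Return the group means and the gap between the highest and lowest mean."""
    means = df.groupby(group_col)[value_col].mean()
    return means, means.max() - means.min()

# Toy respondents, invented for illustration only.
toy = pd.DataFrame({
    'F_RELIG': [1, 1, 2, 2, 9, 9],
    'sj_support_index': [6, 5, 6, 6, 7, 7],
})

means, gap = group_gap(toy, 'F_RELIG')
print(means)
print(f'gap = {gap:.2f}')
```

Applying the same helper to `F_INC_SDT1`, `F_EDUCCAT`, and `F_RELIG` on `analysis_df` would put the three segmentation variables on a common footing when judging which one separates the audience most.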