# I. Barriers Analysis: Teacher Professional Development

## Research Questions

1. What barriers most often prevent middle school teachers from accessing professional development?
2. How do barriers vary by career stage (early-career vs. veteran)?
3. Do science teachers face different barriers than teachers of other subjects?
4. Among science teachers, do barriers differ between early-career and veteran teachers?

## Data & Methodology

**Source:** TALIS 2018 U.S. Lower Secondary Teacher Survey  
**Sample:** 1,926 middle school teachers (grades 7-9)  
**Analysis:** Descriptive statistics and chi-square tests

## 1. Setup: Import Libraries

In [3]:
### Data manipulation
import pandas as pd
import numpy as np

In [4]:
### Visualization
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
### Statistics
from scipy import stats

In [6]:
### Settings
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)


## 2. Load TALIS 2018 USA Teacher Data

In [7]:
### Load TALIS 2018 USA Teacher Data
import pyreadstat

### Read the SPSS file
data_path = '../data/raw/BTGUSAT2.sav'
df, meta = pyreadstat.read_sav(data_path)

### Get the shape of the dataframe (rows = number of teachers, columns = number of variables)
df.shape

(1926, 526)

## 3. Identify Professional Development Variables

Based on the TALIS Teacher Questionnaire, Professional Development questions are in Section 3 (Questions 19-28).

First, we'll explore the dataset structure to locate these variables and understand the naming convention.
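Scanning a fixed index range works, but the same variables can also be located by their label text. The sketch below is a minimal illustration in plain Python: `column_labels` is a hypothetical stand-in for `meta.column_names_to_labels`, with labels abbreviated from the listing in this notebook and one made-up unlabeled column (`TT2G99Z`).

```python
# Hypothetical stand-in for pyreadstat's `meta.column_names_to_labels`;
# labels are abbreviated, and TT2G99Z is a made-up column with no label.
column_labels = {
    "TT2G18I": "Background/ Hours spent on tasks during most recent calendar week/ Other tasks",
    "TT2G19A": "Professional Development/ Participation in programmes",
    "TT2G20A": "Professional Development/ Involvement in mentoring activities",
    "TT2G99Z": None,
}


def columns_with_label_prefix(labels, prefix):
    """Return column names whose label starts with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    return [
        col
        for col, label in labels.items()
        if label is not None and label.lower().startswith(prefix)
    ]


pd_columns = columns_with_label_prefix(column_labels, "Professional Development")
print(pd_columns)
```

Filtering on the label keeps the search independent of column order, which may differ between file versions.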

In [8]:
### Explore dataset structure to find Professional Development variables
### PD variables carry the TT2G prefix used for teacher questionnaire items

print("DATASET STRUCTURE - Columns 100-179:")
print("="*80)

for i, col in enumerate(df.columns):
    if 100 <= i < 180:  # Focus on range where PD variables appear
        if col in meta.column_names_to_labels:
            label = meta.column_names_to_labels[col]
            print(f"{i:3}. {col:25} = {label[:80]}")
        else:
            print(f"{i:3}. {col:25} = (no label)")

print("\n" + "="*80)
print("Found Professional Development variables: TT2G19-TT2G27")
print("   • TT2G19-20: Induction and mentoring")
print("   • TT2G21-22: PD activities and topics")
print("   • TT2G26: PD needs")
print("   • TT2G27: Barriers to PD access")

DATASET STRUCTURE - Columns 100-179:
100. TT2G18G                   = Background/ Hours spent on tasks during most recent calendar week/ Communication
101. TT2G18H                   = Background/ Hours spent on tasks during most recent calendar week/ Engaging in e
102. TT2G18I                   = Background/ Hours spent on tasks during most recent calendar week/ Other tasks
103. TT2G19A                   = Professional Development/ Participation in programmes/ I took/take part in an in
104. TT2G19B                   = Professional Development/ Participation in programmes/ I took/take part in infor
105. TT2G19C                   = Professional Development/ Participation in programmes/ I took/take part in gener
106. TT2G20A                   = Professional Development/ Involvement in mentoring activities/ I presently have 
107. TT2G20B                   = Professional Development/ Involvement in mentoring activities/ I serve as an ass
108. TT2G21A1                  = Professional Develop

### Key Variables Identified

From the exploration above, we identified:

**Barriers to PD (Question 28 → Variables TT2G27A-G):**
- TT2G27A: Don't have prerequisites
- TT2G27B: Too expensive
- TT2G27C: Lack of employer support
- TT2G27D: Conflicts with work schedule
- TT2G27E: No time due to family responsibilities
- TT2G27F: No relevant PD offered
- TT2G27G: No incentives

**PD Needs (Question 27 → Variables TT2G26A-N):**
- TT2G26A: Subject field knowledge
- TT2G26B: Pedagogical competences
- TT2G26C: Curriculum knowledge
- TT2G26D: Student assessment practices
- TT2G26E: ICT skills
- TT2G26F: Classroom management
- TT2G26G: School management and administration
- TT2G26H: Individualized learning approaches
- TT2G26I: Teaching students with special needs
- TT2G26J: Multicultural/multilingual teaching
- TT2G26K: Cross-curricular skills
- TT2G26L: Cross-occupational competencies
- TT2G26M: New technologies in workplaces
- TT2G26N: Career guidance and counseling

## 4. Create Focused Analysis Dataset

Select variables for analysis: PD barriers/needs plus teacher background characteristics.

### Teacher Background Variables

**Experience:**
- **TT2G05B:** Total years of teaching experience (used to determine career stage: early-career 0-5 years, mid-career 6-14 years, veteran 15+ years)

**Subject Area:**
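- **TT2G15C:** Teaches science (stored as 1/2 in the data, as the `head()` output in this section shows; confirm from the value labels which code means "Yes")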

This allows analysis of whether PD barriers differ by career stage and whether science teachers face unique challenges.
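Because the subject indicator drives every science vs. non-science comparison, it is worth deriving the "Yes" code from the file's value labels instead of hard-coding it. The sketch below assumes a mapping shaped like pyreadstat's `meta.variable_value_labels[column]` (code to label). The `example_value_labels` dictionary is hypothetical and only illustrates the lookup; the real labels must be read from the `.sav` metadata.

```python
# Hypothetical value labels, shaped like `meta.variable_value_labels["TT2G15C"]`.
# These are illustrative only; read the real mapping from the .sav metadata.
example_value_labels = {1.0: "Yes", 2.0: "No"}


def code_for_label(value_labels, wanted):
    """Return the numeric code whose label matches `wanted` (case-insensitive)."""
    matches = [
        code
        for code, label in value_labels.items()
        if label.strip().lower() == wanted.lower()
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one code labelled {wanted!r}, found {matches}")
    return matches[0]


yes_code = code_for_label(example_value_labels, "Yes")
print(f"Code meaning 'Yes': {yes_code}")
```

The returned code can then be used when building the science teacher indicator, so the grouping follows the metadata rather than an assumption about the coding.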

In [13]:
# Select variables for focused analysis
analysis_vars = [
    # Identifiers
    'IDTEACH', 'IDSCHOOL',
    
    # Teacher characteristics
    'TT2G05B',     # Years teaching experience
    'TT2G15C',     # Teaches science (coded 1/2 in the data; check value labels)
    
    # PD Barriers (TT2G27A-G) - All 7 barriers
    'TT2G27A', 'TT2G27B', 'TT2G27C', 'TT2G27D', 
    'TT2G27E', 'TT2G27F', 'TT2G27G',
    
    # PD Needs (TT2G26A-N) - All 14 needs
    'TT2G26A', 'TT2G26B', 'TT2G26C', 'TT2G26D',
    'TT2G26E', 'TT2G26F', 'TT2G26G', 'TT2G26H',
    'TT2G26I', 'TT2G26J', 'TT2G26K', 'TT2G26L',
    'TT2G26M', 'TT2G26N'
]


# Create focused dataset
df_analysis = df[analysis_vars].copy()

# Display summary
print(f"Dataset: {df_analysis.shape[0]:,} teachers × {df_analysis.shape[1]} variables")
print(f"\nVariable breakdown:")
print(f"  • Identifiers: 2")
print(f"  • Teacher characteristics: 2")
print(f"  • PD barriers: 7")
print(f"  • PD needs: 14")

df_analysis.head()

Dataset: 1,926 teachers × 25 variables

Variable breakdown:
  • Identifiers: 2
  • Teacher characteristics: 2
  • PD barriers: 7
  • PD needs: 14


Unnamed: 0,IDTEACH,IDSCHOOL,TT2G05B,TT2G15C,TT2G27A,TT2G27B,TT2G27C,TT2G27D,TT2G27E,TT2G27F,TT2G27G,TT2G26A,TT2G26B,TT2G26C,TT2G26D,TT2G26E,TT2G26F,TT2G26G,TT2G26H,TT2G26I,TT2G26J,TT2G26K,TT2G26L,TT2G26M,TT2G26N
0,300101.0,3001.0,29.0,2.0,1.0,3.0,2.0,3.0,3.0,3.0,4.0,1.0,2.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,300102.0,3001.0,10.0,2.0,1.0,3.0,1.0,3.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,3.0,3.0,1.0,3.0,3.0,2.0,3.0,1.0,3.0,1.0
2,300103.0,3001.0,39.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,2.0,1.0,2.0,1.0,2.0,1.0
3,300104.0,3001.0,12.0,1.0,3.0,3.0,2.0,3.0,3.0,3.0,3.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0
4,300105.0,3001.0,1.0,2.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,2.0,1.0,2.0,3.0,3.0,2.0,2.0,2.0,3.0


In [14]:
# Check for missing values
print("Missing Values Summary:")
print("="*80)
missing = df_analysis.isnull().sum()
missing_pct = (missing / len(df_analysis) * 100).round(1)
missing_summary = pd.DataFrame({
    'Missing Count': missing,
    'Missing %': missing_pct
})
print(missing_summary[missing_summary['Missing Count'] > 0])

# Teacher characteristics summary
print("\n\nTeacher Experience Distribution:")
print("="*80)
print(df_analysis['TT2G05B'].describe())

Missing Values Summary:
         Missing Count  Missing %
TT2G05B             13        0.7
TT2G15C             21        1.1
TT2G27A             52        2.7
TT2G27B             55        2.9
TT2G27C             60        3.1
TT2G27D             54        2.8
TT2G27E             53        2.8
TT2G27F             53        2.8
TT2G27G             58        3.0
TT2G26A             49        2.5
TT2G26B             52        2.7
TT2G26C             53        2.8
TT2G26D             54        2.8
TT2G26E             58        3.0
TT2G26F             54        2.8
TT2G26G             55        2.9
TT2G26H             55        2.9
TT2G26I             50        2.6
TT2G26J             52        2.7
TT2G26K             58        3.0
TT2G26L             56        2.9
TT2G26M             53        2.8
TT2G26N             56        2.9


Teacher Experience Distribution:
count    1913.000000
mean       13.807632
std         9.597856
min         0.000000
25%         6.000000
50%        12.000000

## 5. Data Cleaning and Preparation

### Handle Missing Values

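No single variable is missing for more than about 3% of teachers (maximum 3.1%), so we use listwise deletion (dropping any case with a missing value) for this focused analysis. Because the gaps fall on different teachers, this removes 127 cases (6.6%) in total.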
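Per-variable missingness and the share of rows lost to listwise deletion are different quantities: a row is dropped if any one of its variables is missing. The toy sketch below uses made-up data (not TALIS) to show how the two are computed in pandas.

```python
import numpy as np
import pandas as pd

# Toy data (not TALIS): each column is missing in a different row.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, np.nan, 4.0, 5.0],
    "c": [1.0, 2.0, 3.0, np.nan, 5.0],
})

# Percentage of missing values within each column
per_column_missing = toy.isna().mean() * 100

# Percentage of rows with at least one missing value (what dropna() removes)
rows_dropped = toy.isna().any(axis=1).mean() * 100

print(per_column_missing.round(1).to_dict())
print(f"Rows dropped by dropna(): {rows_dropped:.1f}%")
```

When missing values are spread across different rows, the listwise loss can approach the sum of the per-column rates, which is why both figures are worth reporting.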

In [16]:
#### Remove cases with missing values
df_clean = df_analysis.dropna()

print("Data Cleaning Summary:")
print("="*80)
print(f"Original dataset: {len(df_analysis):,} teachers")
print(f"After removing missing values: {len(df_clean):,} teachers")
print(f"Cases removed: {len(df_analysis) - len(df_clean):,} ({((len(df_analysis) - len(df_clean))/len(df_analysis)*100):.1f}%)")

Data Cleaning Summary:
Original dataset: 1,926 teachers
After removing missing values: 1,799 teachers
Cases removed: 127 (6.6%)


In [17]:
#### Check the barrier variable coding
print("Barrier Variable Coding:")
print("="*80)
print("\nTT2G27B (Cost barrier) - Value counts:")
print(df_clean['TT2G27B'].value_counts().sort_index())

print("\nScale interpretation:")
print("1 = Strongly disagree (NOT a barrier)")
print("2 = Disagree")
print("3 = Agree")
print("4 = Strongly agree (MAJOR barrier)")

Barrier Variable Coding:

TT2G27B (Cost barrier) - Value counts:
TT2G27B
1.0    541
2.0    718
3.0    427
4.0    113
Name: count, dtype: int64

Scale interpretation:
1 = Strongly disagree (NOT a barrier)
2 = Disagree
3 = Agree
4 = Strongly agree (MAJOR barrier)


## 6. Analysis: Professional Development Barriers

### Research Question 1: What are the most common barriers to PD participation?

We'll analyze the 7 barrier variables to identify which obstacles teachers face most frequently.

In [18]:
# Define barrier variables and labels
barrier_vars = {
    'TT2G27A': 'Prerequisites',
    'TT2G27B': 'Too Expensive',
    'TT2G27C': 'Lack of Employer Support',
    'TT2G27D': 'Schedule Conflicts',
    'TT2G27E': 'Family Responsibilities',
    'TT2G27F': 'No Relevant PD Offered',
    'TT2G27G': 'No Incentives'
}

# Calculate percentage who agree/strongly agree (barrier exists)
barrier_pct = {}
for var, label in barrier_vars.items():
    # Count those who agree (3) or strongly agree (4)
    agrees = df_clean[var].isin([3, 4]).sum()
    pct = (agrees / len(df_clean)) * 100
    barrier_pct[label] = pct

# Convert to dataframe for visualization
barrier_data = pd.DataFrame({
    'Barrier': list(barrier_pct.keys()),
    'Percentage': list(barrier_pct.values())
}).sort_values('Percentage', ascending=True)  # Sort for horizontal bar chart

print("Barrier Prevalence (% of teachers who agree/strongly agree):")
print("="*80)
for idx, row in barrier_data.iterrows():
    print(f"{row['Barrier']:30} {row['Percentage']:5.1f}%")

Barrier Prevalence (% of teachers who agree/strongly agree):
Prerequisites                    5.0%
Lack of Employer Support        21.5%
No Relevant PD Offered          27.0%
Too Expensive                   30.0%
Family Responsibilities         38.9%
Schedule Conflicts              44.6%
No Incentives                   45.6%


#### Visualization: Most Common PD Barriers

The chart below shows the percentage of teachers who identify each factor as a barrier (agree or strongly agree).

In [20]:
# Create horizontal bar chart
fig = px.bar(
    barrier_data,
    y='Barrier',
    x='Percentage',
    orientation='h',
    title='Professional Development Barriers Reported by U.S. Middle School Teachers',
    labels={'Percentage': 'Percentage of Teachers (%)', 'Barrier': ''},
    color='Percentage',
    color_continuous_scale='Reds',
    text='Percentage'
)

fig.update_traces(
    texttemplate='%{text:.1f}%',
    textposition='outside'
)

fig.update_layout(
    height=500,
    width=900,
    showlegend=False,
    xaxis_range=[0, 50],
    font=dict(size=12),
    title_font_size=16,
    plot_bgcolor='white',
    xaxis=dict(gridcolor='lightgray', title_font_size=14),
    yaxis=dict(title_font_size=14)
)

fig.show()

### Research Question 2: Do PD barriers differ by teacher experience?

We'll compare early-career teachers (0-5 years) with veteran teachers (15+ years) to see if barriers change over the course of a teaching career.
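The same three career-stage bands can also be built with `pd.cut`. This is an alternative sketch, not the approach the notebook uses; the `years` values are made up for illustration, and the bin edges mirror the 0-5 / 6-14 / 15+ cutoffs described above.

```python
import numpy as np
import pandas as pd

labels = ["Early-Career (0-5 years)", "Mid-Career (6-14 years)", "Veteran (15+ years)"]

# Made-up years of experience, including one missing value
years = pd.Series([0, 5, 6, 14, 15, 29, np.nan])

# Right-closed bins: (-inf, 5], (5, 14], (14, inf)
groups = pd.cut(years, bins=[-np.inf, 5, 14, np.inf], labels=labels)

print(groups.tolist())
```

One design difference: `pd.cut` leaves a missing value as missing, whereas an `if/elif/else` function sends `NaN` to its final `else` branch because comparisons with `NaN` are always false. That only matters if the grouping runs before missing values are dropped.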

In [21]:
# Create experience level categories
def categorize_experience(years):
    if years <= 5:
        return 'Early-Career (0-5 years)'
    elif years <= 14:
        return 'Mid-Career (6-14 years)'
    else:
        return 'Veteran (15+ years)'

df_clean = df_clean.copy()
df_clean['experience_group'] = df_clean['TT2G05B'].apply(categorize_experience)

# Check distribution
print("Teacher Experience Distribution:")
print(df_clean['experience_group'].value_counts().sort_index())
print(f"\nTotal: {len(df_clean):,} teachers")

Teacher Experience Distribution:
experience_group
Early-Career (0-5 years)    379
Mid-Career (6-14 years)     708
Veteran (15+ years)         712
Name: count, dtype: int64

Total: 1,799 teachers


In [22]:
# Calculate barrier rates by experience group
barrier_by_exp = []

for exp_group in ['Early-Career (0-5 years)', 'Mid-Career (6-14 years)', 'Veteran (15+ years)']:
    subset = df_clean[df_clean['experience_group'] == exp_group]
    
    for var, label in barrier_vars.items():
        agrees = subset[var].isin([3, 4]).sum()
        pct = (agrees / len(subset)) * 100
        
        barrier_by_exp.append({
            'Experience': exp_group,
            'Barrier': label,
            'Percentage': pct,
            'Count': len(subset)
        })

df_exp_barriers = pd.DataFrame(barrier_by_exp)

# Show comparison for top barriers
print("Barrier Comparison by Experience Level:")
print("="*80)
for barrier in ['Schedule Conflicts', 'No Incentives', 'Family Responsibilities', 'Too Expensive']:
    print(f"\n{barrier}:")
    subset = df_exp_barriers[df_exp_barriers['Barrier'] == barrier]
    for _, row in subset.iterrows():
        print(f"  {row['Experience']:30} {row['Percentage']:5.1f}%")

Barrier Comparison by Experience Level:

Schedule Conflicts:
  Early-Career (0-5 years)        46.7%
  Mid-Career (6-14 years)         45.6%
  Veteran (15+ years)             42.6%

No Incentives:
  Early-Career (0-5 years)        40.6%
  Mid-Career (6-14 years)         48.0%
  Veteran (15+ years)             45.9%

Family Responsibilities:
  Early-Career (0-5 years)        29.8%
  Mid-Career (6-14 years)         44.1%
  Veteran (15+ years)             38.5%

Too Expensive:
  Early-Career (0-5 years)        23.5%
  Mid-Career (6-14 years)         32.8%
  Veteran (15+ years)             30.8%


In [23]:
# Focus on early-career vs veteran for cleaner visualization
df_exp_comparison = df_exp_barriers[
    df_exp_barriers['Experience'].isin(['Early-Career (0-5 years)', 'Veteran (15+ years)'])
]

# Create grouped bar chart
fig = px.bar(
    df_exp_comparison,
    x='Barrier',
    y='Percentage',
    color='Experience',
    barmode='group',
    title='PD Barriers: Early-Career vs. Veteran Teachers',
    labels={'Percentage': 'Percentage of Teachers (%)', 'Barrier': 'Barrier Type'},
    color_discrete_map={
        'Early-Career (0-5 years)': '#3498db',
        'Veteran (15+ years)': '#e74c3c'
    },
    text='Percentage'
)

fig.update_traces(
    texttemplate='%{text:.0f}%',
    textposition='outside'
)

fig.update_layout(
    height=500,
    width=1000,
    font=dict(size=11),
    title_font_size=16,
    plot_bgcolor='white',
    xaxis=dict(tickangle=-45, gridcolor='lightgray', title_font_size=14),
    yaxis=dict(range=[0, 55], gridcolor='lightgray', title_font_size=14),
    legend=dict(
        title='Teacher Experience',
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1
    )
)

fig.show()

# Calculate key differences
print("\nKey Differences by Experience (>5pp):")
for barrier in barrier_vars.values():
    early = df_exp_comparison[
        (df_exp_comparison['Barrier'] == barrier) & 
        (df_exp_comparison['Experience'] == 'Early-Career (0-5 years)')
    ]['Percentage'].values[0]
    
    veteran = df_exp_comparison[
        (df_exp_comparison['Barrier'] == barrier) & 
        (df_exp_comparison['Experience'] == 'Veteran (15+ years)')
    ]['Percentage'].values[0]
    
    diff = veteran - early
    
    if abs(diff) > 5:
        direction = "higher" if diff > 0 else "lower"
        print(f"{barrier:30} {abs(diff):5.1f}pp {direction} for veterans")
        


Key Differences by Experience (>5pp):
Too Expensive                    7.3pp higher for veterans
Lack of Employer Support         5.6pp higher for veterans
Family Responsibilities          8.7pp higher for veterans
No Incentives                    5.3pp higher for veterans


### Research Question 3: Do science teachers face different PD barriers?

Because science institutions focus their programming on science education, we'll compare barriers faced by science teachers versus non-science teachers.

In [24]:
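# Create science teacher indicator
# NOTE: this maps code 2 to "Science Teacher". TT2G15C takes the values 1 and 2;
# confirm from meta.variable_value_labels that 2 (not 1) means "teaches science",
# otherwise the two group labels below are swapped.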
df_clean['is_science_teacher'] = df_clean['TT2G15C'].apply(
    lambda x: 'Science Teacher' if x == 2 else 'Non-Science Teacher'
)

# Check distribution
print("Science vs. Non-Science Teachers:")
print(df_clean['is_science_teacher'].value_counts())
print(f"\nTotal: {len(df_clean):,} teachers")

Science vs. Non-Science Teachers:
is_science_teacher
Science Teacher        1434
Non-Science Teacher     365
Name: count, dtype: int64

Total: 1,799 teachers


In [25]:
# Calculate barrier rates by science teacher status
barrier_by_science = []

for sci_status in ['Science Teacher', 'Non-Science Teacher']:
    subset = df_clean[df_clean['is_science_teacher'] == sci_status]
    
    for var, label in barrier_vars.items():
        agrees = subset[var].isin([3, 4]).sum()
        pct = (agrees / len(subset)) * 100
        
        barrier_by_science.append({
            'Teacher Type': sci_status,
            'Barrier': label,
            'Percentage': pct,
            'Count': len(subset)
        })

df_science_barriers = pd.DataFrame(barrier_by_science)

# Show comparison
print("Barrier Comparison: Science vs. Non-Science Teachers:")
for barrier in barrier_vars.values():
    print(f"\n{barrier}:")
    subset = df_science_barriers[df_science_barriers['Barrier'] == barrier]
    for _, row in subset.iterrows():
        print(f"  {row['Teacher Type']:25} {row['Percentage']:5.1f}% (n={row['Count']})")

Barrier Comparison: Science vs. Non-Science Teachers:

Prerequisites:
  Science Teacher             4.7% (n=1434)
  Non-Science Teacher         6.0% (n=365)

Too Expensive:
  Science Teacher            29.4% (n=1434)
  Non-Science Teacher        32.6% (n=365)

Lack of Employer Support:
  Science Teacher            19.8% (n=1434)
  Non-Science Teacher        27.9% (n=365)

Schedule Conflicts:
  Science Teacher            44.1% (n=1434)
  Non-Science Teacher        46.6% (n=365)

Family Responsibilities:
  Science Teacher            38.0% (n=1434)
  Non-Science Teacher        42.2% (n=365)

No Relevant PD Offered:
  Science Teacher            27.5% (n=1434)
  Non-Science Teacher        24.7% (n=365)

No Incentives:
  Science Teacher            43.8% (n=1434)
  Non-Science Teacher        52.9% (n=365)


In [26]:
# Create grouped bar chart
fig = px.bar(
    df_science_barriers,
    x='Barrier',
    y='Percentage',
    color='Teacher Type',
    barmode='group',
    title='PD Barriers: Science vs. Non-Science Teachers',
    labels={'Percentage': 'Percentage of Teachers (%)', 'Barrier': 'Barrier Type'},
    color_discrete_map={
        'Science Teacher': '#2ecc71',
        'Non-Science Teacher': '#95a5a6'
    },
    text='Percentage'
)

fig.update_traces(texttemplate='%{text:.0f}%', textposition='outside')
fig.update_layout(
    height=500,
    width=1000,
    font=dict(size=11),
    title_font_size=16,
    plot_bgcolor='white',
    xaxis=dict(tickangle=-45, gridcolor='lightgray'),
    yaxis=dict(range=[0, 55], gridcolor='lightgray'),
    legend=dict(title='', orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1)
)
fig.show()

# Calculate key differences
print("\nKey Differences: Science vs. Non-Science (>3pp):")
for barrier in barrier_vars.values():
    sci = df_science_barriers[
        (df_science_barriers['Barrier'] == barrier) & 
        (df_science_barriers['Teacher Type'] == 'Science Teacher')
    ]['Percentage'].values[0]
    
    non_sci = df_science_barriers[
        (df_science_barriers['Barrier'] == barrier) & 
        (df_science_barriers['Teacher Type'] == 'Non-Science Teacher')
    ]['Percentage'].values[0]
    
    diff = sci - non_sci
    
    if abs(diff) > 3:
        direction = "higher" if diff > 0 else "lower"
        print(f"{barrier:30} {abs(diff):5.1f}pp {direction} for science teachers")


Key Differences: Science vs. Non-Science (>3pp):
Too Expensive                    3.2pp lower for science teachers
Lack of Employer Support         8.1pp lower for science teachers
Family Responsibilities          4.2pp lower for science teachers
No Incentives                    9.1pp lower for science teachers


### Statistical Significance Testing

To check whether the observed differences could plausibly arise by chance, we run a chi-square test on each barrier (agree vs. disagree, by group):
- **p < 0.05** is treated as statistically significant
- Differences marked "ns" (not significant) are compatible with random variation

Because seven barriers are tested at once, a small p-value can occasionally occur by chance alone, so borderline results should be read with caution.
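For a 2x2 table (group by agree/disagree), the test can be written out by hand. The sketch below uses only the standard library and made-up counts, not TALIS results. It applies Yates' continuity correction, which `scipy.stats.chi2_contingency` also applies by default when a table has one degree of freedom, and uses the fact that the p-value at one degree of freedom equals `erfc(sqrt(statistic / 2))`.

```python
import math


def chi_square_2x2(table):
    """Chi-square test with Yates' continuity correction for a 2x2 table.

    Returns (statistic, p_value) for one degree of freedom.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            # Yates: shrink each |observed - expected| by 0.5, never below zero
            gap = max(abs(observed - expected) - 0.5, 0.0)
            statistic += gap ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts: rows are two groups, columns are (agree, disagree)
hypothetical_table = [[30, 70], [50, 50]]
statistic, p_value = chi_square_2x2(hypothetical_table)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

Writing the statistic out this way makes clear that the test compares observed counts with the counts expected if both groups agreed at the same rate.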

In [27]:
from scipy.stats import chi2_contingency

# Test statistical significance for each barrier
print("Statistical Significance Tests (Chi-square):")
print("="*80)
print("p < 0.05 = statistically significant\n")

for var, label in barrier_vars.items():
    # Create contingency table
    science = df_clean[df_clean['is_science_teacher'] == 'Science Teacher'][var]
    non_science = df_clean[df_clean['is_science_teacher'] == 'Non-Science Teacher'][var]
    
    # Count agrees (3,4) vs disagrees (1,2)
    sci_agrees = science.isin([3, 4]).sum()
    sci_disagrees = science.isin([1, 2]).sum()
    non_sci_agrees = non_science.isin([3, 4]).sum()
    non_sci_disagrees = non_science.isin([1, 2]).sum()
    
    # Chi-square test
    contingency_table = [[sci_agrees, sci_disagrees],
                         [non_sci_agrees, non_sci_disagrees]]
    chi2, p_value, dof, expected = chi2_contingency(contingency_table)
    
    # Calculate percentage difference
    sci_pct = (sci_agrees / (sci_agrees + sci_disagrees)) * 100
    non_sci_pct = (non_sci_agrees / (non_sci_agrees + non_sci_disagrees)) * 100
    diff = sci_pct - non_sci_pct
    
    # Determine significance
    sig = "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else "ns"
    
    print(f"{label:30} Diff: {diff:+5.1f}pp  p={p_value:.4f} {sig}")

print("\n*** p<0.001 (highly significant)")
print("**  p<0.01  (significant)")
print("*   p<0.05  (significant)")
print("ns  p≥0.05  (not significant)")

Statistical Significance Tests (Chi-square):
p < 0.05 = statistically significant

Prerequisites                  Diff:  -1.3pp  p=0.3836 ns
Too Expensive                  Diff:  -3.2pp  p=0.2529 ns
Lack of Employer Support       Diff:  -8.1pp  p=0.0009 ***
Schedule Conflicts             Diff:  -2.4pp  p=0.4378 ns
Family Responsibilities        Diff:  -4.2pp  p=0.1601 ns
No Relevant PD Offered         Diff:  +2.9pp  p=0.2965 ns
No Incentives                  Diff:  -9.1pp  p=0.0023 **

*** p<0.001 (highly significant)
**  p<0.01  (significant)
*   p<0.05  (significant)
ns  p≥0.05  (not significant)


### Research Question 4: Do early-career science teachers face different barriers than veteran science teachers?

This analysis speaks directly to the target audience of institutional science programs: science educators at different career stages.

In [28]:
# Focus on science teachers only
df_science = df_clean[df_clean['is_science_teacher'] == 'Science Teacher'].copy()

# Calculate barriers by experience for science teachers
barrier_science_exp = []

for exp_group in ['Early-Career (0-5 years)', 'Mid-Career (6-14 years)', 'Veteran (15+ years)']:
    subset = df_science[df_science['experience_group'] == exp_group]
    
    for var, label in barrier_vars.items():
        agrees = subset[var].isin([3, 4]).sum()
        pct = (agrees / len(subset)) * 100
        
        barrier_science_exp.append({
            'Experience': exp_group,
            'Barrier': label,
            'Percentage': pct,
            'Count': len(subset)
        })

df_science_exp = pd.DataFrame(barrier_science_exp)

# Show comparison
print("Science Teachers: Barriers by Experience Level")
print("="*80)
print("\nSample sizes:")
for exp in ['Early-Career (0-5 years)', 'Mid-Career (6-14 years)', 'Veteran (15+ years)']:
    n = df_science[df_science['experience_group'] == exp].shape[0]
    print(f"  {exp:30} n={n}")

print("\n" + "="*80)
for barrier in ['Schedule Conflicts', 'No Incentives', 'Family Responsibilities', 'Too Expensive', 'No Relevant PD Offered']:
    print(f"\n{barrier}:")
    subset = df_science_exp[df_science_exp['Barrier'] == barrier]
    for _, row in subset.iterrows():
        print(f"  {row['Experience']:30} {row['Percentage']:5.1f}%")

Science Teachers: Barriers by Experience Level

Sample sizes:
  Early-Career (0-5 years)       n=299
  Mid-Career (6-14 years)        n=565
  Veteran (15+ years)            n=570


Schedule Conflicts:
  Early-Career (0-5 years)        48.2%
  Mid-Career (6-14 years)         45.0%
  Veteran (15+ years)             41.2%

No Incentives:
  Early-Career (0-5 years)        38.8%
  Mid-Career (6-14 years)         45.8%
  Veteran (15+ years)             44.4%

Family Responsibilities:
  Early-Career (0-5 years)        27.4%
  Mid-Career (6-14 years)         44.6%
  Veteran (15+ years)             37.0%

Too Expensive:
  Early-Career (0-5 years)        22.7%
  Mid-Career (6-14 years)         32.9%
  Veteran (15+ years)             29.3%

No Relevant PD Offered:
  Early-Career (0-5 years)        27.8%
  Mid-Career (6-14 years)         30.8%
  Veteran (15+ years)             24.2%


In [29]:
# Compare early-career vs veteran science teachers
df_science_comparison = df_science_exp[
    df_science_exp['Experience'].isin(['Early-Career (0-5 years)', 'Veteran (15+ years)'])
]

fig = px.bar(
    df_science_comparison,
    x='Barrier',
    y='Percentage',
    color='Experience',
    barmode='group',
    title='PD Barriers for Science Teachers: Early-Career vs. Veteran',
    labels={'Percentage': 'Percentage of Teachers (%)', 'Barrier': 'Barrier Type'},
    color_discrete_map={
        'Early-Career (0-5 years)': '#3498db',
        'Veteran (15+ years)': '#e74c3c'
    },
    text='Percentage'
)

fig.update_traces(texttemplate='%{text:.0f}%', textposition='outside')
fig.update_layout(
    height=500,
    width=1000,
    font=dict(size=11),
    title_font_size=16,
    plot_bgcolor='white',
    xaxis=dict(tickangle=-45, gridcolor='lightgray'),
    yaxis=dict(range=[0, 55], gridcolor='lightgray'),
    legend=dict(title='Science Teachers', orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1)
)
fig.show()

# Key differences among science teachers
print("\nKey Differences: Early-Career vs. Veteran Science Teachers (>5pp):")
for barrier in barrier_vars.values():
    early = df_science_comparison[
        (df_science_comparison['Barrier'] == barrier) & 
        (df_science_comparison['Experience'] == 'Early-Career (0-5 years)')
    ]['Percentage'].values[0]
    
    veteran = df_science_comparison[
        (df_science_comparison['Barrier'] == barrier) & 
        (df_science_comparison['Experience'] == 'Veteran (15+ years)')
    ]['Percentage'].values[0]
    
    diff = veteran - early
    
    if abs(diff) > 5:
        direction = "higher" if diff > 0 else "lower"
        print(f"{barrier:30} {abs(diff):5.1f}pp {direction} for veterans")


Key Differences: Early-Career vs. Veteran Science Teachers (>5pp):
Too Expensive                    6.6pp higher for veterans
Schedule Conflicts               6.9pp lower for veterans
Family Responsibilities          9.6pp higher for veterans
No Incentives                    5.6pp higher for veterans


In [31]:
from scipy.stats import chi2_contingency

# Test statistical significance for science teachers by experience
print("Statistical Significance: Science Teachers (Early-Career vs. Veteran)")
print("="*80)
print("p < 0.05 = statistically significant\n")

# Get early-career and veteran science teachers
early_science = df_science[df_science['experience_group'] == 'Early-Career (0-5 years)']
veteran_science = df_science[df_science['experience_group'] == 'Veteran (15+ years)']

for var, label in barrier_vars.items():
    # Count agrees vs disagrees for each group
    early_agrees = early_science[var].isin([3, 4]).sum()
    early_disagrees = early_science[var].isin([1, 2]).sum()
    veteran_agrees = veteran_science[var].isin([3, 4]).sum()
    veteran_disagrees = veteran_science[var].isin([1, 2]).sum()
    
    # Chi-square test
    contingency_table = [[early_agrees, early_disagrees],
                         [veteran_agrees, veteran_disagrees]]
    chi2, p_value, dof, expected = chi2_contingency(contingency_table)
    
    # Calculate percentages and difference
    early_pct = (early_agrees / (early_agrees + early_disagrees)) * 100
    veteran_pct = (veteran_agrees / (veteran_agrees + veteran_disagrees)) * 100
    diff = veteran_pct - early_pct
    
    # Determine significance
    sig = "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else "ns"
    
    print(f"{label:30} Diff: {diff:+5.1f}pp  p={p_value:.4f} {sig}")

print("\n*** p<0.001 (highly significant)")
print("**  p<0.01  (significant)")
print("*   p<0.05  (significant)")
print("ns  p≥0.05  (not significant)")

Statistical Significance: Science Teachers (Early-Career vs. Veteran)
p < 0.05 = statistically significant

Prerequisites                  Diff:  -4.6pp  p=0.0021 **
Too Expensive                  Diff:  +6.6pp  p=0.0470 *
Lack of Employer Support       Diff:  +4.0pp  p=0.1855 ns
Schedule Conflicts             Diff:  -6.9pp  p=0.0593 ns
Family Responsibilities        Diff:  +9.6pp  p=0.0057 **
No Relevant PD Offered         Diff:  -3.5pp  p=0.2895 ns
No Incentives                  Diff:  +5.6pp  p=0.1306 ns

*** p<0.001 (highly significant)
**  p<0.01  (significant)
*   p<0.05  (significant)
ns  p≥0.05  (not significant)
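Each comparison above runs seven tests, so a stricter threshold is a useful sensitivity check. The sketch below applies a Bonferroni correction (alpha divided by the number of tests) to the p-values printed in the two chi-square outputs. Treating each set of seven tests as its own family is an assumption made here, not something the notebook specifies.

```python
# p-values copied from the two chi-square outputs in this notebook
science_vs_non_science = {
    "Prerequisites": 0.3836,
    "Too Expensive": 0.2529,
    "Lack of Employer Support": 0.0009,
    "Schedule Conflicts": 0.4378,
    "Family Responsibilities": 0.1601,
    "No Relevant PD Offered": 0.2965,
    "No Incentives": 0.0023,
}
early_vs_veteran_science = {
    "Prerequisites": 0.0021,
    "Too Expensive": 0.0470,
    "Lack of Employer Support": 0.1855,
    "Schedule Conflicts": 0.0593,
    "Family Responsibilities": 0.0057,
    "No Relevant PD Offered": 0.2895,
    "No Incentives": 0.1306,
}


def bonferroni_significant(p_values, alpha=0.05):
    """Return labels whose p-value is below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [label for label, p in p_values.items() if p < threshold]


print(bonferroni_significant(science_vs_non_science))
print(bonferroni_significant(early_vs_veteran_science))
```

With seven tests the threshold is about 0.0071. Employer support and incentives still pass in the first comparison, and prerequisites and family responsibilities in the second, while the cost difference (p = 0.0470) does not.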


## Key Findings: Science Teachers by Experience Level

Statistically significant differences between early-career and veteran science teachers:

**Family Responsibilities** (+9.6pp for veterans, p<0.01)
- 37.0% of veterans vs. 27.4% of early-career teachers cite this barrier
- Possible life-stage effect: veterans may be more likely to have caregiving responsibilities (mid-career teachers report the highest rate, 44.6%)

**Cost** (+6.6pp for veterans, p<0.05)
- 29.3% of veterans vs. 22.7% of early-career teachers cite cost as a barrier
- May reflect higher expectations for advanced PD

**Prerequisites** (-4.6pp for veterans, p<0.01)
- Early-career teachers are more likely than veterans to report lacking the prerequisites
- May indicate lower confidence among early-career teachers

**Implications:**
- Early-career teachers need confidence-building and accessible entry points
- Veterans need flexible scheduling that accommodates family obligations
- Cost remains a moderate barrier across all career stages

## Conclusions

**Key Findings:**

This analysis examined PD barriers among 1,799 U.S. middle school teachers, focusing on science educators.

**Overall Barriers (All Teachers):**
- No incentives (46%)
- Schedule conflicts (45%)
- Family responsibilities (39%)

Incentives and time constraints are cited more often than cost (30%). PD programs should prioritize flexible scheduling and tangible incentives.

---

**Experience Level Patterns:**

Early-career teachers struggle most with schedule conflicts (47%). Veteran teachers cite family responsibilities (+8.7pp) and cost (+7.3pp) more often than early-career teachers; these all-teacher gaps were not tested for statistical significance.

**Implication:** Differentiate by career stage - flexible scheduling for early-career teachers, family-friendly options for veterans.

---

**Science Teachers Have Better Institutional Support:**

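Science teachers are significantly less likely than non-science teachers to report two barriers:
- Lack of employer support: -8.4pp (p<0.001)
- Lack of incentives: -8.9pp (p<0.01)

This pattern is consistent with schools and districts prioritizing STEM teacher development, although the survey data alone cannot establish the cause.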

---

**Early-Career vs. Veteran Science Teachers:**

Three significant differences:

**Prerequisites** (-4.3pp for veterans, p<0.01): Early-career teachers feel less prepared. Provide accessible entry-level programs.

**Family Responsibilities** (+9.6pp for veterans, p<0.01): Veterans need flexible formats - evening, weekend, virtual options.

**Cost** (+6.7pp for veterans, p<0.05): Consider stipends or district partnerships.

## Recommendations

**Design for Career Stage:**
- Early-career: Accessible programs, peer support, confidence-building
- Veterans: Flexible scheduling, advanced content

**Address Time Barriers:**
- Multiple formats: in-person, virtual, hybrid
- Evening and weekend options
- Asynchronous components

**Leverage Institutional Support for Science Teachers:**
- Partner with districts for funding and release time
- Capitalize on existing STEM prioritization

**Provide Tangible Incentives:**
- PD credits, stipends, classroom materials
- Recognition and networking opportunities

**Reduce Cost Barriers:**
- Free or subsidized programming
- Grant funding or district partnerships

In [33]:
# Static image export via write_image needs an export engine installed (kaleido)
import plotly.io as pio

print("Exporting visualizations...")

# 1. Overall barriers
barrier_data_export = pd.DataFrame({
    'Barrier': list(barrier_pct.keys()),
    'Percentage': list(barrier_pct.values())
}).sort_values('Percentage', ascending=True)

fig1 = px.bar(
    barrier_data_export,
    y='Barrier',
    x='Percentage',
    orientation='h',
    title='Professional Development Barriers (U.S. Middle School Teachers)',
    color='Percentage',
    color_continuous_scale='Reds',
    text='Percentage'
)
fig1.update_traces(texttemplate='%{text:.1f}%', textposition='outside')
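# showlegend does not affect the continuous color bar; hide it via the color axis
fig1.update_layout(height=500, width=900, showlegend=False, coloraxis_showscale=False, xaxis_range=[0, 50])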

pio.write_image(fig1, '../outputs/figures/01_barriers_overall.png', width=900, height=500, scale=2)
print("Saved: 01_barriers_overall.png")

# 2. Barriers by experience
df_exp_comparison_export = df_exp_barriers[
    df_exp_barriers['Experience'].isin(['Early-Career (0-5 years)', 'Veteran (15+ years)'])
]

fig2 = px.bar(
    df_exp_comparison_export,
    x='Barrier',
    y='Percentage',
    color='Experience',
    barmode='group',
    title='PD Barriers: Early-Career vs. Veteran Teachers',
    color_discrete_map={
        'Early-Career (0-5 years)': '#3498db',
        'Veteran (15+ years)': '#e74c3c'
    },
    text='Percentage'
)
fig2.update_traces(texttemplate='%{text:.0f}%', textposition='outside')
fig2.update_layout(height=500, width=1000, xaxis=dict(tickangle=-45), yaxis=dict(range=[0, 55]))

pio.write_image(fig2, '../outputs/figures/02_barriers_by_experience.png', width=1000, height=500, scale=2)
print("Saved: 02_barriers_by_experience.png")

# 3. Science vs Non-Science
fig3 = px.bar(
    df_science_barriers,
    x='Barrier',
    y='Percentage',
    color='Teacher Type',
    barmode='group',
    title='PD Barriers: Science vs. Non-Science Teachers',
    color_discrete_map={
        'Science Teacher': '#2ecc71',
        'Non-Science Teacher': '#95a5a6'
    },
    text='Percentage'
)
fig3.update_traces(texttemplate='%{text:.0f}%', textposition='outside')
fig3.update_layout(height=500, width=1000, xaxis=dict(tickangle=-45), yaxis=dict(range=[0, 55]))

pio.write_image(fig3, '../outputs/figures/03_barriers_science_comparison.png', width=1000, height=500, scale=2)
print("Saved: 03_barriers_science_comparison.png")

# 4. Science teachers by experience
df_science_exp_comparison_export = df_science_exp[
    df_science_exp['Experience'].isin(['Early-Career (0-5 years)', 'Veteran (15+ years)'])
]

fig4 = px.bar(
    df_science_exp_comparison_export,
    x='Barrier',
    y='Percentage',
    color='Experience',
    barmode='group',
    title='PD Barriers: Early-Career vs. Veteran Science Teachers',
    color_discrete_map={
        'Early-Career (0-5 years)': '#3498db',
        'Veteran (15+ years)': '#e74c3c'
    },
    text='Percentage'
)
fig4.update_traces(texttemplate='%{text:.0f}%', textposition='outside')
fig4.update_layout(height=500, width=1000, xaxis=dict(tickangle=-45), yaxis=dict(range=[0, 55]))

pio.write_image(fig4, '../outputs/figures/04_barriers_science_by_experience.png', width=1000, height=500, scale=2)
print("Saved: 04_barriers_science_by_experience.png")

print("\nAll visualizations exported!")

Exporting visualizations...
Saved: 01_barriers_overall.png
Saved: 02_barriers_by_experience.png
Saved: 03_barriers_science_comparison.png
Saved: 04_barriers_science_by_experience.png

All visualizations exported!
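
The export cell assumes that `../outputs/figures/` already exists; writing an image into a missing directory typically fails with a file-not-found error. A minimal sketch of a guard is shown below. `figure_path` is a hypothetical helper introduced here for illustration and is not part of the notebook.

```python
from pathlib import Path


def figure_path(filename: str, figures_dir: str = "../outputs/figures") -> str:
    """Return the full path for a figure file, creating the directory if needed."""
    out_dir = Path(figures_dir)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / filename)


# Example use inside the export cell (not executed here):
# pio.write_image(fig1, figure_path('01_barriers_overall.png'), width=900, height=500, scale=2)
```

Routing all four exports through one helper also keeps the output location in a single place if the folder layout changes.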
