<a href="https://colab.research.google.com/github/adrianriverarodriguez/Semantic-Modeling-Community-Response-Survey/blob/Keyword-Analysis/Keywords.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This code runs a basic thematic analysis in Python on a column of open-ended responses (e.g., from an Excel or CSV file)

## Step 1: Load your structured data file (e.g., .xlsx) - You'll need pandas to read Excel data

In [None]:
from google.colab import files
import pandas as pd

# Upload Excel file
uploaded = files.upload()

# Get file name
excel_path = next(iter(uploaded))

# Read into DataFrame
# If your Excel file has multiple sheets, specify the sheet name:
# pd.read_excel(excel_path, sheet_name='Sheet1')
df = pd.read_excel(excel_path)


# Preview
df.head()


Saving SDRP Phase III - School Closure Criteria.xlsx to SDRP Phase III - School Closure Criteria.xlsx


Unnamed: 0,Equity,Excellene,Effective Finance,Effective Effecienies,Right Categories?,Additional Categories,Creative Solutions,Unnamed: 7
0,Keep King Arts magnet school which allows a ch...,Keep King Arts magnet school which allows a ch...,Make sure a closing will achieve the needed sa...,Don't disrupt an otherwise high performing uni...,Academic excellence should be weighted the hig...,"Keeping King Arts, overall make the district m...",Maintain closed schools for community centers....,
1,Which buildings are emptiest and will have a h...,Which schools have close enough proximity that...,Which schools would yield a quick sale at a hi...,Schools that are not handicapped accessible an...,Yes,"Whichever schools close, please do a better jo...",Provide community members ample time to surviv...,
2,Every student deserves highly qualified teache...,"Again, first and foremost facilities should be...",Considering both the location of buildings and...,"Building capacity vs enrollment, as well as th...",Yes,,"If a north side school will close, which I und...",
3,,,,,No,"The D65 obsession with race, identity, gender ...",,
4,Establish a clear conceptualization of both: 1...,Identify specific responsibilities of stakehol...,Access and impact upon nearby community. We va...,"Examining areas to increase efficiency (e.g., ...",Unsure,"Especially with aging infrastructure, it becom...",Are there additional opportunities for partner...,
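The title mentions both Excel and CSV inputs, while the cell above reads only Excel. Below is a minimal sketch of a loader that picks the pandas reader from the file extension. `load_responses` is a hypothetical helper name introduced for illustration and is not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_responses(path, sheet_name=0):
    """Read a survey export into a DataFrame based on its file extension.

    Hypothetical helper: the name and the set of supported extensions are assumptions.
    """
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xls"}:
        # sheet_name=0 reads the first sheet; pass a name such as 'Sheet1' to pick another
        return pd.read_excel(path, sheet_name=sheet_name)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix}")
```

In this notebook it could be called as `df = load_responses(excel_path)` right after the upload step.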


## Step 2: Select and clean your text column - Pick the column you want to analyze

In [None]:
# Drop NaN values and convert to strings
# Note: 'Excellene' matches the column header exactly as it is spelled in the spreadsheet
excellence_texts = df['Excellene'].dropna().astype(str)
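Dropping NaN values still leaves blank strings and one-word answers such as "Yes" or "Unsure", which carry little thematic signal. The sketch below shows one possible extra cleaning pass. The helper name `clean_responses` and the `min_words=3` threshold are assumptions for illustration and should be tuned to your data.

```python
import pandas as pd


def clean_responses(series, min_words=3):
    """Drop missing, blank, and very short answers from a text column.

    Hypothetical helper; the min_words threshold is an assumption.
    """
    texts = series.dropna().astype(str).str.strip()
    # Collapse runs of whitespace (line breaks, double spaces) into single spaces
    texts = texts.str.replace(r"\s+", " ", regex=True)
    # Keep only answers with at least `min_words` words
    word_counts = texts.str.split().str.len()
    return texts[word_counts >= min_words]


sample = pd.Series(["Yes", None, "  Keep class sizes   small  ", "", "Hire qualified teachers"])
cleaned = clean_responses(sample)
print(cleaned.tolist())
```

Applied to this notebook, the call would look like `excellence_texts = clean_responses(df['Excellene'])`.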


## Step 3: Vectorize the text - Use CountVectorizer to turn text into a format the model can understand

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

# Convert text to a document-term matrix
vectorizer = CountVectorizer(stop_words='english', max_df=0.95, min_df=2)
X = vectorizer.fit_transform(excellence_texts)


## Step 4: Apply topic modeling with Latent Dirichlet Allocation (LDA) to identify themes

In [None]:
from sklearn.decomposition import LatentDirichletAllocation

# Fit the model; n_components is the number of topics/themes you want
lda = LatentDirichletAllocation(n_components=3, random_state=42)
lda.fit(X)
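The choice of `n_components=3` is a judgment call. One rough way to compare alternatives is to fit the model for several topic counts and print the perplexity of each (lower is nominally better). The sketch below does this on made-up sentences. Perplexity measured on the same data used for fitting is only a loose guide, so reading the keywords for each candidate remains the main check.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents invented for illustration only
docs = [
    "students need strong reading programs",
    "reading programs support students",
    "teachers need training and support",
    "quality teachers improve learning",
    "buildings need repairs",
    "old buildings have small classrooms",
    "students learn better in small classes",
    "teachers support student learning",
]
X_demo = CountVectorizer(stop_words="english").fit_transform(docs)

scores = {}
for k in (2, 3, 4):
    model = LatentDirichletAllocation(n_components=k, random_state=42)
    model.fit(X_demo)
    # perplexity() evaluates the fitted model on the given document-term matrix
    scores[k] = model.perplexity(X_demo)
    print(f"{k} topics: perplexity = {scores[k]:.1f}")
```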


## Step 5: View the top keywords in each topic

In [None]:
# Get feature names
feature_names = vectorizer.get_feature_names_out()

# Function to print top keywords per topic
def get_top_words(model, feature_names, n_top_words=10):
    for topic_idx, topic in enumerate(model.components_):
        top_features = [feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]]
        print(f"Topic {topic_idx + 1}: {', '.join(top_features)}")

get_top_words(lda, feature_names)


Topic 1: students, programs, support, standards, schools, need, needs, grading, based, school
Topic 2: teachers, learning, quality, students, support, school, high, student, teacher, facilities
Topic 3: students, schools, district, school, need, learning, kids, excellence, buildings, class
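Keyword lists describe the topics, but they do not show which responses belong to which theme. A possible follow-up, sketched here on made-up sentences, is to use `transform` to get each response's topic weights and label it with its dominant topic. The variable and column names below are assumptions for illustration.

```python
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy responses invented for illustration only
docs = pd.Series([
    "students need strong reading programs",
    "reading programs support students",
    "teachers need training and support",
    "quality teachers improve learning",
    "buildings need repairs",
    "old buildings have small classrooms",
])

vectorizer_demo = CountVectorizer(stop_words="english")
X_demo = vectorizer_demo.fit_transform(docs)
lda_demo = LatentDirichletAllocation(n_components=2, random_state=42).fit(X_demo)

# Each row is one response; each column is a topic weight, and rows sum to 1
doc_topics = lda_demo.transform(X_demo)

assignments = pd.DataFrame({
    "response": docs.values,
    # 1-based so the labels match the "Topic 1", "Topic 2", ... printout
    "dominant_topic": doc_topics.argmax(axis=1) + 1,
    "weight": doc_topics.max(axis=1).round(2),
})
print(assignments)
```

With the notebook's own objects, the equivalent call is `lda.transform(X)`, with `excellence_texts.values` as the response column. Responses with a low dominant weight mix several themes and are worth reading by hand.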


## Optional: Visualization Tools