## Census Testing Version

Link to [Shakespeare data](https://www.kaggle.com/datasets/kingburrito666/shakespeare-plays) on Kaggle

In [26]:
import streamlit as st
# st.set_page_config(layout="wide")
st.set_page_config(page_title="Top Frequency Extractor")
import os.path
import pathlib
import pandas as pd
import numpy as np
import re

from sklearn.feature_extraction.text import CountVectorizer
import nltk
nltk.data.path.append('/data/cqa/nltk_data/')
from nltk.corpus import stopwords


x = pd.read_csv('/home/c/chapm356/example_datasets/Shakespeare_data.csv')

In [28]:
x.shape

(111396, 6)

In [29]:
y = pd.read_csv('/home/c/chapm356/example_datasets/Shakespeare_data_short.csv')
y.shape

(3600, 7)

In [1]:
import streamlit as st
# st.set_page_config(layout="wide")
st.set_page_config(page_title="Top Frequency Extractor")
import os.path
import pathlib
import pandas as pd
import numpy as np
import re

from sklearn.feature_extraction.text import CountVectorizer
import nltk
nltk.data.path.append('/data/cqa/nltk_data/')
from nltk.corpus import stopwords

# set stop words & punctuation
stop_words = stopwords.words('english')
my_punctuation = '!"$%&\'()*+,-./:;<=>?[\\]^_`’{|}~•@'
word_rooter = nltk.stem.snowball.PorterStemmer(ignore_stopwords=False).stem

df = pd.read_csv('/home/c/chapm356/example_datasets/Shakespeare_data_short.csv', index_col=0)
df = df[['Play', 'ActSceneLine', 'Player', 'PlayerLine']]
df_example = pd.concat([df[df['Play']==play].head(3) for play in set(df['Play'])]).reset_index(drop=True).head(10)

##################################### APP CODE #########################################

st.write("""
## Extract top *n* words from text responses
###### *App by [Curtiss Chapman](curtiss.a.chapman@census.gov "Email curtiss.a.chapman@census.gov with any questions"), U.S. Census Bureau, Center for Behavioral Science Methods*

With this app, you can upload a .csv file and extract the top n 
most frequent words from a response column. 
""")
   
with st.expander("Instructions"):
    st.write("""
    1. **Select file:** Drag and drop your file onto the box below (or select the file 
    via the "Browse files" button).
        - A preview of your file will appear below the uploader, and a sidebar will appear.
    2. **Select response:** In the sidebar, choose which column contains the response 
    from which you want the most frequent words.
    3. **(Optional) Select grouping variable:** Choose which column contains groups for 
    which you want separate sets of top frequency words. If you don't have a grouping 
    variable, leave blank.
    4. **Input desired # of most frequent words:** Set how many of the top frequency words you want from your responses.
    5. **(Optional) Set additional words to ignore:** If there are words you would not like to 
    include in the set of top words, type them here separated by a comma and space. Common 
    words like *the, a, an, if*, etc. are excluded by default and do not need to be written 
    in the box.  
    6. **(Optional) Set additional words to flag:** If there are words that you know are 
    indicative of labels/topics you care about, type them here separated by a comma and 
    space. Columns for these words will be added along with the most frequent words.  
    7. **(Optional) Set number of most frequent 2-grams to include:** 2-grams are sets of two words separated by a space or punctuation, such as "data science" or "higgledy-piggledy". 
    8. **(Optional) Set number of most frequent 3-grams to include:** 3-grams are sets of three words separated by a space or punctuation, such as "secretary of state" or "mother-in-law". 
    9. **(Optional) Choose whether to stem words:** "Stemming" strips inflectional and 
    derivational suffixes (prefixes are left in place). Thus, stemming is useful if you want to treat words 
    with different endings as occurrences of the same word. For example, if you want *interrupts*, 
    *interrupted*, and *interruption* to all be treated as instances of the same word, you would 
    check the 'Stem' checkbox. If unchecked, these will be treated as different words for the 
    sake of frequency count.
    10. **Press "Get top words":** Press the "Get top words" button.
        - The words "Processing data..." will appear, and the top right corner of the page 
        will indicate that processing is ongoing.
        - The results will appear in a table, which you can scroll through and inspect.
        - A column for each top frequency word will be attached to your file to indicate 
        whether that word is present (1) or absent (0) in each response.
        - If you chose a grouping factor, each group will be output into a separate table 
        that will appear in its own tab.
    11. **Download results:** Type a name in the text input for your file, click outside the text input box, and then hit the 
    "Download CSV" button. Please note that if you save via the download button in the upper 
    right corner of the displayed table, only the first 50 rows of your data will be saved.
    12. **Restart or modify results:** After processing your results, you can restart 
    the process with the "Start Over" button to revise your inputs. Alternatively, you can 
    click the "X" to the right of your uploaded file to remove that file and upload a new file.
    """)
    
with st.expander("Data Format Example"):
    st.write("""
    An example of the expected format for input data is shown below. Your response 
    variable should be stored in a single column. Likewise your grouping variable 
    should be stored in a single column, repeated where it applies to a given response.
    - Here, we might reasonably choose "PlayerLine" as the response variable. 
    - Additionally, we might choose "Play" as a grouping variable. 
    
    ***NOTE:** Please make sure that your data is saved as a .csv before uploading. 
    The code will not handle other file formats.*
    """)
    st.table(df_example)

st.write("### Choose file")

# upload file
uploaded_file = st.file_uploader("Upload your CSV file", type=['.csv'])

# Set up or reset session state variables while no file is uploaded
# (plain assignment both creates a missing key and resets an existing one)
if uploaded_file is None:
    st.session_state.valid_response = False
    st.session_state.valid_n = False
    st.session_state.clicked = False
    st.session_state.process_button_disabled = False
    st.session_state.restart_button_disabled = True
    st.session_state.processed = {}
    
# when file is uploaded, do things 
if uploaded_file is not None:
    dataset = pd.read_csv(uploaded_file)
    
    # show head of uploaded data 
    st.write("### Uploaded Data Preview")
    st.dataframe(dataset.head())
    
    # add sidebar with drop down menus for response and grouping factor, text input for n
    st.sidebar.write("## Process Your Data")
    
    columns_list = dataset.columns.tolist()
    columns_list.insert(0, '')
    resp_col = st.sidebar.selectbox("Select response column", 
                                    columns_list, 
                                    help="Responses should be the text from which you wish to extract the top words.")
    group_col = st.sidebar.selectbox("(Optional) Select grouping column", 
                                     columns_list, 
                                     help="""
                                     Select a grouping factor if you want the top n words for each group. 
                                     Otherwise, leave blank.
                                     """)
    top_n = st.sidebar.text_input("Input desired # of most frequent words", 
                                  value=20, 
                                  max_chars=3, 
                                  help="Input number must be an integer between 1 and 999.")
        
    st.sidebar.write("#### Advanced Options (Optional)")
    
    added_stopwords = st.sidebar.text_input("Words to ignore", 
                                  value='', 
                                  help="Words typed here will be excluded from the list of words added to your dataset. It is sometimes useful to review the words that have been added to your dataset in case they are not useful to you. Input lowercase words separated by a comma and space to ensure they are excluded from the top frequency word set (e.g., census, table, like)")
    str_sep_added_stopwords = added_stopwords.split(', ')
    added_keywords = st.sidebar.text_input("Additional words to flag", 
                                  value='', 
                                  help="Words typed here will be added to the list of words added to your dataset, regardless of their frequency across responses. Input lowercase words separated by a comma and space to ensure they are included alongside the top frequency word set (e.g., census, table, like)")
    str_sep_added_keywords = added_keywords.split(', ')
    top_n2 = st.sidebar.text_input("Number of 2-grams (e.g. United States, hot pot)", 
                              value=0, 
                              max_chars=3, 
                              help="Input number must be an integer between 0 and 999. Leave at 0 to skip 2-grams.")
    top_n3 = st.sidebar.text_input("Number of 3-grams (e.g. secretary of state, mother-in-law)", 
                              value=0, 
                              max_chars=3, 
                              help="Input number must be an integer between 0 and 999. Leave at 0 to skip 3-grams.")
    stem_checkbox = st.sidebar.checkbox('Stem words')
    
    # Check that response is selected
    if resp_col != '':
        st.session_state.valid_response = True
    else:
        st.session_state.valid_response = False
        
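    # Check for valid n: every input must consist of digits only (an empty box is invalid too)
    n_valid = bool(re.fullmatch(r'\d+', top_n))
    n2_valid = bool(re.fullmatch(r'\d+', top_n2))
    n3_valid = bool(re.fullmatch(r'\d+', top_n3))
    # all three must be valid; `not a | b | c` would only fire when all three are invalid
    if not (n_valid and n2_valid and n3_valid):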
        st.sidebar.write("Please input valid integer for n.")
        st.session_state.valid_n = False
    else:
        top_n = int(top_n)
        top_n2 = int(top_n2)
        top_n3 = int(top_n3)
        st.session_state.valid_n = True
            
    # Show button with valid response & n
    def click_button():
        st.session_state.clicked = True
        st.session_state.process_button_disabled = True
        
    if (st.session_state.valid_response) & (st.session_state.valid_n):
        st.sidebar.button("Get top words", on_click=click_button, disabled=st.session_state.process_button_disabled)
     
    ############################## GET TOP WORDS ##############################
    
    # define important functions
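    # build the combined stopword set once instead of concatenating lists for every word
    all_stop_words = set(stop_words + str_sep_added_stopwords)
    def remove_stopwords(txt):
        return ' '.join([word for word in txt.split(' ') if word not in all_stop_words])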
    
    def stem_words(txt):
        return ' '.join([word_rooter(wd) for wd in txt.split(' ')])
    
    def count_top_n(vec, dataset=dataset, cat='', group_col='', top_n=top_n):
        if (cat != '') and (group_col != ''):
            vec = vec[dataset[group_col]==cat]
            cat = f"gp_{cat}_"  # f-string also works when group labels are not strings
            
        # set minimum document frequency (a floor, not a cap) based on dataset size
        if len(vec) < 10000:
            min_df = 1
        elif len(vec) < 100000:
            min_df = 5
        else:
            min_df = 10
            
        # set ngram range based on ngram input
        if top_n3 > 0:
            ngram_range = (1,3)
        elif top_n2 > 0:
            ngram_range = (1,2)
        else:
            ngram_range = (1,1)
        count_vec = CountVectorizer(ngram_range=ngram_range, 
                              decode_error='ignore',
                                   min_df = min_df)
        dummy_matrix = count_vec.fit_transform(vec).toarray()
        df_matrix = pd.DataFrame(dummy_matrix, columns=count_vec.get_feature_names_out())
        df_matrix = df_matrix.astype(bool).astype(int) # change to doc freq before sum
        
        # retrieve top words based on ngram input
        sorted_wds = df_matrix.sum(axis=0).sort_values(ascending=False)
        sorted_1grams = sorted_wds[[wd for wd in sorted_wds.index if len(wd.split(' '))==1]]
        top_1grams = sorted_1grams.head(top_n).index.tolist()
        if top_n3 > 0:
            sorted_3grams = sorted_wds[[wd for wd in sorted_wds.index if len(wd.split(' '))==3]]
            top_3grams = sorted_3grams.head(top_n3).index.tolist()
        else:
            top_3grams = []
        if top_n2 > 0:
            sorted_2grams = sorted_wds[[wd for wd in sorted_wds.index if len(wd.split(' '))==2]]
            top_2grams = sorted_2grams.head(top_n2).index.tolist()
        else:
            top_2grams = []
        top_n_wds = top_1grams + top_2grams + top_3grams
        
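        # add custom keywords; skip blanks and duplicates, and give keywords that are
        # missing from the vectorizer vocabulary an all-zero column to avoid a KeyError
        if str_sep_added_keywords[0]!='':
            for kw in [kw for kw in str_sep_added_keywords if kw]:
                if kw not in df_matrix.columns:
                    df_matrix[kw] = 0
                if kw not in top_n_wds:
                    top_n_wds.append(kw)
        df_matrix_top_n = df_matrix[top_n_wds]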
        df_matrix_top_n.columns = [f"{cat}wd_{i+1}_{col.replace(' ', '_')}" for i, col in enumerate(df_matrix_top_n.columns.values)]
        response_top_n = pd.concat([vec.reset_index(drop=True), df_matrix_top_n], axis=1)
        return response_top_n    
    
    # On click, get top words
    processing_placeholder = st.empty()
    
    if st.session_state.clicked:
        processing_placeholder.write("Processing data...")
        my_bar = st.progress(0, text='Subsetting data...')
        
        # subset dataset 
        dataset = dataset.reset_index(drop=True)
        response = dataset[resp_col].copy() # response

        # preprocess responses
        # lowercase
        response = response.str.lower()
        my_bar.progress(10, text='Removing potential encoding errors...')
        # get rid of weird encoding errors
#         response = response.apply(lambda x: str(x).encode('cp1252', 'backslashreplace').decode('utf-8','backslashreplace'))
        my_bar.progress(20, text='Stripping punctuation...')
        # strip punctuation
        response = response.apply(lambda x: re.sub(f'[{my_punctuation}]+', '', str(x)).strip())
        my_bar.progress(30, text='Removing stopwords...')
        # remove stopwords
        response = response.apply(remove_stopwords)
        # stem (show the label before the work starts, not after it finishes)
        if stem_checkbox:
            my_bar.progress(40, text='Stemming words...')
            response = response.apply(stem_words)
        my_bar.progress(50, text='Getting word counts...')

        # get word count across responses
        ## if there isn't a grouping factor, treat as a single dataset
        if len(group_col) == 0:
            response_top_n = count_top_n(response)
            y = pd.concat([dataset.rename(columns={resp_col:f"{resp_col}_orig"}), response_top_n], axis=1)
            y = y.rename(columns={resp_col:f"{resp_col}_clean"})
            my_bar.progress(100, text='Finishing up...')

        ## if there is a grouping factor, get words for separate datasets
        elif len(group_col) != 0:
            g = dataset[group_col].copy() # grouping factor
            dummy_matrices_d = {}
            progress_addition = int(np.floor(50/len(set(g))))
            for i, cat in enumerate(set(g)): 
                print(f"Getting top {top_n} words for category: {cat} ")
                curr_response_top_n = count_top_n(response, cat=cat, group_col=group_col)
                curr_dataset = dataset[dataset[group_col]==cat]
                curr_dataset = curr_dataset.drop(columns=[group_col]).rename(columns={resp_col:f"{resp_col}_orig"}).reset_index()
                curr_y = pd.concat([curr_dataset, 
                                    curr_response_top_n], axis=1)
                curr_y = curr_y.rename(columns={resp_col:f"{resp_col}_clean"})
                dummy_matrices_d[cat] = curr_y
                my_bar.progress(50+(i*progress_addition), text=f"Getting top {top_n} words for category: {cat}...")
            my_bar.progress(100, text='Finishing up...')
            y = dummy_matrices_d

        # assign result to session state
        st.session_state.processed['top_n'] = y
        st.session_state.clicked = False
    
    # show that process was successful
    if 'top_n' in st.session_state.processed:
        st.success('Data successfully processed!', icon="✅")
        
        processing_placeholder.empty()
        y = st.session_state.processed['top_n']
        
        if isinstance(st.session_state.processed['top_n'], pd.core.frame.DataFrame):
            st.write(f"## Results with top {top_n} words attached")
            st.write("""
            - Scroll left and right to see table columns.
            - Save the full results with the "Download CSV" button below. The download icon in the top right corner of the displayed table saves only the first 50 rows.
            """)
            st.dataframe(st.session_state.processed['top_n'].head(50))
            typed_name = st.text_input("Input desired file name", 
                                  value='Top_n_words_data',
                                  help="File name to be saved when downloaded. Please click out of box after changing name before clicking download button. Please omit '.csv' from the name, as it will be added automatically.")
            st.download_button(
                label="Download CSV",
                data=st.session_state.processed['top_n'].to_csv().encode('utf-8'),
                file_name=f'{typed_name}.csv',
                mime='text/csv',
                key='1'
            )
        elif isinstance(y, dict):
            st.write(f"## Results with top {top_n} words for each group attached")
            st.write("""
            - Scroll left and right to see table columns. Hover over table to see scroll bars.
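            - Save the full results for a group with the "Download CSV for this group" button in its tab. The download icon in the top right corner of the displayed table saves only the first 50 rows.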
            - To scroll left and right between long list of tabs, hover and 
            hold shift while using the scroll wheel. 
            """)
            # set up tabs with results
            # tab labels must be strings, so convert group labels explicitly
            tabs = st.tabs([str(k) for k in y])
            for i, k in enumerate(y):
                with tabs[i]:
                    st.write(f"#### {k}")
                    st.dataframe(y[k].head(50))
                    typed_name = st.text_input("Input desired file name", 
                                  value='Top_n_words_data',
                                  help="File name to be saved when downloaded. Please click out of box after changing name before clicking download button. Please omit '.csv' from the name, as it will be added automatically.",
                                  key=f'name_{i}')
                    st.download_button(
                        label="Download CSV for this group",
                        data=y[k].to_csv().encode('utf-8'),
                        file_name=f'{typed_name}.csv',
                        mime='text/csv',
                        key=f'DL_{i}'
                    )
        st.session_state.restart_button_disabled = False

    def restart_button():
        st.session_state.clicked = False
        st.session_state.process_button_disabled = False
        st.session_state.processed = {}
        st.session_state.restart_button_disabled = True
        
    if not st.session_state.restart_button_disabled:
        st.sidebar.button("Start Over", on_click=restart_button, disabled=st.session_state.restart_button_disabled)
        

2024-12-06 13:29:08.889 
  command:

    streamlit run /apps/miniconda3/envs/nlp/lib/python3.11/site-packages/ipykernel_launcher.py [ARGUMENTS]
2024-12-06 13:29:08.895 Session state does not function when running a script without `streamlit run`
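
The integer check on the three count inputs in the cell above can be tried outside Streamlit. The following is a minimal sketch, not the app's own code: the helper name `parse_counts` is hypothetical, and it assumes the inputs arrive as strings, as they do from a text input. It requires every value to be digits only, so a single bad entry (or an empty box) rejects the whole set.

```python
import re


def parse_counts(*raw_values):
    """Return the inputs as a tuple of ints, or None if any input is invalid.

    Hypothetical helper for illustration; the app stores the result of this
    kind of check in st.session_state.valid_n instead of returning it.
    """
    # fullmatch rejects empty strings and any string containing a non-digit
    if not all(re.fullmatch(r"[0-9]+", value) for value in raw_values):
        return None
    return tuple(int(value) for value in raw_values)


print(parse_counts("20", "0", "0"))   # (20, 0, 0)
print(parse_counts("20", "2a", "0"))  # None
print(parse_counts("", "0", "0"))     # None
```

Combining the three checks with `all(...)` avoids relying on operator precedence between `not` and `|`, which is easy to misread.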


## Alternative layout with 3 columns

In [1]:
import streamlit as st
st.set_page_config(layout="wide")
import os.path
import pathlib
import pandas as pd
import numpy as np
import re

from sklearn.feature_extraction.text import CountVectorizer
import nltk
nltk.data.path.append('/data/cqa/nltk_data/')
from nltk.corpus import stopwords

# set stop words & punctuation
stop_words = stopwords.words('english')
my_punctuation = '!"$%&\'()*+,-./:;<=>?[\\]^_`’{|}~•@'

##################################### APP CODE #########################################

col1, col2, col3 = st.columns([4,2,4], gap='medium')

col1.write("""
## Extract top *n* words from text responses
###### *App by Curtiss Chapman, U.S. Census Bureau, Center for Behavioral Science Methods*
With this app, you can upload a csv file and extract the top n 
most frequent words from a response column. The steps are simple:
1. **Select file:** Drag and drop your file onto the box below (or select the file 
via the "Browse files" button).
    - A preview of your file will appear below the uploader.
2. **Select response:** In the sidebar, choose which column contains the response 
from which you want the most frequent words.
3. **(Optionally) Select grouping variable:** Choose which column contains groups for 
which you want separate sets of top frequency words.
4. **Set *n*:** Set how many of the top frequency words you want from your responses.
5. **Press go:** Press the "Get top words" button.
    - The words "Processing data..." will appear, and the top right corner of the page 
    will indicate that processing is ongoing.
    - The results will appear in a table, which you can scroll through and inspect.
    - A column for each top frequency word will be attached to your file to indicate 
    how many times a given word is present in the response column.
    - If you chose a grouping factor, each group will be output into a separate table 
    that will appear in its own tab.
6. **Download results:** Save results tables by hovering over the table and selecting 
the Download CSV icon that appears in the upper right corner of the table. 
(Hover over icons to see their functions).
7. **Restart or modify results:** After processing your results, you can restart 
the process with the "Start Over" button to revise your inputs. Alternatively, you can 
click the "X" to the right of your uploaded file to remove that file and upload a new file.
""")

col2.write("### Choose file")
           
# upload file
uploaded_file = col2.file_uploader("Upload your CSV file", type=['.csv'])

# Set up or reset session state variables while no file is uploaded
# (plain assignment both creates a missing key and resets an existing one)
if uploaded_file is None:
    st.session_state.valid_response = False
    st.session_state.valid_n = False
    st.session_state.clicked = False
    st.session_state.process_button_disabled = False
    st.session_state.restart_button_disabled = True
    st.session_state.processed = {}
    
# when file is uploaded, do things 
if uploaded_file is not None:
    dataset = pd.read_csv(uploaded_file, index_col=0)
    
    # show head of uploaded data 
    col2.write("### Uploaded Data Preview")
    col2.dataframe(dataset.head())
    
    # add drop-down menus for response and grouping factor, text input for n (middle column, not a sidebar)
    col2.write("## Process Your Data")
    
    columns_list = dataset.columns.tolist()
    columns_list.insert(0, '')
    resp_col = col2.selectbox("Select response column", 
                                    columns_list, 
                                    help="Responses should be the text from which you wish to extract the top words.")
    group_col = col2.selectbox("Select grouping column", 
                                     columns_list, 
                                     help="""
                                     Select a grouping factor if you want the top n words for each group. 
                                     Otherwise, leave blank.
                                     """)
    top_n = col2.text_input("Input desired # of most frequent words", 
                                  value=20, 
                                  max_chars=3, 
                                  help="Input number must be an integer between 1 and 999.")
    
    # Check that response is selected
    if resp_col != '':
        st.session_state.valid_response = True
    else:
        st.session_state.valid_response = False
        
    # Check for valid n
    n_valid = not bool(re.search(r'\D+', top_n))
    if not n_valid:
        col2.write("Please input valid integer for n.")
        st.session_state.valid_n = False
    else:
        top_n = int(top_n)
        st.session_state.valid_n = True
            
    # Show button with valid response & n
    def click_button():
        st.session_state.clicked = True
        st.session_state.process_button_disabled = True
        
    if (st.session_state.valid_response) & (st.session_state.valid_n):
        col2.button("Get top words", on_click=click_button, disabled=st.session_state.process_button_disabled)
     
    ############################## GET TOP WORDS ##############################
    
    # define important functions
    def remove_stopwords(txt):
        return ' '.join([word for word in txt.split(' ') if word not in stop_words])
    
    def count_top_n(vec, dataset=dataset, cat='', group_col='', top_n=top_n):
        if (cat != '') and (group_col != ''):
            vec = vec[dataset[group_col]==cat]
            cat = f"cat_{cat}_"  # f-string also works when group labels are not strings
        count_vec = CountVectorizer(ngram_range=(1,3), 
                              decode_error='ignore')
        dummy_matrix = count_vec.fit_transform(vec).toarray()
        df_matrix = pd.DataFrame(dummy_matrix, columns=count_vec.get_feature_names_out())
        top_n_wds = df_matrix.sum(axis=0).sort_values(ascending=False).head(top_n).index.values
        df_matrix_top_n = df_matrix[top_n_wds]
        df_matrix_top_n.columns = [f"{cat}wd_{i+1}_{col.replace(' ', '_')}" for i, col in enumerate(df_matrix_top_n.columns.values)]
        response_top_n = pd.concat([vec.reset_index(drop=True), df_matrix_top_n], axis=1)
        return response_top_n    
    
    # On click, get top words
    processing_placeholder = col3.empty()
    
    if st.session_state.clicked:
        processing_placeholder.write("Processing data...")
        
        # subset dataset 
        dataset = dataset.reset_index(drop=True)
        response = dataset[resp_col].copy() # response

        # preprocess responses
        # lowercase
        response = response.str.lower()
        # get rid of weird encoding errors (str() guards against missing values, which are floats, not strings)
        response = response.apply(lambda x: str(x).encode('cp1252','backslashreplace').decode('utf-8','backslashreplace'))
        # strip punctuation
        response = response.apply(lambda x: re.sub(f'[{my_punctuation}]+', '', x).strip())
        # remove stopwords
        response = response.apply(remove_stopwords)

        # get word count across responses
        ## if there isn't a grouping factor, treat as a single dataset
        if len(group_col) == 0:
            response_top_n = count_top_n(response)
            y = pd.concat([dataset.drop(columns=[resp_col]), response_top_n], axis=1)

        ## if there is a grouping factor, get words for separate datasets
        elif len(group_col) != 0:
            g = dataset[group_col].copy() # grouping factor
            dummy_matrices_d = {}
            for cat in set(g): 
                print(f"Getting top {top_n} words for category: {cat} ")
                curr_response_top_n = count_top_n(response, cat=cat, group_col=group_col)
                curr_dataset = dataset[dataset[group_col]==cat]
                curr_dataset = curr_dataset.drop(columns=[group_col, resp_col]).reset_index()
                curr_y = pd.concat([curr_dataset, 
                                    curr_response_top_n], axis=1)
                dummy_matrices_d[cat] = curr_y
            y = dummy_matrices_d

        # assign result to session state
        st.session_state.processed['top_n'] = y
        st.session_state.clicked = False
    
    # show that process was successful
    if 'top_n' in st.session_state.processed:
        col3.success('Data successfully processed!', icon="✅")
        
        processing_placeholder.empty()
        y = st.session_state.processed['top_n']
        
        if isinstance(st.session_state.processed['top_n'], pd.core.frame.DataFrame):
            col3.write(f"## Results with top {top_n} words attached")
            col3.write("""
            - Scroll left and right to see table columns.
            - Save files by clicking download icon in top right corner of the displayed table.
            """)
            col3.dataframe(st.session_state.processed['top_n'])
        elif isinstance(y, dict):
            col3.write(f"## Results with top {top_n} words for each group attached")
            col3.write("""
            - Scroll left and right to see table columns.
            - Save files by clicking download icon in top right corner of the displayed table.
            """)
            # set up tabs with results
            # tab labels must be strings, so convert group labels explicitly
            tabs = col3.tabs([str(k) for k in y])
            for i, k in enumerate(y):
                with tabs[i]:
                    st.write(f"#### {k}")
                    st.dataframe(y[k])
        st.session_state.restart_button_disabled = False

    def restart_button():
        st.session_state.clicked = False
        st.session_state.process_button_disabled = False
        st.session_state.processed = {}
        st.session_state.restart_button_disabled = True
        
    if not st.session_state.restart_button_disabled:
        col2.button("Start Over", on_click=restart_button, disabled=st.session_state.restart_button_disabled)
        

2024-04-19 10:26:05.616 
  command:

    streamlit run /apps/miniconda3/envs/nlp/lib/python3.11/site-packages/ipykernel_launcher.py [ARGUMENTS]
2024-04-19 10:26:05.618 Session state does not function when running a script without `streamlit run`
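
The two versions in this notebook rank words differently: the Census testing version converts the count matrix to 0/1 before summing, so it ranks by the number of responses containing a word, while the three-column version sums raw counts. The sketch below illustrates the difference with plain Python rather than `CountVectorizer`. The function name `top_words`, the whitespace tokenization, and the sample responses are assumptions made for illustration only.

```python
from collections import Counter


def top_words(responses, n, by_document=True):
    """Return the n highest-ranked words.

    by_document=True  ranks by how many responses contain the word.
    by_document=False ranks by total occurrences across all responses.
    """
    counts = Counter()
    for text in responses:
        words = text.split()
        # set() makes a word count at most once per response
        counts.update(set(words) if by_document else words)
    # sort by descending count, then alphabetically so ties are deterministic
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


responses = ["love love love", "night love", "night falls", "night"]
print(top_words(responses, 1, by_document=True))   # ['night']
print(top_words(responses, 1, by_document=False))  # ['love']
```

Here "love" occurs four times but in only two responses, while "night" occurs in three, so the two rankings disagree on the top word. Document frequency keeps a single repetitive response from dominating the list.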
