In this notebook, I take the original cleaned NS data and create partial dataframes.
For each topic, I will:
1. create a complete dataframe, which is just a subset of Cleaned_NS.csv (saved as original_topic_data.csv);
2. consolidate the data in all columns except the "other" column;
3. store the consolidated dataframe (saved as topic_consolidated.csv);
4. save each of the above dataframes into the csv file named in the brackets.

In [6]:
import pandas as pd
import numpy as np

In [7]:
# import NS data:
NS = pd.read_csv('Cleaned_NS.csv')

In [8]:
# list of questions (topics) whose answers will be consolidated:
list_to_consolidate = ['access',
                       'mode_transportation', 
                       'other_services',
                       'disease',
                       'service_enough',
                       'food_type',
                       'education',
                       'disability',
                       'ethnicity',
                       'income']

In [9]:
# create a function to consolidate the data:
def consolidate_row(row):
    """
    Consolidate answers from a DataFrame row, excluding the first and last columns.

    This function processes a row from a DataFrame to:
    1. Collect the answers from all columns except "participant_ID" (first column)
       and the "other" column (last column), keeping only non-null, non-empty values.
    2. Count the number of answers provided.
    3. If the answers include "prefer not to answer", keep only that value.
       The count from step 2 is taken before this step and is not changed by it.
    4. Return the consolidated answers as a single string, together with the count.

    Parameters:
    row (pd.Series): A row from a DataFrame containing answers to be consolidated.

    Returns:
    tuple: A tuple containing:
           - A string of consolidated answers, joined by '; '.
           - An integer count of the number of answers provided.
    """

    consolidated = []

    # Loop through the answer columns, skipping "participant_ID" (first) and the last column.
    # Note: iloc[1:-1] is used so that the last column is excluded, as the docstring states.
    for value in row.iloc[1:-1]:
        if pd.notna(value) and value != '':
            consolidated.append(value)

    # finding out how many answers were provided:
    number_of_answers = len(consolidated)

    # updating the records for "prefer not to answer":
    if 'prefer not to answer' in consolidated:
        consolidated = ['prefer not to answer']

    return '; '.join(consolidated), number_of_answers
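
A quick way to see what the function returns is to run it on a few made-up rows. The sketch below is only an illustration: the `toy` dataframe and its values are hypothetical (the real cells in Cleaned_NS.csv may look different), and the function is a compact copy that follows the docstring by skipping the first and last columns.

```python
import numpy as np
import pandas as pd


def consolidate_row(row):
    # compact copy of the function described above; skips the first and last columns
    answers = [v for v in row.iloc[1:-1] if pd.notna(v) and v != '']
    number_of_answers = len(answers)
    if 'prefer not to answer' in answers:
        answers = ['prefer not to answer']
    return '; '.join(answers), number_of_answers


# hypothetical rows, assuming each cell holds the answer text or NaN
toy = pd.DataFrame({
    'participant_ID': [1, 2, 3],
    'disease_prefer_not_to_answer': [np.nan, 'prefer not to answer', np.nan],
    'disease_diabetes': ['diabetes', 'diabetes', np.nan],
    'disease_heart': ['heart', np.nan, np.nan],
    'disease_other': [np.nan, np.nan, 'asthma'],
})

results = [consolidate_row(row) for _, row in toy.iterrows()]
for result in results:
    print(result)
```

In this toy data, the second participant keeps a count of 2 even though the consolidated string is reduced to "prefer not to answer", because the count is taken before the list is replaced. The third participant only filled in the "other" column, so the consolidated string is empty and the count is 0.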

In [10]:
# for each question in list_to_consolidate, consolidate the answers and create csv files with the final result:

for x in list_to_consolidate:
    
    # select specific columns for the topic x:
    columns = [col for col in NS.columns if col.startswith('participant') or col.startswith(x)]
   
    # extract data related to topic x:
    df = NS[columns]
     
    # print columns:
    print(f'Topic {x} columns are: \n {df.columns}')
    
    # save the original dataframe with all of the topic's columns:
    df.to_csv(f'original_{x}_data.csv', index = False)
    
    # Apply the consolidate_row function to each row and create new columns:
    df[[f'{x}_consolidated', f'{x}_number_of_answers']] = df.apply(
        lambda row: pd.Series(consolidate_row(row)), axis=1 )
    
    # Now, we can drop the columns which have been consolidated
    # (this keeps participant_ID, the last topic column, and the two new columns):
    df.drop(df.columns[1:-3], axis = 1, inplace = True)
    
    # save the data into a csv file:
    df.to_csv(f'{x}_consolidated.csv', index = False)
    
    # print confirmation:
    print(f'all files related to topic {x} were created successfully! \n ')
    
print('all topics have been successfully consolidated and saved into csv files \n')
    
    

Topic access columns are: 
 Index(['participant_ID', 'access_difficulty_none',
       'access_difficulty_language', 'access_difficulty_physical_disability',
       'access_difficulty_safety_concerns', 'access_difficulty_transportation',
       'access_difficulty_hours_of_operation',
       'access_difficulty_prefer_not_to_answer', 'access_difficulty_others'],
      dtype='object')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[[f'{x}_consolidated', f'{x}_number_of_answers']] = df.apply(
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[[f'{x}_consolidated', f'{x}_number_of_answers']] = df.apply(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop(df.columns[1:-3], axis = 1, inplace = True)


all files related to topic access were created successfully! 
 
Topic mode_transportation columns are: 
 Index(['participant_ID', 'mode_transportation_walk',
       'mode_transportation_cycle', 'mode_transportation_public',
       'mode_transportation_private',
       'mode_transportation_prefer_not_to_answer',
       'mode_transportation_others'],
      dtype='object')


all files related to topic mode_transportation were created successfully! 
 
Topic other_services columns are: 
 Index(['participant_ID', 'other_services_housing_utilities',
       'other_services_computers_internet',
       'other_services_tax_clinic_financial_services', 'other_services_legal',
       'other_services_navigation', 'other_services_employment_income_support',
       'other_services_education', 'other_services_childcare',
       'other_services_none', 'other_services_prefer_not_to_answer',
       'other_services_other'],
      dtype='object')


all files related to topic other_services were created successfully! 
 
Topic disease columns are: 
 Index(['participant_ID', 'disease_prefer_not_to_answer', 'disease_diabetes',
       'disease_high_blood_pressure', 'disease_heart', 'disease_none',
       'disease_other'],
      dtype='object')


all files related to topic disease were created successfully! 
 
Topic service_enough columns are: 
 Index(['participant_ID', 'service_enough_none_of_the_above',
       'service_enough_yes', 'service_enough_no', 'service_enough_sometime',
       'service_enough_prefer_not_to_answer', 'service_enough_other'],
      dtype='object')


all files related to topic service_enough were created successfully! 
 
Topic food_type columns are: 
 Index(['participant_ID', 'food_type_halal', 'food_type_kosher',
       'food_type_vegan_vegetarian', 'food_type_medical_condition',
       'food_type_allergen_free', 'food_type_country', 'food_type_not_special',
       'food_type_prefer_not_to_answer', 'food_type_other'],
      dtype='object')


all files related to topic food_type were created successfully! 
 
Topic education columns are: 
 Index(['participant_ID', 'education_high_school_some',
       'education_high_school_completed', 'education_college_some',
       'education_college_completed', 'education_trades',
       'education_graduate_education_some',
       'education_graduate_education_completed',
       'education_professional_degree', 'education_prefer_not_to_answer',
       'education_outside_Canada'],
      dtype='object')


all files related to topic education were created successfully! 
 
Topic disability columns are: 
 Index(['participant_ID', 'disability_none', 'disability_physical',
       'disability_chronic_pain', 'disability_sensory',
       'disability_developmental', 'disability_learning', 'disability_mental',
       'disability_prefer_not_to_answer', 'disability_other'],
      dtype='object')


all files related to topic disability were created successfully! 
 
Topic ethnicity columns are: 
 Index(['participant_ID', 'ethnicity_Indigenous', 'ethnicity_White/European',
       'ethnicity_Black_African_Caribbean', 'ethnicity_Southeast_Asian',
       'ethnicity_East_Asian', 'ethnicity_South_Asian',
       'ethnicity_Middle_Eastern', 'ethnicity_Latin_American',
       'ethnicity_do_not_know', 'ethnicity_prefer_not_to_answer',
       'ethnicity_other'],
      dtype='object')


all files related to topic ethnicity were created successfully! 
 
Topic income columns are: 
 Index(['participant_ID', 'income_source_employed_35_hours_plus',
       'income_source_employed_less_35_hours', 'income_source_ODSP',
       'income_source_OW', 'income_source_CERB', 'income_source_scholarship',
       'income_source_OSAP', 'income_source_EI',
       'income_source_family_support', 'income_source_spousal_support',
       'income_source_CCB', 'income_source_OTB', 'income_source_CPP',
       'income_source_private_pension', 'income_source_OAS',
       'income_source_WSIB', 'income_source_disability',
       'income_source_other_government_programs', 'income_source_no_income',
       'income_source_prefer_not_to_answer', 'income_source_other'],
      dtype='object')
all files related to topic income were created successfully! 
 
all topics have been successfully consolidated and saved into csv files 
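
The "A value is trying to be set on a copy of a slice from a DataFrame" messages in this output are pandas' SettingWithCopyWarning. They appear because `df = NS[columns]` is a selection from `NS`, and the cell then adds columns to it and drops columns in place. One common way to avoid the warning is to take an explicit copy and use a non-inplace drop. The sketch below shows the idea on a hypothetical stand-in for `NS`; the column names, values, and the simple answer count are assumptions for illustration only, not the notebook's actual data or consolidation logic.

```python
import pandas as pd

# hypothetical stand-in for the NS dataframe
NS = pd.DataFrame({
    'participant_ID': [1, 2],
    'disease_diabetes': ['diabetes', None],
    'disease_other': [None, 'asthma'],
    'income_source_OW': ['OW', None],
})

topic = 'disease'
columns = [col for col in NS.columns
           if col.startswith('participant') or col.startswith(topic)]

# .copy() makes df an independent dataframe, so later assignments are unambiguous
df = NS[columns].copy()

# illustrative new column: count the non-null answers, excluding the "other" column
df[f'{topic}_number_of_answers'] = df[['disease_diabetes']].notna().sum(axis=1)

# a non-inplace drop returns a new dataframe instead of modifying a selection
df = df.drop(columns=['disease_diabetes'])

print(df)
```

With this pattern `NS` itself is left untouched, and each topic gets its own independent dataframe.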



