# Part 2: Data Exploration and Cleaning

**Name:** Brayden Uglione

**Date:** 10/1/24

**Exercise:** Data Cleaning with Pandas  

**Purpose:** Explore and clean survey data from computing and non-computing majors.

## Data Import
Import libraries and read survey results into dataframes.

In [7]:
import pandas as pd

# Read in the datasets
df0 = pd.read_csv('Non-Majors Survey Results/Non-Majors Survey Results - Fall 2020.csv')
df1 = pd.read_csv('Non-Majors Survey Results/Non-Majors Survey Results - Fall 2021.csv', encoding='latin-1')
df2 = pd.read_csv('Non-Majors Survey Results/Non-Majors Survey Results - Fall 2022.csv')
df3 = pd.read_csv('Non-Majors Survey Results/Non-Majors Survey Results - Fall 2023.csv')

# Combine all dataframes
df = pd.concat([df0, df1, df2, df3], ignore_index=True)

df.to_csv('concat_non_majors_survey_results.csv', index=False)
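
Once the yearly files are concatenated, the rows no longer record which semester they came from. The sketch below shows one way to keep that provenance. It assumes a hypothetical `survey_year` column and uses toy in-memory CSV text in place of the real survey files, so the column names and values here are illustrative only.

```python
import io
import pandas as pd

# Toy stand-ins for the yearly CSV files (hypothetical data, not the real survey)
csv_by_year = {
    2020: "Timestamp,Course\n2020/07/08,CMP 135\n",
    2021: "Timestamp,Course,Extra\n2021/07/08,CMP 101,x\n",
}

frames = []
for year, text in csv_by_year.items():
    frame = pd.read_csv(io.StringIO(text))
    frame["survey_year"] = year  # tag each row with its source year
    frames.append(frame)

# Columns missing from a given year are filled with NaN by concat
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Tagging rows before concatenation makes it possible to compare responses across semesters later without re-reading the source files.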

## Data Exploration
Explore the dataset to understand its structure and contents.

In [8]:
# Display basic information (df.info() prints directly and returns None)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 564 entries, 0 to 563
Columns: 130 entries, Timestamp to Did you receive information about CCM computing classes from any of the following sources? [Other]
dtypes: float64(4), object(126)
memory usage: 572.9+ KB
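
With 130 columns, the `info()` summary is abbreviated and does not list per-column non-null counts. A minimal sketch of counting missing values per column, using a hypothetical toy frame in place of the survey data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the survey data
toy = pd.DataFrame({
    "course": ["CMP 135", "CMP 101", None, "CMP 126"],
    "interest": [3.0, np.nan, np.nan, 5.0],
    "source": ["Web", "Web", "Flyer", "Web"],
})

# Count missing values per column, largest first
missing = toy.isna().sum().sort_values(ascending=False)

# Share of rows missing in each column
missing_share = toy.isna().mean().round(2)

print(missing)
print(missing_share)
```

Applied to `df`, the same two expressions would show which survey questions were left blank most often.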


In [9]:
# Show first few rows
print(df.head())

                    Timestamp  Which course are you currently enrolled in?  \
0  2020/07/08 10:30:22 AM EST  CMP 135 Computer Concepts with Applications   
1  2020/07/08 11:15:08 AM EST  CMP 135 Computer Concepts with Applications   
2  2020/07/08 11:22:23 AM EST  CMP 135 Computer Concepts with Applications   
3   2020/07/08 2:14:53 PM EST  CMP 135 Computer Concepts with Applications   
4   2020/07/08 3:34:11 PM EST  CMP 135 Computer Concepts with Applications   

  What motivated you to seek a computing class at CCM?  \
0   It’s a required class for the degree I’m seeking     
1  It’s a required class for the degree I’m seeki...     
2  To keep current in computing skills;Career Adv...     
3   It’s a required class for the degree I’m seeking     
4   It’s a required class for the degree I’m seeking     

  Prior to applying to college, did you participate in any of the following events or activities at the County College of Morris and/or with the Department of Information Technologie

In [10]:
# Display summary statistics
print(df.describe())

       On a scale of 1 to 5, with 1 being not at all interested and 5 being extremely interested, how interested are you taking more courses in Computer Science, Information Technology or Game Development?  \
count                                         564.000000                                                                                                                                                        
mean                                            2.581124                                                                                                                                                        
std                                             1.219369                                                                                                                                                        
min                                             1.000000                                                                                                            

In [11]:
# Show column names
print(df.columns)

Index(['Timestamp', 'Which course are you currently enrolled in?',
       'What motivated you to seek a computing class at CCM?',
       'Prior to applying to college, did you participate in any of the following events or activities at the County College of Morris and/or with the Department of Information Technologies, if at all? [Open House]',
       'Prior to applying to college, did you participate in any of the following events or activities at the County College of Morris and/or with the Department of Information Technologies, if at all? [Instant Decision Day]',
       'Prior to applying to college, did you participate in any of the following events or activities at the County College of Morris and/or with the Department of Information Technologies, if at all? [On-Campus Information Session]',
       'Prior to applying to college, did you participate in any of the following events or activities at the County College of Morris and/or with the Department of Information Technologies,

## Data Cleaning
Clean the dataset by renaming columns, removing irrelevant features, and condensing values.
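
A minimal sketch of regex-based column-name normalization, using hypothetical toy column names. `regex=True` is passed explicitly because recent pandas versions otherwise treat the pattern as a literal string.

```python
import pandas as pd

# Toy column names (hypothetical, shortened from the survey headers)
raw = pd.Index([
    "Timestamp",
    "Which course are you currently enrolled in?",
    "Sources? [Other]",
])

clean = (
    raw.str.lower()
       .str.replace(r"\s+", "_", regex=True)        # whitespace runs -> underscore
       .str.replace(r"[^a-z0-9_]", "", regex=True)  # drop remaining punctuation
)
print(list(clean))
```

Replacing whitespace before stripping punctuation keeps word boundaries intact, so `Sources? [Other]` becomes `sources_other` rather than `sourcesother`.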

In [12]:
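# Rename columns to lowercase with underscores
# Note: in pandas 2.0+, str.replace defaults to regex=False, so '[^a-z0-9_]' is matched
# literally and removes nothing (brackets survive in the cleaned column names);
# pass regex=True to strip punctuation.
df.columns = df.columns.str.lower().str.replace(' ', '_').str.replace('[^a-z0-9_]', '').str.replace('?', '')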

# Identify and remove irrelevant columns
columns_to_drop = [
    col for col in df.columns
    if col.startswith('timestamp') or col in ('.1', '.2', '.3')
]
df = df.drop(columns=columns_to_drop)

# Clean and condense course names
course_column = 'which_course_are_you_currently_enrolled_in'
if course_column in df.columns:
    df[course_column] = df[course_column].replace({
        'CMP 126 Computer Technology and Applications': 'CMP 126',
        'CMP 101 Computer Information Literacy': 'CMP 101',
        'CMP 135 Computer Concepts with Applications': 'CMP 135',
    })

# Clean and condense motivation responses
motivation_columns = [col for col in df.columns if col.startswith('what_motivated_you_to_seek_a_computing_class_at_ccm')]
for col in motivation_columns:
    df[col] = df[col].replace({'Yes': 1, 'No': 0})

# Handle missing values
df = df.fillna('Unknown')

# Save cleaned dataset
df.to_csv('cleaned_non_majors_survey_results.csv', index=False)

# Display info of cleaned dataset (df.info() prints directly and returns None)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 564 entries, 0 to 563
Columns: 128 entries, which_course_are_you_currently_enrolled_in to did_you_receive_information_about_ccm_computing_classes_from_any_of_the_following_sources_[other]
dtypes: float64(1), object(127)
memory usage: 564.1+ KB
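
The dtype counts above drop from `float64(4)` to `float64(1)`, which is consistent with `fillna('Unknown')` writing a string into numeric columns that contained missing values. A minimal sketch of one alternative, filling only the non-numeric columns, shown on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one text column and one numeric column, each with a gap
toy = pd.DataFrame({
    "course": ["CMP 135", None, "CMP 101"],
    "interest": [3.0, np.nan, 5.0],
})

# Select every column that is not numeric
text_cols = toy.select_dtypes(exclude="number").columns

# Fill only the text columns; numeric columns keep NaN and their float dtype
filled = toy.copy()
filled[text_cols] = filled[text_cols].fillna("Unknown")

print(filled.dtypes)
```

Leaving `NaN` in the numeric columns keeps `describe()` and other aggregations working on them. Whether that is preferable to a sentinel value depends on how the cleaned file is used downstream.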


