# Part 2: Data Exploration and Cleaning

**Name:** Brayden Uglione

**Date:** 10/1/24

**Exercise:** Data Cleaning with Pandas  

**Purpose:** Explore and clean survey data from computing and non-computing majors.

## Data Import
Import pandas and read the Fall 2023 non-majors survey results into a DataFrame.

In [6]:
import pandas as pd

# Read in the dataset
df = pd.read_csv('Non-Majors Survey Results/Non-Majors Survey Results - Fall 2023.csv')

## Data Exploration
Explore the dataset to understand its structure and contents.
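Two checks that often help at this stage are counting missing values per column and tallying how often each answer occurs. The sketch below uses a small, made-up stand-in (`toy`) rather than the real survey file; the rows are illustrative assumptions, not actual responses.

```python
import pandas as pd

# Hypothetical miniature stand-in for the survey data (not real responses)
toy = pd.DataFrame({
    "Which course are you currently enrolled in?": [
        "CMP 101 Computer Information Literacy",
        "CMP 126 Computer Technology and Applications",
        "CMP 101 Computer Information Literacy",
    ],
    "How did you hear about County College of Morris? [CCM Web site]": [
        "Yes", "No", None,
    ],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# How often each course appears
course_counts = toy["Which course are you currently enrolled in?"].value_counts()

print(missing_per_column)
print(course_counts)
```

The same two calls should work unchanged on `df` once the CSV has been loaded.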

In [7]:
# Display basic information
df.info()  # info() prints the summary itself and returns None, so print() is not needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 96 columns):
 #   Column                                                                                                                                                                                                                                                           Non-Null Count  Dtype 
---  ------                                                                                                                                                                                                                                                           --------------  ----- 
 0   Timestamp                                                                                                                                                                                                                                                        105 non-null    object
 1   Which course are you currently enro

In [8]:
# Show first few rows
print(df.head())

                    Timestamp   Which course are you currently enrolled in?  \
0   2023/09/05 9:36:31 PM EST         CMP 101 Computer Information Literacy   
1   2023/09/05 9:42:31 PM EST         CMP 101 Computer Information Literacy   
2  2023/09/05 10:00:08 PM EST         CMP 101 Computer Information Literacy   
3  2023/09/05 11:12:17 PM EST  CMP 126 Computer Technology and Applications   
4  2023/09/05 11:56:29 PM EST  CMP 126 Computer Technology and Applications   

  How did you hear about County College of Morris? [CCM Web site]  \
0                                                Yes                
1                                                 No                
2                                                 No                
3                                                Yes                
4                                                 No                

  How did you hear about County College of Morris? [Social Media]  \
0                                        

In [9]:
# Display summary statistics
print(df.describe())

       On a scale of 1 to 5, with 1 being not at all interested and 5 being extremely interested, how interested are you taking more courses in Computer Science, Information Technology or Game Development?
count                                         105.000000                                                                                                                                                     
mean                                            2.495238                                                                                                                                                     
std                                             1.136152                                                                                                                                                     
min                                             1.000000                                                                                                                        
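By default `describe()` summarizes only numeric columns, which is why a single column appears above even though the dataset has 96. A minimal sketch of including the text columns as well, using a made-up two-column frame (`toy` and its column names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in: one numeric rating and one Yes/No answer column
toy = pd.DataFrame({
    "interest": [1, 3, 5],
    "heard_via_website": ["Yes", "No", "No"],
})

# Default: numeric columns only
numeric_summary = toy.describe()

# include="all" adds count/unique/top/freq rows for non-numeric columns
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```

For the Yes/No survey columns, the `top` and `freq` rows give the most common answer and how many times it occurs.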

In [10]:
# Show column names
print(df.columns)

Index(['Timestamp', 'Which course are you currently enrolled in?',
       'How did you hear about County College of Morris? [CCM Web site]',
       'How did you hear about County College of Morris? [Social Media]',
       'How did you hear about County College of Morris? [Community Event]',
       'How did you hear about County College of Morris? [Family member or friend]',
       'How did you hear about County College of Morris? [Current CCM student]',
       'How did you hear about County College of Morris? [CCM Alumni]',
       'How did you hear about County College of Morris? [High School Teacher]',
       'How did you hear about County College of Morris? [High School Counselor]',
       'How did you hear about County College of Morris? [In-app advertisement]',
       'How did you hear about County College of Morris? [Employer]',
       'How did you hear about County College of Morris? [Billboard]',
       'How did you hear about County College of Morris? [Television]',
       'How

## Data Cleaning
Clean the dataset by renaming columns, removing irrelevant features, and condensing values.
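To make the renaming step concrete, here is a small sketch that applies the same lowercase / underscore / strip-punctuation chain to three of the survey's column names. It also checks that the cleaned names are still unique, since stripping punctuation could in principle make two different questions collide (the uniqueness check is an addition, not part of the original exercise).

```python
import pandas as pd

# Three column names taken from the survey header
raw_names = pd.Index([
    "Timestamp",
    "Which course are you currently enrolled in?",
    "How did you hear about County College of Morris? [CCM Web site]",
])

clean_names = (
    raw_names.str.lower()
    .str.replace(" ", "_", regex=False)          # spaces -> underscores
    .str.replace("[^a-z0-9_]", "", regex=True)   # drop remaining punctuation
)

print(list(clean_names))
# ['timestamp',
#  'which_course_are_you_currently_enrolled_in',
#  'how_did_you_hear_about_county_college_of_morris_ccm_web_site']

# Guard against two questions collapsing to the same cleaned name
assert clean_names.is_unique
```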

In [11]:
# Normalize column names: lowercase, spaces to underscores, strip all other punctuation
df.columns = df.columns.str.lower().str.replace(' ', '_').str.replace('[^a-z0-9_]', '', regex=True)

# Drop columns not needed for analysis (errors='ignore' skips names that are absent)
columns_to_drop = ['timestamp']
df = df.drop(columns=columns_to_drop, errors='ignore')

# Clean and condense course names
course_column = 'which_course_are_you_currently_enrolled_in'
if course_column in df.columns:
    df[course_column] = df[course_column].replace({
        'CMP 126 Computer Technology and Applications': 'CMP 126',
        'CMP 101 Computer Information Literacy': 'CMP 101',
        'CMP 135 Computer Concepts with Applications': 'CMP 135',
    })

# Display info of cleaned dataset
print("\nCleaned Dataset Information:")
df.info()  # info() prints directly; wrapping it in print() would also print "None"

# Save cleaned dataset
df.to_csv('cleaned_survey_results.csv', index=False)


Cleaned Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 95 columns):
 #   Column                                                                                                                                                                                                                                                     Non-Null Count  Dtype 
---  ------                                                                                                                                                                                                                                                     --------------  ----- 
 0   which_course_are_you_currently_enrolled_in                                                                                                                                                                                                                 105 non-null    object
 1   how_did_you_hear_about_
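The cleaning cell condenses course names with a hard-coded mapping, which leaves any unlisted course unchanged. One possible alternative, sketched here under the assumption that every course name starts with a code of the form `CMP <number>`, is to extract that code with a regular expression:

```python
import pandas as pd

# The three course names that appear in the mapping above
courses = pd.Series([
    "CMP 101 Computer Information Literacy",
    "CMP 126 Computer Technology and Applications",
    "CMP 135 Computer Concepts with Applications",
])

# Keep only the leading course code; rows that do not match become NaN
course_codes = courses.str.extract(r"^(CMP \d+)", expand=False)

print(course_codes.tolist())
```

Unlike `replace`, this turns non-matching values into `NaN`, so it is worth checking `course_codes.isna().sum()` before overwriting the original column.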