# CSV Data Preprocessing with Python and Pandas

In this Jupyter notebook, we begin by preprocessing the CSV (Comma-Separated Values) dataset stored in "data.csv". Preprocessing here means converting numerically coded data, such as numbered column headers, into readable text and applying any other transformations the data needs before analysis.

CSV files are a common format for storing structured data, and preprocessing is often required to clean and prepare the data for analysis. We will use Python and the Pandas library to perform these preprocessing tasks.

Throughout this notebook, we will cover the following steps:
1. Loading CSV Data: How to read data from "data.csv" into a Pandas DataFrame.
2. Data Preprocessing: Converting numerical data into text or applying necessary transformations.
3. Data Exploration: Exploring the preprocessed dataset, including basic statistics and data structure.
4. Data Cleaning: Handling missing values, duplicates, and inconsistent data.
5. Data Manipulation: Performing operations on the data, such as filtering, sorting, and grouping.
6. Data Visualization: Creating informative plots and visualizations to understand the data.
7. Exporting Data: Saving our modified data back to a CSV file or other formats.

Let's start by importing the necessary libraries, loading our dataset, and performing the initial data preprocessing tasks!


In [1]:
import pandas as pd

# Replace 'data.csv' with the actual path to your CSV file if it's not in the same directory as your Jupyter notebook.
file_path = 'data.csv'

# Load the CSV file into a Pandas DataFrame
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame to inspect the data
df.head()

Unnamed: 0,STUDENT ID,1,2,3,4,5,6,7,8,9,...,23,24,25,26,27,28,29,30,COURSE ID,GRADE
0,STUDENT1,2,2,3,3,1,2,2,1,1,...,1,1,3,2,1,2,1,1,1,1
1,STUDENT2,2,2,3,3,1,2,2,1,1,...,1,1,3,2,3,2,2,3,1,1
2,STUDENT3,2,2,2,3,2,2,2,2,4,...,1,1,2,2,1,1,2,2,1,1
3,STUDENT4,1,1,1,3,1,2,1,2,1,...,1,2,3,2,2,1,3,2,1,1
4,STUDENT5,2,2,1,3,2,2,1,3,1,...,2,1,2,2,2,1,2,2,1,1
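Before renaming anything, it can help to check the frame's shape, its header labels, and obvious quality problems such as missing values or duplicate rows. The following is a minimal sketch rather than part of the original notebook: `sample_csv` and `sample` are hypothetical names, and the stand-in data reuses three rows and a few columns from the output above so that the block runs without "data.csv". The same calls can be applied to `df`.

```python
import io

import pandas as pd

# Stand-in for data.csv: three rows and a few columns copied from the output above.
sample_csv = io.StringIO(
    "STUDENT ID,1,2,3,COURSE ID,GRADE\n"
    "STUDENT1,2,2,3,1,1\n"
    "STUDENT2,2,2,3,1,1\n"
    "STUDENT3,2,2,2,1,1\n"
)
sample = pd.read_csv(sample_csv)

# Number of rows and columns.
n_rows, n_cols = sample.shape

# Header labels are read as strings, so the numbered columns are '1', '2', ... and not integers.
header_labels = list(sample.columns)

# Count of missing values per column and number of fully duplicated rows.
missing_per_column = sample.isna().sum()
n_duplicate_rows = int(sample.duplicated().sum())

print(n_rows, n_cols)
print(header_labels)
print(missing_per_column)
print(n_duplicate_rows)
```

Because the numbered headers arrive as strings, a mapping used for renaming has to use string keys such as '1' rather than the integer 1.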


In [6]:
# Define a dictionary to map old column names to new column names
column_mapping = {
    '1': 'Student Age',
    '2': 'Sex',
    '3': 'Graduated high-school type',
    '4': 'Scholarship type',
    '5': 'Additional work',
    '6': 'Regular artistic or sports activity',
    '7': 'Do you have a partner',
    '8': 'Total salary if available',
    '9': 'Transportation to the university',
    '10': 'Accommodation type in Cyprus',
    '11': 'Mothers’ education',
    '12': 'Fathers’ education',
    '13': 'Number of sisters/brothers (if available)',
    '14': 'Parental status',
    '15': 'Mothers’ occupation',
    '16': 'Fathers’ occupation',
    '17': 'Weekly study hours',
    '18': 'Reading frequency (non-scientific books/journals)',
    '19': 'Reading frequency (scientific books/journals)',
    '20': 'Attendance to the seminars/conferences related to the department',
    '21': 'Impact of your projects/activities on your success',
    '22': 'Attendance to classes',
    '23': 'Preparation to midterm exams 1',
    '24': 'Preparation to midterm exams 2',
    '25': 'Taking notes in classes',
    '26': 'Listening in classes',
    '27': 'Discussion improves my interest and success in the course',
    '28': 'Flip-classroom',
    '29': 'Cumulative grade point average in the last semester (/4.00)',
    '30': 'Expected Cumulative grade point average in the graduation (/4.00)',
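    # The last two columns are already named 'COURSE ID' and 'GRADE' in the CSV,
    # so they need no entry here. Keys such as '31' or '32' would match no column,
    # and rename() would silently ignore them.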
}

# Rename the columns using the mapping dictionary
df = df.rename(columns=column_mapping)
df.head()

Unnamed: 0,STUDENT ID,Student Age,Sex,Graduated high-school type,Scholarship type,Additional work,Regular artistic or sports activity,Do you have a partner,Total salary if available,Transportation to the university,...,Preparation to midterm exams 1,Preparation to midterm exams 2,Taking notes in classes,Listening in classes,Discussion improves my interest and success in the course,Flip-classroom,Cumulative grade point average in the last semester (/4.00),Expected Cumulative grade point average in the graduation (/4.00),COURSE ID,GRADE
0,STUDENT1,2,2,3,3,1,2,2,1,1,...,1,1,3,2,1,2,1,1,1,1
1,STUDENT2,2,2,3,3,1,2,2,1,1,...,1,1,3,2,3,2,2,3,1,1
2,STUDENT3,2,2,2,3,2,2,2,2,4,...,1,1,2,2,1,1,2,2,1,1
3,STUDENT4,1,1,1,3,1,2,1,2,1,...,1,2,3,2,2,1,3,2,1,1
4,STUDENT5,2,2,1,3,2,2,1,3,1,...,2,1,2,2,2,1,2,2,1,1
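By default, `DataFrame.rename` skips mapping keys that match no existing column. This is easy to miss when some headers are text (COURSE ID, GRADE) rather than numbers. The sketch below shows one way to detect such keys; `demo` and `demo_mapping` are hypothetical stand-ins with placeholder values, not the notebook's actual data.

```python
import pandas as pd

# Small stand-in frame with the same kind of headers as the dataset (placeholder values).
demo = pd.DataFrame(
    {"STUDENT ID": ["STUDENT1"], "1": [2], "2": [2], "COURSE ID": [1], "GRADE": [1]}
)

# Hypothetical mapping: '31' and '32' do not exist as column names in demo.
demo_mapping = {"1": "Student Age", "2": "Sex", "31": "Course ID", "32": "Grade"}

# Keys that correspond to no column; rename() ignores these by default.
unmatched = [key for key in demo_mapping if key not in demo.columns]
print(unmatched)

# The default call succeeds and leaves 'COURSE ID' and 'GRADE' untouched.
renamed = demo.rename(columns=demo_mapping)
print(list(renamed.columns))

# errors="raise" turns the silent skip into a KeyError.
raised = False
try:
    demo.rename(columns=demo_mapping, errors="raise")
except KeyError as exc:
    raised = True
    print("rename failed:", exc)
```

Checking `unmatched` before renaming, or passing `errors="raise"`, makes a typo in the mapping visible immediately instead of leaving a column with its old name.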


In [5]:
df.head()

Unnamed: 0,STUDENT ID,Student Age,Sex,3,4,5,6,7,8,9,...,23,24,25,26,27,28,29,30,COURSE ID,GRADE
0,STUDENT1,2,2,3,3,1,2,2,1,1,...,1,1,3,2,1,2,1,1,1,1
1,STUDENT2,2,2,3,3,1,2,2,1,1,...,1,1,3,2,3,2,2,3,1,1
2,STUDENT3,2,2,2,3,2,2,2,2,4,...,1,1,2,2,1,1,2,2,1,1
3,STUDENT4,1,1,1,3,1,2,1,2,1,...,1,2,3,2,2,1,3,2,1,1
4,STUDENT5,2,2,1,3,2,2,1,3,1,...,2,1,2,2,2,1,2,2,1,1
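The introduction mentions converting numerical data into text. Beyond column names, the same idea applies to the coded values themselves, and one common approach is `Series.map` with a code-to-label dictionary. The sketch below is an illustration under stated assumptions: `coded` is a hypothetical two-row stand-in, and `sex_labels` is an assumed codebook that should be verified against the dataset's own documentation before use.

```python
import pandas as pd

# Stand-in frame: the 'Sex' codes for STUDENT1 and STUDENT4 as shown in the output above.
coded = pd.DataFrame({"STUDENT ID": ["STUDENT1", "STUDENT4"], "Sex": [2, 1]})

# ASSUMPTION: this code-to-label mapping is illustrative; check the dataset's codebook.
sex_labels = {1: "female", 2: "male"}

# Work on a copy so the numeric codes remain available in the original frame.
decoded = coded.copy()
decoded["Sex"] = decoded["Sex"].map(sex_labels)

# Codes that are missing from the dictionary become NaN, so count them as a sanity check.
n_undecoded = int(decoded["Sex"].isna().sum())

print(decoded)
print(n_undecoded)
```

Keeping one dictionary per column makes the decoding explicit and easy to review, and a nonzero `n_undecoded` signals a code that the dictionary does not cover.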

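For the export step listed in the outline, one detail worth knowing is that `DataFrame.to_csv` writes the row index as an extra unnamed column unless `index=False` is passed. The sketch below demonstrates the difference with in-memory text instead of a file; `frame` is a hypothetical stand-in, and in the notebook the first argument would be an output path of your choosing.

```python
import io

import pandas as pd

# Small stand-in frame (two rows copied from the output above, two columns only).
frame = pd.DataFrame({"STUDENT ID": ["STUDENT1", "STUDENT2"], "GRADE": [1, 1]})

# With no path argument, to_csv returns the CSV text as a string.
text_with_index = frame.to_csv()
text_without_index = frame.to_csv(index=False)

# Reading the text back shows the effect on the column list.
cols_with_index = list(pd.read_csv(io.StringIO(text_with_index)).columns)
cols_without_index = list(pd.read_csv(io.StringIO(text_without_index)).columns)

print(cols_with_index)
print(cols_without_index)
```

Passing `index=False` keeps the exported file's columns identical to the DataFrame's, which avoids accumulating an extra column each time the file is saved and reloaded.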