# Cleaning survey data

## Metadata

- **Course**: ID Datavisualisatie
- **Workshop**: Datavis API Workshop
- **Lecturers**: Laura Benvenuti & Danny de Vries
- **University**: Amsterdam University of Applied Sciences
- **Programme**: Communication and Multimedia Design
- **Faculty**: Digital Media and Creative Industries

## Description

This notebook is based on a notebook for my (Danny's) Master's thesis. Part of that project was a survey among the students of a building.
This example shows how you can clean the data from a survey and make some exploratory plots from it.
The raw .csv comes straight from Qualtrics, so no other data cleaning or transformation has been applied. Everything is read directly in Python.

## Tasks

This notebook:

* Loads a .csv exported from Qualtrics.
* Trims it down to the relevant data by removing metadata such as timestamp, location, etc.
* Removes respondents who did not finish the survey.
* Writes a clean .csv to a folder on your computer.

### Check the Python version

In [9]:
from packaging import version
import platform
import sys

min_version = '3.8'

def check_version(min_version):
    current_version = sys.version.split()[0]
    return version.parse(current_version) >= version.parse(min_version)

# Example usage:
if __name__ == "__main__":
    if check_version(min_version):
        print("Running a sufficiently new version of Python.")
        print("Current version: " + platform.python_version())
        print("Minimum required version: " + min_version)
    else:
        print("Python version is too old. Upgrade to a newer version.")

Running a sufficiently new version of Python.
Current version: 3.12.5
Minimum required version: 3.8
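
The check above depends on the `packaging` package, which is not part of the install cell further down. As an alternative, here is a minimal sketch that uses only the standard library. The names `MIN_VERSION` and `python_is_recent_enough` are made up for this example.

```python
import sys

# Same minimum as above, written as a tuple (hypothetical constant name)
MIN_VERSION = (3, 8)

def python_is_recent_enough(minimum=MIN_VERSION):
    # sys.version_info compares element by element with a plain tuple
    return sys.version_info >= minimum

if python_is_recent_enough():
    print("Running a sufficiently new version of Python.")
else:
    print("Python version is too old. Upgrade to a newer version.")
```

This avoids parsing the version string, at the cost of writing the minimum version as a tuple instead of as text.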


### Install packages

In [10]:
!pip install pandas
!pip install seaborn
!pip install matplotlib
!pip install numpy



### Import packages

In [11]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

## Cleaning

### Load the .csv and store the data in a variable

In [12]:
def import_csv(file):
    df = pd.read_csv(file)
    return df

file = '../data/survey_data.csv'

full_data = import_csv(file)
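
Right after loading, it can help to check that the columns used in the later steps are really in the file. The sketch below does this on a small in-memory stand-in for the export. The sample rows and the `required` list are assumptions for illustration; only the column names `Finished` and `Text / Graphic` come from this notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Qualtrics export; the real file has many more columns
sample_csv = io.StringIO(
    "StartDate,Finished,Text / Graphic,Location\n"
    "2024-01-01,True,Yes,Atrium\n"
    "2024-01-02,False,Yes,First floor\n"
)

df = pd.read_csv(sample_csv)

# Columns that the later cleaning steps rely on
required = ['Finished', 'Text / Graphic']
missing = [col for col in required if col not in df.columns]

print(df.shape)
print(missing)
```

If `missing` is not empty, the export probably uses different column labels and the filters further down need to be adjusted.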

### Remove metadata

In [13]:
# Reuse the DataFrame loaded in the previous cell instead of reading the file again
data = full_data.copy()

def remove_metadata_columns(data):
    # List of columns to remove
    metadata_columns = [
        'StartDate', 'EndDate', 'Status', 'IPAddress', 'Progress', 
        'Duration (in seconds)', 'RecordedDate', 'ResponseId', 
        'RecipientLastName', 'RecipientFirstName', 'RecipientEmail', 
        'ExternalReference', 'LocationLatitude', 'LocationLongitude', 
        'DistributionChannel', 'UserLanguage'
    ]
    
    # Drop the metadata columns
    data_cleaned = data.drop(columns=metadata_columns, errors='ignore')
    
    # Print the cleaned DataFrame
    print("Cleaned DataFrame:")
    print(data_cleaned)
    
    return data_cleaned

# Example usage (the function already prints the cleaned DataFrame)
cleaned_data = remove_metadata_columns(data)

Cleaned DataFrame:
    Finished                                     Text / Graphic  \
0       True  Yes, I understand, and I agree to participate ...   
1       True  Yes, I understand, and I agree to participate ...   
2       True  Yes, I understand, and I agree to participate ...   
3       True  Yes, I understand, and I agree to participate ...   
4       True  Yes, I understand, and I agree to participate ...   
5       True  Yes, I understand, and I agree to participate ...   
6       True  Yes, I understand, and I agree to participate ...   
7       True  Yes, I understand, and I agree to participate ...   
8       True  Yes, I understand, and I agree to participate ...   
9       True  Yes, I understand, and I agree to participate ...   
10     False  Yes, I understand, and I agree to participate ...   
11      True  Yes, I understand, and I agree to participate ...   
12      True  Yes, I understand, and I agree to participate ...   
13      True  Yes, I understand, and I agre
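
The `errors='ignore'` argument matters here because an export does not always contain every metadata column in the list. A small sketch of the difference, using a made-up three-column DataFrame:

```python
import pandas as pd

# Tiny made-up frame; 'IPAddress' is deliberately missing
df = pd.DataFrame({
    'StartDate': ['2024-01-01'],
    'Finished': [True],
    'Location': ['Atrium'],
})

# With errors='ignore', column names that are not present are skipped
trimmed = df.drop(columns=['StartDate', 'IPAddress'], errors='ignore')
print(list(trimmed.columns))

# With the default errors='raise', a missing column raises a KeyError
try:
    df.drop(columns=['IPAddress'])
    raised = False
except KeyError:
    raised = True
print(raised)
```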

### Remove non-consenting respondents

In [14]:
def remove_non_consenting_users(data):
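    consent_text = 'Yes, I understand, and I agree to participate in the survey'
    # regex=False matches the text literally; na=False treats empty answers as "no consent"
    consented = data['Text / Graphic'].str.contains(consent_text, regex=False, na=False)
    data_cleaned = data[consented]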
    return data_cleaned

data_cleaned = remove_non_consenting_users(cleaned_data)
print(data_cleaned)

    Finished                                     Text / Graphic  \
0       True  Yes, I understand, and I agree to participate ...   
1       True  Yes, I understand, and I agree to participate ...   
2       True  Yes, I understand, and I agree to participate ...   
3       True  Yes, I understand, and I agree to participate ...   
4       True  Yes, I understand, and I agree to participate ...   
5       True  Yes, I understand, and I agree to participate ...   
6       True  Yes, I understand, and I agree to participate ...   
7       True  Yes, I understand, and I agree to participate ...   
8       True  Yes, I understand, and I agree to participate ...   
9       True  Yes, I understand, and I agree to participate ...   
10     False  Yes, I understand, and I agree to participate ...   
11      True  Yes, I understand, and I agree to participate ...   
12      True  Yes, I understand, and I agree to participate ...   
13      True  Yes, I understand, and I agree to participate ..
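
The consent filter keeps only the rows whose answer contains the consent sentence. Below is a minimal sketch of how that behaves on a tiny made-up column, including an empty answer. The "No" answer text is an assumption, since the real wording of the other option is not shown in this notebook.

```python
import pandas as pd

consent_text = 'Yes, I understand, and I agree to participate in the survey'

df = pd.DataFrame({
    'Text / Graphic': [
        consent_text + ' as described.',  # consenting respondent
        'No, I do not agree',             # hypothetical non-consent answer
        None,                             # respondent who left the question empty
    ]
})

# na=False turns the empty answer into False instead of NaN,
# so the mask can be used directly for boolean indexing
mask = df['Text / Graphic'].str.contains(consent_text, regex=False, na=False)
consenting = df[mask]

print(mask.tolist())
print(len(consenting))
```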

### Filter out unfinished surveys

In [15]:
def remove_unfinished_surveys(data):
    data = data[data['Finished'] == True]    
    return data

data_final = remove_unfinished_surveys(data_cleaned)
print(data_final)

    Finished                                     Text / Graphic  \
0       True  Yes, I understand, and I agree to participate ...   
1       True  Yes, I understand, and I agree to participate ...   
2       True  Yes, I understand, and I agree to participate ...   
3       True  Yes, I understand, and I agree to participate ...   
4       True  Yes, I understand, and I agree to participate ...   
5       True  Yes, I understand, and I agree to participate ...   
6       True  Yes, I understand, and I agree to participate ...   
7       True  Yes, I understand, and I agree to participate ...   
8       True  Yes, I understand, and I agree to participate ...   
9       True  Yes, I understand, and I agree to participate ...   
11      True  Yes, I understand, and I agree to participate ...   
12      True  Yes, I understand, and I agree to participate ...   
13      True  Yes, I understand, and I agree to participate ...   
14      True  Yes, I understand, and I agree to participate ..
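
In the output above, row 10 (where `Finished` is `False`) is gone and the index jumps from 9 to 11. The sketch below shows the same effect on a small made-up DataFrame, plus an optional `reset_index` if you prefer a gap-free index.

```python
import pandas as pd

# Made-up example rows; only the 'Finished' column name comes from the survey
df = pd.DataFrame({
    'Finished': [True, False, True],
    'Activity': ['1 day a week', '3 days a week', '4 days a week'],
})

finished = df[df['Finished']]
removed = len(df) - len(finished)

print(removed)
print(finished.index.tolist())

# Optional: renumber the rows so the index has no gaps
renumbered = finished.reset_index(drop=True)
print(renumbered.index.tolist())
```

Keeping the original index can be handy if you want to trace a row back to the raw export, so renumbering is a choice rather than a requirement.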

### Remove metadata columns from the dataset

In [16]:
def remove_columns(data):
    data_cleaned = data.drop(columns=['Finished', 'Text / Graphic'])
    return data_cleaned

data_final_cleaned = remove_columns(data_final)
print(data_final_cleaned)

                                             Location       Activity  \
0                                                 NaN            NaN   
1   On the first floor (1st floor - in a working s...   1 day a week   
2   On the first floor (1st floor - in a working s...  4 days a week   
3   On the first floor (1st floor - in a working s...  4 days a week   
4                    On the ground floor (the atrium)  2 days a week   
5   On the first floor (1st floor - in a working s...   1 day a week   
6                    On the ground floor (the atrium)  3 days a week   
7   On the second floor (2th floor - in a working ...  5 days a week   
8   On the second floor (2th floor - in a working ...  4 days a week   
9   On the first floor (1st floor - in a working s...   1 day a week   
11                   On the ground floor (the atrium)  3 days a week   
12                   On the ground floor (the atrium)  3 days a week   
13  On the first floor (1st floor - in a working s...   1 day a 
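
Row 0 in the output above has no answer in the columns that are visible. If a row turns out to be empty in every column, you might want to drop it as well. This is not a step in this notebook, only a possible extra, sketched here on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Location': [np.nan, 'On the ground floor (the atrium)', 'On the ground floor (the atrium)'],
    'Activity': [np.nan, '2 days a week', np.nan],
})

# how='all' only drops rows in which every column is empty;
# the last row has one answer and is kept
answered = df.dropna(how='all')

print(len(df), len(answered))
```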

### Export the new 'clean' .csv to disk


In [17]:
from pathlib import Path

def export_to_csv(data, filename):
    # to_csv does not create missing folders, so create the target folder first
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(filename, index=False)

export_to_csv(data_final_cleaned, 'survey-data/survey_data_clean.csv')
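
pandas does not create missing folders when writing a .csv, so the target folder has to exist before `to_csv` is called. The sketch below writes a small made-up DataFrame into a temporary folder and reads it back to check that nothing got lost. The temporary folder is only there to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows in the shape of the cleaned survey data
df = pd.DataFrame({
    'Location': ['On the ground floor (the atrium)', 'On the ground floor (the atrium)'],
    'Activity': ['2 days a week', '3 days a week'],
})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'survey-data' / 'survey_data_clean.csv'
    # Create the folder (and any parents) before writing
    target.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(target, index=False)
    reloaded = pd.read_csv(target)

print(list(reloaded.columns))
print(reloaded.shape)
```

Because of `index=False`, the row index is not written to the file, so the reloaded DataFrame has the same columns as the original.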