# 🔃 Refactoring! 🔃
<font size="-0.5">*SPD 1.3 assignment*</font>

## Project Description

Students enrolled in DS 1.1 Data Analysis 📈 completed the <b>Net Promoter Score</b> (NPS) assignment to further our understanding of data cleaning 🛁 and visualization 🤓. We looked at anonymous survey responses from individuals who took part in the Make School Summer Academy. Respondents were categorized as <b>promoters</b> 😍, <b>passives</b> 😐, or <b>detractors</b> 😡 based on their responses.
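
For anyone new to the metric, here is a minimal sketch of how an NPS is conventionally computed. It assumes the standard 0-10 scale (9-10 is a promoter, 7-8 a passive, 0-6 a detractor); the survey in this project may have used different cutoffs. The function names and the sample ratings are made up for illustration.

```python
def categorize(score):
    """Label a 0-10 rating using the conventional NPS cutoffs (an assumption here)."""
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores):
    """NPS = % promoters - % detractors, on a scale from -100 to 100."""
    labels = [categorize(s) for s in scores]
    promoters = labels.count("promoter")
    detractors = labels.count("detractor")
    return 100 * (promoters - detractors) / len(labels)


sample = [10, 9, 8, 7, 6, 3]  # made-up ratings, not project data
print(net_promoter_score(sample))
```

Passives count toward the total number of responses but are neither added nor subtracted, which is why they still pull the score toward zero.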

Raw material for data analysis is often stored in <b>CSV</b> files ('<i>Comma-Separated Values</i>'). The data for this project was spread across <b>dozens</b> of such files. 🙀 The most <b>effective</b> 💯 solution would have been to load them programmatically.

## *Did I do this?* 💔 Nope!

I manually opened <b>every single</b> CSV file. 😵 Like so:
```python
LA1 = pd.read_csv('Anon Week 1 Feedback - LA.csv')
NY1 = pd.read_csv('Anon Week 1 Feedback - NY.csv')
SF1 = pd.read_csv('Anon Week 1 Feedback - SF.csv')
SG1 = pd.read_csv('Anon Week 1 Feedback - Singapore.csv')
SV1 = pd.read_csv('Anon Week 1 Feedback - SV.csv')
LA2 = pd.read_csv('Anon Week 2 Feedback - LA.csv')
...
SG7 = pd.read_csv('Anon Week 7 Feedback - SV.csv')
SV7 = pd.read_csv('Anon Week 7 Feedback - SV.csv')
```

By the time I realized my mistake, I was over halfway done with my Sisyphean task. 🙄😭

Writing functions would have saved me immense trouble and effort ⏳, so this is the aim of refactoring my NPS project!

# 📦 Import packages

This cell remains unchanged, with one exception: I added ```import glob```. The ```glob``` module 🍮 finds all pathnames matching a specified pattern!

*I think that's dulce de leche. It looks sorta globby.*
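
To make the pattern matching concrete, here is a small sketch of ```glob``` at work. It builds a throwaway folder with made-up file names (not the real survey data), so every name in it is an assumption for illustration only.

```python
import glob
import os
import tempfile

# Build a throwaway folder with a few made-up file names (not the real survey data).
with tempfile.TemporaryDirectory() as folder:
    for name in ["Anon Week 1 Feedback - LA.csv",
                 "Anon Week 2 Feedback - NY.csv",
                 "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # The asterisk matches any run of characters,
    # so both CSVs match and notes.txt does not.
    pattern = os.path.join(folder, "Anon Week*.csv")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)
```

```glob.glob``` returns matches in arbitrary order, which is why the sketch sorts them before printing.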

In [1]:
import math
import csv
import glob
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

# 🧭 Navigate to the directory storing CSVs

Cells in <b>Jupyter Notebooks</b> can run shell-style commands such as ```ls``` and ```cd```, much like a terminal! Let me show you:

In [2]:
ls

NPS_Project.ipynb                  SA Feedback Surveys 2016-2017.zip
NPS_Project.slides.html            SSOT_legwork.ipynb
[1m[36mSA[m[m/                                Time_to_Refactor.ipynb


In [3]:
cd SA/

/Users/cherishkim/Code/school_projects/DS11/🏛 Projects 🏛/NPS_Project/SA


In [4]:
ls

[1m[36m2016[m[m/     [1m[36m2016.wk8[m[m/ [1m[36m2017[m[m/     ssot.pkl


In [5]:
cd 2016/

/Users/cherishkim/Code/school_projects/DS11/🏛 Projects 🏛/NPS_Project/SA/2016


In [6]:
ls

Anon Week 1 Feedback - LA.csv         Anon Week 5 Feedback - HK.csv
Anon Week 1 Feedback - NY.csv         Anon Week 5 Feedback - LA.csv
Anon Week 1 Feedback - SF.csv         Anon Week 5 Feedback - NY.csv
Anon Week 1 Feedback - SV.csv         Anon Week 5 Feedback - SF.csv
Anon Week 1 Feedback - Singapore.csv  Anon Week 5 Feedback - SG.csv
Anon Week 2 Feedback - LA.csv         Anon Week 5 Feedback - SV.csv
Anon Week 2 Feedback - NY.csv         Anon Week 6 Feedback - HK.csv
Anon Week 2 Feedback - SF.csv         Anon Week 6 Feedback - LA.csv
Anon Week 2 Feedback - SG.csv         Anon Week 6 Feedback - NY.csv
Anon Week 2 Feedback - SV.csv         Anon Week 6 Feedback - SF.csv
Anon Week 3 Feedback - LA.csv         Anon Week 6 Feedback - SG.csv
Anon Week 3 Feedback - NY.csv         Anon Week 6 Feedback - SV.csv
Anon Week 3 Feedback - SF.csv         Anon Week 6 Feedback - Taipei.csv
Anon Week 3 Feedback - SG.csv         Anon Week 6 Feedback - Tokyo.csv
Anon Week 3 Feedback - SV.c
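
The ```ls``` and ```cd``` commands above are IPython conveniences, so they only work inside a notebook or IPython session. As a hedged alternative, the sketch below lists the CSVs in a folder with ```pathlib``` without changing the working directory at all. The helper name ```list_csvs``` is hypothetical, and the demo runs on a temporary folder with made-up files rather than the real ```SA/2016``` directory.

```python
import tempfile
from pathlib import Path


def list_csvs(folder):
    """Return the sorted names of the CSV files directly inside `folder`."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


# Demo on a temporary folder with made-up files; in the notebook the argument
# would be something like "SA/2016" relative to the project directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Anon Week 1 Feedback - LA.csv",
                 "Anon Week 1 Feedback - NY.csv",
                 "ssot.pkl"]:
        (Path(tmp) / name).touch()
    found = list_csvs(tmp)

print(found)
```

Because nothing depends on the current directory, the same cell can be re-run in any order without landing in the wrong folder.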

# 📝 Time to programmatically speed things up!
My variable names originally did not follow Python naming conventions (e.g. 'step1' instead of 'step_1'). I aim to follow them here.

In [7]:
step_1 = []

# glob finds all pathnames matching a specified pattern.
# The asterisk matches any run of characters in the file name.
for f in glob.glob('Anon Week*.csv'):  # for each file name matching this pattern,
    split_title = f.split(' ')  # split the title at spaces
    week = split_title[2]  # index 2 of the split_title list is the week number
    location = split_title[5].split('.')[0]  # drop '.csv' from the end
    
    #read the CSV file
    temp_partial_2k16 = pd.read_csv(f)
    
    #add columns for week and location from file title
    temp_partial_2k16['Week'] = week
    temp_partial_2k16['Location'] = location
    
    #append completed temp_partial_2k16 dataframe to list
    step_1.append(temp_partial_2k16)
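
Splitting the title at spaces works for these files, but it quietly depends on every name having exactly the same number of words. As a sketch of a sturdier option, the hypothetical helper below pulls the week and location out with a regular expression and raises an error for names that do not fit. The pattern and the function name are my own assumptions, not part of the original project.

```python
import re

# Expected shape: "Anon Week <number> Feedback - <location>.csv"
FILENAME_PATTERN = re.compile(r"Anon Week (\d+) Feedback - (.+)\.csv$")


def parse_feedback_filename(filename):
    """Return (week, location) parsed from a survey file name.

    Raises ValueError when the name does not fit the expected shape,
    instead of silently picking the wrong piece of the title.
    """
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    week, location = match.groups()
    return int(week), location


print(parse_feedback_filename("Anon Week 6 Feedback - Taipei.csv"))
```

Note that this version returns the week as an integer, whereas the loop above stores it as a string; either works as long as the choice is consistent.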

### 🍮```glob``` to the rescue! 

### 🔂 If it works, this for loop will have done the job with <10 lines of code.

### 🤯 ***But does it work?*** I'll use the ```pandas``` library to try **concatenating** all the dataframes stored in **step_1** together. Then I'll use ```.head(3)``` to show the first three entries of that huge, cumulative dataframe.

### 🤔 **If it works**, those first three entries will appear. **If it doesn't work**, I'll get an error.

In [8]:
results_partial_2k16 = pd.concat(step_1, ignore_index=True, sort=False)
results_partial_2k16.head(3)

| Unnamed: 0.1 | Timestamp | How would you rate your overall satisfaction with the Summer Academy this week? | How well is the schedule paced? | Week | Location | How well are the tutorials paced? | What track are you in? | Unnamed: 0 |
|---|---|---|---|---|---|---|---|---|
| 0 | 8/5/2016 1:39:41 | 3 | 3 | 7 | Taipei | | | |
| 1 | 8/5/2016 1:40:47 | 4 | 3 | 7 | Taipei | | | |
| 2 | 8/5/2016 1:40:50 | 4 | 3 | 7 | Taipei | | | |
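
The empty cells in that output come from how ```pd.concat``` handles frames whose columns differ. Here is a toy sketch with two made-up frames (the column names and values are assumptions, not survey data) showing what ```ignore_index=True``` and ```sort=False``` do.

```python
import pandas as pd

# Two made-up frames with partly different columns (not the real survey data).
week_1 = pd.DataFrame({"Rating": [3, 4], "Week": ["1", "1"]})
week_2 = pd.DataFrame({"Rating": [5], "Track": ["Games"], "Week": ["2"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's index.
# sort=False keeps columns in order of first appearance rather than alphabetizing them.
combined = pd.concat([week_1, week_2], ignore_index=True, sort=False)

print(combined)
```

Rows from a frame that lacks a column (here, ```Track``` for week 1) are filled with ```NaN```, which is why columns that only some surveys asked about show up as blanks.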


## Mission accomplished! 🎉
# This refactor has been a SUCCESS!! 🔥💯💥