# Getting Specific Data From a Typeform Survey

I'm going to share this little project in case it's of use to anyone else. In it, we will use a few different Python and `pandas` approaches to quickly grab some specific data from a Typeform survey.

The first step is to open the survey responses on Typeform's site, click `Download All Responses`, and save the file as a CSV. Give it a sensible name and put it wherever you usually store data sets. For this example, I've stored it in the same folder as my Jupyter notebook files so that I can import it without having to type a file path.

## Step 1: Import Pandas

In [44]:
import pandas as pd

data = pd.read_csv('scholarship-responses.csv')

# data.head()

(Normally you might want to use `data.head()` here to take a look at the DataFrame, but since this particular CSV includes private student data, I won't print any of its contents in this notebook, so I have commented out that line.)
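If you want a sanity check that doesn't print any responses, something like the sketch below can help. It runs on a tiny made-up DataFrame (the `demo` name and its rows are placeholders I invented, not the real survey data) and only looks at the shape, the column labels, and the missing-value counts.

```python
import pandas as pd

# Made-up stand-in for the survey export; the real file holds private data.
demo = pd.DataFrame({
    'Would you like us to notify you about future scholarships?': [1.0, 0.0, 1.0],
    'And your email address?': ['a@example.com', 'b@example.com', 'not an email'],
})

# Row and column counts reveal nothing about individual respondents.
n_rows, n_cols = demo.shape
print(n_rows, n_cols)

# The column labels are just the survey questions.
column_labels = list(demo.columns)
print(column_labels)

# Missing values per column, worth knowing before any cleaning.
missing_counts = demo.isna().sum()
print(missing_counts)
```

On the real data you would run the same three checks against `data` instead of `demo`.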

## Step 2: Grab the Data We Need 

Now we're going to do some light filtering and cleaning, and we'll look at three different ways of doing that. Specifically we want to:

1. Grab the email addresses of students who answered "Yes" to a "want to be notified?" question.
2. Clean that list to get rid of entries that aren't actually valid email addresses.

The code that follows probably isn't ideal or the most efficient, but it does all work!
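For comparison, here is a compact vectorized sketch of those two steps, separate from the three approaches walked through in this post. It assumes a small made-up DataFrame called `sample`; the column labels match the survey questions, but the rows are invented. `str.contains` with `na=False` treats missing emails as non-matches instead of raising an error.

```python
import pandas as pd

notify_col = 'Would you like us to notify you about future scholarships?'
email_col = 'And your email address?'

# Hypothetical sample rows, including a junk entry and a missing value.
sample = pd.DataFrame({
    notify_col: [1.0, 0.0, 1.0, 1.0],
    email_col: ['a@example.com', 'b@example.com', 'n/a', None],
})

# Step 1: rows where the notify answer is 1.0 (yes).
wants_notice = sample[notify_col] == 1.0

# Step 2: entries containing a literal '@'; missing values count as False.
has_at = sample[email_col].str.contains('@', regex=False, na=False)

# Combine both masks and pull out the email column as a plain list.
vectorized_emails = sample.loc[wants_notice & has_at, email_col].tolist()
print(vectorized_emails)
```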

### Combo Style

Here's an approach that uses pandas Boolean indexing, spread across two separate lines. Personally I find this a bit easier to follow than the method chaining approach below.

Note also that the column labels here are very long. If you're doing a lot of work with a data set, it would probably make sense to rename them to something much shorter, but since we're just doing one quick thing here, we'll stick with the default column labels (which are the questions that were asked in the survey).
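If you did want shorter labels, a minimal sketch with `DataFrame.rename` might look like this. The short names `notify` and `email` are my own placeholders, and the rows are made up.

```python
import pandas as pd

# Made-up rows; only the column labels mirror the survey.
df = pd.DataFrame({
    'Would you like us to notify you about future scholarships?': [1.0, 0.0],
    'And your email address?': ['a@example.com', 'b@example.com'],
})

# Map each long question to a short label (hypothetical names).
short_names = {
    'Would you like us to notify you about future scholarships?': 'notify',
    'And your email address?': 'email',
}

# rename returns a new DataFrame and leaves the original untouched.
renamed = df.rename(columns=short_names)
print(list(renamed.columns))
```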

In [45]:
# use a boolean filter to keep only the rows that have 1.0 (yes) as the answer to being notified
yes_bool = data[data['Would you like us to notify you about future scholarships?'] == 1.0]

# create a pandas series that includes only the email address column from the rows in yes_bool
emails = yes_bool['And your email address?']

# check length of this series to be sure it makes sense
print(len(emails))

# create an empty list for the cleaned emails to be stored 
cleaned_emails = []

# use a regular for loop to iterate through this series
for row in emails:
    # if the row includes the @ symbol, add it to our list of cleaned emails
    if '@' in row:
        cleaned_emails.append(row)

# check length of cleaned emails to be sure that it makes sense
print(len(cleaned_emails))

1548
1541
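Checking for `@` is a fairly loose test. If you wanted something a little stricter, one option is a small helper like the sketch below. The pattern is deliberately simple and is my own assumption about what "looks like an email" means, not a full address validator, and `looks_like_email` and the `candidates` list are made up for illustration. The `isinstance` check also skips missing values, which would otherwise raise a `TypeError` in the `in` test.

```python
import re

# Loose shape check: something@something.something, no whitespace, one '@'.
EMAIL_PATTERN = re.compile(r'[^@\s]+@[^@\s]+\.[^@\s]+')

def looks_like_email(value):
    """Return True if value is a string shaped roughly like an email address."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value.strip()) is not None

# Invented examples, including junk answers and a missing value (NaN).
candidates = ['a@example.com', ' b@example.org ', '@', 'none@', float('nan'), 'no thanks']

# Keep the entries that pass, trimming stray whitespace as we go.
stricter = [c.strip() for c in candidates if looks_like_email(c)]
print(stricter)
```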


### Method Chaining

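Here's how to do the same thing with the first two lines of code from the previous cell collapsed into a single line. Strictly speaking, this is one `.loc` call that takes the row filter and the column label together rather than a chain of method calls, but the effect is the same. The rest of the code is the same, except that I changed the variable names for clarity.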

In [46]:
emails_2 = data.loc[data['Would you like us to notify you about future scholarships?'] == 1.0, 'And your email address?']

print(len(emails_2))

cleaned_emails_2 = []

for row in emails_2:
    if '@' in row:
        cleaned_emails_2.append(row)
    
print(len(cleaned_emails_2))


1548
1541
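Matching lengths are a good sign, but you can also confirm that the two-line and one-line selections return exactly the same Series. Here's a sketch on a made-up DataFrame (`toy` and its rows are invented) using `Series.equals`.

```python
import pandas as pd

notify_col = 'Would you like us to notify you about future scholarships?'
email_col = 'And your email address?'

# Invented rows for illustration only.
toy = pd.DataFrame({
    notify_col: [1.0, 0.0, 1.0],
    email_col: ['a@example.com', 'b@example.com', 'c-at-example.com'],
})

# Two-step version: filter the rows, then pick the column.
two_step = toy[toy[notify_col] == 1.0][email_col]

# One-step version: row mask and column label in a single .loc call.
one_step = toy.loc[toy[notify_col] == 1.0, email_col]

# Series.equals checks that both hold the same values at the same index labels.
same = two_step.equals(one_step)
print(same)
```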


### For Loop Style

Here's how to do the same thing with a plain for loop over `iterrows()`. This isn't the best way to do it, since iterating row by row is generally slower than the filtering approaches above, but here you have it anyway. It works!

In [47]:
emails_3 = []

for label, row in data.iterrows():
    email = row.loc['And your email address?']
    answer = row.loc['Would you like us to notify you about future scholarships?']
    if answer == 1.0:
        emails_3.append(email)
        
print(len(emails_3))

cleaned_emails_3 = []

for row in emails_3:
    if '@' in row:
        cleaned_emails_3.append(row)
    
print(len(cleaned_emails_3))

1548
1541
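If you do want an explicit loop, one lighter-weight option is to `zip` the two columns together rather than calling `iterrows()`, which builds a full Series for every row. This is just a sketch on made-up data (`toy` and `emails_zip` are names I invented), not a benchmark.

```python
import pandas as pd

notify_col = 'Would you like us to notify you about future scholarships?'
email_col = 'And your email address?'

# Invented rows for illustration only.
toy = pd.DataFrame({
    notify_col: [1.0, 0.0, 1.0],
    email_col: ['a@example.com', 'b@example.com', 'c-at-example.com'],
})

# Walk the two columns in parallel and keep emails whose answer is 1.0 (yes).
emails_zip = [
    email
    for answer, email in zip(toy[notify_col], toy[email_col])
    if answer == 1.0
]
print(emails_zip)
```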


## Step 3: Output Cleaned List to CSV

Now that we've got our final list of cleaned emails, let's convert it to a pandas Series and export it as a CSV. Again, you'd normally want to use `em_series.head()` here to double-check that your Series looks right, but I've commented out that line because I don't want to leak any student email addresses.

In [48]:
em_series = pd.Series(cleaned_emails)

# em_series.head()

In [49]:
em_series.to_csv('cleaned-emails-exported.csv')
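One thing to be aware of: by default `to_csv` also writes the index as the first column. If you'd rather have a single column of emails with a readable header, a sketch like the following might be closer to what you want. The `email` header name and the sample addresses are my own placeholders, and it writes to an in-memory buffer so the example doesn't create a file.

```python
import io

import pandas as pd

# Made-up stand-ins for the cleaned list.
cleaned = ['a@example.com', 'b@example.com']

# Naming the Series gives the CSV column a readable header.
named = pd.Series(cleaned, name='email')

# Write to an in-memory buffer here; pass a file path instead in real use.
buffer = io.StringIO()
named.to_csv(buffer, index=False, header=True)
csv_text = buffer.getvalue()
print(csv_text)
```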