# [PUBPOL/ETHSTD C164A] 5. Survey Analysis

**Notebook developed by:** <br>
Team Lead: Skye Pickett  <br>
Module Developers: Leah Hong, Emily Guo, Reynolds Zhang <br>

### Learning Outcome

In this notebook, you will apply the skills you've learned in the previous 4 notebooks and 1 future notebook to clean and conduct exploratory analysis on your own survey data. We will provide guidance on importing your data and setting up basic analysis, but the rest is up to your ideas and your new coding knowledge! Open up previous notebooks for reference and, as always, some resources are provided below.

***

### Helpful Data Science Resources 
Here are some resources you can check out while doing this notebook and to explore data visualization further!
- [DATA 8 Textbook](https://inferentialthinking.com/chapters/06/Tables.html) - Tables chapter
- [DATA 8 Textbook](https://inferentialthinking.com/chapters/07/Visualization.html) - Visualization chapter
- [DATA 8 Textbook](https://inferentialthinking.com/chapters/03/1/Expressions.html) - Expressions and Notebook 1 content
- **[Reference Sheet for the datascience Module](http://data8.org/sp22/python-reference.html)**
- [Documentation for the datascience Modules](http://data8.org/datascience/index.html)
- [Cool Data Visualizations](https://www.tableau.com/learn/articles/best-beautiful-data-visualization-examples)
- [Statista: Find Data on Interesting Topics](https://www.statista.com/)
- [Exploratory Data Analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis)


### Peer Consulting

If you find yourself having trouble with any content in this notebook, Data Peer Consultants are an excellent resource! Click [here](https://dlab.berkeley.edu/training/frontdesk-info) to locate live help.

Peer Consultants are there to answer all data-related questions, whether it be about the content of this notebook, applications of data science in the world, or other data science courses offered at Berkeley.
***

Remember you can change the type of cell from Code cell to Markdown cell to incorporate text in between your code for explanations.

## Importing your data
### Step 1: Obtain and download a .csv file of all responses

<img src="Data/pics/create_csv.png" alt="How to Create CSV File"/>

### Step 2: Open .zip file into .csv

When clicking on the button shown in step 1, it downloads a .zip file. **Click on/open the downloaded file**, <br>`SPRING 2023 - LABOR RAP SURVEY.csv.zip`. This will then produce a file called **`SPRING 2023 - LABOR RAP SURVEY.csv`**. Be sure you have this file, ending in .csv (not .zip) before proceeding.

<img src="Data/pics/zip.png" alt="Convert zip to csv"/>

### Step 3: Open Datahub

Go to the top left corner of this page, where it says `jupyterhub`. Right click and open the link in a new tab. <br>(If you don't do this, it's fine but it will move you away from this notebook so you'd have to re-open it.)

<img src="Data/pics/open_datahub.png" alt="Open DataHub"/>

### Step 4: Locate class repo folder
Locate the folder called "PUBPOL-ETHSTD-C164A-SP23" and open it. 

<img src="Data/pics/repo.png" alt="Open Class Repo"/>

### Step 5: Locate `Data` folder
Locate the folder called "Data" and open it.

<img src="Data/pics/data_folder.png" alt="Open Data Folder"/>

### Step 6: Upload survey csv file in `Data` folder
Click the `Upload` button on the top right, then follow the prompts from the pop-up to locate and open the survey's csv file that you extracted in step 2 or were provided.

<img src="Data/pics/upload.png" alt="Upload button"/>

### Step 7: Check that the csv file is in the `Data` folder.

Check that your csv file is in the folder now. Be sure that it ends with `.csv`, not `.zip` or any other ending!
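
If you would rather confirm this from a code cell, the same check can be sketched with Python's standard `pathlib` module. The helper name `check_survey_file` is hypothetical (it is not part of the course materials), and the sketch assumes this notebook sits in the folder that contains `Data`.

```python
from pathlib import Path

def check_survey_file(folder, filename):
    """Return 'ok', 'still zipped', or 'missing' for the expected csv file."""
    csv_path = Path(folder) / filename
    if csv_path.is_file() and csv_path.suffix == ".csv":
        return "ok"
    # Only the .zip version is in the folder, so it still needs to be opened.
    if (Path(folder) / (filename + ".zip")).is_file():
        return "still zipped"
    return "missing"

print(check_survey_file("Data", "SPRING 2023 - LABOR RAP SURVEY.csv"))
```

A result of `still zipped` suggests returning to step 2, and `missing` suggests repeating the upload in step 6.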

<img src="Data/pics/downloaded_file.png" alt="File in Data folder"/>

### Step 8: Import packages

In [13]:
# Run this cell
import numpy as np
import pandas as pd
import folium
import ipywidgets as widgets
from IPython.display import display, HTML
from otter import Notebook
from datascience import *
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
import seaborn as sns
import otter
grader = otter.Notebook()
print("All necessary packages have been imported!")

All necessary packages have been imported!


### Step 9: Import the file!

In [14]:
survey = Table.read_table("Data/SPRING 2023 - LABOR RAP SURVEY.csv")
survey

Timestamp,At what college or university are you currently enrolled?,Are you a transfer student at a 4-year college?,How long have you been at your current college/university?,Where do you work?,"Altogether, how many paid work experiences have you held during your undergraduate career so far?",How long have you been employed at your current main paid work?,What is your status at your main paid work?,Time spent doing paid work and studies-related work: [How much time do you devote to your education per week?],Time spent doing paid work and studies-related work: [How much time do you devote to paid work per week?],How many types of paid work do you currently do?,Which industry do you work the most hours for?
2023/02/20 3:17:01 PM PST,Public 4-year university (UC),No,2-4 years,On campus,2-3,1-6 months,Employee,25-30 hours,10-15 hours,2,University/On-Campus;Retail
2023/02/21 8:24:50 PM PST,Public 4-year university (UC),Yes,Less than a year,Off campus,1,Less than 1 month,Informal / off the record / under the table,10-15 hours,0-5 hours,1,Construction
2023/02/21 10:01:35 PM PST,Private 4-year university,No,4-6 years,On campus,4-5,1-6 months,Employee,15-20 hours,10-15 hours,1,University/On-Campus
2023/02/22 7:15:23 AM PST,Public 2-year college or community college,No,Less than a year,Off campus,1,6 months-1 year,Self-employed / Independent contractor,30-35 hours,15-20 hours,1,Childcare
2023/02/22 11:49:06 AM PST,Private 4-year university,Yes,1-2 years,Both,2-3,1-3 years,Employee,40+ hours,10-15 hours,2,University/On-Campus;Fast food
2023/02/25 4:36:42 PM PST,Public 4-year university (CSU),No,2-4 years,On campus,2-3,1-6 months,Informal / off the record / under the table,30-35 hours,0-5 hours,1,University/On-Campus
2023/02/26 9:18:07 PM PST,Public 4-year university (UC),Yes,Less than a year,Off campus,1,6 months-1 year,Employee,25-30 hours,10-15 hours,1,Restaurant


If the above cell errors, something went wrong in one of the prior steps. *Be sure that the file is called `SPRING 2023 - LABOR RAP SURVEY.csv` and that the file exists inside of the `Data` folder.*

If you can see the table, proceed.

### Step 10: Understand the columns and rows

Just like we did in previous notebooks, let's look at the columns and the rows using `.labels` and `.take()`. We can also view the type of data using `type()`.

In [15]:
survey.labels # View all the questions in the survey

('Timestamp',
 'At what college or university are you currently enrolled?',
 ' Are you a transfer student at a 4-year college?',
 'How long have you been at your current college/university?',
 'Where do you work?',
 'Altogether, how many paid work experiences have you held during your undergraduate career so far?',
 'How long have you been employed at your current main paid work?',
 'What is your status at your main paid work?',
 'Time spent doing paid work and studies-related work: [How much time do you devote to your education per week?]',
 'Time spent doing paid work and studies-related work: [How much time do you devote to paid work per week?]',
 'How many types of paid work do you currently do?',
 'Which industry do you work the most hours for?')

In [16]:
survey.take(0) # Try other values if you want; .take(0) gives the first row

Timestamp,At what college or university are you currently enrolled?,Are you a transfer student at a 4-year college?,How long have you been at your current college/university?,Where do you work?,"Altogether, how many paid work experiences have you held during your undergraduate career so far?",How long have you been employed at your current main paid work?,What is your status at your main paid work?,Time spent doing paid work and studies-related work: [How much time do you devote to your education per week?],Time spent doing paid work and studies-related work: [How much time do you devote to paid work per week?],How many types of paid work do you currently do?,Which industry do you work the most hours for?
2023/02/20 3:17:01 PM PST,Public 4-year university (UC),No,2-4 years,On campus,2-3,1-6 months,Employee,25-30 hours,10-15 hours,2,University/On-Campus;Retail


Each row represents 1 respondent's answers to the survey.

In [17]:
survey.column("Where do you work?")

array([' On campus', 'Off campus', 'On campus', ' Off campus', 'Both',
       'On campus', 'Off campus'], dtype='<U11')

Using `type()` is helpful to understand the values of different columns.

In [18]:
type(survey.column("Where do you work?")[0])

numpy.str_

Now we know that this column is made up of strings!

Feel free to put in a different column name in place of `"Where do you work?"` to find out information about that column.
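
Because the values are strings, stray spaces count as different answers: the array printed above contains both `'On campus'` and `' On campus'`. Here is a minimal sketch of spotting and removing them with NumPy. It uses a hand-typed copy of that array so it runs on its own, and the variable names are only illustrative.

```python
import numpy as np

# Copy of the array printed above; in the notebook, start from
# survey.column("Where do you work?") instead of typing values by hand.
work_location = np.array([' On campus', 'Off campus', 'On campus',
                          ' Off campus', 'Both', 'On campus', 'Off campus'])

# Before cleaning: values with a leading space are counted separately.
raw_values, raw_counts = np.unique(work_location, return_counts=True)
print(len(raw_values), "distinct raw values")

# np.char.strip removes leading and trailing whitespace from every string.
cleaned = np.char.strip(work_location)
clean_values, clean_counts = np.unique(cleaned, return_counts=True)
for value, count in zip(clean_values, clean_counts):
    print(repr(value), count)
```

Printing with `repr()` keeps the quotes around each value, which makes leading or trailing spaces easy to see.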

For surveys in particular, using `.relabeled` will be particularly helpful since the column names currently include the entire question asked. For example, "Do you have dependents? If so, how many?" can be relabelled to "Dependents" or "num_dependents" or whatever you see fit. 

**Documentation of syntax for `.relabeled` and every function we've covered in previous notebooks can be found on this [reference sheet](http://data8.org/sp22/python-reference.html), also linked in the [Resources section](#Helpful-Data-Science-Resources) at the top of the notebook.**

***
#### Tips/Solutions to Errors:
1. Remember that if you get an error that says **SyntaxError**, this means something was typed incorrectly.
    - You may be missing a parenthesis
    - *I always recommend copy and pasting column names, rather than typing them by memory.*
        - For example, `survey.column("age")` would produce an error because the column is "Age", not "age". (Correct code: `survey.column("Age")`.) Note that this is not a **SyntaxError**: the message will instead say that the column is not in the table. Capitalization and spaces matter, so in my opinion it saves time to copy and paste.
1. **Label your variables in a way that you understand.**
    - If you were to save `survey.column("Age")` to a variable, it would make much more sense to call it `age_values`, `age_array`, `age_column`, `ages`, etc. instead of `array`, `x`, `col2`, etc.
    - This also makes it easier for you (and classmates) to understand when you're working on your notebook over the span of days or weeks, as you might not remember exactly what you meant by `new_table`.
1. Frequently **save your notebook**! This is always good just in case any slip-ups happen.
    - Saving creates a checkpoint, so you can revert to an old version by clicking "File" and "Revert to Checkpoint".
    - You can use Control Z to undo a recent change within cells. You can't un-run a cell, but you can jump back to what your cell was a few seconds before you accidentally deleted the wrong line of code.
    - *Always save your notebook before closing it.*
1. If you get a pop-up that your "**Kernel** is dead" or if you notice your cells aren't running:<br>
   - *First, save what currently exists! (Control+S or File, Save and Checkpoint)*
   - Then, go to "Kernel" in the menu, then click "Restart".
   - Once it's restarted, click "Cell" then click "Run All Above".
   - This will run all the cells above the one you're currently on. This is necessary because otherwise the cell that imports packages and that loads your dataset won't have been run and code cells will error when trying to refer to them.
   - *Every time you re-open the notebook, remember that you'll need to run cells from the top (by the same logic).*
1. Use the resources listed at the top of each notebook if you're having difficulties or feeling stuck!
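
To make copying column names easier, here is a small sketch that searches the labels for a keyword. The helper `find_labels` is hypothetical (it is not part of the `datascience` module), and the labels below are a hand-copied subset of the `survey.labels` output; in your notebook you could pass `survey.labels` directly.

```python
def find_labels(labels, keyword):
    """Return every label that contains keyword, ignoring capitalization."""
    return [label for label in labels if keyword.lower() in label.lower()]

# A few labels copied from the survey.labels output above.
labels = (
    'Timestamp',
    ' Are you a transfer student at a 4-year college?',
    'Where do you work?',
    'How many types of paid work do you currently do?',
)

for label in find_labels(labels, "transfer"):
    # repr() keeps the quotes, so a leading or trailing space is visible.
    print(repr(label))
```

The printed label starts with a space inside the quotes, which is exactly the kind of detail that is easy to miss when typing a column name from memory.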
***

### Step 11: Conduct your analysis!

**Follow techniques in previous notebooks to identify where your dataset needs to be cleaned, how to clean it, how to conduct Exploratory Data Analysis, and how to produce data visualizations.** Create hypotheses and test them by exploring the survey dataset. 
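
As one possible starting point for cleaning, the sketch below handles two patterns visible in the survey table above: hour ranges stored as text (such as `25-30 hours`) and multi-select answers separated by semicolons. The helpers `hours_midpoint` and `split_choices` are hypothetical names, and using the midpoint of a range (or the lower bound for `40+ hours`) is an assumption you may want to change.

```python
import re
from collections import Counter

def hours_midpoint(answer):
    """Turn '25-30 hours' into 27.5 and '40+ hours' into 40.0.

    Assumption: open-ended answers such as '40+ hours' use their lower bound.
    """
    numbers = [float(n) for n in re.findall(r"\d+", answer)]
    if not numbers:
        return None  # the answer had no digits, e.g. an empty response
    return sum(numbers) / len(numbers)

def split_choices(answer):
    """Split a multi-select answer like 'University/On-Campus;Retail'."""
    return [part.strip() for part in answer.split(";") if part.strip()]

# Values copied from the survey table shown in Step 9.
education_hours = ["25-30 hours", "10-15 hours", "40+ hours"]
industries = ["University/On-Campus;Retail", "Construction",
              "University/On-Campus;Fast food"]

print([hours_midpoint(a) for a in education_hours])
industry_counts = Counter(c for a in industries for c in split_choices(a))
print(industry_counts.most_common())
```

With the real table, a helper like `hours_midpoint` can be run on a whole column using `survey.apply(hours_midpoint, ...)` with the column's label in place of the dots, as described on the reference sheet.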

Use the `+` button to add cells. To delete cells, go to "Edit", then "Delete Cells".

Write hypotheses, ideas, and descriptions of your process (this helps your understanding too) by using Markdown cells. Change the type of cell by using the dropdown menu on the top of the page to switch from "Code" to "Markdown". 

In [None]:
# Happy analyzing!

***
# Submitting Your Work

Follow these steps: 
1. Go to `File` in the menu bar, then select `Save and Checkpoint` (or press CTRL+S).
2. Go to `Cell` in the menu bar, then select `Run All`.
3. Click `File` in the menu, find `Download As`, and choose `PDF via LaTeX (.pdf)` to save a copy of this notebook as a pdf onto your computer.
4. Submit the downloaded PDF on bCourses according to your professor's instructions.

**Check the PDF before submitting and make sure all of your answers and any changes are shown.**