![](_fig/labeled.jpg)

# Studio 2: How to Work with Data in Python
In this studio you will learn the basics of what computer code is and how best to think about learning to write code in the Python Programming Language. You will also learn how to test whether two quantitative variables have a statistically significant association.<br>
<br>
Each *Python for Healthcare* studio has six sections:
1. **Objectives**
2. **Readings and Videos**
3. **Discussion**
4. **Analysis**
5. **Conclusion**
6. **Reflection**

Refer to the video below to learn how to use the *Python for Healthcare* studios.

["How to use Py4HC Notebooks"](https://www.youtube.com/watch?v=5fzBGgflXk8&t=4s)  

---

## Objectives
By the end of this course, our goal is for you to learn how to start using the Python Programming Language in health science applications. This includes:

- Understanding how computer code allows humans to talk to computers
- Learning the concrete process for using computer code and Python
- Becoming familiar with the process of writing in Python
- Identifying healthcare questions that can be answered with open source data science tools
- Experiencing how Python can be used to answer questions related to healthcare
- Increasing awareness of how Python can be used in a future career in healthcare

Keep these goals in mind as you go through the studios and the course. If you don't understand every part, that is okay. Use these resources to get familiar with the concepts.


## Videos
Before starting the **Discussion**, watch the following:

["How to Talk to Computers: Part 2"](https://www.youtube.com/watch?v=sw6bWACGybo)   
["What is the Point of Statistical Inference?": Causal Inference Bootcamp](https://www.youtube.com/watch?v=3IOzq0hOttY) 

## Discussion
After completing the readings and videos above, answer the following questions with your team. 

1. There are a handful of different programs you can use to run and write code. What experience have you had with programs like these? What was that experience like for you?
2. In the videos and readings, you can see that Data Science involves both statistics and programming. Which do you feel more comfortable with, and why? What makes each practice easier for you to learn?
3. Statistical tests allow us to provide evidence from data for big, important questions. What are the big questions you would like to answer?

Be sure that everyone answers each question and responds to at least one answer from another team member. 

## Analysis
In each module, you will complete a live data analysis with your team. The goal of the studio is to give you hands-on experience writing Python code and using Data Science tools that are important for health science.<br>
<br>
In each **Analysis** section, there are four steps:
1. Setup Workspace
2. Process Data
3. Create Model
4. Display Results

Within each step of the **Analysis**, a header provides general details about what the lines of code below it do.<br>
<br>
As you complete the **Analysis** component of the studio, follow the video below to understand each step in the code.

["Notebook 2: Multiple Regression"](https://www.youtube.com/watch?v=5fzBGgflXk8&t=4s)

### Step 1: Setup Workspace
In the first step, you will assemble libraries, set your working directory, and import data. You will do this same process before every analysis. This is just like setting up your brushes, canvas, and easel for painting, or laying out your tools and parts and putting your bike in the repair stand.

![](_fig/E1_1_1.jpg)

![](_fig/E1_1_2.jpg)

#### Import Standard Libraries
These libraries are imported for every data science related Python script.
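
The notebook's own import cell is not reproduced here. As a minimal sketch, assuming the standard set is `os`, NumPy, and pandas (a common choice, but an assumption about this studio), the cell could look like this:

```python
# Assumed set of standard libraries; the studio notebook may import others.
import os            # file paths and the working directory

import numpy as np   # numerical arrays and math
import pandas as pd  # tabular data (DataFrames)

# Confirm the libraries loaded by printing their versions.
print("NumPy", np.__version__)
print("pandas", pd.__version__)
```

The short aliases `np` and `pd` are community conventions, so code you find elsewhere will usually use the same names.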

#### Import Specific Libraries
These libraries are used for specific components of the script.

#### Set Working Directory
This is the location of the folder on your device that holds all of your files. Once you set it, any file in that folder can be accessed by its location relative to the folder, rather than by its full path.
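
Setting and checking the working directory might look like the sketch below. The folder is a temporary directory standing in for your real project folder, and the file name `example.csv` is hypothetical.

```python
import os
import tempfile

# Hypothetical project folder: a temporary directory stands in for the
# real folder on your device that holds your studio files.
project_dir = tempfile.mkdtemp()

os.chdir(project_dir)      # set the working directory
working_dir = os.getcwd()  # ask Python where it is now
print(working_dir)

# A relative path is resolved against the working directory.
relative_path = os.path.join("_data", "example.csv")  # hypothetical file
print(os.path.abspath(relative_path))
```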

#### Import Data
In order to use data, you will often import it from a `.csv` file located in your working directory and save it under a useful name. After importing, you can use `.info()` and `.head()` to quickly view the data.
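
A minimal sketch of this step follows. The data and the name `df_vitals` are hypothetical, and an in-memory text buffer stands in for a real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical data standing in for a ".csv" file in your working directory.
# With a real file, pass its relative path instead: pd.read_csv("my_file.csv")
csv_text = io.StringIO(
    "patient_id,age,systolic_bp\n"
    "1,34,118\n"
    "2,51,131\n"
    "3,47,126\n"
)

df_vitals = pd.read_csv(csv_text)  # save the data under a useful name

df_vitals.info()         # column names, data types, and non-missing counts
print(df_vitals.head())  # first five rows (here, all three)
```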

### Step 2: Process Data
In the second step, you will modify the data frames that you imported in order to get them ready for whatever model you wish to create. You will need to make sure the data is correctly subset or joined, uses the correct shape and type, and has missing values resolved. This is just like kneading clay to remove air bubbles or cleaning the bike before you install new parts. 

![](_fig/E2_2_1.jpg)

#### Join DataFrames
This will join two pandas DataFrames along a column they both share with an identical name and data type. Common options:<br>
`how = "inner"` Keeps only rows whose key value appears in both DataFrames<br>
`how = "outer"` Keeps every row from both DataFrames<br>
`how = "left"` Keeps every row from the first (left) DataFrame and attaches any matching rows from the second (right) DataFrame, including duplicates<br>
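
The three options can be compared on a small example. The tables and column names below are hypothetical; note that `"C"` appears twice in the second table, so it produces two joined rows.

```python
import pandas as pd

# Hypothetical tables that share the column "county".
df_left = pd.DataFrame({"county": ["A", "B", "C"], "population": [100, 200, 300]})
df_right = pd.DataFrame({"county": ["B", "C", "C", "D"], "clinics": [2, 5, 1, 4]})

df_inner = pd.merge(df_left, df_right, on="county", how="inner")
df_outer = pd.merge(df_left, df_right, on="county", how="outer")
df_left_join = pd.merge(df_left, df_right, on="county", how="left")

print(len(df_inner))      # 3 rows: B, C, C
print(len(df_outer))      # 5 rows: A, B, C, C, D
print(len(df_left_join))  # 4 rows: A, B, C, C
```

In the left join, county `"A"` has no match in the second table, so its `clinics` value is missing (`NaN`).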

#### Filter DataFrame
This will remove all columns except those specified.
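
As a sketch with hypothetical column names, keeping only two columns could look like this:

```python
import pandas as pd

# Hypothetical DataFrame with more columns than the analysis needs.
df_raw = pd.DataFrame({
    "county": ["A", "B"],
    "population": [100, 200],
    "clinics": [2, 5],
    "notes": ["x", "y"],
})

# Keep only the listed columns; every other column is removed.
df_filtered = df_raw.filter(items=["county", "population"])
print(list(df_filtered.columns))  # ['county', 'population']
```

Selecting with double brackets, `df_raw[["county", "population"]]`, gives the same result.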

#### Rename Columns
This will rename the specified columns and leave all other columns unchanged.
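
A short sketch, using hypothetical column names, maps each old name to its new name:

```python
import pandas as pd

# Hypothetical DataFrame with terse column names.
df_raw = pd.DataFrame({
    "county": ["A", "B"],
    "pop_2020": [100, 200],
    "n_clinics": [2, 5],
})

# Map old names to new names; "county" is not listed, so it is unchanged.
df_renamed = df_raw.rename(columns={"pop_2020": "population", "n_clinics": "clinics"})
print(list(df_renamed.columns))  # ['county', 'population', 'clinics']
```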

#### Drop NA Values
This will drop all rows that have a missing value.
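
For illustration, the hypothetical table below has a missing value in two of its three rows, so only one row survives:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing values (NaN) in two rows.
df_gaps = pd.DataFrame({
    "county": ["A", "B", "C"],
    "population": [100, np.nan, 300],
    "clinics": [2, 5, np.nan],
})

# Drop every row that contains at least one missing value.
df_complete = df_gaps.dropna()
print(len(df_complete))  # 1 (only county "A" is complete)
```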

#### Verify
At the end of each step, use `.info()` and `.head()` to quickly view the data.

### Step 3: Create Model
In this step you will create a model that provides meaningful information about the data. This can include a statistical test, a machine learning algorithm, or a neural network. This is similar to drawing a still life or writing a poem. 


![](_fig/E2_3_1.jpg)

#### Multiple Linear Regression
This statistical test creates a model that shows the effect of multiple quantitative predictors on a quantitative outcome using "Ordinary Least Squares" regression. Individual predictors can be compared using the coefficients and the p-values. The overall model can be evaluated with an adjusted R-squared and the F-statistic. 
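
To show what Ordinary Least Squares computes, here is a minimal sketch that uses only NumPy and pandas on simulated, hypothetical data. It is not the notebook's own code. It calculates the coefficients, the adjusted R-squared, and the F-statistic by hand; p-values are left out because they require a probability distribution, and regression libraries such as statsmodels report them alongside these values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two quantitative predictors and one quantitative outcome.
rng = np.random.default_rng(0)
n = 50
df_model = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "bmi": rng.uniform(18, 35, n),
})
df_model["systolic_bp"] = (
    90 + 0.5 * df_model["age"] + 0.8 * df_model["bmi"] + rng.normal(0, 5, n)
)

predictors = ["age", "bmi"]
y = df_model["systolic_bp"].to_numpy()
# Design matrix: a column of ones (intercept) plus one column per predictor.
X = np.column_stack([np.ones(n), df_model[predictors].to_numpy()])

# Ordinary Least Squares: coefficients that minimize the sum of squared residuals.
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ coefficients
ss_residual = float(np.sum(residuals ** 2))
ss_total = float(np.sum((y - y.mean()) ** 2))
k = len(predictors)

r_squared = 1 - ss_residual / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_statistic = ((ss_total - ss_residual) / k) / (ss_residual / (n - k - 1))

print(dict(zip(["intercept"] + predictors, coefficients.round(3))))
print(round(adjusted_r_squared, 3), round(f_statistic, 1))
```

Each coefficient estimates how much the outcome changes for a one-unit increase in that predictor while the other predictor is held constant.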

### Step 4: Display Results
In this step you will create an informative visual that displays the results for others to see. This is similar to framing your finished painting or posting an image to social media with an informative caption. 

![](_fig/E2_4_1.jpg)

#### Scatterplot
This displays two quantitative variables with dots corresponding to each observation. This is commonly used for displaying correlation tests. 
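
A minimal sketch with matplotlib follows, using hypothetical values. The `Agg` backend line lets the code run without opening a window; in a notebook it can be omitted, and the figure will appear below the cell.

```python
import matplotlib
matplotlib.use("Agg")  # draw without opening a window; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two quantitative variables, one row per observation.
df_plot = pd.DataFrame({
    "age": [25, 34, 47, 51, 63, 70],
    "systolic_bp": [112, 118, 126, 131, 138, 145],
})

fig, ax = plt.subplots()
ax.scatter(df_plot["age"], df_plot["systolic_bp"])  # one dot per observation
ax.set_xlabel("Age (years)")
ax.set_ylabel("Systolic blood pressure (mmHg)")
ax.set_title("Systolic blood pressure by age (hypothetical data)")
```

Labeling both axes with the variable name and its units makes the plot readable without the surrounding code.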

## Conclusion
In your own words write the following:

1. This topic is important because...

Identify two peer reviewed scientific articles that have findings related to your first statement. Then write the following:

2. Other studies have found that...

Using the results of the analysis above, craft a simple conclusion with the following items:

3. It was hypothesized that...
4. Data was collected from...
5. The study found that...
6. This provides evidence that...

After writing each of these statements, assemble them into a paragraph in the shown order. Then do the following:

- Edit the paragraph to be coherent
- Provide a simple title
- Add references in an appropriate style

Now you have an abstract!

## Reflection
After completing the studio session, please pick one or two of the following questions to discuss with your team. 

1. What was something new that you learned in this module?
2. What was something that you knew previously but heard differently in this module?
3. What was something that you understand better after this module?
4. What was something that is still confusing after this module?

Each team member may select a different question, but be sure that everyone provides a reflection and responds to another's reflection.

---
After you have completed the studio, print the page as a PDF and save it on your local computer. 