# Data Analysis in Python!
This notebook introduces the data analysis process in Python as an additional tool to use alongside spreadsheets.

Our goal is not perfect mastery, but to get you started and to learn how to work with and edit code "snippets" so that they work in your particular application.

# What is this?
For some more "formal" terminology, this is a "Jupyter notebook" with blocks of code called cells. You can press shift+ENTER to run a cell and go on to the next one. You can also edit the code and run it again to see how the output changes.

# Importing Python Data Packages
To start with, we are going to import some commonly used mathematical "packages" that contain resources we'll want to use in our analysis.  Note that in some programming languages "packages" are referred to as "libraries."  More details are in the block of code below...


In [None]:
# Always start by importing the analysis packages we'll be using
# Notice that any line that starts with a " # " symbol is treated as a "comment" in Python, meaning it's for our reference and is not treated as code.

import pandas as pd  # Helps with organizing and formatting data.  The "pd" is a shorthand name we can use to refer to this package
import numpy as np   # "Num Py" is a package that contains common mathematical and statistical functions
import matplotlib as mpl # "Mat Plot Lib" is a package that helps with plotting and graphing data
import matplotlib.pyplot as plt  #  "Py Plot" is a sub-module of Mat Plot Lib that particularly helps with plotting and graphing data

# Also, don't forget to "run" this block of code to actually import the packages
# To "Run" a code block, make sure the cursor is in it and then press "shift" and "enter" at the same time
# You can also choose an option from the "Runtime" menu.  When running this block, it loads the packages in the background.
# To help confirm the cell has run properly, the line below prints a message as output...
print("Packages Imported!")

# Entering the Data
We need to bring data into the Python notebook.  To do this, we need to create a "Pandas DataFrame" to store our data.

There are many ways to import data, but we're going to start with manually entering it here.  Pandas DataFrames work best with data organized in "rows" and labeled "columns".
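
As a minimal sketch of what such a table looks like in code, the example below builds a small DataFrame from a Python dictionary. The column names (`mass`, `stretch`) and the values are made up purely for illustration and are not part of this activity's data.

```python
import pandas as pd

# Hypothetical example data (made up for illustration only)
example = pd.DataFrame(
    {"mass": [0.10, 0.20, 0.30],        # each dictionary key becomes a column label
     "stretch": [0.021, 0.040, 0.062]   # each list holds that column's values, one per row
    })

print(example.shape)           # (number of rows, number of columns)
print(list(example.columns))   # the column labels
print(example["mass"][0])      # first value in the "mass" column (rows are numbered from 0)
```

Each list must have the same number of entries, since every row needs a value in every column.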

# Use the Data Set below

The code cell after the table starts the data entry, but you should finish it so that all of the data is present.  The full data set is shown in the table below...

|Radius (m)| Force (N)|
|:--------:|:--------:|
|0.5|60.0|
|1.00|31.0|
|2.00|15.2|
|3.00|9.9|
|4.00|7.5|
|5.00|6.1|

In [None]:
data_3 = pd.DataFrame(
    { "radius": [0.5, 1.00, 2.00, __________ ],
     "force": [60.0, 31.0, 15.2, ___________ ]
     })

# To check to see if the data was entered correctly, you can use the "head" function to print a few lines
data_3.head(2)
# Use "shift and enter" to run the code and see the output

# Graphing the Data
Let's make our first "test plot" here.

Below is a chunk of code that creates a scatter plot using the matplotlib pyplot library.  Take a look at each line and see if you can decipher what each one does.

In [None]:
xMin = 0
yMin = 0
plt.scatter(x = data_3["radius"], y = data_3["force"])
plt.axis(xmin = xMin, ymin = yMin)
plt.title("Force vs Radius")
plt.xlabel("Radius (m)")
plt.ylabel("Force (N)")
plt.show()

# Manipulating Data in a DataFrame
As you can see, the graph of Force vs Radius is not linear.

In your spreadsheet, you hopefully took the inverse of the radius to achieve a linear result.  Here we'll do the same thing to our data, but in Python.
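
To see why taking the inverse can help, suppose the force is inversely proportional to the radius with some constant $k$. This is an assumed model, and measured data will only follow it approximately:

$$F = \frac{k}{r} = k \left( \frac{1}{r} \right)$$

Under that assumption, plotting $F$ against $1/r$ instead of $r$ should give a straight line through the origin with slope $k$.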

Nicely, the Pandas DataFrame allows us to do math operations on data quickly and easily.  In other programming languages, a "For Loop" structure would be needed for this (reminiscent of needing to "drag" the small box down to "apply" the calculation to each data point in the spreadsheet), but Pandas will do the math operation on ALL of the data in a column for us.
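
The sketch below compares the two approaches on a small, made-up column (the name `values` and its numbers are hypothetical): a loop that handles one value at a time, and a single Pandas expression that handles the whole column at once.

```python
import pandas as pd

# Hypothetical column of numbers (made up for illustration only)
values = pd.Series([1.0, 2.0, 4.0, 8.0])

# Loop approach: calculate the inverse of each value, one at a time
loop_result = []
for v in values:
    loop_result.append(1 / v)

# Pandas approach: one expression is applied to every value in the column
pandas_result = 1 / values

# Both approaches give the same numbers
print(loop_result == pandas_result.tolist())  # True
```

The Pandas version is shorter, and it reads much like the formula you would type into a spreadsheet cell.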

In [None]:
data_3["inverseR"] = 1 / data_3["radius"]
data_3.head(5)

In [None]:
# Plot the data again to see the result.  Copy and Paste are your friends here...
xMin = 0
yMin = 0
plt.scatter(x = data_3["__________"], y = data_3["______________"]) # Fill in the blank spaces here
plt.axis(xmin = xMin, ymin = yMin)
plt.title("__________")
plt.xlabel("_________")
plt.ylabel("_________")
plt.show()

# Best Fit Line and Equation
As a final result, we should add a "fit" line and have Python determine the equation.  This process is more involved than on a spreadsheet, unfortunately.  To help, there's a sample setup (with different data) showing how to do this below.

As before, use "Copy and Paste", but then manipulate the code so it creates a best fit line and equation for YOUR data.

In [None]:
# Using the Best Fit Sample Code below, Make a Plot of your Linearized Graph for Data Set 3 that has a Best Fit Line.


# Best Fit Sample Code
The block of code below shows how to create a best fit line, showing it on a graph and also printing the resulting equation.
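
As a minimal sketch of just the fitting step, the example below uses made-up points that lie exactly on the line y = 2x + 1 (the names `x` and `y` and the values are hypothetical), so the slope and intercept returned by `np.polyfit` can be checked by eye.

```python
import numpy as np

# Hypothetical data (made up for illustration): points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# The final argument, 1, asks for a degree-1 polynomial (a straight line).
# np.polyfit returns the coefficients from highest power to lowest: slope, then intercept.
slope, intercept = np.polyfit(x, y, 1)

# The slope should be very close to 2 and the intercept very close to 1
print("y = %.3fx + %.3f" % (slope, intercept))
```

With real measurements the points will not sit exactly on a line, so the fitted slope and intercept are the values that best match the data overall.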

In [None]:
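# The code below graphs a linear best fit line for a sample data set.  It also prints the resulting linear equation.

# NOTE: "sampleData" is not defined anywhere else in this notebook, so the placeholder values
# below (made up purely for illustration) are assumed here so that this cell can run on its own.
sampleData = pd.DataFrame(
    { "sampleX": [1.0, 2.0, 3.0, 4.0, 5.0],
     "sampleY": [2.1, 3.9, 6.2, 7.8, 10.1]
     })

xMin = 0
yMin = 0
xMax = np.max(sampleData["sampleX"])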

plt.scatter(x = sampleData["sampleX"], y = sampleData["sampleY"])
plt.axis(xmin = xMin, ymin = yMin)
plt.title("Sample Plot")
plt.xlabel("Sample X")
plt.ylabel("Sample Y")

# Code to create best fit line is here
slope, intercept = np.polyfit(sampleData["sampleX"], sampleData["sampleY"], 1)
xValues = np.linspace(xMin, xMax, 200) # Creates a set of 200 evenly spaced "x" values, from xMin to xMax
plt.plot(xValues, slope*xValues + intercept, color = "r") # Plots the x values and calculates the "y" values based on the best fit line result
plt.show()

print("y=%.6fx%+.6f"%(slope, intercept)) # Code to print the resulting best fit equation.  Should show below the graph.  The "%+" always shows the sign of the intercept

# Analyze Another Data Set Next!
Essentially, repeat the process you just completed, but now for a different Data Set, which we'll call Data Set 4 here.

Some code blocks are set for you, but feel free to add more as needed.
Do your best to add some comments from time to time too to help.

Remember that "Copy and Paste" are your friends here, but you'll need to adjust the details of the code to make it work with this next data set.

The data for Data Set 4 can be seen below...

|Length (m)| Time (s)|
|:--------:|:--------:|
|0.10|0.63|
|0.20|0.90|
|0.30|1.10|
|0.50|1.41|
|0.60|1.54|
|0.80|1.79|

In [None]:
# Create a dataframe called "data_4" that contains the "length" and "time" data within Data Set 4


In [None]:
# Graph Time vs Length and observe what the shape of the graph is


In [None]:
# If not linear, create a new column in the data_4 dataframe that manipulates the Time or Length values to attempt to achieve a proportional relationship
# As a side note, if you find the need to "raise" a column to a power, use the " ** " operator.  i.e., x squared in Python is:  x**2


In [None]:
# Make the next Test Graph, checking for a proportional result


In [None]:
# If a proportional result has been reached, make a graph of your Linearized Graph for Data Set 4 that has a Best Fit Line.
# Be sure to display the equation of the best fit line too




---
# Yet Another Data Set!
Code blocks are in place below for analysis of another data set.

We'll call this Data Set 5 and the data values are...

|Distance (cm)| Electric Force (N)|
|:--------:|:--------:|
|0.10|500|
|0.20|125|
|0.30|56.0|
|0.50|20.2|
|0.70|10.2|
|0.80|7.8|


In [None]:
# Create a dataframe called "data_5" that contains the "Distance" and "EForce" data within Data Set 5

In [None]:
# Graph Electric Force vs Distance and observe what the shape of the graph is

In [None]:
# If not linear, create a new column in the data_5 dataframe that manipulates the Distance or Electric Force values to attempt to achieve a proportional relationship

In [None]:
# Make the next Test Graph, checking for a proportional result

In [None]:
# This is just an extra code block, in case you need it :)  
# You can always add additional code blocks by pressing the "+ Code" button at the top left of the screen

In [None]:
# If a proportional result has been reached, make a graph of your Linearized Graph for Data Set 5 that has a Best Fit Line.
# Be sure to display the equation of the best fit line too