If you're running this in Google Colab, you can click "Copy to Drive" (above &#8593;) or go to **File > Save a Copy in Drive** so you'll have your own version to work on. That requires a Google login.  
<hr/>

# Lab 1: Analysis Practice  
This Jupyter notebook is a template for analyzing data from a file using Python. If you need to start over from scratch, open a [clean copy of this activity](https://colab.research.google.com/github/adamlamee/UCF_labs/blob/main/analysis_practice_a.ipynb). If you need a refresher on how to execute this notebook, try the [intro activity](https://colab.research.google.com/github/adamlamee/UCF_labs/blob/main/intro.ipynb).  

## Step 1: Import modules (aka libraries of functions) needed for the analysis

In [None]:
# first, run this to import the python modules needed for the analysis
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

## Step 2: Upload a data file
If you'd rather read in data from a shared Google Sheet, see the [other version of this notebook](https://github.com/adamlamee/UCF_labs/blob/main/README.md).  

The uploaded file only lasts for the duration of running this notebook, so you'll need to upload the file again if you close the notebook and come back to it later.

- Click the folder icon in the left menu. It's below the {x}.
- Click the upload icon at the top of that window.
- Wait until you see the filename appear in the directory on the left.
- Then right-click on the filename, or click the 3 vertical dots at the end of the filename, and choose "copy path".
- Paste that path as the URL in the pd.read_excel function below (or pd.read_csv for a .csv file) with *single quotes* around it.  
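
If reading the file fails, it can help to confirm that the path you pasted really points to a file. The sketch below is one way to check. The helper name `check_upload` and the path `'my_data.xlsx'` are made up for illustration, so swap in the path you copied.

```python
from pathlib import Path

def check_upload(path_text):
    """Print a short report and return True if a file exists at path_text."""
    file_path = Path(path_text)
    if file_path.is_file():
        print(f"Found {file_path.name} ({file_path.stat().st_size} bytes)")
        return True
    print(f"No file at {file_path}")
    return False

# hypothetical path: replace it with the one you copied from the file browser
found = check_upload('my_data.xlsx')
```

If this reports that no file was found, try uploading again or re-copying the path.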

In [None]:
# pd.read_excel reads a .xlsx file; use pd.read_csv instead if you uploaded a .csv file (reading .xls files requires more work than this)
data = pd.read_excel('paste_the_URL_to_your_file_here', skiprows=0)

# the .head() command gives a preview of the first n rows of the table
data.head(3)

How does the preview of the data table look? It should show column headings along with a few rows of numerical values. If the column headings are farther down, adjust the *skiprows* parameter when you read in the file until it looks right. You can read the [pandas read_csv( ) page](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) or [pandas read_excel( ) page](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) for more info.
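
Here's a small sketch of what *skiprows* does. The file contents are made up and read from a string (using `io.StringIO`) rather than from an uploaded file, so the column names and values are only placeholders.

```python
import io
import pandas as pd

# made-up file contents: two lines of notes, then the real column headings
raw_text = """Sensor log,Run 7
recorded in lab,room 12
time,voltage
0.0,1.2
0.5,1.4
1.0,1.9
"""

# skiprows=0 treats the first line of notes as the column headings
wrong = pd.read_csv(io.StringIO(raw_text), skiprows=0)
print(wrong.columns.tolist())

# skiprows=2 skips the two note lines, so the real headings are used
fixed = pd.read_csv(io.StringIO(raw_text), skiprows=2)
print(fixed.columns.tolist())
print(fixed.head(3))
```

The same *skiprows* parameter works with pd.read_excel.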

## Step 3: Explore the data  
Run the code in the cells below to get an idea of what your data set looks like.

In [None]:
# shows the number of rows, columns in the data table
data.shape

In [None]:
# View the column headings. You'll need to reference column names exactly when doing math and plotting
data.columns

## Step 4: Make some plots
A histogram plots a single column to show the range of those values and how they're distributed. Run the code below and try a few different values for the number of bins. See the [pyplot.hist( ) page](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) for more ways to customize your plot.
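
To get a feel for what the number of bins changes, the sketch below counts the same values using a few different bin settings. It uses `np.histogram`, which is the function `plt.hist` relies on to do its binning. The values are randomly generated stand-ins, not real lab data.

```python
import numpy as np

# made-up data: 1000 values scattered around 50
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1000)

for n_bins in (5, 10, 40):
    # counts = how many values fall in each bin, edges = the bin boundaries
    counts, edges = np.histogram(values, bins=n_bins)
    width = edges[1] - edges[0]
    print(f"bins={n_bins:>2}  bin width={width:6.2f}  tallest bin holds {counts.max()} values")
```

Fewer bins give wide bars that can hide detail, while many bins give narrow bars that can look noisy.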

In [None]:
# make a histogram with one column
plt.hist(data['a column heading'], bins=10, histtype='step')
plt.title("here's a title!")
plt.xlabel("label me")
plt.ylabel("frequency")
plt.show()

A scatterplot visualizes the relationship between *two* columns. Run the code below and try adjusting some of the parameters. See the [pyplot.scatter( ) page](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) for more ways to customize your plot.

In [None]:
# makes a scatterplot
plt.scatter(data['a column heading'], data['another column heading'], s=.5, c='purple')
plt.title("here's a title!")
plt.xlabel("label me")
plt.ylabel("so lonely")
plt.show()

## Step 5: Cropping the data file
Sometimes you'll want to analyze or visualize just some portion of the data. You can use the code below to crop the original data and make a copy (so you don't alter the original data set).
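
Before trying this on your own data, here's a toy sketch of both cropping approaches used in this step. The table and its column names (`x`, `y`) are made up. Note that each comparison needs its own parentheses because `&` is evaluated before `>` and `<`.

```python
import pandas as pd

# a tiny made-up table
toy = pd.DataFrame({'x': [1, 4, 6, 9, 12], 'y': [10, 20, 30, 40, 50]})

# crop by value: keep rows where x is between 3 and 10
mask = (toy['x'] > 3) & (toy['x'] < 10)
by_value = toy.loc[mask].copy()
print(by_value)

# crop by position: keep positions 1 and 2 (counting starts at 0, the end is excluded)
by_position = toy.iloc[1:3].copy()
print(by_position)

# because of .copy(), changing the cropped table leaves the original alone
by_value['y'] = 0
print(toy['y'].tolist())
```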

In [None]:
# crop the data based on some criteria
cropped = data.loc[(data['column heading'] > 3) & (data['column heading'] < 10)].copy()

cropped.shape   # see how big the cropped data set is

In [None]:
# if you know the range of rows you'd like, try using iloc instead (integer locate)
# this keeps the rows at positions 2 through 99 (counting starts at 0 and the end value is excluded)
cropped = data.iloc[2:100].copy()

cropped.shape   # see how big the cropped data set is

After cropping the data set, you'll probably want to plot your cropped data to see if you're satisfied. Notice how the code below references *cropped* instead of the original *data*.

In [None]:
# plotting the cropped data
plt.scatter(cropped['column heading'], cropped['another column heading'], s=.5, c='purple')
plt.show()

## Step 6: Doing math with your data
You can do math with your data much as you would in a spreadsheet. The code below shows some common tasks that may be useful in analyzing your data.
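
As a quick illustration, the sketch below builds a tiny made-up table and calculates a new column from two others. The column names are placeholders. The math is applied to every row at once, like filling a formula down a spreadsheet column.

```python
import pandas as pd

# a tiny made-up table
toy = pd.DataFrame({'distance_m': [10.0, 20.0, 45.0],
                    'time_s': [2.0, 4.0, 5.0]})

# divide one column by another, row by row, and store the result in a new column
toy['speed_m_per_s'] = toy['distance_m'] / toy['time_s']
print(toy)
```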

In [None]:
# add a new column and fill it with values calculated using other columns
data['new column'] = 2 * data['one column'] + data['another column']

# shows the top of the data table again
data.head()

And Python can do statistics.
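
Two details are worth knowing before you run the cells below, and this sketch with made-up values shows both. First, `.count()` skips missing values (shown as NaN). Second, `.std()` in pandas gives the *sample* standard deviation (dividing by n - 1) unless you pass `ddof=0`.

```python
import numpy as np
import pandas as pd

# made-up values, including one missing entry
toy = pd.Series([2.0, 4.0, 4.0, np.nan, 6.0])

n = toy.count()               # counts only the non-missing values
mean = toy.mean()             # missing values are skipped here too
std_sample = toy.std()        # sample standard deviation (ddof=1, the pandas default)
std_pop = toy.std(ddof=0)     # population standard deviation

print(n, mean, std_sample, std_pop)
```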

In [None]:
# how many (non-missing) values in a column?
n_1 = data['column heading'].count()
n_1

In [None]:
# and the average
mean_1 = data['column heading'].mean()
mean_1

In [None]:
# and standard deviation of the values in a column (pandas uses the sample standard deviation by default)
std_1 = data['column heading'].std()
std_1

<hr/>  

# Credits
This notebook was written by [Adam LaMee](http://www.adamlamee.com). Thanks to the great folks at [Binder](https://mybinder.org/) and [Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) for making this notebook interactive without you needing to download it or install [Jupyter](https://jupyter.org/) on your own device.