If you're running this in Google Colab, you can click "Copy to Drive" (above &#8593;) or go to **File > Save a Copy in Drive** so you'll have your own version to work on. That requires a Google login.  
<hr/>

# Analyzing data from a file  
This Jupyter notebook is a template for analyzing data from a file using Python. If you need to start over from scratch, open a [clean copy of this activity](https://colab.research.google.com/github/adamlamee/UCF_labs/blob/main/analysis_practice.ipynb). If you need a refresher on how to execute this notebook, try the [intro activity](https://colab.research.google.com/github/adamlamee/UCF_labs/blob/main/intro.ipynb).  

## Step 1: Import modules (aka libraries of functions) needed for the analysis

In [None]:
# first, import the python modules needed for the analysis
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

## Step 2: Read in the data file
You have a few options for reading in a data file. Two convenient ones are (1) upload it here or (2) put a copy in Google Drive, then use the sharing link to open it here.

### Option A: Upload a file directly to this Colab notebook  
An uploaded file only lasts as long as this notebook session. You'll need to upload it again if you close the notebook and come back to it later.

- In Colab, click the folder icon in the left menu. It's below the {x}.
- Click the upload icon at the top of that window.
- Wait until you see the filename appear in the directory on the left.
- Right-click on the filename, or click the 3 vertical dots at the end of the filename, and choose "Copy path".
- Paste that path in place of 'URL_to_file' in the pd.read_csv function below with *single quotes* around it.  
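
As a rough sketch of what the last step looks like once a path is pasted in, the cell below writes a tiny stand-in file and reads it back. The filename `example_data.csv` and its contents are made up for illustration; in Colab you would skip the file-writing part and paste the path you copied.

```python
import os
import tempfile
import pandas as pd

# Stand-in for an uploaded file: write a tiny CSV to a temporary folder.
# (Hypothetical filename and values, only for illustration.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example_data.csv')
with open(path, 'w') as f:
    f.write('time,distance\n0,0.0\n1,4.9\n2,19.6\n')

# The path is a string, so it goes inside quotes when pasted directly
example = pd.read_csv(path)
print(example)
```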

### Option B: Use a shared link from a Google Sheet  
This is good for collaboration since it works for anyone with the link to the shared data file.  

- Save the .csv or spreadsheet data file to Google Drive.  
- Open the file in Google Sheets (right-click for options). Oddly, this won't work for the original file, only for the file saved as a Sheet.
- Get the view-only link to the file. It'll look something like this:  
  https://docs.google.com/blah-blah/edit?usp=sharing  
- Replace the last part with "export", like this:  
  https://docs.google.com/blah-blah/export  
- Use the pd.read_excel function to open it (not read_csv):  
  pd.read_excel('https://docs.google.com/blah-blah/export')
- Paste that link in place of 'sharing_link' in the pd.read_excel function below with *single quotes* around it.  
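
If you'd rather not edit the link by hand, the swap described above can be sketched in a few lines. The helper name `to_export_url` is made up for this example, and it assumes the sharing link ends with a single final part such as `edit?usp=sharing`, as in the placeholder link above.

```python
def to_export_url(sharing_link):
    """Swap the last part of a Google Sheets sharing link for 'export'."""
    # keep everything before the final '/', drop the rest (e.g. 'edit?usp=sharing')
    base, _, _ = sharing_link.rpartition('/')
    return base + '/export'

# placeholder link from the steps above, not a real file
sharing_link = 'https://docs.google.com/blah-blah/edit?usp=sharing'
export_link = to_export_url(sharing_link)
print(export_link)  # https://docs.google.com/blah-blah/export
```

The resulting string is what you would pass to pd.read_excel.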

In [None]:
# for option A above: use pd.read_csv if you're uploading a csv file
# or use pd.read_excel if you uploaded a .xlsx file (reading .xls files requires more work than this)
data = pd.read_csv('URL_to_file')

In [None]:
# for option B above: use pd.read_excel if you're opening a Google Sheet
data = pd.read_excel('sharing_link')

## Step 3: Explore the data  
These are some examples of ways to look at your data.

In [None]:
# shows the number of rows, columns in the data table
data.shape

In [None]:
# Check to see what the data file looks like. The .head(n) command displays the first n rows of the table (here, n = 3).
data.head(3)

In [None]:
# View the column headings. You'll need to reference column names exactly when doing math and plotting
data.columns

In [None]:
# make a scatter plot with two columns
plt.scatter(data['column 1'], data['column 2'], s=.5, c='purple')
plt.show()

In [None]:
# make a histogram with one column
plt.hist(data['column 1'], bins=10, histtype='step')
plt.show()

The official documentation pages for these functions describe more options for customizing them.  
- [pyplot.scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html)
- [pyplot.hist](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html)
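
As one possible illustration of a few of those options, the sketch below uses made-up random data (not your data file) to add transparency, axis labels, and a title to a scatter plot. It also shows that plt.hist returns the bin counts and bin edges, which can be handy for later calculations.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data, only for illustration
rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=2, size=200)
y = 2 * x + rng.normal(scale=1, size=200)

# scatter plot with larger, semi-transparent markers and labeled axes
plt.scatter(x, y, s=5, c='purple', alpha=0.5)
plt.xlabel('x (made-up units)')
plt.ylabel('y (made-up units)')
plt.title('Example scatter plot')
plt.show()

# plt.hist returns the counts in each bin and the edges of the bins
counts, edges, _ = plt.hist(x, bins=20, histtype='step', color='purple')
plt.xlabel('x (made-up units)')
plt.ylabel('count')
plt.show()
```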

In [None]:
# add a new column and fill it with values calculated using other columns
data['new column'] = 2 * data['column 1'] + data['column 2']

# shows the top of the data table again
data.head()

Sometimes it's helpful to crop the data set to only include a particular section. 
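
To see how cropping works before trying it on your own data, here is a small sketch with a made-up table. A condition on a column produces one True/False value per row, and .loc keeps the rows marked True. By contrast, .iloc selects rows by position, counting from 0 and leaving out the end of the range.

```python
import pandas as pd

# made-up table, only for illustration
example = pd.DataFrame({'column 1': [1, 4, 7, 12, 9],
                        'column 2': [10, 20, 30, 40, 50]})

# a condition on a column gives one True/False value per row
mask = (example['column 1'] > 3) & (example['column 1'] < 10)
print(mask.tolist())  # [False, True, True, False, True]

# .loc keeps only the rows where the mask is True
by_value = example.loc[mask].copy()
print(by_value)

# .iloc slices by position: 1:3 keeps positions 1 and 2, not 3
by_position = example.iloc[1:3].copy()
print(by_position)
```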

In [None]:
# crop the data based on some criteria
cropped = data.loc[(data['column 1'] > 3) & (data['column 1'] < 10)].copy()

In [None]:
# if you know the range of rows you'd like, try using iloc instead (integer locate)
# positions start at 0 and the end of the range is left out, so this keeps positions 2 through 99
cropped = data.iloc[2:100].copy()
cropped

In [None]:
# then look at the cropped data set to see if you're satisfied
plt.scatter(cropped['column 1'], cropped['column 2'], s=.5, c='purple')
plt.show()

And Python can do statistics.
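
Two details are worth knowing before reading the results, shown here with a made-up column of numbers. In pandas, .count() skips missing values (NaN), and .std() gives the sample standard deviation (dividing by n - 1) unless you pass ddof=0 for the population version.

```python
import numpy as np
import pandas as pd

# made-up values with one missing entry, only for illustration
example = pd.Series([2.0, 4.0, 4.0, 6.0, np.nan])

n = example.count()                   # 4, the NaN is not counted
mean = example.mean()                 # 4.0, the NaN is ignored
sample_std = example.std()            # divides by n - 1 (ddof=1 is the default)
population_std = example.std(ddof=0)  # divides by n

print(n, mean, sample_std, population_std)
```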

In [None]:
# how many values in a column?
n_1 = data['column 1'].count()
n_1

In [None]:
# and the average
mean_1 = data['column 1'].mean()
mean_1

In [None]:
# and standard deviation of the values in a column
std_1 = data['column 1'].std()
std_1

<hr/>  

# Credits
This notebook was written by [Adam LaMee](http://www.adamlamee.com). Thanks to the great folks at [Binder](https://mybinder.org/) and [Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) for making this notebook interactive without you needing to download it or install [Jupyter](https://jupyter.org/) on your own device.