# Data Analysis with Python

This tutorial is based on the Data Carpentry Pandas Introduction.

## Managing Data with Pandas
* Libraries extend the functionality of Python.
* Pandas is a popular library for working with data.
* Pandas uses DataFrames as a data structure for accessing data by column (name or index) or by row, so it *feels* like a spreadsheet.
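
As a small illustration of the spreadsheet-like access described above, the sketch below builds a DataFrame in memory. The column names and values are made up for this example and are not taken from the tutorial's data files.

```python
import pandas as pd

# Hypothetical values, chosen only for illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": [1.5, 2.5, 3.5]})

col_by_name = df["x"]            # access a column by name
col_by_position = df.iloc[:, 0]  # access the same column by position
first_row = df.iloc[0]           # access a row by position
```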

## Import Data from CSV

In [None]:
# Import libraries
import pandas as pd
import numpy as np

In [None]:
# Sample data file
!cat data/simple.csv

In [None]:
# Dataframe with automatic index
data = pd.read_csv('data/simple.csv')
print(data)

In [None]:
# Dataframe with column as an index
idata = pd.read_csv('data/simple.csv', index_col=0)
idata
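
The difference between the automatic index and `index_col=0` can be sketched without a file on disk. The CSV text below is hypothetical (it only assumes columns named `x`, `y`, and `z`), and `io.StringIO` stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical CSV text; the real contents of data/simple.csv may differ
csv_text = "x,y,z\n1,2.0,3\n2,4.0,6\n"

# Default: pandas adds its own index 0, 1, ...
auto_index = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column (x) becomes the index
first_col_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

With `index_col=0`, `x` no longer appears among the columns; it is used as the row label instead.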

## Export Data

In [None]:
# Export to CSV
idata.to_csv('data/pandas_output.csv')
!cat data/pandas_output.csv

## Selecting Columns and Rows

In [None]:
# Select column
data['y']

In [None]:
# Select multiple columns
data[['z', 'x']]

In [None]:
idata

In [None]:
# Select row by index label
idata.loc[2]

In [None]:
# Select row by position (row number)
idata.iloc[1]

In [None]:
# Select a slice of rows (the first two rows)
idata[0:2]
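
The cells above mix label-based and position-based selection, which is easy to confuse. The following is a minimal sketch with hypothetical data whose index labels deliberately differ from the row positions.

```python
import pandas as pd

# Hypothetical data: index labels (10, 20, 30) differ from positions (0, 1, 2)
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=[10, 20, 30])

by_label = df.loc[20]          # row whose index label is 20
by_position = df.iloc[1]       # second row, counting from 0

label_slice = df.loc[10:20]    # label slices include the end label
position_slice = df.iloc[0:2]  # position slices exclude the end position
```

Both slices return the same two rows here, but for different reasons: `.loc` matches labels and keeps the end label, while `.iloc` counts positions and stops before the end.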

In [None]:
# Loop over rows (DANGER)
print(data)
for row in data.values:
    print(row * 2)
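
Looping over `data.values` drops the column labels and tends to be slow on large tables. A common alternative is to operate on the whole DataFrame at once; the sketch below uses hypothetical numeric data in place of `data/simple.csv`.

```python
import pandas as pd

# Hypothetical numeric data standing in for data/simple.csv
df = pd.DataFrame({"x": [1, 2, 3], "y": [1.5, 2.5, 3.5]})

# Row-by-row loop: each row is a plain NumPy array without column names
looped = [row * 2 for row in df.values]

# Vectorized equivalent: one operation over the whole DataFrame,
# keeping the column names and the index
doubled = df * 2
```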

In [None]:
# Array from first row (use data.values[:, 0] for the first column)
data.values[0]

## Querying and Calculations

In [None]:
# Conditional select (Query)
data2 = data.query('x > 1')
data2

In [None]:
# Multiple query conditions
data.query('x > 0 and y > 2.0', inplace=False)
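
The same selection can also be written with boolean masks instead of a query string. This sketch uses hypothetical values for `x` and `y` to show that the two forms select the same rows.

```python
import pandas as pd

# Hypothetical values, chosen so that only some rows match
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [1.0, 2.5, 3.0, 1.5]})

# Query string: conditions joined with "and"
via_query = df.query("x > 0 and y > 2.0")

# Boolean mask: each condition in parentheses, joined with &
via_mask = df[(df["x"] > 0) & (df["y"] > 2.0)]
```

Neither form modifies `df`; both return a new DataFrame containing the matching rows.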

In [None]:
# Create a new column from a calculation
data['w'] = data['y'] * data['z'] + 2

In [None]:
data

In [None]:
# Complex calculations
data['u'] = np.log(data['y']) * np.sqrt(data['z'])
data

## Grouping

In [None]:
# Get more interesting data
# The purpose of this dataset was to compile general life history
# characteristics for a variety of mammalian species to perform comparative
# life history analyses among different taxa and different body size groups.
url = "http://www.esapubs.org/archive/ecol/E084/093/Mammal_lifehistories_v2.txt"
mdata = pd.read_csv(url, delimiter="\t")
#mdata = pd.read_csv("data/Mammal_lifehistories_v2.csv", skip_blank_lines=True, delimiter='\t')
mdata.head()

In [None]:
# Group data
data_by_order = mdata.groupby('order')
for order, order_data in data_by_order:
    avg_mass = np.mean(order_data['mass(g)'])
    print(f"The average mass of {order} is {avg_mass:.2f} grams")
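
The per-group loop above can also be expressed as a single aggregation. The sketch below assumes only the column names `order` and `mass(g)`; the rows are hypothetical stand-ins, not values from the mammal dataset.

```python
import pandas as pd

# Hypothetical stand-in for the mammal data
df = pd.DataFrame({
    "order": ["Rodentia", "Rodentia", "Carnivora"],
    "mass(g)": [20.0, 40.0, 5000.0],
})

# One mean per group, returned as a Series indexed by order
avg_mass_by_order = df.groupby("order")["mass(g)"].mean()

for order, avg_mass in avg_mass_by_order.items():
    print(f"The average mass of {order} is {avg_mass:.2f} grams")
```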

[Section 2](GCP_Data_Analysis_02.ipynb)