# Intro to Pandas

What is Pandas?

![Alt text](http://getwallpapers.com/wallpaper/full/1/2/3/1185666-cute-anime-panda-wallpaper-1920x1200-hd.jpg) 

Pandas is a Python library for working with data. It provides data structures along with tools for analyzing and manipulating them.

The name comes from '**pan**el **da**ta', a statistics term for multidimensional, tabulated data.

What can you do with pandas?

![Alt text](Pandas.png)

### 1) Import Package

```python
import pandas
```

Importing pandas as `pd`, the common convention, makes it shorter to call:

```python
import pandas as pd
```

Import the other packages used in this tutorial:

```python
import matplotlib.pyplot as plt
import numpy as np
```

### 2) Series

- a one-dimensional array with an index: each value has a label (by default 0, 1, 2, ...)

```python
data = pd.Series([0, 10, 100, 1000])
data
```
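The index is what sets a Series apart from a plain list. A minimal sketch of the default integer index and of a custom index; the cell names and voltages are made up for illustration:

```python
import pandas as pd

# Default index: the integer labels 0, 1, 2, 3
counts = pd.Series([0, 10, 100, 1000])
print(counts.index.tolist())  # [0, 1, 2, 3]

# Custom index: hypothetical cell names used as labels
voltages = pd.Series([3.2, 3.3, 3.4], index=['cell_a', 'cell_b', 'cell_c'])
print(voltages['cell_b'])     # look up a value by its label
print(voltages.to_numpy())    # the underlying NumPy array, without labels
```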

### 3) DataFrames

- a two-dimensional table of rows and columns, where each column is a Series

```python
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['odds', 'evens'])
df
```

Or, you can create a dataframe from multiple Series by passing a dictionary that maps column names to Series:

```python
tens = pd.Series([0, 10, 20, 30])
hundreds = pd.Series([0, 100, 200, 300])

df = pd.DataFrame({'tens': tens, 'hundreds': hundreds})
df
```
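A sketch of two properties of a dataframe built this way: each column comes back out as a Series, and Series are lined up by their index labels. The `short` Series is a made-up example of unequal lengths:

```python
import pandas as pd

tens = pd.Series([0, 10, 20, 30])
hundreds = pd.Series([0, 100, 200, 300])
numbers = pd.DataFrame({'tens': tens, 'hundreds': hundreds})

print(numbers.shape)             # (4, 2): 4 rows, 2 columns
print(type(numbers['tens']))     # each column is returned as a Series

# Series are aligned by index label; rows missing from the shorter Series become NaN
short = pd.Series([1, 2])
aligned = pd.DataFrame({'tens': tens, 'short': short})
print(aligned)
```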

Try creating your own dataframe:

### 4) Import csv

We can also create dataframes by importing data from files. This is *extremely* valuable, as we can now analyze and manipulate large datasets quickly.

File types include:
* CSV
* Plain text
* JSON
* HTML
* Excel
* HDF5
* SQL
* And many others!
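Each format has its own reader function, such as `pd.read_csv`, `pd.read_json`, `pd.read_html`, `pd.read_excel`, `pd.read_hdf` and `pd.read_sql`. A minimal sketch reading the same made-up table from CSV text and from JSON text; `StringIO` stands in for a file on disk:

```python
from io import StringIO

import pandas as pd

# The same made-up table, once as CSV text and once as JSON records
csv_text = "cycle,current\n0,0.5\n1,0.5\n2,-0.5\n"
json_text = ('[{"cycle": 0, "current": 0.5}, '
             '{"cycle": 1, "current": 0.5}, '
             '{"cycle": 2, "current": -0.5}]')

# StringIO wraps each string so the readers treat it like an open file
from_csv = pd.read_csv(StringIO(csv_text))
from_json = pd.read_json(StringIO(json_text), orient='records')

print(from_csv)
print(from_json)
```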

Let's take a look at a file from a battery cycling experiment. This file is the output from an Arbin battery cycler. 

The data is from Sandia National Labs: https://www.sandia.gov/energystoragesafety-ssl/research-development/research-data-repository/



First, import the csv file.

```python
data = pd.read_csv('LFP_1_25C_Reg.csv')
```

`data.head()` and `data.tail()` show the beginning and end of the dataframe, respectively. By default they return 5 rows; pass a number in the parentheses to change how many rows are shown.

```python
data.head()
```

```python
data.head(2)
```

```python
data.tail()
```

```python
data.tail(10)
```

What do the first 15 rows look like? The last 20?

`len()` returns the number of rows in the dataframe:

```python
len(data)
```

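Use `data.loc[]` or `data.iloc[]` to select a specific row. `.loc` selects by index label, while `.iloc` selects by integer position, starting at 0. With the default index (0, 1, 2, ...), labels and positions coincide, so both calls below return the same row, the 101st one.

```python
data.loc[100]
```

```python
data.iloc[100]
```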
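The two only differ once the index is no longer 0, 1, 2, ..., for instance after `set_index` is used later in this notebook. A small sketch with made-up readings whose index labels start at 1:

```python
import pandas as pd

# Made-up readings whose index labels start at 1 instead of 0
readings = pd.DataFrame({'Current(A)': [0.0, 0.5, 0.5, -0.5]}, index=[1, 2, 3, 4])

first_by_label = readings.loc[1]       # row with index label 1: the first row
second_by_position = readings.iloc[1]  # row at position 1: the second row

print(first_by_label['Current(A)'])      # 0.0
print(second_by_position['Current(A)'])  # 0.5
```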

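You can extract a single column by its name; the result is a Series.

```python
data['Current(A)']
```

```python
data['Current(A)'].head()
```

```python
# .values returns the column as a NumPy array (.to_numpy() is the newer equivalent)
data['Current(A)'].values
```

```python
# Slicing with [] selects rows by position: rows 1 through 9 (the end is excluded)
data[1:10]
```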
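Two related selections: a list of column names returns a DataFrame rather than a Series, and `.loc[row, column]` picks out a single value. A sketch on a small made-up frame, where the `Voltage(V)` column name is illustrative only:

```python
import pandas as pd

# Made-up measurements; 'Voltage(V)' is an illustrative column name
sample = pd.DataFrame({'Current(A)': [0.0, 0.5, 0.5, -0.5],
                       'Voltage(V)': [3.2, 3.4, 3.5, 3.3]})

one_col = sample['Current(A)']                    # one name -> Series
two_cols = sample[['Current(A)', 'Voltage(V)']]   # list of names -> DataFrame
middle_rows = sample[1:3]                         # positions 1 and 2 (end excluded)
value = sample.loc[2, 'Voltage(V)']               # row label 2, column 'Voltage(V)'
print(value)  # 3.5
```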

What is the current measurement for the 152nd row?

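You can inspect the index and replace it with one of the columns. `set_index` returns a new dataframe indexed by `Data_Point`, which is assigned back to `data`:

```python
data.index
```

```python
# Use the Data_Point column as the index
data = data.set_index('Data_Point')
```

```python
data.head()
```

```python
# List the column names
data.columns
```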

You can perform operations between columns and place the result in a new column:

```python
# Average of the four temperature readings, stored in a new column
data['avg_temp'] = (data['Temperature (C)_1'] + data['Temperature (C)_2']
                    + data['Temperature (C)_3'] + data['Temperature (C)_4']) / 4
data.head()
```
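With many columns, typing every name gets long. A sketch of an alternative, assuming the only columns whose names contain `Temperature` are the four thermocouple columns; the readings below are made up. Unlike the `+` version, `.mean()` skips missing values by default:

```python
import pandas as pd

# Made-up temperatures from four thermocouples, named like the tutorial's columns
temps = pd.DataFrame({
    'Temperature (C)_1': [25.0, 26.0],
    'Temperature (C)_2': [25.5, 26.5],
    'Temperature (C)_3': [24.5, 25.5],
    'Temperature (C)_4': [25.0, 26.0],
})

# Select every column whose name contains 'Temperature', then average across each row
temp_cols = temps.filter(like='Temperature')
temps['avg_temp'] = temp_cols.mean(axis=1)
print(temps['avg_temp'].tolist())  # [25.0, 26.0]
```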

Create a new dataframe by keeping only the rows that meet a condition, here the rows where `Step_Index` equals 43:

```python
data_step = data[data['Step_Index'] == 43]
data_step.head()
```
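Conditions can also be combined. A sketch on made-up step and cycle numbers: `&` means "and", `|` means "or", each condition needs its own parentheses, and `.isin()` matches any value in a list:

```python
import pandas as pd

# Made-up step and cycle numbers
steps = pd.DataFrame({'Step_Index': [42, 43, 43, 44, 43],
                      'Cycle_Index': [1, 1, 2, 2, 3]})

# Rows in step 43 AND cycle 2
step43_cycle2 = steps[(steps['Step_Index'] == 43) & (steps['Cycle_Index'] == 2)]

# Rows whose step is either 42 or 44
some_steps = steps[steps['Step_Index'].isin([42, 44])]

print(step43_cycle2)
print(some_steps)
```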

What information would you like to extract from the data? Try finding it here:

### 5) Merge Data
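`pd.merge` lines up rows from two dataframes by a shared key column, and `pd.concat` stacks dataframes together. A minimal sketch with two made-up per-cycle tables; the values and the choice of `Cycle_Index` as the key are illustrative:

```python
import pandas as pd

# Two made-up per-cycle tables sharing a 'Cycle_Index' key column
charge = pd.DataFrame({'Cycle_Index': [1, 2, 3],
                       'Charge_Capacity(Ah)': [1.10, 1.09, 1.08]})
discharge = pd.DataFrame({'Cycle_Index': [1, 2, 4],
                          'Discharge_Capacity(Ah)': [1.05, 1.04, 1.02]})

# Inner merge: keep only cycles that appear in both tables (1 and 2)
inner = pd.merge(charge, discharge, on='Cycle_Index', how='inner')

# Outer merge: keep every cycle from either table, filling gaps with NaN
outer = pd.merge(charge, discharge, on='Cycle_Index', how='outer')

# concat: stack tables with the same columns, renumbering the index 0..n-1
stacked = pd.concat([charge, charge], ignore_index=True)

print(inner)
print(outer)
```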

### 6) Calculate Capacity 

How many charge/discharge cycles were used for this experiment?

```python
data['Cycle_Index'].max()
```

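One way to find the capacity of each cycle is to loop over the cycle numbers and collect the maximum charge and discharge capacity in lists:

```python
capacity_charge = []
capacity_discharge = []
n_cycles = int(data['Cycle_Index'].max())
for i in range(1, n_cycles + 1):
    # Rows belonging to cycle i
    cycle = data[data['Cycle_Index'] == i]
    # The capacity columns accumulate within a cycle, so the maximum is taken as that cycle's capacity
    capacity_charge.append(cycle['Charge_Capacity(Ah)'].max())
    capacity_discharge.append(cycle['Discharge_Capacity(Ah)'].max())

# Row 0 of this DataFrame holds cycle 1, row 1 holds cycle 2, and so on
capacity_discharge = pd.DataFrame(capacity_discharge, columns=['Discharge Capacity (Ah)'])
capacity_discharge
```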

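However, Pandas gives us an easier and faster way to do the same thing with `df.groupby`: it splits the rows by `Cycle_Index` and applies `.max()` to each group. The result is a Series indexed by cycle number, whereas the loop above produced a DataFrame indexed from 0.

```python
# Group rows by cycle number, then take the maximum discharge capacity of each group
cycles = data.groupby('Cycle_Index')
capacity_discharge = cycles['Discharge_Capacity(Ah)'].max()
capacity_discharge
```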
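A sketch checking that the loop and `groupby` agree, on made-up cycling data. The loop filters the whole frame once per cycle, while `groupby` splits the rows in a single operation, which is why it tends to be faster on large files:

```python
import pandas as pd

# Made-up cycling data: discharge capacity accumulates within each cycle
cycling = pd.DataFrame({
    'Cycle_Index': [1, 1, 1, 2, 2, 2],
    'Discharge_Capacity(Ah)': [0.0, 0.5, 1.05, 0.0, 0.5, 1.04],
})

# Loop version: one boolean filter per cycle
n_cycles = int(cycling['Cycle_Index'].max())
loop_result = [cycling[cycling['Cycle_Index'] == i]['Discharge_Capacity(Ah)'].max()
               for i in range(1, n_cycles + 1)]

# groupby version: split by cycle, take the max of each group, index = cycle number
grouped = cycling.groupby('Cycle_Index')['Discharge_Capacity(Ah)'].max()

print(grouped.tolist() == loop_result)  # True
```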

### 7) Save as a new file

```python
# The cycle numbers (the index) are written as the first column of the file
capacity_discharge.to_csv('capacity_discharge.csv')
```
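When the file is read back, `index_col=0` turns that first column into the index again; passing `index=False` to `to_csv` would leave the index out instead. A round-trip sketch using made-up capacities and a temporary folder:

```python
import os
import tempfile

import pandas as pd

# Made-up per-cycle discharge capacities, indexed by cycle number
capacity = pd.Series([1.05, 1.04, 1.02],
                     index=pd.Index([1, 2, 3], name='Cycle_Index'),
                     name='Discharge_Capacity(Ah)')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'capacity_discharge.csv')
    capacity.to_csv(path)                       # index written as the first column
    reloaded = pd.read_csv(path, index_col=0)   # first column becomes the index again

print(reloaded)
```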

### 8) Plot results

```python
# Current (A) against test time (s) over the whole experiment
plt.plot(data['Test_Time(s)'], data['Current(A)'])
```
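The per-cycle capacity can be plotted the same way. A sketch with axis labels, using made-up values shaped like the `groupby` result from the capacity section:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up discharge capacity per cycle, indexed by cycle number
capacity = pd.Series([1.05, 1.04, 1.03, 1.01],
                     index=pd.Index([1, 2, 3, 4], name='Cycle_Index'))

fig, ax = plt.subplots()
ax.plot(capacity.index, capacity.values, marker='o')
ax.set_xlabel('Cycle number')
ax.set_ylabel('Discharge capacity (Ah)')
ax.set_title('Discharge capacity per cycle')
plt.show()
```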

### 9) Iterate through files

What makes Pandas especially powerful is that, combined with a Python loop, it lets us iterate through multiple files and perform the same analysis on each one.
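A minimal sketch of that pattern: find every CSV file in a folder with `glob`, run the same per-cycle capacity calculation on each, and collect the results in one table. The file names and contents are hypothetical; the sketch writes two small files to a temporary folder so it runs on its own, and with real data the pattern would point at the folder holding files such as `LFP_1_25C_Reg.csv`:

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write two small made-up files so the sketch is self-contained
    for name, caps in [('cell_A.csv', [1.05, 1.04]), ('cell_B.csv', [1.10, 1.07])]:
        made_up = pd.DataFrame({'Cycle_Index': [1, 2], 'Discharge_Capacity(Ah)': caps})
        made_up.to_csv(os.path.join(folder, name), index=False)

    # Apply the same analysis to every CSV file in the folder
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        cell = pd.read_csv(path)
        per_cycle = cell.groupby('Cycle_Index')['Discharge_Capacity(Ah)'].max()
        results[os.path.basename(path)] = per_cycle

# One column per file, one row per cycle
summary = pd.DataFrame(results)
print(summary)
```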

Resources used to create this tutorial:
* http://pandas.pydata.org/pandas-docs/stable/
* https://jakevdp.github.io/PythonDataScienceHandbook/03.01-introducing-pandas-objects.html
* https://www.youtube.com/playlist?list=PLQVvvaa0QuDfSfqQuee6K8opKtZsh7sA9
* https://www.dlr.de/sc/Portaldata/15/Resources/dokumente/pyhpc2011/submissions/pyhpc2011_submission_9.pdf