# Pandas Tutorial

## Practice: Pandas Library   

This notebook reviews the basic pandas methods covered in class. All the required code is already written, so you can practice easily.


---
__1. Import Pandas Library__  

~~~ python
import numpy as np
import pandas as pd
~~~

---
__2. Series (1d labeled numpy arrays)__

* Creating a series

~~~ python
# Duplicate index labels are allowed: 1 and 2 each appear twice
s = pd.Series(['one', 'two', 'three', 'four'],
              index=[1, 2, 1, 2])
~~~
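Because `index=[1, 2, 1, 2]` repeats labels, label-based and position-based lookups return different things. A minimal sketch contrasting `.loc` (labels) with `.iloc` (integer positions); the variable names `by_label` and `by_position` are only for illustration:

```python
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'],
              index=[1, 2, 1, 2])

# .loc selects by label: label 1 occurs twice, so a Series (not a scalar) comes back
by_label = s.loc[1]
print(by_label)

# .iloc selects by integer position: position 1 is the second element
by_position = s.iloc[1]
print(by_position)
```

With an integer index, plain `s[1]` is also treated as a label, which is why the explicit `.loc`/`.iloc` accessors are the safer habit.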

* Attributes

~~~ python
print("shape =", s.shape)   # tuple of dimension lengths
print("dtype =", s.dtype)   # data type of the values
print("size =", s.size)     # number of elements
print("ndim =", s.ndim)     # number of dimensions
~~~
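For the Series above, the attributes can be checked directly. The extra `index.is_unique` line is an addition that confirms pandas kept the repeated labels; the dtype comment is hedged because the default string dtype differs across pandas versions:

```python
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'],
              index=[1, 2, 1, 2])

print("shape =", s.shape)                  # (4,): a 1-D object has a one-element shape tuple
print("dtype =", s.dtype)                  # object in pandas 2.x; newer versions may infer a string dtype
print("size =", s.size)                    # 4 elements
print("ndim =", s.ndim)                    # 1 dimension
print("unique labels?", s.index.is_unique)  # False: labels 1 and 2 repeat
```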

---
__3. DataFrame (2d tables with row and column labels)__

* Creating a DataFrame

~~~ python
df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=s,                    # row labels come from the values of Series s
                   columns=['a', 'b', 'c'],
                   dtype=np.float32)
df2 = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': np.random.randn(5),
    'data2': np.random.randn(5)
})
print(df2['key1'])   # select one column as a Series
~~~
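Once `df1` exists, columns and rows can be pulled out by label. A short sketch (it rebuilds `s` and `df1` so it runs on its own; `col_b`, `sub`, `row`, and `cell` are illustrative names):

```python
import numpy as np
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'], index=[1, 2, 1, 2])
df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=s,                    # the Series' values become the row labels
                   columns=['a', 'b', 'c'],
                   dtype=np.float32)

col_b = df1['b']              # one column name -> Series
sub = df1[['a', 'c']]         # list of column names -> DataFrame
row = df1.loc['two']          # one row by label -> Series indexed by column names
cell = df1.loc['three', 'c']  # single value by (row label, column label)
print(col_b, sub, row, cell, sep='\n\n')
```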


---
__4. Index__

* Hierarchical indexes

~~~ python
# Two index arrays -> a two-level MultiIndex (outer level, inner level)
s = pd.Series(['one', 'two', 'three', 'four'],
              index=[[1, 1, 2, 2], ['a', 'a', 'b', 'a']])
~~~
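With a hierarchical index, selecting on the outer level returns the matching inner part, while `Series.xs` takes a cross-section on an inner level. A minimal sketch using the same Series (`outer_1` and `inner_a` are illustrative names):

```python
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'],
              index=[[1, 1, 2, 2], ['a', 'a', 'b', 'a']])

print(s.index.nlevels)          # 2: outer and inner level

# Partial indexing on the outer level drops that level from the result
outer_1 = s.loc[1]

# Cross-section on the inner level (level=1): every row whose inner label is 'a'
inner_a = s.xs('a', level=1)
print(outer_1, inner_a, sep='\n\n')
```

Note that `(1, 'a')` occurs twice here, so reshaping operations such as `unstack()` would refuse this index; plain selection works fine.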


---
__5. Data Alignment__


~~~ python
df1 = pd.DataFrame(np.arange(15).reshape(5, 3),
                   index=['a', 'b', 'c', 'd', 'e'],
                   columns=['A', 'B', 'C'],
                   dtype=np.float32)
df2 = pd.DataFrame(np.ones((7, 2)),
                   index=['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                   columns=['A', 'B'],
                   dtype=np.float32)
print(df1)
print(df2)
# The result uses the union of row and column labels; rows 'f', 'g' and
# column 'C' exist in only one frame, so those cells become NaN
print(df1 + df2)
~~~
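When a label missing from one frame should count as 0 instead of producing NaN, the method form `DataFrame.add` accepts a `fill_value`. Cells missing from *both* frames (here rows `f`, `g` in column `C`) still come out as NaN. A short sketch with the same two frames:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3),
                   index=['a', 'b', 'c', 'd', 'e'],
                   columns=['A', 'B', 'C'],
                   dtype=np.float32)
df2 = pd.DataFrame(np.ones((7, 2)),
                   index=['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                   columns=['A', 'B'],
                   dtype=np.float32)

plain = df1 + df2                       # NaN wherever either side lacks the label
filled = df1.add(df2, fill_value=0)     # a missing side is treated as 0 before adding
print(filled)
```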


---
## Exercise: Tip data

~~~python
# Colab only: authenticate with Google and download tips.csv from Google Drive
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

link = 'https://drive.google.com/open?id=17So-kLcf--NclxicAYr8IWRol488hIIh'
_, file_id = link.split('=')   # keep the part after 'id='; avoids shadowing the built-in id()

downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('tips.csv')   # save the file locally as tips.csv
~~~

---
__6. GroupBy__

* GroupBy object

~~~ python
tips = pd.read_csv('tips.csv')
tips['tip_pct'] = tips['tip'] / tips['total_bill']   # tip as a fraction of the bill
print(tips[:6])                                       # first six rows
grouped = tips.groupby('size')                        # GroupBy object; nothing aggregated yet
~~~
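A GroupBy object can be inspected before any aggregation. To keep the sketch runnable without `tips.csv`, it builds a hypothetical six-row stand-in with made-up values and the same column names; results on the real file will differ:

```python
import pandas as pd

# Hypothetical toy stand-in for tips.csv: made-up values, same column names
tips = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 15.0, 25.0, 40.0],
    'tip':        [1.5, 3.0, 6.0, 2.0, 5.0, 4.0],
    'sex':        ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'time':       ['Lunch', 'Lunch', 'Dinner', 'Dinner', 'Dinner', 'Lunch'],
    'size':       [2, 2, 4, 3, 2, 4],
})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

grouped = tips.groupby('size')

print(grouped.ngroups)           # number of distinct party sizes
print(grouped.size())            # number of rows in each group
print(grouped.get_group(4))      # all rows of one group, as a DataFrame
print(grouped['tip'].mean())     # select a column, then aggregate per group
```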
* GroupBy using an array and a function

~~~ python
array = tips['size'] < 4          # boolean Series aligned with the rows
function = lambda x: x % 10       # called on each index label, not on the values
grouped = tips.groupby([array, function])
~~~
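When several keys are combined, each group key is a tuple with one entry per key. The sketch below swaps in `label % 2` (even/odd row label) so the toy data forms a few multi-row groups; the data are hypothetical made-up values:

```python
import pandas as pd

# Hypothetical toy data (made-up values); only 'size' and 'tip' are needed here
tips = pd.DataFrame({'size': [2, 2, 4, 3, 2, 4],
                     'tip':  [1.5, 3.0, 6.0, 2.0, 5.0, 4.0]})

is_small = tips['size'] < 4           # boolean Series, aligned on the row index
parity = lambda label: label % 2      # applied to each index label (0..5 here)

grouped = tips.groupby([is_small, parity])
result = grouped['tip'].sum()         # index: (is_small, parity) tuples
print(result)
```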


* Iterate

~~~ python
for key, group in tips.groupby('time'):
    print('    key:', key)
    print(group[:3])  # each group is a pandas.DataFrame
~~~
* Agg

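~~~ python
# Restrict to numeric columns: in pandas 2.x, averaging or subtracting
# text columns (e.g. 'sex') raises a TypeError
num_cols = ['total_bill', 'tip', 'tip_pct']
print(tips.groupby('time')[num_cols].agg('mean'))

peak2peak = lambda x: x.max() - x.min()   # range of values within each group
print(tips.groupby('time')[num_cols].agg(peak2peak))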
~~~
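`agg` also accepts a list of functions, and "named aggregation" (`output_name=(input_column, function)`) picks the output column names. A sketch on hypothetical toy data; the output names `avg_tip`, `max_bill`, and `tip_range` are illustrative:

```python
import pandas as pd

# Hypothetical toy stand-in for tips.csv (made-up values)
tips = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 15.0, 25.0, 40.0],
    'tip':        [1.5, 3.0, 6.0, 2.0, 5.0, 4.0],
    'time':       ['Lunch', 'Lunch', 'Dinner', 'Dinner', 'Dinner', 'Lunch'],
})

peak2peak = lambda x: x.max() - x.min()

# Several functions on one column: one output column per function
multi = tips.groupby('time')['tip'].agg(['mean', 'max', peak2peak])

# Named aggregation: each keyword becomes an output column
summary = tips.groupby('time').agg(
    avg_tip=('tip', 'mean'),
    max_bill=('total_bill', 'max'),
    tip_range=('tip', peak2peak),
)
print(multi)
print(summary)
```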


* Transform

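~~~ python
# Group means broadcast back to every row, so the result has the same length as tips
transformed = tips.groupby('size')[['total_bill', 'tip', 'tip_pct']].transform('mean')

# Group first, then select the column; a new column keeps the original 'tip' intact
tips['tip_mean_by_size'] = tips.groupby('size')['tip'].transform('mean')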
~~~
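The difference between `transform` and `agg` is the output shape: `agg` returns one row per group, while `transform` returns one value per original row, which makes it handy for within-group demeaning. A sketch on hypothetical toy data (`tip_demeaned` is an illustrative column name):

```python
import pandas as pd

# Hypothetical toy stand-in for tips.csv (made-up values)
tips = pd.DataFrame({'tip':  [1.5, 3.0, 6.0, 2.0, 5.0, 4.0],
                     'size': [2, 2, 4, 3, 2, 4]})

group_mean = tips.groupby('size')['tip'].transform('mean')  # one value per row
per_group = tips.groupby('size')['tip'].agg('mean')         # one value per group

# Subtract each row's group mean; every group now averages to (about) zero
tips['tip_demeaned'] = tips['tip'] - group_mean
print(len(group_mean), len(per_group))
print(tips)
```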
* Apply

~~~ python 
def top(frame, n=5, column='tip_pct'):
  return frame.sort_values(by=column, ascending=False)[:n]

print(top(tips))
print(tips.groupby('sex').apply(top, n=3))
~~~
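Keyword arguments given after the function in `apply` are passed through to it, so `top` can rank by another column. Passing `group_keys=False` to `groupby` keeps the original row labels instead of prepending the group key, and selecting the columns first keeps the grouping column out of the frames handed to `top`. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy stand-in for tips.csv (made-up values)
tips = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 15.0, 25.0, 40.0],
    'tip':        [1.5, 3.0, 6.0, 2.0, 5.0, 4.0],
    'sex':        ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

def top(frame, n=5, column='tip_pct'):
    # Rows with the n largest values of `column`
    return frame.sort_values(by=column, ascending=False)[:n]

# n and column are forwarded to top(); the result keeps the original row labels
biggest = (tips.groupby('sex', group_keys=False)[['total_bill', 'tip', 'tip_pct']]
               .apply(top, n=1, column='total_bill'))
print(biggest)
```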