*** What is PANDAS? ***
- PANDAS is a powerful Python library for data manipulation. It follows the split-apply-combine style of data analysis: you break a big problem into manageable pieces, operate on each piece independently, and then put all the pieces back together.
    - for more on this, see Hadley Wickham (Rice University): http://www.jstatsoft.org/v40/i01/paper
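
A minimal sketch of split-apply-combine using groupby. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: several measurements per site
df = pd.DataFrame({
    "site": ["A", "A", "B", "B", "B"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0],
})

# Split the rows by site, apply the mean to each group,
# and combine the results into one Series indexed by site
site_means = df.groupby("site")["value"].mean()
print(site_means)
```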

*** Types of Objects in PANDAS: ***
* Pandas Series- similar to a 1-D array in NumPy, but with a label attached to each value; create using pd.Series
* Pandas Index- "which provides sort of an indexing framework"; it holds the row (or column) labels of a Series or DataFrame; create using pd.Index
* DataFrame- similar to an Excel or SQL table as it has both columns and rows; each column is a Series, and all columns share the same index; create using pd.DataFrame
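
A short sketch of how the three objects relate. The labels and values are invented for the example.

```python
import pandas as pd

# A Series: values plus a label for each value
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# An Index: the labels on their own
idx = pd.Index(["a", "b", "c"])

# A DataFrame: several columns sharing one index
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=idx)

print(s.index.equals(idx))  # the labels of the Series are themselves an Index
print(type(df["x"]))        # selecting one column gives back a Series
```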

*** Main Ways to Query: ***
- Note: can query by integer location or by index label
- data[]
    - put a range (slice) or a value (such as a label or column name) in the brackets to return that subset of the data
* data.head(#)
    - returns the first # rows of the dataset, 5 by default
* data.tail(#)
    - returns the last # rows of the dataset, 5 by default
* series_name.ix[]
    - general indexing for both integer position and labels; deprecated and removed in newer versions, so use .loc[] or .iloc[] instead
* series_name.iget() (only for versions lower than 0.20.3)
    * use .iloc[] for updated versions
* series_name.loc[]
    - use for label-based querying
* series_name.iloc[]
    - "integer location"- use for integer position querying
* df.sort_values('column_name')
    * sorts low to high by that column; series_name.sort_values() sorts a Series by its values (older versions used .sort(), which has since been removed)
- df.index
    - returns the row labels of the DataFrame
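
The query methods above can be sketched as follows, assuming a recent PANDAS version where .loc[] and .iloc[] replace the older accessors. The DataFrame is invented for illustration.

```python
import pandas as pd

# Hypothetical weather data labeled by day
df = pd.DataFrame(
    {"temp": [21.5, 19.0, 23.1, 18.2], "rain": [0.0, 2.5, 0.0, 7.1]},
    index=["mon", "tue", "wed", "thu"],
)

first_two = df.head(2)              # first 2 rows
last_one = df.tail(1)               # last row
temps = df["temp"]                  # one column, returned as a Series
by_label = df.loc["tue", "temp"]    # query by index label and column name
by_position = df.iloc[0, 1]         # query by integer row/column position
label_slice = df.loc["tue":"wed"]   # label slices include both endpoints
sorted_df = df.sort_values("temp")  # sort rows low to high by one column

print(df.index)
```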

*** Series/DataFrame Functions: ***
- name.shape / len(name)
    - name.shape returns a tuple of dimensions (rows, columns); len(name) returns only the number of rows
- name.mean()
- name.median()
- name.mode()
- name.count()
- name.unique()
    - returns the unique values; use name.nunique() for the number of unique values
- name.value_counts()
- **name.describe()**
    - provides count, mean, std, min, quartiles, and max for numeric data
- **.apply()**
    - applies a function to the DataFrame or Series
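
A quick sketch of these functions on a small made-up Series and DataFrame.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 6, 9])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(df.shape, len(df))     # dimensions as a tuple vs. number of rows
print(s.mean(), s.median())  # average and middle value
print(s.mode().tolist())     # most frequent value(s)
print(s.unique())            # the distinct values themselves
print(s.nunique())           # how many distinct values there are
print(s.value_counts())      # how often each value occurs
print(s.describe())          # summary statistics

# apply runs a function on every element of the Series
doubled = s.apply(lambda v: v * 2)
```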

*** NaNs: ***
* note: NaNs are treated differently in PANDAS than in NumPy
- NaNs are skipped by default in most PANDAS functions
    - especially in math functions
    - e.g. the mean of a set can be calculated even if there are NaNs in the set, whereas np.mean returns NaN
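
A small sketch of the difference, using made-up values. The skipna argument shown here is the PANDAS switch that controls this behavior.

```python
import numpy as np
import pandas as pd

values = [1.0, np.nan, 3.0]
s = pd.Series(values)

pandas_mean = s.mean()                        # NaN is skipped by default
strict_mean = s.mean(skipna=False)            # NaN propagates when skipping is off
numpy_mean = np.mean(np.array(values))        # plain NumPy mean propagates NaN
numpy_nanmean = np.nanmean(np.array(values))  # NumPy's NaN-aware variant

print(pandas_mean, strict_mean, numpy_mean, numpy_nanmean)
print(s.count())  # count only includes non-NaN values
```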


*** Formatting Dates/Time ***
- PANDAS recognizes dates in yyyy-mm-dd format, so it is suggested to print the .head() of your date column and check that it is in the right format and has a datetime dtype
- One can convert a column to datetime as the data is imported by using the parse_dates parameter of read_csv
    - while parse_dates is a powerful tool, it isn't always perfect, so some tweaking may be needed (for example with pd.to_datetime)
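
A minimal sketch of both routes, reading from an in-memory string instead of a real file. The column names are hypothetical.

```python
import io
import pandas as pd

csv_text = "date,value\n2020-01-01,1\n2020-01-02,2\n"

# Parse the date column while importing
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df["date"].head())
print(df["date"].dtype)

# Fallback: import as text, then convert with an explicit format
raw = pd.read_csv(io.StringIO(csv_text))
raw["date"] = pd.to_datetime(raw["date"], format="%Y-%m-%d")
```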

*** DataFrame Manipulation: ***
- dataframe_name.columns - retrieves the names of the columns  
- different ways to fill in data:
    - .fillna()
        - looks for all NaNs and fills them with the value given; a value must be supplied, e.g. .fillna(0)
    - .ffill()
        - fills NaNs with the last known value before them; can be used on just one column with df['columnname'].ffill()
    - .bfill()
        - fills NaNs with the next known value after them
    - .dropna()
        - returns the object with rows (or columns, depending on axis) removed where any or all of the data are missing
        - .dropna(how='any') (the default) drops a row if any value is missing; .dropna(how='all') drops it only if every value is missing
- .stack()
    - moves a level of the column labels into the row index, giving a longer and narrower object; "undo" it with .unstack()
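
The fill, drop, and stack methods can be sketched on two small made-up DataFrames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

zero_filled = df.fillna(0)  # every NaN replaced by the given value
forward = df.ffill()        # carry the last known value forward
backward = df.bfill()       # pull the next known value backward
one_col = df["a"].ffill()   # fill a single column only

any_dropped = df.dropna(how="any")  # drop rows with at least one NaN
all_dropped = df.dropna(how="all")  # drop rows only if every value is NaN

# stack moves the column labels into the row index; unstack reverses it
full = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
stacked = full.stack()
unstacked = stacked.unstack()
```

A NaN at the very start of a column has nothing before it, so .ffill() leaves it in place; the same holds for .bfill() at the end of a column.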


In [1]:
from IPython.display import Image
Image(url='http://i.stack.imgur.com/GbJ7N.png')

*** Joining Data: ***
- The bulk of the data joining that the lab will be doing will be the "full outer join"
- To do this we use the function: A.join(B, how="outer")
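
A minimal sketch of a full outer join on the index, with two invented DataFrames A and B that share only one label.

```python
import pandas as pd

A = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
B = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])

# Keep every label from both A and B; gaps are filled with NaN
joined = A.join(B, how="outer")
print(joined)
```

Only label "y" appears in both inputs, so it is the only row with values in both columns.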

*** Importing/Saving data: ***
- **pd.read_csv('')** - imports data that has been saved in csv form
    - this function has many parameters, including ones for skipping rows, selecting columns, and setting data types; use pd.read_csv? to explore further
    - or use the reader that matches the format the data is in (pd.read_json, pd.read_excel, ...)
- name.to_csv/to_excel/to_pickle/to_json/...
- plt.savefig('.../location/.png') - or .pdf or other file type
- name.to_latex()
    - returns a formatted LaTeX string of the data; the method provides many parameters for customization
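
A sketch of a csv round trip that writes to an in-memory buffer, so no file path is assumed. The data and column names are made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

# Save as csv (index=False leaves the row labels out of the file)
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back
buffer.seek(0)
restored = pd.read_csv(buffer)

# A few read_csv parameters: skip a row, keep one column, set its dtype
partial = pd.read_csv(
    io.StringIO(buffer.getvalue()),
    skiprows=[1],             # line 0 is the header, so this skips the first data row
    usecols=["score"],
    dtype={"score": float},
)

json_text = df.to_json()      # other formats follow the same to_* pattern
```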

*** Plotting Data: ***
- PANDAS uses matplotlib for plotting
- name.hist()
    - can define number of bins-- .hist('column_name', bins=#, figsize=(#,#))
- name.boxplot()
- name.plot.bar()
- plt.xlim((#, #))
- plt.ylim((#,#))
- plt.title("title of plot")
- plt.savefig('location to save plot to.png')
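
A small plotting sketch, assuming matplotlib is installed. It uses the non-interactive Agg backend and saves to an in-memory buffer rather than a file path; the data and title are invented.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: no display is available, so draw off-screen
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"height": [1.2, 1.5, 1.5, 1.7, 1.9, 2.1]})

# Histogram of one column with a chosen number of bins and figure size
df.hist("height", bins=3, figsize=(4, 3))
plt.xlim((1.0, 2.5))
plt.title("Heights")

# Save the current figure; a file name such as 'plot.png' works the same way
png = io.BytesIO()
plt.savefig(png, format="png")
plt.close("all")
```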

*** Helpful Tips: *** (Feel free to add your own!)
- when editing/manipulating data in PANDAS, most methods return a new object and leave the original unchanged; to edit the original, either pass inplace=True (where the method supports it) or assign the result back to the same name
- getting a new DataFrame could of course be what you had in mind, but then be sure to name it appropriately to avoid confusion
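
A short sketch of the difference, using .dropna() on a made-up DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Default behavior: a new DataFrame comes back and df is left untouched
cleaned = df.dropna()
rows_after_copy = len(df)

# With inplace=True the original is modified and nothing is returned
result = df.dropna(inplace=True)
rows_after_inplace = len(df)

print(rows_after_copy, rows_after_inplace, result)
```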
