# pandas Cheat Sheet

1. Series:
    - Creation (by what):
        - array object (numpy array)
        - dictionary (key-value pairs): key -> index, value -> value
        - tuple
        - set: not accepted directly, because a set is unordered (pandas raises a TypeError); convert it to a list first
    - Attributes:
        - values
        - index
            - Attributes: name
        - dtype
    - Parameters:
        - data
        - index
        - dtype
        - name
    - Methods:
        - isnull
        - notnull
        - reindex:
            - Parameters:
                - index
                - method: 'ffill' -> forward fill
                - fill_value: substitute value to use when introducing missing data by reindexing.
                - limit: when forward- or backfilling, maximum size gap (in number of elements) to fill.
                - tolerance: when forward- or backfilling, maximum size gap (in absolute numeric distance) to fill for inexact matches.
        - drop: 
            - Parameters:
                - labels: list-like
                - inplace: if True, modify the Series in place and return None instead of a new object
        - add, sub, div, mul:
            - Parameters:
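                - fill_value: value substituted for a label that is missing from one operand before computing (the result is still NaN if it is missing from both)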
        - apply:
            - Parameters:
                - func: function applied to each value
        - map: element-wise mapping with a function, dict, or Series (applymap is a DataFrame method, not a Series method)
        - sort_index:
            - Parameters:
                - ascending: default -> True
        - sort_values: NaN values are placed at the end by default (na_position='last')
        - rank:
            - Parameters:
                - ascending
                - method
        - Aggregation functions: mean, sum, median, etc
            - skipna: default -> True (nulls are excluded); if False, nulls are not excluded and the result is NaN when any value is null
        - idxmax: return the index label of max value
        - idxmin: return the index label of min value
        - cumsum: cumulative sum
        - describe: returns summary statistics (count, mean, std, min, quartiles, max); on non-numeric data returns count, unique, top, and freq instead
        - argmax, argmin: return the integer position of the max or min value
        - corr: correlations
        - cov: covariances
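        - unique: returns an array of the distinct values, in order of appearance
        - value_counts: returns a Series of counts, indexed by distinct value and sorted in descending order
        - isin: returns a boolean Series
        - match: compute integer indices for each value in an array into another array of distinct values; helpful for data alignment and join-type operations (pd.match has been removed from recent pandas versions; Index.get_indexer covers the same use)
        - fillna: fill null values with a specified value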
    - Accessing: NumPy-style indexing and slicing (by label, position, or boolean mask)
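
The Series items above can be tried with a minimal sketch like the following. The labels, values, and variable names are made up for illustration, and a reasonably recent pandas version is assumed.

```python
import numpy as np
import pandas as pd

# Build a Series from a dict: keys become the index, values the data.
s = pd.Series({"a": 1.0, "b": 3.0, "c": 2.0}, name="score")

# reindex introduces the new label "d"; fill_value replaces the NaN it would get.
r = s.reindex(["a", "b", "c", "d"], fill_value=0.0)

# add aligns on labels; fill_value stands in for a label missing on one side.
other = pd.Series({"a": 10.0, "d": 5.0})
total = s.add(other, fill_value=0.0)

# idxmax returns the label of the maximum, argmax its integer position.
top_label = s.idxmax()
top_pos = s.argmax()

# Nulls: isnull flags them, fillna replaces them, skipna controls aggregation.
with_nan = pd.Series([1.0, np.nan, 3.0])
n_missing = int(with_nan.isnull().sum())
filled = with_nan.fillna(0.0)
mean_skip = with_nan.mean()                 # nulls excluded by default
mean_noskip = with_nan.mean(skipna=False)   # NaN because a null is present
```

Note that `reindex`, `add`, and `fillna` all return new objects here; `s` and `with_nan` are left unchanged.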

2. DataFrame:
    - Creation (by what):
        - array object (numpy array)
        - dictionary containing key-list_of_values
        - dictionary of dictionary: outer -> column label, inner -> index label
        - dictionary of arrays, lists or tuples
        - dictionary of Series
        - list of dicts or Series
        - list of lists
    - Attributes:
        - index
            - Attributes: name
        - columns
            - Attributes: name
        - dtypes
        - values: return ndarray
    - Parameters:
        - data
        - index
        - columns
        - dtype
    - Methods:
        - head: return the first n rows (default -> 5)
        - tail: return the last n rows (default -> 5)
        - transpose (dataframe.T)
        - isnull
        - notnull
        - reindex:
            - Parameters:
                - index
                - columns
                - method: 'ffill' -> forward fill
                - fill_value: substitute value to use when introducing missing data by reindexing.
                - limit: when forward- or backfilling, maximum size gap (in number of elements) to fill.
                - tolerance: when forward- or backfilling, maximum size gap (in absolute numeric distance) to fill for inexact matches.
        - drop:
            - Parameters:
                - labels
                - axis: 0 -> along row (row labels), 1 -> along column (column labels)
                - inplace: if True, modify the DataFrame in place and return None instead of a new object
        - add, sub, div, mul:
            - Parameters:
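                - fill_value: value substituted for a cell that is missing from one operand before computing (the result is still NaN if it is missing from both)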
        - apply:
            - Parameters:
                - function
                - axis: 0 -> the function receives each column, 1 -> the function receives each row
        - applymap: apply a function to every element (renamed to DataFrame.map in pandas 2.1)
        - sort_index:
            - Parameters:
                - axis: 0 -> index label, 1 -> columns label
                - ascending: default -> True
        - sort_values:
            - Parameters:
                - axis: 0 -> index label, 1 -> columns label
                - ascending: default -> True
                - by: label or list of labels to sort by
        - rank:
            - Parameters:
                - ascending
                - method
                - axis
        - Aggregation functions: mean, sum, median, etc
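            - axis: 0 -> aggregate down the rows (one result per column), 1 -> aggregate across the columns (one result per row)
            - skipna: default -> True (nulls are excluded); if False, null values are not excluded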
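        - idxmax: returns the index label of the max value for each column (axis=0) or each row (axis=1)
        - idxmin: returns the index label of the min value for each column (axis=0) or each row (axis=1)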
        - cumsum: cumulative sum
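        - describe: returns summary statistics (count, mean, std, min, quartiles, max); on non-numeric data returns count, unique, top, and freq instead
        - argmax, argmin: Series methods that return the integer position; not available on a DataFrame, so call them on a single column or use idxmax/idxmin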
        - corr: correlations
        - cov: covariances
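        - corrwith: compute pairwise correlations between the rows or columns of the DataFrame and another Series or DataFrame
        - unique: Series method; call it on a single column, or use nunique to count the distinct values per column
        - value_counts: returns a Series counting each distinct row
        - isin: returns a boolean DataFrame
        - match: compute integer indices for each value in an array into another array of distinct values; helpful for data alignment and join-type operations (pd.match has been removed from recent pandas versions; Index.get_indexer covers the same use)
        - fillna: fill null values with a specified value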
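    - Accessing:
        - NumPy-style indexing (column selection with df[col], boolean masks)
        - loc: label-based
        - iloc: position-based
        - at: label-based access to a single scalar
        - iat: position-based access to a single scalar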
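
A short sketch of the DataFrame accessors and the `axis` convention listed above. The frame contents and variable names are hypothetical and only serve the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [3, 1, 2], "y": [10.0, 30.0, 20.0]},
    index=["r1", "r2", "r3"],
)

# Label-based vs position-based access to the same cell.
by_label = df.loc["r2", "y"]
by_pos = df.iloc[1, 1]
scalar = df.at["r3", "x"]      # single-value lookup by label

# axis=0: the function receives each column; axis=1: it receives each row.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)
row_sum = df.apply(lambda row: row["x"] + row["y"], axis=1)

# sort_values orders the rows by the column named in `by`.
sorted_df = df.sort_values(by="x")

# drop returns a new DataFrame unless inplace=True is passed.
no_y = df.drop(labels="y", axis=1)

# Aggregating with axis=0 yields one result per column.
col_means = df.mean(axis=0)
```

Because `drop` is called without `inplace=True`, `df` still has both columns afterwards.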

3. Index Objects:
    - Creation:
        - list, tuple
    - Methods:
        - append: concatenate additional Index object, return a new object
        - difference: compute set difference
        - intersection: compute set intersection
        - union: compute set union
        - isin: returns boolean array with parameter arraylike
        - is_monotonic_increasing: returns True if each element is greater than or equal to the previous element (the older is_monotonic alias was removed in pandas 2.0)
        - is_unique
        - unique
        - get_indexer: returns the integer position of each given label in the index, or -1 for a label that is not found
    - Attributes: 
        - name
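
The set-style Index methods above can be sketched as follows. The two example indexes are hypothetical.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"], name="key")
right = pd.Index(["b", "c", "d"])

# Set operations return new Index objects; the inputs are not modified.
both = left.intersection(right)
either = left.union(right)
only_left = left.difference(right)

# get_indexer maps labels to integer positions; -1 marks a label not found.
positions = left.get_indexer(["c", "a", "z"])

# isin returns a boolean array aligned with the index.
mask = left.isin(["a", "c"])

ordered = left.is_monotonic_increasing
no_duplicates = left.is_unique
```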

4. Common functions:
    - Most instance methods also have an equivalent top-level function, e.g. pd.isnull(obj) for obj.isnull()
    - null:
        - isnull
        - notnull
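
As a small illustration of the method and top-level function pairing, the sketch below compares `Series.isnull` with `pd.isnull`. The sample values are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# The instance method and the top-level function give the same result.
same = s.isnull().equals(pd.isnull(s))

# The top-level functions also accept scalars and plain NumPy arrays.
scalar_check = pd.isnull(None)
array_check = pd.notnull(np.array([1.0, np.nan]))
```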

5. Data loading, Storage, and File Formats:
    - Parsing functions:
        - read_csv: Load delimited data from a file, URL, or file-like object; use comma as default delimiter
            - Parameters:
                - filepath_or_buffer: file path, URL, or file-like object
                - header: row number to use as the column labels (default -> inferred, normally the first row); None if the file has no header row
                - sep: separator
                - names: column labels
                - index_col: column(s) to use as the index; if a list is provided, the index will be hierarchical
                - skiprows: number of rows to skip at the start of the file, or a list of row numbers to skip
                - na_values: extra values to treat as null; a scalar, a list, or a dict (key as column label, value as list of elements)
                - nrows: number of rows to read from the start of the file
                - chunksize: number of rows per chunk; returns a TextFileReader that yields one DataFrame per chunk
        - read_table: Load delimited data from a file, URL, or file-like object; use tab ('\t') as default delimiter
        - read_fwf: Read data in fixed-width column format (i.e., no delimiters)
        - read_clipboard: Version of read_table that reads data from the clipboard; useful for converting tables from web pages
        - read_excel: Read tabular data from an Excel XLS or XLSX file
        - read_hdf: Read HDF5 files written by pandas
        - read_html: Read all tables found in the given HTML document
        - read_json: Read data from a JSON (JavaScript Object Notation) string representation
        - read_msgpack: Read pandas data encoded using the MessagePack binary format (removed in pandas 1.0)
        - read_pickle: Read an arbitrary object stored in Python pickle format
        - read_sas: Read a SAS dataset stored in one of the SAS system’s custom storage formats
        - read_sql: Read the results of a SQL query (using SQLAlchemy) as a pandas DataFrame
        - read_stata: Read a dataset from Stata file format
        - read_feather: Read the Feather binary file format
    - Write (instance methods of DataFrame):
        - to_csv:
            - Parameters:
                - path_or_buf: output file path or file-like object
                - sep: separator
                - na_rep: text written in place of missing values
                - index: if False, the index is not written
                - header: if False, the header row is not written
                - columns: list of column labels to write
    - JSON file:
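        - json module (Python standard library)
        - json.loads: parse a JSON str into Python objects such as dict or list (json.load does the same from a file object)
        - json.dumps: serialize Python objects back to a JSON str
        - pd.read_json: read JSON directly into a DataFrame, in case the JSON is well structured
        - df.to_json: write a DataFrame as JSON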
    - HTML:
        - pd.read_html
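
The loading and writing parameters above can be exercised without touching the disk by using in-memory buffers. This is a minimal sketch: the sample text, the column names, and the "missing" marker are all assumptions made for the example.

```python
import io
import json

import pandas as pd

csv_text = "id;name;score\n1;ann;9.5\n2;bob;missing\n3;cy;7.0\n"

# sep, header, index_col, and na_values as described in the list above.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    header=0,                          # first row holds the column labels
    index_col="id",
    na_values={"score": ["missing"]},  # per-column null markers
)

# chunksize returns an iterator that yields one DataFrame per chunk.
reader = pd.read_csv(io.StringIO(csv_text), sep=";", chunksize=2)
chunk_sizes = [len(chunk) for chunk in reader]

# to_csv writes to any file-like object; na_rep is the text for nulls.
out = io.StringIO()
df.to_csv(out, sep=",", na_rep="NULL", index=False, columns=["name", "score"])
written_lines = out.getvalue().splitlines()

# JSON: the json module handles str <-> Python objects,
# pandas handles well-structured JSON <-> DataFrame.
json_text = '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'
records = json.loads(json_text)
json_df = pd.read_json(io.StringIO(json_text))
round_trip = json.loads(json_df.to_json(orient="records"))
```

Wrapping the text in `io.StringIO` keeps the example self-contained; with real data the same calls take a file path instead.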