# Pandas Data Structures Cheat Sheet

## Part I: Series

*`s = pd.Series()`*

- **1. Characteristics/Attributes of a Series:**
    - `s.name`
    - `s.index`
    - `s.values`
- **2. Accessing Series Elements:**
    - Indexing by integer position: `s.iloc[start:end:step]` (`end` is excluded)
        - *reverse* $\rightarrow$ `s.iloc[::-1]`
    - Indexing by label: `s.loc["index1":"index2"]` (both endpoints are included)
- **3. Boolean Indexing:**
    - Elements greater than a value: `s.loc[s > n]`
    - Elements between two values: `s.loc[(s > n) & (s < m)]`
    - Condition on the index: `s.loc[s.index > "index_name1"]`
- **4. Indexing and Time-Series:**
    - **DateTime Index:** `pd.date_range("startdate", periods=n, freq="D")`
    - **Resampling:**
        - Resample the Datetime Index by Frequency: `resampled = s.resample("M")` (the month-end alias is `"ME"` in pandas 2.2 and later)
        - Aggregate the Resampling by Function: `resampled.agg(function)`
    - **Re-indexing:**
        - Create a date range with the new frequency: `new_range = pd.date_range("startdate", periods=n, freq="D")`
        - Re-index the Series with the new date range, filling gaps forward or backward: `new_series = s.reindex(new_range, method="ffill")` (or `method="bfill"`)
- **5. Missing Values:**
    - `s.dropna()`
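
The Series operations listed above can be tried together in one short session. This is only a minimal sketch: the dates, values, and the name `"demo"` are made-up illustration data, and the 5-day and 12-hour frequencies are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten daily observations with values 0.0 .. 9.0.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10, dtype=float), index=idx, name="demo")

# 1. Attributes
print(s.name, s.index[0], s.values[:3])

# 2. Positional and label-based slicing
first_three = s.iloc[0:3]                     # positions 0, 1, 2 (end excluded)
reversed_s = s.iloc[::-1]                     # whole Series in reverse order
by_label = s.loc["2024-01-02":"2024-01-04"]   # both endpoints included

# 3. Boolean indexing on the values and on the index
between = s.loc[(s > 2) & (s < 6)]
late = s.loc[s.index > "2024-01-08"]

# 4. Resample into 5-day bins and aggregate each bin
five_day_sum = s.resample("5D").sum()

# 4. Re-index onto a finer 12-hour grid, forward-filling the gaps
new_range = pd.date_range("2024-01-01", periods=4, freq="12h")
filled = s.reindex(new_range, method="ffill")

# 5. Missing values: introduce one NaN, then drop it
with_gap = s.copy()
with_gap.iloc[1] = np.nan
cleaned = with_gap.dropna()
```

Note that `dropna()` and `reindex()` return new objects, so the original `s` is left unchanged.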


## Part II: DataFrames

*`df = pd.DataFrame()`*

- **1. Creating a DataFrame:** `pd.DataFrame(matrix, index, columns)`
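
As a small illustration of the constructor, the sketch below builds a DataFrame from a NumPy matrix. The matrix contents and the labels `r1`..`r3`, `A`, `B` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 matrix: [[0, 1], [2, 3], [4, 5]]
matrix = np.arange(6).reshape(3, 2)

# Row labels go in `index`, column labels in `columns`.
df = pd.DataFrame(matrix, index=["r1", "r2", "r3"], columns=["A", "B"])

print(df)
```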

- **2. Selecting Data:**
    - **a. Selecting Columns:**
       - Single Column: `df["Column name"]`
       - Multiple Columns (list): `df[["Col name 1", "Col name 2"]]` 
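       - Attribute (dot) access: `df.column_name` (works only when the name is a valid Python identifier and does not clash with a DataFrame attribute or method)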
    - **b. Selecting Rows:**
        - Based on Row Name: `df.loc["Row Name"]`
        - Based on Row Position: `df.iloc[position]`
    - **c. Subsetting Rows and Columns:**
        - Single value: `df.loc["Row Name", "Column Name"]`
        - Subset of listed rows and columns: `df.loc[["Row Name 1", "Row Name 2"], ["Column Name 1", "Column Name 2"]]`
    - **d. Conditional Selection:**
        - *First Step:* Generate the boolean Series for the condition: `df["col"] > n`
        - *Second Step:* Pass the boolean Series to the DataFrame to keep only the matching rows: `df[df["col"] > n]`
        - Multiple Conditions (wrap each condition in parentheses): `df[(df["Col1"] > n) & (df["Col2"] < m)]`
        - Selecting a Column after Conditioning: `df[df["col1"] > m]["col2"]`, or in a single step `df.loc[df["col1"] > m, "col2"]`
        - Selecting a sub-matrix after conditioning: `df[df["col1"] > m][["col2","col3"]]`, or in a single step `df.loc[df["col1"] > m, ["col2", "col3"]]`
    - **e. Creating New Data:** 
        - `df["New Column"] = df["Col1"] + df["Col2"]`
    - **f. Deleting Data:**
        - Drop a Row: `df.drop("row name", axis = 0, inplace=True)`

        - Drop a Column: `df.drop("col name", axis = 1, inplace = True)`
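
The selection, creation, and deletion patterns above can be checked on a toy table. This is a sketch with invented labels (`r1`..`r4`, `A`, `B`, `C`) and values; it uses the non-mutating form of `drop`, which returns a new DataFrame instead of modifying `df` in place.

```python
import pandas as pd

# Hypothetical 4x3 table.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [10, 20, 30, 40], "C": [5, 6, 7, 8]},
    index=["r1", "r2", "r3", "r4"],
)

# a. Columns
col = df["A"]                  # one column as a Series
cols = df[["A", "B"]]          # several columns as a DataFrame

# b. Rows
row_by_label = df.loc["r2"]
row_by_pos = df.iloc[1]        # same row, selected by position

# c. Rows and columns together
cell = df.loc["r2", "B"]
block = df.loc[["r1", "r3"], ["A", "C"]]

# d. Conditional selection
mask = (df["A"] > 1) & (df["B"] < 40)    # boolean Series
filtered = df[mask]
b_where = df.loc[df["A"] > 2, "B"]       # one column of the matching rows
sub = df.loc[df["A"] > 2, ["B", "C"]]    # sub-matrix of the matching rows

# e. New column computed from existing ones
df["D"] = df["A"] + df["B"]

# f. Dropping a row and a column (each call returns a new DataFrame)
without_row = df.drop("r1", axis=0)
without_col = df.drop("C", axis=1)
```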

- **3. Indexing:**
    - **a. Resetting an Index:**
        - `df.reset_index()` (returns a new DataFrame with the old index moved into a column)
    - **b. Setting an Index:**
        - Create a new Index: `new_index = " ... ".split()`
        - Create a new Column as the new Index: `df["new index"] = new_index`
        - Set the new column as the index: `df = df.set_index("new index")`
    - **c. Multi-Indexing:** Two Factor Index:
        - **Step 1:** Generate the two index levels so that each first-level label is paired with every second-level label

            - *The first index:* `["G1", "G1", "G1", "G2", "G2", "G2", ...]`

            - *The second index:* `[1, 2, 3, 1, 2, 3, ...]`

        - **Step 2:** `zip` the two lists and wrap the result in `list(...)` to get the (outside, inside) combinations, i.e. one `(i, j)` tuple per row

            - `[('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]`
        - **Step 3:** Generate a multi-index from the list of tuples
            - `hier_index = pd.MultiIndex.from_tuples(list_of_tuples)`
        - **Step 4:** Apply the multi-index to a DataFrame object

            - `df = pd.DataFrame(np.random.randn(6,2), index = hier_index, columns = "A B".split())`

        - **Step 5:** Name the index levels by assigning to the `names` attribute
            - `df.index.names = ["first level name", "second level name"]`
        - **Selecting from a Multi-Index DataFrame:**

            - a. `df.loc["first level label"].loc["second level label"]`
            - b. `df.xs(("first level label", "second level label"))`
            - c. Access one second-level label across all first-level labels
                - `df.xs("second level label", level="second level name")`