# <span style="color:maroon">**Manipulating DataFrames with Pandas**</span>

## <span style="color:blue">**Functions and Methods**</span>

#### DataFrame slicing
`df["column_label"]["row_label"]`  
`df.loc["row_label", "column_label"]`  
`df.iloc[row_index, column_index]`
  
`df["column_label"]` **--> pandas Series**  
`df[["column_label"]]` **--> pandas DataFrame**  
`df[["column_label_1", "column_label_2", ...]]` **--> pandas DataFrame with only the column labels provided**  

`df["row_start":"row_end"]` **--> pandas DataFrame for rows between start and end inclusive**  
`df["row_end":"row_start":-1]` **--> pandas DataFrame for rows between start and end in reverse order inclusive**  
`df.loc[:, :"column_rightmost"]` **--> pandas DataFrame for all rows, columns from the leftmost through column_rightmost inclusive**  
`df.loc[:, "column_start":"column_end"]` **--> pandas DataFrame for all rows, columns between start and end inclusive**  
`df.loc[row_list, column_list]` **--> pandas DataFrame for labels in row_list and column_list (this is a highly configured slice)**  
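
A minimal sketch of the slicing forms above. The DataFrame, its labels, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: rows are months, columns are products.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12.0, 50.0, 89.0], "spam": [17, 31, 72]},
    index=["Jan", "Feb", "Mar"],
)

series = df["salt"]      # single brackets return a Series
frame = df[["salt"]]     # double brackets return a one-column DataFrame

by_label = df.loc["Feb", "salt"]   # lookup by row and column label
by_position = df.iloc[1, 1]        # same cell, looked up by integer position

rows = df["Jan":"Feb"]             # label slice on rows, both ends included
cols = df.loc[:, "eggs":"salt"]    # label slice on columns, both ends included
picked = df.loc[["Jan", "Mar"], ["eggs", "spam"]]  # explicit lists of labels
```

Note that label slices include the end label, unlike positional slices with `.iloc`.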

#### When a boolean series is used to slice a dataframe, it is called a filter
`boolean_series = df["column_label"] > value`  
`df_filtered = df[boolean_series]`  
`df.loc[boolean_series, "column_label_1"] = assign_new_value` **--> assign a new value to a column using a row-based boolean filter (use .loc rather than chained indexing, which may assign to a copy)**  
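
A small sketch of filtering and filtered assignment, using made-up data. The assignment goes through `.loc` so that it writes to `df` itself.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12.0, 50.0, 89.0]},
    index=["Jan", "Feb", "Mar"],
)

high_salt = df["salt"] > 40    # boolean Series, one True/False per row
df_filtered = df[high_salt]    # keeps only the rows where the filter is True

# Set "eggs" to 0 in the rows selected by the filter.
df.loc[high_salt, "eggs"] = 0
```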

#### Dropping data
`df.dropna(how="any")` **--> drops rows in df where any column value is NaN**  
`df.dropna(how="all")` **--> drops rows in df where all column values are NaN**  
`df.dropna(thresh=threshold_value, axis="columns")` **--> drops columns with fewer than threshold_value non-missing values**  
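
A quick sketch of the three `dropna` calls on a made-up DataFrame, to show which rows and columns survive.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely missing, column "b" has one value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, np.nan, np.nan, np.nan],
    "c": [1.0, np.nan, 3.0, 4.0],
})

any_dropped = df.dropna(how="any")   # only row 0 has no NaN
all_dropped = df.dropna(how="all")   # only row 1 is NaN in every column

# Keep columns that have at least 2 non-missing values.
cols_kept = df.dropna(thresh=2, axis="columns")
```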

#### Transforming data
`df.apply(func)` **--> applies func to each column of df (each column is passed to func as a Series); use axis="columns" to apply it to each row instead**  

##### <span style="color:red">**Example of using .map()**</span>
`red_vs_blue = {"Obama": "blue", "Romney": "red"}` **--> dictionary with keys corresponding to the categorical values that you want to map**  
`election["color"] = election["winner"].map(red_vs_blue)` **--> use the dictionary to map the "winner" column to the new column "color"**  
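
A runnable version of the `.map()` example. The `election` DataFrame here is a tiny stand-in with made-up rows, not the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the election data.
election = pd.DataFrame(
    {"winner": ["Obama", "Romney", "Obama"]},
    index=["county_a", "county_b", "county_c"],
)

red_vs_blue = {"Obama": "blue", "Romney": "red"}
election["color"] = election["winner"].map(red_vs_blue)

# Values that are not keys in the dictionary come back as NaN.
unmapped = pd.Series(["Obama", "Someone else"]).map(red_vs_blue)
```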

##### <span style="color:red">**Vectorizing over looping**</span>
`df.floordiv(12)` **--> this is a pandas method that utilizes vectorization**  

`numpy.floor_divide(df, 12)` **--> this is a numpy ufunc (universal function) that also utilizes vectorization**  

>`def dozens(n):  
      return n // 12`
>
>`df.apply(dozens)`  

`df.apply(lambda n: n // 12)`
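
A sketch comparing the four approaches above on made-up data. All four should give the same result here, because `//` on a whole column is itself vectorized.

```python
import numpy as np
import pandas as pd

# Hypothetical counts for illustration.
df = pd.DataFrame({"eggs": [47, 110, 221], "spam": [17, 31, 72]})

def dozens(n):
    # n is a whole column (a Series) when called through df.apply
    return n // 12

via_method = df.floordiv(12)               # pandas method
via_ufunc = np.floor_divide(df, 12)        # numpy ufunc
via_apply = df.apply(dozens)               # named function, column by column
via_lambda = df.apply(lambda n: n // 12)   # same thing with a lambda
```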

##### Vectorized string methods work on a pandas Series or Index as well
`df.index = df.index.str.upper()`  

`df.index = df.index.map(str.upper)`  **--> for a DataFrame index, .map applies a function to the elements in an index**  
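
A short sketch showing that both forms produce the same upper-cased index. The data is made up.

```python
import pandas as pd

# Hypothetical data with lower-case row labels.
df = pd.DataFrame({"eggs": [47, 110]}, index=["jan", "feb"])

upper_str = df.index.str.upper()      # vectorized string method
upper_map = df.index.map(str.upper)   # function applied to each label

df.index = upper_str                  # replace the whole index
```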

##### Example of using a function in a vectorized manner
>`from scipy.stats import zscore  
 turnout_zscore = zscore(election["turnout"])  
 election["turnout_zscore"] = turnout_zscore`  
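
For reference, a pandas-only sketch of the same z-score calculation, on made-up turnout numbers. It assumes the population standard deviation (`ddof=0`), which is the default that `scipy.stats.zscore` uses.

```python
import pandas as pd

# Hypothetical turnout percentages.
election = pd.DataFrame({"turnout": [60.0, 70.0, 80.0]})

turnout = election["turnout"]
# Subtract the mean and divide by the population standard deviation.
election["turnout_zscore"] = (turnout - turnout.mean()) / turnout.std(ddof=0)
```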

## <span style="color:blue">**Advanced Indexing**</span>

#### Key building blocks of pandas data structures
1. indexes: sequences of labels, immutable (to modify the index you need to replace the whole index)
2. Series: 1D array with an associated index
3. DataFrames: 2D array with Series as columns

##### <span style="color:red">**You should try to create data structures where indexes are unique (although this is not a requirement)**</span>

##### <span style="color:red">**Modifying an entire index**</span>
>`new_idx = [x.upper() for x in df.index]`  
`df.index = new_idx`  
`df.index.name = "index_name_label"`  
`df.columns.name = "columns_name_label"`  

##### <span style="color:red">**Creating an index from scratch**</span>
`index_list` **--> list that you want to use to generate an index**  
`df.index = index_list`  
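
A small sketch of what "immutable" means in practice, using made-up data: changing one label in place fails, while replacing the whole index works.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"eggs": [47, 110]}, index=["jan", "feb"])

try:
    df.index[0] = "JAN"    # item assignment on an Index is not allowed
    mutated = True
except TypeError:
    mutated = False

# Replace the whole index instead, then name the index and the columns.
df.index = [label.upper() for label in df.index]
df.index.name = "month"
df.columns.name = "product"
```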

##### <span style="color:red">**Hierarchical index (multi-index)**</span>
`df.loc[("outer_index_label", "inner_index_label")]` **--> retrieves the relevant rows of df (pass a tuple; a list would be read as several outer labels)**  
`df["outer_index_row_label_start":"outer_index_row_label_end"]` **--> retrieves relevant rows between start and end inclusive**  
`df = df.set_index(["outer_index_label", "inner_index_label"])` **--> sets the multi-index for df**  
`df = df.sort_index()`
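
A sketch of building and sorting a multi-index, then selecting on the outer level. The column names and prices are made up. Sorting first is a safe habit, since label slices on a multi-index generally expect a sorted index.

```python
import pandas as pd

# Hypothetical stock data, deliberately out of order.
stocks = pd.DataFrame({
    "symbol": ["MSFT", "AAPL", "MSFT", "AAPL"],
    "day": [2, 1, 1, 2],
    "close": [57.2, 113.0, 57.4, 112.5],
})

# "symbol" becomes the outer level and "day" the inner level.
stocks = stocks.set_index(["symbol", "day"]).sort_index()

outer = stocks.loc["AAPL"]                # all rows under one outer label
outer_range = stocks.loc["AAPL":"MSFT"]   # outer label slice, both ends included
```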

##### Accessing the outermost index works like single index slicing
##### Accessing inner index levels requires a tuple, often combined with slice() (this takes some practice)
`df.loc[("outer_index_label", "inner_index_label")]` **--> observe that a tuple is being passed to the indexer**  
`df.loc[(["outer_index_labels"], "inner_index_label"), :]` **--> the tuple is defining the rows within the multi-index**  
`df.loc[(slice(None), 2), :]` **--> slice(None) selects every label on the outer index, so only the inner index is filtered**  
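
A sketch of the three tuple-based selections above on a small made-up multi-index (outer level `symbol`, inner level `day`).

```python
import pandas as pd

# Hypothetical sorted multi-index: every symbol paired with every day.
index = pd.MultiIndex.from_product(
    [["AAPL", "MSFT"], [1, 2]], names=["symbol", "day"]
)
stocks = pd.DataFrame({"close": [113.0, 112.5, 57.4, 57.2]}, index=index)

# One (outer, inner) pair: returns that row as a Series.
one_row = stocks.loc[("AAPL", 2)]

# A list of outer labels with one inner label.
several_outer = stocks.loc[(["AAPL", "MSFT"], 2), :]

# slice(None) means "every outer label"; only the inner level is filtered.
all_outer = stocks.loc[(slice(None), 2), :]
```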


## <span style="color:blue">**Rearranging and Reshaping Data**</span>

##### <span style="color:red">**Pivot a dataframe**</span>
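`df_pivoted = pandas.pivot(data=df, index="index_column", columns="columns_column", values="values_column")`  
`df_pivoted = pandas.pivot(data=df, index="index_column", columns="columns_column")` **--> with values omitted, pivots all remaining columns in df not specified in the index and columns parameters**  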
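
A sketch of both pivot calls on a made-up long-format table. The column names (`weekday`, `city`, `visitors`, `signups`) are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per (weekday, city) pair.
users = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
    "signups": [7, 12, 3, 5],
})

# One value column: rows are weekdays, columns are cities.
visitors = pd.pivot(data=users, index="weekday", columns="city", values="visitors")

# values omitted: every remaining column is pivoted, giving two-level columns.
everything = pd.pivot(data=users, index="weekday", columns="city")
```

Note that `pivot` raises an error if an index/columns pair appears more than once, so each pair must be unique.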


## <span style="color:blue">**Grouping Data**</span>

## <span style="color:blue">**Bringing It All Together**</span>