# "Pandas"
> "Collection of all my resources, snippets and tricks for pandas"

- author: Christopher Thiemann
- toc: true
- branch: master
- badges: true
- comments: true
- categories: [python, ]
- hide: false
- search_exclude: true

```python
#hide
import warnings

import numpy as np
import scipy as sp
import sklearn
import statsmodels.api as sm
from statsmodels.formula.api import ols

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import seaborn as sns
# sns.set() resets the plotting context to "notebook", so call it first
# and apply the "poster" context and the "whitegrid" style afterwards
sns.set(rc={'figure.figsize': (16, 9.)})
sns.set_context("poster")
sns.set_style("whitegrid")

import pandas as pd
pd.set_option("display.max_rows", 120)
pd.set_option("display.max_columns", 120)
```


## Pandas Pipe

Pandas' `pipe` method lets you write clean data preparation steps. Instead of having variables like `df1`, `df2`, ... flying around, `pipe` chains a series of function calls on one DataFrame. The mental model goes like this:

`df -> apply function -> apply function -> ...`
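A minimal sketch of this mental model on a made-up DataFrame; `add_total` and `keep_large` are hypothetical steps invented for illustration. `df.pipe(f)` calls `f(df)` and passes any extra arguments on to `f`, so the chain reads in the order the steps run instead of inside-out:

```python
import pandas as pd

def add_total(df):
    # Hypothetical step: add a column with the row-wise sum of "a" and "b"
    return df.assign(total=df["a"] + df["b"])

def keep_large(df, threshold):
    # Hypothetical step: keep only rows whose total exceeds `threshold`
    return df[df["total"] > threshold]

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Nested calls have to be read inside-out ...
nested = keep_large(add_total(df), threshold=15)

# ... while .pipe reads top to bottom; extra keyword arguments are passed on
piped = (df
         .pipe(add_total)
         .pipe(keep_large, threshold=15))

print(piped)
```

Both versions produce the same DataFrame; the piped one simply lists the steps in execution order.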

Separating out each step as a function has the advantage that you can save the steps in a separate Python file and test them with unit tests. Below is a small example which illustrates the functionality. Note that a step might modify the original DataFrame in place; that is why the first step in the pipeline just returns a copy (there is probably a better way to do it). Secondly, it should be possible to use the `logging` module to get better insight into what each pipeline step does.

Source: https://calmcode.io/pandas-pipe/end.html
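One way to follow up on the logging idea is a small decorator that reports the DataFrame shape before and after each step. This is only a sketch built on the standard `logging` module; `log_step` and `drop_missing` are hypothetical names, not part of pandas:

```python
import logging
from functools import wraps

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def log_step(func):
    """Hypothetical decorator: log the DataFrame shape before and after a step."""
    @wraps(func)  # keep the step's name so the log message can use it
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        logger.info("%s: %s -> %s", func.__name__, df.shape, result.shape)
        return result
    return wrapper


@log_step
def drop_missing(df):
    # Hypothetical step: drop rows that contain missing values
    return df.dropna()


df = pd.DataFrame({"x": [1.0, None, 3.0]})
clean = df.pipe(drop_missing)
```

Running this should emit a log line along the lines of `drop_missing: (3, 1) -> (2, 1)`. Note that `basicConfig` may have no effect if logging was already configured elsewhere, e.g. by the notebook environment.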

```python
# read_html returns a list with one DataFrame per HTML table on the page
list_df = pd.read_html("https://de.wikipedia.org/wiki/Liste_der_L%C3%A4nder_nach_Bruttoinlandsprodukt?oldformat=true")
```

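```python
def deal_first_col(df_pipe):
    # Give the scraped columns readable names and drop the first one
    df_pipe.columns = ['drop', 'Land', 'BIP in MIO US $ 2018', 'veränderung']
    return df_pipe.iloc[:, 1:]

def make_copy(df_pipe):
    # Return a copy so later steps cannot modify the original DataFrame
    return df_pipe.copy()

def set_dtypes(df_pipe, dtype_dict):
    # Clean the German number formatting before casting.
    # regex=False treats every pattern as a literal string, so "." is not a
    # regex wildcard and "\xa0" (non-breaking space) is matched directly.
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace(",", ".", regex=False)  # decimal comma
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace("\xa0", "", regex=False)
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace("%", "", regex=False)
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace("−", "-", regex=False)  # Unicode minus

    # Remove the dots used as thousands separators
    df_pipe['BIP in MIO US $ 2018'] = df_pipe['BIP in MIO US $ 2018'].str.replace(".", "", regex=False)

    return df_pipe.astype(dtype_dict)
```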
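Because each step is a plain function, it can be tested in isolation. Below is a sketch of a pytest-style test, assuming the percentage cleaning from `set_dtypes` is pulled out into a hypothetical helper `parse_german_percent`; the input strings are made up but mimic the German number format handled above:

```python
import pandas as pd


def parse_german_percent(s: pd.Series) -> pd.Series:
    """Hypothetical helper: turn strings like '5,80 %' (with a non-breaking
    space before the %) into floats."""
    return (s.str.replace(",", ".", regex=False)     # decimal comma -> point
             .str.replace("\xa0", "", regex=False)   # non-breaking space
             .str.replace("%", "", regex=False)
             .str.replace("−", "-", regex=False)     # Unicode minus -> hyphen
             .astype(float))


def test_parse_german_percent():
    raw = pd.Series(["5,80\xa0%", "−0,70\xa0%"])
    assert parse_german_percent(raw).tolist() == [5.8, -0.7]


if __name__ == "__main__":
    test_parse_german_percent()
```

Saved in a separate file such as a hypothetical `test_pipeline.py`, pytest would pick up `test_parse_german_percent` automatically.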



```python
df = list_df[0]

(df
    .pipe(make_copy)
    .pipe(deal_first_col)
    .dropna()
    .pipe(set_dtypes, {'BIP in MIO US $ 2018': int,
                       'veränderung': float})
    )
```
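The pipeline starts with `make_copy` because `deal_first_col` assigns to `df_pipe.columns` in place. An alternative, sketched here as an assumption rather than the source's approach, is to write steps that never mutate their input, for example with `set_axis`, which returns a new DataFrame. The function name and the toy frame are hypothetical:

```python
import pandas as pd

def rename_and_drop_first(df_pipe):
    # Hypothetical non-mutating version of deal_first_col:
    # set_axis returns a new DataFrame, so the input keeps its column labels
    return (df_pipe
            .set_axis(['drop', 'Land', 'BIP in MIO US $ 2018', 'veränderung'], axis=1)
            .drop(columns='drop'))

# Toy input with the same four-column layout as the scraped table
raw = pd.DataFrame([[1, "A", "1.000", "1,0 %"]])
clean = raw.pipe(rename_and_drop_first)

print(list(raw.columns))    # the original keeps its default labels
print(list(clean.columns))  # the result has the renamed columns
```

With steps written this way, the explicit copy at the start of the chain is no longer needed for these operations.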


|     | Land | BIP in MIO US $ 2018 | veränderung |
|-----|------|---------------------:|------------:|
| 0 | Welt | 84929508 | 5.80 |
| 1 | Vereinigte Staaten | 20580250 | 5.43 |
| 3 | Europäische Union | 18736855 | 4.67 |
| 4 | Volksrepublik ChinaA1 | 13368073 | 10.83 |
| 5 | Japan | 4971767 | 2.30 |
| ... | ... | ... | ... |
| 194 | Palau | 284 | -0.70 |
| 195 | Marshallinseln | 214 | 2.88 |
| 196 | Kiribati | 189 | 1.61 |
| 197 | Nauru | 112 | 1.82 |


## Helper Functions

## Plot for the Blog Post

## Sources
- [x] https://calmcode.io/pandas-pipe/end.html
- [ ] https://www.dataschool.io/python-pandas-tips-and-tricks/
- [ ] https://www.kaggle.com/python10pm/pandas-100-tricks
- [ ] https://realpython.com/python-pandas-tricks/
- [ ] https://github.com/BrendaHali/python_cheat_sheets/blob/master/pandas-cheat-sheet.ipynb
- [ ] https://github.com/Zsailer/pandas_flavor
- [ ] https://github.com/pandera-dev/pandera
- [ ] https://github.com/firmai/pandasvault
- [ ] https://github.com/pandas-ml/pandas-ml
- [ ] https://github.com/TMiguelT/PandasSchema
- [ ] https://github.com/PatrikHlobil/Pandas-Bokeh
- [ ] https://github.com/pdpipe/pdpipe
- [ ] https://github.com/vaexio/vaex
- [ ] siuba
- [ ] ibis
- [ ] https://github.com/modin-project/modin
- [ ] https://github.com/nalepae/pandarallel
- [ ] https://github.com/engarde-dev/engarde
- [ ] https://github.com/santosjorge/cufflinks
- [ ] https://github.com/ZaxR/bulwark
- [ ] https://github.com/pandas-profiling/pandas-profiling
- [ ] https://github.com/pandas-dev/pandas
- [ ] https://github.com/kieferk/dfply
- [ ] https://github.com/pydata/pandas-datareader
- [ ] https://github.com/man-group/dtale
- [ ] https://github.com/jvns/pandas-cookbook
- [ ] https://twitter.com/data_cheeves/status/1183464943149965312
- [ ] https://github.com/jmcarpenter2/swifter/blob/master/examples/swifter_apply_examples.ipynb
- [ ] https://github.com/firmai/pandapy
- [ ] https://github.com/IntelPython/sdc
- [ ] https://github.com/chezou/tabula-py
- [ ] https://tomaugspurger.github.io/archives.html
- [ ] https://github.com/TomAugspurger/effective-pandas
- [ ] https://twitter.com/jschwabish/status/1290323581881266177
- [ ] https://twitter.com/TedPetrou/status/1282378990561439746
- [ ] https://www.allthesnippets.com/search/
- [ ] https://github.com/yhat/pandasql/

