# Creating Your Own Python Modules & Packages

- 05/13/22

# Part 1: Reviewing Functions

- Concepts to review:
    - Defining Function Arguments/Parameters:
        - Positional Arguments
        - Keyword Arguments

    - Return Statements

    - Function Scope

    - Docstrings
    
- We will start with some example code from the LP that would be perfect for a function.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

url = 'https://docs.google.com/spreadsheets/d/1VMaw2oCn0ABitd-alLAEsEhGS1Je2UFNLu76TKrIH7w/gviz/tq?tqx=out:csv&sheet=Raw_Medical_Data_for_day1'
df = pd.read_csv(url, index_col = 0)
df.head()


In [None]:
# Code from: https://login.codingdojo.com/m/376/12533/89502
col = 'Income'
feature = df[col]

In [None]:
## Turn the code below into a function:
mean = feature.mean()
median = feature.median()
std = feature.std()
plus_one_std = mean + std
minus_one_std = mean - std
fig,ax = plt.subplots(figsize=(10,6))
sns.histplot(feature ,ax=ax,stat='probability')
ax.axvline(plus_one_std, color = 'black',label=f'+1 std = {plus_one_std:,.2f}')
ax.axvline(minus_one_std, color = 'black', label = f'-1 std = {minus_one_std:,.2f}')
ax.axvspan(plus_one_std, minus_one_std, color = 'yellow', zorder = 0)
ax.set_title(f'{col}')
ax.legend();

>- Turn the above code into a function.
    - Make it more flexible (e.g., let the caller change `figsize`)
    - Make it return the figure
    

In [None]:
## turn the above code into a function
 

# Part 2: Moving Your Functions to a Py File


### Move Code To an External File

- Open the repository in VS Code. 
- Create a new py file in the same folder as the notebook (we will call ours `my_functions.py`)
- Copy over the function(s) from your notebook into the .py file. 
    - Make sure to save the file after adding code.
    
- In your notebook, import the file by name (just drop the `.py`), just like you do with pandas.
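A self-contained sketch of what that import does, assuming a hypothetical `my_functions.py` that holds a single `add_one` function. Jupyter normally starts the kernel in the notebook's folder, so a `.py` file saved next to the notebook is found on import. To run anywhere, the example mimics that by writing the file into a temporary folder and adding that folder to `sys.path`; in your real notebook only the `import my_functions as mf` line is needed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Temporary folder standing in for the folder that holds your notebook.
repo_dir = Path(tempfile.mkdtemp())

# Hypothetical contents of my_functions.py.
(repo_dir / "my_functions.py").write_text(
    "def add_one(x):\n"
    "    \"\"\"Return x + 1.\"\"\"\n"
    "    return x + 1\n"
)

# Make the folder searchable on import (Jupyter already does this for the notebook's folder).
sys.path.insert(0, str(repo_dir))
importlib.invalidate_caches()  # let the import system notice the brand-new file

import my_functions as mf  # file name without ".py", aliased like pandas -> pd

print(mf.add_one(41))
```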


>- Thanks to Alexis for letting me use her function as an example! 
    - Source: https://github.com/adeviney/data-enrichment-project/blob/main/Distributions/Describing%20Distributions.ipynb 


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from scipy import stats

In [None]:
#  Source: https://github.com/adeviney/data-enrichment-project/blob/main/Distributions/Describing%20Distributions.ipynb 
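def plot_function(df, feature):
    """Plot the distribution of df[feature] and print answers about its type, skew, and kurtosis.

    Marks the mean (red), median (green), and the +/-1 standard deviation band (yellow).
    """
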
    fig = plt.figure(figsize=(8, 6))
    ax = sns.histplot(x=df[feature],kde=True)
    mean = df[feature].mean()
    median = df[feature].median()
    std = df[feature].std()
    ax.axvline(mean, color='r', label=f'Mean = {mean:,.2f}')
    ax.axvline(median, color='g', label=f'Median = {median:,.2f}')
    ax.axvline(mean-std, color = 'k', label = f'-1 StDev = {mean-std:,.2f}')
    ax.axvline(mean+std, color = 'k', label = f'+1 StDev = {mean+std:,.2f}')
    ax.axvspan(mean-std,mean+std,color = 'y',zorder = 0)
    ax.set_title(feature);
    ax.legend()
    
    
    # Question Answers
    print('Answers to Questions')
    print('1. Is it Discrete or Continuous?')
    if ((df.dtypes[feature] == 'float') & (df[feature].nunique()/ df[feature].count() > .90)):
        #probably continuous
        print("Continuous")
    else:
        print("Discrete")
    print('\n2. Does it have a skew? If so, which direction (+/-)')
    skew = round(stats.skew(df[feature]),1)
    skew_class = 'Normal; no skew' if skew == 0 else 'Negative Skew' if skew < 0 else 'Positive Skew'
    print(skew_class)
    
    print('\n3. What type of kurtosis does it display? (Mesokurtic, Leptokurtic, Platykurtic)')
    kurt_val = stats.kurtosis(df[feature], fisher = False)
    kurt_class = 'Mesokurtic' if round(kurt_val,1) == 3 else 'Leptokurtic' if kurt_val > 3 else 'Platykurtic'
    print(f'kurtosis = {kurt_val:.2f}, {kurt_class}')
    

In [None]:
df = pd.read_csv('https://docs.google.com/spreadsheets/d/1APV3pXiAszS_0mSgkiEt9IUNH-QmyX7KwxSAwuADl6Y/gviz/tq?tqx=out:csv&sheet=medical_data')
df.head()

#### Using Auto-Reload 
- To edit your functions in VS Code and have the notebook automatically use the updated versions (without re-importing them by hand), we will use IPython's `autoreload` extension.
- https://ipython.org/ipython-doc/3/config/extensions/autoreload.html

```python
%load_ext autoreload
%autoreload 2
```

In [None]:
%load_ext autoreload
%autoreload 2
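`%autoreload 2` is an IPython feature; a rough manual analogue using only the standard library is `importlib.reload`. The sketch below (hypothetical module name `scratch_functions`, written to a temporary folder) shows why some reload step is needed at all: a second `import` of an already-imported module reuses the cached copy in `sys.modules`, while `importlib.reload` re-executes the edited file. Autoreload does more than this sketch, e.g. it also tries to update functions you already imported by name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files, so a quickly rewritten source file is always recompiled.
sys.dont_write_bytecode = True

repo_dir = Path(tempfile.mkdtemp())
module_file = repo_dir / "scratch_functions.py"
sys.path.insert(0, str(repo_dir))

# Version 1 of the hypothetical module.
module_file.write_text("def greet():\n    return 'v1'\n")
importlib.invalidate_caches()
import scratch_functions

first = scratch_functions.greet()

# "Edit" the file, as you would in VS Code.
module_file.write_text("def greet():\n    return 'version 2'\n")

import scratch_functions  # no effect: the cached module in sys.modules is reused
after_reimport = scratch_functions.greet()

scratch_functions = importlib.reload(scratch_functions)  # re-executes the edited file
after_reload = scratch_functions.greet()

print(first, after_reimport, after_reload)
```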

In [None]:
## Importing custom functions


### using `inspect` to show source code

In [1]:
import inspect
from IPython.display import Markdown, display


# txt = inspect.getsource(mf.print_xy)
# display(Markdown("```python\n"+txt+"\n```"))
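The commented-out lines above wrap `inspect.getsource` output in a Markdown code fence. A small helper, `show_source` (a hypothetical name), packages that up. It returns the Markdown string, so it works without IPython; in a notebook you would pass the result to `display(Markdown(...))`. The example uses `json.dumps` only because it is a standard-library function with a real source file; `inspect.getsource` needs the source to be available, which is the case for functions imported from your own `.py` files.

```python
import inspect
import json

FENCE = "`" * 3  # three backticks, built this way to keep this block's own fence intact


def show_source(obj):
    """Return obj's source code wrapped in a Markdown python code fence."""
    txt = inspect.getsource(obj)
    return FENCE + "python\n" + txt + "\n" + FENCE


md = show_source(json.dumps)
# In a notebook:
# from IPython.display import Markdown, display
# display(Markdown(show_source(mf.some_function)))
```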

# Part 3: Making a Package with Sub-Modules

## Structuring Packages

### Use folders & `__init__.py` to define package

#### [Official Packaging Tutorial](https://packaging.python.org/tutorials/packaging-projects/)
- Our package is structured like this:
```
capstone_functions
    └── __init__.py
    └── my_functions.py
    └── lz.py
```

- The folder can contain other .py files as well, but it needs an `__init__.py` file to be treated as a regular package.
- The other .py files can be imported inside `__init__.py` to make them part of the package's namespace.

#### [Packaging Namespaces/Submodules](https://packaging.python.org/guides/packaging-namespace-packages/#packaging-namespace-packages)

- Contents of our `__init__.py`:

```python
"""A collection of example functions for 081720FT cohort
- James M. Irving, Ph.D.
- james.irving.phd@gmail.com"""

from capstone_functions import my_functions
from capstone_functions import lz
```
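To see this structure work end to end, here is a self-contained sketch that writes a throwaway `capstone_functions` package (with the same two imports in `__init__.py`) into a temporary folder and imports it. The function bodies in `my_functions.py` and `lz.py` are hypothetical placeholders. Note that it is the PARENT folder of the package that has to be on `sys.path`, not the package folder itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Temporary folder standing in for the repo that CONTAINS the package folder.
parent_dir = Path(tempfile.mkdtemp())
pkg_dir = parent_dir / "capstone_functions"
pkg_dir.mkdir()

# Sub-modules with hypothetical one-function contents.
(pkg_dir / "my_functions.py").write_text("def add_one(x):\n    return x + 1\n")
(pkg_dir / "lz.py").write_text("def shout(text):\n    return text.upper()\n")

# __init__.py imports the sub-modules so they become attributes of the package.
(pkg_dir / "__init__.py").write_text(
    '"""Example package."""\n'
    "from capstone_functions import my_functions\n"
    "from capstone_functions import lz\n"
)

sys.path.insert(0, str(parent_dir))  # parent folder on the path, not pkg_dir
importlib.invalidate_caches()

import capstone_functions as cf

print(cf.my_functions.add_one(1), cf.lz.shout("hi"))
```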

### Create a Python Package Folder

- Create a folder in our repo (name of folder is name of package).
    - Create a `__init__.py` file in the folder. 
    - Whatever we import into the init file will appear in our package.
    - Store functions in modules (.py files).
        - Each .py file's name becomes a sub-module name.
        - e.g. from sklearn.**preprocessing** import StandardScaler
    

### Now, make a `statistics` module
- Include our original function
- Add the following functions:


In [None]:
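import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
from IPython.display import display
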
def find_outliers_Z(data, verbose=True):
    """Return a boolean mask that is True where a value's z-score is above 3 in absolute value."""
    outliers = np.abs(stats.zscore(data)) > 3
    
    if verbose:
        print(f"- {outliers.sum()} outliers found in {data.name} using Z-Scores.")
    return outliers


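def find_outliers_IQR(data, verbose=True):
    """Return a boolean mask that is True where a value lies more than 1.5*IQR below Q1 or above Q3."""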
    q3 = np.quantile(data,.75)
    q1 = np.quantile(data,.25)

    IQR = q3 - q1
    upper_threshold = q3 + 1.5*IQR
    lower_threshold = q1 - 1.5*IQR
    
    outliers = (data<lower_threshold) | (data>upper_threshold)
    if verbose:
        print(f"- {outliers.sum()} outliers found in {data.name} using IQR.")
        
    return outliers


def evaluate_ols(result, X_train_df, y_train, show_summary=True):
    """Plots a Q-Q Plot and residual plot for a statsmodels OLS regression.

    Displays result.summary() first when show_summary is True.
    """
    if show_summary:
        try:
            display(result.summary())
        except Exception:
            pass
    
    ## save residuals from result
    y_pred = result.predict(X_train_df)
    resid = y_train - y_pred
    
    fig, axes = plt.subplots(ncols=2,figsize=(12,5))
    
    ## Normality 
    sm.graphics.qqplot(resid,line='45',fit=True,ax=axes[0]);
    
    ## Homoscedasticity
    ax = axes[1]
    ax.scatter(y_pred, resid, edgecolor='white',lw=1)
    ax.axhline(0,zorder=0)
    ax.set(ylabel='Residuals',xlabel='Predicted Value');
    plt.tight_layout()
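A small worked example of the 1.5*IQR rule that `find_outliers_IQR` applies, on a made-up five-value Series. With NumPy's default (linear) quantile method, Q1 = 2 and Q3 = 4, so IQR = 2 and anything below 2 - 3 = -1 or above 4 + 3 = 7 is flagged. The last line shows one common use of the returned boolean mask: dropping the flagged values with `~`.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4, 100], name="example")

q1 = np.quantile(data, 0.25)   # 2.0
q3 = np.quantile(data, 0.75)   # 4.0
iqr = q3 - q1                  # 2.0
lower = q1 - 1.5 * iqr         # -1.0
upper = q3 + 1.5 * iqr         # 7.0

outliers = (data < lower) | (data > upper)
print(outliers.tolist())       # [False, False, False, False, True]

# Keep only the rows that were NOT flagged.
cleaned = data[~outliers]
```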
    

### Also make a `sql` module

In [5]:
def load_credentials(filename, verbose=True):
    """Load a JSON file of login credentials and return it as a dict."""
    import json
    with open(filename) as f:
        login = json.load(f)
    
    if verbose:
        print(f"[i] Credentials loaded from: {filename}")
        print('- Keys:')
        for k in login.keys():
            print(f"    - {k}")

    return login




def get_schema(table, debug=False):
    """Build a {column: SQLAlchemy type} dict from a DataFrame's dtypes (e.g. for DataFrame.to_sql's dtype argument)."""
    import pandas as pd
    from sqlalchemy.types import Text, Integer, Float, Boolean, String
    ## save pandas dtypes, make empty dict
    dtypes = table.dtypes
    schema = {}
    
    # for each column
    for col in dtypes.index:
        ## print info if in debug mode
        if debug:
            print(f"{col} = {dtypes.loc[col]}")

        ## if it's a string column (object)
        if dtypes.loc[col] == 'object':
            
            ## Fill null values and make sure whole column is str
            data = table[col].fillna('').astype(str)
            
            ## get len first
            len_str = data.map(len).max()
            
            ## if the string is shorter than 21845 use String
            # (21,845 matches MySQL's 65,535-byte VARCHAR limit at 3 bytes per utf8 character)
            if len_str < 21845:
                schema[col] = String(len_str + 1)
                
            ## If longer use Text
            else:
                schema[col] = Text(len_str + 1)

        ## if bool make Boolean
        elif pd.api.types.is_bool_dtype(dtypes.loc[col]):
            schema[col] = Boolean()
        
        # if float make Float (matches float32 and float64)
        elif pd.api.types.is_float_dtype(dtypes.loc[col]):
            schema[col] = Float()

        ## if int make Integer (matches int32 and int64, so it also works on Windows)
        elif pd.api.types.is_integer_dtype(dtypes.loc[col]):
            schema[col] = Integer()
            
    return schema


# Part 3B: Using Your Package As-Is (Locally)

> NOTE: IF YOU PLAN ON TRYING PART 4: PUBLISHING A PACKAGE ON PYPI, I RECOMMEND SKIPPING THIS!!!

- If we can add the folder that contains our package folder to our Python path, we can import our package from anywhere on our local machine. 


- Code example from:
    - https://stackoverflow.com/questions/3387695/add-to-python-path-mac-os-x

> NOTE: rename the files referenced below to match your actual folder names.

___

- The `cohort_package` folder with the `__init__.py` file is in another repo.
    - On my local machine the path to the parent folder is:  "/Users/codingdojo/Documents/GitHub/_COHORT_NOTES/022221FT/Online-DS-FT-022221-Cohort-Notes/py_files/"


- Add the lines below to your shell's startup file, replacing the example filepath ("/Users/james/Documents/GitHub/My_Repo/") with your local path to the PARENT FOLDER of the package folder.
    - REMINDER: `~` = your user folder.
        - e.g. "/Users/james/"
    - For Windows:
        - `~/.bash_profile`
    - For Mac:
        - If the terminal says "`zsh`" on the top of the window:
            - `~/.zshrc`
        - If the terminal says "`bash`" on the top of the window:
            - `~/.bash_profile`
        
```bash
## Activate dojo-env, then add the package's PARENT folder to PYTHONPATH
conda activate dojo-env
export PYTHONPATH="$PYTHONPATH:/Users/james/Documents/GitHub/My_Repo/"

```

- If it works properly, once you open a new terminal window you should be able to import your package from anywhere on your computer.
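To confirm the change took effect, you can check from a Python session started in a new terminal (after `conda activate dojo-env`). The helper `package_location` is a hypothetical name; it uses `importlib.util.find_spec`, which returns `None` when a top-level package can't be found on `sys.path` (entries from `PYTHONPATH` are added to `sys.path` when Python starts). `cohort_package` is the example package name from above; substitute your own.

```python
import importlib.util
import os


def package_location(package_name):
    """Return the file a top-level package would be imported from, or None if Python can't find it."""
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin


print(os.environ.get("PYTHONPATH", "(PYTHONPATH not set)"))
print(package_location("cohort_package"))  # None means the parent folder is not on the path yet
```

For a package, the returned path points at its `__init__.py`, which is a quick way to see which copy of the package Python will actually use.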