# Grayson's Python Tips and Resources

### INDEX:
 1. [Managing File Paths in Python on Windows, OSX, Linux, etc.](#1:-Managing-File-Paths-in-Python-on-Windows,-OSX,-Linux,-etc.)
 2. [Docstrings: Why They Help](#2:-Docstrings-help-you-understand-the-purpose-of-a-function)
 3. [Logging](#3:-Logging)
 4. [Anaconda and Jupyter](#4:-Install-Anaconda-and-Master-Jupyter)
 5. [Understanding Matplotlib](#5:-Understanding-Matplotlib---the-heart-of-data-visualization-frameworks)
 6. [Argparse: Pass Parameters into Python](#6:-Argparse-to-pass-arguments-into-your-python-script)
 7. [Interacting with Other Applications](#7:-Interacting-with-Other-Applications---Windows)
 8. [Web Scraping Tips](#8:-Web-Scraping-with-Selenium,-Requests,-and-BeautifulSoup)
 9. [Pandas](#9:-Pandas---Favorite-Functions-and-Notes)
 10. [Favorite Resources](#10:-Interesting-Tips-and-Resources)

## 1: Managing File Paths in Python on Windows, OSX, Linux, etc.
[Return To Index](#Grayson's-Python-Tips-and-Resources)

Windows uses the back-slash character ("\") as its path delimiter - and the back-slash is also the escape character in Python strings. That's a problem. 

The path delimiter on OSX and Linux is the forward-slash ("/").

Let's say you copy a path to a file and paste it into your Python script - pray that it contains no back-slash followed by "n" or "t", because Python reads "\n" as a new line and "\t" as a tab.  

There are several workarounds - and they're not super adorable:

    - double up every back-slash ("\") --> ("\\"). 
        - For example: "L:\Folder\File.txt" should be "L:\\Folder\\File.txt"
        - Unfortunately, this is too much work for deep file systems. 
        
    - replace each back-slash with a forward-slash  ("\") --> ("/")
        - For example: "L:\Folder\File.txt" should be "L:/Folder/File.txt"
        - Again - this is a lot of work.
        
    - use a raw string (not a regex) - r'path_to_file'
        - this... surprisingly works. But it feels flimsy: a raw string still cannot end with a single back-slash. 
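
To make the problem and the three workarounds concrete, here is a minimal sketch. The path is hypothetical and was chosen on purpose so that "\n" and "\t" appear inside it; nothing here touches the file system.

```python
# Hypothetical Windows path, chosen so that "\n" and "\t" appear inside it
broken = "L:\new_folder\test.txt"       # "\n" and "\t" are read as escape sequences
doubled = "L:\\new_folder\\test.txt"    # every back-slash doubled
forward = "L:/new_folder/test.txt"      # forward slashes are accepted by Windows too
raw = r"L:\new_folder\test.txt"         # raw string: back-slashes are kept literally

print(repr(broken))    # shows the new line and tab that sneaked in
print(doubled == raw)  # the doubled and raw spellings are the same string
```

The doubled and raw versions are just two spellings of the same string, while the pasted version silently lost two back-slashes.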
        
I have a better idea... use the os library to set up your paths. 

In [1]:
# use os path library to set up your paths. It contains awesome methods to make your life easier.
import os

#get the current directory - this will return the absolute path
current_working_directory = os.getcwd()

Ideally, if you're going to distribute your Python script, you're going to include a folder containing the necessary files. All you need is the name of the folder.

In [2]:
# Just name the folder containing all your resources, and use the os.path.join function!
# This should work on all platforms
script_resources_subfolder = "resources"
sub_folder = os.path.join(current_working_directory, script_resources_subfolder)
print(sub_folder)

L:\User Reports\GS\Pandas Exercises\INTERESTING\resources


And now my favorite os function -  __os.walk__. It lets you iterate through the sub-folders and grab the files you need while preserving their full paths. os.walk returns a generator that yields one (current path, sub-folders, files) tuple per directory, so you either loop over it or call next() to get the first result.

In [3]:
# This targets the sub_folder and reads the current path, its folders, and its files
# top-down. followlinks=True also follows symbolic links to other directories.

folder = os.walk(sub_folder, topdown=True, onerror=None, followlinks=True)

try:
    # next() works on Python 2.6+ and Python 3; folder.next() is Python 2 only
    current_location, sub_folders, files = next(folder)
except StopIteration:
    # os.walk yields nothing when sub_folder does not exist
    current_location, sub_folders, files = sub_folder, [], []
    print("Done")

for f in files:
    print(os.path.join(current_location, f))

L:\User Reports\GS\Pandas Exercises\INTERESTING\resources\text1.txt
L:\User Reports\GS\Pandas Exercises\INTERESTING\resources\text2.txt
L:\User Reports\GS\Pandas Exercises\INTERESTING\resources\text3.txt
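
The cell above only looks at the first directory that os.walk yields. As a hedged sketch of walking the whole tree, the loop below builds a throwaway folder with tempfile so it can run anywhere; the folder and file names ("resources", "nested", "text1.txt", "text2.txt") are made up for illustration and are not the files from the output above.

```python
import os
import tempfile

# Build a throwaway folder tree so the example is self-contained.
# All folder and file names here are hypothetical.
root = tempfile.mkdtemp()
resources = os.path.join(root, "resources")
os.makedirs(os.path.join(resources, "nested"))
for relative in ("text1.txt", os.path.join("nested", "text2.txt")):
    with open(os.path.join(resources, relative), "w") as handle:
        handle.write("example")

found = []
# Each iteration yields one directory: its path, its sub-folders, and its files.
for current_location, sub_folders, files in os.walk(resources):
    for name in files:
        found.append(os.path.join(current_location, name))

for path in sorted(found):
    print(path)
```

Looping over the generator directly avoids the try/except around next() and picks up files in nested folders as well.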


## 2: Docstrings help you understand the purpose of a function
[Return To Index](#Grayson's-Python-Tips-and-Resources)

Beginners in Python rarely add docstrings to their code because they think hash (#) comments suffice. Sure, I'm thankful for the comments - but docstrings make things a lot easier. In fact, many code-completion tools parse docstrings and show them to the user, so there is no need to open the source file or read the \__doc\__ attribute by hand (see below). 

In [4]:
def someFunction(parameter_1, parameter_2, *args, **kwargs):
    """
    someFunction enables you to perform a function. This is a summary. 
    
    parameter_1: object_type - you need parameter_1
    parameter_2: object_type - you need parameter_2
    
    ----Arg Parameters----
    args: relevant args to utilize
    
    ----Other Parameters----
    kwargs: relevant keywords to utilize for a specific class object or framework
    """
    pass

Now, see why docstrings are amazing...

In [5]:
print(someFunction.__doc__)


    someFunction enables you to perform a function. This is a summary. 
    
    parameter_1: object_type - you need parameter_1
    parameter_2: object_type - you need parameter_2
    
    ----Arg Parameters----
    args: relevant args to utilize
    
    ----Other Parameters----
    kwargs: relevant keywords to utilize for a specific class object or framework
    


See what happens when you print **os.walk.\__doc\__**. For further reading, check out the documentation chapter of <a href="http://docs.python-guide.org/en/latest/writing/documentation/">The Hitchhiker's Guide to Python</a>.
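
Printing \__doc\__ keeps the indentation from the source file, as the output above shows. A small sketch of one way around that is the standard library's inspect.getdoc, which strips the common leading whitespace. The function area and its docstring are made up for illustration.

```python
import inspect

def area(width, height):
    """
    Return the area of a rectangle.

    width: number - horizontal size
    height: number - vertical size
    """
    return width * height

# __doc__ is the raw string, including the leading newline and indentation
raw_doc = area.__doc__

# inspect.getdoc removes the surrounding blank lines and the common indentation
clean_doc = inspect.getdoc(area)
print(clean_doc)
```

The built-in help(area) shows the same cleaned-up text together with the function signature.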

## 3: Logging
[Return To Index](#Grayson's-Python-Tips-and-Resources)

<a href="https://fangpenlin.com/posts/2012/08/26/good-logging-practice-in-python/">Fang's blog post</a> makes a persuasive case that good logging practices are super important even if you're just a beginner. 

Read it and you will understand:
 - why print statements are not a good idea
 - why you should use the Python standard logging module
 - how you can implement this in your code
 
__Not reinventing the wheel...__

Here are some docs:
 - <a href="https://docs.python.org/2/howto/logging.html#">HOWTO</a>
 - <a href="https://docs.python.org/2/howto/logging-cookbook.html#logging-cookbook">Logging Cookbook</a>
 
 __And here's the basic syntax__:

In [None]:
import logging

#Create Logger & Set the Level
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

#Create console handler
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

#Create file Handler
fh = logging.FileHandler('example.log')
fh.setLevel(logging.DEBUG)

#create Formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
fh.setFormatter(formatter)

#Add both handlers to the logger
logger.addHandler(ch)
logger.addHandler(fh)


logger.debug("This is the debug.")
logger.warning('watch out!')
logger.info('I told you so')
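
In the cell above both handlers are set to DEBUG, so they record the same messages. As a sketch of why handlers have their own levels, the variation below (Python 3) gives each handler a different threshold. Two in-memory streams stand in for the console and the log file so the result is easy to inspect; the logger name "level_demo" is an assumption for the example.

```python
import io
import logging

logger = logging.getLogger("level_demo")   # hypothetical logger name
logger.setLevel(logging.DEBUG)             # the logger itself lets everything through
logger.propagate = False                   # keep the root logger out of the demo

# Two in-memory streams stand in for the console and the log file
verbose_stream, quiet_stream = io.StringIO(), io.StringIO()

verbose = logging.StreamHandler(verbose_stream)
verbose.setLevel(logging.DEBUG)            # records DEBUG and above

quiet = logging.StreamHandler(quiet_stream)
quiet.setLevel(logging.WARNING)            # records WARNING and above only

formatter = logging.Formatter("%(levelname)s - %(message)s")
for handler in (verbose, quiet):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.debug("This is the debug.")
logger.warning("watch out!")
logger.info("I told you so")

print(verbose_stream.getvalue())
print(quiet_stream.getvalue())
```

The verbose stream receives all three messages, while the quiet one keeps only the warning. A common arrangement is a detailed log file plus a console that shows only warnings and errors.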

## 4: Install Anaconda and Master Jupyter
[Return To Index](#Grayson's-Python-Tips-and-Resources)

Virtualenv is like a serious ex-girlfriend. We had a good thing going a long time ago, and we don't need to discuss it. There's an easier way to maintain your environment as a data analyst.

No, it's not pip - although almost everyone uses pip. But... 

<a href="https://www.anaconda.com/">__Anaconda__</a>:

Anaconda is a Python distribution built around conda, a package and environment manager that makes it so easy to maintain your environments, it's not even fair. Essentially, you can:
   - Install multiple versions of Python on the same computer
   - Download and install libraries in their respective environments using the "conda" command
   - Search through a __huge__ library of Python packages
   - Do many more things
    
Best of all - __it's free__.

Here are some resources:
 - <a href="https://conda.io/docs/_downloads/conda-cheatsheet.pdf">Cheat Sheet</a>
 - <a href="https://docs.anaconda.com/">Official Documentation</a>

With anaconda, make sure you install...

<a href="http://jupyter.org/">__Jupyter__</a>:

Did you recognize the platform that I'm using to write this post?

Jupyter notebooks make data analytics with Python __so easy__ because they let you:
   - organize your data analysis into digestible formats
   - render your analysis on your website, GitHub, or wherever!
   - debug your Python code
   - explore data
   - use many more tips and tricks
 
Here are some resources:
- <a href="https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Jupyter_Notebook_Cheat_Sheet.pdf">Cheat Sheet</a>
- <a href="https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/">28 Jupyter Notebook tips and tricks</a>
- <a href="http://jupyter-notebook.readthedocs.io/en/stable/">Official Documentation</a>
- <a href="https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed"> Jupyter Markdowns</a>
 
 Jupyter notebook is my de facto development environment. 

## 5: Understanding Matplotlib - the heart of data visualization frameworks
[Return To Index](#Grayson's-Python-Tips-and-Resources)

Here's a compendium of super helpful Matplotlib tutorials:
 - <a href="https://realpython.com/python-matplotlib-guide/#why-can-matplotlib-be-confusing">Real Python</a>

## 6: Argparse to pass arguments into your python script
[Return To Index](#Grayson's-Python-Tips-and-Resources)

Sometimes it makes sense to call a Python script from the command line or a batch file, and the script needs a series of parameters. That's where the standard-library module __argparse__ comes in.

Assuming that the end-user isn't familiar with python but wishes to utilize your script for the first time, argparse will explicitly tell the user how to utilize the script.

In [None]:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-n", "--name", help="name of user")
parser.add_argument("-s", "--sex", help="gender of user", required=False)
parser.add_argument("-a", "--age", help="age of user", required=False)
args = parser.parse_args()
print(args.name)

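This won't run inside a Jupyter notebook, because parse_args() reads the command-line arguments of the notebook process itself - but it works when the script is run from a terminal. For example, you can ask for the help text: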

__file_with_argparse.py --help__

And here's the output:

In [None]:
"""
usage: logging_practice.py [-h] [-n NAME] [-s SEX] [-a AGE]

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name of user
  -s SEX, --sex SEX     gender of user
  -a AGE, --age AGE     age of user
 """

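To experiment with a parser without leaving the notebook, one option is to hand parse_args() an explicit list of strings instead of letting it read the real command line. This is a minimal sketch: the description text and the values "Ada" and "36" are made up, and type=int is an addition that the cell above does not use.

```python
import argparse

parser = argparse.ArgumentParser(description="Collect basic user details.")
parser.add_argument("-n", "--name", help="name of user")
# type=int converts the text from the command line and rejects non-numbers
parser.add_argument("-a", "--age", help="age of user", type=int)

# Passing a list skips sys.argv, so this also runs inside a notebook.
# The values below are made up for illustration.
args = parser.parse_args(["--name", "Ada", "-a", "36"])
print(args.name, args.age + 1)

# Options that are not supplied default to None
empty = parser.parse_args([])
print(empty.name)
```

Because age is converted to an integer by the parser, the script can do arithmetic on it without calling int() itself.
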
## 7: Interacting with Other Applications - Windows
[Return To Index](#Grayson's-Python-Tips-and-Resources)

In [8]:
import pyautogui as pgui
import win32api, win32con

def click(x,y):
    """
    A mouse-click anywhere on the screen.
    
    Parameters:
    x: horizontal pixel location on your screen
    y: vertical pixel location on your screen
    """
    win32api.SetCursorPos((x,y))
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTDOWN,x,y,0,0)
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTUP,x,y,0,0)
    
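def activate_window(x=770, y=165):
    """
    Activate a window by clicking on it. This is the same as click(), but the name makes the calling code easier to read.
    
    Parameters:
    x: horizontal pixel location inside the window. Default == 770
    y: vertical pixel location inside the window. Default == 165
    """
    click(x, y)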


def _write(x):
    """
    Type out a string in one call using pyautogui.typewrite.
    
    Parameters:
    x: String
    """
    pgui.typewrite(x)

def _type(x):
    """
    Type out a value one character at a time with explicit key-down and key-up events.
    
    Parameters:
    x: any value - converted to a string with str() before typing
    """
    for i in str(x):
        pgui.keyDown(i)
        pgui.keyUp(i)

def _enter(num_times=1):
    """
    Press Enter... 
    
    Parameters:
    num_times: integer - number of times you press enter. Default == 1
    """
    for _ in range(num_times):
        pgui.keyDown('enter'); pgui.keyUp('enter')


def _tab(num_times=1):
    """
    Press Tab... 
    
    Parameters:
    num_times: integer - number of times you press tab. Default == 1
    """
    for _ in range(num_times):
        pgui.keyDown('tab'); pgui.keyUp('tab')


def _F_Key(F_Key, num_times=1):
    """
    Press F_Key... 
    
    Parameters:
    F_Key: String - the F-key to press. Example: 'f6', 'f3'
    num_times: integer - number of times you press F_Key. Default == 1
    """
    for _ in range(num_times):
        pgui.keyDown(F_Key); pgui.keyUp(F_Key)

## 8: Web Scraping with Selenium, Requests, and BeautifulSoup
[Return To Index](#Grayson's-Python-Tips-and-Resources)

When APIs aren't available, scraping a website is an option. Typically, you're looking at large tables embedded in HTML and spread across several pages of a website. You can even harvest information from JavaScript variables, XML element trees, JSON, etc. 

Just remember: Get permission.

To get started -

__Requests and BeautifulSoup__:


Here's an implementation:
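
The original cell for this step is not shown, so what follows is only a minimal sketch under stated assumptions, not the code that produced the output further down. It parses an inline HTML string instead of downloading a page; in practice the text would come from something like requests.get(url).text for a site you have permission to scrape. The "About" link is made up for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded page. In practice the text would come from
# something like requests.get(url).text, for a site you have permission to scrape.
html = """
<html><body>
  <a href="/intl/en/about/">About</a>
  <a href="/intl/en/policies/terms/">Terms</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("a") returns every anchor tag in document order
links = soup.find_all("a")
link = links[-1]

print(link.get("href"))   # the link target
print(link.get_text())    # the visible link text
```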

And here is what a link would look like. Feel free to check out the <a href="https://readthedocs.org/projects/beautiful-soup-4/">beautifulsoup4 documentation</a>.

In [4]:
link

<a href="/intl/en/policies/terms/">Terms</a>

__Selenium__:

Web developers use Selenium to test features of their websites on multiple browsers - but for our purposes, we can surf the internet programmatically.

That means you can iterate through a large list of websites and programmatically interact with them. Selenium can access links, images, text, and even JavaScript elements, so you can watch the overall behavior of the website.

Secondly, Selenium can _extract_ elements of HTML using locators such as XPath or CSS selectors. It does this through the browser itself rather than through requests or BeautifulSoup, although you can hand driver.page_source to BeautifulSoup when you prefer its parsing API. 

However, there are pitfalls: 
In my experience, browsers differ in how fully they support Selenium's Python bindings, and that support seems to shift over time. For instance, Safari ignores certain Selenium methods whereas Chrome and Firefox don't seem to care - so keep that in mind. If a method isn't working out for you, try another web browser.

Here's an implementation:

In [None]:
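from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
# find_element(By.NAME, ...) replaces find_element_by_name, which newer Selenium 4 releases removed
elem = driver.find_element(By.NAME, "q")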
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()

For documentation:
<a href="https://selenium-python.readthedocs.io/getting-started.html#simple-usage">Selenium Docs</a>

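The trickiest part was figuring out the role of the webdriver: it is a separate, browser-specific program (geckodriver for Firefox, chromedriver for Chrome) that Selenium uses to control the browser.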

## 9: Pandas - Favorite Functions and Notes
[Return To Index](#Grayson's-Python-Tips-and-Resources)

In [7]:
import pandas as pd

1. __When you make updates to rows of data:__
    - __Wrong__:   df\[\(filter logic)] = updated_values
    - __Correct__: df.loc\[(filter logic), "name of column"] = updated_values
    - __Explanation__: The "wrong" version assigns through the filter, so it overwrites every column of the matching rows. The chained variant, df\[(filter logic)]\["name of column"] = updated_values, is worse: it first builds a filtered copy of the dataframe and may update only that copy, which is why pandas warns about chained assignment. Imagine changing 100 rows out of 2 million this way: pandas builds a 100-row copy, writes to it, and your original dataframe may never see the change. Instead, you want to update the values in place by naming the rows and the column in a single .loc call (rows first, column second) _without_ changing the dimensions of the dataframe. The correct method basically says "Here are the rows. Update them at this column without worrying about the other (2m - 100) rows!"


2. __Python List Slicing vs Pandas Slicing__:
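    - Remember that label-based slicing in pandas _includes_ the endpoint. For example, df.loc[1:5] on a default integer index returns the rows labeled 1 through 5, which matches the Python list slice pl[1:6], where the endpoint is _exclusive_. Position-based df.iloc[1:5] follows the Python rule and stops before position 5. (The older df.ix indexer has been removed from pandas.)
    - Note that df means "dataframe" and pl means "python list". I assume both contain the same data.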
    
    
3. __Pandas is Not Thread-Safe!__
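
The first two notes can be sketched with a tiny dataframe. This is an illustration under assumptions, not a benchmark: the column names and values are made up, and the dataframe uses the default integer index. Note that .loc takes the row selector first and the column second.

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for illustration
df = pd.DataFrame({"city": ["Austin", "Boston", "Austin", "Denver"],
                   "sales": [10, 20, 30, 40]})

# Note 1: update one column for the rows that match a filter, in place
mask = df["city"] == "Austin"
df.loc[mask, "sales"] = 0

# Note 2: label-based slicing includes the end label, position-based does not
pl = df["sales"].tolist()
by_label = df.loc[1:3, "sales"].tolist()      # labels 1, 2, 3
by_position = df.iloc[1:3]["sales"].tolist()  # positions 1, 2

print(df)
print(by_label == pl[1:4], by_position == pl[1:3])
```

Only the two Austin rows change, the dataframe keeps its shape, and the two slices line up with the list slices pl[1:4] and pl[1:3] respectively.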

## 10: Interesting Tips and Resources
[Return To Index](#Grayson's-Python-Tips-and-Resources)

 - <a href="https://chrisalbon.com/">Chris Albon</a>
 - <a href="https://realpython.com/python-matplotlib-guide/#why-can-matplotlib-be-confusing">Real Python</a>
 - <a href="https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Jupyter_Notebook_Cheat_Sheet.pdf">Jupyter Cheat Sheet</a>
 - <a href="https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/">28 Jupyter Notebook tips and tricks</a>
 - <a href="http://jupyter-notebook.readthedocs.io/en/stable/">Jupyter Notebook Official Documentation</a>
 - <a href="https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed"> Jupyter Markdowns</a>
 - <a href="https://docs.anaconda.com/">Anaconda Documentation</a>
 - <a href="https://docs.python.org/2/howto/logging.html#">Logging HOWTO</a>
 - <a href="https://docs.python.org/2/howto/logging-cookbook.html#logging-cookbook">Logging Cookbook</a>
 