***
***
***
<br><br><br><br><br>
<h1>Python for Business Analytics</h1>
<em>A Nontechnical Approach for Nontechnical People</em><br><br>
<em><strong>Custom Edition for Hult International Business School</strong></em><br>

Written by Chase Kusterer - Faculty of Analytics <br>
Hult International Business School <br>
https://github.com/chase-kusterer <br><br><br><br><br>
***
***
***

# <u>Chapter 1: Before Learning Anything Else, Learn This</u>

**If you are new to coding, do not skip this chapter.**<br><br>
Python was designed to be a simple programming language. In fact, in 1999, Guido van Rossum, the creator of the Python language, sent a funding proposal to DARPA for his *Computer Programming for Everybody* initiative, where he posed the following question:<br><br>
<div align="center"><strong>
    "What would the world look like if users could program their own computer?"
</strong><a class="tocSkip"></div><br>
The proposal went on to make the following claim:<br><br>
<div align="center"><strong>
"There is enough (anecdotal) evidence that Python is easy to<br>learn for people who are (nearly) computer-illiterate."
</strong><a class="tocSkip"></div>

This proposal, which was submitted by van Rossum and the *Python in Education Special Interest Group*, [can be found here](https://www.python.org/doc/essays/everybody/). It is an interesting read and makes some very good points. However, even though Python may be easy to learn, **it is important to learn how to learn Python, and that is the purpose of this chapter.** At times, you will get stuck, and you need to know what to do. There will also be many times when your code won't run properly, even when you've spent hours trying to debug it. This is very frustrating, and it is something all coders experience from time to time. The good news is that unless you are one of the world's most advanced Python coders, someone has very likely already experienced and solved your problem. More importantly, it is very likely that a solution has been generously shared with the open source community. All you need to know is where to find it. Not knowing can be devastating to your morale. To illustrate, the following story comes from one of my former students and took place while he was conducting an analysis for the Machine Learning course.<br><br>

<div align="left">
    <em><strong>Lucas Barros - Masters of Business Analytics - Class of 2019</strong></em><br>
    <em>
“While studying in the Masters of Business Analytics program, I was also working part time at the campus café. It's always good to make some extra money as a graduate student. Learning analytics, especially coding, was a very interesting experience, although sometimes I would lose confidence, questioning whether this was the right field for me. A great example of this is the Game of Thrones Character Prediction analysis project.<br>
        
We had a dataset based on the book series, which contained around 2000 characters and several features describing each character. It was probably the most stressful project of my life. Throughout this project, sleeping 6 hours a night was a luxury. I spent countless hours working on the dataset: engineering new features, testing out different machine learning algorithms, and trying not to question my life choices too deeply. On the note of life choices, coincidentally, the campus café was looking to hire a new manager and they asked me if I would like the position. I have to be very sincere, after the lack of sleep, the stress of hours of trying to debug my code to no avail, and the process of trying to build an algorithm that predicts reasonably well, the option of giving up and living a more chill life sounded very appealing.<br>

Nonetheless, after some more nights with minimal time to sleep, I found solutions to my coding issues and am proud to say I completed the project. While being one of the most difficult projects I've ever encountered, it was by far one of the most rewarding!”
</em><a class="tocSkip"></div><br>

In this chapter, we will cover four critical resources in an effort to alleviate your long-term coding frustration:

* finding help with the help() wrapper
* code complete in Jupyter notebook
* finding answers on the Internet
* talking to humans through code comments

**Reminder:** If you are new to coding, do not skip this chapter. Time invested here will save you several hours as you move forward.

## 1.1 Finding Help with the help() wrapper
### Introduction
One of the most critical functions for programmers at all levels is the **help()** function. This is one of the most amazing functions ever written, and you will be using it quite often. To learn more about what it does, search *help* in the **help()** function (*Code 1.1.1*).

In [1]:
## Code 1.1.1 ##

help(help)

Help on _Helper in module _sitebuiltins object:

class _Helper(builtins.object)
 |  Define the builtin 'help'.
 |  
 |  This is a wrapper around pydoc.help that provides a helpful message
 |  when 'help' is typed at the Python interactive prompt.
 |  
 |  Calling help() at the Python prompt starts an interactive help session.
 |  Calling help(thing) prints help for the python object 'thing'.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *args, **kwds)
 |      Call self as a function.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



***

We call the output of *Code 1.1.1* the help function's **documentation**. According to the documentation for **help()**, there are two ways to use this function:
* <strike>Calling help() at the Python prompt starts an interactive help session.</strike>
* Calling help(thing) prints help for the Python object 'thing'.

Notice how the first bullet point above has been stricken out. That is because we are going to **avoid using interactive help for the time being**. Believe it or not, interactive help sessions can (ironically) cause problems that most beginners are not ready to solve. If you are not comfortable using a command-line interface (i.e. terminal or PowerShell), avoid using interactive help. If you don't know what a command-line interface is, don't worry. For now, we are going to focus on mastering the *help(thing)* option. In *Code 1.1.2*, we are using this to check the documentation for the *print()* function.

##### Side Note: The Challenge with Interactive Help
After starting an interactive help session, you may run into a situation where you are unable to run any code. This is because your Python kernel is still running interactive help, and it cannot move on until you give it a command to do so. This can be done by typing **quit** into the help search box, but if we accidentally tried to run some code without closing interactive help, our Python kernel might get confused and need to be restarted.<br><br>


### Using help()

In [18]:
## Code 1.1.2 ##

# Checking the documentation for the print() function
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



*Code 1.1.2* generates a very manageable amount of output. Let's dissect each component piece by piece. First, let's add line numbers for easier interpretation.

1&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp; Help on built-in function print in module builtins:
<br>2&nbsp;&nbsp;&nbsp;|
<br>3&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp; print(...)
<br>4&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<br>5&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
<br>6&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<br>7&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Prints the values to a stream, or to sys.stdout by default.
<br>8&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Optional keyword arguments:
<br>9&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file:  a file-like object (stream); defaults to the current sys.stdout.
<br>10&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sep:   string inserted between values, default a space.
<br>11&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end:   string appended after the last value, default a newline.
<br>12&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; flush: whether to forcibly flush the stream.<br>

***

<table width="375" align="left">
<col width="25">
<col width="350">
    <tr>
        <th>Line</th>
        <th>Interpretation</th>
    </tr>
    <tr>
        <td>1</td>
        <td>States that this is a <strong>built-in function</strong>.</td>
    </tr>
    <tr>
        <td>3</td>
        <td>The function being looked up.</td>
    </tr>
    <tr>
        <td>5</td>
        <td> The arguments for the function.</td>
    </tr>   
    <tr>
        <td>7</td>
        <td> Explains what the function does.</td>
    </tr>
    <tr>
        <td>9-12</td>
        <td> Explains what each argument does.</td>
    </tr>
</table>

#### Line 1 - Built-in functions
Python comes with a set of default, or **built-in functions**. These are also referred to as **primitives**, and are functions that are so important, they are available without the need to import additional packages (for an example of a package import, see *Code 1.1.3* where we import the package *pandas*). At the time of this writing, the most recent version of Python (*version 3.7.3*) has 69 built-in functions, which can be found in [the Python documentation on Built-In Functions](https://docs.python.org/3/library/functions.html). Python also comes with a series of other built-ins, such as *built-in types*. According to [the Python documentation on Built-In Types](https://docs.python.org/3/library/stdtypes.html), the principal built-in types are numerics, sequences, mappings, classes, instances and exceptions. Think of built-ins as the prime numbers of Python: much of what you will write can be broken down into these building blocks. Also, keep in mind that Python was designed to be used for a wide variety of programming tasks, and what is commonly used for business analytics is just one small subset. To avoid information overload, this book will primarily focus on the built-ins, functions, methods, and packages that are most relevant to our purposes. If you're not sure what a function, method, or package is, don't worry. We will go into these details in later chapters.
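
To make the idea of "available without importing anything" more concrete, here is a minimal sketch. It uses the standard *builtins* module purely to list the built-in names (functions, types, and exceptions together), so the list is longer than the count of built-in functions quoted above. The variable name *builtin_names* is just an illustrative choice.

```python
import builtins

# Collect the public names Python makes available without any import.
# This includes built-in functions, built-in types, and exceptions.
builtin_names = [name for name in dir(builtins) if not name.startswith("_")]

print("print" in builtin_names)      # True: print() is a built-in
print("len" in builtin_names)        # True: len() is a built-in
print("DataFrame" in builtin_names)  # False: DataFrame lives in pandas
```

The last line is the important one: anything that is not a built-in, such as *DataFrame*, has to be imported before it can be used.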

#### Line 3 - The function being looked up
As the heading implies, this line specifies what you have looked up. When we called help on the *print()* function in *Code 1.1.2*, this didn't seem to add a lot of value. However, this information becomes extremely useful in other situations, such as in *Code 1.1.3* where *help()* is being called on a user-created object. For now, don't focus on what *Code 1.1.3* is trying to do. Instead, focus on the *help()* function in its last line.

***

In *Code 1.1.3*, we created an object and called **help()** on it (creating objects will be covered in more detail in [Chapter 2](http://localhost:8888/notebooks/Desktop/Jupyter_Files/Rough%20Draft%20Chapter%202%20-%20Printing.ipynb)). From the object's name (*my_list_converted*), it is unclear what this object actually is. Notice that the help function recognized that this object is a DataFrame and returned its respective documentation. This is the result when we call help on any named object. Below is a snippet of Line 3 for *Code 1.1.3*:<br><br>
~~~
class DataFrame(pandas.core.generic.NDFrame)
~~~

In [None]:
## Code 1.1.3 ##

import pandas as pd

my_list = [[1, 2, 3] , [4, 5, 6]]

my_list_converted = pd.DataFrame(my_list)

help(my_list_converted)

***

#### Line 5 - Function Arguments
For most functions you will encounter, arguments will come in three forms: <em>mandatory</em>, <em>optional</em>, and *variable* (also known as <em>\*args</em>  and <em>\*\*kwargs</em>). Line 5 of *Code 1.1.2* contains mandatory and optional arguments.<br><br>


\#\# Output of Code 1.1.2 \#\#
~~~
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
~~~

<div style="margin-left: 1em;">

##### Mandatory Arguments
Mandatory arguments are those that do not have a default value. All mandatory arguments must be specified for a function to run. If at least one is missing, the function will throw an error. The easiest way to tell if an argument is mandatory is by checking whether or not it already has a value assigned to it, which is indicated by an equals sign. If there is no equals sign, the argument is mandatory. In *Code 1.1.2*, the argument *value* has no equals sign, so for our purposes we treat it as mandatory (strictly speaking, the *...* that follows it means *print()* accepts any number of values).<br>
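
As a rough illustration of what "throw an error" looks like, the sketch below defines a small function with two mandatory arguments and then deliberately leaves one out. The function *price_tag* and its arguments are hypothetical and exist only for this example. As with the other code in this chapter, try not to get caught up in the syntax.

```python
def price_tag(item, price):
    """Both arguments are mandatory: neither one has a default value."""
    return f"{item}: ${price:.2f}"


# Supplying both mandatory arguments works as expected.
print(price_tag("coffee", 3.5))

# Leaving one out makes Python raise a TypeError.
try:
    price_tag("coffee")  # price is missing
except TypeError as error:
    error_message = str(error)
    print("Python raised a TypeError:", error_message)
```

The error message names the missing argument, which is usually the fastest clue as to what needs to be added.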

##### Optional Arguments
Optional arguments have a default value, which is indicated after an equals sign. If you do not specify anything for these arguments, the default value will be used and the function will run properly. In *Code 1.1.2*, the arguments *sep*, *end*, *file*, and *flush* are optional arguments.<br><br>
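
The short sketch below shows what happens when the optional arguments of *print()* are left at their defaults and when they are overridden. The use of *io.StringIO* as a stand-in "file" is an assumption made for this example so that nothing needs to be written to disk, and the variable names are illustrative.

```python
import io

# Defaults: values are separated by a space and the line ends with a newline.
print("bananas", "oranges", "grapes")

# Overriding sep changes what is placed between the values.
print("bananas", "oranges", "grapes", sep=", ")

# Overriding end changes what is printed after the last value.
print("bananas", end=" and ")
print("oranges")

# Overriding file sends the text somewhere other than the screen,
# here to an in-memory text buffer.
buffer = io.StringIO()
print("bananas", "oranges", sep=" | ", file=buffer)
captured = buffer.getvalue()
```

Every call above uses the same function. Only the optional arguments change, which is why reading the argument list carefully pays off.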

##### Variable Arguments
Variable arguments are slightly more advanced. As their name implies, they allow a function to accept a *variable* number of arguments. This may seem confusing, but the idea behind such an invention is quite remarkable.<br><br>
Let's say, for example, that a programmer wanted to create a function to help organize their grocery list. The programmer may do some research, come to the conclusion that most shopping trips consist of exactly three items, and write code similar to that displayed in *Code 1.1.4*.

*Note: Try not to get caught up in the syntax of Code 1.1.4. Instead, focus on the function's arguments.*
<br><br>
***
<br>
A challenge arises in that the function in <em>Code 1.1.4</em> requires exactly three items to work properly as all arguments are mandatory. In other words, each shopping list needs to be exactly three items long. Lessons from Social Life 101 have taught us that the number of items on our shopping list will vary. Therefore, the programmer needs a way to allow for this functionality. This is where variable arguments become very handy.<br><br>
By changing the arguments to <em>*args</em> as in <em>Code 1.1.5</em>, our shopping list can be of any length. <em>**kwargs</em> is similar to <em>*args</em> in that it allows for a varying number of arguments. However, it operates using keywords, and it will become more important when we discuss dictionaries in a later chapter.


<em>Note: As before, try not to get caught up in the syntax of Code 1.1.5. It will be explained in later chapters.</em>

***
</div>

In [24]:
## Code 1.1.4 ##

def shopping_list(item_1, item_2, item_3):
    print("Shopping List:")
    print(item_1)
    print(item_2)
    print(item_3)

shopping_list('bananas', 'oranges', 'grapes')

Shopping List:
bananas
oranges
grapes


<br><br>

***

In [13]:
## Code 1.1.5 ##

def shopping_list(*args):
    print("Shopping List:")
    
    for item in args:
        print(item)

shopping_list('bananas', 'oranges', 'grapes', 'pears', 'apples')

Shopping List:
bananas
oranges
grapes
pears
apples
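
For readers curious about the keyword-based counterpart to *Code 1.1.5*, here is a minimal sketch using <em>\*\*kwargs</em>. The function name *shopping_list_with_quantities* is hypothetical, and returning the collected arguments is done only so that they can be inspected afterwards. Dictionaries are explained in a later chapter, so focus on the arguments rather than the syntax.

```python
def shopping_list_with_quantities(**kwargs):
    """Accept any number of keyword arguments, such as bananas=6."""
    print("Shopping List:")

    # kwargs arrives as a dictionary of {keyword: value} pairs
    for item, quantity in kwargs.items():
        print(item, quantity)

    return kwargs


quantities = shopping_list_with_quantities(bananas=6, oranges=4, grapes=1)
```

Each keyword (the item) is paired with a value (the quantity), and the function works no matter how many pairs are supplied.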


#### Line 7 - What the Function Does
Line 7 of *Code 1.1.2* states that the print function:
<br><br><div align="center"><em>
    Prints the values to a stream, or to sys.stdout by default.
</em><a class="tocSkip"></div><br>
The programmers designing this function decided that this was the best way to explain what the <em>print()</em> function does. To someone less technical, this explanation may do more harm than good. An important concept to keep in mind is that:
<br><br><div align="center"><strong>
    Programmers like to write in ways that other programmers can understand.<br>If you are not a programmer, you are not their target audience.
</strong><a class="tocSkip"></div><br>

This is a disadvantage for those of us who did not study software engineering or a similar subject. Luckily, there are ways to mitigate this disadvantage, which are discussed in the sections that follow. If you consider yourself to be less technical than a software engineer, please remember that as you advance in Python, your understanding of technical concepts will grow.

#### Lines 9-12 - What Each Argument Does
This section is very important as it will save you several hours of time when learning how to code. By reading and understanding the arguments, you will be able to do many things with only a handful of functions. Understanding a few functions at a detailed level is far more efficient than trying to memorize the basics of several functions. You will find that you can do more with less, and will have built a strong foundation and set good coding habits.

***

## 1.2 Code Complete in Jupyter Notebook

Jupyter Notebook, like many other interfaces, offers a code complete feature. This can come in very handy, and in this section we will discuss using code complete to enhance our abilities to use *help()*. This is a great way to explore a new package and develop an understanding as to its methods.<br><br>
To activate code complete, simply start typing in a coding block and press <em><tab\></em> on the keyboard. If this doesn't work, check your keyboard shortcuts in Jupyter's menu under <em>Help > Keyboard Shortcuts</em>. For example, if we open up a coding cell and type the letter <em>p</em>, followed by pressing <em><tab\></em>, we get the result in *Figure 1.1*. In this case, Jupyter's code complete tool recognized the letter <em>p</em> and returned every currently-available syntax that starts with this letter.

Much of the syntax displayed in <em>Figure 1.1</em> is beyond our current scope, but with this feature, we can dive into new packages and syntax with ease, as exemplified in <em>Figures 1.2a through 1.2d</em>.<br><br>
<div style = "width:image width px; font-size:80%; text-align:center;"><img src="files/example-chapter-1-code-complete-1.png" width="100" height="300" style="padding-bottom:0.5em;"> <em>Figure 1.1: Code complete results.</em></div>

***

The steps throughout <em>Figures 1.2a through 1.2d</em> use a technique called **chaining**, or linking multiple pieces of Python syntax together using a dot (i.e. a "."), to explore part of the *pandas* package. According to its [documentation from pydata.org](https://pandas.pydata.org/), *pandas* is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It is a very powerful package for business analytics, and it is very large. Anaconda has kindly included it in their Python 3 installation, and all we need to do is **import** it.

#### Importing Packages in Python 3
Importing packages is easy. To import pandas, we simply write: <br><br>

~~~
import pandas
~~~

<br>Conventionally, *pandas* should be imported as *pd*, so we modify the above code as follows:<br><br>

~~~
import pandas as pd
~~~

<br>We can also import specific modules and functions from a package with a minor adjustment to our syntax. For example, if we just wanted to import **DataFrame** (i.e. Python's version of a spreadsheet) instead of the entire *pandas* package, we could do so as follows:<br><br>

~~~
from pandas import DataFrame
~~~

<br>To illustrate the value of the line of code above: Have you ever noticed that when you get close to filling up your computer's hard drive it becomes very slow? A similar concept applies to our Python environment. Every import takes time to load, uses memory, and adds more names for us to keep track of. This is why taking a minimalist approach by only importing what we need is a good practice. Since our current task involves exploring the *pandas* package, we will import the entire thing.
<br><br>
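
The three import styles above can be compared side by side. The sketch below swaps in the standard-library *math* module instead of *pandas* (an assumption made so that the example runs without installing anything). The alias *m* and the variable *same_function* are illustrative names.

```python
import math             # the whole module, under its full name
import math as m        # the same module, under a shorter alias
from math import sqrt   # just one function from the module

# All three styles reach the same function.
print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))

# Confirming that the three names point to one and the same object
same_function = (math.sqrt is m.sqrt) and (m.sqrt is sqrt)
print(same_function)
```

The only thing that changes between the styles is how much we have to type, and which names become available in our Python environment.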




After *pandas* has been imported (as *pd*), it becomes available in our Python environment. If we run *help(pd)*, Python will return some documentation on *pandas*. If we add a dot to our syntax and press *<tab\>*, as in *Figure 1.2(a)*, code complete displays all of the wonderful tools *pandas* has to offer. Some of these tools have deeper levels, including *np*, which stands for *numpy*. This is another key package for business analytics, and at the time of this writing it can be reached through *pandas* as *pd.np* (*pandas* is built on top of *numpy*). Newer releases of *pandas* (2.0 and later) no longer expose *pd.np*. There, import *numpy* directly with *import numpy as np* and explore *np.* instead.<br><br>
If we chain *np.* onto our code and press *<tab\>*, code complete will display all of the wonderful tools available in this package, as in *Figure 1.2(b)*.<br><br>
This drill-down process can continue until we are at the deepest level of a package. *Figure 1.2(c)* drills down one level deeper by chaining *absolute.* onto our code, and *Figure 1.2(d)* drills down one level deeper by chaining *accumulate*. Since there are no further levels to drill down into, code complete does not display a popup window when trying to extend the chain beyond *pd.np.absolute.accumulate*.

Much of the syntax discovered in each step of this chaining example is beyond our current scope, and seeing it for the first time can be overwhelming. Keep in mind: Python was designed for a wide array of programming tasks. Having so many things pre-built and maintained is going to save you an immense amount of time without costing you a penny. This is one of the key benefits of Python as well as other open source programming languages.

Below is a summary of the chaining steps:

1. *pandas* was imported as *pd*<div style="margin-bottom:0.5em;"></div>
2. A *help()* wrapper was created<div style="margin-bottom:0.5em;"></div>
3. Code complete was called on *pd.* to access its available syntax<div style="margin-bottom:0.5em;"></div>
4. *np* was selected, chaining together *pd.np*<div style="margin-bottom:0.5em;"></div>
5. Code complete was called on *pd.np.* to access its available syntax<div style="margin-bottom:0.5em;"></div>
6. *absolute* was selected, chaining together *pd.np.absolute*.<div style="margin-bottom:0.5em;"></div>
7. Code complete was called on *pd.np.absolute.* to access its available syntax<div style="margin-bottom:0.5em;"></div>
8. *accumulate* was selected, chaining together *pd.np.absolute.accumulate*<div style="margin-bottom:0.5em;"></div>
<br><br>
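
The drill-down in the steps above can also be done in code rather than with the keyboard. The sketch below uses the built-in *dir()* function, which lists the names available on an object, as a rough stand-in for pressing *<tab\>* after a dot. It explores the standard-library *os* module instead of *pandas* (an assumption made so that the example runs without installing anything), and the variable names are illustrative.

```python
import os

# Level 1: every public name available after typing "os."
top_level = [name for name in dir(os) if not name.startswith("_")]
print("path" in top_level)

# Level 2: chain .path and list what is available after "os.path."
path_level = [name for name in dir(os.path) if not name.startswith("_")]

# Narrow the list the way code complete does when we start typing "jo"
matches = [name for name in path_level if name.startswith("jo")]
print(matches)

# Once something looks interesting, help() explains it:
# help(os.path.join)
```

Code complete is faster for day-to-day exploration, but *dir()* is handy when working somewhere that code complete is not available.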


<br><br>
<div style = "width:image width px; font-size:80%; text-align:center;"><img src="files/example-chapter-1-code-complete-2a.png" width="200" height="300" style="padding-bottom:0.5em;"> <em>Figure 1.2(a): Exploring the pandas package (1 of 4).</em></div>
<br>

***

<br>
<div style = "width:image width px; font-size:80%; text-align:center;"><img src="files/example-chapter-1-code-complete-2b.png" width="200" height="300" style="padding-bottom:0.5em;"> <em>Figure 1.2(b): Exploring the pandas package (2 of 4).</em></div>
<br>

***

<br>
<div style = "width:image width px; font-size:80%; text-align:center;"><img src="files/example-chapter-1-code-complete-2c.png" width="200" height="300" style="padding-bottom:0.5em;"> <em>Figure 1.2(c): Exploring the pandas package (3 of 4).</em></div>
<br>

***

<br>
<div style = "width:image width px; font-size:80%; text-align:center;"><img src="files/example-chapter-1-code-complete-2d.png" width="200" height="300" style="padding-bottom:0.5em;"> <em>Figure 1.2(d): Exploring the pandas package (4 of 4).</em></div>
<br>

***

<em>Figure 1.3</em> displays each of these steps side-by-side. Its final code can be run in *Code 1.2.1*.<br><br>

<div style = "width:image width px; font-size:80%; text-align:center;">
<img src="files/example-chapter-1-code-complete-2a.png" style="float: left; width: 20%; margin-right: 1%; margin-bottom: 0.5em; padding-left:0.5em;">
<img src="files/example-chapter-1-code-complete-2b.png" style="float: left; width: 20%; margin-right: 1%; margin-bottom: 0.5em; padding-left:0.5em;">
<img src="files/example-chapter-1-code-complete-2c.png" style="float: left; width: 20%; margin-right: 1%; margin-bottom: 0.5em; padding-left:0.5em;">
<img src="files/example-chapter-1-code-complete-2d.png" style="float: left; width: 20%; margin-right: 1%; margin-bottom: 0.5em; padding-left:0.5em;">
<p style="clear: both;">
<em>Figure 1.3: Code complete results after method chaining.</em></p>
</div>

***

In [16]:
## Code 1.2.1 ##

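import pandas as pd

# Note: pd.np was removed in pandas 2.0. On newer versions, use
# import numpy as np and help(np.absolute.accumulate) instead.
help(pd.np.absolute.accumulate)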

Help on package pandas:

NAME
    pandas

DESCRIPTION
    pandas - a powerful data analysis and manipulation library for Python
    
    **pandas** is a Python package providing fast, flexible, and expressive data
    structures designed to make working with "relational" or "labeled" data both
    easy and intuitive. It aims to be the fundamental high-level building block for
    doing practical, **real world** data analysis in Python. Additionally, it has
    the broader goal of becoming **the most powerful and flexible open source data
    analysis / manipulation tool available in any language**. It is already well on
    its way toward this goal.
    
    Main Features
    -------------
    Here are just a few of the things that pandas does well:
    
      - Easy handling of missing data in floating point as well as non-floating
        point data.
      - Size mutability: columns can be inserted and deleted from DataFrame and
        higher dimensional objects
      - Automatic an

In [None]:
## Space to practice using help() ##





***

## 1.3 Finding Answers on the Internet

*Note: This is by no means an exhaustive list, and there are many great resources that are not mentioned here. If you find a different resource that explains code in a way that fits well with your learning style, use it. These two resources are mentioned as I have found them incredibly helpful in my coding journey.*

### Stack Overflow
There are many great coding resources available on the Internet. At the time of this writing, [Stack Overflow](https://stackoverflow.com) is one of the most popular. According to its website:
<br><br><div align="center"><em>
    Stack Overflow is a question and answer site for professional and enthusiast programmers.
</em><a class="tocSkip"></div><br>
In other words, Stack Overflow is a place where coders ask other coders questions. It is a wonderful place to find information to help get you through your coding challenges, and is also a place where you can post your own questions so that the coding community can help you out. This is one of the key benefits of the platform, and it is free of charge.<br>
    
**Caution:** The relevance of your search results can be hit or miss. Sometimes you will have trouble articulating your question in a way that gets good results. Other times you may find solutions to your question, but they are too complicated for your current coding ability. Every now and then, you will be more confused than when you started your search. This is normal. Keep in mind that Stack Overflow was designed so that coders could ask other coders questions, regardless of their programming level. You will find some very advanced solutions to problems that you never even knew existed. This leads us to search engines, our second suggested resource.


### Working with a Search Engine
It goes without saying that many of your coding questions can be resolved via a search engine. However, it is a good idea to reflect on why search engines are a good resource. Most notably, they allow us to ask questions the way that a human would ask them. We can use this to our advantage, thus alleviating any disadvantages we may have from not being the target audience of a programmer.
<br><br><div align="center"><strong>
    Ask questions with a search engine the way that you would ask someone at your programming level.
</strong><a class="tocSkip"></div><br>
Interestingly enough, in many cases after experiencing poor results when searching on Stack Overflow, you will find that the top search result in a search engine is a page on Stack Overflow. Think about this. Search engines were designed to be friendly to humans. The search function in Stack Overflow was designed to be friendly to programmers, who are a subset of humans with a sophisticated understanding of programming jargon. When this happens, click on the Stack Overflow page and take note of how the question was phrased. This is a good way to improve your ability to find the information you need.


**Challenge for the Reader:** If you are serious about learning Python, register with Stack Overflow and start writing answers to other people's questions. This will increase your learning speed. Stack Overflow also assists new contributors by letting other coders know that they are new to the platform. This means that other coders will realize you are trying your best, and in most cases they will be more than willing to give you feedback.

***

## 1.4 Talking to Humans through Comments
Even though you may have very little coding experience, try reading the following code and understanding what the programmer was trying to accomplish:<br>
***

In [None]:
## Code 1.4.1 ##

import pandas as pd

original_df = pd.DataFrame([[None, 2, 3],
                            [4, None, 6],
                            [7, 8, None]])

df_mean = pd.DataFrame.copy(original_df)

for col in df_mean:
    if df_mean[col].isnull().any():
        col_mean = df_mean[col].mean()
        df_mean[col] = df_mean[col].fillna(col_mean).round(2)

***
We will be using a slightly-modified version of this code later in this book. If you're new to coding, it can be very intimidating. We learned in *Section 1.1* that we can use the *help()* function to understand what each line of code means. This is a good idea, especially if we are planning to use this code for a similar analysis. From reading the code, however, it is difficult to intuitively understand what this code is trying to do. Wouldn't it be great if the person who wrote this code gave us some additional documentation, in human language, that outlined what was happening step-by-step?<br><br>

In the coding world, we provide this additional documentation via **comments**. Comments allow humans to talk to other humans within their code. They are very special in that when we write them, the computer knows we are talking to other humans and ignores our writing. This gives us the freedom to write anything we need to in order to clearly explain our code to others. In this section, we will cover two of the most widely-used forms of comments:
* hashtag comments (#)
* triple-quote comments (""" """)

***


#### Hashtag Comments (#)
As the name implies, hashtag comments are denoted by, well, a hashtag (#). The essential purpose of hashtag comments is that they help others understand what you are doing, even if you are doing it wrong. If there is a discrepancy between your hashtag comments and your code, more experienced coders are likely to pick that up and give you feedback. If we add hashtag comments to our code, it becomes much more readable. This has been done in *Code 1.4.2*.

<br><br><div align="center"><em>
<a href="https://imgflip.com/i/32441k"><img src="https://i.imgflip.com/32441k.jpg" title="made at imgflip.com" width="250" height="250" alt="Game of Thrones Meme">Created with imgflip Meme Generator.</a>
</em><a class="tocSkip"></div>

***

In [None]:
## Code 1.4.2 ##

# Importing packages
import pandas as pd  # data science essentials

# Creating a DataFrame that has missing values
original_df = pd.DataFrame([[None, 2, 3],
                            [4, None, 6],
                            [7, 8, None]])

# Creating a copy of the original dataset so that I don't destroy the
# original.
# I'm planning to impute missing values on this dataset using the mean.
# This is why I called this df_mean instead of something else.
df_mean = pd.DataFrame.copy(original_df)

# This is a loop that looks for missing values in a column.
# For each column with missing values, use the mean to fill in the missing
# values.
for col in df_mean:
    if df_mean[col].isnull().any():
        col_mean = df_mean[col].mean()
        df_mean[col] = df_mean[col].fillna(col_mean).round(2)

***
Although the functionality of *Code 1.4.2* is identical to *Code 1.4.1*, it is much easier to understand what each line is doing and the rationale behind why it was coded. Again, the computer does not care that these lines are here. It will see each '#' and know to skip anything on that line written after it.
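
As a small illustration of how Python skips comments, consider the sketch below. The names *price*, *quantity*, and *label* are hypothetical and only serve this example. It shows a full-line comment, a comment placed after code on the same line, and one case worth remembering: a '#' that sits inside quotation marks is part of the text, not a comment.

```python
# A full-line comment: Python skips this entire line.
price = 20      # an inline comment: Python skips everything after the '#'
quantity = 3

# The '#' inside the quotes below is part of the string, not a comment.
label = "Order #1"

total = price * quantity
print(label, "total:", total)
```

Deleting every comment from this sketch would not change what it prints.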

#### Stand-Alone Strings
Now that we know Python ignores hashtags, it is not surprising that Python ignores other things as well. More accurately, there are some things that Python reads but does not know what to do with, and thus it does not affect our code. One such example is **stand-alone strings**. Take for example *Code 1.4.3*, which is a modified version of *Code 1.4.2*:
***

In [None]:
## Code 1.4.3 ##

# Importing packages
import pandas as pd  # data science essentials

# Creating a DataFrame that has missing values
original_df = pd.DataFrame([[None, 2, 3],
                            [4, None, 6],
                            [7, 8, None]])

("""Creating a copy of the original dataset so that I don't destroy the
   original. I'm planning to impute missing values on this dataset using the
   mean. This is why I called this df_mean instead of something else.""")
df_mean = pd.DataFrame.copy(original_df)

("""This is a loop that looks for missing values in a column.
   For each column with missing values, use the mean to fill in the missing
   values.""")
for col in df_mean:
    if df_mean[col].isnull().any():
        col_mean = df_mean[col].mean()
        df_mean[col] = df_mean[col].fillna(col_mean).round(2)

***
Notice how in *Code 1.4.3* we replaced the longer hashtag comments with triple-quote strings. This is a way to write comments that span multiple lines in a more readable way. Technically speaking, Python will recognize these strings as lines of code and then try to run them. However, since there is no *print()* wrapper around them and they are not assigned to objects, Python has no use for them and immediately forgets that they exist. In other words, since Python has not been instructed to do anything with the triple-quote strings, it ignores them. Finally, notice also that we added parentheses around each triple-quote string. This is a good practice, as it helps to organize our code and to prevent unwanted errors.<br><br>
*Note:* There may be a time when someone reviewing your code tells you that using stand-alone strings (i.e., strings that aren't part of an object or a *print()* statement) to make comments is a bad practice. There is some rationale to this: because Python reads stand-alone strings as code instead of skipping them the way it skips hashtag comments, your code may take a tiny bit longer to process. In fields like algorithmic trading, "a tiny bit longer" can mean missed opportunities, and thus such comments should be avoided. In business analytics, however, the purpose of most analyses is to gain insights and make recommendations. This takes a thorough and thoughtful exploration of the data, as well as constant communication with stakeholders to truly understand the problem you are trying to solve. In this sense, **your goal is to communicate as clearly as possible.** If this is best done by using stand-alone strings, use them.<br><br>
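
For readers who are curious, the difference between a hashtag comment and a stand-alone string can be made visible with Python's built-in *ast* module, which lists the statements Python finds in a piece of code. The sketch below is only an illustration: the two snippets and their variable names are hypothetical and are not part of the analysis above.

```python
import ast

# Two hypothetical snippets that differ only in how the note is written.
with_hashtag = "# make a copy\nx = 1\n"
with_string = '("""make a copy""")\nx = 1\n'

# ast.parse() turns source text into the list of statements Python sees.
hashtag_statements = ast.parse(with_hashtag).body
string_statements = ast.parse(with_string).body

print("Statements with a hashtag comment:", len(hashtag_statements))
print("Statements with a stand-alone string:", len(string_statements))
print("First statement type:", type(string_statements[0]).__name__)
```

The hashtag version contains one statement (the assignment), while the stand-alone string version contains two: the string itself, which Python treats as an expression, followed by the assignment.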

*Code 1.4.4(a)* and *Code 1.4.4(b)* demonstrate the additional processing time from the use of stand-alone strings. As can be observed, the use of stand-alone strings had a very minor, if any, impact on the processing time of our code.
***

In [49]:
## Code 1.4.4(a) ##

import time
start_time = time.time()

# Importing packages
import pandas as pd  # data science essentials

# Creating a DataFrame that has missing values
original_df = pd.DataFrame([[None, 2, 3],
                            [4, None, 6],
                            [7, 8, None]])




df_mean = pd.DataFrame.copy(original_df)





for col in df_mean:
    if df_mean[col].isnull().any():
        col_mean = df_mean[col].mean()
        df_mean[col] = df_mean[col].fillna(col_mean).round(2)


print("My program took", time.time() - start_time, "to run")

My program took 0.005373954772949219 to run


In [50]:
## Code 1.4.4(b) ##

import time
start_time = time.time()

# Importing packages
import pandas as pd  # data science essentials

# Creating a DataFrame that has missing values
original_df = pd.DataFrame([[None, 2, 3],
                            [4, None, 6],
                            [7, 8, None]])

("""Creating a copy of the original dataset so that I don't destroy the
   original. I'm planning to impute missing values on this dataset using the
   mean. This is why I called this df_mean instead of something else.""")
df_mean = pd.DataFrame.copy(original_df)

("""This is a loop that looks for missing values in a column.
   For each column with missing values, use the mean to fill in the missing
   values.""")
for col in df_mean:
    if df_mean[col].isnull().any():
        col_mean = df_mean[col].mean()
        df_mean[col] = df_mean[col].fillna(col_mean).round(2)

        

print("My program took", time.time() - start_time, "to run")

My program took 0.007046222686767578 to run


***

#### Side Note: Code Processing Time
The code used to time the difference between the two code blocks was found via [this page on Stack Overflow](https://stackoverflow.com/questions/12444004/how-long-does-my-python-application-take-to-run). The template provided via the Stack Overflow page is shown in *Code 1.4.5*.
***

In [42]:
## Code 1.4.5 ##

import time
start_time = time.time()

print("My program took", time.time() - start_time, "to run")

My program took 2.6941299438476562e-05 to run


***
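A single timing can change from one run to the next, so very small differences like the ones measured in this section are hard to interpret on their own. The sketch below is one possible variation on the timing template, not the approach from the Stack Overflow page: it swaps in *time.perf_counter()*, a clock in Python's *time* module intended for measuring short durations, and repeats the measurement several times. The function *impute_example()* is a hypothetical stand-in for whatever code you would like to time.

```python
import time


def impute_example():
    """Hypothetical stand-in for the code being timed."""
    values = [4, None, 7]
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    # Replace each missing value with the mean of the known values.
    return [mean if v is None else v for v in values]


# Time the same code several times to see how much the results vary.
run_times = []
for _ in range(5):
    start_time = time.perf_counter()
    impute_example()
    run_times.append(time.perf_counter() - start_time)

print("Fastest run:", min(run_times), "seconds")
print("Slowest run:", max(run_times), "seconds")
```

If the gap between the fastest and slowest runs is as large as the gap between two versions of your code, the difference between the versions is probably not meaningful.

***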
## 1.5 Summary
~~~
 __   __        __   __       ___                ___    __        __    /
/  ` /  \ |\ | / _` |__)  /\   |  |  | |     /\   |  | /  \ |\ | /__`  / 
\__, \__/ | \| \__> |  \ /~~\  |  \__/ |___ /~~\  |  | \__/ | \| .__/ .  
~~~                                                                         


Hats off to you for finishing the first chapter of this book. You are now ready to move forward given your understanding of:
* finding help with the help() wrapper
* code completion in Jupyter Notebook
* finding answers on the Internet
* talking to humans through code comments