# Introduction to Python

Author: Avery Blankenship

Date: 6/25/22

---

Before we delve too deeply into the ins and outs of navigating Jupyter Notebooks, it is worth discussing some of the key features that distinguish Python from other programming languages. Unlike languages such as Java or C++, Python has a relatively forgiving syntax which is designed to be easy to learn. This means that some of the rigid requirements of other programming languages, for instance declaring variables before you can use them, wrapping blocks of code in curly braces `{}`, or ending every statement with a semicolon `;`, are not required in Python. This flexibility makes Python an ideal beginner-friendly programming language.

In [None]:
print("hello world")

For instance, look at the line of code above. Even if you don't know any Python, you can probably tell what the code does: it prints the phrase "hello world." You'll likely find that much of Python is just as readable as this "hello world" program. 

This tutorial is going to assume that you may have some light exposure to code, but no substantial experience in Python. For this reason, there are a couple of terms and features which may be useful to define for you up front. First, it is important to note that when you are running Python code, even in Jupyter Notebooks, the computer executes one line at a time, starting from the top and working its way down the file. It keeps going until it either hits an error or reaches the end of the file. It is important to understand this one-line-at-a-time execution for debugging purposes. If you are running a snippet of code and the output looks strange to you, you may want to debug the code by working through the file one line at a time in order to reproduce this strange output. Similarly, understanding that Python executes from top to bottom is important as you are declaring variables, functions, etc.

The code only has access to the lines it has already encountered. In programming circles, this type of language is called an **interpreted** language. Other languages such as Java and C++ are **compiled languages**, which means that they don't run a single line at a time. Instead, they **compile** all of the code and then execute it. Because Python is an interpreted language, it tends to be a bit slower than something like C++ or Java, but it has the benefit of being easier to debug. Interpreted languages are what make this Jupyter Notebook possible. Rather than rerunning all of the code at once, interpreted languages allow you to run a line at a time and run code at your own pace.

Let's look at these lines of code for example:

In [None]:
x = 2
x = 4 + 1
x = x / 2
print (x)

In [None]:
x = 4 + 1
x = 2
x = x / 2
print (x)

As you can see, because Python executes one line at a time, in the first example, the variable `x` is being set to `2` in the first line, is being reset to `4 + 1` in line two and finally, that value is divided by 2. In the second example, `x` is set to `4 + 1`, is reset to `2` in line two, and then is divided by 2. Although both examples use the same formulas, because they are ordered differently, the final value of `x` differs. If you are writing code that uses a lot of variables, it can even be helpful to keep a list of your variables and what their current values are (or at least are supposed to be) at different key points in the code.
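
One way to keep such a list of values is to let the code record them for you. The sketch below is only an illustration: it repeats the first example and stores the value of `x` after each line in a list called `history`, a name made up for this example.

```python
# Record the value of x after every assignment (same order as the first example).
history = []  # hypothetical helper list, used only for illustration

x = 2
history.append(x)

x = 4 + 1
history.append(x)

x = x / 2  # in Python 3, the / operator always gives a decimal (float) result
history.append(x)

print(history)  # [2, 5, 2.5]
```

Reading the list from left to right shows exactly how `x` changed as Python worked its way down the lines.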

Note that I didn't need to tell the computer what type of variable `x` is. It decided for itself based on the information I provided about `x` (in this case, that the value of `x` starts out as an integer). In programming circles, this tendency of Python to work out variable types for itself while the code runs is called "dynamic typing." It goes hand in hand with a related idea called "duck typing": the program assumes that if something looks like a duck and sounds like a duck, it is probably a duck--or in this case an integer.
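
If you are curious which type Python has settled on, the built in `type()` function will tell you. The short sketch below assumes nothing beyond plain Python; the names `first_type`, `second_type`, and `third_type` are made up for illustration.

```python
# Ask Python what type x currently has after each change.
x = 2
first_type = type(x).__name__   # x holds a whole number

x = x / 2
second_type = type(x).__name__  # dividing with / produces a decimal number

x = "two"
third_type = type(x).__name__   # the same name can later hold text

print(first_type, second_type, third_type)  # int float str
```

Notice that the same variable name held three different types without any extra instructions from us.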


## Using Jupyter Notebooks

There are a few things worth noting about Jupyter Notebooks before proceeding with the rest of this tutorial. First, one of the most prominent features of Jupyter Notebooks is that they seamlessly combine text and executable code. This feature makes Jupyter Notebooks an ideal environment in which to learn Python. Because Jupyter Notebooks are structured around text and chunks of code, you are able to run code in isolated sections. I recommend taking this approach as you proceed with the tutorial as it will allow you to test things out in bits and pieces in order to ensure that you really understand how the code is working before moving on.

In order to run a code snippet, simply click the box where the code is and then click the "Run" button on the toolbar at the top. Try running this code snippet below:

In [None]:
print("I am a code snippet")

If everything is working like it should, you should see the phrase "I am a code snippet" appear just below the print statement. 

Some other buttons/features to note are: 

- The stop button on the toolbar halts whatever code is currently running. This may be useful in cases where, for example, you mistakenly tell the code to iterate over the wrong data set.
- Right next to the stop button is the restart button. This button will halt whatever code is currently running and then restart the kernel.
- Next to the restart button is the restart and run button. This button will both restart the kernel and run the entire notebook.

You can save a Jupyter Notebook by navigating to the "File" button on the toolbar and then to the "Save" button. Alternatively, you can save by directly clicking the "Save" button on the toolbar.

## Other Environments

Although this tutorial is going to take place in Jupyter Notebooks, it's still a good idea to know of other environments for working with Python. When I say 'environment', what I mean is some place on your computer where the Python interpreter as well as libraries and scripts are installed so that you can run Python code.

I'm going to walk you through two particularly popular Python environments: IDLE and Spyder.

[**IDLE**](https://www.python.org/downloads/) (Integrated Development and Learning Environment) is an environment that you can download directly from Python's website. The IDLE interface is very simple. You write code in one window that resembles a text editor, and when you run the code, the output appears in a separate window. Because of IDLE's minimalist design, it can be a great environment for beginners. However, one downside to IDLE is that many libraries don't come pre-installed, so there is slightly more pre-work required for code that involves libraries. IDLE generally is a good place to get a handle on the basics *because* it requires you to do more of this sort of work, though once you become more familiar with Python, you will likely want to switch to a different environment that offers more features.

[**Spyder**](https://www.python.org/downloads/), on the other hand, is an environment that comes as part of Anaconda, a data science platform that comes with many environments (such as Jupyter Notebooks) installed. The benefit of using Spyder, or Anaconda in general, is that Anaconda makes installing libraries very easy, and in fact many libraries come pre-installed. Spyder is particularly popular for machine learning, natural language processing, and other methodologies that involve working with lots of data or natural language. One reason for its popularity is that Spyder makes debugging code very simple by allowing you to run your code one line at a time. However, the trade-off is that Spyder has many more buttons than IDLE, so it can take some time to get used to navigating all of the settings.

All of the code written in this tutorial can run in either of these environments, so if you want to do some fiddling on your own or just play around with the code in order to become more familiar with Python, try installing either IDLE or Spyder.

## Debugging in Python

It is useful to know how to debug in Python before getting too deep into any code. As I explained above, Python is an **interpreted** language, which means that rather than compiling code before executing it, Python interprets code one line at a time. Python's status as an interpreted language makes debugging much easier because the code will simply stop running or print an error when it reaches the problem line.

There are a couple of different ways that you can approach debugging in Python and some of these methods will also depend on the environment that you are using to code. 

### Built in Debugger

One simple way to debug your code is to use the built in debugger. If you run the code below, you'll notice that the output includes line numbers and an arrow pointing to the `set_trace()` function. The output tells us that a break point has been placed in the code, as well as where that point is (in this case on line 4). Everything printed below the break point is code that has not yet been run.

The debugger is ideally run in the Command Line. When you run the debugger from the Command Line, the break point is automatically placed at the first line in your code. There are some great resources out there if you want to read more about using the debugger in the [Command Line](https://hub.packtpub.com/debugging-and-profiling-python-scripts-tutorial/).

When you run the debugger in the console, the code will run until it reaches the first break point. Once the break point is reached, note how the console will now allow you to type and submit commands. This is because the code will not run again once it hits the break point until you tell the debugger to proceed. Some of the commands you can use in debugging mode are as follows:

- `help`, displays all of the available commands
- `continue`, execute the code until you reach either the end of the file or another break point
- `step`, execute the current line and stop at the next possible line, stepping into a function if one is called
- `where`, displays where you currently are in the code, including the current line number
- `whatis`, tells you what the variable type of a variable is (for example, "whatis y")

In order to begin debugging, you want to insert the following lines into your code:

In [1]:
import pdb  #this line imports the pdb module which is the built in Python debugger

x = "some string"
pdb.set_trace() #this line is a break point. You can also use the built in function breakpoint()

y = 2
pdb.set_trace()

print (q)

q = x + y

--Return--
None
> <ipython-input-1-d67252fc4242>(4)<module>()
      2 
      3 x = "some string"
----> 4 pdb.set_trace() #this line is a break point. You can also use the built in function breakpoint()
      5 
      6 y = 2

ipdb> continue
--Return--
None
> <ipython-input-1-d67252fc4242>(7)<module>()
      5 
      6 y = 2
----> 7 pdb.set_trace()
      8 
      9 print

NameError: name 'q' is not defined

Note that the output shows that the code ran until it reached line 4, which is where the first break point occurs. Then, I used the `continue` command by typing `continue` at the `ipdb>` prompt. This command told the debugger to keep moving through the code until it hit the next break point at line 7. Then, when I used `continue` again, since there were no more break points, the code ran until it hit an error. In this case, the error is that I declared `q` _after_ I tried to use it with the `print()` function. Since Python is interpreted, when the computer got to line 9 where the `print()` statement is, it had no idea what `q` was and therefore could not run. The debugger will stop once it reaches the erroneous line. This style of debugging makes identifying errors very straightforward and also allows you to set break points in the places where you are unsure of the code and skip the places where you are more confident.
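
The same ordering mistake can also be seen without the debugger. The sketch below is not part of the original notebook; it wraps the problem line in `try`/`except` so that the error message can be stored in a variable (here called `message`, a name chosen for illustration) and printed instead of stopping the code.

```python
# Using a variable before it has been assigned raises a NameError.
try:
    print(q)  # q has not been given a value yet
except NameError as error:
    message = str(error)
    print("Python stopped here:", message)

# Once q is assigned, the same print call works.
q = "now q exists"
print(q)
```

The first `print(q)` fails because, at that point in the top-to-bottom run, Python has never seen `q`.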

After you are done using the debugger, you want to use the `exit()` function which will exit out of debugging mode and let you continue to run code. The `exit()` function works by killing the kernel which will force it to restart, clearing any code currently being executed or variables that have been stored.

In [2]:
exit()


### Using Print Statements

One really easy way to debug your code is to place print statements throughout it. For example, if you have a loop and you know that the loop is supposed to run three times, then a print statement placed within that loop will print three times if the loop is correct. Similarly, you can place print statements inside of if statements, and if the statement prints, then you know that the condition for the if statement to run was met.

In [None]:
x = 2
y = 1

if x == y:
    print("I made it here")
    print(x)
    print(y)
    
else:
    print("something is wrong")
    print(x)

As you can see, because the if statement's condition was not met, the code printed "something is wrong" and then continued to execute the lines below that print statement. Because the if statement's condition was never met, the print statement below the if statement was never printed and the lines below the statement never executed.

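It is important to keep in mind that Python files or code snippets run until they hit an error or reach the end. If the mistake is in a loop's logic rather than its syntax, for example a condition that never becomes false, Python will not report an error and the code can run indefinitely.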
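
The loop case described above can be sketched the same way. The example below is only an illustration with made-up names (`loop_count`, `passes`, `max_passes`): the first loop uses a print statement to confirm it runs three times, and the second shows a loop with a deliberate mistake that would never end on its own, stopped by a simple safety limit.

```python
# A print statement inside the loop confirms how many times it runs.
loop_count = 0
for item in ["a", "b", "c"]:
    loop_count += 1
    print("loop pass", loop_count, "item is", item)

print("the loop ran", loop_count, "times")

# A loop with a mistake in its logic: number grows, so "number > 0" stays true.
number = 10
passes = 0
max_passes = 5  # safety limit so the example cannot run forever
while number > 0:
    number = number + 1  # deliberate bug: this should subtract 1
    passes += 1
    print("pass", passes, "number is", number)
    if passes >= max_passes:
        print("stopping: the loop did not end on its own")
        break
```

Seeing `number` climb in the printed output is the clue that the loop is moving in the wrong direction.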

## Navigating to Your Working Directory

Often, you will need to know what folders on your computer Python has access to. Folders are often referred to as *directories* and the folder that is automatically opened when Python runs is called your *current working directory*. That just means that if you open or close any files, Python already knows about the working directory and thus you don't need to provide a full file path in order to tell Python where a particular file is. Imagine that Python thinks the entire universe is just whatever folder is your current working directory: it doesn't know where anything else on your computer is.

Below is a code snippet that will allow you to get the file path for your current working directory. We are going to import the `os` library which will give us access to the `getcwd()` function. This function will return the full file path for your current working directory. 

In [None]:
import os 
os.getcwd()

If you wanted to change your current working directory, you would use the `chdir()` function which comes with the `os` library. This function accepts a file path as its input.

In [None]:
os.chdir('SOME FILE PATH')

If you are using a Windows computer, you may get a unicode error informing you that bytes can't be decoded. That is totally fine and normal! All you need to do is add an additional backslash to your file path so that each backslash is escaped. It would look something like this:

`"C:\\users\\admin\\Desktop"`
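
Another option, not covered above, is to put the letter `r` in front of the opening quote, which tells Python to read the backslashes literally. The sketch below simply compares the two spellings of the same example path; the variable names are made up for illustration.

```python
# Two ways to write the same Windows-style path.
escaped_path = "C:\\users\\admin\\Desktop"  # each backslash is doubled
raw_path = r"C:\users\admin\Desktop"        # r"..." keeps backslashes as typed

print(escaped_path)
print(escaped_path == raw_path)  # True
```

Both variables hold exactly the same text, so you can use whichever style you find easier to read.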

If you were to restart the Python shell or kernel, your current working directory would be changed back to the default. You can read more about working directories in this `os` [tutorial](https://data-flair.training/blogs/python-directory/#:~:text=How%20to%20Get%20Current%20Python,use%20the%20getcwd()%20method.&text=Cwd%20is%20for%20current%20working,as%20a%20string%20in%20Python).
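
To see `getcwd()` and `chdir()` working together without touching your real folders, here is a minimal sketch. It assumes the standard `tempfile` library (not used elsewhere in this tutorial) to create a throwaway folder, and the variable names are chosen only for illustration.

```python
import os
import tempfile

# Remember where we started so we can come back.
starting_directory = os.getcwd()

# Create a temporary folder, move into it, then move back out.
with tempfile.TemporaryDirectory() as temporary_folder:
    os.chdir(temporary_folder)
    inside = os.path.realpath(os.getcwd())
    print("now working in:", inside)
    # Leave the folder before it is deleted at the end of the with block.
    os.chdir(starting_directory)

print("back in:", os.getcwd())
```

Saving the starting directory first is a handy habit, since it lets you return to where you began no matter where `chdir()` took you.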

## Libraries

At this point, you may be wondering: what exactly is a library? Essentially, a library in Python is like a recipe book for a bunch of different functions. When you import a library, all you are asking Python to do is access the code within that library that tells Python how to make certain functions work. For example, the `string` library includes a bunch of functions that make working with strings much easier. Above, we used the `os` library, which makes working with operating system functions easier. In short, a library gives you access to functions beyond the ones that are built in.

A function that *does* automatically come with Python is called a *built in* function. A built in function is a function that will work even without importing libraries. For example, the `print()` function comes automatically with Python and you don't need to import a library to make it work.

It's a good practice to keep all of your library imports at the top of your code. Because Python is *not* a compiled language, the code will always run from top to bottom. This means that Python has access to code in the order you write it. Keeping all of your import calls at the top of the code ensures that the computer will have steady access to those libraries throughout the code without you having to keep track of when a particular line would require a library to be imported.

In the code block below, you will see that we are using `import` to import the `string` library.

In [None]:
import string
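
Once the library is imported, you reach its contents by typing the library name, a period, and then the name of the thing you want. The sketch below shows a few items that ship with the `string` library; the variable `title` is a name chosen for illustration.

```python
import string

# Ready-made collections of characters.
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# A helper function that capitalizes each word.
title = string.capwords("hello world")
print(title)                   # Hello World
```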

Another popular way to import a library is to import it under a shorter name. Basically, this process is like giving the library a nickname. When you nickname a library, any time you need to access functions within that library, you just need to use the nickname instead of the entire name of the library.

In [6]:
import pandas as pd

dataframe = pd.DataFrame()
print(dataframe)

Empty DataFrame
Columns: []
Index: []


Finally, another useful trick for importing libraries is that if you only need a single function or set of functions from a library, you can import just those names rather than referring to the entire library. Importing only what you need keeps your code shorter and makes it clear which parts of a library you actually rely on. Note that Python still loads the whole library behind the scenes, so this is a convenience rather than a way to save memory. In the example below, I import *only* `DataFrame` from the library `pandas` and then I also decide to nickname it `df`. This means that when I want to make a new dataframe, I only need to use `df()` rather than `pd.DataFrame()` or `pandas.DataFrame()`. However, importing only `DataFrame` means that your code only has access to that specific `pandas` name.

In [7]:
from pandas import DataFrame as df

dataframe = df()
print(dataframe)

Empty DataFrame
Columns: []
Index: []
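
The same import pattern works with any library. The sketch below swaps in the standard `math` library so that it runs even where `pandas` is not installed; the nickname `square_root` is made up for illustration. It also checks `sys.modules`, Python's record of loaded libraries, to show that the whole `math` library was loaded even though only one name was imported.

```python
import sys
from math import sqrt as square_root  # import one function and nickname it

answer = square_root(16)
print(answer)  # 4.0

# The full library is still loaded behind the scenes.
print("math" in sys.modules)  # True
```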


Some libraries come pre-installed with Python. This is not the same thing as a function being built in. A built in function is a function that you don't need to import. When a library is *installed*, that means that Python has access to all of the code that makes up that library, but the library still needs to be imported before you can use it.

You can download libraries from all sorts of places, including GitHub. However, be careful that you download a library from a reputable source, since you likely won't fully know what actually comes with the library code. One way to make sure you are downloading a library safely is to download it through Anaconda itself. If a library is in the Anaconda database, then you can be confident that it won't break your computer. You can read about how to use Anaconda's database to install libraries in Anaconda's [documentation](https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages/). Generally, when you search for a library on Anaconda's website, Anaconda will provide you with the correct call to use in order to install it.

It is also important to note that if you are *not* using Anaconda, you will likely have to download many libraries yourself. Most of the popular and widely used libraries come preinstalled with Anaconda.

## Functions and Variables

We've been using the term *function* a lot, but we haven't quite discussed what a function actually is and how to make one. As stated above, a function is essentially a recipe for some task that you want the computer to complete. By storing code within a function, you are able to recall that code with a single line rather than copying and pasting all of it every time you need to use it. When you import a library, you are essentially importing a bunch of functions, which makes recalling them even easier.

I've already walked you through how to import a function, but let's briefly walk through how you make your own functions. Generally, functions follow the same formulation which includes a name, a set of parameters, and a definition. 

- **Name**. When you give your function a name, you are telling your computer how you would like your code to be referred to. You will use this name any time you want to call your function.
- **Parameters**. Parameters are the set of arguments that the function requires in order to run. For instance, recall that `print()` is a built in function. The parentheses next to the name `print` are where you would place the parameters for the function. Since `print()` prints its parameters to the console, it accepts most variables and strings as a parameter. So for instance, if you wanted to print the words "hello world" you would call `print()` using `print("hello world")`.
- **Definition**. A function definition is the actual recipe for the function itself: it's the code that tells the computer what to do when you call the function. To define a function, you use the `def` keyword, followed by the function's name, its parameters in parentheses, and a colon, and then indent one level for the definition.

In the code block below, let's make a simple function that will add the two parameters you provide it with and then print the results. 

In [9]:
def add_function(parameter1, parameter2):
    result = parameter1 + parameter2
    print(result)

Now, if we want to use our function, we just need to call it with the correct parameters. Note: if there are any mistakes in the code within your function definition, those mistakes will only throw an error message once you actually use the function.

In [10]:
add_function(1, 12)

13


Notice above that within our function definition, I am storing the results of the addition within `result`. In this case, `result` is known as a *variable*. Just like in math, variables allow us to store some type of information for later use. Using variables can save you a lot of time since instead of having to type `parameter1 + parameter2` every time we need that calculation, we can just store the results within a variable called `result` and use the variable when we need it.

In Python, you don't need to tell the computer about your variables before you use them. In other languages, this is called *declaring* a variable. Instead, you can just start using your variables like empty boxes whenever you need one. For example, above, I didn't need to tell the computer about the variable `result`; I was able to just tell the computer what `result` *is*. In another language, for example Java, you would need to specify that `result` was a variable of the type `int` before being able to use it.

It is also useful to understand how variables are modified. When you use a variable within a function, that variable *only* exists within the function definition. This means that I can't suddenly start using the `result` variable in the rest of my code. The only way to give the rest of the code access to a value from inside a function is to `return` it, meaning that you end your function definition with a line like the one below:

`return result`

This line hands the value stored in the *local* variable `result` back to whatever code called the function, so that the rest of your code can use it. Note that when you use a `return` statement at the end of a function, the rest of the code can't access the returned value unless you *do* something with it. For example, you can set a variable equal to the function call itself, and the variable will then hold the result of the `return` statement, like below:

`variable = some_function(parameter)`

Also remember that Python is an interpreted language, which means it runs one line at a time. If you declare a variable at the top of your code and then use that same variable later to store some new information, the old information is tossed out like it never existed. Look at the example below and see if you can guess what the final value of the variable `x` is going to be.

In [13]:
x = 1

def change_x(parameter):
    x = 12
    return x

print("the variable x is equal to", x)

the variable x is equal to 1


You may be wondering why `x` is not equal to the value 12 when our function above is supposed to change `x` to 12 and then return that variable. Remember, the code inside a function only runs once you actually call the function. Here, we have defined a function `change_x()` but we haven't actually called it yet, so its code has yet to be executed. Let's try again, but this time let's call the function.

In [16]:
x = 1

def change_x(parameter):
    x = 12
    return x

x = change_x(x)

print("the variable x is equal to", x)

the variable x is equal to 12
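
To make the point about local variables concrete, here is a minimal sketch. It uses a hypothetical variation of the earlier addition function that returns its result instead of printing it; the names `add_and_return`, `total`, and `result_is_visible` are made up for this illustration.

```python
def add_and_return(parameter1, parameter2):
    result = parameter1 + parameter2  # result exists only inside the function
    return result                     # hand the value back to the caller

# Capture the returned value in a new variable.
total = add_and_return(1, 12)
print(total)  # 13

# Trying to use the function's local variable outside the function fails.
result_is_visible = True
try:
    print(result)
except NameError:
    result_is_visible = False
    print("result is not available outside the function")
```

The value survives because it was returned and stored in `total`, while the name `result` itself never leaves the function.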


Whenever you are making new variables, you should try to make the variable name as descriptive as possible. You don't necessarily want a variable name that is several words long, but you do want it to be descriptive enough that a stranger could read your code and understand what the variable is used for. The only exception is when you are using a variable a single time to briefly hold a value before moving it, or in cases where the function call asks that the variable name be something specific. For example, many functions used for graphing like for users to refer to `axis` as `ax`. In most of these cases, you can still use the full word `axis`, but you will likely notice in the documentation for that function that the creators use the abbreviated names. In any case, you should use variable names that are useful and make sense to you and try to avoid names that will only lead to confusion down the road.
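
As a small illustration of the difference a name can make, the sketch below performs the same calculation twice. The book-shop scenario and all of the variable names are invented for this example.

```python
# Harder to follow: what do a, b, and c stand for?
a = 3
b = 4.5
c = a * b

# Easier to follow: the names explain the calculation.
number_of_books = 3
price_per_book = 4.5
total_cost = number_of_books * price_per_book

print(total_cost)  # 13.5
```

Both versions give the same answer, but only the second can be understood without extra explanation.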

## Final Thoughts

If any of this is still confusing to you and you are feeling lost, that is totally fine and normal! Just like spoken languages, you don't typically learn it all in one day. And also just like learning a new language, it's important to understand what your ultimate learning goals are: are you trying to become fluent or do you just want to be able to travel comfortably? Do you want to become a programmer or do you just want to know how to fix a few bugs and edit? The answer to that question will likely determine how intensely you immerse yourself as well as the community groups you form and join. 

Even seasoned Python pros often find themselves looking up the basics like how to define a function or what the name of a library is. One of the most valuable skills that you can have as a coder is to be willing to ask for help when you need it. The community around Python is very welcoming and active and you shouldn't feel shy about asking some more seasoned coders for advice or feedback--after all, most coding happens in groups!

If you want to learn more about Python, you should check out [W3Schools](https://www.w3schools.com/python/python_intro.asp) which offers a comprehensive tutorial for using Python as well as quick references. If you are looking for a community board for asking questions, [Stack Overflow](https://stackoverflow.com/) is the standard site used for most programming languages. 