# [MCB-163L] Introduction to Python

*Estimated Time: 30 minutes*

### Table of Contents:


**Part I: Python**
 1. [Data](#section_data)
 2. [Expressions](#section_expr)
 3. [Names](#section_names)
 4. [Functions](#section_func)
 
**Part II: Tables**
1. [Sorting DataFrames](#section_sort)
2. [Column / Row Selection](#section_filter)
3. [Attributes](#section_attributes)

**Part III: [Making Plots](#section_plots)**




# The Jupyter Notebook <a id='section_jupyter'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Welcome to the Jupyter Notebook! `Notebooks` are documents that can contain text, code, visualizations, and more. 

A notebook is composed of rectangular sections called `cells`. There are 2 kinds of cells: markdown and code. A `markdown cell`, such as this one, contains text. A `code cell` contains code in Python, a programming language that we will be using for the remainder of this module. You can select any cell by clicking it once. After a cell is selected, you can navigate the notebook using the up and down arrow keys.

To run a code cell once it's been selected:

<ul>

  <li>press Shift-Enter, or</li>
  <li>click the Run button in the toolbar at the top of the screen.</li>

</ul>  

If a code cell is running, you will see an asterisk (\*) appear in the square brackets to the left of the cell. Once the cell has finished running, a number will replace the asterisk and any output from the code will appear under the cell.
</div>

In [None]:
# run this cell
print("Hello World!")

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Code cells can be edited any time after they are highlighted. Try editing the next code cell to print your name.

</div>

In [None]:
# edit the code to print your name
print("Hello: my name is ...")

### Saving and Loading


<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Your notebook can record all of your text and code edits, as well as any graphs you generate or calculations you make. You can save the notebook in its current state by pressing Control-S, clicking the floppy disc icon in the toolbar at the top of the page, or going to the File menu and selecting "Save and Checkpoint".

The next time you open the notebook, it will look the same as when you last saved it.

<br><br>
<i><b>Note:</b></i> after loading a notebook you will see all the outputs (graphs, computations, etc.) from your last session, but you won't be able to use any variables you assigned or functions you defined. You can get the functions and variables back by re-running the cells where they were defined. The easiest way is to highlight the cell where you left off work, then go to the Cell menu at the top of the screen and click "Run all above". You can also use this menu to run all cells in the notebook by clicking "Run all".
</div>
    


<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

Before we begin, we'll need a few extra tools to conduct our analysis. Run the next cell to load some code packages that we'll use later.


<br><br>
<i><b>Note: </b></i>this cell MUST be run in order for most of the rest of the notebook to work.

</div>

In [1]:
# dependencies: THIS CELL MUST BE RUN
from datascience import *
import numpy as np
import math
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')

# Part 1: Python <a id='section_python'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

<code>Python</code> is a programming language: a way for us to communicate with the computer and give it instructions.

Just like any language, Python has a <i>vocabulary</i> made up of words it can understand, and a <i>syntax</i> giving the rules for how to structure communication.
</div>


### Errors <a id="subsection error"></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

<p>Python is a language, and like natural human languages, it has rules. When you write code, you will often accidentally break some of these rules. When you run a code cell that doesn't follow every rule exactly, Python will produce an <code>error message</code>.</p>

<p>Errors are <i>normal</i>; experienced programmers make many errors every day. Errors are also <i>not dangerous</i>; you will not break your computer by making an error (in fact, errors are a big part of how you learn a coding language). An error is nothing more than a message from the computer saying it doesn't understand you and asking you to rewrite your command.
</p>

<p>We have made an error in the next cell.  Run it and see what happens.
</p>

</div>

In [None]:
print("This line is missing something."

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
The last line of the error output attempts to tell you what went wrong.  The <i>syntax</i> of a language is its structure, and this <code>SyntaxError</code> tells you that you have created an illegal structure.  "<code>EOF</code>" means "end of file," so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell.

</div>

## Part 1.1: Data <a id='section_data'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
<b>Data</b> is information: the "stuff" we manipulate to make and test hypotheses.

Almost all data you will work with broadly falls into two types: numbers and text. <i>Numerical data</i> shows up green in code cells and can be positive, negative, or include a decimal.
</div>

In [None]:
# Numerical data

4

87623000983

-667

3.14159

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
Text data (also called <i>strings</i>) shows up red in code cells. Strings are enclosed in double or single quotes. Note that numbers can appear in strings.
</div>
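
To see why the quotes matter, here is a small sketch. The names `as_number` and `as_text` are made up for this illustration. Adding two numbers does arithmetic, while adding two strings joins them end to end.

```python
# Hypothetical names, used only for this illustration
as_number = 3.14159      # numerical data
as_text = "3.14159"      # a string that happens to contain digits

# + adds numbers, but joins (concatenates) strings
print(as_number + as_number)
print(as_text + as_text)   # prints 3.141593.14159
```

The second result is not a sum at all, which is why a number stored as a string usually needs to be converted before you do arithmetic with it.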

In [None]:
# Strings
"a"

"Hi there!"

"We hold these truths to be self-evident that all men are created equal."

# this is a string, NOT numerical data
"3.14159"

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; "> 
We can store different types of data in a container called an <code>array</code> (plain Python calls this kind of container a <code>list</code>). It is written as values separated by commas inside square brackets: <code>[...]</code>. Set <code>my_array</code> to different types of data and run the cell.

</div>
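
As a rough sketch of what such a container can look like, the cell below builds one that holds a whole number, a decimal, and a string. The name `example_array` and its contents are invented for this illustration.

```python
# Hypothetical example: three different kinds of data in one container
example_array = [4, 3.14159, "Hi there!"]

# len counts how many items the container holds
print(len(example_array))    # prints 3

# Python itself reports this square-bracket container as a list
print(type(example_array))   # prints <class 'list'>
```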

In [None]:
my_array = ...
my_array

## Part 1.2: Expressions <a id='section_expr'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

A bit of communication in Python is called an <code>expression</code>. It tells the computer what to do with the data we give it.

<p>Here's an example of an expression.</p>

</div>

In [None]:
# an expression
14 + 20

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
When you run the cell, the computer <code>evaluates</code> the expression and prints the result. Note that only the last line in a code cell will be printed, unless you explicitly tell the computer you want to print the result.
</div>

In [None]:
# more expressions. what gets printed and what doesn't?
100 / 10

print(4.3 + 10.98)

33 - 9 * (40000 + 1)

884

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Many basic arithmetic operations are built into Python, like multiplication <code> * </code>, addition <code> + </code>, subtraction <code> - </code>, and division <code> / </code>. There are many others, which you can read about <a href="http://www.inferentialthinking.com/chapters/03/1/expressions.html">here</a>.

<p>The computer evaluates arithmetic according to the PEMDAS order of operations (just like you probably learned in middle school): anything in parentheses is done first, followed by exponents, then multiplication and division, and finally addition and subtraction.</p>
</div>
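
A short sketch of these rules in action follows. The variable names are invented for this illustration, and `**` is Python's exponent operator.

```python
# Multiplication happens before addition
without_parens = 2 + 3 * 4      # 2 + 12

# Parentheses are evaluated first
with_parens = (2 + 3) * 4       # 5 * 4

# Exponents (written **) come before multiplication
with_exponent = 2 * 3 ** 2      # 2 * 9

print(without_parens, with_parens, with_exponent)  # prints 14 20 18
```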

In [None]:
# before you run this cell, can you say what it should print?
4 - 2 * (1 + 6 / 3)

## Part 1.3: Names <a id='section_names'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Sometimes, the values you work with can get cumbersome: maybe the expression that gives the value is very complicated, or maybe the value itself is long. In these cases it's useful to give the value a <i>name</i>.

We can name values using what's called an <i>assignment</i> statement.
</div>

In [None]:
# assigns 442 to x
x = 442

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
The assignment statement has three parts. On the left is the <i>name</i> (<code>x</code>). On the right is the <i>value</i> (442). The <i>equals sign</i> in the middle tells the computer to assign the value to the name.

<p>You'll notice that when you run the cell with the assignment, it doesn't print anything. But, if we try to access <code>x</code> again in the future, it will have the value we assigned it.</p>
</div>

In [None]:
# print the value of x
x

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
You can also assign names to expressions. The computer will compute the expression and assign the name to the result of the computation.
</div>

In [None]:
y = 50 * 2 + 1
y

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
We can then use these names as if they were numbers.</div>

In [None]:
x - 42

In [None]:
x + y

## Part 1.4: Functions <a id='section_func'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
We've seen that values can have names (often called <code>variables</code>), but operations may also have names. A named operation is called a <code>function</code>. Python has some functions built into it.
</div>

In [None]:
# a built-in function 
round

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Functions get used in <i>call expressions</i>, where a function is named and given values to operate on inside a set of parentheses. The <code>round</code> function returns the number it was given, rounded to the nearest whole number.
</div>

In [None]:
# a call expression using round
round(1988.74699)

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
        
A function may also be called on more than one value (called <i>arguments</i>). For instance, the <code>min</code> function takes however many arguments you'd like and returns the smallest. Multiple arguments are separated by commas. (<code>max</code> works the same way. Can you guess what it does?)
</div>
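
As a small sketch of how the number of arguments can change what a function does, `round` also accepts an optional second argument giving the number of decimal places to keep. The variable names below are invented for this illustration. The example also shows a detail of `round` worth knowing: a value exactly halfway between two whole numbers is rounded to the nearest even one.

```python
# One argument: round to the nearest whole number
whole = round(1988.74699)

# Two arguments: the second is the number of decimal places to keep
two_places = round(1988.74699, 2)

print(whole)        # prints 1989
print(two_places)   # prints 1988.75

# Exact halves go to the nearest even whole number
print(round(2.5))   # prints 2
print(round(3.5))   # prints 4
```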

In [None]:
min(9, -34, 0, 99)

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
Another example of this is the <code>sum</code> function. The difference is that the items you are summing must be in an <code>array</code> (mentioned in Part 1.1) and must all be numbers. In the cell below, set <code>my_other_array</code> to a list of numbers of your choice and then use <code>sum</code> to add them up.
</div>
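
Here is a minimal sketch of the pattern, using an invented list called `example_numbers`. Note that the numbers go inside square brackets, so `sum` receives a single container rather than several separate arguments.

```python
# Hypothetical list of numbers, invented for this illustration
example_numbers = [2, 4.5, -1]

# sum adds up every item in the container
total = sum(example_numbers)
print(total)  # prints 5.5
```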

In [None]:
my_other_array = ...
my_other_array

In [None]:
sum(my_other_array)

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

<b>Practice:</b>
<ul>
  <li>The <code>abs</code> function takes one argument (just like <code>round</code>)</li>
</ul> 
Try calling <code>abs</code> in the cell below. What does it do?

Also try calling each function <i>incorrectly</i>, such as with the wrong number of arguments. What kinds of error messages do you see?
</div>

In [None]:
# replace the ... with calls to abs 
...


### Dot Notation

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
        
Python has a lot of <a href="https://docs.python.org/3/library/functions.html">built-in functions</a> (that is, functions that are already named and defined in Python), but even more functions are stored in collections called <i>modules</i>. Earlier, we imported the <code>math</code> module so we could use it later. Once a module is imported, you can use its functions by typing the name of the module, then the name of the function you want from it, separated with a <code>.</code>.
</div>
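
A brief sketch of dot notation with a few other names from the `math` module is shown below. The import is repeated so the example runs on its own.

```python
import math

# Dot notation: module name, a dot, then the name you want from the module
print(math.sqrt(16))    # prints 4.0
print(math.floor(2.7))  # prints 2

# Modules can also hold named values, which are used without parentheses
print(math.pi)
```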

In [None]:
# a call expression with the factorial function from the math module
math.factorial(5)

# Part 2: Tables <a id='section_tables'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
The last section covered four basic concepts of Python: data, expressions, names, and functions. In this next section, we'll see just how much we can do to examine and manipulate our data with only these minimal Python skills. We will be using data from the Allen Brain Mouse Connectivity Atlas.
</div>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
<code>Tables</code> are fundamental ways of organizing and displaying data. Run the next cell to load the data.
</div>

In [None]:
# Run this cell
primary_auditory = pd.read_csv('./data/primary_auditory_area.csv')
primary_auditory.head()

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
        
This DataFrame (or table) is organized into <code>columns</code>, one for each <i>category</i> of information collected.

<p>You can also think about the table in terms of its <code>rows</code>. Each row represents all the information collected about a particular instance, which can be a person, location, action, or other unit. </p>

<p>In the <code>primary_auditory</code> rows, the instance is a projection from one brain structure to another. The columns then encompass different characteristics of this projection.</p>

<p>Using the function <code>.head()</code> gives us only the first five rows by default. Can you see how many rows there are in total?</p>
</div>
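
The sketch below uses a tiny stand-in table rather than the real data. The name `example_table`, its column names, and its values are all invented for this illustration. It shows `.head()` alongside two common ways to count rows.

```python
import pandas as pd

# Hypothetical stand-in table; these column names and values are invented
example_table = pd.DataFrame({
    "structure": ["A", "B", "C", "D", "E", "F", "G"],
    "density": [0.12, 0.50, 0.03, 0.27, 0.08, 0.41, 0.19],
})

# .head() shows only the first five rows by default
print(example_table.head())

# .shape holds (number of rows, number of columns)
print(example_table.shape)  # prints (7, 2)

# len counts the rows
print(len(example_table))   # prints 7
```

The same two checks should work on `primary_auditory` once it has been loaded.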


## Part 2.1: Sorting DataFrames <a id='section_sort'></a>


### Sorting values in a column using  `.sort_values`

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
        
The <code>.sort_values</code> function sorts a DataFrame by the values in one of its columns. Here we give it two arguments: the <b>column label</b> <i>(in string form)</i> and <code>ascending</code> <i>(must equal True or False)</i>. To sort from least to greatest, use <code>ascending = True</code>. To sort from greatest to least, use <code>ascending = False</code>.

<p>Let's sort the values in the column <code>projection_density</code> from <i>greatest to least</i> from the <code>primary_auditory</code> DataFrame.</p>
</div>
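
To make the two directions concrete, here is a sketch on a small stand-in table. The table, its column names, and its values are invented for this illustration and are not the real data.

```python
import pandas as pd

# Hypothetical stand-in table; these column names and values are invented
example_table = pd.DataFrame({
    "structure": ["A", "B", "C", "D"],
    "density": [0.12, 0.50, 0.03, 0.27],
})

# sort_values returns a new, sorted DataFrame
ascending_table = example_table.sort_values("density", ascending=True)
descending_table = example_table.sort_values("density", ascending=False)

print(ascending_table["structure"].tolist())   # prints ['C', 'A', 'D', 'B']
print(descending_table["structure"].tolist())  # prints ['B', 'D', 'A', 'C']

# The original table keeps its original order
print(example_table["structure"].tolist())     # prints ['A', 'B', 'C', 'D']
```

Because the original table is left unchanged, assign the result to a name if you want to keep working with the sorted version.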

In [None]:
primary_auditory.sort_values('projection_density', ascending = False).head()

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

<b>Practice:</b>

Try using <code>.sort_values</code> on <code>primary_auditory</code> in the cell below to arrange the volume from least to greatest. What is the <code>structure_id</code> of the experiment that had the least volume injected into it? If more than one row has the same volume, use the top-most one in the table. Assign this value to <code>least_volume</code>.
</div>

In [None]:
...

In [None]:
least_volume = ...
least_volume


## Part 2.2: Column/Row Selection <a id='section_filter'></a>


### Selecting columns with `[ ... ]`

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
        
The <b>[...]</b> is used to get a <code>Series</code> containing one column and an index. It takes in the name of a column in the form of a string.

Let's select the <code>projection_density</code> from the <code>primary_auditory</code> DataFrame and assign it to <code>proj_density</code>.
</div>
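
As a sketch of the selection syntax on a small stand-in table (the table and the name `density_column` are invented for this illustration), the column name goes inside the square brackets as a string:

```python
import pandas as pd

# Hypothetical stand-in table; these column names and values are invented
example_table = pd.DataFrame({
    "structure": ["A", "B", "C", "D"],
    "density": [0.12, 0.50, 0.03, 0.27],
})

# Square brackets with a column name (a string) select that column
density_column = example_table["density"]

# The result is a Series: the column's values plus the row index
print(type(density_column).__name__)  # prints Series
print(density_column)
```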

In [None]:
# select a single column from the table (the result is a Series)
proj_density = primary_auditory[...]
proj_density


## Part 2.3: Attributes <a id='section_attributes'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
       
Columns have <code>attributes</code> that give information about them, like the values contained within them. These attributes are accessed using dot notation. But, since an attribute doesn't perform an operation on the table, there are no parentheses (like there would be in a call expression).

An attribute you'll use frequently is <code>values</code>, which will give us the values in the column in the form of an <code>array</code> (mentioned in Part 1.1).
</div>
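
The following sketch shows the `values` attribute on an invented stand-in column. The names `density_column` and `density_values` are hypothetical. With pandas, the array you get back is a NumPy array, a container designed for numerical work.

```python
import pandas as pd

# Hypothetical stand-in column; the values are invented
density_column = pd.Series([0.12, 0.50, 0.03, 0.27], name="density")

# .values is an attribute, so there are no parentheses
density_values = density_column.values

print(type(density_values).__name__)  # prints ndarray
print(len(density_values))            # prints 4
```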

In [None]:
# Run this cell
proj_density.values

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
We can do several things with the values of the columns. Using the length function <code>len()</code>, we can find the number of values in our selected column. Run the cell below to see how many values are in <code>proj_density</code>.
</div>

In [None]:
# Run this cell
len(proj_density.values)

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
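Something else we can do with this data is obtain specific values. To obtain the first value, we use <code>proj_density.values[0]</code> (Python starts counting positions at 0). Keep in mind that <code>.sort_values</code> returned a new, sorted table and left <code>primary_auditory</code> unchanged, so this is the first value in the table's original order. It is the greatest projection density only if the table is already ordered from greatest to least.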
</div>
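
Here is a sketch of the difference between the first value in the original order and the first value after sorting, again on an invented stand-in column with hypothetical names:

```python
import pandas as pd

# Hypothetical stand-in column; the values are invented
density_column = pd.Series([0.12, 0.50, 0.03, 0.27], name="density")

# Sort from greatest to least; the original column is left unchanged
sorted_column = density_column.sort_values(ascending=False)

print(density_column.values[0])  # prints 0.12 (first in the original order)
print(sorted_column.values[0])   # prints 0.5 (the greatest value)
```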

In [None]:
# Run this cell
proj_density.values[0]

# Part 3: Making plots <a id='section_plots'></a>

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">  
    
### Step 1: Create some random data to plot
We'll use a function, <code>np.random.rand</code>, to create a random list of numbers. You decide how long this list should be (anything between 5 and 100 is fine) by replacing the <code>...</code> next to <code>list_length</code>.
</div>
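
As a sketch of what `np.random.rand` returns, the cell below uses an invented length of 5 stored under the hypothetical name `example_length`. The two arguments are the number of rows and the number of columns, and every value falls between 0 and 1.

```python
import numpy as np

# Hypothetical length, chosen only for this illustration
example_length = 5

# rand(rows, columns) fills an array of that shape with random numbers from 0 up to (not including) 1
example_random = np.random.rand(example_length, 1)

print(example_random.shape)  # prints (5, 1)
print(example_random)
```

The numbers will be different every time the cell is run.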

In [None]:
# Replace the ... to assign your chosen value to list_length below.
list_length = ...

random_list = np.random.rand(list_length,1) # Create a column of list_length random numbers between 0 and 1

print('Created a random list')

><b>Task</b>: Let's make sure Python did what we wanted it to do -- save our random list as a variable called <code>random_list</code>. Check by typing the variable name into the next cell, and running it. If that worked, you should see an array of values.

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">
    
### Step 2: Plot your data
Let's pretend this is data from an awesome experiment we ran. We need to plot the data. Remember that we imported <code>matplotlib.pyplot</code> as <code>plt</code>. We can now use the `plt.plot()` function to plot our random list.
</div>

In [None]:
# "plot" is matplotlib pyplot's basic plotting function
# Below, we're asking it to make a line plot of the random_list we just created
plt.plot(random_list)

# this is where you can add additional changes to your plot

# plt.show() will show our plot below this code cell
plt.show()

><b>Tasks:</b>
> 1. Add axis labels to your plot with <code>plt.xlabel('yourxlabelhere')</code> and <code>plt.ylabel('yourylabel')</code>. Put both lines in the cell above, <i>before</i> <code>plt.show()</code>.
> 2. Add some markers to your line! Do this by adding an additional argument to <code>plt.plot(random_list)</code>, so that it says <code>plt.plot(random_list, marker ="o")</code>. <a href="https://matplotlib.org/api/markers_api.html">You can add various markers of your choosing!</a>.
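
One way the finished cell might look is sketched below. It uses a short invented list called `example_data` instead of your random list, and the label text is only a placeholder.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented for this illustration
example_data = [0.2, 0.8, 0.5, 0.9, 0.1]

# marker="o" draws a circle at every data point along the line
lines = plt.plot(example_data, marker="o")

# Labels must be added before plt.show()
plt.xlabel("Sample number")
plt.ylabel("Measured value")

plt.show()
```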

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

### Step 3: Add some more data

Let's create another random list and create a scatterplot of the data.
</div>

><b>Task</b>: In the box below, create a list named <code>random_list_2</code> of the same length as your first list.

><b>Task</b>: In the box below, generate a scatterplot of your two lists. Hint: instead of using the command <code>plt.plot()</code> use <code>plt.scatter(...,...)</code> which graphs two variables. The x-axis variable goes first, then the y-axis variable. Since you are graphing random data, you can decide which list should be on the x-axis.

In [None]:
# Make a scatter plot of your random data
plt.scatter(...,...)

# this is where you can add additional changes to your plot

# plt.show() will show our plot below this code cell
plt.show()

<div style="border-left: 3px solid #003262; padding: 1px; padding-left: 10px; background: #ffffff; ">

### Step 4: Celebrate
That's the Jupyter Notebook tutorial! You're ready to tackle more complex notebooks.
</div>

In [None]:
from IPython.display import HTML
HTML('<img src="https://media.tenor.com/images/99cff34bdcb675975b2b0cc661f2e4ce/tenor.gif">')

## Resources
For additional Jupyter Notebook information and practice, see [this tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) from DataQuest. 

## About this Notebook
Parts 1 and 2 were developed by Elias Saravia & Daniel Lopez, students at UC Berkeley in the Data Science program.

Part 3 was created by Ashley Juavinett for classes at UC San Diego.