# POLSCI 3 Fall 2019

## Discussion 2: Using Tables and Numpy

In this notebook, we will cover additional tools that Python provides. In the first notebook, we only used Python's built-in functionality. However, we can also use *libraries*, which are packages of code that give us additional functionality. In this notebook, we will cover how to use <code>Numpy</code>, a library for calculating mathematical and statistical values quickly, and <code>datascience</code>, a library for working with tables.

## Numpy
The basis for numpy is the numpy array. It stores numeric values much more efficiently than standard Python lists, and it provides new functions for working with them. To start, we need to import the library to be able to access it. In the cell below, we import numpy using a nickname, "np".

In [1]:
import numpy as np

Next, let's practice making numpy arrays. To create a numpy array, call <code>np.array()</code>, using a standard Python list as the input:

In [2]:
x = 10
y = 9
z = 13
numbers = [x, y, z] # A standard Python list
np.array(numbers) # Convert the list to a numpy array

array([10,  9, 13])
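Part of what makes numpy arrays efficient is that every element shares a single data type (its `dtype`). A minimal sketch with made-up values, showing that one decimal number turns the whole array into floats:

```python
import numpy as np

# Every element of a numpy array shares one data type (dtype)
whole = np.array([10, 9, 13])      # all integers -> an integer dtype
mixed = np.array([10, 9, 13.5])    # one float converts every element to a float

print(whole.dtype.kind)   # i  (integer; the exact size, int32 or int64, depends on the platform)
print(mixed.dtype)        # float64
print(mixed)
```

This is also why the outputs in this notebook sometimes show `dtype=int32`: the integer size numpy picks can vary between computers.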

To practice, create a numpy array called <code>more_numbers</code> that holds the values 1868, 538, and 435. 

In [3]:
# YOUR CODE HERE: Delete this line and practice. If you struggle, ask your GSI for help!
more_numbers = np.array([1868, 538, 435])
more_numbers

array([1868,  538,  435])

Unlike standard Python lists, numpy arrays let us perform mathematical operations on every value at once. For example, we can square every value in <code>numbers</code> like so:

In [4]:
# Because numbers is a standard Python list, we need to pass it into np.array() first
np.array(numbers)**2

array([100,  81, 169], dtype=int32)
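To see why the conversion matters, compare the same operators on a plain list and on a numpy array. A minimal sketch reusing the values of <code>numbers</code>:

```python
import numpy as np

numbers = [10, 9, 13]

# With a plain Python list, * means "repeat the list", not "multiply each value"
repeated = numbers * 2
print(repeated)            # [10, 9, 13, 10, 9, 13]

# With a numpy array, * multiplies every element
doubled = np.array(numbers) * 2
print(doubled)             # [20 18 26]

# Adding a number to a list raises an error; adding it to an array works element-wise
try:
    numbers + 3
except TypeError as err:
    print("TypeError:", err)
print(np.array(numbers) + 3)   # [13 12 16]
```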

Try adding 3 to every value of <code>more_numbers</code>:

In [5]:
#Because more_numbers is already a numpy array, we don't need to pass it into the np.array() function
more_numbers + 3

array([1871,  541,  438])

Just as with single numbers, we can perform multiple operations on a numpy array in one expression. Below, we triple every value of <code>numbers</code> and subtract 4:

In [6]:
np.array(numbers) * 3 - 4

array([26, 23, 35])
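Two details are worth checking here: the usual order of operations still applies, and two arrays of the same length combine position by position. A minimal sketch using the two arrays from this notebook:

```python
import numpy as np

numbers = np.array([10, 9, 13])
more_numbers = np.array([1868, 538, 435])

# Standard order of operations: multiply first, then subtract
print(numbers * 3 - 4)       # [26 23 35]
print(numbers * (3 - 4))     # [-10  -9 -13]

# Arrays of equal length combine element by element
print(more_numbers - numbers)   # [1858  529  422]
```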

Try dividing every value of <code>more_numbers</code> by 6 and adding 8:

In [7]:
more_numbers / 6 + 8

array([319.33333333,  97.66666667,  80.5       ])

Next, let's explore some of the functions that numpy gives us:  
1) <code>np.mean()</code> calculates the mean of the input array.    
2) <code>np.median()</code>  calculates the median of the input array.  
3) <code>np.std()</code>  calculates the standard deviation of the input array.  
4) <code>np.sqrt()</code>  calculates the square root of the input value.  
5) <code>np.random.choice()</code> chooses one value at random from the input array.

### 1) np.mean()
In the cell below, we calculate the average of the <code>numbers</code> array. In the cell below it, calculate the mean of the <code>more_numbers</code> array. What differences do you notice?

In [8]:
np.mean(numbers)

10.666666666666666

In [9]:
np.mean(more_numbers)

947.0
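The mean is the sum of the values divided by how many values there are. A minimal sketch checking <code>np.mean()</code> against that definition:

```python
import numpy as np

more_numbers = np.array([1868, 538, 435])

# The mean is the total divided by the number of values
by_hand = np.sum(more_numbers) / len(more_numbers)
print(by_hand)                 # 947.0
print(np.mean(more_numbers))   # 947.0
```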

### 2) np.median()
In the cell below, we calculate the median value of the <code>numbers</code> array. In the cell below it, calculate the median of the <code>more_numbers</code> array. What differences do you notice?

In [10]:
np.median(numbers)

10.0

In [11]:
np.median(more_numbers)

538.0
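The median is the middle value once the data are sorted; with an even number of values, it is the average of the two middle values. A minimal sketch of that rule (the helper name <code>median_by_hand</code> and the four-value list are made up for illustration):

```python
import numpy as np

def median_by_hand(values):
    """Hypothetical helper: sort the values and take the middle one.

    With an even number of values, average the two middle values.
    """
    ordered = np.sort(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return float(ordered[middle])
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_by_hand([1868, 538, 435]))   # 538.0
print(median_by_hand([10, 9, 13, 1]))     # 9.5
print(np.median([10, 9, 13, 1]))          # 9.5
```

Notice that the median of <code>more_numbers</code> (538) sits far below its mean (947): the single large value 1868 pulls the mean up but does not move the middle value.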

### 3) np.std()
In the cell below, we calculate the standard deviation of the <code>numbers</code> array. In the cell below it, calculate the standard deviation of the <code>more_numbers</code> array. What differences do you notice?

In [12]:
np.std(numbers)

1.699673171197595

In [13]:
np.std(more_numbers)

652.6014608217382
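The standard deviation measures how far values typically sit from the mean: take each value's distance from the mean, square it, average the squares, and take the square root. A minimal sketch reproducing <code>np.std()</code> that way. Note that by default numpy divides by the number of values $n$ (`ddof=0`); passing `ddof=1` divides by $n - 1$, the "sample" standard deviation that some textbooks and software use instead.

```python
import numpy as np

numbers = np.array([10, 9, 13])

# Population standard deviation: square root of the average squared distance from the mean
deviations = numbers - np.mean(numbers)
by_hand = np.sqrt(np.mean(deviations ** 2))
print(by_hand)
print(np.std(numbers))

# ddof=1 divides by n - 1 instead of n, giving a slightly larger value
print(np.std(numbers, ddof=1))
```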

### 4) np.sqrt()
In the cell below, we calculate the square root of 9.

In [14]:
np.sqrt(9)

3.0

Calculate the square root of the year the school was established (1868).

In [15]:
np.sqrt(1868)

43.22036556994862
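Like the arithmetic operators, <code>np.sqrt()</code> also accepts a whole array and returns the square root of each element. A minimal sketch:

```python
import numpy as np

more_numbers = np.array([1868, 538, 435])

# np.sqrt works element-wise on an entire array
roots = np.sqrt(more_numbers)
print(roots)

# Squaring the roots recovers the original values (up to floating-point rounding)
print(np.allclose(roots ** 2, more_numbers))   # True
```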

### 5) np.random.choice()
In the cell below, we select a random value from the <code>numbers</code> array.

In [16]:
np.random.choice(numbers)

9

Choose a random value from the <code>more_numbers</code> array. What happens when you repeatedly run the cell?

In [17]:
np.random.choice(more_numbers)

538
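Because the choice is random, rerunning the cell can give a different value each time. When you need the same "random" result every time (for example, so a classmate can reproduce your work), you can fix a seed. A minimal sketch, assuming NumPy 1.17 or later for <code>np.random.default_rng()</code>; the seed 2019 is an arbitrary choice:

```python
import numpy as np

more_numbers = np.array([1868, 538, 435])

# A seeded random generator makes "random" picks repeatable across runs
rng = np.random.default_rng(2019)
one_pick = rng.choice(more_numbers)

# size= draws several values at once; replace=False means no value is picked twice
three_picks = rng.choice(more_numbers, size=3, replace=False)

# A new generator with the same seed reproduces the same first pick
same_pick = np.random.default_rng(2019).choice(more_numbers)
print(one_pick, same_pick, three_picks)
```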

## Tables
Using the <code>datascience</code> library, we can work with spreadsheet data saved as CSV files. Python lets you accomplish many of the same tasks as Excel, often faster and with more flexibility. To start, import the <code>Table</code> class from <code>datascience</code>:

In [18]:
from datascience import Table

To create a table, we call <code>Table()</code>:

In [19]:
Table()

Wait a second! This table is empty. To add columns, we use the <code>.with_columns()</code> method:

In [20]:
Table().with_columns(
    'First Column', [1,2],
    'Second Column', [3,4]
)

| First Column | Second Column |
|---|---|
| 1 | 3 |
| 2 | 4 |


The <code>.with_columns()</code> method takes pairs of arguments: a string for the column name, followed by an array of that column's values. To practice, create a table with 3 columns, titled "Political Science", "Sociology", and "Legal Studies". For each column, give it an array with 3 numeric values (you can choose any numbers you want!).

In [21]:
Table().with_columns(
    'Political Science', [1,2,3],
    'Sociology', [4,5,6],
    'Legal Studies', [7,8,9]
)

| Political Science | Sociology | Legal Studies |
|---|---|---|
| 1 | 4 | 7 |
| 2 | 5 | 8 |
| 3 | 6 | 9 |


### Interacting with Table Values
Columns in these tables are actually numpy arrays, which means that when we select a single table column, we can perform the same kinds of operations on it as on any numpy array. Below, we select the first column using <code>.column()</code>, with the column label as the input.

In [22]:
basic_table = Table().with_columns(
    'First Column', [1,2],
    'Second Column', [3,4]
)
basic_table.column('First Column')

array([1, 2])

Let's calculate the average value, median, and standard deviation for each column:

In [23]:
first_mean = np.mean(basic_table.column('First Column')) 
first_median = np.median(basic_table.column('First Column'))
first_standard_dev = np.std(basic_table.column('First Column'))
first_mean, first_median, first_standard_dev

(1.5, 1.5, 0.5)

In [24]:
second_mean = np.mean(basic_table.column('Second Column')) 
second_median = np.median(basic_table.column('Second Column'))
second_standard_dev = np.std(basic_table.column('Second Column'))
second_mean, second_median, second_standard_dev

(3.5, 3.5, 0.5)

In addition to creating our own tables, we can also import them. Calling <code>Table.read_table()</code> with a file path as the input lets us read in existing spreadsheets saved as CSV files. The path can be absolute (e.g., "C:\Users\user_name\directory_1\POLSCI-3\data\example.csv") or relative to the folder the notebook runs in. In the cell below, we read in a CSV file with data on military spending by country:

In [26]:
military_spending = Table.read_table('data/milspend_pct_05.csv')
military_spending

| nation | milspend_pct | year | nat_num | x | counter |
|---|---|---|---|---|---|
| Chad | 1.0 | 2005 | 27 | 0.142204 | 1 |
| Sierra Leone | 1.0 | 2005 | 111 | 0.147076 | 2 |
| Bangladesh | 1.0 | 2005 | 12 | 0.0647808 | 3 |
| Belgium | 1.1 | 2005 | 14 | 0.131931 | 4 |
| Lithuania | 1.2 | 2005 | 74 | 0.0682282 | 5 |
| Uruguay | 1.3 | 2005 | 132 | 0.137965 | 6 |
| Congo | 1.4 | 2005 | 31 | 0.0904684 | 7 |
| Peru | 1.4 | 2005 | 100 | 0.0550891 | 8 |
| Albania | 1.4 | 2005 | 2 | 0.167394 | 9 |
| Germany | 1.4 | 2005 | 48 | 0.0502653 | 10 |


The second column, <code>milspend_pct</code>, is the percentage of its budget that each country spent on its military. What is the average percentage spent on the military?

In [27]:
np.mean(military_spending.column('milspend_pct'))

2.0909090909090913

How do the median and mean military spending values differ?

In [28]:
np.mean(military_spending.column('milspend_pct')) - np.median(military_spending.column('milspend_pct'))

0.6409090909090913
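A positive gap like this (the mean exceeds the median by about 0.64) often signals that a few unusually large values are pulling the mean upward, while the median stays near the typical country. A minimal sketch with made-up percentages, not the real data:

```python
import numpy as np

# Made-up percentages for illustration: most values are small, one is large
spending = np.array([1.0, 1.2, 1.4, 1.5, 6.0])

print(np.mean(spending))     # pulled upward by the single large value
print(np.median(spending))   # 1.4: the middle value, unaffected by how large 6.0 is

# A positive mean - median gap suggests a few large values (a right skew)
print(np.mean(spending) - np.median(spending) > 0)   # True
```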

## Saving Your Notebook
Now that you've finished the homework, we need to save it! To do this, click <code>File</code> $\rightarrow$ <code>Download as</code> $\rightarrow$ <code>PDF via Chrome</code>