# Intro to Python


Python is a programming language that is used across a broad range of sectors, including academia, government, and private industry.

The first program any _good_ programmer will write is "Hello World". To execute the cell, just hit "Shift + Enter".

In [None]:
print("Hello World")

Great! You just ran your first snippet of code.
![proud](https://media.giphy.com/media/fdyZ3qI0GVZC0/giphy.gif)

Now let's move on to some basics. First let's declare a variable. Let's call that variable "my_variable" and set it equal to 100.

In [None]:
my_variable = 100

We now have a variable in our current environment that is set to the value we gave it. To see what that value is, we just need to print that variable.

In [None]:
print(my_variable)

## Exercise  
Do it yourself!!

We have now learned how to declare a variable. All of the cells we have run so far take very little computational power. Soon we will be working with large data, and executing commands will take much longer. Let's artificially simulate some of that time.

In [None]:
import time
time.sleep(5)

Notice the star (`*`) inside the brackets to the left of the cell? It stayed there for 5 seconds because we told the cell to sleep for 5 seconds.

In order to do this we performed an import(ant) Python job. "import" is used to load modules, which are collections of ready-made functions that each do a specific job. In this case we imported the "time" package. time has a function called sleep, which does just what it says: it makes the execution go to sleep for a prescribed number of seconds. Go ahead and change the 5 to another number and you can see that it will sleep for however long you tell it!
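As a small sketch of the same idea, here are two common ways to write an import. The 0.1 second pause and the names `start` and `elapsed` are just illustrative choices, not part of the lab.

```python
import time                 # import the whole module, then call time.sleep(...)
from time import sleep      # or import just the one function we need

start = time.perf_counter()   # record a starting timestamp
sleep(0.1)                    # the same function as time.sleep, imported by name
elapsed = time.perf_counter() - start

print(f"Slept for about {elapsed:.2f} seconds")
```

Both styles load the same code; the second one simply saves you from typing `time.` in front of the function each time.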

Now that we have a basic grasp on packages and variables, let's try a function. A function is a piece of code which performs a, you guessed it, function. These are sort of the bread and butter of programming, so you will become more familiar with them as we progress. For now let's start with something simple.

In [None]:
def my_age(age):
    
    return "I really can't believe you are only {}, you look much older! jkjk".format(age)

In [None]:
my_age(49)

Let's break the function down into its constituent parts.
* def - this is the command we use in Python to define a function.
* my_age() - this is just the name we have given to our function. Functions must have a () immediately following the name.
* age - again, this is just a placeholder variable. We could name it something different, and as long as we change it in the next line to match, it would run exactly the same. Just try it!
* return - this is an important aspect of functions.  This is the command which tells the function what it is we expect in "return" after the function is run. 
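To illustrate the point about placeholder names, here is a minimal sketch with two functions that differ only in what the parameter is called. The function names `describe_age` and `describe_age_renamed` are made up for this example.

```python
def describe_age(age):
    # "age" is the placeholder name for whatever value is passed in
    return "You are {} years old.".format(age)


def describe_age_renamed(years):
    # Same body; only the placeholder name changed
    return "You are {} years old.".format(years)


print(describe_age(49))
print(describe_age_renamed(49))
print(describe_age(49) == describe_age_renamed(49))
```

The last line compares the two returned strings, so you can check for yourself that renaming the placeholder does not change the result.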

Pretty easy, right? You are well on your way to being a coder and reading all the extra stuff in the matrix!

Ok, let's venture into what is often the toughest thing for many to grasp: the directory :(

In [None]:
import os

In [None]:
os.getcwd()

As you may have guessed "os" stands for operating system.  This package gives us a variety of functions to interact directly with our computer. 

To access a function inside a package, you usually write the package name, a ".", and then the function name. Here we call os.getcwd() to get the _current working directory_.

If you want to change it you can use the function os.chdir(). Let's change it to the data folder (this is just a folder inside what your working directory should be).

In [None]:
os.chdir('data')

The exact paths will be different for everyone working on their own computers. If we can get familiar with
* Knowing what directory (folder) we are working in, and
* Knowing how to change that directory to what we want

we will avoid a majority of the _bugs_ we might face.
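One way to practice both habits is to look before you leap. The sketch below only inspects the current folder and checks whether a "data" folder exists inside it; it does not change directory. The variable names `cwd` and `data_dir` are illustrative.

```python
import os

cwd = os.getcwd()                      # the folder Python is currently working in
print("Working directory:", cwd)

# Build a path to the folder we want without changing directory yet
data_dir = os.path.join(cwd, "data")   # "data" is the folder name used in this lab

if os.path.isdir(data_dir):
    print("Files in data folder:", os.listdir(data_dir))
else:
    print("No 'data' folder here; check your working directory first.")
```

If the second message prints, calling os.chdir('data') would fail, so this is a quick check to run before changing directories.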

Let's go ahead and import another package and then load in our first data!

In [None]:
import pandas as pd

df = pd.read_csv('us_leader_lta.csv')

Alright, a review of what we just did:
* pandas as pd - we are importing the package pandas and just naming it pd. This is a convenience thing. More on pandas later, as it's very important.
* df - the single most stereotypical name for a _dataframe_ in history. When you code, be more creative ;)
* pd.read_csv - This is the pandas (pd) function for reading a csv file. Pretty self-explanatory.
* Be sure to put file names in quotes ('' or "").
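If you want to experiment with pd.read_csv without touching the lab file, here is a minimal sketch that reads a tiny CSV from a string instead of from disk. The column names and values are made up for illustration, and `toy_df` is a hypothetical name.

```python
import io

import pandas as pd

# Made-up data standing in for a CSV file on disk
csv_text = """firstname,lastname,score
Ada,Lovelace,0.9
Alan,Turing,0.8
"""

# read_csv accepts a file path in quotes or, as here, a file-like object
toy_df = pd.read_csv(io.StringIO(csv_text))
print(toy_df)
```

The first line of the text becomes the column names, and every following line becomes one row.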

So we got our first data loaded in. Let's check out what's inside.

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
df.columns

Alright, so we loaded up data on US presidents' psychology. Cool, right?! Function review:
* .head() - This shows us the first 5 rows. If we wrote .head(10) instead, it would show us the first ten rows. Oh, and if you want to see the last rows, that would be .tail().
* .describe() - Just some basic descriptives. 
* .columns - names of your columns.
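Here is a small sketch of the same three tools on a toy data frame, so the output is short enough to read in full. The data and the name `small_df` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, not the presidents file
small_df = pd.DataFrame({"name": list("abcdefg"), "value": [1, 2, 3, 4, 5, 6, 7]})

print(small_df.head(3))        # first three rows
print(small_df.tail(2))        # last two rows
print(small_df.describe())     # count, mean, std, min, quartiles, max for numeric columns
print(list(small_df.columns))  # column names as a plain list
```

Note that .describe() here only summarizes the numeric column; the text column is left out by default.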

Here we have a basic data frame (just like an Excel spreadsheet). Except now we have all the tools of Python to investigate, manipulate, and visualize it.

In [None]:
df.dtypes

It is always good to get a feel for what the data you are working with looks like. One important technique for telling what kind of data you have is to use .dtypes.

This shows you what type of data each variable holds.
* object - These are string data like names.
* int64 - These are whole numbers like 10, 20, 1000.
* float64 - These are numbers with decimals like 0.54, 9.44, 1000.2002
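A quick sketch of .dtypes on a toy data frame with one column of each kind. The column names are hypothetical. Depending on your pandas version, the text column may be reported as object or as a dedicated string type, so treat that part of the output as version dependent.

```python
import pandas as pd

typed_df = pd.DataFrame({
    "name": ["Ada", "Alan"],     # text
    "visits": [10, 20],          # whole numbers
    "ratio": [0.54, 9.44],       # numbers with decimals
})
print(typed_df.dtypes)

# Keep only the numeric columns, selected by their type
numeric_cols = typed_df.select_dtypes(include="number").columns
print(list(numeric_cols))
```

Selecting columns by type like this is handy before plotting or computing statistics, since those steps usually need numbers.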

Now that we have taken a peek at our data, let's begin to think like a computational social scientist.
What sorts of relationships do we think exist? 
What patterns do we expect to see in the data?

Let's pick two variables and start investigating. How about a president's level of distrust and their conceptual complexity (cc)?

A fun and visually appealing way to start exploring is to make some plots.

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt

In [None]:
fig, ax = plt.subplots(1,2)
ax[0].boxplot(df['cc'], patch_artist=True)
ax[1].boxplot(df['distrust'],patch_artist=True)
ax[0].set_title('Conceptual Complexity')
ax[1].set_title('Level of Distrust')
plt.show()

Interesting! While we knew the median (the orange line) of these two variables from the 50% row of our .describe step, we had not yet seen that each one has a clear outlier. Both distrust and conceptual complexity have an outlier, represented by the circle (as an aside, if you have not read _Outliers_ by Malcolm Gladwell, I would recommend it).
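How does a boxplot decide what counts as an outlier? By default, matplotlib draws a point as a circle when it lies more than 1.5 times the interquartile range (IQR) beyond the edges of the box. The sketch below applies that rule by hand using only the standard library; the scores are made up, not taken from the presidents data.

```python
import statistics

# Hypothetical scores; the last value is deliberately extreme
scores = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Quartiles: q1 (25%), median (50%), q3 (75%)
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Points beyond 1.5 * IQR from the box edges count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print("median:", median)
print("outliers:", outliers)
```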

A high outlier on the distrust scale means that some president is particularly distrustful. Low on the conceptual complexity scale means that a president speaks in absolutes rather than allowing for any gray or middle ground. Let's find out who each of these are by looking at some of the high and low in each group. 

In [None]:
df['name'] = df['firstname'] + " " + df['lastname']
df = df.set_index('name')
df['distrust'].nlargest(1).reset_index()

So James Madison was the most distrusting president! Let's see who was the lowest on conceptual complexity.

In [None]:
df['cc'].nsmallest(1).reset_index()

Donald Trump is the lowest on the conceptual complexity score! We now know a little more about the psychological characteristics of the presidents, and along the way we learned how to code! These are just a few of the tools at your disposal as a Python programmer. In the next lab we will expand on how we use our data set and dive into spatial data types!
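If you want to replay the "who is highest and lowest" steps on something small, here is a sketch using a toy data frame. The people, the scores, and the name `toy` are hypothetical; .idxmax() and .idxmin() are shown as a shortcut for when you only need the name and not the value.

```python
import pandas as pd

# Hypothetical people and scores, not the presidents data
toy = pd.DataFrame({
    "firstname": ["Ada", "Alan", "Grace"],
    "lastname": ["Lovelace", "Turing", "Hopper"],
    "score": [0.2, 0.9, 0.5],
})
toy["name"] = toy["firstname"] + " " + toy["lastname"]  # build a full name column
toy = toy.set_index("name")                             # label each row by name

print(toy["score"].nlargest(2))   # two highest scores, labelled by name
print(toy["score"].idxmax())      # just the name with the highest score
print(toy["score"].idxmin())      # just the name with the lowest score
```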

# !!BONUS PYTHON!!


![Bonus](https://media.giphy.com/media/l0HUjp8V93sygGziE/giphy.gif)

In [None]:
## Comments are lines that start with a hashtag; they don't run as code
## Lists are another type of Python object. Let's go with a list of presidents' names. First we will create one by hand,
## then I will extract a list from our data frame.
## When creating a list use []
pres_list = ['trump', 'obama', 'bush', 'clinton']

In [None]:
## For the extracted list
df = df.reset_index()
pres_list2 = list(df.name.unique())

In [None]:
## Let's print these lists
pres_list

In [None]:
pres_list2

In [None]:
## Nice! how about a loop?
for president in pres_list:
    print(president)

In [None]:
## Ok, let's print only the first name of all presidents in the big list
for president in pres_list2:
    print(president.split(" ")[0])
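The loop above prints each first name. If you want to keep the results in a new list instead, a list comprehension does the same job in one line. This is a minimal sketch; `full_names` is a hypothetical stand-in for the list extracted from the data frame.

```python
# Hypothetical full names used only for this example
full_names = ["George Washington", "John Adams", "Thomas Jefferson"]

# Same idea as the loop: take the first word of each name
first_names = [name.split(" ")[0] for name in full_names]
print(first_names)

# The last word of each name, using a negative index
last_names = [name.split(" ")[-1] for name in full_names]
print(last_names)
```

Using [-1] instead of [0] counts from the end of the split list, which is why it picks out the last word.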