# Python Resources


## Resources

### Software
* [Anaconda distribution](https://www.continuum.io/downloads): Has most of the modules you need for scientific work preinstalled

### Tutorials - Forums
* [Learn Python the Hard Way](http://learnpythonthehardway.org/book/): A thorough tutorial for learning Python from scratch
* [Learnpython.org](http://www.learnpython.org): A more interactive tutorial
* [Stack Overflow](http://stackoverflow.com/questions/tagged/python)
* The best way to learn Python is to pick up the basics from a tutorial, then choose a small project and force yourself to do it in Python

### Standard Modules for Data Analysis
* [NumPy](http://www.numpy.org/): Fast arrays and numerical routines; the foundation for most scientific packages
* [Pandas](http://pandas.pydata.org/): R-like data frames and data analysis/management
* [Scikit-learn](http://scikit-learn.org/stable/): Machine learning/statistics
* [Matplotlib](http://matplotlib.org/): Plotting

### Interactive Use
* [IPython Notebook](http://ipython.org/notebook.html) (included in Anaconda)
* [Spyder](https://pythonhosted.org/spyder/) (included in Anaconda)

## Basics

In [None]:
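number = 5
string = 'Hello I am a string'
# Avoid naming variables `list` or `float`: that would hide the built-ins
my_list = [1, 2, 3, 5]
dictionary = {'Apples': 10, 'Oranges': 5}
integer = int(5)
my_float = float(5)

# Lists are indexed from 0; dictionaries are indexed by key
print(my_list[0])
print(dictionary['Apples'])

# Loop over every element of the list
for item in my_list:
    print(item)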
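Dictionaries can be looped over and updated as well. A minimal sketch, redefining the `dictionary` from above so the snippet runs on its own:

```python
dictionary = {'Apples': 10, 'Oranges': 5}

# .items() yields (key, value) pairs
for fruit, count in dictionary.items():
    print(fruit, count)

# .get() returns a default instead of raising a KeyError for a missing key
print(dictionary.get('Bananas', 0))

# Assigning to a key adds a new entry (or overwrites an existing one)
dictionary['Bananas'] = 3
print(len(dictionary))
```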

#### Exercise - Basic Usage

**Command line**
* Clone this repository: https://github.com/BDSS-PSU/python_intro.git
* Open the file `hello_world.py` in a text editor of your choice
* Write the following code in it:

In [None]:
print('Hello World')
a = 5
b = {'Apples': 10, 'Oranges': 20}
print(a)
print(b['Oranges'])

* Open a terminal (macOS/Linux) or PowerShell (Windows)
* Navigate into the repository that you cloned
* Type `python hello_world.py` at the prompt and press Enter

**Interactive**
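* On the command line, navigate into your repository
* Type `ipython notebook` (in newer installations the command is `jupyter notebook`)
* The notebook interface opens in your browser; from there you can open or create notebooks and run code interactively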

## Modules

In [None]:
# Installed Modules
import os
import numpy as np
from pandas import DataFrame

# Your own modules/scripts (here: myscript.py in the same directory)
import myscript

In [None]:
print(os.path)
print(myscript.hello_world(5))
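The repository's `myscript.py` is not shown here. As a hypothetical illustration, a module only needs to be a `.py` file in the same directory (or elsewhere on Python's module search path) that defines functions. The body of `hello_world` below is an assumption, not the repository's actual code:

```python
# myscript.py (hypothetical contents; the real file in the repository may differ)

def hello_world(n):
    '''
    Return 'Hello World' repeated n times, one per line.
    '''
    return '\n'.join(['Hello World'] * n)
```

After `import myscript`, the function is reached with the dot: `myscript.hello_world(5)`.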

If you want to install a package that is not included in your distribution, use a package manager. Anaconda ships its own manager, [`conda`](http://conda.pydata.org/docs/using/pkgs.html), and there is also [`pip`](https://pypi.python.org/pypi/pip). In my experience, `pip` works better.

#### Exercise - Installing a Package

* Go to your terminal
* Type `pip install gensim`
* Check whether it worked:
    * On the command line, type `python` to start the Python interpreter
    * At the Python prompt, type `import gensim` and see whether you get any errors
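Instead of reading error messages, you can also ask Python directly whether a package is importable. A minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found; the helper name `is_installed` is made up for this example:

```python
import importlib.util


def is_installed(package_name):
    '''
    Return True if a top-level package can be imported, False otherwise.
    '''
    return importlib.util.find_spec(package_name) is not None


print(is_installed('gensim'))  # True only if the pip install succeeded
print(is_installed('os'))      # part of the standard library, always True
```

Note that this only checks that the package can be found; it does not run the package's own import code.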

### Object Structure

Every object in Python is an instance of a class, and most objects contain data (attributes) and functions (methods). Use the dot (`.`) to access what is stored in an object, as in `os.path` above.

In [None]:
class my_class():
    ''' 
    An example class
    '''
    
    variable_a = 1
    variable_b = 2
    
    def my_sum(self):
        '''
        Calculates the sum of two variables
        '''
        
        return self.variable_a + self.variable_b 
    
    def useless_print_function(self, string):
        '''
        Prints each character of a string in a new line. Don't ask me why
        '''
        
        for character in string:
            print(character)

In [None]:
example = my_class()

In [None]:
print('example contains the following stuff:')
print(dir(example))
print(example.__doc__)
print(example.my_sum)      # the method itself
print(example.my_sum())    # calling the method: 1 + 2
print(example.my_sum.__doc__)

In [None]:
example.useless_print_function("Please Don't split me")
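The same dot access works on every object, including built-in ones. A short sketch, redefining a trimmed `my_class` so it runs on its own, that shows `type()`, `hasattr()` and `getattr()` (the last two look attributes up by name) and the difference between a class attribute and an instance attribute:

```python
class my_class():
    variable_a = 1
    variable_b = 2

    def my_sum(self):
        return self.variable_a + self.variable_b


example = my_class()

# Every object knows its class
print(type(example).__name__)   # my_class
print(type('hello').__name__)   # str

# Assigning on the instance shadows the class attribute; my_sum sees the new value
example.variable_a = 10
print(example.my_sum())         # 12
print(my_class().my_sum())      # a fresh instance still uses the class value: 3

# hasattr/getattr look attributes up by name (given as a string)
print(hasattr(example, 'my_sum'))       # True
print(getattr(example, 'variable_b'))   # 2

# Built-in objects have methods too
print('hello'.upper())          # HELLO
```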

## A little example project

Let's find out whether Obama or Trump is more egocentric by counting how often each of them says "I" in one of their speeches.

In [None]:
import re
from pprint import pprint

# Where are the speeches stored?
trump_file_name = 'trump.txt'
obama_file_name = 'obama.txt'


# Define the function that does what we want to do
def count_words(file_connection):
    '''
    Count all words in a file and return a dictionary
    '''

    # Empty dictionary to hold the results
    out_dict = {}
    # First entry: total word count
    out_dict['total_words'] = 0

    # Loop through the lines in the input file
    for line in file_connection:

        # Replace everything that is not a letter with a space (look up 'regular expressions' to learn more)
        clean_line = re.sub(r'[^A-Za-z]', ' ', line)

        # Split the line into single words
        words = clean_line.split()

        # Actually count the words
        for word in words:

            out_dict['total_words'] += 1

            # If we encounter the word for the first time, create a new entry in the dictionary
            if word not in out_dict:
                out_dict[word] = 1

            # Otherwise, increment the word count by one
            else:
                out_dict[word] += 1

    return out_dict


# Open the files; `with` closes them automatically when the block ends
with open(obama_file_name, 'r', encoding='utf-8') as obama_file:
    obama_dictionary = count_words(obama_file)

with open(trump_file_name, 'r', encoding='utf-8') as trump_file:
    trump_dictionary = count_words(trump_file)
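To see what the cleaning step in `count_words` does, here it is applied to a single made-up sentence. Apostrophes are removed along with punctuation, so a contraction like "I'm" is split apart and its "I" is counted too:

```python
import re

line = "I'm sure, and I said: we did it!"

# Replace every character that is not an ASCII letter with a space
clean_line = re.sub(r'[^A-Za-z]', ' ', line)
words = clean_line.split()

print(words)
# ['I', 'm', 'sure', 'and', 'I', 'said', 'we', 'did', 'it']
```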

In [None]:
pprint(obama_dictionary)

In [None]:
pprint(trump_dictionary)
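The standard library already has a dictionary subclass for counting, `collections.Counter`. Swapping it in (a different technique from the hand-written loop above, sketched here on a short made-up text instead of the speech files; the function name `count_words_counter` is made up) shortens the function. `most_common()` gives a readable summary instead of printing the whole dictionary, and a `Counter` returns 0 for missing words instead of raising a `KeyError`:

```python
import re
from collections import Counter


def count_words_counter(lines):
    '''
    Count words in an iterable of lines (e.g. an open file) with Counter.
    '''
    counts = Counter()
    for line in lines:
        # Same cleaning rule as count_words: keep only ASCII letters
        counts.update(re.sub(r'[^A-Za-z]', ' ', line).split())
    return counts


text = ["I think I can.", "You think you can't."]
counts = count_words_counter(text)

print(sum(counts.values()))   # total number of words
print(counts.most_common(2))  # the two most frequent words with their counts
print(counts['Obama'])        # missing words count as 0
```

Unlike `count_words`, this version does not store a `'total_words'` entry; the total is computed with `sum(counts.values())` when needed.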

In [None]:
# In Python 3, / always returns a float, so no float() conversion is needed
obama_measure = obama_dictionary['I'] / obama_dictionary['total_words']
trump_measure = trump_dictionary['I'] / trump_dictionary['total_words']

template = '{0} said "I" {1} times in a speech {2} words long. That is: {3} per word'
print(template.format('Obama', obama_dictionary['I'], obama_dictionary['total_words'], round(obama_measure, 3)))
print(template.format('Trump', trump_dictionary['I'], trump_dictionary['total_words'], round(trump_measure, 3)))
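The calculation above is repeated once per speaker, and looking up `'I'` directly raises a `KeyError` if a speech never contains that word. A small helper avoids both by using `.get()` with a default of 0. The name `word_rate` and the example counts are made up for illustration:

```python
def word_rate(counts, word, total_key='total_words'):
    '''
    Return how often `word` occurs per word, given a dictionary produced
    by count_words (word -> count, plus a 'total_words' entry).
    '''
    total = counts.get(total_key, 0)
    if total == 0:
        # Empty input: avoid dividing by zero
        return 0.0
    return counts.get(word, 0) / total


example_counts = {'total_words': 8, 'I': 2, 'we': 1}
print(round(word_rate(example_counts, 'I'), 3))   # 0.25
print(word_rate(example_counts, 'you'))           # 0.0
```

With the speech dictionaries, the measures become `word_rate(obama_dictionary, 'I')` and `word_rate(trump_dictionary, 'I')`.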