# Hello and Welcome!

Hi There! We are excited that you will be joining us on a text analysis and visualization adventure!

We want everyone to be prepared for this adventure, so it is worth taking a few moments to learn a bit about the tools we are going to be using. 

Specifically, we will be using the Python programming language to work through the analysis portion of the workshop. Why Python? Its syntax is fairly minimal and quick to learn, it is versatile enough for many kinds of programming tasks, and, importantly, it is very popular in the text analysis and NLP communities. 

There are a number of text analysis tools built using Python. We will focus on [NLTK](http://www.nltk.org/) in this workshop because of its wide adoption and broad features. 

But before we dive into NLTK and text analysis specifically, we wanted to orient those less familiar with the basic features of the language so it is easy to understand and work with the code presented during the workshop.

So let's get started learning a bit of Python!

## Jupyter Notebooks

For this workshop, we have organized the lessons as a series of [Jupyter Notebooks](http://jupyter.org/). If you have heard of IPython Notebooks, this is basically the same thing.

Jupyter Notebooks are a way to combine text and executed code together as a single interactive document. 


## Variables and Data Types



First things first, let's assign some variables. 


In [1]:
# this is a string <- this is a comment
a_string = "hello, how are you"

# this is a float
a_number = 7.4

a_number

7.4

Comments in Python start with a `#` and go to the end of the line. 

In Jupyter Notebooks, the output shown below a code cell is the value of the last expression in that cell.

To display more than one value from a cell, use `print()`.
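As a small sketch of how `print()` differs from the automatic cell output, the cell below prints two values explicitly and then ends with a bare expression. The variable names `greeting` and `count` are made up for illustration.

```python
# hypothetical variables, for illustration only
greeting = "hello"
count = 3

# print() displays each value as the code runs
print(greeting)
print(count)

# in a notebook, only this final expression is displayed automatically
count + 1
```

Run as a notebook cell, this should show the two printed lines followed by the value of the last expression.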

Data types have methods in Python. For example, the [string](https://docs.python.org/2/library/string.html) type has `capitalize`, `lower`, and `find`:

In [2]:
a_string.capitalize()

'Hello, how are you'
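The other two methods mentioned above work the same way. Here is a minimal sketch; the `shout` variable is invented for the example.

```python
a_string = "hello, how are you"

# lower() returns a lowercase copy of the string
shout = "HELLO"
lowered = shout.lower()
print(lowered)     # hello

# find() returns the index where a substring first appears
position = a_string.find("how")
print(position)    # 7

# find() returns -1 when the substring is not present
missing = a_string.find("cat")
print(missing)     # -1
```

Note that none of these methods change the original string. They each return a new value.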

Some functions work on data structures, like `len()`

In [4]:
len(a_string)

18

Other common functions convert values from one data type to another. 

For example, suppose we want to join a string and a number. Adding them directly raises an error:

In [5]:
a_string + a_number

TypeError: cannot concatenate 'str' and 'float' objects

Instead, you have to convert the number to a string first. 

In [6]:
a_string + str(a_number)

'hello, how are you7.4'
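Notice that the number runs straight into the text, because `+` adds nothing between the two pieces. The sketch below adds a separator, and also shows that conversions can go the other way with `int()` and `float()`. The names `combined`, `whole`, and `decimal` are illustrative only.

```python
a_string = "hello, how are you"
a_number = 7.4

# add a space so the number does not run into the text
combined = a_string + " " + str(a_number)
print(combined)      # hello, how are you 7.4

# int() and float() convert strings into numbers
whole = int("42")
decimal = float("7.4")
print(whole + 1)     # 43
print(decimal * 2)   # 14.8
```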

<h3 style="color:red" class="exercise">Your Turn</h3>

Check out the [Python documentation](https://docs.python.org/2/library/stdtypes.html#string-methods) to find a string method that will change all occurrences of 'cat' in the following sentence to another word, like 'bat'.

_Hint: in Jupyter notebooks, you can use the **Tab** key to autocomplete or recommend methods to call. Try it out after typing a variable and then a `.`_


In [7]:
# 1.2.1

another_sentence = "The cat in the hat liked to walk about, with a little cat spring and a little cat shout."

## Your code:
# modify another_sentence to change the word 'cat' to 'bat'


# view results
another_sentence

'The cat in the hat liked to walk about, with a little cat spring and a little cat shout.'

## Lists and Dicts

Two data types you will use frequently are [lists](https://docs.python.org/2/tutorial/introduction.html#lists) and [dicts](https://docs.python.org/2/tutorial/datastructures.html#dictionaries), short for dictionaries. 

**Lists** are typically used like arrays. 

Creating a List:

In [44]:
# create an empty list
my_list = []

# create a list with stuff in it
my_list = [100, 200, 300]

# get the value stored at index 1 (the second element of the list)
my_list[1]

200

Lists are 0-indexed, so index 0 holds the first element and index 1 holds the second. 

Lists have [lots of handy methods](https://docs.python.org/2/tutorial/datastructures.html#more-on-lists). You can append, insert, remove, and sort lists.

In [45]:
# adding to the end of a list
my_list.append(400)

print(len(my_list))

# adding to the beginning of a list
# lists can handle heterogeneous data.
my_list.insert(0, 'my my my')

print(my_list)

# removing a value from a list
my_list.remove('my my my')

print(my_list)

4
['my my my', 100, 200, 300, 400]
[100, 200, 300, 400]
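Sorting is mentioned above but not shown. Here is a minimal sketch using a separate, made-up list (`unsorted_list`) so that `my_list` is left untouched for the next cell.

```python
# hypothetical list, for illustration only
unsorted_list = [300, 100, 400, 200]

# sorted() returns a new sorted list and leaves the original alone
descending = sorted(unsorted_list, reverse=True)
print(descending)      # [400, 300, 200, 100]

# sort() reorders the list in place and returns None
unsorted_list.sort()
print(unsorted_list)   # [100, 200, 300, 400]
```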


To remove a value at a particular index, use the `del` statement.

In [46]:
# remove the value at index 2 from the list
del my_list[2]

my_list

[100, 200, 400]

**Dicts** are what other languages typically call _hashes_ or _maps_. Like _Objects_ in JavaScript, they associate keys with values. 

Creating a Dictionary:

In [13]:
# create an empty dictionary
my_dict = {}

# create a dictionary with stuff in it
my_dict = {'a': 123, 'b': 345, 'c': 567}

# get value associated with the key: 'b'
my_dict['b']

345

Use the `in` keyword to check if a key is in a dictionary. 

In [14]:
'b' in my_dict

True

In [15]:
'ee' in my_dict

False

Attempting to access a key that is not in the dictionary results in an error.

In [16]:
my_dict['ee']

KeyError: 'ee'
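If you would rather get a fallback value than an error, dictionaries also have a `get()` method. A minimal sketch, reusing the dictionary from above:

```python
my_dict = {'a': 123, 'b': 345, 'c': 567}

# get() returns None instead of raising KeyError for a missing key
print(my_dict.get('ee'))      # None

# an optional second argument supplies the fallback value
print(my_dict.get('ee', 0))   # 0

# existing keys behave as usual
print(my_dict.get('b', 0))    # 345
```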

## String and List Slices

Both strings and lists in Python allow for creating new strings or lists from _slices_ of another. 


The notation is to use square brackets `[` `]` to denote the slice. 

To get a larger slice, a colon `:` is used to separate the start and end elements to grab. 



In [19]:
# first a string
a_string = "hello, how are you"

# create a slice with just a single character
print(a_string[4])

# create a new slice with multiple characters using :
print(a_string[0:5])

o
hello


A slice starts at the first index given and goes up to, but does not include, the second index.

In [20]:
# now a list
a_list = [1, 2, 3, 4, 5, 6]

print(a_list[2])

print(a_list[2:5])


3
[3, 4, 5]


You can also use negative numbers to start counting at the end of the string or list.

Leaving off the last value, and ending with a `:` will make a slice that includes up to the end of the string or list.

In [21]:
# This will get the last element in the list
print(a_list[-1])

# This will get the last three elements in the list.
# notice there is no stopping value.
print(a_list[-3:])

6
[4, 5, 6]
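The starting value can be left off in the same way, in which case the slice begins at the start of the string or list. A short sketch, with illustrative variable names:

```python
a_list = [1, 2, 3, 4, 5, 6]
a_string = "hello, how are you"

# no starting value: begin at the first element
first_three = a_list[:3]
print(first_three)     # [1, 2, 3]

# combine with a negative end to drop the last element
all_but_last = a_list[:-1]
print(all_but_last)    # [1, 2, 3, 4, 5]

# the same notation works on strings
last_word = a_string[-3:]
print(last_word)       # you
```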


## Conditional Statements

A critical feature of Python is that it is 

   **INDENTATION SENSITIVE**.

This means that indentation is used to mark where code blocks begin and end. 

Here is an example of an `if` statement in Python:

In [36]:
if 'b' in my_dict:
    print('yep its there')
    
    print('still in the block')
else:
    print('nope its not')
    
    print('still not here')
    
print('we are done here')

yep its there
still in the block
we are done here


Note that the `if` and `else` lines each end with a colon `:`. 
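When there are more than two cases to check, Python also provides `elif` (short for "else if"). The sketch below is only an illustration; the thresholds and the `size` variable are made up.

```python
my_dict = {'a': 123, 'b': 345, 'c': 567}
value = my_dict['b']

# each branch is its own indented block
if value > 500:
    size = 'large'
elif value > 200:
    size = 'medium'
else:
    size = 'small'

print(size)   # medium
```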


## Loops

Loops also use indentation to indicate the body of the loop. 

Here, we use [range()](https://docs.python.org/2/tutorial/controlflow.html#the-range-function) to create a sequence of numbers to iterate over. 

In [37]:
# named 'total' rather than 'sum' to avoid hiding the built-in sum() function
total = 0

for i in range(10):
    total = total + i
    print(total)

print('finally')    
print(total)

0
1
3
6
10
15
21
28
36
45
finally
45


If you need both the index of the iteration and the value, use `enumerate`:


In [41]:
values = ['look', 'a', 'bird']

for i,value in enumerate(values):
    print(i)
    print(value)

0
look
1
a
2
bird


`for` loops like these work well, but when the goal of a loop is to build a new list, it is common Python practice to use a "list comprehension" instead, as we will see next.

## List Comprehension

List comprehensions are a tool in Python for transforming one list into another one. 

Filtering is also possible using comprehensions.

List comprehensions are frequently used in place of `for` loops when possible. 

Let's look at a few examples.

In [24]:
my_list = [1, 2, 3]

# a list comprehension that doubles each value in my_list
double_list = [value * 2 for value in my_list]

double_list

[2, 4, 6]

The list comprehension is the stuff inside the `[ ]` brackets. In goes one list (`my_list`) and out comes another (`double_list`). 

It uses the `for`-`in` structure - just like the `for` loop we looked at.

You can perform any manipulation you'd like to transform the old list into the new list.

The placeholder variable representing the current value in the list can be named anything you want. 

Here, we convert each value to strings:

In [25]:
# numbers to strings comprehension
[str(val) for val in my_list]

['1', '2', '3']

List comprehensions can also contain an `if` suffix that serves as a **filter** on the old list.

In [26]:
# filtering and transforming
[val * 2 for val in my_list if val >= 2]

[4, 6]
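To make the connection with `for` loops concrete, here is a sketch of the same filter-and-transform written both ways. The names `loop_result` and `comprehension_result` are illustrative.

```python
my_list = [1, 2, 3]

# the for-loop version: start empty, then append matching values
loop_result = []
for val in my_list:
    if val >= 2:
        loop_result.append(val * 2)

# the list comprehension version does the same in one line
comprehension_result = [val * 2 for val in my_list if val >= 2]

print(loop_result)                          # [4, 6]
print(loop_result == comprehension_result)  # True
```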

There is a lot more you can do with list comprehensions, but we will move on for now. For more info, check out [list comprehensions explained visually](https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/).

## Functions

We have used a few functions already, `len()` and `str()` for example. 

Now let's take a look at how to create functions of our own. 

In [27]:
def double(value):
    new_value = value * 2
    return new_value

In [28]:
double(20)

40

In [29]:
double(14.4)

28.8

Again, indentation is used to define the start and end of the function code block. 

The function definition begins with the `def` keyword and ends with a colon. `return` is used to return a value from the function. 

That's really all there is to it!
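Functions can take more than one parameter, and a parameter can be given a default value. The `scale` function below is a hypothetical variation on `double`, shown only as a sketch.

```python
def scale(value, factor=2):
    # factor defaults to 2 when the caller leaves it out
    return value * factor

print(scale(20))       # 40
print(scale(20, 3))    # 60

# functions can be called inside a list comprehension too
scaled_list = [scale(v) for v in [1, 2, 3]]
print(scaled_list)     # [2, 4, 6]
```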

## Importing Modules

Python organizes code into modules, and groups of related modules into packages. 

We can use these modules by **importing** them. Once a module has been imported, we have access to its functions and variables. 


In [30]:
# import the re (regular expression) module using the import keyword
import re

pattern = re.compile("b.dy")
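The compiled pattern can then be used to search text. In the pattern `b.dy`, the `.` matches any single character. A minimal sketch, with made-up sample strings:

```python
import re

pattern = re.compile("b.dy")

# search() returns a match object for the first match, or None
match = pattern.search("everybody was there")
print(match.group())    # body

# findall() returns every non-overlapping match as a list of strings
matches = pattern.findall("body bady buddy")
print(matches)          # ['body', 'bady']
```

Here 'buddy' does not match, because the pattern expects exactly one character between the `b` and the `dy`.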


If you want to add a specific variable or function from a package into your environment, you can import it specifically using this syntax:

In [31]:
# import the punctuation variable from the string module
from string import punctuation

# now 'punctuation' is in our environment
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
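As a small illustration of why this variable is handy for text analysis, the sketch below removes punctuation from a string. The `sentence` and `cleaned` names are made up for the example.

```python
from string import punctuation

# hypothetical sample text
sentence = "Hello, how are you?"

# keep only the characters that are not punctuation
cleaned = "".join(ch for ch in sentence if ch not in punctuation)
print(cleaned)   # Hello how are you
```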

## Reading from a File

Finally, we will take a quick look at common patterns in Python for reading files. 

You can open a file and create a file handle to it using the `open()` function:


In [32]:
filename = 'data/test.txt'
handle =  open(filename)

The contents of the file can then be read using the `read()` or `readlines()` methods:

In [33]:
# returns a string with the entire contents of the file.
content = handle.read()
content

'This is the contents of the file.\nThe file has this in it.\nAnd nothing else.\n\n'

Make sure you close the file handle when you are done:

In [34]:
handle.close()

Often you will see `with` used to create the file handle.

This automatically closes the file once the code block has finished executing:

In [35]:
with open(filename) as handle:
    for line in handle:
        print(line)
        


This is the contents of the file.

The file has this in it.

And nothing else.
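The blank lines in the output above appear because each line read from the file already ends with a newline character, and `print()` adds another. One way to avoid this is to call `strip()` on each line. The sketch below writes its own temporary file as a stand-in for `data/test.txt` so that it can run anywhere; the temporary file and the variable names are assumptions for the example.

```python
import os
import tempfile

# create a throwaway file (a stand-in for data/test.txt)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("This is the contents of the file.\nThe file has this in it.\n")
    temp_name = tmp.name

with open(temp_name) as handle:
    # strip() removes the trailing newline from each line
    lines = [line.strip() for line in handle]

for line in lines:
    print(line)

# clean up the temporary file
os.remove(temp_name)
```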





To learn more, check out the [File I/O documentation](https://docs.python.org/2/tutorial/inputoutput.html).